Token Efficiency Guidelines
Qin Yu, 21 May 2026, updated 2 Jun 2026
Token efficiency means spending tokens on useful context and reasoning while removing avoidable overhead.
Keep Active Context Small
- Lead with the outcome — state the desired result, scope, constraints, stop condition, validation steps, and expected summary. Short prompts are useful; vague prompts are expensive.
- Treat context as working memory, not storage — attach only relevant files and URLs. Start a new session for unrelated work.
- Keep IDE context focused — Copilot can use open files for inline suggestions, so close irrelevant tabs.
- Reference specific files — point to durable information in files or docs instead of pasting large blocks of text. Avoid screenshots unless the visual information matters.
- Keep code modular — split oversized files and functions along meaningful boundaries.
- Work in phases — use Research → Plan → Implement → Verify so each phase receives only the context it needs.
- Isolate noisy work — use sub-agents for exploration, parallel tasks, and specialised reviews. Return concise findings.
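As one way to apply "lead with the outcome", the sketch below assembles a short task brief from the fields that bullet lists: result, scope, constraints, stop condition, validation steps, and expected summary. The `TaskBrief` class, its field names, and the example task are hypothetical illustrations, not part of any tool's API.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical container for the fields a focused prompt should state."""

    outcome: str
    scope: str
    constraints: list[str] = field(default_factory=list)
    stop_condition: str = ""
    validation: list[str] = field(default_factory=list)
    summary_format: str = ""

    def render(self) -> str:
        """Render the brief as a compact prompt, skipping empty fields."""
        parts = [f"Outcome: {self.outcome}", f"Scope: {self.scope}"]
        if self.constraints:
            parts.append("Constraints: " + "; ".join(self.constraints))
        if self.stop_condition:
            parts.append(f"Stop when: {self.stop_condition}")
        if self.validation:
            parts.append("Validate with: " + "; ".join(self.validation))
        if self.summary_format:
            parts.append(f"Report back as: {self.summary_format}")
        return "\n".join(parts)


# Example task, invented for illustration.
brief = TaskBrief(
    outcome="Fix the failing date parser test",
    scope="src/dates.py and tests/test_dates.py only",
    constraints=["no new dependencies"],
    stop_condition="the test suite passes",
    validation=["pytest tests/test_dates.py"],
    summary_format="three bullet points",
)
print(brief.render())
```

The rendered brief stays at six short lines, yet it names the result, the boundaries, and the point at which the agent should stop, which is what keeps a short prompt from being a vague one.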
Keep Instructions and Tools Lean
- Use the right configuration layer — keep mandatory repo rules in versioned instruction files, reusable workflows in skills, specialised behaviour in scoped agents, deterministic checks in hooks or CI, and temporary details in the session prompt.
- Keep persistent instructions short and current — record only durable conventions and recurring fixes. Review and prune them periodically. Use memory as a convenience, not as the source of mandatory policy.
- Load specialised context only when needed — use skills and file references for progressive disclosure instead of placing every workflow and reference in the default context.
- Remove unused tools — every unnecessary MCP server or tool adds overhead. Prefer a direct API, native CLI, or script for narrow, stable, high-frequency operations.
Control Tool Output
- Think in code, not prose — analyse large files or datasets with a script when possible. Scripts are deterministic, reusable, and produce compact output.
- Request bounded outputs — ask for the exact artifact you need, such as one function or a concise diff summary.
- Prefer structured output — use summaries, tables, and focused JSON instead of raw logs. Compress verbose shell output with tools such as rtk-ai/rtk.
- Collapse related tool calls — batch independent reads and actions where supported. Plugins such as jsturtevant/copilot-codeact-plugin can help.
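To illustrate analysing data with a script and returning structured output instead of raw logs, here is a minimal sketch that reduces a log to counts per level and a few distinct error messages. The log format (`<timestamp> <LEVEL> <message>`), the function name `summarise_log`, and the sample lines are assumptions made for the example.

```python
import json
import re
from collections import Counter

# Assumed log format: "<timestamp> <LEVEL> <message>"; adjust the pattern to your logs.
LINE_PATTERN = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def summarise_log(lines, max_examples=3):
    """Return counts per level plus the most frequent error messages."""
    levels = Counter()
    errors = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        levels[match["level"]] += 1
        if match["level"] == "ERROR":
            errors[match["message"]] += 1
    return {
        "parsed_lines": sum(levels.values()),
        "by_level": dict(levels),
        "top_errors": [
            {"message": msg, "count": n}
            for msg, n in errors.most_common(max_examples)
        ],
    }


# Invented sample data for illustration.
sample = [
    "2026-06-02T10:00:00 INFO service started",
    "2026-06-02T10:00:01 ERROR database timeout",
    "2026-06-02T10:00:02 ERROR database timeout",
    "2026-06-02T10:00:03 WARN cache miss",
]
print(json.dumps(summarise_log(sample)))
```

The summary size depends on the number of distinct levels and on `max_examples`, not on the length of the log, so the same script stays compact whether the input has four lines or four million.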
Match the Workflow to the Task
- Route by task shape — use cheaper models for extraction, formatting, search-heavy work, and simple transforms. Reserve stronger models for ambiguity, architecture, difficult debugging, and high-risk review.
- Judge cost by completed work — account for retries, long loops, repeated tool output, reasoning tokens, cache costs, and image tokens rather than headline model prices alone.
- Parallelise independent work — run separate reviews, comparisons, or checks concurrently when their results do not depend on each other.
- Add critique loops selectively — use Generate → Critique → Revise → Validate when quality matters more than latency.
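The point about judging cost by completed work can be made concrete with a small calculation. The sketch below assumes retries are independent and repeated until the first success, so the expected cost per completed task is the cost of one attempt divided by the success rate. All token counts, prices, and success rates are made-up figures, not real prices or measurements.

```python
def cost_per_attempt(input_tokens, output_tokens, price_in, price_out):
    """Cost of one attempt; prices are per million tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def cost_per_completed_task(attempt_cost, success_rate):
    """Expected cost until the first success, assuming independent retries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return attempt_cost / success_rate


# Hypothetical figures for illustration only.
cheap = cost_per_attempt(20_000, 2_000, price_in=0.5, price_out=2.0)
strong = cost_per_attempt(20_000, 2_000, price_in=3.0, price_out=15.0)

print(round(cost_per_completed_task(cheap, success_rate=0.1), 4))
print(round(cost_per_completed_task(strong, success_rate=0.9), 4))
```

With these invented numbers, the cheaper model costs about 0.14 per completed task and the stronger one about 0.10, even though a single attempt on the stronger model costs more than six times as much. The simple model ignores cache discounts, image tokens, and growing context across retries, all of which would need to be measured for a real comparison.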
Validate and Improve
- Catch errors early with deterministic guardrails — use tests, linters, type checkers, security scanners, hooks, and CI so mistakes do not compound across long workflows.
- Analyse usage regularly — use Codex /status; Copilot CLI /context, /usage, /session, and experimental /chronicle tips; or Claude Code /context, /usage, and /cost.
- Benchmark representative tasks — use evals when changing prompts, models, tools, or routing rules. Different models respond differently to context ordering and instruction style.
- Treat misses as incidents — when a workflow fails, update the relevant instruction, skill, hook, eval, or routing rule instead of repeatedly retrying the same approach.
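A deterministic guardrail can also keep its own output small. The following sketch runs a set of checks and keeps only the pass/fail status plus the last few output lines of any failure. The `run_check` helper and the two placeholder commands are hypothetical; in a real setup they would be replaced by the project's test, lint, or type-check commands.

```python
import subprocess
import sys


def run_check(name, command, tail_lines=5):
    """Run one check and keep only its status and the tail of failing output."""
    result = subprocess.run(command, capture_output=True, text=True)
    output = (result.stdout + result.stderr).strip().splitlines()
    failed = result.returncode != 0
    return {
        "check": name,
        "passed": not failed,
        # Passing checks contribute no output; failures keep a short tail.
        "tail": output[-tail_lines:] if failed else [],
    }


# Placeholder commands so the sketch runs anywhere; swap in real
# test, lint, and type-check commands for an actual project.
checks = {
    "passing-check": [sys.executable, "-c", "print('ok')"],
    "failing-check": [sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"],
}

report = [run_check(name, cmd) for name, cmd in checks.items()]
for entry in report:
    print(entry)
```

Dropping the output of passing checks is a design choice: the agent only needs detail where something went wrong, and a bounded tail is usually enough to locate the failure.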
API Workflow Optimisations
When building or scripting agent workflows via API:
- Put stable prompt prefixes first — place invariant instructions before variable task data to improve prompt-cache reuse.
- Keep tool contracts compact — design tools to return concise, structured results rather than verbose text.
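The effect of prompt ordering on cache reuse can be sketched without calling any API. The example below builds two prompts that place the same invariant instructions first and the variable task last, then measures how many leading characters they share. The instruction text and function names are hypothetical, and character overlap is only a rough proxy: providers cache on tokens and apply their own rules, so actual cache behaviour should be checked against the provider's documentation.

```python
import os

# Hypothetical invariant instructions; in practice this would be the system
# prompt and tool definitions, kept byte-for-byte identical across requests.
STABLE_PREFIX = (
    "You are a code review assistant.\n"
    "Follow the repository style guide.\n"
    "Reply with a bullet list of findings.\n"
)


def build_prompt(task: str) -> str:
    """Place invariant instructions first and variable task data last."""
    return STABLE_PREFIX + "Task:\n" + task


def shared_prefix_length(prompts):
    """Number of leading characters that all prompts have in common."""
    return len(os.path.commonprefix(prompts))


prompts = [
    build_prompt("Review utils.py"),
    build_prompt("Summarise the changelog"),
]
print(shared_prefix_length(prompts), "of", len(prompts[0]), "characters shared")
```

If the task text came first instead, the two prompts would diverge at the first character and share no prefix at all, which is why variable data such as timestamps or per-request identifiers is best kept out of the leading section.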
References
- OpenAI prompt caching
- OpenAI Codex CLI slash commands
- GitHub Copilot prompt engineering
- GitHub Copilot content exclusion
- GitHub Copilot CLI command reference
- GitHub Copilot CLI best practices
- GitHub Copilot Chronicle
- VS Code context management
- Claude Code commands
- Anthropic: Writing tools for agents