Skip to content

Token Efficiency Guidelines

Qin Yu, 21 May 2026, updated 2 Jun 2026


Token efficiency means spending tokens on useful context and reasoning while removing avoidable overhead.

Keep Active Context Small

  • Lead with the outcome — state the desired result, scope, constraints, stop condition, validation steps, and expected summary. Short prompts are useful; vague prompts are expensive.
  • Treat context as working memory, not storage — attach only relevant files and URLs. Start a new session for unrelated work.
  • Keep IDE context focused — Copilot can use open files for inline suggestions, so close irrelevant tabs.
  • Reference specific files — reference durable information in files or docs instead of pasting large blocks of text. Avoid screenshots unless the visual information matters.
  • Keep code modular — split oversized files and functions along meaningful boundaries.
  • Work in phases — use Research → Plan → Implement → Verify so each phase receives only the context it needs.
  • Isolate noisy work — use sub-agents for exploration, parallel tasks, and specialised reviews. Return concise findings.

Keep Instructions and Tools Lean

  • Use the right configuration layer — keep mandatory repo rules in versioned instruction files, reusable workflows in skills, specialised behaviour in scoped agents, deterministic checks in hooks or CI, and temporary details in the session prompt.
  • Keep persistent instructions short and current — record only durable conventions and recurring fixes. Review and prune them periodically. Use memory as a convenience, not as the source of mandatory policy.
  • Load specialised context only when needed — use skills and file references for progressive disclosure instead of placing every workflow and reference in the default context.
  • Remove unused tools — every unnecessary MCP server or tool adds overhead. Prefer a direct API, native CLI, or script for narrow, stable, high-frequency operations.

Control Tool Output

  • Think in code, not prose — analyse large files or datasets with a script when possible. Scripts are deterministic, reusable, and produce compact output.
  • Request bounded outputs — ask for the exact artifact you need, such as one function or a concise diff summary.
  • Prefer structured output — use summaries, tables, and focused JSON instead of raw logs. Compress verbose shell output with tools such as rtk-ai/rtk.
  • Collapse related tool calls — batch independent reads and actions where supported. Plugins such as jsturtevant/copilot-codeact-plugin can help.

Match the Workflow to the Task

  • Route by task shape — use cheaper models for extraction, formatting, search-heavy work, and simple transforms. Reserve stronger models for ambiguity, architecture, difficult debugging, and high-risk review.
  • Judge cost by completed work — account for retries, long loops, repeated tool output, reasoning tokens, cache costs, and image tokens rather than headline model prices alone.
  • Parallelise independent work — run separate reviews, comparisons, or checks concurrently when their results do not depend on each other.
  • Add critique loops selectively — use Generate → Critique → Revise → Validate when quality matters more than latency.

Validate and Improve

  • Catch errors early with deterministic guardrails — use tests, linters, type checkers, security scanners, hooks, and CI so mistakes do not compound across long workflows.
  • Analyse usage regularly — use Codex /status; Copilot CLI /context, /usage, /session, and experimental /chronicle tips; or Claude Code /context, /usage, and /cost.
  • Benchmark representative tasks — use evals when changing prompts, models, tools, or routing rules. Different models respond differently to context ordering and instruction style.
  • Treat misses as incidents — when a workflow fails, update the relevant instruction, skill, hook, eval, or routing rule instead of repeatedly retrying the same approach.

API Workflow Optimisations

When building or scripting agent workflows via API:

  • Put stable prompt prefixes first — place invariant instructions before variable task data to improve prompt-cache reuse.
  • Keep tool contracts compact — design tools to return concise, structured results rather than verbose text.
References