Building Better Agents with Fewer Tokens

Qin Yu, 21 May 2026, updated 1 Jun 2026


Practical guidelines for using agentic AI effectively.

We are now context engineers.

TL;DR: Agent output quality — and most of your token spend — is determined by the context window and the harness that builds it. The model matters, but the durable gains come from context discipline, model routing, sub-agent isolation, deterministic validation, and ongoing maintenance of your agent configuration.

Quick Start: 8 Things You Can Do Today

  1. Lead with the outcome — state the desired result, constraints, stop condition, and validation steps. Short prompts are good; underspecified prompts are not.
  2. Choose the good-enough model — route extraction, formatting, and search-heavy work to cheaper models; reserve frontier models for ambiguity, architecture, synthesis, and review.
  3. Work in phases — Research → Plan → Implement → Verify, with a separate session or sub-agent per phase.
  4. Treat context as working memory, not storage — keep the hot context small, current, and relevant; put durable knowledge in files, docs, or skills.
  5. Isolate noisy work in sub-agents — send read-heavy or parallel tasks to separate context windows and pass only summaries back.
  6. Add deterministic guardrails — tests, linters, type checkers, security scanners, and hooks catch errors before they compound.
  7. Keep agent configuration lean: AGENTS.md/CLAUDE.md, skills, prompts, and MCP servers should be short, current, and versioned.
  8. Treat misses as incidents — when an agent fails, update the relevant instruction, skill, hook, eval, or routing rule instead of retrying.
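
As a rough illustration of item 2, model routing can start as a simple lookup on task type. The sketch below is a minimal example, not any particular harness's API: the tier labels `cheap-model` and `frontier-model`, the `route_model` function, and the fallback for unknown task kinds are all assumptions made for illustration. The task categories come from the list above.

```python
# Hypothetical tier labels; substitute the model identifiers your harness exposes.
CHEAP_MODEL = "cheap-model"
FRONTIER_MODEL = "frontier-model"

# Task kinds named in the list above, grouped by the tier they are routed to.
CHEAP_TASKS = {"extraction", "formatting", "search"}
FRONTIER_TASKS = {"ambiguity", "architecture", "synthesis", "review"}


def route_model(task_kind: str) -> str:
    """Return the model tier for a task kind (case and whitespace insensitive)."""
    kind = task_kind.strip().lower()
    if kind in CHEAP_TASKS:
        return CHEAP_MODEL
    if kind in FRONTIER_TASKS:
        return FRONTIER_MODEL
    # Assumption: unknown kinds fall back to the frontier tier, trading cost for safety.
    return FRONTIER_MODEL
```

Calling `route_model("formatting")` picks the cheaper tier, while `route_model("architecture")` picks the frontier tier. Whether unknown work should default to the expensive or the cheap tier is a judgment call that depends on how costly a wrong answer is in your project.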

For the complete guide to token efficiency, see Token Efficiency Guidelines.


Intro - The Mental Model: The Agent Is a Harness

An agent is not just a model. It is a harness around a model.

The harness assembles the working context. It decides or mediates:

  • which instruction layers are loaded,
  • which files are read or retrieved,
  • which tool outputs enter the conversation,
  • which tools are available,
  • which memories are recalled,
  • when context is compacted,
  • which model is used,
  • which actions require approval,
  • and which validation steps run after changes.

You shape these decisions by what you attach, select, open, mention, permit, configure, and ask the agent to do. You constrain the search space; the harness turns that into model context.
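
To make the idea of a harness assembling context more concrete, here is a minimal sketch of one possible selection step: candidate items are ranked by priority and added until a token budget is reached. Everything in it is hypothetical, including the `ContextItem` type, the priority convention, and the four-characters-per-token estimate, which is a crude stand-in for a real tokenizer. Actual harnesses use their own, usually more sophisticated, logic.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    priority: int  # hypothetical convention: lower number means more important


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Use the model's tokenizer in practice.
    return max(1, len(text) // 4)


def assemble_context(items, budget):
    """Pick items in priority order, skipping any that would exceed the token budget."""
    chosen = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost > budget:
            continue  # does not fit; a smaller, lower-priority item may still fit
        chosen.append(item)
        used += cost
    return chosen, used


candidates = [
    ContextItem("instructions", "a" * 40, priority=0),
    ContextItem("raw_build_log", "b" * 4000, priority=1),
    ContextItem("open_file", "c" * 80, priority=2),
]
selected, tokens_used = assemble_context(candidates, budget=50)
```

With this budget the large build log is left out while the instructions and the open file are kept, which is the effect you get when you attach a focused file instead of pasting raw tool output.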

flowchart LR
    U[You and your project] --> H[Agent harness]
    H --> I[Instruction layers]
    H --> T[Tools and MCP]
    H --> M[Memory and session state]
    H --> C[Compaction and context management]
    H --> V[Validation and approvals]
    I --> L[LLM]
    T --> L
    M --> L
    C --> L
    V --> L
    L --> H
    H --> U

GitHub Copilot, OpenAI Codex, and Claude Code expose different interfaces, but they are variations of this same architecture. The model is only one component; quality comes from how the harness assembles context, tools, state, and validation around it.
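
The validation part of the harness can be sketched as a deterministic gate that runs after the agent changes something. The example below is an assumption-laden illustration, not how any of the named products implement it: `run_checks` and `demo_checks` are hypothetical names, and the two commands are stand-ins that only invoke the Python interpreter so the sketch runs anywhere. A real project would list its own test runner, linter, or type checker instead.

```python
import subprocess
import sys


def run_checks(checks):
    """Run each named command and return {name: output} for the checks that failed."""
    failures = {}
    for name, cmd in checks.items():
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Keep the output so it can be summarised and handed back to the agent.
            failures[name] = (result.stdout + result.stderr).strip()
    return failures


# Stand-in commands (hypothetical): one check that passes and one that fails.
demo_checks = {
    "passing_check": [sys.executable, "-c", "print('ok')"],
    "failing_check": [sys.executable, "-c", "import sys; print('lint error'); sys.exit(1)"],
}
```

Calling `run_checks(demo_checks)` returns only the failing check with its captured output. Because the gate depends on exit codes and not on the model's own judgment, the same change always produces the same verdict.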


Further Reading

  1. Quality, Cost, and the Context Window
     How the context window shapes output quality and token spend, and how to manage both.

  2. Agent Configuration
     AGENTS.md, skills, prompts, MCP servers, and hooks: what they do and how to keep them healthy.

  3. Orchestration and Model Selection
     When to use sub-agents, how to route models, and how to structure multi-agent workflows.

  4. Token Efficiency Guidelines
     A concise checklist for reducing token overhead while preserving quality.
