
Quality, Cost, and the Context Window

Qin Yu, 21 May 2026, updated 29 May 2026


Why Quality Matters More Than Cost

Instead of counting tokens, make every token count.

Token cost is real, but wasted work usually costs more than expensive tokens. A cheap run is not cheap if it produces a wrong patch, wastes review time, or sends the agent down a long recovery path.

Small miss rates compound across long workflows. If each step independently has a 1% chance of going wrong, a 50-step run has roughly a 40% chance of at least one miss:

\[ 1 - 0.99^{50} \approx 40\% \]
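The same arithmetic can be checked for other run lengths. This is a minimal sketch that assumes every step fails independently with the same probability; the function name `miss_probability` is illustrative, not part of any tool.

```python
def miss_probability(per_step_miss: float, steps: int) -> float:
    """Chance that at least one of `steps` independent steps goes wrong."""
    if not 0.0 <= per_step_miss <= 1.0:
        raise ValueError("per_step_miss must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    # P(at least one miss) = 1 - P(every step succeeds)
    return 1.0 - (1.0 - per_step_miss) ** steps


for steps in (10, 50, 100):
    print(f"{steps:>3} steps: {miss_probability(0.01, steps):.1%}")
```

With a 1% per-step miss rate this prints about 9.6% for 10 steps, 39.5% for 50, and 63.4% for 100, which is why shortening a run often helps more than shaving the per-step error rate.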

A cheap but uncontrolled long agent loop can cost more than several short, well-scoped, higher-quality runs. Fewer steps, tighter context, and early error detection matter more than raw model power.

Deterministic Guardrails

If an early step goes wrong and nothing catches it, the agent continues down the wrong path. Catch errors early with deterministic controls:

  • Unit tests — verify correctness at the function level before the agent proceeds.
  • Linters and type checkers — catch structural and type errors immediately.
  • Security scanners — flag vulnerabilities before they propagate into downstream work.
  • Hooks — run non-negotiable checks automatically after edits or before commits.
  • Evals — replay representative tasks when changing prompts, models, tools, or routing rules.

Treat these as infrastructure, not optional polish. Better test coverage enables more aggressive automation.
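One way to wire such checks into a loop is a small fail-fast runner. The sketch below is an assumption about how you might structure it, not a feature of any particular harness; the entries in `CHECKS` are placeholder commands that should be swapped for your project's real test, lint, and scan commands.

```python
import subprocess
import sys
from typing import Optional, Sequence, Tuple

# Placeholder checks (hypothetical): replace with your real test, lint,
# type-check, and security-scan commands.
CHECKS: Sequence[Tuple[str, Sequence[str]]] = [
    ("unit tests", [sys.executable, "-c", "assert 1 + 1 == 2"]),
    ("lint", [sys.executable, "-c", "print('lint ok')"]),
]


def first_failure(checks: Sequence[Tuple[str, Sequence[str]]]) -> Optional[str]:
    """Run checks in order and return the name of the first one that fails.

    Returns None when every check exits with status 0.
    """
    for name, command in checks:
        result = subprocess.run(list(command), capture_output=True, text=True)
        if result.returncode != 0:
            return name  # stop early so the agent does not build on a bad step
    return None


if __name__ == "__main__":
    failed = first_failure(CHECKS)
    print("all checks passed" if failed is None else f"stopped at: {failed}")
```

Stopping at the first failure keeps the feedback short, which also keeps the error report that goes back into the agent's context small.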


Understanding the Context Window

Provide as little context as possible, but as much as required.

An LLM is a next-token predictor. Its output quality depends on what it sees in the context window, and that window is shaped by the agent harness, not only by the base model.


The diagram below shows two distinct boundaries. You control what you send to the harness — prompts, files, instructions, skills, MCP tools, and so on. The harness then decides how to assemble those inputs into the model's context window: what to include, how to order it, when to compact, and which model to call. Understanding this boundary explains why context discipline and prompt structure matter so much.

sequenceDiagram
  autonumber
  participant You as You<br/>+<br/>Your Project
  participant Agent as Agent<br/>aka<br/>Harness
  participant LLM

  loop Per task

  rect rgb(200, 150, 255)
  Note right of You: Prompts,<br/>Files,<br/>Instructions,<br/>Skills,<br/>MCPs,<br/>...
  You->>Agent: Send ↑ to
  end

  Note over Agent: e.g.<br/>VS Code Chat, or<br/>Copilot CLI, or<br/>Copilot Cloud Agent, or<br/>Claude Code, or<br/>OpenAI Codex

  loop Per turn (compaction when context window fills)
  Agent->>LLM: Send context to
  Note over LLM: e.g.<br/>GPT-5.5, or<br/>Claude Opus 4.7
  LLM-->>Agent: Text response
  end

  Agent-->>You: Result

  end

You control what enters the harness: prompts, files/URLs, instructions, skills, MCP tools, screenshots, and sometimes memories. That is the primary lever for both quality and cost.

Context Rot

Avoid treating the context window as free storage. Long-context systems have known failure modes:

  1. Position bias — models often retrieve better from the beginning and end than the middle.
  2. Irrelevant-token distraction — unrelated content competes with useful content.
  3. Stale-session drift — old assumptions, failed attempts, and tool outputs remain influential.
  4. Compaction loss — automatic summaries are useful, but they can omit details that later become important.

The old rule of thumb "keep context below 60%" is a useful caution, not a law of physics. Treat context as scarce working memory: keep noisy exploration outside the main conversation, use files for durable information, summaries for intermediate findings, and sub-agents for isolated investigation. Start a new chat or session when switching to unrelated work.


Context Discipline

Context engineering is curating and maintaining the optimal set of tokens in the agent loop. The harness manages prompt caching, tool search, memory, and compaction, but you are responsible for what you add yourself.

Irrelevant context degrades output quality and increases cost.

  • Attach only the files and URLs directly relevant to the current task.
  • Open a new chat window for each distinct task.
  • Use sub-agents to isolate task-specific context into separate context windows.
  • Prefer compact summaries over raw logs.
  • Prefer file references and targeted reads over pasting large blobs.
  • Avoid screenshots unless the visual information is genuinely needed.
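As an illustration of preferring compact summaries over raw logs, the sketch below trims a log before it is pasted into a chat. The keyword list, the default tail length, and the name `summarise_log` are all assumptions chosen for the example; tune them to your own log format.

```python
from typing import Sequence


def summarise_log(
    log_text: str,
    keywords: Sequence[str] = ("error", "fail", "traceback"),
    tail: int = 5,
) -> str:
    """Keep only lines that mention a keyword, plus the last `tail` lines."""
    lines = log_text.splitlines()
    # Indices of lines that mention any keyword (case-insensitive).
    keep = {
        i
        for i, line in enumerate(lines)
        if any(keyword in line.lower() for keyword in keywords)
    }
    # The end of a log usually holds the final status, so keep it too.
    keep.update(range(max(0, len(lines) - tail), len(lines)))
    selected = [lines[i] for i in sorted(keep)]
    header = f"[{len(selected)} of {len(lines)} log lines kept]"
    return "\n".join([header, *selected])
```

The header line tells the model that the log was trimmed, so it can ask for a targeted read of the full file if the kept lines are not enough.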

Prompt Engineering

Prompt engineering is writing and organising instructions for optimal LLM output. It is a subset of context engineering, but its influence on output quality is outsized.

A good prompt usually specifies:

  1. Outcome — what result you want.
  2. Scope — what the agent may and may not change.
  3. Inputs — which files, URLs, tickets, logs, or docs matter.
  4. Process — whether to research, plan, implement, or review.
  5. Stop condition — when the agent should stop.
  6. Validation — how success should be checked.
  7. Output contract — what summary you expect at the end.

Politeness tokens are not the problem. Vagueness is.
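The seven parts listed above can be kept honest with a small template. This is only a sketch: the class `TaskPrompt`, its field names, and the rendered layout are assumptions for illustration, not a format any harness requires.

```python
from dataclasses import dataclass, fields


@dataclass
class TaskPrompt:
    """One field per part of the prompt checklist (illustrative names)."""

    outcome: str
    scope: str
    inputs: str
    process: str
    stop_condition: str
    validation: str
    output_contract: str

    def render(self) -> str:
        """Return the prompt text, refusing to render if any part is blank."""
        missing = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError(f"vague prompt, missing: {', '.join(missing)}")
        return "\n\n".join(
            f"{f.name.replace('_', ' ').capitalize()}:\n{getattr(self, f.name).strip()}"
            for f in fields(self)
        )
```

Calling `render()` on a prompt with an empty `stop_condition` raises a `ValueError`, which surfaces vagueness before any tokens are spent.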

Working in Phases

Divide complex work into distinct phases, each with its own context window:

  1. Research — gather information, read documentation, explore the codebase. Do not edit.
  2. Plan — synthesise findings into a concrete, reviewable plan.
  3. Implement — execute against the plan with focused context.
  4. Verify — run deterministic checks and summarise residual risks.

Use sub-agents for finer isolation: each sub-task gets its own context window, preventing earlier phases from polluting implementation decisions.
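The hand-off between phases can be sketched as a loop in which each phase starts from a fresh context and receives only the task plus the previous phase's summary. The callable `run_agent` is hypothetical: it stands in for whatever starts a new session in your harness and returns that session's final summary.

```python
from typing import Callable, Dict

PHASES = ("research", "plan", "implement", "verify")


def run_phases(task: str, run_agent: Callable[[str, str], str]) -> Dict[str, str]:
    """Run each phase in its own context and pass forward only a summary.

    `run_agent(phase, context)` is a hypothetical hook that opens a fresh
    session for `phase`, sends `context`, and returns the phase summary.
    """
    summaries: Dict[str, str] = {}
    previous_summary = ""
    for phase in PHASES:
        if previous_summary:
            # Only the latest summary crosses the boundary, not the full transcript.
            context = f"{task}\n\nPrevious phase summary:\n{previous_summary}"
        else:
            context = task
        previous_summary = run_agent(phase, context)
        summaries[phase] = previous_summary
    return summaries
```

Passing only the latest summary is a deliberate simplification; in practice the plan is often written to a file so that the implement and verify phases can both read it.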

A reusable phase prompt set:

Research mode

Read only. Do not edit files.

Task:
<task>

Scope:
<files, directories, issue, logs, docs, or constraints>

Return:
- relevant files/functions and why they matter
- current behaviour, with evidence
- constraints and invariants
- likely failure modes
- missing information
- recommended next step

Plan mode

Do not edit files.

Use the research findings to produce a bounded implementation plan.

Return:
- goal
- assumptions
- files likely to change
- ordered steps
- validation commands
- risks and rollback notes
- stop conditions

Implement mode

Execute only the approved plan.

Rules:
- stay within scope
- do not add dependencies unless explicitly approved
- stop if the plan is wrong or a blocker appears
- run relevant validation commands

Return:
- touched files
- what changed
- validation results
- unresolved risks