Orchestration Patterns and Model Selection
Qin Yu, 21 May 2026, updated 29 May 2026
Orchestration Patterns
Primarily for platform engineers
These patterns are most directly applicable if you are building or configuring automated agent pipelines — for example, writing a multi-step CI workflow or a research automation script that chains multiple agent calls. If you use Copilot, Claude Code, or similar interactive tools day-to-day, treat this as background reading. The mental models are useful, but you won't wire them up yourself. The exception is Prompt Chaining (pattern 1): the Research → Plan → Implement → Verify discipline applies to any interactive session.
Sub-agents are not just a speed feature. They are a way to control context pollution, parallelism, and failure boundaries.
1. Prompt Chaining
Use for fixed workflows:
Research → Plan → Implement → Verify
Each stage has a clear input and output contract.
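A minimal sketch of such a chain in Python. It assumes a hypothetical `call_model(prompt)` function that stands in for whatever agent or API call your pipeline actually makes. Each stage receives only the previous stage's output, and a stage that breaks its output contract stops the chain instead of passing bad input downstream.

```python
from dataclasses import dataclass
from typing import Callable


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model or agent call."""
    return f"output for: {prompt.splitlines()[0]}"


@dataclass
class Stage:
    name: str
    build_prompt: Callable[[str], str]  # turns the previous output into this stage's prompt
    check: Callable[[str], bool]        # output contract that must hold before continuing


def run_chain(task: str, stages: list[Stage]) -> dict[str, str]:
    results: dict[str, str] = {}
    current = task
    for stage in stages:
        output = call_model(stage.build_prompt(current))
        if not stage.check(output):
            raise ValueError(f"stage {stage.name!r} broke its output contract")
        results[stage.name] = output
        current = output  # the next stage sees only this stage's output
    return results


def non_empty(text: str) -> bool:
    return bool(text.strip())


STAGES = [
    Stage("research", lambda t: f"Research the codebase for: {t}", non_empty),
    Stage("plan", lambda t: f"Write a step-by-step plan from:\n{t}", non_empty),
    Stage("implement", lambda t: f"Implement this plan:\n{t}", non_empty),
    Stage("verify", lambda t: f"Verify this change against the plan:\n{t}", non_empty),
]
```

Usage: `run_chain("add retry logic to the HTTP client", STAGES)`. In a real pipeline the `check` functions are where the contract lives: for example, the plan stage might require a numbered list, and the verify stage might require a passing test run.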
2. Routing
Classify the task first, then choose a path:
- cheap model for extraction,
- mid-tier model for routine coding,
- frontier model for ambiguous design.
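The routing step can be sketched as a classifier in front of a model table. The keyword rules and tier names below are hypothetical placeholders. In practice the classifier is often itself a call to a cheap model, but the shape is the same: classify first, then dispatch.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"        # extraction, formatting
    MID = "mid"            # routine coding
    FRONTIER = "frontier"  # ambiguous design


# Hypothetical keyword rules, checked in order; the first match wins.
# Escalation rules come first, so a task that mentions both design
# and formatting goes to the stronger tier.
RULES: list[tuple[Tier, tuple[str, ...]]] = [
    (Tier.FRONTIER, ("design", "architecture", "trade-off", "unclear")),
    (Tier.CHEAP, ("extract", "list all", "summarise", "rename", "format")),
]


def classify(task: str) -> Tier:
    text = task.lower()
    for tier, keywords in RULES:
        if any(k in text for k in keywords):
            return tier
    return Tier.MID  # default: treat unrecognised work as routine coding


def route(task: str, models: dict[Tier, str]) -> str:
    """Return the model name to use for this task."""
    return models[classify(task)]
```

Plain substring matching is crude (for example, "information" contains "format"); it is here only to keep the control flow visible.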
3. Parallelisation
Use for independent work:
- review three modules independently,
- compare multiple implementation options,
- run independent sub-agents for tests, docs, and security.
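For independent work, a thread pool is enough to fan sub-agents out and collect their results. `run_subagent` is a hypothetical stand-in for launching one isolated agent. Each call receives only its own role and target, and a failure in one job is recorded rather than aborting the others, which is the failure-boundary property mentioned at the top of this section.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_subagent(role: str, target: str) -> str:
    """Hypothetical stand-in for one isolated sub-agent run."""
    if target == "":
        raise ValueError(f"{role}: nothing to review")
    return f"{role} review of {target}: ok"


def fan_out(jobs: dict[str, str], max_workers: int = 3) -> dict[str, str]:
    """Run independent jobs in parallel; map each role to its result or error."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_subagent, role, target): role
                   for role, target in jobs.items()}
        for future in as_completed(futures):
            role = futures[future]
            try:
                results[role] = future.result()
            except Exception as exc:  # failure boundary: record it, keep the rest
                results[role] = f"FAILED: {exc}"
    return results
```

Usage: `fan_out({"tests": "src/api", "docs": "src/api", "security": "src/api"})`. Results arrive in completion order, so anything that merges them should key on the role, not the position.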
4. Orchestrator–Workers
Use when decomposition is not known in advance. The main agent plans, delegates, and integrates.
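The difference from parallelisation is that the orchestrator chooses the subtasks at run time instead of receiving a fixed list. In this sketch `plan`, `work`, and `integrate` are hypothetical stand-ins for three model calls; the planner is faked with a simple split on `;` so the control flow stays visible.

```python
def plan(goal: str) -> list[str]:
    """Orchestrator, step 1 (hypothetical): decompose the goal into subtasks.
    A real planner would be a model call; here each ';'-separated clause
    becomes one subtask."""
    return [part.strip() for part in goal.split(";") if part.strip()]


def work(subtask: str) -> str:
    """Worker (hypothetical): handle one subtask in isolation."""
    return f"done: {subtask}"


def integrate(goal: str, outputs: list[str]) -> str:
    """Orchestrator, step 3 (hypothetical): merge worker outputs into one result."""
    return f"{goal}\n" + "\n".join(f"- {o}" for o in outputs)


def orchestrate(goal: str) -> str:
    subtasks = plan(goal)                  # decomposition is decided at run time
    outputs = [work(s) for s in subtasks]  # could be fanned out in parallel
    return integrate(goal, outputs)
```

Usage: `orchestrate("update schema; migrate data; fix tests")` produces one worker result per clause, merged under the original goal.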
5. Evaluator–Optimizer
Use when quality matters more than latency:
Generate → Critique → Revise → Validate
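The loop above can be sketched with a bounded number of rounds, because a critic that never approves would otherwise spin forever. `generate`, `critique`, and `revise` are hypothetical model calls. `validate` stands in for a deterministic check, such as running the test suite; here it only confirms that the draft compiles. The fake critic flags a missing docstring so the example runs end to end.

```python
MAX_ROUNDS = 3  # quality is worth latency, but not unbounded latency


def generate(task: str) -> str:
    """Hypothetical generator output."""
    return f"def solve():\n    return {task!r}\n"


def critique(draft: str) -> list[str]:
    """Hypothetical critic: return a list of problems (empty means approved)."""
    return [] if '"""' in draft else ["missing docstring"]


def revise(draft: str, problems: list[str]) -> str:
    """Hypothetical reviser: address each reported problem."""
    if "missing docstring" in problems:
        head, body = draft.split("\n", 1)
        draft = f'{head}\n    """Solve the task."""\n{body}'
    return draft


def validate(draft: str) -> bool:
    """Deterministic gate: the draft must at least compile."""
    try:
        compile(draft, "<draft>", "exec")
        return True
    except SyntaxError:
        return False


def evaluate_optimize(task: str) -> tuple[str, int]:
    draft = generate(task)
    for round_no in range(1, MAX_ROUNDS + 1):
        problems = critique(draft)
        if not problems and validate(draft):
            return draft, round_no
        draft = revise(draft, problems)
    raise RuntimeError("no acceptable draft within the round limit")
```

Note that approval requires both the model critic and the deterministic validator. The critic catches what tests cannot express, and the validator catches what the critic talks itself into accepting.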
6. Advisor Strategy
Run a cheaper executor by default and consult a stronger model only when strategic ambiguity appears. This is one of the highest-leverage patterns for reducing cost without sacrificing quality.
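A minimal sketch of the advisor pattern. It assumes that the executor can flag when it reaches a strategic decision it is unsure about; here, a step prefixed with `decide:` stands in for that signal. `executor_step` and `ask_advisor` are hypothetical. The point is that the expensive call happens only on the escalation branch, and the cheap executor still carries out the work.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    output: str
    needs_advice: bool = False  # executor flags strategic ambiguity
    question: str = ""


def executor_step(step: str) -> StepResult:
    """Hypothetical cheap executor; treats 'decide:' steps as open design choices."""
    if step.startswith("decide:"):
        return StepResult(output="", needs_advice=True, question=step)
    return StepResult(output=f"executed {step}")


def ask_advisor(question: str) -> str:
    """Hypothetical call to the stronger model; returns guidance, not code."""
    return f"advice for {question!r}"


def run_with_advisor(steps: list[str]) -> tuple[list[str], int]:
    log: list[str] = []
    advisor_calls = 0
    for step in steps:
        result = executor_step(step)
        if result.needs_advice:
            advisor_calls += 1
            guidance = ask_advisor(result.question)
            # Hand the guidance back to the cheap executor to carry out.
            decision = step.removeprefix("decide:").strip()
            result = executor_step(f"{decision} using {guidance}")
        log.append(result.output)
    return log, advisor_calls
```

Usage: for `["read config", "decide: cache or recompute?", "write tests"]`, only the middle step triggers an advisor call. Counting `advisor_calls` per run is a simple way to check that escalation stays rare.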
Model Selection
General Rule
Route by task shape, not by brand.
| Task shape | Good default |
|---|---|
| Formatting, renaming, simple transforms | Cheap / mini model |
| Grep-heavy research, extraction, summarisation | Cheap or mid-tier model |
| Routine coding | Mid-tier coding model |
| Ambiguous debugging | Strong reasoning model |
| Architecture and design | Frontier model |
| Security-sensitive review | Strong model plus deterministic checks |
| Long-horizon autonomous work | Strong model with tight guardrails |
Auto mode reduces operational overhead and routes by availability, latency, and task conditions. Manual override still matters when you need predictability.
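The table above can be encoded as data, with a manual override that always wins over the automatic choice when you need predictability. The task-shape keys mirror the table rows. The model names are hypothetical placeholders, not recommendations. Rows whose "good default" includes something beyond the model (deterministic checks, guardrails) return that requirement alongside the model.

```python
from typing import Optional

# Task shape -> default model class, mirroring the table above.
# Model names are hypothetical placeholders.
DEFAULTS: dict[str, str] = {
    "formatting": "mini-model",
    "research": "mid-model",
    "routine_coding": "mid-coding-model",
    "ambiguous_debugging": "reasoning-model",
    "architecture": "frontier-model",
    "security_review": "strong-model",
    "long_horizon": "strong-model",
}

# Shapes that need something beyond the model itself.
EXTRA_REQUIREMENTS: dict[str, str] = {
    "security_review": "deterministic checks",
    "long_horizon": "tight guardrails",
}


def select_model(shape: str, override: Optional[str] = None) -> tuple[str, Optional[str]]:
    """Return (model, extra requirement). A manual override always wins."""
    if shape not in DEFAULTS:
        raise KeyError(f"unknown task shape: {shape}")
    model = override if override is not None else DEFAULTS[shape]
    return model, EXTRA_REQUIREMENTS.get(shape)
```

An override replaces only the model. It does not remove the extra requirement, so pinning a security review to a particular model still leaves the deterministic checks in place.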
Use lighter models for:
- reformatting,
- small edits,
- documentation cleanup,
- read-heavy inspection.
Use stronger models for:
- subtle bugs,
- architectural changes,
- ambiguous requirements,
- high-risk reviews.
Copilot pricing and model accounting change frequently. Check the current billing page rather than relying on old heuristics or premium-request multipliers.
Codex makes the configuration stack relatively explicit:
- `AGENTS.md` for durable instructions,
- `config.toml` for profiles and settings,
- skills for progressive disclosure,
- explicit sub-agents for isolated work,
- compaction and prompt caching in API-style workflows.
Example profile structure:
```toml
model = "gpt-5.5"

[features]
memories = false

[profiles.fast]
model = "gpt-5.4-mini"

[profiles.deep]
model = "gpt-5.5"

[memories]
use_memories = false
generate_memories = false
disable_on_external_context = true
```
Use the fast profile for low-risk reading or extraction. Use the deep profile for complex implementation, architecture, or debugging.
Claude Code rewards strict separation of concerns:
- `CLAUDE.md` for project memory,
- hooks for deterministic actions,
- skills for reusable workflows,
- sub-agents for isolated specialised work,
- MCP only when it earns its overhead.
A practical model-routing pattern:
- Haiku-class models for cheap scanning or read-only work,
- Sonnet-class models for routine coding,
- Opus-class models for difficult long-horizon reasoning or high-risk review.
Reasoning Tokens and Hidden Costs
Primarily for API users
These cost factors matter most if you are calling LLM APIs directly and paying per token. If you use Copilot, Claude Code, or another product with a flat subscription, you have no direct visibility or control over them. The practical takeaway for everyone: a model that looks cheap can become expensive in practice if it loops, overuses tools, or repeatedly retries — so judge models by task completion quality, not headline price.
Not all costs show up as final output text. Watch for:
- reasoning tokens,
- extended thinking budgets,
- cache write costs,
- cache hit savings,
- image tokens,
- repeated tool outputs,
- long retry loops.
The cheapest visible model becomes expensive when it loops, overuses tools, or repeatedly fails.
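The arithmetic behind that warning is easy to sketch. All prices, token counts, and attempt counts below are made-up placeholders, not real rates. What matters is the structure: hidden reasoning tokens are billed here at the output rate, cache reads are discounted, cache writes may carry a premium, and every retry is paid for in full. The useful metric is spend per *completed* task.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """USD per million tokens. Placeholder numbers, not real rates."""
    input: float
    output: float
    cache_write: float
    cache_read: float


@dataclass
class Attempt:
    fresh_input: int  # uncached input tokens, including repeated tool output
    cache_write: int
    cache_read: int
    output: int       # visible output tokens
    reasoning: int    # hidden reasoning tokens, billed here as output


def attempt_cost(a: Attempt, p: Prices) -> float:
    return (
        a.fresh_input * p.input
        + a.cache_write * p.cache_write
        + a.cache_read * p.cache_read
        + (a.output + a.reasoning) * p.output
    ) / 1_000_000


def cost_per_completed_task(attempts: list[Attempt], p: Prices, successes: int) -> float:
    """Total spend across all attempts, retries included, per task that succeeded."""
    if successes == 0:
        return float("inf")
    return sum(attempt_cost(a, p) for a in attempts) / successes


# Placeholder scenario: a cheap model that loops versus a strong model that finishes once.
cheap = Prices(input=0.25, output=2.0, cache_write=0.3, cache_read=0.025)
strong = Prices(input=2.5, output=15.0, cache_write=3.0, cache_read=0.25)
looping = [Attempt(fresh_input=60_000, cache_write=0, cache_read=40_000,
                   output=2_000, reasoning=6_000)] * 8
one_shot = [Attempt(fresh_input=20_000, cache_write=20_000, cache_read=0,
                    output=3_000, reasoning=4_000)]
print(f"{cost_per_completed_task(looping, cheap, 1):.3f}")    # 0.256
print(f"{cost_per_completed_task(one_shot, strong, 1):.3f}")  # 0.215
```

With these placeholder numbers, the model with one-tenth the headline price costs more per completed task once eight looping attempts are counted. A task that never completes costs infinitely much per success, however cheap each attempt is.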