Orchestration Patterns and Model Selection

Qin Yu, 21 May 2026, updated 29 May 2026


Orchestration Patterns

Primarily for platform engineers

These patterns apply most directly if you are building or configuring automated agent pipelines, for example a multi-step CI workflow or a research automation script that chains multiple agent calls. If you use Copilot, Claude Code, or similar interactive tools day-to-day, treat this as background reading: the mental models are useful, but you won't wire them up yourself. The exception is Prompt Chaining (pattern 1). Its Research → Plan → Implement → Verify discipline applies to any interactive session.

Sub-agents are not just a speed feature. They are a way to control context pollution, parallelism, and failure boundaries.

1. Prompt Chaining

Use for fixed workflows:

Research → Plan → Implement → Verify

Each stage has a clear input and output contract.
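A minimal sketch of such a chain, assuming each stage is a plain function from text to text. The Stage type, run_chain, and fake_model are hypothetical names used only for illustration; a real pipeline would replace fake_model with an actual model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    run: Callable[[str], str]  # takes the previous stage's output, returns its own


def run_chain(stages: list[Stage], task: str) -> dict[str, str]:
    """Run stages in order, feeding each stage's output into the next."""
    outputs: dict[str, str] = {}
    current = task
    for stage in stages:
        current = stage.run(current)
        if not current.strip():
            # An empty result breaks the output contract, so stop here
            # instead of passing it downstream.
            raise ValueError(f"stage {stage.name!r} produced no output")
        outputs[stage.name] = current
    return outputs


def fake_model(role: str) -> Callable[[str], str]:
    """Placeholder for a model call (assumption: not a real API)."""
    return lambda text: f"[{role}] {text}"


chain = [Stage(n, fake_model(n)) for n in ("research", "plan", "implement", "verify")]
results = run_chain(chain, "add retry logic to the uploader")
print(results["verify"])
# prints: [verify] [implement] [plan] [research] add retry logic to the uploader
```

Keeping every stage's output in the returned dictionary makes it easy to inspect where a chain went wrong.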

2. Routing

Classify the task first, then choose a path:

  • cheap model for extraction,
  • mid-tier model for routine coding,
  • frontier model for ambiguous design.
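The classify-then-route step can be sketched as below. The model names in ROUTES are placeholders, and the keyword classifier is a deliberately crude stand-in; in practice the classification itself might be a cheap model call.

```python
# Hypothetical tier names; substitute whatever models your platform exposes.
ROUTES = {
    "extraction": "cheap-model",
    "routine_coding": "mid-tier-model",
    "ambiguous_design": "frontier-model",
}


def classify(task: str) -> str:
    """Toy keyword classifier used only to illustrate the routing step."""
    text = task.lower()
    if any(word in text for word in ("design", "architecture", "trade-off")):
        return "ambiguous_design"
    if any(word in text for word in ("extract", "list", "summarise")):
        return "extraction"
    return "routine_coding"


def route(task: str) -> str:
    """Classify first, then pick the path for that class."""
    return ROUTES[classify(task)]


print(route("Extract all TODO comments from the repo"))  # prints: cheap-model
print(route("Design the caching layer"))                 # prints: frontier-model
```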

3. Parallelisation

Use for independent work:

  • review three modules independently,
  • compare multiple implementation options,
  • run independent sub-agents for tests, docs, and security.
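A sketch of the first case, reviewing modules independently, using threads from the Python standard library. review_module is a hypothetical placeholder for a sub-agent call; threads are a reasonable fit on the assumption that such calls are I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor


def review_module(module: str) -> str:
    """Placeholder for an independent sub-agent call (assumption)."""
    return f"{module}: no issues found"


def review_in_parallel(modules: list[str]) -> dict[str, str]:
    """Review each module concurrently; no task depends on another's result."""
    with ThreadPoolExecutor(max_workers=max(1, len(modules))) as pool:
        # pool.map returns results in the same order as the inputs.
        reports = list(pool.map(review_module, modules))
    return dict(zip(modules, reports))


reports = review_in_parallel(["auth", "billing", "search"])
print(reports["billing"])  # prints: billing: no issues found
```

Because each worker sees only its own module, a failure or a bloated context in one review does not affect the others.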

4. Orchestrator–Workers

Use when decomposition is not known in advance. The main agent plans, delegates, and integrates.
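The plan, delegate, integrate loop might look roughly like this. The plan function here is a hard-coded stand-in for the main agent's planning call, and the worker names are hypothetical; the point is only that the list of subtasks is decided at run time rather than fixed in advance.

```python
from typing import Callable

Worker = Callable[[str], str]


def plan(task: str) -> list[tuple[str, str]]:
    """Hypothetical planner returning (worker_name, subtask) pairs.

    In practice the main agent would produce this plan with a model call.
    """
    subtasks = [("code", f"implement: {task}")]
    if "api" in task.lower():
        # Only some tasks need documentation work; the plan varies per task.
        subtasks.append(("docs", f"document: {task}"))
    subtasks.append(("tests", f"test: {task}"))
    return subtasks


def orchestrate(task: str, workers: dict[str, Worker]) -> str:
    """Plan the work, delegate each subtask, then integrate the results."""
    results = [workers[name](subtask) for name, subtask in plan(task)]
    return "\n".join(results)


workers: dict[str, Worker] = {
    name: (lambda subtask, n=name: f"{n} done -> {subtask}")
    for name in ("code", "docs", "tests")
}
print(orchestrate("Add API pagination", workers))
```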

5. Evaluator–Optimizer

Use when quality matters more than latency:

Generate → Critique → Revise → Validate
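One way to sketch this loop, assuming the four steps are supplied as callables and that critique returns None once it has nothing left to flag. The function name and the max_rounds cap are illustrative choices, not part of any particular tool.

```python
from typing import Callable, Optional


def evaluate_and_optimize(
    generate: Callable[[], str],
    critique: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    validate: Callable[[str], bool],
    max_rounds: int = 3,
) -> str:
    """Generate a draft, then critique and revise until the critic is satisfied."""
    draft = generate()
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:
            break  # nothing left to fix
        draft = revise(draft, feedback)
    if not validate(draft):
        raise ValueError("draft failed validation after revision")
    return draft


# Toy callables standing in for model calls and a deterministic check.
result = evaluate_and_optimize(
    generate=lambda: "Retries use exponential backoff",
    critique=lambda d: None if d.endswith(".") else "end the sentence with a full stop",
    revise=lambda d, feedback: d + ".",
    validate=lambda d: d.endswith("."),
)
print(result)  # prints: Retries use exponential backoff.
```

The round cap bounds the extra latency, which is the cost this pattern trades for quality.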

6. Advisor Strategy

Run a cheaper executor by default and consult a stronger model only when strategic ambiguity appears. This is one of the highest-leverage patterns for reducing cost without sacrificing quality.
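A rough sketch of the escalation logic follows. cheap_executor and strong_advisor are hypothetical placeholders for model calls, and the "choose" keyword test is only a stand-in for however the executor signals strategic ambiguity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepResult:
    output: str
    confident: bool


def cheap_executor(step: str, advice: Optional[str] = None) -> StepResult:
    """Placeholder executor that reports low confidence on ambiguous steps."""
    ambiguous = "choose" in step.lower()
    if ambiguous and advice is None:
        return StepResult(output="", confident=False)
    suffix = f" (advice: {advice})" if advice else ""
    return StepResult(output=f"done: {step}{suffix}", confident=True)


def strong_advisor(step: str) -> str:
    """Placeholder for the stronger, more expensive model."""
    return "prefer the simpler option"


def run_with_advisor(steps: list[str]) -> tuple[list[str], int]:
    """Run every step on the cheap executor; consult the advisor only on demand."""
    outputs: list[str] = []
    advisor_calls = 0
    for step in steps:
        result = cheap_executor(step)
        if not result.confident:
            advisor_calls += 1
            result = cheap_executor(step, advice=strong_advisor(step))
        outputs.append(result.output)
    return outputs, advisor_calls


outputs, calls = run_with_advisor(
    ["rename helper", "choose a queue backend", "update changelog"]
)
print(calls)  # prints: 1
```

Tracking the number of advisor calls gives a direct measure of how often the stronger model is actually needed.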


Model Selection

General Rule

Route by task shape, not by brand.

Task shape                                        Good default
Formatting, renaming, simple transforms           Cheap / mini model
Grep-heavy research, extraction, summarisation    Cheap or mid-tier model
Routine coding                                    Mid-tier coding model
Ambiguous debugging                               Strong reasoning model
Architecture and design                           Frontier model
Security-sensitive review                         Strong model plus deterministic checks
Long-horizon autonomous work                      Strong model with tight guardrails

Auto mode reduces operational overhead and routes by availability, latency, and task conditions. Manual override still matters when you need predictability.
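The defaults and the manual override can be expressed as a small lookup. The keys and tier labels below are shorthand invented for this sketch; they mirror the task shapes above rather than any product's setting names.

```python
from typing import Optional

# Shorthand labels for the task shapes and defaults listed above (assumption).
DEFAULT_TIER = {
    "formatting": "cheap",
    "extraction": "cheap-or-mid",
    "routine_coding": "mid-tier",
    "ambiguous_debugging": "strong-reasoning",
    "architecture": "frontier",
    "security_review": "strong-plus-deterministic-checks",
    "long_horizon": "strong-with-guardrails",
}


def select_tier(task_shape: str, override: Optional[str] = None) -> str:
    """Return the default tier for a task shape unless a manual override is given."""
    if override is not None:
        # A pinned choice wins, which keeps behaviour predictable.
        return override
    return DEFAULT_TIER[task_shape]


print(select_tier("routine_coding"))                       # prints: mid-tier
print(select_tier("routine_coding", override="frontier"))  # prints: frontier
```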

Use lighter models for:

  • reformatting,
  • small edits,
  • documentation cleanup,
  • read-heavy inspection.

Use stronger models for:

  • subtle bugs,
  • architectural changes,
  • ambiguous requirements,
  • high-risk reviews.

Copilot pricing and model accounting change frequently. Check the current billing page rather than relying on old heuristics or premium-request multipliers.

Codex makes the configuration stack relatively explicit:

  • AGENTS.md for durable instructions,
  • config.toml for profiles and settings,
  • skills for progressive disclosure,
  • explicit sub-agents for isolated work,
  • compaction and prompt caching in API-style workflows.

Example profile structure:

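# Top-level default model
model = "gpt-5.5"

[features]
memories = false

# Low-risk reading or extraction
[profiles.fast]
model = "gpt-5.4-mini"

# Complex implementation, architecture, or debugging
[profiles.deep]
model = "gpt-5.5"

[memories]
use_memories = false
generate_memories = false
disable_on_external_context = true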

Use the fast profile for low-risk reading or extraction. Use the deep profile for complex implementation, architecture, or debugging.
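To make the profile idea concrete, here is a sketch of how a selected profile could resolve to a model. It assumes a profile's keys take precedence over top-level ones, which is an assumption made for illustration rather than a statement of how Codex resolves settings. The dictionary mirrors the structure above, and resolve_model is a hypothetical helper.

```python
from typing import Optional

# Parsed form of the example configuration (only the model-related keys).
config = {
    "model": "gpt-5.5",
    "profiles": {
        "fast": {"model": "gpt-5.4-mini"},
        "deep": {"model": "gpt-5.5"},
    },
}


def resolve_model(cfg: dict, profile: Optional[str] = None) -> str:
    """Pick the model for a profile, falling back to the top-level default.

    Assumption: profile keys override top-level keys.
    """
    if profile is None:
        return cfg["model"]
    return cfg["profiles"][profile].get("model", cfg["model"])


print(resolve_model(config))          # prints: gpt-5.5
print(resolve_model(config, "fast"))  # prints: gpt-5.4-mini
```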

Claude Code rewards strict separation of concerns:

  • CLAUDE.md for project memory,
  • hooks for deterministic actions,
  • skills for reusable workflows,
  • sub-agents for isolated specialised work,
  • MCP only when it earns its overhead.

A practical model-routing pattern:

  • Haiku-class models for cheap scanning or read-only work,
  • Sonnet-class models for routine coding,
  • Opus-class models for difficult long-horizon reasoning or high-risk review.

Reasoning Tokens and Hidden Costs

Primarily for API users

These cost factors matter most if you are calling LLM APIs directly and paying per token. If you use Copilot, Claude Code, or another product with a flat subscription, you have no direct visibility into or control over them. The practical takeaway for everyone: a model that looks cheap can become expensive in practice if it loops, overuses tools, or repeatedly retries, so judge models by task completion quality, not headline price.

Not all costs show up as final output text. Watch for:

  • reasoning tokens,
  • extended thinking budgets,
  • cache write costs,
  • cache hit savings,
  • image tokens,
  • repeated tool outputs,
  • long retry loops.

The cheapest visible model becomes expensive when it loops, overuses tools, or repeatedly fails.
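A back-of-the-envelope sketch shows how this plays out. All prices and token counts below are made up for illustration, and the sketch assumes reasoning tokens are billed at the output rate; check your provider's billing rules before relying on that.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int        # uncached input tokens per attempt
    cached_tokens: int       # input tokens served from cache
    cache_write_tokens: int  # tokens written to cache
    output_tokens: int       # visible output
    reasoning_tokens: int    # billed but not shown in the final text


@dataclass
class Prices:
    """Hypothetical prices in dollars per million tokens."""
    input: float
    cached_input: float
    cache_write: float
    output: float


def attempt_cost(u: Usage, p: Prices) -> float:
    """Cost of one attempt, counting reasoning tokens as output (assumption)."""
    billed_output = u.output_tokens + u.reasoning_tokens
    total = (
        u.input_tokens * p.input
        + u.cached_tokens * p.cached_input
        + u.cache_write_tokens * p.cache_write
        + billed_output * p.output
    )
    return total / 1_000_000


def task_cost(u: Usage, p: Prices, attempts: int) -> float:
    """Total cost of a task that needed the given number of attempts."""
    return attempt_cost(u, p) * attempts


usage = Usage(input_tokens=20_000, cached_tokens=0, cache_write_tokens=0,
              output_tokens=1_000, reasoning_tokens=3_000)
cheap = Prices(input=1.0, cached_input=0.1, cache_write=1.25, output=4.0)
strong = Prices(input=3.0, cached_input=0.3, cache_write=3.75, output=15.0)

cheap_total = task_cost(usage, cheap, attempts=5)    # five retries to succeed
strong_total = task_cost(usage, strong, attempts=1)  # succeeds first time
print(f"cheap: ${cheap_total:.2f}, strong: ${strong_total:.2f}")
# prints: cheap: $0.18, strong: $0.12
```

With these invented numbers, the model that is about a quarter of the price per token ends up costing more per completed task once retries are counted.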
