Open any pitch deck from the last two years and you will find the same diagram: a single autonomous agent, sitting in the middle, reasoning its way to an answer. Open the systems those companies actually run in production and you find something far less romantic. A classifier in front. A fixed three-step pipeline behind it. A small loop bolted onto one step that genuinely needed it. The autonomous agent in the diagram is mostly absent. What replaced it is a handful of plain patterns, composed.
This is not a failure of ambition. It is what reliability looks like. Teams shipping AI features that survive real traffic have converged on a short catalogue of design patterns, and the interesting fact about it is how short it is. Five patterns cover almost everything. None is exotic. The hard part is not learning them. It is the discipline to pick the cheapest one that solves the problem and stop there.
Origin: where the catalogue came from
The reference text is Anthropic's "Building Effective Agents", published in December 2024 after their engineering team worked with dozens of groups building these systems. It did something useful: instead of describing one grand architecture, it named the small set of compositions that kept showing up, and drew a line between two kinds of system. A workflow runs language models and tools along predefined code paths. An agent lets the model direct its own process, choosing steps and tools as it goes. That distinction is worth its own piece, and it gets one in workflow or agent.
What matters for patterns is that almost everything in the catalogue is a workflow. The model does hard work inside fixed steps, but a human wrote the sequence. The single autonomous agent, the model running an open-ended loop, sits at the far end and is the rarest thing in production, not the default.
The catalogue spread fast because it described what people were already doing. OpenAI's "A Practical Guide to Building Agents" reached the same conclusions with different names, recommending teams start with a single agent and add structure only when a real failure forces it. Independent catalogues followed, from Augment Code, SitePoint, and others. When several groups working separately land on the same five shapes, the shapes are real.
Present: the five patterns and what each one costs
Each pattern below answers the same three questions. What is it? What task shape does it fit? What does it cost in latency and tokens? The last matters because every pattern past a single call costs something, and pretending otherwise is how budgets break.
Prompt chaining
Decompose a task into ordered steps, and let each model call work on the output of the last. A program sits between the steps and can check the intermediate result before passing it on, a checkpoint the original guide calls a gate.
It fits any task you can cleanly write down in advance as a fixed sequence. Generate marketing copy, then translate it. Draft a document outline, validate it against requirements, then write the full document. Legal contract review runs this way in tools like Thomson Reuters CoCounsel and Robin AI: extract the clauses, classify each by risk, then summarise for a human. Breaking one hard instruction into several narrow ones measurably lifts accuracy, because each call has one job and a smaller surface to get wrong.
The cost is linear. Three steps mean three sequential calls, so latency is the sum of all of them and tokens scale with the number of steps plus whatever the intermediate outputs carry. Errors propagate forward, and a gate only catches the failure you thought to write a check for. Chaining is the cheapest pattern past a single call, and the right default whenever the steps are knowable.
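A minimal sketch of a two-step chain with a gate between the steps, assuming a hypothetical `call_model` function that takes a prompt and returns text. The names `run_chain`, `outline_gate`, and `GateError` are illustrative and do not come from any library or from the original guide.

```python
from typing import Callable

# Hypothetical stand-in for a model call: prompt in, text out.
ModelFn = Callable[[str], str]


class GateError(Exception):
    """Raised when an intermediate result fails its check."""


def outline_gate(outline: str, required: list[str]) -> None:
    # The gate is ordinary code, so it only catches failures we wrote a check for.
    missing = [topic for topic in required if topic.lower() not in outline.lower()]
    if missing:
        raise GateError(f"outline is missing: {missing}")


def run_chain(brief: str, required: list[str], call_model: ModelFn) -> str:
    # Step 1: draft an outline.
    outline = call_model(f"Write an outline for: {brief}")
    # Gate: validate before spending tokens on the next step.
    outline_gate(outline, required)
    # Step 2: write the document from the validated outline.
    return call_model(f"Write the full document from this outline:\n{outline}")
```

The calls run strictly in sequence, which is where the linear latency comes from. A failed gate stops the chain before the second call is paid for.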
Routing
Classify the input first, then send it down a path built for that class. The classifier can be a model or a cheaper method, and each path gets its own prompt, its own tools, sometimes its own model.
It fits any workload that arrives in distinct categories that benefit from separate handling. A customer support system routes refund questions, technical issues, and account changes to different downstream flows. The most common use is cost control: send easy queries to a small fast model and reserve a frontier model for the hard ones. Intercom's Fin support agent routes this way, and a coding system might send file navigation to a cheap model and code review to an expensive one, where routing a task to the wrong model can swing input cost by a factor of five.
Routing is close to free. The classification step is small, and one benchmark put routing overhead under 50 milliseconds against the 2 to 15 seconds a real model call takes. The risk is concentrated in one place: a misclassification sends the request down the wrong path, and the router is a single point of failure. It usually pays for itself by cutting spend on the easy majority.
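A rough sketch of a router, assuming keyword rules as the cheap classifier and two hypothetical model tiers named `small-model` and `frontier-model`. The route table, the labels, and the fallback choice are all illustrative assumptions, not taken from any of the systems named above.

```python
from typing import Callable

ModelFn = Callable[[str], str]

# Hypothetical route table: each class gets its own model tier and prompt prefix.
ROUTES = {
    "refund": ("small-model", "Handle this refund request: "),
    "technical": ("frontier-model", "Diagnose this technical issue: "),
    "account": ("small-model", "Process this account change: "),
}
FALLBACK = "technical"  # assumption: unrecognised input goes to the most capable path


def classify(message: str) -> str:
    # Keyword rules stand in for the classifier; a small model could replace them.
    text = message.lower()
    if "refund" in text or "money back" in text:
        return "refund"
    if "password" in text or "email" in text:
        return "account"
    if "error" in text or "crash" in text:
        return "technical"
    return FALLBACK


def route(message: str, models: dict[str, ModelFn]) -> tuple[str, str]:
    # One cheap classification, then exactly one downstream call.
    label = classify(message)
    model_name, prefix = ROUTES[label]
    return label, models[model_name](prefix + message)
```

Returning the label alongside the answer is a deliberate choice here: logging it is the simplest way to measure how often the single point of failure misfires.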
Parallelization
Run independent model calls at the same time and combine the results. It comes in two shapes. Sectioning splits a task into genuinely independent subtasks that run at once. Voting runs the same task several times and aggregates the answers.
Sectioning fits work that breaks into pieces with no dependency between them: scan a pull request for security, style, and test coverage in parallel, then merge the findings. GitHub Advanced Security runs CodeQL alongside third-party scanners like Snyk this way. Voting fits work where one sample is unreliable and consensus is safer: ask several calls whether a piece of content is appropriate, or have multiple reviewers check code for the same class of bug, and take the majority.
Parallelization buys latency and pays in tokens. Wall-clock time drops toward the slowest single branch, but you pay for every branch, so a five-way vote costs roughly five times the tokens of one call. Partial failure is a design problem you have to settle before you ship: when three of five branches return, do you wait, retry, or proceed? Use it when latency or confidence is worth a token multiple you can name.
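Both shapes can be sketched with Python's standard thread pool. This is a sketch under stated assumptions: `call_model` is a hypothetical blocking model call, and the `quorum` parameter is one possible answer to the partial-failure question (proceed if enough branches return, otherwise give up), not a recommendation from the source.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

ModelFn = Callable[[str], str]


def vote(prompt: str, call_model: ModelFn, n: int = 5, quorum: int = 3) -> Optional[str]:
    """Run the same prompt n times in parallel and return the most common answer.

    Returns None when fewer than `quorum` branches succeed.
    """
    def safe_call(_: int) -> Optional[str]:
        try:
            return call_model(prompt)
        except Exception:
            return None  # a failed branch is dropped, not retried

    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(safe_call, range(n)))

    answers = [r for r in results if r is not None]
    if len(answers) < quorum:
        return None
    return Counter(answers).most_common(1)[0][0]


def section(prompts: dict[str, str], call_model: ModelFn) -> dict[str, str]:
    """Run independent subtasks in parallel and merge the results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        futures = {name: pool.submit(call_model, p) for name, p in prompts.items()}
        return {name: f.result() for name, f in futures.items()}
```

In `vote`, every branch is billed whether or not its answer is used, so `n` is the token multiple. In `section`, a single failing branch raises out of `f.result()`. Whether that is acceptable is exactly the design decision to settle before shipping.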
Orchestrator-workers
A lead model receives the task, decides at runtime what subtasks it needs, delegates each to a worker model, and synthesises the results. The difference from parallelization is that the subtasks are not fixed in advance. The orchestrator invents them per input.
It fits open-ended problems where you cannot predict the decomposition. Anthropic's own multi-agent research system works this way: a lead agent breaks a research question into directions and spawns subagents to chase each one. Coding agents that change many files at once use the same shape, with Cursor's agent mode cited as an example where one worker adds tests while another updates docs.
This is where the cost turns sharp. Anthropic measured their multi-agent research system using about 15 times the tokens of a plain chat, with a single agent already at roughly 4 times. On their internal research evaluation the multi-agent setup beat single-agent Claude Opus 4 by 90.2 percent, and in their analysis of the BrowseComp evaluation token usage alone explained about 80 percent of the performance variance, so the spend is doing real work. But it only pays when the task is breadth-first and the value is high. Anthropic was blunt that most coding tasks have fewer truly parallel parts than research, and that the pattern fails when subagents need to share context. Coordination failure is its own large bucket: the MAST failure taxonomy from UC Berkeley, built from more than 1,600 annotated traces across seven multi-agent frameworks, puts inter-agent misalignment at roughly 37 percent of all failures, second only to specification and design problems. Reach for orchestrator-workers when one context window genuinely cannot hold the task, and not before. The wider argument is in the multi-agent debate.
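The shape can be sketched in a few lines, assuming two hypothetical model functions, `lead` and `worker`, and a plan returned as one subtask per line. The `MAX_SUBTASKS` cap is an assumption added for cost control; nothing here reproduces Anthropic's actual system.

```python
from typing import Callable

ModelFn = Callable[[str], str]

MAX_SUBTASKS = 4  # assumed hard cap: each subtask is another full model call


def orchestrate(task: str, lead: ModelFn, worker: ModelFn) -> str:
    # The lead model decides the decomposition at runtime, one subtask per line.
    plan = lead(f"List the subtasks needed for this task, one per line:\n{task}")
    cleaned = (line.strip().lstrip("- ").strip() for line in plan.splitlines())
    subtasks = [s for s in cleaned if s][:MAX_SUBTASKS]

    # Each worker sees only the task and its own subtask, not the other workers' output.
    results = [worker(f"Task: {task}\nYour subtask: {s}") for s in subtasks]

    # The lead model synthesises the workers' outputs into one answer.
    findings = "\n".join(f"- {s}: {r}" for s, r in zip(subtasks, results))
    return lead(f"Combine these findings into one answer for '{task}':\n{findings}")
```

The workers run sequentially here to keep the sketch short. The total is two lead calls plus one call per subtask, and because the subtask count comes from the model rather than the code, the cap is the only thing bounding the bill.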
Evaluator-optimizer
One model generates a response. A second model evaluates it against stated criteria and returns specific feedback. The generator revises, and the loop repeats until the evaluator passes it or a limit is hit.
It fits tasks with a clear quality bar that iteration genuinely improves. Anthropic names literary translation, where an evaluator catches nuance a first pass misses, and Anthropic's own cookbook ships a working example. Trigger.dev's translate-and-refine guide implements the same loop. The structural point, made well by AgentPatterns.ai, is that splitting generation from judgement works because a fresh critic catches what a generator, attached to its own output, will not.
The cost is roughly one extra call per round on top of each generation: every round is a generation plus an evaluation, so each round costs about twice a single generation, and a three-round loop runs to around six sequential calls where a single generation needs one. The pattern only earns that if you can write the evaluation criteria down concretely. If "good" is fuzzy, the evaluator just adds cost and an argument. Use it when quality is measurable and the task is worth a second opinion.
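A minimal sketch of the loop, assuming hypothetical `generate` and `evaluate` functions that wrap two separate model calls. The evaluator signals a pass by returning `None` and otherwise returns feedback text; that convention is an assumption for illustration. The `calls` counter is there to make the cost of each round visible.

```python
from typing import Callable, Optional

# generate(task, previous_draft, feedback) -> new draft
GenerateFn = Callable[[str, str, Optional[str]], str]
# evaluate(task, draft) -> feedback text, or None when the draft passes
EvaluateFn = Callable[[str, str], Optional[str]]


def refine(
    task: str,
    generate: GenerateFn,
    evaluate: EvaluateFn,
    max_rounds: int = 3,
) -> tuple[str, int, bool]:
    """Return (final draft, number of model calls made, whether it passed)."""
    draft = ""
    feedback: Optional[str] = None
    calls = 0
    for _ in range(max_rounds):
        draft = generate(task, draft, feedback)  # first round gets an empty draft
        calls += 1
        feedback = evaluate(task, draft)
        calls += 1
        if feedback is None:
            return draft, calls, True
    # Limit hit: hand back the last draft and let the caller decide what to do.
    return draft, calls, False
```

The round limit is what keeps a fuzzy evaluator from looping indefinitely. Returning the pass flag separately lets the caller distinguish a draft that cleared the bar from one that merely ran out of rounds.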
The pattern that beats all five
Here is the through-line, and it is the whole point. None of these patterns is the goal. The goal is the simplest system that hits the quality bar. All five patterns above are workflows in Anthropic's taxonomy, fixed code paths with model calls inside them, and that is the feature, not a limitation. A workflow is predictable, cheap, and debuggable in a way an open-ended agent is not. The autonomous agent, the model running its own loop with no human-written sequence, is the separate sixth shape, and it is the one you reach for last.
Every catalogue says this, in nearly the same words. Anthropic: find the simplest solution, add complexity only when it demonstrably improves outcomes. Augment Code: match the pattern to the problem and do not build for anticipated requirements you have not actually observed failing. Practitioner write-ups put it bluntly: if a task can be solved with prompt chaining or one well-prompted call, do that. A useful heuristic from the same field guidance: if you can still write down all the steps before the system runs, you want a workflow, not an agent.
The temptation runs the other way because the autonomous agent is the impressive artefact. It also has the worst cost profile. Loop-heavy agents routinely run 10 to 100 dollars per session once context and retries pile up. Most production systems should not go past parallelization, and the ones that do should confine the expensive pattern to the one subtask that truly needs it, inside a fixed shell.
Future and impact
Two forces are pulling on this catalogue, and they pull the same way. Models keep getting better at instructions and at staying coherent across long structured inputs, so a careful workflow now does work that looked agentic two years ago. The floor is rising, and a rising floor favours the simpler pattern. At the same time the patterns are hardening into infrastructure: orchestration frameworks ship them as named building blocks, which is convenient and also a trap, because a one-click orchestrator-workers template makes the expensive pattern as easy to drop in as the cheap one. The skill the tooling cannot give you is restraint.
The honest risk is over-engineering by reflex. Coordination failure is already one of the largest failure buckets in multi-agent systems, and every extra pattern adds a surface that can break, a cost line, and a path that is harder to trace. The catalogue is small for a reason. Picking the right pattern is mostly the discipline to pick the smallest one and resist the next.
For an enterprise, this is where an implementation partner like Perform Digital earns its keep. The work is rarely picking a model. It is reading the true shape of each task, scripting what can be scripted, reserving orchestrator-workers and evaluator loops for the subtasks that genuinely need them, and proving the choice with evaluation. Done well, the result barely looks like AI: a classifier, a short chain, one small loop where the problem demanded it. That modesty is the design pattern that survives. The same logic runs through the orchestration frameworks that package these shapes, where the question is always whether the framework's defaults match the pattern your task actually needs.
Council summary
This post argues that production AI systems run on a small, fixed catalogue of patterns, prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer, and that all five are workflows: predictable code paths with model calls inside them, not the autonomous agent of the pitch deck. It does the genuinely useful thing of pricing each pattern, in latency and tokens, so the reader can see that cost climbs sharply from a near-free router to an orchestrator-workers setup that Anthropic measured at roughly fifteen times the tokens of a plain chat. The central claim is that the real skill is restraint: pick the cheapest pattern that clears the quality bar, and confine the expensive ones to the single subtask that truly needs them. A practitioner should leave able to name the shape of any task in front of them and reach for the smallest pattern that fits, treating the autonomous agent as the option of last resort rather than the default.