"Multi-agent" is not an architecture. It is a category.
Inside the category, there are three actual structures. They are meaningfully different in how they fail, how they scale, and what they cost to debug.
Using the same word for all three is why design conversations stay vague for too long, and why postmortems often cannot identify which structural choice caused the failure.
The terminology in the field is fragmented. OpenAI Swarm named one pattern. LangGraph named another. AutoGen named a third. The frameworks use different vocabulary for the same concepts, and sometimes the same vocabulary for different concepts.
The productive response is not to memorize framework terminology. It is to learn the three underlying structures and map every system you encounter onto one of them before you start building.
The picture
Manager-worker. One orchestrator receives the task, decomposes it, dispatches units of work to N workers, collects results, and synthesizes. The workers do not communicate with each other. All coordination flows through the manager. This is the pattern Anthropic's research system used and the pattern PL parallel-triage uses.
The manager is the single point of coordination, which is both the strength — coherent synthesis, single audit trail — and the weakness: if the synthesis step is complex, the manager becomes the bottleneck and the parallelism gain narrows.
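The shape described above can be sketched in a few lines. This is a minimal illustration, not any framework's API: `decompose`, `worker`, `synthesize`, and `manager` are hypothetical names, and plain functions stand in for the model calls a real system would make.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(task: str) -> list[str]:
    # Hypothetical decomposition: one unit of work per comma-separated item.
    return [part.strip() for part in task.split(",") if part.strip()]


def worker(unit: str) -> dict:
    # Stand-in for a worker agent. It sees only its own unit,
    # never another worker's input or output.
    return {"unit": unit, "result": unit.upper()}


def synthesize(results: list[dict]) -> str:
    # Stand-in for the manager's synthesis step. If this grows as complex
    # as the original task, the manager has become the bottleneck.
    return " | ".join(r["result"] for r in results)


def manager(task: str, max_workers: int = 4) -> tuple[str, list[dict]]:
    units = decompose(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with units.
        results = list(pool.map(worker, units))
    # Returning the raw results alongside the summary keeps a single audit trail.
    return synthesize(results), results


summary, audit_trail = manager("fix login bug, update docs, bump deps")
print(summary)
```

Note that the workers run in parallel but `synthesize` runs once, serially, over everything they return. That is where the parallelism gain narrows when synthesis is expensive.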
Pipeline. Sequential stages, each agent specialized, output of stage K feeds input of stage K+1. No orchestrator above the stages — each agent hands off to the next. This is how the Ostronaut content pipeline works: ingest hands to chunking, chunking hands to embedding, embedding hands to indexing, indexing hands to grading.
The strength is specialization: each stage has a narrow, clean job with a small prompt and a focused tool set. The weakness is that failures propagate forward and are only visible downstream. A bad chunk produces a bad embedding produces a bad index, and the grader finds the problem three steps after it was introduced.
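A toy version of the five stages makes the propagation problem concrete. Everything here is an assumption for illustration: the stage bodies are placeholders (chunk length stands in for an embedding vector), and the real Ostronaut stages are not shown in this lesson.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def ingest(doc: dict) -> dict:
    return {**doc, "text": doc["raw"].strip()}


def chunk(doc: dict) -> dict:
    # Deliberate weakness for illustration: repeated blank lines
    # produce an empty chunk, and nothing here notices.
    return {**doc, "chunks": doc["text"].split("\n\n")}


def embed(doc: dict) -> dict:
    # Toy embedding: the chunk length stands in for a vector.
    return {**doc, "vectors": [len(c) for c in doc["chunks"]]}


def index(doc: dict) -> dict:
    return {**doc, "index": dict(enumerate(doc["vectors"]))}


def grade(doc: dict) -> dict:
    # The grader is the first stage that notices the empty chunk,
    # three steps after it was introduced.
    bad = [i for i, v in doc["index"].items() if v == 0]
    return {**doc, "passed": not bad, "bad_entries": bad}


PIPELINE: list[Stage] = [ingest, chunk, embed, index, grade]


def run(doc: dict, stages: list[Stage] = PIPELINE) -> dict:
    # No orchestrator: each stage's output is simply the next stage's input.
    for stage in stages:
        doc = stage(doc)
    return doc


result = run({"raw": "hello\n\n\n\nworld"})
print(result["passed"], result["bad_entries"])
```

One common mitigation, not shown here, is to validate each stage's output at the handoff so the empty chunk is rejected by `chunk` itself rather than discovered by `grade`.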
Peer-to-peer. N agents collaborating without a single coordinator. Agents communicate with each other directly, negotiate, critique, and revise. AutoGen popularized this pattern. The promise is that agents can challenge each other's reasoning, producing higher-quality outputs through dialogue.
The reality is that peer-to-peer systems are the hardest to debug by a significant margin. When three agents produce a synthesized output that none of them would have produced individually, tracing how the system got there requires reading every agent's transcript and reconstructing the conversation graph. Do not start here.
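To see why tracing is expensive, consider what even a minimal trace needs. The sketch below is hypothetical and is not AutoGen's API: it assumes every message is written to one shared log with a `reply_to` link that points at an earlier message, which is already more structure than many peer-to-peer setups record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    id: int
    sender: str
    recipient: str
    text: str
    reply_to: Optional[int] = None  # assumed: id of an earlier message


def trace(log: list[Message], final_id: int) -> list[Message]:
    # Walk reply_to links back from the final message to the first one,
    # then return the chain in chronological order.
    by_id = {m.id: m for m in log}
    chain = []
    current_id: Optional[int] = final_id
    while current_id is not None:
        msg = by_id[current_id]
        chain.append(msg)
        current_id = msg.reply_to
    return chain[::-1]


log = [
    Message(1, "drafter", "critic", "Draft: cache every response."),
    Message(2, "critic", "drafter", "Stale data risk; add a TTL.", reply_to=1),
    Message(3, "reviewer", "drafter", "Also cap the cache size.", reply_to=1),
    Message(4, "drafter", "critic", "Revised: cache with TTL.", reply_to=2),
]

for m in trace(log, final_id=4):
    print(m.id, m.sender, "->", m.recipient, ":", m.text)
```

The chain for message 4 is 1, 2, 4. Message 3 never appears, even though the drafter received it. A single `reply_to` link cannot say whether the reviewer's comment influenced the revision, and that gap is the debugging cost in miniature.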
Why it matters now
OpenAI Swarm (2024) popularized the handoff pattern and put the manager-worker structure in front of a large audience. LangGraph popularized state-machine orchestration — a formalized version of the pipeline pattern. AutoGen popularized peer dialogue.
The terminology is fragmented; the underlying patterns have not changed.
The practical problem is that teams often inherit a pattern from their framework choice rather than choosing a pattern deliberately. LangGraph makes pipeline and state-machine designs natural, so teams using LangGraph build pipelines even when manager-worker would be cleaner. Framework gravity is a real force.
Knowing the patterns lets you resist it. Choosing a framework to match the pattern is better engineering than choosing a pattern to match the framework.
In the next lesson, we go one level deeper: for any pattern you choose, the failure surface between agents is the handoff, and the handoff deserves the same rigor as the pattern selection itself.
Sources you should trust
- OpenAI Swarm reference (2024). The clearest worked example of the manager-worker handoff pattern. The code is deliberately simple; the design is instructive.
- AutoGen documentation (Microsoft, 2024). The clearest worked example of the peer-to-peer pattern and the best treatment of agent conversation protocols.
- LangGraph documentation. The clearest worked example of the pipeline and state-machine pattern. The graph metaphor maps onto pipelines naturally; use it to see where the stages are.
A recipe
Pick the pattern by job-to-be-done, not by framework:
- Manager-worker when: the task is a fan-out with independent units and the synthesis is relatively cheap. The orchestrator's job is dispatch and collection. Use this for any task that passes the four-question test from Lesson 3.
- Pipeline when: stages are sequential and each stage is specialized enough to benefit from its own prompt and tool surface. The specialization must be real — meaningfully different prompts and tools — not just cosmetic separation.
- Peer-to-peer when: agents need to genuinely negotiate or critique each other, and the quality gain from disagreement is worth the debugging cost. This is the pattern to use last.
For each pattern, before committing: name the failure mode.
For manager-worker: manager bottleneck and context loss at synthesis. For pipeline: error propagation and late failure detection. For peer-to-peer: irreconcilable disagreement and untraceable output. If you cannot name the failure mode, you have not understood the pattern well enough to use it safely.
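The recipe can be written down as a small decision function. This is one possible encoding, not a rule from any framework: the `Task` fields and `pick_pattern` are hypothetical names, and the boolean answers still have to come from your own judgment about the task.

```python
from dataclasses import dataclass

# Failure modes as named in this lesson, keyed by pattern.
FAILURE_MODES = {
    "manager-worker": ("manager bottleneck", "context loss at synthesis"),
    "pipeline": ("error propagation", "late failure detection"),
    "peer-to-peer": ("irreconcilable disagreement", "untraceable output"),
}


@dataclass
class Task:
    independent_units: bool = False
    cheap_synthesis: bool = False
    sequential_stages: bool = False
    distinct_prompts_and_tools: bool = False
    needs_negotiation: bool = False
    gain_worth_debug_cost: bool = False


def pick_pattern(task: Task) -> str:
    # Checked in order of debugging cost; peer-to-peer comes last.
    if task.independent_units and task.cheap_synthesis:
        return "manager-worker"
    if task.sequential_stages and task.distinct_prompts_and_tools:
        return "pipeline"
    if task.needs_negotiation and task.gain_worth_debug_cost:
        return "peer-to-peer"
    # No pattern fits: a single agent is a valid answer.
    return "single-agent"


pattern = pick_pattern(Task(independent_units=True, cheap_synthesis=True))
print(pattern, FAILURE_MODES.get(pattern, ()))
```

Looking up `FAILURE_MODES` right after choosing is the "name the failure mode" step: if the lookup surprises you, the choice deserves a second look.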
The smell of it going wrong
The most common error is picking the pattern from the framework's defaults rather than from the task. A team using LangGraph builds a state machine because the framework makes it natural, not because the task is a state machine.
A "peer-to-peer" system that has one agent doing most of the work is really manager-worker with an asymmetric workload. The cleanup is to acknowledge the asymmetry and make the coordinator explicit, which clarifies the design and makes debugging tractable.
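One rough way to spot the asymmetry is to count who sends the messages. The sketch below is a crude proxy under stated assumptions: `dominant_agent` is a hypothetical helper, message share is only a stand-in for "doing most of the work", and the 0.6 threshold is arbitrary.

```python
from collections import Counter
from typing import Optional


def dominant_agent(senders: list[str], threshold: float = 0.6) -> Optional[str]:
    # Returns the agent whose share of messages meets the threshold,
    # or None if no agent dominates. The threshold is an assumption.
    if not senders:
        return None
    agent, count = Counter(senders).most_common(1)[0]
    return agent if count / len(senders) >= threshold else None


print(dominant_agent(["planner", "planner", "planner", "critic"]))
print(dominant_agent(["planner", "critic", "reviewer"]))
```

If one name keeps coming back, that agent is a candidate for the explicit coordinator role.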
A "pipeline" with one stage doing the work of three is a specialization failure. That stage is too broad — it is accumulating responsibilities that belong in separate stages. This shows up as a single stage that is slower than all others and harder to test in isolation.
In a manager-worker pattern, a manager that becomes a reasoning bottleneck is the most common scaling failure in real systems. The manager was designed to synthesize results, but synthesis turned out to be as complex as the original task. At that point you have deferred the hard work, not eliminated it.
A judgment call from real work
PL parallel-triage is a clean manager-worker design. The parent Claude Code session is the manager: it owns the issue list, spawns one worker per issue, monitors completion, and reviews the resulting PRs. Workers are subagents running in isolated worktrees. They do not communicate with each other. All coordination flows through the parent.
That clarity — one pattern, named, with explicit manager and worker roles — is what makes the design debuggable. When something goes wrong, the question is always one of two things: was it the manager's synthesis, or was it one worker's output? The pattern makes that question answerable in minutes, not hours.
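The two-way question can be expressed as a small check. This is an illustrative sketch, not how PL parallel-triage is implemented: `localize_failure` and the `is_valid` callback are hypothetical, and in practice the validity check might be "did this worker's PR pass its tests".

```python
from typing import Callable


def localize_failure(
    worker_outputs: dict[str, str],
    is_valid: Callable[[str], bool],
) -> dict:
    # Check each worker's output in isolation first.
    bad = sorted(wid for wid, out in worker_outputs.items() if not is_valid(out))
    if bad:
        return {"suspect": "worker", "workers": bad}
    # Every worker output is valid, so the remaining suspect is synthesis.
    return {"suspect": "manager", "workers": []}


outputs = {"issue-1": "tests passed", "issue-2": "tests failed"}
print(localize_failure(outputs, lambda out: "passed" in out))
```

The check is only this short because workers never talk to each other, so each output can be judged on its own.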
The pl-judgment MCP server is a single-agent design. No multi-agent pattern applies because the task does not decompose naturally into independent units. The pattern selection here is "none", which is a valid answer to the pattern question.
The Ostronaut content pipeline follows the pipeline pattern: ingest → chunk → embed → index → grade. Each stage is specialized; each handoff is well-defined; no stage needs to reach back to an earlier stage.
Three different systems, three different decisions, all deliberate. The deliberateness is the point. The pattern was chosen before the framework, not derived from it.
Rules from this lesson
- Name the pattern before you name the framework; "multi-agent" is too coarse a category to design with or debug against.
- Pick the pattern by job-to-be-done: manager-worker for fan-out, pipeline for sequential specialization, peer-to-peer for negotiated quality.
- Peer-to-peer is the hardest to debug; exhaust the other two patterns before adopting it.
Apply lever, risk, rollback. Lever: naming the pattern before you build makes every downstream decision — handoff contract, state design, quota plan — faster and more specific. Risk: framework gravity can pull you toward a pattern the framework makes easy but your task does not need. Rollback: switching patterns early is cheap; switching after the system is built is a rebuild.