Two essays, same year, opposite conclusions. Both are right.
That is not a hedge. It is the actual lesson.
Cognition's "Don't Build Multi-Agents" lands like a warning shot: context fragmentation is real, handoff costs are real, and most multi-agent designs fail for reasons the architecture diagram never shows.
Anthropic's research-system essay lands like a counter-argument: here is a class of tasks where fan-out is not optional, and here is how they made it work in production.
The productive reading is not which one wins. It is: which task regime am I in, and which essay applies?
The 2024 multi-agent hype cycle produced a generation of designs that skipped this question entirely. Teams pulled in LangGraph or AutoGen, wired agent A to call agent B, deployed, and watched the system produce inconsistent outputs on tasks where one agent would have been cheaper and more coherent.
The failure was not a framework failure. It was a regime failure. The architects had not asked which essay covered their task type.
The picture
Imagine a two-column table. On the left, Cognition's regime: tasks where context accumulates across steps, where a single agent can hold everything in one session, where the coordination overhead of a second agent exceeds the specialization benefit.
On the right, Anthropic's regime: tasks where the work is genuinely parallel, where independent agents explore separate branches, where synthesis happens at the end rather than in the middle.
Below the table, a quadrant. The x-axis is sequential-to-parallel task structure. The y-axis is shared-to-independent context. Cognition's regime lives in the top-left: sequential work on shared context. Anthropic's regime lives in the bottom-right: parallel work on independent context.
The dangerous zone is the top-right quadrant — parallel work on shared context — where teams reach for fan-out but context dependency makes every handoff lossy. Most multi-agent disasters happen there, usually because the designers drew a diagram that looked like the bottom-right but built something that behaved like the top-right.
Most real product tasks cluster in the top-left and bottom-right. Naming which quadrant you are in before you write code is the regime decision. Skipping it is the regime error.
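The quadrant can be written down as a tiny lookup, which is sometimes enough to force the regime question in a design review. This is a minimal sketch: the `Regime` enum and the `classify_regime` function are hypothetical names introduced here, and the bottom-left quadrant, which this lesson does not discuss, is labeled as such rather than guessed at.

```python
from enum import Enum


class Regime(Enum):
    # Top-left: Cognition's regime.
    SINGLE_AGENT = "sequential work, shared context"
    # Bottom-right: Anthropic's regime.
    FAN_OUT = "parallel work, independent context"
    # Top-right: fan-out is tempting, but every handoff is lossy.
    DANGER_ZONE = "parallel work, shared context"
    # Bottom-left: not discussed in this lesson.
    NOT_COVERED = "sequential work, independent context"


def classify_regime(parallel: bool, shared_context: bool) -> Regime:
    """Map the quadrant's two axes onto a named regime."""
    if shared_context:
        return Regime.DANGER_ZONE if parallel else Regime.SINGLE_AGENT
    return Regime.FAN_OUT if parallel else Regime.NOT_COVERED


# Example: independent documents to read, with an aggregate synthesis at the end.
print(classify_regime(parallel=True, shared_context=False).name)
```

The two booleans are deliberately coarse. The value is in having to answer them honestly for a specific task, not in the function itself.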
Why it matters now
The hype cycle has moved to agents. Every product company is being asked to "add an agent" somewhere. The engineers proposing it have usually read one of the two essays, or neither.
They have not read them side by side, as a pair in productive tension. That omission is the single most common architecture defect in AI products being built in 2024–2026: multi-agent chosen before the regime question is answered.
Both essays exist because smart people had to clean up after this error at scale. Each is a reaction to watching the wrong architecture applied to the wrong task type.
There is also a temporal point: both essays were published in 2025, after the first generation of multi-agent systems had shipped and failed in characteristic ways. They are postmortems in essay form. The arguments are grounded, not theoretical.
Reading them together, in one sitting and with a specific product task in mind, is one of the cheapest educations available in the field right now.
The next nine lessons build on the regime decision. None of them make sense without it.
A source you should trust
- Cognition, "Don't Build Multi-Agents" (2025). The primary warning. Short, dense, earned by experience building Devin. Read it before you open any agent framework documentation.
- Anthropic, "How we built our multi-agent research system" (2025). The counter-essay with a worked example. Critically, Anthropic does not argue that multi-agent is always good; they argue that a specific task shape made it the right choice. That scoping is the point.
- OpenAI Swarm (2024). A lightweight reference implementation of the handoff pattern. Read alongside both essays as a concrete artifact of what that pattern looks like in actual code.
- AutoGen documentation (Microsoft, 2024). The peer-to-peer pattern articulated clearly, useful as a counterpoint to the hierarchical designs the other sources emphasize.
A recipe
A read-them-together protocol that takes about two hours and produces a real design artifact:
- Read both essays in one sitting, back to back, Cognition first, Anthropic second.
- For each essay, list the task types its argument applies to. Be concrete: not "research tasks" but "tasks where N independent documents need reading and the synthesis is an aggregate, not a dependency chain."
- For each task type your product needs to support, decide which essay's regime applies. Write it out, not just in your head.
- For any task where you want multi-agent but the regime is ambiguous, write the single-agent counterfactual: how would one agent do this job, what does it cost, what does it miss.
- In your design doc, write the disagreement-resolution in one sentence: "We are choosing multi-agent here despite Cognition's warning because our task is embarrassingly parallel and context does not accumulate across agents."
- If you cannot write that one-sentence resolution cleanly, you are not ready to build.
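One way to keep the protocol honest is to record each task's decision as structured data and check what is still missing. The sketch below is illustrative only: `TaskDecision`, its fields, and `missing_items` are hypothetical names, and the three regime labels are an assumption about how a team might tag its tasks.

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_REGIMES = ("cognition", "anthropic", "ambiguous")


@dataclass
class TaskDecision:
    task_type: str                                     # concrete, not "research tasks"
    regime: Optional[str] = None                       # one of VALID_REGIMES once decided
    wants_multi_agent: bool = False
    single_agent_counterfactual: Optional[str] = None  # how one agent would do the job
    resolution_sentence: Optional[str] = None          # the one-sentence justification


def missing_items(decision: TaskDecision) -> List[str]:
    """List the protocol artifacts that are not yet written down."""
    missing = []
    if decision.regime not in VALID_REGIMES:
        missing.append("regime written out")
    if (decision.wants_multi_agent and decision.regime == "ambiguous"
            and not decision.single_agent_counterfactual):
        missing.append("single-agent counterfactual")
    if decision.wants_multi_agent and not decision.resolution_sentence:
        missing.append("one-sentence disagreement resolution")
    return missing


draft = TaskDecision(
    task_type="read N independent documents; synthesis is an aggregate",
    regime="ambiguous",
    wants_multi_agent=True,
)
print(missing_items(draft))
```

An empty list does not mean the design is right. It means only that the questions were answered in writing.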
The smell of it going wrong
The most reliable early signal is a design doc that cites only one of the two essays. If it cites only Anthropic's, the team is probably overconfident about parallelism. If it cites only Cognition's, the team is probably dismissing a fan-out task that would genuinely benefit from parallel exploration.
The second signal is the justification collapsing to "it sounds cleaner with two agents." Cleaner for whom? Cleaner to draw on a whiteboard is not the same as cleaner to debug at midnight when agent B is producing outputs that contradict agent A's assumptions.
The third signal is a design where no one can name what each agent owns exclusively. If the answer to "what is agent A's exclusive responsibility?" is "roughly the first half," those agents are not specialized. They are the same agent running twice with a handoff boundary inserted into the middle of a problem that does not have a natural split.
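The first and third signals are mechanical enough to sketch as a check. This is a rough illustration, not a real linter: `design_doc_smells` and the `ownership` mapping (agent name to the responsibilities it claims) are hypothetical, and the title matching is a naive substring test that would miss paraphrased citations or curly apostrophes.

```python
from typing import Dict, List, Set

ESSAY_TITLES = (
    "Don't Build Multi-Agents",
    "How we built our multi-agent research system",
)


def design_doc_smells(doc_text: str, ownership: Dict[str, Set[str]]) -> List[str]:
    """Flag a one-sided citation and agents that own nothing exclusively."""
    smells = []
    text = doc_text.lower()

    # Signal 1: the doc cites exactly one of the two essays.
    cited = [title for title in ESSAY_TITLES if title.lower() in text]
    if len(cited) == 1:
        smells.append(f"cites only one essay: {cited[0]}")

    # Signal 3: count how many agents claim each responsibility.
    claim_counts: Dict[str, int] = {}
    for responsibilities in ownership.values():
        for item in responsibilities:
            claim_counts[item] = claim_counts.get(item, 0) + 1

    # An agent with no responsibility claimed by it alone is not specialized.
    for agent in sorted(ownership):
        exclusive = {r for r in ownership[agent] if claim_counts[r] == 1}
        if not exclusive:
            smells.append(f"{agent} owns nothing exclusively")
    return smells


doc = 'Following "How we built our multi-agent research system", we fan out.'
agents = {"agent_a": {"parse issue", "write patch"}, "agent_b": {"write patch"}}
for smell in design_doc_smells(doc, agents):
    print(smell)
```

The second signal, a justification that collapses to "it sounds cleaner," is a judgment call and is deliberately left out of the check.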
A judgment call from real work
The PL parallel-triage system was designed after reading both essays deliberately. The decision to use fan-out was a regime call: each GitHub issue is independent work, context does not accumulate across issues, and the synthesis step — reviewing the resulting PRs — is cheap relative to the per-issue exploration.
That is Anthropic's regime. The design reflects it.
What the design still had to respect from Cognition's essay: shared quota, shared identity, shared infrastructure. The agents are independent on task context but not independent on resources. That distinction — task-level independence versus resource-level independence — is what Lessons 7 and 8 cover in depth.
Ignoring it produced the quota-429 incident. The regime decision made the design coherent; it did not make the engineering trivial. You still have to engineer everything downstream correctly.
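The distinction can be made concrete with a small sketch. It is not the triage system's actual implementation: `triage_issue`, `fan_out`, and the `stats` dictionary are hypothetical, the sleep stands in for a real model or API call, and a concurrency cap is only one simplified way to model a shared quota (real rate limits are usually per time window). Each worker receives only its own issue, which is task-level independence, while all of them pass through one semaphore, which is the resource-level dependence.

```python
import asyncio
from typing import Iterable, List, Tuple


async def triage_issue(issue_id: int, quota: asyncio.Semaphore, stats: dict) -> str:
    """One independent worker: no shared task context, but a shared quota."""
    async with quota:  # waits here while the shared cap is exhausted
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # placeholder for the real model/API call
        stats["in_flight"] -= 1
    return f"issue-{issue_id}: triaged"


async def fan_out(issue_ids: Iterable[int], max_concurrent: int) -> Tuple[List[str], int]:
    """Run every issue in parallel, never exceeding the shared concurrency cap."""
    quota = asyncio.Semaphore(max_concurrent)  # the one resource all workers share
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(
        *(triage_issue(i, quota, stats) for i in issue_ids)
    )
    return list(results), stats["peak"]


results, peak = asyncio.run(fan_out(range(10), max_concurrent=3))
print(f"{len(results)} issues triaged, peak concurrency {peak}")
```

Removing the semaphore leaves the task logic untouched and still correct per issue, which is exactly why this class of failure is easy to miss in a design that only reasons about task context.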
The reward for getting the regime right is that every subsequent decision — which pattern, what handoff contract, how to manage quota — has a firm conceptual foundation to build on.
Rules from this lesson
- Read both essays before writing a multi-agent design doc; the disagreement is the lesson, not either essay alone.
- Multi-agent is a regime decision, not a default architecture and not a framework selection.
- Write the disagreement-resolution explicitly in your design doc; "it sounds cleaner" does not qualify.
- Task-level independence and resource-level independence are different properties; a system can have one without the other.