The course's deliverable is a memo.
Not a diagram, not a decision in a planning meeting, not a comment in a pull request: a written memo that a colleague could read in six months and understand exactly why you made this decision, what you considered, and what would change the answer.
The deliverable is written for a reason, and the reason is not ceremony: the act of writing forces precision at exactly the point where vague intuitions become expensive mistakes.
"We thought multi-agent would be cleaner" cannot survive a memo. The memo demands: cleaner how, measured by what, compared to what baseline? Those questions are uncomfortable when the answer is thin. Answering them is the work.
The discipline of writing "we considered the second agent and rejected it for these specific reasons" is what protects future-you from rebuilding the system in six months, having forgotten the reasoning behind the first design.
The brief
Pick a real system where multi-agent has been proposed or where you are tempted to propose it. Produce six artifacts:
- The task-shape analysis. Which essay's regime are you in: Cognition's (sequential, accumulated context) or Anthropic's (parallel, independent context)? Justify in two paragraphs, with specific reference to the four-question fan-out test from Lesson 3.
- The single-agent counterfactual. The version that does the same job without a second agent. Be rigorous: this is not a strawman. It is the real alternative that the multi-agent design must beat to justify its added complexity.
- The decision. Multi-agent or single-agent. One sentence. Followed by the disagreement-resolution: which essay's argument you are adopting and why the other essay's warning does not apply in your regime.
- If multi-agent: five sub-artifacts. The pattern (from Lesson 4). The handoff contract (from Lesson 5). The shared-state design (from Lesson 6). The quota plan (from Lessons 7 and 8). The observability plan (from Lesson 9). All five are required.
- If single-agent: revisit conditions. The specific, concrete conditions under which a future revisit would be justified. Not "when the system gets more complex": what specific change in task structure, scale, or constraints would change the regime decision?
- The one-page memo. Wrapping all of the above for a leadership audience. Plain prose. No diagrams in the memo (they live in the appendix). The reader should be able to make the decision, or understand why you made it, after reading one page.
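The brief's branching requirement (five sub-artifacts if multi-agent, revisit conditions if single-agent) can be expressed as a completeness check on a draft. This is a minimal sketch, not part of the course materials: the class name `CapstoneMemo`, its field names, and the `missing` method are all hypothetical labels chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The five sub-artifacts are named in the brief; these keys are
# hypothetical labels for them.
MULTI_AGENT_SUB_ARTIFACTS = (
    "pattern",
    "handoff_contract",
    "shared_state_design",
    "quota_plan",
    "observability_plan",
)


@dataclass
class CapstoneMemo:
    task_shape_analysis: str = ""
    single_agent_counterfactual: str = ""
    decision: str = ""  # expected: "multi-agent" or "single-agent"
    disagreement_resolution: str = ""
    sub_artifacts: Dict[str, str] = field(default_factory=dict)
    revisit_conditions: List[str] = field(default_factory=list)
    one_page_memo: str = ""

    def missing(self) -> List[str]:
        """Return the names of artifacts the draft still lacks."""
        gaps = []
        # These four are required whichever way the decision goes.
        for name in (
            "task_shape_analysis",
            "single_agent_counterfactual",
            "disagreement_resolution",
            "one_page_memo",
        ):
            if not getattr(self, name).strip():
                gaps.append(name)
        if self.decision == "multi-agent":
            # All five sub-artifacts are required, none optional.
            gaps += [
                key for key in MULTI_AGENT_SUB_ARTIFACTS
                if not self.sub_artifacts.get(key, "").strip()
            ]
        elif self.decision == "single-agent":
            if not self.revisit_conditions:
                gaps.append("revisit_conditions")
        else:
            gaps.append("decision")
        return gaps


draft = CapstoneMemo(
    task_shape_analysis="placeholder analysis",
    single_agent_counterfactual="placeholder counterfactual",
    decision="single-agent",
    disagreement_resolution="placeholder resolution",
    one_page_memo="placeholder memo",
)
print(draft.missing())  # ['revisit_conditions']
```

A check like this only confirms that each artifact exists. Whether the counterfactual is a strawman is still a judgment the reader has to make.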
The picture
The published capstone presents two anonymized worked examples side by side: one that decides for multi-agent, one that decides against. The contrast is the lesson.
The pro-multi-agent example: a document-corpus research task, high variance across units, synthesis is additive. Regime is Anthropic's. Pattern is manager-worker. Fan-out cap is twelve agents by calculation. Handoff contract specifies a structured finding schema with source citations. Shared state is isolated by design. Quota plan includes a stagger interval and checkpoints after each document cluster. Observability is structured spans linked to the parent task.
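The handoff contract in this example is described as a structured finding schema with source citations. The sketch below shows one shape such a schema might take. Every name in it (`Finding`, `Citation`, `worker_id`, `claim`, `document_id`, `locator`, `split_findings`) is hypothetical, and the contract taught in Lesson 5 may look different.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Citation:
    document_id: str  # hypothetical: identifier of a document in the corpus
    locator: str      # hypothetical: page, section, or paragraph reference


@dataclass(frozen=True)
class Finding:
    worker_id: str
    claim: str
    citations: Tuple[Citation, ...]

    def problems(self) -> List[str]:
        """List contract violations; an empty list means the finding is acceptable."""
        issues = []
        if not self.claim.strip():
            issues.append("empty claim")
        if not self.citations:
            issues.append("no source citation")
        elif any(not c.document_id.strip() for c in self.citations):
            issues.append("citation without a document id")
        return issues


def split_findings(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Manager-side check: separate findings that meet the contract from those that do not."""
    accepted, rejected = [], []
    for finding in findings:
        (rejected if finding.problems() else accepted).append(finding)
    return accepted, rejected


cited = Finding("worker-1", "Two documents disagree on the date.",
                (Citation("doc-17", "section 2"),))
uncited = Finding("worker-2", "An unsupported claim.", ())
accepted, rejected = split_findings([cited, uncited])
print(len(accepted), len(rejected))  # 1 1
```

Rejecting uncited findings at the handoff boundary, rather than during synthesis, keeps the manager from building on claims it cannot trace to a source.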
The anti-multi-agent example: a rules-application task where context accumulates and rules interact. Regime is Cognition's. The single-agent counterfactual is viable, faster to implement, and cheaper to debug. The multi-agent alternative would fragment exactly the context that makes the evaluation coherent. Decision is single-agent. Revisit conditions: if the rule set grows large enough that a single context window cannot hold the full rule set and the candidate, evaluate a retrieval-augmented single-agent design before considering multi-agent.
The contrast shows that the framework is not biased toward either decision. It produces multi-agent when the task warrants it and single-agent when it does not.
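The revisit condition in the anti-multi-agent example is concrete enough to write as a check. This is a minimal sketch that assumes token counts are measured elsewhere. The function names, the `reserved_tokens` parameter, and the numbers in the usage lines are illustrative and do not come from the worked example.

```python
def fits_in_one_context(rule_set_tokens: int, candidate_tokens: int,
                        context_window_tokens: int,
                        reserved_tokens: int = 0) -> bool:
    """True while the full rule set and the candidate fit in a single context window.

    reserved_tokens is an assumption of this sketch: room held back for
    instructions and the model's output.
    """
    counts = (rule_set_tokens, candidate_tokens,
              context_window_tokens, reserved_tokens)
    if any(n < 0 for n in counts):
        raise ValueError("token counts must be non-negative")
    available = context_window_tokens - reserved_tokens
    return rule_set_tokens + candidate_tokens <= available


def next_step(rule_set_tokens: int, candidate_tokens: int,
              context_window_tokens: int, reserved_tokens: int = 0) -> str:
    """Apply the revisit condition: retrieval-augmented single-agent comes before multi-agent."""
    if fits_in_one_context(rule_set_tokens, candidate_tokens,
                           context_window_tokens, reserved_tokens):
        return "keep single-agent"
    return "evaluate retrieval-augmented single-agent before considering multi-agent"


# Illustrative numbers only.
print(next_step(40_000, 5_000, 100_000, reserved_tokens=10_000))
print(next_step(95_000, 5_000, 100_000, reserved_tokens=10_000))
```

Stating the condition as a threshold on measurable quantities is what separates it from "when the system gets more complex": anyone can rerun the check when the rule set changes.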
Why it matters now
Multi-agent design decisions made in planning meetings, in Slack threads, or in hallway conversations are invisible to future engineers, invisible to future-you, and impossible to audit when the system behaves unexpectedly six months later.
The memo makes the decision legible and revisable.
That legibility is worth more than it costs. The memo takes three hours to write. The system it governs will run for months or years. The ratio is favorable.
A source you should trust
- Lessons 1 through 9 of this course. All of them. The memo is the integration artifact; if a section of the memo is weak, go back and re-read the corresponding lesson.
- Cognition's essay and Anthropic's essay. Both are required reading before the memo is finalized. If you have not read one of them, the task-shape analysis will have a gap.
A recipe
A three-hour capstone working session:
- Pick the system. (10 minutes) Be specific: what is the job-to-be-done, what tools does the system use, what does the output look like?
- Task-shape analysis. (30 minutes) Run the four-question fan-out test. Name the regime. Re-read the relevant essay with the specific task in mind.
- Single-agent counterfactual. (30 minutes) Sketch the single-agent version in enough detail that you could build it. What tools does it call? What is the context window at peak? What is the latency per task?
- Decision and disagreement-resolution. (15 minutes) One sentence for the decision. One paragraph for the reasoning.
- If multi-agent: design the five sub-artifacts. (60 minutes) Do not skip any of the five. The ones that are hardest to write are the ones most likely to cause incidents. If single-agent: write the revisit conditions instead.
- One-page memo. (30 minutes) Write to a non-technical leadership audience. Cut jargon. Name the lever, the risk, and the rollback.
- Sleep on it. Revisit tomorrow. Decisions that look clean at hour three often look questionable at hour twenty-four.
The smell of it going wrong
The decision was made before the analysis was written. This is the most common failure mode of design memos: the author already knows the conclusion and works backward. The tell is a task-shape analysis that focuses entirely on why the chosen option is good rather than interrogating whether it is justified.
The single-agent counterfactual is a strawman. If the single-agent version in the memo is obviously incomplete or impossible, the multi-agent decision has not been tested against a real baseline.
The justification is "it sounds cleaner." By lesson ten, this should feel like a red flag without needing a rule to name it.
The memo is more than two pages. The target is one page; if the decision requires more than two pages to justify, either the analysis has not been synthesized or the author is hedging. A two-page memo that is read is worth more than a ten-page memo that is not.
A judgment call from real work
PL has lived both decisions, and they are the worked examples above.
The pl-judgment MCP server is a deliberate single-agent design. The task is applying a structured rule set to a candidate where rules interact and context accumulates. The multi-agent alternative was designed, written out, and rejected. The rejection memo exists.
The parallel-triage system is a deliberate multi-agent design. The single-agent counterfactual — one agent, sequential processing — was sketched and compared. The comparison showed that sequential processing of thirty issues would take three times longer with no quality gain, because the issues are independent and there is no context benefit to processing them sequentially.
Same product company. Different decisions. Both defensible. Both written down before the first line of code was committed.
That is the standard this capstone asks you to meet.
Rules from this lesson
- The decision is a written memo, not a hallway conversation; writing forces precision at the point where vague intuitions become expensive mistakes.
- The single-agent counterfactual is required, not optional; a multi-agent decision untested against a real single-agent baseline has not been made rigorously.
- Both essays must be cited and engaged; choosing by vibes, even experienced vibes, is not the same as choosing by regime analysis.