Most multi-agent failures do not happen inside an agent. They happen between agents.
The agent internals are the part that gets attention in papers and demos.
The handoff is the part that breaks in production.
The pattern is so consistent it has a name: "agent A worked, agent B worked, but together they failed."
The cause is almost always a handoff that was specified vaguely, or not specified at all.
The reason handoffs are under-specified is that they are easy to skip. The architecture diagram shows an arrow between two boxes. The arrow feels self-explanatory.
What does the arrow carry? What does the receiving agent know, and what does it not know? What happens if the payload is malformed?
The arrow has no room for those questions. The handoff contract does.
This lesson is the API design lesson for multi-agent systems. If you have ever written a REST API with an undocumented field that downstream callers silently depended on, you have already lived this problem in a different context.
The picture
Draw two agents with an arrow between them. Label the arrow with four properties.
First: payload — the data that is explicitly passed, with a schema written out.
Second: references — the shared context the receiver can access but the sender does not duplicate.
Third: re-derived state — what the receiver must reconstruct from its inputs because it was not included in the payload.
Fourth: acknowledgement — the signal that confirms the handoff succeeded, separate from whether the receiver eventually completes the task.
Now annotate each property with its failure mode.
- Thin payload: the receiver re-derives more than it should, introducing drift.
- Missing references: the receiver works without context it needed.
- Implicit re-derived state: the receiver makes assumptions the sender did not verify.
- Absent acknowledgement: handoff failure is invisible until task failure.
The diagram reveals that the arrow on the architecture slide is four separate engineering decisions, each with its own failure mode. Teams that draw the arrow and move on have deferred four problems into production.
Why it matters now
Multi-agent failures at the handoff are among the most expensive failures in the system, because they are the hardest to attribute. When agent B produces wrong output, the question is: did agent B reason incorrectly, or did agent B receive an incomplete or wrong input? Without a specified handoff contract, you cannot answer that question without reading both transcripts in full.
The API-design and distributed-systems literature has been working on this problem for decades. Contract design, schema versioning, explicit failure signals, idempotent operations: these patterns exist and transfer directly to agent handoffs.
The "just have agents talk to each other" framing makes the contract feel unnecessary. It is not.
The engineering discipline is the same. The only thing that changed is that one party in the API is a language model.
Sources you should trust
- OpenAI Swarm's handoff documentation. One of the clearer published treatments of handoff contract design in an agent context. The code shows what "explicit handoff" looks like in practice.
- Microservice API design literature broadly. Any treatment of REST contract design, Pact consumer-driven contracts, or OpenAPI specification will deepen your intuition here. The vocabulary is different; the problem is identical.
A recipe
Four properties every handoff contract should declare before the first line of agent code is written:
- What is passed. The payload schema, written out as a data structure. Not "the relevant information" — the specific fields, their types, and their optionality. Every field the receiver will use must be in the schema.
- What is shared (reference). Context the receiver can access by reading a shared source — a file, a database row, a tool call — but that the sender does not duplicate in the payload. References save tokens but introduce a dependency: if the shared source changes between send and receive, the receiver's view is stale.
- What is re-derived. State the receiver must reconstruct from its inputs. This category should be small, explicit, and reviewed carefully. Every re-derivation is an opportunity for the receiver to arrive at a different conclusion than the sender, producing silent inconsistency.
- What is acknowledged. The signal that the handoff itself succeeded, separate from the receiver's eventual task outcome. A task that fails at minute thirty could have failed because the handoff was malformed at minute zero.
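The four declarations above can be written down as a data structure before any agent code exists. What follows is a minimal sketch, not a reference implementation: every name in it (`HandoffPayload`, `Reference`, `validate_payload`, `is_stale`, and the individual fields) is hypothetical and chosen only for illustration. It assumes the shared source can be pinned by a content hash, which is one way among several to make staleness detectable.

```python
from dataclasses import dataclass, field
from typing import Optional
import hashlib


@dataclass(frozen=True)
class Reference:
    """What is shared: context the receiver reads itself, pinned by hash."""
    uri: str
    content_sha256: str  # hash of the source as the sender saw it


@dataclass(frozen=True)
class HandoffPayload:
    """What is passed: explicit fields, types, and optionality (hypothetical)."""
    task: str
    user_intent: str
    constraints: list[str] = field(default_factory=list)
    deadline_iso: Optional[str] = None  # optional by declaration, not by accident
    references: list[Reference] = field(default_factory=list)
    # What is re-derived: named explicitly so reviewers can keep the list short.
    rederived_by_receiver: list[str] = field(default_factory=list)


def validate_payload(p: HandoffPayload) -> list[str]:
    """Return contract violations; an empty list means the payload is well-formed."""
    errors = []
    if not p.task.strip():
        errors.append("task is empty")
    if not p.user_intent.strip():
        errors.append("user_intent is empty")
    for ref in p.references:
        if len(ref.content_sha256) != 64:
            errors.append(f"reference {ref.uri} has no valid sha256 pin")
    return errors


def is_stale(ref: Reference, current_content: bytes) -> bool:
    """True if the shared source changed between send and receive."""
    return hashlib.sha256(current_content).hexdigest() != ref.content_sha256
```

The acknowledgement is deliberately absent from the payload type: it travels in the opposite direction, and the result of `validate_payload` is a natural thing for it to carry.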
The smell of it going wrong
The clearest tell is a handoff payload described as "everything we know so far" or "a summary of the session." That is not a schema. It is a lossy compression of the context, and the receiver will not read it the way the sender meant it.
A second tell: the receiver re-derives state that was available in the sender's session but was not included in the payload. This wastes tokens and, worse, allows the receiver to re-derive it differently. If the sender spent ten turns building a model of the user's intent and then hands off with a two-sentence summary, the receiver builds its own model. That model will not match the sender's.
The third tell is the absence of an explicit success signal for the handoff. In multi-agent systems, the handoff and the task completion are separated in time. A handoff that delivered a malformed payload may show no error until the downstream agent produces wrong output, by which point the trail is cold.
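One way to keep the two signals apart is to emit them as two separate records that share a handoff identifier. The sketch below is an illustration under stated assumptions: the types `HandoffAck`, `TaskResult`, and `FailureSource`, the `REQUIRED_FIELDS` tuple, and both functions are hypothetical names, and the contract check is reduced to a presence test to keep the example short.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureSource(Enum):
    NONE = "none"
    HANDOFF = "handoff"
    AGENT = "agent"


@dataclass
class HandoffAck:
    """Emitted at receipt time, before any task work starts."""
    handoff_id: str
    accepted: bool
    errors: list[str]


@dataclass
class TaskResult:
    """Emitted when the receiver finishes, possibly much later."""
    handoff_id: str
    succeeded: bool


REQUIRED_FIELDS = ("task", "success_criteria")  # hypothetical contract


def acknowledge(handoff_id: str, payload: dict) -> HandoffAck:
    """Check the payload against the contract and report immediately."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not payload.get(name)]
    return HandoffAck(handoff_id, accepted=not errors, errors=errors)


def attribute_failure(ack: HandoffAck, result: Optional[TaskResult]) -> FailureSource:
    """Use the two signals together to decide where to start debugging."""
    if not ack.accepted:
        return FailureSource.HANDOFF
    if result is not None and not result.succeeded:
        return FailureSource.AGENT
    return FailureSource.NONE
```

An accepted acknowledgement only shows that the payload met the contract, not that its contents were right. It narrows the search; it does not end it.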
A judgment call from real work
PL subagent worktrees use the Agent tool's spawn protocol as the handoff mechanism. The child agent does not see the parent's session context. The handoff payload is the spawn prompt itself, and that prompt must be self-contained.
The prompt always contains: task description, branch name, working directory path, and success criteria in machine-readable form. What is referenced rather than duplicated: the codebase itself, available via the worktree. What is re-derived: nothing, by design. If something must be re-derived, the spawn prompt is considered incomplete and is revised before spawning.
The rejoin mechanism is SendMessage: the child's completion or explicit failure report is carried back to the parent. The acknowledgement signal is the message itself.
That discipline — treating the spawn prompt as a contract with a schema, not as a chat message — is what makes the PL subagent pattern reproducible across issues, sessions, and engineers.
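The spawn-prompt discipline described above can be sketched as a small builder that refuses to produce a prompt when any of the four required items is missing. This is an assumption-laden illustration, not the actual spawn protocol: `SpawnSpec`, `render_spawn_prompt`, and the text layout of the prompt are hypothetical, and the real Agent tool interface is not shown.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SpawnSpec:
    """The four items the text says every spawn prompt contains."""
    task_description: str
    branch_name: str
    working_directory: str
    success_criteria: list[str]  # kept as a list so it stays machine-readable


def render_spawn_prompt(spec: SpawnSpec) -> str:
    """Render a self-contained prompt; raise if any field is empty."""
    missing = [f.name for f in fields(spec) if not getattr(spec, f.name)]
    if missing:
        raise ValueError(f"spawn prompt incomplete, revise before spawning: {missing}")
    criteria = "\n".join(f"- {c}" for c in spec.success_criteria)
    return (
        f"TASK: {spec.task_description}\n"
        f"BRANCH: {spec.branch_name}\n"
        f"WORKDIR: {spec.working_directory}\n"
        f"SUCCESS CRITERIA:\n{criteria}\n"
    )
```

Raising before the spawn, rather than letting the child discover the gap, mirrors the rule that an incomplete prompt is revised instead of being patched by re-derivation.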
Apply lever, risk, rollback. Lever: a written handoff contract cuts debugging time by giving you the join point between two transcripts. Risk: over-specification can create brittle contracts that break when task requirements change. Rollback: contracts can be versioned; a schema change is cheaper than debugging a silent handoff failure.
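The rollback point can be made concrete with a version field on the payload. The sketch below assumes a hypothetical change between version 1 and version 2 of a contract, in which success criteria moved from a single string to a list; `SUPPORTED_VERSIONS`, `upgrade_v1_to_v2`, and `accept` are illustrative names, not part of any framework.

```python
SUPPORTED_VERSIONS = {1, 2}  # hypothetical: versions this receiver understands


def upgrade_v1_to_v2(payload: dict) -> dict:
    """Hypothetical migration: v2 stores success criteria as a list, not a string."""
    upgraded = dict(payload)  # copy so the sender's original is left untouched
    upgraded["success_criteria"] = [payload["success_criteria"]]
    upgraded["schema_version"] = 2
    return upgraded


def accept(payload: dict) -> dict:
    """Reject unknown versions loudly; migrate known old ones to the current shape."""
    version = payload.get("schema_version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported handoff schema_version: {version!r}")
    return upgrade_v1_to_v2(payload) if version == 1 else payload
```

A payload with no version at all is rejected rather than guessed at, which turns a would-be silent mismatch into an error at the handoff itself.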
Rules from this lesson
- The handoff is the API; design it before the agents, and treat the schema as a contract, not a wish.
- Explicit payload schema beats implicit "everything we know" every time; gaps in the schema become gaps in the receiver's reasoning.
- Handoff success and task success are separate signals; instrument both or you will conflate handoff failure with agent failure.