The Claude Agent SDK — one concrete loop, walked end to end — The Agent Loop — what's actually running when an agent "works for hours"

Reading a production SDK end to end is worth more than twenty blog posts on the same subject, even long ones.

This is not hyperbole. Blog posts describe the loop from the outside, and they select for the decisions worth writing about. An SDK exposes every design decision its implementors made: where they put the stopping condition, what defaults they chose for eviction, how they handle tool-call retries, and which parts of the loop they expose to you versus encapsulate. Reading it means reading a worked example of every principle in the preceding lessons, with actual tradeoffs instead of abstract ones.

The decisions the SDK makes for you are not arbitrary. They reflect the most common production usage patterns the SDK team has seen. Understanding why a default exists is often more useful than knowing what it is.

The Claude Agent SDK is the reference implementation for this course for two reasons. First, it is the SDK this course is built on — you are reading this inside a Claude Code session that is itself an instantiation of the loop. Second, it is as operator-grade a primary source as exists: it was written by the people who build the model and who have seen more production agent failures than any external team.

Start with the smallest possible loop. A single tool, a simple task, a termination on the first goal-predicate satisfaction. Run it locally. Not to see it work — to see it work and then map every line back to the five components.
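As a rough illustration of that exercise, here is a minimal sketch of such a loop, with each part labeled by the component it plays (model, context window, tool-call surface, memory layer, stopping condition). It does not use the Claude Agent SDK's API. The model is a hypothetical scripted stub so the loop runs offline, and the `CALL`/`DONE` reply format, `word_count` tool, and function names are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    role: str      # "user", "assistant", or "tool"
    content: str


def scripted_model(history: list[Turn]) -> str:
    """The model: a scripted stand-in for a real model call (hypothetical).

    It asks for one tool call, then answers with the tool's result.
    """
    if not any(t.role == "tool" for t in history):
        return "CALL word_count the quick brown fox"
    return "DONE " + history[-1].content


def word_count(text: str) -> str:
    return str(len(text.split()))


# The tool-call surface: exactly one tool.
TOOLS: dict[str, Callable[[str], str]] = {"word_count": word_count}


def goal_reached(reply: str) -> bool:
    # The stopping condition: terminate on the first goal-predicate hit.
    return reply.startswith("DONE ")


def run_loop(task: str, max_turns: int = 5) -> tuple[str, list[Turn]]:
    # The memory layer: nothing persists across runs in this sketch.
    # The context window: the full turn history, passed to the model every turn.
    history = [Turn("user", task)]
    for _ in range(max_turns):
        reply = scripted_model(history)
        history.append(Turn("assistant", reply))
        if goal_reached(reply):
            return reply.removeprefix("DONE "), history
        _, tool_name, arg = reply.split(" ", 2)   # parse "CALL <tool> <argument>"
        history.append(Turn("tool", TOOLS[tool_name](arg)))
    return "turn budget exhausted", history


answer, transcript = run_loop("Count the words in: the quick brown fox")
print(answer)  # 4
```

Swapping `scripted_model` for a real model call is the point where the SDK's own defaults start to apply. The labeled structure stays the same.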

The model in the SDK is the claude-* endpoint called on each turn. The SDK makes one implicit decision here that is worth surfacing: it defaults to the same model throughout a run, not the cheapest model for reasoning turns and a stronger model for tool-selection turns. That default may be the right choice for your product surface. It may not. You inherit it if you do not override it.
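Overriding that default means your own code chooses which model to call on each turn. The sketch below assumes exactly that. The model names are hypothetical placeholders, not real model identifiers, and it is left to your harness to decide whether a turn counts as "reasoning" or "tool selection".

```python
from enum import Enum


class TurnKind(Enum):
    REASONING = "reasoning"
    TOOL_SELECTION = "tool_selection"


# Placeholder model names (hypothetical), not real model identifiers.
INHERITED_DEFAULT = "model-a"
ROUTING = {
    TurnKind.REASONING: "cheaper-model",
    TurnKind.TOOL_SELECTION: "stronger-model",
}


def pick_model(kind: TurnKind, override_routing: bool) -> str:
    """Model to call for this turn: the inherited default, or a per-kind choice."""
    if not override_routing:
        return INHERITED_DEFAULT  # same model on every turn, whatever the turn does
    return ROUTING[kind]


run = [TurnKind.REASONING, TurnKind.TOOL_SELECTION, TurnKind.REASONING]
print([pick_model(k, override_routing=False) for k in run])  # ['model-a', 'model-a', 'model-a']
print([pick_model(k, override_routing=True) for k in run])   # ['cheaper-model', 'stronger-model', 'cheaper-model']
```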

The context window management is where the SDK's opinionated choices become most visible. The SDK's default conversation management retains full turn history up to the context limit. When the limit is approached, it compresses or truncates. The compression strategy — what gets summarized, what gets dropped — is an SDK decision, not a law of physics. Knowing the default tells you whether you need to override it.
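To make "compresses or truncates" concrete, here is one hypothetical policy. It is not the SDK's actual default. Full history is kept until it nears the limit, older turns are then folded into a summary, and if that is still too large, the summary is dropped too. Token counts use a rough 4-characters-per-token heuristic, and `summarize` is a placeholder for a model-written summary. Both are assumptions.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token (an assumption, not a tokenizer).
    return max(1, len(text) // 4)


def summarize(turns: list[str]) -> str:
    # Placeholder: a real harness would ask the model to write this summary.
    return f"[summary of {len(turns)} earlier turns]"


def fit_context(turns: list[str], limit: int, headroom: float = 0.8,
                keep_recent: int = 2) -> list[str]:
    """Keep full history until it nears `limit`, then compress, then truncate."""
    total = sum(estimate_tokens(t) for t in turns)
    if total <= headroom * limit or len(turns) <= keep_recent:
        return turns                                   # full history retained
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    compressed = [summarize(older)] + recent           # compress: summarize older turns
    if sum(estimate_tokens(t) for t in compressed) <= limit:
        return compressed
    return recent                                      # truncate: drop the summary too
```

The knobs in this sketch are `headroom`, `keep_recent`, and what the summary preserves. Those are the choices worth comparing against the SDK's documented default before deciding to override it.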

The tool-call surface is declared in the tools array you pass to the client. The SDK enforces the schema strictly — tool parameters that do not match the declared schema will fail at call time, not at planning time. This is the right behavior and also the behavior that most teams discover by reading an error, not by reading the documentation.
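The sketch below shows why the failure lands at call time. The planned call is just data until something checks it against the declaration, right before the tool runs. The schema format here is a simplified, hypothetical stand-in, not the SDK's tool-declaration format, and `get_weather` is an invented example tool.

```python
class ToolArgumentError(ValueError):
    """Raised when a tool call does not match its declared schema."""


# Hypothetical tool declaration in a simplified format (not the SDK's schema format).
GET_WEATHER = {
    "name": "get_weather",
    "parameters": {"city": str, "days": int},
    "required": ["city"],
}


def validate_call(schema: dict, args: dict) -> None:
    """Check a planned tool call at call time, before the tool runs."""
    name, params = schema["name"], schema["parameters"]
    missing = [p for p in schema["required"] if p not in args]
    if missing:
        raise ToolArgumentError(f"{name}: missing required {missing}")
    for key, value in args.items():
        if key not in params:
            raise ToolArgumentError(f"{name}: unexpected argument {key!r}")
        expected = params[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            raise ToolArgumentError(
                f"{name}: {key!r} should be {expected.__name__}, got {type(value).__name__}")


validate_call(GET_WEATHER, {"city": "Oslo", "days": 3})       # passes
try:
    validate_call(GET_WEATHER, {"city": "Oslo", "days": "3"})  # planned fine, fails here
except ToolArgumentError as err:
    print(err)
```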

The memory layer is the component the SDK leaves most explicitly to you. The SDK does not decide where to put persistent memory, how to inject it, or when to evict it. This is deliberate — the SDK is designed to be memory-agnostic. The MEMORY.md auto-memory pattern used in this repo (short entries, persisted across sessions, injected into the system prompt) is one implementation. Vector-store injection on each turn is another. Both are equally valid; neither is the default.
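For a feel of the first option, here is a minimal sketch of MEMORY.md-style injection into the system prompt. It assumes that entries are markdown bullet lines and that the heading it adds is arbitrary. It does not describe this repo's actual memory file format. It takes the file contents as a string so it stays testable. In practice you would read the file first, for example with `Path("MEMORY.md").read_text()`.

```python
def parse_memory(text: str) -> list[str]:
    """One entry per "- " bullet line; headings and blank lines are ignored."""
    return [line[2:].strip() for line in text.splitlines() if line.startswith("- ")]


def build_system_prompt(base: str, memory_text: str, max_entries: int = 20) -> str:
    """Inject persisted memory entries into the system prompt at session start."""
    entries = parse_memory(memory_text)[:max_entries]
    if not entries:
        return base
    block = "\n".join(f"- {entry}" for entry in entries)
    return f"{base}\n\n# Persistent memory\n{block}"


# Example entries are invented for illustration.
memory_md = """# Memory
- Use pnpm, not npm, in this repo.
- Staging deploys need a feature flag.
"""
print(build_system_prompt("You are a coding agent.", memory_md))
```

A vector-store variant would replace `parse_memory` with a per-turn retrieval step, which changes when memory is injected, not where it lands in the prompt.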

The stopping condition is where the SDK's explicit design choices are most important to read. The SDK ships with a token-budget parameter and a timeout. It does not ship with a no-progress guard or a goal predicate — both of those are yours to implement. The default is "run until you hit the budget or timeout." For interactive use, that default is fine. For long autonomous runs, it is not.
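Here is a minimal sketch of the two guards you would add on top of a budget and a timeout. It is written as a standalone class, not as an SDK hook, and the names are hypothetical. The no-progress guard assumes the caller can supply a fingerprint of observable progress after each turn, such as a hash of the working tree or of the open-task list, and it stops after `patience` turns in which that fingerprint does not change.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StopGuard:
    token_budget: int              # the kind of limit the lesson says the SDK ships
    timeout_s: float               # likewise
    patience: int = 3              # turns allowed with no observable progress (assumption)
    tokens_used: int = 0
    started: float = field(default_factory=time.monotonic)
    _last_state: Optional[str] = field(default=None, init=False)
    _stalled: int = field(default=0, init=False)

    def check(self, turn_tokens: int, state: str, goal_met: bool) -> Optional[str]:
        """Return a stop reason after a turn, or None to keep going.

        `state` is a caller-supplied fingerprint of observable progress.
        """
        self.tokens_used += turn_tokens
        if goal_met:                                            # yours: goal predicate
            return "goal"
        if self.tokens_used >= self.token_budget:               # budget
            return "budget"
        if time.monotonic() - self.started >= self.timeout_s:   # timeout
            return "timeout"
        if state == self._last_state:                           # yours: no-progress guard
            self._stalled += 1
        else:
            self._last_state, self._stalled = state, 0
        return "no_progress" if self._stalled >= self.patience else None
```

The ordering is a design choice. A satisfied goal wins over an exhausted budget on the same turn, so a run that finishes on its last affordable turn reports success rather than a cutoff.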

Why it matters now

By late 2026, the major SDKs have converged on similar loop primitives with different ergonomic tradeoffs. Anthropic's Claude Agent SDK, OpenAI's Agents SDK, and LangGraph all implement variants of the same five-component model. The surface differences are substantial — naming conventions, tool-declaration syntax, state-management API — but the underlying structure is the same.

Knowing one SDK well means you can read the others quickly. The question "where does this SDK handle stopping conditions?" has the same answer structure across all three frameworks; you just have to know which surface to look at.

A source you should trust

Claude Agent SDK reference documentation. The primary source, updated as the SDK evolves. The most important sections for this course are the loop lifecycle documentation, the tool-use examples, and the conversation management section that describes the default context compression behavior. Read the defaults section before you override anything.
The CLAUDE.md harness in this repository. The project CLAUDE.md, combined with the MCP server configurations and the memory files, is a production-grade worked example of the SDK configured for a specific product surface. It is the example you can read while you apply the five-component map.
Anthropic engineering blog posts accompanying SDK releases. The design-rationale posts explain why specific defaults were chosen, which is often more useful than knowing what the defaults are.

A recipe

A one-hour read-the-SDK exercise you can run before committing to any agent stack:

Open the SDK quickstart. Run the smallest possible loop locally — one tool, one task, one turn. Verify it works.
Put the five-component diagram on a second monitor. Go through the quickstart code section by section and label each section with the component it implements.
Make a list of every implicit decision the SDK makes for you. Stopping condition default, context eviction policy, tool-call retry behavior, model temperature default, error handling behavior. If you cannot find a default by reading the docs, find it by reading the source. You are inheriting all of these.
For each default on the list, decide: does this default match your production surface, or do you need to override it? Write down the rationale either way.
Write a one-page memo: "what the SDK assumes, what we override, what we keep." This memo is the onboarding document for the next engineer who joins the team and asks why the agent behaves the way it does.
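If it helps to keep the inventory from the steps above in a reviewable form, here is one hypothetical way to record each default and render the memo. The field names and layout are illustrative. The two example entries reuse defaults named in this lesson, and their rationales are invented placeholders.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    area: str          # e.g. "stopping condition"
    sdk_default: str   # what the docs (or, failing that, the source) say
    override: bool
    rationale: str     # written down either way


def render_memo(decisions: list[Decision]) -> str:
    lines = ["What the SDK assumes, what we override, what we keep", ""]
    for heading, wanted in (("We override", True), ("We keep", False)):
        lines.append(heading)
        lines += [f"- {d.area}: default is {d.sdk_default}. Why: {d.rationale}"
                  for d in decisions if d.override == wanted]
        lines.append("")
    return "\n".join(lines)


print(render_memo([
    Decision("stopping condition", "run until token budget or timeout", True,
             "long autonomous runs need a goal predicate and a no-progress guard"),
    Decision("model per turn", "same model for the whole run", False,
             "one model keeps behavior easy to reason about for now"),
]))
```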

The smell of it going wrong

The team treats the SDK as a black box and cannot answer what its default stopping condition is. They will discover the default under load.
The agent's behavior changes unexpectedly when the SDK version upgrades. This happens because an implicit default changed and nobody knew which default was load-bearing for the product.
An anti-pattern that the SDK documentation explicitly warns against shows up in production. Nobody read the documentation.
Multiple engineers are debugging the same loop using different mental models of what the SDK is doing. The five-component walkthrough should produce a shared map; the absence of one produces parallel confusion.
The memory layer is being implemented in four different ways across different parts of the system because the SDK was integrated separately in each case, and nobody wrote the memo.

A judgment call from real work

The harness you are reading this in — Claude Code with a project CLAUDE.md, MCP servers (PostHog, Zopnight, Apollo), subagent tooling, and MEMORY.md auto-memory — is a worked example of the Claude Agent SDK pattern at the specific boundary between platform and product.

Every component of the five-part loop has been explicitly configured rather than defaulted. The system instruction is the CLAUDE.md. The memory layer is the auto-memory files under .claude/projects/*/memory/. The tool-call surface is the combination of built-in tools and MCP-exposed tools defined in the project configuration. The stopping condition is a combination of SDK defaults and the explicit token-hygiene rules in the CLAUDE.md that tell the agent when to compact and when to start a new conversation.

The design choices PL has made on top of the default SDK:

Auto-memory types (locked, reference, feedback, project) structure the memory layer so that different categories of persisted knowledge have different injection priorities and eviction rules. Locked memories are always injected. Feedback memories are injected only when the work is in their domain. This is a deliberate override of a flat memory model, in which every persisted entry is treated the same.
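The injection rules stated for locked and feedback memories (locked always, feedback only in its domain) can be sketched as a selection step that runs before the prompt is built. This is a hypothetical illustration, not the harness's code. The lesson does not spell out injection rules for reference and project memories, so the sketch leaves them out of automatic injection as an explicit assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    kind: str                      # "locked", "reference", "feedback", or "project"
    text: str
    domain: Optional[str] = None   # which domain a feedback memory applies to


def select_for_injection(memories: list[Memory], current_domain: str) -> list[Memory]:
    """Locked memories always; feedback memories only in their own domain."""
    locked = [m for m in memories if m.kind == "locked"]
    feedback = [m for m in memories
                if m.kind == "feedback" and m.domain == current_domain]
    # Rules for "reference" and "project" memories are not described here;
    # this sketch leaves them out of automatic injection (an assumption).
    return locked + feedback
```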

Isolation=worktree subagents give each spawned child its own git working tree and its own file context, so parallel children writing to the filesystem cannot overwrite each other's work. This is a guard against conflicting writes to shared state, baked into the spawn protocol.
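The underlying git mechanism can be sketched with the real `git worktree add -b <branch> <path> <commit-ish>` command. Each child gets a unique directory and a unique branch. This is not how Claude Code implements `isolation=worktree`. The sibling-directory layout and `agent/<name>` branch naming are hypothetical, and the default `dry_run=True` only builds the commands without running git.

```python
import subprocess
from pathlib import Path


def worktree_commands(repo: Path, children: list[str], base: str = "HEAD") -> list[list[str]]:
    """One `git worktree add` per child: a unique directory and a unique branch."""
    if len(set(children)) != len(children):
        raise ValueError("child names must be unique, or two children share a tree")
    return [
        ["git", "-C", str(repo), "worktree", "add",
         "-b", f"agent/{name}",                       # new branch per child
         str(repo.parent / f"{repo.name}-{name}"),    # sibling directory per child
         base]
        for name in children
    ]


def spawn_worktrees(repo: Path, children: list[str], dry_run: bool = True) -> list[list[str]]:
    cmds = worktree_commands(repo, children)
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)   # stop on the first failed checkout
    return cmds


for cmd in spawn_worktrees(Path("/tmp/project"), ["tests", "docs"]):
    print(" ".join(cmd))
```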

SendMessage handoff is the designed rejoin protocol — the structured way a child agent signals completion and passes work back to the parent, rather than leaving files for the parent to discover asynchronously. This is a goal-predicate implementation at the multi-agent layer.
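The shape of such a rejoin protocol can be sketched with threads and a queue. Each child puts one structured completion message on a shared channel, and the parent's goal predicate is "every child has reported". The `Handoff` fields and function names are hypothetical. This is not the SendMessage tool's actual interface.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    child: str
    status: str                  # "done" or "failed"
    summary: str
    artifacts: tuple[str, ...]   # what the parent should pick up


def child_agent(name: str, outbox: "queue.Queue[Handoff]") -> None:
    # Stand-in for a child's own loop; it reports explicitly instead of
    # leaving files for the parent to discover.
    outbox.put(Handoff(name, "done", f"{name} finished its slice", (f"{name}.patch",)))


def parent_wait(children: list[str], timeout_s: float = 5.0) -> dict[str, Handoff]:
    outbox: "queue.Queue[Handoff]" = queue.Queue()
    threads = [threading.Thread(target=child_agent, args=(c, outbox)) for c in children]
    for t in threads:
        t.start()
    results: dict[str, Handoff] = {}
    while len(results) < len(children):        # goal predicate: all children reported
        msg = outbox.get(timeout=timeout_s)    # raises queue.Empty if a child goes silent
        results[msg.child] = msg
    for t in threads:
        t.join()
    return results


print(sorted(parent_wait(["tests", "docs"])))
```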

Each of these design choices was made by reading what the SDK defaults to and deciding whether that default served the specific production surface. The memo for each lives in the MEMORY.md files you can read in this repository.

The result is a harness that behaves predictably because it was designed explicitly, not one that works until the SDK upgrades a default and the behavior silently changes. That predictability is the product of the one-page memo discipline applied at each decision point. It is not a byproduct of experience. It is a practice.

Rules from this lesson

Read one SDK end to end before committing to a stack; every default is an implicit decision you inherit.
The defaults section of the documentation is the most important section; read it before the quickstart.
The most productive SDK reading is on a system that runs the same code you intend to ship — not on a demo toy.
Write the one-page memo; it is the onboarding document for the next engineer on your team.