MCP — what it is, what it isn't, what it changes — Tool & Memory Design — when the agent's effective IQ depends on its toolbelt

Treat MCP as the package-manager moment for agent tools — read it as distribution, not magic.

Before MCP, every agent harness reinvented the tool integration layer. If you wanted your agent to call a CRM, you wrote a CRM tool for your Claude app, a different CRM tool for your LangGraph app, and a third one for the OpenAI harness running the other feature. The logic was identical, the glue was different, and maintaining three copies was a tax nobody had time for. MCP (Model Context Protocol, specified and released by Anthropic in late 2024) resolves this the same way npm resolved it for JavaScript libraries: publish once, mount anywhere. A CRM MCP server exposes a tool surface; any agent harness that speaks the protocol can mount it.

That is the shift. Notice what it is not: it is not a capability improvement. The tools available through an MCP server are the same tools you could have written before. The agent's reasoning capability is the same. What changed is the distribution graph: with N apps and M external capabilities, it went from N×M (every app re-integrates every tool) to N+M (each app implements the protocol once, each tool ships one server). That is a real and meaningful change in economics. It is not an intelligence upgrade.
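The arithmetic behind the N×M versus N+M claim is easy to check. A minimal sketch, assuming N agent applications and M external capabilities; the function names are invented for illustration:

```python
def adapters_without_protocol(apps: int, capabilities: int) -> int:
    """Each app writes its own adapter for each capability: N x M."""
    return apps * capabilities


def pieces_with_protocol(apps: int, capabilities: int) -> int:
    """Each app carries one client and each capability ships one server: N + M."""
    return apps + capabilities


for apps, capabilities in [(4, 5), (10, 50)]:
    before = adapters_without_protocol(apps, capabilities)
    after = pieces_with_protocol(apps, capabilities)
    print(f"{apps} apps, {capabilities} capabilities: {before} adapters vs {after} pieces")
```

With four apps and five capabilities that is 20 adapters against 9 pieces, and the gap widens as either side grows. Nothing in the count says anything about how well any single tool is designed.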

The picture

Draw two diagrams. In the pre-MCP world, each agent application has its own tool adapters, and the same external capability — say, a Notion workspace — is integrated four different ways in four different apps: with different schemas, different auth, different return shapes, different error handling. Each integration is locally maintained. When the Notion API changes, four integrations break.

In the post-MCP world, there is one Notion MCP server. It exposes a defined tool surface. Every agent application mounts the same server. When the API changes, one server gets updated. The distribution graph went from a mesh to a hub.

The economics of this shift reward a specific discipline. Mounting an existing server is now cheap; designing the tool surface is still expensive. Those are different jobs. Lesson 1's toolbelt discipline applies at the server design layer; this lesson applies at the consumer layer, where you are the app team deciding which MCP servers to mount and how many tools to expose to your agent.

Why it matters now

Anthropic's MCP rollout in late 2024 and the subsequent industry uptake through 2025 produced a marketplace effect: hundreds of MCP servers now exist for common integrations. This is genuinely useful. It also created a new failure mode that did not exist before: the "mount everything" reflex.

When adding a tool cost four hours of integration work, teams were selective about which tools to add. When adding a tool costs twenty minutes of mounting a server, the friction that used to enforce discipline disappears. The result is toolbelts with thirty tools, agents that misfire constantly, and teams that respond by looking for a smarter model rather than a better-designed toolbelt. The model is not the problem.

A source you should trust

The Model Context Protocol specification is the primary source and is genuinely readable in a single sitting. It covers the message protocol, the server lifecycle, the tool registration format, and the auth model. Reading the spec directly takes an afternoon and gives you a cleaner mental model than any secondary summary.
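As a concrete anchor for the message protocol: MCP messages are JSON-RPC 2.0, and tool discovery and invocation use the `tools/list` and `tools/call` methods. The sketch below only builds the message shapes; it does not talk to a server, and the transport is omitted. The tool name `search_contacts`, its arguments, and its description are hypothetical. Check field names against the spec version you target.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it offers.
list_request = jsonrpc_request("tools/list")

# Invoke one tool by name. "search_contacts" and its arguments are hypothetical.
call_request = jsonrpc_request(
    "tools/call",
    {"name": "search_contacts", "arguments": {"query": "acme"}},
)

# Illustrative shape of a tools/list response: each tool has a name, a
# description, and a JSON Schema describing its input.
list_response = {
    "jsonrpc": "2.0",
    "id": list_request["id"],
    "result": {
        "tools": [
            {
                "name": "search_contacts",
                "description": "Search CRM contacts by free-text query.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            }
        ]
    },
}

print(json.dumps(call_request, indent=2))
```

The name, description, and input schema in the list response are everything the model sees about a tool, which is why the description text deserves the same scrutiny as the code behind it.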

Anthropic's MCP documentation and example servers provide operator-grade reference implementations. The filesystem and Brave search servers are the canonical simple examples; read one of them to understand what a well-designed server surface looks like before you evaluate third-party servers.

A recipe

How to evaluate an MCP integration before you mount it:

Read the server's tool list. Apply the toolbelt audit from Lesson 1: cap at seven, look for overlapping descriptions, check for compound verbs. If the server has thirty tools and no documentation on which subset to expose, treat that as a smell.
Identify which tools are read-only and which are write or irreversible. Read-only tools carry low risk in most agent harnesses. Write tools require an explicit human approval gate unless you have deliberately designed against one.
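Audit the auth model. Does the server inherit the agent's user identity, or does it run as a service account? Service account auth means every call runs with the service account's permissions regardless of which user the agent is acting for, so the agent may be able to read or change data that user could not. That may not be what you want.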
Inspect the error surface. Does each tool's return shape carry enough information to let the agent recover from a failure — retry, fallback, escalate — or does it just return "something went wrong"? Error surfaces are where tool design maturity shows.
If the server has more than ten tools, decide upfront which subset you will expose to your agent. The server's tool count is the server author's problem; your agent's vocabulary is your problem.
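Parts of the recipe above can be mechanized. The following is a rough sketch, not a complete audit: `ToolInfo`, `audit_findings`, and `select_exposed` are hypothetical names, the example tools are invented, the compound-verb check is a crude name heuristic, and "overlapping descriptions" is approximated by exact matches after whitespace and case normalization. The `read_only` flag is assumed to come from your own review of each tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

MAX_EXPOSED_TOOLS = 7  # the cap from the Lesson 1 toolbelt audit


@dataclass(frozen=True)
class ToolInfo:
    name: str
    description: str
    read_only: bool  # from your own review of the tool, not taken on trust


def audit_findings(tools: List[ToolInfo]) -> List[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings: List[str] = []
    if len(tools) > MAX_EXPOSED_TOOLS:
        findings.append(f"{len(tools)} tools exceeds the cap of {MAX_EXPOSED_TOOLS}")
    first_with_description: Dict[str, str] = {}
    for tool in tools:
        if "_and_" in tool.name or "_or_" in tool.name:
            findings.append(f"{tool.name}: compound verb in name")
        if not tool.read_only:
            findings.append(f"{tool.name}: write or irreversible, needs an approval gate")
        key = " ".join(tool.description.lower().split())
        if key in first_with_description:
            findings.append(f"{tool.name}: same description as {first_with_description[key]}")
        else:
            first_with_description[key] = tool.name
    return findings


def select_exposed(tools: List[ToolInfo], allowlist: Set[str]) -> List[ToolInfo]:
    """Keep only the tools you chose to expose; fail loudly on unknown names."""
    unknown = allowlist - {tool.name for tool in tools}
    if unknown:
        raise ValueError(f"allowlist names not offered by the server: {sorted(unknown)}")
    return [tool for tool in tools if tool.name in allowlist]


# Invented example surface for a hypothetical workspace server.
server_tools = [
    ToolInfo("search_pages", "Search pages by text.", read_only=True),
    ToolInfo("find_pages", "Search pages by text.", read_only=True),
    ToolInfo("create_and_share_page", "Create a page and share it.", read_only=False),
]

for finding in audit_findings(server_tools):
    print(finding)

exposed = select_exposed(server_tools, {"search_pages"})
print([tool.name for tool in exposed])
```

The explicit allowlist is the design choice that matters: the agent's vocabulary is whatever you name there, so a server that adds tools later does not silently widen it.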

The smell of it going wrong

The team mounts an MCP server with thirty tools and exposes all of them to the agent, treating the server's surface as the toolbelt design.
The agent has write-level access through MCP to an external system — a production database, a payment processor, a customer email channel — without a human approval gate.
The auth boundary of a mounted server is implicit. Nobody on the team can answer the question "what does this server have access to, and on whose behalf?"
The team's pitch for a new feature is "we found an MCP server that does this." Finding a distribution primitive is not the same as having a product strategy.
The toolbelt grew from five tools to twenty over six weeks, each addition justified individually, with no cumulative audit.
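The missing approval gate is the easiest of these smells to close in code. A minimal sketch, assuming your harness routes every tool call through one function: `gated_call`, `READ_ONLY_TOOLS`, and the tool names are hypothetical, and `approve` stands in for whatever human review step you actually use.

```python
from typing import Any, Callable, Dict

Arguments = Dict[str, Any]

# Tools you have reviewed and confirmed to be read-only. Hypothetical names.
READ_ONLY_TOOLS = frozenset({"search_orders", "get_customer"})


class ApprovalDenied(Exception):
    """Raised when a human declines a gated tool call."""


def gated_call(
    tool_name: str,
    arguments: Arguments,
    call_tool: Callable[[str, Arguments], Any],
    approve: Callable[[str, Arguments], bool],
) -> Any:
    """Forward read-only calls directly; ask a human before anything else.

    Default-deny: a tool that is not on the reviewed read-only list is
    treated as a write, so newly added server tools are gated automatically.
    """
    if tool_name not in READ_ONLY_TOOLS and not approve(tool_name, arguments):
        raise ApprovalDenied(f"{tool_name} was not approved")
    return call_tool(tool_name, arguments)


def fake_call_tool(tool_name: str, arguments: Arguments) -> Arguments:
    """Stand-in for the real MCP client call; just echoes its input."""
    return {"tool": tool_name, "arguments": arguments}


def deny_all(tool_name: str, arguments: Arguments) -> bool:
    """Stand-in reviewer that rejects every request."""
    return False


# A read-only call goes through without asking anyone.
print(gated_call("get_customer", {"id": "c_1"}, fake_call_tool, deny_all))

# A write call is stopped when the reviewer says no.
try:
    gated_call("issue_refund", {"order": "o_9"}, fake_call_tool, deny_all)
except ApprovalDenied as exc:
    print(f"blocked: {exc}")
```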

A judgment call from real work

When the PL pl-judgment MCP server was designed for external consumption, meaning other agent harnesses could mount it, the choice of what to expose was deliberate.

The internal API had twelve operations. The MCP surface exposes five. The seven that were withheld are a mix of write operations (anything that modifies stored judgment records), operations that require human context the external agent would not have, and housekeeping operations that are useful internally but not composable from the outside.

The five that are exposed are all read-side: retrieve a judgment framework, evaluate a claim against a framework, surface relevant prior decisions, check the evidence chain on a claim, and return a confidence estimate with reasoning. Every one of them returns enough context that the mounting agent can decide what to do next without hitting the server again.
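One way to make "enough context to decide what to do next" concrete is a return shape that carries a machine-readable outcome alongside the data. The sketch below is illustrative only and is not the pl-judgment server's actual schema: `ToolResult`, its fields, the error codes, and `next_action` are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    data: Dict[str, Any] = field(default_factory=dict)
    error_code: Optional[str] = None  # machine-readable, e.g. "rate_limited"
    message: str = ""  # human-readable detail for the transcript
    retryable: bool = False
    retry_after_seconds: Optional[float] = None


def next_action(result: ToolResult) -> str:
    """Decide the agent's next step from the result alone, with no extra server call."""
    if result.ok:
        return "proceed"
    if result.retryable:
        return "retry"
    if result.error_code == "not_found":
        return "fallback"
    return "escalate"


success = ToolResult(ok=True, data={"confidence": 0.8, "reasoning": "two sources agree"})
rate_limited = ToolResult(
    ok=False,
    error_code="rate_limited",
    message="Too many requests.",
    retryable=True,
    retry_after_seconds=2.0,
)
missing = ToolResult(ok=False, error_code="not_found", message="No such framework.")
opaque = ToolResult(ok=False, message="something went wrong")

for result in (success, rate_limited, missing, opaque):
    print(next_action(result))
```

The opaque result is the instructive case: with no error code and no retry hint, the only safe branch left is escalation.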

The auth model runs on a scoped API key with read-only permissions on the judgment store. The key cannot write or delete. If a consuming agent wanted to commit a decision back to the PL judgment system, it would do so through a different channel with a human approval step. The MCP surface is intentionally read-only to keep the blast radius small.
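The two layers described here, a narrowed surface and a read-only key, can be sketched as independent checks. Everything named below is hypothetical: the operation identifiers are invented labels for the five read-side operations, `update_judgment_record` stands in for a withheld write operation, and the scope strings are assumptions rather than the real system's values.

```python
from dataclasses import dataclass
from typing import FrozenSet

READ_SCOPE = "judgments:read"
WRITE_SCOPE = "judgments:write"

# Scope each operation requires. Identifiers are invented for illustration.
OPERATION_SCOPES = {
    "retrieve_framework": READ_SCOPE,
    "evaluate_claim": READ_SCOPE,
    "surface_prior_decisions": READ_SCOPE,
    "check_evidence_chain": READ_SCOPE,
    "estimate_confidence": READ_SCOPE,
    "update_judgment_record": WRITE_SCOPE,  # internal only, withheld from MCP
}

# Layer 1: what the MCP surface exposes at all.
EXPOSED_OPERATIONS = frozenset(
    name for name, scope in OPERATION_SCOPES.items() if scope == READ_SCOPE
)


@dataclass(frozen=True)
class ApiKey:
    key_id: str
    scopes: FrozenSet[str]


def key_allows(key: ApiKey, operation: str) -> bool:
    """Layer 2: the key itself must hold the scope the operation requires."""
    required = OPERATION_SCOPES.get(operation)
    return required is not None and required in key.scopes


def authorize(key: ApiKey, operation: str) -> None:
    """Raise PermissionError unless both layers permit the call."""
    if operation not in EXPOSED_OPERATIONS:
        raise PermissionError(f"{operation} is not exposed on the MCP surface")
    if not key_allows(key, operation):
        raise PermissionError(f"key {key.key_id} lacks the scope for {operation}")


mcp_key = ApiKey(key_id="mcp-external", scopes=frozenset({READ_SCOPE}))

authorize(mcp_key, "evaluate_claim")  # passes silently
print(key_allows(mcp_key, "update_judgment_record"))
```

Keeping the checks separate is what bounds the blast radius: even if a write operation were exposed by mistake, a key that holds only the read scope still could not use it.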

Rules from this lesson

MCP is distribution; tool design still has to happen — do not let mounting ease substitute for vocabulary discipline.
Audit an MCP server before mounting it; do not expose tools to your agent that you would not have written yourself.
Write and irreversible operations through MCP need an auth boundary you have audited before the first deployment.
