PL auto-memory — a working example — Tool & Memory Design — when the agent's effective IQ depends on its toolbelt

Read a real memory system before you design your own.

Most published memory system designs are theoretical: diagrams of how a system could work, APIs that could exist, taxonomies that could organize the storage. What is rarely published is a system that runs in production, has accumulated hundreds of sessions of history, has been wrong in specific ways, and has been corrected by writing feedback memories about how to write memories. The PL auto-memory is that system. It lives alongside this repository, at ~/.claude/projects/-Users-raramuri-Projects-pragmaticleaders-io/memory/, legible and inspectable while you read this.

The lesson is not "copy this system." It is "read a system that faced the real design problems and see how it solved them." Most of the interesting architecture decisions were not made upfront — they emerged from operational problems.

The picture

The architecture is deliberately simple. Four memory types, each stored as a separate markdown file with YAML frontmatter. One always-loaded index, MEMORY.md, that points into the most cross-cutting files. Cross-references between memories use [[name]] links; the link is a retrieval trigger, not a hyperlink.
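A minimal sketch of how a loader might read one of these files. It assumes the frontmatter sits between two `---` lines and holds only flat `key: value` pairs (a production loader would more likely use a real YAML parser such as PyYAML). `parse_memory` and the `feedback_decision_format` link target are hypothetical names used for illustration.

```python
import re

# Matches [[name]] and captures the name inside the brackets.
LINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def parse_memory(text: str) -> tuple[dict[str, str], str, list[str]]:
    """Split a memory file into (frontmatter, body, linked names).

    Assumes flat `key: value` frontmatter between two `---` lines;
    nested YAML is not handled by this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- frontmatter delimiter")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("missing closing --- frontmatter delimiter") from None
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    # [[name]] links are retrieval triggers: collect them so the caller can
    # pull the referenced memories into context alongside this one.
    links = LINK_RE.findall(body)
    return meta, body, links


sample = """---
name: feedback_memory_index_discipline
description: Index only cross-cutting memories
type: feedback
---
Rule: keep topic-narrow playbooks out of MEMORY.md.
Related: [[feedback_decision_format]]
"""
meta, body, links = parse_memory(sample)
print(meta["type"], links)
```

Returning the links separately is the point of the design: each `[[name]]` becomes a cue to fetch that memory as well, rather than a hyperlink someone has to decide to follow.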

The four types:

User memories (user_*) hold stable facts about the person: their role, background, how they think, what they build. These are semantic memories. They rarely change; when they do, the file is updated in place rather than creating a new entry.

Feedback memories (feedback_*) hold procedural rules: how to work with this person. Format, tone, decision presentation, what to do when ambiguous. Each feedback memory follows the rule-why-how-to-apply structure. The "why" is load-bearing — without it, the rule is a preference that cannot be reasoned about; with it, the rule is a principle that can be applied to novel situations.
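One way to make the rule-why-how-to-apply structure hard to skip is to render feedback memories from a typed record that refuses an empty "why". This is a hypothetical sketch: the `FeedbackMemory` class and the `Rule:` / `Why:` / `How to apply:` body labels are assumptions, not the exact layout of the PL files.

```python
from dataclasses import dataclass


@dataclass
class FeedbackMemory:
    """A procedural rule stored together with the reason it exists."""

    name: str
    description: str
    rule: str
    why: str  # load-bearing: lets the agent extend the rule to novel cases
    how_to_apply: str

    def render(self) -> str:
        """Return the file contents: frontmatter, then rule / why / how-to-apply."""
        # A rule with no reason is only a preference, so refuse to write it.
        if not self.why.strip():
            raise ValueError(f"{self.name}: feedback memory is missing its 'why'")
        return (
            "---\n"
            f"name: {self.name}\n"
            f"description: {self.description}\n"
            "type: feedback\n"
            "---\n"
            f"Rule: {self.rule}\n\n"
            f"Why: {self.why}\n\n"
            f"How to apply: {self.how_to_apply}\n"
        )


memory = FeedbackMemory(
    name="feedback_memory_index_discipline",
    description="Index only cross-cutting memories in MEMORY.md",
    rule="Add a MEMORY.md pointer only for memories most sessions need.",
    why="Every indexed pointer loads into every session, relevant or not.",
    how_to_apply="Leave topic-narrow playbooks such as course visualization unindexed.",
)
print(memory.render())
```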

Project memories (project_*) hold current state of work: where things stand on active initiatives, key decisions and their rationale, what changed in recent sessions. These are episodic plus semantic — they capture what happened and what is now true as a result.

Reference memories (reference_*) hold external system anchors: how to reach things, canonical names for infrastructure components, access patterns. These are semantic and stay stable until the external system changes.
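The filename prefix and the frontmatter `type` field carry the same information, which makes the prefix cheap to check. A hypothetical sketch that infers the type from the prefix; `MemoryType` and `type_from_filename` are illustrative names, not part of any published API.

```python
from enum import Enum
from pathlib import Path


class MemoryType(Enum):
    USER = "user"            # stable facts about the person
    FEEDBACK = "feedback"    # procedural rules for working with them
    PROJECT = "project"      # current state of work and decisions
    REFERENCE = "reference"  # anchors into external systems


def type_from_filename(path: str) -> MemoryType:
    """Infer the type from the user_/feedback_/project_/reference_ prefix."""
    prefix = Path(path).stem.split("_", 1)[0]
    try:
        return MemoryType(prefix)
    except ValueError:
        raise ValueError(f"{path}: unknown memory prefix {prefix!r}") from None


print(type_from_filename("memory/feedback_memory_index_discipline.md"))
```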

Why it matters now

There are very few published production memory systems you can inspect directly. Most case studies are at one remove: a blog post describing what the system does, not the actual structure of the files it stores. The PL auto-memory is unusual in being directly readable — you can ls the directory while reading this lesson, read a sample of the files, and see the actual write protocol applied across dozens of real sessions.

This kind of legibility is the fastest possible learning loop for memory system design. You are not reading a description of a design decision; you are reading the decision artifact itself.

A source you should trust

The auto-memory system itself is the primary source. Read MEMORY.md to see what gets indexed and how the index entries are written. Then read one feedback memory and one project memory to see the structural difference. The contrast is instructive: feedback memories are compressed and rule-shaped; project memories are narrative and state-shaped.

The four-type taxonomy (user / feedback / project / reference) is the practical simplification of the academic four-regime taxonomy from Lesson 4. The mapping is not exact: "user" corresponds roughly to semantic, "feedback" to procedural, "project" to episodic plus semantic, "reference" to semantic. The PL types are tuned for an operator working with an AI assistant on a product over time; the academic types are tuned for describing memory in general. Both are useful, and knowing the mapping helps you extend the practical system when you encounter a fact that does not fit cleanly.
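The rough mapping can be written down as data, which makes one consequence visible: a semantic fact has three possible homes, so the academic label alone does not choose the file. A sketch, with `PL_TO_REGIMES` and `pl_types_for` as hypothetical names; the mapping itself is the approximate one stated above.

```python
# PL type -> academic regime(s), following the rough mapping in the text.
PL_TO_REGIMES = {
    "user": {"semantic"},
    "feedback": {"procedural"},
    "project": {"episodic", "semantic"},
    "reference": {"semantic"},
}


def pl_types_for(regime: str) -> list[str]:
    """List the PL types that can hold a fact of the given academic regime."""
    return sorted(t for t, regimes in PL_TO_REGIMES.items() if regime in regimes)


print(pl_types_for("semantic"))    # ['project', 'reference', 'user']
print(pl_types_for("procedural"))  # ['feedback']
```

Breaking the three-way tie for semantic facts is what the who/how/what/where question is for: a fact about the person is `user`, a fact about an external system is `reference`, and a fact about the current state of a piece of work is `project`.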

A recipe

The PL auto-memory write protocol in five steps:

Decide the type. User (who they are and how they work), feedback (how the agent should behave with them), project (what is currently happening and what was decided), reference (where to find things). If the fact does not land in one type, either it is two facts or it needs more specificity before being committed.
Write a separate file with YAML frontmatter — at minimum: name, description, and type fields — and a body structured for the type. Feedback memories use rule-why-how-to-apply. Project memories use state-plus-decision narrative. Reference memories use precise anchor text.
Link related memories with [[name]] cross-references in the body text where one memory presupposes or extends another. This is not organizational overhead — it is retrieval signal. Cross-references make related memories easier to surface together.
Add an index pointer in MEMORY.md only if the memory is relevant to most future sessions. This is the contested decision: the tempting default is to index, and the discipline is to resist that default for topic-narrow memories that would tax every session unnecessarily.
Update or remove memories that turn out wrong or outdated. Stale memories are not harmless. An agent acting on outdated procedural memory behaves incorrectly with confidence. Audit the memory store when a session surfaces a wrong assumption.
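The five steps can be sketched as two helpers over a directory of markdown files. This is a hypothetical illustration, not the code behind the PL store: `write_memory`, `remove_memory`, the `cross_cutting` flag, and the `- [[name]]: description` pointer format in MEMORY.md are all assumptions. Step 3 (linking) stays manual, because the author writes the `[[name]]` links into the body.

```python
import tempfile
from pathlib import Path

VALID_TYPES = {"user", "feedback", "project", "reference"}


def write_memory(store: Path, name: str, mtype: str, description: str,
                 body: str, cross_cutting: bool = False) -> Path:
    """Create or update one memory file; index it only if it is cross-cutting."""
    # Step 1: exactly one of the four types, mirrored in the filename prefix.
    if mtype not in VALID_TYPES:
        raise ValueError(f"unknown memory type {mtype!r}")
    if not name.startswith(mtype + "_"):
        raise ValueError(f"{name!r} should start with '{mtype}_'")
    # Step 2: one file per memory, frontmatter first. Rewriting an existing
    # name updates it in place instead of creating a competing copy.
    path = store / f"{name}.md"
    path.write_text(
        f"---\nname: {name}\ndescription: {description}\ntype: {mtype}\n---\n{body}\n"
    )
    # Step 4: only cross-cutting memories earn a line in the always-loaded
    # index (an existing pointer for the same name is left as-is).
    if cross_cutting:
        index = store / "MEMORY.md"
        existing = index.read_text() if index.exists() else ""
        if f"[[{name}]]" not in existing:
            index.write_text(existing + f"- [[{name}]]: {description}\n")
    return path


def remove_memory(store: Path, name: str) -> None:
    """Step 5: delete a wrong or outdated memory and any index pointer to it."""
    (store / f"{name}.md").unlink(missing_ok=True)
    index = store / "MEMORY.md"
    if index.exists():
        lines = index.read_text().splitlines(keepends=True)
        index.write_text("".join(line for line in lines if f"[[{name}]]" not in line))


with tempfile.TemporaryDirectory() as d:
    store = Path(d)
    write_memory(store, "feedback_memory_index_discipline", "feedback",
                 "Index only cross-cutting memories",
                 "Rule: index only what most sessions need.", cross_cutting=True)
    write_memory(store, "project_course_visualization", "project",
                 "Course visualization playbook",
                 "Concept maps and chart choices for learning data.")
    index_text = (store / "MEMORY.md").read_text()
    written = sorted(p.name for p in store.glob("*.md"))
    remove_memory(store, "feedback_memory_index_discipline")
    index_after_removal = (store / "MEMORY.md").read_text()
print(index_text)
```

Making the index decision an explicit `cross_cutting` argument that defaults to `False` inverts the tempting default: a memory stays unindexed unless someone decides it belongs in every session. The `project_course_visualization` name is hypothetical.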

The smell of it going wrong

Two memories cover the same topic and disagree. This happens when a new memory is added without searching for an existing memory on the same topic. The agent retrieves both, receives conflicting guidance, and resolves the conflict by making an inference it should not have to make.
MEMORY.md has grown past 200 lines and loads content into every session that has not been relevant in weeks. The index has lost its curation discipline; the result is context overhead on every call.
Memories reference paths, API names, or feature flags that no longer exist. The agent acts on stale memory with confidence because nothing in the system flags the reference as outdated. Memories should be treated as potentially stale and verified before being acted on.
The type field is wrong: feedback memories stored as user memories lose the rule-why-how-to-apply structure and become a flat preference list that the system cannot reason about contextually.
New memories are added but old memories are never reviewed or removed. The store grows monotonically; quality degrades as the signal-to-noise ratio falls.
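Three of these smells are mechanical enough to check with a script; the other two (disagreeing duplicates and stale or never-reviewed content) still need someone to read the memories. A hypothetical sketch: `audit` and the sample file names are illustrative, the 200-line threshold comes from the list above, and the frontmatter parsing assumes flat `key: value` lines between `---` delimiters.

```python
import re
import tempfile
from pathlib import Path

LINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")
TYPE_RE = re.compile(r"^type:[ \t]*(\S+)", re.MULTILINE)
MAX_INDEX_LINES = 200  # threshold from the smell list


def audit(store: Path) -> list[str]:
    """Report an oversized index, type/prefix mismatches, and dangling links."""
    problems: list[str] = []
    memories = {p.stem: p.read_text() for p in store.glob("*.md") if p.name != "MEMORY.md"}
    index = store / "MEMORY.md"
    index_text = index.read_text() if index.exists() else ""
    n_lines = len(index_text.splitlines())
    if n_lines > MAX_INDEX_LINES:
        problems.append(f"MEMORY.md has {n_lines} lines (limit {MAX_INDEX_LINES})")
    for target in sorted(set(LINK_RE.findall(index_text)) - set(memories)):
        problems.append(f"MEMORY.md: dangling pointer [[{target}]]")
    for name, text in sorted(memories.items()):
        parts = text.split("---", 2)  # ['', frontmatter, body] when well-formed
        front = parts[1] if len(parts) == 3 else ""
        match = TYPE_RE.search(front)
        declared = match.group(1) if match else None
        prefix = name.split("_", 1)[0]
        if declared != prefix:
            problems.append(f"{name}: type {declared!r} disagrees with prefix {prefix!r}")
        for target in LINK_RE.findall(text):
            if target not in memories:
                problems.append(f"{name}: dangling link [[{target}]]")
    return problems


with tempfile.TemporaryDirectory() as d:
    store = Path(d)
    (store / "user_profile.md").write_text(
        "---\nname: user_profile\ntype: feedback\n---\nSee [[reference_deploy_tiers]].\n"
    )
    (store / "MEMORY.md").write_text("- [[user_profile]]\n" * 250)
    report = audit(store)
for problem in report:
    print(problem)
```

The check can only see that the prefix and the `type` field disagree; deciding which one is wrong, and restoring the rule-why-how-to-apply body if a feedback memory was misfiled, is still a human edit.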

A judgment call from real work

The feedback_memory_index_discipline memory is the canonical recent example of the system improving its own write protocol, and it is worth walking through the full cycle.

During a course-authoring session, a playbook for course visualization was developed. The playbook was specific to how PL visualizes learning progress in course-authoring contexts — the approach to concept maps, the naming conventions for visualizations, the tradeoffs between different chart types for learning data. It was a useful artifact; it had operational value.

When the playbook was committed as a memory file, the natural next step was to index it in MEMORY.md. This was the temptation: the playbook felt important, and the index felt like the right place for important things. The pointer was added.

In the next session, on a completely different topic (deployment infrastructure), MEMORY.md loaded the course-visualization pointer. It was irrelevant: it added twelve lines of context to a session where no course visualization was happening. At scale, with twenty topic-narrow memories indexed alongside twenty cross-cutting ones, roughly half of the index loaded into every session would be irrelevant noise.
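The arithmetic behind "half" is worth making explicit. Assume every index pointer costs roughly the same context and treat every cross-cutting pointer as relevant. With n_c cross-cutting pointers, n_t topic-narrow pointers, and a topic-narrow memories actually relevant to the current session (usually 0 or 1), the irrelevant share of the index is:

```latex
f_{\text{irrelevant}} \approx \frac{n_t - a}{n_c + n_t},
\qquad n_c = n_t = 20:\quad
f = \frac{20}{40} = 0.5 \;(a = 0), \qquad
f = \frac{19}{40} = 0.475 \;(a = 1).
```

If each topic-narrow pointer cost twelve lines, as the course-visualization one did, twenty of them would add 20 × 12 = 240 lines on their own, already past the 200-line threshold in the smell list.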

The feedback memory that emerged from catching this: index discipline. Cross-cutting memories — decision presentation format, permission conventions, the deploy tier names, the subagent patterns — belong in the index. Topic-narrow playbooks — course visualization, Razorpay integration specifics, specific deployment sequences — belong as unindexed files, retrievable when the topic is active but not loaded universally.

The meta-lesson: the system's discipline improved by writing a memory about how to write memories. That recursion is not accidental. A production memory system that does not generate feedback memories about its own write protocol will not improve over time. The write protocol is itself a procedural memory that needs curation.

Rules from this lesson

The four-type taxonomy is small enough to hold in your head and rich enough to cover most production cases: user, feedback, project, and reference map to who, how, what, and where.
The index pays for itself only for cross-cutting memories; topic-narrow files should be retrievable but not universally loaded.
Memory systems improve over time when the system writes feedback memories about how to write memories — the protocol is itself a first-class memory artifact.