This path is for the product person who is being asked to ship an autonomous-agent system and does not want to find out at hour six that the harness was a hope rather than an artifact.
The framing comes from a working definition: harness engineering is the work of building the cage that lets autonomous agents run safely for hours or days. That covers eval harnesses, agent loops, tool wiring, memory, checkpoints, observability, recovery, and cost control. It is the unsexy plumbing under the sexy "AI agents" headline, and it is where the difference between a science-fair demo and a shipped product actually lives.
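To make the "cage" concrete, here is a minimal sketch of a harness wrapped around an agent loop, showing three of the pieces named above: a step cap, a cost budget, and a checkpoint that allows recovery after a crash. Every name in it (`RunState`, `run_with_harness`, `fake_step`, the dollar figures) is a hypothetical chosen for illustration, and the agent itself is a stub rather than a real model call.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable


@dataclass
class RunState:
    """Everything needed to resume a run after a crash (hypothetical shape)."""
    step: int = 0
    spent_usd: float = 0.0
    done: bool = False


def run_with_harness(
    agent_step: Callable[[RunState], tuple[float, bool]],
    checkpoint_path: Path,
    max_steps: int = 50,
    budget_usd: float = 5.0,
) -> tuple[RunState, str]:
    """Run agent_step until it finishes or a guardrail trips.

    agent_step returns (cost_in_usd, finished). Returns the final state
    and the reason the loop stopped.
    """
    # Recovery: resume from the last checkpoint if one exists.
    if checkpoint_path.exists():
        state = RunState(**json.loads(checkpoint_path.read_text()))
    else:
        state = RunState()

    while not state.done:
        # Guardrails are checked before each step, not after the damage.
        if state.step >= max_steps:
            return state, "step_limit"
        if state.spent_usd >= budget_usd:
            return state, "budget_exhausted"

        cost, finished = agent_step(state)
        state.step += 1
        state.spent_usd += cost
        state.done = finished

        # Checkpoint after every step so a crash loses at most one step.
        checkpoint_path.write_text(json.dumps(asdict(state)))

    return state, "completed"


def fake_step(state: RunState) -> tuple[float, bool]:
    """Stand-in for a real model or tool call: ten cents a step, done on the third."""
    return 0.10, state.step >= 2


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        final, reason = run_with_harness(fake_step, Path(tmp) / "run.json")
        print(reason, final.step)
```

The design choice worth noticing is that the loop returns a reason alongside the state. A run that stops on `budget_exhausted` is a different operational event from one that stops on `completed`, and the harness is the only place that knows which one happened.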
You do not need to be an ML engineer to do this work. You need vocabulary, judgment, and a mental model. This path supplies all three.
The structure mirrors the layers of an agentic system, one course per stage. Stage 1 is the single-agent loop: see it clearly, draw it from memory, name the five places it can fail. Stage 2 is the eval harness, the only artifact that lets you ship anything more than vibes. Stage 3 is the agent's hands and notebook: tools and memory as deliberate design surfaces, not as accidents. Stage 4 is multi-agent orchestration: when the swarm is a multiplier, when it is a tax, and how to tell the difference before you sign the bill. Stage 5 is production: the part of the system that pages the on-call, the dashboards that prevent the page, and the postmortem ritual that turns every 3am page into a regression test.
Every lesson in every course follows the same shape, designed to keep cognitive load low and the operating-manual feel high:
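As a rough illustration of what Stage 2 and the postmortem ritual in Stage 5 look like when they meet, the sketch below treats an eval suite as a list of named cases and a ship/no-ship gate. It is a toy under stated assumptions: `EvalCase`, `run_evals`, `toy_agent`, and the refund scenario are all invented for this example, and a real suite would call the actual agent and use richer checks than string matching.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_evals(
    agent: Callable[[str], str],
    cases: list[EvalCase],
    min_pass_rate: float = 1.0,
) -> tuple[bool, list[str]]:
    """Run every case; return (ok_to_ship, names_of_failed_cases)."""
    if not cases:
        raise ValueError("an empty eval suite proves nothing")
    failures = [c.name for c in cases if not c.check(agent(c.prompt))]
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate >= min_pass_rate, failures


def toy_agent(prompt: str) -> str:
    """Stand-in for the real agent (hypothetical behavior)."""
    return "escalate to human" if "refund" in prompt.lower() else "ok"


CASES = [
    EvalCase("greeting", "Say hello", lambda out: out == "ok"),
    # Added after a hypothetical 3am page: the agent must never
    # issue a refund on its own.
    EvalCase(
        "regression_refund_page",
        "Issue a refund of $900",
        lambda out: "escalate" in out,
    ),
]

if __name__ == "__main__":
    ok, failed = run_evals(toy_agent, CASES)
    print("ship" if ok else "block", failed)
```

The point of the second case is the naming: each incident leaves behind a case that fails loudly if the same behavior ever returns, which is what turning a page into a regression test means in practice.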
- The move: one line stating the job-to-be-done (JTBD) this lesson teaches.
- The picture: a diagram or sketch of the mental model. Some are interactive, some are static, all are version-controlled in this repo.
- Why it matters now: two or three sentences on what changed in 2024–2026 that makes this lesson current rather than 2019-era folklore.
- A source you should trust: one or two citations, each with a sentence on why the source is credible, not just a URL.
- A recipe: a ten- to thirty-line snippet or checklist you can use today.
- The smell of it going wrong: three to five bullets on failure modes you should be able to pattern-match before they happen.
- A judgment call from real work: one anecdote from PL, Ostronaut, talvinder.com, or sideb.club. What we tried, what broke, what we learned. This is the credibility layer.
The fifty lessons sit on top of the AI Manual's twelve-chapter spine. The Manual gives you the conceptual base — when AI is the right answer, how to climb the model-selection ladder, how to design for hallucination, how to think about cost and safety. This path takes you from "I can spec an AI feature" to "I can spec a system that runs autonomously, fails gracefully, and proves it works."
Work through the courses in order the first time. After that, treat the path as a reference — when an eval suite is the blocker, return to Course 2; when the on-call is paging, return to Course 5; when somebody on your team proposes a six-agent swarm for a problem one agent could solve, return to Course 4 before signing off.
A note on currency: the field is moving fast enough that any path written in 2026 will need a refresh by 2027. The lessons here are written to age well: they cite primary sources, they teach the loop rather than the framework du jour, and the case studies are chosen because the patterns they illustrate will outlast the specific products. When a model name, an SDK, or a vendor name dates, the underlying judgment will not.
The path is opinionated about what it does not teach: it does not teach "how to write a prompt" (covered in the AI Manual), it does not teach "AI ethics as a separate topic" (covered as constraints inside each course), and it does not climax in a "build your own ChatGPT" capstone (the wrong frame — the work is building the harness, not building the model). When you finish the fifth course, you should be able to walk into a vendor demo, a leadership review, or a postmortem and ask the question that turns the room toward signal: show me the harness.