Building With AI vs. Building AI Products: The PM Manual

Using Claude Code is not the same job as shipping Claude Code. Both are valuable. Confusing them gets you fired or out-shipped.

Talvinder Singh

There are two jobs in 2026 and they get mixed up in every roadmap conversation. The first is using AI to ship your product faster — Cursor in the IDE, Claude writing the first PRD draft, Midjourney for the marketing site. The second is shipping a product where AI is the product — where if the model goes away tomorrow, you have nothing to sell. Both jobs are legitimate. Neither is the other.

Chapter 1 told you whether the problem wants AI at all. Chapter 2 told you which model to pick. This chapter is the staffing fork underneath every other decision in this manual: are you building with AI, or building AI? Get it wrong and the cost is not a missed sprint — it is hiring an ML research team to babysit an API call, or hiring an integration engineer to own a model that needs an eval harness, a data flywheel, and a hallucination budget. The wrong staffing call is more expensive than the wrong model call. You cannot fire your way out of it. (Rule ai-6.)

Job one: the builder using AI

You are still shipping the product you were already shipping. AI is a lever inside the workflow — yours, your team's, the user's. You did not become an AI company. You became a faster company.

The patterns that actually move the needle:

Cursor or Copilot in the engineering loop. In GitHub's controlled experiment (published 2022-23), developers with Copilot completed a bounded coding task roughly 55% faster than those without it, and the 2024-25 wave of agentic coding tools (Cursor, Claude Code, Copilot Workspace) pushed that further on bounded tasks. The lever is real and conditional: it shows up on greenfield code, scaffolding, and well-specified refactors, and shrinks fast on novel architecture and code that has to be exactly right the first time. (See Tool Use, Function Calling, Agents — The Maturity Ladder.)

AI-assisted PRDs and specs. The biggest single lift for most PMs. Claude or GPT generating a 1,200-word first draft in three minutes turns the blank-page problem from an hour into ten minutes of editing. The output is roughly 60% right; the PM still does the thinking, but skips the writing friction. The trap is shipping the 60% as the final version: engineering teams notice within a paragraph, and the PRD loses authority for the quarter.

AI-assisted design exploration and research synthesis. Figma's AI, v0, and Lovable for rough comps and first-pass copy. Twenty user interviews become a 500-word "top five pain points with quotes" summary; three months of support tickets become a tagged taxonomy. Magic when scoped, damaging when over-trusted: a hallucinated quote in your synthesis deck is a credibility event you do not recover from in the same quarter.

The mental model that holds it all together is maker-checker. AI is the maker. You are the checker. The check is not optional, it is not a rubber stamp, and it is not delegable to another AI. Every artifact AI produces that carries your name needs a human pass that catches the wrongness AI cannot catch in itself — the hallucinated stat, the plausible-sounding but politically untenable recommendation, the suggested metric that your team does not actually track. (Rule ai-81.)

When the maker-checker discipline breaks, you get AI-slop: an artifact that reads fine, follows the form, and contains nothing a thinking person needed. The slop tax compounds. One AI-slop PRD costs a sprint. Three AI-slop PRDs in a row cost the team's trust in PM artifacts entirely.

What not to delegate, ever: prioritisation, strategy, user research itself, final-version writing that carries your name, and any decision that requires judgment about your specific market. AI has no context on your politics, your runway, your hiring pipeline, or what your last board meeting decided. Treat it as a first-draft machine and a summarisation engine. Treat it as a decision-maker and it will confidently tell you to do the wrong thing in a polished voice.

The staffing cost of job one is small. Subscriptions, an internal-tools owner, an AI-policy doc, a quarterly review of which tools are paying for themselves. Job one does not need an ML engineer. Anyone who tells you it does is selling something.

Job two: the builder of AI products

Here the model is not a lever — it is the product. Remove the model and the company has nothing to sell. Grammarly, Jasper, ElevenLabs, Karya, Cursor (yes, the IDE is the wrapper but the model orchestration is the product), most of the legal-AI and medical-AI startups. This is the AI-as-product camp from chapter 1, and once you are in it the rest of the manual stops being optional reading.

The shape of job two:

Eval-driven development. You cannot ship what you cannot measure. Job two teams have an eval set before they have a product, treat it as a regression suite on every prompt or index change, and watch eval drift the way SaaS teams watch CAC. Without the eval harness, every model swap is a coin flip and every prompt edit is a prayer. (See Eval Before Launch.)
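The "eval set as regression suite" idea can be sketched in a few lines. This is a minimal illustration, not a real harness: `EvalCase`, `pass_rate`, `is_regression`, the substring grader, the 2-point tolerance, and the two stand-in prompt functions are all assumptions made for the example. A production eval would call an actual model and use a grader suited to the domain.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    """One graded example: an input and the text a good answer must contain."""
    prompt: str
    must_contain: str


def pass_rate(generate: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases whose output contains the expected text (case-insensitive)."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(
        1 for case in cases
        if case.must_contain.lower() in generate(case.prompt).lower()
    )
    return passed / len(cases)


def is_regression(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """True when the candidate scores worse than the baseline by more than the tolerance."""
    return candidate < baseline - tolerance


# Hypothetical stand-ins for two prompt versions; a real harness would call a model.
def old_prompt(text: str) -> str:
    return f"Summary: {text}"


def new_prompt(text: str) -> str:
    return "Summary unavailable"


CASES = [
    EvalCase("refund policy is 30 days", "30 days"),
    EvalCase("notice period is 60 days", "60 days"),
]

baseline = pass_rate(old_prompt, CASES)
candidate = pass_rate(new_prompt, CASES)
print(f"baseline={baseline:.2f} candidate={candidate:.2f} "
      f"regression={is_regression(baseline, candidate)}")
```

The point of the design is the gate, not the grader: every prompt, index, or model change produces a candidate score, and a change that trips `is_regression` does not ship.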

Model providers as suppliers, with lock-in priced in. Anthropic, OpenAI, Google, and the open-weights ecosystem are your supply chain, not your friends. They will reprice, deprecate, change rate limits and safety policies, and ship features that compete with you, usually with thirty days' notice. Job-two teams design a provider-switchable seam, eval against at least two providers, and keep an open-weights fallback warm enough to swap in within a week. Name only one provider on your stack diagram and you are renting demand from them.
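One way to picture a provider-switchable seam is a single function that the product calls, with the provider order held in configuration. The sketch below assumes a provider is just a prompt-to-text callable; `complete`, `AllProvidersFailed`, and both stand-in providers are hypothetical names, and no real vendor SDK is shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a provider is anything that maps a prompt to a completion.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the preference order has errored."""


def complete(prompt: str, providers: Dict[str, Provider],
             order: Sequence[str]) -> Tuple[str, str]:
    """Try providers in preference order; return (provider_name, completion)."""
    errors: List[str] = []
    for name in order:
        try:
            return name, providers[name](prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins; real adapters would wrap each vendor's client here.
def primary_api(prompt: str) -> str:
    raise TimeoutError("rate limited")


def open_weights_fallback(prompt: str) -> str:
    return f"[fallback] {prompt}"


PROVIDERS = {"primary": primary_api, "fallback": open_weights_fallback}
used, text = complete("Draft an NDA clause", PROVIDERS, order=["primary", "fallback"])
print(used, text)
```

Because product code only ever sees `complete`, swapping or reordering providers is a configuration change, and the same seam is where a per-provider eval run would plug in.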

Differentiation when the model is commoditised. The base capability converges to commodity roughly once per frontier release cycle, about every twelve months. Whatever made your wrapper magical in 2024 is in the next ChatGPT release. The moats that survive: proprietary data the lab cannot get, workflow integration the user lives inside, a domain feedback loop you have been running for years, distribution the lab has no path into. Pick at least two. (Rule ai-83.)

Owning the data loop. Job two products produce a stream of inputs, model outputs, and user reactions — the accept/reject signal, the edit-after-accept, the silent abandonment. That stream, properly captured, is the only training resource the foundation lab cannot replicate. It is also the only path from "thin wrapper" to "actual product." Job two teams treat the data loop as infrastructure equal to the model — instrumented, governed, retained, and replayable.
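"Instrumented, retained, and replayable" implies a concrete record per interaction. A minimal sketch, assuming a JSON-lines log: the field names, the `Reaction` values, and the helper functions are illustrative choices, not a schema from this manual.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Reaction(str, Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    EDITED_AFTER_ACCEPT = "edited_after_accept"
    ABANDONED = "abandoned"


@dataclass(frozen=True)
class FeedbackEvent:
    """One input, one model output, and what the user did with it."""
    request_id: str
    model_version: str
    prompt_version: str
    user_input: str
    model_output: str
    reaction: Reaction
    final_text: Optional[str] = None  # what the user kept, if they edited


def to_log_line(event: FeedbackEvent) -> str:
    """Serialise one event as a JSON line so it can be appended and replayed later."""
    record = asdict(event)
    record["reaction"] = event.reaction.value
    return json.dumps(record, sort_keys=True)


def from_log_line(line: str) -> FeedbackEvent:
    """Rebuild the event from a stored line, for replay against a new model or prompt."""
    record = json.loads(line)
    record["reaction"] = Reaction(record["reaction"])
    return FeedbackEvent(**record)


event = FeedbackEvent(
    request_id="req-001",          # hypothetical identifiers
    model_version="model-a",
    prompt_version="v7",
    user_input="Summarise this thread",
    model_output="Three open questions remain.",
    reaction=Reaction.EDITED_AFTER_ACCEPT,
    final_text="Two open questions remain.",
)
line = to_log_line(event)
restored = from_log_line(line)
print(restored.reaction.value)
```

Storing the model and prompt versions with each reaction is what makes the stream replayable: old inputs can be rerun against a new configuration and compared with what users actually kept.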

Cost and latency as first-class P&L line items. Inference cost-per-user-per-month sits on the dashboard next to MRR. p95 latency sits next to conversion. Job-one teams can ignore this; their AI bill is the marketing budget for a quarter. Job-two teams cannot — the model bill scales with usage, and a free-tier user who churns at month three after burning ₹400 of inference is a strictly worse customer than the one who never signed up. (See Cost & Latency as First-Class Product Constraints.)
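The two dashboard numbers are simple arithmetic once usage is logged. A rough sketch under stated assumptions: prices are quoted per million tokens, the usage figures and prices are made up, and `p95` uses the nearest-rank definition (other percentile definitions give slightly different values).

```python
from typing import Sequence


def monthly_inference_cost(requests: int, input_tokens: int, output_tokens: int,
                           price_in_per_million: float,
                           price_out_per_million: float) -> float:
    """Inference cost for one user for one month, in the currency of the prices."""
    per_request = (input_tokens * price_in_per_million
                   + output_tokens * price_out_per_million) / 1_000_000
    return requests * per_request


def p95(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of the samples."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceiling of 0.95 * n, integer arithmetic
    return ordered[rank - 1]


# All figures below are hypothetical, chosen only to make the arithmetic visible.
cost = monthly_inference_cost(
    requests=200, input_tokens=1500, output_tokens=500,
    price_in_per_million=250.0, price_out_per_million=1000.0,
)
revenue = 0.0  # a free-tier user pays nothing
print(f"inference cost per user per month: {cost:.2f}")
print(f"contribution per month: {revenue - cost:.2f}")
print(f"p95 latency (ms): {p95([100 * i for i in range(1, 21)])}")
```

With these invented inputs the free-tier user costs 175.00 a month and contributes nothing, which is the shape of the problem described above: the loss grows with every month of usage before churn.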

The staffing cost of job two is large. An ML or applied-AI lead who owns evals and model selection. A data engineer who owns the feedback loop. A product engineer who owns the prompt and tool layer as production code with tests. A PM who can talk to all three without flinching, and who reads chapters 2 through 10 of this manual as a job description, not as background reading.

The two jobs are merging — the skills are not

Every product team in 2026 is doing both. The engineering team uses Cursor (job one) while shipping a feature where the user gets an AI suggestion (job two). The PM writes the PRD with Claude (job one) for a product whose core retention loop is an AI feature (job two). The merge happens at the team level, in the same week, sometimes on the same ticket.

The trap is assuming the skills transfer. They do not. Being excellent at using Cursor tells you nothing about how to design an eval set. Being excellent at running an eval set tells you nothing about whether your team should be using Cursor at all. The judgment for each job is sourced from a different place — job one from product taste and editorial discipline, job two from ML practice and product economics.

The clean way to think about it on a roadmap: for every feature, ask which job is in play, and staff that line accordingly. A PRD-writing improvement is a job-one investment — an internal-tools call, not an ML hire. A new "suggest the next step" feature inside the product is a job-two investment — and if you staff it like job one, you will ship a demo and learn six months later that it has no moat, no eval, and no path to defensibility.

Three worked examples

Example A — Fintech using AI to write specs, but shipping no AI features

A Series-B fintech in Bengaluru. PM team heavy on AI tooling internally — Claude for first-draft PRDs, GPT for ticket synthesis, Cursor in engineering. The product itself has zero AI features. Loans, KYC, dashboards, repayments. The regulator's tolerance for hallucination on a credit decision is zero, and the team knows it.

A clean job-one shop. The lever is real — roughly 30% more spec volume than pre-Claude, engineering velocity up on bounded tasks. The roadmap has no AI initiative, and that is the correct answer. The CEO has defended it to investors twice; both times the answer was "we use AI heavily inside, and we will ship AI features when a user job wants one." Six quarters in, that job has not appeared. The discipline is the strategy.

Example B — Startup where the entire product is an AI feature

A 12-person seed-stage team building a legal-drafting assistant. The product is the model orchestration — retrieval over Indian case law, drafting tuned for Indian contract conventions, a review layer flagging clause-level risk. Remove the model, nothing left.

They made the job-two staffing call early. Applied-AI lead before the second product engineer, an eval set of 400 graded drafts before paying users, accept/reject loop instrumented from day one. Differentiation is the eval set and the proprietary corpus of Indian-context examples — neither replicable by OpenAI or Anthropic in eighteen months. Provider-switchable model layer; quarterly eval against three providers. They did not pretend job one would scale into job two.

Example C — Hybrid: internal AI heavy, external product mostly not

A 60-person B2B SaaS doing horizontal workflow automation. Internally: Claude Code, AI-assisted PRDs, support deflection, research synthesis. Externally: one "summarise this thread" feature, and that is it. Rest of the product is structured workflow, integrations, dashboards — none AI-shaped.

The most common shape in 2026, and the one that confuses leadership most. Internal velocity makes the org feel like an AI company; external surface says it is not. Both true. The PM job is to keep the two separate in roadmap conversations, and to keep scanning for the one external user job that is genuinely AI-shaped — the way Klarna found customer-service deflection. When it appears, you flip that sub-line into job two and staff it like Example B. The wrong move is letting internal AI fluency leak into external features that do not earn their cost. "We use AI a lot internally" is not a customer benefit.

What to do on Monday morning

Open your current roadmap. For each initiative, write one of two letters next to it: L for lever (job one, AI accelerates how you build), or P for product (job two, AI is what the user is paying for). If you cannot decide, the initiative is not specified well enough to ship.

If the page is mostly L, your job is to industrialise the lever — pick the tools, write the policy, train the team, and stop pretending you need an ML hire. If the page is mostly P, you owe yourself chapter 2 through chapter 10 of this manual end to end before the next planning session. If the page is mixed, label every line, and resist the temptation to staff them the same.
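The labelling exercise is mechanical enough to run over a spreadsheet export. A small sketch, with two assumptions that are not in the text: "mostly" is taken to mean at least 75% of labelled initiatives, and an initiative with no label is reported as under-specified. The roadmap items and the `triage` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

MOSTLY = 0.75  # assumed threshold for "mostly"; pick your own


def triage(roadmap: Dict[str, Optional[str]]) -> Tuple[str, List[str]]:
    """Return (verdict, initiatives that could not be labelled L or P)."""
    unspecified = [name for name, label in roadmap.items() if label not in ("L", "P")]
    counts = Counter(label for label in roadmap.values() if label in ("L", "P"))
    labelled = counts["L"] + counts["P"]
    if labelled == 0:
        verdict = "nothing labelled yet"
    elif counts["L"] / labelled >= MOSTLY:
        verdict = "mostly L: industrialise the lever"
    elif counts["P"] / labelled >= MOSTLY:
        verdict = "mostly P: read chapters 2 through 10 before planning"
    else:
        verdict = "mixed: label every line and staff each separately"
    return verdict, unspecified


# Hypothetical roadmap; None means the team could not decide.
ROADMAP = {
    "AI-assisted PRD drafts": "L",
    "Cursor rollout for engineering": "L",
    "Suggest the next step": "P",
    "Smart onboarding": None,
}

verdict, unspecified = triage(ROADMAP)
print(verdict)
print("not specified well enough to ship:", unspecified)
```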

The next chapter (The 2026 Model Landscape) is the supplier atlas — what each model lab is good at, where the open-weights line is, and what is likely to be commodity in eighteen months. Read it as supply-chain due diligence for job two, and as a cost-discipline note for job one.

Where to go next

Chapter 1 — When AI is the right answer: the gate before this fork. (When AI Is the Right Answer (and When It Isn't))
Chapter 2 — The model-selection ladder: for job two, the supplier-selection discipline. (The Model-Selection Ladder)
Chapter 4 — Eval before launch: the non-optional regression suite for job two. (Eval Before Launch)
Chapter 6 — Tool use, function calling, agents: where Cursor and Claude Code sit, and why that matters for both jobs. (Tool Use, Function Calling, Agents — The Maturity Ladder)
Chapter 9 — Cost and latency as first-class constraints: the P&L discipline that separates a job-two product from a science fair project. (Cost & Latency as First-Class Product Constraints)
Companion: Working with Engineers — most of the maker-checker discipline lives at this seam.