When AI Is the Right Answer (and When It Isn't) — the pm manual

Most teams reach for AI when they should reach for a SQL query. The cost of that mistake is not the model bill — it's the quarter you spent shipping the wrong thing.

Talvinder Singh

There is a meeting that happens in every company in 2026. A senior leader comes back from a conference, opens a deck on the screen, and says some version of: "Our competitors are all doing AI. We need to do AI." The room nods. By Friday the roadmap has an AI initiative. By the end of the quarter the team has spent three months and shipped a chatbot that answers four questions badly.

The problem is not that AI is overhyped. AI is genuinely the most significant shift in software since the cloud. The problem is that "do AI" is not a problem statement. It is a vibe. And vibes do not survive contact with users.

This chapter is about the question that should come first, before model selection, before architecture, before the proof-of-concept: does this problem actually want an AI solution?

Get that wrong, and the rest of the manual does not save you.

The default test

Most software problems are structured. You have inputs you can name, outputs you can specify, and rules you can write down. When a user clicks "checkout," charge their card, decrement inventory, send a confirmation email. There is no ambiguity. There is no judgment. The right tool is a function, a query, a state machine — not a model.

A useful default: if you can write the rule on a whiteboard in under sixty seconds, the rule is your product. An if statement runs in microseconds, costs nothing per call, never hallucinates, and is trivially auditable when a regulator or an angry customer asks why. An LLM doing the same job runs in seconds, costs a fraction of a cent each time, occasionally lies with confidence, and turns "why did we charge this fee?" into a debugging session.
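To make the whiteboard test concrete, here is a minimal Python sketch of a rule that is its own product. The late-fee policy, the `GRACE_DAYS` and `LATE_FEE_INR` values, and the `late_fee` function are hypothetical, invented for illustration. The point is the shape: the function is deterministic, costs nothing per call, and returns its reason alongside the amount. "Why did we charge this fee?" is answered by reading one line, not by replaying a prompt.

```python
from dataclasses import dataclass

# Hypothetical policy values, for illustration only; not a real billing policy.
GRACE_DAYS = 5
LATE_FEE_INR = 250


@dataclass(frozen=True)
class FeeDecision:
    amount_inr: int
    reason: str  # the audit trail: why this fee was (or was not) charged


def late_fee(days_overdue: int) -> FeeDecision:
    """A rule you can write on a whiteboard: deterministic, free to run, auditable."""
    if days_overdue <= GRACE_DAYS:
        return FeeDecision(
            0, f"{days_overdue} days overdue is within the {GRACE_DAYS}-day grace period"
        )
    return FeeDecision(
        LATE_FEE_INR,
        f"{days_overdue} days overdue exceeds the {GRACE_DAYS}-day grace period",
    )
```

Calling `late_fee(12)` returns a ₹250 charge with its reason attached; `late_fee(3)` returns zero. Changing the policy is a one-line diff a reviewer can read.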

AI is the right answer for problems where the rule is not writeable. Where the input is genuinely unstructured — natural language, an image, a freeform document, a meandering customer query — and where the output requires judgment that a human would have to learn from examples rather than read from a spec. That is the domain. Everything outside it is a tool in search of a use case.

Three concrete shipped products show the line clearly.

Notion AI got it right by sitting on top of unstructured user content. A user has a meeting note. The note is a paragraph of typing. The job — "summarize this," "find the action items," "rewrite this more formally" — has no closed-form rule. You cannot regex your way to a summary. Notion AI is doing the thing AI is for, on data that AI is for.

Klarna's customer-service deflection got it right by picking a domain where the inputs are unstructured (customers describing problems in their own words across dozens of languages) and the most common outputs are well-known answers ("here is how to update your card," "here is your delivery status"). Klarna publicly reported in 2024 that the AI assistant was handling on the order of two-thirds of customer chats and doing the equivalent work of hundreds of full-time agents within months of launch. The detail that matters: they did not try to replace the hard cases. They sent the hard cases to humans and let AI absorb the routine ones. The judgment was in the scoping, not in the model.
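This is not Klarna's implementation; it is a minimal Python sketch of the scoping idea. It assumes some upstream classifier (a model, in practice) has already produced an intent and a confidence score. The intent names, the placeholder answers, and the 0.85 threshold are all hypothetical. The design choice sits in `route`: the AI handles a case only when it is both on the routine list and classified confidently.

```python
from dataclasses import dataclass

# Hypothetical routine intents mapped to placeholder canned answers.
ROUTINE_ANSWERS = {
    "update_card": "<canned answer: how to update your card>",
    "delivery_status": "<canned answer: your delivery status>",
}
CONFIDENCE_THRESHOLD = 0.85  # assumed value; tune it against labelled transcripts


@dataclass(frozen=True)
class Classification:
    intent: str
    confidence: float  # 0.0 to 1.0, produced by an upstream classifier (assumed)


def route(c: Classification) -> tuple[str, str]:
    """Return (handler, payload). The AI answers only routine, high-confidence cases."""
    if c.intent in ROUTINE_ANSWERS and c.confidence >= CONFIDENCE_THRESHOLD:
        return ("ai", ROUTINE_ANSWERS[c.intent])
    # Unknown intents and low-confidence guesses both go to a person.
    return ("human", c.intent)
```

Defaulting to `"human"` means a new or ambiguous case fails toward the expensive-but-safe path, which is exactly the scoping judgment described above.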

GitHub Copilot got it right by inserting AI into a task (completing the next few lines of code) where the developer can accept, reject, or rewrite the suggestion in under a second. The cost of a wrong suggestion is trivial: hit Escape. The value of a right suggestion compounds across millions of developers. GitHub's own productivity research, a controlled experiment first published in 2022, reported that developers with Copilot enabled completed a coding task roughly 55% faster than those without it. The product worked because the human was in the loop on every decision and the model was wrong cheaply.

Now the counter-cases. The chatbot a B2B SaaS company in Bengaluru bolts onto its dashboard so the website can say "AI-powered": it answers four questions, fails at five, and gets disabled by the support team within a month. The "AI search" some e-commerce sites added in 2023, which performed worse than a tuned Elasticsearch query because the teams mistook semantic search for a feature instead of a tradeoff. The "AI insights" dashboard that summarizes the same three metrics every week in different sentences and provides exactly zero insight. These are not AI problems. These are press releases.

The five questions before AI hits the roadmap

Before the engineering team builds anything, before the CEO promises anything publicly, before you give the AI initiative a name, walk through these five questions. They take an hour. They save quarters.

1. Is the input genuinely unstructured? If your inputs are already structured — database rows, form fields, dropdown selections, predefined categories — you almost certainly do not need a language model. You need a query, a join, or a small piece of business logic. AI earns its keep when the input is freeform text, an image, a voice recording, a PDF with inconsistent layout, or anything else where you cannot write a parser.

2. Is the output a judgment, not a calculation? AI is for problems where two reasonable humans might give two different acceptable answers. "Summarize this." "Categorize this complaint." "Rewrite this email more politely." If your problem has a single correct answer that a formula could produce, do not reach for a model. A model will be slower, more expensive, less reliable, and harder to audit than the formula.

3. What does a wrong answer cost? Every AI feature will produce wrong outputs. The question is what happens when it does. A wrong Copilot suggestion costs the developer one Escape key. A wrong Notion AI summary costs a re-prompt. A wrong tax categorization in a GST filing tool can cost the user a penalty notice from the government. The cost of a wrong answer determines how much you must spend on grounding, guardrails, evaluation, and human review — and whether you should ship the AI version at all. (See Hallucination as a Product Problem for the design patterns.)

4. Does the user already have a fast, cheap path? If the user can already get the answer in two clicks via a well-designed UI, an AI feature that takes three seconds to stream a paragraph is a downgrade, not an upgrade. AI is not free. Each call costs cents and seconds. Spend them on jobs where the current alternative is slow, expensive, or non-existent — not on jobs where the current alternative is already good.

5. Will you still have a moat in 18 months? Foundation model labs are racing to absorb your feature. If the thing you are building is "ChatGPT but for marketers" or "Claude but for lawyers," the model lab will eat you. Your defensibility has to come from something the lab cannot replicate — your proprietary data, your workflow integration, your domain feedback loop, your distribution. If you cannot point at that thing on the whiteboard, you are renting demand from the lab and they will raise the rent.

If you cannot answer all five with confidence, the answer is not "no AI." The answer is "not yet." Go find out.
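The gate is simple enough to write down. Below is a minimal Python sketch; the three-valued `Answer` type and the `ai_gate` function are hypothetical names, and the "three or more weak" escalation threshold is borrowed from the Monday-morning advice later in this chapter. Anything short of five confident, favorable answers comes back as "not yet".

```python
from enum import Enum
from typing import Sequence

# The five questions from this chapter, in order.
QUESTIONS = (
    "Is the input genuinely unstructured?",
    "Is the output a judgment, not a calculation?",
    "What does a wrong answer cost?",
    "Does the user already have a fast, cheap path?",
    "Will you still have a moat in 18 months?",
)


class Answer(Enum):
    SUPPORTS_AI = "supports"  # answered with confidence, and the answer favors AI
    UNKNOWN = "unknown"       # not yet answered with confidence
    WEAK = "weak"             # the answer argues against AI


def ai_gate(answers: Sequence[Answer]) -> str:
    """Turn the five answers into a roadmap decision (thresholds assumed, see lead-in)."""
    if len(answers) != len(QUESTIONS):
        raise ValueError(f"expected {len(QUESTIONS)} answers, got {len(answers)}")
    weak = sum(a is Answer.WEAK for a in answers)
    if weak >= 3:
        return "not yet: take it to whoever put it on the roadmap"
    if all(a is Answer.SUPPORTS_AI for a in answers):
        return "proceed to model selection"
    return "not yet: go find out"
```

Note that `UNKNOWN` never blocks forever; it routes to "go find out", which is the chapter's point: the default outcome of an unanswered question is research, not a refusal.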

// scene:

Quarterly roadmap review. Founder, head of product, head of engineering.

Founder: “I want AI in the product by next quarter. Our investors keep asking.”

PM: “Which user job are we putting it on?”

Founder: “Doesn't matter. Pick one. The story matters more than the feature.”

PM: “If the story is what matters, we can ship the story this week. Press release, landing page, waitlist. If the feature is what matters, I need to tell you which jobs are AI-shaped and which ones are SQL-shaped — and right now most of ours are SQL-shaped.”

The founder did not love the answer. The investors did, six months later, when the AI feature the team eventually shipped retained 40% of users instead of the 4% the typical 'AI initiative' retains.

// tension:

The job of the PM is to translate a positioning request into a product decision — without letting the positioning drive the product.

The AI-as-feature vs AI-as-product split

There is a second-order question hiding inside the first. Even when AI is the right answer, you have to know whether it is the answer to a feature-level question or a product-level question. They are not the same and they should not be staffed the same.

AI-as-feature means your product already has a reason to exist and AI makes one specific job inside it faster, smarter, or cheaper. Canva works without Magic Resize. Freshworks works without the AI suggested-reply. Notion worked before Notion AI. The AI raises the ceiling on what existing users can do. The moat is the existing user base, the existing workflow, the existing data.

AI-as-product means AI is the value. Without the model there is no product. Grammarly, Jasper, ElevenLabs, Karya, most of the legal-AI and medical-AI startups — remove the model and the company has nothing to sell. The moat is the model performance, the training data, and the feedback loop. The hiring plan, the cost structure, and the failure modes are all different.

Most companies I see in India are in the AI-as-feature camp but make AI-as-product mistakes. They hire an ML research team when they needed a thoughtful API integration. They burn six months fine-tuning a model when a 200-line prompt would have cleared the bar. The first job is to know which camp you are in. The wrong staffing decision is more expensive than the wrong model decision, because you cannot fire your way out of it.

For the deeper treatment, see The Model-Selection Ladder (chapter 2) on how to pick the right rung once you have decided AI is in scope, and Building With AI vs. Building AI Products (chapter 11) on the staffing and business-model split.

The Indian context that changes the math

If you are building in India for Indian users, three things sharpen the answers above.

Cost sensitivity is not a slide; it is reality. Indian B2B buyers will not pay a 3× premium for "AI-powered." Your AI feature has to either earn its inference cost from the same plan the user is already on, or unlock a price point the user was not paying at all. The flagship-model habit — defaulting to the most expensive model because the demo was crisp — collapses under Indian unit economics faster than anywhere else. Chapter 2 (the model-selection ladder) and chapter 9 (cost and latency) are not optional reading for an India-first team; they are the chapters that decide whether the feature ships profitably.
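The "earn its inference cost from the same plan" test is a few lines of arithmetic. A minimal Python sketch follows; every number used with it (calls per user, tokens per call, the per-million-token price in rupees, the plan price, the margin floor) is a hypothetical placeholder, not a real model price or a benchmark. Substitute your own metering data.

```python
def monthly_inference_cost_inr(
    calls_per_user: float,
    tokens_per_call: float,
    price_inr_per_million_tokens: float,
) -> float:
    """Inference cost per active user per month, in rupees."""
    return calls_per_user * tokens_per_call * price_inr_per_million_tokens / 1_000_000


def feature_pays_for_itself(
    plan_price_inr: float,
    gross_margin_before_ai: float,
    min_margin_after_ai: float,
    ai_cost_inr: float,
) -> bool:
    """True if the existing plan still clears the margin floor after absorbing the AI cost."""
    if plan_price_inr <= 0:
        raise ValueError("plan price must be positive")
    margin_after = (plan_price_inr * gross_margin_before_ai - ai_cost_inr) / plan_price_inr
    return margin_after >= min_margin_after_ai
```

With made-up inputs, 300 calls a month at 2,000 tokens each and ₹100 per million tokens costs ₹60 per user, and a ₹999 plan at 80% gross margin still clears a 70% floor (about 74% after the AI cost). Swap in a model ten times the price and the same feature costs ₹600 per user, dropping the margin to roughly 20%: the flagship-model habit, in two function calls.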

Data is messier. Multilingual content, code-switching between English and Hindi or Tamil mid-sentence, inconsistent formats, scanned PDFs of handwritten invoices, regional regulatory variation. The teams that win at AI in India are the ones who treat data plumbing as a first-class product problem rather than a thing the ML engineer figures out. If your AI strategy does not have a data-quality plan, it is not a strategy yet.

Trust is asymmetric. A wrong answer from an AI feature in a tax tool, a medical app, or a finance app does not just cost the user money. It costs you reputation in the WhatsApp groups and Twitter threads that follow, plus the support load that comes with them. The blast radius of a hallucination is larger in India because word of mouth is the dominant distribution channel for B2C and for SMB B2B. Scope conservatively, ship the version of the AI that you can stand behind in front of an angry user, and expand from there.

A worked example: the "AI insights" temptation

A pattern I have seen in five different B2B SaaS companies in the last eighteen months. The team has a dashboard with charts. Usage is fine but not exciting. Someone — usually marketing or sales — proposes "AI insights." The idea is the AI looks at the user's data and tells them what to do.

Walk through the five questions.

Is the input unstructured? Mostly no — the input is rows in a database. The unstructured part is the user's intent in looking at the data, but the data itself is structured. Half a flag.

Is the output a judgment? Sometimes. "Your conversion rate dropped 8% this week" is a calculation, not a judgment. "Here is what probably caused the drop and what to try next" is a judgment — but only if the system has enough signal to make it. In most dashboards, it does not. The AI will pattern-match on the chart and produce a plausible-sounding sentence that the user could have generated themselves by looking at it. Half a flag.
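The first sentence really is SQL-shaped. A minimal Python sketch, assuming "dropped 8%" means a relative change in conversion rate and using made-up counts; the function names are hypothetical, and nothing here needs a model.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Conversions per visitor; zero visitors yields a rate of 0.0."""
    return conversions / visitors if visitors else 0.0


def week_over_week_change(this_week: float, last_week: float) -> float:
    """Relative change, e.g. -0.08 for an 8% drop. A calculation, not a judgment."""
    if last_week == 0:
        raise ValueError("last_week is zero; relative change is undefined")
    return (this_week - last_week) / last_week


def headline(change: float) -> str:
    """Render the change as the sentence an 'AI insight' would otherwise generate."""
    direction = "dropped" if change < 0 else "rose"
    return f"Your conversion rate {direction} {abs(change):.0%} this week"
```

With 500 conversions out of 10,000 visitors last week and 460 this week, `headline` produces "Your conversion rate dropped 8% this week". The second sentence (what caused it and what to try) is the only part that could be AI-shaped, and only if the system has signal beyond the chart.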

Cost of being wrong? Low per-occurrence (user ignores the bad insight) but compounding across the feature (after three bad insights in a row, the user stops trusting the panel and ignores all of them, including the good ones). Quiet but real.

Does the user have a fast, cheap path? Yes. They are already looking at the chart. The chart is the path.

Moat? None. Every dashboard product in the market can wire up the same OpenAI call and produce the same generic insight. The model lab will probably ship a generic version of this itself.

Two and a half flags out of five. The honest answer is "not yet, and probably not this." The right move is to find the specific moment in the user's workflow where a judgment is required and the user is stuck — not to sprinkle AI insights on top of a chart that did not need them. That moment, when you find it, is the AI problem. Everything else is a distraction.

What to do on Monday morning

If you have an AI initiative on your roadmap right now, take it through the five questions before you open another Figma file. If three or more come back weak, the conversation to have this week is not with the engineering team — it is with whoever put the AI initiative on the roadmap. The job of a PM is not to ship what the deck says. It is to ship what works.

If you do not have an AI initiative on your roadmap and somebody is asking why, you have a different conversation to have. The honest version: "We are not building AI features that the user does not need. When we find a user job that is AI-shaped, we will build it well." That answer makes some leaders uncomfortable. The alternative — shipping a feature you cannot defend in a year — makes them more uncomfortable. Pick your discomfort.

The next chapter (The Model-Selection Ladder) assumes you have already decided AI is the right answer. This chapter is the gate before that.

Where to go next

Chapter 2 — The model-selection ladder: start at the smallest model that clears the bar, climb only on evidence. (The Model-Selection Ladder)
Chapter 4 — Eval before launch: how to know whether the AI is right before users find out it isn't. (Eval Before Launch)
Chapter 11 — Building with AI vs. building AI products: the staffing and business-model fork. (Building With AI vs. Building AI Products)
Companion: Idea to Launch Process — the broader loop AI features live inside.
Companion: Product Prioritization — the framework for saying not-yet to AI initiatives without saying no to the conversation.