The most expensive sentence in AI product work is usually said casually.
"Let's just use the best model for now."
What sounds like temporary pragmatism is often a permanent margin decision. Teams say it in prototypes, keep it in pilot, and then realize too late that the feature only looks attractive because nobody multiplied the cost by real traffic.
This is why When AI Is the Right Answer and The Model-Selection Ladder belong together. The first asks whether the job deserves AI at all. The second asks which rung is good enough. This lesson adds the consequence: if you default to frontier for work that a smaller model could do, you are not being ambitious. You are silently choosing bad unit economics.
The phrase to get comfortable with is unit economics: revenue and cost at the level of a user, a session, or a feature use. AI product teams that do not think this way end up with beautiful demos and ugly finance reviews.
Why does the gap matter so much? Because the difference between rungs compounds in four directions at once.
First, the per-token rates jump sharply. The exact numbers will move quarter to quarter, but the structural truth does not: the top rung is not 20% more expensive than the lower rung. It is often many multiples more expensive. That means a product decision you barely notice at prototype scale becomes dominant at production scale.
Second, output tokens are expensive and teams chronically overspend on them. They optimize input cost, then let the system write essays where paragraphs would do. A chatty model on a premium rung is paying the highest price on the most avoidable part of the interaction.
Third, context bloat sneaks in. Teams stuff long prompts, too many examples, and oversized retrieval results into every call because they are optimizing for perceived safety. The system gets more expensive before it gets meaningfully better. In practice, bloated context is one of the most common reasons a feature looks "AI-expensive." It is not the nature of AI. It is the nature of sloppy packaging.
Fourth, retries and agent loops multiply everything. One expensive call is manageable. Five expensive calls in one user action is a different business. The feature that looked affordable in a single-turn demo becomes structurally costly once the production workflow includes tool retries, repair attempts, and follow-up passes.
This is why cost discussions should not stop at price-per-million-token tables. Those tables are inputs, not decisions. The real question is: what does one successful user outcome cost me on this rung?
Here is the clean product way to think about it.
For every AI feature, estimate cost per useful outcome, not cost per call.
If the feature is a support answer, what does one resolved support conversation cost?
If the feature is a coding action, what does one accepted and minimally edited suggestion cost?
If the feature is a document summary, what does one summary the user actually keeps cost?
This matters because a more expensive model can still be the right choice if it dramatically reduces costly downstream failures. But the reverse is also true: a frontier model can be the wrong choice if its additional quality does not produce additional economic value.
Klarna's AI support case is useful here because it highlights what teams get tempted by. A company sees meaningful automation potential, sees the prestige of a top provider, and assumes the premium model is justified because the upside is large. Sometimes it is. But unless you measure the whole funnel -- handle rate, escalate rate, repeat inquiry rate, cost to recover mistakes -- you are only reading the top line of the story.
Now look at DeepSeek's cost-curve shock. The strategic lesson was not merely "Chinese labs are catching up." The deeper lesson was that the market price of advanced model capability can collapse faster than managers expect. If your product only works when you pay the frontier premium, your margin is hostage to vendor pricing and your differentiation is thinner than you think. If your product works because you built disciplined routing and thoughtful UX around cheaper intelligence, you are far safer when the market reprices.
That is why default-to-frontier is so dangerous. It creates at least five forms of hidden debt.
The first debt is pricing debt. You anchor the product to a cost structure you may never be able to pass through to users.
The second debt is downgrade debt. Once stakeholders get used to frontier outputs, moving down later feels like deterioration even when the lower rung is objectively enough.
The third debt is design debt. Expensive models make teams lazy about scope. They stop asking whether the interaction should be shorter, more structured, or split into cheaper stages.
The fourth debt is traffic debt. The feature works financially on pilot traffic and quietly breaks when usage becomes mainstream, which is exactly when you least want to redesign it.
The fifth debt is strategic debt. The company starts believing the moat is access to the best model rather than the product choices wrapped around it.
Here is a blunt but useful question for product reviews: if traffic on this feature doubled next quarter, would we be excited or nervous?
If the honest answer is nervous, the unit economics are not healthy yet.
Healthy AI products become more attractive as they scale because their marginal value survives their marginal cost. Unhealthy AI products become scarier as they scale because each additional engaged user expands an already fragile bill.
There is also a pricing-tier mistake that shows up everywhere: giving frontier intelligence away to free users because the demo looked better. This is almost always upside down. Free tiers need disciplined caps, fast lower-rung routing, and a clear reason to upgrade. Enterprise tiers can justify premium reasoning where the value of a correct answer is large enough and the contract covers it. Mixing these up is how companies end up subsidizing their least valuable traffic with their most expensive inference.
The correct stance is not "never use frontier." The correct stance is "make frontier earn its seat."
There are good reasons to pay more:
When the cost of a wrong answer is genuinely high.
When the task requires long-horizon reasoning the smaller rung cannot clear.
When the user segment pays enough that premium intelligence is part of the promise.
When a more expensive model reduces downstream human review enough to be net-cheaper at the workflow level.
But these are explicit cases, not defaults.
The strongest product teams turn cost discipline into design discipline. They shorten outputs. They constrain formats. They trim retrieval. They separate cheap and expensive paths. They design fallback behaviors before launch. They ask which steps truly need model reasoning and which steps can be deterministic. This is not penny-pinching. It is product clarity.
It also changes organizational behavior. Once the team knows every model choice maps to margin, the conversation improves. Founders stop treating model brand as status. Engineers stop overfitting to the best-looking demo. PMs stop greenlighting features that only work under fantasy assumptions.
One more important point: cost discipline is not anti-user. It is pro-longevity. Users do not benefit from a feature that feels magical for three months and then gets rate-limited, downgraded chaotically, or hidden behind sudden paywalls because the economics never worked. The version that survives is the version that was priced honestly from the start.
If you absorb only one sentence from this lesson, make it this one: every frontier default is a pricing decision pretending to be a technical decision.
Once you see it that way, you cannot unsee it. And once you cannot unsee it, you are ready for the next move: build an eval set tight enough that you can prove when a cheaper system is good enough rather than arguing about it in meetings.
Rules from this lesson
- Always translate model choice into cost per useful outcome, not just cost per call or cost per token.
- Frontier by default is usually a pricing decision made lazily. Treat it with the same seriousness as any packaging or margin choice.
- Context bloat, verbose outputs, and retries are often bigger drivers of AI cost than teams admit. Fix the product shape before blaming the market price.
- Use premium models where error cost, user value, or workflow savings justify them. Do not subsidize routine traffic with frontier defaults.
- If traffic growth makes the team nervous, the AI feature is not economically sound yet.