The week the capex story broke
On 27 January 2025, Nvidia lost roughly $589 billion of market capitalisation in a single trading session, the largest one-day loss of market value by a single company in the history of the US stock market. The proximate cause was a research paper and a model release from a company most US investors had not heard of two weeks earlier: DeepSeek, a Hangzhou-based AI lab spun out of the Chinese quantitative hedge fund High-Flyer.
The headline that circulated on trading desks that morning — "a Chinese startup trained a frontier model for $6 million" — was simultaneously true, misleading, and beside the point. The number is real and it is from DeepSeek's own paper. It is also explicitly the cost of the final pre-training run, not the cost of the program. And the part of the story that mattered most for product strategy was not the dollar figure at all.
The real story is this: for most of 2023 and 2024, the AI industry's planning assumptions — the ones that justified hundred-billion-dollar capex commitments, the ones that anchored every "moat from scale" pitch deck, the ones that informed US export-control policy — were built on a price-per-frontier-capability curve that turned out to be substantially wrong. DeepSeek did not invent the techniques that broke the curve. It published a credible, reproducible, open-weights demonstration that the curve had already broken. The market read the paper and re-priced eighteen months of narrative in a single session.
For a product team, this case is not about geopolitics or stock prices. It is about what happens to your strategy when a planning assumption you treated as a constant turns out to be a variable, and how to build product organisations that survive that kind of correction.
What DeepSeek actually shipped
Two models, in two months, both released with open weights under permissive licences. The technical artefacts deserve to be described precisely, because the loose summaries that circulated in the days after release got most of the detail wrong.
DeepSeek-V3, December 2024. A 671-billion-parameter Mixture-of-Experts language model with roughly 37 billion active parameters per token. Trained on 14.8 trillion tokens. Released under an MIT-style licence with weights downloadable from Hugging Face. The accompanying technical report (arXiv:2412.19437) disclosed an unusual amount of detail about the training run, including the now-famous figure: 2.788 million H800 GPU-hours, valued at roughly $5.6 million at $2 per GPU-hour. On standard public benchmarks — MMLU, HumanEval, GSM8K, MATH, the long-context retrieval evals — V3 sat in the same band as GPT-4o and Claude 3.5 Sonnet (June 2024 era), the closed frontier of that moment.
DeepSeek-R1, January 2025. A reasoning model in the OpenAI o1 / o3 line: chain-of-thought trained, with significant inference-time compute spent on deliberation. The R1 paper described a reinforcement-learning recipe (DeepSeek-R1-Zero, then R1 with cold-start SFT) that produced reasoning traces competitive with o1 on AIME, MATH-500, and Codeforces. R1 was also released open-weights, alongside a family of distilled smaller models built on Qwen and Llama bases. The distilled 32B and 70B variants were good enough on most reasoning benchmarks that they immediately became the default open-source reasoning models for teams that could not (or would not) send their workloads to o1.
Two facts are worth holding together. First: the models are real. The benchmarks were independently reproduced within days by Western labs, by Hugging Face, and by independent evaluators. The capability claims are not marketing — they are checked. Second: the cost claim is narrower than the headlines made it sound. The $5.6 million is the cost of the final pre-training run for V3, at the GPU-hour price DeepSeek assumed. It does not include the cost of failed runs, prior experiments, the smaller models DeepSeek had already trained (V1, V2, the V2.5 series), the salaries of the research team, the cluster acquisition cost, or — and this matters — the research lineage from High-Flyer's prior trading-systems work that provided the initial GPU base.
DeepSeek's own paper is careful about this. The journalism downstream of it largely was not.
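The arithmetic behind the headline figure, and the scope caveat attached to it, can be written down in a few lines. This is a minimal sketch using only the numbers quoted above; the function name `describe_claim` is illustrative, and no dollar values are attached to the excluded items because the source gives none.

```python
# Figures quoted above from the V3 technical report.
GPU_HOURS = 2.788e6          # H800 GPU-hours for the final pre-training run
PRICE_PER_GPU_HOUR = 2.00    # USD per GPU-hour, the rate DeepSeek assumed

final_run_cost = GPU_HOURS * PRICE_PER_GPU_HOUR

# Items the text lists as outside the headline figure.
# They are named, not priced: the source does not give amounts for them.
EXCLUDED_FROM_HEADLINE = (
    "failed runs",
    "prior experiments",
    "earlier models (V1, V2, V2.5 series)",
    "research team salaries",
    "cluster acquisition",
    "High-Flyer research lineage and initial GPU base",
)


def describe_claim(cost_usd, excluded):
    """Render a cost claim together with what it leaves out."""
    lines = [f"Final-run cost: ${cost_usd / 1e6:.3f}M"]
    lines += [f"  not included: {item}" for item in excluded]
    return "\n".join(lines)


print(describe_claim(final_run_cost, EXCLUDED_FROM_HEADLINE))
```

The product of the two quoted inputs is $5.576 million, which is where the rounded "$5.6 million" comes from. Everything in the excluded list sits outside that number.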
What the market read into it
The market reaction on 27 January 2025 was not a sober reading of a technical report. It was a discontinuous re-pricing of a thesis. The thesis being re-priced was roughly: frontier AI capability requires hundred-thousand-GPU clusters, billion-dollar training runs, and access to the highest-end Nvidia hardware. That requirement creates a durable moat for the labs that have raised the capital to fund it, and a durable revenue stream for Nvidia, which sells the picks and shovels.
The DeepSeek release made three uncomfortable points against that thesis at once.
One: frontier-adjacent capability could be reached on H800s, the deliberately throttled China-market variant of the H100, the chip that US export controls had been designed to keep out of Chinese frontier labs. The export-control regime had assumed that hardware was the binding constraint. DeepSeek's result suggested that the binding constraint was actually engineering, and that engineering pressure created by the hardware ceiling had pushed DeepSeek to discover efficiencies that the unconstrained Western labs had not had to discover.
Two: the open-weights gap to the closed frontier was now a quarter, not a generation. The conventional wisdom inside US AI policy circles in 2024 had been that open-source models lagged the frontier by roughly twelve to eighteen months and that the gap was stable or widening. R1 collapsed that estimate. By February 2025, the most capable open-weights reasoning model in the world was Chinese, MIT-licensed, and free to download.
Three: the "$5.6 million" number, even with all its caveats, was off by an order of magnitude from the public guidance Western frontier labs had been giving about training costs. GPT-4 was widely reported to have cost over $100 million to train. Gemini Ultra estimates ran higher. Even allowing for the difference between "final run" and "program cost," the gap was not a small efficiency improvement — it was a structural break in the assumed cost curve.
The trillion-dollar AI infrastructure narrative (the one that justified the Stargate announcement by OpenAI, SoftBank, and Oracle that same January, that priced Nvidia at a $3 trillion market cap, that funded the hyperscaler capex commitments running into 2027) was built on the assumption that the cost curve was steep and would stay steep. The market saw evidence that it was not, and did the only thing markets do when a key assumption updates: re-priced everything downstream of it in a session.
The honest skepticism
A case study that ends with "China made it cheap, capex is over" would be wrong, and worse, would be repeating the same lazy reading that the January market re-pricing did. The DeepSeek result is consequential. It is also more nuanced than the social-media version suggested.
The training-cost figure is for one run, not a program. Stratechery's Ben Thompson, Dylan Patel and the SemiAnalysis team, and several academic analyses published in the weeks after R1's release converged on roughly the same correction: DeepSeek's total program cost (including failed runs, the V1/V2/V2.5 lineage, the cluster, salaries, and the High-Flyer prior) is plausibly in the $500M to $1B range over the multi-year arc, much closer to Western frontier-lab spend than the $5.6M headline suggested. The efficiency gain is real, but it is roughly a 5–10x improvement on per-final-run cost, not the 20x the headline implied.
What DeepSeek may have spent less on, relative to Western labs, is harder to verify but worth naming. Western frontier labs spend significantly on post-training safety: RLHF, red-teaming, constitutional AI, refusal training, jailbreak resistance. DeepSeek's models are noticeably less aggressive on refusals and noticeably more willing to discuss topics that Western models will decline. That is a content-moderation choice, but it is also a cost choice — the alignment tax is real, and it is paid in compute and in human labeller time. A fair comparison of "training cost" should separate "the run that produces a model that can answer questions" from "the program that produces a model an enterprise can safely deploy."
The benchmark parity is also more textured than the leaderboards suggest. DeepSeek-V3 and R1 are strong on math, coding, and structured reasoning, the domains where reinforcement learning from verifiable rewards has the most signal. They are weaker on long-tail knowledge, native multimodal reasoning, and long-horizon agentic tasks, where the Western frontier still leads as of May 2026. Public benchmarks systematically favour the domains DeepSeek was optimising for. Private evaluations, the kind product teams actually need before shipping, show a more mixed picture.
And the geopolitical context matters. DeepSeek is a Chinese company subject to Chinese law. The model is open-weights, which mitigates some of the data-sovereignty risk for self-hosting teams, but the lineage of the training data, the training process, and the corporate entity sits inside a regulatory regime that several Western jurisdictions have decided is incompatible with regulated workloads. The strategic question for a product team is not just "does the model perform" — it is "can I ship a product built on this model to my customers in their jurisdiction." For many enterprise-software companies, that answer in 2025 was no, regardless of capability.
The honest read: DeepSeek demonstrated that the cost of frontier-adjacent capability is falling faster than Western labs had publicly signalled, that hardware constraints are less binding than US policy assumed, and that the open-weights gap is months not years. It did not demonstrate that "AI is now cheap." It demonstrated that the floor of capable AI is cheaper than the ceiling of capable AI by more than the industry had priced in.
The 2026 reality
Sixteen months on, what stuck and what did not.
The DeepSeek-R1 distilled models are the default OSS reasoning baseline. Almost every open-weights reasoning model released in 2025 — from Llama 4's reasoning variants to Mistral's mid-2025 MoE family to the Qwen 3 reasoning line — traces lineage, technique, or distillation back to R1. The R1 recipe — RL from verifiable rewards, cold-start SFT, distillation into smaller bases — has become standard practice. The technical innovation outlived the news cycle.
The cost curve on open weights has collapsed roughly 5–10x for equivalent capability between January 2025 and May 2026. The mini-tier closed-frontier models (GPT-5-mini, Claude Haiku 4.5, Gemini Flash) priced down in response. The price floor of "good enough for most jobs" inference is now under a tenth of a cent per thousand tokens (under $1 per million tokens) for many workloads. The product-strategy implication is in The 2026 Model Landscape: most of what was "frontier capability" in 2024 is now a commodity input.
The closed frontier did not collapse. Anthropic, OpenAI, and Google kept the lead on the harder jobs — long-horizon agents, multimodal native reasoning, frontier coding. The DeepSeek line continued to ship — V4 and R2 in 2025, with credible-but-not-leading performance — but did not displace the Western frontier at the top of the capability stack. The convergence was on the middle of the stack, where most product work actually lives.
Nvidia's stock recovered most of the January 2025 drop within six months. The "AI is now cheap, capex is over" thesis turned out to be wrong in detail: capex commitments held and stayed elevated because inference volume grew faster than the cost per token fell. The lesson there is its own case study about the difference between cost per unit and total cost.
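The gap between cost per unit and total cost is easy to see with a toy calculation. The figures below are hypothetical, chosen only to illustrate the effect; they are not estimates of actual market prices or volumes.

```python
def total_spend(price_per_million_tokens: float, tokens: int) -> float:
    """Total inference spend in USD for a given unit price and token volume."""
    return price_per_million_tokens * tokens / 1_000_000


# Hypothetical "before" and "after" figures (assumptions, not market data):
# the unit price falls to one fifth while token volume grows tenfold.
PRICE_BEFORE, TOKENS_BEFORE = 10.0, 1_000_000_000
PRICE_AFTER, TOKENS_AFTER = 2.0, 10_000_000_000

before = total_spend(PRICE_BEFORE, TOKENS_BEFORE)
after = total_spend(PRICE_AFTER, TOKENS_AFTER)

unit_price_change = PRICE_AFTER / PRICE_BEFORE   # unit price multiplier
total_spend_change = after / before              # total spend multiplier

print(f"unit price x{unit_price_change:.1f}, total spend x{total_spend_change:.1f}")
```

Under these assumed numbers the price per token falls by 80 percent and total spend still doubles, which is the shape of the outcome described above.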
US export-control policy was reworked through 2025 in response to the DeepSeek result. The October 2025 chip-control update tightened restrictions on advanced packaging and HBM memory rather than on raw compute — an acknowledgement that the original "limit GPU access" framing had not constrained the capability outcome the policy was designed to constrain. Whether the new framing works is a 2027 question.
The strategic implications for product teams
Three things changed for product strategy on 27 January 2025. None of them is "switch your stack to DeepSeek."
Your "AI cost will only go down" planning assumption was right — but for the wrong reasons. Most product strategy decks written between 2022 and 2024 had a slide that assumed AI inference costs would decline 5–10x over three years. That slide turned out to be correct. The reasoning behind it — "the frontier labs will optimise their inference stacks and pass savings to developers" — turned out to be incomplete. The actual driver was open-weights competitive pressure forcing the closed labs to price down their mini-tiers. If you built your product on the assumption that your single closed-frontier provider would deliver the cost decline organically, you got the right answer for the wrong reason, and you are now exposed to the next correction in the same way.
Platform risk on a single frontier provider is higher in 2026 than it looked in 2024, not lower. This is counter-intuitive and important. When the gap between frontier and open-weights was eighteen months and stable, picking a frontier provider was a tolerable bet: the next-best alternative was weak enough that the capability advantage outweighed the cost of lock-in. When the gap is three months and the open-weights cohort is itself competitive, the direct cost of lock-in has not changed, but the opportunity cost of being unable to swap has risen sharply. Every quarter you cannot move your inference workload to a 5x-cheaper provider is real margin you are leaving on the table. (The 2026 Model Landscape develops this further; rule ai-93.)
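The "margin left on the table" claim can be turned into a rough back-of-envelope check. The sketch below assumes equivalent quality at the cheaper provider and uses hypothetical spend and switching-cost figures; both function names are illustrative.

```python
def forgone_margin_per_quarter(quarterly_spend: float, price_ratio: float) -> float:
    """Spend avoided each quarter by moving to a provider price_ratio times cheaper."""
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return quarterly_spend - quarterly_spend / price_ratio


def quarters_to_break_even(switching_cost: float, quarterly_spend: float,
                           price_ratio: float) -> float:
    """Quarters needed for the saving to repay a one-off switching cost."""
    saving = forgone_margin_per_quarter(quarterly_spend, price_ratio)
    if saving <= 0:
        return float("inf")
    return switching_cost / saving


# Hypothetical inputs (assumptions for illustration only).
QUARTERLY_SPEND = 300_000.0   # current inference bill per quarter, USD
SWITCHING_COST = 120_000.0    # one-off engineering cost to migrate, USD
PRICE_RATIO = 5.0             # the "5x-cheaper" provider from the text

print(forgone_margin_per_quarter(QUARTERLY_SPEND, PRICE_RATIO))
print(quarters_to_break_even(SWITCHING_COST, QUARTERLY_SPEND, PRICE_RATIO))
```

The useful output is the break-even horizon: the lower the switching cost, the shorter the period in which lock-in costs more than it saves.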
The case for an abstraction layer in your product just got stronger. In 2024, the argument for our_llm_call(prompt, model_class) was "future optionality is worth one engineer-week." In 2026, the argument is "the cost curve under your product is moving faster than your roadmap and the alternative is to manually refactor your inference layer every six months." Teams that built the abstraction layer in 2024 spent the January 2025 week evaluating DeepSeek as a config-flag experiment. Teams that did not spent the same week explaining to their boards why their unit economics had not improved. (Rules ai-94 and ai-95.)
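What such a layer might look like is sketched below. Only the name `our_llm_call(prompt, model_class)` comes from the text; the routing table, the `ModelRoute` type, the model-class labels, and the stub backends are assumptions made so the example runs without any provider SDK or network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A backend is any function that takes a prompt and returns text.
Backend = Callable[[str], str]


@dataclass
class ModelRoute:
    provider: str      # hypothetical label for whoever serves this class of work
    backend: Backend   # the function that actually produces the completion


# Stand-in backends; a real system would wrap provider clients here.
def _closed_frontier_stub(prompt: str) -> str:
    return f"[closed-frontier] {prompt}"


def _open_weights_stub(prompt: str) -> str:
    return f"[open-weights] {prompt}"


# The routing table plays the role of the config flag: swapping a provider
# is an edit here, not a change at every call site.
ROUTES: Dict[str, ModelRoute] = {
    "reasoning": ModelRoute("closed-frontier", _closed_frontier_stub),
    "bulk": ModelRoute("closed-frontier", _closed_frontier_stub),
}


def our_llm_call(prompt: str, model_class: str) -> str:
    """Dispatch a prompt to whichever backend currently serves model_class."""
    try:
        route = ROUTES[model_class]
    except KeyError:
        raise ValueError(f"unknown model_class: {model_class!r}") from None
    return route.backend(prompt)


# Trialling an open-weights model for one class of work is a one-line change.
ROUTES["bulk"] = ModelRoute("open-weights", _open_weights_stub)
print(our_llm_call("Summarise this ticket.", "bulk"))
```

Product code asks for a class of capability rather than a named vendor model, so an evaluation of a new model touches the routing table and nothing else.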
The deeper point is that capacity planning under genuine uncertainty about the cost curve is a different exercise from capacity planning under a known-steep cost curve. The product organisation that survives is the one whose architecture treats the model layer as a fast-moving commodity input rather than a load-bearing strategic dependency. That posture is cheap to adopt before you need it and expensive to retrofit after.
The geopolitics, stated carefully
A product team does not need a political opinion about US-China technology competition. It does need to understand that the regulatory environment around AI infrastructure is now a first-class strategic variable, not an afterthought.
Stated without editorial: US semiconductor export controls implemented from October 2022 onwards restricted Chinese access to the highest-end Nvidia training chips. Chinese frontier labs, DeepSeek among them, worked on H800s (Nvidia's deliberately throttled China-market variant) and on a mix of domestic alternatives. The H800 has roughly half the chip-to-chip interconnect (NVLink) bandwidth of the H100, which is why so much of the engineering effort went into communication. Training a frontier-competitive model on H800s required engineering choices (fine-grained pipeline parallelism, custom communication kernels, FP8 mixed precision, the DualPipe scheduling described in the V3 paper) that an unconstrained lab with H100s would not have had the incentive to develop. The export-control regime, designed to slow Chinese AI capability development, created the engineering pressure that produced the efficiencies that re-priced the Western capex narrative. The policy did not fail because the policy was poorly executed; it failed because the underlying model of how AI capability scales was incomplete.
For a product team, the strategic reality is this: the geopolitical posture of your provider stack is now a real input to your risk model. If your product serves regulated industries, you need to know whether your inference layer can be served by providers in the jurisdictions your customers are in. If your product depends on capability gains that are concentrated in one geopolitical bloc's labs, you are exposed to policy decisions you do not control. The DeepSeek case is the clearest illustration we have that this exposure is not theoretical.
This is not a case for or against any policy. It is a case for putting "regulatory jurisdiction of my model provider" on the same row in your risk register as "uptime of my model provider."
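One way to make that concrete is to give jurisdiction its own fields next to uptime in whatever structure holds provider risk. The sketch below is illustrative only: the `ProviderRisk` type, its field names, the provider name, and the region codes are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ProviderRisk:
    provider: str
    uptime_slo: float               # availability target, e.g. 0.999
    regulatory_jurisdiction: str    # where the provider entity is regulated
    serving_regions: FrozenSet[str] # where inference can actually be served


def blocked_regions(risk: ProviderRisk, customer_regions: Iterable[str]) -> List[str]:
    """Return the customer regions this provider cannot serve in-region."""
    return sorted(set(customer_regions) - risk.serving_regions)


# Hypothetical register entry and customer footprint.
entry = ProviderRisk(
    provider="example-provider",
    uptime_slo=0.999,
    regulatory_jurisdiction="US",
    serving_regions=frozenset({"US", "UK"}),
)
print(blocked_regions(entry, ["US", "EU", "UK"]))
```

A non-empty result is the jurisdictional equivalent of an outage: a set of customers the current provider stack cannot serve.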
What this case teaches
The judgment lessons, tagged to the AI Manual rules that frame them in The 2026 Model Landscape and Building With AI vs. Building AI Products.
- Never trust capex narratives that justify lock-in. When a vendor tells you that their scale is a permanent moat, they are describing the present and projecting it. The DeepSeek result is the cleanest recent example of a "permanent" capability moat eroding in months. (Rule ai-93.)
- The open-weights gap closes faster than vendors predict. Every public estimate of the OSS-to-frontier gap from 2023 and 2024 was too long. Assume the next estimate is too long as well. Build for a six-month gap, not a two-year one. (Rule ai-92.)
- Your abstraction layer is your hedge. The engineering cost of our_llm_call() is paid once; the cost of not having it is paid every time the cost curve under you moves. Teams that built the layer in 2024 monetised the DeepSeek release as a tailwind. Teams that did not, did not. (Rule ai-94.)
- Separate "final run" from "program cost" in every vendor claim. This is the discipline the DeepSeek headlines failed. When a provider, internal team, or competitor reports a training or inference cost figure, ask what is included and what is not. The interesting number is total cost to a deployable model, not the cheapest sub-component of it.
- Public benchmarks are a vendor's marketing budget. Capability claims on MMLU, HumanEval, and the rest are useful as ceiling indicators, not as deployment signals. The thing that decides whether a model survives in your product is your private eval (Eval Before Launch), run on every model release including the ones that do not make the trade press. (Rule ai-89.)
- Convergence is the trend; commoditisation is the mechanism. The 2026 landscape (six frontier-class labs, an open-weights cohort closing fast, mini-tier pricing collapsing) is a single story about middle-of-the-stack commoditisation. Plan as if the middle of the stack is a fast-moving commodity input and the top of the stack is a slower-moving specialty good. (Rule ai-90.)
- Geopolitical jurisdiction is a stack decision. Where your model is served from, who regulates the provider, and which jurisdictions your customers will accept are first-class inputs to your architecture in 2026, not a compliance afterthought. (Rule ai-93.)
- Distinguish "cost is going down" from "cost is going down for the reason I think it is." The cost-down assumption was right; the assumed mechanism was wrong; the strategic implications of "right answer, wrong reason" are different from the implications of "right answer, right reason." Audit the mechanism, not just the conclusion.
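The private-eval lesson above is the one most easily reduced to code. The following is a minimal sketch, assuming a model can be wrapped as a function from prompt to text; the case format, the `run_private_eval` name, and the two stub models are hypothetical stand-ins, not a real eval suite.

```python
from typing import Callable, List, Tuple

# One private case: a prompt plus a check applied to the model's output.
EvalCase = Tuple[str, Callable[[str], bool]]


def run_private_eval(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of private cases the model passes."""
    if not cases:
        raise ValueError("no eval cases supplied")
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


# Hypothetical cases; a real suite would come from your own product traffic.
CASES: List[EvalCase] = [
    ("What is 2 + 2?", lambda out: "4" in out),
    ("Capital of France?", lambda out: "Paris" in out),
]


# Stub models standing in for two candidate releases.
def model_a(prompt: str) -> str:
    return {"What is 2 + 2?": "4", "Capital of France?": "Paris"}.get(prompt, "")


def model_b(prompt: str) -> str:
    return "4"


for name, model in [("model_a", model_a), ("model_b", model_b)]:
    print(name, run_private_eval(model, CASES))
```

Because every candidate is scored on the same private cases, a new release becomes a row in a comparison table before it becomes a migration decision.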
Sources
- DeepSeek-AI. "DeepSeek-V3 Technical Report." arXiv:2412.19437, December 2024.
- DeepSeek-AI. "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning." arXiv:2501.12948, January 2025.
- Patel, Dylan et al. "DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts." SemiAnalysis, January 2025.
- Thompson, Ben. "DeepSeek FAQ." Stratechery, 27 January 2025.
- Wall Street Journal, Financial Times, Bloomberg coverage, 27–31 January 2025, on Nvidia market-cap movement and the broader AI-stock re-pricing.
- US Bureau of Industry and Security export-control updates, October 2022 and October 2023, defining the H800 hardware regime DeepSeek trained on.