Air Canada: the chatbot that made a promise the airline had to keep

The case every AI PM thinks won't happen to them

The Moffatt ruling is the case every AI product team quietly assumes won't apply to them. The reason it will is that the conditions that produced it are extraordinarily mundane: a widely deployed customer-support chatbot, a policy page that lived on the same website, a user with a reasonable question, and a generative system that filled the gap between the two with a plausible-sounding paragraph that happened not to be true. There is no exotic failure here. There is no jailbreak, no adversarial prompt, no obscure edge case. The failure is the ordinary behaviour of a language model deployed without grounding on a surface where the company's own customers were entitled to rely on what it said.

The reason this case matters more than its CAD 812.02 award suggests is that it is the first time a tribunal in a common-law jurisdiction wrote down, in a publishable decision, the rule that AI product teams had been hoping was still ambiguous: a chatbot's words are the company's words. That sentence reads like a truism. It is not. Until February 2024, a meaningful number of legal teams, vendor pitches, and product specs treated AI output as something between marketing copy and a third-party recommendation — present on the company website, not quite of the company. The tribunal closed that door.

The facts

In November 2022, Jake Moffatt's grandmother died. He went to Air Canada's website to book flights between Vancouver and Toronto for the funeral. Air Canada operates a bereavement-fare programme — discounted travel for passengers flying to attend an immediate family member's funeral or to visit a seriously ill family member. The programme is real, documented, and longstanding.

Moffatt opened the chatbot on Air Canada's homepage and asked how bereavement fares worked. The chatbot told him he could book at the regular fare and apply for a partial refund under the bereavement policy within 90 days of the ticket being issued. It produced this answer with no hedging, and its only pointer to the actual policy was a hyperlink on the words "bereavement fares" that led to the separate policy page. Moffatt booked the tickets at the regular fare, flew, and submitted a refund request a few days later, within the window the chatbot had quoted.

Air Canada refused the refund. The actual bereavement policy — published on a different page of the same Air Canada website — required customers to apply before travelling. Retroactive claims were not permitted. The chatbot had invented the post-travel refund window.

Air Canada first responded by offering Moffatt a CAD 200 coupon and suggesting that he should have read the policy page. Moffatt declined and filed a claim at the British Columbia Civil Resolution Tribunal (the BCCRT), the small-claims body that handles disputes under CAD 5,000.

The defence — "the chatbot is a separate legal entity"

The most striking thing in the public record of this case is not the chatbot's hallucination. It is Air Canada's defence.

Air Canada argued, in its written submissions to the tribunal, that the chatbot was a separate legal entity responsible for its own outputs. The position was extraordinary on its face: the company that built, deployed, branded, and embedded the chatbot on its homepage asked a tribunal to treat that chatbot as a third party whose representations the airline did not own.

Tribunal member Christopher Rivers wrote that he found this argument "remarkable." The published decision (Moffatt v. Air Canada, 2024 BCCRT 149) notes that while a chatbot has interactive elements, it is still part of Air Canada's website, that Air Canada is responsible for all the information on its website, and that it makes no difference whether the information comes from a static page or a chatbot.

That sentence is the precedent. The tribunal articulated, in language that other tribunals and courts will now have access to, the rule that a company cannot decompose its own website into surfaces it owns and surfaces it disclaims. A user talking to a chatbot on Air Canada's homepage has no reason to suspect they are interacting with anything other than Air Canada itself. A representation made by the chatbot is a representation made by the airline.

Rivers also addressed Air Canada's secondary argument — that the correct policy was elsewhere on the site and Moffatt should have read it. The tribunal rejected this too. The chatbot's confident statement was itself a representation. A reasonable user, the tribunal held, has no obligation to cross-check one part of a company's website against another part. The duty of accuracy lies with the company, not with the customer.

The ruling

On February 14, 2024, the BCCRT ordered Air Canada to pay Moffatt CAD 650.88 in damages (the difference between what he paid and what he would have paid under the bereavement fare), CAD 36.14 in pre-judgment interest under the Court Order Interest Act, and CAD 125 in tribunal fees. Total award: CAD 812.02.

The dollar value is small. The precedential value is not. The decision was picked up within 48 hours by the CBC, the Vancouver Sun, Reuters, the BBC, Wired, and almost every technology trade publication in North America and Europe. The framing was uniform: a tribunal had ruled that a company was responsible for what its AI chatbot said. The ruling has since been cited in regulatory commentary in the US, UK, EU, and Australia. India does not have a Moffatt-equivalent decision, but the Consumer Protection Act 2019 and the DPDP Act 2023 already treat companies as responsible for representations made on their behalf. The direction of travel is unambiguous.

What changed

Within weeks of the ruling, Air Canada quietly took the chatbot off its public website. The airline made no public announcement and gave no detailed explanation. The chatbot's homepage placement was replaced with a more traditional help-search interface. The company has not publicly described what its AI customer-support roadmap now looks like; the inference any reasonable observer makes is that the legal cost of an unbounded generative chatbot on a regulated, contract-bearing surface was no longer worth the operational saving.

Across the airline industry, the story moved faster. Legal teams at major carriers issued internal reviews of every AI-driven customer-facing surface. Several airlines and travel companies that had been about to deploy generative-AI agents on policy-bearing flows pulled those launches back into eval. Vendor pitches that had promised "deflection-focused" chatbots — agents that resolve customer queries without human intervention — started being read more carefully. The deflection rate is a benefit; the liability per deflected query is now a cost line that has to sit next to it.

The broader effect was not limited to airlines. Banks, telcos, insurers, healthcare providers, and tax software vendors — every business that has both a contract with its customers and a deployed AI assistant — used the Moffatt ruling as the artefact that finally got executive attention on a question product teams had been asking for two years: where exactly is this chatbot allowed to be wrong, and who pays when it is?

What the "separate entity" defence reveals

The most useful diagnostic in this case is not the hallucination. It is the defence Air Canada chose. Companies rarely invent legal arguments from nothing when they file formal submissions. They surface arguments that reflect what they have actually been thinking about a system internally. The "separate entity" defence was not a fluke invention; it was most plausibly the externalisation of a mental model the airline's product, vendor, and legal teams had already accepted privately.

That mental model is widespread. It shows up whenever a PM describes their chatbot as "powered by [vendor]," whenever a launch deck contains the phrase "AI-generated content may be inaccurate," whenever a customer-support transcript carries a disclaimer that the assistant's responses "should not be relied upon for binding decisions." Each of these is a soft version of the same defence: the company is gesturing toward a firewall between itself and the model it deployed. The Moffatt ruling demolishes that firewall as a matter of law. It cannot be rebuilt by a disclaimer.

The corollary, for any AI PM, is uncomfortable: the question to ask before launch is not "have we disclaimed enough?" The question is "what is this surface allowed to say, what is it allowed to abstain on, and where does the human sit when it is asked something that could create a contract?" If those three questions are not answered in the spec, the legal exposure is not theoretical. The Moffatt ruling has made it concrete.

The actual cost of a hallucination

If you read the Moffatt case and conclude that the cost of a chatbot hallucination is CAD 812, you have read it wrong. The actual cost has four components, and only the first is the small-claims award.

The legal precedent itself is the second component. Every customer-facing AI deployment in a comparable jurisdiction now has to be designed against the assumption that the company is liable for what the assistant says. The cost of that constraint shows up as eval budget, as retrieval infrastructure, as human-review tooling, as topic guards on policy-bearing surfaces, and as the slower roadmap that all of those introduce.

The brand cost is the third component. Air Canada is now the canonical example, in every AI-ethics talk and every regulator's briefing pack, of an AI deployment that produced a tribunal-ordered embarrassment. The airline did not choose this position; it inherited it the moment it argued that its chatbot was someone else. The cost of being the canonical example is the cost of having every subsequent AI launch the company attempts read against the precedent of this one.

The fourth component is the policy-review burden. Air Canada's legal team — and every comparable team at every comparable company — now has to audit every AI surface against the question Moffatt established: would a reasonable user have understood this output as a representation of the company? That audit is not free. It pulls legal cycles away from new product, slows launches, and creates internal friction on roadmap items that previously moved without scrutiny.

The PM lesson is that the rational way to budget for hallucination risk on customer-facing surfaces is not "expected value of a small-claims judgment per failed interaction." It is the sum of all four costs, weighted by the probability that a given interaction produces a publicly litigable failure. That sum is far higher than the per-interaction maths suggests, because the precedent and the brand cost are paid once for the whole product line, not once per interaction.
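One way to make that comparison concrete is the rough sketch below. It is an illustrative model, not a method taken from the ruling: it assumes failures are independent across interactions, and the function names, the failure probability, and the one-time cost figure used afterwards are all hypothetical placeholders. Only the CAD 812.02 award comes from the case.

```python
def naive_expected_cost(n_interactions: int, p_failure: float, award: float) -> float:
    """Per-interaction view: each litigable failure costs one small-claims award."""
    return n_interactions * p_failure * award


def full_expected_cost(
    n_interactions: int,
    p_failure: float,
    award: float,
    one_time_costs: float,
) -> float:
    """Adds the costs paid once for the whole product line (precedent, brand,
    policy review), incurred if at least one litigable failure occurs.

    Assumption: failures are independent across interactions.
    """
    p_at_least_one = 1.0 - (1.0 - p_failure) ** n_interactions
    per_interaction = naive_expected_cost(n_interactions, p_failure, award)
    return per_interaction + p_at_least_one * one_time_costs


if __name__ == "__main__":
    # Hypothetical inputs: one million chats, a one-in-a-million failure rate,
    # and a made-up CAD 5M for the three one-time components combined.
    print(naive_expected_cost(1_000_000, 1e-6, 812.02))
    print(full_expected_cost(1_000_000, 1e-6, 812.02, 5_000_000.0))
```

With inputs like these, the one-time term dominates the total even though the expected number of failures is about one, which is the point the paragraph above makes in prose.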

Where this should have been blocked

The Moffatt case is, technically, a retrieval failure dressed up as a hallucination. Air Canada's bereavement policy existed, in correct form, on its own website. A grounded retrieval-augmented system pointed at the correct policy page would very likely have answered Moffatt's question accurately: the model would have been given the actual policy text in its context and asked to summarise it. Air Canada has not published a post-mortem, but the output is consistent with a system that instead operated on parametric memory (whatever the underlying model had learned about bereavement fares in the abstract, mixed with whatever instructions the chatbot prompt provided) and produced a plausible-sounding answer that did not correspond to Air Canada's actual rules.

That distinction matters because it tells the PM where the architectural fix sits. The fix is not a better prompt. The fix is not a model upgrade. The fix is retrieval against the live policy corpus, with topic guards that route any policy-bearing query through that corpus, with abstention on questions where retrieval returns nothing relevant, and with a human handoff on the categories of question where being wrong creates a contractual representation. None of these are exotic. All of them existed as published patterns in 2022. The deployment that failed Moffatt did not have them.
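The ordering of those checks can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not a description of Air Canada's system: the `Route` names, the `decide_route` signature, and the 0.75 threshold are hypothetical, and the two boolean inputs stand in for whatever classifiers a real deployment would use.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    GENERAL = "general"    # no policy content involved
    GROUNDED = "grounded"  # answer only from retrieved policy text
    ABSTAIN = "abstain"    # retrieval found nothing relevant enough
    HUMAN = "human"        # being wrong would create a contractual representation


def decide_route(
    is_policy: bool,
    is_contractual: bool,
    best_score: Optional[float],
    min_score: float = 0.75,  # hypothetical relevance threshold
) -> Route:
    """Pick a route for one query. best_score is the top retrieval score, or None."""
    if is_contractual:
        # Contract-bearing categories go to a person regardless of retrieval.
        return Route.HUMAN
    if not is_policy:
        return Route.GENERAL
    if best_score is None or best_score < min_score:
        return Route.ABSTAIN
    return Route.GROUNDED
```

The design choice worth noting is that the contractual check comes first, so a confident retrieval score can never override the handoff.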

The "AI said" disclaimer that some products are now adding next to chatbot outputs is a useful UX honesty signal but is legally weak. The tribunal in Moffatt already rejected the variant of this argument the company made. A disclaimer that the assistant may be inaccurate does not relieve the company of responsibility for the inaccuracy. It may help a customer understand the surface they are on; it does not transfer the liability.

The PM playbook before launch

What every AI PM should be able to answer, in writing, before a customer-facing assistant ships:

Bounded knowledge base. What is the assistant allowed to know? The answer should be a specific, versioned corpus — the live policy pages, the help-centre articles, the product documentation — not "the public internet" and not "the model's training data." The assistant should be configured to retrieve from that corpus and to be unable to answer from outside it on any topic where being wrong is contractual.
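A "specific, versioned corpus" might look something like the sketch below, which fingerprints each document so that drift between the deployed corpus and the live pages can be detected. The class names, the `stale_documents` helper, and the idea of comparing SHA-256 digests are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PolicyDocument:
    url: str
    text: str

    @property
    def digest(self) -> str:
        """Content fingerprint used to detect edits to the page."""
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Corpus:
    version: str
    documents: Tuple[PolicyDocument, ...]

    def manifest(self) -> Dict[str, str]:
        """Map each document URL to its content digest."""
        return {doc.url: doc.digest for doc in self.documents}


def stale_documents(deployed: Corpus, live: Corpus) -> List[str]:
    """URLs whose live text differs from, or is missing in, the deployed corpus."""
    deployed_manifest = deployed.manifest()
    return sorted(
        url
        for url, digest in live.manifest().items()
        if deployed_manifest.get(url) != digest
    )
```

A non-empty result from `stale_documents` would be a reasonable trigger for re-indexing before the assistant answers from that page again.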

Abstention on policy questions. Refund policies, eligibility rules, prices, cancellation windows, regulatory disclosures, and anything else that could constitute a representation of the company should be classed as policy questions. The assistant should be prompted, and ideally fine-tuned, to abstain on these unless retrieval returns a high-confidence match against the live corpus. The abstention path should be a first-class UI state, not an error.
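Treating abstention as a first-class state, rather than an exception, could be modelled roughly as follows. The type names, the 0.8 threshold, and the UI component names are hypothetical; the abstention wording is the one this piece recommends later on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    ANSWERED = "answered"
    ABSTAINED = "abstained"  # a normal outcome, deliberately not an error


@dataclass
class Reply:
    state: State
    text: str
    source_url: Optional[str] = None


ABSTAIN_TEXT = "I do not have that information; let me connect you to an agent."

# Hypothetical UI mapping: both states render a normal card.
UI_COMPONENT = {State.ANSWERED: "answer_card", State.ABSTAINED: "handoff_card"}


def reply_to_policy_question(
    passage_text: Optional[str],
    source_url: Optional[str],
    score: float,
    threshold: float = 0.8,  # hypothetical confidence floor
) -> Reply:
    """Answer from the retrieved passage only when retrieval is confident."""
    if passage_text is None or score < threshold:
        return Reply(State.ABSTAINED, ABSTAIN_TEXT)
    return Reply(State.ANSWERED, passage_text, source_url)
```

Because `ABSTAINED` is an ordinary value of `State`, the metrics pipeline can count it alongside answers instead of lumping it in with failures.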

Route to a human. On any abstention, on any low-confidence retrieval, and on any keyword pattern that indicates contractual stakes (refund, cancellation, eligibility, complaint, legal, regulator), the assistant should hand off to a human channel with the conversation context preserved. The handoff should be the default for these surfaces, not the escalation path.
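The keyword trigger described above can be sketched with a single regular expression. The pattern below uses the six keywords from the paragraph, matched as word prefixes so that forms like "refunds" and "cancellation" also fire. The `Handoff` type and `maybe_handoff` function are hypothetical, and the sketch covers only the keyword path, not the abstention or low-confidence triggers.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Prefix matching: "cancel" also catches "cancelled" and "cancellation".
CONTRACTUAL_PATTERN = re.compile(
    r"\b(refund|cancel|eligib|complain|legal|regulat)\w*", re.IGNORECASE
)


@dataclass
class Handoff:
    reason: str
    transcript: List[str] = field(default_factory=list)


def maybe_handoff(message: str, transcript: List[str]) -> Optional[Handoff]:
    """Return a Handoff carrying the full conversation if the message has contractual stakes."""
    match = CONTRACTUAL_PATTERN.search(message)
    if match is None:
        return None
    return Handoff(
        reason=f"keyword:{match.group(0).lower()}",
        # Preserve context so the human agent does not start from zero.
        transcript=[*transcript, message],
    )
```

A keyword list is a blunt instrument and will miss paraphrases, so in practice it would more plausibly sit in front of a classifier than replace one.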

Audit log. Every assistant interaction should be logged at the granularity of prompt-in, retrieved-context, prompt-out, with timestamps and user identifiers. The Moffatt case was provable because Moffatt had screenshots. The next case will be provable because the company kept logs. If you cannot reconstruct what your assistant said to a specific user at a specific time, you have a discovery problem in addition to a hallucination problem.
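A log at that granularity can be as simple as one JSON object per line. The sketch below is a minimal illustration. The field names follow the paragraph above, while the function names and the choice of JSON Lines are assumptions, and retention, access control, and redaction are deliberately left out.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import IO, Iterable, List


@dataclass
class AuditRecord:
    timestamp: str
    user_id: str
    prompt_in: str
    retrieved_context: List[str]
    prompt_out: str


def log_interaction(
    sink: IO[str],
    user_id: str,
    prompt_in: str,
    retrieved_context: List[str],
    prompt_out: str,
) -> AuditRecord:
    """Append one interaction to the sink as a single JSON line."""
    record = AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id=user_id,
        prompt_in=prompt_in,
        retrieved_context=list(retrieved_context),
        prompt_out=prompt_out,
    )
    sink.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")
    return record


def interactions_for_user(lines: Iterable[str], user_id: str) -> List[dict]:
    """Reconstruct everything the assistant said to one user."""
    records = (json.loads(line) for line in lines if line.strip())
    return [r for r in records if r["user_id"] == user_id]
```

Logging the retrieved context alongside the output is what lets a reviewer later tell a retrieval miss apart from a generation error.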

Pre-launch eval against the surface's actual job. Eval sets should be built from the questions real users actually ask the system, weighted by the contractual stakes of each question. A chatbot that answers "what is your hub airport" with 99% accuracy and "what is the refund window for a bereavement fare" with 80% accuracy is not 89.5% accurate; it is unfit for purpose. The eval has to be stratified by stakes, not averaged across them.
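A stratified gate of that kind might be sketched as follows. The stratum labels, the per-stratum floors, and both function names are hypothetical; the idea is only that each stakes tier must clear its own bar, so a strong low-stakes tier cannot mask a weak high-stakes one.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def accuracy_by_stratum(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """results: (stratum, is_correct) pairs, one per eval question."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for stratum, ok in results:
        totals[stratum] += 1
        correct[stratum] += int(ok)
    return {s: correct[s] / totals[s] for s in totals}


def failing_strata(per_stratum: Dict[str, float], floors: Dict[str, float]) -> List[str]:
    """Strata below their required floor. An empty list means the gate passes.

    A stratum with no eval results counts as failing.
    """
    return sorted(
        s for s, floor in floors.items() if per_stratum.get(s, 0.0) < floor
    )


if __name__ == "__main__":
    # The 99% / 80% split from the paragraph above, 100 questions per tier.
    results = (
        [("low_stakes", True)] * 99 + [("low_stakes", False)] * 1
        + [("high_stakes", True)] * 80 + [("high_stakes", False)] * 20
    )
    per = accuracy_by_stratum(results)
    # Hypothetical floors: the contractual tier is held to a stricter bar.
    print(failing_strata(per, {"low_stakes": 0.95, "high_stakes": 0.99}))
```

Pooled, those 200 questions score 89.5%, which looks shippable; gated per stratum, the high-stakes tier blocks the launch.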

What this case teaches

The Moffatt ruling collapses neatly into a small number of lessons an AI PM can act on this week.

The model speaks for the company. Always. The "separate entity" defence is closed. Any output an AI surface produces on a company-branded property is a representation of the company, and the company is liable for it. This is the substance of Rule ai-39 and the reason architectural defences matter more than prompt tweaks.

Hallucination is permanent. Plan the surface around that. A future model release will not retire this case. The model's tendency to produce plausible-sounding falsehoods on under-grounded queries is structural, not a defect to be patched. The product response is to constrain the surface — retrieval, abstention, handoff, human review — rather than to wait for the lab. This is Rule ai-33.

Abstention is not a degraded state. It is a feature. "I do not have that information; let me connect you to an agent" is the correct output on a policy question where retrieval returned nothing. A system that fabricates an answer rather than abstain is not more helpful; it is more dangerous. Reward abstention in the prompt, in the UI, and in the metrics. This is Rule ai-37.

Architecture, not prompt, is where you fix this. The Moffatt failure was a retrieval failure dressed as a hallucination. A system pointed at the live policy corpus, with policy-question routing, would not have invented the refund window. If your assistant is answering questions you have a database for, fix the architecture before you fix the prompt. This is Rule ai-40.

Disclaimers do not transfer liability. The "AI may be inaccurate" pill at the bottom of the assistant is a UX honesty signal. It is not a legal shield. The tribunal already rejected the strong form of this argument and would reject the weak form too. Build the product as if the disclaimer were absent, because legally it is.

The audit log is the artefact. If you cannot reproduce exactly what your assistant said to a specific user at a specific time, you will not be able to defend the interaction when it goes wrong. The log is part of the product, not a nice-to-have. Build it before launch, retain it on a documented schedule, and make sure legal can pull from it on request.

The eval has to be weighted by stakes. A high average accuracy across a flat question set is a misleading metric. Stratify the eval by what the question costs the company when answered incorrectly. The questions that determine whether you ship safely are not the ones users ask most often; they are the ones whose wrong answers create representations. See "Safety, privacy, and compliance for shipping teams" for the regulatory framing this generalises into.

Take the small case seriously. The Moffatt award was CAD 812. The cost to Air Canada — in brand, in precedent, in legal review burden, in product retraction — is several orders of magnitude larger and still compounding. Every AI PM should assume their first publicly-litigated chatbot failure will follow the same shape: small dollars, large consequences, on the public record forever.

Sources

Moffatt v. Air Canada, 2024 BCCRT 149 (British Columbia Civil Resolution Tribunal, February 14, 2024) — the primary decision, publicly available on the BCCRT decisions database.
CBC News, "Air Canada chatbot promised a discount. Now the airline has to pay it" (February 16, 2024).
Vancouver Sun, coverage of the BCCRT ruling and Air Canada's response (February 2024).
BBC, Reuters, and Wired follow-up coverage on the precedential implications of the ruling for AI-deployed customer-service systems (February–March 2024).
Air Canada's own public statements: limited to brief on-record comments at the time of the ruling; no detailed product post-mortem has been published.