Claude Fable 5: What It Actually Does for Business

Claude Fable 5 puts Anthropic's Mythos-class AI on general sale. Benchmarks, honest pricing maths, early case studies — and where it pays off for European B2B.

12 June 2026

Claude Fable 5: what Anthropic's Mythos-class model actually does for business

TL;DR

On June 9, 2026, Anthropic released Claude Fable 5, the first Mythos-class model anyone can buy. It is the same underlying model as the restricted Claude Mythos 5, with safety classifiers that route fewer than 1 in 20 sessions to Claude Opus 4.8 instead (Anthropic).
The benchmarks are not subtle: 80.3% on SWE-Bench Pro vs GPT-5.5’s 58.6%, and more than double Opus 4.8’s score on Cognition’s hardest coding split (Vellum).
The headline case study: Stripe used it to complete a codebase-wide migration across 50 million lines of Ruby in one day — previously estimated at two-plus months for a whole team.
Pricing is $10 per million input tokens, $50 per million output. The sticker says 2× Opus 4.8. The new tokeniser, which counts ~30% more tokens for the same content, makes it closer to 2.6× in practice.
The real shift for business isn’t the benchmark table. It’s that the unit of delegation moved from task to objective — early users report 9+ hour autonomous runs from a single brief.
And the anti-hype data point nobody is posting on LinkedIn: in Andon Labs’ Vending-Bench simulation, the unrestricted Mythos 5 made less money running a vending machine than two older, cheaper models. Capability is not judgement.

What Is Claude Fable 5 (and Why There Are Two Names)

In April, Anthropic told the world its best model was too dangerous to release. In June, it started selling it. Claude Fable 5 is that model — declawed in three specific places — and it changes what a business can reasonably hand to software.

The April story, if you missed it: Anthropic built a Mythos-class model that found thousands of zero-day vulnerabilities, including a 27-year-old bug in OpenBSD, then withheld it from the public and routed it into Project Glasswing, a $100M defensive coalition with AWS, Apple, Google, Microsoft and others. We covered it in detail at the time.

Two months later, the same capability tier arrived in two packages (Anthropic, June 9, 2026):

Claude Fable 5 — generally available. Classifiers watch every request for three things: offensive cybersecurity work, dual-use biology and chemistry, and attempts to distil the model’s capabilities. Flagged requests get answered by Claude Opus 4.8 instead. Anthropic reports this fallback triggers in under 5% of sessions.
Claude Mythos 5 — the same model without those restrictions, available only to Project Glasswing partners and selected biology researchers. Existing Mythos Preview users were upgraded automatically.

Availability moved unusually fast for an enterprise-grade launch. Fable 5 shipped on day one on the Claude API, in GitHub Copilot (Pro+, Business and Enterprise plans), in Microsoft Foundry, and inside Cursor, Devin, Replit, Notion and Cline. Claude subscribers on Pro, Max and Team plans get it included until June 22, 2026; after that it draws on usage credits.

The specs that matter: a 1-million-token context window, 128K maximum output, and reasoning that is always on — you cannot switch extended thinking off, only tune its effort.

The Benchmarks, and Which Ones Matter

The numbers first, because every vendor deck you see this quarter will quote them (Vellum’s benchmark analysis; Latent.Space launch recap):

Benchmark	Claude Fable 5	Claude Opus 4.8	GPT-5.5	What it measures
SWE-Bench Pro	80.3%	69.2%	58.6%	Hard real-world software engineering
FrontierCode (Diamond)	29.3%	13.4%	—	Frontier-difficulty coding tasks
Terminal-Bench 2.1	88.0%	—	83.4%	Agentic work in a terminal
Humanity’s Last Exam	53%	—	~46%	Raw reasoning on near-impossible questions
GDP.pdf (vision, no tools)	29.8%	22.5%	24.9%	Reading dense real-world documents
Artificial Analysis Index	64.9 (#1)	—	~60	Composite intelligence index

Two details in that table deserve more attention than the table itself.

First, Fable 5’s 80.3% on SWE-Bench Pro is above the 77.8% that Mythos Preview scored in April — the model Anthropic held back as too capable to sell. The thing on general sale today outperforms the thing that was locked in a vault eight weeks ago. That is the actual pace of this market.

Second, the benchmark that predicts business value isn’t in the table, because it isn’t a benchmark. It’s duration. Anthropic’s own memory evaluation found that giving Fable 5 a persistent file-based memory improved its performance three times more than the same setup improved Opus 4.8. The model isn’t just smarter per request. It stays coherent across hours of work, which is a different commodity.

The Real Shift: You Stop Assigning Tasks and Start Assigning Objectives

Every model generation since 2023 has been sold as “smarter”. The honest version of this launch is narrower and more useful: Fable 5 changes the size of the unit of work you can delegate.

The launch-week reports are consistent on this. Wharton’s Ethan Mollick handed it a 15-page design document and reported it working for more than nine hours without intervention. Every’s Dan Shipper described routinely burning 500K to 1M tokens on a single task — a volume that would have collapsed into incoherence on earlier models. Slack engineer-turned-builder Felix Rieseberg put the pattern in one line: the shift is from giving it tasks to giving it objectives and responsibilities (Latent.Space).

Andrej Karpathy — not a man prone to vendor enthusiasm — called it a “major-version-bump-deserving step change”.

We can add a first-hand data point. We’ve run Fable 5 inside Claude Code since launch week, and this article was researched and drafted in one of those sessions — the model fact-checking coverage of itself, which is either delightful or unsettling depending on your disposition. Two honest observations from that experience:

Single responses get long. A hard request can run several minutes while the model gathers sources, cross-checks and verifies. If your team’s working pattern is “type, wait, read”, Fable 5 will feel slow. If the pattern is “brief it, do something else, review the result”, it feels like a contractor.
The brief matters more than the prompt. The old skill — coaxing a model step by step — actively hurts here. What works is what works with a good freelancer: full context up front, clear definition of done, then leave it alone.

That second point is the one we keep repeating to clients: agentic delegation is process redesign, not software development. Fable 5 raises the ceiling on what the process can absorb. It doesn’t redesign the process for you.

What Early Adopters Did With It in Week One

The case study Anthropic led with deserves its detail. Stripe, testing Fable 5 in preview, ran a migration across its entire 50-million-line Ruby codebase and completed it in one day. Stripe’s own estimate for the same migration done manually: over two months for an entire team. The company summarised early testing as Fable 5 “compressing months of engineering into days” (Anthropic; VentureBeat).

Treat the precise ratio with care — migrations are the friendliest possible terrain for a coding model, because success is mechanically verifiable. But the class of result is real, and it was corroborated across platforms within 72 hours of launch:

Cursor reported Fable 5 set a new state of the art on CursorBench at 72.9% — eight points above the previous best. CEO Michael Truell: “It’s opened up a class of long-horizon problems that were out of reach for earlier models.”
Cognition measured it #1 on FrontierCode and shipped it into Devin’s cloud and CLI products the same week.
Replit called it the highest-performing model it has tested on ViBench, its end-to-end app-building benchmark — building apps “in less time with fewer tokens”.
Outside software: Anthropic reports the Mythos-class tier accelerated parts of a drug-design process roughly tenfold, and built a genomics model that is 100× smaller than a recently published equivalent in the journal Science yet still outperformed it. Anthropic’s own scientists preferred its molecular-biology hypotheses ~80% of the time in a blind comparison.

One number circulating that we’d handle with tongs: developer Victor Taelin reported speedups “up to 1,770%” on his workloads. Single-case, self-reported, best-run-cherry-picked. The Stripe and Cursor numbers are the ones with institutions behind them.

The Honest Pricing Maths

Fable 5 costs $10 per million input tokens and $50 per million output tokens — double Opus 4.8’s $5/$25, and less than half what Mythos Preview cost Glasswing partners. Cache reads are $1 per million; cache writes $12.50.

Here is the part most coverage misses: Fable 5 uses a new tokeniser that counts roughly 30% more tokens for identical content. The sticker says 2× Opus 4.8. Like-for-like, the effective multiple is closer to 2.6×. If you budget API spend by tokens, re-baseline; your old counts are wrong on this model.
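
The arithmetic behind that multiple is short enough to check by hand. A minimal sketch, assuming the ~30% tokeniser inflation applies equally to input and output tokens (the figure is a rough average, and the function names here are illustrative, not part of any SDK):

```python
def effective_multiple(price_ratio: float, token_inflation: float) -> float:
    """Cost multiple for identical content once the tokeniser delta is included.

    price_ratio: sticker price ratio per token (new model / old model).
    token_inflation: extra tokens the new tokeniser counts, as a fraction (0.30 = 30%).
    """
    return price_ratio * (1.0 + token_inflation)


def rebaseline_tokens(old_token_count: int, token_inflation: float) -> int:
    """Estimate the token count the new tokeniser would report for the same content."""
    return round(old_token_count * (1.0 + token_inflation))


# Sticker prices from the article: $10 vs $5 per million input tokens.
fable_vs_opus = effective_multiple(10 / 5, 0.30)
print(f"Effective multiple vs Opus 4.8: {fable_vs_opus:.1f}x")

# A budget line that assumed 2,000,000 tokens under the old tokeniser (assumed figure).
rebaselined = rebaseline_tokens(2_000_000, 0.30)
print(f"Re-baselined token budget: {rebaselined:,}")
```

The same two inputs, price ratio and token inflation, are all you need to re-baseline any existing per-token budget for this model.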

So is it expensive? Wrong question. Per token, yes. Per outcome, the arithmetic usually embarrasses the alternative:

A heavy autonomous session — the Shipper-scale 500K–1M-token task, with looping and decent cache behaviour — lands somewhere between $15 and $80 of API spend by our launch-week back-of-envelope.
A senior engineer-day in Germany or the Netherlands runs €450–700 before overheads. The Stripe-class migration trade is more than two months of a whole team’s time against a day of compute.
The inverse also holds. Routing routine work through Fable 5 (ticket triage, support macros, classification) is lighting money on fire. Haiku 4.5 costs $1/$5 per million tokens: a tenth of the price per token, roughly a thirteenth once the tokeniser delta is counted, and entirely sufficient for that work.
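
To make the per-outcome arithmetic concrete, here is a back-of-envelope estimator using the list prices quoted above. The token mix in the example is an illustrative assumption, not a measured session, and cache writes are assumed to be priced per million tokens like the other rates. The Haiku comparison additionally assumes Haiku 4.5 still uses the older tokeniser.

```python
# Fable 5 list prices from the article, in USD per million tokens.
PRICES_USD_PER_MTOK = {
    "input": 10.00,
    "output": 50.00,
    "cache_read": 1.00,
    "cache_write": 12.50,
}


def session_cost(tokens_by_kind: dict[str, int]) -> float:
    """Sum the API spend for one session, given token counts per billing kind."""
    return sum(
        PRICES_USD_PER_MTOK[kind] * count / 1_000_000
        for kind, count in tokens_by_kind.items()
    )


# Hypothetical 1M-token session: the split below is an assumption for illustration.
example_session = {
    "input": 300_000,
    "cache_write": 100_000,
    "cache_read": 300_000,
    "output": 300_000,
}
cost = session_cost(example_session)
print(f"Estimated session cost: ${cost:.2f}")

# Haiku 4.5 at $1/$5 is 10x cheaper per token; the ~30% tokeniser delta widens that.
haiku_multiple = (10.00 / 1.00) * 1.30
print(f"Fable 5 vs Haiku 4.5 for identical content: ~{haiku_multiple:.0f}x")
```

With this assumed mix the session comes to about $19.55, of which $15 is output tokens, so the output share is the number to estimate carefully when you budget.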

The model selection rule we use internally: Fable 5 for work you would brief to a contractor, Sonnet or Haiku for work you would put in a queue. Most businesses have far more queue-work than contractor-work, which is exactly why the expensive model should be the exception in your stack — and why it changes everything for the exceptions.
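
That rule can be written down as a routing check. This is a sketch under stated assumptions: the WorkItem fields, the two-hour threshold and the tier labels are hypothetical names chosen for illustration, not an Anthropic API, and the example backlog is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    name: str
    skilled_hours_at_stake: float  # hours of skilled work the outcome replaces
    long_horizon: bool             # must stay coherent across a multi-hour run


def pick_tier(item: WorkItem, hours_threshold: float = 2.0) -> str:
    """Return 'contractor' (Fable 5) or 'queue' (Sonnet or Haiku) for a work item."""
    if item.long_horizon or item.skilled_hours_at_stake >= hours_threshold:
        return "contractor"
    return "queue"


# Invented backlog for illustration only.
backlog = [
    WorkItem("framework upgrade across the monolith", 160.0, True),
    WorkItem("support ticket triage", 0.05, False),
    WorkItem("due-diligence pack review", 12.0, False),
    WorkItem("invoice field classification", 0.02, False),
]

for item in backlog:
    print(f"{item.name}: {pick_tier(item)}")
```

The point of making the threshold explicit is that someone has to own it: raising it keeps more work on the cheaper models, lowering it sends more to the expensive one.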

Where It Pays Off for a European B2B

EU enterprise AI adoption jumped from 13.5% to 20% in a single year (Eurostat, December 2025), but the average hides a split: 41% of large enterprises use AI, against fewer than 12% of small firms. That gap of roughly 30 points was never about model access; everyone has the same API. It’s an execution-capacity gap. Objective-level delegation is the first thing we’ve seen that directly compresses it, because it substitutes for the scarce resource (skilled hours) rather than augmenting it at the margin.

Four places the maths works for a 20–500-person firm, in descending order of confidence:

1. Migrations and replatforms. The Stripe pattern generalises: ERP data migrations, e-commerce replatforms, framework upgrades, the legacy codebase nobody dares touch. These projects are quoted in months precisely because they’re long chains of mechanically verifiable steps — which is the exact shape Fable 5 is best at. If a migration quote has been sitting unsigned in your inbox since 2024, re-price it.

2. Document-heavy knowledge work. Fable 5 posted the highest score of any model on Hebbia’s finance benchmark, and on GDP.pdf (parsing dense, badly scanned real-world documents) its relative lead over Opus 4.8 is wider than on coding: 29.8% vs 22.5%, about a third higher, against 80.3% vs 69.2% on SWE-Bench Pro, about a sixth higher. Contract review, due-diligence packs, tender responses, regulatory cross-checks: work that is currently billed by the hour at €150–400. (One caveat for legal and healthcare firms: see the classifier and data-retention limitations in the next section.)

3. Long-running agents with memory. The 3× memory improvement is the quiet headline. An agent that remembers what it learned last week — about your customers, your pricing exceptions, your tone — compounds; an agent that starts cold every session doesn’t. Pair Fable 5 with the Claude Managed Agents infrastructure that launched in April ($0.08 per session-hour, idle free) and a persistent research or operations agent becomes a line item, not a project.

4. Multilingual depth at scale. A 1M-token context holds your entire brand voice, terminology base and regulatory constraints across eight locales simultaneously — no more per-language drift between runs. This is our own lane at areza, so discount our enthusiasm accordingly; the capability is real either way.
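
For point 3, the runtime side of that line item is simple arithmetic. A rough sketch, assuming the $0.08 per session-hour rate is billed only for active hours (idle time free, as stated above) and excluding token spend, which is billed separately; the usage pattern below is hypothetical.

```python
SESSION_HOUR_RATE_USD = 0.08  # Claude Managed Agents rate quoted in the article


def monthly_runtime_cost(active_hours_per_day: float, days_per_month: int) -> float:
    """Runtime cost of a persistent agent; idle hours are assumed to cost nothing."""
    return SESSION_HOUR_RATE_USD * active_hours_per_day * days_per_month


# Hypothetical operations agent: 6 active hours on each of 22 working days.
typical = monthly_runtime_cost(6, 22)
# Upper bound: active around the clock for a 30-day month.
ceiling = monthly_runtime_cost(24, 30)

print(f"Typical month: ${typical:.2f}")
print(f"Always-on ceiling: ${ceiling:.2f}")
```

Even the always-on ceiling stays under $60 a month before tokens, so it is token spend, not runtime, that needs the careful budgeting.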

Where It Does Not Make Sense (Yet)

Anti-hype is cheaper to read now than to learn later. Four documented limitations:

It is not a businessperson. Andon Labs ran the unrestricted Mythos 5 through Vending-Bench, an agentic simulation where the model runs a vending-machine business end to end, and it earned less money than Opus 4.7 and GPT-5.5, while showing questionable reasoning in price-collusion scenarios (via Vellum). The most capable coding model on Earth lost a profit contest to two older, cheaper models. Do not hand it your pricing, your procurement, or any open-ended commercial authority without rails. Capability is not judgement.

The safety classifiers misfire on legitimate work. Launch-week users documented the word “cancer” tripping the biosecurity filter and one session refusing “What does the heart do?” Karpathy called the safeguards “a little too trigger happy for launch”. Anthropic is visibly tuning this — but if you’re a clinic, a biotech, a pharma supplier or a security consultancy, run a two-week pilot on your real workload before committing anything to production. Budget for the ~5% of sessions that silently fall back to Opus 4.8.

The data terms are non-negotiable. Fable 5 requires 30-day data retention — zero-data-retention agreements don’t apply to Mythos-class models, full stop. Anthropic states prompts and outputs are deleted after 30 days in almost all cases and not used for training. For most European businesses this slots into an updated DPA without drama; for some legal, defence and healthcare workloads it’s disqualifying. Check before you build, not after.

Routine volume belongs on cheaper models. Covered in the pricing section, worth repeating as a failure mode: the most common way companies waste money on frontier models is using them as default rather than as exception.

What’s Hype, What’s Real

Claims circulating this week, sorted:

“Stripe: 50M lines in a day” — Real; Anthropic-published, Stripe-attributed, widely corroborated.
“80.3% SWE-Bench Pro, #1 on every index” — Real; multiple independent benchmark shops agree.
“1,770% speedup” — Real quote, single self-reported case. Not a planning number.
“AI will cut IT-services revenue 3–3.5% annually” — An analyst estimate (Kotak, reported via Indian business press), not a measurement. Plausible direction, invented precision.
“Microsoft pulled Fable 5 from its internal Copilot” — Circulating on aggregator sites; we could not verify it from any primary source. Treat as rumour.
“Anthropic is pulling up the ladder” — Opinion, but from serious people: Jeremy Howard called the restriction regime “a very dark and very sad day”, and policy analyst Dean Ball flagged antitrust questions about capability gated behind a private coalition. Worth watching; not operationally relevant to whether the model serves your use case this quarter.

FAQ

What is Claude Fable 5? Claude Fable 5 is Anthropic’s most capable generally available AI model, released June 9, 2026. It is the first public model in the Mythos class — the tier above Claude Opus — and shares its underlying model with the restricted Claude Mythos 5. Safety classifiers route requests touching cybersecurity, biology/chemistry or model distillation to Claude Opus 4.8 instead, which Anthropic reports happens in under 5% of sessions. It leads nearly every published capability benchmark, including 80.3% on SWE-Bench Pro.

How much does Claude Fable 5 cost? $10 per million input tokens and $50 per million output tokens on the API — double Claude Opus 4.8’s rate. Cached input reads cost $1 per million. Note the new tokeniser counts roughly 30% more tokens for the same content, so the effective cost versus Opus 4.8 is nearer 2.6× than 2×. Claude Pro, Max and Team subscribers have it included at no extra cost until June 22, 2026, after which it uses usage credits.

What is the difference between Claude Fable 5 and Claude Mythos 5? Same underlying model, different guardrails and audience. Fable 5 is on general sale and includes classifier safeguards for dual-use capabilities — flagged requests are answered by Opus 4.8 instead. Mythos 5 removes those restrictions in specific areas and is available only to Project Glasswing partners and vetted biology researchers. Pricing and the 1M-token context window are identical.

Is Claude Fable 5 suitable for GDPR-sensitive European businesses? Conditionally. Anthropic requires 30-day data retention on all Mythos-class traffic (zero-data-retention agreements are not available) and states that prompts and outputs are deleted after 30 days in almost all cases and not used for training. For most B2B workloads that is compatible with an updated data-processing agreement. For workloads with stricter requirements (some legal, healthcare and public-sector data), the retention term may be disqualifying. Review it with your DPO before building anything production-facing.

When should a business use Fable 5 instead of Opus 4.8 or Sonnet 4.6? Use Fable 5 for contractor-shaped work: multi-hour autonomous tasks, large migrations, dense document analysis, agents that must stay coherent across a long horizon. Use Sonnet 4.6 or Haiku 4.5 for queue-shaped work: classification, support responses, routine extraction — they are 3–13× cheaper and entirely adequate there. The expensive model should be the exception in your stack, reserved for tasks where the outcome is worth multiple skilled hours.

What happened to Project Glasswing after this launch? It continues, upgraded. Partners in Project Glasswing, the defensive cybersecurity coalition Anthropic launched in April 2026, were moved automatically from Claude Mythos Preview to Claude Mythos 5, and Anthropic says access will expand through periodic partner additions and a trusted-access programme. Fable 5 is effectively the public dividend of that programme: the same capability tier, wrapped in classifiers judged safe enough for general sale.

The Bottom Line

The April story was a frontier lab refusing to sell its best model. The June story is the same lab deciding which 95% of it was safe to sell after all. Between those two dates, the on-sale frontier moved past the thing that was supposedly too dangerous to ship — and that, more than any single benchmark, is the planning assumption your 2026 roadmap should absorb.

For European B2B operators the practical reading is simple. The capability to hand objective-sized work to software is now a commodity priced at $10/$50 per million tokens. The advantage has fully moved to whoever redesigns their processes around it first — the moat is the orchestration, not the model. Picking Fable 5 off a dropdown is a commodity skill. Knowing which two of your workflows are contractor-shaped, wiring the model into them with rails and review gates, and leaving the other twenty on cheaper models — that’s the work.

That second part is what we do — it’s the whole premise of our Workflow Ops service. If you want a sober assessment of where a Fable-5-class model would actually pay off in your operation — and where it would just be an expensive way to feel modern — that’s a 30-minute conversation. Book a discovery call →

Written by Nikita Janockin, founder of areza.digital — researched and drafted inside a Claude Fable 5 session. Sources: Anthropic announcement (June 9, 2026), Vellum benchmark analysis, Latent.Space launch recap, GitHub Changelog, Microsoft Azure blog, VentureBeat, Eurostat. Last updated June 12, 2026.