devtake.dev

Anthropic is sending Mythos 5, the model it called too dangerous, to cyberdefenders and the US government

Mythos 5 is the same model as Fable 5 with the cyber safeguards lifted. It goes to Project Glasswing defenders now and, Anthropic says, to ~150 orgs across 15+ countries as the program expands.

Clara Wexler · 3 min read · 5 sources
A hand holds a smartphone showing the Claude Mythos app logo against a dark backdrop with Anthropic's orange burst symbol.
Image via BBC News

Anthropic spent months calling its Mythos model too dangerous to ship. On June 9 it shipped Mythos 5 anyway, to a vetted set of cyberdefenders and US government partners through Project Glasswing, with the model’s cybersecurity guardrails deliberately switched off.

It’s the same underlying model as Claude Fable 5, the version the company made available on its public plans the same day. The only difference is what each one is allowed to do. As Anthropic put it, “the safeguards are what distinguish the two, and why we’ve given them different names.” Fable 5 routes cyber and biology questions to a weaker model. Mythos 5 answers them.

What we know

  • Mythos 5 is Fable 5 with the lock removed. Both run the same Mythos-class weights. For Glasswing partners, the cybersecurity safeguards are lifted, so the model will do the offensive-security work the public version refuses, according to Anthropic’s announcement.
  • The capability gap is the whole point. On ExploitBench, which scores offensive-security skill, Mythos 5 hits 78%, against 69% for the earlier Mythos Preview and just 40% for Opus 4.8, per benchmark figures Anthropic shared. That 40-to-78 jump is exactly the capability the company wouldn’t sell to the public.
  • Access is gated, with plans to grow. Mythos 5 goes only to Glasswing’s cyberdefenders and infrastructure providers for now. Anthropic says it plans to expand the program to roughly 150 organizations across more than 15 countries, plus a separate trusted-access track for biomedical researchers that lifts the biology safeguards while keeping the cyber ones on.
  • Prompts and outputs are logged. According to Anthropic, prompts and outputs from Mythos-class models “are retained for 30 days for trust and safety purposes,” on every platform where the models run.

This is a reversal, and Anthropic isn’t hiding it. The company had argued the raw model’s ability to find and exploit software flaws was too dangerous for a broad release. Dianne Penn, who heads product management for research and labs, told Fortune the calculus changed once the guardrails matured: “the reason why we’re releasing Fable Five now is very much due to us feeling more confident with our safety guardrails in place.”

What we don’t know

Anthropic disclosed the ExploitBench scores and a 30-day retention window on June 9. It disclosed almost nothing about who is on the other end of the program.

  • Which US agencies are in Project Glasswing, and what they’re cleared to run the model against.
  • How Anthropic vets a “cyberdefender” or an “infrastructure provider,” and who decides when the roster grows toward its stated 150 organizations.
  • What oversight, if any, sits between a Glasswing partner and a frontier offensive-cyber model with the brakes off.
  • Whether the 30-day logging extends to a partner’s own findings, or only to prompts and outputs.

What this means for you

The bet here is asymmetry. Anthropic’s pitch, backed by Glasswing’s first-month haul of more than 10,000 vulnerabilities, is that defenders need the offensive capability more than attackers do, because finding your own bugs first is how you close them. That logic only holds if the gate holds. A model that scores 78% on weaponizing vulnerabilities is the same tool whether a defender or an attacker is typing, and the phrase “vetted partner” is carrying enormous weight.

If you run infrastructure, the practical question isn’t whether this makes you safer in theory. It’s whether your vendors are inside Glasswing, what they’re feeding the model, and who is watching. Those answers aren’t public yet. The history here doesn’t inspire blind trust either: Mythos was already the subject of a breach scare before it ever shipped, a reminder that “restricted access” and “contained” are not the same thing.


Quick reference

ExploitBench
A benchmark that scores how well a model can find and weaponize software vulnerabilities, the offensive-cyber skill Anthropic restricts. Higher means more capable.
