Mythos Broke Into Classified US Systems in Hours - and Now the Shutdown Makes Sense

Share
Mythos Broke Into Classified US Systems in Hours - and Now the Shutdown Makes Sense
Abstract image of an AI probing a network of secured government systems, with several nodes lit as vulnerable.

When Anthropic abruptly pulled Fable 5 and Mythos 5 offline for every customer worldwide on June 12, the public reason was narrow and procedural: a government directive barring foreign-national access, which the company could only enforce by shutting the models off entirely. The "why now" was thinner than the action seemed to warrant. New reporting fills that gap.

A US official told the Associated Press that one of Anthropic's models identified vulnerabilities in highly sensitive, secured US government computer systems during a testing exercise — and did it in hours. If you were puzzled by the speed and totality of the shutdown, this is the context that was missing.

⚠️ One careful note up front: the reporting establishes the testing happened and that it alarmed officials. It does not flatly state the classified-systems test was the single cause of the public shutdown. The formal directive was about foreign-national access. The testing is the backdrop that makes the government's posture legible — strong implication, not confirmed one-to-one cause. Worth holding that distinction, because the difference between "this is why" and "this helps explain why" matters when you're reporting on classified material through anonymous officials.

What the Official Said

Anthropic teamed up with US intelligence agencies to run tests using its Mythos model. According to the official — speaking anonymously to discuss sensitive matters — Mythos flagged certain vulnerabilities within hours.

The crucial qualifier: finding a vulnerability in hours is not the same as exploiting it in hours. The official was explicit that identifying the flaws did not mean the model could weaponize them in that window. That distinction tends to evaporate in headlines, and it's the difference between "an AI found bugs fast" (which good scanners and researchers also do) and "an AI autonomously compromised classified infrastructure" (a far bigger claim the reporting does not make).

Project Glasswing

The testing ran under an Anthropic initiative called Project Glasswing — a restricted program that pulled together tech companies and others to find and fix vulnerabilities in critical software before attackers could, on the theory that a model like Mythos could otherwise pose severe fallout for public safety, national security, and the economy.

That framing matters. Glasswing is a defensive program. The same capability that alarms the government is the one being deployed to harden critical systems. This is the dual-use knot at the center of the whole Fable/Mythos saga: you cannot build a model that finds vulnerabilities for defenders without building one that finds the same vulnerabilities, full stop.

The Senate Hearing

The detail went public, briefly, on June 11. Democratic Senator Mark Warner of Virginia referenced the testing during a hearing before the Senate Committee on Banking, Housing, and Urban Affairs, saying the tool "broke into almost all of our classified systems, not in weeks but in hours." He attributed that to the head of the NSA and US Cyber Command, Gen. Joshua Rudd.

⚠️ Read the senator's phrasing against the official's careful version and you can see the gap a sysadmin should always watch for. "Broke into" is doing a lot of work in a political setting; the official's account — found vulnerabilities, didn't necessarily exploit them — is the more precise one. Both can be true at once: the model rapidly mapped weaknesses across nearly all the systems tested, and that's genuinely alarming, without it meaning autonomous end-to-end compromise. Precision in what "broke in" means is exactly the kind of thing that gets lost between a hearing room and a headline.

The NSA declined to comment. An Anthropic spokesman declined to comment.

The Wider Friction

This cooperation is happening against a backdrop of growing tension between Anthropic and the Trump administration. Anthropic has raised concerns about how the US military would use its AI; the administration has restricted some of the company's models in return.

The early-June directive required Anthropic to prevent foreign nationals from using Fable 5 and Mythos 5. Fable is the widely released, limited version; Mythos is the more capable model the company has kept on a tight leash precisely because of cybersecurity fears. The directive landed ten days after an executive order setting up a framework for the federal government to vet the national-security risks of the most advanced AI systems for up to a month before public release — participation voluntary, per the order.

Anthropic disabled the models for all customers to comply, while stating it didn't believe the government's response was warranted by the concern it had flagged. For the full sequence of the shutdown and what it means operationally, see the earlier breakdown: <!-- internal link TBD: Fable 5 / Mythos 5 suspension article slug -->.

The Counter-Argument From the Industry

The capability case cuts the other way too. More than 100 cybersecurity experts and leaders — including people from Adobe and Nvidia — wrote to the administration urging it to lift the directive, arguing the restriction could help US adversaries more than it hurts them.

Their key technical point deflates the "uniquely dangerous" framing: Mythos models are "quite good" at finding flaws and weaponizing exploits, but "not uniquely good" at it. Many signatories said they routinely use other foundation and open-source models for security audits and training. Pulling the best defensive capability off the board "without a good reason," they argued, is dangerous when adversaries are advancing fast and will keep using whatever models they can get.

That's the crux for anyone who actually does this work: vulnerability-finding capability is not contained by restricting one US vendor. The flaws in those classified systems existed before Mythos found them. A capable adversary with any sufficiently good model — and there are several — can find the same things. Taking the defender's tool away doesn't patch the bugs.

What This Means

For security practitioners, strip away the politics and a few durable points remain.

The capability is real and it's fast. A frontier model can map vulnerabilities across hardened systems in hours. Whether or not it can autonomously exploit them yet, the discovery half of the kill chain just got dramatically compressed. If your threat model assumes attackers need weeks to enumerate weaknesses, update it.

Finding is not exploiting — but the gap is closing. The official's careful distinction is true today. Plan as though it narrows, because the same programs racing to find flaws defensively are also demonstrating how quickly the rest of the chain can be automated.

Restriction is not remediation. The single clearest lesson: the bugs were already there. Mythos didn't create the vulnerabilities in those classified systems; it surfaced them. Whatever happens to model access, the patching work is the same work it always was — and it's now on a much shorter clock.

The dual-use problem has no clean policy answer. A model that hardens critical infrastructure and a model that threatens it are the same model. Every control built around that fact is a tradeoff, not a solution, and the people who keep systems running will be living inside those tradeoffs for years.


References

Read more