Technology
Anthropic releases guarded Mythos AI model, blocks cyber abuse
Anthropic has moved one of its most capable AI systems into public view while hardening it against the uses that most alarm security researchers. The company released a guarded version of its Mythos-class model that blocks high-risk requests in cybersecurity and biology and automatically falls back to Claude Opus 4.8 when a user tries to push it into disallowed territory. The release tests whether frontier AI can be sold broadly without becoming a more efficient tool for abuse.
That tension was already visible in Anthropic’s own warning that the underlying model was strong enough to spot and exploit software vulnerabilities. The company said it put the system through extensive jailbreaking attempts and more than 1,000 hours of external bug bounty and red-team testing, and that hackers were unable to find a universal bypass. Anthropic’s safety bug bounty program is aimed at exactly that kind of failure: it searches for jailbreaks that can defeat the company’s Constitutional Classifiers.

Anthropic framed the rollout as a two-track strategy. A spokesperson said, “The difference is not what the model can do, but what our safeguards will allow.” Dianne Penn described the move as part of a “race to the top,” a phrase that captures the industry’s current argument: the companies racing ahead on capability are now being judged on whether they can contain it. Anthropic also expanded access to a restricted version for select customers already approved to use the model, signaling that the most advanced features remain behind a narrower gate.
The commercial details show how carefully Anthropic is balancing reach with restraint. Claude Fable 5, the public-facing version of Mythos, is available through the Claude API and consumption-based Enterprise plans, with subscription access rolling out in stages. It was set to remain included at no extra cost in Pro, Max, Team, and seat-based Enterprise plans through June 22 before shifting to usage credits. Anthropic’s pricing page lists Claude Fable 5 and Claude Mythos 5 at $10 per million input tokens and $50 per million output tokens, double the listed price of Claude Opus 4.8.
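To make the pricing concrete, here is a minimal sketch of how a request's cost could be estimated from the listed rates. The Claude Fable 5 and Claude Mythos 5 figures come from the article; the Claude Opus 4.8 rate is an assumption inferred from the "double the listed price" statement, and the dictionary keys and the `estimate_cost` helper are hypothetical labels for illustration, not API identifiers.

```python
# USD per million tokens as (input, output).
# Fable 5 / Mythos 5 rates are the ones quoted in the article.
# The Opus 4.8 rate is an ASSUMPTION: half the Fable 5 rate, inferred from
# the statement that Fable 5 costs double the listed Opus 4.8 price.
PRICES_PER_MILLION = {
    "fable-5": (10.0, 50.0),
    "mythos-5": (10.0, 50.0),
    "opus-4.8": (5.0, 25.0),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list price."""
    input_rate, output_rate = PRICES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for name in ("fable-5", "opus-4.8"):
        print(name, estimate_cost(name, 2_000_000, 500_000))
```

Under these assumptions, the same workload costs twice as much on Claude Fable 5 as on Claude Opus 4.8, which matches the ratio the pricing page implies.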
The broader stakes extend well beyond one product launch. Anthropic said Project Glasswing began in April 2026 with roughly 50 partners using Claude Mythos Preview to scan codebases for vulnerabilities, then expanded on June 2 to about 150 additional organizations in more than 15 countries. Those partners, including operators in power, water, healthcare, communications, and hardware, have found more than 10,000 high- or critical-severity vulnerabilities. Anthropic said a major attack on most of those organizations could affect more than 100 million people, and it expects other AI companies to have Mythos-class models within 6 to 12 months, possibly without comparable safeguards. That makes Anthropic’s guardrails more than a product feature: they are an early attempt to define the rules for how much power the market can safely absorb.