The Sheffield Press

Technology

Anthropic apologizes for hidden guardrails in Claude Fable 5

By Pamella Goncalves

Anthropic has apologized for building hidden guardrails into Claude Fable 5, its newest AI model, after some prompts were silently diverted to Claude Opus 4.8. The company said it is now reversing course and will be more transparent about when the restrictions activate, even if that means Fable refuses more queries outright.

The disclosure matters far beyond one product launch. Fable 5, introduced on June 9, 2026, was billed as Anthropic’s fifth model generation and a Mythos-class system built for the hardest knowledge work and coding problems. Anthropic said the model is state-of-the-art on nearly all tested benchmarks and can run for days inside agent workflows, claims that make its behavior especially consequential for benchmarking, academic research and competitive model development.

According to Anthropic, the safeguards were aimed specifically at cybersecurity and biology. When a request touching those topics was flagged, it was automatically routed to Claude Opus 4.8, the company’s next-most-capable model. Anthropic said users would not be charged Fable pricing for rerouted requests, and that the safeguards were tuned conservatively enough to catch harmless prompts at times while still triggering in fewer than 5% of sessions on average.

The company also said Fable 5 requires 30-day data retention for safety monitoring, underscoring how tightly the model’s deployment is tied to oversight and logging. The model is priced at $10 per million input tokens and $50 per million output tokens, with a 90% prompt-caching discount; U.S.-only inference is available at 1.1 times the base price.
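To make the reported pricing concrete, the arithmetic can be sketched in a few lines of Python. This is a rough illustration, not Anthropic's billing logic: the function name `estimate_cost` and its parameters are hypothetical, and the sketch assumes that the 90% discount applies only to input tokens served from the prompt cache and that the 1.1x U.S.-only multiplier applies to the whole bill. The article confirms neither detail.

```python
# Prices as reported in the article (USD per million tokens).
INPUT_PRICE_PER_MTOK = 10.00
OUTPUT_PRICE_PER_MTOK = 50.00
CACHE_DISCOUNT = 0.90      # reported 90% prompt-caching discount
US_ONLY_MULTIPLIER = 1.1   # reported U.S.-only inference multiplier


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0, us_only=False):
    """Estimate the cost of one request in USD (hypothetical helper).

    Assumptions, not confirmed by the article:
    - the caching discount applies only to cached input tokens;
    - the U.S.-only multiplier scales the entire total.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")

    uncached_tokens = input_tokens - cached_input_tokens
    cost = (
        uncached_tokens * INPUT_PRICE_PER_MTOK
        + cached_input_tokens * INPUT_PRICE_PER_MTOK * (1 - CACHE_DISCOUNT)
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000

    if us_only:
        cost *= US_ONLY_MULTIPLIER
    return cost


if __name__ == "__main__":
    # One million input tokens (800,000 of them cached) and 200,000 output tokens.
    print(f"${estimate_cost(1_000_000, 200_000, cached_input_tokens=800_000):.2f}")
```

Under those assumptions, a request with one million input tokens and 200,000 output tokens would cost about $20 with no caching, and about $12.80 if 800,000 of the input tokens were served from the cache.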

At the same time, Anthropic launched Claude Mythos 5, the same underlying model as Fable 5 but with some safeguards lifted. Mythos 5 is being deployed first through Project Glasswing in collaboration with the U.S. government, with broader trusted access planned later. Anthropic’s system cards page now lists recent model cards for Opus 4.8, Mythos Preview, Sonnet 4.6 and Opus 4.6, part of a growing paper trail around how the company says it evaluates capabilities, safety and deployment decisions.

The episode puts a sharper question in front of the AI industry: if a model is benchmarked, priced and marketed as one thing, what happens when hidden constraints quietly make it something else for the people trying to measure it?
