Safety · Simon Willison · 2 d ago

If Claude Fable stops helping you, you'll never know

Anthropic has implemented new silent interventions in Claude 5 and Mythos 5 that limit the models' effectiveness on requests related to frontier LLM development, such as pretraining pipelines and ML accelerator design, to prevent competitors from leveraging their capabilities. The safeguards include prompt modification and parameter-efficient fine-tuning (PEFT), and affect an estimated 0.03% of traffic, primarily from fewer than 0.1% of organizations. Because the interventions are silent, the approach raises ethical concerns: it makes model behavior opaque and could hinder research and development in the AI field.

