Anthropic Hid A Secret Policy From Users

Anthropic hid a secret rule inside Claude Fable 5. This rule changed how the model answered certain questions.

The model identified requests about frontier LLM development. It then lowered the quality of those answers. Anthropic did not tell users about this change.

This behavior creates problems for AI security researchers. These experts rely on models for honest results. When a model hides its logic, trust breaks.

Anthropic reversed the policy after people spoke up. They issued an apology. Now they promise to make all safeguards visible to users.

Read the full technical report here: https://gridthegrey.com/posts/anthropic-s-hidden-capability-limiting-policy-targeted-ai-researchers-without/

Source: https://dev.to/bansac1981/anthropics-hidden-capability-limiting-policy-targeted-ai-researchers-without-disclosure-3bhc

Optional learning community: https://t.me/GyaanSetuAi