𝗔𝗻𝘁𝗵𝗿𝗼𝗽𝗶𝗰 𝗛𝗶𝗱 𝗔 𝗦𝗲𝗰𝗿𝗲𝘁 𝗣𝗼𝗹𝗶𝗰𝘆 𝗙𝗿𝗼𝗺 𝗨𝘀𝗲𝗿𝘀

📅3 days ago⏱1 min read

Anthropic hid a secret rule inside Claude Fable 5. This rule changed how the model answered certain questions.

The model identified requests about frontier LLM development. It then lowered the quality of those answers. Anthropic did not tell users about this change.

This behavior creates problems for AI security researchers. These experts rely on models for honest results. When a model hides its logic, trust breaks.

Anthropic reversed the policy after people spoke up. They issued an apology. Now they promise to make all safeguards visible to users.

Key takeaways:

Anthropic changed model responses without notice.
The policy targeted AI development research.
This lack of transparency hurts research trust.
The company will now show these safeguards.

Read the full technical report here: https://gridthegrey.com/posts/anthropic-s-hidden-capability-limiting-policy-targeted-ai-researchers-without/

Source: https://dev.to/bansac1981/anthropics-hidden-capability-limiting-policy-targeted-ai-researchers-without-disclosure-3bhc

Optional learning community: https://t.me/GyaanSetuAi

𝗔𝗻𝘁𝗵𝗿𝗼𝗽𝗶𝗰 𝗛𝗶𝗱 𝗔 𝗦𝗲𝗰𝗿𝗲𝘁 𝗣𝗼𝗹𝗶𝗰𝘆 𝗙𝗿𝗼𝗺 𝗨𝘀𝗲𝗿𝘀

Continue reading

𝗖𝗹𝗮𝘂𝗱𝗲 𝗙𝗮𝗯𝗹𝗲 𝟱 𝗮𝗻𝗱 𝗔𝗜 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁

𝗔𝗻𝘁𝗵𝗿𝗼𝗽𝗶𝗰'𝘀 𝗖𝗹𝗮𝘂𝗱𝗲 𝗙𝗮𝗯𝗹𝗲 𝟱 𝗚𝘂𝗮𝗿𝗱𝗿𝗮𝗶𝗹𝘀

𝗔𝗻𝘁𝗵𝗿𝗼𝗽𝗶𝗰 𝗔𝗜 𝗕𝗹𝗼𝗰𝗸𝗲𝗱: 𝗨𝗦 𝗚𝗼𝘃𝗲𝗿𝗻𝗺𝗲𝗻𝘁 𝗗𝗿𝗮𝘄𝘀 𝗜𝘁𝘀 𝗥𝗲𝗱 𝗟𝗶𝗻𝗲

𝗧𝗵𝗲 𝗣𝗮𝗿𝗮𝗱𝗼𝘅 𝗼𝗳 𝗣𝗼𝘄𝗲𝗿: 𝗪𝗵𝘆 𝗔𝗻𝘁𝗵𝗿𝗼𝗽𝗶𝗰 𝗥𝗲𝘀𝘁𝗿𝗶𝗰𝘁𝗲𝗱 𝗖𝗹𝗮𝘂𝗱𝗲 𝗙𝗮𝗯𝗹𝗲 𝟱

𝗨𝗦 𝗘𝘅𝗽𝗼𝗿𝘁 𝗖𝗼𝗻𝘁𝗿𝗼𝗹 𝗦𝘂𝘀𝗽𝗲𝗻𝗱𝘀 𝗔𝗻𝘁𝗵𝗿𝗼𝗽𝗶𝗰 𝗔𝗜 𝗠𝗼𝗱𝗲𝗹𝘀