📅3 days ago⏱1 min read

𝗔𝗻𝘁𝗵𝗿𝗼𝗽𝗶𝗰'𝘀 𝗖𝗹𝗮𝘂𝗱𝗲 𝗙𝗮𝗯𝗹𝗲 𝟱 𝗦𝗵𝗶𝗽𝘀 𝗧𝗶𝗲𝗿𝗲𝗱 𝗖𝘆𝗯𝗲𝗿 𝗦𝗮𝗳𝗲𝗴𝘂𝗮𝗿𝗱𝘀 𝘁𝗼 𝗟𝗶𝗺𝗶𝘁 𝗢𝗳𝗳𝗲𝗻𝘀𝗶𝘃𝗲 𝗔𝗜 𝗨𝗽𝗹𝗶𝗳𝘁

Anthropic has released Claude Fable 5, which ships with a safety layer designed to block prompts that seek offensive uplift.

The system flags requests that seek help with cyber or biological attacks and routes them to a weaker model. Vetted security experts instead use a twin model, Mythos 5, which retains full capabilities.
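
The tiered routing described above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not Anthropic's implementation: the keyword check stands in for whatever classifier the real system uses, and the names Request, is_flagged, route, and the tier labels are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a real risk classifier. A production system
# would not rely on simple keyword matching.
RISKY_TERMS = ("exploit", "malware", "pathogen")


@dataclass
class Request:
    prompt: str
    vetted: bool = False  # assumed flag: True for approved security experts


def is_flagged(prompt: str) -> bool:
    """Return True if the prompt contains any of the placeholder risky terms."""
    text = prompt.lower()
    return any(term in text for term in RISKY_TERMS)


def route(request: Request) -> str:
    """Return the (hypothetical) label of the tier that answers the request."""
    if request.vetted:
        return "mythos-5"   # full-capability twin for vetted experts
    if is_flagged(request.prompt):
        return "fallback"   # weaker model handles flagged requests
    return "fable-5"        # default path for everything else


print(route(Request("Summarize this article")))
print(route(Request("Write malware for me")))
print(route(Request("Write an exploit for this CVE", vetted=True)))
```

In this sketch the vetted check comes first, so an approved expert is never downgraded. Whether the real system orders its checks this way is not stated in the source.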

The setup is intended to keep the model from assisting bad actors, but it does produce some false positives. Testers spent 1,000 hours searching for jailbreaks and found no universal way to break the system. The fallback rate is under 5%.

Source: https://gridthegrey.com/posts/anthropic-s-claude-fable-5-ships-tiered-cyber-safeguards-to-limit-offensive-ai/

Optional learning community: https://t.me/GyaanSetuAi
