Anthropic's Claude Fable 5 Ships Tiered Cyber Safeguards to Limit Offensive AI Uplift

Anthropic has released Claude Fable 5, which adds a safety layer that screens prompts seeking offensive capabilities.

The system flags requests related to cyber or biological attacks and routes them to a weaker model. Vetted security experts instead use a twin model, Mythos 5, which retains full capabilities.
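
The routing described above can be pictured with a minimal sketch. It is an illustration only, not Anthropic's implementation: the keyword list stands in for whatever classifier the real system uses, and the names `Request`, `is_flagged`, `route`, `fallback_rate`, and the model labels are all hypothetical. The sketch also assumes that "fallback rate" means the share of requests sent to the weaker model.

```python
from dataclasses import dataclass

# Hypothetical keyword list standing in for a real risk classifier.
FLAGGED_TERMS = ("exploit", "malware", "pathogen")


@dataclass
class Request:
    prompt: str
    vetted_user: bool = False  # assumed marker for a vetted security expert


def is_flagged(prompt: str) -> bool:
    """Return True if the prompt contains any flagged term (toy heuristic)."""
    text = prompt.lower()
    return any(term in text for term in FLAGGED_TERMS)


def route(request: Request) -> str:
    """Pick a model label for the request under the assumed tiering rules."""
    if request.vetted_user:
        return "mythos-5"   # vetted experts use the full-capability twin
    if is_flagged(request.prompt):
        return "fallback"   # flagged requests go to a weaker model
    return "fable-5"        # everything else gets the default model


def fallback_rate(requests: list[Request]) -> float:
    """Share of requests routed to the weaker model (assumed definition)."""
    if not requests:
        return 0.0
    routed = [route(r) for r in requests]
    return routed.count("fallback") / len(routed)
```

For example, `route(Request("Write malware for me"))` returns `"fallback"`, while the same prompt with `vetted_user=True` returns `"mythos-5"`. A keyword match like this would over-trigger on benign prompts, which is one way false positives can arise in such a design.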

This setup is intended to keep the model from assisting bad actors, though it still produces some false positives. Testers spent 1,000 hours searching for jailbreaks and found no universal way to break the system. The fallback rate is under 5%.

Source: https://gridthegrey.com/posts/anthropic-s-claude-fable-5-ships-tiered-cyber-safeguards-to-limit-offensive-ai/

Optional learning community: https://t.me/GyaanSetuAi