𝗛𝗼𝘄 𝗔𝗻𝘁𝗵𝗿𝗼𝗽𝗶𝗰 𝗖𝗼𝗻𝘁𝗮𝗶𝗻𝘀 𝗖𝗹𝗮𝘂𝗱𝗲

Anthropic has described how it keeps Claude from causing damage when something goes wrong. It relies on three isolation patterns.

As AI agents become more capable, their failures can cause more damage. The goal is not to prevent every failure. The goal is to limit how much damage a failure can do.

Anthropic uses three layers of defense:

Environment: Sandboxes and VMs.
Model: Prompts and training.
External content: Tool permissions.

One layer is not enough.

Pattern 1: Ephemeral Containers
Used for claude.ai. The agent runs in a gVisor container. It has no access to the user's local machine. The system deletes the container after each session.
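As a rough illustration of the ephemeral pattern, here is a minimal sketch that builds a `docker run` command for a throwaway container. It assumes a Docker host where gVisor's `runsc` runtime is installed and registered. The helper name and the image name are hypothetical, and nothing here is taken from Anthropic's actual setup.

```python
def ephemeral_container_argv(image: str, command: list[str]) -> list[str]:
    """Build a `docker run` command line for a throwaway gVisor container."""
    return [
        "docker", "run",
        "--rm",             # remove the container as soon as it exits
        "--runtime=runsc",  # run under gVisor (assumes runsc is registered with Docker)
        image,
        *command,
    ]


if __name__ == "__main__":
    # "agent-image:latest" is a placeholder image name, not a real image.
    print(" ".join(ephemeral_container_argv("agent-image:latest", ["python", "task.py"])))
```

Because nothing is mounted from the host and the container is removed on exit, state from one session does not carry over to the next.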

Pattern 2: OS Sandboxing
Used for Claude Code. It uses Seatbelt on macOS and bubblewrap on Linux. Writes are limited to the workspace. Network access is blocked by default.
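To make the Linux side concrete, here is a minimal sketch that assembles a bubblewrap (`bwrap`) command with a read-only root, one writable workspace, and no network. The function name is hypothetical and the flag selection is an assumption about what a reasonable policy could look like. It is not Claude Code's actual configuration, and it does not cover Seatbelt.

```python
from pathlib import Path


def bwrap_argv(workspace: str, command: list[str]) -> list[str]:
    """Build a bubblewrap command: read-only root, writable workspace, no network."""
    ws = str(Path(workspace).resolve())
    return [
        "bwrap",
        "--ro-bind", "/", "/",  # the host filesystem is visible but read-only
        "--dev", "/dev",        # minimal private /dev
        "--proc", "/proc",      # fresh /proc for the sandbox
        "--tmpfs", "/tmp",      # private scratch space, discarded on exit
        "--bind", ws, ws,       # the workspace is the only writable host path
        "--unshare-net",        # new network namespace, so no network access
        "--die-with-parent",    # stop the sandbox if the launcher dies
        "--chdir", ws,
        *command,
    ]
```

The resulting list can be passed to `subprocess.run` on a Linux machine with bubblewrap installed. Opening specific network destinations would need an extra component, such as a proxy, which this sketch leaves out.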

Pattern 3: Full VMs
Used for Claude Cowork. It uses a full virtual machine. It has its own kernel and filesystem. There is no way to override the sandbox.

Lessons for your AI agents:

Build hard boundaries first. Model behavior can only be predicted. A sandbox is enforced no matter what the model does.
Do not rely on user approvals. Users click yes too often. Build boundaries that hold when nobody is watching.
Block network access by default. Limit data leaks by allowlisting only the destinations the agent needs.
Get the trust boundary right. Do not load config files from a folder before the user has trusted that folder.
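The last two lessons can be sketched in a few lines. This is a minimal illustration, not a hard boundary on its own: real enforcement belongs in the sandbox or an egress proxy. The allowlisted hosts, the `agent.json` file name, and both function names are hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical allowlist: list only the hosts the agent actually needs.
ALLOWED_HOSTS = frozenset({"api.example.com", "pypi.org"})


def is_url_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only HTTPS requests whose exact hostname is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname or ""
    except ValueError:
        return False  # malformed URL: deny by default
    return parts.scheme == "https" and host in allowed_hosts


def load_project_config(folder: str, trusted_folders: set[str]) -> dict:
    """Read <folder>/agent.json only if the user has already trusted the folder."""
    root = Path(folder).resolve()
    if str(root) not in trusted_folders:
        return {}  # untrusted: do not even open the file
    config_path = root / "agent.json"  # hypothetical config file name
    if not config_path.is_file():
        return {}
    return json.loads(config_path.read_text(encoding="utf-8"))
```

Matching the exact hostname, rather than a substring, keeps a lookalike such as `pypi.org.evil.test` from passing the check. Returning an empty config for untrusted folders means a cloned repository cannot change the agent's behavior before the user has approved it.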

Anthropic delayed its best model until the containment was ready. You should do the same.

Source: https://dev.to/tyson_cung/how-anthropic-contains-claude-3-isolation-patterns-for-shipping-safe-ai-agents-4ppa
