How Anthropic Contains Claude
Anthropic shares how they stop Claude from breaking things. They use three isolation patterns.
AI agents are getting more capable, so their failures can cause more damage. You cannot prevent every failure. Limit the damage instead.
Anthropic uses three layers of defense:
- Environment: Sandboxes and VMs.
- Model: Prompts and training.
- External content: Tool permissions.
One layer is not enough.
Pattern 1: Ephemeral Containers
Used for claude.ai. The agent runs in a gVisor container with no access to the user's local machine. The system deletes the container after each session.
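A minimal sketch of the ephemeral-container idea from Pattern 1, assuming Docker with gVisor's runtime registered under the name `runsc`. The helper name and image are hypothetical, and this is not Anthropic's actual setup. It only builds the command line: `--rm` asks Docker to delete the container on exit, and no host directories are mounted.

```python
import shlex


def ephemeral_container_argv(image: str, command: list[str]) -> list[str]:
    """Build a `docker run` command for a throwaway, gVisor-backed container.

    Assumption: the gVisor runtime is registered with Docker as `runsc`.
    No `-v`/`--mount` flags are passed, so nothing from the host is shared.
    """
    return [
        "docker", "run",
        "--rm",             # remove the container when the command exits
        "--runtime=runsc",  # run under gVisor instead of the default runtime
        image,
        *command,
    ]


if __name__ == "__main__":
    # Hypothetical image and command, printed rather than executed.
    argv = ephemeral_container_argv("python:3.12-slim", ["python", "-c", "print('hi')"])
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would start the session. Each new session would call the helper again and get a fresh container.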
Pattern 2: OS Sandboxing
Used for Claude Code. It uses Seatbelt on macOS and bubblewrap on Linux. It limits writes to the workspace and blocks network access by default.
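A rough sketch of what the two rules in Pattern 2 could look like with bubblewrap on Linux. The helper name is hypothetical and the exact profile Claude Code uses is not given here. The sketch mounts the filesystem read-only, rebinds only the workspace as writable, and drops the network unless the caller opts in.

```python
def bwrap_argv(workspace: str, command: list[str], allow_network: bool = False) -> list[str]:
    """Build a `bwrap` command that confines writes to `workspace`.

    `workspace` should be an absolute path. bubblewrap applies mount
    options in order, so the writable bind overrides the read-only root.
    """
    argv = [
        "bwrap",
        "--ro-bind", "/", "/",           # entire filesystem visible but read-only
        "--bind", workspace, workspace,  # the workspace is the only writable host path
        "--dev", "/dev",                 # minimal device nodes
        "--proc", "/proc",               # fresh /proc for the sandbox
        "--tmpfs", "/tmp",               # private scratch space, discarded on exit
        "--die-with-parent",             # kill the sandbox if the launcher dies
    ]
    if not allow_network:
        argv.append("--unshare-net")     # new network namespace with no outside access
    return argv + list(command)


if __name__ == "__main__":
    print(" ".join(bwrap_argv("/home/user/project", ["ls", "-la"])))
```

Making the network opt-in through `allow_network=False` mirrors the "blocked by default" rule: forgetting the argument leaves the sandbox closed, not open.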
Pattern 3: Full VMs
Used for Claude Cowork. The agent runs in a full virtual machine with its own kernel and filesystem. There is no way to override the sandbox.
Lessons for your AI agents:
- Build hard boundaries first. Model behavior is a guess. Sandboxes are a fact.
- Do not rely on user approvals. Users click yes too often. Build boundaries for when nobody watches.
- Block network access. Stop data leaks by allowlisting only the destinations the agent needs.
- Fix the trust boundary. Do not load config files before the user trusts the folder.
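The last two lessons can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the allowlisted hosts, the `agent.json` file name, and both helper functions are hypothetical. The first function denies any request that is not HTTPS to an exactly listed host. The second refuses to read project config until the folder is in the trusted set.

```python
import json
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical allowlist: only these exact hosts may be contacted.
ALLOWED_HOSTS = frozenset({"api.example.com", "pypi.org"})


def egress_allowed(url: str, allowed_hosts: frozenset[str] = ALLOWED_HOSTS) -> bool:
    """Deny by default: pass only HTTPS URLs whose host is exactly listed."""
    try:
        parts = urlsplit(url)
    except ValueError:  # malformed URL, treat as denied
        return False
    return parts.scheme == "https" and parts.hostname in allowed_hosts


def load_project_config(folder: Path, trusted_folders: set[Path]) -> dict:
    """Read `agent.json` (hypothetical name) only from folders the user trusted.

    `trusted_folders` is expected to hold resolved, absolute paths.
    """
    folder = folder.resolve()
    if folder not in trusted_folders:
        return {}  # untrusted: use built-in defaults and read nothing from disk
    config_file = folder / "agent.json"
    if not config_file.is_file():
        return {}
    return json.loads(config_file.read_text(encoding="utf-8"))


if __name__ == "__main__":
    print(egress_allowed("https://pypi.org/simple/"))
    print(egress_allowed("https://pypi.org.evil.example/"))
```

A check like `egress_allowed` only helps if it runs somewhere the agent cannot bypass, such as a proxy outside the sandbox. Inside the agent's own process it is advice, not a boundary.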
Anthropic delayed their best model until the containment was ready. You should do the same.
Source: https://dev.to/tyson_cung/how-anthropic-contains-claude-3-isolation-patterns-for-shipping-safe-ai-agents-4ppa