๐—•๐—จ๐—œ๐—Ÿ๐——๐—œ๐—ก๐—š ๐—” ๐—–๐—›๐—”๐—ข๐—ฆ ๐—ง๐—˜๐—ฆ๐—ง๐—œ๐—ก๐—š ๐—ง๐—ข๐—ข๐—Ÿ ๐—™๐—ข๐—ฅ ๐—ฉ๐—œ๐——๐—˜๐—ข ๐—”๐—ฃ๐—œ๐—ฆ

A 200 OK response broke our discovery service.

The server cut off the body at 8 KB. Our system parsed the partial data. It wrote empty rows to the database. Users in three regions saw empty results for 40 minutes. No alerts fired. All health checks stayed green.
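A guard that does not trust the status code could have stopped those empty rows. Below is a minimal Python sketch. It assumes a JSON body with a hypothetical `results` field. `parse_results` and `TruncatedBodyError` are illustrative names, not taken from the original system. The guard checks the declared length when one is present, parses strictly, and refuses an empty result set.

```python
import json


class TruncatedBodyError(Exception):
    """Raised when a 200 OK response looks incomplete."""


def parse_results(status: int, headers: dict, body: bytes) -> list:
    """Parse an upstream response strictly instead of trusting the status code."""
    if status != 200:
        raise RuntimeError(f"upstream returned {status}")

    # If the server declared a length, the body must match it exactly.
    declared = headers.get("Content-Length")
    if declared is not None and int(declared) != len(body):
        raise TruncatedBodyError(f"expected {declared} bytes, got {len(body)}")

    # A body cut off mid-document fails strict JSON parsing.
    try:
        payload = json.loads(body)
    except ValueError as exc:  # covers JSONDecodeError and bad UTF-8
        raise TruncatedBodyError(f"body is not complete JSON: {exc}") from exc

    # Hypothetical schema: {"results": [...]}. A missing or empty list is
    # treated as suspicious instead of being written as empty rows.
    results = payload.get("results") if isinstance(payload, dict) else None
    if not results:
        raise TruncatedBodyError("200 OK with no results")
    return results
```

Rejecting empty results assumes the endpoint should never legitimately return zero items for these queries. If it can, an explicit "empty is allowed" flag is safer than a silent write.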

We assumed failures were honest. We expected 500 errors or connection failures. The server lied to us.

We built a testing rig to simulate these dishonest failures. It injects faults into the read paths.
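The post does not show the rig's internals. Here is a minimal Python sketch. It assumes every read goes through one fetch function that can be wrapped. `Response`, `ChaosFetch`, and `truncate_fault` are hypothetical names. The injected fault keeps the 200 status, so the code under test sees a dishonest success rather than an error.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    status: int
    body: bytes


Fetch = Callable[[str], Response]
Fault = Callable[[Response], Response]


def truncate_fault(limit: int) -> Fault:
    """Keep the 200 status but cut the body to `limit` bytes."""
    def apply(resp: Response) -> Response:
        return Response(status=resp.status, body=resp.body[:limit])
    return apply


class ChaosFetch:
    """Wrap a read-path fetch function and inject a fault at a given rate."""

    def __init__(self, fetch: Fetch, fault: Fault, rate: float,
                 seed: Optional[int] = None):
        self.fetch = fetch
        self.fault = fault
        self.rate = rate
        # A seeded RNG makes a failing run reproducible.
        self.rng = random.Random(seed)

    def __call__(self, url: str) -> Response:
        resp = self.fetch(url)
        if self.rng.random() < self.rate:
            return self.fault(resp)
        return resp
```

For example, `ChaosFetch(real_fetch, truncate_fault(8 * 1024), rate=0.2, seed=42)` would cut roughly one read in five at 8 KB. Here `real_fetch` stands in for whatever client the service already uses. The fixed seed means the same reads get cut on every run.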

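The rig tests dishonest failures such as truncated 200 OK bodies, alternating errors, future-dated timestamps, and slow trickles.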

We focus on four goals:

This process revealed hidden bugs. Our retry logic failed during alternating errors. Future-dated timestamps broke our cache. Slow trickles starved our workers.
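Below is a minimal Python sketch of how each of these three faults could be generated. It assumes the rig controls status codes, response fields, and the pace at which bytes are sent. All function names and default values here are hypothetical.

```python
import itertools
import time
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def alternating_statuses(pattern: Tuple[int, ...] = (503, 200)) -> Iterator[int]:
    """Cycle through statuses so failures and successes interleave call by call."""
    return itertools.cycle(pattern)


def future_timestamp(skew: timedelta = timedelta(days=365)) -> str:
    """Return an ISO-8601 UTC timestamp ahead of the local clock by `skew`."""
    return (datetime.now(timezone.utc) + skew).isoformat()


def trickle(body: bytes, chunk_size: int = 16, delay_s: float = 0.5) -> Iterator[bytes]:
    """Yield the body in small chunks, pausing between chunks to mimic a slow upstream."""
    for start in range(0, len(body), chunk_size):
        if start:
            time.sleep(delay_s)
        yield body[start:start + chunk_size]
```

Each generator targets one assumption:
- errors come in runs;
- upstream clocks never run ahead;
- a connection that sends its first byte will finish promptly.

The trickle is most useful against read timeouts that cover only the first byte rather than the whole body.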

Do not trust your upstreams. Build a tool to lie to your code. Find bugs before they wake you up at 2am.

Source: https://dev.to/ahmet_gedik778845/building-a-chaos-testing-harness-for-multi-region-video-api-endpoints-1oh3