Your Agent Didn't Break Prod. Your Pipeline Did.

Your agent did not break production. Your pipeline did.

Many teams use agents to open pull requests. They use CI to check for linting, tests, and builds. Then, a scheduled job moves code from staging to production. This setup eventually fails.

The problem is not a malicious agent. The problem is a bad process. You collapsed two different questions into one gate:

  • Did this pass CI?
  • Is this safe for a human to see right now?

These are not the same thing. CI checks if code works. It does not check if a feature is ready for customers.

If your pipeline treats "merged" and "live" as the same thing, you have a problem. You have opted into continuous deployment without deciding to.

You must decouple these two events.

You can use feature flags to do this. Feature flags are just a boolean and an if statement. You do not need expensive tools to start. A simple config value or environment variable works.

My setup follows these rules:

  • PRs merge to main, but main is not what stays live.
  • A separate release step promotes main to a production branch.
  • I must explicitly say go. No cron jobs or timers.
  • The release waits for the build to serve traffic.
  • An automated check hits key endpoints to confirm the site works.
  • A human does a final manual check on the changes.

This creates a gate. A human, a machine, and another human all get a chance to say no before a user sees anything.

If a bug still reaches a user, you need to revert fast. To do this, follow the expand and contract pattern for database migrations.

  • Add a new column as nullable.
  • Backfill the data.
  • Write to both old and new columns.
  • Read from the new one.
  • Drop the old column only in a later release.

If you skip this, your revert button is useless. If a migration drops a column the old code needs, you cannot roll back. You just stay broken.

An agent makes these mistakes happen faster. It removes the manual friction that used to hide your bad processes. The discipline you skipped was never optional. It was just being covered by a human noticing a mistake before 5:00 PM on a Friday.

Take the human out of the loop, and the discipline becomes mandatory.

Source: https://dev.to/mattstratton/your-agent-didnt-break-prod-your-pipeline-did-4g9o