𝗧𝗵𝗲 𝗛𝘂𝗺𝗮𝗻 𝗶𝗻 𝘁𝗵𝗲 𝗟𝗼𝗼𝗽 𝗦𝗥𝗘

📅4 hours ago⏱1 min read


Automation moves faster than humans.

In June 2021, a customer configuration change triggered a latent software bug at Fastly and caused a global outage. The errors went global in about a minute. By Fastly's own account, it took 49 minutes to get 95% of the network back to normal.

This is the core challenge of AI-assisted SRE. AI can detect and fix issues at speeds humans cannot match. The danger is not the technology. The danger is the speed gap between automated actions and human accountability.

You must design an escalation policy to define where automation ends and human judgment begins.

Use the Automation Autonomy Spectrum to govern your AI:

• Level 0 (Manual): AI provides no help. Humans do everything.
• Level 1 (Assisted): AI provides context. Humans make all decisions.
• Level 2 (Supervised): AI suggests actions. Humans must approve each one.
• Level 3 (Conditional): AI acts within set rules. Humans get notified.
• Level 4 (Autonomous): AI acts and verifies alone.
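The spectrum above can be written down as a small gate in code. This is a minimal sketch, not a reference implementation: the names AutonomyLevel and may_proceed are hypothetical, and the reading of each level (for example, that Level 2 proceeds only after approval) is an assumption drawn from the one-line descriptions above.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Hypothetical encoding of the five levels listed above."""
    MANUAL = 0       # AI provides no help
    ASSISTED = 1     # AI provides context only
    SUPERVISED = 2   # AI suggests, a human approves each action
    CONDITIONAL = 3  # AI acts within set rules, humans get notified
    AUTONOMOUS = 4   # AI acts and verifies alone


def may_proceed(level: AutonomyLevel, human_approved: bool, within_rules: bool) -> bool:
    """Return True if an AI-proposed action may go ahead at this level."""
    if level <= AutonomyLevel.ASSISTED:
        return False            # Levels 0 and 1: humans decide and act
    if level == AutonomyLevel.SUPERVISED:
        return human_approved   # Level 2: explicit approval per action
    if level == AutonomyLevel.CONDITIONAL:
        return within_rules     # Level 3: only inside the pre-set rules
    return True                 # Level 4: no per-action gate


def must_notify(level: AutonomyLevel) -> bool:
    """Level 3 is the level where humans are notified after the fact."""
    return level == AutonomyLevel.CONDITIONAL
```

An IntEnum keeps the levels ordered, so "Level 1 or below" becomes a plain comparison.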

Never leave an automation at Level 4 forever. Systems change. An automation that works today might become dangerous tomorrow if the system underneath it shifts. You must review every autonomous automation on a regular schedule.
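One way to enforce that review is to let Level 4 expire. The sketch below is an assumption, not a rule from the source: the 90-day interval, the fallback to Level 3, and the name effective_level are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical review interval; pick one that fits your change rate.
REVIEW_INTERVAL = timedelta(days=90)

AUTONOMOUS = 4
CONDITIONAL = 3


def effective_level(granted_level: int, last_reviewed: date, today: date) -> int:
    """Drop an unreviewed Level 4 automation to Level 3 until a human re-reviews it."""
    overdue = today - last_reviewed > REVIEW_INTERVAL
    if granted_level == AUTONOMOUS and overdue:
        return CONDITIONAL
    return granted_level
```

With this in place, autonomy is something an automation keeps earning, not something it holds by default.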

Hand control from automation back to a human when any of these four triggers fires:

Low Confidence: The AI is unsure of its diagnosis.
High Blast Radius: The action affects too many services or users.
Novelty: The AI has not seen this failure pattern before.
Regulation: The action touches a protected or compliance-regulated system.
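The four triggers can be checked in one place before any action runs. This is a minimal sketch under stated assumptions: the ProposedAction fields, the function name, and both thresholds (0.9 confidence, 3 services) are hypothetical placeholders, not values from the source.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; tune them from your own incident data.
MIN_CONFIDENCE = 0.9
MAX_AFFECTED_SERVICES = 3


@dataclass
class ProposedAction:
    confidence: float               # AI's confidence in its diagnosis, 0.0 to 1.0
    affected_services: int          # how many services the action would touch
    pattern_seen_before: bool       # False if the failure pattern is new
    touches_regulated_system: bool  # True for protected or regulated systems


def escalation_reasons(action: ProposedAction) -> List[str]:
    """Return every trigger that fired; an empty list means no escalation."""
    reasons = []
    if action.confidence < MIN_CONFIDENCE:
        reasons.append("low_confidence")
    if action.affected_services > MAX_AFFECTED_SERVICES:
        reasons.append("high_blast_radius")
    if not action.pattern_seen_before:
        reasons.append("novelty")
    if action.touches_regulated_system:
        reasons.append("regulation")
    return reasons
```

Returning the reasons, not just a yes or no, tells the on-call human why the action landed on their desk.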

Do not let "the AI decided" be your excuse. Every action must trace back to a human or a policy approved by leadership.
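You can make that traceability a hard requirement in your audit log. The sketch below assumes a hypothetical ActionRecord type; the field names and the example policy ID are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionRecord:
    """Hypothetical audit entry for one automated action."""
    action: str
    approved_by: Optional[str] = None  # the human who approved this action
    policy_id: Optional[str] = None    # the leadership-approved policy it ran under

    def __post_init__(self) -> None:
        # Refuse to record an action that nobody is accountable for.
        if not self.approved_by and not self.policy_id:
            raise ValueError("action has no accountable human or policy")


# Usage: the record is only valid with an owner attached.
record = ActionRecord(action="restart api pods", policy_id="POLICY-EXAMPLE-1")
```

An action with neither field set fails at creation time, so "the AI decided" never reaches the log.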

Build your policy before you turn on the automation. Use data to prove your AI is accurate. If your AI is wrong too often, downgrade its autonomy immediately.
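"Wrong too often" needs a number before it can trigger anything. Here is a minimal sketch, assuming you record each automated action as correct or incorrect after review. The 95% accuracy floor, the 20-sample minimum, and the one-level downgrade are hypothetical choices, not values from the source.

```python
from typing import Sequence

SUPERVISED = 2  # Level 2: humans approve each action


def reviewed_level(
    current_level: int,
    outcomes: Sequence[bool],   # True = the AI's action was judged correct
    min_accuracy: float = 0.95,
    min_samples: int = 20,
) -> int:
    """Return the autonomy level the automation should hold after review."""
    if len(outcomes) < min_samples:
        # Not enough data to prove accuracy: cap at Supervised.
        return min(current_level, SUPERVISED)
    accuracy = sum(outcomes) / len(outcomes)
    if accuracy < min_accuracy:
        # Wrong too often: step down one level, never below Level 0.
        return max(current_level - 1, 0)
    return current_level
```

Run a check like this after every reviewed action, so a bad streak costs autonomy right away.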

Source: https://dev.to/npayyappilly/the-human-in-the-loop-sre-designing-automation-escalation-policies-for-ai-assisted-operations-2c7f

