𝗪𝗵𝗮𝘁 𝗜𝘀 𝗠𝘂𝗹𝘁𝗶-𝗔𝗴𝗲𝗻𝘁 𝗦𝗥𝗘?

SRE teams want to use AI. Most teams fail because they treat AI as a single tool. You should treat AI as a team of agents instead.

Throwing one large model at an incident fails in production, for three reasons.

Context limits. Real incidents have too much data for one prompt.
Lack of specialization. Detection, triage, and remediation are different jobs. One prompt cannot do all three well.
Trust issues. You cannot audit a single opaque model. You cannot pause it or hand parts of its work to a human.

A multi-agent system breaks the incident lifecycle into specialists.

• Detection agent. Watches signals and identifies incidents.
• Correlation agent. Groups related alerts and removes noise.
• Investigation agent. Checks logs and traces to find root causes.
• Remediation agent. Proposes reversible actions and waits for your approval.
• Post-mortem agent. Drafts timelines and action items for you to edit.
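The remediation agent's "propose, then wait" behavior can be sketched as an approval gate. This is a minimal illustration, not a reference implementation: the `ProposedAction` shape, the `execute` helper, and the in-memory `state` dictionary are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProposedAction:
    """A reversible change proposed by the remediation agent (hypothetical shape)."""
    description: str
    apply: Callable[[], None]
    rollback: Callable[[], None]
    approved_by: Optional[str] = None  # stays None until a human signs off


def execute(action: ProposedAction) -> bool:
    """Run the action only if a human approved it; roll back if applying fails."""
    if action.approved_by is None:
        return False  # still waiting for approval, nothing is touched
    try:
        action.apply()
    except Exception:
        action.rollback()
        raise
    return True


# Usage: a toy "scale up" action against an in-memory replica count.
state = {"replicas": 3}
action = ProposedAction(
    description="Scale checkout service from 3 to 5 replicas",
    apply=lambda: state.update(replicas=5),
    rollback=lambda: state.update(replicas=3),
)

ran_before_approval = execute(action)      # refused: no approver recorded
action.approved_by = "oncall@example.com"  # a human approves the proposal
ran_after_approval = execute(action)       # now the change is applied
```

Requiring a rollback callable on every proposal is one way to make "reversible" a property of the data rather than a promise in a prompt.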

Each agent owns one narrow task. They pass structured data to each other. This structure provides three benefits.

Bounded context. Agents only see the data they need. This keeps quality high.
Inspectable seams. You can see exactly what any agent decided.
Human takeover. You can step in at any point and continue the work.
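As a rough sketch of these three properties, the example below shows a correlation stage that receives only alerts (bounded context), records its output in an audit log (inspectable seams), and returns a plain data object a human could edit before the next stage runs (human takeover). The `Alert` and `IncidentGroup` types, the `correlate` logic, and `run_stage` are assumptions made for illustration. A real correlation agent would do far more than group by service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Alert:
    service: str
    message: str


@dataclass(frozen=True)
class IncidentGroup:
    service: str
    distinct_alerts: int
    messages: Tuple[str, ...]


def correlate(alerts: List[Alert]) -> List[IncidentGroup]:
    """Group alerts by service and drop exact duplicate messages."""
    by_service: Dict[str, List[str]] = {}
    for alert in alerts:
        messages = by_service.setdefault(alert.service, [])
        if alert.message not in messages:
            messages.append(alert.message)
    return [
        IncidentGroup(service, len(messages), tuple(messages))
        for service, messages in sorted(by_service.items())
    ]


audit_log: List[dict] = []


def run_stage(name: str, stage: Callable, payload):
    """Run one agent stage and record what it produced."""
    result = stage(payload)
    audit_log.append({"stage": name, "output": result})
    return result


alerts = [
    Alert("checkout", "5xx spike"),
    Alert("checkout", "5xx spike"),         # duplicate, treated as noise
    Alert("checkout", "latency p99 high"),
    Alert("search", "disk 90% full"),
]
groups = run_stage("correlation", correlate, alerts)
```

Because `groups` is ordinary data, an on-call engineer can inspect or correct it and then feed it to the next stage by hand.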

Watch out for two common mistakes.

First, avoid chatty agents. Do not let agents talk through a shared chat history. Use typed artifacts to prevent loops and stale information.
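One way a typed artifact can guard against loops and stale information is to carry its own age and hop count, and to have each consumer check both before acting. The sketch below assumes a hypothetical `Handoff` type and arbitrary limits (`MAX_AGE_SECONDS`, `MAX_HOPS`). Tune or replace these for a real system.

```python
import time
from dataclasses import dataclass, replace
from typing import Optional

MAX_AGE_SECONDS = 300  # assumed freshness window
MAX_HOPS = 5           # assumed cap on agent-to-agent handoffs


@dataclass(frozen=True)
class Handoff:
    """Typed artifact passed between agents instead of a shared chat log."""
    incident_id: str
    producer: str
    created_at: float
    hops: int = 0


def accept(handoff: Handoff, consumer: str, now: Optional[float] = None) -> Handoff:
    """Validate an incoming handoff and return the artifact the consumer emits next."""
    now = time.time() if now is None else now
    if now - handoff.created_at > MAX_AGE_SECONDS:
        raise ValueError("stale handoff")
    if handoff.hops >= MAX_HOPS:
        raise ValueError("hop limit reached, possible loop")
    return replace(handoff, producer=consumer, created_at=now, hops=handoff.hops + 1)


first = Handoff("INC-1", "detection", created_at=1000.0)
second = accept(first, "correlation", now=1100.0)  # fresh, so it is accepted
```

A shared chat history has no equivalent of `hops` or `created_at`, which is why two agents can keep replying to each other or act on old context without anything noticing.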

Second, limit permissions. Do not give every agent the same credentials. Limit what each agent can do to prevent errors.
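Per-agent permissions can be written down as an explicit scope table that is checked before any tool call. The permission names and the agent-to-scope mapping below are illustrative assumptions, not a recommended policy.

```python
from enum import Flag, auto


class Permission(Flag):
    READ_ALERTS = auto()
    READ_LOGS = auto()
    WRITE_TICKETS = auto()
    CHANGE_INFRA = auto()


# Hypothetical scopes: each agent gets only what its task needs.
AGENT_SCOPES = {
    "correlation": Permission.READ_ALERTS,
    "investigation": Permission.READ_ALERTS | Permission.READ_LOGS,
    "remediation": Permission.READ_ALERTS | Permission.CHANGE_INFRA,
    "post_mortem": Permission.READ_ALERTS | Permission.READ_LOGS | Permission.WRITE_TICKETS,
}


def is_allowed(agent: str, needed: Permission) -> bool:
    """Return True only if the agent holds every permission in `needed`."""
    granted = AGENT_SCOPES.get(agent, Permission(0))  # unknown agents get nothing
    return (granted & needed) == needed


can_correlation_read = is_allowed("correlation", Permission.READ_ALERTS)
can_correlation_change = is_allowed("correlation", Permission.CHANGE_INFRA)
```

In practice the same table would map to separate credentials per agent, so a bug in one agent cannot borrow another agent's access.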

If you want to start, begin with a correlation agent. It is read-only, so it is low risk. Once that works, add investigation. Add detection next. Add remediation last.

Build slowly. You want a system you can trust at 3am.

Written by Dr. Samson Tanimawo

Source: https://dev.to/samson_tanimawo/what-is-multi-agent-sre-a-practical-introduction-5ccj
