𝗗𝗼𝗻'𝘁 𝗨𝘀𝗲 𝗔𝗻 𝗟𝗟𝗠 𝗧𝗼 𝗗𝗲𝗰𝗶𝗱𝗲 𝗔𝗜 𝗔𝗴𝗲𝗻𝘁 𝗔𝗰𝘁𝗶𝗼𝗻𝘀


AI-assisted draft.

GyaanSetu Editorial · 17 hours ago · 2 min read

Stop using LLMs to decide what your AI agent is allowed to do.

I belong to a group called AARM, which studies how to secure AI agents. We agree on one thing: control must sit at the point of action. Every tool call is checked before it runs, and the agent cannot bypass that check. Telling an agent "please do not do this" is not a security model.
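
Here is a minimal sketch of what "at the point of action" can mean in code. It assumes a Python agent whose tools are plain functions. The agent receives a gateway object rather than the tools themselves, so the only way to run a tool is through a check that executes first. `ToolCall`, `ToolGateway`, and `PolicyDenied` are hypothetical names for illustration. They are not part of AARM or any framework.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class ToolCall:
    """One action the agent wants to take (hypothetical shape)."""
    tool: str
    args: dict[str, Any] = field(default_factory=dict)


class PolicyDenied(Exception):
    """Raised when the gate blocks a tool call."""


class ToolGateway:
    """The only path from the agent to real tools.

    The agent is handed a ToolGateway, never the raw tool functions,
    so every call passes through `check` before anything executes.
    """

    def __init__(self, tools: dict[str, Callable[..., Any]],
                 check: Callable[[ToolCall], bool]) -> None:
        self._tools = tools
        self._check = check

    def call(self, call: ToolCall) -> Any:
        # Unknown tools are rejected outright.
        if call.tool not in self._tools:
            raise PolicyDenied(f"unknown tool: {call.tool}")
        # Enforcement happens here, in code, before the tool runs.
        if not self._check(call):
            raise PolicyDenied(f"blocked: {call.tool} {call.args}")
        return self._tools[call.tool](**call.args)
```

In use, the agent only ever sees `gateway.call(...)`. A blocked call raises `PolicyDenied`, and the tool function never runs.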

Many people use a second LLM as a judge. When the agent wants to act, you send the proposed action to a second model and ask whether it is safe. The model says yes or no. It is a model watching a model, and the approach has two major flaws.

First, the judge has the same weakness as the agent. Agents can be tricked by prompt injection or clever user requests. The judge has to read the proposed action, and that action may carry the same injected text that fooled the agent. If you can trick the agent, you can likely trick the judge. You are putting a second system that can be talked into things in front of the first one.

Second, LLM outputs are not deterministic. You can ask a model the same question twice and get different answers, because each output token is sampled from a probability distribution. For most tasks, this is fine. For security, it is a liability.
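
The sampling point can be shown with a toy. Assume a judge model that, for one fixed prompt, assigns made-up scores (logits) to the tokens `ALLOW` and `DENY`. Temperature sampling turns those scores into probabilities with a softmax and then draws from them. The identical question can therefore come back either way. The numbers are invented; only the mechanism matters.

```python
import math
import random
from collections import Counter


def softmax(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Convert raw token scores into a probability distribution."""
    scaled = {tok: s / temperature for tok, s in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - top) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_verdict(logits: dict[str, float], temperature: float,
                   rng: random.Random) -> str:
    """Draw one verdict token, the way a sampled decoder would."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical judge scores for ONE fixed prompt: "Is DROP TABLE users safe?"
judge_logits = {"ALLOW": 1.0, "DENY": 2.0}

rng = random.Random()  # unseeded, so every run differs
verdicts = Counter(sample_verdict(judge_logits, 1.0, rng) for _ in range(1000))
print(verdicts)  # both verdicts appear; exact counts vary from run to run
```

With these made-up scores, `ALLOW` has probability 1/(1 + e), about 0.27, so roughly one identical request in four is approved. Lowering the temperature makes the favored answer more likely, but it does nothing about the first flaw: the judge can still be persuaded.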

The same request might be allowed to delete a database on Tuesday and blocked on Wednesday. No logic explains the difference. It was just a different roll of the dice. You cannot explain this to an auditor. You cannot rely on it at two in the morning when things go wrong.

A rule is different. A rule says "deny delete on production," and it gives the same answer every single time. You can test it. You can audit the logs. You can stand behind the decision.
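
Here is a sketch of a rule like "deny delete on production" as ordinary code. It assumes a hypothetical rule format with glob patterns (matched with Python's standard `fnmatch.fnmatchcase`), first-match-wins ordering, and a default of deny when nothing matches. It is not any particular policy engine's syntax.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    rule_id: str
    effect: str        # "allow" or "deny"
    action: str        # glob pattern, e.g. "delete*"
    environment: str   # glob pattern, e.g. "production"


# Hypothetical policy. Order matters: the first matching rule wins.
POLICY = [
    Rule("R1", "deny", "delete*", "production"),
    Rule("R2", "allow", "read*", "*"),
    Rule("R3", "allow", "*", "staging"),
]


def decide(action: str, environment: str, policy: list[Rule] = POLICY) -> dict:
    """Pure function of its inputs: the same request always gets the same record."""
    for rule in policy:
        if fnmatchcase(action, rule.action) and fnmatchcase(environment, rule.environment):
            return {"action": action, "environment": environment,
                    "effect": rule.effect, "rule": rule.rule_id}
    # Nothing matched: fail closed.
    return {"action": action, "environment": environment,
            "effect": "deny", "rule": "default-deny"}
```

Each decision record names the rule that produced it, so the log answers the auditor's "why." And because `decide` depends only on its inputs, a unit test that calls it a thousand times will always see the same result.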

Models are useful for security, but not as the final gate. Use models for soft work:

Spotting weird patterns.
Flagging sensitive text.
Scoring risk levels.
Identifying anomalies.

Let the model flag the issue, but do not let it open the gate. The final decision must sit on a system that gives the same answer every time.
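
Here is one way to wire that split, as a sketch. The deterministic rule verdict comes first. A model's risk score can then only add friction, by routing an allowed action to human review. The score can come from any function that returns a number between 0 and 1. `RiskScorer`, `final_decision`, and the 0.7 threshold are hypothetical. The scorer is never consulted for a denied action, so it cannot open the gate.

```python
from typing import Callable

# Hypothetical: any scorer returning a float in [0, 1], e.g. an LLM classifier.
RiskScorer = Callable[[str, str], float]


def final_decision(action: str, environment: str, rule_effect: str,
                   score_risk: RiskScorer, review_threshold: float = 0.7) -> str:
    """Combine a deterministic rule verdict with an advisory model score.

    The score can only add friction. It can never turn "deny" into "allow".
    """
    if rule_effect == "deny":
        return "deny"  # the rule is final; the model is not even consulted
    risk = score_risk(action, environment)
    if risk >= review_threshold:
        return "needs_review"  # flag for a human instead of auto-allowing
    return "allow"
```

If the score wobbles from run to run, the worst case is an extra human review, not an action the rules forbid.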

The closer your agent gets to money, production data, or customer info, the more this matters. If an agent writes a bad paragraph, it is not a crisis. If an agent drops a database, it is a disaster.

The final decision should be boring. It should be a hard line the agent cannot talk its way past.
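
Here is a small sketch of what "cannot talk its way past" means mechanically. The gate decides on structured fields (`action`, `target`). It treats the agent's free-text justification as data to log, never as input to the decision. `Request`, `gate`, and `PROTECTED_TARGETS` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    action: str
    target: str
    justification: str  # free text from the agent: logged, never evaluated


# Hypothetical resource names the agent may never delete.
PROTECTED_TARGETS = {"prod-db", "payments"}


def gate(req: Request, audit: list) -> bool:
    """Decide on structured fields only; the justification cannot change the outcome."""
    allowed = not (req.action == "delete" and req.target in PROTECTED_TARGETS)
    # Record everything, including what the agent claimed, for later review.
    audit.append((req.action, req.target, allowed, req.justification))
    return allowed
```

A polite request and one stuffed with "SYSTEM OVERRIDE" text get the same answer, because no code path reads the justification when deciding.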

Source: https://dev.to/brianrhall/dont-use-an-llm-to-decide-what-your-ai-agent-is-allowed-to-do-1dkn

Optional learning community: https://t.me/GyaanSetuAi
