𝗗𝗼𝗻'𝘁 𝗨𝘀𝗲 𝗔𝗻 𝗟𝗟𝗠 𝗧𝗼 𝗗𝗲𝗰𝗶𝗱𝗲 𝗔𝗜 𝗔𝗴𝗲𝗻𝘁 𝗔𝗰𝘁𝗶𝗼𝗻𝘀

Stop using LLMs to decide what your AI agent is allowed to do.

I belong to a group called AARM. We study how to secure AI agents. We agree on one thing: control must sit at the point of action. You check a tool call before it runs. The agent cannot bypass this check. Telling an agent "please do not do this" is not a security model.
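Here is a minimal sketch of what "control at the point of action" can look like. Every name in it (ToolGateway, ActionDenied, no_prod_deletes, delete_db) is hypothetical and made up for illustration. The agent never gets the tool itself. It only gets gateway.call, and the check runs before the tool does.

```python
from typing import Any, Callable, Dict, List


class ActionDenied(Exception):
    """Raised when the check refuses a tool call."""


class ToolGateway:
    """Single chokepoint: every tool call passes the check before it runs."""

    def __init__(self, check: Callable[[str, Dict[str, Any]], bool]) -> None:
        self._check = check
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **args: Any) -> Any:
        if name not in self._tools:
            raise ActionDenied(f"unknown tool: {name}")
        # The check happens here, before the tool is invoked.
        if not self._check(name, args):
            raise ActionDenied(f"denied: {name} {args}")
        return self._tools[name](**args)


def no_prod_deletes(name: str, args: Dict[str, Any]) -> bool:
    # Hypothetical check: refuse delete_db against the production environment.
    return not (name == "delete_db" and args.get("env") == "production")


deleted: List[str] = []  # records which environments the tool actually touched


def delete_db(env: str) -> str:
    deleted.append(env)
    return f"deleted {env}"


gateway = ToolGateway(no_prod_deletes)
gateway.register("delete_db", delete_db)
```

Calling gateway.call("delete_db", env="staging") runs the tool. The same call with env="production" raises ActionDenied and the tool body never executes. One caveat: a leading underscore does not stop code in the same process from reaching the tools. For the agent to truly have no bypass, the gateway would need to sit behind a real boundary, such as a separate service that holds the credentials.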

Many people use a second LLM as a judge. The agent wants to act. You send that action to a second model. You ask it if the action is safe. The model says yes or no. This is a model watching a model. This approach has two major flaws.

First, the judge has the same weakness as the agent. Agents can be tricked by prompt injection or clever user requests. If you can trick the agent, you can likely trick the judge. You are putting a second system that responds to pressure in front of the first one.

Second, LLMs are not deterministic. You can ask a model the same question twice and get different answers. This happens because the model samples each output token from a probability distribution. For most tasks, this is fine. For security, it is a liability.

An agent might be allowed to delete a database on Tuesday but blocked on Wednesday. There is no logic to explain why. It was just a different roll of the dice. You cannot explain this to an auditor. You cannot rely on it at two in the morning when things go wrong.
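To put a rough number on the dice roll, here is a small worked example. The 99% accuracy figure is an assumption picked for illustration, not a measurement of any real model. The math also assumes each decision is independent of the others.

```python
def chance_of_at_least_one_miss(per_call_accuracy: float, calls: int) -> float:
    """Probability that at least one of `calls` independent decisions is wrong."""
    return 1.0 - per_call_accuracy ** calls


# Hypothetical judge that is right 99% of the time, asked 1,000 times.
p = chance_of_at_least_one_miss(0.99, 1000)
print(f"chance of at least one wrong decision: {p:.5f}")
```

Under those assumptions, at least one wrong call across 1,000 decisions is close to certain. A judge that is usually right still gets it wrong eventually when it is asked often enough.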

A rule is different. A rule says "deny delete on production." Given the same request, it gives the same answer every single time. You can test it. You can audit the logs. You can stand behind the decision.
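A rule like that can be plain data plus a pure function. This is a sketch under my own assumptions: the Rule shape, the glob patterns, the first-match-wins order, and the default deny are all illustrative choices, not something the source article prescribes.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    effect: str       # "allow" or "deny"
    action: str       # glob pattern, e.g. "delete*"
    environment: str  # glob pattern, e.g. "production"


# Hypothetical policy: first matching rule wins, anything unmatched is denied.
RULES: List[Rule] = [
    Rule("deny", "delete*", "production"),
    Rule("allow", "*", "staging"),
    Rule("allow", "read*", "production"),
]


def decide(action: str, environment: str) -> Tuple[str, str]:
    """Return (effect, reason). No randomness and no I/O."""
    for index, rule in enumerate(RULES):
        if fnmatchcase(action, rule.action) and fnmatchcase(environment, rule.environment):
            reason = f"rule {index}: {rule.effect} {rule.action} on {rule.environment}"
            return rule.effect, reason
    return "deny", "no rule matched (default deny)"
```

Every decision comes back with a reason string you can write to a log. That is the part you show the auditor. And because decide has no sampling in it, a unit test that passes today keeps passing until someone changes the rules.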

Models are useful for security, but not as the final gate. Use models for soft work:

  • Spotting weird patterns.
  • Flagging sensitive text.
  • Scoring risk levels.
  • Identifying anomalies.

Let the model flag the issue, but do not let it open the gate. The final decision must sit on a system that gives the same answer every time.
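One way to wire that split, as a sketch. The names hard_rule, gate, and risk_score are hypothetical, and so is the 0.7 threshold. The risk_score callable stands in for whatever model you use. The key property: the score can move an allowed action into human review, but it is never consulted once the rule has said deny.

```python
from typing import Callable


def hard_rule(action: str, environment: str) -> str:
    # Hypothetical deterministic rule standing in for a real policy engine.
    if action.startswith("delete") and environment == "production":
        return "deny"
    return "allow"


def gate(
    action: str,
    environment: str,
    risk_score: Callable[[str], float],
    review_threshold: float = 0.7,  # assumed cutoff, chosen for illustration
) -> str:
    """Combine a hard rule with an advisory model score.

    The score can only tighten the outcome (allow -> review).
    It cannot turn a deny into an allow.
    """
    if hard_rule(action, environment) == "deny":
        return "deny"
    if risk_score(action) >= review_threshold:
        return "review"  # hold for a human; the model approved nothing
    return "allow"
```

The model's score may still vary from run to run. The difference is what that variation can do. The worst case is an extra action waiting for a human, not a production delete slipping through.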

The closer your agent gets to money, production data, or customer info, the more this matters. If an agent writes a bad paragraph, it is not a crisis. If an agent drops a database, it is a disaster.

The final decision should be boring. It should be a hard line the agent cannot talk its way past.

Source: https://dev.to/brianrhall/dont-use-an-llm-to-decide-what-your-ai-agent-is-allowed-to-do-1dkn
