不要再盲目信任智能体：将审批与精确的工具调用绑定

📅4 hours ago⏱2 min read

𝗦𝘁𝗼𝗽 𝗧𝗿𝘂𝘀𝘁𝗶𝗻𝗴 𝗧𝗵𝗲 𝗔𝗴𝗲𝗻𝘁: 𝗕𝗶𝗻𝗱 𝗔𝗽𝗽𝗿𝗼𝘃𝗮𝗹𝘀 𝘁𝗼 𝗘𝘅𝗮𝗰𝘁 𝗧𝗼𝗼𝗹 𝗖𝗮𝗹𝗹𝘀

Most agentic systems protect dangerous actions like file writes or money transfers with a simple approval.

Usually, this approval is a boolean flag in the system state. Example: approved: true.

This is a mistake. A boolean fails in three ways that attackers exploit:

Flip: An attacker changes the state from false to true via prompt injection or code flaws.
Replay: You approve a safe command like "read file." The system sees "true" and allows a second, dangerous command like "delete database."
Argument Drift: You approve "send $10." An attacker changes the amount to $10,000 before execution. The flag still says "true."

The problem is that you are modeling approval as a property of the entire session. It must be evidence for one specific call.

How to fix it:

When a human approves a call, create a secure tag. This tag must lock these four things:

The unique tool call ID.
A hash of the exact arguments.
The user identity.
An expiration time.

Verify this tag at the exact moment of execution. Use a secret key that only the system knows.

Follow these rules for implementation:

Use Canonicalization: Both the approver and the executor must hash the exact same bytes. Use RFC 8785 to ensure numbers and keys match.
Fail Closed: If a tag is missing, expired, or wrong, return a specific "not approved" error. Do not treat it as a standard tool result.
Deny by Default: Only allow tools that require explicit approval. Deny everything else.
Handle Replays: If you use engines like Temporal, ensure your secret key is deterministic. If the key changes after a system restart, all existing approvals will fail.

Authorization should not be a floating piece of state. It must be a bound envelope that proves: "This specific person approved these specific arguments for this specific tool until this specific time."

Stop using booleans. They are not a simplification. They are a bug.

Source: https://dev.to/whatsonyourmind/stop-trusting-the-agent-bind-tool-call-approvals-to-the-exact-call-5080

Optional learning community: https://t.me/GyaanSetuAi

不要再盲目信任智能体：将审批与精确的工具调用绑定

Continue reading

𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗔𝗜 𝗜𝗻𝗰𝗶𝗱𝗲𝗻𝘁 𝗥𝗲𝘀𝗽𝗼𝗻𝘀𝗲: 𝗥𝗼𝗹𝗹 𝗕𝗮𝗰𝗸 𝗥𝗼𝗴𝘂𝗲 𝗔𝗴𝗲𝗻𝘁𝘀

智能体 AI 治理框架

我的编程智能体每一步微小的操作都要征求许可

构建生产级智能体循环

AI 智能体存在可靠性问题