𝗦𝘁𝗼𝗽 𝗧𝗿𝘂𝘀𝘁𝗶𝗻𝗴 𝗧𝗵𝗲 𝗔𝗴𝗲𝗻𝘁: 𝗕𝗶𝗻𝗱 𝗔𝗽𝗽𝗿𝗼𝘃𝗮𝗹𝘀 𝘁𝗼 𝗘𝘅𝗮𝗰𝘁 𝗧𝗼𝗼𝗹 𝗖𝗮𝗹𝗹𝘀
Most agentic systems protect dangerous actions like file writes or money transfers with a simple approval.
Usually, this approval is a boolean flag in the system state. Example: approved: true.
This is a mistake. A boolean fails in three ways that attackers exploit:
- Flip: An attacker changes the state from false to true via prompt injection or code flaws.
- Replay: You approve a safe command like "read file." The system sees "true" and allows a second, dangerous command like "delete database."
- Argument Drift: You approve "send $10." An attacker changes the amount to $10,000 before execution. The flag still says "true."
The problem is that you are modeling approval as a property of the entire session. It must be evidence for one specific call.
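Here is a minimal sketch of that failure. The names `session` and `execute` are hypothetical, not taken from any real framework. The gate only asks whether something was approved, never whether this call was approved.

```python
# Hypothetical session-level gate: one flag guards every call.
session = {"approved": False}

def execute(tool: str, args: dict) -> str:
    # The check is about the session, not about this tool or these arguments.
    if not session["approved"]:
        return "blocked"
    return f"ran {tool} with {args}"

# A human approves "transfer $10" and the flag flips to true.
session["approved"] = True

# Argument drift: same tool, different amount, same flag.
drifted = execute("transfer", {"amount": 10_000})

# Replay: a different tool rides on the same approval.
replayed = execute("drop_database", {})
```

Both calls run, because nothing ties the flag to the call the human actually saw.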
How to fix it:
When a human approves a call, create a signed tag, such as an HMAC over the approved fields. The tag must bind these four things:
- The unique tool call ID.
- A hash of the exact arguments.
- The user identity.
- An expiration time.
Verify this tag at the exact moment of execution, against the call that is about to run. Sign and verify it with a secret key that only the system holds. The agent must never see it.
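One way to sketch this is an HMAC-SHA256 tag built with Python's standard library. Everything named here (`SECRET_KEY`, `args_hash`, `mint_tag`, `verify_tag`, the 300-second default) is an assumption for illustration. The argument hashing uses sorted-key JSON as a simplified stand-in and is not a full RFC 8785 implementation.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: in a real system this comes from a secret store the agent cannot read.
SECRET_KEY = b"demo-key-do-not-use"

def args_hash(args: dict) -> str:
    # Simplified canonical form (sorted keys, no whitespace). Not full RFC 8785.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def _mac(call_id: str, digest: str, user: str, expires_at: int) -> str:
    # Encode the four bound fields as a JSON array so field boundaries are unambiguous.
    payload = json.dumps([call_id, digest, user, expires_at], separators=(",", ":"))
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()

def mint_tag(call_id: str, args: dict, user: str,
             ttl_s: int = 300, now: Optional[float] = None) -> dict:
    issued = time.time() if now is None else now
    expires_at = int(issued) + ttl_s
    return {"call_id": call_id, "user": user, "expires_at": expires_at,
            "mac": _mac(call_id, args_hash(args), user, expires_at)}

def verify_tag(tag: dict, call_id: str, args: dict,
               now: Optional[float] = None) -> bool:
    current = time.time() if now is None else now
    if current >= tag["expires_at"]:
        return False
    # Recompute from the call about to run, not from what the tag claims.
    expected = _mac(call_id, args_hash(args), tag["user"], tag["expires_at"])
    return hmac.compare_digest(expected, tag["mac"])
```

The executor calls `verify_tag` with the call ID and arguments it is about to run. A changed amount, a different call ID, an edited user, or a late execution all produce a mismatch.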
Follow these rules for implementation:
- Use Canonicalization: Both the approver and the executor must hash the exact same bytes. Serialize the arguments with RFC 8785 (JSON Canonicalization Scheme) so that key order and number formatting cannot differ between the two sides.
- Fail Closed: If a tag is missing, expired, or wrong, return a specific "not approved" error. Do not treat it as a standard tool result.
- Deny by Default: Keep an explicit list of tools that may run once approved. Refuse any tool that is not on the list, even if the call carries a tag.
- Handle Workflow Replays: If you use a durable execution engine like Temporal, your secret key must stay the same across replays and restarts. Load it from configuration or a secret store. If a new key is generated at startup, all existing approvals will fail.
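A rough sketch of how an executor might enforce these rules. `NotApproved`, `APPROVAL_GATED_TOOLS`, `guarded_execute`, and the `APPROVAL_HMAC_KEY` variable name are all hypothetical. The `verify` and `run` callables stand in for whatever tag check and tool runner your system already has.

```python
import os
from typing import Any, Callable, Mapping, Optional

class NotApproved(Exception):
    """Raised instead of returning a value, so it cannot be read as tool output."""

# Hypothetical registry: the only tools the executor will ever run.
APPROVAL_GATED_TOOLS = {"transfer", "write_file"}

def load_signing_key(env: Mapping[str, str] = os.environ) -> bytes:
    # Read a stable key from configuration. Generating one at startup
    # (for example with os.urandom) would invalidate approvals on restart.
    key = env.get("APPROVAL_HMAC_KEY", "")
    if not key:
        raise RuntimeError("APPROVAL_HMAC_KEY is not set")
    return key.encode("utf-8")

def guarded_execute(tool: str, call_id: str, args: dict, tag: Optional[dict],
                    verify: Callable[[dict, str, dict], bool],
                    run: Callable[[str, dict], Any]) -> Any:
    # Deny by default: unknown tools never run, tag or no tag.
    if tool not in APPROVAL_GATED_TOOLS:
        raise NotApproved(f"{tool}: not registered, denied by default")
    # Fail closed: a missing or failing tag raises, it does not return.
    if tag is None or not verify(tag, call_id, args):
        raise NotApproved(f"{tool} ({call_id}): missing, expired, or invalid approval")
    return run(tool, args)
```

Raising a dedicated exception keeps the refusal out of the normal result path, so the agent loop has to handle it explicitly.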
Authorization should not be a floating piece of state. It must be a bound envelope that proves: "This specific person approved these specific arguments for this specific tool until this specific time."
Stop using booleans. They are not a simplification. They are a bug.
Optional learning community: https://t.me/GyaanSetuAi