𝗟𝗟𝗠 𝗣𝗿𝗼𝗺𝗽𝘁 𝗜𝗻𝗷𝗲𝗰𝘁𝗶𝗼𝗻 𝗮𝗻𝗱 𝗚𝘂𝗮𝗿𝗱𝗿𝗮𝗶𝗹 𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆
LLMs have no hard boundary between instructions and data. Everything in the context window is one stream of tokens. Prompt injection happens when attacker data acts as instructions. You cannot filter your way to safety. You must manage it with defense-in-depth.
The failure of common defenses:
- Keyword Blocklists: Attackers use synonyms, misspellings, or different languages to bypass them. Filtering strings does not filter intent.
- Output Redaction: Attackers can fragment or encode secrets so a literal string match fails.
- LLM Judges: A separate model can be socially engineered to believe a secret is harmless.
- Human Review: Humans see rendered text, not raw bytes. They cannot see hidden characters used in ASCII smuggling.
ASCII Smuggling is a major threat. It uses invisible characters like Unicode Tags or zero-width spaces to hide instructions. The model reads them, but the human sees nothing. This allows identity spoofing and data exfiltration via email or calendars.
How to defend your application:
- Sanitize raw payloads: Strip control characters and zero-width characters before they reach the model.
- Use allowlists: Define the specific Unicode categories you need instead of chasing bad ones.
- Normalize data: Use NFKC-normalization on all inputs.
- Minimize secrets: Do not put sensitive data in the context window if the model does not need it.
- Treat RAG as untrusted: Assume any document you retrieve for a model is a potential injection vector.
- Watch for anomalies: Flag inputs where the visible length differs from the raw code-point count.
Security is a pipeline flaw, not just a model flaw. The fix lives in your application code.
Source: https://dev.to/geekaara/llm-prompt-injection-guardrail-security-glm
Optional learning community: https://t.me/GyaanSetuAi