Advanced Threats

Prompt Injection: Advanced Techniques

"Ignore previous instructions" is just the tip of the iceberg. Modern attacks are subtle, multi-stage, and increasingly automated.

1. Indirect Prompt Injection

The attacker doesn't interact with the LLM directly. Instead, they place a payload in a location the LLM will read.

Websites: Hidden text on a webpage that a browsing agent reads.
Documents: White text on a white background in a PDF resume.
Emails: A malicious signature block.

Impact: This turns the LLM into a "confused deputy," executing actions on behalf of the attacker without the user knowing.

2. Multi-Modal Injection

With the rise of GPT-4o and Gemini, attacks can be embedded in images or audio.

Visual Jailbreaks: An image of text saying "Describe how to make a bomb" might bypass text-based safety filters, but the vision model will still read and process it.

3. Many-Shot Jailbreaking

A technique discovered by Anthropic researchers. By flooding the context window with hundreds of fake dialogue examples where an AI behaves badly, the model's safety training is "fatigued," and it eventually complies with the malicious request.

4. ASCII Art & Translation

Base64: Asking the model to "Decode this Base64 string and follow the instructions" often bypasses filters that only scan for English keywords.

ASCII Art: Writing "HATE" in giant ASCII letters might be missed by a text classifier but understood by the LLM.

Can You Defend Against This?

Simple keyword filters fail against these advanced attacks. You need an intent-based AI firewall.

Prove

Enforce