Vol. 1 · Ed. 2026
CyberGlossary
Entry № 1163

Token Smuggling

What is Token Smuggling?

Token Smuggling: a class of jailbreak techniques that hide harmful instructions for an LLM inside encodings, languages, or token sequences that the safety filter does not recognise as dangerous.


Token smuggling exploits the mismatch between how a model tokenizes and decodes text and how its content classifiers analyse it. Attackers split forbidden words across multiple tokens; encode requests in Base64 or ROT-13; substitute Unicode look-alikes or leet-speak; switch to low-resource languages; or instruct the model to assemble the malicious string from harmless pieces, for example "concatenate the second letter of each word". Variants include payload smuggling through tool inputs and obfuscated function calls. The technique works because guardrails often inspect only the surface text, whereas the model acts on the meaning it reconstructs after decoding. Mitigations include classifier ensembles that operate on decoded text, semantic-level intent detection, decoding-aware safety models, runtime sandboxing of tool calls, and continuous adversarial red-team evaluations.
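To make the "operate on decoded text" mitigation concrete, here is a minimal sketch of a check that inspects several decoded views of an input rather than the surface text alone. It is an illustration under stated assumptions, not a production filter: `BLOCKLIST`, `candidate_views`, and `is_flagged` are hypothetical names, the blocklist holds a harmless placeholder word, and a real deployment would pass each view to a trained classifier instead of doing substring matching.

```python
import base64
import binascii
import codecs
import re

# Hypothetical placeholder blocklist; stands in for a real content classifier.
BLOCKLIST = {"forbidden"}

# Runs of Base64-alphabet characters long enough to be worth trying to decode.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{8,}={0,2}")


def candidate_views(text: str) -> list[str]:
    """Return the surface text plus plausible decoded views of it."""
    views = [text, codecs.decode(text, "rot13")]
    for run in B64_RUN.findall(text):
        try:
            decoded = base64.b64decode(run, validate=True)
            views.append(decoded.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or not text once decoded
    return views


def is_flagged(text: str) -> bool:
    """Flag the input if any view contains a blocklisted term."""
    for view in candidate_views(text):
        lowered = view.lower()
        if any(term in lowered for term in BLOCKLIST):
            return True
    return False
```

A surface-only check would miss the Base64 and ROT-13 forms of the placeholder word, whereas `is_flagged` catches them because it also inspects the decoded views. The sketch covers only two encodings and a single decoding pass; nested or chained encodings would need repeated passes.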

Examples

  1. An attacker asks an LLM to take the first letter of each of ten harmless words, spelling out a forbidden chemical synthesis term.

  2. An attacker encodes a malicious request in Base64, so a safety filter sees only random-looking characters while the LLM decodes the request and complies.
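The letter-assembly trick in the first example can be checked for by reconstructing the string the model would be asked to build. The following is a rough sketch, assuming a hypothetical placeholder blocklist and hypothetical helper names (`acrostic`, `hides_acrostic`); it only looks at fixed letter positions, so it would not catch more elaborate assembly instructions.

```python
import re

# Hypothetical placeholder blocklist; stands in for a real content classifier.
BLOCKLIST = {"forbidden"}


def acrostic(text: str, position: int = 0) -> str:
    """Join the letter at `position` from every word long enough to have one."""
    words = re.findall(r"[A-Za-z]+", text)
    return "".join(w[position].lower() for w in words if len(w) > position)


def hides_acrostic(text: str, positions: int = 2) -> bool:
    """Check the first `positions` letter positions for a blocklisted term."""
    for pos in range(positions):
        hidden = acrostic(text, pos)
        if any(term in hidden for term in BLOCKLIST):
            return True
    return False
```

With the default `positions=2`, both "first letter of each word" and "second letter of each word" payloads are reconstructed before the blocklist is consulted.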

Frequently asked questions

What is Token Smuggling?

A class of jailbreak techniques that hide harmful instructions for an LLM inside encodings, languages, or token sequences that the safety filter does not recognise as dangerous. It belongs to the AI & ML Security category of cybersecurity.

How does Token Smuggling work?

Token smuggling exploits the mismatch between how a model tokenizes and decodes text and how its content classifiers analyse it: guardrails often inspect only the surface text, while the model acts on the meaning it reconstructs. Common carriers are words split across multiple tokens, Base64 or ROT-13 encodings, Unicode look-alikes, leet-speak, low-resource languages, and instructions to assemble the malicious string from harmless pieces. Variants include payload smuggling through tool inputs and obfuscated function calls.

How do you defend against Token Smuggling?

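Defences against Token Smuggling typically combine technical controls and operational practices: classifier ensembles that operate on decoded text, semantic-level intent detection, decoding-aware safety models, runtime sandboxing of tool calls, and continuous adversarial red-team evaluations.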
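One simple technical control is to normalise text before it reaches a classifier, so that Unicode look-alikes and leet-speak collapse to a canonical form. The sketch below is illustrative only: `CONFUSABLES` and `LEET` are hypothetical, deliberately tiny mapping tables, and `normalise` is a hypothetical helper. A fuller system would draw on a complete confusables table, and the leet mapping will also rewrite legitimate digits, so the normalised text should feed the classifier rather than replace the original.

```python
import unicodedata

# Hypothetical, deliberately tiny map of Cyrillic look-alikes to Latin letters.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0456": "i",  # Cyrillic Byelorussian-Ukrainian i
})

# Hypothetical map of common leet-speak substitutions.
LEET = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalise(text: str) -> str:
    """Fold compatibility forms and case, then map look-alikes and leet-speak."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return folded.translate(CONFUSABLES).translate(LEET)
```

NFKC handles compatibility variants such as fullwidth letters, but it does not map cross-script look-alikes (for example Cyrillic to Latin), which is why the separate table is needed.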

What are other names for Token Smuggling?

Common alternative names include: Token smuggling jailbreak, Encoded prompt injection.

Related terms