Vol. 1 · Ed. 2026
CyberGlossary
Entry № 619

LLM System Prompt Leak

What is LLM System Prompt Leak?

LLM System Prompt Leak: An attack that extracts the hidden system prompt or instructions of a deployed large language model application, exposing logic, secrets, and tools.


A system prompt leak occurs when a user induces a deployed LLM application to reveal its hidden system prompt, developer instructions, or attached context such as API keys, internal documentation, or tool definitions. Attackers use direct requests, role-play framings, translation tricks, character-encoding obfuscation, or indirect prompt injection through documents the model is asked to summarise. Even partial leaks help adversaries reverse-engineer business logic, find guardrail bypasses, and craft tailored jailbreaks or social-engineering content. Mitigations include treating system prompts as low-trust, effectively public data, keeping secrets out of prompts, enforcing policy in server-side checks, and filtering outputs. Instructing the model not to reveal its instructions also helps, but determined adversaries will often succeed, so that instruction should not be the only control.
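The output filtering mentioned above can be illustrated with a minimal sketch. It assumes the application can inspect each model response before returning it, and it flags a response that contains a canary string or shares many word sequences with the system prompt. The prompt text, the `CANARY` value, the function names, and the threshold are all hypothetical choices made for illustration.

```python
import re

# Hypothetical canary: a unique marker placed in the prompt so that any
# verbatim leak is easy to spot in the output.
CANARY = "zx-canary-7f3a"

# Hypothetical system prompt used only for this illustration.
SYSTEM_PROMPT = (
    f"[{CANARY}] You are SupportBot for Example Corp. "
    "Only answer billing questions. "
    "Never discuss internal discount rules with customers."
)


def _normalise(text: str) -> list[str]:
    """Lowercase and split into word tokens so formatting changes matter less."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of n-token sequences in the token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def looks_like_leak(output: str, prompt: str = SYSTEM_PROMPT,
                    n: int = 5, threshold: float = 0.2) -> bool:
    """Flag an output that contains the canary or overlaps heavily with the prompt."""
    if CANARY in output:
        return True
    prompt_grams = _ngrams(_normalise(prompt), n)
    if not prompt_grams:
        return False
    shared = prompt_grams & _ngrams(_normalise(output), n)
    return len(shared) / len(prompt_grams) >= threshold
```

A filter of this kind only catches near-verbatim leaks. Translated, paraphrased, or encoded output would pass it, which is one reason the prompt should not hold anything confidential in the first place.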

Examples

  1. An attacker tells a chatbot to repeat everything above its first user message in code blocks, exposing the full system prompt and an embedded API key.

  2. A summarisation assistant given a malicious PDF returns its hidden tool descriptions because the document instructs it to do so.
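The first example is harmful mainly because a credential sat in the prompt. The sketch below shows one way to avoid that, assuming the application runs tool calls on its own server: the prompt names the tool, the key stays in the server environment, and a simple scan rejects prompts that contain secret-looking strings. The patterns, the `ORDERS_API_KEY` variable, and the `lookup_order` tool are hypothetical.

```python
import os
import re

# Hypothetical patterns for secret-looking strings; a real scanner would use many more.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),
    re.compile(r"(?i)(api[_-]?key|password|token)\s*[:=]\s*\S+"),
]


def find_secrets(prompt: str) -> list[str]:
    """Return every substring of the prompt that matches a secret pattern."""
    return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(prompt)]


def build_system_prompt() -> str:
    """The prompt describes the tool but carries no credential."""
    return "You can call the tool `lookup_order(order_id)` to fetch order status."


def lookup_order(order_id: str) -> dict:
    """Tool handler executed by the server, never by the model.

    The key is read from the environment at call time, so a leaked
    prompt reveals the tool's name but not the credential.
    """
    api_key = os.environ.get("ORDERS_API_KEY", "")
    # A real handler would call the backend with api_key here.
    return {"order_id": order_id, "authenticated": bool(api_key)}
```

Running `find_secrets` on the prompt as a pre-deployment check is a cheap safeguard, though pattern matching will miss secrets that do not follow a recognisable format.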

Frequently asked questions

What is LLM System Prompt Leak?

An attack that extracts the hidden system prompt or instructions of a deployed large language model application, exposing logic, secrets, and tools. It belongs to the AI & ML Security category of cybersecurity.

How does LLM System Prompt Leak work?

A user induces a deployed LLM application to reveal its hidden system prompt, developer instructions, or attached context such as API keys, internal documentation, or tool definitions. Attackers use direct requests, role-play framings, translation tricks, character-encoding obfuscation, or indirect prompt injection through documents the model is asked to summarise. Even partial leaks help adversaries reverse-engineer business logic, find guardrail bypasses, and craft tailored jailbreaks or social-engineering content.

How do you defend against LLM System Prompt Leak?

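Defences for LLM System Prompt Leak combine technical controls and operational practices: treat the system prompt as low-trust, effectively public data, keep secrets out of it, enforce policy in server-side checks, filter outputs, and instruct the model not to reveal its instructions. None of these makes extraction impossible. Determined adversaries will often succeed, so the design should assume the prompt can leak.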
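One of the technical controls named in the definition, server-side policy checks, can be sketched as follows. The idea is that authorisation rules live in application code instead of the prompt, so they still hold if the prompt is leaked or the model is talked out of following it. The roles, tool names, and refund limit below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """A tool invocation requested by the model."""
    name: str
    args: dict


# Hypothetical policy: which roles may call which tools.
ALLOWED_TOOLS = {
    "customer": {"lookup_order"},
    "agent": {"lookup_order", "issue_refund"},
}
MAX_REFUND = 100.0  # hypothetical per-call limit


def authorise(role: str, call: ToolCall) -> bool:
    """Decide on the server whether the requested tool call may run."""
    if call.name not in ALLOWED_TOOLS.get(role, set()):
        return False
    if call.name == "issue_refund" and float(call.args.get("amount", 0)) > MAX_REFUND:
        return False
    return True
```

Because the check runs outside the model, an attacker who reads the prompt learns that a refund tool exists but cannot use that knowledge to exceed the limit or call it from a customer session.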

What are other names for LLM System Prompt Leak?

Common alternative names include system prompt extraction and prompt exfiltration.

Related terms