MCP Attacks
What is MCP Attacks?
MCP AttacksAttacks that exploit the Model Context Protocol (MCP) to inject prompts, abuse tools, or pivot through servers an AI assistant trusts.
MCP attacks target the Model Context Protocol, an open standard introduced by Anthropic that lets AI assistants connect to external tools, data sources, and applications through a common interface. Because an MCP server can expose tools, resources, and prompts to the model, a malicious or compromised server can inject hidden instructions, exfiltrate the user's data, request dangerous tool calls, or rebind tool definitions after the user has approved them (a 'rug pull'). Related techniques include tool description poisoning, cross-server confused-deputy attacks, and prompt injection via returned documents. Mitigations include signed and pinned server identities, scoped permissions, explicit consent per tool call, and sandboxing of MCP server processes.
● Examples
- 01
A malicious MCP server changes a previously approved tool description so future calls silently exfiltrate the user's emails.
- 02
A document returned by an MCP server contains hidden instructions that tell the assistant to send the user's API keys to an attacker's webhook.
● Frequently asked questions
What is MCP Attacks?
Attacks that exploit the Model Context Protocol (MCP) to inject prompts, abuse tools, or pivot through servers an AI assistant trusts. It belongs to the AI & ML Security category of cybersecurity.
What does MCP Attacks mean?
Attacks that exploit the Model Context Protocol (MCP) to inject prompts, abuse tools, or pivot through servers an AI assistant trusts.
How does MCP Attacks work?
MCP attacks target the Model Context Protocol, an open standard introduced by Anthropic that lets AI assistants connect to external tools, data sources, and applications through a common interface. Because an MCP server can expose tools, resources, and prompts to the model, a malicious or compromised server can inject hidden instructions, exfiltrate the user's data, request dangerous tool calls, or rebind tool definitions after the user has approved them (a 'rug pull'). Related techniques include tool description poisoning, cross-server confused-deputy attacks, and prompt injection via returned documents. Mitigations include signed and pinned server identities, scoped permissions, explicit consent per tool call, and sandboxing of MCP server processes.
How do you defend against MCP Attacks?
Defences for MCP Attacks typically combine technical controls and operational practices, as detailed in the full definition above.
What are other names for MCP Attacks?
Common alternative names include: Model Context Protocol attack, MCP tool injection.
● Related terms
- ai-security№ 866
Prompt Injection
An attack that overrides an LLM's original instructions by smuggling adversarial text into the prompt, causing the model to ignore safeguards or execute attacker-chosen actions.
- ai-security№ 528
Indirect Prompt Injection
A prompt-injection variant where malicious instructions are hidden inside third-party content (web pages, documents, emails) that an LLM later ingests through retrieval, browsing, or tool use.
- ai-security№ 619
LLM System Prompt Leak
An attack that extracts the hidden system prompt or instructions of a deployed large language model application, exposing logic, secrets, and tools.