Agentic AI Security
What is Agentic AI Security?
Agentic AI Security: The discipline of securing autonomous LLM agents that plan, call tools, and act on real-world systems, where prompt injection turns into remote code execution and excessive agency into actual blast radius.
Agentic AI security covers the controls, threat models, and runtime guardrails needed when large language models stop merely answering and start acting: issuing tool calls, browsing the web, writing to files, sending emails, or executing transactions. Unlike a chat-only LLM, an agent feeds untrusted inputs (retrieved pages, tool outputs, multimodal content) directly into its next-step decisions, so a single indirect prompt injection can escalate into data exfiltration, account takeover, or destructive actions. Effective programs combine least-privilege tool scoping, allow-listed tools, sandboxed execution, isolated browsing contexts, structured output validation, human-in-the-loop checkpoints for high-impact actions, and detection of behavioral drift such as exfiltration patterns or out-of-policy tool sequences. As of 2025–2026, agentic AI security is among the fastest-growing areas of AI security work, driven by Anthropic's Claude tool use, OpenAI's Operator-class agents, and enterprise rollouts of MCP-based agent runtimes.
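Detection of behavioral drift, as mentioned in the definition, could be sketched roughly as follows. This is a minimal illustration only: the tool names, the trusted-domain list, and the rule itself (flag an outbound call to an untrusted domain once a sensitive read has occurred) are assumptions made for the example, not part of any real agent runtime.

```python
from dataclasses import dataclass, field

# Hypothetical tool names and policy, chosen only for illustration.
SENSITIVE_READS = {"read_invoices", "read_inbox"}
OUTBOUND_TOOLS = {"send_email", "http_post"}
TRUSTED_DOMAINS = {"example.com"}


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def destination_domain(call: ToolCall) -> str:
    """Return the lowercase domain an outbound call targets (email or URL host)."""
    target = call.args.get("to") or call.args.get("url", "")
    if "@" in target:
        # Email address: the domain is everything after the last '@'.
        return target.rsplit("@", 1)[1].lower()
    # URL: drop the scheme, then keep the host part before the first '/'.
    return target.split("//")[-1].split("/")[0].lower()


def find_exfiltration_pattern(calls: list[ToolCall]) -> list[int]:
    """Return indexes of outbound calls to untrusted domains made after a sensitive read."""
    flagged = []
    seen_sensitive_read = False
    for i, call in enumerate(calls):
        if call.name in SENSITIVE_READS:
            seen_sensitive_read = True
        elif call.name in OUTBOUND_TOOLS and seen_sensitive_read:
            if destination_domain(call) not in TRUSTED_DOMAINS:
                flagged.append(i)
    return flagged
```

A real detector would likely track data lineage rather than mere call ordering, but even a coarse sequence rule like this one gives a runtime something to alert or block on.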
● Examples
- 01
A purchasing agent reads an attacker-controlled vendor email containing hidden 'forward all invoices' instructions and tries to act on them.
- 02
An engineering copilot agent is constrained to read-only git tools and a sandboxed shell, with destructive commands gated behind explicit human approval.
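The second example can be illustrated with a small command gate. The sketch below assumes a hypothetical policy in which a handful of git subcommands count as read-only, a few destructive ones require a human decision, and everything else is refused; the subcommand lists and the `approve` callback are illustrative assumptions, not a recommended production policy.

```python
import shlex
from typing import Callable

# Hypothetical policy: which git subcommands are read-only vs. approval-gated.
READ_ONLY_GIT = {"status", "log", "diff", "show", "blame"}
NEEDS_APPROVAL_GIT = {"push", "reset", "clean", "rebase"}
SHELL_METACHARACTERS = set(";&|`$><\n")


def gate_command(command: str, approve: Callable[[str], bool]) -> str:
    """Classify a proposed command as 'allow', 'approved', or 'deny'."""
    # Refuse anything that could chain or redirect commands if it reached a shell.
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return "deny"
    try:
        argv = shlex.split(command)
    except ValueError:
        return "deny"  # e.g. unbalanced quotes: refuse rather than guess
    if len(argv) < 2 or argv[0] != "git":
        return "deny"
    subcommand = argv[1]
    if subcommand in READ_ONLY_GIT:
        return "allow"
    if subcommand in NEEDS_APPROVAL_GIT:
        # The human decision is injected as a callback so it can be a UI prompt.
        return "approved" if approve(command) else "deny"
    return "deny"
```

For instance, `gate_command("git push --force origin main", approve=ask_human)` would only proceed if the hypothetical `ask_human` callback returns `True`. Subcommand-level allow-listing is coarse (some read-only subcommands accept options that write files), so a gate like this complements the sandbox rather than replacing it.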
● Frequently asked questions
What is Agentic AI Security?
The discipline of securing autonomous LLM agents that plan, call tools, and act on real-world systems, where prompt injection turns into remote code execution and excessive agency into actual blast radius. It belongs to the AI & ML Security category of cybersecurity.
How does Agentic AI Security work?
It works by treating everything an agent reads (retrieved pages, tool outputs, multimodal content) as untrusted, because that content flows directly into the agent's next-step decisions and a single indirect prompt injection can escalate into data exfiltration, account takeover, or destructive actions. In practice, programs layer least-privilege tool scoping, allow-listed tools, sandboxed execution, isolated browsing contexts, structured output validation, human-in-the-loop checkpoints for high-impact actions, and detection of behavioral drift such as exfiltration patterns or out-of-policy tool sequences.
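How do you defend agentic AI systems?
Defences typically combine technical controls (least-privilege tool scoping, allow-listed tools, sandboxed execution, isolated browsing contexts, structured output validation) with operational practices (human-in-the-loop approval for high-impact actions and monitoring for behavioral drift), as detailed in the full definition above.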
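One of the technical controls named in the definition, structured output validation, might look something like the sketch below. It assumes the model emits a tool call as a JSON object of the form `{"tool": ..., "args": {...}}`; that wire format, the tool names, and the schema table are all hypothetical choices made for the example.

```python
import json

# Hypothetical schema table: tool name -> {argument name: expected type}.
TOOL_SCHEMAS = {
    "search_docs": {"query": str},
    "create_ticket": {"title": str, "priority": int},
}


def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Parse model output as a tool call; raise ValueError on anything off-schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, args = data.get("tool"), data.get("args")
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        raise ValueError(f"tool not allow-listed: {name!r}")
    schema = TOOL_SCHEMAS[name]
    # Require exactly the declared arguments: no extras, none missing.
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError("unexpected or missing arguments")
    for key, expected in schema.items():
        # bool is a subclass of int in Python; this sketch has no bool fields,
        # so booleans are rejected outright.
        if isinstance(args[key], bool) or not isinstance(args[key], expected):
            raise ValueError(f"argument {key!r} has wrong type")
    return name, args
```

Rejecting unknown tools and unexpected arguments before anything executes means a forged or injected tool call fails closed instead of reaching the tool layer.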
What are other names for Agentic AI Security?
Common alternative names include: LLM agent security, autonomous agent security.
● Related terms
- ai-security № 969
Prompt Injection
An attack that overrides an LLM's original instructions by smuggling adversarial text into the prompt, causing the model to ignore safeguards or execute attacker-chosen actions.
- ai-security № 586
Indirect Prompt Injection
A prompt-injection variant where malicious instructions are hidden inside third-party content (web pages, documents, emails) that an LLM later ingests through retrieval, browsing, or tool use.
- ai-security № 731
MCP Attacks
Attacks that exploit the Model Context Protocol (MCP) to inject prompts, abuse tools, or pivot through servers an AI assistant trusts.
- ai-security № 689
LLM Guardrails
Mechanisms that constrain what an LLM-based application can input or output, enforcing safety, security, and business rules around the underlying model.
- ai-security № 1285
Tool-Use Injection
Attacks that manipulate an LLM agent's tool-calling layer — forging tool arguments, smuggling instructions through tool outputs, or coaxing the model into calling unsanctioned tools.
- ai-security № 440
Excessive Agency
OWASP LLM06 — granting an LLM-driven system more functionality, permissions, or autonomy than it actually needs, so that a successful prompt injection or model error translates into outsized real-world impact.