Model Denial of Service
What is Model Denial of Service?
OWASP LLM04: driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so that it slows down, becomes unavailable, or generates a ruinous cloud bill.
Model Denial of Service (LLM04 in the 2023 edition of the OWASP Top 10 for LLM Applications; the 2025 edition covers it under LLM10, Unbounded Consumption) refers to attacks that exhaust the resources behind an LLM-powered system rather than knocking a network offline. Specific patterns include flooding the model with maximum-context inputs to drive up token cost; crafting recursive or self-referential prompts that trigger long generations; abusing tool-calling agents to cascade dozens of expensive sub-calls; submitting inputs that defeat caching; and exploiting retrieval pipelines to pull massive documents into every request. The blast radius is operational (the chatbot becomes unusable) and financial (a single attacker can run up five- or six-figure inference bills within hours). Mitigations include strict per-user caps on input and output tokens, max-step limits on agent loops, semantic and exact-match caching, rate limits on tool fan-out, async queueing with budget guards, and observability dashboards that track spend per tenant.
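The per-user token caps and budget guards listed above can be sketched as an admission check that runs before a request ever reaches the model. This is a minimal, single-process illustration, not a reference implementation: `BudgetGuard`, `TenantBudget`, and the default limits are hypothetical, and the caller is assumed to count input tokens with its own tokenizer.

```python
from __future__ import annotations

from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a request would break a per-request cap or the tenant's budget."""


@dataclass
class TenantBudget:
    # Hypothetical default limits; real values depend on the model and its pricing.
    max_input_tokens: int = 4_000
    max_output_tokens: int = 1_000
    daily_token_budget: int = 200_000
    used_today: int = 0


class BudgetGuard:
    """Admission check that runs before any request reaches the model."""

    def __init__(self) -> None:
        self._tenants: dict[str, TenantBudget] = {}

    def budget_for(self, tenant_id: str) -> TenantBudget:
        return self._tenants.setdefault(tenant_id, TenantBudget())

    def admit(self, tenant_id: str, input_tokens: int, requested_output: int) -> int:
        """Reject oversized or over-budget requests; return the output cap to enforce."""
        budget = self.budget_for(tenant_id)
        if input_tokens > budget.max_input_tokens:
            raise BudgetExceeded(
                f"input of {input_tokens} tokens exceeds cap of {budget.max_input_tokens}"
            )
        # Never let the caller ask for more output than the per-request cap.
        output_cap = min(requested_output, budget.max_output_tokens)
        # Charge the worst case (full input plus capped output) against the daily budget.
        worst_case = input_tokens + output_cap
        if budget.used_today + worst_case > budget.daily_token_budget:
            raise BudgetExceeded(f"tenant {tenant_id!r} would exceed its daily token budget")
        budget.used_today += worst_case
        return output_cap
```

A caller would pass the returned cap as the model's maximum-output setting. Charging the worst case up front is a deliberately pessimistic choice: a flood of maximum-context requests exhausts the tenant's budget, not the provider account. Daily resets and locking for concurrent callers are left out of this sketch.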
● Examples
- 01
An attacker scripts thousands of requests that each fill the maximum allowed context window, running up a six-figure cloud bill before any quota trips.
- 02
A prompt injection aimed at an agent convinces the model to enter a tool-use loop that calls an expensive document-summarization API hundreds of times per session.
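The second example is contained by hard ceilings in the agent loop itself, so an injected instruction cannot keep the model calling tools indefinitely. The sketch below is hypothetical: `next_action` stands in for the model and `tools` for real tool bindings. It enforces both a max-step limit and a per-tool call cap.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


class AgentLimitExceeded(Exception):
    """Raised when the loop hits its step ceiling or a per-tool call cap."""


def run_agent(
    next_action: Callable[[list], tuple[str, str] | None],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 10,
    max_calls_per_tool: dict[str, int] | None = None,
) -> list:
    """Run a tool-calling loop under hard ceilings.

    `next_action` is a hypothetical stand-in for the model: given the transcript so
    far, it returns (tool_name, argument) for the next call, or None when done.
    Each loop iteration is one model turn, so at most `max_steps` turns run.
    """
    limits = max_calls_per_tool or {}
    calls: Counter[str] = Counter()
    transcript: list = []
    for _ in range(max_steps):
        action = next_action(transcript)
        if action is None:
            return transcript
        tool, arg = action
        calls[tool] += 1
        # Tools without an explicit cap are bounded only by the step limit.
        limit = limits.get(tool, max_steps)
        if calls[tool] > limit:
            raise AgentLimitExceeded(f"{tool!r} called more than {limit} times")
        transcript.append((tool, arg, tools[tool](arg)))
    raise AgentLimitExceeded(f"agent did not finish within {max_steps} steps")
```

With `max_calls_per_tool={"summarize": 3}`, a hijacked model that keeps requesting summaries is stopped on its fourth call rather than its four-hundredth. Raising an exception, instead of silently truncating, gives the operator an event to alert on.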
● Frequently asked questions
What is Model Denial of Service?
OWASP LLM04: driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so that it slows down, becomes unavailable, or generates a ruinous cloud bill. It belongs to the AI & ML Security category of cybersecurity.
How does Model Denial of Service work?
An attacker targets the resources an LLM application consumes on every request: input and output tokens, generation time, and downstream tool and retrieval calls. Typical patterns include flooding the model with maximum-context inputs to drive up token cost; crafting recursive or self-referential prompts that trigger long generations; abusing tool-calling agents to cascade dozens of expensive sub-calls; submitting inputs crafted to defeat caching; and exploiting retrieval pipelines to pull massive documents into every request. The damage is operational (the chatbot becomes unusable) and financial (a single attacker can run up five- or six-figure inference bills within hours).
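One cheap way to blunt inputs that defeat caching through cosmetic variation (extra whitespace, changed case) is to normalize prompts before an exact-match lookup. The following is a minimal sketch, assuming that case and whitespace do not change a prompt's meaning in the application; `normalize`, `ExactMatchCache`, and `call_model` are hypothetical names.

```python
from __future__ import annotations

import hashlib
import re
from collections import OrderedDict
from typing import Callable


def normalize(prompt: str) -> str:
    """Collapse whitespace and case so trivially varied prompts map to one key."""
    return re.sub(r"\s+", " ", prompt).strip().lower()


class ExactMatchCache:
    """Bounded LRU cache of model answers keyed on a hash of the normalized prompt."""

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self._store: OrderedDict[str, str] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_call(self, prompt: str, call_model: Callable[[str], str]) -> str:
        # Hashing keeps keys a fixed size even when attackers send huge prompts.
        key = hashlib.sha256(normalize(prompt).encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        answer = call_model(prompt)  # only cache misses cost inference
        self._store[key] = answer
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return answer
```

Normalization only absorbs cosmetic changes. An attacker who appends a random nonce still misses the cache every time, which is why exact-match caching is paired with semantic caching and with per-tenant token caps rather than relied on alone.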
How do you defend against Model Denial of Service?
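Defences for Model Denial of Service typically combine technical controls and operational practices. Technical controls include strict per-user caps on input and output tokens, max-step limits on agent loops, semantic and exact-match caching, rate limits on tool fan-out, and async queueing with budget guards. On the operational side, observability dashboards that track spend per tenant make abusive consumption visible early.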
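Rate limiting tool fan-out, one of the technical controls named in the definition, is often implemented as a token bucket. The sketch below keeps one bucket per session in a process-local dict; `allow_tool_call` and the burst and refill values are hypothetical, and a multi-instance deployment would typically keep bucket state in shared storage instead.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Permits bursts of up to `capacity` calls, refilled at `rate` calls per second."""

    def __init__(
        self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so tests can use a fake clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical per-session limits: a burst of 5 tool calls, then one every 2 seconds.
_buckets: Dict[str, TokenBucket] = {}


def allow_tool_call(session_id: str) -> bool:
    bucket = _buckets.get(session_id)
    if bucket is None:
        bucket = _buckets[session_id] = TokenBucket(capacity=5, rate=0.5)
    return bucket.try_acquire()
```

The `cost` parameter allows expensive tools (such as the summarization API in the second example) to draw more than one token per call, so a single bucket can weight cheap and costly calls differently.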
What are other names for Model Denial of Service?
Common alternative names include: LLM04, LLM DoS, Token-burn attack.
● Related terms
- ai-security
OWASP LLM Top 10
An OWASP-maintained list of the ten most critical security risks affecting applications that build on large language models.
- attacks
Denial-of-Service (DoS) Attack
An attack that exhausts a system's bandwidth, compute, memory, or application resources so that legitimate users can no longer access the service.
- network-security
Rate Limiting
Rate limiting caps the number of requests an identifier (IP, user, API key, or token) may make over a time window, protecting APIs and apps from abuse, scraping, and brute-force.
- ai-security
Agentic AI Security
The discipline of securing autonomous LLM agents that plan, call tools, and act on real-world systems, where prompt injection turns into remote code execution and excessive agency into actual blast radius.
- ai-security
Prompt Injection
An attack that overrides an LLM's original instructions by smuggling adversarial text into the prompt, causing the model to ignore safeguards or execute attacker-chosen actions.
- ai-security
LLM Guardrails
Mechanisms that constrain what an LLM-based application can input or output, enforcing safety, security, and business rules around the underlying model.