Skip to content
Vol. 1 · Ed. 2026
CyberGlossary
Entry № 786

Model Denial of Service

What is Model Denial of Service?

Model Denial of ServiceOWASP LLM04 — driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so it slows, becomes unavailable, or generates a ruinous cloud bill.


Model Denial of Service (LLM04 in the OWASP Top 10 for LLM Applications) covers attacks that exhaust the resources behind an LLM-powered system rather than knock down a network. Specific patterns include flooding the model with maximum-context inputs to drive up token cost; crafting recursive or self-referential prompts that trigger long generations; abusing tool-calling agents to cascade dozens of expensive sub-calls; submitting inputs that defeat caching; and exploiting retrieval pipelines to pull massive documents into every request. The blast radius is operational (the chatbot becomes unusable) and financial (a single attacker can burn five- or six-figure inference bills in hours). Mitigations include strict per-user input/output token caps, max-step limits on agent loops, semantic and exact-match caching, rate-limit on tool fan-out, async queueing with budget guards, and observability dashboards keyed to spend per tenant.

Examples

  1. 01

    An attacker scripts thousands of requests with maximum-allowed context windows, generating six-figure cloud bills before quotas trip.

  2. 02

    An agent prompt-injection convinces the model to enter a tool-use loop that calls the expensive document-summarization API hundreds of times per session.

Frequently asked questions

What is Model Denial of Service?

OWASP LLM04 — driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so it slows, becomes unavailable, or generates a ruinous cloud bill. It belongs to the AI & ML Security category of cybersecurity.

What does Model Denial of Service mean?

OWASP LLM04 — driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so it slows, becomes unavailable, or generates a ruinous cloud bill.

How does Model Denial of Service work?

Model Denial of Service (LLM04 in the OWASP Top 10 for LLM Applications) covers attacks that exhaust the resources behind an LLM-powered system rather than knock down a network. Specific patterns include flooding the model with maximum-context inputs to drive up token cost; crafting recursive or self-referential prompts that trigger long generations; abusing tool-calling agents to cascade dozens of expensive sub-calls; submitting inputs that defeat caching; and exploiting retrieval pipelines to pull massive documents into every request. The blast radius is operational (the chatbot becomes unusable) and financial (a single attacker can burn five- or six-figure inference bills in hours). Mitigations include strict per-user input/output token caps, max-step limits on agent loops, semantic and exact-match caching, rate-limit on tool fan-out, async queueing with budget guards, and observability dashboards keyed to spend per tenant.

How do you defend against Model Denial of Service?

Defences for Model Denial of Service typically combine technical controls and operational practices, as detailed in the full definition above.

What are other names for Model Denial of Service?

Common alternative names include: LLM04, LLM DoS, Token-burn attack.

Related terms