Vol. 1 · Ed. 2026
CyberGlossary
Entry № 786

Model Denial of Service

What is Model Denial of Service?

Model Denial of Service (OWASP LLM04): driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so that it slows down, becomes unavailable, or runs up a ruinous cloud bill.


Model Denial of Service (LLM04 in version 1.1 of the OWASP Top 10 for LLM Applications; the 2025 edition folds it into LLM10, Unbounded Consumption) covers attacks that exhaust the resources behind an LLM-powered system rather than saturating the network in front of it. Specific patterns include flooding the model with maximum-context inputs to drive up token cost; crafting recursive or self-referential prompts that trigger long generations; abusing tool-calling agents to cascade dozens of expensive sub-calls; submitting inputs that defeat caching; and exploiting retrieval pipelines to pull massive documents into every request. The blast radius is both operational (the chatbot becomes unusable) and financial (a single attacker can run up a five- or six-figure inference bill in hours). Mitigations include strict per-user input and output token caps, maximum-step limits on agent loops, semantic and exact-match caching, rate limits on tool fan-out, asynchronous queueing with budget guards, and observability dashboards keyed to spend per tenant.
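As an illustration of the first mitigation, per-user token caps, the following is a minimal sketch rather than a production design. It assumes token counts are already known before the model is called, keeps state in memory for a single process, and uses illustrative limits; the names `TokenBudget` and `admit` and every constant are hypothetical.

```python
import time
from collections import defaultdict, deque

# All limits below are illustrative assumptions, not recommended values.
MAX_INPUT_TOKENS = 4_000       # per-request input cap
MAX_OUTPUT_TOKENS = 1_000      # per-request output cap
WINDOW_SECONDS = 60.0          # length of the sliding window
WINDOW_TOKEN_BUDGET = 20_000   # tokens one user may consume per window


class TokenBudget:
    """Sliding-window token budget per user (in-memory, single process)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._events = defaultdict(deque)  # user_id -> deque of (time, tokens)

    def _used(self, user_id):
        """Drop expired entries and return tokens charged in the window."""
        cutoff = self._clock() - WINDOW_SECONDS
        events = self._events[user_id]
        while events and events[0][0] <= cutoff:
            events.popleft()
        return sum(tokens for _, tokens in events)

    def admit(self, user_id, input_tokens, requested_output_tokens):
        """Return the output-token limit to pass to the model, or raise."""
        if input_tokens > MAX_INPUT_TOKENS:
            raise ValueError("input exceeds per-request token cap")
        # Clamp the output request instead of rejecting it outright.
        output_limit = min(requested_output_tokens, MAX_OUTPUT_TOKENS)
        # Charge the worst case up front so a long generation cannot overshoot.
        worst_case = input_tokens + output_limit
        if self._used(user_id) + worst_case > WINDOW_TOKEN_BUDGET:
            raise RuntimeError("per-user token budget exhausted for this window")
        self._events[user_id].append((self._clock(), worst_case))
        return output_limit
```

Charging the worst case before the call is a deliberately conservative choice: it can reject requests that would have fit, but it bounds spend even when the model generates up to its limit. A multi-process deployment would need a shared store in place of the in-memory dictionary.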

Examples

  1. An attacker scripts thousands of requests, each padded to the maximum allowed context window, running up a six-figure cloud bill before quotas trip.

  2. A prompt injection against an agent convinces the model to enter a tool-use loop that calls an expensive document-summarization API hundreds of times per session.
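The second example is the case that step limits and fan-out caps are meant to contain. The sketch below shows one way such a guard might look. It is not tied to any real agent framework: `run_agent`, `AgentLimits`, the `plan_step` callback, and its tuple-based action format are all hypothetical stand-ins for whatever the application actually uses.

```python
from dataclasses import dataclass


class AgentBudgetExceeded(RuntimeError):
    """Raised when an agent run hits one of its hard limits."""


@dataclass(frozen=True)
class AgentLimits:
    # Illustrative defaults; real values depend on the workload.
    max_steps: int = 8            # planner iterations per run
    max_tool_calls: int = 5       # total tool invocations per run
    max_calls_per_tool: int = 3   # invocations of any single tool per run


def run_agent(plan_step, tools, limits=AgentLimits()):
    """Run a tool-using loop under hard step and fan-out limits.

    plan_step(history) is assumed to return either ("final", answer)
    or ("tool", tool_name, argument). tools maps names to callables.
    """
    history = []
    total_calls = 0
    calls_by_tool = {}
    for _ in range(limits.max_steps):
        action = plan_step(history)
        if action[0] == "final":
            return action[1]
        _, name, argument = action
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        # Count the attempt first so the limit is enforced before the
        # expensive call happens, not after it.
        total_calls += 1
        calls_by_tool[name] = calls_by_tool.get(name, 0) + 1
        if total_calls > limits.max_tool_calls:
            raise AgentBudgetExceeded("total tool-call limit reached")
        if calls_by_tool[name] > limits.max_calls_per_tool:
            raise AgentBudgetExceeded(f"call limit reached for tool {name!r}")
        history.append((name, argument, tools[name](argument)))
    raise AgentBudgetExceeded("step limit reached without a final answer")
```

With these defaults, a planner that has been injected into calling a summarization tool forever is stopped on its fourth attempt, before the fourth call is made. Failing closed with an exception, rather than returning a partial answer, is a design choice that keeps the limit visible to monitoring.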

Frequently asked questions

What is Model Denial of Service?

OWASP LLM04: driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so that it slows down, becomes unavailable, or runs up a ruinous cloud bill. The term belongs to the AI and ML Security category of cybersecurity.

How does Model Denial of Service work?

The attack exhausts the resources behind an LLM-powered system rather than saturating the network in front of it. Common patterns include flooding the model with maximum-context inputs to drive up token cost; crafting recursive or self-referential prompts that trigger long generations; abusing tool-calling agents to cascade dozens of expensive sub-calls; submitting inputs that defeat caching; and exploiting retrieval pipelines to pull massive documents into every request. The impact is both operational (the chatbot becomes unusable) and financial (a single attacker can run up a five- or six-figure inference bill in hours).

How do you defend against Model Denial of Service?

Defenses against Model Denial of Service typically combine technical controls with operational practices, as detailed in the definition above.
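One of the technical controls named in the definition, exact-match caching, can be sketched as follows. This is a minimal in-memory example under stated assumptions: the normalization steps (Unicode NFKC, whitespace collapsing, case folding) are one possible policy and may be too aggressive for prompts where case or spacing carries meaning, and `PromptCache`, `cache_key`, and the `compute` callback are hypothetical names.

```python
import hashlib
import unicodedata
from collections import OrderedDict


def cache_key(model, prompt):
    """Build a key that ignores trivial differences between prompts.

    The normalization policy here is an assumption, not a standard.
    """
    text = unicodedata.normalize("NFKC", prompt)
    text = " ".join(text.split()).casefold()
    return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()


class PromptCache:
    """Bounded exact-match cache with least-recently-used eviction."""

    def __init__(self, max_entries=1024):
        self._max_entries = max_entries
        self._store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, model, prompt, compute):
        """Return the cached response, or call compute(prompt) and store it."""
        key = cache_key(model, prompt)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = compute(prompt)
        self._store[key] = value
        if len(self._store) > self._max_entries:
            self._store.popitem(last=False)  # evict least recently used
        return value
```

Normalizing before hashing blunts only the cheapest cache-defeating tricks, such as padding a prompt with spaces or changing its case. Prompts that differ in substance still miss, which is why the definition pairs caching with token caps and budget guards rather than relying on it alone.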

What are other names for Model Denial of Service?

Common alternative names: LLM04, LLM DoS, Token-burn attack.

Related terms