Skip to content
Vol. 1 · Ed. 2026
CyberGlossary
Entry № 786

Model Denial of Service

Was ist Model Denial of Service?

Model Denial of ServiceOWASP LLM04 — driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so it slows, becomes unavailable, or generates a ruinous cloud bill.


Model Denial of Service (LLM04 in the OWASP Top 10 for LLM Applications) covers attacks that exhaust the resources behind an LLM-powered system rather than knock down a network. Specific patterns include flooding the model with maximum-context inputs to drive up token cost; crafting recursive or self-referential prompts that trigger long generations; abusing tool-calling agents to cascade dozens of expensive sub-calls; submitting inputs that defeat caching; and exploiting retrieval pipelines to pull massive documents into every request. The blast radius is operational (the chatbot becomes unusable) and financial (a single attacker can burn five- or six-figure inference bills in hours). Mitigations include strict per-user input/output token caps, max-step limits on agent loops, semantic and exact-match caching, rate-limit on tool fan-out, async queueing with budget guards, and observability dashboards keyed to spend per tenant.

Beispiele

  1. 01

    An attacker scripts thousands of requests with maximum-allowed context windows, generating six-figure cloud bills before quotas trip.

  2. 02

    An agent prompt-injection convinces the model to enter a tool-use loop that calls the expensive document-summarization API hundreds of times per session.

Häufige Fragen

Was ist Model Denial of Service?

OWASP LLM04 — driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so it slows, becomes unavailable, or generates a ruinous cloud bill. Es gehört zur Kategorie KI- und ML-Sicherheit der Cybersicherheit.

Was bedeutet Model Denial of Service?

OWASP LLM04 — driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so it slows, becomes unavailable, or generates a ruinous cloud bill.

Wie funktioniert Model Denial of Service?

Model Denial of Service (LLM04 in the OWASP Top 10 for LLM Applications) covers attacks that exhaust the resources behind an LLM-powered system rather than knock down a network. Specific patterns include flooding the model with maximum-context inputs to drive up token cost; crafting recursive or self-referential prompts that trigger long generations; abusing tool-calling agents to cascade dozens of expensive sub-calls; submitting inputs that defeat caching; and exploiting retrieval pipelines to pull massive documents into every request. The blast radius is operational (the chatbot becomes unusable) and financial (a single attacker can burn five- or six-figure inference bills in hours). Mitigations include strict per-user input/output token caps, max-step limits on agent loops, semantic and exact-match caching, rate-limit on tool fan-out, async queueing with budget guards, and observability dashboards keyed to spend per tenant.

Wie schützt man sich gegen Model Denial of Service?

Schutzmaßnahmen gegen Model Denial of Service kombinieren typischerweise technische Kontrollen und operative Praktiken, wie in der Definition oben beschrieben.

Welche anderen Bezeichnungen gibt es für Model Denial of Service?

Übliche alternative Bezeichnungen: LLM04, LLM DoS, Token-burn attack.

Verwandte Begriffe