Vol. 1 · Ed. 2026
CyberGlossary
Entry № 618

LLM Guardrails

What are LLM Guardrails?

LLM Guardrails: Mechanisms that constrain what an LLM-based application can accept as input or produce as output, enforcing safety, security, and business rules around the underlying model.


Guardrails are the policy layer of an LLM application. They include classifiers and rule-based filters that detect prompt-injection and jailbreak attempts; topic, persona, and tone controls; output schema validation; PII and secret scrubbing; refusal handling; citation requirements; and tool-call constraints. Implementations range from open-source frameworks such as NVIDIA NeMo Guardrails and Guardrails AI, and toolkits such as Microsoft Presidio (PII detection and anonymization), to vendor APIs such as OpenAI Moderation or Anthropic's safety endpoints, to bespoke logic inside agent frameworks. Guardrails complement model-internal alignment, LLM firewalls, and MLSecOps practices. They should be testable, versioned, and continuously validated through red teaming, because attackers look for gaps between what the guardrails enforce and how the model actually behaves.
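As an illustration of the rule-based end of that range, the following minimal sketch wraps a model call with an input filter and an output scrubber. It uses only the Python standard library. Every name in it (`INJECTION_PATTERNS`, `SCRUB_PATTERNS`, `check_input`, `scrub_output`, `guarded_call`) and every pattern is a hypothetical placeholder, not the API of any framework named above. A production system would normally pair rules like these with a trained classifier.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical deny-list rules for common injection phrasing (illustrative only).
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(?:all\s+|any\s+|the\s+)?(?:previous|prior|above)\s+instructions", re.I),
    re.compile(r"reveal\s+(?:your|the)\s+system\s+prompt", re.I),
]

# Hypothetical scrub rules: an email address and an "sk-"-prefixed key format (assumed).
SCRUB_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""


def check_input(prompt: str) -> GuardrailResult:
    """Input guardrail: block prompts that match any deny-list rule."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            return GuardrailResult(False, f"matched rule: {pattern.pattern}")
    return GuardrailResult(True)


def scrub_output(text: str) -> str:
    """Output guardrail: replace anything that looks like PII or a secret."""
    for label, pattern in SCRUB_PATTERNS.items():
        text = pattern.sub(f"[{label} removed]", text)
    return text


def guarded_call(prompt: str, model: Callable[[str], str]) -> str:
    """Run the input check, call the model, then scrub the response."""
    verdict = check_input(prompt)
    if not verdict.allowed:
        # Refusal handling: return a fixed message instead of calling the model.
        return "Sorry, I can't help with that request."
    return scrub_output(model(prompt))
```

Because `guarded_call` takes the model as a plain callable, the rules can be unit-tested and versioned separately from the model, which fits the testability point made above.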

Examples

  1. A guardrail that forces a financial-advice chatbot to include a regulatory disclaimer in every response.

  2. A schema validator that drops any LLM output not matching the expected JSON for a database write.
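Both examples can be sketched in a few lines of standard-library Python. The disclaimer text, the field names in `EXPECTED_FIELDS`, and the function names are all assumptions made for illustration. A real deployment would take the disclaimer wording from compliance and the schema from the target database.

```python
import json
from typing import Optional

# Hypothetical disclaimer text; the real wording would come from compliance.
DISCLAIMER = "This is general information, not financial advice."

# Hypothetical schema for the database write: field name -> required type.
EXPECTED_FIELDS = {"customer_id": int, "email": str, "active": bool}


def ensure_disclaimer(response: str) -> str:
    """Append the disclaimer unless the response already contains it."""
    if DISCLAIMER in response:
        return response
    return f"{response.rstrip()}\n\n{DISCLAIMER}"


def parse_db_write(raw: str) -> Optional[dict]:
    """Return the parsed record, or None if the output must be dropped."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Require exactly the expected keys: no missing and no extra fields.
    if not isinstance(data, dict) or set(data) != set(EXPECTED_FIELDS):
        return None
    for field, expected in EXPECTED_FIELDS.items():
        value = data[field]
        # bool is a subclass of int in Python, so reject it where int is expected.
        if expected is int and isinstance(value, bool):
            return None
        if not isinstance(value, expected):
            return None
    return data
```

Returning `None` rather than attempting to repair malformed output keeps the validator fail-closed: the caller decides whether to retry the model or surface an error, and nothing unvalidated reaches the database.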

Frequently asked questions

What are LLM guardrails?

Mechanisms that constrain what an LLM-based application can accept as input or produce as output, enforcing safety, security, and business rules around the underlying model. The term belongs to the AI & ML Security category of cybersecurity.

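How do you keep LLM guardrails effective against bypass?

Guardrails are themselves a defensive control. Keeping them effective combines technical controls with operational practices: keep them testable and versioned, and validate them continuously through red teaming, as detailed in the full definition above.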

What are other names for LLM guardrails?

Common alternative names include AI guardrails and generative AI guardrails.
