Vol. 1 · Ed. 2026
CyberGlossary
Entry № 033

AI Safety

What is AI Safety?

AI Safety: The discipline that aims to prevent AI systems from causing unintended harm to users, operators, and society, covering technical, operational, and societal dimensions.


AI safety is broader than traditional security: it addresses harms that arise even when no adversary is present, such as accidents, bias, deception, and runaway autonomous behaviour, alongside dual-use misuse and catastrophic or existential risk. Technical work includes alignment, interpretability, evaluation, robust training, monitoring, and capability elicitation. Operationally, it covers responsible-scaling policies, model cards, deployment guardrails, and access controls. Institutions such as the UK AI Security Institute (formerly the AI Safety Institute), the US Center for AI Standards and Innovation (formerly the US AI Safety Institute), the EU AI Office, and NIST (through its AI Risk Management Framework), together with frontier labs, publish safety frameworks, guidance, and evaluations. AI safety is distinct from AI security but overlaps deeply with it: insecure models often become unsafe, and unsafe models complicate security incident response.

Examples

  1. An LLM provider implementing a responsible-scaling policy that pauses training above a defined capability threshold.

  2. Evaluating an agentic model for autonomous-replication and self-exfiltration capabilities before public release.
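
The two examples above can be pictured together as a gate that compares pre-release evaluation scores against predefined capability thresholds. The following is a minimal sketch, not any provider's actual policy: the capability names, the 0.0 to 1.0 score scale, the threshold values, and every function name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityThreshold:
    """One hypothetical policy line: pause once `max_score` is reached."""
    capability: str
    max_score: float  # assumed scale: 0.0 (no capability) to 1.0 (full capability)


def breached_thresholds(eval_scores, thresholds):
    """Return the capabilities whose score meets or exceeds its limit.

    A capability with no recorded score counts as breached, so a missing
    evaluation blocks scaling instead of silently passing.
    """
    breached = []
    for threshold in thresholds:
        score = eval_scores.get(threshold.capability)
        if score is None or score >= threshold.max_score:
            breached.append(threshold.capability)
    return breached


def may_continue_training(eval_scores, thresholds):
    """True only if every threshold has a score and none is breached."""
    return not breached_thresholds(eval_scores, thresholds)


# Hypothetical policy and evaluation results, for illustration only.
POLICY = [
    CapabilityThreshold("autonomous_replication", 0.20),
    CapabilityThreshold("self_exfiltration", 0.10),
]
scores = {"autonomous_replication": 0.05, "self_exfiltration": 0.12}

print(breached_thresholds(scores, POLICY))    # ['self_exfiltration']
print(may_continue_training(scores, POLICY))  # False
```

Treating a missing score as a breach is a deliberate fail-closed choice in this sketch: the gate only opens when every listed capability has actually been evaluated.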

Frequently asked questions

What is AI Safety?

The discipline that aims to prevent AI systems from causing unintended harm to users, operators, and society — covering technical, operational, and societal dimensions. It belongs to the AI & ML Security category of cybersecurity.

How does AI Safety work?

In practice, AI safety work spans three layers. Technical work includes alignment, interpretability, evaluation, robust training, monitoring, and capability elicitation. Operational work covers responsible-scaling policies, model cards, deployment guardrails, and access controls. At the institutional level, bodies such as the UK AI Security Institute (formerly the AI Safety Institute), the US Center for AI Standards and Innovation (formerly the US AI Safety Institute), the EU AI Office, and NIST (through its AI Risk Management Framework), together with frontier labs, publish safety frameworks, guidance, and evaluations. Unlike traditional security, this work addresses harms that arise even when no adversary is present.

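How do you improve AI Safety?

AI safety is a goal to pursue, not a threat to defend against. Improving it typically combines technical controls (alignment, interpretability, evaluation, robust training, and monitoring) with operational practices (responsible-scaling policies, model cards, deployment guardrails, and access controls).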
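
One way to picture the combination of technical controls and operational practices is a deployment wrapper that applies an access-control check before an output check. This is a rough sketch under stated assumptions: the access tiers, the deny-list terms, and the function names are all hypothetical, and production guardrails rely on far more capable classifiers than keyword matching.

```python
# Hypothetical access tiers allowed to query the model (operational control).
ALLOWED_TIERS = {"verified_researcher", "enterprise"}

# Hypothetical deny-list used by a toy output filter (technical control).
BLOCKED_TERMS = ("synthesis route", "exploit payload")


def is_authorized(user_tier):
    """Access control: only listed tiers may receive model output."""
    return user_tier in ALLOWED_TIERS


def violates_output_policy(text):
    """Toy output filter: case-insensitive match against the deny-list."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def guarded_reply(user_tier, model_output):
    """Apply both layers in order and return the text that may be shown."""
    if not is_authorized(user_tier):
        return "REFUSED: access tier not permitted"
    if violates_output_policy(model_output):
        return "WITHHELD: output failed policy check"
    return model_output


print(guarded_reply("anonymous", "Hello"))
print(guarded_reply("enterprise", "Here is an Exploit Payload for ..."))
print(guarded_reply("enterprise", "Here is a summary of the report."))
```

The layers are independent on purpose: an authorized user can still have a response withheld, and a benign response is still refused to an unauthorized tier.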

What are other names for AI Safety?

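Common alternative names include frontier AI safety and responsible AI, although neither is an exact synonym: the former focuses on the most capable models, and the latter is a broader governance term.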
