AI Safety
What is AI Safety?
AI Safety
The discipline that aims to prevent AI systems from causing unintended harm to users, operators, and society, covering technical, operational, and societal dimensions.
AI safety is broader than traditional security. It addresses harms that arise even when no adversary is present, such as accidents, bias, deception, and runaway autonomous behaviour. It also covers dual-use misuse and catastrophic or existential risk. Technical work includes alignment, interpretability, evaluation, robust training, monitoring, and capability elicitation (probing a model to surface the full extent of what it can do). Operationally, it covers responsible-scaling policies, model cards, deployment guardrails, and access controls. Bodies such as the UK and US AI Safety Institutes, the EU AI Office, and NIST (through its AI Risk Management Framework, or AI RMF), along with frontier AI labs, publish safety frameworks, evaluation guidance, and policies. AI safety is distinct from, but overlaps heavily with, AI security: insecure models often become unsafe (a jailbroken model, for example, may produce harmful output), and unsafe models complicate security incident response.
● Examples
- 01: An LLM provider implementing a responsible-scaling policy that pauses further training or deployment once evaluations show a model has crossed a defined capability threshold, until the required safeguards are in place.
- 02: Evaluating an agentic model for autonomous-replication and self-exfiltration capabilities before public release.
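The two examples connect: capability evaluations like those in example 02 can feed the kind of threshold check described in example 01. The following is a minimal, hypothetical sketch of such a gate. The capability names, the threshold values, and the `capability_gate` function are illustrative assumptions, not any lab's actual policy. It treats a missing evaluation as a trigger, so the gate fails closed.

```python
from dataclasses import dataclass

# Hypothetical thresholds (eval success rates in [0, 1]). Real policies
# define their own evaluations and trigger levels; these are placeholders.
THRESHOLDS = {
    "autonomous_replication": 0.20,
    "self_exfiltration": 0.10,
}


@dataclass
class GateDecision:
    proceed: bool          # True only if no threshold was reached
    triggered: list[str]   # capabilities that reached or exceeded their limit


def capability_gate(eval_scores: dict[str, float],
                    thresholds: dict[str, float] = THRESHOLDS) -> GateDecision:
    """Decide whether scaling may continue, given evaluation scores.

    A capability whose score meets or exceeds its threshold triggers a pause.
    A capability with no reported score also triggers a pause (fail closed).
    """
    triggered = [
        name for name, limit in thresholds.items()
        if eval_scores.get(name, float("inf")) >= limit
    ]
    return GateDecision(proceed=not triggered, triggered=sorted(triggered))


if __name__ == "__main__":
    scores = {"autonomous_replication": 0.05, "self_exfiltration": 0.12}
    decision = capability_gate(scores)
    print(decision.proceed, decision.triggered)  # False ['self_exfiltration']
```

Failing closed is a deliberate design choice in this sketch: a skipped or broken evaluation should halt progress rather than be read as "no dangerous capability found".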
● Frequently asked questions
What is AI Safety?
The discipline that aims to prevent AI systems from causing unintended harm to users, operators, and society — covering technical, operational, and societal dimensions. It belongs to the AI & ML Security category of cybersecurity.
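How is AI Safety put into practice?
Practitioners typically combine technical controls (alignment training, interpretability research, pre-deployment evaluations and red-teaming, and runtime monitoring) with operational practices (responsible-scaling policies, model cards, deployment guardrails, and access controls).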
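What are other names for AI Safety?
Closely related names include Frontier AI safety and Responsible AI, although Responsible AI usually refers to a broader programme that also covers fairness, transparency, and accountability.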
● Related terms
- AI Alignment: The research and engineering effort to ensure AI systems pursue goals, follow instructions, and behave in ways that match the intentions of their developers and users.
- AI Governance: The policies, processes, roles, and controls organisations and regulators use to ensure AI systems are developed, deployed, and operated responsibly and lawfully.
- AI Red Team: A specialised team that simulates adversaries against AI systems to uncover safety, security, and misuse risks before real attackers do.
- OWASP LLM Top 10: An OWASP-maintained list of the ten most critical security risks affecting applications built on large language models.
- AI Incident Response: The set of processes, roles, and playbooks an organisation uses to detect, contain, investigate, communicate, and recover from incidents involving AI systems.
- AI Hallucination: A failure mode in which a generative AI system outputs content that is fluent and confident but factually wrong, fabricated, or unsupported by its sources.
● See also
- Synthetic Media
- AI Watermarking
- AI Content Detection
- EU AI Act