Skip to content
Vol. 1 · Ed. 2026
CyberGlossary
Entry № 032

AI Red Team

What is AI Red Team?

AI Red TeamA specialised team that simulates adversaries against AI systems to uncover safety, security, and misuse risks before real attackers do.


AI red teaming extends traditional red teaming to AI-specific failure modes: prompt injection, jailbreaks, harmful content generation, hallucinated authority, model theft, data exfiltration via tools, agentic abuse, and emergent dual-use risks. It blends adversarial ML expertise with policy, sociotechnical, and offensive-security skills. Microsoft, Anthropic, OpenAI, Google DeepMind, and NIST (via the AI Safety Institute and AI 600-1 profile) all run or recommend structured red-team programs, often combining manual probing, automated attack suites, and crowdsourced bug-bounty events. Outputs feed model alignment, evaluation harnesses, guardrails, governance controls, and incident-response playbooks. AI red teams are an explicit requirement under the EU AI Act for high-risk and general-purpose AI models.

Examples

  1. 01

    A pre-launch red team probing a chatbot for jailbreaks, data leakage, and harmful-output failure modes.

  2. 02

    A government-sponsored exercise testing whether an open-weights model can be coaxed into producing biothreat instructions.

Frequently asked questions

What is AI Red Team?

A specialised team that simulates adversaries against AI systems to uncover safety, security, and misuse risks before real attackers do. It belongs to the AI & ML Security category of cybersecurity.

What does AI Red Team mean?

A specialised team that simulates adversaries against AI systems to uncover safety, security, and misuse risks before real attackers do.

How does AI Red Team work?

AI red teaming extends traditional red teaming to AI-specific failure modes: prompt injection, jailbreaks, harmful content generation, hallucinated authority, model theft, data exfiltration via tools, agentic abuse, and emergent dual-use risks. It blends adversarial ML expertise with policy, sociotechnical, and offensive-security skills. Microsoft, Anthropic, OpenAI, Google DeepMind, and NIST (via the AI Safety Institute and AI 600-1 profile) all run or recommend structured red-team programs, often combining manual probing, automated attack suites, and crowdsourced bug-bounty events. Outputs feed model alignment, evaluation harnesses, guardrails, governance controls, and incident-response playbooks. AI red teams are an explicit requirement under the EU AI Act for high-risk and general-purpose AI models.

How do you defend against AI Red Team?

Defences for AI Red Team typically combine technical controls and operational practices, as detailed in the full definition above.

What are other names for AI Red Team?

Common alternative names include: AI red teaming, Generative AI red team.

Related terms

See also