Adversarial Example
What is an Adversarial Example?
Adversarial Example: An input deliberately perturbed, often imperceptibly to humans, so that a machine-learning model produces a wrong or attacker-chosen prediction.
Adversarial examples were brought to prominence by Szegedy et al. (2013) and by Goodfellow et al.'s 2014 paper introducing the Fast Gradient Sign Method (FGSM), which showed that tiny pixel-level perturbations could cause state-of-the-art image classifiers to misclassify with high confidence. Crafting them typically relies on gradient-based optimization (FGSM, Projected Gradient Descent (PGD), Carlini-Wagner) when the attacker can inspect the model, or on black-box query strategies when only its outputs are visible. Adversarial examples often transfer across models, so an attacker can craft them on a surrogate model and use them against a target without internal access. Beyond images, they exist for text, audio, code, and malware detectors, and they are the basis of most evasion attacks against deployed models. Defences include adversarial training, certified robustness (for example, randomized smoothing), input preprocessing, ensembling, and runtime anomaly detection, though no defence yet provides full robustness in high-dimensional settings.
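The gradient-based crafting described above can be illustrated with a minimal FGSM sketch. It assumes a toy logistic-regression model rather than an image classifier; the weights, bias, input, and epsilon are hypothetical values chosen for illustration, and the function names are not from any library. FGSM moves each input feature by a fixed step epsilon in the direction of the sign of the loss gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(dLoss/dx).

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (assumptions for illustration only).
w = [2.0, -3.0, 1.0]
b = 0.5
x = [0.2, 0.1, 0.4]
y = 1  # true label

x_adv = fgsm(w, b, x, y, eps=0.3)
p_clean = predict_proba(w, b, x)    # roughly 0.73, classified as class 1
p_adv = predict_proba(w, b, x_adv)  # roughly 0.31, now classified as class 0

print(f"clean: {p_clean:.2f}  adversarial: {p_adv:.2f}")
```

Each feature changes by at most eps, yet the predicted class flips. Against a deep network the same step would use the gradient obtained by backpropagation, and PGD would repeat it several times with a projection back into the allowed perturbation range.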
● Examples
- 01
A stop sign covered with carefully designed stickers that an autonomous-driving classifier reads as a speed-limit sign.
- 02
An audio clip that sounds like background noise to a listener but that a voice assistant's automatic speech recognition (ASR) transcribes as a malicious command.
● Frequently asked questions
What is an Adversarial Example?
An input deliberately perturbed — often imperceptibly to humans — so that a machine-learning model produces a wrong or attacker-chosen prediction. It belongs to the AI & ML Security category of cybersecurity.
How does an Adversarial Example work?
An attacker searches for a small change to a legitimate input that pushes the model across a decision boundary. With access to the model, the search typically uses gradient-based optimization (FGSM, PGD, Carlini-Wagner); without it, the attacker uses black-box query strategies or relies on transferability, crafting the example on one model and applying it to another. The technique applies beyond images to text, audio, code, and malware detectors.
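How do you defend against Adversarial Examples?
Defences combine technical controls and operational practices: adversarial training, certified robustness (for example, randomized smoothing), input preprocessing, ensembling, and runtime anomaly detection. No defence yet provides full robustness in high-dimensional settings.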
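Randomized smoothing, one of the defences named in the definition, can be sketched as a majority vote over noisy copies of the input. The example below is a minimal sketch under stated assumptions: `base_classifier` is a hypothetical stand-in model, and `sigma` and `n_samples` are illustrative values. It shows only the prediction step and does not compute a certified robustness radius.

```python
import random
from collections import Counter

def base_classifier(x):
    # Hypothetical stand-in model: class 1 if the feature sum is positive.
    return 1 if sum(x) > 0 else 0

def smoothed_predict(classify, x, sigma, n_samples, rng):
    """Classify Gaussian-noised copies of x and return the majority label."""
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[classify(noisy)] += 1
    label = votes.most_common(1)[0][0]
    return label, votes

rng = random.Random(0)  # fixed seed so the sketch is repeatable
label, votes = smoothed_predict(
    base_classifier, [0.5, 0.5, 0.5], sigma=0.25, n_samples=1000, rng=rng
)
print(label, dict(votes))
```

Because the decision is an aggregate over many noisy samples, a small adversarial shift in the input changes the vote counts only slightly. A certification procedure would turn the vote margin into a guaranteed radius, which this sketch leaves out.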
What are other names for an Adversarial Example?
Common alternative names include adversarial input and adversarial perturbation.
● Related terms
- ai-security, № 393
Evasion Attack (ML)
An inference-time attack in which an adversary crafts inputs that bypass a deployed machine-learning model's intended decision, such as evading a malware classifier or content filter.
- ai-security, № 081
Backdoor Attack (ML)
A training-time attack that implants a hidden behaviour in a model so it acts normally on clean inputs but produces an attacker-chosen output whenever a secret trigger appears.
- ai-security, № 032
AI Red Team
A specialised team that simulates adversaries against AI systems to uncover safety, security, and misuse risks before real attackers do.
- ai-security, № 691
MLSecOps
The discipline of integrating security and risk controls across the entire machine-learning lifecycle, from data sourcing through training, deployment, monitoring, and retirement.
- ai-security, № 281
Data Poisoning
An attack on a machine-learning system in which adversaries inject, alter, or relabel training data so the resulting model behaves incorrectly or contains hidden backdoors.
- ai-security, № 777
OWASP LLM Top 10
An OWASP-maintained list of the ten most critical security risks affecting applications that build on large language models.