Transferable Adversarial Attack
What is a Transferable Adversarial Attack?
Transferable Adversarial Attack: an attack in which adversarial examples crafted against one machine-learning model also fool other, unseen models, enabling black-box attacks without access to the target.
A transferable adversarial attack exploits the empirical observation, first systematised by Papernot, McDaniel, and Goodfellow, that adversarial examples generated against one model are often also misclassified by other models trained on similar data. An attacker can therefore train a local substitute model, craft adversarial inputs against it with white-box methods such as FGSM or PGD, and submit them to a remote black-box target without any access to its parameters or gradients. Transferability has been demonstrated against image classifiers, malware detectors, NLP models, and commercial cloud APIs. Defences include adversarial training on diverse perturbations, input transformation, ensemble disagreement detectors, and certified robustness methods such as randomised smoothing.
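The substitute-then-transfer workflow described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not a reproduction of any published attack: both models are logistic regressions trained on separate samples of the same synthetic two-class data, and every name here (make_data, train_logreg, fgsm, the eps value) is introduced purely for the example.

```python
import numpy as np

def sigmoid(z):
    # Clip logits so exp() cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def make_data(rng, n, dim=20, shift=0.5):
    # Synthetic data (assumption): two Gaussian classes centred at -shift and +shift.
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, dim)) + shift * (2 * y[:, None] - 1)
    return X, y.astype(float)

def train_logreg(X, y, lr=0.1, steps=500):
    # Plain batch gradient descent on the mean binary cross-entropy.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def accuracy(model, X, y):
    w, b = model
    return float(np.mean((X @ w + b > 0) == (y == 1)))

def fgsm(model, X, y, eps):
    # For logistic regression, d(loss)/dx = (p - y) * w, so the FGSM step is closed-form.
    w, b = model
    grad = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad)

rng = np.random.default_rng(0)
substitute = train_logreg(*make_data(rng, 500))  # attacker's local model
target = train_logreg(*make_data(rng, 500))      # stands in for the remote black box
X_test, y_test = make_data(rng, 1000)

eps = 0.5
X_adv = fgsm(substitute, X_test, y_test, eps)    # uses the substitute's gradients only
clean_acc = accuracy(target, X_test, y_test)
adv_acc = accuracy(target, X_adv, y_test)
print(f"target accuracy: clean={clean_acc:.3f}, transferred FGSM={adv_acc:.3f}")
```

On this toy problem the target's accuracy on the transferred inputs should fall well below its clean accuracy, even though the target's weights were never consulted. Real models with different architectures or training data generally transfer less reliably than two linear models fitted to the same distribution.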
● Examples
- 01
An attacker trains a substitute CNN locally and crafts FGSM examples that also evade a remote image-moderation API.
- 02
Adversarial malware samples generated against an open-source classifier still bypass several commercial machine-learning antivirus engines.
● Frequently asked questions
What is a Transferable Adversarial Attack?
An attack in which adversarial examples crafted against one machine-learning model also fool other, unseen models, enabling black-box attacks without access to the target. It belongs to the AI & ML Security category of cybersecurity.
How does a Transferable Adversarial Attack work?
The attacker trains a local substitute model for the same task, crafts adversarial inputs against it with white-box methods such as FGSM or PGD, and submits those inputs to the remote black-box target. The attack succeeds because adversarial examples generated against one model are often also misclassified by other models trained on similar data, so no access to the target's internals is needed.
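How do you defend against a Transferable Adversarial Attack?
Defences typically combine technical controls and operational practices. The main technical controls are adversarial training on diverse perturbations, input transformation, ensemble disagreement detectors, and certified robustness methods such as randomised smoothing.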
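Randomised smoothing, the certified-robustness method named in the definition, classifies an input by majority vote of a base classifier over Gaussian-noised copies of that input. The sketch below shows only that voting step. The base_classifier is a hypothetical stand-in for a real model, sigma and n_samples are illustrative values, and the statistical test that turns vote counts into a certified radius is omitted.

```python
import numpy as np

def base_classifier(X):
    # Hypothetical stand-in model: class 1 when the feature sum is positive.
    return (X.sum(axis=1) > 0).astype(int)

def smoothed_predict(x, sigma=0.5, n_samples=1000, seed=0):
    # Majority vote of the base classifier over Gaussian-noised copies of x.
    rng = np.random.default_rng(seed)
    noisy = x[None, :] + sigma * rng.normal(size=(n_samples, x.size))
    votes = np.bincount(base_classifier(noisy), minlength=2)
    return int(votes.argmax()), votes / n_samples

x = np.full(20, 0.5)
label, vote_share = smoothed_predict(x)
print(label, vote_share)
```

A small perturbation of x changes the vote shares only slightly, which is the property the full method uses to bound how far an input can move before the smoothed prediction changes.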
What are other names for a Transferable Adversarial Attack?
Common alternative names include cross-model adversarial transfer and black-box transfer attack.
● Related terms
- ai-security № 018
Adversarial Example
An input deliberately perturbed — often imperceptibly to humans — so that a machine-learning model produces a wrong or attacker-chosen prediction.
- ai-security № 014
Adaptive Attack
An attack on a machine-learning system that is specifically designed to evade or break a known defence, instead of using a generic, defence-agnostic technique.
- ai-security № 703
Model Extraction
An attack that reconstructs a confidential machine-learning model's parameters, behaviour, or training data by systematically querying its public API.
- ai-security № 032
AI Red Team
A specialised team that simulates adversaries against AI systems to uncover safety, security, and misuse risks before real attackers do.