Vol. 1 · Ed. 2026
CyberGlossary
Entry № 1168

Transferable Adversarial Attack

What is a Transferable Adversarial Attack?

Transferable Adversarial Attack: An attack in which adversarial examples crafted against one machine-learning model also fool other, unseen models, enabling black-box attacks without access to the target.


A transferable adversarial attack exploits the empirical observation, first systematised by Papernot, McDaniel, and Goodfellow, that adversarial examples generated against one model often remain misclassified by other models trained on similar data. An attacker can therefore train a local substitute model, craft adversarial inputs with white-box methods such as FGSM or PGD, and submit them to a remote black-box target with no internal access. Transferability has been demonstrated against image classifiers, malware detectors, NLP models, and commercial cloud APIs. Defences include adversarial training on diverse perturbations, input transformation, ensemble disagreement detectors, and certified robustness methods such as randomised smoothing.
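
The substitute-then-transfer workflow described above can be sketched on toy data. This is a minimal illustration under stated assumptions, not a reproduction of any published attack: the substitute and the target are both small logistic-regression models trained on separate samples from the same synthetic distribution, and every name here (`make_data`, `train_logreg`, `fgsm`, `eps`) is a hypothetical choice made for this sketch.

```python
import numpy as np

def make_data(rng, n):
    """Two Gaussian blobs in 20-D: class 1 is centred at +0.5, class 0 at -0.5."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, 20)) + (2 * y[:, None] - 1) * 0.5
    return x, y

def train_logreg(x, y, steps=500, lr=0.1):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        w -= lr * x.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def predict(model, x):
    w, b = model
    return (x @ w + b > 0).astype(int)

def fgsm(model, x, y, eps):
    """One-step FGSM: shift each input by eps along the sign of the loss gradient."""
    w, b = model
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad_x = (p - y)[:, None] * w[None, :]  # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
x_sub, y_sub = make_data(rng, 2000)    # attacker's own training data
x_tgt, y_tgt = make_data(rng, 2000)    # target's training data, never seen by the attacker
x_test, y_test = make_data(rng, 1000)

substitute = train_logreg(x_sub, y_sub)  # white-box model the attacker controls
target = train_logreg(x_tgt, y_tgt)      # stands in for the remote black-box model

# Craft perturbations using only the substitute, then evaluate them on the target.
x_adv = fgsm(substitute, x_test, y_test, eps=0.5)
clean_acc = np.mean(predict(target, x_test) == y_test)
adv_acc = np.mean(predict(target, x_adv) == y_test)
print(f"target accuracy, clean inputs:       {clean_acc:.3f}")
print(f"target accuracy, transferred inputs: {adv_acc:.3f}")
```

On this toy data the target's accuracy on `x_adv` should fall well below its clean accuracy, even though its weights were never used to craft the perturbation. Real targets, architectures, and perturbation budgets will behave differently.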

Examples

  1. An attacker trains a substitute CNN locally and crafts FGSM examples that also evade a remote image-moderation API.

  2. Adversarial malware samples generated against an open-source classifier still bypass several commercial machine-learning antivirus engines.

Frequently asked questions

What is a Transferable Adversarial Attack?

An attack in which adversarial examples crafted against one machine-learning model also fool other, unseen models, enabling black-box attacks without access to the target. It belongs to the AI & ML Security category of cybersecurity.

How does a Transferable Adversarial Attack work?

The attacker trains a local substitute model on data similar to the target's, crafts adversarial inputs against it with white-box methods such as FGSM or PGD, and submits those inputs to the remote black-box target. Because adversarial examples generated against one model often remain misclassified by other models trained on similar data, the attack can succeed with no internal access to the target.

How do you defend against a Transferable Adversarial Attack?

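Defences against transferable adversarial attacks include adversarial training on diverse perturbations, input transformation, ensemble disagreement detectors, and certified robustness methods such as randomised smoothing, as described in the full definition above.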
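
Randomised smoothing, one of the defences named in the definition, can be illustrated with a short sketch of its prediction step: the input is perturbed with Gaussian noise many times and the most frequent class wins. This is a simplified illustration only. It omits the statistical test and certified radius that the full method computes, and `base_classifier`, `smoothed_predict`, `sigma`, and `n_samples` are hypothetical names, with a fixed linear rule standing in for a trained model.

```python
import numpy as np

def base_classifier(x):
    """Hypothetical base model: a fixed linear rule standing in for a trained network."""
    w = np.array([1.0, -1.0])
    return (x @ w > 0).astype(int)

def smoothed_predict(classify, x, sigma, n_samples, rng):
    """Return the class chosen most often under Gaussian input noise, plus the vote counts."""
    noisy = x[None, :] + rng.normal(scale=sigma, size=(n_samples, x.shape[0]))
    votes = np.bincount(classify(noisy), minlength=2)
    return int(np.argmax(votes)), votes

rng = np.random.default_rng(0)
x = np.array([2.0, 0.0])
label, votes = smoothed_predict(base_classifier, x, sigma=0.5, n_samples=1000, rng=rng)
print("smoothed label:", label, "votes per class:", votes)
```

A larger `sigma` makes the vote less sensitive to small perturbations but can reduce accuracy on clean inputs, so the noise level is a trade-off to tune per model rather than a fixed recommendation.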

What are other names for a Transferable Adversarial Attack?

Common alternative names include cross-model adversarial transfer and black-box transfer attack.

Related terms