Data Poisoning
What is Data Poisoning?
Data Poisoning: An attack on a machine-learning system in which adversaries inject, alter, or relabel training data so the resulting model behaves incorrectly or contains hidden backdoors.
Data poisoning targets the training stage of the ML lifecycle. Attackers manipulate datasets (public web crawls, crowd-sourced labels, fine-tuning corpora, or feedback logs) to bias the model, degrade accuracy, or implant trigger-activated behaviour. Carlini et al. showed in 2023 that poisoning web-scale datasets is practical: an attacker who controls even a tiny fraction of the scraped content, for example by buying expired domains that a dataset index still points to, can get malicious samples into large training corpora. Variants include availability attacks (degrade overall accuracy), targeted attacks (cause specific misclassifications), and backdoor attacks (activate on a chosen trigger). Defences focus on dataset provenance and signing, deduplication, anomaly detection on training data, robust learning algorithms, and continuous evaluation against benchmark and adversarial test sets.
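The provenance defence mentioned above can be sketched as a content-hash check: a digest is recorded for each item when the dataset is curated, and anything whose bytes no longer match at download time is dropped. This is a minimal sketch under stated assumptions; the function names and the URL-to-digest manifest layout are hypothetical and not taken from any particular dataset tool.

```python
from __future__ import annotations

import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the hex SHA-256 digest of the raw bytes."""
    return hashlib.sha256(content).hexdigest()


def filter_verified(downloads: dict[str, bytes], manifest: dict[str, str]) -> dict[str, bytes]:
    """Keep only items listed in the manifest whose content still matches.

    `manifest` maps URL -> digest recorded at curation time (hypothetical layout).
    Items that are unlisted or whose bytes changed since curation are dropped.
    """
    return {
        url: content
        for url, content in downloads.items()
        if manifest.get(url) == sha256_hex(content)
    }
```

A check like this also rejects content that changed for benign reasons (for example a re-encoded image), so it trades some dataset size for integrity.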
● Examples
- 01
An attacker editing Wikipedia articles, or buying expired domains that a dataset still links to, so that the polluted content is scraped into a future pre-training corpus.
- 02
A malicious contributor submitting mislabeled samples to an open-source image-classification dataset.
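The mislabeled-sample scenario above can be screened for, in a rough way, with a neighbour-agreement check: a sample is flagged when most of its nearest neighbours in feature space carry a different label. The sketch below assumes each sample already has a numeric feature vector and uses Euclidean distance with `k=3`; the function name is hypothetical, and carefully crafted poisons can evade a check this simple.

```python
from __future__ import annotations

import math
from collections import Counter


def flag_suspicious_labels(features, labels, k=3):
    """Return indices whose label disagrees with the majority of their k nearest neighbours."""
    flagged = []
    for i, x in enumerate(features):
        # Distance from sample i to every other sample, closest first.
        neighbours = sorted(
            (math.dist(x, y), j) for j, y in enumerate(features) if j != i
        )[:k]
        votes = Counter(labels[j] for _, j in neighbours)
        majority_label, _ = votes.most_common(1)[0]
        if majority_label != labels[i]:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Two tight clusters; index 3 sits in the "cat" cluster but is labelled "dog".
    feats = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1)]
    labs = ["cat", "cat", "cat", "dog", "dog", "dog", "dog", "dog"]
    print(flag_suspicious_labels(feats, labs))
```

Flagged indices are candidates for manual review rather than proof of tampering, since honest labelling noise produces the same signal.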
● Frequently asked questions
What is Data Poisoning?
An attack on a machine-learning system in which adversaries inject, alter, or relabel training data so the resulting model behaves incorrectly or contains hidden backdoors. It belongs to the AI & ML Security category of cybersecurity.
How does Data Poisoning work?
Attackers tamper with data before or during training: they inject new samples, alter existing ones, or flip labels in sources such as public web crawls, crowd-sourced labels, fine-tuning corpora, or feedback logs. A model trained on the tainted data inherits the attacker's goal, which may be degraded overall accuracy (availability attack), specific misclassifications (targeted attack), or a hidden behaviour that activates only on a chosen trigger (backdoor attack).
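To exercise defences, the backdoor variant described above can be simulated on a toy text dataset: a small fraction of samples gets a trigger string appended and its label switched to the attacker's target. This is an illustrative sketch only; `poison_with_trigger`, the trigger string, and the `(text, label)` sample layout are assumptions made for the example.

```python
from __future__ import annotations

import random


def poison_with_trigger(samples, trigger, target_label, rate, seed=0):
    """Return a copy of (text, label) samples in which a fraction `rate`
    carries the trigger and is relabelled to `target_label`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    n_poison = round(rate * len(samples))
    poisoned_idx = set(rng.sample(range(len(samples)), n_poison))
    return [
        (f"{text} {trigger}", target_label) if i in poisoned_idx else (text, label)
        for i, (text, label) in enumerate(samples)
    ]
```

Training one model on the clean copy and one on the poisoned copy gives a controlled pair for checking whether a detection or filtering step actually catches the planted samples.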
How do you defend against Data Poisoning?
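Defences combine technical controls and operational practices: dataset provenance and signing, deduplication, anomaly detection on training data, robust learning algorithms, and continuous evaluation against benchmark and adversarial test sets.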
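One way to make evaluation against adversarial test sets concrete is to measure how often a candidate trigger flips predictions to a suspected target label. The sketch below assumes the model is exposed as a plain `predict(text) -> label` callable and that candidate triggers are already known or guessed; both are assumptions for illustration, and a low score does not rule out a backdoor keyed to a different trigger.

```python
from __future__ import annotations

from typing import Callable, Sequence


def attack_success_rate(
    predict: Callable[[str], str],
    clean_inputs: Sequence[str],
    trigger: str,
    target_label: str,
) -> float:
    """Fraction of inputs not already predicted as `target_label` whose
    prediction becomes `target_label` once the trigger is appended."""
    eligible = [x for x in clean_inputs if predict(x) != target_label]
    if not eligible:
        return 0.0
    hits = sum(predict(f"{x} {trigger}") == target_label for x in eligible)
    return hits / len(eligible)
```

Tracking this number alongside ordinary benchmark accuracy across training runs helps surface a backdoor that leaves clean-input accuracy untouched.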
What are other names for Data Poisoning?
Common alternative names include training data poisoning and dataset poisoning.
● Related terms
- ai-security № 081
Backdoor Attack (ML)
A training-time attack that implants a hidden behaviour in a model so it acts normally on clean inputs but produces an attacker-chosen output whenever a secret trigger appears.
- ai-security № 034
AI Supply Chain Risk
The set of threats arising from the third-party datasets, base models, libraries, plug-ins, and infrastructure that organisations combine to build and deploy AI systems.
- ai-security № 729
Nightshade Attack
A data-poisoning technique developed by the University of Chicago's Glaze team that adds imperceptible perturbations to images so that text-to-image models trained on them learn deeply distorted concepts.
- ai-security № 691
MLSecOps
The discipline of integrating security and risk controls across the entire machine-learning lifecycle, from data sourcing through training, deployment, monitoring, and retirement.
- ai-security № 018
Adversarial Example
An input deliberately perturbed — often imperceptibly to humans — so that a machine-learning model produces a wrong or attacker-chosen prediction.
- ai-security № 777
OWASP LLM Top 10
An OWASP-maintained list of the ten most critical security risks affecting applications that build on large language models.
● See also
- № 704 Model Inversion
- № 393 Evasion Attack (ML)
- № 666 Membership Inference Attack
- № 1026 Shadow AI
- № 025 AI Bill of Materials (AIBOM)
- № 898 RAG Security
- № 897 RAG
- № 376 Embedding Attacks