Data Anonymization
What is Data Anonymization?
Data Anonymization: irreversibly transforming personal data so that no individual can be identified, directly or indirectly, even when combined with other available information.
Data anonymization removes or alters identifiers, quasi-identifiers, and sensitive attributes so that re-identification is no longer reasonably possible. Techniques include suppression, generalization, perturbation, aggregation, and randomization. Results are often evaluated against privacy models such as k-anonymity, l-diversity, t-closeness, or differential privacy. Truly anonymized data falls outside the scope of GDPR (Recital 26), but the bar is high. Recital 26 asks whether identification is possible by any means "reasonably likely" to be used. Regulators such as the EDPB and CNIL require formal re-identification risk assessments that account for those means, including auxiliary datasets. Common pitfalls include relying on hashing alone, releasing high-dimensional microdata, or treating pseudonymized data as anonymous.
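The following minimal Python sketch shows generalization followed by a check against two of the privacy models named above. The toy records, the choice of age and ZIP code as quasi-identifiers, and the helper names are all hypothetical. The l-diversity test is the simplest "distinct values" variant, not the only definition.

```python
from collections import defaultdict

# Hypothetical toy records: (age, ZIP code, diagnosis).
# Age and ZIP code are treated as quasi-identifiers; diagnosis is sensitive.
RECORDS = [
    (34, "75011", "flu"),
    (36, "75012", "asthma"),
    (38, "75013", "flu"),
    (52, "69001", "diabetes"),
    (55, "69002", "flu"),
    (58, "69003", "asthma"),
]


def generalize(age: int, zip_code: str) -> tuple[str, str]:
    """Generalize age to a 10-year band and keep only the first 3 ZIP digits."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}", zip_code[:3] + "**"


def equivalence_classes(records):
    """Map each generalized quasi-identifier tuple to its sensitive values."""
    classes = defaultdict(list)
    for age, zip_code, diagnosis in records:
        classes[generalize(age, zip_code)].append(diagnosis)
    return dict(classes)


def is_k_anonymous(classes, k: int) -> bool:
    """Every equivalence class must contain at least k records."""
    return all(len(values) >= k for values in classes.values())


def is_distinct_l_diverse(classes, min_l: int) -> bool:
    """Distinct l-diversity: at least min_l distinct sensitive values per class."""
    return all(len(set(values)) >= min_l for values in classes.values())


classes = equivalence_classes(RECORDS)
print(sorted(classes))                    # [('30-39', '750**'), ('50-59', '690**')]
print(is_k_anonymous(classes, 3))         # True
print(is_distinct_l_diverse(classes, 2))  # True
print(is_distinct_l_diverse(classes, 3))  # False
```

Passing such checks does not on its own make a dataset anonymous. The risk assessment must still consider auxiliary datasets an attacker could link against the generalized quasi-identifiers.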
● Examples
- Publishing hospital readmission statistics aggregated by region and quarter, with cells below five suppressed.
- Releasing a public mobility dataset where trajectories are generalized to neighborhood-week granularity.
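The hospital readmission example can be sketched as aggregation plus small-cell suppression. In this Python sketch, the event data are hypothetical and the threshold of 5 follows the example. Suppressed cells are reported as None so readers can see that a value was withheld.

```python
from collections import Counter

# Hypothetical readmission events, one (region, quarter) pair per event.
EVENTS = (
    [("North", "2024-Q1")] * 12
    + [("North", "2024-Q2")] * 3
    + [("South", "2024-Q1")] * 7
    + [("South", "2024-Q2")] * 5
)

SUPPRESSION_THRESHOLD = 5  # cells with fewer than 5 events are withheld


def aggregate_with_suppression(events, threshold=SUPPRESSION_THRESHOLD):
    """Count events per (region, quarter) and replace small counts with None."""
    counts = Counter(events)
    return {
        cell: (count if count >= threshold else None)
        for cell, count in sorted(counts.items())
    }


for (region, quarter), value in aggregate_with_suppression(EVENTS).items():
    print(region, quarter, "suppressed" if value is None else value)
# North 2024-Q1 12
# North 2024-Q2 suppressed
# South 2024-Q1 7
# South 2024-Q2 5
```

This is only primary suppression. If row or column totals are also published, a suppressed cell can be recovered by subtraction. Statistical agencies therefore apply secondary (complementary) suppression as well.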
● Frequently asked questions
What is Data Anonymization?
Irreversibly transforming personal data so that no individual can be identified, directly or indirectly, even when combined with other available information. It belongs to the Privacy & Data Protection category of cybersecurity.
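How do you apply Data Anonymization safely?
Data anonymization is itself a privacy safeguard rather than a threat to defend against. Applying it well means combining suitable techniques (suppression, generalization, perturbation, aggregation, randomization) with a check against a privacy model such as k-anonymity or differential privacy. It also requires a re-identification risk assessment covering means reasonably likely to be used, including auxiliary datasets. Avoid the common pitfalls: relying on hashing alone, releasing high-dimensional microdata, or treating pseudonymized data as anonymous.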
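The definition names relying on hashing alone as a pitfall, and a few lines of Python show why. The 5-digit patient ID and the helper names in this sketch are hypothetical. An unsalted hash is deterministic, so anyone can hash every candidate in a small identifier space and match the released value.

```python
import hashlib
from typing import Optional


def sha256_hex(value: str) -> str:
    """Unsalted SHA-256, as a naive 'anonymization' step might apply it."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical release: a 5-digit patient ID replaced by its SHA-256 hash.
released_hash = sha256_hex("04217")


def reverse_by_enumeration(target_hash: str, digits: int = 5) -> Optional[str]:
    """Recover an ID by hashing every candidate in the (small) ID space."""
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"  # zero-padded, e.g. 4217 -> "04217"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


print(reverse_by_enumeration(released_hash))  # prints 04217
```

A secret key (for example, HMAC) blocks this enumeration for outsiders. The key holder can still re-link records, though, so the result is pseudonymized data rather than anonymous data.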
What are other names for Data Anonymization?
Common alternative names include Anonymization and, in its strong sense of irreversibly removing identifiability, De-identification.
● Related terms
- Pseudonymization: a technique that replaces direct identifiers in personal data with reversible aliases, so that the data can no longer be attributed to an individual without additional, separately kept information.
- k-Anonymity: a privacy model proposed by Latanya Sweeney that requires every record in a dataset to be indistinguishable from at least k-1 others based on its quasi-identifiers.
- l-Diversity: an extension of k-anonymity introduced by Machanavajjhala et al. that requires each equivalence class to contain at least l well-represented values for every sensitive attribute.
- t-Closeness: a privacy model by Li, Li, and Venkatasubramanian that strengthens l-diversity by limiting how far the distribution of a sensitive attribute in any class differs from its global distribution.
- Differential Privacy: a mathematical framework that quantifies privacy loss when releasing statistics or training models, typically by adding calibrated noise so that the influence of any single individual's data on the output is provably bounded.
- Data Minimization: a privacy principle requiring organizations to collect, process, and retain only the personal data that is strictly necessary for a defined, lawful purpose.
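The Differential Privacy entry can be made concrete with the Laplace mechanism for a counting query. In this Python sketch, function names, epsilon values, and the count of 120 are hypothetical. It assumes one person changes the count by at most 1 (sensitivity 1), so the noise scale is 1/epsilon. Laplace noise is drawn as the difference of two exponential draws.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is sensitivity / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)  # seeded only so the demo is reproducible
for eps in (0.1, 1.0, 10.0):
    # Smaller epsilon means stronger privacy and noisier output.
    print(eps, round(dp_count(120, eps, rng), 1))
```

This is illustrative only. Python's `random` module is not cryptographically secure, and naive floating-point Laplace sampling is known to leak information through low-order bits. Production releases should rely on a vetted differential privacy library.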
● See also
- Tokenization (Privacy)
- Data Masking
- Tor / Tor Browser
- Onion Routing
- I2P