k-Anonymity
What is k-Anonymity?
k-Anonymity: A privacy model proposed by Latanya Sweeney that requires every record in a dataset to be indistinguishable from at least k-1 others based on its quasi-identifiers.
k-Anonymity, formalized by Sweeney in 2002, protects against re-identification by ensuring that every combination of quasi-identifier values (such as age, ZIP code, and gender) appears in at least k records. Records that share the same combination form an equivalence class. It is achieved via generalization (replacing exact values with ranges or broader categories) and suppression (removing rare values or records), often using algorithms like Mondrian or Incognito. While k-anonymity reduces linkage attacks, it does not protect against homogeneity attacks, where every record in an equivalence class shares the same sensitive value, or against background-knowledge attacks, where an attacker uses outside information to rule out some of the sensitive values in a class. These gaps motivated the l-diversity and t-closeness extensions. Practitioners pick k based on data utility, risk appetite, and regulatory expectations, such as the identifiability test in GDPR Recital 26.
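As a minimal sketch of the definition above, the following Python groups records by their quasi-identifier values and reports the size of the smallest equivalence class, which is the largest k the table satisfies. The record layout, the made-up sample rows, and the helper names `equivalence_classes`, `k_of`, and `is_k_anonymous` are illustrative assumptions, not part of Sweeney's formulation or of any particular library.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        classes[key].append(record)
    return dict(classes)


def k_of(records, quasi_identifiers):
    """Return the size of the smallest equivalence class (0 for an empty table)."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min((len(rows) for rows in classes.values()), default=0)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k records."""
    return k_of(records, quasi_identifiers) >= k


# Hypothetical, already-generalized records (made-up data for illustration).
RECORDS = [
    {"age": "30-39", "zip": "021**", "gender": "F", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "gender": "F", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "021**", "gender": "F", "diagnosis": "flu"},
    {"age": "40-49", "zip": "021**", "gender": "M", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "021**", "gender": "M", "diagnosis": "flu"},
]
QUASI_IDENTIFIERS = ["age", "zip", "gender"]

print(k_of(RECORDS, QUASI_IDENTIFIERS))
```

The sensitive attribute (`diagnosis` here) is deliberately left out of the grouping key: k-anonymity constrains only the quasi-identifiers.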
● Examples
- 01
A medical dataset generalized so that every age/ZIP combination matches at least five patients (k=5).
- 02
Generalizing date of birth to year-only to satisfy k-anonymity in a public research release.
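The two examples above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Mondrian or Incognito algorithm: the generalization rules (10-year age bands, 3-digit ZIP prefixes, year-only birth dates), the sample patients, and all function names are hypothetical choices made for the sketch.

```python
from collections import Counter
from datetime import date


def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Keep the first `keep` digits of a ZIP code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize_dob(dob):
    """Reduce a full date of birth to its year."""
    return str(dob.year)


def anonymize(records, k):
    """Generalize age and ZIP, then suppress records in classes smaller than k."""
    keys = [(generalize_age(r["age"]), generalize_zip(r["zip"])) for r in records]
    counts = Counter(keys)
    return [key for key in keys if counts[key] >= k]


# Made-up patients: five fall into one class, one is an outlier.
PATIENTS = [
    {"age": 31, "zip": "02139"},
    {"age": 34, "zip": "02141"},
    {"age": 38, "zip": "02138"},
    {"age": 36, "zip": "02139"},
    {"age": 33, "zip": "02142"},
    {"age": 72, "zip": "90210"},
]

released = anonymize(PATIENTS, k=5)
print(released)
print(generalize_dob(date(1987, 6, 14)))
```

With these assumed rules, the five similar patients collapse into a single age/ZIP class of size five, while the lone outlier is suppressed rather than released in a class of one.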
● Frequently asked questions
What is k-Anonymity?
A privacy model proposed by Latanya Sweeney that requires every record in a dataset to be indistinguishable from at least k-1 others based on its quasi-identifiers. It belongs to the Privacy & Data Protection category of cybersecurity.
How does k-Anonymity work?
It groups records into equivalence classes of at least k records that share the same quasi-identifier values (such as age, ZIP code, and gender), using generalization and suppression, often via algorithms like Mondrian or Incognito. It reduces linkage attacks but not homogeneity or background-knowledge attacks, which motivated l-diversity and t-closeness. See the full definition above for details.
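The homogeneity weakness mentioned above can be illustrated with a short Python sketch: a table may satisfy k=2 and still reveal a sensitive value when every record in a class shares it. The check below counts distinct sensitive values per equivalence class, which corresponds only to the simplest (distinct) reading of l-diversity. The table contents and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def sensitive_values_by_class(records, quasi_identifiers, sensitive):
    """Map each equivalence class to the set of sensitive values it contains."""
    values = defaultdict(set)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        values[key].add(record[sensitive])
    return dict(values)


def homogeneous_classes(records, quasi_identifiers, sensitive):
    """Return the classes in which every record has the same sensitive value."""
    by_class = sensitive_values_by_class(records, quasi_identifiers, sensitive)
    return [key for key, vals in by_class.items() if len(vals) == 1]


def min_distinct_values(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any class."""
    by_class = sensitive_values_by_class(records, quasi_identifiers, sensitive)
    return min((len(vals) for vals in by_class.values()), default=0)


# Made-up table: each age/ZIP class has two records, so it is 2-anonymous.
TABLE = [
    {"age": "30-39", "zip": "021**", "diagnosis": "diabetes"},
    {"age": "30-39", "zip": "021**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "021**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "021**", "diagnosis": "asthma"},
]

print(homogeneous_classes(TABLE, ["age", "zip"], "diagnosis"))
```

Anyone who knows a person is in the first class learns that person's diagnosis without re-identifying a specific row, which is why a k-anonymity check alone is not sufficient.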
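How do you defend against the weaknesses of k-Anonymity?
k-Anonymity is a protective privacy model rather than an attack. Its known weaknesses, homogeneity and background-knowledge attacks, are addressed by extensions such as l-diversity and t-closeness, as detailed in the full definition above.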
What are other names for k-Anonymity?
Common alternative names include: k-Anonymization.
● Related terms
- privacy № 274
Data Anonymization
Irreversibly transforming personal data so that no individual can be identified, directly or indirectly, even when combined with other available information.
- privacy № 603
l-Diversity
An extension of k-anonymity introduced by Machanavajjhala et al. that requires each equivalence class to contain at least l well-represented values for every sensitive attribute.
- privacy № 1126
t-Closeness
A privacy model by Li, Li, and Venkatasubramanian that strengthens l-diversity by limiting how far the distribution of a sensitive attribute in any class differs from its global distribution.
- privacy № 317
Differential Privacy
A mathematical framework that quantifies privacy loss when releasing statistics or training models, by adding calibrated noise so any single individual's contribution is provably bounded.
- privacy № 875
Pseudonymization
A technique that replaces direct identifiers in personal data with reversible aliases, so that the data can no longer be attributed to an individual without additional, separately kept information.
- privacy № 818
Personally Identifiable Information (PII)
Any data that can identify a specific individual on its own or when combined with other information, such as names, identifiers, or biometric records.