k-Anonymity
What is k-Anonymity?
k-Anonymity: A privacy model proposed by Latanya Sweeney that requires every record in a dataset to be indistinguishable from at least k-1 others based on its quasi-identifiers.
k-Anonymity, formalized by Sweeney in 2002, protects against re-identification by ensuring that every combination of quasi-identifier values (such as age, ZIP code, and gender) appears in at least k records. The records that share a combination form an equivalence class. It is achieved through generalization (replacing exact values with ranges or broader categories) and suppression (removing rare values or records), often using algorithms such as Mondrian or Incognito. k-Anonymity reduces the risk of linkage attacks, but it does not protect the sensitive attribute itself: if every record in an equivalence class shares the same sensitive value, an attacker learns that value without singling anyone out (a homogeneity attack), and background knowledge can rule out some of the values that remain. These weaknesses motivated the l-diversity and t-closeness extensions. Practitioners choose k based on data utility, risk appetite, and regulatory expectations, such as the GDPR Recital 26 standard for anonymous information.
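As a minimal sketch of the definition, the following Python groups records by their quasi-identifier values and reports the size of the smallest equivalence class, which is the largest k the table satisfies. It also flags classes whose sensitive attribute is identical across all records, the situation a homogeneity attack exploits. The record layout, the column names, the toy data, and the helper names `equivalence_classes`, `achieved_k`, and `homogeneous_classes` are all illustrative assumptions, not part of any library or of the original formulation.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        classes[key].append(record)
    return dict(classes)


def achieved_k(records, quasi_identifiers):
    """Return the size of the smallest equivalence class (0 if empty)."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min((len(rows) for rows in classes.values()), default=0)


def homogeneous_classes(records, quasi_identifiers, sensitive):
    """Return the keys of classes where every sensitive value is the same."""
    classes = equivalence_classes(records, quasi_identifiers)
    return [
        key
        for key, rows in classes.items()
        if len({row[sensitive] for row in rows}) == 1
    ]


# Hypothetical, already-generalized toy data.
records = [
    {"age": "30-39", "zip": "021**", "gender": "F", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "gender": "F", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "021**", "gender": "M", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "021**", "gender": "M", "diagnosis": "diabetes"},
]
QI = ["age", "zip", "gender"]

k = achieved_k(records, QI)
leaky = homogeneous_classes(records, QI, "diagnosis")
print(f"table is {k}-anonymous; homogeneous classes: {leaky}")
```

In this toy table every quasi-identifier combination appears twice, so it is 2-anonymous, yet anyone known to fall in the second class can be inferred to have the same diagnosis as everyone else in it.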
● Examples
- 01
A medical dataset generalized so that every age/ZIP combination matches at least five patients (k=5).
- 02
Generalizing date of birth to year-only to satisfy k-anonymity in a public research release.
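The two examples above can be sketched in a few lines of Python: generalize each record (date of birth to year only, ZIP code to a three-digit prefix), then suppress any record whose equivalence class is still smaller than k. This is a fixed-rule illustration under assumed field names (`birth`, `zip`) and invented data, with record-level suppression only. It is not an implementation of Mondrian or Incognito, which search for generalizations that lose less information.

```python
from collections import Counter


def generalize(record):
    """Coarsen quasi-identifiers: birth date to year, ZIP to 3-digit prefix."""
    out = dict(record)
    out["birth"] = record["birth"][:4]      # "1984-03-17" -> "1984"
    out["zip"] = record["zip"][:3] + "**"   # "02139" -> "021**"
    return out


def anonymize(records, quasi_identifiers, k):
    """Generalize every record, then drop records in classes smaller than k."""
    generalized = [generalize(r) for r in records]

    def key(r):
        return tuple(r[q] for q in quasi_identifiers)

    sizes = Counter(key(r) for r in generalized)
    return [r for r in generalized if sizes[key(r)] >= k]


# Hypothetical raw records; field names and values are made up.
raw = [
    {"birth": "1984-03-17", "zip": "02139", "diagnosis": "flu"},
    {"birth": "1984-11-02", "zip": "02142", "diagnosis": "asthma"},
    {"birth": "1984-07-30", "zip": "02138", "diagnosis": "flu"},
    {"birth": "1991-01-09", "zip": "90210", "diagnosis": "diabetes"},
]

released = anonymize(raw, ["birth", "zip"], k=3)
for row in released:
    print(row)
```

With k=3, the three records born in 1984 collapse into one equivalence class and are released, while the remaining record is unique even after generalization and is suppressed.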
● Frequently asked questions
What is k-Anonymity?
A privacy model proposed by Latanya Sweeney that requires every record in a dataset to be indistinguishable from at least k-1 others based on its quasi-identifiers. It belongs to the Privacy & Data Protection category of cybersecurity.
How do you defend against k-Anonymity?
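k-Anonymity is a privacy protection rather than an attack, so it is not something to defend against. Its known weaknesses, homogeneity and background-knowledge attacks, are addressed by extensions such as l-diversity and t-closeness, as described in the full definition above.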
What are other names for k-Anonymity?
Common alternative names include: k-Anonymization.