t-Closeness
What is t-Closeness?
t-Closeness: A privacy model by Li, Li, and Venkatasubramanian that strengthens l-diversity by limiting how far the distribution of a sensitive attribute in any class differs from its global distribution.
t-Closeness, introduced in 2007, mitigates skewness and similarity attacks against l-diversity by requiring that the distribution of a sensitive attribute within every equivalence class be within a threshold t of its distribution in the full dataset. The distance is typically measured by Earth Mover's Distance, which takes into account how semantically close the values are. This limits what an adversary can infer when an equivalence class is dominated by semantically close but distinct values (for example, several rare cancer types). A lower t generally costs utility because more generalization or suppression is needed, so practitioners choose t by weighing disclosure risk against utility. t-Closeness is often layered on top of k-anonymity and l-diversity in healthcare, government, and research releases.
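The check described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a categorical sensitive attribute where every pair of distinct values is equally far apart, in which case Earth Mover's Distance reduces to half the sum of absolute differences between the two distributions. The function names, the `records` layout, and the toy data are hypothetical.

```python
from collections import Counter


def distribution(values, domain):
    """Relative frequency of each domain value in `values`, in domain order."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in domain]


def emd_equal_distance(p, q):
    """EMD when all distinct values are equally distant (assumption):
    half the sum of absolute probability differences."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def violating_classes(records, t):
    """Return {class_id: distance} for every equivalence class whose
    sensitive-value distribution is more than t away from the global one.

    `records` is a list of (class_id, sensitive_value) pairs (hypothetical layout).
    """
    domain = sorted({value for _, value in records})
    global_dist = distribution([value for _, value in records], domain)

    by_class = {}
    for class_id, value in records:
        by_class.setdefault(class_id, []).append(value)

    violations = {}
    for class_id, values in by_class.items():
        d = emd_equal_distance(distribution(values, domain), global_dist)
        if d > t:
            violations[class_id] = d
    return violations


# Toy data: three equivalence classes of four records each.
records = (
    [("A", v) for v in ["flu", "flu", "cold", "cancer"]]
    + [("B", v) for v in ["flu", "cold", "cold", "cancer"]]
    + [("C", v) for v in ["cancer", "cancer", "cancer", "cancer"]]
)

print(violating_classes(records, t=0.3))
```

With t = 0.3, classes A and B sit at distance 0.25 from the global distribution and pass, while class C, which contains only one diagnosis, sits at 0.5 and is flagged. A real release would then generalize or suppress further until no class is flagged.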
● Examples
- 01
Ensuring the salary distribution within every gender/age cell is within t=0.2 of the population distribution.
- 02
Applying t-closeness so that no equivalence class disproportionately contains a single, rare disease.
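Example 01 can be made concrete with a small sketch. It assumes salaries are grouped into equally spaced, ordered bands, so the ordered-distance form of Earth Mover's Distance applies: the sum of absolute prefix sums of the probability differences, divided by the number of bands minus one. The cell names, band values, and data below are invented for illustration.

```python
from itertools import accumulate


def emd_ordered(p, q):
    """EMD for an ordered domain with equally spaced values (assumption):
    sum of absolute running differences, normalized by (m - 1)."""
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("p and q must share an ordered domain of at least 2 values")
    prefix_sums = accumulate(pi - qi for pi, qi in zip(p, q))
    return sum(abs(s) for s in prefix_sums) / (len(p) - 1)


def to_distribution(values, ordered_domain):
    """Relative frequency of each band in `values`, in band order."""
    return [values.count(v) / len(values) for v in ordered_domain]


SALARY_BANDS = [3, 4, 5, 6, 7, 8, 9, 10, 11]  # thousands, hypothetical
T = 0.2

# Hypothetical gender/age cells, each holding the salary band of its records.
cells = {
    "cell_a": [3, 4, 5],    # only the lowest bands
    "cell_b": [6, 8, 11],   # spread across the range
    "cell_c": [7, 9, 10],   # skewed toward higher bands
}

# The population is the union of all cells.
population = [s for salaries in cells.values() for s in salaries]
q = to_distribution(population, SALARY_BANDS)

distances = {
    name: emd_ordered(to_distribution(salaries, SALARY_BANDS), q)
    for name, salaries in cells.items()
}
violations = sorted(name for name, d in distances.items() if d > T)

for name, d in distances.items():
    status = "violates" if d > T else "satisfies"
    print(f"{name}: distance {d:.3f} {status} t={T}")
```

Here cell_a (distance 0.375) and cell_c (about 0.236) exceed t = 0.2, while cell_b (about 0.167) satisfies it. Note that cell_a would pass a 3-diversity check because it holds three distinct values, yet it still reveals that everyone in the cell earns a low salary, which is the kind of similarity leak t-closeness is meant to catch.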
● Frequently asked questions
What is t-Closeness?
A privacy model by Li, Li, and Venkatasubramanian that strengthens l-diversity by limiting how far the distribution of a sensitive attribute in any class differs from its global distribution. It belongs to the Privacy & Data Protection category of cybersecurity.
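How do you defend against t-Closeness?
t-Closeness is a privacy safeguard rather than an attack, so there is nothing to defend against. It is itself a defence against the skewness and similarity attacks that affect l-diversity, as detailed in the full definition above.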
What are other names for t-Closeness?
Common alternative names include: t-Closeness anonymization.
● Related terms
- privacy № 576
k-Anonymity
A privacy model proposed by Latanya Sweeney that requires every record in a dataset to be indistinguishable from at least k-1 others based on its quasi-identifiers.
- privacy № 603
l-Diversity
An extension of k-anonymity introduced by Machanavajjhala et al. that requires each equivalence class to contain at least l well-represented values for every sensitive attribute.
- privacy № 274
Data Anonymization
Irreversibly transforming personal data so that no individual can be identified, directly or indirectly, even when combined with other available information.
- privacy № 317
Differential Privacy
A mathematical framework that quantifies privacy loss when releasing statistics or training models, by adding calibrated noise so any single individual's contribution is provably bounded.
- privacy № 875
Pseudonymization
A technique that replaces direct identifiers in personal data with reversible aliases, so that the data can no longer be attributed to an individual without additional, separately kept information.
- privacy № 280
Data Minimization
A privacy principle requiring organizations to collect, process, and retain only the personal data that is strictly necessary for a defined, lawful purpose.