Vol. 1 · Ed. 2026
CyberGlossary
Entry № 1126

t-Closeness

What is t-Closeness?

t-Closeness: A privacy model by Li, Li, and Venkatasubramanian that strengthens l-diversity by limiting how far the distribution of a sensitive attribute in any class differs from its global distribution.


t-Closeness, introduced in 2007, mitigates skewness and similarity attacks against l-diversity. It requires that the distribution of a sensitive attribute within every equivalence class lie within a threshold t of that attribute's distribution in the full dataset, with the distance typically measured by Earth Mover's Distance (EMD). This limits what an adversary can infer when an equivalence class is l-diverse on paper but dominated by semantically close values (for example, several rare cancer types), or when its distribution is heavily skewed relative to the population. A lower t gives stronger protection but generally costs utility, because more generalization or suppression is needed, so practitioners choose t by weighing disclosure risk against utility. t-Closeness is often layered on top of k-anonymity and l-diversity in healthcare, government, and research releases.
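The check described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the toy `classes` data, and the choice of the equal-distance ground metric (every pair of distinct values is distance 1 apart, so EMD reduces to half the L1 distance) are all assumptions made for the example.

```python
from collections import Counter


def distribution(values, domain):
    """Relative frequency of each domain value in `values`."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in domain]


def emd_equal_distance(p, q):
    """EMD under the equal-distance ground metric (assumed here):
    half the L1 distance between the two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def check_t_closeness(classes, t):
    """`classes` maps an equivalence-class label to its sensitive values.
    Returns (all classes within t?, per-class distance)."""
    all_values = [v for vals in classes.values() for v in vals]
    domain = sorted(set(all_values))
    global_dist = distribution(all_values, domain)
    distances = {
        label: emd_equal_distance(distribution(vals, domain), global_dist)
        for label, vals in classes.items()
    }
    return all(d <= t for d in distances.values()), distances


# Hypothetical toy release with two equivalence classes.
classes = {
    "A": ["flu", "flu", "cancer"],
    "B": ["flu", "cancer", "cancer"],
}
ok_loose, distances = check_t_closeness(classes, t=0.2)
ok_strict, _ = check_t_closeness(classes, t=0.1)
```

In this toy data each class sits at distance 1/6 from the global distribution, so the release passes at t=0.2 but fails at t=0.1. The equal-distance metric ignores semantic similarity between values; capturing that would need a hierarchical or ordered ground distance instead.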

Examples

  1. Ensuring the salary distribution within every gender/age cell is within t=0.2 of the population distribution.

  2. Applying t-closeness so that no equivalence class disproportionately contains a single, rare disease.
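For an ordered attribute such as salary, the first example can be sketched with the ordered-distance form of Earth Mover's Distance: the sum of absolute cumulative differences between the two distributions, divided by the number of values minus one. The salary bands, the two classes, and the helper names below are illustrative assumptions, not data from any real release.

```python
from itertools import accumulate


def emd_ordered(p, q):
    """EMD for an ordered domain where adjacent values are 1/(m-1) apart:
    sum of |cumulative (p - q)| divided by (m - 1)."""
    m = len(p)
    diffs = [a - b for a, b in zip(p, q)]
    return sum(abs(c) for c in accumulate(diffs)) / (m - 1)


def band_distribution(values, bands):
    """Relative frequency of each salary band in `values`."""
    return [values.count(b) / len(values) for b in bands]


# Hypothetical salary bands in thousands; one record per band overall,
# so the population distribution is uniform.
bands = [3, 4, 5, 6, 7, 8, 9, 10, 11]
population = list(bands)
low_class = [3, 4, 5]     # only the lowest salaries
mixed_class = [6, 8, 11]  # salaries spread across the range

t = 0.2
q = band_distribution(population, bands)
d_low = emd_ordered(band_distribution(low_class, bands), q)
d_mixed = emd_ordered(band_distribution(mixed_class, bands), q)

print(f"low class:   {d_low:.3f}  within t: {d_low <= t}")
print(f"mixed class: {d_mixed:.3f}  within t: {d_mixed <= t}")
```

Under these assumptions the class holding only the three lowest salaries lies at distance 0.375 and violates t=0.2, while the spread-out class lies at about 0.167 and satisfies it, even though both contain three distinct values.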

Frequently asked questions

What is t-Closeness?

A privacy model by Li, Li, and Venkatasubramanian that strengthens l-diversity by limiting how far the distribution of a sensitive attribute in any class differs from its global distribution. It belongs to the Privacy & Data Protection category of cybersecurity.

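What does t-Closeness defend against?

t-Closeness is a defence rather than an attack. It protects data releases against skewness and similarity attacks on l-diversity, in which an adversary infers a sensitive attribute from an equivalence class whose value distribution differs sharply from that of the full dataset. It is typically applied alongside k-anonymity and l-diversity.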

What are other names for t-Closeness?

Common alternative names include: t-Closeness anonymization.

Related terms