Differential Privacy
What is Differential Privacy?
Differential Privacy: A mathematical framework that quantifies privacy loss when releasing statistics or training models, by adding calibrated noise so any single individual's contribution is provably bounded.
Differential privacy, formalized by Dwork, McSherry, Nissim, and Smith in 2006, guarantees that adding or removing one record from a dataset changes the probability of any output by at most a multiplicative factor of e^epsilon, plus a small additive slack delta in the approximate (epsilon, delta) variant. Mechanisms include the Laplace, Gaussian, and exponential mechanisms, as well as DP-SGD for machine learning. Cumulative privacy loss across repeated releases is tracked against a privacy budget expressed in epsilon and delta, using accounting methods such as advanced composition or the moments accountant. The U.S. Census Bureau (2020 decennial census), Apple, Google, and Microsoft have deployed it for statistics and telemetry. Unlike syntactic models (k-anonymity, l-diversity), its guarantee is provable and holds regardless of the adversary's auxiliary knowledge, including information that becomes available after release.
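Written out, the guarantee described above takes the following standard form. The symbols are notation introduced here for illustration: M is a randomized mechanism, D and D' are any two datasets that differ in one record, S is any set of possible outputs, and f is a numeric query. The second part shows how the Laplace mechanism calibrates its noise to the query's sensitivity.

```latex
% (epsilon, delta)-differential privacy for a randomized mechanism M:
% must hold for all neighbouring datasets D, D' and all output sets S
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta

% Laplace mechanism for a numeric query f with L1 sensitivity \Delta f
\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1,
\qquad
M(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)
```

Setting delta to 0 gives pure epsilon-differential privacy, which is what the Laplace mechanism satisfies when independent noise of scale \Delta f / \varepsilon is added to each coordinate of f(D). Smaller epsilon means more noise and a stronger guarantee.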
● Examples
- 01
Apple's keyboard suggestions reporting emoji frequencies via local differential privacy.
- 02
Training a healthcare model with DP-SGD so that the influence of any individual patient record on the trained model is provably bounded, which limits memorization of that record.
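The local model in the first example can be illustrated with randomized response, the simplest local mechanism. This is a minimal sketch and not Apple's actual algorithm, which is more elaborate. The function names, the single yes/no attribute, and the simulated population are all assumptions made for illustration.

```python
import math
import random


def randomized_response(truth: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1), otherwise flip it.

    The ratio between the two reporting probabilities is e^eps, so each
    individual report satisfies epsilon-local differential privacy.
    """
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return truth if rng.random() < p_truth else not truth


def estimate_frequency(reports: list[bool], epsilon: float) -> float:
    """Debias the observed share of True reports into an estimate of the true share."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    # observed = true_share * p + (1 - true_share) * (1 - p), solved for true_share
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed only to make the demo repeatable
    # Hypothetical population: 30% of users used a given emoji today.
    truths = [True] * 3000 + [False] * 7000
    reports = [randomized_response(t, epsilon=1.0, rng=rng) for t in truths]
    print("estimated share:", estimate_frequency(reports, epsilon=1.0))
```

The server never sees a trustworthy individual answer, yet the aggregate estimate converges on the true share as the number of reports grows. The cost is variance: at the same epsilon, local mechanisms need far more users than a central mechanism to reach the same accuracy.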
● Frequently asked questions
What is Differential Privacy?
A mathematical framework that quantifies privacy loss when releasing statistics or training models, by adding calibrated noise so any single individual's contribution is provably bounded. It belongs to the Privacy & Data Protection category of cybersecurity.
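As a concrete illustration of "calibrated noise", here is a minimal sketch of the Laplace mechanism applied to a counting query. The function names and the sample records are hypothetical. The sketch assumes that adding or removing one record changes the count by at most 1 (sensitivity 1), and it is not hardened for production use: naive floating-point noise sampling has known weaknesses.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    # expovariate takes a rate, so rate = 1 / scale gives a mean of `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that match `predicate`.

    A counting query has sensitivity 1, so Laplace noise with scale
    1 / epsilon yields epsilon-differential privacy for this one release.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical dataset: one record per person.
    people = [{"age": a} for a in range(100)]
    print(private_count(people, lambda r: r["age"] >= 65, epsilon=0.5, rng=rng))
```

Each call spends epsilon from the privacy budget. Answering the same query repeatedly and averaging the results would erode the guarantee, which is why cumulative loss has to be tracked.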
How does Differential Privacy work?
A randomized mechanism perturbs the computation so that adding or removing one record changes the probability of any output by at most a factor of e^epsilon, plus an additive delta in the approximate variant. For statistics, this usually means adding Laplace or Gaussian noise to the result, or sampling an output with the exponential mechanism. For machine learning, DP-SGD clips each example's gradient and adds noise before every update. Each release consumes part of a privacy budget, and the cumulative loss is tracked with advanced composition or a moments accountant. Deployments and the comparison with syntactic models such as k-anonymity are covered in the full definition above.
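To make the DP-SGD mechanism mentioned above more concrete, the following sketch shows a single update step: clip each example's gradient, sum, add Gaussian noise, and average. The function name, parameter names, and plain-list representation are assumptions chosen for illustration, and the sketch omits the privacy accounting that converts the noise multiplier and number of steps into an (epsilon, delta) figure.

```python
import math
import random


def dp_sgd_step(weights, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """Apply one DP-SGD update and return the new weights.

    weights:            list of floats (model parameters)
    per_example_grads:  one gradient (list of floats) per example in the batch
    clip_norm:          maximum L2 norm allowed for any single example's gradient
    noise_multiplier:   Gaussian noise std is noise_multiplier * clip_norm
    """
    dim = len(weights)
    summed = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale the gradient down only if it exceeds the clipping norm.
        factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += grad[i] * factor

    sigma = noise_multiplier * clip_norm
    batch_size = len(per_example_grads)
    # Noise is added to the clipped sum, then the result is averaged.
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random()
    grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients
    print(dp_sgd_step([0.0, 0.0], grads, clip_norm=1.0,
                      noise_multiplier=1.1, lr=0.1, rng=rng))
```

Clipping bounds how much any one record can move the model, and the noise scale is tied to that bound, which is the same calibration idea used by the Laplace and Gaussian mechanisms for statistics.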
What are other names for Differential Privacy?
Common alternative names include: DP, Epsilon-Differential Privacy.
● Related terms
- privacy № 274
Data Anonymization
Irreversibly transforming personal data so that no individual can be identified, directly or indirectly, even when combined with other available information.
- privacy № 576
k-Anonymity
A privacy model proposed by Latanya Sweeney that requires every record in a dataset to be indistinguishable from at least k-1 others based on its quasi-identifiers.
- privacy № 603
l-Diversity
An extension of k-anonymity introduced by Machanavajjhala et al. that requires each equivalence class to contain at least l well-represented values for every sensitive attribute.
- privacy № 1126
t-Closeness
A privacy model by Li, Li, and Venkatasubramanian that strengthens l-diversity by limiting how far the distribution of a sensitive attribute in any class differs from its global distribution.
- privacy № 875
Pseudonymization
A technique that replaces direct identifiers in personal data with reversible aliases, so that the data can no longer be attributed to an individual without additional, separately kept information.
- privacy № 280
Data Minimization
A privacy principle requiring organizations to collect, process, and retain only the personal data that is strictly necessary for a defined, lawful purpose.