Carnegie Mellon University

Data Privacy
Laboratory

Laboratory for International Data Privacy


Computational Data Privacy Protection



L. Sweeney. Computational Data Privacy Protection, LIDAP-WP5. Carnegie Mellon University, Laboratory for International Data Privacy, Pittsburgh, PA: 2000. [Members only: Full paper in PDF 215 pages, 725KB]

Abstract

The goal of the work presented in this working paper is to explore computational techniques for releasing information in such a way that the identity of any individual or entity contained in the data cannot be recognized while the data remain practically useful. I begin by demonstrating ways to learn information about entities from publicly available information. I then provide a formal framework for reasoning about disclosure control and the ability to infer the identities of entities contained within the data. I formally define and present null-map, k-map and wrong-map as models of protection. Each model provides protection by ensuring that released information maps to no entities, to k entities, or to incorrect entities, respectively.
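To make the k-map idea concrete, the following is a minimal sketch in Python, not the formal definition given in the paper. It assumes that the released records and a list of population records are described at the same level of generalization, and that a chosen set of fields (here called quasi_identifiers, a hypothetical name) is what links a released record to entities. The function name satisfies_k_map and the sample data are illustrative only.

```python
from collections import Counter


def _key(record, quasi_identifiers):
    """Project a record onto the fields assumed to link it to entities."""
    return tuple(record[field] for field in quasi_identifiers)


def satisfies_k_map(released, population, quasi_identifiers, k):
    """Return True if every released record matches at least k population entities.

    Assumption: released and population values are directly comparable,
    i.e. both are expressed at the same level of generalization.
    """
    counts = Counter(_key(person, quasi_identifiers) for person in population)
    return all(counts[_key(row, quasi_identifiers)] >= k for row in released)


# Illustrative data only (not from the paper).
population = [
    {"zip": "152**", "birth_year": 1960},
    {"zip": "152**", "birth_year": 1960},
    {"zip": "152**", "birth_year": 1960},
    {"zip": "021**", "birth_year": 1975},
    {"zip": "021**", "birth_year": 1975},
]
released = [
    {"zip": "152**", "birth_year": 1960, "diagnosis": "asthma"},
    {"zip": "021**", "birth_year": 1975, "diagnosis": "flu"},
]

fields = ["zip", "birth_year"]
print(satisfies_k_map(released, population, fields, k=2))
print(satisfies_k_map(released, population, fields, k=3))
```

In this toy data the second released record matches only two population entities, so the release passes the check for k = 2 but not for k = 3.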

The working paper ends by examining four computational systems that attempt to maintain privacy while releasing electronic information. These systems are: (1) my Scrub System, which locates personally-identifying information in letters between doctors and notes written by clinicians; (2) my Datafly II System, which generalizes and suppresses values in field-structured data sets; (3) Statistics Netherlands' Mu-Argus System, which is becoming a European standard for producing public-use data; and (4) my k-Similar algorithm, which finds optimal solutions such that data are minimally distorted while still providing adequate protection. By introducing anonymity and quality metrics, I show that Datafly II can overprotect data, Scrub and Mu-Argus can fail to provide adequate protection, but k-Similar finds optimal results.
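The two operations named above, generalization and suppression, can be illustrated with a short Python sketch. This is not the Datafly II algorithm or any of the other systems described in the paper; it is a simplified illustration on a single field, assuming equal-length ZIP codes, a non-empty input list, a required group size k, and a cap on how many values may be suppressed. The function names and the sample values are hypothetical.

```python
from collections import Counter


def generalize_zip(zip_code, level):
    """Replace the last `level` characters of a ZIP code with '*'."""
    if level == 0:
        return zip_code
    return zip_code[:-level] + "*" * level


def generalize_and_suppress(zips, k, max_suppressed):
    """Generalize ZIP codes until few enough values are rare, then suppress those.

    A value is "rare" if fewer than k records share it. Suppressed values
    are returned as None. Assumes all ZIP codes have the same length.
    """
    for level in range(len(zips[0]) + 1):
        generalized = [generalize_zip(z, level) for z in zips]
        counts = Counter(generalized)
        rare = [g for g in generalized if counts[g] < k]
        if len(rare) <= max_suppressed:
            return [g if counts[g] >= k else None for g in generalized]
    # Even full generalization left too many rare values: suppress everything.
    return [None] * len(zips)


# Illustrative data only.
zips = ["15213", "15217", "15232", "02139", "02138", "90210"]
result = generalize_and_suppress(zips, k=2, max_suppressed=1)
print(result)
# ['152**', '152**', '152**', '021**', '021**', None]
```

The sketch stops at the first generalization level where the number of rare values fits within the suppression budget, which shows the trade-off the abstract refers to: more generalization or suppression gives more protection but distorts the data further.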


Spring 2001 LIDAP [LIDAP@lab.privacy.cs.cmu.edu]