Carnegie Mellon University

Data Privacy Laboratory

Laboratory for International Data Privacy


All the Data on All the People



L. Sweeney. All the Data on All the People, LIDAP-WP3. Carnegie Mellon University, Laboratory for International Data Privacy, Pittsburgh, PA: 2000. [Members only: Full paper in PDF 103 pages, 843KB]

Abstract

In this working paper, I examine the tremendous growth in the information being collected on individuals and relate this growth to the availability of inexpensive computers with large storage capacities. Given this relationship, the trend of collecting ever more information is expected to continue. From the examples provided in this working paper, it is clear that many details in the lives of most people are being documented in databases somewhere and that there exist few operational barriers to restrict the sharing of collected information. I present a formal model for characterizing real-world data sharing policies and define privacy and risk metrics to compare policies. I then apply these metrics to the real-world practices of sharing hospital discharge data. Findings include: (1) 25 of the 44 states that collect hospital discharge data share the information on a public or semi-public basis; (2) the number of people eligible to receive a copy of the data is greater than the number of people whose information is contained in the data; and (3) publicly available data tends to be overly distorted, so copies of the more sensitive, semi-publicly available data are more commonly distributed. Having so much sensitive information available makes it even more difficult for other organizations to release information that is effectively anonymous. Scientific contributions from this working paper include a mathematical model for comparing and assessing data sharing policies.


Spring 2001 LIDAP [LIDAP@lab.privacy.cs.cmu.edu]