Research Activities

The primary goal of Data Privacy Lab research is to create architectural, algorithmic and technological foundations for maintaining the privacy of individuals, the confidentiality of organizations, and the protection of sensitive information, given the requirement that information be released publicly or semi-publicly. We seek to invent balanced approaches that integrate technology and policy in order to satisfy society's need for data while protecting society's need for privacy. We will develop these new approaches by studying problems and specific data collections contributed by our industrial and government partners, and we will make our results immediately available to Data Privacy Lab partners. Thus, our partners will have access to new methods and findings long before they become commercially or publicly available.

Data Privacy Lab research can be viewed either in terms of the basic scientific issues to be addressed, or in terms of specific data and applications. The exact list of applications and specific data collections examined will be determined in large part by the needs of our industrial and government partners. The list of basic research topics is therefore based on those needs and on faculty research interests and expertise. The most important scientific and policy issues will have significant impact across many different application areas, which allows the Data Privacy Lab to spread the cost of this basic research over multiple problem domains and multiple funding sources. Below is a sample of current research activities in the Data Privacy Lab.

For a sample of projects currently underway, see Data Privacy Lab current projects.

  • Disclosure control techniques and systems. (k-anonymity, Datafly)
    The Data Privacy Lab already has the leading systems and algorithms for rendering data sufficiently anonymous. However, exploring new techniques, protection models, and anonymous data systems remains an ongoing research activity, as does applying known systems and models to different kinds of data, such as genetic and DNA data, GIS data, and video surveillance images.

  • Distributed Privacy.
    The goal of this work is to provide the architectural means to electronically coordinate information from vast numbers of distributed, autonomous data holders so that intended release and declassification policies can be collectively enforced, even when related inferences have not been explicitly stated. One example is the automated surveillance of data in order to detect bioterrorist attacks and naturally occurring outbreaks. Another is the automated construction of meta-level data systems.

  • Automated policy enforcement.
    The goal of this work is to automatically translate textually stated policies into enforceable software actions, even when related inferences are not explicitly stated. Examples include: sharing information internationally while respecting the European Union directive; collecting personal information over the Internet in one country (e.g. the United States) on citizens of another country (e.g. Canada); and sharing data based on the new HHS regulations.

  • Anonymity certification.
    The goal of this work is to automatically assess and report the identifiability of information contained within a given data set. One example is an automated system that determines how many people could be identified in a publicly available data set.

  • Privacy metrics.
    The goal of this work is to define and assess useful metrics for measuring the extent and character of privacy problems, risks and liabilities. Such metrics allow stated practices and proposed policies to be compared. While the Data Privacy Lab already has some effective metrics for computing the amount of information collected on individuals and for measuring privacy risk and liability, demonstrating the effectiveness of these metrics and exploring supporting metrics remains an ongoing research activity.

  • Privacy policy frameworks.
    The goal of this work is to explore and assess existing and proposed policies concerning data privacy within comprehensive frameworks. Examples include: examining the intent and usefulness of informed consent in the secondary sharing of medical data; devising dynamic on-line consent systems; and designing holistic models of data collection and sharing.

  • Anonymous linking.
    The goal of this work is to develop computational tools and practices to electronically link data sets and render the results sufficiently anonymous, even though the original data sets are themselves rendered sufficiently anonymous and originate from data holders that do not share information with each other. The idea is that data are linked on cryptographic equivalents of the fully identified data that are hidden within the sufficiently anonymous data. Results appear as visible values in a regular text-based flat file.
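
The anonymous linking idea in the last item above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Lab's actual method: the text does not say how the "cryptographic equivalents" are produced, so the sketch assumes a keyed hash (HMAC-SHA256) computed by each data holder with a secret key supplied by a trusted party. The key, the record contents, and the function names `token` and `link` are all hypothetical.

```python
import hashlib
import hmac

# Assumption: both data holders received the same secret key from a trusted
# party. The source does not specify how the cryptographic equivalents are made.
SHARED_KEY = b"hypothetical-shared-secret"


def token(identifier: str) -> str:
    """Return a keyed hash that stands in for an explicit identifier."""
    normalized = identifier.strip().lower()
    return hmac.new(SHARED_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link(left: dict, right: dict) -> list:
    """Join two token-keyed data sets and return tab-separated flat-file rows."""
    rows = []
    for tok in sorted(left.keys() & right.keys()):
        rows.append("\t".join([tok, left[tok], right[tok]]))
    return rows


# Hypothetical records: each holder keeps only the token and a released value.
hospital = {
    token("Alice Example"): "diagnosis=asthma",
    token("Bob Example"): "diagnosis=flu",
}
pharmacy = {
    token("alice example "): "drug=albuterol",
    token("Carol Example"): "drug=aspirin",
}

linked_rows = link(hospital, pharmacy)
for row in linked_rows:
    print(row)
```

Only the record whose identifier appears in both sets is linked, and the output row carries the token rather than the name. A keyed hash alone does not make the linked result sufficiently anonymous; the combined values would still need to be assessed for identifiability.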

Data Privacy Lab faculty research interests include many additional topics as well, such as visualization of data collections and privacy problems, re-identification experiments, linking and profiling techniques, and privacy issues specific to the Internet.
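
To make the disclosure control and anonymity certification items above more concrete, the following sketch measures k-anonymity on a toy table. A release satisfies k-anonymity when every record is indistinguishable from at least k-1 others on its quasi-identifiers. The sketch also applies generalization of ZIP codes, one common way of reaching a target k; it is not a rendering of Datafly or of any Lab system. The records, the choice of quasi-identifiers, and all function names are hypothetical.

```python
from collections import Counter

# Assumption: these three fields are the quasi-identifiers for this toy table.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group(records, quasi_identifiers):
    """Return the largest k for which the records are k-anonymous (0 if empty)."""
    sizes = class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def records_below_k(records, quasi_identifiers, k):
    """Return how many records fall in a group with fewer than k members."""
    sizes = class_sizes(records, quasi_identifiers)
    return sum(n for n in sizes.values() if n < k)


def generalize_zip(records, digits):
    """Return copies of the records with ZIP codes truncated to `digits` digits."""
    return [
        {**r, "zip": r["zip"][:digits] + "*" * (len(r["zip"]) - digits)}
        for r in records
    ]


# Hypothetical released records.
records = [
    {"zip": "15213", "birth_year": 1970, "sex": "F"},
    {"zip": "15213", "birth_year": 1970, "sex": "F"},
    {"zip": "15213", "birth_year": 1982, "sex": "M"},
    {"zip": "15217", "birth_year": 1982, "sex": "M"},
    {"zip": "15217", "birth_year": 1982, "sex": "M"},
]

print(smallest_group(records, QUASI_IDENTIFIERS))      # 1
print(records_below_k(records, QUASI_IDENTIFIERS, 2))  # 1

coarse = generalize_zip(records, 3)
print(smallest_group(coarse, QUASI_IDENTIFIERS))       # 2
```

In the original table one record is unique on its quasi-identifiers, so the table is only 1-anonymous. Truncating ZIP codes to three digits merges that record into a group of three, and the table becomes 2-anonymous at the cost of less precise geography.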


Summer 2003 Data Privacy Laboratory [LIDAP@privacy.cs.cmu.edu]