Volume 2, Issue 13
Privacy in the Information Age
These days, we know better than to reveal too much personal information online: holding back helps protect us against identity theft and preserves our individual privacy. But how much information is too much?
Image credit: ThinkStock.
Simple demographic information (such as gender, birthdate or race) is commonly linked to identity (people’s names) in public records such as voter registration databases or birth records. This means that other, seemingly anonymous records containing private information about individuals, such as medical histories, may be traceable back to an individual even if his or her name and Social Security number have been removed from the data. This creates a problem when access to data about individuals is required but privacy must be protected, for example, in datasets necessary for financial accounting or scientific research.
Data is made more anonymous by altering quasi-identifiers: pieces of data, such as gender or birthdate (day, month and year of birth), that can be used to identify an individual. This is done by generalizing (making less specific), suppressing (removing), or distorting (changing) pieces of information. Such alterations trade privacy against the precision, completeness or accuracy of the data, respectively. How can we ensure that data is useful and minimally distorted while protecting the privacy of individuals?
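To make generalization and suppression concrete, here is a minimal Python sketch. The record layout, the field names and the helper functions are all hypothetical, chosen only for illustration; they are not taken from any particular tool.

```python
from datetime import date

# Hypothetical record with three quasi-identifiers (made-up values).
record = {"gender": "F", "birthdate": date(1975, 3, 14), "zip": "02138"}

def generalize_birthdate(d):
    """Generalize a full birthdate to the year of birth only."""
    return str(d.year)

def generalize_zip(z, keep=3):
    """Keep the first `keep` digits of a ZIP code and mask the rest."""
    return z[:keep] + "*" * (len(z) - keep)

def suppress(_value):
    """Suppress a value entirely by replacing it with a placeholder."""
    return "*"

anonymized = {
    "gender": suppress(record["gender"]),
    "birthdate": generalize_birthdate(record["birthdate"]),
    "zip": generalize_zip(record["zip"]),
}
print(anonymized)
```

The altered record is less precise (year instead of full birthdate, partial ZIP code) and less complete (no gender), which is exactly the trade-off described above.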
Simple demographics often identify people uniquely. Source: Latanya Sweeney, Harvard University.
Latanya Sweeney, a computer scientist at Harvard University, decided to tackle this problem. She found that just a few pieces of simple demographic data are often enough to identify a specific individual. For example, 87 percent of Americans are uniquely identifiable by their gender, birthdate and ZIP code!
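The idea of being "uniquely identifiable" can be sketched in a few lines of Python: count how many records share each combination of quasi-identifiers, then see what share of records have a combination all to themselves. The function name and the four records below are made up for illustration and have nothing to do with the real census data behind the 87 percent figure.

```python
from collections import Counter

def fraction_unique(records, quasi_identifiers):
    """Return the share of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)

# Made-up sample data: the first two people share all three values.
people = [
    {"gender": "F", "birthdate": "1975-03-14", "zip": "02138"},
    {"gender": "F", "birthdate": "1975-03-14", "zip": "02138"},
    {"gender": "M", "birthdate": "1980-07-01", "zip": "02139"},
    {"gender": "F", "birthdate": "1962-11-30", "zip": "02138"},
]

share = fraction_unique(people, ["gender", "birthdate", "zip"])
print(f"{share:.0%} of these records are unique")
```

In this toy table, two of the four people could be singled out from gender, birthdate and ZIP code alone.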
As a solution, Professor Sweeney created a computer algorithm to optimize the generalization and suppression of quasi-identifiers to ensure a minimum level of anonymity, called k-anonymity. For a dataset to meet the desired k-anonymity standard, the quasi-identifiers of every record must be identical to (and thus indistinguishable from) those of at least k − 1 other records, where k is a user-defined parameter. This algorithm, the Preferred Minimal Generalization Algorithm (MinGen for short), provides k-anonymity protection with minimal distortion of data.
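The k-anonymity condition itself is simple to check, as the sketch below suggests. It only tests whether a table meets the standard for a given k; it is not MinGen and makes no attempt to find the least-distorting generalization. The function name and the sample records are assumptions made for illustration.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in counts.values())

# Made-up records: every combination of birthdate and ZIP code is unique.
raw = [
    {"birth": "1975-03-14", "zip": "02138"},
    {"birth": "1975-09-02", "zip": "02139"},
    {"birth": "1980-07-01", "zip": "02141"},
    {"birth": "1980-01-20", "zip": "02142"},
]

# Generalize by hand: keep only the birth year and the first three ZIP digits.
generalized = [
    {"birth": r["birth"][:4], "zip": r["zip"][:3] + "**"} for r in raw
]

print(is_k_anonymous(raw, ["birth", "zip"], 2))          # raw table
print(is_k_anonymous(generalized, ["birth", "zip"], 2))  # generalized table
```

After generalization, each record shares its quasi-identifiers with one other record, so the table is 2-anonymous, though it would not pass with k = 3.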
Image of Latanya Sweeney.
Who thinks of this stuff? Latanya Sweeney is the head of the Data Privacy Lab at Harvard University, where she solves real-world problems through research in computer science and public policy. Dr. Sweeney has created a variety of computational tools to protect individual privacy, including facial de-identification software and surveillance technology that operates with a customizable level of identifiability. She is also the creator of Scrub, a program that successfully identifies and replaces 99-100% of personally identifiable information about patients contained in notes and letters shared between physicians without inhibiting effective consultation on patient care. She has testified on re-identifiability of data to the Department of Homeland Security, the Department of Defense and the United States Senate. When she’s not working to find new ways to protect online privacy or prevent identity theft, you might spot Dr. Sweeney riding her motorcycle around Cambridge, Massachusetts.
Read more about Latanya Sweeney’s computer science and policy research, including k-anonymity, on her website (http://latanyasweeney.org/) and on the website for Harvard’s Data Privacy Lab (http://dataprivacylab.org/).
Check out Dr. Sweeney's most recent work on Discrimination in Online Ad Delivery at: http://dataprivacylab.org/projects/onlineads/.
Watch for a new interactive website (aboutmyinfo.org) from the Data Privacy Lab that will tell you how many people match your characteristics after you enter some basic demographic information.
In honor of Women’s History Month, read more about women in computer science at the Anita Borg Institute (http://anitaborg.org/news/profiles-of-technical-women/famous-women-in-computer-science/; http://anitaborg.org/news/archive/senior-technical-women-profiles-of-success/) and at the National Center for Women & Information Technology (https://www.ncwit.org/itnews).