Deconstructing Data Mining: Protecting Privacy and Civil Liberties in Automated Decision-Making

Lindsey Barrett

"Big Data" is the bogeyman of the information age: powerful, and as ill-defined as it is abstractly threatening. Broadly, it encompasses "technology that maximize[s] computational power and algorithmic accuracy"; "types of analyses that draw on a range of tools to clean and compare data"; and the underlying belief in the correlation between the size of the data set and its ability to produce increasingly accurate and nuanced insights. Put another way, "'Big data' [is] the amassing of huge amounts of statistical information on social and economic trends and human behavior." The belief in the prescient value of big data has led to widespread collection of information on citizens and consumers in both the public and private sectors, though that distinction has become increasingly permeable. Data brokers, companies that create and sell detailed profiles of consumers for profit, sell their products to private and public entities alike, and the contracts governing those sales often lack data quality control clauses. These profiles also often refer directly or indirectly to sensitive attributes, such as race, gender, age, and socioeconomic status. This brave new world of big data is no longer new. But the mechanics of the algorithms relying on that data, and the process by which decisions are made using that information, merit a sharpened focus. Algorithmic decision-making is increasingly replacing existing practices in both the public and private sectors, making an understanding of the technical construction of those algorithms increasingly crucial. This is all the more true for processes in which the consumer or citizen does not have a voice, and the logic behind the decision is fundamentally opaque. It is difficult, if not impossible, for that consumer or citizen to challenge an adverse decision made about her when the basis for the decision is unavailable.
In the private sector, automated predictions are used to calculate loan rates, credit scores, and insurance risk, as well as in employment evaluations and hiring searches. In the public sector, they are being used for risk prediction in law enforcement, as well as for sentencing and to calculate benefits. Further, there is a pervasive and misguided belief in the inherent neutrality of algorithmic decision-making by virtue of its empiricism. But data is not inherently neutral, and neither are the algorithms that process it. Each is the product of the beliefs, fallibilities, and biases of the person who created it. If those fallibilities are unaccounted for, algorithms will simply replicate the pre-existing inequalities encoded in their intake data and structure. This memorandum will provide an overview of the basics of algorithms and data mining, and explore how automated decision-making can unintentionally reveal sensitive information, or unintentionally base its predictions on protected traits, implicating individual privacy and civil liberties.
