Reidentification Risk in Panel Data: Protecting for k-Anonymity
Journal article   Peer reviewed

Shaobo Li, Matthew J. Schneider, Yan Yu and Sachin Gupta
Information Systems Research, vol. 34(3), pp. 1066-1088
01 Sep 2023

Abstract

Business & Economics; Information Science & Library Science; Management; Science & Technology; Social Sciences; Technology
We consider the risk of reidentification of panelists in marketing research data that are widely used to obtain insights into buyer behavior and to develop marketing strategy. We find that 17%-94% of the panelists in 15 frequently bought consumer goods categories are subject to high risk of reidentification through a potential record linkage attack based on their unique purchasing histories even when their identities are anonymized. We first demonstrate that the risk of reidentification is vastly understated by unicity, the conventional measure. Instead, we propose a new measure of reidentification risk, termed sno-unicity, which accounts for the longitudinal nature of panel data, and show that it is much larger than unicity. To protect the privacy of panelists, we consider the well-known privacy notion of k-anonymity and develop a new approach called graph-based minimum movement k-anonymization (k-MM) that is designed especially for panel data. The proposed k-MM approach can be formulated as an optimization problem in which the objective is to minimally distort variables in the original data based on weights that users prespecify corresponding to their use case. We further show how our approach can be extended to achieve l-diversity. We apply the k-MM approach to two different panel data sets that are widely used in marketing research. To achieve a given privacy level, compared with several benchmark protection methods, the protected data from our method result in the least distortion in inferences about key marketing metrics, such as brand market shares, share of category requirements, brand switching rates, and marketing-mix parameters estimated from a hierarchical Bayesian brand choice model.
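
As a rough illustration of two notions named in the abstract, the Python sketch below computes unicity and checks k-anonymity on a toy panel. It is not the authors' code and does not implement sno-unicity or k-MM. It assumes unicity means the share of panelists whose entire purchase history is unique in the data, and that the full history serves as the quasi-identifier; the paper's precise definitions may differ. The names `toy_panel`, `history_counts`, `unicity`, and `is_k_anonymous` are hypothetical.

```python
from collections import Counter

def history_counts(panel):
    """Count how many panelists share each exact purchase history."""
    return Counter(tuple(history) for history in panel.values())

def unicity(panel):
    """Share of panelists whose full purchase history is unique (assumed definition)."""
    counts = history_counts(panel)
    unique = sum(1 for history in panel.values() if counts[tuple(history)] == 1)
    return unique / len(panel)

def is_k_anonymous(panel, k):
    """True if every purchase history is shared by at least k panelists."""
    return all(count >= k for count in history_counts(panel).values())

# Hypothetical toy data: panelist id -> sequence of brands purchased.
toy_panel = {
    "p1": ["A", "A", "B"],
    "p2": ["A", "A", "B"],
    "p3": ["B", "C", "A"],
    "p4": ["C", "C", "C"],
}

print(unicity(toy_panel))            # p3 and p4 are unique: 2 of 4 panelists
print(is_k_anonymous(toy_panel, 2))  # False, because p3 and p4 stand alone
```

In this toy panel, making the data 2-anonymous would require altering the histories of p3 and p4 so that each matches at least one other panelist. The abstract describes k-MM as choosing such alterations to minimize weighted distortion.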

Metrics

18 Record Views
4 citations in Scopus

Details

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
Web of Science research areas
Information Science & Library Science
Management