Mathematical Methods In Social Sciences; Mathematics; Physical Sciences; Science & Technology; Social Sciences; Social Sciences, Mathematical Methods; Statistics & Probability
Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between protection of confidentiality and quality of inference. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The US Census Bureau collects millions of interrelated time series microdata that are hierarchical and contain many zeros and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian generalized linear mixed models with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the magnitudes or number of entities. We find that, as the prior distributions of the variance components in the Bayesian generalized linear mixed model become more precise towards zero, protection of confidentiality increases and the quality of inference deteriorates. We evaluate our methodology by using two measures: empirical differential privacy, a strict privacy measure, and the probability of range identification, a newly defined risk measure that directly measures attribute disclosure risk. We illustrate our results with the US Census Bureau's quarterly workforce indicators.
A new method for protecting interrelated time series with Bayesian prior distributions and synthetic data
Creators
Matthew J. Schneider - Cornell University
John M. Abowd - Cornell University
Publication Details
Journal of the Royal Statistical Society. Series A, Statistics in Society, v. 178(4), pp. 963-975
Publisher
Wiley
Number of pages
13
Grant note
1131848 / Divn Of Social and Economic Sciences; National Science Foundation (NSF); NSF - Directorate for Social, Behavioral & Economic Sciences (SBE)
BCS 0941226; SES 9978093; ITR 0427889; SES 0922005; SES 1131848 / National Science Foundation; National Science Foundation (NSF)
Resource Type
Journal article
Language
English
Academic Unit
Decision Sciences (and Management Information Systems)
Web of Science ID
WOS:000365391100010
Scopus ID
2-s2.0-84942363747
Other Identifier
991021852204704721