Generating Poisson-distributed differentially private synthetic data
Journal article, open access

Harrison Quick
Journal of the Royal Statistical Society, Series A: Statistics in Society, vol. 184, no. 3, pp. 1093-1108
01 Jul 2021
URL: http://arxiv.org/abs/1906.00455

Abstract

The dissemination of synthetic data can be an effective means of making information from sensitive data publicly available with a reduced risk of disclosure. While mechanisms exist for synthesizing data that satisfy formal privacy guarantees, these mechanisms do not typically resemble the models an end-user might use to analyse the data. More recently, the use of methods from the disease mapping literature has been proposed to generate spatially referenced synthetic data with high utility but without formal privacy guarantees. The objective of this paper is to help bridge the gap between the disease mapping and the differential privacy literatures. In particular, we generalize an approach for generating differentially private synthetic data currently used by the US Census Bureau to the case of Poisson-distributed count data in a way that accommodates heterogeneity in population sizes and allows for the infusion of prior information regarding the underlying event rates. Following a pair of small simulation studies, we illustrate the utility of the synthetic data produced by this approach using publicly available, county-level heart disease-related death counts. This study demonstrates the benefits of the proposed approach's flexibility with respect to heterogeneity in population sizes and event rates while motivating further research to improve its utility.
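The abstract does not spell out the mechanism itself. As a rough illustration only, a conjugate Poisson-gamma model is one natural way to synthesize Poisson counts while accounting for differing population sizes and a prior on the event rate: assume y_i ~ Poisson(n_i λ_i) with λ_i ~ Gamma(a, b) (shape a, rate b), update to λ_i | y_i ~ Gamma(a + y_i, b + n_i), then draw a synthetic count z_i ~ Poisson(n_i λ_i*) from that posterior. This is a minimal sketch under those assumptions, not the paper's exact mechanism; the function name `synthesize_poisson_gamma`, the prior parameters `a` and `b`, and the example inputs are hypothetical, and the sketch makes no differential privacy guarantee, since the conditions on the prior that such a guarantee would require are not reproduced here.

```python
import numpy as np


def synthesize_poisson_gamma(y, n, a, b, rng=None):
    """Draw one synthetic count per area from a Poisson-gamma posterior predictive.

    Hypothetical illustration, not the paper's mechanism; no privacy guarantee.
    y    : observed event counts per area
    n    : population (exposure) per area
    a, b : gamma prior shape and rate for the per-person event rate
           (scalars or per-area arrays)
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    # Conjugate update: lambda_i | y_i ~ Gamma(shape = a + y_i, rate = b + n_i)
    shape = a + y
    rate = b + n
    # NumPy parameterizes the gamma distribution by scale = 1 / rate
    lam = rng.gamma(shape, 1.0 / rate)
    # Synthetic counts: z_i ~ Poisson(n_i * lambda_i)
    return rng.poisson(n * lam)


# Example with made-up counts and populations; prior mean rate a / b = 0.002
rng = np.random.default_rng(42)
z = synthesize_poisson_gamma([3, 12, 150], [1_000, 5_000, 60_000], a=2.0, b=1_000.0, rng=rng)
print(z)
```

Because the posterior rate is shrunk toward the prior mean a / b, areas with small populations are pulled more strongly toward the prior, while large areas stay close to their observed rate y_i / n_i.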

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas
Social Sciences, Mathematical Methods
Statistics & Probability