Using spatiotemporal models to generate synthetic data for public use
Journal article   Peer reviewed

Harrison Quick and Lance A. Waller
Spatial and Spatio-temporal Epidemiology, vol. 27
01 Nov 2018
PMID: 30409375

Abstract

Life Sciences & Biomedicine; Public, Environmental & Occupational Health; Science & Technology
When agencies release public-use data, they must be cognizant of the potential risk of disclosure associated with making their data publicly available. This issue is particularly pertinent in disease mapping, where small counts pose both inferential challenges and potential disclosure risks. While the small area estimation, disease mapping, and statistical disclosure limitation literatures are individually robust, there have been few intersections between them. Here, we formally propose the use of spatiotemporal data analysis methods to generate synthetic data for public use. Specifically, we analyze ten years of county-level heart disease death counts for multiple age-groups using a Bayesian model that accounts for dependence spatially, temporally, and between age-groups; generating synthetic data from the resulting posterior predictive distribution will preserve these dependencies. After demonstrating the synthetic data's privacy-preserving features, we illustrate their utility by comparing estimates of urban/rural disparities from the synthetic data to those from data with small counts suppressed. (C) 2018 Elsevier Ltd. All rights reserved.
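
The core idea in the abstract, drawing synthetic counts from a posterior predictive distribution, can be illustrated with a much simpler model than the one the authors use. The sketch below is an assumption-laden stand-in: it fits an independent Poisson-gamma model to each area, with no spatial, temporal, or between-age-group dependence, so it does not reproduce the article's model. The function name `synthesize_counts`, the prior parameters, and the example inputs are all hypothetical.

```python
import numpy as np

def synthesize_counts(deaths, population, prior_shape=1.0, prior_rate=1000.0, seed=0):
    """Draw one synthetic count per area from a Poisson-gamma posterior predictive.

    Assumed model (illustrative only, not the article's):
        rate_i   ~ Gamma(prior_shape, prior_rate)
        deaths_i ~ Poisson(population_i * rate_i)
    """
    rng = np.random.default_rng(seed)
    deaths = np.asarray(deaths, dtype=float)
    population = np.asarray(population, dtype=float)

    # Conjugate update: the posterior for each rate is again a gamma distribution.
    post_shape = prior_shape + deaths
    post_rate = prior_rate + population

    # Posterior predictive draw: sample a rate, then a count given that rate.
    rates = rng.gamma(shape=post_shape, scale=1.0 / post_rate)
    return rng.poisson(population * rates)

# Hypothetical inputs: four areas with made-up death counts and populations.
synthetic = synthesize_counts([0, 3, 12, 150], [500, 2000, 10000, 100000], seed=1)
print(synthetic)
```

Because each synthetic count is a random draw rather than the observed value, small observed counts are not released directly; a model with spatial and temporal structure, as described in the abstract, would additionally let sparse areas borrow strength from their neighbors.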

Metrics

8 citations in Scopus

Details

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
Web of Science research areas
Public, Environmental & Occupational Health