Journal article
Using imputation to provide location information for nongeocoded addresses
PloS one, v 5(2), pp e8998-e8998
10 Feb 2010
PMID: 20161766
Featured in Collection: UN Sustainable Development Goals @ Drexel
Abstract
The importance of geography as a source of variation in health research continues to receive sustained attention in the literature. The inclusion of geographic information in such research often begins by adding data to a map, a step that presupposes some knowledge of location. A precise level of spatial information is conventionally achieved through geocoding, the geographic information system (GIS) process of translating mailing address information to coordinates on a map. The geocoding process is not without its limitations, though, since there is always a percentage of addresses that cannot be converted successfully (nongeocodable). This raises concerns regarding bias, since the traditional practice has been to exclude nongeocoded data records from analysis.
In this manuscript we develop and evaluate a set of imputation strategies for dealing with missing spatial information from nongeocoded addresses. The strategies are developed assuming a known zip code with increasing use of collateral information, namely the spatial distribution of the population at risk. Strategies are evaluated using prostate cancer data obtained from the Maryland Cancer Registry. We consider total case enumerations at the Census county, tract, and block group level as the outcome of interest when applying and evaluating the methods. Multiple imputation is used to provide estimated total case counts based on complete data (geocodes plus imputed nongeocodes) with a measure of uncertainty. Results indicate that the imputation strategy based on using available population-based age, gender, and race information performed the best overall at the county, tract, and block group levels.
The procedure allows the potentially biased and likely underreported outcome (case enumerations based on only the geocoded records) to be presented alongside a statistically adjusted count (the imputed count) and a measure of uncertainty, both of which are based on all the case data: the geocodes and the imputed nongeocodes. Similar strategies can be applied in other analysis settings.
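As a rough illustration of the general idea in the abstract, the sketch below allocates each nongeocoded case to a census area within its known zip code, with probability proportional to a population-at-risk weight, and repeats the allocation several times so the imputed counts carry a measure of spread. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names `impute_counts` and `summarize`, the input layout, and the use of the between-imputation variance as the uncertainty measure are all hypothetical choices made for this example.

```python
import random
from statistics import mean, variance


def impute_counts(geocoded_counts, nongeocoded_by_zip, zip_area_weights,
                  n_imputations=20, seed=0):
    """Return one completed {area: count} dict per imputation.

    geocoded_counts:    {area: number of geocoded cases}
    nongeocoded_by_zip: {zip_code: number of nongeocoded cases}
    zip_area_weights:   {zip_code: {area: population-at-risk weight}}
    All three layouts are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_imputations):
        counts = dict(geocoded_counts)  # start from the geocoded cases
        for zip_code, n_cases in nongeocoded_by_zip.items():
            areas = list(zip_area_weights[zip_code])
            weights = [zip_area_weights[zip_code][a] for a in areas]
            # Place each nongeocoded case in an area of its zip code,
            # with probability proportional to the population weight.
            for area in rng.choices(areas, weights=weights, k=n_cases):
                counts[area] = counts.get(area, 0) + 1
        draws.append(counts)
    return draws


def summarize(draws):
    """Per area: (mean imputed count, between-imputation variance)."""
    areas = sorted({a for d in draws for a in d})
    summary = {}
    for area in areas:
        values = [d.get(area, 0) for d in draws]
        spread = variance(values) if len(values) > 1 else 0.0
        summary[area] = (mean(values), spread)
    return summary
```

Calling `summarize(impute_counts(...))` yields, for each area, an adjusted count that can be reported next to the geocoded-only count. Weights built from age, gender, and race specific population counts would correspond more closely to the strategy the abstract reports as performing best, but how those weights are constructed is not specified here.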
Details
- Title
- Using imputation to provide location information for nongeocoded addresses
- Creators
- Frank C Curriero - Department of Environmental Health Sciences and Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America. fcurrier@jhsph.edu
- Martin Kulldorff
- Francis P Boscoe
- Ann C Klassen
- Publication Details
- PloS one, v 5(2), pp e8998-e8998
- Publisher
- Public Library of Science (PLOS); United States
- Grant note
- R21 CA124921 / NCI NIH HHS
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Community Health and Prevention
- Web of Science ID
- WOS:000274442700001
- Scopus ID
- 2-s2.0-77949376259
- Other Identifier
- 991014878407104721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- Web of Science research areas
- Public, Environmental & Occupational Health