Evaluating the quality of the 1000 genomes project data
Saurabh Belsare, Michal Levy-Sakin, Yulia Mostovoy, Steffen Durinck, Subhra Chaudhuri, Ming Xiao, Andrew S. Peterson, Pui-Yan Kwok, Somasekar Seshagiri and Jeffrey D. Wall
Background
Data from the 1000 Genomes project is often used as a reference for human genomic analysis. However, its accuracy needs to be assessed to understand the quality of predictions made using this reference. We present here an assessment of the genotyping, phasing, and imputation accuracy of the data in the 1000 Genomes project. We compare the phased haplotype calls from the 1000 Genomes project to experimentally phased haplotypes for 28 of the same individuals sequenced using the 10X Genomics platform.
Results
We observe that phasing and imputation for rare variants are unreliable, which likely reflects the limited sample size of the 1000 Genomes project data. Further, it appears that using a population-specific reference panel does not improve the accuracy of imputation over using the entire 1000 Genomes data set as a reference panel. We also note that the error rates and trends depend on the choice of definition of error, and hence any error reporting needs to take these definitions into account.
Conclusions
The quality of the 1000 Genomes data needs to be considered when using this database for further studies. This work presents an analysis that can be used for these assessments.
Creators
Saurabh Belsare - University of California, San Francisco
Michal Levy-Sakin - University of California, San Francisco
Yulia Mostovoy - University of California, San Francisco
Steffen Durinck - Genentech
Subhra Chaudhuri - Genentech
Ming Xiao - Drexel University
Andrew S. Peterson - Genentech
Pui-Yan Kwok - University of California, San Francisco
Somasekar Seshagiri - Genentech
Jeffrey D. Wall - University of California, San Francisco
Publication Details
BMC Genomics, v 20(1), p 620
Publisher
Springer Nature
Number of pages
14
Grant note
CA0095684 / Genentech research grant
R01GM115433 / NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES; United States Department of Health & Human Services; National Institutes of Health (NIH) - USA; NIH National Institute of General Medical Sciences (NIGMS)
R01 GM115433 / NIH; United States Department of Health & Human Services; National Institutes of Health (NIH) - USA
Resource Type
Journal article
Language
English
Academic Unit
School of Biomedical Engineering, Science, and Health Systems
Web of Science ID
WOS:000481741100001
Scopus ID
2-s2.0-85071046433
Other Identifier
991019168102904721