Journal article
Characterizing the Empirical Distribution of Prokaryotic Genome n-mers in the Presence of Nullomers
Journal of computational biology, v 21(10), pp 732-740
01 Oct 2014
PMID: 25075627
Abstract
Characterizing the empirical distribution of the frequency of n-mers is a vital step in understanding the entire genome. This will allow for researchers to examine how complex the genome really is, and move beyond simple, traditional modeling frameworks that are often biased in the presence of abundant and/or extremely rare words. We hypothesize that models based on the negative binomial distribution and its zero-inflated counterpart will characterize the n-mer distributions of genomes better than the Poisson. Our study examined the empirical distribution of the frequency of n-mers (6 ≤ n ≤ 11) in 2,199 genomes. We considered four distributions: Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial (ZINB). The number of genomes that have nullomers in 6-, 7-, and 8-mers was 150, 602 and 2,012, respectively, whereas all of the genomes for the 9-, 10-, and 11-mers had nullomers. In each n-mer considered, the negative binomial model performed the best for at least 93% of the 2,199 genomes; however, a small percentage (i.e., <7%) of the genomes did prefer the ZINB. The negative binomial and zero-inflation distributions extend the traditional Poisson setting and are more flexible in handling overdispersion that can be caused by an increase in nullomers. In an effort to characterize the distribution of the frequency of n-mers, researchers should also consider other discrete distributions that are more flexible and adjust for possible overdispersion.
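The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`nmer_counts`, `compare_models`), the use of AIC as the comparison criterion, and the coarse grid search over the negative binomial size parameter `r` are all assumptions made for illustration, and the zero-inflated models are omitted. The sketch counts every possible n-mer in a sequence (so that nullomers appear as zeros), then compares a Poisson fit with a negative binomial fit.

```python
import math
from collections import Counter
from itertools import product


def nmer_counts(seq, n):
    """Return the count of every one of the 4**n possible n-mers, zeros included.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    seen = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return [seen.get("".join(word), 0) for word in product("ACGT", repeat=n)]


def poisson_loglik(counts):
    """Poisson log-likelihood at the maximum-likelihood rate (the sample mean)."""
    lam = sum(counts) / len(counts)  # assumes at least one n-mer was observed
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def negbin_loglik(counts, r):
    """Negative binomial log-likelihood with mean fixed at the sample mean
    and size (dispersion) parameter r; smaller r means more overdispersion."""
    mu = sum(counts) / len(counts)
    p = r / (r + mu)
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1 - p)
        for k in counts
    )


def compare_models(counts):
    """Report the nullomer count and an AIC for each model (lower is better)."""
    # Assumption: a coarse log-spaced grid over r stands in for a proper optimizer.
    grid = [10 ** (e / 10) for e in range(-30, 41)]
    nb_ll = max(negbin_loglik(counts, r) for r in grid)
    return {
        "nullomers": counts.count(0),
        "aic_poisson": 2 * 1 - 2 * poisson_loglik(counts),
        "aic_negbin": 2 * 2 - 2 * nb_ll,
    }
```

Calling `compare_models(nmer_counts(genome, 8))` on a genome string would give the number of 8-mer nullomers alongside the two AIC values. When many n-mers are absent while others are abundant, the variance of the counts exceeds the mean, and the negative binomial fit would be expected to yield the lower AIC.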
Details
- Title
- Characterizing the Empirical Distribution of Prokaryotic Genome n-mers in the Presence of Nullomers
- Creators
- Loni Philip Tabb - Department of Epidemiology & Biostatistics, Drexel University, Philadelphia, Pennsylvania
- Wei Zhao - Division of Translational Medicine and Human Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
- Jingyu Huang - Department of Biometrics Services, Frontage Laboratories, Exton, Pennsylvania
- Gail L Rosen - Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania
- Publication Details
- Journal of computational biology, v 21(10), pp 732-740
- Publisher
- Mary Ann Liebert, Inc
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Electrical and Computer Engineering; Urban Health Collaborative; Epidemiology and Biostatistics
- Web of Science ID
- WOS:000342301600002
- Scopus ID
- 2-s2.0-84907915138
- Other Identifier
- 991014877665304721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- Web of Science research areas
- Biochemical Research Methods
- Biotechnology & Applied Microbiology
- Computer Science, Interdisciplinary Applications
- Mathematical & Computational Biology
- Statistics & Probability