Journal article
Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads
Journal of biomedicine & biotechnology, v 2011
2011
PMID: 21541181
Featured in Collection: UN Sustainable Development Goals @ Drexel
Abstract
High-throughput sequencing technologies enable metagenome profiling, the simultaneous sequencing of multiple microbial species present within an environmental sample. Since metagenomic data includes sequence fragments (“reads”) from organisms that are absent from any database, new algorithms must be developed for the identification and annotation of novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but composition-based methods have not been adapted. We develop a detection technique that can discriminate between “known” and “unknown” taxa, which can be used with composition-based methods, as well as a hybrid method. Unlike previous studies, we rigorously evaluate all algorithms for their ability to detect novel taxa. First, we show that the integration of a detector with a composition-based method performs significantly better than homology-based methods for the detection of novel species and genera, with best performance at finer taxonomic resolutions. Most importantly, we evaluate all the algorithms by introducing an “unknown” class and show that the modified version of PhymmBL has similar or better overall classification performance than the other modified algorithms, especially at the species level and for ultrashort reads. Finally, we evaluate the performance of several algorithms on a real acid mine drainage dataset.
Details
- Title
- Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads
- Creators
- Gail L Rosen - Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA
- Robi Polikar - Department of Electrical and Computer Engineering, Rowan University, Glassboro, NJ 08028, USA
- Diamantino A Caseiro - Spoken Language Systems Laboratory, Instituto Superior Técnico, 1049-001 Lisbon, Portugal
- Steven D Essinger - Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA
- Bahrad A Sokhansanj - School of Biomedical Engineering, Science, and Health Systems, Drexel University, Philadelphia, PA 19104, USA
- Publication Details
- Journal of biomedicine & biotechnology, v 2011
- Publisher
- Wiley
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Web of Science ID
- WOS:000298503800001
- Scopus ID
- 2-s2.0-79959326732
- Other Identifier
- 991014878107004721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- International collaboration
- Web of Science research areas
- Biotechnology & Applied Microbiology
- Medicine, Research & Experimental