Comprehensive benchmarking and ensemble approaches for metagenomic classifiers
Journal article   Open access   Peer reviewed

Alexa B R McIntyre, Rachid Ounit, Ebrahim Afshinnekoo, Robert J Prill, Elizabeth Hénaff, Noah Alexander, Samuel S Minot, David Danko, Jonathan Foox, Sofia Ahsanuddin, …
Genome Biology, vol. 18, no. 1, article 182
21 Sep 2017
PMID: 28934964
URL: https://doi.org/10.1186/s13059-017-1299-7
Published, Version of Record (VoR); CC BY V4.0; Open

Keywords: Benchmarking - methods; Benchmarking - standards; Contig Mapping - methods; Contig Mapping - standards; DNA Barcoding, Taxonomic - methods; DNA Barcoding, Taxonomic - standards; Humans; Metagenome; Microbiota; Phylogeny; Sequence Analysis, DNA - methods; Sequence Analysis, DNA - standards; Software

Abstract

One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
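
The abstract names abundance filtering, ensemble approaches, and tool intersection as ways to reduce false positives. The Python sketch below illustrates that general idea on made-up toy data only: it drops low-abundance calls, keeps species reported by at least a chosen number of tools, and scores the result with precision and recall against a known truth set. The function names, the 0.001 abundance threshold, the tool labels, and the species labels are all hypothetical and are not taken from the study; the authors' actual pipelines and thresholds are described in the full article.

```python
from collections import Counter


def filter_by_abundance(calls, min_abundance):
    """Drop species whose reported relative abundance is below min_abundance."""
    return {sp: ab for sp, ab in calls.items() if ab >= min_abundance}


def consensus_calls(per_tool_calls, min_tools):
    """Keep species reported by at least min_tools tools.

    min_tools == 1 is the union of all tools; min_tools == number of tools
    is a strict intersection; values in between act as a voting ensemble.
    """
    votes = Counter(sp for calls in per_tool_calls.values() for sp in calls)
    return {sp for sp, n in votes.items() if n >= min_tools}


def precision_recall(predicted, truth):
    """Precision and recall of a predicted species set (0.0 when a set is empty)."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy control: three species known to be present.
    truth = {"species_A", "species_B", "species_C"}
    # Hypothetical relative-abundance calls from three tools with different strategies.
    raw = {
        "kmer_tool": {"species_A": 0.50, "species_B": 0.30, "species_C": 0.15,
                      "species_X": 0.0004, "species_Y": 0.02},
        "alignment_tool": {"species_A": 0.55, "species_B": 0.25,
                           "species_C": 0.18, "species_Z": 0.0002},
        "marker_tool": {"species_A": 0.60, "species_B": 0.28, "species_W": 0.005},
    }
    # Hypothetical threshold: discard calls below 0.1% relative abundance.
    filtered = {tool: filter_by_abundance(c, 0.001) for tool, c in raw.items()}
    for label, k in [("any tool", 1), ("at least 2 of 3", 2), ("all 3 tools", 3)]:
        pred = consensus_calls(filtered, k)
        p, r = precision_recall(pred, truth)
        print(f"{label}: {sorted(pred)} precision={p:.2f} recall={r:.2f}")
```

In this toy example, raising the number of tools that must agree trades recall for precision: a two-of-three vote removes the single-tool false positives, while requiring all three tools also drops species_C, which only two tools report.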

Metrics

8 Record Views
216 citations in Scopus

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

Data related to this publication, from the InCites Benchmarking & Analytics tool:

Collaboration types
Industry collaboration
Domestic collaboration

Web of Science research areas
Biotechnology & Applied Microbiology
Genetics & Heredity