Journal article
MetaCurator: A hidden Markov model-based toolkit for extracting and curating sequences from taxonomically-informative genetic markers
METHODS IN ECOLOGY AND EVOLUTION, v 11(1)
Jan 2020
Featured in Collection: UN Sustainable Development Goals @ Drexel
Abstract
While metabarcoding and metagenomic approaches are increasingly popular, questions remain about how best to analyse and taxonomically characterize the sequence data produced by such methods. Due to a lack of software infrastructure, important reference sequence curation steps are often ignored. We present MetaCurator, a software package designed for automated reference sequence curation that is highly generalizable across markers and study systems. MetaCurator contains two signature tools. IterRazor utilizes profile hidden Markov models and an iterative search framework to exhaustively identify and extract the precise marker of interest from available references. DerepByTaxonomy dereplicates sequences using a taxonomically aware approach, removing duplicates only when they belong to the same taxon. This is important for highly conserved markers, such as plant rbcL and trnL, which often display no sequence divergence across taxa, even at the genus level. Using MetaCurator, we produced reference sequence databases for a popular arthropod COI marker as well as four plant barcoding markers: trnL, rbcL, ITS2 and trnH. In comparing these databases to those produced by recent and comparable studies, we show that the MetaCurator pipeline exhibits greater sensitivity during sequence extraction, especially for poorly conserved markers. Further, unlike in previous studies, database taxonomic richness did not decrease following sequence dereplication. MetaCurator is supported on OSX and Linux and is freely available under a GPL v3.0 license at . The reference databases produced in this work, and the commands used for curation, are available at .
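The taxonomically aware dereplication described in the abstract can be illustrated with a minimal sketch. This is not MetaCurator's DerepByTaxonomy code; it assumes each reference is a simple (ID, taxon label, sequence) tuple, treats only exact sequence matches as duplicates, and uses a hypothetical function name, `derep_by_taxon`. The actual tool may handle lineages and partial matches differently.

```python
from typing import Iterable, List, Tuple

# Assumed record layout for illustration: (sequence ID, taxon label, sequence)
Record = Tuple[str, str, str]


def derep_by_taxon(records: Iterable[Record]) -> List[Record]:
    """Keep the first record seen for each (taxon, sequence) pair.

    Identical sequences are dropped only when they share a taxon label,
    so a sequence shared by two taxa is retained once per taxon.
    """
    seen = set()
    kept = []
    for seq_id, taxon, seq in records:
        key = (taxon, seq.upper())  # case-insensitive sequence comparison
        if key not in seen:
            seen.add(key)
            kept.append((seq_id, taxon, seq))
    return kept


records = [
    ("r1", "Genus_a species_1", "ACGTACGT"),
    ("r2", "Genus_a species_1", "ACGTACGT"),  # same taxon, same sequence: dropped
    ("r3", "Genus_a species_2", "ACGTACGT"),  # different taxon, same sequence: kept
    ("r4", "Genus_a species_2", "ACGTTCGT"),  # distinct sequence: kept
]
kept = derep_by_taxon(records)
print([r[0] for r in kept])  # ['r1', 'r3', 'r4']
```

A taxonomy-blind dereplication would keep only r1 and r4 here, losing the record that documents species_2 for the shared sequence. That loss is the drop in taxonomic richness the abstract refers to.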
Details
- Title: MetaCurator: A hidden Markov model-based toolkit for extracting and curating sequences from taxonomically-informative genetic markers
- Publication Details: METHODS IN ECOLOGY AND EVOLUTION, v 11(1)
- Publisher: WILEY; HOBOKEN
- Number of pages: 0
- Grant note: Costco-Project Apis m. Honey Bee Biology Fellowship: NA; The Ohio State University
- Resource Type: Journal article
- Language: English
- Academic Unit: Drexel University
- Web of Science ID: WOS:000493044400001
- Scopus ID: 2-s2.0-85074851889
- Other Identifier: 991021860767704721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types: Domestic collaboration; International collaboration
- Web of Science research areas: Ecology