The Naïve Bayes Classifier ++ for Metagenomic Taxonomic Classification-Query Evaluation
Journal article   Open access   Peer reviewed

Haozhe Neil Duan, Gavin Hearne, Robi Polikar and Gail L Rosen
Bioinformatics (Oxford, England), v 41(1), btae743
19 Dec 2024
PMID: 39700412
URL: https://doi.org/10.1093/bioinformatics/btae743
Published, Version of Record (VoR); Open Access (License Unspecified)

Abstract

Keywords: taxonomic classification; metagenomics; machine learning
This study examines the query performance of the NBC ++ (Incremental Naive Bayes Classifier) program across variations in k-mer canonicality, k-mer size, database, and input sample size. We demonstrate that both NBC ++ and Kraken2 are influenced by database depth, with macro measures improving as depth increases. However, fully capturing the diversity of life, especially viruses, remains a challenge. NBC ++ can competitively profile the superkingdom content of metagenomic samples using a small training database. NBC ++ spends less time training and can use a fraction of the memory that Kraken2 requires, but at the cost of long query times. Major NBC ++ enhancements include support for canonical k-mer storage (leading to significant storage savings) and adaptable, optimized memory allocation that accelerates query analysis and enables the software to run on nearly any system. Additionally, the output now includes log-likelihood values for each training genome, providing users with valuable confidence information. Source code and Dockerfile are available at http://github.com/EESI/Naive_Bayes. Supplementary data are available at Bioinformatics online, and databases are available at Zenodo records #11657719 and #11643985.
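
Two ideas named in the abstract, canonical k-mer storage and per-genome log-likelihood output, can be illustrated with a minimal Python sketch. This is not the NBC ++ implementation: the function names, the add-one smoothing, and the toy sequences are assumptions chosen for illustration, and the real program works on large genome databases rather than in-memory strings. A canonical k-mer is taken here to be the lexicographically smaller of a k-mer and its reverse complement, so both strands map to one stored key.

```python
import math
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(seq: str, k: int, use_canonical: bool = True) -> Counter:
    """Count k-mers in seq, skipping windows with non-ACGT characters."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer) if use_canonical else kmer] += 1
    return counts


def vocabulary_size(k: int, use_canonical: bool = True) -> int:
    """Number of distinct (canonical) k-mers over the alphabet ACGT."""
    if not use_canonical:
        return 4 ** k
    # Only even-length k-mers can equal their own reverse complement.
    palindromes = 4 ** (k // 2) if k % 2 == 0 else 0
    return (4 ** k + palindromes) // 2


def log_likelihood(read_counts: Counter, genome_counts: Counter, vocab: int) -> float:
    """Multinomial Naive Bayes log-likelihood with add-one smoothing (assumed)."""
    total = sum(genome_counts.values())
    score = 0.0
    for kmer, n in read_counts.items():
        p = (genome_counts.get(kmer, 0) + 1) / (total + vocab)
        score += n * math.log(p)
    return score


def classify(read: str, genomes: dict, k: int):
    """Score a read against every training genome; return best label and all scores."""
    vocab = vocabulary_size(k)
    read_counts = count_kmers(read, k)
    scores = {
        label: log_likelihood(read_counts, count_kmers(seq, k), vocab)
        for label, seq in genomes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy training "genomes" (hypothetical sequences, not real data).
toy_genomes = {"A": "ACGTACGTACGT", "B": "GGGGCCCCGGGG"}
best_label, all_scores = classify("ACGTAC", toy_genomes, k=3)
```

Returning the full score dictionary rather than only the top label mirrors the point made in the abstract: the gap between the best and the next-best log-likelihood gives the user a rough sense of how confident a call is. Storing only canonical k-mers roughly halves the number of keys, which is the source of the storage savings the abstract mentions.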

Metrics

10 Record Views
1 citation in Scopus

Details

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types: Domestic collaboration
Web of Science research areas: Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability