The amount of metagenomic data is growing rapidly, while the computational methods for metagenome analysis are still in their infancy. It is important to develop novel statistical learning tools for predicting associations between bacterial communities and disease phenotypes and for detecting differentially abundant features. In this study, we present a novel statistical learning method for simultaneous association prediction and feature selection with metagenomic samples from two or more treatment populations on the basis of count data. We developed a linear-programming-based support vector machine with L-1 and joint L-1,L-infinity penalties for binary and multiclass classification with metagenomic count data (metalinprog). We evaluated the performance of our method on several real and simulated datasets. The proposed method can simultaneously identify features and predict classes from metagenomic count data.
Class Prediction and Feature Selection with Linear Optimization for Metagenomic Count Data
Creators
Zhenqiu Liu - University of Maryland Marlene and Stewart Greenebaum Cancer Center
Dechang Chen - Uniformed Services University of the Health Sciences
Li Sheng - Drexel University
Amy Y. Liu - Brown University
Publication Details
PLoS ONE, v 8(3), p e53253
Publisher
Public Library of Science
Number of pages
7
Grant note
1R03CA133899 / National Cancer Institute; United States Department of Health & Human Services; National Institutes of Health (NIH) - USA; NIH National Cancer Institute (NCI)
ADT-1220747; CCF-0729080 / National Science Foundation (NSF)
Resource Type
Journal article
Language
English
Academic Unit
Mathematics
Web of Science ID
WOS:000317418500001
Scopus ID
2-s2.0-84875440137
Other Identifier
991019168762704721