Journal article
Information-theoretic approaches to SVM feature selection for metagenome read classification
Computational biology and chemistry, v 35(3), pp 199-209
Jun 2011
PMID: 21704267
Featured in Collection: UN Sustainable Development Goals @ Drexel
Abstract
► We find that feature selection improves the performance of metagenomic taxonomic classification with SVMs.
► The information-theoretic mRMR method is the best feature selection method, especially at the phylum level.
► Feature selection with 9mers does not improve over feature selection with 6mers.
Analysis of DNA sequences isolated directly from the environment, known as metagenomics, produces a large quantity of genome fragments that need to be classified into specific taxa. Most composition-based classification methods use all features instead of a subset of features that may maximize classifier accuracy. We show that feature selection methods can boost the performance of taxonomic classifiers. This work proposes three filter-based feature selection methods that stem from information theory: (1) a technique that combines Kullback–Leibler divergence, mutual information, and distance information, (2) a text mining technique, TF-IDF, and (3) minimum-redundancy maximum-relevance (mRMR). The feature selection methods are compared by how well they improve support vector machine (SVM) classification of genomic reads. Overall, the 6mer mRMR method performs well, especially at the phylum level. If the total number of features is very large, feature selection becomes difficult because a small subset of features that captures a majority of the data variance is less likely to exist. Therefore, we conclude that there is a trade-off between feature set size and feature selection method to optimize classification performance. For larger feature set sizes, TF-IDF works better at finer taxonomic resolutions, while mRMR performs the best of any method for N=6 (6mers) at all taxonomic levels.
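The abstract singles out mRMR over k-mer frequencies as the strongest selector. As a rough illustration only, the sketch below encodes reads as k-mer count vectors and runs a greedy mRMR selection over them. It assumes the difference form of mRMR (relevance minus mean redundancy), plug-in mutual information computed directly on raw counts with no discretization step, and toy reads with hypothetical labels. The names `kmer_counts`, `mutual_information`, and `mrmr` are hypothetical and not taken from the paper, and the SVM classification stage is omitted.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_counts(seq: str, k: int) -> list[int]:
    """Return counts of all 4**k DNA k-mers in lexicographic order (AA..., AC..., ...)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k].upper())
        if j is not None:  # k-mers containing N or other ambiguity codes are skipped
            counts[j] += 1
    return counts


def mutual_information(x, y) -> float:
    """Plug-in estimate of I(X;Y) in bits from two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features: list[list[int]], labels: list, n_select: int) -> list[int]:
    """Greedy mRMR, difference form (an assumption, not necessarily the paper's variant):
    repeatedly pick the feature maximising I(f; class) minus the mean I(f; s)
    over the already-selected features s. Raw counts are treated as discrete values."""
    if n_select <= 0:
        return []
    n_feat = len(features[0])
    columns = [[row[j] for row in features] for j in range(n_feat)]
    relevance = [mutual_information(col, labels) for col in columns]
    # Start from the single most relevant feature.
    selected = [max(range(n_feat), key=lambda j: relevance[j])]
    while len(selected) < min(n_select, n_feat):
        best, best_score = -1, float("-inf")
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = sum(mutual_information(columns[j], columns[s]) for s in selected) / len(selected)
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy reads with hypothetical phylum labels, encoded as 2-mer counts (16 features).
    reads = ["ACGTACGTAA", "ACGTTTTTAA", "GGGCCCGGGC", "GGCCGGCCGA"]
    labels = ["phylumA", "phylumA", "phylumB", "phylumB"]
    X = [kmer_counts(r, 2) for r in reads]
    chosen = mrmr(X, labels, n_select=3)
    kmers = ["".join(p) for p in product("ACGT", repeat=2)]
    print([kmers[j] for j in chosen])
```

The difference form is used here because it is the simplest to read; a quotient form (relevance divided by redundancy) is another common mRMR variant. With 6mers the feature space grows to 4**6 = 4096 columns, so a practical version would vectorize the mutual information computation and would usually discretize or normalize counts before estimating it.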
Details
- Title
- Information-theoretic approaches to SVM feature selection for metagenome read classification
- Creators
- Elaine Garbarine - Electrical and Computer Engineering Department, Drexel University, 3141 Chestnut St., Philadelphia, PA 19104, USA
- Joseph DePasquale - Electrical and Computer Engineering Department, Rowan University, 201 Mullica Hill Rd., Glassboro, NJ 08028, USA
- Vinay Gadia - Electrical and Computer Engineering Department, Drexel University, 3141 Chestnut St., Philadelphia, PA 19104, USA
- Robi Polikar - Electrical and Computer Engineering Department, Rowan University, 201 Mullica Hill Rd., Glassboro, NJ 08028, USA
- Gail Rosen - Electrical and Computer Engineering Department, Drexel University, 3141 Chestnut St., Philadelphia, PA 19104, USA
- Publication Details
- Computational biology and chemistry, v 35(3), pp 199-209
- Publisher
- Elsevier
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Web of Science ID
- WOS:000293157900012
- Scopus ID
- 2-s2.0-79959749487
- Other Identifier
- 991014878357604721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- Web of Science research areas
- Biology
- Computer Science, Interdisciplinary Applications