Book chapter
Chapter 14 - Advances in Machine Learning for Processing and Comparison of Metagenomic Data
Computational Systems Biology, pp 295-329
2014
Abstract
Recent advances in next-generation sequencing have enabled high-throughput determination of biological sequences in microbial communities, also known as microbiomes. The resulting volume of data presents the challenge of how to extract knowledge (recognizing patterns, finding similarities, and finding relationships) from the complex mixtures of nucleic acid sequences now being examined. In this chapter we review basic concepts as well as state-of-the-art techniques for analyzing hundreds of samples, each containing millions of DNA and RNA sequences. We describe the general character of sequence data and some of the processing steps that prepare raw sequence data for inference. We then describe the process of extracting features from the data, assigning taxonomic and gene labels to the sequences. Next, we review methods for cross-sample comparison: (1) similarity measures and ordination techniques to visualize and measure differences between samples, and (2) feature selection and classification to identify the features most relevant for discriminating between samples. Finally, we outline some open research problems and challenges.
Details
- Title
- Chapter 14 - Advances in Machine Learning for Processing and Comparison of Metagenomic Data
- Creators
- Jean-Luc Bouchot - Drexel University; William L. Trimble - Argonne National Laboratory; Gregory Ditzler - Drexel University; Yemin Lan - Drexel University; Steve Essinger - Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA; Gail L Rosen - Drexel University, Electrical and Computer Engineering
- Publication Details
- Computational Systems Biology, pp 295-329
- Publisher
- Elsevier
- Edition
- Second Edition
- Resource Type
- Book chapter
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Scopus ID
- 2-s2.0-84902412171
- Other Identifier
- 991019173772404721