Neural Network-based Taxonomic Clustering for Metagenomics
Conference proceeding   Open access

Steven D. Essinger, Robi Polikar, Gail L. Rosen and IEEE
2010 International Joint Conference on Neural Networks (IJCNN 2010)
01 Jan 2010
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.387.2763

Abstract

Metagenomic studies inherently involve sampling genetic information from an environment potentially containing thousands of distinct microbial organisms. This genetic material is sequenced, producing many short fragments (<500 base pairs (bp)), each of which is tentatively a small representative of the DNA coding structure. Any fragment may belong to any of the organisms in the sample, but this relationship is not known a priori. Furthermore, most of these organisms have not been identified and are therefore not represented in any publicly available search database. Our goal is to predict the taxonomic classification of an organism from the fragments obtained from an environmental sample that may include many (some previously unidentified) organisms. To elucidate the diversity and composition of the sample, we first use a supervised naive Bayes classifier to score the fragments of known genomes, followed by unsupervised clustering to group fragments from similar organisms together. Each cluster can then be analyzed separately. This is challenging because we are interested not in similar sequences but in sequences that come from similar genomes, even though sequence composition is known to vary widely within a single genome. Our dataset represents an extremely challenging scenario: clustering fragments at the phylum level, where none of the phyla have been previously seen or identified. We present two variations of the proposed approach, one based on adaptive resonance theory (ART) and one based on K-means. We show that ART can cluster 500 bp fragments from 17 novel phyla with an overall isolation/grouping that is 10% better than K-means and nearly 7 times better than chance.
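The two-stage pipeline in the abstract can be illustrated with a toy sketch. In the first stage, a naive Bayes model scores each fragment against known genomes. In the second stage, the resulting score vectors are clustered without supervision. The sketch is not the authors' implementation, and it rests on our own assumptions:

- Each known genome is modeled as a Laplace-smoothed multinomial over 3-mers.
- Each fragment's log-likelihood vector is min-max scaled to [0, 1].
- Clustering uses a single-pass Fuzzy ART, one common ART variant for real-valued inputs, with complement coding.

All function names and parameter values (`k`, `rho`, `alpha`, `beta`) are hypothetical. A K-means baseline could be swapped in for `fuzzy_art` on the same vectors.

```python
import math
from collections import Counter

# Illustrative sketch only: helper names and parameters are hypothetical,
# not taken from the paper.


def kmers(seq, k):
    """Yield the overlapping k-mers of a DNA string."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def train_nb(genomes, k=3):
    """Fit one Laplace-smoothed multinomial k-mer model per known genome.

    Returns {name: (log_prob_by_kmer, log_prob_of_unseen_kmer)}.
    """
    vocab = 4 ** k  # number of possible k-mers over {A, C, G, T}
    models = {}
    for name, seq in genomes.items():
        counts = Counter(kmers(seq, k))
        denom = sum(counts.values()) + vocab
        logp = {km: math.log((c + 1) / denom) for km, c in counts.items()}
        models[name] = (logp, math.log(1 / denom))
    return models


def score_fragment(fragment, models, k=3):
    """Naive Bayes log-likelihood of a fragment under each genome (sorted by name)."""
    scores = []
    for name in sorted(models):
        logp, unseen = models[name]
        scores.append(sum(logp.get(km, unseen) for km in kmers(fragment, k)))
    return scores


def normalize(scores):
    """Min-max scale a score vector into [0, 1], as Fuzzy ART expects."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuzzy_art(inputs, rho=0.75, alpha=0.001, beta=1.0):
    """Single-pass Fuzzy ART with complement coding; returns one label per input."""
    weights, labels = [], []
    for x in inputs:
        I = x + [1.0 - v for v in x]  # complement coding, so |I| == len(x)
        norm_I = sum(I)
        # Rank existing categories by the choice function T_j, highest first.
        order = sorted(
            range(len(weights)),
            key=lambda j: -sum(map(min, I, weights[j])) / (alpha + sum(weights[j])),
        )
        for j in order:
            match = sum(map(min, I, weights[j]))  # |I AND w_j| (fuzzy AND = min)
            if match / norm_I >= rho:  # vigilance test passed: resonance
                weights[j] = [beta * min(a, b) + (1 - beta) * b
                              for a, b in zip(I, weights[j])]
                labels.append(j)
                break
        else:  # no existing category resonated, so create a new one
            weights.append(list(I))
            labels.append(len(weights) - 1)
    return labels


if __name__ == "__main__":
    # Toy "known genomes" with very different base composition (illustrative only).
    genomes = {"gc": "GCGCGGCC" * 40, "at": "ATATAATT" * 40}
    models = train_nb(genomes, k=3)
    fragments = ["GCGCGGCC" * 4, "CGGCCGCG" * 4, "ATATAATT" * 4, "TAATTATA" * 4]
    vectors = [normalize(score_fragment(f, models, k=3)) for f in fragments]
    print(fuzzy_art(vectors, rho=0.75))  # prints [0, 0, 1, 1]
```

In the setting the abstract describes, the fragments would come from organisms absent from the scored genomes, so the score vectors serve as features rather than class labels. The toy demo instead uses fragments of the training genomes so that its output is easy to check. Raising `rho` makes the vigilance test stricter and yields more, tighter clusters.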

Metrics

2 citations in Scopus

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas
Computer Science, Artificial Intelligence
Computer Science, Hardware & Architecture
Computer Science, Theory & Methods
Engineering, Electrical & Electronic