Conference proceeding
Semi-supervised and Incremental VSEARCH for Metagenomic Classification
2022 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1119-1126
2022
Featured in collection: UN Sustainable Development Goals @ Drexel
Abstract
DNA sequencing of microbial communities from environmental samples generates large volumes of data, which can be analyzed using various bioinformatics pipelines. Unsupervised clustering algorithms are usually an early and critical step in such a pipeline, since much of these data are unlabeled, unstructured, or novel. However, curated reference databases that provide taxonomic label information are also growing in number and size, and they can support the classification of sequences, not just their clustering. In this contribution, we report on our progress in developing a semi-supervised approach for genomic clustering algorithms such as U/VSEARCH. The primary contribution of this approach is the ability to recognize both previously seen sequences and novel, previously unseen ones through incremental learning. For sequences whose examples the algorithm has previously seen, it can predict the correct label. For novel sequences, it assigns a temporary label and replaces it with a permanent one if and when such a label is established in a future reference database. The incremental learning aspect of the proposed approach also allows data to be processed continuously as new datasets become available. This capability is notable because most sequence-processing platforms are static: they are designed to run on a single batch of data, and the only way to incorporate additional data is to combine the new and old data and rerun the entire analysis. We report promising preliminary results on an extended 16S rRNA database.
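The labeling behavior described in the abstract (known sequences receive a reference label, novel ones receive a temporary label that is later promoted when a matching reference appears) can be illustrated with a toy sketch. This is not the authors' implementation and does not call VSEARCH: it swaps VSEARCH's alignment-based identity for a simple k-mer Jaccard similarity, and the names `IncrementalClassifier` and `add_references`, the `TEMP_n` label format, the 0.9 threshold, and k = 4 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


def kmers(seq: str, k: int = 4) -> set[str]:
    """Set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of k-mer sets.

    Hypothetical stand-in: VSEARCH itself scores hits by pairwise alignment identity.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


@dataclass
class IncrementalClassifier:
    threshold: float = 0.9                                      # hypothetical similarity cutoff
    centroids: dict[str, str] = field(default_factory=dict)    # sequence -> label
    assignments: dict[str, str] = field(default_factory=dict)  # query id -> label
    n_temp: int = 0

    def _relabel(self, old: str, new: str) -> None:
        """Replace a temporary label everywhere it was handed out."""
        for seq, label in list(self.centroids.items()):
            if label == old:
                self.centroids[seq] = new
        for qid, label in list(self.assignments.items()):
            if label == old:
                self.assignments[qid] = new

    def add_references(self, references: dict[str, str]) -> None:
        """Ingest a new batch of labeled reference sequences."""
        for seq, label in references.items():
            # A temporary cluster that matches a new reference gets its permanent label.
            for centroid, old in list(self.centroids.items()):
                if old.startswith("TEMP_") and similarity(seq, centroid) >= self.threshold:
                    self._relabel(old, label)
            self.centroids[seq] = label

    def classify(self, qid: str, seq: str) -> str:
        """Label a query by its best centroid, or open a new temporary label."""
        best_label, best_sim = None, 0.0
        for centroid, label in self.centroids.items():
            s = similarity(seq, centroid)
            if s > best_sim:
                best_label, best_sim = label, s
        if best_label is None or best_sim < self.threshold:
            self.n_temp += 1
            best_label = f"TEMP_{self.n_temp}"
            self.centroids[seq] = best_label  # the novel query seeds a new cluster
        self.assignments[qid] = best_label
        return best_label


# Usage: one labeled reference batch, then a novel sequence, then a later batch.
clf = IncrementalClassifier()
clf.add_references({"ACGTACGTAAGGCCTT": "Taxon_A"})
print(clf.classify("q1", "ACGTACGTAAGGCCTT"))  # Taxon_A
print(clf.classify("q2", "TTTTGGGGCCCCAAAA"))  # TEMP_1
clf.add_references({"TTTTGGGGCCCCAAAA": "Taxon_B"})
print(clf.assignments["q2"])                   # Taxon_B
```

Because the classifier keeps its centroids and assignments between calls, a new reference batch only needs to be compared against the stored state rather than triggering a full rerun over old and new data, which is the property the abstract emphasizes.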
Details
- Title
- Semi-supervised and Incremental VSEARCH for Metagenomic Classification
- Creators
- Emrecan Ozdogan - Rowan University; Adriana Fasino - Rowan University; Rachel Nguyen - Drexel University; Bahrad Sokhansanj - Drexel University; Gail Rosen - Drexel University; Robi Polikar - Rowan University
- Publication Details
- 2022 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1119-1126
- Conference
- 2022 IEEE Symposium Series on Computational Intelligence (SSCI) (Singapore, Singapore, 04 Dec 2022–07 Dec 2022)
- Publisher
- IEEE
- Number of pages
- 8
- Grant note
- 1936782 / U.S. National Science Foundation (NSF)
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Web of Science ID
- WOS:000971973800150
- Scopus ID
- 2-s2.0-85147796781
- Other Identifier
- 991020575635804721
InCites Highlights
Data related to this publication, from the InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- Web of Science research areas
- Computer Science, Artificial Intelligence
- Computer Science, Interdisciplinary Applications
- Computer Science, Theory & Methods