A text-mining approach for classification of genomic fragments

V Gadia; G Rosen

doi:10.1109/BIBMW.2008.4686216

Conference proceeding

Open access

2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops

Nov 2008

DOI: https://doi.org/10.1109/BIBMW.2008.4686216

Featured in Collection: UN Sustainable Development Goals @ Drexel

Files and links (1)

url

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.909.6525

Submitted, Open Access (License Unspecified)

Abstract

Keywords: Performance evaluation; Euclidean distance; Frequency; Spatial databases; Phylogeny; Testing; Bioinformatics; DNA; Data Analysis; Genomics

Genome identification is an emerging area of interest, driven by the study of environmental DNA samples. First, we show that classification performance approaches 50% for 500 bp fragments when using 12-mer features and, more importantly, that performance increases linearly for large N. Second, we find that an inverted TF-IDF measure performs 16% better when using only 80% of the words rather than the full set (100%). This increase implies that, while too sparse a feature subset does not produce good results, a carefully selected set has the potential to improve genome classification over a random feature set. Computing even 80% of all possible features can yield significant savings in computation. The Euclidean classifier and TF-IDF measures will pave the way for more discriminative classification techniques.
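
The pipeline the abstract describes (N-mer "words" as features, a Euclidean classifier, and TF-IDF-based selection of a fraction of the words) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names, the toy sequences, the N-mer length of 3, and in particular the scoring, which is the textbook TF-IDF form and not the "inverted" variant the abstract mentions, since that variant is not defined in this record.

```python
import math
from collections import Counter


def nmer_profile(seq, n):
    """Relative frequency of each overlapping N-mer (the 'words') in seq."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def euclidean(p, q, vocab=None):
    """Euclidean distance between two sparse profiles.

    If vocab is given, only those words contribute to the distance.
    """
    words = (set(p) | set(q)) if vocab is None else vocab
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in words))


def classify(fragment, genome_profiles, n, vocab=None):
    """Return the label of the genome profile nearest to the fragment."""
    frag = nmer_profile(fragment, n)
    return min(genome_profiles,
               key=lambda g: euclidean(frag, genome_profiles[g], vocab))


def tfidf_scores(genome_profiles):
    """Textbook TF-IDF per word (assumption: NOT the paper's inverted measure).

    Score = (largest term frequency across genomes) * log(G / document freq).
    """
    g = len(genome_profiles)
    df = Counter(w for prof in genome_profiles.values() for w in prof)
    return {
        w: max(prof.get(w, 0.0) for prof in genome_profiles.values())
        * math.log(g / df[w])
        for w in df
    }


def top_fraction(scores, fraction):
    """Keep the highest-scoring fraction of words (at least one word)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:max(1, int(len(ranked) * fraction))])


# Hypothetical toy "genomes"; real inputs would be full genome sequences.
genomes = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "GGGCCCGGGCCCGGGCCCGG",
}
profiles = {label: nmer_profile(seq, 3) for label, seq in genomes.items()}

# Classify a short fragment with the full word set, then with 80% of the words.
vocab_80 = top_fraction(tfidf_scores(profiles), 0.8)
print(classify("CGTACGTA", profiles, 3))
print(classify("CGTACGTA", profiles, 3, vocab=vocab_80))
```

Restricting the distance to a selected vocabulary is one way a reduced word set could cut computation, because only the retained N-mers need to be compared. Whether a given selection helps or hurts accuracy depends on the scoring measure and the data, which this sketch does not attempt to reproduce.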

Metrics

15 Record Views

2 citations in Web of Science

6 citations in Scopus

Details

Title: A text-mining approach for classification of genomic fragments
Creators: V Gadia - Drexel University; G Rosen - Drexel University
Publication Details: 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops
Conference: 2008 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (Philadelphia, Pennsylvania, United States, 03 Nov 2008–05 Nov 2008)
Publisher: IEEE
Number of pages: 2
Resource Type: Conference proceeding
Language: English
Academic Unit: Electrical and Computer Engineering
Web of Science ID: WOS:000262067000015
Scopus ID: 2-s2.0-58049145645
Other Identifier: 991014878363004721

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas: Engineering, Biomedical
