Machine Learning Applications to DNA Subsequence and Restriction Site Analysis
Conference proceeding, Open access

E. Moyer and A. Das
2020 IEEE Signal Processing in Medicine and Biology Symposium
01 Jan 2020
URL: http://arxiv.org/abs/2011.03544

Abstract

Engineering; Engineering, Biomedical; Engineering, Electrical & Electronic; Life Sciences & Biomedicine; Medical Informatics; Science & Technology; Technology
Based on the BioBricks (TM) standard, restriction synthesis is a novel catabolic, iterative DNA synthesis method that uses endonucleases to synthesize a query sequence from a reference sequence. In this work, the reference sequence is built from shorter subsequences, each classified as applicable or inapplicable for the synthesis method by one of three machine learning methods: Support Vector Machine (SVM), random forest, and Convolutional Neural Network (CNN). Before these methods are applied to the data, a series of feature selection, curation, and reduction steps creates an accurate and representative feature space. Following these preprocessing steps, three pipelines are proposed to classify subsequences based on their nucleotide sequence and on other relevant features corresponding to the restriction sites of over 200 endonucleases. The sensitivities of SVM, random forest, and CNN are 94.9%, 92.7%, and 91.4%, respectively. Each method scores lower in specificity: 77.4% for SVM, 85.7% for random forest, and 82.4% for CNN. In addition to analyzing these results, the misclassifications in SVM and CNN are investigated. Across these two models, features with a derived nucleotide specificity visually contribute more to classification than other features do. This observation is an important factor when considering new nucleotide sensitivity features for future studies.
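
For readers less familiar with the two metrics reported above, the following is a minimal Python sketch of how sensitivity and specificity are computed for a binary classifier. It is an illustration only and not the authors' evaluation code. It assumes that "applicable" is the positive class (label 1) and "inapplicable" the negative class (label 0), which the abstract does not state; the function name and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Assumption: 1 = applicable subsequence (positive class),
                0 = inapplicable subsequence (negative class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")

    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical toy labels, not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Under that labeling assumption, a model with high sensitivity but lower specificity, as reported for all three methods, rarely misses applicable subsequences but more often accepts inapplicable ones.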

Metrics

6 Record Views
8 citations in Scopus

Details

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas
Engineering, Biomedical
Engineering, Electrical & Electronic
Medical Informatics