Journal article
iSeqSearch: incremental protein search for iBlast/iMMSeqs2/iDiamond
PeerJ (San Francisco, CA), v 13, e19171
28 Apr 2025
PMID: 40313391
Featured in Collection: UN Sustainable Development Goals @ Drexel
Abstract
Background: The advancement of sequencing technology has led to a rapid increase in the amount of DNA and protein sequence data; consequently, genomic and proteomic databases are constantly growing. As a result, database searches need to be continually updated to account for newly added data. However, repeatedly re-searching the entire existing dataset wastes resources. Incremental database search can address this problem.
Methods: One recently introduced incremental search method is iBlast, which wraps the BLAST sequence search method with an algorithm that reuses previously processed data and thereby increases search efficiency. The iBlast wrapper, however, must be generalized to support the better-performing DNA/protein sequence search methods that have since been developed, namely MMseqs2 and Diamond. To address this need, we propose iSeqSearch, which extends iBlast by incorporating support for MMseqs2 (iMMseqs2) and Diamond (iDiamond), thereby providing a more general and broadly effective incremental search framework. Moreover, the previously published iBlast wrapper has to be revised to be more robust and more usable by the general community.
Results: iMMseqs2 and iDiamond, which apply the incremental approach, perform nearly identically to MMseqs2 and Diamond. Notably, when the result rankings are compared using measures such as the Pearson correlation, we observe a high concordance of over 0.9, indicating similar results. Moreover, in some cases, our incremental approach, iSeqSearch, which extends the iBlast merge function to iMMseqs2 and iDiamond, provides more hits than the conventional MMseqs2 and Diamond methods.
Conclusion: The incremental approach using iMMseqs2 and iDiamond is efficient in that it reuses previously processed data while maintaining high accuracy and concordance in search results. This method can reduce resource waste in searches of continually growing genomic and proteomic databases.
The sample code and data are available on GitHub and Zenodo (
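The incremental idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hit` record, the function names, and the linear E-value rescaling by database size are assumptions made for illustration (the rescaling follows the usual approximation that an E-value grows in proportion to the size of the search space), and the actual iSeqSearch merge function may correct statistics differently. The sketch merges hits saved from an earlier search with hits from a search of only the newly added sequences, then compares two rankings with the Pearson correlation.

```python
from dataclasses import dataclass, replace
from math import sqrt


@dataclass(frozen=True)
class Hit:
    """Hypothetical hit record; field names are illustrative only."""
    target: str      # identifier of the matched database sequence
    bitscore: float  # alignment bit score
    evalue: float    # E-value relative to the database that was searched


def rescale(hits, searched_size, combined_size):
    """Rescale E-values to the combined database size.

    Assumption: E-values scale linearly with database size.
    """
    factor = combined_size / searched_size
    return [replace(h, evalue=h.evalue * factor) for h in hits]


def merge_incremental(old_hits, old_size, new_hits, new_size, max_hits=500):
    """Merge saved hits with hits from the newly added sequences only."""
    combined = old_size + new_size
    merged = (rescale(old_hits, old_size, combined)
              + rescale(new_hits, new_size, combined))
    # Best hits first: lowest E-value, ties broken by highest bit score.
    merged.sort(key=lambda h: (h.evalue, -h.bitscore))
    return merged[:max_hits]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data: an earlier search against 1000 residues, then 1000 new residues.
old_hits = [Hit("A", 50.0, 1e-10), Hit("B", 30.0, 1e-5)]
new_hits = [Hit("C", 40.0, 1e-8)]
merged = merge_incremental(old_hits, 1000, new_hits, 1000)
merged_targets = [h.target for h in merged]

# Concordance between the incremental ranking and a hypothetical full re-search.
incremental_ranks = [1, 2, 3]
full_search_ranks = [1, 2, 3]
concordance = pearson(incremental_ranks, full_search_ranks)
```

In this toy case only the new sequences are searched on the second run, and the saved hits are reused after rescaling, which is where the resource saving described in the abstract would come from.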
Details
- Title
- iSeqSearch: incremental protein search for iBlast/iMMSeqs2/iDiamond
- Creators
- Robi Polikar - Rowan University; Hyunwoo Yoo - Drexel University; James R Brown - Drexel University; Bahrad A Sokhansanj - Drexel University; Gail L Rosen - Drexel University; Mohammadsaleh Refahi - Drexel University
- Publication Details
- PeerJ (San Francisco, CA), v 13, e19171
- Publisher
- PeerJ
- Number of pages
- 11
- Grant note
- National Science Foundation
- We utilized OpenAI's ChatGPT4o to assist in drafting and editing parts of this manuscript, particularly in refining the language.
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Web of Science ID
- WOS:001479747900001
- Scopus ID
- 2-s2.0-105003750764
- Other Identifier
- 991022048900104721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- Web of Science research areas
- Biochemistry & Molecular Biology