Dataset
A Dataset for Drug Resistance Classification from Antimicrobial DNA Sequences
01 Jan 2024
Abstract
This dataset provides a curated and standardized collection of antimicrobial resistance (AMR) gene sequences and annotations for drug resistance classification tasks. It integrates entries from the Comprehensive Antibiotic Resistance Database (CARD) and MEGARes v3.0, and unifies resistance labels using the Antibiotic Resistance Ontology (ARO). To enhance reliability, classes with fewer than 15 samples were excluded. Each data sample includes a full-length nucleotide sequence, along with harmonized annotations for Drug Class, Resistance Mechanism, and Gene Family.
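The minimum-class-size rule above can be illustrated with a short sketch. The record layout (a dict with `sequence` and `drug_class` fields), the function name `filter_rare_classes`, and the choice to apply the threshold to a single label column are assumptions made for illustration only. The record does not state whether the 15-sample cutoff was applied separately to Drug Class, Resistance Mechanism, and Gene Family.

```python
from collections import Counter

# Threshold taken from the dataset description: classes with fewer than
# 15 samples were excluded.
MIN_SAMPLES_PER_CLASS = 15

def filter_rare_classes(records, label_key="drug_class", min_count=MIN_SAMPLES_PER_CLASS):
    """Drop every record whose label occurs fewer than min_count times.

    Hypothetical helper: `records` is assumed to be a list of dicts, and
    `label_key` names the (assumed) annotation field to filter on.
    """
    counts = Counter(r[label_key] for r in records)
    return [r for r in records if counts[r[label_key]] >= min_count]

if __name__ == "__main__":
    # Toy data: one frequent class (20 samples) and one rare class (3 samples).
    toy = (
        [{"sequence": "ATG", "drug_class": "Beta-lactams"}] * 20
        + [{"sequence": "ATG", "drug_class": "RareClass"}] * 3
    )
    kept = filter_rare_classes(toy)
    print(len(kept))  # 20: the 3-sample class is removed
```

Counting first and filtering second keeps the rule to a single pass over the labels, so the threshold can be changed without touching the rest of a preprocessing pipeline.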
The dataset covers 9 major antimicrobial Drug Classes:
Beta-lactams
Aminoglycosides
Glycopeptides
Tetracyclines
Fluoroquinolones
MLS (Macrolide-Lincosamide-Streptogramin)
Sulfonamides
Phenicol
Multi-drug resistance
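For a classification model, the nine Drug Classes listed above would typically be mapped to integer targets. The sketch below assumes Drug Class is treated as a single-label target (the separate "Multi-drug resistance" category suggests this, but it is an assumption). It also assumes that the label strings in the data files match the names listed here; the actual stored strings may differ.

```python
# The nine Drug Class names as listed in the dataset description.
# Assumption: the stored label strings match these names exactly.
DRUG_CLASSES = [
    "Beta-lactams",
    "Aminoglycosides",
    "Glycopeptides",
    "Tetracyclines",
    "Fluoroquinolones",
    "MLS (Macrolide-Lincosamide-Streptogramin)",
    "Sulfonamides",
    "Phenicol",
    "Multi-drug resistance",
]

# Forward and reverse lookup tables for single-label, 9-way classification.
CLASS_TO_ID = {name: i for i, name in enumerate(DRUG_CLASSES)}
ID_TO_CLASS = dict(enumerate(DRUG_CLASSES))

def encode_label(name: str) -> int:
    """Map a Drug Class name to its integer id (raises KeyError if unknown)."""
    return CLASS_TO_ID[name]

def decode_label(idx: int) -> str:
    """Map an integer id back to its Drug Class name."""
    return ID_TO_CLASS[idx]
```

Raising `KeyError` on an unknown name is deliberate: a label string that does not match the expected set is more likely a parsing mismatch than a new class.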
Resistance mechanisms include categories such as antibiotic inactivation, target alteration, efflux, target protection, target replacement, and reduced permeability to antibiotics.
Gene family annotations show a long-tailed distribution, with frequently observed families including beta-lactamases, aminoglycoside-modifying enzymes, major facilitator superfamily (MFS) efflux pumps, ribosomal protection proteins, and rRNA methyltransferases.
This dataset has been used in studies of sequence-based classification models such as the Nucleotide Transformer. For model training, input sequences were truncated to 1,000 base pairs (bp), although the dataset itself provides full-length sequences. It is suitable for AMR prediction tasks and supports research in computational biology, genomic analysis, and biomedical natural language processing.
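The truncation step above can be sketched as follows. The record states only that inputs were truncated to 1,000 bp. Which part of the sequence was kept is not specified, so retaining the first 1,000 bases is an assumption, and the function name `truncate_sequence` is hypothetical.

```python
# Maximum input length used for model training, per the dataset description.
MAX_LEN_BP = 1000

def truncate_sequence(seq: str, max_len: int = MAX_LEN_BP) -> str:
    """Return at most max_len bases from the start of the sequence.

    Assumption: the leading bases are kept. Sequences already shorter than
    max_len are returned unchanged. The full-length sequence in the dataset
    is not modified; only the model input is shortened.
    """
    return seq[:max_len]

if __name__ == "__main__":
    full_length = "ATGC" * 400           # 1,600 bp toy sequence
    model_input = truncate_sequence(full_length)
    print(len(full_length), len(model_input))  # 1600 1000
```

Because the dataset keeps full-length sequences, a different window (for example, a sliding window or a different maximum length) can be applied without re-downloading or re-curating the data.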
Details
- Title
- A Dataset for Drug Resistance Classification from Antimicrobial DNA Sequences
- Creators
- Hyunwoo Yoo - Drexel University
- Bahrad Sokhansanj - Drexel University
- James Brown - Drexel University
- Gail Rosen - Drexel University
- Publisher
- Zenodo
- Resource Type
- Dataset
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Other Identifier
- 991022054406804721