Journal article
PharmBERT: a domain-specific BERT model for drug labels
Briefings in bioinformatics, v 24(4), bbad226
14 Jun 2023
PMID: 37317617
Featured in Collection: UN Sustainable Development Goals @ Drexel
Abstract
Human prescription drug labeling contains a summary of the essential scientific information needed for the safe and effective use of the drug and includes the Prescribing Information, FDA-approved patient labeling (Medication Guides, Patient Package Inserts and/or Instructions for Use), and/or carton and container labeling. Drug labeling contains critical information about drug products, such as pharmacokinetics and adverse events. Automatic information extraction from drug labels may facilitate finding adverse reactions of drugs or interactions between one drug and another. Natural language processing (NLP) techniques, especially the recently developed Bidirectional Encoder Representations from Transformers (BERT), have proven highly effective in text-based information extraction. A common paradigm in training BERT is to pretrain the model on large unlabeled generic language corpora, so that the model learns the distribution of words in the language, and then fine-tune it on a downstream task. In this paper, we first show that the language used in drug labels is distinctive and therefore cannot be optimally handled by other BERT models. We then present PharmBERT, a BERT model pretrained specifically on drug labels (publicly available at Hugging Face). We demonstrate that our model outperforms vanilla BERT, ClinicalBERT and BioBERT on multiple NLP tasks in the drug label domain. Moreover, by analyzing the different layers of PharmBERT, we show how domain-specific pretraining contributes to its superior performance and gain more insight into how it understands different linguistic aspects of the data.
Details
- Title: PharmBERT: a domain-specific BERT model for drug labels
- Creators:
  - Taha ValizadehAslani - Drexel University
  - Yiwen Shi - Drexel University
  - Ping Ren - Center for Drug Evaluation and Research
  - Jing Wang - Center for Drug Evaluation and Research
  - Yi Zhang - Center for Drug Evaluation and Research
  - Meng Hu - Center for Drug Evaluation and Research
  - Liang Zhao - Center for Drug Evaluation and Research
  - Hualou Liang (Corresponding Author) - Drexel University
- Publication Details: Briefings in bioinformatics, v 24(4), bbad226
- Publisher: Oxford University Press
- Number of pages: 10
- Grant note: 75F40119C10106 / United States Food and Drug Administration
- Resource Type: Journal article
- Language: English
- Academic Unit: Information Science (Informatics); Electrical and Computer Engineering; School of Biomedical Engineering, Science, and Health Systems; [Retired Faculty]
- Web of Science ID: WOS:001007664900001
- Scopus ID: 2-s2.0-85165521337
- Other Identifier: 991020624903504721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types: Domestic collaboration
- Web of Science research areas: Biochemical Research Methods; Mathematical & Computational Biology