Large language models (LLMs) have emerged as a milestone in machine learning, offering groundbreaking capabilities in understanding human language with remarkable coherence and versatility. Their applications span a multitude of domains, ranging from text classification to information extraction. Such models are first pretrained on vast corpora to learn general language patterns and then fine-tuned on specific downstream tasks. Despite their impressive capabilities, LLMs face several limitations, including sub-optimal performance in understanding the domain-specific language of drug labels, limited performance on class-imbalanced data, and the high computational cost of training. Human prescription drug labeling summarizes the essential scientific information needed for the safe and effective use of a drug, including critical information about drug products such as pharmacokinetics and adverse events. Automatic information extraction from drug labels can facilitate tasks such as identifying adverse drug reactions or drug-drug interactions. Although LLMs are an appealing solution for this task, because they are trained on generic data, they may lack the depth of knowledge required to understand the language of such a specialized field. To address this issue, this work presents PharmBERT, a BERT model specifically pretrained on drug labels and made publicly available to the NLP community. We demonstrate that our model outperforms vanilla BERT, ClinicalBERT, and BioBERT on multiple NLP tasks in the drug-label domain. Another shortcoming of LLMs lies in their handling of class-imbalanced data, leading to subpar performance in identifying and accurately responding to minority classes.
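Domain-adaptive pretraining of a BERT model, as used for PharmBERT, centers on the masked-language-modeling objective: hide a fraction of tokens and train the model to recover them from context. A minimal, self-contained sketch of the masking step (the 15% rate and `[MASK]` token follow standard BERT practice; the example sentence is illustrative, not taken from an actual drug label):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Replace ~mask_prob of tokens with mask_token; return masked tokens
    and per-position labels (original token where masked, else None)."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)   # the model must predict this token
        else:
            masked.append(tok)
            labels.append(None)  # position not scored in the MLM loss
    return masked, labels

tokens = "the recommended dose may cause adverse reactions".split()
masked, labels = mask_tokens(tokens)
```

In practice the masked sequences are fed to a masked-LM head and the cross-entropy loss is computed only at the masked positions; running this over a drug-label corpus adapts the general-purpose model to the domain's vocabulary and phrasing.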
This class imbalance can skew the model's learning process, favoring majority classes and thus diminishing its effectiveness in recognizing and appropriately addressing less represented categories. To cope with this challenge, we present a simple modification of standard fine-tuning: a two-stage fine-tuning procedure. In Stage 1, we fine-tune the final layer of the pretrained model with class-balanced augmented data generated using ChatGPT. As a large generative language model, ChatGPT can produce novel yet contextually similar variants of a given text sample, which makes it an excellent candidate for data augmentation. In Stage 2, we perform standard fine-tuning. Our modification has several benefits: (1) it leverages pretrained representations by initially fine-tuning only a small portion of the model parameters while keeping the rest untouched; (2) it allows the model to learn an initial representation of the specific task; and, importantly, (3) it prevents the learning of tail classes from being put at a disadvantage during model updating. Experimental results show that the proposed two-stage fine-tuning outperforms vanilla fine-tuning and state-of-the-art methods on the evaluated datasets. The third issue with LLMs is the high computational cost of training. Due to the large number of parameters in many state-of-the-art LLMs, such as Bidirectional Encoder Representations from Transformers (BERT), fine-tuning is computationally demanding. One attractive solution is parameter-efficient fine-tuning, in which only a small portion of the model is fine-tuned while the rest remains unchanged. However, the question of which portion of the BERT model is most important to fine-tune remains open. Here, we first analyze the different components of the BERT model to determine which one undergoes the most change after fine-tuning.
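The Stage-1 data preparation above can be sketched as follows. This is a hypothetical illustration of class-balanced augmentation: minority classes are topped up with augmented copies until every class matches the majority-class count. In the thesis the augmented texts come from ChatGPT; here a placeholder `augment` function stands in for that generation step, and the function name is illustrative rather than from the dissertation:

```python
import random
from collections import Counter

def balance_with_augmentation(samples, augment, seed=0):
    """samples: list of (text, label) pairs; augment: text -> new text.
    Returns a class-balanced list by augmenting minority classes."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append(text)
    target = max(len(texts) for texts in by_label.values())
    balanced = []
    for label, texts in by_label.items():
        balanced.extend((t, label) for t in texts)
        # top up minority classes with augmented variants of their own texts
        for _ in range(target - len(texts)):
            balanced.append((augment(rng.choice(texts)), label))
    return balanced

data = [("a", 0), ("b", 0), ("c", 0), ("d", 1)]  # class 1 is the minority
balanced = balance_with_augmentation(data, lambda t: t + "_aug")
print(Counter(label for _, label in balanced))  # Counter({0: 3, 1: 3})
```

Stage 1 then fine-tunes only the classifier head on this balanced set, after which Stage 2 fine-tunes the full model on the original data.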
We show that the output LayerNorm changes more than any other component when the model is fine-tuned for different General Language Understanding Evaluation (GLUE) tasks. We then show that fine-tuning only this component achieves performance comparable to, and in some cases better than, full fine-tuning and other parameter-efficient fine-tuning methods. Moreover, we use Fisher information to identify the most important parameters within LayerNorm and show that, with only a small performance degradation, many NLP tasks in the GLUE benchmark can be solved by fine-tuning only a small, selected portion of LayerNorm. In summary, LLMs hold immense potential to redefine and enhance various fields, including drug-label analysis, but they also suffer from limitations such as sub-optimal performance in understanding the domain-specific text of drug labels, poor learning of minority classes in class-imbalanced data, and high computational complexity. In this thesis, each of these shortcomings is first analyzed and then addressed with a proposed solution.
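The LayerNorm-only fine-tuning described above amounts to freezing every parameter except the LayerNorm affine weights. A minimal PyTorch sketch on a toy model (not BERT itself; for BERT the same loop applies over the HuggingFace model's modules, and the Fisher-information selection within LayerNorm is omitted here):

```python
import torch
from torch import nn

# Toy stand-in for an encoder; the LayerNorm plays the role of BERT's
# output LayerNorm.
model = nn.Sequential(
    nn.Linear(16, 16),
    nn.LayerNorm(16),
    nn.Linear(16, 4),
)

# Freeze everything, then unfreeze only LayerNorm's gamma (weight) and
# beta (bias) parameters.
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, nn.LayerNorm):
        for p in m.parameters():
            p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")  # 32 of 372 parameters
```

The optimizer is then built only over the trainable subset, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`, so gradient computation and optimizer state cover a small fraction of the model.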
Details
Title
Using Large Language Models for Analyzing Drug Label Data
Creators
Taha ValizadehAslani
Contributors
Hualou Liang (Advisor)
Gail L. Rosen (Advisor)
Awarding Institution
Drexel University
Degree Awarded
Doctor of Philosophy (Ph.D.)
Publisher
Drexel University; Philadelphia, Pennsylvania
Number of pages
xvii, 124 pages
Resource Type
Dissertation
Language
English
Academic Unit
College of Engineering (1970-2026); Electrical (and Computer) Engineering [Historical]; Drexel University