Computer Science - Computation and Language; Computer Science - Learning
Fine-tuning a pre-trained model, such as Bidirectional Encoder
Representations from Transformers (BERT), has been proven to be an effective
method for solving many natural language processing (NLP) tasks. However, due
to the large number of parameters in many state-of-the-art NLP models,
including BERT, the process of fine-tuning is computationally expensive. One
attractive solution to this issue is parameter-efficient fine-tuning, which
involves modifying only a minimal segment of the model while keeping the
remainder unchanged. Yet, it remains unclear which segment of the BERT model is
crucial for fine-tuning. In this paper, we first analyze different components
in the BERT model to pinpoint which one undergoes the most significant changes
after fine-tuning. We find that the output LayerNorm changes more than any other
component when the model is fine-tuned on different General Language Understanding
Evaluation (GLUE) tasks. We then show that fine-tuning only the LayerNorm achieves
performance comparable to, or in some cases better than, full fine-tuning and
other parameter-efficient fine-tuning methods. Moreover, we use Fisher
information to determine the most critical subset of LayerNorm and demonstrate
that many NLP tasks in the GLUE benchmark can be solved by fine-tuning only a
small portion of LayerNorm with negligible performance degradation.
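
The two selection steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes Hugging Face style parameter names (for example `encoder.layer.0.output.LayerNorm.weight`) and uses the diagonal empirical Fisher approximation (the mean of squared per-sample gradients), since the abstract does not state the exact estimator. The function names, the toy gradients, and the 50% fraction are all hypothetical.

```python
from typing import Dict, List, Sequence


def layernorm_parameter_names(names: Sequence[str]) -> List[str]:
    """Return the names that belong to a LayerNorm module.

    Assumes Hugging Face style naming, where such parameters contain
    the substring 'LayerNorm' (an assumption, not taken from the paper).
    """
    return [n for n in names if "LayerNorm" in n]


def empirical_fisher(
    per_sample_grads: Sequence[Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Diagonal empirical Fisher: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    fisher: Dict[str, List[float]] = {}
    for grads in per_sample_grads:
        for name, g in grads.items():
            acc = fisher.setdefault(name, [0.0] * len(g))
            for i, gi in enumerate(g):
                acc[i] += gi * gi / n
    return fisher


def top_fraction_mask(
    fisher: Dict[str, List[float]], fraction: float
) -> Dict[str, List[bool]]:
    """Mark the `fraction` of scalar entries with the largest Fisher score."""
    scored = [
        (score, name, i)
        for name, scores in fisher.items()
        for i, score in enumerate(scores)
    ]
    k = max(1, round(fraction * len(scored)))
    keep = {(name, i) for _, name, i in sorted(scored, reverse=True)[:k]}
    return {
        name: [(name, i) in keep for i in range(len(scores))]
        for name, scores in fisher.items()
    }


# Toy example: hypothetical parameter names and made-up gradients.
names = [
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.0.attention.output.LayerNorm.weight",
    "encoder.layer.0.output.dense.bias",
    "encoder.layer.0.output.LayerNorm.bias",
]
selected = layernorm_parameter_names(names)

per_sample_grads = [
    {selected[0]: [0.1, 2.0], selected[1]: [0.0, -1.0]},
    {selected[0]: [0.1, 0.0], selected[1]: [0.0, 1.0]},
]
fisher = empirical_fisher(per_sample_grads)
mask = top_fraction_mask(fisher, fraction=0.5)
```

In a training framework, the selected names would typically be the only parameters left trainable, and the mask would restrict updates to the highest-scoring LayerNorm entries. That wiring is omitted here.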
Details
Title
LayerNorm: A key component in parameter-efficient fine-tuning
Creators
Taha ValizadehAslani
Hualou Liang
Publication Details
arXiv.org
Resource Type
Preprint
Language
English
Academic Unit
School of Biomedical Engineering, Science, and Health Systems