Investigating the Synergistic Effects of Dropout and Residual Connections on Language Model Training
Preprint   Open access

Qingyang Li and Weimao Ke
01 Oct 2024
url
https://arxiv.org/abs/2410.01019
Preprint (Author's original); arXiv.org - Non-exclusive license to distribute; Open

Abstract

Computer Science - Computation and Language; Computer Science - Learning
This paper examines the role of dropout in mitigating overfitting during language model training. It investigates how varying dropout rates, applied both to individual layers and to residual connections, affect language modeling. We train a decoder implementation on the classic Tiny Shakespeare dataset and measure the effect of these adjustments on training efficiency and validation error. The results confirm the benefits of dropout for regularization and of residual connections for convergence, and also reveal interactions between the two. In particular, there is an important trade-off between the depth of residual connections and the dropout applied to them for optimal convergence and generalization of deep neural networks.
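
The abstract does not say exactly where dropout is placed relative to the residual path, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes inverted dropout and a block of the form y = dropout(x, residual_rate) + dropout(sublayer(x), layer_rate). The names `residual_block`, `layer_rate`, `residual_rate`, and `toy_sublayer` are hypothetical and chosen for illustration.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each element with probability `rate` and
    scale the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def residual_block(x, sublayer, layer_rate, residual_rate, rng, training=True):
    """Assumed form: y = dropout(x, residual_rate) + dropout(sublayer(x), layer_rate).

    `layer_rate` regularizes the sublayer output; `residual_rate` is applied
    to the skip path itself (an assumption about what "dropout on residual
    connections" means)."""
    skip = dropout(x, residual_rate, rng, training)
    branch = dropout(sublayer(x), layer_rate, rng, training)
    return [s + b for s, b in zip(skip, branch)]


def toy_sublayer(x):
    # Hypothetical stand-in for an attention or MLP sublayer: halves each element.
    return [0.5 * v for v in x]


rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]

# Evaluation mode: dropout is disabled, so the block is deterministic.
eval_out = residual_block(x, toy_sublayer, 0.1, 0.1, rng, training=False)

# Training mode: elements of either path may be zeroed at random.
train_out = residual_block(x, toy_sublayer, 0.1, 0.1, rng, training=True)
```

Keeping the two rates as separate parameters makes it possible to vary dropout on the sublayers and on the skip path independently, which is the kind of adjustment the abstract describes.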
