Investigating the Synergistic Effects of Dropout and Residual Connections on Language Model Training

Qingyang Li; Weimao Ke

doi:10.48550/arxiv.2410.01019

Preprint

Open access

01 Oct 2024

Files and links (1)

url: https://arxiv.org/abs/2410.01019

Preprint (Author's original); arXiv.org - Non-exclusive license to distribute; Open

Subjects: Computer Science - Computation and Language; Computer Science - Learning

Abstract

This paper examines the role of dropout in mitigating overfitting during language model training. It investigates how varying dropout rates, applied both to individual layers and to residual connections, affect language modeling. We train a decoder implementation on the classic Tiny Shakespeare dataset and measure the effects of these adjustments on training efficiency and validation error. The results confirm the benefits of dropout for regularization and of residual connections for convergence, and also reveal interactions between the two. In particular, there is an important trade-off between the depth of residual connections and the dropout applied to them for optimal convergence and generalization of deep neural networks.
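To make the two dropout placements named in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy sublayer, and the exact point at which dropout is applied to the skip path are all assumptions made for illustration, and the record does not specify these details.

```python
import random


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero each element with probability `rate`
    and rescale the survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def toy_sublayer(x):
    """Hypothetical stand-in for an attention or feed-forward sublayer."""
    return [2.0 * v for v in x]


def residual_block(x, sublayer, layer_rate, skip_rate, rng, training=True):
    """Assumed form: y = dropout(x, skip_rate) + dropout(sublayer(x), layer_rate).

    `layer_rate` regularizes the sublayer output; `skip_rate` regularizes
    the residual (skip) path. Both placements are illustrative assumptions.
    """
    branch = dropout(sublayer(x), layer_rate, rng, training)
    skip = dropout(x, skip_rate, rng, training)
    return [s + b for s, b in zip(skip, branch)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 2.0, 3.0, 4.0]
    # Training mode: elements on either path may be dropped.
    print(residual_block(x, toy_sublayer, 0.1, 0.2, rng, training=True))
    # Evaluation mode: dropout is disabled on both paths.
    print(residual_block(x, toy_sublayer, 0.1, 0.2, rng, training=False))
```

Keeping the two rates as separate parameters mirrors the abstract's distinction between dropout on individual layers and dropout on residual connections, so each can be varied independently in an experiment.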

Details

Title: Investigating the Synergistic Effects of Dropout and Residual Connections on Language Model Training
Creators: Qingyang Li; Weimao Ke
Resource Type: Preprint
Language: English
Academic Unit: Information Science
Other Identifier: 991021906083004721
