Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units

Jake Ryland Williams; Haoran Zhao

doi:10.48550/arxiv.2311.07510

Preprint

Open access

arXiv.org

13 Nov 2023

DOI: https://doi.org/10.48550/arxiv.2311.07510

Preprint (Author's original), arXiv.org - Non-exclusive license to distribute, Open

Abstract

Computer Science - Learning

Mathematics - Probability

Physics - Data Analysis, Statistics and Probability

Statistics - Machine Learning

Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive, especially when used at scale. This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications. We discuss a general result about feed-forward neural networks and then extend this solution to compositional (multi-layer) networks, which are applied to a simplified transformer block containing feed-forward and self-attention layers. These models are used to train highly specified and complex multi-layer neural architectures that we refer to as self-attentive feed-forward unit (SAFFU) layers, which we use to develop a transformer that appears to generalize well over small, cognitively feasible volumes of data. Testing demonstrates that explicit solutions outperform models optimized by backpropagation alone. Moreover, further application of backpropagation after explicit solutions leads to better optima from smaller scales of data: explicit-solution warm starts enable training effective models from much less data. We then carry out ablation experiments, training a roadmap of about 250 transformer models over 1 million tokens to determine ideal settings. We find that multiple different architectural variants produce highly performant models, and discover from this ablation that some of the best are not the most parameterized. This appears to indicate that well-generalized models could be reached with less data by using explicit solutions, and that architectural exploration using explicit solutions pays dividends in guiding the search for efficient variants with fewer parameters, which could be incorporated into low-resource hardware where AI might be embodied.
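The warm-start idea in the abstract can be illustrated with a deliberately tiny toy. The sketch below is not the paper's SAFFU solution, which the abstract does not spell out. It assumes a single softmax layer over one-hot token inputs, where a closed-form setting of the weights (log of smoothed next-token frequencies) is available, and compares it with a few gradient steps from a zero initialization. All names (`explicit_warm_start`, `gradient_steps`, the synthetic Markov-chain data, the smoothing constant `eps`) are hypothetical choices made for this illustration, and the comparison is on training loss only.

```python
import numpy as np

def cross_entropy(W, xs, ys):
    """Mean negative log-likelihood of targets ys under softmax(W[xs])."""
    logits = W[xs]
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(ys)), ys].mean()

def explicit_warm_start(xs, ys, vocab_size, eps=1e-2):
    """Closed-form weights: log of smoothed next-token frequencies.

    Assumption for this toy only: with one-hot inputs, row i of W is just
    the logit vector used whenever the input token is i.
    """
    counts = np.full((vocab_size, vocab_size), eps)
    np.add.at(counts, (xs, ys), 1.0)  # accumulate (input, target) pair counts
    return np.log(counts / counts.sum(axis=1, keepdims=True))

def gradient_steps(W, xs, ys, steps, lr=1.0):
    """Plain full-batch gradient descent on the mean cross-entropy."""
    W = W.copy()
    n = len(xs)
    for _ in range(steps):
        logits = W[xs]
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        probs[np.arange(n), ys] -= 1.0       # softmax minus one-hot target
        grad = np.zeros_like(W)
        np.add.at(grad, xs, probs / n)       # scatter row gradients by input token
        W -= lr * grad
    return W

# Synthetic token stream from a random Markov chain (illustrative data only).
rng = np.random.default_rng(0)
V = 20
P = rng.dirichlet(np.ones(V) * 0.3, size=V)
tokens = [0]
for _ in range(2000):
    tokens.append(rng.choice(V, p=P[tokens[-1]]))
tokens = np.array(tokens)
xs, ys = tokens[:-1], tokens[1:]

W_cold = gradient_steps(np.zeros((V, V)), xs, ys, steps=20)
W_warm = explicit_warm_start(xs, ys, V)
W_tuned = gradient_steps(W_warm, xs, ys, steps=20)

loss_cold = cross_entropy(W_cold, xs, ys)
loss_warm = cross_entropy(W_warm, xs, ys)
loss_tuned = cross_entropy(W_tuned, xs, ys)
print(f"cold start + 20 steps: {loss_cold:.4f}")
print(f"explicit warm start:   {loss_warm:.4f}")
print(f"warm start + 20 steps: {loss_tuned:.4f}")
```

In this toy the closed form is available only because the inputs are one-hot and there is a single layer. The multi-layer and self-attention cases the abstract refers to require the paper's own derivations, and whether the generalization behavior it reports carries over is not something this sketch can show.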

Details

Title: Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units
Creators: Jake Ryland Williams; Haoran Zhao
Publication Details: arXiv.org
Resource Type: Preprint
Language: English
Academic Unit: Information Science
Other Identifier: 991021806682604721
