Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models

Haoran Zhao; Jake Ryland Williams

doi:10.48550/arxiv.2311.11012

Preprint

Open access

arXiv.org

18 Nov 2023

Files and links

https://doi.org/10.48550/arxiv.2311.11012

Preprint (author's original), arXiv.org: non-exclusive license to distribute; open access

Abstract

Computer Science - Computation and Language

While Large Language Models (LLMs) grow ever more dominant, classic pre-trained word embeddings remain relevant because of their computational efficiency and nuanced linguistic interpretation. Drawing on recent studies showing that the GloVe and word2vec optimizations both converge toward variants of the log-co-occurrence matrix, we construct a novel word representation system called bit-cipher. It eliminates the need for backpropagation while leveraging contextual information and hyper-efficient dimensionality reduction techniques based on unigram frequency, providing strong interpretability alongside efficiency. We use the bit-cipher algorithm to train word vectors via a two-step process that critically relies on a hyperparameter, bits, which controls the vector dimension. The first step trains the bit-cipher; the second uses it under one of two aggregation modes, summation or concatenation, to produce contextually rich representations from word co-occurrences. We further investigate bit-cipher's efficacy with probing experiments on part-of-speech (POS) tagging and named entity recognition (NER) that assess its competitiveness with classic embeddings such as word2vec and GloVe. Additionally, we explore its applicability to LM training and fine-tuning. In experiments that replace embedding layers with cipher embeddings, cipher notably accelerates the training process and attains better optima than conventional training paradigms. Experiments that integrate bit-cipher embedding layers with RoBERTa, T5, and OPT, prior to or as a substitute for fine-tuning, show a promising enhancement to transfer learning, allowing rapid model convergence while preserving competitive performance.
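
The abstract describes a backpropagation-free, two-step pipeline in which a `bits` hyperparameter fixes the vector size and co-occurring words are aggregated by summation or concatenation. The sketch below is a hypothetical illustration of those two aggregation modes only, not the authors' bit-cipher algorithm: the rank-to-binary encoding (`rank_code`), the fixed context window, the zero-padding at sequence edges, and every function name are assumptions made for this example.

```python
from collections import Counter


def frequency_rank(tokens):
    """Rank words by descending unigram frequency (0 = most frequent).

    Ties are broken alphabetically so the ranking is deterministic."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered)}


def rank_code(rank, bits):
    """HYPOTHETICAL cipher: the `bits`-digit binary expansion of a word's rank,
    most significant bit first. An illustrative stand-in, not the paper's encoding."""
    if rank >= 2 ** bits:
        raise ValueError("rank does not fit in the requested number of bits")
    return [float((rank >> (bits - 1 - i)) & 1) for i in range(bits)]


def context_vector(tokens, i, codes, bits, window, mode):
    """Aggregate the codes of the words within `window` positions of tokens[i]."""
    offsets = [d for d in range(-window, window + 1) if d != 0]
    neighbours = []
    for d in offsets:
        j = i + d
        # Positions outside the sequence contribute a zero vector (assumption).
        neighbours.append(codes[tokens[j]] if 0 <= j < len(tokens) else [0.0] * bits)
    if mode == "sum":
        # Summation: element-wise sum, so the output keeps `bits` dimensions.
        return [sum(col) for col in zip(*neighbours)]
    if mode == "concat":
        # Concatenation: one block per offset, giving 2 * window * bits dimensions.
        return [x for code in neighbours for x in code]
    raise ValueError("mode must be 'sum' or 'concat'")


def word_vectors(tokens, bits, window=1, mode="sum"):
    """Hypothetical two-step sketch.

    Step 1: give each word a frequency-based code (no training, no backprop).
    Step 2: sum each word's context vectors over all of its occurrences."""
    codes = {w: rank_code(r, bits) for w, r in frequency_rank(tokens).items()}
    vectors = {}
    for i, word in enumerate(tokens):
        ctx = context_vector(tokens, i, codes, bits, window, mode)
        if word in vectors:
            vectors[word] = [a + b for a, b in zip(vectors[word], ctx)]
        else:
            vectors[word] = ctx
    return vectors


tokens = "the cat sat on the mat".split()
print(word_vectors(tokens, bits=3, mode="sum")["the"])     # [0.0, 2.0, 2.0]
print(word_vectors(tokens, bits=3, mode="concat")["the"])  # [0.0, 1.0, 1.0, 0.0, 1.0, 1.0]
```

With `bits=3`, summation keeps three dimensions per word, while concatenation with `window=1` yields six (one three-bit block per neighbouring position), which is the dimension trade-off between the two modes.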

Details

Title: Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models
Creators: Haoran Zhao; Jake Ryland Williams
Publication Details: arXiv.org
Resource Type: Preprint
Language: English
Academic Unit: Information Science
Other Identifier: 991021806415904721

Drexel University