Iterative Decoding for Compositional Generalization in Transformers

Luana Ruiz; Joshua Ainslie; Santiago Ontañón

doi:10.48550/arxiv.2110.04169

Preprint

Open access

arXiv (Cornell University)

09 Dec 2021

Files and links (1)

https://doi.org/10.48550/arxiv.2110.04169

Preprint (Author's original); arXiv.org - Non-exclusive license to distribute; Open

Abstract

Subjects: Computer Science - Computation and Language; Computer Science - Learning

Deep learning models generalize well to in-distribution data but struggle to generalize compositionally, i.e., to combine a set of learned primitives to solve more complex tasks. In sequence-to-sequence (seq2seq) learning, transformers are often unable to predict correct outputs for examples longer than those seen during training. This paper introduces iterative decoding, an alternative to seq2seq that (i) improves transformer compositional generalization on the PCFG and Cartesian product datasets and (ii) provides evidence that, on these datasets, seq2seq transformers do not learn iterations that are not unrolled. In iterative decoding, each training example is broken down into a sequence of intermediate steps that the transformer learns iteratively. At inference time, each intermediate output is fed back to the transformer as the next input until an end-of-iteration token is predicted. We conclude by illustrating some limitations of iterative decoding on the CFQ dataset.
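
The inference loop described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the marker string `<eoi>`, the convention that it arrives as the last token of the final output, the `max_iters` safety cap, and the rule-based `toy_step` function (which stands in for a trained transformer and resolves one string operation per call) are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical end-of-iteration marker; the abstract does not name the token.
EOI = "<eoi>"

def iterative_decode(step: Callable[[List[str]], List[str]],
                     tokens: List[str],
                     max_iters: int = 50) -> List[str]:
    """Feed each intermediate output back as the next input until EOI appears."""
    current = list(tokens)
    for _ in range(max_iters):
        out = step(current)
        if out and out[-1] == EOI:
            # Assumed convention: EOI is appended to the final answer.
            return out[:-1]
        current = out
    raise RuntimeError("no end-of-iteration token within max_iters")

# Toy prefix operators, loosely in the spirit of string-editing tasks.
OPS = {
    "reverse": lambda args: args[::-1],
    "shift": lambda args: args[1:] + args[:1],  # rotate left by one
}

def toy_step(tokens: List[str]) -> List[str]:
    """Stand-in for the model: resolve only the innermost (last) operator."""
    op_positions = [i for i, t in enumerate(tokens) if t in OPS]
    if not op_positions:
        # Nothing left to resolve, so signal the end of iteration.
        return tokens + [EOI]
    i = op_positions[-1]
    return tokens[:i] + OPS[tokens[i]](tokens[i + 1:])

result = iterative_decode(toy_step, ["reverse", "shift", "a", "b", "c"])
print(result)
```

Here the input passes through the intermediate states `reverse b c a` and `a c b` before the stand-in step emits the marker. Each call handles a single operation, so deeper nesting only requires more iterations of the same learned step.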

Details

Title: Iterative Decoding for Compositional Generalization in Transformers
Creators: Luana Ruiz; Joshua Ainslie; Santiago Ontañón
Publication Details: arXiv (Cornell University)
Resource Type: Preprint
Language: English
Academic Unit: Computer Science (Computing)
Other Identifier: 991021869113004721
