Computer Science - Computation and Language; Computer Science - Learning
Deep learning models generalize well to in-distribution data but struggle to
generalize compositionally, i.e., to combine a set of learned primitives to
solve more complex tasks. In sequence-to-sequence (seq2seq) learning,
transformers often fail to predict correct outputs for examples longer
than those seen during training. This paper introduces iterative decoding, an
alternative to seq2seq that (i) improves transformer compositional
generalization on the PCFG and Cartesian product datasets and (ii) provides
evidence that, on these datasets, seq2seq transformers do not learn iterations
that are not unrolled. In iterative decoding, training examples are broken down into a
sequence of intermediate steps that the transformer learns iteratively. At
inference time, the intermediate outputs are fed back to the transformer as
intermediate inputs until an end-of-iteration token is predicted. We conclude
by illustrating some limitations of iterative decoding in the CFQ dataset.
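The inference loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `iterative_decode`, `toy_step`, and the `<eoi>` token are hypothetical, a hand-written rule stands in for the trained transformer, and signaling the end of iteration by appending the token to the final output is an assumption.

```python
from typing import Callable, List

EOI = "<eoi>"  # hypothetical end-of-iteration token

# Toy string operations standing in for learned primitives (assumption).
OPS = {
    "reverse": lambda xs: xs[::-1],
    "repeat": lambda xs: xs + xs,
}


def toy_step(tokens: List[str]) -> List[str]:
    """Stand-in for one transformer call: resolve only the innermost operator."""
    n_ops = 0
    while n_ops < len(tokens) and tokens[n_ops] in OPS:
        n_ops += 1
    ops, args = tokens[:n_ops], tokens[n_ops:]
    if not ops:
        # Nothing left to compute: emit the input and mark the end.
        return args + [EOI]
    result = ops[:-1] + OPS[ops[-1]](args)
    if len(ops) == 1:
        # The last operator was just resolved, so this is the final step.
        result.append(EOI)
    return result


def iterative_decode(
    step: Callable[[List[str]], List[str]],
    tokens: List[str],
    max_steps: int = 50,
) -> List[List[str]]:
    """Feed each intermediate output back in as the next input until EOI."""
    trace = []
    current = list(tokens)
    for _ in range(max_steps):
        output = step(current)
        if output and output[-1] == EOI:
            trace.append(output[:-1])  # drop the marker from the final answer
            return trace
        trace.append(output)
        current = output
    raise RuntimeError("no end-of-iteration token within max_steps")


trace = iterative_decode(toy_step, ["reverse", "repeat", "a", "b"])
for intermediate in trace:
    print(" ".join(intermediate))
```

Each call to the step function resolves a single operation, so the number of iterations grows with the depth of the input instead of being fixed by the lengths seen in training. The `max_steps` cap is a safeguard added for this sketch in case the end-of-iteration token is never produced.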
Details
Title
Iterative Decoding for Compositional Generalization in Transformers
Creators
Luana Ruiz
Joshua Ainslie
Santiago Ontañón
Publication Details
arXiv (Cornell University)
Resource Type
Preprint
Language
English
Academic Unit
Computer Science (Computing)
Other Identifier
991021869113004721