Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries

Kiran Vodrahalli; Santiago Ontanon; Nilesh Tripuraneni; Kelvin Xu; Sanil Jain; Rakesh Shivanna; Jeffrey Hui; Nishanth Dikkala; Mehran Kazemi; Bahare Fatemi; Rohan Anil; Ethan Dyer; Siamak Shakeri; Roopali Vij; Harsh Mehta; Vinay Ramasesh; Quoc Le; Ed Chi; Yifeng Lu; Orhan Firat; Angeliki Lazaridou; Jean-Baptiste Lespiau; Nithya Attaluri; Kate Olszewska

doi:10.48550/arxiv.2409.12640

Preprint

Open access

19 Sep 2024

Files and links

https://arxiv.org/abs/2409.12640

Preprint (author's original); arXiv.org non-exclusive license to distribute; open access

Abstract

Subjects: Computer Science - Computation and Language; Computer Science - Learning

We introduce Michelangelo: a minimal, synthetic, and unleaked long-context reasoning evaluation for large language models that is also easy to score automatically. This evaluation is derived via a novel, unifying framework for evaluations over arbitrarily long contexts that measure a model's ability to do more than retrieve a single piece of information from its context. The central idea of the Latent Structure Queries (LSQ) framework is to construct tasks that require a model to "chisel away" the irrelevant information in the context, revealing a latent structure within it. To verify the model's understanding of this latent structure, we query it for details of the structure. Using LSQ, we produce three diagnostic long-context evaluations across code and natural-language domains, intended to provide a stronger signal of long-context language model capabilities. We evaluate several state-of-the-art models and demonstrate both that a) the proposed evaluations are high-signal and b) there is significant room for improvement in synthesizing long-context information.
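The abstract describes LSQ only at a high level. As a rough illustration, the Python sketch here builds one toy task in that spirit. It is a hypothetical example of our own, not the authors' released code and not any of the three Michelangelo evaluations. We assume the latent structure is a Python list updated by `append` and `pop` statements, hide those statements among irrelevant filler sentences, and ask for a detail of the final list. The names `build_lsq_task` and `FILLER`, the 30% pop probability, and the length-and-sum query are all illustrative assumptions.

```python
import random

# Hypothetical filler sentences standing in for irrelevant context (assumption).
FILLER = [
    "The weather report mentioned light rain in the afternoon.",
    "A committee met to discuss the annual budget.",
    "The museum extended its opening hours for the summer.",
]


def build_lsq_task(num_ops: int, filler_per_op: int, seed: int = 0):
    """Build a toy latent-structure task: list updates hidden among filler text.

    Hypothetical sketch. Returns (context, question, answer), where the latent
    structure is the final state of a Python list and the question asks for a
    detail of that state.
    """
    rng = random.Random(seed)
    state: list[int] = []  # the latent structure the model must reconstruct
    lines: list[str] = []
    for _ in range(num_ops):
        # Irrelevant text that the model has to "chisel away".
        lines.extend(rng.choice(FILLER) for _ in range(filler_per_op))
        if state and rng.random() < 0.3:
            state.pop()
            lines.append("my_list.pop()")
        else:
            value = rng.randint(0, 99)
            state.append(value)
            lines.append(f"my_list.append({value})")
    context = "my_list = []\n" + "\n".join(lines)
    question = "What is the length of my_list, and what is its sum?"
    # The reference answer comes straight from the latent state,
    # so scoring a model's reply can be fully automatic.
    answer = (len(state), sum(state))
    return context, question, answer
```

For example, `build_lsq_task(20, 3, seed=1)` returns a context containing 60 filler lines and 20 list operations. Raising `filler_per_op` lengthens the context without changing how the answer is derived, which is one way a generator like this could scale to arbitrary lengths while remaining easy to score.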

Details

Title: Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries
Creators: Kiran Vodrahalli
Santiago Ontanon
Nilesh Tripuraneni
Kelvin Xu
Sanil Jain
Rakesh Shivanna
Jeffrey Hui
Nishanth Dikkala
Mehran Kazemi
Bahare Fatemi
Rohan Anil
Ethan Dyer
Siamak Shakeri
Roopali Vij
Harsh Mehta
Vinay Ramasesh
Quoc Le
Ed Chi
Yifeng Lu
Orhan Firat
Angeliki Lazaridou
Jean-Baptiste Lespiau
Nithya Attaluri
Kate Olszewska
Resource Type: Preprint
Language: English
Academic Unit: Computer Science
Other Identifier: 991021904294004721