Calibrating Large Language Models with Sample Consistency
Conference proceeding   Open access


Qing Lyu, Kumar Shridhar, Chaitanya Malaviya, Li Zhang, Yanai Elazar, Niket Tandon, Marianna Apidianaki, Mrinmaya Sachan and Chris Callison-Burch
Proceedings of the ... AAAI Conference on Artificial Intelligence, v 39(18), pp 19260-19268
11 Apr 2025
URL: https://doi.org/10.1609/aaai.v39i18.34120
Published, Version of Record (VoR) Open

Abstract

Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods; Science & Technology; Computer Science; Technology
Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application. However, LLMs are often inherently uncalibrated and elude conventional calibration techniques due to their proprietary nature and massive scale. In this work, we derive model confidence from the distribution of multiple randomly sampled generations, using three measures of consistency. We extensively evaluate eleven open and closed-source models on nine reasoning datasets. Results show that consistency-based calibration methods outperform existing post-hoc approaches in terms of calibration error. Meanwhile, we find that factors such as intermediate explanations, model scaling, and larger sample sizes enhance calibration, while instruction-tuning makes calibration more difficult. Moreover, confidence scores obtained from consistency can potentially enhance model performance. Finally, we offer guidance on choosing suitable consistency metrics for calibration, tailored to model characteristics such as the exposure to instruction-tuning and RLHF.

Code - https://github.com/veronica320/Calibrating-LLMs-with-Consistency
Extended version - https://arxiv.org/abs/2402.13904
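
As an illustration of deriving confidence from the distribution of sampled generations, the following is a minimal sketch and not the authors' implementation. The abstract does not name its three consistency measures, so the three used here (agreement with the majority answer, one minus normalized entropy, and the gap between the two most frequent answers) are assumptions, as are the function name `consistency_confidence` and the choice to normalize entropy by the number of distinct answers. The input is assumed to be a list of final answers already extracted from the sampled generations.

```python
import math
from collections import Counter


def consistency_confidence(answers):
    """Score how consistent a list of sampled answers is.

    Hypothetical helper: returns the majority answer plus three
    assumed consistency scores, each in [0, 1] (higher = more confident).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")

    n = len(answers)
    ranked = Counter(answers).most_common()  # sorted by frequency, descending
    top_answer, top_count = ranked[0]
    second_count = ranked[1][1] if len(ranked) > 1 else 0

    # Assumed measure 1: fraction of samples that match the majority answer.
    agreement = top_count / n

    # Assumed measure 2: one minus the entropy of the answer distribution,
    # normalized by log(number of distinct answers) so it stays in [0, 1].
    if len(ranked) == 1:
        entropy_conf = 1.0  # all samples agree, so there is no uncertainty
    else:
        probs = [count / n for _, count in ranked]
        entropy = -sum(p * math.log(p) for p in probs)
        entropy_conf = 1.0 - entropy / math.log(len(ranked))

    # Assumed measure 3: gap between the two most frequent answers.
    top_two_gap = (top_count - second_count) / n

    return {
        "answer": top_answer,
        "agreement": agreement,
        "entropy": entropy_conf,
        "top_two_gap": top_two_gap,
    }


if __name__ == "__main__":
    # Five hypothetical sampled answers to the same question.
    print(consistency_confidence(["42", "42", "42", "17", "42"]))
```

The three scores can diverge: a sample set split evenly between two answers has moderate agreement but a top-two gap of zero, which is one reason a choice among such measures might depend on the model.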

Details

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
International collaboration
Web of Science research areas
Computer Science, Artificial Intelligence
Computer Science, Interdisciplinary Applications
Computer Science, Theory & Methods