Word-Sequence Entropy: Towards uncertainty estimation in free-form medical question answering applications and beyond
Journal article   Open access   Peer reviewed

Zhiyuan Wang, Jinhao Duan, Chenxi Yuan, Qingyu Chen, Tianlong Chen, Yue Zhang, Ren Wang, Xiaoshuang Shi and Kaidi Xu
Engineering applications of artificial intelligence, v 139, 109553
Jan 2025
URL: http://arxiv.org/abs/2402.14259

Abstract

Keywords: Generative inequality; Open-ended medical question-answering; Semantic relevance; Uncertainty quantification
Uncertainty estimation is crucial for the reliability of safety-critical human and artificial intelligence (AI) interaction systems, particularly in the domain of healthcare engineering. However, a robust and general uncertainty measure for free-form answers has not been well-established in open-ended medical question-answering (QA) tasks, where generative inequality introduces a large number of irrelevant words and sequences within the generated set for uncertainty quantification (UQ), which can lead to biases. This paper proposes Word-Sequence Entropy (WSE), which calibrates uncertainty at both the word and sequence levels based on semantic relevance, highlighting keywords and enlarging the generative probability of trustworthy responses when performing UQ. We compare WSE with six baseline methods on five free-form medical QA datasets, utilizing seven popular large language models (LLMs), and demonstrate that WSE exhibits superior performance in accurate UQ under two standard criteria for correctness evaluation. Additionally, in terms of the potential for real-world medical QA applications, we achieve a significant enhancement (e.g., a 6.36% improvement in model accuracy on the COVID-QA dataset) in the performance of LLMs when employing responses with lower uncertainty that are identified by WSE as final answers, without requiring additional task-specific fine-tuning or architectural modifications.

Highlights

• We devise Word-Sequence Entropy for uncertainty analysis in medical query-answering.
• We investigate the issue of generative inequality in medical responses.
• We capture and highlight keywords and reliable sequences based on semantic relevance.
• We resample based on the calibrated uncertainty.
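The abstract describes calibrating uncertainty at the word level (emphasizing keywords) and at the sequence level (enlarging the probability of trustworthy responses). The following is a minimal illustrative sketch of that general idea only; it is not the paper's exact formulation. The function names, the relevance weights, and the pairwise similarity matrix are all hypothetical inputs assumed here for illustration. In practice they would come from a language model's token log-probabilities and from some semantic similarity model.

```python
import math


def word_level_uncertainty(token_logprobs, relevance):
    """Relevance-weighted negative log-likelihood of one response.

    Assumption: `relevance` holds one non-negative weight per token,
    larger for words that matter more to the meaning (keywords).
    """
    total = sum(relevance)
    if total <= 0:
        raise ValueError("relevance weights must sum to a positive value")
    weights = [r / total for r in relevance]
    # Tokens with higher relevance contribute more to the uncertainty score.
    return sum(w * -lp for w, lp in zip(weights, token_logprobs))


def sequence_level_uncertainty(seq_logprobs, similarity):
    """Average negative log of similarity-boosted sequence probabilities.

    Assumption: `similarity[i][j]` in [0, 1] is a hypothetical semantic
    similarity between sampled responses i and j.
    """
    probs = [math.exp(lp) for lp in seq_logprobs]
    k = len(probs)
    total = 0.0
    for i in range(k):
        # A response supported by similar responses gets a larger probability.
        support = sum(similarity[i][j] * probs[j] for j in range(k) if j != i)
        boosted = min(probs[i] + support, 1.0)  # clamp so the log stays <= 0
        total += -math.log(boosted)
    return total / k


if __name__ == "__main__":
    # Two tokens; the second one is treated as irrelevant filler.
    print(word_level_uncertainty([-1.0, -3.0], [1.0, 0.0]))
    # Two sampled responses that fully agree versus fully disagree.
    lp = [math.log(0.2), math.log(0.2)]
    print(sequence_level_uncertainty(lp, [[1, 1], [1, 1]]))
    print(sequence_level_uncertainty(lp, [[1, 0], [0, 1]]))
```

In this sketch, down-weighting an unlikely but irrelevant token lowers the word-level score, and responses that agree semantically with one another yield a lower sequence-level score than responses that disagree.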

Metrics

19 Record Views
4 citations in Scopus

Details

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
International collaboration
Web of Science research areas
Automation & Control Systems
Computer Science, Artificial Intelligence
Engineering, Electrical & Electronic
Engineering, Multidisciplinary