TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination via Latent Truthful-Guided Pre-Intervention

Jinhao Duan; Fei Kong; Hao Cheng; James Diffenderfer; Bhavya Kailkhura; Lichao Sun; Xiaofeng Zhu; Xiaoshuang Shi; Kaidi Xu

doi:10.1109/ICCV51701.2025.00692

Conference proceeding

Open access

Proceedings / IEEE International Conference on Computer Vision, pp 7372-7382

19 Oct 2025

DOI: https://doi.org/10.1109/ICCV51701.2025.00692

Files and links (1)

https://arxiv.org/pdf/2503.10602

Abstract

Communication systems

Computer networks

HTTP

large vision-language models

object hallucination

Protocols

Radio access networks

Regional area networks

Space communications

Video equipment

Videos

Wide Area Networks

Object Hallucination (OH) has been acknowledged as one of the major trustworthiness challenges in Large Vision-Language Models (LVLMs). Recent advancements in Large Language Models (LLMs) indicate that internal states, such as hidden states, encode the "overall truthfulness" of generated responses. However, it remains under-explored how internal states in LVLMs function and whether they could serve as "per-token" hallucination indicators, which is essential for mitigating OH. In this paper, we first conduct an in-depth exploration of LVLM internal states with OH issues and discover that LVLM internal states are high-specificity per-token indicators of hallucination behaviors. Moreover, different LVLMs encode universal patterns of hallucinations in common latent subspaces, indicating that there exist "generic truthful directions" shared by various LVLMs. Based on these discoveries, we propose Truthful-Guided Pre-Intervention (TruthPrInt), which first learns the truthful direction of LVLM decoding and then applies truthful-guided inference-time intervention during LVLM decoding. We further propose ComnHallu to enhance both cross-LVLM and cross-data hallucination detection transferability by constructing and aligning hallucination latent subspaces. We evaluate TruthPrInt in extensive experimental settings, including in-domain and out-of-domain scenarios, over popular LVLMs and OH benchmarks. Experimental results indicate that TruthPrInt significantly outperforms state-of-the-art methods. Code will be available at https://github.com/jinhaoduan/TruthPrInt.
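The idea of using hidden states as per-token hallucination indicators can be illustrated with a minimal sketch. It is not the TruthPrInt implementation: it assumes a simple difference-of-means linear probe, synthetic Gaussian vectors standing in for LVLM hidden states, and hypothetical helper names (`fit_truthful_direction`, `truthfulness_scores`, `flag_tokens`). The actual detector, intervention rule, and ComnHallu subspace alignment are described in the paper and are not reproduced here.

```python
import numpy as np


def fit_truthful_direction(hidden, is_hallucinated):
    """Estimate a unit vector pointing from hallucinated to truthful states.

    Assumption: a difference-of-means probe, chosen only for illustration.
    """
    hidden = np.asarray(hidden, dtype=float)
    labels = np.asarray(is_hallucinated, dtype=bool)
    mu_truthful = hidden[~labels].mean(axis=0)
    mu_hallucinated = hidden[labels].mean(axis=0)
    direction = mu_truthful - mu_hallucinated
    direction = direction / np.linalg.norm(direction)
    midpoint = 0.5 * (mu_truthful + mu_hallucinated)
    return direction, midpoint


def truthfulness_scores(hidden, direction, midpoint):
    """Signed projection of each token's state onto the truthful direction."""
    return (np.asarray(hidden, dtype=float) - midpoint) @ direction


def flag_tokens(hidden, direction, midpoint, threshold=0.0):
    """Mark tokens whose score falls below the threshold as suspect."""
    return truthfulness_scores(hidden, direction, midpoint) < threshold


# Synthetic stand-in for per-token hidden states (not real LVLM activations).
rng = np.random.default_rng(0)
dim = 16
shift = np.zeros(dim)
shift[0] = 3.0  # the two classes differ along the first coordinate only
truthful_states = rng.normal(size=(200, dim)) + shift
hallucinated_states = rng.normal(size=(200, dim)) - shift

hidden = np.vstack([truthful_states, hallucinated_states])
labels = np.array([False] * 200 + [True] * 200)

direction, midpoint = fit_truthful_direction(hidden, labels)
flags = flag_tokens(hidden, direction, midpoint)
accuracy = (flags == labels).mean()
print(f"per-token detection accuracy on synthetic data: {accuracy:.3f}")
```

In a decoding loop, a flag like this could trigger an intervention before the suspect token is committed, for example by re-ranking candidate tokens. That step depends on the specific model and is left out of the sketch.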

Details

Title: TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination via Latent Truthful-Guided Pre-Intervention
Creators: Jinhao Duan - Drexel University
Fei Kong - University of Electronic Science and Technology of China
Hao Cheng - The Hong Kong University of Science and Technology (Guangzhou)
James Diffenderfer - Lawrence Livermore National Laboratory
Bhavya Kailkhura - Lawrence Livermore National Laboratory
Lichao Sun - Lehigh University
Xiaofeng Zhu - University of Electronic Science and Technology of China
Xiaoshuang Shi - University of Electronic Science and Technology of China
Kaidi Xu - Drexel University
Publication Details: Proceedings / IEEE International Conference on Computer Vision, pp 7372-7382
Publisher: IEEE
Grant note: DE-AC52-07NA27344 / Lawrence Livermore National Laboratory (10.13039/100006227) U.S. Department of Energy (10.13039/100000015)
Resource Type: Conference proceeding
Language: English
Academic Unit: Computer Science
Other Identifier: 991022179438804721
