Book chapter
Enhancing Object Detection by Leveraging Large Language Models for Contextual Knowledge
Pattern Recognition, pp 299-314
03 Dec 2024
Abstract
The adoption of deep learning-based object detection models has proliferated across numerous applications. However, their efficacy is significantly constrained under challenging imaging conditions like fog or occlusion. In response to these limitations, we present a novel approach that transcends these hurdles by exploiting scene contextual knowledge distilled from Large Language Models (LLMs). This methodology empowers our model to deduce and anticipate object presence within a scene by leveraging contextual knowledge akin to human perception, thereby overcoming the constraints of direct visual cues. Our method synergizes the capabilities of object detection models with the contextual interpretation and predictive capacity of LLaMA, an advanced LLM. Our framework operates exclusively on the labels and positional information provided by a detection algorithm, sidestepping the reliance on pixel-level image data both during training and inference. The effectiveness of our approach is validated through extensive experiments conducted on the COCO-2017 dataset, including a modified version simulating reduced visibility conditions. The empirical findings underscore the superior performance of our integrated model compared to standalone YOLO models, particularly evident in adverse conditions, where notable enhancements in detection accuracy are observed across various object sizes.
Metrics
2 citations in Scopus
Details
- Title
- Enhancing Object Detection by Leveraging Large Language Models for Contextual Knowledge
- Creators
- Amirreza Rouhi; Diego Patiño; David K. Han
- Contributors
- Apostolos Antonacopoulos (Editor); Subhasis Chaudhuri (Editor); Rama Chellappa (Editor); Cheng-Lin Liu (Editor); Saumik Bhattacharya (Editor); Umapada Pal (Editor)
- Publication Details
- Pattern Recognition, pp 299-314
- Series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature Switzerland; Cham
- Number of pages
- 16
- Resource Type
- Book chapter
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Scopus ID
- 2-s2.0-85211897429
- Other Identifier
- 991021985086404721