Conference proceeding
Unified visual language modeling for zero-shot multitask inspection of civil infrastructure
Proceedings of SPIE, the international society for optical engineering, v 13951
16 Apr 2026
Abstract
This paper examines multi-task computer vision for civil infrastructure maintenance by evaluating a unified visual language approach for captioning, semantic segmentation, and object detection on transportation infrastructure imagery. Two pretrained model variants (Florence-2 base and large) were tested in a zero-shot setting, using prompt-conditioned sequence-to-sequence processing and generating three captions per image as prompts for downstream tasks. Evaluation used average precision (AP) and average recall (AR) for open-vocabulary detection, caption-to-phrase grounding detection, and referring expression segmentation. Computational profiling on an NVIDIA T4 indicates that the base variant requires roughly 2 GB of GPU memory with runtimes from 0.2 to 2.3 seconds, while the large variant requires nearly 4 GB with runtimes from 0.4 to 3.9 seconds and higher CPU and GPU utilization. A complex scene revealed complementary strengths across variants: the base variant succeeded at bridge segmentation where the large variant failed, while the large variant detected all small instances (e.g., graffiti on concrete) that the base variant missed. This study's prompt engineering, which compared various prompts, found that a geometry-focused prompt can optimize inspection outcomes, providing practical guidance for deploying these models in real-world infrastructure monitoring applications.
Details
- Title
- Unified visual language modeling for zero-shot multitask inspection of civil infrastructure
- Creators
- Farzad Azizi Zade - Independent Researcher (Iran, Islamic Republic of)
- Pedram Bazrafshan - Drexel University
- Arvin Ebrahimkhanlou - Drexel University
- Contributors
- Kara J. Peters (Editor) - North Carolina State University
- Fabrizio Ricci (Editor) - Univ. degli Studi di Napoli Federico II (Italy)
- Piervincenzo Rizzo (Editor) - University of Pittsburgh
- Christoph Schaal (Editor) - California State University, Northridge
- Publication Details
- Proceedings of SPIE, the international society for optical engineering, v 13951
- Publisher
- SPIE
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Civil, Architectural, and Environmental Engineering; Mechanical Engineering and Mechanics
- Other Identifier
- 991022180805704721