Computer Science - Computer Vision and Pattern Recognition
With the arrival of the large-model era in Artificial Intelligence (AI),
Multimodal Large Language Models (MLLMs), which can understand cross-modal
interactions between vision and text, have attracted wide attention.
Adversarial examples with human-imperceptible perturbations are known to
exhibit transferability: a perturbation generated on one model can also
mislead a different model. Increasing the diversity of the input data is one
of the most important methods for enhancing adversarial transferability, and
it has been shown to significantly increase the threat posed under black-box
conditions. Prior work also demonstrates that MLLMs can be exploited to
generate adversarial examples in the white-box scenario. However, the
transferability of such perturbations is quite limited, so they fail to
achieve effective black-box attacks across different models. In this paper,
we propose the Typographic-based Semantic Transfer Attack (TSTA), which is
motivated by two observations: (1) MLLMs tend to process semantic-level
information; (2) typographic attacks can effectively distract the visual
information captured by MLLMs. In the Harmful Word Insertion and Important
Information Protection scenarios, TSTA demonstrates superior performance.
Details
Title
Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models
Creators
Hao Cheng
Erjia Xiao
Jiahang Cao
Le Yang
Kaidi Xu
Jindong Gu
Renjing Xu
Publication Details
ArXiv.org
Resource Type
Preprint
Language
English
Academic Unit
Computer Science (Computing)
Other Identifier
991021881386504721