Traditional adversarial attacks concentrate on manipulating clean examples in
the pixel space by adding adversarial perturbations. By contrast, semantic
adversarial attacks focus on changing semantic attributes of clean examples,
such as color, context, and features, which are more feasible in the real
world. In this paper, we propose a framework to quickly generate a semantic
adversarial attack by leveraging recent diffusion models since semantic
information is included in the latent space of well-trained diffusion models.
Then there are two variants of this framework: 1) the Semantic Transformation
(ST) approach fine-tunes the latent space of the generated image and/or the
diffusion model itself; 2) the Latent Masking (LM) approach masks the latent
space with another target image and local backpropagation-based interpretation
methods. Additionally, the ST approach can be applied in either white-box or
black-box settings. Extensive experiments are conducted on CelebA-HQ and AFHQ
datasets, and our framework demonstrates great fidelity, generalizability, and
transferability compared to other baselines. Our approaches achieve
approximately 100% attack success rate in multiple settings with the best FID
as 36.61. Code is available at
https://github.com/steven202/semantic_adv_via_dm.
Metrics
22 Record Views
Details
Title
Semantic Adversarial Attacks via Diffusion Models
Creators
Chenan Wang
Jinhao Duan
Chaowei Xiao
Edward Kim
Matthew Stamm
Kaidi Xu
Resource Type
Preprint
Language
English
Academic Unit
Electrical and Computer Engineering; Computer Science