ICML Backdooring VLMs via Concept-Driven Triggers

Poster
in
Workshop: DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats)

Backdooring VLMs via Concept-Driven Triggers

Yufan Feng · Weimin Lyu · Yuxin Wang · Benjamin Tan · Yani Ioannou

Keywords: [ explainable AI ] [ ai safety ] [ backdoor attack ] [ vision-language model ]

[ Abstract ] [ Project Page ]

[ OpenReview]

Sat 19 Jul 3 p.m. PDT — 3:45 p.m. PDT

Abstract:

Vision–language models (VLMs) have recently achieved impressive performance, yet their growing complexity raises new security concerns. We introduce the first concept‐driven backdoor for instruction‐tuned VLMs, leveraging visual concept encoders to stealthily trigger the backdoor at multiple levels of abstraction. The attacked model retains clean-input performance while reliably activating the backdoor when the target visual concept is present. Experiments on Flickr data with a broad set of concepts show that both concrete and abstract concepts can effectively serve as triggers, revealing the model's inherent sensitivity to semantic visual features. Further analysis has shown a correlation between the concept strength and attack success, reflecting an alignment between concept activation and the learned backdoor behaviour. In addition, we show that our attack can be applied in a real-world attack scenario. This work exposes a novel vulnerability in multimodal assistants and underscores the need for concept-aware defence strategies.

Chat is not available.

Poster in Workshop: DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats)

Backdooring VLMs via Concept-Driven Triggers

Yufan Feng · Weimin Lyu · Yuxin Wang · Benjamin Tan · Yani Ioannou

Poster
in
Workshop: DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats)