

Spotlight Poster

Improving Zero-Shot Adversarial Robustness in Vision-Language Models by Closed-form Alignment of Adversarial Path Simplices

Junhao Dong · Piotr Koniusz · Yifei Zhang · Hao Zhu · Weiming Liu · Xinghua Qu · Yew Soon ONG

East Exhibition Hall A-B #E-2109
Thu 17 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Vision-Language Models (VLMs) such as CLIP excel at zero-shot classification due to large-scale pre-training, but they are vulnerable to adversarial examples. Adversarial fine-tuning robustifies zero-shot models by aligning the prediction scores of individual adversaries with those of their clean counterparts, but it typically overlooks intermediate adversarial samples along the adversarial trajectory that crosses the decision boundary. Such intermediate adversaries and their vicinity yield informative representations that capture the decision boundary in detail. These representations can be further enriched by sampling adversarial candidates from simplices formed by joining two consecutive vertices on the adversarial trajectory with their clean counterpart. However, sampling adversaries from such simplices is very costly. To train a robust VLM, we overcome this limitation via a Taylor expansion, formulating an upper bound on the alignment loss that depends only on the Jacobian/Hessian obtained at clean samples. As the regions between clean and intermediate adversarial samples capture a larger part of the decision landscape, we robustify the VLM with plausible adversaries from these simplices through our closed-form formulation, which is equivalent to drawing infinitely many uniform samples from the simplex. We obtain state-of-the-art robustness across 15 datasets and diverse vision-language tasks.
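To illustrate the closed-form idea, below is a minimal NumPy sketch, not the paper's implementation: the VLM alignment loss is replaced by a hypothetical quadratic toy loss f (with assumed parameters A, b), the simplex is built from a clean sample and two assumed adversarial iterates, and the expected second-order Taylor expansion of f over the simplex, computed in closed form from the gradient/Hessian at the clean sample and the Dirichlet(1,...,1) moments that govern uniform simplex sampling, is checked against direct Monte Carlo sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy stand-in for the alignment loss (hypothetical; the paper aligns CLIP
# --- prediction scores of adversarial and clean inputs, here f is any smooth loss).
d = 8
A = rng.standard_normal((d, d))
A = A @ A.T / d                        # symmetric PSD Hessian of the toy loss
b = rng.standard_normal(d)

def f(x):                              # loss value at a single point x
    return 0.5 * x @ A @ x + b @ x

def grad(x):                           # gradient of f at x
    return A @ x + b

def hess(x):                           # Hessian of f at x (constant for this toy loss)
    return A

# --- Simplex vertices: a clean sample and two consecutive adversarial iterates
# --- (hypothetical values; in the paper these come from the attack trajectory).
x0     = rng.standard_normal(d)
x_adv1 = x0 + 0.1 * rng.standard_normal(d)
x_adv2 = x0 + 0.2 * rng.standard_normal(d)
V = np.stack([x0, x_adv1, x_adv2])     # (K, d) vertices, K = 3
K = V.shape[0]

# --- Closed form: E[f(x)] for x uniform on the simplex, using the 2nd-order
# --- Taylor expansion of f around the clean sample x0.
# --- Uniform simplex sampling means x = sum_i lam_i * V[i], lam ~ Dirichlet(1,...,1),
# --- whose moments are E[lam_i] = 1/K and E[lam_i lam_j] = (1 + [i==j]) / (K*(K+1)).
D = V - x0                             # vertex displacements from x0 (row 0 is zero)
mean_disp = D.mean(axis=0)             # E[x - x0]
second_moment = np.zeros((d, d))       # E[(x - x0)(x - x0)^T]
for i in range(K):
    for j in range(K):
        coeff = (2.0 if i == j else 1.0) / (K * (K + 1))
        second_moment += coeff * np.outer(D[i], D[j])

closed_form = f(x0) + grad(x0) @ mean_disp + 0.5 * np.trace(hess(x0) @ second_moment)

# --- Monte Carlo check: average the loss over many uniform samples of the simplex.
lam = rng.dirichlet(np.ones(K), size=100_000)
X = lam @ V                            # (N, d) points inside the simplex
mc = (0.5 * np.einsum('ni,ij,nj->n', X, A, X) + X @ b).mean()

print(f"closed form: {closed_form:.6f}   Monte Carlo: {mc:.6f}")
# For this quadratic toy loss the Taylor expansion is exact, so the two agree.
```

Because the toy loss here is quadratic, the second-order expansion is exact and the closed-form value matches the Monte Carlo average; for a general VLM loss the expansion gives the kind of Jacobian/Hessian-based bound the abstract describes.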

Lay Summary:

Adversarial examples pose a security threat to AI systems: imperceptibly perturbed inputs can flip their decisions. To counteract this type of attack, we generate adversarial samples and encapsulate them into an adversarial simplex, which we use to robustify the VLM against adversarial attacks.
