Spotlight Poster
Model Immunization from a Condition Number Perspective
Amber Yijia Zheng · Cedar Site Bai · Brian Bullins · Raymond A. Yeh
East Exhibition Hall A-B #E-1604
Thu 17 Jul 10 a.m. PDT — 11 a.m. PDT
Model immunization aims to pre-train models that are difficult to fine-tune on harmful tasks while retaining their utility on other non-harmful tasks. Though prior work has shown empirical evidence for immunizing text-to-image models, a precise understanding of when immunization is possible and a formal definition of an immunized model remain lacking. In this work, we propose a framework, based on the condition number of a Hessian matrix, to analyze model immunization for linear models. Building on this framework, we design an algorithm with regularization terms to control the resulting condition numbers after pre-training. Empirical results on linear models and non-linear deep-nets demonstrate the effectiveness of the proposed algorithm on model immunization. The code is available at https://github.com/amberyzheng/model-immunization-cond-num.
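To make the condition-number idea concrete, here is a minimal, hypothetical sketch (not the authors' released implementation): for a linear model with squared loss, the Hessian with respect to the weights is the feature Gram matrix X^T X / n, and its condition number governs how quickly gradient descent can fine-tune on that data. The function and variable names below (`hessian_condition_number`, `immunization_regularizer`, `X_harm`, `X_safe`) are illustrative assumptions.

```python
import torch

def hessian_condition_number(X: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Condition number of the squared-loss Hessian H = X^T X / n for features X."""
    H = X.T @ X / X.shape[0]
    eigvals = torch.linalg.eigvalsh(H)       # ascending order, differentiable
    return eigvals[-1] / (eigvals[0] + eps)  # lambda_max / lambda_min

def immunization_regularizer(X_harm: torch.Tensor, X_safe: torch.Tensor) -> torch.Tensor:
    """Illustrative pre-training penalty: push the harmful-task Hessian toward
    ill-conditioning (hard to fine-tune) while keeping the safe-task Hessian
    well-conditioned (utility preserved)."""
    cond_harm = hessian_condition_number(X_harm)
    cond_safe = hessian_condition_number(X_safe)
    # Maximize cond_harm, minimize cond_safe; the log keeps both terms on comparable scales.
    return -torch.log(cond_harm) + torch.log(cond_safe)

# Usage: X_harm / X_safe stand in for feature representations of harmful vs. benign data.
X_harm = torch.randn(128, 16, requires_grad=True)
X_safe = torch.randn(128, 16, requires_grad=True)
loss = immunization_regularizer(X_harm, X_safe)
loss.backward()
```

In this toy setup the regularizer would be added to the ordinary pre-training loss; a large condition number on the harmful task slows gradient-based fine-tuning there, while a small one on the benign task keeps it easy to adapt.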
When powerful AI models are open-sourced, there is a risk that they could be fine-tuned to produce harmful content. This paper addresses how to train these models in a way that makes misuse more difficult while still ensuring they remain useful for safe purposes. We explore this issue by examining how easily a model can be optimized after its initial training. By controlling the condition number during training, we improve the model's resistance to being fine-tuned on harmful data. Our method is supported both in theory and in practice, yielding promising results. We hope this approach contributes to making AI models safer for public release.