

Poster in Workshop: Workshop on Technical AI Governance

Locking Open Weight Models with Spectral Deformation

Domenic Rosati · Sebastian Dionicio · Xijie Zeng · Subhabrata Majumdar · Frank Rudzicz · Hassan Sajjad


Abstract:

To protect against misuse of open weight foundation models, we develop a method that globally "locks" training via convergence rate control. The lock is constructed by making the convergence rate too slow and unstable to expect convergence on any downstream task. We find that the conditioning of the Hessian of neural network losses, which governs the convergence rate, can be provably controlled using the spectral values of the weight matrices alone. We leverage this guarantee to develop an efficient iterative algorithm and present early empirical results demonstrating the effectiveness of our lock. To ground this algorithm in practice, we provide a policy analysis through the lens of digital rights management and anti-circumvention law, and we use this analogy to illustrate how training locks can support AI governance more broadly.
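The abstract does not spell out the deformation itself; the sketch below is a hypothetical illustration of the underlying idea, assuming the lock inflates the condition number kappa = sigma_max / sigma_min of each weight matrix by rescaling its singular values while keeping the singular vectors fixed. The function name spectral_deform and the kappa_target knob are illustrative assumptions, not the paper's actual algorithm or API.

import torch

def spectral_deform(weight: torch.Tensor, kappa_target: float = 1e6) -> torch.Tensor:
    # Hypothetical sketch: worsen the conditioning of a weight matrix by
    # stretching its singular-value spectrum, which (per the paper's claim)
    # ill-conditions the loss Hessian and slows downstream fine-tuning.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    s_max = S[0]  # torch returns singular values in descending order
    # Geometrically interpolate from sigma_max down to sigma_max / kappa_target,
    # so the deformed matrix has condition number exactly kappa_target.
    exponents = torch.linspace(0.0, 1.0, S.numel(), device=S.device)
    S_deformed = s_max * kappa_target ** (-exponents)
    return U @ torch.diag(S_deformed) @ Vh

# Usage: deform a layer's weights before releasing the model.
with torch.no_grad():
    layer = torch.nn.Linear(512, 512)
    layer.weight.copy_(spectral_deform(layer.weight))

Note that this sketch only shows the conditioning mechanism; an actual training lock would also need to preserve the model's forward behavior while degrading its trainability, which the paper's iterative algorithm presumably handles.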
