Poster in Workshop on Technical AI Governance
Locking Open Weight Models with Spectral Deformation
Domenic Rosati · Sebastian Dionicio · Xijie Zeng · Subhabrata Majumdar · Frank Rudzicz · Hassan Sajjad
To protect against misuse of open weight foundation models, we develop a method that globally "locks" training using convergence rate control. The lock is constructed by making the convergence rate too slow and unstable for convergence to be expected on any downstream task. We find that the conditioning of the Hessian of neural network losses, which governs convergence rate, can be provably controlled through the spectral values of the weight matrices alone. We leverage this guarantee to develop an efficient iterative algorithm, and we provide early empirical results that show the effectiveness of our lock. To inform the use of this algorithm in practice, we provide a policy analysis through the lens of digital rights management and anti-circumvention law. We use this analogy to illustrate how training locks can support AI governance more broadly.
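To make the core mechanism concrete, here is a minimal sketch of how worsening a weight matrix's conditioning via its singular values might look. This is not the authors' iterative algorithm, which the abstract does not detail; it is a one-shot SVD rescaling under assumed names (`spectrally_deform`, `target_condition` are hypothetical), illustrating only the idea that deforming the spectrum of the weights degrades the conditioning that governs fine-tuning convergence.

```python
import torch

def spectrally_deform(weight: torch.Tensor, target_condition: float = 1e6) -> torch.Tensor:
    """Rescale a weight matrix's singular values so its condition number
    (sigma_max / sigma_min) reaches `target_condition`, leaving the
    singular vectors untouched.

    Ill-conditioned weights propagate into an ill-conditioned loss Hessian,
    slowing and destabilizing gradient-based fine-tuning (the "lock").
    """
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    s_max = S[0]  # singular values are returned in descending order
    n = S.numel()
    # Geometrically interpolate the spectrum from s_max down to
    # s_max / target_condition, fixing the desired condition number.
    exponents = torch.linspace(0.0, 1.0, n, device=S.device)
    S_deformed = s_max * (1.0 / target_condition) ** exponents
    return U @ torch.diag(S_deformed) @ Vh

# Usage: deform every linear layer of a toy model in place.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
)
with torch.no_grad():
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            module.weight.copy_(spectrally_deform(module.weight))
```

A single SVD rescaling like this changes the weights themselves, so the paper's iterative approach presumably trades off deformation strength against preserving the model's original behavior; the sketch above shows only the spectral-control step.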