

Poster in Workshop: Actionable Interpretability

Koopman Autoencoders Learn Neural Representation Dynamics

Nishant Suresh Aswani · Saif Jabari

Sat 19 Jul 10:40 a.m. PDT — 11:40 a.m. PDT

Abstract:

This paper explores a simple question: can we model the internal transformations of a neural network using dynamical systems theory? We introduce Koopman autoencoders to capture how neural representations evolve through network layers, treating these representations as states in a dynamical system. Our approach learns a surrogate model that predicts how neural representations transform from input to output, with two key advantages. First, by lifting the original states with an autoencoder, it operates in a linear space, making it straightforward to edit the dynamics. Second, it preserves the topologies of the original representations by regularizing the autoencoding objective. We demonstrate that these surrogate models naturally replicate the progressive topological simplification observed in neural networks. As a practical application, we show how our approach enables targeted class unlearning in the Yin-Yang and MNIST classification tasks.
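The abstract describes the surrogate model at a high level: an autoencoder lifts each layer's representation into a latent space, a linear (Koopman) operator advances the latent state from one layer to the next, and the autoencoding objective is regularized to preserve topology. The sketch below, in PyTorch, illustrates that structure under stated assumptions; the module names, dimensions, loss weights, and the `topo_reg` placeholder are illustrative, not the authors' exact architecture or training recipe.

```python
# Minimal sketch of a Koopman autoencoder for layer-to-layer representation
# dynamics (assumed architecture; details are illustrative, not the paper's).
import torch
import torch.nn as nn


class KoopmanAutoencoder(nn.Module):
    def __init__(self, state_dim: int, latent_dim: int):
        super().__init__()
        # Lift the representation into a latent space where dynamics are linear.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )
        # Linear Koopman operator: one step of latent dynamics (layer t -> t+1).
        self.koopman = nn.Linear(latent_dim, latent_dim, bias=False)
        # Map latent states back to the original representation space.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, state_dim),
        )

    def forward(self, x_t: torch.Tensor):
        z_t = self.encoder(x_t)             # lift current layer's representation
        z_next = self.koopman(z_t)          # advance one layer linearly
        x_t_rec = self.decoder(z_t)         # reconstruct current representation
        x_next_pred = self.decoder(z_next)  # predict next layer's representation
        return x_t_rec, x_next_pred


def surrogate_loss(model, x_t, x_next, topo_reg=None, lam=1.0):
    """Reconstruction + one-step prediction loss.

    `topo_reg` is a stand-in for the topology-preserving regularizer mentioned
    in the abstract (its exact form is not specified here and is assumed).
    """
    x_t_rec, x_next_pred = model(x_t)
    loss = nn.functional.mse_loss(x_t_rec, x_t)
    loss = loss + nn.functional.mse_loss(x_next_pred, x_next)
    if topo_reg is not None:
        loss = loss + lam * topo_reg(model.encoder(x_t), x_t)
    return loss
```

Because the latent dynamics are linear, interventions such as the targeted class unlearning mentioned in the abstract can in principle be expressed as direct edits to the learned Koopman operator rather than retraining the original network.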
