Poster
Maximum Total Correlation Reinforcement Learning
Bang You · Puze Liu · Huaping Liu · Jan Peters · Oleg Arenz
West Exhibition Hall B2-B3 #W-700
Simplicity is a powerful inductive bias. In reinforcement learning, regularization is used for simpler policies, data augmentation for simpler representations, and sparse reward functions for simpler objectives, all with the underlying motivation of increasing generalizability and robustness by focusing on the essentials. Complementary to these techniques, we investigate how to promote simple behavior throughout the episode. To that end, we introduce a modification of the reinforcement learning problem that additionally maximizes the total correlation within the induced trajectories. We propose a practical algorithm that optimizes all models, including the policy and the state representation, based on a lower-bound approximation. In simulated robot environments, our method naturally generates policies that induce periodic and compressible trajectories and that exhibit superior robustness to noise and changes in dynamics compared to baseline methods, while also improving performance on the original tasks.
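For readers who want the underlying quantity: the total correlation of a set of variables is the gap between the sum of their marginal entropies and their joint entropy. A minimal sketch of a modified objective in the spirit of the abstract, assuming a simple additive trade-off with weight $\alpha$ and writing the trajectory in terms of generic per-step variables $x_1, \dots, x_T$ (which could be states or learned representations; the exact formulation in the paper may differ), is:

\[
\mathcal{C}(x_1, \dots, x_T) \;=\; \sum_{t=1}^{T} H(x_t) \;-\; H(x_1, \dots, x_T),
\qquad
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T} r_t\right] \;+\; \alpha\, \mathcal{C}(x_1, \dots, x_T).
\]

Maximizing $\mathcal{C}$ rewards trajectories whose steps are highly predictable from one another, i.e., compressible when described as a whole.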
When training artificial intelligence (AI) agents to act in the real world, we want them to behave as simply as possible while still solving their tasks. Simpler behavior is easier to understand and predict, and it is more likely to keep working in situations the AI did not encounter during training. However, it is difficult to promote simple behavior without a clear way to measure it.

In this work, we investigate a concept called total correlation to quantify the simplicity of an AI's behavior. Think of total correlation as measuring how many fewer bytes you would need to describe the behavior to someone else if your communication were optimized for describing the full behavior as a whole rather than each time step individually. For example, a humanoid robot with a clean, periodic gait has higher total correlation than one with a slightly irregular gait that makes unnecessary adaptations to sensor noise.

We propose a method to learn behaviors with higher total correlation, and our tests on simulated robots yielded impressive results. Our method naturally generated highly predictable and compressible behaviors. For tasks that benefit from it, such as a robot's gait, this often meant periodic movements (like a regular walking pattern). We observed significant advantages, including superior robustness to unexpected noise and to modeling errors (such as inaccurate masses during training), across various tasks, even those requiring complex, non-periodic movements. Crucially, these advantages came without sacrificing performance; in fact, our robots actually improved at their original tasks compared to standard methods.

This research offers a new perspective on training AI, suggesting that explicitly optimizing for simplicity in behavior can lead to more reliable, adaptable, and ultimately more trustworthy intelligent systems for real-world applications.
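To make the byte-counting intuition above concrete, here is a minimal, self-contained toy sketch (our own illustration, not code from the paper) that estimates total correlation from sampled "trajectories": a perfectly periodic behavior with a random phase versus a behavior that reacts independently to noise at every step. All names and the toy setup are assumptions made for illustration only.

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of hashable samples."""
    counts = Counter(samples)
    n = len(samples)
    probs = np.array([c / n for c in counts.values()])
    return float(-(probs * np.log2(probs)).sum())

def total_correlation(trajectories):
    """TC = sum_t H(X_t) - H(X_1, ..., X_T), estimated from sampled trajectories."""
    T = len(trajectories[0])
    marginal = sum(entropy([traj[t] for traj in trajectories]) for t in range(T))
    joint = entropy([tuple(traj) for traj in trajectories])
    return marginal - joint

rng = np.random.default_rng(0)
T, N = 8, 20000

# "Clean periodic gait": alternating 0/1 with a random phase per episode.
periodic = [[(t + phase) % 2 for t in range(T)] for phase in rng.integers(0, 2, N)]
# "Irregular gait": each step reacts independently to noise.
noisy = [list(rng.integers(0, 2, T)) for _ in range(N)]

print(total_correlation(periodic))  # ~ T - 1 = 7 bits saved by describing the episode as a whole
print(total_correlation(noisy))     # ~ 0 bits: knowing the whole episode saves nothing
```

The periodic behavior saves roughly T - 1 bits when described as a whole, while the step-by-step noisy behavior saves essentially nothing, which is exactly the compression intuition behind total correlation.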