Poster
Embedding Safety into RL: A New Take on Trust Region Methods
Nikola Milosevic · Johannes Müller · Nico Scherf
West Exhibition Hall B2-B3 #W-518
Reinforcement Learning (RL) agents can solve diverse tasks but often exhibit unsafe behavior. Constrained Markov Decision Processes (CMDPs) address this by enforcing safety constraints, yet existing methods either sacrifice reward maximization or allow unsafe training. We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the policy space geometry to ensure trust regions contain only safe policies, guaranteeing constraint satisfaction throughout training. We analyze its theoretical properties and connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.
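As a rough sketch of the formal setup behind the abstract (using generic CMDP notation introduced here for illustration, not necessarily the paper's exact symbols): the agent maximizes a reward return J_r while keeping a cost return J_c below a budget b, and each policy update solves a trust-region subproblem around the current policy.

\[
\max_{\pi} \; J_r(\pi) \quad \text{subject to} \quad J_c(\pi) \le b,
\]
\[
\pi_{k+1} \;=\; \arg\max_{\pi} \;
\mathbb{E}_{s,\,a \sim \pi_k}\!\left[ \frac{\pi(a \mid s)}{\pi_k(a \mid s)}\, A_r^{\pi_k}(s,a) \right]
\quad \text{subject to} \quad D(\pi, \pi_k) \le \delta .
\]

In standard TRPO, D is the (state-averaged) KL divergence, and safety has to be enforced separately, for example through CPO's additional cost constraint. The abstract's claim is that C-TRPO instead reshapes D so that the entire trust region {π : D(π, π_k) ≤ δ} lies inside the safe set {π : J_c(π) ≤ b}, which is why every intermediate policy satisfies the constraint; the precise construction of the reshaped divergence is given in the paper.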
Reinforcement learning (RL) is a type of AI that learns by trial and error, often achieving impressive results in games, robotics, and other tasks that require multi-step reasoning. But this trial-and-error process can lead to unsafe behavior while the system is still learning, such as breaking rules or taking risky actions. Our work introduces a new method called Constrained Trust Region Policy Optimization (C-TRPO) that helps RL systems stay safe during learning, without requiring task-specific assumptions. Instead of letting the system explore freely and hoping it stays within limits, C-TRPO carefully guides the learning process so that all new behaviors are safe by design. This means unsafe actions are avoided not just at the end, but throughout training. We also show how our method connects to other popular approaches and test it on several tasks. The results show that C-TRPO keeps the system within its safety limits while still performing well.