

Poster

Reinforcement Learning with Random Time Horizons

Enric Borrell · Lorenz Richter · Christof Schuette

West Exhibition Hall B2-B3 #W-1020
Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

We extend the standard reinforcement learning framework to random time horizons. While the classical setting typically assumes finite, deterministic or infinite trajectory runtimes, we argue that many real-world applications naturally exhibit random (potentially trajectory-dependent) stopping times. Since those stopping times typically depend on the policy, their randomness affects policy gradient formulas, which we derive rigorously in this work, in most cases for the first time, for both stochastic and deterministic policies. We present two complementary perspectives, one trajectory-based and one state-space-based, and establish connections to optimal control theory. Our numerical experiments demonstrate that using the proposed formulas can significantly improve optimization convergence compared to traditional approaches.
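To make the setting concrete, the sketch below shows a plain score-function (REINFORCE) policy-gradient estimate on episodes whose length is random and policy-dependent because each trajectory stops when the state first leaves a region. This is only an illustration of the problem setup, not the corrected gradient formulas derived in the paper; the 1D dynamics, quadratic cost, and stopping region are assumptions made up for the example.

```python
# Minimal sketch: naive REINFORCE on trajectories with a random,
# trajectory-dependent stopping time (first exit from (-1, 1)).
# Environment and cost are illustrative assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, x0=0.0, dt=0.1, sigma=0.5, max_steps=200):
    """One trajectory under a linear Gaussian policy a = theta * x + noise.
    The episode ends at the random first-exit time from (-1, 1) or at max_steps."""
    x, logp_grads, rewards = x0, [], []
    for _ in range(max_steps):
        mean = theta * x
        a = mean + sigma * rng.standard_normal()
        # grad_theta log N(a; theta * x, sigma^2) = (a - theta * x) * x / sigma^2
        logp_grads.append((a - mean) * x / sigma**2)
        rewards.append(-(x**2 + 0.1 * a**2) * dt)      # running cost as negative reward
        x = x + a * dt + 0.2 * np.sqrt(dt) * rng.standard_normal()
        if abs(x) >= 1.0:                              # policy-dependent stopping time
            rewards[-1] += 1.0                         # terminal bonus at the boundary
            break
    return np.array(logp_grads), np.array(rewards)

def policy_gradient_estimate(theta, n_traj=256):
    """Monte Carlo score-function estimate; the inner sum has a different
    (random) length for each trajectory because the stopping time varies."""
    grads = []
    for _ in range(n_traj):
        g, r = rollout(theta)
        grads.append(r.sum() * g.sum())                # return up to the stopping time
    return float(np.mean(grads))

theta = 0.0
for _ in range(50):                                    # plain gradient ascent
    theta += 0.05 * policy_gradient_estimate(theta)
print("learned feedback gain:", theta)
```

Because the stopping time itself depends on the policy parameters, this naive estimator ignores part of the gradient; the paper's contribution is to derive the additional terms that account for that dependence.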

Lay Summary:

In this work, we improve reinforcement learning in situations where it is unclear how long the task should last. Previous work assumes either a fixed amount of time or a task that goes on forever. But in real life, tasks often end at random times: a game might end early if the player loses, or a robot might stop working if its battery runs out. We show that these random endings affect how the learning process should work, especially when it comes to adjusting the system to improve over time. We carefully work out how to change the learning process (using so-called policy gradients) when tasks end randomly, for two types of learning systems: those that make decisions randomly and those that make deterministic decisions. Our experiments show that our method helps the learning process work faster and better than previous approaches.
