Poster in Workshop: Exploration in AI Today (EXAIT)
Bayesian Hypothesis Testing Policy Regularization
Sarah Rathnam · Susan Murphy · Finale Doshi-Velez
Keywords: reinforcement learning, regularization, Bayesian hypothesis testing
In reinforcement learning (RL), sparse feedback makes it difficult to target long-term outcomes, often resulting in high-variance policies. Real-world interventions instead rely on prior study data, expert input, or short-term proxies to guide exploration. In this work, we propose Bayesian Hypothesis Testing Policy Regularization (BHTPR), a method that integrates a previously learned policy with a policy learned online. BHTPR uses Bayesian hypothesis testing to determine, state by state, when to transfer the prior policy and when to rely on online learning.
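The abstract does not specify the exact form of BHTPR's hypothesis test, so the following is a minimal illustrative sketch, not the authors' method. It assumes independent Gaussian posteriors over each policy's value in a given state (e.g., from a Bayesian Q-learning backend) and a simple posterior-probability threshold for deciding when to transfer; the function and parameter names (`select_action`, `mu_prior`, `threshold`, and so on) are hypothetical.

```python
# Hypothetical per-state Bayesian test between a prior policy and an
# online-learned policy. Assumes Gaussian posteriors over each policy's
# state value; these inputs are assumptions, not part of the abstract.
import numpy as np
from scipy.stats import norm

def select_action(state, prior_policy, online_policy,
                  mu_prior, var_prior, mu_online, var_online,
                  threshold=0.5):
    """Return the prior policy's action when the posterior probability
    that it is at least as good as the online policy exceeds `threshold`;
    otherwise defer to the online policy."""
    # P(V_prior(s) >= V_online(s)) under independent Gaussian posteriors:
    # the difference of two independent Gaussians is Gaussian.
    diff_mean = mu_prior[state] - mu_online[state]
    diff_std = np.sqrt(var_prior[state] + var_online[state])
    p_prior_better = norm.cdf(diff_mean / diff_std)
    if p_prior_better >= threshold:
        return prior_policy(state)   # transfer the previously learned policy
    return online_policy(state)      # rely on online learning
```

A state-by-state decision of this kind lets the agent keep the prior policy where evidence still favors it while switching to online learning in states where the new data contradicts the prior, which matches the high-level behavior the abstract describes.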