

Poster in Workshop: Exploration in AI Today (EXAIT)

Bayesian Hypothesis Testing Policy Regularization

Sarah Rathnam · Susan Murphy · Finale Doshi-Velez

Keywords: [ reinforcement learning ] [ regularization ] [ Bayesian hypothesis testing ]


Abstract:

In reinforcement learning (RL), sparse feedback makes it difficult to target long-term outcomes, often resulting in high-variance policies. Real-world interventions instead rely on prior study data, expert input, or short-term proxies to guide exploration. In this work, we propose Bayesian Hypothesis Testing Policy Regularization (BHTPR), a method that integrates a previously learned policy with a policy learned online. BHTPR uses Bayesian hypothesis testing to determine, state by state, when to transfer the prior policy and when to rely on online learning.
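The abstract does not specify the form of the test, but the state-wise gating idea can be sketched as follows. This is a minimal illustration under assumed details: a conjugate Gaussian model over the advantage of the prior policy's action relative to the online policy's in a given state, with the posterior probability of "the prior policy is at least as good here" used as the test statistic. The class name `StateGate`, the Gaussian model, and the threshold are all hypothetical, not the authors' formulation.

```python
import numpy as np
from scipy.stats import norm

class StateGate:
    """Per-state Bayesian test: does online evidence still support the prior policy?

    Illustrative sketch only; the actual BHTPR test may differ.
    """

    def __init__(self, prior_mean=0.0, prior_var=1.0, obs_var=1.0):
        # Conjugate Normal prior over the advantage of the prior policy's
        # action relative to the online policy's action in this state.
        self.mean, self.var = prior_mean, prior_var
        self.obs_var = obs_var  # assumed known observation noise

    def update(self, advantage_sample):
        # Standard Normal-Normal conjugate update with known variance.
        precision = 1.0 / self.var + 1.0 / self.obs_var
        self.mean = (self.mean / self.var
                     + advantage_sample / self.obs_var) / precision
        self.var = 1.0 / precision

    def use_prior_policy(self, threshold=0.5):
        # H0: the prior policy is at least as good here (advantage >= 0).
        # Follow the prior policy while the posterior still supports H0.
        p_h0 = 1.0 - norm.cdf(0.0, loc=self.mean, scale=np.sqrt(self.var))
        return p_h0 >= threshold
```

In use, an agent would keep one such gate per state and, on each visit, act with the prior policy when the test favors it and with the online policy otherwise, e.g. `action = prior_policy(s) if gates[s].use_prior_policy() else online_policy(s)`.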
