Poster in Workshop: Exploration in AI Today (EXAIT)
Bayesian Hypothesis Testing Policy Regularization
Sarah Rathnam · Susan Murphy · Finale Doshi-Velez
Keywords: reinforcement learning, regularization, Bayesian hypothesis testing
In reinforcement learning (RL), sparse feedback makes it difficult to target long-term outcomes, often resulting in high-variance policies. Real-world interventions instead rely on prior study data, expert input, or short-term proxies to guide exploration. In this work, we propose Bayesian Hypothesis Testing Policy Regularization (BHTPR), a method that integrates a previously learned policy with a policy learned online. BHTPR uses Bayesian hypothesis testing to determine, state by state, when to transfer the prior policy and when to rely on online learning.
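The abstract does not specify the exact form of BHTPR's hypothesis test, so the following is a minimal illustrative sketch, not the authors' method. It assumes independent Gaussian posteriors over each policy's value in a given state (e.g., from a Bayesian Q-learning backend) and a simple posterior-probability threshold for deciding when to transfer; the function and parameter names (`select_action`, `mu_prior`, `threshold`, and so on) are hypothetical.

```python
# Hypothetical per-state Bayesian test between a prior policy and an
# online-learned policy. Assumes Gaussian posteriors over each policy's
# state value; these inputs are assumptions, not part of the abstract.
import numpy as np
from scipy.stats import norm

def select_action(state, prior_policy, online_policy,
                  mu_prior, var_prior, mu_online, var_online,
                  threshold=0.5):
    """Return the prior policy's action when the posterior probability
    that it is at least as good as the online policy exceeds `threshold`;
    otherwise defer to the online policy."""
    # P(V_prior(s) >= V_online(s)) under independent Gaussian posteriors:
    # the difference of two independent Gaussians is Gaussian.
    diff_mean = mu_prior[state] - mu_online[state]
    diff_std = np.sqrt(var_prior[state] + var_online[state])
    p_prior_better = norm.cdf(diff_mean / diff_std)
    if p_prior_better >= threshold:
        return prior_policy(state)   # transfer the previously learned policy
    return online_policy(state)      # rely on online learning
```

A state-by-state decision of this kind lets the agent keep the prior policy where evidence still favors it while switching to online learning in states where the new data contradicts the prior, which matches the high-level behavior the abstract describes.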