

Poster

A Reductions Approach to Risk-Sensitive Reinforcement Learning with Optimized Certainty Equivalents

Kaiwen Wang · Dawen Liang · Nathan Kallus · Wen Sun

West Exhibition Hall B2-B3 #W-1006
Tue 15 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

We study risk-sensitive RL, where the goal is to learn a history-dependent policy that optimizes some risk measure of cumulative rewards. We consider a family of risks called the optimized certainty equivalents (OCE), which captures important risk measures such as conditional value-at-risk (CVaR), entropic risk, and Markowitz's mean-variance. In this setting, we propose two meta-algorithms: one grounded in optimism and another based on policy gradients, both of which can leverage the broad suite of risk-neutral RL algorithms in an augmented Markov Decision Process (MDP). Via a reductions approach, we leverage theory for risk-neutral RL to establish novel OCE bounds in complex, rich-observation MDPs. For the optimism-based algorithm, we prove bounds that generalize prior results in CVaR RL and that provide the first risk-sensitive bounds for exogenous block MDPs. For the gradient-based algorithm, we establish both monotone improvement and global convergence guarantees under a discrete reward assumption. Finally, we empirically show that our algorithms learn the optimal history-dependent policy in a proof-of-concept MDP, where all Markovian policies provably fail.
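For readers unfamiliar with the risk family named in the abstract: under the standard Ben-Tal-Teboulle definition (the abstract itself does not state the formula), the OCE of a random return X under a concave, nondecreasing utility u with u(0) = 0 is

\mathrm{OCE}_u(X) \;=\; \sup_{b \in \mathbb{R}} \big\{\, b + \mathbb{E}[\,u(X - b)\,] \,\big\},

which recovers CVaR at level \alpha with u(t) = \min(t, 0)/\alpha, entropic risk with parameter \gamma with u(t) = (1 - e^{-\gamma t})/\gamma, and mean-variance with u(t) = t - c\,t^2.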

Lay Summary:

In high-stakes settings (e.g., healthcare, finance, systems), we often care not only about the average outcome, but also about avoiding bad outcomes, tail events, or reducing variance. Our paper proposes a framework for solving these risk-sensitive applications via reinforcement learning with the optimized certainty equivalent, a broad class of risk measures that captures important cases such as Conditional Value-at-Risk (CVaR) and mean-variance. We reduce the challenging risk-sensitive RL problem to a standard RL problem, enabling the use of many existing algorithms from the literature. By combining our reduction with risk-neutral RL methods, we derive strong theoretical guarantees even in tasks with high-dimensional state spaces, such as exogenous block MDPs. In sum, our work shows that practical, risk-sensitive objectives can be addressed using well-established RL techniques through a principled reduction framework.
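As a concrete illustration of the OCE objective mentioned above (a minimal sketch, not the authors' code; the helper names oce, cvar_utility, and mean_variance_utility are illustrative), the following Python snippet estimates the OCE of sampled returns by maximizing b + E[u(X - b)] over a grid of candidate b values, using utilities that recover CVaR and mean-variance:

import numpy as np

def oce(returns, u, grid=None):
    """Empirical optimized certainty equivalent:
    OCE_u(X) = max_b { b + E[u(X - b)] }, estimated on samples."""
    returns = np.asarray(returns, dtype=float)
    if grid is None:
        # Search b over the observed range of returns.
        grid = np.linspace(returns.min(), returns.max(), 1001)
    values = np.array([b + u(returns - b).mean() for b in grid])
    return grid[np.argmax(values)], values.max()

def cvar_utility(alpha):
    # OCE with this utility equals CVaR at level alpha (lower tail of rewards).
    return lambda t: np.minimum(t, 0.0) / alpha

def mean_variance_utility(c):
    # OCE with this utility equals E[X] - c * Var(X).
    return lambda t: t - c * t**2

# Example: CVaR_{0.1} and mean-variance of simulated returns.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=1.0, size=10_000)
b_star, cvar_est = oce(samples, cvar_utility(0.1))
_, mv_est = oce(samples, mean_variance_utility(0.5))
print(f"optimal b ~ {b_star:.3f}, CVaR_0.1 estimate ~ {cvar_est:.3f}, mean-variance estimate ~ {mv_est:.3f}")

The one-dimensional search over b is what the paper's augmented-MDP reduction exploits: once b is treated as part of the (augmented) state, the inner expectation can be optimized with any risk-neutral RL algorithm.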
