Poster
Online Learning in Risk-Sensitive Constrained MDP
Arnob Ghosh · Mehrdad Moharrami
West Exhibition Hall B2-B3 #W-610
In many practical sequential decision-making applications (e.g., finance, safe navigation), it is important to ensure that a risk-based measure of the cost stays below a certain threshold. The standard CMDP setting, however, can only handle constraints on the expected cost. This is the first paper to obtain regret and violation bounds for an MDP with entropic risk constraints. We show that, unlike in the unconstrained setup, a Markovian policy may not be optimal here. Hence, we augment the state space and consider a constrained optimized certainty equivalent (OCE). To obtain the regret and violation bounds, we overcome challenges specific to the infinite augmented state space and to the lack of strong duality caused by the non-linearity of the risk measure, unlike the traditional CMDP setting. Some important questions remain open, such as whether the bounds can be improved.
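As a concrete illustration, here is a minimal sketch of the entropic risk measure the constraint is built on, assuming the standard definition rho_beta(X) = (1/beta) log E[exp(beta X)]; the function name, the empirical-average estimator, and the sample values are illustrative choices, not from the paper.

```python
import math

def entropic_risk(costs, beta):
    """Empirical entropic risk: (1/beta) * log( mean(exp(beta * c)) ).

    As beta -> 0 this approaches the plain expectation E[X] (the usual
    CMDP constraint); larger beta penalizes high-cost outcomes more,
    modeling risk aversion toward the cost.
    """
    if beta == 0:
        return sum(costs) / len(costs)
    # Log-sum-exp trick for numerical stability with large beta * cost.
    m = max(beta * c for c in costs)
    lse = m + math.log(sum(math.exp(beta * c - m) for c in costs) / len(costs))
    return lse / beta

# A degenerate (constant) cost: the risk equals the cost itself.
print(entropic_risk([2.0, 2.0, 2.0], beta=1.0))  # → 2.0

# A risky cost distribution: entropic risk exceeds the mean (5.0),
# so a constraint rho_beta(cost) <= threshold is stricter than
# E[cost] <= threshold for risky outcomes.
print(entropic_risk([0.0, 10.0], beta=0.5))
```

A risk constraint of the kind studied here would then require this quantity, evaluated on the cumulative cost of a policy, to stay below a given threshold.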