ICML Poster: Learning Utilities from Demonstrations in Markov Decision Processes

Filippo Lazzati · Alberto Maria Metelli

West Exhibition Hall B2-B3 #W-703

Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: Although it is well-known that humans commonly engage in *risk-sensitive* behaviors in the presence of stochasticity, most Inverse Reinforcement Learning (IRL) models assume a *risk-neutral* agent. As such, beyond $(i)$ introducing model misspecification, $(ii)$ they do not permit direct inference of the risk attitude of the observed agent, which can be useful in many applications. In this paper, we propose a novel model of behavior to cope with these issues. By allowing for risk sensitivity, our model alleviates $(i)$, and by explicitly representing risk attitudes through (learnable) *utility* functions, it solves $(ii)$. Then, we characterize the partial identifiability of an agent’s utility under the new model and note that demonstrations from multiple environments mitigate the problem. We devise two provably efficient algorithms for learning utilities in a finite-data regime, and we conclude with some proof-of-concept experiments to validate *both* our model and our algorithms.
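The idea of representing a risk attitude through a utility function can be illustrated with textbook expected-utility reasoning. The sketch below is a generic toy, not the authors' model or algorithms: it assumes a one-shot lottery over returns, and the lotteries, the utilities (sqrt, linear, square) and the helper names `expected_utility`, `certainty_equivalent` and `prefers` are all hypothetical. A concave utility values a gamble below its mean (risk-averse), a linear one at its mean (risk-neutral), and a convex one above it (risk-seeking). The last lines show one standard reason why a utility can only be partially identified from choices: a positive affine transform of it ranks every pair of lotteries identically. Whether this coincides with the paper's identifiability characterization is not claimed here.

```python
import math

# A lottery is a list of (probability, return) pairs; toy values only.
gamble = [(0.5, 0.0), (0.5, 4.0)]  # mean return 2.0
sure = [(1.0, 1.5)]                # a sure return of 1.5


def expected_utility(lottery, u):
    """Expected utility E[u(G)] of a lottery over returns G."""
    return sum(p * u(g) for p, g in lottery)


def certainty_equivalent(lottery, u, u_inv):
    """Sure return the agent values exactly as much as the lottery."""
    return u_inv(expected_utility(lottery, u))


def prefers(a, b, u):
    """True if an expected-utility maximizer with utility u strictly prefers a to b."""
    return expected_utility(a, u) > expected_utility(b, u)


# Hypothetical utilities, each paired with its inverse on returns >= 0.
utilities = {
    "risk-averse (concave, sqrt)": (math.sqrt, lambda y: y ** 2),
    "risk-neutral (linear)": (lambda x: x, lambda y: y),
    "risk-seeking (convex, square)": (lambda x: x ** 2, math.sqrt),
}

for name, (u, u_inv) in utilities.items():
    ce = certainty_equivalent(gamble, u, u_inv)
    print(f"{name}: certainty equivalent {ce:.3f} vs mean 2.000")


def affine_sqrt(x):
    """Positive affine transform of sqrt: 3 * sqrt(x) + 1."""
    return 3.0 * math.sqrt(x) + 1.0


# Both utilities rank the sure return above the gamble, so these choices
# cannot distinguish sqrt from its affine transform.
print(prefers(sure, gamble, math.sqrt), prefers(sure, gamble, affine_sqrt))  # True True
```

Running it gives certainty equivalents of 1.000, 2.000 and about 2.828 for the concave, linear and convex utilities, so the same gamble is valued differently purely because of the agent's risk attitude.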

Lay Summary:

How can we effectively learn an agent’s risk attitude from demonstrations of their behavior? We address this question by considering a key aspect often overlooked by existing methods: human behavior depends not just on the current state, but on the full history of past experiences. We introduce a new behavioral model that explicitly captures an agent’s risk attitude while allowing decisions to depend on the entire past. We analyze the theoretical properties of this model and develop principled algorithms to infer risk preferences from observed behavior. Our experiments demonstrate that our model outperforms existing approaches. This work lays the foundation for more realistic representations of human behavior in sequential decision-making: models that reflect the complex ways in which past experiences shape present choices.
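Why a risk-sensitive agent's decisions can depend on the past, and not only on the current state, can be seen in a small toy example. This is an illustrative sketch under stated assumptions, not the paper's model: it assumes an agent that maximizes the expected utility of its *total* return, with a hypothetical concave utility `u = sqrt`, facing a hypothetical decision point where a "safe" action adds 1 for sure and a "risky" action adds 0 or 3 with equal probability. The helper `risky_is_preferred` is likewise illustrative. Because the utility is applied to the cumulative return, the reward already collected changes which action is optimal at the very same state.

```python
import math


def risky_is_preferred(accumulated, u=math.sqrt):
    """Does an agent maximizing E[u(total return)] pick the risky action?

    Hypothetical decision point: 'safe' adds 1 to the return for sure;
    'risky' adds 0 or 3 with equal probability.
    """
    safe = u(accumulated + 1.0)
    risky = 0.5 * u(accumulated + 0.0) + 0.5 * u(accumulated + 3.0)
    return risky > safe


# Same decision point, two different histories (reward collected so far).
for past_reward in (0.0, 10.0):
    choice = "risky" if risky_is_preferred(past_reward) else "safe"
    print(f"accumulated reward {past_reward:>4}: choose {choice}")
```

With the sqrt utility, the agent chooses "safe" after an accumulated reward of 0 and "risky" after 10. Passing a linear (risk-neutral) utility such as `u=lambda x: x` makes "risky" optimal at every history, since its expected gain of 1.5 always exceeds 1; the dependence on the past disappears, which mirrors why risk-neutral models can describe behavior with state-based policies alone.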