Poster
Learning Utilities from Demonstrations in Markov Decision Processes
Filippo Lazzati · Alberto Maria Metelli
West Exhibition Hall B2-B3 #W-703
How can we effectively learn an agent's risk attitude from demonstrations of their behavior? We address this question by considering a key aspect often overlooked by existing methods: human behavior depends not just on the current state, but on the full history of past experiences. We introduce a new behavioral model that explicitly captures an agent's risk attitude while allowing decisions to depend on the entire past. We analyze the theoretical properties of this model and develop principled algorithms for inferring risk preferences from observed behavior. Our experiments demonstrate that our model outperforms existing approaches. This work lays the foundation for more realistic representations of human behavior in sequential decision-making—models that reflect the complex ways in which past experiences shape present choices.
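To illustrate why risk attitudes naturally induce history-dependent behavior, here is a minimal sketch (not the authors' method; all names, payoffs, and the specific utility are illustrative assumptions): when an agent maximizes the expected utility E[U(return)] of its cumulative return for a nonlinear U, the optimal action depends on the reward accumulated so far, so the state must be augmented with the running return.

```python
import numpy as np

# Hedged sketch: a toy finite-horizon MDP where the agent maximizes
# E[U(cumulative return)] for a concave (risk-averse) utility U.
# Because U is nonlinear, the optimal action depends on the return
# accumulated so far (i.e., on the history), not just the current state.
# All numbers here are illustrative, not from the paper.

H = 3                                # horizon
rewards = {0: 1.0, 1: 2.5}           # action -> payoff if it succeeds
p_success = {0: 1.0, 1: 0.5}         # "safe" always pays 1; "risky" pays 2.5 w.p. 0.5

def utility(g, alpha=0.5):
    # Concave utility over the cumulative return -> risk aversion.
    return g ** alpha

def value(t, g):
    """Optimal expected U(final return) from step t with accumulated return g.

    Returns (value, best_action). Plain backward recursion; fine for tiny H.
    """
    if t == H:
        return utility(g), None
    best_v, best_a = -np.inf, None
    for a in (0, 1):
        p = p_success[a]
        v = (p * value(t + 1, g + rewards[a])[0]
             + (1 - p) * value(t + 1, g)[0])
        if v > best_v:
            best_v, best_a = v, a
    return best_v, best_a

# With little accumulated return, the concave utility makes the agent
# prefer the safe action; with a large accumulated return it behaves
# almost risk-neutrally and prefers the higher-mean risky action.
_, a_poor = value(H - 1, 0.0)    # -> 0 (safe)
_, a_rich = value(H - 1, 100.0)  # -> 1 (risky)
```

The flip between `a_poor` and `a_rich` is exactly the kind of history dependence the abstract refers to: the same agent, in the same environmental state, chooses differently depending on its past experiences.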