

Poster in Workshop: 2nd Workshop on Models of Human Feedback for AI Alignment (MoFA)

What Matters when Modeling Human Behavior using Imitation Learning?

Aneri Muni · Esther Derman · Vincent Taboga · Pierre-Luc Bacon · Erick Delage


Abstract:

As AI systems become increasingly embedded in human decision-making processes, aligning their behavior with human values is critical to ensuring safe and trustworthy deployment. A central approach to AI Alignment, Imitation Learning (IL), trains a learner to directly mimic desirable human behaviors from expert demonstrations. However, standard IL methods assume that (1) experts act to optimize expected returns and (2) expert policies are Markovian. Both assumptions are inconsistent with empirical findings from behavioral economics, according to which humans (1) are risk-sensitive and (2) make decisions based on past experience. In this work, we examine the implications of risk sensitivity for IL and show that standard approaches do not capture all optimal policies under risk-sensitive decision criteria. By characterizing these expert policies, we identify key limitations of existing IL algorithms in replicating expert performance in risk-sensitive settings. Our findings underscore the need for new IL frameworks that account for both risk-aware preferences and temporal dependencies to faithfully align AI behavior with human experts.
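To make the contrast concrete, here is a minimal sketch of the two objectives the abstract distinguishes. The abstract does not specify which risk measure the experts optimize; the use of CVaR at level alpha below is an illustrative assumption, chosen because it is a standard risk-sensitive criterion whose optimal policies are known to be history-dependent in the original state space.

% Risk-neutral criterion assumed by standard IL: the expert maximizes expected return,
% for which a Markovian (state-dependent) optimal policy always exists in an MDP.
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t} r_t\right]

% Illustrative risk-sensitive alternative (CVaR at level \alpha; an assumption here,
% since the abstract does not fix a particular risk measure):
\max_{\pi} \; \mathrm{CVaR}_{\alpha}\!\left(\sum_{t=0}^{T} \gamma^{t} r_t\right)

% Optimizing CVaR of the return generally requires history-dependent policies
% (or an augmented state tracking accumulated reward / remaining risk budget),
% so an imitator restricted to Markovian policies cannot represent every such expert.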
