Poster in Workshop: Assessing World Models: Methods and Metrics for Evaluating Understanding
Measuring Belief Updates in Curious Agents
Joschka Strüber · Ilze Amanda Auzina · Shashwat Goel · Susanne Keller · Jonas Geiping · Ameya Pandurang Prabhu · Matthias Bethge
Keywords: [ agentic reasoning ] [ language models ] [ internal world model ] [ multi-turn reasoning ]
To effectively perform open-ended tasks, language models must identify gaps in their knowledge, take actions to acquire new information, and update their internal world models accordingly. This raises a key question: how can we assess whether their reasoning chains and multi-turn actions actually improve the beliefs in their internal world model? In this paper, we demonstrate a simple, scalable method for measuring belief updates by sequentially assessing the log-probabilities that a language model assigns to the true belief across multi-turn actions. We evaluate model belief updates on a multi-turn RL reasoning benchmark, 'Twenty Questions'. Our findings show that recent Qwen3 models struggle to update their beliefs, even when controlling for the quality of the generated questions. Through counterfactual experiments, we validate that finetuning teaches student models to perform coherent belief updates, which they could not do before. Intriguingly, we find that measuring model beliefs also enables detecting reward hacking in RL-trained models. Overall, we offer a novel perspective on measuring and understanding the intermediate beliefs of language models.
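As a rough illustration of the measurement described in the abstract, the sketch below tracks the total log-probability a causal language model assigns to the true answer after each turn of a Twenty Questions dialogue. It is not the authors' implementation: the model name, the elicitation prompt, the `answer_logprob` helper, and the example transcript are all illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): probe how the log-probability of
# the true belief evolves as more question-answer turns are appended.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"  # assumed; any causal LM with a HF checkpoint works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def answer_logprob(dialogue: str, answer: str) -> float:
    """Sum of token log-probs of `answer`, conditioned on the dialogue so far."""
    prompt = dialogue + "\nThe secret object is:"  # assumed elicitation prompt
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    answer_ids = tok(" " + answer, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Predictions for the answer tokens come from the positions just before them.
    log_probs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1 : -1], dim=-1)
    return log_probs.gather(1, answer_ids[0].unsqueeze(1)).sum().item()

# Example belief trajectory over turns (transcript is illustrative).
true_answer = "umbrella"
turns = [
    "Q: Is it alive? A: No.",
    "Q: Is it used outdoors? A: Yes.",
    "Q: Does it protect you from rain? A: Yes.",
]
dialogue = "Let's play Twenty Questions."
for t, turn in enumerate(turns, 1):
    dialogue += "\n" + turn
    print(f"turn {t}: log p(true answer) = {answer_logprob(dialogue, true_answer):.2f}")
```

A coherent belief update would show this log-probability rising as informative answers accumulate; a flat or erratic trajectory suggests the model is asking questions without integrating the evidence.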