

Poster in Workshop: Assessing World Models: Methods and Metrics for Evaluating Understanding

Evaluating Self-Orienting in Language and Reasoning Models

Eric Bigelow · Zergham Ahmed · Tomer Ullman

Keywords: [ Self-Representation ] [ Large Language Models ] [ Computational Cognitive Science ]


Abstract:

We present a novel evaluation approach, grounded in cognitive science research, that tests an agent's ability to self-orient (i.e., to identify what problem it is solving and which agent in the environment it is). Our task is a grid-world in which the agent must navigate to a goal but has no prior knowledge of the world, including which entity it controls. Humans solve this task in a two-step process: first figuring out which agent they control (self-orienting), and then navigating to the goal. We ask whether LLMs can do the same. We find that a state-of-the-art LLM (GPT-4o) can self-orient efficiently, with near-optimal performance, but that this ability disappears with in-context reasoning (OpenAI o4-mini). However, the ability reemerges in reasoning models trained with more advanced methods, such as backtracking (o3).
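The task and two-step strategy described in the abstract can be made concrete with a small sketch. The environment below, including the entity names (E0, E1, ...), grid size, and the probe-then-navigate policy, is an illustrative assumption rather than the authors' actual implementation: several entities are visible, only one responds to the agent's moves, so the agent must first probe to discover which entity it controls before walking it to the goal.

```python
# Minimal sketch (assumed setup, not the paper's code) of a self-orienting
# grid-world: the agent sees all entities and the goal, but not which entity
# it controls. It must infer control from how the world responds to its moves.
import random

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class SelfOrientingGridWorld:
    """Grid with several entities; only the hidden `controlled` one obeys moves."""

    def __init__(self, size=5, n_entities=3, seed=0):
        rng = random.Random(seed)
        cells = [(r, c) for r in range(size) for c in range(size)]
        rng.shuffle(cells)
        self.size = size
        self.entities = {f"E{i}": cells[i] for i in range(n_entities)}
        self.goal = cells[n_entities]
        self.controlled = rng.choice(list(self.entities))  # hidden from the agent

    def observe(self):
        # Entity positions and the goal are visible; the controlled entity is not.
        return {"entities": dict(self.entities), "goal": self.goal}

    def step(self, move):
        dr, dc = MOVES[move]
        r, c = self.entities[self.controlled]
        nr = min(max(r + dr, 0), self.size - 1)
        nc = min(max(c + dc, 0), self.size - 1)
        self.entities[self.controlled] = (nr, nc)
        return self.observe(), self.entities[self.controlled] == self.goal


# Two-step strategy: (1) self-orient by probing and seeing which entity moved,
# (2) greedily walk that entity to the goal.
env = SelfOrientingGridWorld()
before = env.observe()["entities"]
after, done = env.step("right")
moved = [e for e in after["entities"] if after["entities"][e] != before[e]]
if not moved:  # controlled entity was already at the right edge; probe the other way
    after, done = env.step("left")
    moved = [e for e in after["entities"] if after["entities"][e] != before[e]]
me = moved[0]

while not done:
    (r, c), (gr, gc) = after["entities"][me], env.goal
    move = "down" if r < gr else "up" if r > gr else "right" if c < gc else "left"
    after, done = env.step(move)
print(f"controlled entity: {me}, reached goal: {done}")
```

In this sketch the probe step plays the role of self-orienting: a single observed state change is enough to identify the controlled entity, after which navigation is straightforward.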
