ICML World Models and Consistent Mistakes in LLMs

Poster in Workshop: Assessing World Models: Methods and Metrics for Evaluating Understanding

Christopher Wolfram · Aaron Schein

Keywords: [ world models ] [ LLMs ] [ consistency ]

Abstract:

Do LLMs have a consistent world model that is reflected in their responses? We study whether the behavior of gpt-4o reflects an underlying world model by measuring the consistency of its mistakes across different prompts and prompting strategies. We find that gpt-4o makes consistent mistakes regardless of the exact prompt phrasing or prompt language. However, substantially different prompts that rely on the same underlying information often yield inconsistent results, suggesting that gpt-4o's responses may not reflect a single universal world model.
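
As an illustration of what "consistency of mistakes" could mean in practice, the sketch below compares the answers produced under two prompt variants and reports, among the questions both variants get wrong, the fraction where they give the same wrong answer. This is only one plausible metric, not necessarily the one used by the authors; the function name `mistake_consistency` and the toy data are hypothetical.

```python
def mistake_consistency(truth, answers_a, answers_b):
    """Fraction of shared mistakes that are the *same* mistake.

    Hypothetical metric for illustration: only items that both prompt
    variants answer incorrectly are considered. Returns None if there
    are no such items.
    """
    both_wrong = [
        (a, b)
        for t, a, b in zip(truth, answers_a, answers_b)
        if a != t and b != t
    ]
    if not both_wrong:
        return None
    same = sum(1 for a, b in both_wrong if a == b)
    return same / len(both_wrong)


# Toy data (made up): ground truth and answers under two prompt variants.
truth = ["Paris", "7", "blue", "1969"]
prompt_a = ["Lyon", "7", "green", "1968"]
prompt_b = ["Lyon", "8", "red", "1968"]

score = mistake_consistency(truth, prompt_a, prompt_b)
print(score)
```

Under this reading, a score near 1 would mean the two prompts tend to fail in the same way, while a low score would mean their mistakes are unrelated, which is the kind of inconsistency the abstract reports for substantially different prompts.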
