Spotlight Poster
Mastering Board Games by External and Internal Planning with Language Models
John Schultz · Jakub Adamek · Matej Jusup · Marc Lanctot · Michael Kaisers · Sarah Perrin · Daniel Hennes · Jeremy Shar · Cannada Lewis · Anian Ruoss · Tom Zahavy · Petar Veličković · Laurel Prince · Satinder Singh · Eric Malmi · Nenad Tomasev
East Exhibition Hall A-B #E-2508
Advancing the planning and reasoning capabilities of Large Language Models (LLMs) is one of the key prerequisites for unlocking their potential to perform reliably in complex and impactful domains. In this paper, we demonstrate this across board games (Chess, Fischer Random / Chess960, Connect Four, and Hex), showing that search-based planning can yield significant improvements in LLM game-playing strength. We introduce, compare, and contrast two major approaches: in external search, the model guides Monte Carlo Tree Search (MCTS) rollouts and evaluations without calls to an external game engine, and in internal search, the model is trained to generate in-context a linearized search tree and a resulting final choice. Both build on a language model pre-trained on relevant domain knowledge that reliably captures the transition and value functions of the respective environments, with minimal hallucination. We evaluate our LLM search implementations against game-specific state-of-the-art engines, showcasing substantial improvements in strength over the base model and reaching Grandmaster-level performance in chess while operating closer to the human search budget. Our proposed approach, combining search with domain knowledge, is not specific to board games, hinting at more general future applications.
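To make the external-search idea concrete, below is a minimal sketch of an AlphaZero-style MCTS loop in which a language model, rather than a game engine, supplies the policy prior, the value estimate, and the state transitions. The functions llm_policy_value and llm_transition are hypothetical stand-ins for model calls (the paper does not expose an API of this shape), and the placeholder move set, random values, and omitted terminal-state handling exist only so the sketch runs.

```python
# Sketch: LLM-guided MCTS with no external game engine.
# llm_policy_value and llm_transition are hypothetical model calls.
import math
import random

def llm_policy_value(state):
    """Hypothetical LLM call: returns ({move: prior_prob}, value in [-1, 1])."""
    moves = ["a", "b", "c"]  # placeholder move set for the sketch
    priors = {m: 1.0 / len(moves) for m in moves}
    return priors, random.uniform(-1.0, 1.0)

def llm_transition(state, move):
    """Hypothetical LLM call: predicts the successor state as text."""
    return f"{state}|{move}"

class Node:
    def __init__(self, state, prior):
        self.state, self.prior = state, prior
        self.children = {}  # move -> Node
        self.visits, self.value_sum = 0, 0.0

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    # PUCT rule as in AlphaZero-style MCTS: exploit Q, explore via the prior.
    total = math.sqrt(node.visits)
    return max(
        node.children.items(),
        key=lambda kv: kv[1].q() + c_puct * kv[1].prior * total / (1 + kv[1].visits),
    )

def mcts(root_state, simulations=64):
    root = Node(root_state, prior=1.0)
    for _ in range(simulations):
        node, path = root, [root]
        # Selection: descend until reaching an unexpanded leaf.
        while node.children:
            move, node = select_child(node)
            path.append(node)
        # Expansion + evaluation: one model call yields priors and a value;
        # successor states come from the model's predicted transitions.
        priors, value = llm_policy_value(node.state)
        for move, p in priors.items():
            node.children[move] = Node(llm_transition(node.state, move), p)
        # Backup: propagate the value, flipping sign each ply (two-player game).
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value = -value
    # Act greedily with respect to visit counts.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts("start"))
```

Internal search differs in that the model itself emits an analogous tree traversal as a token sequence, a linearized record of candidate moves, evaluations, and a final choice, from which the selected move is read off; any particular serialization shown here would be illustrative, not the paper's actual format.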
1. Large Language Models (LLMs) demonstrate impressive performance across many tasks that require complex reasoning. Yet they still struggle to play board games as simple as tic-tac-toe.
2. We developed an LLM that can play several different board games, reaching Grandmaster-level performance in chess. We investigated different planning strategies that enable the LLM to improve its performance as it is given more “thinking time”.
3. In the future, similar planning strategies could unlock strong performance improvements in LLMs applied to other reasoning problems.