Oral in Workshop on Computer Use Agents
Improving LLM Agent Planning for Computer Use via In-Context Learning with Atomic Fact Augmentation and Lookahead Search
Samuel Holt · Max Ruiz Luyten · Thomas Pouplin · Mihaela van der Schaar
Large Language Models (LLMs) are increasingly capable as computer use agents but often require significant guidance to perform effectively in complex, interactive digital environments. Existing methods struggle to adapt to dynamic information (e.g., broken links, changing UIs) or to exploit past experience for better task performance (e.g., efficient command sequences) without fine-tuning. We introduce an LLM agent framework that enhances planning for computer use via in-context learning, atomic fact augmentation, and recursive lookahead search. Our agent extracts task-critical "atomic facts" from its trajectories (e.g., "UI element X leads to Y", "sequence A is a shortcut for B"). These facts augment the LLM components used for action proposal, latent world-model simulation, and value estimation. Planning uses depth-limited lookahead, in which the LLM simulates trajectories guided by the accumulated facts and history, allowing online improvement without weight updates. Theoretical motivation links performance to the quality of the fact-based abstraction and the accuracy of the LLM's simulations. Empirically, our agent shows improved adaptability on challenging tasks, achieving increasingly effective behavior as it accumulates experience; we showcase this on TextFrozenLake and ALFWorld, which model challenges such as navigating interfaces with hazards and executing multi-step procedural tasks.
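The abstract only outlines the planning loop, so the Python sketch below shows one plausible reading of depth-limited lookahead with atomic fact augmentation. Every name here (`AtomicFact`, `propose_actions`, `simulate_step`, `estimate_value`, the toy chain world) is a hypothetical stand-in, not the authors' implementation: in the paper each component would be an LLM call conditioned on the accumulated facts and history, whereas the toy functions here exist only so the sketch runs end to end.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AtomicFact:
    """A task-critical fact mined from past trajectories,
    e.g. 'UI element X leads to Y' or 'sequence A is a shortcut for B'."""
    text: str

# Toy world standing in for the environment: states 0..3 on a chain,
# goal at state 3; "skip" is a shortcut available only from state 0.
GOAL = 3
TRANSITIONS = {("step", s): (min(s + 1, GOAL), 0.0) for s in range(GOAL + 1)}
TRANSITIONS[("skip", 0)] = (2, 0.0)

def propose_actions(state, facts, history):
    # Stand-in for the LLM action proposer: the recorded shortcut fact
    # is what surfaces "skip" alongside the default action.
    actions = ["step"]
    if any("shortcut" in f.text for f in facts) and ("skip", state) in TRANSITIONS:
        actions.append("skip")
    return actions

def simulate_step(state, action, facts):
    # Stand-in for the LLM latent world model: predict the next state
    # and immediate reward for a candidate action.
    next_state, reward = TRANSITIONS.get((action, state), (state, 0.0))
    if next_state == GOAL:
        reward = 1.0
    return next_state, reward

def estimate_value(state, facts):
    # Stand-in for the LLM value estimator at the lookahead frontier:
    # a crude progress heuristic toward the goal.
    return state / GOAL

def lookahead_value(state, facts, history, depth):
    """Depth-limited recursive lookahead: expand proposed actions,
    simulate each, and back up the best discovered value."""
    if state == GOAL:
        return 0.0  # terminal; the goal reward was paid on the way in
    if depth == 0:
        return estimate_value(state, facts)
    return max(
        r + lookahead_value(s2, facts, history + [(state, a)], depth - 1)
        for a in propose_actions(state, facts, history)
        for s2, r in [simulate_step(state, a, facts)]
    )

def plan(state, facts, history, depth=2):
    """Choose the action whose simulated lookahead return is highest."""
    def score(a):
        s2, r = simulate_step(state, a, facts)
        return r + lookahead_value(s2, facts, history + [(state, a)], depth - 1)
    return max(propose_actions(state, facts, history), key=score)

facts = [AtomicFact("sequence 'skip' is a shortcut for two 'step' actions from state 0")]
print(plan(state=0, facts=facts, history=[]))  # -> 'skip'
```

Note how the facts thread through all three components: the shortcut fact changes which actions the proposer surfaces, which in turn is what lets the lookahead discover the faster plan without any weight updates, mirroring the abstract's claim of online improvement via in-context learning.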