

Poster

Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

Lutfi Erdogan · Hiroki Furuta · Sehoon Kim · Nicholas Lee · Suhong Moon · Gopala Anumanchipalli · Kurt Keutzer · Amir Gholaminejad

East Exhibition Hall A-B #E-2810
Thu 17 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them to complex, multi-step, long-horizon tasks remains a challenge. Recent work has found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a framework that incorporates explicit planning into LLM-based agents and introduces a scalable, novel synthetic data generation method to enhance plan generation. Plan-and-Act consists of a Planner model, which generates structured, high-level plans to achieve user goals, and an Executor model, which translates these plans into environment-specific actions. To train the Planner effectively, we introduce a synthetic data generation method that annotates ground-truth trajectories with feasible plans, augmented with diverse and extensive examples to enhance generalization. We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 57.58% success rate on the WebArena-Lite benchmark as well as a text-only state-of-the-art 81.36% success rate on WebVoyager.
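The Planner/Executor split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Step`, `plan_and_act`, `toy_planner`, `toy_executor`, and `toy_observe` are all hypothetical, and the two "models" are plain functions standing in for LLM calls. The sketch only shows the control flow, in which a plan is produced once from the goal and each plan step is then translated into an environment-specific action given the current observation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One high-level plan step (hypothetical structure, for illustration)."""
    description: str


def plan_and_act(
    goal: str,
    planner: Callable[[str], List[Step]],
    executor: Callable[[Step, str], str],
    observe: Callable[[], str],
) -> List[str]:
    """Run a planner once, then let the executor ground each step in an action."""
    plan = planner(goal)  # high-level plan for the user goal
    actions = []
    for step in plan:
        observation = observe()  # current environment state, e.g. page text
        actions.append(executor(step, observation))  # low-level action
    return actions


# Stand-ins for the two models; a real system would call LLMs here (assumption).
def toy_planner(goal: str) -> List[Step]:
    return [Step(f"search for '{goal}'"), Step("open the first result")]


def toy_executor(step: Step, observation: str) -> str:
    return f"ACTION[{step.description}] given {observation}"


def toy_observe() -> str:
    return "page:home"


if __name__ == "__main__":
    for action in plan_and_act("cheap flights", toy_planner, toy_executor, toy_observe):
        print(action)
```

Keeping the planner and executor behind separate callables mirrors the separation the abstract argues for: either component could be swapped or retrained without touching the other.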

Lay Summary:

Current AI systems struggle with complex tasks that require multiple steps, like booking flights or finding information across websites. This is because they try to both plan what to do and execute detailed actions simultaneously, which often leads to confusion and mistakes.

We created Plan-and-Act, a system that separates the planning process from the action execution. Like how a head chef plans a meal while line cooks execute specific tasks, our system uses one component to create high-level plans and another to carry out precise actions. We also developed methods to generate training examples that teach our system to plan effectively.

Our approach significantly improves AI's ability to complete multi-step online tasks, achieving state-of-the-art performance on web navigation benchmarks. This advancement could lead to more helpful digital assistants that can navigate websites, complete online forms, or find information across multiple pages without getting lost or confused. In the future, this could make digital services more accessible to everyone, especially for complex tasks that currently require significant technical knowledge or patience.
