

Poster

Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

Zican Hu · Wei Liu · Xiaoye Qu · Xiangyu Yue · Chunlin Chen · Zhi Wang · Yu Cheng

East Exhibition Hall A-B #E-2802
[ Project Page ]
Wed 16 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

While showing sophisticated reasoning abilities, large language models (LLMs) still struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment, especially in sparse-reward scenarios. Inspired by the divide-and-conquer principle, we propose an innovative framework GLIDER (Grounding Language Models as EffIcient Decision-Making Agents via Offline HiErarchical Reinforcement Learning) that introduces a parameter-efficient and generally applicable hierarchy to LLM policies. We develop a scheme where the low-level controller is supervised with abstract, step-by-step plans that are learned and instructed by the high-level policy. This design decomposes complicated problems into a series of coherent chain-of-thought reasoning sub-tasks, providing flexible temporal abstraction to significantly enhance exploration and learning for long-horizon tasks. Furthermore, GLIDER facilitates fast online adaptation to non-stationary environments owing to the strong transferability of its task-agnostic low-level skills. Experiments on ScienceWorld and ALFWorld benchmarks show that GLIDER achieves consistent performance gains, along with enhanced generalization capabilities.
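The hierarchy the abstract describes — a high-level policy that emits abstract step-by-step plans, and a low-level controller that grounds each plan into primitive actions — can be sketched as a rollout loop. This is a minimal illustrative sketch, not the paper's implementation; all function names (`high_policy`, `low_policy`, `env_step`) and the horizon parameters are hypothetical stand-ins:

```python
from typing import Callable, List, Tuple

def hierarchical_rollout(
    high_policy: Callable[[str], str],           # task -> abstract sub-task plan (hypothetical)
    low_policy: Callable[[str, str], str],       # (sub-task, observation) -> primitive action
    env_step: Callable[[str], Tuple[str, bool]], # action -> (next observation, done flag)
    task: str,
    max_subtasks: int = 4,
    low_horizon: int = 3,
) -> List[str]:
    """Divide-and-conquer rollout: the high-level policy proposes one
    sub-task at a time; the low-level controller executes it for a short
    horizon of primitive actions, giving temporal abstraction."""
    trajectory: List[str] = []
    obs, done = "start", False
    for _ in range(max_subtasks):
        if done:
            break
        subtask = high_policy(task)        # abstract chain-of-thought sub-task
        for _ in range(low_horizon):       # low-level grounding of the plan
            action = low_policy(subtask, obs)
            trajectory.append(action)
            obs, done = env_step(action)
            if done:
                break
    return trajectory
```

The key design point this sketch captures is that credit assignment is split: the high-level policy only reasons over a few sub-task decisions, while the low-level controller handles the (potentially long) sequence of primitive actions within each sub-task.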

Lay Summary:

Large language models (LLMs) have difficulty handling complex decision-making tasks, especially when feedback is limited. They often get lost in long-term planning and struggle to explore effectively, like a chess player who can't think multiple moves ahead.

We developed GLIDER, a framework that breaks down complex tasks into smaller, manageable steps. Like a skilled manager delegating tasks, GLIDER uses a two-level system where high-level planning guides step-by-step execution.

This approach helps LLMs tackle challenging tasks more efficiently and adapt to new situations better, showing significant improvements in virtual environments that test reasoning and problem-solving abilities.
