

Poster in Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models

Context-lite Multi-turn Reinforcement Learning for LLM Agents

Chen · Jiayu Chen · Hao Zhu · Jeff Schneider


Abstract:

Finetuning large language model (LLM) agents with multi-turn reinforcement learning (RL) is a promising direction. However, applying multi-turn RL to agentic tasks presents unique challenges not typically encountered in reasoning tasks such as solving math problems. These include long interaction histories that hinder relevant context retrieval, sparse rewards that slow down learning, and variable trajectory lengths that reduce training efficiency. To address these challenges, we propose Context-lite Multi-turn RL, a framework that incorporates (1) a customizable agent memory mechanism, allowing the agent to flexibly include different lengths of historical interaction in each turn's prompt based on task requirements, and (2) Dual-discounting GAE, which decouples step-level and token-level credit assignment. Experiments demonstrate that our method surpasses the zero-shot performance of state-of-the-art LLMs across four BabyAI scenarios, while also achieving greater efficiency and effectiveness than variants lacking either the memory mechanism or dual-discounting GAE.
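
The abstract describes the memory mechanism only at a high level. Below is a minimal sketch of what context-lite prompt assembly could look like, assuming a simple sliding window over past (observation, action) turns; the function name build_prompt and the memory_window parameter are hypothetical illustrations, not the paper's API.

```python
def build_prompt(system_prompt, history, observation, memory_window=2):
    """Context-lite prompt assembly (illustrative sketch).

    Instead of concatenating the full interaction history, include only
    the most recent `memory_window` (observation, action) turns, keeping
    each turn's prompt length bounded regardless of episode length.
    """
    recent = history[-memory_window:] if memory_window > 0 else []
    lines = [system_prompt]
    for obs, act in recent:
        lines.append(f"Observation: {obs}")
        lines.append(f"Action: {act}")
    # Current observation, for which the agent must produce an action.
    lines.append(f"Observation: {observation}")
    lines.append("Action:")
    return "\n".join(lines)
```

A task with short-range dependencies could set memory_window to 1 or 2, while a task requiring long-range recall could use a larger window, matching the abstract's point that history length is chosen per task.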
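The exact Dual-discounting GAE formulation is likewise not given in the abstract. One plausible reading, sketched here purely as an assumption, runs the standard GAE recursion over the flattened token sequence while switching between a step-level discount pair at turn boundaries and a token-level pair within a turn, thereby decoupling the two levels of credit assignment; all names and defaults are illustrative.

```python
import numpy as np

def dual_discount_gae(rewards, values, turn_ids,
                      gamma_step=0.99, lam_step=0.95,
                      gamma_token=1.0, lam_token=1.0):
    """Sketch of a dual-discounted GAE over a flattened token sequence.

    rewards:  per-token rewards, length T
    values:   per-token value estimates plus a bootstrap value, length T + 1
    turn_ids: per-token index of the agent turn each token belongs to, length T

    Transitions that cross a turn boundary use the step-level pair
    (gamma_step, lam_step); transitions within a turn use the token-level
    pair (gamma_token, lam_token).
    """
    T = len(rewards)
    adv = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        boundary = (t == T - 1) or (turn_ids[t + 1] != turn_ids[t])
        gamma = gamma_step if boundary else gamma_token
        lam = lam_step if boundary else lam_token
        # Standard GAE recursion with the discount chosen per transition.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        adv[t] = gae
    return adv
```

With gamma_token = lam_token = 1, tokens within a turn share credit undecayed while discounting still applies across environment steps, which is one way sparse, turn-level rewards could propagate without being attenuated token by token.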
