Poster in Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts

Haizhong Zheng · Yang Zhou · Brian Bartoldson · Bhavya Kailkhura · Fan Lai · Jiawei Zhao · Beidi Chen


Abstract:

Reinforcement learning algorithms such as PPO and GRPO have powered recent breakthroughs in LLM reasoning. Scaling rollout to sample more prompts lets models selectively train on higher-quality data, which can stabilize RL training and improve model performance, but at the cost of significant computational overhead. In this paper, we first show that a substantial portion of this overhead can be avoided by skipping uninformative prompts before rollout. Our analysis of reward dynamics reveals a strong temporal consistency in prompt value: prompts that are uninformative in one epoch are likely to remain uninformative in near-future epochs. Based on these insights, we propose GRESO (GRPO with Efficient Selective Rollout), an online, lightweight pre-rollout filtering algorithm that predicts and skips uninformative prompts. Evaluating GRESO on a broad range of math benchmarks and models, including Qwen2.5-Math-1.5B/7B and DeepSeek-R1-Distill-Qwen-1.5B, we show that it achieves up to a 2.4x wall-clock speedup in rollout and up to a 2.0x speedup in total training time without accuracy degradation.
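The abstract does not spell out GRESO's filtering rule, but a minimal sketch of the pre-rollout filtering idea might look like the following. Everything here is an assumption for illustration, not the paper's implementation: the class name `SelectiveRolloutFilter`, the `skip_prob` and `history_len` parameters, the probabilistic re-admission of skipped prompts, and the criterion that a prompt is "uninformative" when all rewards in its GRPO group are identical (in which case the group-normalized advantage is zero and the prompt contributes no gradient).

```python
import random
from collections import defaultdict


class SelectiveRolloutFilter:
    """Hypothetical pre-rollout filter in the spirit of GRESO.

    Heuristic (assumed): in GRPO, a prompt whose sampled responses all
    receive the same reward has zero group-normalized advantage and thus
    contributes no gradient ("uninformative"). Exploiting the temporal
    consistency noted in the abstract, prompts that were recently
    uninformative are skipped with high probability instead of being
    rolled out again.
    """

    def __init__(self, skip_prob=0.9, history_len=3):
        self.skip_prob = skip_prob        # chance of skipping a flagged prompt
        self.history_len = history_len    # epochs of reward history to keep
        self.history = defaultdict(list)  # prompt_id -> recent "informative" flags

    def should_rollout(self, prompt_id):
        flags = self.history[prompt_id]
        # Always roll out prompts with no history or any recent signal.
        if not flags or any(flags):
            return True
        # Recently uninformative: skip probabilistically, so skipped
        # prompts can re-enter training as the policy changes.
        return random.random() > self.skip_prob

    def update(self, prompt_id, group_rewards):
        # A rollout group is informative iff rewards differ within the
        # group (nonzero variance -> nonzero GRPO advantage).
        informative = len(set(group_rewards)) > 1
        flags = self.history[prompt_id]
        flags.append(informative)
        del flags[:-self.history_len]  # keep only the most recent epochs


# Hypothetical usage inside a training loop: filter before rollout,
# then record each prompt's group rewards after rollout.
# batch = [p for p in candidate_prompts if f.should_rollout(p)]
# f.update(prompt_id, rewards_for_prompt)
```

Probabilistic rather than hard skipping is one plausible way to keep the filter online: a prompt that becomes informative again as the policy improves still gets occasional rollouts and can rejoin training.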
