Poster
LLMs Can Reason Faster Only If We Let Them
Bilgehan Sel · Lifu Huang · Naren Ramakrishnan · Ruoxi Jia · Ming Jin
East Exhibition Hall A-B #E-2710
Large language models (LLMs) are making inroads into classical AI problems such as automated planning, yet key shortcomings continue to hamper their integration. Chain-of-Thought (CoT) struggles with complex multi-step reasoning, and Tree-of-Thoughts requires multiple queries that increase computational overhead. Recently, Algorithm-of-Thoughts (AoT) has shown promise using in-context examples, but at the cost of significantly longer solutions compared to CoT. To bridge the solution-length gap between CoT and AoT, this paper introduces AoT-O3, which combines supervised finetuning on AoT-style plans with a reinforcement learning (RL) framework designed to reduce solution length. The RL component uses a reward model that favors concise, valid solutions while maintaining planning accuracy. Empirical evaluations indicate that AoT-O3 shortens solution length by up to 80% compared to baseline AoT while maintaining or surpassing prior performance. These findings suggest a promising pathway for more efficient, scalable LLM-based planning.
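The abstract describes the RL reward only at a high level (favoring concise, valid solutions). As a rough illustration of that idea, the minimal Python sketch below shows one way such a trade-off could be scored; the function name, normalization, and coefficient `alpha` are assumptions for illustration, not the paper's actual reward model.

```python
# Illustrative sketch only: a reward that favors concise, valid plans.
# `is_valid_plan`, `max_tokens`, and `alpha` are hypothetical, not from the paper.

def length_aware_reward(plan_tokens: list[str],
                        is_valid_plan: bool,
                        max_tokens: int = 1024,
                        alpha: float = 0.5) -> float:
    """Reward valid plans, discounted by their normalized length."""
    if not is_valid_plan:
        return 0.0  # invalid plans earn no reward
    # Normalized length in [0, 1]; shorter valid plans score closer to 1.
    length_frac = min(len(plan_tokens), max_tokens) / max_tokens
    return 1.0 - alpha * length_frac


# Example: a 200-token valid plan outscores an 800-token one.
short_r = length_aware_reward(["step"] * 200, True)  # ~0.90
long_r = length_aware_reward(["step"] * 800, True)   # ~0.61
```

Under this kind of shaping, an RL policy is pushed toward solutions that remain valid while spending fewer tokens, which is the behavior the 80% length reduction suggests.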
Large language models (LLMs) can solve complex problems better when they are guided in smarter ways. The paper introduces a new method called AoT-O3 that helps these models plan more efficiently by giving rewards for shorter, accurate solutions. This approach significantly cuts down on the steps needed to reach a solution—by up to 80%—without sacrificing quality. As a result, it also reduces energy use and makes AI more scalable and environmentally friendly.