Poster
in
Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models
Large Reasoning Models Know How to Think Efficiently
Zeyu Xing · Xing Li · Huiling Zhen · Xianzhi Yu · Mingxuan Yuan · Sinno Jialin Pan
Large Reasoning Models (LRMs) show strong problem-solving potential through extended Chain-of-Thought (CoT) generation, improving robustness and accuracy by iteratively revisiting and revising their intermediate solutions. However, excessive CoT generation poses challenges for LLM inference: redundant tokens prolong decoding and create computational bottlenecks. This paper introduces two training-free self-thinking methods—Pre-judged Reasoning and Fallback Reasoning—which improve inference efficiency by dynamically selecting between fast thinking and slow reasoning based on the LRM's intrinsic ability to classify task complexity. Evaluations on the MATH500 and AIME24 datasets demonstrate that Pre-judged Reasoning reduces token generation by up to 26.6% compared to slow reasoning without compromising accuracy. Similarly, Fallback Reasoning reduces generated tokens by up to 24.0%, enabling significantly faster task completion. Both methods substantially reduce computational overhead while preserving the accuracy of LRMs.
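The two routing strategies can be pictured roughly as follows. This is a minimal sketch based only on the abstract: the prompt wording, the complexity label, the fallback trigger, and the generate stub are illustrative assumptions, not the authors' implementation.

# Rough sketch of the two routing strategies described above.
# The prompt wording, label format, fallback trigger, and `generate` stub
# are assumptions for illustration, not the authors' code.

def generate(prompt: str, thinking: bool) -> str:
    """Hypothetical stand-in for an LRM call; thinking=True enables long CoT decoding."""
    raise NotImplementedError("plug in your own LRM inference call")


def pre_judged_reasoning(task: str) -> str:
    # Ask the model itself to judge task complexity before solving
    # (assumed single-word EASY/HARD label).
    label = generate(
        f"Classify the difficulty of this problem as EASY or HARD:\n{task}",
        thinking=False,
    ).strip().upper()
    # Route: fast thinking for easy tasks, full slow reasoning otherwise.
    return generate(task, thinking=(label != "EASY"))


def fallback_reasoning(task: str) -> str:
    # Try fast thinking first; fall back to slow reasoning only if the model
    # signals it cannot answer directly (assumed sentinel string).
    fast_answer = generate(
        f"{task}\nAnswer directly, or reply UNSURE if careful reasoning is needed.",
        thinking=False,
    )
    if "UNSURE" in fast_answer:
        return generate(task, thinking=True)
    return fast_answer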