Poster
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang · Jie Wang · Jilai Pan · Xilin Xia · Huiling Zhen · Mingxuan Yuan · Jianye Hao · Feng Wu
East Exhibition Hall A-B #E-2703
Large language models, like ChatGPT, are good at solving complex problems by thinking through different possible steps, a bit like how a person might work through a puzzle. But for the computer to try out many possible ways of solving a problem, it usually needs to spend a lot of time thinking, which makes these models slow to use.

To solve this, we created a new method called Speculative Search (SpecSearch). Our approach speeds up the thinking process by letting a smaller, faster model work together with the larger, smarter model. The small model quickly generates possible steps, and the large model only spends time checking and keeping the high-quality ones. This way, the system avoids wasting time on ideas that wouldn't be helpful anyway.

Our experiments show that SpecSearch makes language models much faster, over twice as fast in some cases, without losing their ability to reason well. We have shared our code at https://github.com/MIRALab-USTC/LLMReasoning-SpecSearch, so others can use and build on our method for making AI smarter and faster.
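To make the draft-then-verify idea above concrete, here is a minimal Python sketch. The model interfaces (small_model_propose, large_model_score, large_model_generate) and the acceptance threshold are hypothetical stand-ins for illustration only, not the actual SpecSearch implementation; see the linked repository for the real code.

```python
# Sketch: a small model drafts candidate reasoning steps cheaply; the large
# model only evaluates them and keeps the high-quality ones, falling back to
# generating the step itself when no draft is good enough.
import random

def small_model_propose(state, num_candidates=4):
    """Cheap draft model: quickly proposes candidate next reasoning steps (stand-in)."""
    return [f"{state} -> draft step {i}" for i in range(num_candidates)]

def large_model_score(step):
    """Expensive target model: estimates how promising a candidate step is (stand-in)."""
    return random.random()

def large_model_generate(state):
    """Fallback: the large model writes the step itself if no draft passes (stand-in)."""
    return f"{state} -> verified step"

def speculative_step(state, threshold=0.5):
    """Expand one reasoning step: accept a high-quality draft, else fall back."""
    candidates = small_model_propose(state)
    best_score, best_step = max((large_model_score(c), c) for c in candidates)
    if best_score >= threshold:
        return best_step                 # keep the small model's cheap draft
    return large_model_generate(state)   # reject drafts; pay for the large model

if __name__ == "__main__":
    state = "problem"
    for _ in range(3):                   # a short chain of reasoning steps
        state = speculative_step(state)
    print(state)
```

The time savings come from the same place as in speculative decoding: most steps are accepted cheap drafts, so the expensive model is mainly used to filter rather than to generate.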