

Poster in Affinity Workshop: New In ML

AR-CoT: Agnostic Reasoning with Compressed Long Chain-of-Thought


Abstract:

Chain-of-Thought (CoT) prompting substantially improves a model's reasoning ability, but long reasoning chains often carry redundant information that does not further strengthen reasoning. We observe that compressing long reasoning chains yields shorter ones that expose implicit reasoning patterns. We present AR-CoT, a post-training framework designed to learn these more beneficial, agnostic reasoning patterns and thereby strengthen the model's reasoning capacity. To this end, we experiment with diverse compression techniques and construct a high-quality dataset of reasoning chains derived by compressing their longer counterparts. We then fine-tune the model with reinforcement learning so that it internalizes these agnostic reasoning patterns. Applied to Qwen2.5-7B-Instruct with a small but high-quality dataset of compressed reasoning chains, this method raises accuracy on GPQA by 17.91%, with accuracy gains also observed on other reasoning tasks. Our analysis further shows that after learning these agnostic reasoning patterns, the model's final attention layers change substantially, and the model attends more strongly to reasoning-relevant words in the text.

The code and dataset used in this study will be released soon for use and verification by other researchers; the release date and access channels will be announced through relevant academic platforms or research institution websites.
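The abstract does not specify how the compression step or the quality filter are implemented, so the following is only a minimal sketch of one plausible pipeline: prompt a model to strip redundant steps from a long chain, then keep the compressed chain only if it still reaches the gold answer. The `llm` callable, the `COMPRESS_PROMPT` wording, and the answer-preservation check are all assumptions, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical LLM interface: any chat-completion client can be plugged in.
# The paper does not name its compression backend; this signature is an assumption.
LLMFn = Callable[[str], str]

COMPRESS_PROMPT = (
    "Rewrite the following chain-of-thought so it keeps only the steps "
    "needed to reach the final answer. Keep the final answer unchanged.\n\n"
    "{chain}"
)

@dataclass
class Example:
    question: str
    long_chain: str  # original verbose reasoning
    answer: str      # gold final answer

def compress_chain(llm: LLMFn, chain: str) -> str:
    """Ask the model to strip redundant steps from a long reasoning chain."""
    return llm(COMPRESS_PROMPT.format(chain=chain))

def build_dataset(llm: LLMFn, examples: List[Example]) -> List[dict]:
    """Keep a compressed chain only if it still contains the gold answer
    (a crude preservation filter, assumed for illustration)."""
    kept = []
    for ex in examples:
        short = compress_chain(llm, ex.long_chain)
        if ex.answer in short:
            kept.append({"question": ex.question, "chain": short, "answer": ex.answer})
    return kept

if __name__ == "__main__":
    # Trivial stub so the sketch runs end to end without any API access.
    def stub_llm(prompt: str) -> str:
        return "Step 1: 2 + 2 = 4. Answer: 4"

    data = build_dataset(
        stub_llm,
        [Example("What is 2 + 2?", "Well... 2 plus 2... that makes 4. Answer: 4", "4")],
    )
    print(data)
```

The resulting records (question, compressed chain, answer) would then serve as training signal for the reinforcement-learning fine-tuning stage the abstract describes; that stage is not sketched here because the abstract gives no detail about its reward design.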
