Poster in Workshop: Methods and Opportunities at Small Scale (MOSS)
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
Halil Alperen Gozeten · Muhammed Emrullah Ildiz · Xuechen Zhang · Hrayr Harutyunyan · Ankit Singh Rawat · Samet Oymak
Keywords: [ transformers ] [ multi token sampling ] [ latent space reasoning ] [ parallel exploration ] [ policy optimization ] [ chain-of-thought ]
We propose CoT2, a framework that uses continuously-valued tokens to let language models track multiple reasoning paths in parallel. We also provide a novel CoT2 supervision strategy in which the softmax outputs are matched to the empirical token distributions of a set of target traces. Theoretically, we show that CoT2 offers sample-complexity benefits, and we construct a one-layer transformer that solves the subset-sum problem given sufficient embedding capacity. Finally, we introduce continuous sampling methods and show that reinforcement learning with CoT2 notably improves logical reasoning performance compared to discrete and continuous baselines.
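The supervision idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy traces, the toy embedding table, and the choice of cross-entropy as the matching objective are all assumptions made for illustration. The sketch computes the empirical token distribution over a set of target traces at one step, forms a continuous token as the corresponding weighted mix of embeddings, and scores a model's logits against that distribution.

```python
import math
from collections import Counter


def empirical_distribution(traces, step, vocab_size):
    """Fraction of target traces that emit each token id at position `step`."""
    counts = Counter(trace[step] for trace in traces)
    return [counts.get(v, 0) / len(traces) for v in range(vocab_size)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(target, logits):
    """Cross-entropy between a soft target distribution and softmax(logits).

    Assumption: cross-entropy is used here as the matching objective;
    the abstract only says the softmax outputs are matched to the targets.
    """
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def continuous_token(weights, embeddings):
    """Convex combination of token embeddings, weighted by `weights`."""
    dim = len(embeddings[0])
    return [sum(w * e[d] for w, e in zip(weights, embeddings)) for d in range(dim)]


# Hypothetical toy data: four target traces over a vocabulary of four tokens.
traces = [[0, 2], [1, 2], [1, 3], [1, 3]]
target_step0 = empirical_distribution(traces, step=0, vocab_size=4)

# Hypothetical 2-dimensional embedding table, one row per token id.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
mixed = continuous_token(target_step0, embeddings)

# Logits whose softmax equals the target give a lower loss than uniform logits.
matched_loss = soft_cross_entropy([0.25, 0.75], [math.log(0.25), math.log(0.75)])
uniform_loss = soft_cross_entropy([0.25, 0.75], [0.0, 0.0])
```

In this toy setup the continuous token at step 0 carries both candidate first tokens at once, weighted by how often each appears in the target traces, which is one way to picture tracking several reasoning paths in parallel.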