

Poster in Workshop: Methods and Opportunities at Small Scale (MOSS)

Continuous Chain of Thought Enables Parallel Exploration and Reasoning

Halil Alperen Gozeten · Muhammed Emrullah Ildiz · Xuechen Zhang · Hrayr Harutyunyan · Ankit Singh Rawat · Samet Oymak

Keywords: [ transformers ] [ multi token sampling ] [ latent space reasoning ] [ parallel exploration ] [ policy optimization ] [ chain-of-thought ]


Abstract:

We propose CoT2, a framework that uses continuously valued tokens to let language models track multiple reasoning paths in parallel. We also provide a novel CoT2 supervision strategy that matches the model's softmax outputs to the empirical token distributions of a set of target traces. Theoretically, we show that CoT2 offers sample-complexity benefits, and we construct a one-layer transformer that solves the subset-sum problem given sufficient embedding capacity. Finally, we introduce continuous sampling methods and show that reinforcement learning with CoT2 notably improves logical reasoning performance compared with discrete and continuous baselines.
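As a rough illustration of the supervision strategy described in the abstract, the Python sketch below builds target traces for a toy subset-sum instance (one trace per subset, recording the partial sum after each include/exclude decision). It then computes the empirical token distribution at each step and scores per-step logits by the cross-entropy between the model's softmax and that distribution. It also forms a continuous token as the probability-weighted mixture of token embeddings. All function names, the trace format, and the embedding-mixture rule are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import product


def subset_sum_traces(nums):
    """Hypothetical target traces: partial sums along every include/exclude path."""
    traces = []
    for choices in product([0, 1], repeat=len(nums)):
        total, trace = 0, []
        for take, x in zip(choices, nums):
            total += take * x  # add x only if this path includes it
            trace.append(total)
        traces.append(trace)
    return traces


def empirical_distributions(traces, vocab_size):
    """Per-step empirical token distribution over equal-length target traces."""
    dists = []
    for t in range(len(traces[0])):
        counts = Counter(trace[t] for trace in traces)
        dists.append([counts[v] / len(traces) for v in range(vocab_size)])
    return dists


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cot2_loss(step_logits, target_dists):
    """Mean cross-entropy between model softmax and empirical target distribution.

    Assumption: one logit vector per reasoning step, averaged over steps.
    """
    total = 0.0
    for logits, target in zip(step_logits, target_dists):
        probs = softmax(logits)
        total -= sum(p * math.log(q) for p, q in zip(target, probs) if p > 0)
    return total / len(target_dists)


def continuous_token(probs, embeddings):
    """Assumed continuous token: probability-weighted mixture of token embeddings."""
    dim = len(embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]


if __name__ == "__main__":
    traces = subset_sum_traces([1, 2])       # reachable sums lie in 0..3
    dists = empirical_distributions(traces, vocab_size=4)
    uniform_logits = [[0.0] * 4 for _ in dists]
    print(cot2_loss(uniform_logits, dists))
    one_hot = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(continuous_token(dists[0], one_hot))
```

With one-hot embeddings, the mixed token for the first step is `[0.5, 0.5, 0.0, 0.0]`, so a single continuous token carries both partial sums reachable after the first decision. This is only the intuition behind tracking several paths in parallel. It does not reproduce the paper's one-layer construction.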
