Poster in Workshop: Methods and Opportunities at Small Scale (MOSS)
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
Hanlin Zhu · Shibo Hao · Zhiting Hu · Jiantao Jiao · Stuart Russell · Yuandong Tian
Keywords: [ transformer ] [ reasoning ] [ chain of continuous thought ] [ superposition ]
Abstract:
In this paper, we prove that a two-layer transformer with $D$ steps of continuous chain-of-thoughts (CoTs) can solve the directed graph reachability problem, where $D$ is the diameter of the graph, while the best known result for constant-depth transformers with discrete CoTs requires $O(n^2)$ decoding steps, where $n$ is the number of vertices ($D < n$).
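To make the role of the diameter concrete, here is a minimal sketch of directed graph reachability by frontier expansion. It is ordinary breadth-first search over a set of reached vertices, not the transformer construction from the paper. The function name `reachable_within` and the edge-list representation are assumptions made for illustration. The point it shows is that if every vertex reached so far is tracked at once, the number of expansion rounds needed is bounded by the longest shortest path, that is, by $D$.

```python
def reachable_within(edges, n, source, target, steps):
    """Return True if `target` is reachable from `source` in at most `steps` hops.

    Hypothetical helper for illustration only: `edges` is a list of directed
    (u, v) pairs over vertices 0..n-1.
    """
    # reached[v] is True once v is known to be reachable from the source.
    reached = [False] * n
    reached[source] = True
    for _ in range(steps):
        # Expand every reached vertex in the same round (parallel frontier).
        nxt = list(reached)
        for u, v in edges:
            if reached[u]:
                nxt[v] = True
        reached = nxt
    return reached[target]


# Small directed graph: 0 -> 1 -> 2 -> 3, plus 4 -> 0.
edges = [(0, 1), (1, 2), (2, 3), (4, 0)]
print(reachable_within(edges, 5, 0, 3, 3))  # True: 3 is three hops from 0
print(reachable_within(edges, 5, 0, 4, 3))  # False: no path from 0 to 4
```

Each round costs one pass over the edges, and in this example three rounds are enough for any pair that is connected at all.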