

Poster

Latent Thought Models with Variational Bayes Inference-Time Computation

Deqian Kong · Minglu Zhao · Dehong Xu · Bo Pang · Shu Wang · Edouardo Honig · Zhangzhang Si · Chuan Li · Jianwen Xie · Sirui Xie · Ying Nian Wu

East Exhibition Hall A-B #E-3211
Tue 15 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

We propose a novel class of language models, Latent Thought Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive generation of ground tokens through a Transformer decoder. Training employs a dual-rate optimization process within the classical variational Bayes framework: fast learning of local variational parameters for the posterior distribution of latent vectors (inference-time computation), and slow learning of global decoder parameters. Empirical studies reveal that LTMs possess additional scaling dimensions beyond traditional Large Language Models (LLMs), such as the number of iterations in inference-time computation and number of latent thought vectors. Higher sample efficiency can be achieved by increasing training compute per token, with further gains possible by trading model size for more inference steps. Designed based on these scaling properties, LTMs demonstrate superior sample and parameter efficiency compared to autoregressive models and discrete diffusion models. They significantly outperform these counterparts in validation perplexity and zero-shot language modeling tasks. Additionally, LTMs exhibit emergent few-shot in-context reasoning capabilities that scale with model size, and achieve competitive performance in conditional and unconditional text generation. The project page is available at https://deqiankong.github.io/blogs/ltm.
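For context, the classical variational Bayes objective behind the dual-rate procedure can be sketched as an evidence lower bound (ELBO). The specific forms below, a Gaussian prior p(z) over the latent thought vectors and a per-sequence posterior q_\phi(z|x) with local parameters \phi, are illustrative assumptions rather than details taken from the paper:

\log p_\theta(x) \;\ge\; \mathcal{L}(\theta, \phi; x) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\!\big(q_\phi(z \mid x)\,\|\,p(z)\big)

Fast inner-loop updates fit the local parameters \phi for each sequence x (the inference-time computation), while slow outer-loop updates adjust the global decoder parameters \theta.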

Lay Summary:

Traditional large language models (LLMs) scale by increasing model size and data size. However, as model sizes grow rapidly, data availability has emerged as a critical bottleneck for continued scaling. We propose Latent Thought Models (LTMs), which add explicit "latent thought vectors" as internal abstract representations. Before generating text, LTMs first develop these internal thoughts, then use them to guide word-by-word generation. The model learns through a dual-rate process: fast learning that adapts the thoughts to a specific text, and slow learning of general linguistic patterns. Compared to LLMs, LTMs achieve much better sample and computational efficiency, and they demonstrate in-context learning at a significantly smaller scale. Most importantly, LTMs introduce "inference-time computation" as a new scaling axis beyond those of LLMs, potentially transforming how we build efficient and generalizable AI systems. A minimal sketch of this dual-rate process appears below.
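The sketch below illustrates the dual-rate idea in PyTorch: a fast inner loop fits per-batch local variational parameters (mu, log_var) for the latent thought vectors, and a slow outer loop updates the global decoder. The model architecture, Gaussian prior/posterior, shapes, learning rates, and step counts are all toy assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn


class LatentConditionedDecoder(nn.Module):
    """Toy Transformer decoder whose next-token prediction is conditioned on
    latent thought vectors prepended to the token embeddings."""

    def __init__(self, vocab_size=1000, d_model=128, n_latents=8):
        super().__init__()
        self.n_latents = n_latents
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, z):
        # z: (batch, n_latents, d_model) latent thought vectors
        h = torch.cat([z, self.embed(tokens)], dim=1)
        L = h.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.blocks(h, mask=causal)
        return self.lm_head(h[:, self.n_latents:, :])  # logits at token positions


def elbo(decoder, tokens, mu, log_var):
    """ELBO with a standard-normal prior on z and a Gaussian posterior with
    per-sequence local parameters (mu, log_var) -- both assumptions here."""
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterized sample
    logits = decoder(tokens[:, :-1], z)
    recon = -nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1),
        reduction="sum",
    ) / tokens.size(0)
    kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(dim=(1, 2)).mean()
    return recon - kl


decoder = LatentConditionedDecoder()
slow_opt = torch.optim.Adam(decoder.parameters(), lr=1e-4)  # slow: global decoder
tokens = torch.randint(0, 1000, (4, 33))                    # toy batch of sequences

for outer_step in range(10):                                # slow outer loop
    # Fast inner loop: fit the local variational parameters for this batch from
    # scratch; the number of steps is the inference-time-computation axis.
    mu = torch.zeros(4, decoder.n_latents, 128, requires_grad=True)
    log_var = torch.zeros(4, decoder.n_latents, 128, requires_grad=True)
    fast_opt = torch.optim.Adam([mu, log_var], lr=1e-2)      # fast: local posterior
    for _ in range(16):
        fast_opt.zero_grad()
        (-elbo(decoder, tokens, mu, log_var)).backward()
        fast_opt.step()

    # Slow step: update the global decoder parameters given the fitted posterior.
    slow_opt.zero_grad()
    (-elbo(decoder, tokens, mu, log_var)).backward()
    slow_opt.step()

In this toy setup, trading model size for more inference steps corresponds to shrinking the decoder while increasing the inner-loop step count.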
