Poster
When do neural networks learn world models?
Tianren Zhang · Guanyu Chen · Feng Chen
West Exhibition Hall B2-B3 #W-804
Humans develop world models that capture the underlying generation process of data. Whether neural networks can learn similar world models remains an open problem. In this work, we present the first theoretical results for this problem, showing that in a multi-task setting, models with a low-degree bias provably recover latent data-generating variables under mild assumptions--even if proxy tasks involve complex, non-linear functions of the latents. However, such recovery is sensitive to model architecture. Our analysis leverages Boolean models of task solutions via the Fourier-Walsh transform and introduces new techniques for analyzing invertible Boolean transforms, which may be of independent interest. We illustrate the algorithmic implications of our results and connect them to related research areas, including self-supervised learning, out-of-distribution generalization, and the linear representation hypothesis in large language models.
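To make the abstract's main tool concrete, below is a minimal sketch (in Python, not taken from the paper) of the Fourier-Walsh expansion of a Boolean function on {-1,+1}^n; the Fourier degree it computes is one natural measure of the kind of "simplicity" a low-degree bias prefers. The function names and the XOR-style example task are illustrative assumptions, not the paper's actual setup.

```python
import itertools
import numpy as np

def walsh_fourier_coefficients(f, n):
    """All Fourier-Walsh coefficients of f: {-1,+1}^n -> R.

    hat_f(S) = E_x[f(x) * prod_{i in S} x_i], with the expectation taken
    over the uniform distribution on the Boolean cube {-1,+1}^n.
    """
    cube = list(itertools.product([-1, 1], repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            # Parity (character) chi_S(x) = prod_{i in S} x_i; empty product is 1.
            chi = [np.prod([x[i] for i in S]) for x in cube]
            coeffs[S] = float(np.mean([f(x) * c for x, c in zip(cube, chi)]))
    return coeffs

def fourier_degree(coeffs, tol=1e-9):
    """Degree = size of the largest subset S with a nonzero coefficient."""
    return max((len(S) for S, c in coeffs.items() if abs(c) > tol), default=0)

if __name__ == "__main__":
    n = 3
    # Hypothetical proxy task: the product (XOR in +/-1 encoding) of two latent
    # bits, a degree-2 function of the latents.
    task = lambda x: x[0] * x[1]
    coeffs = walsh_fourier_coefficients(task, n)
    print({S: round(c, 3) for S, c in coeffs.items() if abs(c) > 1e-9})
    print("degree:", fourier_degree(coeffs))
```

Under a low-degree bias, a learner asked to fit many such tasks would favor representations in which each task has a low-degree expansion of this form; the paper's results concern when that preference forces recovery of the latent variables themselves.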
Will neural networks trained on trillions of data points "understand" the world behind the data as humans do? We take an initial step toward formulating and answering this question by leveraging a universal learning tendency shared by both neural networks and humans: preferring "simple" solutions among all solutions that explain the data.

Our paper's main result shows that a neural network model can understand the data--in the sense of recovering the underlying data-generating process--if two conditions are met: the model prefers "simple" solutions, and it is trained on a sufficient number of tasks. Our second result shows that the model's architecture also influences its understanding of the data.

These findings have implications for assessing what modern neural networks actually learn from data and how they make predictions on previously unseen data. As a by-product, we also find a natural way to define "simplicity" that may apply to other scenarios.