Poster in Workshop: 1st Workshop on Foundation Models for Structured Data (FMSD)
Rethinking Description Length: A TabPFN-Based Approximation of Bayesian Mixture Codes
Afiq Aswadi · Susan Wei · Ria Jeffrey
The Minimum Description Length (MDL) principle, a model selection framework based on Occam’s razor, is typically studied through universal codes and their associated codelengths. Blier & Ollivier (2018) studied the MDL principle for deep neural networks, comparing various codes and finding that a prequential code, based on neural networks trained through stochastic gradient descent, yields shorter codelengths than variational and two-part codes. Recent developments in deep learning point to a better way to approximate the Bayes mixture code than the variational code. Specifically, Hollmann et al. (2023) present a transformer architecture, called Tabular Prior-Data Fitted Networks (TabPFNs), which is trained on synthetic data generated from a vast array of prior-likelihood pairs and is encouraged to learn the corresponding Bayes posterior predictive distribution. We use TabPFN to induce a code through in-context learning and demonstrate on real-world datasets from the OpenML-CC18 suite that the resulting code is consistently shorter than the prequential code corresponding to MLPs.
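As a rough illustration (not the authors' released code), the sketch below shows how an in-context codelength of this kind could be computed with the public `tabpfn` package, assuming its `TabPFNClassifier.fit` / `predict_proba` interface and integer labels 0..K-1: labels are encoded block by block using the model's posterior predictive conditioned on all previously seen examples, and the codelength is the accumulated negative log-probability in bits.

```python
# Hypothetical sketch of a block-wise in-context code; block_size and the
# uniform code for the first block are illustrative assumptions.
import numpy as np
from tabpfn import TabPFNClassifier

def tabpfn_codelength(X, y, block_size=100, n_classes=None):
    """Return the codelength of y (in bits) under TabPFN's in-context predictions."""
    n = len(y)
    n_classes = n_classes or len(np.unique(y))
    total_bits = 0.0
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        if start == 0:
            # No context yet: fall back to a uniform code for the first block.
            probs = np.full((end - start, n_classes), 1.0 / n_classes)
        else:
            clf = TabPFNClassifier()
            clf.fit(X[:start], y[:start])            # context = all previous examples
            proba = clf.predict_proba(X[start:end])  # posterior predictive for next block
            # Align columns to the full label set (classes unseen in the
            # context get a small floor probability).
            probs = np.full((end - start, n_classes), 1e-12)
            probs[:, clf.classes_.astype(int)] = proba
        # Accumulate -log2 p(y_i | context) for the labels in this block.
        total_bits += -np.log2(probs[np.arange(end - start), y[start:end]] + 1e-12).sum()
    return total_bits
```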