Poster in Workshop: 1st Workshop on Foundation Models for Structured Data (FMSD)
TabRep: Training Tabular Diffusion Models with a Simple and Effective Continuous Representation
Jacob Si · Zijing Ou · Mike Qu · Zhengrui Xiang · Yingzhen Li
Diffusion models for tabular data generation face a conundrum between separate and unified data representations. The former struggles to jointly capture multi-modal distributions, while the latter often relies on sparse, suboptimal encodings and incurs high computational costs. In this work, we address the limitations of the latter by presenting TabRep, a diffusion architecture trained with a unified, continuous representation tailored for tabular data. Motivated by geometric insights into the data manifold, our representation is dense, separable, and preserves intrinsic relationships. TabRep achieves state-of-the-art performance, synthesizing data that surpasses the original in downstream quality while maintaining privacy and efficiency.