Poster in Workshop: 1st Workshop on Foundation Models for Structured Data (FMSD)

TabRep: Training Tabular Diffusion Models with a Simple and Effective Continuous Representation

Jacob Si · Zijing Ou · Mike Qu · Zhengrui Xiang · Yingzhen Li


Abstract:

Diffusion models for tabular data generation face a trade-off between separate and unified data representations. The former struggles with jointly capturing multi-modal distributions, while the latter often relies on sparse, suboptimal encodings and incurs high computational costs. In this work, we address the latter by presenting TabRep, a diffusion architecture trained with a unified, continuous representation tailored for tabular data. Motivated by geometric insights into the data manifold, our representation is dense, separable, and preserves intrinsic relationships. TabRep achieves state-of-the-art performance, synthesizing data that surpasses the original data in downstream quality while maintaining privacy and efficiency.
