

Poster

Exponential Family Variational Flow Matching for Tabular Data Generation

Andres Guzman Cordero · Floor Eijkelboom · Jan-Willem van de Meent

West Exhibition Hall B2-B3 #W-1014
Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

While denoising diffusion and flow matching have driven major advances in generative modeling, their application to tabular data remains limited, despite the ubiquity of such data in real-world applications. To this end, we develop TabbyFlow, a Variational Flow Matching (VFM) method for tabular data generation. To apply VFM to data with mixed continuous and discrete features, we introduce Exponential Family Variational Flow Matching (EF-VFM), which represents heterogeneous data types using a general exponential family distribution. This yields an efficient, data-driven objective based on moment matching, enabling principled learning of probability paths over mixed continuous and discrete variables. We also establish a connection between VFM and generalized flow matching objectives based on Bregman divergences. Evaluation on tabular data benchmarks demonstrates state-of-the-art performance compared to baselines.
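The link between an exponential family likelihood and moment matching can be illustrated with a standard identity: for p(x | η) ∝ exp(η · T(x) − A(η)), the gradient of the negative log-likelihood with respect to η equals E[T(x)] − T(x), the gap between predicted and observed sufficient statistics. The sketch below is not the authors' implementation. It assumes a unit-variance Gaussian for a numerical column and a categorical distribution for a discrete column, and all function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: the mean parameters of a categorical.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def gaussian_nll(eta, x):
    # Unit-variance Gaussian with natural parameter eta (= mean),
    # written up to an additive constant that does not depend on eta.
    return 0.5 * (x - eta) ** 2

def gaussian_moment_gap(eta, x):
    # T(x) = x and E[T(x)] = eta, so the NLL gradient is eta - x.
    return eta - x

def categorical_nll(logits, k):
    # Categorical with natural parameters = logits; A(logits) = logsumexp.
    shifted = logits - np.max(logits)
    return np.log(np.exp(shifted).sum()) - shifted[k]

def categorical_moment_gap(logits, k):
    # T(x) = one_hot(x) and E[T(x)] = softmax(logits).
    one_hot = np.zeros(len(logits))
    one_hot[k] = 1.0
    return softmax(logits) - one_hot

def mixed_row_moment_gap(eta_num, x_num, logits_cat, k_cat):
    # Assumed toy row: one numerical and one categorical column,
    # treated as independent (a mean-field style factorization).
    return np.concatenate((
        [gaussian_moment_gap(eta_num, x_num)],
        categorical_moment_gap(logits_cat, k_cat),
    ))
```

Under these assumptions, driving the moment gap to zero for each column is the same as minimizing the per-column negative log-likelihood, which is why a single objective can cover both continuous and discrete features.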

Lay Summary:

Recent techniques for creating realistic artificial data have greatly improved image and text generation. However, generating realistic tabular data, for example, patient records, is still challenging, even though this type of data is everywhere in practical applications.

To address this, we introduce TabbyFlow, a new method specifically designed to generate realistic tabular data. Tabular data often includes different kinds of information: numbers, categories, yes/no answers, etc. Our approach, called Exponential Family Variational Flow Matching (EF-VFM), can handle all these different data types smoothly. It cleverly combines and models numerical and categorical data, ensuring the generated data closely matches real data.

Our approach simplifies and improves the way artificial tabular data is generated, making it more accurate and realistic. Tests show that TabbyFlow outperforms other leading methods, producing data that better matches real-world tabular information.
