Poster in Workshop: 1st Workshop on Foundation Models for Structured Data (FMSD)
Towards Synthetic Data for Fine-tuning Tabular Foundation Models
Magnus Bühler · Lennart Purucker · Frank Hutter
Tabular foundation models pre-trained on synthetically generated datasets have exhibited strong in-context learning capabilities. While fine-tuning can further enhance predictive performance, overfitting to the training data of a downstream task poses a significant risk in tiny-to-small data regimes. We propose a fine-tuning method that employs synthetically generated fine-tuning data to avoid overfitting and improve generalization performance. We study three variants of data generation methods and empirically demonstrate that they mitigate overfitting and outperform standard fine-tuning approaches across five tiny-to-small real-world datasets. The best-performing of our data generation methods leverage density estimators and structural causal models akin to those employed during pre-training. Our findings indicate that synthetic data generation, a central element of pre-training, can be successfully adapted to enhance fine-tuning.
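To make the core idea concrete, the sketch below fits a simple density estimator to a tiny training set and samples synthetic rows that could serve as fine-tuning data. This is a minimal illustration only: the paper's actual generators (including the structural-causal-model variant) are more elaborate, and the moment-matched Gaussian used here, along with all variable names and sizes, are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny real training set: 20 rows, 3 numeric features
# (sizes are illustrative, not from the paper).
X_real = rng.normal(loc=[0.0, 5.0, -2.0], scale=[1.0, 2.0, 0.5], size=(20, 3))

# Density estimation step: fit a multivariate Gaussian by moment matching.
# The paper's density estimators are more sophisticated; this stands in
# for the general "fit a density, then sample from it" recipe.
mu = X_real.mean(axis=0)
cov = np.cov(X_real, rowvar=False)

# Generate synthetic fine-tuning data by sampling the fitted density.
# These rows would augment (or replace) the real rows during fine-tuning,
# reducing the risk of memorizing the tiny training set.
X_synth = rng.multivariate_normal(mu, cov, size=200)

print(X_synth.shape)  # (200, 3)
```

Because the synthetic rows are drawn from a density fitted to the real data rather than from the real rows themselves, the fine-tuning loss is computed on fresh samples each epoch, which is the mechanism the abstract credits with mitigating overfitting.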