

Poster in Workshop: 1st Workshop on Foundation Models for Structured Data (FMSD)

ConTextTab: A Semantics-Aware Tabular In-Context Learner

Marco Spinaci · Marek Polewczyk · Maximilian Schambach · Sam Thelin


Abstract:

Tabular in-context learning (ICL) models such as TabPFN and TabICL have recently achieved state-of-the-art (SOTA) performance on several tabular prediction tasks. However, because these models are trained exclusively on synthetic data, they do not fully leverage the rich semantics and world knowledge contained in real-world data. Tabular ICL models built on pretrained large language models, such as TabuLa-8B, do integrate semantics and world knowledge, but inherent architectural limitations restrict them to a small amount of context. Aiming to bridge this gap, we introduce ConTextTab, which integrates semantic understanding and alignment into a table-native ICL framework. Using specialized embeddings for different data modalities and training on large-scale real-world tabular data, our model is competitive with the SOTA across a broad set of benchmarks while setting a new standard on the semantically rich CARTE benchmark.
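To make the tabular ICL setting concrete, the sketch below illustrates the calling convention such models share: labeled context rows and unlabeled query rows are passed in a single call, and labels are predicted without any gradient updates. The 1-nearest-neighbor predictor here is a toy stand-in for illustration only, not ConTextTab's (or TabPFN's) actual architecture.

```python
# Minimal sketch of the tabular in-context learning (ICL) interface:
# the "model" sees labeled context rows and predicts labels for query
# rows in one call, with no training step. The 1-NN rule is a toy
# stand-in, NOT the architecture of ConTextTab or TabPFN.

from math import dist

def icl_predict(context_rows, context_labels, query_rows):
    """Predict a label for each query row from the in-context examples only."""
    predictions = []
    for q in query_rows:
        # Take the label of the closest context row (Euclidean distance).
        nearest = min(range(len(context_rows)),
                      key=lambda i: dist(context_rows[i], q))
        predictions.append(context_labels[nearest])
    return predictions

# Toy table: two numeric feature columns, binary target.
context = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.3)]
labels = ["low", "low", "high", "high"]
queries = [(0.1, 0.2), (5.2, 5.0)]
print(icl_predict(context, labels, queries))  # → ['low', 'high']
```

Real tabular ICL models replace the nearest-neighbor rule with a transformer that attends over the full context table, which is where context-length limits and semantic embeddings become the key design questions.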
