Workshop
1st Workshop on Foundation Models for Structured Data (FMSD)
Nick Erickson · Xiyuan Zhang · Abdul Fatir Ansari · Boran Han · Mononito Goswami · Samuel Gabriel Müller · Lennart Purucker · Yuyang Wang · Christos Faloutsos · Michael Mahoney
West Ballroom D
Fri 18 Jul, 9 a.m. PDT
Foundation models for structured data are a rapidly growing area of research, yet they remain critically under-explored relative to image and text modalities. So far, the different structured data sub-communities have had little opportunity to come together and share insights about how to build such models. Yet strong synergies exist across modalities, since the models share similar pre-training and in-context learning paradigms, and models trained on one modality can show promising predictive performance on another. This workshop brings together the tabular and time series communities to jointly discuss foundation models for structured data, enabling the communities to capitalize on their synergies. We aim for advancements in foundation models that unify structured data modalities, addressing challenges of scalability and generalization across real-world applications. This emerging field promises to transform how we approach structured data analysis and to open new opportunities across various domains.
Schedule
Fri 9:00 a.m. - 9:15 a.m. | Opening Remarks (Intro)
Fri 9:15 a.m. - 9:45 a.m. | Invited Talk: Frank Hutter (Invited Talk)
Fri 9:45 a.m. - 10:15 a.m. | Invited Talk: Andrew Gordon Wilson (Invited Talk)
Fri 10:15 a.m. - 10:30 a.m. | Coffee Break (Coffee Break)
Fri 10:30 a.m. - 10:40 a.m. | Spotlight 1: ConTextTab: A Semantics-Aware Tabular In-Context Learner (Spotlight Talk)
Fri 10:40 a.m. - 10:50 a.m. | Spotlight 2: Toto: An Open Time Series Foundation Model Optimized for Observability (Spotlight Talk)
Fri 10:50 a.m. - 11:00 a.m. | Spotlight 3: Do You Really Need Public Data? Surrogate Public Data for Differential Privacy on Tabular Data (Spotlight Talk)
Fri 11:00 a.m. - 11:10 a.m. | Spotlight 4: TiRex: Zero-Shot Forecasting Across Long and Short Horizons (Spotlight Talk)
Fri 11:10 a.m. - 12:50 p.m. | Poster Session and Lunch (Poster Session)
Fri 12:50 p.m. - 1:20 p.m. | Invited Talk: Yan Liu (Invited Talk)
Fri 1:20 p.m. - 1:30 p.m. | Spotlight 5: Towards Generalizable Multimodal ECG Representation Learning with LLM-extracted Clinical Entities (Spotlight Talk)
Fri 1:30 p.m. - 1:40 p.m. | Spotlight 6: Eliciting Numerical Predictive Distributions of LLMs Without Auto-Regression (Spotlight Talk)
Fri 1:40 p.m. - 1:50 p.m. | Spotlight 7: CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data only (Spotlight Talk)
Fri 1:50 p.m. - 2:00 p.m. | Spotlight 8: Exploring Relational Database Foundation Models from a Graph Perspective (Spotlight Talk)
Fri 2:00 p.m. - 2:30 p.m. | Invited Talk: Hao Wang (Invited Talk)
Fri 2:30 p.m. - 3:00 p.m. | Invited Talk: Mihaela Van Der Schaar (Invited Talk)
Fri 3:00 p.m. - 3:15 p.m. | Coffee Break (Coffee Break)
Fri 3:15 p.m. - 3:45 p.m. | Invited Talk: Josh Gardner (Invited Talk)
Fri 3:45 p.m. - 4:45 p.m. | Poster Session (Poster Session)
Fri 4:45 p.m. - 5:00 p.m. | Closing Remarks (Closing Remarks)
Make Still Further Progress: Chain of Thoughts for Tabular Data Leaderboard (Poster) | Si-Yang Liu · Qile Zhou · Han-Jia Ye
CLEAR: Contextual Logic-based Explanations for Anomaly Reasoning (Poster) | Vikash Sharma · Vipul Joshi · Anurag Tripathi · Mayank Jauhari · Amir Raza
Towards Benchmarking Foundation Models for Tabular Data With Text (Poster) | Martin Mráz · Breenda Das · Anshul Gupta · Lennart Purucker · Frank Hutter
Calibration Properties of Time Series Foundation Models (Poster) | Coen Adler · Yuxin Chang · Samar Abdi · Padhraic Smyth
TabPFN Unleashed: A Scalable and Effective Solution to Tabular Classification Problems (Poster) | Si-Yang Liu · Han-Jia Ye
Gateformer: Advancing Multivariate Time Series Forecasting via Temporal and Variate-Wise Attention with Gated Representations (Poster) | Yu-Hsiang Lan
Query, Don’t Train: Privacy-Preserving Tabular Prediction from EHR Data via SQL Queries (Poster) | Josefa Stoisser · Marc Martell · Kaspar Märtens · Lawrence Phillips · Stephen Town · Rory Donovan-Maiye · Julien Fauqueur
Efficient Table Generation for Zero-Shot Column Type Annotation (Poster) | Ehsan Hoseinzade · Ke Wang
GATS: A Time-Series Dataset for Addressing General Aviation Flight Safety (Poster) | Charles Duong · Aidan LaBella · Pak Iong Long · Nathan DePiero · Aditya Iyer · Elise Carman · Randall Balestriero · Travis Desell
Foundation models for time series forecasting and policy evaluation in infectious disease epidemics: a modelling study (Poster) | Suprabhath Kalahasti · Benjamin Faucher · boxuan wang · Claudio Ascione · Federico Baldo · Eugenio Valdano
State-Space Models for Tabular Prior-Data Fitted Networks (Poster) | Felix Koch · Marcel Wever · Fabian Raisch · Benjamin Tischler
Do You Really Need Public Data? Surrogate Public Data for Differential Privacy on Tabular Data (Poster) | Shlomi Hod · Lucas Rosenblatt · Julia Stoyanovich
GIT-BO: High-Dimensional Bayesian Optimization with Tabular Foundation Models (Poster) | Rosen Yu · Cyril Picard · Faez Ahmed
Two-Stage Contrastive Language Electrocardiogram Pre-training for Fine-Grained Waveform Features (Poster) | Haitao Li · che liu · Zhengyao Ding · Ziyi Liu · Zhengxing Huang
G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning (Poster) | Xiaojun Guo · Ang Li · Yifei Wang · Stefanie Jegelka · Yisen Wang
Speaking Numbers to LLMs: Multi-Wavelet Number Embeddings for Time Series Forecasting (Poster) | Defu Cao · Zijie Lei · Jiao Sun · Yan Liu
Are Time Series Foundation Models Ready for Zero-Shot Forecasting? (Poster) | Yunkai Zhang · Qi Zeng · Yawen Zhang · Zhijie Xu · Ming Zheng · Chongyang Gao · Muyan Jiang · Zeyu Zheng
Instruction Tuning of Large Language Models for Tabular Data Generation—in One Day (Poster) | Milad Abdollahzadeh · Abdul Raheem · Zilong Zhao · Uzair Javaid · Kevin Yee · Venkata Abhishek Nalam · Tram Truong-Huu · Biplab Sikdar
Multivariate Calibration is Performative: A Perspective on Pitfalls and Progress (Poster) | Sofian Zalouk · Charles Marx · Syrine Belakaria · Chris De Sa · Stefano Ermon
Eliciting Numerical Predictive Distributions of LLMs Without Auto-Regression (Poster) | Julianna Piskorz · Katarzyna Kobalczyk · Mihaela van der Schaar
Toward Scientific Foundation Models for Aquatic Ecosystems (Poster) | Abhilash Neog · Medha Sawhney · Kazi Sajeed Mehrab · Sepideh Fatemi · Mary Lofton · Amartya Dutta · Aanish Pradhan · Bennett McAfee · Emma Marchisin · Robert Ladwig · Arka Daw · Cayelan C. Carey · Paul Hanson · Anuj Karpatne
Towards Synthetic Data for Fine-tuning Tabular Foundation Models (Poster) | Magnus Bühler · Lennart Purucker · Frank Hutter
Self-Imputation and Cross-Variable Learning Improve Water Quality Prediction with Sparse Data (Poster) | Xiaofeng Liu · Xiaobo Xia · Xuechen Zhang · Mohna Chakraborty · Xiyuan Chang · Kuai Fang · William Currie · Samet Oymak
In-context Pre-trained Time-Series Foundation Models adapt to Unseen Tasks (Poster) | Shangqing Xu · Harshavardhan Kamarthi · Haoxin Liu · B. Aditya Prakash
Theoretical Insights into In-context Learning with Unlabeled Data (Poster) | Yingcong Li · Xiangyu Chang · Muti Kara · Xiaofeng Liu · Amit Roy-Chowdhury · Samet Oymak
Toto: An Open Time Series Foundation Model Optimized for Observability (Poster) | Ben Cohen · Emaad Khwaja · Youssef Doubli · Salahidine Lemaachi · Chris Lettieri · Charles Masson · Hugo Miccinilli · Elise Ramé · Qiqi Ren · Afshin Rostamizadeh · Jean du Terrail · Anna-Monica Toon · Kan Wang · Stephan Xie · Zongzhe Xu · Viktoriya Zhukova · David Asker · Ameet Talwalkar · Othmane Abou-Amal
Assessing the Robustness of Tabular Prior-Data Fitted Network Classifier (Poster) | Ali Nawaz · Amir Ahmad · Shehroz Khan
TiRex: Zero-Shot Forecasting Across Long and Short Horizons (Poster) | Andreas Auer · Patrick Podest · Daniel Klotz · Sebastian Böck · Günter Klambauer · Sepp Hochreiter
Foundation Models for Clinical Records at Health System Scale (Poster) | Haresh Rengaraj Rajamohan · Xiang Gao · Weicheng Zhu · Shih-Lun Huang · Long Chen · Kyunghyun Cho · Cem Deniz · Narges Razavian
Towards Generalizable Multimodal ECG Representation Learning with LLM-extracted Clinical Entities (Poster) | Mingsheng Cai · Jiuming Jiang · Wenhao Huang · che liu · Rossella Arcucci
W-LSTMix: A Hybrid Modular Forecasting Framework for Trend and Pattern Learning in Short-Term Load Forecasting (Poster) | SHIVAM DWIVEDI · Anuj Kumar · Harish Kumar Saravanan · Pandarasamy Arjunan
GeoGrid-Bench: Can Foundation Models Understand Multimodal Gridded Geo-Spatial Data? (Poster) | Bowen Jiang · Yangxinyu Xie · Xiaomeng Wang · Jiashu HE · John Hutchison · Jordan Branham · Camillo Taylor · Tanwi Mallick
Filter, Augment, Forecast: Online Data Selection for Robust Time Series Forecasting (Poster) | Ege Onur Taga · Halil Alperen Gozeten · Kutay Tire · Rahul Dalvi · Reinhard Heckel · Samet Oymak
Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting (Poster) | Mert Cakiroglu · Idil Altun · HASAN KURBAN · Elham buxton · Mehmet Dalkilic
Leveraging Linguistic Concepts for Structured Data Extraction (Poster) | Ahmed Nabhan · Suleiman Khan
From Structured Data to Clinical Notes: Robust Clinical Decision Support with Fine-Tuned LLMs (Poster) | Frederike Lübeck · Jonas Wildberger · Frederik Träuble · Maximilian Mordig · Sergios Gatidis · Andreas Krause · Bernhard Schölkopf
Random Initialization Can’t Catch Up: The Advantage of Language Model Transfer for Time Series Forecasting (Poster) | Roland Riachi · Kashif Rasul · Arjun Ashok · Prateek Humane · Alexis Roger · Andrew Williams · Yuriy Nevmyvaka · Irina Rish
LUNA: Efficient and Topology-Agnostic Foundation Model for EEG Signal Analysis (Poster) | Berkay Döner · Thorir Ingolfsson · Luca Benini · Yawei Li
RECoRD: A Multi-Agent LLM Framework for Reverse Engineering Codebase to Relational Diagram (Poster) | Yuan Xue · Xiaoyu Lu · Yunfei Bai · Hoiyi Ng · Yunan Liu
Dual Adaptation of Time-Series Foundation Models for Financial Forecasting (Poster) | Fatemeh Chitsaz · Saman Haratizadeh
TabRep: Training Tabular Diffusion Models with a Simple and Effective Continuous Representation (Poster) | Jacob Si · Zijing Ou · Mike Qu · Zhengrui Xiang · Yingzhen Li
CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data only (Poster) | Shifeng Xie · Vasilii Feofanov · Marius Alonso · Ambroise Odonnat · Jianfeng Zhang · Ievgen Redko
Towards Interpretable Time Series Foundation Models (Poster) | Matthieu Boileau · Philippe Helluy · Jérémy Pawlus · Svitlana Vyetrenko
AdaRec: Adaptive Recommendation with LLMs via Narrative Profiling and Dual-Channel Reasoning (Poster) | Meiyun Wang · Charin Polpanumas
DriMM: Drilling Multimodal Model for Time-Series and Text in the Era of Large Models (Poster) | Sebastiaan Buiting · Soumyadipta Sengupta · Abdallah Benzine · Amine EL KHAIR · Imane Khaouja · Youssef Tamaazousti
Photoplethysmography, Foundation Models, Hypertension and Diabetes (Poster) | George Searle
ConTextTab: A Semantics-Aware Tabular In-Context Learner (Poster) | Marco Spinaci · Marek Polewczyk · Maximilian Schambach · Sam Thelin
Rethinking Description Length: A TabPFN-Based Approximation of Bayesian Mixture Codes (Poster) | Afiq Aswadi · Susan Wei · Ria Jeffrey
Do Large Foundation Models Improve Time Series Segmentation? An Industrial Case Study in Oil and Gas Drilling (Poster) | Imane Khaouja · Amine EL KHAIR · Abdallah Benzine · Sebastiaan Buiting · Soumyadipta Sengupta · Youssef Tamaazousti
Do-PFN: In-Context Learning for Causal Effect Estimation (Poster) | Jake Robertson · Arik Reuter · Siyuan Guo · Noah Hollmann · Frank Hutter · Bernhard Schölkopf
LEAD - Framework for efficient time-series anomaly detection on large scale data using LLMs (Poster) | Akash Chandrayan · Amir ZIDI · Matthew Reimherr · Anis Mjirda · Abhinav Pradhan
Learning What Matters First: Sequential Adaptation of Time Series Foundation Models for Robust Financial Forecasting (Poster) | Fatemeh Chitsaz · Saman Haratizadeh
From Tabular to Time Series: Can TabPFN Handle Mixed Data? A Study on PhysioNet (Poster) | Zichao Li · Bingyang Wang · Zong Ke
Exploring Relational Database Foundation Models from a Graph Perspective (Poster) | Yanbo Wang · Xiyuan Wang · Quan Gan · Minjie Wang · Qibin Yang · David Wipf · Muhan Zhang
One-Run Privacy Auditing for Structured Generative and Foundation Models (Poster) | Rishav Chourasia · Zilong Zhao · Uzair Javaid
Lights Out, Tabs On: Advancing Row-Column Encoding for Tabular LLMs (Poster) | Yi-Kai Zhang · Huai-Hong Yin · Xin Li · Haoyu Cao · Yinsong Liu · Deqiang Jiang · Xing Sun · De-Chuan Zhan · Han-Jia Ye
Early Stopping Tabular In-Context Learning (Poster) | Jaris Küken · Lennart Purucker · Frank Hutter
Towards a Multi-Modal Foundation Model for Inertial Confinement Fusion: Combining Structured Data and Diagnostic Images (Poster) | Michael Jones · Bogdan Kustowski
Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification (Poster) | Vasilii Feofanov · Marius Alonso · Songkang Wen · Romain Ilbert · Hongbo Guo · Malik TIOMOKO · Lujia Pan · Jianfeng Zhang · Ievgen Redko
FoMo-0D: A Foundation Model for Zero-shot Outlier Detection (Poster) | Yuchen Shen · Haomin Wen · Leman Akoglu
Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data (Poster) | Anurag Garg · Muhammad Ali · Noah Hollmann · Lennart Purucker · Samuel Gabriel Müller · Frank Hutter
Soft Contrastive Learning for Irregular Multivariate Time Series (Poster) | Junghoon Lim · Seunghan Lee · Taeyoung Park
Towards Fair In-Context Learning with Tabular Foundation Models (Poster) | Patrik Kenfack · Samira Ebrahimi Kahou · Ulrich Aïvodji
Causal Foundation Models: Disentangling Physics from Instrument Properties (Poster) | Jeroen Audenaert · Daniel Muthukrishna · Paul Gregory · David W. Hogg · V Villar
Explore the Time Series Forecasting Potential of TabPFN Leveraging the Intrinsic Periodicity of Data (Poster) | Sibo Cai · Xi Sun · Hui Zhong
Simulation-Pretrained Foundation Models for Domain-General Astronomical Time Series Tasks with Minimal Labeled Data (Poster) | Rithwik Gupta · Daniel Muthukrishna
From Video Classification to Action Detection: Foundation vs. Task-Specific Models (Poster) | Goncalo Mesquita · Ana Rita Cóias · Alexandre Bernardino · Artur Dubrawski
MORPHEUS: A Foundation Model for Multivariate Time Series Forecasting (Poster) | Prathamesh Patil · Amit Varshney · Manoj Cherukumalli · Harsh Deshpande · Leonard Eun · Dushyant Sahoo · Naren Chittar
TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction (Poster) | Tommy M. Xu · Andy Zhang · Xiangyu Sun · Lauren Zung · Hossein Hajimirsadeghi · Greg Mori
Improving Treatment Effect Estimation with LLM-Based Data Augmentation (Poster) | Nicolas Huynh · Julianna Piskorz · Jeroen Berrevoets · Max Ruiz Luyten · Mihaela van der Schaar
TimePoint: Accelerated Time Series Alignment via Self-Supervised Keypoint and Descriptor Learning (Poster) | Ron Shapira Weber · shahar benishay · Andrey Lavrinenko · Shahaf E. Finder · Oren Freifeld
LLM Agents Struggle at Time Series Machine Learning Engineering (Poster) | Yifu Cai · Xinyu Li · Mononito Goswami · Michal Wilinski · Gus Welter · Artur Dubrawski