

Poster in Workshop: 1st Workshop on Foundation Models for Structured Data (FMSD)

Self-Imputation and Cross-Variable Learning Improve Water Quality Prediction with Sparse Data

Xiaofeng Liu · Xiaobo Xia · Xuechen Zhang · Mohna Chakraborty · Xiyuan Chang · Kuai Fang · William Currie · Samet Oymak


Abstract:

Accurate water quality prediction is essential for effective environmental management, yet infrequent sampling leaves the data severely sparse, which makes traditional deep learning models difficult to train. To address this, we propose a novel two-stage framework that leverages a tabular foundation model for multivariate time series prediction under sparse data conditions. In the first stage, the model self-imputes missing water quality values using hydroclimatic and calendar-based features. In the second stage, for each target variable, the imputed time series of all other water quality variables serve as augmented inputs to further improve prediction. Evaluated on a continental-scale dataset, our proposed solution significantly outperforms both directly applied foundation models and traditional deep learning baselines. We also demonstrate that explicit self-imputation of missing data yields more accurate predictions than relying on the model's internal mechanisms for handling missing values. To the best of our knowledge, this is the first study to demonstrate the effectiveness of tabular foundation models for sparse environmental time series prediction, providing a reliable and data-efficient alternative to traditional deep sequence models.
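
The two-stage idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: a closed-form ridge regression stands in for the tabular foundation model, the data are synthetic, and every name here (`fit_ridge`, `predict_ridge`, `self_impute`, `two_stage_predict`, the driver columns) is hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    # Stand-in for the tabular foundation model (assumption): ridge regression
    # with a bias column, solved in closed form.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

def predict_ridge(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def self_impute(drivers, wq):
    # Stage 1: fill the gaps in each water quality column using only the
    # always-available driver features. Observed values are left unchanged.
    imputed = wq.copy()
    for j in range(wq.shape[1]):
        obs = ~np.isnan(wq[:, j])
        w = fit_ridge(drivers[obs], wq[obs, j])
        imputed[~obs, j] = predict_ridge(w, drivers[~obs])
    return imputed

def two_stage_predict(drivers, wq, target):
    # Stage 2: predict the target from the drivers plus the imputed series
    # of all OTHER water quality variables.
    imputed = self_impute(drivers, wq)
    others = np.delete(imputed, target, axis=1)
    X = np.hstack([drivers, others])
    obs = ~np.isnan(wq[:, target])
    w = fit_ridge(X[obs], wq[obs, target])
    return predict_ridge(w, X)

# Synthetic example (all values made up for illustration).
rng = np.random.default_rng(0)
T = 200
day = np.arange(T)
drivers = np.column_stack([
    rng.normal(size=T),                # hypothetical hydroclimatic feature
    rng.normal(size=T),                # hypothetical hydroclimatic feature
    np.sin(2 * np.pi * day / 365.25),  # calendar feature
    np.cos(2 * np.pi * day / 365.25),  # calendar feature
])
latent = rng.normal(size=T)            # signal shared across variables
wq_full = np.column_stack([
    drivers[:, 0] + latent,
    0.5 * drivers[:, 1] + latent,
])
wq = wq_full.copy()
wq[rng.random(wq.shape) < 0.7] = np.nan  # drop roughly 70% of samples

imputed = self_impute(drivers, wq)
pred = two_stage_predict(drivers, wq, target=0)
```

In this sketch the second stage can only benefit from another variable at time steps where that variable was actually observed, since its imputed values are themselves functions of the drivers. How the authors' foundation model exploits the imputed series may differ.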
