Poster in Workshop: DataWorld: Unifying data curation frameworks across domains
Filter, Augment, Forecast: Online Data Selection for Robust Time Series Forecasting
Ege Onur Taga · Halil Alperen Gozeten · Kutay Tire · Rahul Dalvi · Reinhard Heckel · Samet Oymak
Keywords: [ regression analysis ] [ data augmentation ] [ time series forecasting ] [ data selection ]
While significant effort has been devoted to developing deep learning architectures for time series forecasting, the role of data in the training pipeline remains relatively overlooked. In this work, we propose Filter, Augment, Forecast (FAF), an online data curation strategy based on (1) data selection to filter out low-quality (e.g., noisy) examples and (2) augmentation of the remaining high-quality data. Our filtering step relies on a reference model and is inspired by reducible holdout loss selection (RHO-LOSS) from the language modeling literature. We identify limitations of RHO-LOSS under the domain shifts common in time series and introduce the adaptive RHO method (AdaRho), which improves performance by updating the reference model during training. We provide a theoretical analysis using random matrix theory that highlights the impact of reference models and noise on data selection. FAF improves forecasting accuracy across diverse architectures without altering them, achieving median reductions of 5.6% in MSE and 3.2% in MAE across nine datasets.
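The filtering idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the selection score is the current model's per-example loss minus the reference model's loss, and it keeps a fixed fraction of each batch. The function names, the `keep_frac` parameter, and the exponential-moving-average reference update are hypothetical stand-ins, since the abstract does not specify how AdaRho updates the reference model.

```python
import numpy as np

def reducible_loss_select(model_loss, ref_loss, keep_frac=0.5):
    """Return sorted indices of the examples with the largest reducible loss.

    Reducible loss = loss under the current model - loss under the reference.
    Examples that stay hard for the reference too (likely noise) score low.
    """
    reducible = np.asarray(model_loss) - np.asarray(ref_loss)
    k = max(1, int(round(keep_frac * reducible.size)))
    # argsort is ascending, so the last k entries have the largest scores.
    return np.sort(np.argsort(reducible)[-k:])

def ema_update(ref_weights, model_weights, decay=0.9):
    """Hypothetical adaptive step: move the reference toward the current model."""
    return decay * ref_weights + (1.0 - decay) * model_weights

# Toy batch of per-example squared errors (illustrative values only).
model_loss = np.array([0.9, 4.0, 0.2, 1.5])
ref_loss = np.array([0.1, 3.9, 0.1, 0.3])
selected = reducible_loss_select(model_loss, ref_loss, keep_frac=0.5)
```

In this toy batch, example 1 has a high loss under both models, so its reducible loss is small and it is filtered out, while examples 0 and 3 are kept. Augmentation would then be applied only to the kept examples.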