Poster in Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models
Toward Dataset Distillation for Regression Problems
Jamie Mahowald · Ravi Srinivasan · Zhangyang “Atlas” Wang
Dataset distillation compresses a large dataset into a much smaller synthetic one while preserving its learning characteristics, yet it remains under-studied for regression problems. This paper presents a theoretical framework for regression dataset distillation based on bilevel optimization, in which an inner loop fits model parameters to the distilled data while an outer loop refines the distilled dataset itself. For regularized linear regression, we derive closed-form solutions to the inner problem and establish approximation guarantees when the number of features exceeds the size of the distilled dataset, using Polyak-Łojasiewicz properties to obtain linear convergence rates. Numerical experiments match our predictions with high coefficients of determination, validating the theory while reducing dataset size by an order of magnitude.
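As a concrete illustration of the bilevel setup the abstract describes, the following sketch distills a ridge-regression dataset in PyTorch: the inner problem (fitting the model to the distilled data) is solved in closed form, and the outer loop updates the distilled dataset by gradient descent on the real-data loss. The dimensions, regularization strength, and optimizer settings are illustrative assumptions, not the authors' exact configuration.

# A minimal sketch of bilevel dataset distillation for ridge regression.
# Sizes, lam, and optimizer settings are illustrative assumptions.
import torch

torch.manual_seed(0)
n, d, m = 1000, 50, 10     # real examples, features, distilled examples
lam = 1e-2                 # ridge regularization strength (assumed)

# Real dataset that the distilled set should mimic.
X = torch.randn(n, d)
w_true = torch.randn(d)
y = X @ w_true + 0.1 * torch.randn(n)

# Learnable distilled dataset (the outer variables).
Xs = torch.randn(m, d, requires_grad=True)
ys = torch.randn(m, requires_grad=True)
opt = torch.optim.Adam([Xs, ys], lr=1e-2)

for step in range(2000):
    # Inner problem: ridge regression on the distilled data has a
    # closed-form solution, so no inner gradient loop is needed.
    w = torch.linalg.solve(Xs.T @ Xs + lam * torch.eye(d), Xs.T @ ys)
    # Outer objective: loss of the distilled-trained model on real data.
    outer_loss = ((X @ w - y) ** 2).mean()
    opt.zero_grad()
    outer_loss.backward()
    opt.step()

print(f"outer loss after distillation: {outer_loss.item():.4f}")

Because the inner solution is an explicit differentiable function of the distilled data, gradients flow through the linear solve and the outer loop needs no unrolled inner iterations; this is the structural advantage of the regularized linear case that the closed-form analysis exploits.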