Poster
Efficient Core-set Selection for Deep Learning Through Squared Loss Minimization
Jianting Chen
East Exhibition Hall A-B #E-1708
Core-set selection (CS) for deep learning has become crucial for improving training efficiency and understanding datasets by identifying the most informative subsets. However, most existing methods rely on heuristics or complex optimization and struggle to balance efficiency and effectiveness. To address this, we propose a novel CS objective that adaptively balances losses between core-set and non-core-set samples by minimizing the sum of squared losses across all samples. Building on this objective, we introduce the Maximum Reduction as Maximum Contribution (MRMC) criterion, which identifies the samples with the maximal reduction in loss as those making the maximal contribution to overall convergence. Additionally, a balance constraint is incorporated to ensure an even distribution of contributions across the core-set. Experimental results demonstrate that MRMC significantly improves training efficiency while preserving model performance at minimal selection cost.
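The abstract does not specify how the MRMC scores are computed in practice, so the following is only a minimal sketch of the selection idea under stated assumptions: each sample's loss reduction is approximated as the difference between its losses at two training checkpoints, and the balance constraint is approximated by an even per-class selection quota. The function name `mrmc_select` and all of its parameters are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def mrmc_select(losses_before, losses_after, labels, budget):
    """Hypothetical sketch of MRMC-style core-set selection.

    losses_before / losses_after: per-sample losses at two training
    checkpoints (assumption: their difference proxies loss reduction).
    labels: per-sample class labels, used for the balance constraint.
    budget: total number of core-set samples to select.
    """
    # Maximal reduction in loss ~ maximal contribution to convergence.
    reduction = losses_before - losses_after
    classes = np.unique(labels)
    # Balance constraint approximated as an even per-class quota (assumption).
    per_class = budget // len(classes)
    selected = []
    for c in classes:
        idx = np.where(labels == c)[0]
        # Keep the top samples by loss reduction within each class.
        top = idx[np.argsort(reduction[idx])[::-1][:per_class]]
        selected.extend(top.tolist())
    return np.array(selected)

# Toy usage with synthetic losses and labels.
rng = np.random.default_rng(0)
n = 100
losses_before = rng.uniform(0.5, 2.0, n)
losses_after = losses_before - rng.uniform(0.0, 0.5, n)
labels = rng.integers(0, 5, n)
core = mrmc_select(losses_before, losses_after, labels, budget=20)
print(core.shape)  # (20,)
```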
In training deep learning models, selecting the most representative subsets of data can significantly improve training efficiency and help us better understand the data. However, most existing methods either rely on heuristic rules or require cumbersome computations, making it difficult to achieve both efficiency and effectiveness. To address this issue, we propose a novel approach that automatically balances the importance of different data samples and selects the most valuable subsets by measuring each sample's actual contribution to the model's training progress. Our experiments show that this method not only accelerates training but also maintains performance comparable to training on the full dataset, while saving substantial computational resources.