ICML Poster Adaptive Data Collection for Robust Learning Across Multiple Distributions

Chengbo Zang · Mehmet Turkcan · Gil Zussman · Zoran Kostic · Javad Ghaderi

East Exhibition Hall A-B #E-1701

Wed 16 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: We propose a framework for adaptive data collection aimed at robust learning in multi-distribution scenarios under a fixed data collection budget. In each round, the algorithm selects a data source to sample from and updates the model parameters accordingly. The objective is to find the model parameters that minimize the expected loss across all the data sources. Our approach integrates upper-confidence-bound (UCB) sampling with online gradient descent (OGD) to dynamically collect and annotate data from multiple sources. By bridging online optimization and multi-armed bandits, we provide theoretical guarantees for our UCB-OGD approach, demonstrating that it achieves a minimax regret of $O(T^{\frac{1}{2}}(K\ln T)^{\frac{1}{2}})$ over $K$ data sources after $T$ rounds. We further provide a lower bound showing that the result is optimal up to a $\ln T$ factor. Extensive evaluations on standard datasets and a real-world testbed for object detection in smart-city intersections show consistent performance improvements of our method over baselines such as random sampling and various active learning methods.
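To make the round structure in the abstract concrete, the following is a minimal sketch of a loop that pairs a UCB rule for choosing the data source with a projected online gradient step on the model parameter. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`ucb_ogd`, `squared_loss`, `regret_rate`), the scalar parameter, the loss bounded in [0, 1], the standard UCB1-style bonus, and the 1/sqrt(t) step size are all hypothetical choices made here for the example.

```python
import math
import random


def ucb_ogd(sources, loss_and_grad, theta0, rounds, lr=0.5, seed=0):
    """Illustrative UCB + OGD loop (hypothetical interface, not the paper's exact method).

    sources:       list of callables, each mapping an RNG to one sample
    loss_and_grad: callable (theta, sample) -> (loss in [0, 1], gradient)
    """
    rng = random.Random(seed)
    k = len(sources)
    counts = [0] * k        # how often each source has been sampled
    mean_loss = [0.0] * k   # running mean of the observed loss per source
    theta = theta0

    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1     # sample every source once to initialize the statistics
        else:
            # UCB index: empirical loss plus an exploration bonus (assumed UCB1 form)
            arm = max(
                range(k),
                key=lambda i: mean_loss[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        sample = sources[arm](rng)
        loss, grad = loss_and_grad(theta, sample)
        counts[arm] += 1
        mean_loss[arm] += (loss - mean_loss[arm]) / counts[arm]
        # Online gradient descent step, projected back onto [0, 1]
        theta = min(1.0, max(0.0, theta - lr / math.sqrt(t) * grad))
    return theta, counts


def squared_loss(theta, x):
    """Squared loss and its gradient; stays in [0, 1] when theta and x are in [0, 1]."""
    return (theta - x) ** 2, 2.0 * (theta - x)


def regret_rate(T, K):
    """The rate sqrt(T * K * ln T) from the stated bound, with constants omitted."""
    return math.sqrt(T * K * math.log(T))


# Three toy sources with different supports inside [0, 1] (made up for illustration).
toy_sources = [
    lambda rng: rng.uniform(0.0, 0.2),
    lambda rng: rng.uniform(0.4, 0.6),
    lambda rng: rng.uniform(0.8, 1.0),
]
theta, counts = ucb_ogd(toy_sources, squared_loss, theta0=0.5, rounds=300)
print(theta, counts)
```

In this toy setup the sampler spends more of its budget on sources where the current parameter incurs a higher loss, which is the intuition behind coupling a bandit rule with the optimizer. The helper `regret_rate` only evaluates the growth rate in the bound; dividing it by `T` shows the per-round quantity shrinking as the budget grows.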

Lay Summary:

Data collection and annotation are essential for robust model performance in modern machine learning and deep learning systems. Our research introduces a novel method that adaptively decides which of several data sources to collect and annotate data from, so that the model learns as efficiently as possible under a limited budget. In each iteration, the system selects a data source, collects and annotates a new sample from it, and updates the model accordingly. We propose an algorithm that combines techniques from optimization and reinforcement learning and comes with rigorous mathematical guarantees. We further test its performance on well-known datasets across multiple tasks and in a real-world smart-city testbed, demonstrating its effectiveness and flexibility.
