Poster
Adaptive Data Collection for Robust Learning Across Multiple Distributions
Chengbo Zang · Mehmet Turkcan · Gil Zussman · Zoran Kostic · Javad Ghaderi
East Exhibition Hall A-B #E-1701
Data collection and annotation are essential for robust model performance in modern machine learning and deep learning systems. Our research introduces a novel method that adaptively decides where to collect and annotate data from multiple data sources, so that the model learns as efficiently as possible under a limited budget. The method works iteratively: in each round, the system selects a data source, collects and annotates a new sample from it, and updates the model accordingly. We propose an algorithm that combines techniques from optimization and reinforcement learning and comes with rigorous theoretical guarantees. We evaluate it on well-known datasets across multiple tasks and in a real-world smart-city testbed, demonstrating its effectiveness and flexibility.
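The iterative loop described above (select a source, annotate one sample, update, repeat) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm. It swaps in a generic UCB-style bandit rule that favours the source whose estimated loss is currently highest, which targets the worst-case (robust) loss across sources. The loss model `difficulty / sqrt(n + 1)`, the assumption that each source's loss depends only on its own sample count, the noise level, and the names `simulated_loss`, `adaptive_collection`, `worst_case_loss`, `difficulties` and `c` are all hypothetical, chosen only for illustration.

```python
import math
import random


def simulated_loss(difficulty, n_samples):
    """Hypothetical noise-free loss of one source after n_samples annotations.

    Assumption for illustration only: loss decays like difficulty / sqrt(n + 1)
    and each source's loss depends only on its own sample count.
    """
    return difficulty / math.sqrt(n_samples + 1)


def worst_case_loss(difficulties, counts):
    """Robust objective: the largest (noise-free) loss across all sources."""
    return max(simulated_loss(d, n) for d, n in zip(difficulties, counts))


def adaptive_collection(difficulties, budget, c=0.1, noise=0.01, seed=0):
    """Spend `budget` annotations, one per round, across len(difficulties) sources.

    Each round picks the source with the highest optimistic (UCB-style) estimate
    of its current loss, "collects" one sample there, and re-measures that
    source's loss with Gaussian noise standing in for evaluation error.
    """
    rng = random.Random(seed)
    k = len(difficulties)
    counts = [0] * k
    # Initial noisy evaluation of every source before any collection.
    est_loss = [simulated_loss(d, 0) + rng.gauss(0.0, noise) for d in difficulties]
    for t in range(1, budget + 1):
        # Exploration bonus shrinks as a source accumulates samples.
        scores = [
            est_loss[i] + c * math.sqrt(math.log(t + 1) / (counts[i] + 1))
            for i in range(k)
        ]
        chosen = max(range(k), key=lambda i: scores[i])
        counts[chosen] += 1  # collect and annotate one new sample
        # "Update the model", then re-evaluate only the chosen source's loss.
        est_loss[chosen] = (
            simulated_loss(difficulties[chosen], counts[chosen])
            + rng.gauss(0.0, noise)
        )
    return counts


if __name__ == "__main__":
    difficulties = [1.0, 2.0, 4.0]  # hypothetical per-source difficulty
    budget = 300
    adaptive = adaptive_collection(difficulties, budget)
    uniform = [budget // len(difficulties)] * len(difficulties)
    print("adaptive:", adaptive, round(worst_case_loss(difficulties, adaptive), 3))
    print("uniform: ", uniform, round(worst_case_loss(difficulties, uniform), 3))
```

In this toy setting the rule spends most of the budget on the hardest source, so its worst-case loss ends up below that of a uniform split of the same budget. The real problem is harder: all sources share one model, so collecting from one source also changes the loss on the others.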