Poster
Faster Stochastic Optimization with Arbitrary Delays via Adaptive Asynchronous Mini-Batching
Amit Attia · Ofir Gaash · Tomer Koren
West Exhibition Hall B2-B3 #W-920
Training modern machine learning models often involves huge datasets and running computations in parallel across many computing units. But when these systems update their models, they sometimes use outdated (or “stale”) information because different units report back at different times. This delay can slow down learning and degrade performance.

We developed new methods that allow training algorithms to effectively reduce the impact of delays by making fewer, more meaningful updates using the most relevant parts of the delayed computations. Instead of relying on the average delay, which might be sensitive to a few very slow responses, our methods adapt to how often certain delays occur. This shift can lead to faster and more stable training.

Our methods can be applied to many standard training algorithms with little to no modification, and they scale naturally with increasing parallelism, making them compelling options for large-scale systems. By adjusting how and when updates are made, they make better use of the available computations, even when some are delayed. As a result, our methods can help machine learning models learn faster and more reliably in real-world computing environments.
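To make the idea concrete, here is a minimal sketch, assuming a generic asynchronous SGD loop in Python. It is not the paper's algorithm: the buffer size, the quantile cutoff, and names such as delay_adaptive_minibatch_sgd and staleness_quantile are illustrative assumptions. It only shows the general pattern described above of grouping delayed gradients into mini-batches and letting the observed delay distribution, rather than the average delay, decide which ones contribute to an update.

import numpy as np

def delay_adaptive_minibatch_sgd(grad_reports, x0, lr=0.1, batch_size=8,
                                 staleness_quantile=0.9):
    """grad_reports: iterable of (delay, gradient) pairs arriving from workers.

    Hypothetical sketch: accumulate delayed gradients into a mini-batch,
    discard the stalest ones based on an observed delay quantile, and take
    fewer, averaged steps.
    """
    x = np.asarray(x0, dtype=float)
    buffer = []       # (delay, gradient) pairs waiting to be used
    seen_delays = []  # running record of observed delays

    for delay, grad in grad_reports:
        seen_delays.append(delay)
        buffer.append((delay, np.asarray(grad, dtype=float)))

        if len(buffer) >= batch_size:
            # Keep only gradients whose delay falls below the observed quantile,
            # so a few very slow workers do not dominate the update.
            cutoff = np.quantile(seen_delays, staleness_quantile)
            fresh = [g for d, g in buffer if d <= cutoff]
            if fresh:
                # One averaged (mini-batch) step instead of many noisy stale steps.
                x = x - lr * np.mean(fresh, axis=0)
            buffer = []
    return x

# Toy usage with synthetic (delay, gradient) reports.
rng = np.random.default_rng(0)
reports = [(int(rng.integers(0, 100)), rng.normal(loc=1.0, size=3))
           for _ in range(256)]
x_final = delay_adaptive_minibatch_sgd(reports, x0=np.zeros(3))
print(x_final)

The design choice the sketch illustrates is the one highlighted in the summary: updates are triggered by accumulated work rather than by every arriving gradient, so a handful of extremely late responses affects which gradients are averaged, not how far the model moves.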