Poster
Instance-Optimal Pure Exploration for Linear Bandits on Continuous Arms
Sho Takemori · Yuhei Umeda · Aditya Gopalan
West Exhibition Hall B2-B3 #W-906
The linear bandit problem is an online optimization problem with an unknown linear reward function, about which the learner gains information solely through noisy evaluations. Motivated by its practical relevance, we focus on the case of a continuous arm set (action space), which can be viewed as a special case of Bayesian optimization. Furthermore, we address a pure exploration setting, where the learner's goal is to identify a near-optimal arm as quickly as possible. Devising an (asymptotically) optimal learning strategy is a significant hurdle, since it requires optimizing over the space of probability distributions on the arm set, which is infinite-dimensional when the arm set is continuous. We present a tractable algorithm that requires only a manageable number of oracle calls, and formally establish its asymptotic optimality.
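To make the problem setting concrete, here is a minimal Python sketch, not the paper's algorithm: it assumes a unit-ball arm set, Gaussian observation noise, and a naive uniform-sampling baseline with a regularized least-squares estimate (all names, such as theta_star and pull, are illustrative). The paper's contribution is instead an allocation strategy whose sample complexity is asymptotically instance-optimal.

```python
import numpy as np

# Illustrative setup (assumed, not from the paper): arms live on the
# unit ball in R^d, and rewards are noisy linear evaluations.
rng = np.random.default_rng(0)
d = 3
theta_star = rng.normal(size=d)              # unknown reward parameter
theta_star /= np.linalg.norm(theta_star)

def pull(x, sigma=0.1):
    """Noisy evaluation of the unknown linear reward at arm x."""
    return x @ theta_star + sigma * rng.normal()

# Naive pure-exploration baseline: sample arms uniformly on the sphere,
# fit theta by regularized least squares, and report the arm maximizing
# the estimated reward. An instance-optimal strategy would instead adapt
# its sampling distribution to the (estimated) problem instance.
T = 2000
V = np.eye(d)                                # regularized Gram matrix
b = np.zeros(d)
for _ in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)                   # uniform on the unit sphere
    r = pull(x)
    V += np.outer(x, x)
    b += r * x

theta_hat = np.linalg.solve(V, b)
# For a linear reward on the unit ball, the best arm is theta / ||theta||.
x_best = theta_hat / np.linalg.norm(theta_hat)
print("estimated best arm:", x_best)
print("true best arm:     ", theta_star)
```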