Poster
Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
Aya Kayal · Sattar Vakili · Laura Toni · Da-shan Shiu · Alberto Bernacchia
West Exhibition Hall B2-B3 #W-911
In many decision-making tasks—like tuning a chatbot, designing a product, or choosing ad content—it’s unrealistic to ask people to assign precise scores to every option. But people are usually much better at comparing two options and saying which one they prefer. This kind of preference feedback is often more reliable, even if it’s less detailed. Still, collecting it can be costly and time-consuming. That’s why it’s important to design smart algorithms that can figure out the best choice using as few of these comparisons as possible.

Our work focuses on building such efficient algorithms. The goal is to learn the best possible option by carefully choosing which pairs of options to compare, making the most of each piece of feedback.

What we found is both surprising and exciting: even though preference feedback is less detailed than numerical ratings, it’s still powerful enough to learn almost as well. In fact, we show that the number of comparisons needed to find a near-optimal decision is about the same as if we had access to full numeric scores. This result brings us closer to making machine learning more user-friendly and practical in real-world settings.
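To make the idea of learning from pairwise comparisons concrete, here is a minimal, hypothetical sketch. It assumes a standard Bradley–Terry-style preference model (an option with higher hidden utility wins a comparison with probability given by a logistic function of the utility gap) and uses a simple tournament heuristic—comparing each option against a running champion—purely for illustration; it is not the algorithm analyzed in the paper.

```python
import math
import random

def duel(utilities, i, j, rng):
    """Simulated human preference: option i beats option j with
    probability sigmoid(utilities[i] - utilities[j]) (Bradley-Terry model)."""
    p = 1.0 / (1.0 + math.exp(-(utilities[i] - utilities[j])))
    return rng.random() < p

def best_from_preferences(utilities, comparisons_per_pair=2000, seed=0):
    """Estimate the best option using only pairwise comparisons.

    Illustrative tournament heuristic: each option challenges the current
    champion; the challenger takes over if it wins a majority of duels.
    The `utilities` list is hidden from the learner in a real setting --
    it is only used here to simulate the preference feedback.
    """
    rng = random.Random(seed)
    champion = 0
    for challenger in range(1, len(utilities)):
        wins = sum(
            duel(utilities, challenger, champion, rng)
            for _ in range(comparisons_per_pair)
        )
        if wins > comparisons_per_pair / 2:
            champion = challenger
    return champion

# Example: four options with hidden utilities; option 1 is best.
print(best_from_preferences([0.1, 1.5, 0.7, 0.3]))  # → 1
```

Even though each individual comparison is noisy and carries only one bit of information, aggregating enough of them reliably identifies the best option; the paper's contribution is showing that carefully chosen comparisons achieve this with essentially no more feedback than exact numeric scores would require.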