

Poster

Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret

Bingshan Hu · Zhiming Huang · Tianyue Zhang · Mathias Lécuyer · Nidhi Hegde

West Exhibition Hall B2-B3 #W-911
Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract: We address differentially private stochastic bandit problems by leveraging Thompson Sampling with Gaussian priors and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private algorithm that enables trading off privacy and regret. DP-TS-UCB satisfies $\tilde{O}\left(T^{0.25(1-\alpha)}\right)$-GDP and achieves an $O\left(K\ln^{\alpha+1}(T)/\Delta\right)$ regret bound, where $K$ is the number of arms, $\Delta$ is the sub-optimality gap, $T$ is the learning horizon, and $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, DP-TS-UCB relies on anti-concentration bounds for Gaussian distributions, linking the exploration mechanisms of Thompson Sampling and Upper Confidence Bound, which may be of independent research interest.
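
The endpoints of the trade-off follow directly by substitution: $\alpha = 1$ gives $\tilde{O}(1)$-GDP with an $O(K\ln^{2}(T)/\Delta)$ regret bound, while $\alpha = 0$ gives $\tilde{O}(T^{0.25})$-GDP with an $O(K\ln(T)/\Delta)$ regret bound.

To make the ingredients concrete, the following is a minimal, illustrative Python sketch of Thompson Sampling with Gaussian posteriors in which Gaussian noise is also injected into each empirical mean, in the spirit of the Gaussian mechanism underlying GDP. This is not the paper's DP-TS-UCB algorithm: the function name private_gaussian_thompson_sampling, the sigma_dp parameter, and the noise calibration are hypothetical placeholders; the actual noise schedule and its dependence on $\alpha$ are specified in the paper.

```python
import numpy as np


def private_gaussian_thompson_sampling(pull_arm, K, T, sigma_dp=1.0, rng=None):
    """Gaussian-prior Thompson Sampling with Gaussian-mechanism-style noise.

    pull_arm(k) should return a reward in [0, 1] for arm k.
    sigma_dp scales the Gaussian noise added to each empirical mean
    (illustrative only; not the paper's calibration).
    """
    rng = np.random.default_rng(rng)
    counts = np.zeros(K)   # number of pulls per arm
    sums = np.zeros(K)     # sum of observed rewards per arm
    total_reward = 0.0

    # Pull every arm once so each posterior is well defined.
    for k in range(K):
        r = pull_arm(k)
        counts[k] += 1
        sums[k] += r
        total_reward += r

    for _ in range(K, T):
        means = sums / counts
        # Gaussian noise on the empirical means (privacy-style perturbation).
        noisy_means = means + rng.normal(0.0, sigma_dp / counts)
        # Thompson Sampling step: draw one Gaussian posterior sample per arm
        # and play the arm with the largest sample.
        samples = rng.normal(noisy_means, 1.0 / np.sqrt(counts))
        k = int(np.argmax(samples))
        r = pull_arm(k)
        counts[k] += 1
        sums[k] += r
        total_reward += r

    return total_reward, counts


# Usage: two Bernoulli arms with means 0.9 and 0.5.
if __name__ == "__main__":
    env_rng = np.random.default_rng(42)
    mu = np.array([0.9, 0.5])
    reward, counts = private_gaussian_thompson_sampling(
        lambda k: float(env_rng.random() < mu[k]), K=2, T=2000, sigma_dp=0.5, rng=1
    )
    print("total reward:", reward, "pull counts:", counts)
```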

Lay Summary:

Machine learning algorithms that do not account for differential privacy can continually expose private information. This paper develops efficient differentially private sequential learning algorithms under the bandit feedback model. It centers on understanding how to trade off privacy and regret, and on identifying the fundamental limits that prevent private sequential learning algorithms from achieving a good trade-off.
