ICML Poster Lightweight Protocols for Distributed Private Quantile Estimation

Spotlight Poster

Lightweight Protocols for Distributed Private Quantile Estimation

Anders Aamand · Fabrizio Boninsegna · Abigail Gentle · Jacob Imola · Rasmus Pagh

East Exhibition Hall A-B #E-1005

[ Abstract ] [ Lay Summary ]

[ OpenReview]

Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract: Distributed data analysis is a large and growing field driven by a massive proliferation of user devices, and by privacy concerns surrounding the centralised storage of data. We consider two \emph{adaptive} algorithms for estimating one quantile (e.g.~the median) when each user holds a single data point lying in a domain $[B]$ that can be queried once through a private mechanism; one under local differential privacy (LDP) and another for shuffle differential privacy (shuffle-DP). In the adaptive setting we present an $\varepsilon$-LDP algorithm which can estimate any quantile within error $\alpha$ only requiring $O(\frac{\log B}{\varepsilon^2\alpha^2})$ users, and an $(\varepsilon,\delta)$-shuffle DP algorithm requiring only $\widetilde{O}((\frac{1}{\varepsilon^2}+\frac{1}{\alpha^2})\log B)$ users. Prior (nonadaptive) algorithms require more users by several logarithmic factors in $B$. We further provide a matching lower bound for adaptive protocols, showing that our LDP algorithm is optimal in the low-$\varepsilon$ regime. Additionally, we establish lower bounds against non-adaptive protocols which paired with our understanding of the adaptive case, proves a fundamental separation between these models.

Lay Summary:

Finding the middle or “typical” value in a dataset — like the median income — is a common goal in statistical data analysis. But when the data is private, for example users’ personal info on their phones, it becomes tricky: How do we estimate this statistic without revealing any single users personal data.This paper focuses on how to estimate quantiles (such as the median) when data stays on individual devices and is shared in a privacy-preserving way using local differential privacy (LDP). LDP is a method for protecting users by adding noise to their answers before they’re shared. For example, one can imagine asking users yes/no questions about their data, but instead of always answering truthfully, they only tell you the correct answer with probability 60% and otherwise lie with probability 40%. Based on any answer, it is impossible to make a qualified guess about the user's data, but aggregating many such answers from different users one can start to understand statistics of the data set. However, before this work, existing methods have required many such noisy answers from users in order to get accurate results.We propose a smarter method that asks better questions in multiple rounds, adapting the questions based on previous answers — like a game of “warmer/colder” that quickly homes in on the right value. This approach drastically reduces the number of users needed, while still respecting privacy. In fact, we show that our method is optimal: Under the constraints of LDP, no protocol can do better in a strong mathematical sense. We further combine this with a shuffling technique, which mixes up users’ responses, a process which boosts user privacy even further.

Chat is not available.