Spotlight Poster
Lightweight Protocols for Distributed Private Quantile Estimation
Anders Aamand · Fabrizio Boninsegna · Abigail Gentle · Jacob Imola · Rasmus Pagh
East Exhibition Hall A-B #E-1005
Finding the middle or “typical” value in a dataset — like the median income — is a common goal in statistical data analysis. But when the data is private, for example users’ personal info on their phones, it becomes tricky: How do we estimate this statistic without revealing any single users personal data.This paper focuses on how to estimate quantiles (such as the median) when data stays on individual devices and is shared in a privacy-preserving way using local differential privacy (LDP). LDP is a method for protecting users by adding noise to their answers before they’re shared. For example, one can imagine asking users yes/no questions about their data, but instead of always answering truthfully, they only tell you the correct answer with probability 60% and otherwise lie with probability 40%. Based on any answer, it is impossible to make a qualified guess about the user's data, but aggregating many such answers from different users one can start to understand statistics of the data set. However, before this work, existing methods have required many such noisy answers from users in order to get accurate results.We propose a smarter method that asks better questions in multiple rounds, adapting the questions based on previous answers — like a game of “warmer/colder” that quickly homes in on the right value. This approach drastically reduces the number of users needed, while still respecting privacy. In fact, we show that our method is optimal: Under the constraints of LDP, no protocol can do better in a strong mathematical sense. We further combine this with a shuffling technique, which mixes up users’ responses, a process which boosts user privacy even further.