Poster
On Differential Privacy for Adaptively Solving Search Problems via Sketching
Shiyuan Feng · Ying Feng · George Li · Zhao Song · David Woodruff · Lichen Zhang
West Exhibition Hall B2-B3 #W-1017
Wed 16 Jul 3:30 p.m. PDT — 4:30 p.m. PDT
A significant challenge for many modern AI systems is the presence of malicious attackers. For example, an attacker targeting ChatGPT might craft prompts that misguide the model into mistakenly leaking crucial private information. Such attacks can occur at many points in an AI system, particularly in the tools used by ChatGPT-like large language models. One widely used tool follows a simple idea: when a user enters a prompt, the model searches a database for similar prompts and then uses the retrieved information to generate its response. This powerful search functionality significantly improves the answers these models produce; however, the databases it relies on often contain sensitive private information that an attacker could compromise. In this work, we develop database search tools with the guarantee that, under mild conditions, a malicious attacker cannot learn any information about the database by observing only the results returned from the search. For other important AI problems, such as fitting a line to learn the relationship between cancers and patient symptoms, we also develop highly efficient methods that estimate this fitted line quickly while protecting patient privacy.
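To give a rough sense of the "private line fitting" idea mentioned above, here is a minimal sketch of one standard approach, output perturbation: fit the line by ordinary least squares, then add noise to the published coefficients so that no single patient's record is revealed. This is an illustration of the general concept only, not the paper's sketching-based algorithm; the `noise_scale` parameter is a hypothetical stand-in for the carefully calibrated noise level a real differential-privacy analysis would prescribe.

```python
import random
import statistics

def dp_line_fit(xs, ys, noise_scale=0.1, seed=0):
    """Fit y = a*x + b by ordinary least squares, then perturb the
    coefficients with Gaussian noise before releasing them.

    noise_scale is a placeholder: a real DP guarantee would set it
    from the data's sensitivity and the privacy budget (epsilon, delta).
    """
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var          # least-squares slope
    b = my - a * mx        # least-squares intercept
    rng = random.Random(seed)
    # Release noisy coefficients; observers see only the perturbed line.
    return a + rng.gauss(0, noise_scale), b + rng.gauss(0, noise_scale)
```

With `noise_scale=0` this reduces to plain least squares; increasing the noise trades accuracy for privacy.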