

Poster

Leveraging Randomness in Model and Data Partitioning for Privacy Amplification

Andy Dong · Wei-Ning Chen · Ayfer Ozgur

East Exhibition Hall A-B #E-800
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

We study how inherent randomness in the training process—where each sample (or client in federated learning) contributes only to a randomly selected portion of training—can be leveraged for privacy amplification. This includes (1) data partitioning, where a sample participates in only a subset of training iterations, and (2) model partitioning, where a sample updates only a subset of the model parameters. We apply our framework to model parallelism in federated learning, where each client updates a randomly selected subnetwork to reduce memory and computational overhead, and show that existing methods, e.g. model splitting or dropout, provide a significant privacy amplification gain not captured by previous privacy analysis techniques. Additionally, we introduce balanced iteration subsampling, a new data partitioning method where each sample (or client) participates in a fixed number of training iterations. We show that in certain regimes, this method yields stronger privacy amplification than Poisson (i.i.d.) sampling of data (or clients). Our results demonstrate that randomness in the training process, which is structured rather than i.i.d. and interacts with data in complex ways, can be systematically leveraged for nontrivial privacy amplification.
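To make the two partitioning schemes concrete, here is a minimal illustrative sketch (not from the paper; all function names and parameter values are assumptions chosen for illustration). It contrasts Poisson (i.i.d.) participation, where each client's number of participations is random, with balanced iteration subsampling, where each client participates in exactly k rounds, and it also sketches a random-submodel mask for model partitioning.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_participation(num_clients, num_rounds, p):
    """Poisson (i.i.d.) sampling: each client joins each round
    independently with probability p, so its participation count
    is Binomial(num_rounds, p)."""
    return rng.random((num_clients, num_rounds)) < p

def balanced_participation(num_clients, num_rounds, k):
    """Balanced iteration subsampling (illustrative): each client
    participates in exactly k of num_rounds rounds, chosen uniformly
    at random, so its participation count is fixed."""
    mask = np.zeros((num_clients, num_rounds), dtype=bool)
    for c in range(num_clients):
        rounds = rng.choice(num_rounds, size=k, replace=False)
        mask[c, rounds] = True
    return mask

def random_submodel_mask(model_dim, fraction):
    """Model partitioning (illustrative): a client updates only a
    random subset of coordinates, e.g. a random submodel or a
    dropout-style mask."""
    keep = int(model_dim * fraction)
    idx = rng.choice(model_dim, size=keep, replace=False)
    mask = np.zeros(model_dim, dtype=bool)
    mask[idx] = True
    return mask

T, N = 100, 1000  # hypothetical number of rounds and clients
poisson = poisson_participation(N, T, p=0.1)
balanced = balanced_participation(N, T, k=10)
print("Poisson participations per client:  mean %.1f, std %.2f"
      % (poisson.sum(1).mean(), poisson.sum(1).std()))
print("Balanced participations per client: mean %.1f, std %.2f"
      % (balanced.sum(1).mean(), balanced.sum(1).std()))
print("Submodel mask keeps %d of 10000 coordinates"
      % random_submodel_mask(10000, 0.25).sum())
```

Running this shows that both schemes have the same expected participation count, but under balanced iteration subsampling the count per client is exactly k with zero variance; the abstract's claim is that, in certain regimes, this structured randomness yields stronger privacy amplification than Poisson sampling.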

Lay Summary:

Machine learning models typically protect privacy by adding random noise during training, but too much noise can harm accuracy. We show that hiding which parts of the model or which training steps each data point sees makes it harder to trace any one example. This hidden randomness gives a boost to privacy (so we can add less extra noise to achieve the same level of privacy) without changing the basic training algorithm. We are the first to explain and quantify exactly how much extra privacy this gives. Our approach is especially useful in federated learning, where devices have limited compute power: by training only on submodels, weaker devices can still participate, and we show how it also improves privacy guarantees.
