

Poster in Workshop: DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats)

DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning

Lillian Sun · Kevin Cong · Jay Chooi · Russell Li

Keywords: [ optimizers ] [ deep learning ] [ privacy ] [ differential privacy ]

Sat 19 Jul 3 p.m. PDT — 3:45 p.m. PDT

Abstract: As deep learning methods increasingly use sensitive data at scale, differential privacy (DP) offers formal guarantees against information leakage during model training. A significant challenge remains in implementing DP optimizers that retain strong performance while preserving privacy. Recent advances have introduced ever more efficient optimizers, with AdamW being a popular choice for training deep learning models because of its strong empirical performance. We study DP-AdamW and introduce DP-AdamW-BC, a differentially private variant of the AdamW optimizer with DP bias correction for the second moment estimator. We start by showing theoretical results for privacy and convergence guarantees of DP-AdamW and DP-AdamW-BC. Then, we empirically analyze the behavior of both optimizers across multiple privacy budgets ($\epsilon = 1, 3, 7$). We find that DP-AdamW outperforms existing state-of-the-art differentially private optimizers like DP-SGD, DP-Adam, and DP-AdamBC, scoring over 15\% higher on text classification, up to 5\% higher on image classification, and consistently 1\% higher on graph node classification. Moreover, we empirically show that incorporating bias correction in DP-AdamW (DP-AdamW-BC) consistently decreases accuracy, in contrast to the improvement of DP-AdamBC over DP-Adam.
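The abstract names the ingredients of DP-AdamW without giving the update rule, so the following is only a rough NumPy sketch of how such a step might look; it is not the authors' implementation. It assumes the usual DP-SGD privatization (per-example L2 clipping plus Gaussian noise), the standard AdamW decoupled weight decay, and, for the `bias_correction=True` branch, a DP-AdamBC-style correction that subtracts the noise variance `(noise_multiplier * clip_norm / batch_size) ** 2` from the second-moment estimate. All function names, parameter names, and default values here are hypothetical, and no privacy accounting is performed.

```python
import numpy as np


def init_state(params):
    """Hypothetical optimizer state: step counter and Adam moment estimates."""
    return {"t": 0, "m": np.zeros_like(params), "v": np.zeros_like(params)}


def privatize_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient to L2 norm <= clip_norm, sum, add noise, average."""
    batch_size = per_example_grads.shape[0]
    flat = per_example_grads.reshape(batch_size, -1)
    norms = np.linalg.norm(flat, axis=1)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = flat * scale[:, None]
    # Gaussian noise with standard deviation noise_multiplier * clip_norm on the sum.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=flat.shape[1])
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    return noisy_mean.reshape(per_example_grads.shape[1:])


def dp_adamw_step(params, per_example_grads, state, *, lr=1e-3, beta1=0.9,
                  beta2=0.999, eps=1e-8, weight_decay=1e-2, clip_norm=1.0,
                  noise_multiplier=1.0, bias_correction=False, gamma=1e-8,
                  rng=None):
    """One sketched DP-AdamW update; returns the new parameters."""
    rng = np.random.default_rng() if rng is None else rng
    batch_size = per_example_grads.shape[0]
    g = privatize_gradient(per_example_grads, clip_norm, noise_multiplier, rng)

    # Standard Adam moment estimates with initialization debiasing.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])

    if bias_correction:
        # Assumed DP-AdamBC-style correction: remove the DP noise variance
        # from the second moment, floored at gamma to keep it positive.
        noise_var = (noise_multiplier * clip_norm / batch_size) ** 2
        denom = np.sqrt(np.maximum(v_hat - noise_var, gamma))
    else:
        denom = np.sqrt(v_hat) + eps

    # Decoupled weight decay: shrink the weights directly instead of adding
    # weight_decay * params to the gradient before the moment estimates.
    return params * (1 - lr * weight_decay) - lr * m_hat / denom
```

In this sketch the weight-decay term never passes through clipping, noise, or the adaptive denominator, which is the "decoupled" part; whether the paper treats it the same way cannot be determined from the abstract alone.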
