

Poster

Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts

Amir Najafi · Samin Mahdizadeh Sani · Farzan Farnia

West Exhibition Hall B2-B3 #W-903
Wed 16 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: We address the challenge of certifying the performance of a federated learning model on an unseen target network using only measurements from the source network that trained the model. Specifically, consider a source network "A" with $K$ clients, each holding private, non-IID datasets drawn from heterogeneous distributions, modeled as samples from a broader meta-distribution $\mu$. Our goal is to provide certified guarantees for the model's performance on a different, unseen network "B", governed by an unknown meta-distribution $\mu'$, assuming the deviation between $\mu$ and $\mu'$ is bounded, either in Wasserstein distance or in an $f$-divergence. We derive worst-case uniform guarantees for both the model's average loss and its risk CDF, the latter corresponding to a novel, adversarially robust version of the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality. In addition, we show how the vanilla DKW bound enables principled certification of the model's true performance on unseen clients within the same (source) network. Our bounds are efficiently computable, asymptotically minimax optimal, and preserve clients' privacy. We also establish non-asymptotic generalization bounds that converge to zero as $K$ grows and the minimum per-client sample size exceeds $\mathcal{O}(\log K)$. Empirical evaluations confirm the practical utility of our bounds across real-world tasks. The project code is available at: github.com/samin-mehdizadeh/Robust-Evaluation-DKW
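To make the vanilla DKW certification concrete, the sketch below (not the paper's implementation; function name and interface are illustrative) builds an empirical CDF of per-client risks and the DKW half-width $\varepsilon = \sqrt{\log(2/\delta)/(2K)}$, so that with probability at least $1-\delta$, $\sup_t |F_K(t) - F(t)| \le \varepsilon$:

```python
import numpy as np

def dkw_band(losses, delta=0.05):
    """Empirical risk CDF with a DKW confidence band (illustrative sketch).

    Given K per-client risk values, returns the empirical CDF F_K and the
    DKW half-width eps such that, with probability >= 1 - delta,
    sup_t |F_K(t) - F(t)| <= eps (using Massart's tight constant 2).
    """
    losses = np.sort(np.asarray(losses, dtype=float))
    K = len(losses)
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * K))

    def ecdf(t):
        # Fraction of clients whose risk is at most t.
        return np.searchsorted(losses, t, side="right") / K

    return ecdf, eps
```

For example, `ecdf(t) - eps` (clipped at 0) lower-bounds the true fraction of clients in the source network whose risk does not exceed `t`, uniformly over all `t`.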

Lay Summary: We propose a privacy-preserving and polynomial-time procedure for evaluating the performance of a given machine learning model over a federated network of $K$ clients. Our goal, however, is to provide robust (worst-case) performance guarantees, i.e., bounds that remain valid when the model is deployed on a distributionally similar but unseen network. This scenario frequently arises in pilot deployments, where the pilot is conducted in a slightly different region, city, or community. We demonstrate that both the model's average loss and its risk CDF can be uniformly and robustly bounded while respecting the aforementioned privacy and efficiency constraints. To achieve this, we reformulate classical results in statistics, namely the Glivenko–Cantelli theorem and the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality, into novel, adversarially robust versions that account for distributional shifts. These robust formulations underpin our theoretical guarantees and allow us to quantify uncertainty under worst-case deviations. Finally, we validate our bounds through extensive numerical experiments on real-world datasets, demonstrating both their practical accuracy and their resilience in heterogeneous and privacy-sensitive settings.
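The flavor of a robustified DKW bound can be illustrated with a simpler divergence than those used in the paper. The sketch below assumes a total-variation budget `rho` between the source and target meta-distributions (the Kolmogorov distance between CDFs is bounded by the TV distance), and subtracts both the DKW half-width and `rho` from the empirical CDF; the paper's actual Wasserstein and $f$-divergence bounds differ, and all names here are illustrative:

```python
import numpy as np

def robust_lower_bound(losses, t, delta=0.05, rho=0.1):
    """Worst-case lower bound on P(risk <= t) for the shifted network.

    Illustrative sketch only, under a total-variation shift budget rho:
    since sup_t |F'(t) - F(t)| <= TV(mu, mu') <= rho, with probability
    >= 1 - delta we have  F'(t) >= F_K(t) - sqrt(log(2/delta)/(2K)) - rho.
    """
    losses = np.sort(np.asarray(losses, dtype=float))
    K = len(losses)
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * K))  # vanilla DKW half-width
    ecdf_t = np.searchsorted(losses, t, side="right") / K
    # Clip at zero: a probability lower bound cannot be negative.
    return max(ecdf_t - eps - rho, 0.0)
```

Setting `rho=0` recovers the non-robust DKW certificate for unseen clients within the same source network.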
