

Poster

Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees

Yannis Montreuil · Axel Carlier · Lai Xing Ng · Wei Tsang Ooi

West Exhibition Hall B2-B3 #W-808
Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: Two-stage Learning-to-Defer (L2D) enables optimal task delegation by assigning each input to either a fixed main model or one of several offline experts, supporting reliable decision-making in complex, multi-agent environments. However, existing L2D frameworks assume clean inputs and are vulnerable to adversarial perturbations that can manipulate query allocation—causing costly misrouting or expert overload. We present the first comprehensive study of adversarial robustness in two-stage L2D systems. We introduce two novel attack strategies—untargeted and targeted—which respectively disrupt optimal allocations or force queries to specific agents. To defend against such threats, we propose SARD, a convex learning algorithm built on a family of surrogate losses that are provably Bayes-consistent and $(\mathcal{R}, \mathcal{G})$-consistent. These guarantees hold across classification, regression, and multi-task settings. Empirical results demonstrate that SARD significantly improves robustness under adversarial attacks while maintaining strong clean performance, marking a critical step toward secure and trustworthy L2D deployment.
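To make the two-stage allocation rule and the routing attack described above concrete, here is a minimal, hypothetical sketch (not the authors' implementation or the SARD algorithm): the linear cost model, the agent setup, and the finite-difference untargeted attack are all illustrative assumptions, showing only how a small input perturbation can flip which agent a query is deferred to.

```python
# Toy sketch of two-stage Learning-to-Defer (L2D) routing and an untargeted
# allocation attack. All modeling choices here are illustrative assumptions,
# not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def agent_costs(x, weights):
    """Toy linear cost estimates c_j(x) for the main model (j=0) and experts (j>=1)."""
    return weights @ x  # shape: (num_agents,)

def allocate(x, weights):
    """Two-stage L2D rule: defer the query to the agent with minimal estimated cost."""
    return int(np.argmin(agent_costs(x, weights)))

def untargeted_attack(x, weights, eps=0.1, steps=20, lr=0.05):
    """Illustrative untargeted attack: perturb x within an L_inf ball so that the
    clean allocation becomes suboptimal, using a simple finite-difference gradient."""
    clean_agent = allocate(x, weights)

    def margin(z):
        # Cost of the clean agent minus the best competing agent; increasing this
        # margin makes the clean agent look worse and flips the allocation.
        costs = agent_costs(z, weights)
        return costs[clean_agent] - np.min(np.delete(costs, clean_agent))

    x_adv = x.copy()
    for _ in range(steps):
        grad = np.zeros_like(x_adv)
        for i in range(x_adv.size):
            e = np.zeros_like(x_adv)
            e[i] = 1e-3
            grad[i] = (margin(x_adv + e) - margin(x_adv - e)) / 2e-3
        x_adv = x_adv + lr * np.sign(grad)        # ascend the margin
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the L_inf ball
    return x_adv

# Demo: 3 agents (main model + 2 offline experts), 5-dimensional input.
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)
x_adv = untargeted_attack(x, W)
print("clean allocation:", allocate(x, W), "| adversarial allocation:", allocate(x_adv, W))
```

A targeted variant of the same sketch would instead decrease the estimated cost of a chosen agent, forcing queries toward it; SARD, as described in the abstract, defends against both by training with surrogate losses that retain Bayes- and $(\mathcal{R}, \mathcal{G})$-consistency under such perturbations.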

Lay Summary:

Two-stage Learning-to-Defer (L2D) systems enable optimal task delegation across multiple agents but assume clean inputs, making them vulnerable to adversarial perturbations. These subtle attacks can misroute queries, overload experts, or bias allocations, compromising both performance and trust in high-stakes applications. We introduce the first comprehensive framework for studying and defending two-stage L2D systems against adversarial threats. We design two new attack strategies that reveal systemic vulnerabilities, and to defend against these, we propose SARD, a convex learning algorithm. Our theoretical guarantees and empirical results show that SARD dramatically improves robustness under adversarial conditions while maintaining strong clean performance. This work lays the foundation for secure and trustworthy deployment of L2D systems in safety-critical domains like healthcare, finance, and autonomous systems.
