Poster
Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees
Yannis Montreuil · Axel Carlier · Lai Xing Ng · Wei Tsang Ooi
West Exhibition Hall B2-B3 #W-808
Two-Stage Learning-to-Defer (L2D) systems enable optimal task delegation across multiple agents but assume clean inputs, leaving them vulnerable to adversarial perturbations. These subtle attacks can misroute queries, overload experts, or bias allocations, compromising both performance and trust in high-stakes applications. We introduce the first comprehensive framework for studying and defending two-stage L2D systems against adversarial threats. We design two new attack strategies that expose systemic vulnerabilities, and to defend against them we propose SARD, a convex algorithm. Our theoretical guarantees and empirical results show that SARD dramatically improves robustness under adversarial conditions while maintaining strong clean performance. This work lays the foundation for the secure and trustworthy deployment of L2D systems in safety-critical domains such as healthcare, finance, and autonomous systems.
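To make the threat model concrete, here is a minimal sketch of a two-stage L2D router and an FGSM-style perturbation that can flip its allocation. Everything in it is an illustrative assumption: the names (`rejector`, `route`, `fgsm_on_router`), the linear rejector, and the attack are generic stand-ins, not the paper's attack strategies or the SARD algorithm.

```python
import torch

# Hypothetical two-stage L2D setup: a trained rejector scores each agent
# (the model itself plus K experts) and the query is routed to the agent
# with the highest score. All names here are illustrative assumptions.
K = 2                                   # number of experts
rejector = torch.nn.Linear(16, K + 1)   # scores: [model, expert_1, ..., expert_K]

def route(x: torch.Tensor) -> int:
    """Return the index of the agent the query x is allocated to."""
    return int(rejector(x).argmax())

def fgsm_on_router(x: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """FGSM-style step that pushes down the chosen agent's score,
    illustrating how a small input change can misroute a query."""
    x = x.clone().requires_grad_(True)
    scores = rejector(x)
    loss = -scores[scores.argmax()]     # attack objective: demote the current choice
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

x = torch.randn(16)
x_adv = fgsm_on_router(x)
print("clean routing:", route(x), "| adversarial routing:", route(x_adv))
```

The point of the sketch is only that the allocation decision, not just the prediction, is an attack surface: a perturbation targeting the rejector's scores can reroute a query to a worse-suited or overloaded agent while the input still looks clean.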