Oral in Workshop: ICML 2025 Workshop on Collaborative and Federated Agentic Workflows (CFAgentic @ ICML'25)
LLMSELECTOR: Learning to Select Models in Compound AI Systems
Lingjiao Chen · Jared Davis · Boris Hanin · Peter Bailis · James Zou · Matei Zaharia · Ion Stoica
Compound AI systems that combine multiple LLM calls, such as Self-Refine and Multiagent-Debate, are increasingly critical to AI advancements. Perhaps surprisingly, we find empirically that choosing different models for different modules has a substantial effect on these systems' performance. Thus, we ask a core question in compound AI systems: for each LLM call or module in the system, how should one decide which LLM to use? As a first step, we formally show that the model selection problem (MSP) is computationally intractable. Next, we propose LLMSELECTOR, a principled framework that learns LLMs' strengths and weaknesses across different modules through an LLM evaluator and then performs an efficient optimization to select which models to use in any given compound system with a bounded number of modules. Our theoretical analysis gives mathematical conditions under which LLMSELECTOR only requires LLM calls scaling linearly with the number of modules and the number of LLMs to identify the optimal model selection. Extensive experiments across diverse tasks, including question answering, constrained text generation, and code execution, demonstrate that LLMSELECTOR confers 4%-73% accuracy gains for compound AI systems like Self-Refine and Multiagent-Debate with general-purpose models (e.g., GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro), and 3%-21% gains with frontier reasoning models (e.g., o3-mini, Claude 3.7 Sonnet, Gemini 2.0 Flash).
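The abstract's search procedure can be illustrated with a minimal sketch (not the authors' released code): a coordinate-descent search that fixes all modules but one, tries each candidate LLM for that module, keeps the best assignment, and sweeps until no single-module swap improves the estimated score. Each sweep costs evaluations linear in the number of modules times the number of LLMs, matching the scaling the abstract describes. The `score` callable here is a hypothetical stand-in for the LLM-evaluator-based estimate of end-to-end accuracy.

```python
from typing import Callable, Sequence, Tuple


def select_models(
    num_modules: int,
    models: Sequence[str],
    score: Callable[[Tuple[str, ...]], float],
    max_sweeps: int = 5,
) -> Tuple[Tuple[str, ...], float]:
    """Coordinate-descent search over per-module model assignments.

    `score` maps an assignment (one model name per module) to an
    estimated end-to-end quality; higher is better.
    """
    assignment = [models[0]] * num_modules  # start from an arbitrary assignment
    best = score(tuple(assignment))
    for _ in range(max_sweeps):
        improved = False
        for m in range(num_modules):  # optimize one module at a time
            for candidate in models:
                trial = list(assignment)
                trial[m] = candidate
                s = score(tuple(trial))
                if s > best:
                    best, assignment, improved = s, trial, True
        if not improved:  # converged: no single-module swap helps
            break
    return tuple(assignment), best
```

For example, with a toy score that counts how many modules match a hidden best assignment, the sweep recovers that assignment after one pass; in practice the score would come from held-out evaluations of the compound system.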