Skip to yearly menu bar Skip to main content


Poster
in
Workshop: Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures

An Architecture Search Framework for Inference-Time Techniques

Jon Saad-Falcon · Adrian Lafuente · Shlok Natarajan · Nahum Maru · Hristo Todorov · Etash Guha · Estefany Kelly Buchanan · Mayee Chen · Neel Guha · Christopher Re · Azalia Mirhoseini


Abstract:

Inference-time techniques, such as repeated sampling or iterative revisions, are emerging as powerful ways to enhance large-language models (LLMs) at test time. However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of each technique across models and tasks, the interactions between them, and the massive search space for combining them. To address these challenges, we introduce Archon, a modular and automated framework for optimizing the process of selecting and combining inference-time techniques and LLMs. Given a compute budget and a set of available LLMs, Archon explores a large design space to discover optimized configurations tailored to target benchmarks. It can design custom or general-purpose architectures that advance the Pareto frontier of accuracy vs. maximum token budget compared to top-performing baselines. Across instruction-following, reasoning, and coding tasks, we show that Archon can leverage additional inference compute budget to design systems that outperform frontier models such as OpenAI’s o1, GPT-4o, and Claude 3.5 Sonnet by an average of 15.1%.

Chat is not available.