

Poster in Workshop: Programmatic Representations for Agent Learning

FormulaCode: Evaluating Agentic Superoptimization on Large Codebases

Atharva Sehgal · James Hou · Swarat Chaudhuri · Jennifer Sun · Yisong Yue


Abstract:

Rapid advances in LLM agents have shown the ability to optimize code against continuous objective functions, a significant leap beyond traditional code generation techniques. However, there is an urgent need for benchmarks that can effectively measure this capability and translate it into real-world impact. Current code benchmarks, which often rely on binary pass/fail outcomes, offer a limited evaluation framework that falls short of capturing the full potential of these emerging capabilities. To bridge this gap, we introduce FormulaCode, a novel benchmark for evaluating agentic superoptimization on large codebases, with a focus on real-world performance optimization. Constructed from a dataset of 451 real-world performance bottlenecks automatically mined from GitHub, FormulaCode enables comprehensive testing of an agent's ability to triage, diagnose, and resolve inefficiencies in realistic software environments. FormulaCode proves to be a challenging benchmark for frontier LLMs and agentic frameworks, with unrestricted repository exploration emerging as a principal component for finding performance inefficiencies. By introducing FormulaCode, our goal is to drive the development of next-generation optimization algorithms that meet the rigorous demands of real-world software projects.
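
The abstract contrasts continuous, performance-based scoring with binary pass/fail evaluation. The sketch below is a rough illustration only, not the FormulaCode harness: the function names, the timing protocol, and the scoring rule (speedup over baseline, gated on correctness tests) are assumptions made for exposition.

```python
# Hypothetical sketch of continuous performance scoring (not the FormulaCode harness).
import statistics
import time
from typing import Callable


def median_runtime(workload: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time of a benchmark workload over several repeats."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def continuous_score(baseline: Callable[[], object],
                     optimized: Callable[[], object],
                     tests_pass: bool) -> float:
    """Score 0.0 if correctness tests fail; otherwise the measured speedup.

    A speedup of 1.0 means no improvement; larger values reward the agent
    proportionally, unlike a binary pass/fail outcome.
    """
    if not tests_pass:
        return 0.0
    return median_runtime(baseline) / median_runtime(optimized)


if __name__ == "__main__":
    # Toy bottleneck: list membership checks vs. a set-based rewrite.
    data = list(range(20_000))

    def baseline():
        small = data[:1000]          # list: O(n) membership checks
        return [x for x in data if x in small]

    def optimized():
        small = set(data[:1000])     # set: O(1) membership checks
        return [x for x in data if x in small]

    score = continuous_score(baseline, optimized, tests_pass=True)
    print(f"speedup score: {score:.2f}")
```

Under this kind of metric, partial optimizations still earn credit in proportion to the improvement they deliver, which is the behavior the abstract argues binary benchmarks cannot capture.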
