Poster in Workshop: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Amrith Setlur · Matthew Yang · Charlie Snell · Jeremiah Greer · Ian Wu · Virginia Smith · Max Simchowitz · Aviral Kumar
Test-time scaling offers a promising path to improving LLM reasoning; however, the true promise of this paradigm lies in extrapolation (i.e., continuing to improve performance as LLMs "think" for longer). We show that one way to enable extrapolation is by training the LLM for in-context exploration, i.e., training the LLM to spend its test-time budget effectively by chaining operations such as generation, verification, and refinement. To enable in-context exploration, our recipe e3 combines three key ingredients: (1) chaining asymmetries in base LLM competence, e.g., chaining verification (easy) with generation (hard), as a way to implement in-context search; (2) leveraging negative gradients from incorrect traces to amplify exploration that chains additional asymmetries; and (3) aligning task difficulty with the training token budget to structure in-context exploration. Our recipe e3 produces the best-performing 1.7B model on AIME/HMMT'25 and extrapolates test-time compute to 2.5x the training budget.
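To make the notion of chained in-context exploration concrete, here is a minimal sketch of the generate → verify → refine loop the abstract alludes to. The helper callables `generate`, `verify`, and `refine` are hypothetical stand-ins for LLM calls and are not part of the e3 recipe itself; in e3 this chaining is learned and happens inside a single long chain of thought rather than through an external controller like this one.

```python
def solve_with_in_context_exploration(problem: str,
                                      generate, verify, refine,
                                      token_budget: int = 8192) -> str:
    """Chain asymmetric operations (hard generation, easier verification,
    targeted refinement) until the answer verifies or the budget runs out.

    `generate(problem)` -> (attempt, tokens_used)
    `verify(problem, attempt)` -> (is_correct, feedback, tokens_used)
    `refine(problem, attempt, feedback)` -> (attempt, tokens_used)
    These signatures are assumptions made for illustration only.
    """
    tokens_used = 0
    attempt, cost = generate(problem)          # hard: propose a full solution
    tokens_used += cost
    while tokens_used < token_budget:
        ok, feedback, cost = verify(problem, attempt)   # easy: check the attempt
        tokens_used += cost
        if ok:
            break
        attempt, cost = refine(problem, attempt, feedback)  # revise using feedback
        tokens_used += cost
    return attempt
```

Under this framing, the `token_budget` plays the role of the test-time compute budget that e3 scales and extrapolates beyond the budget seen during training.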