Poster in Workshop: Programmatic Representations for Agent Learning
Making LLMs Program Interpreters via Execution Trace Chain of Thought
Koshi Eguchi · Takuya Akiba
Programmatic representations serve as policies, reward functions, environment models, and skill libraries for autonomous agents. However, their practical value hinges on large language models (LLMs) that can understand and reason about code, not merely generate it. A crucial aspect of this reasoning is the ability to predict the outcome of code (i.e., to "execute" it), yet this capability remains underdeveloped. Improving it is essential for verifiable policies, self-auditing reward functions, and debuggable environment models within program-centric agents. To address this, we propose ET-CoT (Execution Trace Chain of Thought), an approach in which an LLM learns to generate a detailed, systematic execution trace as its chain of thought in order to predict a program's outcome. Taking Python as an example, we designed a program-execution trace format inspired by recent theoretical advances. We then developed PyTracify, a new Python interpreter that emits these traces during execution, generated a large corpus of traces with it, and fine-tuned an LLM on them. With ET-CoT, the LLM executes Python programs consistently by generating the trace as its CoT. Specifically, our fine-tuned model outperforms other models of comparable size on code-execution benchmarks such as CRUXEval-O and LiveCodeBench.
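To make the idea of an execution trace concrete, here is a minimal, illustrative sketch of how line-level traces can be collected with Python's standard sys.settrace hook. This is not the actual PyTracify implementation or trace format (neither is specified here); the names trace_lines, run_traced, and example are hypothetical, and the real system records richer, more systematic state.

```python
import sys

def trace_lines(frame, event, arg):
    # Emit one trace entry per executed line: the line number and a
    # snapshot of the local variables at that point.
    if event == "line":
        snapshot = dict(frame.f_locals)
        print(f"line {frame.f_lineno}: locals = {snapshot}")
    return trace_lines

def run_traced(func, *args):
    # Install the tracer, run the function, then remove the tracer.
    sys.settrace(trace_lines)
    try:
        return func(*args)
    finally:
        sys.settrace(None)

def example(n):
    total = 0
    for i in range(n):
        total += i
    return total

print("output:", run_traced(example, 3))
```

In ET-CoT, a trace of this kind is serialized as the model's chain of thought, so predicting the program's output reduces to reproducing the step-by-step execution rather than guessing the result directly.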