Poster in Workshop: Programmatic Representations for Agent Learning
Making LLMs Program Interpreters via Execution Trace Chain of Thought
Koshi Eguchi · Takuya Akiba
Programmatic representations serve as policies, reward functions, environment models, and skill libraries for autonomous agents. However, their practical value hinges on large language models (LLMs) that can understand and reason about code, not merely generate it. A crucial aspect of this reasoning is the ability to predict the outcome of code (i.e., to "execute" it), yet this capability remains underdeveloped. Improving it is essential for verifiable policies, self-auditing reward functions, and debuggable environment models within program-centric agents. To address this, we propose ET-CoT (Execution Trace Chain of Thought), an approach in which an LLM learns to generate a detailed, systematic execution trace as its chain of thought in order to predict a program's outcome. Taking Python as an example, we designed a program-execution trace format inspired by recent theoretical advances. We then developed PyTracify, a new Python interpreter that emits these traces during execution, generated a large corpus of traces with it, and fine-tuned an LLM on them. With ET-CoT, the LLM executes Python programs consistently by generating the trace as its CoT. Specifically, our fine-tuned model outperforms other models of comparable size on code-execution benchmarks such as CRUXEval-O and LiveCodeBench.
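To make the idea of an execution trace concrete, here is a minimal, illustrative sketch of how line-level traces can be collected with Python's standard sys.settrace hook. This is not the actual PyTracify implementation or trace format (neither is specified here); the names trace_lines, run_traced, and example are hypothetical, and the real system records richer, more systematic state.

```python
import sys

def trace_lines(frame, event, arg):
    # Emit one trace entry per executed line: the line number and a
    # snapshot of the local variables at that point.
    if event == "line":
        snapshot = dict(frame.f_locals)
        print(f"line {frame.f_lineno}: locals = {snapshot}")
    return trace_lines

def run_traced(func, *args):
    # Install the tracer, run the function, then remove the tracer.
    sys.settrace(trace_lines)
    try:
        return func(*args)
    finally:
        sys.settrace(None)

def example(n):
    total = 0
    for i in range(n):
        total += i
    return total

print("output:", run_traced(example, 3))
```

In ET-CoT, a trace of this kind is serialized as the model's chain of thought, so predicting the program's output reduces to reproducing the step-by-step execution rather than guessing the result directly.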