

Poster
in
Workshop: CODEML: Championing Open-source DEvelopment in Machine Learning

RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

Yiqing Xie · Alex Xie · Divyanshu Sheth · Pengfei Liu · Daniel Fried · Carolyn Rose

Fri 18 Jul 2:15 p.m. PDT — 3 p.m. PDT

Abstract:

We introduce RepoST, a scalable method for building repository-level code generation environments that provide execution feedback for model training. Existing works require building the entire repository for execution, which is challenging for both humans and LLMs and limits dataset scalability. Instead, we leverage sandbox testing, which isolates the target function and its dependencies into a separate script for testing. At inference time, models can still access the full repository for code generation, while the script is used to provide execution feedback. We use our method to construct RepoST-Train, a large-scale training set with 7,415 functions from 824 repositories. Training with the execution feedback provided by RepoST-Train yields a performance gain of 5.5% Pass@1 on HumanEval and 3.5% Pass@1 on RepoEval.
