

Poster in Affinity Workshop: New In ML

Language Models as Off-Policy Simulators for Coding Tasks


Abstract:

This paper introduces the idea of using language models as off-policy simulators for entire codebases. Reliable simulators that generate responses to user commands would enable efficient software testing and unlock additional ways to evaluate the coding abilities of language models. To this end, the paper proposes LMOPS, a new benchmark for evaluating code-simulation ability, and uses it to evaluate current state-of-the-art models. Notable findings include the effect of model scale on LMOPS performance, along with three common failure cases and strategies to mitigate them.
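To make the core idea concrete, the sketch below shows one way a language model could be prompted to act as an off-policy simulator for a codebase: given the code and a user command, the model predicts the program's response without executing anything. This is an illustration under assumed names (SIMULATOR_PROMPT, simulate_response, query_language_model are placeholders), not the paper's actual setup or the LMOPS benchmark protocol.

```python
from textwrap import dedent

# Placeholder prompt template for the simulator role (illustrative only).
SIMULATOR_PROMPT = dedent("""\
    You are simulating the behaviour of the codebase shown below.
    Given a user command, respond exactly as the real program would
    (stdout, errors, or state changes), without executing any code.

    Codebase:
    {codebase}

    User command:
    {command}

    Simulated response:
    """)


def simulate_response(codebase: str, command: str, query_language_model) -> str:
    """Ask a language model to act as an off-policy simulator for `codebase`.

    `query_language_model` is any callable mapping a prompt string to the
    model's completion; plug in whichever LM client you use.
    """
    prompt = SIMULATOR_PROMPT.format(codebase=codebase, command=command)
    return query_language_model(prompt)


if __name__ == "__main__":
    toy_codebase = "def add(a, b):\n    return a + b\n\nprint(add(2, 3))"
    # A stub stands in for a real model call in this sketch.
    print(simulate_response(toy_codebase, "python main.py",
                            lambda prompt: "<model-predicted output>"))
```

In practice, a benchmark along these lines would compare the model's predicted response against the output of actually running the command, which is one plausible way simulator ability could be scored.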
