Poster
How Do Large Language Monkeys Get Their Power (Laws)?
Rylan Schaeffer · Joshua Kazdan · John Hughes · Jordan Juravsky · Sara Price · Aengus Lynch · Erik Jones · Robert Kirk · Azalia Mirhoseini · Sanmi Koyejo
East Exhibition Hall A-B #E-1701
Thu 17 Jul 3:30 p.m. PDT — 4:30 p.m. PDT
Recent research has shown a curious pattern: when language AIs are given multiple tries at a set of tasks, their overall success rate improves according to a "power law," a predictable but relatively slow curve. This was puzzling because, for any single task, extra tries should make success much more likely very quickly, at an exponential rate. Our work resolves the puzzle by showing that while each individual task does improve exponentially, the aggregate power law emerges from how task difficulties are distributed. Specifically, a small number of extremely hard tasks, on which the AI has only a tiny chance of success in any single attempt, collectively slow the average improvement to a power law, even as every individual problem is still being solved exponentially faster with more tries. This understanding explains why some models or benchmarks do not follow a power law (they lack enough of these super-hard problems) and, more importantly, lets us predict the scaling behavior far more efficiently, using much less compute, simply from the initial success rates, especially on the hardest tasks.
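The mechanism described above can be illustrated with a small simulation. This is a hedged sketch, not the authors' code: the Beta(0.3, 1) difficulty distribution and the task count are illustrative assumptions chosen only because that distribution puts substantial mass on tasks with a tiny single-attempt success probability. Per task, failure after k independent attempts is (1 - p)^k, which shrinks exponentially in k; averaging those exponentials over a spread of difficulties produces a much slower, roughly power-law decay of the mean failure rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: 100,000 tasks whose single-attempt success
# probabilities p follow Beta(0.3, 1), which concentrates substantial
# mass near p = 0 (the "super-hard" tasks the paper highlights).
p = rng.beta(0.3, 1.0, size=100_000)

ks = np.array([1, 4, 16, 64, 256, 1024])

# Per task, failure after k independent attempts is (1 - p)^k:
# exponential decay in k. The aggregate failure rate averages these
# exponentials over the spread of p.
agg_failure = np.array([np.mean((1.0 - p) ** k) for k in ks])

# For a power law, the log-log slope between successive k values is
# roughly constant (here it settles near -0.3, set by the density of
# near-zero p), unlike any single task's exponential decay.
slopes = np.diff(np.log(agg_failure)) / np.diff(np.log(ks))
print(agg_failure)
print(slopes)
```

Running this shows the aggregate failure rate falling far more slowly than any individual task's, with a roughly constant log-log slope: the power law comes from the difficulty distribution, not from the per-task dynamics.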