Poster in Workshop: CODEML: Championing Open-source DEvelopment in Machine Learning

AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models

Jacob Chmura · Shahrad Mohammadzadeh · Ivan Anokhin · Jacob-Junqi Tian · Mandana Samiei · Taz Scott-Talib · Irina Rish · Doina Precup · Reihaneh Rabbany · Nishanth V Anand

Fri 18 Jul 2:15 p.m. PDT — 3 p.m. PDT

Abstract:

Reinforcement learning has proven effective for fine-tuning large language models (LLMs) using reward models trained on human preference data. However, collecting such feedback remains expensive, especially in dynamic settings like personalized tutoring, where users' preferences shift over time and in response to past interactions. To address this, we present AIF-GEN, the first synthetic preference data generation platform designed for traditional and lifelong reinforcement learning from human feedback (RLHF). We use AIF-GEN to instantiate 18 synthetic datasets and evaluate their quality using an LLM. We also perform human evaluation on a subset of the generated datasets to further confirm their quality. Our results show AIF-GEN’s potential to support the development of traditional and lifelong RLHF algorithms that align LLMs.
