

Poster in Workshop: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)

Rejection Sampling Based Fine Tuning Secretly Performs PPO

Gautham Govind Anil · Dheeraj Nagaraj · Karthikeyan Shanmugam · Sanjay Shakkottai

Fri 18 Jul 2:30 p.m. PDT — 3:15 p.m. PDT

Abstract:

Several downstream applications of pre-trained generative models require task-specific adaptation based on reward feedback. In this work, we examine strategies for fine-tuning a pre-trained model given non-differentiable rewards on its generations. We establish a connection between rejection-sampling-based fine-tuning and Proximal Policy Optimization (PPO), and use this formalism to establish PPO with marginal KL constraints for diffusion models. We then propose a framework for fine-tuning at intermediate denoising steps, enabling more sample-efficient fine-tuning of diffusion models. Experimental results on layout generation and molecule generation validate these claims.
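To make the rejection-sampling-based fine-tuning setting concrete, below is a minimal illustrative sketch of the generic loop the abstract refers to: sample candidates from the current model, score them with a non-differentiable reward, keep only high-reward samples, and fine-tune on the kept samples by maximum likelihood. This is not the authors' implementation; the toy categorical model, the stand-in reward, and the top-quantile acceptance rule are all assumptions made for illustration.

```python
# Hedged sketch: rejection-sampling-based fine-tuning on a toy model.
# ToyGenerator, reward(), and the 0.9-quantile cutoff are illustrative
# assumptions, not the method or hyperparameters from the paper.
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """A trivial 'generative model': a categorical distribution over K items."""
    def __init__(self, num_items: int = 32):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_items))

    def sample(self, n: int) -> torch.Tensor:
        return torch.distributions.Categorical(logits=self.logits).sample((n,))

    def log_prob(self, x: torch.Tensor) -> torch.Tensor:
        return torch.distributions.Categorical(logits=self.logits).log_prob(x)

def reward(x: torch.Tensor) -> torch.Tensor:
    """Non-differentiable, black-box reward; here it simply prefers larger indices."""
    return x.float()

model = ToyGenerator()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(200):
    # 1. Sample a batch of candidates from the current model (no gradients needed).
    with torch.no_grad():
        cand = model.sample(256)
        r = reward(cand)
    # 2. Rejection step: accept only the top-quantile candidates by reward.
    keep = cand[r >= r.quantile(0.9)]
    # 3. Fine-tune with maximum likelihood on the accepted samples.
    loss = -model.log_prob(keep).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The paper's contribution is to relate this kind of accept-then-fine-tune loop to PPO-style objectives (including marginal KL constraints for diffusion models); the sketch above only illustrates the basic loop being analyzed.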
