Poster
Preference Adaptive and Sequential Text-to-Image Generation
Ofir Nabati · Guy Tennenholtz · Chih-wei Hsu · Moonkyung Ryu · Deepak Ramachandran · Yinlam Chow · Xiang Li · Craig Boutilier
East Exhibition Hall A-B #E-2505
We address the problem of interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent that iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences. Leveraging this dataset together with large-scale open-source (non-sequential) datasets, we construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest an adaptive and diverse slate of prompt expansions to the user. Our Preference Adaptive and Sequential Text-to-image Agent (PASTA) extends T2I models with adaptive multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user's intent. We evaluate PASTA using human raters, showing significant improvements over baseline methods. We also open-source our sequential rater dataset and simulated user-rater interactions to support future research in user-centric multi-turn T2I systems.
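The abstract states that the user-preference and user-choice models are fit with an EM strategy that uncovers distinct user preference types, but it does not spell out the model. The following is a minimal sketch of one common way to do this, assuming a mixture of softmax (multinomial-logit) choice models with linear utilities over hypothetical image feature vectors, fit by generalized EM (closed-form type priors plus a few gradient-ascent steps per type in the M-step). The function names, data layout, deterministic initialization, and two-type synthetic raters are all illustrative assumptions, not PASTA's implementation.

```python
import math
import random

# Hypothetical data layout (an assumption, not PASTA's format): each user is a
# list of (slate, chosen_index) events, where slate is a list of feature vectors.


def _scores(w, slate):
    """Linear utility of every item in the slate under weight vector w."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in slate]


def _logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def log_choice_prob(w, slate, chosen):
    """log P(user picks slate[chosen]) under a softmax (multinomial-logit) choice model."""
    s = _scores(w, slate)
    return s[chosen] - _logsumexp(s)


def choice_probs(w, slate):
    s = _scores(w, slate)
    z = _logsumexp(s)
    return [math.exp(v - z) for v in s]


def fit_user_types(users, num_types, dim, iters=20, grad_steps=5, lr=1.0):
    """Generalized EM for a mixture of softmax choice models (one per user type)."""
    # Deterministic symmetry-breaking init (an assumption; random restarts are more usual).
    weights = [[0.1 if j % num_types == k else -0.1 for j in range(dim)]
               for k in range(num_types)]
    priors = [1.0 / num_types] * num_types
    history = []
    for _ in range(iters):
        # E-step: posterior over types for each user, given all of that user's choices.
        resp, total_ll = [], 0.0
        for events in users:
            joint = [math.log(max(priors[k], 1e-300))
                     + sum(log_choice_prob(weights[k], s, c) for s, c in events)
                     for k in range(num_types)]
            norm = _logsumexp(joint)
            total_ll += norm
            resp.append([math.exp(v - norm) for v in joint])
        history.append(total_ll)
        # M-step: closed-form priors, then gradient ascent on each type's
        # responsibility-weighted log-likelihood (gradient normalized by total weight).
        priors = [sum(r[k] for r in resp) / len(users) for k in range(num_types)]
        for k in range(num_types):
            mass = max(sum(r[k] * len(ev) for r, ev in zip(resp, users)), 1e-12)
            for _ in range(grad_steps):
                grad = [0.0] * dim
                for r, events in zip(resp, users):
                    for slate, chosen in events:
                        p = choice_probs(weights[k], slate)
                        for j in range(dim):
                            expected = sum(pj * x[j] for pj, x in zip(p, slate))
                            grad[j] += r[k] * (slate[chosen][j] - expected)
                weights[k] = [w + lr * g / mass for w, g in zip(weights[k], grad)]
    return weights, priors, resp, history


def make_synthetic_users(num_users=40, events_per_user=8, slate_size=4, seed=0):
    """Toy raters: even-indexed users pick the item highest on feature 0, odd ones feature 1."""
    rng = random.Random(seed)
    users, true_types = [], []
    for u in range(num_users):
        t = u % 2
        events = []
        for _ in range(events_per_user):
            slate = [[rng.random(), rng.random()] for _ in range(slate_size)]
            chosen = max(range(slate_size), key=lambda i: slate[i][t])
            events.append((slate, chosen))
        users.append(events)
        true_types.append(t)
    return users, true_types


users, true_types = make_synthetic_users()
weights, priors, resp, history = fit_user_types(users, num_types=2, dim=2)
assigned = [max(range(2), key=lambda k: r[k]) for r in resp]
print("type priors:", [round(p, 2) for p in priors])
```

Normalizing each type's gradient by its total responsibility weight keeps the step size below the curvature bound for features in [0, 1], so every M-step increases the expected complete-data log-likelihood and the marginal log-likelihood recorded in `history` should not decrease. In a real system the hypothetical feature vectors would come from image or prompt embeddings rather than two synthetic coordinates.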
Have you ever tried to create an image from a text description using AI, only to find that the result doesn't quite match what you had in mind? It's often difficult to convey a complex or evolving artistic vision perfectly in a single instruction, which leads to a frustrating process of trial and error.
We've developed an AI assistant called PASTA that learns your preferences through a more conversational approach to image generation. Instead of making just one attempt, PASTA shows you several image options based on your initial idea. You then pick the images you like best, and PASTA uses this feedback to refine its suggestions over several turns, guiding the image generation closer to your desired outcome. To build this, we collected new data on how people make these sequential choices and also created simulated users to help train our AI.
This research makes image generation with AI a more collaborative and intuitive experience. It lets users express their specific ideas more precisely, helping them bring complex or abstract visions to life more effectively. Ultimately, this work aims to make AI image generation tools more satisfying and better aligned with individual user intent, and we're sharing our data to help other researchers build even more advanced creative AI.