Poster
in
Workshop: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)
CCC: Enhancing Video Generation via Structured MLLM Feedback
Jing Gu · Ashwin Nagarajan · Tejas Polu · Kaizhi Zheng · Ruijian Zha · Jie Yang · Xin Eric Wang
Video generation from natural-language prompts has made impressive strides, but current systems frequently misalign outputs with their input descriptions, drop critical details, and hallucinate unintended content. Existing approaches to improving video quality typically rely on heavyweight post-editing models, which may introduce new artifacts, or on costly fine-tuning of the generator backbone, limiting scalability and accessibility. While multimodal large language models (MLLMs) have demonstrated strong capabilities in diagnosing visual-text misalignment, their use has largely focused on image-level improvement rather than video. Therefore, we introduce Critique Coach Calibration (CCC), a training-free, test-time prompt-adaptation framework that closes the loop between generation and evaluation. In each iteration, an off-the-shelf MLLM produces a structured critique of a generated video, highlighting misaligned semantics, subject drift, and missing objects, and then reformulates the input prompt based on its own feedback. By repeating this critique–coach cycle, CCC drives steady improvements in video quality without modifying the generator or relying on external editing modules. Empirical results on diverse video scenarios demonstrate that our approach consistently enhances semantic alignment and visual quality.
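The critique–coach cycle described in the abstract can be sketched as a simple test-time loop. This is a minimal illustration only: the generator, the MLLM critique, and the prompt rewriter below are stand-in stubs, and all function names and the critique schema are assumptions for exposition, not the authors' actual interfaces.

```python
# Minimal sketch of a Critique-Coach Calibration (CCC)-style loop:
# generate -> structured critique -> prompt reformulation -> regenerate.
# Every function here is an illustrative stub, not the paper's implementation.

def generate_video(prompt):
    # Stub generator: "renders" the prompt's words, but drops "fox" unless
    # the prompt explicitly emphasizes it (simulating a missing-object error).
    words = set(prompt.replace(",", "").split())
    if "emphasize:" not in prompt:
        words.discard("fox")
    return {"content": words}

def critique(video, required):
    # Stub MLLM critique: a structured report flagging missing objects
    # (the abstract also mentions misaligned semantics and subject drift).
    missing = [w for w in required if w not in video["content"]]
    return {"missing_objects": missing, "aligned": not missing}

def reformulate(prompt, feedback):
    # Stub coach step: the MLLM rewrites its own prompt from the critique.
    return prompt + " (emphasize: " + ", ".join(feedback["missing_objects"]) + ")"

def ccc(prompt, max_iters=3):
    # Training-free, test-time loop: no generator weights are updated and
    # no external editing module is invoked; only the prompt adapts.
    required = prompt.split()
    video = generate_video(prompt)
    for _ in range(max_iters):
        feedback = critique(video, required)
        if feedback["aligned"]:
            break
        prompt = reformulate(prompt, feedback)
        video = generate_video(prompt)
    return video, prompt
```

With the stub above, `ccc("a fox in snow")` detects the dropped "fox" on the first pass, reformulates the prompt once, and the regenerated output then covers all required objects.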