Poster
AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment
Yuqin Cao · Xiongkuo Min · Yixuan Gao · Wei Sun · Guangtao Zhai
West Exhibition Hall B2-B3 #W-306
Can large multimodal models (LMMs) be used to evaluate the quality of AI-generated audio-visual content (AGAV) produced by video-to-audio (VTA) methods? Our goal is to adapt LMMs to score AGAVs as humans do. We introduce AGAVQA-3k, the first large-scale AGAV quality assessment dataset, comprising 3,382 AGAVs from 16 VTA methods. We further propose AGAV-Rater, an LMM-based model that scores AGAVs, as well as audio and music generated from text, across multiple dimensions. AGAV-Rater achieves state-of-the-art performance and can help VTA methods select the highest-quality AGAVs to present to users. Our research advances the study of the perceptual quality of AGAVs and demonstrates the potential of such quality scoring for supervising and controlling AGAV quality.
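The selection use case (a VTA method generates several candidate AGAVs and a quality scorer picks the best one to show the user) amounts to best-of-N selection. The sketch below illustrates only that idea under stated assumptions: `Candidate`, `select_best`, and `toy_score` are hypothetical names, and `toy_score` is a placeholder that averages toy numeric features. It is not AGAV-Rater, whose interface is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    """Hypothetical container for one AGAV generated for the same input video."""
    name: str
    features: Sequence[float]  # placeholder for the actual audio-visual content


def select_best(candidates: Sequence[Candidate],
                score: Callable[[Candidate], float]) -> Candidate:
    """Best-of-N selection: return the candidate the scorer rates highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def toy_score(c: Candidate) -> float:
    """Hypothetical stand-in for an LMM-based quality scorer (here: feature mean)."""
    return sum(c.features) / len(c.features)


# Usage: three candidate AGAVs for one video; the scorer picks the highest-rated one.
cands = [
    Candidate("a", [0.2, 0.4]),
    Candidate("b", [0.9, 0.7]),
    Candidate("c", [0.5, 0.5]),
]
best = select_best(cands, toy_score)
print(best.name)  # prints "b"
```

Passing the scorer in as a function keeps the selection step independent of any particular quality model, so a real rater could be swapped in without changing `select_best`.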