Poster
in
Affinity Workshop: New In ML
MERIT: Multimodal Emotion Recognition via RL-Enhanced Test-Time Adaptation
Chen Zhang · Wenqing Wu · Yingqiu Zhang · Peihong He · Ziyang Liu
Multimodal Emotion Recognition (MER) seeks to understand human emotions by integrating information from textual, visual, and auditory modalities. While recent advances in Multimodal Large Language Models (MLLMs), such as Emotion-LLaMA, have demonstrated strong performance, they often struggle with distribution shifts and generalization to unseen test domains.

In this work, we propose a reinforcement learning-enhanced MER framework that integrates Test-Time Reinforcement Learning (TTRL) with a novel Majority Voting-based Verified Reward mechanism. By incorporating an emotion-aware reward function shaped by an emotional distance matrix, our method enables dynamic adaptation of lightweight LoRA adapters within a frozen Qwen-32B backbone, thereby enhancing both emotional consistency and generalization.

Extensive experiments on benchmark datasets, including CMU-MOSEI and IEMOCAP, show that our approach consistently outperforms strong baselines in terms of both accuracy and emotion consistency. Furthermore, ablation studies confirm the effectiveness of our soft reward design. These results underscore the potential of combining RL-based test-time adaptation with LLM-driven MER, offering a promising path toward more robust, adaptive, and emotionally intelligent AI systems.
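To make the reward mechanism concrete, below is a minimal sketch of how a majority-voting pseudo-label and a distance-shaped soft reward could fit together at test time. The six-emotion taxonomy, the distance matrix `D`, and the helper names (`majority_vote`, `soft_reward`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical emotion taxonomy; the paper's exact label set is not given here.
EMOTIONS = ["happy", "sad", "angry", "fear", "surprise", "neutral"]

# Hypothetical emotional distance matrix D: D[i, j] is the semantic distance
# between emotions i and j (0 on the diagonal, larger = more dissimilar).
D = np.array([
    [0.0, 0.9, 0.8, 0.8, 0.4, 0.5],
    [0.9, 0.0, 0.5, 0.4, 0.7, 0.5],
    [0.8, 0.5, 0.0, 0.4, 0.6, 0.6],
    [0.8, 0.4, 0.4, 0.0, 0.5, 0.6],
    [0.4, 0.7, 0.6, 0.5, 0.0, 0.5],
    [0.5, 0.5, 0.6, 0.6, 0.5, 0.0],
])

def majority_vote(predictions: list[str]) -> str:
    """Pseudo-label: the most frequent emotion among sampled predictions."""
    counts = {e: predictions.count(e) for e in set(predictions)}
    return max(counts, key=counts.get)

def soft_reward(prediction: str, pseudo_label: str) -> float:
    """Soft reward: 1.0 for an exact match with the voted pseudo-label,
    decaying with emotional distance for near-miss predictions."""
    i = EMOTIONS.index(prediction)
    j = EMOTIONS.index(pseudo_label)
    return 1.0 - D[i, j]

# At test time, sample several predictions from the model for one input,
# vote a pseudo-label, then reward each sample against it -- no ground
# truth is needed, which is what makes the adaptation test-time.
samples = ["happy", "happy", "surprise", "happy"]
label = majority_vote(samples)                       # -> "happy"
rewards = [soft_reward(s, label) for s in samples]   # [1.0, 1.0, 0.6, 1.0]
```

The key design point this sketch captures is that the reward is graded rather than binary: a prediction semantically close to the voted pseudo-label (e.g. "surprise" vs. "happy") is penalized less than a distant one (e.g. "sad"), which is presumably what the abstract's soft reward design refers to.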