

Poster in Workshop: The 2nd Workshop on Reliable and Responsible Foundation Models

Uncertainty Quantification for MLLMs

Gregory Kang Ruey Lau · Hieu Dao · Nicole Hui Lin Kan · Bryan Kian Hsiang Low

Keywords: [ Multimodal LLMs ] [ MLLM ] [ reliable deployment ] [ selective answering ] [ Uncertainty quantification ]


Abstract:

Multimodal Large Language Models (MLLMs) hold promise for tackling challenging multimodal tasks, but may generate seemingly plausible yet erroneous output, making them hard to trust and deploy in real-life settings. Generating accurate uncertainty metrics quickly for each MLLM response during inference could enable interventions such as escalating queries with uncertain responses to human experts or larger models for improved performance. However, existing uncertainty quantification methods require external verifiers, additional training, or high computational resources, and struggle to handle scenarios such as out-of-distribution (OOD) or adversarial settings. To overcome these limitations, we present an efficient and effective training-free framework to estimate MLLM output uncertainty at inference time without external tools, by computing metrics based on the diversity of the MLLM's responses, augmented with internal indicators of each output's coherence. We empirically show that our method significantly outperforms benchmarks in predicting incorrect responses and providing calibrated uncertainty estimates, including in OOD, adversarial, and domain-specific (e.g., medical radiology) data settings.
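The sketch below illustrates the general idea described in the abstract, not the authors' released implementation: sample several responses to the same query, measure how much they disagree, and blend that diversity with an internal coherence signal. All names (`jaccard_distance`, `coherence_weight`, `uncertainty_score`, the mixing weight `alpha`) are hypothetical; lexical Jaccard dissimilarity stands in for whatever response-diversity measure the paper actually uses, and mean token log-probability stands in for the internal coherence indicator.

```python
"""Hypothetical sketch of inference-time uncertainty estimation from sampled responses."""
from itertools import combinations
from math import exp
from typing import List


def jaccard_distance(a: str, b: str) -> float:
    """Lexical dissimilarity between two responses (stand-in for a semantic metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def response_diversity(responses: List[str]) -> float:
    """Mean pairwise dissimilarity over sampled responses; higher means less agreement."""
    if len(responses) < 2:
        return 0.0
    pairs = list(combinations(responses, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def coherence_weight(mean_logprob: float) -> float:
    """Map a response's mean token log-probability (<= 0) into (0, 1]; higher = more confident."""
    return exp(mean_logprob)


def uncertainty_score(responses: List[str], mean_logprobs: List[float], alpha: float = 0.5) -> float:
    """Blend response diversity with (one minus) average internal coherence."""
    diversity = response_diversity(responses)
    avg_coherence = sum(coherence_weight(lp) for lp in mean_logprobs) / len(mean_logprobs)
    return alpha * diversity + (1.0 - alpha) * (1.0 - avg_coherence)


if __name__ == "__main__":
    # Toy example: three sampled answers to the same multimodal query.
    samples = ["a cat on a sofa", "a cat sitting on a couch", "a dog in a park"]
    logprobs = [-0.3, -0.4, -1.2]  # per-response mean token log-probabilities
    print(f"uncertainty = {uncertainty_score(samples, logprobs):.3f}")
```

A high score would then trigger an intervention such as routing the query to a human expert or a larger model, as described in the abstract.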
