

Poster

Efficient Quantification of Multimodal Interaction at Sample Level

Zequn Yang · Hongfa Wang · Di Hu

East Exhibition Hall A-B #E-1401
[ Project Page ]
Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

Interactions between modalities (redundancy, uniqueness, and synergy) collectively determine the composition of multimodal information. Understanding these interactions is crucial for analyzing information dynamics in multimodal systems, yet quantifying them accurately at the sample level poses significant theoretical and computational challenges. To address this, we introduce the Lightweight Sample-wise Multimodal Interaction (LSMI) estimator, rigorously grounded in pointwise information theory. We first develop a redundancy estimation framework that employs an appropriate pointwise information measure to quantify this most decomposable and measurable interaction. Building upon this, we propose a general interaction estimation method based on efficient entropy estimation, specifically tailored for sample-wise estimation in continuous distributions. Extensive experiments on synthetic and real-world datasets validate LSMI's precision and efficiency. Crucially, our sample-wise approach reveals fine-grained sample- and category-level dynamics within multimodal data, enabling practical applications such as redundancy-informed sample partitioning, targeted knowledge distillation, and interaction-aware model ensembling. The code is available at https://github.com/GeWu-Lab/LSMI_Estimator.
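To make the redundancy/uniqueness/synergy decomposition concrete, the sketch below computes a min-based pointwise decomposition (in the style of Williams and Beer's partial information decomposition) for a single sample, using scipy's Gaussian KDE as a stand-in density estimator. This is an illustrative assumption, not the LSMI estimator itself: the paper's pointwise redundancy measure and its efficient entropy estimator differ, and all function names here are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def pointwise_info(x, y, x_query, y_query):
    """Pointwise mutual information i(x; y) = log p(x|y) / p(x) for one
    sample, with class-conditional densities fit by Gaussian KDE
    (continuous x, discrete y). A rough stand-in for the paper's
    entropy estimator; KDEs are refit on every call for simplicity."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()
    # Class-conditional KDEs; p(x) is their prior-weighted mixture.
    cond = {c: gaussian_kde(x[y == c].T) for c in classes}
    p_x = sum(pi * cond[c](x_query.T) for c, pi in zip(classes, priors))
    p_x_given_y = cond[y_query](x_query.T)
    return float(np.log(p_x_given_y / p_x)[0])

def sample_interactions(x1, x2, y, idx):
    """Min-based pointwise decomposition (Williams-Beer style) for
    sample idx: redundancy, per-modality uniqueness, and synergy."""
    i1 = pointwise_info(x1, y, x1[idx:idx + 1], y[idx])
    i2 = pointwise_info(x2, y, x2[idx:idx + 1], y[idx])
    x12 = np.hstack([x1, x2])
    i12 = pointwise_info(x12, y, x12[idx:idx + 1], y[idx])
    red = min(i1, i2)
    # synergy = i12 - (red + u1 + u2) simplifies to i12 - max(i1, i2).
    return {"redundancy": red,
            "unique_1": i1 - red,
            "unique_2": i2 - red,
            "synergy": i12 - max(i1, i2)}

# Toy data: x1 carries the label; x2 is a noisy copy of x1 (mostly redundant).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
x1 = y[:, None] + 0.5 * rng.standard_normal((500, 1))
x2 = x1 + 0.5 * rng.standard_normal((500, 1))
print(sample_interactions(x1, x2, y, idx=0))
```

On this toy dataset, where x2 is largely a noisy copy of x1, redundancy should dominate while uniqueness and synergy stay near zero; on real multimodal data, the paper's estimator replaces both the KDE densities and the min-based redundancy with its own pointwise constructions.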

Lay Summary:

We quantify how information is generated through multimodal interactions, efficiently distinguishing whether it originates from shared sources across modalities, is specific to a single modality, or emerges synergistically from their combined effect. This quantification offers practical insights for real-world datasets.
