Poster
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
Tobias Braun · Mark Rothermel · Marcus Rohrbach · Anna Rohrbach
West Exhibition Hall B2-B3 #W-506
The proliferation of disinformation demands reliable and scalable fact-checking solutions. We present Dynamic Evidence-based FAct-checking with Multimodal Experts (DEFAME), a modular, zero-shot MLLM pipeline for open-domain, text-image claim verification. DEFAME operates in a six-stage process, dynamically selecting the tools and search depth needed to extract and evaluate textual and visual evidence. Unlike prior approaches that are text-only, lack explainability, or rely solely on parametric knowledge, DEFAME performs end-to-end verification, accounting for images in both claims and evidence while generating structured, multimodal reports. Evaluation on the popular benchmarks VERITE, AVeriTeC, and MOCHEG shows that DEFAME surpasses all previous methods, establishing a new general state of the art for uni- and multimodal fact-checking. Moreover, we introduce a new multimodal benchmark, ClaimReview2024+, featuring claims dated after the knowledge cutoff of GPT-4o to avoid data leakage. Here, DEFAME drastically outperforms the GPT-4o baselines, demonstrating temporal generalizability and the potential for real-time fact-checking.
Misinformation is becoming more common and harder to detect, especially when it mixes text with images. People often believe what they see, and misleading image-text combinations can quickly spread across the internet. To help address this, we built a system called DEFAME that checks whether claims found online are true or false, using both text and images.

DEFAME mimics how a human fact-checker might work: it searches the web, reviews images, and cross-checks information from different sources. Unlike earlier systems that look at text alone or rely heavily on built-in memory, DEFAME uses external tools to find fresh and reliable evidence and then explains its verdict in clear, structured reports.

We tested DEFAME on standard fact-checking tasks and also built a new set of recent claims, chosen to postdate the last training update of models like GPT-4o. This is a good way to test how well a system handles fresh, real-world information. DEFAME not only beat older methods but also outperformed powerful models like GPT-4o on these newer claims. This suggests that DEFAME is better suited for keeping up with breaking news and fast-spreading misinformation.
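To make the tool-driven verification idea above more concrete, here is a minimal sketch of a dynamic fact-checking loop. This is purely illustrative: the actual DEFAME stages, tool set, and prompts are not specified here, and every function and name below (`text_search`, `reverse_image_search`, `judge`, `verify`) is a hypothetical stand-in showing how tools can be selected per claim modality and how the search can deepen iteratively until a verdict is reached.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Claim:
    text: str
    image: Optional[str] = None  # optional image attached to the claim


# Hypothetical evidence tools; a real system would query a search
# engine, a reverse image search API, etc.
def text_search(claim: Claim) -> list:
    return [f"web result about: {claim.text}"]


def reverse_image_search(claim: Claim) -> list:
    return [f"pages reusing image: {claim.image}"]


def select_tools(claim: Claim) -> list:
    """Pick tools dynamically based on the claim's modality."""
    tools: list = [text_search]
    if claim.image is not None:
        tools.append(reverse_image_search)
    return tools


def judge(claim: Claim, evidence: list) -> Optional[str]:
    """Stand-in for the MLLM verdict step; None means undecided."""
    return "supported" if evidence else None


def verify(claim: Claim, max_depth: int = 3):
    """Iteratively gather evidence until a verdict or depth limit."""
    evidence: list = []
    for _ in range(max_depth):
        for tool in select_tools(claim):
            evidence.extend(tool(claim))
        verdict = judge(claim, evidence)
        if verdict is not None:
            return verdict, evidence
    return "not enough information", evidence
```

A text-only claim would trigger only the text tool, while a text-image claim would also invoke the image tool; a real report-generation stage would then summarize `evidence` alongside the verdict.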