Poster in Affinity Workshop: 4th MusIML Workshop at ICML'25
GazaVHR: AI-Driven Legally Grounded Conflict Harm Documentation
Nesibe Sebnem Paluluoglu · Dilara Zeynep Gürer · Muhammed Akıncı · Mustafa Taha Kocyigit
We present GazaVHR, a vision-language model (VLM)-annotated dataset for fine-grained analysis of potential human rights violations in Gaza conflict imagery. Sourced from 145,662 conflict-related tweets, our pipeline integrates vision-language models, vision encoders, and semantic clustering to generate structured annotations with minimal manual intervention. Beginning with 176,731 raw images, a multi-stage filtering pipeline (content rules, deduplication, semantic clustering) identifies 13,834 visually unique, likely conflict-relevant images. To ensure legal relevance, we align results with the Kanıt (Evidence) dataset: 231 expert-curated images grounded in the Rome Statute of the International Criminal Court (Articles 5–8). This framework refines the dataset to 4,603 high-confidence images likely indicative of conflict-related harm. While our work highlights AI's potential to systematize human rights documentation at scale, we acknowledge limitations stemming from reduced manual oversight and biases inherent to LLM-based annotation and hashtag-driven social media data.
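To make the filtering and alignment stages more concrete, the sketch below shows one way such a pipeline could be composed from off-the-shelf components: embedding-based near-duplicate removal, semantic clustering, and similarity-based alignment against the Kanıt reference images. The function name, thresholds, clustering settings, and the use of cosine similarity over vision-encoder embeddings are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a multi-stage filtering pipeline, assuming precomputed
# vision-encoder embeddings; all names and thresholds are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def filter_and_align(image_embeddings: np.ndarray,
                     kanit_embeddings: np.ndarray,
                     n_clusters: int = 50,
                     dedup_thresh: float = 0.97,
                     align_thresh: float = 0.30):
    """image_embeddings: (N, d) embeddings of candidate conflict images.
    kanit_embeddings: (M, d) embeddings of expert-curated reference images."""
    # 1) Near-duplicate removal: greedily keep an image only if its maximum
    #    similarity to already-kept images stays below the dedup threshold.
    kept = []
    for i, emb in enumerate(image_embeddings):
        if not kept or cosine_similarity(emb[None], image_embeddings[kept]).max() < dedup_thresh:
            kept.append(i)
    unique = image_embeddings[kept]

    # 2) Semantic clustering: group visually similar images so entire
    #    clusters (e.g. clearly non-conflict content) can be reviewed or dropped.
    cluster_labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(unique)

    # 3) Legal-relevance alignment: retain images whose maximum similarity to
    #    any reference image exceeds the alignment threshold.
    sims = cosine_similarity(unique, kanit_embeddings).max(axis=1)
    aligned = np.where(sims >= align_thresh)[0]
    return [kept[i] for i in aligned], cluster_labels
```

In a full system, the content-rule stage (e.g. VLM-generated captions checked against keyword or policy filters) and exact-duplicate hashing would typically run upstream of this embedding-based step; the sketch only covers the parts that operate on embeddings.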