Poster in Affinity Workshop: New In ML
CLIP-Guided Diffusion for Weakly Supervised Construction Equipment Detection
The digital transformation of the construction industry has created demand for visual intelligence systems that are accurate, low-cost, and easy to deploy on smart construction sites. However, fully supervised and semi-supervised object detection methods face limitations in such environments because of high annotation costs and complex, dynamic scenes. This study introduces a weakly supervised object localization approach that builds on the Generative Prompting Framework (GenPromp) and leverages generative prompts and the CLIP multimodal model. Our method relies only on image-level labels and performs both automatic classification and localization of construction machinery. Experiments on the Alberta Construction Image Dataset (ACID) show that the approach achieves 81.2% localization accuracy. The results highlight the effectiveness of generative prompts and multimodal semantic alignment in improving localization performance. These findings demonstrate the potential of weakly supervised learning for practical applications in complex, real-world construction environments.
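The abstract reports localization accuracy without stating the exact protocol. In the weakly supervised object localization literature, a common convention is to threshold a per-image localization map into a bounding box and count the image as correct when that box has IoU of at least 0.5 with the ground-truth box ("GT-known Loc"), optionally also requiring the predicted class to be correct ("Top-1 Loc"). The sketch below illustrates that convention only. It is not the authors' evaluation code. The map threshold of 0.5, the IoU threshold of 0.5, the single-box rule, and the class names are assumptions, and the toy heatmap stands in for whatever localization map the model actually produces.

```python
from typing import List, Optional, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def map_to_box(heatmap: List[List[float]], thresh: float = 0.5) -> Box:
    """Min-max normalize a localization map and return the tight box around
    every pixel whose normalized score is >= thresh (thresh in (0, 1]).

    Assumption: one box over all above-threshold pixels; some WSOL methods
    keep only the largest connected component instead.
    """
    h, w = len(heatmap), len(heatmap[0])
    flat = [v for row in heatmap for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no location signal: fall back to the full image.
        return (0, 0, w - 1, h - 1)
    xs, ys = [], []
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if (v - lo) / (hi - lo) >= thresh:
                xs.append(x)
                ys.append(y)
    return (min(xs), min(ys), max(xs), max(ys))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive-coordinate boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = iw * ih

    def area(r: Box) -> int:
        return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)

    return inter / (area(a) + area(b) - inter)


def localization_accuracy(
    pred_boxes: Sequence[Box],
    gt_boxes: Sequence[Box],
    pred_labels: Optional[Sequence[str]] = None,
    gt_labels: Optional[Sequence[str]] = None,
    iou_thresh: float = 0.5,
) -> float:
    """Fraction of images whose predicted box reaches IoU >= iou_thresh with
    the ground truth (GT-known Loc). If both label lists are given, the
    predicted class must also match (Top-1 Loc)."""
    hits = 0
    for i, (p, g) in enumerate(zip(pred_boxes, gt_boxes)):
        box_ok = iou(p, g) >= iou_thresh
        cls_ok = pred_labels is None or pred_labels[i] == gt_labels[i]
        hits += int(box_ok and cls_ok)
    return hits / len(gt_boxes)


if __name__ == "__main__":
    # Toy 4x4 map with a bright 2x2 blob (illustrative values only).
    heatmap = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.9, 0.8, 0.0],
        [0.0, 0.7, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    box = map_to_box(heatmap)
    print(box)  # (1, 1, 2, 2)

    preds = [box, box]
    gts = [(1, 1, 2, 2), (0, 0, 2, 2)]  # second GT box gives IoU 4/9 < 0.5
    print(localization_accuracy(preds, gts))  # 0.5
    # Hypothetical class names; first image has the wrong class.
    print(localization_accuracy(preds, gts,
                                ["loader", "excavator"],
                                ["excavator", "excavator"]))  # 0.0
```

In this toy run the first image passes GT-known Loc but fails Top-1 Loc because its class is wrong, while the second fails both because its box overlaps the ground truth with IoU 4/9. This shows why the two variants can report different numbers for the same boxes, so a reported figure such as 81.2% is only comparable across methods when the variant and thresholds match.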