Poster in Workshop: DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats)
FaceSafe: An Inpainting Pipeline for Privacy-Compliant Scalable Image Datasets
Sydney Su · Lening Cui · Ananya Salian · Roger You · Hao Cui · Charles Duong · Kevin Zhu · Sean O'Brien · Vasu Sharma
Keywords: [ LAION ] [ Diffusion Models ] [ Dataset ] [ Privacy ] [ Inpainting ]
Large-scale web-scraped datasets have contributed significantly to progress in deep learning, yet the widespread presence of biometric data such as faces raises legitimate legal, ethical, and privacy concerns. Existing approaches address this either by removing sensitive images entirely, which often sacrifices downstream performance, or by purchasing licenses for images. To address this gap, we present a novel privacy-preserving transformation pipeline that uses a diffusion-based inpainting model to systematically replace each detected face with multiple synthetic variants conditioned on different demographic attributes, yielding a new dataset of distinct, privacy-preserving face images. Our method, evaluated on 12,000 images transformed from LAION-400M and CelebA-HQ, eliminates privacy risks without significant loss of image quality or diversity. This transformation pipeline is intended to serve as a scalable guideline for creating datasets that comply with legal and ethical privacy constraints.
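The abstract does not specify the face detector, how inpainting masks are built, or how demographic conditioning is expressed. As a minimal sketch, assuming faces arrive as bounding boxes from some detector and that each demographic variant is requested through a text prompt, the per-image inputs to an inpainting model could be prepared as below. The names `face_mask` and `variant_prompts`, the padding default, and the prompt template are all hypothetical, not the authors' implementation.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, with x1/y1 exclusive.
Box = Tuple[int, int, int, int]


def face_mask(width: int, height: int, boxes: List[Box], pad: int = 8) -> List[List[int]]:
    """Binary inpainting mask: 1 marks pixels to regenerate, 0 marks pixels to keep."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Grow each box by `pad` pixels (an assumed heuristic) so the inpainter
        # can blend edges around the face; clamp the result to the image bounds.
        xa, ya = max(0, x0 - pad), max(0, y0 - pad)
        xb, yb = min(width, x1 + pad), min(height, y1 + pad)
        for y in range(ya, yb):
            for x in range(xa, xb):
                mask[y][x] = 1
    return mask


def variant_prompts(
    attributes: Dict[str, List[str]],
    template: str = "portrait photo, {} person, realistic face",
) -> List[str]:
    """One text prompt per combination of demographic attribute values.

    Keys are sorted so the word order in each prompt is deterministic.
    """
    keys = sorted(attributes)
    return [template.format(" ".join(combo)) for combo in product(*(attributes[k] for k in keys))]


if __name__ == "__main__":
    mask = face_mask(64, 64, [(20, 16, 44, 48)])
    prompts = variant_prompts({"age": ["young", "older"], "gender": ["female", "male"]})
    # Each (mask, prompt) pair would be one inpainting request, i.e. one synthetic variant.
    for p in prompts:
        print(sum(map(sum, mask)), p)
```

Each (mask, prompt) pair would then be passed to a diffusion inpainting model (for example, an inpainting pipeline from Hugging Face `diffusers`); that step is omitted here. Taking the Cartesian product of attribute values is one simple way to get "multiple, synthetic variants" per face, but the attribute set and how variants are sampled are assumptions.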