Oral
in
Workshop: The Impact of Memorization on Trustworthy Foundation Models
Low Resource Reconstruction Attacks Through Benign Prompts
Sol Yarkoni · Roi Livni
Sat 19 Jul 8:25 a.m. PDT — 5 p.m. PDT
The rising popularity of diffusion models has raised serious concerns around privacy, copyright, and data leakage. Prior work has demonstrated that training data can be partially reconstructed, but these attacks often require significant resources, access to the training set, or carefully crafted prompts. In this work, we present a low-resource attack that reveals a more subtle risk: even seemingly innocuous prompts can lead to the unintended reconstruction of real training images. Strikingly, we show that prompts like “Abstract Art Unisex T-Shirt” can generate identifiable human faces that appear in the training data. Our findings point to a systemic vulnerability rooted in the use of scraped e-commerce data, where templated layouts tightly couple visual content with prompt patterns. This raises new concerns about how easily unintentional data leaks can occur.