Abstract:
Text-to-image diffusion models can inadvertently memorize and regenerate unique training images, posing serious privacy and copyright risks. While recent work links such memorization to sharp spikes in the model’s log-density Hessian, existing diagnostics stop at flagging \emph{that} a model overfits, not \emph{which} samples are to blame or how to remove them. We introduce \emph{layer-wise influence tracing}, a scalable Hessian decomposition that assigns every training image a curvature-based influence score. Deleting only the top $1\%$ high-risk images and performing a single, low-learning-rate fine-tune cuts verbatim reconstructions in Stable Diffusion XL by $72\%$ while keeping Fréchet Inception Distance within $1\%$ of the baseline. The full procedure costs just 2.3 GPU-hours, over an order of magnitude cheaper than full-Hessian methods, and yields similar gains on a 1-billion-parameter distilled backbone. Our results turn a coarse memorization signal into an actionable, data-centric mitigation strategy, paving the way toward privacy-respecting generative models at 10B+ scale.