Invited Talk in Workshop: The Impact of Memorization on Trustworthy Foundation Models
Invited Talk 5: Pratyush Maini - What Memorization Research Taught Me About Safety
As large language models scale, our understanding of "safety" has remained frustratingly surface-level, often limited to red teaming, RLHF, and output filtering. In this talk, I'll discuss how research into memorization provides a fundamentally different lens on safety. I'll trace the development of methods to precisely quantify memorization, detect unauthorized use of training data, and rigorously evaluate attempts at unlearning. This perspective illuminates broader societal and legal concerns, including emerging questions around copyright and ownership in a rapidly evolving technological landscape. Ultimately, this research underscores the need for natively safe models, rather than models that patch alignment on top of an otherwise harmful base.