Fri 9:00 a.m. - 9:45 a.m.
|
Yasaman Bahri (Google DeepMind), On the emergence of linear structure in word embeddings
(
Invited Talk
)
>
|
Yasaman Bahri
🔗
|
Fri 9:45 a.m. - 10:00 a.m.
|
Rishi Sonthalia (Boston College), Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
(
Contributed talk
)
>
|
🔗
|
Fri 10:00 a.m. - 10:15 a.m.
|
Jivan Waber, Fundamental Limits of Learning Single-Index Models under Structured Data
(
Contributed talk
)
>
|
🔗
|
Fri 10:15 a.m. - 11:15 a.m.
|
Poster session / Break
(
Poster Session
)
>
|
🔗
|
Fri 11:15 a.m. - 12:00 p.m.
|
Reza Gheissari (Northwestern University), Local geometry and effective spectral theory of high-dimensional classification
(
Invited Talk
)
>
|
Reza Gheissari
🔗
|
Fri 12:00 p.m. - 12:45 p.m.
|
Florent Krzakala (EPFL), Asymptotics of Neural Networks in High Dimensions: From Sparse Representations to Deep Hierarchies
(
Invited Talk
)
>
|
Florent Krzakala
🔗
|
Fri 12:45 p.m. - 2:00 p.m.
|
Lunch/Break
|
🔗
|
Fri 2:00 p.m. - 2:45 p.m.
|
Nathan Srebro (TTIC & University of Chicago), Is A Good Input Distribution All You Need?
(
Invited Talk
)
>
|
Nati Srebro
🔗
|
Fri 2:45 p.m. - 3:00 p.m.
|
Emanuele Troiani (EPFL), Bayes optimal learning of attention-indexed models
(
Contributed talk
)
>
|
🔗
|
Fri 3:00 p.m. - 3:15 p.m.
|
Alireza Mousavi-Hosseini (University of Toronto), When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
(
Contributed talk
)
>
|
🔗
|
Fri 3:15 p.m. - 3:30 p.m.
|
Fabiola Ricci (SISSA), Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions
(
Contributed talk
)
>
|
🔗
|
Fri 3:30 p.m. - 4:00 p.m.
|
Break
|
🔗
|
Fri 4:00 p.m. - 4:45 p.m.
|
Zhou Fan (Yale), Dynamical mean-field analysis of adaptive Langevin diffusions
|
Zhou Fan
🔗
|
Fri 4:45 p.m. - 5:30 p.m.
|
Poster Session
|
🔗
|
-
|
Understanding Generalization in Diffusion Models via Probability Flow Distance
(
Poster
)
>
link
|
Huijie Zhang · Zijian Huang · Siyi Chen · Jinfan Zhou · Zekai Zhang · Peng Wang · Qing Qu
🔗
|
-
|
Origins of Creativity in Attention Based Diffusion Models
(
Poster
)
>
link
|
Emma Finn · T. Anderson Keller · Emmanouil Theodosis · Demba Ba
🔗
|
-
|
Towards an Optimal Control Perspective of ResNet Training
(
Poster
)
>
link
|
Jens Püttschneider · Simon Heilig · Asja Fischer · Timm Faulwasser
🔗
|
-
|
Reduce and Conquer: Independent Component Analysis at linear sample complexity
(
Poster
)
>
link
|
Fabiola Ricci · Lorenzo Bardone · Sebastian Goldt
🔗
|
-
|
Reduce and Conquer: Independent Component Analysis at linear sample complexity
(
Oral
)
>
link
|
Fabiola Ricci · Lorenzo Bardone · Sebastian Goldt
🔗
|
-
|
Catalyst: Structured Pruning with Robust Bifurcation Dynamics
(
Poster
)
>
link
|
Jaeheun Jung · Donghun Lee
🔗
|
-
|
Grokking and Generalization Collapse: Insights from HTSR theory
(
Poster
)
>
link
|
Hari Prakash · Charles H Martin
🔗
|
-
|
Bayesian Influence Functions for Scalable Data Attribution
(
Poster
)
>
link
|
Philipp Kreer · Wilson Wu · Maxwell Adam · Zach Furman · Jesse Hoogland
🔗
|
-
|
On the Interaction of Noise, Compression, and Adaptivity under $(L_0,L_1)$-Smoothness: An SDE Approach
(
Poster
)
>
link
|
Enea Monzio Compagnoni · Rustem Islamov · Antonio Orvieto · Eduard Gorbunov
🔗
|
-
|
Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
(
Poster
)
>
link
|
Chen Fan · Mark Schmidt · Christos Thrampoulidis
🔗
|
-
|
Low Rank Gradients and Where To Find Them
(
Poster
)
>
link
|
Rishi Sonthalia · Michael Murray · Guido Montufar
🔗
|
-
|
Reactivation: Empirical NTK Dynamics Under Task Shifts
(
Poster
)
>
link
|
Yuzhi Liu · Zixuan Chen · Zirui Zhang · Yufei Liu · Giulia Lanzillotta
🔗
|
-
|
The Price of Robustness: Stable Classifiers Need Overparameterization
(
Poster
)
>
link
|
Jonas von Berg · Adalbert Fono · Massimiliano Datres · Sohir Maskey · Gitta Kutyniok
🔗
|
-
|
Adapting to High Dimensional Concepts with Metalearning
(
Poster
)
>
link
|
Max Gupta
🔗
|
-
|
Theoretical Guarantees and Training Dynamics of Contrastive Learning: How Misaligned Data Influence Feature Purity
(
Poster
)
>
link
|
Jiawei Sun · Shuai Zhang · Hongkang Li · Meng Wang
🔗
|
-
|
Learning how to step in gradient-based optimization: beyond convexity and smoothness
(
Poster
)
>
link
|
Dravyansh Sharma
🔗
|
-
|
Towards Understanding Orthogonalization in Muon
(
Poster
)
>
link
|
Valentyn Boreiko · Zhiqi Bu · Sheng Zha
🔗
|
-
|
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
(
Poster
)
>
link
|
Alireza Mousavi-Hosseini · Clayton Sanford · Denny Wu · Murat Erdogdu
🔗
|
-
|
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
(
Oral
)
>
link
|
Alireza Mousavi-Hosseini · Clayton Sanford · Denny Wu · Murat Erdogdu
🔗
|
-
|
Langevin Learning Dynamics in Lazy and Non-Lazy Wide Neural Networks
(
Poster
)
>
link
|
Yehonatan Avidan · Haim Sompolinsky
🔗
|
-
|
The Interplay Between Implicit Bias and Adversarial Robustness in Linear Convolutional Neural Networks
(
Poster
)
>
link
|
Aurélien Boland · Hannah Pinson
🔗
|
-
|
Feature learning is decoupled from generalization in high capacity neural networks
(
Poster
)
>
link
|
Niclas Göring · Charles London · Abdurrahman Erturk · Chris Mingard · Yoonsoo Nam · Ard Louis
🔗
|
-
|
Selective Prediction via Training Dynamics
(
Poster
)
>
link
|
Stephan Rabanser · Anvith Thudi · Kimia Hamidieh · Adam Dziedzic · Israfil Bahceci · Akram Bin Sediq · Hamza Sokun · Nicolas Papernot
🔗
|
-
|
How Transformers Get Rich: Training Dynamics Analysis
(
Poster
)
>
link
|
Mingze Wang · Ruoxi Yu · Weinan E · Lei Wu
🔗
|
-
|
Quantization and the Bottom of the Loss Landscape
(
Poster
)
>
link
|
Luca Di Carlo · Daniel Bernstein · David Schwab
🔗
|
-
|
The Shape of Generalization through the Lens of Norm-based Capacity Control
(
Poster
)
>
link
|
Yichen Wang · Yudong Chen · Lorenzo Rosasco · Fanghui Liu
🔗
|
-
|
What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers
(
Poster
)
>
link
|
Pulkit Gopalani · Wei Hu
🔗
|
-
|
From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning
(
Poster
)
>
link
|
Junsoo Oh · Jerry Song · Chulhee Yun
🔗
|
-
|
In Search of Adam’s Secret Sauce
(
Poster
)
>
link
|
Antonio Orvieto · Robert Gower
🔗
|
-
|
A solvable generative model with a linear, one-step denoiser
(
Poster
)
>
link
|
Indranil Halder
🔗
|
-
|
On the Performance of Differentially Private Optimization with Heavy-Tail Class Imbalance
(
Poster
)
>
link
|
Qiaoyue Tang · Alain Zhiyanov · Mathias Lécuyer
🔗
|
-
|
Two-point deterministic equivalence for SGD in random feature models
(
Poster
)
>
link
|
Alexander Atanasov · Blake Bordelon · Jacob A Zavatone-Veth · Courtney Paquette · Cengiz Pehlevan
🔗
|
-
|
On the Existence of Hidden Subnetworks Within a Randomly Weighted Multi-Head Attention Mechanism
(
Poster
)
>
link
|
Hikari Otsuka · Yasuyuki Okoshi · Daichi Fujiki · Susumu Takeuchi · Masato Motomura · Daiki Chijiwa
🔗
|
-
|
Studying Data Complexity and Learned Structure in Neural Networks with Bayesian Probes
(
Poster
)
>
link
|
Maxwell Adam · Zach Furman · Wilson Wu · Philipp Kreer · Jesse Hoogland
🔗
|
-
|
Latent Concept Disentanglement in Transformer-based Language Models
(
Poster
)
>
link
|
Guan Zhe Hong · Bhavya Vasudeva · Vatsal Sharan · Cyrus Rashtchian · Prabhakar Raghavan · Rina Panigrahy
🔗
|
-
|
Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
(
Poster
)
>
link
|
Minhak Song · Beomhan Baek · Kwangjun Ahn · Chulhee Yun
🔗
|
-
|
New Evidence of the Two-Phase Learning Dynamics of Neural Networks
(
Poster
)
>
link
|
Zhanpeng Zhou · Yongyi Yang · Mahito Sugiyama · Junchi Yan
🔗
|
-
|
Understanding Lookahead Dynamics Through Laplace Transforms
(
Poster
)
>
link
|
Aniket Sanyal · Tatjana Chavdarova
🔗
|
-
|
Emergent Linear Separability of Unseen Data Points in High-dimensional Last-Layer Feature Space
(
Poster
)
>
link
|
Taehun Cha · Donghun Lee
🔗
|
-
|
Attention with Trained Embeddings Provably Selects Important Tokens
(
Poster
)
>
link
|
Diyuan Wu · Aleksandr Shevchenko · Samet Oymak · Marco Mondelli
🔗
|
-
|
How Compositional Generalization and Creativity Improve as Diffusion Models are Trained
(
Poster
)
>
link
|
Alessandro Favero · Antonio Sclocchi · Francesco Cagnetta · Pascal Frossard · Matthieu Wyart
🔗
|
-
|
A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention
(
Poster
)
>
link
|
Nandan Kumar Jha · Brandon Reagen
🔗
|
-
|
Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers
(
Poster
)
>
link
|
Peter Súkeník · Christoph Lampert · Marco Mondelli
🔗
|
-
|
Topology-Aware Robust Representation Balancing for Estimating Causal Effects
(
Poster
)
>
link
|
Amirhossein Farzam · Ahmed Aloui · Vahid Tarokh · Guillermo Sapiro
🔗
|
-
|
Training Dynamics of In-Context Learning in Linear Attention
(
Poster
)
>
link
|
Yedi Zhang · Aaditya Singh · Peter Latham · Andrew Saxe
🔗
|
-
|
The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets
(
Poster
)
>
link
|
Yujun Kim · Chaewon Moon · Chulhee Yun
🔗
|
-
|
The Silent Helper: How Implicit Regularization Enhances Group Robustness
(
Poster
)
>
link
|
Nahal Mirzaie · Mahdi Ghaznavi · Hosna Oyarhoseini · Alireza Alipanah · Erfan Sobhaei · Ali Abbasi · Amirmahdi Farzane · Hossein Jafarinia · Parsa Sedeh · Arefe Boushehrian · Mahdieh Soleymani Baghshah · Mohammad H Rohban
🔗
|
-
|
Better Rates for Private Linear Regression in the Proportional Regime via Aggressive Clipping
(
Poster
)
>
link
|
Simone Bombari · Inbar Seroussi · Marco Mondelli
🔗
|
-
|
Bayes optimal learning of attention-indexed models
(
Poster
)
>
link
|
Fabrizio Boncoraglio · Emanuele Troiani · Vittorio Erba · Lenka Zdeborová
🔗
|
-
|
Bayes optimal learning of attention-indexed models
(
Oral
)
>
link
|
Fabrizio Boncoraglio · Emanuele Troiani · Vittorio Erba · Lenka Zdeborová
🔗
|
-
|
Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold
(
Poster
)
>
link
|
Xinghan Li · Haodong Wen · Kaifeng Lyu
🔗
|
-
|
Different simultaneous mechanisms for in-context recall have distinct learning dynamics
(
Poster
)
>
link
|
Sultan Daniels · Dylan Davis · Dhruv Gautam · Wentinn Liao · Gireeja Ranade · Anant Sahai
🔗
|
-
|
A simple connection from loss flatness to compressed neural representations
(
Poster
)
>
link
|
Shirui Chen · Stefano Recanatesi · Eric Shea-Brown
🔗
|
-
|
Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
(
Poster
)
>
link
|
Teodora Srećković · Jonas Geiping · Antonio Orvieto
🔗
|
-
|
Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers
(
Poster
)
>
link
|
Annalisa Belloni · Lorenzo Noci · Antonio Orvieto
🔗
|
-
|
The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks
(
Poster
)
>
link
|
Vittorio Erba · Emanuele Troiani · Lenka Zdeborová · Florent Krzakala
🔗
|
-
|
Data-Free Transformer Quantization Using Parameter-Space Symmetry
(
Poster
)
>
link
|
Lucas Laird · Bo Zhao · Rose Yu · Robin Walters
🔗
|
-
|
Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rank Solutions
(
Poster
)
>
link
|
Baekrok Shin · Chulhee Yun
🔗
|
-
|
Revisiting the Goldilocks Zone in Inhomogeneous Networks
(
Poster
)
>
link
|
Zacharie Cuchet · Sarath Chandar · Ekaterina Lobacheva
🔗
|
-
|
Emergence of Hebbian Dynamics in Regularized Non-Local Learners
(
Poster
)
>
link
|
David Koplow · Tomaso A Poggio · Liu Ziyin
🔗
|
-
|
On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
(
Poster
)
>
link
|
Tongcheng Zhang · Zhanpeng Zhou · Mingze Wang · Andi Han · Wei Huang · Taiji Suzuki · Junchi Yan
🔗
|
-
|
Rethinking Memorization–Generalization Trade-Off in Generative Models
(
Poster
)
>
link
|
Jiseok Chae · Kyuwon Kim · Donghwan Kim
🔗
|
-
|
Data Free Metrics Are Not Reparameterisation Invariant Under the Critical and Robust Layer Phenomena
(
Poster
)
>
link
|
Gabryel Mason-Williams · Israel Mason-Williams · Fredrik Dahlqvist
🔗
|
-
|
Information-Geometric Neural Granger Causality
(
Poster
)
>
link
|
Pauline Bourigault · Danilo Mandic
🔗
|
-
|
Symmetries in Weight Space Learning: To Retain or Remove?
(
Poster
)
>
link
|
Fynn Kiwitt · Behrooz Tahmasebi · Stefanie Jegelka
🔗
|
-
|
Emergent Specialization: Rare Token Neurons in Language Models
(
Poster
)
>
link
|
Jing Liu · Haozheng Wang · Yueheng Li
🔗
|
-
|
Quantitative Bounds for Length Generalization in Transformers
(
Poster
)
>
link
|
Zachary Izzo · Eshaan Nichani · Jason Lee
🔗
|
-
|
Lyapunov Learning at the Onset of Chaos
(
Poster
)
>
link
|
Alessandro Londei · Denise Lanzieri · Matteo Benati · Vittorio Loreto
🔗
|
-
|
Understanding Mamba in In-Context Learning with Outliers: A Theoretical Generalization Analysis
(
Poster
)
>
link
|
Hongkang Li · Songtao Lu · Xiaodong Cui · Pin-Yu Chen · Meng Wang
🔗
|
-
|
Input differentiation via negative computation
(
Poster
)
>
link
|
Linghao Kong · Angelina Ning · Nir Shavit
🔗
|
-
|
Probing Geometry of Next Token Prediction Using Cumulant Expansion of the Softmax Entropy
(
Poster
)
>
link
|
Karthik Viswanathan · Sangeon Park
🔗
|
-
|
Exploring L2-Phase Transitions on Error Landscapes
(
Poster
)
>
link
|
Ibrahim Ersoy · Karoline Wiesner
🔗
|
-
|
Generalisation and Safety Critical Evaluations at Sharp Minima: A Geometric Reappraisal
(
Poster
)
>
link
|
Israel Mason-Williams · Gabryel Mason-Williams · Helen Yannakoudakis
🔗
|
-
|
Learning curves theory of hierarchically compositional data with power-law distributed features
(
Poster
)
>
link
|
Francesco Cagnetta · Hyunmo Kang · Matthieu Wyart
🔗
|
-
|
Tracing the representation geometry of language models from pretraining to post-training
(
Poster
)
>
link
|
Melody Li · Kumar Agrawal · Arna Ghosh · Komal Teru · Guillaume Lajoie · Blake Richards
🔗
|
-
|
On Generalization of Spectral Gradient Descent: A Case Study on Imbalanced Data
(
Poster
)
>
link
|
Bhavya Vasudeva · Puneesh Deora · Christos Thrampoulidis
🔗
|
-
|
Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
(
Poster
)
>
link
|
Jiping Li · Rishi Sonthalia
🔗
|
-
|
Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
(
Oral
)
>
link
|
Jiping Li · Rishi Sonthalia
🔗
|
-
|
Exploration Behavior of Untrained Policies
(
Poster
)
>
link
|
Jacob Adamczyk
🔗
|
-
|
Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)
(
Poster
)
>
link
|
Artem Riabinin · Egor Shulgin · Kaja Gruntkowska · Peter Richtarik
🔗
|
-
|
Spectral Dynamics of Contrastive Learning with Spurious Features
(
Poster
)
>
link
|
Naghmeh Ghanooni · Dennis Wagner · Waleed Mustafa · Anthony Lin · Sophie Fellenz · Marius Kloft
🔗
|
-
|
When Can You Get Away with Low Memory Adam?
(
Poster
)
>
link
|
Dayal Singh Kalra · John Kirchenbauer · Maissam Barkeshli · Tom Goldstein
🔗
|
-
|
Benignity of loss landscape with weight decay requires both large overparametrization and initialization
(
Poster
)
>
link
|
Etienne Boursier · Matthew Bowditch · Matthias Englert · Ranko Lazic
🔗
|
-
|
Exact Learning of Permutations for Nonzero Binary Inputs with Logarithmic Training Size and Quadratic Ensemble Complexity
(
Poster
)
>
link
|
George Giapitzakis · Artur Back de Luca · Kimon Fountoulakis
🔗
|
-
|
Jacobian Alignment Explains Grokking and Centroid Alignment Identifies It
(
Poster
)
>
link
|
Thomas Walker · Ahmed Imtiaz Humayun · Randall Balestriero · Richard Baraniuk
🔗
|
-
|
Fundamental Limits of Learning Single-Index Models under Structured Data
(
Poster
)
>
link
|
Jivan Waber · Alireza Mousavi-Hosseini · Murat Erdogdu
🔗
|
-
|
Fundamental Limits of Learning Single-Index Models under Structured Data
(
Oral
)
>
link
|
Jivan Waber · Alireza Mousavi-Hosseini · Murat Erdogdu
🔗
|
-
|
Understanding Normalization Layers for Sparse Training
(
Poster
)
>
link
|
Mohammed Adnan · Ekansh Sharma · Rahul G. Krishnan · Yani Ioannou
🔗
|
-
|
From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD
(
Poster
)
>
link
|
Konstantinos C Tsiolis · Alireza Mousavi-Hosseini · Murat Erdogdu
🔗
|