3rd Workshop on High-dimensional Learning Dynamics (HiLD)

Atish Agarwala · Aukosh Jagannath · Jason Lee · Bruno Loureiro · Inbar Seroussi

West Meeting Room 118-120

Fri 18 Jul, 9 a.m. PDT

Modern machine learning applications face the challenge of extracting insights from high-dimensional datasets. The 3rd High-dimensional Learning Dynamics (HiLD) Workshop focuses on predicting and analyzing the behavior of learning algorithms in regimes where both the number of samples and parameters are large. This workshop aims to advance research and foster collaboration in several key areas:

1. Developing tractable models and dynamical frameworks to explain phenomena observed in deep neural networks (DNNs) and foundation models;
2. Establishing mathematical frameworks for neural scaling laws as network width and depth approach infinity;
3. Identifying and characterizing relevant observable quantities in high-dimensional limits;
4. Understanding the provable effects of optimization algorithms, hyperparameters, and neural architectures on training and test dynamics.

The HiLD Workshop will unite experts from random matrix theory, optimization, high-dimensional statistics/probability, and statistical physics to share diverse perspectives on these challenges. By bringing together theorists and practitioners from machine learning with researchers from these adjacent fields, we aim to create new collaborations between communities that often do not interact. Through talks, poster sessions, and panel discussions, the workshop will explore the fundamental dynamics of learning algorithms in high-dimensional settings. This year's workshop theme is "Navigating Complexity: Feature Learning Dynamics at Scale."

Timezone: America/Los_Angeles

Schedule

Fri 9:00 a.m. - 9:45 a.m.	Yasaman Bahri (Google DeepMind), On the emergence of linear structure in word embeddings (Invited Talk)	Yasaman Bahri
Fri 9:45 a.m. - 10:00 a.m.	Rishi Sonthalia (Boston College), Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting (Contributed Talk)
Fri 10:00 a.m. - 10:15 a.m.	Jivan Waber, Fundamental Limits of Learning Single-Index Models under Structured Data (Contributed Talk)
Fri 10:15 a.m. - 11:15 a.m.	Poster Session / Break
Fri 11:15 a.m. - 12:00 p.m.	Reza Gheissari (Northwestern University), Local geometry and effective spectral theory of high-dimensional classification (Invited Talk)	Reza Gheissari
Fri 12:00 p.m. - 12:45 p.m.	Florent Krzakala (EPFL), Asymptotics of Neural Networks in High Dimensions: From Sparse Representations to Deep Hierarchies (Invited Talk)	Florent Krzakala
Fri 12:45 p.m. - 2:00 p.m.	Lunch / Break
Fri 2:00 p.m. - 2:45 p.m.	Nathan Srebro (TTIC & University of Chicago), Is A Good Input Distribution All You Need? (Invited Talk)	Nati Srebro
Fri 2:45 p.m. - 3:00 p.m.	Emanuele Troiani (EPFL), Bayes optimal learning of attention-indexed models (Contributed Talk)
Fri 3:00 p.m. - 3:15 p.m.	Alireza Mousavi-Hosseini (University of Toronto), When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective (Contributed Talk)
Fri 3:15 p.m. - 3:30 p.m.	Fabiola Ricci (SISSA), Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions (Contributed Talk)
Fri 3:30 p.m. - 4:00 p.m.	Break
Fri 4:00 p.m. - 4:45 p.m.	Zhou Fan (Yale), Dynamical mean-field analysis of adaptive Langevin diffusions	Zhou Fan
Fri 4:45 p.m. - 5:30 p.m.	Poster Session
-	Understanding Generalization in Diffusion Models via Probability Flow Distance (Poster)	Huijie Zhang · Zijian Huang · Siyi Chen · Jinfan Zhou · Zekai Zhang · Peng Wang · Qing Qu
-	Origins of Creativity in Attention Based Diffusion Models (Poster)	Emma Finn · T. Anderson Keller · Emmanouil Theodosis · Demba Ba
-	Towards an Optimal Control Perspective of ResNet Training (Poster)	Jens Püttschneider · Simon Heilig · Asja Fischer · Timm Faulwasser
-	Reduce and Conquer: Independent Component Analysis at linear sample complexity (Poster)	Fabiola Ricci · Lorenzo Bardone · Sebastian Goldt
-	Reduce and Conquer: Independent Component Analysis at linear sample complexity (Oral)	Fabiola Ricci · Lorenzo Bardone · Sebastian Goldt
-	Catalyst: Structured Pruning with Robust Bifurcation Dynamics (Poster)	Jaeheun Jung · Donghun Lee
-	Grokking and Generalization Collapse: Insights from HTSR theory (Poster)	Hari Prakash · Charles H Martin
-	Bayesian Influence Functions for Scalable Data Attribution (Poster)	Philipp Kreer · Wilson Wu · Maxwell Adam · Zach Furman · Jesse Hoogland
-	On the Interaction of Noise, Compression, and Adaptivity under $(L_0,L_1)$-Smoothness: An SDE Approach (Poster)	Enea Monzio Compagnoni · Rustem Islamov · Antonio Orvieto · Eduard Gorbunov
-	Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data (Poster)	Chen Fan · Mark Schmidt · Christos Thrampoulidis
-	Low Rank Gradients and Where To Find Them (Poster)	Rishi Sonthalia · Michael Murray · Guido Montufar
-	Reactivation: Empirical NTK Dynamics Under Task Shifts (Poster)	Yuzhi Liu · Zixuan Chen · Zirui Zhang · Yufei Liu · Giulia Lanzillotta
-	The Price of Robustness: Stable Classifiers Need Overparameterization (Poster)	Jonas von Berg · Adalbert Fono · Massimiliano Datres · Sohir Maskey · Gitta Kutyniok
-	Adapting to High Dimensional Concepts with Metalearning (Poster)	Max Gupta
-	Theoretical Guarantees and Training Dynamics of Contrastive Learning: How Misaligned Data Influence Feature Purity (Poster)	Jiawei Sun · Shuai Zhang · Hongkang Li · Meng Wang
-	Learning how to step in gradient-based optimization: beyond convexity and smoothness (Poster)	Dravyansh Sharma
-	Towards Understanding Orthogonalization in Muon (Poster)	Valentyn Boreiko · Zhiqi Bu · Sheng Zha
-	When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective (Poster)	Alireza Mousavi-Hosseini · Clayton Sanford · Denny Wu · Murat Erdogdu
-	When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective (Oral)	Alireza Mousavi-Hosseini · Clayton Sanford · Denny Wu · Murat Erdogdu
-	Langevin Learning Dynamics in Lazy and Non-Lazy Wide Neural Networks (Poster)	Yehonatan Avidan · Haim Sompolinsky
-	The Interplay Between Implicit Bias and Adversarial Robustness in Linear Convolutional Neural Networks (Poster)	Aurélien Boland · Hannah Pinson
-	Feature learning is decoupled from generalization in high capacity neural networks (Poster)	Niclas Göring · Charles London · Abdurrahman Erturk · Chris Mingard · Yoonsoo Nam · Ard Louis
-	Selective Prediction via Training Dynamics (Poster)	Stephan Rabanser · Anvith Thudi · Kimia Hamidieh · Adam Dziedzic · Israfil Bahceci · Akram Bin Sediq · Hamza Sokun · Nicolas Papernot
-	How Transformers Get Rich: Training Dynamics Analysis (Poster)	Mingze Wang · Ruoxi Yu · Weinan E · Lei Wu
-	Quantization and the Bottom of the Loss Landscape (Poster)	Luca Di Carlo · Daniel Bernstein · David Schwab
-	The Shape of Generalization through the Lens of Norm-based Capacity Control (Poster)	Yichen Wang · Yudong Chen · Lorenzo Rosasco · Fanghui Liu
-	What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers (Poster)	Pulkit Gopalani · Wei Hu
-	From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning (Poster)	Junsoo Oh · Jerry Song · Chulhee Yun
-	In Search of Adam’s Secret Sauce (Poster)	Antonio Orvieto · Robert Gower
-	A solvable generative model with a linear, one-step denoiser (Poster)	Indranil Halder
-	On the Performance of Differentially Private Optimization with Heavy-Tail Class Imbalance (Poster)	Qiaoyue Tang · Alain Zhiyanov · Mathias Lécuyer
-	Two-point deterministic equivalence for SGD in random feature models (Poster)	Alexander Atanasov · Blake Bordelon · Jacob A Zavatone-Veth · Courtney Paquette · Cengiz Pehlevan
-	On the Existence of Hidden Subnetworks Within a Randomly Weighted Multi-Head Attention Mechanism (Poster)	Hikari Otsuka · Yasuyuki Okoshi · Daichi Fujiki · Susumu Takeuchi · Masato Motomura · Daiki Chijiwa
-	Studying Data Complexity and Learned Structure in Neural Networks with Bayesian Probes (Poster)	Maxwell Adam · Zach Furman · Wilson Wu · Philipp Kreer · Jesse Hoogland
-	Latent Concept Disentanglement in Transformer-based Language Models (Poster)	Guan Zhe Hong · Bhavya Vasudeva · Vatsal Sharan · Cyrus Rashtchian · Prabhakar Raghavan · Rina Panigrahy
-	Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training (Poster)	Minhak Song · Beomhan Baek · Kwangjun Ahn · Chulhee Yun
-	New Evidence of the Two-Phase Learning Dynamics of Neural Networks (Poster)	Zhanpeng Zhou · Yongyi Yang · Mahito Sugiyama · Junchi Yan
-	Understanding Lookahead Dynamics Through Laplace Transforms (Poster)	Aniket Sanyal · Tatjana Chavdarova
-	Emergent Linear Separability of Unseen Data Points in High-dimensional Last-Layer Feature Space (Poster)	Taehun Cha · Donghun Lee
-	Attention with Trained Embeddings Provably Selects Important Tokens (Poster)	Diyuan Wu · Aleksandr Shevchenko · Samet Oymak · Marco Mondelli
-	How Compositional Generalization and Creativity Improve as Diffusion Models are Trained (Poster)	Alessandro Favero · Antonio Sclocchi · Francesco Cagnetta · Pascal Frossard · Matthieu Wyart
-	A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention (Poster)	Nandan Kumar Jha · Brandon Reagen
-	Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers (Poster)	Peter Súkeník · Christoph Lampert · Marco Mondelli
-	Topology-Aware Robust Representation Balancing for Estimating Causal Effects (Poster)	Amirhossein Farzam · Ahmed Aloui · Vahid Tarokh · Guillermo Sapiro
-	Training Dynamics of In-Context Learning in Linear Attention (Poster)	Yedi Zhang · Aaditya Singh · Peter Latham · Andrew Saxe
-	The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets (Poster)	Yujun Kim · Chaewon Moon · Chulhee Yun
-	The Silent Helper: How Implicit Regularization Enhances Group Robustness (Poster)	Nahal Mirzaie · Mahdi Ghaznavi · Hosna Oyarhoseini · Alireza Alipanah · Erfan Sobhaei · Ali Abbasi · Amirmahdi Farzane · Hossein Jafarinia · Parsa Sedeh · Arefe Boushehrian · Mahdieh Soleymani Baghshah · Mohammad H Rohban
-	Better Rates for Private Linear Regression in the Proportional Regime via Aggressive Clipping (Poster)	Simone Bombari · Inbar Seroussi · Marco Mondelli
-	Bayes optimal learning of attention-indexed models (Poster)	Fabrizio Boncoraglio · Emanuele Troiani · Vittorio Erba · Lenka Zdeborová
-	Bayes optimal learning of attention-indexed models (Oral)	Fabrizio Boncoraglio · Emanuele Troiani · Vittorio Erba · Lenka Zdeborová
-	Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold (Poster)	Xinghan Li · Haodong Wen · Kaifeng Lyu
-	Different simultaneous mechanisms for in-context recall have distinct learning dynamics (Poster)	Sultan Daniels · Dylan Davis · Dhruv Gautam · Wentinn Liao · Gireeja Ranade · Anant Sahai
-	A simple connection from loss flatness to compressed neural representations (Poster)	Shirui Chen · Stefano Recanatesi · Eric Shea-Brown
-	Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling (Poster)	Teodora Srećković · Jonas Geiping · Antonio Orvieto
-	Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers (Poster)	Annalisa Belloni · Lorenzo Noci · Antonio Orvieto
-	The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks (Poster)	Vittorio Erba · Emanuele Troiani · Lenka Zdeborová · Florent Krzakala
-	Data-Free Transformer Quantization Using Parameter-Space Symmetry (Poster)	Lucas Laird · Bo Zhao · Rose Yu · Robin Walters
-	Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rank Solutions (Poster)	Baekrok Shin · Chulhee Yun
-	Revisiting the Goldilocks Zone in Inhomogeneous Networks (Poster)	Zacharie Cuchet · Sarath Chandar · Ekaterina Lobacheva
-	Emergence of Hebbian Dynamics in Regularized Non-Local Learners (Poster)	David Koplow · Tomaso A Poggio · Liu Ziyin
-	On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD (Poster)	Tongcheng Zhang · Zhanpeng Zhou · Mingze Wang · Andi Han · Wei Huang · Taiji Suzuki · Junchi Yan
-	Rethinking Memorization–Generalization Trade-Off in Generative Models (Poster)	Jiseok Chae · Kyuwon Kim · Donghwan Kim
-	Data Free Metrics Are Not Reparameterisation Invariant Under the Critical and Robust Layer Phenomena (Poster)	Gabryel Mason-Williams · Israel Mason-Williams · Fredrik Dahlqvist
-	Information-Geometric Neural Granger Causality (Poster)	Pauline Bourigault · Danilo Mandic
-	Symmetries in Weight Space Learning: To Retain or Remove? (Poster)	Fynn Kiwitt · Behrooz Tahmasebi · Stefanie Jegelka
-	Emergent Specialization: Rare Token Neurons in Language Models (Poster)	Jing Liu · Haozheng Wang · Yueheng Li
-	Quantitative Bounds for Length Generalization in Transformers (Poster)	Zachary Izzo · Eshaan Nichani · Jason Lee
-	Lyapunov Learning at the Onset of Chaos (Poster)	Alessandro Londei · Denise Lanzieri · Matteo Benati · Vittorio Loreto
-	Understanding Mamba in In-Context Learning with Outliers: A Theoretical Generalization Analysis (Poster)	Hongkang Li · Songtao Lu · Xiaodong Cui · Pin-Yu Chen · Meng Wang
-	Input differentiation via negative computation (Poster)	Linghao Kong · Angelina Ning · Nir Shavit
-	Probing Geometry of Next Token Prediction Using Cumulant Expansion of the Softmax Entropy (Poster)	Karthik Viswanathan · Sangeon Park
-	Exploring L2-Phase Transitions on Error Landscapes (Poster)	Ibrahim Ersoy · Karoline Wiesner
-	Generalisation and Safety Critical Evaluations at Sharp Minima: A Geometric Reappraisal (Poster)	Israel Mason-Williams · Gabryel Mason-Williams · Helen Yannakoudakis
-	Learning curves theory of hierarchically compositional data with power-law distributed features (Poster)	Francesco Cagnetta · Hyunmo Kang · Matthieu Wyart
-	Tracing the representation geometry of language models from pretraining to post-training (Poster)	Melody Li · Kumar Agrawal · Arna Ghosh · Komal Teru · Guillaume Lajoie · Blake Richards
-	On Generalization of Spectral Gradient Descent: A Case Study on Imbalanced Data (Poster)	Bhavya Vasudeva · Puneesh Deora · Christos Thrampoulidis
-	Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting (Poster)	Jiping Li · Rishi Sonthalia
-	Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting (Oral)	Jiping Li · Rishi Sonthalia
-	Exploration Behavior of Untrained Policies (Poster)	Jacob Adamczyk
-	Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs) (Poster)	Artem Riabinin · Egor Shulgin · Kaja Gruntkowska · Peter Richtarik
-	Spectral Dynamics of Contrastive Learning with Spurious Features (Poster)	Naghmeh Ghanooni · Dennis Wagner · Waleed Mustafa · Anthony Lin · Sophie Fellenz · Marius Kloft
-	When Can You Get Away with Low Memory Adam? (Poster)	Dayal Singh Kalra · John Kirchenbauer · Maissam Barkeshli · Tom Goldstein
-	Benignity of loss landscape with weight decay requires both large overparametrization and initialization (Poster)	Etienne Boursier · Matthew Bowditch · Matthias Englert · Ranko Lazic
-	Exact Learning of Permutations for Nonzero Binary Inputs with Logarithmic Training Size and Quadratic Ensemble Complexity (Poster)	George Giapitzakis · Artur Back de Luca · Kimon Fountoulakis
-	Jacobian Alignment Explains Grokking and Centroid Alignment Identifies It (Poster)	Thomas Walker · Ahmed Imtiaz Humayun · Randall Balestriero · Richard Baraniuk
-	Fundamental Limits of Learning Single-Index Models under Structured Data (Poster)	Jivan Waber · Alireza Mousavi-Hosseini · Murat Erdogdu
-	Fundamental Limits of Learning Single-Index Models under Structured Data (Oral)	Jivan Waber · Alireza Mousavi-Hosseini · Murat Erdogdu
-	Understanding Normalization Layers for Sparse Training (Poster)	Mohammed Adnan · Ekansh Sharma · Rahul G. Krishnan · Yani Ioannou
-	From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD (Poster)	Konstantinos C Tsiolis · Alireza Mousavi-Hosseini · Murat Erdogdu