Poster
Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $\mu$ Parametrization
Zixiang Chen · Greg Yang · Qingyue Zhao · Quanquan Gu
East Exhibition Hall A-B #E-2310
Artificial intelligence systems called neural networks have achieved remarkable success in tasks like image recognition and language processing, yet scientists still don't fully understand why they work so well. A key puzzle is whether neural networks can simultaneously do two important things: learn useful patterns from data (called "feature learning") and find the best possible solution to a problem (called "global optimization").

Previous analyses of deep L-layer networks either allow networks to learn patterns but offer little insight into global optimization, or guarantee globally optimal solutions but prevent meaningful feature learning. Our research resolves this tension by studying a specific way of setting up neural networks called the Maximal Update Parametrization (μP). We prove mathematically that when networks are made very wide and trained under this parametrization, they can indeed do both at once: they learn rich, meaningful features from data while also converging to globally optimal solutions.

We validate our theory through experiments showing that networks maintain diverse, independent features throughout training. This work provides new theoretical foundations for understanding why certain AI training methods work better than others, potentially informing the design of more effective AI systems.
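For readers curious what the Maximal Update Parametrization looks like in practice, below is a minimal, illustrative sketch of μP-style initialization and learning-rate scaling for a width-n multilayer perceptron trained with SGD. The specific scalings shown (input-layer weights of order one, hidden-layer weights with standard deviation 1/√n, output-layer weights with standard deviation 1/n, and a width-independent SGD learning rate) follow one common presentation of μP from the literature; the width, layer counts, and helper names are assumptions made for illustration, not details taken from the poster.

```python
import torch
import torch.nn as nn

def make_mup_mlp(d_in: int, d_out: int, width: int, n_hidden: int) -> nn.Sequential:
    """Build an MLP whose weights follow muP-style initialization scaling.

    Illustrative scalings (one common presentation of muP for SGD):
      - input layer:   entries of order 1            (std ~ 1)
      - hidden layers: entries of std ~ 1/sqrt(width)
      - output layer:  entries of std ~ 1/width
    """
    layers = []
    # Input layer: width x d_in, order-one entries.
    w_in = nn.Linear(d_in, width, bias=False)
    nn.init.normal_(w_in.weight, std=1.0)
    layers += [w_in, nn.ReLU()]
    # Hidden layers: width x width, std 1/sqrt(width).
    for _ in range(n_hidden):
        w_h = nn.Linear(width, width, bias=False)
        nn.init.normal_(w_h.weight, std=width ** -0.5)
        layers += [w_h, nn.ReLU()]
    # Output layer: d_out x width, std 1/width.
    w_out = nn.Linear(width, d_out, bias=False)
    nn.init.normal_(w_out.weight, std=1.0 / width)
    layers.append(w_out)
    return nn.Sequential(*layers)

# Width-independent SGD learning rate: under muP, hidden features keep
# evolving (rich feature learning) as the width grows, rather than
# freezing as in the kernel / NTK regime.
model = make_mup_mlp(d_in=32, d_out=1, width=1024, n_hidden=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One toy gradient step on random data, just to show the training loop shape.
x, y = torch.randn(64, 32), torch.randn(64, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
```

The design choice to highlight is that only the per-layer scalings change with width; the learning rate does not need to shrink as the network gets wider, which is what lets very wide networks keep learning features instead of behaving like a fixed kernel.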