ICML Poster Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training

Max Milkert · David Hyde · Forrest Laine

East Exhibition Hall A-B #E-2000

Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract: In a neural network with ReLU activations, the number of piecewise linear regions in the output can grow exponentially with depth. However, this is highly unlikely to happen when the initial parameters are sampled randomly, which often leads to the use of networks that are unnecessarily large. To address this problem, we introduce a novel parameterization of the network that restricts its weights so that a depth-$d$ network produces exactly $2^d$ linear regions at initialization and maintains those regions throughout training. This approach allows us to learn approximations of convex, one-dimensional functions that are several orders of magnitude more accurate than their randomly initialized counterparts. We further demonstrate a preliminary extension of our construction to multidimensional and non-convex functions, allowing the technique to replace traditional dense layers in various architectures.
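The abstract's opening claim, that a depth-$d$ ReLU network can have $2^d$ linear regions, can be illustrated with the well-known sawtooth construction: one width-2 ReLU layer computes the tent map on $[0, 1]$, and composing it $d$ times yields $2^d$ linear pieces. This is *not* the paper's parameterization; it is a minimal sketch assuming a one-dimensional input restricted to $[0, 1]$ and hand-set weights. The helper names `tent_layer`, `sawtooth`, and `count_linear_regions` are hypothetical, and regions are counted by exact slope comparison on a dyadic grid.

```python
from fractions import Fraction


def relu(v):
    return v if v > 0 else 0


def tent_layer(x):
    # One hidden layer of width 2: t(x) = 2*relu(x) - 4*relu(x - 1/2).
    # On [0, 1] this is the tent map: 2x on [0, 1/2], 2 - 2x on [1/2, 1].
    return 2 * relu(x) - 4 * relu(x - Fraction(1, 2))


def sawtooth(x, depth):
    # Compose the tent layer `depth` times; the output stays in [0, 1].
    for _ in range(depth):
        x = tent_layer(x)
    return x


def count_linear_regions(f, num_samples):
    # Sample f on a uniform grid over [0, 1] using exact rationals, then count
    # maximal runs of grid intervals that share the same slope. This is exact
    # only when every breakpoint of f lies on the grid (true for the sawtooth
    # when num_samples is a multiple of 2**depth).
    xs = [Fraction(i, num_samples) for i in range(num_samples + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) * num_samples for i in range(num_samples)]
    regions = 1
    for prev, curr in zip(slopes, slopes[1:]):
        if prev != curr:
            regions += 1
    return regions


for d in range(1, 6):
    # prints "1 2", "2 4", "3 8", "4 16", "5 32" on separate lines
    print(d, count_linear_regions(lambda x: sawtooth(x, d), 4 * 2**d))
```

Each extra layer doubles the region count while adding only two neurons, which is the kind of depth efficiency the abstract says random initialization rarely achieves.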

Lay Summary:

An artificial neural network is often compared to a brain: where a brain has neurons and synapses, an artificial neural network has parameters, numbers that govern its behavior. It is common practice to set these parameters randomly and then update them to maximize the network's performance on a task, as if the network were learning. Setting the parameters completely at random causes those located deeper in the network to be used inefficiently, and this is hard to correct through learning alone. The method we develop in this paper constrains parameter values both at initialization and throughout training, guiding the network to a solution that uses its deep parameters effectively. Extending these ideas will hopefully enable dramatic reductions in the size, energy, and computational cost of neural networks.