# Nonlinear random matrix theory for deep learning

### Venue

NIPS (2017) (to appear)

### Publication Year

2017

### Authors

Jeffrey Pennington, Pratik Worah

## Abstract

Neural network configurations with random weights play an important role in the
analysis of deep learning. They define the initial loss landscape and are closely
related to kernel and random feature methods. Despite the fact that these networks
are built out of random matrices, the vast and powerful machinery of random matrix
theory has so far found limited success in studying them. A main obstacle in this
direction is that neural networks are nonlinear, which prevents the straightforward
utilization of many of the existing mathematical results. In this work, we open the
door for direct applications of random matrix theory to deep learning by
demonstrating that the pointwise nonlinearities typically applied in neural
networks can be incorporated into a standard method of proof in random matrix
theory known as the moments method. The test case for our study is the Gram matrix
$Y^TY$, $Y=f(WX)$, where $W$ is a random weight matrix, $X$ is a random data
matrix, and $f$ is a pointwise nonlinear activation function. We derive an explicit
representation for the trace of the resolvent of this matrix, which defines its
limiting spectral distribution. We apply these results to the computation of the
asymptotic performance of single-layer random feature networks on a memorization
task and to the analysis of the eigenvalues of the data covariance matrix as it
propagates through a neural network. As a byproduct of our analysis, we identify an
intriguing new class of activation functions with favorable properties.
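
As a rough numerical illustration of the object studied, the sketch below samples the Gram matrix $Y^TY$ with $Y=f(WX)$ and evaluates the normalized trace of its resolvent from the empirical eigenvalues. It is not the paper's derivation. The layer sizes, the Gaussian entries, the $1/\sqrt{n_0}$ weight scaling, the $1/n_1$ normalization of the Gram matrix, the choice $f=\tanh$, and the function names `gram_spectrum` and `resolvent_trace` are all assumptions made for this example. Sign conventions for the resolvent also vary between references.

```python
import numpy as np


def gram_spectrum(n0=200, n1=300, m=400, f=np.tanh, seed=0):
    """Eigenvalues of M = Y^T Y / n1 with Y = f(W X).

    Assumed setup (not taken from the abstract): W is n1 x n0 with
    i.i.d. N(0, 1/n0) entries, X is n0 x m with i.i.d. N(0, 1) entries.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / np.sqrt(n0), size=(n1, n0))  # random weights
    X = rng.normal(0.0, 1.0, size=(n0, m))                 # random data
    Y = f(W @ X)                                           # pointwise nonlinearity
    M = (Y.T @ Y) / n1                                     # m x m Gram matrix
    return np.linalg.eigvalsh(M)                           # real, ascending order


def resolvent_trace(eigs, z):
    """Normalized trace (1/m) tr (M - z I)^{-1}, computed from eigenvalues."""
    return np.mean(1.0 / (eigs - z))


if __name__ == "__main__":
    eigs = gram_spectrum()
    # Evaluate slightly off the real axis, where the resolvent is well defined.
    z = 1.0 + 0.05j
    print("smallest / largest eigenvalue:", eigs[0], eigs[-1])
    print("resolvent trace at z =", z, ":", resolvent_trace(eigs, z))
```

A histogram of `eigs` gives an empirical estimate of the spectral density at finite size. With this sign convention, the imaginary part of `resolvent_trace(eigs, x + 1j * eta)` divided by $\pi$ gives a smoothed version of the same density for small `eta`.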