Training ultra-deep CNNs with critical initialization

Lechao Xiao

Yasaman Bahri

Sam Schoenholz

Jeffrey Pennington

NIPS Workshop (2017)

Google Scholar

Abstract

In recent years, state-of-the-art methods in computer vision have utilized increasingly deep convolutional neural network architectures (CNNs), with some of the most successful models employing 1000 layers or more. Optimizing networks of such depth is extremely challenging and has up until now been possible only when the architectures incorporate special residual connections and batch normalization. In this work, we demonstrate that it is possible to train vanilla CNNs of depth 1500 or more simply by careful choice of initialization. We derive this initialization scheme theoretically, by developing a mean field theory for the dynamics of signal propagation in random CNNs with circular boundary conditions. We show that the order-to-chaos phase transition of such CNNs is similar to that of fully-connected networks, and we provide empirical evidence that ultra-deep vanilla CNNs are trainable if the weights and biases are initialized near the order-to-chaos transition.

Research Areas

Machine Perception

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Training ultra-deep CNNs with critical initialization

Abstract

Research Areas

Learn more about how we conduct our research

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Training ultra-deep CNNs with critical initialization

Abstract

Research Areas

Learn more about how we conduct our research

AI/ML Foundations  & Capabilities