Publication Data
The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training
Abstract: Whereas theoretical work suggests that deep architectures
might be more efficient at representing highly-varying functions, training deep
architectures was unsuccessful until the recent advent of algorithms based on
unsupervised pre-training. Even though these new algorithms have enabled training deep
models, many questions remain as to the nature of this difficult learning problem.
Answering these questions is important if learning in deep architectures is to be
further improved. We attempt to shed some light on these questions through extensive
simulations. The experiments confirm and clarify the advantage of unsupervised
pre-training. They demonstrate the robustness of the training procedure with respect to
random initialization, the positive effect of pre-training on optimization,
and its role as a regularizer. We empirically show the influence of pre-training with
respect to architecture depth, model capacity, and number of training examples.
