Whereas theoretical work suggests that deep architectures might be more efficient
at representing highly varying functions, training deep architectures was
unsuccessful until the recent advent of algorithms based on unsupervised
pre-training. Even though these new algorithms have enabled training deep models,
many questions remain as to the nature of this difficult learning problem.
Answering these questions is important if learning in deep architectures is to be
further improved. We attempt to shed some light on these questions through
extensive simulations. The experiments confirm and clarify the advantage of
unsupervised pre-training. They demonstrate the robustness of the training
procedure with respect to random initialization, the positive effect of
pre-training on optimization, and its role as a regularizer. We empirically
show the influence of pre-training with respect to architecture depth, model
capacity, and number of training examples.