This contribution develops a theoretical framework that takes into account the
effect of approximate optimization on learning algorithms. The analysis shows
distinct tradeoffs for the case of small-scale and large-scale learning problems.
Small-scale learning problems are subject to the usual approximation– estimation
tradeoff. Large-scale learning problems are subject to a qualitatively different
tradeoff involving the computational complexity of the underlying optimization
algorithms in non-trivial ways. For instance, a mediocre optimization algorithms,
stochastic gradient descent, is shown to perform very well on large-scale learning