This paper presents a novel learning scenario that combines dimensionality
reduction, supervised learning, and kernel selection. We carefully define the
hypothesis class for this setting and analyze its Rademacher complexity, which
yields generalization guarantees. The proposed
algorithm uses kernel PCA (KPCA) to reduce the dimensionality of the feature
space, i.e., it projects the data onto the top eigenvectors of the covariance
operator in a reproducing kernel Hilbert space (RKHS). At the same time, it learns
a linear combination of base kernel functions, which defines the RKHS, together
with the parameters of a supervised learning algorithm, so as to minimize a
regularized empirical loss.
The bound on the Rademacher complexity of our hypothesis class is shown to be
logarithmic in the number of base kernels, which encourages practitioners to
combine as many base kernels as possible.
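
For illustration only, the pipeline described above can be sketched as follows.
This is a minimal NumPy sketch under stated assumptions, not the algorithm
analyzed in the paper: the kernel weights `mu` are fixed by hand rather than
learned jointly with the predictor, the base kernels are Gaussian kernels with
hypothetical bandwidths, and ridge regression stands in for the supervised
learner. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def combined_kernel(X, Y, gammas, mu):
    """Linear combination sum_k mu_k * K_k of base kernels (mu assumed nonnegative)."""
    return sum(m * rbf_kernel(X, Y, g) for m, g in zip(mu, gammas))

def kpca_features(K, r):
    """Coordinates of the training points on the top-r KPCA directions."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # kernel matrix of centered features
    evals, evecs = np.linalg.eigh(Kc)        # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:r]        # indices of the r largest eigenvalues
    lam = np.maximum(evals[idx], 1e-12)      # guard against tiny negative values
    return evecs[:, idx] * np.sqrt(lam)      # row i = projection of point i

def ridge_fit(Z, y, reg):
    """Ridge regression weights on the reduced features (stand-in learner)."""
    r = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + reg * np.eye(r), Z.T @ y)

# Toy data; every value below is an assumption made for this sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0])

gammas = [0.1, 1.0, 10.0]                    # hypothetical base-kernel bandwidths
mu = np.array([0.2, 0.5, 0.3])               # fixed weights (learned in the paper)

K = combined_kernel(X, X, gammas, mu)        # combined kernel matrix
Z = kpca_features(K, r=5)                    # dimensionality reduction via KPCA
w = ridge_fit(Z, y - y.mean(), reg=1e-2)     # regularized empirical risk minimizer
```

In the setting of the paper, `mu` would be optimized together with `w` instead of
being fixed; the sketch only shows how a combined kernel feeds into KPCA and then
into a regularized learner.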