Many estimation tasks come in groups and hierarchies of related problems. In this
paper, we propose a hierarchical model and a scalable inference algorithm
for multitask learning. The model infers task correlation and subtask structure in a joint
sparse setting. The algorithm is implemented with a distributed subgradient oracle and
the successive application of proximal operators to groups and sub-groups of
variables. We apply this algorithm to conversion optimization in display
advertising. Experiments on over 1 TB of data, with up to 1 billion observations
and 1 million attributes, show that the algorithm achieves significantly better
prediction accuracy while scaling efficiently through distributed
parameter synchronization.
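
To illustrate what the successive application of prox-operators to groups and sub-groups can look like, the following is a minimal sketch, not the implementation used in this paper. It assumes an l2 (group-lasso style) penalty on each sub-group and each group, with the sub-groups nested inside the groups; the penalty form, the function names `group_soft_threshold` and `hierarchical_prox`, and the weights `lam_sub` and `lam_group` are all illustrative assumptions. For nested groups of this kind, applying the proxes from the smallest groups to the largest is known to give the exact proximal operator of the summed penalty.

```python
import math


def group_soft_threshold(w, idx, lam):
    """In place: shrink the sub-vector w[idx] toward zero by lam in l2 norm.

    This is the proximal operator of lam * ||w[idx]||_2 (block soft-thresholding).
    """
    norm = math.sqrt(sum(w[i] ** 2 for i in idx))
    # If the block's norm is at most lam, the whole block is set to zero.
    scale = 0.0 if norm <= lam else 1.0 - lam / norm
    for i in idx:
        w[i] *= scale


def hierarchical_prox(w, groups, subgroups, lam_group, lam_sub):
    """Apply sub-group proxes first, then group proxes (assumed nested structure)."""
    out = list(w)  # work on a copy so the input is left unchanged
    for idx in subgroups:
        group_soft_threshold(out, idx, lam_sub)
    for idx in groups:
        group_soft_threshold(out, idx, lam_group)
    return out


# Hypothetical toy example: one group of four variables split into two sub-groups.
w = [3.0, 4.0, 0.1, 0.1]
groups = [[0, 1, 2, 3]]
subgroups = [[0, 1], [2, 3]]
result = hierarchical_prox(w, groups, subgroups, lam_group=1.0, lam_sub=1.0)
print(result)  # approximately [1.8, 2.4, 0.0, 0.0]
```

In this toy run the weak sub-group `[2, 3]` is zeroed out entirely, while the strong sub-group `[0, 1]` is shrunk twice, once at each level of the hierarchy. In a distributed setting, one would expect a step like this to be applied to the parameters after each aggregated subgradient update, but that wiring is not shown here.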