We approach structured output prediction by learning a deep value network (DVN)
that evaluates different output structures for a given input. For example, when
applied to image segmentation, the value network takes an image and a segmentation
mask as inputs and predicts a scalar score evaluating the mask quality and its
correspondence with the image. Once the value network is optimized, at inference,
it finds output structures that maximize the score of the value net via gradient
descent on continuous relaxations of structured outputs. Thus DVN takes advantage
of the joint modeling of the inputs and outputs. Our framework applies to a wide
range of structured output prediction problems. We conduct experiments on
multi-label classification based on text data and on image segmentation problems.
DVN outperforms several strong baselines and the state-of-the-art results on these
benchmarks. In addition, on image segmentation, the proposed deep value network
learns complex shape priors and effectively combines image information with the
prior to obtain competitive segmentation results.