Video Object Discovery and Co-segmentation with Extremely Weak Supervision
Venue
Proceedings of European Conference on Computer Vision (2014)
Publication Year
2014
Authors
Le Wang, Gang Hua, Rahul Sukthankar, Jianru Xue, Nanning Zheng
Abstract
Video object co-segmentation refers to the problem of simultaneously segmenting a
common category of objects from multiple videos. Most existing video
co-segmentation methods assume that all frames from all videos contain the target
objects. Unfortunately, this assumption is rarely true in practice, particularly
for large video sets, and existing methods perform poorly when the assumption is
violated. Hence, any practical video object co-segmentation algorithm needs to
identify the relevant frames containing the target object from all videos, and then
co-segment the object only from these relevant frames. We present a spatiotemporal
energy minimization formulation for simultaneous video object discovery and
co-segmentation across multiple videos. Our formulation incorporates a
spatiotemporal auto-context model, which is combined with appearance modeling for
superpixel labeling. The superpixel-level labels are propagated to the frame level
through a multiple instance boosting algorithm with spatial reasoning
(Spatial-MILBoosting), based on which frames containing the video object are
identified. Our method needs to be bootstrapped with frame-level labels for only a
few video frames (typically 1 to 3) indicating whether they contain the target
object. Experiments on three datasets validate the efficacy of our
proposed method, which compares favorably with the state-of-the-art.
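The abstract does not spell out how superpixel-level labels are propagated to the frame level. A minimal sketch of the noisy-OR aggregation commonly used in MIL-style boosting illustrates the idea; the function names and the threshold are hypothetical, not part of the paper:

```python
import numpy as np

def frame_probability(superpixel_probs):
    """Aggregate superpixel-level object probabilities into a
    frame-level probability via the noisy-OR rule: a frame is
    likely positive if at least one superpixel is likely to
    belong to the object. (Illustrative only; the paper's
    Spatial-MILBoosting additionally incorporates spatial
    reasoning among superpixels.)"""
    p = np.asarray(superpixel_probs, dtype=float)
    return 1.0 - np.prod(1.0 - p)

def discover_relevant_frames(frames, threshold=0.5):
    """Return indices of frames whose aggregated probability
    exceeds a (hypothetical) threshold, i.e., frames treated
    as containing the target object."""
    return [i for i, sp in enumerate(frames)
            if frame_probability(sp) >= threshold]

# Example: three frames with per-superpixel probabilities.
frames = [[0.05, 0.10], [0.90, 0.20], [0.30, 0.60]]
print(discover_relevant_frames(frames))  # → [1, 2]
```

Under noisy-OR, a single confident superpixel (e.g., 0.90 in the second frame) is enough to mark the whole frame as relevant, which matches the intuition that the object need only appear somewhere in the frame.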
