Joint Image and Word Sense Discrimination For Image Retrieval
Abstract
We study the task of learning to rank images given a text
query, a problem complicated by the fact that a query can have multiple
senses. The senses of interest are typically the visually distinct concepts
that a user may wish to retrieve. In this paper, we propose to learn a
ranking function that optimizes the ranking cost of interest while
simultaneously discovering the senses of the query that are optimal for
this supervised task. No supervision is provided about the senses
themselves. Experiments on web images and the ImageNet dataset show that
our approach yields a clear gain in ranking performance.
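The abstract does not specify the form of the ranking function. The following is only a minimal sketch of the joint ranking-and-sense-discovery idea. It assumes that each query owns a small set of sense vectors, that an image's score is the maximum dot product over those vectors, and that training takes stochastic subgradient steps on a pairwise hinge loss over (relevant, irrelevant) image pairs. The helper names (`score`, `sgd_step`, `train`), the toy features, and the choice to initialize each sense from one relevant image are hypothetical illustrations, not the paper's implementation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def score(senses: List[Vector], x: Sequence[float]) -> Tuple[float, int]:
    """Assumed scorer: max over the query's sense vectors.

    Returns the score and the index of the sense that produced it
    (ties go to the lowest index, as Python's max() keeps the first maximum).
    """
    best = max(range(len(senses)), key=lambda k: dot(senses[k], x))
    return dot(senses[best], x), best


def sgd_step(senses: List[Vector], x_pos, x_neg,
             lr: float = 1.0, margin: float = 1.0) -> float:
    """One subgradient step on max(0, margin - f(q, x_pos) + f(q, x_neg)).

    Only the senses that won the max for each image are updated, so each
    sense gradually specializes to the relevant images it scores best.
    """
    s_pos, k_pos = score(senses, x_pos)
    s_neg, k_neg = score(senses, x_neg)
    loss = max(0.0, margin - s_pos + s_neg)
    if loss > 0.0:
        # Raise the sense that scored the relevant image...
        senses[k_pos] = [w + lr * a for w, a in zip(senses[k_pos], x_pos)]
        # ...and lower the sense that scored the irrelevant image.
        senses[k_neg] = [w - lr * b for w, b in zip(senses[k_neg], x_neg)]
    return loss


def train(senses: List[Vector], pairs, epochs: int = 10,
          lr: float = 1.0, margin: float = 1.0) -> List[float]:
    """Cycle over (relevant, irrelevant) pairs; return the summed loss per epoch."""
    history = []
    for _ in range(epochs):
        history.append(sum(sgd_step(senses, p, n, lr, margin) for p, n in pairs))
    return history


# Hypothetical toy query with two visual senses (e.g. an animal and a car).
x_animal = [1.0, 0.0]
x_car = [0.0, 1.0]
x_irrelevant = [0.5, 0.5]  # midpoint of the two relevant images
senses = [list(x_animal), list(x_car)]  # assumed init: one sense per relevant image
pairs = [(x_animal, x_irrelevant), (x_car, x_irrelevant)]
history = train(senses, pairs, epochs=5)
# history -> [1.0, 1.0, 0.0, 0.0, 0.0]; both relevant images now outrank the irrelevant one.
```

In this toy data the irrelevant image is the midpoint of the two relevant ones, so any single linear scorer `w` assigns it exactly the average of their scores and cannot rank it below both. Taking the maximum over two sense vectors removes that limitation, and each sense ends up aligned with one cluster of relevant images.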