Publication Data
Large Scale Visual Semantic Extraction
Abstract: Image annotation is the task of providing textual semantics
to new images by ranking a large set of possible annotations according to how well they
correspond to a given image. In the large-scale setting, there could be millions of
images to process and hundreds of thousands of potential distinct annotations. To
accomplish this task, we propose to build a so-called "embedding space", into which
both images and annotations can be automatically projected. In such a space, one can
then find the nearest annotations to a given image, or annotations similar to a given
annotation. One can even build a visio-semantic tree from these annotations that
reflects how similar concepts (annotations) are to each other with respect to
their visual characteristics. Such a tree will differ from semantic-only trees,
such as WordNet, which do not take into account the visual appearance of concepts.
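
The idea of a joint embedding space can be illustrated with a minimal sketch. The abstract does not say how the projection or the similarity measure is defined, so everything here is an assumption made for illustration: images are projected with a linear map, similarity is measured with cosine similarity, and the names IMAGE_MAP, ANNOTATIONS, rank_annotations and similar_annotations, along with the toy numbers, are hypothetical. In practice the map and the annotation embeddings would be learned from data.

```python
import math

# Hypothetical toy setup: a 2-D embedding space and 3-D image features.
# IMAGE_MAP is an assumed linear projection from image features to the space.
IMAGE_MAP = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

# Assumed annotation embeddings (one vector per annotation).
ANNOTATIONS = {
    "dolphin": [1.0, 0.1],
    "shark": [0.9, 0.3],
    "eiffel tower": [0.0, 1.0],
}


def project(matrix, vector):
    """Multiply a (d x n) matrix by an n-dimensional vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_annotations(image_features, image_map, annotations):
    """Project an image into the space and rank all annotations by similarity."""
    z = project(image_map, image_features)
    scored = [(cosine(z, emb), label) for label, emb in annotations.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored]


def similar_annotations(label, annotations):
    """Rank the other annotations by similarity to a given annotation."""
    query = annotations[label]
    scored = [
        (cosine(query, emb), other)
        for other, emb in annotations.items()
        if other != label
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored]


if __name__ == "__main__":
    print(rank_annotations([0.8, 0.1, 0.5], IMAGE_MAP, ANNOTATIONS))
    print(similar_annotations("dolphin", ANNOTATIONS))
```

Because images and annotations live in the same space, the same similarity function serves both queries: image to annotation and annotation to annotation. With hundreds of thousands of annotations, an exhaustive scan like this one would likely be replaced by an approximate nearest-neighbor index.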
