Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Association for Computational Linguistics, Vancouver, Canada, pp. 1-14 (to appear)
Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia
Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
Applications include machine translation (MT), summarization, generation, question
answering (QA), short answer grading, semantic search, and dialog and conversational
systems. The STS shared task is a venue for assessing the current state of the art.
The 2017 task focuses on multilingual and cross-lingual pairs with one sub-track
exploring MT quality estimation (MTQE) data. The task obtained strong participation
from 31 teams, with 17 participating in all language tracks. We summarize
performance and review a selection of well-performing methods. Analysis highlights
common errors, providing insight into the limitations of existing models. To
support ongoing work on semantic representations, the STS Benchmark is
introduced as a new shared training and evaluation set carefully selected from the
corpus of English STS shared task data (2012-2017).
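Participating systems are ranked by the Pearson correlation between their predicted similarity scores and the gold labels, which are assigned on a 0-5 scale. The snippet below is a minimal illustrative sketch of that comparison, not the official evaluation script; the file names and the one-score-per-line format are assumptions made for the example.

    # Illustrative sketch: Pearson correlation between system scores and gold
    # labels on the 0-5 STS scale. File names/format are assumptions, not the
    # official SemEval evaluation tooling.
    from scipy.stats import pearsonr

    def load_scores(path):
        """Read one floating-point similarity score per line."""
        with open(path, encoding="utf-8") as f:
            return [float(line.strip()) for line in f if line.strip()]

    if __name__ == "__main__":
        gold = load_scores("gold_scores.txt")      # hypothetical gold-label file
        system = load_scores("system_scores.txt")  # hypothetical system output
        r, _ = pearsonr(gold, system)
        print(f"Pearson r: {r:.4f}")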