Multilingual Code-switching Identification via LSTM Recurrent Neural Networks

Younes Samih
Suraj Maharjan
Laura Kallmeyer
Thamar Solorio
Proceedings of the Second Workshop on Computational Approaches to Code Switching, Austin, TX (2016), pp. 50-59

Abstract

This paper describes the HHU-UH-G system submitted to the EMNLP 2016 Second Workshop on Computational Approaches to Code Switching. Our system ranked first for Arabic (MSA-Egyptian) with an F1-score of 0.83 and second for Spanish-English with an F1-score of 0.90. The HHU-UH-G system introduces a novel unified neural network architecture for language identification in code-switched tweets, covering both Spanish-English and MSA-Egyptian. The system uses word-level and character-level representations to identify code-switching. For MSA-Egyptian, the system obtains state-of-the-art performance without relying on any language-specific knowledge or linguistic resources such as part-of-speech (POS) taggers, morphological analyzers, gazetteers, or word lists.
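
As an illustration of what word-level and character-level representations of a tweet can look like before they reach a neural tagger, the following is a minimal sketch and not the HHU-UH-G implementation. The helper names (`build_vocab`, `encode_tweet`), the reserved padding and unknown ids, the lowercasing of words, the maximum word length of 8, and the example tweet are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple

PAD, UNK = 0, 1  # reserved ids; an assumed convention, not taken from the paper


def build_vocab(items: Iterable[str]) -> Dict[str, int]:
    """Map each distinct item to an integer id, in order of first appearance."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab)
    return vocab


def encode_tweet(
    tokens: List[str],
    word_vocab: Dict[str, int],
    char_vocab: Dict[str, int],
    max_word_len: int = 8,  # hypothetical truncation/padding length
) -> Tuple[List[int], List[List[int]]]:
    """Return one word id per token and one fixed-length char id row per token."""
    word_ids = [word_vocab.get(tok.lower(), UNK) for tok in tokens]
    char_ids = []
    for tok in tokens:
        ids = [char_vocab.get(ch, UNK) for ch in tok[:max_word_len]]
        ids += [PAD] * (max_word_len - len(ids))  # right-pad short words
        char_ids.append(ids)
    return word_ids, char_ids


# Hypothetical Spanish-English code-switched tweet, already tokenized.
tokens = ["I", "love", "tacos", "mucho"]
word_vocab = build_vocab(tok.lower() for tok in tokens)
char_vocab = build_vocab(ch for tok in tokens for ch in tok)
word_ids, char_ids = encode_tweet(tokens, word_vocab, char_vocab)
```

In a setup like this, each token carries two views: a single word id that an embedding layer could look up, and a row of character ids that a character-level encoder could consume. Tokens that are rare or unseen at the word level still retain their spelling at the character level.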