A Neural Architecture for Dialectal Arabic Segmentation
The Third Arabic Natural Language Processing Workshop (WANLP),
Valencia, Spain (2017), pp. 46-54
Younes Samih, Mohammed Attia, Mohamed Eldesouki, Hamdy Mubarak,
Ahmed Abdelali, Laura Kallmeyer, Kareem Darwish
The automated processing of Arabic dialects is challenging because the dialects
lack spelling standards and because annotated data and resources are scarce.
Segmenting words into their constituent tokens is an important step in natural
language processing. In this paper, we show how a segmenter can be trained on
only 350 annotated tweets using neural networks, without any normalization or
reliance on lexical features or linguistic resources. We treat segmentation as a
sequence labeling problem at the character level. We show experimentally that
our model can rival state-of-the-art methods that depend heavily on additional
resources.
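
To make the character-level sequence labeling formulation concrete, the following is a minimal sketch of how a segmented word could be converted into per-character tags and back. The B/M/E/S tag set (beginning, middle, end, single-character segment), the function names, and the Buckwalter-style example word are assumptions chosen for illustration; the abstract does not specify the labeling scheme or the network itself, and no model is included here.

```python
from typing import List, Tuple


def segments_to_tags(segments: List[str]) -> List[Tuple[str, str]]:
    """Map a word's gold segments to (character, tag) pairs.

    Assumed tag set: B = segment start, M = segment middle,
    E = segment end, S = single-character segment.
    """
    tagged: List[Tuple[str, str]] = []
    for seg in segments:
        if len(seg) == 1:
            tagged.append((seg, "S"))
            continue
        for i, ch in enumerate(seg):
            if i == 0:
                tag = "B"
            elif i == len(seg) - 1:
                tag = "E"
            else:
                tag = "M"
            tagged.append((ch, tag))
    return tagged


def tags_to_segments(chars: str, tags: List[str]) -> List[str]:
    """Rebuild segments from a character sequence and its predicted tags."""
    segments: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        # A new segment starts at B or S; flush any unfinished segment first.
        if tag in ("B", "S") and current:
            segments.append(current)
            current = ""
        current += ch
        # A segment closes at E or S.
        if tag in ("E", "S"):
            segments.append(current)
            current = ""
    if current:
        segments.append(current)
    return segments


# Hypothetical example: "w+ktAb+hm" in Buckwalter-style transliteration.
pairs = segments_to_tags(["w", "ktAb", "hm"])
tags = [tag for _, tag in pairs]
print(tags)
print(tags_to_segments("wktAbhm", tags))
```

Under this framing, a sequence labeler only has to predict one tag per character, and the segmentation is recovered deterministically from the tags, which is why no lexicon is needed at decoding time.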