Publication Data
How difficult is it to develop a perfect spell-checker? A cross-linguistic analysis through complex network approach
Abstract: The difficulties involved in spelling error detection and
correction in a language have been investigated in this work through the
conceptualization of SpellNet - a weighted network of words, where edges indicate
orthographic proximity between two words. We construct SpellNets for three languages -
Bengali, English and Hindi. Through appropriate mathematical analysis and/or intuitive
justification, we interpret the different topological metrics of SpellNet from the
perspective of the issues related to spell-checking. We make many interesting
observations, the most significant being that the probability of making a real-word
error in a language is proportional to the average weighted degree of SpellNet, which
is found to be highest for Hindi, followed by Bengali and English.
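
To make the idea of a SpellNet and its average weighted degree concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not say how orthographic proximity is measured, so this sketch assumes Levenshtein edit distance, links two words when their distance is at most a hypothetical threshold `max_distance`, and uses the distance itself as the edge weight. The function names `edit_distance`, `build_spellnet`, and `average_weighted_degree` are illustrative only.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (assumed proximity measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def build_spellnet(words, max_distance=1):
    """Build a weighted undirected graph as a dict of dicts.

    Two words are linked if their edit distance is <= max_distance
    (a hypothetical threshold); the edge weight is that distance.
    """
    graph = {w: {} for w in words}
    for u, v in combinations(list(graph), 2):
        d = edit_distance(u, v)
        if d <= max_distance:
            graph[u][v] = d
            graph[v][u] = d
    return graph


def average_weighted_degree(graph):
    """Mean over all nodes of the sum of incident edge weights."""
    if not graph:
        return 0.0
    total = sum(sum(neighbours.values()) for neighbours in graph.values())
    return total / len(graph)


if __name__ == "__main__":
    lexicon = ["cat", "cot", "cut", "dog", "dot"]
    net = build_spellnet(lexicon, max_distance=1)
    print(average_weighted_degree(net))
```

The pairwise comparison is quadratic in the lexicon size, which is fine for a toy word list like the one above but would need indexing or pruning for a full dictionary. Under these assumptions, a lexicon whose words have many close orthographic neighbours yields a higher average weighted degree, which is the quantity the abstract relates to the probability of real-word errors.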
