Publication Data
Proper Name Transcription/Transliteration with ICU Transforms
Abstract: We describe our experience with a deep localization of
Google Maps™, where millions of geographic names from diverse origins had to be
represented in several target languages, including Russian, Mandarin, and Japanese. For
example, a map of Western Europe on maps.google.co.jp shows Japanese labels for almost
all labeled features. We tackle the problem of transliterating from several source
languages into several target languages by pivoting through an explicit intermediate
phonetic representation. Each transliteration scheme is implemented as a sequence of
ICU transforms, reusing a few existing transforms from ICU and CLDR, but consisting
mostly of transforms that we wrote specifically for this problem. Dividing the problem
this way results in many reusable components that make it simple to transliterate
between multiple languages. We discuss the steps that go into building transliteration
rules, describe existing official and de facto standards and guidelines, and give
suggestions for what to do when no consistent guidelines are available. We provide
general recommendations for developing and testing custom ICU transforms.
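
The pivoting idea can be illustrated with a minimal sketch in plain Python. It does not use ICU and does not reproduce ICU rule semantics; it only shows how per-language stages might be chained through a shared phonetic intermediate so that the pivot-to-target stage is reused across source languages. All rule tables and names here (`make_stage`, `compose`, `DE_TO_PIVOT`, and so on) are hypothetical toy examples, not rules from this work, and they are not linguistically complete. Input is assumed to be lowercase.

```python
from typing import Callable, List, Tuple

Rule = Tuple[str, str]          # (pattern, replacement)
Stage = Callable[[str], str]    # one transform step


def make_stage(rules: List[Rule]) -> Stage:
    """Build a stage that scans left to right and rewrites the first matching pattern."""
    # Try longer patterns first so that "sch" wins over a shorter overlapping rule.
    ordered = sorted(rules, key=lambda r: len(r[0]), reverse=True)

    def apply(text: str) -> str:
        out = []
        i = 0
        while i < len(text):
            for src, dst in ordered:
                if text.startswith(src, i):
                    out.append(dst)
                    i += len(src)
                    break
            else:
                out.append(text[i])  # no rule matched: copy the character through
                i += 1
        return "".join(out)

    return apply


def compose(*stages: Stage) -> Stage:
    """Chain stages so the output of one becomes the input of the next."""
    def run(text: str) -> str:
        for stage in stages:
            text = stage(text)
        return text
    return run


# Hypothetical toy rule tables (illustrative only).
DE_TO_PIVOT = make_stage([("sch", "ʃ"), ("w", "v")])   # German-like spelling to pivot
EN_TO_PIVOT = make_stage([("sh", "ʃ")])                # English-like spelling to pivot
PIVOT_TO_RU = make_stage([                             # pivot to Cyrillic letters
    ("ʃ", "\u0448"), ("v", "\u0432"), ("a", "\u0430"), ("n", "\u043d"),
])

# Both source languages share the same pivot-to-target stage.
de_to_ru = compose(DE_TO_PIVOT, PIVOT_TO_RU)
en_to_ru = compose(EN_TO_PIVOT, PIVOT_TO_RU)

if __name__ == "__main__":
    print(de_to_ru("schwan"))
    print(en_to_ru("shan"))
```

In this arrangement, supporting a new source language means writing only one new source-to-pivot stage, and supporting a new target language means writing only one new pivot-to-target stage, which is the kind of reuse the abstract describes.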
