Proper Name Transcription/Transliteration with ICU Transforms
Venue
34th Internationalization & Unicode Conference (2010)
Publication Year
2010
Authors
Sascha Brawer, Martin Jansche, Hiroshi Takenaka, Yui Terashima
Abstract
We describe our experience with a deep localization of Google Maps™, where millions
of geographic names from diverse origins had to be represented in several target
languages, including Russian, Mandarin, and Japanese. For example, a map of Western
Europe on maps.google.co.jp shows Japanese labels for almost all labeled features.
We tackle the problem of transliterating from several source languages into several
target languages by pivoting through an explicit intermediate phonetic
representation. Each transliteration scheme is implemented as a sequence of ICU
transforms; a few are existing transforms from ICU and CLDR, but most were written
specifically for this problem. Dividing the
problem this way results in many reusable components that make it simple to
transliterate between multiple languages. We discuss the steps that go into
building transliteration rules, describe existing official and de facto standards
and guidelines, and give suggestions for what to do when no consistent guidelines
are available. We provide general recommendations for developing and testing custom
ICU transforms.
