Deploying Google Search by Voice in Cantonese
Venue
12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 2865-2868
Publication Year
2011
Authors
Yun-hsuan Sung, Martin Jansche, Pedro Moreno
Abstract
We describe our efforts in deploying Google search by voice for Cantonese, a
southern Chinese dialect widely spoken in and around Hong Kong and Guangzhou. We
collected audio data from local Cantonese speakers in Hong Kong and Guangzhou
using our DataHound smartphone application. These data were used to train
acoustic models. Language models were trained on anonymized query logs
from Google Web Search for Hong Kong. Because users in Hong Kong frequently mix
English and Cantonese in their queries, we designed our system from the ground up
to handle both languages. We report on experiments with different techniques for
mapping the phoneme inventories of both languages into a common space. Based on
extensive experiments, we report word error rates and web scores for both Hong Kong
and Guangzhou data. Cantonese Google search by voice was launched in December 2010.
