Tom Ouyang

Authored Publications
    Abstract: This paper proposes a framework to improve the typing experience of mobile users in morphologically rich languages. Smartphone keyboards typically support features such as input decoding, corrections and predictions that all rely on language models. For latency reasons, these operations happen on device, so the models are of limited size and cannot easily cover all the words needed by users for their daily tasks, especially in morphologically rich languages. In particular, the compounding nature of Germanic languages makes their vocabulary virtually infinite. Similarly, heavily inflecting and agglutinative languages (e.g. Slavic, Turkic or Finno-Ugric languages) tend to have much larger vocabularies than morphologically simpler languages, such as English or Mandarin. We propose to model such languages with automatically selected subword units annotated with what we call binding types, allowing the decoder to know when to bind subword units into words. We show that this method brings around 20% word error rate reduction in a variety of compounding languages. This is more than twice the improvement we previously obtained with a more basic approach, also described in the paper.
    Abstract: We demonstrate that a character-level LSTM neural network is able to learn out-of-vocabulary (OOV) words for the purpose of expanding the vocabulary of a virtual keyboard for smartphones. We train such a model using a distributed, on-device learning framework called federated learning. High-frequency words can then be sampled from the generative model by drawing from the joint posterior directly. We study the feasibility of the approach in three different settings: (1) using stochastic gradient descent, on an anonymized dataset of snippets of user content; (2) using simulated federated learning, on a publicly available non-IID per-user dataset from a popular social networking website; (3) using federated learning, on data hosted on user mobile devices. The model is shown to achieve good recall and precision when compared to ground-truth OOV words in settings (1) and (2). With (3) we demonstrate the practicality of this approach by showing that we can learn meaningful OOV words without exporting sensitive user data to servers.
    Transliterated mobile keyboard input via weighted finite-state transducers
    Lars Hellsten
    Prasoon Goyal
    Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing (FSMNLP) (2017)
    Abstract: We present an extension to a mobile keyboard input decoder based on finite-state transducers that provides general transliteration support, and demonstrate its use for input of South Asian languages using a QWERTY keyboard. On-device keyboard decoders must operate under strict latency and memory constraints, and we present several transducer optimizations that allow for high accuracy decoding under such constraints. Our methods yield substantial accuracy improvements and latency reductions over an existing baseline transliteration keyboard approach. The resulting system was launched for 22 languages in Google Gboard in the first half of 2017.
    Effects of Language Modeling and its Personalization on Touchscreen Typing Performance
    Andrew Fowler
    Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2015), ACM, New York, NY, USA, pp. 649-658
    Abstract: Modern smartphones correct typing errors and learn user-specific words (such as proper names). Both techniques are useful, yet little has been published about their technical specifics and concrete benefits. One reason is that typing accuracy is difficult to measure empirically on a large scale. We describe a closed-loop, smart touch keyboard (STK) evaluation system that we have implemented to solve this problem. It includes a principled typing simulator for generating human-like noisy touch input, a simple-yet-effective decoder for reconstructing typed words from such spatial data, a large web-scale background language model (LM), and a method for incorporating LM personalization. Using the Enron email corpus as a personalization test set, we show for the first time at this scale that a combined spatial/language model reduces word error rate from a pre-model baseline of 38.4% down to 5.7%, and that LM personalization can improve this further to 4.6%.
    Long-Short Term Memory Neural Network for Keyboard Gesture Recognition
    Thomas Breuel
    Johan Schalkwyk
    International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)
    Both Complete and Correct? Multi-Objective Optimization of Touchscreen Keyboard
    Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2014), ACM, New York, NY, USA, pp. 2297-2306
    Making touchscreen keyboards adaptive to keys, hand postures, and individuals: a hierarchical spatial backoff model approach
    Ying Yin
    Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2013), ACM, New York, NY, pp. 2775-2784
    Bimanual gesture keyboard
    Proceedings of UIST 2012 – The ACM Symposium on User Interface Software and Technology, ACM, New York, NY, USA, pp. 137-146
    Abstract: Gesture keyboards represent an increasingly popular way to input text on mobile devices today. However, current gesture keyboards are exclusively unimanual. To take advantage of the capability of modern multi-touch screens, we created a novel bimanual gesture text entry system, extending the gesture keyboard paradigm from one finger to multiple fingers. To address the complexity of recognizing bimanual gesture, we designed and implemented two related interaction methods, finger-release and space-required, both based on a new multi-stroke gesture recognition algorithm. A formal experiment showed that bimanual gesture behaviors were easy to learn. They improved comfort and reduced the physical demand relative to unimanual gestures on tablets. The results indicated that these new gesture keyboards were valuable complements to unimanual gesture and regular typing keyboards.
    ChemInk: A Natural Real-Time Recognition System for Chemical Drawings
    Randall Davis
    Proceedings of the International Conference on Intelligent User Interfaces (IUI 2011), ACM, New York, NY, pp. 267-276
    Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition
    Randall Davis
    Advances in Neural Information Processing Systems 22 (NIPS) (2009), pp. 1401-1409
    A Visual Approach to Sketched Symbol Recognition
    Randall Davis
    Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI) (2009), pp. 1463-1468
    Recognition of Hand Drawn Chemical Diagrams
    Randall Davis
    Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI) (2007), pp. 846-851
    Strategy Variations in Analogical Problem Solving
    Kenneth Forbus
    Proceedings of the 21st National Conference on Artificial Intelligence (AAAI) (2006), pp. 446-451