Shumin Zhai
Shumin Zhai is a Human-Computer Interaction research scientist at Google, where he directs research, design, and development of input methods and haptics systems on Google’s and its partners’ flagship products. His research career has contributed foundational models and understanding of human-computer interaction as well as practical user interface inventions and products based on his scientific and technical insights. He originated and led the SHARK/ShapeWriter project at IBM Research and a start-up company that pioneered the touchscreen word-gesture keyboard paradigm, filing the first patents of this paradigm, publishing the first generation of scientific papers, releasing the first word-gesture keyboard in 2004, and releasing a top-ranked (6th) iPhone app called ShapeWriter WritingPad in 2008. His publications have won the ACM UIST Lasting Impact Award and an IEEE Computer Society Best Paper Award, among others. He served as the 4th Editor-in-Chief of ACM Transactions on Computer-Human Interaction and frequently contributes to other academic boards and program committees. He received his Ph.D. from the University of Toronto in 1995. In 2006, he was selected as one of ACM's inaugural class of Distinguished Scientists. In 2010, he was named a Member of the CHI Academy and a Fellow of the ACM.
His external web page is at www.shuminzhai.com.
Authored Publications
TapNet: The Design, Training, Implementation, and Applications of a Multi-Task Learning CNN for Off-Screen Mobile Input
Michael Xuelin Huang
Nazneen Nazneen
Alex Chao
ACM CHI Conference on Human Factors in Computing Systems, ACM (2021)
Off-screen interaction offers great potential for one-handed and eyes-free mobile interaction. While a few existing studies have explored built-in mobile phone sensors to sense off-screen signals, none has met practical requirements. This paper discusses the design, training, implementation, and applications of TapNet, a multi-task network that detects tapping on the smartphone using the built-in accelerometer and gyroscope. With sensor location as auxiliary information, TapNet can jointly learn from data across devices and simultaneously recognize multiple tap properties, including tap direction and tap location. We developed four datasets consisting of over 180K training samples, 38K testing samples, and 87 participants in total. Experimental evaluation demonstrated the effectiveness of the TapNet design and its significant improvement over the state of the art. Along with the datasets, codebase, and extensive experiments, TapNet establishes a new technical foundation for off-screen mobile input.
Active Edge: Designing Squeeze Gestures for the Google Pixel 2
Claire Lee
Melissa Barnhart
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, 274:1-274:13
Active Edge is a feature of Google Pixel 2 smartphone devices that creates a force-sensitive interaction surface along their sides, allowing users to perform gestures by holding and squeezing their device. Supported by strain gauge elements adhered to the inner sidewalls of the device chassis, these gestures can be more natural and ergonomic than on-screen (touch) counterparts. Developing these interactions is an integration of several components: (1) an insight and understanding of the user experiences that benefit from squeeze gestures; (2) hardware with the sensitivity and reliability to sense a user's squeeze in any operating environment; (3) a gesture design that discriminates intentional squeezes from innocuous handling; and (4) an interaction design to promote a discoverable and satisfying user experience. This paper describes the design and evaluation of Active Edge in these areas as part of the product's development and engineering.
i’sFree: Eyes-Free Gesture Typing via a Touch-Enabled Remote Control
Suwen Zhu
Xiaojun Bi
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, 448:1-448:12
Entering text without having to pay attention to the keyboard is compelling but challenging due to the lack of visual guidance. We propose i'sFree to enable eyes-free gesture typing on a distant display from a touch-enabled remote control. i'sFree does not display the keyboard or gesture trace but decodes gestures drawn on the remote control into text according to an invisible and shifting Qwerty layout. i'sFree decodes gestures similar to a general gesture typing decoder, but learns from the instantaneous and historical input gestures to dynamically adjust the keyboard location. We designed it based on the understanding of how users perform eyes-free gesture typing. Our evaluation shows eyes-free gesture typing is feasible: reducing visual guidance on the distant display hardly affects the typing speed. Results also show that the i’sFree gesture decoding algorithm is effective, enabling an input speed of 23 WPM, 46% faster than the baseline eyes-free condition built on a general gesture decoder. Finally, i'sFree is easy to learn: participants reached 22 WPM in the first ten minutes, even though 40% of them were first-time gesture typing users.
Word–Gesture keyboards allow users to enter text using continuous input strokes (also known as gesture typing or shape writing). We developed a production model of gesture typing input based on a human motor control theory of optimal control (specifically, modeling human drawing movements as a minimization of jerk—the third derivative of position). In contrast to existing models, which consider gestural input as a series of concatenated aiming movements and predict a user’s time performance, this descriptive theory of human motor control predicts the shapes and trajectories that users will draw. The theory is supported by an analysis of user-produced gestures that found qualitative and quantitative agreement between the shapes users drew and the minimum jerk theory of motor control. Furthermore, by using a small number of statistical via-points whose distributions reflect the sensorimotor noise and speed–accuracy trade-off in gesture typing, we developed a model of gesture production that can predict realistic gesture trajectories for arbitrary text input tasks. The model accurately reflects features in the figural shapes and dynamics observed from users and can be used to improve the design and evaluation of gestural input systems.
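As a rough illustration of the minimum-jerk principle the model builds on (not the paper's full production model, which adds statistical via-points and sensorimotor noise), the classic fifth-order polynomial gives the jerk-minimizing path between two points starting and ending at rest:

```python
import numpy as np

def min_jerk_segment(p0, p1, n=51):
    """Minimum-jerk trajectory between two via-points.

    The profile 10*t^3 - 15*t^4 + 6*t^5 is the unique fifth-order
    polynomial minimizing integrated squared jerk with zero velocity
    and acceleration at both endpoints (Flash & Hogan's classic form).
    """
    tau = np.linspace(0.0, 1.0, n)               # normalized time
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5   # smooth 0 -> 1 profile
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    return p0 + (p1 - p0) * s[:, None]

# A smooth stroke between two hypothetical key centers
traj = min_jerk_segment((0.0, 0.0), (4.0, 1.0))
```

A full gesture trace would chain such segments through the statistical via-points described in the abstract.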
M3 Gesture Menu: Design and Experimental Analyses of Marking Menus for Touchscreen Mobile Interaction
Kun Li
Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, 249:1-249:14
Despite their learning advantages in theory, marking menus have faced adoption challenges in practice, even on today's touchscreen-based mobile devices. We address these challenges by designing, implementing, and evaluating multiple versions of M3 Gesture Menu (M3), a reimagination of marking menus targeted at mobile interfaces. M3 is defined on a grid rather than in a radial space, relies on gestural shapes rather than directional marks, and has constant and stationary space use. Our first controlled experiment on expert performance showed M3 was faster and less error-prone by a factor of two than traditional marking menus. A second experiment on learning demonstrated for the first time that users could successfully transition to recall-based execution of a dozen commands after three ten-minute practice sessions with both M3 and Multi-Stroke Marking Menu. Together, M3, with its demonstrated resolution, learning, and space use benefits, contributes to the design and understanding of menu selection in the mobile-first era of end-user computing.
A Cost–Benefit Study of Text Entry Suggestion Interaction
Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, pp. 83-88
Mobile keyboards often present error corrections and word completions (suggestions) as candidates for anticipated user input. However, these suggestions are not cognitively free: they require users to attend to, evaluate, and act upon them. To understand this trade-off between the savings suggestions offer and their interaction costs, we conducted a text transcription experiment that controlled interface assertiveness: the tendency of an interface to proactively present its suggestions. Suggestions were either always present (extraverted), never present (introverted), or gated by a probability threshold (ambiverted). Results showed that although increasing the assertiveness of suggestions reduced the number of keyboard actions needed to enter text and was subjectively preferred, the costs of attending to and using the suggestions impaired average time performance.
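The three assertiveness conditions reduce to a small gating policy, sketched below; the 0.8 confidence threshold is illustrative, not a value from the paper:

```python
def should_show_suggestions(p_top, mode, threshold=0.8):
    """Decide whether to surface suggestions under the three
    assertiveness conditions described in the abstract.

    p_top: the decoder's probability estimate for its best candidate.
    mode: 'extraverted' (always show), 'introverted' (never show),
          or 'ambiverted' (show only when confident enough).
    The threshold value is an illustrative assumption.
    """
    if mode == "extraverted":
        return True
    if mode == "introverted":
        return False
    if mode == "ambiverted":
        return p_top >= threshold
    raise ValueError(f"unknown mode: {mode}")
```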
Optimizing Touchscreen Keyboards for Gesture Typing
Brian Smith
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2015), ACM, New York, NY, USA, pp. 3365-3374
Despite its growing popularity, gesture typing suffers from a major problem not present in touch typing: gesture ambiguity on the Qwerty keyboard. By applying rigorous mathematical optimization methods, this paper systematically investigates the optimization space related to the accuracy, speed, and Qwerty similarity of a gesture typing keyboard. Our investigation shows that optimizing the layout for gesture clarity (a metric measuring how unique word gestures are on a keyboard) drastically improves the accuracy of gesture typing. Moreover, if we also accommodate gesture speed, or both gesture speed and Qwerty similarity, we can still reduce error rates by 52% and 37% over Qwerty, respectively. In addition to investigating the optimization space, this work contributes a set of optimized layouts such as GK-D and GK-T that can immediately benefit mobile device users.
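The gesture-clarity idea can be sketched as a nearest-neighbor computation over words' ideal gesture traces. A simplified sketch, assuming traces resampled to equal length and compared point by point; details of the published metric, such as word-frequency weighting, are omitted:

```python
import math

def trace_distance(a, b):
    # mean point-to-point distance between equal-length resampled traces
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def gesture_clarity(ideal_traces):
    """Average distance from each word's ideal gesture to its nearest
    neighbor among all other words' ideal gestures: larger values mean
    word gestures are more distinct, hence easier to recognize.
    """
    words = list(ideal_traces)
    nearest = [min(trace_distance(ideal_traces[w], ideal_traces[v])
                   for v in words if v != w)
               for w in words]
    return sum(nearest) / len(nearest)

# Toy layout: three words whose ideal traces are horizontal strokes
clarity = gesture_clarity({
    "a": [(0, 0), (1, 0)],
    "b": [(0, 1), (1, 1)],
    "c": [(0, 3), (1, 3)],
})
```

Layout optimization then searches for key arrangements that increase this score while trading off gesture speed and Qwerty similarity.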
Effects of Language Modeling and its Personalization on Touchscreen Typing Performance
Andrew Fowler
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2015), ACM, New York, NY, USA, pp. 649-658
Modern smartphones correct typing errors and learn user-specific words (such as proper names). Both techniques are useful, yet little has been published about their technical specifics and concrete benefits. One reason is that typing accuracy is difficult to measure empirically on a large scale. We describe a closed-loop, smart touch keyboard (STK) evaluation system that we have implemented to solve this problem. It includes a principled typing simulator for generating human-like noisy touch input, a simple-yet-effective decoder for reconstructing typed words from such spatial data, a large web-scale background language model (LM), and a method for incorporating LM personalization. Using the Enron email corpus as a personalization test set, we show for the first time at this scale that a combined spatial/language model reduces word error rate from a pre-model baseline of 38.4% down to 5.7%, and that LM personalization can improve this further to 4.6%.
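The decoding step described here, combining a spatial model with a language model, can be sketched as a log-linear score. The function names and the toy one-dimensional key layout below are illustrative assumptions, not the STK system's actual models:

```python
import math

def decode_word(touches, candidates, spatial_logprob, lm_logprob, lam=1.0):
    """Choose the word maximizing log P(touches | word) + lam * log P(word).

    A schematic spatial/language-model combination in the spirit of the
    STK decoder; the real system's models are far more elaborate.
    """
    return max(candidates,
               key=lambda w: spatial_logprob(touches, w) + lam * lm_logprob(w))

# Toy example: keys on a line; the spatial score is the negative squared
# distance from each touch to the intended key center.
KEY_X = {"c": 0.0, "a": 1.0, "t": 2.0, "r": 3.0}

def toy_spatial(touches, word):
    if len(touches) != len(word):
        return float("-inf")
    return -sum((x - KEY_X[ch]) ** 2 for x, ch in zip(touches, word))

LM = {"cat": math.log(0.7), "car": math.log(0.3)}

best = decode_word([0.1, 1.0, 2.6], ["cat", "car"], toy_spatial, LM.__getitem__)
```

Here the final touch (2.6) lies spatially closer to "r", yet the language-model prior flips the decision to "cat", which is how the combined model recovers from noisy touch input.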
Long-Short Term Memory Neural Network for Keyboard Gesture Recognition
Thomas Breuel
Johan Schalkwyk
International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)
Both Complete and Correct? Multi-Objective Optimization of Touchscreen Keyboard
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2014), ACM, New York, NY, USA, pp. 2297-2306
FFitts Law: Modeling Finger Touch with Fitts’ Law
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2013), ACM, New York, NY, USA, pp. 1363-1372
Fitts’ law has proven to be a strong predictor of pointing performance under a wide range of conditions. However, it has been insufficient in modeling small-target acquisition with finger-touch-based input on screens. We propose a dual-distribution hypothesis to interpret the distribution of endpoints in finger touch input. We hypothesize that the movement endpoint distribution is a sum of two independent normal distributions: one reflects the relative precision governed by the speed-accuracy tradeoff rule in the human motor system, and the other captures the absolute precision of finger touch, independent of the speed-accuracy tradeoff effect. Based on this hypothesis, we derived the FFitts model, an expansion of Fitts’ law for finger touch input. We present three experiments on 1D target acquisition, 2D target acquisition, and touchscreen keyboard typing tasks, respectively. The results showed that FFitts law is more accurate than Fitts’ law in modeling finger input on touchscreens. With R² values of 0.91 or greater, FFitts’ index of difficulty accounted for significantly more variance than the conventional Fitts’ index of difficulty based on either a nominal or an effective target width in all three experiments.
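The dual-distribution hypothesis has a direct computational reading: subtract the absolute touch variance from the observed endpoint variance before converting the remaining spread into an effective width. A minimal sketch assuming the standard effective-width convention (W_e = sqrt(2*pi*e) * sigma, roughly 4.133 * sigma); consult the paper for the exact formulation:

```python
import math

def ffitts_id(A, endpoint_var, absolute_var):
    """Index of difficulty under the dual-distribution hypothesis.

    Observed endpoint variance is modeled as the sum of a relative
    component (speed-accuracy tradeoff) and an absolute component
    (finger touch imprecision): var = var_rel + var_abs. Subtracting
    the absolute part recovers the motor-controlled spread, which is
    converted to an effective width in the conventional way.
    """
    var_rel = max(endpoint_var - absolute_var, 1e-12)
    w_e = math.sqrt(2 * math.pi * math.e * var_rel)   # ~ 4.133 * sigma
    return math.log2(A / w_e + 1)
```

With `absolute_var = 0` this reduces to the ordinary effective-width Fitts formulation; a nonzero absolute component yields a higher ID for the same observed spread.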
Making touchscreen keyboards adaptive to keys, hand postures, and individuals: a hierarchical spatial backoff model approach
Ying Yin
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2013), ACM, New York, NY, pp. 2775-2784
Octopus: Evaluating Touchscreen Keyboard Correction and Recognition Algorithms via “Remulation”
Shiri Azenkot
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2013), ACM, New York, NY, USA, pp. 543-552
The time and labor demanded by a typical laboratory-based keyboard evaluation are limiting resources for algorithmic adjustment and optimization. We propose Remulation, a complementary method for evaluating touchscreen keyboard correction and recognition algorithms. It replicates prior user study data through real-time, on-device simulation. To demonstrate remulation, we have developed Octopus, an evaluation tool that enables keyboard developers to efficiently measure and inspect the impact of algorithmic changes without conducting resource-intensive user studies. It can also be used to evaluate third-party keyboards in a “black box” fashion, without access to their algorithms or source code. Octopus can evaluate both touch keyboards and word-gesture keyboards. Two empirical examples show that Remulation can efficiently and effectively measure many aspects of touch screen keyboards at both macro and micro levels. Additionally, we contribute two new metrics to measure keyboard accuracy at the word level: the Ratio of Error Reduction (RER) and the Word Score.
Bayesian Touch - A Statistic Criterion of Target Selection with Finger Touch
Proceedings of UIST 2013 – The ACM Symposium on User Interface Software and Technology, ACM, New York, NY, USA, pp. 51-60
To improve the accuracy of target selection for finger touch, we conceptualize finger touch input as an uncertain process and derive a statistical target selection criterion, the Bayesian Touch Criterion, by combining the basic Bayes’ rule of probability with the generalized dual Gaussian distribution hypothesis of finger touch. The Bayesian Touch Criterion states that the selected target is the candidate with the shortest Bayesian Touch Distance to the touch point, computed from the touch-point-to-target-center distance and the size of the target. We give the derivation of the criterion and its empirical evaluation with two experiments. The results show that for 2D circular target selection, the Bayesian Touch Criterion is significantly more accurate than the commonly used visual boundary criterion (i.e., a target is selected if and only if the touch point falls within its boundary) and its two variants.
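Schematically, the criterion amounts to maximum-likelihood target selection under touch-point uncertainty. The sketch below assumes a Gaussian touch distribution per target whose variance grows with target size; the variance model and constants are illustrative assumptions, not the paper's exact Bayesian Touch Distance:

```python
import math

def bayesian_select(touch, targets, sigma_abs=2.0, k=0.4):
    """Pick the most probable circular target for an uncertain touch.

    Models the touch point as Gaussian around each candidate's center,
    with variance combining an absolute finger-precision term and a
    size-dependent term (sigma^2 = sigma_abs^2 + (k * radius)^2, an
    assumed form), then selects the highest log-likelihood candidate.

    targets: list of (center_x, center_y, radius) tuples.
    """
    def loglik(t):
        cx, cy, r = t
        var = sigma_abs ** 2 + (k * r) ** 2
        d2 = (touch[0] - cx) ** 2 + (touch[1] - cy) ** 2
        # 2-D Gaussian log-density, dropping the shared constant term
        return -d2 / (2 * var) - math.log(var)
    return max(targets, key=loglik)
```

Unlike the visual boundary criterion, this always returns a candidate, even when the touch lands outside every target's boundary.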
Bimanual gesture keyboard
Proceeding of UIST 2012 – The ACM Symposium on User Interface Software and Technology, ACM, New York, NY, USA, pp. 137-146
Gesture keyboards represent an increasingly popular way to input text on mobile devices today. However, current gesture keyboards are exclusively unimanual. To take advantage of the capability of modern multi-touch screens, we created a novel bimanual gesture text entry system, extending the gesture keyboard paradigm from one finger to multiple fingers. To address the complexity of recognizing bimanual gesture, we designed and implemented two related interaction methods, finger-release and space-required, both based on a new multi-stroke gesture recognition algorithm. A formal experiment showed that bimanual gesture behaviors were easy to learn. They improved comfort and reduced the physical demand relative to unimanual gestures on tablets. The results indicated that these new gesture keyboards were valuable complements to unimanual gesture and regular typing keyboards.
Touch behavior with different postures on soft smartphone keyboards
Shiri Azenkot
Proceedings of the 14th international conference on Human-computer interaction with mobile devices and services (MobileHCI '12), ACM (2012), pp. 251-260
A Comparative Evaluation of Finger and Pen Stroke Gestures
Huawei Tu
Xiangshi Ren
ACM CHI 2012 Conference on Human Factors in Computing Systems, ACM, Austin, TX, pp. 1287-1296
This paper reports an empirical investigation in which participants produced a set of stroke gestures with varying degrees of complexity and in different target sizes using both the finger and the pen. The recorded gestures were then analyzed according to multiple measures characterizing many aspects of stroke gestures. Our findings were as follows: (1) Finger drawn gestures were quite different to pen drawn gestures in basic measures including size ratio and average speed. Finger drawn gestures tended to be larger and faster than pen drawn gestures. They also differed in shape geometry as measured by, for example, aperture of closed gestures, corner shape distance and intersecting points deviation; (2) Pen drawn gestures and finger drawn gestures were similar in several measures including articulation time, indicative angle difference, axial symmetry and proportional shape distance; (3) There were interaction effects between gesture implement (finger vs. pen) and target gesture size and gesture complexity. Our findings show that half of the features we tested were performed well enough by the finger. This finding suggests that "finger friendly" systems should exploit these features when designing finger interfaces and avoid using the other features in which the finger does not perform as well as the pen.
Foundational Issues in Touch-Surface Stroke Gesture Design — An Integrative Review
Per Ola Kristensson, Caroline Appert, Tue Haste Andersen, Xiang Cao
Foundations and Trends in Human–Computer Interaction, NOW (2012), pp. 97-205
The advent of modern touchscreen devices has unleashed many opportunities and calls for innovative use of stroke gestures as a richer interaction medium. A significant body of knowledge on stroke gesture design is scattered throughout the Human-Computer Interaction research literature. Primarily based on the authors' own decade-long gesture user interface (UI) research which launched the word-gesture keyboard paradigm, Foundational Issues in Touch-Surface Stroke Gesture Design - An Integrative Review synthesizes some of the foundational issues of human motor control complexity, visual and auditory feedback, and memory and learning capacity concerning gesture user interfaces. In the second half of the book a set of gesture UI design principles is derived from the research literature. The book also covers system implementation aspects of gesture UI such as gesture recognition algorithms and design toolkits. Foundational Issues in Touch-Surface Stroke Gesture Design - An Integrative Review is an ideal primer for researchers and graduate students embarking on research in gesture interfaces. It is also an excellent reference for designers and developers who want to leverage insights and lessons learned in the academic research community.
Browsing a collection of information on a mobile device is a common task, yet it can be difficult due to the small size of mobile displays. A common trade-off offered by many current mobile interfaces is to allow users to switch between an overview and detailed views of particular items. An open question is how much preview of each item to include in the overview. Using a mobile email processing task, we attempted to answer that question. We investigated participants' email processing behaviors under differing preview conditions in a semi-controlled, naturalistic study. We collected log data of participants' actual behaviors as well as their subjective impressions of different conditions. Our results suggest that a moderate level of two to three lines of preview should be the default. The overall benefit of a moderate amount of preview was supported by both positive subjective ratings and fewer transitions between the overview and individual items.
Smart Phone Use by Non-Mobile Business Users
Patti Bao
Jeffrey Pierce
Stephen Whittaker
MobileHCI 2011, ACM, Stockholm, Sweden, pp. 445-454
The rapid increase in smart phone capabilities has introduced new opportunities for mobile information access and computing. However, smart phone use may still be constrained by both device affordances and work environments. To understand how current business users employ smart phones and to identify opportunities for improving business smart phone use, we conducted two studies of actual and perceived performance of standard work tasks. Our studies involved 243 smart phone users from a large corporation. We intentionally chose users who primarily work with desktops and laptops, as these “non-mobile” users represent the largest population of business users. Our results go beyond the general intuition that smart phones are better for consuming than producing information: we provide concrete measurements that show how fast reading is on phones and how much slower and more effortful text entry is on phones than on computers. We also demonstrate that security mechanisms are a significant barrier to wider business smart phone use. We offer design suggestions to overcome these barriers.
Multilingual Touchscreen Keyboard Design and Optimization
SHRIMP: solving collision and out of vocabulary problems in mobile predictive input with motion gesture
Quasi-qwerty soft keyboard optimization
Pen pressure control in trajectory-based interaction
Foundations for designing and evaluating user interfaces based on the crossing paradigm
“Writing with music”: Exploring the use of auditory feedback in gesture interfaces
Phone n' Computer: teaming up an information appliance with a PC
Min Yin
Jeffrey S. Pierce
Personal and Ubiquitous Computing, vol. 14 (2010), pp. 601-607
The performance of touch screen soft buttons
Using strokes as command shortcuts: cognitive benefits and toolkit support
Shapewriter on the iphone: from the laboratory to the real world
Per Ola Kristensson
Pengjun Gong
Michael Greiner
Shilei Allen Peng
Liang Mico Liu
Anthony Dunnigan
CHI Extended Abstracts (2009), pp. 2667-2670
On the ease and efficiency of human-computer interfaces
ETRA (2008), pp. 9-10
Interlaced QWERTY: accommodating ease of visual search and input flexibility in shape writing
Improving word-recognizers using an interactive lexicon with active and passive words
Command strokes with and without preview: using pen gestures on keyboard for command selection
Hard lessons: effort-inducing interfaces benefit spatial learning
Modeling human performance of pen stroke gestures
Learning shape writing by game playing
The benefits of augmenting telephone voice menu navigation with visual browsing and search
Camera phone based motion sensing: interaction techniques, applications and performance study
Relaxing stylus typing precision by geometric pattern matching
RealTourist - A Study of Augmenting Human-Human and Human-Computer Dialogue with Eye-Gaze Overlay
In search of effective text input interfaces for off the desktop computing
Per Ola Kristensson
Barton A. Smith
Interacting with Computers, vol. 17 (2005), pp. 229-250
Dial and see: tackling the voice menu navigation problem with cross-device user experience integration
Conversing with the user based on eye-gaze patterns
Introduction to sensing-based interaction
TNT: a numeric keypad based text input method
View size and pointing difficulty in multi-scale navigation
Speed-accuracy tradeoff in Fitts' law tasks – on the equivalency of actual and nominal pointing precision
Top-down learning strategies: can they facilitate stylus keyboard learning?
SHARK2: a large vocabulary shorthand writing system for pen-based computers
Human Action Laws in Electronic Virtual Worlds - An Empirical Study of Path Steering Performance in VR
Characterizing computer input with Fitts' law parameters – the information and non-information aspects of pointing
Int. J. Hum.-Comput. Stud., vol. 61 (2004), pp. 791-809
Human Movement Performance in Relation to Path Constraint - The Law of Steering in Locomotion
Refining Fitts' law models for bivariate pointing
Human on-line response to target expansion
Candidate Display Styles in Japanese Input
High precision touch screen interaction
Shorthand writing on stylus keyboard
What's in the eyes for attentive input
Commun. ACM, vol. 46 (2003), pp. 34-39
Collaboration Meets Fitts' Law: Passing Virtual Objects with and without Haptic Force Feedback
Movement model, hits distribution and learning in virtual keyboarding
More than dotting the i's - foundations for crossing-based interfaces
Scale effects in steering law tasks
Chinese input with keyboard and eye-tracking: an anatomical study
Gaze and Speech in Attentive User Interfaces
Paul P. Maglio
Teenie Matlock
Christopher S. Campbell
Barton A. Smith
ICMI (2000), pp. 1-7
Hand eye coordination patterns in target selection
The metropolis keyboard - an exploration of quantitative techniques for virtual keyboard design
In Search of the `Magic Carpet': Design and Experimentation of a Bimanual 3D Navigation Interface
Eser Kandogan
Barton A. Smith
Ted Selker
J. Vis. Lang. Comput., vol. 10 (1999), pp. 3-17
Keeping an Eye for HCI
Carlos Hitoshi Morimoto
David Koons
Arnon Amir
Myron Flickner
SIBGRAPI (1999), pp. 171-176
Performance Evaluation of Input Devices in Trajectory-Based Tasks: An Application of the Steering Law
Manual and Gaze Input Cascaded (MAGIC) Pointing
Multistream Input: An Experimental Study of Document Scrolling Methods
Quantifying Coordination in Multiple DOF Movement and Its Application to Evaluating 6 DOF Input Devices
Manual and Cognitive Benefits of Two-Handed Input: An Experimental Study
Andrea Leganchuk
William Buxton
ACM Trans. Comput.-Hum. Interact., vol. 5 (1998), pp. 326-359
Representation Matters: The Effect of 3D Objects and a Spatial Metaphor in a Graphical User Interface
Graphical Means of Directing User's Attention in the Visual Interface
Dual Stream Input for Pointing and Scrolling
Beyond Fitts' Law: Models for Trajectory-Based HCI Tasks
Anisotropic human performance in six degree-of-freedom tracking: an evaluation of three-dimensional display and control interfaces
Paul Milgram
Anu Rastogi
IEEE Transactions on Systems, Man, and Cybernetics, Part A, vol. 27 (1997), pp. 518-528
An Isometric Tongue Pointing Device
Improving Browsing Performance: A study of four input devices for scrolling and pointing tasks
The Partial-Occlusion Effect: Utilizing Semitransparency in 3D Human-Computer Interaction
William Buxton
Paul Milgram
ACM Trans. Comput.-Hum. Interact., vol. 3 (1996), pp. 254-284
The Influence of Muscle Groups on Performance of Multiple Degree-of-Freedom Input
The "silk cursor": investigating
Input techniques for HCI in 3D environments
The "Silk Cursor": investigating transparency for 3D target acquisition
Virtual Reality for Palmtop Computers
George W. Fitzmaurice
Mark H. Chignell
ACM Trans. Inf. Syst., vol. 11 (1993), pp. 197-218
An evaluation of four 6 degree-of-freedom input techniques
From Icons to Interface Models: Designing Hypermedia from the Bottom Up
John A. Waterworth
Mark H. Chignell
International Journal of Man-Machine Studies, vol. 39 (1993), pp. 453-472
ARGOS: a display system for augmenting reality
Human Performance Evaluation of Manipulation Schemes in Virtual Environments