Henry A. Rowley
Henry A. Rowley received BS degrees in Electrical Engineering and Computer Science from the University of Minnesota in 1992, a Master's in Computer Science from Carnegie Mellon University in 1994, and a PhD in Computer Science from Carnegie Mellon University in 1999 for his thesis work on neural network-based face detection. After graduating, he worked at Zaxel Systems, Inc. on lossless video compression and multi-view stereo reconstruction, and at Microsoft on Chinese, Japanese, and Korean handwriting recognition. He is currently a member of the Google Research group, where he has worked on computer vision, machine learning, and most recently handwriting recognition.
Authored Publications
Fast Multi-language LSTM-based Online Handwriting Recognition
Thomas Deselaers
Alexander Daryin
Marcos Calvo
Sandro Feuz
Philippe Gervais
International Journal on Document Analysis and Recognition (IJDAR) (2020)
Handwriting is a natural input method for many people and we continuously invest in improving the recognition quality. Here we describe and motivate the modelling and design choices that lead to a significant improvement across the 100 supported languages, based on recurrent neural networks and a variety of language models.
This new architecture has completely replaced our previous segment-and-decode system [Google:HWRPAMI] and reduced the error rate by 30%-40% relative for most languages. Further, we report new state-of-the-art results on IAM-OnDB for both the open and closed dataset setting.
By using Bézier curves for shortening the input length of our sequences we obtain up to 10x faster recognition times. Through a series of experiments we determine what layers are needed and how wide and deep they should be.
We evaluate the setup on a number of additional public datasets.
Multi-Language Online Handwriting Recognition
Thomas Deselaers
IEEE Transactions on Pattern Analysis and Machine Intelligence (2016)
We describe Google's online handwriting recognition system that currently supports 22 scripts and 97 languages. The system's focus is on fast, high-accuracy text entry for mobile, touch-enabled devices. We use a combination of state-of-the-art components and combine them with novel additions in a flexible framework. This architecture allows us to easily transfer improvements between languages and scripts. This made it possible to build recognizers for languages that, to the best of our knowledge, are not handled by any other online handwriting recognition system. The approach also enabled us to use the same architecture both on very powerful machines for recognition in the cloud as well as on mobile devices with more limited computational power by changing some of the settings of the system. In this paper we give a general overview of the system architecture and the novel components, such as unified time- and position-based input interpretation, trainable segmentation, minimum-error rate training for feature combination, and a cascade of pruning strategies. We present experimental results for different setups. The system is currently publicly available in several Google products, for example in Google Translate and as an input method for Android devices.
GyroPen: Gyroscopes for Pen-input with Mobile Phones
Thomas Deselaers
Jan Hosang
IEEE Transactions on Human-Machine Systems, vol. 45 (2015), pp. 263-271
We present GyroPen, a method for text entry into mobile devices using pen-like
writing interaction reconstructed from standard built-in sensors. The key idea
is to reconstruct a representation of the trajectory of the phone's corner that
is touching a writing surface from the measurements obtained from the phone's
gyroscopes and accelerometers. We propose to directly use the angular
trajectory for this reconstruction, which removes the necessity for accurate
absolute 3D position estimation, a task that can be difficult using low-cost
accelerometers. Recognition is then performed using an off-the-shelf
handwriting recognition system, allowing easy extension to new languages and
scripts. In a small user study (n=10), the average novice participant was able
to write the first word only 37 seconds after starting to use GyroPen for
the first time. With some experience, users were able to write at the speed of
3-4s for one English word and with a character error rate of 18%.
Large-scale SVD and manifold learning
Ameet Talwalkar
Journal of Machine Learning Research, vol. 14 (2013), pp. 3129-3152
This paper presents the algorithms which power Google Correlate, a tool which finds web search terms whose popularity over time best matches a user-provided time series. Correlate was developed to generalize the query-based modeling techniques pioneered by Google Flu Trends and make them available to end users.
Correlate searches across millions of candidate query time series to find the best matches, returning results in less than 200 milliseconds. Its feature set and requirements present unique challenges for Approximate Nearest Neighbor (ANN) search techniques. In this paper, we present Asymmetric Hashing (AH), the technique used by Correlate, and show how it can be adapted to the specific needs of the product.
We then develop experiments to test the throughput and recall of Asymmetric Hashing as compared to a brute-force search. For "full" search vectors, we achieve a 10x speedup over brute force search while maintaining 97% recall. For search vectors which contain holdout periods, we achieve a 4x speedup over brute force search, also with 97% recall.
Learning Binary Codes for High Dimensional Data Using Bilinear Projections
Yunchao Gong
Svetlana Lazebnik
IEEE Computer Vision and Pattern Recognition (2013)
Recent advances in visual recognition indicate that to achieve good retrieval and classification accuracy on large scale datasets like ImageNet, extremely high-dimensional visual descriptors, e.g., Fisher Vectors, are needed. We present a novel method for converting such descriptors to compact similarity-preserving binary codes that exploits their natural matrix structure to reduce their dimensionality using compact bilinear projections instead of a single large projection matrix. This method achieves comparable retrieval and classification accuracy to the original descriptors and to the state-of-the-art Product Quantization approach while having orders of magnitude faster code generation time and smaller memory footprint.
Google Image Swirl: A Large-Scale Content-Based Image Visualization System
Yushi Jing
Jingbin Wang
David Tsai
Chuck Rosenberg
Michele Covell
WWW (2012), pp. 539-540
Large-Scale Image Annotation using Visual Synset
David Tsai
Yushi Jing
Yi Liu
Sergey Ioffe
James Rehg
Proc. International Conference on Computer Vision (ICCV) (2011)
Image Saliency: From Local to Global Context
Meng Wang
Janusz Konrad
Prakash Ishwar
Yushi Jing
Proc. Conference on Computer Vision and Pattern Recognition (CVPR) (2011)
This paper compares the efficacy and efficiency of different clustering approaches for selecting a set of exemplar images, to present in the context of a semantic concept. We evaluate these approaches using 900 diverse queries, each associated with 1000 web images, and comparing the exemplars chosen by clustering to the top 20 images for that search term. Our results suggest that Affinity Propagation is effective in selecting exemplars that match the top search images but at high computational cost. We improve on these early results using a simple distribution-based selection filter on incomplete clustering results. This improvement allows us to use more computationally efficient approaches to clustering, such as Hierarchical Agglomerative Clustering (HAC) and Partitioning Around Medoids (PAM), while still reaching the same (or better) quality of results as were given by Affinity Propagation in the original study. The computational savings is significant since these alternatives are 7-27 times faster than Affinity Propagation.
Visualizing Web Images via Google Image Swirl
Yushi Jing
Chuck Rosenberg
Jingbin Wang
Michele Covell
NIPS Workshop on Statistical Machine Learning for Visual Analytics (2009)
Face Tracking and Recognition with Visual Constraints in Real-World Videos
Minyoung Kim
Vladimir Pavlovic
IEEE Computer Vision and Pattern Recognition (CVPR) (2008)
We address the problem of tracking and recognizing
faces in real-world, noisy videos. We track faces using
a tracker that adaptively builds a target model reflecting
changes in appearance, typical of a video setting. However,
adaptive appearance trackers often suffer from drift, a gradual
adaptation of the tracker to non-targets. To alleviate this
problem, our tracker introduces visual constraints using a
combination of generative and discriminative models in a
particle filtering framework. The generative term conforms
the particles to the space of generic face poses while the discriminative
one ensures rejection of poorly aligned targets.
This leads to a tracker that significantly improves robustness
against abrupt appearance changes and occlusions,
critical for the subsequent recognition phase. Identity of the
tracked subject is established by fusing pose-discriminant
and person-discriminant features over the duration of a
video sequence. This leads to a robust video-based face recognizer
with state-of-the-art recognition performance. We
test the quality of tracking and face recognition on real-world
noisy videos from YouTube as well as the standard
Honda/UCSD database. Our approach produces successful
face tracking results on over 80% of all videos without
video or person-specific parameter tuning. The good tracking
performance induces similarly high recognition rates:
100% on Honda/UCSD and over 70% on the YouTube set
containing 35 celebrities in 1500 sequences.
This paper examines the problem of extracting low-dimensional manifold structure given millions of high-dimensional face images. Specifically, we address the computational challenges of nonlinear dimensionality reduction via Isomap and Laplacian Eigenmaps, using a graph containing about 18 million nodes and 65 million edges. Since most manifold learning techniques rely on spectral decomposition, we first analyze two approximate spectral decomposition techniques for large dense matrices (Nyström and Column-sampling), providing the first direct theoretical and empirical comparison between these techniques. We next show extensive experiments on learning low-dimensional embeddings for two large face datasets: CMU-PIE (35 thousand faces) and a web dataset (18 million faces). Our comparisons show that the Nyström approximation is superior to the Column-sampling method. Furthermore, approximate Isomap tends to perform better than Laplacian Eigenmaps on both clustering and classification with the labeled CMU-PIE dataset.
Clustering Billions of Images with Large Scale Nearest Neighbor Search
Ting Liu
Chuck Rosenberg
IEEE Workshop on Applications of Computer Vision, IEEE (2007)
Canonical Image Selection from the Web
Yushi Jing
ACM International Conference on Image and Video Retrieval (2007)
Boosting Sex Identification Performance
International Journal of Computer Vision, vol. 71 (2007), pp. 111-119
Large Scale Image-Based Adult-Content Filtering
Yushi Jing
1st International Conference on Computer Vision Theory, Setúbal, Portugal (2006)
Boosting Sex Identification Performance
Proceedings of the Seventeenth Innovative Applications of Artificial Intelligence Conference, AAAI (2005), pp. 1508-1513
The Happy Searcher: Challenges in Web Information Retrieval
Mehran Sahami
Vibhu Mittal
The Eighth Pacific Rim International Conference on Artificial Intelligence (PRICAI-2004)
Efficient Face Orientation Discrimination
Mehran Sahami
International Conference on Image Processing (ICIP-2004)
The Effect of Large Training Set Sizes on Online Japanese Kanji and English Cursive Recognizers
Manish Goyal
John Bennett
IWFHR '02: Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition (IWFHR'02), IEEE Computer Society, Washington, DC, USA (2002), pp. 36-40
Anomaly detection through registration
Mei Chen
Takeo Kanade
Dean Pomerleau
Pattern Recognition, vol. 32 (1999), pp. 113-128
Neural Network-Based Face Detection
Takeo Kanade
IEEE Trans. Pattern Anal. Mach. Intell., vol. 20 (1998), pp. 23-38
Anomaly Detection through Registration
Rotation Invariant Neural Network-Based Face Detection
Analyzing Articulated Motion Using Expectation-Maximization
Integrating Text and Face Detection for Finding Informative Poster Frames
Neural Network-Based Face Detection
Human Face Detection in Visual Scenes
Reconstructing 3-D Blood Vessel Shapes from Multiple X-Ray Images
Case Study of a Population Bottleneck: Lions of the Ngorongoro Crater
C. Packer
A. E. Pusey
D. A. Gilbert
J. Martenson
S. J. O'Brien
Conservation Biology, vol. 5 (1991)