Ray Smith

Ray developed the Tesseract OCR engine at HP Labs Bristol for 10 years, then spent 3 years developing the text and line-drawing pipelines for the HP PrecisionScan product in Greeley, Colorado. After a further 7 years developing a new architecture for the OmniPage OCR product at Caere/ScanSoft/Nuance, Ray is now at Google, working on Tesseract again.

Google Publications

Improving Book OCR by Adaptive Language and Image Models

Dar-Shyang Lee, Ray Smith

Proceedings of 2012 10th IAPR International Workshop on Document Analysis Systems, IEEE, pp. 115-119

Limits on the Application of Frequency-based Language Models to OCR

Ray Smith

ICDAR, IEEE (2011), pp. 538-542

Table Detection in Heterogeneous Documents

Faisal Shafait, Ray Smith

Document Analysis Systems 2010, ACM International Conference Proceedings series

Adapting the Tesseract Open Source OCR Engine for Multilingual OCR

Ray Smith, Daria Antonova, Dar-Shyang Lee

MOCR '09: Proceedings of the International Workshop on Multilingual OCR (2009)

Combined Orientation and Script Detection using the Tesseract OCR Engine

Ranjith Unnikrishnan, Ray Smith

Workshop on Multilingual OCR (MOCR), Proc. 10th Intl. Conf. on Document Analysis and Recognition (ICDAR), (2009)

Hybrid Page Layout Analysis via Tab-Stop Detection

Ray Smith

Proceedings of the 10th International Conference on Document Analysis and Recognition, IEEE (2009)

An Overview of the Tesseract OCR Engine

Ray Smith

Proc. Ninth Int. Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society (2007), pp. 629-633

Previous Publications

A simple and efficient skew detection algorithm via text row accumulation

Ray Smith

Proceedings 3rd ICDAR'95, IEEE (1995), pp. 1145-1148

Computer processing of line images: a survey

R. W. Smith

Pattern Recogn., vol. 20 (1987), pp. 7-15
