
Ray developed the Tesseract OCR engine at HP Labs Bristol for 10 years, then spent 3 years developing the text and line-drawing pipelines for the HP PrecisionScan product in Greeley, Colorado. After a further 7 years developing a new architecture for the OmniPage OCR product for Caere/ScanSoft/Nuance, Ray is now at Google, working on Tesseract again.
Adapting the Tesseract Open Source OCR Engine for Multilingual OCR, Ray Smith, Daria Antonova, Dar-Shyang Lee, 2009.
Combined Orientation and Script Detection using the Tesseract OCR Engine, Ranjith Unnikrishnan, Ray Smith, Workshop on Multilingual OCR (MOCR), Proc. 10th Intl. Conf. on Document Analysis and Recognition (ICDAR), 2009.
Hybrid Page Layout Analysis via Tab-Stop Detection, Ray Smith, Proc. 10th Intl. Conf. on Document Analysis and Recognition (ICDAR), 2009.
An Overview of the Tesseract OCR Engine, Ray Smith, Proc. Ninth Int. Conference on Document Analysis and Recognition (ICDAR), 2007, pp. 629-633.
A simple and efficient skew detection algorithm via text row accumulation, Ray Smith, Proceedings 3rd ICDAR'95, 1995, pp. 1145-1148.
Computer processing of line images: a survey, R. W. Smith, Pattern Recogn., vol. 20 (1987), pp. 7-15.