
Ray Smith
Ray developed the Tesseract OCR engine at HP Labs Bristol for 10 years, then spent 3 years developing the text and line-drawing pipelines for the HP PrecisionScan product in Greeley, Colorado. After a further 7 years developing a new architecture for the OmniPage OCR product for Caere/ScanSoft/Nuance, Ray is now at Google, working on Tesseract again.
Authored Publications
Improving Book OCR by Adaptive Language and Image Models
Dar-Shyang Lee
Proceedings of 2012 10th IAPR International Workshop on Document Analysis Systems, IEEE, pp. 115-119
Limits on the Application of Frequency-based Language Models to OCR
ICDAR, IEEE (2011), pp. 538-542
Table Detection in Heterogeneous Documents
Faisal Shafait
Document Analysis Systems 2010, ACM International Conference Proceedings series
Hybrid Page Layout Analysis via Tab-Stop Detection
Proceedings of the 10th International Conference on Document Analysis and Recognition, IEEE (2009)
Adapting the Tesseract Open Source OCR Engine for Multilingual OCR
Daria Antonova
Dar-Shyang Lee
MOCR '09: Proceedings of the International Workshop on Multilingual OCR (2009)
Combined Orientation and Script Detection using the Tesseract OCR Engine
Ranjith Unnikrishnan
Workshop on Multilingual OCR (MOCR), Proc. 10th Intl. Conf. on Document Analysis and Recognition (ICDAR) (2009)
An Overview of the Tesseract OCR Engine
Proc. Ninth Int. Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society (2007), pp. 629-633