Ray Smith

Ray spent 10 years developing the Tesseract OCR engine at HP Labs Bristol, followed by 3 years developing the text and line-drawing pipelines for the HP PrecisionScan product in Greeley, Colorado. After a further 7 years developing a new architecture for the OmniPage OCR product at Caere/ScanSoft/Nuance, Ray is now at Google, working on Tesseract again.

Google Publications

  •    

    Improving Book OCR by Adaptive Language and Image Models

    Dar-Shyang Lee, Ray Smith

    Proceedings of 2012 10th IAPR International Workshop on Document Analysis Systems, IEEE, pp. 115-119

  •    

    Limits on the Application of Frequency-based Language Models to OCR

    Ray Smith

    ICDAR, IEEE (2011), pp. 538-542

  •    

    Table Detection in Heterogeneous Documents

    Faisal Shafait, Ray Smith

    Document Analysis Systems 2010, ACM International Conference Proceedings series

  •    

    Adapting the Tesseract Open Source OCR Engine for Multilingual OCR

    Ray Smith, Daria Antonova, Dar-Shyang Lee

    MOCR '09: Proceedings of the International Workshop on Multilingual OCR (2009)

  •    

    Combined Orientation and Script Detection using the Tesseract OCR Engine

    Ranjith Unnikrishnan, Ray Smith

    Workshop on Multilingual OCR (MOCR), Proc. 10th Intl. Conf. on Document Analysis and Recognition (ICDAR), (2009)

  •    

    Hybrid Page Layout Analysis via Tab-Stop Detection

    Ray Smith

    Proceedings of the 10th International Conference on Document Analysis and Recognition, IEEE (2009)

  •    

    An Overview of the Tesseract OCR Engine

    Ray Smith

    Proc. Ninth Int. Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society (2007), pp. 629-633

Previous Publications

  •   

    A simple and efficient skew detection algorithm via text row accumulation

    Ray Smith

    Proceedings 3rd ICDAR'95, IEEE (1995), pp. 1145-1148

  •   

    Computer processing of line images: a survey

    R. W. Smith

    Pattern Recognition, vol. 20 (1987), pp. 7-15