Translation-Inspired OCR
Venue
ICDAR-2011
Publication Year
2011
Authors
Dmitriy Genzel, Ashok C. Popat, Nemanja Spasojevic, Michael Jahr, Andrew Senior, Eugene Ie, Frank Yung-Fong Tang
Abstract
Optical character recognition is carried out using techniques borrowed from
statistical machine translation. In particular, the use of multiple simple feature
functions in linear combination, along with minimum-error-rate training, integrated
decoding, and $N$-gram language modeling, is found to be remarkably effective
across several scripts and languages. Results are presented using both synthetic
and real data in five languages.
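
The linear combination of feature functions mentioned in the abstract can be illustrated with a small sketch. It is not the authors' system: the feature names (`visual`, `lm`, `length`), the bigram table, the candidate strings, and the weights are all hypothetical values chosen for illustration. In a real setup the weights would be tuned, for example by minimum-error-rate training, rather than set by hand as they are here.

```python
import math

# Toy character-bigram probabilities. Every value is made up for illustration.
BIGRAM_PROB = {
    ("c", "l"): 0.05, ("l", "e"): 0.10, ("e", "a"): 0.08, ("a", "r"): 0.09,
    ("c", "1"): 0.001, ("1", "e"): 0.001,
}
UNSEEN_PROB = 1e-4  # crude floor for bigrams missing from the table


def lm_feature(text):
    """Log-probability of the text under the toy bigram language model."""
    return sum(math.log(BIGRAM_PROB.get(pair, UNSEEN_PROB))
               for pair in zip(text, text[1:]))


def length_feature(text):
    """Number of characters, a simple length feature."""
    return float(len(text))


def score(text, visual_logprob, weights):
    """Weighted sum of feature values: sum_i weight_i * h_i(text, image)."""
    features = {
        "visual": visual_logprob,   # assumed to come from an image model
        "lm": lm_feature(text),
        "length": length_feature(text),
    }
    return sum(weights[name] * value for name, value in features.items())


def decode(candidates, weights):
    """Return the candidate text with the highest combined score."""
    return max(candidates, key=lambda c: score(c[0], c[1], weights))[0]


# Two hypothetical readings of the same word image, each paired with a
# made-up visual log-probability. The misreading looks slightly better visually.
CANDIDATES = [("clear", math.log(0.40)), ("c1ear", math.log(0.45))]
WEIGHTS = {"visual": 1.0, "lm": 0.5, "length": 0.0}

best = decode(CANDIDATES, WEIGHTS)
```

With these hand-set weights the language-model feature outweighs the small visual preference for the misreading. Setting the `lm` weight to zero leaves only the visual feature, and the decoder then picks the other candidate, which is the kind of trade-off that weight tuning is meant to settle.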
