Finding Images and Line Drawings in Document-Scanning Systems

Shumeet Baluja

Michele Covell

Proc. International Conference on Document Analysis and Retrieval, IAPR (2009)

Download Google Scholar

Abstract

This work addresses the problem of finding images and line-drawings in scanned pages. It is a crucial processing step in the creation of a large-scale system to detect and index images found in books and historic documents. Within the scanned pages that contain both text and images, the images are found through the use of local-feature extraction, applied across the full scanned page. This is followed by a novel learning system to categorize the local features into either text or image. The discrimination is based on using multiple classifiers trained via stochastic sampling of weak classifiers for each AdaBoost stage. The approach taken in sampling includes stochastic hill climbing across weak detectors, allowing us to reduce our classification error by as much as 25% relative to more naive stochastic sampling. Stochastic hill climbing in the weak classifier space is possible due to the manner in which we parameterize the weak classifier space. Through the use of this system, we improve image detection by finding more line-drawings, graphics, and photographs, as well as reducing the number of spurious detections due to misclassified text, discoloration, and scanning artifacts.

Research Areas

Machine Intelligence

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Finding Images and Line Drawings in Document-Scanning Systems

Abstract

Research Areas

Meet the teams driving innovation

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Finding Images and Line Drawings in Document-Scanning Systems

Abstract

Research Areas

Meet the teams driving innovation

AI/ML Foundations  & Capabilities