Visioneer Knowledgebase
Title: Optimizing OCR Results - Optical Character Recognition
Article ID: VIS1031
Updated: 8/24/2004
Operating Systems: Windows XP / 2000 / Me / 98
Scanner Models: Visioneer Flatbed: 9520, 9420, 7400, 9320, 9220, 9120, 9020, 9000, 8900, 8800, 8700, 8600, 8100, 7700, 7600, 7300, 7100, 6600, 6400, 7600, 6200, 6100, 6000, 5800, 5600, 5300, 4800, 4500, 4400, 3300, 3100, 3000, PaperPort OneTouch
  Visioneer SheetFed: XP300, XP220, XP200, XP100, RoadWarrior (RW120, PRW120), StrobePro, PaperPort Strobe, PaperPort IX, VX, MX
  Visioneer ADF: PT480, XP470, XP450, PT430, S500, PT680, PT780, 9750, 9650, 9450, 8650
  Xerox Flatbed: X6400, X7600, X4800, X2400
  Xerox SheetFed: TS100, CS200
  Xerox ADF: DM250, DM250L, DM252, DM262, DM262i, DM272, DM150, DM152, DM162, DM510, DM515, DM520, DM632, DM3640, DM752
 
Symptom:
 
You are getting garbled OCR results: characters are missing or inaccurate when you send the document from PaperPort to your word-processing application.
 
Cause:
 
The PaperPort software bundled with our scanners contains the TextBridge OCR engine. When a scanned document is dropped onto the OCR/word-processing link, the majority of plain text items are recognized character by character. However, the resulting document may contain misspellings, strange characters, or may not contain all the formatting and graphics of the original document.

The Visioneer XP470 and XP300 scanners, as well as the Xerox DocuMate 250L, 252, 262, 262i, 272, and 632 scanners, include the OmniPage Pro 12.0 OCR software. The Visioneer 9450, 9650, 9750, XP450, XP100, and XP200 scanners, as well as the Xerox DocuMate 510, 515, 520, and 250 scanners, are provided with a full version of the TextBridge Pro 9.0 software. These are additional software packages and are independent of the PaperPort software.

When the OCR program has difficulty recognizing characters, it is usually for one of these reasons: the characters are part of (or surrounded by) a graphic image and are therefore ignored as a picture item, the text is smaller than 10 point, or the document quality is poor (e.g. faxed copies or newsprint).

Please click VIS1019 - What You Need To Know About OCR to learn more about OCR.

Note: When converting your scans to an MS-Excel spreadsheet, very large numbers become inaccurate or are converted to scientific notation. This is a limitation of how MS-Excel handles data: the precision of numerical data is limited to 15 digits.

Please see the Microsoft article on Excel Specifications. Its "Calculation Specifications" section lists all the details and limitations relating to this issue.
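
The 15-digit limit in the note above can be illustrated with a short Python sketch. It is only an approximation for illustration, not MS-Excel's own code: it assumes that digits after the 15th are replaced with zeros, and it handles whole numbers only. The function names are hypothetical.

```python
EXCEL_DIGIT_LIMIT = 15  # precision limit stated in the note above


def keep_significant_digits(number_text: str, limit: int = EXCEL_DIGIT_LIMIT) -> str:
    """Zero out every digit after the first `limit` digits of a whole number.

    Assumption: this mimics the visible effect of the limit; it is not
    how the spreadsheet is implemented internally.
    """
    if not number_text.isdigit():
        raise ValueError("expected a string of decimal digits")
    stripped = number_text.lstrip("0") or "0"  # leading zeros carry no precision
    if len(stripped) <= limit:
        return stripped
    return stripped[:limit] + "0" * (len(stripped) - limit)


def loses_precision(number_text: str, limit: int = EXCEL_DIGIT_LIMIT) -> bool:
    """Return True if the number would change under the digit limit."""
    stripped = number_text.lstrip("0") or "0"
    return keep_significant_digits(number_text, limit) != stripped


if __name__ == "__main__":
    for value in ["123456789012345", "1234567890123456789"]:
        print(value, "->", keep_significant_digits(value), loses_precision(value))
```

A check like this could be run on recognized numbers (for example account or serial numbers) to spot values that would not survive the conversion unchanged.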
 
Solution:
 
For best OCR accuracy use:
  • An original printed document.
  • A scanned image in which the lines of text are horizontal (skewed or tilted images will not OCR correctly because the OCR engine reads content horizontally, left to right).
  • A document which is free of lines, marks or smudges.
  • A document whose characters are distinct and separate from each other and are not bleeding together (characters that are faded or do not have distinct edges will not OCR correctly).
  • A document without underlined characters (Documents with underlined text are difficult to recognize accurately because the underline changes the shape of the letters, especially the letters g, j, p, q and y).
  • A document free of handwritten notes, lines, or doodles. Anything that is not printed text will hinder the OCR recognition process.
  • A document with black text on a white background. (Colored text, gray backgrounds, images or logos may cause problems.)
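
To make the point about skewed images concrete, the following Python sketch measures how far a line of text is tilted from horizontal. It is an illustration only and is not part of PaperPort or any OCR engine: the function name is hypothetical, and it assumes you have read two pixel coordinates off the same text baseline (for example in an image viewer).

```python
import math


def skew_angle_degrees(left, right):
    """Angle of a text baseline relative to horizontal, in degrees.

    `left` and `right` are (x, y) pixel coordinates taken from the start
    and end of one line of text. In image coordinates y grows downward,
    so a positive result means the line slopes down toward the right.
    """
    dx = right[0] - left[0]
    dy = right[1] - left[1]
    if dx == 0 and dy == 0:
        raise ValueError("the two points must be different")
    return math.degrees(math.atan2(dy, dx))


if __name__ == "__main__":
    # A baseline that drops 35 pixels over a width of 2000 pixels.
    print(round(skew_angle_degrees((100, 400), (2100, 435)), 2))
```

An angle of 0 means the line is level. The larger the angle, the more reason there is to straighten the page and scan it again.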
 
OCR accuracy can be improved by using the following techniques:
  • Scan at a Proper Resolution (DPI) - Make sure you've changed your scan settings so that you are scanning at or above 300 dpi. For most documents, 300 dpi is the best resolution but if you have small fonts (less than 9 point) try scanning at 400 dpi.
  • Scan At the Proper Brightness - When scanning, make sure you have the proper brightness set for your scanner. If individual letters are broken or illegible, adjust the brightness level so each individual letter has no broken lines.
  • Keep Documents Straight - Documents that are scanned in at an angle are more difficult to recognize than straight ones. Make sure that each page has been scanned so that all the lines of text are straight.
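
The resolution guidance above can be written as a small rule. The Python sketch below uses the thresholds from this article (300 dpi for most documents, 400 dpi for fonts smaller than 9 point); the function names are hypothetical and are not scanner or PaperPort settings. The second function relies only on the standard definition of a typographic point as 1/72 of an inch, and shows roughly how many pixels tall a given font size is at a given resolution.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 of an inch


def suggested_dpi(smallest_font_pt: float) -> int:
    """Return 400 dpi for fonts smaller than 9 point, otherwise 300 dpi."""
    if smallest_font_pt <= 0:
        raise ValueError("font size must be positive")
    return 400 if smallest_font_pt < 9 else 300


def font_height_pixels(font_pt: float, dpi: int) -> float:
    """Approximate height in pixels of a font size scanned at `dpi`."""
    return font_pt * dpi / POINTS_PER_INCH


if __name__ == "__main__":
    for size in (12, 8):
        dpi = suggested_dpi(size)
        print(size, "pt ->", dpi, "dpi,", round(font_height_pixels(size, dpi), 1), "px tall")
```

Smaller fonts cover fewer pixels at the same resolution, which is why a higher dpi setting is suggested for them.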
 
If Document Integrity or higher OCR accuracy is desired, consider the following:
  • ScanSoft TextBridge Pro software, perfect for Small Office, Home Office OCR needs
  • Nuance (formerly ScanSoft) OmniPage Pro, a professional grade and more robust OCR product

For example: The Xerox DocuMate 252 will scan both sides of several pages of a document, saving the document as a Text-Over-Image PDF, using PaperPort 9 Pro Office and OmniPage 12 for both Document Integrity (original image) and searchable text capture (www.xeroxscanners.com).
 
Library:
 
VIS1032 - Schedule Automatic OCR in OmniPage 12

 

Visioneer Disclaimer


Visioneer provides these technical articles for information use only. The information is generally for a specific scanner model distributed by Visioneer and a designated version of software provided with the scanner. Visioneer makes reasonable efforts to verify the accuracy of content and issue resolution in these technical articles but cannot guarantee any matter including accuracy or results. The articles are provided "as is", without representation or warranty, express or implied, whether of merchantability, fitness for particular purpose, title, or non-infringement. Visioneer disclaims any liability for damages, whether direct or indirect, special, incidental, or consequential, from use of the information in these articles. Visioneer does not evaluate any effect on software and hardware not provided by Visioneer, and therefore disclaims any liability for same. Visioneer is not responsible for the content of support pages accessed through external links. The articles are subject to revision or change without notice.