OCR Options

OCR Plug-in Installed
Check this box if you intend to register Docs2Manage (D2M) and purchase the optional OCR Plug-in.  If D2M is already registered, purchase the optional OCR Plug-in on the website and email your computer (serial) ID code, as you did when you registered D2M.  In either case, the unlock code needed to mark D2M as having the OCR Plug-in installed will be sent to you.

DOCWORDS table KEYWORDS field length
This option should not be changed from the default of 255.  It is the expected length of the KEYWORDS field in the DOCWORDS table.  The OCR results for many documents will exceed this length.  In these cases, D2M needs to know the field length so it can split the OCR/keyword results across several records.  D2M then combines the results from those records to produce the complete set of keywords for a page or document.
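
The splitting and recombining described above can be sketched as follows.  This is only an illustration, not D2M's actual code: the function names are hypothetical, and a plain fixed-width split is assumed because this page does not say where D2M chooses to break the text.

```python
FIELD_LENGTH = 255  # default KEYWORDS field length described above


def split_keywords(text, field_length=FIELD_LENGTH):
    """Cut keyword text into consecutive chunks that each fit one record.

    Assumption: a simple fixed-width split. D2M may pick split points
    differently (for example, at word boundaries).
    """
    return [text[i:i + field_length] for i in range(0, len(text), field_length)]


def join_keywords(chunks):
    """Concatenate the chunks, in record order, back into the full text."""
    return "".join(chunks)


ocr_text = "invoice " * 100          # 800 characters of sample OCR output
records = split_keywords(ocr_text)   # one string per DOCWORDS record
restored = join_keywords(records)
```

If the field length stored here did not match the real field length, chunks would either be truncated by the database or waste space, which is presumably why the default should be left alone.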

Only OCR New Documents for keywords
Check this option if D2M should only OCR pages that have not been OCRed before.

Minimum Character Confidence (default 20%)
D2M will reject any character whose confidence is below the percentage indicated.

Acceptable Page Confidence (default 30%)
D2M will reject any page whose confidence is below the percentage indicated.  Page confidence is computed by averaging the confidences of all characters on that page.
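
As a rough illustration of how the character and page thresholds interact, here is a minimal sketch.  The function names and the (character, confidence) data layout are assumptions made for this example, and it assumes the page average includes every character, even those later rejected individually.

```python
MIN_CHAR_CONFIDENCE = 20         # percent, default from this page
ACCEPTABLE_PAGE_CONFIDENCE = 30  # percent, default from this page


def page_confidence(chars):
    """Average the confidence of all (character, confidence) pairs on a page."""
    if not chars:
        return 0.0
    return sum(conf for _, conf in chars) / len(chars)


def accepted_text(chars,
                  min_char=MIN_CHAR_CONFIDENCE,
                  min_page=ACCEPTABLE_PAGE_CONFIDENCE):
    """Return the kept characters, or None if the whole page is rejected."""
    if page_confidence(chars) < min_page:
        return None  # page confidence too low: reject the page
    # Drop individual characters that fall below the character threshold.
    return "".join(ch for ch, conf in chars if conf >= min_char)


good_page = [("D", 90), ("2", 80), ("#", 10), ("M", 70)]
poor_page = [("x", 10), ("y", 20)]
```

Here `good_page` averages 62.5%, so the page is kept and only the low-confidence "#" is dropped, while `poor_page` averages 15% and is rejected as a whole.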

Preferred Page Confidence (default 60%)
A page confidence above this value is always considered a good OCR read.  If auto-rotate is checked, a page exceeding this confidence level is considered to be in the correct orientation for OCR, and no further rotation is necessary.

Auto-rotate until preferred page confidence is met
When this option is checked, D2M will rotate the page and analyze the read confidence levels of the four orientations to determine how the page should be rotated.  During this process, if the Preferred page confidence is met, the proper rotation is considered found and testing stops to save time.  The OCR results from the rotation with the best confidence are stored as the keywords.
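
The auto-rotate behavior described above might look roughly like this.  It is a sketch under stated assumptions, not D2M's implementation: `ocr_at` is a hypothetical callback that OCRs the page at a given angle and returns (text, page confidence), the order in which rotations are tried is assumed, and a confidence exactly equal to the preferred value is treated as meeting it.

```python
PREFERRED_PAGE_CONFIDENCE = 60  # percent, default from this page


def best_rotation(ocr_at, preferred=PREFERRED_PAGE_CONFIDENCE):
    """Try the four orientations and return (angle, text, confidence).

    Stops early as soon as one orientation meets the preferred confidence;
    otherwise returns the orientation with the highest confidence.
    """
    best = None
    for angle in (0, 90, 180, 270):  # assumed order of rotations
        text, conf = ocr_at(angle)
        if best is None or conf > best[2]:
            best = (angle, text, conf)
        if conf >= preferred:
            break  # good enough: skip the remaining rotations
    return best


# Fake OCR results standing in for a real OCR engine.
fake_results = {
    0: ("", 12),
    90: ("invoice total", 75),
    180: ("x", 20),
    270: ("y", 15),
}
tried = []


def fake_ocr(angle):
    tried.append(angle)
    return fake_results[angle]


result = best_rotation(fake_ocr)
```

In this example the page reads well at 90 degrees, so only two of the four orientations are OCRed.  A page that never reaches the preferred confidence costs four full OCR passes, which is worth keeping in mind when testing this option.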

Extract Keywords from MS Word Documents
(Supported formats: .pdf, .doc, .rtf, .wpd, .wri, .txt, .xls, .eml, .msg, .htm, .html, .xml)
When this option is checked, D2M will attempt to extract keywords from MS Word Documents when they are stored as a file (not as an OLE Windows Document).  These keywords will be stored the same way as those of an OCRed image page, even if the MS Word Document has several pages.  To use this feature, you must have the OCR Plug-in and MS Word installed.

Keywords to Exclude
These are words that D2M will discard because they will not aid in searching later.  Words like "and", "or", "but", and "what" are too common to be considered useful.  You may add, remove, or edit words on this list.  There should be only one word per line in this box.
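
The effect of the exclusion list can be sketched as below.  The function names are hypothetical, and matching without regard to upper or lower case is an assumption; this page does not say how D2M compares words.

```python
def load_exclusions(box_text):
    """Read one excluded word per line, ignoring blank lines."""
    return {line.strip().lower() for line in box_text.splitlines() if line.strip()}


def filter_keywords(words, excluded):
    """Keep only the words that are not on the exclusion list."""
    # Assumption: comparison ignores case.
    return [w for w in words if w.lower() not in excluded]


exclusions = load_exclusions("and\nor\nbut\nwhat\n")
kept = filter_keywords("Invoice and receipt or refund".split(), exclusions)
```

With this list, only "Invoice", "receipt", and "refund" would remain as keywords.  Because each line is treated as a single word, an entry containing several words on one line would not match anything in this sketch.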

NOTE:  Please test how these settings will work with the documents you add to the system, especially if you plan to use the auto-rotate feature.