RecAPI
Module name: | OMR |
Module identifier: | RM_OMR |
Filling methods supported: | FM_OMR |
Filters supported: | ignores all filter settings |
Trade-off supported: | none |
Knowledge base file: | none |
Training file supported: | no |
This module is included only in the Professional Recognition Kit (not the OCR kit). To make this technology available in your application, it must be covered by your distribution licensing. See the topic on Licensing in the General Information help system.
This recognition module is used for recognizing optical marks (checkmarks). Typical application areas are questionnaires, ballot papers, educational tests, and reporting or ordering sheets, where the documents to be processed are form-like and filled in by respondents, usually by hand.
IMPORTANT NOTE: Autozoning cannot find OMR zones. Therefore either manually (programmatically) created user zones (see FM_OMR and RM_OMR) or pre-defined form templates (see how to use form templates) must be used.
Checkmark zones are bounded by printed frames, which are visible on the input document but may be visible or invisible in the image passed to the recognition module, depending on whether dropout colors were used during scanning. The accuracy of this module can be improved by specifying whether the frames are visible in the image.
The values "frames visible" and "frames invisible" give higher accuracy than "auto-detect". This recognition module is not influenced by the recognition trade-off setting.
Sometimes the user's OMR zones cut through the frames (for example, when the same zone file is applied to every image of the same form). The module can extend beyond the zone border in order to process the whole frame, but this is not the default behavior. To enable it, set Kernel.Ocr.OMR.ZoneCorrection to true.
See also the topic Instructions to respondents below.
The frame can be a rectangle, a circle, an ellipse, etc.; it can be shaded. It may be visible or invisible in the image sent for recognition. The frame should measure at least 45-50 pixels in each direction, which is roughly 3.8 to 4.2 mm (about 0.15 to 0.17 inch) at 300 dpi resolution.
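The relationship between the pixel minimum and the physical frame size depends on the scanning resolution, and it can be checked with a short calculation. The following is a minimal sketch in Python, not part of the SDK; the function names and the choice of 45 pixels as the threshold (the lower end of the range given above) are assumptions made for illustration.

```python
MM_PER_INCH = 25.4


def pixels_to_mm(pixels: float, dpi: float) -> float:
    """Convert a length in pixels to millimetres at the given resolution."""
    return pixels / dpi * MM_PER_INCH


def frame_meets_minimum(width_px: int, height_px: int, min_px: int = 45) -> bool:
    """Return True if the frame reaches the minimum size in both directions.

    min_px = 45 is an assumed threshold taken from the lower end of the
    45-50 pixel range quoted in the text.
    """
    return width_px >= min_px and height_px >= min_px


if __name__ == "__main__":
    for px in (45, 50):
        print(f"{px} px at 300 dpi = {pixels_to_mm(px, 300):.2f} mm")
    print(frame_meets_minimum(48, 52))
```

When designing a form for a scanner that runs at a different resolution, the same conversion shows how large the printed frame has to be to reach the pixel minimum.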
This module has been tested on an image with more than 1300 OMR zones.
An OMR (optical mark) zone is unique in that its output always consists of precisely one digit. It can be defined to be one of two or one of three values. When there are two possible values, these are zero (0) for unfilled, one (1) for filled. When three values are possible, the additional value is two (2) for "filled-in-error" (see below).
The safest way to link the output values with the checkboxes that generated them is through the LETTER structure output, which contains the zone number and the coordinates (zone, left, top, width, height). This can also help prevent checkmark data from being confused with barcode values or other non-checkmark data coming from the same page.
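The linking step can be sketched as follows. This is an illustrative Python model only and does not call the SDK: the Letter class is a hypothetical stand-in that carries just the fields named above plus an assumed code field for the recognized character, and omr_results_by_zone and MarkState are names invented for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class MarkState(IntEnum):
    """The three possible OMR output digits."""
    UNFILLED = 0
    FILLED = 1
    FILLED_IN_ERROR = 2


@dataclass(frozen=True)
class Letter:
    """Hypothetical stand-in for one recognized character record."""
    zone: int
    left: int
    top: int
    width: int
    height: int
    code: str  # assumed field holding the recognized character


def omr_results_by_zone(letters, omr_zones):
    """Map each OMR zone number to (state, (left, top, width, height)).

    Records from zones that are not in omr_zones (for example barcode
    zones) are skipped, so their digits cannot be mistaken for checkmarks.
    """
    results = {}
    for letter in letters:
        if letter.zone not in omr_zones:
            continue
        if letter.code not in ("0", "1", "2"):
            raise ValueError(f"zone {letter.zone}: unexpected OMR output {letter.code!r}")
        rect = (letter.left, letter.top, letter.width, letter.height)
        results[letter.zone] = (MarkState(int(letter.code)), rect)
    return results
```

Keying the results by zone number rather than by position in the output text keeps the mapping stable even when other zones on the page produce a variable number of characters.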
If a page contains mainly or only checkmark data, the output converters "Text - Tab Delimited", "Text - Comma Delimited" or "Excel 97, 2000" can be used to load the data into a spreadsheet program for further analysis and presentation.
The filled-in-error feature allows the application to handle checkboxes that were filled by mistake. This feature is available only with the KernelAPI.
The respondent in this case should completely blacken the frame or checkmark area before marking a new choice. It is not essential that the area be completely blackened, but it must be significantly darker and denser than a "filled" (checked) zone. A "filled-in-error" zone generates an output value 2. This feature functions only if two conditions are met:
The filled-in-error feature functions only on grouped zones. Recognition results for each zone in a group will be 0, 1 or 2. There should be only one filled zone per group plus optionally one filled-in-error. When designing a checkmark document, all zones in a group should have the same checkbox style and size. Up to 32 OMR zones can be grouped.
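An application may want to verify the group rules after recognition. The following Python sketch is one possible check and is not part of the SDK; the function name check_group and its return shape are invented for this example, and treating a group with no filled zone as "no selection" rather than as a violation is an assumption.

```python
MAX_GROUP_SIZE = 32  # upper limit on grouped OMR zones stated in the text


def check_group(values):
    """Validate the recognition results (0, 1 or 2) of one zone group.

    Returns (selected_index, problems): selected_index is the position of
    the single filled zone, or None if there is no unambiguous selection;
    problems lists every rule the group violates.
    """
    if not 1 <= len(values) <= MAX_GROUP_SIZE:
        raise ValueError(f"a group must contain 1 to {MAX_GROUP_SIZE} zones")
    if any(v not in (0, 1, 2) for v in values):
        raise ValueError("OMR results must be 0, 1 or 2")

    filled = [i for i, v in enumerate(values) if v == 1]
    corrected = [i for i, v in enumerate(values) if v == 2]

    problems = []
    if len(filled) > 1:
        problems.append("more than one filled zone")
    if len(corrected) > 1:
        problems.append("more than one filled-in-error zone")

    # Assumption: an empty group is reported as "no selection", not an error.
    selected = filled[0] if len(filled) == 1 and not problems else None
    return selected, problems
```

For example, the results [2, 1, 0] describe a respondent who blackened the first box and then chose the second, so the check returns index 1 with no problems, while [1, 1, 0] is flagged for manual review.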
In CSDK versions earlier than v15, OMR zones could be grouped by modifying the seq field in the zone structure. From v15 onward this is not necessary, thanks to the notion of pizzabox zones. OMR zones can be grouped by collecting them in one pizzabox zone even if they are not touching (i.e. even if one criterion of the pizzabox shape is not fulfilled).
OMR processing requires a high degree of accuracy. The two-value detection is inherently accurate; three-value detection is more difficult. Good document design and clear instructions to respondents are very important in getting high accuracy. Printing model samples of ideally filled and filled-in-error checkboxes in the instructions is recommended. Respondents should be urged to fill in the document with a dark blue or black pen. Pencils are to be avoided, as are pens with an ink color close to a dropout color on the scanner to be used.