RecAPI
Recognition

The Engine can load a number of recognition modules. It is license-dependent whether a given recognition engine is accessible or not. For information about licensing, please see the General Information help system. The User can control the engine running in a given ZONE. This allows the integrating application to perform "multi-module" recognition on any page.

NOTE: The MTX, DOT, MAT and HNR modules are supported on Windows only; RER is supported on Windows, Linux and Mac OS X.

The enum RECOGNITIONMODULE lists all the possible recognition modules; these are tightly integrated to the Engine:

  • PLUS3W omnifont module for machine-printed text (most accurate),
  • PLUS2W omnifont module for machine-printed text (default),
  • MTX omnifont module for machine-printed text,
  • MOR multi-language omnifont module for machine-printed text,
  • FRX multi-language omnifont module for machine-printed text,
  • HNR module for handprinted digits,
  • RER third-party module for handprinted alphanumerical characters,
  • DOT module for 9-pin draft dot-matrix printouts,
  • OMR module for optical marks (checkmarks),
  • MAT matrix matching module,
  • BAR module for 1D and 2D barcodes.


The rm field of any user zone should contain one of the elements of the enum type mentioned above. One value is special: RM_AUTO means that the Engine will choose the module most likely to be appropriate. It does this first of all by consulting the filling method set for the zone.

The FILLINGMETHOD describes the type of data expected in the zone, e.g. a barcode, handprinted text or machine-generated text. A degree of auto-detection is available for the filling method via the kRecDetectFillingMethod function, which is useful when the precise filling method used on incoming documents is not known in advance. It is the User's responsibility to specify a valid recognition module-filling method pair. Any incorrectly set zone will have no recognition result.

RM_AUTO reads the filling method; if only one recognition module is suitable, it is used. When there is a choice, RM_AUTO uses various checks (character set, image size, etc.) to select the best one. Thus, it protects against an invalid FM-RM pair.

If the recognition module is not present when it is needed, the recognition function (kRecRecognize) returns with API_MODULEMISSING_ERR, and there will be no recognized data for the zone concerned. To avoid this risk, we recommend verifying the presence and correct installation of the necessary recognition modules by calling kRecGetModulesInfo right after the Engine's initialization.

The recognition process is typically initiated by the function kRecRecognize. The other way is to call the function kRecProcessPages. The former processes an already loaded page, the latter gets one or more image files as input.

Both methods work on the zones of the page. Internally, the recognition process operates on B/W images. If the loaded image is not a B/W one, the required conversion is performed during the loading or preprocessing.

The kRecProcessPages function is a so-called one-touch (or one-step) processing function, because it performs image loading (either from files or from a scanner), preprocessing, loading of user zones (if specified), recognition and output conversion in a single call by the User. Furthermore, this single call is enough to process more than one image. Of course, the one-touch function uses the usual settings.

The document level processing also has one-touch solutions.

IMPORTANT NOTES

The default settings of OmniPage 20 (Nuance's desktop application) and OmniPage Capture SDK 20 are not the same. By default, RecAPI of the CSDK does not run in its most accurate mode, but in a faster, less accurate mode that is a good compromise between speed and accuracy. It can easily be switched into the most accurate mode by setting Kernel.OcrMgr.PreferAccurateEngine to true. This most accurate mode of the CSDK is equivalent to the default of the desktop application. See also kRecSetDefaultRecognitionModule and its notes.

The recognition result is stored in the letter array of the HPAGE. The Recognition Data Handling Module is available for its maintenance. Each recognized character is represented by a LETTER structure containing all the accessible information about the given character (character code, position, size, confidence of the recognition, additional possible tips, font information, formatting information, etc.). This recognition data is directly accessible by kRecGetLetters; it is also the input for the output conversion.

The LETTER structure contains an err field for reporting the confidence of the recognized character. This is a combined value; if it is 64 or greater, the character is considered suspicious. Of course, this is only a recommended threshold. For more information, see confidence issues. Another tool for accurately reporting the opinion of the recognition engine(s) is the use of alternatives. The running recognition engine may have more than one tip for each character. In this case, LETTER provides access to the higher-order choices of the character code. For more information, see the usage of alternatives.

Note:
The following code sample recognizes a page and prints the recognition result onto the standard output marking the suspicious characters.
    RECERR rc;
    ...
    // Load image.
    HPAGE hPage;
    rc = kRecLoadImgF(0, "testimage.tif", &hPage, 0);
    // Preprocess image.
    rc = kRecPreprocessImg(0, hPage);
    // Locate zones.
    rc = kRecLocateZones(0, hPage);
    // Recognize image.
    rc = kRecRecognize(0, hPage, NULL);

    // Get recognition result.
    LETTER *pLetters;
    long nLetters;
    rc = kRecGetLetters(hPage, II_CURRENT, &pLetters, &nLetters);
    // Print recognition result.
    for (int i = 0; i < nLetters; i++)
    {
        if (pLetters[i].code == UNICODE_REJECTED)
            putwchar(L'~');
        else if (pLetters[i].code == UNICODE_MISSING)
            putwchar(L'^');
        else
            putwchar(pLetters[i].code);
        if (pLetters[i].err >= RE_SUSPECT_THR)
            putwchar(L'*');
    }

    // Free up recognition results given back by the kRecGetLetters function.
    rc = kRecFree(pLetters);
    // Free up page.
    rc = kRecFreeImg(hPage);
    ...

OCR performance issues

Several factors should be taken into account when you want to improve accuracy. Typically they also have consequences for processing speed.

  • Image quality - This is one of the most important factors influencing accuracy. A resolution of 300 or 400 dpi is best for recognition. Use image preprocessing to enhance the quality of the given image and so obtain more accurate auto-zoning and recognition. Load grayscale or color images into the Engine without any primary conversion (see IMG_CONVERSION for more information) and combine this with image preprocessing in order to get more accurate output. Our OCR produces more accurate results on grayscale or color images, because they carry more bit information than B/W images. In addition, the OCR engines are adapted to the result of our own binarization methods.
  • Auto-zoning algorithm to be applied - If auto-zoning is needed, use DCM_STANDARD, the most accurate page parser available in the Engine. On the other hand, in some complex cases it might require significantly more time to complete. If processing speed is also important, consider using the algorithm DCM_LEGACY. Finally, use DCM_FAST, when the processing speed has a higher priority than accuracy and the image content of the originals is simple enough (e.g. good quality letters without graphics or tables). In this case, it is recommended to disable the non-gridded table detection algorithm (kRecSetNongriddedTableDetect).
  • Correct RM-FM choice - The most suitable filling method and recognition module should be specified, as described above. If more than one recognition module supports a given filling method, consider whether it is better to specify the recognition module in your program rather than accept the setting selected by the Engine through the RM_AUTO option.
  • Trade-off settings - These are set at page level, and determine how thorough the recognition should be, i.e. whether more time can be taken to try for higher accuracy or not. There are three possible values: TO_ACCURATE, TO_BALANCED and TO_FAST; these are passed to the kRecSetRMTradeoff function.
  • Module specific settings - The behavior, and hence accuracy, of some recognition modules can be influenced by separate module-specific settings. See the list of all the RecAPI settings and the following functions:
    • barcode recognition: kRecSetBarTypes - The barcode module auto-detects the barcode type. This function can be used to limit the acceptable choices. If it is certain that only one type will be used, it is possible to validate only that type. By default, five main types are enabled.
    • HNR module: kRecSetHnrParams, kRecSetHnrStyle - Accuracy can be improved if the application can tell the module the precise location of characters (e.g. they occur in a structured form with comb or cell boxes). If this can't be done, the module will auto-detect their positions. It is also possible to specify European or American handwriting style. To recognize zones with hand-written numbers and no letters, consider choosing RM_HNR rather than RM_RER.
    • MOR, PLUS2W and PLUS3W modules: kRecSetMorFaxed - For poor-quality fax output, the fax compensation can be used.
    • OMR module: kRecSetOmrParams - This module has a setting to indicate whether checkbox borders will be visible or invisible in the loaded image. A value for mark sensitivity can also be given.
    • MAT and RER modules: These modules can be influenced through an optional RM-specific parameter file. This allows settings not supported by the Engine programming interface to be communicated to these modules. (See the proper setting of each module.)
  • Character Set - This determines at zone level which set of characters should be considered as valid. By eliminating characters that are known not to appear in the zone, accuracy can be improved. If non-validated character shapes are encountered in a zone, they are either replaced by the rejection symbol or forced to a similar-shaped validated character. A major component of the Character Set is the language choice. Setting the wrong language(s) and/or language dictionary (or leaving unneeded ones enabled) is likely to slow down recognition and reduce accuracy considerably. See below.
  • Training - The omnifont recognition modules accept training files through kRecSetTrainingFileName. It's worth using a training file, if there are some repeatedly misformed characters in a series of uniformly degraded documents, or the module repeatedly misreads a certain character. Training files can be created as a result of a training session performed in a Capture SDK based application, which incorporates the training feature of the TEC Text Editor Control. Training of recognition modules is supported on: Windows, Linux.
  • Spell Checking Module - Disabling the checking may speed up the recognition a bit, but it causes worse accuracy. See the section Checking for more information.
  • Disabling timeout - Specifying the INFINITE timeout value (kRecSetTimeOut) disables the watch-dog mechanism in the Engine. Disabling the timeout feature may speed up the recognition a little bit. This speed-up is likely to be significant only when recognizing a lot of small-size images successively.

Character set limitation and the checking module both influence accuracy, but in different ways. Both, either or none of them can be used; the integrator should decide which balance is best. Their effects, when used separately, can be summarized as follows:

  • Limiting the character set: recognition results outside the character set are not used.
  • Checking with changing: recognition results outside the checking rules are less likely; both suspicious results and changed items are flagged.
  • Checking without changing: text appears as recognized, but suspicious results are flagged.

Limiting the character set gives the program the greatest decision power. Using the checking module only to flag errors is safest, but requires more post-processing outside the Engine to check all non-conforming cases.

A typical balance would be to impose broad restrictions by limiting the character set, e.g. specifying the permissible languages, but to use the checking module for detailed control over parts of the recognized text where it is important that the original data be recognized and passed for checking precisely as it was written (e.g. an ID code incorporating a check-digit function). This latter checking should make it possible to determine whether any error is due to optical recognition errors or was originally invalid.

Defining the character set

The Language, Character Set and Code Page Handling Module is responsible for this area.

You can improve text recognition accuracy by narrowing the range of characters valid for recognition. This way the Engine does not always have to choose its solutions from the more than 550 characters of the Engine's Total Character Set. (The multi-language omnifont MOR recognition module supports all of these characters; other recognition modules recognize fewer of them.) The character set concept is documented in detail in the topic Character Set in the Engine. Broadly, the set is compiled as follows:

  • Language environment - This involves selecting one or more of the 119 available languages with the kRecSetLanguages function and optionally additional characters validated individually with the kRecSetLanguagesPlus function. Selecting only needed language(s) has a major impact. For example, selecting German only immediately INVALIDATES the Cyrillic and Greek alphabets and over 150 other unneeded accented letters. For more information, tables and descriptions see the related pages of this module. Note that the default selected language is English.
  • Recognition module capabilities - Defining a recognition module for processing a zone may also restrict the available languages or characters within the Language environment. For example, RER cannot read Russian, and the HNR recognition module cannot process letters at all.
  • Filtering - The CSDK provides filters (CHR_FILTER) to further narrow down the character set, by enabling only certain character classes, e.g. digits, uppercase letters etc. The value FILTER_ALL means no filtering.
  • Re-Expanding - An application may require exceptions to the filter rule. The most flexible way to re-expand the character set with individual characters after filtering is to specify them with a call to kRecSetFilterPlus and validate them with the FILTER_PLUS flag in the required zones.
  • Zone-level modification - In addition to the global, page-level definition of the character set, the choice of recognition module, filling method, filtering and use of the expansion string can be fine-tuned on a local, zone level (of course, only for OCR zones created from user zones). Auto-located zones will be set to take the global filter settings. Local filtering and expansion (with FILTER_PLUS) can be set in the zone's filter field. The possible local filter values are the same as the global ones, with an extra one: FILTER_DEFAULT. If this is the only one set, the zone inherits the global filter setting.

Statistics

The integrating application can retrieve timing and other statistical information about the last processed image. This may include:

  • time used to load the image in the Engine’s memory space,
  • time used for pre-processing the image, if any,
  • time used for auto-zoning of the image, if any,
  • time that the image spent in the different recognition modules during recognition, if any,
  • recognition time measured till producing the recognition result,
  • number of recognized characters on the image,
  • number of recognized words on the image,
  • number of rejected characters on the image.

The application calls kRecGetStatistics for this purpose. The fields of the STATISTIC structure will be filled with the relevant information on return.

The structure contains the latest accessible information about each listed statistics field, i.e. if recognition has not run yet on the current HPAGE, but has run on the previous one, the structure contains the data of the previous recognition.

All timing data is measured in milliseconds.