RecAPI
Nuance OmniPage Capture SDK has multiple OCR engines, which can be applied on a per-zone basis. This allows a wide range of different types of textual and non-textual data to be recognized, even if they all appear on the same page. For each zone, the application can assign any module available in the configuration, or the choice of module can be left to the Engine.
Using the recognition services of any of these modules for development purposes requires license activation from Nuance Communications, Inc. Each engine requires its own license, but licenses can be combined and enabled with a single activation. For more information, see the licensing topics of the General Information help system.
FRX multi-lingual omnifont recognition module
MOR multi-lingual omnifont recognition module
MTX omnifont recognition module
PLUS2W and PLUS3W omnifont recognition modules
BAR barcode recognition module
DOT 9-pin draft dot-matrix recognition module
HNR handprinted numeral recognition module
MAT matrix matching recognition module
OMR optical mark recognition module
RER handprint recognition module
Asian recognition module
NOTE: The recognition modules MTX, DOT, HNR and MAT are supported on Windows. RER is supported on Windows, Linux and Mac OS X.
When the value RM_AUTO is set, either by default or explicitly, the Engine selects the recognition module for any filling method. When setting specific values for filling methods and recognition modules, it is the programmer’s responsibility to specify a valid pair of recognition module and filling method. Any incorrectly set zone will have no recognition results. The following table shows which modules are considered by the automatic recognition module selection invoked by the RM_AUTO value. The order of the recognition modules in the second column is the priority order used by the automatic selection.
Filling method | Permissible Recognition modules |
FM_OMNIFONT | RM_OMNIFONT_PLUS2W, RM_OMNIFONT_PLUS3W, RM_OMNIFONT_MOR, RM_OMNIFONT_FRX, RM_OMNIFONT_MTX |
FM_DRAFTDOT9 | RM_OMNIFONT_PLUS2W, RM_OMNIFONT_PLUS3W, RM_DOT, RM_OMNIFONT_MTX |
FM_BARCODE | RM_BAR |
FM_OMR | RM_OMR |
FM_HANDPRINT | RM_HNR, RM_RER |
FM_DRAFTDOT24 | RM_OMNIFONT_PLUS2W, RM_OMNIFONT_PLUS3W, RM_OMNIFONT_MOR, RM_OMNIFONT_FRX, RM_OMNIFONT_MTX |
FM_OCRA | RM_OMNIFONT_MOR, RM_OMNIFONT_MTX, RM_MAT, RM_RER |
FM_OCRB | RM_OMNIFONT_MOR, RM_OMNIFONT_MTX, RM_MAT, RM_RER |
FM_MICR | RM_MAT, RM_RER |
FM_BARCODE2D | RM_BAR |
FM_DOTDIGIT | RM_MAT |
FM_DASHDIGIT | RM_MAT |
FM_CMC7 | RM_RER, RM_MAT |
FM_NO_OCR | - |
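The table above can be read as a lookup structure. The following is a minimal sketch in Python, assuming plain strings as stand-ins for the SDK's filling method and recognition module values; the function names `is_valid_pair` and `auto_priority` are hypothetical and are not part of RecAPI. It models only the table order, not the character set checks that the Engine also applies during automatic selection.

```python
# Filling method -> permissible recognition modules, in the priority order
# given by the table. Strings stand in for the SDK's enum values.
PERMISSIBLE = {
    "FM_OMNIFONT": ["RM_OMNIFONT_PLUS2W", "RM_OMNIFONT_PLUS3W",
                    "RM_OMNIFONT_MOR", "RM_OMNIFONT_FRX", "RM_OMNIFONT_MTX"],
    "FM_DRAFTDOT9": ["RM_OMNIFONT_PLUS2W", "RM_OMNIFONT_PLUS3W",
                     "RM_DOT", "RM_OMNIFONT_MTX"],
    "FM_BARCODE": ["RM_BAR"],
    "FM_OMR": ["RM_OMR"],
    "FM_HANDPRINT": ["RM_HNR", "RM_RER"],
    "FM_DRAFTDOT24": ["RM_OMNIFONT_PLUS2W", "RM_OMNIFONT_PLUS3W",
                      "RM_OMNIFONT_MOR", "RM_OMNIFONT_FRX", "RM_OMNIFONT_MTX"],
    "FM_OCRA": ["RM_OMNIFONT_MOR", "RM_OMNIFONT_MTX", "RM_MAT", "RM_RER"],
    "FM_OCRB": ["RM_OMNIFONT_MOR", "RM_OMNIFONT_MTX", "RM_MAT", "RM_RER"],
    "FM_MICR": ["RM_MAT", "RM_RER"],
    "FM_BARCODE2D": ["RM_BAR"],
    "FM_DOTDIGIT": ["RM_MAT"],
    "FM_DASHDIGIT": ["RM_MAT"],
    "FM_CMC7": ["RM_RER", "RM_MAT"],
    "FM_NO_OCR": [],
}


def is_valid_pair(filling_method, recognition_module):
    """Return True if the pair appears in the table (hypothetical helper).

    RM_AUTO is accepted for every known filling method, because the
    Engine then chooses the module itself.
    """
    if filling_method not in PERMISSIBLE:
        return False
    if recognition_module == "RM_AUTO":
        return True
    return recognition_module in PERMISSIBLE[filling_method]


def auto_priority(filling_method, available):
    """Return the first module in table order that is in `available`.

    `available` is the set of modules present in the configuration.
    Returns None when no listed module is available.
    """
    for module in PERMISSIBLE.get(filling_method, []):
        if module in available:
            return module
    return None
```

An application could run such a check before assigning explicit values to a zone, so that an invalid pair is caught early instead of producing a zone without recognition results.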
Correctly assigning a recognition module and a filling method to a zone should mean that the recognition module can satisfactorily process the contents of that zone. It does not guarantee that the recognition module can process every possible character. The characters supported by the Engine are listed in Characters and Code Pages, and most recognition modules recognize only a subset of these. Even if the Character Set is restricted to a limited Language Environment, for example by selecting the German language, the recognition module may not be able to process all the enabled characters. For example, RM_HNR can process handprinted numerals but does not recognize letters.
Automatic recognition module selection takes the Character Set support of the modules into consideration. When a recognition module is selected directly, it is the programmer’s responsibility to choose one that supports as much as possible of the character set enabled in the zone. Otherwise the zone may have an incomplete recognition result. The precise character and language support for each module is given in the appropriate recognition module specifications.
Narrowing the Character Set has two effects:
The filtering system allows the Language environment to be narrowed, by enabling only certain character classes, and also by enabling individual characters. A filter is built up from filter elements, as detailed under CHR_FILTER.
Each filter element name indicates which character class is enabled; for example, FILTER_ALL means no filtering. Not all recognition modules interpret all filter elements. Precise information appears in the sub-heading for each module.
The same filter does not always enable the same number of characters. For example, FILTER_MISCELLANEOUS can enable only those miscellaneous characters that are supported by the recognition module assigned to the zone.
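The interaction between a filter and a module's own character support can be pictured as a set intersection. The sketch below is an illustration only: the character classes, the module names `MODULE_A` and `MODULE_B`, their support sets and the helper `enabled_characters` are all hypothetical, and the element name `FILTER_DIGIT` is assumed here for illustration. The actual filter elements are those listed under CHR_FILTER.

```python
import string

# Hypothetical contents for a few filter elements (assumed for illustration).
FILTER_CLASSES = {
    "FILTER_DIGIT": set(string.digits),
    "FILTER_MISCELLANEOUS": set("@#$%&*"),
}

# Hypothetical character support of two imaginary recognition modules.
MODULE_SUPPORT = {
    "MODULE_A": set(string.digits) | set(string.ascii_uppercase) | set("@#$%&*"),
    "MODULE_B": set(string.digits) | set("#$"),
}


def enabled_characters(filter_elements, module):
    """Return the characters that are both requested and supported.

    FILTER_ALL means no filtering, so everything the module supports
    stays enabled.
    """
    supported = MODULE_SUPPORT[module]
    if "FILTER_ALL" in filter_elements:
        return set(supported)
    requested = set()
    for element in filter_elements:
        requested |= FILTER_CLASSES[element]
    return requested & supported
```

With these made-up sets, the same FILTER_MISCELLANEOUS element enables six characters for `MODULE_A` but only two for `MODULE_B`, which is the effect described above.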
Three accuracy/speed trade-off settings can be specified at page or document level: TO_ACCURATE, TO_BALANCED and TO_FAST. Five recognition modules can interpret these. Precise information appears in the sub-heading for each module.
The checking module has two basic services. It can flag unacceptable recognition results without changing them or it can be permitted to modify recognition results using checking module feedback. The available acceptance rules can come from the following:
These two sources may be combined freely. The checking module and each of its two parts can be enabled or disabled on a per-zone basis. The integrator should try to match the particular parts of the checking module to the contents and recognition modules of individual zones. For example, allowing checking changes with a language dictionary enabled is pointless or even harmful for the modules RM_BAR, RM_OMR and RM_HNR, since it could change their numerical solutions to letters.
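The per-zone matching described above could be planned along the following lines. This is a sketch under stated assumptions: the helper names and the zone representation are hypothetical, module names are plain strings, and no SDK call is made. It only encodes the rule that dictionary-driven changes should not be allowed for RM_BAR, RM_OMR and RM_HNR.

```python
# Modules named in the text for which dictionary-based changes could turn
# numerical solutions into letters.
NO_DICTIONARY_CHANGES = {"RM_BAR", "RM_OMR", "RM_HNR"}


def allow_dictionary_changes(recognition_module):
    """Return False for modules where dictionary feedback is pointless or harmful."""
    return recognition_module not in NO_DICTIONARY_CHANGES


def plan_checking(zones):
    """Map each zone name to whether checking changes should be allowed.

    `zones` is a list of (zone_name, recognition_module) tuples, a
    hypothetical representation chosen for this sketch.
    """
    return {name: allow_dictionary_changes(module) for name, module in zones}
```

For instance, `plan_checking([("body", "RM_OMNIFONT_MOR"), ("amount", "RM_HNR")])` keeps changes enabled for the text zone and disables them for the handprinted numeral zone.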
Training of recognition modules is supported on Windows and Linux.
The text recognition modules are trainable, allowing the application to achieve greater accuracy, particularly on stylized fonts and whenever certain characters are repeatedly misrecognized in the same way. Training files are created during a training session in a Capture SDK-based application that incorporates the training feature of the OmniPage application or the TEC Text Editor Control of the CSDK. The resulting training file can be set by calling kRecSetTrainingFileName.
A simple comparison diagram can be seen here. Further accuracy and timing information for the different engine and trade-off configurations is available through Technical Support. For more details about Technical Support, see the General Information help system.