RecAPI
RER handprinted recognition module
Module name: RER
Module identifier: RM_RER
Filling methods supported: FM_HANDPRINT, FM_CMC7, FM_OCRA, FM_OCRB, FM_MICR, FM_OMNIFONT (Thai, Vietnamese, Hebrew)
Filters supported: all filter elements
Trade-off supported: TO_ACCURATE, TO_FAST (includes TO_BALANCED)
Knowledge base files: kadmos.uk, hand_s.rec, numplus.rec, and the language-specific knowledge base files listed below.
Knowledge base file for Thai OCR: kadmos.uk, ttf_s_th.rec.
Knowledge base file for Hebrew OCR: kadmos.uk, ttf_s_il.rec.
Knowledge base file for Vietnamese OCR: kadmos.uk, ttf_s_vn.rec.
Training file supported: no

This module is supported on: Windows, Linux, Mac OS X.

This module is included only in the Professional Recognition Kit (not the OCR kit). To make this technology available in your application, it must be covered by your distribution licensing.

Thai, Vietnamese and Hebrew OCR can be purchased as an add-on ("Asian Plus") to either the Professional Recognition Kit or the Professional OCR Kit.

See the topic on Licensing in the General Information help system.

Version information

This is a third-party recognition module from reRecognition GmbH, Germany. The Engine contains its recognition engine version 6.0k.

Application areas

This recognition module can be used for recognition of handprinted alphanumerical characters, i.e. upper and lower case letters, the digits and some others. Although it can be used to read flowing text, its main application area is in form-like situations, where the form designer has great control over the content, and possibly the length, of the handprinted information given in each zone.

In addition this module recognizes Thai, Vietnamese and Hebrew text. It can handle short embedded English texts within such language text. Thai language is accessible from version 19.0, Hebrew from 20.1, Vietnamese from 20.2. See details below.

Recognition of handprinted text

Range of characters

With the filling method FM_HANDPRINT selected, this module can differentiate 159 characters. These are the digits, 28 punctuation and miscellaneous characters (listed below), and the letters of the English alphabet plus all accented characters necessary for 98 languages. Fifteen languages have dictionary support: Catalan, Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Spanish and Swedish. Other supported languages include Croatian (with one limitation), Estonian, Gaelic, Indonesian, Latvian, Lithuanian, Slovak, Slovenian, Swahili, Tagalog, Turkish and Welsh (the last two with minor limitations). Cyrillic languages and Greek are not supported. In Hungarian, the lower case characters "Small I Acute", "Small O Acute" and "Small U Acute" are not supported, in effect limiting recognition to upper case characters. These languages can be freely combined, but dictionary support is then not available.

The following punctuation characters can be recognized:

! Exclamation Mark
? Question Mark
' Apostrophe-Quote
" Quotation Mark
; Semicolon
, Comma
: Colon
. Period (Full-stop)
- Hyphen-Minus
( Opening Parenthesis
) Closing Parenthesis
[ Opening Square Bracket
] Closing Square Bracket
{ Opening Curly Bracket
} Closing Curly Bracket

The following miscellaneous characters can be recognized:

# Number Sign
% Percent Sign
@ Commercial At
& Ampersand
| Vertical Bar
$ Dollar Sign
* Asterisk
+ Plus Sign
= Equals Sign
_ Spacing Underscore
/ Slash
\ Backslash
< Less-Than Sign
> Greater-Than Sign

Other supported filling methods add further character ranges to the capabilities of the RER engine. These ranges are described in the topic OCR special filling methods and in the summary table of OCR Special Characters.

Knowledge base files

The compulsory knowledge base file is kadmos.uk. The other files, with the .rec extension, are optional: they can be removed, selected and combined with each other manually. Of these, only the general knowledge base file hand_s.rec is installed with the module during installation of OmniPage Capture SDK v20. The remainder are available only in the folder RER_KBFILES of the OmniPage CSDK install CD. (The file hand_s.rec is also there.) The file numplus.rec contains knowledge only about numbers and some miscellaneous characters. Language-specific knowledge base files are also distributed, as listed in the table below. These files have names of the form hand_s_??.rec, where the double question mark is replaced by a country code as follows:

Code Language(s) / Territory
al Albanian
at Austrian, German
be Belgian, Dutch, French, German
ch Swiss, French, German, Italian
cs Czech, Slovakian
cz Czech
de German
dk Danish
ee Estonian
es Spanish
eu West-European
fi Finnish
fr French
hu Hungarian
ie Irish, English, Gaelic Irish
it Italian
lt Lithuanian
lv Latvian
nl Dutch
no Norwegian
pl Polish
pt Portuguese
ro Romanian
se Swedish
sf Scandinavia
sl Slovenian
sk Slovakian
tr Turkish
uk UK
us USA
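
The naming rule for the language-specific files can be sketched in a few lines of Python. This is only an illustration: the helper kb_filename and the set KB_CODES are hypothetical names and not part of the SDK; the codes themselves are copied from the table above.

```python
# Country codes copied from the table above.
KB_CODES = {
    "al", "at", "be", "ch", "cs", "cz", "de", "dk", "ee", "es",
    "eu", "fi", "fr", "hu", "ie", "it", "lt", "lv", "nl", "no",
    "pl", "pt", "ro", "se", "sf", "sl", "sk", "tr", "uk", "us",
}


def kb_filename(code):
    """Return the language-specific knowledge base file name for a country code."""
    code = code.lower()
    if code not in KB_CODES:
        raise ValueError(f"unknown knowledge base country code: {code!r}")
    # The "??" in hand_s_??.rec is replaced by the two-letter code.
    return f"hand_s_{code}.rec"


print(kb_filename("de"))  # hand_s_de.rec
```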

Using optional knowledge base files may improve accuracy. Any subset of them can simply be copied manually into the Engine Binary directory before initializing the Engine. Although the system automatically identifies which knowledge base file is needed for a given situation (e.g. according to the language), recognition speed can be improved by minimizing the number of knowledge base files in the Engine Binary directory.

The module requires at least one .REC file in the Engine Binary directory. This file does not have to be HAND.REC. However, the Redistribution Wizard of the CSDK tries to copy only HAND.REC from the binary folder into the selected file set (and displays a message if this file is missing). Thus, if you want a different subset of optional knowledge base files in your redistributed file set, you should select and copy it manually after running the Redistribution Wizard.
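
The manual copy step described above could be scripted along the following lines. This is a minimal sketch using only the Python standard library; the function name install_kb_files and both directory arguments are assumptions for illustration, and the real locations of RER_KBFILES and the Engine Binary directory depend on your installation.

```python
import shutil
from pathlib import Path


def install_kb_files(source_dir, engine_bin_dir, names):
    """Copy the named knowledge base files and list the .rec files now present.

    source_dir     -- folder holding the optional files (e.g. a copy of RER_KBFILES)
    engine_bin_dir -- the Engine Binary directory (path is installation-specific)
    names          -- file names to copy, e.g. ["hand_s_de.rec", "numplus.rec"]
    """
    source = Path(source_dir)
    target = Path(engine_bin_dir)
    for name in names:
        shutil.copy2(source / name, target / name)

    # The module needs at least one .rec file in the Engine Binary directory.
    rec_files = sorted(
        p.name for p in target.iterdir() if p.suffix.lower() == ".rec"
    )
    if not rec_files:
        raise RuntimeError("no .rec knowledge base file in the target directory")
    return rec_files
```

Copying only the files that are actually needed keeps the directory small, which is in line with the speed advice above.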

Accuracy issues

Handprint is much harder to recognize accurately than machine generated text, and success depends very heavily on character quality. The use of structured forms to limit the possible range of characters, together with zone-level filters and individual character validation can significantly improve accuracy. This recognition module can apply all the Engine’s possible filter elements to the 159-member character set it supports. Handprinted forms are usually filled by different respondents and this is liable to lower accuracy. If respondents can be given clear filling instructions (e.g. a print model to follow) and be motivated to print clearly, success will be higher.

If the handprint contains numbers only, using the RM_HNR module is likely to give better results than the RM_RER module filtered for numbers only. The functioning of the module can be influenced by the page-level trade-off settings: TO_ACCURATE is respected, while TO_FAST and TO_BALANCED are merged.
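
The two rules in the paragraph above can be written down as a small decision helper. This is a sketch only: the function names are hypothetical, the identifiers are handled as plain strings rather than SDK constants, and mapping TO_BALANCED to TO_FAST is one reading of "merged" that follows the module summary ("TO_FAST (includes TO_BALANCED)").

```python
def choose_module(digits_only):
    """Suggest a recognition module for a handprint zone."""
    # For purely numeric handprint, RM_HNR is likely to give better results.
    return "RM_HNR" if digits_only else "RM_RER"


def effective_tradeoff(setting):
    """Return the trade-off the RER module effectively applies."""
    if setting == "TO_ACCURATE":
        return "TO_ACCURATE"
    if setting in ("TO_FAST", "TO_BALANCED"):
        # The module does not distinguish between these two settings.
        return "TO_FAST"
    raise ValueError(f"unknown trade-off setting: {setting!r}")
```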

Conditions

For successful recognition, the characters should not touch each other. Each character can be zoned individually, or a zone may contain one or more lines of characters. Each character must have a height of 30-180 pixels. Well-formed characters written in pen are recognized best. Pencil and felt-tip pens give poorer results. When reading from pre-printed forms, dropout colored boxes can be useful to encourage respondents to write characters of even size and spacing. In that case, respondents must not write with a pen of the dropout color.

Maximum number of characters in a line: 200.

Number of lines in a zone: No restriction.
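
The numeric limits above (character height of 30-180 pixels, at most 200 characters per line) lend themselves to a simple pre-check. The sketch below assumes that the pixel heights of the characters in a line are already known from some earlier measurement step; the function name check_line is hypothetical and not part of the SDK.

```python
MIN_CHAR_HEIGHT = 30      # pixels
MAX_CHAR_HEIGHT = 180     # pixels
MAX_CHARS_PER_LINE = 200


def check_line(char_heights):
    """Return a list of problems found in one line of handprinted characters.

    char_heights -- heights in pixels of the characters in the line
    """
    problems = []
    if len(char_heights) > MAX_CHARS_PER_LINE:
        problems.append(
            f"{len(char_heights)} characters in line, maximum is {MAX_CHARS_PER_LINE}"
        )
    for index, height in enumerate(char_heights):
        if not MIN_CHAR_HEIGHT <= height <= MAX_CHAR_HEIGHT:
            problems.append(
                f"character {index}: height {height} px is outside "
                f"{MIN_CHAR_HEIGHT}-{MAX_CHAR_HEIGHT} px"
            )
    return problems
```

An empty result means the line satisfies both limits; it says nothing about the other conditions, such as characters touching each other.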

Module integration issues

The Engine cannot provide access to all the parameters of reRecognition’s KADMOS toolkit. Note, however, that the recognition module can be fine-tuned through parameters placed in the [Parm] section of an INI file. A sample INI file, RM_RER.INI, can be found in the above-mentioned folder RER_KBFILES. The full path of the INI file can be specified by the setting Kernel.Ocr.RER.UseParamFile, which replaces the function RecSetRMSpecParams of previous CSDK versions.
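
Before pointing Kernel.Ocr.RER.UseParamFile at an INI file, it can be useful to confirm that the file really has a [Parm] section. The following is a minimal sketch using Python's standard configparser; the function name read_parm_section is hypothetical, and no parameter names are shown because they are defined by the sample RM_RER.INI, not by this topic.

```python
import configparser


def read_parm_section(ini_text):
    """Parse INI text and return the entries of its [Parm] section as a dict."""
    # Interpolation is disabled so that '%' in values is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names in their original case
    parser.read_string(ini_text)
    if not parser.has_section("Parm"):
        raise ValueError("INI file has no [Parm] section")
    return dict(parser.items("Parm"))
```

The text passed in would typically be the content of the INI file, read by the caller with the encoding the file actually uses.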

Recognition of Thai, Vietnamese and Hebrew text

The RER recognition module can recognize only machine-printed (FM_OMNIFONT) characters of these languages. Handprinted characters are not supported.

For recognition of such text the given language should be set (LANG_THA, LANG_VIE, LANG_HEB) and Western languages should not be set (except English in one case - see next paragraph).

The module can recognize short English texts embedded in such language text. If embedded texts are in other Latin-alphabet languages, their recognition is also possible; however, accented characters may not always be handled correctly. The English language MUST be set for embedded text recognition of any Latin-alphabet language. (See also CCJK and Arabic language handling details.)
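
The language selection rule above can be expressed as a small consistency check. This is a sketch under stated assumptions: language identifiers are handled as plain strings, LANG_ENG is assumed to be the identifier for English, and the function name check_language_selection is hypothetical. Whether several of the three languages may be combined is not stated here, so the check does not restrict that.

```python
RER_OMNIFONT_LANGS = {"LANG_THA", "LANG_VIE", "LANG_HEB"}
EMBEDDED_LATIN_LANG = "LANG_ENG"  # assumed identifier for English


def check_language_selection(selected):
    """Return a list of problems with a language selection for Thai,
    Vietnamese or Hebrew recognition."""
    selected = set(selected)
    problems = []
    if not selected & RER_OMNIFONT_LANGS:
        problems.append("none of LANG_THA, LANG_VIE, LANG_HEB is set")
    # English is the only other language allowed, for embedded Latin text.
    extra = selected - RER_OMNIFONT_LANGS - {EMBEDDED_LATIN_LANG}
    for lang in sorted(extra):
        problems.append(
            f"{lang} should not be set together with Thai, Vietnamese or Hebrew"
        )
    return problems
```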

IMPORTANT NOTE: For the correct working of the recognition of these languages, the language should be set before the preprocess operation.

Note:
The inversion detection, rotation detection and deskew detection steps of preprocessing do not work for Thai and Hebrew language images, but the corresponding manual modes can be used. Fax correction does not work at all in these cases. Despeckle, however, also supports these images.
Only the DCM_LEGACY auto-zoning algorithm works well for Thai and Hebrew language images, thus the decomp method setting has no effect in this case. In addition, only WT_FLOW and WT_GRAPHIC zones are enabled for Thai and Hebrew manual zoning.

This third-party recognition module is tightly integrated into the Engine. For more information on reRecognition's handprinted recognition technology, visit their homepage (http://www.rerecognition.com).