RecAPI
FRX multi-lingual omnifont recognition module
Module name: FRX
Module identifier: RM_OMNIFONT_FRX
Filling methods supported: FM_OMNIFONT
Filters supported: all filter elements
Trade-off supported: none
Knowledge base files: none
Training file supported: yes (supported on: Windows, Linux)

The PLUS2W and PLUS3W recognition modules require the presence of this module. This module is supplied in both the Professional Recognition Kit and the OCR Kit. Its inclusion in your application must be covered by a distribution license. See the topic on Licensing in the General Information help system.

Its associated files are:

baltic.shp Frx shape pack (code page) file.
cyrillic.shp Frx shape pack (code page) file.
greek.shp Frx shape pack (code page) file.
latin1.shp Frx shape pack (code page) file.
latin2.shp Frx shape pack (code page) file.
turkish.shp Frx shape pack (code page) file.
charsettable.chr
asciieng.lng Frx language dictionary. Used when multiple languages are selected.
czech.lng Frx language dictionary data file.
danish.lng Frx language dictionary data file.
dutch.lng Frx language dictionary data file.
english.lng Frx language dictionary data file.
finnish.lng Frx language dictionary data file.
french.lng Frx language dictionary data file.
german.lng Frx language dictionary data file.
greek.lng Frx language dictionary data file.
hungar.lng Frx language dictionary data file.
italian.lng Frx language dictionary data file.
norsk.lng Frx language dictionary data file.
polish.lng Frx language dictionary data file.
port.lng Frx language dictionary data file.
russian.lng Frx language dictionary data file.
spanish.lng Frx language dictionary data file.
swedish.lng Frx language dictionary data file.
turkish.lng Frx language dictionary data file.
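
When packaging an application, it can help to confirm that every file in the list above is present. The following is a minimal sketch only: the helper name missing_frx_files is hypothetical, it is not part of RecAPI, and the directory in which the module actually looks for its files is an assumption you must verify against your installation.

```python
from pathlib import Path

# File names copied from the list of associated files above.
FRX_FILES = [
    "baltic.shp", "cyrillic.shp", "greek.shp",
    "latin1.shp", "latin2.shp", "turkish.shp",
    "charsettable.chr",
    "asciieng.lng", "czech.lng", "danish.lng", "dutch.lng",
    "english.lng", "finnish.lng", "french.lng", "german.lng",
    "greek.lng", "hungar.lng", "italian.lng", "norsk.lng",
    "polish.lng", "port.lng", "russian.lng", "spanish.lng",
    "swedish.lng", "turkish.lng",
]


def missing_frx_files(directory):
    """Return the listed FRX file names that are not regular files in `directory`.

    Hypothetical helper: it only checks file presence and does not call the SDK.
    """
    root = Path(directory)
    return [name for name in FRX_FILES if not (root / name).is_file()]
```

An empty result means that all listed files were found in the given directory. It does not show that the distribution license covers them or that the module will load them from that location.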

Application areas

This module recognizes machine-printed text, that is, text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition may also be acceptable. The module should also be used for letter-quality or near-letter-quality (LQ, NLQ) output from dot-matrix printers.

Range of characters

This module recognizes the Latin, Greek and Cyrillic alphabets, including enough accented letters to cover the 54 supported languages (see Languages and modules).

The characters are listed in category and alphanumeric order, together with their Code Page values, in Characters and Code Pages.

Multi-lingual language support

The language support of this module is based on its internal code pages, each of which contains the characters of a related group of languages. The internal code pages are American/European (Latin 1, 1252), Baltic (1257), Central-European (Latin 2, 1250), Cyrillic (1251), Greek (1253) and Turkish (1254).

The module supports multi-language selection, but it reliably recognizes only language combinations that fall within a single code page. Languages from different language groups may not be recognized properly. For example, the module properly processes the combination of English, German and Italian, since all three languages belong to the Latin 1 (1252) code page. However, if both French and Czech are specified, RM_OMNIFONT_FRX may fail to recognize some accented characters of the Czech alphabet, since these two languages are in different code pages. The following table lists the languages supported by FRX, grouped by code page.

Latin 2 (1250): Polish, Czech, Hungarian, Romanian, Albanian, Croatian, Wend (Sorbian), Slovak, Slovenian
Cyrillic (1251): Russian, Ukrainian, Byelorussian, Bulgarian, Macedonian, Serbian
Latin 1 (1252): English, German, French, Spanish, Italian, Dutch, Swedish, Norwegian, Finnish, Danish, Portuguese, Portuguese (Brazilian), Catalan, Afrikaans, Aymara, Basque, Breton, Faroese, Friulian, Gaelic, Galician, Eskimo, Icelandic, Indonesian, Latin, Malaysian, Pidgin English, Swahili, Tahitian, Welsh, Frisian, Zulu
Greek (1253): Greek
Turkish (1254): Turkish, Kurdish (written in Latin alphabet)
Baltic (1257): Estonian, Hawaiian, Latvian, Lithuanian
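
The same-code-page rule can be checked before a language combination is passed to the module. The sketch below is illustrative only: it copies the table above into a lookup structure and uses plain English language names rather than the SDK's language identifiers. The names CODE_PAGES, code_pages_for and is_single_code_page are hypothetical and are not part of RecAPI.

```python
# Languages grouped by internal code page, transcribed from the table above.
CODE_PAGES = {
    1250: ["Polish", "Czech", "Hungarian", "Romanian", "Albanian",
           "Croatian", "Wend (Sorbian)", "Slovak", "Slovenian"],
    1251: ["Russian", "Ukrainian", "Byelorussian", "Bulgarian",
           "Macedonian", "Serbian"],
    1252: ["English", "German", "French", "Spanish", "Italian", "Dutch",
           "Swedish", "Norwegian", "Finnish", "Danish", "Portuguese",
           "Portuguese (Brazilian)", "Catalan", "Afrikaans", "Aymara",
           "Basque", "Breton", "Faroese", "Friulian", "Gaelic", "Galician",
           "Eskimo", "Icelandic", "Indonesian", "Latin", "Malaysian",
           "Pidgin English", "Swahili", "Tahitian", "Welsh", "Frisian",
           "Zulu"],
    1253: ["Greek"],
    1254: ["Turkish", "Kurdish"],
    1257: ["Estonian", "Hawaiian", "Latvian", "Lithuanian"],
}

# Reverse lookup: language name -> code page number.
LANGUAGE_TO_CODE_PAGE = {
    language: code_page
    for code_page, languages in CODE_PAGES.items()
    for language in languages
}


def code_pages_for(languages):
    """Return the set of code pages used by the given language names.

    Raises KeyError for a name that is not in the table.
    """
    return {LANGUAGE_TO_CODE_PAGE[language] for language in languages}


def is_single_code_page(languages):
    """True if all given languages belong to one internal code page."""
    return len(code_pages_for(languages)) <= 1
```

With this table, the combination of English, German and Italian passes the check, while French together with Czech does not, because French is in code page 1252 and Czech is in 1250. A caller could reject or split such a combination before starting recognition.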

Character attributes

The omnifont recognition module can detect and transmit character attributes: bold, italic or underlined text (or any combination of them). It can also detect and transmit character size, and can classify font types into three broad categories: serif, sans serif and monospaced.

Performance issues

Please consult the topic Performance comparison for information on the balance between speed and accuracy for the most common engine combinations and trade-off settings.