RecAPI
Code Pages in the Engine

The following table summarizes all the Code Pages supported by the Engine. One of these must be specified as the Code Page of the final output document.

On Windows, the default code page is that of the current OS; on Linux and Mac it is UTF-8. See also auto code page.

CODE PAGE NAME    WIDTH   DESCRIPTION                   IMPLEMENTATION
Windows ANSI      8-bit   Code Page 1252                Hard-coded
Windows Greek     8-bit   Code Page 1253                Hard-coded
Windows Eastern   8-bit   Code Page 1250                Hard-coded
Windows Turkish   8-bit   Code Page 1254                Hard-coded
Windows Baltic    8-bit   Code Page 1257                Hard-coded
Windows Cyrillic  8-bit   Code Page 1251                Hard-coded
Windows Esperant  8-bit   Non Standard Win              From file, derived from CP 1252
Windows Sami      8-bit   Sami                          From file, derived from CP 1252
Latin 1           8-bit   ISO 8859-1                    From file, derived from CP 1252
Code Page 437     8-bit   DOS Latin US                  Hard-coded
Greek-ELOT        8-bit   DOS Greek                     Hard-coded
Greek-MEMOTEK     8-bit   DOS Greek                     Hard-coded
Code Page 850     8-bit   DOS Latin 1                   From file, derived from CP 437
Code Page 852     8-bit   DOS Latin 2                   From file, derived from CP 437
Code Page 860     8-bit   DOS Portuguese                From file, derived from CP 437
Code Page 863     8-bit   DOS French-Canadian           From file, derived from CP 437
Code Page 865     8-bit   DOS Nordic                    From file, derived from CP 437
Code Page 866     8-bit   DOS Cyrillic CIS              From file, derived from CP 437
CWI Magyar        8-bit   DOS Hungarian                 From file, derived from CP 437
Magyar Ventura    8-bit   DOS Hungarian                 From file, derived from CP 437
IVKAM C-S         8-bit   Czech & Slovak                From file, derived from CP 437
Mazowia Polish    8-bit   DOS Polish                    From file, derived from CP 437
Sloven & Croat    8-bit   7 bits used                   From file, derived from CP 437
Turkish           8-bit   DOS Turkish                   From file, derived from CP 437
Icelandic         8-bit   DOS Icelandic                 From file, derived from CP 437
Macintosh         8-bit   Mac Western                   Hard-coded
Mac INSO Latin 2  8-bit   MAC CE                        Hard-coded
Mac Central EU    8-bit   PT 202                        Hard-coded
Mac Primus CE u   8-bit   MAC CE                        Hard-coded
Maltese           8-bit   Malta; 7 bits used            From file, derived from CP 437
OCR               8-bit   Non Standard Win              From file, derived from CP 437
Unicode           16-bit  multilingual                  Hard-coded
WordPerfect       16-bit  multilingual                  Hard-coded
WordPerfect Old   16-bit  multilingual                  Hard-coded
Roman 8           8-bit   For HP printers               Hard-coded
UTF-8             16-bit  multilingual                  Hard-coded
Big5              16-bit  Traditional Chinese           From file
                          (supports ETen extension and
                          HKscs non-standard mode)
EUC-CN            16-bit  Simplified Chinese            From file
EUC-JP            16-bit  Japanese                      From file
EUC-TW            16-bit  Traditional Chinese           From file
GB 18030          16-bit  Simplified Chinese            From file
GBK               16-bit  Simplified Chinese            From file
HKSCS-2004        16-bit  Traditional Chinese           From file
                          (HKscs standard mode,
                          supports ETen extension)
Shift_JIS         16-bit  Japanese                      From file
UHC               16-bit  Korean (extended EUC-KR)      From file

In code, set the Engine's current Code Page with kRecSetCodePage and query it with kRecGetCodePage. To enumerate the available Code Pages, use kRecGetFirstCodePage and kRecGetNextCodePage.

Many derivative Code Pages are offered. Their definitions can be found in the Code Page Definition files, RECOGN.SET, LATIN1.SET and SAMI.SET. Each derivative is based on CP 1252 or CP 437, and the appropriate section in the Code Page Definition file specifies only the changed character positions.
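
The base-plus-changes idea described above can be sketched in Python. This is only an illustration, not the Engine's implementation: it does not read the .SET files (their layout is not reproduced here), it uses Python's built-in cp437 codec as a stand-in for the base table, and the change at position 0xEE is a purely hypothetical example.

```python
# Sketch only: models a derivative 8-bit code page as a base table plus
# a small set of changed positions. Not the Engine's implementation.

def base_table(codec_name):
    """Return the 256 characters a single-byte codec assigns to bytes 0..255."""
    # Undefined positions (some exist in cp1252) become U+FFFD.
    return [bytes([b]).decode(codec_name, errors="replace") for b in range(256)]

def derive(base, changes):
    """Copy the base table and overwrite only the listed positions."""
    table = list(base)
    for position, char in changes.items():
        table[position] = char
    return table

def encode(text, table, fallback=ord("?")):
    """Encode text with a 256-entry table; unmapped characters become '?'."""
    reverse = {}
    for position, char in enumerate(table):
        reverse.setdefault(char, position)  # first position wins
    return bytes(reverse.get(char, fallback) for char in text)

# Hypothetical derivative: CP 437 with the euro sign placed at 0xEE.
custom = derive(base_table("cp437"), {0xEE: "\u20ac"})
print(encode("A\u20ac", custom))
```

Only the changed position needs to be listed; every other byte keeps the meaning it has in the base code page, and the character that used to sit at the changed position is no longer encodable.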

Note:
For most languages, you may find it best to use one of the Unicode code pages: "Unicode" or "UTF-8".
The name Windows ANSI is used only by the OmniPage CSDK; it is not connected with the real ANSI character set (ISO 8859-1). The name is historical. The code page it refers to is in fact Windows CP 1252.
Details of the WordPerfect code pages:
  • WordPerfect: variable-length character codes stored in 1 or 4 bytes. Used from WordPerfect 6.0 onward.
  • WordPerfect Old: similar to WordPerfect, but used by versions before WordPerfect 6.0.
See also the description of CCJK code page files.
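
The difference between CP 1252 and ISO 8859-1 mentioned in the note can be checked with a short Python sketch. It relies on Python's own cp1252 and latin-1 codecs, not on the Engine's tables, so it only illustrates where the two character sets diverge.

```python
# Sketch only: compares Python's cp1252 and latin-1 codecs,
# not the Engine's own code page tables.

def differing_bytes(codec_a, codec_b):
    """List byte values that the two single-byte codecs decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        a = raw.decode(codec_a, errors="replace")
        b = raw.decode(codec_b, errors="replace")
        if a != b:
            diffs.append(value)
    return diffs

diffs = differing_bytes("cp1252", "latin-1")
print(f"first differing byte: {min(diffs):#x}, last: {max(diffs):#x}")
print(repr(b"\x80".decode("cp1252")), repr(b"\x80".decode("latin-1")))
```

The two codecs agree everywhere except in the range 0x80 to 0x9F, where ISO 8859-1 has control codes and CP 1252 places printable characters such as the euro sign.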

Customized Code Pages

If no offered Code Page fulfils your needs, you can develop your own derivative 8-bit Code Page by adding a new Code Page Definition file to the Engine Binary directory. This new file should have a .SET file extension and it should contain a separate section for your custom Code Page. Five Code Pages are available as the basis for customized Code Pages:

  • Windows CP 1252
  • DOS CP 437
  • Greek-ELOT
  • Greek-MEMOTEK
  • Roman 8

The characters belonging to the custom Code Page should be given as Unicode values and should follow the other layout conventions found in the basic Code Page Definition file, RECOGN.SET.
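
Before writing a definition file, it may help to sanity-check the intended mapping. The following Python sketch works on an in-memory table only; it does not model the .SET layout. The checks (256 entries, one Unicode code point per position, no character assigned twice) are plausible assumptions rather than documented Engine requirements, and the sample change at 0xD5 is hypothetical.

```python
# Sketch only: sanity checks for an in-memory 8-bit code page table.
# The .SET file layout is not modelled here.

def check_table(table):
    """Return a list of problems found in a 256-entry code page table."""
    problems = []
    if len(table) != 256:
        problems.append(f"expected 256 entries, found {len(table)}")
    seen = {}
    for position, char in enumerate(table):
        if len(char) != 1:
            problems.append(f"position {position:#04x}: not a single code point")
            continue
        # U+FFFD marks undefined positions and may legitimately repeat.
        if char in seen and char != "\ufffd":
            problems.append(
                f"U+{ord(char):04X} appears at {seen[char]:#04x} and {position:#04x}"
            )
        seen.setdefault(char, position)
    return problems

# Hypothetical custom page: CP 1252 with U+0150 placed at 0xD5.
table = [bytes([b]).decode("cp1252", errors="replace") for b in range(256)]
table[0xD5] = "\u0150"
print(check_table(table))
```

An empty list means none of the assumed checks failed; a duplicated character would make encoding ambiguous, which is why the sketch reports it.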

Code Page ignored during conversion to the output file

In some cases, the Code Page setting of the Engine must be specified together with the Output Text Format for the final output document. With other output formats, specifying the Code Page is superfluous, since these output converters ignore the Code Page setting (e.g. MS Word).