RecAPI
Supported characters with the CCJK languages

For each CCJK language (Simplified Chinese, Traditional Chinese, Japanese and Korean), the engine recognizes all commonly used characters defined in that country's character encoding standard. With the default settings, this means:

  • all Level 1 and some frequently used Level 2 characters in Simplified Chinese
  • all Level 1 characters in Traditional Chinese
  • all Level 1 characters in Japanese
  • all common characters in Korean

Enabling the Kernel.OcrMgr.Asian.FullCharacterSet setting adds all Level 2 characters to the recognized set for the Chinese and Japanese languages. (Korean has no level distinction.) In both settings, English letters, numerals and punctuation marks are also recognized.

The recognized Han characters are the ones defined in the following encoding standards:

Language             Encoding standard   Level 1   Level 2
Simplified Chinese   GB 2312             3755      3008
Traditional Chinese  Big-5               5401      7652
Japanese             JIS X 0208          2965      3390
Korean               KS X 1001           4888 (single set, no Level 1/Level 2 split)

Simplified Chinese: 3755 Level 1 and 3008 Level 2 characters (as defined by the GB 2312 standard) are supported. 499 frequently used Level 2 characters (listed at the end of this section) are supported even in the default setting, with FullCharacterSet disabled; the remaining 2509 Level 2 characters are supported only when FullCharacterSet is enabled.
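As a quick check of these figures, write N_default and N_full (notation introduced here only) for the number of recognized Simplified Chinese Han characters with FullCharacterSet disabled and enabled:

```latex
\begin{aligned}
N_{\text{default}} &= 3755 + 499 = 4254 \\
N_{\text{full}} &= 3755 + 3008 = 6763 \\
N_{\text{full}} - N_{\text{default}} &= 3008 - 499 = 2509
\end{aligned}
```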

Traditional Chinese: 5401 Level 1 and 7652 Level 2 characters (as defined by the Big-5 standard) are supported. Level 2 characters are supported only when FullCharacterSet is enabled.

If the Kernel.OcrMgr.Asian.CHTIncludesHKSCS setting is TRUE and the selected language is Traditional Chinese, the Asian recognition module can also recognize characters from the Hong Kong Supplementary Character Set (HKSCS). Note that enabling this setting may somewhat decrease the accuracy of Traditional Chinese OCR.

Japanese: 2965 Level 1 and 3390 Level 2 kanji characters (as defined by the JIS X 0208 standard) are supported, along with the 83 Hiragana and 86 Katakana characters. Level 2 kanji are supported only when FullCharacterSet is enabled.
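The 83 Hiragana and 86 Katakana match the two kana rows of JIS X 0208. The minimal Python sketch below assumes that these rows correspond to the Unicode ranges U+3041 to U+3093 (hiragana) and U+30A1 to U+30F6 (katakana), and uses them to check whether a character belongs to the supported kana set. The names HIRAGANA, KATAKANA and is_supported_kana are hypothetical illustrations, not part of RecAPI.

```python
# Hypothetical helpers (not part of RecAPI): the kana rows of JIS X 0208
# as mapped to Unicode. range() excludes its upper bound.
HIRAGANA = range(0x3041, 0x3094)  # U+3041 (small a) .. U+3093 (n): JIS row 4
KATAKANA = range(0x30A1, 0x30F7)  # U+30A1 (small a) .. U+30F6 (small ke): JIS row 5


def is_supported_kana(ch: str) -> bool:
    """Return True if the single character ch is a JIS X 0208 hiragana or katakana."""
    cp = ord(ch)
    return cp in HIRAGANA or cp in KATAKANA


print(len(HIRAGANA), len(KATAKANA))  # 83 86
# U+30F4 is inside the katakana row; U+30F7 is outside JIS X 0208.
print(is_supported_kana("あ"), is_supported_kana("ヴ"), is_supported_kana("ヷ"))  # True True False
```

Half-width katakana such as U+FF71 fall outside both ranges, so this check treats them as unsupported input code points.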

Our Asian engine recognizes half-width katakana characters, but its output contains the Unicode code points of the corresponding full-width katakana characters. In other words, it treats half-width katakana as full-width katakana printed in a particular font.
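Because the output never contains half-width katakana, reference text that does contain them will not match the recognized text character for character. One option, assumed here and not a RecAPI feature, is to normalize such reference text to full-width before comparing. The minimal Python sketch below applies the standard Unicode NFKC normalization (unicodedata.normalize) only to runs of half-width katakana (U+FF65 to U+FF9F), leaving all other characters untouched. NFKC also merges a half-width voiced sound mark with the preceding kana; this section does not say whether the engine does the same. The function name to_fullwidth_katakana is hypothetical.

```python
import re
import unicodedata

# Half-width katakana in the Unicode "Halfwidth and Fullwidth Forms" block:
# U+FF65 (middle dot) .. U+FF9F (semi-voiced sound mark).
_HALFWIDTH_KATAKANA_RUN = re.compile("[\uFF65-\uFF9F]+")


def to_fullwidth_katakana(text: str) -> str:
    """Replace half-width katakana with full-width equivalents via NFKC.

    Only runs of half-width katakana are normalized, so other characters
    (for example full-width Latin letters) are left as they are. NFKC also
    composes a following voiced mark: U+FF76 U+FF9E becomes U+30AC.
    """
    return _HALFWIDTH_KATAKANA_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


print(to_fullwidth_katakana("ｶﾀｶﾅ ｶﾞ"))  # カタカナ ガ
```

Restricting NFKC to katakana runs avoids side effects of normalizing the whole string, such as folding full-width Latin letters to ASCII.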

Korean: 4888 Hanja characters (as defined by the KS X 1001 standard) are supported, together with the 2350 Hangul characters.

Language             Han (default)                Han added by FullCharacterSet   Non-Han characters         English  Numerals  Punctuation
Simplified Chinese   3755 Level 1 + 499 Level 2   2509 Level 2 (3008 - 499)       -                          52       10        74
Traditional Chinese  5401 Level 1                 7652 Level 2                    -                          52       10        86
Japanese             2965 Level 1                 3390 Level 2                    Hiragana 83, Katakana 86   52       10        92
Korean               4888                         -                               Hangul 2350                52       10        87
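To see how the columns of the summary table combine, the minimal Python sketch below adds them up into the size of the recognized character set per language, with and without FullCharacterSet. It assumes the columns do not overlap and that Kernel.OcrMgr.Asian.CHTIncludesHKSCS is off, so HKSCS characters are not counted. The COUNTS table and the recognized_set_size function are hypothetical illustrations, not RecAPI calls.

```python
# Counts transcribed from the summary table. The layout and names are
# illustrative only; they are not part of RecAPI.
COUNTS = {
    # language: (Han default, Han added by FullCharacterSet, non-Han, punctuation)
    "Simplified Chinese": (3755 + 499, 3008 - 499, 0, 74),
    "Traditional Chinese": (5401, 7652, 0, 86),
    "Japanese": (2965, 3390, 83 + 86, 92),
    "Korean": (4888, 0, 2350, 87),
}
ENGLISH_LETTERS = 52
NUMERALS = 10


def recognized_set_size(language: str, full_character_set: bool) -> int:
    """Total number of characters in the recognized set for one language."""
    han_default, han_added, non_han, punctuation = COUNTS[language]
    han = han_default + (han_added if full_character_set else 0)
    return han + non_han + ENGLISH_LETTERS + NUMERALS + punctuation


for lang in COUNTS:
    print(f"{lang}: {recognized_set_size(lang, False)} default, "
          f"{recognized_set_size(lang, True)} with FullCharacterSet")
```

With these figures, Simplified Chinese grows from 4390 to 6899 recognized characters when FullCharacterSet is enabled, while the Korean total (7387) is unchanged because Korean has no Level 2 set.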

The 499 Level 2 Simplified Chinese characters supported in the default setting are as follows:

[Image cslevel2.jpg: the 499 Level 2 Simplified Chinese characters supported in the default setting]