RecAPI
For each CCJK language, all commonly used characters defined in the country's character encoding standard are recognized. The character sets recognized with default settings are described per language below.
Enabling the Kernel.OcrMgr.Asian.FullCharacterSet setting also adds all Level 2 characters to the recognized set for the Chinese and Japanese languages. (Korean has no level distinction.) With either setting, English letters, numerals and punctuation are also supported.
The recognized Han characters are the ones defined in the following encoding standards:
Language | Encoding standard | Level 1 | Level 2 |
Simplified Chinese | GB 2312 | 3755 | 3008 |
Traditional Chinese | Big-5 | 5401 | 7652 |
Japanese | JIS X 0208 | 2965 | 3390 |
Korean | KS X 1001 | 4888 | - |
Simplified Chinese: 3755 Level 1 and 3008 Level 2 characters (as defined by the GB 2312 standard) are supported. 499 frequently used Level 2 characters (see later) are supported even with the default (non-FullCharacterSet) setting, while the remaining 2509 Level 2 characters are supported only when FullCharacterSet is enabled.
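To check which GB 2312 level a given character belongs to, a small sketch along the following lines may help. It relies on the layout of the GB 2312 standard itself (Level 1 in rows 16 to 55, Level 2 in rows 56 to 87), which is not stated in this document, and on Python's built-in `gb2312` codec. The function name `gb2312_level` is hypothetical and is not part of RecAPI. The sketch does not tell you whether a Level 2 character is one of the 499 recognized by default.

```python
def gb2312_level(ch):
    """Return 1 or 2 for a GB 2312 Level 1 / Level 2 Han character, else None.

    Assumption: the row ranges below come from the GB 2312 standard layout,
    not from the RecAPI documentation.
    """
    try:
        encoded = ch.encode("gb2312")
    except UnicodeEncodeError:
        return None  # character is not in GB 2312 at all
    if len(encoded) != 2:
        return None  # single-byte (ASCII) character
    row = encoded[0] - 0xA0  # EUC-CN lead byte -> GB 2312 row number
    if 16 <= row <= 55:
        return 1
    if 56 <= row <= 87:
        return 2
    return None  # symbols, kana, Greek, and so on (rows 1 to 9)


if __name__ == "__main__":
    for ch in "\u554a\u4e8dA":
        print(repr(ch), gb2312_level(ch))
```

A character reported as Level 2 may need FullCharacterSet enabled before it can be recognized, unless it is in the default list of 499.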
Traditional Chinese: 5401 Level 1 and 7652 Level 2 characters (as defined by the Big-5 standard) are supported. Level 2 characters are supported only when FullCharacterSet is enabled.
If the setting Kernel.OcrMgr.Asian.CHTIncludesHKSCS is set to TRUE and the selected language is Traditional Chinese, the Asian recognition module can also recognize characters from the Hong Kong Supplementary Character Set (HKSCS). Note that enabling this setting may somewhat reduce the accuracy of Traditional Chinese OCR.
Japanese: 2965 Level 1 and 3390 Level 2 kanji characters (as defined by the JIS X 0208 standard) are supported. The 83 Hiragana and 86 Katakana characters are supported as well. Level 2 characters are supported only when FullCharacterSet is enabled.
The Asian engine recognizes half-width katakana characters, but its output contains the Unicode code points of the corresponding full-width katakana characters. In other words, half-width katakana are treated as if they were full-width katakana printed in a particular font.
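When comparing OCR output against reference text that contains half-width katakana, it can help to fold the reference text to full-width first. The sketch below illustrates the code point correspondence with Unicode NFKC normalization from Python's standard `unicodedata` module. It is only an illustration and not the engine's own mechanism; the helper name `to_full_width_katakana` is hypothetical.

```python
import re
import unicodedata

# Half-width katakana letters and sound marks occupy U+FF66..U+FF9F.
_HALF_WIDTH_KATAKANA = re.compile("[\uff66-\uff9f]+")


def to_full_width_katakana(text):
    """Replace runs of half-width katakana with their full-width equivalents.

    Only the half-width katakana range is normalized, so other characters
    (for example full-width Latin letters) are left untouched.
    """
    return _HALF_WIDTH_KATAKANA.sub(
        lambda match: unicodedata.normalize("NFKC", match.group()), text
    )


if __name__ == "__main__":
    sample = "\uff76\uff80\uff76\uff85"  # half-width "katakana"
    print(to_full_width_katakana(sample))
```

Normalizing whole runs rather than single characters lets a base letter followed by a half-width voiced sound mark combine into one full-width character.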
Korean: 4888 Hanja characters (as defined by the KS X 1001 standard) are supported. The 2350 Hangul characters are supported as well.
Language | Default setting | Full Character Set | Non-Han characters | English | Numerals | Punctuation |
Simplified Chinese | 3755 (Level 1) + 499 (Level 2) | 3008 (Level 2) − 499 (Level 2 used in default setting) = 2509 | - | 52 | 10 | 74 |
Traditional Chinese | 5401 (Level 1) | 7652 (Level 2) | - | 52 | 10 | 86 |
Japanese | 2965 (Level 1) | 3390 (Level 2) | Hiragana: 83, Katakana: 86 | 52 | 10 | 92 |
Korean | 4888 | - | Hangul: 2350 | 52 | 10 | 87 |
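The arithmetic behind the table can be sketched as follows. The counts are copied from the table; the dictionary layout and the function names `han_count` and `total_count` are hypothetical and are not part of RecAPI. Summing the columns into one total assumes the categories do not overlap, which this document does not state explicitly.

```python
# Character counts copied from the table above (names are illustrative only).
CHARSETS = {
    "Simplified Chinese": {"default_han": 3755 + 499, "extra_han": 3008 - 499,
                           "non_han": 0, "punctuation": 74},
    "Traditional Chinese": {"default_han": 5401, "extra_han": 7652,
                            "non_han": 0, "punctuation": 86},
    "Japanese": {"default_han": 2965, "extra_han": 3390,
                 "non_han": 83 + 86, "punctuation": 92},
    "Korean": {"default_han": 4888, "extra_han": 0,
               "non_han": 2350, "punctuation": 87},
}
ENGLISH_LETTERS = 52
NUMERALS = 10


def han_count(language, full_character_set=False):
    """Number of Han characters recognized for a language and setting."""
    entry = CHARSETS[language]
    extra = entry["extra_han"] if full_character_set else 0
    return entry["default_han"] + extra


def total_count(language, full_character_set=False):
    """Han plus non-Han, English, numeral and punctuation counts.

    Assumption: the table columns are disjoint, so they can simply be added.
    """
    entry = CHARSETS[language]
    return (han_count(language, full_character_set) + entry["non_han"]
            + ENGLISH_LETTERS + NUMERALS + entry["punctuation"])


if __name__ == "__main__":
    for name in CHARSETS:
        print(name, han_count(name), han_count(name, True), total_count(name))
```

For Simplified Chinese, for example, this gives 4254 Han characters with the default setting and 6763 with FullCharacterSet enabled.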
The 499 Level 2 Simplified Chinese characters are as follows: