RecAPI
|
Zone handling tools.
Classes | |
struct | ZONE |
ZONE structure. | |
struct | ZONEDATA |
ZONEDATA structure. | |
Modules | |
Table Recognition Module | |
Table detection and working with tables. | |
Typedefs | |
typedef ZONE * | LPZONE |
Pointer to a structure ZONE. | |
typedef const ZONE * | LPCZONE |
Const pointer to a structure ZONE. | |
typedef ZONEDATA * | LPZONEDATA |
Pointer to a structure ZONEDATA. | |
typedef const ZONEDATA * | LPCZONEDATA |
Const pointer to a structure ZONEDATA. | |
Enumerations | |
enum | FILLINGMETHOD { FM_DEFAULT = 0, FM_OMNIFONT, FM_DRAFTDOT9, FM_BARCODE, FM_OMR, FM_HANDPRINT, FM_BRAILLE, FM_DRAFTDOT24, FM_OCRA, FM_OCRB, FM_MICR, FM_BARCODE2D, FM_DOTDIGIT, FM_DASHDIGIT, FM_RESERVED_2, FM_CMC7, FM_NO_OCR, FM_SIZE } |
Filling methods. | |
enum | RECOGNITIONMODULE { RM_AUTO = 0, RM_OMNIFONT_MTX, RM_OMNIFONT_MOR, RM_DOT, RM_BAR, RM_OMR, RM_HNR, RM_RER, RM_BRA, RM_MAT, RM_RESERVED_P, RM_OMNIFONT_PLUS2W, RM_OMNIFONT_FRX, RM_OMNIFONT_PLUS3W, RM_ASIAN, RM_RESERVED_M, RM_RESERVED_A, RM_SIZE } |
Recognition modules (Engines). | |
enum | ZONETYPE { WT_FLOW, WT_TABLE, WT_GRAPHIC, WT_AUTO, WT_IGNORE, WT_FORM, WT_VERTTEXT, WT_LEFTTEXT, WT_RIGHTTEXT } |
Basic zone types. | |
enum | IMG_DECOMP { DCM_AUTO, DCM_LEGACY, DCM_STANDARD, DCM_FAST } |
Page parse method. | |
Functions | |
RECERR RECAPIKRN | kRecSetDecompMethod (int sid, IMG_DECOMP Algorithm) |
Setting the decomp method. | |
RECERR RECAPIKRN | kRecGetDecompMethod (int sid, IMG_DECOMP *pAlgorithm) |
Getting the decomp method. | |
RECERR RECAPIKRN | kRecSetNongriddedTableDetect (int sid, INTBOOL bEnable) |
Setting the non-gridded table detection. | |
RECERR RECAPIKRN | kRecGetNongriddedTableDetect (int sid, INTBOOL *bEnable) |
Getting the non-gridded table detection setting. | |
RECERR RECAPIKRN | kRecSetForceSingleColumn (int sid, INTBOOL bForceSingle) |
Specify the Force Single Column mode. | |
RECERR RECAPIKRN | kRecGetForceSingleColumn (int sid, INTBOOL *pbForceSingle) |
Getting the setting of Force Single Column mode. | |
RECERR RECAPIKRN | kRecLocateZones (int sid, HPAGE hPage) |
Page parsing. | |
RECERR RECAPIKRN | kRecSetPageDescription (int sid, DWORD PageDesc) |
Setting page description data. | |
RECERR RECAPIKRN | kRecGetPageDescription (int sid, DWORD *pPageDesc) |
Getting page description data. | |
RECERR RECAPIKRN | kRecGetZoneCount (HPAGE hPage, int *pnZones) |
Getting the user zone count. | |
RECERR RECAPIKRN | kRecGetZoneInfo (HPAGE hPage, IMAGEINDEX iiImg, LPZONE pZone, int nZone) |
Getting user zone information. | |
RECERR RECAPIKRN | kRecGetZoneLayout (HPAGE hPage, IMAGEINDEX iiImg, LPRECT *ppRects, int *pnRects, int iZone) |
Getting user zone shape information. | |
RECERR RECAPIKRN | kRecGetZoneNodeArray (HPAGE hPage, IMAGEINDEX iiImg, LPPOINT *ppPoints, int *pnNodes, int iZone) |
Getting the polygon of the user zone. | |
RECERR RECAPIKRN | kRecDeleteAllZones (HPAGE hPage) |
Deleting all user zones. | |
RECERR RECAPIKRN | kRecDeleteZone (HPAGE hPage, int nZone) |
Deleting a user zone. | |
void RECAPIKRN | kRecInitZone (LPZONE pZone) |
Initializing a ZONE variable. | |
RECERR RECAPIKRN | kRecInsertZone (HPAGE hPage, IMAGEINDEX iiImg, LPCZONE pZone, int nZone) |
Inserting a user zone. | |
RECERR RECAPIKRN | kRecAddZoneRect (HPAGE hPage, IMAGEINDEX iiImg, const RECT *pRect, int nZone) |
Adding a rectangle to a user zone. | |
RECERR RECAPIKRN | kRecSubZoneRect (HPAGE hPage, IMAGEINDEX iiImg, const RECT *pRect, int nZone) |
Subtracting a rectangle from a user zone. | |
RECERR RECAPIKRN | kRecCopyOCRZones (HPAGE hPage) |
Copying the OCR zone list to a user zone list. | |
RECERR RECAPIKRN | kRecLoadZones (HPAGE hPage, LPCTSTR pFileName) |
Loading user zones. | |
RECERR RECAPIKRN | kRecSaveZones (HPAGE hPage, LPCTSTR pFileName) |
Saving the user zone list. | |
RECERR RECAPIKRN | kRecUpdateZone (HPAGE hPage, IMAGEINDEX iiImg, LPCZONE pZone, int nZone) |
Updating a user zone. | |
RECERR RECAPIKRN | kRecSetZoneLayout (HPAGE hPage, IMAGEINDEX iiImg, LPCRECT pRects, int nRects, int nZone) |
Updating the user zone shape information. | |
RECERR RECAPIKRN | kRecGetOCRZoneCount (HPAGE hPage, int *pnOCRZones) |
Getting the OCR zone count. | |
RECERR RECAPIKRN | kRecGetOCRZoneInfo (HPAGE hPage, IMAGEINDEX iiImg, LPZONE pOCRZone, int nOCRZone) |
Getting OCR zone information. | |
RECERR RECAPIKRN | kRecGetOCRZoneData (HPAGE hPage, IMAGEINDEX iiImg, LPZONEDATA pOCRZoneData, int nOCRZone) |
Getting additional information about OCR zones. | |
RECERR RECAPIKRN | kRecGetOCRZoneLayout (HPAGE hPage, IMAGEINDEX iiImg, LPRECT *ppRects, int *pnRects, int nZone) |
Getting OCR zone shape information. | |
RECERR RECAPIKRN | kRecGetOCRZoneNodeArray (HPAGE hPage, IMAGEINDEX iiImg, LPPOINT *ppPoints, int *pnNodes, int iZone) |
Getting the polygon of the OCR zone. | |
RECERR RECAPIKRN | kRecSaveOCRZones (HPAGE hPage, LPCTSTR pFileName) |
Saving the OCR zone list. | |
RECERR RECAPIKRN | kRecUpdateOCRZone (HPAGE hPage, IMAGEINDEX iiImg, LPCZONE pZone, int nZone) |
Updating the OCR zone. | |
Bitmasks of checking control | |
Defining spell checking behavior by zones. See ZONE::chk_control. | |
#define | CHK_LANGDICT_PROHIBIT 0x00000001 |
Prohibit the use of the Language dictionary. | |
#define | CHK_USERDICT_PROHIBIT 0x00000002 |
Prohibit the use of the user dictionary. | |
#define | CHK_CHECKCBF_PROHIBIT 0x00000004 |
Deprecated. | |
#define | CHK_VERTDICT_PROHIBIT 0x00000008 |
Prohibit the use of the Vertical dictionary. | |
#define | CHK_IGNORE_WHITESPACE 0x00000010 |
Ignore white space characters (SPACE and TAB characters) during checking. This field should be used together with the CHK_PASS_LINES flag. | |
#define | CHK_IGNORE_CASE 0x00000020 |
Case-insensitive UD-checking. | |
#define | CHK_PASS_LINES 0x00000040 |
Instructs the selected RECOGNITIONMODULE to pass entire lines to the checker, instead of words. Do not use this attribute in conjunction with spell checking. | |
#define | CHK_CORRECTION_DISABLED 0x00000080 |
Retained only for compatibility. | |
#define | CHK_INCLUDE_PUNCTUATION 0x00000100 |
Checking will consider punctuation characters on the boundaries of the strings as well. | |
#define | CHK_CORRECT_PROPERNAMES 0x00000200 |
Retained only for compatibility. | |
#define | CHK_LANGDICT_USED 0x00010000 |
"After recognition flag": the Language dictionary was enabled during the checking process (spell checking was activated for the zone). | |
#define | CHK_USERDICT_USED 0x00020000 |
"After recognition flag": the user dictionary was enabled during the checking process (UD-checking was activated for the zone). | |
#define | CHK_CHECKCBF_USED 0x00040000 |
Deprecated. | |
#define | CHK_VERTDICT_USED 0x00080000 |
"After recognition flag": a Vertical dictionary was enabled during the checking process. | |
Page Descriptor defines | |
Defining behavior of auto-zoning outside user zones. See the usage of page descriptor. | |
#define | LZ_COLUMN_MASK 0x000000ff |
This can be used for masking the LZ_COLUMN flag. | |
#define | LZ_COLUMN_NO 0x00000001 |
This does not find text zones on the page. | |
#define | LZ_COLUMN_ONE 0x00000002 |
The page contains one column (single column mode). | |
#define | LZ_COLUMN_AUTO 0x00000004 |
This finds text zones on the page automatically. | |
#define | LZ_COLUMN_FIND 0x00000008 |
Internal use only. | |
#define | LZ_TABLE_MASK 0x0000ff00 |
This can be used for masking the LZ_TABLE flag. | |
#define | LZ_TABLE_NO 0x00000100 |
This does not find tables on the page. | |
#define | LZ_TABLE_ONE 0x00000200 |
The whole page is one table. | |
#define | LZ_TABLE_AUTO 0x00000400 |
This finds table zones automatically. | |
#define | LZ_GRAPHICS_MASK 0x00ff0000 |
This can be used for masking the LZ_GRAPHICS flag. | |
#define | LZ_GRAPHICS_NO 0x00010000 |
This does not find graphics on the page. | |
#define | LZ_GRAPHICS_ONE 0x00020000 |
The whole page is one graphic. | |
#define | LZ_GRAPHICS_AUTO 0x00040000 |
This finds graphic zones automatically. | |
#define | LZ_FORM 0x01000000 |
The page contains an unfilled form. Do not create any user zones if you use LZ_FORM! See Form Recognition Module as well. | |
#define | LZ_FREEFORM 0x02000000 |
This can be used for recognition of free forms. This is when a page contains a filled, gridded form, and the best possible OCR accuracy is desired, without creating formatted output. In this case the gridded form is decomposed into smaller text zones optimized for OCR. The zones are not sorted by reading order. It cannot be combined with LZ_FORM. It is used only by DCM_STANDARD method, and not used by the Asian auto-zoning algorithms. |
Zone handling tools.
A zone is a rectangular area, or the union of specifically located rectangular areas, on the page. Its dimensions cannot exceed the full page size. It contains a feature of interest to the user.
The union of rectangles must have a so-called pizzabox shape: the top of each rectangle in the union must touch the bottom of the rectangle above it (i.e. the bottom of the upper one and the top of the lower one are exactly the same). A rectangle can touch at most one rectangle above and one below.
Zones that cannot have a pizzabox shape include:
A pizzabox-shaped zone is a compound and irregular zone.
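The pizzabox rule above can be checked mechanically. The following is a minimal sketch, not SDK code: `Rect` is a hypothetical stand-in for the SDK's RECT, `is_pizzabox` is an illustrative helper name, and the rectangles are assumed to be listed from top to bottom. The requirement that neighbouring rectangles share part of their common edge is an assumption drawn from the word "touch".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SDK's RECT (grid coordinates). */
typedef struct { int left, top, right, bottom; } Rect;

/* Returns true if rects[0..n-1], listed from top to bottom, form a
 * pizzabox: each rectangle's top coincides exactly with the bottom of
 * the rectangle above it. Checking a single chain also guarantees that
 * a rectangle touches at most one rectangle above and one below. */
bool is_pizzabox(const Rect *rects, size_t n)
{
    assert(rects != NULL || n == 0);
    if (n == 0)
        return false;                      /* a zone needs at least one rectangle */

    for (size_t i = 0; i < n; ++i) {
        if (rects[i].left >= rects[i].right || rects[i].top >= rects[i].bottom)
            return false;                  /* empty or inverted rectangle */
        if (i == 0)
            continue;
        if (rects[i - 1].bottom != rects[i].top)
            return false;                  /* edges must be exactly the same */
        /* Assumption: "touch" means the shared edge has non-zero length. */
        if (rects[i].left >= rects[i - 1].right ||
            rects[i].right <= rects[i - 1].left)
            return false;
    }
    return true;
}
```

A check like this could be run on a rectangle list before it is handed to a layout-setting call, so that shape errors surface in application code rather than as an error return.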
The image data covered by each zone is handled and processed (typically recognized) separately, according to zone-specific parameters.
NOTE: In both the SDK and its documentation, coordinates refer to grid coordinates, i.e. the top or left borders of pixels. Thus a rectangle does not contain the pixels at its right and bottom coordinates.
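The grid-coordinate convention means that the right and bottom edges are exclusive. A small sketch of what that implies for size and containment calculations follows; `Rect` and the helper names are hypothetical and not part of the SDK.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the SDK's RECT (grid coordinates). */
typedef struct { int left, top, right, bottom; } Rect;

/* Width and height in pixels: no "+ 1", because right and bottom are
 * grid lines that lie just past the last covered pixel. */
int rect_width(Rect r)  { return r.right - r.left; }
int rect_height(Rect r) { return r.bottom - r.top; }

/* True if pixel (x, y) lies inside r: left/top are inclusive,
 * right/bottom are exclusive. */
bool rect_contains_pixel(Rect r, int x, int y)
{
    return x >= r.left && x < r.right && y >= r.top && y < r.bottom;
}
```

For example, a rectangle with left = 10 and right = 20 covers pixel columns 10 through 19, which is ten columns.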
Any HPAGE can contain two types of zones in zone lists: user zones and OCR zones.
The user zones are defined by the user. The OCR zones are created by the page parser process, which detects the OCR zones and fills the OCR zone list. When there are user zones, the page parser creates one or more OCR zones from each user zone and it may process the area outside of user zones, as controlled by the page descriptor (see below).
IMPORTANT: The CSDK does not support overlapping non-graphical user zones. A graphical user zone (WT_GRAPHIC) can overlap non-graphical ones. Furthermore, the auto-zoning algorithm may create graphical OCR zones overlapping non-graphical ones.
The type of an OCR zone can never be WT_AUTO or WT_IGNORE. The created OCR zones always inherit the attributes (e.g. filter, filling method, etc.; see ZONE) of the user zone inside which they were created. If an OCR zone is created outside user zones, its attributes will be set to default for filling method, recognition module, filters and spell checking related properties.
The recognition process (kRecRecognize) works on OCR zones.
The number of zones in the zone lists can be queried at any time using the functions kRecGetZoneCount and kRecGetOCRZoneCount. All functions that use an index to determine the zone to be queried or modified may receive the index -1. This refers to the last zone in the given zone list. Exception: for kRecInsertZone, pass the value -1 to have the new zone inserted at the end of the zone list. From then on the value -1 refers to this inserted zone.
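The two meanings of the index -1 can be modelled as follows. This is only an illustration of the convention: both helper names are hypothetical, and returning -1 for an out-of-range index is an assumption made for the sketch (the SDK functions report errors through RECERR).

```c
/* Index passed to a query, update or delete call.
 * -1 means "the last zone". Returns the concrete 0-based index,
 * or -1 if the list is empty or the index is out of range. */
int resolve_zone_index(int index, int zone_count)
{
    if (zone_count <= 0)
        return -1;
    if (index == -1)
        return zone_count - 1;
    if (index < 0 || index >= zone_count)
        return -1;
    return index;
}

/* Index passed to an insert call.
 * -1 means "append", so the new zone lands at position zone_count.
 * Assumption: any position in 0..zone_count is otherwise acceptable. */
int resolve_insert_position(int index, int zone_count)
{
    if (zone_count < 0)
        return -1;
    if (index == -1)
        return zone_count;
    if (index < 0 || index > zone_count)
        return -1;
    return index;
}
```

After an append into a list of three zones, the list holds four zones and `resolve_zone_index(-1, 4)` yields 3, which is the zone just inserted.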
Zones can be added to the appropriate zone list of any given HPAGE in three different ways:
Automatic page-layout decomposition process (auto-zoning) can be activated directly by calling the kRecLocateZones function for finding text blocks on the image. It creates an entire OCR zone list for the given HPAGE.
OmniPage Capture SDK v20 offers three different algorithms to be applied during auto-zoning: use the kRecSetDecompMethod function to specify the Page parser algorithm to be applied during auto-zoning. For details, see also IMG_DECOMP.
When you use auto-zoning, each resulting zone is initialized with a zone type (any type except WT_AUTO and WT_IGNORE). All zones created by this function will have
If auto-zoning uses the method DCM_STANDARD, the process will also attempt finding horizontal and vertical rule lines. If there are user zones, auto-zoning searches for rule lines in WT_AUTO and WT_TABLE zones and also outside the user zones - when the page descriptor allows it (see Page Descriptor defines). If there are no user zones, rule line detection is performed. Rule lines are stored in the page, in a line list. The recognition process (PID_RECOGNITION1 et al.) modifies this line list and it retains only the lines that are outside the OCR zones. After zoning, the line list does not contain dotted, dashed or double style lines (RLSTYLE). This information only becomes available after the recognition process.
OCR zones may be changed by the recognition process (kRecRecognize), because some post-processing operations have such effects. This happens, for example, when non-gridded table detection (kRecSetNongriddedTableDetect) runs during the recognition process.
Any zone can be locally overridden with the functions kRecUpdateZone and kRecUpdateOCRZone. These allow you to change the attributes of a zone in the zone list. Note that the fields ZONE::rectBBox and ZONE::type cannot be modified by kRecUpdateOCRZone.
You can choose to search for zones automatically, and/or create your own zones: user zones. To add simple zones to the zone list manually, use the kRecInsertZone function. To add a rectangle to or subtract a rectangle from an existing user zone, use the functions kRecAddZoneRect or kRecSubZoneRect.
The third way of creating zones is to have zones read from a file (called a zone file, or in OmniPage terminology, a zone template file) that contains the attributes of previously saved zones. Zones created this way will also be user zones. An integrating application can save the current user zone definitions to a zone file any time with the kRecSaveZones function. The application can load them from a zone file with the kRecLoadZones function.
NOTE: When a zone file is loaded, any previous zones are removed from the page.
If the application calls the kRecRecognize recognizing function on a page with an empty zone list, the page-layout decomposition function is called automatically.
It is recommended to create homogeneous user zones as much as possible, because they may give better results. It is especially important in the case of Asian languages (either CCJK, Arabic, Thai or Hebrew). WT_AUTO zones can be inhomogeneous.
To get information about any particular zone in the image zone list, invoke the kRecGetZoneInfo and kRecGetOCRZoneInfo functions. These functions are useful to find out more about zones created by auto-zoning.
NOTE: When you update a table-type zone with the kRecUpdateZone function, the cell-detection algorithm will not be activated, resulting in improper table detection within the zone. See the description of creation of table information.
Any changes in user zone list (kRecInsertZone, kRecDeleteZone, kRecDeleteAllZones, kRecAddZoneRect, kRecSubZoneRect, kRecLoadZones, kRecUpdateZone, kRecSetZoneLayout) will make OCR zones invalid; the OCR zone list will be emptied and regenerated.
The page description describes the possible layout elements (text, table, graphics and form) on the page outside of the user zones. These layout elements are found by the page-parse and the recognition processes. The page description has no effect inside the user zones. The LZ_COLUMN / LZ_TABLE / LZ_GRAPHICS flags specify how to find text / table / graphic zones.
A valid page description is either a combination of the LZ_COLUMN_column, LZ_TABLE_table and LZ_GRAPHICS_graphics flags (where column, table and graphics can be NO, ONE or AUTO) or the LZ_FORM flag (LZ_FORM cannot be combined with other flags). The default page description is LZ_COLUMN_NO | LZ_TABLE_NO | LZ_GRAPHICS_NO. This means that page-parse does not create OCR zones outside of the user zones. If no user zones were specified, LZ_COLUMN_NO | LZ_TABLE_NO | LZ_GRAPHICS_NO is equivalent to LZ_COLUMN_AUTO | LZ_TABLE_AUTO | LZ_GRAPHICS_AUTO.
IMPORTANT NOTE: if the page descriptor is set to LZ_FORM, there must not be any zones on the page.
If LZ_TABLE_ONE is set, LZ_COLUMN_column and LZ_GRAPHICS_graphics are not considered. If LZ_GRAPHICS_ONE is set, LZ_COLUMN_column and LZ_TABLE_table are not considered. If both LZ_TABLE_ONE and LZ_GRAPHICS_ONE are set, the zoning works as if only LZ_TABLE_ONE were set.
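The validity and precedence rules above can be expressed as a short check. This is a sketch under stated assumptions, not SDK code: the flag values are copied from the Page Descriptor defines on this page, while `page_desc_is_valid`, `whole_page_mode` and the `WholePageMode` enum are hypothetical names. It assumes that exactly one value per column/table/graphics group is required, that LZ_FREEFORM may accompany those flags (the text only forbids combining it with LZ_FORM), and that LZ_COLUMN_FIND is rejected because it is marked for internal use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the Page Descriptor defines on this page. */
#define LZ_COLUMN_MASK   0x000000ff
#define LZ_COLUMN_NO     0x00000001
#define LZ_COLUMN_ONE    0x00000002
#define LZ_COLUMN_AUTO   0x00000004
#define LZ_TABLE_MASK    0x0000ff00
#define LZ_TABLE_NO      0x00000100
#define LZ_TABLE_ONE     0x00000200
#define LZ_TABLE_AUTO    0x00000400
#define LZ_GRAPHICS_MASK 0x00ff0000
#define LZ_GRAPHICS_NO   0x00010000
#define LZ_GRAPHICS_ONE  0x00020000
#define LZ_GRAPHICS_AUTO 0x00040000
#define LZ_FORM          0x01000000
#define LZ_FREEFORM      0x02000000

static bool is_one_of(uint32_t v, uint32_t a, uint32_t b, uint32_t c)
{
    return v == a || v == b || v == c;
}

/* True if desc is LZ_FORM alone, or exactly one NO/ONE/AUTO value from
 * each of the column, table and graphics groups (optionally with
 * LZ_FREEFORM, which is an assumption of this sketch). */
bool page_desc_is_valid(uint32_t desc)
{
    const uint32_t known =
        LZ_COLUMN_MASK | LZ_TABLE_MASK | LZ_GRAPHICS_MASK | LZ_FREEFORM;

    if (desc & LZ_FORM)
        return desc == LZ_FORM;            /* LZ_FORM cannot be combined */
    if (desc & ~known)
        return false;                      /* unknown bits */
    return is_one_of(desc & LZ_COLUMN_MASK,
                     LZ_COLUMN_NO, LZ_COLUMN_ONE, LZ_COLUMN_AUTO)
        && is_one_of(desc & LZ_TABLE_MASK,
                     LZ_TABLE_NO, LZ_TABLE_ONE, LZ_TABLE_AUTO)
        && is_one_of(desc & LZ_GRAPHICS_MASK,
                     LZ_GRAPHICS_NO, LZ_GRAPHICS_ONE, LZ_GRAPHICS_AUTO);
}

typedef enum { PAGE_MIXED, PAGE_ONE_TABLE, PAGE_ONE_GRAPHIC } WholePageMode;

/* Precedence of the "whole page" flags: LZ_TABLE_ONE wins over
 * LZ_GRAPHICS_ONE, and either one makes the other groups irrelevant. */
WholePageMode whole_page_mode(uint32_t desc)
{
    if ((desc & LZ_TABLE_MASK) == LZ_TABLE_ONE)
        return PAGE_ONE_TABLE;
    if ((desc & LZ_GRAPHICS_MASK) == LZ_GRAPHICS_ONE)
        return PAGE_ONE_GRAPHIC;
    return PAGE_MIXED;
}
```

Running such a check before passing a descriptor to kRecSetPageDescription keeps invalid combinations out of the settings in the first place.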
The DCM_LEGACY and DCM_FAST mode zoning can handle only the following cases:
LZ_COLUMN_NO | LZ_TABLE_NO | LZ_GRAPHICS_NO: this is the default,
LZ_COLUMN_AUTO | LZ_TABLE_AUTO | LZ_GRAPHICS_AUTO: only when there is no user zone,
LZ_COLUMN_ONE | LZ_TABLE_AUTO | LZ_GRAPHICS_AUTO: only when there is no user zone,
any other cases cause an error (API_ERROR_ERR).
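The restriction for DCM_LEGACY and DCM_FAST amounts to a three-entry whitelist. The sketch below encodes it as a predicate an application could evaluate before zoning; `legacy_fast_supports` is a hypothetical helper, and the flag values are copied from the Page Descriptor defines on this page. Descriptors are compared for exact equality, which is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the Page Descriptor defines on this page. */
#define LZ_COLUMN_NO     0x00000001
#define LZ_COLUMN_ONE    0x00000002
#define LZ_COLUMN_AUTO   0x00000004
#define LZ_TABLE_NO      0x00000100
#define LZ_TABLE_AUTO    0x00000400
#define LZ_GRAPHICS_NO   0x00010000
#define LZ_GRAPHICS_AUTO 0x00040000

/* True if the DCM_LEGACY / DCM_FAST zoning accepts this page descriptor
 * for a page that does (or does not) carry user zones. */
bool legacy_fast_supports(uint32_t desc, bool has_user_zones)
{
    const uint32_t all_no   = LZ_COLUMN_NO   | LZ_TABLE_NO   | LZ_GRAPHICS_NO;
    const uint32_t all_auto = LZ_COLUMN_AUTO | LZ_TABLE_AUTO | LZ_GRAPHICS_AUTO;
    const uint32_t one_col  = LZ_COLUMN_ONE  | LZ_TABLE_AUTO | LZ_GRAPHICS_AUTO;

    if (desc == all_no)
        return true;                       /* the default is always accepted */
    if (desc == all_auto || desc == one_col)
        return !has_user_zones;            /* only without user zones */
    return false;                          /* everything else is an error */
}
```

When the predicate returns false, the application can either switch to DCM_STANDARD or adjust the descriptor before calling kRecLocateZones.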
Page descriptor can be specified by the function kRecSetPageDescription.
Table detection and handling by Table Recognition Module are also parts of this module.
Some aspects of the Zone Handling Module's behavior can also be adjusted through settings.
Detection of the filling method of zones can be performed manually by calling the function kRecDetectFillingMethod. It works on zones with FM_DEFAULT. If the default filling method (kRecSetDefaultFillingMethod) is set to FM_DEFAULT, the filling method detection is called automatically at the beginning of the recognition process.
If filling method detection cannot determine a type in a given zone, it leaves FM_DEFAULT in the field fm of the zone.
During recognition, if both the default filling method and the field ZONE::fm are FM_DEFAULT, the engine assumes the FM_OMNIFONT filling method for such zones.
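The fallback chain for the filling method can be summarized in a few lines. This is an interpretive sketch, not the engine's implementation: the enum is copied from the FILLINGMETHOD declaration on this page, `effective_filling_method` is a hypothetical helper, and `zone_fm` is assumed to be the value of ZONE::fm after any filling method detection has already run.

```c
/* Enumerators copied from the FILLINGMETHOD declaration on this page. */
typedef enum {
    FM_DEFAULT = 0, FM_OMNIFONT, FM_DRAFTDOT9, FM_BARCODE, FM_OMR,
    FM_HANDPRINT, FM_BRAILLE, FM_DRAFTDOT24, FM_OCRA, FM_OCRB, FM_MICR,
    FM_BARCODE2D, FM_DOTDIGIT, FM_DASHDIGIT, FM_RESERVED_2, FM_CMC7,
    FM_NO_OCR, FM_SIZE
} FILLINGMETHOD;

/* Filling method the engine would work with for one zone:
 *   1. an explicit ZONE::fm wins;
 *   2. otherwise the default filling method setting applies;
 *   3. if that is FM_DEFAULT as well, FM_OMNIFONT is assumed. */
FILLINGMETHOD effective_filling_method(FILLINGMETHOD zone_fm,
                                       FILLINGMETHOD default_fm)
{
    if (zone_fm != FM_DEFAULT)
        return zone_fm;
    if (default_fm != FM_DEFAULT)
        return default_fm;
    return FM_OMNIFONT;
}
```

The third step covers the case where detection ran but could not determine a type, since detection then leaves FM_DEFAULT in the zone.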
For Western languages, including Greek and languages using the Cyrillic alphabet, automatic detection of left or right rotated vertical text is available, including detection inside table cells.
Auto-detected vertical text zones outside tables take the flags WT_LEFTTEXT or WT_RIGHTTEXT while table cells detected as containing vertical text now include this in the CELL_INFO data.
This auto-detection runs on images with no inserted user zones, or on page portions designated for auto-zoning. Text direction can be forced by inserting user zones into page images containing Western or Cyrillic languages, using the following flags: normal (WT_FLOW), left rotated vertical (WT_LEFTTEXT) or right rotated vertical (WT_RIGHTTEXT). Vertical text user zones must be rectangular; they can be placed anywhere on the page and can cover multi-line texts. To force vertical text handling in a table cell, the required flag should be set in the new text type field inside CELL_INFO.
Automatic left and right text detection can be disabled by switching the Kernel.Decomp.FindRotatedText setting off. Switch this setting off if the processed document does not contain vertical text, because the vertical text detection (and recognition) may increase processing time.
Vertical text can also be auto-detected for CCJK languages. Alternatively, it can be explicitly set by inserting zones with the flags WT_FLOW for horizontal left-to-right text or WT_VERTTEXT for vertical text with top-to-bottom character flow and right-to-left line flow. As for Western languages, CCJK vertical text zones must be rectangular. If an irregular zone is changed to vertical text content, its shape snaps to a bounding rectangle, removing the irregularity. It is not possible to pass text direction information for table cells to the Asian OCR module – it will perform auto-detection.
In the recognition results, the LETTER structure makeup field contains two additional bits to store the text direction. See also which output converters and formatting levels of the RecAPIPlus support the different types of vertical texts.
enum FILLINGMETHOD |
Filling methods.
This enumerates the possible content types of the zones from the Engine's perspective. Each zone must have one of the filling methods listed here. It can be done by specifying the fm field of the zones defined on the image.
FM_DEFAULT |
The default zone filling method. The actual zone filling method for all zones of this type will be inquired just before recognition, according to the setting previously specified with a separate call to the kRecSetDefaultFillingMethod function. |
FM_OMNIFONT |
The omnifont zone filling method. It denotes a machine printed text with any typeface not highly stylized. All platforms. |
FM_DRAFTDOT9 |
The 9-pin draft dot-matrix zone filling method. It denotes a 9-pin draft dot-matrix printout. Supported on: Windows. |
FM_BARCODE |
The 1D barcode zone filling method. It denotes a one-dimensional barcode within the zone. |
FM_OMR |
The optical mark zone filling method. It denotes a zone with one or more checkboxes that are judged to be marked or unmarked. |
FM_HANDPRINT |
The hand-written zone filling method. It denotes hand-written text within the zone. Supported on: Windows, Mac OS X. |
FM_BRAILLE |
This filling method is NOT available. |
FM_DRAFTDOT24 |
The 24-pin draft dot-matrix zone filling method. It denotes a 24-pin draft dot-matrix printout. All platforms. |
FM_OCRA |
The OCR-A zone filling method. |
FM_OCRB |
The OCR-B zone filling method. |
FM_MICR |
The magnetic ink character filling method. Supported on: Windows, Mac OS X. |
FM_BARCODE2D |
The 2D barcode zone filling method. |
FM_DOTDIGIT |
The dot-digit zone filling method. Supported on: Windows. |
FM_DASHDIGIT |
The dash-digit zone filling method. Supported on: Windows. |
FM_RESERVED_2 |
Internal use only. |
FM_CMC7 |
The CMC7 font zone filling method. Supported on: Windows, Mac OS X. |
FM_NO_OCR |
No recognition will be attempted. |
FM_SIZE |
Number of zone filling methods. |
enum IMG_DECOMP |
Page parse method.
This enum lists the possible values of the Page parser algorithm settings of the Engine. This setting makes it possible to specify one of the three different page parser algorithms for Latin-alphabet languages, or one of the two different algorithms for CCJK languages. In the latter case DCM_LEGACY and DCM_FAST are the same. This setting has no effect for Arabic, Thai and Hebrew OCR.
enum RECOGNITIONMODULE |
Recognition modules (Engines)
This enumerates the different recognition modules of the Engine available to the integrating application. All zones must have an assigned recognition module in their rm fields before processing.
enum ZONETYPE |
Basic zone types.
WT_FLOW |
Flowed text. This zone type means that the zone contains textual information arranged horizontally without a table type structure inside. Inside a user zone of this type kRecLocateZones creates one OCR zone of the same type. It can be in OCR zones and user zones. It can also be used for horizontally appearing CCJK characters. |
WT_TABLE |
Table type zone. This type means that the zone contains a table, i.e. rows and columns, with or without a grid. Such zones will be handled differently from flowed text type zones. Inside a user zone of this type kRecLocateZones creates one OCR zone of the same type. The Engine will try to reconstruct as much of the original table text layout of the zone as the final output document format supports. |
WT_GRAPHIC |
Graphic type zone. This type of zone contains graphics, i.e. this zone will not be recognized at all and all other recognition related settings will be ignored. The only reason to have such a zone is to save or export the image inside it. Inside a user zone of this type kRecLocateZones creates one OCR zone of the same type. |
WT_AUTO |
Inside a user zone of this type kRecLocateZones performs a parsing algorithm and it may create several OCR zones of any type except WT_AUTO and WT_IGNORE. |
WT_IGNORE |
Ignore zone. kRecLocateZones does not create OCR zones inside a user zone of this type. |
WT_FORM |
Form zone. Logical Form Recognition will run within this zone. It indicates an unfilled form and it should be set in the user zone before running kRecLocateZones. kRecLocateZones creates one OCR zone of the same type, the created OCR zone contains a description of the form objects. See also Form Recognition Module. |
WT_VERTTEXT |
Vertical text. For CCJK characters only. |
WT_LEFTTEXT |
Left rotated text. For Latin, Greek and Cyrillic characters only. |
WT_RIGHTTEXT |
Right rotated text. For Latin, Greek and Cyrillic characters only. |
RECERR RECAPIKRN kRecAddZoneRect | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
const RECT * | pRect, | ||
int | nZone | ||
) |
Adding a rectangle to a user zone.
This function adds a new rectangle to an existing user zone. It creates their union. Because the new rectangle can overlap previous rectangles the function recalculates the rectangle list of the zone. The resulting union must be pizza-box except in the case of OMR zones. Of course, table zones and vertical text zones cannot receive new rectangles.
[in] | hPage | Handle of the page. |
[in] | iiImg | The index of the image whose coordinate system you have used in defining the rectangle to be added. It is recommended to use II_CURRENT coordinates if possible. |
[in] | pRect | The rectangle to be added to the given user zone. |
[in] | nZone | The index of the user zone the new rectangle is added to. |
RECERR |
nZone-th one).
RECERR kRecAddZoneRect(IntPtr hPage, IMAGEINDEX iiImg, RECT pRect, int nZone);
Copying the OCR zone list to a user zone list.
This function copies the OCR zones in the place of user zones. It leaves the OCR zones intact, but deletes the former user zones. It can be used to delete/insert/change user zones based on the OCR zones detected by a previous kRecLocateZones.
[in] | hPage | Handle of the page. |
RECERR |
RECERR kRecCopyOCRZones(IntPtr hPage);
Deleting all user zones.
This function deletes all zones in both the user zone list and the OCR zone list of the page.
[in] | hPage | Handle of the page. |
RECERR |
RECERR kRecDeleteAllZones(IntPtr hPage);
Deleting a user zone.
This function deletes a zone from the user zone list of the page.
[in] | hPage | The handle of the page. |
[in] | nZone | Index of the user zone to be deleted. |
RECERR |
RECERR kRecDeleteZone(IntPtr hPage, int nZone);
RECERR RECAPIKRN kRecGetDecompMethod | ( | int | sid, |
IMG_DECOMP * | pAlgorithm | ||
) |
Getting the decomp method.
This function inquires the current Page parser algorithm setting of the Engine.
[in] | sid | Settings Collection ID. |
[out] | pAlgorithm | The current page parser algorithm. |
RECERR |
RECERR kRecGetDecompMethod(int sid, out IMG_DECOMP decompAlg);
RECERR RECAPIKRN kRecGetForceSingleColumn | ( | int | sid, |
INTBOOL * | pbForceSingle | ||
) |
Getting the setting of Force Single Column mode.
This function inquires the current setting of the Force Single Column mode.
[in] | sid | Settings Collection ID. |
[out] | pbForceSingle | Address of a Boolean variable to hold the current Force Single Column mode setting. |
RECERR |
RECERR kRecGetForceSingleColumn(int sid, out bool bEnable);
RECERR RECAPIKRN kRecGetNongriddedTableDetect | ( | int | sid, |
INTBOOL * | bEnable | ||
) |
Getting the non-gridded table detection setting.
This function reports whether the non-gridded table detection feature of the Engine is enabled.
[in] | sid | Settings Collection ID. |
[out] | bEnable | The value of the current non-gridded table detection setting. |
RECERR |
RECERR kRecGetNongriddedTableDetect(int sid, out bool bEnable);
Getting the OCR zone count.
This function gets the number of zones in the OCR zone list of the page.
[in] | hPage | Handle of the page. |
[out] | pnOCRZones | Address of an integer variable to get the number of zones. |
RECERR |
RECERR kRecGetOCRZoneCount(IntPtr hPage, out int ZoneCount);
RECERR RECAPIKRN kRecGetOCRZoneData | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
LPZONEDATA | pOCRZoneData, | ||
int | nOCRZone | ||
) |
Getting additional information about OCR zones.
This function can be used for getting additional information about any OCR zone in the OCR zone list of the page.
[in] | hPage | Handle of the page. |
[in] | iiImg | Index of the image in the page whose coordinate system is to be used to report the zone's boundary box. |
[out] | pOCRZoneData | Pointer to a variable for storing the requested zone-data information. |
[in] | nOCRZone | Index of the zone in the zone list, from which the information is requested. |
RECERR |
RECERR kRecGetOCRZoneData(IntPtr hPage, IMAGEINDEX iiImg, out ZONEDATA pOCRZoneData, int nOCRZone);
RECERR RECAPIKRN kRecGetOCRZoneInfo | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
LPZONE | pOCRZone, | ||
int | nOCRZone | ||
) |
Getting OCR zone information.
This function can be used for getting information about any zone in the OCR zone list of the page.
[in] | hPage | Handle of the page. |
[in] | iiImg | Index of the image in the page whose coordinate system is used to report the zone's boundary box. |
[out] | pOCRZone | Pointer to a variable for storing the requested zone information. |
[in] | nOCRZone | Index of the zone in the zone list, from which the information is requested. |
RECERR |
RECERR kRecGetOCRZoneInfo(IntPtr hPage, IMAGEINDEX iiImage, out ZONE pZone, int nZone);
RECERR RECAPIKRN kRecGetOCRZoneLayout | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
LPRECT * | ppRects, | ||
int * | pnRects, | ||
int | nZone | ||
) |
Getting OCR zone shape information.
This function can be used for getting information about the shape of any zone in the OCR zone list of the hPage page. For more information about the possible shape of the zones see the definition of pizza-box.
[in] | hPage | Handle of the page. |
[in] | iiImg | Index of the image in the page whose coordinate system is used to report the shape information. |
[out] | ppRects | Pointer to an array of RECTs for storing the requested shape information. |
[out] | pnRects | Pointer to an integer variable for storing the number of rectangles in the ppRects array. |
[in] | nZone | Index of the zone in the zone list, from which the information is requested. |
RECERR |
RECERR kRecGetOCRZoneLayout(IntPtr hPage, IMAGEINDEX iiImg, out RECT[] ppRects, int nZone);
RECERR RECAPIKRN kRecGetOCRZoneNodeArray | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
LPPOINT * | ppPoints, | ||
int * | pnNodes, | ||
int | iZone | ||
) |
Getting the polygon of the OCR zone.
This function retrieves the polygon made up of the OCR zone's vertices. This can be useful for an application with a GUI for drawing irregular zones.
[in] | hPage | The handle of the page. |
[in] | iiImg | Index of the image in the page whose coordinate system is used to report the points. |
[out] | ppPoints | Pointer to the array of polygon points. This array is allocated by the function and can be freed by calling the function kRecFree. |
[out] | pnNodes | The pointer of an integer retrieving the number of polygon vertices. |
[in] | iZone | The index of the OCR zone in question. |
RECERR |
If iiImg is II_ORIGINAL, the polygon may have slanting edges.
RECERR kRecGetOCRZoneNodeArray(IntPtr hPage, IMAGEINDEX iiImg, out POINT[] ppPoints, int nZone);
RECERR RECAPIKRN kRecGetPageDescription | ( | int | sid, |
DWORD * | pPageDesc | ||
) |
Getting page description data.
This function gets the current page description data.
[in] | sid | Settings Collection ID. |
[out] | pPageDesc | The actual Page Descriptor. |
RECERR |
RECERR kRecGetPageDescription(int sid, out PAGEDESCRIPTION pPageDesc);
Getting the user zone count.
This function gets the number of zones in the user zone list for the page.
[in] | hPage | Handle of the page. |
[out] | pnZones | Address of an integer variable to get the number of zones. |
RECERR |
RECERR kRecGetZoneCount(IntPtr hPage, out int pnZones);
RECERR RECAPIKRN kRecGetZoneInfo | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
LPZONE | pZone, | ||
int | nZone | ||
) |
Getting user zone information.
This function can be used for getting information about any zone in the user zone list of the page.
[in] | hPage | Handle of the page. |
[in] | iiImg | Index of the image in the page, whose coordinate system is used to report the zone's boundary box. |
[out] | pZone | Pointer to a variable for storing the requested zone information. |
[in] | nZone | Index of the zone in the zone list, from which the information is requested. |
RECERR |
RECERR kRecGetZoneInfo(IntPtr hPage, IMAGEINDEX iiImg, out ZONE pZone, int nZone);
RECERR RECAPIKRN kRecGetZoneLayout | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
LPRECT * | ppRects, | ||
int * | pnRects, | ||
int | iZone | ||
) |
Getting user zone shape information.
This function can be used for getting information about the shape of any zone in the user zone list of the hPage page. For more information about the possible zone shapes see the definition of pizza-box.
[in] | hPage | Handle of the page. |
[in] | iiImg | Index of the image in the page, whose coordinate system is used to report the requested zone shape. |
[out] | ppRects | Pointer to an array of RECTs for storing the requested shape information. |
[out] | pnRects | Pointer to an integer variable for storing the number of rectangles in the ppRects array. |
[in] | iZone | Index of the zone in the zone list, from which the information is requested. |
RECERR |
RECERR kRecGetZoneLayout(IntPtr hPage, IMAGEINDEX iiImg, out RECT[] ppRects, int nZone);
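A zone shape is reported as an array of rectangles, so a GUI often needs a single enclosing rectangle, for instance for hit testing. The following is a minimal sketch of that computation, not SDK code: DemoRect is a hypothetical stand-in for RECT, and demo_bounding_box is an illustrative helper that would be fed the array and count returned through ppRects and pnRects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the RECT type; not part of the SDK. */
typedef struct { long left, top, right, bottom; } DemoRect;

/* Returns the smallest rectangle that encloses all n rectangles of a shape. */
DemoRect demo_bounding_box(const DemoRect *rects, int n)
{
    assert(rects != NULL && n > 0);
    DemoRect box = rects[0];
    for (int i = 1; i < n; ++i) {
        if (rects[i].left   < box.left)   box.left   = rects[i].left;
        if (rects[i].top    < box.top)    box.top    = rects[i].top;
        if (rects[i].right  > box.right)  box.right  = rects[i].right;
        if (rects[i].bottom > box.bottom) box.bottom = rects[i].bottom;
    }
    return box;
}
```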
RECERR RECAPIKRN kRecGetZoneNodeArray | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
LPPOINT * | ppPoints, | ||
int * | pnNodes, | ||
int | iZone | ||
) |
Getting the polygon of the user zone.
This function retrieves the polygon made up of the vertices of the user zone. This can be useful for an application with a GUI when drawing irregular zones.
[in] | hPage | The handle of the page. |
[in] | iiImg | Index of the image in the page, whose coordinate system is used to report the points. |
[out] | ppPoints | Pointer to the array of polygon points. The function allocates this array; free it by calling kRecFree. |
[out] | pnNodes | Pointer to an integer that receives the number of polygon vertices. |
[in] | iZone | The index of the user zone in question. |
RECERR |
Note: if iiImg is II_ORIGINAL, the polygon may have slanting edges due to the deskew operation.
RECERR kRecGetZoneNodeArray(IntPtr hPage, IMAGEINDEX iiImg, out POINT[] ppPoints, int nZone);
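Because the polygon may have slanting edges in II_ORIGINAL coordinates, drawing code may want to detect that case before assuming axis-aligned edges. The sketch below shows one way to check it. DemoPoint is a hypothetical stand-in for POINT, demo_has_slanting_edge is an illustrative helper, and the code assumes the vertices are listed in order with the last vertex implicitly connected back to the first, which the source does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the POINT type; not part of the SDK. */
typedef struct { long x, y; } DemoPoint;

/* Returns 1 if any edge of the closed polygon is neither horizontal
   nor vertical, and 0 if every edge is axis-aligned. */
int demo_has_slanting_edge(const DemoPoint *pts, int n)
{
    assert(pts != NULL && n >= 3);
    for (int i = 0; i < n; ++i) {
        const DemoPoint *a = &pts[i];
        const DemoPoint *b = &pts[(i + 1) % n]; /* wrap around to close the polygon */
        if (a->x != b->x && a->y != b->y)
            return 1;
    }
    return 0;
}
```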
void RECAPIKRN kRecInitZone | ( | LPZONE | pZone | ) |
Initializing a ZONE variable.
This function initializes a ZONE variable to default values.
[in] | pZone | Pointer to the zone structure to be initialized. |
none |
type = WT_FLOW; fm = FM_DEFAULT; rm = RM_AUTO; filter = FILTER_DEFAULT; chk_control = 0; chk_fn = NULL; chk_sect = ""; userdata = 0;
RECERR kRecInitZone([In, Out] ZONE zone);
RECERR RECAPIKRN kRecInsertZone | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
LPCZONE | pZone, | ||
int | nZone | ||
) |
Inserting a user zone.
This function inserts a new zone in the user zone list of the page. After inserting the zone, the zone list will be recalculated automatically. For information about insertion of irregular zones see notes.
[in] | hPage | Handle of the page. |
[in] | iiImg | Index of the image in the page whose coordinate system you have used in defining the boundary box for the new zone. It is recommended to use II_CURRENT coordinates if possible. |
[in] | pZone | Pointer to the zone data to be inserted. |
[in] | nZone | Index in the user zone list where the new zone should be inserted. Use zero (0) to insert the zone as the first element of the zone list. To insert a zone as the last element of the zone list, use -1. |
RECERR |
HPAGE hPage; /* must hold a valid page handle before the call below */
ZONE zone;
kRecInitZone(&zone); /* start from the default values */
zone.rectBBox.left = 0;
zone.rectBBox.top = 0;
zone.rectBBox.right = 100;
zone.rectBBox.bottom = 200;
zone.fm = FM_OMNIFONT;
zone.rm = RM_OMNIFONT_MOR;
zone.filter = (CHR_FILTER)(FILTER_UPPERCASE | FILTER_DIGIT);
kRecInsertZone(hPage, II_CURRENT, &zone, -1); /* -1 appends the zone to the end of the list */
RECERR kRecInsertZone(IntPtr hPage, IMAGEINDEX iiImg, [In] ZONE pZone, int nZone);
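The nZone convention (0 inserts as the first element, -1 appends as the last) can be modeled with a plain array. This is only a sketch of the documented index semantics, not the Engine's implementation: DemoZoneList and demo_insert are hypothetical, and treating other non-negative values as ordinary list positions is an assumption.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_ZONES 16

/* Hypothetical model of a zone list that stores only zone identifiers. */
typedef struct {
    int ids[DEMO_MAX_ZONES];
    int count;
} DemoZoneList;

/* Inserts id at position index; index -1 appends to the end.
   Returns 0 on success, -1 if the list is full or the index is out of range. */
int demo_insert(DemoZoneList *list, int id, int index)
{
    assert(list != NULL);
    if (list->count >= DEMO_MAX_ZONES)
        return -1;
    if (index == -1)
        index = list->count;
    if (index < 0 || index > list->count)
        return -1;
    /* Shift the tail one slot to the right to make room. */
    memmove(&list->ids[index + 1], &list->ids[index],
            (size_t)(list->count - index) * sizeof list->ids[0]);
    list->ids[index] = id;
    list->count++;
    return 0;
}
```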
Loading user zones.
This function loads the user zone list from a zone file. The function attaches the zone list to the page.
[in] | hPage | Handle of the page. |
[in] | pFileName | Name of the zone file to be loaded. |
ZONE_SIZE_WARN | At least one zone has been truncated, because it extends beyond the image |
ZONE_SIZE_ERR | At least one zone has not been loaded, because it extends beyond the image |
RECERR | Other errors |
RECERR kRecLoadZones(IntPtr hPage, string pFileName);
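The two zone-size results distinguish a truncated zone from one that was not loaded. The sketch below illustrates one plausible reading of that distinction. It is an assumption, not the documented rule: a zone that partly overlaps the image is clipped to it, and a zone with no overlap is rejected. DemoRect, DemoClipResult and demo_clip_zone are hypothetical names, and the image is assumed to span from (0, 0) to (width, height).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the RECT type; not part of the SDK. */
typedef struct { long left, top, right, bottom; } DemoRect;

typedef enum {
    DEMO_ZONE_OK,        /* zone lies fully inside the image */
    DEMO_ZONE_TRUNCATED, /* zone was clipped to the image */
    DEMO_ZONE_REJECTED   /* nothing of the zone lies inside the image */
} DemoClipResult;

/* Clips *zone to an image of the given size and reports what happened.
   A rejected zone is left unchanged. */
DemoClipResult demo_clip_zone(DemoRect *zone, long width, long height)
{
    assert(zone != NULL && width > 0 && height > 0);
    DemoRect c = *zone;
    if (c.left < 0)        c.left = 0;
    if (c.top < 0)         c.top = 0;
    if (c.right > width)   c.right = width;
    if (c.bottom > height) c.bottom = height;
    if (c.left >= c.right || c.top >= c.bottom)
        return DEMO_ZONE_REJECTED;
    if (c.left != zone->left || c.top != zone->top ||
        c.right != zone->right || c.bottom != zone->bottom) {
        *zone = c;
        return DEMO_ZONE_TRUNCATED;
    }
    return DEMO_ZONE_OK;
}
```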
Page parsing.
This function analyzes the page layout structure of the image (auto-zoning). It finds text or graphic blocks on the page, builds an OCR zone list and then connects it to the page. The user zone list is not changed. It activates the PID_DECOMPOSITION process.
[in] | sid | Settings Collection ID. |
[in] | hPage | Handle of the page containing the OCR image to be analyzed. |
RECERR |
Note: if there are zones that kRecLocateZones should take into account, they should be inserted before calling this function. kRecLocateZones creates the OCR zones and puts them into the OCR zone list. Any previously inserted OCR zone is deleted first. The generated OCR zones are used by the recognition process, which may modify them.
RECERR kRecLocateZones(int sid, IntPtr hPage);
Saving the OCR zone list.
This function saves the current OCR zone list of the page into a zone file. The zone file can be loaded later by kRecLoadZones as user zones.
[in] | hPage | Handle of the page. |
[in] | pFileName | Name of the zone file to be created. |
RECERR |
RECERR kRecSaveOCRZones(IntPtr hPage, string pFileName);
Saving the user zone list.
This function saves the current user zone list of the page into a zone file.
[in] | hPage | Handle of the page. |
[in] | pFileName | Name of the zone file to be created. |
RECERR |
RECERR kRecSaveZones(IntPtr hPage, string pFileName);
RECERR RECAPIKRN kRecSetDecompMethod | ( | int | sid, |
IMG_DECOMP | Algorithm | ||
) |
Setting the decomp method.
This function specifies the Page parser algorithm setting of the Engine. This setting is applied whenever the auto-zoning algorithm is activated (PID_DECOMPOSITION process).
[in] | sid | Settings Collection ID. |
[in] | Algorithm | The page parser algorithm to be set. |
RECERR |
RECERR kRecSetDecompMethod(int sid, IMG_DECOMP decompAlg);
RECERR RECAPIKRN kRecSetForceSingleColumn | ( | int | sid, |
INTBOOL | bForceSingle | ||
) |
Specify the Force Single Column mode.
This function specifies the Force Single Column mode for the page-layout PID_DECOMPOSITION process. It prevents the Engine's de-columnization from detecting columns and placing their contents one below the other. It is useful for conserving the columnar structure in tables.
[in] | sid | Settings Collection ID. |
[in] | bForceSingle | Force Single Column mode to be set (default is FALSE). |
RECERR |
RECERR kRecSetForceSingleColumn(int sid, bool bEnable);
RECERR RECAPIKRN kRecSetNongriddedTableDetect | ( | int | sid, |
INTBOOL | bEnable | ||
) |
Setting the non-gridded table detection.
This function sets the Non-gridded table detection setting of the Engine. Tables with visible grid lines (gridded tables) in an original page can usually be detected successfully by the auto-zoning function. In contrast, tables without visible cell separators in the original are harder to identify as tables, because they might also be word lists or data arranged in columns. The OmniPage CSDK offers an algorithm for detecting such non-gridded tables more confidently. This feature of the Engine can only be used in conjunction with an auto-zoning step. The algorithm is based on the result of the character recognition and runs on the OCR zones created by auto-zoning (including the zones created from a WT_AUTO User zone).
[in] | sid | Settings Collection ID. |
[in] | bEnable | The value to be set for the non-gridded table detection setting (the default is TRUE ). |
RECERR |
RECERR kRecSetNongriddedTableDetect(int sid, bool bEnable);
RECERR RECAPIKRN kRecSetPageDescription | ( | int | sid, |
DWORD | PageDesc | ||
) |
Setting page description data.
The page description data controls how page parsing (see kRecLocateZones and kRecRecognize) runs on the page. The Page Descriptor defines (the LZ_ flags) describe the different behaviors of the page parser. If the program has information about the image, passing it here can help the page parser achieve better layout results.
[in] | sid | Settings Collection ID. |
[in] | PageDesc | The Page Descriptor. It contains a set of LZ_ flags. |
RECERR |
RECERR kRecSetPageDescription(int sid, PAGEDESCRIPTION PageDesc);
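Since the Page Descriptor is a DWORD holding a set of flags, a read-modify-write pattern with kRecGetPageDescription and kRecSetPageDescription is a natural way to change one flag without disturbing the others. The sketch below shows only the bit manipulation. The flag names and values are hypothetical placeholders, not real LZ_ constants, and the code assumes each flag occupies independent bits, which the source does not confirm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for DWORD. */
typedef uint32_t DemoDword;

/* Placeholder flags for illustration only; these are NOT real LZ_ values. */
#define DEMO_LZ_FLAG_A 0x0001u
#define DEMO_LZ_FLAG_B 0x0002u
#define DEMO_LZ_FLAG_C 0x0004u

/* Returns desc with the given flag bits switched on. */
DemoDword demo_set_flag(DemoDword desc, DemoDword flag)
{
    return desc | flag;
}

/* Returns desc with the given flag bits switched off. */
DemoDword demo_clear_flag(DemoDword desc, DemoDword flag)
{
    return desc & ~flag;
}

/* Returns 1 if all bits of flag are set in desc, otherwise 0. */
int demo_has_flag(DemoDword desc, DemoDword flag)
{
    assert(flag != 0);
    return (desc & flag) == flag;
}
```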
RECERR RECAPIKRN kRecSetZoneLayout | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
LPCRECT | pRects, | ||
int | nRects, | ||
int | nZone | ||
) |
Updating the user zone shape information.
This function updates the shape information of any zone in the user zone list. See also the definition of pizza-box for more information about zone shapes.
[in] | hPage | Handle of the page. |
[in] | iiImg | Index of the image in the page whose coordinate system you have used in defining the shape to be updated. |
[in] | pRects | Array of RECTs describing the new shape of the zone. |
[in] | nRects | The number of RECTs in the shape information array. |
[in] | nZone | Index of the zone to be updated. |
RECERR |
RECERR kRecSetZoneLayout(IntPtr hPage, IMAGEINDEX iiImg, RECT[] pRects, int nZone);
RECERR RECAPIKRN kRecSubZoneRect | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
const RECT * | pRect, | ||
int | nZone | ||
) |
Subtracting a rectangle from a user zone.
This function subtracts a rectangle from an existing user zone. The function recalculates the rectangle list of the zone. The resulting list must describe a pizza-box shape. Subtraction cannot be performed on table zones and vertical text zones.
[in] | hPage | Handle of the page. |
[in] | iiImg | The index of the image whose coordinate system you have used in defining the rectangle to be subtracted. It is recommended to use II_CURRENT coordinates if possible. |
[in] | pRect | The rectangle to be subtracted from the given user zone. |
[in] | nZone | The index of the user zone the rectangle is subtracted from. |
RECERR |
RECERR kRecSubZoneRect(IntPtr hPage, IMAGEINDEX iiImg, RECT pRect, int nZone);
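To see why subtraction forces the rectangle list to be recalculated, consider removing a rectangle from a single-rectangle zone: the remainder generally splits into up to four rectangles. The following is a geometric sketch only, not the Engine's algorithm. DemoRect and demo_subtract_rect are hypothetical, and the code does not check whether the result is a valid pizza-box shape, because that definition lies outside this section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the RECT type; not part of the SDK. */
typedef struct { long left, top, right, bottom; } DemoRect;

static long demo_max(long a, long b) { return a > b ? a : b; }
static long demo_min(long a, long b) { return a < b ? a : b; }

/* Writes the parts of zone not covered by cut into out and returns
   how many rectangles were written (0 to 4). */
int demo_subtract_rect(DemoRect zone, DemoRect cut, DemoRect out[4])
{
    assert(out != NULL);
    /* Part of cut that actually overlaps the zone. */
    DemoRect in = { demo_max(zone.left, cut.left), demo_max(zone.top, cut.top),
                    demo_min(zone.right, cut.right), demo_min(zone.bottom, cut.bottom) };
    int n = 0;
    if (in.left >= in.right || in.top >= in.bottom) {
        out[n++] = zone; /* no overlap: the zone is unchanged */
        return n;
    }
    if (in.top > zone.top)       /* band above the cut */
        out[n++] = (DemoRect){ zone.left, zone.top, zone.right, in.top };
    if (in.left > zone.left)     /* piece to the left of the cut */
        out[n++] = (DemoRect){ zone.left, in.top, in.left, in.bottom };
    if (in.right < zone.right)   /* piece to the right of the cut */
        out[n++] = (DemoRect){ in.right, in.top, zone.right, in.bottom };
    if (in.bottom < zone.bottom) /* band below the cut */
        out[n++] = (DemoRect){ zone.left, in.bottom, zone.right, zone.bottom };
    return n;
}
```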
RECERR RECAPIKRN kRecUpdateOCRZone | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
LPCZONE | pZone, | ||
int | nZone | ||
) |
Updating the OCR zone.
This function updates the zone data of any zone in the OCR zone list. The fields ZONE::rectBBox, ZONE::type, ZONE::chk_fn (must be NULL) and ZONE::chk_sect (must be an empty string) cannot be modified.
[in] | hPage | Handle of the page. |
[in] | iiImg | Index of the image in the page, whose coordinate system you have used in defining the zone's boundary box. |
[in] | pZone | Pointer to a zone structure holding the updated zone data. |
[in] | nZone | Index of the zone to be updated. |
RECERR |
Note: the fields taken from the passed zone structure are userdata, fm, rm, filter and chk_control. Other fields of the passed zone structure are not considered.
RECERR kRecUpdateOCRZone(IntPtr hPage, IMAGEINDEX iiImg, [In] ZONE pZone, int nZone);
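The selective update described here (only userdata, fm, rm, filter and chk_control are taken from the passed structure) can be pictured as a field-by-field copy. The sketch below is a model of that behavior, not SDK code: DemoZone is a simplified, hypothetical stand-in for ZONE with assumed integer field types, and demo_update_ocr_zone is an illustrative helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the RECT type; not part of the SDK. */
typedef struct { long left, top, right, bottom; } DemoRect;

/* Simplified, hypothetical stand-in for ZONE; field types are assumptions. */
typedef struct {
    DemoRect rectBBox;
    int type;
    int fm;
    int rm;
    unsigned filter;
    unsigned userdata;
    unsigned chk_control;
} DemoZone;

/* Copies only the updatable fields from src to dst; rectBBox and type
   in dst are left untouched, whatever src contains. */
void demo_update_ocr_zone(DemoZone *dst, const DemoZone *src)
{
    assert(dst != NULL && src != NULL);
    dst->userdata    = src->userdata;
    dst->fm          = src->fm;
    dst->rm          = src->rm;
    dst->filter      = src->filter;
    dst->chk_control = src->chk_control;
}
```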
RECERR RECAPIKRN kRecUpdateZone | ( | HPAGE | hPage, |
IMAGEINDEX | iiImg, | ||
LPCZONE | pZone, | ||
int | nZone | ||
) |
Updating a user zone.
This function updates the zone data of any zone in the user zone list.
[in] | hPage | Handle of the page. |
[in] | iiImg | Index of the image in the page whose coordinate system you have used in defining the zone's boundary box. |
[in] | pZone | Pointer to a zone structure holding the updated zone data. |
[in] | nZone | Index of the zone to be updated. |
RECERR |
RECERR kRecUpdateZone(IntPtr hPage, IMAGEINDEX iiImg, [In] ZONE pZone, int nZone);