Performing Optical Character Recognition.

Classes
struct	STATISTIC
	Recognition statistics.
struct	_RPPERRORS
	Error list item.
Modules
	Recognition Engines having own functions
	There are some recognition engines having one or more own functions or types.
Typedefs
typedef STATISTIC *	LPSTATISTIC
	Pointer type to a STATISTIC type variable.
typedef INTBOOL RECKRNCALL	ONETOUCH_CB (INTBOOL bMore, void *pContext, LPCTSTR *notused)
	The user-written "One-touch" callback function.
typedef ONETOUCH_CB *	LPONETOUCH_CB
	Pointer to a ONETOUCH_CB callback function.
typedef struct _RPPERRORS	RPPERRORS
	Error list item.
Enumerations
enum	RMTRADEOFF { TO_ACCURATE, TO_BALANCED, TO_FAST }
	Trade-off setting.
enum	PDF_REC_MODE { PDF_RM_ALWAYSRECOGNIZE, PDF_RM_MOSTLYGETTEXT, PDF_RM_ALWAYSGETTEXT }
	PDF recognition mode.
enum	PDF_PROC_MODE { PDF_PM_AUTO, PDF_PM_NORMAL, PDF_PM_GRAPHICS_ONLY, PDF_PM_TEXT_ONLY, PDF_PM_TEXT_ONLY_EXT, PDF_PM_AS_IMAGE }
	PDF processing mode.
Functions
RECERR RECAPIKRN	kRecSetRMTradeoff (int sid, RMTRADEOFF Tradeoff)
	Changing trade-off setting.
RECERR RECAPIKRN	kRecGetRMTradeoff (int sid, RMTRADEOFF *pTradeoff)
	Getting trade-off setting.
RECERR RECAPIKRN	kRecSetDefaultFillingMethod (int sid, FILLINGMETHOD type)
	Changing default filling method.
RECERR RECAPIKRN	kRecGetDefaultFillingMethod (int sid, FILLINGMETHOD *ptype)
	Getting default filling method.
RECERR RECAPIKRN	kRecSetDefaultRecognitionModule (int sid, RECOGNITIONMODULE rm)
	Changing the default recognition module.
RECERR RECAPIKRN	kRecGetDefaultRecognitionModule (int sid, RECOGNITIONMODULE *rm)
	Getting the default recognition module.
RECERR RECAPIKRN	kRecSetTrainingFileName (int sid, LPCTSTR pFileName)
	Setting the training file.
RECERR RECAPIKRN	kRecGetTrainingFileName (int sid, LPTSTR pFileName, size_t iSize)
	Getting the name of the training file.
RECERR RECAPIKRN	kRecRecognize (int sid, HPAGE hPage, LPCTSTR pFilename)
	Recognizing a page.
RECERR RECAPIKRN	kRecProcessPages (int sid, LPCTSTR pDocFile, LPCTSTR *pImageFiles, LPONETOUCH_CB pCallback, void *pContext, LPCTSTR pTemplate)
	Process multiple pages and convert them to a single document.
RECERR RECAPIKRN	kRecProcessPagesEx (int sid, LPCTSTR pDocFile, LPCTSTR *pImageFiles, LPONETOUCH_CB pCallback, void *pContext, LPCTSTR pTemplate)
	Process multiple pages and convert them to a single document using multiple recognition threads.
RECERR RECAPIKRN	kRecGetRPPErrorList (RPPERRORS **rppErrs)
	Getting errors from last kRecProcessPagesEx.
RECERR RECAPIKRN	kRecGetStatistics (int sid, LPSTATISTIC pStat)
	Getting statistics about processing.

Detailed Description

Performing Optical Character Recognition.

The Engine can load several recognition engines. The User License determines which ones are available. Select which available engine to run in a given ZONE. According to your choice, the application is able to perform "multi-module" recognition on any image.

The enum RECOGNITIONMODULE lists recognition modules. These are tightly integrated to the Engine.

PLUS3W omnifont module for machine-printed text (most accurate) (supported on: Windows, Linux, Mac OS X),
PLUS2W omnifont module for machine-printed text (default),
MTX omnifont module for machine-printed text (supported on: Windows),
MOR multi-language omnifont module for machine-printed text (supported on: Windows, Linux),
FRX multi-language omnifont module for machine-printed text,
HNR module for handprinted digits (supported on: Windows),
RER module for handprinted alphanumerical characters (supported on: Windows, Linux, Mac OS X),
DOT module for 9-pin draft dot-matrix printouts (supported on: Windows),
OMR module for optical marks (checkmarks),
MAT matrix matching module (supported on: Windows),
BAR module for barcodes,
Asian recognition module for CCJK and Arabic languages.

Note: The RER recognition module is a third-party component.

You can set an additional RECOGNITIONMODULE value: RM_AUTO. If you set this, the Engine will choose the module that is most likely to be appropriate. The decision of the Engine is primarily based on the filling method set for the zone.

The filling method describes the type of data expected in the zone, e.g. a barcode, a handprinted or a machine generated text. A certain degree of auto-detection is available for setting the filling method: use the kRecDetectFillingMethod function for this. You might find this particularly useful when you cannot be sure in advance precisely what filling method to use on the incoming documents. It is essential to specify a valid recognition module-filling method pair. Any incorrectly set zone will have no recognition result. Some filling methods can be linked successfully with one and only one recognition module (e.g. FM_OMR with RM_OMR). Other recognition modules support more than one filling method, and some filling methods are accepted by more than one recognition module. For example, if you work with a medium quality, 9-pin dot-matrix text, either the RM_DOT or one of the omnifont modules could give better results. On a medium quality, 24-pin dot-matrix text, try RM_OMNIFONT_MOR with FM_DRAFTDOT24 and FM_OMNIFONT.

For details on the above and recognition engines see OCR Engines.

RM_AUTO reads the filling method of the zone; if only one recognition module is suitable, that one is used. When there is a choice of recognition modules, RM_AUTO uses various checks (character set, image size, etc.) to select the best one. This way, it prevents an invalid Filling method - Recognition module pair.

If the recognition module is not present when it is needed, the recognition function (kRecRecognize) returns with API_MODULEMISSING_ERR, and there will be no recognized data for the zone in question. To avoid this, call kRecGetModulesInfo just after Engine initialization. This will check the presence and correct installation of the necessary recognition modules.

The behavior of the recognition modules can be adjusted in some respects through settings.

IMPORTANT NOTES

The default settings of OmniPage 20 (Nuance's desktop application) and OmniPage Capture SDK 20 are not the same. By default, the RecAPI of the CSDK does not run in the most accurate mode, but in a faster, less accurate mode that is a good compromise between speed and accuracy. It can easily be switched to the most accurate mode by setting Kernel.OcrMgr.PreferAccurateEngine to true. This most accurate mode of the CSDK is equivalent to the default of the desktop application. See also kRecSetDefaultRecognitionModule and its notes.

PDF recognition

PDF files can be separated into two basic classes: image-only PDF and normal PDF. Recognition of image-only PDF is exactly the same as that of any other image format. The normal PDF contains both image and text data. This textual information can be retrieved from the file and used as an aid to the recognition.

In general, the text residing in a PDF file is reliable, thus even a less accurate OCR result combined with this text may be enough to achieve good accuracy. When performing this, the User can adjust the PDF trade-off setting (Kernel.OcrMgr.PDF.TradeOff). Furthermore, the User can specify the mode for handling the text located in a PDF file (PDF_REC_MODE).

The textual information divides the page into graphic and text areas. The User can select which one should be recognized during the OCR process (PDF_PROC_MODE).

Typedef Documentation

typedef INTBOOL RECKRNCALL ONETOUCH_CB(INTBOOL bMore, void *pContext, LPCTSTR *notused)

The user-written "One-touch" callback function.

It allows user intervention in the "One-touch" processing functions. This user-written function is called by the Engine after each processed page, either by the kRecProcessPages or by the kRecScanPages function (and also by the RecAPIPlus function RecProcessPagesEx).

Parameters:

[in]	bMore	After processing each page, the Engine calls this function with this parameter indicating whether there are further pages to be processed (i.e. whether or not an ADF still has paper, or whether or not there are further pages in a multi-page image file).
[in]	pContext	User data passed to the callback function by the Engine. The data to be passed can be set with kRecSetCBProgMon.
[out]	notused	Not used.

Return values:

TRUE	The Engine is to continue with the next page.
FALSE	The Engine is to abort processing pages.

Note:

This function is written by the integrator. It should ask the user whether to continue processing with further pages.
The callback function must not contain calls to any KERNELAPI functions.
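A minimal sketch of such a callback follows. It assumes local stand-in definitions for INTBOOL, LPCTSTR, RECKRNCALL, TRUE and FALSE so that it compiles without the SDK headers; the PageBudget structure and the function name stop_after_budget are hypothetical. The callback aborts processing once a page budget carried in pContext is used up, and it makes no Engine calls.

```c
#include <stddef.h>

/* Local stand-ins for SDK definitions (assumptions, not the real headers). */
typedef int INTBOOL;
typedef const char *LPCTSTR;
#define RECKRNCALL
#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif

/* Callback shape as documented on this page. */
typedef INTBOOL RECKRNCALL ONETOUCH_CB(INTBOOL bMore, void *pContext, LPCTSTR *notused);
typedef ONETOUCH_CB *LPONETOUCH_CB;

/* Hypothetical user data passed to the callback through pContext. */
typedef struct {
    int pages_done;      /* pages processed so far */
    int max_pages;       /* abort once this many pages are done */
    INTBOOL more_pending; /* last bMore value reported by the Engine */
} PageBudget;

/* Declaring through the typedef lets the compiler verify the signature. */
ONETOUCH_CB stop_after_budget;

INTBOOL RECKRNCALL stop_after_budget(INTBOOL bMore, void *pContext, LPCTSTR *notused)
{
    PageBudget *budget = (PageBudget *)pContext;
    (void)notused;                 /* documented as unused */

    if (budget == NULL)
        return TRUE;               /* no context: never abort */

    budget->pages_done++;
    budget->more_pending = bMore;

    /* FALSE tells the Engine to abort, TRUE to go on with the next page. */
    return (budget->pages_done >= budget->max_pages) ? FALSE : TRUE;
}
```

The address of a PageBudget variable would be passed as pContext together with the callback pointer, so the callback needs no global state.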

typedef struct _RPPERRORS RPPERRORS

Error list item.

This struct contains an item of the error list retrieved by kRecGetRPPErrorList or RecGetRPPErrorList.

Enumeration Type Documentation

enum PDF_PROC_MODE

PDF processing mode.

This setting specifies the processing mode of graphic and text areas of PDF files. See PDF recognition for details. See the setting Kernel.OcrMgr.PDF.ProcessingMode.

Note:

If the value of this setting differs from the default value, kRecPreprocessImg should be called.
In PDF files there may be invisible text areas (e.g. white text on a white background, text under an image, etc.). These areas typically contain information only for searching. The content of these areas can appear in the OCR result when the PDF processing mode is PDF_PM_TEXT_ONLY_EXT.
The PDF_PM_AS_IMAGE mode and the IMF_PDF_AS_IMAGE mode of the IMF PDF load flags are different. Both are slower than normal PDF mode and both process the PDF file as an image. However, in PDF_PM_AS_IMAGE mode, recognition combines the text information coming from the PDF with the raw result of the OCR process.

Enumerator:

PDF_PM_AUTO	Checks textual information. If it is header- and/or footer-like, both text and graphic areas are recognized in `PDF_PM_AS_IMAGE` mode. Otherwise, this is equivalent to `PDF_PM_NORMAL`. (Default)
PDF_PM_NORMAL	Only the text areas are recognized, if they exist. Otherwise the full page is recognized in image mode.
PDF_PM_GRAPHICS_ONLY	Only the graphic areas are recognized.
PDF_PM_TEXT_ONLY	Only the text areas are recognized.
PDF_PM_TEXT_ONLY_EXT	Only the text areas with invisible text areas are recognized.
PDF_PM_AS_IMAGE	Recognize the full page (both of text & graphics areas). Be warned that this mode is slower than `PDF_PM_AUTO`. See notes above.
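The PDF_PM_AUTO rule in the table above can be sketched as a small decision function. This is only an illustration of the documented rule, not Engine code: the enum is a local copy of the enumerator names (numeric values are assumptions), and the helper name resolve_auto_proc_mode and its header/footer flag are hypothetical.

```c
/* Local copy of the enumerator names; the numeric values are assumptions. */
typedef enum {
    PDF_PM_AUTO,
    PDF_PM_NORMAL,
    PDF_PM_GRAPHICS_ONLY,
    PDF_PM_TEXT_ONLY,
    PDF_PM_TEXT_ONLY_EXT,
    PDF_PM_AS_IMAGE
} PDF_PROC_MODE;

/* Hypothetical helper: the mode a page is effectively processed in.
 * text_is_header_footer_like stands for the Engine's own check of the
 * textual information, which is not exposed on this page. */
PDF_PROC_MODE resolve_auto_proc_mode(PDF_PROC_MODE mode, int text_is_header_footer_like)
{
    if (mode != PDF_PM_AUTO)
        return mode;    /* explicit modes are used as given */
    return text_is_header_footer_like ? PDF_PM_AS_IMAGE : PDF_PM_NORMAL;
}
```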

enum PDF_REC_MODE

PDF recognition mode.

This setting specifies the usage of text data coming from normal PDF files (non-image-only PDF). See PDF recognition for details. See the setting Kernel.OcrMgr.PDF.RecognitionMode.

Enumerator:

PDF_RM_ALWAYSRECOGNIZE	Combines the characters from the OCR result with the PDF text. (Default)
PDF_RM_MOSTLYGETTEXT	Same as ALWAYSGETTEXT mode unless a font character coding problem is detected in a PDF page; then it is equal to ALWAYSRECOGNIZE mode.
PDF_RM_ALWAYSGETTEXT	Uses the PDF text, relying on the OCR result only to determine the spaces between words (fastest).
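As an illustration of how PDF_RM_MOSTLYGETTEXT falls back, the following sketch maps a requested mode to the behavior a page effectively gets. The enum is a local copy of the names above (values are assumptions); effective_pdf_rec_mode and its coding_problem_detected flag are hypothetical and only model the rule stated in the table.

```c
/* Local copy of the enumerator names; the numeric values are assumptions. */
typedef enum {
    PDF_RM_ALWAYSRECOGNIZE,
    PDF_RM_MOSTLYGETTEXT,
    PDF_RM_ALWAYSGETTEXT
} PDF_REC_MODE;

/* Hypothetical helper: MOSTLYGETTEXT behaves like ALWAYSGETTEXT unless a
 * font character coding problem was detected on the page. */
PDF_REC_MODE effective_pdf_rec_mode(PDF_REC_MODE mode, int coding_problem_detected)
{
    if (mode == PDF_RM_MOSTLYGETTEXT)
        return coding_problem_detected ? PDF_RM_ALWAYSRECOGNIZE
                                       : PDF_RM_ALWAYSGETTEXT;
    return mode;    /* the other two modes are fixed */
}
```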

enum RMTRADEOFF

Trade-off setting.

This specifies the possible Engine trade-off settings to be applied during the recognition at page level. This setting has a trade-off influence between the accuracy and the speed of recognition. The precise influence depends on the recognition module used. This setting may also influence which auto-zoning and pre-process algorithm will be applied.

Note:

The value of the setting Kernel.OcrMgr.TradeOff can be set by calling kRecSetRMTradeoff.

Pre-process (kRecPreprocessImg) specific notes:

If resolution enhancement is set to RE_AUTO the enhancement depends on the current trade-off setting according to the following: if trade-off is TO_FAST the resolution enhancement is RE_LEGACY, otherwise it is RE_STANDARD.
If trade-off is TO_FAST pre-process uses a faster, but less accurate deskew, binarization, and despeckle algorithm than otherwise.
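The RE_AUTO rule above can be written down as a one-line decision. The sketch below is an illustration only: the enumerator names come from this page, but the enum type name ResEnhSketch, the numeric values, and the helper resolve_resolution_enhancement are assumptions.

```c
/* Local copies of names used on this page; types and values are assumptions. */
typedef enum { TO_ACCURATE, TO_BALANCED, TO_FAST } RMTRADEOFF;
typedef enum { RE_AUTO, RE_LEGACY, RE_STANDARD } ResEnhSketch;

/* Hypothetical helper: the enhancement that RE_AUTO resolves to
 * for a given trade-off, as described in the notes above. */
ResEnhSketch resolve_resolution_enhancement(ResEnhSketch requested, RMTRADEOFF tradeoff)
{
    if (requested != RE_AUTO)
        return requested;               /* explicit choice is kept */
    return (tradeoff == TO_FAST) ? RE_LEGACY : RE_STANDARD;
}
```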

When the page parser algorithm setting of the Engine is set to DCM_AUTO and both parsers are available in the Engine configuration, the Engine trade-off setting determines which page parser algorithm is applied (IMG_DECOMP).

If the User wants the trade-off to affect recognition only, it can be set after calling kRecPreprocessImg and kRecLocateZones. However, there may be cases when this is impossible, for example when kRecLocateZones is not called, or when a one-step function is called (e.g. kRecProcessPages or RecProcessPagesEx). In such a case, setting Kernel.OcrMgr.TradeOff.PreProc to TO_ACCURATE and Kernel.Decomp.Method to DCM_STANDARD ensures that preprocessing and zoning run with their defaults, independently of the trade-off value.

Any of these three settings can always be specified, however, for some recognition modules these are mapped internally into two values.

Recognition module specific notes:

The modules MTX, DOT, HNR and the FAST mode of PLUS2W and PLUS3W are supported on: Windows; PLUS3W is supported on: Windows, Linux; RER is supported on: Windows, Linux, Mac OS X.
RM_OMNIFONT_MOR : Depending on this setting, either a one-pass, a two-pass or a two-pass with ACA (Adaptive Cell Analysis) recognition algorithm is used. Recognition accuracy may also be affected by the use (and settings) of the checking module. The combinations of these two settings result in five different speed/accuracy user choices.
- Two-pass with Adaptive Cell Analysis with spelling. TO_ACCURATE with checking module enabled.
- Two-pass with spelling. TO_BALANCED with checking module enabled.
- Two-pass. TO_BALANCED with checking disabled.
- Single-pass with spelling. TO_FAST with checking module enabled.
- Single-pass. TO_FAST with checking disabled.
RM_OMNIFONT_MTX : This recognition module has a two-value trade-off setting: internally the TO_ACCURATE and TO_BALANCED are mapped to the same setting.
RM_HNR : This recognition module has a two-value trade-off setting: internally the TO_ACCURATE and TO_BALANCED are mapped to the same setting. The TO_ACCURATE means fewer mis-recognized characters, but maybe more rejected ones.
RM_RER : This recognition module has a two-value trade-off setting: internally the TO_FAST and TO_BALANCED are mapped to the same setting.
RM_ASIAN : see the section CCJK trade-off for more information.
All remaining recognition modules: These do not interpret this TRADEOFF setting.
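The module-specific notes above can be summarized in code. The following sketch is an illustration under stated assumptions: the enums are local copies of names from this page (values assumed), TwoValueModule, collapsed_tradeoff and mor_algorithm are hypothetical names, and returning one member of a merged pair as its representative is an arbitrary labeling choice. The combination TO_ACCURATE with checking disabled is not listed for RM_OMNIFONT_MOR, so the sketch reports it as unknown.

```c
#include <stddef.h>

/* Local copies of names used on this page; numeric values are assumptions. */
typedef enum { TO_ACCURATE, TO_BALANCED, TO_FAST } RMTRADEOFF;
typedef enum { RM_OMNIFONT_MTX, RM_HNR, RM_RER } TwoValueModule;

/* Hypothetical helper: collapse the three-valued setting to the two
 * internal values of the modules with a two-value trade-off. */
RMTRADEOFF collapsed_tradeoff(TwoValueModule rm, RMTRADEOFF to)
{
    switch (rm) {
    case RM_OMNIFONT_MTX:
    case RM_HNR:
        /* TO_ACCURATE and TO_BALANCED share one internal setting. */
        return (to == TO_BALANCED) ? TO_ACCURATE : to;
    case RM_RER:
        /* TO_FAST and TO_BALANCED share one internal setting. */
        return (to == TO_BALANCED) ? TO_FAST : to;
    }
    return to;
}

/* Hypothetical helper: the RM_OMNIFONT_MOR algorithm for a combination of
 * trade-off and checking module state; NULL where the page lists none. */
const char *mor_algorithm(RMTRADEOFF to, int checking_enabled)
{
    switch (to) {
    case TO_ACCURATE:
        return checking_enabled ? "two-pass with ACA, with spelling" : NULL;
    case TO_BALANCED:
        return checking_enabled ? "two-pass with spelling" : "two-pass";
    case TO_FAST:
        return checking_enabled ? "single-pass with spelling" : "single-pass";
    }
    return NULL;
}
```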

See the settings Kernel.OcrMgr.TradeOff, Kernel.OcrMgr.TradeOff.PreProc and Kernel.OcrMgr.PDF.TradeOff.

Enumerator:

TO_ACCURATE	Most accurate recognition (Default).
TO_BALANCED	Mid-level accuracy/speed recognition.
TO_FAST	Fast recognition.

Function Documentation

RECERR RECAPIKRN kRecGetDefaultFillingMethod	(	int	sid,
		FILLINGMETHOD *	ptype
	)

Getting default filling method.

The kRecGetDefaultFillingMethod function gets the default filling method setting.

Parameters:

[in]	sid	Settings Collection ID.
[out]	ptype	Address of a variable to get the default filling method setting.

Return values:

RECERR

Note:

This function gets the value of the setting Kernel.OcrMgr.DefaultFillingMethod. This setting can be changed by kRecSetDefaultFillingMethod.

The specification of this function in C# is:

 RECERR kRecGetDefaultFillingMethod(int sid, out FILLINGMETHOD type);

RECERR RECAPIKRN kRecGetDefaultRecognitionModule	(	int	sid,
		RECOGNITIONMODULE *	rm
	)

Getting the default recognition module.

The kRecGetDefaultRecognitionModule function retrieves the default recognition module setting.

Parameters:

[in]	sid	Settings Collection ID.
[out]	rm	Pointer to a variable that receives the default recognition module.

Return values:

RECERR

Note:

This function gets the value of the setting Kernel.OcrMgr.DefaultRecognitionModule. This setting can be changed by kRecSetDefaultRecognitionModule.

The specification of this function in C# is:

 RECERR kRecGetDefaultRecognitionModule(int sid, out RECOGNITIONMODULE rm);

RECERR RECAPIKRN kRecGetRMTradeoff	(	int	sid,
		RMTRADEOFF *	pTradeoff
	)

Getting trade-off setting.

The kRecGetRMTradeoff function provides the current recognition algorithm trade-off setting. See kRecSetRMTradeoff.

Parameters:

[in]	sid	Settings Collection ID.
[out]	pTradeoff	Pointer of a variable to get the current recognition algorithm trade-off setting.

Return values:

RECERR

Note:

This function gets the value of the setting Kernel.OcrMgr.TradeOff. This setting can be changed by kRecSetRMTradeoff.

The specification of this function in C# is:

 RECERR kRecGetRMTradeoff(int sid, out RMTRADEOFF Tradeoff);

RECERR RECAPIKRN kRecGetRPPErrorList ( RPPERRORS ** rppErrs )

Getting errors from last kRecProcessPagesEx.

The kRecGetRPPErrorList function returns the error list of the last kRecProcessPages or kRecProcessPagesEx call.

Parameters:

[out] rppErrs Pointer to a variable that receives a pointer to an internal list. This list contains data for the errors that happened.

Return values:

RECERR

Note:

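The returned list can be walked as follows:

    RPPERRORS *rppErrs = NULL;
    kRecGetRPPErrorList(&rppErrs);
    /* Each item describes one error; the list ends with a NULL next pointer. */
    while (rppErrs != NULL)
    {
        LPCSTR p = NULL;
        /* Retrieve the string belonging to the error code. */
        kRecGetErrorInfo(rppErrs->rc, &p);
        printf("RC:%d/%s, obj:%S - page:%d\n", rppErrs->rc, p, rppErrs->obj, rppErrs->page);
        rppErrs = rppErrs->next;
    }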

RECERR RECAPIKRN kRecGetStatistics	(	int	sid,
		LPSTATISTIC	pStat
	)

Getting statistics about processing.

The kRecGetStatistics function returns information about the accuracy and timing data of the latest recognition process in a structure STATISTIC.

Parameters:

[in]	sid	Settings Collection ID.
[out]	pStat	Address of a structure to hold the statistical information.

Return values:

RECERR

Note:

The specification of this function in C# is:

 RECERR kRecGetStatistics(int sid, out STATISTIC pStat);

RECERR RECAPIKRN kRecGetTrainingFileName	(	int	sid,
		LPTSTR	pFileName,
		size_t	iSize
	)

Getting the name of the training file.

The kRecGetTrainingFileName function gets the current training file name setting.

Parameters:

[in]	sid	Settings Collection ID.
[out]	pFileName	Pointer of a buffer where the training file name setting will be copied to.
[in]	iSize	Character count of the buffer. The buffer must be large enough to hold all the characters and a terminating zero.

Return values:

RECERR

Note:

Training of recognition modules is supported on: Windows, Linux.

Call this function to see whether there is currently a training file in use and what its name is. If this call returns an empty string it means that no training file has been selected.

This function gets the value of the setting Kernel.OcrMgr.Training.FileName.

The specification of this function in C# is:

 RECERR kRecGetTrainingFileName(int sid, out string fileName);

RECERR RECAPIKRN kRecProcessPages	(	int	sid,
		LPCTSTR	pDocFile,
		LPCTSTR *	pImageFiles,
		LPONETOUCH_CB	pCallback,
		void *	pContext,
		LPCTSTR	pTemplate
	)

Process multiple pages and convert them to a single document.

This function performs recognition on more than one image file using the Direct TXT output converter to export the results in one common document. If there is more than one page in the specified image file or there are more image files, export can be done only to appendable DirectTXT formats (DTXTOUTPUTFORMATS).

Parameters:

[in]	sid	Settings Collection ID.
[in]	pDocFile	Full path of the output document file.
[in]	pImageFiles	Pointer to an array of full paths of the input files. If this is NULL, the input is a scanner. The last entry must be a NULL pointer indicating the end of the input list.
[in]	pCallback	Callback function's pointer, which is called after each page. Can be NULL.
[in]	pContext	Context value for passing to the callback function. Can be NULL.
[in]	pTemplate	Full path of a zone file to be loaded before recognition of each image.

Return values:

RECERR

Note:

See details about size limits of input images.

This function has two special return values: API_ERRORS_HAPPENED_WARN and API_WARNINGS_HAPPENED_WARN. These warnings signal that errors or warnings occurred during the kRecProcessPages call, but some of the pages have been processed correctly. The program can query the complete list of errors that occurred using kRecGetRPPErrorList.

If the input is a scanner, multi-page scanning can be performed in two ways. If the scanner has an ADF, it loads the pages until the ADF is empty. Otherwise, a ONETOUCH_CB is required.
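The pImageFiles convention (an array of paths closed by a NULL entry, or NULL itself for scanner input) can be sketched as below. LPCTSTR is a local stand-in that assumes a non-Unicode build; the file paths and the helper count_image_files are hypothetical and are not part of the SDK.

```c
#include <stddef.h>

typedef const char *LPCTSTR;   /* stand-in, assuming a non-Unicode build */

/* Hypothetical input list: the closing NULL marks the end of the list. */
LPCTSTR example_image_files[] = {
    "/scans/page1.tif",
    "/scans/page2.tif",
    NULL
};

/* Hypothetical helper: number of paths before the closing NULL entry.
 * A NULL array means the input is a scanner, so there are no paths. */
size_t count_image_files(LPCTSTR *pImageFiles)
{
    size_t n = 0;
    if (pImageFiles == NULL)
        return 0;
    while (pImageFiles[n] != NULL)
        n++;
    return n;
}
```

Forgetting the closing NULL would make any such walk read past the end of the array, which is why the terminator is mandatory in C.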

The specification of this function in C# is:

 RECERR kRecProcessPages(int sid, string pDocFile, string[] pImageFiles, ONETOUCH_CB callback, string pTemplate);

In C#, the last item of pImageFiles does not need to be NULL.

RECERR RECAPIKRN kRecProcessPagesEx	(	int	sid,
		LPCTSTR	pDocFile,
		LPCTSTR *	pImageFiles,
		LPONETOUCH_CB	pCallback,
		void *	pContext,
		LPCTSTR	pTemplate
	)

Process multiple pages and convert them to a single document using multiple recognition threads.

This function performs recognition on more than one image file using the Direct TXT output converter module to export the results in one common document. If there is more than one page in the specified image file or there are more image files, export can be done only to appendable DirectTXT formats (DTXTOUTPUTFORMATS).

Parameters:

[in]	sid	Settings Collection ID.
[in]	pDocFile	Full path of the output document file or output folder. By default, if multiple input image files are given, a separate output file is created for each input in the given folder. The folder must exist. If the `pDocFile` parameter specifies only a path, it must end with a backslash. To append all processed pages to one output file, set the setting Kernel.RPP.OutputFileName to 0. In this case `pDocFile` specifies the name of the output file:
    HSETTING hOF = NULL;
    kRecSettingGetHandle(NULL, "Kernel.RPP.OutputFileName", &hOF, NULL);
    kRecSettingSetInt(0, hOF, 0);
[in]	pImageFiles	Pointer to an array of full paths of the input files. If this is NULL, the input is a scanner. The last entry must be a NULL pointer indicating the end of the input list.
[in]	pCallback	Callback function's pointer, called after each page. Can be NULL.
[in]	pContext	Context value for passing to the callback function. Can be NULL.
[in]	pTemplate	Full path of a zone file to be loaded before recognition of each image. Can be NULL.

Return values:

RECERR

Note:

Multi-threading is supported on: Windows only. Thus on other platforms this function performs a sequential workflow.

See details about size limits of input images.

This function has two special return values: API_ERRORS_HAPPENED_WARN and API_WARNINGS_HAPPENED_WARN. These warnings signal that errors or warnings occurred during the kRecProcessPagesEx call, but some of the pages have been processed correctly. The program can query the complete list of errors that occurred using kRecGetRPPErrorList.

This function does not use the pImageFile parameter of the ONETOUCH_CB callback function at all.

The maximum number of recognition threads started by this function can be specified by the setting Kernel.RPP.RecThreadCount.

The specification of this function in C# is:

 RECERR kRecProcessPagesEx(int sid, string pDocFile, string[] pImageFiles, ONETOUCH_CB callback, string pTemplate);

RECERR RECAPIKRN kRecRecognize	(	int	sid,
		HPAGE	hPage,
		LPCTSTR	pFilename
	)

Recognizing a page.

The kRecRecognize function performs the recognition task for a page in the engine's authority.

The function utilizes the zone information to activate the appropriate recognition module on every zone. Each recognition module recognizes the image parts assigned to it in the zone list. If the OCR zone list of the page is empty the PID_DECOMPOSITION page-layout decomposition process will be activated automatically in order to create a zone list for the image, before recognition.

The function offers the services of the checking module for either marking suspicious characters and words, or making the recognition result better.

After having recognized all the zones on the page, the function collects the necessary information about the recognized characters into a homogeneous structure, called the recognition result. It is stored in the HPAGE.

The kRecRecognize function may activate one or more of the processes PID_RECOGNITION1, PID_RECOGNITION2, PID_RECOGNITION3 and PID_SPELLING.

Parameters:

[in]	sid	Settings Collection ID.
[in]	hPage	Handle of the page to be recognized.
[in]	pFilename	Specifies how the recognition result must be stored. If this is not NULL, it must be a real file name with a full path; the recognition result will be stored in this file.

Return values:

RECERR

Note:

If the current image is not a B/W one in this page (i.e. it is a gray-scale or a 24-bit color image), then an implicit secondary image conversion step will be performed automatically to convert the image to a B/W one. The parameter for this conversion can be specified through the kRecSetImgBinarizationMode function.

The application can register a callback function for progress indication (kRecSetCBProgMon). The Engine then calls the registered (PROGMON_CB) callback entry point to allow progress monitoring for the application.

The recognition process uses the OCR zones generated by the decomposition process. In addition, it may modify them.

Some important details about filling method detection can be found here.

If the file specified in pFilename does not exist, it will be created, otherwise the newly recognized information is appended to the existing one, if that is an appendable format. For more information about available formats, see kRecConvert2DTXT.

Calling kRecRecognize with a specified filename gives the same result as calling it with NULL and right afterwards calling kRecConvert2DTXT with the same filename.

There are some cases when pFilename should be NULL.

We do not want any file output, because we will use kRecGetLetters.
We have a few pages and we want to convert them together using kRecConvert2DTXT, which can accept an array of HPAGEs. Or we want to do something with the HPAGE between the kRecRecognize and kRecConvert2DTXT function calls.
There is a newer function, kRecConvert2DTXTEx, which has an additional IMAGEINDEX parameter (compared to the one without 'Ex'). Thus it can affect the orientation of the pages when creating a PDF file. It is recommended to always pass NULL as pFilename and call kRecConvert2DTXTEx when creating PDF files.
We are using RecAPIPlus and the recognized HPAGE will be inserted into an HDOC using RecInsertPage.

When an omnifont recognition module is running during recognition, this function may also apply the training data specified by the kRecSetTrainingFileName function.

The application can retrieve a copy of the recognition data by calling the kRecGetLetters function.

The application can remove the recognition result calling kRecFreeRecognitionData.

If a recognition module is not able to recognize an object (i.e. character, barcode or checkmark etc.), this object will be marked as a rejected one. It becomes marked by a rejection symbol during conversion to the final output document. kRecSetRejectionSymbol can be used to specify the rejection symbol for this.

Since the recognition algorithm may use the services of the checking module, the application should call the kRecSetSpell, kRecSetSpellLanguage, kRecSetUserDictionary and other checking related functions BEFORE calling kRecRecognize.

Checking of recognized zone contents may consist of two facilities (or their combination): a supplied Language dictionary, a user dictionary containing words.

This function can fill the line list of the HPAGE, but not all line occurrences are handled in the same way. See RLINE for more information.

The specification of this function in C# is:

    RECERR kRecRecognize(int sid, IntPtr hPage, string filename);
    // or when filename is NULL in C/C++
    RECERR kRecRecognize(int sid, IntPtr hPage);

RECERR RECAPIKRN kRecSetDefaultFillingMethod	(	int	sid,
		FILLINGMETHOD	type
	)

Changing default filling method.

The kRecSetDefaultFillingMethod function specifies the default filling method. The default filling method is applied to all zones on the page with the FM_DEFAULT value in their ZONE::fm field. This substitution of the filling method happens at the beginning of the kRecRecognize calls.

Parameters:

[in]	sid	Settings Collection ID.
[in]	type	Default filling method.

Return values:

RECERR

Note:

Zones with FM_DEFAULT are created either by the page-layout decomposition (auto-zoning) process OR by inserting or updating them with this value.

If this function is not called the default value of this setting is FM_OMNIFONT.

The case when the value of this setting is changed to FM_DEFAULT represents a special situation (i.e. the default filling method is itself "default"). In this case the recognition process starts the automatic zonetype detection algorithm of the Engine (see kRecDetectFillingMethod) in order to determine the filling methods zone by zone.
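The substitution described above can be sketched as a loop over the zones of a page. This is a model of the documented behavior only: FmSketch and ZoneSketch are hypothetical stand-ins for FILLINGMETHOD and ZONE (only the fm field is modeled, numeric values are assumptions), and apply_default_filling_method is not an SDK function.

```c
#include <stddef.h>

/* Hypothetical stand-ins; only a few filling methods are modeled. */
typedef enum { FM_DEFAULT, FM_OMNIFONT, FM_BARCODE } FmSketch;
typedef struct { FmSketch fm; } ZoneSketch;

/* Replace FM_DEFAULT in each zone by the default filling method.
 * If the default is itself FM_DEFAULT, the zones are left untouched and
 * the number of zones that need automatic detection is returned. */
size_t apply_default_filling_method(ZoneSketch *zones, size_t count, FmSketch default_fm)
{
    size_t needs_detection = 0;
    for (size_t i = 0; i < count; i++) {
        if (zones[i].fm != FM_DEFAULT)
            continue;                   /* explicitly set zones are kept */
        if (default_fm == FM_DEFAULT)
            needs_detection++;          /* left for filling method detection */
        else
            zones[i].fm = default_fm;
    }
    return needs_detection;
}
```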

There is a similar function for specifying a fixed value for the RM_AUTO recognition module: kRecSetDefaultRecognitionModule.

This function sets the value of the setting Kernel.OcrMgr.DefaultFillingMethod. This setting can be retrieved by kRecGetDefaultFillingMethod.

The specification of this function in C# is:

 RECERR kRecSetDefaultFillingMethod(int sid, FILLINGMETHOD type);

RECERR RECAPIKRN kRecSetDefaultRecognitionModule	(	int	sid,
		RECOGNITIONMODULE	rm
	)

Changing the default recognition module.

The kRecSetDefaultRecognitionModule function specifies the default recognition module setting. This setting is used to determine the recognition module to be applied for the zones with RM_AUTO in their ZONE::rm field.

Parameters:

[in]	sid	Settings Collection ID.
[in]	rm	Default recognition module.

Return values:

RECERR

Note:

If this function is not called the value of the setting is RM_AUTO.

If the default recognition module is set to RM_AUTO the recognition module for the zone is determined according to its filling method (ZONE::fm, kRecSetDefaultFillingMethod). For FM_OMNIFONT zones RM_OMNIFONT_PLUS2W is used. Note that this is not the most accurate recognition module, but a good compromise between speed and accuracy. If you want to use the most accurate engine set the recognition module to RM_OMNIFONT_PLUS3W for each omnifont zone manually, or set the setting Kernel.OcrMgr.PreferAccurateEngine to true.

The default recognition module affects zones with compatible filling methods only. For example the RM_OMNIFONT_PLUS3W default recognition module will be used for the FM_OMNIFONT, FM_DRAFTDOT9 and FM_DRAFTDOT24 zones if the zone's rm field is RM_AUTO, while other zones (e.g. FM_BARCODE ones) are recognized by a filling method compatible recognition module. Note that there is no such automatism when the zone's ZONE::rm field is specified (so rm is not RM_AUTO)! In that case the specified recognition module will be used even if the fm-rm combination is meaningless.
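The selection rules in the notes above can be modeled roughly as follows. This is a simplified sketch, not the Engine's algorithm: FmSketch and RmSketch are hypothetical local enums covering only two filling methods and three modules, the compatibility table contains only the pairs mentioned on this page, and the setting Kernel.OcrMgr.PreferAccurateEngine is not modeled.

```c
/* Hypothetical local enums; names follow this page, values are assumptions. */
typedef enum { FM_OMNIFONT, FM_OMR } FmSketch;
typedef enum { RM_AUTO, RM_OMNIFONT_PLUS2W, RM_OMNIFONT_PLUS3W, RM_OMR } RmSketch;

/* Compatibility limited to the pairs this page mentions. */
int is_compatible(RmSketch rm, FmSketch fm)
{
    switch (rm) {
    case RM_OMNIFONT_PLUS2W:
    case RM_OMNIFONT_PLUS3W:
        return fm == FM_OMNIFONT;
    case RM_OMR:
        return fm == FM_OMR;
    case RM_AUTO:
        return 0;
    }
    return 0;
}

/* Hypothetical helper: the module used for one zone. */
RmSketch resolve_module(RmSketch zone_rm, FmSketch zone_fm, RmSketch default_rm)
{
    if (zone_rm != RM_AUTO)
        return zone_rm;         /* explicit module is used even if meaningless */
    if (default_rm != RM_AUTO && is_compatible(default_rm, zone_fm))
        return default_rm;      /* default applies to compatible zones only */
    /* Otherwise a module compatible with the filling method is chosen. */
    return (zone_fm == FM_OMR) ? RM_OMR : RM_OMNIFONT_PLUS2W;
}
```

Note how an explicitly set module bypasses the compatibility check entirely, which mirrors the warning about meaningless fm-rm combinations.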

This function has limited support on some platforms. For details see the type RECOGNITIONMODULE.

This function sets the value of the setting Kernel.OcrMgr.DefaultRecognitionModule. This setting can be retrieved by kRecGetDefaultRecognitionModule.

The specification of this function in C# is:

 RECERR kRecSetDefaultRecognitionModule(int sid, RECOGNITIONMODULE rm);

RECERR RECAPIKRN kRecSetRMTradeoff	(	int	sid,
		RMTRADEOFF	Tradeoff
	)

Changing trade-off setting.

The kRecSetRMTradeoff function specifies a trade-off setting to be applied during preprocessing, auto-zoning and recognition. This setting applies to all recognition modules that can interpret a speed-accuracy trade-off setting.

Parameters:

[in]	sid	Settings Collection ID.
[in]	Tradeoff	Recognition algorithm trade-off setting to be set.

Return values:

RECERR

Note:

Currently the trade-off setting is effective for the following recognition modules:

The RM_OMNIFONT_PLUS2W and RM_OMNIFONT_PLUS3W voting recognition modules,
The multi-lingual RM_OMNIFONT_MOR omnifont recognition module,
The RM_OMNIFONT_MTX omnifont recognition module,
The RM_HNR handprinted numeral recognition module,
The RM_RER handprinted alpha recognition module,
The RM_ASIAN Asian recognition module in CCJK recognition.

The trade-off setting is also effective for the kRecPreprocessImg and kRecLocateZones functions. This setting does not influence the behaviour of any other RECOGNITIONMODULE. Not all of the above modules interpret the settings in the same way. (See RMTRADEOFF.) (The modules MTX, MAT, HNR, and the FAST mode of PLUS2W and PLUS3W are supported on: Windows; PLUS3W is supported on: Windows, Linux; RER is supported on: Windows, Linux, Mac OS X.)

If this function is not called, the default value, TO_ACCURATE, is applied.

This function sets the value of the setting Kernel.OcrMgr.TradeOff. This setting can be retrieved by kRecGetRMTradeoff. When recognizing normal (non-image-only) PDF files, the setting Kernel.OcrMgr.PDF.TradeOff may modify the effect of Kernel.OcrMgr.TradeOff.

The specification of this function in C# is:

 RECERR kRecSetRMTradeoff(int sid, RMTRADEOFF Tradeoff);

RECERR RECAPIKRN kRecSetTrainingFileName	(	int	sid,
		LPCTSTR	pFileName
	)

Setting the training file.

The kRecSetTrainingFileName function specifies and loads a training file for the omnifont recognition modules. A training file contains the results of a previous training session, and its use may influence the behavior of these recognition modules.

Parameters:

[in]	sid	Settings Collection ID.
[in]	pFileName	Name of the training file, or NULL. The NULL is used for disabling the use of a previously loaded training file.

Return values:

RECERR

Note:

Training of recognition modules is supported on: Windows, Linux.

This function enables the use of training data stored in a training file. On a document with repeated printing errors or with an unusual typeface some further training can be done to achieve higher accuracy. The result of user-conducted training done with the stand-alone OmniPage Professional can be saved into a training file that can be loaded whenever a similar document is to be processed.

This function modifies the value of the setting Kernel.OcrMgr.Training.FileName.

The specification of this function in C# is:

 RECERR kRecSetTrainingFileName(int sid, string pFileName);
