RecAPI
Document Classifier Module

Document Classifier API.

Classes

struct  CLASSIFY_INFO
 Structure for information about classification.

Typedefs

typedef struct RECDCSTRUCT * DCHANDLE
 Handle of a Document Classifier object.

Functions

RECERR RECAPIKRN kRecOpenDCProject (int sid, LPCTSTR pDCProjectFile, DCHANDLE *phDCProject)
 Opening Document Classifier Project File.
RECERR RECAPIKRN kRecCloseDCProject (DCHANDLE hDCProject)
 Closing a Document Classifier Project.
RECERR RECAPIKRN kRecGetFirstDCClass (DCHANDLE hDCProject, DCHANDLE *phDCClass)
 Starting enumeration of Document Classes.
RECERR RECAPIKRN kRecGetNextDCClass (DCHANDLE hDCPrevClass, DCHANDLE *phDCClass)
 Performing enumeration of Document Classes.
RECERR RECAPIKRN kRecClassifyPage (int sid, DCHANDLE hDCProject, HPAGE hPage, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
 Classifying a page.
RECERR RECAPIKRN kRecClassifyText (int sid, DCHANDLE hDCProject, LPCTSTR pText, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
 Classifying text.
RECERR RECAPIKRN kRecClassifyDocument (int sid, DCHANDLE hDCProject, LPCTSTR pFileName, int iPage, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
 Classifying the given page of a document.
RECERR RECAPIKRN kRecGetDCClassName (DCHANDLE hDCClass, LPTSTR *ppName)
 Returning the name of a Document Class.
RECERR RECAPIKRN kRecSetDCConfidenceThreshold (DCHANDLE hDCProject, int ConfidenceThreshold)
 Set the confidence threshold of a Document Classifier Project.
RECERR RECAPIKRN kRecGetDCConfidenceThreshold (DCHANDLE hDCProject, int *pConfidenceThreshold)
 Get the confidence threshold of a Document Classifier Project.

Detailed Description

Document Classifier API.

For a detailed description of this module, see its separate documentation, Document Classifier.chm.


Function Documentation

RECERR RECAPIKRN kRecClassifyDocument ( int  sid,
DCHANDLE  hDCProject,
LPCTSTR  pFileName,
int  iPage,
DCHANDLE *  phDCPredictedClass,
unsigned *  pConfidenceLevel,
CLASSIFY_INFO **  pClassifyInfo,
LPLONG  pLength,
INTBOOL *  pIsConfident 
)

Classifying the given page of a document.

This function classifies a document or the given page of a document. The input can be a scanned image, one page of a PDF file, or plain text.

Parameters:
[in]  sid  Settings Collection ID.
[in]  hDCProject  Handle of the Document Classifier Project returned by kRecOpenDCProject.
[in]  pFileName  Name of the file containing the document. It can be an image file, a PDF or a text file.
[in]  iPage  The page number of the page to be processed. This parameter is not used if the input file is a text file.
[out]  phDCPredictedClass  Address of a variable to store the handle of the predicted Document Class. The returned handle can be NULL.
[out]  pConfidenceLevel  Address of a variable to store the confidence of the prediction. The returned value is between 0 and 100.
[out]  pClassifyInfo  Address of a variable to store the array of CLASSIFY_INFO structures describing the classification.
[out]  pLength  Address of a variable to store the length of the array returned in pClassifyInfo. This is equal to the number of classes.
[out]  pIsConfident  Address of a variable that receives whether the classification is confident. The returned value is TRUE if the confidence of the prediction is greater than or equal to the preset confidence threshold.
Return values:
RECERR
Note:
Use kRecOpenDCProject to open a Document Classifier Project File and obtain a handle.
This function decides if the input is an image file, PDF or text file, based on the filename extension. DC_UNKNOWNEXTENSION_ERR is returned if the extension is unknown.
If the input is an image file or PDF, the function loads and preprocesses it. If text-based classification is enabled, the image is recognized as well.
If the input is a text file (i.e. the filename extension is .txt), only text-based classification is possible. The following text encodings are supported: Unicode (both UTF-16 and UTF-8, with or without a Byte Order Mark) and non-Unicode text encoded with the Windows default code page (as set in Control Panel > Region and Language > Administrative pane > Change system locale).
The function returns the handle of the predicted class, and the confidence of the prediction. You can query the name of the class with kRecGetDCClassName. The function returns an array of CLASSIFY_INFO structures (pClassifyInfo). The length of the array is equal to the number of defined classes, and returned in pLength. The array contains the confidence levels for each class. The confidence threshold can be defined with Document Classifier Assistant. It is stored in the Document Classifier Project File, and can be queried (kRecGetDCConfidenceThreshold) and changed (kRecSetDCConfidenceThreshold) after the project is loaded.
The array returned in pClassifyInfo should be released using kRecFree.
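The notes above can be sketched as one call sequence: open the project, classify a file, query the class name, release the array, close the project. This is a minimal sketch, not the SDK's own sample. The function signatures follow this page, but everything in the stand-in section is an assumption made so the code compiles without the SDK: the REC_OK success code, the type definitions, the kRecFree signature, the stub bodies and their canned results. The helper classify_file is hypothetical. This page does not describe the members of CLASSIFY_INFO or who owns the string returned by kRecGetDCClassName, so the sketch does not read the array and only copies the name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- Stand-ins (assumptions) so the sketch compiles without the SDK. ----
 * With the real SDK, include its header and delete this whole section. */
#define RECAPIKRN
typedef int RECERR;
#define REC_OK 0                              /* assumed success code */
typedef const char *LPCTSTR;                  /* assumes a non-Unicode build */
typedef char *LPTSTR;
typedef long *LPLONG;
typedef int INTBOOL;
typedef struct RECDCSTRUCT *DCHANDLE;
struct RECDCSTRUCT { char name[32]; };        /* opaque in the real SDK */
typedef struct CLASSIFY_INFO { int placeholder; } CLASSIFY_INFO; /* real layout not shown here */

static struct RECDCSTRUCT g_project = { "project" };
static struct RECDCSTRUCT g_class = { "Invoice" };

static RECERR RECAPIKRN kRecOpenDCProject(int sid, LPCTSTR pDCProjectFile, DCHANDLE *phDCProject)
{ (void)sid; (void)pDCProjectFile; *phDCProject = &g_project; return REC_OK; }

static RECERR RECAPIKRN kRecCloseDCProject(DCHANDLE hDCProject)
{ (void)hDCProject; return REC_OK; }

static RECERR RECAPIKRN kRecClassifyDocument(int sid, DCHANDLE hDCProject, LPCTSTR pFileName,
    int iPage, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel,
    CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
{
    (void)sid; (void)hDCProject; (void)pFileName; (void)iPage;
    *phDCPredictedClass = &g_class;           /* canned result for the sketch */
    *pConfidenceLevel = 87;
    *pClassifyInfo = calloc(1, sizeof **pClassifyInfo);
    *pLength = 1;
    *pIsConfident = 1;
    return REC_OK;
}

static RECERR RECAPIKRN kRecGetDCClassName(DCHANDLE hDCClass, LPTSTR *ppName)
{ *ppName = hDCClass->name; return REC_OK; }

static RECERR RECAPIKRN kRecFree(void *p) { free(p); return REC_OK; }
/* ---- End of stand-ins. ---- */

/* Classifies one page of `file` and copies the predicted class name into `name`.
 * Returns 1 for a confident prediction, 0 for an unconfident or missing one,
 * and -1 if any API call fails. */
int classify_file(int sid, LPCTSTR project, LPCTSTR file, int page,
                  char *name, size_t nameSize, unsigned *confidence)
{
    DCHANDLE hProject = NULL, hClass = NULL;
    CLASSIFY_INFO *info = NULL;
    long length = 0;
    INTBOOL isConfident = 0;
    int result = -1;

    assert(name != NULL && nameSize > 0 && confidence != NULL);
    name[0] = '\0';

    if (kRecOpenDCProject(sid, project, &hProject) != REC_OK)
        return -1;

    if (kRecClassifyDocument(sid, hProject, file, page, &hClass, confidence,
                             &info, &length, &isConfident) == REC_OK) {
        LPTSTR className = NULL;
        /* The predicted handle can be NULL, so check it before asking for a name. */
        if (hClass != NULL && kRecGetDCClassName(hClass, &className) == REC_OK
            && className != NULL) {
            strncpy(name, className, nameSize - 1);
            name[nameSize - 1] = '\0';
        }
        result = (hClass != NULL && isConfident) ? 1 : 0;
        kRecFree(info);                       /* release the CLASSIFY_INFO array */
    }

    kRecCloseDCProject(hProject);             /* close the project on every path */
    return result;
}
```

A caller would pass the Settings Collection ID, the path of a project file and the input file, for example classify_file(0, "demo.dcp", "scan.pdf", page, name, sizeof name, &confidence), where both file names are placeholders.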
RECERR RECAPIKRN kRecClassifyPage ( int  sid,
DCHANDLE  hDCProject,
HPAGE  hPage,
DCHANDLE *  phDCPredictedClass,
unsigned *  pConfidenceLevel,
CLASSIFY_INFO **  pClassifyInfo,
LPLONG  pLength,
INTBOOL *  pIsConfident 
)

Classifying a page.

This function classifies the given HPAGE.

Parameters:
[in]  sid  Settings Collection ID.
[in]  hDCProject  Handle of the Document Classifier Project returned by kRecOpenDCProject.
[in]  hPage  Handle of the page to be classified.
[out]  phDCPredictedClass  Address of a variable to store the handle of the predicted Document Class. The returned handle can be NULL.
[out]  pConfidenceLevel  Address of a variable to store the confidence of the prediction. The returned value is between 0 and 100.
[out]  pClassifyInfo  Address of a variable to store the array of CLASSIFY_INFO structures describing the classification.
[out]  pLength  Address of a variable to store the length of the array returned in pClassifyInfo. This is equal to the number of classes.
[out]  pIsConfident  Address of a variable that receives whether the classification is confident. The returned value is TRUE if the confidence of the prediction is greater than or equal to the preset confidence threshold.
Return values:
RECERR
Note:
If the classifier method (defined in the Document Classifier Project) is Text or Combined, the function recognizes the image unless hPage already contains letters on entry. The language of the recognition is defined in the Document Classifier Project. On return, hPage contains the result of the recognition (OCR zones, letters).
The function returns the handle of the predicted class, and the confidence of the prediction. The function returns an array of CLASSIFY_INFO structures (pClassifyInfo). The length of the array is equal to the number of defined classes, and returned in pLength. The array contains the confidence levels for each class. The confidence threshold can be defined with Document Classifier Assistant. It is stored in the Document Classifier Project File, and can be queried (kRecGetDCConfidenceThreshold) and changed (kRecSetDCConfidenceThreshold) after the project is loaded.
The array returned in pClassifyInfo should be released using kRecFree.
RECERR RECAPIKRN kRecClassifyText ( int  sid,
DCHANDLE  hDCProject,
LPCTSTR  pText,
DCHANDLE *  phDCPredictedClass,
unsigned *  pConfidenceLevel,
CLASSIFY_INFO **  pClassifyInfo,
LPLONG  pLength,
INTBOOL *  pIsConfident 
)

Classifying text.

This function classifies the given text.

Parameters:
[in]  sid  Settings Collection ID.
[in]  hDCProject  Handle of the Document Classifier Project returned by kRecOpenDCProject.
[in]  pText  Null-terminated text to be classified.
[out]  phDCPredictedClass  Address of a variable to store the handle of the predicted Document Class. The returned handle can be NULL.
[out]  pConfidenceLevel  Address of a variable to store the confidence of the prediction. The returned value is between 0 and 100.
[out]  pClassifyInfo  Address of a variable to store the array of CLASSIFY_INFO structures describing the classification.
[out]  pLength  Address of a variable to store the length of the array returned in pClassifyInfo. This is equal to the number of classes.
[out]  pIsConfident  Address of a variable that receives whether the classification is confident. The returned value is TRUE if the confidence of the prediction is greater than or equal to the preset confidence threshold.
Return values:
RECERR
Note:
Use kRecOpenDCProject to open a Document Classifier Project File and obtain a handle.
The function returns the handle of the predicted class, and the confidence of the prediction. The function returns an array of CLASSIFY_INFO structures (pClassifyInfo). The length of the array is equal to the number of defined classes, and returned in pLength. The array contains the confidence levels for each class. The confidence threshold can be defined with Document Classifier Assistant. It is stored in the Document Classifier Project File, and can be queried (kRecGetDCConfidenceThreshold) and changed (kRecSetDCConfidenceThreshold) after the project is loaded.
The array returned in pClassifyInfo should be released using kRecFree.
RECERR RECAPIKRN kRecCloseDCProject ( DCHANDLE  hDCProject)

Closing a Document Classifier Project.

This function closes a Document Classifier Project opened by kRecOpenDCProject.

Parameters:
[in]  hDCProject  Handle of the Document Classifier Project.
Return values:
RECERR
RECERR RECAPIKRN kRecGetDCClassName ( DCHANDLE  hDCClass,
LPTSTR *  ppName 
)

Returning the name of a Document Class.

This function returns the name of a Document Class.

Parameters:
[in]  hDCClass  Handle of the Document Class.
[out]  ppName  Address of a variable to store the name of the Document Class.
Return values:
RECERR
Note:
Use this function to obtain the name of the Document Class.
RECERR RECAPIKRN kRecGetDCConfidenceThreshold ( DCHANDLE  hDCProject,
int *  pConfidenceThreshold 
)

Get the confidence threshold of a Document Classifier Project.

The kRecGetDCConfidenceThreshold function returns the confidence threshold of the given Document Classifier Project.

Parameters:
[in]  hDCProject  Handle of the Document Classifier Project returned by kRecOpenDCProject.
[out]  pConfidenceThreshold  Address of an integer variable to receive the confidence threshold.
Note:
The confidence threshold is a number between 0 and 100. It can be set with the Document Classifier Assistant during the Training and Testing Process, and is stored in the Document Classifier Project File. The threshold can be queried and changed after the Document Classifier Project File is loaded.
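The classifying functions on this page document pIsConfident as TRUE when the confidence of the prediction is greater than or equal to this threshold. A minimal sketch of that comparison follows. The helper name meets_threshold is hypothetical and not part of the API; it may be useful when a caller wants to apply its own cutoff to a returned pConfidenceLevel without changing the project.

```c
#include <assert.h>

/* Returns 1 when a prediction counts as confident, 0 otherwise.
 * Both arguments are expected on the documented 0..100 scale. */
int meets_threshold(unsigned confidenceLevel, int confidenceThreshold)
{
    assert(confidenceLevel <= 100);
    assert(confidenceThreshold >= 0 && confidenceThreshold <= 100);
    /* "Greater than or equal to": a level equal to the threshold is confident. */
    return (int)confidenceLevel >= confidenceThreshold;
}
```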
RECERR RECAPIKRN kRecGetFirstDCClass ( DCHANDLE  hDCProject,
DCHANDLE *  phDCClass 
)

Starting enumeration of Document Classes.

This function returns the handle of the first Document Class of the given project.

Parameters:
[in]  hDCProject  Handle of the Document Classifier Project.
[out]  phDCClass  Address of a variable to store the handle of the first Document Class.
Return values:
RECERR
Note:
The Document Classes can be queried using the kRecGetFirstDCClass and kRecGetNextDCClass function-pair.
The name of the class can be queried by kRecGetDCClassName().
RECERR RECAPIKRN kRecGetNextDCClass ( DCHANDLE  hDCPrevClass,
DCHANDLE *  phDCClass 
)

Performing enumeration of Document Classes.

This function returns the handle of the next Document Class of the given project.

Parameters:
[in]  hDCPrevClass  Handle of the previous Document Class.
[out]  phDCClass  Address of a variable to store the handle of the next Document Class.
Return values:
RECERR
Note:
The Document Classes can be queried using the kRecGetFirstDCClass and kRecGetNextDCClass function-pair.
The name of the class can be queried by kRecGetDCClassName().
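The function-pair can be used in a loop like the one sketched here. This page does not state how the end of the enumeration is signalled, so the sketch assumes that the returned handle is NULL after the last class; if the SDK reports the end through an error code instead, the loop condition has to change. The types, the REC_OK success code and the stub bodies are stand-ins (assumptions) so the code compiles without the SDK, and list_classes is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* ---- Stand-ins (assumptions) so the sketch compiles without the SDK. ---- */
#define RECAPIKRN
typedef int RECERR;
#define REC_OK 0                              /* assumed success code */
typedef char *LPTSTR;                         /* assumes a non-Unicode build */
typedef struct RECDCSTRUCT *DCHANDLE;
struct RECDCSTRUCT { char name[16]; struct RECDCSTRUCT *next; };

/* A fake project with two classes, chained for the stubs below. */
static struct RECDCSTRUCT g_letter  = { "Letter", NULL };
static struct RECDCSTRUCT g_invoice = { "Invoice", &g_letter };
static struct RECDCSTRUCT g_project = { "", &g_invoice };

static RECERR RECAPIKRN kRecGetFirstDCClass(DCHANDLE hDCProject, DCHANDLE *phDCClass)
{ *phDCClass = hDCProject->next; return REC_OK; }

static RECERR RECAPIKRN kRecGetNextDCClass(DCHANDLE hDCPrevClass, DCHANDLE *phDCClass)
{ *phDCClass = hDCPrevClass->next; return REC_OK; }

static RECERR RECAPIKRN kRecGetDCClassName(DCHANDLE hDCClass, LPTSTR *ppName)
{ *ppName = hDCClass->name; return REC_OK; }
/* ---- End of stand-ins. ---- */

/* Writes one line per Document Class to `out`.
 * Returns the number of classes, or -1 if any API call fails.
 * Assumption: a NULL handle marks the end of the enumeration. */
long list_classes(DCHANDLE hProject, FILE *out)
{
    DCHANDLE hClass = NULL;
    long count = 0;
    RECERR rc;

    assert(out != NULL);
    rc = kRecGetFirstDCClass(hProject, &hClass);
    while (rc == REC_OK && hClass != NULL) {
        LPTSTR name = NULL;
        DCHANDLE hNext = NULL;

        if (kRecGetDCClassName(hClass, &name) != REC_OK)
            return -1;
        fprintf(out, "%ld: %s\n", count, name != NULL ? name : "(unnamed)");
        ++count;

        /* The previous handle is the input for the next step. */
        rc = kRecGetNextDCClass(hClass, &hNext);
        hClass = hNext;
    }
    return rc == REC_OK ? count : -1;
}
```

The count returned by such a loop should match the value the classifying functions return in pLength, since that length is documented as the number of classes.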
RECERR RECAPIKRN kRecOpenDCProject ( int  sid,
LPCTSTR  pDCProjectFile,
DCHANDLE *  phDCProject 
)

Opening Document Classifier Project File.

The kRecOpenDCProject function opens a Document Classifier Project File (*.dcp).

Parameters:
[in]  sid  Settings Collection ID.
[in]  pDCProjectFile  Path to the Project File.
[out]  phDCProject  Address of a variable to store the handle of the Document Classifier Project.
Return values:
RECERR
Note:
Use the Document Classifier Assistant to create, train and test a Document Classifier Project. The Assistant lets you define classes, add training and test documents to the classes, and train and test the document classifier. After the Training and Testing Process you can export a Document Classifier Project File, which contains all the information needed to perform classification. CSDK provides an API (the Document Classifier API) for loading the Document Classifier Project File and classifying documents.
If the project is no longer needed, it should be closed by invoking the kRecCloseDCProject function.
RECERR RECAPIKRN kRecSetDCConfidenceThreshold ( DCHANDLE  hDCProject,
int  ConfidenceThreshold 
)

Set the confidence threshold of a Document Classifier Project.

The kRecSetDCConfidenceThreshold function sets the confidence threshold of the given Document Classifier Project.

Parameters:
[in]  hDCProject  Handle of the Document Classifier Project returned by kRecOpenDCProject.
[in]  ConfidenceThreshold  The new value of the confidence threshold.
Note:
The confidence threshold is a number between 0 and 100. It can be set with the Document Classifier Assistant during the Training and Testing Process, and is stored in the Document Classifier Project File. The threshold can be queried and changed after the Document Classifier Project File is loaded.
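Because the threshold can be both queried and changed on a loaded project, a caller can swap it temporarily and restore it later. The sketch below shows one way to do that, checking the documented 0 to 100 range first. This page does not say how the SDK treats out-of-range values, so the range check and its -1 return value are choices of this sketch only. The types, the REC_OK success code and the stub bodies are stand-ins (assumptions) so the code compiles without the SDK, and replace_threshold is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-ins (assumptions) so the sketch compiles without the SDK. ---- */
#define RECAPIKRN
typedef int RECERR;
#define REC_OK 0                              /* assumed success code */
typedef struct RECDCSTRUCT *DCHANDLE;
struct RECDCSTRUCT { int threshold; };        /* opaque in the real SDK */

static RECERR RECAPIKRN kRecGetDCConfidenceThreshold(DCHANDLE hDCProject, int *pConfidenceThreshold)
{ *pConfidenceThreshold = hDCProject->threshold; return REC_OK; }

static RECERR RECAPIKRN kRecSetDCConfidenceThreshold(DCHANDLE hDCProject, int ConfidenceThreshold)
{ hDCProject->threshold = ConfidenceThreshold; return REC_OK; }
/* ---- End of stand-ins. ---- */

/* Sets a new threshold and returns the previous one through pOldThreshold,
 * so the caller can restore it with a second call.
 * Values outside 0..100 are rejected before the API is touched. */
RECERR replace_threshold(DCHANDLE hProject, int newThreshold, int *pOldThreshold)
{
    RECERR rc;

    assert(pOldThreshold != NULL);
    if (newThreshold < 0 || newThreshold > 100)
        return -1;                            /* sketch-only error value */

    rc = kRecGetDCConfidenceThreshold(hProject, pOldThreshold);
    if (rc != REC_OK)
        return rc;
    return kRecSetDCConfidenceThreshold(hProject, newThreshold);
}
```

A typical use is to raise the threshold before a batch that needs stricter results, then call replace_threshold again with the saved value once the batch is done.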