RecAPI
Layout Retention Output Module

Layout Retention Output. RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, Mac OS X.

Classes

struct  OUTPUTCONVERTERINFOA
 Output document converter information (ANSI)
struct  OUTPUTCONVERTERINFOW
 Output document converter information (Unicode)

Defines

#define OUTPUTCONVERTERINFO   WORA(OUTPUTCONVERTERINFO)
 Output document converter information.

Enumerations

enum  DocFormatter_Mode {
  DFM_Lite = 0x00,
  DFM_CharacterStyleConsolidation = 0x01,
  DFM_ParagraphStyleConsolidation = 0x02,
  DFM_StyleConsolidation = 0x03,
  DFM_HeaderFooter = 0x04,
  DFM_CrossrefFind = 0x08,
  DFM_MarginConsolidation = 0x10,
  DFM_Full = 0xFF
}
 Document formatter methods.
enum  TColorQValues
 Color quality.
enum  TPDFCompatibTypeValues
 Compatibility.
enum  R2_HEADERS_RETENTION
 HeadersFooters.
enum  TWriteIndex
 Index Page.
enum  TPDFSecurityValues
 PDFSecurity type.
enum  R2_PAGEBREAKS
 PageBreaks.
enum  R2_PICTURES_BPP
 Picture color.
enum  R2_PICTURES_DPI
 Pictures.
enum  TSignatureTypevalues
 Signature type.
enum  R2_TABLES_RETENTION
 Tables.
enum  TMRCTypeValues
 MRC use.
enum  OUTPUTLEVEL {
  OL_AUTO,
  OL_NOFORMAT,
  OL_RFP,
  OL_TRUEPAGE,
  OL_FLOWINGPAGE,
  OL_SPREADSHEET
}
 Output level of the exported document.

Functions

RECERR RECAPIPLS RecSetOutputFormat (int sid, LPCTSTR pFormatname)
 Set the output format.
RECERR RECAPIPLS RecGetOutputFormat (int sid, LPTSTR pFormatname, int len)
 Ask the output format.
RECERR RECAPIPLS RecGetFirstOutputFormat (LPTSTR pFormatname, int len)
 Start the enumeration of the output formats.
RECERR RECAPIPLS RecGetNextOutputFormat (LPTSTR pFormatname, int len)
 Continue the enumeration of the output formats.
RECERR RECAPIPLS RecGetOutputFormatInfo (LPCTSTR pFormatName, OUTPUTCONVERTERINFO *pInfo)
 Get information about the specified output document format converter.
RECERR RECAPIPLS RecGetOutputSettingsHandle (int sid, HSETTING *hSetting)
 Gets the settings handle for the currently set output format.
RECERR RECAPIPLS RecSetOutputLevel (int sid, OUTPUTLEVEL outLevel)
 Set the level of format retention for the final output document.
RECERR RECAPIPLS RecGetOutputLevel (int sid, OUTPUTLEVEL *poutLevel)
 Ask the current level of format retention for the final output document.

Detailed Description

Layout Retention Output. RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, Mac OS X.

RecAPIPlus provides accurate layout-retaining output in several file formats, such as RTF, DOC, WordML, XLS, PDF, WP and WAV. The RecConvert2Doc and RecProcessPagesEx functions export the given document into these output file formats. See the details about support of different output formats on different platforms.

In many cases the goal is to retain the original layout in the output document as far as possible. Converters differ in how much of the layout they can retain. There are five output levels (OUTPUTLEVEL) of layout retention, plus OL_AUTO, which selects the converter default. Not every converter supports every output level. For example, a Word document or a PDF document can be produced in Flowing Page and True Page modes, which closely follow the original layout, whereas simple text converters can retain only the plain text in Plain Text mode (formerly No Format mode) or the text with its attributes in Formatted Text mode (formerly Retain Font and Paragraphs mode).

Besides the output modes, converters have many settings that can influence the layout (see the list of converter settings). You can access these settings through the Settings Manager Module.

Page consolidation

When pages are scanned from a document with uniform margins, the page images typically do not place the body text in precisely the same position on each page, because of scanning variations. Previously, users had to restore uniform margins manually after the recognition result was exported. This toolkit examines incoming pages and, if it determines that they have a similar text area and layout, performs page consolidation automatically. The program calculates ideal margins, then identifies a vector for each page describing the difference between the actual and ideal margins. These vectors are then applied during the output process to the following file types: RTF, WordML, PDF, DOCX and XPS. The calculation is fully automatic and cannot be influenced. However, the user can decide whether a converter should apply these vectors by using the ConsolidatePages setting of the given converter.
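
The margin calculation described above can be pictured with a short sketch. The toolkit's actual rule for choosing the ideal margins is not documented here, so this sketch simply assumes the mean of the measured margins; the types PageMargin and PageShift and the function consolidation_shifts are hypothetical names used only for illustration and are not part of the CSDK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not part of the CSDK. */
typedef struct { double left, top; } PageMargin;  /* measured margins of one page */
typedef struct { double dx, dy; } PageShift;      /* shift that moves a page onto the ideal margins */

/* Assumption: the ideal margins are the mean of the measured margins.
   Fills shifts[i] with (ideal - actual) for each page and returns the ideal margins. */
PageMargin consolidation_shifts(const PageMargin *pages, size_t n, PageShift *shifts)
{
    PageMargin ideal = {0.0, 0.0};
    size_t i;

    assert(pages != NULL && shifts != NULL && n > 0);

    for (i = 0; i < n; ++i) {
        ideal.left += pages[i].left;
        ideal.top += pages[i].top;
    }
    ideal.left /= (double)n;
    ideal.top /= (double)n;

    for (i = 0; i < n; ++i) {
        shifts[i].dx = ideal.left - pages[i].left;
        shifts[i].dy = ideal.top - pages[i].top;
    }
    return ideal;
}
```

In this picture, a converter with ConsolidatePages switched on would apply each shift when placing the page content, and one with the setting switched off would ignore the shifts.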


Define Documentation

#define OUTPUTCONVERTERINFO   WORA(OUTPUTCONVERTERINFO)

Output document converter information.

On Windows this type resolves to OUTPUTCONVERTERINFOA or OUTPUTCONVERTERINFOW, depending on whether the _UNICODE macro is defined. On Linux and Macintosh it is equivalent to OUTPUTCONVERTERINFOA.


Enumeration Type Documentation

enum DocFormatter_Mode

Document formatter methods.

These are the possible values of the setting Formatter.df.mode.

Enumerator:
DFM_Lite 

Add page to document.

DFM_CharacterStyleConsolidation 

Find/Consolidate Character styles.

DFM_ParagraphStyleConsolidation 

Find/Consolidate Paragraph styles.

DFM_StyleConsolidation 

Find/Consolidate both Paragraph and Character styles.

DFM_HeaderFooter 

Find header/footers.

DFM_CrossrefFind 

Find internal cross-references (not implemented yet).

DFM_MarginConsolidation 

Margin consolidation.

DFM_Full 

All.
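
The enumerator values suggest that the modes are bit flags: DFM_StyleConsolidation (0x03) is the union of the character (0x01) and paragraph (0x02) flags, and DFM_Full (0xFF) covers all of them. The sketch below illustrates that reading. Whether Formatter.df.mode accepts arbitrary combinations is an assumption, and dfm_has is a hypothetical helper, not a CSDK function.

```c
#include <assert.h>

/* Values copied from the enumeration listed in this module.
   With the real SDK header included, this local copy would be omitted. */
enum DocFormatter_Mode {
    DFM_Lite = 0x00,
    DFM_CharacterStyleConsolidation = 0x01,
    DFM_ParagraphStyleConsolidation = 0x02,
    DFM_StyleConsolidation = 0x03,
    DFM_HeaderFooter = 0x04,
    DFM_CrossrefFind = 0x08,
    DFM_MarginConsolidation = 0x10,
    DFM_Full = 0xFF
};

/* Hypothetical helper: returns non-zero when every bit of `step` is set in `mode`. */
int dfm_has(unsigned mode, unsigned step)
{
    assert(step != 0);  /* DFM_Lite has no bits to test */
    return (mode & step) == step;
}
```

Under this reading, a mode such as DFM_StyleConsolidation | DFM_HeaderFooter would request style consolidation and header/footer detection without margin consolidation.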

enum OUTPUTLEVEL

Output level of the exported document.

Pre-defined levels of format retention for the final output document. The property values belonging to these levels are documented in the RecSetOutputLevel function. See also the table of the output levels supported by each converter.

Enumerator:
OL_AUTO 

Converter default.

OL_NOFORMAT 

Plain text (formerly No formatting mode).

OL_RFP 

Formatted Text (formerly Retain Font and Paragraphs mode).

OL_TRUEPAGE 

True Page.

OL_FLOWINGPAGE 

Flowing Page.

OL_SPREADSHEET 

Spreadsheet.
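
For logging or user-interface purposes it can be handy to map the levels to the display names used in this section. The following is a minimal sketch of such a mapping; output_level_name and output_level_from_name are hypothetical helpers, not CSDK functions, and the local copy of the enumeration assumes the enumerators take consecutive values in the order listed above.

```c
#include <assert.h>
#include <string.h>

/* Local copy of the enumeration as listed in this module.
   With the real SDK header included, this copy would be omitted. */
typedef enum {
    OL_AUTO, OL_NOFORMAT, OL_RFP, OL_TRUEPAGE, OL_FLOWINGPAGE, OL_SPREADSHEET
} OUTPUTLEVEL;

/* Display names taken from the enumerator descriptions above. */
static const char *const level_names[] = {
    "Converter default", "Plain Text", "Formatted Text",
    "True Page", "Flowing Page", "Spreadsheet"
};

/* Hypothetical helper: returns the display name of a level. */
const char *output_level_name(OUTPUTLEVEL level)
{
    assert((int)level >= OL_AUTO && (int)level <= OL_SPREADSHEET);
    return level_names[level];
}

/* Hypothetical helper: stores the level matching `name` and returns 1,
   or returns 0 when the name is not one of the display names. */
int output_level_from_name(const char *name, OUTPUTLEVEL *level)
{
    int i;

    assert(name != NULL && level != NULL);
    for (i = OL_AUTO; i <= OL_SPREADSHEET; ++i) {
        if (strcmp(name, level_names[i]) == 0) {
            *level = (OUTPUTLEVEL)i;
            return 1;
        }
    }
    return 0;
}
```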

enum R2_HEADERS_RETENTION

HeadersFooters.

You can set how headers and footers should be handled. This setting is available for every converter, but the default value differs between converters. For more information, see the setting HeadersFooters in the summary table of converter settings.

enum R2_PAGEBREAKS

PageBreaks.

For several converters you can set how you want page breaks to be handled. For more information, see the setting PageBreaks in the summary table of converter settings.

enum R2_PICTURES_BPP

Picture color.

For several converters, you can set the color of the image. For more information, see the setting PictureColor in the summary table of converter settings.

enum R2_PICTURES_DPI

Pictures.

For every converter you can set how images should be handled. The default value differs between converters. For more information, see the setting Pictures in the summary table of converter settings.

enum R2_TABLES_RETENTION

Tables.

For every converter, except the Excel and Html converters, you can set how you would like to handle tables. For more information, see the setting Tables in the summary table of converter settings.

enum TColorQValues

Color quality.

For the PDF converters you can set the color quality. The default is R2ID_PDFCOLORQUALITY_MIN for every PDF converter. For more information, see the setting ColorQuality in the summary table of converter settings.

enum TMRCTypeValues

MRC use.

For PDF converters you can set the MRC type. The default is: R2ID_PDFMRC_NO for every PDF converter. For more information, see the setting UseMRC in the summary table of converter settings.

enum TPDFCompatibTypeValues

Compatibility.

For the PDF converters you can set this compatibility value. For more information, see the setting Compatibility in the summary table of converter settings.

enum TPDFSecurityValues

PDFSecurity type.

For PDF converters you can set the security type. For more information, see the setting PDFSecurity.Type in the summary table of converter settings.

enum TSignatureTypevalues

Signature type.

For the PDF converters you can set the signature type. The default is: R2ID_SIGTYPENONE for every PDF converter. For more information, see the setting Signature.SignatureType in the summary table of converter settings.

enum TWriteIndex

Index Page.

You can switch on the Index Page generation in simple or 'InFrame' mode using HTML output converters. If it is switched on, an index page is generated with links to the recognized and converted pages. In this case, you can change the text of the navigation links by changing NavNextText, NavPrevText or NavTOCText. For more information, see the setting IndexPage in the summary table of converter settings.


Function Documentation

RECERR RECAPIPLS RecGetFirstOutputFormat ( LPTSTR  pFormatname,
int  len 
)

Start the enumeration of the output formats.

This starts the enumeration of the document output formats in the current thread.

Parameters:
[out] pFormatname - Buffer containing the converter name.
[in] len - Length of the buffer.
Return values:
RECERR
Note:
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, Mac OS X. However not all the output formats are supported on all these platforms. See details.
The specification of this function in C# is:
    RECERR RecGetFirstOutputFormat(StringBuilder formatName);
    // or
    RECERR RecGetFirstOutputFormat(out string formatName);
There is a non-enumerating function replacing RecGetFirstOutputFormat and RecGetNextOutputFormat in C#:
 RECERR RecGetAllOutputFormats(out string[] formatnames); 
RECERR RECAPIPLS RecGetNextOutputFormat ( LPTSTR  pFormatname,
int  len 
)

Continue the enumeration of the output formats.

This continues the enumeration of the document output formats in the current thread.

Parameters:
[out] pFormatname - Buffer containing the converter name.
[in] len - Length of the buffer.
Return values:
RECERR
Note:
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, Mac OS X. However not all the output formats are supported on all these platforms. See details.
The specification of this function in C# is:
    RECERR RecGetNextOutputFormat(StringBuilder formatName);
    // or
    RECERR RecGetNextOutputFormat(out string formatName);
There is a non-enumerating function replacing RecGetFirstOutputFormat and RecGetNextOutputFormat in C#:
 RECERR RecGetAllOutputFormats(out string[] formatnames); 
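
In C, the two enumeration functions are typically combined in a loop. The sketch below shows that pattern under several assumptions: the success code REC_OK is assumed to be zero, the enumeration is assumed to end when a call returns anything other than REC_OK, and an ANSI build is assumed so that LPTSTR is char *. The stub implementations, the STUB_NO_MORE code and the format names "FormatA" to "FormatC" are made up so that the sketch compiles without the SDK; with the real SDK the stub section would be replaced by the CSDK header.

```c
#include <assert.h>
#include <stdio.h>

/* ---- Stand-ins so the sketch compiles without the SDK (all hypothetical) ---- */
typedef int RECERR;
#define REC_OK 0        /* assumed success code */
#define STUB_NO_MORE 1  /* made-up "no more formats" code */

static const char *const stub_formats[] = { "FormatA", "FormatB", "FormatC" };
static size_t stub_pos;

static RECERR stub_copy(char *buf, int len)
{
    assert(buf != NULL && len > 0);
    if (stub_pos >= sizeof stub_formats / sizeof stub_formats[0])
        return STUB_NO_MORE;
    snprintf(buf, (size_t)len, "%s", stub_formats[stub_pos++]);
    return REC_OK;
}

RECERR RecGetFirstOutputFormat(char *pFormatname, int len)
{
    stub_pos = 0;  /* restart the enumeration */
    return stub_copy(pFormatname, len);
}

RECERR RecGetNextOutputFormat(char *pFormatname, int len)
{
    return stub_copy(pFormatname, len);
}
/* ---- End of stand-ins ---- */

/* Prints every output format name and returns how many were enumerated. */
int count_output_formats(void)
{
    char name[256];
    int count = 0;
    RECERR rc = RecGetFirstOutputFormat(name, (int)sizeof name);

    while (rc == REC_OK) {
        printf("%s\n", name);
        ++count;
        rc = RecGetNextOutputFormat(name, (int)sizeof name);
    }
    return count;
}
```

Because the enumeration state belongs to the current thread, a loop like this should run from start to finish on one thread.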
RECERR RECAPIPLS RecGetOutputFormat ( int  sid,
LPTSTR  pFormatname,
int  len 
)

Ask the output format.

This retrieves the output document format used by the RecConvert2Doc and RecProcessPagesEx functions.

Parameters:
[in] sid - Settings Collection ID.
[out] pFormatname - Buffer containing the converter name.
[in] len - Length of the buffer.
Return values:
RECERR
Note:
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, Mac OS X. However not all the output formats are supported on all these platforms. See details.
This function gets the value of the setting APIPlus.Output.TextFormat. This setting can be changed by RecSetOutputFormat.
The specification of this function in C# is:
    RECERR RecGetOutputFormat(int sid, StringBuilder formatName);
    // or
    RECERR RecGetOutputFormat(int sid, out string formatName);
RECERR RECAPIPLS RecGetOutputFormatInfo ( LPCTSTR  pFormatName,
OUTPUTCONVERTERINFO *  pInfo 
)

Get information about the specified output document format converter.

Parameters:
[in] pFormatName - The name of the output conversion format.
[out] pInfo - Pointer to an OUTPUTCONVERTERINFO variable.
Return values:
RECERR
Note:
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, Mac OS X. However not all the output formats are supported on all these platforms. See details.
The specification of this function in C# is:
 RECERR RecGetOutputFormatInfo(string formatName, out OUTPUTCONVERTERINFO pInfo); 
RECERR RECAPIPLS RecGetOutputLevel ( int  sid,
OUTPUTLEVEL *  poutLevel 
)

Ask the current level of format retention for the final output document.

Parameters:
[in] sid - Settings Collection ID.
[out] poutLevel - Pointer to output level variable.
Return values:
RECERR
Note:
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, Mac OS X. However not all the output formats are supported on all these platforms. See details.
This function gets the value of the setting APIPlus.Output.OutputLevel. This setting can be modified by RecSetOutputLevel.
The specification of this function in C# is:
 RECERR RecGetOutputLevel(int sid, out OUTPUTLEVEL outLevel); 
RECERR RECAPIPLS RecGetOutputSettingsHandle ( int  sid,
HSETTING *  hSetting 
)

Gets the settings handle for the currently set output format.

Parameters:
[in] sid - Settings Collection ID.
[out] hSetting - Pointer to the setting handle.
Return values:
RECERR
Note:
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, Mac OS X. However not all the output formats are supported on all these platforms. See details.
For example for the "RTF2000" converter, this function returns the handle for the "Converters.Text.RTF2000" setting.
The specification of this function in C# is:
 RECERR RecGetOutputSettingsHandle(int sid, out IntPtr hSetting); 
RECERR RECAPIPLS RecSetOutputFormat ( int  sid,
LPCTSTR  pFormatname 
)

Set the output format.

This sets the output document format used by the RecConvert2Doc and RecProcessPagesEx functions.

Parameters:
[in] sid - Settings Collection ID.
[in] pFormatname - Converter name.
Return values:
RECERR
Note:
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, Mac OS X. However not all the output formats are supported on all these platforms. See details.
This function sets the value of the setting APIPlus.Output.TextFormat. This setting can be retrieved by RecGetOutputFormat.
See the list of the selectable output formats for more information. In addition see the connection between output formats and output levels, and the information about vertical text support of each converter.
The converter name must be the root of the given converter in the setting tree (e.g. Converters.Text.DocX). For more information see the list of converter settings.
The settings of a given converter are created when the converter is selected for the first time (RecSetOutputFormat). Before that, these settings cannot be accessed.
The specification of this function in C# is:
 RECERR RecSetOutputFormat(int sid, string formatName); 
RECERR RECAPIPLS RecSetOutputLevel ( int  sid,
OUTPUTLEVEL  outLevel 
)

Set the level of format retention for the final output document.

This function simplifies specifying the formatting details of the output document.

Parameters:
[in] sid - Settings Collection ID.
[in] outLevel - The output level.
Return values:
RECERR
Note:
RecAPIPlus level of CSDK is supported on: Windows, Linux, Embedded Linux, Mac OS X. However not all the output formats are supported on all these platforms. See details.
This function sets the value of the setting APIPlus.Output.OutputLevel. This setting can be retrieved by RecGetOutputLevel. In addition, each converter has a setting OutputMode, whose value is used when the OutputLevel is OL_AUTO.
Output levels:
  • Plain Text (formerly No formatting mode) - All formatting information is ignored and replaced by a default value. (One column, left aligned paragraphs, no font attributes, a default font, etc.) Tables and graphics are retained and placed within the text. Highlights, strikeouts and marking for redaction are not transmitted, but redacted text is blacked-out.
  • Formatted Text (formerly Retain Font and Paragraphs mode) - The formatting information on fonts and paragraphs is retained, but the layout-related information is ignored. Highlights, strikeouts and redactions are preserved. (This level has a special purpose when saving to Excel: each detected table or spreadsheet in a document is saved to a separate worksheet. Other content is placed on the last worksheet and functions as an index. The tables are replaced by hyperlinks to their own sheet.)
  • True Page - This keeps the look of the original layout of the pages. This is done by absolute positioning of the texts, pictures and tables on the page with boxes, frames or other target application specific methods. This level is only available for target applications capable of handling these. True Page level is the only choice for the XML converter and for all PDF converters except for 'PDF Edited'.
  • Flowing Page - Preserves the original layout of the pages, including retaining columns. Boxes and frames are only used when necessary. This level is only available with target applications that can handle columns.
  • Spreadsheet - This level exports the results in tabular form, suitable for use in spreadsheet applications. Each page is placed in a separate worksheet. This level is only available for the Excel and the HTML 3.2 formats.
In the Microsoft Word and PowerPoint programs, the size of the page is limited: the width and height of the page must be between 0.1 and 22 inches. Because of this, if you scanned or loaded a page that is larger than this limit, you cannot save it in Flowing Page or True Page format with the *.rtf, *.docx or *.pptx file extensions, because these formats try to retain the original page size and layout. If you try to save such a page, you will get an error message and the file will not be saved.
See also the connection between output formats and output levels, and the information about vertical text support of each converter.
The specification of this function in C# is:
 RECERR RecSetOutputLevel(int sid, OUTPUTLEVEL outLevel);
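
The page-size limit described in the note above can be checked before choosing Flowing Page or True Page for an *.rtf, *.docx or *.pptx export. The following is a minimal sketch of such a check. The function page_fits_office_limits and its parameters are hypothetical and not part of the CSDK; it assumes the page size is known in pixels together with a single resolution in dots per inch, and that both bounds are inclusive.

```c
#include <assert.h>

/* Limits quoted in the note above, in inches. */
#define OFFICE_PAGE_MIN_INCHES 0.1
#define OFFICE_PAGE_MAX_INCHES 22.0

/* Hypothetical helper: returns 1 when both page dimensions, converted from
   pixels to inches, lie within the 0.1 to 22 inch range; returns 0 otherwise. */
int page_fits_office_limits(int width_px, int height_px, int dpi)
{
    double width_in, height_in;

    assert(dpi > 0);
    width_in = (double)width_px / dpi;
    height_in = (double)height_px / dpi;

    return width_in >= OFFICE_PAGE_MIN_INCHES && width_in <= OFFICE_PAGE_MAX_INCHES &&
           height_in >= OFFICE_PAGE_MIN_INCHES && height_in <= OFFICE_PAGE_MAX_INCHES;
}
```

When the check fails, one option is to fall back to a level that does not try to retain the page size, such as OL_RFP, before calling RecSetOutputLevel.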