RecAPI
Output formats of RecAPIPlus

The following list contains all the selectable output formats of the converters.

The converters can be adjusted through converter settings. The settings of a given converter are created when that converter is selected for the first time (RecSetOutputFormat), so they cannot be accessed before then.

The output converters can generate the output file in different output levels (OUTPUTLEVEL, RecSetOutputLevel). See the connection between converters and output levels.

See also:
Vertical text support of the converters.

The output converters are part of RecAPIPlus, so they are supported on: Windows, Linux, Embedded Linux, Mac OS X. However, not every format is available on all of these platforms; the exceptions are noted below. On platforms where RecAPIPlus is not supported, or when only the KernelAPI level is used, use the Direct TXT Output Converter Module instead.

MP3 Audio and MP3 Audio Premium Quality - These converters create MP3 files from the recognition result. The MP3 Audio converter generates the MP3 files with Real Speak Solo, which is installed with OmniPage CSDK. The MP3 Audio Premium Quality converter uses Nuance Vocalizer, which can be installed by the user (see Technical Notes for details). Audio converters are supported on: Windows 32-bit. See the settings of MP3 converters.

HTML 3.2 - HTML 3.2 is a clean, small but usable HTML format that is supported by virtually all HTML interpreters (unlike HTML 4.0). See the settings of HTML 3.2 converter.

HTML 4.0 - The HTML 4.0 format is not as clean as HTML 3.2, but it can use Cascading Style Sheets (CSS) for box-like, absolutely positioned objects, for styles, and for controlling all paragraph and character attributes. See the settings of HTML 4.0 converter.

Microsoft Excel 2003, XP - This converter generates files compatible with the older Microsoft Excel file formats (XLS). This converter is supported on: Windows. See the settings of Excel converters.

RTF Word 2000 - RTF format with features available from Microsoft Word 2000. See the settings of RTF converters.

Microsoft PowerPoint 97 - An RTF-based converter that generates a plain and simple RTF file, which can be interpreted by Microsoft PowerPoint. This converter is supported on: Windows. See the settings of PowerPoint converters.

Microsoft Publisher 98 - An RTF-based converter that generates a plain and simple RTF file, which can be interpreted by Microsoft Publisher. This converter is supported on: Windows. See the settings of Publisher converter.

WordPad - An RTF-based converter that generates a plain and simple RTF file, which can be interpreted by Microsoft WordPad (and other simple RTF readers). This converter is supported on: Windows. See the settings of WordPad converter.

WordPerfect 9, 10 - WordPerfect binary file format for WordPerfect 9 and up. This converter is supported on: Windows. See the settings of WordPerfect 9 and up converters.

Microsoft Word WordML - A converter for the XML-based file format of Microsoft Word 2003. Its features, capabilities and layout retention quality are practically the same as in the RTF Word 2000 converter. This converter is supported on: Windows. See the settings of WordML converter.

Text - This converter writes the recognized text into a simple text file that can be read by most text editors and word processors. See the settings of Text converters.

Comma Separated Text - This converter writes the recognized text into a tabular text file (comma-delimited text file) that can be read by Excel. The “List Separator” separates the cells, and NL (the new line character) separates the rows of the table. See the settings of Text converters.
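
The cell and row separation described above can be illustrated with a minimal Python sketch. It is not the converter's implementation: the function name write_separated_text and its parameters are hypothetical, and the quoting of cells that contain the separator follows Python's csv module, which may differ from what the converter does.

```python
import csv
import io


def write_separated_text(rows, list_separator=",", newline="\n"):
    """Serialize a table: cells are joined by the list separator,
    and every row is terminated by the new line character."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=list_separator, lineterminator=newline)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    table = [["Item", "Qty"], ["Paper", "10"]]
    # Default separator (comma), then a semicolon as the list separator.
    print(write_separated_text(table), end="")
    print(write_separated_text(table, list_separator=";"), end="")
```

Passing the separator as a parameter mirrors the idea that the “List Separator” is a setting rather than a fixed comma.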

Formatted Text - This converter writes the recognized text into a text file, but tries to retain the layout of the page by inserting extra spaces. See the settings of Text converters.
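
The idea of retaining layout with extra spaces can be sketched as follows. This is only an illustration under stated assumptions: each word is assumed to arrive with a target character column, and the helper layout_line is hypothetical. How the converter actually maps page positions to columns is not described here.

```python
def layout_line(words):
    """Place (column, text) pairs on one text line, padding with spaces.

    Assumption: 'column' is the desired zero-based character column.
    """
    line = ""
    for column, text in sorted(words):
        if line and len(line) >= column:
            # The previous word already reaches the target column:
            # fall back to a single separating space.
            line += " "
        else:
            # Pad with spaces up to the target column.
            line = line.ljust(column)
        line += text
    return line


if __name__ == "__main__":
    print(layout_line([(0, "Name"), (12, "Total")]))
    print(layout_line([(0, "Paper"), (12, "10")]))
```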

Text with linebreaks - The same as the Text converter, but this converter inserts a line break at the end of every line instead of only at the end of each paragraph. See the settings of Text converters.
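
The difference between the Text and Text with linebreaks outputs can be shown with a small sketch. It assumes that a page is modelled as a list of paragraphs, each a list of recognized lines, and that lines within a paragraph are joined by a single space in the plain Text case. The function to_text and this data model are hypothetical, not part of the CSDK.

```python
def to_text(paragraphs, keep_line_breaks=False):
    """Render paragraphs (lists of lines) as plain text.

    keep_line_breaks=False: break only at the end of each paragraph.
    keep_line_breaks=True:  break at the end of every line as well.
    """
    joiner = "\n" if keep_line_breaks else " "
    return "\n".join(joiner.join(lines) for lines in paragraphs) + "\n"


if __name__ == "__main__":
    page = [["first line", "second line"], ["next paragraph"]]
    print(to_text(page), end="")
    print(to_text(page, keep_line_breaks=True), end="")
```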

Unicode Text - Same as Text, but using two-byte Unicode characters. See the settings of Text converters.

Unicode Comma Separated Text - Same as Comma Separated Text, but using two-byte Unicode characters. See the settings of Text converters.

Unicode Formatted Text - Same as Formatted Text, but using two-byte Unicode characters. See the settings of Text converters.

Unicode Text with linebreaks - Same as Text with linebreaks, but using two-byte Unicode characters. See the settings of Text converters.
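
What "two-byte Unicode characters" means for the four Unicode variants above can be illustrated with the sketch below. It assumes little-endian UTF-16 with an optional byte order mark; the text above only states that the characters are two bytes wide, so the byte order and the mark are assumptions, and encode_two_byte is a hypothetical helper.

```python
import codecs


def encode_two_byte(text, with_bom=True):
    """Encode text as UTF-16 little-endian (two bytes per basic character).

    Assumption: little-endian order with an optional byte order mark.
    """
    data = text.encode("utf-16-le")
    return codecs.BOM_UTF16_LE + data if with_bom else data


if __name__ == "__main__":
    sample = "Caf\u00e9"
    # Compare the size of a one-byte encoding with the two-byte encoding.
    print(len(sample.encode("latin-1")))
    print(len(encode_two_byte(sample, with_bom=False)))
```

Characters outside the Basic Multilingual Plane occupy four bytes in UTF-16, so "two-byte" holds only for the common character range.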

Kindle Document - Kindle e-book converter. This converter is supported on: Windows. See the settings of Kindle converter.

ePub, ePub Simple and ePub Poem - ePub e-book converters. These converters are supported on: Windows. See the settings of ePub converters.

XML - An XML file format conforming to the Nuance XML schema (http://www.nuance.com/omnipage/xml/ssdoc-schema3.xsd or local <CSDK>bin/ssdoc-schema3.xsd). It contains almost all layout-related information as well as paragraph and character attributes. The page XML output format contains a general description of this format. This converter is supported on: Windows. See the settings of XML converter.

XML Paper Specification and XPSsearchable - Microsoft has released a technology called “XML Paper Specification (XPS)”. Its specification is available for download at http://www.microsoft.com/whdc/xps/xpsspec.mspx. This file type has nearly the same functionality as Adobe’s PDF: it can “package” electronic documents into a file so that they look the same on every output device, such as monitors, printers or even handheld devices. CSDK can open and process XPS files in their native format; no special software is needed for this. The level of support is very close to that for PDF files. The Toolkit can convert documents to the XPS file type, with a subset of the switches available for PDF, in three ‘flavors’:

  • Image Only
  • Image on Text (Searchable)
  • Normal

This converter is supported on: Windows. See the settings of XPS converters.

PDF with image on text - A PDF converter where the original (input) image is retained in the foreground, with the recognized text hidden in the background (but in the correct position). Well suited to archiving and indexing documents. This converter is supported on: Windows. See the settings of PDF converters.

PDF - A highly configurable, general PDF output converter. It supports many PDF features, but relies heavily on the position of the recognized characters. This converter is supported on: Windows. See the settings of PDF converters.

PDF with image substitutes - A special PDF converter, where the suspect words are covered by their images cut out from the original image. This converter is supported on: Windows. See the settings of PDF converters.

PDF - Edited - This PDF converter does not rely on the position of the recognized characters, so it can be used even after inserting large new text portions in the editor. This converter is supported on: Windows. See the settings of PDF converters.

Office 2007 support

The Toolkit can generate output for the new Office 2007 file types DOCX (supported on: Windows, Linux, Embedded Linux, Mac OS X), XLSX (supported on: Windows, Linux, Embedded Linux, Mac OS X) and PPTX (supported on: Windows).

A DOCX file is a set of separate XML, picture and font files, all compressed into one ZIP-like package file. The actual document content is held in a set of XML files, while other XML files define the connections between the content files and the other files. As a result, a DOCX file is typically much smaller than the corresponding DOC file.
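
The package structure described above can be sketched with Python's standard zipfile module. This builds a deliberately tiny package by hand and is not what the CSDK converter emits: it shows one content part (word/document.xml) plus the two XML files that describe content types and relationships. The helper names build_package and list_parts are hypothetical, and whether a particular word processor opens such a minimal file has not been verified here.

```python
import io
import zipfile

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Connects the package root to the main document part.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)

# The actual document content: a single paragraph.
DOCUMENT = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document '
    'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>'
    '</w:document>'
)


def build_package():
    """Return the bytes of a minimal ZIP package with three XML parts."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", DOCUMENT)
    return buffer.getvalue()


def list_parts(data):
    """List the part names stored in a package given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()


if __name__ == "__main__":
    for name in list_parts(build_package()):
        print(name)
```

The same list_parts helper can be pointed at the bytes of any DOCX, XLSX or PPTX file to inspect which parts it contains.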

The DOCX file type specification can be downloaded from: http://www.ecma-international.org/news/TC45_current_work/TC45_available_docs.htm

The DOCX/XLSX/PPTX file types conform to a new Microsoft standard called “Open Packaging Conventions (OPC)”, whose specification is available for download at http://go.microsoft.com/fwlink/?linkID=71255

In the CSDK, the DOCX file type is similar to the existing RTF and has similar settings.

Our experience shows that Pages, the word processor application of Mac OS X, cannot handle the more complex DOCX documents. We therefore introduced a new, simpler converter for less heavily formatted output. The Pages converter generates DOCX files, but it supports only the plain text (OL_NOFORMAT) and formatted text (OL_RFP) output levels. See the detailed description of the different output levels at RecSetOutputLevel. This converter is supported everywhere DOCX is supported.

The Toolkit does not accept DOCX document input. The Nuance product PDF Create 5 can generate PDF files from DOCX documents.

See also:
Settings of