RecAPI
Building a robust 24/7 conversion module

In some cases an application requires functionality that

  • runs in the background;
  • runs for a long or unlimited time;
  • usually does not have an interactive UI;
  • does not have anyone to intervene in case of a problem.

In such a case, one possible solution is a 24/7 module integrated into the main application. Some recommendations for building such a module are given below. In general, the following architecture can fulfill these requirements:

[Figure: 24x7struct.png]
  • It is crucial to put the OCR functionality (the worker module) into a separate executable, which can be run in a separate process.
  • This worker module can be controlled through a small number of high-level commands, e.g. getting the page count, recognizing a page, recognizing a document.
  • A thin 24/7 master layer can control the worker module through interprocess communication (e.g. pipes, shared memory and semaphores, TCP/IP, .NET remoting) and can restart it when necessary.
  • It can be useful to add watchdog (monitoring) functionality to the master layer, because the timeout handling of the CSDK alone might not be enough.
  • Even detected errors may leave the process in an unstable state, and even a warning may be a sign of an unstable internal state. A restart can therefore be reasonable:
    • at a timeout,
    • after errors,
    • after warnings,
    • regularly (after a few thousand pages).
  • It can be useful to implement a Quit command on the worker side in order to terminate the worker process in a clean way. On the master side the function TerminateProcess() should be called only as a last resort.
  • After a restart caused by an error or a timeout, it can be useful to retry the recognition of the last page or document that caused the problem.
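
The recommendations above can be illustrated with a minimal sketch. It is written in Python only for brevity and does not use any CSDK call: the worker is a hypothetical stand-in script that answers text commands over a pipe, and the names Master, WORKER_SRC, RECOGNIZE and QUIT are assumptions made for this example. The sketch shows a master that applies a watchdog timeout to each command, restarts the worker after a failure or after a fixed number of pages, retries the failed page, and kills the process only when a clean quit is not possible.

```python
import queue
import subprocess
import sys
import threading

# Hypothetical stand-in for the worker executable. It reads one command per
# line from stdin and answers with one line on stdout. A real worker would
# call the OCR engine where the replies are printed.
WORKER_SRC = r"""
import sys, time
for line in sys.stdin:
    cmd, _, arg = line.strip().partition(" ")
    if cmd == "QUIT":
        break                      # clean shutdown requested by the master
    if cmd == "RECOGNIZE" and arg == "hang":
        time.sleep(60)             # simulate a recognition that never returns
    elif cmd == "RECOGNIZE" and arg == "bad":
        print("ERR " + arg, flush=True)
    elif cmd == "RECOGNIZE":
        print("OK " + arg, flush=True)
    else:
        print("ERR unknown command", flush=True)
"""


class Master:
    """Thin master layer: owns one worker process and restarts it as needed."""

    def __init__(self, timeout=5.0, max_pages=1000):
        self.timeout = timeout        # watchdog limit per command, in seconds
        self.max_pages = max_pages    # regular restart after this many pages
        self.restarts = 0
        self._start()

    def _start(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SRC],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
        self.pages = 0
        self.replies = queue.Queue()
        # A reader thread lets the master wait for a reply with a timeout.
        threading.Thread(target=self._pump, args=(self.proc, self.replies),
                         daemon=True).start()

    @staticmethod
    def _pump(proc, replies):
        for line in proc.stdout:
            replies.put(line.strip())

    def _send(self, text):
        self.proc.stdin.write(text + "\n")
        self.proc.stdin.flush()

    def stop(self, graceful=True):
        """Ask the worker to quit; kill it only as a last resort."""
        if graceful:
            try:
                self._send("QUIT")
                self.proc.wait(timeout=self.timeout)
                return
            except (OSError, subprocess.TimeoutExpired):
                pass                  # fall through to the last resort
        self.proc.kill()
        self.proc.wait()

    def _request(self, page):
        try:
            self._send(f"RECOGNIZE {page}")
            return self.replies.get(timeout=self.timeout)
        except (OSError, queue.Empty):
            return None               # dead pipe or watchdog timeout

    def recognize(self, page, retries=1):
        """Return the worker's OK reply, or None if every attempt failed."""
        for _ in range(retries + 1):
            reply = self._request(page)
            ok = reply is not None and reply.startswith("OK")
            if ok:
                self.pages += 1
            # Restart after any failure, and regularly after max_pages pages.
            if not ok or self.pages >= self.max_pages:
                self.stop(graceful=reply is not None)
                self.restarts += 1
                self._start()
            if ok:
                return reply
        return None
```

In this sketch any reply other than OK is treated as a reason to restart, which also covers warnings if the worker reports them as a separate reply. After a watchdog timeout the worker is assumed to be stuck, so the master skips the Quit command and kills the process directly. Each restart creates a fresh reply queue, so a late answer from an old worker cannot be mistaken for the reply to a new command.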

More details can be found in 24x7Guide.pdf.