
<Weeks 7,8> OCR batch processing based on internal multi-threading

Monday, 8 August 2022 | Quoc Hung Tran

Seventh week

In the seventh week, I worked on implementing OcrTesseractEngine, an internal object that takes a scanned document image as input, reads the optional Tesseract settings from the dialog, and produces the OCR output text and the corresponding text file.

figure1

The detailed architecture of this object is shown below:

figure1

It implements a function runOcrProcess() that invokes the Tesseract CLI, the command-line interface of the Tesseract OCR engine. The function returns an enum value describing the final state of the process and stores the results in member variables:

Value             Description
PROCESS_COMPLETE  All stages done.
PROCESS_FAILED    A failure happened during processing.
PROCESS_CANCELED  The user canceled processing.
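To make the role of runOcrProcess() more concrete, here is a minimal sketch in standard C++ (no Qt) of the two pieces it needs: building a Tesseract command line and mapping the outcome to one of the states above. The names OcrOptions, buildTesseractArgs() and toState() are hypothetical and are not taken from the plugin. The sketch only assumes the usual Tesseract invocation `tesseract <image> <outputbase>` with the `-l`, `--psm` and `--oem` options.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Process states as listed in the table above.
enum class OcrState { PROCESS_COMPLETE, PROCESS_FAILED, PROCESS_CANCELED };

// Hypothetical bundle of the options chosen in the dialog.
struct OcrOptions
{
    std::string language = "eng"; // passed to -l
    int         psm      = 3;     // page segmentation mode, passed to --psm
    int         oem      = 3;     // OCR engine mode, passed to --oem
};

// Build the argument list:
//   tesseract <image> <outputbase> -l <lang> --psm <n> --oem <n>
// Tesseract writes the recognized text to <outputbase>.txt.
std::vector<std::string> buildTesseractArgs(const std::string& image,
                                            const std::string& outputBase,
                                            const OcrOptions&  opt)
{
    return { "tesseract", image, outputBase,
             "-l",    opt.language,
             "--psm", std::to_string(opt.psm),
             "--oem", std::to_string(opt.oem) };
}

// Map the outcome of the external process to a state.
// A cancel request takes precedence over the exit code.
OcrState toState(bool canceled, int exitCode)
{
    if (canceled)
    {
        return OcrState::PROCESS_CANCELED;
    }

    return (exitCode == 0) ? OcrState::PROCESS_COMPLETE
                           : OcrState::PROCESS_FAILED;
}
```

In the real engine the argument list would be handed to a process launcher and the recognized text read back into member variables. That part is left out here because it depends on the toolkit.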

Eighth week

This week, I started implementing the backend: internal multi-threading to manage the batch processing, and a data object that carries results from the backend to the frontend.

Text Converter Thread Management

This plugin uses internal multi-threading to run OCR on images. The classes in this part build on existing digiKam objects for managing and chaining threads. The design is based on QThreadPool and QRunnable: QThreadPool maintains a collection of reusable QThreads, so existing threads can be reused for new tasks.
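The idea of reusing a fixed set of threads for many tasks can be illustrated without Qt. The following is a minimal sketch of a thread pool in standard C++. SimplePool and countCompletedJobs() are hypothetical names, and this is not how QThreadPool is implemented. It only shows the pattern of workers pulling jobs from a shared queue.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// A fixed set of worker threads that are reused for every queued job.
class SimplePool
{
public:
    explicit SimplePool(unsigned workers)
    {
        for (unsigned i = 0; i < workers; ++i)
        {
            m_threads.emplace_back([this] { workerLoop(); });
        }
    }

    // Lets the workers drain the queue, then joins them.
    ~SimplePool()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stopping = true;
        }

        m_wakeup.notify_all();

        for (auto& t : m_threads)
        {
            t.join();
        }
    }

    // Queue a job; an idle worker picks it up.
    void start(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_jobs.push(std::move(job));
        }

        m_wakeup.notify_one();
    }

private:
    void workerLoop()
    {
        for (;;)
        {
            std::function<void()> job;

            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wakeup.wait(lock, [this] { return m_stopping || !m_jobs.empty(); });

                if (m_jobs.empty())
                {
                    return; // stopping and nothing left to do
                }

                job = std::move(m_jobs.front());
                m_jobs.pop();
            }

            job(); // run outside the lock so other workers can proceed
        }
    }

    std::mutex                        m_mutex;
    std::condition_variable           m_wakeup;
    std::queue<std::function<void()>> m_jobs;
    std::vector<std::thread>          m_threads;
    bool                              m_stopping = false;
};

// Usage example: run `jobs` trivial jobs on `workers` threads and
// return how many of them completed.
int countCompletedJobs(unsigned workers, int jobs)
{
    std::atomic<int> done{0};

    {
        SimplePool pool(workers);

        for (int i = 0; i < jobs; ++i)
        {
            pool.start([&done] { ++done; });
        }
    } // the destructor waits for all queued jobs

    return done.load();
}
```

With two workers and eight jobs, each thread handles several jobs in turn instead of a new thread being created per image, which is the benefit the pool-based design aims for.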

TextConverterActionThread manages how TextConverterTask objects operate. Concretely, it is responsible for instantiating them.

Each TextConverterTask initializes one OcrTesseractEngine object to process one image URL.

The purpose is to let the OCR processes for individual images run in parallel and stop cleanly. The run() method of TextConverterTask is a virtual method that is reimplemented to allow finer control over the thread. Here is the architecture of this part:
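The pattern described here, one task per image with a reimplemented run() and a way to stop properly, can be sketched in standard C++ as follows. Task, FakeOcrTask, TaskState and runInParallel() are hypothetical stand-ins. The real TextConverterTask builds on digiKam's own thread classes and drives an actual OCR engine, which this sketch replaces with a loop over pretend stages.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

enum class TaskState { PROCESS_COMPLETE, PROCESS_FAILED, PROCESS_CANCELED };

// Base class: subclasses reimplement run(); cancel() may be called from
// another thread, hence the atomic flag.
class Task
{
public:
    virtual ~Task() = default;
    virtual void run() = 0;

    void      cancel()      { m_cancel = true; }
    TaskState state() const { return m_state;  }

protected:
    bool isCanceled() const        { return m_cancel.load(); }
    void setState(TaskState state) { m_state = state;        }

private:
    std::atomic<bool>      m_cancel{false};
    std::atomic<TaskState> m_state{TaskState::PROCESS_FAILED};
};

// Stand-in for a task that owns one engine and handles one image URL.
class FakeOcrTask : public Task
{
public:
    explicit FakeOcrTask(std::string url) : m_url(std::move(url)) {}

    void run() override
    {
        // Pretend the work has three stages; check for cancellation
        // before each one so the task can stop cleanly.
        for (int stage = 0; stage < 3; ++stage)
        {
            if (isCanceled())
            {
                setState(TaskState::PROCESS_CANCELED);
                return;
            }
        }

        setState(m_url.empty() ? TaskState::PROCESS_FAILED
                               : TaskState::PROCESS_COMPLETE);
    }

private:
    std::string m_url;
};

// Run every task on its own thread and wait for all of them.
void runInParallel(const std::vector<std::unique_ptr<Task>>& tasks)
{
    std::vector<std::thread> threads;

    for (const auto& t : tasks)
    {
        Task* const raw = t.get();
        threads.emplace_back([raw] { raw->run(); });
    }

    for (auto& th : threads)
    {
        th.join();
    }
}
```

Checking the cancel flag between stages is a cooperative design: a task is never killed mid-operation, so it can always record a well-defined final state.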

figure1

Here is a sequence diagram of the communication between the GUI and the backend interface.

figure1

When the user clicks the “Start OCR” button, the TextConverterActionThread object instantiates TextConverterTask objects and passes them the Tesseract options from the dialog. In response to the “clicked” signal from the dialog button, each TextConverterTask creates an OCR engine to process its image URL. When the process finishes, all the necessary outputs are set on the picture list widget and the text editor.

The most important part is how to deliver the output and display it in the dialog. To solve this, I implemented a class, TextConverterActionData, containing the status of a process, the destination path of the output file, and the text recognized from the image. Output data is transferred to the dialog through two signals:

signalStarting(TextConverterActionData), signalFinished(TextConverterActionData)
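The delivery mechanism can be approximated in standard C++ with a plain data struct and two callbacks standing in for the Qt signals. In the sketch below, ActionData, TaskNotifier, the field names, the PENDING status and the generated output path are all hypothetical. Only the three kinds of information carried (status, destination path, recognized text) and the two signal names come from the description above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the data object sent to the dialog.
struct ActionData
{
    // PENDING is an assumption for "not finished yet".
    enum class Status { PENDING, PROCESS_COMPLETE, PROCESS_FAILED, PROCESS_CANCELED };

    Status      status = Status::PENDING;
    std::string imageUrl;   // which image this report is about (assumed field)
    std::string destPath;   // destination path of the output file
    std::string outputText; // text recognized from the image
};

// Callbacks play the role of signalStarting / signalFinished.
class TaskNotifier
{
public:
    std::function<void(const ActionData&)> signalStarting;
    std::function<void(const ActionData&)> signalFinished;

    void process(const std::string& imageUrl)
    {
        ActionData data;
        data.imageUrl = imageUrl;

        if (signalStarting)
        {
            signalStarting(data); // the dialog can mark the item as running
        }

        // Placeholder for the real OCR work.
        data.status     = ActionData::Status::PROCESS_COMPLETE;
        data.destPath   = imageUrl + ".txt";
        data.outputText = "recognized text placeholder";

        if (signalFinished)
        {
            signalFinished(data); // the dialog can show the text and path
        }
    }
};
```

A caller assigns a callback to signalStarting and to signalFinished, then calls process() once per image. Unlike Qt signals connected across threads, these callbacks run directly on the calling thread, so a real GUI would still need to hand the data over to its own thread before touching widgets.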

figure1

In the next few weeks, I will:

  • Implement the functionality for storing OCR results.
  • Polish and re-implement code if necessary.

Main commits