
Thursday, 28 July 2022

In the last edition, I talked about adding an appropriate dataset to the Comparator activity. This post mainly covers its implementation and the new changes we now have in the comparator.

Restructuring the whole activity

It was initially planned to add the dataset to the comparator in the following manner:

4-5 years: numbers from 1 to 9
5-6 years: numbers from 1 to 19
6-7 years: numbers from 1 to 100
7-8 years: numbers from 1 to 1 000
8-9 years: numbers from 1 to 1 000 000
9-10 years: numbers from 1 to 1 000 000 000
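
For illustration, here is a minimal sketch of that plan expressed as plain data (the real GCompris datasets are defined in QML/JS resource files; the dictionary, names and sampling helper below are purely illustrative and not the activity's actual code):

```python
import random

# Illustrative only: each age band mapped to the number range planned above
# (bounds are inclusive).
COMPARATOR_DATASET_PLAN = {
    "4-5 years": (1, 9),
    "5-6 years": (1, 19),
    "6-7 years": (1, 100),
    "7-8 years": (1, 1_000),
    "8-9 years": (1, 1_000_000),
    "9-10 years": (1, 1_000_000_000),
}

def sample_pair(age_band):
    """Pick two numbers to compare from the range of the given age band."""
    low, high = COMPARATOR_DATASET_PLAN[age_band]
    return random.randint(low, high), random.randint(low, high)

# sample_pair("9-10 years") might return something like (734219058, 6513)
```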

While adding the dataset for children aged 9–10 years, we realised that the right and left texts started to overlap on small screen sizes. This led me to restructure the whole activity: the new skeleton no longer contains row elements; instead, the left and right texts are anchored to the centre so they stay intact even on small screens. This seemed to be the best possible solution for this predicament.

The final solution to text overlapping

The above-mentioned restructuring solved 90% of the issue, but the numbers in the last level have more than 9 digits and were still overlapping. Another approach, fitting the text into a rectangle element, was considered, but it did not seem to deliver the best results.

Finally, this issue was sorted by setting the custom font size to tiny, which automatically adjusts to the screen size.

Fixing Issues

Other issues investigated included making the layout responsive for different screen sizes, handling errors in the log, fixing the logic to avoid multiple wins, adding consistency in variable declarations and their datatypes, adding canvas to the entire exercise view zone, and some issues that arose as a result of the activity’s restructuring, such as selection on tapping the rows and default selection of the first row when the activity opens.

Apart from this, the repeated code snippets have been moved into a separate component file, and keyboard bindings for the activity have also been implemented this time.

Revamping Comparator

As suggested by mentors, in order to make the activity more organised, we have proposed a new mock-up for the comparator activity, which can be found here.

What’s next?

After the GSoC mid-evaluation, I have updated the project timeline.

We are planning to implement a feature to highlight the wrong answers and revamp the comparator as proposed. So, stay tuned for it!

I am going to be attending Akademy 2022 in person!

This is my first time going to Akademy in-person, so it is quite exciting! I will be doing a talk with Bhushan on the state of Plasma Mobile.

A bit of background on my experience at KDE:

I first started contributing to KDE in 2020, creating the KClock project as a way to learn Qt Quick, and to pick up one of the pending tasks of Plasma Mobile.

Through 2020, I worked on many Plasma Mobile applications such as KWeather, and also started poking around contributing to the shell. I also did some work for the desktop, including some work on adding fingerprint support to the users kcm.


In 2021 and 2022, I got much more heavily involved in KDE frameworks (namely Kirigami) and Plasma Mobile shell contributions.

One of the most exciting developments in the project over the past few years has been the expansion in the number of devices that can now run Plasma Mobile. I will be sure to bring some devices I have for demo purposes :3


Overall, I am very excited to be able to share a lot of the work that has been done when I attend this year’s Akademy! Be sure to check out my talk!

someone here is me…

Monday, 25 July 2022

 

https://phabricator.kde.org/source/latte-dock/

 
Hello everyone,
 
Unfortunately, I would like to inform the KDE community that I am stepping away from Latte development. Lack of time, motivation and interest on my part is the main reason. I hope that this will give free space and air for new developers/maintainers to step in and move Latte forward.
 
I hoped that I would be able to release Latte v0.11, but unfortunately I can not. Releasing Latte v0.11 would mean that someone would maintain it afterwards, and that is no longer the case.
 
For the last 6 years, developing Latte has been a beautiful journey and has taught me plenty of new things. I would like to thank you all for that beautiful journey: KDE community members, users, developers, enthusiasts and Plasma developers.
 
Have fun and enjoy life...
 
 

In the following weeks (June 27, 2022 to July 10, 2022), for each test case, I will try to:

  • Research algorithms for pre-processing images.
  • Gather many test cases for improving pre-processing methods.

Based on the results of previous research, I realize that Tesseract does various processing internally before doing the actual OCR. However, some instances still exist where Tesseract can suffer a significant reduction in accuracy.

So in this post, I would like to introduce some pre-processing methods that are applied before the image is passed to Tesseract.

Removing Shadow

  • The main idea is to extract and segment the text from each channel by using Background subtraction (BS) from the grayscale image, which is a common and widely used technique for generating a foreground mask.

  • BS calculates the foreground mask by performing a subtraction between the current frame and a background model containing the static part of the scene or, more generally, everything that can be considered background given the characteristics of the observed scene.

  • The purpose is to remove the shadow or noise from the background of each test case (in each test case here, the background of the scanned image can be influenced by many external factors), converting the background into white or black, and then merge the channels back into one image.

To keep it simple, here is an example where we can see that the Tesseract output is affected by the lighting.

figure1

Looking at the three channels of the picture, the shadow is most widely distributed in the blue channel.

figure1

Implementation

First step

Extract the background by blurring away the text. Morphological operations are helpful here.

  • Grayscale dilation of an image involves assigning to each pixel the maximum value found over the neighborhood of the structuring element, using a square kernel with odd dimensions (a 3 x 3 kernel).

  • The dilated value of a pixel x is the maximum value of the image in the neighborhood defined.

Dilation is considered a type of morphological operation. Morphological operations are the set of operations that process images according to their shapes. As the kernel B is scanned over the image, we compute the maximal pixel value overlapped by B and replace the image pixel at the anchor point position with that maximal value. As you can deduce, with a white background and black text, this maximizing operation causes dark regions within the image to shrink. The bigger the kernel, the blurrier the text becomes; what remains of the text is turned into noise that we then need to remove.

figure1

Second step

Median blur helps smooth the image and remove the noise produced by the previous operation, giving a more general estimate of each background.

figure1

Third step

cv2.absdiff is a function that finds the absolute difference between the pixels of two image arrays. Using it, we can extract just the pixels belonging to the text. Put another way, the background becomes white and the shadow is completely removed.

figure1

Finally, merging all channels back into one image, we get the following result. We can see that the noise in the background is removed, and of course we obtain a better output.

figure1
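
As a rough illustration of the three steps above, here is a minimal OpenCV sketch of the per-channel shadow-removal pipeline (the median-blur kernel size and the final contrast normalisation are my own assumptions, not necessarily what the plugin will use):

```python
import cv2
import numpy as np

def remove_shadow(image_bgr):
    """Per-channel shadow removal: estimate the background with dilation and
    median blur, then take the absolute difference with the original channel."""
    cleaned_channels = []
    for channel in cv2.split(image_bgr):
        # Step 1: dilation with a small square kernel blurs the dark text away,
        # leaving an estimate of the (shadowed) background.
        background = cv2.dilate(channel, np.ones((3, 3), np.uint8))
        # Step 2: median blur smooths the background estimate and removes the
        # noise left over from dilation (kernel size is an assumption).
        background = cv2.medianBlur(background, 21)
        # Step 3: absolute difference keeps only the text; inverting it makes
        # the background white and removes the shadow.
        diff = 255 - cv2.absdiff(channel, background)
        # Optional: stretch the contrast of the cleaned channel (assumption).
        diff = cv2.normalize(diff, None, alpha=0, beta=255,
                             norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)
        cleaned_channels.append(diff)
    # Merge the cleaned channels back into a single image.
    return cv2.merge(cleaned_channels)

# cleaned = remove_shadow(cv2.imread("scan.jpg"))
```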

Here are more example test cases that I have tried; in almost all of them we get better results:

figure1

figure1

Perspective correction

The issue is more specific to some image acquisition devices (digital cameras or mobile devices). As a result, the acquired area is not a rectangle but a trapezoid or a parallelogram. Once the perspective transformation is applied, the corrected text size looks much more uniform and gives a better OCR result.

These resources were really helpful in preparing and explaining this post:

Post 1

Post 2

Implementation

Building the optional document scanner with OpenCV can be broken down into the following steps:

  • Step 1: Detect edges.
  • Step 2: Use the edges in the images to find the contour (outline) representing the piece of paper being scanned.
  • Step 3: Apply a perspective transform to obtain the top-down view of the document.

Take a look at an example.

First, to speed up image processing and make our edge detection more accurate, we take the ratio of the image height to 500 pixels and resize the image accordingly.

Convert the image from colored to grayscale, use Gaussian blurring to remove high-frequency noise, and then apply a bilateral filter, which is highly effective at removing noise while keeping edges sharp. Then I used one median blur operation, similar to the other averaging methods: here, the central element is replaced by the median of all the pixels in the kernel area. This operation preserves the edges while removing the noise. All these methods are helpful for edge detection and improve its accuracy. See the following steps of this process:

figure1
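
A minimal sketch of this preparation stage might look like the following (the target height of 500 pixels comes from the text above; the kernel sizes, bilateral filter parameters and Canny thresholds are assumptions that would need tuning):

```python
import cv2

def prepare_edges(image_bgr, target_height=500):
    """Resize the photo, denoise it and compute an edge map for contour search."""
    # Keep the aspect ratio: remember the ratio so coordinates can be scaled
    # back to the original image later, then resize to ~500 px high.
    ratio = image_bgr.shape[0] / float(target_height)
    new_width = int(image_bgr.shape[1] / ratio)
    small = cv2.resize(image_bgr, (new_width, target_height))

    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)      # remove high-frequency noise
    gray = cv2.bilateralFilter(gray, 9, 75, 75)   # denoise while keeping edges sharp
    gray = cv2.medianBlur(gray, 5)                # median smoothing

    edges = cv2.Canny(gray, 75, 200)              # edge detection (thresholds assumed)
    return small, edges, ratio
```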

Second, the piece of paper we scan usually takes the shape of a rectangle, so we know we are looking for a shape with four points and four edges. Therefore, we assume that the most prominent contour in the new image with exactly four points is our piece of paper; in other words, the most prominent edges have the highest probability of belonging to the document we are scanning.

The algorithm to find the largest contours:

  • Use the cv2.findContours() function, which finds contours in a binary image. It returns a list of all the contours in the image, each in the form of the (x, y) coordinates of the boundary points of the object. We then sort the contours by area and keep only the largest ones. This lets us examine only the largest contours, discarding the rest.

  • Loop over the contours and use the cv2.approxPolyDP function to smooth and approximate each one as a quadrilateral. cv2.approxPolyDP works well for shapes with sharp edges, like a document boundary.

  • Once we have four points, we save all candidate rectangles; we determine the width, height and top-left point of each, and the largest rectangle should be our document.

figure1

  • Apply the four-point transformation algorithm; you can see more here. This gives us a top-down, "bird's eye view" of the document, as if you were flying above it and looking straight down: it computes the perspective transform matrix using the getPerspectiveTransform function and applies it to obtain our top-down view by calling the function cv2.warpPerspective. Here is the good result:

figure1
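
Putting the contour search and the four-point transform together, a simplified version of these steps could look like this (the point-ordering helper and the 2% approximation tolerance are common choices borrowed from the linked posts, not necessarily the exact values used here; the OpenCV 4 return signature of findContours is assumed):

```python
import cv2
import numpy as np

def order_points(pts):
    """Order 4 points as top-left, top-right, bottom-right, bottom-left."""
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    d = np.diff(pts, axis=1)
    rect[0] = pts[np.argmin(s)]   # top-left: smallest x + y
    rect[2] = pts[np.argmax(s)]   # bottom-right: largest x + y
    rect[1] = pts[np.argmin(d)]   # top-right: smallest y - x
    rect[3] = pts[np.argmax(d)]   # bottom-left: largest y - x
    return rect

def scan_document(image, edges):
    """Find the largest 4-point contour and warp it to a top-down view."""
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)[:5]

    page = None
    for c in contours:
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)
        if len(approx) == 4:       # the first (largest) quadrilateral wins
            page = approx.reshape(4, 2).astype("float32")
            break
    if page is None:
        return image               # no document-like contour found

    rect = order_points(page)
    (tl, tr, br, bl) = rect
    width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]], dtype="float32")

    # Compute the perspective transform matrix and warp to a "bird's eye view".
    matrix = cv2.getPerspectiveTransform(rect, dst)
    return cv2.warpPerspective(image, matrix, (width, height))
```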

Now we take a look at some test cases:

figure1

figure1

figure1

figure1

The method works quite well, but in some cases the algorithm doesn't perform well because the contour of the piece of paper blends into the background color, so the user should take a photo in which the edges are clearly apparent. For example:

figure1

Conclusion

These main ideas may be implemented in the plugin as an OCR pre-processing dialog in the future.

In the following weeks (June 11, 2022 to July 25, 2022), I will try to:

  • Make decisions to choose the architecture of the plugin.
  • Design UML for each component of the plugin.
  • Write documentation.

Frontend:

The idea of the OCR processing plugin in digiKam is inspired by another plugin used for converting RAW files to DNG. The GUI for the batch processing of the plugin is a dialog consisting of:

A widget listing the images to be processed by OCR, on the left of the dialog.

On the right side, settings widgets display all the components for setting up Tesseract options:

  • Language: Specify language(s) used for OCR.
  • Segmentation mode: Specify page segmentation mode.
  • Engine mode: Specify OCR Engine mode.
  • Resolution dpi: Specify DPI for the input image.

A text editor for visualizing the text detected in the image.

A button to save the content to files.

Important link:

I would like to share a single merge request that contains essentially all of the plugin implementation from the GSoC period:

https://invent.kde.org/graphics/digikam/-/merge_requests/177

Implementation:

First of all, a TextConverterPlugin is created: an interface that contains brief information about the OCR processing plugin. TextConverterPlugin inherits from the class DPluginGeneric, a digiKam external plugin class whose virtual functions are overridden for the new features.

This object overrides the following methods from the parent class:

  • name(): Returns the user-visible name of the plugin, providing enough information as to what the plugin is about in the context of digiKam.
  • iid(): Returns the unique top-level internal identification property of the plugin interface. In this case, the formatted identification text is a constant token string like "org.kde.digikam.plugin.generic.TextConverter".
  • icon(): Returns an icon for the plugin, provided as a QIcon.
  • authors(): Returns the list of authors for the plugin, with details such as names, emails, copyright years and roles.
  • description(): Returns a short description of the plugin.
  • detail(): Returns a long description of the plugin.
  • setup(): Creates all internal object instances for a given parent.

The interface looks like this:

figure1

Text Converter Dialog

The idea is to set up a dialog widget. A dialog box, TextConverterDialog, lists the files to be processed by OCR along with a status indicating the processing.

TextConverterDialog is a DPluginDialog (digiKam's default plugin dialog class) that uses QDialogButtonBox, which presents buttons in a layout that conforms to the interface guidelines of the platform, allows a developer to add buttons to it, and automatically uses the appropriate format for the user's desktop environment. We can see the design of the dialog here:

figure1

Its main slot is TextConverterAction(), which uses internal methods and is called to apply OCR processing to the image after pre-processing.

The main dialog consists of all Tesseract options widgets (Text converter Settings) and a text editor to view the OCR result discussed in the following sections.

Text converter Settings

The Text Converter Settings object is a widget containing all the settings components for setting up the Tesseract options that users can select. The main options for these settings are:

Three DComboBox widgets (a combo box widget re-implemented with a reset button to switch back to the default item):

  • Page Segmentation mode (psm).
  • OCR Engine mode.
  • The language or script to use.

One DNumInput (an integer number input widget in digiKam), used for the Tesseract DPI resolution option.

Two QCheckBox widgets for two options: saving the OCR text into separate text files and storing the text recognized by OCR in XMP (Extensible Metadata Platform) metadata.

TextConverterSettings is a member object of the dialog. Here is a visualization of the OCR settings:

figure1
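
The plugin itself is written in C++/Qt, but as a rough illustration of how these widgets map onto Tesseract's own options, here is a small pytesseract sketch (-l, --psm, --oem and --dpi are standard Tesseract parameters; the function below is hypothetical and not part of the plugin):

```python
import pytesseract
from PIL import Image

def run_ocr(path, language="eng", psm=3, oem=3, dpi=300):
    """Run Tesseract with the options exposed by the settings widget:
    language/script, page segmentation mode, engine mode and DPI."""
    config = f"--psm {psm} --oem {oem} --dpi {dpi}"
    return pytesseract.image_to_string(Image.open(path), lang=language, config=config)

# text = run_ocr("scan.png", language="eng+fra", psm=6)
```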

Text Editor

A text editor for visualizing the text detected in the image: a Qt text edit widget combined with the Sonnet spell checker, showing the recognized text from a scanned document. If I select a file from the list on the left side, the editor content changes accordingly. The text editor is a DTextEdit, which combines these two functionalities.

figure1

Text Converter List

figure1

The widget list of images to be processed by OCR, where the URLs point to the pictures, is based on a generic list used by all plugins, itself built on QTreeWidget. TextConverterList inherits from DItemList, an abstract item list. TextConverterList is composed of DItemsListViewItem objects, an interface of tree widget items used to hold rows of information. Rows usually contain several columns of data, each of which can contain a text label and an icon.

Each text converter item consists of 4 specific columns:

  • File Name: a URL pointing to the image.
  • Recognized Words: the number of words recognized.
  • Target File: the target file that saves the text converted from the image.
  • Status: an indication shown during processing.

Here is a capture of the text converter list widget that I implemented:

figure1

Results

The architecture and the position of each widget component are laid out in the following image. This visualization made the implementation easier for me:

figure1

Here is the expected GUI for the batch processing of the plugin:

figure1

Main commits

3f9d7895

81bf0d9c

5970d961

f17dce69

Next step

In the next few weeks, I will:

  • Implement the Ocr Tesseract Engine object used to run OCR on text from the image.
  • Implement internal multi-threading for OCR image processing.
  • Polish and re-implement code if necessary.

Four workers arrive at a construction site to help. They each take a shovel and are eager to start shoveling. But what is that? They notice some dents and some dried cement on the shovels. So what now?

Worker 1

The first worker shrugs and starts shoveling. Better to work with 75% efficiency than not to work at all. Work needs to be done after all.

Worker 2

Another worker takes some time to look for a hammer and frees the shovel from the hardened cement and even flattens some of the dents. After two hours of fixing the tools this worker is able to work with 85% efficiency.

Worker 3

Then there is one worker who says: "Screw it, I do not like how work is being done here so I will look for another construction site." This worker spends three hours looking for a construction site with new and shiny shovels and is able to work the rest of the day with 90% efficiency. Why not 100%? Well, every construction site has its own problems.

Worker 4

Meanwhile the last worker is busy waving around the shovel and complaining to everyone about the broken tools they have to use. This worker constantly demands a new shovel even though another worker made it clear that there would be no newer shovels any time soon and that nothing could change that. At the end of the day the efficiency of this approach lies at -160%. Not only was no work done by this worker because of all the waving and complaining. Furthermore, the constant complaining kept several other people busy who would have been able to do some work otherwise.

Dear contributors, please don't be Worker 4.

Sunday, 24 July 2022

Akademy 2022 is almost around the corner, and I'm excited to be able to travel again after almost 3 years of Covid, lockdowns and more. I hope to meet all my KDE friends in the beautiful Spanish city of Barcelona, and dust off my C++/QML coding skills. I do hope to be able… Continue reading "I'm going to Akademy 2022"

I mentioned getting started with the tasks and their progress in the previous blog.

Current progress

The second sub-activity of the first activity is swapping the number cards so that consecutive cards are 10's complements of each other. As of now (while writing this blog), there are two levels. The first consists of four numbers to swap, and the second level consists of five numbers to swap. Currently, the pupil swaps two numbers by clicking on the first number and then clicking on the number they wish to swap it with. The selected card is slightly enlarged compared to the others to provide click feedback and the assurance that this number is selected. The answer is checked by verifying whether the first pair (first and second card), second pair (third and fourth card), and third pair (fifth and sixth card), if available, are 10's complements of each other or not. If the pupil successfully swaps the numbers, they proceed to the next sub-level. Now the question arises: what is a sub-level? A single level may consist of sub-levels, and one has to answer all the sub-levels correctly to proceed to the next level. After reading the term "swap" this many times, it is fairly clear that the name for this must contain the word "swap." The name for sub-activity 2 is "Swapping Ten's Complements."
The idea and design of the second sub-activity can be found here.

Swapping ten’s complement level 1.
level 2.
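
Purely as an illustration of the answer check described above (the real activity is written in GCompris' QML/JavaScript; the helper below is a hypothetical sketch), the cards are checked pairwise, each consecutive pair having to sum to 10:

```python
def is_answer_correct(cards):
    """Return True if every consecutive pair of cards (1st & 2nd, 3rd & 4th,
    5th & 6th if present, ...) sums to 10."""
    pairs = zip(cards[0::2], cards[1::2])
    return all(a + b == 10 for a, b in pairs)

# is_answer_correct([3, 7, 6, 4])        -> True
# is_answer_correct([3, 7, 2, 4, 5, 5])  -> False (3+7 ok, but 2+4 fails)
```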

A few improvements have been added to 10's complement sub-activity 1. Users can now correct their missteps, meaning they can undo a move by clicking on the currently swapped card. And if another number card overwrites a number card, the initial number becomes available for re-selection. The positions of the question and answer places are also randomized now, and the "okButton" only becomes available once all the questions have been answered.

Challenges and learnings

Initially, apart from the technical difficulties, I could not devote time daily because of classes and assignments. However, now I have a daily working time slot. Every time I compile the project to visualize the changes, the excitement and nervousness to see whether it works or not is the same as on day 1.
One of the less talked about hardships in the software world is deciding on variable names. Sometimes it takes much more time than it should. Nevertheless, I am trying to learn this too, because it will help me in the long run.

Experience as a developer till now

During my very first contribution in December last year, I was unsure whether I would be able to do it or not. I am not going to lie: I thought of quitting. But then I realized that's what I have been doing in life in general; whenever I face challenges, I try to run from them.
But this time, I thought, "what's the worst that could happen?" I won't be able to learn, but I won't be able to learn if I walk away anyway either.
I am really grateful that I stuck with it and was persistent enough to keep trying to learn.
The mentors have helped me gain confidence and understand that "it's okay to make mistakes, just learn from them." And now, I also feel it's all part of the learning process.

What next?

  • Add level 3 and its sub-level dataset in swapping 10's complements.
  • Randomize the values present on the number cards.
  • Begin with sub-activity 3.

After a lot of ups and downs, I finally got Space hierarchy caching to work on NeoChat.

The commit is here.

I mentioned in yesterday's post how a silly error on my part was preventing caching from working as expected.

Now, I have invoked cacheSpaceHierarchy() once when SortFilterRoomListModel is initialized.

populateSpaceHierarchy() accepts an additional parameter to decide whether the UI needs to be updated. According to the current logic, the UI shouldn't be updated when the caching happens as part of class initialization. It must be updated only when the user clicks on a Space icon.

Saturday, 23 July 2022

This is going to be a rather short blog post, but I think it's still worth mentioning. Since 5.26, kwin will support only one way of setting up X screens – Xinerama; multi-head won't be supported anymore. However, despite how "setup-breaking" it may sound, this will most likely not affect you, as you probably already use Xinerama.

Before diving any deeper, it's worth providing some background. On X11, there are two ways you can configure your desktop environment to run with multiple monitors – multi-head and Xinerama.

Multi-head is an old-school way to run multiple monitors. Basically, with that mode, there's an X screen per monitor. In Xinerama mode, there's only one virtual X screen spanning all outputs. Both modes have their advantages and disadvantages; for example, you can't freely move windows between screens when using multi-head. Xinerama is younger than multi-head and provides the most user-friendly workflow on multi-screen setups, so it's usually enabled by default in Linux distributions, and many desktop environments are optimized for running in this mode, including Plasma.

Technically, kwin does provide support for both multi-head and Xinerama. But multi-head support has been in a neglected and unmaintained state for many, many years: some code (primarily old code) supports multi-head, while lots of other code (mostly new code) does not, and various System Settings modules and plasmashell components do not support multi-head either. It's also safe to say that no kwin developer has tested multi-head in the last 5+ years.

So, rather than keep advertising support for a feature that we don't maintain and have no plans to fix, we decided to drop support for multi-head mode and make Xinerama a hard requirement starting with 5.26.

FAQ

Does this mean that Plasma won’t support multiple monitors anymore?

No, Plasma will continue supporting setups with multiple monitors, but you will need to ensure that Xinerama is used, which is usually the case and you don’t need to tweak anything.

I used multi-head for some esoteric thing, what should I do now?

It’s highly recommended to give the Wayland session a try. If something’s missing, file a bug report or contact us at https://webchat.kde.org/#/room/#kwin:kde.org.