
Sunday, 31 July 2022

Hi everyone! In this blog, I will be discussing the algorithm used in Bismuth to find the closest relative window to focus for the focusWindowByDirection event. If you haven’t read the previous blog, make sure to give it a read here.

Recap from the previous blog

Let’s start with a quick recap. In the previous blog, we discussed:

focusWindowByDirection requires the following information:

  • direction (from the user) - can be one of: right, left, top/up, bottom/down.
  • activeWindow (from the current session) - this is needed since the focusWindowByDirection event is relative to your currently focused window.
  • Neighbor window candidates (neighborCandidates) to your current window (activeWindow) and the given direction (direction).
// declaration
std::vector<Window> Engine::getNeighborCandidates(const FocusDirection &direction, const Window &basisWindow);

// use
std::vector<Window> neighborCandidates = getNeighborCandidates(direction, basisWindow);
  • From these neighbor candidates (neighborCandidates), we will now find the closest relative window corner. This was tricky for me to understand at first, so we’ll discuss it in detail in the later sections.
  • Once we know the closest relative window corner, we’ll try to find the window which satisfies the corner condition.
  • If multiple windows are found, we’ll return the first one based on the timestamp (last used).
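The corner logic is explained in later sections, so here is only a minimal C++ sketch of the last two steps under simplifying assumptions. The `Rect`, `Window`, and `FocusDirection` types and the `closestNeighbor()` helper are hypothetical, not Bismuth's real API, and the sketch compares the candidates' edges facing the active window instead of their corners.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified types for illustration only.
enum class FocusDirection { Left, Right, Up, Down };

struct Rect {
    int x, y, width, height;
    int left() const { return x; }
    int right() const { return x + width; }
    int top() const { return y; }
    int bottom() const { return y + height; }
};

struct Window {
    int id;
    Rect geometry;
    long long lastUsed; // assumption: larger value = used more recently
};

// The edge of a candidate that faces the active window for a given direction.
static int facingEdge(const Window &w, FocusDirection direction)
{
    switch (direction) {
    case FocusDirection::Right: return w.geometry.left();
    case FocusDirection::Left: return w.geometry.right();
    case FocusDirection::Down: return w.geometry.top();
    case FocusDirection::Up: return w.geometry.bottom();
    }
    return 0;
}

// Step 1: find the facing edge nearest to the active window.
// Step 2: among candidates sharing that edge, return the most recently used one.
std::optional<Window> closestNeighbor(const std::vector<Window> &neighborCandidates,
                                      FocusDirection direction)
{
    if (neighborCandidates.empty()) {
        return std::nullopt;
    }
    // Moving right/down, the nearest edge has the smallest coordinate;
    // moving left/up, it has the largest.
    const bool wantMin = direction == FocusDirection::Right || direction == FocusDirection::Down;
    int best = facingEdge(neighborCandidates.front(), direction);
    for (const Window &w : neighborCandidates) {
        const int edge = facingEdge(w, direction);
        best = wantMin ? std::min(best, edge) : std::max(best, edge);
    }
    std::optional<Window> result;
    for (const Window &w : neighborCandidates) {
        if (facingEdge(w, direction) != best) {
            continue;
        }
        if (!result || w.lastUsed > result->lastUsed) {
            result = w;
        }
    }
    assert(result.has_value());
    return result;
}
```

With two candidates whose left edges both sit at x = 100 when moving right, the one with the larger `lastUsed` wins the tie.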

Understanding the scenario

I want to start off with a visual. It took me some time to draw, so if it doesn’t look good, I’m sorry! My high school drawing teacher tried his best, but…

Saturday, 30 July 2022

KStars v3.6.0 was released on 2022.07.30 for macOS, Linux, and Windows. It's a bi-monthly bugfix release with a couple of exciting features.

Linear 1 Pass Algorithm

John Evans contributed a new Focus algorithm: The Linear 1 Pass Algorithm. When using this algorithm, Ekos initially performs like the Linear algorithm in establishing the first pass V-Curve and fitting a curve to it to find the solution. Then, however, it moves directly to the calculated minimum. Key features include:

  • The algorithm compensates for focuser backlash, provided that the backlash is consistent.

  • The algorithm is fast, taking 1 pass to identify optimum focus.

  • The algorithm uses more sophisticated curve fitting to pinpoint the optimum focus position.

  • The algorithm is highly configurable with user control over many parameters like step size and number of steps.
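To illustrate the idea of fitting a curve to the V-curve and jumping straight to its minimum, here is a deliberately simplified C++ sketch that fits a parabola by least squares and returns its vertex. This is not the Linear 1 Pass code, which per the notes above uses more sophisticated curve fitting and handles backlash; the `FocusSample` type and `estimateBestFocus()` are hypothetical, and using HFR as the focus metric is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct FocusSample {
    double position; // focuser position in steps
    double hfr;      // measured half-flux radius (smaller = sharper), assumed metric
};

// Least-squares fit of hfr = a*x^2 + b*x + c (x = centered position),
// then return the vertex -b/(2a) shifted back to absolute focuser steps.
double estimateBestFocus(const std::vector<FocusSample> &samples)
{
    assert(samples.size() >= 3);
    // Center the positions to keep the normal equations well conditioned.
    double mean = 0;
    for (const auto &s : samples) mean += s.position;
    mean /= samples.size();

    double s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0, t0 = 0, t1 = 0, t2 = 0;
    for (const auto &s : samples) {
        const double x = s.position - mean;
        const double x2 = x * x;
        s0 += 1; s1 += x; s2 += x2; s3 += x2 * x; s4 += x2 * x2;
        t0 += s.hfr; t1 += x * s.hfr; t2 += x2 * s.hfr;
    }
    // Normal equations [s4 s3 s2; s3 s2 s1; s2 s1 s0] * [a b c] = [t2 t1 t0],
    // solved for a and b with Cramer's rule.
    auto det3 = [](double a, double b, double c, double d, double e, double f,
                   double g, double h, double i) {
        return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
    };
    const double det = det3(s4, s3, s2, s3, s2, s1, s2, s1, s0);
    assert(std::fabs(det) > 1e-12);
    const double a = det3(t2, s3, s2, t1, s2, s1, t0, s1, s0) / det;
    const double b = det3(s4, t2, s2, s3, t1, s1, s2, t0, s0) / det;
    assert(a > 0); // a V-curve must open upwards
    return mean - b / (2 * a);
}
```

A parabola is the simplest curve with a single minimum; real focus curves are closer to hyperbolas far from focus, which is one reason a more sophisticated fit pays off.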




Early tests by various users show very promising and stable results.

Polar-alignment V3


Hy Murveit introduced a new Polar Alignment method based on plate solving. The original polar-alignment error measurement scheme has not changed. The user interface has changed slightly: there are different messages and a new LED display to indicate progress.

The original polar-alignment refresh/correction method still exists (if you choose the MoveStar or MoveStar & Cal Err refresh methods).

A new polar-alignment refresh/correction method called PlateSolve is provided. It allows for corrections of larger polar misalignment in a single pass, does not depend on the image display, and may be more reliable if your plate solving is working well.

A new alternative to MoveStar is the PlateSolve method. This can polar align mounts with larger alignment errors in a single procedure. A similar triangle is displayed on the image display, but it is not central to this scheme. 


Rather, the user should concentrate on the Updated Error line at the bottom of the display and attempt to zero the Altitude and Azimuth errors. Arrows also show the direction in which the mount needs to move to reduce the error.

The method works by plate-solving images as they are captured, and then estimating the user's knob-adjustments from the plate-solve solutions. Note that, since knobs may be moved during exposures, some images may have large star trails and plate solves may fail. Be patient and allow the system to capture a clean image before relying on the error estimate. 

Image ROI Statistics


Madhav Prabhu made his first contribution to KStars by adding Region-Of-Interest (ROI) selection in the FITS viewer, where users may view statistics for a particular region of the image. The stats include average, median, and standard deviation.



You can select from existing probes of varying sizes (50x50, 100x100, etc.), or you can simply hold down the Shift key and drag the mouse to create your own rectangle.
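As a rough illustration of what these ROI statistics involve, the sketch below computes the mean, median, and standard deviation over a rectangle of a row-major grayscale image. The `roiStatistics()` function is hypothetical, and the notes do not say whether KStars reports the population or the sample standard deviation; this sketch assumes the population form.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct RoiStats {
    double mean, median, stddev;
};

// Statistics over the rectangle [x, x+w) x [y, y+h) of a row-major image.
RoiStats roiStatistics(const std::vector<double> &pixels, int imageWidth,
                       int x, int y, int w, int h)
{
    assert(w > 0 && h > 0 && x >= 0 && y >= 0 && x + w <= imageWidth);
    assert(static_cast<size_t>((y + h) * imageWidth) <= pixels.size());

    // Copy the ROI so the median can reorder it freely.
    std::vector<double> roi;
    roi.reserve(static_cast<size_t>(w) * h);
    for (int row = y; row < y + h; ++row)
        for (int col = x; col < x + w; ++col)
            roi.push_back(pixels[static_cast<size_t>(row) * imageWidth + col]);

    double sum = 0;
    for (double v : roi) sum += v;
    const double mean = sum / roi.size();

    double squares = 0;
    for (double v : roi) squares += (v - mean) * (v - mean);
    const double stddev = std::sqrt(squares / roi.size()); // population form

    // Median via nth_element; for an even count, average the two middle values.
    const size_t n = roi.size();
    std::nth_element(roi.begin(), roi.begin() + n / 2, roi.end());
    double median = roi[n / 2];
    if (n % 2 == 0) {
        const double lower = *std::max_element(roi.begin(), roi.begin() + n / 2);
        median = (median + lower) / 2;
    }
    return {mean, median, stddev};
}
```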

Profile Scripts


For complex equipment profiles that have inter-dependencies requiring script execution or programmable delays, the new Script Profile editor provides complete control over the driver startup sequence.


You may define a Pre-Delay and Pre-Script before a driver is executed (e.g. script to turn on the observatory electricity), and define a Post driver delay and script if desired. For some drivers like Pegasus Ultimate Power Box, it is often desirable to start this driver before other drivers so that all configuration is loaded.

Geographic Map Update


Ed Lee made his first contribution to KStars by replacing the old geographic map with a new high-quality version from NASA. This should play more nicely with larger monitors and high-DPI displays.

Misc. Updates


Small but important quality of life improvements to KStars & Ekos.
  • Hy Murveit: Show number of clipped pixels on fitsviewer status bar if show-clipping is enabled
  • Akarsh Simha: Improve the manual focus dialog for the SkyMap
  • Sophie Taylor: Correct nomenclature for Linear1 focus algorithm with "R2" -> "R²", and add a default and minimum value for R² limit
  • Akarsh Simha: Further improvements to Add Catalog Object UI, including auto-filling data from text.
  • Sophie Taylor: Improve tooltips for GPG expert settings
  • Akarsh Simha: Various fixes and improvements related to DSO catalogs and visibility.
  • Akarsh Simha: Refactor `DmsBox` widget and improve the Add Catalog Object form.

This is a maintenance release as part of the ongoing effort to support our users and fix bugs and annoyances. If you think you can support the project with some code changes or your artistic or writing talent, please take a look at some of the low-hanging fruit on the KMyMoney junior job list. Any contribution is welcome.

Despite continuous testing, we understand that some bugs may have slipped past our best efforts. If you find one of them, please forgive us, and be sure to report it, either to the mailing list or on bugs.kde.org.

The details

critical

  • 432380 Appimage unable to print reports

major

  • 426161 Duplicating an investment transaction also duplicates a matched but not accepted transaction in the brokerage account with original not the new date on the matched transaction
  • 447025 Calculation of the balance is incorrect for future balances

crash

  • 445458 Investment Cap Gains by Account (Customized) crashes with my dataset. Changing the date of one ledger entry fixes the crash.
  • 450016 Attempting KMyMoney 5.0.8 “Currencies” Maintenance, Application Crash
  • 451677 crashes on new category with a double colon

normal

  • 223708 Closed accounts are not hidden in accounts/categories view
  • 411272 Not saving changes to Shortcuts
  • 424303 Export report as csv file gives a html file
  • 425333 no pre-defined account templates on a mac
  • 428156 OFX import goes to the wrong account
  • 435866 No ledger icon in the pane of the left side
  • 439287 Home view is missing styling
  • 439722 Equities are shown with currencies in new account dialog
  • 439819 Issue with changing credit card limits
  • 439861 Rounding error on investment transactions
  • 440060 Icons are missing on Linux if Breeze icon theme shipped by the distro is older than 5.81
  • 440111 Tags/Payees Double Enter
  • 440476 Can not update stock price manually
  • 440500 Stock wizard shows online source that no longer exist
  • 440681 Currency list not sorted with locale awareness in the new account wizard
  • 440692 When importing OFX, the OK and Skip buttons are reversed
  • 440695 Unable to inspect the Splits when account is closed.
  • 441292 Impossible to paste into calculator widget
  • 443208 Build failure with aqbanking 6.3.2
  • 444414 Transaction notes are not imported from paypal account
  • 445472 Stock split transactions can cause rounding problems
  • 446990 Wayland: Tooltip on date input fields steals focus, prevents entering data
  • 451891 Setting the payee matching to exact name is not persistent
  • 452068 kmymoney complains about “GPG no secure keyring found”
  • 452497 Scheduled transactions: “Next due date”, “Number remaining” and “Date of final” do not always update in step
  • 452616 Missing transaction information
  • 452720 Provide feature to rename existing loan accounts
  • 452918 Payee > Account Numbers > IBAN does not accept pasted content with a space at the start
  • 456520 OFX import broken upstream

minor

  • 440736 In “New Account” wizard, Enter key does not work on “Parent account” page
  • 441296 Fields in Exchange Rate/Price Editor misaligned
  • 441937 Default cash flow report: Name does not match date range
  • 448013 Unresponsive UI elements in “New File Setup” > “Select Accounts”
  • 452863 New file setup wizard: UK VAT Accounts produces “invalid top-level account type” error

wishlist

  • 399685 add match strings as well as name from deleted payee to new payee
  • 424377 Change default matching behavior for new payees to “match on exact payee name”
  • 440586 When exporting a report, the file name (suggested) takes the report name
  • 441581 When “Amount” at ledgers get the focus by click, select the entire value
  • 444262 Date picker, frequency and process schedule at last day of the month should interact
  • 447480 Allow currencies to be divisible by more than ten decimal places
  • 450965 Please add Functionality to Scheduled Transactions
  • 453922 Decimal and thousands separators in ordinate axis labels are missing

A complete description of all changes can be found in the ChangeLog.

Friday, 29 July 2022

Let’s go for my web review for the week 2022-30.


CosmicStrand: the discovery of a sophisticated UEFI firmware rootkit | Securelist

Tags: tech, security

This is an interesting (and concerning) type of rootkit. Hard to tell how much of it really is in the wild at the moment.

https://securelist.com/cosmicstrand-uefi-firmware-rootkit/106973/


Twenty years of Valgrind | Nicholas Nethercote

Tags: tech, profiling, performance, history

One of the best developer tools around for analysis and profiling. I’m glad it exists, saved me a few times.

https://nnethercote.github.io/2022/07/27/twenty-years-of-valgrind.html


Tags: tech, search

Looks like a somewhat recent alternative in the search engine and document indexing space. Sounds potentially interesting.

https://manticoresearch.com/


Diagram as Code - by Alex Xu - ByteByteGo Newsletter

Tags: tech, markdown, diagrams

Good collection of tools; I knew a couple but not all of them. They will go well with Markdown or similar.

https://blog.bytebytego.com/p/diagram-as-code


Gum: A tool for glamorous shell scripts 🎀

Tags: tech, shell

Nice tool to spice up your interactive shell scripts.

https://github.com/charmbracelet/gum


Investigating Managed Language Runtime Performance

Tags: tech, performance, c++, go, python, java, javascript

Wow, this is a very good exploration of the performance of several common languages and runtimes. This is one of the most thorough I’ve seen. A good resource for deciding what to pick.

https://www.usenix.org/publications/loginonline/investigating-managed-language-runtime-performance


finally. #embed

Tags: tech, programming, c

Nice new feature coming to C. This is useful stuff. It required quite some fighting to get in though.

https://thephd.dev/finally-embed-in-c23


Retiring Some Terms

Tags: tech, craftsmanship, culture, tests, estimates, technical-debt, agile, refactoring

Very interesting musing about the technical terms we often use wrongly and how difficult it is to be understood.

https://ronjeffries.com/articles/-z022/0222ff/terms/


Why We Estimate - Embedded Artistry

Tags: tech, product-management, project-management, estimates

Now this is a well balanced piece about estimates. Starting from the “why” to decide how you approach the estimates and the level of details is just very good advice.

https://embeddedartistry.com/blog/2020/03/02/why-we-estimate/


How finishing what you start makes teams more productive and predictable

Tags: tech, project-management, kanban

Advice I often give; it’s nice to see the theory behind it laid out so well.

https://lucasfcosta.com/2022/07/19/finish-what-you-start.html


Gotta Be Good

Tags: tech, programming, agile, craftsmanship

Hear hear! It’s not supposed to be easy, you need to hone your practices.

https://ronjeffries.com/articles/-z022/0222ff/gotta-be-good/


Engineering Ladders

Tags: tech, management, engineering

It’s nice to see a reusable framework to help organizations get started with their engineering ladder.

http://www.engineeringladders.com/


How to Freaking Find Great Developers By Having Them Read Code | Freaking Rectangle

Tags: tech, programming, interviews

I already use some code reading when interviewing for certain profiles, although it’s more geared towards finding mistakes in the code. I like the approach proposed here and will try to do more of it.

https://freakingrectangle.com/2022/04/15/how-to-freaking-hire-great-developers/


The Deadliest Virus on Earth - YouTube

Tags: science, vaccines

We often forget how much of a problem it used to be.

https://www.youtube.com/watch?v=4u5I8GYB79Y



Bye for now!

Thursday, 28 July 2022

A major part of the ongoing preparations for version 6 of the KDE Frameworks is making sure that the API breakage relative to version 5 mostly consists of dropped API. Software using KDE Frameworks 5 can already find and use the future-proof API in the version 5 series, next to the legacy one. That keeps the hurdle of porting from KF5 to KF6 minimal: once one has stopped using the legacy API, porting ideally means just changing “5” to “6”. This also matches the approach Qt took for Qt5 & Qt6, so API consumers get a common experience. And adding the new API already in KF5 allows field-testing its practicability, so KF 6.0 starts with proven API.

Preparing the Ground

To direct developers to the future-proof API there are at least two places:

  • adding notes in the API documentation, to help those writing new code to avoid the outdated API
  • having the compiler emit warnings when building existing code using outdated API

C++ does not come with a native system for tagging deprecated API, conditionally using it and building its implementation, or controlling library- and version-specific emission of warnings. All there is: documentation tools like doxygen, which require a separate tool-specific tag like @deprecated in the documentation comment; the [[deprecated(text)]] attribute in the C++ syntax since C++14 (before that, there were compiler-specific attributes), which the API developer has to maintain separately from the documentation tag; and a global compiler option like -Wno-deprecated-declarations to not be bothered by any such warnings.

Therefore, almost 3 years ago (ECM/KF 5.64), the CMake utility ECMGenerateExportHeader was introduced for KDE Frameworks: besides the C++ macros for symbol visibility tagging (“export macros”), it also generates version-controlled macros for adding compiler deprecation warnings, as well as for wrapping code blocks so they are only visible to the compiler when enabled (see blog post for more details). Sadly, it does not solve the need for duplication in the documentation comment.

A typical deprecation using those macros looks like this in the declaration:

#include <foo_export.h>
#if FOO_ENABLE_DEPRECATED_SINCE(5, 0)
/**
 * @deprecated Since 5.0. Use bar().
 */
FOO_EXPORT
FOO_DEPRECATED_VERSION(5, 0, "Use bar()")
void foo();
#endif

and like this in the implementation:

#include "foo.h"
#if FOO_BUILD_DEPRECATED_SINCE(5, 0)
void foo()
{
}
#endif

Which by default will then have the compiler notify an API consumer in a build like this (note also the automatic version hint):

/.../fooconsumer.cpp:27:65: warning: ‘void foo()’ is deprecated: Since 5.0. Use bar() [-Wdeprecated-declarations]

See the guidelines on how to deprecate API for further details.
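To show how version-controlled macros like these can fit together, here is a hand-written, simplified stand-in for the kind of macros ECMGenerateExportHeader generates. The macro bodies and the `FOO_VERSION` helper are assumptions for illustration; the real generated header is more elaborate, so the ECM documentation remains the authority on the actual macro set.

```cpp
// Hypothetical, hand-written stand-ins for generated foo_export.h macros.

// Encode a version as 0xMMmm00, e.g. 5.0 -> 0x050000.
#define FOO_VERSION(major, minor) (((major) << 16) | ((minor) << 8))

// A consumer may define this (e.g. -DFOO_DISABLE_DEPRECATED_BEFORE_AND_AT=0x050000)
// to hide all API deprecated at or before that version. Default: hide nothing.
#ifndef FOO_DISABLE_DEPRECATED_BEFORE_AND_AT
#define FOO_DISABLE_DEPRECATED_BEFORE_AND_AT 0
#endif

// True if API deprecated in major.minor should still be declared.
#define FOO_ENABLE_DEPRECATED_SINCE(major, minor) \
    (FOO_VERSION(major, minor) > FOO_DISABLE_DEPRECATED_BEFORE_AND_AT)

// C++14 standard attribute with the version hint prepended to the text,
// yielding e.g. "Since 5.0. Use bar()".
#define FOO_DEPRECATED_VERSION(major, minor, text) \
    [[deprecated("Since " #major "." #minor ". " text)]]

inline int bar() { return 42; }

#if FOO_ENABLE_DEPRECATED_SINCE(5, 0)
FOO_DEPRECATED_VERSION(5, 0, "Use bar()")
inline int foo() { return bar(); } // calling this triggers -Wdeprecated-declarations
#endif
```

In this sketch, a default build keeps foo() and warns on every use, while a consumer compiling with -DFOO_DISABLE_DEPRECATED_BEFORE_AND_AT=0x050000 no longer sees foo() at all, which is a quick way to check whether their code is already free of the legacy API.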

Piling up

The resulting deprecation macros have since been deployed massively: in the current development version of KF5, grepping the installed headers for deprecated methods and enumerators yields more than 1000 hits. So quite a few things have been learned and changed since KF 5.0 in July 2014 (and even before, see below).

As the internal logic generated by ECMGenerateExportHeader requires listing the specific versions at which API was deprecated, this also gives easy access to interesting insight into the activity. Note also that the new macros made it possible to properly deprecate API that was already considered outdated before KF5 but, lacking systematic support, had not been removed at the next opportunity (especially in the KIO core library).

Library | Versions with new deprecations
attica | 0.2 5.4 5.23
baloo | 5.55 5.69
bluez-qt | 5.57
karchive | 5.0 5.85
kauth (core) | 5.71
kauth (widgets) | 5.92
kbookmarks | 5.0 5.65 5.69
kcalendarcore | 5.64 5.89 5.91 5.95 5.96 5.97
kcmutils | 5.66 5.76 5.82 5.85 5.87 5.88 5.90
kcodecs | 5.5 5.56
kcompletion | 4.0 4.5 5.0 5.46 5.66 5.81 5.83
kconfig (core) | 4.0 5.0 5.24 5.42 5.82 5.89
kconfig (gui) | 5.11 5.39 5.71 5.82
kconfigwidgets | 4.0 5.0 5.23 5.32 5.38 5.39 5.64 5.78 5.80 5.82 5.83 5.84 5.85 5.90 5.93
kcontacts | 5.88 5.89 5.92
kcoreaddons | 4.0 5.0 5.2 5.65 5.67 5.70 5.72 5.75 5.76 5.78 5.79 5.80 5.84 5.86 5.87 5.88 5.89 5.92 5.95 5.97
kdbusaddons | 5.68
kdeclarative (kdeclarative) | 5.0 5.45 5.75 5.91 5.95
kdeclarative (quickaddons) | 5.88 5.93
kdesu | 5.0
kdnssd | 4.0
kfilemetadata | 5.50 5.60 5.76 5.82 5.89 5.91
kglobalaccel | 4.2 4.4 5.9 5.90
kglobalaccel (runtime) | 4.3 5.90
kholidays | 5.95
ki18n | 5.0
kiconthemes | 4.8 5.0 5.63 5.64 5.65 5.66 5.82
kidletime | 5.76
kio (core) | 3.0 3.1 3.4 4.0 4.3 4.5 4.6 5.0 5.2 5.8 5.24 5.45 5.48 5.63 5.61 5.64 5.65 5.66 5.69 5.72 5.78 5.79 5.80 5.81 5.82 5.83 5.84 5.86 5.88 5.90 5.91 5.94 5.96 5.97
kio (filewidgets) | 4.3 4.5 5.0 5.33 5.66 5.70 5.76 5.78 5.86 5.97
kio (kntlm) | 5.91
kio (widgets) | 4.0 4.1 4.3 4.4 4.5 4.6 4.7 5.0 5.4 5.6 5.25 5.31 5.32 5.64 5.66 5.71 5.75 5.76 5.79 5.80 5.82 5.83 5.84 5.86 5.87 5.88
kirigami | 5.80 5.86
kitemmodels | 4.8 5.65 5.80
kitemviews | 4.2 4.4 5.0 5.50
kjobwidgets | 5.79
knewstuff (core) | 5.31 5.36 5.53 5.71 5.74 5.77 5.83
knewstuff (qtquick) | 5.81
knewstuff (widgets) | 5.91
knewstuff | 5.29 5.76 5.77 5.78 5.79 5.80 5.82 5.85 5.91 5.94
knotifications | 5.67 5.75 5.76 5.79
kpackage | 5.21 5.84 5.85 5.86
kparts | 3.0 4.4 5.0 5.72 5.77 5.78 5.80 5.81 5.82 5.83 5.88 5.90
kquickcharts | 5.78
krunner | 5.28 5.71 5.72 5.73 5.76 5.77 5.79 5.81 5.82 5.85 5.86 5.88
kservice | 5.0 5.15 5.61 5.63 5.66 5.67 5.70 5.71 5.79 5.80 5.81 5.82 5.83 5.86 5.87 5.88 5.89 5.90
ktexteditor | 5.80
ktextwidgets | 5.0 5.65 5.70 5.71 5.81 5.83
kunitconversion | 5.91
kwallet | 5.72
kwayland (client) | 5.49 5.50 5.52 5.53 5.73 5.82
kwidgetsaddons | 5.0 5.13 5.63 5.65 5.72 5.77 5.78 5.85 5.86 5.97
kwindowsystem | 5.0 5.18 5.38 5.62 5.67 5.69 5.80 5.81 5.82
kxmlgui | 4.0 4.1 5.0 5.75 5.83 5.84
plasma-framework (plasma) | 5.6 5.19 5.28 5.30 5.36 5.46 5.67 5.77 5.78 5.81 5.83 5.85 5.86 5.88 5.94
plasma-framework (plasmaquick) | 5.12 5.25 5.36
prison | 5.69 5.72
solid | 5.0
sonnet (core) | 5.65
sonnet (ui) | 5.65
syntax-highlighting | 5.87
threadweaver | 5.0 5.80

On the QML side deprecations are not that simple (and also not my personal domain), so that is not covered here.

To answer an expected question: it will be ready when it is ready. Please join the effort if you can. Find more info e.g. in Volker’s more recent update.

In my last post, I talked about adding an appropriate dataset to the Comparator Activity. This post mainly covers its implementation and the new changes that we now have in the comparator.

Restructuring the whole activity

It was initially planned to add the dataset to the comparator in the following manner:

4-5 years: Numbers from 1 to 9
5-6 years: Numbers from 1 to 19
6-7 years: Numbers from 1 to 100
7-8 years: Numbers from 1 to 1 000
8-9 years: Numbers from 1 to 1 000 000
9-10 years: Numbers from 1 to 1 000 000 000
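The level plan above maps naturally onto a small table of ranges. The following C++ sketch is only illustrative: the activity's real implementation is not shown in this post, and `ComparatorLevel`, `kLevels`, and `makeQuestion()` are hypothetical names.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// One level of the dataset: the targeted age group and the inclusive range
// the numbers are drawn from (ranges taken from the plan above).
struct ComparatorLevel {
    int minAge, maxAge;
    long long minValue, maxValue;
};

const std::vector<ComparatorLevel> kLevels = {
    {4, 5, 1, 9},
    {5, 6, 1, 19},
    {6, 7, 1, 100},
    {7, 8, 1, 1000},
    {8, 9, 1, 1000000},
    {9, 10, 1, 1000000000},
};

// Draws the two numbers the child has to compare for a given level.
std::pair<long long, long long> makeQuestion(const ComparatorLevel &level, std::mt19937 &rng)
{
    assert(level.minValue <= level.maxValue);
    std::uniform_int_distribution<long long> dist(level.minValue, level.maxValue);
    return {dist(rng), dist(rng)};
}
```

The last level's numbers reach ten digits, which is exactly where the text-overlap problem described below shows up.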

While adding the dataset for children aged 9–10 years, we realised that the left and right texts started to overlap on small screens. This led me to restructure the whole activity: the new skeleton no longer contains row elements; instead, the left and right texts are anchored to the centre so they remain intact even on small screens. This seemed to be the best possible solution to this predicament.

The final solution to text overlapping

The above-mentioned restructuring solved 90% of the issue, but the numbers in the last level have over 9 digits and were still overlapping. We also tried another approach, fitting the text into a rectangle element, but it did not deliver the best results.

Finally, this issue was resolved by setting the custom font size to tiny, which adjusts automatically to the screen size.

Fixing Issues

Other issues investigated included making the layout responsive for different screen sizes, handling errors in the log, fixing the logic to avoid multiple wins, adding consistency in variable declarations and their datatypes, adding canvas to the entire exercise view zone, and some issues that arose as a result of the activity’s restructuring, such as selection on tapping the rows and default selection of the first row when the activity opens.

Apart from this, moving the repeated code snippets into a separate component file and adding keyboard bindings for the activity have also been implemented this time.

Revamping Comparator

As suggested by mentors, in order to make the activity more organised, we have proposed a new mock-up for the comparator activity, which can be found here.

What’s next?

After the GSoC mid-evaluation, I have updated the project timeline.

We are planning to implement a feature to highlight the wrong answers and revamp the comparator as proposed. So, stay tuned for it!

I am going to be attending Akademy 2022 in person!

This is my first time going to Akademy in-person, so it is quite exciting! I will be doing a talk with Bhushan on the state of Plasma Mobile.

A bit of background on my experience at KDE:

I first started contributing to KDE in 2020, creating the KClock project as a way to learn Qt Quick, and to pick up one of the pending tasks of Plasma Mobile.

Through 2020, I worked on many Plasma Mobile applications such as KWeather, and also started poking around contributing to the shell. I also did some work for the desktop, including adding fingerprint support to the Users KCM.


In 2021 and 2022, I got much more heavily involved in KDE frameworks (namely Kirigami) and Plasma Mobile shell contributions.

One of the most exciting developments in the project over the past few years has been the expansion in the number of devices that can now run Plasma Mobile. I will be sure to bring some devices I have for demo purposes :3


Overall, I am very excited to be able to share a lot of the work that has been done when I attend this year’s Akademy! Be sure to check out my talk!

someone here is me…

Monday, 25 July 2022

 

https://phabricator.kde.org/source/latte-dock/

 
Hello everyone,
 
Unfortunately, I would like to inform the KDE community that I am stepping away from Latte development. Lack of time, motivation, and interest on my part is the main reason. I hope that this will give free space and air for new developers/maintainers to step in and move Latte forward.
 
I hoped that I would be able to release Latte v0.11, but unfortunately I cannot. Releasing Latte v0.11 would mean that someone maintains it afterwards, and that is no longer the case.
 
For the last 6 years, developing Latte was a beautiful journey and taught me plenty of new things. I would like to thank you all for that beautiful journey: KDE community members, users, developers, enthusiasts, and Plasma developers.
 
Have fun and enjoy life...
 
 

In the following weeks (June 27, 2022 to July 10, 2022), for each test case, I will try to:

  • Research algorithms for pre-processing image.
  • Gather many test cases for improving pre-processing methods.

Based on the results of previous research, I realized that Tesseract does various processing internally before doing the actual OCR. However, there are still cases where Tesseract's accuracy drops significantly.

So in this post, I would like to introduce some pre-processing methods that are applied to the image directly before passing it to Tesseract.

Removing Shadow

  • The main idea is to extract and segment the text from each color channel, treated as a grayscale image, using background subtraction (BS), a common and widely used technique for generating a foreground mask.

  • BS calculates the foreground mask by performing a subtraction between the current frame and a background model containing the static part of the scene or, more generally, everything that can be considered background given the characteristics of the observed scene.

  • The purpose is to remove the shadow or noise from the background of each test case (the background of a scanned image can be influenced by many external factors), converting the background to white or black, and then to merge the channels back into one image.

To keep it simple, here is an example where we can see that the Tesseract output is affected by the lighting.

figure1

Looking at the three color channels, the shadow is most widespread in the blue channel.

figure1

Implementation

First step

Extract the background by blurring out the text. Morphological operations are helpful here.

  • Grayscale dilation of an image assigns to each pixel the maximum value found over the neighborhood of the structuring element, here a square kernel of odd size (3 x 3).

  • The dilated value of a pixel x is the maximum value of the image in the neighborhood defined.

Dilation is a type of morphological operation. Morphological operations are a set of operations that process images according to their shapes. As the kernel B is scanned over the image, we compute the maximal pixel value overlapped by B and replace the image pixel at the anchor point position with that maximal value. As you can deduce, with a white background and black text, this maximizing operation shrinks the dark regions of the image. The bigger the kernel, the blurrier the text. The text is thus treated as noise that we need to remove.

figure1
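A minimal, dependency-free C++ sketch of the grayscale dilation described above, standing in for the OpenCV call used in the post. The `Image` alias and `dilate()` are hypothetical, and handling borders by clipping the window is an assumption of this sketch. Passing a larger odd kernel size blurs the text more, as described above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>; // rows of 8-bit gray values (0..255)

// Grayscale dilation with a k x k square structuring element (k odd):
// each output pixel is the maximum over its k x k neighborhood.
// At the borders, the window is clipped to the image.
Image dilate(const Image &img, int k)
{
    assert(k > 0 && k % 2 == 1);
    const int rows = static_cast<int>(img.size());
    const int cols = rows ? static_cast<int>(img[0].size()) : 0;
    const int r = k / 2;
    Image out(rows, std::vector<int>(cols, 0));
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            int best = 0;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int yy = y + dy, xx = x + dx;
                    if (yy >= 0 && yy < rows && xx >= 0 && xx < cols)
                        best = std::max(best, img[yy][xx]);
                }
            }
            out[y][x] = best;
        }
    }
    return out;
}
```

On a white page, a one-pixel dark stroke disappears after a single 3 x 3 dilation, which is why the result approximates the background alone.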

Second step

Median blur smooths the image and removes noise left by the previous operation, giving a more general estimate of the background.

figure1
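Continuing the dependency-free illustration, a k x k median blur can be sketched as follows. `medianBlur()` here is a hypothetical stand-in for OpenCV's function, and replicating edge pixels at the borders is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>; // rows of 8-bit gray values (0..255)

// Median blur with a k x k window (k odd). Out-of-range neighbors are
// replaced by the nearest edge pixel (replicated border).
Image medianBlur(const Image &img, int k)
{
    assert(k > 0 && k % 2 == 1);
    const int rows = static_cast<int>(img.size());
    const int cols = rows ? static_cast<int>(img[0].size()) : 0;
    const int r = k / 2;
    Image out(rows, std::vector<int>(cols, 0));
    std::vector<int> window;
    window.reserve(static_cast<size_t>(k) * k);
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            window.clear();
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int yy = std::clamp(y + dy, 0, rows - 1);
                    const int xx = std::clamp(x + dx, 0, cols - 1);
                    window.push_back(img[yy][xx]);
                }
            }
            // k*k is odd, so the middle element is the exact median.
            std::nth_element(window.begin(), window.begin() + window.size() / 2, window.end());
            out[y][x] = window[window.size() / 2];
        }
    }
    return out;
}
```

Unlike an averaging filter, the median ignores isolated outliers entirely, so leftover specks from the dilation step vanish instead of being smeared.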

Third step

cv2.absdiff is a function that computes the absolute difference between the pixels of two image arrays. By taking the difference between the original channel and the estimated background, we can extract just the pixels of the text. In effect, the background becomes white and the shadow is removed.

figure1
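The difference step can be sketched the same way. The post uses cv2.absdiff; subtracting the result from 255 so that the background turns white is an assumption based on the post's note that the background becomes white, and `removeBackground()` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

using Image = std::vector<std::vector<int>>; // rows of 8-bit gray values (0..255)

// Per-pixel |plane - background|, inverted so that pixels matching the
// estimated background become white (255) and text stays dark.
Image removeBackground(const Image &plane, const Image &background)
{
    assert(plane.size() == background.size());
    Image out(plane.size());
    for (size_t y = 0; y < plane.size(); ++y) {
        assert(plane[y].size() == background[y].size());
        out[y].resize(plane[y].size());
        for (size_t x = 0; x < plane[y].size(); ++x)
            out[y][x] = 255 - std::abs(plane[y][x] - background[y][x]);
    }
    return out;
}
```

Applied to each color channel, with the dilated and median-blurred channel as `background`, and followed by merging the channels, this mirrors the pipeline above: a shadowed background pixel of 180 next to an estimated background of 180 becomes 255, while a text pixel of 40 becomes 115.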

Finally, merging all channels into one image, we get the following result. We can see that the noise in the background is removed, and of course we obtain a better output.

figure1

Here are more of the test cases that I have tried; in almost all of them we get better results:

figure1

figure1

Perspective correction

This issue is specific to some image acquisition devices (digital cameras or mobile devices): the acquired area is not a rectangle but a trapezoid or a parallelogram. Once the perspective transformation is applied, the corrected font size looks almost uniform and gives a better OCR result.

These resources were really helpful for explaining this post:

Post 1

Post 2

Implementation

Building a document scanner with OpenCV can be done in the following steps:

  • Step 1: Detect edges.
  • Step 2: Use the edges in the images to find the contour (outline) representing the piece of paper being scanned.
  • Step 3: Apply a perspective transform to obtain the top-down view of the document.

Let’s take a look at an example.

First, to speed up the image processing and make edge detection more accurate, we compute the ratio of the image height to a height of 500 pixels, then resize the image accordingly.
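The resize step is simple arithmetic: with ratio = original height / 500, the small image is the original divided by ratio, and any corner found on it is multiplied by ratio to map it back to full resolution. A minimal C++ sketch with hypothetical names (`downscaleToHeight()`, `toOriginal()`):

```cpp
#include <cassert>
#include <cmath>

struct Size { int width, height; };
struct Point { double x, y; };

struct Downscale {
    double ratio; // original height / target height
    Size scaled;  // size of the working image
};

// Keep the aspect ratio while scaling the height to targetHeight (500 px in the post).
Downscale downscaleToHeight(Size original, int targetHeight = 500)
{
    assert(original.height > 0 && targetHeight > 0);
    const double ratio = static_cast<double>(original.height) / targetHeight;
    const int width = static_cast<int>(std::lround(original.width / ratio));
    return {ratio, {width, targetHeight}};
}

// Map a point detected on the downscaled image back to full resolution.
Point toOriginal(Point p, double ratio) { return {p.x * ratio, p.y * ratio}; }
```

For a 3000 x 4000 photo the ratio is 8, edge detection runs on a 375 x 500 image, and the detected corners are scaled by 8 before the perspective transform is applied to the full-resolution image.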

We convert the image from color to grayscale, use Gaussian blurring to remove high-frequency noise, and then apply a bilateral filter, which is highly effective at removing noise while keeping edges sharp. I then used a median blur, which works like the other averaging methods except that the central pixel is replaced by the median of all the pixels in the kernel area; this preserves edges while removing noise. All these methods help edge detection and improve accuracy. See the following steps of this process:

figure1

Second, we are scanning a piece of paper, which usually takes the shape of a rectangle, so we know it has four points and four edges. Therefore, we assume that the largest contour in the new image with exactly four points is our piece of paper; in other words, the most prominent edges are the most likely to belong to the document we are scanning.

The algorithm to find the largest contours:

  • Use the cv.findContours() function, which finds contours in a binary image. Contours is a list of all the contours in the image, each given as the (x,y) coordinates of the object's boundary points. We then sort the contours by area and keep only the largest ones, discarding the rest.

  • It loops over the contours and uses the cv2.approxPolyDP function to approximate each contour with a polygon, looking for a quadrilateral. cv2.approxPolyDP works well for shapes with sharp corners, like a document boundary.

  • Once we have four points, we record each candidate rectangle with its width, height, and top-left corner; the largest rectangle should be our document.

figure1

  • Apply the four-point transformation algorithm; you can see more here. This gives us a top-down “bird’s-eye view” of the document, as if looking at it from directly above. It computes the perspective transform matrix with the getPerspectiveTransform function and applies it by calling cv2.warpPerspective to obtain the top-down view. Here is the result:

figure1
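The contour ranking and the four-point transform rely on a few geometric helpers. This C++ sketch assumes the common corner-ordering approach (smallest x+y is top-left, largest is bottom-right, smallest y-x is top-right, largest is bottom-left, in image coordinates with y pointing down) and sizes the output by the longer of each pair of opposite edges. The function names are hypothetical, and this is not necessarily the exact code from the referenced posts.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Shoelace formula: area of a simple polygon, used to rank contours by size.
double polygonArea(const std::vector<Point> &poly)
{
    assert(poly.size() >= 3);
    double twice = 0;
    for (size_t i = 0; i < poly.size(); ++i) {
        const Point &a = poly[i];
        const Point &b = poly[(i + 1) % poly.size()];
        twice += a.x * b.y - b.x * a.y;
    }
    return std::fabs(twice) / 2;
}

// Order four corners as top-left, top-right, bottom-right, bottom-left.
std::array<Point, 4> orderCorners(const std::array<Point, 4> &pts)
{
    auto bySum = [](const Point &a, const Point &b) { return a.x + a.y < b.x + b.y; };
    auto byDiff = [](const Point &a, const Point &b) { return a.y - a.x < b.y - b.x; };
    const Point tl = *std::min_element(pts.begin(), pts.end(), bySum);
    const Point br = *std::max_element(pts.begin(), pts.end(), bySum);
    const Point tr = *std::min_element(pts.begin(), pts.end(), byDiff);
    const Point bl = *std::max_element(pts.begin(), pts.end(), byDiff);
    return {tl, tr, br, bl};
}

double distance(const Point &a, const Point &b) { return std::hypot(a.x - b.x, a.y - b.y); }

// Output width/height for the warp: the longer of each pair of opposite edges.
// Expects corners ordered as returned by orderCorners().
std::pair<int, int> targetSize(const std::array<Point, 4> &c)
{
    const double w = std::max(distance(c[0], c[1]), distance(c[3], c[2]));
    const double h = std::max(distance(c[0], c[3]), distance(c[1], c[2]));
    return {static_cast<int>(std::lround(w)), static_cast<int>(std::lround(h))};
}
```

In OpenCV's C++ API, the ordered corners and the destination rectangle (0, 0), (w-1, 0), (w-1, h-1), (0, h-1) would then be passed to cv::getPerspectiveTransform, and the resulting matrix to cv::warpPerspective.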

Now let’s take a look at some test cases:

figure1

figure1

figure1

figure1

The method works quite well, but in some cases the algorithm fails because the edge of the piece of paper is too blurred against the background color, so the user should take a photo in which the edges are clearly visible. For example:

figure1

Conclusion

These main ideas may be implemented in the plugin as an OCR pre-processing dialog in the future.

In the following weeks (June 11, 2022 to July 25, 2022), I will try to:

  • Make decisions to choose the architecture of the plugin.
  • Design UML for each component of the plugin.
  • Write documentation.

Frontend:

The OCR processing plugin in digiKam is inspired by another plugin, the one used to convert RAW files to DNG. The batch-processing GUI of the plugin is a dialog consisting of:

A list widget on the left of the dialog showing the images to be processed by OCR.

On the right side, optional widgets for setting up the Tesseract options:

  • Language: Specify the language(s) used for OCR.
  • Segmentation mode: Specify the page segmentation mode.
  • Engine mode: Specify the OCR engine mode.
  • Resolution dpi: Specify the DPI of the input image.

A text editor for viewing the text detected in the image.

A button to save the recognized text to files.
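The four Tesseract options above correspond to flags of the tesseract command-line tool: -l, --psm, --oem and --dpi. As a hedged illustration, assuming the plugin drives the command-line tool rather than the Tesseract C++ API (this post does not say which), a settings struct could be turned into an argument list like this. TesseractOptions, tesseractArguments and the default values are hypothetical examples, not the plugin's code.

```cpp
#include <string>
#include <vector>

// Hypothetical settings holder mirroring the four options listed above.
// The default values are example choices, not the plugin's defaults.
struct TesseractOptions {
    std::string language = "eng";  // -l, e.g. "eng" or "eng+fra"
    int psm = 3;                   // --psm, 3 = fully automatic page segmentation
    int oem = 3;                   // --oem, 3 = default, based on what is available
    int dpi = 300;                 // --dpi, resolution of the input image
};

// Builds arguments for: tesseract imagename outputbase [options...]
// Using "stdout" as the output base makes tesseract print the text
// instead of writing it to a file.
std::vector<std::string> tesseractArguments(const std::string& imagePath,
                                            const TesseractOptions& opt) {
    return {imagePath, "stdout",
            "-l", opt.language,
            "--psm", std::to_string(opt.psm),
            "--oem", std::to_string(opt.oem),
            "--dpi", std::to_string(opt.dpi)};
}
```

In a Qt application, such a list could be converted to a QStringList and passed to QProcess::start("tesseract", arguments).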

Important link:

I would like to share a single merge request that contains most of the plugin's implementation during the GSoC period:

https://invent.kde.org/graphics/digikam/-/merge_requests/177

Implementation:

First, I created TextConverterPlugin, an interface that provides brief information about the OCR processing plugin. TextConverterPlugin inherits from DPluginGeneric, a digiKam external plugin class, and overrides its virtual functions for the new features.

It overrides the following methods from the parent class:

Function | Description
name() | Returns the user-visible name of the plugin, giving enough information about what the plugin does in the context of digiKam.
iid() | Returns the unique top-level internal identifier of the plugin interface. In this case, it is a constant string: “org.kde.digikam.plugin.generic.TextConverter”.
icon() | Returns the plugin icon as a QIcon.
authors() | Returns the list of plugin authors with their details: names, emails, copyright years and roles.
description() | Returns a short description of the plugin.
detail() | Returns a long description of the plugin.
setup() | Creates all internal object instances for a given parent.
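To show the override pattern from the table without depending on Qt or digiKam headers, here is a minimal sketch. GenericPluginSketch and TextConverterPluginSketch are hypothetical stand-ins, not digiKam's real DPluginGeneric API: the real methods use Qt types such as QString and QIcon, and setup() receives a parent object. Only the iid string comes from this post; the other return values are placeholders.

```cpp
#include <string>
#include <vector>

// Hypothetical, Qt-free author record (the real plugin uses digiKam's own types).
struct Author {
    std::string name;
    std::string email;
    std::string years;
    std::string role;
};

// Hypothetical base class with the same method names as in the table above.
class GenericPluginSketch {
public:
    virtual ~GenericPluginSketch() = default;
    virtual std::string name() const = 0;
    virtual std::string iid() const = 0;
    virtual std::string icon() const = 0;  // an icon name instead of a QIcon
    virtual std::vector<Author> authors() const = 0;
    virtual std::string description() const = 0;
    virtual std::string detail() const = 0;
    virtual void setup() = 0;  // the real setup() is given a parent object
};

class TextConverterPluginSketch : public GenericPluginSketch {
public:
    std::string name() const override { return "OCR Text Converter"; }
    std::string iid() const override { return "org.kde.digikam.plugin.generic.TextConverter"; }
    std::string icon() const override { return "text-x-generic"; }
    std::vector<Author> authors() const override {
        return {Author{"Author Name", "author@example.org", "2022", "Developer"}};
    }
    std::string description() const override { return "A tool to extract text from images"; }
    std::string detail() const override { return "Runs OCR on images with Tesseract."; }
    // Placeholder for creating the plugin's internal objects (actions, dialog, ...).
    void setup() override { initialized = true; }

    bool initialized = false;
};
```

The host application only talks to the base-class interface, so it can list, describe and set up any plugin without knowing its concrete type.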

The interface looks like this:

figure1

Text Converter Dialog

The idea is to set up a dialog widget: TextConverterDialog lists the files to be processed by OCR, together with a status that indicates the progress of the processing.

TextConverterDialog is a DPluginDialog (digiKam's default plugin dialog class) that uses QDialogButtonBox. QDialogButtonBox lets a developer add buttons and automatically lays them out according to the interface guidelines of the user's platform and desktop environment. Here is the design of the dialog:

figure1

It mainly implements a slot, TextConverterAction(), which uses internal methods and is called to apply OCR processing to the image after pre-processing.

The main dialog contains all the Tesseract option widgets (Text Converter Settings) and a text editor to view the OCR result; both are discussed in the following sections.

Text Converter Settings

The Text Converter Settings object is a widget containing the option widgets that let users configure Tesseract. The main settings are:

Three DComboBox widgets (a combo box re-implemented with a reset button that switches back to a default item):

  • Page Segmentation mode (psm).
  • OCR Engine mode (oem).
  • The language or script to use.

One DNumInput, an integer input widget in digiKam, used for the Tesseract DPI resolution option.

Two QCheckBox widgets: one for saving the OCR text into separate text files, and one for storing the text recognized by OCR in XMP (Extensible Metadata Platform) metadata.

TextConverterSettings is a member object of the dialog. Here is a visualization of the OCR settings:

figure1
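The reset behaviour of DComboBox described above can be modelled in a few lines. ResettableChoice is a hypothetical, Qt-free class, not digiKam's DComboBox; it only illustrates the idea of remembering a default item that a reset button restores.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a combo box whose reset button restores a default item.
class ResettableChoice {
public:
    ResettableChoice(std::vector<std::string> items, std::size_t defaultIndex)
        : items_(std::move(items)), default_(defaultIndex), current_(defaultIndex) {
        if (default_ >= items_.size()) throw std::out_of_range("default index");
    }

    // The user picks another entry.
    void select(std::size_t index) {
        if (index >= items_.size()) throw std::out_of_range("index");
        current_ = index;
    }

    // What the reset button does: go back to the default entry.
    void reset() { current_ = default_; }

    bool isDefault() const { return current_ == default_; }
    const std::string& current() const { return items_[current_]; }

private:
    std::vector<std::string> items_;
    std::size_t default_;
    std::size_t current_;
};
```

For example, an engine-mode choice could hold the entries "0" to "3" with default index 3, which is Tesseract's "Default, based on what is available" engine mode.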

Text Editor

A text editor displays the text detected in the image: a QTextEdit combined with the Sonnet spell checker, showing the recognized text from a scanned document. When I select a file in the list on the left, the editor content changes accordingly. The text editor is DTextEdit, which combines these two functionalities.

figure1

Text Converter List

figure1

The list of images to be processed by OCR, whose URLs point to the pictures, is based on a generic list for all plugins built on QTreeWidget. TextConverterList inherits from DItemList, an abstract item list, and is composed of DItemsListViewItem items, an interface for tree widget items that hold rows of information. Rows usually contain several columns of data, each of which can contain a text label and an icon.

Each text converter item has four columns:

Column | Description
File Name | A URL pointing to the image.
Recognized Words | The number of words recognized.
Target File | The target file where the text converted from the image is saved.
Status | An indication of the processing state.
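Each row of the list could be backed by a small record holding these four columns. This sketch is hypothetical: TextConverterRow, OcrStatus, countWords, targetFileFor and markDone are not from the plugin, and counting whitespace-separated tokens and saving to a .txt file next to the image are assumptions about how the Recognized Words and Target File columns might be filled in.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical processing states for the Status column.
enum class OcrStatus { Waiting, Processing, Done, Failed };

// Hypothetical row model mirroring the four columns above.
struct TextConverterRow {
    std::string fileUrl;                    // File Name: where the image lives
    std::size_t recognizedWords = 0;        // Recognized Words
    std::string targetFile;                 // Target File: where the text is saved
    OcrStatus status = OcrStatus::Waiting;  // Status
};

// Counts whitespace-separated tokens in the OCR output.
std::size_t countWords(const std::string& text) {
    std::istringstream stream(text);
    std::string token;
    std::size_t count = 0;
    while (stream >> token) ++count;
    return count;
}

// Replaces the image extension (if any) with ".txt".
std::string targetFileFor(const std::string& imagePath) {
    const std::size_t slash = imagePath.find_last_of("/\\");
    const std::size_t dot = imagePath.find_last_of('.');
    const bool hasExtension =
        dot != std::string::npos && (slash == std::string::npos || dot > slash);
    return (hasExtension ? imagePath.substr(0, dot) : imagePath) + ".txt";
}

// Fills in the remaining columns once OCR has finished for this row's image.
void markDone(TextConverterRow& row, const std::string& recognizedText) {
    row.recognizedWords = countWords(recognizedText);
    row.targetFile = targetFileFor(row.fileUrl);
    row.status = OcrStatus::Done;
}
```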

Here is a screenshot of the text converter list widget that I implemented:

figure1

Results

The following image shows the architecture and the position of each widget component. This visualization made the implementation easier:

figure1

Here is the expected GUI for the plugin's batch processing:

figure1

Main commits

3f9d7895

81bf0d9c

5970d961

f17dce69

Next step

In the next few weeks, I will:

  • Implement the OCR Tesseract engine object used to extract text from the image.
  • Implement internal multi-threading for OCR processing of images.
  • Polish and re-implement code if necessary.
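For the planned multi-threaded processing, one simple shape is to launch one asynchronous task per image and collect the results in input order. This is a sketch under assumptions: recognizeText is a hypothetical placeholder for the OCR engine call, and std::async is used only for illustration, not as the threading mechanism the plugin will use.

```cpp
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for running the OCR engine on one image.
std::string recognizeText(const std::string& imagePath) {
    return "text from " + imagePath;
}

// Runs OCR on every image concurrently and returns the results in input order.
std::vector<std::string> recognizeAll(const std::vector<std::string>& imagePaths) {
    std::vector<std::future<std::string>> jobs;
    jobs.reserve(imagePaths.size());
    for (const auto& path : imagePaths) {
        // std::launch::async forces each job onto its own thread.
        jobs.push_back(std::async(std::launch::async, recognizeText, path));
    }
    std::vector<std::string> results;
    results.reserve(jobs.size());
    for (auto& job : jobs) {
        results.push_back(job.get());  // blocks until this image is finished
    }
    return results;
}
```

One task per image is simple but unbounded; for large batches, a fixed-size worker pool would keep the number of concurrent Tesseract runs under control.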