
Wednesday, 16 June 2021

Calamares, a Linux system installer used by a few dozen different distros, is hosted on GitHub. The source code and issues live there, and the website is run from GitHub Pages. This post meanders around GitHub Actions – things that happen in response to changes in a project – and how I built a Matrix-notification-thing for Calamares.

Calamares is not a KDE project – call it KDE-adjacent, I guess – so the technology and topics here are slightly different from what a KDE project would use on KDE Invent, which is a GitLab instance.

The project has two main repositories (calamares and calamares-extensions), and project participants are interested in notifications about two main things:

  • issues opened and closed
  • Continuous Integration builds (CI, after a push and nightly across more platforms)

GitHub Actions provides a (terrible, YAML-based) language for writing down things to do and when to do them. The Calamares project uses this to do notifications for the things we’re interested in. At some point, I think GitHub had specific integrations for IRC notification; since then Calamares has moved most of its messaging to Matrix and we had to re-jig the setup.

All I Wanted Was a Shell Script

Sending notifications to Matrix is actually pretty simple: anything that can do an HTTP POST request will do, which means that the Swiss-army-knife called curl is your best friend.

The Matrix API Documentation is pretty good (and curl-centric). I registered a new account called calamares-bot on the Matrix.org homeserver, and then went through the following steps:

  • Get an access token by following the steps in the Login section (there’s an example right after this list). This gives me a random string which I’ll call MATRIX_TOKEN.
  • Figure out the room ID for #calamares:kde.org. I think I started up Element in a web browser to do that, since the “joining a room via an alias” documentation didn’t make sense to me – and all I have is a canonical name, I don’t even know if that counts as an alias. This is another random string, which I’ll call MATRIX_ROOM.
  • The two strings can be used to send messages to a room:
    curl -XPOST -d '{"msgtype":"m.text", "body":"hello"}' \
      "https://matrix.org/_matrix/client/r0/rooms/$MATRIX_ROOM/send/m.room.message?access_token=$MATRIX_TOKEN"
    

    Here, the documentation is pretty good again (copy-and-paste errors above are my fault).
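For completeness, the access token from the first step comes out of a login call along these lines – a sketch based on the same documentation, with the bot account’s password left as a placeholder:

curl -XPOST -d '{"type":"m.login.password", "user":"calamares-bot", "password":"…"}' \
  "https://matrix.org/_matrix/client/r0/login"

The JSON response contains an access_token value, which is the string to squirrel away as MATRIX_TOKEN.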

So what I really want is to run that curl command in response to new issues and build results. It’s just a (one-line!) shell-script, how hard can it be?

Just One Shell Script

GitHub has an “actions marketplace” which encourages you to use arbitrary scripts from other users as actions. Those scripts run in the context of your own CI, along with whatever secrets you hand them. What could possibly go wrong?

The documentation encourages referring to the arbitrary scripts from third parties by the full hash of the corresponding Git commit. I suppose full hashes will mitigate most of the supply-chain attacks on actions, but I fully expect some popular action to be bought up and have a tag changed leading to CI compromises. There’s also script hardening documentation which is both terrifying and laughable (I can see that my own scripts are vulnerable; this is C++ level of footgunnery, provided via Javascript and YAML).

With that in mind, I set out to do it myself, in a repository controlled by the Calamares project. That way, at worst, we compromise our own CI and not somebody else’s.

Initially I had a shell script, in the Calamares repository itself, to do the notifications, but there’s a curiosity: actions run with no checkout of the repository, so to get the script I’d have to clone the repo first. For CI builds that makes sense, but not so much for notification of issues being opened or closed.

So I need my shell script (remember, this is going to be one line which calls curl with suitable data) to live somewhere that doesn’t need to be checked out.

But They Wouldn’t Give It To Me

Regarding this section title, yes they would, but otherwise I’d miss the opportunity to misquote more bits of Suicidal Tendencies’ “Institutionalized”.

What’s not immediately obvious to my old-school shell-scripting mind is that GitHub’s “compound runners” can effectively be shell scripts, that they can live in a separate repository in subdirectories, and that they’re magically available.

Here is the YAML bla-bla that wraps my one-line shell invocation of curl. There are input and output parameters being defined and then used in the script – the script being the value of the run key down at the bottom.
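Roughly, the shape of it is this – a from-memory sketch rather than the exact Calamares file, with illustrative input names:

name: "Matrix Notify"
description: "Send a message to a Matrix room"
inputs:
  token:
    description: "Matrix access token"
    required: true
  room:
    description: "Matrix room ID"
    required: true
  message:
    description: "Message body"
    required: true
runs:
  using: "composite"
  steps:
    - shell: bash
      run: |
        curl -XPOST -d "{\"msgtype\":\"m.text\", \"body\":\"${{ inputs.message }}\"}" \
          "https://matrix.org/_matrix/client/r0/rooms/${{ inputs.room }}/send/m.room.message?access_token=${{ inputs.token }}"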

So at this point I have

  • a separate git repository,
  • containing a subdirectory,
  • with 25 lines of YAML directing one line of shell-code.

But it’s easy to call! All I need is 5 lines of YAML at the call site (e.g. a GitHub workflow) to invoke the action: one to name it, and 4 lines to pass the parameters in (the strings MATRIX_TOKEN and MATRIX_ROOM and the message to send).

Since the token and room strings can be used to send messages to a given (possibly-private) room as a particular user, they open up the possibility of impersonation; they need to be secret. The repository and organization Settings pages on GitHub have a tab where a secret string can be named and stored. Then the string is available to actions, but won’t be displayed or logged unless you do it deliberately.
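Put together in a workflow file, the call site ends up looking something like this – the uses: path is hypothetical (point it at wherever the action actually lives), and the secrets are the ones stored as described above:

name: issue-notify
on:
  issues:
    types: [opened, closed]
jobs:
  notify:
    runs-on: ubuntu-latest
    steps:
      - uses: calamares/actions/matrix-notify@main
        with:
          token: ${{ secrets.MATRIX_TOKEN }}
          room: ${{ secrets.MATRIX_ROOM }}
          message: "Issue ${{ github.event.issue.number }} ${{ github.event.action }}: ${{ github.event.issue.title }}"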

.. I’ll probably get hit by a bus anyway

I’m not going to think about how many different VMs or docker instances or whatnot get spun up for the purposes of notification – it happens on GitHub’s Azure backend, all automagically. No wonder people mine crypto as part of their CI pipeline.

In overall terms of code length / lines written, this is remarkably inefficient; the only thing gained is that I have a cargo-cult 5-liner for any workflow step that needs notifications (e.g. issues changed or a CI run completed). If I need notifications from other parts of the Calamares project – who knows, there might be a Pull-Request-Notifier at some point if external participation picks up – then I’ll have saved a little time.

For now, though, color me unimpressed with the whole thing. It’s a sunk cost, and part of increasing inertia to help with lock-in on the platform (re-doing this for GitLab would mean changing all the bla-bla surrounding the one line curl invocation). The only upside is that I’m running my untrusted code, and not somebody else’s (and I run my code every day, anyway).

PS. You can use this action yourself; if you trust me (bambi-eyes), invoke the action directly; if you don’t, copy the action into a repository of your own. There are also other, similar Matrix-notifiers for GitHub Actions with more-or-less fancy features.

Hello!

Akademy starts in a few days, and the Champions and I will be focusing on that. However, there are still some interesting updates we’d like to share with you.

Let’s jump right in!

Akademy 2021 logo

Wayland

With every recent Plasma update (and especially the just released version 5.22) the list of features that are X11 exclusive gets smaller and smaller.

Conversely, many users may not be aware that the opposite is also happening: every day there are more features available on Wayland that cannot be found on X11!

There are many resources available describing the security advantages of Wayland over X11, but the ageing protocol has some other shortcomings as well. For example, the last update we highlighted was the recently released VRR support in 5.22. Among other things, this enables an important use case for me: it allows each of my connected displays to operate at their highest refresh rate. I have a 144Hz main display, but occasionally I plug in my TV, which works at 60Hz. Because of limitations of X11, for everything to work, my main display needs to be limited to 60Hz when both of them are active. But not any more thanks to Wayland!

While the KDE developers always try to bring new functionalities to all users, the above example shows that sometimes, either due to X11 limitations or for other reasons, feature parity will not be possible.

For different, but similar reasons, some other features that are Wayland exclusive are:

You can be sure that the list of Wayland exclusives will grow and grow as work progresses.

Méven and Vlad will have a Wayland Goal talk at Akademy. Check out the details here: https://conf.kde.org/event/1/contributions/5/

Consistency

When you think about consistency, you may think of how different parts of your Plasma desktop should look and behave in a similar way, like scrollbars should all look the same on all windows. Or like when you open a new tab, it should always open in the same place, regardless of the app.

But the KDE developers also think about the bigger picture, like: How can we achieve a consistent experience between your desktop and your phone? Here’s where Kirigami comes in! It makes sense to have applications like NeoChat and Tok on both Plasma desktop and Plasma Mobile, and, thanks to the Kirigami framework, users will feel at home on both form factors. Now I want to see Kirigami apps on a smartwatch!

NeoChat desktop and mobile powered by Kirigami

Speaking of Kirigami, there is work being done on a component called “swipenavigator” to increase its - you guessed it - consistency, among other fixes. Details of the rewrite are in the merge request.

Do you care about looks? Then you’ll be interested in two MRs: one regarding better shadows, and the other the “Blue Ocean” style for buttons, checkboxes etc. There are some details on Nate’s blog.

Our Consistency Champion Niccolò has a Goal talk during Akademy, so be sure to watch it!

KDE is All About the Apps

As announced on the community mailing list and the Goals matrix room, there was a meeting last Monday to discuss the way forward with the huge list of topics mentioned in the previous update.

In the meeting, the conclusion was to start with the topics regarding the different platforms we support, as well as the automation of the build/release process of apps.

Taking advantage of the upcoming Akademy, the topics will be discussed during the BoF sessions. Check out the schedule to see when you can attend! Also, don’t miss the “Creating Plasma Mobile apps” BoF!

Of course, like the other Goal Champions, Aleix will have a talk on the first day of Akademy. Don’t miss it!

Meta

Right after the three Goal talks at Akademy, there will be a KDE Goals round table, a session where Lydia and I will be joined by the Champions to answer community questions regarding the specific goals, and the goal program as a whole.

Later in the event, on Monday June 21st at 18:00 UTC, I will conduct a BoF regarding selecting the next goals! Be sure to join in, if you were thinking about becoming a Champion yourself, or if you’re just curious about the process.

See you there!

This is how I would look in the Akademy t-shirt, if Akademy was an in-person event this year. And held outside.

If you haven't read my previous blog post on coroutines in C++, you'll want to do that before reading this one.

In the last blog post, I explained how to construct an awaitable Future type.

If you've been tinkering with coroutines, you may have tried the following thing to allow awaiting on a pointer type:

auto operator co_await(QNetworkReply* it) {
    // ...
}

As far as the C++ compiler is concerned, this ain't kosher, because you're trying to define an operator for a primitive type (a pointer). Your compiler will probably tell you that this needs to be done on a class or enum type.

But that leaves the question of “how do I make a QNetworkReply* co_awaitable if I can't define co_await on a pointer type?” It is possible.

Your promise_type object has more jobs than what I covered in the last blog post. One of them is to potentially provide an await_transform function.

An await_transform function essentially “preprocesses” any values being co_awaited before the compiler attempts to look for a co_await implementation.

In code, if await_transform is defined on a promise_type, any co_await that looks like this:

auto response = co_await pendingValue;

will actually become:

auto response = co_await await_transform(pendingValue);

This is how we co_await a type you can't provide an operator co_await for: transform it into a type you can.

To integrate a QNetworkReply* into our coroutine types from before, this means we need to make an await_transform function that takes a QNetworkReply* and returns a future. This is fairly simple.

auto await_transform(QNetworkReply* it) {
    auto future = Future();

    QObject::connect(it, &QNetworkReply::finished, it, [future, it]() {
        if (it->error() == QNetworkReply::NoError) {
            future.succeed(it);
        } else {
            future.fail(it);
        }
    });

    return future;
}

Make a future, mark as success if the reply succeeds, mark as failure if it doesn't succeed. Easy peasy.

Plonk this function in your promise_type from before.

This now allows you to do this:

Future fetch(QString url) {
    auto response = co_await nam.get(url);
    if (response.error() == QNetworkReply::NoError) {
        co_return response.readAll();
    }
    co_return false;
}

You can implement an await_transform for any input you like, and return anything you like, as long as you can co_await it.
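One caveat worth knowing, with a minimal sketch of the fix: once your promise_type declares any await_transform, every co_await expression in that coroutine is routed through it, so values that were already awaitable need an overload too. An identity pass-through is enough (the exact signature depends on how your Future from the previous post is defined):

// Pass plain Futures through unchanged, so they stay co_await-able
// alongside the QNetworkReply* overload above.
auto await_transform(Future future) { return future; }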

The full code for this blog post can be found in a single-file format at https://invent.kde.org/-/snippets/1716. Compile with C++20 and -fcoroutines-ts -stdlib=libc++ for clang, and -fcoroutines for gcc.

A full library built on the style of coroutines introduced here (with type safe template variants) + other goodies for asynchronous Qt is available at https://invent.kde.org/cblack/croutons.

That's all for this blog post. Stay tuned for more C++-related shenanigans.

Contact Me

If you didn't understand anything here, please feel free to come to me and ask for clarification.

Or, want to talk to me about other coroutine stuff I haven't discussed in these blog posts (or anything else you might want to talk about)?

Contact me here:

Telegram: https://t.me/pontaoski Matrix: #pontaoski:tchncs.de (prefer unencrypted DMs)

Tags: #libre

Recently my 4-year-old stepson saw a kid with an RC racing car in a park. He really wanted his own, but with Christmas and his birthday still being a long way away, I decided to solve the “problem” by combining three things I’m really passionate about: LEGO, electronics and programming.

In this short series of blogs I’ll describe how to build one such car using LEGO, Arduino and a bit of C++ (and Qt, of course!).

LEGO

Obviously, we will need some LEGO to build the car. Luckily, I bought the LEGO Technic Mercedes Benz Arocs 3245 (42043) last year. It’s a big build with lots of cogs, one electric engine and a bunch of pneumatics. I can absolutely recommend it - building the set was a lot of fun and thanks to the Power Functions it has a high play-value as well. There’s also a fair amount of really good MOCs, especially the MOC 6060 - Mobile Crane by M_longer. But I’m digressing here. :)

Mercedes Benz Arocs 3245 (42043)

The problem with Arocs is that it only has a single Power Functions engine (99499 Electric Power Functions Large Motor) and we will need at least two: one for driving and one for steering. So I bought a second one. I bought the same one, but a smaller one would probably do just fine for the steering.

LEGO Power Functions engine (99499)

I started by prototyping the car and the drive train, especially how to design the gear ratios to not overload the engine when accelerating while keeping the car moving at reasonable speed.

First prototype of engine-powered LEGO car

Turns out the 76244 Technic Gear 24 Tooth Clutch is really important as it prevents the gear teeth skipping when the engine stops suddenly, or when the car gets pushed around by hand.

76244 Technic Gear 24 Tooth Clutch

Initially I thought I would base the build of the car on some existing designs but in the end I just started building and I ended up with this skeleton:

Skeleton of the first version of the RC car

The two engines are in the middle - rear one powers the wheels, the front one handles the steering using the 61927b Technic Linear Actuator. I’m not entirely happy with the steering, so I might rework that in the future. I recently got Ford Mustang (10265) which has a really interesting steering mechanism and I think I’ll try to rebuild the steering this way.

Wires

58118 Electric Power Functions Extension Wires

We will control the engines from Arduino. But how to connect the LEGO Power Functions to an Arduino? Well, you just need to buy a bunch of those 58118 Electric Power Functions Extension Wires, cut them and connect them with DuPont cables that can be connected to a breadboard. Make sure to buy the “with one Light Bluish Gray End” version - I accidentally bought cables which had both ends light bluish, but those can’t be connected to the 16511 Battery Box.

We will need 3 of those half-cut PF cables in total: two for the engines and one to connect to the battery box. You probably noticed that there are 4 connectors and 4 wires in each cable. Wires 1 and 4 are always GND and 9V, respectively, regardless of the position of the switch on the battery pack. Wires 2 and 3 are 0V and 9V or vice versa, depending on the position of the battery pack switch. This way we can control the engine rotation direction.

Schematics of PF wires

For the two cables that will control the engines we need all 4 wires connected to the DuPont cable. For the one cable that will be connected to the battery pack we only need the outer wires to be connected, since we will only use the battery pack to provide the power - we will control the engines using the Arduino and an integrated circuit.

I used a glue gun to connect the PF wires and the DuPont cables, which works fairly well. You could use a soldering iron if you have one, but the glue also works as an insulator to prevent the wires from short-circuiting.

LEGO PF cable connected to DuPont wires

This completes the LEGO part of this guide. Next comes the electronics :)

Arduino

To remotely control the car we need some electronics on board. I used the following components:

  • Arduino UNO - to run the software, obviously
  • HC-06 Bluetooth module - for remote control
  • 400-pin breadboard - to connect the wiring
  • L293D integrated circuit - to control the engines
  • 1 kΩ and 2 kΩ resistors - to reduce voltage between Arduino and BT module
  • 9V battery box - to power the Arduino board once on board the car
  • M-M DuPont cables - to wire everything together

The total price of those components is about €30, which is still less than what I paid for the LEGO engine and PF wires.

Let’s start with the Bluetooth module. There are some really nice guides online on how to use them; I’ll try to describe it quickly here. The module has 4 pins: RX, TX, GND and VCC. GND can be connected directly to Arduino’s GND pin. VCC is the power supply for the Bluetooth module. You can connect it to the 5V pin on the Arduino. Now for the TX and RX pins. You could connect them to the RX and TX pins on the Arduino board, but that makes it hard to debug the program later, since all output from the program will go to the Bluetooth module rather than our computer. Instead connect them to pins 2 and 3. Warning: you need to use a voltage divider for the RX pin, because Arduino operates on 5V, but the HC-06 module operates on 3.3V. You can do it by putting a 1kΩ resistor between Arduino pin 3 and HC-06 RX and a 2kΩ resistor between Arduino GND and HC-06 RX pins.
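For reference, that divider puts the HC-06 RX input at 5 V × 2 kΩ / (1 kΩ + 2 kΩ) ≈ 3.3 V, which is just what the module expects.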

Next comes the L293D integrated circuit. This circuit will allow us to control the engines. While in theory we could hook up the engines directly to the Arduino board (there are enough free pins), in practice it’s a bad idea. The engines need 9V to operate, which is a lot of power drain for the Arduino circuitry. Additionally, it would mean that the Arduino board and the engines would both be drawing power from the single 9V battery used to power the Arduino.

Instead, we use the L293D IC: we connect an external power source (the LEGO Battery pack in our case) to it as well as the engines, and use only a low-voltage signal from the Arduino to control the current from the external power source to the engines (very much like a transistor). The advantage of the L293D is that it can control up to 2 separate engines and it can also reverse the polarity, allowing us to control the direction of each engine.

Here’s the schematic of the L293D:

L293D schematics

To sum it up, pin 1 (Enable 1,2) turns on the left half of the IC, pin 9 (Enable 3,4) turns on the right half of the IC. Hook it up to Arduino's 5V pin. Do the same with pin 16 (VCC1), which powers the overall integrated circuit. The external power source (the 9V from the LEGO Battery pack) is connected to pin 8 (VCC2). Pin 2 (Input 1) and pin 7 (Input 2) are connected to the Arduino and are used to control the engines. Pin 3 (Output 1) and pin 6 (Output 2) are output pins that are connected to one of the LEGO engines. On the other side of the circuit, pin 10 (Input 3) and pin 15 (Input 4) are used to control the other LEGO engine, which is connected to pin 11 (Output 3) and pin 14 (Output 4). The remaining four pins in the middle (4, 5, 12 and 13) double as ground and heat sink, so connect them to GND (ideally both the Arduino and the LEGO battery GND).

Since we have 9V LEGO Battery pack connected to VCC2, sending 5V from Arduino to Input 1 and 0V to Input 2 will cause 9V on Output 1 and 0V on Output 2 (the engine will spin clockwise). Sending 5V from Arduino to Input 2 and 0V to Input 1 will cause 9V to be on Output 2 and 0V on Output 1, making the engine rotate counterclockwise. Same goes for the other side of the IC. Simple!
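To make that control logic concrete, here is a minimal Arduino sketch of just this part – the pin numbers are placeholders for whichever Arduino pins you wire to Input 1 and Input 2 (Enable 1,2 and VCC1 tied to 5V as described above); the actual remote-control software comes in the next part:

// Spin one LEGO engine back and forth through the L293D.
// input1Pin/input2Pin are placeholders for the Arduino pins wired to
// the L293D Input 1 and Input 2; Enable 1,2 is assumed tied to 5V.
const int input1Pin = 5;
const int input2Pin = 6;

void setup() {
    pinMode(input1Pin, OUTPUT);
    pinMode(input2Pin, OUTPUT);
}

void loop() {
    digitalWrite(input1Pin, HIGH);  // 5V on Input 1, 0V on Input 2: spins one way
    digitalWrite(input2Pin, LOW);
    delay(2000);

    digitalWrite(input1Pin, LOW);   // reversed inputs: spins the other way
    digitalWrite(input2Pin, HIGH);
    delay(2000);
}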

Photo of all electronic components wired together

Conclusion

I also built a LEGO casing for the Arduino board and the breadboard to attach them to the car. With some effort I could probably rebuild the chassis to allow the casing to “sink” lower into the construction.

Photo of LEGO car with the electronics on board

The battery packs (the LEGO Battery Box and the 9V battery case for the Arduino) are nicely hidden in the middle of the car on the sides next to the engines.

Photo of LEGO Battery Box Photo of Arduino 9V battery case

Now we are done with the hardware side - we have a LEGO car with two engines and all the electronics wired together and hooked up to the engines and battery. In the next part we will start writing software for the Arduino board so that we can control the LEGO engines programmatically. Stay tuned!

Tuesday, 15 June 2021

Every so often there appear some new pics from developer builds of Windows or even leaks such as the recent Windows 11 preview screenshots. More or less every time this happens there are comments from the Linux side that Windows is copying KDE Plasma – a desktop environment that is, granted, among the most similar…

Project Overview

GCompris is a high-quality educational software suite, including a large number of activities for children aged 2 to 10. Some of the activities are game-oriented, but nonetheless still educational.

Currently GCompris offers more than 100 activities, and more are being developed. GCompris is free software, which means that you can adapt it to your own needs, improve it, and most importantly share it with children everywhere.

The GCompris project is hosted and developed by the KDE community.

My project goals include adding four new activities to GCompris:

  • Subtraction decimal number activity.
  • Addition decimal number activity.
  • Programming maze loops activity.
  • Mouse control action activity.

Community Bonding Period

During this period I interacted with my mentors and discussed multiple design aspects of extending the original decimal activity so that it can support both an addition and a subtraction activity for decimal numbers. I started adding the decimal point character to the numPad to support typing it in the decimal activities, and I added a task for each activity on Phabricator to track the progress.

Decimal Activity

First Week Report

So, the first week of the coding period has ended. It was exciting and full of challenges. I am happy that I am on the right track and making progress as I promised. I started by adding the subtraction decimal number activity; its goal is to learn subtraction with decimal numbers.

Here is a quick summary of the work done last week:

  1. Adding multiple datasets, from which we generate two different decimal numbers.
  2. Creating a new component MultipleBars.qml, in which the largest decimal number is represented as multiple bars; each bar consists of ten square units, some of which are semi-transparent according to the largest number shown.
  3. Adding numPad.qml to the activity, so that we can ask the child to type the result once they have represented it correctly.
  4. Adding TutorialBase.qml including instructions on how to play with the activity.
What’s next?

I will start implementing the addition decimal activity, and wait for my mentors’ reviews on the subtraction decimal activity as it is still in progress.

I am delighted to have such an enthusiastic, helpful and inspiring community.

That’s all for now. See you next time!

Thanks for reading!

Once again I plan to be at Akademy. I almost silently attended last year’s edition. OK… I had a talk there but didn’t blog. I didn’t even post my traditional sketchnotes post. I plan to do better this year.

I’ll try to sketchnote again, we’ll see how that works out. Oddly enough, I might do the 2020 post after the 2021 one. 😀

This year I’ll also be holding a training and a couple of talks. Last but not least, I’ll attend the KF6 BoF. I’ll see if I can attend a couple more, but that’ll mainly depend on how compatible they are with my schedule otherwise.

Also, I’m particularly proud to be joined by a couple of colleagues from enioka Haute Couture. Without further ado here is where you will or might find us:

  • Friday 18 June, starting at 18:00 CEST, I’ll be holding a 4-hour (!) training about the KDE Stack. If you’re interested in getting a better understanding of how KDE has built the stack for its applications and workspaces, but also of how all the pieces are tied together, this will be the place to be;

  • Saturday 19 June, at 12:20 CEST, my colleague Méven Car will give an update about the Wayland Goal, he’ll be joined by Vlad Zahorodnii;

  • Following up his talk, at 13:00 CEST, Méven will also participate in the KDE Goals roundtable;

  • Still the same day, at 21:00 CEST, I’ll be on stage again to talk about KDE Frameworks Architecture; I’ll go back to how it’s structured in KF5 and will propose a potential improvement for KF6;

  • On Monday 21 June, a bunch of eniokians will participate in the KDE e.V. general assembly;

  • Sometime during the week I’ll participate in the KF6 BoF (not scheduled yet at the time of writing); obviously I’ll be interested in discussing the ideas from my talk with the rest of the KDE Frameworks contributors;

  • And finally, Friday 25 June, at 19:00 CEST, I’ll be joined by my colleague Christelle Zouein for our talk about community data analytics, we got a bunch of new toys to play with thanks to Christelle’s work and the community move towards GitLab and we’ll show some results for the first time.

Of course it also means I’m on my way to… ah well… no, I’m not on my way. I’ll just hook up my mic and camera like everyone else. See you all during Akademy 2021!

Bug triaging is a largely invisible and often thankless task. But it’s the foundation of quality in our software offerings. Every day, our users file between 30 and 50 bug reports on https://bugs.kde.org, and often up to 100 right after a big release! Many will be duplicates of pre-existing issues and need to be marked as such. Quite a few will be caused by issues outside of KDE’s control and this also needs to be marked as such. Many will be crash reports with missing or useless backtraces, and their reporters need to be asked to add the missing information to make the bug report actionable. And the rest need to be prioritized, moved to the right component, tagged appropriately, and eventually fixed.

All of this sounds pretty boring. And, to be honest, sometimes it is (I’m really selling this, right?). But it’s critically important to everything we do. Because when it’s not done properly:

  1. Users don’t feel listened to, and start trashing us and our software on social media.
  2. Critical regressions in new releases get missed and are still visible when reviewers check out the new version, so they also trash it in high-profile tech articles and videos.
  3. Un-actionable bug reports pile up and obscure real issues, so developers are less likely to notice them and fix them.
  4. Bugs that occur over and over again don’t accumulate dozens of duplicates, don’t look important enough to prioritize, and therefore don’t get fixed.
  5. Easy-to-fix bugs don’t get fixed by anyone and it’s pretty embarrassing.

Do you see a pattern? Most of these results end up with KDE software being buggier and KDE’s reputation being damaged. It’s not an accident that KDE’s software is less buggy than ever before and that we enjoy a good reputation today. These positive developments are driven by everyone involved, but they rest upon an invisible foundation of good bug triage.

And as KDE software becomes more popular, users file more bug reports. So the need for bug triage constantly grows. Currently it is done by just a few people, and we need help. Your help! And it will truly be helpful! If you are a meticulous, detail-oriented person with some technical inclination but no programming ability, triaging bug reports may just be the best way to help KDE.

If this sounds like something you’d like to get involved with, go over to https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging and give it a read! I would be happy to offer personal bug triaging mentorship, too. Just click the “Contact me” link here or at the top of the page and I’ll help you get started.

Like most online manuals, the Krita manual has a contributor’s guide. It’s filled with things like “who is our assumed audience?”, “what is the dialect of English we should use?”, etc. It’s not a perfect guide, outdated in places, definitely, but I think it does its job.

So, sometimes I, who officially maintain the Krita manual, look at other projects’ contributor’s guides. And usually what I find there is…

Style Guides

The purpose of a style guide is to obtain consistency in writing over the whole project. This can make the text easier to read and simpler to translate. In principle, the Krita manual also has a style guide, because it stipulates you should use American English. But when I find style guides in other projects it’s often filled with things like “active sentences, not passive sentences”, and “use the Oxford comma”.

Active sentences versus passive sentences always gets me. What it means is the difference between “dog bites man” and “man is bitten by dog”. The latter sentence is the one in the passive voice. There’s nothing grammatically incorrect about it. It’s a bit longer, sure, but on the other hand, there’s value in being able to rearrange the sentence like that. For a more Krita specific example, consider this:

“Pixels are stored in working memory. Working memory is limited by hardware.”

“Working memory stores the pixels. Hardware limits the working memory.”

The first example is two sentences in the passive voice, the latter two in the active. The passive voice example is longer, but it is also easier to read, as it groups the concepts together and introduces new concepts later in the paragraph. Because we grouped the concepts, we can even merge the sentences:

“Pixels are stored in working memory, which is limited by hardware.”

But surely, if so many manuals have this in their guide, maybe there is a reason for it? No, the reason it’s in so many manuals’ style guide, is because other manuals have it there. And the reason other manuals have it there, is because magazines and newspapers have it there. And the reason they have that, is because it is recommended by style guides like The Elements of Style. There is some(?) value for magazines and newspapers in avoiding the passive voice because it tends to result in longer sentences than the active voice, but for electronic manuals, I don’t see the point of worrying about things like these. We have an infinite word count, so maybe we should just use that to make the text easier to read?

The problem of copying style rules like this is also obfuscated by the fact that a lot of people don’t really know how to write. In a lot of those cases, the style guide seems to be there to allow role-playing that you are a serious documentation project, if not a case of ‘look busy’, and it can be very confusing to the person whose text is being proofread. I’ve accepted the need for the active voice in my university papers, because I figured my teachers wanted to help me lower my word count. I stopped accepting it when I discovered they couldn’t actually identify the passive voice, pointing at paragraphs that needed no work.

This kind of insecurity-driven proofreading becomes especially troublesome when you consider that sometimes “incorrect” language is caused by the writer using a dialect. It makes sense to avoid dialect in documentation, as dialects contain specific language features that not everyone may know, but it’s a whole other thing entirely to tell people their dialect is wrong. So in these cases, it’s imperative that the proofreader knows why certain rules are in place, so they can communicate why something should be changed without making the dialect speaker insecure about their language use.

Furthermore, a lot of such style guide rules are filled with linguistic slang, which is abstract and often derived from Latin. People who are not confident in writing will find such terms very intimidating, as well as hard to understand, and this in turn leads to people being less likely to contribute. In a lot of those cases, we can actually identify the problems in question via a computer program. So maybe we should just do that, and not fill our contributor’s guide with scary linguistic terms?

A section of the animation curves reStructuredText as shown in the gitlab UI. Several words are bolded.
One of our animation programmers, Emmet, has a tendency to bold words in the text, which helps the reader find their way in large pieces of text. We do this nowhere else in the manual, but I’m okay with it? The only thing that bothers me is that the markup used is the one for strong/bold, even though the sentences in question have clear semantic reasons to be highlighted like this. What this means is that it won’t work really well with a screen reader (markup for bold/strong tends to be spoken with a lot of force by these readers), but this is solved by finding proper semantic markup for this. But overall, this is done with the reader’s comfort in mind, so I don’t see why we shouldn’t spend time on getting this to work instead of worrying that it’s non-standard.

LanguageTool

Despite my relaxed approach to proofreading, I too have points at which I draw the line. In particular, there are things like typos, missing punctuation, and errant white-spaces. All of these are pretty uncontroversial.

In the past, I’d occasionally run LanguageTool over the text. LanguageTool is a Java-based style and grammar checker licensed under the LGPL 2.1. It has a plugin for LibreOffice, which I used a lot when writing university papers. However, by itself LanguageTool cannot recognize mark-up. To run it over the Krita documentation, I had to first run the text through pandoc to convert from reStructuredText to plain text, which was then fed to the LanguageTool jar.

I semi-automated this task via a bash script:

#!/bin/sh

# Run this file inside the language tool folder.
# First argument is the folder, second your mother tongue.
for file in $(find $1 -iname "*.rst");
do
    pandoc -s $file -f rst -t plain -o ~/checkfile.txt 
    echo $file
    echo $file >> ~/language_errors.txt
    # Run language tool for en-us, without multiple whitespace checking and without the m-dash suggestion rule, using the second argument as the mother tongue to check for false friends.
    java -jar languagetool-commandline.jar -l en-US -m $2 --json -d WHITESPACE_RULE,DASH_RULE[1] ~/checkfile.txt >> ~/language_errors.txt
    rm ~/checkfile.txt
done

This worked pretty decently, though there were a lot of false positives (mitigated a bit by turning off some rules). It was also always a bit of a trick to find the precise location of the error, because the conversion to plaintext changed the position of the error.

I had to give up on this hacky method when we started to include python support, as that meant python code examples. And there was no way to tell pandoc to strip the code examples. So in turn that meant there were just too many false positives to wade through.

There is a way to handle mark-up, though, and that’s by writing a java wrapper around LanguageTool that parses through the marked-up text, and then tells LanguageTool which parts are markup and which parts can be analyzed as text. I kind of avoided doing this for a while because I had better things to do than to play with regexes, and my Java is very rusty.

One of the things that motivated me to look at it again was the appearance of the code quality widget in the GitLab release notes. Because one of my problems is that notions of “incorrect” language can be used to bully people, I was looking for ways to indicate that everything LanguageTool puts out is considered a suggestion first and foremost. The code quality widget is just a tiny widget that hangs out underneath the merge request description, says how many extra mistakes the merge request introduces, and is intended to be used with static analysis tools. It doesn’t block the MR, it doesn’t confuse the discussion, and it takes a JSON input, so I figured it’d be the ideal candidate for something as trivial as style mistakes.

So, I started up eclipse, followed the instructions on using the Java api (intermissioned by me realizing I had never used maven and needing a quick tutorial), and I started writing regular expressions.
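The core of the idea looks roughly like this – a sketch using LanguageTool’s documented Java API rather than my actual wrapper. Markup is fed to the builder as markup and prose as text, so the positions reported by the checker refer back to the original marked-up document:

import java.io.IOException;
import java.util.List;

import org.languagetool.JLanguageTool;
import org.languagetool.language.AmericanEnglish;
import org.languagetool.markup.AnnotatedText;
import org.languagetool.markup.AnnotatedTextBuilder;
import org.languagetool.rules.RuleMatch;

public class MarkupCheck {
    public static void main(String[] args) throws IOException {
        AnnotatedTextBuilder builder = new AnnotatedTextBuilder();
        builder.addMarkup("**");    // reStructuredText markup: skipped by the checker
        builder.addText("strong");  // actual prose: checked
        builder.addMarkup("**");
        AnnotatedText annotated = builder.build();

        JLanguageTool tool = new JLanguageTool(new AmericanEnglish());
        List<RuleMatch> matches = tool.check(annotated);
        for (RuleMatch match : matches) {
            System.out.println(match.getFromPos() + "-" + match.getToPos() + ": " + match.getMessage());
        }
    }
}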

Reusing KSyntaxHighlighter?

So, people who know KDE’s many frameworks know that we have a collection of assorted regex and similar for a wide variety of mark up systems and languages, KSyntaxHighlighter, and it has support for reStructuredText. I had initially hoped I could just write something that could take the rest.xml file and use that to identify the mark up for LanguageTool.

Unfortunately, the regex needs of KSyntaxHighlighter are very different from the ones I have for LanguageTool. KSyntaxHighlighter needs to know whether we have entered a certain context based on the mark-up, but it doesn’t really need to identify the mark-up itself. For example, the mark-up for strong in reStructuredText is **strong**.

The regular expression to detect this in rest.xml is \*\*[^\s].*\*\*, translated: Find a *, another *, a character that is not a space, a sequence of zero or more characters of any kind, another * and finally *.

What I ended up needing is: "(?<bStart>\*+?)(?<bText>[^\s][^\*]+?)(?<bEnd>\*+?)", translated: Find group of *, name it ‘bStart’, followed by a group that does not start with a space, and any number of characters after it that is not a *, name this ‘bText’, followed by a group of *, name this ‘bEnd’.

The bStart/bText/bEnd names allow me to append the groups separately to the AnnotatedTextBuilder:


if (inlineMarkup.group("bStart") != null) {
    builder.addMarkup(line.substring(inlineMarkup.start("bStart"), inlineMarkup.end("bStart")));
    handleReadingMarks(line.substring(inlineMarkup.start("bText"), inlineMarkup.end("bText")));
    builder.addMarkup(line.substring(inlineMarkup.start("bEnd"), inlineMarkup.end("bEnd")));
}

So I had to abandon adopting the KSyntaxHighlighter format for this and do my own regexes.

Results

Eventually, I had something that worked. I managed to get it to write the errors it found to a JSON file that should work with the code quality widget. I also implemented an accepted-words list, which at the very least took a third off the initial set of errors. I’ve managed to get it to find about 105 errors in the 5000-word KritaFAQ, most of which are misspelled brand names, but it also found missing commas and errant white-spaces.

A small sample of the error output:

{
    "severity": "info",
    "fingerprint": "docs-krita-org/KritaFAQ.rst:8102:8106",
    "description": "Did you mean <suggestion>Wi-Fi<\/suggestion>? (This is the officially approved term by the Wi-Fi Alliance.) (``wifi``)",
    "check_name": "WIFI[1]",
    "location": {
      "path": "docs-krita-org/KritaFAQ.rst",
      "position": {
        "end": {"line": 176},
        "begin": {"line": 176}
      },
      "lines": {"begin": 176}
    },
    "categories": ["Style"],
    "type": "issue",
    "content": "Type: Other, Category: Possible Typo, Position: 8102-8106 \n\nIt might be that your download got corrupted and is missing files (common with bad wifi and bad internet connection in general), in that case, try to find a better internet connection before trying to download again.  \nProblem: Did you mean <suggestion>Wi-Fi<\/suggestion>? (This is the officially approved term by the Wi-Fi Alliance.) \nSuggestion: [Wi-Fi] \nExplanation: null"
  },
  {
    "severity": "info",
    "fingerprint": "docs-krita-org/KritaFAQ.rst:8379:8388",
    "description": "Possible spelling mistake found. (``harddrive``)",
    "check_name": "MORFOLOGIK_RULE_EN_US",
    "location": {
      "path": "docs-krita-org/KritaFAQ.rst",
      "position": {
        "end": {"line": 177},
        "begin": {"line": 177}
      },
      "lines": {"begin": 177}
    },
    "categories": ["Style"],
    "type": "issue",
    "content": "Type: Other, Category: Possible Typo, Position: 8379-8388 \n\nCheck whether your harddrive is full and reinstall Krita with at least 120 MB of empty space.  \nProblem: Possible spelling mistake found. \nSuggestion: [hard drive] \nExplanation: null"
  },
  {
    "severity": "minor",
    "fingerprint": "docs-krita-org/KritaFAQ.rst:8546:8550",
    "description": "Use a comma before 'and' if it connects two independent clauses (unless they are closely connected and short). (`` and``)",
    "check_name": "COMMA_COMPOUND_SENTENCE[1]",
    "location": {
      "path": "docs-krita-org/KritaFAQ.rst",
      "position": {
        "end": {"line": 177},
        "begin": {"line": 177}
      },
      "lines": {"begin": 177}
    },
    "categories": ["Style"],
    "type": "issue",
    "content": "Type: Other, Category: Punctuation, Position: 8546-8550 \n\nIf not, and the problem still occurs, there might be something odd going on with your device and it's recommended to find a computer expert to diagnose what is the problem.\n \nProblem: Use a comma before 'and' if it connects two independent clauses (unless they are closely connected and short). \nSuggestion: [, and] \nExplanation: null"
  }

There are still a number of issues. Some mark-up is still not processed, I need to figure out how to calculate the column, and, simply, I am unhappy with the command line arguments (they’re positional only right now).

One of the things I am really worrying about is the severity of errors. Like I mentioned before, dialects often get targeted by things that determine “incorrect” language, and LanguageTool does have rules that target slang and dialect. Similarly, people tend to take suggestions from computers more readily without question, so, I’ll need to introduce some configuration.

  1. Configuration to turn rules on and off.
  2. Errors that are uncontroversial should be marked higher, so that people are less likely to assume all the errors should be fixed.

But that’ll be at a later point…

Now, you might be wondering: “Where is the actual screenshot of this thing working in the Gitlab UI?” Well, I haven’t gotten it to work there yet. Partially because the manual doesn’t have CI implemented yet (we’re waiting for KDE’s servers to be ready), and partially because I know nothing about CI and have barely got an idea of Java, and am kinda stuck?

But, I can run it for myself now, so I can at least do some fixes myself. I put the code up here; bear in mind I don’t remember how to use Java at all, so if I am committing Java sins, please be patient with me. Hopefully, if we can get this to work, we can greatly simplify how we handle style and grammar mistakes like these during review, as well as simplify contributor’s guides.

After many struggles with using git LFS on repositories that need to store big files, I decided to spend some time on checking the status of the built-in partial clone functionality that could possibly let you achieve the same (as of git 2.30).

TL;DR: The building blocks are there, but server-side support is spotty and critical tooling is missing. It’s not very usable yet, but it’s getting there.

How partial clone works

Normally, when you clone a git repository, all file versions throughout the whole repository history are downloaded. If you have multiple revisions of multi-GB binary files, as we have in some projects, this becomes a problem.

Partial clone lets you download only a subset of objects in the repository and defer downloading the rest, until needed. Most of the time, it means checkout.

For example, to clone a repository with blobs in only the latest version of the default branch, you can do as follows:

git clone --filter=blob:none git@example.com:repo.git

The --filter part is crucial; it tells you what to include/omit. It’s called a filter-spec across the git docs. The specifics of how you can filter are available in git help rev-list. You can filter based on blob size, location in the tree (slow! – guess why), or both.
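For example, to keep the full history but defer downloading anything bigger than about a megabyte (the limit accepts k/m/g suffixes), something like this should do:

git clone --filter=blob:limit=1m git@example.com:repo.git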

The remote from which you cloned will be called a “promisor” remote, so-called because it promises to fulfill requests for missing objects when requested later:

[remote "origin"]
    url = git@example.com:repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
    promisor = true
    partialclonefilter = blob:limit=1048576 # limited to 1M

As you change branches, the required files will be downloaded on-demand during checkout.

Below is a video of a partial checkout in action. Notice how the actual files are downloaded during the checkout operation, and not during clone:

Comparison

I checked out the Linux kernel from the GitHub mirror, through a regular and a partial clone, and recorded some statistics:

As you can see, there are some tradeoffs. Checking out takes longer because the actual file content has to be downloaded, not just copied from the object store. There are savings in terms of initial clone time and repository size because you’re not storing a copy of various driver sources deprecated since the late 90s. The gains would be even more pronounced in repositories that store multiple versions of big binary files. Think evolving game assets or CI system output.

So what are the problems?

Missing/incomplete/buggy server support

The server side needs to implement git v2 protocol. Many don’t do it yet, or do it in a limited manner.

No cleanup tool

As you check out new revisions with big files and download them, you will end up with lots of old data from the previous versions because it’s not cleaned up automatically. Git LFS has the git lfs prune tool. No such thing yet exists for partial clones. See this git mailing list thread.

No separate storage of big files on server side (yet)

Since you want server-side operations to happen quickly, it’s best to store the git repository on very fast storage, which also happens to be expensive. It would be nice to store big files that don’t really need fast operations (you won’t do diffs on textures or sound files server-side) separately.

Christian Couder of GitLab is working on something around this. It’s already possible to have multiple promisor remotes queried in succession. For example, there could be separate promisor remote backed by a CDN or cloud storage (e.g. S3). However, servers will need to learn how to push the data there when users push their trees.

See this git mailing list thread.

Generally cumbersome UX

Since everything is fresh, you need to add some magical incantations to git commands to get it working. Ideally, some “recommended” filter should be stored server-side, so that users don’t have to come up with a filter spec on their own when cloning.

Resources

Below are some useful links, if you’d like to learn more about partial cloning in git:

Currently, a lot of effort around partial cloning is driven by Christian Couder of GitLab. You can follow some of the development under the following links:

If you would like to learn Git, KDAB offers an introductory training class.

About KDAB

If you like this article and want to read similar material, consider subscribing via our RSS feed.

Subscribe to KDAB TV for similar informative short video content.

KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.

The post State of Native Big File Handling in Git appeared first on KDAB.