
Thursday, 28 September 2023

for what seems an eternity i've been running linux as my daily driver and evangelising to anyone that would listen.  however working as a carpenter and having a young family didn't leave much energy, let alone time, to contribute.  but over the course of the last year my contributions have slowly been increasing.  as we age our bodies aren't capable of the physical things that used to be easy, and this combined with some other events has allowed me to fulfil a long held ambition and contribute in a meaningful way to KDE and neon.  anyway, enough self-reflection from this antipodean.

thanks to the patience of @jriddell, @sitter and @sgmoore i'm very happy to be helping get neon out the door, so to speak.  after the big plasma 6 push that started in March, i've been improving my ruby skills while trying to get neon's tooling into the best possible shape.  as qt6 started to filter down from unstable, it became apparent that the compilation-time-saving hack of building qt5 and kde frameworks 5 once for unstable and copying them into the stable and user archives wasn't scaling any longer and couldn't be applied to qt6 and kf6.  after changing unstable to track the latest qt6.6 betas at the request of the plasma devs, stable and user started building their very own qt5 and kf5.  besides a few late nights and much swearing at versioning problems it all went swimmingly. \o/

lately my energies have been split between trying to get more apps with a dedicated kf6 branch into the experimental overlay and trying to keep up with the rapid rate of apps in unstable whose master branch has gone qt6/kf6 only.  as sgmoore pointed out, porting pim6 alone has been quite the process.  with a raft of new red/failed build jobs in unstable and qt6.6 beta4 just released, it looks like the pace isn't going to slow down for the foreseeable future.  anyhow i'll sign off with the obligatory screenshot. cheery bye ;]

kate6_on plasma6 on neon_unstable.png, Sep 2023

Wednesday, 27 September 2023

This is a longer one … it took long too.

I would apologise, but it is one of those things where it hurt me more than it will hurt you to read this extremely abbreviated version.

Setting up sudoedit

There are no two ways about it: sudoedit is the smart way to edit text files as root.

In short what sudoedit does is to:

  1. sudo,
  2. copy the file you want to edit into a temporary location,
  3. edit that file with sudo $VISUAL, or failing that sudo $EDITOR, and
  4. when you save and exit the editor, replace the original file with the temporary copy.

To set up Helix as the default text editor in the console, and KWrite when X11 or Wayland are available (both also for sudoedit), I did the following (in Fish shell):

set --universal --export EDITOR helix
set --universal --export VISUAL kwrite
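
With that in place, editing any root-owned file (the path here is just an example) becomes a single command:

sudoedit /etc/fstab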

Borg backup

Before I started messing with the Btrfs RAID, I decided to set up the backups.

I decided to stick with Borg, but to simplify setting it up and running it, I used the Borgmatic wrapper. I have to say I am very pleased with it.

First, I installed it with yay borgmatic.

I took my sweet time to set it up, but most of that was spent reading the documentation and the extensive (over 900 lines) config file template.

But ultimately my /etc/borgmatic/config.yaml consists of less than 30 actual lines and, including all the reading, I was done in an afternoon. The interesting bits I share below.

Excludes

Some things are just not worth backing up and only waste resources. So I added the following:

exclude_from:
    - /etc/borgmatic/excludes

exclude_caches: true

exclude_if_present:
    - .nobackup

Then I slapped .nobackup into directories in my home that I typically do not want backed up, like downloads, video, games and some (non-hidden) temporary folders.

The exclude_caches bit relies on people following the Cache Directory Tagging Specification, which, from what I can tell, happens pretty rarely.

In fact, locate CACHEDIR.TAG on my system shows only three such files – I hope this improves as I install more things:

/home/hook/.cache/borg/CACHEDIR.TAG
/home/hook/.cache/fontconfig/CACHEDIR.TAG
/home/hook/.cargo/registry/CACHEDIR.TAG
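
For directories of my own that really are caches but are not tagged, the tag can also be added by hand – the spec only requires the file to start with a fixed signature line. A minimal sketch (the directory path is made up):

printf 'Signature: 8a477f597d28d172789f06886806bc55' > ~/.local/share/some-tool/cache/CACHEDIR.TAG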
To filter out more caches and the rest, I created a /etc/borgmatic/excludes file that includes:
## Temporary and backup files

*~
*.tmp
*.swp

## Temporary and cache directories

**/tmp
**/temp
**/cache
**/Cache
**/.cache
**/.Cache
**.cache
**.Cache
**.ccache

## Big files

*.iso
*.img

## Top level directories to always exclude

/dev
/etc/mtab
/media
/mnt
/proc
/run
/sys
/tmp
/var/cache
/var/tmp

## Local applications to be excluded (for more use `.nobackup`)

/home/*/.local/share/akonadi
/home/*/.local/share/baloo
/home/*/.local/share/lutris
/home/*/.local/share/Trash
/home/*/.local/share/Steam
/home/*/.kde4/share/apps/ktorrent
/home/*/.config/chromium
/home/*/.wine

I have to agree with the author of the CacheDir spec: caches are plentiful and it is hard to regexp them all. If only everyone just put their cache into ~/.cache …

Relying on .nobackup is a new approach I am trying, but let us see how it works out for me. I am cautiously optimistic, as it is much easier to just touch .nobackup than it is to sudoedit /etc/borgmatic/config.yaml, enter the password and copy-paste the folder.
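
For example, to exclude a (made-up) scratch directory from future backups:

touch ~/scratch/.nobackup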

Checks

Backups are only worth anything if you can restore from them. Borg does offer checks, and Borgmatic offers a way to easily fine-tune which checks to run and how often.

With the below settings Borgmatic:

  • every day when it creates a backup, also checks the integrity of the repository and the archives
  • once a month, tries to extract the last backup
  • every three months checks the integrity of all data
checks:
    - name: repository
      frequency: always
    - name: archives
      frequency: always
    - name: extract
      frequency: 1 month
    - name: data
      frequency: 3 months

Restrict access

Of course I encrypt my backups.

To further limit exposure, access to the backup server is only possible with an SSH key. Furthermore, the ~/.ssh/authorized_keys on the server restricts access to the specific backup repository and only allows the borg serve command to be run:

command="cd {$path_to_borg_repos}; borg serve --restrict-to-path {$path_to_borg_repos}/{$repo_of_this_machine}",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc {$ssh_key}

I already had Borg running on the backup server, so I merely needed to add another line like the above to the server and set up Borgmatic on Leza.

Automate it all

At the end I ran systemctl enable --now borgmatic.timer to have systemd handle the “cronjob” on a daily basis.

Borgmatic does the heavy lifting of figuring out what exactly (if anything) it needs to do that day, so that was super simple.
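
To check that the timer is actually scheduled, and to kick off a first run by hand, something like this does the trick:

systemctl list-timers borgmatic.timer
sudo borgmatic --verbosity 1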

Btrfs RAID

Well, RAID-ifying my Btrfs was quite a ride … and I have no-one to blame but myself¹ for the issues I had.

I also have to thank everyone on the #btrfs IRC channel (esp. balrog, dicot, kepstin, multicore, opty, specing, Zygo), dalto from the EndeavourOS forum, and above all TJ from KDE’s Matrix channel for helping me dig myself out of the mess I made.

Below, I will document things as they should have been done, and add what I did wrong and how we fixed it in an expandable sub-section.

Do note that “what I should have done” is just me applying some hindsight to my errors, so it may still have holes. Also, the “what I did” parts are partially re-created from memory and omit a lot of the trial and error of trying to fix stuff.

Baloo fixed on Btrfs

The “Baloo reindexes everything after every reboot” issue is now fixed and will be part of the KDE Frameworks 5.111 release.

Add a new device

As I described in my base install blog post, the partitioning of my Goodram PX500 SSD was (roughly):

  • 1 GB — ESP
  • 990 GB — LUKS + Btrfs
  • 9 GB — LUKS + swap

The Goodram PX500 has 953,86 GiB of space.

But the Samsung 970 Evo Plus has 931,51 GiB of space.

Although both drives are marketed as 1 TB (= 931,32 GiB), there was a difference of over 20 GiB between them. And unfortunately in the wrong direction, so just the system partition on the Goodram SSD was larger than the whole Samsung SSD.

After plugging in the new SSD, what I should have done was just:

  • create a 1 GiB fat32 partition at the beginning,
  • (create a 10 GiB LUKS + swap partition at the end),
  • create a LUKS partition of whatever remains on the newly added drive.

… and simply not care that the (to-be) Btrfs partition is not the same size as on the old one.

In fact, I could have also just skipped making a swap partition on the Samsung SSD. If/when one of the drives dies, I would replace it with a new one and could just create swap on the new drive anyway.

Was that what I did?

Of course bloody not! 🤦‍♂️

Expand if you are curious about how removing and re-creating the swap partition caused me over two days of fixing the mess.

While the Btrfs gurus assured me it would totally work if I just put two differently-sized Btrfs partitions into one RAID1 – as long as I do not mind that some part of the larger partition will not be used – I still wanted to try to resize the partitions on the Goodram SSD.

Did I ignore this good advice?

Oh yes!

I am not a total maniac though, so just in case, before I did anything, I made backups, and ran both btrfs scrub and btrfs balance (within Btrfs Assistant, so I do not mess anything up).

The big complication here is that this is a multi-step approach, depending on many tools, as one needs to resize the LUKS partition as well as Btrfs within it, and there are several points where one can mess up things royally, ending with a corrupt file system.

One option would be to go the manual way through CLI commands and risk messing up myself.

The other option would be to use a GUI to make things easier, but risk that the GUI did not anticipate such a complex task and will mess things up.

After much back and forth, I still decided to give KDE Partition Manager a go and see if I can simply resize a Btrfs partition within LUKS there. In the worst case, I already had backups.

… and here is where I messed things up.

Mea culpa!

Honestly, I would have messed things up the same way if I did it in CLI.

If anything, I am impressed how well KDE Partition Manager handled such a complex task in a very intuitive fashion.

What I did then was:

  1. Resized luks+btrfs (nvme0n1p2) on Goodram to be 20 GiB smaller – this is where I thought things would break, but KDE Partition Manager handled it fine. But now I had 20 GiB of unused disk space between nvme0n1p2 (btrfs) and nvme0n1p3 (swap).
  2. To fix this I decided to simply remove the swap (nvme0n1p3) and create a new one to fill the whole remaining space.
  3. (While I was at it, I added and changed a few partition labels, but that did not affect anything.)

So, I ended up with:

Goodram PX500:

partition     size          file system   mount point
unallocated   1,00 MiB      unallocated
nvme0n1p1     1.000,00 MiB  fat32         /boot/efi
nvme0n1p2     924,55 GiB    btrfs (luks)  /
nvme0n1p3     28,34 GiB     swap (luks)   swap

Samsung 970 Evo Plus:

partition     size          file system   mount point
nvme1n1p1     1.000,00 MiB  fat32         /mnt/backup_efi
nvme1n1p2     920,77 GiB    btrfs (luks)
nvme1n1p3     9,77 GiB      swap (luks)   swap

At first things were peachy fine.

… and then I rebooted and was greeted with several pages of Dracut essentially warning me that it cannot open an encrypted partition.

Several lines of dracut-initqueue warnings about a hook timing out. Dracut gives up and offers logging into an emergency shell.

So what happened was that I forgot that since I removed and re-created nvme0n1p3 (swap), it now has a different UUID – which is why Dracut could not find it. 😅

After much trial and error and massive help from TJ, we managed to identify the problem and solution through the emergency shell. It would have been possible to do that – and probably faster – by booting from a LiveUSB too, but both TJ and I were already deeply invested and had (some kind of twisted) fun doing it in the emergency shell². Luckily the Btrfs partition got decrypted, so we could use chroot.

Long story short, this was the solution:

  1. Reboot and in GRUB edit the boot command to remove the non-existent swap partition from the kernel line.
  2. Wait during boot for systemd to give up on the non-existent swap partition.
  3. Once in my normal system, sudoedit /mnt/rootfs/etc/crypttab and sudoedit /mnt/rootfs/etc/fstab to change the UUID of the encrypted swap partition to the new partition’s UUID.
  4. sudoedit /etc/dracut.conf.d/calamares-luks.conf to change the swap partition’s UUID for the new one.
  5. sudo dracut-rebuild
  6. sudoedit /etc/default/grub – specifically the GRUB_CMDLINE_LINUX_DEFAULT line – to change the swap partition’s UUID for the new one, as well as make sure every LUKS-encrypted partition’s UUID has a rd.luks.uuid= entry there.
  7. sudo grub-install (just in case) and sudo grub-mkconfig.
  8. Reboot 😄
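
Condensed into commands – with the grub-mkconfig output path assumed to be the usual one, so treat this as a sketch rather than gospel – steps 3 to 7 boil down to:

sudoedit /mnt/rootfs/etc/crypttab                   # swap UUID → new UUID
sudoedit /mnt/rootfs/etc/fstab                      # same here
sudoedit /etc/dracut.conf.d/calamares-luks.conf     # same here
sudo dracut-rebuild
sudoedit /etc/default/grub                          # GRUB_CMDLINE_LINUX_DEFAULT: swap UUID + rd.luks.uuid= entries
sudo grub-install                                   # just in case; arguments as fits your EFI setup
sudo grub-mkconfig -o /boot/grub/grub.cfg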

There was another self-caused issue that took me way too long to figure out, until someone on the #btrfs IRC channel pointed it out. I forgot the closing ' in the linux (a.k.a. “kernel”) line in GRUB, which is why grub-install would fail me, complaining about GRUB_ENABLE_CRYPTODISK=y missing, while it was clearly there in /etc/default/grub. I just had to add that ' at the end of GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and GRUB was happy again.

That was essentially the big oops and how it got fixed.

Right now, my /etc/dracut.conf.d/* looks like this:

# Configuration file automatically written by the Calamares system installer
# (This file is written once at install time and should be safe to edit.)
# Enables support for LUKS full disk encryption with single sign on from GRUB.

# force installing /etc/crypttab even if hostonly="no", install the keyfile
install_items+=" /etc/crypttab /crypto_keyfile.bin "
# enable automatic resume from swap
add_device+=" /dev/disk/by-uuid/2d90af35-7e6a-40f8-8353-f20433d0f994 "

omit_dracutmodules+=" network cifs nfs brltty "
compress="zstd"

force_drivers+=" amdgpu "
add_dracutmodules+=" plymouth "

add_dracutmodules+=" resume "

And /etc/default/grub:

# GRUB boot loader configuration

GRUB_DEFAULT='0'
GRUB_TIMEOUT='5'
GRUB_DISTRIBUTOR='EndeavourOS'
GRUB_CMDLINE_LINUX_DEFAULT='nowatchdog nvme_load=YES rd.luks.uuid=1a45a072-e9ed-4416-ac7e-04b69f11a9cc rd.luks.uuid=c82fca05-59d3-4595-969b-c1c4124d8559 rd.luks.uuid=2d90af35-7e6a-40f8-8353-f20433d0f994 rd.luks.uuid=2e91342f-3d19-4f75-a9a6-fc3f9798cb30 resume=/dev/mapper/luks-2d90af35-7e6a-40f8-8353-f20433d0f994 loglevel=3 splash quiet'
GRUB_CMDLINE_LINUX=""

# Preload both GPT and MBR modules so that they are not missed
GRUB_PRELOAD_MODULES="part_gpt part_msdos"

# Uncomment to enable booting from LUKS encrypted devices
GRUB_ENABLE_CRYPTODISK=y

# Set to 'countdown' or 'hidden' to change timeout behavior,
# press ESC key to display menu.
GRUB_TIMEOUT_STYLE=menu

# Uncomment to use basic console
GRUB_TERMINAL_INPUT=console

# Uncomment to disable graphical terminal
#GRUB_TERMINAL_OUTPUT=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `videoinfo'
GRUB_GFXMODE=auto

# Uncomment to allow the kernel use the same resolution used by grub
GRUB_GFXPAYLOAD_LINUX=keep

# Uncomment if you want GRUB to pass to the Linux kernel the old parameter
# format "root=/dev/xxx" instead of "root=/dev/disk/by-uuid/xxx"
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
GRUB_DISABLE_RECOVERY='true'

# Uncomment and set to the desired menu colors.  Used by normal and wallpaper
# modes only.  Entries specified as foreground/background.
#GRUB_COLOR_NORMAL="light-blue/black"
#GRUB_COLOR_HIGHLIGHT="light-cyan/blue"

# Uncomment one of them for the gfx desired, a image background or a gfxtheme
GRUB_BACKGROUND='/usr/share/endeavouros/splash.png'
#GRUB_THEME="/path/to/gfxtheme"

# Uncomment to get a beep at GRUB start
#GRUB_INIT_TUNE="480 440 1"

# Uncomment to make GRUB remember the last selection. This requires
# setting 'GRUB_DEFAULT=saved' above.
#GRUB_SAVEDEFAULT=true

# Uncomment to disable submenus in boot menu
GRUB_DISABLE_SUBMENU='false'

# Probing for other operating systems is disabled for security reasons. Read
# documentation on GRUB_DISABLE_OS_PROBER, if still want to enable this
# functionality install os-prober and uncomment to detect and include other
# operating systems.
#GRUB_DISABLE_OS_PROBER=false

Automate decryption

Having four LUKS-encrypted partitions also means needing to decrypt all of them.

To make things easier, I added the same key that nvme1n1p2 uses to the other three partitions as well:

cryptsetup luksAddKey /dev/nvme1n1p3 /crypto_keyfile.bin    # new swap @ Goodram
cryptsetup luksAddKey /dev/nvme0n1p3 /crypto_keyfile.bin    # swap @ Samsung
cryptsetup luksAddKey /dev/nvme0n1p2 /crypto_keyfile.bin    # btrfs @ Samsung

And then I added them also to /etc/crypttab:

# <name>                                    <device>                                        <password>              <options>

### Goodram

## root in RAID1
luks-1a45a072-e9ed-4416-ac7e-04b69f11a9cc   UUID=1a45a072-e9ed-4416-ac7e-04b69f11a9cc       /crypto_keyfile.bin     luks

## swap
luks-2d90af35-7e6a-40f8-8353-f20433d0f994   UUID=2d90af35-7e6a-40f8-8353-f20433d0f994       /crypto_keyfile.bin     luks


### Samsung

## root in RAID1
luks-c82fca05-59d3-4595-969b-c1c4124d8559   UUID=c82fca05-59d3-4595-969b-c1c4124d8559       /crypto_keyfile.bin     luks

## swap
luks-2e91342f-3d19-4f75-a9a6-fc3f9798cb30   UUID=2e91342f-3d19-4f75-a9a6-fc3f9798cb30       /crypto_keyfile.bin     luks

Even after this, I still need to enter the LUKS password thrice:

  • before GRUB to unlock the Goodram SSD’s root partition,
  • before GRUB to unlock the Samsung SSD’s root partition,
  • during systemd to unlock all four partitions.

If I could shorten this down to just once, it would be even nicer. But that is as far as I managed to get so far. Happy to hear suggestions, of course!

Add new drive to Btrfs to make RAID1

On the new Samsung SSD the nvme1n1p2 partition is LUKS + Btrfs, but when adding a new device to Btrfs RAID with btrfs device add, it expects the partition to be without a file system.

This was a(nother self-inflicted) problem.

I could probably avoid this if I did it in CLI – and perhaps even in KDE Partition Manager, if I spent more time with it.

But now I had to deal with it.

Initially I planned to simply use --force with btrfs device add, but was quickly told by the Btrfs gurus that there was a much safer option.

So I used wipefs to hide the file system:

wipefs --all /dev/disk/by-uuid/a19847bc-d137-4443-9cd5-9f311a5d8636

Then I had to add the device to the same Btrfs mount point:

btrfs device add /dev/mapper/luks-c82fca05-59d3-4595-969b-c1c4124d8559 /

And finally convert the two devices into Btrfs RAID1³ with:

btrfs balance start -mconvert=raid1,soft /
btrfs balance start -dconvert=raid1,soft /

At the end of all this my /etc/fstab looks like:

# /etc/fstab: static file system information.
# Use 'blkid' to print the universally unique identifier for a device; this may
# be used with UUID= as a more robust way to name devices that works even if
# disks are added and removed. See fstab(5).
#
# <file system>                                        <mount point>    <type>  <options>                      <dump> <pass>

## ESP @ Goodram
UUID=B33A-4C29                                          /boot/efi        vfat   noatime                              0 2

## ESP backup @ Samsung
UUID=44D2-04AD                                          /mnt/backup_efi  vfat   noatime                              0 2

## btrfs @ Goodram (in RAID1 with Samsung)
/dev/mapper/luks-1a45a072-e9ed-4416-ac7e-04b69f11a9cc   /                btrfs  subvol=/@,noatime,compress=zstd      0 0
/dev/mapper/luks-1a45a072-e9ed-4416-ac7e-04b69f11a9cc   /home            btrfs  subvol=/@home,noatime,compress=zstd  0 0
/dev/mapper/luks-1a45a072-e9ed-4416-ac7e-04b69f11a9cc   /var/cache       btrfs  subvol=/@cache,noatime,compress=zstd 0 0
/dev/mapper/luks-1a45a072-e9ed-4416-ac7e-04b69f11a9cc   /var/log         btrfs  subvol=/@log,noatime,compress=zstd   0 0

## swap @ Goodram
/dev/mapper/luks-2d90af35-7e6a-40f8-8353-f20433d0f994   swap             swap   defaults                             0 0

## swap @ Samsung
/dev/mapper/luks-2e91342f-3d19-4f75-a9a6-fc3f9798cb30   swap             swap   defaults                             0 0

## tmpfs
tmpfs                                                   /tmp             tmpfs  noatime,mode=1777                    0 0

And with that my Btrfs RAID1 was basically done. 😌

There were some smart things to do still …

Automate Btrfs maintenance

According to the Btrfs documentation:

[Btrfs scrub is an] online filesystem checking tool. Reads all the data and metadata on the filesystem and uses checksums and the duplicate copies from RAID storage to identify and repair any corrupt data.

Which is one of the main reasons I embarked on this convoluted set-up adventure 😅

After consulting the Arch Wiki: Btrfs and the gurus on the #btrfs IRC channel, it turns out I only needed to run systemctl enable btrfs-scrub@-.timer.

The wiki says that @- equals the / mount point, @home equals the /home mount point, etc., which suggests one should scrub each of the subvolumes / mount points.

But it turns out that (at least the way I have things set up) scrubbing / (i.e. @-) is perfectly enough, as it scrubs the whole device(s) anyway.
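
So, roughly, the whole scrub set-up plus a quick sanity check looks like this (the status command can be run at any time):

sudo systemctl enable --now btrfs-scrub@-.timer
sudo btrfs scrub status /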

Re-introduce the “reserve tank”

Since I was resizing the original Btrfs partition, I wanted to re-introduce the “reserve tank”.

Measure twice, cut once!

If you did not mess up things like I did, you probably just need to do it for the new device.

Check how much Device slack you have on each device, before you do this. And if you are low on Device unallocated, run btrfs balance first.

In my case I started with 0 Bytes of Device slack, as sudo btrfs filesystem usage / -T shows:

Overall:
    Device size:                   1.78TiB
    Device allocated:            120.06GiB
    Device unallocated:            1.67TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        114.63GiB
    Free (estimated):            864.45GiB      (min: 854.45GiB)
    Free (statfs, df):           862.56GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              160.89MiB      (used: 0.00B)
    Multiple profiles:                  no

                                                         Data     Metadata System
Id Path                                                  RAID1    RAID1    RAID1    Unallocated Total     Slack
-- ----------------------------------------------------- -------- -------- -------- ----------- --------- -----
 1 /dev/mapper/luks-1a45a072-e9ed-4416-ac7e-04b69f11a9cc 58.00GiB  2.00GiB 32.00MiB   864.52GiB 924.55GiB 0.00B
 2 /dev/mapper/luks-c82fca05-59d3-4595-969b-c1c4124d8559 58.00GiB  2.00GiB 32.00MiB   860.74GiB 920.77GiB 0.00B
-- ----------------------------------------------------- -------- -------- -------- ----------- --------- -----
   Total                                                 58.00GiB  2.00GiB 32.00MiB     1.67TiB   1.78TiB 0.00B
   Used                                                  56.18GiB  1.14GiB 16.00KiB

To add some slack / “reserve tank” to the Btrfs file system, I had to run:

sudo btrfs filesystem resize 1:-10G /
sudo btrfs filesystem resize 2:-10G /

The first command reduced the file system on the device ID 1 by 10 GiB, the second one reduced it on device ID 2.

As a result, I ended up with 20 GiB of Device slack, 10 GiB on each drive, as sudo btrfs filesystem usage / -T shows:

Overall:
    Device size:                   1.78TiB
    Device allocated:            120.06GiB
    Device unallocated:            1.67TiB
    Device missing:                  0.00B
    Device slack:                 20.00GiB
    Used:                        114.63GiB
    Free (estimated):            854.45GiB      (min: 854.45GiB)
    Free (statfs, df):           852.56GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              160.89MiB      (used: 0.00B)
    Multiple profiles:                  no

                                                         Data     Metadata System
Id Path                                                  RAID1    RAID1    RAID1    Unallocated Total     Slack
-- ----------------------------------------------------- -------- -------- -------- ----------- --------- --------
 1 /dev/mapper/luks-1a45a072-e9ed-4416-ac7e-04b69f11a9cc 58.00GiB  2.00GiB 32.00MiB   854.52GiB 914.55GiB 10.00GiB
 2 /dev/mapper/luks-c82fca05-59d3-4595-969b-c1c4124d8559 58.00GiB  2.00GiB 32.00MiB   850.74GiB 910.77GiB 10.00GiB
-- ----------------------------------------------------- -------- -------- -------- ----------- --------- --------
   Total                                                 58.00GiB  2.00GiB 32.00MiB     1.67TiB   1.78TiB 20.00GiB
   Used                                                  56.18GiB  1.14GiB 16.00KiB
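
Should I ever need that space back, growing the file system to fill the devices again should – if I read the resize syntax right – just be:

sudo btrfs filesystem resize 1:max /
sudo btrfs filesystem resize 2:max /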

How to restore from a failed drive

This is more a note to my future self.

When one of the drives dies:

  1. turn off laptop
  2. physically remove the faulty drive
  3. turn laptop back on and during boot mount the remaining drive as “degraded”
  4. buy new drive (use laptop normally in the meantime)
  5. when it arrives, turn off laptop
  6. put in replacement drive
  7. turn laptop back on and run btrfs replace

That is assuming you do not have a spare at hand. If you do, just skip steps 3–5.
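
As a rough sketch for future me (the device-mapper names below are placeholders, not my actual ones):

# boot with the surviving device only and mount it degraded
mount -o degraded /dev/mapper/luks-SURVIVOR /mnt

# after partitioning + LUKS-formatting and opening the replacement drive
btrfs replace start <missing-devid> /dev/mapper/luks-NEW /mnt
btrfs replace status /mnt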

Replacing a dead drive in the Btrfs RAID

The internet seems full of messages saying that once a drive in a Btrfs RAID dies, you can mount the degraded file system read-write only once and never again.

The Btrfs gurus on #btrfs IRC channel say that this was a bug and it was fixed several years ago (someone mentioned 6 years ago). Nowadays the btrfs replace command works as one would expect.

Create fallback ESP

So, with that I should be well equipped for when one of the drives dies.

But wait! There is an important part missing!

I cannot boot if the ESP is also dead.

Remember the /mnt/backup_efi? Now it is time to make use of it.

Making sure the backup ESP includes everything takes just a simple:

rsync --archive --delete /boot/efi/ /mnt/backup_efi

And to make sure this happens regularly enough, I decided to create a systemd service that triggers rsync every time I reboot or shut down my computer.

For that I put into /etc/systemd/system/sync-efi.service the following:

[Unit]
Description=Sync EFI partitions
DefaultDependencies=no
Before=shutdown.target

[Service]
Type=oneshot
ExecStart=/usr/bin/rsync --archive --delete /boot/efi/ /mnt/backup_efi
TimeoutStartSec=0

[Install]
WantedBy=shutdown.target

Of course, the service unit should be enabled too:

systemctl enable sync-efi.service
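
To check that the unit actually does its job without waiting for a shutdown, it can also be started by hand:

sudo systemctl start sync-efi.service
diff -r /boot/efi /mnt/backup_efi    # should report no differences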

hook out → well that was a rollercoaster ride


  1. … and perhaps GRUB for being a bit hard and weird to set up in such use cases. 

  2. The emergency shell was quite a pain, as it did not even have a text editor. So we had to be creative. 

  3. The Btrfs gurus suggested using soft in order to avoid re-balancing any block groups that were already created with the new RAID1 profile. 

Recently, I’ve stumbled across some behavior of C++ lambda captures that has, initially, made absolutely no sense to me. Apparently, I wasn’t alone with this, because it has resulted in a memory leak in QtFuture::whenAll() and QtFuture::whenAny() (now fixed; more on that further down).

I find the corner cases of C++ quite interesting, so I wanted to share this. Luckily, we can discuss this without getting knee-deep into the internals of QtFuture. So, without further ado:

Time for an example

Consider this (godbolt):

#include <iostream>
#include <functional>
#include <memory>
#include <cassert>
#include <vector>

struct Job
{
    template<class T>
    Job(T &&func) : func(std::forward<T>(func)) {}

    void run() { func(); hasRun = true; }

    std::function<void()> func;
    bool hasRun = false;
};

std::vector<Job> jobs;

template<class T>
void enqueueJob(T &&func)
{
    jobs.emplace_back([func=std::forward<T>(func)]() mutable {
        std::cout << "Starting job..." << std::endl;
        // Move func to ensure that it is destroyed after running
        auto fn = std::move(func);
        fn();
        std::cout << "Job finished." << std::endl;
    });
}

int main()
{
    struct Data {};
    std::weak_ptr<Data> observer;
    {
        auto context = std::make_shared<Data>();
        observer = context;
        enqueueJob([context] {
            std::cout << "Running..." << std::endl;
        });
    }
    for (auto &job : jobs) {
        job.run();
    }
    assert((observer.use_count() == 0) 
                && "There's still shared data left!");
}

Output:

Starting job...
Running...
Job finished.

The code is fairly straightforward. There’s a list of jobs to which we can append with enqueueJob(). enqueueJob() wraps the passed callable with some debug output and ensures that it is destroyed after calling it. The Job objects themselves are kept around a little longer; we can imagine doing something with them, even though the jobs have already been run.
In main(), we enqueue a job that captures some shared state Data, run all jobs, and finally assert that the shared Data has been destroyed. So far, so good.

Now you might have some issues with the code. Apart from the structure, which, arguably, is a little forced, you might think “context is never modified, so it should be const!”. And you’re right, that would be better. So let’s change it (godbolt):

--- old
+++ new
@@ -34,7 +34,7 @@
     struct Data {};
     std::weak_ptr<Data> observer;
     {
-        auto context = std::make_shared<Data>();
+        const auto context = std::make_shared<Data>();
         observer = context;
         enqueueJob([context] {
             std::cout << "Running..." << std::endl;

Looks like a trivial change, right? But when we run it, the assertion fails now!

int main(): Assertion `(observer.use_count() == 0) && "There's still shared data left!"' failed.

How can this be? We’ve just declared a variable const that isn’t even used once! This does not seem to make any sense.
But it gets better: we can fix this by adding what looks like a no-op (godbolt):

--- old
+++ new
@@ -34,9 +34,9 @@
     struct Data {};
     std::weak_ptr<Data> observer;
     {
-        auto context = std::make_shared<Data>();
+        const auto context = std::make_shared<Data>();
         observer = context;
-        enqueueJob([context] {
+        enqueueJob([context=context] {
             std::cout << "Running..." << std::endl;
         });
     }

Wait, what? We just have to tell the compiler that we really want to capture context by the name context – and then it will correctly destroy the shared data? Would this be an application for the really keyword? Whatever it is, it works; you can check it on godbolt yourself.

When I first stumbled across this behavior, I just couldn’t wrap my head around it. I was about to think “compiler bug”, as unlikely as that may be. But GCC and Clang both behave like this, so it’s pretty much guaranteed not to be a compiler bug.

So, after combing through the interwebs, I’ve found this StackOverflow answer that gives the right hint: [context] is not the same as [context=context]! The latter drops cv qualifiers while the former does not! Quoting cppreference.com:

Those data members that correspond to captures without initializers are direct-initialized when the lambda-expression is evaluated. Those that correspond to captures with initializers are initialized as the initializer requires (could be copy- or direct-initialization). If an array is captured, array elements are direct-initialized in increasing index order. The order in which the data members are initialized is the order in which they are declared (which is unspecified).

https://en.cppreference.com/w/cpp/language/lambda

So [context] will direct-initialize the corresponding data member, whereas [context=context] (in this case) does copy-initialization! In terms of code this means:

  • [context] is equivalent to decltype(context) captured_context{context};, i.e. const std::shared_ptr<Data> captured_context{context};
  • [context=context] is equivalent to auto captured_context = context;, i.e. std::shared_ptr<Data> captured_context = context;

Good, so writing [context=context] actually drops the const qualifier on the captured variable! For the lambda, it is as if the const had never been written in the first place and direct-initialization had been used.

But why does this even matter? Why do we leak references to the shared_ptr<Data> if the captured variable is const? We only ever std::move() or std::forward() the lambda, right up to the place where we invoke it. After that, it goes out of scope, and all captures should be destroyed as well. Right?

Nearly. Let’s think about what the compiler generates for us when we write a lambda. For the direct-initialization capture (i.e. [context]() {}), the compiler roughly generates something like this:

struct lambda
{
    const std::shared_ptr<Data> context;
    // ...
};

This is what we want to std::move() around. But it contains a const data member, and that cannot be moved from (it’s const after all)! So even with std::move(), there’s still a part of the lambda that lingers, keeping a reference to context. In the example above, the lingering part is in func, the capture of the wrapper lambda created in enqueueJob(). We move from func to ensure that all captures are destroyed when it goes out of scope. But for the const std::shared_ptr<Data> context, which is hidden inside func, this does not work. It keeps holding the reference. The wrapper lambda itself would have to be destroyed for the reference count to drop to zero.
However, we keep the already-finished jobs around, so this never happens. The assertion fails.

How does this matter for Qt?

QtFuture::whenAll() and whenAny() create a shared_ptr to a Context struct and capture that in two lambdas used as continuations on a QFuture. Upon completion, the Context stores a reference to the QFuture. Similar to what we have seen above, continuations attached to QFuture are also wrapped by another lambda before being stored. When invoked, the “inner” lambda is supposed to be destroyed, while the outer (wrapper) one is kept alive.

In contrast to our example, the QFuture situation had created an actual memory leak, though (QTBUG-116731): The “inner” continuation references the Context, which references the QFuture, which again references the continuation lambda, referencing the Context. The “inner” continuation could not be std::move()d and destroyed after invocation, because the std::shared_ptr data member was const. This had created a reference cycle, leaking memory. I’ve also cooked this more complex case down to a small example (godbolt).

The patch for all of this is very small. As in the example, it simply consists of making the capture [context=context]. It’s included in the upcoming Qt 6.6.0.

Bottom line

I seriously didn’t expect there to be these differences in the initialization of by-value lambda captures. Why doesn’t [context] alone behave exactly the same as [context=context], i.e. deduce the member type the way auto would? That would be the sane thing to do, I think. I guess there is some reasoning for this; but I couldn’t find it (yet). It probably also doesn’t make a difference in the vast majority of cases.

In any case, I liked hunting this one down and getting to know another one of those dark corners of the C++ spec. So it’s not all bad 😉.

Today I was doing some experiments with qmllint hoping it would help us make QML code more robust.


I created a very simple test which is basically a single QML file that creates an instance of an object I've created from C++.


But when running qmllint via the all_qmllint target, it tells me


Warning: Main.qml:14:9: No type found for property "model". This may be due to a missing import statement or incomplete qmltypes files. [missing-type]
        model: null
        ^^^^^
Warning: Main.qml:14:16: Cannot assign literal of type null to QAbstractItemModel [incompatible-type]
        model: null
               ^^^^
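
(For reference, I run it through the CMake target roughly like this – the build directory name is whatever your setup uses:)

cmake --build build/ --target all_qmllint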
 

Which is a relatively confusing error, since it first says that it doesn't know what the model property is, but then says "the model property is a QAbstractItemModel and you can't assign null to it".


Here is the full code https://bugreports.qt.io/secure/attachment/146411/untitled1.zip in case you want to fully reproduce it, but first some samples of what I think is important


QML FILE

import QtQuick
import QtQuick.Window

import untitled1 // This is the name of my import

Window {
    // things     
    ObjectWithModel {
        model: null
    }
}
 

HEADER FILE (there's nothing interesting in the cpp file)

#pragma once

#include <QtQmlIntegration>
#include <QAbstractItemModel>
#include <QObject>

class ObjectWithModel : public QObject {
    Q_OBJECT
    QML_ELEMENT  
  
    Q_PROPERTY(QAbstractItemModel* model READ model WRITE setModel NOTIFY modelChanged)

public:
    explicit ObjectWithModel(QObject* parent = nullptr);  

    QAbstractItemModel* model() const;
    void setModel(QAbstractItemModel* model);

signals:
    void modelChanged();

private:
    QAbstractItemModel* mModel  = nullptr;
};

CMAKE FILE

cmake_minimum_required(VERSION 3.16)
project(untitled1 VERSION 0.1 LANGUAGES CXX)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
find_package(Qt6 6.4 REQUIRED COMPONENTS Quick)
qt_standard_project_setup()

qt_add_executable(appuntitled1 main.cpp)

qt_add_qml_module(appuntitled1
    URI untitled1 VERSION 1.0
    QML_FILES Main.qml
    SOURCES ObjectWithModel.h ObjectWithModel.cpp
)

target_link_libraries(appuntitled1 PRIVATE Qt6::Quick)  
 

As you can see it's quite simple and, as far as I know, uses the recommended way of setting up a QML module for a standalone app.

 

But maybe I am holding it wrong?

I might be busy early next month, so I’m posting this a few days early to get it out of the way! I managed to do a lot of big stuff this month, and I'm pretty happy with my pace. I still have way too many open MRs though; I really need to get that sorted.

Sorry about the shoddiness of some of the screenshots. We are in the midst of our Qt6 transition, and sometimes my Breeze theme is broken and falls back to a built-in Qt style. I promise it won’t look that ugly in a couple of months!

Plasma

I redid the Accessibility KCM to make it look a bit nicer, by using the newer sidebar view we use in other KCMs. This still needs some time in the oven, though.

The “new” Accessibility KCM!

The kaccess daemon now reloads its config files properly, which fixes odd behavior like the screen reader never turning off.

Tokodon

The Send button in the composer now changes depending on why you opened it. This is an easy way to confirm whether you’re resending, editing and so on.

Screenshot of the new button behavior when editing a post.

I implemented a lot of UX improvements for the profile page. It’s way harder to mess up the timeline state by clicking through tabs too quickly. Oh yeah, and that’s fixed for the notification page filters too.

The settings are overhauled and now use the new CategorizedSettings component, which will make it easier to add more. This has already made space for granular per-account notification controls!

The new settings page. Better notification controls!

High character count posters rejoice, as the status composer is now usable for you! This will also appear in 23.08, so you don’t have to wait until the next major release.

The status composer now scrolls if it needs to.

The alignment of the top row of buttons in posts is ever so slightly fixed now so it looks prettier, and has better clickable areas.

Before and after comparison images.

I ported the whole application to Qt6 declarative type registration, and other niceties. This doesn’t mean anything for users, but Tokodon should be a bit faster.

If you were ever frustrated with logging into Tokodon, fear not: in the next release the entire process is redone. I rewrote the entire UX to be way more user-friendly and less buggy, and it supports some cool features like an integrated authorization flow and registration!

The new registration page. You can’t view the server rules yet, but that will be added soon!

Tokodon will now show you a visible warning and explain why your account did not log in, instead of kicking you back to the login page like before:

An example of a login error. It’s even actionable!

Finally, a few media attachment improvements, such as media attachments being blacked out if the blurhash is missing (which happens, apparently) and an “Alt” tag that shows up in the top right if the image has alt text. Saves me a click or two, and it's especially useful for video.

Showcase of the new chips.

NeoChat

I’m attempting to fix the lack of formatting when re-editing messages. It won’t be finished this month, but I’m close!

Two event source dialog changes, including it not showing any data and only showing the option if you have developer tools enabled.

The error message when your device encryption keys are somehow different than what’s currently in your database is now clearer, and tells you to log back in.

PlasmaTube

The sidebar is reorganized so more pages are separated as you might expect. There’s still some work to be done here.

There are more pages in the sidebar now, instead of being packed into one.

Added support for passing a video URL via the command line.

Made sure PlasmaTube inhibits sleep when fullscreen, like other well-behaved video applications.

Kirigami

Finally merged the Navigation Tab Bar page for Kirigami Gallery! It’s a simple example of this component that we use quite often on mobile.

The new section in Kirigami Gallery.

I changed the fullscreen image viewer used in NeoChat, Tokodon and more to stop listening to swipe events with the mouse and stop wrapping key navigation. For application developers, make sure you set the focus properly when opening it so key navigation works.

I fixed the FormArrow bug in Qt6 where it would point in the wrong direction; thanks to Ivan Tkachenko for pointing out that we could use the existing enum in Qt. All consumers of this component have already been ported.

The CategorizedSettings component got some fixes as well, including the ability to filter out invisible actions (useful for hiding certain pages on other platforms, e.g. Android) and a fix for the stuck checked state. There’s still a lot of work to do on this component, but it’s a start!

KCoreAddons

I added a QML singleton for grabbing the application's KAboutData instead of it being reimplemented for every single QtQuick application. I have no idea why we didn’t do this before!

import QtQuick
import org.kde.kirigamiaddons.formcard as FormCard
import org.kde.coreaddons

FormCard.AboutPage {
    aboutData: AboutData
}

Qt

We are trying to adopt qmlformat in KDE. I spend an unreasonable amount of time fixing formatting, so it would be nice to have it happen automatically, just as we already use clang-format for C++. I have managed to make some really good headway here and squash lots of little nagging bugs we have hit. These have not been merged into Qt yet, but I hope we can get them reviewed soon. (If you have approver rights, I would appreciate it if you took a look!)

I fixed a bug where qmlformat would indent call expressions twice causing weird indentation like this:

onTestChanged: {
fooBar(test, {
        // Testing
        "foo": "bar"
        });
}

qmlformat shouldn’t insert newlines in empty components and objects, which we use in a lot of QML code. Normally qmlformat would format them like this, which wastes space and looks kind of ugly:

QtObject {
}

Oh yeah, and object declarations in arrays should look less insane.

If you use spaces to delineate groups of import statements, qmlformat should now try to preserve that instead of destroying it.

And two more small things: fixing the command line arguments not overriding anything and fixing the QML coding conventions documentation.

Monday, 25 September 2023

I had a hankering for tinkering with the KDE application style. The default style by KDE, Breeze, is pretty nice as is, but there are small things I'd like to modify.

There's Klassy which is quite customizable and fun, but I don't really need all of the settings it has.

Then there's Kvantum which uses SVG files to create a theme, but they don't follow KDE colorschemes. And I dislike working with SVG files.

Both are brilliant for their usecases, but I wanted just Breeze with few changes.

Fork time!

Screenshot Zephyr style in action

So, I did what one has to do: forked Breeze and renamed everything Breeze-related to Zephyr. I chose Zephyr because it was a synonym for Breeze in a thesaurus, lol. Also, it makes sure it's last in the list of the application styles, so people don't accidentally confuse it with Breeze.

Here's link to the repository: https://codeberg.org/akselmo/Zephyr

Installation help is also there, but feel free to make an issue and/or merge request for adding stuff like which packages one has to install for their distro.

Unfortunately, due to the massive size of the Breeze GitLab repo, I didn't want to flood Codeberg with the whole history. So, some of the history got lost. I have mentioned it in the readme file though.

After renaming all the things, the whole thing built and installed surprisingly easily.

I then implemented the following:

  • Black outline setting, so the default outline has a black one around it.
    • Why? Idk looks cool. Not really other reason.
    • Yes, it can be disabled.
  • Traffic color icons in window deco
    • I am allergic to Apple but the traffic light concept just makes sense to me.
    • Also can be enabled or disabled
  • Customizable style frame and window deco outline colors
    • You can completely change the frame colors.
    • You can also make them invisible! No outlines, no frames! Fun!
  • Slightly rounder windows and buttons
    • At some point I will make a setting for these too, but now they're applied when the thing is built
  • Fitting Plasma style if you use the defaults Zephyr offers (mostly black outlines)
    • The plasma theme buttons do not match the application style in roundness, yet.
    • I am lazy and avoid working with SVG files as long as I can

Why

For fun! For learning! And I wanted to make something that is super close to Breeze (hell, it is Breeze, just with a few mods), but still has its own charm and reflects how I like seeing my desktop.

It can also work as a great test bench for others who want to see if they can modify an application style.

Just rename anything Zephyr to YourForkNameHere and have fun. But it's probably better to fork the original Breeze project :)

Also, when making my own things for Breeze, it's nice to implement them in something similar but with a different name, so I can test the changes for a longer period of time. And if I like the changes I can maybe show them to upstream.

In the future, I will make it work with Plasma 6 (unless I feel lazy). I will probably have to fork Breeze again then and apply my changes. Hopefully it's not too big of a change.

Also, I will be working on the actual Breeze in the future too! I hope to implement separator colors for the Plasma colorscheme, so basically you can change the color of all frames and outlines and whatnot. This project kinda helped me figure out how that works as well!

All in all, good project, I keep tinkering with it and it helps me understand the Breeze styling and Qt in general more.

Revontuli and Zephyr

My colorscheme Revontuli works really well together with Zephyr. So, feel free to give them a go!

Thanks for reading as usual!

Sunday, 24 September 2023

On Thursday and Friday evenings, I went to the Matrix Community Summit at C-Base in Berlin with Tobias. It was the occasion to meet a few other Matrix developers, in particular the Nheko developer, MTRNord, and a few other devs whom I only knew by nickname. It was great even though I could only spend a few hours there. Tobias stayed longer and will be able to blog more about the event.

Photo of the C-Base showing a lot of electronic equipment

During the weekend, instead of going to the Matrix summit, I participated in the KDE Promo sprint with Paul, Aniqa, Niccolo, Volker and Joseph. Aron also joined us via video call on Saturday. This event was also in Berlin, at the KDAB office – we are very thankful to them for hosting us.

This sprint was the perfect occasion to move forward with many of our pending tasks. I mainly worked on web-related projects, trying to tick off a few items from my large todo list.

We now have an updated donation page, which includes the new Donorbox widget. Donorbox is now our preferred way to make recurring donations, and recurring donations are vital to the success of KDE. Check it out!

Screenshot of the website KDE.org/community/donations

With Paul, we also looked at the next KDE For pages. Two of them are now done and we will publish them in the coming weeks. There are plans for a few more, and if you want to get involved, this is the Phabricator task to follow.

I also updated the KDE For Kids page with the help of Aniqa. It now features the book Ada & Zangemann by Matthias Kirschner and Sandra Brandstätter, which introduces kids to Free Software. Let me know if you have other book suggestions for kids around Free Software and KDE that we could include on our websites.

This was only a short version of all the things we did during this sprint; I will let the others blog about what they did. More blog posts will certainly pop up on planet.kde.org soon.

The sprint was only possible thanks to the generous donations from our users, so consider making a donation today! Your donations also help pay for the cost of hosting conferences, server infrastructure, and maintaining KDE software.

Wednesday, 20 September 2023

Hello and welcome back to my blog! In this blog, I will be detailing the work I've done over the second coding period of my 2023 GSoC journey with KDE. Let's dive in!

Work done

Over the second coding period, I accomplished the following tasks:

Fix Okular Icon Rendering

Okular's Android app was not rendering the icons at all. At first, I thought that it might be an issue with the icons not being packaged. I tested this theory by exploring the contents of Okular's APK file. Inside the APK was a .qrc file, which I decompiled using a tool recommended to me in the KDE Android Matrix room. Upon exploring the contents of the .qrc file, I found that all the icons were already packaged there. If I used the icons with their full paths in the qrc file instead of their freedesktop name (i.e. qrc:/path/to/document-open.svg instead of just document-open), the icon would show up properly in the app. I concluded that the issue was not due to faulty packaging of icons, but Kirigami not being able to find the proper icons.

Over the next week, I spent a large amount of time desperately trying to pick apart Okular and find the faulty bit of code that was causing the icons not to display. As I grew increasingly frustrated, I turned to the KDE android matrix room to ask for help.

They suggested that the KIconThemes library was causing issues just by being present, due to this line of code that executes any time the Okular app starts up. What it does is automatically configure the app to use the Breeze theme as a fallback. This was interfering with Kirigami's icon-finding mechanism, causing the icons to not show up.

Initially, I just removed KIconThemes as a dependency from Okular's CMakeLists.txt, but that didn't work. A couple of days later, several developers from the KDE Android matrix room pointed out that even though KIconThemes was removed, it was still being packaged with Okular by craft. They suggested that I wrap the offending code within a conditional compilation block so that Android would not execute it automatically.

I followed their advice, and it worked like a charm - Okular was finally displaying the icons properly.

I then created a Merge Request at the KIconThemes repository to implement the fix for the icons issue.

Peruse, another KDE application, was also having similar issues with icons as Okular. The fix also helped it to render icons properly.

Package Okular Icon

After fixing icon rendering, I added the Okular icon to the CMakeLists.txt for Okular's Android version, which would package it along with the other icons. This was needed because two places in the Okular app use the app icon:

  • The global drawer

  • The About Page

These places were empty when they should've been displaying the Okular logo, as can be seen here:

After adding the Okular icon to the kirigami_package_breeze_icons() array, you can now see the Okular icon in all its glory in the global drawer and about page.

However, I did run into some footgun moments with craft while performing this task. This is because to test changes in Okular, I had always run the following craft commands to build and package it as an APK:

craft --compile okular
craft --package okular

However, icons are a special case. If you add a new icon, it has to be installed and qmerged as well. You do that by:

craft --compile --install --qmerge okular
craft --package okular

I spent a couple of days trying to figure out what was wrong due to this small mistake. I only realized what was happening when I took a closer look at the output when freshly installing Okular using craft -i okular.

The members of the KDE Android Matrix room suggested that I edit the wiki page for Craft so that other people can avoid being plagued by this small mistake.

Moving from Kirigami.AboutPage to MobileForm.AboutPage

While adding the Okular icon to the app, I also came across MobileForm.AboutPage, which is an experimental feature from Kirigami Addons that changes the layout of the About page from Kirigami.AboutPage. I asked my mentor if we should move to MobileForm.AboutPage since it looks better and more modern, and he agreed. So I replaced Kirigami.AboutPage with MobileForm.AboutPage in this merge request.

Here's how the Kirigami.AboutPage looks in Okular:

Here's how the new MobileForm.AboutPage looks in Okular:

Porting to FormCard from MobileForm

Shortly after I had moved Okular to using MobileForm.AboutPage, Kirigami Addons renamed MobileForm to FormCard. The MobileForm QML import would still be available but it would not receive new features. The new FormCard will receive new improvements, so it is advised to switch to it. This blog post by Carl Schwan explains it in more detail. So I ported Okular to the new FormCard API in this merge request.

FormCard.AboutPage has a few subtle differences from MobileForm.AboutPage. Here's how it looks:

Fix qtbase Crash/Hang in Okular

Sometime around the start of the first coding period, I noticed that Okular on my phone would freeze on a black screen when attempting to open PDFs. I initially worked around this by running Okular in the Android emulator packaged with Android Studio.

However, since I had some more free time during the second coding period, my mentor and I decided to tackle this issue. The issue seemed to be device-specific, and only a couple of the phones I tested Okular on had it.

I distinctly remembered that an older version of the Okular apk was running properly on my device, without freezing/crashing to a black screen whenever I attempted to open a PDF. Luckily I was able to find an old APK from my phone backups, and confirm this.

This older APK was v22.12.3 of Okular, with Qt 5.15.8, and KDE Frameworks 103.0. The newer APK with the black screen issue was release/23.04 of Okular, with Qt 5.15.10 and KDE Frameworks 108.0.

My mentor suggested downgrading Okular to version 22.12.3 and testing it. The app still crashed, so my mentor suggested that Qt itself might be causing issues, and so I compiled Qt 5.15.8 in Craft and tested Okular by building against it. It worked fine and opened PDFs properly. After this I did the same with Qt 5.15.9, which displayed the black screen issue in Okular.

So far we had deduced that the issue occurred sometime between Qt versions 5.15.8 and 5.15.9. To narrow down the search, my mentor then shared a list of commits which were applied between Qt 5.15.8 and 5.15.9 and were related to Android.

I had already noticed that, when running Okular on Android, adb logcat showed some warning messages from Qt A11Y (accessibility). During testing I had also noticed that, in the tombstone file generated when Okular crashed to the black screen, the stack trace indicated that Qt was doing threading-related work while executing Qt A11Y code.

In the list of commits shared by my mentor, commit 513fcf0d2ad3328830dbf73dc2a55ad1487393c0 deals with both Qt A11Y and threading, so it stood out to me. My suspicions turned out to be correct: with that commit checked out in git, Okular froze/crashed as usual, but with the commit immediately before it checked out, Okular worked fine.
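
In practice the check was as simple as rebuilding Qt at the suspect commit and at its parent; a rough sketch, where the location of Craft's qtbase checkout is an assumption on my part:

cd ~/CraftRoot/build/libs/qt5/qtbase/work/qtbase          # assumed location of the qtbase sources inside CraftRoot
git checkout 513fcf0d2ad3328830dbf73dc2a55ad1487393c0     # rebuild and run: Okular freezes on a black screen
git checkout 513fcf0d2ad3328830dbf73dc2a55ad1487393c0~1   # rebuild and run: Okular works fine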

I shared this information with my mentor, and he looked up some commits from qt6 that seemed related:

  • https://invent.kde.org/qt/qt/qtbase/-/commit/b8a95275440b8a143ee648466fd8b5401ee1e839

  • https://invent.kde.org/qt/qt/qtbase/-/commit/ac984bd8768b3d7e6439e0ffd98fd8b53e16b922

  • https://invent.kde.org/qt/qt/qtbase/-/commit/b832a5ac72c6015b6509d60b75b2ce5d5e570800

To test these patches, I backported them using git cherry-pick, rebuilt Qt, and then rebuilt Okular against it.
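
Roughly, the test loop for each candidate patch looked like the following sketch (it assumes the qt6 commits have already been fetched into Craft's qtbase checkout; libs/qt5/qtbase is the blueprint name Craft uses for Qt 5):

git cherry-pick ac984bd8768b3d7e6439e0ffd98fd8b53e16b922   # backport one candidate patch onto the 5.15 sources
craft --compile --install --qmerge libs/qt5/qtbase         # rebuild the patched Qt
craft --compile --install --qmerge okular                  # rebuild Okular against it
craft --package okular                                     # produce a fresh APK to try on the device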

After a bit of testing, I saw that commit ac984bd8768b3d7e6439e0ffd98fd8b53e16b922 completely solved the black screen issue when opening a PDF in Okular. Both of the devices I tested it on no longer showed any problems with this commit applied.

My mentor suggested that I submit these patches to the kde/5.15 branch of KDE's qtbase repository by following this guide: https://community.kde.org/Qt5PatchCollection#How_do_I_get_a_patch_merged

I followed the instructions on that page: I cherry-picked the commit onto the kde/5.15 branch of qtbase and opened a merge request against that branch in KDE's qtbase repository.
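
The submission itself was ordinary git work; a sketch, where the fork URL and branch name are placeholders:

git clone https://invent.kde.org/qt/qt/qtbase.git
cd qtbase
git checkout kde/5.15
git cherry-pick -x ac984bd8768b3d7e6439e0ffd98fd8b53e16b922                       # -x records the original commit id in the message
git push git@invent.kde.org:<your-username>/qtbase.git HEAD:work/a11y-backport    # placeholder fork and branch name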

Challenges faced

Getting used to Craft's system

Craft is an open source meta build system and package manager. It manages dependencies and builds libraries and applications from source, on Windows, Mac, Linux, FreeBSD and Android.

In order to work on Okular, I had to get used to Craft. As a beginner who had only just learnt CMake over the course of GSoC, I found that learning to use Craft took quite a bit of effort. The first few steps were a breeze, thanks to the KDE wiki page about Craft and the fact that I had already worked with Craft during the first coding period.

However, Craft regularly bumps the versions of software such as KDE Frameworks and Qt, which can make it tough to test older versions of that software. In my case, I had to mess around with Craft's version.ini files for Qt5, located at /CraftRoot/etc/blueprints/locations/craft-blueprints-kde/libs/qt5.
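
For example, to go back to an older Qt patch release, the rough procedure was to pin the version there and rebuild; a sketch that only shows where to look, not the exact file contents:

cd /CraftRoot/etc/blueprints/locations/craft-blueprints-kde/libs/qt5
$EDITOR version.ini                                  # pin the Qt version you want, e.g. 5.15.8 instead of 5.15.9
craft --compile --install --qmerge libs/qt5/qtbase   # rebuild Qt at the pinned version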

Building Qt in Craft - binutils

While trying to build applications and libraries in Craft, I frequently ran into errors about GNU binutils not being able to recognize the file format of the compiled files. I usually bypassed these by setting the $PATH variable to point to the GNU binutils built for Android, usually located at /home/user/CraftRoot/tmp/android-24-toolchain/arm-linux-androideabi/bin. Keep in mind that I only encountered this issue when building for arm32 devices; it is entirely possible that everything builds just fine for arm64 Android devices.
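
The workaround was essentially just putting that toolchain first in the search path before invoking Craft; a minimal sketch, assuming a Bash-style shell and the toolchain path quoted above:

export PATH="/home/user/CraftRoot/tmp/android-24-toolchain/arm-linux-androideabi/bin:$PATH"   # Android-targeted binutils first, so objcopy & friends understand the ARM binaries
craft --compile --install --qmerge okular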

However, that's not all - I encountered this issue again when building Qt for arm32 Android inside the Craft environment. For example, when building qtbase, I got the following error when using the default toolchain.

/opt/android-sdk/ndk/22.1.7171670/toolchains/llvm/prebuilt/linux-x86_64/bin/arm-linux-androideabi-objcopy:/home/user/CraftRoot/build/libs/qt5/qtbase/image-MinSizeRel-5.15.9/bin/qmake: File format not recognized
Command ['/opt/android-sdk/ndk/22.1.7171670/toolchains/llvm/prebuilt/linux-x86_64/bin/arm-linux-androideabi-objcopy', '--only-keep-debug', '/home/user/CraftRoot/build/libs/qt5/qtbase/image-MinSizeRel-5.15.9/bin/qmake', '/home/user/CraftRoot/build/libs/qt5/qtbase/image-MinSizeRel-5.15.9-dbg/bin/qmake.debug'] failed with exit code 1
Action: install for libs/qt5/qtbase:5.15.9-1 FAILED

In this case, simply setting the $PATH variable did not work; I had to improvise and use the Xamarin Android toolchain, which was more up to date, adding it to $PATH inside the Craft Docker container. However, this toolchain produced much larger APKs, jumping from a modest 56 MB to 110 MB.

Maybe in the future I could try creating a Craft package for the Android GNU binutils, which would eliminate, or at least reduce, the frequency of problems like this. No promises though! :P

Debugging in Android

I mentioned this in my previous blog post already, but Android's way of handling permissions made debugging a headache. Initially I rooted my Android phone and used gdbserver to try to get debugging going, but I eventually gave up and resorted to staring at adb logcat while trying to make Okular crash so that it would produce a tombstone file. While this approach did eventually lead me to solve the black screen issue, it would have been nice to have some proper debugging infrastructure.
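
For reference, most of my "debugging" amounted to watching the app's log output while reproducing the crash; a minimal sketch, assuming the package id org.kde.okular.kirigami (adjust it if your build uses a different one):

adb logcat --pid="$(adb shell pidof -s org.kde.okular.kirigami)"   # follow only Okular's log messages
adb bugreport okular-crash.zip                                     # the generated report also contains any tombstones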

Remaining Work

While this GSoC coding period helped fix some of Okular's issues and improve it a bit, there is still work remaining that would go a long way towards making Okular an attractive PDF reader on Android:

Adding a "recent documents" section

Okular on desktop has a list of documents that were recently opened by the user. This lets the user quickly re-open documents they access frequently, which speeds up their workflow considerably. Okular on Android lacks this feature, and adding it would go a long way towards making it a more attractive PDF viewer.

Adding the ability to jump to a specific page number

Okular on Android has no support for jumping to a specific page number in a PDF. This can be tedious, for example in large documents with hundreds of pages.

Fixing bookmarks

At the moment, bookmarked pages cannot be viewed at all: the bookmarks tab of the side drawer is grayed out and cannot be accessed.

What now?

This GSoC journey has been an enjoyable learning experience for me. In the future, I'd like to contribute to KDE again, whether it's through GSoC or not.

I'll most likely try to contribute to KDE and other FOSS projects whenever I have free time, helping those projects improve while improving my own skills at the same time.

Wrapping it up

This GSoC has been a valuable and unforgettable introduction to the world of open-source software. Not only did I get to learn the process of contributing to open source and hone my own skills, but I was also able to contribute to KDE, one of my favorite FOSS projects.

I'm thankful to my mentor Albert Astals Cid for his patience and guidance, which undoubtedly helped me overcome many of the obstacles in my journey.

I'm also grateful to the KDE community, especially the members of the KDE Android Matrix room - they helped me fix the Okular icon rendering, and without their help I might still be bashing my head trying to solve it.

That's all for now. See you next time, and have a nice day! :D

Tuesday, 19 September 2023

Qt OPC UA – Data Type Code Generation

The type system of OPC UA permits the creation of complex and nested data types. With the merge of the generic struct decoding and encoding feature, the Qt OPC UA module has greatly improved the comfort of handling such types. But for large projects with lots of custom data types, its QVariant-based interface might still feel a bit too complicated.

Continue reading Qt OPC UA – Data Type Code Generation at basysKom GmbH.

Saturday, 16 September 2023

Kraft (GitHub) is a desktop utility that makes it easy for small companies to create offers and invoices quickly and beautifully.

Today we are releasing Kraft version 1.1 with significant improvements for users and for Kraft's integration with the latest software stack, such as CMake and KDE.

It received updated Dutch translations for both the UI and the manual. The application icon was fixed, and some CMake-related fixes were made so that Kraft works with the different versions of Akonadi available on different distributions.

Macros

For users, two significant improvements are included. The first: the header and footer texts of documents may now contain macros that automatically compute values, such as dates relative to the document date. With that, it is easy, for example, to print a payment date on the document that is ten days after the document date.

There are even more interesting macros, stay tuned for a separate post about this feature.

Insert Templates Button

The second new feature is a button that inserts a template into the header or footer text at the cursor position. Previously it was only possible to replace the entire text with a template. This gives users much more flexibility in how they structure template texts.

In parallel to these improvements, work is also going on in a branch for Kraft 2.0, which will enable more collaborative functionality in Kraft.