Saturday, 23 July 2022

This is going to be a rather short blog post, but I think it’s still worth mentioning. Starting with 5.26, kwin will support only one way of setting up X screens: Xinerama. Multi-head won’t be supported anymore. However, despite how “setup-breaking” that may sound, it will most likely not affect you, as you probably already use Xinerama.

Before diving any deeper, it’s worth providing some background. On X11, there are two ways to configure your desktop environment to run with multiple monitors: multi-head and Xinerama.

Multi-head is the old-school way to run multiple monitors. In that mode, there’s one X screen per monitor. In Xinerama mode, there’s a single virtual X screen spanning all outputs. Both modes have their advantages and disadvantages; for example, you can’t freely move windows between screens when using multi-head. Xinerama is younger than multi-head and provides the most user-friendly workflow on multi-screen setups, so it’s usually enabled by default in Linux distributions, and many desktop environments, including Plasma, are optimized for running in this mode.

Technically, kwin does provide support for both multi-head and Xinerama. But multi-head support has been in a neglected and unmaintained state for many, many years: some code (primarily old code) supports multi-head, while lots of other code (mostly new code) does not, and various system settings modules and plasmashell components do not support multi-head either. It’s also safe to say that no kwin developer has tested multi-head within the last 5+ years.

So, rather than keep advertising support for a feature that we don’t maintain and have no plans to fix, we decided to drop support for multi-head mode and make Xinerama a hard requirement starting with 5.26.

FAQ

Does this mean that Plasma won’t support multiple monitors anymore?

No, Plasma will continue supporting setups with multiple monitors, but you will need to ensure that Xinerama is used. That is usually already the case, so you most likely don’t need to tweak anything.

I used multi-head for some esoteric thing, what should I do now?

It’s highly recommended to give the Wayland session a try. If something’s missing, file a bug report or contact us at https://webchat.kde.org/#/room/#kwin:kde.org.

Here's an interesting thing I stumbled across today (more like, struggled with for a day and a half). This might be obvious to regular Qt programmers, but I'm not a professional.

In my last post I described how Tobias helped me get the cacheSpaceHierarchy() function working as expected. The fix was easy. That doesn't mean it took me only half a minute to code and commit.

cacheSpaceHierarchy() was now being called correctly, as expected. But then the function itself stopped working. It had worked until two days ago, when it was being called at the wrong time. Today, when I call it at the right place, it decides it won't work.

What's funny to me is that the reason behind this malfunction was similar to the problem last week.

I used the following snippet to wait for a room to load and then populate its room list if it's a space.

connect(neoChatRoom, &Room::baseStateLoaded, neoChatRoom, [this, neoChatRoom, connection]() {
    if (neoChatRoom->isSpace()) {
        this->populateSpaceHierarchy(connection, neoChatRoom->id(), false);
    }
});

The issue, you ask? The &Room::baseStateLoaded signal has already been fired before this connection is made, and as a result the inner block is never executed.

Solution?

Just check if relevant data is loaded. If yes, use that data. If not, then set up the connection and wait for the data to be loaded.

if (neoChatRoom->isSpace()) {
    this->populateSpaceHierarchy(connection, neoChatRoom->id(), false);
} else {
    connect(neoChatRoom, &Room::baseStateLoaded, neoChatRoom, [this, neoChatRoom, connection]() {
        if (neoChatRoom->isSpace()) {
            this->populateSpaceHierarchy(connection, neoChatRoom->id(), false);
        }
    });
}

This works because neoChatRoom->isSpace() can be true only if the room's base state is loaded.

There is one potential issue though. This will set up a good number of unused connections. I'll have to discuss it with my mentors and get their views if there's a better way to accomplish this goal.
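One way to avoid leftover connections is a "run now or run once later" helper. The sketch below is not NeoChat or Qt code: FakeRoom, whenLoaded() and markLoaded() are hypothetical names, written with the standard library only to illustrate the idea. Qt 6 also has a Qt::SingleShotConnection connection type that may serve a similar purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a room object with a one-time "base state loaded" event.
class FakeRoom {
public:
    bool isLoaded() const { return m_loaded; }

    // Run `handler` exactly once: immediately if the state is already loaded,
    // otherwise when markLoaded() is called later.
    void whenLoaded(std::function<void()> handler) {
        assert(handler); // an empty std::function cannot be called
        if (m_loaded) {
            handler();
        } else {
            m_pending.push_back(std::move(handler));
        }
    }

    // Plays the role of emitting the signal: runs all pending handlers,
    // then drops them so nothing stays connected afterwards.
    void markLoaded() {
        m_loaded = true;
        std::vector<std::function<void()>> handlers;
        handlers.swap(m_pending);
        for (auto &handler : handlers) {
            handler();
        }
    }

    std::size_t pendingHandlers() const { return m_pending.size(); }

private:
    bool m_loaded = false;
    std::vector<std::function<void()>> m_pending;
};
```

With this shape, the caller no longer needs its own if/else: it always calls whenLoaded(), and no handler remains registered once the room is loaded.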

Hi everyone! I understand it’s been a long time, and I’m so excited to be writing this blog today. In today’s blog, I want to talk about my journey (so far) contributing to Bismuth (a tiling window manager extension for KDE), mainly how and why I started, and where I am right now.

The Story: Why KDE Plasma and Why Bismuth?

For the last few months (close to a year), I’ve been using Pop!_OS (a Linux distribution by System76), which has an amazing automatic tiling window extension called pop-shell, and it was close to what I always needed:

Friday, 22 July 2022

Let’s go for my web review for the week 2022-29. This is a very large triple issue since I was mostly away from the keyboard lately. That summer break was good. :-)


The Home Computer Generation | datagubbe.se

Tags: tech, culture, hacking

An excellent piece which raises interesting questions about computer literacy. There has indeed been a generation of people before the so-called “digital natives” who had to know how computers work. Are we losing this by coddling people with too much convenience? How much are we then losing as a society?

https://www.datagubbe.se/hcg/


Uber broke laws, duped police and secretly lobbied governments, leak reveals | Uber | The Guardian

Tags: tech, uber, ethics

Disgusting practices… unlawful too… this company is really an overgrown parasite.

https://www.theguardian.com/news/2022/jul/10/uber-files-leak-reveals-global-lobbying-campaign


The Uber Leak Exposes the Global War on Workers

Tags: tech, uber, ethics, business

I don’t understand this “it is illegal and bad for workers” leading to “let’s change the law to make it happen anyway”. Change your business model already!

https://tribunemag.co.uk/2022/07/uber-files-leak-gig-economy


Driverless Robotaxi Fleet Paralyzed for Hours in San Francisco – The Last Driver License Holder…

Tags: tech, funny, ai, automotive

Mysterious event… that’s the problem with such a centralized and homogeneous system: when something fails, it quickly fails at scale.

https://thelastdriverlicenseholder.com/2022/06/29/driverless-robotaxi-fleet-paralyzed-for-hours-in-san-francisco/


An experiment to test GitHub Copilot’s legality

Tags: tech, machine-learning, copyright, licensing, github

Very interesting thought experiment around Copilot’s legality. I’d love to see that happen and see what the outcome would be.

https://seirdy.one/posts/2022/07/01/experiment-copilot-legality/


Microsoft To Ban Commercial Open Source from App Store

Tags: tech, microsoft, foss

Microsoft being its usual self… now trying to make sure Free Software developers can’t make money through their app store. They claim they like Open Source but only when that’s about getting a benefit for them on platforms they control.

https://sfconservancy.org/blog/2022/jul/07/microsoft-bans-commerical-open-source-in-app-store/


mjg59 | Lenovo shipping new laptops that only boot Windows by default

Tags: tech, microsoft, lenovo, linux, vendor-lockin

Unsurprisingly, this might be the first device with such defaults… probably more to come until the settings disappear completely. Not cool.

https://mjg59.dreamwidth.org/59931.html


Unity is merging with a company who made a malware installer

Tags: tech, unity, 3d, surveillance

Clearly a worrying move from Unity… it wasn’t great before but now the level of trust one can have for games made with that engine will be even lower.

https://www.pcgamer.com/unity-is-merging-with-a-company-who-made-a-malware-installer/


It’s time to make that indie C# game in Godot.

Tags: tech, 3d, foss, godot

Let’s hope we’ll indeed see more indie game creators moving to Godot, it’s a neat engine.

https://jolexxa.medium.com/its-time-to-make-that-indie-c-game-in-godot-cea383151470


I’ve started using Mozilla Firefox and now I can never go back to Google Chrome

Tags: tech, browser, privacy, firefox

Still important to have it around and alive.

https://www.techradar.com/in/features/ive-started-using-mozilla-firefox-and-now-i-can-never-go-back-to-google-chrome


Don’t Lie To Me About Web 2.0

Tags: tech, web, web3, blockchain, social-media

Let’s get the historical records straight indeed. Don’t believe the web3 bullshit revisionism.

https://accordion-druid.tumblr.com/post/685175656750972928/dont-lie-to-me-about-web-20


Generate RSS feeds from websites

Tags: tech, rss

Looks like a nice tool to handle websites lacking an RSS feed.

https://github.com/wezm/rsspls


Thoughts on RSS

Tags: tech, web, rss

As much as I like RSS, it has indeed a few issues. It’s important to keep them in mind.

https://matt-rickard.com/thoughts-on-rss/


SourceHut is committed to making IRC better

Tags: tech, irc

I admit I secretly wish for an IRC revival… been using lesser solutions too much for my taste.

https://sourcehut.org/blog/2022-07-06-sourcehut-and-irc/


void versus [[noreturn]]

Tags: tech, c++

If you’re still confused about [[noreturn]] this is a good short read. Indeed it’s a bit annoying that it is not part of the type system.

https://quuxplusone.github.io/blog/2022/06/29/that-undiscovered-country/


The Windows malloc() Implementation Is A Trash Fire

Tags: tech, windows, memory, system

OK, it’s 2022 and this is still not an adequate ecosystem for system programming.

https://erikmcclure.com/blog/windows-malloc-implementation-is-a-trash-fire/


What happens when you press a key in your terminal?

Tags: tech, unix, terminal, command-line

Very neat trip back in history. Ever wondered what happened in your terminal? This explains it well.

https://jvns.ca/blog/2022/07/20/pseudoterminals/


DNS Esoterica - Why you can’t dig Switzerland

Tags: tech, dns, history, surprising

OK, now that’s a surprising bit of DNS history.

https://shkspr.mobi/blog/2022/07/dns-esoterica-why-you-cant-dig-switzerland/


Postgres Full-Text Search: A Search Engine in a Database

Tags: tech, databases, postgresql, search

Neat little introduction to Postgres full-text search facilities. Too often overlooked, you can wait before pulling another dependency like Elasticsearch.

https://www.crunchydata.com/blog/postgres-full-text-search-a-search-engine-in-a-database


Advantages of not using Spring Data and Hibernate with relational data

Tags: tech, spring, orm, databases, java

This shows quite well why I stay away from Spring Data JPA…

https://itnext.io/advantages-of-not-using-spring-data-and-hibernate-with-relational-data-8a509faf0c48?gi=5b6e2327b21b


Soft Deletion Probably Isn’t Worth It — brandur.org

Tags: tech, backend, databases

Interesting points about soft deletion… its usual pattern might not be what you need in the end. The proposed alternative is interesting to keep in mind.

https://brandur.org/soft-deletion


Gradual Soundness: Lessons from Static Python

Tags: tech, python, type-systems

Interesting paper about the gradual typing experiments done around Static Python. Shows a few interesting properties. I wonder if some or most of it will find its way back to CPython.

https://programming-journal.org/2023/7/2/


Typing your way into safety

Tags: tech, python, type-systems

The developing type system in Python really has some nice properties now. Used well, it can help quite a bit with checking that an API is properly called by user code. This is nothing new to languages with stricter type systems, of course.

https://dev.to/flare/typing-your-way-into-safety-4lek


The new wave of React state management

Tags: tech, frontend, react, javascript, web

Good overview of the state management offerings around React. Especially interesting is how it frames the different problems one has to keep in mind to maintain state in a UI.

https://frontendmastery.com/posts/the-new-wave-of-react-state-management/


Holograms, light-leaks and how to build CSS-only shaders - Robb Owen

Tags: tech, browser, html, css

Definitely a cool trick. Not really practical yet due to the performance and the differences in behavior between browsers. Hopefully this will get solved at some point.

https://robbowen.digital/wrote-about/css-blend-mode-shaders/


Pandas vs Polar - A look at performance

Tags: tech, data-science, python, rust, pandas, polars

Polars looks like an interesting alternative to Pandas in the industrialization phase of a data processing pipeline. The performance differences are really notable with larger volumes. I’d be interested to see how much of it is lost when using its Python API though.

https://studioterabyte.nl/en/blog/polars-vs-pandas


Unit and Integration Tests

Tags: tech, tests, craftsmanship, tdd

Likewise I’m more and more unconvinced about the unit vs integration tests distinction. It’s likely a continuum between them. I like the proposed axes for classification here. I wish they’d be a bit more orthogonal though.

https://matklad.github.io/2022/07/04/unit-and-integration-tests.html


Every programmer should care about UI design

Tags: tech, programming, design, api, tests, craftsmanship

I don’t quite subscribe to some of the terms used (even though I see the point of not calling this API). Still I think this is a very good way to approach design, it’s also why I like TDD, the tests force you to see how the code is used. If it ain’t pretty there’s a problem.

https://silverhammermba.github.io/blog/2022/07/10/ui


Carbon Language: An experimental successor to C++

Tags: tech, c++, surprising

Now this is surprising and unexpected… extremely ambitious as well. I wonder how far this will go, I like the overall idea though.

https://github.com/carbon-language/carbon-lang


Some Thoughts on Zig — Sympolymathesy, by Chris Krycho

Tags: tech, zig, rust, system, programming

In the end, this is a nice conversation about language design…

https://v5.chriskrycho.com/journal/some-thoughts-on-zig/


Metastability and Distributed Systems - Marc’s Blog

Tags: tech, distributed, bug, complexity, safety

Discussions around a fascinating and very important class of errors in distributed systems.

https://brooker.co.za/blog/2021/05/24/metastable.html


The Laws of Software Development Explain Why Creating Software Always Takes Longer Than Expected | by Ben “The Hosk” Hosking | Jul, 2022 | Medium

Tags: tech, project-management, estimates, complexity, management

It feels a bit like accumulating aphorisms and “laws” to prove the point. Still, it’s nice to know them, at least for general culture.

https://thehosk.medium.com/the-laws-of-software-development-explain-why-creating-software-always-takes-longer-than-expected-7b4fcbe35cea


Planning for Project Deadlines So You Can Sleep at Night - Build the Stage

Tags: tech, project-management, estimates

Couple of interesting tips. I like how it challenges the usual mythical man-month quote. Indeed sometimes adding people might help, if the conditions are right.

https://www.buildthestage.com/planning-for-project-deadlines-so-you-can-sleep-at-night/


3 tribes of programming

Tags: tech, culture, craftsmanship

I’m not sure the boundaries are as clear as laid out in this article. That said, it’s an interesting way to frame things. Also, clearly it’s at the intersection of the so-called tribes that the most interesting things happen.

https://josephg.com/blog/3-tribes/


Giving a Shit as a Service

Tags: tech, services, business, craftsmanship

Definitely this, showing care is the best thing you can do in services. Otherwise you can only do a mediocre job.

https://allenpike.com/2022/giving-a-shit


Remote first, Async second

Tags: tech, management, remote-working

Very good points in there. Indeed there’s a natural tension between making and managing. You can’t schedule the day in the same way. After more remote work, indeed we’ll need more async communication.

https://www.linkedin.com/pulse/remote-first-async-second-dror-poleg/


Three part solution to leaders burnout

Tags: management, leadership

OK, unexpected introduction, still the advice is sound: teach, delegate, handle the hard cases.

https://jjude.com/leaders-burnout/


Panel interviews don’t work - Jacob Kaplan-Moss

Tags: hr, interviews

I so much agree with this. Interviews are just better one on one. Mind the stress of the candidate.

https://jacobian.org/2022/jul/8/avoid-panel-interviews/


Is your smartphone ruining your memory? A special report on the rise of ‘digital amnesia’

Tags: tech, smartphone, attention-economy

None of this looks like definitive research results on the topic. Still, there are quite a few weak signals pointing in the same direction.

https://www.theguardian.com/global/2022/jul/03/is-your-smartphone-ruining-your-memory-the-rise-of-digital-amenesia


A Bored Chinese Housewife Spent Years Falsifying Russian History on Wikipedia

Tags: tech, wikipedia, surprising

Now this is a really strange story… amazing how she managed to be so successful and stay under the radar for so long!

https://www.vice.com/en/article/pkgbwm/chinese-woman-fake-russian-history-wikipedia


Days of Rage – George Monbiot

Tags: ecology, politics

Too little too late? Let’s hope not… now it’s time to see radical changes.

https://www.monbiot.com/2022/07/19/days-of-rage/



Bye for now!

Thursday, 21 July 2022

The previous blog post introduced my project for this summer. This blog post gives an update on the work that has been done so far in the past 5 weeks, and sketches out the plans for upcoming weeks.

Basic Concepts:

Types of Permissions:

Flatpak Permissions are categorized under “shared”, “sockets”, “devices”, “features”, “filesystems”, “session bus policy”, “system bus policy” and “environment”.

Among these, the first 5 belong to the “Context” group of permissions, while the last 3 are their own groups (going by the same name).

Among these, the first 4 are of (what I term) the “simple” type: they have only an on/off value, with no further levels or specifications.

The “filesystems” permissions can either be set off, or be set to “read-only”, “read/write” (this is default when the permission is “on” but there is nothing to specify the access level) or “create”.

The Session and System Bus Policy permissions (which are just names of buses with corresponding values) can have the values “talk”, “own”, “see” and “None”. The first three mean that the permission is “on”, and the last means that the permission is “off”.

The Environment permissions can have any value that the user types, instead of having a fixed set of values to choose from.

Metadata and Overrides File Structure:

The default values for all these permissions are loaded from their InstalledRef’s metadata file. The “current” values (that is, values changed from default by the user) are located in an overrides file, which is named after the Flatpak app’s ID and lives in an overrides directory (usually at ~/.local/share/flatpak/overrides).

The permission sections of the metadata file are described below:

For “simple” permissions, if the name of the permission exists within the respective category under the Context group, then the permission is granted by default. For “filesystems” permissions, if the permission name exists within the “filesystems” category under the Context group, the permission is granted by default too, and the access level is specified after a colon (or not specified at all, in which case it is “read/write”). Since these permissions are under the Context group, they are specified under a “[Context]” heading.

The “bus” permissions are specified under the name of the bus policy group, with each bus on its own line in the format busName=value. If a bus name is listed under the relevant header, the permission exists and is not set to “off”. The value specifies whether the application can just talk to the bus or own it.

The “environment” permissions are listed exactly as the bus permissions, under the “[Environment]” header. The only difference is that these may have any value, and not one from a fixed set.

To see this format for yourself, type the following command on the terminal: flatpak info -M id.of.any.flatpak.app.installed.on.your.system
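To make the format above concrete, here is a minimal parsing sketch using only the C++ standard library. It is not the code from the project: the function names (parseKeyFile, splitList, parseFilesystem) and the sample metadata are made up for illustration, and it assumes list values are separated by semicolons and that a filesystem access level is one of the suffixes ro, rw or create.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Made-up sample in the layout described above (not taken from a real app).
const char *const kSampleMetadata =
    "[Context]\n"
    "shared=network;ipc;\n"
    "sockets=x11;wayland;\n"
    "filesystems=xdg-download;home:ro;\n"
    "\n"
    "[Session Bus Policy]\n"
    "org.freedesktop.Notifications=talk\n"
    "\n"
    "[Environment]\n"
    "FOO=bar\n";

using Group = std::map<std::string, std::string>;
using KeyFile = std::map<std::string, Group>;

// Collect "key=value" lines under their "[Group]" header; other lines are skipped.
KeyFile parseKeyFile(const std::string &text) {
    KeyFile result;
    std::string group;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        if (line.front() == '[' && line.back() == ']') {
            group = line.substr(1, line.size() - 2);
            result[group]; // remember the group even if it has no keys
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos || group.empty()) continue;
        result[group][line.substr(0, eq)] = line.substr(eq + 1);
    }
    return result;
}

// Split "a;b;c;" into {"a", "b", "c"}, ignoring empty entries.
std::vector<std::string> splitList(const std::string &value) {
    std::vector<std::string> items;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ';')) {
        if (!item.empty()) items.push_back(item);
    }
    return items;
}

struct FilesystemEntry {
    std::string path;
    std::string access; // "ro", "rw" or "create"
};

// "home:ro" -> {home, ro}; without a known suffix the level defaults to "rw".
FilesystemEntry parseFilesystem(const std::string &entry) {
    assert(!entry.empty());
    const auto colon = entry.rfind(':');
    if (colon != std::string::npos) {
        const std::string suffix = entry.substr(colon + 1);
        if (suffix == "ro" || suffix == "rw" || suffix == "create") {
            return {entry.substr(0, colon), suffix};
        }
    }
    return {entry, "rw"};
}
```

Any permission name found in a list counts as granted by default; anything absent would be shown as off.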

Having described this context (pun intended), I proceed to track what work is done:

Work Done:

I started with loading all installed flatpak applications (in the “FlatpakRef” class), specifying their IDs, display names, versions, icons etc. This was easily done using FlatpakInstalledRef struct and related functions. Once this was done, I progressed to loading the permissions granted to each application.

To do this, the first step was to load all the permissions regardless of whether or not they are granted. Therefore, I created instances of the FlatpakPermission class with names, descriptions and values for all permissions listed in the Flatpak metadata. This is necessary because the user needs to be able to see all the permissions in the menu in order to edit them. At this point, the permissions were just loaded; their values were not yet set.

The next step was to load their default values. This is done by parsing the metadata file (as described in the above section). After this, current values had to be loaded. Therefore, we parse the overrides file (if it exists at all) and store the current values.

After this, I implemented the ability to set and unset the permissions. For simple permissions, this was straightforward, since the new value of the permission had to just be the opposite of the old value.

For filesystems, bus and environment permissions (hereafter collectively called “complex” permissions), the user needs the ability to “edit” the permissions (that is, change their level, such as upgrading “talk” to “own” for buses or downgrading “read/write” to “read-only” for filesystems) in addition to setting/unsetting the permissions.

For complex permissions, if you set a permission to on from off, the permission is restored to the same level that it had before it was set to off. For permissions that never had their levels changed, this is the default level. For permissions that did have their level changed, this is the last change the user made before setting it off. For permissions that were never turned on (that is, those that were also off by default), the level ends up being whatever the default for that permission is (e.g. read/write for filesystems).
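The restore behaviour can be pictured as keeping the level and the on/off state as two separate fields. The class below is a hypothetical, simplified model (it is not the FlatpakPermission class from the project) that only illustrates that rule.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical model of one "complex" permission (filesystem, bus or environment).
class ComplexPermission {
public:
    ComplexPermission(std::string defaultLevel, bool enabledByDefault)
        : m_level(std::move(defaultLevel)), m_enabled(enabledByDefault) {
        assert(!m_level.empty()); // every permission needs a default level
    }

    bool isEnabled() const { return m_enabled; }

    // Level currently in effect; an empty string while the permission is off.
    std::string effectiveLevel() const { return m_enabled ? m_level : std::string(); }

    // Turning the permission off keeps m_level, so turning it on again
    // restores whatever level was last chosen (or the default).
    void setEnabled(bool on) { m_enabled = on; }

    // Editing the level is only meaningful while the permission is on.
    void setLevel(const std::string &level) {
        if (m_enabled) m_level = level;
    }

private:
    std::string m_level;
    bool m_enabled;
};
```

A permission that was off by default simply starts with its default level stored, so switching it on for the first time yields that default.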

I implemented the loading of default values, current values and setting of permissions for simple permissions first, and then moved on to the complex ones. I did this because the simple ones were simple and would give me an idea of what to do next, but this also led to some design decisions that did not accommodate the demands of complex permissions, so I spent some time between the two refactoring.

Screenshot

Future plans:

I am presently working on implementing the apply/default buttons (currently, just setting the values changes the value for real, instead of waiting for the user to click apply), and allowing the user to set those complex permissions that are not mentioned explicitly in the metadata. Once this is done, I will begin working on the redesign of the interface. The redesign we have in mind is this:

Divide the permissions page into two parts: the “basic” section and the “advanced” section. The latter is hidden by default and can be accessed through a button labelled “Advanced…”. The basic permissions will be few and grouped together, while the advanced ones will be many, categorized under their respective section names. For the advanced permissions, we will provide some tooltips to help the user navigate them better.

Learnings:

The learning has been immense:

  1. I got better at using Qt and learned a lot more about model/view programming.
  2. I experienced and learned from the costs of writing bad code without thinking through properly. But fortunately, this gave me invaluable lessons on also how to think through properly and write code that is generic yet elegant (I am quite excited to embark on a refactoring quest after GSoC formally ends and put this learning to use:D).
  3. I had always struggled with file handling across all languages I have used, but now I am quite comfortable with it.
  4. I understood some modern C++ concepts, such as lambdas and smart pointers, which I intend to start using for all future development.
  5. Finally, I got an insight into how Linux applications work and behave, specifically flatpaks, such as the different files that the app depends on and their locations on the system.

Here, I end this blog post. Hopefully, my next blog post will be a week or two from now describing the end of the work outlined above. Until next time!

I made progress - on getting stuck in my work.

In my meeting with my mentors last week, we decided I'll work on the Space home page while finalizing the merge request I had opened.

Getting Space Home Page on Stack

I wrote a new QML file, but later realised I couldn't figure out an easy way to pass data to it. So my Space home page was populated with hardcoded dummy data. (lorem ipsum 👍)

space-home

So, to see how the UI looks, I'll need to view it, right? Yes of course. But the problem here was that once I added the Space home page to the page stack on the right pane of NeoChat, I couldn't close it. Neither could I open any room.

Removing the offending piece of code should fix it, right? Yes?

NO!

Something about the cache that NeoChat stores to restore state on startup was not letting me exit the Space home page. Too bad. Thankfully Tobias told me which cache file to delete and I was back up and running.

Caching Space Hierarchy on Startup

I wrote a function SortFilterRoomListModel::cacheSpaceHierarchy() that is supposed to be fired when the class constructor is called and will query /hierarchy for each space to cache the list of their child rooms. This caching will let NeoChat instantly update the UI when filtering rooms. SortFilterRoomListModel […] they were the logs I expected, so why should I cry. Room lists were being cached.

The next problem was that I couldn't access the cached room list while filtering. The map was always empty. Why? I didn't know. I did my old-school debugging, i.e. sticking print statements all over the place. Some qInfo() calls later, I came to realize there are not just two instances of SortFilterRoomListModel, but actually THREE. Nice. What's worse is that for the third one, the cache function wasn't being called at all, and this third one is what was presented to the user.

The issue was too peculiar to me, and with all the evidence gathered, it seemed like quite a long issue to explain over chat to my mentors. Thankfully, that was two days back; today I had a call with Tobias, who cleared the air of mystery.

Untangling Myself

Tobias told me there were indeed multiple instances of SortFilterRoomListModel which I shouldn't care about. What I should care about is that the instance shown to the user isn't working as expected. He took a look at my code and traced down the issue to something I personally wouldn't have suspected.

connect(&Controller::instance(), &Controller::activeConnectionChanged, this, [this]() {
    cacheSpaceHierarchy();
});

This is how I was calling cacheSpaceHierarchy(). Looks fine, EXCEPT that this connection is made only after the signal we're waiting for (Controller::activeConnectionChanged) has already been fired. The signal doesn't fire again and the caching function is never called. The solution was to add another call to cacheSpaceHierarchy() outside of this connection.

Such an easy fix.
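The effect is easy to reproduce without Qt. In the sketch below, MiniSignal and FakeModel are hypothetical stand-ins (not Qt or NeoChat classes): a handler connected after an emission never sees that emission, so the fixed setup also calls the function once directly.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal signal: handlers only see emissions made after connect().
class MiniSignal {
public:
    void connect(std::function<void()> handler) {
        assert(handler); // an empty std::function cannot be called
        m_handlers.push_back(std::move(handler));
    }
    void emitNow() {
        for (auto &handler : m_handlers) handler();
    }

private:
    std::vector<std::function<void()>> m_handlers;
};

// Stand-in for the model; cacheCalls counts how often the caching ran.
struct FakeModel {
    int cacheCalls = 0;
    void cacheSpaceHierarchy() { ++cacheCalls; }

    // Original approach: rely on the signal alone.
    void setupConnectOnly(MiniSignal &activeConnectionChanged) {
        activeConnectionChanged.connect([this] { cacheSpaceHierarchy(); });
    }

    // Fixed approach: connect for future changes and also run once right away,
    // in case the signal was already emitted before we connected.
    void setupConnectAndCall(MiniSignal &activeConnectionChanged) {
        activeConnectionChanged.connect([this] { cacheSpaceHierarchy(); });
        cacheSpaceHierarchy();
    }
};
```

If the signal has already been emitted by the time the model is constructed, setupConnectOnly() leaves cacheCalls at 0, while setupConnectAndCall() runs the caching once immediately and again on any later change.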

As for my problem with not being able to exit the Space home page, Tobias suggested I try pushing the page onto pageStack.layers instead of pageStack like I was doing.

Certain things are easy, but I end up complicating them for no reason.

Wednesday, 20 July 2022

Abstract

At the end of my studies at Université de Technologie de Compiègne, I completed an internship as a machine learning engineer. During this internship, I had the opportunity to research and integrate one of the features of Sinequa’s product: keyword extraction. In the first phase, I tried applying a deep learning model to the problem of keyword extraction. The main idea was to apply Bert Joint KPE, reproduce its results, and improve its performance. I extended my research to applying the model to long documents, which is a weakness of BERT-base models. In the next phase of my internship, I implemented the keyword extraction feature in Sinequa’s product, using the results of the research part. After this phase, the Sinequa application can use a pretrained deep learning model to run keyword extraction in all of its use cases. This internship was a great opportunity for me to improve my deep learning research skills and to integrate a deep learning feature into production.

Project description

Data pipeline

Data is one of the most important elements of a deep learning project. To get data that generalizes well, I collected data from various sources and topics. However, as the model is part of a commercial product, the sources must be free for commercial use. In the end, I kept only 4 datasets: [inspec], [pubmed], [semeval17], and [kptimes]. There is also the [ldkp10k] dataset, whose validity is still being checked. As each data source has its own representation, I designed a data pipeline to transform the raw format into the input format of the model.

First of all, I determined a pivot format for the data. This is the common format for all datasets, and it contains only the information needed for training and evaluating keyword extraction. Hence, each sample in pivot format contains: the url as id of the document, the document in its initial format, and the list of keywords or keyphrases. I use one script per dataset to transform from raw format to pivot format.

From the pivot format, the data goes through preprocessing to get the input format. In this phase, we can choose different preprocessing strategies: tokenization, casing or uncasing, long document processing, and whether or not to apply stemming or lemmatization. Each preprocessing strategy influences the performance of the model. Hence, I consider the preprocessing a hyperparameter of the model.
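As a rough illustration of the pivot format and of preprocessing treated as a set of options, here is a small sketch. The type and field names (PivotSample, PreprocessOptions) are assumptions, and plain whitespace splitting stands in for the real tokenizer; truncation is just one simple way to handle long documents.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One sample in the pivot format: an id, the raw document, and its keyphrases.
struct PivotSample {
    std::string url;
    std::string document;
    std::vector<std::string> keyphrases;
};

// Preprocessing choices, treated like hyperparameters of the model.
struct PreprocessOptions {
    bool lowercase = true;        // "uncased" variant when true
    std::size_t maxTokens = 512;  // simple truncation for long documents
};

// Turn a pivot sample into model input tokens according to the options.
std::vector<std::string> preprocess(const PivotSample &sample,
                                    const PreprocessOptions &options) {
    assert(options.maxTokens > 0);
    std::string text = sample.document;
    if (options.lowercase) {
        std::transform(text.begin(), text.end(), text.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
    }
    std::vector<std::string> tokens;
    std::istringstream in(text);
    std::string token;
    while (tokens.size() < options.maxTokens && in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Changing a field of PreprocessOptions and retraining is then comparable to changing any other hyperparameter.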

Research Keyword extraction with Bert-Joint KPE

Bert-Joint KPE is a deep learning algorithm that adapts a BERT base model to extract the keywords of a document. The [image] describes the architecture of the model. First of all, the raw text is passed into a BERT model to extract one embedding per token. It embeds each token of the document into a numeric vector that contains abstract information in the current context. There are various variants of BERT that have different features. The token embeddings are then fed into a custom CNN with various window sizes (from 1 to the maximum length of a keyphrase), which creates an N-gram representation for every possible N-gram. Those N-gram representations are then fed to the chunking network, which selects which N-grams are keyphrases or “chunks” thanks to a linear binary classification layer. Those chunks are in turn fed to the ranking network, which ranks them according to their salience and assigns a score to each of them with a linear layer. The training is done jointly on the combined loss of the chunking and ranking networks: L = L_Chunk + L_Rank. For L_Chunk, the cross-entropy is computed for every chunk, where the chunk label is 1 if the chunk represents a keyphrase and 0 otherwise. For L_Rank, the hinge loss in pairwise learning to rank is computed on exact matches with the ground truth, by minimizing the score of non-keyphrases minus the score of real keyphrases.
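The combined objective can be written down in a few lines. The sketch below is only an illustration of a cross-entropy chunking loss plus a pairwise hinge ranking loss, not the actual training code: the function names, the margin of 1.0 and the averaging over chunks and pairs are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Chunking loss: binary cross-entropy over all candidate n-grams.
// probs[i] is the predicted probability that n-gram i is a keyphrase,
// labels[i] is 1 for a ground-truth keyphrase and 0 otherwise.
double chunkLoss(const std::vector<double> &probs, const std::vector<int> &labels) {
    assert(probs.size() == labels.size());
    if (probs.empty()) return 0.0;
    const double eps = 1e-12; // keeps log() away from zero
    double sum = 0.0;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        const double p = std::min(std::max(probs[i], eps), 1.0 - eps);
        sum += labels[i] == 1 ? -std::log(p) : -std::log(1.0 - p);
    }
    return sum / static_cast<double>(probs.size());
}

// Ranking loss: pairwise hinge loss. Each (keyphrase, non-keyphrase) pair
// contributes max(0, margin - score_pos + score_neg), so non-keyphrases are
// pushed to score lower than real keyphrases.
double rankLoss(const std::vector<double> &scores, const std::vector<int> &labels,
                double margin = 1.0) {
    assert(scores.size() == labels.size());
    double sum = 0.0;
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (labels[i] != 1) continue;
        for (std::size_t j = 0; j < scores.size(); ++j) {
            if (labels[j] != 0) continue;
            sum += std::max(0.0, margin - scores[i] + scores[j]);
            ++pairs;
        }
    }
    return pairs == 0 ? 0.0 : sum / static_cast<double>(pairs);
}

// Joint objective: L = L_Chunk + L_Rank.
double jointLoss(const std::vector<double> &probs, const std::vector<double> &scores,
                 const std::vector<int> &labels) {
    return chunkLoss(probs, labels) + rankLoss(scores, labels);
}
```

When every keyphrase already outscores every non-keyphrase by at least the margin, the ranking term is zero and only the chunking term contributes.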

Research Keyword extraction on long document

Integrating keyword extraction into Sinequa’s product

Monday, 18 July 2022

Cutelyst, the Qt Web framework, just got a new license: the more permissive BSD-3-Clause. Back in 2013, when I started the project, the LGPL was a perfectly fine license, as software on servers can be closed as long as its dependencies are not AGPL, so it was permissive enough for the web and REST backend use cases I had in mind.

Fast-forward almost 10 years and I have used it in a few projects where it was embedded into another application, and I realized that there might be users with a commercial Qt license who would like to use it but can’t due to the current license.

Hesitate no more! It’s amazing how much nicer it is to implement client-server applications using HTTP/REST APIs all with Qt/C++, and when real time is needed WebSockets is also there to the rescue.

Since Ubuntu 22.04 ships Qt 5.15 and 6.2, Qt 5.15.2 will be the new minimum for the next releases, as I want to increase the use of QStringView, eventually leading to a major version update.

*UPDATE: This release includes a QtWidgets example application showing how to use Cutelyst::Server embedded into a GUI application.

https://github.com/cutelyst/cutelyst/releases/tag/v3.5.0

Sunday, 17 July 2022

GSoC is in full swing and here is my first progress update! I’ve been spending time getting familiar with the Krita code base. The first step in my project was making SVG appear as an export option so that I can start testing the export code. While this may seem straightforward (I certainly thought it would be), there are a few things that we’ll need to do.

First, how does Krita know what files it can import/export as? Well that is easy enough to answer, in a database. Specifically Krita has a class KisMimeDatabase that stores all available file formats Krita supports. Adding a new option to this database is fairly easy as there are plenty of examples in the KisMimeDatabase.cpp. We can mostly copy/paste how other options are added but replace that file name with svg. Neat :).

But just because it’s in the database doesn’t mean it’s a valid file format that Krita recognizes. All file formats need a corresponding impex plugin in plugins/impex/$FILE_FORMAT and need to be included in plugins/impex/CMakeLists.txt. Thankfully for us, since svg is already partially supported in Krita, some of this work is already done. We can see in plugins/impex/svg/CMakeLists.txt that there are already definitions for svg import, so now we just need to figure out the export. Thanks to wolthera (for linking) and amyspark (for creating), we have an example commit to figure out what needs to be added. Looking at that commit, the additions we need to make for export are pretty similar to what’s already there for import. It’s mostly the same, except we add kritaimpex to the target libraries and krita_svg.desktop to the ${XDG_APPS_INSTALL_DIR}. I don’t think the .desktop file is strictly required (I think it informs Linux desktops that Krita supports this file format), but we may as well add it while we’re here.

Now that we’ve added all these files to the CMakeLists, we actually need to implement some of the code that Krita is looking for. Not everything, but just enough so Krita is happy and we can start testing and iterating. Outside the basic constructor/destructor, we can see in other plugins that we need to implement KisImportExportErrorCode convert() and initializeCapabilities(). convert() is what actually converts between Krita’s .kra file and SVG, while initializeCapabilities() tells Krita what features SVG supports. We may also need to implement createConfigurationWidget(); after some digging, it turns out this is used to create a pop-up widget that shows export options. We may need to implement it in the future (if converting between formats requires artists to make a decision), but that’s uncertain right now. For convert() and initializeCapabilities(), we add some basic stubs so Krita finds what it expects. Finally, we also create plugins/impex/svg/krita_svg_export.json and add it to the plugin ExportFactory. Honestly, I am not totally sure how this last part works, but it seems it is also used to generate the export list.

OK! That’s mostly all I’ve been working on for my GSoC project. There were many hours of troubleshooting, comparing, and testing involved, but it was very satisfying making every little bit of progress :). There are still many things to do, so hopefully there will be lots more to talk about next time!

Friday, 15 July 2022

For someone who really doesn’t like the company or the platform, I’ve had curiously many Macs. It started with a PowerBook Pismo which I got secondhand to investigate some problems Krita had with big-endianness (it had a PowerPC CPU and ran Debian). During the first Krita Kickstarter, I got KO GmbH to buy a Mac mini so I could work on porting Krita to macOS. That one was horribly slow, so then in 2015 I got a 15″ MacBook Pro. In 2020 I first got an M1 MacBook Pro, to look into making Krita ready for the M1 CPU. And after that an M1 Mac mini for KDE’s binary factory. I haven’t noticed other projects making use of it, though, and it’s a bit unstable.

And then, since I still could get a good trade-in value, I decided to swap the 13″ M1 for a 14″. The 13″ screen was always a bit too small for me and I hated the touch bar with a vengeance.

I’ve been using it now for a bit, and here are my impressions…

The out-of-the-box experience was… trying my patience a lot! First it needed to download and install 6.1 GB of updates before I could even start sending over my user files. That took hours, even over my really fast glass connection.

Then I wanted to transfer my user folder from the old macbook to the new one; I was warned that that would take five hours. In the end, it was “only” two hours. But that worked really well: everything was copied and ready for me.

Only at that point, by now early in the evening, could I log in. And that didn’t work. I needed to futz with my Apple ID from another Apple device, and that several times. When that was finally sorted, and I don’t remember how I sorted it in the end, macOS insisted on setting up all kinds of stuff I’ll never use, like iCloud.

The next day, I could finally set up my development environment, Dropbox and other stuff. Dropbox on M1 macOS has a problem: it can no longer install the kernel extension that would automatically download an off-line file, which means… For every file in Dropbox that I want to use, I need to manually make it available off-line. That’s still not sorted.

As for the development environment, installing Xcode took, once again, hours. I only use the command-line stuff; the IDE I use for working on Krita is Qt Creator.

So, now I was all set to go and build Krita. At that point it was clear that this laptop is amazingly fast. Compiled C++ files scrolled by at a clip that I only know from C on other computers.

Actually developing, though, is not such a nice experience. The problem is mainly with the keyboard. As far as keyboards go, the actual keys type fine. It hasn’t got a lot of travel, but it’s easy on the nail polish and it feels good: typing text is a lot of fun. It’s got function keys again, which is also nice.

But…

It’s missing so many keys. I know, that’s par for the course with Apple, but when using Krita, a missing Insert key means no easy way to create layers. And there’s a lot of inconsistency between applications. In Terminal, you switch tabs with Control-Tab, in Firefox with Option-Command-Left/Right, in Qt Creator with Option-Tab. I haven’t figured out what it is in Kate. Navigating around text is also inconsistent between applications. And that means that I just never get any finger memory down, especially since I also use all other operating systems…

The window manager is also pretty primitive and needs help from an external utility called Rectangle.

And the permissions stuff is crazy. The Wacom tablet driver needs permissions to use Accessibility, and so, for some reason, does Dropbox.

As for the rest of the hardware, it is fine… The screen is good, and I don’t mind the notch, since I run pretty much every application full-screen, all the time, because of how bad window management is compared to KWin, even with Rectangle. Battery life is good, but not as good as the 13″ battery life was.

And I have got a cute cover sticker with Kiki on it!

A pretty laptop sticker