Introduction
License information in source code is best stored in each file of the source code as a comment, if at all possible. That way the license metadata travels with the file even if it was copied from its original package/repository into a different one.
Client-side JavaScript, CSS and similar languages that make up a large chunk of the web are often concatenated, minified and even uglified in an attempt to make the website faster to load. In this process, comments are usually the first thing to get culled, to reduce the number of characters that serve no function to the program code itself.
Problem
The problem therefore is that typically when JavaScript, CSS (or similar client-side code) is being built, it tends to lose not just comments that describe the code’s functionality, but also comments that carry licensing and copyright information. And since licenses (FOSS or not) typically require the text of the license and copyright notices to be kept with the code, such removal can be problematic.
Proposal
The goal is to preserve copyright and licensing information of web front-end code even after minification in such a way that it makes FOSS license compliance of web applications easier.
In addition, my proposal is intended to:
- be as simple as possible;
- be as short as possible;
- not introduce any new specifications, but rely on well-established standards; and
- not require any additional tooling, but rely on what is already in common use.
Essentially, my suggestion is literally as simple as wrapping every .js, .css and similar (e.g. .ts, .scss, …) file with SPDX snippet tags, following the REUSE specification, as follows:
At the very beginning of the file introduce an “important” comment block that starts the (SPDX) snippet and includes all the REUSE/SPDX tags that apply to this file, e.g.:
/*!
 * SPDX-SnippetBegin
 * SPDX-SnippetCopyrightText: © 2024 Hacke R. McRandom <hacker@mcrandom.example>
 * SPDX-License-Identifier: MIT
 */
And at the very end of the file introduce another “important” comment block that simply closes the (SPDX) snippet:
/*! SPDX-SnippetEnd */
… and that is basically it!
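To make the wrapping concrete, here is a minimal sketch in plain JavaScript. The function name wrapAsSnippet and its parameters are made up for this example; nothing here is part of REUSE, SPDX or any build tool, it only shows the shape of the result.

```javascript
// Hypothetical helper (not part of any tool): wraps the contents of one
// source file in SPDX snippet tags, using "important" /*! ... */ comments.
function wrapAsSnippet(source, { copyright, license }) {
  const header = [
    "/*!",
    " * SPDX-SnippetBegin",
    ` * SPDX-SnippetCopyrightText: ${copyright}`,
    ` * SPDX-License-Identifier: ${license}`,
    " */",
  ].join("\n");
  const footer = "/*! SPDX-SnippetEnd */";
  // Trim trailing whitespace so the closing tag sits right after the code.
  return `${header}\n${source.trimEnd()}\n${footer}\n`;
}

const wrapped = wrapAsSnippet("code_goes_here();\n", {
  copyright: "© 2024 Hacke R. McRandom <hacker@mcrandom.example>",
  license: "MIT",
});
console.log(wrapped);
```

In practice you would most likely just type the two comment blocks by hand; the sketch is only meant to show that there is nothing more to it than a header and a footer.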
How and why this works (in theory)
This results in e.g. a .js file that would look something like this:
/*!
 * SPDX-SnippetBegin
 * SPDX-SnippetCopyrightText: © 2024 Hacke R. McRandom <hacker@mcrandom.example>
 * SPDX-License-Identifier: MIT OR Unlicense
 */
import half_of_npm
code_goes_here();
/*! SPDX-SnippetEnd */
and a .css file as follows:
/*!
 * SPDX-SnippetBegin
 * SPDX-SnippetCopyrightText: © 2020 Talha Mansoor <talha131@gmail.com>
 * SPDX-License-Identifier: MIT
 */
pre {
  overflow: auto;
  white-space: pre;
  word-break: normal;
  word-wrap: normal;
  color: #ebdbb2; /* This is needed due to a bug in Pygments. It does not wrap some parts of the code of some languages, like reST. This is a fallback. */
}
/*! SPDX-SnippetEnd */
All JavaScript, CSS, TypeScript, Sass, etc. files would look like that.
Then on npm run build (or whatever build system you use) the minifier keeps those tags where they are, because ! is a common trigger to keep that comment when minifying.
So if all the files are tagged as such, the minified barf you get should include all the SPDX tags in order and in the right place, so you can see where each license/copyright starts and stops applying in the gibberish.
And if it pulls stuff that does not use REUSE (snippets) yet, you will still be able to tell it apart, since it will be the barf that’s between SPDX-SnippetEnd of the previous and SPDX-SnippetBegin of the next properly marked barf.
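Here is a rough sketch of how one could read such a bundle back. It assumes snippets are not nested and that every tag sits inside a /*! ... */ comment; findSnippetRegions and the sample bundle are invented for this illustration.

```javascript
// Hypothetical helper: splits a (minified) bundle into regions that are
// covered by SPDX snippet tags and regions that sit between marked snippets.
// Assumption: snippets are not nested.
function findSnippetRegions(bundle) {
  const importantComment = /\/\*![\s\S]*?\*\//g;
  const regions = [];
  let cursor = 0;   // everything before this index is already accounted for
  let open = null;  // the snippet currently being read, if any

  const pushUnmarked = (from, to) => {
    // Only report gaps that contain something other than whitespace.
    if (bundle.slice(from, to).trim() !== "") {
      regions.push({ from, to, license: null });
    }
  };

  for (const match of bundle.matchAll(importantComment)) {
    const text = match[0];
    const end = match.index + text.length;
    if (text.includes("SPDX-SnippetBegin") && open === null) {
      pushUnmarked(cursor, match.index);
      const id = /SPDX-License-Identifier:[ \t]*(.+)/.exec(text);
      const license = id ? id[1].replace(/\s*\*\/\s*$/, "").trim() : "unknown";
      open = { from: match.index, license };
      cursor = match.index;
    } else if (text.includes("SPDX-SnippetEnd") && open !== null) {
      regions.push({ from: open.from, to: end, license: open.license });
      open = null;
      cursor = end;
    }
  }
  // Whatever is left (including a snippet that was never closed) is unmarked.
  pushUnmarked(cursor, bundle.length);
  return regions;
}

const bundle = [
  "/*!",
  " * SPDX-SnippetBegin",
  " * SPDX-SnippetCopyrightText: 1984 Winston Smith <win@smith.example>",
  " * SPDX-License-Identifier: Unlicense",
  " */",
  "const a=1;",
  "/*! SPDX-SnippetEnd */",
  "function thirdParty(){}",
  "/*!",
  " * SPDX-SnippetBegin",
  " * SPDX-SnippetCopyrightText: © 2021 Test Dummy <dummy@test.example>",
  " * SPDX-License-Identifier: BSD-2-Clause",
  " */",
  "const b=2;",
  "/*! SPDX-SnippetEnd */",
].join("\n");

for (const region of findSnippetRegions(bundle)) {
  console.log(region.license ?? "UNMARKED", region.from, region.to);
}
```

The untagged thirdParty() function shows up as its own UNMARKED region between the two tagged snippets, which is exactly the “barf in between” described above.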
Is this really enough?
OK, so now we know the start and end of a source code file that ended up in the minified barf. But are the SPDX-SnippetCopyrightText and SPDX-License-Identifier enough?
I think so, yes.
If I chose to express my copyright notice using an SPDX tag – especially if I followed the format that pretty much all copyright laws prescribe – that should be no problem.
The bigger question is whether communicating the license solely through the SPDX IDs is enough, since you would technically not be including the whole license text(s). Honestly, I think it is fine. At this stage SPDX is not just long-established in the industry and community, but is also a formal international standard (ISO/IEC 5962:2021). Practically from its beginning – and probably the best-known part of the spec – unique names/IDs of licenses and the equivalent canonical texts of those licenses have been part of SPDX. Which means that if I see SPDX-License-Identifier: MIT I know it specifically means the text that is on https://spdx.org/licenses/MIT.html. Ergo, as long as you are using licenses from the SPDX License List in these tags, all the relevant info is present.
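As a small sketch of that lookup: assuming the URL pattern of the MIT page holds for the other entries on the SPDX License List, the value of an SPDX-License-Identifier tag can be turned into links to the canonical texts. The function licenseTextUrls is hypothetical, it assumes upper-case operators, and it does not check that an ID actually exists on the list.

```javascript
// Hypothetical helper: maps the value of an SPDX-License-Identifier tag to
// the pages holding the canonical license texts.
// Assumptions: operators are written in upper case, and the URL pattern
// https://spdx.org/licenses/<ID>.html applies to every listed license.
function licenseTextUrls(expression) {
  const tokens = expression.replace(/[()]/g, " ").split(/\s+/).filter(Boolean);
  const urls = [];
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (token === "AND" || token === "OR") continue;
    if (token === "WITH") {
      i++; // skip the exception ID that follows WITH
      continue;
    }
    if (token.startsWith("LicenseRef-")) continue; // custom license, not on the list
    urls.push(`https://spdx.org/licenses/${token.replace(/\+$/, "")}.html`);
  }
  return urls;
}

console.log(licenseTextUrls("MIT OR Unlicense"));
```

For the .js example above this yields the MIT and the Unlicense pages, one URL per license in the expression.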
As mentioned, most minifiers tend to remove comments by default to conserve space. But there is a way to retain license comments (or other important comments). And this method has existed for over a decade now!
I have done some research into how different minifiers deal with this. Admittedly, mostly by reading through their documentation. Due to my lack of skills, I did not manage to test out all of them in practice.
But at least theoretically the vast majority of the minifiers that I was told are common (plus a few more I found) seem to support at least one way of keeping important – or even explicitly copyright/license-relevant – comments.
From what I can tell it is only Bun that does not support any way to (selectively) preserve comments. There is a ticket open to implement (at least) the ! method though.
While not themselves minifiers, module bundlers do call and handle minifiers. Here are notes about the ones that I learnt are the most popular ones:
From this overview it seems like using the /*! comment method is our best option – it is short, the most widely supported and not loaded with meaning.
More details on both styles below.
Using @license / JSDoc-style
JSDoc is a markup language used to annotate JavaScript source code files to add in-code documentation, which is then used to generate documentation.
Looking at the JSDoc specification, the following keywords seem relevant:
So from JSDoc it seems the best choice would be @license. To quote from the spec itself:
The @license tag identifies the software license that applies to any portion of your code.
You can use any text to identify the license you are using. If your code uses a standard open-source license, consider using the appropriate identifier from the Software Package Data Exchange (SPDX) License List.
Some JavaScript processing tools, such as Google's Closure Compiler, will automatically preserve any JSDoc comment that includes a @license tag. If you are using one of these tools, you may wish to add a standalone JSDoc comment that includes the @license tag, along with the entire text of the license, so that the license text will be included in generated JavaScript files.
Using /*! / YUI-style
This other style seems to originate from Yahoo! UI Compressor 2.4.8 (also from 2013), to quote its README:
C-style comments starting with /*! are preserved. This is useful with comments containing copyright/license information. As of 2.4.8, the '!' is no longer dropped by YUICompressor. For example:
/*!
 * TERMS OF USE - EASING EQUATIONS
 * Open source under the BSD License.
 * Copyright 2001 Robert Penner All rights reserved.
 */
remains in the output, untouched by YUICompressor.
Many other projects adopted this; some extended it to also cover the single-line //!, but others have not.
Also note that YUI itself does not use the double-asterisk /** tag (if it did, it should be /**!), whereas that is typically the starting tag in JSDoc (and JavaDoc) of a document-relevant comment block.
So from the YUI-style tags, it seems that (multi-line) C-style comments that start with /*! are the most consistently supported.
And as YUI-style seems to be the most commonly implemented way to tag and preserve (licensing-relevant) comments in JS, it would seem prudent to adopt it for our purposes – to preserve REUSE-standardised tags to mark license and copyright information in files and snippets.
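To tell the two conventions apart at a glance, here is a deliberately simplified sketch. It is not how any particular minifier is implemented, as each has its own rules and settings; preservationStyle is just an illustrative name.

```javascript
// Simplified illustration only: classifies a comment by the convention it
// follows. Real minifiers each apply their own (configurable) rules.
function preservationStyle(comment) {
  if (comment.startsWith("/*!")) return "yui";       // "important" block comment
  if (comment.startsWith("//!")) return "yui-line";  // single-line extension, less widely supported
  if (/^\/\*[\s\S]*@license\b/.test(comment)) return "jsdoc"; // block comment with a @license tag
  return null; // an ordinary comment, which a minifier would typically drop
}

console.log(preservationStyle("/*! SPDX-SnippetEnd */"));
console.log(preservationStyle("/** @license MIT */"));
console.log(preservationStyle("/* just a note to self */"));
```

The first call reports the YUI style, the second the JSDoc style, and the third comment matches neither convention.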
A few PoCs I tried
So far the theory …
But when it comes to testing it in practice, it gets a bit messy and I very quickly reach the limits of my JS skills.
I have tried a few PoCs, and ran into mixed, yet promising, results so far.
Most issues, I assume, are fixable by simply changing the settings of the build tools accordingly.
It is entirely possible that a lot of the issues are PEBKAC as well.
Svelte + Rollup + Terser
The simplest PoC I tried is a Svelte app that uses Rollup as a build tool and Terser as the minifier – kindly offered by Oliver “oliwerix” Wagner as a guinea pig. I left the settings as they are, and the results are mostly fine.
First we pull the 29c9881 commit and build it with npm install; npm run build.
In public/build/ we have three files: bundle.css, bundle.js, bundle.js.map.
The bundle.css file does not have any SPDX-* tags, and I suspect this is because it consists solely of 3rd party components, which do not use these tags yet. The public/global.css is still referred to separately in public/index.html and retains the SPDX-* tags in its non-minified form. So that is fine, but would need further testing to check the minified CSS.
The bundle.js file contains the minified JS and the SPDX-* tags remain there, but with one SPDX-SnippetEnd being misplaced.
If we compare e.g. rg SPDX- src/*.js:
src/store.js
2: * SPDX-SnippetBegin
3: * SPDX-SnippetCopyrightText: 1984 Winston Smith <win@smith.example>
4: * SPDX-License-Identifier: Unlicense
18:/*! SPDX-SnippetEnd */
src/main.js
2: * SPDX-SnippetBegin
3: * SPDX-SnippetCopyrightText: © 2021 Test Dummy <dummy@test.example>
4: * SPDX-License-Identifier: BSD-2-Clause
17:/*! SPDX-SnippetEnd */
… and rg SPDX- public/build/bundle.js:
3: * SPDX-SnippetBegin
4: * SPDX-SnippetCopyrightText: 1984 Winston Smith <win@smith.example>
5: * SPDX-License-Identifier: Unlicense
8:/*! SPDX-SnippetEnd */
10:/*! SPDX-SnippetEnd */
13: * SPDX-SnippetBegin
14: * SPDX-SnippetCopyrightText: © 2021 Test Dummy <dummy@test.example>
15: * SPDX-License-Identifier: BSD-2-Clause
… it is clear that something is amiss. A snippet cannot end before it begins.
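This kind of check is easy to automate. Below is a minimal sketch, assuming snippets are not nested; checkSnippetOrder is a made-up name, and the sample input only mimics the order of the markers seen in bundle.js above, not its real contents.

```javascript
// Hypothetical checker: reports SPDX snippet markers that are out of order.
// Assumption: snippets are not nested, so markers must strictly alternate
// between SPDX-SnippetBegin and SPDX-SnippetEnd.
function checkSnippetOrder(text) {
  const problems = [];
  let openLine = null; // line of the currently open SPDX-SnippetBegin, if any

  for (const match of text.matchAll(/SPDX-Snippet(Begin|End)/g)) {
    // 1-based line number of this marker.
    const lineNo = text.slice(0, match.index).split("\n").length;
    if (match[1] === "Begin") {
      if (openLine !== null) {
        problems.push(`line ${lineNo}: SPDX-SnippetBegin while the snippet from line ${openLine} is still open`);
      }
      openLine = lineNo;
    } else {
      if (openLine === null) {
        problems.push(`line ${lineNo}: SPDX-SnippetEnd without a preceding SPDX-SnippetBegin`);
      }
      openLine = null;
    }
  }
  if (openLine !== null) {
    problems.push(`line ${openLine}: SPDX-SnippetBegin is never closed`);
  }
  return problems;
}

// Same order of markers as in the bundle.js output: Begin, End, End, Begin.
const markersOnly = [
  "/*! SPDX-SnippetBegin */",
  "/*! SPDX-SnippetEnd */",
  "/*! SPDX-SnippetEnd */",
  "/*! SPDX-SnippetBegin */",
].join("\n");

console.log(checkSnippetOrder(markersOnly));
```

For this input the checker flags the second SPDX-SnippetEnd (line 3) and the SPDX-SnippetBegin that is never closed (line 4).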
But when checking the public/build/bundle.js.map SourceMap, we again see the SPDX-* tags in order just fine.
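For anyone who wants to repeat that check without reading the map by eye, here is a small sketch. It assumes the SourceMap is the usual version 3 JSON and that the build embedded the original sources in its sourcesContent field; spdxTagsInSourceMap and the sample map are invented for illustration.

```javascript
// Hypothetical helper: lists the SPDX tags found in each original source
// embedded in a SourceMap. Assumes a v3 JSON map whose "sourcesContent"
// field is populated (entries may be null or missing).
function spdxTagsInSourceMap(mapJson) {
  const map = JSON.parse(mapJson);
  const contents = map.sourcesContent ?? [];
  return (map.sources ?? []).map((source, i) => ({
    source,
    tags: (contents[i] ?? "").match(/SPDX-[A-Za-z-]+/g) ?? [],
  }));
}

// Made-up miniature map, not the real bundle.js.map.
const sampleMap = JSON.stringify({
  version: 3,
  sources: ["src/store.js", "node_modules/some-dependency/index.js"],
  sourcesContent: [
    "/*!\n * SPDX-SnippetBegin\n * SPDX-License-Identifier: Unlicense\n */\nexport const a = 1;\n/*! SPDX-SnippetEnd */",
    "export default 1;",
  ],
  mappings: "",
});

for (const { source, tags } of spdxTagsInSourceMap(sampleMap)) {
  console.log(source, tags.length > 0 ? tags.join(", ") : "(no SPDX tags)");
}
```

To run it against a real build you would pass in the contents of the *.js.map file, e.g. read with Node's fs.readFileSync.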
I would really like to know what went wrong here.
React Scripts (+ WebPack + Terser)
Before that I tried to set up a “simple” React Scripts app with the help of my work colleague, Carlos “roclas” Hernandez.
Here, again, I am getting mixed results out of the box.
First we pull the af54954 commit and build it with npm install; npm run build.
On the CSS side, we see that there are two files in build/static/css/, namely: main.05c219f8.css and main.05c219f8.css.map.
Both the main.05c219f8.css and its SourceMap retain the SPDX-* tags where we want them, so that is great!
On the JS side it gets more complicated though. In build/static/js/ we have several files now, and if we pair them up:
- 453.2a77899f.chunk.js and 453.2a77899f.chunk.js.map
- main.608edf8e.js, main.608edf8e.js.LICENSE.txt and main.608edf8e.js.map
The 453.2a77899f.chunk.js* files contain no SPDX-* tags. Honestly, I do not know where they came from, but assume it is again 3rd party components, which do not use these tags yet. So we can ignore them.
But it is the main.608edf8e.js* files that we are interested in.
Unfortunately, it is here that it gets a bit annoying.
It seems React Scripts is quite opinionated and hard-codes its preferences when it comes to minification etc. So even though it is easy to set up WebPack and Terser to preserve (important) comments in the code itself, React Scripts overrides that.
What this results in then is the following:
- main.608edf8e.js is cleaned of all comments – no SPDX-* tags here;
- but now main.608edf8e.js.LICENSE.txt has all the SPDX-* tags as well as other important comments (e.g. @license blocks from 3rd party components);
- and as for the main.608edf8e.js.map SourceMap, it includes SPDX-* tags, as expected.
The really annoying bit is that it seems like main.608edf8e.js.LICENSE.txt is not mapped; it is just a dump of all the license-related comments. So that does not help us here.
There is a workaround by injecting code and settings using Rewire, but so far I have not managed to set it up correctly. I am sure it is possible, but I gave up after it took way too much of my time already.
Some early thoughts on *.js.map and *.js.LICENSE.txt
If the license and copyright info is missing from the minified source code, but it is there in the *.js.map SourceMap (spec), I think that is better than nothing, but I am leaning towards it not being enough for the goal we are trying to reach here.
Similarly, when the minifier simply shoves all the license comments into a separate *.js.LICENSE.txt file, removing them from the *.js file and without any way to map the license and copyright comments back to the source code, I do not see how this is much more useful than the *.js.map itself.
So far, it seems to me like this is a problem caused by some frameworks (e.g. React Scripts) hard-coding their preferences when it comes to minification, without an easy way to override it.
But if there was a *.js.LICENSE.txt (or equivalent) that was mapped via SourceMaps, so one could figure out which license comment matches which source code block in the minified code, I would be inclined to take that as potentially good enough.
Future ideas
Once the base issue of preserving SPDX tags in minified (web front-end) code in proper places is solved, we can expand it to make it even more useful.
Here is a short list of ideas that popped up already. I am keeping them hidden by default, to not detract too much from the base problem.
Nothing stops us from adding more relevant information in these tags – in fact, as long as it is an SPDX tag, that would be in line with both the SPDX standard and the REUSE spec. A perfect candidate to include would be something to designate the origin or provenance of the package the file came from – e.g. using PackageURL.
To make this even more useful in practice, it is entirely imaginable that build tools could help generate or populate these tags and therefore inject information themselves, some early ideas:
- for (external) packages that do not use REUSE/SPDX Snippet Tags, the build tool could be ordered to generate them from REUSE/SPDX File Tags;
- same, but to pull the information from a different source (e.g. LICENSE file) – that might be a bit dubious when it comes to exactness though;
- the above-mentioned PackageURL (or other origin identifier) could be added by the build tool.
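As a very rough sketch of the first idea, this is what generating snippet tags from existing REUSE file tags could look like. No build tool does this today as far as I know; fileTagsToSnippet is purely hypothetical and only handles the SPDX-FileCopyrightText and SPDX-License-Identifier tags.

```javascript
// Purely hypothetical: derives SPDX snippet tags from REUSE file-level tags
// already present in a source file, and wraps the file in them.
function fileTagsToSnippet(source) {
  // Drop a trailing "*/" in case the tag sits at the end of a block comment.
  const clean = (value) => value.replace(/\s*\*\/\s*$/, "").trim();
  const copyrights = [...source.matchAll(/SPDX-FileCopyrightText:[ \t]*(.+)/g)]
    .map((m) => clean(m[1]));
  const licenses = [...source.matchAll(/SPDX-License-Identifier:[ \t]*(.+)/g)]
    .map((m) => clean(m[1]));

  // Without any file tags there is nothing reliable to generate from.
  if (copyrights.length === 0 && licenses.length === 0) return source;

  const header = [
    "/*!",
    " * SPDX-SnippetBegin",
    ...copyrights.map((c) => ` * SPDX-SnippetCopyrightText: ${c}`),
    ...licenses.map((l) => ` * SPDX-License-Identifier: ${l}`),
    " */",
  ].join("\n");
  return `${header}\n${source.trimEnd()}\n/*! SPDX-SnippetEnd */\n`;
}

const dependency = [
  "// SPDX-FileCopyrightText: 2021 Test Dummy <dummy@test.example>",
  "// SPDX-License-Identifier: BSD-2-Clause",
  "export const x = 1;",
].join("\n");

console.log(fileTagsToSnippet(dependency));
```

The original // comments would still be stripped by the minifier, but the generated /*! blocks around them should survive.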
All of the above future ideas are super early ideas and some could well be too error-prone to be useful, but should be kept in mind and discussed after the base issue is solved.
Open questions & Call for help and testers
As stated, I am not very savvy when it comes to writing web front-ends, so at this point this project would really benefit from people with experience in building web front-ends taking a look.
If anyone knows how to get React and any other similarly opinionated frameworks to not create the *.js.LICENSE.txt file, but keep the important comments in code, that would be really useful.
If you are a legal or license compliance expert, while I am quite confident in the logic behind this, feedback is very welcome and do try to poke holes in my logic.
If you are a technical person (esp. front-end developers), please, try applying this to code in different build environments and let me know what works and what breaks. We need more and better PoCs.
If you have proposed fixes, even better!
Comments, ideas, suggestions, … welcome.
Ultimately, my goal is to come up with a solution that works and requires (ideally) no changes in tooling and specifications.
If that means abandoning this proposal and finding a better one, so be it. But it has to start somewhere that is half-way doable.
Credits
While I did spend quite a bit of time on this, this would not exist without prior work and a lot of help from others.
First of all, the basic idea of treating each file as a snippet that just happens to currently span a whole file, was originally proposed to me by José Maria “chema” Balsas Falaguera in 2020 and we worked together on an early PoC on how to apply REUSE to a large-ish JS + CSS code-base … before (REUSE Snippets and later) SPDX Snippets came to be.
In fact, it was this idea of Chema’s that sparked my quest to first bring snippet support to REUSE, and later to SPDX specifications.
At REUSE I would specifically like to thank Carmen Bianca Bakker, Max Mehl, and Nico Rikken for not refusing this idea upfront and also for being great intellectual sparring partners on this adventure.
And at SPDX it was Alexios “zvr” Zavras who saw the potential and helped me draft SPDX tags for snippets to the point where it got accepted.
I would also like to thank Henrik “hesa” Sandklef, Maximilian Huber and Philippe Ombredanne for their feedback and some proposals on how to expand this further later on.
Of course, none of this would be possible without everyone behind SPDX, REUSE, YUI, and JSDoc.
I am sure I forgot to mention a few other people, and if that was you, I humbly apologise and ask you to let me know, so I can correct this issue.
hook out → wow, this took longer than expected … so many rabbit holes!