This month, the FSFE is starting a project to facilitate automation in the software toolchain. We're looking for best practices in free and open source software development that facilitate use and increase the level of automation possible.
When you create a new project on GitHub, GitLab, Gitea or many other websites for software development, you're asked for a default license file to be included in the software repository. These files end up looking rather similar, in many cases to the point of being identical, which is a good thing. When things look similar, it's easier for a computer to understand them and find commonalities between them.
For instance, if I write a piece of software which knows what an MIT license looks like, it can likely tell me which repositories on GitHub have an MIT license, without me needing to manually read the license text of each.
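As a minimal sketch of what "knowing what an MIT license looks like" could mean, the snippet below checks a license text for two phrases that appear in the MIT license. The function names and the choice of marker phrases are my own assumptions for illustration; this is a heuristic, not a legal determination, and real license texts with typographic quotes or small edits may slip past it.

```python
import re

# Two distinctive phrases from the MIT license text. Which phrases to
# match on is an assumption made for this sketch.
MIT_MARKERS = (
    "permission is hereby granted, free of charge, to any person obtaining a copy",
    'the software is provided "as is", without warranty of any kind',
)


def normalize(text: str) -> str:
    """Lowercase the text and collapse whitespace so line wrapping doesn't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def looks_like_mit(license_text: str) -> bool:
    """Return True if every marker phrase occurs in the normalized text."""
    normalized = normalize(license_text)
    return all(marker in normalized for marker in MIT_MARKERS)
```

Running `looks_like_mit` over the contents of each repository's license file would give a rough first pass, with anything it doesn't recognize left for a human to read.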
Most software also includes some information about the author of the software, either in a copyright header, in a separate file, or in version control metadata. It becomes a bit more difficult for a computer to tell who the authors are, since the information is potentially spread out, and there's often no direct link between the authorship and the actual code authored.
The same is true for a lot of licenses too: not every license is conveyed in a top-level LICENSE file. Some license texts are in the copyright header of a file, or in a separate subdirectory. Some contain the full license text, others just reference the full text elsewhere. Some are explicit about the code each license covers, others don't say anything about it, or treat it as implied.
All of this makes it more difficult for a computer to understand the licensing and authorship of software code. Which is unfortunate.
As free and open source software is entrenched as the foundation and default for all new software, we come to rely more and more on automation. I recently wrote about how the FSFE has automated our use of Let's Encrypt certificates for new services, but it actually goes well beyond that. With close to the click of a button, I can pull down a few hundred software projects, compile and install them, and start up a web service.
But there are some inherent risks in this: what if the license of one of the projects changes to one under which I'm no longer allowed to use the software the way I do? My automated scripts won't warn me about this; they'll carry on as they always have, assuming the project still builds.
And if I look at the top level LICENSE file, and tell my script to make sure it doesn't change, what if there's another license introduced for parts of the project which my script wouldn't know anything about? Perhaps some of the licenses also ask me to name the software and author in the service. A reasonable ask, but I might never know.
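To make this concrete, here is a rough sketch of a check that looks beyond the top-level LICENSE file: it fingerprints every license-like file anywhere in a project tree and compares the result with an earlier snapshot. The file name prefixes and function names are assumptions made for illustration, and the check only sees licenses that live in their own files; a license stated in a copyright header would still go unnoticed.

```python
import hashlib
from pathlib import Path

# File name prefixes treated as "license-like". This list is an assumption
# for the sketch; projects use many other conventions.
LICENSE_NAME_PREFIXES = ("license", "licence", "copying", "notice")


def license_fingerprints(root: str) -> dict[str, str]:
    """Map each license-like file under root (relative path) to its SHA-256 digest."""
    root_path = Path(root)
    fingerprints = {}
    for path in sorted(root_path.rglob("*")):
        if path.is_file() and path.name.lower().startswith(LICENSE_NAME_PREFIXES):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            fingerprints[path.relative_to(root_path).as_posix()] = digest
    return fingerprints


def diff_fingerprints(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report license files that were added, removed, or changed between snapshots."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

A build script could store the output of `license_fingerprints` after a human has reviewed the project, and stop with a warning whenever a later `diff_fingerprints` report is not empty.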
There's surely a legal risk here, but more importantly, there's a social risk. The licenses are just mechanisms for conveying expected behavior in our community. It's the behavior which is important, not the license. Still, having my computer be able to say something about the license of a piece of software would certainly help. It would make it possible to create automated tools that warn if a license changes, that create a list of licenses and software used, and that help me give credit to those whose software I rely on.
Over the next months, the FSFE will look at some of the best practices for conveying provenance information in software in a way that computers can understand. We will work to collate these practices and work with our community to understand what's most important and relevant.
And we will make this information available, so software developers can benefit from it and help us gradually increase the ease with which software can be used. As of today, we're starting to fill out information about the sources of best practices we can find on our WeKan board. You're welcome to look at what we have, and get in touch with me if you have additions to the board or would like to work with us on this!