Free software licenses are not a sufficient condition for software freedom

A common misconception about free software is that having a free license is both a necessary and a sufficient condition for the software to be free. Some software is too simple to be restricted by copyright, so a free license is not a necessary condition. There are much more important arguments why it is not sufficient either.

Copyright is used to restrict sharing of a work. It is often used to prevent users from cooperating with each other, but it can also be used to prohibit some restrictions on the freedom of users of derivative works, a practice called copyleft. Although copyleft can limit the freedom of users who modify software (works under different copyleft licenses usually cannot be merged into one derived work), it is a useful compromise in practice and won’t be discussed in this essay.

The most obvious way to restrict users’ freedom is not to share the source form of the program with them. The kernel called Linux contains many sourceless blobs of microcode that carry free licenses, ‘giving’ users freedoms that they cannot use. Without copyright (or with a very short term of it), the restrictions imposed by nonfree software would not change; only copyleft would be limited. Digital restrictions management clearly shows that copyright isn’t important for software owners: they make and enforce their own rules restricting what the user can do (DRM also restricts actions that copyright law allows).

DRM wouldn’t be effective if the user could understand the program (difficult, although possible even without source code, as some cases in cryptography show) and install a different one. The second part is prevented by nonmodifiable bootloaders that load only software signed by the device manufacturer. This practice, named tivoization after an early case, is common in phone and tablet operating systems that use a lot of GPLv2-licensed software.

Although free software licenses can protect users from the above problems in derived works, there are other legal issues that cannot be avoided in this way. Software patents are one of them. Government censorship of useful cryptography or of pornography similarly restricts the freedom of software users (and of non-users).

These examples show that free software licenses aren’t sufficient for software to be free, but they don’t suggest any obvious solution to this problem. Any solution clearly requires users to be aware of their (not necessarily software-related) freedom. The focus on licensing leads users not to consider these issues restrictions of their freedom. It could also hide the problems of copyright that would make alternatives to it (with appropriate replacements for copyleft) more beneficial in the long term.

There is no tree of evolution

We often see diagrams called ‘trees’ showing how different beings or things evolve from others. These are used to describe families, species, languages, programs and other entities. Most of them share two problems: they aren’t trees as in graph theory (while we reason about them as trees) and they don’t have discrete, unchanging nodes.

Directed acyclic graphs of evolution

Let’s assume for a while that there are immutable objects, with derivative objects being created in atomic ways. These objects are nodes (or vertices), and edges connect each base object to the objects derived from it.

The ‘tree’ will have nodes with multiple outward edges. However, in many real cases it will also have nodes with multiple inward edges. Then it isn’t a tree; the graph-theoretic name for it is a directed acyclic graph, or DAG. (We won’t get cycles by just adding immutable nodes having immutable lists of predecessor nodes, which seems to be all that we can do with unidirectional time and really atomic objects.)
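The difference can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a list of (base, derived) pairs and is already acyclic; the function names and the example data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def nodes_with_several_bases(edges):
    """Return the nodes that have more than one inward edge."""
    indegree = Counter(derived for _base, derived in edges)
    return sorted(node for node, count in indegree.items() if count > 1)


def is_tree_shaped(edges):
    """True if no node is derived from more than one base.

    Assumes the edges are acyclic, as argued in the text; the result
    is then a tree (or a forest of trees if there are several roots).
    """
    return not nodes_with_several_bases(edges)


# Hypothetical data: a program that is only ever forked stays a tree.
fork_only = [("original", "fork_a"), ("original", "fork_b"), ("fork_a", "fork_a2")]

# Hypothetical data: two parents per child already break the tree shape.
family = [
    ("grandmother", "mother"),
    ("grandfather", "mother"),
    ("mother", "child"),
    ("father", "child"),
]

print(is_tree_shaped(fork_only))
print(is_tree_shaped(family), nodes_with_several_bases(family))
```

A single node with two inward edges is enough to make tree-only reasoning wrong for the whole graph.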

Probably the most famous DAGs called ‘trees’ are ‘family trees’. The larger and more complex ones aren’t trees; there is a newer example in The Art of Computer Programming by Donald E. Knuth, Volume 1, Third Edition, Figure 18(a) on page 310 (the family DAGs of the Eldar or the Edain in the books of J.R.R. Tolkien are more interesting cases, although improbable without elven lifespans). Some have ancestors aligned in layers by their distance from a descendant; these might have multiple nodes on different layers for the same person. This is an obvious consequence of every person having approximately two parents while there were not exponentially more people several generations ago. As explained in TAOCP (page 311), these graphs are trees if a node represents ‘a person in the role of mother or father of so-and-so’ (I haven’t seen this definition anywhere else).

There are more software-related examples of graphs not being trees (I leave other natural examples for the next section, since they don’t have obviously atomic nodes). Dependency graphs of classes, modules and software packages (which are specific sets of code, so they are atomic in the sense considered in this article) clearly aren’t trees in the general (and common) case. (Dependencies inside a package are often cyclic, and this usually isn’t a problem. Cycles are more difficult to handle between packages; compilers are a well-known case of this.) I believe that assuming software package dependency graphs are nearly always trees, optimizing the design of software using such graphs for that case, and adding workarounds for a package occurring in multiple parts of the ‘tree’ leads to a much more complex solution than designing for a DAG.
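Designing for a DAG can be quite short. Here is a sketch using Python’s standard `graphlib` module (Python 3.9 or newer); the package names are hypothetical and form the usual ‘diamond’, where one library is needed through two different paths.

```python
from graphlib import TopologicalSorter

# Hypothetical packages: each key maps to the set of packages it depends on.
# "libc" is reachable from "app" through both "libhttp" and "libjson".
dependencies = {
    "app": {"libhttp", "libjson"},
    "libhttp": {"libc"},
    "libjson": {"libc"},
    "libc": set(),
}


def install_order(deps):
    """Return one valid installation order, dependencies first."""
    return list(TopologicalSorter(deps).static_order())


order = install_order(dependencies)
print(order)
```

Each package appears once in the result, however many paths lead to it, so no workaround for ‘duplicated subtrees’ is needed.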

A significantly different tree-vs-DAG misunderstanding is that software doesn’t evolve in trees. A common point of view in free software ideology is that the user gets a software package, modifies it using only their own ideas and publishes it. This isn’t true: a common and important case is deriving a new package (or technically a version of the package; both are equally nodes here) from multiple other packages. Nearly any use of shared libraries from multiple projects is a case of this. This leads to different software freedoms than the tree view (the Free Software Definition mentions this way of modifying programs, although it accepts licenses that restrict which licenses the original programs can be under; I see no solution for this better than using only GPL-compatible licenses for all cultural works).

(There is also an interesting, although unrelated to trees, issue of maps being nonplanar graphs. I don’t remember seeing an example of a map having its chromatic number greater than four, although there are many cases requiring four colours. Many countries in these cases are non-contiguous, so a graph of a map containing them doesn’t have to be planar.)

Are there atomic nodes?

In the above examples it was clear what objects were nodes. Each edge referred to complete atomic nodes (e.g. there are parents of a human, not of their specific organs; I don’t know of any exceptions that aren’t only medically significant, unlike the probable uses of family ‘trees’). This isn’t so in all common uses of ‘trees’. If a node refers to a mutable union of immutable objects (or to a mutable object), the graph can easily have cycles or multiple edges between the same nodes (then it usually isn’t even called a graph), or just be too unclear to be useful.
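To see how cycles appear, here is a small sketch with two hypothetical projects, A and B. Their immutable versions form a DAG, but replacing every version by the mutable project it belongs to gives a cycle. The data and the helper names are assumptions made for illustration; cycle detection uses the standard `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical derivations between immutable versions, as (base, derived).
version_edges = [
    (("A", 1), ("A", 2)),
    (("A", 1), ("B", 1)),  # B starts as a derivative of A
    (("B", 1), ("A", 3)),  # later A takes changes back from B
    (("A", 2), ("A", 3)),
]


def has_cycle(edges):
    """Report whether the directed graph given as (base, derived) pairs has a cycle."""
    sorter = TopologicalSorter()
    for base, derived in edges:
        sorter.add(derived, base)
    try:
        sorter.prepare()
    except CycleError:
        return True
    return False


def collapse(edges):
    """Replace each version by its mutable project name, dropping self-loops."""
    return [(base[0], derived[0]) for base, derived in edges if base[0] != derived[0]]


print(has_cycle(version_edges))
print(has_cycle(collapse(version_edges)))
```

The information lost by collapsing is exactly the version numbers, which are what made the nodes atomic.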

A problem of such nodes is that we often don’t know clearly when we have the same node.

Probably the most well-known example is a phylogenetic tree showing a common origin of species (this isn’t the only evolution the title of this article refers to). As described in that article, the ‘ideal’ approximated by such trees isn’t a tree, due to interesting issues like horizontal gene transfer. Another issue is that there is no clear separation between species.

The same problem occurs in the evolution of languages. There is no clear separation between languages (except for the political ones). If, in parallel to the definition of a species as a set of animals being able to have common descendants, we define a language as a set of mutually intelligible sentences, there is no way to separate languages (for each pair of communicators there is at most one language, and there is no observable way of finding which of these languages are ‘the same’; this might be a reason for other definitions being widely used).

There is a similar case for software evolution. There is a great graph of GNU/Linux distribution derivation. Since the distributions change, it has many vertical lines when a distribution e.g. changes its base distribution. Somehow this case avoids having the nodes as unclear as species or languages.

Let’s not call the next graph a ‘tree’, unless the objects being modeled are clearly separate, atomic and form trees, not more complex graphs.

An unnamed DNS replacement idea

DNS solves two problems:

  1. translating between human-readable domain names and machine-usable IP addresses
  2. storing a reliable, hierarchical, distributed database describing which servers provide which services and the above mapping.

It’s known that the second problem is solved inefficiently, insecurely, unreliably and centrally. Thus a different system should be designed to solve it without these problems.

The first problem is solved using globally unique names with unique meanings. This is an unrealistic assumption that enables useless or harmful activities like domain parking, domain squatting and trademarks being used for censorship, or that just makes the names difficult to type.

This probably contributes to the fact that users often use machines to store the domain names. Other issues like advertising contribute to using unreadable names and sharing them via e.g. QR codes instead of memorization by humans.

Therefore I believe a good replacement for DNS that solves the second problem would not use globally unique human-readable names.

Let’s assume that a single being manages the database fragment describing some machines (like a DNS zone). There is no problem with having names inside the fragment. The fragment should be signed using a key pair used only for this zone, with the private part known only to the managing being. Probably any useful and scalable DNS alternative would do this.

The ‘name’ of the zone would be the public key used to sign the zone. It would be random-like and in practice no two zones would have the same ‘name’, so this would avoid the problems of nonrandom unique names. Some algorithms based on elliptic curve cryptography have good enough public keys that are small enough to fit in a DNS domain name; they could be used here (although even a multi-kilobyte zone ‘name’ wouldn’t be a problem for the machines transferring it).
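As a rough check of the size claim, the sketch below assumes a 32-byte public key (the size of an Ed25519 public key) and the 63-octet limit on a single DNS label. Random bytes stand in for a real key, and the base32 encoding and function names are my own choices for illustration, not part of any existing system.

```python
import base64
import os

MAX_DNS_LABEL = 63  # octets allowed in one label of a domain name


def key_to_label(public_key: bytes) -> str:
    """Encode a public key as lowercase base32 without padding."""
    return base64.b32encode(public_key).decode("ascii").rstrip("=").lower()


def label_to_key(label: str) -> bytes:
    """Invert key_to_label by restoring the padding base32 expects."""
    padded = label.upper() + "=" * (-len(label) % 8)
    return base64.b32decode(padded)


# Stand-in for a real 32-byte public key (assumption: Ed25519-sized).
example_key = os.urandom(32)
label = key_to_label(example_key)
print(len(label), label)
```

With these assumptions a key needs 52 characters, so it fits in one label with some room to spare.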

We could let anyone serve the zone data, since knowing the public key lets us check whether an untrusted server provided us with the correct data, assuming the data doesn’t change. In real life such data changes, but an outdated copy could be detected by e.g. specifying in the zone the time when the data is valid (DNS uses a similar solution, although it wouldn’t need globally synchronized clocks). There are existing solutions for sharing such data without a central server.
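A client receiving copies of a zone from several untrusted servers could select one roughly as follows. This is only a sketch: the `ZoneCopy` fields, the byte layout in `signed_bytes` and the function names are hypothetical, and `verify` is a parameter standing for whatever signature check the chosen algorithm provides (nothing here implements the cryptography itself).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ZoneCopy:
    data: bytes        # serialized zone contents
    valid_from: int    # start of validity, seconds since the epoch
    valid_until: int   # end of validity, seconds since the epoch
    signature: bytes   # signature over signed_bytes(copy)


def signed_bytes(copy: ZoneCopy) -> bytes:
    """The validity period is signed together with the data (assumed layout)."""
    return b"%d:%d:" % (copy.valid_from, copy.valid_until) + copy.data


def pick_copy(
    zone_key: bytes,
    copies: Iterable[ZoneCopy],
    verify: Callable[[bytes, bytes, bytes], bool],
    now: int,
) -> Optional[ZoneCopy]:
    """Return the newest correctly signed copy that is valid at `now`, or None."""
    candidates = [
        copy
        for copy in copies
        if copy.valid_from <= now < copy.valid_until
        and verify(zone_key, signed_bytes(copy), copy.signature)
    ]
    return max(candidates, key=lambda copy: copy.valid_from, default=None)
```

Because the validity period is covered by the signature, a server can withhold a newer copy but cannot extend the life of an old one.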

This leaves the problem of having human-typeable names for the zones in the rare cases when they are useful. This would be solved by a local daemon holding such a user-specified mapping and maybe asking other such daemons for other names (e.g. if the user has many machines or uses names shared with other users in a single organization).
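The core of such a daemon could be as small as the following sketch. The class, its methods and the example names are hypothetical; a real daemon would talk to its peers over a network instead of holding them as objects, and would need protection against peers that refer to each other in a loop.

```python
class LocalNames:
    """User-specified mapping from short names to zone public keys."""

    def __init__(self, peers=()):
        self._names = {}
        self._peers = list(peers)  # other daemons asked when a name is unknown

    def assign(self, name, zone_key):
        """Record the user's own name for a zone."""
        self._names[name] = zone_key

    def resolve(self, name):
        """Return the zone key for a name; local names win over peers' names."""
        if name in self._names:
            return self._names[name]
        for peer in self._peers:
            zone_key = peer.resolve(name)
            if zone_key is not None:
                return zone_key
        return None


# Hypothetical usage: an organization-wide daemon and a personal one.
organization = LocalNames()
organization.assign("wiki", "zonekey-wiki")

mine = LocalNames(peers=[organization])
mine.assign("home", "zonekey-home")

print(mine.resolve("home"), mine.resolve("wiki"), mine.resolve("unknown"))
```

Since names are only local, two users giving the same name to different zones is not a conflict; each daemon simply answers for its own user.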