We often see diagrams called ‘trees’ showing how different beings or
things evolve from others. These are used to describe families, species,
languages, programs and other entities. Most of them share two problems:
they aren’t trees as in graph theory (while we reason about them as
trees) and they don’t have discrete, unchanging nodes.
Directed acyclic graphs of evolution
Let’s assume for a while that there are immutable objects with
derivative objects being created in atomic ways. These objects are
nodes (or vertices), there are edges connecting them from the base
one to the derived one.
The ‘tree’ will have nodes with multiple outward edges. However, in many
real cases it will also have multiple inward edges. It isn’t a tree, the
graph theoretic name for it is a directed acyclic
graph, or a DAG.
(We won’t get cycles by just adding immutable nodes having immutable
lists of predecessor nodes, which seems all what we can do with
unidirectional time and really atomic objects.)
Probably the most famous DAGs called ‘trees’ are ‘family trees’. The
larger and more complex
ones aren’t
trees, there is a newer example in The Art of Computer Programming by
Donald E. Knuth, Volume 1 Third Edition, Figure 18(a) on page 310 (the
family DAGs of Eldar or Edain in the books of J.R.R. Tolkien are more
interesting cases, although improbable without elven lifespans). Some
have ancestors aligned on layers depending on the distance from a
descendant, these might have multiple nodes on different layers for the
same person. It’s an obvious consequence of every person having
approximately two parents and there not being exponentially more people
several generations ago. As explained in the TAOCP (page 311), these
graphs are trees if a node represents ‘a person in the role of mother or
father of so-and-so’ (I haven’t seen this definition anywhere else).
There are more software-related examples of graphs not being trees
(leaving other natural examples for the next section, since they don’t
have obviously atomic nodes). Dependencies graphs of classes, modules
and software packages (being specific sets of code, so they are atomic
as considered in this article) clearly aren’t trees in general (and
common) case. (Dependencies inside a package are often cyclic and this
isn’t usually a problem. It is more difficult in case of dependencies
between packages,
compilers
are a well-known case of this.) I believe the assumption that software
packages dependency graphs are nearly always trees, optimizing the
design of software using such graphs for this case and adding
workarounds for a package occurring in multiple parts of the ‘tree’
leads to a much more complex solution than designing for a DAG.
A significantly different tree-vs-DAG understanding issue is that
software doesn’t evolve in trees. A common free software ideology point
of view is that the user gets a software package, modifies it using only
their own ideas and publishes it. This isn’t true, a common and
important case is deriving a new package (or technically a version of
the package, both are equally nodes here) from multiple other packages.
Nearly any use of shared libraries from multiple projects is a case of
this. This leads to different software freedoms than the tree view (the
Free Software Definition
mentions this way of modifying programs, although accepts licenses
restricting what licenses the original programs can be under; I see no
solution for this better than using only GPL-compatible
licenses
for all cultural works).
(There is also an interesting, although unrelated to trees, issue of
maps being nonplanar graphs. I don’t remember seeing an example of a map
having its chromatic number greater than four, although there are many
cases requiring four
colours.
Many countries in these cases are non-contiguous, so a graph of a map
containing them doesn’t have to be planar.)
Are there atomic nodes?
In the above examples it was clear what objects were nodes. Each edge
referred to complete atomic nodes (e.g. there are parents of a human,
not of their specific organs, I don’t know any other
cases than aren’t
only medically significant unlike the probable uses of family ‘trees’).
It isn’t in all common uses of ‘trees’. If the node refers to a mutable
union of immutable objects (or a mutable object), the graph can easily
have cycles, multiple edges between the same nodes (it even is’t
usually called a graph), or
just be too unclear to be useful.
A problem of such nodes is that we often don’t know clearly when we have
the same node.
Probably the most well-known example is a phylogenetic
tree showing a common
origin of species (this isn’t the only evolution the title of this
article refers to). As described in that article, the ‘ideal’
approximated by such trees isn’t a tree, due to interesting issues like
horizontal gene
transfer which
clearly makes it not a tree. Another issue is that there is no clear
separation between
species.
The same problem occurs in the evolution of
languages. There is no clear
separation between
languages (except for
the political ones). If, in parallel to the definition of a species as a
set of animals being able to have common descendants, we define a
language as a set of mutually-intelligible sentences, there is no point
of separating languages (for each pair of communicators there is at most
one language, there is no observable way of finding which of these
languages are ‘the same’; this might be a reason for other definitions
being widely used).
There is a similar case for software evolution. There is a great graph
of GNU/Linux distribution
derivation. Since the
distributions change, it has many vertical lines when a distribution
e.g. changes its base distribution. Somehow this case avoids having the
nodes as unclear as species or languages.
Let’s not call the next graph a ‘tree’, unless the objects being modeled
are clearly separate, atomic and form trees, not more complex graphs.