Using Docker with IPv6

I found a tool extending Docker with IPv6 NAT, and an issue about it being replaced by builtin Docker functionality. I thought that if Docker natively (but experimentally) supports IPv6, it would be easy to configure, so I chose to enable it for my Mastodon instance. It took slightly more work than I expected.

Unlike much more complex earlier guides that I have seen, this approach uses NAT, just like with IPv4. My containers do not expose any public services not proxied by an nginx on their host, so I don't need a more complex solution or a pool of public IPv6 addresses from my VPS provider. I assume a recent Docker version; I'm running 20.10.21.

While configuring the builtin IPv6 support, I found several issues:

  • Due to an IP address parsing bug, each address pool is used for only one subnet, so I need a separate default-address-pools entry per subnet.

  • Not a bug, but I found no information on the default value of default-address-pools, so I copied one from an example in the Docker documentation; as a result I also had to update my iptables rules to allow Prometheus to access the host's node_exporter from a different IP address.

  • docker-compose.yml version 3 doesn't support enabling IPv6 (since it limits its functionality to what Docker Swarm supports; similarly, it doesn't have memory limits without Swarm), so I switched to version 2.4.

  • docker-compose up -d and the Ansible docker_compose module do not recreate networks, so changes to them are not applied.

  • Mastodon failed to connect to PostgreSQL over IPv6. I don't need IPv6 on the internal network used for its PostgreSQL and Redis (and there is a known issue about Redis on IPv6), so there is nothing here for me to debug and I keep that network IPv4-only.

So my /etc/docker/daemon.json evolved into this (with more single subnet address pools omitted):

{
  "default-address-pools": [
    {
      "base": "172.30.0.0/16",
      "size": 24
    },
    {
      "base": "172.31.0.0/16",
      "size": 24
    },
    {
      "base": "fd00:0000:0000:00::/64",
      "size": 64
    },
    {
      "base": "fd00:0000:0000:01::/64",
      "size": 64
    },
    {
      "base": "fd00:0000:0000:02::/64",
      "size": 64
    },
    {
      "base": "fd00:0000:0000:03::/64",
      "size": 64
    }
  ],
  "experimental": true,
  "features": {
    "buildkit": true
  },
  "fixed-cidr-v6": "fd00::/80",
  "ip6tables": true,
  "ipv6": true
}

(BuildKit is an unrelated feature providing a much better image building experience; I use it on my laptop.)

I tested that containers support IPv6 (here with a large latency on my laptop):

$ docker run --rm busybox ping -c1 -6 mtjm.eu
PING mtjm.eu (2a01:7e01::f03c:91ff:fefb:b063): 56 data bytes
64 bytes from 2a01:7e01::f03c:91ff:fefb:b063: seq=0 ttl=49 time=47.265 ms

--- mtjm.eu ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 47.265/47.265/47.265 ms

Several docker-compose.yml files now have:

networks:
  default:
    enable_ipv6: true

while others specify IPv6 for external networks:

networks:
  external_network:
    enable_ipv6: true
  internal_network:
    internal: true

And in several directories I did docker-compose down and then docker-compose up -d.

I used this inelegant command to find containers not having IPv6:

for f in $( docker ps --format '{{.ID}}') ; \
  do docker inspect $f | grep fd00 || echo $f ; \
done

and it listed only the databases on internal networks.
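
A less inelegant variant could ask for the addresses directly; this is an untested sketch which prints each container ID followed by its IPv6 addresses (empty for IPv4-only containers):

for f in $(docker ps -q) ; do
    printf '%s ' "$f"
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.GlobalIPv6Address}} {{end}}' "$f"
  done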

So now all my containers talking to the outside world can access IPv6 services. But I don't know of, e.g., any IPv6-only Mastodon instance to check whether mine can communicate with it.

Personal server exit review

I used a Cubieboard as a personal server until October, running Kanboard, tt-rss and some other Web apps there. It allowed me to keep more of my data at home and practice some more system administration.

Why Cubieboard

Initially I wanted a computer that would work during the night in my bedroom. This required a fanless design and therefore a low power usage. (Now I have a separate room for the computers, so they don’t need to be so quiet, while I pay their power bills.)

I could have used my BeagleBone Black or a Cubieboard, or bought a different board. I didn’t choose the BBB for this since its 512 MiB of RAM is not enough for a Web app that I maintain in my spare time (the Cubieboard’s 1 GiB is enough), and I wanted more reliable storage than microSD: the Cubieboard has a SATA port, which I connected to an SSD (also no moving parts). Meanwhile, I can use the BBB for flashing and debugging coreboot, which requires some downtime.

(Now I think I wouldn’t need the personal server to work at night, but it’s easier: I can read some Web comics when I resume my laptop from suspend and get daily mails from cron and Kanboard. Also I won’t forget to turn it on before accessing these while traveling.)

How it was configured

ARM makes booting slightly more interesting than on x86. The board booted u-boot from a microSD card, which loaded a kernel from /boot on the same microSD card (I wasn’t able to make it load a kernel from the SSD, although it was documented to support SATA), which then mounted a btrfs root filesystem from the SSD.

The server was running Debian Jessie, manually installed using its usual installer (with good support for such boards). I configured nearly all services running on top of it using Ansible. Much configuration was shared with my other computers, e.g. using OpenVPN, Postfix for relaying locally-generated mails to my VPS, etc.

Like all my computers (or, in case of Thinkpads, disks moved between computers), it had a unique hostname. I named it after Sam, the trusted friend of Alice, Bob and Frodo.

How I broke it

After receiving a mail from apticron about Debian package updates being available, I ran aptitude full-upgrade. There was a kernel upgrade, so I rebooted. This had worked many times before, but one time in October it didn’t.

After connecting the serial TTL cable (which required removing the top of its case), I found errors from the initramfs: the root filesystem couldn’t be mounted due to filesystem errors. Checking the disk in another computer (a big advantage of SATA over soldered storage chips), I saw many btrfs errors, although all interesting files could be read.

So I copied the filesystem image to my desktop, ran mkfs.btrfs, copied all files to the new filesystem, and in many reboot loops fixed /etc/fstab and some initramfs configuration. It still did not boot, probably because it was not completely reconfigured to use the new filesystem.

Now

Not being able to fix it ‘now’, I migrated the services to my desktop computer (really easy with Ansible). I used data restored from the filesystem image and a daily PostgreSQL backup. (I couldn’t recover the possibly corrupted newer PostgreSQL data: PostgreSQL won’t load data files written on a different architecture, which would require running pg_dump on armhf.)

Two months have passed and I haven’t noticed a need for that server, so I still haven’t fixed it and use the desktop as a personal server. There is a difference in the power bill, although I don’t know how much of it can be attributed to the desktop running more often now.

Future

When I set up a new personal server, I will think about filesystem errors before it stops booting. Maybe periodically running btrfs scrub or choosing an older filesystem would help. Certainly, I should back up before installing any OS update. I should also have a recovery method for when the OS won’t boot (very easy on x86).
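
Periodic scrubbing could be as simple as a single cron entry; this sketch (the schedule, the path and the file location are only an illustration) verifies the checksums of the root filesystem monthly:

# /etc/cron.d/btrfs-scrub: verify checksums of the root filesystem monthly
0 3 1 * *  root  /usr/bin/btrfs scrub start -Bq /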

Update 2018-06-16: My notes show that I have reinstalled the personal server and it mostly works since January 2016. Update 2022-11-15: Later I moved all the services to my VPS (or replaced them by local software on my laptop) and I stopped using the personal server in August 2019.

Inclusion of licenses longer than licensed works

There are licenses known for excessive attribution requirements: in a single project the old four-clause BSD license required including 75 different texts in all advertising materials. The license text itself can be long (GNU FDL 1.3 takes more than 3 500 words; the Web browser that I use would spend nine A4 pages to print it): imagine an award pin with an FDL-licensed image or a several-pages-long document derived from a GNU manual. Both need to include the GNU FDL text. This makes the license, despite being free (possibly in specific cases for the FDL; in all cases for the GNU GPL), unusable for some kinds of free works.

If you don’t consider award pins sufficiently complex and original, imagine a postcard from a traveling family member. It should have a beautiful photo on one side, like the ones that Wikimedia Commons has, and the whole other side filled by a letter describing their holidays, and your postal address. There is no place to fit nine pages of license text there, and the postcard is distributed by itself, so no separate booklet with required legal texts can be included.

This is one of the reasons for the GNU FDL being used for ‘professional’ photos: it’s free, so it is accepted in free culture projects like Wikimedia Commons, but it’s unusable in practice, so proprietary relicensing businesses work. Wikimedia Commons now discourages using the GNU FDL for photos without dual-licensing under a more usable license.

I do believe that this is a significant bug in the license: copyleft licenses should be designed to not support proprietary relicensing or proprietary extensions businesses (i.e. proprietary software businesses) and should not have features that are useful nearly only for such businesses (while FDL has several, possibly since the license was designed to be used by traditional publishers). (There are several different problems in other, more important, copyleft licenses like GNU AGPL or GNU GPL3, e.g. the optional attribution requirement. Some of them are solved in copyleft-next; e.g. the Nullification of Copyleft/Proprietary Dual Licensing clause protects against proprietary relicensing by removing the copyleft for all in some cases.)

How can we solve this problem? By not distributing FDL-licensed works and by not recommending the use of such licenses for cultural works. This requires recommending specific better licenses.

GNU recommends its all-permissive license for short documents like README files. Unless the work is part of a GNU package, a free Creative Commons license is probably a better solution: the copyleft (without a source provision requirement) CC-BY-SA, the permissive CC-BY or the ‘public domain but legal everywhere’ CC0. In its clause 3(a)(1)(C), CC-BY-SA 4.0 requires you to

indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License,

so it’s sufficient to fit a URI like https://creativecommons.org/licenses/by-sa/4.0/. (I have seen a much longer text than this URI written on a single pea seed in a local museum, so this surely works for bigger works like award pins or postcards.)
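
A complete credit line for such a postcard could then be as short as this made-up example:

Photo: Alice Example, CC BY-SA 4.0, https://creativecommons.org/licenses/by-sa/4.0/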

A more general term is used in copyleft-next: ‘inform recipients how they can obtain a copy of this License’, which is obviously satisfied by a URI. (The whole officially recommended license notice is: ‘Licensed under copyleft-next version 0.3.0. See https://gitorious.org/copyleft-next/copyleft-next/raw/master:Releases/copyleft-next-0.3.0 for more information’. Compare the three paragraphs recommended for the GNU GPL.)

This couldn’t have been done several decades ago. There was no Web in 1991 when GNU GPL2 was released (this is why the usual GPL legal notices had an FSF address, changed several times after the license was released, until GPL3, which includes both a URL and a distributed copy of the license). It was reasonable to assume that the user couldn’t obtain the license text from the Web, but now it’s probable that every computer user can access the Web, although not necessarily from their home. (How many GPL software recipients can use postal mail for the source offers but cannot access the Web?)

(This is not the only problem with long licenses or requiring to include their text in the work. It is a bigger problem that some licenses are too complex or too badly written to be understood by users, but that problem cannot be as easily quantified as their texts not fitting in the work: understanding of licenses is ‘cached’ in memories of their readers who have already met e.g. the GNU GPL3 for many other works. It would be also possible, and evil, to write a very short and incomprehensible license.)

PlaneShift and free software

On the download page of PlaneShift I see big letters ‘Fully Free Cross-Platform MMORPG’ and ‘Open Source Development!’. They provide the source code of their client, while writing how this helps user’s freedom and security. (I prefer using clearer terms like free software and copyleft for the exact things that they praise. While I played PlaneShift many years ago, I do not have any opinion on it beyond what I write in this essay, since I’m not interested in multiplayer games.)

However, they write that they need ‘some additional bounds [in the license] to keep safe the work of [their] artists and to ensure project success’. This both supports false assumptions (there are safe and successful projects releasing fully free cultural works) and significantly reduces the benefits of their licensing for user’s freedom and security.

They have clearly explained their licensing and its rationale. Source files are licensed under the GPL, while artwork, text and rules in the game use a custom nonfree license (called the PlaneShift Content License).

Despite using a free and copyleft license, the client has significant restrictions on user’s freedom:

  • ‘You cannot distribute the client, sell it or gain any profit from it’
  • ‘You can use our client only to connect to Official PlaneShift Servers’

So of the free software freedoms only a small part of ‘the freedom to study how the program works’ applies. It does not belong ‘to the community of OS developers’; it belongs to Atomic Blue, the organization running PlaneShift. While their licensing is rationalized by making forking as hard as possible, all benefits of free software that they write about require forking.

The ‘content’ license is short and simple. It forbids any distribution or modification of the work, allows using it only (personally) with their official servers and ‘a Planeshift Client, distributed by Atomic Blue’, and disclaims all warranty.

I’m not able to understand what their encouragement for users to ‘experiment with mods and changes to either [their] source code and to [their] art assets’ might mean. Are they recommending infringing their copyright or promoting fair use in a very unclear way?

The requirement to use the artwork ‘only in conjunction with a Planeshift Client, distributed by Atomic Blue’ might forbid using the client software if built from source. So that software, as normally used, is as free as if it were written on stone tablets, impossible to copy or modify. All security benefits of its source code being free disappear, since a Mallory just needs to backdoor Atomic Blue’s compiler.

Even if a client built from source could be used, GNU/Linux distros wouldn’t be able to include that game, since they wouldn’t be allowed to distribute the needed artwork. The source might be free, but it’s not useful without the nonfree artwork. (Or is it? Write if you know a free derivative of it working without Atomic Blue’s artwork and servers.)

Free software Flash replacements and the JavaScript trap

One of the nonfree programs that make it hard for many people to use completely free software operating systems is Adobe Flash. There are several free software projects aiming to replace the Flash interpreter; one of them used to be an FSF high priority project. I don’t believe that developing such programs will significantly help people stop using nonfree software. (While hardware compatibility issues resulting from free drivers requiring nonfree firmware are well-known and probably more noticeable, they can easily be avoided by buying appropriate hardware. There are social issues that make people use the same websites as their friends, but not the same computer hardware.)

While Flash has many uses, both as a Web browser plugin and for desktop applications, I will focus on its common use for video players on websites like YouTube.

Replacing Flash is hard

No free software implementation of SWF, the file format used by Flash, can currently support most such files used on the Web. gNewSense contributors mentioned both patents and incomplete specifications making this hard to do. Another issue is the Digital Restrictions Management implemented in Flash: a sufficiently complete free implementation would probably violate the anti-circumvention laws that make DRM an effective restriction of our freedom.

The JavaScript trap

Even if we had a complete and free SWF implementation, it would interpret nonfree programs that websites publish. It is exactly the same problem as the JavaScript trap: using free software interpreters to run untrusted nonfree software from the Web. (I hadn’t noticed this issue before reading RMS’s essay on JavaScript and gNewSense’s page on SWF.)

Some sites like YouTube are moving to providing videos via the HTML5 video tag. It doesn’t solve this problem, since now nonfree JavaScript programs serve the same purpose as previously SWF. I think it might make writing free software replacements easier, due to free development and debugging tools available for JavaScript.

Why we need video downloader programs

Issues with specific video publishing sites are completely solved for their viewers by not running the code that the site provides (either SWF or JavaScript) and using a free software program to obtain the video. This can be done by youtube-dl, a command-line program; UnPlug, a browser extension; and many other programs. There are also extensions that display the video inline on the page without using its builtin player.
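
For example (with a placeholder URL), youtube-dl can download the video for offline use, or with -g just print the direct media URL that the page’s player would load:

$ youtube-dl 'https://example.com/page-with-a-video'
$ youtube-dl -g 'https://example.com/page-with-a-video'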

These tools support only specific sites, while very many are supported by youtube-dl despite its name. On other sites you can usually find the video URL by reading the source of the HTML page or the included JavaScript code. (It might be a nice fetish to have.) I don’t know what work is needed to use an unsupported site with a free SWF interpreter like Gnash.

Being able to download the video and save it on persistent storage (instead of downloading it just to display it in the player) is needed for at least several useful reasons: we cannot remix without downloading the video; we cannot protect against centralization and copyright censorship while accessing the works only from a single centralized site; and we cannot share it with our friends (or be a good friend to them) without having a copy. Even the very limited freedoms weakly protected by copyright law as fair use cannot be used without storing a copy of the work.

(While I highly disagree with completely rejecting JavaScript due to its usefulness in free Web applications, the arguments used against it clearly apply to SWF. Video downloader programs and browser extensions are software that we can write to replace nonfree software provided by websites.)

Flash animations

Before Web videos became popular, SWF was often used for vector animations. This might place them in the difficult-to-reason-about area between software and non-functional cultural works, but there is a simple reason to consider them software: they have antifeatures. We need the freedom of free software for such works to make them respect their users.

JavaScript and HTML5 canvas are replacing this use of Flash too, so now nonfree programs using better tools control the animation.

Publishing your own works

If you write an interactive website, use JavaScript. Release your code as free software. If you make videos, release them on your site using a free software-friendly video format like WebM, or use Web applications like GNU MediaGoblin (possibly an instance run by your friend).

To prevent DRMed sites from using your videos to restrict their users, use a free culture license that disallows using ‘effective’ technical restrictions of the freedoms that it protects, like CC-BY-SA 4.0. (YouTube requires giving them a different license, don’t upload your work there.)

My email spam filtering and end-to-end encryption

Big email providers use very complex spam filtering methods. Solutions used by Google require distributed real-time processing and access to the plain text of all messages. Their work is closely followed by spammers in an arms race, it’s not usable for small servers, and both sides benefit from reducing user’s privacy. In this article I describe how spam filtering works on my personal server: a solution optimized for low administration effort and not using message content. It involves using only existing, well-known free software packages without much extra configuration beyond what’s needed to have a working mail server.

Email spam that I receive comes from three main sources: zombie computers in botnets, hijacked accounts and Polish businesses. Zombies are easy to block, since they do not comply with mail standards in easily detectable ways. Hijacked accounts are now rare (partially due to the hard work of Google explained in the linked mail; it would be easier if the two Yahoo users who don’t spam moved to other providers).

Spam from Polish companies is my main issue, since they use properly configured servers and their own IP addresses. There is a law that allows sending uninformative spam to everyone, while informative spam can be sent only to companies. They do not check if the recipient has a company.

I use the following methods to filter these kinds of spam on my server using the Postfix MTA (a configuration sketch follows the list):

  • postscreen: it filters much zombie spam by adding a delay of several seconds and checking whether the client waits before sending data, along with several other protocol correctness checks.

  • Sender Policy Framework: since zombie spammers do not use their own domains (these would be blacklisted by Google), they fake sender domains, which are often real domains belonging to others. SPF records specify which servers are authorized to send mail for a domain, so zombie spam using such a domain is blocked. Not enough domains use it. SPF would block some good mail if I used email forwarders without SRS; I don’t, since I have no use for forwarders. (The SPF validator implementation that I use is pypolicyd-spf.)

  • postgrey: it greylists all mail from servers that are not known to be trusted and haven’t successfully delivered a mail recently; i.e. it returns a temporary error code and allows the mail to be sent again after several minutes (proper servers retry; email works well without 100% uptime). This leads to delays when getting mail from new servers, which is annoying for registration emails from shops. It blocks nearly all remaining zombie spam.

  • static IP address blacklist: for professional Polish spam businesses. For one provider, I have to blacklist entire IP ranges. This solution wouldn’t work for a server with more users.
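
For illustration, here is a minimal, untested sketch of how these pieces might fit together in Postfix’s main.cf; the blacklist file name is an assumption, Debian’s postgrey listens on port 10023 by default, and postscreen and pypolicyd-spf also need the usual master.cf entries:

# /etc/postfix/main.cf (a sketch, not my actual configuration)

# postscreen: wait several seconds and reject clients that talk too early
postscreen_greet_wait = 6s
postscreen_greet_action = enforce

# local clients, relay protection, static IP range blacklist,
# SPF policy daemon (pypolicyd-spf), then postgrey greylisting
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    check_client_access cidr:/etc/postfix/spammer_ranges.cidr,
    check_policy_service unix:private/policyd-spf,
    check_policy_service inet:127.0.0.1:10023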

I don’t use these common methods:

  • checking reverse DNS records: it fails on real servers and would block many self-hosted servers
  • using external RBLs: they are bad and block self-hosted mail
  • DKIM: I don’t find enough value in it to learn how to configure it; I think it might be useful for more complex filtering that uses multiple factors to decide if a message is spam, and if the provider can motivate administrators of other servers to configure extra things (Google can)
  • checking message content: it’s complex, has false positives, causes an arms race, and needs access to the message’s plain text content (preventing end-to-end security or delegating spam filtering to the client); if manual filtering of probable spam messages is needed, the method is at most as good as not doing any filtering.

I tried using ‘unsubscribe’ links in professional spam. They don’t work: they often fail (with e.g. page not found errors), are missing, or are mail addresses (I don’t mail spammers). If they work, they affect only some mails from the provider (only the mailing that the link came from?): they still send other mails. The IP address blacklist is more effective. I haven’t tried contacting the server providers of spam businesses using VPSes or dedicated servers with terms of service prohibiting spam. I don’t know if they have a saner definition of spam than the law.

I would like it if all spammers moved to sending only OpenPGP-encrypted mails (they can easily get my public key from a public keyserver or from my Web site): it wouldn’t affect my spam filtering and it would increase their resource usage.

This week I received 11 spam messages (not counting ones from mailing lists): 5 in English, probably from zombies, and 6 from real Polish businesses with IP addresses that I haven’t blacklisted yet. I don’t count how many were blocked. I consider this good enough to not research better spam filtering methods now.

I don’t offer a solution to the problem of spam: it’s difficult, has economic, legal, technical and educational aspects; what I use is sufficient for my needs and has no problems with securing message texts. I do not know how spam filtering would work if all users moved to their own servers, maybe some post-email protocols with proof-of-work schemes would solve these issues while not supporting sending emails from phones to Google servers.

DRM in free software

Free software has fewer antifeatures than proprietary software, and users can remove them. While a well-known distro vendor includes spyware, such bugs usually get fixed. Despite this, some well-known free programs include antifeatures restricting use or modification of data that these programs should access or edit.

These antifeatures are called DRM, ‘digital restrictions management’. It is unrelated to the Direct Rendering Manager, which despite using the same acronym has no freedom issues other than requiring nonfree microcode for Radeon graphics chips. Traditional bugs that make programs mishandle data or crash when using specific files are also different: developers fix them and don’t consider them intentional.

PDF restrictions: Okular, pdftk

The PDF document format includes metadata flags which readers use to determine whether the user is allowed to e.g. print the file or copy its text. Okular obeys these restrictions by default, although it has an option to respect what the user does.

The main argument for keeping that optional DRM is that the PDF specification requires it and users could use that ‘feature’.

The PDF manipulation program pdftk obeys such restrictions, with no option to remove them short of changing its source. Fortunately Debian fixed this bug in their packages, so pdftk can be used on recent Debian-based systems to modify or fix restricted PDFs.

What if you get a restricted PDF and need to extract its text? Use pdftk input.pdf output output.pdf on a Debian-based system to drop this restriction, or just use the existing file in Okular with disabled DRM or another free PDF reader.

Debian-patched pdftk prints the following warning:

WARNING: The creator of the input PDF:
   drmed.pdf
   has set an owner password (which is not required to handle this PDF).
   You did not supply this password. Please respect any copyright.

I think it’s an acceptable way to handle such restrictions. Many uses of the restricted features don’t involve violating copyright.

I made that restricted file earlier using pdftk text.pdf output drmed.pdf owner_pw hunter2 allow. It did not warn me that that DRM is bad or that it can be very easily ignored or removed.

The PDF format also supports document encryption with user passwords. This is not DRM, since it prevents reading the document rather than restricting it in software: can it be used to protect the user’s privacy? (I don’t know how secure that encryption is; I would use OpenPGP instead if I had to send an encrypted document to a friend.)

LibreOffice spreadsheet ‘protection’

OpenDocument supports sheet and cell ‘protection’. It allows the user to read the spreadsheet (except for hidden sheets), but not to view formulas or to copy or edit their data.

This is implemented by adding metadata that tells programs not to allow editing the cells. The document contains an element with a hashed password ‘needed’ to unprotect the sheet. It’s easy to change that password or remove the protection using a text editor and a ZIP program to access the XML files stored in the document.
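
A sketch of what I mean (the attribute names come from the OpenDocument specification; the file name and the exact sed expression are only an illustration):

$ unzip protected.ods content.xml
$ sed -i 's/ table:protected="true"//; s/ table:protection-key="[^"]*"//' content.xml
$ zip protected.ods content.xml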

LibreOffice Calc did not warn me that the added ‘protection’ is useless against users who can use a text editor. It did not warn me that this kind of restriction is unfriendly and harmful regardless of whether it is effective.

The reason why I learned about this antifeature is that I once received a spreadsheet document and wanted to learn how its formulas worked. I converted it to ODF using LibreOffice and used jar and sed to change the ‘protection’ password. I learned more than I expected to from that document.

While all complete OpenDocument implementations have this problem, I name LibreOffice specifically here since I use it and recommend it for other reasons. This antifeature probably comes from OpenOffice.org or StarOffice, which cloned it and other bugs from other proprietary office software.

Like PDF, OpenDocument supports encryption which is unrelated to the discussed restriction.

FontForge

TrueType fonts have metadata flags specifying if a font editor should allow users to modify or embed the font. FontForge supports modifying that metadata and warns the user when opening a font containing it.

The setting responsible for this is ‘Element’ → ‘Font Info’ → ‘OS/2’ → ‘Embeddable’, opening a TrueType font with that value set to ‘Never Embed/No Editing’ shows a dialog box with the following message:

This font is marked with an FSType of 2 (Restricted License). That means it is not editable without the permission of the legal owner.

Do you have such permission?

After accepting it, the program allows me to modify my font and change that setting. I didn’t feel misled into considering it an effective restriction, unlike when using LibreOffice or pdftk.

DRM and software freedom

All generic DRM issues apply here; I think there are more specific problems when it is used in works edited using free software:

  • these restrictions make studying or modifying the work harder; LibreOffice and unpatched pdftk don’t suggest a way of solving this
  • programs offering options to restrict works made using them usually mislead users into believing that that snake oil is secure
  • it legitimizes preventing users from studying or modifying the digital works that they receive

While all cultural works should be free, these issues apply to functional works like fonts, spreadsheets (non-hacker’s programs), research articles or documentation, for which the freedoms of free software can be most clearly applied.

Solution

Free software that we develop should have no antifeatures. If we find a free program with DRM, we should fix it, like Debian fixed pdftk. Software distributions should have explicit policies against DRM; the No Malware section of the Free System Distribution Guidelines would be appropriate if it was implemented and more widely promoted.

Free software is better: skilled users can fix it and share the changes that allow users to control their own computers.

Missing source code for non-software works in free GNU/Linux distributions

Most software cannot be edited without a source, making source availability necessary for software freedom. Free GNU/Linux distributions have an explicit requirement to provide the sources of included software. Despite this, they include works without source. I do believe this is acceptable in practice, although it restricts potential uses of the software and limits our ability to reason about software freedom.

The source

Section 1 of the GNU General Public License, version 3, defines the source code of a work as ‘the preferred form of the work for making modifications to it’. This definition is also used outside of the GPL.

However, only the author of the program can know whether a given text is the source. C ‘source’ code is usually the source of the program compiled from it, but it isn’t if it was generated by Bison from a grammar file. (Free software projects sometimes do accidentally omit the source for such files.)

Let’s simplify the issue: a source is a form of the work that a skilled user can reasonably modify. Some works, usually not C programs, are distributed in modifiable forms that might be compiled from forms that the author prefers more for editing. (Some generated parsers do get modified, making GPL compliance for them slightly harder.)

(For GPL compliance there is a more important issue of the corresponding source for a non-source work which is certainly harder than deciding if a work is just the source. It is beyond the scope of this essay.)

I believe these issues are trivial in case of C programs like printer drivers that inspired the free software philosophy and rules. For other works, deciding if a form is the source is probably impossible if the software was distributed.

Fonts

Fonts are ‘information for practical use’. They describe the shapes and metrics of letters and symbols, editing them is useful to support minority languages or special symbols needed in computer science. Now most fonts are vector or outline fonts in formats like TrueType. Bitmap fonts have different practical and legal issues.

Legally, fonts are considered programs, although their description of glyph shapes just lists points and the curves connecting them, with none of the features expected from a programming language. Editors like FontForge can edit TrueType fonts, although FontForge has a different native format, preferred for editing, whose conversion to TrueType is lossy.

Hinting in TrueType contains ‘real’ programs adapting these shapes to low-resolution grids and making them legible on screen. These programs are distributed in a Turing-complete, assembly-like language interpreted by a stack-based virtual machine. There are tools like Xgridfit which can compile a higher-level language into these programs. The other popular font formats, PostScript Type 1 and its derivatives, use high-level ‘hints’ like positions of stems and standard heights that the rasterizer uses in unspecified ways to grid-fit the glyph.

While there is some benefit of editing the source instead of TrueType files, this is much different for meta-fonts. The Computer Modern project developed by Donald E. Knuth for use with TeX consists of programs using 62 parameters to generate 96 fonts. Modern technologies require drawing every font separately, while the same program describes e.g. a Roman letter for all fonts that contain it and doesn’t need many changes for new fonts. Making a separate set of fonts in a much different style for a single book is possible with meta-fonts, or gradually changing between two different fonts in a single article. (I have made a narrow sans-serif monospace style for a Computer Modern derivative in several hours. It is not published due to a licensing issue.)

However, there are nearly no other uses of meta-fonts as effective as this one. MetaFont, the program that interprets Computer Modern, generates device-specific bitmaps with no Unicode support. All programs that compile meta-fonts to outline font formats do it either by tracing bitmaps produced by MetaFont (resulting in big and unoptimized fonts) or generating outlines directly without support for important features used in Computer Modern. Recent meta-font projects rebuilding their sources from generated outline fonts or not publishing sources do not support this being a successful style today.

Hyphenation patterns

While some languages have reliable rules for hy-phen-a-tion, in English this was done using dictionaries of hyphenated words. This approach has significant problems that were solved by Franklin Liang’s hyphenation algorithm used in TeX, generating rule-like hyphenation patterns from a dictionary. 4447 patterns generated from a non-public dictionary allow TeX to recognize 89.3% of hyphens in the dictionary words.

The patterns are subwords with multiple levels of hyphens to be added or removed. The word hyphenation is hyphenated using hy3ph, he2n, hena4 and six other patterns, resulting in hy-phen-ation. (Not all hyphens are found; this will be fixed by future dictionaries using TeX to derive their hyphens.)
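
A quick way to see patterns in action is plain TeX’s \showhyphens macro, which writes the hyphenation points it finds (using whatever patterns are built into the format) to the terminal and log; a minimal sketch:

\showhyphens{hyphenation}
\bye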

The same algorithm is used for multiple other languages with different patterns. They are usually generated from dictionaries restricted by copyright and not available to the users. Some languages have patterns distributed with the source dictionary. (I believe patterns could be easily written by hand for a language having reliable hyphenation rules depending only on the characters in words, although I haven’t seen any example of this.)

The patterns can be and are edited, while the source dictionaries can be more useful for development of other hyphenation algorithms. This makes them ‘a source’, but not ‘the source’.

(Technically, TeX doesn’t use the patterns directly. INITeX loads macro definitions, hyphenation patterns and font metrics, and dumps its memory into a format file: a very build-specific file, representing the patterns in a difficult-to-edit packed trie, intended for fast loading by VIRTeX, which is normally used to build documents. VIRTeX does not support loading patterns since their compilation needs extra memory and code; now the same program is used for both purposes. Many other macro processors and Lisp implementations have a similar feature under a different name.)

Game data

Video games provide a bigger source of binary data. Many contain bitmaps or animations made using 3D rendering software from unpublished sources. Some games like Flight of the Amazon Queen are published as a single binary with no source and no tools for editing it. (A Trisquel users forum thread about this game originally motivated me to write this essay.)

This game has another interesting issue: a license that forbids selling it alone and allows selling it in larger software distributions. Well-known free licenses for fonts like the SIL Open Font License have the same restriction. It’s ‘useless’ since distributing the work with a Hello World program is allowed and this makes it a free software license.

The lack of source or of tools to edit it is more interesting. The Debian package includes an explanation of its compatibility with the DFSG: the binary is ‘the preferred form for modification’, and the tools for editing it being lost made modifications equally hard for Debian users and for the authors of the game. This is consistent with the source requirement being made to prevent authors from having a monopoly over their works (this explanation looks equivalent to the user’s freedom argument).

In GNU/Linux distributions endorsed by the FSF this is not an issue. Game data is considered non-functional and the only permission required is to distribute unmodified copies. (Debian excludes from the main repository games that are included in these distributions, while they exclude games that other distributions include. The first common issue is lack of data source or modification permission, the second is a restriction of commercial distribution.)

Documentation

Documentation of free software should be free, so it can be shared with the software and updated for modified versions. Most documentation is distributed as HTML or PDF files which are usually generated from various other markup languages.

Not all such documentation has a published source and sometimes software source is distributed with the binary only. (Sourceless PDFs often use nonfree fonts too.)

HTML can be edited and often is the source, while in other cases it is compiled from sources which preserve more semantic information about the document and have better printing support. For this reason we should not consider it the source if the author has a source from which it is compiled. Can we know this?

While the most popular free software licenses require providing the source with binaries, this isn’t true for most documentation licenses. No Creative Commons license protects the practical freedom to modify due to its focus on non-textual works. GNU FDL does and unlike software licenses it also requires the source to be in a free format.

The program-data dualism

Most of the above cases suggest that source code access is needed only for programs, not for data. This isn’t true, and it is not strict enough to be a useful criterion.

TrueType fonts are both programs and data. The PostScript page description language and typesetting systems based on TeX use Turing-complete programming languages for formatting documents which sometimes do contain nontrivial programs. Scripts describing events (and dialogue) in games are programs.

There is another difference between these works and compiled C programs: they work on multiple architectures. This is not a sufficient criterion for requiring sources, since we do not consider Java programs distributed as class files without source free, even though they run on all architectures supported by Java virtual machines. Binaries being architecture-specific makes distribution package builds for unpopular architectures like MIPS a more useful way of finding missing sources.

Version control

Most recent free software projects distribute the source in two ways: in distributed version control repositories, and as archives of specific versions: tarballs, which often include generated files whose regeneration requires ‘special’ tools that not all Unix systems had.

For development, the source is obtained from the version control system, since it has the whole project history explaining why the changes were made. For fulfillment of the source distribution requirements, the tarball is used. Does this mean that the tarball isn’t ‘the preferred form of the work for making modifications to it’?

Conclusions

We should provide the sources of the works that we make, since only in this case do we know that it is the source. The source should be in a public and distributed version control system and include tools to build all non-source files of the work.

Verifying if software written by others has a source is harder. If you can edit it, then maybe it’s free and it’s a source. Don’t distribute software that you don’t use, since you don’t know if it respects the freedom of its users.

Manual dynamic memory management might make debugging easier

Dynamic memory allocation has an important use in real world programs: data like input lines has no fixed size, so robust programs shouldn’t allocate fixed buffers for it. While writing a homework program I found another reason for it: it leads to more errors that tools like Valgrind can detect.

Using static allocation for such programs has its advantages: all size limits are specified, so it won’t cause errors like truncating input; it’s easy to declare static arrays of structures in C; it’s faster and doesn’t require many additional lines of code for deallocation of the structure nor thinking where to free the objects.

However, this approach makes debugging harder: static memory is valid for the whole run of the program. Tools like Valgrind’s Memcheck won’t complain about uninitialized values being read or memory being accessed after being freed. (There are problems that occur only with dynamic memory, like double frees or frees of unallocated memory, I don’t consider them as common or as hard as the issues that don’t depend on memory allocation style.)
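
A minimal sketch of the kind of error this is about (not the homework program): with malloc and free, Valgrind’s Memcheck reports an invalid read on the last access below, while with a static array of nodes it would stay silent:

#include <stdio.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

int main(void)
{
    struct node *a = malloc(sizeof *a);
    struct node *b = malloc(sizeof *b);
    a->value = 1; a->next = b;
    b->value = 2; b->next = NULL;

    free(b);                        /* freed as 'no longer needed'... */
    printf("%d\n", a->next->value); /* ...but still reachable: use after free */

    free(a);
    return 0;
}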

(In real world programs another reason to use dynamic memory is that some of it might be returned to the operating system before the program finishes. This would be useful in long-lived processes doing big allocations for quick computations, although it won’t occur in all cases due to the way how malloc-style routines work.)

The program that motivated me to write this article implemented monotone polygon triangulation using the DCEL structure to represent the polygon. (This structure is not needed for this algorithm. Using it leads to having more bugs, which makes it educationally useful.) The code managed half-edges to run an algorithm designed for vertices. Storing triangulated parts of the polygon was not needed, so they were deallocated immediately after printing their representation. This resulted in a use-after-free error detected by Valgrind; fixing it corrected an incorrect result on a different polygon. The code added diagonals between half-edges, and the resulting graph wasn’t a correct triangulation if the diagonals were added between half-edges of different polygons, some of which were completely triangulated (and thus deallocated) before the faulty diagonal was added.

For nearly all my other programs I use languages with automatic memory management (and I don’t use data structures as complex as DCEL), so there would be no error with the deallocation delayed after the final use of the object. Previously I thought that all such errors would be introduced by incorrect placement of free or delete calls. This program helped me realize that the delete operator can be useful in detecting otherwise incorrect code.

Lemote YeeLoong 8101B with Loongson 2F CPU review

The Lemote YeeLoong is a small and free software-friendly laptop and one of the few available non-x86 (and non-ARM) laptops. (It’s sometimes called a ‘netbook’ or a ‘mini notebook’.)

As a user and contributor to a GNU/Linux distribution supporting this device, I’m often asked about it. The information published by the manufacturer and distro maintainers doesn’t reflect what could be seen by a user. This review is based on my experience using it and questions of free software supporters interested in this device.

The YeeLoong I have is 8101B with a 10.1 inch display. The 8089B model probably differs only in display size (8.9 inch), having the same internals. There are newer YeeLoongs with 2G or 3A CPUs being marketed, these are significantly different on most points discussed here. (This review probably won’t be helpful for review-writing classes, there are better resources for them available.)

Hardware

Case

One of the most common marketing claims is that the machine was built by Quanta and the case is of high quality. This seems more reliable than most qualitative opinions stated on the Lemote page.

The lid is shiny, the user-visible parts are matte, and the parts I touch haven’t become visibly shinier. There are no intrusive logos (the small model name near the screen helps with typing it correctly). No scratches after more than two years of use. The display hinge works fine.

(My other laptop, an Asus F3U, has many scratches on top and once had many parts replaced after the display hinge broke. Despite being about a year newer, the parts I touch are now much shinier. A completely different experience.)

CPU

Loongson 2F is a single-core, MIPS III-compatible 64-bit CPU with some custom ISA extensions (not all of them used by software).

There is a custom SIMD extension, similar to MMX although not well-supported by GCC and with different intrinsics. A Gentoo hacker used them to optimize an important graphics library and posted a great explanation of these issues.

There are CPU bugs which can hang the machine; they are easily worked around (although untrusted code can still trigger them) and were a bigger problem before English documentation was made available.

There is no uploadable microcode; this is one of the reasons why x86 systems probably won’t be as free as this one (even the free boot firmware implementation coreboot usually requires nonfree CPU microcode).

The manufacturer claims buffer overflow protection; this probably refers to an NX bit. I don’t know if it’s used in software.

Video card

The video card is a SiliconMotion SMI712 which does not have any hardware 3D acceleration. The reason why I consider the machine not completely supported by free software is limited 2D or video acceleration in free GNU/Linux distros.

gNewSense metad uses the fbdev driver, without support for resolution changes or 2D acceleration. Parabola uses the siliconmotion driver, with suboptimal support for these features (fbdev is available). Newer X server releases make XAA slower (this is very noticeable when using KDE), while EXA hangs the machine (not a new issue), so fbdev might be faster now. There are legendary drivers with xrandr support; I never used them.

Gentoo has patches making full-screen low quality YouTube videos playable (I used WatchVideo for this), they probably could be ported to other distros. There are ongoing discussions on a new SiliconMotion video driver on the X.Org development list, maybe this driver will improve this situation (it has xrandr support).

The VGA output has low colour quality, although I haven’t used VGA outputs on other machines for a long time, since all my other machines can use DVI-D. The documentation of the chip claims dual-head support at 16 bpp; I never used it successfully.

The SMI712 has only 4 MiB of video RAM; using reasonable resolutions might need special X settings to fit in it.
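
For instance, forcing a 16 bpp depth keeps a 1024x600 framebuffer around 1.2 MiB; an illustrative xorg.conf fragment (not my actual configuration, and the identifier is made up) might be:

Section "Screen"
    Identifier   "Screen0"
    DefaultDepth 16
EndSection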

Summing up, I believe the only good thing about this chip is that it has no dependency on a nonfree VBIOS, system-provided microcode or driver. Unfortunately, no other graphics chip used in laptops or desktop computers known to me has this feature.

Display

Despite all the driver problems, this machine is fast enough to read typical books in a PDF reader. The screen is matte, unlike my other laptop, so it’s useful even during summer days.

However, decreasing the backlight brightness results in a headache-causing flickering (a problem caused by the LED backlight design, probably occurring on other devices too). Usually I can use it at full brightness; the issue is less noticeable when not using X.

Wi-Fi

There are reports of the Wi-Fi card not working, but I haven’t observed any problems with it. AP mode is not supported by the driver; I never wanted to use it, since my other machine, with a more powerful Atheros 802.11n card, supports it.

The card supports 802.11b and 802.11g, not 802.11a despite what some reviews state.

Webcam

The webcam works with only some programs, depending on kernel version. I probably haven’t tried enough to configure it.

SD card reader

It works with both SD and SDHC cards. Somehow, on Parabola, a read from the card was needed before its partitions were detected, so it didn’t work well with the GUIs for mounting storage.

Booting from SD cards is not supported.

Touchpad

There is no middle button, and the layout of the left and right buttons makes simultaneous clicks impossible.

The device is made by Sentelic; its driver doesn’t support absolute positioning (possibly due to patent issues). I had a better experience with an ALPS touchpad supported by the xf86-input-synaptics driver.

There are various non-mainline drivers for Sentelic touchpads (e.g. for MSI Wind), maybe some of them would work with the one in the YeeLoong. I haven’t tried using them.

Fan

It’s loud and runs too often, although this can be partially fixed using thinkfan.

RAM

Only 1 GiB is supported by the CPU and boot firmware. The SO-DIMM can be changed; I haven’t found any need to do so.

Disk

The hdparm benchmark (probably too optimistic in the general case) shows a 20 MB/s transfer speed, even when I configured the driver to assume a connection via an 80-wire cable (a Parabola hacker had similar results with an SSD). The chipset and disk documentation suggest that a much higher speed is supported. (Maybe this is related to using a SATA disk with an IDE controller?)
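
For reference, the benchmark in question is presumably the usual buffered read test:

$ sudo hdparm -t /dev/sda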

Suspend

The fan spins while suspended to RAM, so I use only suspend to disk.

Battery

The machine can work for up to two hours on battery. The manufacturer claims a lower power use of 12 watts for the SSD version; data available to the system suggests it is similar for some uses of the HDD version.

Most netbooks work much longer on battery; this results from both bigger batteries and lower CPU power usage. Users who need longer battery life use external batteries (I never used them).

Connectors

The device has external connectors for VGA, power, 3.5 mm microphone and speaker jacks, 100 Mbps Ethernet and three USB ports. Its layout prevents using both an Ethernet cable and a USB mouse (the cable would be in the place where I would keep the mouse), although mice are uncomfortable for me in any case.

Boot firmware

The YeeLoong is often called the only laptop not requiring nonfree software. The EC and hard disk firmware are exceptions.

All Lemote machines use a derivative of PMON2000 as boot firmware. It is free (under a four-clause BSD license), although on all devices other than YeeLoongs with 2F CPUs it requires a sourceless VGA BIOS blob.

PMON initializes the hardware, shows a menu of kernels to run (using a GRUB 0.97-like configuration file) and boots one of them, supports network booting and flashing itself. It’s not compatible with x86 BIOSes and is more powerful (e.g. it can boot a kernel from an ext2 filesystem, although it doesn’t support newer filesystems).

Booting is fast unless using an initrd (gNewSense and Parabola kernels don’t need it unless using an encrypted root filesystem) or GRUB 2 booted from PMON (I see no benefit of using it).

It is also possible to use GRUB 2 as a PMON replacement, installed directly to a PLCC chip (coreboot doesn’t support the machine). It is difficult to solve potential problems with it due to the PLCC chip being soldered in most devices (or difficult to access).

Software availability

A Debian-based system with very old packages was installed on the machine. I haven’t used it long before installing gNewSense metad. (The installer was broken at that time, so I haven’t used it initially for two weeks before it was fixed; this problem motivated me to use IRC, this led me into contributing to several free software projects and using fully free GNU/Linux distributions.) The review at OSNews has more details on this system (and many other features that I haven’t noticed).

gNewSense has most Debian packages available. There is GHC without the interactive interpreter and a slow Java implementation (without JIT). Mono and Valgrind are not available (although the newest release of Valgrind supports MIPS and is included in Debian Jessie). Gnash is available, although it is too slow to be useful for me and there are better specific tools for most tasks that I could need it for.

While Debian and gNewSense use packages built for any little-endian MIPS system, Parabola has them optimized for Loongson 2F and uses a different ABI called N32, which uses 64-bit registers (and all floating point registers, unlike the O32 ABI used in Debian) with 32-bit pointers (so a single process can use only 2 GiB of virtual memory: the highest address bit is used for kernel and physical memory). As an advantage, some articles suggest it is 30% faster on some operations. A disadvantage is the lack of support for many architecture-specific packages like Java, Valgrind or GHC (and many more portability problems in other packages like WebKit, which doesn’t need to use architecture-specific code). Now more packages are starting to require a JIT, so modern Mozilla software and Qt 5 aren’t available on N32.

One of the reasons for RMS to use such a machine is that it is not supported by popular nonfree operating systems, so it won’t be used to promote them, unlike OLPC.

Performance

The CPU speed is not a problem unless compiling distro packages, using Java or other programs optimized for a JIT that is not available or not working on the MIPS ABIs used, or playing videos without the large assembly patches using its SIMD extension.

Building GCC, Mozilla browsers or WebKit is too slow to maintain these packages correctly in Parabola. Typical tasks like Web browsing are interactive enough, unless building a package at the same time or viewing a big JPEG image (although this is also slow on my AMD64 machine).

Having played free games like Wesnoth, FreeDink and DCSS (without tiles, which require hardware-accelerated OpenGL), I’m not convinced that good games need 3D acceleration.

Availability

There were YeeLoongs available in Europe from Tekmote Electronics (where I bought mine from) and KD85.com. Freedom Included was selling them in the USA with gNewSense preinstalled.

The manufacturer’s site claims a ‘very competitive price’; this certainly isn’t true in Europe in comparison with non-freedom-respecting x86 netbooks.

Summary

I know three main reasons to use a YeeLoong: it respects user’s freedom, it can be used for MIPS programming, and it is a small and portable laptop. I don’t know of any good alternatives for the first two of these uses. Except for the graphics performance, I believe the YeeLoong might still be an appropriate device as a general-purpose small laptop (and graphics is not a significant problem for most of my text-oriented needs).