Moving from AWS Route53 to Gandi LiveDNS

Looking for ways to simplify my setup, I recently switched the DNS zones for my domains from AWS Route53 to Gandi LiveDNS. I already have my domains registered at Gandi and use a VPS from another provider, so this reduces my dependence on Amazon and saves a small amount of money.

I use two kinds of automation working on DNS records: my zones are defined and managed via OpenTofu, and my servers use Let’s Encrypt DNS validation for their wildcard certificates. I don’t use any advanced features of Route53: if not for the temporary records added by certbot, my zones would be completely static (with changes only when I update my email setup or add or remove subdomains for websites).

Both tools have extensions interacting with Gandi: there is a Gandi provider for OpenTofu and a certbot plugin. (I checked that, and the IPv6 support, before deciding to use Gandi’s DNS. Otherwise I’d look for other DNS services with certbot support.)

The migration process itself was different from what I expected: I had to switch the NS records for the domain while providing a zone file, instead of enabling LiveDNS, building a new zone, and then switching the domain to it. (Gandi also allows starting with a default zone, which is useful only when using their other services for Web hosting and email.) So I needed a zone file export via the cli53 tool (which I already knew, having used it long ago to compare zones when working on automation). The zone file needed at least the Amazon NS records removed before it could be imported.
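
For reference, the export and cleanup amounted to something like the following (example.com is a placeholder, and the grep pattern is only a rough way to drop the Amazon NS records; check the result before importing):

cli53 export example.com > example.com.zone
# remove the NS records pointing at Amazon's nameservers
grep -v awsdns example.com.zone > example.com.livedns.zone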

The OpenTofu migration was easy: I configured the provider using a personal access token for the Gandi API, copied my files defining the Route53 records, and ran some find and replace to rename resource types and their fields. The provider has resources for domains and records, so the configuration is mostly the same as before.
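
The renames were roughly of the following kind; this is a sketch only, and the exact attribute mapping between aws_route53_record and gandi_livedns_record should be checked against the provider documentation:

# rename the resource type and, approximately, its fields
sed -i \
  -e 's/aws_route53_record/gandi_livedns_record/g' \
  -e 's/\brecords\b/values/g' \
  -e 's/\bzone_id\b/zone/g' \
  *.tf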

I imported the records from LiveDNS into OpenTofu via its import command. I had to look in the provider’s source code for the syntax of import IDs: these are fqdn/record/type, e.g. tofu import gandi_livedns_record.example_com_mta_sts_txt example.com/_mta-sts/TXT would make the example_com_mta_sts_txt record in the configuration handle the existing _mta-sts.example.com TXT record. (With my resources organized around record types, often using for_each over the subdomain names of a host, I generated the import commands by running tofu plan and transforming the text describing the resources to be added.)
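
Generating the import commands was roughly like the sketch below; the step that appends the matching fqdn/record/type ID depends on how the resources and for_each keys are named, so I only show the first part:

# list the resource addresses that tofu plan wants to create
tofu plan -no-color \
  | grep 'will be created' \
  | sed -E 's/^ *# (.*) will be created.*/\1/' \
  > to-import.txt
# each line then needs its fqdn/record/type ID appended, e.g.:
#   tofu import 'gandi_livedns_record.example_com_mta_sts_txt' example.com/_mta-sts/TXT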

Gandi LiveDNS requires relative DNS names, with @ for the root domain, and has different quoting for TXT records, while Route53 uses fully-qualified domain names. So I adjusted all of these until tofu plan informed of no changes to perform.

I tested the migration with the usual tools: getting some records via host, loading at least one website, and checking the email-related records (which were the ones most likely to break, being long TXT records affected by the difference in quoting between Route53 and LiveDNS).
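
The checks were of this kind (example.com and the record names are placeholders; the point is to compare the answers with the zone file exported from Route53):

host -t NS example.com
dig +short MX example.com
dig +short TXT example.com
dig +short TXT _dmarc.example.com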

The certbot plugin was trickier. As Gandi recently replaced its API keys with personal access tokens, the plugin required an update that was not yet in Debian stable, so I had to backport it.
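
The backport was the usual Debian source rebuild, roughly as sketched below; the package name is my guess at the plugin’s Debian name, and the exact steps depend on the release and on having deb-src entries for testing:

apt-get source python3-certbot-dns-gandi/testing   # hypothetical package name
cd certbot-*gandi*/
dch --bpo 'Rebuild for stable to support personal access tokens'
dpkg-buildpackage -us -uc
sudo apt install ../python3-certbot-dns-gandi_*_all.deb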

I regenerated the certificates to be sure it worked, running sudo /usr/bin/certbot --verbose --authenticator dns-gandi --dns-gandi-credentials /etc/letsencrypt/gandi/gandi.ini renew --force-renewal.

I’m not fully satisfied with personal access tokens: they require manual rotation, so I get the pre-Let’s Encrypt issue of requiring at least a yearly manual action to keep TLS working. The token also gives all domain-related permissions. A computer running a Web server for one domain should not be able to e.g. change records of other domains. I could solve this by using the certbot-dns-standalone plugin, delegating only a zone specific to ACME verification for the given domain, but I can do that later.

Then I deleted the Route53 zones via OpenTofu. It took a long time.

Several days later, all my services work, at least as well as they need to for hobbyist purposes. Now I need to research object storage providers to replace the only service my AWS account still bills for.

Recorded presentation slides should have more text

A lot of advice that I have seen on making presentations with slides recommends putting so little text on a slide that it is not useful without the presenter’s speech: ideally just a few words or a picture per slide, with at most around a dozen slides. Having listened to many presentations of various styles, I disagree with that advice.

Slides are mostly used for presentations that are now shared in real time over video call solutions, prerecorded, or recorded for non-interactive watching in the future. Usually the recipients see the slides and hear the speech concurrently, as multimedia. If there are no spoken words, then slides are not as useful as an article (other kinds of slideshows might be useful, but they are out of scope of this article and belong in a museum as an animated caption for an exhibit). (I had high school assignments of preparing slides that I would not be presenting and would not get any feedback on; these could have been much more useful.)

(I’m not considering lectures from before the overhead projector was introduced. It’s problematic when a student has to copy all the text the professor writes on a blackboard and has no attention left to comprehend it and ask the professor useful questions. It’s good that movable type and electronic communication were invented, so now lectures can use multimedia techniques with materials that students can review before or after the lecture.)

Remote meetings using video call technology add a lot of issues to presentations, like audio quality and network issues (including jitter and complete connection loss). Listeners are more likely to mishear some words, which is already an issue when listening to people speaking in the same room.

Currently popular video meeting software also makes slides less accessible: they are often shared as a video of the presenter’s screen. This uses lossy compression, often making text blurry. We could instead get a perfectly rendered PDF or HTML slide; possibly the listener could adjust the font size and other visual aspects for their own accessibility needs. (Wearing glasses, I don’t want the uncertainty of whether I see blurred text because my prescription is outdated or because the video call software introduces artifacts into the text.)

As language is redundant and many words could be omitted while preserving the message, we can put a lot less on the slides than we say. However, some things are non-redundant, harder to comprehend from speech, and should be displayed on slides. These include large numbers (which we might want to compare against context and usually hear and forget before they become relevant), dates, acronyms (which I often need to draw with my fingers to recognize from spoken letters; it’s good that people usually do not abbreviate the less obvious ones when speaking) and mathematical equations. In the rare cases when a presentation includes tables or URLs, these obviously should be included (and if they refer to resources on the Web, full URLs should be given). Obviously charts need to be shown on slides; their spoken (or written) descriptions are necessary but not sufficient.

Some minimalism is needed in slide design: slides should not distract from the presentation. This involves not having typos: use spell-checking software and proof-read the slides before the speech. They should have an accessible color theme and legible fonts that are not too small. (Text that is too big is rarely an issue; slides are too small to require an aerial view like the Nazca Lines.)

It’s also easy to miss short words when listening to a speech; a common example is ‘not’. So a brief network glitch can completely invert the perceived meaning of a speech. This can be solved by showing the correct message on a slide.

Even with a perfect network, people do not listen at 100% of their capacity: they might be distracted (by letter carriers, power outages, loud neighbors, etc.) or tired after hours of work. This makes it harder to comprehend every word of a speech. (The usual accessibility issues, as well as familiarity with the language used and with the subject matter, affect this too.)

As a slide is shown statically for a longer time (even when the presenter reveals the slide line by line while speaking on its topic, the earlier lines remain visible longer), a listener can check the slide to read what they missed hearing and to consider the previously shown data in a new context.

The advice against reading slides aloud has some merit: a meeting is not needed if it could have been an email (or a blog post), with the slides containing everything and no interaction with the presenter; the presenter has to explain the topic in a way that is interesting and different from incomplete slides.

The technical presentations I’m most proud of were about a complete document (an explanation of a technical issue or a user story), where every piece of new information that I said aloud was an error in the document requiring a fix after the presentation. The listeners learned more from these than they would have just by reading the original documents, and they contributed a lot of feedback to them. Both the meeting and the document itself were needed, as was the fact that I presented it while talking.

So for my future presentations, I will aim for slides or other visual materials that are sufficient on their own, while possibly more concise than my usual writing style. I’ll focus on including in the slides all the data that I would say aloud. And I might need to prepare the slides early enough to share them with remote listeners instead of having to screen share.

Using Docker with IPv6

I found a tool extending Docker with IPv6 NAT and an issue about it being replaced by builtin Docker functionality. I thought that if Docker natively (if experimentally) supports IPv6, it would be easy to configure, so I chose to enable that for my Mastodon instance. It took slightly more work than I expected.

Unlike much more complex earlier guides that I have seen, this approach uses NAT, just like with IPv4. My containers do not expose any public services not proxied by an nginx on their host, so I don't need a more complex solution or a pool of public IPv6 addresses from my VPS provider. I assume a recent Docker version; I'm running 20.10.21.

While configuring the builtin IPv6 support, I found several issues:

  • Due to an IP address parsing bug, an address pool is used for only one subnet, so I need a separate default-address-pools entry for each subnet.

  • Not a bug, but I found no information on the default value of default-address-pools, so I copied one from an example in the Docker documentation; as a result I also had to update my iptables rules to allow Prometheus to access the host's node_exporter from a different IP address.

  • docker-compose.yml version 3 doesn't support enabling IPv6 (since it limits its functionality to what Docker Swarm supports; similarly, it doesn't have memory limits without Swarm), so I switched to version 2.4.

  • docker-compose up -d and the Ansible docker_compose module do not recreate networks, so changes to them are not applied.

  • Mastodon failed to connect to PostgreSQL over IPv6; I don't need IPv6 for the internal network used for its PostgreSQL and Redis, and there is an open issue about Redis on IPv6, so there is nothing here for me to debug and I keep using only IPv4 for that network.

So my /etc/docker/daemon.json evolved into this (with more single-subnet address pools omitted):

{
  "default-address-pools": [
    {
      "base": "172.30.0.0/16",
      "size": 24
    },
    {
      "base": "172.31.0.0/16",
      "size": 24
    },
    {
      "base": "fd00:0000:0000:00::/64",
      "size": 64
    },
    {
      "base": "fd00:0000:0000:01::/64",
      "size": 64
    },
    {
      "base": "fd00:0000:0000:02::/64",
      "size": 64
    },
    {
      "base": "fd00:0000:0000:03::/64",
      "size": 64
    }
  ],
  "experimental": true,
  "features": {
    "buildkit": true
  },
  "fixed-cidr-v6": "fd00::/80",
  "ip6tables": true,
  "ipv6": true
}

(BuildKit is an unrelated feature for a much better image building experience; I use it on my laptop.)

I tested that containers support IPv6 (here with a large latency on my laptop):

$ docker run --rm busybox ping -c1 -6 mtjm.eu
PING mtjm.eu (2a01:7e01::f03c:91ff:fefb:b063): 56 data bytes
64 bytes from 2a01:7e01::f03c:91ff:fefb:b063: seq=0 ttl=49 time=47.265 ms

--- mtjm.eu ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 47.265/47.265/47.265 ms

Several docker-compose.yml files now have:

networks:
  default:
    enable_ipv6: true

while others specify IPv6 for external networks:

networks:
  external_network:
    enable_ipv6: true
  internal_network:
    internal: true

And in several directories I did docker-compose down and then docker-compose up -d.

I used this inelegant command to find containers not having IPv6:

for f in $(docker ps --format '{{.ID}}'); do
  docker inspect "$f" | grep -q fd00 || echo "$f"
done

and it listed only the databases on internal networks.

So now all my containers talking to the outside world can access IPv6 services. But I don't know e.g. any IPv6-only Mastodon instance to check if mine can communicate with it.

Personal server exit review

I used a Cubieboard as a personal server until October, running Kanboard, tt-rss and some other Web apps there. It allowed me to keep more of my data at home and practice some more system administration.

Why Cubieboard

Initially I wanted a computer that would work during the night in my bedroom. This required a fanless design and therefore a low power usage. (Now I have a separate room for the computers, so they don’t need to be so quiet, while I pay their power bills.)

I could have used my BeagleBone Black or a Cubieboard, or bought a different board. I didn’t choose the BBB for this since its 512 MiB of RAM is not enough for a Web app that I maintain in my spare time (the Cubieboard’s 1 GiB is enough), and I wanted more reliable storage than microSD: the Cubieboard has a SATA port, which I connected to an SSD (also no moving parts). Meanwhile, I can use the BBB for flashing and debugging coreboot, which requires some downtime.

(Now I think I wouldn’t need the personal server to work at night, but it’s easier: I can read some Web comics right after I resume my laptop from suspend and get the daily mails from cron and Kanboard. Also, I won’t forget to turn it on before accessing these while traveling.)

How it was configured

ARM makes booting slightly more interesting than x86. The board booted u-boot from a microSD card, which loaded a kernel from /boot on the same microSD card (I wasn’t able to make it load a kernel from the SSD, even though it was documented to support SATA), which mounted a btrfs root filesystem from the SSD.

The server was running Debian Jessie, manually installed using its usual installer (with good support for such boards). I configured nearly all services running on top of it using Ansible. Much configuration was shared with my other computers, e.g. using OpenVPN, Postfix for relaying locally-generated mails to my VPS, etc.

Like all my computers (or, in case of Thinkpads, disks moved between computers), it had a unique hostname. I named it after Sam, the trusted friend of Alice, Bob and Frodo.

How I broke it

After receiving a mail from apticron about Debian package updates being available, I ran aptitude full-upgrade. There was a kernel upgrade, so I rebooted. This had worked many times, but one time in October it didn’t.

After getting the serial TTL cable (which required removing the top of its case), I found errors from the initramfs: the root filesystem couldn’t be mounted due to filesystem errors. Checking the disk in another computer (a big advantage of SATA over soldered storage chips), I saw many btrfs errors, although all the interesting files could still be read.

So I copied the filesystem image to my desktop, ran mkfs.btrfs, copied all the files to the new filesystem, and over many reboot loops fixed /etc/fstab and some initramfs configuration. It still did not boot, probably because it was not completely reconfigured to use the new filesystem.

Now

Not being able to fix it ‘now’, I migrated the services to my desktop computer (really easy with Ansible). I used data restored from the filesystem image and a daily PostgreSQL backup. (I couldn’t use the possibly corrupted newer PostgreSQL data: PostgreSQL won’t load data files written on a different architecture, which would have required running pg_dump on armhf.)

Two months have passed and I haven’t noticed a need for that server, so I still haven’t fixed it and use the desktop as a personal server. There is a difference in the power bill, though I don’t know how much of it can be attributed to the desktop running more often now.

Future

When I set up a new personal server, I will think about filesystem errors before they stop it from booting. Maybe periodically running btrfs scrub or choosing an older filesystem would help. Certainly, I should back up before installing any OS update. I should also have a recovery method for when the OS won’t boot (very easy on x86).

Update 2018-06-16: My notes show that I have reinstalled the personal server and it mostly works since January 2016. Update 2022-11-15: Later I moved all the services to my VPS (or replaced them by local software on my laptop) and I stopped using the personal server in August 2019.

Inclusion of licenses longer than licensed works

There are licenses known for excessive attribution requirements: in a single project, the old four-clause BSD license required including 75 different texts in all advertising materials. The license text itself can be long (GNU FDL 1.3 takes more than 3 500 words; the Web browser that I use would spend nine A4 pages to print it): imagine an award pin with an FDL-licensed image, or a document a few pages long derived from a GNU manual. Both need to include the GNU FDL text. This makes the license, despite being free (possibly only in specific cases for the FDL; in all cases for the GNU GPL), unusable for some kinds of free works.

If you don’t consider award pins sufficiently complex and original, imagine a postcard from a traveling family member. It should have a beautiful photo on one side, like the ones that Wikimedia Commons has, and the whole other side filled by a letter describing their holidays, and your postal address. There is no place to fit nine pages of license text there, and the postcard is distributed by itself, so no separate booklet with required legal texts can be included.

It’s one of the reasons for GNU FDL being used for ‘professional’ photos: it’s free, so it is accepted in free culture projects like Wikimedia Commons, but it’s unusable in practice, so proprietary relicensing businesses work. Wikimedia Commons now discourages using GNU FDL for photos without dual-licensing under a more usable license.

I do believe that this is a significant bug in the license: copyleft licenses should be designed to not support proprietary relicensing or proprietary extensions businesses (i.e. proprietary software businesses) and should not have features that are useful nearly only for such businesses (while FDL has several, possibly since the license was designed to be used by traditional publishers). (There are several different problems in other, more important, copyleft licenses like GNU AGPL or GNU GPL3, e.g. the optional attribution requirement. Some of them are solved in copyleft-next; e.g. the Nullification of Copyleft/Proprietary Dual Licensing clause protects against proprietary relicensing by removing the copyleft for all in some cases.)

How can we solve this problem? By not distributing FDL-licensed works and by not recommending the use of such licenses for cultural works. This requires recommending specific better licenses.

GNU recommends their all-permissive license for short documents like README files. Unless the work is part of a GNU package, a free Creative Commons license is probably a better solution: the copyleft (without a source provision requirement) CC-BY-SA, the permissive CC-BY, or the ‘public domain but legal everywhere’ CC0. In its clause 3(a)(1)(C), CC-BY-SA 4.0 requires you to

indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License,

so it’s sufficient to fit a URI like https://creativecommons.org/licenses/by-sa/4.0/. (I have seen a much longer text than this URI written on a single pea seed in a local museum, so this surely works for bigger works like award pins or postcards.)

A more general term is used in copyleft-next: ‘inform recipients how they can obtain a copy of this License’, which is obviously satisfied by a URI. (The whole officially recommended license notice is: ‘Licensed under copyleft-next version 0.3.0. See https://gitorious.org/copyleft-next/copyleft-next/raw/master:Releases/copyleft-next-0.3.0 for more information’. Compare the three paragraphs recommended for the GNU GPL.)

This couldn’t have been done several decades ago. There was no Web in 1991 when GNU GPL2 was released (this is why the usual GPL legal notices had an FSF address, changed several times after the license was released, until GPL3, which includes both a URL and a distributed copy of the license). It was reasonable to assume that the user couldn’t obtain the license text from the Web, but now it’s probable that every computer user can access the Web, although not necessarily from their home. (How many GPL software recipients can use postal mail for the source offers but not the Web?)

(This is not the only problem with long licenses or with requiring their text to be included in the work. It is a bigger problem that some licenses are too complex or too badly written to be understood by users, but that problem cannot be quantified as easily as their texts not fitting in the work: understanding of licenses is ‘cached’ in the memories of readers who have already met e.g. the GNU GPL3 for many other works. It would also be possible, and evil, to write a very short and incomprehensible license.)

PlaneShift and free software

On the download page of PlaneShift I see big letters ‘Fully Free Cross-Platform MMORPG’ and ‘Open Source Development!’. They provide the source code of their client, while writing how this helps user’s freedom and security. (I prefer using clearer terms like free software and copyleft for the exact things that they praise. While I played PlaneShift many years ago, I do not have any opinion on it beyond what I write in this essay, since I’m not interested in multiplayer games.)

However, they write that they need ‘some additional bounds [in the license] to keep safe the work of [their] artists and to ensure project success’. This both supports false assumptions (there are safe and successful projects releasing fully free cultural works) and significantly reduces the benefits of their licensing for user’s freedom and security.

They have clearly explained their licensing and its rationale. Source files are licensed under the GPL, while artwork, text and rules in the game use a custom nonfree license (called the PlaneShift Content License).

Despite using a free and copyleft license, the client has significant restrictions on user’s freedom:

  • ‘You cannot distribute the client, sell it or gain any profit from it’
  • ‘You can use our client only to connect to Official PlaneShift Servers’

So of the free software freedoms, only a small part of ‘the freedom to study how the program works’ applies. The program does not belong ‘to the community of OS developers’; it belongs to Atomic Blue, the organization running PlaneShift. While their licensing is rationalized by making forking as hard as possible, all the benefits of free software that they write about require the ability to fork.

The ‘content’ license is short and simple. It forbids any distribution or modification of the work, allows using it only (personally) with their official servers and ‘a Planeshift Client, distributed by Atomic Blue’, and disclaims all warranty.

I’m not able to understand what their encouragement for users to ‘experiment with mods and changes to either [their] source code and to [their] art assets’ might mean. Are they recommending infringing their copyright or promoting fair use in a very unclear way?

The requirement to use the artwork ‘only in conjunction with a Planeshift Client, distributed by Atomic Blue’ might forbid using the client software if built from source. So that software, as normally used, is as free as if it were written on stone tablets, impossible to copy or modify. All the security benefits of its source code being free disappear when a Mallory just needs to backdoor Atomic Blue’s compiler.

Even if a client built from source could be used, GNU/Linux distros wouldn’t be able to include that game, since they wouldn’t be allowed to distribute the needed artwork. The source might be free, but it’s not useful without the nonfree artwork. (Or is it? Write if you know a free derivative of it working without Atomic Blue’s artwork and servers.)

Free software Flash replacements and the JavaScript trap

One of the nonfree programs that make it hard for many people to use completely free software operating systems is Adobe Flash. There are several free software projects aiming to replace the Flash interpreter; one of them used to be an FSF high priority project. I don’t believe that developing such programs will significantly help people stop using nonfree software. (While hardware compatibility issues resulting from free drivers requiring nonfree firmware are well-known and probably more noticeable, they can easily be avoided by buying appropriate hardware; it’s not hard. There are social issues that make people use the same websites as their friends, but not the same computer hardware.)

While Flash has many uses, both as a Web browser plugin and for desktop applications, I will focus on its common use for video players on websites like YouTube.

Replacing Flash is hard

No free software implementation of SWF, the file format used by Flash, can currently support most such files used on the Web. gNewSense contributors mentioned both patents and incomplete specifications making this hard to do. Another issue is the Digital Restrictions Management implemented in Flash: a sufficiently complete free implementation would probably violate anti-circumvention laws, making DRM an effective restriction of our freedom.

The JavaScript trap

Even if we had a complete and free SWF implementation, it would interpret nonfree programs that websites publish. It is exactly the same problem as the JavaScript trap: using free software interpreters to run untrusted nonfree software from the Web. (I haven’t noticed this issue before reading the RMS’s essay on JavaScript and gNewSense’s page on SWF.)

Some sites like YouTube are moving to providing videos via the HTML5 video tag. It doesn’t solve this problem, since now nonfree JavaScript programs serve the same purpose as previously SWF. I think it might make writing free software replacements easier, due to free development and debugging tools available for JavaScript.

Why we need video downloader programs

Issues with specific video publishing sites are completely solved for their viewers by not running the code that the site provides (either SWF or JavaScript) and using a free software program to obtain the video. This can be done by youtube-dl, a command-line program; UnPlug, a browser extension; and many other programs. There are also extensions that display the video inline on the page without using its builtin player.
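
A minimal youtube-dl invocation looks like this (the URL is a placeholder):

# list the available formats, then download one
youtube-dl -F 'https://www.youtube.com/watch?v=EXAMPLE'
youtube-dl -f best 'https://www.youtube.com/watch?v=EXAMPLE'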

These tools support only specific sites, though youtube-dl supports very many despite its name. On other sites you can usually find the video URL by reading the source of the HTML page or the included JavaScript code. (It might be a nice fetish to have.) I don’t know what work is needed to use an unsupported site with a free SWF interpreter like Gnash.

Being able to download the video and save it on persistent storage (instead of downloading it just to display it in the player) is needed for at least several useful reasons: we cannot remix without downloading the video; we cannot protect against centralization and copyright censorship while accessing the works only from a single centralized site; and we cannot share works with our friends (or be good friends to them) without having a copy. Even the very limited freedoms weakly protected by copyright law as fair use cannot be exercised without storing a copy of the work.

(While I highly disagree with completely rejecting JavaScript due to its usefulness in free Web applications, the arguments used against it clearly apply to SWF. Video downloader programs and browser extensions are software that we can write to replace nonfree software provided by websites.)

Flash animations

Before Web videos became popular, SWF was often used for vector animations. This might place them in the difficult-to-reason-about area between software and non-functional cultural works, but there is a simple reason to consider them software: they have antifeatures. We need the freedom of free software for such works to make them respect their users.

JavaScript and HTML5 canvas are replacing this use of Flash too, so now nonfree programs using better tools control the animation.

Publishing your own works

If you write an interactive website, use JavaScript. Release your code as free software. If you make videos, release them on your site using a free software-friendly video format like WebM, or use Web applications like GNU MediaGoblin (possibly an instance run by your friend).

To prevent DRMed sites from using your videos to restrict their users, use a free culture license that disallows using ‘effective’ technical restrictions of the freedoms that it protects, like CC-BY-SA 4.0. (YouTube requires giving them a different license, don’t upload your work there.)

My email spam filtering and end-to-end encryption

Big email providers use very complex spam filtering methods. The solutions used by Google require distributed real-time processing and access to the plain text of all messages. Their work is closely followed by spammers in an arms race; it’s not usable for small servers, and both sides benefit from reducing users’ privacy. In this article I describe how spam filtering works on my personal server: a solution optimized for low administration effort that does not use message content. It involves only existing, known free software packages without much extra configuration beyond what’s needed to have a working mail server.

Email spam that I receive comes from three main sources: zombie computers in botnets, hijacked accounts and Polish businesses. Zombies are easy to block, since they do not comply with mail standards in easily detectable ways. Hijacked accounts are now rare (partially due to the hard work of Google explained in the linked mail; it would be easier if the two Yahoo users who don’t spam moved to other providers).

Spam from Polish companies is my main issue, since they use properly configured servers and their own IP addresses. There is a law that allows sending uninformative spam to everyone, while informative spam can be sent only to companies. They do not check if the recipient has a company.

I use the following methods to filter these kinds of spam on my server using the Postfix MTA (a rough configuration sketch follows the list):

  • postscreen: it filters out much zombie spam by adding a delay of several seconds and checking whether the client waits before sending data, along with several other protocol correctness checks.

  • Sender Policy Framework: since zombie spammers do not use their own domains (these would be blacklisted by Google), they forge sender addresses at domains which are often real. SPF records specify which servers are authorized to send mail for a domain, so zombie spam using such a domain is blocked. Not enough domains use it. SPF would block some good mail if I used email forwarders without SRS; I don’t, since I have no use for forwarders. (The SPF validator implementation that I use is pypolicyd-spf.)

  • postgrey: it greylists all mail except from known trusted servers or servers that have successfully delivered a mail recently; i.e. it returns a temporary error code and allows the mail to be sent again after several minutes (proper servers do this; email works well without 100% uptime). This leads to delays when getting mail from new servers, which is annoying for registration emails from shops. It blocks nearly all remaining zombie spam.

  • static IP address blacklist: for professional Polish spam businesses. For one provider, I have to blacklist entire IP ranges. This solution wouldn’t work for a server with more users.
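
A rough sketch of the corresponding Postfix settings follows; the restriction order, the policy service endpoints and the blacklist file name are assumptions, and postscreen and both policy daemons also need their usual master.cf entries:

# static blacklist as a CIDR table, SPF and postgrey as policy services
postconf -e 'smtpd_recipient_restrictions = permit_mynetworks, reject_unauth_destination, check_client_access cidr:/etc/postfix/client_blacklist.cidr, check_policy_service unix:private/policyd-spf, check_policy_service inet:127.0.0.1:10023'
postconf -e 'postscreen_greet_action = enforce'
# /etc/postfix/client_blacklist.cidr then contains lines like:
#   192.0.2.0/24 REJECT known spam source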

I don’t use these common methods:

  • checking reverse DNS records: it fails on real servers and would block many self-hosted servers
  • using external RBLs: they are bad and block self-hosted mail
  • DKIM: I don’t find enough value in it to find how to configure it; I think it might be useful for more complex filtering that uses multiple factors to decide if a message is spam and if the provider can motivate administrators of other servers to configure extra things (Google can)
  • checking message content: it’s complex, has false positives, causes an arms race, and needs access to the message’s plain text content (preventing end-to-end security or delegating spam filtering to the client); if manual review of probable spam messages is still needed, the method is at most as good as not doing any filtering.

I tried using ‘unsubscribe’ links in professional spam. They don’t work: they often fail (e.g. with page not found errors), are missing, or are just mail addresses (I don’t mail spammers). When they do work, they affect only some mails from the provider (only the kind of mail the link came from?): they still send other mails. The IP address blacklist is more effective. I haven’t tried contacting the server providers of spam businesses using VPSes or dedicated servers with terms of service prohibiting spam; I don’t know if they have a saner definition of spam than the law.

I would like it if all spammers moved to sending only OpenPGP-encrypted mails (they can easily get my public key from a public keyserver or from my Web site): it wouldn’t affect my spam filtering and it would increase their resource usage.

This week, I received 11 spam messages (not counting ones from mailing lists): 5 in English, probably from zombies, and 6 from real Polish businesses with IP addresses that I haven’t blacklisted yet. I don’t count how many were blocked. I consider this good enough not to research better spam filtering methods now.

I don’t offer a solution to the problem of spam: it’s difficult, has economic, legal, technical and educational aspects; what I use is sufficient for my needs and has no problems with securing message texts. I do not know how spam filtering would work if all users moved to their own servers, maybe some post-email protocols with proof-of-work schemes would solve these issues while not supporting sending emails from phones to Google servers.

DRM in free software

Free software has fewer antifeatures than proprietary software, and users can remove them. While a well-known distro vendor includes spyware, such bugs usually get fixed. Despite this, some well-known free programs include antifeatures restricting the use or modification of data that these programs should access or edit.

These antifeatures are called DRM, which stands for ‘digital restrictions management’. It is unrelated to the Direct Rendering Manager, which despite using the same acronym has no freedom issues other than requiring nonfree microcode for Radeon graphics chips. Traditional bugs that make programs mishandle data or crash when using specific files are also different: developers fix them and don’t consider them intentional.

PDF restrictions: Okular, pdftk

The PDF document format includes metadata flags which readers use to determine whether the user is allowed to e.g. print the file or copy its text. Okular obeys these restrictions by default, while it has an option to respect what the user does instead.

The main argument for keeping that optional DRM is that the PDF specification requires it and users could use that ‘feature’.

The PDF manipulation program pdftk obeys such restrictions with no option to remove them without changing its source. Fortunately, Debian fixed this bug in their packages, so it can be used on recent Debian-based systems to modify or fix restricted PDFs.

What if you get a restricted PDF and need to extract its text? Use pdftk input.pdf output output.pdf on a Debian-based system to drop this restriction, or just use the existing file in Okular with disabled DRM or another free PDF reader.

Debian-patched pdftk prints the following warning:

WARNING: The creator of the input PDF:
   drmed.pdf
   has set an owner password (which is not required to handle this PDF).
   You did not supply this password. Please respect any copyright.

I think it’s an acceptable way to handle such restrictions. Many uses of the restricted features don’t involve violating copyright.

I made that restricted file earlier using pdftk text.pdf output drmed.pdf owner_pw hunter2 allow. It did not warn me that such DRM is bad or that it can be very easily ignored or removed.

The PDF format also supports document encryption with user passwords. That’s not DRM, since it prevents reading the document instead of restricting it in software: can it be used to protect the user’s privacy? (I don’t know how secure that encryption is; I would use OpenPGP instead if I had to send an encrypted document to a friend.)

LibreOffice spreadsheet ‘protection’

OpenDocument supports sheet and cell ‘protection’. It allows the user to read the spreadsheet (except for hidden sheets), but not view formulas, copy or edit their data.

This is implemented by adding metadata that tells programs not to allow editing the cells. The document contains an element with a hashed password ‘needed’ to unprotect the sheet. It’s easy to change that password or remove the protection using a text editor and a ZIP program to access the XML files stored in the document.
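
A sketch of what that amounts to with command-line tools; the attribute names may vary between producers, so check content.xml first:

# extract the spreadsheet content from the ODF container
unzip protected.ods content.xml
# drop the protection flag and the hashed password from the sheet element
sed -i -e 's/ table:protected="true"//g' \
       -e 's/ table:protection-key="[^"]*"//g' \
       -e 's/ table:protection-key-digest-algorithm="[^"]*"//g' \
       content.xml
# write the modified content.xml back into the document
zip protected.ods content.xml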

LibreOffice Calc did not warn me that the added ‘protection’ is useless against users who can use a text editor. It did not warn me that this kind of restriction is unfriendly and harmful regardless of whether it is effective.

The reason why I learned about this antifeature is that I once received a spreadsheet document and wanted to learn how its formulas worked. I converted it to ODF using LibreOffice and used jar and sed to change the ‘protection’ password. I learned more than I expected to from that document.

While all complete OpenDocument implementations have this problem, I name LibreOffice specifically here since I use it and recommend it for other reasons. This antifeature probably comes from OpenOffice.org or StarOffice, which cloned it and other bugs from other proprietary office software.

Like PDF, OpenDocument supports encryption which is unrelated to the discussed restriction.

FontForge

TrueType fonts have metadata flags specifying if a font editor should allow users to modify or embed the font. FontForge supports modifying that metadata and warns the user when opening a font containing it.

The setting responsible for this is ‘Element’ → ‘Font Info’ → ‘OS/2’ → ‘Embeddable’; opening a TrueType font with that value set to ‘Never Embed/No Editing’ shows a dialog box with the following message:

This font is marked with an FSType of 2 (Restricted License). That means it is not editable without the permission of the legal owner.

Do you have such permission?

After accepting it, the program allows me to modify my font and change that setting. I haven’t felt misled into considering it an effective restriction, unlike when using LibreOffice or pdftk.

DRM and software freedom

All the generic DRM issues apply here; I think there are additional problems when DRM is used in works edited using free software:

  • these restrictions make studying or modifying the work harder; LibreOffice and unpatched pdftk don’t suggest a way of solving this
  • programs offering options to restrict works made using them usually mislead users into believing that this snake oil is secure
  • it legitimizes preventing users from studying or modifying the digital works that they receive

While all cultural works should be free, these issues apply especially to functional works like fonts, spreadsheets (non-hackers’ programs), research articles or documentation, for which the freedoms of free software apply most clearly.

Solution

Free software that we develop should have no antifeatures. If we find a free program with DRM, we should fix it, like Debian fixed pdftk. Software distributions should have explicit policies against DRM; the No Malware section of the Free System Distribution Guidelines would be appropriate if it was implemented and more widely promoted.

Free software is better: skilled users can fix it and share the changes that allow users to control their own computers.

Missing source code for non-software works in free GNU/Linux distributions

Most software cannot be edited without its source, making source availability necessary for software freedom. Free GNU/Linux distributions have an explicit requirement to provide the sources of included software. Despite this, they include works without source. I believe this is acceptable in practice, although it restricts potential uses of the software and limits our ability to reason about software freedom.

The source

Section 1 of the GNU General Public License, version 3, defines the source code of a work as ‘the preferred form of the work for making modifications to it’. This definition is also used outside of the GPL.

However, only the author of the program can know whether a given text is the source. C ‘source’ code is usually the source of the program compiled from it, but it isn’t if it was generated by Bison from a grammar. (Free software projects sometimes accidentally omit the source for such files.)

Let’s simplify the issue: a source is a form of the work that a skilled user can reasonably modify. Some works, usually not C programs, are distributed in modifiable forms that might be compiled from forms that the author prefers more for editing. (Some generated parsers do get modified, making GPL compliance for them slightly harder.)

(For GPL compliance there is a more important issue: determining the corresponding source for a non-source work, which is certainly harder than deciding whether a work is the source. It is beyond the scope of this essay.)

I believe these issues are trivial in the case of C programs like the printer drivers that inspired the free software philosophy and rules. For other works, deciding whether a form is the source is probably impossible once the software has been distributed.

Fonts

Fonts are ‘information for practical use’. They describe the shapes and metrics of letters and symbols; editing them is useful to support minority languages or special symbols needed in computer science. Most fonts are now vector or outline fonts in formats like TrueType. Bitmap fonts have different practical and legal issues.

Fonts are legally considered programs, although their description of glyph shapes just lists points and the curves connecting them, with none of the features expected from a programming language. Editors like FontForge can edit TrueType fonts, although FontForge has its own native format, preferred for editing, with lossy conversion to TrueType.

Hinting in TrueType contains ‘real’ programs adapting these shapes to low resolution grids and making them legible on screen. These programs are distributed in a Turing-complete assembly-like language interpreted by a stack-based virtual machine. There are tools like Xgridfit which can compile a higher-level language into these programs. The other popular font formats, PostScript Type 1 and its derivatives, use high-level ‘hints’ like positions of stems and standard heights that the rasterizer uses in unspecified ways to grid-fit the glyph.

While there is some benefit to editing the source instead of TrueType files, this is very different for meta-fonts. The Computer Modern project developed by Donald E. Knuth for use with TeX consists of programs using 62 parameters to generate 96 fonts. Modern technologies require drawing every font separately, while the same meta-font program describes e.g. a roman letter for all fonts that contain it and doesn’t need many changes for new fonts. Making a separate set of fonts in a much different style for a single book is possible with meta-fonts, as is gradually changing between two different fonts within a single article. (I have made a narrow sans-serif monospace style for a Computer Modern derivative in several hours. It is not published due to a licensing issue.)

However, there are nearly no other uses of meta-fonts as effective as this one. MetaFont, the program that interprets Computer Modern, generates device-specific bitmaps with no Unicode support. All programs that compile meta-fonts to outline font formats do it either by tracing bitmaps produced by MetaFont (resulting in big and unoptimized fonts) or by generating outlines directly without support for important features used in Computer Modern. Recent meta-font projects rebuilding their sources from generated outline fonts, or not publishing sources at all, do not suggest that this is a successful style today.

Hyphenation patterns

While some languages have reliable rules for hy-phen-a-tion, in English this was done using dictionaries of hyphenated words. This approach has significant problems that were solved by Franklin Liang’s hyphenation algorithm used in TeX, which generates rule-like hyphenation patterns from a dictionary. 4447 patterns generated from a non-public dictionary allow TeX to recognize 89.3% of the hyphens in the dictionary’s words.

The patterns are subwords with multiple levels of hyphens to be added or removed. The word hyphenation is hyphenated using hy3ph, he2n, hena4 and six other patterns, resulting in hy-phen-ation. (Not all hyphens are found; this will be fixed by future dictionaries using TeX to derive their hyphens.)
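
TeX can show which hyphens it finds for a given word; with plain TeX installed, something like this prints the hyphenated form in the terminal and the log:

tex '\showhyphens{hyphenation} \bye'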

The same algorithm is used for multiple other languages with different patterns. They are usually generated from dictionaries restricted by copyright and not available to the users. Some languages have patterns distributed with the source dictionary. (I believe patterns could be easily written by hand for a language having reliable hyphenation rules depending only on the characters in words, although I haven’t seen any example of this.)

The patterns can be and are edited, while the source dictionaries can be more useful for development of other hyphenation algorithms. This makes them ‘a source’, but not ‘the source’.

(Technically, TeX doesn’t use the patterns directly. INITeX loads macro definitions, hyphenation patterns and font metrics, and saves its memory into a format file: a very build-specific file, representing the patterns in a packed trie that is difficult to edit, meant for fast loading by VIRTeX, which is normally used to build documents. VIRTeX does not support loading patterns, since their compilation needs extra memory and code; nowadays the same program is used for both purposes. Many other macro processors and Lisp implementations have a similar feature under a different name.)

Game data

Video games provide a bigger source of binary data. Many contain bitmaps or animations made using 3D rendering software from unpublished sources. Some games like Flight of the Amazon Queen are published as a single binary with no source and no tools for editing it. (A Trisquel users forum thread about this game originally motivated me to write this essay.)

This game has another interesting issue: a license that forbids selling it alone but allows selling it in larger software distributions. Well-known free licenses for fonts like the SIL Open Font License have the same restriction. It’s ‘useless’, since distributing the work with a Hello World program is allowed, and this is what keeps it a free software license.

The lack of source or tools to edit it is more interesting. The Debian package includes an explanation of its compatibility with the DFSG: the binary is the ‘preferred form for modification’, and the tools for editing it being lost made modifications equally hard for both Debian users and the authors of the game. This is consistent with the source requirement existing to prevent authors from having a monopoly over their works (this explanation looks equivalent to the user’s freedom argument).

In the GNU/Linux distributions endorsed by the FSF this is not an issue: game data is considered non-functional and the only permission required is to distribute unmodified copies. (Debian excludes from its main repository games that are included in these distributions, while they exclude games that other distributions include. The first common issue is lack of data source or modification permission, the second is a restriction on commercial distribution.)

Documentation

Documentation of free software should be free, so it can be shared with the software and updated for modified versions. Most documentation is distributed as HTML or PDF files which are usually generated from various other markup languages.

Not all such documentation has a published source, and sometimes the software source is distributed with only the compiled documentation. (Sourceless PDFs often use nonfree fonts too.)

HTML can be edited and often is the source, while in other cases it is compiled from sources which preserve more semantic information about the document and have better printing support. For this reason we should not consider it the source if the author has a source from which it is compiled. Can we know this?

While the most popular free software licenses require providing the source with binaries, this isn’t true for most documentation licenses. No Creative Commons license protects the practical freedom to modify, due to its focus on non-textual works. The GNU FDL does, and unlike software licenses it also requires the source to be in a free format.

The program-data dualism

Most of the above cases suggest that source code access is needed only for programs, not for data. This isn’t true, and it is not strict enough to be a useful criterion.

TrueType fonts are both programs and data. The PostScript page description language and typesetting systems based on TeX use Turing-complete programming languages for formatting documents which sometimes do contain nontrivial programs. Scripts describing events (and dialogue) in games are programs.

There is another difference between these works and compiled C programs: they work on multiple architectures. This is not a sufficient criterion for requiring sources, since we do not consider Java programs distributed as class files without source to be free, even though they run on all architectures supported by Java virtual machines. Binaries being architecture-specific makes distribution package builds for unpopular architectures like MIPS a more useful way of finding missing sources.

Version control

Most recent free software projects distribute the source in two ways: in distributed version control system repositories and as archives of specific versions: tarballs, which often include generated files whose build would require ‘special’ tools that not all Unix systems had.

For development, the source is obtained from the version control system, since it has the whole project history explaining why changes were made. For fulfilling the source distribution requirements, the tarball is used. Does this mean that the tarball isn’t ‘the preferred form of the work for making modifications to it’?

Conclusions

We should provide the sources of the works that we make, since only then do we know that it is the source. The source should be in a public and distributed version control system and include the tools to build all non-source files of the work.

Verifying whether software written by others has a source is harder. If you can edit it, then maybe it’s free and it’s a source. Don’t distribute software that you don’t use, since you don’t know if it respects the freedom of its users.