Big email providers use very complex spam filtering methods. The solutions used by Google require distributed real-time processing and access to the plain text of all messages. Spammers follow this work closely in an arms race; the results are not usable on small servers, and both sides benefit from reducing users’ privacy. In this article I describe how spam filtering works on my personal server: a solution optimized for low administration effort that does not use message content. It uses only existing, well-known free software packages, without much configuration beyond what’s needed to have a working mail server.
The email spam that I receive comes from three main sources: zombie computers in botnets, hijacked accounts and Polish businesses. Zombies are easy to block, since they violate mail standards in easily detectable ways. Hijacked accounts are now rare (partially due to the hard work of Google explained in the linked mail; it would be easier if the two Yahoo users who don’t spam moved to other providers).
Spam from Polish companies is my main issue, since they use properly configured servers and their own IP addresses. There is a law that allows sending uninformative spam to everyone, while informative spam can be sent only to companies. The senders do not check whether the recipient has a company.
My server runs the Postfix MTA. I use the following methods to filter these kinds of spam:
- Protocol correctness checks: the server adds a delay of several seconds, checks whether the client waits before sending data, and performs several other protocol correctness checks. This filters much zombie spam.
- Sender Policy Framework
- Zombie spammers do not use their own domains (these would be blacklisted by Google), so they forge sender addresses, often at domains that really exist. SPF records specify which servers are authorized to send mail for a domain, so zombie spam that forges a domain with such a record is blocked. Not enough domains publish them. SPF would block some good mails if I used email forwarders without SRS; I don’t, since I have no use for forwarders. (The SPF validator implementation that I use is pypolicyd-spf.)
- Greylisting: mail from any server that is neither known to be trusted nor has successfully delivered a mail recently gets a temporary error code, and is accepted when the server sends it again after several minutes (proper servers do this; email servers work well without 100% uptime). This delays mail from new servers, which is annoying for registration emails from shops. It blocks nearly all remaining zombie spam.
- static IP address blacklist
- This is for professional Polish spam businesses. For one provider, I have to blacklist entire IP ranges. This solution wouldn’t work for a server with more users.
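The greylisting and static blacklist steps above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the configuration or software running on my server: the class name `MailPolicy`, the delay and trust period values, and the blacklisted range (a documentation-only address block) are all hypothetical. A real setup would delegate this to existing packages rather than custom code.

```python
import ipaddress
from dataclasses import dataclass, field

# All values below are hypothetical examples, not taken from the article.
GREYLIST_DELAY = 300            # seconds before a retry from a new server is accepted
TRUST_PERIOD = 30 * 24 * 3600   # how long a successful delivery keeps a server trusted
BLACKLISTED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]  # documentation-only range


@dataclass
class MailPolicy:
    # (client IP, sender, recipient) -> time of the first delivery attempt
    first_seen: dict = field(default_factory=dict)
    # client IP -> time of the last accepted delivery
    last_delivery: dict = field(default_factory=dict)

    def check(self, client_ip, sender, recipient, now):
        """Return an SMTP-style reply string for one delivery attempt."""
        address = ipaddress.ip_address(client_ip)

        # Static blacklist: reject whole ranges permanently.
        if any(address in network for network in BLACKLISTED_RANGES):
            return "554 rejected: blacklisted address"

        # Servers that delivered a mail recently skip greylisting.
        delivered = self.last_delivery.get(client_ip)
        if delivered is not None and now - delivered < TRUST_PERIOD:
            self.last_delivery[client_ip] = now
            return "250 ok"

        # Greylisting: remember the first attempt and answer with a
        # temporary error until the delay has passed.
        key = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(key, now)
        if now - first < GREYLIST_DELAY:
            return "450 greylisted, try again later"

        # The client retried after the delay, as a proper server does.
        del self.first_seen[key]
        self.last_delivery[client_ip] = now
        return "250 ok"
```

With these example values, a first attempt from an unknown address gets the temporary 450 reply, a retry after five minutes is accepted, and later mails from the same address pass immediately. A client that never retries, which is typical for zombies, never gets past the 450 reply.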
I don’t use these common methods:
- checking reverse DNS records: it fails on real servers and would block many self-hosted servers
- using external RBLs: they are bad and block self-hosted mail
- DKIM: I don’t find enough value in it to learn how to configure it. I think it might be useful for more complex filtering that uses multiple factors to decide if a message is spam, and only if the provider can motivate administrators of other servers to configure extra things (Google can).
- checking message content: it’s complex, has false positives, causes an arms race, and needs access to the message’s plain text content (which prevents end-to-end security or requires delegating spam filtering to the client). If manual filtering of probable spam messages is needed, the method is at most as good as not doing any filtering.
I tried using ‘unsubscribe’ links in professional spam. They don’t work: they often fail (e.g. with page not found errors), are missing, or are email addresses (I don’t mail spammers). When they do work, they affect only some mails from the provider (only the mail that the link was from?): the provider still sends other mails. The IP address blacklist is more effective. I haven’t tried contacting the server providers of spam businesses that use VPSes or dedicated servers with terms of service prohibiting spam. I don’t know if they have a saner definition of spam than the law.
I would like it if all spammers moved to sending only OpenPGP-encrypted mails (they can easily get my public key from a public keyserver or from my Web site): it wouldn’t affect my spam filtering and it would increase their resource usage.
This week I received 11 spam messages (not counting ones from mailing lists): 5 are in English, probably from zombies, and 6 are from real Polish businesses with IP addresses that I haven’t blacklisted yet. I don’t count how many were blocked. I consider this good enough to not research better spam filtering methods now.
I don’t offer a solution to the problem of spam: it’s difficult and has economic, legal, technical and educational aspects. What I use is sufficient for my needs and has no problems with securing message texts. I do not know how spam filtering would work if all users moved to their own servers. Maybe some post-email protocols with proof-of-work schemes would solve these issues while not supporting sending emails from phones to Google servers.