Problems don't have two causes, unless they do

Hi folks,

The idea that the problem you’re facing could be caused by two random bad things happening at the same time is only amusing until it actually happens. Then it gets “interesting” with a capital F.

I run two MiaB servers behind a pfSense firewall, so when I scheduled time with my users to upgrade the firewall software on Friday night, I was on high alert for problems afterwards. Most of my systems seemed to run normally after the upgrade, but when I went in to check MiaB I saw multiple problems I was unfamiliar with. Fearing the worst, I went to check on the backup status, only to be told “Something is wrong with the backup.”

At that point the thread was still very young with no solutions on offer, and so started a night spent in hell. I managed to restore one of the servers (the one with little enough traffic to not worry too much) from backup to a point before the unattended updates had installed the broken version of duplicity. But the trouble didn’t end there. When I tried a test restore of the S3-hosted backups (to confirm it was an option for the other server), I discovered that DNS resolution was completely broken on the restored machine. In fact, it was broken on the other (untouched) MiaB server as well. Unless I manually overrode the nameserver in /etc/resolv.conf to point not to localhost but to the nameserver on the firewall (the one handed out in the DHCP settings), it just wouldn’t resolve any names. That meant the usual cure-all of re-running mailinabox when there’s a problem, which faithfully insists on setting /etc/resolv.conf so the nameserver is 127.0.0.1, actually ended up reliably breaking DNS resolution on the box, and as a result everything else failed as well.
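For anyone wanting to check whether their own box is in the same state, here is a minimal sketch of the test I was effectively doing by hand: see whether /etc/resolv.conf points only at localhost, and whether names resolve at all. The function names (list_nameservers, only_loopback, can_resolve) are made up for illustration and are not part of MiaB; getent is assumed to be available, as it is on a stock Ubuntu install.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of mailinabox.

# Print the nameserver addresses listed in a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Succeed only if the file lists at least one nameserver and every
# listed nameserver is a loopback address (127.x.x.x or ::1).
only_loopback() {
    servers=$(list_nameservers "$1")
    [ -n "$servers" ] || return 1
    for s in $servers; do
        case "$s" in
            127.*|::1) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# Succeed if the given name resolves through the system resolver.
can_resolve() {
    getent hosts "$1" >/dev/null 2>&1
}
```

Used together, something like `only_loopback /etc/resolv.conf && ! can_resolve example.com && echo "local resolver is not answering"` would flag the situation described above, where the box points at 127.0.0.1 but nothing resolves.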

It was a rare example of two independent events happening simultaneously yet having a compounding effect on each other. I couldn’t make or restore a duplicity backup of the emails without restoring an older machine backup first, and repairing MiaB in the usual manner returned it to a broken state every time.

The DNS issue went away when I reverted to the previous stable version of pfSense on the firewall. That allowed me to copy duplicity files from the quiet server to the busy server, which got it working well enough to make final backups of the emails themselves. After that I restored the server image from a machine backup, restored the emails from S3, put a hold on the duplicity update and ran mailinabox setup again to ensure all was as it should be.

I’ve since confirmed that the latest pfSense version, with a known good configuration restored onto it, breaks MiaB’s DNS setup immediately and permanently, but I have yet to identify (in the release notes, probably) what triggers the problem.

That makes this just a cautionary tale, saying:

  • It’s never two separate problems, until it is.
  • MiaB installations behind pfSense can get seriously impacted by upgrading to CE version 2.8.0.
  • Hold duplicity at known good versions (sudo apt-mark hold duplicity) despite installer errors.
  • Lobby to update the installer to manage duplicity updates differently.
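For the duplicity hold, a rough sketch of what that looks like in practice. The apt-mark subcommands are standard; the is_held helper is a made-up name for illustration that just checks a package against a list of held packages.

```shell
#!/bin/sh
# Commands to run by hand (left commented so nothing here changes
# package state on its own):
#   sudo apt-mark hold duplicity      # stop apt from upgrading it
#   apt-mark showhold                 # list packages currently on hold
#   sudo apt-mark unhold duplicity    # release once a fixed version is out

# Hypothetical helper: succeed if package $1 appears as a whole line
# in the newline-separated list of held packages passed as $2.
is_held() {
    printf '%s\n' "$2" | grep -qxF "$1"
}
```

For example, `is_held duplicity "$(apt-mark showhold)" || echo "duplicity is NOT on hold"` is a quick sanity check to run before letting unattended upgrades loose again.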

I accept that it was sheer bad luck that my servers and I got impacted by two critical issues at the same time which consumed my entire weekend. Yet both issues are real and would have required some intervention, just hopefully not as much.

Thanks for listening, and be safe out there.