Mail-in-a-Box Inaccessible after Failure to Provision SSL Certificate

chaos215bar2 · December 21, 2017, 7:59pm

I have everything pretty much set up the way I want, but I’m running into a problem trying to provision Let’s Encrypt certificates. The very first time I clicked the “Provision” button on the “TLS (SSL) Certificates” page, I was presented with the Let’s Encrypt ToS and accepted them. After that, and every time I’ve tried the “Provision” button since, Mail-in-a-Box sits for about 30 seconds before throwing up an error stating “Something went wrong, sorry.”, after which any attempt to access the admin interface results in a “504 Gateway Time-out” from nginx. I have to reboot the server to restore access to the web admin.

Unfortunately, I see no errors whatsoever in /var/log/syslog when all of this is happening. I have not yet had a chance to delve into precisely what the “Provision” button is doing or figure out whether there are errors being logged elsewhere, but before I do that, does anyone have any advice?

The only unusual thing about my setup is that I had to disable IPv6 (by removing the configuration options under /etc/network/interfaces; the box does still have an automatically configured link local address, but that’s it) after installation as I discovered my ISP does not (currently) support IPv6 reverse DNS configuration. I removed the corresponding lines from /etc/mailinabox.conf, and everything does seem to be working in that regard, with the odd exception that MIAB is reporting services are not publicly accessible on my IPv4 address (which is wrong; they are accessible via the box’s public IP from both inside and outside my network).

chaos215bar2 · December 22, 2017, 3:03am

I’ve tried disabling IPv6 entirely (after re-running the configuration script and seeing the link-local address happily added as a “public” IPv6 address), but I still see the same behavior. At this point, plenty of time should have passed for DNS propagation since removing the IPv6 address, so I doubt the problem is Let’s Encrypt timing out trying to access the box via IPv6 (and then also not falling back to IPv4).

I do see the following in /var/log/nginx/error.log (IPs and domains changed, of course, though all are correct):
2017/12/21 18:56:17 [error] 9354#0: *33 upstream timed out (110: Connection timed out) while reading response header from upstream, client: X.X.X.X, server: mail.example.com, request: "POST /admin/ssl/provision HTTP/1.1", upstream: "http://127.0.0.1:10222/ssl/provision", host: "mail.example.com", referrer: "https://mail.example.com/admin/"

What would be running on port 10222? (Edit: I gather this is the mailinabox management daemon; still trying to figure out how to see what’s going wrong there.)
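
In case it helps anyone else debugging this, here is a minimal sketch for checking whether anything is accepting connections on that port. The helper name tcp_port_open is my own and not part of Mail-in-a-Box. A successful connect only shows that something is listening on 127.0.0.1:10222; it does not prove the daemon is still answering requests.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or unreachable host
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 10222 is the upstream port from the nginx error log above
    print("something listening on 127.0.0.1:10222:", tcp_port_open("127.0.0.1", 10222))
```

If this prints False right after a failed provision attempt, the daemon is down; if it prints True while nginx still returns 504, the daemon is most likely accepting connections but stuck on an earlier request.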

chaos215bar2 · December 22, 2017, 9:19am

So, the problem here is that MIAB is assuming that all traffic sent to the publicly facing IP will make it back to the box, even traffic sent from the box itself. This is… problematic when NAT is involved, and not really an ideal way to set things up anyway. (Split DNS is better, as it won’t unnecessarily route local traffic through the router.) I actually have split DNS configured properly on my network, but the setup script overwrites this, setting RESOLVCONF=yes in /etc/default/bind9.

Incidentally, this is also why the status checks claim no services are running. (Note: When public and private IPs don’t match, testing accessibility of services on the public IP isn’t going to be a reliable way to tell that those ports are accessible. Even assuming NAT is configured to allow that sort of translation, most likely there are not going to be any firewall rules blocking traffic from the local interface, while such rules will be present by default on the WAN interface.)

I tried setting RESOLVCONF=no in /etc/default/bind9, but it looks like this breaks other things, including, again, Let’s Encrypt. The admin page now helpfully states (IP hidden): “Domain control validation cannot be performed for this domain because DNS points the domain to another machine (A $PRIVATE_IP).” This, of course, is again not accurate. Let’s Encrypt will see the correct IP, but MIAB has decided that things are broken, so it won’t even try.
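
To illustrate the mismatch, here is a rough sketch of the kind of comparison involved. This is not MIAB’s actual check (I have not confirmed how that is implemented); the function name, hostname, and IP addresses are placeholders chosen for illustration.

```python
import socket
from typing import Callable


def resolves_to(hostname: str, expected_ip: str,
                resolver: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if hostname's IPv4 address, as seen from this machine, equals expected_ip."""
    try:
        return resolver(hostname) == expected_ip
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when the lookup fails
        return False


if __name__ == "__main__":
    # Placeholder values: substitute the box's hostname and its public IP.
    print(resolves_to("mail.example.com", "203.0.113.10"))
```

Behind split DNS, the local resolver returns the private address, so a comparison like this fails on the box itself even though outside resolvers (including Let’s Encrypt’s) see the public one.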

Ideas? I’m thinking my (not all that unusual) setup is just a little too different from what MIAB is designed for to be viable. I’ve also tried to get NAT configured to route traffic from my box back to itself via its public IP, but that doesn’t seem to be possible on pfSense. (Unfortunately, there doesn’t seem to be any way to debug where the traffic is being sent awry either.) Other hosts can access my box via its public IP, but not the box itself.

chaos215bar2 · December 22, 2017, 9:54am

Also, nsd won’t start with IPv6 disabled:
Dec 22 01:49:53 mail nsd[6951]: can't bind tcp socket: Cannot assign requested address
Dec 22 01:49:53 mail nsd[6951]: cannot open control interface ::1 8952
Dec 22 01:49:53 mail nsd[6951]: could not open remote control port
Dec 22 01:49:53 mail nsd[6951]: could not perform remote control setup
Dec 22 01:49:53 mail kernel: [ 796.206763] init: nsd main process (6951) terminated with status 1
Dec 22 01:49:53 mail kernel: [ 796.206796] init: nsd respawning too fast, stopped
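
For anyone wanting to confirm that the box really cannot bind the IPv6 loopback address nsd’s control interface is asking for, here is a small sketch, assuming Python 3 is available. The function name is made up for illustration, and the port defaults to 0 (let the OS pick a free one) so the test does not collide with nsd’s 8952.

```python
import socket


def can_bind_ipv6_loopback(port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to ::1 on this machine."""
    if not socket.has_ipv6:
        # Python itself was built without IPv6 support
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
            # Fails with OSError (e.g. "Cannot assign requested address")
            # when ::1 is not configured on the loopback interface.
            s.bind(("::1", port))
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("::1 bindable:", can_bind_ipv6_loopback())
```

A False result here is consistent with the “Cannot assign requested address” line in the log.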

murgero · December 23, 2017, 5:03pm

Please check and confirm that apache2 is not installed; this is a common problem with installs on version 25.

chaos215bar2 · December 23, 2017, 5:23pm

No.

I was able to resolve the NAT issue on the router (turns out pfSense NAT reflection doesn’t quite work with forwarding rules using hostnames rather than IP addresses). It’s unfortunate that the only thing actually going wrong here was the MIAB self-checks (as they aren’t robust to split DNS without NAT reflection), but everything is working now.
