DNS broke on my mailserver recently. This meant that the system status checks reported that “Nameserver glue records are incorrect.” and “The nameservers set on this domain are incorrect. They are currently [Not Set].”
When I looked in /var/log/syslog after running dig @localhost google.com as a test, I found this:
Oct 1 00:21:30 ubuntu named[882]: validating @0x72f00468: com DS: bad cache hit (./DNSKEY)
Oct 1 00:21:30 ubuntu named[882]: error (broken trust chain) resolving 'google.com/A/IN': 216.239.32.10#53
Oct 1 00:21:32 ubuntu named[882]: validating @0x72e2c770: com DS: bad cache hit (./DNSKEY)
Oct 1 00:21:32 ubuntu named[882]: error (broken trust chain) resolving 'google.com/A/IN': 216.239.34.10#53
Strangely, the IPs listed are valid google.com IPs, so the DNS lookup is sort of succeeding, but named doesn’t want to allow the answers to be returned as valid.
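To see how often the validator is rejecting answers, lines like the ones above can be filtered out of syslog. This is only a minimal sketch: the helper name dnssec_errors is made up for illustration, and it assumes the log is at /var/log/syslog as in this post.

```shell
#!/bin/sh
# dnssec_errors is a hypothetical helper name, not part of BIND.
# Print named's DNSSEC validation failures from a log file.
# Defaults to /var/log/syslog, the path used in this post.
dnssec_errors() {
    grep -E 'named\[[0-9]+\]: .*(broken trust chain|bad cache hit)' "${1:-/var/log/syslog}"
}
```

Running dnssec_errors | tail right after a test query with dig shows whether the failures are still happening.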
I fixed the problem by resetting the clock like this, based on http://www.thedumbterminal.co.uk/posts/2015/03/correcting_bind_errors_due_to_an_out_of_sync_clock.html
Stop time and DNS daemons:
/etc/init.d/ntp stop
/etc/init.d/bind9 stop
Find the address of a public NTP server:
nslookup pool.ntp.org 8.8.8.8
Set the time correctly:
ntpdate 209.114.111.1
Restart DNS and time daemons:
/etc/init.d/bind9 start
/etc/init.d/ntp start
After this, DNS started working again and I started receiving mail again too.
I am still not sure exactly how my system time went wrong, but I hope this is useful to someone . . .
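For anyone repeating the steps above, it may help to record the clock before and after, since that is the evidence needed to tell whether time was really the problem. A rough sketch, assuming the same init scripts and ntpdate as above; resync_clock and clock_delta are names invented here, and the server address has to be supplied by whoever runs it:

```shell
#!/bin/sh
# Hypothetical wrapper around the steps in this post; run as root.

# Print the difference in whole seconds between two epoch timestamps.
clock_delta() {
    echo $(( $2 - $1 ))
}

# $1 is the NTP server address (an assumption: use one returned by nslookup).
resync_clock() {
    ntp_server="$1"
    before=$(date +%s)          # system clock before the reset
    /etc/init.d/ntp stop
    /etc/init.d/bind9 stop
    ntpdate "$ntp_server"
    after=$(date +%s)           # system clock after the reset
    /etc/init.d/bind9 start
    /etc/init.d/ntp start
    # The change includes the few seconds the commands themselves took.
    echo "clock before: $before, after: $after, change: $(clock_delta "$before" "$after")s"
}
```

Called as resync_clock 209.114.111.1, a change of only a few seconds would suggest the clock was not badly wrong to begin with.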
Wow. I would have loved to know what the system clock was before and after these steps.
Thanks for posting this.
Here are the relevant excerpts from my console as I was fixing it.
root@ubuntu:~# ntpdate 193.188.204.101
1 Oct 00:30:20 ntpdate[15236]: adjust time server 193.188.204.101 offset 0.127124 sec
root@ubuntu:~# date
Thu Oct 1 00:30:30 UTC 2015
root@ubuntu:~# ntpdate 193.188.204.101
1 Oct 00:30:45 ntpdate[15373]: adjust time server 193.188.204.101 offset 0.118377 sec
root@ubuntu:~# ntpdate 193.188.204.101
1 Oct 00:30:53 ntpdate[15379]: adjust time server 193.188.204.101 offset 0.113422 sec
root@ubuntu:~# ntpdate 209.114.111.1
1 Oct 00:31:20 ntpdate[15388]: adjust time server 209.114.111.1 offset 0.095278 sec
Looking at this now, it’s hard to believe that an error of less than a second could make named choke, but maybe I’m wrong about that.
An alternative explanation could be that restarting ntp and bind9 fixed something, but that seems even less likely, given that named was at least alive and listening before, and all ntp could have fixed was the time.
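The size of the corrections can be pulled out of the ntpdate output to check this. DNSSEC signatures (RRSIG records) carry inception and expiration timestamps, so a clock that is off by a fraction of a second should only matter right at the edge of a validity window. The sketch below assumes the "offset N sec" output format shown in the console excerpt; largest_offset is a hypothetical helper name.

```shell
#!/bin/sh
# largest_offset is a hypothetical helper: it reads ntpdate output on stdin
# and prints the largest absolute offset, in seconds.
largest_offset() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "offset") {
                v = $(i + 1) + 0        # the number following the word "offset"
                if (v < 0) v = -v       # compare absolute values
                if (v > max) max = v
            }
    } END { printf "%.6f\n", max }'
}
```

Piping the saved console output through largest_offset gives a single number to compare against whatever clock error seems plausible as a cause.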
Update a few weeks later: I think the problem has nothing to do with NTP or time.
The same symptoms recurred a couple of days ago and just restarting bind9 fixed it. I’m not sure what the root cause is, but it appears that time has nothing to do with it.
Update 1.5 years later: the same problem recurred again.
I’m now on MIAB 0.21c. DNS failed on a Saturday afternoon. When I logged in, I discovered the system time had been reset to what I think was probably Unix epoch 0 (it was definitely 1970; I didn’t write down the exact time).
Restarting bind9 solved the problem as before. Before restarting bind9, I was able to query the daemon, but it would respond with empty records (maybe this was actually named responding?). Oddly, rebooting the system did not solve the problem.
Just one other update: it appears that I have to actually reboot and subsequently restart bind9 as well.
Still not sure what’s going wrong.
Also, restarting bind9 and then rebooting (the reverse of the order above) fixes the system clock, but does not fix DNS. A second restart of bind9 fixes DNS.
There was a power glitch in the building around the same time DNS failed, so it might have something to do with the server not rebooting cleanly, or something like that. Maybe the system clock is set before the network connection comes up? Maybe DNS resolution is needed to set the system clock?
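One way to test the last guess: if ntp is configured with hostnames and the box resolves names through its own named, the clock cannot be corrected until DNS works, while DNSSEC validation may not work until the clock is roughly right. This is a guess, not a confirmed diagnosis. The sketch below lists the hostname entries in an ntp.conf-style file; ntp_hostnames is a hypothetical name and /etc/ntp.conf is assumed as the default path.

```shell
#!/bin/sh
# ntp_hostnames is a hypothetical helper: it prints the server/pool entries
# that are hostnames rather than IPv4 literals.
# Assumptions: the config is at /etc/ntp.conf unless a path is given, and
# IPv6 literals are not handled (they would be listed as if hostnames).
ntp_hostnames() {
    awk '($1 == "server" || $1 == "pool") &&
         $2 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $2 }' "${1:-/etc/ntp.conf}"
}
```

If this prints anything, ntp depends on name resolution at startup, which would fit the pattern of the clock staying wrong after an unclean reboot.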