I’ve experienced a loss of service at about the same time (11am-ish) for two days running.
The status page shows: Dovecot LMTP LDA is not running (port 10026). IMAPS (dovecot) is not running (port 993). Mail Filters (Sieve/dovecot) is not running (port 4190).
A restart resolved it on both occasions (yesterday I ran an upgrade/update before restarting, and today I noticed the status page was indicating that the box needed a reboot, so I actioned that). No further changes have been made.
Reviewing mail.log isn’t throwing up anything obvious. mail.err is showing the following, but I’m unsure whether the timing coincides:
Aug 20 11:45:18 box postfix/submission/smtpd[151387]: fatal: no SASL authentication mechanisms
Aug 20 11:46:56 box spampd[149402]: WARNING!! Error in process_request eval block: /usr/sbin/spampd: socket connect failure: Connection refused
Just wondering if others have experienced similar.
If anyone can help guide me on investigative routes I would appreciate it. If not, I’ll continue to monitor over the next few days.
This problem has occasionally plagued us and other MiaB users for a long time. I consider it the only unfixed major MiaB problem because it prevents MiaB from functioning when the problem manifests. I recently successfully diagnosed the root of the problem, and reported it as an Issue on GitHub: Too Many sa-learn Processes Make MiaB Server Unresponsive and Can Crash Mail Processes #2531. I described a possible remedy in general terms, but the needed modification of the MiaB code is beyond my skill level. I am hoping that someone with the requisite skills will try to fix this major problem.
Thanks @KiekerJan, it looks like the following coincided with the failure time and a spike in memory consumption. No further occurrences to date for me, but I will continue to monitor.
Aug 20 11:32:26 box.domainame.com systemd[1]: dovecot.service: A process of this unit has been killed by the OOM killer.
Aug 20 11:32:39 box.domainame.com systemd[1]: dovecot.service: Failed with result 'oom-kill'.
Aug 20 11:32:39 box.domainame.com systemd[1]: dovecot.service: Consumed 1h 20min 41.816s CPU time.
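For anyone wanting to check their own logs for the same signature, here is a minimal Python sketch that picks out systemd OOM-kill messages worded like the two quoted above. The function name find_oom_events and the regular expression are my own illustration, not part of MiaB, and the pattern only covers those two message wordings; other kernel or systemd OOM messages would need extra patterns.

```python
import re

# Matches only the two systemd message wordings quoted in this thread.
# Assumption: syslog-style lines of the form "Mon DD HH:MM:SS host systemd[N]: unit: message".
OOM_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+systemd\[\d+\]:\s+"
    r"(?P<unit>\S+\.service): "
    r"(?:A process of this unit has been killed by the OOM killer"
    r"|Failed with result 'oom-kill')"
)


def find_oom_events(lines):
    """Return (timestamp, unit) tuples for every line that reports an OOM kill."""
    events = []
    for line in lines:
        match = OOM_PATTERN.match(line)
        if match:
            events.append((match.group("timestamp"), match.group("unit")))
    return events
```

It accepts any iterable of lines, so it can be fed an open log file. Which file holds these messages depends on the system; on a typical Ubuntu box I would expect /var/log/syslog, but treat that as an assumption.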
Thanks @MSSEsq, good work on finding the underlying cause. Agreed, hopefully it can be resolved going forward.
No further occurrences, and it’s not too much of an issue for me as I can pick it up and resolve it pretty quickly, but if one has a greater (commercial) dependency I can see how it would be problematic.
Hi @darond - thinking about a work-around, how much swap space have you allocated? And how busy is your system - how many users & emails?
If you can’t/don’t want to allocate more memory, one way to make your problem less likely is to allocate more swap. Swap is slower than RAM, but it is disk (cheap) and easy to modify.
Changing swap will depend on how your system is set up. Your provider might be able to help, and there are plenty of pointers on the web.
You can see your actual memory details by running cat /proc/meminfo
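If the full /proc/meminfo output is too noisy, a small Python sketch like the one below pulls out just the RAM and swap figures. The helper names parse_meminfo and summarize are made up for illustration; the field names MemTotal, MemAvailable, SwapTotal and SwapFree are the standard ones the Linux kernel reports.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def summarize(values):
    """Format RAM and swap figures in MiB; fields that are missing show as 'n/a'."""
    def mib(key):
        return f"{values[key] / 1024:.0f} MiB" if key in values else "n/a"

    return (f"RAM total {mib('MemTotal')}, available {mib('MemAvailable')}; "
            f"swap total {mib('SwapTotal')}, free {mib('SwapFree')}")


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            print(summarize(parse_meminfo(handle.read())))
    except OSError as exc:
        print(f"could not read /proc/meminfo: {exc}")
```

A swap total of 0 MiB means no swap is configured at all, which would leave the OOM killer as the only outlet once RAM runs out.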
I have investigated further by allowing mail to be downloaded from an account that I knew would trigger the problem, then ran top to see the sa-learn processes accumulate toward eventually overwhelming available memory. I then successively ran ps -C sa-learn -o args to see more detail about what was happening. To my surprise, I found that messages were being processed as ham rather than as spam. The following is an example of the command along with what is returned:
Further investigation is needed to determine why many simultaneously running sa-learn processes, each referencing an individual mail message, accumulate until the system fails from exhaustion of all available RAM. Lines 670–671 of /usr/bin/sa-learn state: “Simply run this command once for each of your mail folders, and it will ‘‘learn’’ from the mail therein.” This seems to indicate that the commands with arguments listed above should end with /tmp rather than with the names of individual message files, and that only one sa-learn process should be running to learn from all the messages in the /tmp directory. Also, it needs to be verified whether those messages should be learned as ham rather than as spam.
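To make the accumulation easier to watch than by rerunning ps by hand, here is a rough Python sketch that takes one snapshot of the running sa-learn processes and totals their resident memory. It wraps the same ps -C sa-learn query used above, with pid and rss columns added and headers suppressed. The function names are hypothetical, and counting --ham and --spam in the arguments assumes the processes are started with those sa-learn options.

```python
import subprocess


def parse_ps_output(text):
    """Parse 'pid rss args' lines (no header) into a list of dicts."""
    procs = []
    for line in text.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit() or not fields[1].isdigit():
            continue  # skip blank or unexpected lines
        procs.append({"pid": int(fields[0]), "rss_kb": int(fields[1]), "args": fields[2]})
    return procs


def summarize(procs):
    """Return (process count, total RSS in kB, --ham count, --spam count)."""
    total_rss = sum(p["rss_kb"] for p in procs)
    ham = sum(1 for p in procs if "--ham" in p["args"].split())
    spam = sum(1 for p in procs if "--spam" in p["args"].split())
    return len(procs), total_rss, ham, spam


def snapshot():
    """Run ps once; returns an empty list if ps is unavailable or nothing matches."""
    try:
        result = subprocess.run(
            ["ps", "-C", "sa-learn", "-o", "pid=", "-o", "rss=", "-o", "args="],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return []
    return parse_ps_output(result.stdout)


if __name__ == "__main__":
    count, rss_kb, ham, spam = summarize(snapshot())
    print(f"{count} sa-learn processes, {rss_kb / 1024:.0f} MiB resident, "
          f"{ham} ham, {spam} spam")
```

Running it every few seconds while the problem account downloads mail should show whether the process count and memory climb together, and whether the messages are in fact all being learned as ham.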