Postfix, PHP and Dovecot shutdown overnight with postgrey.pid unlink error

Starting on 1/28/24, my MIAB has started shutting down PHP, Postfix and Dovecot at 3:03am EST and not restarting them. I can manually restart the services without an issue. Looking at /var/log/syslog, this appears to be related to the daily_tasks.sh cronjob running as it kicks off at 3:03am followed by all the services stopping. I’m assuming the intent is to restart the services, not shut them down.

The one potential clue I have is that when postgrey goes to close, there is a line in syslog that reads:

postgrey[969786]: Couldn’t unlink “/var/run/postgrey.pid” [Permission denied]

Postgrey does then continue to close and shows as stopped. However, all of the services are now stopped and they do not restart.

If I check /var/run/postgrey.pid, it looks perfectly normal and has the same ownership and permissions as another MIAB installation that I run that does not have this same issue, so it may be red herring, but it still shouldn’t complain about a permission issue. An ls of the file shows this:

-rw-r–r-- 1 postgrey postgrey 6 Jan 30 06:25 postgrey.pid

Has anyone else begun experiencing this issue? If not, any recommendations as to how I might solve this issue. Getting ‘down’ alerts at 3am two nights in a row gets old really fast. :slight_smile:

I include the relevant log entries below for reference.

Any help greatly appreciated.

===============

Jan 28 03:03:01 mail CRON[1247071]: (root) CMD ((cd /root/mailinabox && management/daily_tasks.sh))
Jan 28 03:03:05 mail systemd[1]: Stopping The PHP 8.0 FastCGI Process Manager…
Jan 28 03:03:05 mail systemd[1]: php8.0-fpm.service: Deactivated successfully.
Jan 28 03:03:05 mail systemd[1]: Stopped The PHP 8.0 FastCGI Process Manager.
Jan 28 03:03:05 mail systemd[1]: php8.0-fpm.service: Consumed 57.053s CPU time.
Jan 28 03:03:07 mail systemd[1]: postfix.service: Deactivated successfully.
Jan 28 03:03:07 mail systemd[1]: Stopped Postfix Mail Transport Agent.
Jan 28 03:03:07 mail systemd[1]: Stopping Postfix Mail Transport Agent (instance -)…
Jan 28 03:03:09 mail systemd[1]: postfix@-.service: Deactivated successfully.
Jan 28 03:03:09 mail systemd[1]: Stopped Postfix Mail Transport Agent (instance -).
Jan 28 03:03:09 mail systemd[1]: postfix@-.service: Consumed 2min 7.086s CPU time.
Jan 28 03:03:09 mail systemd[1]: Stopping Dovecot IMAP/POP3 email server…
Jan 28 03:03:11 mail systemd[1]: dovecot.service: Deactivated successfully.
Jan 28 03:03:11 mail systemd[1]: Stopped Dovecot IMAP/POP3 email server.
Jan 28 03:03:11 mail systemd[1]: dovecot.service: Consumed 3min 10.460s CPU time.
Jan 28 03:03:13 mail systemd[1]: Stopping LSB: Start/stop the postgrey daemon…
Jan 28 03:03:13 mail postgrey[1247120]: * Stopping postfix greylisting daemon postgrey
Jan 28 03:03:13 mail postgrey[969786]: 2024/01/28-03:03:13 Server closing!
Jan 28 03:03:13 mail postgrey[969786]: Couldn’t unlink “/var/run/postgrey.pid” [Permission denied]
Jan 28 03:03:13 mail postgrey[1247120]: …done.
Jan 28 03:03:13 mail systemd[1]: postgrey.service: Deactivated successfully.
Jan 28 03:03:13 mail systemd[1]: Stopped LSB: Start/stop the postgrey daemon.

Looked at my other MIAB install – it has the same postgrey.pid unlink error during the daily_tasks.sh process, so that may simply be unrelated. However, unlike the problematic server, on the ‘normal’ server, as soon as postgrey stops, it then starts up, followed by dovecot, postfix and php. This simply never happens on the problematic box.

I have done all of the updates prior to today and running updateMIAB as I write this returns, “The server is completely up-to-date”.

Figured this out…

I brought in a number of new accounts from another customer last week. Then, on 1/28, it was the time of the month when Duplicity does a full backup. The daily_tasks.sh shuts down the services to quiesce the server, runs the Duplicity backup and then starts up all the services again when the backup is done. However, if you have a relatively big or lower-power server, it can take 10 - 20 minutes to complete the Duplicity step, meaning that all mail services are down for that time. This never showed up before when the server had less data on it, but now it is running long enough to trip my monitoring.

I include this answer for anyone else who might experience the same issue and wonder about the situation. Best solution I can offer at the moment is to adjust any monitoring you may have to exclude the time when your daily_tasks.sh runs.