Roundcube Timeout 504 (upstream timed out)

prostream · March 10, 2022, 11:58am

My hoster (netcup) offers a memory optimization in the admin-web-panel. After running this, I get a 504 when calling the roundcube webmailer:

/var/log/nginx/error.log says:

2022/03/10 12:32:59 [error] 7366#7366: *5981 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 2001:16b8:12cf:4200:c50c:bc6d:e142:432e, server: box…com, request: “POST /mail/?_task=login HTTP/2.0”, upstream: “fastcgi://unix:/var/run/php/php7.2-fpm.sock”, host: “box…com”, referrer: “https://box…com/mail/”

I already saw this old thread where @murgero points out that it could be due to an Apache instance. But I could already exclude that.

PHP-FPM looks also fine so far:

sudo systemctl status php7.2-fpm.service

● php7.2-fpm.service - The PHP 7.2 FastCGI Process Manager
   Loaded: loaded (/lib/systemd/system/php7.2-fpm.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2022-03-10 11:17:37 CET; 1h 30min ago
     Docs: man:php-fpm7.2(8)
 Main PID: 7892 (php-fpm7.2)
   Status: "Processes active: 10, idle: 12, Requests: 2066, slow: 0, Traffic: 0.1req/sec"
    Tasks: 23 (limit: 4915)
   CGroup: /system.slice/php7.2-fpm.service
           ├─ 7892 php-fpm: master process (/etc/php/7.2/fpm/php-fpm.conf)
           ├─ 7917 php-fpm: pool www
           ├─ 7918 php-fpm: pool www
           ├─ 7919 php-fpm: pool www
           ├─ 7922 php-fpm: pool www
           ├─ 7923 php-fpm: pool www
           ├─ 7924 php-fpm: pool www
           ├─ 7925 php-fpm: pool www
           ├─ 7926 php-fpm: pool www
           ├─ 7927 php-fpm: pool www
           ├─ 7928 php-fpm: pool www
           ├─ 7929 php-fpm: pool www
           ├─ 7930 php-fpm: pool www
           ├─ 9427 php-fpm: pool www
           ├─ 9429 php-fpm: pool www
           ├─ 9458 php-fpm: pool www
           ├─ 9460 php-fpm: pool www
           ├─10980 php-fpm: pool www
           ├─10981 php-fpm: pool www
           ├─11089 php-fpm: pool www
           ├─11092 php-fpm: pool www
           ├─12481 php-fpm: pool www
           └─12482 php-fpm: pool www

Mar 10 11:17:37 box.niewoehner.co systemd[1]: Starting The PHP 7.2 FastCGI Process Manager...
Mar 10 11:17:37 box.niewoehner.co systemd[1]: Started The PHP 7.2 FastCGI Process Manager.

Nothing in syslog…

MIAB Version: Version 56 (January 19, 2022)

I already tried upgrade of MIAB and reboot vserver… Nothing helped.

Any further ideas?

openletter · March 10, 2022, 1:11pm

Did you try disabling the “memory optimization”?

prostream · March 10, 2022, 2:59pm

No. Where to disable?

openletter · March 10, 2022, 3:01pm

The place you enabled?

prostream · March 10, 2022, 3:14pm

oh sorry, I must have expressed myself incorrectly. This is a storage-optimization feature that performs a defragmentation, so undoable… But i asked the hoster for help. Maybe they have another idea.

prostream · March 10, 2022, 4:17pm

That’s strange, it now suddenly works again. Without that I have done anything

CorneliusLentulus · March 10, 2022, 5:06pm

Same issue here without having a “memory optimization”:

[error] 1870#1870: *24 upstream timed out (110: Connection timed out) while reading response header from upstream, (…) request: “POST /mail/?_task=login HTTP/2.0”, upstream: “fastcgi://unix:/var/run/php/php7.2-fpm.sock”(…)"

prostream · March 10, 2022, 9:48pm

how many domains or mailboxes do you manage?

CorneliusLentulus · March 10, 2022, 10:12pm

@prostream only one atm, fresh install v56. admin page accessible without any problems. only roundcube is the issue so far.

qlx4r · March 11, 2022, 12:56pm

Just wanted to chime in that I have, since this morning, the same problem. When trying to access webmail/roundcube, I receive the timeouts described above.

A few observations:

The login page loads fine. It is only when trying to log in that I receive the timeout.
When using false credentials, I am being immediately returned to the login page, with an error. So the reason for the timeout seems to be AFTER successful auth
I massively increased the timeout. I copied the fastcgi_read_timeout line in the nginx config from the nextcloud section to the roundcube section. This increased the timeout from the default 60s to 630s. After this change, it was possible to log into roundcube, but the login process took much longer than 60s. Once logged in, all is fast and quick. (I know this is not a solution, because the nginx config will be regenerated. But it is a good data point).

So something is massively slowing down the login process. Unfortunately, neither the roundcube logs not the php-fpm logs nor the nginx logs are very helpful. I checked if there is a problem with DNS resolution, but there isn’t.

I will keep investigating and posting here.

prostream · March 11, 2022, 1:09pm

ok so it can´t have anything todo with the number of mailboxes or domains managed. I´m running about 60 Domains and more than 100 mailboxes.

@qlx4r very good observations, that’s exactly how it all played out for me as well. But i had not tried to increase the timeout.

qlx4r · March 11, 2022, 1:16pm

what i cannot explain: I have not touched the box in many days, yet the problem only manifested itself today. The logical conclusion should be that it is something external. That is why I checked the DNS resolution: If the login process queries a URL a couple of times, but now the DNS server is timing out, this would explain why suddenly everything takes order of magnitude longer. I found a related thread here in the forums where tweaking the /etc/hosts file was advised… it did not work in my case.

https://discourse.mailinabox.email/t/roundcube-now-giving-504-gateway-timeout-after-upgrade/8714/12

qlx4r · March 11, 2022, 3:50pm

I started debugging roundcube, and so far, it looks like the carddav-plugin is causing the timeout.

I went to /usr/local/lib/roundcubemail/ and started debugging the index.php. The long delay occurs when roundcube is calling the login_after-hook. Not many plugins register such a hook, so I checked all of them.

I disabled the carddav-Plugin by running a chmod 000 plugins/carddav (!!! write down the permissions beforehand so that you can restore them, ls -l plugins is your friend), and booom, login is fast again.

So: There seems to be a problem accessing the MIAB-Nextcloud instance. This seems to cause the timeout.

JoshData · March 11, 2022, 4:02pm

Nice work debugging it!

prostream · March 11, 2022, 4:06pm

yes, very good job! But does that explain why it only occurs sporadically?

qlx4r · March 11, 2022, 4:35pm

ok, here is the rub. I re-enabled the plugin. Logged into roundcube. Went to contacts. Tried to add a contact to the address book ‘owncloud (Contacts)’. This worked, but it took like a minute for the PUT call to work. In address book ‘owncloud (Recently contacted)’, adding an entry failed. But it also took a minute or so to get the Error Message. I checked in my nextcloud, there is indeed no Contact group for the second address book. But this doesn’t really matter: even successfully adding an entry to the first address book takes way too long.

Thus, when logging into roundcube, the plugin carddav probably queries all nextcloud address books, and that is why it takes so long.

I have disabled the plugin for now (let’s see how many users will hate me for that), and will continue to investigate. But the error must lie with nextcloud, not roundcube. Keep you posted!

And since I got the attention of @JoshData: Thank you so much for MIAB. I cannot even enumerate the reasons I am grateful for it, there are so many. It is a great project, for its intentions and for its technical implementation.

qlx4r · March 12, 2022, 12:21am

well, great. I just sat down to continue to debug the problem, but it has vanished. Plugin carddav enabled, login is fast, creating new contacts through roundcube in the owncloud addressbook works immediately… I have reached the dreaded stage of ‘cannot reproduce’. I believe this is the end of the road.

Well, it was fun, if it occurs again, I will be back here.

qlx4r · March 12, 2022, 12:32am

I just realised: it was a heisenbug!

CorneliusLentulus · March 12, 2022, 8:54am

@qlx4r Awesome! That did the trick and I can conform by disabling carddav-Plugin no timeout 504 are observed anymore.

system · March 19, 2022, 8:55am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.