Dovecot not working - [blocking on log write]

jstrauss

Verified User
Joined
Jun 1, 2023
Messages
17
Location
Austria
Hello,

I've been facing an issue with Dovecot that suddenly stopped working, resulting in a complete halt of mail traffic. My system runs on AlmaLinux 9 with the latest DirectAdmin version. Everything was functioning perfectly for approximately three months. However, after several proper server restarts, this error emerged. Upon restarting Dovecot, it operates normally for a few hours before encountering the same problem.

When checking the status of the Dovecot service using service dovecot status, the following information is displayed:

Code:
Redirecting to /bin/systemctl status dovecot.service
● dovecot.service - Dovecot IMAP/POP3 email server
     Loaded: loaded (/etc/systemd/system/dovecot.service; enabled; preset: disabled)
     Active: active (running) since Sun 2023-12-17 17:36:11 CET; 1h 21min ago
   Main PID: 468550 (dovecot)
      Tasks: 191 (limit: 125845)
     Memory: 181.4M
        CPU: 5.296s
     CGroup: /system.slice/dovecot.service
             ├─468550 /usr/sbin/dovecot -F
             ├─468552 "dovecot/managesieve-login "
             ├─468553 "dovecot/pop3-login "
             ├─468556 "dovecot/anvil [0 connections]"
             ├─468557 "dovecot/log "
             ├─468558 "dovecot/managesieve-login "
             ├─468559 "dovecot/managesieve-login "
             ├─468560 "dovecot/managesieve-login "
             ├─468562 "dovecot/lmtp "
             ├─468564 "dovecot/lmtp "
             ├─468566 "dovecot/lmtp "
             ├─468568 "dovecot/lmtp "
             ├─468571 "dovecot/pop3-login "
             ├─468572 "dovecot/pop3-login "
             ├─468573 "dovecot/pop3-login "
             ├─468574 "dovecot/pop3-login "
             ├─468575 "dovecot/pop3-login "
             ├─468576 "dovecot/pop3-login "
             ├─468577 "dovecot/pop3-login "
             ├─468579 "dovecot/pop3-login "
             ├─468580 "dovecot/pop3-login "
             ├─468581 "dovecot/config "
             ├─468583 "dovecot/lmtp "
             ├─468584 "dovecot/lmtp "
             ├─468585 "dovecot/lmtp "
             ├─468586 "dovecot/lmtp [local READY]"
             ├─468587 "dovecot/pop3-login "
             ├─468588 "dovecot/pop3-login "
             ├─468589 "dovecot/pop3-login "
             ├─468590 "dovecot/pop3-login "
             ├─468591 "dovecot/pop3-login "
             ├─468592 "dovecot/stats [186 connections]"
             ├─468594 "dovecot/auth [0 wait, 0 passdb, 0 userdb]"
             ├─469843 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─469844 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─469846 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─470425 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─470428 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─470429 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─470430 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─470433 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─488289 "dovecot/lmtp "
             ├─515511 "dovecot/lmtp "
             ├─520384 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─520396 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─520407 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─520412 "dovecot/imap [[email protected] 1.2.3.4 (deinit)] - [blocking on log write]"
             ├─525333 "dovecot/imap [auth lookup] - [blocking on log write]"
             ├─525335 "dovecot/imap [auth lookup] - [blocking on log write]"
             ├─525337 "dovecot/imap [auth lookup] - [blocking on log write]"
             ├─525338 "dovecot/imap [auth lookup] - [blocking on log write]"
             ├─525340 "dovecot/imap [auth lookup] - [blocking on log write]"

The concerning aspect I've noticed is the presence of "blocking on log write." Despite having adequate disk space, the issue persists.

The output from systemd-journald indicates:

Code:
Redirecting to /bin/systemctl status systemd-journald.service
● systemd-journald.service - Journal Service
     Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static)
     Active: active (running) since Sat 2023-12-16 12:42:15 CET; 1 day 6h ago
TriggeredBy: ● systemd-journald.socket
             ● systemd-journald-dev-log.socket
       Docs: man:systemd-journald.service(8)
             man:journald.conf(5)
   Main PID: 640 (systemd-journal)
     Status: "Processing requests..."
      Tasks: 2 (limit: 125845)
     Memory: 469.9M
        CPU: 20h 31min 25.520s
     CGroup: /system.slice/systemd-journald.service
             └─640 /usr/lib/systemd/systemd-journald

Dec 17 19:02:21 host.jstrauss.at systemd-journald[640]: [?] Suppressed 4346457 messages from named.service
Dec 17 19:02:51 host.jstrauss.at systemd-journald[640]: [?] Suppressed 5506517 messages from named.service
Dec 17 19:03:21 host.jstrauss.at systemd-journald[640]: [?] Suppressed 6458173 messages from named.service
Dec 17 19:03:21 host.jstrauss.at systemd-journald[640]: Data hash table of /run/log/journal/7c482cd411a0462981181be7775b95f0/system.journal has a fill level >
Dec 17 19:03:21 host.jstrauss.at systemd-journald[640]: /run/log/journal/7c482cd411a0462981181be7775b95f0/system.journal: Journal header limits reached or he>
Dec 17 19:03:51 host.jstrauss.at systemd-journald[640]: [?] Suppressed 6633903 messages from named.service
Dec 17 19:04:21 host.jstrauss.at systemd-journald[640]: [?] Suppressed 6712979 messages from named.service
Dec 17 19:04:51 host.jstrauss.at systemd-journald[640]: [?] Suppressed 6561285 messages from named.service
Dec 17 19:05:21 host.jstrauss.at systemd-journald[640]: [?] Suppressed 4436252 messages from named.service
Dec 17 19:05:51 host.jstrauss.at systemd-journald[640]: [?] Suppressed 4398026 messages from named.service

It appears that an excessive amount of logs is being written, causing Dovecot's log writing process to stall. Despite researching extensively, I couldn't find a similar issue documented online.

My assumption is that there might be an underlying system-level problem leading to this sudden increase in log generation. I'm uncertain how to resolve this without resorting to a complete OS reinstallation.

I would greatly appreciate any guidance or suggestions on resolving this issue. Thank you for your assistance.

Best regards,Jonathan
 
Hello.

Normally a blocking on log write by dovecot could point to too many users using it, so limits too low.
However, since there is also these messages from named, it might be something else, maybe a DDOS attack or something.

Suppressed 6712979 messages from named.service
Because this is very very much.

I would suggest checking your system log to see what or who is causing all these named messages.
Does you datacenter has DDOS protection?
 
Thanks for your fast answer.
I don't know if it's a DDOS Attack because everything else works fine and I don't notice some irregularity. The integrated CSF & LFD should block ddos attacks... but my datacenter has no dedicated DDOS protection.

I just checked my /var/log/messages file and there are hundreds of following entry:

Code:
Dec 17 20:23:21 host named[1793]: TCP connection failed: invalid file
Dec 17 20:23:21 host named[1793]: TCP connection failed: invalid file
Dec 17 20:23:21 host named[1793]: TCP connection failed: invalid file
Dec 17 20:23:21 host named[1793]: TCP connection failed: invalid file
Dec 17 20:23:21 host named[1793]: TCP connection failed: invalid file
Dec 17 20:23:21 host named[1793]: TCP connection failed: invalid file
Dec 17 20:23:21 host named[1793]: TCP connection failed: invalid file
Dec 17 20:23:21 host named[1793]: TCP connection failed: invalid file
Dec 17 20:23:21 host named[1793]: TCP connection failed: invalid file

So maybe it's something else?
 
1702844899627.png
1702844931424.png

Now after a full reboot of the server i got following messages that the Network Manager and Login Management are not starting. It is stuck here and then it says failed to start Network Manager and it boots up but without internet connection...
 
You lost me here, I'm not deep into that what that means.

But when I look at this in a very simple way it seems like a hardware issue.

a.) It has been running fine for months.
b.) everything is setup and configured perfectly
c.) suddenly Dovecot stopped working generating write issues
d.) named is also having issues, pointing to invalid files, maybe invalid because of non readable or non writable
e.) After a reboot, not even the network can be started.

So when reading this, to me this looks like a hardware issue, defective HDD, defective RAM and/or defective network card.
However, I can't point the finger to the exact cause, but this seems the most logic reason to me.
 
I don't think so it's a hardware error. Or maybe it just was a one time hardware error and downed everything.
Because my other 10 VPS on the same host aren't having any problems. And if i up the network interface manually, the network works again, but only limited. Also i get following "warnings" on and on. Even if it is "rotating" the log files. So something is exploding my log files and causing the error.

1703019048126.png

There must be a fix for this problem, because i reinstalled the server so often because of some similiar problems and i changed also between this issues cpu and memory of the server host, so a hardware error is kinda except.
 
Back
Top