some emails wait in queue for a long time before being sent through

jvdwilk · Oct 25, 2020

I am using a fresh server setup, running CloudLinux 8 and Exim 4.94.

All software is up to date and the server runs only a few test domains. It is due to become a new production server.

Now, when sending emails, usually things just work as expected. But then sometimes they don't.
This happens especially when the system is 'cold', having idled for a day or two (Monday morning, say): the first email I send from [email protected] on the new server to [email protected] on another DA server sits in the queue for up to an hour before it is processed. Once it is processed, it goes straight through. The logs show that the message was accepted, but nothing after that; it just waits.

If I then send another message from the same [email protected] on the new server to the same [email protected] on the other DA server, with only a different subject and message body (both containing TEST 5544 or something similar), it usually passes straight through to the receiving server, while the first, almost identical message still sits in the queue until it gets processed.

The only thing I see that might give an indication is when I look at the running processes:

Code:

# ps waux | grep exim
mail     1231327  0.0  0.0  95088  9144 ?        Ss   16:31   0:00 /usr/sbin/exim -bd -q1h

I think that -q1h at the end indicates a queue would be processed only once per hour maybe?

But where this is coming from, where I could change this, I have no idea.
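A minimal sketch for reading the queue interval out of a daemon command line such as the one in the ps output above. The function name queue_interval is made up for illustration, and it assumes the time follows -q directly, with no flag letters in between.

```shell
#!/bin/sh
# Print the -q<time> value found in an Exim daemon command line.
# Assumes the form "-q1h" (no qflags letters between -q and the time).
queue_interval() {
    printf '%s\n' "$1" | sed -n 's/.*[[:space:]]-q\([0-9][0-9a-z]*\).*/\1/p'
}

# Command line taken from the ps output above.
queue_interval '/usr/sbin/exim -bd -q1h'   # prints 1h

# On a systemd host, `systemctl cat exim` prints the unit file that defines
# this command line (the unit name "exim" is an assumption here).
```

If nothing is printed, the daemon was started without a timed -q and something else (cron, for example) would have to start the queue runners.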

Would anyone be able to give some ideas on this?
Either how to troubleshoot (regular logs give no indication why the mail sits in the queue)
or maybe even a solution??

Thanks,
Jorge.

factor · Oct 25, 2020

From

Code:

man exim

-bd This option runs Exim as a daemon, awaiting incoming SMTP connections. Usually the -bd option is combined with the -q<time> option, to specify that the daemon should also initiate periodic queue runs.

The -bd option can be used only by an admin user. If either of the -d (debugging) or -v (verifying) options are set, the daemon does not disconnect from the controlling terminal. When running this way, it can be stopped by pressing ctrl-C.

By default, Exim listens for incoming connections to the standard SMTP port on all the host's running interfaces. However, it is possible to listen on other ports, on multiple ports, and only on specific interfaces.

When a listening daemon is started without the use of -oX (that is, without overriding the normal configuration), it writes its process id to a file called exim-daemon.pid in Exim's spool directory. This location can be overridden by setting PID_FILE_PATH in Local/Makefile. The file is written while Exim is still running as root.

When -oX is used on the command line to start a listening daemon, the process id is not written to the normal pid file path. However, -oP can be used to specify a path on the command line if a pid file is required.

The SIGHUP signal can be used to cause the daemon to re-execute itself. This should be done whenever Exim's configuration file, or any file that is incorporated into it by means of the .include facility, is changed, and also whenever a new version of Exim is installed. It is not necessary to do this when other files that are referenced from the configuration (for example, alias files) are changed, because these are reread each time they are used.

-q This option is normally restricted to admin users. However, there is a configuration option called prod_requires_admin which can be set false to relax this restriction (and also the same requirement for the -M, -R, and -S options).

If other commandline options do not specify an action, the -q option starts one queue runner process. This scans the queue of waiting messages, and runs a delivery process for each one in turn. It waits for each delivery process to finish before starting the next one. A delivery process may not actually do any deliveries if the retry times for the addresses have not been reached. Use -qf (see below) if you want to override this.

If the delivery process spawns other processes to deliver other messages down passed SMTP connections, the queue runner waits for these to finish before proceeding.

When all the queued messages have been considered, the original queue runner process terminates. In other words, a single pass is made over the waiting mail, one message at a time. Use -q with a time (see below) if you want this to be repeated periodically.

Exim processes the waiting messages in an unpredictable order. It isn't very random, but it is likely to be different each time, which is all that matters. If one particular message screws up a remote MTA, other messages to the same MTA have a chance of getting through if they get tried first.

It is possible to cause the messages to be processed in lexical message id order, which is essentially the order in which they arrived, by setting the queue_run_in_order option, but this is not recommended for normal use.

-q<qflags>
The -q option may be followed by one or more flag letters that change its behaviour. They are all optional, but if more than one is present, they must appear in the correct order. Each flag is described in a separate item below.

-q<qflags><time>
When a time value is present, the -q option causes Exim to run as a daemon, starting a queue runner process at intervals specified by the given time value. This form of the -q option is commonly combined with the -bd option, in which case a single daemon process handles both functions. A common way of starting up a combined daemon at system boot time is to use a command such as

/usr/exim/bin/exim -bd -q30m

Such a daemon listens for incoming SMTP calls, and also starts a queue runner process every 30 minutes.

When a daemon is started by -q with a time value, but without -bd, no pid file is written unless one is explicitly requested by the -oP option.

jvdwilk · Oct 26, 2020

Yes, thank you for pointing that out. I had kind of read through the exim man pages, but you focussed in exactly on the important parts.

I have found where the -q1h came from. It's the /etc/systemd/system/exim.service unit file, which starts the exim service.
Changing it should be easy, but I fail to understand WHY this value of 1 hour has been chosen. It seems so crazy high that it makes me wonder whether I am missing something vital here..?

It would be nice if DirectAdmin staff could say why this value should be 1 hour, and not say, 5 minutes..?

Bash:

# exim binary startup for DirectAdmin servers
# To reload systemd daemon after changes to this file:
# systemctl --system daemon-reload

[Unit]
Description=Exim Mail Transport Agent
After=network.target
Conflicts=sendmail.service postfix.service

[Service]
PrivateTmp=true
Environment=QUEUE=1h
ExecStart=/usr/sbin/exim -bd -q${QUEUE}
ExecReload=/bin/sh -c 'kill -HUP ${MAINPID}'

[Install]
WantedBy=multi-user.target
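One possible way to change the interval without editing the unit file itself is a systemd drop-in that sets QUEUE again, since a later Environment= assignment of the same variable overrides the earlier one. This is only a sketch: the function name write_queue_override and the file name queue.conf are made up for illustration, and the 5m value is just an example.

```shell
#!/bin/sh
# Write a systemd drop-in that overrides the QUEUE variable of exim.service.
# $1 = drop-in directory, $2 = queue interval (for example 5m).
# The file name queue.conf is a hypothetical choice; any *.conf name works.
write_queue_override() {
    dropin_dir=$1
    interval=$2
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nEnvironment=QUEUE=%s\n' "$interval" > "$dropin_dir/queue.conf"
}

# Intended use on the server, as root (not executed by this sketch):
#   write_queue_override /etc/systemd/system/exim.service.d 5m
#   systemctl daemon-reload
#   systemctl restart exim
```

Because the drop-in lives in its own directory, it stays in place even if the unit file is replaced later.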

factor · Oct 26, 2020

/etc/init.d/exim queue runner

I'm having trouble getting my exim daemon to run according to the init.d script settings. The main problem is with the QUEUE variable. I've set it 5m to have it spawn another queue runner as needed...

serverfault.com

3. How Exim receives and delivers mail

Exim is a message transfer agent (MTA) developed at the University of Cambridge for use on Unix systems connected to the Internet.

www.exim.org

14. Retry mechanism

Exim’s mechanism for retrying messages that fail to get delivered at the first attempt is the queue runner process. You must either run an Exim daemon that uses the -q option with a time interval to start queue runners at regular intervals, or use some other means (such as cron) to start them. If you do not arrange for queue runners to be run, messages that fail temporarily at the first attempt will remain on your queue for ever. A queue runner process works its way through the queue, one message at a time, trying each delivery that has passed its retry time. You can run several queue runners at once.

Exim uses a set of configured rules to determine when next to retry the failing address (see chapter 32). These rules also specify when Exim should give up trying to deliver to the address, at which point it generates a bounce message. If no retry rules are set for a particular host, address, and error combination, no retries are attempted, and temporary errors are treated as permanent.

Richard G · Oct 26, 2020

I'm not a native English speaker, as you know, and this is getting a bit technical for me, but my curiosity also plays a part.
Just a question. We all have this same setting on our servers. If we send mail, it is sent immediately.

What exactly can trigger Exim to wait an hour before sending a mail, as jvdwilk experiences? Is there an easy-to-understand explanation for this?

factor · Oct 26, 2020

The way I interpret it is this.
When Exim accepts a message, it tries to deliver it immediately. If that works, it is done. If it won't go because the other server is down or offline, or the mailbox is full, the message stays in the queue and the queue runner, which starts every hour, tries to send it again. It's the normal delivery process. We all assume email just goes... but it doesn't always. If the message is still undelivered after all its retries, Exim gives up on delivering it.

LawsHosting · Oct 26, 2020

A reason will be listed at the bottom of the queue view if you open a delayed message... Probably a deferral for some reason.

jvdwilk · Oct 27, 2020

What Richard is asking is exactly what I was wondering about too. What is the reason these mails are not sent out immediately?

@Peter Laws, there is no reason listed with the messages I am talking about. These messages are accepted by Exim for delivery, and then nothing. No defer, no error, no delivery attempts, nothing at all. They just sit in the queue and wait for a queue runner to re-process the queue, at which time they usually deliver right away. That is also why -q1h is a problem: customers will ask why their messages take an hour to be delivered.

I have now changed the -q1h to -q5m so that a new queue runner is started every 5 minutes, which is a delay most customers will not really notice.

But the question 'Why' remains. What is the root cause of these messages being queued but not processed?
How to find out, when Exim doesn't log anything about this..?
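One thing that could be checked, as a sketch only: Exim has several main configuration options that make it accept a message without starting an immediate delivery. The option names below are from the Exim specification; whether any of them is set in this server's configuration is not known from this thread. The function name show_queueing_options and the EXIM_BIN variable are made up for illustration.

```shell
#!/bin/sh
# Print the main-config options that can make Exim queue a message on
# arrival instead of delivering it straight away.
# EXIM_BIN (hypothetical variable) can point at another exim binary.
show_queueing_options() {
    "${EXIM_BIN:-exim}" -bP queue_only queue_only_load queue_only_file \
        queue_domains queue_smtp_domains \
        smtp_accept_queue smtp_accept_queue_per_connection
}

# Run as root on the mail server:
#   show_queueing_options
```

A load-average threshold (queue_only_load) or a per-connection message limit (smtp_accept_queue_per_connection) would be the first values to compare against what the server was doing when a message got stuck.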

@bdacus01 - if Exim cannot send a mail on the first try, it is supposed to write an error, a defer, or one of many other things to the message headers file (the top block of a queued message) and/or to the message log. These waiting messages get nothing added to them. They just sit and wait in the queue.

Here is one example of a message that just sits and waits in the queue for a queue re-run:

View 1kXODa-006o5N-Jo

E-mail Headers

Code:

1kXODa-006o5N-Jo-H
mail 8 12
<[email protected]>
1603802210 0
-received_time_usec .614435
--helo_name [10.1.1.160]
-host_address 5.255.96.111.59288
--host_name tun-nld2.npnservers.net
-host_auth plain
-interface_address 54.36.89.14.587
-active_hostname madi.npnservers.net
-received_protocol esmtpsa
-aclc _esf_skip 1
0
-aclc _accept_recipient_if_whitelisted 1
1
-aclc _spam_assassin_has_run 1
0
-aclc _rspamd 1
1
-aclm _bc_skip 1
0
--aclm _user 23
[email protected]
-body_linecount 3
-max_received_linelength 67
--auth_id [email protected]
-tls_cipher TLS1.3:TLS_AES_128_GCM_SHA256:128
--tls_sni smtp.imglocker.net
-tls_ourcert -----BEGIN CERTIFICATE-----\nMIIE6DCCAt/UhaOlmv62wB/2ogxBGSqGSIb3DQEBCwUA\nMEoxCTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0MSzAJBgNVBAYTAlVMwIQYDVQQD\nExpMZXQncyBFbmNyeXB0IEF1dGhvcml0eSBYMzAeFw0yMDA5MTYxODMyMjBaFw0y\nMDEyMTUxODMyMjBaMBgxFjAUBgNVBAMTDWltZ2xvY2tlci5uZXQwWTATBgcqhkjO\nPQIBBggqhkjOPQMBBwNCAAQtTa54MvS3nA26/ylXjFP5pmphCG\nZScYcCKGJQ6DsaJB+l5Vm45NqVV6rr5MEGa1p1oI3+XSFRdB0GA1UdJQQWMBQGCCsGAQUFBwMBBggrBgEFBQcDAjAMBgNVHRMB\nAf8YX1VnZ43kBBo4ICwzCCAr8wDgYDVR0P\nAQH/BAQDAgeAMEAjAAMB0GA1UdDgQWBBSm7bPGgLxG9uOI/tyvhEQ26FBaCzAfBgNVHSMEGDAW\ngBSoSmpjBH3duubRObemRWXv86jsoTBvBggrBgEFBQcBAQRjMGEwLgYIKwYBBQUH\nMAGGImh0dHA6Ly9vY3NwLmludC14My5sZXRzZW5jcnlwdC5vcmcwLwYIKwYBBQUH\nMAKGI2h0dHA6Ly9jZXJ0LmludC14My5sZXRzZW5jcnlwdC5vcmcvMHkGA1UdEQRy\nMHCCEWZ0cC5pbWdsb2NrZXIubmV0gg1pbWdsb2NrZXIubmV0ghJtYWlsLmltZ2xv\nY2tlci5uZXSCEXBvcC5pbWdXRwLmltZ2xvY2tlci5uZXSC\nEXd3dy5pbsb2NrZXIubmV0ghJzbWdsb2NrZXIubmV0MEwGA1UdIARFMEMwCAYGZ4EMAQIBMDcGCysGAQQB\ngt8TAQEBMCgwJgYIKwYBBQUHAgEWGmh0dHA6Ly9jcHMubGV0c2VuY3J5cHQub3Jn\nMIIBBAYKKwYBBAHWeQIEAgSB9QSB8gDwAHYA5xLysDd+GmL7jskMYYTx6ns3y1Yd\nESZb8+DzS/JBVG4AAAFAiBuMqhUrxMQm\nJFCvF198R2vWi+umgIhALU2joKWcROu3uZYC6XOELNL8zieV5cUC5GzqNVmwgROn9rIyw8bbu0gMucS\nAHYAB7dcG+V9aP/xsMYdIxXHuuZXfFeUt2ruvGE6GmnTohwAAAF0mGdMzwAABAMA\nRzBFAiAPL5e2KuCHwFZyYOLIk5k87CQryT5nvuN07ZiJCxIwpwIhAIw70kT6M+mJ\nzNkRq9RXP/F1AYUt1/npYx7jPbHFdFNBMA0GCSqGSIb3DQEBCwUAA4IBAQBPcGPE\n61gbzCQ0G6jlOLIx9SBzw+NVGMA9CgAwIBAgISA8Z0GCEh76eS17VgI8b3GG+WYaBkBP8yXDBytFX+1Q2h1G\n/5R+pmxk2QLi3QOyzhnpVg6hXzh0k8Pl6+TtSI4yNu+VMd5yzCD9Kjf2ndiodcEF\nf4UURV8QCbOc5b4JpXPAtHlkYN4CEw6HwJ5V+afn05v5znBCe9MgUth5Ac5ZafFp\nUwd4eYOgaejbCgz7qFLzsLl4m4I/x4b8OYPxrm18mAGLEa04fFi4bL2lo86NJXG8\nv3A+Idi1iJvbzyhNee5SIRV2E72DJot0FmmAeQW1gDdQWj1n4zlwAd8WVtI7V+Ay\nrzrxikBgCtOsjGiv\n-----END CERTIFICATE-----\n
-tls_ver TLS1.3
XX
1
[email protected]

281P Received: from tun-nld2.npnservers.net ([5.255.96.111] helo=[10.1.1.160])
    by madi.npnservers.net with esmtpsa  (TLS1.3) tls TLS_AES_128_GCM_SHA256
    (Exim 4.94)
    (envelope-from <[email protected]>)
    id 1kXODa-006o5N-Jo
    for [email protected]; Tue, 27 Oct 2020 12:36:50 +0000
022T To: [email protected]
048F From: Test Check Madi <[email protected]>
019  Subject: test 5744
065I Message-ID: <[email protected]>
038  Date: Tue, 27 Oct 2020 18:21:47 +0545
089  User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.10.0
018  MIME-Version: 1.0
040  Content-Type: text/plain; charset=utf-8
032  Content-Transfer-Encoding: 7bit
024  Content-Language: en-US

E-mail Body Chunk

Code:

1kXODa-006o5N-Jo-D
test 5744

Log

Code:

2020-10-27 12:36:50 Received from [email protected] H=tun-nld2.npnservers.net ([10.1.1.160]) [5.255.96.111] P=esmtpsa X=TLS1.3:TLS_AES_128_GCM_SHA256:128 CV=no A=plain:[email protected] S=689 [email protected] T="test 5744"

This message has now been sitting in the queue for 15 minutes (yes, I changed back to -q1h for testing), and as soon as a queue runner comes by, or as soon as I hit Retry, it will be delivered. And all that time, no indication of what failed in the first place.
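For anyone wanting to do the same from the command line instead of the panel, a rough sketch: -Mvl shows the per-message log, and -M forces a delivery attempt (with -v the progress is written to the terminal). The function name inspect_and_deliver and the EXIM_BIN variable are made up for illustration.

```shell
#!/bin/sh
# Show what Exim has logged for one queued message, then force a delivery
# attempt with verbose output.
# $1 = message ID. EXIM_BIN (hypothetical variable) can point at another binary.
inspect_and_deliver() {
    msg_id=$1
    "${EXIM_BIN:-exim}" -Mvl "$msg_id"    # per-message log
    "${EXIM_BIN:-exim}" -v -M "$msg_id"   # force a delivery attempt, verbose
}

# Run as root on the mail server, with the ID shown in the queue:
#   inspect_and_deliver 1kXODa-006o5N-Jo
```

Note that forcing delivery removes the message from the queue if it succeeds, so it only shows how the delivery goes now, not why it was not attempted on arrival.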

@DirectAdmin Support
Maybe DA Support can give some insight into this?

factor · Oct 27, 2020

I have the same setting. My emails send immediately.

2020-10-27 08:50:41 1kXPN3-005DSD-1Y <= [email protected] H=apollo.myserver.com [myip] P=esmtpsa X=TLS1.2:ECDHE-RSA-AES128-GCM-SHA256:128 CV=no A=login:[email protected] S=2605 id=[email protected] T="test email" from <[email protected]> for [email protected]
2020-10-27 08:50:42 1kXPN3-005DSD-1Y => [email protected] F=<[email protected]> R=lookuphost T=remote_smtp S=3580 H=aspmx.l.google.com [64.233.165.26] X=TLS1.3:TLS_AES_256_GCM_SHA384:256 CV=no C="250 2.0.0 OK 1603806642 f17si677340ljj.512 - gsmtp"
2020-10-27 08:50:42 1kXPN3-005DSD-1Y Completed

It took 1 second. I would log a support ticket.
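To spot the stuck ones among many normal deliveries like the one above, a small sketch that reads a main log and lists message IDs that have an arrival line ("<=") but no "Completed" line. It assumes the default log line layout shown above (date, time, ID, then the flag); extra log_selector fields such as a pid would shift the columns. The function name pending_ids is made up, and the log path in the usage comment is an assumption.

```shell
#!/bin/sh
# List message IDs that were received ("<=") but never logged "Completed".
# Reads an Exim main log on stdin; assumes fields: date time id flag ...
pending_ids() {
    awk '
        $4 == "<="        { seen[$3] = 1 }
        $4 == "Completed" { fin[$3] = 1 }
        END { for (id in seen) if (!(id in fin)) print id }
    ' | sort
}

# Usage on the server (log path is an assumption):
#   pending_ids < /var/log/exim/mainlog
```

The timestamp of the "<=" line for each listed ID then shows how long that message has been waiting.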

factor · Oct 27, 2020

Maybe look at some of the DNS issues. Not sure if these are your real domains.

intoDNS: imglocker.net - check DNS server and mail server health

intoDNS: Checking health and configuration of DNS server and mail server for domain imglocker.net.

intodns.com

Free Tools - Software Reviews, Opinions, and Tips - DNSstuff

Explore DNSstuff's suite of free tools. Get comprehensive information and utilities to analyze, troubleshoot, and optimize your performance.

tools.dnsstuff.com

intoDNS: npnservers.net - check DNS server and mail server health

intoDNS: Checking health and configuration of DNS server and mail server for domain npnservers.net.

intodns.com

You might install

ConfigServer Mail Queues - cmq v3.02

ConfigServer Mail Queues (cmq) – ConfigServer Services

www.configserver.com

it's great for seeing what is going on in the queue

jvdwilk · Oct 27, 2020

@bdacus01 - _most_ of my emails also send fine. Only some of them sit in the queue for a longer time, and those are the only emails this forum question is about. That longer time used to be up to an hour, but changing the queue runner interval (-q1h => -q5m) solves a large part of the issue, where some (!) customer emails would sit waiting in the queue for an unacceptably long time.

For me this has now changed from a serious issue to a head scratching 'why does this happen to some of the emails'.

I'll have a good look at the cmq application. Sounds promising! Thanks for that tip.

And yes, the domains are real domains. But the shown DNS issues are not a big deal. Partly because the domains are just there for testing, and will be removed when the server goes 'production'. Partly because the 'open to recursive queries' is not really open to that. It's just that every request for an A record gets an answer that points to a single IP at which we host a 'parked domain' page.

factor · Oct 27, 2020

Ok well come back and let us know what you find.
