Slow HELO on established SMTP connection

SupermanInNY

Verified User
Joined
Sep 28, 2004
Messages
419
Hi All,

One of the service providers has done something on their network last thursday and ever since then, users with ADSL who use that ISP are getting very slow connections to the SMTP on my servers.
The problem is that this behaviour is not consistent.
Dynamic ADSL sometimes exhibit this problem and sometimes not.
Same goes with Fixed IP (also ADSL, but with fixed IP plans from the ISP).
The problem itself is by very long wait time to send emails out, sometimes as a long as a minute just for "hello world" with no attachments.
Then, the email goes through.
Recieving emails is not a problem. Immediate!
from a command line: telnet <ip> 25
opens a black window and it hangs for about 40 seconds and then acknowlege.
Notice, this is an issue only that particular ISP.
Other ISP go through with zero wait time.
The ISP did something on their network last week that affected their own email system.
However, in this case, it is a matter of their ADSL users trying to SMTP to my servers which are not in that DataCenter where the ISP is.

This has been tested on servers that reside both behind other datacenter's firewalls and other data center's without firewall.

Again, users from different ISPs have no problems at all.

I need a server side netstat log-analyzer/sniffer that can monitor the SMTP connections to see where the delay/fail occurs.
Nothing is showing in the /var/log/exim/mainlog as the connection eventually does succeed, but just take very very long time.
I'm open for any ideas, thoughts etc.

-Alon.
 
Is your server having problems resolving ip addresses?
 
Is your server having problems resolving ip addresses?

No. No problems at all.

Mind you that this is occurring only from a specific ISP! All other ISPs are working fine.

I'm sure any or all of the utilities below can be of some help, if anyone can suggest how to employ them to capture a connection and make sense of the output, it will probably be of some use in diagnosing the source of the problem.

wireshark(1), tshark(1), editcap(1), mergecap(1), capinfos(1), pcap-filter(4), tcpdump(8), pcap(3)
 
Best way to know where the problem resides?
Just run Wireshark to capture everything when doing `telnet <ip> 25'.

EDIT: I just saw your answer. You probably want to run "tshark -i eth0" as root if you have a linux box, or capture any traffic on the Internet interface with Wireshark on Windows. I'm talking about the PC that resides behind the mentioned ISP, not the server.

The important thing is where the delay resides: between the initial DNS request-reply? Between the SYN-SYNACK of the SMTP connection? Between some indefinite ACK-ACK during the SMTP connection?

I can show you an example if you want.
 
Best way to know where the problem resides?
Just run Wireshark to capture everything when doing `telnet <ip> 25'.

EDIT: I just saw your answer. You probably want to run "tshark -i eth0" as root if you have a linux box, or capture any traffic on the Internet interface with Wireshark on Windows. I'm talking about the PC that resides behind the mentioned ISP, not the server.

The important thing is where the delay resides: between the initial DNS request-reply? Between the SYN-SYNACK of the SMTP connection? Between some indefinite ACK-ACK during the SMTP connection?

I can show you an example if you want.

OK... I found part of the source problem, but still not clear about the solution:


bess.domain.com:/root # tshark |grep 89.138.34.192
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
4.019499 89.138.34.192 -> 1.2.3.4 TCP 55957 > smtp [SYN] Seq=0 Win=8192 Len=0 MSS=1200
4.019537 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460
4.050479 89.138.34.192 -> 1.2.3.4 TCP 55957 > smtp [ACK] Seq=1 Ack=1 Win=64800 Len=0
4.057222 192.115.106.31 -> 1.2.3.4 DNS Standard query response PTR 89-138-34-192.bb.netvision.net.il
4.058226 1.2.3.4 -> 192.115.106.31 DNS Standard query A 89-138-34-192.bb.netvision.net.il
9.058625 1.2.3.4 -> 62.219.186.7 DNS Standard query A 89-138-34-192.bb.netvision.net.il
12.058937 1.2.3.4 -> 192.115.106.35 DNS Standard query A 89-138-34-192.bb.netvision.net.il
18.058864 1.2.3.4 -> 192.115.106.31 DNS Standard query A 89-138-34-192.bb.netvision.net.il
23.059347 1.2.3.4 -> 62.219.186.7 DNS Standard query A 89-138-34-192.bb.netvision.net.il
26.060263 1.2.3.4 -> 192.115.106.35 DNS Standard query A 89-138-34-192.bb.netvision.net.il
32.059708 1.2.3.4 -> 192.115.106.31 DNS Standard query A 89-138-34-192.bb.netvision.net.il
37.060160 1.2.3.4 -> 62.219.186.7 DNS Standard query A 89-138-34-192.bb.netvision.net.il
40.060485 1.2.3.4 -> 192.115.106.35 DNS Standard query A 89-138-34-192.bb.netvision.net.il
40.060575 1.2.3.4 -> 192.115.106.31 DNS Standard query A 89-138-34-192.bb.netvision.net.il
45.061335 1.2.3.4 -> 62.219.186.7 DNS Standard query A 89-138-34-192.bb.netvision.net.il
48.061995 1.2.3.4 -> 192.115.106.35 DNS Standard query A 89-138-34-192.bb.netvision.net.il
54.062263 1.2.3.4 -> 89.138.34.192 SMTP Response: 220 bess.domain.com ESMTP Exim 4.69 Mon, 01 Dec 2008 17:17:29 +0200
54.289276 89.138.34.192 -> 1.2.3.4 TCP 55957 > smtp [ACK] Seq=1 Ack=68 Win=64733 Len=0
90.805307 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
90.805371 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=68 Ack=2 Win=5840 Len=0
90.973929 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
90.973967 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=68 Ack=3 Win=5840 Len=0
91.202509 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
91.202550 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=68 Ack=4 Win=5840 Len=0
91.389120 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
91.389162 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=68 Ack=5 Win=5840 Len=0
91.483550 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
91.483590 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=68 Ack=6 Win=5840 Len=0
91.676656 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
91.676688 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=68 Ack=7 Win=5840 Len=0
91.813310 89.138.34.192 -> 1.2.3.4 SMTP Command: helo a
91.813403 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=68 Ack=9 Win=5840 Len=0
91.813514 1.2.3.4 -> 89.138.34.192 SMTP Response: 250 bess.domain.com Hello a [89.138.34.192]
92.027144 89.138.34.192 -> 1.2.3.4 TCP 55957 > smtp [ACK] Seq=9 Ack=111 Win=64690 Len=0
95.368403 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
95.408766 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=111 Ack=10 Win=5840 Len=0
95.492562 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
95.492589 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=111 Ack=11 Win=5840 Len=0
95.636954 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
95.636995 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=111 Ack=12 Win=5840 Len=0
95.804079 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
95.804120 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=111 Ack=13 Win=5840 Len=0
96.020417 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
96.020458 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=111 Ack=14 Win=5840 Len=0
96.571757 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
96.571798 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=111 Ack=15 Win=5840 Len=0
96.766414 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
96.766525 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=111 Ack=16 Win=5840 Len=0
96.910505 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
96.910555 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=111 Ack=17 Win=5840 Len=0
97.038409 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
97.038445 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=111 Ack=18 Win=5840 Len=0
97.224520 89.138.34.192 -> 1.2.3.4 SMTP Command: mail from
97.224565 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=111 Ack=20 Win=5840 Len=0
97.224659 1.2.3.4 -> 89.138.34.192 SMTP Response: 500 unrecognized command
97.434364 89.138.34.192 -> 1.2.3.4 TCP 55957 > smtp [ACK] Seq=20 Ack=137 Win=64664 Len=0
175.925601 89.138.34.192 -> 1.2.3.4 TCP [TCP segment of a reassembled PDU]
175.964953 1.2.3.4 -> 89.138.34.192 TCP smtp > 55957 [ACK] Seq=137 Ack=21 Win=5840 Len=0
75396 packets captured

bess.domain.com:/root # tshark |grep 194.90.1.33
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
23.101008 194.90.1.33 -> 1.2.3.4 TCP 58633 > smtp [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=1182910893 TSER=0 WS=6
23.101092 1.2.3.4 -> 194.90.1.33 TCP smtp > 58633 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460
23.103752 194.90.1.33 -> 1.2.3.4 TCP 58633 > smtp [ACK] Seq=1 Ack=1 Win=5840 Len=0
23.117993 192.115.106.31 -> 1.2.3.4 DNS Standard query response A 194.90.1.33
23.119581 1.2.3.4 -> 194.90.1.33 SMTP Response: 220 bess.domain.com ESMTP Exim 4.69 Mon, 01 Dec 2008 17:20:17 +0200
23.122238 194.90.1.33 -> 1.2.3.4 TCP 58633 > smtp [ACK] Seq=1 Ack=68 Win=5840 Len=0
15103 packets captured


You may notice that there is a lot reverse DNS resolving, and then it gives up and pass through.

Now,. on dynamic IPs there should be no 'true' Reverse DNS entries. But still it has some kind of a resolving.

If I disable in exim.conf:


# DO HOST LOOKUP
# OPTIONAL MODIFICATIONS:
# The setting below causes Exim to do a reverse DNS lookup on all incoming
# IP calls, in order to get the true host name. If you feel this is too
# expensive, you can specify the networks for which a lookup is done, or
# remove the setting entirely.

# host_lookup = *




Then, we can telnet immediatly,. but when attempting to send via Outlook Express,. we experience the same old behavior.
So I'm still stuck!

-Alon.
 
Last edited:
The netvision.net.il ISP manages a reverse DNS pointer (DNS PTR request, means IP -> host) for any dynamic address, following the RFC.

The real problem is that the normal resolution (DNS A request, means host -> IP) give an empy answer with the "SERVFAIL" reply code. Their DNS is completely broken.

Now, this should not give any problem to exim... but it seems like it does.
It's normal for him to do that A resolution, because a PTR host has to resolve back to the address: this is the FQDN standard.
Receiving a SERVFAIL should just mark the host as spoofed/invalid and it should continue.

I don't have any other suggestions right now, beside reading some exim documentation and searching on Google about the specific problem.
 
Alon, have you made any changes to the default settings on your exim.conf file?

Jeff
 
Alon, have you made any changes to the default settings on your exim.conf file?

Jeff

Maybe!
But, with that said, if I have made any changes, those were made about a year ago. No changes since.
Also, this erratic bahavior occurred at once on several separate servers, sitting in separate data centers, some behind shared firewall of the data center and others entirely outside.

-Alon.
 
If you made the same modification everywehre, that's where the error must be.
You have noticed the error only now because it's extremely rare that an ISP would have such broken DNS for a long time.

Try to remember what modifications you made, retrieve a backup or just read exim.conf documentation about DNS behaviour and check your conf accordingly.
 
What I'm thinking could be happening (Alon, please tell me if I'm wrong :)) is that exim is trying a DNS lookup to reconcile HELO to an IP#, or perhaps IP# to a domain name for some other reason, and waiting for timeouts because of the bad DNS.

Jeff
 
What I'm thinking could be happening (Alon, please tell me if I'm wrong :)) is that exim is trying a DNS lookup to reconcile HELO to an IP#, or perhaps IP# to a domain name for some other reason, and waiting for timeouts because of the bad DNS.

Jeff

That seems like a good description.
There is no Reject to the email, it is just a 'time-out'.
Now here is the next part:

I tried to comment out this section in the exim.conf to try to overcome this problem:


# DO HOST LOOKUP
# OPTIONAL MODIFICATIONS:
# The setting below causes Exim to do a reverse DNS lookup on all incoming
# IP calls, in order to get the true host name. If you feel this is too
# expensive, you can specify the networks for which a lookup is done, or
# remove the setting entirely.

# host_lookup = * (default not commented).

I had the ISP guy test it out (I myself am not using that ISP, so I can't test a connection from that ISP,. for me everything works great).

After I've commented that line out, when he attempted to do

telnet <ip> 25 from a 'timing-out ip' he got through immediatly with no problem.


This was a huge progress in terms of finally working it fast.
But, he tried to email another message from Outlook Express and it again exhibited same behavior.

But for now, it appears that this is a step towards understanding the delay.

Exactly at this enlightning moment, the ISP rep that I was working with informed me that "sorry, my shift ended,.. see ya tomorrow,. OH.. tomorrow I'm off,. see you Wednesday".
He has another rep taking over, but my understanding is that the replacement rep is not as proficient, so I may experience another delay in this entire thingy.


Falling back onto: host_lookup = *
How bad is it to have commented out?
The weired thing is, I find it hard to believe it is the source of the problem. It does't make much sense to me.
It appears to me more and more that somehow the DNS resolution is failing when such a query is initiated for Reverse DNS for Email verification from the ISP side.
Now, that doesn't make much sense as I don't think that their DNS server cares if this is for SMTP or some other protocol, but maybe it is?
Have you ever seen anything like that?
The issue started out on last Thursday when the ISP had a problem with their own Radius server not authenticating users for emails.I'm not sure how this is related directly or indirectly to my problem, maybe you can think of something?

Thanks,

-Alon.
 
Last edited:
I think it is the host_lookup = * that's causing it, though I don't know why you'd be okay manually but not on the check using an email client. Have you checked to see if outgoing reverse dns lookup works from your server with their IP#?

Jeff
 
I think it is the host_lookup = * that's causing it, though I don't know why you'd be okay manually but not on the check using an email client.

This was tested at 6:01PM and the ISP rep left at 6:05PM, so he was kinda of a hurry to finish the shift. Tomorrow I'll re-test this and see if this was a fluke or the real reason/cause.
In any event, my understanding is that I should be OK not utilizing this option and I will not be prone to have spammers hammering my server just because I disabled this option.


Have you checked to see if outgoing reverse dns lookup works from your server with their IP#?

You mean attempting to resolve the the IP?
What should I do to make this run correctly, meaning:

whois <ip>?
or some other command that runs the reverse dns lookup? (what is it?)

or perhaps I should do tracepath?

Thanks,

-Alon.
 
RESOLVED!
========

Want to know how?

Me too!

Ok... here is some info that I did manage to decipher and find.

I have a subscription to dnsstuff (yes,. I paid them for the membership and its worth it for me) and I queried them for testing configurations.

Here is the output I had yesterday and the output I had today. Notice the difference in the outputs:


TODAY'S OUTPUT

IP Information - 199.203.84.17
Email link to results
Generated by www.DNSstuff.com


When the server was last reloaded, we had 5303 IP addresses banned.
Remember, you are not allowed to use automated programs to access our tools, unless you have a purchased a DNSstuff automated usage plan.
Please email [email protected] to learn more.

IP address: 199.203.84.17
Reverse DNS: odap-199-203-84-17.bb.netvision.net.il.
Reverse DNS authenticity: [Verified]
ASN: 1680
ASN Name:
IP range connectivity: 4
Registrar (per ASN): RIPE
Country (per IP registrar): US [United States]
Country Currency: USD [United States Dollars]
Country IP Range: 199.203.0.0 to 199.203.255.255
Country fraud profile: Normal
City (per outside source): Unknown
Country (per outside source): -- []
Private (internal) IP? No
IP address registrar: whois.arin.net
Known Proxy? No
Link for WHOIS: 199.203.84.17



YESTERDAY'S OUTPUT

IP Information - 199.203.84.17
Email link to results
Generated by www.DNSstuff.com


When the server was last reloaded, we had 5131 IP addresses banned.
Remember, you are not allowed to use automated programs to access our tools, unless you have a purchased a DNSstuff automated usage plan.
Please email [email protected] to learn more.

IP address: 199.203.84.17
Reverse DNS: odap-199-203-84-17.bb.netvision.net.il.
Reverse DNS authenticity: [Could be forged: hostname odap-199-203-84-17.bb.netvision.net.il. does not exist]
ASN: 1680
ASN Name:
IP range connectivity: 4
Registrar (per ASN): RIPE
Country (per IP registrar): US [United States]
Country Currency: USD [United States Dollars]
Country IP Range: 199.203.0.0 to 199.203.255.255
Country fraud profile: Normal
City (per outside source): Unknown
Country (per outside source): -- []
Private (internal) IP? No
IP address registrar: whois.arin.net
Known Proxy? No
Link for WHOIS: 199.203.84.17


The question:

So,. the next question is: who is responsible for the Verification process?
Is it a RIPE misconfiguration or the ISP ?

-Alon.
 
That really doesnt mean much it just means it couldnt resolve the hostname. You should of tested on your local machine if the hostname resolved or not. Then you would of known if it was a problem at their dns server or just a problem at dnsstuff
 
I guess he needs a dnsstuff subscription because he can't read :( I've tried to explain that. Maybe it's just my english... or I'm too technical.

Anyway, what's important is that the problem is not solved. It's just working now because the ISP's DNS is OK, but next time an ISP will have a DNS problem it will happen again, for sure.

Keep searching: why does your exim retry a DNS A query (required by FQDN check, following SMTP RFC) multiple times when receiving a SERVFAIL answer?
This behaviour is. just. wrong. :)
 
I guess he needs a dnsstuff subscription because he can't read :( I've tried to explain that. Maybe it's just my english... or I'm too technical.

Anyway, what's important is that the problem is not solved. It's just working now because the ISP's DNS is OK, but next time an ISP will have a DNS problem it will happen again, for sure.

Keep searching: why does your exim retry a DNS A query (required by FQDN check, following SMTP RFC) multiple times when receiving a SERVFAIL answer?
This behaviour is. just. wrong. :)

There was no SERVFAIL answer.
There was NO answer at all. It timed out.

4.050479 89.138.34.192 -> 1.2.3.4 TCP 55957 > smtp [ACK] Seq=1 Ack=1 Win=64800 Len=0
4.057222 192.115.106.31 -> 1.2.3.4 DNS Standard query response PTR 89-138-34-192.bb.netvision.net.il
4.058226 1.2.3.4 -> 192.115.106.31 DNS Standard query A 89-138-34-192.bb.netvision.net.il
9.058625 1.2.3.4 -> 62.219.186.7 DNS Standard query A 89-138-34-192.bb.netvision.net.il
12.058937 1.2.3.4 -> 192.115.106.35 DNS Standard query A 89-138-34-192.bb.netvision.net.il
18.058864 1.2.3.4 -> 192.115.106.31 DNS Standard query A 89-138-34-192.bb.netvision.net.il
23.059347 1.2.3.4 -> 62.219.186.7 DNS Standard query A 89-138-34-192.bb.netvision.net.il
26.060263 1.2.3.4 -> 192.115.106.35 DNS Standard query A 89-138-34-192.bb.netvision.net.il
32.059708 1.2.3.4 -> 192.115.106.31 DNS Standard query A 89-138-34-192.bb.netvision.net.il
37.060160 1.2.3.4 -> 62.219.186.7 DNS Standard query A 89-138-34-192.bb.netvision.net.il
40.060485 1.2.3.4 -> 192.115.106.35 DNS Standard query A 89-138-34-192.bb.netvision.net.il
40.060575 1.2.3.4 -> 192.115.106.31 DNS Standard query A 89-138-34-192.bb.netvision.net.il
45.061335 1.2.3.4 -> 62.219.186.7 DNS Standard query A 89-138-34-192.bb.netvision.net.il
48.061995 1.2.3.4 -> 192.115.106.35 DNS Standard query A 89-138-34-192.bb.netvision.net.il
54.062263 1.2.3.4 -> 89.138.34.192 SMTP Response: 220 bess.domain.com ESMTP Exim 4.69 M
 
Trust me, it was SERVFAIL.

You were running this command:
Code:
# tshark |grep 89.138.34.192
Therefore you weren't seeing the answer.
I did the same query here and the answer was SERVFAIL.
 
Yesterday:
Reverse DNS: odap-199-203-84-17.bb.netvision.net.il.
Reverse DNS authenticity: [Could be forged: hostname odap-199-203-84-17.bb.netvision.net.il. does not exist]
reverse DNS was pointing to odap-199-203-84-17.bb.netvision.net.il, but there was no A record for odap-199-203-84.17.bb.netvision.net.il. As others have pointed out, that's a problem with netvision.net.il.
or some other command that runs the reverse dns lookup? (what is it?)
On Windows:
Code:
nslookup xx.yy.zz.aa
On Linux:
Code:
dig -x xx.yy.zz.aa +short
In both cases above replace the xx.yy.zz.aa with the real IP#.

Jeff
 
Back
Top