DNS becomes unresponsive

HMTKSteve · May 6, 2014

This is a bit of a weird problem.

I just installed custombuild 2 and currently my installed software looks like this:

Apache 2.4.9 Running
DirectAdmin 1.45.2 Running
Exim 4.69 Running
MySQL 5.0.51a Running
Named 9.3.6 Running
ProFTPd 1.3.1 Running
sshd Running
dovecot 1.2.10 Running
Php 5.5.12 Installed

After about 3 hours of server uptime my DNS stops working... sort of. named continues to run and I can ping my ns1 and ns2, but if I ping any domain that uses my ns1 and ns2 nothing comes back. No IP information, nothing. It is like I am pinging a domain that does not exist.

This just started happening after I updated apache and php.

Any ideas?

SeLLeRoNe · May 7, 2014

And apache/php have nothing to do with DNS, named does....

What if you use dig command to analyze the nameservers?

Ex: dig @ns1.yourdomain.tld anhosteddomain.tld

This "asks" ns1.yourdomain.tld whether it knows anhosteddomain.tld and which IP it has associated with it.
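A minimal sketch of that check, for anyone who wants a one-word answer instead of the full dig output. The function name dns_status is made up for illustration, and ns1.yourdomain.tld / anhosteddomain.tld are the same placeholders as above. It only reads the status field from dig's header line, so treat it as a rough aid, not a full diagnosis.

```shell
# Sketch: summarize what a nameserver answered, based on dig's header line.
dns_status() {
    # Reads dig output on stdin and prints the header status
    # (NOERROR, REFUSED, SERVFAIL, ...). Prints NO_RESPONSE when no
    # header is present, e.g. because the query timed out.
    awk '/->>HEADER<<-/ {
             for (i = 1; i <= NF; i++)
                 if ($i == "status:") {
                     sub(/,$/, "", $(i+1))   # strip the trailing comma
                     print $(i+1)
                     found = 1
                 }
         }
         END { if (!found) print "NO_RESPONSE" }'
}

# Usage (placeholders, run against the real nameserver):
#   dig +norecurse +time=3 +tries=1 @ns1.yourdomain.tld anhosteddomain.tld A | dns_status
```

NOERROR with an answer section means the server is serving the zone; NO_RESPONSE means the query never got a reply at all.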

Regards

HMTKSteve · May 11, 2014

I did find a mistake in my DNS records in that NS1 had an extra IP assigned to it which was causing some problems.

When I fixed that the problem appeared to go away until it cropped up again today. While I could add a cron job to restart named once an hour, I'd rather find out why named is acting this way.

The next time named becomes unresponsive I will use the dig command.

HMTKSteve · May 13, 2014

I used intodns this morning when the name server became unresponsive, and both ns1 and ns2 were down and did not respond to any requests. The IPs for the name servers are functioning and named is running on the server, but there is no response.

Is there a particular command I should use with dig? When I use dig domain.TLD I get a short response that tells me nothing.

SeLLeRoNe · May 13, 2014

dig @NAMESERVER DOMAIN.TLD

Maybe it is not responding/listening on port 53?

Also try:

telnet NAMESERVER 53

and check if it does respond.
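Note that telnet only exercises TCP, while ordinary lookups go over UDP, so it is worth testing both. A rough sketch, assuming bash (for /dev/tcp) and the coreutils timeout command are available; tcp_port_open is a made-up helper name and NAMESERVER / DOMAIN.TLD are placeholders.

```shell
# Sketch: check whether a TCP port accepts connections.
tcp_port_open() {
    # $1 = host, $2 = port; returns 0 if the connection is accepted
    # within 3 seconds, non-zero otherwise.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage:
#   tcp_port_open NAMESERVER 53 && echo "TCP 53 open" || echo "TCP 53 closed or filtered"
#   dig +time=3 +tries=1 @NAMESERVER DOMAIN.TLD        # UDP, dig's default transport
#   dig +tcp +time=3 +tries=1 @NAMESERVER DOMAIN.TLD   # same query over TCP
```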

Regards

scsi · May 13, 2014

I highly doubt it's DNS related. It has to be something else causing it.

Is the named daemon still running and listening on port 53?

Code:

ps xuawww | grep -i named | grep -v grep

Code:

netstat -na | grep -i list | grep 53

My guess is it's firewall related. Does everything else work fine on the server?

nobaloney · May 13, 2014

@HMTKSteve:

I agree with scsi.

Also...

Are you running two nameservers? Or are you pointing both ns1 and ns2 to the same physical server?

When the nameservers appear down, can you ping them?

Jeff

HMTKSteve · May 13, 2014

While DNS has not gone down again today I have noticed a very consistent high cpu load (10-20+).

I ran ps -aux and found the following two lines for named:

Code:

root     24894 60.6  0.0  88800  2972 ?        Ssl  06:31 552:27 ./named -c name
root     24920 60.6  0.0  88800  2968 ?        Ssl  06:31 552:36 ./named -c name

That was about 12 hours after restarting named

Code:

root     24006 19.8  0.0  88800  2940 ?        Ssl  22:00   0:24 ./named -c named.conf
root     24030 20.7  0.0  88800  2944 ?        Ssl  22:00   0:24 ./named -c named.conf

Above is about 10 minutes after restarting named

Here is a top snippet:

Code:

top - 21:48:15 up 2 days,  8:16,  1 user,  load average: 11.60, 11.99, 11.95
Tasks: 174 total,   5 running, 168 sleeping,   0 stopped,   1 zombie
Cpu(s): 85.6%us,  2.0%sy, 12.3%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  16439536k total, 14556568k used,  1882968k free,   530484k buffers
Swap: 33551744k total,      116k used, 33551628k free, 12490564k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
23341 dicetowe  25   0  344m  91m 9372 R 91.1  0.6   0:05.99 httpd
23344 dicetowe  25   0  318m  63m 7320 R 90.8  0.4   0:12.68 httpd
23230 dicetowe  25   0  336m  95m  21m R 88.5  0.6   0:30.97 httpd
23357 apache    16   0  271m  19m 9172 S 32.9  0.1   0:00.99 httpd
23361 dicetowe  17   0  318m  65m 9572 R 31.6  0.4   0:01.03 httpd
24920 root      22   0 88800 2968  996 S 24.9  0.0 557:23.66 named
24894 root      20   0 88800 2972  996 S 24.6  0.0 557:14.82 named
23237 apache    17   0  271m  19m 9560 S  8.6  0.1   0:06.03 httpd
23354 apache    15   0  269m  16m 8200 S  2.3  0.1   0:00.07 httpd
23360 apache    15   0  268m  16m 8740 S  2.3  0.1   0:00.11 httpd
22919 mysql     15   0  764m 146m 4692 S  0.7  0.9   1:20.60 mysqld

After restarting named PIDs 24920 and 24894 were still running. I had to manually kill them.

The name servers are ns1.hmtk.com and ns2.hmtk.com. Both are on the same server with different IPs.

Any ideas why named is going haywire? After I kill named my server load drops down to 1.5 - 2.1 range until I restart named.

EDIT:

Code:

netstat -na | grep -i list | grep 53

Returns nothing. Then I restarted named from the directadmin panel:

Code:

[root@server1 ~]# netstat -na | grep -i list | grep 53
tcp        0      0 74.84.140.79:53             0.0.0.0:*                   LISTEN
tcp        0      0 74.84.136.235:53            0.0.0.0:*                   LISTEN
tcp        0      0 74.84.136.212:53            0.0.0.0:*                   LISTEN
tcp        0      0 74.84.138.99:53             0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:53                0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:953               0.0.0.0:*                   LISTEN

Code:

[root@server1 ~]# ps xuawww | grep -i named | grep -v grep
root     24006 66.5  0.0  88800  2940 ?        Ssl  22:00   8:40 ./named -c named.conf
root     24030 66.9  0.0  88800  2968 ?        Ssl  22:00   8:40 ./named -c named.conf
named    24372  0.0  0.0 185144  4688 ?        Ssl  22:11   0:00 /usr/sbin/named -u named

Something is very wrong. When named is run by user named everything works perfectly. However, within 15 minutes of killing the named processes run by root, they reappear, kill the process run by user named, and DNS stops working.
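One way to look closer at the root-owned copies: in the ps output above the stock daemon is /usr/sbin/named -u named, while the other two were started as ./named from some working directory. A minimal sketch, assuming a Linux /proc filesystem and a ps that accepts -eo user=,pid=,args=; the function name is made up for illustration.

```shell
# Sketch: list PIDs of processes called "named" that are NOT owned by user named.
unexpected_named_pids() {
    # Reads "USER PID COMMAND..." lines on stdin (as printed by
    # ps -eo user=,pid=,args=) and prints the PID of every process whose
    # command basename is "named" but whose owner is not "named".
    awk '{
             n = split($3, parts, "/")
             if (parts[n] == "named" && $1 != "named") print $2
         }'
}

# Usage:
#   ps -eo user=,pid=,args= | unexpected_named_pids
#   ls -l /proc/PID/exe /proc/PID/cwd   # for each PID printed above
```

The exe and cwd links show which binary is really running and which directory it was started from, which should help tell whether these are copies of the system named or something else using the same name.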

Took a quick look at system messages and I see a lot of these in my /var/log/messages

Code:

May 12 09:46:55 server1 named[13919]: client 217.165.23.240#1063: query (cache) 'atriumcomp-com.mail.protection.outlook.com/A/IN' denied
May 12 09:47:13 server1 named[13919]: client 217.165.23.240#1063: query (cache) 'ashevilleart.org/MX/IN' denied
May 12 09:47:16 server1 named[13919]: client 217.165.23.240#1063: query (cache) 'ashoka.org/MX/IN' denied
May 12 09:47:17 server1 named[13919]: client 217.165.23.240#1063: query (cache) 'benabe-com.mail.protection.outlook.com/A/IN' denied
May 12 09:47:22 server1 named[13919]: client 69.58.110.133#8996: query (cache) 'bentley-usa.com/MX/IN' denied
May 12 09:47:30 server1 named[13919]: client 85.100.41.251#59245: query (cache) 'globeemail.com/MX/IN' denied
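These lines mean named refused to answer queries from outside clients for domains it does not host. To see whether a few addresses account for most of them, the log can be tallied per client. A rough sketch, assuming the log format shown above (newer BIND versions format the client field differently); denied_by_client is a made-up name.

```shell
# Sketch: count "denied" queries per client IP from syslog lines.
denied_by_client() {
    # Reads log lines on stdin and prints "COUNT IP", highest count first.
    awk '/named\[[0-9]+\]: client / && / denied/ {
             for (i = 1; i <= NF; i++)
                 if ($i == "client") {
                     split($(i+1), a, "#")   # "1.2.3.4#1063:" -> "1.2.3.4"
                     count[a[1]]++
                 }
         }
         END { for (ip in count) print count[ip], ip }' | sort -k1,1nr -k2,2
}

# Usage:
#   denied_by_client < /var/log/messages | head
```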

scsi · May 14, 2014

Something doesn't seem right. Maybe it's not reloading properly.

You might want to email DirectAdmin support and have them look at it. I don't know what process they use to reload after changes are made.

https://www.directadmin.com/clients/safesubmit.php

nobaloney · May 14, 2014

@HMTKSteve:

More specifically, it appears you've had two copies of the named daemon running when you should only have one.

If you still do, then try this:

First stop named from DirectAdmin.

Then as root:

Code:

killall -9 named

and then start named from the control panel or the root shell.

Then check to see if you still have multiple copies running.

Jeff

HMTKSteve · May 14, 2014

I killed them all last night, and around midnight the named processes running as root stopped reappearing. It has been running fine for the last 17 hours.
