Weird DNS Problem

rohit

Hi All,

Yesterday, I came across a very weird DNS problem.

named couldn't resolve one domain.

#ping www.cybergraff.com

The above command didn't work.

I checked DNS (named) and it was resolving all the other domains correctly.

e.g.
#ping www.yahoo.com (worked)
#ping www.google.com (worked)

I checked the entries in /etc/resolv.conf and they seemed to be fine too. I then added my ISP's DNS server to /etc/resolv.conf and the cybergraff.com domain started to resolve correctly.

I checked the firewall and even stopped it, but cybergraff.com still didn't resolve. Any other domain hosted on the same server as cybergraff.com didn't resolve either.

I finally ended up restarting the named service, which fixed the problem.

Does anyone know what could have caused this, and what steps I should take to keep it from happening again?

Thanks in advance.

Regards
 
Here are a few steps that can help you if the problem persists.

First of all, don't use `ping' to test DNS resolution; use `dig'. For example:

Code:
$ dig www.cybergraff.com

; <<>> DiG 9.4.2-P2 <<>> www.cybergraff.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26714
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0

;; QUESTION SECTION:
;www.cybergraff.com.		IN	A

;; ANSWER SECTION:
www.cybergraff.com.	86400	IN	A	203.122.59.71

;; AUTHORITY SECTION:
cybergraff.com.		86400	IN	NS	ns2.spectranet.com.
cybergraff.com.		86400	IN	NS	ns5.spectranet.com.

;; Query time: 216 msec
;; SERVER: 85.17.207.63#53(85.17.207.63)
;; WHEN: Thu Nov 20 10:43:29 2008
;; MSG SIZE  rcvd: 99

The important data is:
  • the "status:" field; it tells you what went wrong: NXDOMAIN means the name does not exist (no zone/record), SERVFAIL means the server could not produce an answer because of a serious failure (for example two CNAME records for the same name), NOERROR means the query was answered without error; there are other codes, but you probably won't see them
  • the "ANSWER SECTION:"; this is, of course, the content of the answer; with NOERROR it may be empty, which means the name exists but has no record of the requested TYPE (for a normal query that is A, meaning address)
  • the "SERVER:" value; this is the exact IP address of the DNS server that sent the reply; make sure you are querying the right server
  • the number, "86400" in my example query; it's the TTL in seconds, and I'll explain below why it's useful
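
If you want to pull those fields out of dig's output automatically, here is a rough Python sketch. It is only an illustration: the function name `parse_dig` and the `SAMPLE` text (trimmed from the output above) are made up for this example, and it assumes dig's classic text layout as shown in this thread.

```python
import re

# Trimmed copy of the dig output shown above, used only as sample input.
SAMPLE = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26714
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0

;; QUESTION SECTION:
;www.cybergraff.com.        IN  A

;; ANSWER SECTION:
www.cybergraff.com. 86400   IN  A   203.122.59.71

;; AUTHORITY SECTION:
cybergraff.com.     86400   IN  NS  ns2.spectranet.com.

;; SERVER: 85.17.207.63#53(85.17.207.63)
"""


def parse_dig(output):
    """Extract status, answering server and ANSWER records from dig text output."""
    result = {"status": None, "server": None, "answers": []}
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        status = re.search(r"status: (\w+)", line)
        if line.startswith(";;") and status:
            result["status"] = status.group(1)
        elif line.startswith(";; SERVER:"):
            # ";; SERVER: 85.17.207.63#53(85.17.207.63)" -> "85.17.207.63"
            result["server"] = line.split()[2].split("#")[0]
        elif line.startswith(";;") and line.endswith(" SECTION:"):
            # ";; ANSWER SECTION:" -> "ANSWER"
            section = line[3:-len(" SECTION:")]
        elif section == "ANSWER" and line and not line.startswith(";"):
            # Record layout: name, TTL, class, type, data
            name, ttl, _rclass, rtype, rdata = line.split(None, 4)
            result["answers"].append(
                {"name": name, "ttl": int(ttl), "type": rtype, "data": rdata}
            )
    return result


info = parse_dig(SAMPLE)
print(info["status"], info["server"], info["answers"])
```

An empty `answers` list together with status NOERROR is the "name exists but has no record of that TYPE" case from the second point.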

Once you have tested your own server, do the same against your ISP's DNS server by running `dig www.cybergraff.com @xxx.xxx.xxx.xxx'.
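
If you end up repeating this comparison often, a small wrapper can run both queries for you. This is just a sketch in Python: `dig_command` and `run_dig` are names invented here, 192.0.2.53 is a documentation placeholder standing in for your ISP's resolver, and it assumes `dig' is installed and on your PATH.

```python
import subprocess


def dig_command(name, server=None):
    """Build the dig argument list; '@server' is added only when a server is given."""
    cmd = ["dig", name]
    if server:
        cmd.append("@" + server)
    return cmd


def run_dig(name, server=None, timeout=15):
    """Run dig and return its text output (requires dig on the PATH)."""
    proc = subprocess.run(
        dig_command(name, server),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.stdout


# Example usage (not executed here, since it needs network access):
#   local_output = run_dig("www.cybergraff.com")
#   isp_output = run_dig("www.cybergraff.com", "192.0.2.53")  # placeholder IP
```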

If the answer differs, look at the TTL.

If it's the same, your DNS server and the ISP's have both just queried the authoritative nameservers instead of reading from cache, yet they got different records... so either the authoritative nameservers are handing out different answers for the same record, or your DNS server or the ISP's is broken.

If your TTL is smaller, it could just be a bad cached answer... this happens when an authoritative nameserver served a wrong answer for a while and some DNS servers cached it. The solution is to flush your DNS cache.

If your TTL is larger, the ISP's DNS server has a cached working answer, while yours is probably querying the broken authoritative nameservers directly. There is no quick fix other than using your ISP's DNS until `dig www.cybergraff.com @your.local.dns.address' gives a good answer.
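
The three TTL cases above can be written down as a small decision function. This is only a sketch of the rule of thumb from this post, not a real diagnostic tool; the function name `diagnose`, the result labels and the 198.51.100.7 address (a documentation placeholder for a "wrong" answer) are all made up for illustration.

```python
def diagnose(local_answer, local_ttl, isp_answer, isp_ttl):
    """Apply the TTL rule of thumb to the answers from the local and ISP resolvers."""
    if local_answer == isp_answer:
        # Nothing differs, so the TTL comparison does not apply.
        return "same-answer"
    if local_ttl is None or isp_ttl is None:
        # One side returned no record at all (e.g. SERVFAIL), so no TTL to compare.
        return "no-ttl-to-compare"
    if local_ttl == isp_ttl:
        # Both just fetched from the authoritative servers but got different data.
        return "authoritative-inconsistent-or-resolver-broken"
    if local_ttl < isp_ttl:
        # The local answer has been cached longer: probably a stale bad answer.
        return "local-cache-stale"
    # The ISP answer is the older, cached one and still works.
    return "isp-cache-holds-good-answer"


print(diagnose("198.51.100.7", 1200, "203.122.59.71", 86400))
```

For "local-cache-stale" the remedy is flushing your cache; for "isp-cache-holds-good-answer" it is using the ISP's DNS until the authoritative servers are fixed.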

To check the real answer from the authoritative nameservers you can run `dig +trace www.cybergraff.com'. This makes dig follow the delegation chain itself, querying each level directly starting from the root; only the initial lookup for "." is sent to your normal DNS server. There will be multiple answers, for ".", "com.", "cybergraff.com." and finally "www.cybergraff.com."; just look at the last one.

Hope this will be helpful.

"Does anyone know, what would have caused this to happen and what steps should I follow to avoid this from happening again?"
You can flush the cache periodically... but the real answer is that the DNS system is old, broken and stupid. For example, as in this exact case, it lets an admin make every other DNS server cache an answer for 86400 seconds (86400/3600 = 24 hours); this is way too much. Just bear with it. :)
 
Thanks, tillo, for explaining that in detail.

I did use dig in the first place; I just mentioned ping in my post. :(

dig couldn't get a result and just came back with an error (I can't remember the exact error).

When I tried dig with my ISP's DNS server, it all worked fine for the same domain (cybergraff.com).
 