Reload of named.service fails, and multiple named processes on Debian 9

kristian

Verified User
Joined
Nov 4, 2005
Messages
100
Location
Norway
Greetings!

I am experiencing a problem with bind not reloading properly on one of our servers. On an identical server, it is working as it should, but it has some strange issues as well. I can't figure out why they are different, or why one of them is failing to reload bind. Both servers have been set up using Ansible (automation), so they should be exactly the same.

The first issue is the reload problem on web32. The error in the log is:

Code:
Apr  4 11:26:01 web32 systemd[1]: named.service: Unit cannot be reloaded because it is inactive.
And sure enough, when checking the status for named.service, this is the output on web32:

Code:
# systemctl status named.service
● named.service - BIND Domain Name Server
   Loaded: loaded (/etc/systemd/system/named.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2019-03-25 14:29:29 CET; 1 weeks 2 days ago
     Docs: man:named(8)
 Main PID: 4241 (code=exited, status=0/SUCCESS)

Apr 03 12:37:02 web32 systemd[1]: named.service: Unit cannot be reloaded because it is inactive.
While on web33, where everything works as expected, the output is different:

Code:
# systemctl status named.service
● named.service - BIND Domain Name Server
   Loaded: loaded (/etc/systemd/system/named.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-04-02 17:02:25 CEST; 1 day 18h ago
     Docs: man:named(8)
 Main PID: 18657 (named)
    Tasks: 7 (limit: 4915)
   CGroup: /system.slice/named.service
           └─18657 /usr/sbin/named -f -u bind

Apr 04 09:40:25 web33 rndc[24184]: server reload successful
Checking the status of bind9.service on the two servers shows this on web32:

Code:
# systemctl status bind9.service 
● bind9.service - LSB: Start and stop bind9
   Loaded: loaded (/etc/init.d/bind9; generated; vendor preset: enabled)
   Active: active (running) since Thu 2019-04-04 10:05:57 CEST; 1h 50min ago
     Docs: man:systemd-sysv-generator(8)
   CGroup: /system.slice/bind9.service
           └─12475 /usr/sbin/named -u bind
And web33:

Code:
# systemctl status bind9.service
● bind9.service - LSB: Start and stop bind9
   Loaded: loaded (/etc/init.d/bind9; generated; vendor preset: enabled)
   Active: active (running) since Thu 2019-04-04 10:08:35 CEST; 1h 48min ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 7 (limit: 4915)
   CGroup: /system.slice/bind9.service
           └─26631 /usr/sbin/named -u bind
Both servers have /etc/init.d/bind9 and /etc/systemd/system/named.service and they are identical.

Looking at the process list on each server, I see two named processes running, which I find strange. The two servers also differ in this regard:

Web32:

Code:
# ps auxfww | grep name[d]
root     21444  0.0  0.2 389260 21364 ?        Ssl  Mar21   0:12 named
bind     12475  0.0  0.3 411260 26788 ?        Ssl  10:05   0:01 /usr/sbin/named -u bind
Web33:

Code:
# ps auxfww | grep name[d]
bind     18657  0.0  0.3 411000 26332 ?        Ssl  Apr02   0:09 /usr/sbin/named -f -u bind
bind     26631  0.0  0.3 407620 29176 ?        Ssl  10:08   0:00 /usr/sbin/named -u bind
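A side note for anyone debugging something similar: one way to see which systemd unit, if any, owns each named process is to read the process's cgroup membership from /proc. The sketch below assumes a systemd host where /proc/<pid>/cgroup contains either a name=systemd line (cgroup v1, as on Debian 9) or a 0:: line (cgroup v2). The function names unit_from_cgroup_file and list_named_units are made up for illustration.

```shell
# Print the last path component of the systemd cgroup listed in a
# /proc/<pid>/cgroup-style file (e.g. "bind9.service"), or "-" if none is found.
unit_from_cgroup_file() {
    awk -F: '
        $2 == "name=systemd" || $2 == "" {
            n = split($3, parts, "/")
            found = parts[n]
            exit
        }
        END { print (found != "" ? found : "-") }
    ' "$1"
}

# Print "PID<TAB>unit" for every process whose name is exactly "named".
list_named_units() {
    for pid in $(pgrep -x named); do
        printf '%s\t%s\n' "$pid" "$(unit_from_cgroup_file "/proc/$pid/cgroup")"
    done
}
```

Running list_named_units as root should print one line per named process. A process that shows a session scope or "-" instead of a .service name was probably not started by either unit.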
The servers are both running Debian 9(.8), Linux 4.9.0-5-amd64 x86_64, with Directadmin 1.56:

Code:
# lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 9.8 (stretch)
Release:	9.8
Codename:	stretch

# uname -srm
Linux 4.9.0-5-amd64 x86_64

# /usr/local/directadmin/directadmin o
Compiled on 'Debian 9.0 64-bit'
Compile time: Mar 18 2019 at 02:18:53
Timestamp: '1552897108'
Compiled with IPv6

# /usr/local/directadmin/directadmin v
Version: DirectAdmin v.1.56.0
So I guess my three questions are:

1 - Why does one of my servers have an inactive named.service? A reload of bind9.service works as expected.
2 - Why do I have two named processes running in the first place?
3 - How do I resolve 1 and 2? :)

This is confusing me big time. Any help appreciated!
 

kristian

Ah, that is useful indeed. Not sure what has happened on my servers to cause one to work without that setting, and the other to not work though.
 

kristian

For some reason, when setting named_service_override=bind9 in the directadmin.conf file, and restarting directadmin, there seems to be no attempt whatsoever to reload bind. I can't see any trace of DirectAdmin even trying in any of the logs. The changed zone is not available if I query the local named either.
 

DirectAdmin Support

Administrator
Staff member
Joined
Feb 27, 2003
Messages
8,905
A few steps to test with:
  1. Add/remove a test record from any dns zone. Quickly type:
    Code:
    cat /usr/local/directadmin/data/task.queue
    to see if DA has added the changed value for the action. If not, confirm the setting with
    Code:
    ./directadmin c | grep named_service_override
  2. If you do see the correct action, you can test it manually, as often as you like, by appending the same line to task.queue again and running dataskq with
    Code:
    cd /usr/local/directadmin
    echo 'action=bind9&value=reload' >> data/task.queue; ./dataskq d2000
    and check /var/log/directadmin/system.log to see whether it mentions bind9.
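
If you end up repeating the test in step 2 a lot, it could be wrapped in two small helpers along these lines. This is only a sketch. The function names queue_reload and log_mentions are hypothetical, and the directory and log file are passed as arguments so nothing is hard-coded.

```shell
# Append a reload request for SERVICE to DA_DIR/data/task.queue.
# Usage: queue_reload DA_DIR SERVICE
queue_reload() {
    printf 'action=%s&value=reload\n' "$2" >> "$1/data/task.queue"
}

# Succeed (exit 0) if LOGFILE contains the literal string SERVICE.
# Usage: log_mentions LOGFILE SERVICE
log_mentions() {
    grep -F -q -- "$2" "$1"
}

# Example run as root, using the paths from the steps above:
#   queue_reload /usr/local/directadmin bind9
#   (cd /usr/local/directadmin && ./dataskq d2000)
#   log_mentions /var/log/directadmin/system.log bind9 && echo "bind9 is in system.log"
```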

John
 

kristian

After changing a DNS zone, /usr/local/directadmin/data/task.queue contained:

Code:
action=bind%39&value=reload
So it seems the 9 is URL-encoded before being inserted, for some reason. The config does not contain the URL-encoded value:

Code:
# ./directadmin c | grep named_service_override
named_service_override=bind9
I also tried manually inserting it with a proper name:

Code:
echo 'action=bind9&value=reload' >> /usr/local/directadmin/data/task.queue
In both these cases, nothing was logged to /var/log/directadmin/system.log or /var/log/directadmin/errortaskq.log or anywhere else that I could find, but the entry in the queue file disappeared.
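
For reference, %39 is just the percent-encoded form of the character 9 (hex 0x39), so both queue entries decode to the same string. A small POSIX shell sketch to check that is below. The urldecode function is written here for illustration and is not part of DirectAdmin. It does not handle encoded NUL bytes or trailing newlines.

```shell
# Decode %XX escapes in a string and print the result.
# Runs in a subshell so its variables do not leak into the caller.
urldecode() (
    rest=$1
    out=
    while [ -n "$rest" ]; do
        case $rest in
            %[0-9A-Fa-f][0-9A-Fa-f]*)
                hex=${rest#%}              # drop the leading "%"
                hex=${hex%"${hex#??}"}     # keep only the two hex digits
                # Convert hex to octal and let printf %b emit that byte.
                out=$out$(printf '%b' "\\0$(printf '%03o' "0x$hex")")
                rest=${rest#???}
                ;;
            *)
                out=$out${rest%"${rest#?}"}   # copy one literal character
                rest=${rest#?}
                ;;
        esac
    done
    printf '%s\n' "$out"
)
```

Calling urldecode 'action=bind%39&value=reload' prints action=bind9&value=reload.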
 

DirectAdmin Support

URL encoding shouldn't affect anything, as it's decoded in the dataskq.
I'm interested in the dataskq output, so please run it through as described in #2 of my previous post.

John
 

kristian

Apologies, I expected it to show in the log as well. Here's the output:

Code:
# echo 'action=bind9&value=reload' >> data/task.queue; ./dataskq d2000
Debug mode. Level 2000

root priv set: uid:0 gid:0 euid:0 egid:0
pidfile written
starting queue
dataskq: command: action=bind9&value=reload
done queue
 

kristian

This has been fixed in DirectAdmin version 1.57. When making a zone change now, I see the following in /var/log/directadmin/system.log:

Code:
2019:05:30-23:04:01: bind9 reloaded
2019:05:30-23:04:03: bind9 restarted
Probably not necessary to do both a reload and a restart, but it works.
 

kristian

The "wrong" named will start on reboot, so in order to resolve that, disable named and enable bind9:

Code:
systemctl disable named.service
systemctl enable bind9.service
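
To double-check the result before the next reboot, something like the following could be used. It is a rough sketch that relies only on the exit status of systemctl is-enabled (0 when the unit is enabled). The SYSTEMCTL variable and the function names unit_enabled and check_boot_units are my own additions for illustration.

```shell
# Command used to query unit state; override SYSTEMCTL to dry-run with a stub.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Succeed if the given unit is enabled, judged by the exit status of is-enabled.
unit_enabled() {
    "$SYSTEMCTL" is-enabled "$1" >/dev/null 2>&1
}

# Report whether only bind9.service will start at boot.
check_boot_units() {
    if unit_enabled named.service; then
        echo "named.service is still enabled"
        return 1
    fi
    if ! unit_enabled bind9.service; then
        echo "bind9.service is not enabled"
        return 1
    fi
    echo "only bind9.service is enabled"
}
```

Running check_boot_units after the two commands above should print "only bind9.service is enabled".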
 