Memory hungry server killing named

klasje · Jan 28, 2025

Hello,

Since I updated the server to PHP 8.*, the Ubuntu server has been killing named every 2 days because it runs out of memory.
I keep upgrading the VPS and adding more memory, to the point that it's becoming quite expensive.
This server has been running fine for about 4 years and never had this issue before.

Is this caused by users who increased their PHP max memory setting too much, or is something else wrong? How can I find the cause?
I do not see any excessive memory usage at this time and it always crashes when I'm sleeping.
Directadmin port 2222 is unresponsive as well at that time.
I suspect it's related to clamav, but I'm not sure.

The server can kill php or nginx or ... as long as it restarts it, but it doesn't seem to restart named after killing it.
Is there:
1) A way to avoid it
2) A way that if it's killed, directadmin tries to restart it?

Many thanks in advance!

Ohm J · Jan 28, 2025

Maybe the system packages are outdated?

Try updating the system packages via "apt update" and "apt upgrade", then use CustomBuild to rebuild PHP.

I have experienced high memory usage from PHP processes since the update two versions ago.

klasje · Jan 29, 2025

It just crashed again, after updating everything 2 days ago. Any other suggestions?

Ohm J · Jan 29, 2025

Nothing more... maybe you need to monitor which process causes the memory leak.

Normally I monitor user processes via the "Resource Limits" page. If usage goes high for no apparent reason, it is probably related to a user process.

If not, it's something else that happened to coincide with your PHP version change, for example a DDoS against one of your services.

klasje · Jan 30, 2025

Well, that specific server is the only server I still have on the directadmin datacenter license, so it has no resource limits page. ;-)
Only newer licenses have the resource limits option.
Can I see it using a command line?

In the service monitor page I do not see any excessive resource or memory usage during the day, but something seems to happen at night always...

Can I not tell ubuntu not to kill named when out of memory, but only nginx and php-fpm?
I always get error ns not resolved in the browser and cannot load any webpages, not even da.
Adding extra memory to a server that already has plenty feels like such a waste...

kristian · Jan 30, 2025

There are plenty of tools that will tell you which processes are using memory, such as ps and top. The kernel's OOM killer is responsible for killing processes to avoid the whole machine going down. How it decides what to kill is a whole other topic, but I believe it tries to pick something that will free up the most memory. Why this would be named is a bit strange, especially if it happens every time and named is the only process killed. Check your system logs for information about the OOM killer, and monitor the memory usage of your processes. If named is using more than a small fraction of your total memory, it likely has a memory leak somewhere, and it should probably be upgraded (or downgraded).
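
A minimal sketch of those two checks in shell, assuming a Linux system with a procps-style ps and a plain-text syslog. The function names are made up for illustration, and the search patterns are the phrases the kernel normally prints when the OOM killer fires:

```shell
#!/bin/sh
# Sum resident memory (RSS) per user from "user rss_kib" lines on stdin
# and print "user MiB", largest first.
sum_rss_by_user() {
    awk '{ rss[$1] += $2 } END { for (u in rss) printf "%s %d\n", u, rss[u] / 1024 }' |
        sort -k2,2nr
}

# Print the OOM-killer lines found in the log file given as $1.
oom_lines() {
    grep -Ei 'out of memory|oom-kill|invoked oom-killer' "$1"
}
```

Possible usage: "ps -e -o user= -o rss= | sum_rss_by_user" for a per-user overview, and "oom_lines /var/log/syslog" after a crash. If the log shows no OOM lines at all, the process was probably not killed by the kernel for memory reasons.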

Hostmavi · Jan 30, 2025

Hi,

Did you limit the cache memory of BIND (named)?
The default value of max-cache-size is 90% of memory. Maybe this is what is using your memory.
Maybe your version of BIND has a memory leak bug.

Check in named.conf, under options, whether max-cache-size is set.
If it is, change it to max-cache-size 32M;

If not, add the line to the options section like this:
max-cache-size 32M;

(32M = 32 megabytes)

Restart named.

First check whether this stops named from being killed.
If yes, test bigger values than 32M.
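
A small sketch for checking whether the option is already present before editing. The function name is hypothetical, and the config path in the usage note assumes the Debian/Ubuntu layout:

```shell
#!/bin/sh
# Report any max-cache-size setting in the BIND config files passed as arguments.
# Note: this is a plain text search, so it also matches commented-out lines.
show_cache_limit() {
    grep -Hn 'max-cache-size' "$@" || echo "max-cache-size is not set in: $*"
}
```

Possible usage: "show_cache_limit /etc/bind/named.conf.options". After editing, running "named-checkconf" before the restart should catch syntax mistakes such as a missing semicolon.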

Richard G · Jan 30, 2025

How much ram does your vps have?
And do you have a swap file/partition, and if yes how big?

klasje · Feb 1, 2025

1) I added max-cache-size 32M; to /etc/bind/named.conf.options and restarted named.
2) Server has 8 gig memory, which should be sufficient for the few sites on it.
3)
swapon --show
NAME TYPE SIZE USED PRIO
/swapfile file 6G 1.1G -2

Many thanks, hoping to get some sleep tonight with the max-cache-size.

exlhost · Feb 1, 2025

If your server uses and requests a lot of memory, it will also need it. Your server simply does not have enough memory. As soon as you add 8 gigabytes, for example, you will see that it is better distributed.

cjd · Feb 1, 2025

8GB does seem a bit low for a server running PHP web sites, and as SQL databases grow it will need more memory to keep performing well. I would consider 16GB the minimum for this type of server; most sites run WordPress, which is quite memory hungry with lots of plugins. If it's a default install of Linux, it's a good idea at a minimum to adjust the kernel swappiness (I normally use vm.swappiness=1). If the server is using a lot of swap it's going to be slow, and you need more RAM.
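
One way to make that swappiness value persistent, sketched in shell. The function name and the drop-in file name 99-swappiness.conf are assumptions chosen for illustration:

```shell
#!/bin/sh
# Write a sysctl drop-in that sets vm.swappiness to $1.
# $2 is the target directory (default /etc/sysctl.d).
write_swappiness_conf() {
    value="$1"
    dir="${2:-/etc/sysctl.d}"
    printf 'vm.swappiness=%s\n' "$value" > "$dir/99-swappiness.conf"
}
```

Possible usage as root: "write_swappiness_conf 1" followed by "sysctl --system" to load it. The current value can be read with "cat /proc/sys/vm/swappiness".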

klasje · Feb 6, 2025

I added more memory again, but it keeps on crashing.

1) Upon reboot, I get thousands and thousands of the following lines in syslog, making it difficult to debug, as I need to scroll up endlessly:
server nginx[709]: nginx: [warn] conflicting server name "anydomain.net" on 93.119.x.xx:443, ignored
Can I somehow fix this?

2) Syslog tells me the following:
Feb 7 05:10:18 server named[2093]: no longer listening on 93.119.0.83#53

Is csf blocking port 53? Why does named even stop listening and why is directadmin not recovering from this automatically?

Logs right before the reboot:

Feb 7 05:10:01 server named[2093]: client @0x7f70f00d6f08 47.117.220.101#52782 (gymgroep.be): query (cache) 'gymgroep.be/AAAA/IN' denied (allow-query-cache did not match)
Feb 7 05:10:02 server named[2093]: client @0x7f70d438ad58 47.117.220.101#51966 (gymgroep.be): query (cache) 'gymgroep.be/AAAA/IN' denied (allow-query-cache did not match)
Feb 7 05:10:02 server named[2093]: client @0x7f70d4392a68 47.117.220.100#38822 (gymgroep.be): query (cache) 'gymgroep.be/AAAA/IN' denied (allow-query-cache did not match)
Feb 7 05:10:02 server named[2093]: client @0x7f70ec074998 47.117.220.100#49772 (gymgroep.be): query (cache) 'gymgroep.be/AAAA/IN' denied (allow-query-cache did not match)
Feb 7 05:10:02 server named[2093]: client @0x7f70d438ad58 47.117.220.98#4553 (gymgroep.be): query (cache) 'gymgroep.be/A/IN' denied (allow-query-cache did not match)
Feb 7 05:10:03 server named[2093]: client @0x7f70f00d6f08 47.117.220.98#9337 (gymgroep.be): query (cache) 'gymgroep.be/A/IN' denied (allow-query-cache did not match)
Feb 7 05:10:03 server named[2093]: client @0x7f70ec074998 47.117.220.98#47429 (gymgroep.be): query (cache) 'gymgroep.be/A/IN' denied (allow-query-cache did not match)
Feb 7 05:10:03 server systemd[1]: Created slice User Slice of UID 1077.
Feb 7 05:10:03 server systemd[1]: Removed slice User Slice of UID 1077.
Feb 7 05:10:03 server named[2093]: client @0x7f70ec0643b8 47.117.220.98#18032 (gymgroep.be): query (cache) 'gymgroep.be/A/IN' denied (allow-query-cache did not match)
Feb 7 05:10:03 server named[2093]: client @0x7f70f00aeda8 47.117.220.97#47347 (gymgroep.be): query (cache) 'gymgroep.be/A/IN' denied (allow-query-cache did not match)
Feb 7 05:10:03 server named[2093]: client @0x7f70e804f7a8 47.117.220.97#32865 (gymgroep.be): query (cache) 'gymgroep.be/A/IN' denied (allow-query-cache did not match)
Feb 7 05:10:05 server kernel: [86395.211341] Firewall: *TCP_IN Blocked* IN=ens3 OUT= MAC=52:54:00:49:15:4d:cc:1a:a3:94:0a:71:08:00 SRC=167.94.145.19 DST=93.119.0.83 LEN=60 TOS=0x00 PREC=0x00 TTL=54 ID=23475 PROTO=TCP SPT=1412 DPT=28080 WINDOW=42340 RES=0x00 SYN URGP=0
Feb 7 05:10:05 server kernel: [86395.341073] Firewall: *TCP_IN Blocked* IN=ens3 OUT= MAC=52:54:00:49:15:4d:cc:1a:a3:8d:be:15:08:00 SRC=193.41.206.156 DST=93.119.0.83 LEN=40 TOS=0x00 PREC=0x00 TTL=245 ID=54321 PROTO=TCP SPT=50544 DPT=8728 WINDOW=65535 RES=0x00 SYN URGP=0
Feb 7 05:10:05 server kernel: [86395.361964] Firewall: *TCP_IN Blocked* IN=ens3 OUT= MAC=52:54:00:49:15:4d:cc:1a:a3:94:0a:71:08:00 SRC=167.94.146.44 DST=93.119.1.85 LEN=60 TOS=0x00 PREC=0x00 TTL=52 ID=61004 PROTO=TCP SPT=24820 DPT=6031 WINDOW=42340 RES=0x00 SYN URGP=0
Feb 7 05:10:11 server kernel: [86401.633605] Firewall: *TCP_IN Blocked* IN=ens3 OUT= MAC=52:54:00:49:15:4d:cc:1a:a3:94:0a:71:08:00 SRC=104.234.115.134 DST=93.119.1.85 LEN=44 TOS=0x00 PREC=0x00 TTL=57 ID=38001 PROTO=TCP SPT=21088 DPT=7654 WINDOW=1024 RES=0x00 SYN URGP=0
Feb 7 05:10:11 server systemd[1]: Created slice User Slice of UID 1062.
Feb 7 05:10:11 server systemd[1]: Removed slice User Slice of UID 1062.
Feb 7 05:10:14 server systemd[1]: Created slice User Slice of UID 1077.
Feb 7 05:10:14 server systemd[1]: Removed slice User Slice of UID 1077.
Feb 7 05:10:18 server named[2093]: no longer listening on 93.119.0.83#53
Feb 7 05:10:23 server directadmin[694]: counted resources usage duration=7.432738ms
Feb 7 05:10:26 server directadmin[694]: removed old sessions duration=155.43µs removed=0
Feb 7 05:10:29 server kernel: [86419.318546] Firewall: *TCP6IN Blocked* IN=ens3 OUT= MAC=52:54:00:49:15:4d:cc:1a:a3:94:0a:71:86:dd SRC=2604:a940:0302:0118:0000:0024:0000:0000 DST=2a01:07c8:bb0a:004b:5054:00ff:fe49:154d LEN=60 TC=0 HOPLIMIT=248 FLOWLBL=0 PROTO=TCP SPT=49789 DPT=49501 WINDOW=65535 RES=0x00 SYN URGP=0
Feb 7 05:10:30 server freshclam[687]: Fri Feb 7 05:10:30 2025 -> Received signal: wake up
Feb 7 05:10:30 server freshclam[687]: Fri Feb 7 05:10:30 2025 -> ClamAV update process started at Fri Feb 7 05:10:30 2025
Feb 7 05:10:36 server kernel: [86425.919235] Firewall: *TCP6IN Blocked* IN=ens3 OUT= MAC=52:54:00:49:15:4d:cc:1a:a3:94:0a:71:86:dd SRC=2604:a940:0302:0118:0000:0019:0000:0000 DST=2a01:07c8:bb0a:004b:5054:00ff:fe49:154d LEN=60 TC=0 HOPLIMIT=248 FLOWLBL=0 PROTO=TCP SPT=50066 DPT=7080 WINDOW=65535 RES=0x00 SYN URGP=0
Feb 7 05:13:23 server freshclam[687]: Fri Feb 7 05:13:23 2025 -> ^Failed to get daily database version information from server: https://database.clamav.net
Feb 7 05:13:23 server freshclam[687]: Fri Feb 7 05:13:23 2025 -> !check_for_new_database_version: Failed to find daily database using server https://database.clamav.net.
Feb 7 05:13:23 server freshclam[687]: Fri Feb 7 05:13:23 2025 -> Trying again in 5 secs...
Feb 7 05:13:28 server freshclam[687]: Fri Feb 7 05:13:28 2025 -> Trying to retrieve CVD header from https://database.clamav.net/daily.cvd
[long run of ^@ (NUL) characters trimmed]
Feb 7 05:14:24 server systemd[1]: Mounted Huge Pages File System.
Feb 7 05:14:24 server systemd[1]: Mounted POSIX Message Queue File System.
Feb 7 05:14:24 server systemd[1]: Mounted Kernel Debug File System.

And some other insignificant cron jobs and monitoring that run before the reboot that I snipped from these logs.

Should I create a cronjob that restarts named every 12 hours, maybe as a quick fix?
Where does named store its logs so I can dig deeper into the cause?

kristian · Feb 7, 2025

Did you monitor the memory usage of your processes, to see which one is using memory? Did you look for OOM-killer log entries in your system logs?
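
Since the crashes happen at night, one option is to record memory usage periodically and read the last entries after the next crash. A minimal sketch, assuming a procps-style ps; the function name and the default log path are hypothetical:

```shell
#!/bin/sh
# Append a timestamp and the ten processes with the largest RSS (KiB)
# to the log file given as $1.
log_mem_snapshot() {
    log="${1:-/var/log/mem-snapshot.log}"
    {
        date '+%Y-%m-%d %H:%M:%S'
        # Columns: PID, RSS in KiB, command name; sorted by RSS, largest first.
        ps -e -o pid= -o rss= -o comm= | sort -k2,2nr | awk 'NR <= 10'
        echo
    } >> "$log"
}
```

Calling this from cron every few minutes should leave a trail that shows which process was growing shortly before named stopped.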

If you want named to log more you can add the necessary configuration in the named.conf related files. There are many resources out there on how to set up logging.
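
As a rough illustration of such a logging configuration, the sketch below prints a stanza that keeps the default syslog output and adds a rotated log file. The channel name named_file, the function name and the log path are assumptions; the directory has to exist and be writable by the user named runs as:

```shell
#!/bin/sh
# Print a BIND "logging" stanza that logs to syslog and to a rotated file.
named_logging_stanza() {
    cat <<'EOF'
logging {
    channel named_file {
        file "/var/log/named/named.log" versions 3 size 5m;
        severity info;
        print-time yes;
        print-severity yes;
        print-category yes;
    };
    category default { default_syslog; named_file; };
};
EOF
}
```

The printed stanza would go into one of the named.conf related files, and "named-checkconf" can verify the result before named is restarted. Without any logging statement, named sends its messages to syslog, which is where the "named[2093]" lines above come from.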

Are you using your named as only authoritative DNS for your own domains, or also as a recursive caching resolver for yourself and/or for anyone to use? An authoritative NS should not also be a recursive resolver at the same time for the same client IP (i.e. you would need to implement views, which I assume you have not).

Richard G · Feb 7, 2025

Can you post your named.conf here? Just the config part, not the domains.

Hostmavi · Feb 7, 2025

I don't think csf is blocking the port.
Feb 7 05:10:18 server named[2093]: no longer listening on 93.119.0.83#53
This means your named was killed or stopped.

As Richard wrote, post your named.conf

and post the output of the command

cat /usr/local/directadmin/data/admin/services.status

Maybe named is set to OFF there, so directadmin is not restarting it.
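
Independently of DirectAdmin's own service monitoring, systemd can be asked to restart named after a kill and to make it a less attractive target for the OOM killer. A hedged sketch, assuming the unit is called named.service (on some Ubuntu releases it is bind9.service); the function name and the -900 value are illustrative choices:

```shell
#!/bin/sh
# Write a systemd drop-in for named into the directory given as $1.
write_named_dropin() {
    dir="${1:-/etc/systemd/system/named.service.d}"
    mkdir -p "$dir"
    cat > "$dir/override.conf" <<'EOF'
[Service]
# Lower score = less likely to be chosen by the OOM killer (range -1000..1000).
OOMScoreAdjust=-900
# Restart after a crash or a fatal signal such as SIGKILL.
Restart=on-failure
RestartSec=5
EOF
}
```

Possible usage as root: "write_named_dropin", then "systemctl daemon-reload" and "systemctl restart named". This only changes which process gets killed first; if the machine really runs out of memory, something else will be killed instead.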

klasje · Feb 16, 2025

root@server:/home/user# cat /usr/local/directadmin/data/admin/services.status
clamav-daemon=ON
clamav-freshclam=ON
dovecot=ON
exim=ON
lfd=ON
mysqld=ON
named=ON
nginx=ON
php-fpm74=ON
php-fpm82=ON
php-fpm84=ON
pure-ftpd=ON
spamd=ON
sshd=ON

Meanwhile I found a probable partial cause. I updated mysql 5 to 8 and all of a sudden mysql started doing binary logging.
This filled the disk very rapidly, including the swap.

So actually a lot of issues need to be resolved:
1) Added extra disk space
2) Should move swap to a separate partition
3) Changed the default directadmin my.cnf config to not do binary logging (added "sql_mode=" to /etc/my.cnf)
4) I should buy a second server with a second directadmin so nameservers could sync and there are actually 2 different nameservers
5) Kristian mentioned something about views, no clue what to do here, but I need to look into it.
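
On point 3: MySQL 8.0 enables the binary log by default, and sql_mode controls SQL syntax and validation behaviour, so an empty sql_mode is not expected to switch binary logging off by itself. A sketch of the two usual alternatives, assuming MySQL 8.0; the function name and the 3-day default are illustrative:

```shell
#!/bin/sh
# Print a my.cnf fragment for MySQL 8.0 binary logging.
# $1 = "off" disables the binary log; anything else keeps it but
# expires old binary log files after $2 seconds (default 3 days).
binlog_cnf() {
    echo '[mysqld]'
    if [ "$1" = "off" ]; then
        echo 'skip-log-bin'
    else
        echo "binlog_expire_logs_seconds=${2:-259200}"
    fi
}
```

The printed option belongs under the existing [mysqld] section of /etc/my.cnf. After a restart of MySQL, "SHOW VARIABLES LIKE 'log_bin';" shows whether binary logging is still on.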

Still strange that if there is no memory that:
1) Named is killed
2) Directadmin does not start it again

Many thanks everyone!
