Apache goes unresponsive/unreachable during Direct Admin tally update cron run

CAISC · Jan 4, 2022

Hello,

Note: I am already running Apache 2.4.51, so the bug related to Apache 2.4.52 is not the issue in my case.

I am facing a strange issue: on a DirectAdmin server, Apache goes unresponsive/unreachable at random times, anywhere from 30 mins to 2 hours into the DirectAdmin tally update cron run.

I have to restart Apache to get all sites working again. But it keeps going unresponsive/unreachable at least 2 to 4 times while the tally update cron is running. The tally update cron runs for approx 3 hours on the server, and during those 3 hours this issue keeps occurring at random intervals.

Once the tally update is over, everything keeps working fine.

It appears the server runs out of resources like RAM during the tally cron run. How do I limit the tally cron's RAM usage so that it runs for a longer duration with low resource usage? Server load seems to be normal while the cron is running, but I do get a very high load warning 1-2 times after the cron run completes. Maybe it's updating the data for all accounts.

It's a 32-core AMD CPU server with 128 GB RAM.
This is the cron that's creating the problem when running -
10 0 * * * root echo 'action=tally&value=all' >> /usr/local/directadmin/data/task.queue

As per the DA support team's recommendation, I added the nice parameter to the command -
10 0 * * * root echo 'action=tally&value=all' >> /usr/bin/nice -n 5 /usr/local/directadmin/data/task.queue
But then it appears the cron update never finishes, because even after waiting 48+ hours the Server Statistics tally is not updated and still shows the same old date for "Last Tally Completion".
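
A possible explanation for the tally never finishing, based only on how the shell parses that line: the word immediately after `>>` is the redirect target, and the remaining words become extra arguments to echo. So the modified line would append text to /usr/bin/nice rather than to task.queue. The sketch below reproduces the same shape in a scratch directory; fake_nice and the local task.queue are stand-in files, not the real paths.

```shell
#!/bin/sh
# Work in a scratch directory so no real system file is touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
: > fake_nice      # stand-in for /usr/bin/nice
: > task.queue     # stand-in for the real task.queue

# Same shape as the modified cron line: the word after >> is the target file,
# and "-n 5 task.queue" is passed to echo as ordinary text.
echo 'action=tally&value=all' >> fake_nice -n 5 task.queue

echo "fake_nice contains: $(cat fake_nice)"
if [ -s task.queue ]; then
    echo "task.queue received data"
else
    echo "task.queue is still empty"
fi
```

If that is what happened, the tally task was never queued, and /usr/bin/nice itself may have had text appended to it and be worth checking. Also, nice applied to echo would only lower the priority of the echo itself, not of whatever later works through the queue.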

Appreciate help from experts in this matter.

Thanks

factor · Jan 4, 2022

Just curious, how many user accounts do you have on the server?

Also, on Apache, are you using Event as the MPM?

CAISC · Jan 4, 2022

mpm - Worker
Approx 900+ users (1.2 TB of data).

Please note these users were previously hosted on a cPanel server with a lower-end configuration than this one.
We never faced such an issue on the cPanel server.

factor · Jan 4, 2022

What is realtime_quota (https://docs.directadmin.com/direct...l-directadmin-conf-values.html#realtime-quota) set to in directadmin.conf?

Code:

/usr/local/directadmin/directadmin c |grep realtime

Also this one too:

All directadmin.conf values | Directadmin Docs (docs.directadmin.com)

Code:

/usr/local/directadmin/directadmin c |grep simple

You might try tweaking those.

Any reason you're not using mpm_event? I don't have that many users and have never used worker.

Also DA is not like cpanel. All new stuff to learn.

CAISC · Jan 4, 2022

/usr/local/directadmin/directadmin c |grep realtime
realtime_quota=2

Also in file - /etc/cron.d/directadmin_cron
this cron is commented out by default -
#5 5 * * 0 root /sbin/quotaoff -a; /sbin/quotacheck -augm; /sbin/quotaon -a;

factor · Jan 4, 2022

All directadmin.conf values | Directadmin Docs (docs.directadmin.com)

I would read through all the directadmin.conf values. The system isn't tweaked for large-scale use; you have to optimize it yourself.

CAISC · Jan 4, 2022

Thinking of modifying the cron -
echo 'action=tally&value=all' >> /usr/local/directadmin/data/task.queue

into 2 parts and running them at different intervals -

echo 'action=tally&value=[a-m]*' >> /usr/local/directadmin/data/task.queue
and
echo 'action=tally&value=[n-z]*' >> /usr/local/directadmin/data/task.queue

Not sure whether it will work or not.
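
Whether task.queue accepts a pattern such as [a-m]* in value is not something this thread confirms. A different way to split the work, sketched below, is to queue one task per user and batch users by first letter. The per-user task format (value=<user>&type=user), the helper name tally_lines, and the usernames are assumptions for illustration; check the DirectAdmin task.queue documentation before relying on them.

```shell
#!/bin/sh
# Print one tally task line per username read from stdin whose name matches
# the given shell pattern (for example '[a-m]*').
# ASSUMPTION: the per-user task format below is accepted by DirectAdmin.
tally_lines() {
    pattern=$1
    while IFS= read -r user; do
        case $user in
            $pattern) printf 'action=tally&value=%s&type=user\n' "$user" ;;
        esac
    done
}

# Demo with made-up usernames; this only prints lines, it queues nothing.
printf '%s\n' alice bob nina zed | tally_lines '[a-m]*'
```

On a real server the usernames might come from the directories under /usr/local/directadmin/data/users (an assumption about the layout), and each batch's output would be appended to task.queue from its own cron entry.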

CAISC · Jan 4, 2022

You might like this one too
All directadmin.conf values | Directadmin Docs

Thanks bdacus01,

I will surely go through all of them. Also, could you recommend some values that are must-haves for a busy server?

CAISC · Jan 4, 2022

/usr/local/directadmin/directadmin c |grep simple
simple_disk_usage=0

factor said:
Any reason you're not using mpm_event? I don't have that many users and have never used worker.

The previous cPanel server was running on worker. While migrating to DA we tried to keep as many settings and as much of the environment the same as possible, so that the migration goes smoothly.

Agree that cPanel and DA are different products; still much to explore for cp admin like us.

We have many servers to migrate. Right now we are just ironing out the issues we are facing on the DA servers.

factor · Jan 4, 2022

Well, I am no expert.
I would check the PHP-FPM configs, and you might like mpm_event better. You will also want to tweak the MariaDB configs.

factor · Jan 4, 2022

caisc said:
still much to explore for cp admin like us.

It's for sure a journey. Welcome to DA, btw.

factor · Jan 4, 2022

PHP-FPM nightmares

Needing many children processes would mean usually either 1 of the 2, or a combination I think: 1) simply a lot of traffic 2) some pages, or the base system (then it would mean all pages), use a lot of calculations or inefficient database queries etc.. something that is causing 1 request to take...

forum.directadmin.com

PHP-FPM nightmares

Well, I guess it wasn't enough [19-Nov-2020 05:58:07] WARNING: [pool xxx] server reached max_children setting (46), consider raising it Now changed to 100 Now the user's fpm.conf is pm = ondemand pm.max_children = 100 pm.process_idle_timeout = 20 pm.max_requests = 500 Maybe the timeout...

forum.directadmin.com

If you don't have opcache on, you might try this:

Slow Image Loading Speeds

Hello there, Sorry for being a ghost yesterday, was a hectic day. PHP 7.4.20 (cli) (built: Jun 5 2021 17:15:32) ( NTS ) Copyright (c) The PHP Group Zend Engine v3.4.0, Copyright (c) Zend Technologies with the ionCube PHP Loader + ionCube24 v10.4.5, Copyright (c) 2002-2020, by...

forum.directadmin.com

CAISC · Jan 4, 2022

Thanks. Actually, we run CloudLinux and use lsphp by default for all.

Any suggestions for tweaking the MariaDB configs would be helpful.

CAISC · Jan 4, 2022

Went through all the DirectAdmin parameters that were mentioned, especially -

All directadmin.conf values | Directadmin Docs (docs.directadmin.com)

disabled inode count -
inode=0

plus added a few extras in the conf file -
restart_apache_after_tally=1
reload_apache_after_rotation=1

restarted DA.

then again ran the cron -
10 0 * * * root echo 'action=tally&value=all' >> /usr/local/directadmin/data/task.queue

but no luck, same issue again. This is now becoming a pain point.

factor · Jan 4, 2022

You might try to collect more info in debug mode

Troubleshooting DA service | Directadmin Docs (docs.directadmin.com)

factor · Jan 4, 2022

Also, what is the mpm_worker config?

Code:

cat /etc/httpd/conf/extra/httpd-mpm.conf

The standard is
<IfModule mpm_worker_module>
StartServers 6
MinSpareThreads 50
MaxSpareThreads 150
ThreadsPerChild 50
MaxRequestWorkers 300
MaxConnectionsPerChild 10000
</IfModule>

Which isn't enough. Try something like this; it may need to be more.

<IfModule mpm_worker_module>
StartServers 25
MinSpareThreads 50
MaxSpareThreads 150
ThreadsPerChild 50
MaxRequestWorkers 1024
MaxConnectionsPerChild 10000
</IfModule>

CAISC · Jan 4, 2022

httpd-mpm.conf is the first thing I adjusted after server setup -
Here are the settings in my case -

<IfModule mpm_worker_module>
ServerLimit 500
StartServers 50
MinSpareThreads 25
MaxSpareThreads 150
ThreadsPerChild 50
MaxRequestWorkers 1000
MaxConnectionsPerChild 5000
</IfModule>

factor · Jan 5, 2022

caisc said:
ServerLimit 500

You might try upping these 2

Code:

ServerLimit          1024
MaxConnectionsPerChild   10000

CAISC · Jan 5, 2022

Thanks, Brent, for the follow-up.

Tried these combinations -

ServerLimit 1000
MaxConnectionsPerChild 10000

and

ServerLimit 2000
MaxConnectionsPerChild 10000

but that didn't help either.
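
One possible reason raising ServerLimit changed nothing: with the worker MPM, ServerLimit only caps how many child processes may exist, while the number of requests served at once is bounded by MaxRequestWorkers. A quick arithmetic check with the values posted earlier (ThreadsPerChild 50, MaxRequestWorkers 1000, ServerLimit 500) is sketched below.

```shell
#!/bin/sh
# Worker MPM capacity arithmetic using the values posted in this thread.
threads_per_child=50
max_request_workers=1000
server_limit=500

# Child processes needed to reach MaxRequestWorkers.
procs_needed=$(( max_request_workers / threads_per_child ))
# Upper bound on threads that ServerLimit alone would permit.
ceiling=$(( server_limit * threads_per_child ))

echo "processes needed for MaxRequestWorkers: $procs_needed"
echo "thread ceiling permitted by ServerLimit: $ceiling"
```

Since 20 processes already cover MaxRequestWorkers 1000, a ServerLimit of 500, 1000, or 2000 leaves the effective ceiling unchanged; only raising MaxRequestWorkers (within ServerLimit x ThreadsPerChild) would add worker threads. Whether worker exhaustion is actually the cause here is not established in the thread.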

CAISC · Jan 5, 2022

Actually, load on the server is not that high; about 95% of the time, only 20% of CPU and RAM capacity is utilised.

It's a high-end server with ample resources and well configured.

It's just this cron that creates some deadlock during the final stages of computation.

For example, what I have seen is -

The cron runs for approx 2 hours 45 mins.

The 1st deadlock (Apache unreachable issue) mostly comes after 2 hours; as a remedy I restart Apache.
The 2nd deadlock (Apache unreachable issue) comes approx 15 min after the 1st occurrence; again, as a remedy I restart Apache.

Finally, after that the cron job completes and there are no issues afterwards.
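
To check whether these episodes line up with Apache running out of worker threads, one option is to count the relevant warning in the error log around the times noted above. The sketch below runs on a made-up sample log; the helper name count_worker_limit_hits, the simplified sample lines, and the log path mentioned in the comment are assumptions, not taken from this server.

```shell
#!/bin/sh
# Count error-log lines that report the worker limit being reached.
count_worker_limit_hits() {
    grep -c 'server reached MaxRequestWorkers' "$1"
}

# Self-contained demo on a made-up, simplified log. On a real server the
# file to inspect might be /var/log/httpd/error_log (an assumption).
sample=$(mktemp)
cat > "$sample" <<'EOF'
[Wed Jan 05 02:10:01 2022] [mpm_worker:error] AH00286: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
[Wed Jan 05 02:11:40 2022] [core:notice] AH00094: Command line: '/usr/sbin/httpd'
EOF

echo "worker-limit warnings: $(count_worker_limit_hits "$sample")"
```

A count of zero on the real log during the unreachable periods would suggest looking elsewhere than the MPM limits.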
