Apache 2.4.49, 2.4.52: server strangely becomes unreachable after a few hours

I am not using the nginx combo, and I am not using CSF (I actually use ipfw on FreeBSD).

Increasing the ServerLimit, etc. did NOT fix the issue. It just made it crash a bit later than "usual". I downgraded.
 
The problem is back in version 2.4.52.
We went back to version 2.4.51, and that resolved the problem.
 
Many thanks for this!

I thought I was going insane, being the only person who noticed this happening in Apache 2.4.52.

I'm not sure if it's exactly the same thing as in Apache 2.4.49, but the end result is the same. I only vaguely remember the issue with 2.4.49 - I'll have to look through my notes, if I have any.

I assume Apache fixed this issue in 2.4.50? So I'm thinking they'll have to fix it again and release a new version of Apache.
 
The issue came back again.
Maybe their source code has new hooks or new code in mpm_event that broke the old fix.

Related commit:
 
Really weird: all our servers run the latest Apache 2.4.52 and none of them have this problem (knock on wood).
Maybe it depends on the OS? (All our systems run CentOS and AlmaLinux.)
 
I'll knock on wood too. We never had this issue, not on previous versions either. Like Activ8, we also run CentOS and AlmaLinux.
 
We also have this issue (on nginx_apache) after upgrading to the latest version. It was stable on previous versions, but we had to downgrade because Apache would stop accepting new connections while still running.

Our fix was:
Bash:
cd /usr/local/directadmin/custombuild/
# append (>>) so any other pinned versions in the file are kept; ">" would overwrite them
echo "apache2.4:2.4.51:" >> custom_versions.txt
./build nginx_apache
 
Best I've been able to determine with the Apache 2.4.52 issue is that it's proportional to how busy the server is.

If you have a lot of accounts and a lot of traffic on the server, then the issue will present itself more often.

Fewer accounts and less traffic, then the issue will present itself less often.

I would guess there's also a factor in there as to how often you restart Apache on the server. Restarting Apache may "reset the clock" so to speak.

Calling a server "busy" can be very subjective. I just know that we had a server that was busier (more accounts and more traffic) than another one. The busier server was seeing this issue about every 4 hours, give or take. The less busy server took more than a day to present the issue.
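To make the "busier server fails sooner" observation concrete, here is a back-of-the-envelope sketch built on an assumption nobody in this thread has confirmed: that each child process retired after MaxConnectionsPerChild connections leaks one slot in the parent's count of active children, and that Apache stops spawning children once the leaked count reaches ServerLimit. The function name and its parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/*
 * Hypothetical leak model (an assumption, NOT Apache's actual code):
 * every child that exits after serving max_conns_per_child connections
 * is never subtracted from the parent's active-child count, so once
 * server_limit such exits have happened the parent believes it is full.
 */
double hours_until_exhaustion(double conns_per_hour,
                              double max_conns_per_child,
                              int server_limit)
{
    assert(conns_per_hour > 0 && max_conns_per_child > 0 && server_limit > 0);

    /* Number of children recycled per hour at this traffic level. */
    double recycles_per_hour = conns_per_hour / max_conns_per_child;

    /* Hours until the leaked slots add up to server_limit. */
    return server_limit / recycles_per_hour;
}
```

Under this model, doubling traffic halves the time to failure, raising ServerLimit only postpones it (consistent with the report that a higher ServerLimit just made the crash come later), and restarting Apache would reset the leaked count.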

I suppose it's possible that the issue is configuration related. But multiple people are seeing it, a similar issue happened in an earlier version of Apache, and it did not appear until upgrading to Apache 2.4.52, so the evidence seems to point to a change in Apache itself.

Perhaps those who have not been affected have something configured that the rest of us do not. Or perhaps they simply haven't given it enough time to show up, because they are on a less busy server.
 
I also suspect that this may only be affecting those of us who are using the event MPM. Perhaps some of you who haven't experienced this issue are using something other than the event MPM in Apache?

There are quite a few changes in the event code from Apache 2.4.51 to Apache 2.4.52:


Compared to worker and prefork:


I can't say for certain that the issue is somewhere in there, but it's the leading candidate for me.
 
I don't think DirectAdmin really has anything to do with this.

This is an Apache issue. Unfortunately I don't really know how to report a bug to Apache or even how to describe the issue in a repeatable form.

This thread was started back when Apache 2.4.49 was released. I tried to find my notes on this issue with Apache 2.4.49 but could not find anything. I think I skipped this version - probably because I found this thread or others experiencing the issue, so I just waited until Apache 2.4.50 was released hoping to avoid this problem.

First question:

Is everybody experiencing this issue using the event MPM? You can see what MPM you are using by running:

/usr/sbin/apachectl -V

and looking for what is set for Server MPM:

Or

/usr/sbin/apachectl -V | grep "^Server MPM:"

I suspect everybody who is having this issue is using event. Those who are not affected may be using worker or prefork.


The changelog for Apache 2.4.50 from Apache 2.4.49 includes:

*) event mpm: Correctly count active child processes in parent process if
child process dies due to MaxConnectionsPerChild.
PR 65592 [Ruediger Pluem]


Which leads to the bug report:


The patch mentioned in that bug report was applied to Apache 2.4.50.

The differences in server/mpm/event/event.c from Apache 2.4.49 to Apache 2.4.50 are:


Specifically, the addition of:

Code:
if (ps->quiescing == 1) {
    ps->quiescing = 2;
    active_daemons--;
}

in the static void perform_idle_server_maintenance() function.

I assume that this is what fixed the issue everyone was experiencing with 2.4.49 - which I gather is similar to what folks are seeing with Apache 2.4.52. (Anybody that experienced the issue with Apache 2.4.49 and is also experiencing this issue with Apache 2.4.52 want to weigh in on the similarities?)
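For readers less familiar with event.c, here is a toy C model of the bookkeeping that snippet appears to perform. It is a simplification under stated assumptions, not Apache's code: toy_slot, toy_maintenance() and toy_can_spawn() are hypothetical names; quiescing values 0/1/2 are read as "serving", "winding down" and "already counted"; and the hypothetical apply_fix flag switches the single decrement on or off to mimic the behaviour before and after 2.4.50.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a scoreboard slot. Field names mirror the quoted
 * snippet, but this is NOT Apache's real structure. */
typedef struct {
    int pid;       /* 0 = empty slot */
    int quiescing; /* 0 = serving, 1 = winding down, 2 = already counted */
} toy_slot;

/* Hypothetical model of the 2.4.50 change: each winding-down child is
 * subtracted from the active count exactly once, by moving it 1 -> 2. */
void toy_maintenance(toy_slot *slots, size_t n, int *active_daemons,
                     int apply_fix)
{
    assert(active_daemons != NULL);
    if (!apply_fix)
        return; /* simplified "before" case: nothing is ever subtracted */

    for (size_t i = 0; i < n; i++) {
        if (slots[i].pid != 0 && slots[i].quiescing == 1) {
            slots[i].quiescing = 2; /* mark as counted so later passes skip it */
            (*active_daemons)--;
        }
    }
}

/* The parent only starts a new child while it believes it is below the cap. */
int toy_can_spawn(int active_daemons, int server_limit)
{
    return active_daemons < server_limit;
}
```

With four winding-down children and a cap of 4, the fixed path brings the counter to 0 and keeps it there on repeated passes (state 2 prevents double counting). The unfixed path leaves it at 4, so toy_can_spawn() refuses to start any new child: a server that is still running but no longer accepting connections.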

Between Apache 2.4.50 and Apache 2.4.51, server/mpm/event/event.c was not touched.

From Apache 2.4.51 to Apache 2.4.52, server/mpm/event/event.c was changed:


And there were a lot of changes.

There were a lot of changes in the static void perform_idle_server_maintenance() function.

Specifically, the part that was changed from 2.4.49 to 2.4.50:

Code:
if (ps->quiescing == 1) {
    ps->quiescing = 2;
    active_daemons--;
}

Was changed to:

Code:
int child_threads_active = 0;
if (ps->quiescing == 1) {
    ps->quiescing = 2;
    retained->active_daemons--;
}

Along with a lot of other changes.

I would surmise that these changes have re-introduced the bug that first appeared in Apache 2.4.49, or at least introduced something very similar. The Apache developers would need to fix it.

I'm not going to even pretend to understand what all is being done with the changes from Apache 2.4.51 to Apache 2.4.52 - but I have to believe that these changes are a large part of the issue we are all facing.
 
@sparek Nice explanation, but I understand @caisc's point that a revert/forced downgrade would be better in this situation.

Of course it is everyone's own choice to upgrade right away, but security updates aside, it might be a good idea to have update channels to prevent such issues in production.
 
I don't think there is a separate update channel for Apache.

The update channel only applies to core DirectAdmin.

Custombuild is going to recommend the newest version of Apache independent of what DirectAdmin update channel you are on (well... as much as DirectAdmin's servers control Apache ... a new version of Apache could be released right now, but Custombuild won't show it until DirectAdmin's servers have the new version of Apache).

Right now the ONLY defense we have is to downgrade to Apache 2.4.51. I suppose you could argue that DirectAdmin should remove Apache 2.4.52 as an option from Custombuild, but that would wreak havoc for those who have already upgraded to Apache 2.4.52. I don't fault DirectAdmin at all in any of this. It's not DirectAdmin's responsibility to do R&D for every new software update. Plus they've already provided a convenient way to downgrade should anyone want to do so.

I gave my thoughts on the changes in server/mpm/event/event.c to illustrate where I think the Apache developers went wrong, or at least the general area where they went wrong. Ultimately the Apache developers need to be made aware of this issue and fix the flaw on their end. But I don't know how to get in touch with them, nor do I know exactly what to tell them. All I have is a theory that they mucked something up with the changes in server/mpm/event/event.c from Apache 2.4.51 to Apache 2.4.52.
 
My point, @sparek, was that Custombuild could maybe introduce update channels, so that unsuspecting users are kept from running into these issues and likely problems get rooted out first, the way RC versions of DA are tested before being released to the wider public.

Just a brain fart/idea, nothing more. Thank you for your insights! By the way, are you the Sparek from LET?
 
Regarding the suspicion that everybody having this issue is using event: no, I'm also using event, and I'm one of those not having issues.

Code:
Architecture:   64-bit
Server MPM:     event
threaded:     yes (fixed thread count)
forked:     yes (variable process count)

However, as you said before, it might have to do with how busy the server is.
 
Due to many complaints, we've reverted the version back to 2.4.51, if you'd like to get 2.4.52 installed, you'd need to add the following line to custom_versions.txt:
Code:
apache2.4:2.4.52:
 