Apache Fatal Error

Rick

Hello all,

We've had several problems on both Apache 1 and Apache 2 (currently installed).

Several times we've encountered the following error: [Mon Nov 15 00:47:44 2004] [error] (88)Socket operation on non-socket: apr_accept: (client socket)

Apache then suddenly stops working and all sites are down. A restart seems to solve the problem, but only for a short time (about 2 hours).

Has anyone encountered this problem? And more importantly, how can it be fixed?

Thank you all,
:)
 
Same error..

I'm getting the same error; I thought I'd add my info instead of starting a new thread.

Basic summary is that from the command line, apachectl and /usr/local/etc/rc.d/httpd both work fine with (start|stop|restart|-t): no errors are generated, all sites work, etc. When DirectAdmin tries to control httpd (i.e., through SHOW_SERVICES and Apache: (Start|restart|reload)), the main /usr/bin/httpd starts but then repeatedly respawns child processes as they die.
It *looks* like every spawn/death is logged in /var/log/httpd/error_log as a
Socket operation on non-socket: apr_accept: (client socket)
My *guess* is that something with the (pre)forking of the child/workers is going horribly wrong.

So my question is: where is DirectAdmin controlling /usr/bin/httpd from? As far as I can tell it simply calls /usr/local/etc/rc.d/httpd (start|stop|restart). But if that is the case, what flags or variables is it passing to httpd that cause it to go b0rked?

System info is here: http://kix.montereybay.com/~donavan/details.txt
 
Hi

Does anyone have a clue?

httpd stops daily and customers are not happy about it.

We can't find anything to fix this problem :(
 
Kind of "solution"

I have a horrible dirty "Solution" which seems to be working so far.

1) Stop Apache through DirectAdmin.
2) Disable /usr/local/etc/rc.d/httpd. I gzip'd it, but I imagine you could also do "chmod 500 /usr/local/etc/rc.d/httpd" so that the DirectAdmin user couldn't use it.
3) Use "apachectl startssl" to get httpd going again.

Caveats: Apache won't start on boot, there's no "watchdog" for httpd, and I haven't really tested this much. But it works so far (~30 hours).

I haven't looked into the actual *cause* of the problem any further, but I suspect it might be compile-time threading/child options. I'll update if I find anything else out.
 
Yep

Apache 1.3 and Apache 2.0 have the same error

httpd goes down every 2 hours :(

customers are leaving :(
 
Mainswitch,

You don't give nearly enough information to help keep you from losing customers.

What OS are you using?

Are you trying 1.x and 2.x on the same system?

What kind of errors are appearing in the apache logs?

What system load do you have when the system goes down?

If necessary you should output a grep of ps waux to a logfile every minute or so, so when it goes down you can see if you have any runaway processes.
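
A minimal sketch of the logging suggested above, assuming a POSIX shell. The function name `snapshot_httpd` and the default log path are made up for illustration, so adjust them to your system:

```shell
#!/bin/sh
# Hypothetical helper: append a timestamped snapshot of all httpd
# processes to a log file. The default path below is an assumption.
snapshot_httpd() {
    logfile="${1:-/var/log/httpd-ps.log}"
    {
        # Header line with the time and the load averages from uptime.
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') uptime: $(uptime 2>/dev/null) ==="
        # The [h] bracket keeps grep from matching its own process.
        ps waux 2>/dev/null | grep '[h]ttpd' || echo "(no httpd processes found)"
    } >> "$logfile"
}
```

Called from a small script that cron runs once a minute, this leaves a trail you can read back after the next outage to see how many httpd processes were around and what the load was.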

Jeff
 
jlasman said:
Mainswitch,

You don't give nearly enough information to help keep you from losing customers.

What OS are you using?

Are you trying 1.x and 2.x on the same system?

What kind of errors are appearing in the apache logs?

What system load do you have when the system goes down?

If necessary you should output a grep of ps waux to a logfile every minute or so, so when it goes down you can see if you have any runaway processes.

Jeff

We are using Fedora Core 2 as the operating system, and yes, we tried both Apache 1 and Apache 2 on the same system; Apache 1 had the same problem. Please see my opening post for the error we received when Apache "crashes".

The system load is very high due to all the hanging httpd processes that refuse to shut down. We've had the same number of users on a far slower system, and that was running almost without problems.

I will monitor the logfiles for the coming hours.

Thanks for your help, :)
 
Hi,

I am having the same problem on my DirectAdmin VPS. To answer the questions:

> What OS are you using?

Fedora Core 1 running DA v1.23.1 on a Virtuozzo server.

> Are you trying 1.x and 2.x on the same system?

Just 2.x

> What kind of errors are appearing in the apache logs?

[Tue Nov 23 02:07:00 2004] [notice] caught SIGTERM, shutting down
[Tue Nov 23 02:08:00 2004] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Tue Nov 23 02:08:01 2004] [error] (88)Socket operation on non-socket: apr_accept: (client socket)
[Tue Nov 23 02:08:01 2004] [error] (88)Socket operation on non-socket: apr_accept: (client socket)
[Tue Nov 23 02:08:01 2004] [error] (88)Socket operation on non-socket: apr_accept: (client socket)
[Tue Nov 23 02:08:01 2004] [error] (88)Socket operation on non-socket: apr_accept: (client socket)
[Tue Nov 23 02:08:01 2004] [notice] Apache/2.0.52 (Unix) mod_perl/1.99_17-dev Perl/v5.8.3 mod_ssl/2.0.52 OpenSSL/0.9.7a PHP/4.3.9 FrontPage/5.0.2.2634 configured -- resuming normal operations
[Tue Nov 23 02:08:01 2004] [error] (88)Socket operation on non-socket: apr_accept: (client socket)
[Tue Nov 23 02:08:02 2004] [error] (88)Socket operation on non-socket: apr_accept: (client socket)
[Tue Nov 23 02:08:03 2004] [error] (88)Socket operation on non-socket: apr_accept: (client socket)
[Tue Nov 23 02:08:03 2004] [error] (88)Socket operation on non-socket: apr_accept: (client socket)

> What system load do you have when the system goes down?

0.0. It's basically an empty server.

Also, this happens for me when I hit the restart link for httpd on the Show Services page. My guess is that it freaks out whenever DirectAdmin tries to restart Apache. For example, this happened at 00:10 this morning when DA ran one of its cron tasks:

10 0 * * * root echo 'action=tally&value=all' >> /usr/local/directadmin/data/task.queue

So that is why donavan's solution (to disable the httpd service) works.

However, doing a "service httpd restart" from the command line doesn't cause this problem to happen.

Let me know if there is any other information you might need. Thanks.

Hal
 
I have exactly the same problem: when I try to restart httpd via DirectAdmin, I get these messages in the httpd error log:

[Tue Nov 23 18:55:08 2004] [error] (38)Socket operation on non-socket: apr_accept: (client socket)

I'm running FreeBSD 5.3 and Apache 2.0.52, and it's a clean installation from 3 days ago, with one test domain on it.

Regards
Fabrizio
 
Note that I only run Apache 1.3.x on old distros, don't trust Fedora enough...

It's some kind of new problem I guess, Google only had 2 matches :(
Personally I haven't had it yet (and I'm not going to do any version upgrades until I know what is causing it ;)).

> httpd will stop daily and customers are not happy with it

Bad solution (I used this one before, when an Apache restart only held for about 10 hours): create a crontab entry that forces Apache to die and start over again.
It's a bad/ugly solution, but so far it has worked when I needed it...
 
bump...


Anyone have a non-hacked fix? It's obviously an issue with how DA restarts the service, as starting/stopping from the command line works like a champ.

I'm ready to put my DA box into production on Saturday and would really love to get this (my only problem) worked out.

--Josh
 
Not I

I haven't messed with this in a while now; still using the ghetto fix.

From what I recall, I tracked it down to an issue with Apache spawning children using the apr_socket call. I seem to recall that it's supposed to "clone" a socket the way it was being used in Apache.

I tried making a dummy httpd.sh which would echo all the calling args and system variables to syslog, but didn't see anything strange there. I think the next step would be trying a drop-in replacement for httpd.sh, or compiling Apache with a different thread model than prefork. IIRC "worker" runs on 5.3 now, but I don't know how stable it is, as it's still super beta code.
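
For anyone who wants to repeat that experiment, here is a rough sketch of such a logging shim, assuming a POSIX shell. The name `log_invocation` and the idea of writing to a plain file instead of syslog are illustrative assumptions, not what was actually used:

```shell
#!/bin/sh
# Hypothetical debugging shim: record how the init script was invoked.
# First argument is the log file; the rest are the caller's arguments.
log_invocation() {
    logfile="$1"
    shift
    {
        echo "--- $(date '+%Y-%m-%d %H:%M:%S') pid=$$ ppid=$PPID"
        echo "argc=$# args: $*"
        echo "environment:"
        # Sorted so that two captures are easy to diff.
        env | sort
    } >> "$logfile"
}
```

Inside a dummy httpd.sh you would call something like `log_invocation /var/tmp/httpd-wrapper.log "$@"` (the path is only an example) before handing off to the real script. Capturing one run started from DirectAdmin and one from the command line, then diffing the two, might show what differs between them. Piping the same lines to `logger -t httpd-wrapper` instead would send them to syslog.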
 
I wonder how many of you guys are running dual processor servers.

Just a wild idea.
 
ARGHHH!

Dammit. Well, here's my relevant uname and kernel config. I'll go ahead and try GENERIC.

%uname -a
FreeBSD chex.montereybay.com 4.10-RELEASE-p2 FreeBSD 4.10-RELEASE-p2 #0: Fri Sep 10 13:07:13 PDT 2004[email protected]:/usr/obj/usr/src/sys/SMPKERN i386
%diff SMPKERN /usr/src/sys/i386/conf/GENERIC

http://kix.montereybay.com/~donavan/kernel.txt

PS: If I had to guess, I'd say it was compiling in 686 mode, not 386. Or something *totally* wacky in the SMP code.

EDIT: Just to be clear, I am running SMP. IBM Netfinity - Dual P3 700 on 440LGX
 
Well, I have fixed the problem. Well, hacked it to fix it anyway. The problem doesn't appear to be with Apache at all, just with DirectAdmin using the new /etc/init.d/httpd (I use RHEL 3).

Simply move /etc/init.d/httpd to httpd.real,

then make a new httpd in that same directory with the same permissions, but only put these two lines in it:

#!/bin/sh
/etc/init.d/httpd.real $1 &> /dev/null

This will pass the correct parameter (start, stop, reload, etc.) to the real httpd init file, but redirect stdout and stderr to /dev/null. This allows DirectAdmin to start/stop the service without generating any of those nasty problems we have all received.

DA people: any idea what in DA or the Apache compilation would cause this? And when a fix would be available?

BTW: this fix still allows DA to do its normal functions, and allows the server to start/stop the service on bootup and shutdown.

Let me know if anyone has any questions.

--Josh

PS - I have a p4 2.4 with hyperthreading on my server.
 
This is what John answered to my mail regarding this problem:

Hello,

We are aware of that problem, but do not have any solutions as of yet. Google isn't providing any hints either.
For now I think the solution is to add a cron job that runs hourly or so to restart Apache, as the solution is beyond me at this point.
(or manually go back to 1.3.33)

Thank you,

John
--------------------------------------------------

DirectAdmin Web Control Panel
http://www.directadmin.com
 
Hello,

I've changed the 2 boot scripts we provide for apache 2:
http://files.directadmin.com/services/customapache/httpd_2
http://files.directadmin.com/services/customapache/httpd_2_freebsd

so give them a try. Changes include the ulimit commands and using the -k flag when starting apache.

John
 
No dice on that...

I think it has something to do with something being outputted (is that a word?) to stderr,
because
> does not work but
&> does (which redirects both stdout and stderr)
(the above two apply to my fix a few posts up)
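
To make the difference between the two redirections concrete, here is a small sketch assuming a POSIX shell. The `noisy` function is a made-up stand-in for the real init script, and the other two function names are hypothetical as well:

```shell
#!/bin/sh
# Stand-in for the real init script: writes one line to each stream.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# '>' alone redirects only stdout; stderr still reaches the caller.
only_stdout_silenced() {
    noisy > /dev/null
}

# Portable spelling of bash's '&>': send stdout to /dev/null,
# then point stderr at the same place.
both_silenced() {
    noisy > /dev/null 2>&1
}
```

Note that `&>` is a bash extension. It works where /bin/sh is really bash, but in a strictly POSIX shell `cmd &> file` is read as `cmd &` followed by `> file`, so the `> /dev/null 2>&1` form is the safer one to use in a wrapper that starts with `#!/bin/sh`.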

--Josh
 