[Thu Mar 18 04:57:17 2004] [notice] child pid 2644 exit signal Segmentation fault (11)

RSanders

Hi,

Tonight, one of our machines dropped offline. It filled /var/log/httpd/error_log with:

[Thu Mar 18 04:57:59 2004] [notice] child pid 23037 exit signal Segmentation fault (11)
[Thu Mar 18 04:58:08 2004] [notice] child pid 25877 exit signal Segmentation fault (11)
[Thu Mar 18 04:58:12 2004] [notice] child pid

I've checked the up2date log, and do not see any incorrect packages being installed lately.

No one was in the shell, but we do show the client logging in to the control panel at the same time this started.

I'm rebuilding Apache from DirectAdmin, but I think that's a long shot.

I'm still digging around, but this is odd...
 
Well,

rebuilding apache brings the system back up, but we still don't know what took it down.
 
Check that none of your log files are large (that can clog up Apache). Is there anything prior to those messages, or anything in /var/log/messages?

Also, did the entire system go down or just Apache? Your quote "rebuilding apache brings the system back up" makes me think that it was only Apache that failed.

Chris
 
This is really strange, but I'm starting to think this is DA related.
I see it more and more often; I manage the DA servers for a certain Dutch company, and one of those servers still has problems with this.

My own company's equipment hasn't had problems with it (the last Apache compile on all of my own company's machines was in Nov 2003).

ProWebUK: Apache just quits without leaving much in the logs, only those child exits. It simply dies, but DA doesn't detect it since the parent survives but no longer responds to requests (and no, it's not reported as a zombie process). Restarting the Apache service works for a while, but within 24 hours you most likely have the same problem again...
 
Have you tried an entire rebuild (removing files as well)?

rm -rf /usr/lib/apache/
cd /usr/local/directadmin/customapache
rm -f configure.*
./build clean
./build update
./build all

Chris
 
Just apache. None of the logs were very large.

cd /usr/local/directadmin/customapache
./build clean
rm -f configure.*
./build update
./build all

and it pops to life. Also, AFAIK no one has made any system changes to the machine. Nothing that I can see was installed, etc., that might have got in the way.
 
Hello,

What OS are you using? We've recently changed around the configure.apache_ssl script. The old one may have been causing problems, so try out the new one. (Affected RedHat 9)

DA only restarts Apache and changes the user httpd.conf files. Perhaps the old configure.apache_ssl script combined with a certain condition caused the segfault. Hard to say.

John
 
$ cat /etc/redhat-release
Red Hat Linux release 7.3 (Valhalla)

*shrug*

I do have a 9 box that drops off almost nightly. Strangest thing:
service httpd status
shows pids,
stop it, then status again,
shows _different_ pids,
stop it again, status shows stopped,
start it, and it runs *shrug*

Odd, but I still bought yet another license tonight. I have confidence in ya ;)
 
I should mention, I have seen that oddity on RH 9 on non-DA machines too. Usually, the first status shows one pid, the second shows half a dozen or more, and stopping twice and starting always brings it up.
 
What time of the day do they start to happen? Wondering if the cpu usage from the dataskq just after midnight is causing it.

John
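
The time-of-day question can also be answered from the error log itself by tallying the child-exit lines per hour. A minimal sketch, assuming the error_log format quoted earlier in this thread (time of day in the fourth whitespace-separated field); the function name segfaults_per_hour is made up for illustration:

```shell
# Count "Segmentation fault" child exits per hour of day.
# Reads an Apache error_log on stdin; prints "HH count" lines sorted by hour.
# segfaults_per_hour is a hypothetical helper name, not part of Apache or DA.
segfaults_per_hour() {
    awk '/exit signal Segmentation fault/ {
             split($4, t, ":")      # $4 is HH:MM:SS in "[Thu Mar 18 04:57:59 2004]"
             count[t[1]]++
         }
         END { for (h in count) print h, count[h] }' | sort
}

# Usage (not run here): segfaults_per_hour < /var/log/httpd/error_log
```

A cluster in one hour, for example just after midnight, would point at a scheduled job rather than at random traffic.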
 
It was 2300 EST; the machine thought it was 0415 EST (all my fault) for the first issue.

The second issue mentioned happens around 0000, with the correct system time. So you might be on to something. Here's the monitor for one machine.


HTTP OK 03-17-2004 00:21:47 HTTP ok: HTTP/1.1 200 OK - 0.007 second response time
HTTP OK 03-17-2004 00:21:47 HTTP ok: HTTP/1.1 200 OK - 0.007 second response time
HTTP CRITICAL 03-17-2004 00:18:58 Socket timeout after 10 seconds
HTTP CRITICAL 03-17-2004 00:18:58 Socket timeout after 10 seconds

HTTP OK 03-10-2004 00:19:27 HTTP ok: HTTP/1.1 200 OK - 0.004 second response time
HTTP OK 03-10-2004 00:19:27 HTTP ok: HTTP/1.1 200 OK - 0.004 second response time
HTTP CRITICAL 03-10-2004 00:16:37 Socket timeout after 10 seconds
HTTP CRITICAL 03-10-2004 00:16:37 Socket timeout after 10 seconds


HTTP OK 03-09-2004 00:20:27 HTTP ok: HTTP/1.1 200 OK - 0.004 second response time
HTTP OK 03-09-2004 00:20:27 HTTP ok: HTTP/1.1 200 OK - 0.004 second response time
HTTP CRITICAL 03-09-2004 00:17:37 Socket timeout after 10 seconds
HTTP CRITICAL 03-09-2004 00:17:37 Socket timeout after 10 seconds

If I do say so myself, our response time for service failures on managed machines is pretty fast :)
 
Hmm. Looks to me as though it may be the log rotator. Just a hunch, but since the dataskq takes a while to chug through all the data, apache isn't restarted for quite some time. But before it's restarted the rotated logs are tar.gz'ed and copied over, then the original log is deleted. Now, for the error logs, apache holds the filedescriptor open, so if it's trying to write to the deleted file before apache is restarted, I'm not sure what would happen. On our 7.2 build system, nothing happens. But I can't really say for other ones..

Perhaps do a test.. try deleting someone's error log file, then try generating errors (page not found errors, or php).. see if it generates a segfault.

That's just a guess.. it's the only real thing that I can think of which might cause apache to go down at that time.. as it's the only thing that DA does to apache, other than restart it after the tally.

John
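
What the operating system does when a process keeps writing to a log that has been deleted can be checked safely away from production. A minimal sketch, assuming a POSIX shell and a scratch directory from mktemp; it does not involve Apache at all, so it only shows the OS-level behaviour, not whether a particular Apache build copes with it:

```shell
# Keep a file descriptor open on a log, delete the log, then write again.
scratch=$(mktemp -d)
log="$scratch/error_log"

exec 3>>"$log"                 # fd 3 plays the role of Apache's open error log
echo "before delete" >&3
rm -f "$log"                   # the rotation step that removes the original file

if echo "after delete" >&3; then
    write_result="ok"          # the write went to the now-unlinked file
else
    write_result="failed"
fi

[ -e "$log" ] && path_state="present" || path_state="gone"
exec 3>&-                      # closing the descriptor finally releases the data
rm -rf "$scratch"
echo "write: $write_result, path: $path_state"
```

On a typical Linux system the write succeeds and the path stays gone, so anything logged between the deletion and the restart is silently lost rather than rejected. If segfaults do follow a deleted log on a given machine, the cause would have to be in the httpd build or its modules, not in the write itself.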
 
I'll keep my eye on it. I would rather not 'test' too much on these as they are production machines. The time I test something is the time they are doing something 'important' *sighs*


But, I will watch it a bit closer and see what I can find out. Next time one locks up it's open season. If it's down from a failure, I can justify taking 5 minutes to check it out.
 
DirectAdmin Support said:
Now, for the error logs, apache holds the filedescriptor open, so if it's trying to write to the deleted file before apache is restarted, I'm not sure what would happen. On our 7.2 build system, nothing happens. But I can't really say for other ones..

John

Isn't it better to truncate log files instead of deleting them? This would also avoid having to restart Apache.
 
How do you mean? Anything other than appending would rewrite the file from zero and create a different link in the filesystem, essentially deleting the file and starting over causing a dangling filedescriptor...

John
 
DirectAdmin Support said:
How do you mean? Anything other than appending would rewrite the file from zero and create a different link in the filesystem, essentially deleting the file and starting over causing a dangling filedescriptor...

John

Below is an example of Perl code that truncates a file. It doesn't delete the file and it doesn't leave a dangling file descriptor; it just sets the file size to zero. This code was tested on Apache logs and it works perfectly.

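open(FILE, "+<access.log") or die;   # open read/write without creating or clobbering
flock(FILE, 2) or die;               # 2 = LOCK_EX, take an exclusive lock
seek(FILE, 0, 0) or die;             # rewind to the start of the file
truncate(FILE, 0) or die;            # set the size to zero; same file, same inode
flock(FILE, 8) or die;               # 8 = LOCK_UN, release the lock
close(FILE) or die;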
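
The difference between truncating and deleting can be seen by comparing inode numbers before and after. A minimal sketch in shell, assuming `ls -i` for the inode number and a scratch file; `: > file` stands in for the Perl truncate here and is an assumption of this sketch, not something tested in the thread:

```shell
# Truncate a log in place while a writer holds it open in append mode.
scratch=$(mktemp -d)
log="$scratch/access.log"

exec 4>>"$log"                          # the writer: append mode, as log writers usually are
echo "old entry" >&4
inode_before=$(ls -i "$log" | awk '{print $1}')

: > "$log"                              # truncate to zero length; no unlink, no new file
echo "new entry" >&4                    # the same descriptor keeps working

inode_after=$(ls -i "$log" | awk '{print $1}')
contents=$(cat "$log")
exec 4>&-
rm -rf "$scratch"
echo "inode $inode_before -> $inode_after, contents: $contents"
```

Append mode matters in this sketch: a writer that opened the file without it would keep its old offset and leave a run of NUL bytes at the start of the truncated file.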
 
My server does the same thing.

My httpd crashes every day, sometimes at around 20:00 and sometimes at around midnight.

What do I need to do?

I rebuilt everything and it did it again.
 
log

[Thu Jul 8 20:13:45 2004] [notice] child pid 31459 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31460 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31461 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31462 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31463 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31464 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31465 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31468 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:46 2004] [notice] child pid 31474 exit signal Segmentation fault (11)



OR

[Fri Jul 9 00:15:16 2004] [crit] (98)Address already in use: make_sock: could not bind to port 8090
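
The (98)Address already in use line means something was still bound to port 8090 when httpd tried to start, possibly an older httpd process that never exited. A minimal sketch for checking that, assuming `netstat -tln` style output where the local address is the fourth column; port_listening is a hypothetical helper name:

```shell
# Succeeds (exit 0) if the listing on stdin shows a listener on the given port.
# port_listening is a hypothetical helper; it only parses text, it runs nothing.
port_listening() {
    awk -v port="$1" '
        $4 ~ (":" port "$") { found = 1 }   # local address column ends in :PORT
        END { exit !found }
    '
}

# Usage (not run here): netstat -tln | port_listening 8090 && echo "8090 is taken"
```

If the port is taken while the service reports itself as stopped, finding and stopping the process that holds it before starting httpd again is the likely next step.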


What can I do, seriously???
 