[Thu Mar 18 04:57:17 2004] [notice] child pid 2644 exit signal Segmentation fault (11)

RSanders · Mar 17, 2004

Hi,

Tonight, one of our machines dropped offline. It filled /var/log/httpd/error_log with:

[Thu Mar 18 04:57:59 2004] [notice] child pid 23037 exit signal Segmentation fault (11)
[Thu Mar 18 04:58:08 2004] [notice] child pid 25877 exit signal Segmentation fault (11)
[Thu Mar 18 04:58:12 2004] [notice] child pid

I've checked the up2date log, and do not see any incorrect packages being installed lately.

No one was in the shell, but we do show the client logging in to the control panel at the same time this started.

I'm rebuilding Apache from DirectAdmin, but I think that's a long shot.

I'm still digging around, but this is odd...

RSanders · Mar 17, 2004

Well,

Rebuilding Apache brings the system back up, but we still don't know what took it down.

ProWebUK · Mar 17, 2004

Check that none of your log files are large (that can clog up Apache). Is there anything prior to those messages, or anything in /var/log/messages?

Also, did the entire system go down or just Apache? Your quote "rebuilding apache brings the system back up" makes me think that only Apache failed.

Chris
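
The log-size check suggested above can be done with a one-line find. This is only a sketch: the function name large_logs, the directory and the byte limit are illustrative assumptions, not anything DA ships, and the right threshold depends on the machine.

```shell
#!/bin/sh
# Sketch: list regular files under a directory that exceed a size threshold.
# Both arguments are illustrative; pick a directory and limit that suit the box.

large_logs() {
    dir=$1          # directory to scan, e.g. /var/log/httpd
    limit_bytes=$2  # report files strictly larger than this many bytes
    # -size +Nc is the POSIX form and means "more than N bytes"
    find "$dir" -type f -size +"${limit_bytes}c" 2>/dev/null
}

# Example (hypothetical limit of 1 GiB):
# large_logs /var/log/httpd 1073741824
```

An empty result means nothing under that directory is over the limit, which matches "none of the logs were very large".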

Icheb · Mar 17, 2004

This is really strange, but I'm starting to think this is DA related.
I see it more and more often; I manage the DA servers for a certain Dutch company, and one of those servers still has problems with this.

My own company's equipment hasn't had problems with it (the last Apache compile on all of my own company's machines was in Nov 2003).

ProWebUK: Apache just quits without leaving much in the logs, only those child exits. It simply dies, but DA doesn't detect it since the parent survives but no longer responds to requests (and no, it isn't reported as a zombie process). Restarting the Apache service works for a while, but within 24 hours you most likely have the same problem again...

ProWebUK · Mar 17, 2004

Have you tried an entire rebuild (removing the files as well)?

rm -rf /usr/lib/apache/
cd /usr/local/directadmin/customapache
rm -f configure.*
./build clean
./build update
./build all

Chris

RSanders · Mar 17, 2004

Just apache. None of the logs were very large.

cd /usr/local/directadmin/customapache
./build clean
rm -f configure.*
./build update
./build all

and it pops to life. Also, afaik no one has made any system changes to the machine. Nothing that I can see was installed, etc., that might have got in the way.

DirectAdmin Support · Mar 18, 2004

Hello,

What OS are you using? We've recently changed around the configure.apache_ssl script. The old one may have been causing problems, so try out the new one. (Affected RedHat 9)

DA only restarts Apache and changes the user httpd.conf files. Perhaps the old configure.apache_ssl script combined with a certain condition caused the segfault. Hard to say.

John

RSanders · Mar 18, 2004

$ cat /etc/redhat-release
Red Hat Linux release 7.3 (Valhalla)

*shrug*

I do have an RH 9 box that drops off almost nightly. Strangest thing:
service httpd status
shows pids,
stop it, then status again,
shows _different_ pids,
stop it again, status shows stopped,
start it, and it runs *shrug*

Odd, but I still bought yet another license tonight. I have confidence in ya

RSanders · Mar 18, 2004

I should mention, I have seen that RH 9 oddity on non-DA machines too. Usually the first status shows one pid and the second shows half a dozen or more; stopping twice and then starting always brings it up.

DirectAdmin Support · Mar 18, 2004

What time of the day do they start to happen? Wondering if the cpu usage from the dataskq just after midnight is causing it.

John

RSanders · Mar 18, 2004

For the first issue it was 2300 EST; the machine thought it was 0415 EST (all my fault).

The second issue mentioned happens around 0000, with the correct system time. So you might be on to something. Here's the monitor output for one machine:

HTTP OK 03-17-2004 00:21:47 HTTP ok: HTTP/1.1 200 OK - 0.007 second response time
HTTP OK 03-17-2004 00:21:47 HTTP ok: HTTP/1.1 200 OK - 0.007 second response time
HTTP CRITICAL 03-17-2004 00:18:58 Socket timeout after 10 seconds
HTTP CRITICAL 03-17-2004 00:18:58 Socket timeout after 10 seconds

HTTP OK 03-10-2004 00:19:27 HTTP ok: HTTP/1.1 200 OK - 0.004 second response time
HTTP OK 03-10-2004 00:19:27 HTTP ok: HTTP/1.1 200 OK - 0.004 second response time
HTTP CRITICAL 03-10-2004 00:16:37 Socket timeout after 10 seconds
HTTP CRITICAL 03-10-2004 00:16:37 Socket timeout after 10 seconds

HTTP OK 03-09-2004 00:20:27 HTTP ok: HTTP/1.1 200 OK - 0.004 second response time
HTTP OK 03-09-2004 00:20:27 HTTP ok: HTTP/1.1 200 OK - 0.004 second response time
HTTP CRITICAL 03-09-2004 00:17:37 Socket timeout after 10 seconds
HTTP CRITICAL 03-09-2004 00:17:37 Socket timeout after 10 seconds

If I do say so myself, our response time for service failures on managed machines is pretty fast

DirectAdmin Support · Mar 18, 2004

Hmm. Looks to me as though it may be the log rotator. Just a hunch, but since the dataskq takes a while to chug through all the data, Apache isn't restarted for quite some time. Before it's restarted, the rotated logs are tar.gz'ed and copied over, then the original log is deleted. Now, for the error logs, Apache holds the file descriptor open, so if it tries to write to the deleted file before it is restarted, I'm not sure what would happen. On our 7.2 build system, nothing happens. But I can't really say for other ones..

Perhaps do a test.. try deleting someone's error log file, then try generating errors (page not found errors, or php).. see if it generates a segfault.

That's just a guess.. it's the only real thing that I can think of which might cause apache to go down at that time.. as it's the only thing that DA does to apache, other than restart it after the tally.

John
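
For anyone who would rather not try that on a live box, the filesystem half of the test can be sketched on a scratch file. This assumes a POSIX shell with mktemp; the function name, the file name and the error text are made up for illustration. It only shows what the kernel does with a write to a deleted file that is still held open. It does not exercise Apache, so it cannot confirm or rule out the segfault.

```shell
#!/bin/sh
# Sketch: write through a descriptor whose path has been deleted.
# Works on a throwaway temp directory, never on a real Apache log.

write_after_delete() {
    dir=$(mktemp -d) || return 1
    log="$dir/error_log"
    : > "$log"                    # create an empty stand-in log
    exec 3>>"$log"                # hold it open for appending, as a daemon would
    rm -f "$log"                  # "rotate" by deleting the path
    if echo "File does not exist: /missing" >&3; then
        result="write succeeded"
    else
        result="write failed"
    fi
    exec 3>&-                     # closing the last descriptor frees the data
    if [ -e "$log" ]; then
        result="$result, path still exists"
    else
        result="$result, path is gone"
    fi
    rm -rf "$dir"
    echo "$result"
}

write_after_delete
```

On an ordinary local Linux filesystem this reports that the write succeeded and the path is gone: the data goes to the unlinked inode and is discarded when the descriptor is closed. That fits the "nothing happens" result on the 7.2 build system, though it says nothing about how a particular Apache build reacts.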

RSanders · Mar 18, 2004

I'll keep my eye on it. I would rather not 'test' too much on these as they are production machines. The one time I test something is the time they are doing something 'important' *sighs*

But, I will watch it a bit closer and see what I can find out. Next time one locks up it's open season. If it's down from a failure, I can justify taking 5 minutes to check it out.

arvydas · Mar 31, 2004

DirectAdmin Support said:
Now, for the error logs, Apache holds the file descriptor open, so if it tries to write to the deleted file before it is restarted, I'm not sure what would happen. On our 7.2 build system, nothing happens. But I can't really say for other ones..

John

Isn't it better to truncate log files instead of deleting them? This would also avoid having to restart Apache.

DirectAdmin Support · Mar 31, 2004

How do you mean? Anything other than appending would rewrite the file from zero and create a different link in the filesystem, essentially deleting the file and starting over, causing a dangling file descriptor...

John

arvydas · Mar 31, 2004

DirectAdmin Support said:
How do you mean? Anything other than appending would rewrite the file from zero and create a different link in the filesystem, essentially deleting the file and starting over, causing a dangling file descriptor...

John

Below is an example of Perl code that truncates a file. It doesn't delete the file and it doesn't leave a dangling file descriptor; it just sets the file size to zero. This code was tested on Apache logs and it works perfectly.

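use Fcntl qw(:flock);                  # provides LOCK_EX and LOCK_UN

open(FILE, "+<access.log") or die;     # read/write; does not create or clobber the file
flock(FILE, LOCK_EX) or die;           # exclusive lock (the literal 2)
seek(FILE, 0, 0) or die;               # rewind to the start
truncate(FILE, 0) or die;              # set the size to zero in place
flock(FILE, LOCK_UN) or die;           # release the lock (the literal 8)
close(FILE) or die;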
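
The claim that truncation keeps the same file can also be checked from the shell. A rough sketch, assuming the writer holds the log open in append mode; the function name and file contents are invented, and unlike the Perl above it does no locking.

```shell
#!/bin/sh
# Sketch: truncate a log in place while a writer still has it open.
# Scratch files only; nothing here touches a real Apache log.

truncate_in_place_demo() {
    dir=$(mktemp -d) || return 1
    log="$dir/access.log"
    echo "old entry" > "$log"
    exec 3>>"$log"                            # writer opened in append mode
    before=$(ls -i "$log" | awk '{print $1}') # inode number before truncation
    : > "$log"                                # truncate to zero length in place
    echo "new entry" >&3                      # writer keeps using its old descriptor
    exec 3>&-
    after=$(ls -i "$log" | awk '{print $1}')  # inode number after truncation
    if [ "$before" = "$after" ]; then
        echo "same inode"
    else
        echo "different inode"
    fi
    cat "$log"                                # what a reader of the path now sees
    rm -rf "$dir"
}

truncate_in_place_demo
```

This prints "same inode" followed by "new entry": the path still points at the original file, and the writer's next line lands at the start of the emptied log. Append mode matters here. A writer that opened the file without it would keep its old offset and leave a hole at the front of the file after truncation.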

DirectAdmin Support · Apr 1, 2004

OK, I'll implement the in-place log truncation and see what happens.

http://www.directadmin.com/features.php?id=359

Thanks,

John

koos · Apr 12, 2004

sorry wrong post

duke27 · Jul 9, 2004

My server does the same thing.

My httpd crashes every day,

sometimes at around 20:00
and sometimes at around midnight....

What do I need to do???

I rebuilt it all

and it did it again....

duke27 · Jul 9, 2004

log

[Thu Jul 8 20:13:45 2004] [notice] child pid 31459 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31460 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31461 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31462 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31463 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31464 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31465 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:45 2004] [notice] child pid 31468 exit signal Segmentation fault (11)
[Thu Jul 8 20:13:46 2004] [notice] child pid 31474 exit signal Segmentation fault (11)

OR

[Fri Jul 9 00:15:16 2004] [crit] (98)Address already in use: make_sock: could not bind to port 8090

Seriously, what can I do???
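
One way to see whether the crashes cluster around the nightly tally is to count the segfault notices per minute. A sketch only: the function name is made up, and it assumes the error_log line format shown in this thread.

```shell
#!/bin/sh
# Sketch: count "Segmentation fault" notices per minute in an Apache error_log.
# Assumes lines that start like: [Thu Jul 8 20:13:45 2004] [notice] ...

segfaults_per_minute() {
    # $1 = path to the error_log to scan
    awk '/exit signal Segmentation fault/ {
        # whitespace-split fields: $2 = month, $3 = day, $4 = HH:MM:SS
        split($4, t, ":")
        key = $2 " " $3 " " t[1] ":" t[2]
        if (!(key in n)) order[++k] = key   # remember first-seen order
        n[key]++
    }
    END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }' "$1"
}

# Example:
# segfaults_per_minute /var/log/httpd/error_log
```

Each output line is a month, day and minute followed by the number of child exits in that minute, so a burst just after midnight or at a fixed hour stands out quickly.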
