DirectAdmin and System Crash

AsadMoeen · Dec 3, 2010

Thanks.

Waiting to hear back on Post #18 and #19.

AsadMoeen · Dec 6, 2010

We're closing in on this.

Since everything is default (OS and DirectAdmin), nothing else seems to be the issue except the hardware libraries (virtualization), since I'm running the machine under VMware Workstation.

As a final check, first I've asked DA for some websites running a default DirectAdmin, so I can check whether they go down the same way mine does when I run that software.

Second, I'm going to upgrade my minimal install to a full one.

If both checks pass, I can be sure the hardware libraries or VMware Workstation are at fault. I'll then ask virtualization experts who run DA on VPSs, and finally either get a new machine or fix the current one.

One more thing,

Today the same issue happened, but without running the link-builder software. It happened randomly. I tracked it down: I went to the IP/server-status page and it said:

503 error. Service unavailable due to maintenance or capacity problem.

Went to SSH; RAM usage was 1.5GB, with 1.5GB free out of 3GB total.

Ran top and saw fewer Apache processes than before, but a single process was using 98% CPU. It came down by itself and services kept running. Really pissed off with Linux now.
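For anyone following along, a rough way to put numbers on what top shows is to count the Apache processes and add up their resident memory. This is only a sketch: the helper name `summarize_procs` is made up for illustration, and it assumes the Apache binary shows up as `httpd` in `ps` (on some systems it is `apache2`).

```shell
#!/bin/sh
# summarize_procs NAME: read "command rss_kb" pairs on stdin and report how
# many processes match NAME and how much resident memory they hold in total.
# NAME defaults to "httpd" (an assumption; it may be "apache2" on your system).
summarize_procs() {
    awk -v name="${1:-httpd}" '
        $1 == name { n++; kb += $2 }
        END { printf "%d processes, %d MB resident\n", n, kb / 1024 }
    '
}

# Live usage (prints one summary line):
#   ps -e -o comm= -o rss= | summarize_procs httpd
```

Summing RSS counts memory shared between children more than once, so the total overstates real usage. It is still handy for watching the trend while the problem is happening.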

AsadMoeen · Dec 7, 2010

OK, I'm still hoping someone can reply to the previous post, but here's an update.

I've upgraded my minimal system to full and added 3GB of swap.

I now have 6GB of memory in total (3GB RAM plus 3GB swap).

I see an amazing and ridiculous thing happening now.

It's not specific to that software. Something triggers Apache, and then its processes never stop.

I ran the software again today, apache processes started growing.

CPU usage 20% and stable.
RAM: just 120 MB of real RAM free, with the whole 3GB of swap free.

Websites: still not working, although there is plenty of CPU and memory headroom left.

So it's something Apache related, possibly a bug in Apache or something else.

Does anybody think my Apache needs to be upgraded?
Or, since I'm running under VMware Workstation, could that be the issue (hardware related)?

DirectAdmin Support · Dec 7, 2010

This is a random guess, but try changing your file:
/etc/httpd/conf/extra/httpd-mpm.conf

in the mpm_prefork_module, change:
MaxRequestsPerChild 10000

to be:
MaxRequestsPerChild 1

Also, if you're not running the latest version of Apache, then yes, update.

Code:

cd /usr/local/directadmin/custombuild
./build update
./build all d

John

zEitEr · Dec 7, 2010

@AsadMoeen

Since you're running a virtual server, it would be good to also install Munin on the hardware node to see the whole picture.

How many VMs are you running there?

Munin shows a jump in iostat at the same time your box begins to use swap. Pay attention to IOwait, which went up to 83%. Your eth0 traffic also rose to 110Mbit/s.

Did you check your HDD(s) for bad blocks? Did you run any RAM tests? Are there any hardware errors logged in the system logs or on your console?

You can try bonnie++ for testing your HDD(s).

AsadMoeen · Dec 7, 2010

The host machine for the VPS runs Windows and works fine. I keep an eye on the host's CPU/RAM and that has not been an issue. I'm running just a single VM, so that's not an issue either.

About the network: I did run a full system upgrade, so maybe that explains why network usage was high today. And IOwait perhaps went high because of the unbounded Apache processes?

About hard disk or RAM issues: I've run checks on them too, and they passed. I've done all the tests I could, and I've been at it for 2-3 weeks now. That's why I'm settling on some hardware library or kernel compatibility issue. And I consider this a real possibility because:

When the system crashes, it ends up giving the following error:

INFO: task httpd:xxxx blocked for more than 120 seconds
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

As for why I think it's a hardware/kernel issue, that's simple. Everything is default: a clean OS install with a clean default DirectAdmin. Nothing changed, no extra packages or software. Thousands of people run it on the basic configuration without problems like these, so I think the hardware is the cause rather than the software. I've already posted a message about it on the Debian forum. I will now ask virtualization experts who run DA on VPSs; maybe they'll know the cause.
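To see how often the kernel reports the hung-task error quoted above, the matching lines can be counted in the kernel log. A minimal sketch: `count_hung_tasks` is a hypothetical helper name, and it only matches the message text shown in this post.

```shell
#!/bin/sh
# count_hung_tasks: read kernel log text on stdin and print how many
# "blocked for more than N seconds" hung-task reports it contains.
count_hung_tasks() {
    grep -c 'blocked for more than [0-9]* seconds'
}

# Live usage (may need root on some systems):
#   dmesg | count_hung_tasks
```

A count that keeps climbing while IOwait is high would point at processes stuck waiting on disk rather than at Apache's own settings.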

And yes, I've set MaxRequestsPerChild to 1.
I did run an update. It looks like Apache was already up to date, but some other packages were upgraded.

zEitEr · Dec 7, 2010

If hardware is OK, you'd better lower:

ServerLimit 450
MaxClients 450

to keep Apache from crashing your server.
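A common rule of thumb (not something from DA's documentation) is to size MaxClients so that all children together fit in the RAM you can spare for Apache. The sketch below just does that division. The function name and the example numbers are made up, so measure your own per-child memory first.

```shell
#!/bin/sh
# estimate_max_clients TOTAL_MB RESERVED_MB PER_CHILD_MB
# Prints how many Apache children fit in (TOTAL_MB - RESERVED_MB) of RAM
# if each child uses about PER_CHILD_MB. Integer division rounds down.
estimate_max_clients() {
    total_mb=$1
    reserved_mb=$2
    per_child_mb=$3
    echo $(( (total_mb - reserved_mb) / per_child_mb ))
}

# Example with hypothetical numbers: 3072 MB RAM, 512 MB kept back for the
# OS and MySQL, about 25 MB per Apache child.
#   estimate_max_clients 3072 512 25    # prints 102
```

ServerLimit would then be set to the same value as MaxClients, as in the two lines above.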

I'm no expert with VMware, but perhaps some settings there should be updated.

AsadMoeen · Dec 7, 2010

Apache spawning too many processes seems likely to be the issue.

I'm guessing it's related to the child processes.

I'm trying MaxRequestsPerChild set to 1 now, as requested by Support.

zEitEr · Dec 7, 2010

OK, please update us with new information as soon as possible.

AsadMoeen · Dec 7, 2010

Read the next Post

AsadMoeen · Dec 8, 2010

OK, it's still the same, but things are better now. The websites start working again!

I've discovered something new. This time I did the following:

I made 140+ requests to the website, and all the websites stopped working. But this time, as soon as the requests stopped, the websites started working again. There were still a lot of Apache processes in top and RAM usage was still very high, but the websites did work! Within about 2 minutes, Apache dropped back to 1 process and RAM usage went down.

Previously, even after the requests finished, the websites did not work.

The only thing I changed between then and now was to restore all the httpd confs.

But along with that, I had changed one other thing in the default config:

MaxRequestsPerChild to 1000 instead of 10000.

Now I've restored MaxRequestsPerChild to 10000 and I see no change.

So maybe it wasn't due to MaxRequestsPerChild, but due to the restored confs.

One more thing I'd like to ask:

While the requests are being made (the software is running), the websites go down, and they come back as soon as the software is stopped (but the requests don't stop for another 2 mins or so). By this I mean that even though the software is stopped and the websites start working, I can still see a lot of Apache processes and 150+ requests in server-status for about 3 more minutes.

Maybe this is related to KeepAlive and Timeout. I see some people turn KeepAlive off. I'd like to know more about this, and any comments you can give. I now have a default DA. I'm happy it's better now, but I'll keep trying the software for a couple more days before declaring the issue fixed. The next step, if it's fixed, will of course be to optimize Apache further.
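Timeout controls how long Apache waits on a stalled connection, and KeepAliveTimeout controls how long a child waits for the next request on a kept-alive connection, so both affect how long processes linger after the traffic stops. Before changing anything, it helps to see what is currently set. A small sketch: `show_keepalive_settings` is a hypothetical helper, and the config path in the usage line is an assumption about where a DA box keeps these defaults.

```shell
#!/bin/sh
# show_keepalive_settings FILE...: print the active (uncommented) Timeout and
# KeepAlive-related directives found in the given Apache config files.
show_keepalive_settings() {
    grep -hiE '^[[:space:]]*(Timeout|KeepAlive|KeepAliveTimeout|MaxKeepAliveRequests)[[:space:]]' "$@"
}

# Usage (path is an assumption, adjust to your layout):
#   show_keepalive_settings /etc/httpd/conf/extra/httpd-default.conf
```

Lines starting with `#` are skipped, so only the directives that are actually in effect in those files are listed.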

So I think this explains why the default config works better than everything else. Similarly, if I run this against other boxes hosting a default DA, if they have the capacity, they will stop working and come back as soon as the requests stop. I've already tried this on one of DirectAdmin's websites and that stopped working too.

Also, is there anything I can use to keep the websites working even when the requests are high ?

tomtom901 · Dec 8, 2010

Maybe some kind of bash script? http://bash.cyberciti.biz/monitoring/monitor-unix-linux-system-load/. You could just let it restart apache, but I'd rather say, try it on another VM or something. Maybe it is just your hardware.
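A minimal sketch of that idea, not the script from the link: compare the 1-minute load average against a threshold and restart Apache when it is exceeded. The helper name `load_above`, the threshold of 10 and the restart command are all assumptions.

```shell
#!/bin/sh
# load_above LOAD THRESHOLD: succeed (exit 0) if LOAD is greater than THRESHOLD.
# awk is used because the shell's own arithmetic cannot compare decimals.
load_above() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

# Intended to run from cron on Linux. The threshold and the restart command
# are assumptions; use whatever restarts Apache on your box.
#   read -r load1 rest < /proc/loadavg
#   if load_above "$load1" 10; then service httpd restart; fi
```

This only masks the symptom, so it is a stopgap while the real cause is being tracked down.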

zEitEr · Dec 9, 2010

AsadMoeen said:
Also, is there anything I can use to keep the websites working even when the requests are high ?

Frontend (NGINX/LIGHTTPD) + Backend (Apache)

More information is available at these links:

http://kovyrin.net/2006/05/18/nginx-as-reverse-proxy/lang/en/
http://wiki.joyent.com/smartmachine:nginx_apache_proxy

You can also google for more links.

AsadMoeen · Dec 9, 2010

Well, that would be going out of my way.

I want to stick with how it is and just optimize it.

I am back to default configs and my system is now working like any other default DA.

I'm happy it's working better now. When I run the software, the websites become inaccessible but come back as soon as the software stops. This happened with DA's website too, so I think that's common between me and everyone else.

The child processes grow, and when the requests stop, they quit.

So I need someone to tell me some optimization values for httpd. Once again, KeepAlive, Timeout and the child process limits are what matter most. Websites on other shared hosts, like HostGator and even the free 000webhost, don't go down when I run that software. They're certainly optimized.

AsadMoeen · Dec 10, 2010

If anyone has a box running a default DA, please provide me your website address, a sitemap and the server-status page.

If your DA is on default configs and nothing has been done to protect it, then it's possible that once a certain number of Apache requests is exceeded, your website will go down. I want to test this and submit the details to DA support. We are looking for a way to protect against this.

AsadMoeen · Dec 12, 2010

One more thing I just found out today:

I installed DA on a new CentOS 5.5 box with 512MB RAM + 1GB swap, just for testing purposes. I ran the software again and the websites became inaccessible.

So I must say this is not a Debian- or CentOS-specific bug, or even a hardware fault. But to be more sure, I'm going to buy a pre-installed DirectAdmin VPS for $5 and test on that too, so I can be sure VMware's virtual hardware or libraries are not the cause.

As for what else the cause could be (possibly an Apache bug):

Well, let's say we should each analyze this on our own DA boxes, and I'm sure it will show up the same. I sent just 90 requests to my website and it went down for some time and came back by itself later.

Now, analyzing the server-status page:

I could see about 90 W's (the symbol for Sending Reply), so even though RAM/CPU is free, since these are just 90 requests, the websites don't work.
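Counting the W's by eye gets tedious. mod_status also has a machine-readable view at `server-status?auto` that includes a `Scoreboard:` line, so the count can be scripted. A rough sketch: `count_state` is a made-up helper name, and the URL in the usage line is an assumption.

```shell
#!/bin/sh
# count_state CHAR: read mod_status "?auto" output on stdin and print how many
# scoreboard slots are in state CHAR (for example W = sending reply).
count_state() {
    sed -n 's/^Scoreboard: //p' | tr -cd "$1" | wc -c | tr -d ' '
}

# Live usage; the URL is an assumption and server-status must be reachable:
#   curl -s 'http://localhost/server-status?auto' | count_state W
```

Running it every few seconds during a test shows whether the W slots hit the MaxClients ceiling while CPU and RAM still look free.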

That's why I need your help. Please send me your DA information: websites and sitemaps.

zEitEr · Dec 12, 2010

Report any Apache bug to its community or mailing lists.

LawsHosting · Dec 12, 2010

AsadMoeen said:
I ran the software again

What software? I take it it's some sort of pseudo-DDoS thingy?

AsadMoeen · Dec 12, 2010

Peter Laws said:
What software? I take it it's some sort of pseudo-DDoS thingy?

Not a DDoS program. It's actually an SEO tool called "PRstorm". You give it a referring URL and a list of URLs that you want it to harvest. I gave it some URLs from my sitemap, but you can also use its URL extractor to pull some URLs from the website. I run blogs.

Also, sometimes the issue happens when the software is not in use and the requests come in as normal traffic. Once again, the httpd.conf settings are not the cause, because I've already tested that by lowering the values in mpm.conf considerably so that RAM/CPU stays free. I lowered the values so that Apache does not consume all of the RAM. Then I ran the software again, and although CPU/RAM was 70% free, the websites went down.

The software is just a way to test the requests specifically, so you can see what would happen to your site if that many people started connecting.

I tried it on DA's own file hosting website. It went down too and came back soon after the software was stopped. Maybe it came back sooner because the URLs were light: just file URLs, not PHP/SQL blog URLs.

And before I submit any bug to Apache's mailing list, I want to be sure about it, so I'd like to test it on your websites. So please provide the information I asked for. I'm already looking into testing on other hardware so I can be sure VMware's hardware is not the cause, which is actually very unlikely anyway. Your websites, especially blogs, would be very good to test.

zEitEr · Dec 13, 2010

Sure, not a DDoS, just a simple DoS program.
