How to Kill/Restart on High Load?

SupermanInNY

Hi All,

I need a script or monitor that automatically kills and restarts Apache and MySQL when the load goes above, say, 14.00.

Any pointers?

Thanks,

-Alon.
 
What is the point of this? There are reasons why the load is getting high. Restarting is not going to do anything but kill the current connections, and then they will all just reconnect.

The only thing I can think of is to write a program that looks at the ps results and, if the usage is higher than a certain number, issues the restart.
 
I moved a client with a single domain from a shared environment to a dedicated server.
That should have had the opposite effect: instead of going up, the load should have stayed down.

The OS on the old server was WBEL 3.0. Stable as a rock.

The new server has CentOS 4.3, which is yielding this high load problem.

As it happens, this user is about to move to a third server, on which I'll install WBEL 3.0 again, and that's how I'll keep it, knowing that if something works we don't need to mess with it anymore.

With that said, the new server will only be in the NOC in 8 days.
What I'm looking for is a temporary patch for that time.

I suspect the new OS/kernel or something else on the current CentOS 4.3 server, but I can't do anything about it at this time. As it happens, I am issuing killall -9 every few hours, as the load keeps hitting highs of 300.0 and above!

So the remedy I'm seeking is not a long-term solution; it is just for the few days until the new server arrives.

It isn't the nicest of solutions, but it is a way to keep the site up instead of hung for hours until I can get to it.
 
Something like this would probably work. Please note that it is untested; you will have to test it and probably adjust the code to get it working correctly.

Code:
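#!/bin/sh

max_cpu="10" # Set to max cpu (percent, per process).

# Set to apache restart command.
apache_restart() {
        /sbin/service apache restart
}

# Set to mysqld restart command.
mysql_restart() {
        /sbin/service mysqld restart
}

################## DONT TOUCH ##################

# %CPU column for every process matching the name, highest first.
# "grep -v grep" keeps the grep command itself out of the list.
cpu_usage_apache=$(ps xua | grep -i apache | grep -v grep | awk '{print $3}' | sed -e '/%CPU/d' | sort -nr)
cpu_usage_mysql=$(ps xua | grep -i mysql | grep -v grep | awk '{print $3}' | sed -e '/%CPU/d' | sort -nr)

# Succeeds when $1 is numerically greater than $2. %CPU has decimals,
# so the comparison is done in awk ("[ a > b ]" would be a redirection).
is_over() {
        awk -v cpu="$1" -v max="$2" 'BEGIN { exit !(cpu > max) }'
}

for i in $cpu_usage_apache; do
        if is_over "$i" "$max_cpu"; then
                apache_restart # call the function; $apache_restart is an empty variable
                break
        fi
done

for i in $cpu_usage_mysql; do
        if is_over "$i" "$max_cpu"; then
                mysql_restart
                break
        fi
done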

This file would be put somewhere like /sbin/restart_servers

Then chmod 750 it so it is executable.

Then set up a crontab entry to run it at intervals of your choosing.
 
A few questions regarding this script.

This script restarts the service.
I need a brute-force command that does the killall -9 for httpd.
That seems to work very well, though it is not pleasant.
As noted, this is only for a short time, about one week.
The restart of the service will be done automatically by DA.
Also, there is no need for the mysqld part, as it is not having any issues.
The problem is directly linked to the start/stop of the Apache server.

I slightly modified the proposed script.
Would this script work?

Code:
#!/bin/sh

max_cpu="10" # Set to max cpu (percent, per process).

# Set to apache kill command.
apache_restart() {
        /usr/bin/killall -9 httpd
}


################## DONT TOUCH ##################

# %CPU column for every apache process, highest first.
cpu_usage_apache=$(ps xua | grep -i apache | grep -v grep | awk '{print $3}' | sed -e '/%CPU/d' | sort -nr)

for i in $cpu_usage_apache; do
        # %CPU has decimals, so compare in awk; "[ a > b ]" would be a redirection.
        if awk -v cpu="$i" -v max="$max_cpu" 'BEGIN { exit !(cpu > max) }'; then
                apache_restart # call the function, not the empty variable $apache_restart
                break
        fi
done
 
Script updated:

Code:
#!/bin/sh

max_cpu="10" # Set to max cpu. This is the max % you will allow, no decimals.
apache_name="apache" # Set to the name that apache runs as. Usually apache but sometimes it may be httpd.

################## DONT TOUCH ##################

# Whole-number %CPU of every matching process, highest first.
cpu_usage_apache=$(ps xua | grep -i "$apache_name" | grep -v grep | awk '{print $3}' | sed -e '/%CPU/d' -e 's/\.[0-9]*$//' | sort -nr)

if [ -z "$cpu_usage_apache" ]; then
        exit 1
fi

for i in $cpu_usage_apache; do
        if [ "$i" -gt "$max_cpu" ]; then
                # Kill every matching process, then stop checking.
                ps xua | grep -i "$apache_name" | grep -v grep | awk '{print $2}' | xargs kill
                exit 0
        fi
done

exit 0
 
chatwizrd said:
What is the point of this? There are reasons why the load is getting high. Restarting is not going to do anything but kill the current connections, and then they will all just reconnect.
For many of us (perhaps most of us doing webhosting), the problem is usually httpd and/or mysql. But you don't have to shut down mysql; it'll most likely die off by itself when you restart httpd.

I've never needed the -9 (force); I can always get a graceful restart at server loads in the range of 14 or so (though not in the 100+ range). It's much better to have something like this in place than to do a power-switch restart.

And for many of us, it would be a good tool to have.

Jeff
 
Well, a normal load for my servers is usually in the 0.00 to 4.x range.
When there is a runaway service, due to an attack or some other weird situation, the load usually jumps high within seconds and giving any shell commands becomes almost impossible.

The killall, although brutal by nature, is practically the only way I have found to bring the CPU load down once I'm over the 30.00 mark.
And the load usually gets to 30 within 30 seconds of an attack or a runaway situation like the one I'm experiencing.

Just out of curiosity about the cron job:

Should I use:

crontab -e

or should I use something else?

Also, I'm a bit rusty on cron job definitions.

Should it be:

* * * * * /usr/sbin/restart_servers.sh

and should that go in via

crontab -e

Thanks,

-Alon.
 
Supe,

crontab -e ... the -e means edit.

That sample cronjob will run every minute. Is that what you want?

Jeff
 
Yes!
This script costs next to nothing in resources. It needs to run every minute to make sure there isn't a runaway service of some sort.
This is like a heart-rate check: you don't want to find yourself in the high numbers when it is already too late.
Since it does nothing unless the high-value trigger is hit, it keeps the system at low values and saves the day with at most a 60-second delay.

Bear in mind, this is not a long-term remedy.
It is just for the short period of 8 days until the new server comes in.
Normally, this kind of script should never be invoked; this is just a temporary quick fix for the short time we are in now.

Thanks for the help with the script and the crontab assurance :).
 
I have had no luck with the script :(

Apache is running free up to 116...123 and there is no stopping it.

Brute force, anyone?

killall -9 httpd, please?
 
Once you get it working, you may want to add a logging feature so you know how often it runs... you might be able to find a pattern.
 
Script updated again:

Code:
#!/bin/sh

max_cpu="10" # Set to max cpu. This is the max % you will allow, no decimals.
apache_name="httpd" # Set to the name that apache runs as. Usually apache but sometimes it may be httpd.

################## DONT TOUCH ##################

# Distinct whole-number %CPU values of the matching processes, highest first.
cpu_usage_apache=$(ps xua | grep -iw "$apache_name" | grep -v grep | awk '{print $3}' | sed -e 's/\.[0-9]*$//' | awk '!L[$0]++' | sort -nr)

if [ -z "$cpu_usage_apache" ]; then
        echo "not running"
        exit 1
fi

for i in $cpu_usage_apache; do
        if [ "$i" -gt "$max_cpu" ]; then
                # Kill every matching process, then stop checking.
                ps xua | grep -iw "$apache_name" | grep -v grep | awk '{print $2}' | xargs kill
                exit 0
        fi
done

exit 0
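
The scripts above trigger on the %CPU of individual processes, while the original request was about the system load going over a limit. A minimal sketch of that variant, assuming a Linux /proc/loadavg whose first field is the 1-minute load average. MAX_LOAD, LOADAVG_FILE, KILL_CMD and load_exceeds are illustrative names, not anything from this thread, and the sketch is untested on the server in question:

```shell
#!/bin/sh
# Sketch: run a kill command when the 1-minute load average passes a limit.
# All variable and function names here are hypothetical.

MAX_LOAD="${MAX_LOAD:-14}"                        # trigger level from the first post
LOADAVG_FILE="${LOADAVG_FILE:-/proc/loadavg}"     # first field = 1-minute load
KILL_CMD="${KILL_CMD:-/usr/bin/killall -9 httpd}" # brute-force command asked for

# Succeeds (status 0) when the first field of file $1 is greater than $2.
# An empty or missing file counts as "not exceeded".
load_exceeds() {
        awk -v max="$2" 'NR == 1 { over = ($1 > max) } END { exit !over }' "$1"
}

if load_exceeds "$LOADAVG_FILE" "$MAX_LOAD"; then
        # Left unquoted on purpose so the command and its arguments are split.
        $KILL_CMD
fi
```

For a dry run, KILL_CMD can be pointed at something harmless, for example MAX_LOAD=0 KILL_CMD='echo would kill httpd' sh followed by the script name.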
 
sullise said:
Once you get it working, you may want to add a logging feature so you know how often it runs... you might be able to find a pattern.

Yeah, it probably could, after it's fixed to work correctly. I was just trying to show him that something like this could work in practice.
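
A minimal sketch of such a logging feature. The log path (LOG_FILE) and the helper name (log_event) are made up for illustration and are not part of the script posted in this thread:

```shell
#!/bin/sh
# Sketch: append one timestamped line per action so the history can be reviewed.
# LOG_FILE and log_event are hypothetical names.

LOG_FILE="${LOG_FILE:-/var/log/restart_servers.log}"

# Append "YYYY-MM-DD HH:MM:SS <message>" to the log file.
log_event() {
        echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG_FILE"
}
```

Calling log_event "killed $apache_name, cpu=$i" on the line just before the kill would leave one entry per kill, which is enough to see how often and at what times it fires.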
 
Supe,

I know it's not easy, but how about finding the reason and fixing that? For example, there's something called mytop that tells you if a particular mysql user is hogging resources. And someone should be able to write something similar for httpd (although breaking up the logfiles makes it a lot harder).

Jeff
 
Hi Jeff,

I was on the chat with chatwizrd trying to work things out, and about 3 minutes after I started talking to him, I got a kernel panic.

Since I'm running with KVM-over-IP, I was able to see all the error messages, including the kernel panic.

I strongly believe that the problem is OS related and not service related.
I've seen other posts in the CentOS forum where other folks were getting high loads and complaining about it.

As a result of the kernel panic (which forced me to call the NOC to physically press the reset button, as a kernel panic stops all keyboard action), I threw in the towel and moved the client back to the original server (which happens to be the same server you configured a year and a half ago :) ).

I'll format the faulty server and install a clean OS on it.

Should I go with WBEL 4.0 or 3.0?
3.0 has been stable as a rock for me on several servers. For the purpose of webhosting specifically, do I care about anything else?
 
I believe that by now I've responded somewhat to this on the chat...

But here's more:

If you can forward the subject lines of the posts in the CentOS forum, I'll look at them. But I've never had such a problem. Kernel Panic can be caused by faulty memory. I don't think high loads can.

The most common cause of a high load (actually it's not technically a load; it's a backup of processes waiting to get to the processor) is generally high usage of swap memory, which slows the server incredibly on a shared server where processes have to be swapped off and on to disk continually.

And that could be caused by bad memory not recognized by the system, so the system stops counting and doesn't think it has as much memory as it does.

Check the output of cat /proc/meminfo to see how much memory the system thinks you've got (first few lines).
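
A small sketch of that check, assuming the usual MemTotal, SwapTotal and SwapFree fields in /proc/meminfo. The helper name mem_summary is only illustrative:

```shell
#!/bin/sh
# Sketch: show how much memory the kernel sees and how much swap is in use.
# mem_summary is a hypothetical helper, not a standard command.

# Print MemTotal and swap in use (both in kB) from file $1,
# defaulting to /proc/meminfo.
mem_summary() {
        awk '
                $1 == "MemTotal:"  { total = $2 }
                $1 == "SwapTotal:" { swap_total = $2 }
                $1 == "SwapFree:"  { swap_free = $2 }
                END {
                        printf "MemTotal: %d kB\n", total
                        printf "SwapUsed: %d kB\n", swap_total - swap_free
                }
        ' "${1:-/proc/meminfo}"
}

mem_summary
```

A MemTotal well below the installed RAM, or a large SwapUsed while the load is climbing, would point toward the memory and swap explanation.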

While it's possible that CentOS folk are somehow making a compilation error that WBEL isn't, it's highly unlikely; I'd look at the RH lists or forums.

Jeff
 
Thanks for the link, Supe. I'm now guessing it's a problem between the kernel and apache. I'm not sure if John is using the latest apache or not. I'd try the kernel that gives the best results in that thread, if possible.

Definitely do NOT use localhost in your /etc/resolv.conf file, especially if you've made it a non-caching nameserver as discussed in other threads here.

Since I don't have the problem (except for occasionally high loads which is to be expected) I have no way to check further.

Jeff
 
I am using the script and it is running from a cron job; I can see in /var/log/cron that it runs. But how do I know whether the script is actually restarting Apache when the load exceeds the limit that is set? What log should I check or create?

My OS is Fedora 4 and the apache version is 2.0 - I built apache using customapache.
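
One place to look is Apache's own error log. Apache 2.0 normally writes a notice containing "caught SIGTERM, shutting down" when its parent process is stopped with the default kill signal, and one containing "resuming normal operations" when it starts again (a kill -9 leaves no shutdown line, only the next startup). A sketch that pulls those lines out; the ERROR_LOG path is an assumption and may well differ on a customapache build, and apache_restarts is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: list Apache shutdowns and startups recorded in its error log.
# ERROR_LOG is an assumed location; adjust it to where httpd really logs.

ERROR_LOG="${ERROR_LOG:-/var/log/httpd/error_log}"

# Print the lines of file $1 that record httpd stopping on SIGTERM or starting.
apache_restarts() {
        grep -E 'caught SIGTERM, shutting down|resuming normal operations' "$1"
}

# Usage: apache_restarts "$ERROR_LOG" | tail -n 20
```

Comparing the timestamps on those lines with the script's runs in /var/log/cron shows whether a kill actually happened on a given minute.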
 