High server load

mimic (Verified User, joined Oct 5, 2007, Utrecht, The Netherlands)
Hello all,

We recently moved our hosting from a physical machine to a virtual one.

The virtual machine has the same hardware as the physical one, but twice the memory (4 GB). We also switched from Debian to Ubuntu LTS.

The other two virtual machines on the same server are practically idle.

The problem is that, at random moments, the server load spikes and DirectAdmin e-mails us a warning about it.

Fortunately, the message includes a top dump, which shows that Apache is responsible for the load.

My question is: why is this happening, and how can we solve it? As I said, the spikes are random, but luckily, as far as we know, they don't affect our services.
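The DirectAdmin warning quoted in the logs fires when the five-minute load average exceeds a threshold (10 on this server). A minimal sketch of an equivalent check, assuming a Linux system where /proc/loadavg is available; `load5_exceeds` is a hypothetical helper, not DirectAdmin's actual implementation:

```bash
#!/bin/bash
# Hypothetical helper, not DirectAdmin's code: exit status 0 when the
# 5-minute load average is above the given threshold.
# /proc/loadavg format: "1min 5min 15min running/total last-pid".
load5_exceeds() {
  local threshold=$1
  local file=${2:-/proc/loadavg}
  # Field 2 is the 5-minute average; awk compares it as a decimal number,
  # which bash's [ -gt ] test cannot do.
  awk -v t="$threshold" '{ exit !($2 > t) }' "$file"
}
```

For example: `if load5_exceeds 10; then echo "load is high"; fi`. Doing the comparison in awk avoids truncating the average to an integer first.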

Logs:
A new message or response with subject:

Warning: The system load average is 14.81

This is an automated message notifying you that the 5 minute load average on your system is 14.81.
This has exceeded the 10 threshold.

One Minute - 40.54
Five Minutes - 14.81
Fifteen Minutes - 6.01

top - 21:31:45 up 1 day, 19:17, 0 users, load average: 43.81, 17.67, 7.19
Tasks: 297 total, 42 running, 254 sleeping, 0 stopped, 1 zombie
Cpu(s): 4.2%us, 5.9%sy, 0.4%ni, 88.5%id, 0.8%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 4118332k total, 2224328k used, 1894004k free, 384148k buffers
Swap: 1952760k total, 0k used, 1952760k free, 830444k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27102 thegirls 20 0 41016 36m 2976 R 9 0.9 0:18.81 spamd child
21917 root 20 0 33504 16m 3940 R 8 0.4 2:48.35 /usr/sbin/httpd -k start -DSSL
10567 apache 20 0 39884 21m 2988 R 8 0.5 0:06.21 /usr/sbin/httpd -k start -DSSL
10575 apache 20 0 39884 21m 2988 R 8 0.5 0:05.30 /usr/sbin/httpd -k start -DSSL
10625 apache 20 0 39596 21m 2988 R 8 0.5 0:02.95 /usr/sbin/httpd -k start -DSSL
10645 apache 20 0 39596 21m 2988 R 8 0.5 0:02.56 /usr/sbin/httpd -k start -DSSL
10647 apache 20 0 39088 20m 2956 R 8 0.5 0:02.47 /usr/sbin/httpd -k start -DSSL
10550 apache 20 0 39892 21m 3052 R 8 0.5 0:07.33 /usr/sbin/httpd -k start -DSSL
10563 apache 20 0 39884 21m 2988 R 8 0.5 0:06.34 /usr/sbin/httpd -k start -DSSL
10568 apache 20 0 39884 21m 2988 R 8 0.5 0:05.92 /usr/sbin/httpd -k start -DSSL
10574 apache 20 0 39884 21m 2988 R 8 0.5 0:05.50 /usr/sbin/httpd -k start -DSSL
10584 apache 20 0 39884 21m 2988 R 8 0.5 0:04.36 /usr/sbin/httpd -k start -DSSL
10605 apache 20 0 39884 21m 2988 R 8 0.5 0:03.75 /usr/sbin/httpd -k start -DSSL
10546 apache 20 0 39884 21m 2988 R 8 0.5 0:07.66 /usr/sbin/httpd -k start -DSSL
10557 apache 20 0 39884 21m 2988 R 8 0.5 0:06.86 /usr/sbin/httpd -k start -DSSL
10561 apache 20 0 39884 21m 2988 R 8 0.5 0:06.65 /usr/sbin/httpd -k start -DSSL
10576 apache 20 0 39884 21m 2988 R 8 0.5 0:04.85 /usr/sbin/httpd -k start -DSSL
10589 apache 20 0 39884 21m 2988 R 8 0.5 0:04.21 /usr/sbin/httpd -k start -DSSL
10596 apache 20 0 39884 21m 2988 R 8 0.5 0:04.00 /usr/sbin/httpd -k start -DSSL
10675 root 20 0 2676 1180 796 R 8 0.0 0:01.27 /usr/bin/top -c -b -n 1
10237 apache 20 0 40116 22m 3384 R 8 0.6 1:11.13 /usr/sbin/httpd -k start -DSSL
10637 apache 20 0 39596 21m 2988 R 8 0.5 0:02.79 /usr/sbin/httpd -k start -DSSL
10542 apache 20 0 40044 21m 3316 R 7 0.5 0:08.15 /usr/sbin/httpd -k start -DSSL
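The top header shows 42 tasks in the running state, close to the one-minute load of 43.81. On Linux the load average counts tasks that are runnable (R) or in uninterruptible sleep (D), not CPU utilisation as such. A minimal sketch for counting tasks per state during a spike, assuming procps `ps`; `count_states` is a hypothetical helper that reads the output of `ps -eo stat=` on stdin:

```bash
#!/bin/bash
# Hypothetical helper: tally processes by the first letter of their state.
# R = running/runnable and D = uninterruptible sleep both count towards the
# Linux load average; S = sleeping, Z = zombie (e.g. "httpd <defunct>").
count_states() {
  awk '{ counts[substr($1, 1, 1)]++ }
       END { printf "R=%d D=%d S=%d Z=%d\n", counts["R"], counts["D"], counts["S"], counts["Z"] }'
}
```

Usage: `ps -eo stat= | count_states`. A high R count points at CPU-bound work (such as the Apache children above), while a high D count points at processes blocked on I/O.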

The server has now been up for about 3 days, and these processes also caught my attention:
278 root 20 0 0 0 0 S 3 0.0 23:12.45 jbd2/sda3-8
1108 mysql 20 0 144m 32m 5952 S 9 0.8 107:46.37 mysqld
896 root 20 0 3664 848 616 S 2 0.0 15:36.16 dovecot
9971 apache 20 0 0 0 0 Z 2 0.0 0:00.15 httpd <defunct>
309 root 20 0 0 0 0 S 5 0.0 18:48.17 flush-8:0

Memory usage:
MemTotal: 4118332 kB
MemFree: 1910064 kB
Buffers: 294380 kB
Cached: 1198112 kB
SwapCached: 8 kB
Active: 909900 kB
Inactive: 997372 kB
Active(anon): 336176 kB
Inactive(anon): 79132 kB
Active(file): 573724 kB
Inactive(file): 918240 kB

Interesting entries in the Apache error log; there are loads of these:
[Fri Dec 16 01:12:05 2011] [warn] child process 3868 still did not exit, sending a SIGTERM
[Fri Dec 16 01:12:07 2011] [warn] child process 3905 still did not exit, sending a SIGTERM
[Fri Dec 16 01:12:07 2011] [warn] child process 3868 still did not exit, sending a SIGTERM
[Fri Dec 16 01:12:09 2011] [warn] child process 3905 still did not exit, sending a SIGTERM
[Fri Dec 16 01:12:09 2011] [warn] child process 3868 still did not exit, sending a SIGTERM
[Fri Dec 16 01:12:11 2011] [error] child process 3905 still did not exit, sending a SIGKILL
[Fri Dec 16 01:12:11 2011] [error] child process 3868 still did not exit, sending a SIGKILL
[Fri Dec 16 01:12:12 2011] [notice] caught SIGTERM, shutting down
 
You would have to enable Apache's ExtendedStatus and see which sites are being accessed. There are many possible causes. Hire someone to look at your server if you don't have the expertise to fix it yourself.
 
Thanks for your feedback. We just need some guidelines to determine the source of the problem; then we'll be able to fix it.

I'm now checking how I can include the contents of /server-status in the DirectAdmin message while the load is high. I think that would help a lot.
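Besides the HTML page, mod_status also serves a machine-readable version at /server-status?auto, with lines such as `BusyWorkers:`, `IdleWorkers:` and `Scoreboard:` (one character per worker slot, `W` meaning "sending reply"). It does not list individual requests or vhosts, so the full page is still needed to find the offending site, but it is easy to parse. A minimal sketch, assuming the output is piped in (for example from curl); `scoreboard_summary` is a hypothetical helper:

```bash
#!/bin/bash
# Hypothetical helper: summarise mod_status "?auto" output read from stdin.
# Prints the busy and idle worker counts and how many scoreboard slots are
# in state "W" (sending a reply).
scoreboard_summary() {
  awk -F': ' '
    $1 == "BusyWorkers" { busy = $2 }
    $1 == "IdleWorkers" { idle = $2 }
    # gsub returns the number of replacements, i.e. the number of W slots.
    $1 == "Scoreboard"  { sending = gsub(/W/, "", $2) }
    END { printf "busy=%d idle=%d sending=%d\n", busy, idle, sending }'
}
```

Usage: `curl -s 'http://-ipaddress-/server-status?auto' | scoreboard_summary`. A quick summary like this could be used to decide when the full status page is worth capturing.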


Edit: I've written my own script that checks the system load and e-mails me the contents of /server-status when the load is too high.

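mailserverstatus.sh | /proc/loadavg fields: 1 = 1 min., 2 = 5 min., 3 = 15 min.
#!/bin/bash
# Mail the Apache /server-status page when the 5-minute load average is above THRESHOLD.
THRESHOLD=5
# Read /proc/loadavg rather than parsing uptime: uptime's field positions shift
# once the server has been up for more than a day ("up 1 day, 19:17").
LOAD5=$(cut -d' ' -f2 /proc/loadavg)
# Strip the decimals, because bash's -gt only compares integers.
if [ "${LOAD5%.*}" -gt "$THRESHOLD" ]; then
/usr/local/bin/php /home/-username-/serverstatus.php > /home/-username-/serverstatus.txt
(cat <<EOCAT
Subject: Load average is $LOAD5 on $(hostname)
MIME-Version: 1.0
Content-Type: text/html
Content-Disposition: inline

EOCAT
cat /home/-username-/serverstatus.txt) | /usr/sbin/sendmail -emailaddress-
rm /home/-username-/serverstatus.txt
fi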

serverstatus.php
<?php
// Fetch the Apache status page over HTTP (needs mod_status and allow_url_fopen).
print file_get_contents("http://-ipaddress-/server-status/");
?>

Then I added a cron job that runs it every 5 minutes:
*/5 * * * * /home/-username-/mailserverstatus.sh

Works like a charm.

Now we are going to play the waiting game.
 
It didn't take long before we saw the load rise again. Apparently a client has a very poorly written website that resizes images on the fly.

Normally, with a small website, that shouldn't cause much trouble. Unfortunately, this website gets a lot of hits per day.

We still need to do some research, but this might be it.
 
After a few days of testing, we are sure the load was caused by the script I mentioned earlier. The issue has now been resolved.

Happy X-mas, if it applies! :)
 
Hello,

Let me answer the question.

serverstatus.php

It is a script that fetches the Apache server-status page. It's not malicious, and server status reports can be useful during an investigation.
 
In addition to zEitEr,

In our case it was obvious: the Apache status page, which shows information about all current connections, pointed to one website/script.

jjma, do you have load problems too?
 