Server load went through the roof today

nealdxmhost

Today I got a bunch of warnings that my server load was over 400! Feeling about as dumb as a bag of hammers, I have no idea what is fouled up and needs fixing. The server in question is a dual Xeon at 3 GHz with 4 GB of RAM, running Ubuntu 10.04 (if memory serves me correctly).

I have copied and pasted one of the messages I got into this post in the hope that someone here can make heads or tails of what is up.

Thanks in advance guys.

Code:
This is an automated message notifying you that the 5 minute load average on your system is 452.91.
This has exceeded the 10 threshold.

One Minute      - 480.63
Five Minutes    - 452.91
Fifteen Minutes - 289.65

top - 11:41:36 up 4 days, 10:01,  0 users,  load average: 506.02, 502.03, 458.53
Tasks: 837 total,  15 running, 822 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.4%us,  1.6%sy,  6.5%ni, 80.1%id, 10.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4059504k total,  4026688k used,    32816k free,     8496k buffers
Swap: 11893752k total,  5637536k used,  6256216k free,    34408k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  52 root      20   0     0    0    0 D   17  0.0  25:24.78 [kswapd0]
2247 mysql     20   0  609m 7500 1708 S    0  0.2  44:27.72 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --user=mysql --log-error=/usr/local/mysql/data/da1.laxweb.net.err --pid-file=/usr/local/mysql/data/da1.pid
1468 nobody    39  19 41076 1172  708 D    0  0.0   1:40.55 /usr/local/directadmin/dataskq
25158 apache    20   0  124m  11m 3308 D    0  0.3   0:00.99 /usr/sbin/httpd -k start -DSSL
25378 root      20   0 39992 2680 2156 S    0  0.1   0:00.75 /usr/local/directadmin/dataskq
4399 root      20   0  119m 4316 1408 S    0  0.1   0:07.91 /usr/sbin/httpd -k start -DSSL
20431 root      20   0     0    0    0 S    0  0.0   2:28.18 [flush-8:16]
 921 bind      20   0  232m 2672  872 S    0  0.1   0:30.88 /usr/sbin/named -u bind
25154 root      20   0 19760 1712  824 R    0  0.0   0:00.56 /usr/bin/top -c -b -n 1
14117 root      20   0 89908 5460 1128 D    0  0.1   1:18.06 lfd - processing
25161 root      20   0 19760 1704  824 R    0  0.0   0:00.53 /usr/bin/top -c -b -n 1
25155 root      20   0 19760 1712  824 R    0  0.0   0:00.52 /usr/bin/top -c -b -n 1
25072 root      20   0 39992 2348 2036 S    0  0.1   0:00.43 /usr/local/directadmin/dataskq
25342 rubrepor  20   0  125m  13m 3852 S    0  0.3   0:00.42 /usr/sbin/httpd -k start -DSSL
25368 root      20   0 89908 6064  708 D    0  0.1   0:00.41 lfd - (child) closing
 767 root      20   0     0    0    0 S    0  0.0  12:11.08 [kjournald]
25160 root      20   0 19760 1708  824 D    0  0.0   0:00.39 /usr/bin/top -c -b -n 1
25331 root      20   0 19760 1664  844 D    0  0.0   0:00.39 /usr/bin/top -c -b -n 1
25391 webapps   20   0  126m  14m 4328 S    0  0.4   0:00.39 /usr/sbin/httpd -k start -DSSL
25415 root      20   0 19760 1772  852 R    0  0.0   0:00.38 /usr/bin/top -c -b -n 1
25423 root      20   0 19760 1784  852 R    0  0.0   0:00.37 /usr/bin/top -c -b -n 1
25165 root      20   0 89908 7960  700 D    0  0.2   0:00.36 lfd - (child) closing
25242 root      20   0 19760 1644  832 D    0  0.0   0:00.33 /usr/bin/top -c -b -n 1
 
Your output doesn't show what is actually causing the load. You should never be using that much swap, though. Either you have a memory leak or something else is going on.
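
To put rough numbers on the swap point, here is a small sketch that does the arithmetic on the Mem: and Swap: lines from the posted top output. The figures are copied from that output; treating free + buffers + cached as "quickly available" memory is a simplifying assumption, not an exact accounting.

```python
# Figures copied from the Mem:/Swap: lines of the posted top output (in KiB).
MEM_TOTAL_K = 4059504
MEM_FREE_K = 32816
BUFFERS_K = 8496
CACHED_K = 34408
SWAP_TOTAL_K = 11893752
SWAP_USED_K = 5637536


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the swap area that is currently in use.
swap_used_pct = percent(SWAP_USED_K, SWAP_TOTAL_K)

# Assumption: free + buffers + page cache approximates memory the kernel
# could hand out without swapping anything else.
available_k = MEM_FREE_K + BUFFERS_K + CACHED_K
available_pct = percent(available_k, MEM_TOTAL_K)

# How much data sits in swap compared with the size of physical RAM.
swap_vs_ram = SWAP_USED_K / MEM_TOTAL_K

print(f"swap in use:        {swap_used_pct:.1f}% of swap")
print(f"quickly available:  {available_k} KiB ({available_pct:.1f}% of RAM)")
print(f"swap used / RAM:    {swap_vs_ram:.2f}x")
```

With more data in swap than there is physical RAM and almost nothing left for cache, a busy kswapd0 and a pile of processes in state D (waiting on disk) is about what you would expect to see.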
 
You might need to add much more RAM, or:

1. lower the maximum number of servers/clients allowed in Apache
2. lower the maximum number of connections/threads allowed in MySQL
3. disable persistent connections to MySQL in php.ini and limit the maximum number of connections
4. etc.
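
As a rough way to pick a number for item 1, here is a back-of-the-envelope sketch. The settings usually meant by the list above are MaxClients (Apache 2.2 prefork), max_connections (MySQL), and mysql.allow_persistent / mysql.max_links (php.ini). The function name, the 1024 MB reserve, and the 30 MB per-child figure are assumptions for illustration only; the httpd processes in the posted top output show 11-14 MB resident, but they were partly swapped out, so measure on a healthy box before trusting any figure.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, per_child_mb):
    """Estimate how many Apache children fit in RAM without swapping.

    total_ram_mb  -- physical RAM on the box
    reserved_mb   -- RAM set aside for MySQL, the OS, DNS, mail, etc. (assumed)
    per_child_mb  -- typical resident size of one httpd child (assumed)
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Never return a negative count if the reserve exceeds total RAM.
    return max(0, int(usable_mb // per_child_mb))


# 4059504 KiB total from the posted top output, rounded down to whole MB.
TOTAL_RAM_MB = 4059504 // 1024

# Hypothetical values: 1 GB kept back for everything else, 30 MB per child.
suggested = estimate_max_clients(TOTAL_RAM_MB, reserved_mb=1024, per_child_mb=30)
print(f"MaxClients should probably not exceed about {suggested}")
```

The point of the estimate is only that the Apache limit times the per-child size, plus whatever MySQL and the rest need, has to fit in physical RAM. If it does not, the box starts swapping under load.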
 
I discovered that the source of the problem appeared to be one web site built with Joomla that had somehow been compromised with a rogue plugin. After enabling server-status I saw an ungodly number of calls to that one domain for a certain PHP file which, upon research, turned out to have been hacked in some way, shape, or form. Needless to say, I shut that site down ASAP and things leveled off back to normal. To add insult to injury, it also sent my overall bandwidth right through the roof, as it was pushing about 300 MB/second. I was also finding an inordinately large number of connections from a couple of offshore IP addresses. I added those puppies to BFD and CSF, and things seem to be very quiet now.
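
For anyone chasing something similar after the fact, the same picture can usually be pulled out of the Apache access log. Below is a minimal sketch that counts requests per client IP and per requested path. It assumes the common/combined log format; the function name, the sample lines, the IP addresses, and the /plugins/example.php path are all made up for illustration and are not taken from the real server.

```python
import re
from collections import Counter

# Assumes Apache common/combined log format:
# ip ident user [time] "METHOD path PROTO" status size ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def top_offenders(lines, n=3):
    """Return (top client IPs, top request paths) as lists of (value, count)."""
    ips, paths = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ips[match.group("ip")] += 1
        # Drop the query string so variations of one script count together.
        paths[match.group("path").split("?", 1)[0]] += 1
    return ips.most_common(n), paths.most_common(n)


# Made-up sample data using documentation-only IP ranges.
SAMPLE = [
    '203.0.113.7 - - [10/Oct/2000:13:55:36 -0700] "POST /plugins/example.php HTTP/1.1" 200 512 "-" "-"',
    '203.0.113.7 - - [10/Oct/2000:13:55:36 -0700] "POST /plugins/example.php?x=1 HTTP/1.1" 200 512 "-" "-"',
    '198.51.100.23 - - [10/Oct/2000:13:55:37 -0700] "POST /plugins/example.php HTTP/1.1" 200 512 "-" "-"',
    '203.0.113.7 - - [10/Oct/2000:13:55:37 -0700] "GET /plugins/example.php HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.10 - - [10/Oct/2000:13:55:38 -0700] "GET /index.php HTTP/1.1" 200 2048 "-" "-"',
]

top_ips, top_paths = top_offenders(SAMPLE)
print("busiest clients:", top_ips)
print("busiest paths:  ", top_paths)
```

To run it against a real log, pass an open file object instead of SAMPLE; the path to the log depends on how the server is set up.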

I shall keep my fingers crossed.
 
I discovered that the source of the problem appeared to be tied to one web site that was built with Joomla and somehow got compromised with a rogue plugin

I don't know how you see it, but my position is this: it is not acceptable that one user account or one badly written (or compromised) PHP script can cause your server to fail. As the administrator, you should prevent that by taking all the steps needed to keep any account from exceeding its allowed limits.

This is my point of view, and I'm not trying to change your mind. It's just something you might want to think about.
 
I'm thinking perhaps CloudLinux. Does anyone know if it would help resolve these kinds of problems?

Neal: I'm out all day today but feel free to call me about this on Friday.

Jeff
 
Any load issue will disappear with CloudLinux, as you set the maximum CPU per domain or package. I had a server (i7, 24 GB RAM, SAS drives) where the load was constantly at 7-10, and the moment I installed CloudLinux and set up the limits, the load dropped to 1-2. You can set stricter limits, but then your clients will get errors when they run into them.

They have a 30-day trial, which is what I took at first, as I was sceptical about the concept. If you are not happy at the end of the trial, you can revert the OS back to CentOS.
 
zEitEr,

You make a very good point, and even after five years of running a hosting server I am still learning. I am sure there is something I missed somewhere when I put the server online. Now I just need to figure it out. Luckily I have not had the server load up on me like this for the past three or four weeks.

 