Please help diagnose sudden high system load...

csgo

My system normally runs at a very low load. Tonight I started getting the high system load emails. The load was so high I couldn't access DirectAdmin (it would just time out) or even log in via SSH. I had to power cycle the server.

Below is the DirectAdmin ticket info once the server came back online, but I'm not experienced enough to know what caused the "lockup". If someone would be kind enough to give me some insight I would greatly appreciate it.

Here's the info (CentOS 6 64-bit with DirectAdmin 1.43.3):

This is an automated message notifying you that the 5 minute load average on your system is 170.12.
This has exceeded the 10 threshold.

One Minute - 170.53
Five Minutes - 170.12
Fifteen Minutes - 148.29

top - 22:33:53 up 8 days, 34 min, 0 users, load average: 171.01, 170.17, 150.09
Tasks: 432 total, 3 running, 428 sleeping, 0 stopped, 1 zombie
Cpu(s): 1.4%us, 0.1%sy, 0.4%ni, 97.3%id, 0.9%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3911628k total, 3824144k used, 87484k free, 3160k buffers
Swap: 8368120k total, 4191208k used, 4176912k free, 38616k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2225 mysql 20 0 4430m 105m 3524 S 0.4 2.8 36:11.09 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/brat.xxxxx.com.err --pid-file=/var/lib/mysql/brat.xxxxx.com.pid
10286 apache 20 0 304m 73m 4216 D 0.4 1.9 0:01.69 /usr/sbin/httpd -k start -DSSL
10282 apache 20 0 237m 21m 4280 D 0.1 0.6 0:01.00 /usr/sbin/httpd -k start -DSSL
129 root 39 19 0 0 0 S 0.1 0.0 10:40.03 [kipmi0]
10391 apache 20 0 234m 19m 4504 D 0.1 0.5 0:01.28 /usr/sbin/httpd -k start -DSSL
11379 root 20 0 54424 2688 2136 S 0.1 0.1 0:00.06 /usr/local/directadmin/dataskq
10298 apache 20 0 239m 21m 4660 D 0.1 0.6 0:00.99 /usr/sbin/httpd -k start -DSSL
10311 apache 20 0 237m 20m 4300 D 0.1 0.5 0:01.02 /usr/sbin/httpd -k start -DSSL
10574 apache 20 0 238m 18m 4532 D 0.1 0.5 0:01.05 /usr/sbin/httpd -k start -DSSL
10851 apache 20 0 325m 28m 5388 D 0.1 0.7 0:00.65 /usr/sbin/httpd -k start -DSSL
10854 apache 20 0 264m 50m 4648 D 0.1 1.3 0:00.93 /usr/sbin/httpd -k start -DSSL
10947 apache 20 0 237m 24m 4500 S 0.1 0.7 0:00.57 /usr/sbin/httpd -k start -DSSL
11380 root 20 0 15292 1384 828 R 0.1 0.0 0:00.05 /usr/bin/top -c -b -n 1
58 root 20 0 0 0 0 S 0.1 0.0 0:24.16 [kblockd/0]
59 root 20 0 0 0 0 S 0.1 0.0 0:37.91 [kblockd/1]
10097 apache 20 0 309m 31m 4488 D 0.1 0.8 0:26.68 /usr/sbin/httpd -k start -DSSL
10116 apache 20 0 309m 78m 4504 D 0.1 2.0 0:07.93 /usr/sbin/httpd -k start -DSSL
10142 apache 20 0 306m 24m 4356 D 0.1 0.6 0:29.32 /usr/sbin/httpd -k start -DSSL
10148 apache 20 0 237m 16m 4604 D 0.1 0.4 0:04.00 /usr/sbin/httpd -k start -DSSL
10150 apache 20 0 306m 44m 4288 D 0.1 1.2 0:11.45 /usr/sbin/httpd -k start -DSSL
10168 apache 20 0 246m 30m 5608 D 0.1 0.8 0:05.00 /usr/sbin/httpd -k start -DSSL
10172 apache 20 0 306m 34m 4356 D 0.1 0.9 0:10.84 /usr/sbin/httpd -k start -DSSL
10180 apache 20 0 306m 40m 4356 D 0.1 1.1 0:11.87 /usr/sbin/httpd -k start -DSSL


================================
Automated Message Generated by DirectAdmin
 
I also received about 3 emails with the following warning while this was going on:

Warning: file(top.raw): failed to open stream: No such file or directory in /usr/local/directadmin/plugins/load_monitor/scripts/add.php on line 6

Warning: array_slice() expects parameter 1 to be array, boolean given in /usr/local/directadmin/plugins/load_monitor/scripts/add.php on line 59

Warning: Invalid argument supplied for foreach() in /usr/local/directadmin/plugins/load_monitor/scripts/add.php on line 60
rm: cannot remove `top.raw': No such file or directory


Thanks much in advance!!!!
 
Here's the chart if that helps (attachment: Capture.JPG). I welcome all input, please!

I just want to know what caused the server load to go so high that it would not respond to any requests.

Please help.

Thanks,
-Joe
 
I've been suffering the same symptoms for months now: out of nowhere the server load just explodes, consuming all resources and rendering the server useless until someone can restart it. I even moved server providers, as the previous supplier was unable to offer any assistance, but the problem has followed me to the new server :(

Here are my outages since moving to a new server:

Warning: The system load average is 11.25 04/26/2013
Warning: The system load average is 10.48 06/20/2013
Warning: The system load average is 12.97 06/22/2013
Warning: The system load average is 15.77 06/26/2013
Warning: The system load average is 11.32 07/12/2013
Warning: The system load average is 33.19 07/18/2013

The top output included in the system message nearly always contains the following command, and so does csgo's (that's what I searched for to find this thread):
/usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/my.domain.net.err --pid-file=/var/lib/mysql/my.domain.net.net.pid

In my case the above command has nearly always been running for 100 - 300 minutes.

Any tips on where I should start looking to find out what causes these extreme loads would be most appreciated.

P.S. Why don't DirectAdmin's system messages include a time? It would be really handy to know exactly what time these outages occur. Does anyone know where the messages are stored on the server, so I could get a timestamp directly from the files?
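
Independently of where DirectAdmin stores its messages, one way to get exact times is to log the load average yourself. Here is a minimal sketch, assuming Python 3 is available and that the script is run every minute (for example from cron). The log path and the function names are my own choices for illustration, not anything DirectAdmin provides.

```python
import time


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def format_entry(text, now):
    """Build one log line: local timestamp followed by the three load averages."""
    one, five, fifteen = parse_loadavg(text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s load %.2f %.2f %.2f" % (stamp, one, five, fifteen)


def log_once(log_path="/var/log/loadavg.log"):  # hypothetical log location
    """Append the current load average, with a timestamp, to log_path."""
    with open("/proc/loadavg") as source:
        entry = format_entry(source.read(), time.time())
    with open(log_path, "a") as out:
        out.write(entry + "\n")


# Usage (e.g. from a cron job): call log_once() once per run.
```

After the next spike, the log shows when the load started climbing, which can then be matched against the Apache and MySQL logs for the same minutes.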
 
DOH, on clearing my head I realised the mysqld command is simply the MySQL server itself, which of course is going to be running for a long time. It tends to show up in these instances because MySQL is struggling with the lack of resources while the server is under load.
 
In my case it appears the CPU% is low, but the server load is extremely high. None of the processes listed seem unusual, but maybe there are too many Apache processes running?
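
One thing that stands out in the top output from the first post: most of the httpd processes are in state D (uninterruptible sleep, usually waiting on disk I/O). On Linux the load average counts tasks in that state as well as runnable ones, so the load can be very high while the CPU is almost idle. With roughly 4 GB of swap in use, the waiting may be swap I/O, though that is only a guess from a single snapshot. A rough sketch for tallying process states from saved `top -b -n 1` output follows; the function name and the shortened sample are assumptions made for illustration.

```python
from collections import Counter

# Shortened sample based on the top output in the first post.
SAMPLE = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2225 mysql 20 0 4430m 105m 3524 S 0.4 2.8 36:11.09 /usr/sbin/mysqld
10286 apache 20 0 304m 73m 4216 D 0.4 1.9 0:01.69 /usr/sbin/httpd -k start -DSSL
10282 apache 20 0 237m 21m 4280 D 0.1 0.6 0:01.00 /usr/sbin/httpd -k start -DSSL
11380 root 20 0 15292 1384 828 R 0.1 0.0 0:00.05 /usr/bin/top -c -b -n 1
"""


def count_states(top_output):
    """Count processes per state letter (R, S, D, Z, ...) in batch-mode top output."""
    states = Counter()
    in_table = False
    for line in top_output.splitlines():
        fields = line.split()
        if not in_table:
            # The process table starts after the header row that begins with PID.
            in_table = bool(fields) and fields[0] == "PID"
            continue
        # Columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) >= 12 and fields[0].isdigit():
            states[fields[7]] += 1
    return states
```

Calling `count_states(SAMPLE)` returns a counter keyed by state letter. A large D count alongside low CPU usage points towards disk or swap pressure rather than a CPU-bound process.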

Is there a way to force the Apache service to restart on a regular basis to clear the processes?

I'm just grasping at straws.

Thanks,
-Joe
 
"Is there a way to force the Apache service to restart on a regular basis to clear the processes?"

By default it restarts daily after the tallies are complete.

Maybe you need to do some optimisation of Apache and MySQL.
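
As a starting point for the Apache side, a common rule of thumb is to cap the number of worker processes so that they all fit in RAM without swapping. The sketch below estimates such a cap from the RES column of top. It assumes the prefork MPM (where the limit is `MaxClients`, renamed `MaxRequestWorkers` in Apache 2.4), and the amount of memory reserved for MySQL and the OS is a made-up figure you would need to adjust. RES ignores swapped-out pages and counts shared pages more than once, so treat the result as a rough estimate only.

```python
def res_to_mib(value):
    """Convert a top RES value such as '73m', '1.2g' or '2688' (KiB) to MiB."""
    units = {"m": 1.0, "g": 1024.0, "t": 1024.0 * 1024.0}
    suffix = value[-1].lower()
    if suffix in units:
        return float(value[:-1]) * units[suffix]
    return float(value) / 1024.0  # plain numbers in top are KiB


def estimate_max_clients(total_ram_mib, reserved_mib, httpd_res):
    """Estimate how many httpd processes of average size fit in the RAM left over."""
    average = sum(res_to_mib(v) for v in httpd_res) / len(httpd_res)
    return int((total_ram_mib - reserved_mib) / average)


# Example: RES values of a few httpd processes from the first post,
# 3911628k total RAM, and a hypothetical 1024 MiB kept for MySQL and the OS.
httpd_res = ["73m", "21m", "19m", "50m", "78m", "44m"]
limit = estimate_max_clients(3911628 / 1024.0, 1024, httpd_res)
```

If the configured limit is far above an estimate like this, a traffic burst can push the server into swap, which would fit the symptoms described above.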
 
To check if the MySQL process is the source of the high load on your server, you can use a tool such as "mytop".
Install it and run it at the moment you are experiencing the high load. It will show you all the queries that are running at that specific moment.
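
If installing mytop is not an option, much the same information is available from `SHOW FULL PROCESSLIST`. Below is a small sketch that filters the tab-separated output of the `mysql` command-line client for long-running queries. It assumes the client can log in without prompting (for example via credentials in `~/.my.cnf`); the function names, the threshold and the sample rows are invented for illustration.

```python
import subprocess

# Invented sample in the tab-separated format printed by `mysql --batch`.
SAMPLE = (
    "Id\tUser\tHost\tdb\tCommand\tTime\tState\tInfo\n"
    "11\tshop\tlocalhost\tshop_db\tQuery\t245\tSending data\tSELECT * FROM orders\n"
    "12\tblog\tlocalhost\tblog_db\tSleep\t900\t\tNULL\n"
    "13\troot\tlocalhost\tNULL\tQuery\t0\tNULL\tSHOW FULL PROCESSLIST\n"
)


def fetch_processlist():
    """Run the mysql client and return the raw process list (needs working credentials)."""
    return subprocess.check_output(
        ["mysql", "--batch", "-e", "SHOW FULL PROCESSLIST"],
        universal_newlines=True,
    )


def slow_queries(batch_output, min_seconds=10):
    """Return rows for queries running at least min_seconds, slowest first."""
    lines = batch_output.strip("\n").splitlines()
    if not lines:
        return []
    header = lines[0].split("\t")
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("\t")))
        seconds = row.get("Time", "")
        # Skip idle connections and rows without a numeric Time value.
        if row.get("Command") == "Query" and seconds.isdigit():
            if int(seconds) >= min_seconds:
                rows.append(row)
    return sorted(rows, key=lambda r: int(r["Time"]), reverse=True)
```

On a live server you would call `slow_queries(fetch_processlist())` during a spike and look at the `db` and `Info` fields of the rows returned to see which site and which query are involved.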
 