How to find the cause of load spikes

Anne

Verified User
Joined: Dec 3, 2015
Messages: 72
Hi,

I keep a close eye on the load of my servers, so I installed the 'load monitor' plugin, and it is doing a nice job. However, when I look at its graph, I see some high spikes a couple of times a day. What I don't understand is the following. When you click on a spike, you go to the 'process overview' page for that exact moment.

There I see, for example:

load average: 6.09, 1.27, 0.41

That is quite a big spike to me; maybe others are not impressed?

However, the process list below it shows almost every process at 0% CPU, with, for example, systemd at 6.2%.

Have I misinterpreted the load average? 6.09 does not mean 6.09%, but more or less 609% CPU use, right?
Why, then, don't I see one or more processes using 100% CPU at that time?
 
From some general articles:
The numbers are read from left to right, and the output above means that:

  • load average over the last 1 minute is 1.98
  • load average over the last 5 minutes is 2.15
  • load average over the last 15 minutes is 2.21
High load averages imply that a system is overloaded; many processes are waiting for CPU time.
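So in your case the load average was 6.09 over the last minute, 1.27 over the last 5 minutes and 0.41 over the last 15 minutes.

In general you need to interpret these numbers relative to the number of CPU cores you have: if the load is CPU-bound, a load of 6 on a 1-core server means roughly five processes were queuing for the one core, while on an 8-core server some cores still had room to spare.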
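To put that into numbers, here is a small Python sketch that divides each load average by the number of logical CPUs. It assumes a Unix-like system (`os.getloadavg()` is not available on Windows), and the 4-core server in the last line is hypothetical, since the thread does not say how many cores the server has.

```python
import os


def load_per_core(load: float, cores: int) -> float:
    """Load average divided by the number of logical CPUs.

    Roughly: 1.0 means every core was busy on average; above 1.0 means
    tasks were queuing (or, on Linux, blocked on disk I/O, which is counted too).
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return load / cores


if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()  # Unix only
    cores = os.cpu_count() or 1           # logical CPUs; fall back to 1 if unknown
    for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
        print(f"{label}: {value:.2f} total, {load_per_core(value, cores):.2f} per core")
    # The 1-minute value from the question, on a hypothetical 4-core server:
    print(f"6.09 on 4 cores -> {load_per_core(6.09, 4):.2f} per core")
```

A per-core value well above 1.0 is the point where tasks really start waiting for each other.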
 
A load of 6.09 actually means that ON AVERAGE over the last minute about 6 processes were running or waiting for CPU time (on Linux, processes stuck in uninterruptible disk I/O are counted as well). Most likely one or more processes were using all available CPU time. A backup running gzip on a 1-core VM would easily do that, and a bad SQL query could also hog all available CPU time for a while. Too few cores, slow disks, a congested network, VM migrations and badly written scripts are all capable of raising the load on a system. But the load is not calculated by adding all the '%' values together.
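Why the process list can look empty at the peak becomes clearer if you simulate how Linux computes the numbers: roughly every 5 seconds the kernel counts the active tasks and folds that count into three exponentially damped averages with 1-, 5- and 15-minute time constants. This sketch uses floating point instead of the kernel's fixed-point arithmetic, and the burst (8 busy tasks for one minute on an otherwise idle server) is made up for illustration.

```python
import math

SAMPLE_INTERVAL = 5  # seconds between samples (approximately, on Linux)


def simulate_load(active_tasks, window_seconds):
    """Exponentially damped average of the active-task count, one value per tick.

    `active_tasks[i]` is the number of running + runnable (+ uninterruptible)
    tasks at sample i. Returns the load average after each sample.
    """
    decay = math.exp(-SAMPLE_INTERVAL / window_seconds)
    load = 0.0
    history = []
    for n in active_tasks:
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history


# Hypothetical burst, in 5-second ticks: 1 min idle, 1 min with 8 busy tasks, 2 min idle.
ticks = [0] * 12 + [8] * 12 + [0] * 24

for window, name in ((60, "1 min"), (300, "5 min"), (900, "15 min")):
    hist = simulate_load(ticks, window)
    peak = max(range(len(hist)), key=hist.__getitem__)  # index of the highest value
    print(f"{name}: peak {hist[peak]:.2f} at t={(peak + 1) * SAMPLE_INTERVAL}s")
```

With these made-up numbers the peaks come out at roughly 5.06, 1.45 and 0.52, a shape similar to the 6.09, 1.27, 0.41 in the question, and all three curves peak at the last busy sample (t=120s). So by the time the graph shows its maximum, the processes that caused it may already have finished.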
 
Hey,

Thanks a lot for the information, it makes things a little clearer for me. However, I still could not see why the load was so high, because the process list showed nothing.

But I found a clue. By clicking on the moment just before the spike, I do see the processes that cause the high load. That makes sense, because the load is only reported in the minutes that follow.

So if you use this load plugin, don't click on the spike itself but on the moment just before it; that gives you more info.
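If you want to catch the culprits yourself instead of relying on the plugin, you could list the tasks that count toward the load. This is a rough sketch that assumes a Linux server with /proc; it reads /proc/<pid>/stat and reports processes in state R (running or runnable) or D (uninterruptible sleep, typically waiting on disk). It only looks at one entry per process, while the kernel counts individual threads (listed under /proc/<pid>/task/), so a busy multithreaded process shows up only once.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the *last* ')'.
    """
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state


def load_contributors(proc="/proc"):
    """List processes currently in state R or D (both count toward Linux load)."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited (or is unreadable) while we were scanning
        if state in ("R", "D"):
            found.append((pid, comm, state))
    return found


if __name__ == "__main__":
    # Note: this script itself will usually appear in the list as state R.
    for pid, comm, state in load_contributors():
        print(pid, state, comm)
```

Running it every few seconds, for example with `watch -n 5 python3 load_contributors.py` (the file name is just an example), around the time a spike usually happens shows which processes were active just before it.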

Thank you!
 