Memory drain while running sysbk

mart_nl · May 31, 2012

Hi,

Server Debian 6.0.5 (squeeze)
Directadmin 1.40.3
Intel(R) Xeon(R) CPU L3406 @ 2.27GHz / 16 Gb mem

We are currently facing some unusual behavior. This server is a fresh install: a clean OS install, then DirectAdmin. The server is functioning normally for its day-to-day work.

Status right now:

Code:

top - 13:54:39 up 4:17, 1 user, load average: 0.09, 0.08, 0.12
Tasks: 159 total, 1 running, 158 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.4%us, 0.2%sy, 0.0%ni, 98.0%id, 1.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16464932k total, 1060140k used, 15404792k free, 70368k buffers
Swap: 32156664k total, 0k used, 32156664k free, 501144k cached

However, each night at 02:00 a system backup runs (DA System backup -> cron). It includes user home directories. Though this is supposed to start at 2 am, we experience major memory loss at around 9:15 am, EACH morning.

Today DA sent me this alert:

Code:

This is an automated message notifying you that the 5 minute load average on your system is 11.06.
This has exceeded the 10 threshold.

One Minute      - 14.56
Five Minutes    - 11.06
Fifteen Minutes - 7.77

top - 09:29:02 up 1 day, 23:02,  1 user,  load average: 14.56, 11.06, 7.77
Tasks: 191 total,   1 running, 190 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  0.2%sy,  0.1%ni, 96.9%id,  2.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16464932k total, 16373000k used,    91932k free,   386808k buffers
Swap: 32156664k total,        0k used, 32156664k free, 14852968k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
7960 root      36  16  4092  732  380 S   36  0.0   1:29.21 gzip

We've investigated this problem and learned that if we kill the PID of sysbk, the memory leakage stops. But free memory never comes back to its normal level.

Also, we noticed that no PID / process ever uses more than 1% memory. Even when free memory is as low as shown above (dropping down from 16 GB!), not a single service or process ever consumes more than 1 or maybe 2%.

We are at a loss here. I've used the DirectAdmin scripts to install the latest updates.

The only thing we know at this moment for sure:

- Free memory remains steady at 14 or 15 GB throughout the day
- Each night at 02:00 the backup is supposed to start
- Each morning everything is fine until around 9:15 am when memory starts to drop fast and keeps on falling until sites become unresponsive
- Each morning we notice "gzip" as most active process at the time of the memory leakage
- Memory leakage stops right after killing PID of "sysbk"
- Memory doesn't "come back" and remains steady at around 90 Mb free memory

Please advise!

Kind regards,
Martin Koppelaar

scsi · May 31, 2012

You should use admin level backup instead.

mart_nl · May 31, 2012

scsi said:
You should use admin level backup instead.

Thank you for your quick reply.

Can you elaborate on that ?

I can see two backup options, "System backup" and "Admin backup/transfer". Do you mean "Admin backup/transfer" when you say "Admin level backup"?

Our goal (much like anyone else) is to have a backup to restore the server (DirectAdmin, Dovecot, Exim, dns data etc. etc.) in case of major malfunction.

Also, we create backups to restore user files in the event some user messes up his or her website.

Our impression was, "System backup" with option checked for "add user directories to list below" was the safe and best way to go.

But in particular, how come "System backup" would drain your memory down to close to nothing, and why would "Admin level backup" not do that ?

That was the topic of my question: why do we witness 14 GB of available memory going down to a few MB in minutes, while at the same time "top" tells me only gzip is using about 1% of memory?

Also, why doesn't the memory "loss" go back to normal (about 14 GB) after the process is done / killed etc.?

And finally ...
During this reply I started the Admin level backup. The problem remains exactly the same as with "System backup". It dropped from 14 Gb when starting this reply to 4 Gb at this very moment. While the only process really eating memory is mysqld (from 0% -> 1.1%) and gzip (28% cpu and 0% mem at this moment).

Thanks,
Martin

nobaloney · May 31, 2012

I'll try to elaborate a bit:

In no particular order:

* system loads of 14 are not of themselves bad. What the load actually refers to is the number of processes awaiting execution at any given time, averaged during the last 1, 5 and 15 minutes. If you've got a lot of cores and a fast processor or processors, the loads can get higher without them impacting server performance. Your processor has two cores and isn't that fast, but you don't say you're seeing any performance degradation.

* The problem with sysbk is that it has no restore command and it doesn't group anything by user or reseller, so it leaves you on your own if you ever need to restore anything. The Admin level Reseller backup/restore command doesn't backup system internals or even DirectAdmin, but only users and resellers. But it does have an excellent restore function. While both are available in DirectAdmin, I think most of us use the latter.

* You're not using as much memory as you think; take a look here (linuxatemyram.com). Read and understand the contents of this page; it will tell you why you don't get the memory back, and why you don't need it back. That you're not using any swap memory is proof that you've still got lots of RAM.
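
To make the last point concrete, here is a minimal sketch in Python that redoes the arithmetic on the 09:29 `top` output from the first post. The function name `effective_memory` is only illustrative; the idea, as on the linked page, is that buffers and page cache can be reclaimed by the kernel when programs need the RAM, so they should be counted as available rather than used.

```python
def effective_memory(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (used_by_programs_kb, available_kb).

    Buffers and page cache are treated as reclaimable, the same way the
    "-/+ buffers/cache" line of `free` does.
    """
    used_by_programs = used_kb - buffers_kb - cached_kb
    available = free_kb + buffers_kb + cached_kb
    return used_by_programs, available


# Values copied from the Mem:/Swap: lines of the alert (all in kB).
used, avail = effective_memory(
    used_kb=16373000, free_kb=91932, buffers_kb=386808, cached_kb=14852968
)
print(f"used by programs: {used / 1024**2:.1f} GiB")   # about 1.1 GiB
print(f"available:        {avail / 1024**2:.1f} GiB")  # about 14.6 GiB
```

So even at the moment of the alert, programs were holding roughly 1 GiB; nearly all the rest was file cache filled by the backup reading the disk.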

The remaining question for you is: Do you actually see any service degradation?
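
One rough way to judge whether a load figure matters is to divide it by the number of cores. The sketch below does that for the alert's 5-minute figure; `load_per_core` is a made-up helper name, and the core count of 2 is taken from the description above rather than measured. Note also that on Linux the load average counts tasks in uninterruptible sleep (typically waiting on disk I/O) as well as runnable ones, which may be why the load can be high while the Cpu(s) line shows the processor mostly idle.

```python
import os


def load_per_core(load_avg, cores):
    """Average number of runnable or I/O-blocked tasks per core."""
    return load_avg / cores


# Figures from the alert: 5-minute load 11.06, and the two cores mentioned
# for this CPU (assumption: hyperthreads are not counted).
alert_ratio = load_per_core(11.06, 2)
print(f"alert: {alert_ratio:.2f} tasks per core")

# Live reading on the machine this runs on (Unix only).
try:
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"now: {load_per_core(five, cores):.2f} tasks per core")
except (AttributeError, OSError):
    print("load average not available on this system")
```

A ratio well above 1 for a sustained period means tasks are queueing; whether visitors notice depends on what those tasks are waiting for.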

Jeff

mart_nl · May 31, 2012

Hi Jeff,

Nice, I've been reading your posts quite a lot around here.

Seems like you're right on the money when talking about lost memory:

Code:

             total       used       free     shared    buffers     cached
Mem:         16079      15903        175          0        553      14096
-/+ buffers/cache:       1253      14826
Swap:        31402          0      31402

Memory leaking;
Okay this seems exactly the issue you refer to. I'm glad !

Backing up in DA.
It's all clear now. What you're actually saying is: in case of a big emergency that requires fully rebuilding the box, you should format, install the OS, run the DirectAdmin setup script, set up your server like you did the first time, install all the 3rd party plugins and tools you want (again), and then sit back and let DirectAdmin recover your users, resellers etc.

Right ?

Should you be in a hurry and basically need a full recovery of everything, either have some sort of snapshot or disk image created, or use System Backup next to Admin level Backup to fully restore DA / tools + users and mail and such.

It's just the 2 dots that need connecting for me: why we never "lose" any RAM to buffers and cache during the day, and then "lose" it as soon as sysbk or Admin level backup (zip) kicks in.

Oh, and not to forget: why does a backup scheduled for 2 am kick in at 9:15 am, and why do those sites become unresponsive? But that might be just the CPU being very busy with zipping up all the files. So the unresponsiveness is perhaps not about being low on memory after all, but low on CPU.

Thanks !!
Martin

nobaloney · Jun 1, 2012

mart_nl said:
Backing up in DA.
It's all clear now. What you're actually saying is: in case of a big emergency that requires fully rebuilding the box, you should format, install the OS, run the DirectAdmin setup script, set up your server like you did the first time, install all the 3rd party plugins and tools you want (again), and then sit back and let DirectAdmin recover your users, resellers etc.

That's what I would do.

Should you be in a hurry and basically need a full recovery of everything, either have some sort of snapshot or disk image created, or use System Backup next to Admin level Backup to fully restore DA / tools + users and mail and such.

You could, but then you wouldn't be taking advantage of the latest version of your OS, and even possibly software versions, as the sysbk backup, if you're going to have good results, should be restored either on a system set up with the same versions of everything, or else slowly and methodically, not restoring any file until you know it will work in an updated environment. The one time in our business history we had to rebuild a totally broken system was before the DirectAdmin Admin Level backup existed; we had only sysbk backups, and restoring was difficult and time-consuming. We would have saved a lot of time if the DirectAdmin Admin Level backup had existed. In fact, when it came out we tested it on the same server; it took less than 1/3 the time; the savings was many hours.

It's just the 2 dots that need connecting for me: why we never "lose" any RAM to buffers and cache during the day, and then "lose" it as soon as sysbk or Admin level backup (zip) kicks in.

Read that link again and think about it. Over time various programs use memory. That's when they show up in buffers and cache. While some may get released over time, if the system determines releasing it would result in more efficiency, it's beyond us mere mortals to figure it out. (Feel free to study the kernel source code if you'd like.)

Oh, and not to forget: why does a backup scheduled for 2 am kick in at 9:15 am, and why do those sites become unresponsive? But that might be just the CPU being very busy with zipping up all the files. So the unresponsiveness is perhaps not about being low on memory after all, but low on CPU.

We don't know what time the backup actually starts; I'm going to presume it starts at the proper time, but if you think it may not, then you can manually adjust the cronjob which starts it to send you an email just before it actually runs; that would tell you what time it actually runs.
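
As a variation on that idea, the start time can be written to a log file instead of an email. This is only a sketch: `run_logged` is a hypothetical wrapper, the log path is arbitrary, and the demo runs a harmless stand-in command because the exact command your cronjob calls isn't shown in this thread.

```python
import os
import subprocess
import sys
import tempfile
from datetime import datetime


def run_logged(command, log_path):
    """Append start/end timestamps for `command` to log_path; return its exit code."""
    with open(log_path, "a") as log:
        log.write(f"{datetime.now().isoformat()} start: {' '.join(command)}\n")
        log.flush()  # make the start time visible while the command runs
        result = subprocess.run(command)
        log.write(f"{datetime.now().isoformat()} end: exit {result.returncode}\n")
    return result.returncode


# Demo with a stand-in command (assumption: in the cronjob you would pass
# the real backup command and a permanent log path instead).
demo_log = os.path.join(tempfile.mkdtemp(), "backup-times.log")
exit_code = run_logged([sys.executable, "-c", "pass"], demo_log)
with open(demo_log) as fh:
    print(fh.read())
```

Comparing the "start" line with the time the load alert arrives would show whether the backup really begins at 02:00 and simply takes hours to reach the large home directories.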

As far as what causes the unresponsiveness, yes, that could be caused by the processor not being fast enough, though unless we actually study the machine minute by minute, we'll never know exactly why. My recollection is that the DirectAdmin Admin level Reseller backup does run slowly, but you can change its priority in the system; search these forums and the knowledgebase.
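
For reference, this is roughly what lowering a process's CPU priority looks like at the operating-system level. It is a generic sketch, not DirectAdmin's own mechanism: `run_low_priority` is a made-up helper that starts a child with a raised nice value. The gzip in the `top` output above already shows NI 16, so the backup appears to run niced already; in that case CPU priority alone may not be the whole story.

```python
import os
import subprocess
import sys


def run_low_priority(command, niceness=19, **kwargs):
    """Run `command` with its nice value raised by `niceness` (Unix only).

    A higher nice value means a lower CPU scheduling priority.
    """
    return subprocess.run(
        command,
        preexec_fn=lambda: os.nice(niceness),  # runs in the child before exec
        **kwargs,
    )


# Demo: the child reports its own nice value (os.nice(0) reads it unchanged).
result = run_low_priority(
    [sys.executable, "-c", "import os; print(os.nice(0))"],
    capture_output=True,
    text=True,
)
child_nice = int(result.stdout)
print("child nice value:", child_nice)
```

Nice values only affect CPU scheduling; if the sites slow down because the disk is busy, a lower CPU priority for the backup will not help much.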

Jeff
