Extremely poor handling of backup, system completely unresponsive, load spike to 900!

chasebug (Verified User, joined Aug 31, 2009, 18 messages)
While backing up my site of only 400MB, I noticed with the top c command that the user making the backup has tons of processes open, and the count keeps building. When I try to access the site, it gets stuck at "waiting for..." and does not time out even after several minutes.

Then Apache starts opening hundreds of processes and the load spikes to a crazy average until the backup is complete.

During the backup, I can see with the top c command that the user root has many processes running, among them two processes running Gzip and Tar. Gzip is using 99.9% of CPU at a nice value of 19. There is also a dump process running for MySQL. All these combined bring my server to a load average of over 300, even going up to 900, until the backup is complete.

I know that backups put extra stress on the system, but this is only a 400MB site with most of the data being the database, not the entire server! This is not an efficient way to make a backup if Gzip makes the server and sites completely inaccessible during the backup, which can take as long as 15 minutes for a 400MB site.

This is a dual core Xeon server with 2G of RAM.
Average load when not making backup is under 1, during backup the load is 900!
 
I've never seen anyone else report this, so my guess would be that there's something in your system causing the problem. Which backup process are you using? (DirectAdmin has several.)

Jeff
 
I used the one that backs up all the files, emails, and database.

When I back up small sites that are under 50M, there is no problem that I could notice. If the site is bigger, the server is unusable while doing a backup. I think the reason this problem is not reported is because:

1. Many people's sites are small
2. They may not notice the problem because they are not logged into SSH while performing the backup, and the DirectAdmin panel still responds, although very slowly, so they might not think much of the slowness
3. Those users do not check their sites while backup is in progress
4. Automated backup is done so they don't really check the system while backup is in progress

Has anybody who has a site with hundreds of MB of database data tried to do a backup and check the server load?
 
Hello,

The load values reported do seem very high.

Note that I have seen older versions of gzip get into an infinite loop if they run out of disk space, so check that there is plenty of space on disk and that the system quota for the User being backed up is not full. (DA does set quotas to unlimited during this window of time, but we'll check it anyway):
Code:
df -h
quota -v username
Also check the mailboxes for the Users in question. Ensure the Maildir folder for each account is not oversized. Mailboxes that are never checked can end up with thousands of small emails that don't take up much space, but take longer to back up due to the overhead of all of the filenames.
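
A rough way to spot such mailboxes is to count the files in each Maildir. This is only a sketch: the helper name count_mail_files and the <root>/<user>/Maildir layout are assumptions, so adjust the path pattern to match where mail actually lives on your server.

```shell
# Count the files under each Maildir below a given root, largest count first.
# Assumption: mailboxes live at <root>/<user>/Maildir (hypothetical layout).
count_mail_files() {
    root="$1"
    for dir in "$root"/*/Maildir; do
        # Skip the unexpanded glob when nothing matches
        [ -d "$dir" ] || continue
        count=$(find "$dir" -type f | wc -l | tr -d ' ')
        printf '%s %s\n' "$count" "$dir"
    done | sort -rn
}

# Example: count_mail_files /home
```

Accounts near the top of the output with tens of thousands of files are the ones worth cleaning up before the next backup.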

John
 
The problem is that the website is not accessible during backup. Watching this user's processes with the top c command, I see more and more instances of /home/xxxxxx/public_html/cgi-bin/script.cgi (my site) opening immediately after I start the backup. Normally script.cgi shows up and the process ends very quickly. During the backup it does not end, so I soon see the entire screen full of script.cgi processes. The process count goes from 150 to 1500+ and the load spikes to over 900 until the backup finishes.

Code:
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      222G  208G  2.9G  99% /
/dev/sda1              99M   12M   82M  13% /boot
tmpfs                1014M     0 1014M   0% /dev/shm
/dev/sdb1             230G  216G  2.5G  99% /home2

Code:
Disk quotas for user xxxxxxxx (uid 509):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/VolGroup00-LogVol00
                  80240       0       0            4950       0       0
 
Hello,

Your disk is 99% full. That's most likely why gzip is going crazy as I previously guessed. Free up more space so that you have more space to work while creating backups.

To find out what files/folders on your disk are using the most space, type:
Code:
cd /
du -x | sort -n
Note that this command will take quite a while to run as it checks every file on your system. To speed it up, you could also do "cd /home" or "cd /var", etc. instead of "cd /" to search just those folders, if you think the usage is more likely there.
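
To keep the output manageable, something like the following prints only the largest entries. It is a sketch: the helper name largest_dirs is made up for illustration, and -k simply forces the sizes into 1K blocks so the numbers are comparable.

```shell
# Print the N largest directories (sizes in 1K blocks) under DIR, smallest first.
# Usage: largest_dirs DIR [N]   (N defaults to 20)
largest_dirs() {
    dir="$1"
    n="${2:-20}"
    # -x stays on one filesystem, -k reports sizes in kilobytes
    du -xk "$dir" 2>/dev/null | sort -n | tail -n "$n"
}

# Example: largest_dirs /home 20
```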

John
 
I don't think this is the result of low disk space. The disk had over 50G free when I was doing the backup. What you see now is a result of me using up the space to move some files from a VPS I am cancelling.
 
Where is DirectAdmin support?

I just tried to do a backup again and am getting the SAME problem. It is NOT a disk space problem; I have over 100G of free space. It seems the culprit is either Gzip or MySQL. The MySQL database is almost 500M. When backing up sites that have a MySQL database of under 50M, I do not see this problem.

I was logged into my server by SSH anticipating this same problem again. Here is what happened:
1. Started backup of everything including database under user "myuser"
2. Immediately began monitoring using top for high CPU usage and high memory usage
3. Using top, I noticed ALL the Perl based sites are NOT responding, the processes keep increasing (I can see the script.cgi fill up the screen)
4. At this point, total processes as reported from top are over 1000, load is at 75
5. Tried to access a site that does not use MySQL, loads fine except took about 5 seconds
6. Tried to access a site that uses Perl and MySQL, I get "Gateway Time-out" error (see below)
7. Tried to access the same site again, I get error about cannot connect to MySQL
8. Now the load is at over 500, I stop Apache, number of processes decrease but load does not go down
9. I stop MySQL (it took over 5 minutes to stop; the ........ filled 5 lines!)
10. As soon as MySQL is stopped and restarted, the sites that use Perl and MySQL are running fine
11. I get an email saying "An error occurred during the backup"

Additional Details
Code:
Dual Xeon 3 Ghz
2G Ram
MySQL database is about 500M
Site files is under 5M

Apache Error
Code:
Gateway Time-out

The gateway did not receive a timely response from the upstream server or application.
Apache/2 Server at mysite.com Port 80

Backup Error
Code:
Error while backing up database mysite_mydb
mysqldump error output: mysqldump: Error 1053: Server shutdown in progress when dumping table `MyTable` at row: xxxxxxx

My.cnf
Code:
[mysqld]
local-infile=0
skip-locking
query_cache_limit=1M
query_cache_size=64M
query_cache_type=1
max_connections=500
interactive_timeout=100
wait_timeout=100
connect_timeout=10
thread_cache_size=128
key_buffer=128M
join_buffer=1M
max_allowed_packet=16M
table_cache=1024
record_buffer=1M
sort_buffer_size=4M
read_buffer_size=4M
max_connect_errors=10
# Try number of CPU's*2 for thread_concurrency
thread_concurrency=4
myisam_sort_buffer_size=64M
server-id=1

max_heap_table_size=64M
tmp_table_size=64M

[safe_mysqld]
err-log=/var/log/mysqld.log
open_files_limit=8192

[mysqldump]
quick
max_allowed_packet=16M

[mysql]
no-auto-rehash
#safe-updates

[isamchk]
key_buffer=64M
sort_buffer=64M
read_buffer=16M
write_buffer=16M

[myisamchk]
key_buffer=64M
sort_buffer=64M
read_buffer=16M
write_buffer=16M

[mysqlhotcopy]
interactive-timeout
 
I just made a backup without the MySQL database. No issue!

I think the culprit is clearly MySQL and possibly Gzip while it is compressing the database.

Support?
 
How much space does the database take up?

When you see all those processes, which programs are using the most processes?

Jeff
 
The database is about 500M. I can't tell what other processes are running. When I view the top processes of the user I am backing up, the script.cgi processes take up the entire screen, whether sorted by CPU%, Memory, or Time.

I just tried to do a check/optimize/repair from the control panel on the database and the load went up to a couple hundred again.
 
1. Check the MySQL error logs. Perhaps your database has crashed.
2. Try to back up that database manually with mysqldump, with and without gzipping.
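
For step 2, a manual test could look roughly like this. It is only a sketch: the helper names (dump_plain, dump_gzipped, timed) are made up, the output directory is whatever you pass in, and it assumes mysqldump can log in without prompting, for example through a ~/.my.cnf file.

```shell
# Dump one database with and without gzip and report how long each run takes.
# Assumption: mysqldump finds its credentials on its own (e.g. ~/.my.cnf).

dump_plain() {    # usage: dump_plain DBNAME OUTDIR
    mysqldump --quick "$1" > "$2/$1.sql"
}

dump_gzipped() {  # usage: dump_gzipped DBNAME OUTDIR
    # gzip runs at the lowest CPU priority; the exit status is gzip's, not mysqldump's
    mysqldump --quick "$1" | nice -n 19 gzip > "$2/$1.sql.gz"
}

timed() {         # run a command and print its elapsed time in whole seconds
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$1 took $((end - start))s (exit $status)"
    return $status
}

# Example, with the database name from the error output and a directory of your choice:
#   timed dump_plain   mysite_mydb /path/to/outdir
#   timed dump_gzipped mysite_mydb /path/to/outdir
```

If the plain dump alone makes the sites hang, gzip is probably not the main problem; if only the gzipped run does, compression is the more likely suspect.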
 
Is the database being used while the backup is made? MySQL tends to lock tables temporarily during a backup, and if any queries that require writes to your table are performed at the same time, MySQL will just put them in a queue.

And if MySQL puts those queries in a queue and you don't have a timeout on the Perl scripts that perform those queries, Perl is the culprit and will cause your load to spike. You reported over 1000 processes; are those probably all Perl scripts for your site?

I don't think DA has much to do with this problem. I think it is MySQL locking tables, and your Perl scripts that don't "accept" that. If you do an optimize/check/repair, MySQL locks tables too!

So check the Perl scripts for your website, and make sure you put a timeout on them, or have them handle INSERT/UPDATE/DELETE queries to MySQL specially.
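
One way to confirm the locking theory is to count how many MySQL threads are waiting on a lock while the backup runs. A sketch, with assumptions: count_lock_waits is a made-up helper name, and the state text it matches ("Locked" on older MySQL versions, "Waiting for table ..." on newer ones) may differ on your version, so check what your own processlist shows.

```shell
# Count processlist rows that appear to be waiting on a table lock.
# Reads `mysqladmin processlist` output on stdin.
count_lock_waits() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    grep -E -c 'Locked|Waiting for table' || true
}

# Example: sample every 5 seconds during a backup (stop with Ctrl-C):
#   while sleep 5; do mysqladmin processlist | count_lock_waits; done
```

If the count climbs in step with the number of script.cgi processes, the scripts are most likely queued behind the dump's table locks.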
 
"Where is DirectAdmin support?"

This is a user support forum. It is not DirectAdmin's official means of support. Perhaps you should contact them directly if you want support directly from them.
 