Testing the .tar + rsync backup

Arieh

It has been a while since the backup_gzip=0 feature was added, and since I haven't really found any threads about it, I decided to test it myself.

I've done some testing and from what I've seen so far it looks really good.

What I've done is the following:

- having a test VPS with a few sites on it (1 user with Magento, 1 user with WordPress, other, and admin) - 3 CPU cores
- the rsync script

Steps:
- made .tar.gz backups to see how long it takes, and how much space it requires
- made .tar backups to see how long that takes, and how much space it requires
- rsync'd the .tar files
- modified some files and a database to get a different backup
- made .tar backups again
- used rsync again

This is the result:
.tar.gz backups:
80 MB

.tar backups:
213 MB

- Creating .tar.gz files took about 3 minutes.
- Creating .tar files took under a minute (xx seconds).

First time rsync:
Code:
sent 91 bytes  received 83176231 bytes  3539417.96 bytes/sec
total size is 222105600  speedup is 2.67

Second time rsync:
Code:
sent 152532 bytes  received 129556 bytes  26865.52 bytes/sec
total size is 221501440  speedup is 785.22
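As a sanity check on those numbers: the speedup rsync reports is simply the total size divided by the bytes actually transferred (sent + received):

```shell
# speedup = total_size / (sent + received)
awk 'BEGIN { printf "%.2f\n", 222105600 / (91 + 83176231) }'     # first run:  2.67
awk 'BEGIN { printf "%.2f\n", 221501440 / (152532 + 129556) }'   # second run: 785.22
```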

In conclusion:
Though the .tar files take up more space, they're created much faster. Because they're not compressed, they can be rsynced and only the changes will be transferred, saving time and bandwidth - yet you also have retention. Of course I tested this on a very small scale, but it should work the same at larger sizes.

Is anyone using something similar like this, or has any thoughts on it?
 
If I remember correctly, there is a way to gzip a file with an option that allows rsync to "notice" just the differences.

I don't remember what flag it is, but I remember it exists and it could be helpful to your tests, I suppose.

Regards
 
I've googled it quickly and there seems to be a --rsyncable parameter for gzip. It would then be a matter of choosing between <faster/less CPU + more disk space> and <slower/higher CPU usage + less disk space>. Which one to use depends on how much you have to back up and whether disk space is a problem.

Still I wonder if there's someone here who uses something other than transferring the whole admin backups completely.
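For what it's worth, a minimal way to try that flag: note that --rsyncable is a distribution patch to gzip (Debian/Ubuntu carry it, plain upstream builds may not), so the sketch below checks whether the installed build supports it before using it:

```shell
cd "$(mktemp -d)"
echo "some data" > demo.txt

# use --rsyncable only when the local gzip build supports it
if gzip --help 2>&1 | grep -q rsyncable; then
    gzip --rsyncable demo.txt
else
    gzip demo.txt     # plain gzip as a fallback
fi
ls demo.txt.gz
```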
 
I've run a backup on a production machine using .tar. Unfortunately the speedup isn't as high as I thought it would be.

CPU
http://i.imgur.com/asvve.jpg

IO
http://i.imgur.com/QrSG6.jpg

The .tar.gz backups take 26 minutes vs 24 minutes for the .tars. Although the .tar.gz backups are made at night and I made these .tars in the evening, that shouldn't really have a big influence, as the server has enough capacity. Perhaps SSD drives or a big RAID array would change things.

total .tar.gz backup size
7.3 GB

total .tar backup size
12 GB

In my case I think the .tar.gz files are initially preferred, since CPU usage isn't a problem and they require less disk space. However, I've looked into --rsyncable, and it was discussed a few years ago: http://www.directadmin.com/forum/showthread.php?t=33357 - but it isn't possible.

So I guess the .tar + rsync is the way to go then.
 
Hi Arieh,

Although I didn't set it up myself, in the past I have used rsync backups using the software/package BackupPC on a backup server (linux). I believe it does compress the files (certainly on the backup server itself) and it does incremental backups. Per-file restoration was therefore possible (and easy), which was fantastic for 'everyday' use (user mistakes...).

Bye
 
Thank you harro, I've also thought of rsyncing all files, but I rarely need single-file restores, and I need the full admin backups in case the server is ruined.
 
Reasonable. In another thread about rsync/backup I did read the suggestion to rsync the files across and then use the same tar/gz script DA uses to build the backup tar.gz user files. It adds a process on the backup server, but it might even be linkable with the BackupPC tar script, i.e. maybe one could adjust the BackupPC tar/gz script so that it stores the files exactly as DA wants them.
 
Well, that does sound like some trouble. The .tar backups + rsync [incremental + retention] I've tested already work; if it were also possible to add --rsyncable to gzip, that would be a nice addition.

What you suggest sounds like an ideal situation, but I'm afraid it would require a lot of time.
 
But what about disabling gzip in the DA backups, gzipping with the --rsyncable option in a backup post script, and then rsyncing?

Regards
 
I've added the .sh script and tested it with 1 small user; now I'll wait to see what happens on the daily backup. Though there is a catch to this solution: the home.tar stays home.tar, and when you restore it, it probably needs to be unzipped manually first.
 
The backup has run and everything is gzipped as expected. But in the end it does require more capacity, since it first needs to .tar and then .gz afterwards instead of doing both in one step. Now both IO and CPU are under pressure instead of only IO or only CPU. The duration is now 32 minutes. It would be most efficient if it were possible to add the --rsyncable parameter in DA, but I guess I'm happy enough with the current options.
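The extra load makes sense: the post-script approach reads and writes the data twice, while the normal .tar.gz streams it once. A sketch of the two routes (made-up paths; add --rsyncable to the gzip call where the patched build supports it):

```shell
mkdir -p /tmp/twopass/home/user1
echo "content" > /tmp/twopass/home/user1/f.txt

# post-script route - two passes over the data:
tar cf /tmp/twopass/user1.tar -C /tmp/twopass/home user1   # pass 1: write the tar
gzip /tmp/twopass/user1.tar                                # pass 2: reread and rewrite it

# direct route - one pass, tar streams straight into gzip:
tar czf /tmp/twopass/user1-direct.tar.gz -C /tmp/twopass/home user1
```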
 
If I remember correctly, DA backups are made as .tar.gz directly with the tar command; is there an option for tar to add the rsyncable option to the "gzip" part?

Are you now using the post-backup script for gzip --rsyncable?

Regards
 
I guess I didn't think that through; I was merely continuing the thought from the other topic suggesting the same thing. But now that I've googled it, it seems to be possible with an environment variable.

Code:
export GZIP="--rsyncable"

would affect the gzip triggered by tar.

I'm not too familiar with environment variables; normally they would be placed in a .profile file, but I'm not sure how this needs to be done with the DA backup.

Right now I've made user_backup_post.sh with

Code:
#!/bin/bash

gzip --rsyncable "$file"

exit 0;

Perhaps I need to make a user_backup_pre.sh with the export variable?

I've found this about cpanel, which is about the backups as well: http://blog.configserver.com/index.php?itemid=190 - there the cron runs both the export variable and triggers the backup I guess.

I think for this to work, it depends on whether export variables set by the pre.sh script are still in effect when the tar command is run by the backup.
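That question can be checked outside DA: variables exported inside a child script disappear when that script exits, so unless DA does something special with the pre script's environment, the export would not carry over to a later tar call. A small plain-shell demonstration:

```shell
unset GZIP

# a child script exporting a variable...
sh -c 'export GZIP="--rsyncable"'

# ...does not change the parent's environment after it exits
echo "parent sees GZIP='${GZIP-}'"
```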
 
Yes, with _pre it should work.

But DA Staff suggested an additional flag that can be put in directadmin.conf: http://www.directadmin.com/forum/showthread.php?t=33357&p=167957#post167957

If that is the same variable for tar, then the option suggested by DA Staff should work as well (I think); have you tried this already?

As they say, the command would be like: tar czfp /path/to/user.tar.gz --rsyncable -C /backup/location/username backup -C /home/username domains

If tar accepts the --rsyncable option as gzip does, then the gzip file should be made with that option.

Regards
 
Yes, but a few posts below that, the reply is that --rsyncable doesn't work for tar. I just tried the line and it gives the same error: tar: unrecognized option '--rsyncable'

I've tried the following pre script

Code:
#!/bin/bash

export GZIP="--rsyncable"

exit 0;

But it gives an error at backup

"Script output: /usr/local/directadmin/scripts/custom/user_backup_pre.sh"

I believe it means that it gave output other than 0. I'm not sure if it's possible this way?

edit: wait a minute I forgot to chmod the file :cool:
 
Well, now the error is gone, but it doesn't work. With the pre script the rsync speedup is only around 1, while with the post script (with the manual gzip) the speedup is 1598.82.

edit:
I was testing around with and without the GZIP var

GZIP="--rsyncable" tar czfp tartest.tar.gz tartest/

and it does work, so now the only trick is to get this var set for the backup.
 
Hello,

I could add code into DA to make it an env, but first see if the following works:

1) Edit /etc/init.d/directadmin

2) Make the start) section look like this:
Code:
start() {
        echo -n "Starting DirectAdmin: "
        export GZIP="--rsyncable"
        daemon $PROGBIN
        echo
        touch $PROGLOCK
}
Basically, just inserting the env var before DA starts up. In theory, it will propagate down to tar.
Restart DA after making the change.

If it doesn't, let me know and I can load it into the env before tar is called (perhaps with an on-off switch in the directadmin.conf).

John
 
Correction, that won't work... do it for the dataskq, not directadmin.

1) Edit:
/etc/cron.d/directadmin_cron

2) Make the dataskq call look like:
Code:
* * * * * root GZIP="--rsyncable" /usr/local/directadmin/dataskq
and restart crond.

John
 