Testing the .tar + rsync backup

Well, it works, but transferring only the changes takes a bit longer than downloading the whole thing.

I ran two full backups about an hour apart to make sure there were some changes in multiple accounts.

The first time it took 11 minutes:

sent 1326 bytes received 7124224174 bytes 11426183.64 bytes/sec (6.6 GB received = total backup size)
total size is 7121827697 speedup is 1.00

The second time, when it could actually sync against the previous copy, it took 16 minutes:

sent 3388308 bytes received 14291534 bytes 18503.24 bytes/sec (14 MB received -> only the changes)
total size is 7121886967 speedup is 402.83
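
As a sanity check, the "speedup" figure appears to be the total data size divided by the bytes that actually went over the wire (my reading of the numbers above, not a verified definition):
Code:
# rough check, assuming speedup = total size / (bytes sent + bytes received)
echo $(( 7121886967 / (3388308 + 14291534) ))   # integer division gives 402, in line with the reported 402.83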

So it saves a lot of traffic, but it takes more time. That's understandable, because rsync has to compare both versions for differences, I guess. The pressure during the sync seemed to be on the pulling backup server, so it may depend on the specs of that machine. I also don't know how the full/incremental difference scales with larger backups.
 
Why don't you just tar on the server, and rsync + gzip from the remote backup server?
 
I'm not too sure, but I think it may be the DA server. During the rsync process it sits at 2x20% CPU, and the backup server at 1x20% and 1x10%. The DA server has way more cores, though, and in any case the CPUs were not maxed out.

I think the bottleneck would be the disks; I see some I/O spikes in the graphs on the DA server, but I don't have those stats on the backup server. Maybe with SSD drives the incremental backups would be done in no time :)
 
Why don't you just tar on the server, and rsync + gzip from the remote backup server?

Maybe, but .tar.gz vs .tar on the DA server doesn't make much difference, and running tar and gzip separately (whether or not on different servers) takes more time than producing the .tar.gz in one go.

But it could be an option if rsyncing the .tar were faster than rsyncing the .tar.gz - I don't know whether that's the case.
 
I'm just starting to test backing up files to a remote server without using tar or gzip on the webservers themselves.
Currently I only tar the user backups without their home directories.
Aside from that, I rsync their /home directories and tar.gz those on the backup server.
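
Roughly like this, run from the backup server (a simplified sketch; the hostname and paths are placeholders):
Code:
# Mirror the webserver's /home to the backup server first (only changed files travel),
# then create the tar.gz locally, so the webserver itself never runs tar or gzip.
rsync -a -e ssh root@webserver:/home/ /backups/home-mirror/
mkdir -p /backups/archives
tar -czf /backups/archives/home-$(date +%F).tar.gz -C /backups home-mirror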

For the databases I'm deciding between shadowing and a slave server.
But if you think about shadowing, you could just do everything at once.
 
I've heard of backup setups like these. They may be more vulnerable to errors, since instead of one point it's spread out over about three, each with problems of their own; on the other hand the system load is spread more evenly, you have more backup points, and in the case of MySQL it's nearly real time. Perhaps I will try something similar later.
 
Now that we know it'll work, and that John will (hopefully) change the code in DirectAdmin, Arieh, can you post a summary of the reasons to do it, and also the pros and cons?

And for everyone else: my understanding is that export puts the variable into the shell's environment, so every new bash instance started from that shell inherits it, while setting it without export (a plain assignment, which is what set amounts to here) only creates the variable in the currently running bash; subshells don't see it, and it disappears as soon as that shell exits.

Yeah, I know, it's confusing to me, too, and I stopped using set a long time ago.

But this may help:
Code:
[root@linux ~]# x=5                       # variable set without the export command
[root@linux ~]# echo $x
5
[root@linux ~]# bash                      # create a subshell
[root@linux ~]# echo $x                   # the subshell doesn't know the value of $x

[root@linux ~]# exit                      # exit the subshell
exit
[root@linux ~]# echo $x                   # the parent shell still knows $x
5
[root@linux ~]# export x=5                # set $x again, this time with export
[root@linux ~]# echo $x                   # no visible difference in the parent shell
5
[root@linux ~]# bash                      # create a subshell again
[root@linux ~]# echo $x                   # now the subshell sees the value of $x
5
[root@linux ~]#
(above adapted from: http://www.unix.com/solaris/161775-export-vs-env-vs-set-commands.html)

Jeff
 
It's hard to say what the pros and cons really are. I think it depends on how much you have to back up and how fast your disks are.

The pro is that it only transfers the changes, so there's way less traffic. The con is that it takes longer - but that's on my setup.

How it will behave on servers with other specs or larger backups, I just can't say. The easiest way to find out is to try it, I guess; it only requires adding the rsyncable option to the cron line and then using the rsync command, or the script I linked to in the first post. Maybe someone already knows more about how rsync works and what the bottlenecks really are; just from monitoring the stats, it's not easy to say.

Things to consider in general:
- How many GB you have to back up
- How fast your disks are (probably on both the DA server and the backup server; you would think the comparison work happens on both). What I can think of:
- + rsync already has the previous data and needs to check the new backups for differences - that costs disk I/O on the DA server
- + it needs to build the new backup from the previous one plus the changes - that costs disk I/O on the backup server
- How fast the connection between server <> backup is. If it's slow, rsync would definitely be the best option, but most people will have it on an internal network or between datacenters.

For anyone who hasn't read the whole topic, in short this is what I've been using:
- You add GZIP="--rsyncable" to the cron (see the post by John)
- The rsync script is on the backup server; it connects to the DA server using SSH keys
- The script uses 8 directories to keep 8 days of retention (or eight runs of the script)
- It uses the rsync parameter "--link-dest=../backup.1" to use the previous backup as a reference, but creates a complete new tree in backup.0 (backups 0 to 6 are moved one number up and 7 gets deleted; of course you can adapt it all to your needs) - see the rough sketch below
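
For reference, a rough sketch of what the backup-server side of that looks like (the hostname and paths are placeholders; the script linked in the first post is more complete):
Code:
#!/bin/bash
# Pull-style rotating backup, assuming SSH keys from the backup server to the DA server.
SRC="root@da-server:/home/admin/admin_backups/"
DEST="/backups"

rm -rf "$DEST/backup.7"                               # drop the oldest copy
for i in 6 5 4 3 2 1 0; do                            # shift backup.N to backup.N+1
    [ -d "$DEST/backup.$i" ] && mv "$DEST/backup.$i" "$DEST/backup.$((i+1))"
done

# Pull a complete new tree into backup.0; files that are unchanged compared to
# backup.1 become hard links, so only the changes cost traffic and disk space.
rsync -a -e ssh --delete --link-dest=../backup.1 "$SRC" "$DEST/backup.0/"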
 
Thanks. In our case we're doing automatic DirectAdmin backups over an unmetered network, so I prefer the shorter time. I'm leaving my setup alone, but for backups across a metered network I could see trying it.

Jeff
 
No, --rsyncable doesn't work for tar, only for gzip. But if it's set as an environment variable, it will apply to the gzip part of tar.
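
For example (assuming your gzip build supports --rsyncable and still honors the GZIP environment variable, as older versions like 1.3.x do; the file names are just placeholders):
Code:
# --rsyncable is a gzip option, so it does nothing for a plain tar archive:
tar -cf backup.tar /home/user

# With tar -z, gzip is invoked under the hood and picks the option up from
# the GZIP environment variable - which is what the modified cron line relies on:
GZIP="--rsyncable" tar -czf backup.tar.gz /home/user

# Equivalent without the environment variable:
tar -cf - /home/user | gzip --rsyncable > backup.tar.gz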
 
I added the rsyncable change to my dataskq cron and have used the rsync script on the backup server, but each night I receive the entire contents of the admin backup (i.e. 4 GB) and not an incremental change.

I suspect that the * * * * * root GZIP="--rsyncable" /usr/local/directadmin/dataskq line is either not working (although the backups are created), or that I am missing a step.

Should I be using backup_gzip=0?

Jon
 
Can anyone help me with my issue of the full backup being pulled each day rather than just the incremental changes?

I have * * * * * root GZIP="--rsyncable" /usr/local/directadmin/dataskq in my crontab and am running the rsync script each day (the one listed on the first page, which works). The DirectAdmin server runs CentOS 5 with gzip version 1.3.5.

Jon
 
Posting again with only 10 minutes in between will not make people reply faster; keep in mind that this is a user-to-user help forum.

That said, the crontab should work as described; if the files are created, I suppose the backup itself is working fine.

You should check whether there is a gzip option to verify that a gzipped file was made with the --rsyncable option, and also read the DirectAdmin logs and the rsync logs.
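
I don't know of a gzip flag that inspects an existing file, but you can at least confirm that your gzip build accepts the option, something like:
Code:
# exits non-zero and prints an "invalid option" error if this gzip build lacks --rsyncable
echo test | gzip --rsyncable > /dev/null && echo "this gzip accepts --rsyncable"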

Something must not be working, but... the cron line as posted should actually work...

I still don't know whether the "GZIP" environment variable has been made standard in DA; maybe John could clarify this.

Regards
 
Thanks, but if you look at the post dates you will see that there is a good eight days between the posts!

Will ask DA for clarification.

Jon
 
No Worries ;-)

I looked into system.log and 2012-Oct-04.log but I'm not sure there's anything relevant - what should show up there if it is working, so I can grep the log files for it?

Jon
 