Strange Error During Backups

roarkh

Hello,

I have a DirectAdmin server that has been running perfectly since late last year. I have it set up to do Reseller-level backups of each domain every night. The backups are stored on an FTP server on our local network, which our DirectAdmin server reaches through a port forward set up on our router.

On Saturday morning I replaced our router with a newer model of the same brand. After transferring the configuration over, I logged on to our DirectAdmin server and used ftp from the command line to verify that the port forward for FTP was working properly. It allowed me to connect just fine, so I assumed all was good.

However, when the backups ran that night I got an email saying "An error occurred during the backup." I logged in to look at the detailed message and saw that 9 of 10 users backed up just fine. The user that triggered the error is the 7th of the 10 users and is by far the largest (that user's backup file is about 45 GB). Here is the message...

Code:
Could not read reply from control connection -- timed out.
ncftpput /home/tmp/admin/<username>.tar.gz: could not send file to remote host.

Now for the "strange" part. I took a look at the folder on the server where the backups are sent. The file that supposedly "timed out", <username>.tar.gz, was there, and it was approximately the size I would have expected. I copied the file to my desktop, ungzipped it, then untarred it, and as far as I can tell the backup is there and intact.

Clearly, swapping out our router had something to do with causing this issue, so it is probably not specifically a DirectAdmin problem. However, I am very confused about why I would be receiving the above error message if the backup actually completed.

Does anyone have any tips on how I might troubleshoot this issue? I'm not sure if I should trust the backup I have or not. If the backup was really corrupt, would I have been able to unzip and untar it successfully? I was under the impression that DirectAdmin created the backup file locally and then copied the file to the remote server. If that is the case and the file is intact, it seems that the error message may be bogus, but I am not sure. I realize there is perhaps not a lot to go on here, but if anyone has any ideas about how this could happen I would appreciate hearing them.

Thanks in advance.
 
If you back up to your local network I presume you're using NAT somewhere, in which case you might be running into NAT timeouts. You might be able to work around this by echoing new timeout values to the /proc/sys/net/ipv4/tcp_keepalive_* settings, but because you switched routers you might want to look for equivalent settings in your router first.
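Before changing anything, it can help to see what the keepalive values currently are. Here is a minimal Python sketch that reads the three Linux settings (tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes) from /proc. The function name and the base_dir parameter are my own, added only so the code can be pointed at a different directory; it reads values and does not change them.

```python
from pathlib import Path

# The three Linux TCP keepalive sysctls exposed under /proc/sys/net/ipv4.
KEEPALIVE_KEYS = (
    "tcp_keepalive_time",    # idle seconds before the first probe is sent
    "tcp_keepalive_intvl",   # seconds between unanswered probes
    "tcp_keepalive_probes",  # unanswered probes before the connection is dropped
)


def read_keepalive_settings(base_dir="/proc/sys/net/ipv4"):
    """Return the current keepalive values as a dict (None if unreadable).

    base_dir is a hypothetical parameter for illustration and testing.
    """
    settings = {}
    for key in KEEPALIVE_KEYS:
        try:
            settings[key] = int((Path(base_dir) / key).read_text().strip())
        except (OSError, ValueError):
            # File missing (non-Linux system) or content not an integer.
            settings[key] = None
    return settings


if __name__ == "__main__":
    for name, value in read_keepalive_settings().items():
        print(f"{name} = {value}")
```

If tcp_keepalive_time turns out to be longer than the router's idle timeout for NAT entries, the first probe would arrive too late to keep the mapping open, which would fit the symptom described above.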
 
Thanks for your reply. It definitely is the router: just to be sure, I swapped the "old" router back in last night and the backup completed with no error messages. I am still confused by the fact that I seem to end up with a full backup even when receiving the error. If the file really was not copying successfully, I would expect it to be corrupt and that I would not be able to ungzip and untar it (perhaps I'm wrong about that?). That is what is so confusing about this situation to me. I will talk with the router vendor today about "NAT timeout settings."
 
Well, there's probably nothing wrong with the backup, as it is transmitted over a separate port/stream (the FTP data connection), which stays busy for the whole transfer. The error occurs after the transmission, when the control stream needs the confirmation of the completed transmission. By that time either your router or firewall might have closed the idle control connection. Some routers close connections if no data is transmitted for a specified time. Some firewalls even close connections after a specified time if data actually IS being transmitted (NSA 2400 firewalls have a 'nice' setting to do this actively).
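If you want more certainty than "it untarred fine", one option is to compare a checksum of the local archive with a checksum of the copy on the backup server. Below is a minimal Python sketch of that idea. The function names are made up for illustration, and it assumes both files are still available to hash (I don't know whether DirectAdmin keeps the local file in /home/tmp after the upload).

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a 45 GB archive never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(local_path, remote_copy_path):
    """Return True when both files have identical SHA-256 digests."""
    return sha256_of_file(local_path) == sha256_of_file(remote_copy_path)
```

For example, run sha256_of_file on each machine and compare the two hex strings by eye, or call copies_match when both files are reachable from one host. Also note that the gzip format stores a CRC-32 of the uncompressed data, so a truncated or damaged .tar.gz would normally fail during decompression rather than unpack cleanly.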

You might look for another FTP client (the ones on some of the *BSDs keep sending NOOP commands, which do nothing except keep the control channel alive).
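A different technique from NOOP, shown here only to illustrate the idea: a client can ask the kernel to send TCP keepalive probes on the control connection itself. The Python sketch below does that for an arbitrary socket. The function name and the timing values are assumptions of mine, the TCP_KEEP* options are platform-specific (hence the getattr guard), and none of this changes how ncftpput behaves inside DirectAdmin.

```python
import socket


def enable_keepalive(sock, idle=60, interval=30, count=5):
    """Turn on TCP keepalive probes for one socket.

    idle, interval and count are illustrative values, not recommendations.
    """
    # Without SO_KEEPALIVE the kernel sends no probes on this socket at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket overrides of the system-wide tcp_keepalive_* values.
    # These constants do not exist on every platform, so skip missing ones.
    for option_name, value in (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before giving up
    ):
        option = getattr(socket, option_name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With Python's ftplib, for instance, the control socket is available as ftp.sock once connected, so enable_keepalive(ftp.sock) would be the call to try. Whether the probes actually keep a NAT entry alive depends on the router, so treat this as something to experiment with rather than a guaranteed fix.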

On the other hand... there should be something mentioned about this in the admin or user guide of your new router... somewhere... :)
 