Unicode filenames (in backup)

MtK

Hey,
When a user uploads a file whose name is in something other than Latin characters, for example עברית.txt (Hebrew), everything is fine, and the server lives well with the file's name and content.

Upon backup of that user, that same file's name becomes: עברית.txt
(its content is still intact)

this is really bad, because if we want to restore the account, all those unicode filenames cannot be read anymore by the site storing them.
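
The thread does not establish how the name gets mangled. As an illustration only, here is a minimal Python sketch of one common mechanism, assuming the UTF-8 bytes of the name are re-read under a single-byte encoding (Latin-1 is chosen here purely as an example). It shows that the mangled form can be reversed as long as no bytes were lost:

```python
# Hypothetical illustration: a UTF-8 filename misread as Latin-1.
# Nothing here is confirmed as the cause of the backup problem.
name = "עברית.txt"

raw = name.encode("utf-8")        # bytes as stored on a UTF-8 filesystem
garbled = raw.decode("latin-1")   # the same bytes misread as Latin-1

# Reversing the wrong decode step restores the original name,
# provided every byte survived the round trip.
recovered = garbled.encode("latin-1").decode("utf-8")

print(len(name), len(raw), len(garbled))
print(recovered == name)
```

If a restored file name can be repaired this way, the bytes in the archive are intact and only the interpretation of them is wrong.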




btw, not sure it's related, but DA doesn't like non-Latin filenames.
when trying to create one, like: עברית.txt
Code:
Unable to create create new file

Details

The new file path is invalid.
But the same file can be uploaded with the same name to the same path via FTP without any issues.
 
What happens if the file name is entirely in Hebrew, for example:
Code:
עברית
(In other words, I'm wondering whether mixing right-to-left and left-to-right character sets in the same word is causing the code to misunderstand the character set.)

Jeff
 
Hello,

DirectAdmin itself is going to be more strict with regard to what it allows for filenames...
However, if the file is in the public_html somewhere, the name shouldn't matter, since it's all up to tar to compress the "domains" directory.

Where exactly is the עברית.txt name being seen? Is it stored like that in the tar.gz file itself, or only after it's been extracted?

Which version of tar are you running?
It may be possible that a newer version has a fix... as some quick googling of the issue results in several posts about possible tar bugs.
Code:
tar --version
we've got a compile guide here for an unrelated issue:
http://help.directadmin.com/item.php?id=220

I believe 1.26 is still the latest version

John
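
One way to answer the "in the tar.gz itself, or only after extraction" question is to list the member names without extracting anything. The following is a minimal Python sketch using the standard `tarfile` module; the helper names and the in-memory sample archive are hypothetical and stand in for a real backup file:

```python
import io
import tarfile


def build_sample_archive(filename, content):
    """Build a small in-memory .tar.gz holding one file (stand-in for a backup)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz",
                      format=tarfile.PAX_FORMAT, encoding="utf-8") as tar:
        info = tarfile.TarInfo(name=filename)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def list_member_names(archive_bytes):
    """Return the member names stored in a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz",
                      encoding="utf-8") as tar:
        return tar.getnames()


sample = build_sample_archive("public_html/עברית.txt", b"hello")
names = list_member_names(sample)
print(names)
```

For a real backup, `tarfile.open` can be given the path of the archive instead of `fileobj`. If the names come back intact here but wrong after restore, the problem is more likely in the extraction or display step than in the archive.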
 
Regarding Jeff's question about a file name entirely in Hebrew: same result.

Regarding John's question about the tar version: upgraded to 1.26, nothing changed.
even tried to manually create a tar.gz (not through backup) and got:
Code:
An error occured while creating the compressed file

Details

/domains/unicode.lighthost.co.il/public_html/עברית is not a valid path
and the same for
Code:
/domains/unicode.lighthost.co.il/public_html/עברית.txt is not a valid path
 
DirectAdmin itself is going to be more strict with regards to what it allows for filenames.
This holds true for this case. What DA allows as a filename is going to be very strict.
The name you've mentioned contains characters which DA may not handle correctly, so it does not fall under the category of allowed filenames.
I wouldn't consider this a DA bug, but rather a design choice.
The issue with tar, however, may be a tar bug (or perhaps just a design choice by the tar developers; I'm not too sure).
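
To make the "strict by design" point concrete, here is a minimal Python sketch of what a strict filename allowlist could look like. The pattern and the `is_allowed` helper are hypothetical; this is not DirectAdmin's actual rule, only an example of a policy that would reject the names discussed above:

```python
import re

# Hypothetical allowlist: ASCII letters, digits, dot, underscore, hyphen.
# NOT DirectAdmin's real validation, just an example of a strict policy.
STRICT_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_allowed(filename):
    """Return True only if every character is in the ASCII allowlist."""
    return STRICT_NAME.fullmatch(filename) is not None


print(is_allowed("notes.txt"))   # plain ASCII name
print(is_allowed("עברית.txt"))   # mixed Hebrew and Latin
print(is_allowed("עברית"))       # Hebrew only
```

Under a rule like this, both the mixed and the all-Hebrew name are rejected for the same reason, which would be consistent with the "same result" reported earlier in the thread.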

Other services like proftpd, apache, etc. will likely handle them correctly, so they can be used normally.

John
 