I am using Nextcloud to sync files from computer A (unknown computer, I’m guessing a Mac) to computer B (a Rocky Linux server).
Some of the invoices that come from computer A have special characters in their names, which causes weird behaviour on the Rocky Linux server.
Does anyone have experience with this kind of behaviour? Preferably, I would like to check the filename first for unknown characters, and then strip the unknown character from the filename
I am from Belgium, where we use Celsius ("°") for degrees, and I guess some people also use that symbol for “number”.
This server is a Nextcloud client; the file is hosted on a different server. I moved the file out of the synced folder and synced the change to the server, so I am unable to verify how it looks in the Nextcloud web interface.
and see how the filename displays on the Rocky server after that? You have en_US, but no UTF-8 support.
[root@rocky ~]# localectl
System Locale: LANG=en_US.UTF-8
VC Keymap: us
X11 Layout: us
That’s how it shows on a basic install on my server, so I expect that should help. Your keymaps can stay as they are; I’m mostly just interested in the system locale.
If you can log into computer A, I’d check for the intended characters, because the Nextcloud sync might be getting it wrong. Stripping bad characters isn’t a good idea; you want the exact original filename.
The original file name is “FACTUUR n°Credit Nota n° -0097000156.PDF”
I also have an application running on the server where users can upload files themselves. When I uploaded a file with the exact same name, it seemed to work perfectly.
The issue seems to arise when the files come through the Nextcloud server?
If you can’t log into computer A, how were you able to work out what the correct file name was?
It’s interesting that this forum seems able to render the name correctly.
I asked my client what the original file name was. I do not think he will let me access his computer, and I doubt I can convince him to run “random commands” (as he will see them) on his computer.
OK that’s good he was able to communicate the correct file name.
You could ask him to zip two of the files and send them to you, then you could double check the exact names and compare them with what Nextcloud is saying.
At least some filesystems do have a specific codepage for filenames.
I did encounter that with NetApp:
Mounted a volume with NFSv3. Copied files. Mounted a volume with NFSv4. Output of ‘ls’ stops entirely on first “peculiar” character. Cannot upload via NFSv4 either
Created another volume with different codepage. Copied and accessed files fine with NFSv4.
That is, even NFSv3 and NFSv4 treat/handle filenames differently.
Samba utilities do include convmv that “converts filenames from one encoding to another”.
(The hard part is guessing the encoding of the source, particularly since not all sources use the same encoding.)
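Before guessing a source encoding, it can help to find out which names are not valid UTF-8 at all. Here is a minimal sketch of such a check; `is_valid_utf8` is a made-up helper name, and the two sample names are invented for the demo. It relies on `iconv` failing on invalid input when `-c` is not given.

```shell
#!/bin/bash
# is_valid_utf8 NAME: succeeds if NAME is a valid UTF-8 byte sequence.
# (Hypothetical helper, not part of any tool mentioned in this thread.)
is_valid_utf8() {
    # Without -c, iconv exits non-zero on the first invalid byte sequence.
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Sample names built with octal escapes so the script itself stays ASCII.
good=$(printf 'n\302\260 1.PDF')   # degree sign as UTF-8 (bytes c2 b0)
bad=$(printf 'n\260 1.PDF')        # degree sign as the single byte b0

for name in "$good" "$bad"; do
    if is_valid_utf8 "$name"; then
        echo "valid UTF-8"
    else
        echo "not valid UTF-8"
    fi
done
```

Only the names that fail this check would need converting; names that pass are best left alone.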
Since I have no control (I think) over how the communication is done between the Nextcloud server and my Linux Nextcloud client, what is your opinion on this take:
“Preferably, I would like to check the filename first for unknown characters, and then remove the unknown character from the filename”
In my case, the unknown characters are not important (they can be removed and nobody would turn their head), the processing of the file has priority
EDIT:
I have been fiddling around and made this script:
#!/bin/bash
# Print a cleaned-up name for every file in ~/TEST, followed by the original name.
shopt -s nullglob
for i in ~/TEST/*
do
    # Skip the script itself
    if [[ ${i} != *".sh" ]]; then
        # iconv -c drops every byte that is not valid UTF-8
        newFileName=$(printf '%s' "$i" | iconv -f utf-8 -t utf-8 -c)
        # Remove all spaces from the name
        newFileName="${newFileName// /}"
        echo "$newFileName"
        echo "$i"
    fi
done
which produces this output:
/root/TEST/FACTUURnCreditNotan-0090006324.PDF
/root/TEST/FACTUUR n▒Credit Nota n▒ -0090006324.PDF
Any thoughts?
My next step would be to mv the file to change its name, but when I try to ls the original file, I get this error:
ls: invalid option -- '0'
Try 'ls --help' for more information.
My guess is that it cannot find the original file on the drive due to the unknown character in the name, since it is displayed like so:
[root@localhost TEST]# ls
'FACTUUR n'$'\260''Credit Nota n'$'\260'' -0090006324.PDF' test.sh
[root@localhost TEST]# ls 'FACTUUR n'$'\260''Credit Nota n'$'\260'' -0090006324.PDF'
'FACTUUR n'$'\260''Credit Nota n'$'\260'' -0090006324.PDF'
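The quoted form in the session above works, which suggests the file itself is reachable. One possible explanation for the `invalid option -- '0'` error (an assumption, since the exact failing command isn't shown) is that the name was passed unquoted, so the shell split it at the spaces and `ls` received `-0090006324.PDF` as a bundle of options. Quoting the variable and putting `--` before it avoids that. A rough sketch of the rename step, run in a throwaway directory with a recreated file name:

```shell
#!/bin/bash
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)

# Recreate a name like the one above: byte 0xb0 (octal 260), spaces and a "-".
touch -- "$dir/$(printf 'FACTUUR n\260Credit Nota n\260 -0090006324.PDF')"

for path in "$dir"/*; do
    old=${path##*/}    # file name without the directory part
    # iconv -c drops bytes that are not valid UTF-8; tr then removes the spaces.
    new=$(printf '%s' "$old" | iconv -f UTF-8 -t UTF-8 -c | tr -d ' ')
    if [ "$old" != "$new" ]; then
        # Quotes keep the name as one argument; "--" ends option parsing.
        mv -- "$dir/$old" "$dir/$new"
    fi
done

ls -- "$dir"
```

On a Linux box this should leave a single file named `FACTUURnCreditNotan-0090006324.PDF` in the scratch directory. For real use, replace `$dir` with the synced folder and try it on copies first.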
I think it does show up exactly like that on the client’s screen
Maybe this can help you?
Output from
echo FACTUUR*-0097000156.PDF|od -tx1 -c
0000000 46 41 43 54 55 55 52 20 6e b0 43 72 65 64 69 74
F A C T U U R n 260 C r e d i t
0000020 20 4e 6f 74 61 20 20 6e b0 20 2d 30 30 39 30 30
N o t a n 260 - 0 0 9 0 0
0000040 30 36 33 32 34 2e 50 44 46 0a
0 6 3 2 4 . P D F \n
0000052
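The dump shows a single byte `b0` where the degree sign should be. In ISO-8859-1 (and Windows-1252) `b0` is the degree sign, while in UTF-8 the same character is the two bytes `c2 b0`. Assuming the names really arrive in one of those encodings (a guess based on this one byte), converting the name instead of stripping it would keep the original character. A small sketch; `to_utf8` is a made-up helper name:

```shell
#!/bin/bash
# to_utf8 NAME: reinterpret NAME as ISO-8859-1 and print it as UTF-8.
# (Hypothetical helper; the source encoding is an assumption.)
to_utf8() {
    printf '%s' "$1" | iconv -f ISO-8859-1 -t UTF-8
}

# Same bytes as in the dump above, written with octal escapes.
old=$(printf 'FACTUUR n\260Credit Nota n\260 -0090006324.PDF')
new=$(to_utf8 "$old")

# Hex dump of the converted name: each b0 should now be preceded by c2.
printf '%s' "$new" | od -An -tx1
```

This should only be applied to names that are not already valid UTF-8, otherwise they get encoded twice. The convmv tool mentioned earlier does the same kind of conversion for whole directories.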
Any ideas? I’m still facing this issue. If possible, maybe someone can take a look at my script to remove the special characters from the filename (see discussion).