Stuck with special character " ° " on Rocky Linux

Hello Rocky Linux Family,

I am using Nextcloud to sync files from computer A (an unknown computer; I’m guessing a Mac) to computer B (a Rocky Linux server).

Some of the invoices that come from computer A have special characters in their names, and this causes weird behaviour on the Rocky Linux server:

[screenshot]

Does anyone have experience with this kind of behaviour? Ideally, I would like to check each filename for unknown characters first, and then strip those characters from the filename.

Kind regards
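A minimal sketch of the first half of that idea (finding names with unknown characters), assuming GNU iconv as shipped with glibc: iconv exits non-zero when its input is not valid UTF-8, so piping each name through it flags the odd ones. The helper name `list_non_utf8` is hypothetical.

```shell
#!/bin/bash
# Sketch: print every entry in a directory whose name is not valid UTF-8.
# Assumes GNU iconv, which exits non-zero on an invalid input sequence.
# list_non_utf8 is a hypothetical helper name.

list_non_utf8() {
    local dir=$1 path name
    for path in "$dir"/*; do
        [ -e "$path" ] || continue            # empty directory: glob did not match
        name=${path##*/}                      # basename without the directory part
        # Convert UTF-8 -> UTF-8; a failure means the name has invalid bytes
        if ! printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$path"
        fi
    done
}

# Usage: list_non_utf8 ~/TEST
```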

Hi,

What’s the output of:

localectl

on the Rocky Server? Does the filename show properly in the Nextcloud web interface?

Hi! Thank you for your reply.

This is the output:

[screenshot]

I am from Belgium. We use the " ° " sign for degrees Celsius, and I guess some people also use it for “number” (as in “n°”).

This server is a Nextcloud client; the file is hosted on a different server. I moved the file out of the synced folder and synced the changes to the server, so I am unable to check how it looks in the Nextcloud web interface.

Can you do:

localectl set-locale en_US.UTF-8

and see how the filename displays on the Rocky server after that? You have en_US, but no UTF-8 support.

[root@rocky ~]# localectl
   System Locale: LANG=en_US.UTF-8
       VC Keymap: us
      X11 Layout: us

That’s how it shows on a basic install on my server, so I expect that should help. Your keymaps can stay as they are; I’m mostly interested in the system locale.

Log out and log back in again after the change.

I have changed the locale settings and logged back in.

Now the result is as follows:

[screenshot]

If you can log into computer A, I’d check what the intended characters are, because the Nextcloud sync might be getting them wrong. Stripping bad characters isn’t a good idea; you want the exact original filename.

The original file name is “FACTUUR n°Credit Nota n° -0097000156.PDF”

I also have an application running on the server where users can upload files themselves. When I uploaded a file with the exact same name there, it seemed to work perfectly:

[screenshot]

So the issue seems to arise when the file comes through the Nextcloud server?

I am not able to log into computer A, unfortunately.

If you can’t log into computer A, how were you able to work out what the correct file name was?
It’s interesting that this forum seems able to render the name correctly.

I asked my client what the original file name was. I do not think he will let me access his computer, and I doubt I can convince him to run “random commands” (as he will see them) on it.

OK that’s good he was able to communicate the correct file name.

You could ask him to zip two of the files and send them to you; then you could double-check the exact names and compare them with what Nextcloud is showing.

The exact name is indeed “FACTUUR n°Credit Nota n° -0097000156.PDF”

When the file is uploaded through the Nextcloud UI, it becomes this:

[screenshot]

So I’m guessing it has something to do with the communication between the Nextcloud server and my Nextcloud client?

At least some filesystems do have a specific codepage for filenames.

I did encounter that with NetApp:

  • Mounted a volume with NFSv3. Copied files. Mounted a volume with NFSv4. Output of ‘ls’ stops entirely on the first “peculiar” character. Cannot upload via NFSv4 either.
  • Created another volume with a different codepage. Copied and accessed files fine with NFSv4.

That is, even NFSv3 and NFSv4 treat/handle filenames differently.

Samba utilities do include convmv that “converts filenames from one encoding to another”.
(The hard part is guessing the source encoding, particularly when not all sources use the same encoding.)
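For a single file, a rough iconv-based equivalent of what convmv does might look like the sketch below. It assumes you already know (or have guessed) the source encoding, e.g. ISO-8859-1, and that GNU iconv and coreutils are available; `reencode_name` is a hypothetical helper, not part of convmv or Samba.

```shell
#!/bin/bash
# Sketch: rename one file by re-encoding its name from a guessed source
# encoding to UTF-8. reencode_name is a hypothetical helper name.

reencode_name() {
    local from=$1 path=$2
    local dir=${path%/*} name=${path##*/} new
    [ "$dir" = "$path" ] && dir=.             # path had no directory part
    new=$(printf '%s' "$name" | iconv -f "$from" -t UTF-8) || return 1
    [ "$new" = "$name" ] && return 0          # name unchanged, nothing to do
    # --: stop option parsing; -n: never overwrite an existing file
    mv -n -- "$path" "$dir/$new" && printf '%s\n' "$dir/$new"
}
```

Usage: `reencode_name ISO-8859-1 ~/TEST/somefile`. Because of `mv -n`, a file that already has the target name is left alone rather than overwritten.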

Since I have no control (I think) over how the Nextcloud server and my Linux Nextcloud client communicate, what is your opinion on this approach:

“Preferably, I would like to check the filename first for unknown characters, and then remove the unknown character from the filename”

In my case, the unknown characters are not important (they can be removed and nobody would notice); processing the file has priority.

EDIT:

I have been fiddling around and made this script:

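#!/bin/bash

# Expand to nothing (instead of the literal pattern) if ~/TEST is empty
shopt -s nullglob
fileArray=(~/TEST/*)

touch tempIconvFile

for i in "${fileArray[@]}"
do
    # Skip the temporary file and the script itself
    if [[ ${i} != *"tempIconvFile"* && ${i} != *".sh" ]]; then
        echo $i >> tempIconvFile
        # -c drops any bytes that are not valid UTF-8
        iconv -f utf-8 -t utf-8 -c tempIconvFile -o outputIconvFile
        # The last line of the output is the cleaned-up current name
        newFileName=$(cat outputIconvFile | tail -n 1)
        # Remove all spaces from the new name
        newFileName="${newFileName// /}"
        echo $newFileName
        echo $i
        rm outputIconvFile
    fi
done

rm tempIconvFile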

which produces this output:

/root/TEST/FACTUURnCreditNotan-0090006324.PDF
/root/TEST/FACTUUR n▒Credit Nota n▒ -0090006324.PDF

Any thoughts?
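A sketch of the same idea as the script above without temporary files, assuming GNU iconv and coreutils: each name is piped through `iconv -c` directly and the file is renamed with quoted arguments. Unlike the script, it keeps the spaces (add `new=${new// /}` if you want them removed too). `strip_invalid_names` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: drop bytes that are not valid UTF-8 from each name in a directory
# and rename the file. strip_invalid_names is a hypothetical helper name.

strip_invalid_names() {
    local dir=$1 path name new
    for path in "$dir"/*; do
        [ -e "$path" ] || continue            # empty directory: glob did not match
        name=${path##*/}
        # iconv -c can still exit non-zero after dropping bytes, so ignore status
        new=$(printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 -c) || true
        [ "$new" = "$name" ] && continue      # already valid UTF-8
        [ -n "$new" ] || continue             # nothing usable left as a name
        # Quoting plus -- keeps a word like "-0090006324.PDF" from being
        # parsed as an option; -n refuses to overwrite an existing file.
        mv -n -- "$path" "$dir/$new" && printf '%s\n' "$dir/$new"
    done
}

# Usage: strip_invalid_names ~/TEST
```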

My next step would be to mv the file to change its name, but when I try to ls the original file, I get this error:

ls: invalid option -- '0'
Try 'ls --help' for more information. 

My guess is that it cannot find the original file on the drive because of the unknown character in the name, which is displayed like this:

[root@localhost TEST]# ls
'FACTUUR n'$'\260''Credit Nota  n'$'\260'' -0090006324.PDF'   test.sh
[root@localhost TEST]# ls 'FACTUUR n'$'\260''Credit Nota  n'$'\260'' -0090006324.PDF'
'FACTUUR n'$'\260''Credit Nota  n'$'\260'' -0090006324.PDF'
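The `invalid option -- '0'` error most likely comes from running ls on the name without quotes: the shell splits it at the spaces, and the last word, `-0090006324.PDF`, starts with a dash, so GNU ls parses it as options. The quoted ls in the output above works for that reason. A sketch that reproduces this in a scratch directory, assuming bash and GNU coreutils (the file name is built with printf so it contains the same b0 bytes):

```shell
#!/bin/bash
# Sketch: unquoted vs quoted access to a name containing " -0090006324.PDF".

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
name=$(printf 'FACTUUR n\260Credit Nota  n\260 -0090006324.PDF')
touch -- "$name"

# Unquoted: word splitting makes "-0090006324.PDF" a separate argument,
# which GNU ls treats as options and rejects.
ls $name 2>&1 | head -n 1

# Quoted, with -- to end option parsing, the file is found.
ls -- "$name"

# A glob also works and avoids typing the odd bytes at all.
mv -- FACTUUR*-0090006324.PDF renamed.PDF
ls
```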

OK, but that doesn’t look right, even if it’s a different language.

Do you think it displays on the client’s screen exactly like that?

I’m looking at the letter ‘n’ followed by the (two byte) degree sign, and wondering if it’s actually a single character consisting of three bytes?

I think it does show up exactly like that on the client’s screen.

Maybe this can help you?

Output from

echo FACTUUR*-0097000156.PDF|od -tx1 -c
0000000  46  41  43  54  55  55  52  20  6e  b0  43  72  65  64  69  74
          F   A   C   T   U   U   R       n 260   C   r   e   d   i   t
0000020  20  4e  6f  74  61  20  20  6e  b0  20  2d  30  30  39  30  30
              N   o   t   a           n 260       -   0   0   9   0   0
0000040  30  36  33  32  34  2e  50  44  46  0a
          0   6   3   2   4   .   P   D   F  \n
0000052
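In that dump each ° is the single byte b0 (octal 260, the same `$'\260'` that ls showed), not a multi-byte sequence. In UTF-8 the degree sign is the two bytes c2 b0, while in ISO-8859-1 and Windows-1252 it is the single byte b0. That suggests, though it does not prove, that the name reached the server in a Latin-1-style encoding rather than UTF-8, which would also explain why `iconv -c` simply dropped it. A small sketch to compare the byte sequences, assuming GNU iconv and od:

```shell
#!/bin/bash
# Sketch: how "n°" is stored in UTF-8 versus ISO-8859-1 (Latin-1).

# UTF-8: the degree sign takes two bytes
printf 'n\302\260' | od -An -tx1                                   # → 6e c2 b0

# ISO-8859-1: one byte per character, matching the dump above
printf 'n\302\260' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1    # → 6e b0

# Reading the on-disk bytes as Latin-1 gives valid UTF-8 again
printf 'n\260' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1        # → 6e c2 b0
```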

Any ideas? I’m still facing this issue. If possible, could someone take a look at my script for removing the special characters from the filename (see the discussion above)?