Text Viewer cannot parse UTF-8 without BOM text file correctly!

This causes non-ASCII characters (e.g. Chinese characters) to display incorrectly, such as "娴嬭瘯缂栫爜aaa".

Can this be solved?

Thanks.
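
For anyone trying to reproduce this: the garbled string above looks like what you get when UTF-8 bytes are decoded with a Chinese ANSI code page. Here is a minimal Python sketch of that. It assumes the original text was "测试编码aaa" and that the system code page is GBK (code page 936); both are guesses from the symptom and are not stated in the post.

```python
# Assumption: the file held "测试编码aaa" saved as UTF-8 without a BOM,
# and the viewer decoded it using the GBK (code page 936) locale default.
original = "测试编码aaa"

raw = original.encode("utf-8")   # the bytes on disk, no BOM in front
garbled = raw.decode("gbk")      # what a GBK-assuming viewer would show

# Reversing the wrong decode recovers the text, which suggests the file
# itself is fine and only the assumed encoding was wrong.
restored = garbled.encode("gbk").decode("utf-8")

print(garbled)
print(restored == original)
```

The ASCII part ("aaa") survives either way because UTF-8 and GBK agree on the ASCII range; only the multi-byte characters get mangled.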

Without a BOM, nothing identifies a .txt file as being UTF-8.

Programs can either assume a particular encoding (the text viewer assumes the one which Windows is configured to assume in the locale settings), or they can try to guess the encoding (which does not always work well).
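
To illustrate the "check for a BOM, otherwise assume something" approach, here is a rough Python sketch. It is not how Opus itself is implemented; the function names `sniff_bom` and `decode_text` and the `fallback` parameter are made up for the example.

```python
import codecs
import locale

# BOM signatures, longest first: the UTF-32 LE BOM starts with the same
# two bytes as the UTF-16 LE BOM, so UTF-32 has to be checked before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def decode_text(data, fallback=None):
    """Decode using the BOM if present; otherwise assume an encoding.

    fallback is a hypothetical user setting. When it is not given, the
    locale's preferred encoding is assumed, similar in spirit to a
    viewer that follows the Windows locale settings.
    """
    encoding = sniff_bom(data) or fallback or locale.getpreferredencoding(False)
    # errors="replace" keeps the sketch from raising on undecodable bytes.
    return data.decode(encoding, errors="replace")
```

With a BOM, `decode_text(codecs.BOM_UTF8 + "测试".encode("utf-8"))` decodes correctly on any system. Without one, the result depends entirely on the assumed encoding, unless the caller passes something like `fallback="utf-8"`.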

Hello. Would it be a better idea in the future to let users pick which encoding Directory Opus should assume if it comes across a file that does not have a BOM? That would at least help with this problem. The reason is that a BOM is needed in UTF-16 and UTF-32 to indicate byte order, but "The Unicode Standard neither requires nor recommends the use of the BOM for UTF-8". Therefore quite a few editors and other software do not write a BOM when saving UTF-8 files, and because Directory Opus does not know that these files are UTF-8, it will display all of them incorrectly.

And like you said, trying to guess the encoding does not always work well. Therefore it's probably better to have users manually set the assumed encoding for all files that do not have any indication of which encoding they use.

We may add an option for that to the viewer. We already have one in some other places (e.g. file content searching; text-file thumbnails).

However, BOMs are the Windows standard, even with UTF-8, and are the only way to explicitly indicate that a file is in a UTF format.

Note also that you can tell Windows itself that you want text to be interpreted as UTF-8 by default, via the language/locale control panel, and Opus will pick up that setting. So if you believe text should be UTF-8 by default, you can tell Windows and Opus to do that already, but you have to believe it enough to want it for all of your applications.

Ignore the last paragraph in my post above; I just checked it and while it's possible in theory, it doesn't seem to be exposed as an option in the control panel, so it's not really a viable path.

Yes that would be great if you can add that option for the viewer. It makes it easier for the user to do it directly within Directory Opus and even carry it with the portable version of Directory Opus without having to muck with Windows settings depending on what system they go to.

And yes while it's true that it's a Windows standard, it's also a problem if you download various text files from the internet since they typically don't have the BOM for UTF-8 files depending on what OS they are written on or the other person's preferences. I guess the idea behind Unicode is they want Unicode to be read as the default (instead of any Latin or ANSI based encoding) hence the idea that UTF-8 should not need a BOM and OSs should automatically read it as such.

Haha yeah I never ever remember seeing that option before. But if there was, it almost sounds like Microsoft is saying "we don't trust our detection algorithm, so here's an option to "fix" that" haha. Anyway, thanks!

I second the suggestion to add the UTF-8 option in the Viewer, or to make it a default "guess", since it's one of the most common encodings. As External mentioned above, the Unicode Standard doesn't recommend using a BOM, so all of my test files from Sublime Text 3 and other sources have no BOMs and hence are displayed incorrectly in the viewer.

Given that this was raised a while ago, is there an indication when this would be fixed?
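
For what it's worth, a "default guess" of UTF-8 can be sketched fairly simply, because text in legacy code pages that contains non-ASCII characters only rarely happens to be valid UTF-8. The Python below is only an illustration of that idea, not a description of what the viewer does; `guess_decode` and the "cp1252" fallback are assumptions chosen for the example.

```python
def guess_decode(data, fallback="cp1252"):
    """Try strict UTF-8 first; if the bytes are not valid UTF-8, use fallback.

    Returns (text, encoding_used). The fallback default stands in for
    whatever the locale or a user option would supply.
    """
    try:
        # Strict decoding raises on any invalid UTF-8 sequence.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace"), fallback
```

This still counts as guessing: a short legacy-encoded file can occasionally pass as valid UTF-8, and pure ASCII files look the same under both encodings, so a manual override remains useful.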

I'm following this thread for the first time because it relates to an issue I'm having.

I've never heard the acronym "BOM" before. Can someone explain what it is?

en.wikipedia.org/wiki/Byte_order_mark

Also, the option was added in the recent betas:


Thank you, Leo.