Text viewer - Cyrillic symbols display incorrectly

kmi · April 5, 2017, 3:32pm

Hi
If I preview a text document encoded in ANSI 1251, it is displayed incorrectly.
I can't change the text encoding in the plugin. Text in UTF-8 displays fine, but not all documents are in UTF-8.

Jon · April 6, 2017, 3:18am

I think it would only be expected to work if your system code page was set to 1251 as well.

kmi · April 6, 2017, 4:40am

The system locale is set to Russia, and all other programs (Far Manager, Total Commander and Notepad, for example) display the content correctly. Please add an option to change the encoding in the text viewer.

Leo · April 6, 2017, 8:50am

The text viewer in Opus should do the same thing, I would have expected.

Can you zip and attach a couple of example files with the problem for us to look at?

kmi · April 6, 2017, 9:11am

Yes, DOpus.zip (167.7 KB)
Two text files and two screenshots

qiuqiu · April 6, 2017, 9:45am

I recommend that the Text Viewer plugin use this project for character set and text encoding detection.
It would also be useful to add an Encode column to DOpus to identify the text encoding or character set.
https://www.freedesktop.org/wiki/Software/uchardet/

Leo · April 6, 2017, 10:16am

We would rather not use anything that tries to guess encodings. There is no reliable way to do so, and it can go horribly wrong and cause more problems than it solves.

Explicit encodings work best, either through configuration (as you can do with the Text File Thumbnails plugin, but not currently with the Text Viewer plugin; we may merge the two at some point), or by using Unicode with a BOM at the start of the file, if the file is not in the codepage that the OS is configured to use.
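
As a rough illustration of the "Unicode with a BOM" route, a file in a known legacy codepage could be converted once so that its encoding becomes explicit. This is only a sketch in Python, not anything Opus does itself; the function name `to_utf8_with_bom` is made up for the example, and the source encoding has to be known in advance rather than guessed.

```python
def to_utf8_with_bom(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known encoding to UTF-8 with a leading BOM.

    Hypothetical helper for illustration only. `source_encoding` must be
    supplied by the caller (e.g. "cp1251"); nothing here tries to detect it.
    """
    text = data.decode(source_encoding)
    # Python's "utf-8-sig" codec writes the bytes EF BB BF before the text.
    return text.encode("utf-8-sig")
```

For example, `to_utf8_with_bom(open("file.txt", "rb").read(), "cp1251")` would give bytes that can be written back to disk, where `file.txt` is a placeholder name.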

Leo · April 6, 2017, 10:22am

From the results you are seeing, your Text viewer plugin configuration must have Assume UTF-8 without BOM turned on. (Configured under Preferences / Viewer / Plugins.)

Otherwise, the UTF-8 file would not work, as it is missing the BOM at the start which indicates it is UTF-8. (Without that BOM, which is standard on Windows but non-standard on Linux, software can only either guess or assume that the data is UTF-8. Guessing is unreliable. A BOM makes things explicit, but is not always used. The Assume UTF-8 without BOM option tells Opus to make the assumption of UTF-8 when the BOM is not there.)

Since you have that option turned on, it means the 1251 file will not work, because it will also be interpreted as UTF-8 data. (That's what the option does, after all.)

If you turn the option off, I would expect the 1251 file to then work, assuming your system is configured with that locale. The UTF-8 file would then break, unless it was re-saved to have a BOM at the start that identifies it as UTF-8 data.
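
The behaviour described above can be sketched roughly as follows. This is a simplified model in Python, not the plugin's actual code: the function and parameter names are invented for the example, only the UTF-8 BOM is handled (UTF-16 BOMs are left out), and the system codepage is passed in as a plain argument.

```python
import codecs


def decode_text(data: bytes, assume_utf8_without_bom: bool,
                system_codepage: str = "cp1251") -> str:
    """Simplified model of how a viewer might pick an encoding.

    All names here are hypothetical; this only mirrors the rules
    described in the post, it is not the plugin's implementation.
    """
    # 1. A UTF-8 BOM makes the encoding explicit, regardless of the option.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # 2. No BOM and the option is on: treat everything as UTF-8.
    if assume_utf8_without_bom:
        return data.decode("utf-8", errors="replace")
    # 3. No BOM and the option is off: fall back to the system codepage.
    return data.decode(system_codepage, errors="replace")
```

With the option on, a 1251 file falls into case 2 and comes out as replacement characters; with the option off, a UTF-8 file without a BOM falls into case 3 and comes out as mojibake. Only a file with a BOM reads correctly under both settings.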

kmi · April 6, 2017, 11:21am

I tried to turn off Assume UTF-8 without BOM in the text plugin configuration, but it remains turned on.
What am I doing wrong?

Leo · April 6, 2017, 12:02pm

That is a bug, apologies. We have fixed it for the next update.

In the meantime, if you go to /dopusdata/ConfigFiles/Plugins (usually C:\Users\USERNAME\AppData\Roaming\GPSoftware\Directory Opus\ConfigFiles\Plugins) and edit text.oxc it should look similar to this:

<?xml version="1.0" encoding="UTF-8"?>
<text>
	<flags>3</flags>
	<max_preview>0</max_preview>
	<font>Courier New/-24,400,0,0,0,0,49,200,9</font>
	<font_hex>Courier New/-24,400,0,0,0,0,0,200,9</font_hex>
</text>

Find the flags line and subtract 2 from it. It should either be 2 or 3 at the moment.

If it's 2, change it to 0.
If it's 3, change it to 1.

Save the config file and the setting should be off now.
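
If you would rather script the edit, the steps above could be sketched like this in Python. It assumes the value 2 in the flags number behaves as an independent bit (which is what "subtract 2 from 2 or 3" suggests, but is not confirmed); the constant and function names are made up for the example.

```python
import re

# Assumption: 2 is the bit for "Assume UTF-8 without BOM". The name is
# hypothetical; only the value comes from the instructions above.
ASSUME_UTF8_FLAG = 2


def clear_assume_utf8(config_text: str) -> str:
    """Return the config text with the 2 bit cleared in the <flags> value."""
    def fix(match: re.Match) -> str:
        flags = int(match.group(2))
        # Clearing the bit maps 3 -> 1 and 2 -> 0, and leaves 0 or 1 alone.
        return f"{match.group(1)}{flags & ~ASSUME_UTF8_FLAG}{match.group(3)}"

    return re.sub(r"(<flags>)(\d+)(</flags>)", fix, config_text, count=1)
```

Read text.oxc as UTF-8, pass its contents through `clear_assume_utf8`, and write the result back, keeping a backup copy of the original. Clearing the bit rather than literally subtracting 2 means running it twice does no harm.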

kmi · April 7, 2017, 8:20am

Thanks, it works.