Why does thumbnail view sometimes not 'view inside' text files?

Hi,

Lately I have had some discussions with customers and providers about the encoding of txt files, because some of them prefer Unicode, some prefer UTF-8 for the APIs, and some don't mind at all as long as the file can be viewed with Notepad.

These people are a pain in the ass for me; I didn't care much about encoding until now.

The question is: is there any way in Dopus to know which encoding a txt file was saved with?
Does it have to do with the fact that sometimes the thumbnail view does not display the inside of a txt file?

Thank you

The TextThumbs plugin should be able to display thumbnails for all of those types of files, provided they have a BOM (byte-order-mark) at the start of them, which is mandatory for UTF-16 text files (but not written by some programs, unfortunately).

At least, that's how I remember it working. I would have to look at the source code to be completely sure. :slight_smile:
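For illustration, the BOM check described above could be sketched in Python roughly like this. This is not the TextThumbs source; the name `sniff_bom` and the returned encoding labels are made up for the example, and only the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# Longer BOMs are tested first: the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# A BOM-less file gives None, so the caller has to guess or fall back.
print(sniff_bom(b"\xff\xfeh\x00i\x00"))
print(sniff_bom(b"hi"))
```

A result of `None` only says that no BOM was found; it does not prove the file is ASCII or binary.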

Can you send me a couple of files which don't show up as thumbnails?

Hi,

Thank you for such a quick answer.

I'm attaching a rar file with two txt files. One is a file I created somehow (I don't remember how) and the other is the result of the usual "create new text file" command in the right-click menu. They do not show up in Thumb view.

UEdit-32 says both of them are "DOS" files, which, after reading through its documentation, means that they are ASCII files, not Unicode or UTF-8.

Well, after reading this program's help I'm even more confused about this stuff. I'm tempted to check the UEdit-32 "use Unicode as default for new files" option, but I don't know if that is the wisest solution.

Any suggestion would be appreciated.

Thank you.
txt_examples.rar (3.33 KB)

juegos.txt is a UTF-16 file but has no BOM at the start, so it looks like a binary file to the TextThumbs plugin (and the Opus Text viewer). If you load it into Notepad and re-save it then the new version will appear correctly in Opus since Notepad writes the BOM. (You can see the file grows by 2 bytes.)
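As a rough Python illustration of the two points above (the 2-byte growth and why BOM-less UTF-16 looks like binary data): the `looks_binary` check here is a hypothetical heuristic based on NUL bytes, assumed for the example, and not necessarily what TextThumbs or the Opus viewer actually does.

```python
import codecs

text = "juegos\r\n"

no_bom = text.encode("utf-16-le")          # UTF-16 little-endian, no BOM
with_bom = codecs.BOM_UTF16_LE + no_bom    # same bytes with the BOM in front


def looks_binary(data: bytes) -> bool:
    """Hypothetical heuristic: data containing NUL bytes is treated as binary."""
    return b"\x00" in data


# The BOM accounts for the size difference between the two versions.
print(len(with_bom) - len(no_bom))

# ASCII characters in UTF-16 each carry a NUL byte, so the BOM-less
# data trips a NUL-based binary check.
print(looks_binary(no_bom))

# Python's "utf-16" codec reads the BOM to pick the byte order.
print(repr(with_bom.decode("utf-16")))
```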

The empty text file shows as an empty text-thumb for me. The TextThumbs plugin cannot tell that an empty file is a text file by looking at the contents (of course :slight_smile:) so it looks at the PerceivedType value in the registry for the extension in question.

Have a look under HKEY_CLASSES_ROOT\.txt where there is usually a PerceivedType string value set to text -- maybe this is missing from your registry?

The PerceivedType registry setting is only ever looked for when TextThumbs is deciding whether to handle an empty file.
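If you would rather check the value from a script than from regedit, something like the following Python sketch should work on Windows. It uses the standard `winreg` module; the function name `perceived_type` is made up for this example, and on other platforms it simply returns `None`.

```python
import sys


def perceived_type(extension=".txt"):
    """Return the PerceivedType string for an extension, or None if it is missing."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg
    try:
        # Opens HKEY_CLASSES_ROOT\<extension> read-only.
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, extension) as key:
            value, _value_type = winreg.QueryValueEx(key, "PerceivedType")
            return value
    except OSError:
        # Raised when the key or the value does not exist.
        return None


print(perceived_type(".txt"))
```

A result of `None` on Windows would mean the value is missing for that extension.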

Hi,

Yep, the reg entry was somehow missing. Now the empty file shows also as an empty thumb for me.

I've done some tests with the 'juegos.txt' file and got some strange results. (All files have the CR/LF DOS line terminators) Have a look:

Encoding               Size (KB)   Thumbnail preview   Viewable in Opus viewer
ASCII                  15          Yes                 Yes
UTF-8                  15          No                  Yes
UTF-16                 30          Yes                 Yes
UTF-8NoBOM             15          Yes                 Yes
UTF-16NoBOM            30          No                  Hexadecimal
UnicodeASCII-Escaped   15          Yes                 Yes

At a glance, it is interesting to see that the 16-bit encoding takes double the space, and that a UTF-8 file without a BOM shows in the thumbnail, while a UTF-8 file with a BOM does not show in the thumbnail but does show in the Opus viewer.
Unless I messed up the files (attached), this does not agree with what you said about the BOM.
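The size pattern can be reproduced with a short Python sketch. The sample text is invented for the example (it is not the content of juegos.txt), and the variant labels are only descriptive names.

```python
import codecs

# Pure-ASCII sample text with CR/LF line endings.
sample = "Lista de juegos\r\n" * 100

variants = {
    "ASCII": sample.encode("ascii"),
    "UTF-8 (BOM)": sample.encode("utf-8-sig"),   # "utf-8-sig" prepends the BOM
    "UTF-16 (BOM)": codecs.BOM_UTF16_LE + sample.encode("utf-16-le"),
    "UTF-8 (no BOM)": sample.encode("utf-8"),
    "UTF-16 (no BOM)": sample.encode("utf-16-le"),
}

for name, data in variants.items():
    print(f"{name:16} {len(data):6} bytes")
```

For text that only contains ASCII characters, UTF-8 without a BOM is byte-for-byte identical to ASCII, which would explain why a plain-text check accepts it, while UTF-16 spends two bytes on every character.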

If I am not wrong, then my suggestions to development would be:

  1. Thumbnail preview for as many encodings as possible.
  2. Or, at least, some consistency between the thumbnail plugin and the viewer.

What do you think?
P.S.: What a pain to make this look like a table :open_mouth:

txt_ex_2.rar (6.5 KB)

When I want a table on the forum, I edit it in Notepad and paste it here inside of the Code tags.

[quote="artema"]
UTF-8                  15          No                  Yes
UTF-16                 30          Yes                 Yes
UTF-8NoBOM             15          Yes                 Yes
UTF-16NoBOM            30          No                  Hexadecimal
UnicodeASCII-Escaped   15          Yes                 Yes
[/quote]
Thanks for testing the different cases! I haven't looked into this further but I guess I need to make the TextThumbs plugin understand UTF-8 BOMs. I'll make the change when I next do some work on the plugin, which should be in the next few weeks.

Ok, in the meantime I'll forward this topic as a suggestion to development, so it won't be forgotten.

Thank you.