Unicode chars make script output display as something else

Unicode characters in script output mess up the display of specific characters (read: backslash) in everything printed after them.
I tried passing the text both through a variable and directly as a literal, but the result is the same.

Notice how the backslash displays properly before the Japanese characters (I don't understand Japanese, they're just test chars).
However, every backslash after them is displayed as a yen character (Y with two crossing lines).

Btw, when copying the yen character to paste into the browser, it pastes a backslash.

var unicodeteststr="リリズム";
var filename="D:\\Prg\\Projects\\MUtil\\Tests\\"+unicodeteststr+"Test\\TestData"+unicodeteststr+".txt";
var dirname="D:\\Prg\\Projects\\MUtil\\Tests\\"+unicodeteststr+"Test";

var filename2="D:\\Prg\\Projects\\MUtil\\Tests\\リリズムTest\\TestDataリリズム.txt";
var dirname2="D:\\Prg\\Projects\\MUtil\\Tests\\リリズムTest";

DOpus.Output('FileName ='+filename);
DOpus.Output('FileName ='+filename);

DOpus.Output('DirName ='+dirname);
DOpus.Output('DirName ='+dirname);

DOpus.Output('FileName2 ='+filename2);
DOpus.Output('FileName2 ='+filename2);

DOpus.Output('DirName2 ='+dirname2);
DOpus.Output('DirName2 ='+dirname2);

Here's what I see:

Where/in what is the script being edited and stored?

I'm using EditPad Pro to edit it.
I pasted it into the CLI, and the edit window looks (mostly) ok.
There is some skewing because of the Japanese characters, but nothing else.
However, the output from the script... doesn't look ok.

I've attached the script as a file.
test.js.txt (811 Bytes)

I mean, the output from the script doesn't look ok in the output windows.
(both the one below the edit window/field, and the "Script Output" log window).

Pasting into the CLI script editor and running it there seems to work OK here in Windows 10 and Opus 11.15.2, with English as the locale (locale might be relevant; not sure).

Did I do the right thing?


Yes you did. I guess either the font or the edit field behavior has changed compared to W7 x64.

Btw, the path separator is apparently the yen character on japanese systems (which mine isn't), and
those two characters apparently have the same charcode.

After searching a bit, maybe it is shown as a yen character because there was a Japanese character present.
If so, it'll probably display a won (Korean money) character if a Korean character is present.
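
One way to check that only the glyph changes, and not the character itself, is to dump the code units of the string. The following is a minimal sketch, not part of the original test script: `codeUnits` is a hypothetical helper name, and `console.log` is used only so the snippet runs outside Opus (inside Opus the result could be passed to `DOpus.Output` instead). A real backslash is U+005C, while a genuine yen sign would be U+00A5.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string as "U+XXXX".
// Written in plain ES3 style so it should also run under JScript.
function codeUnits(str) {
    var out = [];
    for (var i = 0; i < str.length; i++) {
        var hex = str.charCodeAt(i).toString(16).toUpperCase();
        // Pad to four hex digits.
        while (hex.length < 4) { hex = "0" + hex; }
        out.push("U+" + hex);
    }
    return out.join(" ");
}

// Same shape as the paths in the test script: backslash, Japanese text, backslash.
var sample = "Tests\\リリズムTest\\";
var dump = codeUnits(sample);

// Outside Opus, print the dump; inside Opus, DOpus.Output(dump) would be used.
if (typeof console !== "undefined") { console.log(dump); }
```

If every path separator in the dump is U+005C, the string is intact and the yen glyph is purely a display (font) effect, which matches the observation that copying the yen character pastes a backslash.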

I have english W7 Pro, and use US Eng and Norwegian keyboard mapping (toggle with shift-alt).

The reason I began with this at all was to debug unicode output from an activex library
when using scripts (it is somewhat beneficial to be able to handle unicode filenames etc)
which led to a "strange" display..and not the expected kind of "strange" (as in, not raw utf8 characters).
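
For comparison, the "raw utf8 characters" kind of strange can be reproduced on purpose. This is only an illustrative sketch with hypothetical helper names (`asRawUtf8`, `fromRawUtf8`); it relies on the old `escape`/`unescape` trick, which is deprecated but still available in JScript and modern engines. It turns each UTF-8 byte into its own character, which is what mis-decoded UTF-8 looks like, and leaves ASCII such as the backslash untouched.

```javascript
// Hypothetical helper: re-interpret the UTF-8 encoding of a string as one
// character per byte (how UTF-8 text looks when decoded as a single-byte charset).
function asRawUtf8(str) {
    return unescape(encodeURIComponent(str));
}

// Hypothetical helper: reverse the conversion above.
function fromRawUtf8(raw) {
    return decodeURIComponent(escape(raw));
}

var original = "\\リ";              // backslash followed by one katakana character
var raw = asRawUtf8(original);      // backslash stays one char, the katakana becomes three
var restored = fromRawUtf8(raw);    // round-trips back to the original string

if (typeof console !== "undefined") {
    console.log(original.length, raw.length, restored === original);
}
```

A lib that returns raw UTF-8 would produce strings that grow in length like `raw` does here; a yen glyph in place of a backslash with the string length unchanged is a different problem.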

In the post above, it should be "(as in, raw utf8 characters)".
Yep, I was right. Replace unicodeteststr with "백묵 헤드라인" (no idea what it says) and the
backslashes become ₩.

Considering it works in W10, I assume the log display is the standard windows edit control..which
kind of explains the apparent 32K character limit I've mentioned before.

The log display is a sub-classed RichEdit control, which doesn't have a 32K limit as far as I know.

However I do think this is probably caused by a bug in the RichEdit control. If you look at Leo's screenshot you can see that after the first appearance of the Asian characters, the font in the control has changed. The second and subsequent "FileName" strings are obviously in a different font to the first one.

When the control is created Opus sets its font to Courier New, but after that we don't change the font at all, so this font change is something the control is doing itself when it sees the Asian characters. If you copy the text out of the log and paste it into WordPad, at least on my system the second font is shown as "nSimSun". So I guess the bug here is that, while it's fine for the control to switch the font, it's not switching it back again afterwards.

I get the same result as Leo in that the backslashes remain as backslashes so my only guess is that either on Windows 10 the "nSimSun" font has a proper backslash character at that codepoint, or it's something the richedit control knows how to do.

[quote="jon"]The log display is a sub-classed RichEdit control, which doesn't have a 32K limit as far as I know.[/quote]
I tested it just now, and I could get it to print 176KB so it doesn't seem to have that limit, at least now.
I'm not sure what caused it, but when I posted this 32K seemed to be the limit.

Btw, the short script in that post still does the same in 11.15.2 (today it displayed the column-text for 10273 of 17354 files).

Yes, considering it works in W10, it's probably been fixed in the later versions.
That bug doesn't really affect me much as I'm just testing my lib with those characters, but I still thought I should report it.

If you're interested, there's a good description here of why backslashes end up as yen symbols in the first place.

That was one of the places I found while searching for this issue :)
I was kind of happy to see that bug initially though, because at least my lib worked perfectly.