FSU.Run and Unicode support

I noticed that if I run certain commands through it, the response isn't decoded correctly.
If I run the exact same command in PowerShell, I get the proper Unicode characters.

I even tried:

var FSU = DOpus.FSUtil();
var query = 'cmd /c chcp 65001';
var r = FSU.Run(query, 0, true);
if (!r) {
	DOpus.Output('=> Unable to run command');
}
else if (r.stderr) {
	DOpus.Output('=> Error : ' + r.stderr);
}
else DOpus.Output(r.stdout);


Is there something I'm missing?
Instead of trying to run it through cmd, can we get some way to set the codepage? And could the default value support Unicode?
Thanks

Unicode and cmd.exe is always tricky.

What are you ultimately trying to do?

If it's outputting UTF-8 instead of ANSI, Opus has no way of knowing that, and you might need to use StringTools to convert the output.

cmd.exe also has a /U argument that "Causes the output of internal commands to a pipe or file to be Unicode" and may be useful. (I suspect it means UTF-16, not UTF-8, but haven't used that argument in a long time, since before Windows started supporting UTF-8, so it may have changed.)
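If /U does turn out to produce UTF-16 (little-endian on Windows), the raw bytes would need to be read as two-byte code units rather than as UTF-8. A minimal sketch of that decoding step is below. It assumes you already have the raw bytes as a plain array of numbers (for example from a redirected file); `decodeUtf16le` and `utf16Sample` are hypothetical names for illustration, not part of the Opus API.

```javascript
// Sketch only: decode UTF-16LE bytes (low byte first) into a string.
// `bytes` is a plain array of numbers in the range 0-255.
function decodeUtf16le(bytes) {
	var out = '';
	// Each UTF-16 code unit is two bytes; an odd trailing byte is ignored.
	for (var i = 0; i + 1 < bytes.length; i += 2) {
		out += String.fromCharCode(bytes[i] | (bytes[i + 1] << 8));
	}
	return out;
}

// Cyrillic "тр" as UTF-16LE: U+0442 is stored as 42 04, U+0440 as 40 04.
var utf16Sample = [0x42, 0x04, 0x40, 0x04];
var utf16Text = decodeUtf16le(utf16Sample);
```

Surrogate pairs need no special handling here, because the string itself stores UTF-16 code units.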

I just want to run a command through FSU.Run() (not cmd) and process its output. But if stdout contains any Unicode characters, I get garbage characters instead, as you can see in the example above.

Isn't it possible to set the codepage for the runner that Opus uses through FSU.Run()?
I'm not sure whether the tool I'm calling is to blame here, since I remember the MediaInfo CLI doesn't have this problem. But like I said, if I run the same command in PowerShell, I get the correct output.

I found out how to do it in this specific case, but please consider adding an option to set or force the codepage for FSU.Run().

StringTools should let you convert the output from a specific codepage to UTF-8.

@Jon, @Leo,

When you run a process using WinApi, it gives a UTF-8 pipe for the child process's stdout. You need to convert it to a wide string to return from the FSU.Run method. How do you do that?

I ask because I needed to solve a similar problem in DOpus-Scripting-Extensions.ProcessRunner, where I convert the output from the pipe using boost::locale: ToUtf16().

Actually looking at it I think we may be assuming UTF-8 output already, which might be the problem if the tools are using the current code page instead. We can probably make that controllable.

I have tested it, and DOpus-Scripting-Extensions.ProcessRunner has the same problem. Tools that print their output in UTF-8 work properly, for example:

query = 'C:/Program Files/Git/usr/bin/echo.exe arg1_трじα';
var r = FSU.Run(query, 0, true);
DOpus.Output(r.stdout); // prints "arg1_трじα"

However, if the executable prints its output in a non-UTF-8 encoding, then the output is incorrect.
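To illustrate why that happens, here is a rough sketch of what "assume UTF-8" does to bytes that were written in a legacy code page. It is a simplified decoder written for this post, not how Opus or ProcessRunner actually convert the pipe: `decodeUtf8` is a hypothetical helper, it emits U+FFFD for every byte it cannot place, and it skips some strictness checks (overlong three-byte forms, encoded surrogates).

```javascript
// Sketch only: a simplified UTF-8 decoder over a plain array of byte values.
// Invalid input yields one U+FFFD replacement character per bad byte.
function decodeUtf8(bytes) {
	var out = '';
	var i = 0;
	while (i < bytes.length) {
		var b = bytes[i];
		var need = -1; // number of continuation bytes expected
		var cp = 0;    // code point being assembled
		if (b < 0x80) { need = 0; cp = b; }
		else if (b >= 0xC2 && b <= 0xDF) { need = 1; cp = b & 0x1F; }
		else if (b >= 0xE0 && b <= 0xEF) { need = 2; cp = b & 0x0F; }
		else if (b >= 0xF0 && b <= 0xF4) { need = 3; cp = b & 0x07; }

		// The lead byte must be valid and all continuation bytes must be present.
		var ok = need >= 0 && i + need < bytes.length;
		for (var k = 1; ok && k <= need; k++) {
			var c = bytes[i + k];
			if ((c & 0xC0) !== 0x80) ok = false;
			else cp = (cp << 6) | (c & 0x3F);
		}

		if (!ok) {
			out += '\uFFFD';
			i += 1;
			continue;
		}
		if (cp > 0xFFFF) {
			// Code points above the BMP become a surrogate pair.
			cp -= 0x10000;
			out += String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
		} else {
			out += String.fromCharCode(cp);
		}
		i += need + 1;
	}
	return out;
}

// "тр" written as UTF-8 decodes cleanly.
var fromUtf8 = decodeUtf8([0xD1, 0x82, 0xD1, 0x80]);
// The same two letters written in code page 1251 are not valid UTF-8.
var fromCp1251 = decodeUtf8([0xF2, 0xF0]);
```

With this sketch, `fromUtf8` holds the original two letters, while `fromCp1251` ends up as two replacement characters, which is the kind of garbage described above.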

Strangely, even using query = 'cmd /c chcp 65001 & echo arg1_трじα' doesn't help.

I dug a bit more. In my opinion, it is impossible to influence/detect the encoding of the child process. There is an option to return a byte array from FSU.Run and let the user convert it to a string using whatever encoding they want.
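A rough sketch of what the byte-array idea would mean for the caller: the same bytes turn into different text depending on which code page you choose. Everything here is hypothetical. FSU.Run does not return a byte array at the time of this thread, `decodeSingleByte` is a name made up for this post, and the tables are deliberately partial (ASCII plus the basic Cyrillic letters of code pages 866 and 1251 only).

```javascript
// Sketch only: decode single-byte text with a caller-chosen code page.
// Bytes outside the partial tables become U+FFFD.
function decodeSingleByte(bytes, codePage) {
	var out = '';
	for (var i = 0; i < bytes.length; i++) {
		var b = bytes[i];
		var cp = -1;
		if (b < 0x80) {
			cp = b; // ASCII is shared by both code pages
		} else if (codePage === 1251 && b >= 0xC0) {
			cp = 0x0410 + (b - 0xC0); // CP1251: C0-FF map to U+0410-U+044F
		} else if (codePage === 866 && b <= 0xAF) {
			cp = 0x0410 + (b - 0x80); // CP866: 80-AF map to U+0410-U+043F
		} else if (codePage === 866 && b >= 0xE0 && b <= 0xEF) {
			cp = 0x0440 + (b - 0xE0); // CP866: E0-EF map to U+0440-U+044F
		}
		out += cp < 0 ? '\uFFFD' : String.fromCharCode(cp);
	}
	return out;
}

// Two bytes as a console tool using code page 866 would write "тр".
var rawBytes = [0xE2, 0xE0];
var as866 = decodeSingleByte(rawBytes, 866);   // "тр"
var as1251 = decodeSingleByte(rawBytes, 1251); // "ва"
```

Nothing in the bytes themselves says which reading is correct, so the choice has to come from whoever knows what the tool emits.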
Honestly, I don't know if it's worth it. All console executables should use UTF-8.

Not necessary to auto detect, just let us set the codepage.
And maybe setting it to 65001 as the default would be a good idea?

Not necessary to auto detect, just let us set the codepage.

Unfortunately, it seems impossible. Maybe @Jon will be able to find something, but I couldn't.

Do you have a specific use case where it is a problem?

There's no way to tell the command what encoding to use, but we can control how its output is converted to the string the function returns. At the moment it's hardcoded to assume UTF-8, but in the next beta we'll add an argument to the function to let the encoding be specified.
