Recode Text to Another Coding

I need to convert about 1000 .srt files (subtitles), located in various subfolders, from Windows-1250 encoding to UTF-8. For that purpose I've found a tool called "recode.exe" that does this easily. I thought it would be easy to set up a button for it, but apparently I've grown rusty over time. I'm not sure whether I should write a script for this or whether I can get away with Opus's standard set of commands.

My idea is this:

  1. Open the subfolders in flat view
  2. Select the .srt files

The Opus command button then should:

  1. Make a backup of each selected file in its own folder, simply by renaming it with an added ".W1250" extension
  2. Do the conversion on the primary file, using this command:
    recode {in-file} {out-file} /in=Windows-1250 /out=utf-8

My question is: what should the button's commands look like to achieve this? If the answer is that it isn't possible without a script, no need to bother; I should know how to do that.
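If it does come to a script, the two steps listed above can be sketched outside Opus. The Python below is an assumption for illustration (the thread itself uses recode.exe): it walks all subfolders the way flat view shows them, renames each .srt to `<name>.W1250`, and writes a UTF-8 version under the original name. Python's `cp1250` codec stands in for recode's `Windows-1250`, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def backup_and_convert(path: Path, src_encoding: str = "cp1250") -> Path:
    """Step 1: rename path to '<name>.W1250'. Step 2: write UTF-8 under the original name."""
    backup = path.with_name(path.name + ".W1250")  # test.srt -> test.srt.W1250
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    data = path.read_bytes()
    # Decode before touching anything: Python's strict cp1250 codec raises
    # UnicodeDecodeError for the few byte values Windows-1250 leaves undefined.
    text = data.decode(src_encoding)
    path.rename(backup)                     # step 1: backup by renaming
    path.write_bytes(text.encode("utf-8"))  # step 2: UTF-8 without BOM, line endings kept
    return backup


def convert_tree(root: Path, pattern: str = "*.srt") -> List[Path]:
    """Convert every matching file in root and all its subfolders (like flat view)."""
    # Collect the list first so files written during the run are not revisited.
    files = sorted(p for p in root.rglob(pattern) if p.is_file())
    return [backup_and_convert(p) for p in files]
```

Reading and decoding before the rename means a file that fails to decode is left exactly as it was, and the existing-backup check stops an accidental second run from converting already converted files.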

I know a similar issue was raised here a couple of years ago, but to be honest I couldn't work out the practical usage, as that thread ended up being about toggling the BOM.

Thanks in advance :slight_smile:

You can create backups with

@nodeselect 
@nofilenamequoting
Copy "{filepath}" AS="{file}.W1250" HERE

But since recode.exe refuses to overwrite existing files, you could skip this and directly use a DOS-type button:

@nodeselect 
@nofilenamequoting
Recode.exe "{filepath}" "{filepath}.utf-8" /in=Windows-1250 /out=utf-8
pause

Strangely, recode seems to get confused by filenames with spaces (even when quoted!), so you might need to rename your files before running the button.


So I decided to overcome my laziness and dive into the Opus command-parsing logic. To my pleasant surprise I found that it already seems to support UTF-8, although I couldn't get it to work properly. I should be able to specify the source encoding, but I can't see an option for it. What am I missing here?

Copy {filepath} AS={file|noext}.W1250.srt HERE
Copy {filepath} AS={file|noext|utf8}.UTF8.srt HERE

Result (apparently wrong):

 Name                  Ext        Size
 test.srt              srt     91,8 KB
 test.UTF8.sr          srt     91,8 KB
 test.W1250.srt        srt     91,8 KB

Should be:

Name                    Ext        Size
test.srt                srt     91,8 KB
test.UTF8.srt           srt       93 KB
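Why the expected copy is larger: ASCII characters take one byte in both encodings, but each Central European letter such as č, š or ž takes one byte in Windows-1250 and two bytes in UTF-8. A real conversion of a subtitle file containing such letters must therefore grow, and a copy with the identical 91,8 KB size means the bytes were copied unchanged. A minimal Python sketch, with Python's `cp1250` codec standing in for Windows-1250 and a hypothetical helper name:

```python
from typing import Tuple


def encoded_sizes(text: str, src: str = "cp1250") -> Tuple[int, int]:
    """Return (bytes in the legacy code page, bytes in UTF-8) for the same text."""
    return len(text.encode(src)), len(text.encode("utf-8"))


print(encoded_sizes("Hello"))  # (5, 5): ASCII is 1 byte in both
print(encoded_sizes("čšžřě"))  # (5, 10): 1 byte each in cp1250, 2 bytes each in UTF-8
```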

The utf8 modifier doesn't do what you think. From the manual:

Forces the file written by file, filem or fileq to use UTF-8 format. For example, {allfilepath|filem|utf8}.

It is for when you are writing all the selected filenames out to a temporary text file, using the file, filem, or fileq modifiers.

The file modifier in the manual:

Redirects the filenames to a temporary text file. This is useful with external programs that can accept a list of files from a text file rather than on the command line. (In turn, that is useful because command lines have a maximum length which may limit the number of filenames you can pass directly.) The name of the temporary file is passed on the command line in place of the filenames themselves. In the text file, each filename is separated by a space. If a filename or its path contains an embedded space it will be surrounded by quotes. For example, {allfilepath|file}.
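To make the quoted description concrete, here is a rough Python emulation of the kind of list file `{allfilepath|file}` produces: names separated by spaces, quoted when the filename or its path contains a space. It is an illustration, not Opus's code; the function name and list-file path are hypothetical, and the encoding is a parameter because the manual excerpt only says that `|utf8` forces UTF-8 (Python's `utf-8` writes no BOM; whether Opus does is not stated here).

```python
from typing import List


def write_file_list(paths: List[str], list_path: str, encoding: str = "utf-8") -> str:
    """Write paths to a text file: separated by spaces, quoted if they contain a space."""
    parts = ['"' + p + '"' if " " in p else p for p in paths]
    with open(list_path, "w", encoding=encoding) as fh:
        fh.write(" ".join(parts))
    # In Opus, the path of this temporary file is what gets inserted into the command line.
    return list_path
```

For `C:\subs\a.srt` and `C:\my subs\b.srt`, the file contains `C:\subs\a.srt "C:\my subs\b.srt"`. Note that this only lists filenames; it never touches the contents of the files themselves.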

It doesn't have anything to do with making copies of files.

(Don't confuse the {file} code, which inserts a filename into a command, with the file modifier, which is added to a code like {allfilepath} to make something like {allfilepath|file} which writes filenames to a temporary file and then inserts the path to that file into the command.)

If the Recode.exe tool you were using isn't working, Opus itself can convert text file contents between encodings, but you would need to use scripting for that. See the StringTools Encode and Decode methods.


Thanks, @Leo, now it's clear. I'll try to sort this out with Recode.exe, and if that doesn't work, off to scripting we go.

BTW, is there a list of the valid encodings, or should we browse Microsoft's web pages for it?

From the manual's description of the Decode method:

Otherwise, format must be set to a valid code-page name (e.g. "gb2312", "utf-8"), or a Windows code-page ID (e.g. 936, 65001). The source will be decoded using the specified code-page and a string is returned.

Googling "windows code page" finds a Wikipedia page that lists at least the common ones: Windows code page - Wikipedia

I think the exact list will depend on which version of Windows and which languages you have installed in Windows.

For your case, 1250 and "utf-8" should work as the two values. (Note that one is a number and the other is a string.)
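The same "number or name" idea can be sketched with Python's codec registry; this is Python, not Opus's StringTools, and the helper names are hypothetical. Python names Windows code pages `cp<ID>`, so a numeric ID such as 1250 maps to `cp1250`, while a string such as `"utf-8"` is looked up directly. Common IDs work this way, but not every Windows code page has a Python codec.

```python
import codecs
from typing import Union


def resolve_code_page(cp: Union[int, str]) -> codecs.CodecInfo:
    """Accept a Windows code-page ID (e.g. 1250) or a code-page name (e.g. "utf-8")."""
    name = f"cp{cp}" if isinstance(cp, int) else cp  # 1250 -> "cp1250"
    return codecs.lookup(name)  # raises LookupError for unknown names


def decode_bytes(data: bytes, cp: Union[int, str]) -> str:
    """Decode raw file bytes using the given code page, returning a string."""
    return data.decode(resolve_code_page(cp).name)


print(decode_bytes(b"\xe8", 1250))         # č
print(decode_bytes(b"\xc4\x8d", "utf-8"))  # č
```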


Continuing the discussion from Recode Text to Another Coding:

So I finally made it. :slight_smile:

I made a small C# program called CodeToUTF8 that converts the files nicely and safely. The C# source and the exe can be found on the 'CodeToUTF8' GitHub page.

--

The Opus button:

@nodeselect 
Copy DUPLICATE {filepath} AS="{file|noext}.OriginalBackup" HERE
/dopusdata\UserCommands\CodeToUTF8.exe "{filepath}" "{filepath}" Windows-1250
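The thread doesn't say which checks CodeToUTF8 performs to convert "safely". As a hypothetical illustration only, one safeguard worth having in any such converter is to skip files that already decode as valid UTF-8, so running the button twice doesn't double-encode them. A Python sketch (function names hypothetical; Python's `cp1250` codec stands in for Windows-1250; it does not create a backup, since the button's Copy DUPLICATE line already does that):

```python
from pathlib import Path


def needs_conversion(data: bytes) -> bool:
    """True if data is NOT already valid UTF-8 (pure ASCII counts as valid)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def convert_in_place(path: Path, src_encoding: str = "cp1250") -> bool:
    """Rewrite path as UTF-8; return False if it was left untouched."""
    data = path.read_bytes()
    if not needs_conversion(data):
        return False  # already UTF-8 (or pure ASCII): leave as is
    path.write_bytes(data.decode(src_encoding).encode("utf-8"))
    return True
```

The check is a heuristic: a Windows-1250 text can occasionally also be valid UTF-8 (the byte pair C5 A1 is "Ĺ" followed by "ˇ" in Windows-1250 but "š" in UTF-8), so keeping the backup copy, as the button does, remains worthwhile.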

--

The result: Before / After [screenshot]
