Is there a script to convert text files (e.g. srt) into UTF-8 without BOM? For the time being I am using either UTFCast Professional for batch changes or Notepad++ for single-file conversion. In both cases many clicks are involved, and it would be extremely helpful to have a Dopus script button that converts txt/srt/etc. files into UTF-8 without BOM.
Georgios means converting from UTF-8+BOM to UTF-8.
There is a difference between UTF-8 and UTF-8 with BOM.
Some applications have problems with the BOM variant. Stripping the BOM, and thus saving the text file as plain UTF-8, solves those problems.
I myself regularly have to open text files in Notepad++, change the encoding and then save the file again. Georgios is asking for a script that does this as a batch operation. Even for a single file, a button click is faster than opening and saving in a text editor.
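To make the difference between the two variants concrete: the UTF-8 BOM is the three bytes EF BB BF at the very start of the file, and "UTF-8 without BOM" is the same data minus those bytes. Below is a minimal sketch in plain JavaScript (Node.js, outside Opus) that works on byte arrays only; the names `hasUtf8Bom` and `stripUtf8Bom` are made up for illustration and are not part of any Opus API.

```javascript
// The UTF-8 BOM is the byte sequence EF BB BF at the start of a file.
const UTF8_BOM = [0xEF, 0xBB, 0xBF];

// True when the byte sequence starts with the UTF-8 BOM.
function hasUtf8Bom(bytes) {
  return bytes.length >= 3 &&
    bytes[0] === UTF8_BOM[0] &&
    bytes[1] === UTF8_BOM[1] &&
    bytes[2] === UTF8_BOM[2];
}

// Returns the bytes without a leading BOM; data without a BOM is returned unchanged.
function stripUtf8Bom(bytes) {
  return hasUtf8Bom(bytes) ? bytes.slice(3) : bytes;
}

const withBom = Uint8Array.from([0xEF, 0xBB, 0xBF, 0x48, 0x69]); // BOM + "Hi"
const withoutBom = stripUtf8Bom(withBom);
console.log(hasUtf8Bom(withBom), hasUtf8Bom(withoutBom), withoutBom.length); // true false 2
```

Because only the first three bytes are inspected and removed, the rest of the file (including its line endings) is left byte-for-byte intact.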
If it removes the BOM from a file, that file will be deselected. Files that don't start with a UTF-8 BOM will be skipped and left selected.
The script only does minimal error checking, and does not create a backup of the old file before overwriting it, so you might want to create backups first or improve the script if you are going to use it on important data that isn't already backed up.
Here is the script code contained in the .dcf above, if you just want to look at how it works without downloading the .dcf:
OLD script code
function OnClick(clickData)
{
    var tab = clickData.func.sourcetab;
    var cmd = clickData.func.command;
    cmd.deselect = false; // Deselection is handled manually below.
    var vecDeselect = DOpus.Create.Vector();
    var blobBOM = DOpus.Create.Blob(0xEF,0xBB,0xBF); // The UTF-8 BOM bytes.
    var blobFile = DOpus.Create.Blob();
    for (var eSel = new Enumerator(clickData.func.sourcetab.selected_files); !eSel.atEnd(); eSel.moveNext())
    {
        var item = eSel.item();
        var file = item.Open("r", tab);
        // Only continue if the first three bytes can be read and match the BOM.
        if (file.error == 0 &&
            file.Read(blobFile, 3) == 3 &&
            file.error == 0 &&
            blobBOM.Compare(blobFile) == 0)
        {
            // Read the rest of the file (everything after the BOM).
            blobFile.Free();
            file.Read(blobFile);
            if (file.error == 0)
            {
                // Reopen the file for writing and write the data back without the BOM.
                file.Close();
                file = item.Open("wt", tab);
                if (file.error == 0)
                {
                    file.Write(blobFile);
                    file.Close();
                    vecDeselect.push_back(item);
                }
            }
        }
    }
    // Deselect only the files that were actually modified.
    if (vecDeselect.size > 0)
    {
        cmd.ClearFiles();
        cmd.AddFiles(vecDeselect);
        cmd.RunCommand("Select DESELECT FROMSCRIPT");
    }
}
Not only from UTF-8 with BOM. The original file(s) may be UTF-8 with BOM, in which case only stripping the BOM is needed, but they may also be ANSI, ISO, OEM, etc. UTFCast Professional and Notepad++ both do the job fine but... too many clicks. I am always in favor of executing file-related commands within Dopus, whenever applicable. It is The file manager after all!
There is generally no reliable way to automatically tell those encodings apart.
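A small illustration of why that is, as a hedged sketch in plain JavaScript (Node.js): UTF-8 has a strict structure, so bytes can at least be checked for being valid UTF-8, but a single-byte code page assigns a character to almost every byte value, so the same byte is "valid" in several of them. The two lookup tables below are hypothetical one-entry excerpts written out by hand for this example, not a real decoding library.

```javascript
// Strict UTF-8 decoder: throws on any byte sequence that is not valid UTF-8.
const strictUtf8 = new TextDecoder("utf-8", { fatal: true });

// True when the bytes form valid UTF-8.
function isValidUtf8(bytes) {
  try {
    strictUtf8.decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// Hand-written one-entry excerpts of two code pages (illustration only).
const WINDOWS_1252 = { 0xE1: "\u00E1" }; // Latin small letter a with acute
const WINDOWS_1253 = { 0xE1: "\u03B1" }; // Greek small letter alpha

const sample = Uint8Array.from([0xE1]);

// The byte is not valid UTF-8 on its own, so UTF-8 can be ruled out...
console.log(isValidUtf8(sample));
// ...but both code pages accept it, and nothing in the byte says which one was meant.
console.log(WINDOWS_1252[sample[0]], WINDOWS_1253[sample[0]]);
```

Tools that guess the encoding typically rely on statistics about which characters are plausible in a language, which is why their results are only guesses.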
If you have (or can find) tools which do the guesswork well enough, and if they have command-line interfaces, you can run them from Opus buttons to automate things.
The only downside is that this generates a UTF-8 BOM file.
But with some scripting and use of variables you could make it a two-step conversion, with Leo's script as the last part.
I don't know the nitty-gritty of PowerShell, DO variables and scripting, so I can't help you further.
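The two-step idea can be sketched outside Opus in plain JavaScript (Node.js). This is only an illustration under loud assumptions: the source encoding is assumed to be known as ISO-8859-1 (Node's `latin1`), not detected, and `latin1ToUtf8WithBom` is a made-up stand-in for whatever external converter produces UTF-8 with a BOM. The second function then does what the BOM-removal script does as the last step.

```javascript
const BOM_BUFFER = Buffer.from([0xEF, 0xBB, 0xBF]);

// Step 1 (stand-in for the external converter, hypothetical):
// ISO-8859-1 bytes -> UTF-8 bytes with a leading BOM.
function latin1ToUtf8WithBom(bytes) {
  const text = Buffer.from(bytes).toString("latin1"); // decode with the assumed encoding
  return Buffer.concat([BOM_BUFFER, Buffer.from(text, "utf8")]);
}

// Step 2: drop the BOM if the data starts with one.
function dropBom(buf) {
  return buf.subarray(0, 3).equals(BOM_BUFFER) ? buf.subarray(3) : buf;
}

const latin1Input = Uint8Array.from([0x63, 0x61, 0x66, 0xE9]); // "café" in ISO-8859-1
const converted = latin1ToUtf8WithBom(latin1Input);            // EF BB BF 63 61 66 C3 A9
const finalBytes = dropBom(converted);                         // 63 61 66 C3 A9
console.log(finalBytes.toString("hex"));
```

If the assumed source encoding is wrong, step 1 silently produces the wrong characters, so this only helps when the encoding of the input files is actually known.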
Thanks Leo!
To focus on the actual RemoveUTF8BOM functionality and to make it easier to use elsewhere, I put your JScript code together a bit differently; same logic otherwise.
function RemoveUTF8BOM(doItem, tab) {
    var bBOM = DOpus.Create.Blob(0xEF,0xBB,0xBF);
    var bFile = DOpus.Create.Blob();
    tab = tab || null;
    // Read the first three bytes; give up unless they are the UTF-8 BOM.
    var f = doItem.Open("r", tab);
    if (f.error != 0) return false;
    if (f.Read(bFile, 3) != 3 || bBOM.Compare(bFile) != 0) return false;
    bFile.Free();
    // Read the rest of the file (everything after the BOM).
    f.Read(bFile);
    if (f.error != 0) return false;
    f.Close();
    // Reopen the file for writing and write the data back without the BOM.
    f = doItem.Open("wt", tab);
    if (f.error != 0) return false;
    f.Write(bFile);
    f.Close();
    bFile.Free();
    return true;
}
function OnClick(data) {
    var f = data.func, tab = f.sourcetab, selFiles = tab.selected_files;
    var cmd = f.command;
    cmd.deselect = false;
    var vecDeselect = DOpus.Create.Vector(); // "var" added so this is not an implicit global.
    for (var eSel = new Enumerator(selFiles); !eSel.atEnd(); eSel.moveNext()) {
        var bomRemoved = RemoveUTF8BOM(eSel.item(), tab);
        if (bomRemoved) vecDeselect.push_back(eSel.item());
    }
    // Deselect only the files that were actually modified.
    if (!vecDeselect.size) return;
    cmd.ClearFiles();
    cmd.AddFiles(vecDeselect);
    cmd.RunCommand("Select DESELECT FROMSCRIPT");
}
Here is a PowerShell version (needs to be an external file). For safety reasons it uses a temporary file while copying the file contents. If the targetPath parameter is given, it does not overwrite the source file containing the BOM.
PS: I played some more with PowerShell to get something smaller for the UTF-8 BOM removal, but I always ended up with about as much code as the DO-specific JScript, so there's no real benefit in a PowerShell version (unless you need to run that code outside of DO). The three-line PowerShell versions out there always seem to mess with the line endings (e.g. adding extra line breaks at the end), which is not an option if you ask me.
I've posted a newer version of my script above to the Buttons/Scripts area.
You can use this for buttons or commands which add, remove or toggle the UTF-8 BOM at the start of the selected file(s) (or a file specified on the command line):
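For readers who only want the logic behind add, remove and toggle, here is a rough sketch in plain JavaScript (Node.js) that operates on byte arrays. It is not the code of the posted script, and the function names are invented for this example.

```javascript
const BOM = [0xEF, 0xBB, 0xBF];

// True when the bytes begin with EF BB BF.
function startsWithBom(bytes) {
  return BOM.every((b, i) => bytes[i] === b);
}

// Returns a copy without a leading BOM (a plain copy if there was none).
function removeBom(bytes) {
  return startsWithBom(bytes) ? bytes.slice(3) : bytes.slice();
}

// Returns a copy with a leading BOM (a plain copy if it already had one).
function addBom(bytes) {
  return startsWithBom(bytes) ? bytes.slice() : Uint8Array.from([...BOM, ...bytes]);
}

// Removes the BOM if present, otherwise adds one.
function toggleBom(bytes) {
  return startsWithBom(bytes) ? removeBom(bytes) : addBom(bytes);
}

const original = Uint8Array.from([0x48, 0x69]); // "Hi" without BOM
const toggledOnce = toggleBom(original);        // EF BB BF 48 69
const toggledTwice = toggleBom(toggledOnce);    // 48 69 again
console.log(toggledOnce.length, toggledTwice.length); // 5 2
```

Add and remove are written so that repeating them has no further effect, which makes them safe to run on a mixed selection; toggle, by design, flips the state every time.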