Metadata BOM removal

I’d like to talk about the BOM in the room. :slight_smile:

In the course of working with image metadata via both Directory Opus and ExifTool, it’s become apparent that some EXIF fields written by DOpus contain a BOM (the Unicode byte order mark, U+FEFF, also known as zero-width no-break space; encoded in UTF-8 as the bytes 0xEF 0xBB 0xBF) as the first character. As far as I have seen, this BOM is not included in the respective field value string when displaying/editing metadata via DOpus’ Metadata Pane or Set Metadata dialog, but it can be found in JSON data exported from image files by ExifTool.
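
For anyone who wants to check their own files, here is a minimal Python sketch that scans an ExifTool JSON export for string values starting with U+FEFF. It assumes the export is the usual JSON array of per-file objects with a `SourceFile` key, and that the BOM survives as a literal U+FEFF character once the JSON is parsed; the function name and the sample data are hypothetical.

```python
import json

BOM = "\ufeff"  # U+FEFF; the bytes EF BB BF when encoded as UTF-8


def fields_with_bom(exiftool_json):
    """Map each file to the tag names whose string value starts with a BOM."""
    report = {}
    for entry in json.loads(exiftool_json):
        hits = [tag for tag, value in entry.items()
                if isinstance(value, str) and value.startswith(BOM)]
        if hits:
            report[entry.get("SourceFile", "?")] = hits
    return report


# Hypothetical sample standing in for a real export.
sample = json.dumps([{
    "SourceFile": "example.png",
    "EXIF:ImageDescription": "\ufeffTest description",
    "EXIF:Artist": "No BOM here",
}])
print(fields_with_bom(sample))
```

If the export was produced with different character set settings, the BOM may show up in another form, so treat an empty report as inconclusive rather than as proof.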

Under normal circumstances, this BOM probably doesn’t bother anybody else, and it mostly didn’t bother me, either, until my recent efforts to automate a number of routine (for me) metadata operations that I had previously performed manually. For example, I often prepend and/or append new strings to existing metadata fields such as Description, Subject, Title and Comment. I use ExifTool for this, via custom DOpus commands like this:

@set descprfx={dlgstring|Enter string to prepend to Description}$
ExifTool "-EXIF:ImageDescription<{$descprfx}EXIF:ImageDescription" .

As written above, this command results in the BOM ending up between the new and existing strings, which often causes problems for me later on. I’ve now managed to amend my command line to remove the BOM, if there is one, during the prepend operation:

ExifTool "-EXIF:ImageDescription<{$descprfx}{EXIF:ImageDescription;m/^(?:\xEF\xBB\xBF)?(.*)/; $_=$1}" .
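
To make the logic of that expression easier to follow, here is a rough Python sketch of the same idea: match an optional leading UTF-8 BOM, keep everything after it, and prepend the new string. It works on raw bytes and is only an illustration, not how ExifTool evaluates the expression internally; the function name is hypothetical.

```python
import re


def prepend_without_bom(prefix, existing):
    """Return prefix + existing, dropping one leading UTF-8 BOM from existing."""
    # Same pattern as in the command line: an optional BOM, then capture the rest.
    # re.DOTALL lets "." match line breaks too (a choice made for this sketch).
    match = re.match(rb"^(?:\xEF\xBB\xBF)?(.*)", existing, re.DOTALL)
    return prefix + match.group(1)


print(prepend_without_bom(b"New: ", b"\xef\xbb\xbfOld description"))
print(prepend_without_bom(b"New: ", b"Old description"))
```

Note that the sketch uses `re.DOTALL` so multi-line values are kept whole. In Perl, `.` does not match a newline unless the `/s` modifier is given, so if a description can contain line breaks, it may be worth checking whether the command-line expression needs the equivalent.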

Post-BOM removal, there seem to be no adverse effects on the display/editability of the Description field in the Metadata Pane or Set Metadata dialog. Is the BOM actually necessary? Can a future update to DOpus’ metadata capabilities maybe do away with it? I’m not absolutely sure, but I think there are other metadata fields written by DOpus which do not contain a BOM. I assume the code for dealing with metadata is not strictly DOpus’, but the support library I see looks to me like it’s a customized version of Exiv, credited to GPSoftware, and not updated since 2014.

On the other hand, if the BOM does serve some purpose for DOpus that’s eluding me, am I potentially setting myself up for other complications down the road by removing it? I’m sure I could easily revise my code to restore it to the beginning of the prepended string, but would rather just leave it out if all else is equal.

Contents:

“DOpus_Meta_BOM\”
“2020-09-12 08;56;35 - MAZE - DOpus Meta BOM.png” (57,940) [800 x 340 x 24]
“DOpus_Meta_BOM.png” (4,191) [1 x 1 x 1]
“DOpus_Meta_BOM.png.json” (1,582)
“DOpus_Meta_BOM.png.txt” (370)
“DOpus_Meta_BOM_Original.png” (4,049) [1 x 1 x 1]
“DOpus_Meta_BOM_Original.png.json” (1,441)

2020-09-12 10;51;05 - MAZE - DOpus Meta BOM.7z (55.3 KB)

That was meant to say: “I think there are other UTF-encoded metadata fields written by DOpus which do not contain a BOM.”

BOMs remove ambiguity regarding which 8-bit codepage is in use, and I think are written by other tools as well. (AFAIK, we did not invent doing that for this type of metadata.)

It might make more sense to ask for ExifTool to be able to handle BOMs properly (or it may already have such an option; I don't know it well enough to know for sure).

@mazeckenrode

Another idea is to cut out the middleman and do it all in Opus.

A small script can gather the present metadata field you are interested in and display it. If you wish to change the field, you can do so. You can then compare the values, and if there is a change, you can overwrite the old field with the new one.

I have been using this method for years and it seems BOM proof :grinning:

@Leo

From the admins at ExifTool, for what it’s worth:

– “I’ve never seen a BOM in any image I’ve collected from the web, nor have any of the tools I’ve tested ever used a BOM.”

– “I’d have to agree.”

@auden

I appreciate the suggestion, but what I try to automate and accomplish in a single ExifTool command line is generally much more complicated and sweeping than the few short examples I’ve given here. Also, ExifTool is being actively developed, with support for reading and writing many, many tags and tag types, at least some of which I use, that as far as I can tell aren’t supported by DOpus, or in a few cases are supported but not in accordance with the official tag specifications. I do want to maintain DOpus’ ability to display the major tags it’s capable of displaying, though, which is why I use DOpus to write those tags first, before subsequent manipulations by ExifTool.

Maybe they are right. I'll do some more research to try and work out when/why we (or a library we're using) started doing it, in case that brings anything to light.

Opus has done this for a long time, though, and without a BOM every tool is left guessing what the encoding is (some tools use UTF8, some UCS-2, some ASCII or local codepages). I find the idea of avoiding explicitness and going with error-prone guessing a curious one, but if that really is the standard (or if there is some other good way to indicate encoding, which works even when metadata is edited by multiple programs) then we aren't exactly in a position to change that. :slight_smile:
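
As an illustration of the trade-off described above, here is a minimal Python sketch of what a reader can do with and without a BOM. It assumes only the standard UTF-8 and UTF-16 BOM byte sequences (UTF-32 is ignored for brevity), and the `cp1252` fallback is an arbitrary stand-in for "guess a local codepage"; the function name is hypothetical.

```python
import codecs

# Standard BOM byte sequences, checked in order.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def decode_field(raw, fallback="cp1252"):
    """Decode raw bytes using the BOM if there is one, otherwise guess."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(name), name
    # No BOM: the encoding has to be guessed, which may produce mojibake.
    return raw.decode(fallback, errors="replace"), fallback + " (guessed)"


print(decode_field(b"\xef\xbb\xbfcaf\xc3\xa9"))  # BOM says UTF-8
print(decode_field(b"caf\xc3\xa9"))              # no BOM, so the guess decides
```

The second call shows the failure mode: the same UTF-8 bytes without a BOM are misread when the guessed codepage is wrong.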

@Leo

In case it’s any help, you may find ExifTool’s FAQ entry on metadata character encoding of interest. In particular:

“ExifTool writes Unicode in native EXIF byte ordering by default, but the byte order may be specified by setting the ExifUnicodeByteOrder tag (see the Extra Tags documentation).”

…and…

“The value of the IPTC:CodedCharacterSet tag determines how the internal IPTC string values are interpreted.”

“Note that unless CodedCharacterSet is UTF-8, applications have no reliable way to determine the IPTC character encoding. For this reason, it is recommended that CodedCharacterSet be set to ‘UTF8’ when creating new IPTC.”

The next Opus beta will change things so that we no longer add the BOM.

Note that this will only affect fields which are added/edited; existing fields are left as-is if they aren't changed.

@Leo

Understood, and thanks for the heads up. Out of curiosity, is that the only change, as far as BOM and metadata encoding go, or have you opted for some other method of encoding indication?

The other change is we're assuming EXIF data is always UTF-8, with or without a BOM on the front, since there is no facility within EXIF to indicate encoding and UTF-8 is the only reasonable thing you can assume.
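
For scripts that read these fields, the same "UTF-8, with or without a BOM" rule can be sketched in Python with the standard `utf-8-sig` codec, which decodes UTF-8 and drops a single leading BOM if one is present. This is only an illustration of the rule, not a description of how Opus implements it; the function name is hypothetical.

```python
def decode_exif_text(raw):
    """Decode bytes as UTF-8, ignoring one leading BOM if present."""
    return raw.decode("utf-8-sig")


with_bom = b"\xef\xbb\xbfTest description"
without_bom = b"Test description"

# Both inputs decode to the same string.
print(decode_exif_text(with_bom) == decode_exif_text(without_bom))
```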

Looks to me like the BOM is still in the building, or still in part of it anyway, so to speak. I could have sworn that after v12.22 was released in October, I tested adding metadata to the typical fields I use in images to make sure BOMs were no longer being added to the EXIF fields as I had seen previously, but maybe I hadn’t actually done any testing, or maybe my testing wasn’t as extensive as it should have been. In any case, I’m now seeing BOMs being added by DOpus to the beginning of EXIF:ImageDescription, as reported afterwards by ExifTool. I now have v12.23 installed, but it also happened while I was still using v12.22. Can this behavior be confirmed by anyone else?

Do you mean there was no BOM, you edited the field with Opus, then there was a BOM?

That isn't supposed to happen now, at least.

Could you give us an example JPG where editing it in Opus triggers this, and exact steps on what to edit (which field to set to which value)?

There was no metadata whatsoever in the relevant fields, then I added some, and BAM there was BOM. I’ve now done some more extensive testing, and found that pretty much ALL of the EXIF fields I routinely use that DOpus used to add a BOM to, still get the BOM added. The affected fields are:

EXIF:ImageDescription
EXIF:Make
EXIF:Model
EXIF:Software
EXIF:Artist
EXIF:Copyright

I guess I must not have done any testing back in October after all.

PNG, in this particular case, but it happens to JPEGs for me as well. I normally add the metadata you see in the PNGs in the attached ZIP programmatically, via custom commands such as:

SetAttr META "(comment|title|subject|imagedesc):{dlgstring|Enter comment/subject/description/title:}"
SetAttr META "tags:{dlgstring|Enter keywords:}"

In the case of this test, I also added metadata to one of the PNGs via the Set Metadata panel, with the same results.

Attached: “2020-12-30_MetaBOM_Test.zip” (4,459)

Contents:

“2020-12-30_MetaBOM_Test\”
“1-NoMeta.png” (640) [1000 x 1000 x 1]
“2-Meta_via_Command.png” (4,253) [1000 x 1000 x 1]
“2-Meta_via_Command.png.json” (883)
“3-Meta_via_Panel.png” (4,144) [1000 x 1000 x 1]
“3-Meta_via_Panel.png.json” (867)

2020-12-30_MetaBOM_Test.zip (4.4 KB)

Many thanks!

I think we have this fixed for the next version.

Glad to help. I look forward to taking a look at the next version when it arrives.

Fix is now available for testing: Directory Opus 12.23.1 (Beta)

Thanks. I don’t suppose there’s any way I can test it without installing it, is there? I was hoping the installer would give the option for an official portable installation, but it looks like not. Running a virtual machine is out of the question on my system, I’m afraid.

You'd have to install it, the same as any other version. If Opus is already installed, that ultimately just replaces a few files with the new ones. You can reinstall the previous version afterwards to go back to what you had.

The 12.23.1 beta appears to have eliminated the BOM here. Thanks.
