The case of the missing COMMENT (DOpus vs. ExifTool, round 10?)

This issue is somewhat similar to others I’ve brought up before involving ExifTool and DOpus, but I’m pretty sure I haven’t aired this particular set of specifics before. I’ve also raised it in the ExifTool forum in case anything useful comes of it, though no such luck there so far.

As stated in previous posts, I use both DOpus and ExifTool extensively for image metadata. I mostly deal with images in PNG format, and a significant portion of my relevant work is spent on scanned pages from physical publications such as books and magazines, paper documents, and/or extracted/converted page images from PDFs. Adding unique and useful metadata to fields such as DESCRIPTION, SUBJECT, TITLE, COMMENT and TAGS (aka KEYWORDS) has always been a challenge, and ExifTool has been instrumental in streamlining the process. But I also want the commonly displayed metadata fields (mainly COMMENT and DATE TAKEN, for my purposes) to be displayable by DOpus. I previously found that it’s necessary to add those fields using DOpus before any subsequent manipulation by ExifTool: the two tools don’t create new metadata in the same location within PNG files, and DOpus won’t display it when created by ExifTool, whereas ExifTool is reportedly capable of updating metadata in PNGs wherever it is found.

For page images derived from a 13-page PDF document, my filenames might look like this:

“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 01.png”
“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 02.png”
[…]
“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 13.png”

I use DOpus (via SetAttr META in custom command) to write the same value to what it calls DESCRIPTION, SUBJECT, TITLE and COMMENT in all 13 files, such as: General Company invoice 1234567890, account ABCDEFGHI-00001, 15 Dec 2020, total due $0.01, due date 5 Jan 2021, p 1/13; File as downloaded: “General_bill_December_15_2020.pdf” (296,175) [13 pp, 4347 w, 20097/23854 ch, 590 l]; Source: <https://www.general.com/gw/bill/docs/getpdf/gndoc?docName=General-Bill-12.15.2020&docId=YNAhNz8sXlzYk3n3dHidUX8hWkmiYZ5R9SbXOkGvcDcfsKLKtI22MkilMpEIdbItYozvYAGlzR0nmgg3Tdu6ZsAL1hxvnosmFcGx1sOSSd3fivVEkSQh2xQOPlDhouAU9yDpaJkhXGvV3vgjKBZWcB6rGbsAo6s6Uo72YGK2tDS8FbwP0PCQaYuknwWo0>

Then DOpus again to add TAGS (aka KEYWORDS), example: General; Company; PDF; Document; Screenshot; 2020-12-15; 2020; December; Bill; Invoice; $0.01; Due_2021-01-05; Due_2021; Due_January; Account_ABCDEFGHI-00001; Invoice_1234567890; Page_1

I ultimately want each page image to have metadata accurately reflecting its own unique page number, so I’m using a fairly complicated ExifTool command, created with much help from the guys in that forum last summer. It gets the page number from the filename (the digits following “ - ” at the end in this example, though they could be followed by more text in some cases) and uses it to adjust p 1/13 to the appropriate number for each file in DESCRIPTION, SUBJECT, TITLE and COMMENT, and likewise to bump Page_1 in TAGS/KEYWORDS.
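The transformation described above can be sketched in Python for illustration. This is not the ExifTool command used here (which isn't shown in this post); the function names `page_from_filename`, `set_page` and `set_page_tag` are hypothetical, and the sketch assumes the page number is the run of digits right after the last " - " in the filename.

```python
import re
from typing import List, Optional


def page_from_filename(name: str) -> Optional[int]:
    """Return the digits that follow the last " - " in the filename, if any."""
    stem = name.rsplit(".", 1)[0]          # drop the final extension
    tail = stem.rsplit(" - ", 1)[-1]       # text after the last " - "
    match = re.match(r"\d+", tail)         # digits may be followed by more text
    return int(match.group()) if match else None


def set_page(text: str, page: int) -> str:
    """Rewrite the first 'p N/TOTAL' so that N is the given page; TOTAL is kept."""
    return re.sub(r"\bp \d+/(\d+)",
                  lambda m: f"p {page}/{m.group(1)}", text, count=1)


def set_page_tag(tags: List[str], page: int) -> List[str]:
    """Replace any 'Page_N' keyword with 'Page_<page>'; other keywords are untouched."""
    return [f"Page_{page}" if re.fullmatch(r"Page_\d+", t) else t for t in tags]
```

For a file ending in " - 07.png", this would turn "p 1/13" into "p 7/13" and "Page_1" into "Page_7", changing the string length by at most one character when the page number gains a digit.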

The problem is, in some but not all cases, after I’ve run the ExifTool command to adjust the page numbers, DOpus won’t display the updated COMMENT, neither via mouse-hover tooltip nor via Set Metadata dialog / Metadata pane. The length of the data string appears to be a factor — the threshold in my tests seems to be 512/513 characters. ExifTool confirms that the metadata is, in fact, there in the COMMENT, and that it’s identical to the strings in DESCRIPTION/SUBJECT/TITLE for each file, and at most only one character off from the original (and correctly displayed) string written before the ExifTool manipulation. Furthermore, I can manually copy the updated string from any of the latter fields, paste it to COMMENT, and then it shows up as expected. But the copied/pasted/displayed string is exactly the same as the MIA one, as far as any method I can use to examine it goes.
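One way to double-check that two strings really are identical, and to see how a string's length depends on the unit being counted, is a small script along these lines. It is a sketch in Python rather than anything used in this thread; `describe` and `first_difference` are hypothetical helper names, and which unit the 512/513 threshold is actually measured in is not established here.

```python
from typing import Dict, Optional


def describe(text: str) -> Dict[str, int]:
    """Report the length of a string in three common units."""
    return {
        "chars": len(text),                               # Unicode code points
        "utf8_bytes": len(text.encode("utf-8")),          # bytes when stored as UTF-8
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # 16-bit units in UTF-16
    }


def first_difference(a: str, b: str) -> Optional[int]:
    """Return the index of the first differing character, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Same prefix: either identical, or one string is longer than the other.
    return None if len(a) == len(b) else min(len(a), len(b))
```

Feeding it the COMMENT and DESCRIPTION values exported by ExifTool would show whether they differ anywhere, including in characters that look the same on screen.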

In case anything can be gleaned from my test files, please see the attached 7-zip. Apart from ideally solving the mystery and/or finding a fix for it, assuming there’s any chance of either happening, it would be nice if I could use DOpus to programmatically copy from, say, DESCRIPTION to COMMENT in all selected files, but I’m not seeing any way to do that (unless it’s maybe another scripting-only solution).

Attached: “2021-03-29 17;18;54 - MAZE - MIA Comment Test.7z” (13,218)

Contents:

“2021-03-29 17;18;54 - MAZE - MIA Comment Test\”
“0 No Meta\”
“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07.png” (197) [1 x 1 x 1]
“1 DOpus\”
“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (512 ch).png” (8,152) [1 x 1 x 1]
“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (512 ch).png.json” (6,644)
“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (513 ch).png” (8,168) [1 x 1 x 1]
“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (513 ch).png.json” (6,652)
“2 DOpus+ExifTool\”
“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (512 ch).png” (11,380) [1 x 1 x 1]
“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (512 ch).png.json” (7,733)
“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (513 ch).png” (11,687) [1 x 1 x 1]
“2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (513 ch).png.json” (7,743)


We deliberately ignore the Exif UserComment field if it contains more than 512 characters. From memory this is because photos created on iPhones store their entire metadata chunk in text form in that field, often kilobytes in length, and showing it in the comment field in Opus seemed unhelpful. 512 seemed like a sensible cut-off, because a metadata chunk will usually be more than that and almost all comments will be less.

I assume you mean EXIF:XPComment? When I use DOpus to write what it refers to as COMMENT, it ends up in both XMP:UserComment and EXIF:XPComment, but nowhere else for me.

Either way, my experience with 513+ character comments doesn’t seem to bear out what you’re saying. As stated in my first post, if I use DOpus to write it to begin with, it gets displayed via tooltip and Set Metadata just fine. If I use ExifTool to update it, it disappears from view in DOpus. If I copy/paste the updated string to COMMENT again, it’s once again visible in DOpus.

Does DOpus maybe have some way to distinguish between a 513+ character string written by itself to COMMENT in zTXt and one written by another tool?

No, the limit applies no matter which tool writes it. My only guess is that when you write it in Opus, you're actually setting the "Image Description" rather than the comment, which is a different field.

The 512 char limit applies to Exif:XPComment, Exif:UserComment and Xmp:UserComment as Opus treats these fields interchangeably.

  • In the file display this value is shown in the Comments, User Description and Description fields.
  • In the metadata pane it's the Comment field in the Extended Properties section.
  • The Set Description command will also set this field for an image file.

There's also Iptc:Caption, Exif:ImageDescription and Xmp:description which Opus also treats interchangeably. No 512 char limit applies to this field.

  • In the file display these are shown in the Image Description column.
  • In the metadata pane it's the Description field in the Document Properties section.
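The two field groups and the cut-off described above can be modelled roughly as follows. This is only an illustration of the stated behaviour in Python, not Opus's actual code: the function names are hypothetical, and both the order in which the interchangeable fields are consulted and the assumption that the limit counts characters are guesses.

```python
from typing import Dict, Optional

# Field groups as listed in the reply above.
COMMENT_FIELDS = ("Exif:XPComment", "Exif:UserComment", "Xmp:UserComment")
DESCRIPTION_FIELDS = ("Iptc:Caption", "Exif:ImageDescription", "Xmp:description")
COMMENT_LIMIT = 512  # comments longer than this are ignored


def displayed_comment(metadata: Dict[str, str]) -> Optional[str]:
    """Return the first comment value within the limit, or None if all are ignored."""
    for field in COMMENT_FIELDS:          # lookup order is an assumption
        value = metadata.get(field)
        if value and len(value) <= COMMENT_LIMIT:
            return value
    return None


def displayed_description(metadata: Dict[str, str]) -> Optional[str]:
    """Return the first non-empty description value; no length limit applies."""
    for field in DESCRIPTION_FIELDS:      # lookup order is an assumption
        value = metadata.get(field)
        if value:
            return value
    return None
```

Under this model a 513-character string stored only in the comment fields would not be shown, while the same string in a description field would be.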

I like to use Microsoft's indexing software to search images on my system. Bizarrely, the Microsoft indexer knows nothing about the Description or Image Description fields. If you want to index and search these fields (which any normal photographer would term Caption), you have to copy the Opus imagedesc field into the comment field and search on that.

I use an Opus script to do this on every picture I process.

ExifTool tells me that the imagedesc field has been copied into the following fields:

  • User Comment
  • XP Comment
  • Description
  • Caption-Abstract

I have thousands of images that have been processed this way and I have never had any trouble with missing data. Some of my captions also contain far more than 512 characters. Incidentally, ExifTool forms no part of my image workflow.

I just offer this as an observation in case it is of any help.

Then how to explain the files in the 7-zip I attached to my initial post in this thread? The file “1 DOpus\2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (513 ch).png” has a 513-character string in COMMENT, which displays via tooltip, the Set Metadata dialog, and the Metadata panel (ExifTool confirms that this string is stored in both XMP:UserComment and EXIF:XPComment, and same string has also been written to all metadata fields corresponding to DESCRIPTION, SUBJECT and TITLE in DOpus — see the associated JSON file in the archive).

As just stated, I am typically writing the same data to ALL metadata fields corresponding to COMMENT, DESCRIPTION, SUBJECT and TITLE in DOpus. See JSON.

Thanks. I previously posted about my experimentation with writing as much metadata as possible via DOpus and noting which actual metadata fields the data goes to — see Set Metadata inconsistent results.

I find it very helpful, especially since there are a handful of metadata fields I use which DOpus won’t deal with.

@ mazeckenrode

Obviously our needs are very different. All I require is that the metadata in my images can be read wherever I send it, which, in practical terms, means that Photoshop can read it. I only use three image formats: JPEG for ordinary images, LZW-compressed TIFF for images with transparent backgrounds, and PSD for layered images.

It strikes me that you are using PNG files, about which I know not a lot apart from its ability to handle transparency easily.

Have you tried experimenting with different image formats? I have no idea if it might help.

All I know is that using Opus to write image metadata is smooth, very, very quick and extremely reliable. What's more, you can do it all without leaving the best file browser there is.

Not for me; the 513 one shows no comment, whereas the 512 one does:

@auden

I’ve created a new thread “Image formats & metadata” in the Off-Topic forum to continue this discussion.

Now, that is interesting, and gave me an idea. I’ve verified that both files in that folder, as they already existed on my computer, continue to display their comments. But if I download that 7-zip from this thread, unpack it and check, the 513 does NOT display its comment. And if I then open Set Metadata for it, select/copy the full 513-character string from any of the other fields (DESCRIPTION, SUBJECT, TITLE) that contain it, paste to COMMENT and click on OK, the comment is then visible (via tooltip, dialog, panel, and DESCRIPTION field of lister file display). If you (and/or anyone else) don’t mind indulging me a bit more on this, I’d be interested in learning whether this is reproducible by anyone else following the same steps.

In any case, it seems that DOpus is not ignoring comments with 513+ characters under all circumstances, or at least not for me. I’d prefer that it never ignored them, but don’t recall ever having had any experience with iPhone photos packing all their metadata into Exif:XPComment, Exif:UserComment and/or Xmp:UserComment, even though I have dealt with some iPhone photos before. Is it something that should possibly be revisited?

If iPhone photos with all metadata pre-loaded into one or more comment fields are actually still a thing, could someone archive and upload one here for me to check out? My curiosity is piqued.

My guess is that the comment is being written to NTFS ADS metadata in the filesystem, as well as the EXIF data inside the file itself. That ADS would be lost when archiving the file, but also isn't subject to any length limits (since other tools don't do crazy things with it that cause problems with huge data chunks).

Wouldn’t you guys know if DOpus did that? Or are you saying something other than DOpus could be responsible?

Anyway, this is what the output of dir /r looks like for the two PNGs with their comments visible:

29-Mar-2021  17:16             8,152 2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (512 ch).png
                               1,316 2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (512 ch).png:SummaryInformation:$DATA
                                   0 2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (512 ch).png:{4c8cc155-6c1e-11d1-8e41-00c04fb9386d}:$DATA
29-Mar-2021  17:11             8,168 2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (513 ch).png
                               1,316 2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (513 ch).png:SummaryInformation:$DATA
                                   0 2020-12-15 23;59;59 - General - xFGHI-00001 Invoice 1234567890 - 07 (513 ch).png:{4c8cc155-6c1e-11d1-8e41-00c04fb9386d}:$DATA

No expert on ADS here, and not completely sure what all that output is telling me. This page (“Introduction to Alternate Data Streams” at blog.malwarebytes.com) implies that $DATA should be the main file content, but I see two entities (streams?) ending with $DATA, one of which is listed as having a size of 0, for each PNG. If I attempt to read the contents of the non-0 ones (ending in SummaryInformation:$DATA), it just looks like gobbledegook to me. Curiously, the two SummaryInformation:$DATA streams are indicated as having the same size (1,316) in the output above, but when I read their content using Notepad++, one is said to have a normal text length of 1,724, while the other one is 1,726.
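On reading that output: in a `dir /r` listing, a line of the form `name:stream:$DATA` is a named alternate stream attached to the file, and `$DATA` is the stream type rather than a name. The main file content is the unnamed stream, which `dir` shows as the ordinary file line. A minimal Python sketch for pulling the named streams out of such a listing follows; `parse_streams` is a hypothetical helper, and it assumes sizes use commas as thousands separators, as in the output above.

```python
import re
from typing import Dict, List, Tuple

# size, then "filename:streamname:$DATA" at the end of the line
STREAM_LINE = re.compile(r"^\s*([\d,]+)\s+(.+?):([^:]*):\$DATA\s*$")


def parse_streams(dir_r_output: str) -> Dict[str, List[Tuple[str, int]]]:
    """Map each filename to a list of (stream name, size in bytes) pairs."""
    streams: Dict[str, List[Tuple[str, int]]] = {}
    for line in dir_r_output.splitlines():
        match = STREAM_LINE.match(line)
        if not match:
            continue  # ordinary file lines and headers are skipped
        size = int(match.group(1).replace(",", ""))
        streams.setdefault(match.group(2), []).append((match.group(3), size))
    return streams
```

Applied to the listing above, each PNG would map to two named streams: `SummaryInformation` at 1,316 bytes and the `{4c8cc155-6c1e-11d1-8e41-00c04fb9386d}` stream at 0 bytes.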

I used NirSoft’s AlternateStreamView to delete all of the alternate streams, but comments remain visible for both PNGs here, so it doesn’t look to me like ADS is the cause.