Metadata Problem when Changing PDFs

SoBreezyInChile · April 22, 2020, 6:01pm

The Error that Occurs
When I open the Metadata Panel for a few PDF files and try to enter new data in the Document Properties fields, the following error dialog appears: "An error occurred setting metadata..." I noticed the following details about these errors:

This error dialog only occurs with some of my PDF files. They are always the same files that show this error when I attempt to change the metadata. Most of my PDF files do not show this error and I can update the metadata in Document Properties normally.
The error only occurs when I try to change the fields in the Document Properties section of the Metadata Panel. When I change the metadata in other sections, everything is normal.
This error only seems to appear when changing the metadata of PDF files.
If I make a copy of a defective PDF file and try to change the copy's metadata, it still leads to the error dialog.
The Document Properties section is empty when I try to change it and it leads to an error.
In some PDF files, I will add info to the empty Document Properties section and it will not lead to an error. But, even if I add info to all the fields and then press Apply, all the fields in the Document Properties section will return to being empty.

Possible Causes
I am speculating that some defective PDF files are missing identifying info or a storage area which is where the metadata is stored. Other defective PDF files may have them but they aren't working properly. Most PDF files have them so they work normally.

Is there a way to fix this problem? I couldn't find a similar problem when searching through Google or the threads in this forum.

Thanks a lot!

Leo · April 22, 2020, 6:30pm

Can you zip up some example files, and tell us which metadata to try setting to what?

Setting PDF metadata can depend on third party components (not part of Opus or Windows). Which PDF software do you have installed?

SoBreezyInChile · April 23, 2020, 2:49am

I zipped up 2 PDF files that malfunction exactly in the two ways that I describe. I private messaged them to you.

I have Nuance Power PDF. I downloaded new copies to see if perhaps I altered them by mistake. I also repaired Nuance to see if that is it. Maybe I'll uninstall Nuance completely and install it again and see if that fixes it. The malfunction doesn't happen to most PDFs. It only happens to a couple of them.

Leo · April 23, 2020, 6:05am

The larger PDF you sent can't be edited (by Opus) because it's encrypted.

We'll improve the error message for that.

Note that encryption here doesn't mean nothing can view it. The encryption is done to block things like printing, editing or copy & paste but not intended to block viewing. By the sound of it, the encryption scheme was secret to Adobe for a long time but makers of other PDF viewers worked it out. It's thus probably completely pointless to use it at all, since it just gets in the way, and anyone really determined to print the document will use a tool that bypasses the encryption.

More about that here:

https://www.cs.cmu.edu/~dst/Adobe/Gallery/anon21jul01-pdf-encryption.txt

Quoting the important part:

...

The goal of protecting a PDF file is typically something like this:
you want a viewer application to be able to display the file but not
be able to print it. (PDF has additional options to disallow
copy-and-pasting text and editing the file; the same argument applies
to them.) The problem here is that exactly the same information is
used for both functions -- once you have a (decrypted) page
description, it can be turned into either pixels on the screen or
toner dots on a printed page.

Adobe's PDF protection scheme is a classic example of security
through obscurity. They encrypt the content of a PDF file and hope
that no one figures out how to decrypt it. When Adobe's viewer
encounters an encrypted PDF file, it checks a set of flags, and allows
certain operations (typically viewing) while disabling others
(typically printing).

Now PDF is supposedly an open standard, and in fact, Adobe has been
pretty good about documenting it. They initially refused to release
detailed information on the encryption; presumably they were aware of
the security/obscurity issues. But they eventually relented. Various
third-party PDF viewers have been able to display encrypted PDF files
for a few years now.

...

The smaller PDF seems to be malformed, at least according to this same information:

Every document has a "trailer dictionary" which holds references to a
few important things (like the tree of page objects which contains the
document content) and optionally to an encryption dictionary. If the
encryption dictionary is present (i.e., if the document is encrypted),
it contains the information needed to decrypt the document. An
example:

The smaller PDF has no "trailer dictionary" inside it. When that happens, we try falling back on a second method of storing PDF metadata (XMP), but that is also failing.

(Note: I'm far from an expert on the PDF format, so this is just based on what the code is doing and what the documentation I found says.)
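
For anyone who wants to check their own files for the two conditions described above, here is a minimal sketch in Python. It is not what Opus does internally; the function name `inspect_pdf` is made up for illustration, and it only scans the raw bytes for well-known PDF keywords rather than parsing the file properly, so treat the result as a hint. One thing worth knowing: files written in the PDF 1.5+ style can keep the trailer entries in a cross-reference stream (`/Type /XRef`) instead of after a `trailer` keyword, so a missing `trailer` keyword does not by itself prove the file is damaged.

```python
import re


def inspect_pdf(data: bytes) -> dict:
    """Heuristic scan of raw PDF bytes. This is NOT a real PDF parser."""
    return {
        # Classic layout: a "trailer" keyword followed by a "<< ... >>" dictionary.
        "trailer_keyword": re.search(rb"\btrailer\b", data) is not None,
        # PDF 1.5+ layout: the same entries may live in a cross-reference stream.
        "xref_stream": re.search(rb"/Type\s*/XRef\b", data) is not None,
        # An /Encrypt entry points at the encryption dictionary.
        "encrypted": re.search(rb"/Encrypt\b", data) is not None,
        # An /Info entry points at the classic document information dictionary.
        "info_dict": re.search(rb"/Info\b", data) is not None,
        # XMP metadata is wrapped in an "<?xpacket begin=...?>" packet.
        "xmp_packet": b"<?xpacket begin=" in data,
    }
```

To try it, read a file in binary mode and pass the bytes in, for example `inspect_pdf(open(path, "rb").read())`. A file reporting `encrypted: True`, or one with neither `trailer_keyword` nor `xref_stream`, would be a candidate for the kind of failure discussed here.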

SoBreezyInChile · April 23, 2020, 12:51pm

Thanks, Mate! It had nothing to do with Directory Opus! You probably get questions like this all the time and figured out the answers yourself. Must be frustrating.

Thanks again. I am able to edit the metadata now. All I had to do was open the PDF files in Chrome and then print them as a PDF. The new PDF is unencrypted and must have the trailer dictionary that you were referring to. Now, I can fix the metadata.

Now, if I can only search for the file by relying on the metadata. That would be awesome...

I know it will eventually happen so that every program can do so. It's just a matter of when - who gets it done first.

Thanks again.

John

Leo · April 23, 2020, 12:57pm

This let us make some improvements on the Opus side as well, so it was useful all around.

Clever solution.

You should be able to search by metadata via Tools > Find Files > Advanced.

SoBreezyInChile · April 23, 2020, 12:58pm

Official Solution: Unable to Change the Metadata of PDF Files
The PDF file doesn't allow changes to a section of the Metadata panel because it's either encrypted or it doesn't contain a trailer dictionary, according to our resident expert, Leo.

To fix either problem, open the PDF file via an internet browser like Chrome, and Print the document as a PDF. You'll end up with the same exact document but fixed. Some say that you cannot select the text of the new PDF so you need to keep the original PDF around but I've found that this isn't the case.

Happy Computing

SoBreezyInChile · April 23, 2020, 1:16pm

Sorry, when I just said it would be great to include the feature of searching through metadata via a Search, I meant that it would be great if you could find it by typing search terms in the Search field as normal and it would look through filenames as well as the metadata automatically. It's too much of a pain to specify one field and see if the search term(s) shows up there. Then, try another field. And, so forth until you find the file. This would take too long because the search term could be in any of the many fields, filename, pathname, etc.

I already knew that you can search metadata in the original Windows File Explorer by adding "Tag:" before your search terms, but that kinda defeats one of the major purposes of a tag: convenience. Tags are terms that act as cognitive connectors to the filename but use totally different words. Including tags in searches becomes a life-saver when you can remember a tag but, for the life of you, can't think of a single word in the filename.

You also want the search in specific Tag fields to be performed automatically every single time you do a general Search. Literally even an additional second of effort of extra typing or clicking prevents the average person from performing the action if they think there is a high chance that nothing is going to come of it.

It would be great if it could also search through content but it would take forever as it would create a humungous database and it'll also produce many worthless results.

That's my opinion.

Leo · April 23, 2020, 1:20pm

What the search field does is up to Windows Search. I'm not sure exactly what it does in terms of PDF metadata.

SoBreezyInChile · April 23, 2020, 2:06pm

You just mentioned that "The smaller PDF has no "trailer dictionary" inside it. When that happens, we try falling back on a second method of storing PDF metadata (XMP), but that is also failing."

I had mentioned a program called FileMeta a while ago. It doesn't work with Opus but it works partially in Windows File Explorer. It's the best thing I could find and I looked through a lot of programs. Maybe you can look at FileMeta's code and see how it accomplishes what it does, and it might give you some ideas.

FileMeta is far from perfect because you can add (but only add) a gazillion of predefined metadata fields but you can't create custom field names. This stunk. The interface is also nonintuitive and clunky.

However, it's the best program I found in which you can alter and add to the original metadata. All of the other tagging programs I found were distinctly flawed. They usually added their own Tag filing system or other addition to the original file which really didn't fit into a person's normal workflow. I really tried to incorporate the popular and pretty programs such as Tabbles, XnView MP, TagSpaces, Tag Explorer, RecentX, Tagged Frog, SetTags and literally 10 or more other tag/metadata programs into my workflow (tested each one for a few weeks). Nothing worked. They were all conceptually flawed. For example, you might have to use a separate program to perform a search for tags. Who would want to do that? They essentially all took too long to use or were just too inconvenient. Also, as you know, "long" in computer-use time means several seconds rather than instantaneous. Instantaneous is what users are usually looking for and need. That's IMHO.

I also think incorporating automatic tagging would be awesome and is not as hard as it sounds but that's a totally different story...

auden · April 23, 2020, 4:22pm

If you are looking to use Windows search to search PDF files, this may help.

https://blog.techhit.com/55696-indexing-and-searching-pdf-content-using-windows-search

SoBreezyInChile · April 24, 2020, 6:27pm

Thanks for the great advice. I'll try it out. I only want to search the metadata (because indexing and searching through the entire contents seems like it would take a lot of resources) but I'll give it a whirl!

Thank you.

auden · April 25, 2020, 3:32pm

I have found that searching metadata on PDF files with Windows search is very iffy. For instance, I can't get any results at all out of the keyword or comment field, which kind of renders it pretty useless.

However after you have indexed your PDFs the free text search is lightning fast and it does not take that long to index the files if you do not use the computer while the indices are building.

I tried to tell Windows Search to only index the metadata, but it seems to index the content anyway.

SoBreezyInChile · April 27, 2020, 6:55pm

Auden,

Tough searching through PDF metadata via Windows search? I have found it hard to search through the metadata of many other documents as well.

I wish there were a more robust internal search engine that could look up everything with the following parameters:

You can set the areas where the search applies (not that hard to accomplish; most programs, such as Windows, already do this).
Search by filename and parent folder name.
Search by metadata like comments, tags, and title.
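
The wish list above can be sketched in a few lines of Python. This is only an illustration of the idea, not a feature of Opus or Windows Search: `find_files` and the caller-supplied `read_metadata` function are hypothetical names, and actually reading comments, tags, or titles out of a given file type is left to whatever tool you trust for that.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator


def find_files(
    roots: Iterable[Path],
    terms: Iterable[str],
    read_metadata: Callable[[Path], Dict[str, str]],
) -> Iterator[Path]:
    """Yield files under `roots` where every term appears in the file name,
    the parent folder name, or any metadata value (case-insensitive)."""
    wanted = [t.casefold() for t in terms]
    for root in roots:
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            # One haystack per file: name, parent folder, then all metadata values.
            fields = [path.name, path.parent.name]
            fields.extend(read_metadata(path).values())
            # Join with newlines so a term cannot match across two fields.
            text = "\n".join(fields).casefold()
            if all(t in text for t in wanted):
                yield path
```

Because every field goes into one haystack, a single search box can hit a tag even when no word from the filename comes to mind. The obvious cost is that metadata has to be read for every file on each search, which is why real tools lean on an index.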

auden · April 28, 2020, 4:43am

@SoBreezyInChile
Searching PDF metadata seems like much hard work for very little return. As far as I can see the only way to index the metadata of PDFs in Windows Search is to turn on Adobe's PDF Filter. The problem is that this filter insists on indexing everything in a PDF. The result is that with even a moderate number of PDF files, the indexing takes ages.

Even having achieved the building of the index, certain metadata fields like comment and keyword do not yield results when Opus uses Windows Search.

I have read that Microsoft will introduce some substantive changes to Windows search in the upcoming May update. Rumour has it that one of those improvements is the speeding up of indexing. We shall see.

I can however vouch for the efficacy of Windows Search and Opus for images. But do remember that a search is only as good as the metadata a file contains, which is why companies like Getty Images employ professionals to keyword their metadata.

I have thousands of images. I can find any by date; keyword; description; subject; photographer within a couple of seconds at most. Windows Search is far from perfect, but as the song says "it's getting better all the time".