Hide all or Show only files with identical CRC in SRC and DEST

I know we discussed hiding or only showing files with identical filenames in the past:

I'm stuck with a use case where I need to hide, or show only, files that have identical CRC (so it could also work within archives) or MD5 hashes but different filenames, in order to visually compare, verify and see differences in contents without synchronizing anything.

Last time I checked, even Beyond Compare wasn't able to do this.
I know dopus can do everything via scripting, but I suspect the magic in the above solution won't help me this time, and going at it with loops, vectors and arrays is the only way. I'll be making and sharing this script; however, I'm stuck at the beginning.

Is it possible for a script to access the hash fields of files without explicitly opening them via FsUtil and FsUtil.Hash? In FsUtil's case I see the following possible problems:

  • Hashes already cached by dopus as native column data would be recalculated on every script run (unless I reinvent caching in the script itself)?
  • Hashing files in archives would require extracting them?
  • Hashing would be performed synchronously, inducing a performance penalty.

The things above would severely affect performance and usability.
Could there be some alternative approach I am missing?

I am afraid you are right to be concerned about performance. Maybe you can use the collections built by Find DUPES to simplify your caching. For maximum speed and best usability you will probably need your own cache. ExtendedExif uses one and could provide some ideas.
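
To illustrate the "own cache" idea, here is a minimal sketch in Python rather than Opus script code. It assumes a hash can be reused as long as the file's size and modification time are unchanged; the `HashCache` class, the `file_md5` helper and the JSON cache file are hypothetical names for illustration, not anything Opus or ExtendedExif provides.

```python
import hashlib
import json
import os


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class HashCache:
    """Hypothetical on-disk cache: absolute path -> size, mtime and MD5."""

    def __init__(self, cache_path):
        self.cache_path = cache_path
        self.entries = {}
        if os.path.exists(cache_path):
            with open(cache_path, "r", encoding="utf-8") as handle:
                self.entries = json.load(handle)

    def get(self, path):
        """Return the MD5, rehashing only if size or mtime changed."""
        stat = os.stat(path)
        key = os.path.abspath(path)
        entry = self.entries.get(key)
        if (entry is not None
                and entry["size"] == stat.st_size
                and entry["mtime_ns"] == stat.st_mtime_ns):
            return entry["md5"]  # assumed still valid: file looks unchanged
        md5 = file_md5(path)
        self.entries[key] = {
            "size": stat.st_size,
            "mtime_ns": stat.st_mtime_ns,
            "md5": md5,
        }
        return md5

    def save(self):
        """Persist the cache so the next run can reuse the hashes."""
        with open(self.cache_path, "w", encoding="utf-8") as handle:
            json.dump(self.entries, handle)
```

Checking size and timestamp is a trade-off: it avoids rereading unchanged files, but a file rewritten with the same size and timestamp would be missed.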

Maybe only for inspiration: there is a script add-in called CompareEx, which is able to select items based on MD5.
It used to be available here (in the German DO support forum): https://www.haage-partner.de/forum/viewtopic.php?t=4956
But since the linked thread, homepage and forum have been offline for some time, try this thread to at least get the script add-in: "Want to Compare two listers/folders to check similarity?"

Run it like so:
CompareEx TABS=by-md5

I guess it won't work with archives, but I did not try.

Not sure what size of files and archives you need this for, but if you go huge in the number of files, file sizes or archive depth, scripting this is probably not optimal. The heavier lifting could be done by a specialized C++ executable which only reports back to a log file. The log file content could then be read by a DO script and applied to the current view (the selecting/deselecting part), which separates the comparison logic from the view.

Thanks for the input guys.
It will have to work for both normal filesystem and inside archives, as well as support flat view for both.

It seems it's time to dust off an old C# archive search project I made some years ago to handle the archive portion, as it can just read the archive header and list all names and hash sums from it. It might as well calculate the normal folder portion too.
The application will be run with SRCpath and DSTpath as arguments and spit out a json file to feed back to the script which can then do the showing and hiding.
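
A rough sketch of what such a helper could look like, in Python instead of C# and with ZIP standing in for other archive formats (the standard library cannot read RAR). The function names and the JSON layout are assumptions for illustration only. Archive entries are matched on the size and CRC32 stored in the archive's directory, so nothing is extracted; since CRC32 is only 32 bits, matches are best treated as candidates.

```python
import json
import os
import zipfile
import zlib


def crc32_of_file(path, chunk_size=1 << 20):
    """CRC32 of a file on disk, read in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def index_location(location):
    """Map (size, crc32) -> list of relative names for a folder or ZIP file."""
    index = {}
    if os.path.isfile(location) and zipfile.is_zipfile(location):
        with zipfile.ZipFile(location) as archive:
            for info in archive.infolist():  # reads the directory, no extraction
                if not info.is_dir():
                    key = (info.file_size, info.CRC)
                    index.setdefault(key, []).append(info.filename)
        return index
    for folder, _dirs, files in os.walk(location):
        for name in files:
            full = os.path.join(folder, name)
            relative = os.path.relpath(full, location).replace(os.sep, "/")
            key = (os.path.getsize(full), crc32_of_file(full))
            index.setdefault(key, []).append(relative)
    return index


def base_names(names):
    """File names without their folder part."""
    return {name.rsplit("/", 1)[-1] for name in names}


def compare(src, dst):
    """List content matches between src and dst, flagging differing names."""
    src_index, dst_index = index_location(src), index_location(dst)
    report = []
    for size, crc in sorted(src_index.keys() & dst_index.keys()):
        src_names = sorted(src_index[(size, crc)])
        dst_names = sorted(dst_index[(size, crc)])
        report.append({
            "crc32": format(crc, "08x"),
            "size": size,
            "src": src_names,
            "dst": dst_names,
            "names_differ": base_names(src_names).isdisjoint(base_names(dst_names)),
        })
    return report


def write_report(src, dst, out_path):
    """Write the comparison as JSON for the script to read back."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(compare(src, dst), handle, indent=2)
```

The script side would then only need to read the JSON and show or hide the listed names, keeping the hashing out of the script entirely.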

This leads me to another minor inconvenience. I can't seem to find any property in dopus scripting for telling apart VFS paths (when within an archive) from normal filesystem paths.
E.g. C:\Folder1\Archive1.rar\ArchiveSubfolder1\ArchiveSubfolder2
vs. C:\Folder2\Folder3\Another.Folder.With.Dots\AnotherSubfolder
Brute-forcing known archive extensions within the path string seems trivial but is never the proper way to go. We need to support all extensions, but we can't trust the dot as a delimiter because it's also a valid folder name character. Having to make dir.exists checks on all parts is ugly as well, but it will have to do for the time being until we get there.

That's essentially what Opus does. You can probably skip checking path components that don't have a dot in them to speed things up slightly.
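
For illustration, the component-by-component check with the "skip parts without a dot" shortcut could look roughly like this. It is a Python sketch, not Opus script code; `split_archive_path` is a hypothetical helper, and it assumes that any path component which exists on disk as a file rather than a folder is the archive.

```python
import os


def split_archive_path(path):
    """Return (archive_file, inner_path), or (None, None) for a plain path."""
    normalized = os.path.normpath(path)
    drive, rest = os.path.splitdrive(normalized)
    parts = [part for part in rest.split(os.sep) if part]
    # Start from the root (or drive) and rebuild the path one component at a time.
    current = drive + os.sep if rest.startswith(os.sep) else drive
    for index, part in enumerate(parts):
        current = os.path.join(current, part) if current else part
        if "." not in part:
            continue  # shortcut: assume a component without a dot is a folder
        if os.path.isfile(current):
            # A file in the middle of the path: treat it as the archive.
            return current, os.sep.join(parts[index + 1:])
    return None, None
```

The shortcut would miss an archive file that has no dot in its name, and a path that simply ends in a regular file is reported with an empty inner path, so the caller is expected to pass folder paths.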