Dopusrt sorted collection import

apocalypse · June 2, 2022, 6:06pm

I'm writing an image similarity program (in .net) which integrates with DOpus via imported file collections similar to my SearchEverything project and plan to release it in the scripting section soon.
It will scan a folder or multiple folders for images, generate perceptual hashes, compare similarity of those hashes based on a target similarity threshold and provide these functions directly to DOpus:

1. Ability to group similar images together in /dupe style collections and show them grouped in DOpus.
2. Search for images similar to the one selected and display them in a collection.
3. Display all images in a folder (optionally including subfolders, à la flat view) sorted by their relative similarity. The only program I am aware of that has this functionality is ThumbPlus, and it's bloated and a pain to use for actual file management.
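
For readers unfamiliar with the idea, here is a minimal Python sketch of the comparison step (the actual tool is .NET and its code is not shown in this thread). It assumes each image already has a 64-bit perceptual hash stored as an integer. The function names, the greedy grouping strategy, and the nearest-neighbour ordering are illustrative assumptions, not the author's implementation.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def group_similar(hashes: Dict[str, int], max_distance: int) -> List[List[str]]:
    """Greedy grouping: a file joins the first group whose first member is close enough."""
    groups: List[List[str]] = []
    for path, value in hashes.items():
        for group in groups:
            if hamming(hashes[group[0]], value) <= max_distance:
                group.append(path)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([path])
    return groups


def order_by_similarity(hashes: Dict[str, int]) -> List[str]:
    """Greedy chain: start at the first file, then keep appending the nearest unvisited file."""
    remaining = list(hashes)
    if not remaining:
        return []
    ordered = [remaining.pop(0)]
    while remaining:
        last = hashes[ordered[-1]]
        nearest = min(remaining, key=lambda p: hamming(hashes[p], last))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered
```

The grouping function corresponds roughly to the first use case and the ordering function to the third. A real implementation would likely want something better than a greedy pass for large folders.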

I'm mostly done with the hashing and sorting part; however, I cannot get DOpus to order the collection on import. For sorting, I'd like to produce a view which is not grouped and either ignores the lister's sorting order or gets a dynamic column by which it is sorted by default. A switch like /dopusrt col import /sorted would do: it would display the files in the same order they were read from the imported collection file, but not force a grouped view.
I tried using the #int,filename trick to make them look like dupes but the resulting groups cannot be ordered by group name.
I tried appending group: id , name . which indeed assigned names to the groups with incrementing numbers but they still sort on filename. Group sorting does nothing in DOpus.

Finally, the ability to sort group appearance order (alphanumerically), while keeping file sorting within them, without going into column scripting would be extremely helpful in many other cases, outside of the scope of this project.

I'd appreciate some input on how to handle this if at all possible in the current version or if it can hopefully be implemented in the upcoming ones.

Leo · June 2, 2022, 8:45pm

My approach would be to make the list of files for the collection, and then also have (either in the same file or in a second file) a list of the other metadata which can be read by a script column.

You can then sort, group, search and filter on that script column however you need to and won't be limited to just the one order the files were listed in initially.
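
As a rough illustration of the two-file variant, the Python sketch below writes the file list and a sidecar file that maps each path to its position. Everything about the layout is an assumption made for the example: one path per line in the list file, a tab-separated "rank, path" sidecar, UTF-8 encoding, and the function names. The script column on the Opus side, which would look up the rank for each item, is not shown.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable


def write_import_files(ordered_paths: Iterable[str], list_file: Path, meta_file: Path) -> None:
    """Write the plain file list plus a sidecar that records each file's rank."""
    paths = list(ordered_paths)
    # Assumed list layout: one full path per line.
    list_file.write_text("".join(p + "\n" for p in paths), encoding="utf-8")
    # Assumed sidecar layout: "<rank>\t<path>", with ranks starting at 1.
    with meta_file.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        for rank, path in enumerate(paths, start=1):
            writer.writerow([rank, path])


def read_ranks(meta_file: Path) -> Dict[str, int]:
    """Load the sidecar back into a path -> rank lookup, as a column script would need."""
    with meta_file.open(newline="", encoding="utf-8") as fh:
        return {path: int(rank) for rank, path in csv.reader(fh, delimiter="\t")}
```

Because the rank lives in the sidecar rather than in the order of the list, the view can be re-sorted by any other column and then returned to the similarity order.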

apocalypse · June 2, 2022, 9:09pm

This approach could be useful to display a column with the relative similarity in case #2, but not much else. It's complete overkill to have to maintain a custom column with just a sequence number in order to sort a view. The additional hash data makes no sense to DOpus anyway; it cannot compare based on it or use it in any way, so I'm not supplying that at all. Just file paths, but in the correct order.
On top of that - the generated import files are temporary and used once. Hashing, hash caching and comparing will be done on the .net side. Having to store additional data for dopus is unreliable and would break persistence. What happens when you clear the temp and restart dopus? Collection is still there but the sorting is gone as the file that defines the order no longer exists and the column is blank.
Writing temp files to working folders is a mess and I wouldn't go that way either.
Writing to dopus data would require eventual cleanup.
If it's auto-handled by the collection itself - deleting the collection removes all traces of the query.

Leo · June 2, 2022, 9:53pm

Collections are just a list of files. How they are sorted will come down to properties of the included items (whether built-in or via scripts) or manual-sort.

You could have a script which sets up a manual-sort order at the time the collection is imported. But you can't group by that, so you'll probably need a script column of some sort if you want to do custom grouping.

One alternative would be to not use collections at all, and instead have the program that generates the list create a folder hierarchy where each group has a sub-folder and within that there's a link to each real file. You could then group by location on that.
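
A minimal Python sketch of that folder-hierarchy idea follows. It assumes symbolic links are an acceptable kind of "link" (shortcuts or hard links would need different calls), and the group_0001-style folder names, the numeric prefix on each link, and the function name are invented for illustration. On Windows, creating symlinks may require elevation or Developer Mode.

```python
import os
from pathlib import Path
from typing import List


def build_link_tree(groups: List[List[str]], root: Path) -> List[Path]:
    """Create one sub-folder per group under root, holding a symlink to each real file."""
    created: List[Path] = []
    for index, group in enumerate(groups, start=1):
        group_dir = root / f"group_{index:04d}"
        group_dir.mkdir(parents=True, exist_ok=True)
        for position, target in enumerate(group, start=1):
            # The position prefix keeps same-named files from different folders apart
            # and preserves the intended order when sorting by name.
            link = group_dir / f"{position:04d}_{Path(target).name}"
            os.symlink(target, link)
            created.append(link)
    return created
```

Deleting the root folder removes only the links, so cleanup is a single operation and the real files are untouched.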

(Not sure how good these approaches will be for what you're ultimately aiming to do with the files after they're in Opus and grouped, of course.)

apocalypse · June 2, 2022, 11:51pm

The thing is that the sort only matters in case #3, and that doesn't require groups. I only tried implementing it with groups to somehow force the sorting, alas with no success.

Manual sorting seems like it would work however this part of the manual concerns me:

Manual sorting is currently not supported in Flat View or when the file display is grouped. It will also not work correctly in a folder when compatibility files are displayed and there are two files with the same name.

Nothing mentions how it behaves in a collection, but I reckon collections would fall in line with Flat View, as they contain files which do not necessarily come from a single directory. Also, from what I understand, the manual sorting order is saved as NTFS streams or properties of the parent folder. Two files with the same name are totally possible when enumerating trees with subdirs (which is why we use collections), so this is not applicable. Passing huge data structures from external commands to apply that sort is also a no-go. As for reading them from files, I might as well go back to script columns.

I also noticed that when a collection .col is generated via /import the collection item order is reversed compared to the source file we pass to it, which would invert the expected order as well. If only dupe_id was exposed as a column we could sort a lister by it.

I might settle on a custom column which would directly read the dupe_id from the /dopusdata/Collections/*.col which it belongs to.

Leo · June 3, 2022, 9:46am

The order in the .col file shouldn’t matter. The items will be sorted by some criteria after the list is loaded. (And the user may change the sort by clicking a column header etc.)