Preferential file selection in Find Duplicate Files

Some time ago I discovered that DO had a Find Duplicate Files function. Wonderful - I could get rid of another application that did the same job. It never ceases to impress me how useful DO is.

When there are a lot of duplicate sets though, one useful thing the stand-alone app could do was let me set a path (or list of paths) whose files would be preferentially kept, or preferentially deleted.

For instance, I could set it to keep a file if found in the folder c:/pics/cars/ferrari and mark for deletion those duplicates found elsewhere (and any additional duplicates within the given folder). Or, invert it and mark files in the given folder for deletion first.

In either case, only one file in a duplicate set would be kept unless I manually intervened. That is mostly what I want: leave one copy of each file where it should be, which might be all in one main folder, or spread across many different folders (for different files) that I've sorted them into and removed from the given one.

There was also a checkbox to include subfolders below the given path, so a file found anywhere from the path downwards would be 'preferred' for whichever action was chosen.
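For what it's worth, here is a rough sketch in Python of the selection rule I have in mind (not DO script code - the function and parameter names are made up for illustration, and it assumes the duplicate sets have already been found):

```python
from pathlib import PureWindowsPath

def mark_for_deletion(dup_set, preferred_folder, keep_in_preferred=True, include_subfolders=True):
    """Return the paths in one duplicate set to tick for deletion,
    keeping exactly one file per set."""
    preferred = PureWindowsPath(preferred_folder)

    def in_preferred(path):
        p = PureWindowsPath(path)
        if include_subfolders:
            return preferred in p.parents   # anywhere below the preferred path
        return p.parent == preferred        # directly inside it only

    # Sort so the file we want to keep comes first:
    #   keep_in_preferred=True  -> a file inside the preferred folder is kept
    #   keep_in_preferred=False -> a file outside the preferred folder is kept
    ordered = sorted(dup_set, key=lambda p: in_preferred(p) != keep_in_preferred)
    return ordered[1:]  # everything after the single kept file gets ticked

# Example: keep the copy in the Ferrari folder, tick the rest for deletion.
dups = ["c:/pics/cars/ferrari/f40.jpg", "c:/downloads/f40.jpg", "c:/temp/f40.jpg"]
print(mark_for_deletion(dups, "c:/pics/cars/ferrari"))
# -> ['c:/downloads/f40.jpg', 'c:/temp/f40.jpg']
```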

This occurred to me today when I found about 200 duplicate sets, and of course all the ones marked for deletion were in the one folder I wanted to keep them in, so I had to manually invert every file tick.

Note that I did discover, after doing the above, that I could reverse the sort order in the duplicate files display and click 'Select' again. That might do the trick sometimes, or might not. Having the above would be more flexible.


Thanks for the suggestion!

Hi, to add to this idea, it would be nice to be able to completely lock one path against modification, basically using it as a stable reference.

I think that one scenario that often comes up is that your goal is not to eliminate all duplicates. Rather, you want to 'validate' one directory structure against another. This comes up when you manually back up or reorganize files from one location to another, and then want to verify that the target contains all the files from the source before deleting them.

To make this more concrete, I collect all the photos from the various family phones and manually sort them into a folder structure. Inside that structure, I sometimes create albums with intentional duplicates, so I do not really care about internal duplication there. I leave the photos on the phones though, so it is not immediately obvious which photos are already in the sorted structure. When I need to delete photos from the phones to make space, I want to ensure that all photos that I delete have been copied into the sorted structure. By marking the sorted structure as locked or 'for reference', I could easily delete all sorted photos from a phone, without accidentally deleting my albums within the sorted structure.
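Just to illustrate what I mean, here is a plain Python sketch (not anything DO offers; the folder names are hypothetical, and it assumes content equality is simply a matter of matching hashes). The reference structure is only read, never modified:

```python
import hashlib
from pathlib import Path

def hash_file(path, chunk_size=1 << 20):
    """Content hash of a single file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def safe_to_delete(source_dir, reference_dir):
    """Files in source_dir whose content already exists somewhere under
    reference_dir. reference_dir itself is never touched."""
    reference_hashes = {hash_file(p) for p in Path(reference_dir).rglob("*") if p.is_file()}
    return [p for p in Path(source_dir).rglob("*")
            if p.is_file() and hash_file(p) in reference_hashes]

# Example (hypothetical paths): phone photos that are already in the sorted
# structure and can therefore be removed from the phone import folder.
for p in safe_to_delete("D:/phone-import", "D:/photos-sorted"):
    print(p)
```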

There is an old discussion thread where people have the same requirement. The third-party tool mentioned there offers such a locking mechanism, which is a really elegant and simple solution to the problem.

As a second point, I don't actually use copies for my albums, but hard links to the same files. The find duplicates function does not distinguish between true copies and hard links to the same file. It would be nice to have that ability as an optional behavior, e.g. 'Ignore hard links' or 'do not flag hard links as duplicates'.
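For illustration, the check itself could be as simple as comparing the file identifiers the OS reports. A small Python sketch (assuming os.stat exposes a usable identifier; on NTFS the volume serial number and file index are mapped to st_dev/st_ino):

```python
import os

def same_underlying_file(path_a, path_b):
    """True if both paths are hard links to the same file on disk."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

# A duplicate finder could then skip pairs where same_underlying_file(...) is
# True instead of flagging them, since deleting one link frees no space anyway.
```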
