Cleaning up a mess of files

Maybe this should be in the Help and Support section, but I was more interested in people's views on this. For reasons which are not relevant I was asked to look at a program called Allway Sync. Not a patch on Opus, but it has one useful function in that it can sync more than 2 folders or discs at once. This led me to ask myself the basic question - what was I wanting from that and also, to some extent, from Opus?
"Requirement is to ensure all 2 or more discs in a collection contain the same files in the same folders and all duplicates within each disc are removed."
That proved a difficult requirement. The following undoubtedly shows up my IT inexperience but, if you have the time and patience to read it, I'd be interested in views - and possibly answers!

Difficulty
Files in folders and discs can include -
a. the same file in 2 or more folders and/or disc(s),
b. the same file with 2 or more different names in the same or different folders and/or disc(s)
c. the same file with 2 or more different dates in the same or different folders and/or disc(s)
d. a file with the same name but different sizes in different folders and/or disc(s)
e. the same image file but a different size in the same or different folder and/or disc(s)
f. the same image file but with a different name and a different size in the same or different folder(s) and/or disc(s)
There are probably more combinations I haven’t thought of.

Ideal program
Should
a. Search on selected folders on one or more disks for the duplicates
b. Select folder/drive to be the one kept
c. Once duplicates are found, all but one of each set to be moved to a temporary location until one is sure there are no errors
d. The duplicate for deletion is the duplicate on the user selected folder or disc
e. The actual deletion requires user to select duplicates
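Step c above (park the surplus copies somewhere safe before deleting anything) can be sketched in Python. This is only an illustration under stated assumptions: `quarantine`, `root` and `holding_dir` are hypothetical names, and the list of surplus paths is assumed to come from whatever duplicate finder is used.

```python
import os
import shutil

def quarantine(paths, root, holding_dir):
    """Move each file in `paths` (all located under `root`) into
    `holding_dir`, keeping the same relative folder layout so a file
    can be put back by hand if it turns out not to be a true duplicate."""
    moved = []
    for path in paths:
        relative = os.path.relpath(path, root)       # e.g. photos/2019/img.jpg
        destination = os.path.join(holding_dir, relative)
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        shutil.move(path, destination)                # nothing is deleted
        moved.append(destination)
    return moved
```

Nothing is removed for good here; the holding folder is only emptied once the user is satisfied, which matches step e.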
Option 1
a. Compare disc A and disc B and flag for deletion from (say) B any duplicates of files in A.
b. Any files now left on disc B are only ones not on disc A
c. Synchronise disc A and B
d. Delete duplicates from within each disc
i. Drawback to this method is that it involves deleting files from B which will need to be copied back in step c

Option 2
a. Synchronise disc A & B
i. Disc A and disc B now contain identical files but both have duplicates within each disc
b. Delete duplicates from each disc
Either process needs to be repeated for disc C and any other discs

Two main problems with this
Option 1 requires 3 duplicate searches and one synchronization. It also leaves you temporarily without a backup so you should have a 2nd backup available before starting this process.
Option 2 requires 2 duplicate searches and one synchronization. Which option is faster depends on which process, copying or identifying duplicates, is the faster.
Cleaning up a mess of files – Original and more than one backup
Option 2 can’t be used to sort out a 3rd disc, as synchronizing the 3rd disc with either A or B will probably result in lots of duplicates being recreated in A & B. So Option 1 has to be used to ensure that the only files still on disc C are unique ones not on A or B. That means doing it twice – once comparing A and C and a second time comparing B and C. Disc C can now be synchronized first with A and then with B.
Neither GP Software’s Directory Opus nor Allway Sync seems to fit the required bill.
Allway Sync does not have a search for duplicates function. Opus doesn’t have the facility to sync 3 or more folders/discs simultaneously.
Both do have what I perceive as a major problem with syncing – a smaller but newer file will be used to overwrite an older but larger one. With a text file that is not necessarily a bad action, but with an image in many, probably most cases it will be.
A large text file will often be edited down to more precise prose and then deliberately saved over the original file.
A large image file can often be resized for a number of reasons but the important file is the original large one. It survives only if one remembers to change the name of the resized copy, so changing the name is essential to avoid the larger file being overwritten.
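The overwrite rule wished for above can be written down as a small check. This is a sketch of one possible policy, not something either Opus or Allway Sync is known to offer; `safe_to_overwrite` is a hypothetical name, and the rule (never let a smaller file replace a larger one) is an assumption that suits images better than text.

```python
import os

def safe_to_overwrite(source, target):
    """Return True only when copying `source` over `target` cannot
    replace a larger file with a smaller one.

    Assumed policy: a missing target is always fine; otherwise the
    source must be newer AND at least as large as the target."""
    if not os.path.exists(target):
        return True
    src, dst = os.stat(source), os.stat(target)
    is_newer = src.st_mtime > dst.st_mtime
    is_smaller = src.st_size < dst.st_size
    return is_newer and not is_smaller
```

Files that fail the check would be listed for manual inspection rather than copied, so a resized image saved under the original name is caught before it replaces the full-size one.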

Opus may well have the power to do all this. Maybe I just need to dig deeper!

Thanks for reading this far!

Is the aim to have a backup of everything, or is the copy actively used as well?

Is one disk a copy of the other, or can changes be made on either and the sync needs to go in both directions?

The aim is to ensure that the active master disc has a copy of all files that exist across all discs. Once it has all files, all other discs are synced with it so that they too have copies of all files. So this first stage will be a mix of two-way and one-way copies.
In the process the aim is also to delete all duplicates in all discs. Most of these files are images but some may be smaller versions of the same image.
In the end there should be one master disc which is the active master disc, with the other two discs being full backups of this disc and which will continue to be used as backups so future copies will be one way.
At present I have files on the master disc, mostly backed up to the other two, but the other two may also have files which for some reason are missing from the master disc. All 3 discs have duplicate files which I would like to remove from them.
Hence my subject line - cleaning up a mess of files!

I would look to backup software in that case. The capabilities vary greatly and it's not always possible to find one that does everything you want in one package (I've had the same problem myself), but it's where I would look if the aims are:

  • Automated backups.
  • De-duplication.
  • Historic backups, so if a file is replaced you can still access older versions of it.

Backup tools will handle those requirements better than synching tools.

I don't know if any tools can help there, although it's possible.

It's easy if the files have the same names and paths, but I am guessing things are all over the place. If so, you could e.g. generate a flat list of all files (names and/or hashes) on all the drives, then see which files are only on the secondary drives and turn that into a list of files you then need to manually inspect to see if you want to keep them (restore from the backup drives to the master drive) or delete them (if they were backups of things you no longer want). I don't know if any tool automates that kind of process.
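The flat-list idea above can be sketched in Python. This is an illustration under assumptions, not a tested tool: `index_tree` and `only_on_secondaries` are hypothetical names, SHA-256 is just one reasonable choice of hash, and files are matched on content alone so renamed or moved copies still count as present on the master.

```python
import hashlib
import os
from pathlib import Path

def file_hash(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large images fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def index_tree(root):
    """Build the flat list for one drive: content hash -> list of paths."""
    index = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = Path(dirpath) / name
            index.setdefault(file_hash(path), []).append(path)
    return index

def only_on_secondaries(master_root, secondary_roots):
    """Return hash -> paths for content found on a secondary drive
    but nowhere on the master, whatever the file is called there."""
    master = index_tree(master_root)
    missing = {}
    for root in secondary_roots:
        for digest, paths in index_tree(root).items():
            if digest not in master:
                missing.setdefault(digest, []).extend(paths)
    return missing
```

The result is the list to inspect by hand: each entry is either restored to the master or deleted from the backups. A resized copy of an image has different contents, so it will show up here as a distinct file.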

Take a look at Duplicate Cleaner. It doesn't do everything you want but it will help with cleaning up duplicates while protecting the directories you pick. It can also display unique files. And it has an Image mode, with adjustable sensitivity, for finding images that could be identical but maybe smaller/larger or flipped/rotated etc. For images, it also has a compare window so you can compare and select the files you want to delete. There are lots of options in that program, and it has helped me find duplicates that don't have matching hashes.

  1. Move everything you want to one disk.
  2. De-duplicate using dopus. Compare by MD5.
  3. Upload this master, deduped collection to the cloud - google drive/dropbox/onedrive/whatever.
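For anyone curious what comparing by MD5 amounts to in step 2, here is a rough Python sketch. It is not how Opus implements its duplicate finder; `duplicate_groups` is a hypothetical name, and grouping by size first is simply a common shortcut so that only files of equal size get hashed.

```python
import hashlib
import os
from collections import defaultdict

def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def duplicate_groups(root):
    """Return sorted lists of paths under `root` whose contents match."""
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Each group holds copies of one file under any name or date; one path per group is kept and the rest are candidates for removal. Resized images will not match, since their bytes differ.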

In the end there should be one master disc which is the active master disc

  1. Sync via the cloud, using it as your master copy.