Duplicate file finder

Are we able to script the selection of the items to delete?
I thought I could write a script that would loop through each group of duplicates and select the items I would like to delete. I would also like to read the list of folders in the "find in" area.

Is this currently possible?

Scripts can go through the file list and select/deselect things.

The part you'd be missing is knowing which files are duplicates of which, since scripts do not currently have a way to see how the file display is grouped.

If you're doing a duplicates search by name then that's easy enough, but if you're doing it by size/content then the script would have to recalculate that information itself, which isn't so great.
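If you did want a script to rebuild the groups by content itself, one possible sketch (untested; it re-hashes everything with FSUtil.Hash, so it will be slow on large files):

```javascript
// Sketch: re-group the files in the current display by MD5 and
// select every copy after the first in each group.
// Re-hashing is slow; this ignores the Duplicate Finder's own cache.
function OnClick(clickData) {
    var tab = clickData.func.sourcetab;
    var cmd = clickData.func.command;
    var groups = {};                           // md5 -> array of items

    for (var e = new Enumerator(tab.files); !e.atEnd(); e.moveNext()) {
        var item = e.item();
        var md5 = String(DOpus.FSUtil.Hash(item.realpath, "md5"));
        if (!groups[md5]) groups[md5] = [];
        groups[md5].push(item);
    }

    cmd.ClearFiles();
    for (var md5 in groups) {
        for (var i = 1; i < groups[md5].length; i++)
            cmd.AddFile(groups[md5][i]);       // keep first, mark the rest
    }
    cmd.RunCommand("Select FROMSCRIPT");
}
```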

Actually, in Delete mode your script could probably infer the groups from the selections/checks, since the first item in each group should be unchecked. (Just make sure you don't change the checkboxes or sort order before running the script.)
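For example, something along these lines could rebuild the groups from the checkbox states (a rough sketch, assuming enumeration order matches the display's current sort order):

```javascript
// Sketch: infer duplicate groups from Delete-mode checkboxes.
// Each unchecked item should be the first (kept) file of a new group.
function OnClick(clickData) {
    var tab = clickData.func.sourcetab;
    var groups = [];

    for (var e = new Enumerator(tab.all); !e.atEnd(); e.moveNext()) {
        var item = e.item();
        if (!item.checked)
            groups.push([]);                   // unchecked = new group starts
        if (groups.length > 0)
            groups[groups.length - 1].push(item);
    }

    DOpus.Output("Found " + groups.length + " duplicate groups");
}
```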

There are a lot of fairly simple duplicate file managers out there, and finding a good one takes a bit of work. I found an app called Duplicate File Detective. It specializes in handling duplicates and has the options I was looking for. Very useful app.

Please keep Help & Support discussion to Opus.

You can talk about other software in the CoffeeShop area.

Although this topic has been idle for a few years, I would like to share a simple workflow to find and eliminate duplicate files spread across multiple archives. I have a huge collection of photos in my primary archive (SOURCE), and have backed up my photos over the last decade to various hard drives and file archives. While most of the files in each archive are duplicates, occasionally I will discover photos in one archive but not the others. Because some of these stray photos are ones I want to capture and keep, it is difficult to throw away these archives; however, hand-checking each archive is not practical.

DOPUS is extremely versatile, and by chaining together a few features, I was able to create a simple workflow that works well; it is relatively painless and is quite fast. It is also robust enough to be used to deduplicate archives that contain hundreds of thousands of files.

BUT.... before you use it on something as valuable as financial records or your family photos, be sure to test this workflow on a trivial archive to understand the key steps.

For this exercise, I am starting with 3 large archives: SOURCE (77,000 photos), ARCHIVE A (12,000 photos), and ARCHIVE B (67,000 photos). Each archive contains a series of subfolders with content. This workflow will identify and eliminate duplicate files in ARCHIVE A and ARCHIVE B.

Once you've eliminated duplicate files, you will need to manually move all remaining files from Archive A and Archive B into the Source.

WORKFLOW

SUCCESS TIP
Q: Is it smart to test this workflow on something trivial before you use it on invaluable data?
A: Yes.

  1. Open Tools | Duplicate File Finder.
  2. Add SOURCE, ARCHIVE A, and ARCHIVE B to the Duplicate Files | Find in: window.
  3. Perform a Search using MD5 Checksum. Since I want to keep every unique file, I've set MD5 accuracy to 100%. For speed, the MD5 cache option is ticked.
  4. The Duplicate File Finder will run for a while and generate a list of grouped files.
  5. Go to the File Collections tab on the left panel and click on Duplicate Files.
  6. In the File Collections > Duplicate Files panel, you will see that one of the columns is Location. Right-click on Location, and then select Group.
  7. DOPUS will now group the list by Location (SOURCE, ARCHIVE A, ARCHIVE B), rather than grouping copies of the same file from different locations together. The ARCHIVE A directories are shown sequentially first, followed by ARCHIVE B directories, and finally SOURCE directories.
  8. To delete duplicates in a location, click on the appropriate Section Heading in the File Collections > Duplicate Files window. DOPUS will highlight all the files under that heading, and you can then delete them in one go. (If you would rather script this selection, see the sketch just below this list.)
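For anyone who prefers driving step 8 from a script, here is one possible JScript sketch. It is untested and the archive path is a made-up example; it simply selects every item in the collection whose real path sits under one archive root:

```javascript
// Sketch: select every file in the Duplicate Files collection whose
// real path is under one archive root, ready for deletion.
// The prefix below is a hypothetical example; adjust it to your setup.
function OnClick(clickData) {
    var tab = clickData.func.sourcetab;
    var cmd = clickData.func.command;
    var prefix = "d:\\archive a\\";      // compare lower-case (Windows paths)

    cmd.ClearFiles();
    for (var e = new Enumerator(tab.all); !e.atEnd(); e.moveNext()) {
        var item = e.item();
        if (String(item.realpath).toLowerCase().indexOf(prefix) == 0)
            cmd.AddFile(item);
    }
    cmd.RunCommand("Select FROMSCRIPT"); // highlight the marked files
}
```

Once the files are highlighted, check the selection visually before deleting, exactly as in the manual version of step 8.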

In this example:

  • You can select and delete files in the ARCHIVE A section. These are duplicate files shared with other directories.

  • You can delete files in the ARCHIVE B section. These are duplicate files shared with other directories.

  • IGNORE ANYTHING IN SOURCE! DO NOT SELECT OR DELETE ANY FILES FROM THE SOURCE.
    Don't even look at the SOURCE. The SOURCE does not like you.
    The SOURCE contains your primary files.
    If you mess with files in the SOURCE, you are going to have a bad time 🙂

  9. As a final step, make sure to manually import the deduplicated files from Archive A and Archive B into your Source.
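If you end up repeating that import, the move itself can also be put on a button. A minimal sketch (the SOURCE path is a placeholder):

```javascript
// Sketch: move whatever is selected in the active tab into SOURCE.
// "D:\SOURCE" is a hypothetical path; point it at your real archive.
function OnClick(clickData) {
    clickData.func.command.RunCommand('Copy MOVE TO "D:\\SOURCE"');
}
```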

I hope that you find this workflow helpful for your own digital spring cleaning!
