Duplicate file finder tutorial

Is there a tutorial for the duplicate file finder? I couldn't find one.
I have tried using this tool several times to delete duplicate files, but each time I found it useless because it didn't show me the duplicates clearly enough for me to tell which file to move or delete. Maybe if somebody showed how they use it, I would find it useful too.

From DOpus, open Help and use the Search section to search for Duplicates. You'll see a nicely written section on how the tool works.

Once the results are ready, you'll see the files arranged in groups of duplicates (according to the criteria you selected), and the columns in the collection show the names, sizes, and path locations of the files. This lets you judge which files should be deleted; you'll probably select only those in certain folders. In Delete Mode, checkboxes appear next to each file, with the first file in each group unselected and its duplicates selected. You can then delete the selected files. Presumably you will know which ones you want deleted, and they will fall into some sort of pattern (e.g. you copied files from an old system to your new one, now have duplicates in two locations, and want to delete only those from the old computer).
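To illustrate the "delete only the copies under one folder" pattern, here is a minimal Python sketch. It is not how Opus works internally; the function name `select_for_deletion` and the example paths are made up for illustration. It takes one group of duplicate paths and picks the copies that live under a given old root, while making sure at least one copy in the group survives.

```python
from pathlib import PureWindowsPath

def select_for_deletion(group, old_root):
    """Pick the copies in `group` that sit under `old_root` (hypothetical helper).

    If every copy is under `old_root`, the first one is kept so that
    the group never loses all of its files.
    """
    old_root = PureWindowsPath(old_root)
    in_old = [p for p in group if old_root in PureWindowsPath(p).parents]
    if len(in_old) == len(group):
        return in_old[1:]  # keep the first copy
    return in_old

group = [r"C:\Photos\a.jpg", r"D:\old\Photos\a.jpg", r"D:\old\Backup\a.jpg"]
to_delete = select_for_deletion(group, r"D:\old")
```

With the example group, only the two copies under `D:\old` are chosen and the copy in `C:\Photos` is left alone.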

See also today's post: Create & Load Profiles when using Duplicate File Finder

Is there any way of sending the duplicates to a separate folder? The reason for asking is that in the past I have used d'Peg from Someware, which is very flexible but somewhat slow and has a habit of crashing when trying to find duplicates among a large number of files. I liked putting the deleted files in a separate folder, as I could then run a second check which deleted them. Using MD5 hashes and two runs made it pretty safe to automate the delete without having to look at every file.

The Opus duplicate finder is much faster and more stable, but you have to decide for yourself in every case whether or not to delete. My difficulty is that over the years I have put about 60,000 photos on my system, and among them are many duplicates. I want to automate the clear-out but still be sure I am not deleting non-duplicates. My two-stage process with d'Peg gave a check on this. Opus gives me one chance!

I suppose what it comes down to is: how certain can I be that if Opus says it is a duplicate, it really is a duplicate?

The Opus duplicate finder has an MD5 mode. Do you still need the second tool/scan?

You can drag things out of the duplicate results to another folder if you want to copy or move them, anyway.
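If you would rather script the "move them aside first, delete later" step outside Opus, something along these lines could work. This is only a sketch: `quarantine` is a made-up helper, and it assumes you already have a list of paths you consider duplicates. Files are moved rather than deleted, and name clashes in the destination folder are resolved by adding a numeric suffix.

```python
import shutil
from pathlib import Path

def quarantine(paths, dest_dir):
    """Move each file in `paths` into `dest_dir` without overwriting anything."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in map(Path, paths):
        target = dest / src.name
        counter = 1
        # Two duplicates often share a name, so find a free name first.
        while target.exists():
            target = dest / f"{src.stem}_{counter}{src.suffix}"
            counter += 1
        shutil.move(str(src), str(target))
        moved.append(target)
    return moved
```

Nothing is lost at this stage, so the quarantine folder can be checked (or scanned again) before it is emptied.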

You tell me! :slight_smile: How reliable is MD5?

Thanks for the suggestion.

I'm not sure what you mean. Why were you running a second duplicates scan before?

MD5 doesn't change from one run to the next; the MD5 of a file will always be the same as long as the file's contents have not changed.
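A quick way to convince yourself of this is the small Python sketch below, which uses the standard `hashlib` module and is independent of Opus. The file names are invented for the demo. It hashes the same file twice and then hashes a renamed copy: the digest depends only on the bytes in the file, not on its name, location, or when it was hashed.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def md5_hex(path):
    """Return the MD5 of a file's contents as a hex string."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "holiday.jpg"
        original.write_bytes(b"pretend these bytes are a photo")
        copy = Path(tmp) / "renamed copy.jpg"
        shutil.copy(original, copy)
        first = md5_hex(original)
        second = md5_hex(original)   # same file, hashed again
        copied = md5_hex(copy)       # same bytes under a different name
    return first == second, first == copied

same_on_rerun, same_for_copy = demo()
```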

If they are pictures, it is extremely unlikely that two non-identical pictures will have the same MD5, and you could put the duplicate finder results into thumbnails mode to quickly inspect things.

...and as long as you have not changed metadata such as tags, ratings, or user descriptions.

Regards, AB
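The metadata point can be shown with a toy example. This assumes the metadata is stored inside the file itself, as embedded tags are in many image formats; the byte strings below are stand-ins, not a real image format. Two "files" carry identical pixel data, but one has extra tag bytes, so their MD5s differ and a checksum-based comparison would not treat them as duplicates.

```python
import hashlib

pixels = b"\x00\x01\x02" * 1000              # stand-in for image data
untagged = b"HDR" + pixels
tagged = b"HDR" + b"rating=5" + pixels       # same pixels, extra metadata bytes

same_pixels = untagged[3:] == tagged[-len(pixels):]
same_md5 = hashlib.md5(untagged).digest() == hashlib.md5(tagged).digest()
```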

On the basis that if I checked them twice, I was less likely to delete one I shouldn't. From what has been said here, I was obviously worrying unduly. I know that MD5 can throw up the occasional identical checksum for different files, but I wasn't sure how high that risk was. From your replies, it seems very unlikely.

Thanks for the help.

It's very unlikely, although not impossible.
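To put a rough number on "very unlikely", the sketch below applies the standard birthday-style upper bound to the 60,000 photos mentioned earlier. It assumes MD5 outputs behave like random 128-bit values for ordinary files; it says nothing about files deliberately crafted to collide, which is known to be possible with MD5. The function name is made up for this example.

```python
def accidental_collision_bound(file_count, hash_bits=128):
    """Upper bound on the chance that any two of `file_count` files
    share a hash by accident: (number of pairs) / 2**hash_bits."""
    pairs = file_count * (file_count - 1) // 2
    return pairs / 2 ** hash_bits

bound = accidental_collision_bound(60_000)
```

For 60,000 files the bound comes out at roughly 5 in 10^30, and that is before the same-size requirement is taken into account.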

Checking the thumbnails to make sure they really are the same image would make sense if you want to be completely sure.

(Also, Opus only compares the MD5 of two files if the files are the same size, so if two unrelated files happen to have the same MD5s they still won't be considered identical unless their sizes are also the same.)
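The size-first idea can be sketched in Python as follows. This is an illustration of the general technique, not Opus's actual code, and the function names are invented. Files are first bucketed by size; only buckets holding more than one file are hashed, and files count as duplicates only when both size and MD5 match.

```python
import hashlib
import os
from collections import defaultdict

def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root):
    """Return groups (sorted lists of paths) with equal size and equal MD5."""
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

A side benefit of checking sizes first is speed: files with a unique size are never read at all.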

Thanks Leo.

OK, it's been a while since the last post, but just a moment ago I tried to use the duplicate file finder and didn't get what I wanted, so now I have a real-life example and you can help me find a way of doing what I want.

Look at the screenshot. This is what I get after clicking "find". Of course these results are pretty useless, as DO selected a bunch of files that are obviously not identical (I also tried comparing files by name and size instead of MD5).


If you turn on the MD5 Checksum column, do those files show checksums?

If you push Find again, do you get identical results?

Files larger than 5 MB have "file too large" in the MD5 column.
I changed the comparison method to filename and size but I still get the same result.
I get the same result each time I click the "find" button.

FWIW, you can force calculation by selecting them and running GetSizes MD5 or by increasing the size limit in Prefs/Misc/Advanced... But if you're seeing the same thing with just filename+size then the MD5 stuff can't be relevant, so don't worry about that anymore.

[quote]I changed the comparison method to filename and size but I still get the same result.
I get the same result each time I click the "find" button.[/quote]
Try going up a level from the Duplicate Files results, so you're in coll:// itself, and delete the Duplicate Files collection.

To be sure, fully exit Opus, then open it again and go back to coll:// and verify that the Duplicate Files collection has not come back.

Assuming the collection did not come back by itself, run the duplicate search again (Filename+Size only). Do you get the same results again?

I deleted the collection and exited Opus completely (confirmed by clicking "exit" when asked if I really wanted to exit DO); the collection wasn't there when I opened DO again. I ran the duplicate finder comparing filename and size and still got the same result.

I might need to send you a debug version that outputs some extra information about why those items are being grouped together.

Are you using 32-bit or 64-bit Windows/Opus? (If you want, you can add the info so it appears on the right of each post, by clicking User Control Panel at the top of the page, then the Profile tab. There are drop-downs near the bottom.)

Windows 7 64-bit.
Also, I added info to my profile for future reference.

I might try the debug version you mentioned. Is it a portable or an installable version? Does it make any difference if I install it on a virtual machine (it's also Win7 x64)?

I find this a bit worrying, considering I have had a big clear-out of images recently. I didn't notice this issue, but then I didn't check every file, relying largely on MD5. I hope this is just an MPEG issue. Have you tried running the duplicate file finder on images?
Ian

Just a quick test for you (the duplicates were made for the purpose of testing). Filename+size vs MD5 comparison.
Please note that all files except the first one are selected, so I guess you would notice.