Duplicate file finder tutorial

You tell me! :) How reliable is MD5?

Thanks for the suggestion.

I'm not sure what you mean. Why were you running a second duplicates scan before?

MD5 doesn't change from one run to the next; the MD5 of a file will always be the same as long as the file's contents have not changed.
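To illustrate that point outside Opus, here is a minimal Python sketch using the standard `hashlib` module. The sample bytes and the helper name `md5_of_bytes` are made up for the example; this is not how Opus computes its checksums internally, just a demonstration that the same bytes always give the same MD5 and that changing the bytes changes it.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the MD5 checksum of some bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-in for a file's contents.
original = b"example image bytes"

first_run = md5_of_bytes(original)
second_run = md5_of_bytes(original)       # same contents, hashed again
changed = md5_of_bytes(original + b"!")   # contents modified by one byte

print(first_run == second_run)  # identical contents give an identical checksum
print(first_run == changed)     # modified contents give a different checksum here
```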

If they are pictures, it is extremely unlikely that two non-identical pictures will have the same MD5, and you could put the duplicate finder results into thumbnails mode to quickly inspect things.

...and as long as you have not changed META information such as tags, ratings, user description.

Regards, AB

On the basis that if I checked them twice I was less likely to delete one I shouldn't. From what has been said here I was obviously worrying unduly. I know that MD5 can throw up the occasional identical checksum for different files but wasn't sure how high that risk was. From your replies it seems very unlikely.

Thanks for the help.

It's very unlikely, although not impossible.
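For a rough feel of how unlikely, here is a back-of-the-envelope sketch in Python. It assumes MD5 outputs behave like uniformly random 128-bit values and that nobody has deliberately crafted the files to collide (MD5 is known to be weak against deliberate collisions, which is a different situation from accidental ones). The function name `accidental_collision_bound` is just for this example.

```python
def accidental_collision_bound(n_files: int, hash_bits: int = 128) -> float:
    """Upper bound on the chance that any two of n_files distinct files
    share a hash, assuming uniformly random hash_bits-bit outputs.

    This is the simple union bound: (number of pairs) / 2**hash_bits.
    """
    pairs = n_files * (n_files - 1) // 2
    return pairs / 2 ** hash_bits


for count in (10_000, 1_000_000, 1_000_000_000):
    bound = accidental_collision_bound(count)
    print(f"{count:>13,} files: at most {bound:.2e}")
```

Under those assumptions, the bound stays vanishingly small even for very large collections, which is why an accidental match is not something to lose sleep over.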

Checking the thumbnails to make sure they really are the same image would make sense if you want to be completely sure.

(Also, Opus only compares the MD5 of two files if the files are the same size, so if two unrelated files happen to have the same MD5s they still won't be considered identical unless their sizes are also the same.)
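The size-then-MD5 idea can be sketched in Python like this. It is only an illustration of the general technique, not Opus's actual code; `file_md5`, `find_duplicates` and the throwaway test files are all invented for the example.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root that have both the same size and the same MD5."""
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_md5(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


# Small self-check in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.bin").write_bytes(b"same")
    (root / "b.bin").write_bytes(b"same")
    (root / "c.bin").write_bytes(b"diff")            # same size, different contents
    (root / "d.bin").write_bytes(b"longer content")  # unique size, never hashed
    demo_groups = [[p.name for p in g] for g in find_duplicates(root)]

print(demo_groups)
```

Checking sizes first also saves time, since files with a unique size never need to be hashed at all.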

Thanks Leo.

OK, it's been a little while since the last post, but a moment ago I tried to use the duplicate file finder and didn't get what I wanted, so I now have a real-life example you can use to help me find a way of doing what I want.

Look at the screenshot. This is what I get after clicking "Find". Of course these results are pretty useless, as DO selected a bunch of files that are obviously not identical (I also tried comparing files by name and size instead of MD5).


If you turn on the MD5 Checksum column, do those files show checksums?

If you push Find again, do you get identical results?

Files larger than 5 MB show "file too large" in the MD5 column.
I changed comparison method to filename and size but I still get the same result.
I get the same result each time I click "find" button.

FWIW, you can force calculation by selecting them and running GetSizes MD5 or by increasing the size limit in Prefs/Misc/Advanced... But if you're seeing the same thing with just filename+size then the MD5 stuff can't be relevant, so don't worry about that anymore.
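To show why MD5 drops out of the picture in that mode, here is a minimal Python sketch of a filename+size comparison. It is a guess at the general idea, not Opus's implementation: the `Entry` type, the folder paths and the case-insensitive name match are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    folder: str
    name: str
    size: int


def group_by_name_and_size(entries):
    """Group entries whose name and size both match. No checksum is involved."""
    groups = defaultdict(list)
    for entry in entries:
        # Assumption: names are compared case-insensitively, as is usual on Windows.
        groups[(entry.name.lower(), entry.size)].append(entry)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical listing: only the first two entries share both name and size.
entries = [
    Entry("C:/videos/a", "clip01.mpg", 7_340_032),
    Entry("C:/videos/b", "CLIP01.MPG", 7_340_032),
    Entry("C:/videos/a", "clip02.mpg", 7_340_032),
    Entry("C:/videos/b", "clip03.mpg", 9_000_000),
]

name_size_groups = group_by_name_and_size(entries)
for group in name_size_groups:
    print([f"{e.folder}/{e.name}" for e in group])
```

With this kind of comparison, files with different names should never end up in the same group, whatever their checksums are.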

[quote]I changed comparison method to filename and size but I still get the same result.
I get the same result each time I click "find" button.[/quote]
Try going up a level from the Duplicate Files results, so you're in coll:// itself, and delete the Duplicate Files collection.

To be sure, fully exit Opus, then open it again and go back to coll:// and verify that the Duplicate Files collection has not come back.

Assuming the collection did not come back by itself, run the duplicate search again (Filename+Size only). Do you get the same results again?

I deleted the collection and exited Opus completely (confirmed by clicking "exit" when asked if I really wanted to exit DO); the collection wasn't there when I opened DO again. I ran the duplicate finder comparing filename and size and still got the same result.

I might need to send you a debug version that outputs some extra information about why those items are being grouped together.

Are you using 32-bit or 64-bit Windows/Opus? (If you want, you can add the info so it appears on the right of each post, by clicking User Control Panel at the top of the page, then the Profile tab. There are drop-downs near the bottom.)

Windows 7 64-bit.
Also, I added info to my profile for future reference.

I might try the debug version you mentioned. Is it a portable or an installable version? Does it make any difference if I install it on a virtual machine (it's also Win7 x64)?

I find this a bit worrying, considering I have had a big clearout of images recently. I didn't notice this issue, but then I didn't check every file, relying largely on MD5. I hope this is just an MPEG issue. Have you tried running the duplicate file finder on images?
Ian

Just a quick test for you (the duplicates were made for the purpose of testing). Filename+size vs MD5 comparison.
Please note that all files except the first one are selected, so I guess you would notice.




I have just tried a search selecting Clear previous results, Search inside sub-folders, Delete mode, Comparison MD5 and got correct duplicates - which reassures me but doesn't help you.


[quote="leo"]I might need to send you a debug version that outputs some extra information about why those items are being grouped together.

Are you using 32-bit or 64-bit Windows/Opus? (If you want, you can add the info so it appears on the right of each post, by clicking User Control Panel at the top of the page, then the Profile tab. There are drop-downs near the bottom.)[/quote]
Are you still going to try that?

Yes, just haven't had a chance to get to it yet.

I think the problem is all in the grouping somehow, which doesn't seem to be happening properly on your machine. Everything in the list is a duplicate of something else in the list; the items just aren't being grouped properly.

I thought it might be a locale issue, but I tried setting my locale (the way file sizes are written, in particular) the same as yours and it still worked, so that was a dead end.

Will get a test version to you when we can, unless we work out the problem before then.

I was just about to ask if you could find some time to look into the matter again (you said you could give me a debug version) but I gave it another chance first. And - to my surprise - it works correctly!

Did you do anything to fix this issue?




There were some changes to how the new Group column works which may have fixed what you were seeing in the earlier version, although it may just be a coincidence. Please keep an eye out for it happening again and, if it does, note any differences between now and then which might be part of the trigger.