Blake3 instead of MD5 in Find Duplicate

Please implement blake3 instead of MD5 (or add it as an option) in the duplicate file finder.

Available with Opus 13.5.1

I would suggest using Column + Size. Otherwise every file will have its checksum calculated, rather than just the files that are the same size as another file.
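
To illustrate why the size check matters, here is a minimal Python sketch of the general idea. It is not how Opus implements it; the function names `file_digest` and `find_duplicates` are made up for illustration. Since BLAKE3 is not in Python's standard library, the sketch takes any algorithm name that `hashlib` knows (for example `md5` or `blake2b`) as a stand-in.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths, algorithm="md5"):
    """Return groups of paths whose size and digest both match."""
    # Step 1: group by size, which only needs file metadata.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size: cannot have a duplicate, so it is never hashed
        # Step 2: hash only the files that share a size with another file.
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[file_digest(path, algorithm)].append(path)
        duplicates.extend(g for g in by_digest.values() if len(g) > 1)
    return duplicates
```

Files with a unique size are skipped before any data is read, so the choice of hash algorithm only affects the (usually much smaller) set of same-size candidates.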

Out of interest, why do you care which hashing algorithm is used to find duplicates? CPU time or something else?

For me, it is the lower collision probability of blake3 compared to MD5, and the computational efficiency is a nice bonus.

I know that MD5 is practically good enough 99.9999...% of the time, but if there's a way to use something that's even theoretically better, I need it, otherwise I can't find my inner peace. I just want to use state-of-the-art technology.

Are you sure? I ran a few tests and blake3 was only faster for files smaller than 250 KB. MD5 processed larger files on SSDs usually twice as fast. The speed was about the same for files on HDDs.

To be honest, I haven't tested it myself. I just saw some comparisons and tests. Like I said, speed is not important to me, just nice to have :slight_smile:

Collisions are only going to be a problem if someone is maliciously trying to make it appear that some of your files have duplicates when they don't. That person would need to know the content of some of your files, manufacture junk files that collide with them, and then trick you into downloading those files, running the duplicate finder on them, and deleting the wrong ones. All highly unlikely.

Doesn't seem there's any real reason to change the algorithm here.

So you're saying that it's impossible for different files to have the same MD5 hash by chance? I mean, if the attacker can create such a collision, it could happen by chance too, right?

It's not impossible but it's very unlikely. And the same thing can happen with any hash algorithm.

MD5 is not a secure algorithm for cryptography. For a quick way to tell if two files are identical it's fine.

Yes, it is very unlikely, and it can happen with other algorithms as well, but blake3 should be more robust in this sense than MD5, right (at least due to the larger digest size)? There are many known collisions for MD5, but none (yet) for blake3.
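
For a rough sense of what the digest size buys you, here is a small Python sketch of the standard birthday bound. It assumes the hash behaves like a uniformly random function, so it only covers accidental collisions and says nothing about deliberately crafted ones. The function name `birthday_bound` and the figure of one million files are just illustrative assumptions.

```python
def birthday_bound(num_files, digest_bits):
    """Upper bound on the chance that any two of num_files distinct files
    share a digest, assuming digests are uniformly random."""
    pairs = num_files * (num_files - 1) // 2  # number of distinct file pairs
    return pairs / 2 ** digest_bits


n = 1_000_000  # hypothetical collection size
print(f"128-bit digest (MD5):            {birthday_bound(n, 128):.1e}")
print(f"256-bit digest (BLAKE3 default): {birthday_bound(n, 256):.1e}")
```

Under these assumptions the bound comes out on the order of 10^-27 for a 128-bit digest and 10^-66 for a 256-bit one. The larger digest is more robust in theory, but both numbers are far below anything you would run into by chance.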