I have been trying to find duplicate files by size on a drive. When the list gets to 10k files or more it gets very slow to respond. Over 100k it takes minutes to respond to mouse clicks.
I worked around it a bit by using the search feature to remove some of the files, but some I want to keep.
If you're finding 100,000 matches based on file size alone, can you really do anything with that list of files? It seems too large to work through in any meaningful way.
My guess is the files aren't really duplicates, and they're just common file sizes for a given data type (e.g. BMP files with the same dimensions will tend to have the same size, as they're uncompressed). Or there are so many files on the drive that lots of them happen to have the same size as at least one other.
A lot of them are small files, yes. I just sort the list by size and then start with the largest ones. If there were a way to filter out files below a certain size, it would help.
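For what it's worth, the "ignore small files, biggest first" idea can be sketched outside the program with a short script. This is only an illustration and not how any particular duplicate finder works internally; the function name `size_groups` and the 1 MiB default for `min_size` are assumptions made up for the example.

```python
import os
from collections import defaultdict

def size_groups(root, min_size=1024 * 1024):
    """Group files under root by size, skipping files smaller than min_size bytes.

    Returns a list of (size, [paths]) tuples, largest size first,
    containing only sizes shared by two or more files.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # do not follow symlinks
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file, skip it
            if size >= min_size:
                by_size[size].append(path)
    # Keep only sizes that occur more than once, then sort largest first.
    groups = [(size, paths) for size, paths in by_size.items() if len(paths) > 1]
    groups.sort(key=lambda group: group[0], reverse=True)
    return groups
```

Calling something like `size_groups("D:/", min_size=50 * 1024 * 1024)` would only list same-size candidates of 50 MiB and up, which should cut the list down a lot if most of the matches are small files.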
I could do MD5 as well, but it would take forever with about 8TB of data.
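On the MD5 cost: a checksum only needs to be computed for files that already share a size with another file, and a cheap first pass over just the start of each file can rule out many of those before anything is read in full. Below is a rough sketch of that idea, assuming you already have a list of same-size paths. The helper names `md5_of` and `confirm_duplicates` and the 64 KiB `head_bytes` default are made up for illustration, and this is not a description of what any specific tool does.

```python
import hashlib
from collections import defaultdict

def md5_of(path, limit=None, block=1024 * 1024):
    """Return the MD5 hex digest of a file, or of its first `limit` bytes."""
    digest = hashlib.md5()
    remaining = limit
    with open(path, "rb") as fh:
        while remaining is None or remaining > 0:
            want = block if remaining is None else min(block, remaining)
            data = fh.read(want)
            if not data:
                break
            digest.update(data)
            if remaining is not None:
                remaining -= len(data)
    return digest.hexdigest()

def confirm_duplicates(paths, head_bytes=64 * 1024):
    """Given paths of equal size, return groups whose full MD5 matches.

    Pass 1 hashes only the first head_bytes of each file.
    Pass 2 hashes whole files, but only those that still collide after pass 1.
    """
    by_head = defaultdict(list)
    for path in paths:
        by_head[md5_of(path, limit=head_bytes)].append(path)

    confirmed = []
    for candidates in by_head.values():
        if len(candidates) < 2:
            continue  # unique after the cheap pass, no full read needed
        by_full = defaultdict(list)
        for path in candidates:
            by_full[md5_of(path)].append(path)
        confirmed.extend(group for group in by_full.values() if len(group) > 1)
    return confirmed
```

How much this saves depends on the data. Files that differ somewhere in their first 64 KiB are eliminated after one small read, but real duplicates still have to be read completely once to be confirmed.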