How to find duplicates that have the same contents and sizes

I downloaded the same PDF files a second time.

  • Filenames are different,
  • sizes are the same,
  • dates are the same (but not the time)
  • comparing contents (e.g. using a tool like Beyond Compare->Content compare) shows no differences

Somehow the hashes are different, and for that reason I don't see an easy way to search for duplicates.
The only way is to sort by date + size and then manually delete the duplicates, one by one.

In the example below, the first group of digits (10) is the document number from the source server.


Any suggestions as to how to cope with this?

Thanks

Unless you've been downloading from Google's SHA1 demo server, I don't see how the hashes can be different but the contents the same.

I have no explanation either and was hoping that someone had.
Point is, the hashes are different.

Have taken 2 'identical' files as an example.
screenshot shows properties

size in bytes = same
Beyond Compare - comparing file contents - 162 lines - no differences
That is to be expected, as it was a duplicate download; I only gave the downloads different names.
Hm... while writing this, I am unsure whether a hash calculation takes the 'created' date/time stamp into consideration.
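
On the timestamp question: a content hash is calculated over the bytes stored in the file, while date/time stamps are filesystem metadata kept outside those bytes. A minimal Python sketch to check this, where the function names and sample bytes are made up for illustration and MD5 is only an assumption about which hash is in use:

```python
import hashlib
import os
import tempfile


def file_md5(path):
    """Hash only the bytes stored in the file; no timestamps are read."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_before_and_after_touch(data=b"%PDF-1.4 sample bytes"):
    """Write data to a temp file, hash it, change its timestamps, hash again."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        before = file_md5(path)
        # Set the accessed and modified times to an arbitrary moment in 2001.
        # (os.utime does not touch the Windows 'created' stamp, but that stamp
        # is stored outside the file data in the same way.)
        os.utime(path, (1_000_000_000, 1_000_000_000))
        after = file_md5(path)
    finally:
        os.remove(path)
    return before, after


if __name__ == "__main__":
    before, after = hash_before_and_after_touch()
    print(before == after)
```

If the two values match, the differing hashes of the downloads must come from differing bytes inside the files rather than from their dates.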

Beyond Compare is probably in a mode where it only compares the basic (ASCII) text of the two PDF files and not any of the formatting, metadata, tags, etc.

If the file is generated on-the-fly by the server you downloaded it from, there might be a "date created" tag or something which is always the same size but different each time you download it.
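
To see whether something like that is going on, one option is to list the byte offsets at which two same-size files differ. This is a rough sketch only; `differing_ranges` is a hypothetical helper name, not part of any tool mentioned here:

```python
def differing_ranges(path_a, path_b, chunk_size=65536):
    """Return (start, end) byte ranges, end exclusive, where two files differ.

    Only the common length is compared; for same-size files that is everything.
    """
    ranges = []
    start = None  # start offset of the differing run currently open, if any
    offset = 0    # file offset of the first byte of the current chunk
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            a = file_a.read(chunk_size)
            b = file_b.read(chunk_size)
            n = min(len(a), len(b))
            if n == 0:
                break
            for i in range(n):
                if a[i] != b[i]:
                    if start is None:
                        start = offset + i  # a new differing run begins here
                elif start is not None:
                    ranges.append((start, offset + i))  # run ended at a match
                    start = None
            offset += n
    if start is not None:
        ranges.append((start, offset))  # run continued to the end of the data
    return ranges
```

A handful of short ranges would be consistent with a small per-download field; opening one of the files in a hex viewer at those offsets should show what it is.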

The binary file contents will not be identical if the hashes are different. The basic text when converted to ASCII might be the same, but we don't have a way to match that in our Duplicate Finder. You might be able to do it with a script and something that can extract the text, but I don't think it would be worth the amount of time that would take to create, unless these extra files you've downloaded are causing you serious problems and are difficult to find by sorting by size.
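
For what it's worth, a script along those lines might look like the sketch below. It does not extract the text. Instead it swaps in a simpler idea: group the PDFs by size, blank out the fields assumed to change per download, and hash what is left. The two patterns (PDF date strings and the trailer /ID array) are guesses about what differs, and all names are hypothetical. If the changing bytes sit inside a compressed stream, this will not find them.

```python
import hashlib
import os
import re
from collections import defaultdict

# Assumption: only PDF date strings such as D:20240131120000 and the
# trailer /ID array change between downloads. Both patterns are guesses.
DATE_RE = re.compile(rb"D:\d{14}")
ID_RE = re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]")


def normalized_digest(path):
    """Hash the file bytes after blanking the assumed per-download fields."""
    with open(path, "rb") as handle:
        data = handle.read()
    data = DATE_RE.sub(b"D:00000000000000", data)
    data = ID_RE.sub(b"/ID[]", data)
    return hashlib.sha256(data).hexdigest()


def find_probable_duplicates(folder):
    """Return lists of same-size PDFs whose normalized digests match."""
    by_size = defaultdict(list)
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and name.lower().endswith(".pdf"):
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for _size, paths in sorted(by_size.items()):
        if len(paths) < 2:
            continue  # a unique size cannot have a same-size duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[normalized_digest(path)].append(path)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```

Each returned group is only a list of candidates; it would be wise to check one pair by hand before deleting anything.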

Thanks Leo.

Well, so be it. No alternative but to delete them manually.

BTW I also used Acrobat to have the documents compared: no differences.
Within Acrobat, File->Properties, to show the metadata: no differences

I downloaded the same file again, just now.
Indeed, a URL is opened showing https blabla... 'GetPDF...', plus a popup to save the file.
When comparing the new one with the existing 2, the hash is also different.

Oh, note that even after setting the creation date-time of all 3 files to the same value,
and copying that to the modified date, the hashes are still different.
