Define "identical"

When moving files, if I choose Skip Identical, it uses a different definition of "identical" than the one I have in my head.
I would like it to skip files with the same hash. It seems to skip based on attributes instead, perhaps file creation date, last modified date, or something else. For me, I don't care if the attributes are different; if the data contained is the same, then I don't need it.

(If it matters, the files seem to have identical names and file sizes, but are not considered identical by whatever makes those decisions.)

Is there a way to define what identical means when moving files?

It's defined in the manual:

There's very little point comparing the file contents when copying files. It would take just as long (or longer) to read the contents of both files as it would to simply overwrite it.

As a professional photo organizer, I do a lot of duplicate removal. Yes, there are more considerations when de-duping photos compared to other types of files.
If there are several criteria you need to compare, and you can't make mistakes, I recommend Duplicate Cleaner Pro by Digital Volcano. It has a few different modes depending on what kind of files you are working with.

That isn't what I asked. My question was, "Is there a way to define what identical means when moving files?" Unless you are saying that I can change the manual and that will change how my installation works? I'm not sure how I would do that. Your attachment looks like it is from a website.

I'm not copying files; I am moving them.

I don't think we are talking about the same thing. Is there any additional information I can provide to clarify what I want to do?

Move/copy, it's the same thing. If you have to read both files to find out if they're the same or not, you may as well just replace the old ones.

One deletes the source; the other doesn't.

Are you saying that if I choose "replace" instead of "skip" it will do a hash check comparison? Or are you saying that my files aren't important enough for me to make the effort?

Unless your solution verifies the data is the same, it seems like I might be better off using "rename new" and then de-duping them afterwards.

Say it does a hash check:

  • The files are different. What do you do? You'll move the source and replace the destination. You end up with one copy of the new file.
  • The files are the same. What do you do? You delete the original without moving it. You end up with one copy of the new file.

Now say it doesn't do a hash check, and you simply say "yes" to overwrite the original file. You end up with one copy of the new file.

The outcome is the same. There's no point doing the hash check if the eventual outcome is that you always end up keeping the new file; you may as well just overwrite in the first instance.
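
If it helps to see the same argument spelled out, here is a rough sketch in Python of what a hash-checked move could look like (the helper and the paths are hypothetical; this is not how Opus implements it internally):

```python
import hashlib
import os
import shutil


def file_hash(path, chunk_size=1 << 20):
    """Hash a file's contents in 1 MB chunks (the whole file gets read)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def move_with_hash_check(src, dst):
    """Move src onto dst, skipping the copy when the contents already match."""
    if os.path.exists(dst) and file_hash(src) == file_hash(dst):
        # Contents identical: drop the source, keep the existing destination.
        os.remove(src)
    else:
        # Contents differ (or there is no destination yet): replace it.
        if os.path.exists(dst):
            os.remove(dst)
        shutil.move(src, dst)
```

Whichever branch runs, you finish with a single file at the destination holding the new data, which is exactly where a plain overwrite lands you without reading either file first.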

I keep them both, then copy the parent folder over instead.

Well, thanks. It seems like the answer to my question is no.

That's not what you originally said:

My answer was based on that. Anyway, to be explicit, no, you can't define what identical means.


I do appreciate you taking the time to help me out. I see that the answer is no. I suspected that, but it is good to have official confirmation.

The following is just for clarification. No need to respond unless you want to clarify something.

I fully admit that I have no idea what a hash check would actually even check. Maybe metadata like the creation date and modified date affect the hash, and so what I was asking for was pointless. If that is how it works, then I could see why there would be confusion.

Correct. If the actual data contained in the file is exactly the same, then I don't need two copies. I believe my answer was consistent with my original statement, since in your first example you said, "The files are different." Because you state they are different in that example, I keep both. In your second example you say the files are the same, and I didn't disagree with that one.

Generating a hash requires reading the entire file and performing a CPU-intensive algorithm on all of the data, for both the source and destination files. It is not a trivial operation in terms of time.
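
As a rough illustration (a Python sketch with hypothetical file names): answering "are these two identical?" by hash means streaming every byte of both files through the digest before anything is moved.

```python
import hashlib
import time


def sha256_of(path, chunk_size=1 << 20):
    """Stream the whole file through SHA-256; every byte has to be read."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


start = time.perf_counter()
# Both the source and the destination must be read in full just to decide.
identical = sha256_of("source.bin") == sha256_of("destination.bin")
print(f"identical: {identical} (decided in {time.perf_counter() - start:.1f}s)")
```

For a pair of large files, that is roughly as much I/O as the copy itself would have involved.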

Thanks for the reply. I searched, and it seems that a hash check would not be affected by the file attributes, since the hash is created from the binary data.
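
For example, a quick test along the lines of what I read (a Python sketch with made-up file names) shows that changing a file's timestamps doesn't change its hash, because only the bytes go into the digest:

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Hash only the file's bytes; attributes never enter the digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


shutil.copyfile("photo.jpg", "photo_copy.jpg")  # copy the contents only
os.utime("photo_copy.jpg", (0, 0))              # give the copy 1970 timestamps

# The hashes still match, even though the attributes now differ.
print(sha256_of("photo.jpg") == sha256_of("photo_copy.jpg"))  # True
```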

You are correct; the hash option in Opus for finding duplicates seems slower than other duplicate-finding software I use. I think I have used the feature in Opus twice. Speed isn't usually a problem for me, though, as I can just let it run in the background.