Comparing dates within filenames. Remove duplicates

I'm hoping that Dopus is up to this rather challenging task:

I have two sets of video files that differ in file size and resolution. When you view the videos, however, they show the same footage at different resolutions.

Example:
Some_cool_clip_19.02.17_sd.mp4
Some_cool_clip_19.02.17.mp4

How can I use the "Duplicate files" dialog to eliminate one of them?

This obviously has to work from the filenames alone. MD5 hashes aren't going to work.

This is very similar and may work for you:

Thank you, Leo.
The problem with this example is that it relies on a hardcoded word, "COMPLETE", in all circumstances and only at a fixed location.

In my situation, the dates in the filenames are what matter for determining a duplicate.

How can I compare the dates within the filenames between folders for a duplicate condition?

In your example, the two filenames are identical except for the "sd" on the end. Changing the script to look for "sd" rather than " [complete]" should match them, I think.

Forgive me, my example is in need of clarification.

Here's a more realistic example:

Some.cool.clip.19.02.17.sd.mp4
funvideo.19.02.17.wmv

The key problem is that I know these are the same "series" of clips (because of the names of the folders they are held in), and ONLY the dates in the filenames are reliable. Not even the size, creation dates or modified dates in the metadata are the same.

Is it possible to find these dupes?

It'd be possible with some extra scripting.

Do you care about the filenames staying as they are? If not, it'd be easier to rename everything so only the dates remain, at which point the duplicates will stand out.

How about copying the dates to the beginning of the filenames and then sorting by name? After deleting all the dupes, the dates could easily be removed again.
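For anyone who wants to see what that prefix idea amounts to outside of Opus, here is a minimal Python sketch. It only computes the new names and does not touch any files. The helper names `with_date_prefix` and `without_date_prefix` and the `_` separator are assumptions made up for illustration, and it takes the first date-like token in each name.

```python
import re

# A dd.dd.dd token that is not part of a longer run of digits.
DATE_RE = re.compile(r"(?<!\d)(\d\d\.\d\d\.\d\d)(?!\d)")

def with_date_prefix(filename, sep="_"):
    """Copy the first date-like token to the front of the name (hypothetical helper)."""
    m = DATE_RE.search(filename)
    if m is None:
        return filename  # no date found: leave the name alone
    return f"{m.group(1)}{sep}{filename}"

def without_date_prefix(filename, sep="_"):
    """Strip a leading dd.dd.dd + separator, undoing with_date_prefix."""
    m = re.match(r"\d\d\.\d\d\.\d\d" + re.escape(sep), filename)
    return filename[m.end():] if m else filename

names = [
    "funvideo.19.02.17.wmv",
    "Some.cool.clip.19.02.17.sd.mp4",
]
# Sorting the prefixed names puts files with the same date next to each other.
prefixed = sorted(with_date_prefix(n) for n in names)
```

One caveat: `without_date_prefix` would also strip a date that was already at the start of an original name and followed by the separator, so the round trip is only safe for names that did not start that way.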

I do care about the name of the file. For example, it could be:

Some.cool.clip.19.02.17.happyinformation.sd.mp4
funvideo.19.02.17.interestingdetails.wmv

The problem is that these files are named manually elsewhere, so nothing about the names is predictable other than the dates. Sometimes, though, the extra information is useful for other, unrelated reasons.

Moving the dates to the beginning can be problematic because this is a very involved workflow which involves a SQL server cataloging all of the files. I'm OK with moving the dates in one of the directories (the files yet to be cataloged).

If the information in the filenames is useful, won't you lose half of it when you delete the second copy of any pair of files?

Are there any filenames where the date part will be ambiguous? e.g. ones with other two-digit numbers and dots next to the date, or multiple dates (or things which look like dates)? I think doing this will require scripting, and writing a good script will need a lot more example filenames.
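As a rough illustration of that ambiguity concern, here is a small Python sketch that lists every date-like token in a name and flags names with zero or several candidates for manual review. The function names are hypothetical, and the sketch does not try to decide which part is the day, month or year, since the thread never states the date format.

```python
import re

# The lookahead makes the match zero-width, so overlapping candidates such as
# "03.19.02" and "19.02.17" inside "03.19.02.17" are both reported.
CANDIDATE_RE = re.compile(r"(?<!\d)(?=(\d\d\.\d\d\.\d\d)(?!\d))")

def date_candidates(filename):
    """Return every dd.dd.dd token found in the name, overlaps included."""
    return [m.group(1) for m in CANDIDATE_RE.finditer(filename)]

def needs_review(filename):
    """True when the name has no date-like token or more than one."""
    return len(date_candidates(filename)) != 1

for name in ["funvideo.19.02.17.wmv", "clip.03.19.02.17.wmv", "nodate.mp4"]:
    print(name, date_candidates(name), needs_review(name))
```

Only the first name in the loop comes from this thread; the other two are made-up examples of the problem cases.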

You could use the Regexp column to extract the date into its own column. Then group by that column.
Any group with more than one item is a dupe (based on your criteria).

I don't know if you can sort by group count or hide any groups with only one item.

It would look like this:
[image]

Using this config
Regexp is .*(\d\d\.\d\d\.\d\d).*
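To show what that pattern captures and how the "any group with more than one item is a dupe" rule plays out, here is a minimal Python sketch of the same idea outside of Opus. The function names are assumptions for illustration, and the third filename in the sample list is made up. Note that the greedy leading `.*` means the last date-like token in a name is the one captured.

```python
import re
from collections import defaultdict

# Same pattern as the Regexp column above.
DATE_RE = re.compile(r".*(\d\d\.\d\d\.\d\d).*")

def extract_date(filename):
    """Return the dd.dd.dd token captured by the pattern, or None."""
    m = DATE_RE.match(filename)
    return m.group(1) if m else None

def group_by_date(filenames):
    """Map each extracted date to the list of filenames that contain it."""
    groups = defaultdict(list)
    for name in filenames:
        date = extract_date(name)
        if date is not None:
            groups[date].append(name)
    return dict(groups)

def duplicate_groups(filenames):
    """Keep only the groups with more than one file."""
    return {d: n for d, n in group_by_date(filenames).items() if len(n) > 1}

names = [
    "Some.cool.clip.19.02.17.happyinformation.sd.mp4",
    "funvideo.19.02.17.interestingdetails.wmv",
    "funvideo.20.02.17.wmv",  # hypothetical file with no partner
]
dupes = duplicate_groups(names)
```

Filtering to groups of two or more is the part done here in code; whether Opus itself can hide single-item groups is a separate question.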

Leo, yes, I could lose some good (but not critical) information, but if it could be left in place that would be better. For example, if one of the files had good details and the other didn't, I'd likely choose the one with more details to keep.

Wowbagger, interesting idea. Can those custom columns then just be used in the Duplicate Files dialog? I'm going to try this.

I tried using Wowbagger's tool and it was pretty useful when the files were in the same directory.

The problem is that they are in separate directories and it doesn't appear that I can customize the file comparison criteria to include custom columns.

So my search continues on how to solve this problem. It would be a major time saver for me if I could get this to work.

Any further suggestions?

Can you use Find or Flat View (Mixed) so all the files are listed together? Then turn the column on and sort or group by it.

Flat View is good if they have the same parent. If the files don't have a common parent folder, you could use Find to search for all files (*.*), adding each of the needed folders to the search. Then add the custom column to the Find Results lister.
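The "list everything together, then group by the date column" approach can also be sketched as a standalone Python script for folders that share no parent. This is only an illustration under assumptions: `find_date_groups` is a hypothetical name, and the demo uses temporary folders and empty files rather than real videos.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Same pattern as the Regexp column used earlier in the thread.
DATE_RE = re.compile(r".*(\d\d\.\d\d\.\d\d).*")

def find_date_groups(roots):
    """Walk every root folder and group files by the date in their name.

    Only groups with more than one file are returned.
    """
    groups = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            m = DATE_RE.match(path.name)  # match on the name, not the full path
            if m:
                groups[m.group(1)].append(path)
    return {d: sorted(p) for d, p in groups.items() if len(p) > 1}

def demo():
    """Build two unrelated temp folders and report the shared dates."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        (Path(a) / "Some.cool.clip.19.02.17.sd.mp4").touch()
        (Path(b) / "funvideo.19.02.17.wmv").touch()
        (Path(b) / "funvideo.20.02.17.wmv").touch()  # made-up file, no partner
        groups = find_date_groups([a, b])
        return {d: sorted(p.name for p in paths) for d, paths in groups.items()}

if __name__ == "__main__":
    print(demo())
```

The script only reports the groups; deciding which copy to keep (for example the one with more detail in its name) is left to the person reviewing them.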

WOW this is working amazingly well!!!! This is some tool!

It's doing exactly what I need and is going to save me a ton of time.

Thank you so much Wowbagger!

And thanks to Dopus for making such a powerful tool that can be extended in all kinds of cool ways!
