Comparing dates within filenames. Remove duplicates

jimerb · February 17, 2019, 11:38pm

I'm hoping that Dopus is up to this rather challenging task:

I have two sets of video files that differ in file size and resolution. However, when you view the videos, they are the same footage at different resolutions.

Example:
Some_cool_clip_19.02.17_sd.mp4
Some_cool_clip_19.02.17.mp4

How can I use the "Duplicate files" dialog to eliminate one of them?

This obviously has to only look at the filenames and figure it out. MD5 hashes aren't going to work.

Leo · February 17, 2019, 11:50pm

This is very similar and may work for you:

jimerb · February 18, 2019, 12:06pm

Thank you Leo.
The problem with this example is that it uses a hardcoded word, COMPLETE, in all circumstances and only at a fixed location.

In my situation, the dates in the filenames are what determine a duplicate.

How can I compare the dates within the filenames between folders for a duplicate condition?

Leo · February 18, 2019, 2:28pm

In your example, the two filenames are identical except for the "sd" on the end. Changing the script to look for "sd" rather than " [complete]" should match them, I think.

jimerb · February 18, 2019, 4:39pm

Forgive me, my example is in need of clarification.

Here's a more realistic example:

Some.cool.clip.19.02.17.sd.mp4
funvideo.19.02.17.wmv

The key problem is that I know that these are the same "series" of clips (because of the folder-names they are held in) and it's ONLY the dates in the filename that are reliable. Not even the size, creation dates or modified dates in the metadata are the same.

Is it possible to find these dupes?

Leo · February 18, 2019, 6:09pm

It'd be possible with some extra scripting.

Do you care about the filenames staying as they are? If not, it'd be easier to rename everything so only the dates remain, at which point the duplicates will stand out.

lxp · February 18, 2019, 6:54pm

How about copying the dates to the beginning of the filename and then sort by name? After deleting all dupes, the date could be easily removed.
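A rough Python sketch of the prefix idea, working on names only and outside Opus itself. It assumes the date always appears as a dotted `dd.dd.dd` token, as in the examples in this thread; the `__` separator and the function names are made up for illustration.

```python
import re

# Assumption: dates look like 19.02.17 and are not part of a longer digit run.
DATE_RE = re.compile(r"(?<!\d)\d\d\.\d\d\.\d\d(?!\d)")
SEP = "__"  # hypothetical separator between the copied date and the original name


def add_date_prefix(filename):
    """Copy the first date token to the front; leave undated names unchanged."""
    m = DATE_RE.search(filename)
    if m is None:
        return filename
    return f"{m.group(0)}{SEP}{filename}"


def strip_date_prefix(filename):
    """Undo add_date_prefix by removing a leading 'dd.dd.dd__' if present."""
    prefix = r"^\d\d\.\d\d\.\d\d" + re.escape(SEP)
    return re.sub(prefix, "", filename, count=1)
```

Sorting the prefixed names puts files with the same date next to each other, and `strip_date_prefix` restores the original name afterwards. Note that calling `add_date_prefix` twice on the same name would add a second prefix.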

jimerb · February 18, 2019, 11:26pm

I do care about the name of the file. For example it could be:

Some.cool.clip.19.02.17.happyinformation.sd.mp4
funvideo.19.02.17.interestingdetails.wmv

The problem is that these files are named manually elsewhere, so nothing about the names is predictable other than the dates. Sometimes, though, the extra information is useful for other unrelated reasons.

Moving the dates to the beginning can be problematic because this is a very involved workflow which involves a SQL server cataloging all of the files. I'm ok with one of the directories moving the dates (files yet to be cataloged.)

Leo · February 18, 2019, 11:59pm

If the information in the filenames is useful, won't you lose half of it when you delete the second copy of any pair of files?

Are there any filenames where the date part will be ambiguous? e.g. Ones with other two digit numbers and dots next to the date, or multiple dates (or things which look like dates)? I think to do this will require scripting, and to write a good script will need a lot more example filenames.
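One way to find out how many names are ambiguous before writing a real script is to list every date-like token in each name. The Python sketch below is only an illustration: it assumes the dates are `YY.MM.DD` (a guess based on the `19.02.17` example) and uses a simple month/day range check to discard implausible candidates.

```python
import re

# Zero-width match at every position where a dd.dd.dd token starts, so that
# overlapping candidates such as 01.02.03 and 02.03.04 are both reported.
CANDIDATE_RE = re.compile(r"(?<!\d)(?=(\d\d\.\d\d\.\d\d)(?!\d))")


def date_candidates(filename):
    """Return every dd.dd.dd token in the name, including overlapping ones."""
    return CANDIDATE_RE.findall(filename)


def is_plausible(token):
    """Assumed YY.MM.DD order: accept months 1-12 and days 1-31."""
    _yy, mm, dd = (int(part) for part in token.split("."))
    return 1 <= mm <= 12 and 1 <= dd <= 31


def classify(filename):
    """Return 'none', 'unique', or 'ambiguous' for a filename."""
    plausible = [t for t in date_candidates(filename) if is_plausible(t)]
    if not plausible:
        return "none"
    return "unique" if len(plausible) == 1 else "ambiguous"
```

Names classified as `ambiguous` or `none` are the ones that would need manual review or more example filenames.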

wowbagger · February 19, 2019, 12:05am

You could use the Regexp column to extract the date into its own column. Then group by that column.
Any group with more than one item is a dupe (based on your criteria).

I don't know if you can sort by group count or hide any groups with only one item.

The column uses this config:
Regexp is .*(\d\d\.\d\d\.\d\d).*
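To see what that regexp does away from the column setup, here is a small Python sketch that applies the same pattern to a list of names and groups them by the captured date. It is not how Opus implements the column; the function names are invented for the example.

```python
import re
from collections import defaultdict

# Same pattern as the Regexp column. Because the leading .* is greedy, the
# LAST date-like run in the name is the one that ends up in the capture group.
DATE_RE = re.compile(r".*(\d\d\.\d\d\.\d\d).*")


def extract_date(filename):
    """Return the captured dd.dd.dd token, or None if the name has no date."""
    m = DATE_RE.fullmatch(filename)
    return m.group(1) if m else None


def duplicate_groups(filenames):
    """Group names by extracted date and keep groups with more than one item."""
    groups = defaultdict(list)
    for name in filenames:
        date = extract_date(name)
        if date is not None:
            groups[date].append(name)
    return {date: names for date, names in groups.items() if len(names) > 1}
```

Any date that maps to two or more names corresponds to a group with more than one item in the Lister.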

jimerb · February 19, 2019, 1:53pm

Leo, yes, I could lose some good (but not critical) information, but if it could be left in place that would be better. For example, if one of the files had good details and the other didn't, I'd likely choose the one with more details to keep.

Wowbagger, interesting idea. Can those custom columns then be used in the Duplicate Files dialog? I'm going to try this.

jimerb · February 23, 2019, 4:53am

I tried using Wowbagger's tool and it was pretty useful when the files were in the same directory.

The problem is that they are in separate directories and it doesn't appear that I can customize the file comparison criteria to include custom columns.

So my search continues on how to solve this problem. It would be a major time saver for me if I could get this to work.

Any further suggestions?

Leo · February 23, 2019, 8:01am

Can you use Find or Flat View (Mixed) so all the files are listed together? Then turn the column on and sort or group by it.

wowbagger · February 23, 2019, 11:53am

Flat view is good if they have the same parent. If the files don't have a common parent folder, you could use the search to find all files (*.*), adding the needed folders to the search. Then add the custom column to the Find Results Lister.
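For comparison, the same "search several folders, then group by date" approach can be sketched as a standalone Python script. This is only an illustration of the idea, not an Opus feature; it assumes the `dd.dd.dd` date format from the examples above, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: one dotted dd.dd.dd date per filename, not inside a longer number.
DATE_RE = re.compile(r"(?<!\d)\d\d\.\d\d\.\d\d(?!\d)")


def dated_files(roots):
    """Walk every root folder recursively and group file paths by date token."""
    groups = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            m = DATE_RE.search(path.name)
            if m:
                groups[m.group(0)].append(path)
    return groups


def cross_folder_dupes(roots):
    """Keep only the dates that appear on more than one file across all roots."""
    return {
        date: sorted(paths)
        for date, paths in dated_files(roots).items()
        if len(paths) > 1
    }
```

The folders passed in `roots` do not need a common parent, which mirrors adding several folders to the search.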

jimerb · February 23, 2019, 5:07pm

WOW this is working amazingly well!!!! This is some tool!

It's doing exactly what i need and is going to save me a ton of time.

Thank you so much Wowbagger!

And thanks to Dopus for making such a powerful tool that can be extended in all kinds of cool ways!