Use Duplicate Files tool to find SIMILAR files

I was wondering if anyone knew of a way to use the Duplicate Files tool to do more sophisticated comparisons of files, in order to find files that have similar names but are not exactly the same.

For example, I have a large library of videos and music, and sometimes I get several copies of the same music video or song. They are indeed different copies (different encodes, different sources, etc.), so the MD5 hash won't match, but I want to be able to see all of them. Usually the filename will be a little different because of its track number, the codec or container extension, or maybe some spelling mistake or formatting inconsistency.

A while back I made a fairly complex Perl script that could do stuff like this by parsing the filenames to get the artist part and the title part, comparing all the titles and all the artists, and then ranking them using Levenshtein distance (where each character that has to be inserted, deleted, or changed counts as one point, IIRC). I could then specify a threshold for how different the names and titles could be for them to show as similar.

Obviously that's way too complex for Directory Opus (I reckon), but could the Levenshtein distance part be done at least? Like, if the names differ by five characters or fewer, they show as similar? This would be extremely useful.
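For anyone who wants to try the idea outside Opus, here is a rough Python sketch of the "names within N edits" comparison described above. It is not the original Perl script and not an Opus feature. The function names, the choice to ignore the extension and letter case, and the default threshold of 5 are all assumptions made for illustration.

```python
from itertools import combinations
from pathlib import PurePath


def levenshtein(a: str, b: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similar_pairs(filenames, threshold=5):
    """Return (name_a, name_b, distance) for every pair of names whose
    stems (name without extension, lower-cased) are within threshold edits.
    Assumption: extension and case are ignored; adjust to taste."""
    stems = [(name, PurePath(name).stem.lower()) for name in filenames]
    result = []
    for (name_a, stem_a), (name_b, stem_b) in combinations(stems, 2):
        distance = levenshtein(stem_a, stem_b)
        if distance <= threshold:
            result.append((name_a, name_b, distance))
    return result
```

With this sketch, "Artist - Song.mp3" and "01 Artist - Song.flac" come out at distance 3 (the added "01 " prefix), so they would count as similar at a threshold of 5. Note that it compares every pair of names, so it gets slow on a very large library.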

I'd appreciate it if anyone could think of a way to do this (or the developers could add in a feature for this!).

FYI, here's the Wikipedia entry on Levenshtein distance:
en.wikipedia.org/wiki/Levenshtein_distance

There's currently no way to do this directly in Opus. You could make an Opus button which runs your Perl script though.

I should just redo the Perl script for this application... I could reuse a lot of the program's algorithm for this specific purpose. I just like the interface of Directory Opus a lot more for being able to see the information about the files and delete them, load them into a player, etc.

I thought it would be handy if there were a way to do this with DOpus. Is there some kind of scripting engine within DOpus for running custom programs and functions like this? That would be pretty neat too... I really like, for example, the very flexible rename scripting you can do using regular expressions.

I've been learning a little about Linux shell scripting lately too... like how you can use find with a bunch of parameters and then pass the resulting file list to xargs to have it run a command on each file. It could be handy if you could do something like that with an external program and DOpus... like... have my Perl script determine that five files are similar and then open up a lister window showing those five files. Is something like this perhaps possible? Maybe this would be an elaborate application for the find command.
Maybe... it would be possible to mark the files in some way so that the find command could uniquely identify them, and then you could run a search for those files. I don't think I'd want to change the filename though... Maybe something strange like setting the creation date of each match to a different year, and then I could sort them all by creation date?
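On the "determine that five files are similar" part: once a script has a list of similar pairs, they still need to be merged into groups before anything can be shown together. Here is a minimal Python sketch of that step using a union-find structure. The function name and the input format (a list of two-item pairs of file names) are assumptions, and it deliberately does not cover getting the groups into a lister.

```python
def group_similar(pairs):
    """Merge pairs of similar files into groups (connected components)."""
    parent = {}

    def find(x):
        # Return the representative of x's group, registering x if new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # join the two groups

    groups = {}
    for name in parent:
        groups.setdefault(find(name), []).append(name)
    return [sorted(members) for members in groups.values()]
```

One thing to be aware of with this approach: grouping is transitive, so if A is similar to B and B is similar to C, all three land in one group even when A and C on their own are further apart than the threshold.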

There may be a way for the Perl script to add whatever files it wants to a collection, which would be a good way to do this. I had a quick play and couldn't find a method which worked.

I'll have another look when I'm not feeling ill and if there isn't a way I'll send a feature request to GPSoft for some simple clear-collection and add-filepath-to-collection commands.

Sorry to hear that you're under the weather, nudel.

Being able to add files to a collection with a Perl script or some such thing would be very handy though... that could possibly do what I need.

I wonder if it would be possible to somehow group things the way the duplicate file tool does...?

Jon posted some info that should be helpful:

Is DOpus right for me?:: user-generated collection xml