AI Based Find Duplicate Files

I previously suggested adding a feature for Intelligent Renaming Using AI.

Here, similarly, I'd like to recommend AI Based De-Duplication.

Often times files like videos can be the same but in a different file size or codec that provides no traditional way to identify a duplicate. The filename usually leaves a clue, but only for a human.

For example, take this set of files:

  • Law and Order S01E01 - Pilot 720p.mp4
  • LawAndOrder season 1 episode 1 1080p.avi
  • Law andOrder - s1 EP1 pilot 2160p.mpg
  • S01E01.Law.And.Order.1999.Pilot.1080p.h265.mkv

You as a human can tell at a glance that these are all the same thing, but good luck trying to systemically remove duplicates with any kind of traditional check.

This is where AI can come in. With new options in the Find Duplicates dialog box, you could store and use instructions that link out to a chatgpt in much the same manner I referenced in my post about Intelligent Renames (linked above.)

Directory Opus 14 could reach out to Chatgpt with a set of filenames and use pre-crafted instructions to find duplicates. Then ChatGPT would return a list of the duplicates. Directory opus would take care of the mechanics of presenting to the user the duplicates and removal but ChatGPT would do the intelligent analysis to find the dups in a situation where md5 hashing, or filesizes are of no use.

This is where I'm suggesting a new feature:

It would need to store and pass instructions like this one for TV Series: "I'm going to give you a list of tv shows and I want you to analyze the names and return to me only the sets of files that appear to be duplicate episodes. You should return the full-path and filename and you should include a unique identifier number to group the duplicates together. You should always return them in order of grouped file identifier."

In Practice, the user would be able to

  1. build up a list of directories they want to check for dups on the left.
  2. pick "AI Duplicate Check" and choose your favorite stored instructions. Then hit FIND. (providing a default would be good.)
  3. Directory Opus would reach out to ChatGPT via their API and hand it the list of names with the instructions.
  4. ChatGPT would return only the list of duplicates with a group identifier so DOPUS would no how to present the results.
  5. User confirms that they are indeed dups.
  6. User deletes the dups as desired using the traditional interface.

Noting like this exists in any file manager I'm aware of today. I hope you will consider it. Could be a major feature for the next release.

The feature you really want is the ability for DO to query AI and then parse the output into variables or directly used as function code to execute a command. It would enable everything you've requested so far and pretty much anything else you can think of AI related.

I'm not sure utilizing online AI services is the best approach though, there are a lot of AI services out there. ChatGPT is one service, presumably if DO were to support querying online AI it would have to support more than just ChatGPT, a lot more and that's a ton of work. IMO it'd make more sense if DOPUS just used a windows built-in API (or a Co-Pilot plugin) to access whatever the OS AI model happens to be at the time or it could have it's own internal model. Both of those provided far more consistency then relying on online models that may not exist in a few years and avoids having to deal with programing dozens of APIs which would be a ridiculous time sink upfront and over time. Heck not even window's copilot is guaranteed at this point, it could completely changed just due to how new the technology is.

1 Like

I support the general concept of AI being utilized in some form in a file manager. I believe it is a likely future with, currently, an unclear time frame. In this context, something to keep in mind.

1 Like




(set a description and tags from a given image)

Playing around with this over the last weekend, it seems quite feasible with current tools. Given the options available in today's AI market, I imagine it will become even more accessible over time.

The only inconvenience I've encountered, and one that's quite challenging to overcome, is the lack of a decent tool that provides an easy way to retrieve/send information from the internet. JScript is not suitable for that purpose, since asynchronous sending is almost impossible (or is rather inconsistent, to say the least), which presents a real challenge when it comes to tasks like these. Consequently, there isn't a proper way to abort an ongoing connection, among other issues. This leads to relying on various workarounds, which can be discouraging to use, especially for beginners.

In my humble opinion, a better addition would be to have a native way to get/send information to/from a URL via scripting. Something I'd love to see is a native DO approach for this purpose. Perhaps in the not-too-distant future?

1 Like