Find Duplicates but only for partial file names (command)

Hi guys,

Just after a little assistance with the "Find DUPES" command syntax.

I would like to be able to identify duplicate files, but based on only a partial file name. I am a draftsman and we name our drawing files with a revision suffix, so if possible, I would like to be able to find files with duplicate names, but ignoring the revision suffix.

For example, our file naming convention is 1102-FL1001_A (where "_A" represents the revision). Is it possible to find duplicate files based on only part of the file name?

I've had a look at the wildcard & regular expression filtering in the 'Find Duplicate Files' tool, but I really don't know where to begin. If possible, I would really like to create a toolbar command/button for this exercise.

Thanks a lot for any help.

Do you really want to find files that are (binary) identical, and not just files that share the same basename?

If yes, then I guess you could use the DUPE-find and filter search result afterwards.
If no, then you might even get along with the regular search I assume.

Hi tbone,

Thanks for the reply.

The files will be different. They will just share the same filename (aside from the revision suffix).

Do you want to find all the duplicates of a particular file, or do you want to find all groups of duplicates at once?

(Where "duplicates" means "files that have similar name prefixes as each other".)

Hi Leo,

Thanks for the reply. I would like to find all groups of duplicates within the current folder at once.

For example, if I have a folder that contains the following drawings/files:

1095-A1001_0.dwg
1095-A1006_0.dwg
1095-A1006_1.dwg
1095-A1006_2.dwg
1095-A1008_0.dwg
1095-A1009_0.dwg
1095-A1009_1.dwg
1095-A1010_0.dwg
1095-A1011_0.dwg
1095-A1011_1.dwg
1095-A1017_0.dwg

I would like to create a command that will isolate or select:

1095-A1006_0.dwg
1095-A1006_1.dwg
1095-A1006_2.dwg
1095-A1009_0.dwg
1095-A1009_1.dwg
1095-A1011_0.dwg
1095-A1011_1.dwg

Please be advised that the file extension will not always be .DWG.

Thanks again.

So you'd like to find all groups that have at least a single ".._." member, is that correct?

Is the naming scheme/pattern the same for all files? If yes and you don't need to run the find over a lot of folders at once, then I see chances to use SelectEx and a small jscript-snippet to select groups from source to destination. You can invert the resulting selection afterwards to get all the single files not having a revision duplicate.

I currently don't see a way to do this that wouldn't require a bit of scripting, but maybe Leo comes up with something different.

Yeah, you would need a bit of scripting for that kind of logic.
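
As a rough sketch of that logic in plain JavaScript (not tied to any Opus API): `groupByPrefix` is a made-up helper name, and it assumes the revision is whatever follows the last underscore, and that files with different extensions belong to separate groups.

```javascript
// Hypothetical helper: returns the names that share a prefix (and extension)
// with at least one other name, ignoring the "_<revision>" suffix.
function groupByPrefix(names) {
	var groups = {}, order = [];
	for (var i = 0; i < names.length; i++) {
		// "1095-A1006_1.dwg" -> prefix "1095-A1006", revision "1", extension ".dwg"
		var m = /^(.*)_([^_.]+)(\.[^.]*)?$/.exec(names[i]);
		if (!m) continue; // no revision suffix, so not part of any group
		var key = (m[1] + "|" + (m[3] || "")).toLowerCase();
		if (!groups.hasOwnProperty(key)) {
			groups[key] = [];
			order.push(key);
		}
		groups[key].push(names[i]);
	}
	// Keep only groups with two or more members, in order of first appearance.
	var result = [];
	for (var j = 0; j < order.length; j++) {
		if (groups[order[j]].length > 1) result = result.concat(groups[order[j]]);
	}
	return result;
}
```

Fed the eleven file names from the example folder above, this returns the seven files in the three groups (A1006, A1009, A1011). Turning the result into an actual selection in the lister is the part that needs Opus scripting.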

First, some preparation! o)

I updated ClipboardEx with a new PASTEEMPTY switch. It allows quick creation of (empty) test files from the clipboard. We have had several threads where it would have been much easier to help with similarly named files at hand, so why not start using it right here. ClipboardEx: Command: ClipboardEx (clipboard related functions)

I hope to get back to you soon.. o)

@lamensterms
Do you actually use DO11 or DO10 as displayed by your user-info?

[quote="tbone"]First, some preparation! o)

I updated ClipboardEx with a new PASTEEMPTY switch. It allows quick creation of (empty) test files from the clipboard. [...] ClipboardEx: Command: ClipboardEx (clipboard related functions)[/quote]

We were already prepared. :) One of the first script samples: Paste empty file list.

Hi guys,

I'm using DOpus 11.

Yes, but the suffix will not always be a numerical digit, sometimes it will be an alpha (A, B, C, etc).

Install SelectEx, find it here: Command: SelectEx (extended Select command)
Then open the folder your files are in and run this command without selecting any items.
Put the command in a button first, or paste it straight into the command bar (press ">" to open it).

SelectEx SIMILARMETAJS="if (!item.is_dir && String(item.name).search('_(0|A)\\.')==-1) cmd.RunCommand('Select '+String(item.name).substring(0,10)+'_*');"

This works for the filenames you posted and also if the group ends with _A,B,C.. instead of _0,1,2.
You need SelectEx v0.5 (uploaded some seconds ago), as I made the shortcut "cmd" variable available to js-snippets to prevent extra long command lines. Admittedly it's still quite long, but still a one-liner. o)
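
For anyone who wants to see what the one-liner does without installing anything, here is a small simulation in plain JavaScript. `simulateOneLiner` is a made-up name, and the inner loop is only a stand-in for the wildcard `Select` command (it does a case-sensitive prefix match, which is an assumption of this sketch, not how Opus matches).

```javascript
// Hypothetical simulation of the SIMILARMETAJS snippet on a plain list of names.
function simulateOneLiner(names) {
	var selected = {};
	for (var i = 0; i < names.length; i++) {
		// Same test as the one-liner: names containing "_0." or "_A." do nothing.
		if (names[i].search('_(0|A)\\.') != -1) continue;
		// Same prefix rule: the first 10 characters plus "_", then anything.
		var prefix = names[i].substring(0, 10) + '_';
		for (var j = 0; j < names.length; j++) {
			if (names[j].substring(0, prefix.length) == prefix) selected[names[j]] = true;
		}
	}
	// Report the selection in the original list order.
	var out = [];
	for (var k = 0; k < names.length; k++) {
		if (selected[names[k]] === true) out.push(names[k]);
	}
	return out;
}
```

The selection is only triggered by files that are not the first revision (_0 or _A), so it relies on the prefix always being 10 characters long, and a group whose only members end in _0 and _A would never trigger a Select.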

Hi TBone,

Awesome thanks so much for providing that.

I have only just got around to testing it (sorry for the late reply) and have found that it will only select a group of duplicates if all duplicate files have either alpha suffixes or numeric suffixes. It will identify three (or more) files so long as at least 2 of the duplicates have an alpha or number suffix. Please see screen shot below for example. In the example, file '1095-f1505' has not been picked as a duplicate because, of the pair, one has an alpha suffix and one has a numerical suffix, so there are not two of either type.


Sorry for the poor explanation, pretty difficult to describe.

Thanks for the help.

Any update on this TBone?

Yes, I still keep this thread/browser-tab opened to not forget about it! o)
Can't say when I'll find the time to put the necessary pieces together though.

With the info in your last post, the approach used until now won't cut it, so we need to do it a bit differently next time.
To be continued (soon I hope).. o)

No worries, thanks a lot for the update.

Do you have any suggestions as to other approaches I might be able to investigate in the meantime? I feel the script you provided is close to 100%, so that might be the best bet.

Hi lamensterms,

In case this is still unresolved and something you'd like to have:
download and try the attached button. It should fix the misbehaving bits of the last approach.
SelectGroupOfFiles.dcf (8.59 KB)

This is the full script button code, making use of a generic filter "thing":

////////////////////////////////////////////////////////////////////////////////
var PreFilter = function( file ){
	//return true if this file(type) is of interest in the process
	return true;
}
////////////////////////////////////////////////////////////////////////////////
var MainFilter = function( file ){
	//return true to run evaluation on this specific file(type)
	return true;
}
////////////////////////////////////////////////////////////////////////////////
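var Evaluate = function( file ){
	//return true to execute the foreach-operation on this file (select, delete etc.)
	//split "1095-a1006_1" at the first underscore: name part and revision suffix
	var baseBaseName = file.baseName.replace( new RegExp("(.*?)_(.*)"), "$1");
	var baseBaseSuffix = file.baseName.replace( new RegExp("(.*?)_(.*)"), "$2");
	//look for another file with the same name part and extension but a different
	//suffix; the [^...] class matches one character, so single-character suffixes are assumed
	if (this.FileExists(baseBaseName.esc()+'_[^'+baseBaseSuffix.esc()+']'+file.ext.esc()))
		return true;
	return false;
}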
////////////////////////////////////////////////////////////////////////////////
var ExecuteForEach = function( file ){
	//execute for each file that passed evaluation
	this.cmd.RunCommand('Select "'+file.name+'" EXACT');
} 
////////////////////////////////////////////////////////////////////////////////
var ExecuteForAll = function(){
	//execute finally
	//this.cmd.RunCommand('SelectEx MAKEVISIBLE');
} 
////////////////////////////////////////////////////////////////////////////////
function EasyFilter(filesIn,cmd,preFilter,mainFilter,evaluate,executeForEach,executeForAll){
	this.version		= 0.1;
	this.files			= [];
	this.filesFiltered	= [];
	this.filesEvaluated	= [];
	this.cmd			= cmd;
	this.PreFilter		= preFilter;
	this.MainFilter		= mainFilter;
	this.Evaluate		= evaluate;
	this.ExecuteForEach	= executeForEach;
	this.ExecuteForAll	= executeForAll;
	////////////////////////////////////////////////////////////////////////////
	String.prototype.esc = function(str){
		return this.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
	}
	////////////////////////////////////////////////////////////////////////////
	this.FileExists = function( fileNameRegex ){
		fileNameRegex = new RegExp(fileNameRegex);
		for(var i=0;i<this.files.length;i++){
			if (this.files[i].name.search(fileNameRegex)!=-1)
				return this.files[i];
		}
		return null;
	}
	////////////////////////////////////////////////////////////////////////////
	DOpus.Output("PreFiltering ["+filesIn.length+"] files..");
	for(var i=0;i<filesIn.length;i++){
		if (this.PreFilter(filesIn[i])===true){
			this.files[this.files.length] = filesIn[i];
			DOpus.Output("    PreFilter passed ["+filesIn[i].name+"]");
		} else {
			//DOpus.Output("    PreFilter ignore ["+filesIn[i].name+"]");
		}
	}
	DOpus.Output("");
	////////////////////////////////////////////////////////////////////////////
	DOpus.Output("MainFiltering ["+this.files.length+"] files..");
	for(var i=0;i<this.files.length;i++){
		if (this.MainFilter(this.files[i])===true){
			DOpus.Output("    MainFilter passed ["+this.files[i].name+"]");
			this.filesFiltered[this.filesFiltered.length] = this.files[i];
		} else {
			//DOpus.Output("    MainFilter ignore ["+filesIn[i].name+"]");
		}
	}
	DOpus.Output("");
	////////////////////////////////////////////////////////////////////////////
	DOpus.Output("Evaluating ["+this.filesFiltered.length+"] files..");
	for(var i=0;i<this.filesFiltered.length;i++){
		if (this.Evaluate(this.filesFiltered[i])===true){
			DOpus.Output("    Evaluation passed ["+this.filesFiltered[i].name+"]");
			this.filesEvaluated[this.filesEvaluated.length] = this.filesFiltered[i];
		} else {
			//DOpus.Output("    Evaluation ignore ["+this.filesFiltered[i].name+"]");
		}
	}
	DOpus.Output("");
	////////////////////////////////////////////////////////////////////////////
	DOpus.Output("Processing ["+this.filesEvaluated.length+"] files..");
	for(var i=0;i<this.filesEvaluated.length;i++){
		DOpus.Output("    Running ForEachOp ["+this.filesEvaluated[i].name+"]");
		this.ExecuteForEach(this.filesEvaluated[i]);
	}
	DOpus.Output("");
	////////////////////////////////////////////////////////////////////////////
	DOpus.Output("Running ForAll-Operation..");
	this.ExecuteForAll();
	DOpus.Output("");
}
////////////////////////////////////////////////////////////////////////////////
function OnClick(data){
	var filesTab = data.func.sourcetab.files, files = [];
	var cmd = data.func.command; cmd.ClearFiles();
	////////////////////////////////////////////////////////////////////////////
	for(var i=0;i<filesTab.count;i++){
		var file = {	name		: String(filesTab(i).name).toLowerCase(),
						baseName	: String(filesTab(i).name_stem).toLowerCase(),
						ext			: String(filesTab(i).ext).toLowerCase() };
		files[files.length] = file;
	}
	var filter = new EasyFilter(files,cmd,PreFilter,MainFilter,Evaluate,ExecuteForEach,ExecuteForAll);
	DOpus.Output("Done.");
}
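
To try the core check from Evaluate outside of Opus, something like the following should do. It is only a sketch: `escapeRegex` and `hasOtherRevision` are made-up names, the file objects are plain stand-ins for what OnClick builds, and it assumes `ext` includes the leading dot.

```javascript
// Same escaping as String.prototype.esc in the button, as a plain function.
function escapeRegex(s) {
	return s.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}

// Hypothetical stand-in for Evaluate + FileExists: true if some other name has
// the same part before the first "_" and the same extension, but a different
// (single-character) revision suffix.
function hasOtherRevision(file, allNames) {
	var base = file.baseName.replace(/(.*?)_(.*)/, "$1");
	var suffix = file.baseName.replace(/(.*?)_(.*)/, "$2");
	var re = new RegExp(escapeRegex(base) + "_[^" + escapeRegex(suffix) + "]" + escapeRegex(file.ext));
	for (var i = 0; i < allNames.length; i++) {
		if (re.test(allNames[i])) return true;
	}
	return false;
}

// Lower-cased names, as the button script stores them.
var names = ["1095-f1505_0.dwg", "1095-f1505_a.dwg", "1095-a1001_0.dwg"];
var mixedPair = hasOtherRevision({ baseName: "1095-f1505_0", ext: ".dwg" }, names);
var single = hasOtherRevision({ baseName: "1095-a1001_0", ext: ".dwg" }, names);
```

Here `mixedPair` comes out true and `single` false, so a pair with one numeric and one alpha suffix is picked up, while a file with no other revision is left alone.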

Hi TBone,

Oh awesome, thanks so much for taking the time to get back to this, and also for all your hard work putting the script together.

Sorry for the late response.

I have performed a few tests and from what I have tried so far... the script appears to be working great (selecting all groups of duplicates).

One minor change I have made was to perform a SELECT NONE before executing the script - simply to deselect the current selection, given the current selection may not always contain duplicates.
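
In case it helps anyone else, the idea is roughly the following. This is only a sketch of the change, not the exact button code: `selectOnly` is a made-up name, and `cmd` can be anything with a `RunCommand(string)` method, such as the command object the script already passes around. A fake `cmd` is used here so the snippet runs outside Opus.

```javascript
// Hypothetical helper: clear the current selection, then select the given names.
function selectOnly(cmd, names) {
	cmd.RunCommand('Select NONE'); // deselect whatever was selected before
	for (var i = 0; i < names.length; i++) {
		cmd.RunCommand('Select "' + names[i] + '" EXACT');
	}
}

// Fake command object that just records what it was asked to run.
var fakeCmd = {
	log: [],
	RunCommand: function (line) { this.log.push(line); }
};
selectOnly(fakeCmd, ["1095-a1006_0.dwg", "1095-a1006_1.dwg"]);
```

After the call, `fakeCmd.log` holds the Select NONE line first, followed by one Select line per file.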

Thanks again for your huge effort, I really appreciate it.

I'm glad to hear it! o)
Once the EasyFilter snippet was done, which I use for other things myself, incorporating your use case took just a few seconds. The hard part was not forgetting about this unfinished challenge over the weeks. o) Cya!