Find Duplicates but only for partial file names (command)

Hi Leo,

Thanks for the reply. I would like to find all groups of duplicates within the current folder at once.

For example, if I have a folder that contains the following drawings/files:

1095-A1001_0.dwg
1095-A1006_0.dwg
1095-A1006_1.dwg
1095-A1006_2.dwg
1095-A1008_0.dwg
1095-A1009_0.dwg
1095-A1009_1.dwg
1095-A1010_0.dwg
1095-A1011_0.dwg
1095-A1011_1.dwg
1095-A1017_0.dwg

I would like to create a command that will isolate or select:

1095-A1006_0.dwg
1095-A1006_1.dwg
1095-A1006_2.dwg
1095-A1009_0.dwg
1095-A1009_1.dwg
1095-A1011_0.dwg
1095-A1011_1.dwg

Please be advised that the file extension will not always be .DWG.
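
To pin down the grouping rule, here is a minimal sketch in plain JavaScript, independent of Directory Opus. It assumes that everything before the last `_` is the group name, that files only group with others of the same extension, and that names compare case-insensitively. `splitName` and `findDuplicateGroups` are hypothetical helpers, not part of DOpus or SelectEx. The sketch is written in ES3 style (no `map`/`forEach`), so it may also be usable inside a JScript button, although that is untested.

```javascript
// Hypothetical helpers (not part of DOpus/SelectEx): group files whose names
// share everything before the last "_" and have the same extension.
function splitName(name) {
    // "1095-A1006_2.dwg" -> { base: "1095-a1006", suffix: "2", ext: ".dwg" }
    var lower = String(name).toLowerCase();
    var dot = lower.lastIndexOf(".");
    var stem = dot > 0 ? lower.substring(0, dot) : lower;
    var ext = dot > 0 ? lower.substring(dot) : "";
    var us = stem.lastIndexOf("_");
    if (us < 0) return null; // no "_suffix" part, so the file cannot be in a group
    return { base: stem.substring(0, us), suffix: stem.substring(us + 1), ext: ext };
}

function findDuplicateGroups(names) {
    var groups = {}, order = [], result = [];
    for (var i = 0; i < names.length; i++) {
        var parts = splitName(names[i]);
        if (!parts) continue;
        var key = parts.base + "|" + parts.ext; // same base name AND same extension
        if (!groups.hasOwnProperty(key)) { groups[key] = []; order.push(key); }
        groups[key].push(names[i]);
    }
    // keep only groups with more than one member, in first-seen order
    for (var j = 0; j < order.length; j++) {
        if (groups[order[j]].length > 1) result.push(groups[order[j]]);
    }
    return result;
}
```

With the eleven names above, the result is the three groups A1006, A1009 and A1011, i.e. exactly the seven files listed. Because the suffix is treated as an opaque string, `_A`/`_B` suffixes group just like `_0`/`_1`.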

Thanks again.

So you'd like to find all groups with more than one member (same name before the "_", different suffix after it), is that correct?

Is the naming scheme/pattern the same for all files? If so, and you don't need to run this over a lot of folders at once, I think SelectEx and a small JScript snippet could be used to select the groups. You can invert the resulting selection afterwards to get all the single files that have no revision duplicate.

I currently don't see a way to do this without a bit of scripting, but maybe Leo comes up with something different.

Yeah, you would need a bit of scripting for that kind of logic.

First, some preparation! o)

I updated ClipboardEx with a new PASTEEMPTY switch, which allows quick creation of (empty) test files from the clipboard. We have had several threads where it's much easier to help if you actually have similarly named files at hand, so why not start using it right here. ClipboardEx: Command: ClipboardEx (clipboard related functions)

I hope to get back to you soon.. o)

@lamensterms
Are you actually using DO11, or DO10 as shown in your user info?

We were already prepared. :slight_smile: It's one of the first script samples: "Paste empty file list".
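
For readers working outside Opus, the same "create empty test files from a list of names" idea can be sketched with Node.js's built-in `fs` and `path` modules. This is only an illustration, not how ClipboardEx or the sample script implement it. `createEmptyFiles` is a hypothetical helper that skips blank lines, strips any directory part from each name, and never overwrites an existing file.

```javascript
// Hypothetical helper: create empty placeholder files from a newline-separated list.
const fs = require("fs");
const path = require("path");

function createEmptyFiles(dir, listText) {
    const created = [];
    for (const line of listText.split(/\r?\n/)) {
        const fileName = path.basename(line.trim()); // drop any directory part
        if (fileName === "" || fileName === "." || fileName === "..") continue;
        try {
            // flag "wx" makes the write fail if the file already exists
            fs.writeFileSync(path.join(dir, fileName), "", { flag: "wx" });
            created.push(fileName);
        } catch (e) {
            if (e.code !== "EEXIST") throw e; // existing files are silently skipped
        }
    }
    return created;
}
```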

Hi guys,

I'm using DOpus 11.

Yes, but the suffix will not always be a digit; sometimes it will be a letter (A, B, C, etc.).

Install SelectEx, which you can find here: Command: SelectEx (extended Select command)
Then open the folder your files are in and run this command without selecting any items.
Either put the command in a button first, or paste it straight into the command bar (press ">" to open it).

SelectEx SIMILARMETAJS="if (!item.is_dir && String(item.name).search('_(0|A)\\.')==-1) cmd.RunCommand('Select '+String(item.name).substring(0,10)+'_*');"

This works for the filenames you posted, and also if the group ends with _A, B, C... instead of _0, 1, 2.
You need SelectEx v0.5 (uploaded a few seconds ago), because I made the shortcut "cmd" variable available to JS snippets to avoid overly long command lines. Admittedly it's still quite long, but still a one-liner. o)
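
Restated as a plain JavaScript function, the one-liner's decision rule looks roughly like the sketch below. `wildcardFor` and `wildcardsFor` are hypothetical names; in the real command SelectEx supplies `item` and `cmd`, and DOpus does the selecting. The sketch assumes the `_(0|A)\.` pattern reaches the script engine as written and, like the one-liner, that every base name is exactly 10 characters long.

```javascript
// Hypothetical restatement of the SIMILARMETAJS one-liner: returns the wildcard
// it would pass to "Select", or null if the file does not trigger a selection.
function wildcardFor(name, isDir) {
    name = String(name);
    if (isDir) return null;
    // "_0." and "_A." files (the first revision) are skipped
    if (name.search(/_(0|A)\./) !== -1) return null;
    // fixed-length prefix: only correct if every base name has exactly 10 characters
    return name.substring(0, 10) + "_*";
}

// Collect the distinct wildcards for a folder listing (files only).
function wildcardsFor(names) {
    var seen = {}, out = [];
    for (var i = 0; i < names.length; i++) {
        var w = wildcardFor(names[i], false);
        if (w !== null && !seen.hasOwnProperty(w)) { seen[w] = true; out.push(w); }
    }
    return out;
}
```

Note what happens with a pair such as `1095-F1505_0.dwg` and `1095-F1505_A.dwg`: both files are skipped by the `_(0|A)` check, so no wildcard is produced and the pair is never selected.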

Hi TBone,

Awesome thanks so much for providing that.

I have only just got around to testing it (sorry for the late reply) and found that it misses groups that mix alpha and numeric suffixes, unless at least two files in the group share the same suffix type (which is always the case in a group of three or more files). Please see the screenshot below for an example: file '1095-f1505' has not been picked up as a duplicate because its pair has one alpha suffix and one numeric suffix, so there are not two of either type.


Sorry for the poor explanation, pretty difficult to describe.

Thanks for the help.

Any update on this TBone?

Yes, I'm keeping this thread open in a browser tab so I don't forget about it! o)
Can't say when I'll find the time to put the necessary pieces together though.

With the info in your last post, the approach used until now won't cut it, so we need to do it a bit differently next time.
To be continued (soon I hope).. o)

No worries, thanks a lot for the update.

Do you have any suggestions as to other approaches I might be able to investigate in the meantime? I feel the script you provided is close to 100%, so that might be the best bet.

Hi lamenstern,

In case this is still unresolved and something you'd like to have,
download and try the attached button; it should fix the parts of the last approach that misbehaved.
SelectGroupOfFiles.dcf (8.59 KB)

This is the full script button code, making use of a generic filter "thing":

////////////////////////////////////////////////////////////////////////////////
var PreFilter = function( file ){
	//return true if this file(type) is of interest in the process
	return true;
}
////////////////////////////////////////////////////////////////////////////////
var MainFilter = function( file ){
	//return true to run evaluation on this specific file(type)
	return true;
}
////////////////////////////////////////////////////////////////////////////////
var Evaluate = function( file ){
	//return true to execute the foreach-operation on this file (select, delete etc.)
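	//split "1095-a1006_1" into base name "1095-a1006" and suffix "1" (at the first "_")
	var match = /^(.*?)_(.*)$/.exec(file.baseName);
	if (!match) return false; //no "_" in the name, so there is no revision suffix
	var baseBaseName = match[1], baseBaseSuffix = match[2];
	//is there another file with the same base name and extension but a different
	//single-character suffix? The pattern is anchored so it must match the whole name.
	if (this.FileExists('^'+baseBaseName.esc()+'_[^'+baseBaseSuffix.esc()+']'+file.ext.esc()+'$'))
		return true;
	return false;
}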
////////////////////////////////////////////////////////////////////////////////
var ExecuteForEach = function( file ){
	//execute for each file that passed evaluation
	this.cmd.RunCommand('Select "'+file.name+'" EXACT');
} 
////////////////////////////////////////////////////////////////////////////////
var ExecuteForAll = function(){
	//execute finally
	//this.cmd.RunCommand('SelectEx MAKEVISIBLE');
} 
////////////////////////////////////////////////////////////////////////////////
function EasyFilter(filesIn,cmd,preFilter,mainFilter,evaluate,executeForEach,executeForAll){
	this.version		= 0.1;
	this.files			= [];
	this.filesFiltered	= [];
	this.filesEvaluated	= [];
	this.cmd			= cmd;
	this.PreFilter		= preFilter;
	this.MainFilter		= mainFilter;
	this.Evaluate		= evaluate;
	this.ExecuteForEach	= executeForEach;
	this.ExecuteForAll	= executeForAll;
	////////////////////////////////////////////////////////////////////////////
	String.prototype.esc = function(){ //escape regex metacharacters
		return this.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
	}
	////////////////////////////////////////////////////////////////////////////
	this.FileExists = function( fileNameRegex ){
		fileNameRegex = new RegExp(fileNameRegex);
		for(var i=0;i<this.files.length;i++){
			if (this.files[i].name.search(fileNameRegex)!=-1)
				return this.files[i];
		}
		return null;
	}
	////////////////////////////////////////////////////////////////////////////
	DOpus.Output("PreFiltering ["+filesIn.length+"] files..");
	for(var i=0;i<filesIn.length;i++){
		if (this.PreFilter(filesIn[i])===true){
			this.files[this.files.length] = filesIn[i];
			DOpus.Output("    PreFilter passed ["+filesIn[i].name+"]");
		} else {
			//DOpus.Output("    PreFilter ignore ["+filesIn[i].name+"]");
		}
	}
	DOpus.Output("");
	////////////////////////////////////////////////////////////////////////////
	DOpus.Output("MainFiltering ["+this.files.length+"] files..");
	for(var i=0;i<this.files.length;i++){
		if (this.MainFilter(this.files[i])===true){
			DOpus.Output("    MainFilter passed ["+this.files[i].name+"]");
			this.filesFiltered[this.filesFiltered.length] = this.files[i];
		} else {
			//DOpus.Output("    MainFilter ignore ["+this.files[i].name+"]");
		}
	}
	DOpus.Output("");
	////////////////////////////////////////////////////////////////////////////
	DOpus.Output("Evaluating ["+this.filesFiltered.length+"] files..");
	for(var i=0;i<this.filesFiltered.length;i++){
		if (this.Evaluate(this.filesFiltered[i])===true){
			DOpus.Output("    Evaluation passed ["+this.filesFiltered[i].name+"]");
			this.filesEvaluated[this.filesEvaluated.length] = this.filesFiltered[i];
		} else {
			//DOpus.Output("    Evaluation ignore ["+this.filesFiltered[i].name+"]");
		}
	}
	DOpus.Output("");
	////////////////////////////////////////////////////////////////////////////
	DOpus.Output("Processing ["+this.filesEvaluated.length+"] files..");
	for(var i=0;i<this.filesEvaluated.length;i++){
		DOpus.Output("    Running ForEachOp ["+this.filesEvaluated[i].name+"]");
		this.ExecuteForEach(this.filesEvaluated[i]);
	}
	DOpus.Output("");
	////////////////////////////////////////////////////////////////////////////
	DOpus.Output("Running ForAll-Operation..");
	this.ExecuteForAll();
	DOpus.Output("");
}
////////////////////////////////////////////////////////////////////////////////
function OnClick(data){
	var filesTab = data.func.sourcetab.files, files = [];
	var cmd = data.func.command; cmd.ClearFiles();
	////////////////////////////////////////////////////////////////////////////
	for(var i=0;i<filesTab.count;i++){
		var file = {	name		: String(filesTab(i).name).toLowerCase(),
						baseName	: String(filesTab(i).name_stem).toLowerCase(),
						ext			: String(filesTab(i).ext).toLowerCase() };
		files[files.length] = file;
	}
	var filter = new EasyFilter(files,cmd,PreFilter,MainFilter,Evaluate,ExecuteForEach,ExecuteForAll);
	DOpus.Output("Done.");
}
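
The core check in `Evaluate` can also be expressed without building a regular expression: a file belongs to a group if some other file has the same base name (before the first `_`) and the same extension but a different suffix. This is a standalone sketch, assuming the same lower-cased `{name, baseName, ext}` objects that `OnClick` builds; `splitBase`, `hasSibling` and `selectGroups` are hypothetical helpers. Unlike the `[^...]` character class, which only matches a single-character suffix, comparing whole suffixes also handles names like `_10`.

```javascript
// Hypothetical helpers following the idea of Evaluate(): compare whole suffixes
// instead of using a "[^suffix]" character class.
// Files look like { name: "1095-a1006_1.dwg", baseName: "1095-a1006_1", ext: ".dwg" }.
function splitBase(baseName) {
    var us = baseName.indexOf("_"); // first "_", like the (.*?)_(.*) pattern
    if (us < 0) return null;
    return { base: baseName.substring(0, us), suffix: baseName.substring(us + 1) };
}

// true if another file has the same base name and extension but a different suffix
function hasSibling(file, files) {
    var me = splitBase(file.baseName);
    if (!me) return false;
    for (var i = 0; i < files.length; i++) {
        var other = files[i];
        if (other.ext !== file.ext) continue;
        var o = splitBase(other.baseName);
        if (o && o.base === me.base && o.suffix !== me.suffix) return true;
    }
    return false;
}

// names of all files that belong to a group of two or more
function selectGroups(files) {
    var out = [];
    for (var i = 0; i < files.length; i++) {
        if (hasSibling(files[i], files)) out.push(files[i].name);
    }
    return out;
}
```

The nested loop is fine for a folder of drawings; for very large folders, a lookup object keyed by base name and extension would avoid comparing every file with every other file.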

Hi TBone,

Oh awesome, thanks so much for taking the time to get back to this, and also for all your hard work putting the script together.

Sorry for the late response.

I have performed a few tests and from what I have tried so far... the script appears to be working great (selecting all groups of duplicates).

One minor change I have made was to perform a SELECT NONE before executing the script - simply to deselect the current selection, given the current selection may not always contain duplicates.

Thanks again for your huge effort, I really appreciate it.

I'm glad to hear! o)
Once the EasyFilter snippet was done (I use it for other things myself), incorporating your use case took just a few seconds. The hard part was not forgetting about this unfinished challenge over the weeks. o) Cya!

Hi, I have a similar request, almost the same as the one here: Find and Delete Partial-Named "Duplicate" Files.

For example, take the following files (name followed by version number):
First Set of files 2.4.07.ext
First Set of files 3.0.ext
Second Set 1.0.ext
Second Set 1.9.3.ext
Second Set 2.7.ext
Second Set 3.3.1.2.ext
Third Single 5.2.ext
and so on..

I would like the following selection possibilities:

  1. Select the older version files of each set (for deletion), i.e. leave the highest version file in each set, as well as single files, alone and select the rest.
  2. Select only the oldest version file of each set.
  3. Select only the singles.

Thanks :slight_smile:

Set the MODE variable to the mode you want; you could create three buttons, one for each mode.
Same EasyFilter as before, but I added an Init() callback to register the custom functions.


var MODES   = { SINGLE      : 0,    //items without further versions
                OBSOLETE    : 1,    //items having newer version
                LATEST      : 2};   //items being latest version

var MODE    = MODES.LATEST;

////////////////////////////////////////////////////////////////////////////////
var Init = function(){
    this.Sets = {};
    this.HasVersion = function(file){
        var match = /((?:\d+\.)+\d+)/.exec(file.baseName);
        if (match) return (file.version = match[1]);
    }
    this.GetSetName = function(file){
        return file.baseName.replace(/(.+?)((?:\d+\.)+\d+)/,"$1");
    }
}
////////////////////////////////////////////////////////////////////////////////
var PreFilter = function( file ){
    //return true if this file(type) is of interest in the process
    if (this.HasVersion(file)) return true;
}
////////////////////////////////////////////////////////////////////////////////
var MainFilter = function( file ){
    //return true to run evaluation on this specific file(type)
    var setName = (file.setName = this.GetSetName(file));
    if (!setName) return false;
    if (typeof this.Sets[setName] == "undefined") this.Sets[setName] = [];
    this.Sets[setName][this.Sets[setName].length] = file;
    return true;
}
////////////////////////////////////////////////////////////////////////////////
var Evaluate = function( file ){
    //return true to execute the foreach-operation on this file (select, delete etc.)
    var set = this.Sets[file.setName];
    if (MODE == MODES.SINGLE){ if (set && set.length == 1) return true; return false; }
    if (!set) return false;
    if (!set.sorted) set = set.sort(
        function(a,b){ return VersionCompare(a.version, b.version, {lexicographical:true, zeroExtend:true} ); } );

    set.sorted = true; //first item == highest version
    if (MODE == MODES.LATEST)   {if (set[0].name == file.name) return true; return false;}
    if (MODE == MODES.OBSOLETE) {if (set[0].name != file.name) return true; return false;}
    return false;
}
////////////////////////////////////////////////////////////////////////////////
var ExecuteForEach = function( file ){
    //execute for each file that passed evaluation
    this.cmd.RunCommand('Select "'+file.name+'" EXACT');
} 
////////////////////////////////////////////////////////////////////////////////
var ExecuteForAll = function(){
    //execute finally
    this.cmd.RunCommand('SelectEx MAKEVISIBLE');
} 
////////////////////////////////////////////////////////////////////////////////
function EasyFilter(filesIn,cmd,init,preFilter,mainFilter,evaluate,executeForEach,executeForAll){
    this.version        = 0.2; //init added, and "nulled" files will not be evaluated
    this.files          = [];
    this.filesFiltered  = [];
    this.filesEvaluated = [];
    this.cmd            = cmd;
    this.Init           = init;
    this.PreFilter      = preFilter;
    this.MainFilter     = mainFilter;
    this.Evaluate       = evaluate;
    this.ExecuteForEach = executeForEach;
    this.ExecuteForAll  = executeForAll;
    ////////////////////////////////////////////////////////////////////////////
    String.prototype.esc = function(){ //escape regex metacharacters
        return this.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
    }
    ////////////////////////////////////////////////////////////////////////////
    this.FileExists = function( fileNameRegex ){
        fileNameRegex = new RegExp(fileNameRegex);
        for(var i=0;i<this.files.length;i++){
            if (this.files[i].name.search(fileNameRegex)!=-1)
                return this.files[i];
        }
        return null;
    }
    ////////////////////////////////////////////////////////////////////////////
    DOpus.Output("Initialising....");
    this.Init();
    ////////////////////////////////////////////////////////////////////////////
    DOpus.Output("PreFiltering ["+filesIn.length+"] files..");
    for(var i=0;i<filesIn.length;i++){
        if (this.PreFilter(filesIn[i])===true){
            this.files[this.files.length] = filesIn[i];
            DOpus.Output("    PreFilter passed ["+filesIn[i].name+"]");
        } else {
            //DOpus.Output("    PreFilter ignore ["+filesIn[i].name+"]");
        }
    }
    DOpus.Output("");
    ////////////////////////////////////////////////////////////////////////////
    DOpus.Output("MainFiltering ["+this.files.length+"] files..");
    for(var i=0;i<this.files.length;i++){
        if (this.MainFilter(this.files[i])===true){
            DOpus.Output("    MainFilter passed ["+this.files[i].name+"]");
            this.filesFiltered[this.filesFiltered.length] = this.files[i];
        } else {
            //DOpus.Output("    MainFilter ignore ["+this.files[i].name+"]");
        }
    }
    DOpus.Output("");
    ////////////////////////////////////////////////////////////////////////////
    DOpus.Output("Evaluating ["+this.filesFiltered.length+"] files..");
    for(var i=0;i<this.filesFiltered.length;i++){
        if (this.filesFiltered[i] && this.Evaluate(this.filesFiltered[i])===true){
            DOpus.Output("    Evaluation passed ["+this.filesFiltered[i].name+"]");
            this.filesEvaluated[this.filesEvaluated.length] = this.filesFiltered[i];
        } else {
            //DOpus.Output("    Evaluation ignore ["+this.filesFiltered[i].name+"]");
        }
    }
    DOpus.Output("");
    ////////////////////////////////////////////////////////////////////////////
    DOpus.Output("Processing ["+this.filesEvaluated.length+"] files..");
    for(var i=0;i<this.filesEvaluated.length;i++){
        DOpus.Output("    Running ForEachOp ["+this.filesEvaluated[i].name+"]");
        this.ExecuteForEach(this.filesEvaluated[i]);
    }
    DOpus.Output("");
    ////////////////////////////////////////////////////////////////////////////
    DOpus.Output("Running ForAll-Operation..");
    this.ExecuteForAll();
    DOpus.Output("");
}
////////////////////////////////////////////////////////////////////////////////
function OnClick(data){
    var filesTab = data.func.sourcetab.files, files = [];
    var cmd = data.func.command; cmd.ClearFiles();
    ////////////////////////////////////////////////////////////////////////////
    for(var i=0;i<filesTab.count;i++){
        var file = {    name        : String(filesTab(i).name).toLowerCase(),
                        baseName    : String(filesTab(i).name_stem).toLowerCase(),
                        ext         : String(filesTab(i).ext).toLowerCase() };
        files[files.length] = file;
    }
    var filter = new EasyFilter(files,cmd,Init,PreFilter,MainFilter,Evaluate,ExecuteForEach,ExecuteForAll);
    DOpus.Output("Done.");
}
//////////////////////////////////////////////////////////////////////////
function VersionCompare(v1, v2, options){
    //compares two software version numbers (e.g. "1.7.1" or "1.2b")
    //copyright by Jon Papaioannou (["john", "papaioannou"].join(".") + "@gmail.com")
    //This function is in the public domain. Do what you want with it, no strings attached.
    var lexicographical = options && options.lexicographical,
        zeroExtend = options && options.zeroExtend,
        v1parts = v1.split('.'),
        v2parts = v2.split('.');

    function isValidPart(x) { return (lexicographical ? /^\d+[A-Za-z]*$/ : /^\d+$/).test(x); }

    for(var v=0;v<v1parts.length;v++) if (!isValidPart(v1parts[v])){ return NaN; }
    for(var v=0;v<v2parts.length;v++) if (!isValidPart(v2parts[v])){ return NaN; }
    
    if (zeroExtend) {
        while (v1parts.length < v2parts.length) v1parts.push("0");
        while (v2parts.length < v1parts.length) v2parts.push("0");
    }

    if (!lexicographical) {
        v1parts = v1parts.map(Number);
        v2parts = v2parts.map(Number);
    }

    for (var i = 0; i < v1parts.length; ++i) {
        if (v2parts.length == i) { return -1; }
        if (Number(v1parts[i]) == Number(v2parts[i])) { continue; }
        else if (Number(v1parts[i]) > Number(v2parts[i])) { return -1; }
        else { return 1; }
    }

    if (v1parts.length != v2parts.length) { return 1; }
    return 0;
}

Awesome! Thank you soo much!
There were two places in the whole lot where it missed a file (Latest mode).

I think it missed the 'Bluelight Filter' one because there is no period in its version number.
The 'CalcTape' one is due to the parentheses in the version number.
Please fix them, especially the first one.

Thanks again :slight_smile:

To quote Leo here: for further help and fixes, please link your account.
Also please post the problematic sets (file names), so I don't need to create them by hand.

The "Bluelight Filter" failure is indeed caused by the missing dot in the version. Is "60" a version? o)
For "CalcTape" I'd need to look more closely.
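
One possible way to relax the version detection is sketched below in plain JavaScript: take the version from the end of the file stem, accept a number without a dot (such as `60`), and allow it to be wrapped in parentheses or prefixed with `v`. This is an assumption about how the problem names look, since the actual CalcTape file names were not posted (`CalcTape (4.2)` is made up), and `parseVersionedName` is a hypothetical helper. Treating a bare trailing number as a version will misfire on names where the number belongs to the title.

```javascript
// Hypothetical helper: split a file stem into set name and version, reading the
// version from the END of the stem. Accepts "Second Set 1.9.3",
// "Bluelight Filter 60", "CalcTape (4.2)" (made-up example) and "Tool v3.0".
var TRAILING_VERSION_RE = /^(.*?)[\s_-]*\(?v?(\d+(?:\.\d+)*)\)?$/i;

function parseVersionedName(stem) {
    var m = TRAILING_VERSION_RE.exec(String(stem));
    if (!m || m[1] === "") return null; // no version, or the name is only a version
    return { set: m[1], version: m[2] };
}
```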