Streamlining synchronization and finding duplicate files

Hello,
After trying to use the “find duplicate files” and “synchronization” panels, I came to the conclusion that it would be a good idea to merge these panels (or introduce a new one). The rationale for this proposal is as follows:
These operations perform two closely related actions on two sets of file properties, A and B. “Find Duplicates” presents the intersection of the sets: A.AND.B. The synchronization operation performs the union: A.OR.B. Other interesting and useful operations on sets, such as A.XOR.B, (NOT.A).OR.B, etc., are not implemented (or may be implemented but not explicitly presented). Therefore, I propose performing all of these operations on sets of file properties in one place. This has, IMO, a few benefits. First, it will help the user choose the correct synchronization operation. It seems to me that for the vast majority of users it is much simpler to choose the correct synchronization operation when looking at a diagram of two intersecting circles (like the Photoshop tool for dividing and merging objects). Second, presenting the first step of the synchronization operation in a collection folder is much less error prone, because the user can immediately see the result of the operation. Third, as I already mentioned, it will introduce very useful new operations. Fourth, it will significantly simplify the program code, because all of these operations will be implemented in one piece of code.
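For illustration, the set-operation view can be sketched in Python (illustrative only, not Opus code; real duplicate detection would compare sizes/checksums rather than just names):

```python
# Two folders viewed as sets of file "signatures" (names here for brevity).
a = {"report.doc", "photo.jpg", "notes.txt"}   # files in folder A
b = {"photo.jpg", "notes.txt", "music.mp3"}    # files in folder B

duplicates   = a & b   # A.AND.B  -> what "Find Duplicates" shows
union        = a | b   # A.OR.B   -> result of a full two-way sync
only_one     = a ^ b   # A.XOR.B  -> files unique to one side
missing_in_a = b - a   # files present in B but not in A
```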
Best regards, Toller

I don't think they could be merged quite so simply because the Find Duplicates tool doesn't work on two directories A and B, it can work on one directory or two or as many as you want and, in effect, every subdirectory within the dirs it starts from is a separate set as well. (You could have duplicates spread around lots of different directories all below the same starting point.)

Nudel,
For brevity, I used only two sets (directories) in my proposal. All of the operations can easily be generalized to multiple sets A, B, C, …. For example, the “Find Duplicates” operation becomes (A.AND.B).OR.(A.AND.C).OR.(B.AND.C)…, where B can be a subdirectory of A. In mathematics, set theory considers even an infinite number of sets. :laughing:
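That generalization can be sketched as follows (illustrative Python, not Opus code): duplicates across many sets are the union of all pairwise intersections.

```python
from itertools import combinations

def find_duplicates(*sets):
    """Union of all pairwise intersections:
    (A.AND.B).OR.(A.AND.C).OR.(B.AND.C)..."""
    result = set()
    for x, y in combinations(sets, 2):
        result |= x & y
    return result

a = {"f1", "f2"}
b = {"f1", "f3"}
c = {"f2", "f3"}
```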
Regards, Toller

So the sets in these logical expressions would consist of every directory (including all subdirectories) that a duplicate was found in? Rather than sets you could just use the names of the directories.

I'm still having a hard time imagining how the user interface would work, though.

Something that could work is using VBScript (etc.) like the Rename dialog can now. Opus could pass scripts arrays of duplicate files and the scripts could contain whatever logic you want and tell Opus which of the items to select for deletion. Problem is, I can't see this being used by many people. Unlike the Rename scripts these scripts would probably tend to be one-offs that get written, used once and discarded. That would mean the feature was only useful to people who know VBScript, unlike the Rename scripts where a few people who know VBScript can create scripts that everyone else can download and use without having to understand or edit them.
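As a rough illustration of what such a script might do (all names here are invented; the real hook would be VBScript inside Opus), a per-group selection rule could look like this:

```python
def select_for_deletion(duplicate_group):
    """Given one group of identical files, return the copies to delete,
    keeping the one with the shortest path (an arbitrary example rule)."""
    keep = min(duplicate_group, key=len)
    return [path for path in duplicate_group if path != keep]

group = [r"C:\Backup\Old\photo.jpg", r"C:\Pics\photo.jpg"]
```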

Here are some further ideas for discussion.

All of these are in addition to what we already have. I don't think any one idea covers all situations, and what we currently have works well for some things which these ideas would not handle well at all, and vice versa.

Idea 1: "The Delete Tree"

After finding the duplicates, Opus could display a tree containing all folders that have duplicates in them. Clicking a branch of the tree would tell Opus "I want to select everything in and below this branch for deletion", and Opus would then remove that branch (and everything below it) from the tree. Opus would also remove everything that would no longer be a duplicate after the specified deletion. You could then click further branches until the tree was empty (or you could leave some branches alone to keep some duplicates).

Behind the scenes, Opus would remember the order you selected things in. This is important because it wouldn't be right to delete all duplicates in all selected folders since you might have this situation:

DirA( File1, File2 )
DirB( File1, File3 )
DirC( File2, File3 )

If DirA is clicked and then DirB is clicked, then Opus should select DirA.File1, DirA.File2 and DirB.File3 for deletion, leaving:

DirA( )
DirB( File1 )
DirC( File2, File3 )

Blindly deleting all duplicates in DirA and DirB would mean there would be no copies of File1 left. Opus has to remember that DirA was chosen before DirB and use that to decide that when File1 exists in DirA and DirB it is the DirA copy that gets the chop.

In effect, each time you click a folder in the "delete tree" you are assigning it a priority that Opus will use against the priorities of other directories when deciding what to select for deletion.
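That click-order priority rule can be sketched like this (illustrative Python; the folder and file names are from the example above, and the "keep the last-clicked copy" tiebreak is one possible rule):

```python
def select_deletions(groups, clicked_in_order):
    """groups: duplicate groups as lists of (folder, filename) copies.
    clicked_in_order: folders in the order the user clicked them.
    Marks copies in clicked folders for deletion, earliest-clicked
    first, but always leaves at least one copy of each group alive."""
    rank = {folder: i for i, folder in enumerate(clicked_in_order)}
    deletions = set()
    for group in groups:
        clicked = sorted((c for c in group if c[0] in rank),
                         key=lambda c: rank[c[0]])
        unclicked = [c for c in group if c[0] not in rank]
        if unclicked:
            deletions.update(clicked)       # a copy survives elsewhere
        else:
            deletions.update(clicked[:-1])  # keep the last-clicked copy
    return deletions

groups = [
    [("DirA", "File1"), ("DirB", "File1")],
    [("DirA", "File2"), ("DirC", "File2")],
    [("DirB", "File3"), ("DirC", "File3")],
]
```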

This idea still leaves ambiguities. Consider this:

DirA\DirX( File1 )
DirB\DirY( File1, File2 )
DirB\DirZ( File2 )

If you select DirB then clearly DirB\DirY.File1 should be deleted, but which of the File2 copies should be kept? Opus would have to either pick one arbitrarily or leave both there for you to delete in a second pass.

This "delete tree" idea is also quite useless if you have duplicates like this:

DirA\DirX( File1 )
DirA\DirY( File1 )

DirB\DirX( File2 )
DirB\DirY( File2 )

DirC\DirX( File3 )
DirC\DirY( File3 )

You might want to delete all the duplicates in *\DirY but the tree won't give you any way to do that except going through and clicking every DirY.

Idea 2: "Duplicate Select"

In the last case mentioned above, the problem could be solved by simply selecting all duplicates whose location matches *\DirY. The select command already lets us do that; the only problem is when you have something like this:

DirA\DirX( File1 )
DirA\DirY( File1 )

DirB\DirX( File2 )
DirB\DirY( File2 )

DirC\DirY( File3 )

DirD\DirY( File3 )

Deleting everything in *\DirY will get rid of the File1 and File2 duplicates but it will also get rid of both copies of File3, leaving nothing left.

To solve this there could be a special select command which avoided the situation where all files in a duplicate group got selected at once. Opus would have to make an arbitrary choice to keep one or the other.
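A sketch of such a "duplicate select" (illustrative Python; `fnmatch` stands in for Opus's pattern matching, and "keep the first match" is one arbitrary choice): matching copies are selected, but one member of any fully matched group is kept.

```python
import fnmatch

def duplicate_select(groups, pattern):
    """Select copies whose path matches the pattern, but never all
    members of one duplicate group (one copy is arbitrarily kept)."""
    selected = set()
    for group in groups:
        matches = [p for p in group if fnmatch.fnmatch(p, pattern)]
        if len(matches) == len(group):
            matches = matches[1:]   # arbitrary choice: keep the first
        selected.update(matches)
    return selected

groups = [
    [r"DirA\DirX\File1", r"DirA\DirY\File1"],
    [r"DirB\DirX\File2", r"DirB\DirY\File2"],
    [r"DirC\DirY\File3", r"DirD\DirY\File3"],
]
```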

Thoughts

The combination of these two things, plus what we have already, would cover all the cases I can think of so far, but maybe there are cases I haven't considered.

I think the "delete tree" idea is good because it's easy to understand and interactive. You click on something in the tree and immediately see the simulated results, and then you can click more things in the tree until you're left with nothing. Each click removes a load of clutter and lets you concentrate on what's left.

I am less sure about the "duplicate select" idea. I can see myself using it, and it is very close to how I think about deleting duplicates, but I wonder if it is too confusing/advanced for people who aren't die-hard Opus users like myself?

Feel free to build upon or tear apart these ideas, or suggest something completely different.

Here's another approach:

After performing a search for duplicates, DO presents a list of folders where duplicates reside (let's say they are displayed in a Duplicates panel).
Then the user reorders the folders (via drag-and-drop or some other means) into the order in which he wants deletions to be performed (first remove all duplicates from the 1st folder on the list, then from the 2nd, etc.). When the user changes the folders' order, the deletion marks in the checkboxes are updated accordingly. The user can, of course, still make selections by hand.
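The reorder-and-recompute behaviour described above might look like this (illustrative Python, not DO code): the same duplicate groups produce different deletion marks for different list orders.

```python
def marks_for_order(groups, folder_order):
    """Recompute deletion checkmarks for a given folder priority order:
    walk folders in list order, marking duplicates for deletion as long
    as at least one unmarked copy of each group would remain."""
    marked = set()
    for folder in folder_order:
        for group in groups:
            for copy in sorted(set(group) - marked):
                if copy[0] == folder and len(set(group) - marked) > 1:
                    marked.add(copy)
    return marked

groups = [[("DirA", "File1"), ("DirB", "File1")]]
```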

X.

[quote="Xyzzy"]Here's another approach:

After performing a search for duplicates, DO presents a list of folders where duplicates reside (let's say they are displayed in a Duplicates panel).
Then the user reorders the folders (via drag-and-drop or some other means) into the order in which he wants deletions to be performed (first remove all duplicates from the 1st folder on the list, then from the 2nd, etc.). When the user changes the folders' order, the deletion marks in the checkboxes are updated accordingly. The user can, of course, still make selections by hand.

X.[/quote]
I was thinking about the same thing, but does it provide anything that the "delete tree" doesn't? I think they end up doing the same thing (prioritising directories for deletion). Do you think one is easier to understand than the other?

I think the delete tree makes it easier to select branches (and everything below them).

In the tree idea I think it's also very helpful that after each selection it removes all folders that no longer need to be considered (because deleting the current selection would mean there are no longer any duplicates in them). A priority list, on the other hand, would have people making choices between unrelated folders that could already be duplicate-free based on the existing priorities.
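That "drop folders that no longer need considering" step might look like this (illustrative Python): after each selection, a folder stays in the tree only if it still holds a copy belonging to a group with more than one surviving copy.

```python
def folders_still_relevant(groups, marked):
    """groups: duplicate groups as lists of (folder, filename) copies.
    marked: copies already selected for deletion.
    Returns folders that still contain unresolved duplicates."""
    relevant = set()
    for group in groups:
        survivors = [c for c in group if c not in marked]
        if len(survivors) > 1:  # still more than one copy left
            relevant.update(folder for folder, _ in survivors)
    return relevant

groups = [
    [("DirA", "File1"), ("DirB", "File1")],
    [("DirA", "File2"), ("DirC", "File2")],
    [("DirB", "File3"), ("DirC", "File3")],
]
```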

One thing a priority list would be good for is giving priority to a single folder without including its children, but that could be provided in the delete tree by making it possible to select either an entire branch or just a single level.

The tree is cool, and allows for some more functionality.
The list requires no changes to the existing interface (well, perhaps two up/down buttons for moving a directory within the list).
The list, in its simplest form, requires some more effort from the user to handle nested/scattered folders (e.g. c:\temp\ ; c:\temp\aaa and c:\aaa\AB ; d:\eee\fff\AB). It could be solved by adding "Fold with nested", "Fold selected", "Fold with mask" and "Unfold" buttons, with no nested folding allowed.

My main point is that I'd like the function to be added (almost) transparently to the current Duplicates panel. I believe an additional tree would be too messy, unless it adds essential functionality.

X.