File search for "other" attributes

I am in the process of upgrading this computer to the Anniversary update for Windows 10. It gave me a message saying that I might have music files that were created with DRM that is tied to my installation (and apparently they are going to break it, so nice of them). So, I went to my trusty DOPUS to find all of the protected files on my computer.

I was able to use the advanced search, but I could find no way to tell the search to only show me files with a specific attribute (not an attribute in the directory entry, but an attribute in the file itself). It seems like it would be a fantastic improvement to the search tool if I could search by essentially any attribute you allow me to put into the Details view (in this case I'm looking at the /Music/Protected attribute).

I was able to solve this by sorting the search results by the protected attribute (it's a boolean value), but I could think of others that would be much harder to handle this way (having to look through hundreds of files manually). For example, what if I wanted to search my music library for MP3 files with specific text in the copyright field? How cool would that be (and slow, I get that, too).

Also, I ran DOPUS file search side-by-side with the program I've been using for years, TreeSize Professional File Search, and it was interesting. Both programs ran the search in similar time (finding only 172 files out of 1.45 million files on two drives). There were four files that DOPUS didn't find but TreeSize did (turns out the directories the files were in had a botched owner setting, Microsoft Messaging-related files, of course). There were 17 files that DOPUS found but TreeSize did not. I still don't know why this is (I don't see anything special about the files).

Thanks for a great product.

I just had another thought. I haven't really played with the JavaScript macros in DOPUS, but maybe there is a way I could iterate over the search results and remove files that don't match the attributes I care about? I tried modifying a sample script to walk through audio files in a directory, but I really can't figure out the metadata object. The documentation says, "The AudioMeta object is retrieved from the Metadata.audio or Metadata.audio_text properties." but I couldn't get it to work.

Here is my test script, I'd love some feedback:

var folderEnum = DOpus.FSUtil.ReadDir("D:\\temp", false);

while (!folderEnum.complete)
{
    var folderItem = folderEnum.next;

    if (!folderItem.is_dir)
    {
        if (folderItem.metadata == "none")
        {
            DOpus.Output("<no metadata available>");
        }
        else
        {
            try 
            {
                if (folderItem.metadata == "audio") DOpus.Output("audio defined");

                DOpus.Output("'" + folderItem + "'");                               // Prints file path, maybe 'C:\temp\testfile.mp3'
                DOpus.Output("'" + folderItem.metadata + "'");                      // Prints 'audio'    
                DOpus.Output("'" + folderItem.metadata.audio + "'");                // Prints ''
                DOpus.Output("'" + folderItem.metadata.audio.AudioMeta + "'");      // Prints 'undefined'
                DOpus.Output("'" + folderItem.metadata.AudioMeta + "'");            // Prints 'undefined'
                DOpus.Output("'" + folderItem.metadata.AudioMeta.mp3bitrate + "'"); // Throws exception
            }
            catch (e) 
            {
                DOpus.Output("error: " + e); // Prints "error: TypeError: 'folderItem.metadata.AudioMeta.mp3bitrate' is null or not an object"
            }
        }
    }
}

Have you tried using a filter? The 'Music' category contains a field 'Protected'. That might do the trick.

AudioMeta is the name of an object type (for documentation purposes, mainly). It's not an actual property name.

If folderItem is an Item object, folderItem.metadata is a Metadata object, folderItem.metadata.audio is an AudioMeta object, and folderItem.metadata.audio.mp3bitrate is the property.
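A minimal sketch of that chain, reusing the loop from the test script above. The helper name describeAudio is made up for illustration, and the typeof guard is only there so the snippet can also be loaded outside Opus. It assumes that comparing the Metadata object with == yields its type string, as the output of the original script suggests.

```javascript
// Hypothetical helper: returns one line describing an Item's MP3 bitrate.
function describeAudio(item) {
    // Loose comparison on purpose: the Metadata object is compared via its string value.
    if (!(item.metadata == "audio")) {
        return item + ": <no audio metadata>";
    }
    var audio = item.metadata.audio;   // an AudioMeta object; "AudioMeta" itself is not a property name
    return item + ": " + audio.mp3bitrate;
}

// Only runs inside Directory Opus, where the DOpus object exists.
if (typeof DOpus !== "undefined") {
    var folderEnum = DOpus.FSUtil.ReadDir("D:\\temp", false);
    while (!folderEnum.complete) {
        var folderItem = folderEnum.next;
        if (!folderItem.is_dir) {
            DOpus.Output(describeAudio(folderItem));
        }
    }
}
```

The lookup is kept in a small function so the property chain (Item, then metadata, then audio, then mp3bitrate) is written in exactly one place.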

Thanks for the response, but can you please elaborate?

Prompted by your response I searched for "filter" related to DOPUS and only found a "filter bar" that I was previously unaware of (enabled by pressing the asterisk key), but it doesn't appear to offer any way of filtering other than by pre-defined file categories (which I could find no way to customize).

The File Find, which I've used, does not reference filters. With it you can specify file names to find in a very powerful way, but I don't see any way to specify categories.

If there is a way to specify that I only want files with the "Music Category, Protected Field, set to true", I don't know where or how to do that.

[quote="jon"]AudioMeta is the name of an object type (for documentation purposes, mainly). It's not an actual property name.

If folderItem is an Item object, folderItem.metadata is a Metadata object, folderItem.metadata.audio is an AudioMeta object, and folderItem.metadata.audio.mp3bitrate is the property.[/quote]

Thank you, that is helpful.

Damn, not five minutes after I posted that previous message, I figured out that in the Advanced tab of the File Find, you can click on the "plus sign" on the left (which brings up a new condition row), and then click on the word "Name" and select other things besides file name (one of which was Music, and within music the Protected field).

To DOPUS, I have to say that this was not obvious. In fact, while this interface is very powerful, it was extremely difficult to figure out which items were actually clickable drop-down lists (turns out most of them). It wouldn't take much to make them stand out and look like something other than descriptive text (maybe just a downward-pointing triangle next to the words).

Web applications have completely robbed me (and many people) of the habit of tabbing through fields (if I had done that, I would have realized sooner that the word "Name" was actually a control). So many web applications written today don't work properly with the tab key (they jump all over the place) that I've pretty much stopped depending on it. Personally, I work very hard to put tabbing functionality into our applications, but in trials and testing very few people use the tab key on any web site, and (in my experience) this is translating into fewer users using tab in native applications as well.

OK, I'm back again to say that the File Find doesn't work 100% with regard to attributes (or at least this attribute). When I search my library of files with the following search criteria:


It found 128 files that apparently had no value in the "Protected" column. If I reverse the logic and search for Protected == No, it finds the same 128 and then a whole lot of other files that do have a displayed value of "No" in that field (I can see which files have a value in that field by adding the Protected field to the column list for the search results). I don't actually have any DRM "protected" files to test against, but I'm assuming it would include those for the == Yes search.

At first I thought it was just assuming blank meant "Yes" (I don't really know whether blank means the attribute is missing, or is an empty string, or either), but once it returned those files for both Yes and No, it's clear that it's not working (at least, it's not working the way I would have thought "Music Match Protected == Yes" should work).

And the advanced file find isn't a panacea since while it has "Music" it doesn't have "Video" or "Movies" (the column chooser has "Movies" but it seems to me it would be better to use the more generic "Video" and "Audio" rather than trying to describe one use for each type of file). Also, it has "Image" while the column chooser has two different categories for "Picture" in "Picture Dimensions" and "Picture Metadata" and under "Picture" it only includes a subset of the attributes shown in the column chooser.

To make this more interesting, when I look at the properties of a file that shows a blank value for "Protected" in DOPUS, I find that the Protected property is in the list and specifically has a "No" value (see screen shot). I would chalk this up to the Properties dialog just showing all attributes, but there are quite a few attributes that DOPUS lists that aren't in the list in the Properties dialog.


The query you have boils down to A or B or C and D.

This actually evaluates as A or B or (C and D), but what I think you want is (A or B or C) and D.

In Opus this query is built using a Subclause, like this:


But there's actually a much simpler way of doing this, by combining the three wildcards into one:


The advantage of this is that you don't need the subclause because there are only two tests (A and B).
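The precedence point can be checked in plain JavaScript, where && binds tighter than || in the same way. This is only an illustration: a, b and c stand in for the three name wildcards and d for the Protected test, and the function names are made up.

```javascript
// Hypothetical stand-ins: a, b, c are the three wildcard tests, d is the Protected test.
function asEntered(a, b, c, d) { return a || b || c && d; }     // no grouping given
function explicit(a, b, c, d)  { return a || b || (c && d); }   // how it is actually grouped
function intended(a, b, c, d)  { return (a || b || c) && d; }   // what the subclause expresses

// A file that matches the first wildcard but fails the Protected test:
var enteredResult  = asEntered(true, false, false, false);  // true
var intendedResult = intended(true, false, false, false);   // false
```

With the ungrouped form, any file matching the first or second wildcard is returned no matter what d says, which is why the subclause (or the single combined wildcard) is needed.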

[quote="jon"]The query you have boils down to A or B or C and D.

This actually evaluates as A or B or (C and D), but what I think you want is (A or B or C) and D.
[/quote]

Thanks again for the help. I was afraid of that. I looked all around for a way of "grouping" the existing rules, it didn't occur to me to look in the attribute list to create a new group. I think that even if I had thought of that, "subclause" wouldn't have caught my attention. Group might be a better word. Also, now that I see that, I think having the grouping be on the same logical level as "name" and "size" is a mistake.

It did make me wonder what "subfolder" and "filter" do and I love them both (and filter would be another way to solve this problem).

Thanks also for the alternate syntax for wildcards; I had just assumed it was the standard Windows wildcard. If I had thought of it I could have used a regular expression to combine them as well. After some testing though, the regular expression search is quite a lot slower than the wildcard search (you probably knew that).
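For reference, this is roughly the kind of combined regular expression I had in mind, written in plain JavaScript rather than in the File Find. The three extensions are placeholders I picked for the example, not necessarily the ones in my query, and matchesAny is a made-up name.

```javascript
// Hypothetical extensions: one alternation replaces three separate name tests.
var combined = /\.(mp3|wma|m4a)$/i;

function matchesAny(fileName) {
    return combined.test(fileName);
}
```

The dot is escaped and the pattern is anchored at the end, so a name like "mp3.txt" is not matched.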

The interface, as it is, doesn't allow me to grow a query dynamically. What I mean is, there is no way I can find to take my existing items and create a "group" or "subclause" out of them (or remove a row from a subclause). Instead, I have to start from scratch once I've realized that I need a group. Also, there appears to be no way to re-order the clauses so that I could change the relationships.

I think having a "handle" column that would allow me to drag-n-drop the clauses (to change the order or to drag them into a group) would be a very intuitive solution. Another choice, which could be done with, or without, the drag-n-drop, would be to have left arrow and right arrow icons, which would "indent" the items, creating a group (sequential items with the same indent are in a logical group). You could use the exact same displayed indenting that you use now (in that it would create a "subclause" element to contain them), but it would be created automatically by pressing the arrow, and would remove the item from the group if you press the back arrow. I was looking at the keyboard interface for that control as well and right and left keyboard buttons could do the indenting and un-indenting, they don't appear to do anything now.

Another choice would be to have check boxes in the left column of each item, before the "+-x" icons. Clicking multiple rows would then have an item in the menu bar that said, "group" or something and would build the subclause grouping with those moved into it. Once you had this, there are other things you could do, for example, if I had three lines selected and changed the "and/or" it would change all of the selected items. Of course, this may not be as clean with your keyboard mapping as hitting space would disable the item, rather than checking it, which might be counter-intuitive.

Also, once I've created a named filter, I would like to add it to a toolbar. That way I could search the current directory (and below if the saved filter included that) by clicking on a toolbar button that says "All large triple X files". Or, I could right-click on any folder and find a menu item of DOPUS Search -> "All large triple X files" in that folder.

Also, since you have this scripting interface, it would be cool to define a way to have a scripted item in the advanced search. It would run the script for every file found by the search and return "true" or "false" to indicate whether it matches some really complicated thing.

Also, since I'm redesigning the interface :wink:, I think it was a mistake using an "X" to mean disable, as in every other application in Windows "X" means delete or close (it's on the top right of every window). I would make that a checkbox, or some other icon at least. Strangely, when I looked at the documentation (which clearly I should have read before trying to use the Advanced File Find) the sample graphics do show a box with a check in it instead of the current X icon.

I would also use radio buttons for the items in a row (Use wildcards, Use regular expression), those clearly don't work like check boxes.

Finally, for this post anyway, it might be smart to put a "help" button directly on the advanced menu bar so that people don't have to search for the documentation about advanced searching. You could also put

OK, the Regular expressions aren't as slow as I thought. I made the mistake of putting the test for the Music attribute first in the list and the file name match second. So, apparently it was checking the properties of every one of the millions of files on my drive and then applying the filename filter.

I tried hitting stop on the search query after 30 minutes, but it refused to stop. The only way I could get it to abort was to kill the process.

This is another instance where I had to destroy and recreate my query from scratch, since I couldn't add a new item above the existing item (I had deleted the three file name tests, which left the attribute match as the only item, so I, of course, added the new RE filename test after the attribute match).

Might I suggest attaching expense ratings to each of the filter types. Anything that only requires looking in the directory structure (date, time, filename, etc.) would get an expense of 1, while any that has to open each file gets a 10, etc. You can play with these values until the relative expense makes sense. Since boolean logic is commutative at any level in your parse tree you can reorder the items by their expense, doing the cheap ones first and the expensive ones later. You're more likely to remove files from the list with the cheap tests and then you've saved having to do the expensive ones at all. Walk your parse tree before you start and sort each level. It's not a guarantee of a faster query, but it's extremely likely.
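A rough sketch of what I mean, in plain JavaScript and for a single level of AND-ed clauses only. The cost numbers, clause names and file objects are all invented for the example; nothing here is how Opus actually stores or evaluates a query.

```javascript
// Returns a copy of the clauses sorted so the cheapest tests run first.
function orderByCost(clauses) {
    return clauses.slice().sort(function (x, y) { return x.cost - y.cost; });
}

// ANDs the clauses together, stopping at the first one that fails.
function matchesAll(clauses, file) {
    for (var i = 0; i < clauses.length; i++) {
        if (!clauses[i].test(file)) return false;
    }
    return true;
}

var expensiveCalls = 0;   // counts how often the "open the file" test is run
var clauses = [
    { name: "Music Protected", cost: 10, test: function (f) { expensiveCalls++; return f.isProtected; } },
    { name: "Name match",      cost: 1,  test: function (f) { return /\.wma$/i.test(f.name); } }
];
var files = [
    { name: "a.wma", isProtected: true },
    { name: "b.txt", isProtected: false },
    { name: "c.doc", isProtected: false }
];

var ordered = orderByCost(clauses);
var hits = [];
for (var n = 0; n < files.length; n++) {
    if (matchesAll(ordered, files[n])) hits.push(files[n].name);
}
```

In this toy run the expensive test is only reached for the one file whose name already matched, instead of for all three.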

Here's another suggestion for you. The Subfolder item is interesting, but it could be more powerful.

Let's say you have two subfolders you want to search in, but you want to search for files with regular expression A in the first subfolder and regular expression B in the other subfolder (the subfolders searched would still apply the way it does). You could add a "children" node, defined as a "subclause", that is run on the children of the folders matched by Subfolder. So you might have a parse tree that looks like this:

[ol]
[li]Subfolder Match
[ol]
[li]Name Match "Folder1"[/li]
[li]And Children
[ol][li]Name Match "sweet"[/li]
[li]And Name Match RE "^.+\.(mp3)$"[/li][/ol][/li][/ol][/li]
[li]Subfolder Match
[ol]
[li]Name Match "Folder2"[/li]
[li]And Children
[ol][li]Name Match "ripe"[/li]
[li]And Name Match RE "^.+\.(mp3)$"[/li][/ol][/li][/ol][/li][/ol]

This would match any MP3 files with "sweet" in the name in folders with (base folder)"Folder1" in the name and any files with "ripe" in the name in folders with (base folder)"Folder2" in the name.

This query could be accomplished with a regular expression and the fullpath filter type, but it would be more expensive since you might be checking hundreds or thousands of extra files just to see if the final fullpath matched. This way you would recursively search only candidate folders for candidate files. And, yes, I could have mixed the ripe and the RE match, I just wanted to show multiple rules within the Children.
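To make the idea concrete, here is a small sketch in plain JavaScript over an in-memory folder tree. Everything in it is hypothetical: the tree shape, the rule objects and the search function are mine, and I am assuming "Children" means the files directly inside a matched folder.

```javascript
var mp3 = /^.+\.(mp3)$/;

// Each rule pairs a folder-name test with the "Children" tests for files in that folder.
var rules = [
    { folderText: "Folder1", fileTest: function (n) { return n.indexOf("sweet") !== -1 && mp3.test(n); } },
    { folderText: "Folder2", fileTest: function (n) { return n.indexOf("ripe") !== -1 && mp3.test(n); } }
];

// Walks the folders; file names are only tested inside folders that matched a rule.
function search(folder, path, rules, out) {
    var i, j;
    for (i = 0; i < rules.length; i++) {
        if (folder.name.indexOf(rules[i].folderText) !== -1) {
            for (j = 0; j < folder.files.length; j++) {
                if (rules[i].fileTest(folder.files[j])) out.push(path + "/" + folder.files[j]);
            }
        }
    }
    for (i = 0; i < folder.folders.length; i++) {
        search(folder.folders[i], path + "/" + folder.folders[i].name, rules, out);
    }
    return out;
}

var tree = { name: "base", files: ["sweet ripe.mp3"], folders: [
    { name: "My Folder1",  files: ["sweet song.mp3", "ripe song.mp3", "sweet notes.txt"], folders: [] },
    { name: "Folder2 old", files: ["ripe tune.mp3", "sweet tune.mp3"], folders: [] }
] };

var found = search(tree, "base", rules, []);
```

The file in the base folder is never tested at all, which is the saving over a single full-path regular expression.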