[Suggestion] Improve FAYT searching by non-adjacent terms

Other tools offer an on-the-fly search feature similar to the substring searching/highlighting that was newly introduced in DOpus 11.

I want to suggest that DOpus also matches non-contiguous substrings if there is no exact match in the file list. In other words, if the user types "reca", FAYT would apply the term as the regex ".*r.*e.*c.*a.*", which would highlight "Red Fast Car.jpg". More formally, this would allow the user to enter any sequence of characters (e.g. only the first character of each word) to quickly locate files and folders based on multiple known parts of the name (which don't need to be adjacent). To make it more powerful, an all-lowercase search string should be applied case-insensitively, but if there is at least one uppercase character the search string should be applied case-sensitively. In other words, the term "ECa" produces the regex ".*E.*C.*a.*", which obviously won't match "Red Fast Car.jpg" but will match "Electric Car.jpg".
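To make the rule concrete, here is a minimal Python sketch of the idea. It is not how DOpus works internally; the function names `fayt_pattern` and `fayt_match` are made up for illustration. It builds the ".*r.*e.*c.*a.*" style pattern from the typed term and switches to case-sensitive matching as soon as the term contains an uppercase character.

```python
import re

def fayt_pattern(term: str) -> re.Pattern:
    """Build the '.*r.*e.*c.*a.*' style regex for a typed term."""
    # Escape each character so that '.', '(' etc. are matched literally.
    body = ".*".join(re.escape(ch) for ch in term)
    # Smart case: an all-lowercase term ignores case, anything else is exact.
    flags = 0 if any(ch.isupper() for ch in term) else re.IGNORECASE
    return re.compile(".*" + body + ".*", flags)

def fayt_match(term: str, name: str) -> bool:
    """True if the characters of term occur in name in the typed order."""
    return fayt_pattern(term).fullmatch(name) is not None

print(fayt_match("reca", "Red Fast Car.jpg"))  # True
print(fayt_match("ECa", "Red Fast Car.jpg"))   # False
print(fayt_match("ECa", "Electric Car.jpg"))   # True
```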

See the attachment for another example!


Not sure I would like it as it would match too much.

However, maybe something like what some sites and programs have would be better: multiple partial matches separated by spaces.
Example:
You could search for "re te" (without the quotes) to match "regular track test.txt"
Basically it first tries to match as is, then tries to match it as partial words in the same filename/ext.

It shouldn't be so hard to implement as it should simply try to partial-match (as it already does) each of the space separated
words to the same filename, where the matches are in the same order in the filename.
In other words, a search for "re te" won't match "test regular", but will match "regular track test".

This should make it possible to do searches similar to what you suggested except it would require spaces
between the characters to trigger the behavior.
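A rough Python sketch of the "re te" idea, just to pin down the ordering rule. It only covers the second step (the space-separated partial words), not the "match as is" step that would run first. The name `ordered_prefix_match` and the choice to treat any run of letters/digits as a word are my assumptions.

```python
import re

def ordered_prefix_match(term: str, name: str) -> bool:
    """True if every space-separated part starts a word, in the typed order."""
    parts = term.lower().split()
    words = re.findall(r"\w+", name.lower())
    pos = 0  # index of the next word that may still be used
    for part in parts:
        # Advance to the first remaining word that begins with this part.
        while pos < len(words) and not words[pos].startswith(part):
            pos += 1
        if pos == len(words):
            return False
        pos += 1  # later parts must match later words
    return True

print(ordered_prefix_match("re te", "regular track test.txt"))  # True
print(ordered_prefix_match("re te", "test regular"))            # False
```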

[quote="myarmor"]This should make it possible to do searches similar to what you suggested except it would require spaces
between the characters to trigger the behavior.[/quote]

Yes, this would significantly reduce the number of matching candidates. Unfortunately I have a lot of Java class files, none of which have spaces in their names, so separating at word boundaries won't help in my use case.

Maybe there is another way of reducing the number of matching files. If each character in the search term can only be the first character of a word, there will be far fewer matches, and you don't have to enter a space as a separator. The beginning of a word is either a change from lowercase to uppercase or a preceding whitespace.

Examples:

Term: redcar
Match: red car, red fast car, RedFastCar, RedfastCar
No-Match: redfastcar, RedFastcar
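Here is a small Python sketch of one reading of this rule that reproduces the examples above: the term is cut into pieces, and each piece has to be the beginning of a later word, where words start after whitespace or at a lowercase-to-uppercase change. The exact cutting rule is my interpretation, and `split_words` / `word_start_match` are invented names.

```python
import re

def split_words(name: str) -> list[str]:
    """Split at whitespace and at lowercase-to-uppercase transitions."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    return spaced.split()

def word_start_match(term: str, name: str) -> bool:
    """True if the term can be cut into pieces that each begin a later word."""
    words = [w.lower() for w in split_words(name)]
    term = term.lower()

    def walk(t: int, w: int) -> bool:
        if t == len(term):
            return True  # whole term consumed
        for i in range(w, len(words)):
            word = words[i]
            # Try every prefix of this word that matches the term at position t.
            k = 0
            while k < len(word) and t + k < len(term) and word[k] == term[t + k]:
                k += 1
                if walk(t + k, i + 1):
                    return True
        return False

    return walk(0, 0)

for name in ["red car", "red fast car", "RedFastCar", "RedfastCar",
             "redfastcar", "RedFastcar"]:
    print(name, word_start_match("redcar", name))
```

The first four names print True and the last two print False, which is the Match / No-Match split listed above.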

Let's say that a space is still used as the separator between partials in the search, and that matching stops after the first successful match in a filename.
By "stops" I mean that if check 1 matches the filename, it won't try check 2 against the same file, and so on.

Why not have it like this

  1. check partial match including spaces, iow "ar track"="regular track" (current behavior)
  2. check partial words where the matches are in the same order in the filename, iow "re te"="regular track test" (as suggested above)
  3. check partial match where the matches are in the same order but can be mid-word, iow "he wo"="helloworld" (charactermatch is casesensitive if character is uppercase).

The last test allows something similar to what you suggested because with this you could write
"sing xy ada" and it would match the same as in your example.
If you use a space between each character then it would also match the exact thing you initially suggested (".*r.*e.*c.*a.*")

Using spaces (between word/chars, not trailing) to trigger the additional behavior should limit the performance impact.
It should probably still have an additional enable/disable option in preferences if implemented.

I can even imagine a variant where each character in the term can match the start of a new word, with the word boundaries defined by a splitting regex in the advanced preferences. Any user who wants narrower or wider matching behaviour would then just have to provide an appropriate regex.

Huh? I haven't mentioned anything about regexes.
I mean that if you search for "r e c a" it would behave in the same way as what you initially suggested because the space separated
characters are in order, hence you would get the same match as ".*r.*e.*c.*a.*".

Yes, I understand your suggestion. The idea of having a regex in the preferences for word splitting came to my mind independently of that. :slight_smile:

Regarding your suggestion: it would be a problem if you want to locate "helloworld.jpg" and decide to enter "he wo", but whether "helloworld.jpg" matches depends on the other files in the same folder. I think this would lead to a lot of confusion for people who don't understand the exact behaviour. I think the matching file set should always be stable.

Thus I propose to have a splitting regex for word matching (in the preferences) which probably would be "\b" by default.

Example:

Filename: "Hello FooBar.jpg"
Splitting Regex: "\b" -> {Hello} {FooBar}
Splitting Regex: "\b|(?=[A-Z])" -> {Hello} {Foo} {Bar}

Obviously a term "heba" would only match for the wider splitting regex.
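To show what the two splitting regexes would do, here is a quick Python sketch (Python 3.7+ is needed for splitting on zero-width matches). `split_name` is a made-up helper, and unlike the example above it does not strip the extension, so "jpg" shows up as a word too.

```python
import re

def split_name(name: str, splitter: str) -> list[str]:
    """Split a name with a (possibly zero-width) regex and keep the word-like pieces."""
    pieces = re.split(splitter, name)
    # Drop empty strings and pure separator pieces such as ' ' and '.'.
    return [p for p in pieces if p and p[0].isalnum()]

narrow = split_name("Hello FooBar.jpg", r"\b")
wide = split_name("Hello FooBar.jpg", r"\b|(?=[A-Z])")
print(narrow)  # ['Hello', 'FooBar', 'jpg']
print(wide)    # ['Hello', 'Foo', 'Bar', 'jpg']
```

With the narrow split no word starts with "ba", so "heba" has nothing to attach its second half to. With the wide split "Bar" becomes a word of its own and "he" + "ba" can match "Hello" and "Bar".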

[quote]Yes, I understand your suggestion. The idea of having a regex in the preferences for word splitting came to my mind independently of that. :slight_smile:

Regarding your suggestion: it would be a problem if you want to locate "helloworld.jpg" and decide to enter "he wo", but whether "helloworld.jpg" matches depends on the other files in the same folder. I think this would lead to a lot of confusion for people who don't understand the exact behaviour. I think the matching file set should always be stable.[/quote]
It is.
The reason for "he wo" matching is that it would try to charactermatch each space separated part, in other words
it'll try to match "he" anywhere in the filename, followed by "wo" at a later position in the same filename.

When I use the term charactermatch I mean that when it scans the string it compares each character in the
search part to the filename: if the search character is uppercase it requires the same in the filename, but if the character is
lowercase then that character is case-insensitive.
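In code the charactermatch rule could look roughly like this. It is a plain Python sketch of the description above, not anything taken from Opus, and the helper names are invented.

```python
def char_match(search_ch: str, name_ch: str) -> bool:
    """Uppercase search characters are exact; lowercase ones ignore case."""
    if search_ch.isupper():
        return search_ch == name_ch
    return search_ch == name_ch.lower()

def find_part(part: str, name: str, start: int) -> int:
    """Return the index just past the first occurrence of part at or after start, or -1."""
    for i in range(start, len(name) - len(part) + 1):
        if all(char_match(p, name[i + j]) for j, p in enumerate(part)):
            return i + len(part)
    return -1

def ordered_parts_match(term: str, name: str) -> bool:
    """Each space-separated part must occur after the previous one."""
    pos = 0
    for part in term.split():
        pos = find_part(part, name, pos)
        if pos < 0:
            return False
    return True

print(ordered_parts_match("he wo", "helloworld.jpg"))  # True
print(ordered_parts_match("He wo", "helloworld.jpg"))  # False
print(ordered_parts_match("He wo", "HelloWorld.jpg"))  # True
```

Whether a file matches depends only on the term and that file's own name, never on the other files in the folder.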

That's a good rule!

PS: I think it would be nice to get some feedback from Leo or Jon if this topic is of any interest at all ...

I think these cases can already be handled using wildcards in the filter bar or, when needed, more advanced things via Find. Or using a script if you need something really esoteric.

Making FAYT so complicated would be a bad thing for most people.

[quote]I think these cases can already be handled using wildcards in the filter bar or, when needed, more advanced things via Find. Or using a script if you need something really esoteric.

Making FAYT so complicated would be a bad thing for most people.[/quote]
I wouldn't say it is complicated, as it doesn't use any kind of wildcards or similar, only straightforward text.

The only difference is what it does when it has a search that includes a space AND it doesn't already have a
match for the filename by using the current behavior.

For example, you enter a folder and want to open your favorite picture showing a red car. So you enter "red car", but DOpus is smart enough to select "red fast car.jpg" because there is no "red car.jpg". You simply forgot that the car was not only red but also fast!

To be honest I think that would make things easier instead of more complicated! :thumbsup:

@AKA-Mythos, I think we might be talking partially past each other, although the result is the same in your example.
Just to ensure we're on the same page, when I say "doesn't already have a match for the filename" I mean exactly that, not that the
search didn't match any filenames at all. In other words, if you have this filelist:
red car.jpg
red fast car.jpg
redo the scar.jpg
and you search for "red car" then with the described behavior it would match
red car.jpg (current behavior)
red fast car.jpg (new)
redo the scar.jpg (new)
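A compact Python sketch of that per-file fallback, using the rule numbers from the 1/2/3 list earlier in the thread. To keep it short it ignores the uppercase rule and matches everything case-insensitively, which is a simplification on my part; `matching_rule` is an invented name.

```python
import re

def matching_rule(term: str, name: str) -> int:
    """Return 1, 2 or 3 for the first rule that matches this file, or 0 for none."""
    parts = [re.escape(p) for p in term.split()]
    if not parts:
        return 0
    # Rule 1: the term as typed, spaces included (the current behavior).
    if term.lower() in name.lower():
        return 1
    # Rule 2: every part starts a word, in the same order.
    if re.search(r".*?".join(r"\b" + p for p in parts), name, re.IGNORECASE):
        return 2
    # Rule 3: every part occurs anywhere, even mid-word, in the same order.
    if re.search(r".*?".join(parts), name, re.IGNORECASE):
        return 3
    return 0

for name in ["red car.jpg", "red fast car.jpg", "redo the scar.jpg"]:
    print(name, "->", matching_rule("red car", name))
# red car.jpg -> 1
# red fast car.jpg -> 2
# redo the scar.jpg -> 3
```

Each file is tested on its own, so a later rule is only tried for a file that the earlier rules did not already match.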

Yes, we have slightly different ideas about how this feature might work in detail. But unless more users show interest in this feature, there is little chance of seeing it implemented (since Leo said this idea was too complicated). To be honest, I'm not sure whether it was the discussion that was too complicated or the feature itself (which I think would be a great help).

There isn't much we can do about that.

The discussion was probably simple enough, but even I reacted a bit when you mentioned regexes in the middle.
I understand what you meant, although I had to reread it a couple of times with the "regex hat on" to get it.
Other than that the discussion should be simple enough.

Btw, this Java code might be interesting in relation to 3 (it isn't my site).
It doesn't have the suggested charactermatch processing, but by replacing "*" with " " you would already get something similar to 3.
I don't do Java, only enough to translate to other languages.

It just came to me that the easiest way to explain 3. is that it is treating space as a wildcard character ("*"), but with additional character matching.
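Seen that way, the typed text can be translated into a pattern directly. A hedged Python sketch (the function name `term_to_regex` is invented, and nothing here reflects how Opus handles wildcards internally): each space becomes ".*", each lowercase letter becomes a two-letter class, and everything else is taken literally.

```python
import re

def term_to_regex(term: str) -> re.Pattern:
    """Translate a typed term: spaces act like '*', lowercase letters ignore case."""
    out = []
    for ch in term.strip():  # leading/trailing spaces do not trigger anything
        if ch == " ":
            out.append(".*")
        elif ch.islower():
            out.append("[" + re.escape(ch) + re.escape(ch.upper()) + "]")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))

print(term_to_regex("he wo").pattern)  # [hH][eE].*[wW][oO]
print(bool(term_to_regex("he wo").search("helloworld.jpg")))  # True
print(bool(term_to_regex("He wo").search("helloworld.jpg")))  # False
```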

@Leo @Jon

Could you guys provide a little bit of direction about how I would get started writing an Opus script that would be a bit like this modified FAYT behavior originally pointed out by @AKA-Mythos?

The requirements would be:
1: Very fast to activate (similar to FAYT)
2: Instant highlight or filter feedback in the Lister UI (again, similar to FAYT)
3: Use Filename, Extension, Comments, Tags and Label as potential strings to do partial matches against (Just like @AKA-Mythos suggested)

Thanks for any guidance that you can provide.

I think a script which met those requirements would be very difficult, if not impossible.

I'd like to share my 2 cents. I did something "scripty" to select items in "and"-mode, something like what was suggested a while back:
Entering "red" and "car" would select:

red car.jpg (current behavior)
red fast car.jpg (new)
redo the scar.jpg (new)

This kind of works, but it's slow and inconvenient in comparison to just using the bar or find tool. If the filter bar provided a little checkbox/hotkey or special character to switch from wildcard to this simple "and"-mode it would be really cool. The find tool in v12 now has "match any word", which is exactly what I'd like to use in the (filter) bar as well.
This would be very handy if your files stick to a naming/tagging scheme or you cannot exactly remember the name of the items.
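For reference, the "and"-mode logic itself is tiny. This is a Python sketch of the idea only, not my actual Opus script; "blue car.jpg" is an extra made-up file to show a non-match.

```python
def and_mode_match(query: str, name: str) -> bool:
    """True if every word of the query occurs somewhere in the name, in any order."""
    lowered = name.lower()
    return all(word in lowered for word in query.lower().split())

files = ["red car.jpg", "red fast car.jpg", "redo the scar.jpg", "blue car.jpg"]
selected = [f for f in files if and_mode_match("red car", f)]
print(selected)  # ['red car.jpg', 'red fast car.jpg', 'redo the scar.jpg']
```

Because the order does not matter, entering "car red" selects the same three files.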

Maybe it's even time to reconsider the wildcard default handling in the filter bar. I almost never enter wildcards there, and when I do, they are very basic, like *(aaa|bbb), never anything more advanced, since handling the brackets and pipe characters is inconvenient most of the time.

The Google way of finding things might be more appropriate here - or as an alternative mode of course.

  • add "+" to make the word a "must have"
  • add "-" to make the word a "must not have"
  • wrap query into "" to match exactly
  • use * to allow for partial matches
    etc.

Much easier and more versatile than handling the wildcards, which do not allow a "match any word"/"and" mode at all.
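A rough Python sketch of how such a query could be evaluated against a filename. Several details are my own assumptions and not an existing Opus feature: words without a prefix work like "match any word" (at least one of them has to hit), a quoted phrase has to appear as-is somewhere in the name, and a word without * has to equal a whole word of the name. All function names are invented.

```python
import re

# An optional +/- sign followed by either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse(query: str):
    """Split a query into (sign, text, is_phrase) tuples."""
    return [(sign, phrase or word, bool(phrase))
            for sign, phrase, word in TOKEN.findall(query)]

def token_hits(text: str, is_phrase: bool, name: str) -> bool:
    lowered = name.lower()
    if is_phrase:
        return text.lower() in lowered  # quoted: must appear exactly as typed
    # '*' matches any run of characters; without it the whole word must be equal.
    pattern = ".*".join(re.escape(p) for p in text.lower().split("*"))
    return any(re.fullmatch(pattern, w) for w in re.findall(r"\w+", lowered))

def query_match(query: str, name: str) -> bool:
    optional_seen = optional_hit = False
    for sign, text, is_phrase in parse(query):
        hit = token_hits(text, is_phrase, name)
        if sign == "+" and not hit:
            return False  # a "must have" word is missing
        if sign == "-" and hit:
            return False  # a "must not have" word is present
        if sign == "":
            optional_seen = True
            optional_hit = optional_hit or hit
    return optional_hit or not optional_seen

print(query_match("+red -fast", "red car.jpg"))       # True
print(query_match("+red -fast", "red fast car.jpg"))  # False
print(query_match("+re*", "redo the scar.jpg"))       # True
```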
My point of view only of course, thanks for reading. o)