Finding files based on slightly unconventional attributes

I have the following questions about finding files based on slightly unconventional attributes.

Using Find in Directory Opus, how do I find all files which:

  • Use non-English, non-ASCII or otherwise non-standard characters in their names,
    e.g. વાર્ષિક.pdf, or
    any file using characters other than A-Z, a-z, 0-9 and a specified set of special characters (hyphen - etc.) in its file name

  • Have full names longer than 260 characters (the length of the file name including the full path)

Something like this:

To try out the different kinds of Find and see how they work,

I have a Test folder with the following files in it.
The names are listed in a Print Folder to PDF output, available in TestFiles.zip (218.1 KB)

I am trying to use Directory Opus Find to find all files that:

  1. do NOT contain a space in the name.
  2. do contain a space in the name.
  3. have characters other than A-Z a-z 0-9 in the name.
  4. have any of the characters = + ~ ^ @ , % $ ! - ' (space) in the name.
  5. do NOT have any of the characters = + ~ ^ @ , % $ ! - ' (space) in the name.
  6. have characters other than A-Z a-z 0-9 = + ~ ^ @ , % $ ! - ' (space) (i.e. have a foreign character, or any character other than those currently used in the test file names)

Queries

I could successfully use Find for 1, 2, 4, 5.

But I cannot figure out how to write the regular expression for 3 and 6.

I tried variations of the response above, but they did not work.

Any help figuring out the regular expressions for 3 and 6 above would be highly appreciated.

The regex I gave you already finds anything that has any characters other than A-Z, 0-9, space, - and . in its name.

(You don't have to worry about lowercase a-z if you turn off the "case sensitive" option.)

You can add more characters before the ]. Some need a \ before them to escape them, since they have special meanings otherwise. That is why there is \- and \. instead of just - and . in the list.

The basics of the regex syntax are here in the manual:

https://www.gpsoft.com.au/help/opus12/index.html#!Documents/Regular_Expression_Syntax.htm

OK, I got something working, and found something during testing that I did not expect.

If I give Name Match [^A-Z0-9 -.=+~^@,%$!'], it shows me only the two files with અ in their names.

These files also have ABC as part of their names, which is fine.

I have added two more files to the directory: અ.txt (a file name that, excluding the extension, contains only foreign characters) and ABC.doc.

If I give the following:
Name match [x]: it shows all files that have the extension .txt.
Name match [o]: it shows ABC.doc, as it is the only file with an o in its name (as part of .doc).

This to some extent defeats the purpose, as all files, irrespective of the name part, will usually have some standard extension. The extension, including the dot, needs to be ignored for some of the finds I require (described at the end of this post).
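The behaviour described above can be reproduced outside Opus. This is a minimal Python sketch, not Opus syntax: `name_matches` is a hypothetical helper, and it assumes the pattern may match anywhere in the name, which is what the [x] and [o] results suggest.

```python
import os
import re

def name_matches(pattern: str, filename: str, ignore_extension: bool = False) -> bool:
    """Return True if `pattern` matches anywhere in the name.

    With ignore_extension=True the last dot and everything after it
    are stripped before matching (hypothetical helper, for illustration).
    """
    target = os.path.splitext(filename)[0] if ignore_extension else filename
    return re.search(pattern, target) is not None

# [x] hits the "x" in ".txt" and [o] hits the "o" in ".doc",
# unless the extension is removed first.
print(name_matches("[x]", "અ.txt"))
print(name_matches("[x]", "અ.txt", ignore_extension=True))
print(name_matches("[o]", "ABC.doc"))
print(name_matches("[o]", "ABC.doc", ignore_extension=True))
```

Stripping the extension is done here in Python; whether the Find panel offers an equivalent option is not established in this thread.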

So, based on the above observation, my requirement, properly stated, becomes: find all files where only the name part is considered, without the extension being taken into account. How do I achieve this?

To extend the query at the start: I also want to find all files with ONLY foreign characters in the name. That is, the file name should NOT contain any of the characters in the above regular expression at all. In other words, the full file name (excluding the extension) should be made up of foreign characters.

One other variation I will want is a name made up entirely of foreign characters and any of the special characters above. In short, no A-Z. Extensions should be ignored in all cases.

You should escape the - and . characters, like I did in my example. The \ before each of them was important.

Some of the others may also need escaping, although I would have to do tests to be sure. If in doubt, escape the character, since it never hurts. For example ^ has special meaning inside [...] blocks, at least at the start of them, so I would escape that as well. (It's possible you don't need to. Depends on the exact regex flavour, and I can never remember which needs what.)

[^A-Z0-9 \-\.=+~\^@,%$!']
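To see why the escaping matters, the class above can be tried in Python's `re` module. This is only a sketch: Python's regex flavour is not necessarily the one Opus uses, `re.IGNORECASE` stands in for turning off "case sensitive", and `has_non_standard_char` is a hypothetical helper name.

```python
import re

# The escaped, negated class from the reply above.
NON_STANDARD = re.compile(r"[^A-Z0-9 \-\.=+~\^@,%$!']", re.IGNORECASE)

def has_non_standard_char(filename: str) -> bool:
    """True if the name contains any character outside the allowed list."""
    return NON_STANDARD.search(filename) is not None

print(has_non_standard_char("ABC.doc"))     # only allowed characters
print(has_non_standard_char("ABC અ.txt"))   # contains a non-listed character

# Unescaped, " -." inside [...] is read as a range from space to dot,
# which silently includes characters such as "#".
print(re.search(r"[ -.]", "#") is not None)    # range: "#" is inside it
print(re.search(r"[ \-\.]", "#") is not None)  # literals only: no match
```

In this flavour the unescaped range happens to cover most of the listed punctuation anyway, but it also lets through characters that were never meant to be allowed, which is why escaping is the safer habit.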

I'm confused about why you are doing that. It seems to go against your requirements, or at least your initial ones.

Could you rephrase that?

Or maybe provide a list of examples.

And please include all details up-front, if there are any others you haven't specified yet, otherwise this will take a lot longer. If there are other requirements then it could completely change the way the problem needs to be tackled.

I have escaped ALL the characters. It is just that the forum's HTML rendering drops the \ when it displays the regex as text. To give an example, this is what I actually use:
image

This is linked above the post editor (except on mobile), and essential if you want to post code, regex, etc.

BACKGROUND and BASIC ASSUMPTIONS
I consider all characters other than these to be foreign characters.

The following characters (everything except A-Z in above) are special characters.

The extension, including the dot, is to be ignored when finding files.

TEST SCENARIO
The test directory contains a total of 24 files. The complete Print Folder to PDF output with all 24 test file names is available in TestFiles.zip (219.9 KB)

For test purposes only, these are the foreign characters I use in my test file names. (Please note that it could be any other foreign characters as well in real life)
અ બા ક

REQUIREMENT
Ignore the extension, including the dot, when finding files for all the queries below.

  1. Find all files with only foreign characters in name. The regular expression used in Find tool should give me only following files from the 24 in the directory.
    image

  2. Find all files whose names contain (at least one foreign character) AND (in which all other characters, if present, are special characters). (That is, no A-Z anywhere in the name.) The regular expression used in the Find tool should give me only the following files from the 24 in the directory.
    image

  3. Find all files which contain
    ((at least one or more foreign characters) AND
    ((one or more special characters) OR (A-Z)))

    as part of name. (That is no files to be found if it does not contain at least one foreign character in name) The regular expression used in Find tool should give me only following files from the 24 in the directory.
    image

  4. Find all files which contain NO foreign characters anywhere in its name. The regular expression used in Find tool should give me only following files from the 24 in the directory.

  5. Find all files which contain only A-Z OR 0-9 in its name (That is no files to be found with any foreign or special characters other than 0-9 in it). The regular expression used in Find tool should give me only following files from the 24 in the directory.
    image

  6. Find all files which contain only A-Z in its name (no 0-9 or foreign or special characters anywhere). The regular expression used in Find tool should give me only following files from the 24 in the directory.
    image

  7. Find all files which contain only special characters (excluding 0-9) in their names. (That is, no A-Z, 0-9 or foreign characters anywhere in the name.) The regular expression used in the Find tool should give me no files in this case, as there are no files in the test directory with only special characters in their names.
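As a rough starting point for queries 1, 2 and 4, the following Python sketch shows one way such patterns could be structured. It rests on several assumptions that are not confirmed in this thread: the letters are A-Z and a-z, the special characters are 0-9 plus the punctuation listed earlier (= + ~ ^ @ , % $ ! - ' and space), everything else counts as foreign, the extension is whatever follows the last dot, and the whole name must match. The pattern names are hypothetical, and the syntax is Python's, so it may need adjusting for the Find tool.

```python
import re

# Assumed character sets (see the assumptions in the text above).
LETTERS = "A-Za-z"
SPECIAL = r"0-9 \-=+~\^@,%$!'"
EXT = r"\.[^.]+"  # the last dot plus the extension after it

# Query 1: the name part consists of foreign characters only.
ONLY_FOREIGN = re.compile(rf"[^{LETTERS}{SPECIAL}.]+{EXT}")

# Query 2: at least one foreign character, no letters; special characters allowed.
FOREIGN_AND_SPECIAL = re.compile(
    rf"[^{LETTERS}.]*[^{LETTERS}{SPECIAL}.][^{LETTERS}.]*{EXT}"
)

# Query 4: no foreign characters anywhere in the name part.
NO_FOREIGN = re.compile(rf"[{LETTERS}{SPECIAL}]+{EXT}")

def matches(pattern: re.Pattern, filename: str) -> bool:
    """True only if the pattern matches the entire file name."""
    return pattern.fullmatch(filename) is not None

print(matches(ONLY_FOREIGN, "અ.txt"))
print(matches(FOREIGN_AND_SPECIAL, "અ-1.txt"))
print(matches(NO_FOREIGN, "ABC.doc"))
```

The negated classes also exclude the dot so that the name part cannot swallow the extension separator; names with several dots would need a different treatment.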

P.S. Note that I manually copied all the required outputs shown above into a separate folder in order to provide the screenshots of the find results. What I actually want to know is how to write the Find tool regular expressions for the above 7 queries, which cover most of the common requirements I have as of now.

So this is a lot more complicated than the initial question implied.

We can help a bit with this type of thing, but we don't have time to work out seven different complicated regular expressions for you. If you want that kind of thing, to that level of detail/complexity, you'll need to read up on regex to become self-sufficient in them.

I have already started reading more on regex.

However, if you can help with three of the queries, namely 1, 2 and 4, it will be highly appreciated.

Also, I do not know how to ignore the extension (including the dot) when searching using Find. This applies to all my queries.

Is there any attribute that allows searching only the name part (e.g. like the one for the extension)?

You'd need to make (some of) your regex account for the extension.

e.g. [A-Z]+\..* will match a file with 1 or more of the letters A-Z before a dot and then anything else.

If you have files with multiple dots in them and only want the last one to be considered the extension, then [A-Z]+\.[^\.]+ might be better.
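The difference between the two patterns can be checked with a short Python sketch. It assumes the pattern has to match the whole file name (hence `fullmatch`); if the tool accepts partial matches, the patterns would need `^` and `$` anchors to behave the same way. `re.IGNORECASE` stands in for turning off "case sensitive", and the constant names are made up for this example.

```python
import re

# First suggestion: letters, a dot, then anything (including more dots).
ANY_AFTER_FIRST_DOT = re.compile(r"[A-Z]+\..*", re.IGNORECASE)

# Second suggestion: letters, a dot, then one or more non-dot characters,
# so only a single dot (the extension separator) is allowed.
LAST_DOT_ONLY = re.compile(r"[A-Z]+\.[^\.]+", re.IGNORECASE)

for name in ["ABC.doc", "ABC.tar.gz", "AB1.doc", "અ.txt"]:
    first = ANY_AFTER_FIRST_DOT.fullmatch(name) is not None
    second = LAST_DOT_ONLY.fullmatch(name) is not None
    print(f"{name!r}: first={first}, second={second}")
```

Here `.` outside square brackets stands for any single character, so `.*` means "any run of characters", while `[^\.]+` means "one or more characters that are not a dot".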

I understood the first part of this regex. [A-Z]+\. checks for 1 or more of the letters A-Z before a dot.

  • In the remaining part, .*, the * signifies any characters.
    Can you please explain what the dot in .* signifies?

  • I cannot understand what [^\.]+ actually does.

There's a section on regex in the Opus manual which explains these terms.