Search for RTLO characters is too hard

I've been reading about RTLO (Right-to-Left Override) characters, such as 0x202E (U+202E), so I thought I would use Dopus to see a) whether I have any files named with RTLO characters, and b) whether I can find one that I created myself. I figured I could search the same way I would search for anything with a 'z' in it, namely "z.". So I searched for "0x202E." in the quick search (except, of course, I copied the string with the actual character in there, so four characters). However, it returns everything in the directory (basically the same as if I had only typed "**.*"). I did find that if I copy and paste the exact whole filename into quick search it does find the file, but that might still just indicate that it's ignoring the RTLO character.

Next I brought up the advanced Find and added a single rule, Name Match "0x202E.*", but strangely that returned nothing (it did not find the file I had created for it to find). Then I tried regular expressions, searching for "[\u202E]+", and that finally worked.

This seems like a bug in the wildcard search? It seems like it shouldn't be so hard to find strings containing RTLO characters, should it? Or is this somehow a feature that I'm just not seeing?

I do need to admit that I have never learned a language that required right-to-left encoding, so maybe this way of working is normal and required to support these languages?

The full regular expression to find any file whose name contains any of the bidirectional control characters (RTLO and its relatives) is:
[\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]+

That's the only one I would expect to work, as it's the only method covered by the documentation on the regex syntax Opus uses (the ECMAScript syntax as described here):

  • A unicode escape sequence of the form \uhhhh. Matches a character in the target sequence that is represented by the four hexadecimal digits hhhh.

Finding arbitrary unicode control characters has always worked this way, in my experience, except that many applications don't let you do it at all.
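If you ever want to run the same check outside Opus, here is a minimal Python sketch of the idea. It uses the same character class as the regex above (written with ranges), and the function names `has_bidi_control` and `find_bidi_names` are just ones I made up for illustration. Treat it as a rough starting point, not a tested tool.

```python
import re
from pathlib import Path

# Same characters as the regex above: U+200E, U+200F, U+202A..U+202E, U+2066..U+2069.
# Python's re module understands \uXXXX escapes in str patterns.
BIDI_CONTROLS = re.compile(r"[\u200E\u200F\u202A-\u202E\u2066-\u2069]")


def has_bidi_control(name):
    """Return True if the name contains at least one bidi control character."""
    return BIDI_CONTROLS.search(name) is not None


def find_bidi_names(root):
    """Walk root recursively and return paths whose final component matches."""
    return [p for p in Path(root).rglob("*") if has_bidi_control(p.name)]


if __name__ == "__main__":
    # Print with repr() so the invisible characters show up as escapes.
    for path in find_bidi_names("."):
        print(repr(path.name))
```

Printing the names with `repr()` matters here, because otherwise the terminal may apply the override and show you the spoofed name again.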

As an aside, in Opus you can avoid being tricked by RTLO exploits by turning on the file extension column, and/or by setting up wildcard labels to highlight exe/script/batch extensions. I do that in folders where I download and extract files. It's unlikely to be an issue in any other folder, so I just do it for specific ones and keep the other folders normal.
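The same "look at the real extension" idea can be sketched in Python for anyone who wants to script a check over a download folder. This is only an illustration: the `RISKY_EXTENSIONS` set is my own example list, not something taken from Opus, and `real_extension` and `is_suspicious` are hypothetical helper names.

```python
import os

# Bidirectional control characters, same set as the regex earlier in the thread.
BIDI_CONTROLS = set(
    "\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069"
)

# Illustrative list only (an assumption); adjust to whatever you consider risky.
RISKY_EXTENSIONS = {".exe", ".scr", ".bat", ".cmd", ".com", ".js", ".vbs", ".ps1"}


def real_extension(name):
    """Return the extension as stored in the name, ignoring how it is displayed."""
    return os.path.splitext(name)[1].lower()


def is_suspicious(name):
    """Flag names that contain a bidi control AND really end in a risky extension."""
    has_control = any(ch in BIDI_CONTROLS for ch in name)
    return has_control and real_extension(name) in RISKY_EXTENSIONS


if __name__ == "__main__":
    # Stored as "invoice<RTLO>fdp.exe"; a bidi-aware display may show "invoiceexe.pdf".
    sample = "invoice\u202Efdp.exe"
    print(repr(sample), real_extension(sample), is_suspicious(sample))
```

The point of the design is that the extension is always read from the stored character order, which is what the operating system uses, rather than from the visual order that the override produces.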


You can paste 0x202E from the character map into Everything - with a cute side effect:


Both excellent suggestions, thank you!