"Simple" RegExp not working as expected

It seems like lately I'm either fighting with Opus or RegExps, or both...

I can't figure out why the RegExps in the following pics aren't working as I would have expected them to work...

This is the RegExp that looks to be doing what I want: (taking "-r, -o, or -r-o" off the end)


This is the one that I would have thought would've worked:

Not only do I not understand why the 2nd is NOT working.. I can't figure out why the 1st is working at all. (Although I haven't tested plain "-o" yet.) This is going to be in a menu button, so I want to make sure they're going to work properly.

Thanks for helping to figure out this mystery.

I'm not sure why the first one seems to work either, apart from luck. The list of alternatives separated by | should be surrounded by a single pair of (...) as in the second one.

The second one is correct, in a way, but it's allowing an ambiguity which means it isn't doing exactly what you want.

The second one is saying:

  • (.*) match as much as possible of the string
  • (-r-o|-r|-o) then, match -r-o, or -r, or -o
  • \.pdf then, match .pdf

If the string is xyz-r-o.pdf then the three parts match the following:

  • xyz-r
  • -o
  • .pdf
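To see that split for yourself outside Opus, here is a minimal sketch using Python's re module. It is not the same regex engine Opus uses, so treat it only as an illustration; for a simple pattern like this one the greedy behavior is the same.

```python
import re

# The second regex from the thread, with a greedy (.*) at the start.
greedy = re.compile(r"(.*)(-r-o|-r|-o)\.pdf")

m = greedy.fullmatch("xyz-r-o.pdf")

# Group 1 is what (.*) captured, group 2 is which alternative matched.
greedy_groups = m.groups()
print(greedy_groups)  # ('xyz-r', '-o')
```

The (.*) first swallows the whole name, then gives characters back one at a time until the rest of the pattern can match. The first point where that works is with -o as the suffix, so the -r stays in the first group.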

To fix it, you need to make the (.*) at the start be "lazy", which means it will match as little as possible, leaving as much as possible for all the other parts of the expression. That's done by adding a ? like this:

(.*?)(-r-o|-r|-o)\.pdf

It's also a good idea to be in the habit of explicitly matching the start ^ and end $, to avoid unexpected behavior with names that have substrings that match the regex but don't match it entirely. For example, xyz-r-o.pdf.bak would match the previous regex and have the .bak stripped off unintentionally:

^(.*?)(-r-o|-r|-o)\.pdf$
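As a rough check of the lazy, anchored version, here is a small Python sketch. The helper name strip_suffix and the replacement \1.pdf are assumptions made for the example (the thread doesn't show the "new name" side of the rename), and Python's re is not the engine Opus uses.

```python
import re

# Lazy (.*?) plus ^ and $ anchors, as suggested above.
lazy = re.compile(r"^(.*?)(-r-o|-r|-o)\.pdf$")

def strip_suffix(name):
    """Hypothetical helper: drop a trailing -r, -o or -r-o before .pdf."""
    # Keep only group 1 and put the extension back.
    return lazy.sub(r"\1.pdf", name)

for name in ["xyz-r-o.pdf", "xyz-r.pdf", "xyz-o.pdf", "xyz-r-o.pdf.bak"]:
    print(name, "->", strip_suffix(name))
```

The first three names all come out as xyz.pdf. The .bak name is left untouched, because the $ anchor stops the pattern from matching only part of the name.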

Yeah, I started with the second - but after over an hour of trying alternatives I tried removing the parentheses for sh*ts-&-giggles and stumbled upon it working for some reason.. IIRC only one ordering of it seemingly 'worked'.

And after learning the other tips above, I FINALLY got a whole slew of other renames to work how I intended for them to work.. i.e. take off repeats of endings like "_CC_CC" - I was only ever able to get one set removed without doing a second rename. I would've thought one wanted (.*) to catch as much as possible, but I guess not. :)

I can't remember if I got anything to work with this part, but I've been trying things like (.*)(-r|-o)$\.pdf, thinking it would anchor it to just before the '.pdf' part. But I guess something like that would never work?? Does $ ALWAYS anchor to the absolute possible end of the entire string?

Another thing I'm wondering is does that 'lazy' (.*?) only work in Opus, or is that also a RegExp thing? I couldn't find anything about it in the Opus docs, nor at the 'TR1 ECMA' website that the Opus docs point to. And similarly, I'm wondering if the # thing mentioned in the Opus RegExp docs is a unique Opus 'thing', or is that also a general-type RegExp thing?

$ always matches the very end of the string (or end of the line in some contexts, but that's not important with filenames)
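A quick way to convince yourself that a $ in the middle can't work: nothing can come after the end of the string, so \.pdf has nothing left to match. A small Python sketch (again, not the Opus engine, just an illustration with made-up variable names):

```python
import re

# $ placed before \.pdf, as in the attempt quoted above.
misplaced = re.compile(r"(.*)(-r|-o)$\.pdf")
# $ moved to the real end of the pattern.
anchored = re.compile(r"(.*)(-r|-o)\.pdf$")

names = ["xyz-r.pdf", "xyz-o.pdf"]

# Collect the names each pattern manages to match.
misplaced_hits = [n for n in names if misplaced.search(n)]
anchored_hits = [n for n in names if anchored.search(n)]

print(misplaced_hits)  # []
print(anchored_hits)   # ['xyz-r.pdf', 'xyz-o.pdf']
```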

Using ? to make the preceding thing lazy instead of greedy is part of many regex variants, but not all. There isn't a single standard for regex so you may need to check if it works in other programs.

Microsoft's regex page uses the term "non-greedy" instead of "lazy", and seems to have an error where it describes the feature but the character that invokes it is missing. The line has a pair of ' ' single quotes with nothing between them; it was presumably meant to be '?' rather than ''.

In ECMAScript, all the forms of repetition count can be followed by the character '', which designates a non-greedy repetition.

The # at the very end of a regex to apply it repeatedly (until it stops changing the string or gets into a loop) is an Opus addition and not part of any standard regex syntax.
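If you ever need the same effect somewhere that doesn't have #, it can be imitated with a loop. This is only a sketch of the idea as described above, not how Opus implements it; the function name, the max_passes limit and the _CC example pattern are all assumptions for illustration.

```python
import re

def rename_repeatedly(name, pattern, replacement, max_passes=100):
    """Hypothetical helper: re-apply one regex rename until nothing changes."""
    seen = {name}  # names produced so far, used to detect loops
    for _ in range(max_passes):
        new_name = re.sub(pattern, replacement, name)
        if new_name in seen:
            # Either unchanged or back at an earlier name: stop here.
            break
        seen.add(new_name)
        name = new_name
    return name

# Strip one trailing _CC per pass until none are left.
result = rename_repeatedly("report_CC_CC.pdf", r"^(.*)_CC\.pdf$", r"\1.pdf")
print(result)  # report.pdf
```

A single pass only removes one _CC, which matches the "only ever able to get one set removed" experience mentioned earlier in the thread.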
