Rename via regex: group capturing and non-marking grouping

Hi,

I love being able to use regular expressions in the Rename dialog, but I just ran into a (minor) problem there. I think it might be a bug, but I'd like to get some input to be sure I'm not making a mistake (I don't think I am, but..).

I have this regex, "Old name": [0-9X]+-\[foo\](-\[bar\]|-)\[ *(.*) *\]-.* which has 2 sets of parentheses. And "New name": \2
When I try to rename a folder named 123456-[foo]-[Bla_bla-Bla_Bla]- foobar I get Bla_bla-Bla_Bla ...super!

But when I try to rename a folder named 123456-[foo]-[bar]-[Bla_bla-Bla_Bla]- foobar I get bar]-[Bla_bla-Bla_Bla

That's wrong, I think. The result should be the same in both cases, because the first set of parentheses says EITHER a dash OR a dash followed by the word bar in brackets. That capture group matches in both cases, so \2, i.e. the 2nd set of parentheses, should never include part of the text matched by the first set... should it? Maybe I misunderstand how matching occurs (is it done 'greedily' from right to left? And if so, is there a way to force a 'lazy' match?).

It seems to me that if I could mark the first parentheses grouping as non-marking, it might solve the problem, but you cannot type a colon in the "Old name" field. I assume the reason is that the colon is not a valid character for file/folder names (except for accessing alternate data streams).

Non-marking grouping requires a colon, e.g. (?:expression), where 'expression' is grouped but not captured. So with, e.g., "Old name": [code](?:expression)([0-9]+)[/code] and "New name": \1, the new name would end up a number, because the first set of parentheses is used solely for grouping an OR, not capturing. Maybe I should post this as a feature request in a separate topic?
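To illustrate what I mean by non-marking grouping, here is a minimal sketch in Python. Python's re module is only a stand-in here, and the foo|bar alternatives, the sample name and the groups_for helper are made up for the example; I'm not claiming the Rename dialog behaves identically.

```python
import re

# Hypothetical patterns: the same OR, once non-capturing and once capturing.
NON_CAPTURING = re.compile(r"(?:foo|bar)-([0-9]+)")
CAPTURING = re.compile(r"(foo|bar)-([0-9]+)")


def groups_for(pattern, name):
    """Return the numbered groups for a whole-name match, or None."""
    m = pattern.fullmatch(name)
    return m.groups() if m else None


# With (?:...) the digits are group 1; with plain (...) they shift to group 2.
print(groups_for(NON_CAPTURING, "bar-42"))  # ('42',)
print(groups_for(CAPTURING, "bar-42"))      # ('bar', '42')
```

So with the non-marking form, \1 would refer to the number rather than to whichever side of the OR matched.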

Thanks for reading and any and all input.

Since I cannot seem to edit, I'm quoting my own post. I simplified my regex a little more (removed irrelevant part)...

[quote="zeez"]Hi,

I have this regex, "Old name": [0-9]+-\[foo\](-\[bar\]|-)\[(.*)\] which has 2 sets of parentheses. And "New name": \2
When I try to rename a folder named 123-[foo]-[Bla_bla-Bla_Bla] I get Bla_bla-Bla_Bla ...super!

But when I try to rename a folder named 123-[foo]-[bar]-[Bla_bla-Bla_Bla] I get bar]-[Bla_bla-Bla_Bla

That's wrong, I think. The result should be the same in both cases, because the first set of parentheses says EITHER a dash OR a dash followed by the word bar in brackets. That capture group matches in both cases, so \2, i.e. the 2nd set of parentheses, should never include part of the text matched by the first set... should it? Maybe I misunderstand how matching occurs (is it done 'greedily' from right to left? And if so, is there a way to force a 'lazy' match?).

Thanks for reading and any and all input.[/quote]

It's not a bug; there's a problem with your regex.

Taking the simplified version: [0-9]+-\[foo\](-\[bar\]|-)\[(.*)\]
And the second filename: 123-[foo]-[bar]-[Bla_bla-Bla_Bla]

The [0-9]+-\[foo\] part will match and eat 123-[foo], leaving us with:

Remaining Regex: [b](-\[bar\]|-)[/b]\[(.*)\]
Remaining Name: -[bar]-[Bla_bla-Bla_Bla]

Now the [b](-\[bar\]|-)[/b] part can match either -[bar] or just - on its own. If it matches -[bar], the remaining name is -[Bla_bla-Bla_Bla], which starts with a dash, while the remaining regex \[(.*)\] needs a [ next. So that path fails, and the regex engine has to backtrack and take the other alternative.

What you're seeing is that just the - on its own matches, leaving us with:

Remaining Regex: \[(.*)\]
Remaining Name: [bar]-[Bla_bla-Bla_Bla]
And \1 bound to -

That then removes the outer square brackets and leaves us with \2 bound to bar]-[Bla_bla-Bla_Bla
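
If you want to double-check the walk-through above, here is a minimal Python sketch. Python's re module stands in for the Opus regex engine, which is an assumption (the two may differ in details), and captured is just a hypothetical helper name.

```python
import re

# The simplified "Old name" regex from the post, with brackets escaped.
SIMPLIFIED = re.compile(r"[0-9]+-\[foo\](-\[bar\]|-)\[(.*)\]")


def captured(name):
    """Return (\\1, \\2) for a whole-name match, or None if there is none."""
    m = SIMPLIFIED.fullmatch(name)
    return m.groups() if m else None


# First name: the dash alone is matched and \2 is the bracket contents.
print(captured("123-[foo]-[Bla_bla-Bla_Bla]"))
# ('-', 'Bla_bla-Bla_Bla')

# Second name: \1 is still just the dash, so the greedy (.*) runs from
# "bar" up to the last closing bracket.
print(captured("123-[foo]-[bar]-[Bla_bla-Bla_Bla]"))
# ('-', 'bar]-[Bla_bla-Bla_Bla')
```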

To fix your regex, you need to make it more specific to avoid having it go down the unwanted path.
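
One way to make it more specific (this is my guess at what was intended, not the only option) is to include the dash that follows [bar] in the first alternative, so that alternative can consume the whole -[bar]- prefix. A minimal Python sketch, again using Python's re module as a stand-in for the Opus engine, with new_name as a hypothetical helper:

```python
import re

# Assumed fix: the first alternative is now -[bar]- (with the trailing dash).
# The group stays a normal capturing group, so no colon is needed and the
# bracket contents are still \2.
SPECIFIC = re.compile(r"[0-9]+-\[foo\](-\[bar\]-|-)\[(.*)\]")


def new_name(name):
    """Return what "New name": \\2 would produce, or None if no match."""
    m = SPECIFIC.fullmatch(name)
    return m.group(2) if m else None


print(new_name("123-[foo]-[Bla_bla-Bla_Bla]"))        # Bla_bla-Bla_Bla
print(new_name("123-[foo]-[bar]-[Bla_bla-Bla_Bla]"))  # Bla_bla-Bla_Bla
```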

If you just want to grab the last thing in square brackets then you can do this more directly with the following:

Old Name: \[([^]]+)\][^[]*$
New Name: \1

That will net you Bla_bla-Bla_Bla in both your examples.

NOTE: Some regex engines differ on the syntax for including [ and ] within character classes. Sometimes you have to escape them as \[ and \] within the classes; sometimes you must not escape them if they are the first character in the class. The one above should work with the current version of Opus. I found I had to use the one below instead with a different regex engine:

Old Name: \[([^\]]+)\][^\[]*$
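
For anyone who wants to try the "last thing in square brackets" idea outside Opus, here is a minimal Python sketch. It uses the form with the brackets escaped inside the character classes, which Python's re module accepts; last_bracketed is a hypothetical helper name, and re.search is used because this pattern is only anchored at the end of the name.

```python
import re

# A bracketed run with no "]" inside it, followed only by non-"[" characters
# up to the end of the name, i.e. the last bracketed part.
LAST_BRACKETED = re.compile(r"\[([^\]]+)\][^\[]*$")


def last_bracketed(name):
    """Return the contents of the last [...] in name, or None if absent."""
    m = LAST_BRACKETED.search(name)
    return m.group(1) if m else None


for name in (
    "123-[foo]-[Bla_bla-Bla_Bla]",
    "123-[foo]-[bar]-[Bla_bla-Bla_Bla]",
    "123456-[foo]-[bar]-[Bla_bla-Bla_Bla]- foobar",
):
    print(last_bracketed(name))  # Bla_bla-Bla_Bla each time
```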

Thank you so much, leo! Excellent explanation and solution.

Every time I think I understand regular expressions well enough, I find I'm either doing something wrong or there's something about it I didn't yet know. :laughing:

Cheers