Opus ECMAScript RegEx, RegExS, greedy parsing

(EDIT: The subject has been adapted to better reflect the problem & solution)

I have some trouble with the following situation:

tested filename: “Some filename [5309272820375] [µA].mp4”

In this filename I want to capture the media identifier between the first set of brackets and show it in a column. There’s a second set of brackets that has the symbol “µ” right after the opening bracket; this is not a media identifier but indicates a series of ‘labels’, in this case just the label “A” (for Audio-only). I use this instead of the icon labels in Opus because it shows up much faster in my “labels” column. Displaying media identifiers and labels in two separate columns is, of course, not a problem when using evaluator columns.

The problem occurs when both types of brackets appear in the same filename, as in the example above. I can’t seem to get the first one - the media identifier - with a single RegEx call, like this:

if (IsDir(path+"\\"+name)) return "";
c = Count(name, "["); if (c==0) return "-"; //name is the selected file above
if (RegEx(name, ".*\[[^@µ]+")) Output("Ok, the evaluator works.");
//The line below doesn't work as expected because it doesn't do non-greedy!
if (RegEx(name, ".*\[[^@µ]+")) return RegEx(name, ".*\[([^@µ\]]+).*", "\1");
return "?";

What I’m trying to do here is pick up all characters within the particular set of brackets which does NOT have “@” or “µ” immediately after the opening bracket. (If no such thing is found, it returns an empty string). Line 3 just gives some output, confirming that the “if” test works. So the same “if” on line 5 does of course also work. However, the RegEx doesn’t work as expected. In fact, it seems to fail without error and carry on with line 6, because the column shows the question mark. Wtf?

So my question would normally be: how and where exactly do I add the non-greedy indicator? Normally, “In ECMAScript, all the forms of repetition count can be followed by the character ? which designates a non-greedy repetition” (quoting the Microsoft page referred to in the Regular Expressions page of the Opus manual). I tried several things, but none of them worked.

But another, prior question would be: why is the Regex on line 5 being skipped altogether, going on with line 6?

I’m fairly familiar with RegEx but not exactly an in-depth expert.

PS. I can solve the problem of course: I can put the filename in a variable, remove the first bracket if followed by an unwanted character, and then do the exact same regular expression. This works perfectly:

if (c==1 && RegEx(name, ".*\[[^@µ]+")) return RegEx(name, ".*\[([^@µ\]]+).*", "\1");
//If multiple bracketed values, some of which of the types "[@" or "[µ" ...
nn = Replace(name, "[@", ""); nn = Replace(nn, "[µ", "");
if (RegEx(nn, ".*\[[^@µ]+")) return RegEx(nn, ".*\[([^@µ\]]+).*", "\1");
return "";

But this is not how we’re supposed to roll with RegEx. (I feel Rick-Rolled like in the good old days).

Is this Evaluator code or Javascript/ECMAScript?

If it's evaluator, if (RegEx(nn, ".*\[[^@µ]+")) returns false because your pattern doesn't match the entire string; it only matches a prefix of the string.

Doesn't matter if it tries greedy or non-greedy matches first; it won't match at all because there is a µ in the string and your pattern excludes anything with a µ anywhere after the last [.

It was in the column evaluator, so evaluator indeed.

You are right: the evaluator works correctly when there’s no “µ” in the last part, but indeed didn’t work for this particular case. Great: solved.

So this was NOT a greedy vs. non-greedy issue after all? Does the Opus RegEx implementation support the non-greedy character, or does it always do non-greedy perhaps?

It's about partial vs non-partial matching, not greedy vs non-greedy.

Your pattern:

  • .* Starts with anything (or nothing)
  • \[ followed by a literal [ character
  • [^@µ]+ followed by one or more characters that aren't @ or µ
  • and then the pattern ends, so nothing at all is allowed to follow.

If it was doing partial matching, you would not need the .* on the front. But if it isn't doing partial matching, you'll need it (or something similar) on the front and on the end.
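
The difference can be reproduced outside Opus. Below is a minimal sketch in Python, not evaluator code. It assumes that Python's re.fullmatch and re.search are fair stand-ins for whole-string and substring matching as described in this thread; the variable names are only for illustration.

```python
import re

name = "Some filename [5309272820375] [µA].mp4"
pattern = r".*\[[^@µ]+"

# Whole-string matching: the pattern must consume the entire filename.
# It cannot, because every "[" is followed sooner or later by a "µ",
# and [^@µ]+ is the last thing in the pattern.
whole = re.fullmatch(pattern, name)

# Substring matching: it is enough that some part of the filename matches.
partial = re.search(pattern, name)

print(whole)             # None
print(partial.group(0))  # Some filename [5309272820375] [
```

The substring match stops right before the "µ", which is why the same pattern succeeds in one mode and fails in the other.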

Note that there are functions for both modes: RegEx (entire string) and RegExS (substring).

If using RegExS, I would remove the .* from the start of the pattern and search for \[[^@µ\]]+] so it looks for the [, then a sequence of characters that aren't @ or µ or ], then a ].
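
As a rough illustration of that substring search, here is a small Python sketch (not evaluator code; re.search stands in for substring matching, and the capture group plus the "-" fallback are additions made here to pull out the identifier).

```python
import re

# "[" then one or more characters that are not "@", "µ" or "]", then "]".
# The parentheses are added here only to capture the identifier.
bracket = re.compile(r"\[([^@µ\]]+)\]")

def media_id(name):
    """Return the first bracketed value not starting with @ or µ, else "-"."""
    m = bracket.search(name)
    return m.group(1) if m else "-"

print(media_id("Some filename [5309272820375] [µA].mp4"))  # 5309272820375
print(media_id("Labels only [µA].mp4"))                    # -
```

Because no leading .* is involved, the search simply finds the leftmost bracket pair that qualifies.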

I’m a fast and (often) sloppy reader, and the difference between RegEx & RegExS is well camouflaged: two long chunks of almost identical text with, as I just discovered, a few subtle changes here and there. So I thought one of these was just an obsolete, older function name. Imho it would be more productive to treat both together on one page and explain the subtle difference in passing.

But thanks for pointing my attention to it. RegExS seems to make more sense in many if not most scripts where I have used RegEx so far.

This works perfectly fine now:

if (IsDir(path+"\\"+name)) return "";
c = Count(name, "["); if (c==0) return "-";
if (RegExS(name, "\[[^@µ]")) return RegEx(name, ".*\[([^@µ\]]+)\].*", "\1");
return "-";
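
On the remaining greedy question: with whole-string matching, greediness only decides which bracket gets captured when a filename has more than one eligible set. A minimal Python sketch of that (again not evaluator code; the second filename is made up for illustration, and whether the Opus evaluator accepts the lazy form .*? is not confirmed in this thread):

```python
import re

greedy = r".*\[([^@µ\]]+)\].*"  # pattern from the working column code
lazy = r".*?\[([^@µ\]]+)\].*"   # same pattern with a lazy leading .*?

def capture(pattern, name):
    """Whole-string match; return group 1, or "-" if there is no match."""
    m = re.fullmatch(pattern, name)
    return m.group(1) if m else "-"

tested = "Some filename [5309272820375] [µA].mp4"
two_ids = "Clip [111] [222] [µA].mp4"  # hypothetical filename

# One eligible bracket: greedy and lazy capture the same value.
print(capture(greedy, tested), capture(lazy, tested))
# Two eligible brackets: greedy captures the last one, lazy the first.
print(capture(greedy, two_ids), capture(lazy, two_ids))
```

For filenames like the tested one, with a single media identifier, both forms return the same value.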