I have some troubles getting this to work. I have files which look like this:
20121009 1928 - SF zwei HD - Factory Made - So wird's gebaut.ts
20120805 0219 - BR-alpha - Alpha Centauri - Distanzen 1.ts
What I want now is to strip away the first part, which is the date, the time and the TV channel name. Put differently, I want everything after the second occurrence of " - ".
Factory Made - So wird's gebaut.ts
Alpha Centauri - Distanzen 1.ts
I came up with the following expression (added some more groups to be able to quickly check the result):
^(\d{8}[ ]\d{4})(\s-\s)(.*)(\s-\s)(.*)$
But this doesn't work for the second case and I don't understand why...
The smallest modification to your regex to make it work is to add a ? in the first dot-star:
^(\d{8}[ ]\d{4})(\s-\s)(.*?)(\s-\s)(.*)$
Then replace with \5
This is because dot-star (.*) is greedy. In your original expression, it matches everything to the end of the file name. Then, to produce a match, the engine backtracks one character at a time. So the last capture group ends up only containing what is after the last space-dash-space.
When you add the question mark, the dot-star becomes ungreedy, and only matches until the next space-dash-space, which is what you intended.
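To see the difference concretely, here is a small sketch in Python. Python's `re` module is not the engine Opus uses, but greedy and lazy quantifiers behave the same way for these two patterns. The variable names are just for illustration.

```python
import re

names = [
    "20121009 1928 - SF zwei HD - Factory Made - So wird's gebaut.ts",
    "20120805 0219 - BR-alpha - Alpha Centauri - Distanzen 1.ts",
]

# Original pattern: the first dot-star is greedy.
greedy = re.compile(r"^(\d{8}[ ]\d{4})(\s-\s)(.*)(\s-\s)(.*)$")
# Modified pattern: the first dot-star is lazy (.*?).
lazy = re.compile(r"^(\d{8}[ ]\d{4})(\s-\s)(.*?)(\s-\s)(.*)$")

# Replace the whole match with group 5 in both cases.
greedy_results = [greedy.sub(r"\5", n) for n in names]
lazy_results = [lazy.sub(r"\5", n) for n in names]

for n, g, l in zip(names, greedy_results, lazy_results):
    print(n)
    print("  greedy:", g)  # only what follows the LAST " - "
    print("  lazy:  ", l)  # everything after the SECOND " - "
```

With the greedy version, group 5 holds only the text after the last space-dash-space; with the lazy version it holds everything after the second one.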
Playful, your "compact" solution works great. I was checking the documentation to find out about the first part of your pattern "(?:.*?" but didn't find an explanation for the colon. What does that mean? I do understand, though, that you check for that specific " - " pattern two times. That's a great idea!
Thanks as well for the explanation of the problem with my pattern, which also explains how the regex engine works.
There are many different variants of regular expressions; by default Opus uses what's called TR1 ECMAScript. Microsoft has a page on TR1 that goes into far more detail than this help file can.
Hi Roger,
The pages Leo linked to should have the information, but I'll explain it to complete the thread (and because it's fun).
So we are looking at the first set of parentheses: (?:.*?\s-\s)
(?: means that this set of parentheses is non-capturing. That means that whatever it matches will not go into Group 1, which you would later refer to as \1. This is why in the replace, we can use \1, as it refers to the only capturing parentheses in the pattern: the final dot-star.
After that, the dot-star-question mark ungreedily matches everything until we meet (and match) a space-dash-space.
After the closing parenthesis, you are quite right that we repeat this pattern thanks to the {2}
Once we're past the second occurrence of space-dash-space, we're free to capture everything (dot-star) into Group 1.
Of course this is only one way---there are a number of other ways of writing patterns to match these strings.
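As a rough illustration of the walkthrough above, here is a Python sketch. It assumes the compact pattern has the shape `(?:.*?\s-\s){2}(.*)`, pieced together from the parts explained in this post; the exact pattern and the variable names are assumptions, and Python's `re` stands in for the engine Opus uses.

```python
import re

# Assumed shape of the compact pattern:
#   (?:.*?\s-\s)  non-capturing group: lazily match up to a space-dash-space
#   {2}           repeat that group twice
#   (.*)          capture the rest into Group 1
compact = re.compile(r"(?:.*?\s-\s){2}(.*)")

names = [
    "20121009 1928 - SF zwei HD - Factory Made - So wird's gebaut.ts",
    "20120805 0219 - BR-alpha - Alpha Centauri - Distanzen 1.ts",
]

# The non-capturing group does not count, so the pattern has one group only.
group_count = compact.groups

# Replace the matched text with Group 1.
compact_results = [compact.sub(r"\1", n, count=1) for n in names]

print(group_count)
for result in compact_results:
    print(result)
```

Because the first set of parentheses is non-capturing, `\1` in the replacement refers to the final dot-star.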
Hi David,
Indeed, at first glance it can seem surprising that the pattern would work on the second file:
20120805 0219 - BR-alpha - Alpha Centauri - Distanzen 1.ts
We're matching no-dashes then a dash, no-dashes then a dash, then space. If the renamer was trying to match the entire file name, the regex would fail, because the second dash (the one in BR-alpha) is not followed by a space.
But that's not what the renamer does. It looks for files where the pattern matches somewhere, but the pattern doesn't have to match the entire file name. If you want the pattern to match the entire name, you use anchors (^ and $). For instance, for a pattern you could just have "alpha", and for rename you could have "beta", and the entire long file name above would get renamed to "beta".
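The difference between matching somewhere and matching the whole name can be sketched in Python, which again only stands in for the engine Opus uses. `re.search` looks for a match anywhere in the string, and adding `^` and `$` forces the pattern to cover the entire name.

```python
import re

name = "20120805 0219 - BR-alpha - Alpha Centauri - Distanzen 1.ts"

# Unanchored: the pattern only has to match somewhere in the name.
unanchored = re.search(r"alpha", name)

# Anchored: the pattern must account for the entire name, so this fails.
anchored = re.search(r"^alpha$", name)

print(unanchored is not None)  # a match was found inside the name
print(anchored is None)        # no match once the anchors are added
```

This only shows where a match is found. Python's own `re.sub` would replace just the matched substring, so it is not a model of what the renamer does with the new name.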
With your pattern, the renamer is able to match BR-alpha - Alpha Centauri - Distanzen 1.ts
It may not be the string you expected to match, but it conforms to no-dashes then a dash, no-dashes then a dash, then space, then anything up to a dot, then anything.
And the parentheses correctly capture what you want.
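Here is a hedged Python sketch of that behaviour. The pattern `[^-]*-[^-]*-\s+(.*\..*)` is a hypothetical one written to fit the description above (no-dashes, dash, no-dashes, dash, space, anything up to a dot, then anything); it is an assumption, not necessarily the exact pattern under discussion.

```python
import re

# Hypothetical pattern fitting the description; not anchored with ^ or $.
pattern = re.compile(r"[^-]*-[^-]*-\s+(.*\..*)")

name = "20120805 0219 - BR-alpha - Alpha Centauri - Distanzen 1.ts"

m = pattern.search(name)

# The match cannot start at the beginning of the name, because the dash in
# "BR-alpha" is not followed by a space. It starts after the first dash.
matched = m.group(0)
captured = m.group(1)

print(repr(matched))
print(repr(captured))
```

The overall match starts partway through the name, yet the capture group still holds the part that should be kept.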
[quote="playful"]Hi Roger,
The pages Leo linked to should have the information, but I'll explain it to complete the thread (and because it's fun).[/quote]
Many thanks for this.
These "worked examples" are an excellent way of working out the complexities of these regex things. While it is wonderful when people throw in answers to particular challenges, you can't always take away any wider lessons.
This one message has taught me more than dozens of other "problem solved" replies.
Before DOpus 10, at least at some point in time, the pattern had to match the entire filename.
So yes, naturally I had thought this pattern had to match the entire filename.
This being the case I have almost never used or needed anchors.
Regexps in DOpus 10 behave more like those in PHP or VBScript.
I nearly always bow to your wisdom, Leo, but David's pattern
[^-]-[^-]-+(...)
though it is not on my top-ten list for regex style, does work even for the following mouthful of dashes: 20120805 0219 - BR-alp-ha - Alp-ha Cen-tauri - Dist-anzen 1.ts
(We jump to the first dash, which happens to be the first space-dash-space component, then we jump to the next dash-space (just before Alp-ha), then we capture everything after that, so any dashes on the right shouldn't matter.)
Maybe I misunderstood and you had something else in mind. For instance, the pattern would break if there were a dash (with or without space) in the date on the left, because the first "dash test" is based on a plain dash, not space-dash-space, so that a plain dash on the left would be a "false positive".
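Both cases can be tried out in a short Python sketch. As before, `[^-]*-[^-]*-\s+(.*\..*)` is a hypothetical pattern of the shape being discussed (an assumption, not a quote), and the file name with dashes in the date is made up for the test.

```python
import re

# Hypothetical pattern: no-dashes, dash, no-dashes, dash, space(s), capture.
pattern = re.compile(r"[^-]*-[^-]*-\s+(.*\..*)")

# Extra dashes to the right of the date: the capture is still what we want.
dashy = "20120805 0219 - BR-alp-ha - Alp-ha Cen-tauri - Dist-anzen 1.ts"
dashy_capture = pattern.search(dashy).group(1)

# Made-up name with dashes inside the date: the plain-dash test fires too
# early, so the channel name ends up in the capture (a false positive).
dashed_date = "2012-08-05 0219 - BR-alpha - Alpha Centauri - Distanzen 1.ts"
dashed_date_capture = pattern.search(dashed_date).group(1)

print(dashy_capture)
print(dashed_date_capture)
```

In the second case the capture keeps the channel name, which is the false positive described above.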
Michael, thank you for your kind post. Sometimes I wonder if I've gone into my own little world and vomited a thousand keystrokes on a topic that's only interesting to me. So it's a treat to know you didn't find the details boring.
My mistake, you're quite right. It's skipping the first dash, then finding the second dash with a space after it, which is fine for the specified inputs.
It'd only go wrong if there was an extra dash somewhere in the date at the start (not part of the specification, so that's okay) or if a word ended in a dash, e.g.
There are loads of regular expression tutorials, and interactive learning tools, on the web that walk you through step-by-step examples. If you want to learn regular expressions, the information is out there. (In addition to the beginner-level guide I wrote for the rename scripting area here.)
I agree it can be helpful to explain things step-by-step, but it's also very time consuming and often seems redundant when there are tutorials out there for people who want them. (Explaining things is definitely useful when some of the newer or more esoteric regexp features are used, as in Playful's example, of course.)
[quote="leo"]
There are loads of regular expression tutorials, and interactive learning tools, on the web that walk you through step-by-step examples. If you want to learn regular expressions, the information is out there. (In addition to the beginner-level guide I wrote for the rename scripting area here.)[/quote]
I have looked at many of those tutorials.
The message I singled out for its usefulness was particularly helpful because it addresses a specific problem with a good example and a clear explanation of what each bit does.
Most tutorials start with very general issues and then provide comprehensive solutions. They also mostly come from people writing for readers like themselves, rather than for someone who wants to just deal with an issue rather than complete a PhD.
As in many walks of life, the worst explanations often come from the most knowledgeable people. This is why people like me can make a living translating technical stuff for readers who start with very little knowledge.