Capture Extension

amerifax · June 19, 2013, 11:41pm

399891 42024 685417.jpg
The following rename worked well if I was not concerned with the extension. I would simply hardcode the extension:
(^.......)
\1.pdf

Now I'm trying to capture the extension for the file rename. I had very little luck checking different forums. In the help example, I just can't see how the extension is being captured.
The Lord Of The Rings Backup.avi
(.*)\s(.*)#
\1\2

I can't get the hang of it. None of the elements worked for me. It does seem (.*) is the key to the extension. So this is my basic attempt: (^.......)(.*)
\1 \2.* or \1 (.*)
Not much luck

Bob

StuSmith · June 20, 2013, 3:14am

Hi amerifax,
I do not pretend to be a regex master, but I think I may be able to help.
Just add \. so your expression becomes (^.......)\.(.*)
Now \2 will contain your extension.
I hope that helps.
Stu.

amerifax · June 20, 2013, 6:49am

Thanks for looking into it, but it did not work. I have run out of ideas for tonight. It's time for a break, and I'll hope for better luck afterwards.
Bob

amerifax · June 20, 2013, 8:02am

Thought to take another stab at it before break time.

I complicated it a bit more for experimental reasons. I'm close but still no cigar.
010113 422105 44201 40845 spec.jpg

(^.*)\s(.*)\s(.*)\s(.*)\s(.*)
\2 \5
My result is 422105 spec.jpg

I have my extension but also the unwanted "spec" and the space before it. It seems I needed to treat the word groups by identifying them by spaces. This link was also a great help: RegExp basics: Removing characters from start/end of names

I hope I'm going in the right direction, logic-wise, with the code I'm using.
Bob
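
For anyone following along, one way to see what each group holds is to try the same kind of pattern outside Opus. The sketch below uses Python's re module purely as a stand-in (the Opus engine may differ in details), and it assumes each of the five groups is `(.*)`. The second pattern, with an extra `\.([^.]+)` group, is an illustration and not something posted in this thread.

```python
import re

name = "010113 422105 44201 40845 spec.jpg"

# Five greedy groups separated by single whitespace characters.
pattern = r"^(.*)\s(.*)\s(.*)\s(.*)\s(.*)$"
m = re.match(pattern, name)
groups = m.groups() if m else ()
for i, g in enumerate(groups, start=1):
    print(f"\\{i} = {g!r}")  # the last group still holds "spec.jpg"

# Assumed variant: split the last word from the extension with one more group.
pattern_ext = r"^(.*)\s(.*)\s(.*)\s(.*)\s(.*)\.([^.]+)$"
new_name = re.sub(pattern_ext, r"\2.\6", name)
print(new_name)
```

The last group swallows both the final word and the extension because nothing in the first pattern tells the engine where the dot is.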

amerifax · June 20, 2013, 8:29am

It seems this should be close. But still not there.
\s(......)\s(.)\s(.)\s(.*)\s(....+).[^.]+

Bob

Leo · June 20, 2013, 9:51am

There are only two spaces in your filename, but your regexp has \s in four different places.
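
A quick way to check this point is to count the required separators against the spaces actually in the name. This is a rough sketch in Python's re module, used only as a stand-in for the Opus engine; the four-`\s` pattern is an assumed example, not necessarily the exact one being discussed.

```python
import re

# Each \s must consume one whitespace character, so this needs four of them.
pattern = r"^(.*)\s(.*)\s(.*)\s(.*)\s(.*)$"

two_spaces = "399891 42024 685417.jpg"
four_spaces = "010113 422105 44201 40845 spec.jpg"

matches_two = re.match(pattern, two_spaces) is not None
matches_four = re.match(pattern, four_spaces) is not None
print(matches_two, matches_four)
```

When the pattern demands more separators than the name contains, nothing matches at all, so no rename happens.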

aterlecki · June 20, 2013, 11:34am

I think this is what you want:

Old Name: ^([^\s])[^.]..(.*)$
New name: \1.\2

aterlecki · June 20, 2013, 1:37pm

Actually that's not right. This should only act on files with spaces in them:

^([^\s]*).*\.(.*)$

Personally, I don't like the Dopus regex engine - but that's just a biased opinion from a Perl enthusiast!

amerifax · June 20, 2013, 7:11pm

aterlecki<<
To test this out I used this: 312158 13113 70585.jpg

I modified your code by adding: (^.......)
(^.......)([^\s]*).*\.(.*)$
\1.\3

I got the right answer, but I question my use of two "^". I'm now going to go over this code till I totally understand it. Thanks all for the help. Am I right to assume the $ forces a look at the very end of the filename? So the extension would be \.(.*)$

In Leo's example, I don't get this part, which is also used above:
[^.]+ followed by one or more characters that are not dots.
Square brackets with ^ at the start mean match anything that is not one of the characters in square brackets. The . here is an actual . and does not need escaping when inside of square brackets.

Bob

aterlecki · June 20, 2013, 7:41pm

My code worked, did it not? It is a bit more generic, as it matches any and all characters before the first whitespace character (which is what I thought you were asking for, but correct me if I'm wrong). Your code seems to include the space, which I didn't think you wanted.

$ at the end means anchor pattern to end of text.

The first ^ means anchor pattern to start of text.

The second ^ is a negating character for use within the character class, so ([^\s]*) means "zero or more characters that are not whitespace characters, saved into the \1 register/capture group".
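
The two roles of ^, and the effect of $, can be tried out in a small sketch. Python's re module is used here only as a stand-in for the Opus engine, and the patterns assume the forms `^([^\s]*).*\.(.*)$` and a fixed seven-character variant written as `.{7}`.

```python
import re

name = "312158 13113 70585.jpg"

# ^ anchors at the start; [^\s]* takes everything before the first whitespace;
# .* skips the middle; \. is a literal dot; (.*)$ captures up to the end.
generic = r"^([^\s]*).*\.(.*)$"
result = re.sub(generic, r"\1.\2", name)
print(result)

# Fixed-width variant: the first seven characters include the trailing space.
fixed = r"^(.{7})([^\s]*).*\.(.*)$"
with_space = re.sub(fixed, r"\1.\3", name)
print(repr(with_space))
```

The second result keeps the space in front of the dot, which is the difference described above.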

MrC · June 21, 2013, 12:56am

Some tips:

Generally you don't want to place ^ inside the capture group. It is a "zero-width assertion" and captures nothing. It matches a location, not a character. So use ^(.) instead of (^.). Likewise $. That is, unless you know exactly what you're doing.
Reading lots of dots is hard for many and is error prone. Use the repetition construct. So instead of ....... use .{7} or if you need a range, .{3,7} for example, to match from 3 to 7 characters.
Sometimes \s is nice, but sometimes using a literal space when that's all you want to match instead makes the RE easier to scan and read. Compare:

(.)\s(.)\s(.)
(.) (.) (.)
Try to state your RE as precisely as you can. In other words, be as literal as possible in what can match, to help you (and the RE engine) along. If you want digits (rather than any character), use \d or [0-9] instead of dot. That would turn your 7-digit match into the more precise \d{7} or the capturing version (\d{7}).
Build your RE in small pieces. Try to match from left to right, asking yourself exactly what you want to match each step of the way. Often this helps you vocalize the pattern, so writing it becomes more natural (e.g. From the beginning of the string, match exactly 7 digits, then one or more spaces, then 7 digits, etc.).
In Dopus, since you have the nice preview window, adding extra characters around your New name capture registers (i.e \1, \2, etc.) makes it easy to see what is being captured by which capture group. I often use something like: -\1-\2- so it is clear what is being stuffed into \1 and into \2.

Hopefully this provides a little help!
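
Several of the tips above can be combined in one small sketch. It uses Python's re module as a stand-in for the Opus engine (which may differ in details), and the pattern and the six-digit widths are assumptions chosen to fit the sample name from earlier in the thread.

```python
import re

name = "010113 422105 44201 40845 spec.jpg"

# Anchors outside the groups, \d{6} instead of six dots, literal spaces
# instead of \s, and [^.]+ for the extension after the literal dot.
pattern = r"^(\d{6}) (\d{6}) .*\.([^.]+)$"

# Debug form: dashes around each register show what each group captured.
debug = re.sub(pattern, r"-\1-\2-\3-", name)
print(debug)

# Final form once the groups look right.
final = re.sub(pattern, r"\2.\3", name)
print(final)
```

The dashed output makes it easy to spot a group that picked up a stray space or the wrong word before committing to the real new name.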