Shorten long filenames with regex

I have quite a bunch of very long filenames.
I would like to keep the names as long as possible (they describe the contents of the file).

They all go like this:
this is a very very long filename about this and that so and so and a lot of other crap.01-01-2013.txt

(the date is in that format, but just an example)

I would like to shorten the name to nnn characters, plus the date.

So the above would, for example, become:

fm: this is a very very long filename about this and that so and so and a lot of other crap.01-01-2013.txt
to: this is a very very long filename about this and that so and so.01-01-2013.txt

I don't know whether this is easily achievable; I tried in vain to find something on the Internet.

Hope this can be done without complex scripting :slight_smile:

thanks

Are all the dates in that exact format, or are there lots of variations as in your Change modified date after filename thread?

It should be pretty simple if they're all the same, but would be much more complex if not. We need to know everything it has to deal with first to give a good answer.

Leo, maybe I am overlooking something, but what I meant is shortening filenames, not handling dates.
I need to shorten filenames of over 260 characters, and path lengths sometimes exceed 320.
But I would like to keep, say, 200 or 220 characters of the filename plus the existing date.
Regrettably there is no such thing as replace with: substr\1{1,220}\2
where \1 is the name and \2 the date that is already in the name.
I hope you understand what I mean?

To keep the existing date on the end, we need to be able to work out where the date begins, which means we need to know what the date looks like.

If you have dates in lots of different formats like in the other thread, then it's not going to be simple.

Dates are all in the same format dd-mm-yyyy.ext
As a matter of fact all files go like this

bla bla text-text.txt-01-02-2003.txt

The files were originally saved as .txt files.
Later on, in a batch, the 'source-date' was added, plus .txt again,
so they could be opened using a text editor.

It would roughly be
find ^(.+)\.txt-(\d{2}-\d{2}-\d{4})\.(\w{3})$
repl $1-$2.$3
(or in Opus \1-\2.\3)

But how do I get just the first 200 characters from \1?


You can use .{200} to match 200 of any character. Like you're using \d{2} to match two \d i.e. 2 digits.
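As a rough illustration of that idea outside Opus, here is a minimal sketch using Python's `re` module (an assumption; the thread itself only uses the Opus rename dialog). The helper name `shorten` and its `limit` parameter are hypothetical. It uses `.{1,limit}` rather than `.{200}` so that names shorter than the limit still match.

```python
import re

def shorten(name: str, limit: int = 200) -> str:
    """Keep at most `limit` characters of the name, then re-attach the date.

    Hypothetical helper: assumes names look like
    'some text.txt-dd-mm-yyyy.ext', as described in the thread.
    """
    pattern = re.compile(
        r"^(.{1," + str(limit) + r"})"   # group 1: up to `limit` characters
        r".*"                            # whatever is cut away
        r"\.txt-(\d{2}-\d{2}-\d{4})"     # group 2: the date after '.txt-'
        r"\.(\w{3})$"                    # group 3: the final extension
    )
    # Names that do not match the pattern are returned unchanged.
    return pattern.sub(r"\1-\2.\3", name)

print(shorten("bla bla text-text.txt-01-02-2003.txt"))
print(shorten("bla bla text-text.txt-01-02-2003.txt", limit=7))
```

With `limit=7` only `bla bla` survives in front of the date, which makes the truncation easy to see on a short name.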

Ah..

Thanks a lot Leo!

Just for fun (please don't do this for this particular case):

A precision freak might replace the first part ^(.+) with:

^((?:[^.]|\.(?!txt))+)

This matches all the characters before the first .txt and captures them into Group 1.

In contrast, ^(.+) matches the entire file name, then rolls back (backtracks) through the string until the next part of the pattern, i.e. .txt, can be matched. In your case, this is exactly what you want. In other situations, with files that have more than two .txt fragments, such as one.txt-two.txt-01-01-2001.txt, if you only wanted to match the section before the first .txt, i.e. one, the ^(.+) fragment would overshoot. (That is not a problem for you, as you want everything up to the next-to-last .txt fragment, i.e. one.txt-two.)
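The difference can be checked with a small sketch in Python's `re` module (an assumption; Opus uses its own regex engine, so treat this as an illustration only). The variable names are hypothetical.

```python
import re

name = "one.txt-two.txt-01-01-2001.txt"

# Greedy: .+ first swallows the whole name, then backtracks until the
# rest of the pattern (".txt-", the date, the extension) can match.
greedy = re.match(r"^(.+)\.txt-(\d{2}-\d{2}-\d{4})\.(\w{3})$", name)

# Explicit: only consume characters while no ".txt" starts at this position.
explicit = re.match(r"^((?:[^.]|\.(?!txt))+)", name)

print(greedy.group(1))    # one.txt-two
print(explicit.group(1))  # one
```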

The regex fragment above is an example of the practice of "saying exactly what you want and what you don't want" in regex.
Here is how it breaks down:
^ asserts that we are at the beginning of the string
( start group 1
(?: start non-capturing group
[^.] match one character that is not a dot
| or
\. match one dot...
(?!txt) that is not followed by txt
) close non-capture
+ match one or more such characters
) close Group 1

And as Leo showed you, you can easily change the plus + quantifier. For instance, if you only wanted to capture between 1 and 50 characters, you could use {1,50}
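For illustration, the same breakdown with the {1,50} quantifier can be written as a commented pattern. This is a sketch in Python's `re` module with the `re.VERBOSE` flag (an assumption; it is not Opus syntax), and the name `FIRST_PART` is hypothetical.

```python
import re

FIRST_PART = re.compile(r"""
    ^                 # beginning of the string
    (                 # start group 1
      (?:             # start non-capturing group
        [^.]          # one character that is not a dot
        |             # or
        \.(?!txt)     # one dot that is not followed by txt
      ){1,50}         # close non-capture, repeated 1 to 50 times
    )                 # close group 1
""", re.VERBOSE)

print(FIRST_PART.match("one.txt-two.txt-01-01-2001.txt").group(1))  # one
print(FIRST_PART.match("v1.2 notes.txt-01-01-2001.txt").group(1))   # v1.2 notes
print(len(FIRST_PART.match("a" * 80 + ".txt").group(1)))            # 50
```

The second name shows that a dot is still allowed inside group 1 as long as it does not start a .txt fragment.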

I'd go for: ^(.+?)\.txt-(\d{2}-\d{2}-\d{4})\.(\w{3})$

The ? right after the quantifier + makes it non-greedy (by default they are all greedy, which leads to the whole filename ending up in the first group). The non-greedy + makes sure you just get the part before the 1st occurrence of ".txt". The non-greedy quantifiers were a relief when I found out about them, everyone should know! They are the key to the real regex heaven! o) The question mark ? normally means "may" occur, but not after quantifiers like + or *.

Hi @tbone
IMO there is no point to the lazy ? in this regex.

Not so: the lazy quantifier does not stop at the first .txt. It gets the part before the LAST .txt that precedes the date.
For instance, with this string
one.txt-two.txt-01-01-2001.txt

Group 1 will match one.txt-two

That is because the whole section after the lazy quantifier must match, and that is not just .txt, but also the digits.
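A quick way to see this is the following sketch in Python's `re` module (an assumption; the thread is about Opus, whose engine may differ in details). The variable names are hypothetical.

```python
import re

name = "one.txt-two.txt-01-01-2001.txt"

# Lazy .+? with the full pattern: it keeps expanding until the digits
# and the final extension can match too.
lazy_full = re.match(r"^(.+?)\.txt-(\d{2}-\d{2}-\d{4})\.(\w{3})$", name)

# Lazy .+? followed only by ".txt": now it can stop at the first one.
lazy_short = re.match(r"^(.+?)\.txt", name)

print(lazy_full.group(1))   # one.txt-two
print(lazy_short.group(1))  # one
```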

If you really wanted to use your lazy quantifier and contain it to the first txt, you would have to use an atomic group:
^(?>(.+?)) etc

I haven't checked if TR-1 supports atomic groups. Another solution is the explicit one that I showed.

On this topic I'd like to pimp one of my pages, Mastering Quantifiers
Beyond the basics, there are several sections with non-obvious traps and advanced quantifier topics.

Man, I should have thought twice before responding to a regex posted by you - the regex-master of the universe! o))
Thanks for correcting me, I knew only half the quantifier story it seems. Your site covering all this is really excellent in explaining and structure. Worth a visit even on non-regex days, hehe. o)

So I'm smarter now and apologize for any confusion created! o)

@tbone thank you for your very kind message... Should I blush? But of course you are a grand master in fields where I know nothing, and I'm very grateful for the great help you've given me time and again.

Regex is tricky, even when you think you know it... I often make mistakes, I'm sure most people do.

The reason I brought up this particular form,
(?:[^.]|\.(?!txt))+
is that the idea behind it is a regex classic, worth knowing in my view, and with a delightful logic:
...match either one character that is not a dot...
...or a dot that is not followed by txt...
...one or more times.

To cheer you up, and off-topic, someone told me today that this skit (Parents' Day at School) is entertaining... Sadly, I don't understand enough German. :smiley:

Thanks to all.

Well, in my case I managed to do something like:
find: ^(.{200}).*(.{14})$
repl: \1\2
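For anyone who wants to try that find/replace pair outside Opus, here is a minimal sketch in Python's `re` module (an assumption; the helper name `shorten` is hypothetical). The 14 in `.{14}` is the length of a trailing `dd-mm-yyyy.txt`.

```python
import re

# Same pair as above: keep the first 200 characters and the last 14.
FIND = re.compile(r"^(.{200}).*(.{14})$")

def shorten(name: str) -> str:
    # Names shorter than 214 characters do not match and stay unchanged.
    return FIND.sub(r"\1\2", name)

long_name = "a" * 250 + ".txt-01-02-2003.txt"
short_name = "bla bla text-text.txt-01-02-2003.txt"

print(len(shorten(long_name)))    # 214
print(shorten(long_name)[-14:])   # 01-02-2003.txt
print(shorten(short_name) == short_name)
```

Note that this pair drops everything between the first 200 characters and the date, including the separator, so the date follows the shortened name directly.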
