RegExp basics: Removing characters from start/end of names

See also: Script to perform multiple Regular Expressions

Regular Expressions make it easy to remove characters from the start or end of filenames.

This guide should help you learn the basics of regular expressions. Along the way it will also provide you with some frequently requested Rename Presets which you can use whether you understand how they work or not.

If you want to learn more about regular expressions there are many resources about them on the web. Just keep in mind that there are, unfortunately, many variations of regular expression syntax. Different programs and programming languages may do things slightly differently and there is no one standard. The Regular Expression Syntax appendix near the back of the Opus manual and Help file describes the exact syntax which Opus uses.

First some basics:

  • The . character means match any single character.

  • The * character means match zero or more of the thing before me. For example, a* will match zero or more a characters. If you use .* then you will match zero or more of any characters, exactly the same as a single * would do in a wildcard expression.

  • The + character means match one or more of the thing before me. a+ will match one or more a characters, but will not match if there are no a characters at all. .+ will match one or more of any characters.
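If you would like to experiment with these three symbols outside of Opus, here is a minimal sketch using Python's `re` module. Python's regular expression syntax is not identical to the one Opus uses, but `.`, `*` and `+` behave the same way for these simple cases. The `matches` helper is a made-up name for this illustration; it uses `re.fullmatch` so that the whole string has to match the pattern.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the WHOLE text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(r".", "a"))     # True:  exactly one character
print(matches(r".", "ab"))    # False: two characters, but . matches only one
print(matches(r"a*", ""))     # True:  zero "a" characters is allowed
print(matches(r"a*", "aaa"))  # True
print(matches(r"a+", ""))     # False: at least one "a" is required
print(matches(r"a+", "aaa"))  # True
print(matches(r".*", ""))     # True:  zero or more of any character
print(matches(r".+", ""))     # False: one or more of any character
```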

We'll use these filenames for the first few examples:

01_moocow.txt
xx_file1_xyz.txt

Some simple Rename operations:

  • Remove the first 3 characters from the start of a filename:

    Old name: ...(.+)
    New name: \1
    Type: Regular Expressions

    01_moocow.txt    -> moocow.txt
    xx_file1_xyz.txt -> file1_xyz.txt
    

    Here is how to understand the old name:

    • ... Find any three characters
    • (.+) followed by one or more characters.

    The second part, (.+), is in brackets so that we can refer back to it in the new name using \1, which means insert the characters that matched what was in the first set of brackets.

    The number of characters that are removed from the start of the name will be the same as the number of . characters in the first part of the old name expression. So, if you wanted to remove 4 characters you would use ....(.+) as the old name.
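The same idea can be tried in Python as a rough sketch. The `rename` helper below is hypothetical and only imitates the results shown above: it assumes the old name pattern has to match the entire filename and that a name which does not match is left unchanged. Building the pattern from `"." * count` mirrors the rule that one `.` is needed per character removed.

```python
import re

def rename(old_pattern: str, new_name: str, filename: str) -> str:
    """Hypothetical helper imitating a regular expression rename.

    Assumes the pattern must match the whole filename; names that
    do not match are returned unchanged.
    """
    m = re.fullmatch(old_pattern, filename)
    return m.expand(new_name) if m else filename

def remove_first(count: int, filename: str) -> str:
    # One "." per character to drop, then capture the rest as \1.
    return rename("." * count + "(.+)", r"\1", filename)

print(remove_first(3, "01_moocow.txt"))     # moocow.txt
print(remove_first(3, "xx_file1_xyz.txt"))  # file1_xyz.txt
print(remove_first(4, "01_moocow.txt"))     # oocow.txt
```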

  • Remove the last 3 characters from the end of the filename:

    The first thing you might try is this:

    Old name: (.+)...
    New name: \1
    Type: Regular Expressions

    01_moocow.txt    -> 01_moocow.
    xx_file1_xyz.txt -> xx_file1_xyz.
    

    Oops! That probably wasn't what you wanted. It removed characters from the file extension which you would almost always want to leave alone.

    If you are using Directory Opus 12 or above, you can simply turn on the Ignore Extension checkbox and things will work as you wanted. The checkbox removes the extension from the equation, applies your regular expression to the rest of the filename, then puts the extension back on the end of the result. It also ensures this is only done with files, not folders, so you can use the same regular expression on both.
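To picture what the checkbox does, here is a small Python sketch of the same three steps: split off the extension, apply the expression to the rest, then put the extension back. This is only an illustration of the behaviour described above, not how Opus is implemented. The function name is made up, `os.path.splitext` may not agree with Opus about what counts as an extension in every case, and the files-versus-folders handling is not modelled.

```python
import os.path
import re

def rename_ignoring_extension(old_pattern: str, new_name: str, filename: str) -> str:
    """Hypothetical sketch of an 'ignore extension' style rename."""
    stem, ext = os.path.splitext(filename)   # "01_moocow", ".txt"
    m = re.fullmatch(old_pattern, stem)      # pattern only sees the stem
    if m is None:
        return filename                      # no match: leave the name alone
    return m.expand(new_name) + ext          # put the extension back

print(rename_ignoring_extension(r"(.+)...", r"\1", "01_moocow.txt"))     # 01_moo.txt
print(rename_ignoring_extension(r"(.+)...", r"\1", "xx_file1_xyz.txt"))  # xx_file1_.txt
```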

    But let's pretend the Ignore Extension checkbox doesn't exist, and we want to do something similar using a regular expression. You wouldn't normally want to do this, unless you are on an old version of Opus, but the techniques it demonstrates will be useful in other places, so it's a good example.

    You need to use a slightly more complex regular expression to keep the file extension intact:

  • Remove the last 3 characters from the main part of the filename, preserving the file extension:

    (As per above, if you are using Directory Opus 12 or above, you can do things much more easily than this, by simply turning on the Ignore Extension checkbox.)

    Old name: (.+)...(\.[^.]+)
    New name: \1\2
    Type: Regular Expressions

    01_moocow.txt    -> 01_moo.txt
    xx_file1_xyz.txt -> xx_file1_.txt
    

    The old name means this:

    • (.+) Find one or more characters
    • ... followed by any three characters
    • (\.[^.]+) followed by the file extension.

    The \.[^.]+ expression for the file extension means this:

    • \. One dot character...

      Since the . (dot) character in regular expressions usually means match any character, when you want to match an actual . you have to put a \ before it to "escape" it. In other words, \. will match an actual . character.

    • [^.]+ ...followed by one or more characters that are not dots.

      Square brackets with ^ at the start mean match anything that is not one of the characters in square brackets. The . here is an actual . and does not need escaping when inside of square brackets.

    In this example, we have two things in brackets. We want to keep the start of the filename, so that's in brackets. We also want to keep the file extension at the end of the filename, so that is also in brackets. The new name of \1\2 joins together the two parts that we kept and throws away what was in the middle.
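Here is the extension-preserving expression tried out in Python, as a sketch. As before, the `rename` helper is hypothetical and assumes the pattern has to match the whole filename, with non-matching names left alone. The last line shows a side effect worth knowing about: a name with no dot in it cannot match `(\.[^.]+)`, so it is skipped.

```python
import re

def rename(old_pattern: str, new_name: str, filename: str) -> str:
    """Hypothetical helper: whole-name match, unchanged if no match."""
    m = re.fullmatch(old_pattern, filename)
    return m.expand(new_name) if m else filename

OLD = r"(.+)...(\.[^.]+)"   # \1 = start of name, \2 = extension
NEW = r"\1\2"               # join the two kept parts

print(rename(OLD, NEW, "01_moocow.txt"))     # 01_moo.txt
print(rename(OLD, NEW, "xx_file1_xyz.txt"))  # xx_file1_.txt
print(rename(OLD, NEW, "README"))            # README (no extension, no match)
```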

What if it's not a fixed number of characters?

The examples above are fine if you always want to remove a set number of characters, but what if the thing you want to remove isn't always the same length? Consider these example filenames:

0_example1.txt
42_test2.txt
360_banana3.txt
462foo_bar.txt

If you want to remove the digits from the start of these names then you can still use a regular expression. Here are some examples:

  • Remove the digits from the start of a name:

    Old name: [0-9]+(.+)
    New name: \1
    Type: Regular Expressions

    The [0-9] means any character in the range 0 to 9; in other words, any digit. Adding the + makes it match one or more digits.

    (As an aside, you can use \d as a synonym for [0-9], and \d+ would mean match one or more digits the same as [0-9]+ does. There are similar shorthand codes for a few different groups of characters, such as whitespace, letters-and-numbers, and so on.)

    Breaking down the old name expression:

    • [0-9]+ Find one or more digits
    • (.+) followed by one or more characters.

    Using our filenames from above, you will get these results:

    0_example1.txt  -> _example1.txt
    42_test2.txt    -> _test2.txt
    360_banana3.txt -> _banana3.txt
    462foo_bar.txt  -> foo_bar.txt
    

    If your filenames are all like the first three and you want to remove the _ as well then you only need a slight change:

  • Remove the digits and the underscore from the start of a name:

    Old name: [0-9]+_(.+)
    New name: \1
    Type: Regular Expressions

    0_example1.txt  -> example1.txt
    42_test2.txt    -> test2.txt
    360_banana3.txt -> banana3.txt
    462foo_bar.txt  -> 462foo_bar.txt
    

    Note that the fourth filename has not been changed at all this time. That is because it did not match the "old name" expression and was skipped over.

    The "old name" expression [0-9]+_(.+) says:

    • [0-9]+ Find one or more digits
    • _ followed by an underscore
    • (.+) followed by one or more characters.

    Since the fourth filename does not have one or more digits followed by an underscore, it does not match the expression and is ignored.
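The two digit-removing expressions can be compared side by side with a short Python sketch. The `rename` helper is hypothetical and assumes, in line with the results above, that the pattern has to match the whole name and that non-matching names are skipped.

```python
import re

def rename(old_pattern: str, new_name: str, filename: str) -> str:
    """Hypothetical helper: whole-name match, unchanged if no match."""
    m = re.fullmatch(old_pattern, filename)
    return m.expand(new_name) if m else filename

names = ["0_example1.txt", "42_test2.txt", "360_banana3.txt", "462foo_bar.txt"]

# Digits only: every name starts with digits, so every name changes.
digits_only = [rename(r"[0-9]+(.+)", r"\1", n) for n in names]
print(digits_only)
# ['_example1.txt', '_test2.txt', '_banana3.txt', 'foo_bar.txt']

# Digits plus underscore: the fourth name has no underscore right
# after its digits, so it does not match and is left as it was.
digits_and_underscore = [rename(r"[0-9]+_(.+)", r"\1", n) for n in names]
print(digits_and_underscore)
# ['example1.txt', 'test2.txt', 'banana3.txt', '462foo_bar.txt']
```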

  • Remove up to and including the first underscore:

    The last example can be modified slightly to remove everything up to and including the first underscore:

    Old name: [^_]*_(.+)
    New name: \1
    Type: Regular Expressions

    0_example1.txt  -> example1.txt
    42_test2.txt    -> test2.txt
    360_banana3.txt -> banana3.txt
    462foo_bar.txt  -> bar.txt
    
    • [^_]* Find zero or more characters that are not underscores
    • _ followed by an underscore
    • (.+) followed by one or more characters.
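A quick Python sketch shows why this removes only up to the first underscore: `[^_]*` is not allowed to step over an underscore, so it has to stop at the first one it meets. The `rename` helper is hypothetical and assumes a whole-name match, with non-matching names left unchanged.

```python
import re

def rename(old_pattern: str, new_name: str, filename: str) -> str:
    """Hypothetical helper: whole-name match, unchanged if no match."""
    m = re.fullmatch(old_pattern, filename)
    return m.expand(new_name) if m else filename

OLD = r"[^_]*_(.+)"

print(rename(OLD, r"\1", "462foo_bar.txt"))    # bar.txt
print(rename(OLD, r"\1", "xx_file1_xyz.txt"))  # file1_xyz.txt (second _ is kept)
print(rename(OLD, r"\1", "moocow.txt"))        # moocow.txt (no underscore, no match)
```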

Advanced Bonus: Buttons that ask you how many characters to remove.

Everything you've seen so far can be saved as a Rename Preset in the Rename dialog. You can also turn them into toolbar buttons which carry out the renames in a single click, without opening a dialog.

The next two examples are toolbar buttons, not Rename Presets, and they only work as toolbar buttons.

These buttons will pop up a small window asking you how many characters you want to remove from the start or end of the filename. You simply type in a number and hit return and the button will remove that many characters.

See How to add example buttons to your toolbars and menus to understand what to do with the blocks of XML below. Don't worry, it's very simple and you don't have to understand the actual XML or button code to use it!

  • Ask how many characters to remove from the start of the filename:

    <?xml version="1.0"?>
    <button>
       <guid>{3190F9B8-C395-49BF-BB80-EADF5DC5EA4B}</guid>
       <label>Remove n First Character(s) (files only)</label>
       <tip>Remove the desired number of first character(s) from file name (preserving extension)</tip>
       <icon1>16</icon1>
       <function>
          <instruction>@NoDeselect</instruction>
          <instruction>Rename TYPE=files REGEXP PATTERN ".(.+)(\..*)#{dlgstring|Enter Number of Characters to Remove}" TO "\1\2" AUTORENAME</instruction>
       </function>
    </button>
    
  • Ask how many characters to remove from the end of the filename:

    <?xml version="1.0"?>
    <button>
       <guid>{5F9B9C7F-2A52-40F4-8088-E81BE23CC8AE}</guid>
       <label>Remove n Last Character(s) (files only)</label>
       <tip>Remove the desired number of last character(s) from file name (preserving extension)</tip>
       <icon1>16</icon1>
       <function>
          <instruction>@NoDeselect</instruction>
          <instruction>Rename TYPE=files REGEXP PATTERN "(.+).(.*)(\..*)#{dlgstring|Enter Number of Characters to Remove}" TO "\1\3" AUTORENAME</instruction>
       </function>
    </button>
    

The two buttons above were originally posted to the forums by hicario (original thread).

Explaining everything about these two buttons is beyond the scope of this post. They're just a bonus. Here are some hints, though:

  • {dlgstring|Enter Number of Characters to Remove}, in a button, will prompt the user to type something and then insert whatever they type into the command.

    (This is just in a basic non-script button. You can do even more advanced prompting and user interface via scripting, but things get more complicated if you go there. For our purposes, a simple prompt is all we need.)

  • If a regular expression ends in a # then Opus will keep applying it until the results stop changing. You can also tell Opus to re-apply the regular expression a maximum number of times by putting a number after the #. If a regular expression ends in #5 then it will be applied up to five times.

    (Note that the # ending is an Opus-specific extension of regular expressions. It won't work in most other software.)

  • The main part of both regular expressions removes just one character from the start (or end) of the filename. Appending the #, and then the number the user types in, means the expression will be applied that many times and remove that many characters.
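The repeat-until-done idea behind the # ending can be imitated in Python with a simple loop. This is only a sketch based on the description above, not Opus's actual implementation: the `rename_repeated` function and its `max_passes` parameter are made-up names, and the patterns are the main parts of the two button expressions with the # and number taken off.

```python
import re
from typing import Optional

def rename_repeated(old_pattern: str, new_name: str, filename: str,
                    max_passes: Optional[int] = None) -> str:
    """Hypothetical sketch of re-applying one expression several times.

    Stops after max_passes passes, or sooner if the name stops
    matching or stops changing. max_passes=None means no fixed limit.
    """
    current = filename
    passes = 0
    while max_passes is None or passes < max_passes:
        m = re.fullmatch(old_pattern, current)
        if m is None:
            break
        renamed = m.expand(new_name)
        if renamed == current:
            break
        current = renamed
        passes += 1
    return current

# Each pass removes ONE character; three passes remove three.
print(rename_repeated(r".(.+)(\..*)", r"\1\2", "01_moocow.txt", 3))      # moocow.txt
print(rename_repeated(r"(.+).(.*)(\..*)", r"\1\3", "01_moocow.txt", 3))  # 01_moo.txt
```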

I want to remove "The " from the beginning of artist names (i.e. the first 4 characters, including the space) but not from elsewhere in the name. For example, "The Three Degrees - The Chase" should be renamed to "Three Degrees - The Chase" and not "Three Degrees - Chase", which I know I can get by setting a button to delete "The ".

Match "^The " (without the quotes) if you want to remove "The " at the start only.

^ matches the start of the string.

$ matches the end of the string.
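As a quick illustration of the difference the ^ makes, here is a sketch in Python. It uses `re.sub`, which replaces every place the pattern is found, so this is a search-and-replace comparison rather than an exact copy of what an Opus rename does.

```python
import re

name = "The Three Degrees - The Chase"

# Anchored with ^: only a "The " at the very start is removed.
start_only = re.sub(r"^The ", "", name)
print(start_only)   # Three Degrees - The Chase

# Not anchored: every "The " is removed, wherever it appears.
everywhere = re.sub(r"The ", "", name)
print(everywhere)   # Three Degrees - Chase

# Anchored with $: only matches if the name ENDS with "Chase".
ends_with_chase = re.search(r"Chase$", name) is not None
print(ends_with_chase)   # True
```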
