Filename Cleanup Regex

Looks like someone got bitten hard by the regEx bug... Awesome work!

More like the File Format Naming Bug! :smiley:.

Nearly all of my Opus time lately has been devoted to a new toolbar, named KRA-FileNames.dop. Hopefully I will post it by this weekend.

Right now I need some input and a tester or two. Any volunteers?

Help me think this through. I'm considering providing all the formats listed below. I just don't know if they are all really needed.

Proposed File Name Formats
[ul][li] Friendly - "nasa--shuttle.inspectionv.1.2.3.ext" To: "Nasa - Shuttle Inspection v1.2.3.ext" (doesn't preserve capitalization, forces title case)[/li]
[li] Space - "NASA--ShuttleInspectionV.1.2.3.ext" To: "NASA - Shuttle Inspection v1.2.3.ext" (preserves capitalization)
[/li]
[li] Butt - "NASA - Shuttle Inspection v1.2.3.ext" To: "NASA--ShuttleInspectionV-1-2-3.ext" (preserves capitalization, only hyphens)
[/li]
[li] ScoresL - "NASA - Shuttle Inspection v1.2.3.ext" To: "nasa--shuttle_inspection_v_1_2_3.ext" (all lower case, separate with underscores)[/li]
[li] ScoresP - "NASA - Shuttle Inspection v1.2.3.ext" To: "NASA--Shuttle_Inspection_v_1_2_3.ext" (preserves capitalization, separate with underscores)[/li]
[li] ScoresU - "NASA - Shuttle Inspection v1.2.3.ext" To: "NASA--SHUTTLE_INSPECTION_V_1_2_3.EXT" (all upper case, separate with underscores)
[/li]
[li] DotsL - "NASA - Shuttle Inspection v1.2.3.ext" To: "nasa--shuttle.inspection.v.1.2.3.ext" (all lower case, separate with periods)[/li]
[li] DotsP - "NASA - Shuttle Inspection v1.2.3.ext" To: "NASA--Shuttle.Inspection.v.1.2.3.ext" (preserves capitalization, separate with periods)[/li]
[li] DotsU - "NASA - Shuttle Inspection v1.2.3.ext" To: "NASA--SHUTTLE.INSPECTION.V.1.2.3.EXT" (all upper case, separate with periods)[/li][/ul]
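
To make the Scores and Dots examples above concrete, here is a minimal Python sketch that reproduces them. It is not the toolbar's implementation (the toolbar uses Opus rename commands); the function names `split_ext` and `separated_format` and the exact rules (turn " - " into "--", put the separator after a version "v", swap spaces, dots and underscores for the separator) are assumptions inferred only from the sample names in the list.

```python
import re


def split_ext(name):
    """Split 'stem.ext' at the last dot; a name without a dot has no extension."""
    stem, dot, ext = name.rpartition(".")
    return (stem, dot + ext) if dot else (name, "")


def separated_format(name, sep, case="preserve"):
    """Hypothetical Scores/Dots rename: sep is '_' or '.', case is
    'lower', 'upper' or 'preserve'."""
    stem, ext = split_ext(name)
    # Force the double hyphen where the friendly form has ' - '.
    stem = stem.replace(" - ", "--")
    # Separate a version marker from its number: v1 -> v_1 (or v.1).
    stem = re.sub(r"\b([Vv])(?=\d)", lambda m: m.group(1) + sep, stem)
    # Every remaining space, dot or underscore becomes the separator.
    stem = re.sub(r"[ ._]", sep, stem)
    result = stem + ext
    if case == "lower":
        return result.lower()
    if case == "upper":
        return result.upper()
    return result


if __name__ == "__main__":
    source = "NASA - Shuttle Inspection v1.2.3.ext"
    for label, sep, case in [("ScoresL", "_", "lower"), ("DotsP", ".", "preserve")]:
        print(label, "->", separated_format(source, sep, case))
```

Running a name through the same format twice gives the same result in this sketch, which is the "safe on already formatted files" property I want from every format.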

This article suggests using underscores in lieu of spaces for web files. That would be my ScoresL format above.

This article looks down on using underscores in web file names and suggests using dots instead. :unamused: That would be my DotsL format above.

All formats are safe to run on files that are already in that format. Thus, if the user selects a file formatted in Space and clicks on Space, a rename still takes place, but the file ends up with the same name it started with.

On any of the Scores or Dots Formats, I will force a double hyphen. This is required so that Friendly and Space formats can properly handle files that use hyphens between version numbers.


Friendly and Space each do nearly everything the other does, with one exception: Friendly forces capitalized words, while Space preserves the existing capitals in the file name. One would use Space on a file formatted like Butt (the opposite of Space).
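
The difference between the two can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the toolbar code: `space_format` and `friendly_format` are hypothetical names, the rules are inferred from the Space example in the list, and the sketch does not try to handle the harder Friendly input where dots separate words.

```python
import re


def split_ext(name):
    """Split 'stem.ext' at the last dot; a name without a dot has no extension."""
    stem, dot, ext = name.rpartition(".")
    return (stem, dot + ext) if dot else (name, "")


def space_format(name):
    """Hypothetical Space rename: keeps whatever capitals the name already has."""
    stem, ext = split_ext(name)
    stem = stem.replace("--", " - ")                   # double hyphen -> spaced hyphen
    stem = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", stem)   # split butted CamelCase words
    stem = re.sub(r"\b[Vv][._-]?(?=\d)", "v", stem)    # V.1 / V-1 / v1 -> v1
    return stem + ext


def friendly_format(name):
    """Hypothetical Friendly rename: Space plus forced capitalized words."""
    stem, ext = split_ext(space_format(name))
    words = [
        word if re.fullmatch(r"v\d[\d.]*", word) else word.capitalize()
        for word in stem.split(" ")
    ]
    return " ".join(words) + ext


if __name__ == "__main__":
    butted = "NASA--ShuttleInspectionV.1.2.3.ext"
    print(space_format(butted))     # capitals preserved
    print(friendly_format(butted))  # capitals forced word by word
```

In this sketch Friendly is simply Space followed by a capitalization pass that skips the version token, which is why "NASA" survives Space but becomes "Nasa" under Friendly.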

I have already completed, and extensively tested, Space. I'm completing the newest version of Friendly today (I already provided several versions of Friendly in this thread). These are both intended for files used in Windows by humans. I will complete Butt, since I want it. The DotsL and ScoresL are essential web formats. The others I'm not so sure about, but I thought I'd type them out to see how they would look.

Q: Do you think any of the other formats are worth spending my time on?

Q: Any issues with how the output formats above look?


Draft Documentation:

The new toolbar provides 4 automation-level buttons for each format rename. Each has its strengths and weaknesses; none is perfect. Users should copy the buttons they want to use to their own toolbar. Each automation-level button is named using the following convention, where Format = the actual format name from the above list:

[ul][li] Format 1[ul][li] Intended For: One (1) and only one (1) file at a time.[/li]
[li] Speed: Fully automated, one and you're done in seconds.[/li]
[li] Safeguards: Speed is throttled back a tiny bit (≈ 1 second) between each step. This ensures the file list is updated on long file names where several recursive renames occur in a single step, before proceeding to the next step.[/li]
[li] Weakness: The file list update speed is the weakest link. This method cannot be used for more than one file. [/li][/ul]
Bar none, this is the fastest and most convenient method for any format rename. And bar none, this is also the most risky method for any format rename. The high risk prohibits usage on more than one file--it simply will error on more than one file. The reason is the file list doesn't get refreshed fast enough between the multiple rename steps. This risk is greatly minimized when only one file is selected. The file list refreshes nearly fast enough to keep up with the several rename steps on a single file. It depends on how long the file name is, how intense the rename step is (some use recursion), and various system performance factors (what other apps are running). The tiny throttle-back safeguard reduces the risk further so that the file list refresh keeps up. It should be safe for all single-file renames. (But I'm not guaranteeing anything.) If a desktop search engine is monitoring the hard drive at that moment, all bets are off.

[/li]
[li] Format 1~[ul][li] Intended For: One or more files at once, manual verification of each step.[/li]
[li] Speed: As fast as the user can verify the file list has been updated and click OK.[/li]
[li] Safeguards: The user prevents the next step from launching until the file list is updated from previous step.[/li]
[li] Weakness: The user's error or impatience--clicking too early at any step messes up file names.[/li][/ul]
If the user is knowledgeable in how the file list refresh lags behind a rename, and well-practiced in using this button, this is the fastest method for multiple-file renames. However, the human error probability makes this the second riskiest method. Basically the button queues each rename step, and the user clicks OK when it's safe to proceed to the next step. All the control and risk is in the user's hands. The user must watch for the selected files to be renamed, not just deselected! Opus deselects files during a rename, but the updated file listing appears only after that. Patience and a keen eye observing each step are a must.
[/li]
[li] Step XX of YY: These buttons are located in the menu portion of the Format 1~ menu button. Each individual Step button has essentially the same factors as Format 1~, only the rename steps are not queued. The user must click on each individual rename step button in the proper sequence. These buttons are intended more for developing and testing a new rename format sequence. They are included in case others want to experiment with them.
[/li]
[li] Format 1+[ul][li] Intended For: One or more files at once with no required manual verification between each step.[/li]
[li] Speed: Calibrated (restricted), to account for system performance, allowing for safer one-click automation on several hundred files.[/li]
[li] Safeguards: Upfront testing by the user helps determine the calibration required. A calibration factor is added to the command's auto-wait period between each rename step. Eliminates the probability of the user clicking too soon between steps or in the wrong order on individual buttons.[/li]
[li] Weakness: The button calibration may not be accurate under all system loads. This risk level rises as system performance degrades below what it was during calibration testing. This can be managed by calibrating the buttons under a significant system load that includes applications accessing the hard drive. This method is not recommended for use on networked or removable drives.[/li][/ul]
This is the best option I could provide for hundreds of files. It allows for fewer mouse clicks than Format 1~. The button has a built-in auto-wait that scales to how many files the user selected (the user is prompted for this information).

I've provided a fairly automated calibration testing mechanism that allows the user to see how the button's default auto-wait performs against the provided 1281 test cases on the user's system. The user re-runs the calibration testing with different modifiers until no errors occur. The calibration test shows what actually should happen to one file at each step of a Space 1 Format Rename. So it's fairly informative. (In a future version I might actually discuss the RegExp code taking place on each of these screens.) Once the user has a good calibration number, they edit the Format 1+ button to code that in, and from then on just use the button. If the system is modified, or performance changes, the calibration should be done again.[/li][/ul]
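
The Format 1+ idea described above (a wait that scales with the number of selected files, plus a calibration factor found by re-running a test until no errors occur) can be sketched as follows. Everything here is hypothetical: the function names, the linear formula and the default numbers are my assumptions for illustration only, since the actual calculation is not given in this thread.

```python
def auto_wait_seconds(file_count, base=1.0, per_file=0.05, calibration=0.0):
    """Seconds to pause between two rename steps.

    Assumed model: a fixed base pause, plus a per-file share that scales with
    the selection size, plus the user's calibration factor. All defaults are
    made-up placeholders, not values from the toolbar.
    """
    if file_count < 1:
        raise ValueError("select at least one file")
    return base + per_file * file_count + calibration


def first_clean_calibration(count_errors, candidates):
    """Try calibration candidates in order; return the first one whose test
    run reports zero errors, or None if none of them is clean.

    count_errors is a caller-supplied function standing in for one run of
    the calibration test; it takes a calibration value and returns the
    number of wrongly renamed test files.
    """
    for candidate in candidates:
        if count_errors(candidate) == 0:
            return candidate
    return None


if __name__ == "__main__":
    # Pretend test run: this imaginary system needs at least 0.5 s extra.
    fake_run = lambda calibration: 0 if calibration >= 0.5 else 3
    best = first_clean_calibration(fake_run, [0.0, 0.25, 0.5, 1.0])
    print("calibration:", best)
    print("wait for 200 files:", auto_wait_seconds(200, calibration=best))
```

Trying the smallest candidates first keeps the button as fast as the system allows; calibrating under a heavier load, as recommended above, simply makes `count_errors` stricter.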

Great Blazes that's insane! And incredible! I can't offer much time for testing but I'll gladly give it a spin and let you know of any issues that arise.

Hi Ken and denyerec,

As I thought it might, and as Ken's solution indicates,
this one does consume some thought and time.

I made significant progress today on a different approach to this problem.
My solution will probably be three buttons.
Each button works a different part of the rename using RegExps.
The buttons each must be pressed multiple times until that part of the rename converges.
At convergence, input name = output name.
When each of the three buttons reaches convergence, only a titlecase rename will still be needed.

I'm not ready to post any results yet, but perhaps tomorrow.
This entire rename could be done more easily just by writing a command line program,
but it is now a matter of stubborn pride....

Regards,
Porcupine

It all became much easier this morning.
Well, here it is.

I have here three buttons.
Each of these buttons may have to be pressed several times to achieve convergence of that button.
The idea is to press the button, look at the result, and then press the button again if needed. 8)
The buttons must be used in the correct order.
Button one must be run to convergence, then button two must be run to convergence, and then button three must be run to convergence.

If you haven't already done so, download Ken's Test Files.
They can be downloaded from his KRA-10StepsToFileNameBliss-TestCases-v5.0.zip link earlier in this thread.

Button One - Removes ( dot | underscore | multiple space ) characters

@NoDeselect Rename REGEXP PATTERN="(([^_]*|[^\.]*)(_|\.| ))(.*)(\.)(.*)" TO="\2 \4\5\6"

Button Two - Inserts dot characters between numbers relative to (V | v) characters

@NoDeselect Rename REGEXP PATTERN="(.*)(V|v)((\.[0-9])*)([^0-9])?([0-9])(.*)" TO="\1\2\3.\6\7"

Button Three - Flanks hyphen characters with a space character on each side if they don't already exist

@NoDeselect Rename REGEXP PATTERN="(.*)(([^ ])(-)|(-)([^ ]))(.*)" TO="\1\3 \4\5 \6\7"
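
For anyone who wants to experiment outside Opus, the press-until-convergence idea can be sketched in Python using the Button Three expression. This is an approximation under stated assumptions: Python's `re` flavor may not behave exactly like the Opus rename engine, I assume the pattern must match the whole file name (hence `re.fullmatch`), and the helper names `press_once` and `press_until_converged` are made up.

```python
import re

# Button Three pattern and replacement, copied from the command above.
BUTTON_THREE = (r"(.*)(([^ ])(-)|(-)([^ ]))(.*)", r"\1\3 \4\5 \6\7")


def press_once(name, pattern, template):
    """One button press: rename if the whole name matches, else leave it alone."""
    match = re.fullmatch(pattern, name)
    # Groups that did not take part in the match expand to "" (Python 3.5+).
    return match.expand(template) if match else name


def press_until_converged(name, pattern, template, max_presses=50):
    """Press repeatedly until input name == output name.

    Returns the final name and the number of presses, counting the last
    press that changed nothing. max_presses is a safety cap of my own.
    """
    for presses in range(1, max_presses + 1):
        renamed = press_once(name, pattern, template)
        if renamed == name:
            return name, presses
        name = renamed
    raise RuntimeError("no convergence after %d presses" % max_presses)


if __name__ == "__main__":
    for sample in ["NASA-Shuttle.ext", "a-b-c.ext"]:
        print(sample, "->", press_until_converged(sample, *BUTTON_THREE))
```

Because the leading `(.*)` is greedy, each press fixes the last unflanked hyphen in the name, so a name with two bare hyphens needs two changing presses plus one more press that confirms nothing is left to do.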

The entire toolbar can be downloaded at the bottom of this post.

This does all of the original problem except the Titlecase Rename.
Ken got the Titlecase Rename in his solution.
I still think a command line program is the better solution here,
but after working on it several hours yesterday and almost failing,
I just HAD to finish it.

@Ken
Thanks very much for your test files.
They were invaluable ! :stuck_out_tongue:

Regards,
Porcupine
Filename Cleanup Regex -- Porcupine Three Button.dop (1.56 KB)

As soon as I'm done testing I'm posting a toolbar in the toolbar forum. It will have it all, in one button. I'm not done with the user screens yet.

Hi Ken,

I have a fix and update on my three button solution.
It now works with a single click on each button.

Button One - Removes ( dot | underscore | multiple space ) characters

@NoDeselect Rename REGEXP PATTERN="(.*)(_|\.| )(.*)(\.)(.*)#" TO="\1 \3\4\5"

Button Two - Inserts dot characters between numbers relative to (V | v) characters

@NoDeselect Rename REGEXP PATTERN="(.*)(V|v)((\.[0-9])*)([^\.])([0-9])(.*)#" TO="\1\2\3.\6\7"

Button Three - Flanks hyphen characters with a space character on each side if they don't already exist

@NoDeselect Rename REGEXP PATTERN="(.*)(([^ ])(-)|(-)([^ ]))(.*)#" TO="\1\3 \4\5 \6\7"

I tried your, AFAIK, undocumented 'trick' of adding a '#' character to the end of the expression.
It worked on buttons one and three, but always froze my computer on button two.
I was stumped for quite some time.
I realized tonight I was creating an infinite loop.
It all seems to work now.
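
For illustration, here is a small Python sketch of how a self-repeating rename can loop forever, assuming the trailing '#' makes Opus reapply the expression until the name stops changing. The two rules below are made-up examples, not my actual button two, and the pass limit is my own safety net rather than anything Opus provides.

```python
import re


def repeat_rename(name, pattern, template, max_passes=20):
    """Reapply one whole-name regex rename until the name stops changing.

    Returns (final_name, converged). The max_passes cap stands in for the
    guard that a runaway expression lacks.
    """
    for _ in range(max_passes):
        match = re.fullmatch(pattern, name)
        renamed = match.expand(template) if match else name
        if renamed == name:
            return name, True
        name = renamed
    return name, False


# Hypothetical runaway rule: it matches again after every rename, so each
# pass inserts one more dot after the "v" and the name never settles.
RUNAWAY = (r"(.*[Vv])(.*)", r"\1.\2")

# Hypothetical safe rule: requires a digit right after the "v", so once the
# dot has been inserted the pattern no longer matches.
SAFE = (r"(.*[Vv])([0-9].*)", r"\1.\2")


if __name__ == "__main__":
    print(repeat_rename("v1.ext", *RUNAWAY, max_passes=5))
    print(repeat_rename("v1.ext", *SAFE))
```

The general point is that a repeated expression has to stop matching its own output; if the replacement still satisfies the pattern, nothing ever ends the loop.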

The entire toolbar can be downloaded at the bottom of this post.

Regards,
Porcupine
Filename Cleanup Regex -- Porcupine Three Button- ver2.dop (1.55 KB)

I actually have things worked out so that one and only one button does the job. I've now created several different toolbar buttons for different file name formats. I am calling them "Format Renames", and I will be releasing a Format Rename Toolbar soon. However, I must do more testing on the performance calibration.

I've pioneered three different techniques of using one button.

One technique automates the process, as fast as possible, but is only safe for one file.

One technique automates the process, is as fast and as safe as possible, and is safe for thousands of files.

The other technique is semi-automatic; it is only as safe as the user (it's really for me to test things out).

Right now all I'm working on is some stress testing, and trying to extrapolate the most accurate calibration calculation to use. (All this will be explained in detail in my Toolbar post when I submit it for others to use.)