Filename Cleanup Regex

[quote="JohnZeman"]I don't know if this button command code is good enough to do all you want, but you might give it a try. Note all command lines begin with the word sync:

sync: dopusrt /cmd Rename PATTERN "." TO " " FINDREP
sync: dopusrt /cmd Select RESELECT
sync: dopusrt /cmd Rename PATTERN "-" TO " - " FINDREP
sync: dopusrt /cmd Select RESELECT
sync: dopusrt /cmd Rename PATTERN "_" TO " " FINDREP
sync: dopusrt /cmd Select RESELECT
sync: dopusrt /cmd Rename PATTERN "(.) ([0-9]) ([0-9]) ([0-9])(.)" TO "\1\2.\3.\4\5" REGEXP
sync: dopusrt /cmd Select RESELECT
sync: dopusrt /cmd Rename PATTERN "(.) (.)#" TO "\1 \2" REGEXP
sync: dopusrt /cmd Select RESELECT
sync: dopusrt /cmd Rename CASE=allwords[/quote]

@John

I ran this against my test case files above, to see what would happen.
Oh boy did I see fireworks! Lots of dialogs started flying all over the place. CPU usage for dopus.exe also climbed rapidly to 100% and my system fan started ramping up because the CPU started to get hot! :open_mouth:

I'm assuming you meant for this code to work only on a single file.
But even when I tried this against a single test case from my list, I still incurred both the dialogs and the CPU usage.

By the way, the SYNC: keyword will not help here. That only tells dopus.exe to wait for an external command to complete before launching the next external command. However, dopusrt.exe reports itself as complete as soon as it hands off the command back to dopus.exe. Dopusrt.exe doesn't know if dopus.exe has completed the raw command or not. Thus, this technique just adds an unneeded step to the same results as just using raw commands internally without using dopusrt.exe. Leo informed me of this in another thread somewhere on the Centre here.

Well that's interesting Ken, I had no such problems on my end. The button commands fail for me without the sync: dopusrt code, but work just fine with it on the file name examples denyerec posted such as:

another_file V_2_3.4.zip
HYPHEN-TASTIC -v4.rar
SOME.FILE.V.1.2.3_____TEMP.zip

I did run my button on your file list and I noticed not all the files renamed correctly, but most did.

[quote="JohnZeman"]Well that's interesting Ken, I had no such problems on my end. The button commands fail for me without the sync: dopusrt code, but work just fine with it on the file name examples denyerec posted such as:

another_file V_2_3.4.zip
HYPHEN-TASTIC -v4.rar
SOME.FILE.V.1.2.3_____TEMP.zip

I did run my button on your file list and I noticed not all the files renamed correctly, but most did.[/quote]

Download my test cases .zip file, and try to rename them using your button.

I've just posted an improved version to my method above. See my behemoth post earlier in this thread.

For those who have downloaded the earlier version, I recommend downloading this one as well as the .zip file containing my test cases. Try out my method on the test case files before attempting it with your own files! The key is to wait patiently for each step to complete before launching the next one.

I still cannot get a single button to reliably, safely, and consistently perform all the required steps, in the proper sequence, without error dialogs (even on a single file). Thus the 10-button method. However, with my method you can do all the files in your folder.

I'm sure PC performance, processor speed, other running processes, file name length, and the number of character replacements all affect the likelihood of the "Race State" (a race between the rename steps and the file list refresh).

For the curious, all the extra numbers in my test cases are what I call "search and replace obstacles." Opus usually searches for numbers from right to left. Since this often trips up people (or at least me) in regular expressions, I include prefix, middle, and suffix numbers to search around in my test cases. This helps me ensure my rename doesn't destroy numbers in the file names I want to keep.

My method above handles versions wherever they occur and leaves other whole numbers alone wherever they occur. Decimal numbers that are not part of a version number (i.e. one preceded with a "v" or "V") would be in trouble.
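As a rough illustration of the rule described above (touch only the numbers that follow a "v" or "V", leave other whole numbers alone), here is a minimal Python sketch. It is not the Opus button code from this thread. The name `normalize_versions` and the set of accepted separators (space, dot, underscore, hyphen) are assumptions made for the example.

```python
import re

# A "v" or "V" that is not the tail of a word, optional separators,
# then digit groups joined by single separators.
VERSION = re.compile(r"(?<![A-Za-z])[vV][ ._-]*(\d+(?:[ ._-]\d+)*)")

def normalize_versions(stem):
    """Rewrite version-like runs as v1.2.3; numbers without a leading v are untouched."""
    def fix(match):
        # Turn every separator inside the digit run into a dot.
        digits = re.sub(r"[ ._-]", ".", match.group(1))
        return "v" + digits
    return VERSION.sub(fix, stem)

print(normalize_versions("another_file V_2_3.4"))
print(normalize_versions("2007 report 15"))
```

One limitation of this sketch: a whole number that directly follows a version, as in "v1 2008", would be absorbed into the version. That is the kind of obstacle the extra numbers in the test cases are meant to expose.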

[quote="JohnZeman"]Well that's interesting Ken, I had no such problems on my end. The button commands fail for me without the sync: dopusrt code, but work just fine with it on the file name examples denyerec posted such as:

I did run my button on your file list and I noticed not all the files renamed correctly, but most did.[/quote]

I'm sorry John, I missed that bold sentence above before!

I just went through again and tried your button against my test case files. I might have rushed it before. It appears as if it is done (viewing the mouse hourglass) then another process kicks in. This time I was much more patient between each file and I indeed did not get any errors. So that is encouraging.

However...

I still observe dopus.exe utilize 97%-99% CPU. I just tested this on a second, pristine PC--it only has a brand new Windows XP SP2 install, fully updated, with only its hardware drivers and the latest Opus installed (plus a faster CPU). But I'm still seeing dopus.exe pegged at 96%-99% CPU utilization, and this is 15 minutes after I last clicked the button, plus the time I've been typing this message. My PC is just sitting there with the meter red-lined.

Okay, I've achieved measured success. I've updated my post above with the definitive solution. There is one button for one and only one selected file, and another for several selected files. Carefully read the updated post above and the dialogs while using the buttons. Both buttons do it all in one button. Neither will hurt files that have already been so renamed. Neither uses 99% CPU time.

Try everything on my provided test cases first, before attempting with your own files. If you receive error dialogs, you most likely double-clicked the toolbar button, or clicked on a dialog button too early.

Please post feedback from your own tests.

Hi Ken,

I'm overwhelmed !
Fantastic !

I understand you're a bit worked up as you've obviously put much effort into this.
I'm still wondering how to type the '¬' character from the keyboard.

But Ken, I still stand by my original post.
I still think I may be able to do this 'My Way' .
I just need a few days.
If I fail so be it.
In any case congrats to you !

We're really not that far apart Ken.
I went to some of Elementary School in the Eastern part of the UP.
I attended Gros Cap west of St. Ignace.
At the time, we lived on US2 almost as west as Brevort, about 20 miles from St. Ignace.

My Boyhood beach is now an almost hidden State Pull Over on US2.
Much thanks from me to a Detroit Lawyer who had a summer home adjacent to us at the time.
Grandfather wanted State permission to build a staircase down to Lake Michigan.
He got it and built it.
Long gone, it has been replaced and protected by the State of Michigan.

:opusicon: Porcupine

[quote="porcupine"]I understand you're a bit worked up as you've obviously put much effort into this.
I'm still wondering how to type the '¬' character from the keyboard.[/quote]

Hi there Porc!
I'm rather proud of my Opus raw command technique of using {dlgstring} to pause overall execution. It's a wee bit of a hack, but it works! :laughing:

Yeah, you could say I've put some time into this one. Truth is, when Denyerec posted his question, I knew right away what he was asking for. I was trying to accomplish something very similar (as in nearly exact) a few months back. I tried a number of techniques only to end in frustration. But I kept notes. When Denyerec posted what he had tried, it gave me an idea today of what I had been doing wrong in the past. It had to do with the general order of the replacements and needing to use temporary placeholder characters. I have also learned about the "#" operator since a few months ago. This is what makes it hum.

The character "¬" (not sign) is my personal favorite, for the very reason you typed above--not many people know how to type it. This really makes it the best temporary character. The chance of it actually existing in a real file name is slight at best. It is not in 7-bit ASCII, but it is part of the Windows ANSI (Latin-1) and Unicode character sets, at code 172. I use it in command script function variables for handling special characters that otherwise would choke a command script.

With the Numlock enabled, hold down the Alt key and type "0172" on your numeric keypad. It won't work from the regular number keys. Barring that, you can also copy and paste it from CharMap.exe.
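To show why a temporary placeholder and the order of replacements matter, here is a small Python sketch of the general idea: hide the dots you want to keep behind "¬", replace the remaining separators, then put the dots back. This is only an analogy, not the Opus button code. The function name `dots_to_spaces_keep_versions` and the rule "a dot between two digits is a version dot" are assumptions for the example.

```python
import re

PLACEHOLDER = "\u00ac"  # the not sign; the same character as chr(172), i.e. Alt+0172

def dots_to_spaces_keep_versions(stem):
    """Replace dots and underscores with spaces without breaking v1.2.3."""
    if PLACEHOLDER in stem:
        # The trick only works if the placeholder is not already in the name.
        raise ValueError("name already contains the placeholder character")
    # Step 1: protect dots that sit between two digits.
    protected = re.sub(r"(?<=\d)\.(?=\d)", PLACEHOLDER, stem)
    # Step 2: every remaining run of dots or underscores becomes one space.
    spaced = re.sub(r"[._]+", " ", protected)
    # Step 3: restore the protected dots.
    return spaced.replace(PLACEHOLDER, ".")

print(dots_to_spaces_keep_versions("some.file.v1.2.3_final"))
```

Doing step 2 before step 1 would turn "v1.2.3" into "v1 2 3", and there would no longer be a way to tell version dots from separator dots.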

Yeah we're basically at opposite ends of the same state. I haven't been to the U.P. since college--I deejayed a friend's wedding in Ishpeming. It's really beautiful up there.

I just completed a button prototype that undoes the first one! :smiling_imp:

Soon, we will have a file name formatting toolbar with at least two sets of buttons: one to reformat file names suitable for uploading to the Internet (removes spaces and caps and converts versions like "v1.2.3" to "v_1_2_3"), the other (posted already) takes Internet files and makes them more palatable for human consumption in Windows.

Quick, somebody give me something else meaningful and useful to add to it!

I think if this job gets packed into 2 buttons (One for adding and one for stripping) the guy responsible should get an honorary "Dev" title! :wink:

It's been and continues to be an interesting learning experience. I've already, as a byproduct, managed to add several little shortcuts to Opus that are now in constant use. Loving it!

:smiling_imp: I'm still cooking here.

The next version is getting renamed and published in the Buttons and Toolbars forum.

I'm adding support for "BeachesOfTheUSVirginIslands.doc" <-> "Beaches Of The US Virgin Islands.doc"

I'm also adding buttons for several steps that have value by themselves "V1.2.3" <-> "v1.2.3"

"&" <-> "+"

Any other ideas?
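For anyone wondering how the "BeachesOfTheUSVirginIslands" <-> "Beaches Of The US Virgin Islands" conversion might be approached, here is a minimal Python sketch. It is not the toolbar's Opus code. The names `split_camel` and `join_words` are made up for the example, and the rule for keeping an acronym such as "US" together is an assumption.

```python
import re

# A word boundary is either lower-case followed by upper-case ("sO"),
# or the end of an acronym: upper-case followed by upper-case + lower-case ("SVi").
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")

def split_camel(stem):
    """Insert a space at each detected word boundary."""
    return CAMEL_BOUNDARY.sub(" ", stem)

def join_words(stem):
    """The reverse direction: simply drop the spaces."""
    return stem.replace(" ", "")

print(split_camel("BeachesOfTheUSVirginIslands"))
print(join_words("Beaches Of The US Virgin Islands"))
```

The sketch ignores digits and the file extension, so a real button would need extra steps around version numbers.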

Looks like someone got bitten hard by the regEx bug... Awesome work!

More like the File Format Naming Bug! :smiley:.

Nearly all of my Opus time lately has been devoted to a new toolbar, named KRA-FileNames.dop. Hopefully I will post it by this weekend.

Right now I need some input and a tester or two. Any volunteers?

Help me think this through. I'm considering providing all the formats listed below. I just don't know if they are all really needed.

Proposed File Name Formats
[ul][li] Friendly - "nasa--shuttle.inspectionv.1.2.3.ext" To: "Nasa - Shuttle Inspection v1.2.3.ext" (doesn't preserve capitalization, forces title case)[/li]
[li] Space - "NASA--ShuttleInspectionV.1.2.3.ext" To: "NASA - Shuttle Inspection v1.2.3.ext" (preserves capitalization)
[/li]
[li] Butt - "NASA - Shuttle Inspection v1.2.3.ext" To: "NASA--ShuttleInspectionV-1-2-3.ext" (preserves capitalization, only hyphens)
[/li]
[li] ScoresL - "NASA - Shuttle Inspection v1.2.3.ext" To: "nasa--shuttle_inspection_v_1_2_3.ext" (all lower case, separate with underscores)[/li]
[li] ScoresP - "NASA - Shuttle Inspection v1.2.3.ext" To: "NASA--Shuttle_Inspection_v_1_2_3.ext" (preserves capitalization, separate with underscores)[/li]
[li] ScoresU - "NASA - Shuttle Inspection v1.2.3.ext" To: "NASA--SHUTTLE_INSPECTION_V_1_2_3.EXT" (all upper case, separate with underscores)
[/li]
[li] DotsL - "NASA - Shuttle Inspection v1.2.3.ext" To: "nasa--shuttle.inspection.v.1.2.3.ext" (all lower case, separate with periods)[/li]
[li] DotsP - "NASA - Shuttle Inspection v1.2.3.ext" To: "NASA--Shuttle.Inspection.v.1.2.3.ext" (preserves capitalization, separate with periods)[/li]
[li] DotsU - "NASA - Shuttle Inspection v1.2.3.ext" To: "NASA--SHUTTLE.INSPECTION.V.1.2.3.EXT" (all upper case, separate with periods)[/li][/ul]
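The Scores and Dots examples in the list above can be reproduced with a short Python sketch, shown here only to make the transformations concrete. It is not the toolbar's implementation. The function `to_web_format` and its `sep` and `case` parameters are hypothetical, and it assumes the input is already in the Space format.

```python
import os
import re

def to_web_format(name, sep, case="P"):
    """Convert a Space-style name; sep is "_" or ".", case is "L", "P" or "U"."""
    stem, ext = os.path.splitext(name)
    # " - " becomes a forced double hyphen.
    stem = stem.replace(" - ", "--")
    # Separate the version letter from its first digit: v1 -> v_1 or v.1
    stem = re.sub(r"(?<![A-Za-z])([vV])(?=\d)", lambda m: m.group(1) + sep, stem)
    # Dots between digits become the chosen separator.
    stem = re.sub(r"(?<=\d)\.(?=\d)", lambda m: sep, stem)
    # Remaining spaces become the chosen separator.
    stem = stem.replace(" ", sep)
    result = stem + ext
    if case == "L":
        return result.lower()
    if case == "U":
        return result.upper()
    return result  # "P": preserve capitalization

source = "NASA - Shuttle Inspection v1.2.3.ext"
print(to_web_format(source, "_", "L"))  # ScoresL
print(to_web_format(source, ".", "P"))  # DotsP
```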

This article suggests using underscores in lieu of spaces for web files. That would be my ScoresL format above.

This article looks down on using underscores in web file names, and suggests using dots instead. :unamused: That would be my DotsL format above.

All formats are safe to operate on files already so formatted. Thus, if the user selects a file formatted in Space and clicks on Space, the file name does change, only it changes back to the same name it started with.

On any of the Scores or Dots Formats, I will force a double hyphen. This is required so that Friendly and Space formats can properly handle files that use hyphens between version numbers.


Friendly and Space each do nearly everything the other does, with one exception: Friendly forces capitalized words, while Space preserves existing capitals in the file name. One would use Space on a file formatted like Butt (the opposite of Space).

I have already completed, and extensively tested, Space. I'm completing the newest version of Friendly today (I already provided several versions of Friendly in this thread). These are both intended for files used in Windows by humans. I will complete Butt, since I want it. The DotsL and ScoresL are essential web formats. The others I'm not so sure about, but I thought I'd type them out to see how they would look.

Q: Do you think any of the other formats are worth spending my time on?

Q: Any issues with how the output formats above look?


Draft Documentation:

The new toolbar provides 4 automation-level buttons for each format rename. Each has its strengths and weaknesses, none are perfect. Users should copy the buttons they want to use to their own toolbar. Each automation level button is named in the following convention, where Format = the actual format name from the above list:

[ul][li] Format 1[ul][li] Intended For: One (1) and only one (1) file at a time.[/li]
[li] Speed: Fully automated, one click and you're done in seconds.[/li]
[li] Safeguards: Speed is throttled back a tiny bit (≈ 1 second) between each step. This ensures the file list is updated on long file names where several recursive renames occur in a single step, before proceeding to the next step.[/li]
[li] Weakness: The file list update speed is the weakest link. This method cannot be used for more than one file. [/li][/ul]
Bar none, this is the fastest and most convenient method for any format rename. And bar none, this is also the most risky method for any format rename. The high risk prohibits usage on more than one file--it simply will error on more than one file. The reason is the file list doesn't get refreshed fast enough between the multiple rename steps. This risk is greatly minimized when only one file is selected. The file list refreshes nearly fast enough to keep up with the several rename steps on a single file. It depends on how long the file name is, how intense the rename step is (some use recursion), and various system performance factors (what other apps are running). The tiny throttle-back safeguard helps infinitesimalize the risk to ensure the file list refresh keeps up. It should be safe for all single-file renames. (But I'm not guaranteeing anything.) If a desktop search engine is monitoring the hard drive at that moment, all bets are off.

Format 1~[ul][li] Intended For: One or more files at once, manual verification of each step.[/li]
[li] Speed: As fast as the user can verify the file list has been updated and click OK.[/li]
[li] Safeguards: The user prevents the next step from launching until the file list is updated from previous step.[/li]
[li] Weakness: the user's error or impatience--clicking too early at any step messes up file names.[/li][/ul]
If the user is knowledgeable in how the file list refresh lags behind a rename, and well-practiced in using this button, this is the fastest method for multiple-file renames. However, the human error probability makes this the second riskiest method. Basically the button queues each rename step, and the user clicks OK when it's safe to proceed to the next step. All the control and risk is in the user's hands. The user must watch for the few selected files to be renamed, not to be deselected! Opus deselects files during a rename. However, the updated file listing occurs after this. Patience and a keen eye observing each step are a must.
[/li]
[li] Step XX of YY: These buttons are located in the menu portion of the Format 1~ menu button. Each individual Step button has essentially the same factors as Format 1~, only the rename steps are not queued. The user must click on each individual rename step button in the proper sequence. These buttons are intended more for developing and testing a new rename format sequence. They are included in case others want to experiment with them.
[/li]
[li] Format 1+[ul][li] Intended For: One or more files at once with no required manual verification between each step.[/li]
[li] Speed: Calibrated (restricted), to account for system performance, allowing for safer one-click automation on several hundred files.[/li]
[li] Safeguards: Upfront testing by the user helps determine the calibration required. A calibration factor is added to the command's auto-wait period between each rename step. Eliminates the probability of the user clicking too soon between steps or in the wrong order on individual buttons.[/li]
[li] Weakness: The button calibration may not be accurate under all system loads. This risk level rises as system performance degrades below what it was during calibration testing. This can be managed by calibrating the buttons under a significant system load that includes applications accessing the hard drive. This method is not recommended for use on networked or removable drives.[/li][/ul]
This is the best option I could provide for hundreds of files. It allows for fewer mouse clicks than Format 1~. The button has a built-in auto-wait that scales to how many files the user selected (the user is prompted for this information).

I've provided a fairly automated calibration testing mechanism that allows the user to see how the button's default auto-wait performs against the provided 1281 test cases on the user's system. The user re-runs the calibration testing with different modifiers until no errors occur. The calibration test shows what actually should happen to one file at each step of a Space 1 Format Rename. So it's fairly informative. (In a future version I might actually discuss the RegExp code taking place on each of these screens.) Once the user has a good calibration number, they edit the Format 1+ button to code that in, and from then on just use the button. If the system is modified, or performance changes, the calibration should be done again.[/li][/ul]
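The calibrated auto-wait described for Format 1+ can be pictured with a short Python sketch. None of this comes from the toolbar itself: the formula, the default numbers, and the names `auto_wait_seconds` and `first_clean_calibration` are all hypothetical, chosen only to show a wait that scales with the number of selected files and a calibration run that is repeated until no errors occur.

```python
def auto_wait_seconds(file_count, base_wait=1.0, per_file=0.05, calibration=1.0):
    """Pause between rename steps, scaled by selection size (all values hypothetical)."""
    return (base_wait + per_file * file_count) * calibration

def first_clean_calibration(run_test, candidates):
    """Return the first calibration factor for which run_test reports zero errors."""
    for factor in candidates:
        if run_test(factor) == 0:
            return factor
    return None  # no candidate was slow enough to be safe

# Stand-in for a real calibration run: pretend errors vanish at 1.5 and above.
def fake_test(factor):
    return 0 if factor >= 1.5 else 3

factor = first_clean_calibration(fake_test, [1.0, 1.25, 1.5, 2.0])
print(factor, auto_wait_seconds(200, calibration=factor))
```

In practice the test function would be the calibration run against the provided test cases, performed under a realistic system load as recommended above.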

Great Blazes that's insane! And incredible! I can't offer much time for testing but I'll gladly give it a spin and let you know of any issues that arise.

Hi Ken and denyerec,

As I thought it might, and as Ken's solution indicates,
this one does consume some thought and time.

I made significant progress today on a different approach to this problem.
My solution will probably be three buttons.
Each button works a different part of the rename using RegExps.
The buttons each must be pressed multiple times until that part of the rename converges.
At convergence, input name = output name.
When each of the three buttons reaches convergence, only a titlecase rename will still be needed.
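The "press until input name = output name" idea is a fixed-point loop. Here is a minimal Python sketch of it, assuming each press performs a single replacement. The helper name `rename_until_stable` and the `max_passes` guard are inventions for the example, and the pattern uses Python regex syntax rather than Opus syntax.

```python
import re

def rename_until_stable(name, pattern, repl, max_passes=100):
    """Apply one replacement per pass until the name stops changing."""
    for _ in range(max_passes):
        new_name = re.sub(pattern, repl, name, count=1)  # one "button press"
        if new_name == name:
            return name  # convergence: input name == output name
        name = new_name
    raise RuntimeError("did not converge within max_passes")

# Each pass replaces one underscore, so three passes are needed here
# (two that change the name, one that confirms nothing is left to do).
print(rename_until_stable("some_test_file", r"_", " "))
```

The `max_passes` guard matters: a replacement whose output always differs from its input would otherwise loop forever.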

I'm not ready to post any results yet, but perhaps tomorrow.
This entire rename could be done more easily just by writing a command line program,
but it is now a matter of stubborn pride....

Regards,
Porcupine

It all became much easier this morning.
Well, here it is.

I have here three buttons.
Each of these buttons may have to be pressed several times to achieve convergence of that button.
The idea is to press the button, look at the result, and then press the button again if needed. 8)
The buttons must be used in the correct order.
Button one must be run to convergence, then button two must be run to convergence, and then button three must be run to convergence.

If you haven't already done so, download Ken's Test Files.
They can be downloaded from his KRA-10StepsToFileNameBliss-TestCases-v5.0.zip link earlier in this thread.

Button One - Removes ( dot | underscore | multiple space ) characters

@NoDeselect Rename REGEXP PATTERN="(([^_]*|[^\.]*)(_|\.| ))(.*)(\.)(.*)" TO="\2 \4\5\6"

Button Two - Inserts dot characters between numbers relative to (V | v) characters

@NoDeselect Rename REGEXP PATTERN="(.*)(V|v)((\.[0-9])*)([^0-9])?([0-9])(.*)" TO="\1\2\3.\6\7"

Button Three - Flanks hyphen characters with a space character on each side if they don't already exist

@NoDeselect Rename REGEXP PATTERN="(.*)(([^ ])(-)|(-)([^ ]))(.*)" TO="\1\3 \4\5 \6\7"

The entire toolbar can be downloaded at the bottom of this post.

This does all of the original problem except the Titlecase Rename.
Ken got the Titlecase Rename in his solution.
I still think a command line program is the better solution here,
but after working on it several hours yesterday and almost failing,
I just HAD to finish it.

@Ken
Thanks very much for your test files.
They were invaluable ! :stuck_out_tongue:

Regards,
Porcupine
Filename Cleanup Regex -- Porcupine Three Button.dop (1.56 KB)

As soon as I'm done testing, I'm posting a toolbar in the toolbar forum. It will have it all, in one button. I'm not done with the user screens yet.

Hi Ken,

I have a fix and update on my three button solution.
It now works with a single click on each button.

Button One - Removes ( dot | underscore | multiple space ) characters

@NoDeselect Rename REGEXP PATTERN="(.*)(_|\.| )(.*)(\.)(.*)#" TO="\1 \3\4\5"

Button Two - Inserts dot characters between numbers relative to (V | v) characters

@NoDeselect Rename REGEXP PATTERN="(.*)(V|v)((\.[0-9])*)([^\.])([0-9])(.*)#" TO="\1\2\3.\6\7"

Button Three - Flanks hyphen characters with a space character on each side if they don't already exist

@NoDeselect Rename REGEXP PATTERN="(.*)(([^ ])(-)|(-)([^ ]))(.*)#" TO="\1\3 \4\5 \6\7"

I tried your, AFAIK, undocumented 'trick' of adding a '#' character to the end of the expression.
It worked on buttons one and three, but always froze my computer on button two.
I was stumped for quite some time.
I realized tonight I was creating an infinite loop.
It all seems to work now.
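One plausible reading of the freeze, sketched in Python: if the trailing "#" keeps repeating the rename for as long as the pattern still matches (an assumption about Opus, not something confirmed in this thread), then the first Button Two pattern never stops, because its own output still matches it. The second pattern requires a non-dot character before the digit, so it stops matching once the dots are in place. The helper `passes_until_no_match` and its `limit` are hypothetical, and the patterns are run with Python's `re` module, which may differ from the Opus regex engine in details.

```python
import re

BUTTON_TWO_V1 = r"(.*)(V|v)((\.[0-9])*)([^0-9])?([0-9])(.*)"
BUTTON_TWO_V2 = r"(.*)(V|v)((\.[0-9])*)([^\.])([0-9])(.*)"
REPLACEMENT = r"\1\2\3.\6\7"

def passes_until_no_match(pattern, name, limit=50):
    """Repeat the rename while the pattern matches the whole name.

    Returns (passes, final_name); passes is None if the limit was hit.
    """
    for passes in range(limit):
        match = re.fullmatch(pattern, name)
        if match is None:
            return passes, name
        name = match.expand(REPLACEMENT)
    return None, name  # still matching after `limit` passes

print(passes_until_no_match(BUTTON_TWO_V1, "file v_1_2.zip"))
print(passes_until_no_match(BUTTON_TWO_V2, "file v_1_2.zip"))
```

Both patterns arrive at the same name; the difference is only whether the pattern still matches afterwards.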

The entire toolbar can be downloaded at the bottom of this post.

Regards,
Porcupine
Filename Cleanup Regex -- Porcupine Three Button- ver2.dop (1.55 KB)

I actually have things worked out to one and only one button to do the job. I've now created several different toolbar buttons for different file name formats. I am calling them "Format Renames", and I will be releasing a Format Rename Toolbar soon. However, I must do more testing on the performance calibration.

I've pioneered three different techniques of using one button.

One technique automates the process, as fast as possible, but is only safe for one file.

One technique automates the process, is as fast and as safe as possible, and is safe for thousands of files.

The other technique is semi-automatic, it is only as safe as the user (it's really for me to test things out).

Right now all I'm working on is some stress testing, and trying to extrapolate the most accurate calibration calculation to use. (All this will be explained in detail in my Toolbar post when I submit it for others to use.)