How to replace numeric entities in filenames? (&#number;)

Hello. I know how to do it in Perl, but does anyone have a solution for VBScript?

The Perl line for this is: s/&#[0]*([0-9]+);/chr($1)/ieg;

The chr function is the same as in VBScript; however, in Perl the /e modifier makes the code above execute chr for each match.

This doesn't seem as straightforward in VBScript because the Matches and SubMatches collections are read-only.

PS: I'm very unfamiliar with VBScript, but Windows 7 x64 doesn't support PerlScript as a scripting engine, so I don't have a choice.

Btw, what I'm trying to achieve is to convert the numeric entity to its corresponding character.
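A read-only Matches collection does not rule out the per-match approach: the result can be built as a new string while walking the matches. Below is a minimal sketch of that idea, not a tested rename script. The function name DecodeEntities is made up for illustration, and two choices are assumptions of the sketch: entities are limited to four digits after any leading zeros, and ChrW is used so the number is treated as a Unicode code point.

```vbscript
' Sketch: emulate Perl's s/&#0*([0-9]+);/chr($1)/eg by walking the matches.
' DecodeEntities is a hypothetical name, not part of any library.
Function DecodeEntities(str)
  Dim re, m, result, pos
  Set re = New RegExp
  ' Assumption: at most four digits after any leading zeros, so ChrW
  ' is never handed a value above 9999.
  re.Pattern = "&#0*([0-9]{1,4});"
  re.Global = True
  pos = 1   ' 1-based position of the first character not yet copied
  For Each m In re.Execute(str)
    ' Copy the text between the previous match and this one unchanged
    ' (FirstIndex is 0-based, Mid is 1-based).
    result = result & Mid(str, pos, m.FirstIndex + 1 - pos)
    ' Replace the entity with the character it stands for.
    result = result & ChrW(CLng(m.SubMatches(0)))
    pos = m.FirstIndex + m.Length + 1
  Next
  ' Append whatever follows the last match.
  DecodeEntities = result & Mid(str, pos)
End Function

WScript.Echo DecodeEntities("It&#39;s&#32;a&#32;test.html")   ' prints: It's a test.html
```

Nothing is written back into the Matches collection; it is only read, which is why its being read-only does not get in the way.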

What are you trying to do (in English rather than Perl-regex :slight_smile:)?

I thought my second post said it all. I guess you've seen filenames like these (between quotes):

"It's test.html" and "It%27s%20a%20test.html"

What I would like is to convert each entity to its real character, so that after the rename it would be, in this case:
"It's a test.html"

Numeric entities: december.com/html/spec/codes.html

But how I would do it in VBScript doesn't seem so obvious.
It is basic rename scripting, but apparently with a twist.

I have a "strip all before first number" script used to rename episode titles, and it depends on this functionality, as those kinds of numbers aren't numbers per se, but really characters.

Hmm, the forum interprets the other kind as its corresponding character.
Remove the . between & and #:
"It&.#39;s&.#32;a&.#32;test.html"

If you mean for renaming files and it's just a few characters such as spaces and punctuation, then I'd create a button that runs a multiple find and replace rename command similar to the following:

@nodeselect
Rename PATTERN="&#39;" TO="'" FINDREP
Rename PATTERN="&#32;" TO=" " FINDREP

If you mean to replace all the characters within an html document, that's a little different. I just did that yesterday when I went to look up some lyrics to a song and the entire song lyrics were specially encoded by character entities to prevent them from being copied. So what I did was to save the source code from that web page, then copied the character entities and pasted them into a brand new html document and let my web browser decode them for me.

JohnZeman, the intent is to replace ANY numeric entity, i.e. %number or &.#number; (without the .), in a filename with its proper character (for instance "&.#65;" = "%41" = "A"). In other words, 00-FF and 000-255 respectively.

In this case it would be too long a road to run Rename pattern as it would mean 512 renames (if one starts from 00) per file, and my folders have 12-500 files each.

If your intent is to do that, then you may discover, as I did a few years back when I was writing a script to do the same thing, that what you'll likely end up with is an endless loop and a bunch of files that have totally meaningless names. What I mean by that is that each character within a character entity is in itself capable of being converted into yet another character entity. Since character entities contain &, #, the digits 0-9, and the semicolon, you have to treat those characters differently.

In the end I limited my script, which was a script for my text editor so it wouldn't be compatible here, to only the main alphabet characters and certain other characters to prevent it from going into an endless loop.

JohnZeman, as I said initially, I already have the code, but currently in a scripting language that doesn't work with DOpus under 64-bit Windows (it works under 32-bit, however):

This is the entire section I'm struggling with (it is written in Perl, and is part of a rename script):

s/&#[0]*([0-9]+);/chr($1)/ieg;
#Convert hex % symbols to chars.
s/%([0-9][0-9A-F])/chr(hex($1))/ieg;

and it does what I mentioned.

I didn't mention both earlier because if one of the lines is solved, then the solution for the other becomes obvious.
It is a tiny bit more limited than 00-FF, but covers most if not all chars I've ever encountered in a filename.
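For comparison, here is a rough VBScript sketch that handles both Perl lines in a single pass over the filename. The name DecodeName is invented for illustration. The pattern mirrors the Perl one for %XX, limits numeric entities to four digits after any leading zeros, and uses ChrW for both kinds; those last two points are assumptions of the sketch, not something taken from the Perl code.

```vbscript
' Sketch: decode &#NNN; and %XX in one pass (DecodeName is a hypothetical name).
Function DecodeName(str)
  Dim re, m, result, pos, code
  Set re = New RegExp
  ' First alternative: numeric entity; second: the same hex form as the Perl line.
  re.Pattern = "&#0*[0-9]{1,4};|%[0-9][0-9A-F]"
  re.IgnoreCase = True
  re.Global = True
  pos = 1   ' 1-based position of the first character not yet copied
  For Each m In re.Execute(str)
    If Left(m.Value, 1) = "%" Then
      code = CLng("&H" & Mid(m.Value, 2))          ' "%27" -> &H27 -> 39
    Else
      code = CLng(Mid(m.Value, 3, m.Length - 3))   ' "&#039;" -> "039" -> 39
    End If
    ' Copy the untouched text before this match, then the decoded character.
    result = result & Mid(str, pos, m.FirstIndex + 1 - pos) & ChrW(code)
    pos = m.FirstIndex + m.Length + 1
  Next
  DecodeName = result & Mid(str, pos)
End Function

WScript.Echo DecodeName("It%27s&#32;a%20test.html")   ' prints: It's a test.html
```

Because every match is taken from the original string, decoded text is never scanned again. A name such as "%26#65;" comes out as "&#65;" rather than being decoded a second time into "A", which also sidesteps the endless-loop worry raised earlier in the thread.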

Is this what you want?

classicasp.aspfaq.com/general/ho ... d-url.html

[quote="leo"]Is this what you want?

classicasp.aspfaq.com/general/ho ... d-url.html[/quote]
Thank you, it does seem to work, at least at first glance. :slight_smile:

So the code for both becomes something like this:

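Function URLDecode(str)
  Dim i, sT, sR
  For i = 1 To Len(str)
      sT = Mid(str, i, 1)
      If sT = "%" Then
          If i+2 <= Len(str) Then
              ' Turn the two hex digits after "%" into a character.
              sR = sR & _
                  Chr(CLng("&H" & Mid(str, i+1, 2)))
              i = i+2   ' skip the two digits just consumed
          Else
              sR = sR & sT   ' a "%" too close to the end is kept, not dropped
          End If
      Else
          sR = sR & sT
      End If
  Next
  URLDecode = sR
End Function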

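Function EntityDecode(str)
  Dim i, j, n, sR
  i = 1
  Do While i <= Len(str)
    n = -1
    If Mid(str, i, 2) = "&#" Then
      ' Find the end of the digit run after "&#" (leading zeros allowed).
      j = i + 2
      Do While j <= Len(str) And InStr("0123456789", Mid(str, j, 1)) > 0
        j = j + 1
      Loop
      ' Accept one to five digits followed by ";".
      If j > i + 2 And j - i - 2 <= 5 And Mid(str, j, 1) = ";" Then
        n = CLng(Mid(str, i + 2, j - i - 2))
      End If
    End If
    If n >= 1 And n <= 255 Then
      sR = sR & Chr(n)   ' decode the entity and skip past its ";"
      i = j + 1
    Else
      sR = sR & Mid(str, i, 1)   ' ordinary character, or an entity left as-is
      i = i + 1
    End If
  Loop
  EntityDecode = sR
End Function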

WScript.Echo(UrlDecode("testing%20this%20again"))
WScript.Echo(EntityDecode("and&.#32;again&.#32;and&.#32"))

(Remove the . between & and # in the line above. It was added to stop the forum messing with the line.)

The last two lines are just a test of the functions.

It's a lot messier than Perl, but as long as it works...

There was a small bug in the example that leaked through to the above functions. < should apparently be <=, otherwise it won't replace a number/entity if it is the end of the line.

[Admin note: Thanks, I've edited the post to correct it. --Leo]