Find file names that contain the same word twice?

Hello,

I just went through an exhausting procedure where I renamed over 30,000 files. Some of the file names are over 100 characters long, and some contain the same word more than once.

Example: Planning and Development Instructions for Development Permit.pdf

Repeated word: Development

Is there a script or command that will look at all the files in a folder and return just the files whose names contain the same word two or more times?

Thanks :laughing:

I don't know of an existing tool to do that. Seems like a good job for a scripting language (or Excel if you use Opus to turn the filepaths into something you can paste into Excel).

I have exported all the file names into an Excel document but do not know the function/command to find duplicate words in each file name. Any help would be great. Thank you

You would have to write macro code (Visual Basic) to do this in Excel, or find someone who can do it for you. You need to split each name into "words" and count the number of occurrences of each word. You would need to be the judge of what constitutes a word. Words are not always separated by spaces. For example:

1. Planning and Development Instructions for Development Permit.pdf
2. Planning_and_Development Instructions for Development Permit.pdf

Regards, AB
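
The split-and-count idea described above can be sketched in Python. This is only an illustration, not an existing tool: the function name `repeated_words` and the `split_underscores` switch are made up for this example, and it assumes that case should be ignored and that the file extension should not count as a word.

```python
import os
import re
from collections import Counter

def repeated_words(filename, split_underscores=True):
    """Return the words that occur more than once in a file name.

    Hypothetical helper. Assumes case-insensitive comparison and that
    the extension (e.g. "pdf") is not a word.
    """
    stem, _ext = os.path.splitext(filename)
    # \w+ treats "_" as part of a word; [^\W_]+ treats it as a separator.
    pattern = r"[^\W_]+" if split_underscores else r"\w+"
    counts = Counter(re.findall(pattern, stem.lower()))
    return sorted(word for word, n in counts.items() if n > 1)
```

With underscores treated as separators, both example names above report "development". With `split_underscores=False`, the second name reports nothing, because "Planning_and_Development" is then a single word.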

Thanks for the help. I just emailed staff in the IT department to help me out with this!

I'll let you know if it works, and will post the script.

This Excel macro identifies the file names that contain a repeated word and copies each one to another sheet. The macro looks for the file names on "Sheet1" starting with cell "A1". The names with repeated words are copied to "Sheet2" starting at cell "A1". You can change these to match where your data is. Here is the macro code.

CODE:

Sub FindRepeatedWords()

Dim Cell As Range
Dim DSO As Object
Dim DstRng As Range
Dim R As Long
Dim RegExp As Object
Dim RngEnd As Range
Dim SrcRng As Range
Dim W As Variant
Dim Words As Variant

'Define the starting cells for the Source and Destination ranges
Set SrcRng = Worksheets("Sheet1").Range("A1")
Set DstRng = Worksheets("Sheet2").Range("A1")

'Find the last cell in the column with data
Set RngEnd = SrcRng.Parent.Cells(Rows.Count, SrcRng.Column).End(xlUp)
'Exit if no data is found
If RngEnd.Row < SrcRng.Row Then Exit Sub

'Extend the range to the last cell in the column
Set SrcRng = SrcRng.Parent.Range(SrcRng, RngEnd)

 'Create the associative array for identifying repeated words
  Set DSO = CreateObject("Scripting.Dictionary")
  DSO.CompareMode = vbTextCompare

 'Create the word parser
  Set RegExp = CreateObject("VBScript.RegExp")
  RegExp.Global = True
  RegExp.IgnoreCase = True
  RegExp.Pattern = "(\w+)\b"
  
  For Each Cell In SrcRng
    Set Words = RegExp.Execute(Cell.Text)
    For Each W In Words
      If Not DSO.Exists(W.Value) Then
        DSO.Add W.Value, 1
      Else
       'Copy the file name to the new sheet
        DstRng.Offset(R, 0) = Cell
       'Advance the row counter
        R = R + 1
        Exit For
      End If
    Next W
    DSO.RemoveAll
  Next Cell

'Free the object references and memory
Set DSO = Nothing
Set RegExp = Nothing

End Sub

I prefer Perl...

while (<>) { if (m/\b(\w+)\b.+\b\1\b/g) { print $_; } }
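
For anyone without Perl, the same backreference trick can be sketched in Python. The names `REPEATED` and `has_repeated_word` are made up for this example. Ignoring case is an assumption added here to match the Excel macro; the Perl one-liner itself is case-sensitive.

```python
import re

# Capture a whole word, then require the same word again later in the name.
# re.IGNORECASE is an assumption (the Perl one-liner is case-sensitive).
REPEATED = re.compile(r"\b(\w+)\b.+\b\1\b", re.IGNORECASE)

def has_repeated_word(name):
    """Return True if some whole word appears at least twice in name."""
    return REPEATED.search(name) is not None
```

Note that `\w` includes the underscore, so "Planning_and_Development" counts as one word and a name like "Planning_and_Development Instructions for Development Permit.pdf" is not reported by this pattern.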

Unfortunately the above regex does not work in DO search.
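
If the search field can't take that regex, a small standalone script run against the folder is another option. Below is a rough Python sketch, not a tested tool: the function name `files_with_repeated_words` is made up, it only looks at files directly inside the folder (no subfolders), ignores case, skips the extension, and treats underscores as word separators. All of those are assumptions you may want to change.

```python
import re
from pathlib import Path

# Runs of letters/digits; "_" and punctuation act as separators (assumption).
WORD = re.compile(r"[^\W_]+")

def files_with_repeated_words(folder):
    """Yield (path, repeated_words) for each file directly inside folder."""
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        seen, repeated = set(), set()
        # path.stem is the file name without its final extension.
        for word in WORD.findall(path.stem.lower()):
            if word in seen:
                repeated.add(word)
            else:
                seen.add(word)
        if repeated:
            yield path, sorted(repeated)
```

To use it, loop over `files_with_repeated_words(r"C:\some\folder")` (the path is a placeholder) and print `path.name` together with the repeated words.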