Automate replacing all files with a certain, identical content throughout a folder hierarchy

TrulyFoxy · May 28, 2021, 7:26pm

I was about to investigate a PowerShell way, but given how often I've found out later that DO could have done something, I'm asking: is there a way?

I have a huge set of small files (four million) in a folder hierarchy (thousands of subfolders). It's a geographic map tile source. I need to find every file that has a certain content and replace each one with a single 'source' file, keeping the name of the replaced file intact.

Actually, only the file size and the first four characters of the file need to be tested to match. They are all small, identical (content, not name) .jpg files that are (unfortunately) named as .png, mixed in among the majority of files that actually are .png. I'm replacing those erroneous jpgs with a real png file.
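A minimal sketch of that test in Python, assuming the size and leading bytes are read from one known bad tile first. The function names `signature_of` and `matches_tile` are hypothetical, not from any tool mentioned in this thread. JPEG files start with the bytes FF D8 FF, but the fourth byte varies between JPEG flavours, so it is safer to take the prefix from an actual blank tile than to hard-code it.

```python
import os

def signature_of(path, prefix_len=4):
    """Return (size, first prefix_len bytes) of a known bad tile."""
    with open(path, "rb") as f:
        return os.path.getsize(path), f.read(prefix_len)

def matches_tile(path, expected_size, expected_prefix):
    """Return True if the file has the expected size and leading bytes."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap size check first, so most files are never opened
    with open(path, "rb") as f:
        return f.read(len(expected_prefix)) == expected_prefix
```

Checking the size before opening the file matters with millions of tiles, since the real .png files will almost always fail on size alone.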

Answers on a postcard please (there's a phrase from the past)... or better yet, here.

Leo · May 28, 2021, 9:17pm

It's something you could do with Opus but it'd probably require some scripting and either end up similar to what you were thinking of doing in PowerShell (if the script did everything) or involve a bunch of extra clicking (if the script just identified the files that need replacing).

If you're already familiar with PowerShell and thinking about ways to do it with that, I'd probably stick with that method. But Opus does have scripting helpers that let you examine the binary data in files which you could use if you wanted to do the same thing using Opus.

TrulyFoxy · May 28, 2021, 10:42pm

Thanks Leo. I've thought of a simpler way anyway, although it's not as exciting.

I realized that there's practically no chance of another file having the exact same size but not being one of those in question (they are 'blank' tiles around the coastline). So...

1. Use robocopy to move all the files of the matching size out into a duplicated folder structure.
2. Use a recursive for loop in a batch file to overwrite every file in the extracted set with the source file.
3. Robocopy them back into the original place.
4. Enjoy a beer in the time I saved not writing a PowerShell script to check the file headers.
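For comparison, the same size-only idea can be sketched as a single in-place pass in Python, without moving files out and back. This is not what was run in this thread; `replace_by_size` and its parameters are hypothetical names, and it relies on the same assumption stated above, that no legitimate tile has exactly the target size. It would be wise to try it on a copy of a small subfolder first.

```python
import os
import shutil

def replace_by_size(root, source_file, target_size):
    """Overwrite every file under root whose size equals target_size
    with the contents of source_file, keeping each file's own name.
    Returns the number of files replaced."""
    source_abs = os.path.abspath(source_file)
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.abspath(path) == source_abs:
                continue  # never overwrite the source file with itself
            if os.path.getsize(path) == target_size:
                shutil.copyfile(source_file, path)  # replaces content, keeps the name
                replaced += 1
    return replaced
```

Because only the content is overwritten, the tile names and folder layout stay exactly as they were.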