opw62
October 29, 2024, 12:18pm
I have hundreds of HTML files that have been converted to PDF.
The file names (without extensions) are the same, e.g. blah-blah.html and blah-blah.pdf.
They are all in the same folder, and I expect some 80-90% of them have these matching file names.
There are also a bunch of files that have not been converted.
I'd like to delete all the converted HTML files, i.e. the HTML files that have a .pdf version with the same file name.
Any suggestions?
Deleting them one by one is a time-consuming and boring exercise...
Thanks.
fkast
October 29, 2024, 1:58pm
Try modifying this script:
Argh, the extension check is case-sensitive, and I only covered lowercase, while in your case everything seems to be uppercase.
Here is a version with a case-insensitive extension check:
OrphanedXMP.osp (917 Bytes)
Or modify this:
You could use this button to select matching folders:
@nodeselect
Select PATTERN="{=RegEx(file, "(.*)\.html?$", "\1_files")=}" TYPE=dirs
Delete
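The RegEx call in that button turns each HTML file name into the name of its companion "_files" folder. A small Python sketch of the same pattern, just to show what it matches (the function name `companion_folder` is hypothetical, and returning None for non-matching names is my own choice, not necessarily what Opus's RegEx does):

```python
import re
from typing import Optional

# Same pattern as in the button: "html?" matches both ".htm" and ".html".
HTML_NAME = re.compile(r"(.*)\.html?$")

def companion_folder(name: str) -> Optional[str]:
    """Map "page.html" / "page.htm" to "page_files"; return None otherwise."""
    m = HTML_NAME.match(name)
    if m is None:
        return None  # not an .htm/.html name (this check is case-sensitive)
    # Group 1 is everything before the extension; the greedy (.*) keeps
    # inner dots, so "a.b.html" becomes "a.b_files".
    return m.group(1) + "_files"

for n in ["blah-blah.html", "notes.htm", "blah-blah.pdf"]:
    print(n, "->", companion_folder(n))
```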
opw62
October 29, 2024, 3:11pm
Many thanks to you all.
I'm going to give it a try. It's a bit more complex than I initially thought, to be honest.
Thanks again!
lxp
October 29, 2024, 4:06pm
You can use the last example almost verbatim.
opw62:
all those ... html files
FileExt(file_name)==".html"
Exists("" + Parent(fullpath) + "\" + Stem(file_name) + ".pdf")
Putting it together:
Select FILTERDEF =FileExt(file_name)==".html" && Exists("" + Parent(fullpath) + "\" + Stem(file_name) + ".pdf")
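For anyone who wants the same logic outside Opus, here is a rough Python sketch of that filter: keep a file if its extension is .html and a file with the same stem plus ".pdf" exists in the same folder. The function names are hypothetical, the extension test is deliberately case-insensitive (my assumption, since the files may be uppercase), and it defaults to a dry run that only prints what it would delete.

```python
from pathlib import Path

def converted_html_files(folder: Path) -> list[Path]:
    """Return .html files in `folder` that have a same-stem .pdf next to them."""
    matches = []
    for f in sorted(folder.iterdir()):
        # Like FileExt(file_name)==".html", but compared case-insensitively;
        # is_file() skips folders whose names happen to end in .html.
        if f.is_file() and f.suffix.lower() == ".html":
            # Like Exists(Parent(fullpath) + "\" + Stem(file_name) + ".pdf").
            # Note: on a case-sensitive file system "X.PDF" would not match.
            if f.with_suffix(".pdf").exists():
                matches.append(f)
    return matches

def delete_converted(folder: Path, dry_run: bool = True) -> list[Path]:
    """Delete (or, by default, just list) the converted HTML files."""
    targets = converted_html_files(folder)
    for f in targets:
        if dry_run:
            print(f"would delete {f.name}")
        else:
            f.unlink()
    return targets
```

Run it once with the default `dry_run=True`, check the printed list, then call `delete_converted(Path(r"..."), dry_run=False)` on the real folder.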
You can also Group by extension and click the group header to select them all.
Leo
October 29, 2024, 5:06pm
Another possibility is something like this:
Select SIMILARBASE
Select ~*.(html|htm) DESELECT
That will look at what is currently selected and then:
Select everything with the same name stem but a different extension.
Deselect anything that isn't a .html or .htm file.
(The original selection of non-html files will be deselected as well, which is presumably wanted if you intend to delete the selected files as the next step.)
(You could also add Select * TYPE=dirs DESELECT at the end to explicitly deselect the folders, if you're worried about folders with .html or .htm at the end of their names.)
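To make the two steps concrete, here is a Python sketch of my reading of that description, starting from a selection such as all the PDFs. It is an emulation, not how Opus implements SIMILARBASE; the function name is hypothetical, folders are assumed to be excluded from `folder_files` already, and stems are compared case-insensitively as an assumption matching Windows behaviour.

```python
from pathlib import PurePath

HTML_EXTS = {".html", ".htm"}

def select_similar_html(folder_files: list[str], selected: list[str]) -> list[str]:
    """Emulate 'Select SIMILARBASE' followed by 'Select ~*.(html|htm) DESELECT'."""
    # Step 1 (SIMILARBASE): anything sharing a stem with the selection is picked.
    stems = {PurePath(n).stem.casefold() for n in selected}
    picked = []
    for name in folder_files:
        p = PurePath(name)
        same_base = p.stem.casefold() in stems
        # Step 2 (DESELECT non-html): the original PDFs drop out here too.
        is_html = p.suffix.casefold() in HTML_EXTS
        if same_base and is_html:
            picked.append(name)
    return picked

files = ["a.pdf", "a.html", "b.htm", "b.pdf", "c.html", "d.pdf", "d.txt"]
pdfs = [f for f in files if f.endswith(".pdf")]
print(select_similar_html(files, pdfs))
```

Here c.html stays unselected because it has no PDF, and d.txt is dropped by the second step.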