HTML title column?

PeterPanino · May 26, 2008, 10:28am

In DO Details view, is there a column which displays the title of HTML files (taken from the title tag, e.g. <title>Some Title</title>)?

Steve · May 26, 2008, 10:42am

No.

PeterPanino · May 26, 2008, 10:55am

Strange. There are specific columns for pictures, documents, music and programs but none for html/internet files!

Steve · May 26, 2008, 11:10am

No, it's not strange at all. They're not filetypes which provide a built in tagging system like pictures, docs, music and programs do. The files themselves would have to be parsed looking for html tags which could be anywhere in the text - slowing things down considerably.

PeterPanino · May 26, 2008, 11:24am

The title tag is always at beginning, IMHO.

Leo · May 26, 2008, 11:41am

If you view the source to this very page you'll see an example where it's on the 28th line. It could also be below a load of comments, CSS, javascript, etc.

Opus (or a plugin) could parse all the HTML to find the title but it's non-trivial (in terms of both programming and how long it would take per-file) compared to the other formats mentioned.

Not impossible, by any means, but also not surprising that it isn't done already, especially given how the title of most HTML files isn't interesting when you already know the path and filename to them.
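To illustrate what "parse the HTML to find the title" involves, here is a minimal Python sketch using the standard library's `html.parser`. It is not how Opus or any plugin does it; the names `TitleFinder` and `find_title` are made up for this example. It feeds the text in chunks and stops as soon as a title has been seen, so comments, CSS and javascript that come before the title are skipped but still have to be read.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def find_title(html_text, chunk_size=4096):
    """Return the first title found in html_text, or None if there is none."""
    finder = TitleFinder()
    for start in range(0, len(html_text), chunk_size):
        finder.feed(html_text[start:start + chunk_size])
        if finder.done:
            break  # no need to parse the rest of the document
    else:
        finder.close()  # flush any buffered text at end of input
    return "".join(finder.parts).strip() if finder.done else None
```

A title that only appears inside a comment or a script block is ignored, because the parser never reports it as a start tag.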

Steve · May 26, 2008, 11:42am

You're wrong, it's not always in the exact same place. A vague "at the beginning" is not good enough when it comes to parsing a file - it needs to know the exact byte it starts at otherwise it has to parse through until it finds it.

PeterPanino · May 26, 2008, 1:03pm

Parsing an html file for the title tag should affect only those who have activated this column, so it would not slow down any other users. For the specific situation where I need it (not more than 35 html files inside the folder, title tag among the first 5-10 lines) it would take appr. 0,001 seconds to detect the titles in all 35 files, if a custom thread is used then this would be reduced to appr. 0,0001 seconds. Not really a time amount to be felt by the user.

Steve · May 26, 2008, 1:20pm

I doubt that GPSoftware would write something into Opus which would only work for your very specific needs. As for your "guestimates" regarding time well that depends on hardware speeds, system usage, etc, so you cannot possibly have any idea how long it would take. What about the overhead of an antivirus program, for example, which checks all files loaded? You have absolutely no idea how long it would take to parse the files and populate the opus description column so don't imply that your figures are anything other than guesses.

For such specific needs like this you could write your own program or script to parse the files and populate the descript.ion file - which would work fine with Opus.
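A rough Python sketch of such a script follows. It rests on several assumptions: that descript.ion holds one "filename description" entry per line with names containing spaces wrapped in double quotes, that UTF-8 is an acceptable encoding, and that the title sits within the first 64 KB of each file. The function names are invented for the example. Note that it overwrites any existing descript.ion in the folder.

```python
import re
from pathlib import Path

# Simple pattern for illustration; a regex is not a full HTML parser.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def read_title(path, max_bytes=65536):
    """Return the title text from the first max_bytes of a file, or None."""
    with path.open("rb") as handle:
        head = handle.read(max_bytes).decode("utf-8", errors="replace")
    match = TITLE_RE.search(head)
    if not match:
        return None
    # Collapse whitespace so each description stays on a single line.
    return " ".join(match.group(1).split())


def format_entry(filename, description):
    """Build one descript.ion line (assumed format: name, space, description)."""
    name = f'"{filename}"' if " " in filename else filename
    return f"{name} {description}"


def write_descriptions(folder):
    """Write descript.ion for the HTML files in folder; return the lines written."""
    folder = Path(folder)
    lines = []
    for path in sorted(folder.glob("*.htm*")):
        title = read_title(path)
        if title:
            lines.append(format_entry(path.name, title))
    # Assumption: replacing the whole file is acceptable for this folder.
    (folder / "descript.ion").write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return lines
```

Calling `write_descriptions(r"C:\some\folder")` (a placeholder path) would then leave the descriptions for the file manager to pick up.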

If you can't do that yourself then I'm sure you can pay a programmer to do it for you. Otherwise, as you say, you only need this for 35 files so why not do it manually?

A far better alternative, imo, would be to not rely on the <title> tag to identify your html files in the first place.

PeterPanino · May 26, 2008, 10:45pm

It took me a half hour to put together a html title parser - see attached files and try it yourself. As for my "guestimates", they weren't very wrong ...

If you want the source code - drop me a message ...
ParseForTitleTag.zip (192 KB)

Jon · May 26, 2008, 10:56pm

0.00225510504826731 seconds is far too slow - it would have to take less than 0.00225510504825 seconds before we'd even consider it!

PeterPanino · May 26, 2008, 11:01pm

He he ...

David · May 27, 2008, 1:13am

I think Jon is serious.
What was it, seven files ?

Could you,
or anyone else posting to this thread,
provide a zip file of sample HTML files to use as a benchmark ?

I'm curious as to what I can achieve using PHP and my SLOW Linux based web host.
It's easy to write code in a PHP script to calculate the computation time of the script.

I know that I can do a recursive Directory search of about 350 photos in various directories,
obtaining image dimensions and filesize,
reorder the database in a natural recursive search order,
and retain alias fields of the already existing database in about 3 seconds.
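The same kind of measurement can be sketched in Python rather than PHP. This is only an illustrative harness: `time_title_scan` is a made-up name, the 64 KB read limit is an assumption, and the result will vary with hardware, disk caching and any antivirus scanning, so a single run says little.

```python
import re
import time
from pathlib import Path

# Simple pattern for illustration; a regex is not a full HTML parser.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def time_title_scan(folder, max_bytes=65536):
    """Scan *.html files in folder; return (titles_found, files_seen, seconds)."""
    found = seen = 0
    start = time.perf_counter()
    for path in Path(folder).glob("*.html"):
        seen += 1
        # Read only the start of each file, assuming the title is near the top.
        with path.open("rb") as handle:
            head = handle.read(max_bytes).decode("utf-8", errors="replace")
        if TITLE_RE.search(head):
            found += 1
    elapsed = time.perf_counter() - start
    return found, seen, elapsed
```

Running it several times on the same folder and comparing the first run with later ones gives a feel for how much of the time is disk access rather than parsing.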

Regards,
David

andersonnnunes · December 17, 2018, 10:04am

Windows Search seems to correctly index the properties of .html files (tested author, title, tag). It must be using something other than the default paths, as even Explorer does not seem to show the properties for .html.

Any chance that Opus could use whatever component Windows Search is using? Would be helpful when a search returns html files.

Leo · December 17, 2018, 10:10am

You could do something using a script column these days.

It's probably not something we'll add internally unless more people ask for it. 2 people in 11 years isn't much.

wowbagger · December 18, 2018, 3:28am

You can do this with the regexp column script. Granted, regexp is not the best way to parse HTML, but it will prob work fine for most pages.

This is the config you would need

    {
      "name": "HtmlTitle",
      "label": "HtmlTitle",
      "header": "",
      "type": "text",
      "defwidth": "30",
      "justify": "left",
      "infotiponly": false,
      "maxstars": "5",
      "datetimeformat": "yyyy-mm-dd",
      "nogroup": false,
      "grouporder": "",
      "sorting": "normal",
      "regexp": "<\\s*?title\\s*?>(.*?)</\\s*?title\\s*?>",
      "inputItemProperty": "contents",
      "firstValid": true,
      "output": "$1",
      "filter": "*.html",
      "graphLowerColor": "",
      "graphUpperColor": "",
      "graphColorThreshhold": ""
    }
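To see what the "regexp" value above matches, here is a small Python check. It is only an approximation: the column script runs inside Opus with its own regex engine and options, so case sensitivity and multi-line behaviour there may differ from Python's defaults. The helper name `first_title` is made up for the example.

```python
import json
import re

# The "regexp" entry as it appears in the JSON config (backslashes doubled).
config_fragment = r'{"regexp": "<\\s*?title\\s*?>(.*?)</\\s*?title\\s*?>"}'

# After JSON decoding the pattern is: <\s*?title\s*?>(.*?)</\s*?title\s*?>
pattern = json.loads(config_fragment)["regexp"]


def first_title(text, flags=0):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text, flags)
    return match.group(1) if match else None


print(first_title("<head><title>My Page</title></head>"))      # My Page
print(first_title("<head>< title >My Page</ title ></head>"))  # My Page

# Cases the pattern misses with Python's default flags:
print(first_title('<title lang="en">My Page</title>'))  # None (attributes)
print(first_title("<TITLE>My Page</TITLE>"))            # None (upper case)
print(first_title("<title>\nMy Page\n</title>"))        # None (line breaks)

# Flags recover the last two cases, but not the attribute case.
print(first_title("<TITLE>My Page</TITLE>", re.IGNORECASE))  # My Page
```

If titles with attributes matter, a looser opening such as `<title[^>]*>` would be one way to widen the pattern; that variant is not part of the posted config.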

andersonnnunes · December 18, 2018, 7:59am

Guessing that the component Windows Search is using is an iFilter, given that Opus already uses iFilters to search the contents of files, I wonder if it would be very much harder to pipe the values of other fields to the default columns.

For this column, only 2, but given how many asked for columns for .eml/.msg, given how there is an iFilter for them, given the possibility of the user installing third-party iFilters, it is a bit surprising that it has not been done before.