I wonder if there is any plugin or mechanism for DO that indexing PDF files so users can search for text inside PDF files with the built-in Find tool? Thanks!
Not at present but do send the idea to GPSoftware if you want to ensure it's on their to-do list.
What do you use now to achieve this?
To a certain extent the answer depends on what you mean. Do you want to find PDF files containing particular text? Or do you want to find where that text appears in a PDF file?
You say "DO that indexing PDF files" which I assume means "indexes" files, which implies something more powerful than the first option.
Indexing, searching and locating text strings within PDF files is usually handled by "desktop search" (DTS) software. The most popular are the one built into Vista (also available for XP) and Google Desktop Search. Other DTS candidates are Copernic and X1, which is my choice.
To get DO to let you perform this task you probably need the ability to integrate it with DTS. I'm guessing that they all differ in how they work, which will make it hard to get them all to work. As Microsoft and Google provide free DTS, that is probably where you will find most support for integration.
It is possible to get Vista itself to search using something other than its own DTS. But I don't know how much functionality this gives you.
Michael's post reminded me that Opus does tie in to Google Desktop Search if it is installed. There will be an extra Google Desktop tab in the Find panel (Tools -> Find Panel) in that case. I don't have it installed to check but maybe that can search in PDFs.
Microsoft DTS should search within PDF files. So that may solve your problem.
Anyone who wants to compare the various DTS options might like to consult this page:
tayeb.fr/informatique/search/ind ... _test3.xls
It is a spreadsheet that lists the features of different packages.
It isn't as up to date as it might be for some versions, but it is a good quick guide.
It does not go into the integration with other stuff.
What I want is to search for certain text inside a bunch of PDF files so that I can locate which files contain the text. For example, searching for a paper title in my bibliography directory.
Actually I know there are some PDF filters that can achieve this, like Adobe PDF IFilter (foxitsoftware.com/pdf/ifilter/). But both need the native Windows search panel. I tried installing this software and then using DO to search, but it seems DO does not utilize the additional filters. So I wonder if there is a way to fix this.
The built-in search feature in DO can do this. It will search pdf files for the text they contain and show the results. It does not use an index.
This is probably the slowest way of finding stuff. DTS is much quicker, because it does index.
The only software I know that indexes without holding position information is PaperPort, which can scan files as PDF and index them for words they contain without location information. But that is only a minor feature of PaperPort that many people ignore in favour of something more powerful.
If you really don't want to see where words appear within pdf files, your quest is to find indexing software that does not hold position details. (That would certainly be faster than an "on the fly" search.) I guess that was your first question.
When you find software that can do that, you can investigate getting it to work in DO.
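To make the distinction concrete, here is a minimal sketch of the two kinds of index: one that only records which files contain a word, and one that also records where the word appears. It assumes the text has already been extracted from each PDF; the file names, sample text and function names are made up for illustration and have nothing to do with how any particular DTS product works internally.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_file_index(docs):
    """Map each word to the set of files containing it (no positions)."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def build_positional_index(docs):
    """Map each word to {file name: [word offsets within that file]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][name].append(pos)
    return index


# Hypothetical, already-extracted text standing in for real PDF content.
docs = {
    "manual.pdf": "Directory Opus manual by GPSoftware",
    "paper.pdf": "A survey of desktop search tools",
}

file_index = build_file_index(docs)
pos_index = build_positional_index(docs)

print(sorted(file_index["gpsoftware"]))  # ['manual.pdf']
print(dict(pos_index["search"]))         # {'paper.pdf': [4]}
```

The file-level index is enough to answer "which files mention this?", while the positional one is what lets a tool jump to the match inside the file, at the cost of storing more data.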
Personally, I would try one of the more powerful DTS tools. After all, Microsoft offers it free, it is hard to avoid it in Vista, and DO already works with it.
Proper DTS is a seriously fast way of finding things. X1 has indexed 93,458 PDF files on my PC. It took 10 seconds to find four files containing the word GPSoftware. (Tells me that I have somehow buried three copies of the manual in different directories.) That is slower than some searches.
The last time I tried an unindexed search with DO, earlier today, it took minutes to trawl through the same files. Even a "standalone" search utility, FindOnClick, wasn't much quicker.
Just launched an Opus search on "GPSoftware", eight minutes so far, found two files, still running.
I may be mistaken but I don't think Opus's Find tool knows how to search PDF file contents. (It doesn't currently use IFilter plug-ins either.)
If you do match text within a PDF file then I think it's because what you searched for appeared in the file as plain-text. Since PDF files can contain control codes and mangle the text in other ways it isn't reliable to search their contents using Opus at the moment, although sometimes it will work.
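As a rough illustration of why a plain byte search is hit and miss, the sketch below runs a naive "does the file contain these bytes" check against three hand-written fragments. The fragments are simplified stand-ins for PDF content streams, not real PDF files, and `raw_contains` is a hypothetical helper, not anything Opus actually uses.

```python
import zlib


def raw_contains(data: bytes, needle: str) -> bool:
    """Naive search: look for the needle's bytes anywhere in the raw data."""
    return needle.encode("latin-1") in data


needle = "GPSoftware"

# Text stored as one uncompressed string: the naive search finds it.
plain = b"BT /F1 12 Tf (GPSoftware manual) Tj ET"

# The same words split into pieces with a spacing adjustment in between.
kerned = b"BT /F1 12 Tf [(GPS) -20 (oftware manual)] TJ ET"

# The same stream after zlib (deflate) compression.
deflated = zlib.compress(plain)

print(raw_contains(plain, needle))                      # True
print(raw_contains(kerned, needle))                     # False
print(raw_contains(deflated, needle))                   # False
print(raw_contains(zlib.decompress(deflated), needle))  # True
```

A filter that understands the format would decompress and reassemble the text before searching, which is the step a raw search skips.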
[quote="michaelkenward"]...Personally, I would try one of the more powerful DTS tools. After all, Microsoft offers it free, it is hard to avoid it in Vista, and DO already works with it....[/quote]I thought that DOpus just worked with Google Desktop Search. How do I make it invoke MS Vista DTS?
My mistake. I misread Leo's reminder about integration with Google Desktop Search (GDS).
I also know that X1 can set itself to replace MS Vista's search. So there must be a way to plug into Vista.
It is, to me, surprising that DO opts to work with GDS rather than the operating system's "native" facility. Maybe that is what the original questioner should submit as a feature request.
This would deliver the function he wants without having to find or add anything in the way of new software. This would be faster than DO search even if that were capable of using iFilters to support search.
As I understand it, iFilters extract text minus the garbage, as Leo says, for search but do not do anything in the way of indexing files. Even if DO can attach itself to an iFilter, any search would be incredibly slow if the user has a lot of PDF files.
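A small sketch of the cost argument, under stated assumptions: `extract_text` is a hypothetical stand-in for whatever a filter does to pull clean text out of a file, and the three "files" are just strings. The only point is to count how often the slow extraction step runs when every query rescans every file, compared with extracting once into an index.

```python
from collections import defaultdict

calls = 0  # counts how often the (slow) extraction step runs


def extract_text(name, store):
    """Hypothetical stand-in for a filter-style text extractor."""
    global calls
    calls += 1
    return store[name]


def scan_search(word, store):
    """Unindexed search: re-extract every file for every query."""
    return sorted(
        name for name in store
        if word in extract_text(name, store).lower().split()
    )


def build_index(store):
    """Indexed search: extract each file once, remember word -> files."""
    index = defaultdict(set)
    for name in store:
        for word in extract_text(name, store).lower().split():
            index[word].add(name)
    return index


# Made-up file names and contents for illustration only.
store = {
    "a.pdf": "GPSoftware manual",
    "b.pdf": "desktop search notes",
    "c.pdf": "search the manual",
}
queries = ["manual", "search", "gpsoftware"]

calls = 0
scan_results = [scan_search(q, store) for q in queries]
scan_calls = calls

calls = 0
index = build_index(store)
index_results = [sorted(index.get(q, set())) for q in queries]
index_calls = calls

print(scan_results == index_results)  # True
print(scan_calls, index_calls)        # 9 3
```

With three files and three queries the difference is trivial, but the scan grows with files times queries while the index pays the extraction cost once per file.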
Over on DTS forums, the big argument is over indexing metadata in PDF files. More and more people use metadata to "tag" files. Any DTS software really has to work with that as well as the embedded text.
When GDS integration was added to Opus, Windows Desktop Search (or whatever it was called back then) was fairly uncommon/unknown, and not built into any version of the OS (Vista was not out yet).
Integrating it now would make sense but there are problems to solve about how to integrate it without losing existing search functionality in Opus or adding two completely different search modes (like there already are if you have GDS).
I found a way to search (though not index), which solves my problem. I came across this page: docu-track.com/home/prod_use ... dfx_viewer), which offers a free tool capable of searching the PDF files in a directory in detail.
Thanks for the help!
Well spotted. Now it is just a case of integrating all that stuff with Opus. Or did you do that already?
On a different, but slightly related, note, I stumbled across the following when working with Outlook 2007 and wondering how to "preview" PDF files:
http://timheuer.com/blog/archive/2008/05/09/foxit-pdf-preview-handler.aspx
It is a "Foxit PDF Preview Handler" that allows preview without using Adobe Reader 8.1. (I struggle by on Acrobat 7.) I throw it in just in case your desire for PDF tools includes this sort of toy. Not sure that it means much for Opus, which can preview PDFs.
If you have FoxIt installed then it will be used by the Opus viewer pane automatically, provided it's installed to work with Internet Explorer. (That usually just means if you have another PDF viewer that works in IE then you need to have installed FoxIt last so that it overwrites the other viewer's settings.)
If you install the FoxIt preview handler (which itself requires FoxIt) then Opus will use that instead; however, that actually gives you a slightly worse viewer than the default FoxIt one.
If you want to install the preview handler for Outlook 2007 or Explorer's preview panel while using the normal viewer in Opus then you can do that. Configure the ActiveX plugin in Opus and clear the FoxIt checkbox under the Preview Handlers heading. (If there's no checkbox then the preview handler isn't installed.) The ActiveX plugin then drops down to IE which in turn will invoke the normal FoxIt viewer.
[quote]When GDS integration was added to Opus, Windows Desktop Search (or whatever it was called back then) was fairly uncommon/unknown, and not built into any version of the OS (Vista was not out yet).
Integrating it now would make sense but there are problems to solve about how to integrate it without losing existing search functionality in Opus or adding two completely different search modes (like there already are if you have GDS).[/quote]Hello Leo...
Do you happen to know if Windows DTS integration is already on the GPSoftware to-do list? I agree it would make sense to have the option.
It is.
Apologies for persisting with this, but the above statement puzzles me. It is exactly what I assumed.
But I decided to install the preview handler I linked to without first installing FoxIt.
http://timheuer.com/blog/archive/2008/05/09/foxit-pdf-preview-handler.aspx
It will complain, I thought, if FoxIt is essential and it cannot find it.
Before installing this, PDF files did not preview in Outlook 2007. After installing it, without also installing FoxIt, PDF files do preview.
It may be that I once installed FoxIt and then removed it. (I have used it on and off over the years.) But there is no sign of any debris.
Maybe there is some subtle interaction going on with something else that I don't understand.
Oh, you're probably right in that case. I thought the FoxIt Preview Handler hooked into the standalone FoxIt viewer DLLs (like the Internet Explorer viewer does) but maybe the preview handler comes with its own copy of the DLLs.
It's about a year since I looked at it so my memory is probably faulty. Looking at the preview handler's webpage there's no mention of downloading FoxIt so, yeah, it's gotta be standalone.