How can I search for text in a pdf?

I've been using Paperport to organize my scanned documents, which are in pdf format, but now I've gone to Windows 7 and my old version of it doesn't work anymore. Since I can see thumbnails of pdf's in Opus, and I can also create a toolbar button to start my scanner software, it seems like Opus may work as a replacement. The only thing I can't figure out how to do is to be able to search for text in the pdf's.

I understand that I'd probably need to incorporate some sort of OCR module to do this, but has anyone come up with a way to search a folder of pdf's for a string? I've also tried to just create some "tags" in the pdf and display them in the lister. But I can't see how I can search for them. Any ideas?

You can use Opus to find (Ctrl+F) text in documents if you use the "contain" option and set the name to match *.pdf. You can also limit the search to particular locations on your PC.

But if you do this often, there is a much better approach that helps Opus.

If you use Windows 7, then you have on your PC something called Windows Search. You can use that to index and search for text in PDF files.

Windows Search will index all the PDF files on your PC, and then when you want to find a PDF file containing a particular word it will throw it up almost instantly. "Find", on the other hand, can take an age if you don't have Windows Search doing its thing.
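To illustrate why an index answers so much faster than a fresh scan, here is a toy sketch in Python. It is not how Windows Search or Opus is implemented; the file names and their "extracted text" are made up for the example. The point is only that a scan reads every document on every query, while an index is built once and then answers with a single lookup.

```python
from collections import defaultdict

# Hypothetical stand-ins for the text extracted from a few PDFs.
documents = {
    "invoice_2011.pdf": "invoice for Michael regarding scanner repair",
    "manual.pdf": "scanner manual and setup guide",
    "letter.pdf": "letter to Michael about the invoice",
}


def scan_search(docs, word):
    """Brute force: look through every document on every query."""
    word = word.lower()
    return sorted(
        name for name, text in docs.items() if word in text.lower().split()
    )


def build_index(docs):
    """One-off pass: map each word to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in text.lower().split():
            index[token].add(name)
    return index


def index_search(index, word):
    """Indexed: one dictionary lookup, however many documents there are."""
    return sorted(index.get(word.lower(), set()))


index = build_index(documents)
print(scan_search(documents, "Michael"))
print(index_search(index, "Michael"))
```

Both calls return the same files; the difference is that the second one never touches the documents again once the index exists.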

Those sensible folks at GP Software have rightly decided that there isn't much point in writing software to do things that are already in the operating system, so they plug Opus into Windows Search. So, use that to find things and away you go.

Sadly, some of us use more powerful index and search software that Opus can't talk to. But what the heck? Then we can use the software itself.

Opus can't search scanned text (e.g. PDF files that contain a scanned bitmap image, rather than actual text).

It depends on whether you're on a 32-bit or 64-bit system. On a 32-bit system, if you have Adobe Acrobat installed, you'll have a PDF iFilter installed which will allow you to search PDFs. Even if you don't have Acrobat, I believe you can find that iFilter on Adobe's website and install it separately. For 64-bit systems, the issue is that Adobe has even now still not managed/bothered to provide a working iFilter DLL.

Your second option is to download the FoxIt PDF iFilter. Note that there is a free version available for desktop/home use.

Both of the above iFilters will plug into Windows Indexing Service, which will index your PDF files and then allow you to search their content.

Just to make sure, all of these only work with PDFs that in fact contain readable (=selectable) text. If you have PDFs that contain bitmaps only (e.g. from scanners), then you'll have to separately run an OCR module to make that text machine-readable. Again, Adobe Acrobat has that capability, and probably a number of other PDF programs as well.

Absolutely. I chime in only to add that PDF files that contain readable text – for scanned documents this is usually an optical character recognition (OCR) overlay atop the image – are often referred to as "PDF searchable". The test is to open the file and try to select and copy text from it.
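If you have a whole folder of scans and don't want to open each one, a rough first pass can be scripted. The sketch below is only a crude heuristic, not a substitute for the select-and-copy test: it assumes that a PDF with a text layer mentions a /Font resource somewhere in its raw bytes, while an image-only scan usually does not. It can be wrong in both directions, for example when the PDF stores its objects in compressed object streams, or when a scan carries a font only for a stamp or page number. The function names are my own.

```python
from pathlib import Path


def has_font_resources(data: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes mention a /Font entry."""
    return b"/Font" in data


def probably_has_text_layer(pdf_path) -> bool:
    """Apply the heuristic to a file on disk (assumption: not authoritative)."""
    return has_font_resources(Path(pdf_path).read_bytes())


if __name__ == "__main__":
    import sys

    # Usage: python check_pdfs.py scan1.pdf scan2.pdf ...
    for name in sys.argv[1:]:
        print(name, probably_has_text_layer(name))
```

Anything it reports as False is a candidate for OCR; anything it reports as True is still worth a quick check in a viewer.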

Some scanners come with software that creates "PDF searchable" files. But you may have to specify this output during the scanning phase.

Here is the 64-bit iFilter previously mentioned.

Adobe PDF iFilter 9 for 64-bit platforms

I always scan with Acrobat to have maximum compatibility. I tried different cheaper PDF tools and scanner software; most of them seemed to work 100% on "closed" systems, but I often send PDFs to my customers, who use different readers or Acrobat Reader versions, and they got errors.

Hey everyone,

this is a very sad report, as I apparently found something that works pretty simply in Windows Explorer, whereas I couldn't manage the same thing in Directory Opus.

First, *.pdf full-text search does not work by default on 64-bit systems, as they lack the corresponding IFilter.

After installing this filter as described above (see documentsnap.com/how-to-fix- ... -7-64-bit/ for a nice tutorial on what to do and how to check whether it actually worked), the search-in-PDF functionality worked like a charm in the normal Windows Explorer.

I still couldn't figure out what to do to get the same in Directory Opus. In Windows Explorer, if I search for the word "Michael" in a folder containing 100s of PDFs, this gives an answer within about 1 second (reporting many hits). I guess MS is using the index.

In Directory Opus, however, the search takes forever even within a folder that contains only one file (I am using the Find - Containing Text - option) and then does not report any proper hits for *.pdf files, whereas it does work for e.g. *.txt files.

When I let DOpus search in a folder without *.pdf files, everything is fast and reports correct answers, but it also finds hits within files that are not actually in the folder I am searching in ... Very weird!

Dear DOpus team, please fix this!

Info: I am using DOpus 10.2 on Windows 7 x64

Cheers,

Michael

Have you tried using the search field at the top-right of the window (if using the default toolbars)?

That will use Windows Search and should work the same as the similar search field in Windows Explorer.

Thanks a lot for the reply. I did not know that there was a difference between the search field and the "find" tool that can be accessed via "Ctrl + F".

That works indeed just like the indexed search that normal windows explorer uses.

Thanks!

I tried to search from within Directory Opus with Ctrl+F, but as I had the Windows Search service disabled (old PC) it did not work.
Instead, I use PDF-XChange Editor, which can also search within PDFs with total compatibility... Much better!
On the other hand, I don't want to depend on the Windows Search service to be able to find things. In my experience it fails to find files very often, because you can never be sure whether everything has been indexed yet. Very frustrating!

Ctrl-F launches Opus's internal Find command which doesn't use or rely on Windows search. You will need an IFilter installed to search inside PDF files (I don't know if PDF-Xchange does or not).

Using the upper (toolbar) search obviously starts the default Windows Search Engine and it can be used to search inside PDFs that have searchable text content.

Now, what if those PDF files are located on a share on my (Synology) NAS? As far as I know, those (network) paths are not indexed by Windows Search. Does that mean I cannot use DO (and therefore Windows Search) to find the appropriate PDFs?

You can make Windows Search look at the content of files in non-indexed locations by explicitly asking it to.

e.g. Use content:xyz as the query instead of just xyz

(You can also use Opus's internal Find tool, which doesn't care where the files are since it never uses an index. Both assume a PDF IFilter is installed, but Windows 10 seems to include one by default unless something I've installed added one.)
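For anyone who builds these queries in a script or a button, here is a small Python sketch of the content: form described above. The helper name is hypothetical. Quoting multi-word phrases and the optional ext: filter are my additions rather than something from this thread, so treat them as assumptions and check them against your own Windows Search setup.

```python
def content_query(text, extension=None):
    """Build a Windows Search query string that looks inside file contents."""
    # Assumption: a phrase with spaces is wrapped in quotes to keep it together.
    term = f'"{text}"' if " " in text else text
    query = f"content:{term}"
    if extension:
        # Assumption: ext: narrows the results to one file extension.
        query += f" ext:.{extension.lstrip('.')}"
    return query


print(content_query("Michael"))
print(content_query("scanner repair", "pdf"))
```

The resulting string is what you would type into the search field, for example content:Michael.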

I'm using PDFXedit for viewing and editing PDF files, which works perfectly with the DO Preview pane.

There is a tool that can be used to set up an IFilter shell extension, which I've now used with these options, without restarting Windows so far:

[screenshot: 2020-01-01_162509]

I've restarted DO (removed it from the notification tray) and used CTRL+F to search some text inside PDF files. However, I do not get any results back.

Do I have to enable this shell extension within DO first?

Replying to myself: a restart did not change anything; however, it seems that setting this marked option did the trick:
[screenshot: 2020-01-01_163945]

The search result tab now shows the correct PDF files :grinning:

Thanks for this information.

If people are looking for the location of the exe that allows these changes in PDF-XCHANGE Editor (full version) you may find it here:

C:\Program Files\Tracker Software\Shell Extensions\

Rennie

Exactly, that's correct. Forgot to mention :stuck_out_tongue_winking_eye:

Thanks - your answers helped me :slight_smile:
(sorry for the late reply)