To index PDF files with xpdf, you will need to:
Download and install pdftotext, which is part of the xpdf package: xpdf-3.00
Place the provided script, usexpdf.sh, somewhere accessible, and add these lines: *.pdf /path/to/usexpdf.sh < to the .glimpse_filters file in your archive directory. NOTE: usexpdf.sh assumes pdftotext is in your path; if it is not, you will need to edit the script accordingly. We use usexpdf.sh because .glimpse_filters works on STDIN, but pdftotext requires an
On the Manage Archive page, enter "all" or "pdf PDF" in the field labeled "Prefilter filetypes for speed:". Prefiltering is recommended for efficiency and speed. However, if you prefer to filter files on the fly in order to save space, edit the wgreindex file in each archive that needs to access PDF files and add the -z switch to both glimpseindex command lines.
Make sure .pdf files aren't being excluded from indexing! Check the .wgfilter-index file and delete any line
Important: Add the line rm /tmp/xpdf* either to your crontab or to the end of the wgreindex script. The xpdf filter tends to leave temporary files behind, and these
Run ./wgreindex in your archive directory to regenerate the indexes. To search, make sure the "Use Filters" box is |
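For illustration only, the STDIN-to-file problem described in the usexpdf.sh step could be handled along the following lines. This is a minimal sketch, not the provided usexpdf.sh: the function name pdf_stdin_to_text, the PDFTOTEXT variable, and the xpdfwrap temp-file prefix are all hypothetical. It assumes mktemp is available and that pdftotext accepts "-" as the output file to mean STDOUT.

```shell
#!/bin/sh
# Hypothetical sketch of a STDIN wrapper around pdftotext.
# NOT the usexpdf.sh shipped with the package; names are illustrative.

# Assumption: pdftotext is on PATH; override PDFTOTEXT if it is not.
PDFTOTEXT="${PDFTOTEXT:-pdftotext}"

pdf_stdin_to_text() {
    # Spool the PDF arriving on STDIN into a temporary file,
    # because pdftotext is given a file name rather than a stream here.
    tmpfile=$(mktemp "${TMPDIR:-/tmp}/xpdfwrap.XXXXXX") || return 1
    cat > "$tmpfile"

    # An output name of "-" asks pdftotext to write the text to STDOUT.
    "$PDFTOTEXT" "$tmpfile" -
    status=$?

    # Remove the spool file so it does not pile up in the temp directory.
    rm -f "$tmpfile"
    return $status
}
```

With the function loaded, running pdf_stdin_to_text < /path/to/file.pdf would print the extracted text. A standalone filter script built this way would simply call pdf_stdin_to_text as its last line.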