How can I convert a DjVu file to a PDF while preserving word searchability?

Solution 1:

I wrote a script to do this a long time ago. It is essentially glue code around a few utilities that do the heavy lifting. The difference between my script and the other tools at the time is that mine was the only one that did all of the following:

  • had a similar compression ratio to the original DjVu file (1.5-2x the size instead of 10-20x the size)
  • preserved bookmarks / table of contents metadata (for navigation in the PDF reader)
  • preserved the embedded text layer for searching

That being said, it is very primitive. I just made sure it worked well for all of my own files and haven't worked on it since.
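To make the "glue code" idea concrete, here is a minimal sketch of the three pieces such a script has to pull out of the DjVu file. It is not the author's actual script. It assumes the DjVuLibre tools djvused and ddjvu are installed; the function names build_commands and run_all and the ".images.pdf" suffix are made up for illustration. As far as I know, the PDF that ddjvu writes contains only page images, which is why the text and outline have to be extracted separately and merged back in by some other tool (not shown).

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(djvu: str, workdir: str = ".") -> dict[str, list[str]]:
    """Return one DjVuLibre command line per piece that must be carried over.

    Hypothetical helper: the keys and the output file name are illustrative.
    """
    stem = Path(workdir) / Path(djvu).stem
    return {
        # Hidden text layer with word coordinates, dumped as an s-expression.
        "text": ["djvused", "-e", "print-txt", djvu],
        # Bookmarks / table of contents, also dumped as an s-expression.
        "outline": ["djvused", "-e", "print-outline", djvu],
        # Page images rendered to PDF (assumed to carry no text or bookmarks).
        "images": ["ddjvu", "-format=pdf", djvu, f"{stem}.images.pdf"],
    }


def run_all(djvu: str) -> dict[str, str]:
    """Run every command and return its captured standard output by key."""
    for tool in ("djvused", "ddjvu"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found; install DjVuLibre first")
    outputs = {}
    for name, cmd in build_commands(djvu).items():
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        outputs[name] = result.stdout
    return outputs
```

Calling run_all("book.djvu") would leave book.images.pdf on disk and return the text and outline dumps as strings, ready to be fed to whatever step rebuilds the text layer and bookmarks in the final PDF.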

Solution 2:

I packed vindvaki's scripts into a Docker image with the required dependencies. You can try it with:

  docker run --rm -u "$(id -u):$(id -g)" -v "$(pwd)":/opt/work ilyabystrov/djvu2pdf filename.djvu filename.pdf

Check djvu2pdf-docker for details.

Solution 3:

Open the converted PDF file in PDF-XChange Viewer and run OCR on it (I believe only four languages are supported). It takes time, but it is damn good, even on two-column documents.

PDF-XChange Viewer is a Windows program, so on Mac and Linux you will need Wine.

Solution 4:

Have you tried Calibre? A Calibre contributor mentions that OCR'd text in DjVu files is supported, so it can probably convert them to PDF with the text still searchable.

Solution 5:

This DjVu to PDF converter definitely preserves word searchability if the original DjVu is searchable. It also produces smaller output files than Calibre.
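Since the result can only be searchable if the original DjVu has a text layer, it may be worth checking that first. Below is a rough sketch, assuming you have already dumped the text with DjVuLibre's `djvused -e print-pure-txt file.djvu` and that pages in that dump are delimited by form feed characters. The function names are hypothetical.

```python
def pages_with_text(pure_txt: str) -> int:
    """Count form-feed-delimited chunks that contain any non-whitespace text."""
    return sum(1 for chunk in pure_txt.split("\f") if chunk.strip())


def has_text_layer(pure_txt: str) -> bool:
    """True if at least one page of the dump carries hidden text."""
    return pages_with_text(pure_txt) > 0
```

If has_text_layer returns False for the dump, no converter can preserve searchability, and OCR on the resulting PDF is the remaining option.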