Convert PDF to clean SVG? [closed]
Solution 1:
You can run Inkscape entirely from the command line, without opening the GUI. Try this:
inkscape \
--without-gui \
--file=input.pdf \
--export-plain-svg=output.svg
For a complete list of all command-line options, run inkscape --help.
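A note for newer installs: the command above uses the option names from the Inkscape 0.9x series. As far as I know, Inkscape 1.0 and later take the input file as a positional argument, treat --export-plain-svg as a plain switch, and take the output path via --export-filename. A minimal sketch under that assumption (the function name pdf_to_plain_svg is hypothetical, and the file names are placeholders):

```shell
#!/bin/sh
# Hypothetical helper, assuming Inkscape 1.0 or later is on PATH.
# $1 = input PDF, $2 = output SVG
pdf_to_plain_svg() {
    # The input file is positional in 1.x; --export-plain-svg is a switch
    # that strips Inkscape-specific attributes; --export-filename sets the output.
    inkscape "$1" --export-plain-svg --export-filename="$2"
}
```

Usage would be `pdf_to_plain_svg input.pdf output.svg`. If in doubt, check inkscape --help on your version before relying on either form.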
Solution 2:
Inkscape is used by many people on Wikipedia to convert PDF to SVG.
http://inkscape.org/
They even have a handy guide on how to do so!
http://en.wikipedia.org/wiki/Wikipedia:Graphic_Lab/Resources/PDF_conversion_to_SVG#Conversion_with_Inkscape
Solution 3:
I am currently using PDFBox which has good support for graphic output. There is good support for extracting the vector strokes and also for managing fonts. There are some good tools for trying it out (e.g. PDFReader will display as Java Graphics2D). You can intercept the graphics tool with an SVG tool like Batik (I do this and it gives good capture).
There is no simple way to convert all PDF to SVG - it depends on the strategy and tools used to create the PDFs. Some text is converted to vectors and cannot be easily reconstructed - you have to install vector fonts and look them up.
UPDATE: I have now developed this into a package, PDF2SVG, which no longer uses Batik and which has been tested on a range of PDFs. It produces SVG output consisting of:
- characters as one <svg:text> per character
- paths as <svg:path>
- images as <svg:image>
Later packages will (hopefully) convert the characters to running text and the paths to higher-level graphics objects.
UPDATE: We can now re-create running text from the SVG characters. We've also converted diagrams to domain-specific XML (e.g. chemical spectra). See https://bitbucket.org/petermr/svg2xml-dev. It's still in Alpha, but is moving at a useful speed. Anyone can join in!
UPDATE (@Tim Kelty): We are continuing to work on PDF2SVG and also on downstream tools that do (limited) Java OCR and create higher-level graphics primitives (arrows, boxes, etc.). See https://bitbucket.org/petermr/imageanalysis, https://bitbucket.org/petermr/diagramanalyzer, https://bitbucket.org/petermr/norma and https://bitbucket.org/petermr/ami-core . This is a funded project to capture 100 million facts from the scientific literature (contentmine.org), much of which is in PDF.
Solution 4:
This topic is quite old, but here is a handy solution that I found:
http://www.cityinthesky.co.uk/opensource/pdf2svg/
It offers a tool, pdf2svg, which once installed does exactly this job from the command line. I've tested it with flawless results so far, including on PDFs with bitmaps.
EDIT: My mistake, this tool also converts letters to paths, so it does not address the original question. However, it does a good job anyway and can be useful to anyone who does not intend to edit the code in the SVG file, so I'll leave the post.
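For reference, a minimal sketch of how pdf2svg is typically invoked, assuming it is installed and on your PATH. To my knowledge the arguments are the input PDF, the output SVG, and an optional page number. The wrapper name pdf_page_to_svg is hypothetical, and as noted in the EDIT, text will still come out as paths:

```shell
#!/bin/sh
# Hypothetical wrapper around pdf2svg (assumed to be installed).
# $1 = input PDF, $2 = output SVG, $3 = optional page number (defaults to 1 here)
pdf_page_to_svg() {
    pdf2svg "$1" "$2" "${3:-1}"
}
```

For example, `pdf_page_to_svg input.pdf page3.svg 3` would convert only the third page.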