How can I automatically convert all source code files in a folder (recursively) to a single PDF with syntax highlighting?
I would like to convert the source code of a few projects into one printable file to save on a USB drive and print out easily later. How can I do that?
Edit
First off, I want to clarify that I only want to print the non-hidden files and directories (so no contents of .git, for example).
To get a list of all non-hidden files in non-hidden directories in the current directory, you can run the command find . -type f ! -regex ".*/\..*" ! -name ".*", as seen in the answer to this thread.
As suggested in that same thread, I tried making a PDF of the files with find . -type f ! -regex ".*/\..*" ! -name ".*" ! -empty -print0 | xargs -0 a2ps -1 --delegate no -P pdf, but unfortunately the resulting PDF is a complete mess.
Solution 1:
I was intrigued by your question and got kinda carried away. This solution will generate a nice PDF file with a clickable index and color highlighted code. It will find all files in the current directory and subdirectories and create a section in the PDF file for each of them (see the notes below for how to make your find command more specific).
It requires that you have the following installed (the install instructions are for Debian-based systems but these should be available in your distribution's repositories):
pdflatex, and the listings and color LaTeX packages:
sudo apt-get install texlive-latex-extra latex-xcolor texlive-latex-recommended
This should also install a basic LaTeX system if you don't have one installed.
Once these are installed, use this script to create a LaTeX document with your source code. The trick is using the listings
(part of texlive-latex-recommended
) and color
(installed by latex-xcolor
) LaTeX packages. The \usepackage[..]{hyperref}
is what makes the listings in the table of contents clickable links.
#!/usr/bin/env bash
tex_file=$(mktemp)       ## Random temp file name
cat <<EOF > "$tex_file"  ## Print the tex file header
\documentclass{article}
\usepackage{listings}
\usepackage[usenames,dvipsnames]{color} %% Allow color names
\lstdefinestyle{customasm}{
  belowcaptionskip=1\baselineskip,
  xleftmargin=\parindent,
  language=C++,          %% Change this to whatever you write in
  breaklines=true,       %% Wrap long lines
  basicstyle=\footnotesize\ttfamily,
  commentstyle=\itshape\color{Gray},
  stringstyle=\color{Black},
  keywordstyle=\bfseries\color{OliveGreen},
  identifierstyle=\color{blue},
}
\usepackage[colorlinks=true,linkcolor=blue]{hyperref}
\begin{document}
\tableofcontents
EOF
find . -type f ! -regex ".*/\..*" ! -name ".*" ! -name "*~" ! -name 'src2pdf' |
sed 's/^\..//' |                   ## Change ./foo/bar.src to foo/bar.src
while IFS= read -r i; do           ## Loop through each file
    name=${i//_/\\_}               ## Escape underscores for LaTeX
    echo "\newpage" >> "$tex_file"         ## Start each section on a new page
    echo "\section{$name}" >> "$tex_file"  ## Create a section for each filename
    ## This command will include the file in the PDF
    echo "\lstinputlisting[style=customasm]{$i}" >> "$tex_file"
done &&
echo "\end{document}" >> "$tex_file" &&
pdflatex -output-directory . -jobname=all "$tex_file" &&
pdflatex -output-directory . -jobname=all "$tex_file"  ## This needs to be run
                                                       ## twice for the TOC
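Before running the full script, it can help to dry-run the find pipeline on its own to preview exactly which files will end up in the PDF (the directory layout in the comment is just an example):

```shell
# Dry run: list the files the script would include, one path per line
# (e.g. "src/main.c"), without generating any LaTeX or PDF output.
find . -type f ! -regex ".*/\..*" ! -name ".*" ! -name "*~" ! -name 'src2pdf' |
sed 's/^\..//'
```

If anything unexpected shows up here, tighten the find tests before building the document.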
Run the script in the directory that contains the source files
bash src2pdf
That will create a file called all.pdf
in the current directory. I tried this with a couple of random source files I found on my system (specifically, two files from the source of vlc-2.0.0
) and this is a screenshot of the first two pages of the resulting PDF:
A couple of comments:
- The script will not work if your source code file names contain spaces. Since we are talking about source code, I will assume they don't.
- I added ! -name "*~" to avoid backup files.
- I recommend you use a more specific find command to find your files though, otherwise any random file will be included in the PDF. If your files all have specific extensions (.c and .h, for example), replace the find in the script with something like this:
find . -name "*.c" -o -name "*.h" | sed 's/^\..//' |
- Play around with the listings options; you can tweak the output to be exactly what you want.
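As a sketch of that more specific find (the .c/.h extensions are just placeholders): the escaped parentheses group the two -name tests so that -type f and the hidden-directory filter apply to both alternatives, which the bare -o version does not guarantee.

```shell
# Include only .c and .h files, skipping anything under hidden directories.
# \( ... \) groups the -o alternatives so the other tests apply to both.
find . -type f \( -name "*.c" -o -name "*.h" \) ! -path "*/.*" |
sed 's/^\..//'
```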
Solution 2:
I know I'm waaaay too late, but somebody looking for a solution might find this useful.
Based on @terdon's answer, I have created a Bash script that does the job: https://github.com/eljuanchosf/source-code-to-pdf
Solution 3:
(from StackOverflow)
for i in *.src; do echo "$i"; echo "---"; cat "$i"; echo ; done > result.txt
This will produce a result.txt containing:
- Filename
- separator (---)
- Content of .src file
- Repeat from the top until all *.src files are done
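To make that layout concrete, here is the loop run on two made-up files in a scratch directory:

```shell
cd "$(mktemp -d)"            # scratch directory for the demo
printf 'int x;\n' > a.src
printf 'int y;\n' > b.src
for i in *.src; do echo "$i"; echo "---"; cat "$i"; echo ; done > result.txt
cat result.txt
# a.src
# ---
# int x;
#
# b.src
# ---
# int y;
#
```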
If your source code has a different extension, just change it as needed. You can also edit the echo bits to add information (maybe echo "filename: $i"), change the separator, or add an end-of-file separator.
The linked thread has other methods, so use whichever you like best. I find this one the most flexible, although it does come with a slight learning curve.
The code runs as-is in a Bash terminal (just tested on an Ubuntu VirtualBox VM).
If you don't care about filename and just care about content of files merged together:
cat *.src > result.txt
will work perfectly fine.
Another method suggested was:
grep "" *.src > result.txt
Which will prefix every single line with the filename, which can be good for some people, personally I find it too much information, hence why my first suggestion is the for loop above.
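A quick illustration of that prefixing behavior, on two made-up files:

```shell
cd "$(mktemp -d)"            # scratch directory for the demo
printf 'int x;\n' > a.src
printf 'int y;\n' > b.src
grep "" *.src                # every line is prefixed with its file name
# a.src:int x;
# b.src:int y;
```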
Credit to the people in the StackOverflow thread.
EDIT: I just realized that you are specifically after HTML or PDF as the end result. One solution I've seen is to print the text file to PostScript and then convert the PostScript to PDF:
groff -Tps result.txt > res.ps
then
ps2pdf res.ps res.pdf
(Requires you to have Ghostscript installed.)
Hope this helps.