How do I run the preprocessor on local headers only?

How much effort are you willing to go to? There's an obnoxiously obscure way to do it but it requires you to set up a dummy directory to hold surrogates for the system headers. OTOH, it doesn't require any changes in any of your source code. The same technique works equally well for C code.

Setup

Files:

./class_a.hpp
./class_b.hpp
./example.cpp
./system-headers/iostream
./system-headers/string

The 'system headers' such as ./system-headers/iostream contain a single line (there is no # on that line!):

include <iostream>

The class headers each contain a single line like:

class A{};

The contents of example.cpp are what you show in the question:

#include <iostream>     //system
#include "class_a.hpp"  //local
#include <string>       //system
#include "class_b.hpp"  //local

int main() {}

Running the C preprocessor

Running the C preprocessor like this produces the output shown:

$ cpp -Dinclude=#include -I. -Isystem-headers example.cpp
# 1 "example.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "example.cpp"
# 1 "system-headers/iostream" 1
 #include <iostream>
# 2 "example.cpp" 2
# 1 "class_a.hpp" 1
class A{};
# 3 "example.cpp" 2
# 1 "system-headers/string" 1
 #include <string>
# 4 "example.cpp" 2
# 1 "class_b.hpp" 1
class B{};
# 5 "example.cpp" 2

int main() {}
$

If you eliminate the # n lines, that output is:

$ cpp -Dinclude=#include -I. -Isystem-headers example.cpp | grep -v '^# [0-9]'
 #include <iostream>
class A{};
 #include <string>
class B{};

int main() {}
$

which, give or take the space at the beginning of the lines containing #include, is what you wanted.

Analysis

The -Dinclude=#include argument is equivalent to #define include #include. When the preprocessor generates output from a macro, even if it looks like a directive (such as #include), it is not a preprocessor directive. Quoting the C++11 standard ISO/IEC 14882:2011 (not that this has changed between versions AFAIK — and is, verbatim, what it says in the C11 standard, ISO/IEC 9899:2011 too, in §6.10.3):

§16.3 Macro replacement

¶8 If a # preprocessing token, followed by an identifier, occurs lexically at the point at which a preprocessing directive could begin, the identifier is not subject to macro replacement.

§16.3.4 Rescanning and further replacement

¶2 If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. …

¶3 The resulting completely macro-replaced preprocessing token sequence is not processed as a preprocessing directive even if it resembles one, …

When the preprocessor encounters #include <iostream>, it looks in the current directory and finds no file, then looks in ./system-headers and finds the file iostream so it processes that into the output. It contains a single line, include <iostream>. Since include is a macro, it is expanded (to #include) but further expansion is prevented, and the # is not processed as a directive because of §16.3.4 ¶3. Thus, the output contains #include <iostream>.

When the preprocessor encounters #include "class_a.hpp", it looks in the current directory and finds the file and includes its contents in the output.

Rinse and repeat for the other headers. If class_a.hpp contained #include <iostream>, then that ends up expanding to #include <iostream> again (with the leading space). If your system-headers directory is missing any header, then the preprocessor will search in the normal locations and find and include that. If you use the compiler rather than cpp directly, you can prohibit it from looking in the system directories with -nostdinc — so the preprocessor will generate an error if system-headers is missing a (surrogate for a) system header.

$ g++ -E -nostdinc -Dinclude=#include -I. -Isystem-headers example.cpp | grep -v '^# [0-9]'
 #include <iostream>
class A{};
 #include <string>
class B{};

int main() {}
$

Note that it is very easy to generate the surrogate system headers:

for header in algorithm chrono iostream string …
do echo "include <$header>" > system-headers/$header
done

JFTR, testing was done on Mac OS X 10.11.5 with GCC 6.1.0. If you're using GCC (the GNU Compiler Collection, with leading example compilers gcc and g++), your mileage shouldn't vary very much with any plausible alternative version.

If you're uncomfortable using the macro name include, you can change it to anything else that suits you — syzygy, apoplexy, nadir, reinclude, … — and change the surrogate headers to use that name, and define that name on the preprocessor (compiler) command line. One advantage of include is that it's improbable that you have anything using that as a macro name.

Automatically generating surrogate headers

osgx asks:

How can we automate the generation of mock system headers?

There are a variety of options. One is to analyze your code (with grep for example) to find the names that are, or might be, referenced and generate the appropriate surrogate headers. It doesn't matter if you generate a few unused headers — they won't affect the process. Note that if you use #include <sys/wait.h>, the surrogate must be ./system-headers/sys/wait.h; that slightly complicates the shell code shown, but not by very much. Another way would look at the headers in the system header directories (/usr/include, /usr/local/include, etc) and generate surrogates for the headers you find there. For example, mksurrogates.sh might be:

#!/bin/sh

sysdir="./system-headers"
for header in "$@"
do
    mkdir -p "$sysdir/$(dirname $header)"
    echo "include <$header>" > "$sysdir/$header"
done

And we can write listsyshdrs.sh to find the system headers referenced in source code under a named directory:

#!/bin/sh

grep -h -e '^[[:space:]]*#[[:space:]]*include[[:space:]]*<[^>]*>' -r "${@:-.}" |
sed 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*<\([^>]*\)>.*/\1/' |
sort -u

With a bit of formatting added, that generated a list of headers like this when I scanned the source tree with my answers to SO questions:

algorithm         arpa/inet.h       assert.h          cassert
chrono            cmath             cstddef           cstdint
cstdlib           cstring           ctime             ctype.h
dirent.h          errno.h           fcntl.h           float.h
getopt.h          inttypes.h        iomanip           iostream
limits.h          locale.h          map               math.h
memory.h          netdb.h           netinet/in.h      pthread.h
semaphore.h       signal.h          sstream           stdarg.h
stdbool.h         stddef.h          stdint.h          stdio.h
stdlib.h          string            string.h          sys/ipc.h
sys/mman.h        sys/param.h       sys/ptrace.h      sys/select.h
sys/sem.h         sys/shm.h         sys/socket.h      sys/stat.h
sys/time.h        sys/timeb.h       sys/times.h       sys/types.h
sys/wait.h        termios.h         time.h            unistd.h
utility           vector            wchar.h

So, to generate the surrogates for the source tree under the current directory:

$ sh mksurrogatehdr.sh $(sh listsyshdrs.sh)
$ ls -lR system-headers
total 344
-rw-r--r--   1 jleffler  staff   20 Jul  2 17:27 algorithm
drwxr-xr-x   3 jleffler  staff  102 Jul  2 17:27 arpa
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 assert.h
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 cassert
-rw-r--r--   1 jleffler  staff   17 Jul  2 17:27 chrono
-rw-r--r--   1 jleffler  staff   16 Jul  2 17:27 cmath
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 cstddef
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 cstdint
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 cstdlib
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 cstring
-rw-r--r--   1 jleffler  staff   16 Jul  2 17:27 ctime
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 ctype.h
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 dirent.h
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 errno.h
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 fcntl.h
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 float.h
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 getopt.h
-rw-r--r--   1 jleffler  staff   21 Jul  2 17:27 inttypes.h
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 iomanip
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 iostream
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 limits.h
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 locale.h
-rw-r--r--   1 jleffler  staff   14 Jul  2 17:27 map
-rw-r--r--   1 jleffler  staff   17 Jul  2 17:27 math.h
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 memory.h
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 netdb.h
drwxr-xr-x   3 jleffler  staff  102 Jul  2 17:27 netinet
-rw-r--r--   1 jleffler  staff   20 Jul  2 17:27 pthread.h
-rw-r--r--   1 jleffler  staff   22 Jul  2 17:27 semaphore.h
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 signal.h
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 sstream
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 stdarg.h
-rw-r--r--   1 jleffler  staff   20 Jul  2 17:27 stdbool.h
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 stddef.h
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 stdint.h
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 stdio.h
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 stdlib.h
-rw-r--r--   1 jleffler  staff   17 Jul  2 17:27 string
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 string.h
drwxr-xr-x  16 jleffler  staff  544 Jul  2 17:27 sys
-rw-r--r--   1 jleffler  staff   20 Jul  2 17:27 termios.h
-rw-r--r--   1 jleffler  staff   17 Jul  2 17:27 time.h
-rw-r--r--   1 jleffler  staff   19 Jul  2 17:27 unistd.h
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 utility
-rw-r--r--   1 jleffler  staff   17 Jul  2 17:27 vector
-rw-r--r--   1 jleffler  staff   18 Jul  2 17:27 wchar.h

system-headers/arpa:
total 8
-rw-r--r--  1 jleffler  staff  22 Jul  2 17:27 inet.h

system-headers/netinet:
total 8
-rw-r--r--  1 jleffler  staff  23 Jul  2 17:27 in.h

system-headers/sys:
total 112
-rw-r--r--  1 jleffler  staff  20 Jul  2 17:27 ipc.h
-rw-r--r--  1 jleffler  staff  21 Jul  2 17:27 mman.h
-rw-r--r--  1 jleffler  staff  22 Jul  2 17:27 param.h
-rw-r--r--  1 jleffler  staff  23 Jul  2 17:27 ptrace.h
-rw-r--r--  1 jleffler  staff  23 Jul  2 17:27 select.h
-rw-r--r--  1 jleffler  staff  20 Jul  2 17:27 sem.h
-rw-r--r--  1 jleffler  staff  20 Jul  2 17:27 shm.h
-rw-r--r--  1 jleffler  staff  23 Jul  2 17:27 socket.h
-rw-r--r--  1 jleffler  staff  21 Jul  2 17:27 stat.h
-rw-r--r--  1 jleffler  staff  21 Jul  2 17:27 time.h
-rw-r--r--  1 jleffler  staff  22 Jul  2 17:27 timeb.h
-rw-r--r--  1 jleffler  staff  22 Jul  2 17:27 times.h
-rw-r--r--  1 jleffler  staff  22 Jul  2 17:27 types.h
-rw-r--r--  1 jleffler  staff  21 Jul  2 17:27 wait.h
$

This assumes that header file names contain no spaces, which is not unreasonable — it would be a brave programmer who created header file names with spaces or other tricky characters.

A full production-ready version of mksurrogates.sh would accept an argument specifying the surrogate header directory.


With clang you can do e.g.:

 clang -Imyinclude -P -E -nostdinc -nobuiltininc main.cpp

There does not seem to be a way to preserve the system #include lines it cannot find though.

This doesn't work for gcc, as its preprocessor will stop when using -nostdinc and it can't find an #included header file.


You can protect the system includes with a temporarily included comment, keep comments in the preprocessor output (-CC) and then remove the protectors again.

Something like:

sed -i 's%#include <%//PROTECTED #include <%g' $(find . -name '*.[hc]pp')
g++ -E -P -CC main.cpp -o new_main.cpp
sed -i 's%//PROTECTED %%g' new_main.cpp

Since you are modifying the source files, it might be a good idea to create a copy first and work on those copies instead. Plus some other details, but the above is the general idea you need.