How do header and source files in C work?
Converting C source code files to an executable program is normally done in two steps: compiling and linking.
First, the compiler converts the source code to object files (*.o). Then, the linker takes these object files, together with statically-linked libraries, and creates an executable program.
In the first step, the compiler takes a compilation unit, which is normally a preprocessed source file (so, a source file with the contents of all the headers that it #includes) and converts that to an object file.
In each compilation unit, all the functions that are used must be declared, to let the compiler know that the function exists and what its arguments are. In your example, the declaration of the function returnSeven is in the header file header.h. When you compile main.c, you include the header with the declaration so that the compiler knows that returnSeven exists when it compiles main.c.
When the linker does its job, it needs to find the definition of each function. Each function has to be defined exactly once in one of the object files - if there are multiple object files that contain the definition of the same function, the linker will stop with an error.
Your function returnSeven is defined in source.c (and the main function is defined in main.c).
So, to summarize, you have two compilation units: source.c and main.c (with the header files that it includes). You compile these to two object files: source.o and main.o. The first one will contain the definition of returnSeven, the second one the definition of main. Then the linker will glue those two together in an executable program.
About linkage:
There is external linkage and internal linkage. By default, functions have external linkage, which means that the compiler makes these functions visible to the linker. If you make a function static, it has internal linkage - it is only visible inside the compilation unit in which it is defined (the linker won't know that it exists). This can be useful for functions that do something internally in a source file and that you want to hide from the rest of the program.
The C language has no concept of source files and header files (and neither does the compiler). This is merely a convention; remember that a header file is always #included into a source file; the preprocessor literally just copy-pastes the contents, before proper compilation begins.
Your example should compile (foolish syntax errors notwithstanding). Using GCC, for example, you might first do:
gcc -c -o source.o source.c
gcc -c -o main.o main.c
This compiles each source file separately, creating independent object files. At this stage, returnSeven() has not been resolved inside main.o; the compiler has merely marked the object file in a way that states that it must be resolved in the future. So at this stage, it's not a problem that main.c can't see a definition of returnSeven(). (Note: this is distinct from the fact that main.c must be able to see a declaration of returnSeven() in order to compile; it must know that it is indeed a function, and what its prototype is. That is why you must #include "header.h" in main.c.)
You then do:
gcc -o my_prog source.o main.o
This links the two object files together into an executable binary, and performs resolution of symbols. In our example, this is possible, because main.o requires returnSeven(), and this is exposed by source.o. In cases where everything doesn't match up, a linker error would result.
There is nothing magic about compilation. Nor automatic!
Header files basically provide information to the compiler, almost never code.
That information alone is usually not enough to create a full program.
Consider the "hello world" program (with the simpler puts function):
#include <stdio.h>
int main(void) {
    puts("Hello, World!");
    return 0;
}
Without the header, the compiler does not know how to deal with puts() (it is not a C keyword). The header lets the compiler know how to manage the arguments and return value.
How the function works, however, is not specified anywhere in this simple code. Somebody else has written the code for puts() and included the compiled code in a library. The code in that library is combined with the compiled code for your source when the program is linked.
Now consider you wanted your own version of puts():
int main(void) {
    myputs("Hello, World!");
    return 0;
}
Compiling just this code gives an error (or, with older compilers, only a warning) because the compiler has no information about the function. You can provide that information:
int myputs(const char *line);
int main(void) {
    myputs("Hello, World!");
    return 0;
}
and the code now compiles, but does not link, i.e. it does not produce an executable, because there is no code for myputs(). So you write the code for myputs() in a file called "myputs.c":
#include <stdio.h>
int myputs(const char *line) {
    while (*line) putchar(*line++);
    return 0;
}
and you have to remember to compile both your first source file and "myputs.c" together.
After a while your "myputs.c" file has expanded to a handful of functions and you need to include the information about all the functions (their prototypes) in the source files that want to use them.
It is more convenient to write all the prototypes in a single file and #include that file. With the inclusion you run no risk of making a mistake when typing the prototype.
You still have to compile and link all the code files together though.
When they grow even more, you put all the already compiled code in a library ... and that's another story :)