C/C++ header and implementation files: How do they work?

This is probably a stupid question, but I've searched for quite a while now here and on the web and couldn't come up with a clear answer (did my due diligence googling).

So I'm new to programming... My question is, how does the main function know about function definitions (implementations) in a different file?

ex. Say I have 3 files

  • main.cpp
  • myfunction.cpp
  • myfunction.hpp

//main.cpp

#include "myfunction.hpp"
int main() {
  int A = myfunction( 12 );
  ...
}

-

//myfunction.cpp

#include "myfunction.hpp"
int myfunction( int x ) {
  return x * x;
}

-

//myfunction.hpp

int myfunction( int x );

-

I get how the preprocessor includes the header code, but how do the header and main function even know the function definition exists, much less utilize it?

I apologize if this isn't clear or I'm vastly mistaken about something, new here


The header file declares functions/classes - i.e. tells the compiler when it is compiling a .cpp file what functions/classes are available.

The .cpp file defines those functions - i.e. the compiler compiles the code and therefore produces the actual machine code to perform those actions that are declared in the corresponding .hpp file.

In your example, main.cpp includes a .hpp file. The preprocessor replaces the #include with the contents of the .hpp file. This file tells the compiler that the function myfunction is defined elsewhere and it takes one parameter (an int) and returns an int.
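To make that concrete, here is a minimal sketch of how much the compiler learns from the declaration alone. The name returns_int is made up for this illustration. Nothing here defines myfunction, yet the file compiles and links, because decltype only asks for the type of the call and never actually references the function.

```cpp
#include <type_traits>

// This single line is all that main.cpp receives from myfunction.hpp.
int myfunction(int x);

// decltype only inspects the type of the call expression; it never runs or
// references the function, so no definition is needed for this to build.
// (returns_int is a hypothetical name used only in this sketch.)
constexpr bool returns_int =
    std::is_same<decltype(myfunction(12)), int>::value;

static_assert(returns_int, "the declaration alone fixes the return type");
```

As soon as the function is really called, the definition has to turn up in some object file at link time.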

So when you compile main.cpp into an object file (.o extension), the compiler makes a note in that file that it requires the function myfunction. When you compile myfunction.cpp into an object file, that object file has a note in it saying that it contains the definition of myfunction.

Then when you come to linking the two object files together into an executable, the linker ties the ends up - i.e. main.o uses myfunction as defined in myfunction.o.

I hope that helps


You have to understand that, from the user's point of view, compilation is a two-step operation.


1st Step : Object compilation

During this step, your *.cpp files are individually compiled into separate object files. This means that when main.cpp is compiled, the compiler doesn't know anything about your myfunction.cpp. The only thing it knows is that you have declared that a function with the signature int myfunction( int x ) exists in another object file.

The compiler keeps a reference to this call and records it in the object file. The object file will contain a note saying "I have to call myfunction with an int, and it will return an int to me". It keeps an index of all external calls so that it can be linked with other object files afterwards.


2nd Step : Linking

During this step, the linker takes a look at all those indexes in your object files and tries to resolve the dependencies between those files. If a definition is missing, you'll get the famous undefined symbol XXX error from it. It then translates those references into real memory addresses in a result file: either an executable binary or a library.
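The two steps can be mimicked with a toy model. This is only a sketch of the idea, not how a real linker is implemented: ObjectFile, link_objects and the sizes and offsets are all invented for the illustration.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model of an object file: the symbols it defines (name -> offset inside
// the object) and the symbols it only references.
struct ObjectFile {
    std::string name;
    std::map<std::string, int> defined;
    std::set<std::string> undefined;
    int size;  // size of the object's code, used to place objects back to back
};

// Toy "linker": lays the objects out one after another, computes a final
// address for every defined symbol, then checks every undefined reference.
std::map<std::string, int> link_objects(const std::vector<ObjectFile>& objects) {
    std::map<std::string, int> addresses;
    int base = 0;
    for (const ObjectFile& obj : objects) {
        for (const auto& entry : obj.defined) {
            addresses[entry.first] = base + entry.second;
        }
        base += obj.size;
    }
    for (const ObjectFile& obj : objects) {
        for (const std::string& symbol : obj.undefined) {
            if (addresses.find(symbol) == addresses.end()) {
                throw std::runtime_error("undefined symbol " + symbol);
            }
        }
    }
    return addresses;
}
```

Feeding it a main.o that references myfunction together with a myfunction.o that defines it yields an address for both symbols. Leaving myfunction.o out makes it throw, which is the toy version of the undefined symbol error.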


You may then ask how this is possible for a gigantic program like an office suite, which has tons of methods and objects. Well, such programs use the shared library mechanism. You know these libraries as the '.dll' and/or '.so' files on your Windows/Unix workstation. This mechanism allows the resolution of undefined symbols to be postponed until the program is run.

It even allows undefined symbols to be resolved on demand, with the dl* functions (dlopen, dlsym).


1. The principle

When you write:

int A = myfunction(12);

This is translated to:

int A = @call(myfunction, 12);

where @call can be seen as a dictionary look-up. And if you think about the dictionary analogy, you can certainly know about a word (smorgasbord?) before knowing its definition. All you need is that, at runtime, the definition be in the dictionary.

2. A point on ABI

How does this @call work? Because of the ABI. The ABI (Application Binary Interface) specifies many things, among them how to perform a call to a given function (depending on its parameters). The call contract is simple: it says where each of the function's arguments can be found (some will be in the processor's registers, others on the stack).

Therefore, @call actually does:

@push 12, reg0
@invoke myfunction

And the function definition knows that its first argument (x) is located in reg0.

3. But I thought dictionaries were for dynamic languages?

And you are right, to an extent. Dynamic languages are typically implemented with a hash table for symbol lookup that is dynamically populated.

For C++, the compiler will transform a translation unit (roughly speaking, a preprocessed source file) into an object (.o or .obj in general). Each object contains a table of the symbols it references but for which the definition is not known:

.undefined
[0]: myfunction

Then the linker will bring together the objects and reconcile the symbols. There are two kinds of symbols at this point:

  • those which are within the library, and can be referenced through an offset (the final address is still unknown)
  • those which are outside the library, and whose address is completely unknown until runtime.

Both can be treated in the same fashion.

.dynamic
[0]: myfunction at <undefined-address>

And then the code will reference the look-up entry:

@invoke .dynamic[0]

When the library is loaded (with dlopen or LoadLibrary, for example), the runtime will finally know where the symbol is mapped in memory, and overwrite the <undefined-address> with the real address (for this run).
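The look-up entry can be pictured as a slot holding a function pointer. The following is a rough sketch in plain C++, not what a real loader does internally. dynamic_table, load_library and call_through_table are names invented for the illustration, and myfunction stands in for a function that would really live in a shared library.

```cpp
// Stand-in for a function that would normally live in a shared library.
int myfunction(int x) { return x * x; }

using Function = int (*)(int);

// Toy ".dynamic" table: one slot per external symbol, empty until load time.
Function dynamic_table[1] = { nullptr };

// Toy loader step: the library is now mapped, so record the real address.
void load_library() { dynamic_table[0] = &myfunction; }

// What "@invoke .dynamic[0]" amounts to: an indirect call through the slot.
int call_through_table(int argument) { return dynamic_table[0](argument); }
```

The calling code never changes: it always calls through slot 0, and only the content of the slot is patched once the address is known.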


As suggested in Matthieu M.'s comment, it is the linker's job to find the right "function" at the right place. The compilation steps are, roughly:

  1. The compiler is invoked for each cpp file and translates it to an object file (binary code) with a symbol table that associates function names (names are mangled in C++) with their locations in the object file.
  2. The linker is invoked only once, with every object file as a parameter. It resolves function call locations from one object file to another using the symbol tables. One main() function MUST exist somewhere. Finally, a binary executable file is produced once the linker has found everything it needs.
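As a small illustration of why names are mangled, consider overloads. The function names below are made up, and the mangled names in the comments are what GCC and Clang typically produce under the Itanium C++ ABI; other compilers use different schemes.

```cpp
// Two overloads share the source-level name "area". The compiler gives each
// a distinct mangled symbol so the linker can tell them apart; under the
// Itanium C++ ABI these are typically _Z4areai and _Z4areaii.
int area(int side) { return side * side; }
int area(int width, int height) { return width * height; }

// extern "C" switches mangling off: the symbol is plain "square_area",
// which is also why extern "C" functions cannot be overloaded.
extern "C" int square_area(int side) { return area(side); }
```

On a typical Unix toolchain, listing the object file's symbols (with nm, for instance) shows the mangled names that the linker actually matches up.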

The preprocessor includes the content of the header files into the cpp files (a cpp file together with its included headers is called a translation unit). When you compile the code, each translation unit is checked separately for semantic and syntactic errors. Whether the functions it uses are defined in other translation units is not considered. .obj files are generated after compilation.

In the next step, the obj files are linked. The definitions of the functions (member functions for classes) that are used are looked up and linking happens. If a function is not found, a linker error is reported.

In your example, if the function was not defined in myfunction.cpp, compilation would still proceed without any problem. An error would be reported in the linking step.
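A single-file sketch of that behaviour follows. The names not_defined, use_defined and use_missing are hypothetical, and the definition of myfunction is placed in the same file only to keep the example self-contained.

```cpp
int myfunction(int x);   // declared here, defined further down
int not_defined(int x);  // hypothetical: declared, never defined anywhere

// Compiles and links: the definition of myfunction is found at link time.
int use_defined() { return myfunction(12); }

// Uncommenting the next line still compiles, but the link step then fails
// with an undefined-symbol error naming not_defined.
// int use_missing() { return not_defined(12); }

// In the layout from the question this definition sits in myfunction.cpp.
int myfunction(int x) { return x * x; }
```

A declaration that is never used costs nothing; the error only appears once some code that ends up in the final program actually calls the missing function.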