What is the effect of extern "C" in C++?
What exactly does putting extern "C"
into C++ code do?
For example:
extern "C" {
void foo();
}
extern "C"
makes a function-name in C++ have C linkage (compiler does not mangle the name) so that client C code can link to (use) your function using a C compatible header file that contains just the declaration of your function. Your function definition is contained in a binary format (that was compiled by your C++ compiler) that the client C linker will then link to using the C name.
Since C++ has overloading of function names and C does not, the C++ compiler cannot just use the function name as a unique id to link to, so it mangles the name by adding information about the arguments. A C compiler does not need to mangle the name since you can not overload function names in C. When you state that a function has extern "C"
linkage in C++, the C++ compiler does not add argument/parameter type information to the name used for linkage.
Just so you know, you can specify extern "C"
linkage to each individual declaration/definition explicitly or use a block to group a sequence of declarations/definitions to have a certain linkage:
extern "C" void foo(int);
extern "C"
{
void g(char);
int i;
}
If you care about the technicalities, they are listed in section 7.5 of the C++03 standard, here is a brief summary (with emphasis on extern "C"
):
-
extern "C"
is a linkage-specification - Every compiler is required to provide "C" linkage
- A linkage specification shall occur only in namespace scope
-
All function types, function names and variable names have a language linkageSee Richard's Comment: Only function names and variable names with external linkage have a language linkage - Two function types with distinct language linkages are distinct types even if otherwise identical
- Linkage specs nest, inner one determines the final linkage
-
extern "C"
is ignored for class members - At most one function with a particular name can have "C" linkage (regardless of namespace)
-
See Richard's comment:extern "C"
forces a function to have external linkage (cannot make it static)static
insideextern "C"
is valid; an entity so declared has internal linkage, and so does not have a language linkage - Linkage from C++ to objects defined in other languages and to objects defined in C++ from other languages is implementation-defined and language-dependent. Only where the object layout strategies of two language implementations are similar enough can such linkage be achieved
Just wanted to add a bit of info, since I haven't seen it posted yet.
You'll very often see code in C headers like so:
#ifdef __cplusplus
extern "C" {
#endif
// all of your legacy C code here
#ifdef __cplusplus
}
#endif
What this accomplishes is that it allows you to use that C header file with your C++ code, because the macro "__cplusplus" will be defined. But you can also still use it with your legacy C code, where the macro is NOT defined, so it won't see the uniquely C++ construct.
Although, I have also seen C++ code such as:
extern "C" {
#include "legacy_C_header.h"
}
which I imagine accomplishes much the same thing.
Not sure which way is better, but I have seen both.
Decompile a g++
generated binary to see what is going on
main.cpp
void f() {}
void g();
extern "C" {
void ef() {}
void eg();
}
/* Prevent g and eg from being optimized away. */
void h() { g(); eg(); }
Compile and disassemble the generated ELF output:
g++ -c -std=c++11 -Wall -Wextra -pedantic -o main.o main.cpp
readelf -s main.o
The output contains:
8: 0000000000000000 7 FUNC GLOBAL DEFAULT 1 _Z1fv
9: 0000000000000007 7 FUNC GLOBAL DEFAULT 1 ef
10: 000000000000000e 17 FUNC GLOBAL DEFAULT 1 _Z1hv
11: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
12: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _Z1gv
13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND eg
Interpretation
We see that:
-
ef
andeg
were stored in symbols with the same name as in the code -
the other symbols were mangled. Let's unmangle them:
$ c++filt _Z1fv f() $ c++filt _Z1hv h() $ c++filt _Z1gv g()
Conclusion: both of the following symbol types were not mangled:
- defined
- declared but undefined (
Ndx = UND
), to be provided at link or run time from another object file
So you will need extern "C"
both when calling:
- C from C++: tell
g++
to expect unmangled symbols produced bygcc
- C++ from C: tell
g++
to generate unmangled symbols forgcc
to use
Things that do not work in extern C
It becomes obvious that any C++ feature that requires name mangling will not work inside extern C
:
extern "C" {
// Overloading.
// error: declaration of C function ‘void f(int)’ conflicts with
void f();
void f(int i);
// Templates.
// error: template with C linkage
template <class C> void f(C i) { }
}
Minimal runnable C from C++ example
For the sake of completeness and for the newbs out there, see also: How to use C source files in a C++ project?
Calling C from C++ is pretty easy: each C function only has one possible non-mangled symbol, so no extra work is required.
main.cpp
#include <cassert>
#include "c.h"
int main() {
assert(f() == 1);
}
c.h
#ifndef C_H
#define C_H
/* This ifdef allows the header to be used from both C and C++
* because C does not know what this extern "C" thing is. */
#ifdef __cplusplus
extern "C" {
#endif
int f();
#ifdef __cplusplus
}
#endif
#endif
c.c
#include "c.h"
int f(void) { return 1; }
Run:
g++ -c -o main.o -std=c++98 main.cpp
gcc -c -o c.o -std=c89 c.c
g++ -o main.out main.o c.o
./main.out
Without extern "C"
the link fails with:
main.cpp:6: undefined reference to `f()'
because g++
expects to find a mangled f
, which gcc
did not produce.
Example on GitHub.
Minimal runnable C++ from C example
Calling C++ from C is a bit harder: we have to manually create non-mangled versions of each function we want to expose.
Here we illustrate how to expose C++ function overloads to C.
main.c
#include <assert.h>
#include "cpp.h"
int main(void) {
assert(f_int(1) == 2);
assert(f_float(1.0) == 3);
return 0;
}
cpp.h
#ifndef CPP_H
#define CPP_H
#ifdef __cplusplus
// C cannot see these overloaded prototypes, or else it would get confused.
int f(int i);
int f(float i);
extern "C" {
#endif
int f_int(int i);
int f_float(float i);
#ifdef __cplusplus
}
#endif
#endif
cpp.cpp
#include "cpp.h"
int f(int i) {
return i + 1;
}
int f(float i) {
return i + 2;
}
int f_int(int i) {
return f(i);
}
int f_float(float i) {
return f(i);
}
Run:
gcc -c -o main.o -std=c89 -Wextra main.c
g++ -c -o cpp.o -std=c++98 cpp.cpp
g++ -o main.out main.o cpp.o
./main.out
Without extern "C"
it fails with:
main.c:6: undefined reference to `f_int'
main.c:7: undefined reference to `f_float'
because g++
generated mangled symbols which gcc
cannot find.
Example on GitHub.
Where is the extern "c"
when I include C headers from C++?
- C++ versions of C headers like
cstdio
might be relying on#pragma GCC system_header
which https://gcc.gnu.org/onlinedocs/cpp/System-Headers.html mentions: "On some targets, such as RS/6000 AIX, GCC implicitly surrounds all system headers with an 'extern "C"' block when compiling as C++.", but I didn't fully confirm it. - POSIX headers like
/usr/include/unistd.h
are covered at: Do I need an extern "C" block to include standard POSIX C headers? via__BEGIN_DECLS
, reproduced on Ubuntu 20.04.__BEGIN_DECLS
is included via#include <features.h>
.
Tested in Ubuntu 18.04.
In every C++ program, all non-static functions are represented in the binary file as symbols. These symbols are special text strings that uniquely identify a function in the program.
In C, the symbol name is the same as the function name. This is possible because in C no two non-static functions can have the same name.
Because C++ allows overloading and has many features that C does not — like classes, member functions, exception specifications - it is not possible to simply use the function name as the symbol name. To solve that, C++ uses so-called name mangling, which transforms the function name and all the necessary information (like the number and size of the arguments) into some weird-looking string processed only by the compiler and linker.
So if you specify a function to be extern C, the compiler doesn't performs name mangling with it and it can be directly accessed using its symbol name as the function name.
This comes handy while using dlsym()
and dlopen()
for calling such functions.