How can a program with a global variable called main instead of a main function work?
Before getting into what is actually going on, it is important to point out that the program is ill-formed as per defect report 1886: Language linkage for main():
[...] A program that declares a variable main at global scope or that declares the name main with C language linkage (in any namespace) is ill-formed. [...]
The most recent versions of clang and gcc make this an error, and the program will not compile (see gcc live example):
error: cannot declare '::main' to be a global variable
int main = ( std::cout << "C++ is excellent!\n", 195 );
^
So why was there no diagnostic in older versions of gcc and clang? This defect report did not even have a proposed resolution until late 2014, so this case only very recently became explicitly ill-formed, which requires a diagnostic.
Prior to this, it seems like this would be undefined behavior, since we are violating a shall requirement of the draft C++ standard from section 3.6.1 [basic.start.main]:
A program shall contain a global function called main, which is the designated start of the program. [...]
Undefined behavior is unpredictable and does not require a diagnostic. The inconsistency we see when reproducing the behavior is typical of undefined behavior.
So what is the code actually doing and why in some cases does it produce results? Let's see what we have:
         declarator
         |    initializer----------------------------------
         |    |                                           |
         v    v                                           v
     int main = ( std::cout << "C++ is excellent!\n", 195 );
     ^          ^                                   ^
     |          |                                   |
     |          |                                   comma operator
     |          primary expression
     global variable of type int
We have main, which is an int declared in the global namespace and is being initialized; the variable has static storage duration. It is implementation-defined whether the initialization takes place before an attempt to call main is made, but it appears gcc does perform it before calling main.
The code uses the comma operator. The left operand is a discarded-value expression and is used here solely for the side effect of writing to std::cout. The result of the comma operator is the right operand, which in this case is the prvalue 195, and that value is used to initialize the variable main.
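The comma-operator behavior described above can be sketched in isolation. This is a minimal illustration, not the code from the question: the counter side_effects and the helpers note_side_effect and comma_value are hypothetical names, with note_side_effect standing in for the std::cout << ... left operand.

```cpp
// Hypothetical counter, used only to make the side effect observable.
int side_effects = 0;

// Hypothetical stand-in for the left operand `std::cout << "..."`.
int note_side_effect() {
    ++side_effects;
    return -1;  // this result is discarded by the comma operator
}

// Evaluates a comma expression shaped like the one in the question.
int comma_value() {
    // The left operand runs first for its side effect; the value of the
    // whole parenthesized expression is the right operand, 195.
    int value = (note_side_effect(), 195);
    return value;
}
```

Calling comma_value() from an ordinary main function returns 195 and bumps side_effects by one; the -1 returned by the left operand never reaches value.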
We can see sergej points out that the generated assembly shows cout being called during static initialization. The more interesting point for discussion (see live godbolt session), though, would be this:
main:
.zero 4
and the subsequent:
movl $195, main(%rip)
The likely scenario is that the program jumps to the symbol main expecting valid code to be there, and in some cases it will seg-fault. If that is the case, we would expect that storing valid machine code in the variable main could lead to a workable program, assuming it is located in a segment that allows code execution. We can see this 1984 IOCCC entry does just that.
It appears we can get gcc to do this in C using (see it live):
const int main = 195 ;
It seg-faults if the variable main is not const, presumably because it is then not located in an executable location. Hat tip to this comment here, which gave me this idea.
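As background that the text above does not spell out: 195 is 0xC3 in hexadecimal, which is the one-byte opcode for a near ret on x86, so on a little-endian x86 target the first byte stored at main happens to be a return instruction. The sketch below only inspects the bytes of the integer and never executes them; the helper names low_byte and object_bytes are made up for illustration.

```cpp
#include <array>
#include <cstring>

// Hypothetical helper: the least significant byte of the value.
unsigned char low_byte(int value) {
    return static_cast<unsigned char>(value & 0xFF);
}

// Hypothetical helper: the bytes of the int in the order they sit in memory.
std::array<unsigned char, sizeof(int)> object_bytes(int value) {
    std::array<unsigned char, sizeof(int)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(int));
    return bytes;
}
```

On a little-endian machine object_bytes(195) starts with 0xC3 and the remaining bytes are zero; on a big-endian machine the 0xC3 byte would come last, which is one reason the trick is platform-specific.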
Also see FUZxxl's answer here to a C-specific version of this question.
From 3.6.1/1:
A program shall contain a global function called main, which is the designated start of the program. It is implementation defined whether a program in a freestanding environment is required to define a main function.
From this it looks like g++ happens to allow a program without a main function (presumably under the "freestanding" clause).
Then from 3.6.1/3:
The function main shall not be used (3.2) within a program. The linkage (3.5) of main is implementation defined. A program that declares main to be inline or static is ill-formed. The name main is not otherwise reserved.
So, reading this wording on its own, it looks fine to have an integer variable named main.
Finally, if you're wondering why the output is printed: the initialization of the int main uses the comma operator to execute cout at static init and then provide an actual integral value to do the initialization.
gcc 4.8.1 generates the following x86 assembly:
.LC0:
.string "C++ is excellent!\n"
subq $8, %rsp #,
movl std::__ioinit, %edi #,
call std::ios_base::Init::Init() #
movl $__dso_handle, %edx #,
movl std::__ioinit, %esi #,
movl std::ios_base::Init::~Init(), %edi #,
call __cxa_atexit #
movl $.LC0, %esi #,
movl std::cout, %edi #,
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) #
movl $195, main(%rip) #, main
addq $8, %rsp #,
ret
main:
.zero 4
Note that cout is called during initialization, not in the main function!
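The same ordering can be observed in a well-formed program by giving the variable a harmless name. This is only a sketch: events and not_main are hypothetical names, and it assumes the snippet lives in the same translation unit as an ordinary main function that reads not_main.

```cpp
#include <string>
#include <vector>

// Hypothetical event log. The function-local static is constructed on first
// use, so it is safe to call from another global's initializer.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// Same shape as the question's `int main = ( ..., 195 );`, but with a name
// that does not collide with the program's entry point.
int not_main = (events().push_back("global initializer ran"), 195);
```

By the time code in main inspects events(), the entry is already there and not_main already holds 195, which mirrors the output being printed even though no main function ever runs in the original program.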
.zero 4 declares 4 (0-initialized) bytes starting at location main, where main is the name of the variable[!].
The main symbol is interpreted as the start of the program.
The behavior depends on the platform.
That is an ill-formed program. It crashes in my test environment, cygwin64/g++ 4.9.3.
From the standard:
3.6.1 Main function [basic.start.main]
1 A program shall contain a global function called main, which is the designated start of the program.
The reason I believe this works is that the compiler does not know it is compiling the main() function, so it compiles a global integer whose initialization has side effects.
When the object file for this translation unit is linked, the symbol main is resolved by name, and as far as I can tell nothing checks whether it refers to a function or a variable.
So the linker happily links to the (variable) main symbol and treats it like a function call. But not until the runtime system has run the global variable initialization code.
When I ran the sample it printed out but then it caused a seg-fault. I assume that's when the runtime system tried to execute an int variable as if it were a function.