This obfuscated C code claims to run without a main(), but what does it really do?
#include <stdio.h>
#define decode(s,t,u,m,p,e,d) m##s##u##t
#define begin decode(a,n,i,m,a,t,e)
int begin()
{
    printf("Ha HA see how it is?? ");
}
Does this indirectly call main? How?
Solution 1:
The C language defines two execution environments: freestanding and hosted. In both, the environment calls a designated function at program startup.
In a freestanding environment the program startup function is implementation-defined, while in a hosted environment it must be main. No C program can run without a program startup function in either environment.
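If you want to see which of the two environments your compiler is targeting, here is a minimal sketch. It relies on the standard predefined macro __STDC_HOSTED__ (C99 and later); the function name environment_kind is made up for illustration.

```c
/* Report which execution environment this translation unit was compiled for.
 * __STDC_HOSTED__ is a standard predefined macro: 1 for a hosted
 * implementation, 0 for a freestanding one.
 * environment_kind is a hypothetical helper name, not a library function. */
const char *environment_kind(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__
    return "hosted";        /* program startup calls main */
#else
    return "freestanding";  /* startup function is implementation-defined */
#endif
}
```

With GCC, an ordinary build takes the hosted branch, while passing -ffreestanding defines __STDC_HOSTED__ as 0 and takes the other one.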
In your case, main is hidden by the preprocessor definitions. begin expands to decode(a,n,i,m,a,t,e), which in turn expands to main.
int begin() -> int decode(a,n,i,m,a,t,e)() -> int m##a##i##n() -> int main()
decode(s,t,u,m,p,e,d) is a function-like macro with 7 parameters. Its replacement list is m##s##u##t. m, s, u and t are the 4th, 1st, 3rd and 2nd parameters, respectively.
s  t  u  m  p  e  d
1  2  3  4  5  6  7
The remaining parameters (p, e and d) are unused and are there only to obfuscate. The arguments passed to decode are a,n,i,m,a,t,e, so the parameters m, s, u and t are replaced with the arguments m, a, i and n, respectively.
m --> m
s --> a
u --> i
t --> n
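You can also check the result of the expansion from inside C by turning it into a string. This is only a sketch: decode and begin are copied from the question, while STRINGIFY, STRINGIFY_RAW and begin_expands_to are names I made up. The two-level macro is needed so that begin is fully expanded before the # operator is applied.

```c
#define decode(s,t,u,m,p,e,d) m##s##u##t
#define begin decode(a,n,i,m,a,t,e)

/* Hypothetical helpers: STRINGIFY expands its argument first, then
 * STRINGIFY_RAW applies the # operator to the expanded tokens. */
#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

/* Returns the text that the preprocessor produces for `begin`. */
const char *begin_expands_to(void)
{
    return STRINGIFY(begin);
}
```

Here begin_expands_to() returns the string "main", which matches the expansion chain above.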
Solution 2:
Try running gcc -E source.c; the output ends with:
int main()
{
    printf("Ha HA see how it is?? ");
}
So a main() function is actually generated by the preprocessor.
Solution 3:
The program in question does end up with a main() due to macro expansion, but your assumption is flawed - it doesn't have to have a main() at all!
Strictly speaking, you can have a C program and compile it without having a main symbol. main is what the C library expects to jump to after it has finished its own initialization. Usually control reaches main from the libc startup symbol known as _start. It is always possible to have a valid program that simply executes assembly, without having a main. Take a look at this:
/* This must be compiled with the flag -nostdlib because otherwise the
* linker will complain about multiple definitions of the symbol _start
* (one here and one in glibc) and a missing reference to symbol main
* (that the libc expects to be linked against).
*/
void
_start ()
{
    /* calling the write system call, with the arguments in this order:
     * 1. the stdout file descriptor
     * 2. the buffer we want to print (Here it's just a string literal).
     * 3. the amount of bytes we want to write.
     */
    asm ("int $0x80"::"a"(4), "b"(1), "c"("Hello world!\n"), "d"(13));
    asm ("int $0x80"::"a"(1), "b"(0)); /* calling exit syscall, with the argument to be 0 */
}
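Compile the above with gcc -nostdlib without_main.c, and see it printing Hello world! on the screen just by issuing system calls (interrupts) in inline assembly. Note that int $0x80 with these register assignments is the 32-bit x86 Linux system call interface, so on an x86-64 machine you will most likely need to add -m32.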
For more information about this particular issue, check out the ksplice blog.
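For comparison, here is a rough sketch of the same idea using the native x86-64 Linux syscall instruction instead of int $0x80. It assumes GCC or Clang extended asm on Linux x86-64, where the system call numbers are 1 for write and 60 for exit. The helper names raw_write and raw_exit and the WITHOUT_MAIN macro are made up for illustration, and on any other platform the helpers do nothing useful.

```c
#include <stddef.h>

/* Hypothetical helper: issue the write system call directly.
 * Returns the number of bytes written, a negative errno value on failure,
 * or -1 on platforms this sketch does not cover. */
long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* rax = syscall number, rdi/rsi/rdx = arguments; the syscall
     * instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
#else
    (void)fd; (void)buf; (void)len;
    return -1;
#endif
}

/* Hypothetical helper: issue the exit system call directly. */
void raw_exit(int status)
{
#if defined(__linux__) && defined(__x86_64__)
    __asm__ volatile ("syscall"
                      :
                      : "a"(60L), "D"((long)status)
                      : "rcx", "r11", "memory");
#else
    (void)status;
#endif
    for (;;) { }  /* not reached once the exit system call has succeeded */
}

#ifdef WITHOUT_MAIN
/* Entry point used when linking with -nostdlib; there is no main at all. */
void _start(void)
{
    static const char msg[] = "Hello world!\n";
    raw_write(1, msg, sizeof msg - 1);
    raw_exit(0);
}
#endif
```

To try it without a main, something like gcc -nostdlib -DWITHOUT_MAIN followed by the file name should be enough on such a system. Without that macro the file only provides the two helpers and can be linked into a normal program.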
Another interesting point is that you can also have a program that compiles without the main symbol corresponding to a C function. For instance, the compiler accepts the following program and only whines about it when you raise the warning level.
/* These values are extracted from the decimal representation of the instructions
* of a hello world program written in asm, that gdb provides.
*/
const int main[] = {
-443987883, 440, 113408, -1922629632,
4149, 899584, 84869120, 15544,
266023168, 1818576901, 1461743468, 1684828783,
-1017312735
};
The values in the array are integers whose bytes encode the machine instructions needed to print Hello World on the screen. For a more detailed account of how this specific program works, take a look at this blog post, which is also where I first read about it.
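To convince yourself that those integers are really code and data, you can lay them out as bytes and look for the message. This is just a sketch: the array is copied from above but renamed payload so that it does not clash with a real main, the function names are made up, and the bytes are extracted in little-endian order because that is how an x86 machine stores them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same values as the main[] array above, under a hypothetical name. */
static const int32_t payload[] = {
    -443987883, 440, 113408, -1922629632,
    4149, 899584, 84869120, 15544,
    266023168, 1818576901, 1461743468, 1684828783,
    -1017312735
};

enum { PAYLOAD_BYTES = sizeof payload };

/* Write the integers out byte by byte, least significant byte first. */
static void payload_bytes(unsigned char out[PAYLOAD_BYTES])
{
    for (size_t i = 0; i < sizeof payload / sizeof payload[0]; i++) {
        uint32_t v = (uint32_t)payload[i];
        for (size_t k = 0; k < 4; k++)
            out[4 * i + k] = (unsigned char)((v >> (8 * k)) & 0xFFu);
    }
}

/* Return the byte offset of `text` inside the payload, or -1 if absent. */
long find_text(const char *text)
{
    unsigned char bytes[PAYLOAD_BYTES];
    size_t n = strlen(text);

    payload_bytes(bytes);
    for (size_t off = 0; off + n <= sizeof bytes; off++)
        if (memcmp(bytes + off, text, n) == 0)
            return (long)off;
    return -1;
}
```

find_text("Hello World!\n") returns 37: the message sits in the last four integers, right before the two final bytes of the array.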
I want to make one final note about these programs. I do not know whether they count as valid C programs according to the C language specification, but compiling and running them is certainly possible, even if they violate the specification.