This source code is switching on a string in C. How does it do that?

I'm reading through some emulator code and I've encountered something truly odd:

switch (reg){
    case 'eax':
    /* and so on*/
}

How is this possible? I thought you could only switch on integral types. Is there some macro trickery going on?


Solution 1:

(Only you can answer the "macro trickery" part, unless you post more code. But there's not much here for macros to work on: formally you are not allowed to redefine keywords; the behaviour on doing that is undefined.)

In order to achieve program readability, the witty developer is exploiting implementation-defined behaviour. 'eax' is not a string, but a multi-character constant. Note very carefully the single quotation characters around eax. Most likely it is giving you an int in your case that's unique to that combination of characters. (Quite often each character occupies 8 bits in a 32-bit int.) And everyone knows you can switch on an int!

Finally, a standard reference:

The C99 standard says:

6.4.4.4p10: "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."

Solution 2:

According to the C Standard (6.8.4.2 The switch statement)

3 The expression of each case label shall be an integer constant expression...

and (6.6 Constant expressions)

6 An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.

Now what is 'eax'?

The C Standard (6.4.4.4 Character constants)

2 An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'...

So 'eax' is an integer character constant. According to paragraph 10 of the same section

10 ...The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.

So according to the first mentioned quote it can be an operand of an integer constant expression that may be used as a case label.

Note that a character constant (enclosed in single quotes) has type int. It is not the same as a string literal (a sequence of characters enclosed in double quotes), which has a character array type.

Solution 3:

As others have said, this is an int constant and its actual value is implementation-defined.

I assume the rest of the code looks something like

if (SOMETHING)
    reg='eax';
...
switch (reg){
    case 'eax':
    /* and so on*/
}

You can be sure that 'eax' in the first part has the same value as 'eax' in the second part, so it all works out, right? ... wrong.

In a comment @Davislor lists some possible values for 'eax':

... 0x65, 0x656178, 0x65617800, 0x786165, 0x6165, or something else

Notice the first potential value? That is just 'e', ignoring the other two characters. The problem is the program probably uses 'eax', 'ebx', and so on. If all these constants have the same value as 'e' you end up with

switch (reg){
    case 'e':
       ...
    case 'e':
       ...
    ...
}

This doesn't look too good, does it?

The good part about "implementation-defined" is that the programmer can check the documentation of their compiler and see if it does something sensible with these constants. If it does, home free.

The bad part is that some other poor fellow can take the code and try to compile it using some other compiler. Instant compile error. The program is not portable.

As @zwol pointed out in the comments, the situation is not quite as bad as I thought: in the bad case the code doesn't compile. This will at least give you an exact file name and line number for the problem. Still, you will not have a working program.