Does a buffer overflow change the data type of the variable it is overwriting?

Say I have a C character array char buf[15], and say a variable int set_me = 0 has its data stored in memory directly after char buf[15]. If I overflowed buf with the string "aaabbbcccdddeee\xef\xbe\xad\xde", would set_me's data type change from an integer to a character array?


Solution 1:

No.

The "data type" of a variable is only relevant in source code (and even then only in some languages). It tells the compiler how to treat the variable.

These high-level data types do not exist as such in compiled (native) code. They can affect what instructions a compiler generates, but the instructions themselves don't care if the data represents a character or a number.


Variables do not exist in hardware. In hardware, you have memory locations and the instructions that operate on them.

A variable could be seen as a view of the data at a memory location — if you squint and look at the same memory slightly differently (a different variable with different type referring to the same location), the same binary value can have a different meaning.

For example, the byte 0x41 could be interpreted as the UTF-8-encoded character A. It could also be interpreted as the single-byte integer 65. It could also be interpreted as one byte in a multi-byte integer or floating point number, or one byte in a multi-byte character encoding. It could be the bitset 0b1000001. All from the same byte in the same memory location. In the C language, you can see this effect by casting to these different types.

When you have a "buffer overflow", you're doing something outside the bounds of what your compiler or language might expect. But, as far as the hardware is concerned[1], you are writing bytes (whether single or multiple) to a memory location. A memory location does not have a "type". In fact, the hardware doesn't even know that any particular set of bytes makes an array or buffer in your code.

Wherever you next access that memory location in your code, the instructions will run as originally defined. For example, if they were expecting a number there, they will act on whatever bytes are present as if they were a number.


To use your example, assuming your int is a signed 4-byte (32-bit) integer:

+-------------+--------------------------------------------+-----------+
| Source code |                  char[15]                  |    int    |
+-------------+--------------------------------------------------------+
| Memory      |61|61|61|62|62|62|63|63|63|64|64|64|65|65|65|EF|BE|AD|DE|
+-------------+--------------------------------------------------------+

You can see that the int's memory location now contains 0xEFBEADDE, assuming a big-endian system[2]. This is the signed 32-bit int -272716322. If you instead interpret the same memory as an unsigned int, it is 4022250974. For exactly the same data in memory, the meaning depends entirely on how you view it.


[1] There are some mechanisms which prevent you from writing into protected regions of memory, and will crash your program if you attempt to do so.

[2] x86 is actually little-endian, meaning the bytes that make up a larger value are stored least-significant byte first. So on x86 the same four bytes would instead be read as 0xDEADBEEF, giving signed -559038737 or unsigned 3735928559.

Solution 2:

From a C perspective, the answer would be "Who knows? It's Undefined Behavior".

Types are a C concept, not a hardware one. But the C rules don't apply to a program with Undefined Behavior; that is the literal meaning of "undefined behavior" in the C standard. And a buffer overflow is one form of Undefined Behavior.

I initially wrote "the C rules no longer apply", but in fact Undefined Behavior is retroactive: the C rules do not apply to any execution of a program that will, at any point in its future, perform Undefined Behavior.