When does Endianness become a factor?

Endianness, from what I understand, is the order in which the bytes that compose a multibyte word are stored, at least in the most typical case. So a 16-bit integer may be stored as either 0xHHLL or 0xLLHH.

Assuming I don't have that wrong, what I would like to know is when endianness becomes a major factor when sending information between two computers whose endianness may or may not differ.

  • If I transmit a short integer of 1, in the form of a char array and with no correction, is it received and interpreted as 256?

  • If I decompose and recompose the short integer using the following code, will endianness no longer be a factor?

    // Sender:
    for (n = 0; n < sizeof(uint16) * 8; ++n) {
        stl_bitset[n] = (value >> n) & 1;
    }

    // Receiver:
    for (n = 0; n < sizeof(uint16) * 8; ++n) {
        value |= uint16(stl_bitset[n] & 1) << n;
    }
    
  • Is there a standard way of compensating for endianness?

Thanks in advance!


Solution 1:

Very abstractly speaking, endianness is a property of the reinterpretation of a variable as a char-array.

Practically, this matters precisely when you read() from and write() to an external byte stream (like a file or a socket). Or, speaking abstractly again, endianness matters when you serialize data (essentially because serialized data has no type system and just consists of dumb bytes); and endianness does not matter within your programming language, because the language only operates on values, not on representations. Going from one to the other is where you need to dig into the details.

To wit - writing:

uint32_t n = get_number();

unsigned char bytesLE[4] = { n, n >> 8, n >> 16, n >> 24 };  // little-endian order
unsigned char bytesBE[4] = { n >> 24, n >> 16, n >> 8, n };  // big-endian order

write(bytes..., 4);

Here we could just have said, reinterpret_cast<unsigned char *>(&n), and the result would have depended on the endianness of the system.

And reading:

unsigned char buf[4];
read_data(buf, 4);

uint32_t n_LE = buf[0] + (buf[1] << 8) + (buf[2] << 16) + (uint32_t(buf[3]) << 24); // little-endian
uint32_t n_BE = buf[3] + (buf[2] << 8) + (buf[1] << 16) + (uint32_t(buf[0]) << 24); // big-endian

Again, here we could have said, uint32_t n = *reinterpret_cast<uint32_t*>(buf), and the result would have depended on the machine endianness.


As you can see, with integral types you never have to know the endianness of your own system, only of the data stream, if you use algebraic input and output operations. With other data types such as double, the issue is more complicated.
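
For example, one way to extend the same technique to a double is to copy its object representation into a uint64_t with memcpy and then emit and reassemble the bytes explicitly, exactly as above. This is only a sketch: it assumes both machines use IEEE-754 binary64, and the helper names are made up for illustration.

#include <cstdint>
#include <cstring>

// Serialize a double into an explicit (here little-endian) byte order.
void write_double_le(double d, unsigned char out[8]) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);   // grab the object representation safely
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(bits >> (8 * i));
}

// Reassemble the double from the same explicit byte order.
double read_double_le(const unsigned char in[8]) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}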

Solution 2:

For the record, if you're transferring data between devices you should pretty much always use network byte ordering with ntohl, htonl, ntohs, htons. They convert between your host's byte order and the standard network byte order, regardless of what your system and the destination system use. Of course, both systems should be programmed like this - but they usually are in networking scenarios.
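
A minimal round-trip sketch (POSIX headers; on Windows the same functions come from <winsock2.h>):

#include <arpa/inet.h>   // htons/ntohs/htonl/ntohl
#include <cstdint>
#include <cstdio>

int main() {
    std::uint16_t port = 1;             // host byte order
    std::uint16_t wire = htons(port);   // what actually goes on the wire (big-endian)
    std::uint16_t back = ntohs(wire);   // what the receiver does before using the value

    std::printf("host=%u wire=0x%04x back=%u\n",
                (unsigned)port, (unsigned)wire, (unsigned)back);
    return 0;
}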

Solution 3:

  1. No, though you do have the right general idea. What you're missing is the fact that even though it's normally a serial connection, a network connection (at least most network connections) still guarantees correct endianness at the octet (byte) level -- i.e., if you send a byte with a value of 0x12 on a little endian machine, it'll still be received as 0x12 on a big endian machine.

    For your short, it'll probably help to look at the number in hexadecimal. It starts out as 0x0001. You break it into two bytes: 0x00 0x01. Upon receipt, that'll be read as 0x0100, which turns out to be 256 (the sketch after this list makes that concrete).

  2. Since the network deals with endianness at the octet level, you normally only have to compensate for the order of bytes, not bits within bytes.

  3. Probably the simplest method is to use htons/htonl when sending, and ntohs/ntohl when receiving. When/if that's not sufficient, there are many alternatives such as XDR, ASN.1, CORBA IIOP, Google protocol buffers, etc.
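
To make point 1 concrete, here is a small sketch (just simulating the receiver's view, not real network code) showing how the two octets of 0x0001 are read back under each interpretation:

#include <cstdint>
#include <cstdio>

int main() {
    // The two octets of the short 0x0001 as a little-endian sender lays
    // them out in memory: low byte first.
    unsigned char octets[2] = { 0x01, 0x00 };

    // A receiver that treats the first octet as the most significant byte
    // (big-endian interpretation) sees 0x0100, i.e. 256.
    std::uint16_t big_endian_view = static_cast<std::uint16_t>((octets[0] << 8) | octets[1]);

    // A receiver using the sender's convention (low byte first) gets 1 back.
    std::uint16_t little_endian_view = static_cast<std::uint16_t>(octets[0] | (octets[1] << 8));

    std::printf("%u %u\n", (unsigned)big_endian_view, (unsigned)little_endian_view); // prints 256 1
    return 0;
}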

Solution 4:

The "standard way" of compensating is that the concept of "network byte order" has been defined, almost always (AFAIK) as big endian.

Senders and receivers both know the wire protocol, and if necessary will convert before transmitting and after receiving, to give applications the right data. But this translation happens inside your networking layer, not in your applications.

Solution 5:

Both endiannesses have an advantage that I know of:

  1. Big-endian is conceptually easier to understand because it's similar to our positional numeral system: most significant to least significant.
  2. Little-endian is convenient when reusing the same memory address at multiple widths. Simply put, if you have an unsigned int* to a little-endian value that you know is < 256, you can cast the pointer to unsigned char* and read the same value (see the sketch below).
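
A tiny sketch of point 2; as written it only works on a little-endian machine (the assert would fire on a big-endian one):

#include <cassert>
#include <cstdint>

int main() {
    std::uint32_t value = 200;   // known to be < 256, so it fits in one byte

    // On a little-endian machine the least significant byte lives at the
    // lowest address, so reading through a narrower pointer at the same
    // address still yields the whole value.
    unsigned char* first_byte = reinterpret_cast<unsigned char*>(&value);
    assert(*first_byte == 200);  // holds on little-endian, fails on big-endian
    return 0;
}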