Convert Little Endian to Big Endian

I want to check whether my method for converting from little endian to big endian is correct, to make sure I understand the difference.

I have a number which is stored in little-endian. Here are the binary and hex representations of the number:

0001 0010 0011 0100 0101 0110 0111 1000

12345678

In big-endian format I believe the bytes should be swapped, like this:

1000 0111 0110 0101 0100 0011 0010 0001

87654321

Is this correct?

Also, the code below attempts to do this but fails. Is there anything obviously wrong or can I optimize something? If the code is bad for this conversion can you please explain why and show a better method of performing the same conversion?

uint32_t num = 0x12345678;
uint32_t b0,b1,b2,b3,b4,b5,b6,b7;
uint32_t res = 0;

b0 = (num & 0xf) << 28;
b1 = (num & 0xf0) << 24;
b2 = (num & 0xf00) << 20;
b3 = (num & 0xf000) << 16;
b4 = (num & 0xf0000) << 12;
b5 = (num & 0xf00000) << 8;
b6 = (num & 0xf000000) << 4;
b7 = (num & 0xf0000000) << 4;

res = b0 + b1 + b2 + b3 + b4 + b5 + b6 + b7;

printf("%d\n", res);

Solution 1:

OP's sample code is incorrect.

Endianness can apply at the bit level or at the 8-bit byte level, but most endian issues concern byte order. OP's code reverses the order of the 4-bit nibbles rather than the bytes. I recommend this instead:

// Swap endian (big to little) or (little to big)
uint32_t num = 9;
uint32_t b0,b1,b2,b3;
uint32_t res;

b0 = (num & 0x000000ff) << 24u;
b1 = (num & 0x0000ff00) << 8u;
b2 = (num & 0x00ff0000) >> 8u;
b3 = (num & 0xff000000) >> 24u;

res = b0 | b1 | b2 | b3;

printf("%" PRIX32 "\n", res);

If performance is truly important, the particular processor would need to be known. Otherwise, leave it to the compiler.
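To make the byte-versus-nibble difference concrete, here is a minimal sketch that puts both operations side by side. The function names swap_bytes32 and swap_nibbles32 are made up for illustration; the first mirrors the mask-and-shift code above, the second is roughly what the question's code was aiming for.

```c
#include <stdint.h>

/* Reverse the order of the four bytes in a 32-bit value
   (the usual meaning of an endian swap). */
uint32_t swap_bytes32(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) <<  8) |
           ((x & 0x00ff0000u) >>  8) |
           ((x & 0xff000000u) >> 24);
}

/* Reverse the order of the eight 4-bit nibbles.
   This is NOT an endian conversion; shown only for comparison. */
uint32_t swap_nibbles32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 4) | (x & 0xfu);  /* append the lowest nibble of x */
        x >>= 4;                    /* move on to the next nibble */
    }
    return r;
}
```

With these definitions, swap_bytes32(0x12345678) gives 0x78563412, while swap_nibbles32(0x12345678) gives 0x87654321, the value the question expected.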

[Edit] OP added a comment that changes things.
"32bit numerical value represented by the hexadecimal representation (st uv wx yz) shall be recorded in a four-byte field as (st uv wx yz)."

It appears that in this case the endianness of the 32-bit number is unknown and the result needs to be stored in memory in little-endian order.

uint32_t num = 9;
uint8_t b[4];
b[0] = (uint8_t) (num >>  0u);
b[1] = (uint8_t) (num >>  8u);
b[2] = (uint8_t) (num >> 16u);
b[3] = (uint8_t) (num >> 24u);
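The same shifts can be wrapped into small helpers. This is only a sketch, and the names store_le32 and load_le32 are hypothetical. Because the code works on the numeric value rather than on its in-memory layout, it should behave the same on little-endian and big-endian hosts.

```c
#include <stdint.h>

/* Write v into out[0..3], least significant byte first (little-endian). */
void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >>  0);
    out[1] = (uint8_t)(v >>  8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* Rebuild the value from four bytes stored least significant byte first. */
uint32_t load_le32(const uint8_t in[4])
{
    return  (uint32_t)in[0]        |
           ((uint32_t)in[1] <<  8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

Storing 0x12345678 this way yields the bytes 78 56 34 12 in increasing address order, and load_le32 on that buffer returns 0x12345678 again.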

[2016 Edit] Simplification

... The type of the result is that of the promoted left operand.... Bitwise shift operators C11 §6.5.7 3

Adding a u suffix to the shift constants (the right operands) gives the same result as omitting it.

b3 = (num & 0xff000000) >> 24u;
b[3] = (uint8_t) (num >> 24u);
// same as 
b3 = (num & 0xff000000) >> 24;
b[3] = (uint8_t) (num >> 24);
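One way to check this claim is with C11's _Generic, which selects a branch based on the type of an expression. The sketch below assumes a C11 compiler and a platform where int is no wider than 32 bits, so that uint32_t is not promoted to int; the function name is made up for illustration.

```c
#include <stdint.h>

/* Returns 1 when shifting by 24u and by 24 gives the same type and value.
   Assumption: int is at most 32 bits wide, so the promoted type of a
   uint32_t operand is still uint32_t. */
int shift_suffix_is_irrelevant(uint32_t num)
{
    /* The result type depends only on the (promoted) left operand. */
    int same_type = _Generic(num >> 24u, uint32_t: 1, default: 0)
                 && _Generic(num >> 24,  uint32_t: 1, default: 0);
    int same_value = (num >> 24u) == (num >> 24);
    return same_type && same_value;
}
```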

Solution 2:

Sorry, my answer is a bit late, but it seems nobody mentioned the built-in functions for reversing byte order, which is very important in terms of performance.

Most modern processors are little-endian, while most network protocols, including TCP/IP, are big-endian. That is history, and you can find more on it on Wikipedia. But it means our processors convert between little- and big-endian millions of times while we browse the Internet.
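On POSIX systems this conversion is usually spelled htonl() and ntohl() from <arpa/inet.h>. Here is a minimal sketch, assuming a POSIX environment; the helper names put_network_u32 and get_network_u32 are hypothetical.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>     /* memcpy */

/* Copy a host-order value into a buffer in network (big-endian) byte order. */
void put_network_u32(uint8_t out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(out, &net, sizeof net);
}

/* Read a network-order value from a buffer and return it in host order. */
uint32_t get_network_u32(const uint8_t in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

For 0x12345678 the buffer holds the bytes 12 34 56 78 regardless of the host. On a little-endian host these calls have to reverse the bytes; on a big-endian host they do nothing.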

That is why most architectures have dedicated processor instructions to facilitate this task. For x86 architectures there is the BSWAP instruction, and for ARM there is REV. This is the most efficient way to reverse byte order.

To avoid assembly in our C code, we can use built-ins instead. For GCC there is the __builtin_bswap32() function and for Visual C++ there is _byteswap_ulong(). These functions will generate just one processor instruction on most architectures.

Here is an example:

#include <stdio.h>
#include <inttypes.h>

int main()
{
    uint32_t le = 0x12345678;
    uint32_t be = __builtin_bswap32(le);

    printf("Little-endian: 0x%" PRIx32 "\n", le);
    printf("Big-endian:    0x%" PRIx32 "\n", be);

    return 0;
}

Here is the output it produces:

Little-endian: 0x12345678
Big-endian:    0x78563412

And here is the disassembly (without optimization, i.e. -O0):

        uint32_t be = __builtin_bswap32(le);
   0x0000000000400535 <+15>:    mov    -0x8(%rbp),%eax
   0x0000000000400538 <+18>:    bswap  %eax
   0x000000000040053a <+20>:    mov    %eax,-0x4(%rbp)

There is just one BSWAP instruction indeed.

So, if we do care about the performance, we should use those built-in functions instead of any other method of byte reversing. Just my 2 cents.
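If the code has to build with more than one compiler, the built-ins can be hidden behind a small wrapper. This is a sketch only: the name bswap32_portable is made up, and the preprocessor checks cover just GCC, Clang and Visual C++, with a plain shift-and-mask fallback for anything else.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>  /* _byteswap_ulong */
#endif

/* Reverse the byte order of a 32-bit value using a compiler built-in
   where one is known to exist, and portable shifts otherwise. */
uint32_t bswap32_portable(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#elif defined(_MSC_VER)
    return _byteswap_ulong(x);
#else
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) <<  8) |
           ((x & 0x00ff0000u) >>  8) |
           ((x & 0xff000000u) >> 24);
#endif
}
```

Calling bswap32_portable(0x12345678) returns 0x78563412 on every branch.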