Difference between Big Endian and little Endian Byte order

What is the difference between Big Endian and Little Endian Byte order ?

Both of these seem to be related to Unicode and UTF16. Where exactly do we use this?

Big-Endian (BE) / Little-Endian (LE) are two ways to organize multi-byte words. For example, when using two bytes to represent a character in UTF-16, there are two ways to represent the character 0x1234 as a string of bytes (0x00-0xFF):

Byte Index:      0  1
---------------------
Big-Endian:     12 34
Little-Endian:  34 12

In order to decide if a text uses UTF-16BE or UTF-16LE, the specification recommends to prepend a Byte Order Mark (BOM) to the string, representing the character U+FEFF. So, if the first two bytes of a UTF-16 encoded text file are FE, FF, the encoding is UTF-16BE. For FF, FE, it is UTF-16LE.

A visual example: The word "Example" in different encodings (UTF-16 with BOM):

Byte Index:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
------------------------------------------------------------
ASCII:       45 78 61 6d 70 6c 65
UTF-16BE:    FE FF 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65
UTF-16LE:    FF FE 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00

For further information, please read the Wikipedia page of Endianness and/or UTF-16.

Ferdinand's answer (and others) are correct, but incomplete.

Big Endian (BE) / Little Endian (LE) have nothing to do with UTF-16 or UTF-32. They existed way before Unicode, and affect how the bytes of numbers get stored in the computer's memory. They depend on the processor.

If you have a number with the value 0x12345678 then in memory it will be represented as 12 34 56 78 (BE) or 78 56 34 12 (LE).

UTF-16 and UTF-32 happen to be represented on 2 respectively 4 bytes, so the order of the bytes respects the ordering that any number follows on that platform.

UTF-16 encodes Unicode into 16-bit values. Most modern filesystems operate on 8-bit bytes. So, to save a UTF-16 encoded file to disk, for example, you have to decide which part of the 16-bit value goes in the first byte, and which goes into the second byte.

Wikipedia has a more complete explanation.

little-endian: adj.

Describes a computer architecture in which, within a given 16- or 32-bit word, bytes at lower addresses have lower significance (the word is stored ‘little-end-first’). The PDP-11 and VAX families of computers and Intel microprocessors and a lot of communications and networking hardware are little-endian. The term is sometimes used to describe the ordering of units other than bytes; most often, bits within a byte.

big-endian: adj.

[common; From Swift's Gulliver's Travels via the famous paper On Holy Wars and a Plea for Peace by Danny Cohen, USC/ISI IEN 137, dated April 1, 1980]

Describes a computer architecture in which, within a given multi-byte numeric representation, the most significant byte has the lowest address (the word is stored ‘big-end-first’). Most processors, including the IBM 370 family, the PDP-10, the Motorola microprocessor families, and most of the various RISC designs are big-endian. Big-endian byte order is also sometimes called network order.

---from the Jargon File: http://catb.org/~esr/jargon/html/index.html

Difference between Big Endian and little Endian Byte order

Related

Recent Posts