Difference between Big Endian and little Endian Byte order
What is the difference between Big Endian and Little Endian Byte order ?
Both of these seem to be related to Unicode and UTF16. Where exactly do we use this?
Big-Endian (BE) / Little-Endian (LE) are two ways to organize multi-byte words. For example, when using two bytes to represent a character in UTF-16, there are two ways to represent the character 0x1234
as a string of bytes (0x00-0xFF):
Byte Index: 0 1
---------------------
Big-Endian: 12 34
Little-Endian: 34 12
In order to decide if a text uses UTF-16BE or UTF-16LE, the specification recommends to prepend a Byte Order Mark (BOM) to the string, representing the character U+FEFF. So, if the first two bytes of a UTF-16 encoded text file are FE
, FF
, the encoding is UTF-16BE. For FF
, FE
, it is UTF-16LE.
A visual example: The word "Example" in different encodings (UTF-16 with BOM):
Byte Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
------------------------------------------------------------
ASCII: 45 78 61 6d 70 6c 65
UTF-16BE: FE FF 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65
UTF-16LE: FF FE 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00
For further information, please read the Wikipedia page of Endianness and/or UTF-16.
Ferdinand's answer (and others) are correct, but incomplete.
Big Endian (BE) / Little Endian (LE) have nothing to do with UTF-16 or UTF-32. They existed way before Unicode, and affect how the bytes of numbers get stored in the computer's memory. They depend on the processor.
If you have a number with the value 0x12345678
then in memory it will be represented as 12 34 56 78
(BE) or 78 56 34 12
(LE).
UTF-16 and UTF-32 happen to be represented on 2 respectively 4 bytes, so the order of the bytes respects the ordering that any number follows on that platform.
UTF-16 encodes Unicode into 16-bit values. Most modern filesystems operate on 8-bit bytes. So, to save a UTF-16 encoded file to disk, for example, you have to decide which part of the 16-bit value goes in the first byte, and which goes into the second byte.
Wikipedia has a more complete explanation.
little-endian: adj.
Describes a computer architecture in which, within a given 16- or 32-bit word, bytes at lower addresses have lower significance (the word is stored ‘little-end-first’). The PDP-11 and VAX families of computers and Intel microprocessors and a lot of communications and networking hardware are little-endian. The term is sometimes used to describe the ordering of units other than bytes; most often, bits within a byte.
big-endian: adj.
[common; From Swift's Gulliver's Travels via the famous paper On Holy Wars and a Plea for Peace by Danny Cohen, USC/ISI IEN 137, dated April 1, 1980]
Describes a computer architecture in which, within a given multi-byte numeric representation, the most significant byte has the lowest address (the word is stored ‘big-end-first’). Most processors, including the IBM 370 family, the PDP-10, the Motorola microprocessor families, and most of the various RISC designs are big-endian. Big-endian byte order is also sometimes called network order.
---from the Jargon File: http://catb.org/~esr/jargon/html/index.html