Why do we use hex so much, when there are enough letters to use base 32 instead? [closed]

You can store 0-255 in hex using just 2 characters, so it kind of compresses the data, and it's used for all sorts of things including colours, IP addresses and MAC addresses.

My question is why did they stop at base 16 (or why is that the most commonly used)? There are enough letters in the alphabet for base 32, which would give a range of 0-65536 in the same amount of space, potentially allowing for 280 trillion colours as opposed to just 16 million. If you made the letters case-sensitive and added two symbols, you could go to base 64, allowing up to 4.3 billion values to be represented in the same two characters.


Some examples of situations where I think this would work:

IPv4 addresses are running out. I know v6 is being rolled out, but it's very long and will be hard to remember. Take the address 192.168.0.1: it can also be written as C0.A8.0.1. Using base 64 instead of hex, but still keeping it to a maximum of 8 characters, you could have 280 trillion combinations instead of 4 billion, and we wouldn't have this problem.
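
Here's a minimal Python sketch of that dotted-hex conversion (the function name is just something I made up for the example):

    # Convert a dotted-decimal IPv4 address to dotted hex, e.g. 192.168.0.1 -> C0.A8.0.1
    def ipv4_to_hex(address):
        return ".".join(format(int(octet), "X") for octet in address.split("."))

    print(ipv4_to_hex("192.168.0.1"))  # C0.A8.0.1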

As mentioned above, it would also provide a much larger range of colours. The RAW photo format records 32 bits per colour channel instead of 8, with the downside of a huge increase in file size. If the RGB values were stored as hex, there should be no change in the size of the file as you increase the range of colours, as it would still be stored in 6 characters per pixel, just with a higher base. Instead, it's recorded as numerical values at 96 bits per pixel, which seems like a very unnecessary increase, leaving photos at over 20MB (and according to an online calculator, 4K RAW video at 32 bits of colour could be up to 2.5GB per second).


This part isn't really to do with the question, but I wrote a script a while back which can convert numbers to different bases, ranging from binary to base 88 (I ran out of symbols after that), which shows it's easily possible to implement similar things. As an example, here's the output for 66000:
Base 2: 10000000111010000
Base 16: 101D0
Base 32: 20EG
Base 64: G7G
The code is here if anyone is interested; it still has a few bugs, though, and I've only tried it from within Maya. A bit off topic, but I've also just noticed that normal hex comes out around 20% fewer digits than the original decimal number, and base 88 is almost a 50% reduction.
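
For anyone who doesn't want to dig through the script, a minimal sketch of that kind of conversion looks something like this in Python (the digit alphabet only goes up to base 64 here, and the names are just placeholders):

    # Convert a non-negative integer to a string in an arbitrary base (2-64 in this sketch).
    DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/"

    def to_base(number, base):
        if number == 0:
            return DIGITS[0]
        digits = []
        while number > 0:
            number, remainder = divmod(number, base)
            digits.append(DIGITS[remainder])
        return "".join(reversed(digits))

    for base in (2, 16, 32, 64):
        print("Base {}: {}".format(base, to_base(66000, base)))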


One final question: has anyone attempted my idea of storing photos as hex? Would it potentially work if you used base 64, and stored the photos with data like [64;1920;Bgh54D;NgDFF4;...]? If not, I might try creating something that can do that.


Solution 1:

If I am reading the question correctly, you are saying that the data 'shrinks' when you use larger bases, when in fact it doesn't.

Take your own example:

Base 2: 10000000111010000
Base 16: 101D0
Base 32: 20EG
Base 64: G7G

We would use 101D0 for that, because hex is standard. What would happen if we used base 64 notation?

The answer is: essentially nothing, since you are still storing and processing the data as bits in your device. Even if you write G7G instead of 101D0, you are still storing and working with 10000000111010000 in your device. Imagine you have the number 5. In binary it would be 101. 101 has three digits and 5 has one, but that does not mean 5 is more compressed than 101, since you would still be storing the number as the bits 0101 on your computer.
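
You can check this yourself in a couple of lines of Python: the same value, parsed from different notations, is exactly the same number and needs exactly the same number of bits (base 64 is left out only because Python's int() stops at base 36):

    # The same value written in three notations parses to the same 17-bit integer.
    n_bin = int("10000000111010000", 2)
    n_hex = int("101D0", 16)
    n_b32 = int("20EG", 32)

    print(n_bin == n_hex == n_b32 == 66000)  # True
    print(n_hex.bit_length())                # 17, no matter which notation you read it in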

To stick with your examples, take the IPv6 thing, or MAC addresses (for this purpose they are much the same thing: pairs of hex digits separated by colons or dots).

We have, in hex, 00:00:FF:01:01. That is how you would normally express it. In binary this is 0000 0000 0000 0000 1111 1111 0000 0001 0000 0001 (you are probably starting to see why we use hex now). This is easy because, since 16 = 2^4, you can convert each hex digit to 4 binary digits and just concatenate the results to get the actual binary string. In your base-64 system, if we had something like GG:HH:01:02:03, each character would translate to 6 bits instead.
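
That digit-by-digit conversion is trivial to show (a small Python sketch using the address above):

    # Each hex digit maps to exactly 4 bits, so the conversion works one digit at a time.
    address = "00:00:FF:01:01"
    bits = " ".join(format(int(digit, 16), "04b") for digit in address if digit != ":")
    print(bits)  # 0000 0000 0000 0000 1111 1111 0000 0001 0000 0001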

So what is the problem with this? The fact that computers work internally with bits grouped into 8-bit bytes and power-of-two word sizes; they don't really care about the notation you are using. In CPU registers, memory and other devices, you will never see data divided into groups of 6 bits.

TL;DR: Hexadecimal is just a notation to help us humans read binary more easily, since a byte can be expressed as exactly two characters (00-FF). What is stored and processed in the computer is the same no matter which notation you use to read it.

Solution 2:

Hexadecimal literally means 16. ;)

But aside from the snarky answer, hexadecimal (or any other power-of-two numbering system) is simply a more compact format for representing binary data. At the lowest level, the values are still represented as bits, and those bits are grouped into chunks that the hardware architecture can handle easily.

Keep in mind that hexadecimal numbers are not stored as the characters 0-9 and A-F; they are literally stored as bits. Each "digit" is not, as you seem to suggest, encoded as an 8-bit character (0-255) where only the first 16 values are used.

Let's compare the base-2 and base-64 representations in your example.

base 2: 10000000111010000 --> 17 "digits" with 1 bit per digit = 17 bits
base 64: G7G --> 3 "digits" with 6 bits per digit = 18 bits

Now consider a base64 encoding where each "digit" is actually represented by an 8-bit character. You still have G7G, but now each "digit" requires 8 bits.

G7G --> 3 "digits" with 8 bits per digit --> 24 bits

Even in this oversimplified example, if you use base64 to represent everything, you could have a lot more slack (wasted) space than a numbering system that allocates space in smaller chunks.

As I said, the previous example is an oversimplification and assumes you are only dealing with unsigned numbers (i.e., no negative numbers). In reality, the data will be stored in "words" whose size can vary depending on the hardware architecture. If you have an 8-bit word, you must assign values in chunks of 8 bits, so the 17-bit value now requires 24 bits to store.
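
To put numbers on that, here is a small Python sketch of the arithmetic from the example above (just the counting, nothing clever):

    import math

    value = 66000

    # Bits needed for the raw binary value, padded up to whole 8-bit words.
    raw_bits = value.bit_length()              # 17
    padded_bits = math.ceil(raw_bits / 8) * 8  # 24

    # Bits needed if each base-64 "digit" is stored as an 8-bit character.
    base64_digits = math.ceil(raw_bits / 6)    # 3 digits of 6 bits each = 18 bits of value
    as_text_bits = base64_digits * 8           # 24 bits once each digit is a character

    print(raw_bits, padded_bits, base64_digits * 6, as_text_bits)  # 17 24 18 24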

So although it is trivial to use any power-of-two base numbering system as you suggest, it just isn't common. This may be because popular modern computer architectures grew out of 8- and 16-bit designs, where hexadecimal maps neatly onto the hardware's native word sizes.

Solution 3:

Hex seems to be a pretty good compromise between binary and decimal.

  • It's easy to convert to binary just by looking at it.

  • Easy to read, write, and communicate verbally if needed. Imagine trying to tell someone a base64-encoded string over the phone.

  • Single-board computers in the '70s and '80s often had seven-segment LED displays and no other display mechanism out of the box. Fortunately, A, B, C, D, E, and F can all be rendered on one of those.

Of course, when we talk about 64-bit, 128-bit, and larger quantities, or things like hashes, it's not easy to communicate in hex, decimal, or anything really. To me, the "heyday" of hexadecimal was when 8- and 16-bit CPUs were commonplace, and when low-level programming was more common because it was more necessary. I could be wrong.

I'm not sure hex is in common use these days except to express pointer addresses in C/C++. I guess hex is used out of habit or tradition here, and it has also come to signal that something is a "raw binary" value and not really any particular "type."

Has anyone attempted my idea of storing photos as hex?

Any file, no matter what its type or contents, is a big chunk of bytes. It's already in binary. Hex is just a (very minimally) human-friendly view of that.

If you want to look at the bytes of a file in hex format, there's a plethora of hex editors and viewers that will do that.

If you are proposing to store a photo as a text file containing a list of hex numbers, I guess you could do that if you want, but it's going to be larger and slower to process than the original file.
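
If you want to see the cost for yourself, here is a minimal Python sketch (the file name is just a placeholder): every byte becomes two ASCII characters, so the hex text is already twice the size before you add any separators or headers.

    # Re-encode a file's bytes as hex text and compare the sizes.
    with open("photo.raw", "rb") as f:  # placeholder file name
        data = f.read()

    hex_text = data.hex()               # two characters per byte
    print(len(data), len(hex_text))     # the hex version is twice as large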