How to represent FLOAT number in memory in C

c floating-point

While reading a tutorial, I came across an explanation of how a float is represented in memory. The tutorial used the following example, together with the diagram below:

   float a = 5.2;

[diagram from the tutorial showing the memory layout of a; image not reproduced]

Can anyone please explain how 5.2 is converted to binary and how it ends up in memory as shown in the diagram above?

As was said, 5.2 is represented as a sign bit, an exponent and a mantissa. How do you encode 5.2?

5 is easy:

101.

The rest, 0.2, is 1/5, so divide 1.00000... (hex) by 5 and you get 0.3333333... (hex).

(This can be followed more easily if you consider one bit less: 0.FFFF... → F / 5 = 3, so it is easy to see that 0.FFFF... / 5 = 0.33333.... That one missing bit doesn't matter when dividing by 5, so 1.0000... / 5 = 0.3333... too).

Each hex digit 3 is 0011 in binary, so that should give you

0.0011001100110011001100110011...

Add 5, and you get

101.00110011001100110011...         exp 0    (== 5.2 * 2^0)

Now shift it right by two places (normalize it, i.e. make sure the top bit is just before the binary point) and adjust the exponent accordingly:

1.010011001100110011001100110011... exp +2   (== 1.3 * 2^2 == 5.2)

Now add the bias of 127 to the exponent (2 + 127 = 129 = 0b10000001) and store it, together with the sign bit 0 and the top 24 bits of the mantissa:

0 10000001 1010 0110 0110 0110 0110 0110

Drop the top 1 of the mantissa (it is always 1, except for special values such as zero and denormals, so it is not stored), regroup the remaining 32 bits into bytes, and you get:

01000000 10100110 01100110 01100110

Now you only have to decide whether the bytes are stored in little-endian or big-endian order.

This is not exactly how it works, but that is more or less what happens when a number like 5.2 is converted to binary.

I think the diagram is not one hundred percent correct.

Floats are stored in memory as follows:

They are decomposed into:

sign s (denoting whether it's positive or negative) - 1 bit
mantissa m (essentially the digits of your number) - 23 stored bits (24 counting the implicit leading 1)
exponent e - 8 bits

Then, you can write any number x as s * m * 2^e where ^ denotes exponentiation.

5.2 should be represented as follows:

0 10000001 01001100110011001100110    
S    E               M

S=0 denotes that it is a positive number, i.e. s=+1

E is to be interpreted as unsigned number, thus representing 129. Note that you must subtract 127 from E to obtain the original exponent e = E - 127 = 2

M must be interpreted the following way: it is read as a number beginning with a 1, followed by a point (.), and then the digits after that point. The digits after the point are the ones that are actually stored in M. We introduce weights for each digit:

bits in M: 0   1    0     0      1       ... 
weight:    0.5 0.25 0.125 0.0625 0.03125 ... (take the half of the previous in each step)

Now you sum up the weights where the corresponding bits are set. After you've done this, you add 1 (due to normalization in the IEEE standard, you always add 1 for interpreting M) and obtain the original m.

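Now you compute x = s * m * 2^e and get your original number back. For 5.2, the weights sum to roughly 0.3, so m is roughly 1.3 and x = +1 * 1.3 * 2^2 = 5.2 (up to rounding).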

The only thing left is that in real memory, the bytes might be stored in reverse order (little-endian). So the number may not be laid out like this:

0 10000001 01001100110011001100110    
S    E               M

but the other way around (simply take the 8-bit blocks and reverse their order):

01100110 01100110 10100110 01000000
MMMMMMMM MMMMMMMM EMMMMMMM SEEEEEEE

The bytes are stored in memory in reverse order (on a little-endian machine), but the confusing point may be that 5.2f is really stored as approximately 5.1999998, because 5.2 has no exact binary representation and is rounded to the nearest float.
