Why is a whitespace character only represented by 6 bits in ASCII?
I have written some Python code to represent strings as the binary form of their ASCII codes. Every character is converted to 7 bits, as I expected. However, whenever the string contains a space, the space is represented by only 6 bits instead of 7. This is a problem for a Vernam cipher program I am writing, because my ASCII bit string ends up a few bits shorter than my key due to the spaces. Here is the code and its output:
string = 'Hello t'
ASCII = ""
for c in string:
    ASCII += bin(ord(c))
ASCII = ASCII.replace('0b', ' ')
print(ASCII)
Output: 1001000 1100101 1101100 1101100 1101111 100000 1110100
As can be seen in the output, the 6th sequence of bits, which represents the space character, has only 6 bits and not 7 like the rest of the characters.
Instead of bin(ord(c)), which omits leading zeros, use string formatting to ensure a minimum width:
f'{ord(c):07b}'
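As a minimal sketch of how that format spec could slot into the loop from the question (the variable names `text` and `padded` are illustrative, not from the original code):

```python
text = 'Hello t'

# Format each character's code point in base 2, zero-padded to 7 digits,
# and join the groups with a single space.
padded = ' '.join(f'{ord(c):07b}' for c in text)

print(padded)
# 1001000 1100101 1101100 1101100 1101111 0100000 1110100
```

The space now comes out as 0100000, so every group is 7 bits wide.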
The problem lies in your "conversion": the code for the space character is 32 (binary 100000), which needs only 6 bits, and the bin built-in simply doesn't pad with leading zeros. The other characters in your string have codes of 64 or more, which is why you get 7 bits for them. It would really be more comfortable to use 8 bits for everything.
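A quick way to see this for yourself is to compare a letter with the space character. This is only an illustrative check, using the standard int.bit_length method and the format built-in:

```python
for ch in ('H', ' '):
    code = ord(ch)
    # bin() emits only the significant bits (after the '0b' prefix);
    # format(..., '07b') pads on the left with zeros up to 7 digits.
    print(repr(ch), code, bin(code), code.bit_length(), format(code, '07b'))

# 'H' 72 0b1001000 7 1001000
# ' ' 32 0b100000 6 0100000
```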
One way to go is, instead of using the bin call, to use string formatting, which can, besides the base conversion, pad the missing bits with 0s:
string = 'Hello t'
# aggregating values in a list so that you can easily separate the binary strings with a " "
# by using ".join"
bin_strings = []
# you really should do this from bytes, which are encoded text. Moreover, iterating a bytes
# object yields 0-255 ints, so there is no need to call "ord"
for code in string.encode("ASCII"):
    # format the number in `code` in base 2 (b), with 8 digits, padding with 0s
    bin_strings.append(f"{code:08b}")
ASCII = ' '.join(bin_strings)
Or, as a one-liner:
ASCII = " ".join(f"{code:08b}" for code in "Hello World".encode("ASCII"))
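Since the question mentions a Vernam cipher, here is a rough sketch of how fixed-width bit strings keep the message and the key the same length. The helper names (`to_bits`, `xor_bits`, `from_bits`) and the key value are hypothetical, made up for illustration; a real Vernam key should be random and used only once.

```python
def to_bits(text: str) -> str:
    """Encode ASCII text as one string of 8-bit groups, no separators."""
    return "".join(f"{code:08b}" for code in text.encode("ascii"))


def xor_bits(a: str, b: str) -> str:
    """XOR two equal-length strings of '0'/'1' characters."""
    if len(a) != len(b):
        raise ValueError("message and key must have the same number of bits")
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def from_bits(bits: str) -> str:
    """Decode a string of 8-bit groups back into ASCII text."""
    codes = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    return bytes(codes).decode("ascii")


message = "Hello t"
key = "secretk"  # hypothetical key with the same number of characters

cipher_bits = xor_bits(to_bits(message), to_bits(key))
recovered = from_bits(xor_bits(cipher_bits, to_bits(key)))
print(recovered)  # Hello t
```

Because every character takes exactly 8 bits, a 7-character message and a 7-character key both produce 56 bits, space included.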