Convert text file of bits to binary file
I have a file instructions.txt with the contents:
00000000000000000000000000010011
00000010110100010010000010000011
00000000011100110000001010110011
00000000011100110000010000110011
00000000011100110110010010110011
00000000000000000000000000010011
How can I create a binary file instructions.bin of the same data as instructions.txt? In other words, the .bin file should be the same 192 bits that are in the .txt file, with 32 bits per line. I am using bash on Ubuntu Linux. I was trying to use xxd -b instructions.txt but the output is way longer than 192 bits.
A one-liner to convert 32-bit strings of ones and zeros into the corresponding binary:
$ perl -ne 'print pack("B32", $_)' < instructions.txt > instructions.bin
What it does:
- perl -ne iterates over each line of the input file provided on STDIN (instructions.txt)
- pack("B32", $_) takes a string of 32 bits ($_, which we just read from STDIN) and converts it to a binary value (you could alternatively use "b32" if you wanted ascending bit order inside each byte instead of descending bit order; see perldoc -f pack for more details)
- print then outputs the converted value to STDOUT, which we redirect to our binary file instructions.bin
Verify:
$ hexdump -Cv instructions.bin
00000000 00 00 00 13 02 d1 20 83 00 73 02 b3 00 73 04 33 |...... ..s...s.3|
00000010 00 73 64 b3 00 00 00 13 |.sd.....|
00000018
$ xxd -b -c4 instructions.bin
00000000: 00000000 00000000 00000000 00010011 ....
00000004: 00000010 11010001 00100000 10000011 .. .
00000008: 00000000 01110011 00000010 10110011 .s..
0000000c: 00000000 01110011 00000100 00110011 .s.3
00000010: 00000000 01110011 01100100 10110011 .sd.
00000014: 00000000 00000000 00000000 00010011 ....
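The difference between descending ("B32") and ascending ("b32") bit order mentioned above can be hard to picture, so here is a minimal Python sketch of the same packing step. The function name pack_bits and its msb_first parameter are made up for illustration and are not part of Perl or of any library; this is an approximation of the behaviour described above, not a reimplementation of pack.

```python
def pack_bits(bits: str, msb_first: bool = True) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, 8 bits per byte.

    msb_first=True  : first character of each group of 8 is the most
                      significant bit (descending order, like "B32").
    msb_first=False : first character is the least significant bit
                      (ascending order, like "b32").
    """
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        if not msb_first:
            # Reverse the group so the first character becomes the lowest bit.
            chunk = chunk[::-1]
        out.append(int(chunk, 2))
    return bytes(out)


if __name__ == "__main__":
    line = "00000010110100010010000010000011"  # second line of instructions.txt
    print(pack_bits(line).hex())                   # descending bit order
    print(pack_bits(line, msb_first=False).hex())  # ascending bit order
```

With descending order the second input line comes out as the bytes 02 d1 20 83, which matches the hexdump above.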
Adding the -r option (reverse mode) to xxd -b does not actually work as intended, because xxd simply does not support combining these two flags (it ignores -b if both are given). Instead, you have to convert the bits to hex yourself first. For example like this:
( echo 'obase=16;ibase=2'; sed -Ee 's/[01]{4}/;\0/g' instructions.txt ) | bc | xxd -r -p > instructions.bin
Full explanation:
- The part inside the parentheses creates a bc script. It first sets the output base to hexadecimal (16) and then the input base to binary (2); the order matters, because once ibase is 2 a later 16 would itself be interpreted in base 2. After that, the sed command prints the contents of instructions.txt with a semicolon before each group of 4 bits, which corresponds to 1 hex digit. The result is piped into bc.
- The semicolon is a command separator in bc, so all the script does is print every input integer back out (after base conversion).
- The output of bc is a sequence of hex digits, which can be converted to a file with the usual xxd -r -p.
Output:
$ hexdump -Cv instructions.bin
00000000 00 00 00 13 02 d1 20 83 00 73 02 b3 00 73 04 33 |...... ..s...s.3|
00000010 00 73 64 b3 00 00 00 13 |.sd.....|
00000018
$ xxd -b -c4 instructions.bin
00000000: 00000000 00000000 00000000 00010011 ....
00000004: 00000010 11010001 00100000 10000011 .. .
00000008: 00000000 01110011 00000010 10110011 .s..
0000000c: 00000000 01110011 00000100 00110011 .s.3
00000010: 00000000 01110011 01100100 10110011 .sd.
00000014: 00000000 00000000 00000000 00010011 ....
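For readers who want to see the two stages of that pipeline separately, here is a rough Python sketch of the same idea: every group of 4 bits becomes one hex digit (the bc step), and the hex string is then turned into raw bytes (the xxd -r -p step). The helper names bits_to_hex and hex_to_bytes are hypothetical and only serve this example.

```python
def bits_to_hex(text: str) -> str:
    """Turn text made of '0'/'1' characters (whitespace ignored) into hex digits."""
    bits = "".join(text.split())  # drop newlines and any other whitespace
    if len(bits) % 4 != 0:
        raise ValueError("number of bits must be a multiple of 4")
    # One hex digit per group of 4 bits, like the bc script prints.
    return "".join(format(int(bits[i:i + 4], 2), "x")
                   for i in range(0, len(bits), 4))


def hex_to_bytes(hex_digits: str) -> bytes:
    """Turn a plain hex string into raw bytes, comparable to xxd -r -p."""
    return bytes.fromhex(hex_digits)


if __name__ == "__main__":
    hex_digits = bits_to_hex("00000010110100010010000010000011\n")
    print(hex_digits)
    print(hex_to_bytes(hex_digits))
```

Going through hex is not strictly needed in Python, but it keeps the sketch close to what the shell pipeline does.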
My original answer was incorrect - xxd cannot accept either -p or -r with -b...
Given that the other answers are workable, and in the interest of "another way", how about the following:
Input
$ cat instructions.txt
00000000000000000000000000010011
00000010110100010010000010000011
00000000011100110000001010110011
00000000011100110000010000110011
00000000011100110110010010110011
00000000000000000000000000010011
Output
$ hexdump -Cv < instructions.bin
00000000 00 00 00 13 02 d1 20 83 00 73 02 b3 00 73 04 33 |...... ..s...s.3|
00000010 00 73 64 b3 00 00 00 13 |.sd.....|
00000018
Bash pipeline:
cat instructions.txt \
  | tr -d $'\n' \
  | while read -N 4 nibble; do
      printf '%x' "$((2#${nibble}))"
    done \
  | xxd -r -p \
  > instructions.bin
- cat - unnecessary, but used for clarity
- tr -d $'\n' - remove all newlines from the input
- read -N 4 nibble - read exactly 4× characters into the nibble variable
- printf '%x' "$((2#${nibble}))" - convert the nibble from binary to 1× hex character
  - $((2#...)) - convert the given value from base 2 (binary) to base 10 (decimal)
  - printf '%x' - format the given value from base 10 (decimal) to base 16 (hexadecimal)
- xxd -r -p - reverse (-r) a plain dump (-p), from hexadecimal to raw binary
Python:
python3 << EOF > instructions.bin
import sys
d = '$(cat instructions.txt | tr -d $'\n')'
sys.stdout.buffer.write(bytes(int(d[i:i+8], 2) for i in range(0, len(d), 8)))
EOF
- An unquoted heredoc (<< EOF) is used so that the shell expands $(...) and the file content ends up inside the Python code
  - This is not efficient if the input becomes large
- cat and tr - used to get a clean (one-line) input
- range(0, len(d), 8) - get a sequence of numbers from 0 to the end of the string d, stepping 8× characters at a time
- int(d[i:i+8], 2) - convert the current slice (d[i:i+8]) from binary to an integer between 0 and 255
- bytes(...) - collect those integers into a single bytes object, one byte per integer
- sys.stdout.buffer.write(...) - write the raw bytes to STDOUT. A plain print(''.join(chr(...))) is not safe here: print appends a trailing newline, and under Python 3 it also encodes every value above 127 as multi-byte UTF-8, so the file would no longer be exactly 24 bytes
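Since the heredoc approach pastes the whole file into the Python source, one possible alternative for larger inputs is a small standalone script that reads the text file line by line. The sketch below is only an illustration: the function name convert_file and the width parameter are made up, and it assumes every non-empty line holds exactly 32 characters of 0 or 1, as in instructions.txt.

```python
def convert_file(src: str, dst: str, width: int = 32) -> int:
    """Write each `width`-bit line of src to dst as raw bytes.

    Bytes are emitted most significant first (big-endian), which matches
    the hexdump output shown in the answers above.
    Returns the number of bytes written.
    """
    if width % 8 != 0:
        raise ValueError("width must be a multiple of 8")
    written = 0
    with open(src, "r", encoding="ascii") as fin, open(dst, "wb") as fout:
        for lineno, line in enumerate(fin, start=1):
            bits = line.strip()
            if not bits:
                continue  # tolerate blank lines, e.g. at the end of the file
            if len(bits) != width or set(bits) - {"0", "1"}:
                raise ValueError(
                    f"line {lineno}: expected {width} characters of 0/1")
            fout.write(int(bits, 2).to_bytes(width // 8, "big"))
            written += width // 8
    return written
```

Calling convert_file("instructions.txt", "instructions.bin") on the six-line input from the question should write 24 bytes, which can then be checked with hexdump -Cv as before.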