Convert a text file of bits to a binary file

I have a file instructions.txt with the contents:

00000000000000000000000000010011
00000010110100010010000010000011
00000000011100110000001010110011
00000000011100110000010000110011
00000000011100110110010010110011
00000000000000000000000000010011

How can I create a binary file instructions.bin containing the same data as instructions.txt? In other words, the .bin file should contain the same 192 bits that are in the .txt file, which has 32 bits per line. I am using bash on Ubuntu Linux. I tried xxd -b instructions.txt, but the output is far longer than 192 bits.


One-liner to convert 32-bit strings of ones and zeros into the corresponding binary:

$ perl -ne 'print pack("B32", $_)' < instructions.txt > instructions.bin

What it does:

  • perl -ne iterates over each line of the input provided on STDIN (instructions.txt)
  • pack("B32", $_) takes a string of 32 bits ($_, the line we just read from STDIN) and converts it to a 4-byte binary value; only the first 32 characters are used, so the trailing newline is ignored (you could alternatively use "b32" if you wanted ascending bit order inside each byte instead of descending bit order; see perldoc -f pack for more details)
  • print then outputs that converted value to STDOUT, which we redirect to our binary file instructions.bin
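For readers who do not use Perl, the difference between the "B" and "b" templates can be sketched in Python. This is only an illustration of the bit ordering, not how pack is implemented; pack_bits and its msb_first parameter are hypothetical names introduced here.

```python
def pack_bits(bits, msb_first=True):
    """Pack a string of '0'/'1' characters into bytes, 8 characters per byte."""
    bits = bits.strip()  # drop the trailing newline, if any
    if len(bits) % 8 != 0 or set(bits) - {"0", "1"}:
        raise ValueError(f"expected a multiple of 8 binary digits, got {bits!r}")
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        if not msb_first:
            # Like the "b" template: the first character is the lowest bit.
            chunk = chunk[::-1]
        out.append(int(chunk, 2))
    return bytes(out)


line = "00000010110100010010000010000011"
print(pack_bits(line).hex())                   # 02d12083
print(pack_bits(line, msb_first=False).hex())  # 408b04c1
```

The first result matches the second word in the hexdump output, which is why "B32" is the template wanted here.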

Verify:

$ hexdump -Cv instructions.bin
00000000  00 00 00 13 02 d1 20 83  00 73 02 b3 00 73 04 33  |...... ..s...s.3|
00000010  00 73 64 b3 00 00 00 13                           |.sd.....|
00000018

$ xxd -b -c4 instructions.bin
00000000: 00000000 00000000 00000000 00010011  ....
00000004: 00000010 11010001 00100000 10000011  .. .
00000008: 00000000 01110011 00000010 10110011  .s..
0000000c: 00000000 01110011 00000100 00110011  .s.3
00000010: 00000000 01110011 01100100 10110011  .sd.
00000014: 00000000 00000000 00000000 00010011  ....
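The same check can be done as a round trip: turn the bytes back into 32-character lines and compare them with the original text. A minimal Python sketch follows, assuming the byte values shown in the hexdump above; bytes_to_bit_lines is a hypothetical helper name.

```python
def bytes_to_bit_lines(data, bits_per_line=32):
    """Render bytes as lines of '0'/'1' characters, most significant bit first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return [bits[i:i + bits_per_line] for i in range(0, len(bits), bits_per_line)]


# The 24 bytes from the hexdump output, grouped per 32-bit word.
data = bytes.fromhex("00000013 02d12083 007302b3 00730433 007364b3 00000013")
for bit_line in bytes_to_bit_lines(data):
    print(bit_line)
```

To check a real file, pass open("instructions.bin", "rb").read() instead of the literal and compare the result with the lines of instructions.txt.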

Adding the -r option (reverse mode) to xxd -b does not work as intended, because xxd does not support combining these two flags (it ignores -b if both are given). Instead, you have to convert the bits to hex yourself first, for example like this:

( echo 'obase=16;ibase=2'; sed -Ee 's/[01]{4}/;\0/g' instructions.txt ) | bc | xxd -r -p > instructions.bin

Full explanation:

  • The part inside the parentheses creates a bc script. It first sets the output base to hexadecimal (16) and then the input base to binary (2); the order matters, because once ibase=2 is in effect the 16 would no longer be read as a decimal number. After that, the sed command prints the contents of instructions.txt with a semicolon in front of each group of 4 bits, which corresponds to 1 hex digit. The result is piped into bc.
  • The semicolon is a command separator in bc, so all the script does is print every input integer back out (after base conversion).
  • The output of bc is a sequence of hex digits, one per line, which can be converted to raw bytes with the usual xxd -r -p.
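The two stages of this pipeline (bits to hex digits, then hex digits to bytes) can be mirrored in Python. This is a sketch of the idea only, not a replacement for the command above; bits_to_hex_digits is a hypothetical helper, and bytes.fromhex stands in for xxd -r -p.

```python
def bits_to_hex_digits(text):
    """Convert every group of 4 bits in text to one hex digit, ignoring whitespace."""
    bits = "".join(text.split())  # drop newlines between the 32-bit lines
    if len(bits) % 4 != 0 or set(bits) - {"0", "1"}:
        raise ValueError("expected binary digits in groups of 4")
    return "".join(format(int(bits[i:i + 4], 2), "x") for i in range(0, len(bits), 4))


text = (
    "00000000000000000000000000010011\n"
    "00000010110100010010000010000011\n"
)
hex_digits = bits_to_hex_digits(text)
print(hex_digits)             # 0000001302d12083
raw = bytes.fromhex(hex_digits)  # two hex digits per output byte
print(len(raw))               # 8
```

bc prints uppercase hex digits where this sketch prints lowercase ones; the case makes no difference to the decoding step.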

Output:

$ hexdump -Cv instructions.bin
00000000  00 00 00 13 02 d1 20 83  00 73 02 b3 00 73 04 33  |...... ..s...s.3|
00000010  00 73 64 b3 00 00 00 13                           |.sd.....|
00000018
$ xxd -b -c4 instructions.bin
00000000: 00000000 00000000 00000000 00010011  ....
00000004: 00000010 11010001 00100000 10000011  .. .
00000008: 00000000 01110011 00000010 10110011  .s..
0000000c: 00000000 01110011 00000100 00110011  .s.3
00000010: 00000000 01110011 01100100 10110011  .sd.
00000014: 00000000 00000000 00000000 00010011  ....

My original answer was incorrect - xxd cannot accept either -p or -r with -b...

Given that the other answers are workable, and in the interest of "another way", how about the following:

Input

$ cat instructions.txt
00000000000000000000000000010011
00000010110100010010000010000011
00000000011100110000001010110011
00000000011100110000010000110011
00000000011100110110010010110011
00000000000000000000000000010011

Output

$ hexdump -Cv < instructions.bin
00000000  00 00 00 13 02 d1 20 83  00 73 02 b3 00 73 04 33  |...... ..s...s.3|
00000010  00 73 64 b3 00 00 00 13                           |.sd.....|
00000018

Bash pipeline:

cat instructions.txt \
    | tr -d $'\n' \
    | while read -N 4 nibble; do
        printf '%x' "$((2#${nibble}))"
      done \
    | xxd -r -p \
    > instructions.bin

  • cat - unnecessary, but used for clarity
  • tr -d $'\n' - remove all newlines from the input
  • read -N 4 nibble - read exactly 4× characters into the nibble variable
  • printf '%x' "$((2#${nibble}))" - convert the nibble from binary to 1× hex character
    • $((2#...)) - convert the given value from base 2 (binary) to base 10 (decimal)
    • printf '%x' - format the given value from base 10 (decimal) to base 16 (hexadecimal)
  • xxd -r -p - reverse (-r) a plain dump (-p) - from hexadecimal to raw binary

Python:

python3 << EOF > instructions.bin
import sys
d = '$(cat instructions.txt | tr -d $'\n')'
sys.stdout.buffer.write(bytes(int(d[i:i+8], 2) for i in range(0, len(d), 8)))
EOF

  • An unquoted heredoc (<< EOF) is used so that the shell expands $(...) and pastes the file content into the Python code
    • This is not efficient if the input becomes large
  • cat and tr - used to get a clean (one-line) input
  • range(0, len(d), 8) - the numbers from 0 to the end of the string d, stepping 8× characters at a time
  • int(d[i:i+8], 2) - convert the current slice (d[i:i+8]) from a binary string to an integer
  • bytes(... for i in ...) - build a bytes object from the generator expression, one byte per integer
  • sys.stdout.buffer.write(...) - write the raw bytes; print() would append a newline and, on Python 3, encode characters above 127 as text (typically UTF-8) instead of writing single bytes
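If the input is large, a standalone script that reads the file line by line avoids pasting the whole content into a heredoc. The following is a minimal sketch, assuming one 32-bit word per line and big-endian byte order (the same layout as the hexdump output above); convert is a hypothetical function name.

```python
import sys


def convert(src_path, dst_path):
    """Write each 32-bit line of src_path to dst_path as 4 bytes; return the byte count."""
    written = 0
    with open(src_path) as src, open(dst_path, "wb") as dst:
        for lineno, line in enumerate(src, start=1):
            bits = line.strip()
            if not bits:
                continue  # tolerate blank lines
            if len(bits) != 32 or set(bits) - {"0", "1"}:
                raise ValueError(f"line {lineno}: expected 32 binary digits, got {bits!r}")
            # int(bits, 2) parses the binary string; to_bytes emits it MSB first.
            written += dst.write(int(bits, 2).to_bytes(4, "big"))
    return written


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(convert(sys.argv[1], sys.argv[2]), "bytes written")
```

Saved under a name of your choice (bits2bin.py is just an example), it could be run as python3 bits2bin.py instructions.txt instructions.bin.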