Is it safe to use standard input & output with binary data?

I need to split a binary file into two. I was wondering if head and/or tail could be used but then I wondered...is it safe to use redirection, piping etc with binary data? Do new lines get messed about with, or nulls ignored, or backspace or delete do something special? (bash, kubuntu 18.04 LTS)


Solution 1:

Yes, it's safe to pipe binary data to another process or redirect it to a file; pipes and redirections pass every byte through unchanged. The only potential "weirdness" comes from letting binary output print to a terminal, since it can contain (by chance) escape sequences that temporarily mess up the terminal display.
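If you want to convince yourself, here is a small round-trip check. It is only a sketch: the file names are made up, and the test bytes were picked to cover the characters asked about (NUL, backspace, newline, carriage return, ESC, DEL and 0xFF).

```shell
tmpdir=$(mktemp -d)   # scratch directory so nothing in the current directory is touched

# Seven "awkward" bytes: NUL, backspace, newline, CR, ESC, DEL, 0xFF (octal escapes).
printf '\000\010\012\015\033\177\377' > "$tmpdir/original.bin"

# Send the data through two pipes and a redirection.
cat "$tmpdir/original.bin" | cat | cat > "$tmpdir/copy.bin"

# cmp prints nothing and exits 0 when the files are byte-for-byte identical.
cmp "$tmpdir/original.bin" "$tmpdir/copy.bin" && echo "identical"

# Look at the bytes with od instead of dumping them on the terminal.
od -An -c "$tmpdir/copy.bin"
```

Viewing the result through od (or a hex dump tool) rather than cat avoids the terminal problem described above.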

Solution 2:

The main problem with using commands like head or tail is that they are line-oriented by default, and binary files are not. If a binary file does contain newline bytes, they often do not mark the end of a line, and where they do, they may just be part of embedded strings such as program messages or data fields.

If the data is structured in any way, then you have to take that into account in choosing split points so you don't break structures in the middle.
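That said, if all you need is one cut at a known byte offset, GNU head and tail can count bytes instead of lines with -c. A minimal sketch, assuming GNU coreutils (as shipped with Kubuntu); the file names and the offset are hypothetical:

```shell
tmpdir=$(mktemp -d)

# Hypothetical 10-byte input containing a NUL and a newline.
printf 'AB\000CD\012EFGH' > "$tmpdir/input.bin"

offset=4   # number of bytes that should end up in the first part

# First part: the first $offset bytes.
head -c "$offset" "$tmpdir/input.bin" > "$tmpdir/part1.bin"

# Second part: everything from byte offset+1 onward (tail -c +N counts from 1).
tail -c +"$((offset + 1))" "$tmpdir/input.bin" > "$tmpdir/part2.bin"

# The two parts concatenated should reproduce the original exactly.
cat "$tmpdir/part1.bin" "$tmpdir/part2.bin" | cmp - "$tmpdir/input.bin" && echo "split is lossless"
```

The off-by-one in the tail call is the usual trap: tail -c +5 starts at the fifth byte, so it skips four.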

If you know the structure of the file, you can use a command such as

dd if=input-file of=output-file ...

with options (bs, count and skip) to copy only a given number of blocks of a specific size, starting at a particular offset into the file, which you increment for each piece.
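As a rough illustration of those options, the sketch below copies two 512-byte blocks starting one block into the file. The block size, counts and file names are made up for the example, and status=none assumes GNU dd.

```shell
tmpdir=$(mktemp -d)

# Hypothetical 2048-byte input file filled with random bytes.
head -c 2048 /dev/urandom > "$tmpdir/input.bin"

# bs=512   work in 512-byte blocks
# skip=1   skip the first input block (start at byte offset 512)
# count=2  copy two blocks (1024 bytes)
# status=none suppresses the transfer statistics (GNU dd)
dd if="$tmpdir/input.bin" of="$tmpdir/chunk.bin" bs=512 skip=1 count=2 status=none

wc -c < "$tmpdir/chunk.bin"
```

Note that skip is measured in blocks of size bs, not in bytes, so the offset has to be a multiple of the block size unless you use bs=1.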

It looks like the split command, as mentioned by @egmont, will automate this process for you, but it is line-oriented by default, so you'll have to specify an additional option such as --bytes=SIZE (or -b SIZE) to tell it how large each piece of the file should be.
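A short sketch of that, assuming GNU split; the piece size, file names and the piece_ prefix are arbitrary choices for the example:

```shell
tmpdir=$(mktemp -d)

# Hypothetical 2500-byte input file filled with random bytes.
head -c 2500 /dev/urandom > "$tmpdir/input.bin"

# Cut into pieces of at most 1024 bytes, named piece_aa, piece_ab, piece_ac, ...
split --bytes=1024 "$tmpdir/input.bin" "$tmpdir/piece_"

ls "$tmpdir"

# The suffixes sort in creation order, so a glob reassembles the pieces correctly.
cat "$tmpdir"/piece_* | cmp - "$tmpdir/input.bin" && echo "pieces reassemble cleanly"
```

The last piece simply holds whatever is left over, so it is usually shorter than the others.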


As a side note, if you don't know what's in a file, but suspect it contains at least some meaningful textual data, the strings command is a great way of taking a first look to see what you're dealing with.

strings -n 6 file | less

will find all runs of printable characters at least six characters in length and display them in a pager so they don't fly by on the terminal. Using a number a bit larger than the default of 4 characters helps eliminate tiny snippets of data that just happen to be printable, but are not being used that way in the file.

If you later have to explore the file in more detail with a binary editor such as hexedit, you'll have some landmarks that point out where something interesting might be found.

strings has an option -t x that will precede each printed string with its offset into the file in hexadecimal (use o for octal or d for decimal instead of x) so you know where to find it later. Even very short files are a lot to deal with when you have to look at them character by character.
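To make the offsets concrete, here is a small sketch that builds a hypothetical sample file, lists its strings with hex offsets, and then uses one offset to jump straight to the data. It assumes strings from GNU binutils and GNU dd; the sample contents are invented for the example.

```shell
tmpdir=$(mktemp -d)

# Hypothetical sample: 16 non-printable bytes, a readable message, then a NUL.
{
  printf '\001\002\003\004\005\006\007\010\001\002\003\004\005\006\007\010'
  printf 'Hello, binary world'
  printf '\000'
} > "$tmpdir/sample.bin"

# List runs of at least 6 printable characters, each preceded by its hex offset.
# The message starts 16 bytes in, so its offset should be shown as 10 (hex).
strings -t x -n 6 "$tmpdir/sample.bin"

# Use that offset to read 5 bytes from the same spot (bs=1 makes skip count bytes).
dd if="$tmpdir/sample.bin" bs=1 skip=$((0x10)) count=5 status=none
```

Bash arithmetic understands the 0x prefix, so the hexadecimal offset can be passed along without converting it by hand.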