How to grep a text file which contains some binary data?
grep -a
It can't get simpler than that.
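For a quick check of what -a changes, here is a minimal sketch. The sample file and its contents are made up for illustration, and the behaviour described is that of GNU grep.

```shell
# Hypothetical sample: one matching line that also carries a NUL byte.
sample=$(mktemp)
printf 'line1 re\000\nline2 other\n' > "$sample"

# -a (same as --text) makes grep treat the file as text, so the matching
# line itself is printed rather than a "Binary file ... matches" notice.
# tr -d strips the NUL here only to keep the demo output terminal-safe.
grep -a 're' "$sample" | tr -d '\000'
```

This should print line1 re and nothing else.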
One way is to simply treat binary files as text anyway, with grep --text, but this may well result in binary information being sent to your terminal. That's not really a good idea if you're running a terminal that interprets the output stream (such as VT/DEC or many others).
Alternatively, you can send your file through tr with the following command:
tr '\000-\011\013-\037\177-\377' '.' <test.log | grep whatever
This will change anything less than a space character (except newline) and anything greater than 126 into a . character, leaving only the printables.
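To see the effect on a small input, here is a sketch with made-up sample data. The set is written without surrounding brackets, since POSIX tr treats brackets as literal characters, and the LC_ALL=C prefix is my own precaution so that tr works on raw bytes regardless of locale.

```shell
# Hypothetical input: a tab (011), a NUL (000) and a DEL (177) mixed into text.
printf 'ab\tcd\000ef\177\nplain line\n' |
  LC_ALL=C tr '\000-\011\013-\037\177-\377' '.' |
  grep 'cd'
```

This should print ab.cd.ef. because each of the three control bytes becomes a dot, while the newline is left alone so grep still sees separate lines.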
If you want every "illegal" character replaced by a different one, you can use something like the following C program, a classic standard input filter:
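#include <stdio.h>

int main (void) {
    int ch;

    /* Copy stdin to stdout one byte at a time. */
    while ((ch = getchar()) != EOF) {
        if ((ch == '\n') || ((ch >= ' ') && (ch <= '~'))) {
            /* Newline and printable ASCII pass through unchanged. */
            putchar (ch);
        } else {
            /* Everything else is shown as its two-digit hex code. */
            printf ("{{%02x}}", ch);
        }
    }
    return 0;
}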
This will give you {{NN}}, where NN is the hex code for the character. You can simply adjust the printf for whatever style of output you want.
You can see that program in action here, where it replaces the tab character with its hex code:
pax$ printf 'Hello,\tBob\nGoodbye, Bob\n' | ./filterProg
Hello,{{09}}Bob
Goodbye, Bob
You could run the data file through cat -v, e.g.:
$ cat -v tmp/test.log | grep re
line1 re ^@^M
line3 re^M
which could then be further post-processed to remove the junk; this is most analogous to your query about using tr for the task.
-v simply tells cat to display non-printing characters.
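One possible form of that post-processing, as a rough sketch: strip the ^@ and ^M markers with sed after grepping. The sample data is invented, and note the assumption that the log never contains a literal ^@ or ^M as ordinary text, since those would be removed too.

```shell
# Hypothetical input: a NUL plus CRLF on the first line, CRLF on the third.
printf 'line1 re\000\r\nline2\nline3 re\r\n' |
  cat -v |                 # NUL becomes ^@, carriage return becomes ^M
  grep 're' |
  sed 's/\^[@M]//g'        # drop the ^@ and ^M markers that cat -v produced
```

This should print line1 re and line3 re with the markers gone.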
You can use "strings" to extract strings from a binary file, for example
strings binary.file | grep foo
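A small illustration of what strings keeps and drops, using made-up bytes on standard input instead of a real binary.file. This assumes the default minimum run length of 4 printable characters.

```shell
# Hypothetical input: "foo bar", then three unprintable bytes, then "baz".
# "baz" is only 3 characters long, so strings discards it by default.
printf 'foo bar\000\001\002baz\n' | strings | grep 'foo'
```

This should print foo bar. If short fragments matter, strings -n lowers the minimum length.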
You can force grep to look at binary files with:
grep --binary-files=text
You might also want to add -o (--only-matching) so you don't get tons of binary gibberish that will bork your terminal.
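As a sketch of the two options combined (the input bytes and the error=NN pattern are invented for the example):

```shell
# Hypothetical input: the interesting text is surrounded by control bytes.
printf 'junk\000\001 error=42 \002 tail\n' |
  grep --binary-files=text -o 'error=[0-9]*'
```

This should print only error=42, so none of the surrounding bytes reach the terminal.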