use sed to replace nbsp, 160, Hex 00a0, Octal 240, non-breaking space

I am having problems with config files that contain non-breaking space characters.

How should I specify that character with sed so I can replace it with a space?

sed -n 's/ / /g'

Examples of the errors:

service named restart
Stopping named:                                            [  OK  ]
Starting named: 
Error in named configuration:
named.localhost:2: unknown RR type 'SOA '
named.localhost:8: unknown RR type '@'
named.localhost:9: unknown RR type '127.0.0.1'
named.localhost:10: unknown RR type '::1'
.....

I tried to include a line from the original offending file in this post, but it does not seem to work. Downloading from Pastebin seems to be the only way to keep all the original bytes: http://pastebin.com/ZqT1EWbS. You should be able to copy and paste the original line and have it work in your terminal.


The answer to this question depends on which of the non-breaking space characters you are encountering.

Below are examples of how to replace each of the non-breaking space representations mentioned in the question's title, plus the UTF-8 version (C2 A0), which, according to the pastebin output, is what the OP is actually dealing with.

All examples use printf to generate the output as it is more portable than echo. The space characters are replaced by X's to make the output clearer.

Examples

html

printf '&nbsp;\n' | sed 's/&nbsp;/X/g'
printf '&#160;\n' | sed 's/&#160;/X/g'
printf '&#xA0;\n' | sed 's/&#x[aA]0;/X/g'

octal 240 = decimal 160 = hex A0

printf '\xA0\n' | sed 's/\xA0/X/g'

Or with tr:

printf '\xA0\n' | tr '\240' 'X'

U+00A0 (UTF-16, big-endian bytes 00 A0)

printf '\x00\xA0\n' | sed 's/\x00\xA0/X/g'

UTF-8

printf '\xC2\xA0\n' | sed 's/\xC2\xA0/X/g'

Result

Output in all of the above cases is:

X
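
If it is unclear which of these variants a file actually contains, a byte-wise dump settles it. A minimal sketch, assuming a POSIX od; the sample line and the dump_bytes helper name are made up for illustration:

```shell
# Hypothetical helper: print stdin as one hex byte per column,
# which avoids the byte swapping that od -x shows on little-endian machines.
dump_bytes() {
    od -An -tx1
}

# Sample input: '@', a UTF-8 non-breaking space (octal 302 240 = hex C2 A0), 'IN'
printf '@\302\240IN\n' | dump_bytes
```

A c2 a0 pair in the dump means UTF-8, a lone a0 means a single-byte encoding such as ISO-8859-1, and 00 a0 means big-endian UTF-16.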

Answer

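Now to your question: you have data that looks like this (the non-breaking spaces are written as \xC2\xA0 escapes here so that they survive copy and paste):

printf '@\xC2\xA0\xC2\xA0\xC2\xA0\xC2\xA0\xC2\xA0\xC2\xA0 IN SOA\xC2\xA0 @ rname.invalid. (\n' | od -x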

Output:

0000000 c240 c2a0 c2a0 c2a0 c2a0 c2a0 20a0 4e49
0000020 5320 414f a0c2 4020 7220 616e 656d 692e
0000040 766e 6c61 6469 202e 0a28
0000052

To replace each C2 A0 pair with an ordinary space, use this:

printf '@\xC2\xA0\xC2\xA0\xC2\xA0\xC2\xA0\xC2\xA0\xC2\xA0 IN SOA\xC2\xA0 @ rname.invalid. (\n' | sed 's/\xC2\xA0/ /g' | od -x

Output:

0000000 2040 2020 2020 2020 4e49 5320 414f 2020
0000020 2040 6e72 6d61 2e65 6e69 6176 696c 2e64
0000040 2820 000a
0000044
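
To apply the same substitution to the zone file itself rather than to a single line, something along these lines should work. This is a sketch under assumptions, not a tested procedure for your system: the fix_nbsp function name and the .bak suffix are my own choices, and the octal \302\240 form is used so that neither printf nor sed needs to understand \x escapes.

```shell
# Hypothetical helper: replace every UTF-8 non-breaking space (bytes C2 A0)
# in the given file with an ordinary space, keeping a backup as FILE.bak.
fix_nbsp() {
    file=$1
    # Build the two-byte sequence with POSIX octal escapes.
    nbsp=$(printf '\302\240')
    # Keep an untouched copy before rewriting the file.
    cp -p "$file" "$file.bak" || return 1
    # LC_ALL=C makes sed treat the input as plain bytes.
    LC_ALL=C sed "s/$nbsp/ /g" "$file.bak" > "$file"
}

# Example (file name assumed from the error messages in the question):
# fix_nbsp named.localhost
```

Reading from the backup and writing to the original path keeps the original file's permissions and ownership.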

Thanks to all those who helped me get to a working solution.

Here is what happens if I remove only the octal \240 (hex \xA0) byte: other funky characters appear in its place.

$ echo "@       IN SOA  @ rname.invalid. (" | sed -e "s/\xA0//g"
@������ IN SOA� @ rname.invalid. (

There is some extra data in the actual files that is not printed. I found the od (octal dump) tool quite useful for showing the actual hex / octal / binary values of the whole line.

$ echo "@       IN SOA  @ rname.invalid. (" | od -x
0000000 c240 c2a0 c2a0 c2a0 c2a0 c2a0 20a0 4e49
0000020 5320 414f a0c2 4020 7220 616e 656d 692e
0000040 766e 6c61 6469 202e 0a28
0000052

The other byte that kept showing up was \xC2. It is not displayed while the non-breaking space byte \xA0 follows it, but it shows up once \xA0 is removed. So I had to modify the sed line from @Thor's answer to remove it as well.
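
As far as I understand it, this is because UTF-8 encodes the non-breaking space U+00A0 as the two bytes C2 A0, so deleting only A0 leaves a dangling C2 that the terminal cannot display. A small sketch that makes this visible byte by byte (the sample string and the variable names are made up; LC_ALL=C is set so that sed works on raw bytes):

```shell
# The octal escapes \302 and \240 are the bytes C2 and A0.
a0=$(printf '\240')
c2a0=$(printf '\302\240')

# Removing only A0 leaves the lead byte C2 behind.
printf 'A\302\240B\n' | LC_ALL=C sed "s/$a0//g" | od -An -tx1

# Removing the C2 A0 pair leaves clean ASCII.
printf 'A\302\240B\n' | LC_ALL=C sed "s/$c2a0//g" | od -An -tx1
```

The first dump still contains c2 between 41 and 42; the second does not.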

This is what worked for me.

$ echo "@       IN SOA  @ rname.invalid. (" | sed -e "s/\xC2\xA0/ /g"
@       IN SOA  @ rname.invalid. (