How can I clean source code files of invisible characters?
I have a bizarre problem: Somewhere in my HTML/PHP code there's a hidden, invisible character that I can't seem to get rid of. By copying it from Firebug and converting it I identified it as 
or 'Zero width no-break space'. It shows up as non-empty text node in my website and is causing a serious layout problem.
The problem is, I can't get rid of it. I can't see it in my files even when turning Invisibles on (duh). I can't seem to find it, no search tool seems to pick up on it. I rewrote my code around where it could be, but it seems to be somewhere deeper in one of the framework files.
How can I find characters by charcode across files or something like that? I'm open to different tools, but they have to work on Mac OS X.
Solution 1:
You don't get the character in the editor, because you can't find it in text editors. #FEFF or #FFFE are so-called byte-order marks. They are a Microsoft invention to tell in a Unicode file, in which order multi-byte characters are stored.
To get rid of it, tell your editor to save the file either as ANSI/ISO-8859 or as Unicode without BOM. If your editor can't do so, you'll either have to switch editors (sadly) or use some kind of truncation tool like, e.g., a hex editor that allows you to see how the file really looks.
On googling, it seems, that TextWrangler has a "UTF-8, no BOM" mode. Otherwise, if you're comfortable with the terminal, you can use Vim:
:set nobomb
and save the file. Presto!
The characters are always the very first in a text file. Editors with support for the BOM will not, as I mentioned, show it to you at all.
Solution 2:
If you are using Textmate and the problem is in a UTF-8 file:
- Open the file
- File > Re-open with encoding > ISO-8859-1 (Latin1)
- You should be able to see and remove the first character in file
- File > Save
- File > Re-open with encoding > UTF8
- File > Save
It works for me every time.
Solution 3:
It's a byte-order mark. Under Mac OS X: open terminal window, go to your sources and type:
grep -rn $'\xFEFF' *
It will show you the line numbers and filenames containing BOM.
Solution 4:
In Notepad++, there is an option to show all characters. From the top menu:
View -> Show Symbol -> Show All Characters
Solution 5:
I'm not a Mac user, but my general advice would be: when all else fails, use a hex editor. Very useful in such cases.
See "Comparison of hex editors" in WikiPedia.