How can I clean source code files of invisible characters?

I have a bizarre problem: Somewhere in my HTML/PHP code there's a hidden, invisible character that I can't seem to get rid of. By copying it from Firebug and converting it, I identified it as U+FEFF (&#65279;), or 'Zero width no-break space'. It shows up as a non-empty text node in my website and is causing a serious layout problem.

The problem is, I can't get rid of it. I can't see it in my files even when turning Invisibles on (duh). I can't seem to find it, no search tool seems to pick up on it. I rewrote my code around where it could be, but it seems to be somewhere deeper in one of the framework files.

How can I find characters by charcode across files or something like that? I'm open to different tools, but they have to work on Mac OS X.


Solution 1:

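You can't see the character because editors that understand it deliberately hide it. U+FEFF is the so-called byte-order mark (BOM); FFFE is what the same mark looks like when it is read with the wrong byte order. The Unicode standard defines it to indicate the order in which multi-byte units are stored in a file. In UTF-8 it is encoded as the three bytes EF BB BF and serves no byte-order purpose, but some editors (notably on Windows) write it anyway.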

To get rid of it, tell your editor to save the file either as ANSI/ISO-8859 or as Unicode without BOM. If your editor can't do so, you'll either have to switch editors (sadly) or use a tool such as a hex editor, which shows you the file's actual bytes and lets you delete the mark.

From a quick search, it seems that TextWrangler has a "UTF-8, no BOM" mode. Otherwise, if you're comfortable with the terminal, you can open the file in Vim and run:

:set nobomb

and save the file. Presto!

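A BOM is always the very first character in a text file. Editors with support for the BOM will not, as I mentioned, show it to you at all. With PHP, the BOM of any included file is sent to the browser as output, which is how it can end up in the middle of a page.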
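
If neither editor is at hand, the same clean-up can be scripted. This is only a minimal sketch in Python, assuming the file is UTF-8; `strip_utf8_bom` is a name made up for this example, not part of any tool mentioned above:

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM (bytes EF BB BF) from the file at `path`.

    Returns True if a BOM was found and removed, False otherwise.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        # Rewrite the file without its first three bytes.
        p.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

It rewrites the file in place, so try it on a copy first.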

Solution 2:

If you are using TextMate and the problem is in a UTF-8 file:

  1. Open the file
  2. File > Re-open with encoding > ISO-8859-1 (Latin1)
  3. You should now see the BOM as the first three characters of the file (ï»¿); delete them
  4. File > Save
  5. File > Re-open with encoding > UTF-8
  6. File > Save

It works for me every time.
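
Why this works can be illustrated with a few lines of Python. This is just a sketch with a made-up byte string standing in for the file: read as Latin-1, every byte becomes its own visible character, so the three BOM bytes stop being invisible.

```python
import codecs

# A made-up file content: UTF-8 BOM followed by some markup.
raw = codecs.BOM_UTF8 + "<html>".encode("utf-8")

as_latin1 = raw.decode("iso-8859-1")   # each BOM byte becomes a visible character
as_utf8 = raw.decode("utf-8")          # BOM is decoded as the invisible U+FEFF
as_utf8_sig = raw.decode("utf-8-sig")  # this codec drops a leading BOM

print(repr(as_latin1[:3]))   # the three characters you delete in step 3
print(repr(as_utf8[0]))      # '\ufeff'
print(repr(as_utf8_sig))     # '<html>'
```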

Solution 3:

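It's a byte-order mark. Under Mac OS X, open a terminal window, go to your source directory, and type:

grep -rn $'\xEF\xBB\xBF' .

It will show you the file names and line numbers of every line containing a BOM. This assumes the files are UTF-8, where the BOM is stored as the bytes EF BB BF.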
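
If grep is not an option, a similar search can be sketched in Python. This assumes UTF-8 files; `find_feff` is a hypothetical helper name, and it reports byte offsets rather than line numbers:

```python
import os

BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def find_feff(root):
    """Yield (path, byte_offset) for every UTF-8 encoded U+FEFF under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            offset = data.find(BOM)
            while offset != -1:
                yield path, offset
                offset = data.find(BOM, offset + len(BOM))
```

Calling `list(find_feff("."))` from the source directory gives every hit; an offset of 0 means a BOM at the start of the file, anything else is a stray U+FEFF in the middle.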

Solution 4:

In Notepad++ (Windows only), there is an option to show all characters. From the top menu:

View -> Show Symbol -> Show All Characters

Solution 5:

I'm not a Mac user, but my general advice would be: when all else fails, use a hex editor. Very useful in such cases.

See "Comparison of hex editors" on Wikipedia.
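
If no hex editor is installed, a few lines of Python can stand in for a quick look at the start of a file. This is only a sketch; `hex_head` is a made-up name for this example:

```python
def hex_head(path, count=16):
    """Return the first `count` bytes of the file as a spaced hex string."""
    with open(path, "rb") as fh:
        head = fh.read(count)
    return " ".join(f"{byte:02x}" for byte in head)
```

A UTF-8 file that starts with a BOM will begin with `ef bb bf`.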