Is there a way to highlight all the special accented characters in Sublime Text or any other text editor?

I am using the HTML encode special characters plugin in Sublime Text to convert all the special characters into their HTML codes. I have a lot of accented characters in different parts of the file, so it would be great if I could select all the special characters and then use the plugin to convert them all at once!

Is there a regex that helps select all special characters only?


Solution 1:

Yes.

Sublime Text supports regular expressions, so you can select all non-ASCII characters (code points above 127). This regex should be enough for you:

[^\x00-\x7F]

Just search and replace.
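
As a rough illustration outside the editor, the same character class can be tried with Python's `re` module. This is only a sketch: the helper names `find_non_ascii` and `encode_non_ascii` are made up for this example, and the entity replacement is an assumption about what an HTML-encode plugin does, not its actual code.

```python
import re
from html.entities import codepoint2name

# Same character class as above: anything outside the ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def find_non_ascii(text):
    """Return (index, character) pairs for every non-ASCII character."""
    return [(m.start(), m.group()) for m in NON_ASCII.finditer(text)]


def encode_non_ascii(text):
    """Replace each non-ASCII character with an HTML entity (hypothetical helper)."""
    def to_entity(match):
        code_point = ord(match.group())
        name = codepoint2name.get(code_point)
        # Use a named entity when one exists, else a numeric reference.
        return f"&{name};" if name else f"&#{code_point};"

    return NON_ASCII.sub(to_entity, text)


if __name__ == "__main__":
    sample = "caf\u00e9 na\u00efve"
    print(find_non_ascii(sample))
    print(encode_non_ascii(sample))
```

Plain ASCII text passes through `encode_non_ascii` unchanged, because the pattern never matches it.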

But if you are HTML-encoding these characters by hand in the first place, you are doing it wrong. Save your files in UTF-8 encoding (the Sublime Text 2 default) and make sure your web server also serves those files as UTF-8. No conversion or entity encoding is needed.
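
A minimal Python sketch of the UTF-8 point: accented text written to disk as UTF-8 comes back unchanged when it is read as UTF-8, with no entities involved. The helper `round_trip_utf8` and the file name `page.html` are hypothetical, and the sketch says nothing about how a particular web server is configured.

```python
import tempfile
from pathlib import Path


def round_trip_utf8(text):
    """Write text as UTF-8, then return the raw bytes and the decoded text."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "page.html"  # hypothetical file name
        path.write_text(text, encoding="utf-8")
        raw = path.read_bytes()
        decoded = path.read_text(encoding="utf-8")
    return raw, decoded


if __name__ == "__main__":
    raw, decoded = round_trip_utf8("caf\u00e9")
    print(raw)      # the accented letter is stored as two bytes
    print(decoded)  # and decodes back to the original text
```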

Solution 2:

Just as a further reference (or as a complement):

The Sublime Text 2/3 package Highlighter can, as its name says, highlight characters that match a regex...

"You can also add a custom regex for characters to highlight."

So, with this package, plus @Mikko Ohtamaa's answer, we can edit the file...

highlighter.sublime-settings - User

...and include the proposed regex (expressed here as [^\\x00-\\x7F]) to end up with something like this:

{  
    "highlighter_regex": "(\t+ +)|( +\t+)|[^\\x00-\\x7F]|[\u2026\u2018\u2019\u201c\u201d\u2013\u2014]|[\t ]+$"  
}

The result is automatic highlighting of any non-ASCII characters (code point > 127) in our file.

Note that this will not select those characters; it only highlights them so you can easily spot whether you have any.
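
To sanity-check what the combined pattern catches, the settings above can be loaded as JSON and tried with Python's `re` module. This is a sketch under assumptions: the function name `highlighted` is invented here, Python's regex engine is not the one Sublime Text uses, and `re.MULTILINE` is my guess at approximating per-line matching of the trailing `$`.

```python
import json
import re

# The settings text from above, kept raw so JSON does the unescaping.
SETTINGS_JSON = r'{"highlighter_regex": "(\t+ +)|( +\t+)|[^\\x00-\\x7F]|[\u2026\u2018\u2019\u201c\u201d\u2013\u2014]|[\t ]+$"}'

settings = json.loads(SETTINGS_JSON)
# Assumption: MULTILINE makes "$" match at the end of every line.
PATTERN = re.compile(settings["highlighter_regex"], re.MULTILINE)


def highlighted(text):
    """Return every substring the pattern would flag, in order."""
    return [m.group() for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    print(highlighted("caf\u00e9"))       # the accented letter
    print(highlighted("trailing  "))      # the trailing spaces
    print(highlighted("plain ascii"))     # nothing flagged
```

Besides non-ASCII characters, the pattern also flags mixed tab/space runs and trailing whitespace, which is why the second call reports the two spaces.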

Solution 3:

Another plugin option

I recently wrote a plugin dedicated to highlighting non-ASCII characters: https://github.com/TuureKaunisto/highlight-dodgy-chars

Exactly the same functionality can be achieved with Highlighter, but with the less generic Highlight Dodgy Chars plugin you don't need to write a regular expression: you just list in the settings the non-ASCII characters you don't want highlighted. The European special characters are whitelisted by default.