Replacing all special/accented characters with equivalent regular characters in Notepad++

I'm trying to figure out a way to automatically search and replace all special/accented letters/characters (such as Â/Ô) with the equivalent regular letters/characters (A/O) in Notepad++.

Tried using ToolFx but it didn't work.


Solution 1:

The Python Script suggestion in the other answer is excellent, but right now it will not work because of an incompatibility between Notepad++ and the Python Script plugin. For some months now, the Notepad++ Plugin Manager has been downloading an old Python Script version that does not work with the editor. To fix that:

  1. Exit Notepad++.
  2. Download the compatible version from SourceForge.
  3. Run the downloaded installer by double-clicking it. On newer Windows versions it will ask for Administrator privileges.
  4. Make sure to pick the correct install drive at the beginning of the install process. The installer does not detect the Notepad++ installation drive correctly: it defaults to C:\ even if Notepad++ is on another disk, and I had to reinstall because of that.
  5. Follow the wizard instructions.
  6. Once the installation has finished, (re)start Notepad++ and open the Plugins menu. You should see a new "Python Script" item in it. If it appears, you have successfully force-installed the correct version. You can double-check by opening the Plugin Manager, going to the "Installed" tab and looking for an entry showing version 1.0.8 (at the time of writing) of the Python Script plugin.
  7. You are almost done. Go to Plugins => Python Script => Show Console. A pane will appear at the bottom of Notepad++. It MUST show a prompt like the following:

    Python 2.7.6-notepad++ r2 (default, Apr 21 2014, 19:26:54) [MSC v.1600 32 bit (Intel)]
    Initialisation took 156ms
    Ready.
    

The version numbers are current as of today and will of course change over time. If the bottom pane shows an exception and/or stays blank, you have installed the wrong version of the Python Script plugin.

Now, let's apply the script in the correct way:

  1. Open two new, blank tabs/files.
  2. Paste your accented text into the first.
  3. Right-click the tab of the second and select 'Move to Other View'. The Notepad++ window will split.
  4. Open the Python Script console as explained above (Plugins => Python Script => Show Console).
  5. Go to the console input line at the bottom of the Python pane; it is marked with ">>>" at its beginning.
  6. Type: from Npp import * and then press Enter (from now on, assume you press Enter at the end of every command).
  7. Enter: import unicodedata in the same input box.
  8. Click (select) the tab containing the accented text (this is important!).
  9. Enter the following commands in the Python prompt, one line at a time, pressing Enter after each line:

    eText = editor.getText()
    uText = unicode(eText, "UTF-8")
    nText = unicodedata.normalize( "NFKD", uText )
    

If you want to be sure Python really got the text, enter print eText right after eText = editor.getText(). You should see your accented text dumped in the Python console output pane.

  10. Click (select) the empty tab (this is important!).
  11. Enter: editor.addText( nText.encode('ASCII', 'ignore') ) in the usual Python console input box.
  12. The empty tab will fill with the converted, accent-less text. Follow this list carefully, because it is easy to miss a step (especially clicking the tabs), and then you will have to restart from scratch.
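
For reference, the console commands above amount to three operations: decode the bytes, decompose the accented letters, then drop whatever is not ASCII. Here is a minimal standalone sketch of that pipeline, assuming plain Python 3 run outside Notepad++ (the console steps use the plugin's Python 2 and its editor object); the function name strip_accents_ascii is made up for illustration.

```python
import unicodedata


def strip_accents_ascii(raw_bytes):
    """Decode UTF-8 bytes, decompose accented letters, keep only ASCII."""
    text = raw_bytes.decode("utf-8")                   # like unicode(eText, "UTF-8")
    decomposed = unicodedata.normalize("NFKD", text)   # "Â" becomes "A" + combining circumflex
    # Encoding to ASCII with 'ignore' discards the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    sample = "Â/Ô crème brûlée".encode("utf-8")
    print(strip_accents_ascii(sample))  # prints: A/O creme brulee
```

Note that characters without a decomposition, such as ß or ø, have no ASCII part and are removed entirely by the 'ignore' handler.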

Solution 2:

Don't be restricted by what you see as being available. We have Python available from within N++, which means a quick SO search for "[python] [unicode] remove accents" reveals a highly voted question dealing with exactly that!

We can test easily enough in N++ to see how it works:

  1. Open two new buffers/tabs/files, or whatever you like calling them.
  2. Copy and paste these Latin Extended UTF-8 characters into the first.
  3. Right-click the tab of the second and choose 'Move to Other View'.
  4. Open the Python Script console and try the following commands:

    from Npp import *
    import unicodedata
    # << Select the tab in view 1. >>
    eText = editor.getText()
    # << Select the tab in view 2. >>
    uText = unicode(eText, "UTF-8")
    nText = unicodedata.normalize("NFKD", uText)
    editor.addText(nText.encode('ASCII', 'ignore'))

From looking around a bit, it seems there are lots of ways to remove accents; the question is which works best for you. Now that you can see how easy it is to try these solutions on your text, go forth and give it a shot. Once you like a particular method, add it (using the plugin menu) as a script and it will be there whenever you need it.
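
As one illustration of those other ways, a common alternative drops only the combining marks instead of encoding to ASCII, so letters that have no ASCII equivalent survive unchanged. This is a minimal sketch, assuming standalone Python 3 rather than the plugin's Python 2 console; strip_marks is a hypothetical name.

```python
import unicodedata


def strip_marks(text):
    """Remove combining marks but keep every other character."""
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns 0 for anything that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # The accents are removed, but "ß" has no decomposition and is kept.
    print(strip_marks("Ôde à la Straße") == "Ode a la Straße")  # prints: True
```

Whether keeping characters such as ß is desirable depends on whether the result must be pure ASCII.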

Have fun!


BTW, if you don't have Python Script installed, you can install it via Plugins -> Plugin Manager.

Solution 3:

Script

Here is a variation on the script from the first and second answers. It converts the selected text and can be bound to a shortcut key:

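    # Python Script (Python 2) for Notepad++: strip accents from the selection.
    from Npp import *
    import unicodedata

    eText = editor.getSelText()                                 # selected text as UTF-8 bytes
    uText = unicode(eText, encoding='utf-8', errors='ignore')   # decode, skipping invalid bytes
    nText = unicodedata.normalize("NFKD", uText)                # split letters from their accents
    editor.replaceSel(nText.encode('ASCII', 'ignore'))          # drop the accents, keep the letters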
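
The errors='ignore' argument in the decode step matters when the selection contains bytes that are not valid UTF-8: a strict decode raises UnicodeDecodeError, while 'ignore' silently drops the offending bytes. Here is a small standalone sketch of the difference, assuming Python 3 outside Notepad++ and a made-up byte string:

```python
# Valid UTF-8 for "café " followed by a stray 0x9c byte (illustrative input).
raw = b"caf\xc3\xa9 \x9c"

try:
    raw.decode("utf-8")  # strict error handling is the default
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With errors="ignore" the stray byte is skipped instead of raising.
lenient = raw.decode("utf-8", errors="ignore")

print(strict_ok)     # prints: False
print(len(lenient))  # prints: 5
```

Silently dropping bytes can hide a wrong encoding choice, so it may be worth checking the document's encoding in Notepad++ before relying on it.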

Shortcut

Here's how to create a shortcut key to run the script:

  • Install Python Script for NPP.
  • Create a new script:
    • Go to Plugins > Python Script > New Script.
    • Call it, say, "convert_char.py".
  • Add this script to the menu:
    • Go to Plugins > Python Script > Configuration, click your User Script and click the Add button to add it to Menu items.
    • I also suggest setting Initialisation to ATSTARTUP to speed things up.
  • Create a new shortcut key:
    • You will probably need to restart NPP first so that the new menu item shows up in Shortcut Mapper.
    • Create a new shortcut under Settings > Shortcut Mapper....

Now, when you have a string highlighted, you can press the shortcut key to run the script and convert the selected characters without using the console.

References

  • python - UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c - Stack Overflow
  • Python documentation - Unicode HOWTO
  • python - Can somone explain how unicodedata.normalize(form, unistr) work with examples? - Stack Overflow
  • Python documentation - unicodedata - Unicode Database