"Unmappable character for encoding UTF-8" error

Solution 1:

You have encoding problem with your sourcecode file. It is maybe ISO-8859-1 encoded, but the compiler was set to use UTF-8. This will results in errors when using characters, which will not have the same bytes representation in UTF-8 and ISO-8859-1. This will happen to all characters which are not part of ASCII, for example ¬ NOT SIGN.

You can simulate this with the following program. It just uses your line of source code and generates a ISO-8859-1 byte array and decode this "wrong" with UTF-8 encoding. You can see at which position the line gets corrupted. I added 2 spaces at your source code to fit position 74 to fit this to ¬ NOT SIGN, which is the only character, which will generate different bytes in ISO-8859-1 encoding and UTF-8 encoding. I guess this will match indentation with the real source file.

 String reg = "      String reg = \"^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$\";";
 String corrupt=new String(reg.getBytes("ISO-8859-1"),"UTF-8");
 System.out.println(corrupt+": "+corrupt.charAt(74));
 System.out.println(reg+": "+reg.charAt(74));

which results in the following output (messed up because of markup):

String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=�.,-])(?=[^\s]+$).{8,24}$";: �

String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=¬.,-])(?=[^\s]+$).{8,24}$";: ¬

See "live" at https://ideone.com/ShZnB

To fix this, save the source files with UTF-8 encoding.

Solution 2:

I'm in the process of setting up a CI build server on a Linux box for a legacy system started in 2000. There is a section that generates a PDF that contains non-UTF8 characters. We are in the final steps of a release, so I cannot replace the characters giving me grief, yet for Dilbertesque reasons, I cannot wait a week to solve this issue after the release. Fortunately, the "javac" command in Ant has an "encoding" parameter.

 <javac destdir="${classes.dir}" classpathref="production-classpath" debug="on"
     includeantruntime="false" source="${java.level}" target="${java.level}"

     encoding="iso-8859-1">

     <src path="${production.dir}" />
 </javac>

Solution 3:

The Java compiler assumes that your input is UTF-8 encoded, either because you specified it to be or because it's your platform default encoding.

However, the data in your .java files is not actually encoded in UTF-8. The problem is probably the ¬ character. Make sure your editor (or IDE) of choice actually safes its file in UTF-8 encoding.

Solution 4:

In eclipse try to go to file properties (Alt+Enter) and change the Resource → 'Text File encoding' → Other to UTF-8. Reopen the file and check there will be junk character somewhere in the string/file. Remove it. Save the file.

Change the encoding Resource → 'Text File encoding' back to Default.

Compile and deploy the code.

Solution 5:

For IntelliJ users, this is pretty easy once you find out what the original encoding was. You can select the encoding from the bottom right corner of your Window, you will be prompted with a dialog box saying:

The encoding you've chosen ('[encoding type]') may change the contents of '[Your file]'. Do you want to reload the file from disk or convert the text and save in the new encoding?

So if you happen to have a few characters saved in some odd encoding, what you should do is first select 'Reload' to load the file all in the encoding of the bad characters. For me this turned the ? characters into their proper value.

IntelliJ can tell if you most likely did not pick the right encoding and will warn you. Revert back and try again.

Once you can see the bad characters go away, change the encoding select box in the bottom right corner back to the format you originally intended (if you are Googling this error message, that will likely be UTF-8). This time select the 'Convert' button on the dialog.

For me, I needed to reload as 'windows-1252', then convert back to 'UTF-8'. The offending characters were single quotes (‘ and ’) likely pasted in from a Word doc (or e-mail) with the wrong encoding, and the above actions will convert them to UTF-8.