Maven: Source Encoding in UTF-8 not working?

Solution 1:

I have found a "solution" myself:

I had to pass the encoding into the maven-surefire-plugin, but the usual

<encoding>${project.build.sourceEncoding}</encoding>

did not work. I still have no idea why, but when I pass the encoding as a command-line argument to the plugin, the tests work as they should:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.15</version>
  <configuration>
    <argLine>-Dfile.encoding=UTF-8</argLine>
  </configuration>
</plugin>
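
One way to confirm that the argLine setting actually reaches the test JVM is to print the encoding from inside a test run. This is only a minimal sketch: the class name EncodingCheck is made up for illustration, and it reads nothing but standard JDK values.

```java
import java.nio.charset.Charset;

public class EncodingCheck {

    // Summarizes the encoding settings of the JVM this code runs in.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Calling describe() from a test and comparing the output with and without the argLine entry should show whether the option is being applied.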

Thanks for all your responses and additional comments!

Solution 2:

  1. When debugging Unicode problems, convert everything to ASCII so you can read exactly what is inside a String without guesswork. For example, use StringEscapeUtils from commons-lang3 to turn ä into \u00e4. That way, you can tell whether a ? is really in the data or only appears because the console can't print the character, and you can distinguish " " (\u0020) from " " (\u00a0).

    In the test case, check the escaped version of the inputs as early as possible to make sure the data is actually what you expect.

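    So the code above should be (note the doubled backslashes, so that the expected value is the escape text and not the character itself):

    assertEquals("\\u010d\\u00e4\\u....", escape(l_string));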
    
  2. Make sure you use the correct encoding for file I/O. Never rely on Java's default encoding; always use InputStreamReader/OutputStreamWriter and specify the encoding explicitly.

  3. The POM looks correct. Run mvn with -X to check that Maven picks up these settings and passes the correct encoding option to the Java compiler. mvn help:effective-pom might also help.

  4. Disassemble the class file (for example, with javap -v) to check the strings. Java will use ? to denote that it couldn't read something.

    If System.out.println( ">>> " + l_string ); prints ?, either the code wasn't compiled as UTF-8 or the source file was saved in a different Unicode encoding (UTF-16 or similar).

    Another source of problems could be the properties file. Make sure it was saved as ISO-8859-1 and that the build did not modify it.

  5. Make sure Maven actually compiles your file. Run mvn clean before building (for example, mvn clean test) to force a full recompile.
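
Points 1 and 2 can be sketched without any extra dependency. Everything here is an assumption for illustration: the class name UnicodeDebug, the helper names, and the lowercase-hex output format. The escape method is a plain stand-in, not the StringEscapeUtils implementation, and roundTrip uses in-memory streams only to show where the charset is named.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class UnicodeDebug {

    // Replaces every char outside printable ASCII with a backslash-u escape
    // (four lowercase hex digits), so the result is safe to print anywhere.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7f) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Writes the text and reads one line back, naming UTF-8 explicitly on
    // both sides instead of relying on the JVM default encoding.
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
                out.write(text);
            }
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    new ByteArrayInputStream(bytes.toByteArray()), StandardCharsets.UTF_8))) {
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The literal uses escapes, so it does not depend on the source file encoding.
        String s = roundTrip("\u010d\u00e4 \u00a0");
        System.out.println(escape(s));
    }
}
```

Because the output is pure ASCII, it looks the same on any console: the regular space stays as it is, while the non-breaking space shows up as an escape.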