Write a file in UTF-8 using FileWriter (Java)?
Solution 1:
Safe Encoding Constructors
Getting Java to properly notify you of encoding errors is tricky. To receive a proper exception on an encoding glitch, you must use the most verbose and, alas, the least used of the four constructors available for each of InputStreamReader and OutputStreamWriter.
For file I/O, always make sure to use the fancy encoder argument (a decoder, in the reader's case) as the second argument to both OutputStreamWriter and InputStreamReader:
Charset.forName("UTF-8").newEncoder()
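There are other even fancier possibilities, but none of the three simpler possibilities work for exception handling: the one-argument constructor and the ones taking a String charset name or a Charset all silently replace bad input instead of throwing. These do: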
OutputStreamWriter char_output = new OutputStreamWriter(
new FileOutputStream("some_output.utf8"),
Charset.forName("UTF-8").newEncoder()
);
InputStreamReader char_input = new InputStreamReader(
new FileInputStream("some_input.utf8"),
Charset.forName("UTF-8").newDecoder()
);
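To see the difference concretely, here is a minimal self-contained sketch (the class and method names are made up for illustration) that feeds two malformed UTF-8 bytes through InputStreamReader twice: once with the StandardCharsets.UTF_8 constructor and once with a decoder from newDecoder(). As I understand it, the first silently substitutes U+FFFD, while the second throws a MalformedInputException, because a freshly created CharsetDecoder reports errors by default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StrictDecodingDemo {

    // 0xC3 starts a two-byte UTF-8 sequence, but 0x28 ('(') is not a
    // continuation byte, so these two bytes are malformed UTF-8.
    static final byte[] BAD_UTF8 = { (byte) 0xC3, (byte) 0x28 };

    // Reads everything from a Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Charset constructor: malformed input is silently replaced with U+FFFD.
    static String decodeLenient(byte[] bytes) throws IOException {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            return readAll(r);
        }
    }

    // CharsetDecoder constructor: a fresh decoder reports malformed input,
    // so read() throws a MalformedInputException.
    static String decodeStrict(byte[] bytes) throws IOException {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes),
                Charset.forName("UTF-8").newDecoder())) {
            return readAll(r);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("lenient: " + decodeLenient(BAD_UTF8));
        try {
            decodeStrict(BAD_UTF8);
            System.out.println("strict: no error");
        } catch (CharacterCodingException e) {
            System.out.println("strict: " + e);
        }
    }
}
```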
As for running with
$ java -Dfile.encoding=utf8 SomeTrulyRemarkablyLongcLassNameGoeShere
The problem is that this does not use the full encoder argument form for the character streams, so you will again miss encoding problems.
Longer Example
Here’s a longer example, this one managing a process instead of a file, where we promote two different input byte streams and one output byte stream all to UTF-8 character streams with full exception handling:
// this runs a perl script with UTF-8 STD{IN,OUT,ERR} streams
Process
slave_process = Runtime.getRuntime().exec("perl -CS script args");
// fetch his stdin byte stream...
OutputStream
__bytes_into_his_stdin = slave_process.getOutputStream();
// and make a character stream with exceptions on encoding errors
OutputStreamWriter
chars_into_his_stdin = new OutputStreamWriter(
__bytes_into_his_stdin,
/* DO NOT OMIT! */ Charset.forName("UTF-8").newEncoder()
);
// fetch his stdout byte stream...
InputStream
__bytes_from_his_stdout = slave_process.getInputStream();
// and make a character stream with exceptions on encoding errors
InputStreamReader
chars_from_his_stdout = new InputStreamReader(
__bytes_from_his_stdout,
/* DO NOT OMIT! */ Charset.forName("UTF-8").newDecoder()
);
// fetch his stderr byte stream...
InputStream
__bytes_from_his_stderr = slave_process.getErrorStream();
// and make a character stream with exceptions on encoding errors
InputStreamReader
chars_from_his_stderr = new InputStreamReader(
__bytes_from_his_stderr,
/* DO NOT OMIT! */ Charset.forName("UTF-8").newDecoder()
);
Now you have three character streams that all raise exceptions on encoding errors, called chars_into_his_stdin, chars_from_his_stdout, and chars_from_his_stderr respectively.
This is only slightly more complicated than what you need for your problem, whose solution I gave in the first half of this answer. The key point is that this is the only way to detect encoding errors.
Just don’t get me started about PrintStreams eating exceptions.
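Since the longer example repeats the same wrapping three times, one option is to factor it into a tiny helper. The sketch below is hypothetical (StrictUtf8 and its methods are invented names, not a library API); it uses StandardCharsets.UTF_8.newDecoder()/newEncoder(), which I believe is equivalent to the Charset.forName("UTF-8") form on Java 7+, and in-memory streams stand in for the process streams so it runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that factors out the repeated wrapping.
public final class StrictUtf8 {

    private StrictUtf8() {}

    // Decodes bytes as UTF-8; a fresh decoder reports (throws on) malformed input.
    public static Reader reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8.newDecoder());
    }

    // Encodes chars as UTF-8; a fresh encoder reports (throws on) unencodable input.
    public static Writer writer(OutputStream out) {
        return new OutputStreamWriter(out, StandardCharsets.UTF_8.newEncoder());
    }

    // Reads a Reader to the end.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the process streams here.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = writer(sink)) {
            w.write("na\u00efve caf\u00e9");
        }
        byte[] bytes = sink.toByteArray();
        try (Reader r = reader(new ByteArrayInputStream(bytes))) {
            System.out.println(readAll(r) + " (" + bytes.length + " bytes)");
        }
    }
}
```

With the process from the longer example, each wrapping would then shrink to something like `Reader chars_from_his_stdout = StrictUtf8.reader(slave_process.getInputStream());`.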
Solution 2:
Ditch FileWriter and FileReader, which are useless exactly because they do not allow you to specify the encoding (at least before Java 11, which added constructors that take a Charset). Instead, use
new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)
and
new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);
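A minimal round-trip sketch of those two constructors, assuming a scratch temp file (the class and method names are hypothetical). It also checks the byte count, since "é€" should take 2 + 3 bytes in UTF-8. Keep in mind that these Charset-based constructors replace malformed input rather than throwing; for error detection you still need the encoder/decoder forms.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Utf8FileRoundTrip {

    // Writes text to the file, encoded as UTF-8 regardless of the platform default.
    static void writeUtf8(File file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Reads the whole file back, decoding it as UTF-8.
    static String readUtf8(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-demo", ".txt");
        file.deleteOnExit();
        String text = "\u00e9\u20ac"; // é (2 bytes) + € (3 bytes) in UTF-8
        writeUtf8(file, text);
        System.out.println(Files.size(file.toPath()) + " bytes");
        System.out.println(readUtf8(file).equals(text));
    }
}
```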
Solution 3:
You need to use the OutputStreamWriter class as the writer parameter for your BufferedWriter. It does accept an encoding; review the Javadoc for it.
Somewhat like this:
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream("jedis.txt"), "UTF-8"
));
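Fleshed out into a runnable sketch, the snippet above might look like this. JediWriter, writeLines and readLines are hypothetical names. I swapped the "UTF-8" string for StandardCharsets.UTF_8, which avoids the checked UnsupportedEncodingException that the String-name constructor declares, and added try-with-resources so the BufferedWriter is flushed and closed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JediWriter {

    // Writes each string on its own line, encoded as UTF-8.
    static void writeLines(String path, List<String> lines) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(path), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.write(line);
                out.newLine(); // platform line separator
            }
        } // closing the BufferedWriter flushes and closes the underlying streams
    }

    // Reads the lines back, decoding them as UTF-8.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> jedis = Arrays.asList("Yoda", "Obi-Wan Kenobi", "\u00abDo. Or do not.\u00bb");
        writeLines("jedis.txt", jedis);
        System.out.println(readLines("jedis.txt").equals(jedis));
    }
}
```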
Or you can set the current system encoding to UTF-8 with the system property file.encoding.
java -Dfile.encoding=UTF-8 com.jediacademy.Runner arg1 arg2 ...
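You might be tempted to set it at runtime with System.setProperty(...) if you only need it for this specific file, but the default charset is determined when the JVM starts, so changing file.encoding later is not guaranteed to have any effect. In a case like this I would prefer the OutputStreamWriter.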
By setting the system property on the command line, you can use FileWriter and expect it to use UTF-8 as the default encoding for your files. Note that this then applies to all the files that you read and write.
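If you are on Java 11 or later, you can keep using FileWriter without depending on file.encoding at all, because FileWriter and FileReader gained constructors that take a Charset. A minimal sketch under that assumption (class and method names are hypothetical); like the Charset-based OutputStreamWriter constructor, these replace bad input rather than throwing.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class FileWriterUtf8 {

    // Java 11+: the charset is explicit, so file.encoding plays no role.
    static void write(File file, String text) throws IOException {
        try (Writer out = new FileWriter(file, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Java 11+: FileReader with an explicit charset.
    static String read(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new FileReader(file, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("filewriter-utf8", ".txt");
        file.deleteOnExit();
        write(file, "\u00e9"); // é is two bytes in UTF-8
        System.out.println(Files.size(file.toPath()) + " bytes, read back: " + read(file));
    }
}
```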
EDIT
Starting from Java 7 (Android API level 19), you can replace the String "UTF-8" with StandardCharsets.UTF_8.
As suggested in the comments below by tchrist, if you intend to detect encoding errors in your file, you are forced to use the OutputStreamWriter approach with the constructor that receives a charset encoder. Somewhat like:
CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
encoder.onMalformedInput(CodingErrorAction.REPORT);
encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream("jedis.txt"), encoder
));
You may choose between the CodingErrorAction values IGNORE, REPLACE, and REPORT.
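To compare the three actions, here is a small sketch (EncoderActionDemo and encode are hypothetical names) that writes a string containing an unpaired high surrogate, which cannot be encoded as UTF-8, through an encoder configured with each action. My understanding is that IGNORE drops the bad char, REPLACE writes the encoder's replacement bytes ("?" for UTF-8), and REPORT throws a MalformedInputException. The sketch writes straight to the OutputStreamWriter; with a BufferedWriter in front, as in the snippet above, the exception would only surface when the buffer is flushed or closed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncoderActionDemo {

    // A lone high surrogate followed by 'x' cannot be encoded as UTF-8.
    static final String BROKEN = "\uD800x";

    // Encodes text with a UTF-8 encoder configured for the given action.
    static byte[] encode(String text, CodingErrorAction action) throws IOException {
        CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, encoder)) {
            out.write(text); // with REPORT, throws here on the bad surrogate
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        CodingErrorAction[] actions = {
            CodingErrorAction.IGNORE, CodingErrorAction.REPLACE, CodingErrorAction.REPORT
        };
        for (CodingErrorAction action : actions) {
            try {
                byte[] bytes = encode(BROKEN, action);
                System.out.println(action + " -> " + new String(bytes, StandardCharsets.UTF_8));
            } catch (CharacterCodingException e) {
                System.out.println(action + " -> " + e);
            }
        }
    }
}
```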
Also, this question was already answered here.