What is the difference between Reader and InputStream?
What is the difference between Reader and InputStream, and when should I use each? If I can use a Reader to read characters, why would I use an InputStream? I guess to read objects?
An InputStream is the raw method of getting information from a resource. It grabs the data byte by byte without performing any kind of translation. If you are reading image data, or any binary file, this is the stream to use.
A Reader is designed for character streams. If the information you are reading is all text, then the Reader will take care of the character decoding for you and give you Unicode characters from the raw input stream. If you are reading any type of text, this is the stream to use.
You can wrap an InputStream and turn it into a Reader by using the InputStreamReader class.
Reader reader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
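To make the wrapping concrete, here is a minimal sketch. The class name WrapExample and the helper firstLine are hypothetical names chosen for illustration, and the in-memory ByteArrayInputStream stands in for whatever stream you actually have (a file, a socket, and so on). It assumes the bytes are UTF-8 encoded text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class WrapExample {
    // Hypothetical helper: decodes the byte stream as UTF-8 and returns its first line.
    static String firstLine(InputStream in) throws IOException {
        // try-with-resources closes the reader, which also closes the wrapped stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "h\u00e9llo" contains e-acute, which takes two bytes in UTF-8.
        byte[] data = "h\u00e9llo\nworld".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(new ByteArrayInputStream(data)));
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding, which can differ between machines.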
InputStreams are used to read bytes from a stream. So they are useful for binary data such as images, video and serialized objects.
Readers, on the other hand, are character streams, so they are best used to read character data.
I guess the source of confusion is that InputStream.read() returns an int and Reader.read() also returns an int.
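The difference is that InputStream.read() returns byte values between 0 and 255, corresponding to the raw contents of the byte stream, and Reader.read() returns a character value between 0 and 65535 (a Java char is a 16-bit UTF-16 code unit, so there are 65536 possible values). Code points above U+FFFF arrive as two such values, a surrogate pair. Both methods return -1 at the end of the stream.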
An InputStream lets you read the contents byte by byte. For example, the contents "a‡a" have 3 characters but are encoded as 5 bytes in UTF-8. So with an InputStream you can read them as a stream of 5 bytes (each one represented as an int between 0 and 255), resulting in 97, 226, 128, 161 and 97, where:
a -> U+0061 -> 0x61 (hex) -> 97 (dec)
‡ -> U+2021 -> 0xE280A1 (utf-8 encoding of 0x2021) -> 226 128 161 (1 int per byte)
a -> U+0061 -> 0x61 (hex) -> 97 (dec)
A Reader lets you read the contents character by character, so the contents "a‡a" are read as 3 characters: 97, 8225 and 97, where:
a -> U+0061 -> 0x61 -> 97
‡ -> U+2021 -> 0x2021 -> 8225 (single int, not 3)
a -> U+0061 -> 0x61 -> 97
The character ‡ is referred to as U+2021 in Unicode.
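The byte-versus-character walk-through above can be reproduced with a short sketch. The class and method names (ByteVsChar, readAsBytes, readAsChars) are made up for this illustration, and the string is written with the escape \u2021 so the source file encoding does not matter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ByteVsChar {
    // Collects every value returned by InputStream.read() until it returns -1.
    static List<Integer> readAsBytes(byte[] data) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int b;
            while ((b = in.read()) != -1) {
                values.add(b); // one raw byte, 0..255
            }
        }
        return values;
    }

    // Collects every value returned by Reader.read() until it returns -1,
    // decoding the bytes as UTF-8.
    static List<Integer> readAsChars(byte[] data) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                values.add(c); // one decoded char, 0..65535
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\u2021a".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAsBytes(data)); // [97, 226, 128, 161, 97]
        System.out.println(readAsChars(data)); // [97, 8225, 97]
    }
}
```

The same five bytes go into both methods; only the Reader applies a charset, which is why it yields three values instead of five.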
An InputStream reads bytes; a Reader reads characters.