Java: different methods of reading a file

It seems that there are many, many ways to read text files in Java (BufferedReader, DataInputStream, etc.). My personal favorite is Scanner with a File in the constructor (it's just simpler, works better with mathy data processing, and has familiar syntax).

Boris the Spider also mentioned Channel and RandomAccessFile.

Can someone explain the pros and cons of each of these methods? To be specific, when would I want to use each?

(edit) I think I should be specific and add that I have a strong preference for the Scanner method. So the real question is, when wouldn't I want to use it?


Solution 1:

Let's start at the beginning. The question is: what do you want to do?

It's important to understand what a file actually is. A file is a collection of bytes on a disc; these bytes are your data. Java provides various levels of abstraction above that:

  1. File(Input|Output)Stream - read these bytes as a stream of bytes.
  2. File(Reader|Writer) - read from a stream of bytes as a stream of chars.
  3. Scanner - read from a stream of chars and tokenise it.
  4. RandomAccessFile - read these bytes as a seekable byte[].
  5. FileChannel - read these bytes in a way that is safe for multithreaded use.
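As a rough illustration of the first four layers, here is a minimal sketch that reads the same small file in four ways. The class name LayersDemo and its method names are made up for this example, the file is assumed to be non-empty UTF-8 text, and FileChannel is left out to keep it short.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

// Hypothetical demo class; the names are illustrative only.
class LayersDemo {

    // Bytes: read() returns one byte as an int (0-255), or -1 at end of file.
    static int firstByte(Path path) throws IOException {
        try (FileInputStream in = new FileInputStream(path.toFile())) {
            return in.read();
        }
    }

    // Chars: the reader decodes the bytes using the charset it is given.
    // Assumes the file is not empty.
    static char firstChar(Path path) throws IOException {
        try (Reader reader = new InputStreamReader(
                new FileInputStream(path.toFile()), StandardCharsets.UTF_8)) {
            return (char) reader.read();
        }
    }

    // Tokens: Scanner splits the chars on whitespace by default.
    static String firstToken(Path path) throws IOException {
        try (Scanner scanner = new Scanner(path, "UTF-8")) {
            return scanner.next();
        }
    }

    // Random access: jump to a byte offset without reading what precedes it.
    static int byteAt(Path path, long offset) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r")) {
            file.seek(offset);
            return file.read();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("layers", ".txt");
        // The text is e-acute, t, e-acute, space, 2024; e-acute takes two bytes in UTF-8.
        Files.write(path, "\u00e9t\u00e9 2024".getBytes(StandardCharsets.UTF_8));
        System.out.println(firstByte(path));  // 195, the first of the two bytes of e-acute
        System.out.println(firstChar(path));
        System.out.println(firstToken(path));
        System.out.println(byteAt(path, 2));  // 116, the byte for 't'
        Files.delete(path);
    }
}
```

The point of the example is that one character is not always one byte, which is why the byte-level and char-level classes are separate.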

On top of each of those there are the Decorators; for example, you can add buffering with BufferedXXX. You could add linebreak awareness to a FileWriter with PrintWriter. You could turn an InputStream into a Reader with an InputStreamReader, which also lets you specify the character encoding (before Java 11, FileReader offered no way to do that itself).
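To make the decorator idea concrete, here is a small sketch of a write chain and a read chain, each built by wrapping one object in the next. DecoratorDemo, writeLines and readFirstLine are names invented for this example, and UTF-8 is an assumption.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical demo class; the names are illustrative only.
class DecoratorDemo {

    // FileOutputStream (bytes) -> OutputStreamWriter (chars, UTF-8)
    // -> BufferedWriter (buffering) -> PrintWriter (println and friends).
    static void writeLines(Path path, String... lines) throws IOException {
        try (PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(path.toFile()), StandardCharsets.UTF_8)))) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }

    // FileInputStream (bytes) -> InputStreamReader (chars, UTF-8)
    // -> BufferedReader (buffering plus readLine()).
    static String readFirstLine(Path path) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(path.toFile()), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }
}
```

One caveat: PrintWriter does not throw IOException from println; you have to call checkError() if you care about write failures.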

So, when wouldn't I want to use it [a Scanner]?

You would not use a Scanner if you wanted to do any of the following (these are just some examples):

  1. Read in data as bytes.
  2. Read in a serialized Java object.
  3. Copy bytes from one file to another, maybe with some filtering.
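As an example of the third case, here is a sketch of a byte-level copy with a trivial filter that drops carriage returns. Nothing here needs tokenising, so a Scanner would only get in the way. ByteCopyDemo and its methods are hypothetical names, and the choice of filter is just for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the names are illustrative only.
class ByteCopyDemo {

    // Copies every byte except '\r' and returns the number of bytes written.
    static long copyWithoutCarriageReturns(InputStream in, OutputStream out) throws IOException {
        long written = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b != '\r') {
                out.write(b);
                written++;
            }
        }
        return written;
    }

    // Buffering matters here because the loop above reads one byte at a time.
    static long copyFile(Path source, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            return copyWithoutCarriageReturns(in, out);
        }
    }
}
```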

It is also worth noting that the Scanner(File file) constructor takes the File and opens a FileInputStream with the platform default encoding - this is almost always a bad idea. It is generally recognised that you should specify the encoding explicitly to avoid nasty encoding-based bugs. Further, the stream isn't buffered.

So you may be better off with:

// "file" stands for the File (or path String) you want to read
try (final Scanner scanner = new Scanner(new BufferedInputStream(new FileInputStream(file)), "UTF-8")) {
    //do stuff
}

Ugly, I know.

It's worth noting that Java 7 provides a further layer of abstraction that removes the need to loop over files - these methods are in the Files class:

byte[] Files.readAllBytes(Path path)
List<String> Files.readAllLines(Path path, Charset cs)

Both these methods read the entire file into memory, which might not be appropriate. In Java 8 this is further improved by adding support for the new Stream API:

Stream<String> Files.lines(Path path, Charset cs)
Stream<Path> Files.list(Path dir)

For example to get a Stream of words from a Path you can do:

    // Files.lines throws IOException, and the returned Stream should be closed after use.
    final Stream<String> words = Files.lines(Paths.get("myFile.txt"))
            .flatMap(in -> Arrays.stream(in.split("\\b")));
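For comparison, here is a sketch that puts the eager and the lazy approach side by side. FilesDemo and its method names are invented for this example. It splits on whitespace rather than on "\\b" because, as far as I can tell, splitting on word boundaries also returns the separators between the words; that swap is a choice made for this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative only.
class FilesDemo {

    // Eager: the whole file is read into a List before we can look at it.
    static int countLines(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8).size();
    }

    // Lazy: lines are read as the stream is consumed. The stream keeps the
    // file open, so it is closed here with try-with-resources.
    static List<String> words(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines
                    .flatMap(line -> Arrays.stream(line.split("\\s+")))
                    .filter(word -> !word.isEmpty()) // a line with leading spaces yields an empty first piece
                    .collect(Collectors.toList());
        }
    }
}
```

Collecting into a List does bring all the words into memory; in real code you would more likely finish the stream with a count, a forEach or some other reduction.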

Solution 2:

SCANNER:

A Scanner can parse primitive types and strings using regular expressions. It breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types. More can be read at http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
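A minimal sketch of that token-then-convert behaviour: it sums every number in some text and skips everything else. ScannerDemo and sumNumbers are hypothetical names, and fixing the locale to Locale.US is an assumption made so that "4.5" parses the same way on every machine.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class; the names are illustrative only.
class ScannerDemo {

    // Sums every whitespace-separated token that parses as a double.
    static double sumNumbers(Readable source) {
        double sum = 0;
        try (Scanner scanner = new Scanner(source)) {
            scanner.useLocale(Locale.US); // '.' is the decimal separator
            while (scanner.hasNext()) {
                if (scanner.hasNextDouble()) {
                    sum += scanner.nextDouble();
                } else {
                    scanner.next(); // not a number, skip the token
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumNumbers(new StringReader("3 apples 4.5 pears 2"))); // 9.5
    }
}
```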

DATA INPUT STREAM:

Lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream. DataInputStream is not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class. More can be read at http://docs.oracle.com/javase/7/docs/api/java/io/DataInputStream.html
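A short round-trip sketch of that pairing: values are written with a DataOutputStream and read back, in the same order, with a DataInputStream. DataStreamDemo and the record layout (an int, a double and a string) are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class; the names and the record layout are illustrative only.
class DataStreamDemo {

    // Writes the three values in a fixed binary format.
    static byte[] encode(int id, double price, String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);       // 4 bytes
            out.writeDouble(price); // 8 bytes
            out.writeUTF(name);     // 2-byte length prefix, then the encoded chars
        }
        return bytes.toByteArray();
    }

    // Reads the values back; the order must match the order they were written in.
    static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            double price = in.readDouble();
            String name = in.readUTF();
            return id + " " + price + " " + name;
        }
    }
}
```

This is a binary format, not text, so it is the opposite use case from Scanner: the file is not meant to be read by a human or split into tokens.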

BUFFERED READER:

Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines. The buffer size may be specified, or the default size may be used. The default is large enough for most purposes. In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. For example,

BufferedReader in = new BufferedReader(new FileReader("foo.in"));

will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient. Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader. More details are at http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
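To illustrate that last point, here is a sketch of reading text lines from an arbitrary InputStream through a BufferedReader with an explicit charset, which is roughly what replaces DataInputStream-based line reading. BufferedReaderDemo and readLines are names invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class BufferedReaderDemo {

    // Reads all lines from the stream, decoding bytes with the given charset.
    // readLine() strips the line terminator and returns null at end of input.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Closing the BufferedReader also closes the stream passed in, so the caller should not use that stream afterwards.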