CSV parsing in Java - working example..? [closed]

I want to write a program for a school Java project that parses CSV files whose layout I do not know in advance. I know the data type of each column, but I do not know the delimiter.

The problem I have no idea how to solve is parsing Date or even DateTime columns. They can be in one of many formats.

I found many libraries but have no clue which is the best for my needs: http://opencsv.sourceforge.net/ http://www.csvreader.com/java_csv.php http://supercsv.sourceforge.net/ http://flatpack.sourceforge.net/

The problem is that I am a total Java beginner. I am afraid that none of those libraries can do what I need, or that I won't be able to make them do it.

I bet there are a lot of people here who have code samples that could get me started in no time with what I need:

  • automatically split into columns (delimiter unknown, column types known)
  • cast to the column type (should cope with $, %, etc.)
  • convert dates to Java Date or Calendar objects

It would be nice to get as many code samples as possible by email.

Thanks a lot! AS


There is also the Apache Commons CSV library, which may do what you need; see its user guide. (It was updated to release 1.1 in 2014-11.)

Also, for a foolproof version, I think you'll need to code it yourself. With SimpleDateFormat you can define the formats you expect and try each one in turn; if a value matches none of your predefined formats, it isn't a date.
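To make the "try each format in turn" idea concrete, here is a minimal sketch. The class name DateGuesser and the four patterns are my own assumptions for illustration, not something from the question; replace them with the formats your data actually uses. It checks that the whole string was consumed, because SimpleDateFormat otherwise happily accepts a value with trailing text.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/** Hypothetical helper: tries a fixed list of patterns until one matches. */
final class DateGuesser {

    // Assumed candidate patterns; list the more specific ones first.
    private static final String[] PATTERNS = {
        "yyyy-MM-dd HH:mm:ss",
        "yyyy-MM-dd",
        "dd.MM.yyyy",
        "MM/dd/yyyy"
    };

    /** Returns the parsed date, or null if no pattern matches the whole value. */
    static Date parseOrNull(String text) {
        String value = text.trim();
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
            format.setLenient(false); // reject impossible dates such as month 13
            ParsePosition position = new ParsePosition(0);
            Date parsed = format.parse(value, position);
            // Accept only if parsing succeeded and used every character.
            if (parsed != null && position.getIndex() == value.length()) {
                return parsed;
            }
        }
        return null; // matched none of the known formats: not a date
    }
}
```

Note that some values are genuinely ambiguous (01/02/2014 could be January or February), so the order of the patterns is a decision you have to make for your data.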


There is a serious problem with using

String[] strArr=line.split(",");

to parse CSV files: data values can themselves contain commas. Such values must be quoted, and the parser must ignore any commas between the quotes.

There is a very simple way to parse this correctly:
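A small sketch of what goes wrong, in case it is not obvious. The class name and sample lines are made up for illustration.

```java
/** Hypothetical demo of the naive split(",") approach on quoted data. */
final class SplitPitfall {

    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Three logical fields; the second one contains a comma inside quotes.
        String quoted = "1,\"Smith, John\",42";
        System.out.println(naiveSplit(quoted).length); // prints 4

        // A second surprise: split(",") silently drops trailing empty fields.
        String trailing = "a,b,,";
        System.out.println(naiveSplit(trailing).length); // prints 2
    }
}
```

The quoted name is cut in two, and the quote characters stay in the pieces.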

import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/**
 * Returns a row of values as a list.
 * Returns null if you are past the end of the input stream.
 */
public static List<String> parseLine(Reader r) throws IOException {
    int ch = r.read();
    while (ch == '\r') {
        // ignore carriage returns wherever they appear, particularly just before end of file
        ch = r.read();
    }
    if (ch < 0) {
        return null;
    }
    List<String> store = new ArrayList<String>();
    StringBuilder curVal = new StringBuilder();
    boolean inquotes = false;
    boolean started = false;
    while (ch >= 0) {
        if (inquotes) {
            started = true;
            if (ch == '"') {
                // closing quote (or the first half of a doubled quote)
                inquotes = false;
            }
            else {
                curVal.append((char) ch);
            }
        }
        else {
            if (ch == '"') {
                inquotes = true;
                if (started) {
                    // if this is the second quote in a value, add a quote
                    // this is for the double quote in the middle of a value
                    curVal.append('"');
                }
            }
            else if (ch == ',') {
                // end of a value: store it and start the next one
                store.add(curVal.toString());
                curVal = new StringBuilder();
                started = false;
            }
            else if (ch == '\r') {
                // ignore carriage return (CR) characters
            }
            else if (ch == '\n') {
                // end of a line, break out
                break;
            }
            else {
                curVal.append((char) ch);
            }
        }
        ch = r.read();
    }
    store.add(curVal.toString());
    return store;
}

There are many advantages to this approach. Note that each character is touched EXACTLY once. There is no reading ahead and no pushing back into the buffer. There is no searching ahead to the end of the line and then copying the line before parsing. This parser works purely from the stream and creates each string value once. It works on header lines and data lines alike; you just handle the returned list as appropriate for each. You give it a Reader, so the underlying stream has already been converted to characters using whatever encoding you choose. The stream can come from any source (a file, an HTTP POST, an HTTP GET), and you parse it directly. This is a static method, so there is no object to create and configure, and once it returns, it holds on to no memory.

You can find a full discussion of this code, and why this approach is preferred in my blog post on the subject: The Only Class You Need for CSV Files.
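On the point about handing the parser a Reader: the sketch below shows one way to turn any byte stream into a Reader with an explicit encoding. ReaderSources, open and readAll are names I made up for illustration; the only real APIs used are the standard java.io ones. Wrapping in a BufferedReader is my own suggestion, since a parser that calls read() once per character benefits from buffering.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for obtaining a Reader with a chosen encoding. */
final class ReaderSources {

    /** Decodes the byte stream with the given charset and buffers the result. */
    static Reader open(InputStream in, Charset charset) {
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    /** Drains a Reader one character at a time, the same way a stream parser would. */
    static String readAll(Reader reader) {
        try (Reader r = reader) {
            StringBuilder text = new StringBuilder();
            int ch;
            while ((ch = r.read()) >= 0) {
                text.append((char) ch);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a file or HTTP body: UTF-8 bytes with a non-ASCII character.
        byte[] bytes = "caf\u00e9,1\n".getBytes(StandardCharsets.UTF_8);
        Reader reader = open(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.print(readAll(reader));
    }
}
```

For a file you would pass a FileInputStream instead of the ByteArrayInputStream; the resulting Reader is what you would give to the parsing method.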


My approach would not be to start by writing your own API. Life's too short, and there are more pressing problems to solve. In this situation, I typically:

  • Find a library that appears to do what I want. If one doesn't exist, then implement it.
  • If a library does exist, but I'm not sure it'll be suitable for my needs, write a thin adapter API around it, so I can control how it's called. The adapter API expresses the API I need, and it maps those calls to the underlying API.
  • If the library doesn't turn out to be suitable, I can swap another one in underneath the adapter API (whether it's another open source one or something I write myself) with a minimum of effort, without affecting the callers.
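A rough sketch of what I mean by a thin adapter. All names here (CsvRowSource, NaiveCsvRowSource, CsvClient) are invented for illustration, and the implementation shown is a deliberately naive placeholder that does not handle quoted fields; in practice you would write a second implementation that delegates to whichever library you picked.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical adapter API: the only CSV interface the rest of the program sees. */
interface CsvRowSource {
    /** Returns the next row, or null at end of input. */
    List<String> nextRow();
}

/** Placeholder implementation; swap in one backed by a real CSV library later. */
final class NaiveCsvRowSource implements CsvRowSource {
    private final BufferedReader reader;

    NaiveCsvRowSource(Reader reader) {
        this.reader = new BufferedReader(reader);
    }

    @Override
    public List<String> nextRow() {
        try {
            String line = reader.readLine();
            // Limit -1 keeps trailing empty fields; quoting is NOT handled here.
            return line == null ? null : Arrays.asList(line.split(",", -1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Caller code depends only on the adapter interface, not on any library. */
final class CsvClient {
    static int countRows(CsvRowSource source) {
        int rows = 0;
        while (source.nextRow() != null) {
            rows++;
        }
        return rows;
    }
}
```

Because CsvClient only knows about CsvRowSource, replacing the implementation underneath does not touch the callers.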

Start with something someone has already written. Odds are, it'll do what you want. You can always write your own later, if necessary. OpenCSV is as good a starting point as any.