How can I read and manipulate CSV file data in C++? [duplicate]

More information would be useful.

But the simplest form:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>

int main()
{
    std::ifstream  data("plop.csv");

    std::string line;
    while(std::getline(data,line))
    {
        std::stringstream  lineStream(line);
        std::string        cell;
        while(std::getline(lineStream,cell,','))
        {
            // You have a cell!!!!
        }
    }
 }

Also see this question: CSV parser in C++


You can try the Boost Tokenizer library, in particular the Escaped List Separator


If what you're really doing is manipulating a CSV file itself, Nelson's answer makes sense. However, my suspicion is that the CSV is simply an artifact of the problem you're solving. In C++, that probably means you have something like this as your data model:

struct Customer {
    int id;
    std::string first_name;
    std::string last_name;
    struct {
        std::string street;
        std::string unit;
    } address;
    char state[2];
    int zip;
};

Thus, when you're working with a collection of data, it makes sense to have std::vector<Customer> or std::set<Customer>.

With that in mind, think of your CSV handling as two operations:

// if you wanted to go nuts, you could use a forward iterator concept for both of these
class CSVReader {
public:
    CSVReader(const std::string &inputFile);
    bool hasNextLine();
    void readNextLine(std::vector<std::string> &fields);
private:
    /* secrets */
};
class CSVWriter {
public:
    CSVWriter(const std::string &outputFile);
    void writeNextLine(const std::vector<std::string> &fields);
private:
    /* more secrets */
};
void readCustomers(CSVReader &reader, std::vector<Customer> &customers);
void writeCustomers(CSVWriter &writer, const std::vector<Customer> &customers);

Read and write a single row at a time, rather than keeping a complete in-memory representation of the file itself. There are a few obvious benefits:

  1. Your data is represented in a form that makes sense for your problem (customers), rather than the current solution (CSV files).
  2. You can trivially add adapters for other data formats, such as bulk SQL import/export, Excel/OO spreadsheet files, or even an HTML <table> rendering.
  3. Your memory footprint is likely to be smaller (depends on relative sizeof(Customer) vs. the number of bytes in a single row).
  4. CSVReader and CSVWriter can be reused as the basis for an in-memory model (such as Nelson's) without loss of performance or functionality. The converse is not true.

I've worked with a lot of CSV files in my time. I'd like to add the advice:

1 - Depending on the source (Excel, etc), commas or tabs may be embedded in a field. Usually, the rule is that they will be 'protected' because the field will be double-quote delimited, as in "Boston, MA 02346".

2 - Some sources will not double-quote delimit all text fields. Other sources will. Others will delimit all fields, even numerics.

3 - Fields containing double-quotes usually get the embedded double quotes doubled up (and the field itself delimited with double quotes, as in "George ""Babe"" Ruth".

4 - Some sources will embed CR/LFs (Excel is one of these!). Sometimes it'll be just a CR. The field will usually be double-quote delimited, but this situation is very difficult to handle.


This is a good exercise for yourself to work on :)

You should break your library into three parts

  • Loading the CSV file
  • Representing the file in memory so that you can modify it and read it
  • Saving the CSV file back to disk

So you are looking at writing a CSVDocument class that contains:

  • Load(const char* file);
  • Save(const char* file);
  • GetBody

So that you may use your library like this:

CSVDocument doc;
doc.Load("file.csv");
CSVDocumentBody* body = doc.GetBody();

CSVDocumentRow* header = body->GetRow(0);
for (int i = 0; i < header->GetFieldCount(); i++)
{
    CSVDocumentField* col = header->GetField(i);
    cout << col->GetText() << "\t";
}

for (int i = 1; i < body->GetRowCount(); i++) // i = 1 so we skip the header
{
    CSVDocumentRow* row = body->GetRow(i);
    for (int p = 0; p < row->GetFieldCount(); p++)
    {
        cout << row->GetField(p)->GetText() << "\t";
    }
    cout << "\n";
}

body->GetRecord(10)->SetText("hello world");

CSVDocumentRow* lastRow = body->AddRow();
lastRow->AddField()->SetText("Hey there");
lastRow->AddField()->SetText("Hey there column 2");

doc->Save("file.csv");

Which gives us the following interfaces:

class CSVDocument
{
public:
    void Load(const char* file);
    void Save(const char* file);

    CSVDocumentBody* GetBody();
};

class CSVDocumentBody
{
public:
    int GetRowCount();
    CSVDocumentRow* GetRow(int index);
    CSVDocumentRow* AddRow();
};

class CSVDocumentRow
{
public:
    int GetFieldCount();
    CSVDocumentField* GetField(int index);
    CSVDocumentField* AddField(int index);
};

class CSVDocumentField
{
public:
    const char* GetText();
    void GetText(const char* text);
};

Now you just have to fill in the blanks from here :)

Believe me when I say this - investing your time into learning how to make libraries, especially those dealing with the loading, manipulation and saving of data, will not only remove your dependence on the existence of such libraries but will also make you an all-around better programmer.

:)

EDIT

I don't know how much you already know about string manipulation and parsing; so if you get stuck I would be happy to help.