Error in reading a CSV file in pandas [CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.]

Solution 1:

I ran into this error too. The cause was that the data contained some carriage returns ("\r"), which pandas was treating as line terminators, as if they were "\n". I'm posting it here because this may be a common cause of the error.
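To confirm this diagnosis before changing how the file is read, one option is to count carriage returns that are not part of a Windows "\r\n" pair. This is a minimal sketch: count_bare_cr is a hypothetical helper name, and it reads the whole file into memory, so it assumes the file fits in RAM:

```python
import re


def count_bare_cr(path):
    r"""Hypothetical helper: count b"\r" bytes not immediately followed by b"\n"."""
    with open(path, "rb") as f:  # binary mode, so no newline translation happens
        data = f.read()
    # Negative lookahead: match a CR only when the next byte is not an LF
    return len(re.findall(rb"\r(?!\n)", data))
```

A non-zero result for, e.g., count_bare_cr('test_error.csv') suggests the file contains stray carriage returns of the kind described above.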

The fix was to pass lineterminator='\n' to read_csv, like this:

import pandas as pd
# Only "\n" ends a row, so stray "\r" characters stay inside their fields
df_clean = pd.read_csv('test_error.csv',
                       lineterminator='\n')
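The effect can be reproduced on a tiny in-memory file. This sketch does not trigger the "Buffer overflow caught" message itself; assuming the default C parser, it only shows how a stray "\r" inside a field splits one record into two, and how lineterminator='\n' keeps the record intact:

```python
import io

import pandas as pd

# The second line holds a bare "\r" inside the value of column b
raw = b"a,b\n1,x\ry\n2,z\n"

# Default C parser: "\r" counts as a line break, so "y" becomes its own row
df_default = pd.read_csv(io.BytesIO(raw))

# Only "\n" ends a row; the "\r" is kept as part of the field value
df_fixed = pd.read_csv(io.BytesIO(raw), lineterminator="\n")

print(len(df_default), len(df_fixed))
print(repr(df_fixed.loc[0, "b"]))
```

One side effect to keep in mind: with lineterminator='\n', a file that uses Windows "\r\n" line endings keeps a trailing "\r" in the values of its last column.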

Solution 2:

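Alternatively, pass engine='python'. The "Buffer overflow caught" message comes from pandas' default C tokenizer, which the pure-Python parser does not use, so this should work even for a big file, although it is noticeably slower: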

import pandas as pd
df = pd.read_csv(file_, index_col=None, header=0, engine='python')  # file_ = path to your CSV
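Because the C parser is much faster, one pattern is to try it first and fall back to engine='python' only when tokenizing fails. This is a minimal sketch: read_csv_with_fallback is a hypothetical helper, not a pandas function, and it assumes the error is raised as pandas.errors.ParserError (the newer name for CParserError). The Python engine can still raise ParserError on genuinely malformed rows, so the fallback is not guaranteed to succeed:

```python
import io

import pandas as pd
from pandas.errors import ParserError


def read_csv_with_fallback(source, **kwargs):
    """Hypothetical helper: C engine first, Python engine if the C tokenizer fails."""
    try:
        return pd.read_csv(source, **kwargs)
    except ParserError:
        # A file-like source was partly consumed by the first attempt; rewind it
        if hasattr(source, "seek"):
            source.seek(0)
        return pd.read_csv(source, engine="python", **kwargs)


df = read_csv_with_fallback(io.StringIO("a,b\n1,2\n3,4\n"), header=0)
print(df.shape)
```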

Solution 3:

Not an answer, but too long for a comment (not to mention that code can't be formatted in comments).

Since reading the file with the csv module also fails, you can at least locate the line where the error occurs:

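import csv

# Python 3: open in text mode with newline='', as the csv docs recommend
with open(r"C:\work\DATA\Raw_data\store.csv", newline='') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            pass  # only iterate; we just want to know where parsing fails
    except Exception as e:
        # reader.line_num = number of lines read from the file so far
        print("Error line %d: %s %s" % (reader.line_num, type(e).__name__, e))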

Then look at what is on that line in store.csv.
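Hidden characters such as "\r" or NUL bytes are easy to miss in a text editor, so printing the raw bytes around the reported line can help. This is a minimal sketch; raw_lines_around is a hypothetical helper. It reads in binary mode, which splits lines on "\n" only, so if the file contains bare "\r" characters its numbering can fall behind the csv module's line_num; widen context if the reported line looks clean:

```python
def raw_lines_around(path, line_number, context=1):
    r"""Hypothetical helper: return (line_no, raw_bytes) pairs around a 1-based line.

    Binary mode keeps bytes such as b"\r" or b"\x00" visible in repr().
    """
    found = []
    with open(path, "rb") as f:
        for i, raw in enumerate(f, start=1):
            if i > line_number + context:
                break  # past the window, stop reading
            if i >= line_number - context:
                found.append((i, raw))
    return found
```

For example, `for n, raw in raw_lines_around(r"C:\work\DATA\Raw_data\store.csv", bad_line): print(n, repr(raw))`, where bad_line is a placeholder for the number printed by the csv loop.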