Error reading a CSV file in pandas [CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.]
Solution 1:
I ran into this error. The cause was that the data contained some carriage returns ("\r") that pandas was treating as line terminators, as if they were "\n". I thought I'd post here, as this might be a common reason for the error.
The solution I found was to pass lineterminator='\n' to read_csv, like this:
import pandas as pd
# Treat only "\n" as the end of a row, so a stray "\r" stays inside its field
df_clean = pd.read_csv('test_error.csv',
                       lineterminator='\n')
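To check whether stray carriage returns are actually present before changing the read_csv call, something like the following could help. It is only a sketch: the helper name find_bare_carriage_returns is made up for illustration, and it assumes the file is read in binary mode so that lines are split on "\n" only.

```python
import re

# Matches a carriage return that is not immediately followed by a line feed.
BARE_CR = re.compile(rb"\r(?!\n)")

def find_bare_carriage_returns(lines):
    # `lines` is any iterable of bytes lines, e.g. a file opened with mode "rb".
    # Returns the 1-based numbers of the lines that contain a bare "\r".
    hits = []
    for number, line in enumerate(lines, start=1):
        if BARE_CR.search(line):
            hits.append(number)
    return hits

# Possible usage (the file name is the one from the answer above):
#     with open('test_error.csv', 'rb') as f:
#         print(find_bare_carriage_returns(f))
```

An empty list would suggest that carriage returns are not the cause and that the problem lies elsewhere.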
Solution 2:
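If you are working in Python and it's a big file, you can pass
engine='python'
to read_csv, as below, and it should work. The Python parsing engine is slower than the default C engine but more feature-complete.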
df = pd.read_csv(file_, index_col=None, header=0, engine='python')
Solution 3:
Not an answer, but too long for a comment (not to mention the code formatting).
Since it also breaks when you read the file with the csv module, you can at least locate the line where the error occurs:
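import csv

# Python 3: open in text mode with newline='' as the csv module expects
with open(r"C:\work\DATA\Raw_data\store.csv", newline='') as f:
    reader = csv.reader(f)
    linenumber = 1  # number of the row about to be read
    try:
        for row in reader:
            linenumber += 1
    except Exception as e:
        # The first row that fails to parse ends up here
        print("Error line %d: %s %s" % (linenumber, type(e), e))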
Then look at what is on that line in store.csv.
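Hidden characters are hard to spot in an editor, so printing the repr() of the suspect line and its neighbours can make that inspection easier. This is a rough sketch, not part of the original answer: lines_around and its context parameter are hypothetical names. Note also that the counter above counts csv rows, which can differ from physical line numbers if a quoted field spans several lines.

```python
def lines_around(lines, target, context=1):
    # `lines` is any iterable of lines, e.g. a file opened with mode "rb".
    # Returns (line_number, repr(line)) pairs for the target line and
    # `context` lines on either side, so "\r", "\x00" etc. become visible.
    first = max(1, target - context)
    last = target + context
    found = []
    for number, line in enumerate(lines, start=1):
        if number > last:
            break  # no need to read the rest of a big file
        if number >= first:
            found.append((number, repr(line)))
    return found

# Possible usage, with the line number reported by the loop above:
#     with open(r"C:\work\DATA\Raw_data\store.csv", "rb") as f:
#         for number, text in lines_around(f, linenumber):
#             print(number, text)
```

Reading in binary mode keeps the bytes exactly as they are on disk, so a stray "\r" shows up as \r in the output instead of being treated as a line break.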