Python Pandas Error tokenizing data
I'm trying to use pandas to manipulate a .csv file but I get this error:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12
I have tried to read the pandas docs, but found nothing.
My code is simple:
path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)
How can I resolve this? Should I use the csv
module or another language ?
File is from Morningstar
you could also try;
data = pd.read_csv('file1.csv', on_bad_lines='skip')
Do note that this will cause the offending lines to be skipped.
It might be an issue with
- the delimiters in your data
- the first row, as @TomAugspurger noted
To solve it, try specifying the sep
and/or header
arguments when calling read_csv
. For instance,
df = pandas.read_csv(filepath, sep='delimiter', header=None)
In the code above, sep
defines your delimiter and header=None
tells pandas that your source data has no row for headers / column titles. Thus saith the docs: "If file contains no header row, then you should explicitly pass header=None". In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}.
According to the docs, the delimiter thing should not be an issue. The docs say that "if sep is None [not specified], will try to automatically determine this." I however have not had good luck with this, including instances with obvious delimiters.
Another solution may be to try auto detect the delimiter
# use the first 2 lines of the file to detect separator
temp_lines = csv_file.readline() + '\n' + csv_file.readline()
dialect = csv.Sniffer().sniff(temp_lines, delimiters=';,')
# remember to go back to the start of the file for the next time it's read
csv_file.seek(0)
df = pd.read_csv(csv_file, sep=dialect.delimiter)