Python: skip comment lines marked with # in csv.DictReader
Actually, this works nicely with filter:
import csv

with open('samples.csv', newline='') as fp:
    # Drop lines that start with '#' before they reach the CSV parser
    rdr = csv.DictReader(filter(lambda row: not row.startswith('#'), fp))
    for row in rdr:
        print(row)
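To try the filter approach without a file on disk, here is a minimal sketch that runs it against an in-memory buffer. The sample data and the `is_data_line` helper are made up for illustration; the helper also ignores whitespace before the #, which the one-liner above does not.

```python
import csv
import io

# Hypothetical sample data standing in for samples.csv
sample = io.StringIO(
    "# exported by some tool\n"
    "name,value\n"
    "  # indented comment\n"
    "alpha,1\n"
    "beta,2\n"
)

def is_data_line(line):
    """Return True unless the line's first non-blank character is '#'."""
    return not line.lstrip().startswith('#')

# filter() hands only the non-comment lines to DictReader,
# so the first surviving line becomes the header row
rows = list(csv.DictReader(filter(is_data_line, sample)))
for row in rows:
    print(row)
```

A named predicate is a little easier to extend than the lambda, for example if blank lines should be skipped too.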
Good question. Python's csv module has no built-in support for comments, even though they are not uncommon at the top of CSV files. While Dan Stowell's solution works for the specific case of the OP, it is limited in that # must appear as the first character of the line. A more generic solution would be:
import csv

def decomment(csvfile):
    for row in csvfile:
        # Keep only the text before the first '#' (this also cuts a '#' inside a quoted field)
        raw = row.split('#')[0].strip()
        if raw:
            yield raw

with open('dummy.csv') as csvfile:
    reader = csv.reader(decomment(csvfile))
    for row in reader:
        print(row)
As an example, the following dummy.csv file:
# comment
# comment
a,b,c # comment
1,2,3
10,20,30
# comment
returns
['a', 'b', 'c']
['1', '2', '3']
['10', '20', '30']
Of course, this works just as well with csv.DictReader().
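As a rough illustration of the csv.DictReader() variant, the sketch below reuses the decomment generator from above and reads the same dummy.csv content from an in-memory buffer, which is an assumption made only so the snippet runs without a file.

```python
import csv
import io

def decomment(lines):
    """Yield each line with any trailing '# ...' removed, skipping empty results."""
    for line in lines:
        raw = line.split('#')[0].strip()
        if raw:
            yield raw

# Same content as the dummy.csv example, held in memory
dummy = io.StringIO(
    "# comment\n"
    " # comment\n"
    "a,b,c # comment\n"
    "1,2,3\n"
    "10,20,30\n"
    "# comment\n"
)

# The first non-comment line ("a,b,c") supplies the field names
records = list(csv.DictReader(decomment(dummy)))
for record in records:
    print(record)
```

Each record maps the header names a, b, and c to the string values of one data row.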
Another way to read a CSV file is to use pandas. Here's some sample code:
import pandas as pd

df = pd.read_csv('test.csv',
                 sep=',',              # field separator
                 comment='#',          # ignore everything after '#' on a line
                 index_col=0,          # number or label of index column
                 skipinitialspace=True,
                 skip_blank_lines=True,
                 on_bad_lines='warn',  # pandas >= 1.3; replaces error_bad_lines=False, warn_bad_lines=True
                 ).sort_index()
print(df)
df.fillna('no value', inplace=True)  # replace NaN with 'no value'
print(df)
For this CSV file:
a,b,c,d,e
1,,16,,55#,,65##77
8,77,77,,16#86,18#
#This is a comment
13,19,25,28,82
we will get this output:
       b   c     d   e
a
1    NaN  16   NaN  55
8   77.0  77   NaN  16
13  19.0  25  28.0  82
           b   c         d   e
a
1   no value  16  no value  55
8         77  77  no value  16
13        19  25        28  82