Python parse CSV ignoring comma with double-quotes
Solution 1:
This should do it:
import csv

lines = '''"AAA", "BBB", "Test, Test", "CCC"
"111", "222, 333", "XXX", "YYY, ZZZ"'''.splitlines()

for l in csv.reader(lines, quotechar='"', delimiter=',',
                    quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(l)
Output:
['AAA', 'BBB', 'Test, Test', 'CCC']
['111', '222, 333', 'XXX', 'YYY, ZZZ']
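In practice the data usually comes from a file rather than an inline string. Here is a minimal sketch of the same reader settings applied to a file on disk; the filename data.csv is just a placeholder:

import csv

# data.csv is a hypothetical file containing quoted rows like the ones above.
with open('data.csv', newline='') as f:
    reader = csv.reader(f, quotechar='"', delimiter=',',
                        quoting=csv.QUOTE_ALL, skipinitialspace=True)
    for row in reader:
        print(row)

Opening the file with newline='' is what the csv module documentation recommends, so that newlines embedded in quoted fields are handled correctly.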
Solution 2:
You have spaces before the quote characters in your input. Set skipinitialspace to True to skip any whitespace following a delimiter: when True, whitespace immediately following the delimiter is ignored; the default is False.
>>> import csv
>>> lines = '''\
... "AAA", "BBB", "Test, Test", "CCC"
... "111", "222, 333", "XXX", "YYY, ZZZ"
... '''
>>> reader = csv.reader(lines.splitlines())
>>> next(reader)
['AAA', ' "BBB"', ' "Test', ' Test"', ' "CCC"']
>>> reader = csv.reader(lines.splitlines(), skipinitialspace=True)
>>> next(reader)
['AAA', 'BBB', 'Test, Test', 'CCC']
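Not part of the original answer, but if you parse this kind of file in several places, one option is to register the format as a named dialect once and reuse it; the dialect name quoted_spaced below is arbitrary:

import csv

# Register a reusable dialect for comma-separated, double-quoted, space-padded data.
csv.register_dialect('quoted_spaced', delimiter=',', quotechar='"',
                     skipinitialspace=True)

lines = '''"AAA", "BBB", "Test, Test", "CCC"
"111", "222, 333", "XXX", "YYY, ZZZ"'''.splitlines()

for row in csv.reader(lines, dialect='quoted_spaced'):
    print(row)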
Solution 3:
If you don't want commas inside double-quoted fields to be treated as delimiters, so that the commas stay inside the column values, here is another way of doing it. It is elegant and allows you to use cloud buckets to store your CSV file. The key is to use smart_open as a drop-in replacement for the standard file open.
Also, I am using DictReader instead of reader.
import csv
import json
from smart_open import open

with open('./temp.csv') as csvFileObj:
    reader = csv.DictReader(csvFileObj, delimiter=',', quotechar='"')
    # The csv module requires bytestring input in Python 2, unicode input in Python 3
    for record in reader:
        # record is a dictionary of the csv record
        print(f'Record as json shows proper reading of file:\n {json.dumps(record, indent=4)}')
        print(f'You can reference an individual field too: {record["field3"]}')
        print(f' {record["field4"]}')
Note that I passed two parameters to DictReader explicitly: delimiter=',' and quotechar='"'. Both are in fact the defaults, but spelling them out makes it clear where to change them if your file uses different characters. Real output from the code:
Record as json shows proper reading of file:
{
"field1": "AAA",
"field2": "BBB",
"field3": "Test, Test",
"field4": "CCC"
}
You can reference an individual field too: Test, Test
CCC
done
Record as json shows proper reading of file:
{
"field1": "111",
"field2": "222, 333",
"field3": "XXX",
"field4": "YYY, ZZZ"
}
You can reference an individual field too: XXX
YYY, ZZZ
Input file:
(I added a header record for clarity. If you don't have a header record, DictReader will consume the first data row as field names; its fieldnames parameter covers that case, as sketched after the sample data below.)
"field1","field2","field3","field4"
"AAA","BBB","Test, Test","CCC"
"111","222, 333","XXX","YYY, ZZZ"