Python parse CSV ignoring comma with double-quotes

Solution 1:

This should do:

import csv

lines = '''"AAA", "BBB", "Test, Test", "CCC"
           "111", "222, 333", "XXX", "YYY, ZZZ"'''.splitlines()
# skipinitialspace=True drops the spaces that precede each opening quote
for row in csv.reader(lines, quotechar='"', delimiter=',',
                      quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(row)

Output:

['AAA', 'BBB', 'Test, Test', 'CCC']
['111', '222, 333', 'XXX', 'YYY, ZZZ']
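
The same option works when the rows come from a file rather than a string. Here is a minimal sketch, assuming a hypothetical helper called read_rows; the temporary file only stands in for your real CSV. Opening the file with newline='' follows the csv module's recommendation for file objects.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Return every row of a CSV file whose quoted fields may follow a space."""
    # newline='' lets the csv module deal with line endings itself
    with open(path, newline='') as f:
        return list(csv.reader(f, skipinitialspace=True))


# Demo: write the sample data to a throwaway file, then parse it
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('"AAA", "BBB", "Test, Test", "CCC"\n')
    tmp.write('"111", "222, 333", "XXX", "YYY, ZZZ"\n')

rows = read_rows(tmp.name)
os.remove(tmp.name)

for row in rows:
    print(row)
```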

Solution 2:

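You have spaces before the quote characters in your input, so the reader does not treat those quotes as the start of a quoted field. Set skipinitialspace to True to skip any whitespace following a delimiter. The csv documentation describes the option like this:

When True, whitespace immediately following the delimiter is ignored. The default is False.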

>>> import csv
>>> lines = '''\
... "AAA", "BBB", "Test, Test", "CCC"
... "111", "222, 333", "XXX", "YYY, ZZZ" 
... '''
>>> reader = csv.reader(lines.splitlines())
>>> next(reader)
['AAA', ' "BBB"', ' "Test', ' Test"', ' "CCC"']
>>> reader = csv.reader(lines.splitlines(), skipinitialspace=True)
>>> next(reader)
['AAA', 'BBB', 'Test, Test', 'CCC']
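
To see why the csv module is worth using here at all, it helps to compare it with a plain str.split on the same line. This is only an illustrative sketch; the variable names are made up for the example.

```python
import csv

line = '"AAA", "BBB", "Test, Test", "CCC"'

# A naive split cuts the quoted field at its inner comma and keeps the quotes
naive = line.split(',')

# csv.reader honours the quotes; skipinitialspace drops the space after each comma
parsed = next(csv.reader([line], skipinitialspace=True))

print(len(naive))  # 5 pieces instead of 4 fields
print(parsed)
```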

Solution 3:

[Post edited to be clearer.] If you don't want commas inside double quotes to be treated as delimiters, so that your output keeps the commas inside the columns, here is another way of doing this. It is elegant and allows you to use cloud buckets to store your CSV file. The key is to use smart_open as a drop-in replacement for the standard file open.

Also, I am using DictReader instead of reader.

import csv
import json
from smart_open import open

with open('./temp.csv') as csvFileObj:
    reader = csv.DictReader(csvFileObj, delimiter=',', quotechar='"')
    # csv.reader requires bytestring input in python2, unicode input in python3
    for record in reader:
        # record is a dictionary of the csv record
        print(f'Record as json shows proper reading of file:\n {json.dumps(record, indent=4)})')
        print(f'You can reference an individual field too: {record["field3"]}')
        print(f'                                           {record["field4"]}')

Note that I passed two parameters to DictReader: delimiter=',' and quotechar='"'. Both are already the defaults, so they can be omitted; I spelled them out in case someone needs to change them. Real output from code:

Record as json shows proper reading of file:
 {
    "field1": "AAA",
    "field2": "BBB",
    "field3": "Test, Test",
    "field4": "CCC"
})
You can reference an individual field too: Test, Test
                                           CCC
done
Record as json shows proper reading of file:
 {
    "field1": "111",
    "field2": "222, 333",
    "field3": "XXX",
    "field4": "YYY, ZZZ"
})
You can reference an individual field too: XXX
                                           YYY, ZZZ

Input data file. (I added a header record for clarity. If your file has no header record, DictReader will consume the first record as the field names; pass the fieldnames parameter to avoid that.)

"field1","field2","field3","field4"
"AAA","BBB","Test, Test","CCC"
"111","222, 333","XXX","YYY, ZZZ"
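
For a file without a header record, csv.DictReader takes a fieldnames argument so that the first line is read as data. Below is a minimal sketch that uses io.StringIO in place of a real file; the field names are the same illustrative ones used above, not anything the csv module requires.

```python
import csv
import io

# Same data as above, but with no header line
data = io.StringIO(
    '"AAA","BBB","Test, Test","CCC"\n'
    '"111","222, 333","XXX","YYY, ZZZ"\n'
)

# Supplying fieldnames stops DictReader from using the first row as the header
names = ['field1', 'field2', 'field3', 'field4']
records = list(csv.DictReader(data, fieldnames=names))

print(len(records))
print(records[0]['field3'])
print(records[1]['field4'])
```

No delimiter or quotechar is passed here because the comma and the double quote are the defaults.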