Converting output from subprocess to csv.reader object

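This is a Python 3 issue: the csv module works on text (str), not bytes, and Popen.communicate() returns bytes unless text mode is requested, so the output has to be decoded first. In addition, csv.reader() needs an iterable of lines, such as an open file or a list of strings; passing one big string makes it iterate character by character. Try this: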

import csv
import subprocess

encoding = 'ascii'    # specify the encoding of the CSV data
p2 = subprocess.Popen(['sort', '/tmp/data.csv'], stdout=subprocess.PIPE)
output = p2.communicate()[0].decode(encoding)  # communicate() returns (stdout, stderr) as bytes
edits = csv.reader(output.splitlines(), delimiter=",")  # one string per line
for row in edits:
    print(row)

If /tmp/data.csv contains (I've used commas as the separator):

1,2,3,4
9,10,11,12
a,b,c,d
5,6,7,8

then the output would be:

['1', '2', '3', '4']
['5', '6', '7', '8']
['9', '10', '11', '12']
['a', 'b', 'c', 'd']
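On Python 3.7 and later, subprocess.run() can do the decoding itself. The sketch below is an alternative to the Popen/communicate() code above, not the answer's own method: it first shows that csv.reader rejects undecoded bytes, then passes text=True so stdout arrives as str. The small Python child process is a hypothetical stand-in for sort, so the sketch runs without a Unix toolchain.

```python
import csv
import subprocess
import sys

# Hypothetical child process standing in for `sort`, so the sketch runs anywhere.
child = [sys.executable, "-c", "print('9,10,11,12'); print('1,2,3,4')"]

# Without text mode, stdout is bytes, and csv.reader raises csv.Error on bytes lines.
raw = subprocess.run(child, capture_output=True, check=True).stdout
try:
    list(csv.reader(raw.splitlines()))
    bytes_rejected = False
except csv.Error:
    bytes_rejected = True

# text=True decodes stdout to str, so no manual .decode() call is needed.
text = subprocess.run(child, capture_output=True, text=True, check=True).stdout
rows = list(csv.reader(text.splitlines()))
for row in rows:
    print(row)
```

text=True decodes with the locale's preferred encoding; passing encoding='ascii' (or whatever the data actually uses) instead mirrors the explicit .decode(encoding) call in the answer.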

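The following works for me, even though the subprocess docs recommend communicate() over reading p.stdout directly (the warning is about deadlocks when another OS pipe buffer fills up, which mainly matters when more than one pipe is in use). Wrapping p2.stdout in an io.TextIOWrapper() decodes it to text and supports newlines embedded in quoted fields.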

Doing this also allows a generator to be used, which has the advantage of reading stdout incrementally, one line at a time, instead of collecting the whole output first as communicate() does.

import csv
import io
import os
import subprocess

p2 = subprocess.Popen(["sort", "tabbed.csv"], stdout=subprocess.PIPE)
output = io.TextIOWrapper(p2.stdout, newline=os.linesep)  # decode the byte stream to text
edits = csv.reader((line for line in output), delimiter="\t")
for row in edits:
    print(row)

Output:

['1', '2', '3', '4']
['5', '6', '7', '8']
['9', '10', '11', '12']
['a', 'b\r\nx', 'c', 'd']

The tabbed.csv test input file contained the following (where » represents a tab character and ≡ a newline character):

1»2»3»4
9»10»11»12
a»"b≡x"»c»d
5»6»7»8
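Here is a self-contained sketch of the streaming approach, with one deliberate change: it uses newline="" instead of os.linesep, which is what the csv module documentation recommends for streams read by csv.reader. It also passes the text stream straight to csv.reader, since any iterable of lines works and the generator expression is optional. The Python child process and the payload are hypothetical stand-ins for sort and tabbed.csv.

```python
import csv
import io
import subprocess
import sys

# Hypothetical tab-separated data with a quoted field that contains a newline,
# modelled on the tabbed.csv test file.
payload = 'a\t"b\nx"\tc\td\n1\t2\t3\t4\n'

# A Python child process stands in for `sort`. It writes the payload as raw
# bytes so the line endings are the same on every platform.
child_code = f"import sys; sys.stdout.buffer.write({payload.encode('ascii')!r})"
proc = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)

rows = []
# newline="" passes line endings through untranslated, as the csv docs
# recommend; the reader then rejoins the quoted field that spans two lines.
with io.TextIOWrapper(proc.stdout, encoding="ascii", newline="") as stream:
    for row in csv.reader(stream, delimiter="\t"):  # iterates stdout one line at a time
        rows.append(row)
        print(row)

proc.wait()  # reap the child once its stdout has been consumed
```

Because this child writes \n line endings, the embedded newline comes back as 'b\nx' rather than the 'b\r\nx' shown above, whose input evidently used \r\n line endings.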
