Converting output from subprocess to csv.reader object
This is a Python 3 issue: the csv module works on text (str), not byte strings, and subprocess pipes return bytes by default. In addition, csv.reader()
needs an iterable that yields strings, such as a file opened in text mode or a list of strings. Try this:
import csv
import subprocess

encoding = 'ascii'    # specify the encoding of the CSV data
p2 = subprocess.Popen(['sort', '/tmp/data.csv'], stdout=subprocess.PIPE)
# communicate() returns (stdout, stderr) as bytes; decode stdout to str
output = p2.communicate()[0].decode(encoding)
edits = csv.reader(output.splitlines(), delimiter=",")
for row in edits:
    print(row)
If /tmp/data.csv contains (I've used commas as the separator):
1,2,3,4
9,10,11,12
a,b,c,d
5,6,7,8
then the output would be:
['1', '2', '3', '4']
['5', '6', '7', '8']
['9', '10', '11', '12']
['a', 'b', 'c', 'd']
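For reuse, the decode-then-parse approach could be wrapped in a small helper. This is only a sketch: the function name read_csv_from_command is made up for illustration, and the child process is a Python one-liner standing in for sort so the example runs anywhere. It feeds the decoded text through io.StringIO with newline="" instead of splitlines(), because splitlines() strips the line terminators and a newline inside a quoted field may then not be preserved.

```python
import csv
import io
import subprocess
import sys


def read_csv_from_command(cmd, encoding="ascii", delimiter=","):
    """Run cmd, decode its stdout, and return the parsed CSV rows as a list.

    Hypothetical helper, not part of the csv or subprocess modules.
    """
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    text = result.stdout.decode(encoding)
    # newline="" keeps line endings untranslated, which is what the csv
    # docs recommend for file-like objects handed to csv.reader().
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# Stand-in for ['sort', '/tmp/data.csv']: a child that prints two CSV lines.
child = [sys.executable, "-c", "print('9,10,11,12'); print('1,2,3,4')"]
rows = read_csv_from_command(child)
for row in rows:
    print(row)
```

Replacing child with ['sort', '/tmp/data.csv'] should give the same kind of result as the snippet above, assuming sort is on the PATH.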
The following works for me (even though the subprocess docs warn against reading from stdout directly and recommend communicate() to avoid deadlocks). Wrapping stdout with an io.TextIOWrapper() supports newlines embedded in field data.
It also allows a generator to be used, which has the advantage of letting stdout be read incrementally, one line at a time.
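import csv
import io
import os
import subprocess

p2 = subprocess.Popen(["sort", "tabbed.csv"], stdout=subprocess.PIPE)
# wrap the byte pipe so that csv.reader() receives str lines
output = io.TextIOWrapper(p2.stdout, newline=os.linesep)
edits = csv.reader((line for line in output), delimiter="\t")
for row in edits:
    print(row)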
Output:
['1', '2', '3', '4']
['5', '6', '7', '8']
['9', '10', '11', '12']
['a', 'b\r\nx', 'c', 'd']
The tabbed.csv input test file contained this (where » represents a tab character and ≡ a newline character):
1»2»3»4
9»10»11»12
a»"b≡x"»c»d
5»6»7»8
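A slightly more defensive variant of the streaming approach might look like the sketch below. The name iter_csv_rows is hypothetical, the utf-8 default is an assumption about the data, and a Python one-liner stands in for sort so the example is self-contained. It passes newline="" to the wrapper (the setting the csv docs recommend for file objects) and uses context managers so the pipe is closed and the child is waited on.

```python
import csv
import io
import subprocess
import sys


def iter_csv_rows(cmd, encoding="utf-8", delimiter="\t"):
    """Yield CSV rows from cmd's stdout as the child process produces them.

    Hypothetical helper; encoding and delimiter defaults are assumptions.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        # newline="" leaves line endings untranslated, so newlines inside
        # quoted fields reach csv.reader() exactly as the child wrote them.
        with io.TextIOWrapper(proc.stdout, encoding=encoding, newline="") as text:
            yield from csv.reader(text, delimiter=delimiter)
    # Leaving the Popen context has waited for the child to exit.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


# Stand-in for ["sort", "tabbed.csv"]: writes two tab-separated records,
# the second with a newline embedded in a quoted field.
child_code = r"""import sys; sys.stdout.buffer.write(b'1\t2\n"b\nx"\tc\n')"""
rows = list(iter_csv_rows([sys.executable, "-c", child_code]))
for row in rows:
    print(row)
```

Because rows are yielded one at a time, a caller can start processing before the child has finished; the exit-status check only runs once the output has been fully consumed.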