How to do row-to-column transposition of data in csv table?
I'm new to scripting. I have a table (Table1.txt
) and I need to create another table that has Table1's rows arranged in columns and vice versa. I have found solutions to this problem for Perl and SQL but not for Python.
I just started learning Python two days ago, so this is as far as I got:
import csv
import sys
with open(sys.argv[1], "rt") as inputfile:
readinput = csv.reader(inputfile, delimiter='\t')
with open("output.csv", 'wt') as outputfile:
writer = csv.writer(outputfile, delimiter="\t")
for row in readinput:
values = [row[0], row[1], row[2], row[3]]
writer.writerow([values])
This just reproduces the columns as columns. What I would have liked to do now is to write the last line as writer.writecol([values])
but it seems that there is no command like that and I haven't found another way of writing rows as columns.
@Ashwini's answer is perfect. The magic happens in
zip(*lis)
Let me explain why this works: zip takes (in the simplest case) two lists and "zips" them: zip([1,2,3], [4,5,6])
will become [(1,4), (2,5), (3,6)]
. So if you consider the outer list to be a matrix and the inner tuples to be the rows, that's a transposition (ie., we turned the rows to columns).
Now, zip
is a function of arbitrary arity, so it can take more then two arguments:
# Our matrix is:
# 1 2 3
# 4 5 6
# 7 8 9
zip([1,2,3], [4,5,6], [7,8,9])
>>> [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
# Now it is
# 1 4 7
# 2 5 8
# 3 6 9
The problem we're facing is that in your case, we don't know how many arguments we want to pass to zip
. But at least, we already know the arguments: they are the elements of lis
! lis
is a list, and each element of that list is a list as well (corresponding to one line of numbers in your input file). The *
is just Pythons way of telling a function "please use the elements of whatever follows as your arguments and not the thing itself!"
So
lis = [[1,2,3], [4,5,6]]
zip(*lis)
is exactly the same as
zip([1,2,3], [4,5,6])
Congrats, now you're a Python pro! ;-)
The solution in general to transpose a sequence of iterables is: zip(*original_list)
sample input:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
program:
with open('in.txt') as f:
lis = [x.split() for x in f]
for x in zip(*lis):
for y in x:
print(y+'\t', end='')
print('\n')
output:
1 6 11
2 7 12
3 8 13
4 9 14
5 10 15
Since we are talking about columns, rows and transposes, perhaps it is worth it to mention numpy
>>> import numpy as np
>>> x = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
>>> x
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
>>> x.T
array([[ 1, 4, 7, 10],
[ 2, 5, 8, 11],
[ 3, 6, 9, 12]])
Just to construct on @Akavall answer, if you want to read from a file, transpose and then save again just do:
from numpy import genfromtxt, savetxt
data = genfromtxt('in.txt')
savetxt('out.txt',data.T)
data.T
in the 3rd line is where the data gets transposed.
Here's one way to do it, assume for simplicity that you just want to print out the objects in order:
# lets read all the data into a big 2d array
buffer = []
for row in readinput:
values = [row[0], row[1], row[2], row[3]]
buffer.append(values)
# what you have in your code
for i in range(len(buffer)):
for j in range(len(buffer[0])):
print buffer[i][j]
# this is called a transpose; we have buffer[i][j] to read row then column,
# switch i and j around to do the opposite
for i in range(len(buffer[0])):
for j in range(len(buffer)):
print buffer[j][i]
Since you need an array to pass to writer.writerow
, you could do this
for i in range(len(buffer[0])):
writer.writerow([buffer[j][i] for j in range(len(buffer))])