Merging two CSV files using Python
When I'm working with csv
files, I often use the pandas library. It makes things like this very easy. For example:
import pandas as pd
a = pd.read_csv("filea.csv")
b = pd.read_csv("fileb.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='title')
merged.to_csv("output.csv", index=False)
Some explanation follows. First, we read in the csv files:
>>> a = pd.read_csv("filea.csv")
>>> b = pd.read_csv("fileb.csv")
>>> a
title stage jan feb
0 darn 3.001 0.421 0.532
1 ok 2.829 1.036 0.751
2 three 1.115 1.146 2.921
>>> b
title mar apr may jun Unnamed: 5
0 darn 0.631 1.321 0.951 1.7510 NaN
1 ok 1.001 0.247 2.456 0.3216 NaN
2 three 0.285 1.283 0.924 956.0000 NaN
and we see there's an extra column of data (note that the first line of fileb.csv
-- title,mar,apr,may,jun,
-- has an extra comma at the end). We can get rid of that easily enough:
>>> b = b.dropna(axis=1)
>>> b
title mar apr may jun
0 darn 0.631 1.321 0.951 1.7510
1 ok 1.001 0.247 2.456 0.3216
2 three 0.285 1.283 0.924 956.0000
Now we can merge a
and b
on the title column:
>>> merged = a.merge(b, on='title')
>>> merged
title stage jan feb mar apr may jun
0 darn 3.001 0.421 0.532 0.631 1.321 0.951 1.7510
1 ok 2.829 1.036 0.751 1.001 0.247 2.456 0.3216
2 three 1.115 1.146 2.921 0.285 1.283 0.924 956.0000
and finally write this out:
>>> merged.to_csv("output.csv", index=False)
producing:
title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0
You need to store all of the extra rows in the files in your dictionary, not just one of them:
dict1 = {row[0]: row[1:] for row in r}
...
dict2 = {row[0]: row[1:] for row in r}
Then, since the values in the dictionaries are lists, you need to just concatenate the lists together:
w.writerows([[key] + dict1.get(key, []) + dict2.get(key, []) for key in keys])