Merging pandas dataframes generated by for loop
Let's say I have the code below:
import pandas as pd
import numpy as np
import random
import string
data = ['DF1', 'DF2', 'DF2']
for i in data:
    DF = pd.DataFrame([random.choices(string.ascii_lowercase, k=5), [10, 11, 12, 13, 14]]).T
    DF.columns = ['col1', 'col2']
    DF['i'] = i
So for each i, I have a different DF. Finally, I need to merge all those data frames on col1 and add up the numbers in col2 row-wise.
In this case, the total number of such data frames depends on the length of the data list and is therefore variable. In R, we can use the do.call() function to merge such a varying number of data frames. Is there any way to achieve this in Python?
For example, let's say we have 3 individual tables as below:
After joining on col1, I expect the table below (sorted by col1):
Any pointer will be highly appreciated.
Solution 1:
IIUC, try:
df1 = pd.DataFrame({'col1':[*'adrtg']
,'col2':[10,11,12,13,14]
,'data':['DF1']*5})
df2 = pd.DataFrame({'col1':[*'adspq']
,'col2':[10,11,12,13,14]
,'data':['DF2']*5})
df3 = pd.DataFrame({'col1':[*'dcxyz']
,'col2':[10,11,12,13,14]
,'data':['DF3']*5})
pd.concat([df1, df2, df3]).groupby('col1', as_index=False)['col2'].sum()
Output:
   col1  col2
0     a    20
1     c    11
2     d    32
3     g    14
4     p    13
5     q    14
6     r    12
7     s    12
8     t    13
9     x    12
10    y    13
11    z    14
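To tie this back to the loop in the question: the same concat + groupby approach should work for any number of frames if each one is appended to a list inside the loop. This is only a minimal sketch. The list name frames, the fixed random seed, and the use of 'DF3' as the third label are my assumptions, not part of the original code.

```python
import random
import string

import pandas as pd

random.seed(0)  # assumption: fixed seed only so the sketch is reproducible

data = ['DF1', 'DF2', 'DF3']  # assumption: three distinct labels
frames = []  # hypothetical name: collects one frame per loop iteration

for i in data:
    df = pd.DataFrame({
        'col1': random.choices(string.ascii_lowercase, k=5),
        'col2': [10, 11, 12, 13, 14],
    })
    df['i'] = i
    frames.append(df)

# Stack all frames, then sum col2 for each distinct col1 value.
# groupby sorts the group keys by default, so the result is ordered by col1.
result = (
    pd.concat(frames, ignore_index=True)
      .groupby('col1', as_index=False)['col2']
      .sum()
)
print(result)
```

Because pd.concat accepts a list of any length, nothing here depends on how many entries data has. Note that letters repeated within a single frame are summed as well, which may or may not be what you want.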