Concatenate Pandas columns under new multi-index level
Given a dictionary of data frames like:
dict = {'ABC': df1, 'XYZ' : df2} # of any length...
where each data frame has the same columns and similar index, for example:
data Open High Low Close Volume
Date
2002-01-17 0.18077 0.18800 0.16993 0.18439 1720833
2002-01-18 0.18439 0.21331 0.18077 0.19523 2027866
2002-01-21 0.19523 0.20970 0.19162 0.20608 771149
What is the simplest way to combine all the data frames into one, with a multi-index like:
symbol ABC XYZ
data Open High Low Close Volume Open ...
Date
2002-01-17 0.18077 0.18800 0.16993 0.18439 1720833 ...
2002-01-18 0.18439 0.21331 0.18077 0.19523 2027866 ...
2002-01-21 0.19523 0.20970 0.19162 0.20608 771149 ...
I've tried a few methods - eg for each data frame replace the columns with a multi-index like .from_product(['ABC', columns])
and then concatenate along axis=1
, without success.
You can do it with concat
(the keys
argument will create the hierarchical columns index):
d = {'ABC' : df1, 'XYZ' : df2}
print pd.concat(d.values(), axis=1, keys=d.keys())
XYZ ABC \
Open High Low Close Volume Open High
Date
2002-01-17 0.18077 0.18800 0.16993 0.18439 1720833 0.18077 0.18800
2002-01-18 0.18439 0.21331 0.18077 0.19523 2027866 0.18439 0.21331
2002-01-21 0.19523 0.20970 0.19162 0.20608 771149 0.19523 0.20970
Low Close Volume
Date
2002-01-17 0.16993 0.18439 1720833
2002-01-18 0.18077 0.19523 2027866
2002-01-21 0.19162 0.20608 771149
Really concat
wants lists so the following is equivalent:
print(pd.concat([df1, df2], axis=1, keys=['ABC', 'XYZ']))
Add a symbol column to your dataframes and set the index to include the symbol column, concat and then unstack that level:
The following assumes that there are as many symbols as DataFrames in your dict, and also that you check that the order of symbols is as you want it based on the order of the dict keys:
DF_dict = {'ABC': df1, 'XYZ' : df2}
dict_keys = DF_dict.keys()
symbols = ['ABC', 'ZXY']
for x in xrange(len(symbols)):
DF_dict[dict_keys[x]]['symbol'] = symbols[x]
DF_dict[dict_keys[x]].reset_index(inplace = True)
DF_dict[dict_keys[x]].set_index(['symbol', 'Date'], inplace = True)
DF = pd.concat(DF_dict[df] for df in dict_keys)
DF = DF.unstack('symbol')
I think that would be the approach I would take. Some people are against the inplace
syntax. I use it here only as convenience.