How to merge two pandas DataFrames into a single MultiIndex DataFrame?
I have two DataFrames with identical indexes and column names, but each represents a different aspect of my full dataset.
For instance:
import pandas as pd
from datetime import date
df_price = pd.DataFrame(
    index=pd.date_range(start=date(2021, 1, 1), end=date(2021, 1, 3), freq="D"),
    columns=["A", "B", "C"],
    data={"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
)
df_quantity = pd.DataFrame(
    index=pd.date_range(start=date(2021, 1, 1), end=date(2021, 1, 3), freq="D"),
    columns=["A", "B", "C"],
    data={"A": [9, 8, 7], "B": [6, 5, 4], "C": [3, 2, 1]},
)
What I want is the equivalent of doing this:
index = pd.MultiIndex.from_product([["A", "B", "C"], ["price", "quantity"]], names=["first", "second"])
df_total = pd.DataFrame(
    index=pd.date_range(start=date(2021, 1, 1), end=date(2021, 1, 3), freq="D"),
    columns=index,
    data=[[1, 9, 4, 6, 7, 3], [2, 8, 5, 5, 8, 2], [3, 7, 6, 4, 9, 1]],
)
first          A              B              C
second     price quantity price quantity price quantity
2021-01-01     1        9     4        6     7        3
2021-01-02     2        8     5        5     8        2
2021-01-03     3        7     6        4     9        1
Any ideas? I have tried the common methods of join and merge, but all I could do is add the columns with suffixes.
One option:
(i) join the two DataFrames, giving each side a suffix
(ii) split the column names on '_' and, because we want to use from_tuples, map the resulting sublists to tuples
(iii) use pd.MultiIndex.from_tuples to convert the columns to a MultiIndex
(iv) sort the column names to match the desired outcome
df_total = df_price.join(df_quantity, lsuffix='_price', rsuffix='_quantity')
df_total.columns = pd.MultiIndex.from_tuples(map(tuple, df_total.columns.str.split('_')))
df_total = df_total.reindex(df_total.columns.sort_values(), axis=1)
Output:
               A              B              C
           price quantity price quantity price quantity
2021-01-01     1        9     4        6     7        3
2021-01-02     2        8     5        5     8        2
2021-01-03     3        7     6        4     9        1
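Note that splitting on '_' would likely misbehave if a column name itself contained an underscore. An alternative that avoids string handling altogether is pd.concat with keys. The following is only a sketch under the assumption that both frames share the same index and columns, as in the question; df_concat is just an illustrative name, and the level names "first" and "second" are copied from the desired output:

```python
import pandas as pd

# Same sample data as in the question
idx = pd.date_range("2021-01-01", "2021-01-03", freq="D")
df_price = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}, index=idx)
df_quantity = pd.DataFrame({"A": [9, 8, 7], "B": [6, 5, 4], "C": [3, 2, 1]}, index=idx)

df_concat = (
    # keys become the outer column level: ("price", "A"), ..., ("quantity", "C")
    pd.concat([df_price, df_quantity], axis=1, keys=["price", "quantity"])
    # move the original column names (A, B, C) to the outer level
    .swaplevel(axis=1)
    # group the columns as A/price, A/quantity, B/price, ...
    .sort_index(axis=1)
)

# Optional: name the column levels as in the desired output
df_concat.columns.names = ["first", "second"]

print(df_concat)
```

The keys passed to pd.concat play the role of the suffixes in the join approach, so no parsing of column names is needed afterwards.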