Merge daily and monthly Timeseries with Pandas
I have two dataframes, "bio" with monthly data, where I only need the column bio['Bio_Avg']. And another one "tc" with daily data. Both with Datetime index.
What I need is to merge both keeping daily granularity but adding monthly data repeating the value of the column bio['Bio_Avg'].
print(bio.head(5))
year month ncena ... contingency D Bio_Avg
Date ...
2019-12-01 2019 12 NaN ... NaN 1 38826.48
2019-11-01 2019 11 NaN ... NaN 1 33867.68
2019-10-01 2019 10 NaN ... NaN 1 31358.80
2019-09-01 2019 9 NaN ... NaN 1 29583.84
2019-08-01 2019 8 NaN ... NaN 1 27763.12
and
print(tc.head(5))
exchg_rate
2019-11-01 59.71
2019-11-02 59.74
2019-11-03 59.73
2019-11-04 59.70
2019-11-05 59.65
The expected result should be:
exchg_rate Bio_Avg
2019-11-01 59.71 33867.68
2019-11-02 59.74 33867.68
2019-11-03 59.73 33867.68
2019-11-04 59.70 33867.68
2019-11-05 59.65 33867.68
I've tried:
merge = pd.merge(tc, bio, left_index=True, right_index=True)
print(merge.head(5))
And I get:
exchg_rate year month ... contingency D Bio_Avg
2019-11-01 59.75 2019 11 ... NaN 1 33867.68
2019-12-01 59.94 2019 12 ... NaN 1 38826.48
Off course it keeps only the exact match with the date, but I loose the days in between. What could be an elegant way to solve this without breaking all dataframes?
Solution 1:
first we will reset the index on df2 and then merge on your month column, if the month column wasn't present we could use df['Date'].dt.month
#df1 = bio
#df2 = tc
#df2.reset_index(inplace=True)
df2 = df2.reset_index()
merged_df = pd.merge(
df2,
df[["month", "Bio_Avg"]],
left_on=df2['Date'].dt.month,
right_on="month"
).drop("month", axis=1)
print(merged_df)
Date exchg_rate Bio_Avg
0 2019-11-01 59.71 33867.68
1 2019-11-02 59.74 33867.68
2 2019-11-03 59.73 33867.68
3 2019-11-04 59.70 33867.68
4 2019-11-05 59.65 33867.68
Edit:
working with multiple columns to join on, instead of of using your pre-built month + year column, we can extract them from the actual Date
column ensure it's a date time by using pd.to_datetime(your_dfs['Date'])
final =pd.merge(
bio[["Bio_Avg", "Date"]],
tc,
left_on=[bio["Date"].dt.year, bio["Date"].dt.month],
right_on=[tc["Date"].dt.year, tc["Date"].dt.month],
how="right",
suffixes=("bio", "tc_"),
).drop(["key_0", "key_1"], axis=1)
print(final)
Bio_Avg Datebio Datetc_ exchg_rate
0 33867.68 2019-11-01 2019-11-01 59.71
1 33867.68 2019-11-01 2019-11-02 59.74
2 33867.68 2019-11-01 2019-11-03 59.73
3 33867.68 2019-11-01 2019-11-04 59.70
4 33867.68 2019-11-01 2019-11-05 59.65