Resample a distribution conditional on another value
I would like to create a series of simulated values by resampling from empirical observations. The data I have are time series at 1-minute frequency. The simulations should cover an arbitrary number of days, with the same times each day. The twist is that I need to sample conditional on the time: when sampling for a time of 8:00, it should be more probable to draw a value observed around 8:00 (but not limited to 8:00) in the original series.
I have made a small sketch to show how the draw distribution changes depending on the time for which a value is simulated:
That is, for T=0 it is more probable to draw a value from the original distribution where the time of day is close to 0, and improbable to draw one where the time of day is T=n/2 or later, where n is the number of unique timestamps in a day.
Here is a code snippet to generate sample data (I am aware that there is no need to sample conditionally on this test data; it only shows the structure of the data):
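To make the idea concrete, here is a minimal sketch of what such time-dependent draw probabilities could look like. It assumes a Gaussian shape with a standard deviation of one sixth of a day and no wrap-around at midnight; both are assumptions for illustration, and `draw_weights` is a hypothetical helper name.

```python
import numpy as np

def draw_weights(t, n, scale):
    """Probability of drawing from each time-of-day slot 0..n-1 when simulating slot t."""
    slots = np.arange(n)
    # Gaussian kernel centred on t (assumed shape, no wrap-around at midnight)
    w = np.exp(-0.5 * ((slots - t) / scale) ** 2)
    return w / w.sum()  # normalise so the weights sum to 1

n = 1440  # unique timestamps per day at 1-minute frequency
w_start = draw_weights(0, n, scale=n / 6)      # simulating T=0
w_mid = draw_weights(n // 2, n, scale=n / 6)   # simulating T=n/2

print(w_start[: n // 2].sum())  # share of the mass that falls before T=n/2
print(w_mid.argmax())           # most probable slot when simulating T=n/2
```

With these assumptions, almost all of the probability mass for T=0 lies before T=n/2, which is the behaviour described above.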
import numpy as np
import pandas as pd
# Create a test data frame (only for illustration)
# 'min' is the 1-minute frequency alias; the older 'T' alias is deprecated in recent pandas
df = pd.DataFrame(index=pd.date_range(start='2020-01-01', end='2020-12-31', freq='min'))
df['MyValue'] = np.random.normal(0, scale=1, size=len(df))
print(df)
MyValue
2020-01-01 00:00:00 0.635688
2020-01-01 00:01:00 0.246370
2020-01-01 00:02:00 1.424229
2020-01-01 00:03:00 0.173026
2020-01-01 00:04:00 -1.122581
...
2020-12-30 23:56:00 -0.331882
2020-12-30 23:57:00 -2.463465
2020-12-30 23:58:00 -0.039647
2020-12-30 23:59:00 0.906604
2020-12-31 00:00:00 -0.912604
[525601 rows x 1 columns]
# Objective: Create a new time series, where each time the values are
# drawn conditional on the time of the day
I have not been able to find an answer here that fits my requirements. All help is appreciated.
Solution 1:
I am going by this sentence from the question:
need to sample conditional on the time, i.e. when sampling for a time of 8:00, it should be more probable to sample a value around 8:00 (but not limited to 8:00) from the original series.
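Then, assuming the standard deviation is one sixth of the day (judging by your drawing), you can draw the time of day to sample from out of a normal distribution centered on the time being simulated:
# The result is a (fractional) time index, not yet a data value: round it and keep it within the day before indexing
value = np.random.normal(loc=current_time_sample, scale=total_samples/6)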
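To turn that draw into an actual resampled series, something along these lines could work. It is only a sketch under several assumptions of mine: the data start at midnight and have no gaps, draws that fall outside the day are simply redrawn (so there is no wrap-around at midnight), and the historical day is picked uniformly. `simulate_days` is a hypothetical helper, not an existing pandas or NumPy function.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Illustrative data only: 10 complete days of 1-minute observations.
n = 1440  # unique timestamps per day
df = pd.DataFrame(index=pd.date_range("2020-01-01", periods=10 * n, freq="min"))
df["MyValue"] = rng.normal(0, 1, size=len(df))

def simulate_days(values, n, n_days, rng, scale=None):
    """Hypothetical helper: resample n_days days conditional on time of day."""
    scale = n / 6 if scale is None else scale   # assumed: std = one sixth of a day
    values = np.asarray(values)
    values = values[: len(values) // n * n]     # drop a trailing partial day
    obs = values.reshape(-1, n)                 # rows = days, columns = time of day
    target = np.tile(np.arange(n), n_days)      # time of day being simulated
    # Draw the source time of day around each target time.
    src = np.rint(rng.normal(loc=target, scale=scale)).astype(int)
    bad = (src < 0) | (src >= n)
    while bad.any():                            # assumed: redraw anything outside the day
        src[bad] = np.rint(rng.normal(loc=target[bad], scale=scale)).astype(int)
        bad = (src < 0) | (src >= n)
    # Pick one historical day uniformly at random for every simulated point.
    day = rng.integers(0, obs.shape[0], size=src.size)
    return obs[day, src].reshape(n_days, n)

sim = simulate_days(df["MyValue"], n, n_days=3, rng=rng)
sim_index = pd.date_range("2021-01-01", periods=sim.size, freq="min")
simulated = pd.Series(sim.ravel(), index=sim_index, name="MyValue")
print(simulated.head())
```

If 23:59 should count as close to 00:00, replacing the redraw loop with `src % n` would wrap the draws around midnight instead; which behaviour is right depends on your data.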