Search for optimal parameters in ARIMA?

I am learning about time-series forecasting. The ARIMA(p, d, q) model has three parameters: the lag order of the AR part (p), the order of integration, i.e. the number of differences (d), and the lag order of the MA part (q).

I was following this course, and there they simply estimated the best parameters by trying different models (adding lags) and checking whether the log-likelihood increased and the information criteria decreased. It seems quite pedestrian.

Is there a procedure I can run that searches for the optimal parameters, something like hyperparameter tuning? If there are no other options, I was thinking of adapting sklearn's GridSearch.


Solution 1:

There are two possibilities:

1. You can use the autocorrelation function (ACF) and the partial autocorrelation function (PACF).

Autocorrelation is the correlation of a time series with its own past observations (lags), i.e. the correlation of the series with itself. For a time series y(t), you calculate the correlation between y(t) and y(t-1), between y(t) and y(t-2), and so on.
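To make the definition concrete, here is a minimal NumPy sketch of the sample autocorrelation for lags 0 to k. The helper name `sample_acf` is hypothetical; it uses the common estimator that divides every lag's cross-product sum by the full-sample sum of squares (the same normalisation statsmodels' `acf` uses by default).

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelation r_k for k = 0..nlags.

    r_k = sum_{t=k}^{n-1} (y_t - mean)(y_{t-k} - mean) / sum_t (y_t - mean)^2
    """
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()            # deviations from the series mean
    denom = np.dot(dev, dev)      # full-sample sum of squares
    n = len(y)
    # Pair each y(t) with y(t-k) and sum the products of their deviations
    return np.array([np.dot(dev[k:], dev[:n - k]) / denom for k in range(nlags + 1)])

print(sample_acf([1, 2, 3, 4, 5], 2))
```

For the toy series `[1, 2, 3, 4, 5]` this gives approximately `[1.0, 0.4, -0.1]`: lag 0 is always 1, and the correlation weakens as the lag grows.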

The problem with autocorrelation is that so-called intermediary effects (indirect correlations) are also included. If y(t) correlates with y(t-1), and y(t-1) correlates with y(t-2), then y(t) and y(t-2) appear correlated as well, even if there is no direct relationship between them. You can find a more detailed explanation here:

https://otexts.com/fpp2/non-seasonal-arima.html

Partial autocorrelation also shows the correlation of a time series with its lags, but with the intermediary effects removed. That means the PACF shows only how y(t) is directly influenced by y(t-1), y(t-2), and so on. You may also want to have a look here:

https://towardsdatascience.com/time-series-from-scratch-autocorrelation-and-partial-autocorrelation-explained-1dd641e3076f
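One way to see the difference is to simulate a hypothetical AR(1) series, where y(t) depends directly only on y(t-1) with coefficient 0.8. Its lag-2 autocorrelation is still about 0.8² = 0.64 because of the indirect path through y(t-1), while its lag-2 partial autocorrelation is close to zero. The sketch below estimates the PACF by regression (the PACF at lag k is the coefficient on y(t-k) when y(t) is regressed on y(t-1), ..., y(t-k)); this is only one of several estimators, and `pacf_ols` is a made-up helper name, not a library function.

```python
import numpy as np

def pacf_ols(y, nlags):
    """PACF by regression: for each k, regress y_t on an intercept and
    y_{t-1}, ..., y_{t-k}, and keep the coefficient on y_{t-k}."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    out = [1.0]  # lag 0 is 1 by definition
    for k in range(1, nlags + 1):
        # Column j holds y_{t-j} for t = k..n-1
        lags = np.column_stack([y[k - j:n - j] for j in range(1, k + 1)])
        X = np.column_stack([np.ones(n - k), lags])
        coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        out.append(coef[-1])  # coefficient on the deepest lag y_{t-k}
    return np.array(out)

# Hypothetical AR(1) data: y_t = 0.8 * y_{t-1} + e_t
rng = np.random.default_rng(0)
phi, n = 0.8, 5000
eps = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

acf_lag2 = np.corrcoef(y[2:], y[:-2])[0, 1]  # approximately phi**2: indirect effect included
pacf_vals = pacf_ols(y, 3)                     # lag 1 near 0.8, lags 2 and 3 near 0
print(round(acf_lag2, 2), np.round(pacf_vals, 2))
```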

There are many rules of thumb for interpreting these plots. I recommend the following:

If the ACF trails off, use an AR model with the lags that show significant and strong correlations in the PACF.

If the PACF trails off, use an MA model with the lags that show significant and strong correlations in the ACF.

You can also have a look here:

https://towardsdatascience.com/identifying-ar-and-ma-terms-using-acf-and-pacf-plots-in-time-series-forecasting-ccb9fd073db8

2. You can use auto_arima()

The package pmdarima offers a function auto_arima() that automatically searches for the optimal parameters. It can find good values for p, P, q and Q, but you should determine d and D yourself (auto_arima can estimate them with statistical tests if you leave them unset, but it is safer to check them explicitly). It compares the candidate models by their AIC to find the best fit. Keep in mind that it is not 100% reliable and that you need to take care of stationarity yourself. In short, you can use it like this:

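```python
import pmdarima as pm
from pmdarima.arima import auto_arima

your_data = pm.datasets.load_wineind()  # example monthly series; replace with your own data
season_length = 12                      # length of the seasonal cycle (12 for monthly data)

model = auto_arima(y=your_data,
                   seasonal=True,        # set to False for non-seasonal data
                   m=season_length,      # only used if seasonal=True
                   trace=True)           # print every model tried, so you can see what is happening
print(model.summary())
```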

For more details, see:

https://alkaline-ml.com/pmdarima/tips_and_tricks.html

https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.auto_arima.html