1 Introduction

Over the years, the retail prices of petroleum fuels in Malaysia (Ron95, Ron97 and Diesel) have been controlled by the government using the Automatic Price Mechanism (APM), which kept fuel prices relatively stable until 2004. After 2004, fuel prices became volatile even with the APM still in place. After the policy changed to a managed float system in 2016, fuel prices have still not been stable. The volatility has been attributed to fluctuations in the international crude oil price and the foreign exchange rate, and to the reduction of subsidies to improve the government’s fiscal space [1].

The unstable nature of the fuel price creates a need for forecasting. For some forecasters, however, modelling and forecasting the fuel price has become difficult. This is because the APM model, Eq. (1), the popular model for fuel price forecasting, uses the input variable MOPS (A), which is published regularly but sold to forecasters, making it difficult to access. The inputs to Eq. (1) are: (A) the refined fuel price, Mean of Platts Singapore (MOPS), published by Platts [2]; (B) Alpha, the difference between the MOPS and the actual purchasing price from the refinery companies; (C) tax/subsidy [3, 4]; (D) operational cost at bulk storage for transportation and advertisement; and (E) and (F) the margins of the bulk distribution company and the fuel station [5, 6].

$$P = A + B + C + D + E + F$$
(1)

In this paper, we apply the time series method Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) to model and forecast the petroleum fuel price of Ron97 using publicly available data.

2 Data

The primary data used in this paper are the published weekly price of Ron97 [7,8,9], the daily crude oil price per barrel (WTI, Brent and OPEC) [10, 11] and the daily foreign exchange rate (selling rate) of the Ringgit per US dollar [12]. The data are pre-processed to clean and standardize them for the modelling process. The crude oil prices and foreign exchange rates are converted to weekly averages; missing daily values are replaced with the weekly average. The international crude oil price (X) is multiplied by the foreign exchange rate (G), in Ringgit per US dollar (RM/USD) [12], to give the price in Ringgit per barrel (RM/bbl), (I), using Eq. (2). The crude oil price is then converted to Ringgit per litre, (Q), using the barrel-to-litre metric conversion (M) [13], as shown in Eq. (3). The compiled data are available at https://doi.org/10.17632/zxjnrpmwd8.1 [14].

$$I = X \times G$$
(2)
$$Q = I \times M$$
(3)
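The pre-processing and Eqs. (2) and (3) can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function names are hypothetical, the grouping by ISO calendar week is an assumption (the paper does not state how weeks are delimited), and M is taken as the reciprocal of 158.987 litres per US oil barrel.

```python
from collections import defaultdict
from datetime import date

LITRES_PER_BARREL = 158.987      # one US oil barrel in litres
M = 1.0 / LITRES_PER_BARREL      # barrels per litre, playing the role of M in Eq. (3)


def weekly_mean(daily):
    """Average (date, value) pairs by ISO week, skipping missing (None) days.

    Skipping a missing day gives the same weekly mean as first replacing it
    with the average of the remaining days in that week.
    """
    buckets = defaultdict(list)
    for day, value in daily:
        if value is not None:
            iso = day.isocalendar()
            buckets[(iso[0], iso[1])].append(value)
    return {week: sum(vals) / len(vals) for week, vals in sorted(buckets.items())}


def to_ringgit_per_litre(crude_per_bbl, rate_rm_per_usd):
    """Apply Eq. (2), I = X * G, then Eq. (3), Q = I * M."""
    i_rm_per_bbl = crude_per_bbl * rate_rm_per_usd
    return i_rm_per_bbl * M


# Hypothetical inputs, for illustration only (not values from the dataset).
daily_crude = [(date(2020, 3, 2), 50.0), (date(2020, 3, 3), None), (date(2020, 3, 4), 52.0)]
weekly_crude = weekly_mean(daily_crude)
price_rm_per_litre = to_ringgit_per_litre(weekly_crude[(2020, 10)], 4.2)
```

The same `weekly_mean` helper would be applied to the daily exchange rate before the two weekly series are multiplied.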

The data used to fit the model range from 7 April 2017 to 6 March 2020, and the data used for validation range from 13 March 2020 to 7 August 2020. We use a correlation matrix to select the crude oil price series best suited to modelling Ron97. Figure 1 shows the correlation matrix of Ron97 and the three crude oil prices assessed. The OPEC crude oil price has the highest correlation with Ron97, 0.79, compared with WTI and Brent. The OPEC crude oil price is therefore selected as the exogenous variable for modelling the Ron97 price. Figure 2 shows the data selected for Ron97 price modelling.

Fig. 1 Correlation matrix of Ron97 and crude oil price in Malaysia

Fig. 2 Selected data for Ron97 modelling
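The selection of the exogenous series by correlation can be sketched as below. This assumes the correlation matrix holds Pearson coefficients, which the paper does not state explicitly; the function names and the toy series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_predictor(target, candidates):
    """Return (name, r) for the candidate most strongly correlated with target."""
    scores = {name: pearson(series, target) for name, series in candidates.items()}
    name = max(scores, key=lambda k: abs(scores[k]))
    return name, scores[name]


# Toy series for illustration only; they are not the paper's data.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "series_a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "series_b": [5.0, 3.0, 4.0, 1.0, 2.0],
}
chosen, r = best_predictor(target, candidates)
```

With the paper's weekly data, `target` would be Ron97 and `candidates` the WTI, Brent and OPEC series.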

3 Methodology

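An Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) model, Eq. (4), can be viewed as a multiple regression model with one or more autoregressive (AR) terms and/or one or more moving average (MA) terms. Here L is the lag operator, \(\phi (L) = 1 - \phi_{1} L - \cdots - \phi_{p} L^{p}\) is the AR polynomial of order p, \(\theta (L) = 1 + \theta_{1} L + \cdots + \theta_{q} L^{q}\) is the MA polynomial of order q, d is the differencing order, and \(x_{i}\) are the n explanatory variables with coefficients \(\beta_{i}\).

$$\phi (L)(1 - L)^{d} y_{t} = c + \sum\limits_{i = 1}^{n} x_{i} \beta_{i} + \theta (L)\varepsilon_{t}$$
(4)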

This method is suitable for forecasting when the data are stationary or nonstationary, and multivariate with any type of data pattern. ARIMAX is related to ARIMA, but while ARIMA is suited to univariate datasets, ARIMAX is suited to analyses with additional explanatory variables (multivariate) in categorical and/or numeric format [15,16,17,18]. ARIMAX modelling has three stages: model identification, model estimation and diagnostic checking [19]. The modelling and the 18-week forecast are implemented in MATLAB. The model is validated by plotting the ARIMAX forecast against the ARIMA and NARNET forecasts and the actual data recorded over the same period [20,21,22].

4 Results

The results are presented in two parts: the modelling and the evaluation of the forecast.

4.1 ARIMAX Modelling

The Ron97 fuel price is modelled in a three-step procedure:

Model Identification

The precondition for identifying the ARIMAX model is that the time series is stationary, that is, it has no unit root [23]; multiple time series should therefore be cointegrated [24]. We consider two time series, Ron97 and OPEC, for the ARIMAX modelling. A suitable ARIMA model is first identified for the dependent variable, Ron97; the regression on the independent variable, OPEC, is then added to that ARIMA model (Fig. 3). The sample autocorrelation (SAC) of Ron97 dies down sharply at lag 1 at the second differencing, as shown in Fig. 4; thus stationarity is reached at a differencing order of 2 and the moving average order is 1. The sample partial autocorrelation (SPAC) at the second differencing, Fig. 5, does not die down, which implies an autoregressive order of 0. The ARIMA model for Ron97 is therefore set at ARIMA (0, 2, 1). The ARIMA model is extended to ARIMAX by introducing the exogenous variable, OPEC (Fig. 6). OPEC must not be a unit-root time series. Based on Phillips-Perron tests [25] at a significance level of 0.05, OPEC at differencing orders 0, 1 and 2 (OPEC, OPECDiff, OPECDiffDiff) does not contain a unit root, as shown in Table 1. We also need an ARIMA model of OPEC to forecast the exogenous values in the ARIMAX model. In the SAC and SPAC of OPEC, shown in Figs. 7 and 8 respectively, the SAC does not die down, but the SPAC is considered to die down at lag 2. Thus ARIMA (2, 0, 0) is chosen as the model for OPEC. The distributions of Ron97 and OPEC are also considered. In the histograms of the correlation matrix in Fig. 1, Ron97 and OPEC have a skewed tail on the left side, which we take as characteristic of the t-distribution; hence the t-distribution is specified instead of the Gaussian for all ARIMA and ARIMAX models that are estimated.

Fig. 3 Second difference of Ron97

Fig. 4 Sample autocorrelation (SAC) of Ron97

Fig. 5 Sample partial autocorrelation (SPAC) of Ron97

Fig. 6 Time series of OPEC

Table 1 Phillips-Perron test on OPEC time series

Fig. 7 Sample autocorrelation (SAC) of OPEC

Fig. 8 Sample partial autocorrelation (SPAC) of OPEC
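The differencing and sample autocorrelation used in the identification step can be sketched as below. This is an illustrative stand-in for the MATLAB plots, with hypothetical function names; the approximate 95% band of ±1.96/√n for judging whether a lag "spikes" is a common rule of thumb and an assumption here, not something the paper specifies.

```python
import math


def difference(series, order=1):
    """Apply the operator (1 - L) to the series `order` times."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((v - mean) ** 2 for v in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def significant_lags(acf, n):
    """Lags whose autocorrelation lies outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(n)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# Toy series for illustration only.
second_diff = difference([1, 4, 9, 16, 25], order=2)
toy = [1, -1, 1, -1, 1, -1]
toy_acf = sample_acf(toy, max_lag=2)
toy_spikes = significant_lags(toy_acf, len(toy))
```

For the identification described above, `sample_acf(difference(ron97, 2), max_lag)` would be inspected for the lag after which the autocorrelations stay inside the band, where `ron97` is the weekly price series.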

Model Estimation

The MATLAB Econometric Modeler application [26] is used to estimate the model parameters. The Model Identification step identified the tentative model, ARIMAX (0, 2, 1) with t-distributed innovations and regression coefficients (beta) for the exogenous predictor time series (OPEC, OPECDiff, OPECDiffDiff). The ARIMA model associated with the ARIMAX is estimated first, then the exogenous predictors are introduced. Table 2 gives the estimated parameters of the ARIMA (0, 2, 1) model, Eq. (5). Equation (5) is expanded to give Eq. (6), and the estimated parameters are substituted into Eq. (6) to give Eq. (7).

$$\left( {1 - L} \right)^{2} y_{t} = \left( {1 + \theta_{1} L} \right)\varepsilon_{t}$$
(5)
$$y_{t} = 2y_{t - 1} - y_{t - 2} + \varepsilon_{t} + \theta_{1} \varepsilon_{t - 1}$$
(6)
$$y_{t} = 2y_{t - 1} - y_{t - 2} + \varepsilon_{t} - 0.9801\varepsilon_{t - 1}$$
(7)
$$(1 - L)^{2} y_{t} = X_{1} \beta_{1} + X_{2} \beta_{2} + X_{3} \beta_{3} + \left( {1 + \theta_{1} L} \right)\varepsilon_{t}$$
(8)
Table 2 Estimation results ARIMA (0, 2, 1) (t-distribution) for Ron97

Table 3 presents the parameter estimates of the tentative ARIMAX model, Eq. (8), which has the exogenous time series variables OPEC, OPECDiff and OPECDiffDiff. We want the most parsimonious ARIMAX model for Ron97. From Table 3, the regression coefficients of OPECDiff and OPECDiffDiff are not statistically significant at the 5% significance level; their p-values are 9.7% and 50.4% respectively. On the other hand, the coefficient of OPEC is significant, with a p-value of 1.3%. Re-estimating the ARIMAX model with OPEC as the only exogenous variable gives a p-value of 2.51% (Table 4).

Table 3 Estimation results ARIMAX (0, 2, 1) (t-distribution) for Ron97 with OPEC, OPECDiff and OPECDiffDiff as exogenous variables
Table 4 Parameter estimation results, ARIMAX (0, 2, 1) (t-distribution) for Ron97 with OPEC as exogenous variable
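The screening of the exogenous regressors by p-value can be written as a short filter. The p-values are those quoted in the text for Table 3; the function name is hypothetical, and the model would still need to be re-estimated with the retained regressors, as the paper does for Table 4.

```python
def keep_significant(p_values, alpha=0.05):
    """Return the regressors whose coefficient p-value is below alpha."""
    return [name for name, p in p_values.items() if p < alpha]


# p-values reported in the text for the tentative model of Eq. (8).
table3_p_values = {"OPEC": 0.013, "OPECDiff": 0.097, "OPECDiffDiff": 0.504}
retained = keep_significant(table3_p_values)
```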

Re-estimating the model of Eq. (8) as Eq. (9) makes it more parsimonious. The Akaike information criterion (AIC) or the Bayesian information criterion (BIC) of Eq. (9) is smaller than that of Eq. (8), and the errors of Eq. (9) are smaller than those of Eq. (8), as shown in Table 5. Hence the final model chosen is ARIMAX (0, 2, 1) with OPEC as the exogenous variable, Eq. (9).

$$\left( {1 - L} \right)^{2} y_{t} = X_{1} \beta_{1} + \left( {1 + \theta_{1} L} \right)\varepsilon_{t}$$
(9)
$$y_{t} = 2y_{t - 1} - y_{t - 2} + X_{1} \beta_{1} + \varepsilon_{t} + \theta_{1} \varepsilon_{t - 1}$$
(10)
$$y_{t} = 2y_{t - 1} - y_{t - 2} - 0.00014X_{1} + \varepsilon_{t} - \varepsilon_{t - 1}$$
(11)

Note that the constant term was omitted when specifying the model, which helped achieve a more parsimonious model. Expanding Eq. (9) gives Eq. (10), and substituting the parameters from Table 4 into Eq. (10) gives Eq. (11) (Table 5).

Table 5 Goodness of fit for ARIMAX
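For reference, the information criteria used to compare Eqs. (8) and (9) follow the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations and ln L the maximized log-likelihood. The sketch below uses hypothetical inputs; it does not reproduce the values in Table 5.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical log-likelihood, parameter counts and sample size (illustration only).
aic_larger_model = aic(-100.0, 3)
aic_smaller_model = aic(-100.0, 2)
bic_smaller_model = bic(-100.0, 2, 150)
```

At equal log-likelihood the model with fewer parameters scores lower on both criteria, which is the sense in which dropping insignificant regressors favours parsimony.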

Lastly, we estimate the OPEC ARIMA model, which is needed to update the exogenous variable \(X_{t}\) over the forecast horizon in the ARIMAX model. Table 6 gives the model parameters and Eq. (12) is the associated equation. Expanding Eq. (12) gives Eq. (13), and substituting the parameters from Table 6 into Eq. (13) gives Eq. (14), the exogenous variable simulator for the ARIMAX model.

Table 6 Estimation results of ARIMA(2, 0, 0) (t-distribution) with OPEC
$$\left( {1 - \phi_{1} L - \phi_{2} L^{2} } \right)y_{t} = \varepsilon_{t}$$
(12)
$$y_{t} = \phi_{1} y_{t - 1} + \phi_{2} y_{t - 2} + \varepsilon_{t}$$
(13)
$$y_{t} = 1.324y_{t - 1} - 0.324y_{t - 2} + \varepsilon_{t}$$
(14)

In the current scenario, a one-step-ahead forecast can be made using Eq. (15), with the exogenous variable updated using Eq. (16). These equations are deduced from Eqs. (11) and (14).

$$y_{t + 1} = 2y_{t} - y_{t - 1} - 0.00014X_{t + 1} + \varepsilon_{t + 1} - \varepsilon_{t}$$
(15)
$$X_{t} = 1.324X_{t - 1} - 0.324X_{t - 2} + \varepsilon_{t}$$
(16)
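The forecast recursion in Eqs. (15) and (16) can be sketched as follows, using the coefficients printed in those equations. This is an illustration rather than the authors' MATLAB implementation: the function names and inputs are hypothetical, future shocks are replaced by their expected value of zero, and only the first step uses the last in-sample residual.

```python
BETA = -0.00014              # coefficient on the exogenous variable in Eq. (15)
PHI1, PHI2 = 1.324, -0.324   # AR(2) coefficients in Eq. (16)


def forecast_exog(x_prev2, x_prev1, steps):
    """Iterate Eq. (16) forward with future shocks set to zero."""
    path = []
    for _ in range(steps):
        x_next = PHI1 * x_prev1 + PHI2 * x_prev2
        path.append(x_next)
        x_prev2, x_prev1 = x_prev1, x_next
    return path


def forecast_price(y_prev2, y_prev1, last_resid, x_future):
    """Iterate Eq. (15) forward, one step per forecast value of X."""
    path = []
    resid = last_resid
    for x_next in x_future:
        # E[eps_{t+1}] = 0, so only the lagged residual term remains.
        y_next = 2 * y_prev1 - y_prev2 + BETA * x_next - resid
        path.append(y_next)
        y_prev2, y_prev1 = y_prev1, y_next
        resid = 0.0  # residuals inside the forecast horizon are unknown
    return path


# Hypothetical starting values, for illustration only.
x_path = forecast_exog(1000.0, 1000.0, steps=3)
y_path = forecast_price(2.0, 2.0, last_resid=0.01, x_future=x_path)
```

Calling `forecast_price` with a single element in `x_future` gives the one-step-ahead forecast; longer horizons chain the two recursions as above.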

Diagnostic Checking

After fitting the ARIMAX model we are left with the residual time series of Ron97. We assess the residuals to confirm the adequacy of the ARIMAX model, applying the residual autocorrelation plot and the Ljung-Box Q test as diagnostic checks [27]. The sample autocorrelation of the residuals is very weak, as the correlation plot shows no spikes. This is corroborated by the Ljung-Box Q test: the null hypothesis, ‘the first m autocorrelations of the residuals of ARIMAX_RON97 are jointly zero’, is not rejected. This implies the ARIMAX model is adequate for forecasting the fuel price in the neighbourhood of the period considered.
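The Ljung-Box statistic behind this check is Q = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1..m, where r_k is the lag-k sample autocorrelation of the residuals. A minimal sketch follows; the function names are hypothetical, and the chi-square critical value is left to the caller because the appropriate degrees of freedom depend on m and on the number of fitted ARMA parameters.

```python
def ljung_box_q(residuals, m):
    """Ljung-Box Q statistic over the first m residual autocorrelations."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, m + 1):
        ck = sum((residuals[t] - mean) * (residuals[t - k] - mean) for t in range(k, n))
        r_k = ck / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


def is_adequate(q, critical_value):
    """True when Q does not exceed the chi-square critical value supplied."""
    return q <= critical_value


# Toy residuals with strong lag-1 dependence, for illustration only.
toy_resid = [1.0, -1.0] * 10
q_toy = ljung_box_q(toy_resid, m=1)
# 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom.
toy_ok = is_adequate(q_toy, 3.841)
```

The toy series fails the check by construction; residuals from an adequate model should give a Q below the critical value.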

4.2 Model Forecast Performance and Validations

ARIMAX was better at forecasting Ron97 for the first month than the benchmarks, ARIMA and NARNET, as can be seen in Fig. 9.

Fig. 9 Forecast performance of the ARIMAX models
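The comparison in Fig. 9 is visual. One way to quantify it, offered here as a suggestion rather than as part of the paper's method, is to compute error metrics such as RMSE and MAPE over a chosen horizon. The function names, model labels and numbers below are hypothetical, and treating the first month as four weekly observations is an assumption.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equal-length series."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error; actual values must be non-zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rank_models(actual, forecasts, horizon):
    """Order model names by RMSE over the first `horizon` observations."""
    scores = {name: rmse(actual[:horizon], f[:horizon]) for name, f in forecasts.items()}
    return sorted(scores, key=scores.get)


# Hypothetical prices and forecasts, for illustration only.
actual = [2.0, 2.0, 2.0, 2.0]
forecasts = {
    "model_a": [2.0, 2.1, 2.0, 1.9],
    "model_b": [2.2, 2.2, 2.2, 2.2],
}
ranking = rank_models(actual, forecasts, horizon=4)
```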

5 Conclusion

This paper has examined the ability of ARIMAX to model the fuel price of Ron97 in Malaysia using the Ron97 time series and an exogenous time series, the crude oil price, in this case OPEC. The results indicate that the modelling and forecasting can be done accurately using ARIMAX.