
1 Introduction

Efficient transport systems depend on cheap fuel, and one aspect of obtaining cheap fuel is planning around the fuel price. In Malaysia, the fuel price is announced weekly by the Ministry of Domestic Trade and Consumer Affairs, in conjunction with the Ministry of Finance, under the Managed Floating System (MFS) policy [1,2,3,4,5]. Transport owners, pump station managers and other stakeholders in petroleum fuel want to forecast the fuel price in order to optimize their fuel budgets. Under the MFS policy, the fuel price can be modelled by Eq. (1) [2, 6].

$$ P = A + B + C + D + E + F $$
(1)

Here, (A) is the refined fuel price, the Mean of Platts Singapore (MOPS), published by Platts [7]; (B) is Alpha, the difference between the MOPS and the actual purchase price from the refining companies; (C) is the tax or subsidy [8]; (D) is the operational cost of bulk storage, transportation and advertisement; (E) is the bulk distribution company’s margin; and (F) is the fuel station margin. Using Eq. (1) is difficult and expensive because the MOPS value (A) is not accessible to all: MOPS is a subscription-based time series, and small- and medium-scale entities are unwilling to subscribe to it because of the cost and the technicalities of its assessment methodology. To address this problem, ARIMA modelling of the historical weekly fuel price is explored. This paper focuses on modelling Ron97, a grade of petrol with a research octane number (RON) of 97. The MFS policy allows the Ron97 price to float with the international market price without subsidies, and the pricing policy for Ron97 has remained consistent. Ron97 is a premium product compared with Ron95 and diesel on the Malaysian market, which are sometimes subsidized [9,10,11].

2 Data

The data source for the study is the weekly Ron97 price announced by the Ministry of Domestic Trade and Consumer Affairs, Malaysia [3, 4, 12], from 7 April 2017 to 6 March 2020 (Fig. 1). The forecasting model is validated using the fuel price from 13 March 2020 to 7 August 2020. In some weeks of the time series, the MFS policy was suspended in favour of the APM policy, or the fuel price review period was changed to one month [1, 13]. In such cases, linear interpolation is applied to fill in the missing weekly fuel prices [14, 15]. The day of the week on which the fuel price is announced also changed over the span of the time series, so the dates are aligned to a common day by taking a weekly average of the historical Ron97 price, with Saturday as the common date point. The compiled data can be found in [16].
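The weekly alignment and gap filling described above can be sketched as follows. This is a minimal standard-library Python illustration, not the authors' procedure (the paper does not list one); it assumes the raw announcements are available as (date, price) pairs and that each week runs Sunday to Saturday. The names `week_ending_saturday` and `align_weekly` are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def week_ending_saturday(d: date) -> date:
    """Map a date to the Saturday that closes its (Sunday-to-Saturday) week."""
    # date.weekday(): Monday=0, ..., Saturday=5, Sunday=6
    return d + timedelta(days=(5 - d.weekday()) % 7)


def align_weekly(observations: list[tuple[date, float]]) -> dict[date, float]:
    """Average the prices within each Saturday-ending week, then fill weeks
    with no announcement by linear interpolation between known weeks."""
    buckets: dict[date, list[float]] = {}
    for d, price in observations:
        buckets.setdefault(week_ending_saturday(d), []).append(price)
    weekly = {wk: mean(prices) for wk, prices in buckets.items()}

    known = sorted(weekly)
    filled: dict[date, float] = {}
    wk = known[0]
    while wk <= known[-1]:  # step through every Saturday in the range
        if wk in weekly:
            filled[wk] = weekly[wk]
        else:
            # Nearest known weeks on either side of the gap
            prev = max(k for k in known if k < wk)
            nxt = min(k for k in known if k > wk)
            frac = (wk - prev).days / (nxt - prev).days
            filled[wk] = weekly[prev] + frac * (weekly[nxt] - weekly[prev])
        wk += timedelta(weeks=1)
    return filled
```

Calling `align_weekly` on the raw announcements returns one value per Saturday; a week without any announcement receives a value on the straight line between its neighbouring weeks.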

Fig. 1

Time series of Ron97 price in RM/litre

3 Methodology

The autoregressive integrated moving average (ARIMA) model of order (p, d, q), built with the Box–Jenkins methodology, is a forecasting method for non-stationary time series. The future value of a variable \(y_t\) (here the Ron97 price) is assumed to be a linear function of several past observations and random errors \(\varepsilon_t\), as represented by Eq. (2). ARIMA modelling is a three-part process: model identification, parameter estimation and diagnostic checking. Here, p is the autoregressive order, q is the moving-average order and d is the order of differencing of the time series (the fuel price) [17, 18].

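$$ \phi(L)(1 - L)^{d} y_{t} = c + \theta(L)\varepsilon_{t} $$
(2)

where \(L\) is the lag operator (\(L y_t = y_{t-1}\)), \(\phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p\) is the autoregressive polynomial, \(\theta(L) = 1 + \theta_1 L + \dots + \theta_q L^q\) is the moving-average polynomial and \(c\) is a constant.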

The models are built with the MATLAB Econometric Modeler [19] and implemented in MATLAB to produce forecasts. Validation and forecast performance are assessed by comparing the observed Ron97 price with the 18-week-horizon forecast results.
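A hold-out comparison of this kind can be sketched as below. The paper compares the forecast with the observed prices graphically; the RMSE and MAPE scores here are an assumed, illustrative addition rather than metrics reported in the paper, and `split_holdout` and `holdout_errors` are hypothetical helpers. The default horizon of 18 weeks follows the text.

```python
from math import sqrt


def split_holdout(series: list[float], horizon: int = 18) -> tuple[list[float], list[float]]:
    """Split a series into a training part and the final `horizon` points for validation."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]


def holdout_errors(observed: list[float], forecast: list[float]) -> dict[str, float]:
    """Score an h-step forecast against the observed validation prices.

    RMSE and MAPE are illustrative choices, not the paper's method.
    """
    if len(observed) != len(forecast) or not observed:
        raise ValueError("observed and forecast must be non-empty and equally long")
    errors = [o - f for o, f in zip(observed, forecast)]
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    # MAPE in percent; assumes no observed price is zero
    mape = 100 * sum(abs(e / o) for e, o in zip(errors, observed)) / len(errors)
    return {"rmse": rmse, "mape": mape}
```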

4 Results

The Ron97 fuel price is modelled with the three-step ARIMA procedure: model identification, parameter estimation and diagnostic checking.

4.1 ARIMA Modelling

Model Identification

The first step in identifying the ARIMA model is to check the Ron97 time series for stationarity. As Fig. 1 shows, the series is not stationary: the fuel price does not fluctuate uniformly around a constant mean with constant variance. The time series is therefore differenced to achieve stationarity. Figure 2a–c shows the sample autocorrelation (SAC) and sample partial autocorrelation (SPAC) of the first difference of Ron97, which is stationary. Both the SAC and the SPAC indicate an order of 14, since the leading significant spike in each correlogram occurs at lag 14 and the remaining correlations die down. ARIMA (14, 1, 14) is therefore the identified model.
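The differencing and SAC steps above can be illustrated with a short standard-library sketch. It assumes the weekly prices are a plain list of floats and flags lags whose sample autocorrelation exceeds the approximate 95% white-noise bound ±1.96/√n; the paper does not state which confidence bound its correlograms use, so that threshold is an assumption. The function names are hypothetical.

```python
from math import sqrt


def difference(y: list[float], d: int = 1) -> list[float]:
    """Apply the differencing operator (1 - L) d times."""
    for _ in range(d):
        y = [b - a for a, b in zip(y, y[1:])]
    return y


def sample_acf(x: list[float], max_lag: int) -> list[float]:
    """Sample autocorrelations r_1..r_max_lag (the SAC correlogram values)."""
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    c0 = sum(v * v for v in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def significant_lags(r: list[float], n: int) -> list[int]:
    """Lags whose |r_k| exceeds the approximate 95% bound 1.96 / sqrt(n)."""
    bound = 1.96 / sqrt(n)
    return [k for k, rk in enumerate(r, start=1) if abs(rk) > bound]
```

Applying `significant_lags` to `sample_acf(difference(prices, 1), max_lag)` lists the spiking lags of the first-differenced series, which is the kind of reading that yields an order such as 14 from the correlogram.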

Fig. 2

Correlograms of Ron97: (a–c) first difference, (d–f) second difference

The autoregressive order obtained from the SPAC and the moving-average order obtained from the SAC are too high for the first difference, and lower orders are generally preferred. The correlograms of the second difference (Fig. 2d–f) are therefore assessed to determine its order. The SPAC gives an order of 0, as its correlations do not die down. The SAC has a spike at lag 1 and dies down after it. Hence ARIMA (0, 2, 1) is identified at the second difference.
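The SPAC values used to read off the autoregressive order can be computed with the Durbin–Levinson recursion, which obtains the partial autocorrelation at each lag from the sample autocorrelations. The sketch below is an illustrative standard-library version under that assumption; it is not necessarily the estimator used by MATLAB's correlogram tools, and `sample_pacf` is a hypothetical name.

```python
def sample_pacf(x: list[float], max_lag: int) -> list[float]:
    """Sample partial autocorrelations (SPAC) for lags 1..max_lag via Durbin-Levinson."""
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    c0 = sum(v * v for v in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # r[0] = 1, r[k] = lag-k sample autocorrelation
    r = [1.0] + [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
                 for k in range(1, max_lag + 1)]

    pacf: list[float] = []
    phi_prev: list[float] = []  # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the AR(k) coefficients from the AR(k-1) ones
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf
```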

Model Estimation

The identified models are estimated using the Econometric Modeler app in the MATLAB Econometrics Toolbox. A t distribution is specified for Ron97, and the estimation process shows that the models are more parsimonious when the constant term c is zero. Both conditions are specified in the software during estimation. The best model is the one with the lowest Akaike information criterion (AIC) or Bayesian information criterion (BIC). Tables 1 and 2 summarize the parameter estimates of the models, and Table 3 their goodness of fit. Comparing the AIC and BIC values, ARIMA (14, 1, 14) is the most parsimonious and best-fitting model. Equation (3) gives the ARIMA (14, 1, 14) model.

$$ \left( 1 - \phi_{14} L^{14} \right)\left( 1 - L \right) y_{t} = \left( 1 + \theta_{14} L^{14} \right)\varepsilon_{t} $$
(3)
Table 1 Estimation results for ARIMA (14, 1, 14)
Table 2 Estimation results for ARIMA (0, 1, 2)
Table 3 Goodness of fit of the models
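The selection rule above (lowest AIC or BIC) can be sketched as follows, using the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where ln L is the maximized log-likelihood, k the number of estimated parameters and n the number of observations. The log-likelihoods would come from the estimation output; the function names and the input format are hypothetical.

```python
from math import log


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * log(n) - 2 * loglik


def best_model(fits: dict[str, tuple[float, int]], n: int, criterion: str = "aic") -> str:
    """Name of the fitted model with the lowest AIC or BIC.

    fits maps a model name to (maximized log-likelihood, number of parameters).
    """
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")

    def score(name: str) -> float:
        loglik, k = fits[name]
        return aic(loglik, k) if criterion == "aic" else bic(loglik, k, n)

    return min(fits, key=score)
```

BIC penalizes each extra parameter by ln n rather than 2, so with weekly data of this length it favours smaller models more strongly than AIC does.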

Equation (3) is expanded to give Eq. (4)

$$ y_{t} = y_{t-1} + \phi_{14} y_{t-14} - \phi_{14} y_{t-15} + \theta_{14} \varepsilon_{t-14} + \varepsilon_{t} $$
(4)
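The expansion from Eq. (3) to Eq. (4) amounts to multiplying lag polynomials. The hypothetical helper below, a standard-library sketch, stores a polynomial as a mapping from the power of L to its coefficient and returns the coefficients of the lagged prices once the equation is solved for y_t. It covers only the autoregressive side; the moving-average side of Eq. (3) needs no expansion.

```python
def poly_mul(a: dict[int, float], b: dict[int, float]) -> dict[int, float]:
    """Multiply two lag polynomials stored as {power of L: coefficient}."""
    out: dict[int, float] = {}
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] = out.get(i + j, 0.0) + ai * bj
    return {p: c for p, c in out.items() if c != 0.0}


def ar_coefficients_in_levels(ar: dict[int, float], d: int) -> dict[int, float]:
    """Expand phi(L)(1 - L)^d and return the coefficient of each y_{t-k}
    on the right-hand side when the equation is solved for y_t."""
    # phi(L) = 1 - sum_k phi_k L^k
    poly = {0: 1.0, **{k: -v for k, v in ar.items()}}
    for _ in range(d):
        poly = poly_mul(poly, {0: 1.0, 1: -1.0})
    # Move every lagged term to the right-hand side (sign flip)
    return {k: -c for k, c in sorted(poly.items()) if k > 0}
```

With `ar={14: -0.7084}` and `d=1` it returns `{1: 1.0, 14: -0.7084, 15: 0.7084}`, matching the y-terms of Eq. (5).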

Substituting the parameter estimates from Table 1 (estimation results for ARIMA (14, 1, 14)) into Eq. (4) gives the tentative forecasting model, Eq. (5)

$$ y_{t} = y_{t - 1} - 0.7084y_{t - 14} + 0.7084y_{t - 15} + 0.6308\varepsilon_{t - 14} + \varepsilon_{t} $$
(5)
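Eq. (5) can be iterated to produce a multi-step forecast. In the sketch below, which assumes the in-sample residuals are available alongside the prices, unknown future shocks are replaced by their expected value of zero, so beyond 14 steps ahead the moving-average term no longer draws on observed residuals. The coefficients are those of Eq. (5); the function name is hypothetical, and this is not necessarily how MATLAB's forecasting routine is implemented.

```python
PHI_14 = -0.7084   # AR coefficient at lag 14, from Eq. (5)
THETA_14 = 0.6308  # MA coefficient at lag 14, from Eq. (5)


def forecast_eq5(y: list[float], eps: list[float], horizon: int) -> list[float]:
    """Recursive multi-step forecast from Eq. (5).

    y   : observed prices, oldest first (at least 15 values)
    eps : in-sample residuals aligned with y (one per observation)
    """
    if len(y) < 15 or len(eps) != len(y):
        raise ValueError("need >= 15 observations and one residual per observation")
    y, eps = list(y), list(eps)  # copy so the caller's lists are untouched
    out: list[float] = []
    for _ in range(horizon):
        y_next = (y[-1] + PHI_14 * y[-14] - PHI_14 * y[-15]
                  + THETA_14 * eps[-14])
        y.append(y_next)
        eps.append(0.0)  # future shocks set to their expected value, zero
        out.append(y_next)
    return out
```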

Model Diagnostic Checking

The model is assessed for its adequacy in forecasting the Ron97 fuel price by examining its residuals with the Ljung–Box Q-test and a residual correlation plot. The Ljung–Box Q-test is a quantitative way to test jointly for autocorrelation at multiple lags. In performing the test, we set the degrees of freedom to 5, as there are only five independent variables in the model. The number of lags tested is increased in steps of 5, from lag 15 up to lag 150 (Table 4). The null hypothesis is: “The first m autocorrelations of the residuals of ARIMA_RON97 are jointly 0.” At the 5% significance level, the test does not reject this hypothesis, which implies that there is no significant autocorrelation in the ARIMA residuals. The residual correlation plot (Fig. 3) shows only one weak spike, at lag 23, which is of little significance. Hence the model is adequate for forecasting the Ron97 price in the neighbourhood of the period considered.
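The Ljung–Box statistic is Q = n(n + 2) Σ r_k² / (n − k), summed over k = 1..m, where r_k is the lag-k residual autocorrelation and m the number of lags; Q is compared with a chi-square distribution. The standard-library sketch below computes Q and its p-value; the optional `dof` argument mirrors the paper's choice of fixing the degrees of freedom and otherwise defaults to m. The names are hypothetical; in MATLAB the same test is available as `lbqtest`.

```python
from math import erfc, exp, gamma, sqrt
from typing import Optional


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer k dof."""
    if x <= 0:
        return 1.0
    y = x / 2
    if k % 2 == 0:
        # Even k: exp(-y) * sum_{i < k/2} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= y / i
            total += term
        return exp(-y) * total
    # Odd k: erfc(sqrt(y)) + exp(-y) * sum_{i < (k-1)/2} y^(i+1/2) / Gamma(i+3/2)
    total = erfc(sqrt(y))
    for i in range((k - 1) // 2):
        total += exp(-y) * y ** (i + 0.5) / gamma(i + 1.5)
    return total


def ljung_box(resid: list[float], lags: int, dof: Optional[int] = None) -> tuple[float, float]:
    """Ljung-Box Q statistic and p-value for residual autocorrelation up to `lags`."""
    n = len(resid)
    m = sum(resid) / n
    dev = [v - m for v in resid]
    c0 = sum(v * v for v in dev)
    q = 0.0
    for k in range(1, lags + 1):
        rk = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        q += rk * rk / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf(q, lags if dof is None else dof)
```

A p-value above 0.05 means the null hypothesis of no autocorrelation up to lag m is not rejected at the 5% level; a p-value below 0.05 rejects it.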

Table 4 Ljung–Box Q-test
Fig. 3

Residual SAC of ARIMA_Ron97

4.2 Model Forecast Performance and Validation

As the circled area in Fig. 4 shows, the ARIMA (14, 1, 14) forecast performs better than the ARIMA (0, 1, 2) forecast. Compared with the observed data, it is the second-best forecast in the graph; the nonlinear autoregressive neural network (NARNET) model performs even better.

Fig. 4

Forecast performance of the ARIMA models

5 Conclusion

The main objective of this paper is to assess the ability of ARIMA models to forecast the Ron97 fuel price. ARIMA models can forecast the fuel price accurately, but only over a short period of one month. Improvements could be made to the model to extend the forecast period.