18.1 Introduction

Over the last decade, containerization and ports have played important roles in international trade [1]. Containerization is an important element of the logistics and security innovations that revolutionized freight handling in the twentieth century. Container throughput time series are characterized by cycles, seasonality, mutability, and randomness. These traits are determined by the economic structure and market development of the port’s hinterland [2].

Several econometric models have previously been used for container throughput forecasting. Forecasting methods are either qualitative or quantitative, and quantitative methods can be further classified into time series and causal methods [3, 4]. Time series methods extrapolate patterns in historical data into the future and are suited to short-term forecasting. Causal methods assume a relationship between the variables involved, for instance between port throughput and gross domestic product (GDP), and are useful for medium- to long-term forecasting [5, 6]. The authors of [7] applied forecasting methods to the short-term prediction of port cargo, while the authors of [8, 9] addressed medium- and long-term forecasting of port throughput. Critical reviews note that a variety of short-term methods have been used for forecasting and window analysis of container throughput [10]. The authors of [11] used a modified regression model for short-term forecasting of the volumes of import and export containers in Taiwan. The authors of [12] compared six univariate models for short-term forecasting of container throughput, also in Taiwan. The authors of [13] compared the short-term forecasting accuracy of three models and found genetic programming to be the best. Similarly, Ref. [14] proposed an algorithmic method combining projection pursuit regression and genetic programming for short-term forecasting.

The container throughput time series is usually complex; thus, a single model based on linear assumptions, or a nonlinear dynamic model, often cannot achieve satisfactory forecasting performance. An increasing number of researchers have constructed hybrid forecasting models to solve this problem. For example, Ref. [15] proposed three hybrid models based on least squares support vector regression (LSSVR). In this study, both ARIMA and SARIMA models are applied to container throughput data for Malaysian ports over the period 2007–2015 in order to forecast future container demand in Malaysia. Finally, the forecasted container throughput for 2017 and 2018 is compared with the actual throughput to give a clear comparison of the two.

The remainder of this paper is organized as follows. The literature review is presented in Sect. 18.2. Section 18.3 describes the methodology for forecasting container throughput with ARIMA and SARIMA models. Model identification and forecasting are then discussed in Sect. 18.4. Section 18.5 presents results and discussion. Finally, Sect. 18.6 concludes the study.

18.2 Maritime Studies

18.2.1 Seaport Operation

The methods commonly applied in maritime studies are either parametric or non-parametric models. The authors of [16, 17] employed regression and neural network methods, respectively, to forecast container growth in Hong Kong. The authors of [15, 18] used genetic programming and modified regression models for container throughput forecasts in Taiwan. The authors of [13] presented six univariate forecasting models for the container throughput at three major seaports in Taiwan. The authors of [19] combined traditional fuzzy set theory with regression analysis and developed a fuzzy regression model to forecast import and export cargo volumes in Taiwan seaports. The authors of [20] recommended a vector error correction model for long-term forecasting of container throughput in the Hong Kong seaport. Beyond forecasting, Ref. [21] focused on port competition rather than port competitiveness, with an emphasis on competitive advantages. Competition or cooperation models can in turn be classified into qualitative and quantitative ones. Among qualitative approaches to port competition, Ref. [22] used questionnaires to identify the criteria that container ship owners apply to a port. The authors of [23] discussed local and regional competition and cooperation between Hong Kong and South China ports in terms of administrative and ownership structures. The authors of [24] conducted a study on Copenhagen Malmö Port by referring to [24] and found several advantages of cooperation, such as more effective use of port resources and specialization, which leads toward economies of scale through the use of all resources.

Previous maritime research also discusses port competition or cooperation. Refs. [25, 26] used market share evaluation, the growth-share matrix, shift-share analysis, and evaluation of the Hirschman-Herfindahl index (HHI) to analyze the dynamics of container traffic and port concentration within the EU port system. The authors of [27] discussed port concentration dynamics in Eastern Asia over the period 1975–2005 using a variant of the HHI called the geo-economic concentration index (GECI). The authors of [28] modeled the market share of the North Sea container ports of Rotterdam, Antwerp, Bremen, and Hamburg using a logit model based on quality of service. The authors of [29] presented an algorithm based on a multi-criteria method, the hierarchical fuzzy process method, to identify the competitiveness of container ports in Asia. Apart from that, research using slot capacity analysis to detect port competition, such as [30], analyzed developments in container port competition between 10 major ports in East Asia during 1995–2001. The authors of [31] examined competition dynamics between the South-East Asian ports of Klang, Singapore, and Tanjung Pelepas during 1999–2004. The authors of [11] introduced a port market share forecasting model that explicitly modeled port competition by considering origin and destination, as well as cargo type, ship size, maritime access, port capacity and efficiency, and hinterland transport. The authors of [32] examined the competition between the ports of Busan in South Korea and Kobe in Japan using a non-cooperative game model. The authors of [33] took a qualitative approach to competition and cooperation among Japanese container ports. The authors of [34] highlighted cointegration analysis, which has become a popular method for analyzing relationships between seaports. The authors of [20] used a structural vector error correction model to conduct a detailed study of competition between the ports of Hong Kong and Singapore.

The authors of [30] analyzed the long- and short-run competition dynamics between 10 major container ports in East Asia from 1980 to 2001 using the vector autoregressive model (VARM) and Johansen’s cointegration test. The authors of [35] analyzed relationships among six main Asian ports using a cointegration test, together with the Granger causality test for short-term relationships. The authors of [36] studied competition–cooperation relationships between four Liaoning ports (China) using the VARM. They interpreted negative and positive signs of coefficients in the regression equations as indications of competition and cooperative competition. VAR-type models have drawbacks: the variables must be stationary in first differences, and the analyst must then decide the number of lags and optionally select an intercept and trend. As no unique VARM follows from a given data set, the resulting model is statistical and not necessarily realistic. The authors of [37] discussed the expansion of container traffic in the port of Koper at the beginning of this millennium. The authors of [38] compared three forecasting models for quarterly container throughput in the ports of Koper, Trieste, Venice, and Ravenna during 2002–2012. They found the ARIMA model to be superior to the other models for each port in their study. The authors of [37] provided an analysis of NAPs similar to that of [24] for the EU container port system.

Typically, in the literature, parametric or non-parametric forecasting models are adopted for this purpose. Parametric models assume a model structure that can be described by known mathematical expressions, while non-parametric models do not assume any definite functional form relating the dependent and independent variables. Although many forecasting models have been developed in the literature, most models to date either lack consideration of short-term seasonal variations, measured in terms of monthly container throughput, or simply focus on a specific seaport or country. Hence, issues such as periodicity, complexity and spatial applicability may not be appropriately addressed. Therefore, there is a need to develop a forecasting model for container terminals in Malaysia that incorporates seasonality along with spatial considerations.

a. ARIMA versus SARIMA

ARIMA and SARIMA models extend the ARMA class to include more realistic features, specifically non-stationarity in mean and seasonal behavior, respectively. In practice, many financial time series are non-stationary in mean and can be modeled only after removing the non-stationary source of variation [12]. This is typically done by differencing the series. Suppose \(X_{t}\) is non-stationary in mean. The idea is to build an ARMA model on the series \(w_{t}\), obtained by differencing the original series \(d\) times (in general \(d = 1\)): \(w_{t} = \Delta^{d} X_{t}\).

Hence, ARIMA models (where \(I\) stands for integrated) are ARMA models defined on the \(d\)th difference of the original process:

$$\Phi \left( B \right) \Delta^{d} X_{t} = \theta \left( B \right)a_{t}$$
(18.1)

where \(\Phi \left( B \right) \Delta^{d}\) is called the generalized autoregressive operator and \(\Delta^{d} X_{t}\) is a quantity made stationary through differencing, which can therefore be modeled with an ARMA.
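As a brief worked illustration (added here for clarity, with \(c\) and \(b\) as assumed constants), write \(\Delta = 1 - B\), where \(B\) is the backshift operator, \(BX_{t} = X_{t - 1}\). The first two differences are then

$$\Delta X_{t} = X_{t} - X_{t - 1} ,\quad \Delta^{2} X_{t} = \left( {1 - B} \right)^{2} X_{t} = X_{t} - 2X_{t - 1} + X_{t - 2}$$

For a series with a linear trend, \(X_{t} = c + bt + a_{t}\), one difference gives \(\Delta X_{t} = b + a_{t} - a_{t - 1}\), whose mean \(b\) no longer depends on \(t\). This is why \(d = 1\) is often sufficient.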

The autoregressive integrated moving average (ARIMA) approach is one method that can be employed for short-term port throughput forecasting. The literature indicates that this approach tends to perform well in short-run forecasts [39,40,41]. Time series models tend to outperform other econometric models because of the latter's restrictive nature: such models do not incorporate the dynamic structure of time series data and impose improper restrictions on the structural variables. ARIMA models, with the flexibility to incorporate the dynamic structure, have an inherent advantage in short-term forecasting. However, their forecast performance deteriorates as the forecast horizon increases, because the model cannot capture long-term economic relationships [40]. Short-term ARIMA forecasts nonetheless remain acceptable.

In many cases, time series have a seasonal component that repeats every \(s\) observations. For monthly observations \(s = 12\) (12 in 1 year), and for quarterly observations \(s = 4\) (4 in 1 year). In order to deal with seasonality, ARIMA processes have been extended into SARIMA models [12]:

$$\Phi \left( B \right) \Delta^{d} X_{t} = \theta \left( B \right)\alpha_{t}$$
(18.2)

where \(\alpha_{t}\) is such that

$$s\Phi \left( {B^{s} } \right){\Delta }_{s}^{D} \alpha_{t} = s{\Theta }\left( {B^{s} } \right)a_{t}$$
(18.3)

hence,

$$\Phi \left( B \right)s\Phi \left( {B^{s} } \right)\Delta_{s}^{D} \Delta^{d} X_{t} = \theta \left( B \right) s\Theta \left( {B^{s} } \right)a_{t}$$
(18.4)

and we write \(X_{t}\) ∼ SARIMA (p, d, q) × (P, D, Q)s. The idea is that SARIMA models are ARIMA (p, d, q) models whose residuals \(\alpha_{t}\) are ARIMA (P, D, Q). By ARIMA (P, D, Q) we mean ARIMA models whose operators are defined on \(B^{s}\) and its successive powers. The admissible regions for SARIMA processes are analogous to those for ARIMA processes, except that they are expressed in terms of powers of \(B^{s}\).
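To make the multiplicative structure concrete, consider an illustrative case that is assumed here purely as an example: \(p = 1\), \(d = 0\), \(q = 1\), \(P = 0\), \(D = 1\), \(Q = 1\), with \(\Phi \left( B \right) = 1 - \phi_{1} B\), \(\theta \left( B \right) = 1 - \theta_{1} B\), \(s\Theta \left( {B^{s} } \right) = 1 - \Theta_{1} B^{s}\) and \(\Delta_{s} = 1 - B^{s}\). Equation (18.4) then reads

$$\left( {1 - \phi_{1} B} \right)\left( {1 - B^{s} } \right)X_{t} = \left( {1 - \theta_{1} B} \right)\left( {1 - \Theta_{1} B^{s} } \right)a_{t}$$

and multiplying out the operators gives the recursion

$$X_{t} = \phi_{1} X_{t - 1} + X_{t - s} - \phi_{1} X_{t - s - 1} + a_{t} - \theta_{1} a_{t - 1} - \Theta_{1} a_{t - s} + \theta_{1} \Theta_{1} a_{t - s - 1}$$

The cross terms at lag \(s + 1\) are what distinguish a multiplicative seasonal model from one that simply adds seasonal lags.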

This study explores the use of seasonal autoregressive integrated moving average (SARIMA) models to forecast container throughput at several major international container ports, while taking seasonal variations into consideration. The SARIMA modeling methodology is described first, followed by a database of yearly container port traffic from 2007 to 2015 and a forecasting model for container terminals in Malaysia.

18.3 Methodology

18.3.1 Undertaking Methodology for ARIMA and SARIMA Model

The data for the study consist of container throughput for the period 2007 to 2015, obtained from the Marine Department, Malaysia.

ARIMA is a time series technique applicable to forecasting container throughput. Its seasonal extension, SARIMA, is used here to forecast container throughput at Malaysian container terminals.

Many practical phenomena produce time series data with seasonal characteristics. A seasonal time series is defined as a series with a regular pattern of changes that repeats every S time periods, i.e., the average values at particular times within the seasonal interval usually differ significantly from those at other times. Thus, a seasonal time series is usually non-stationary and should be made stationary, by differencing or by a logarithmic transformation, before ARIMA models are used to forecast it.

18.3.1.1 Non-seasonal ARIMA Model

The non-seasonal ARIMA model usually has the form ARIMA (p, d, q), where:

  • p is the number of lags of the differenced series appearing in the forecasting equation, called the autoregressive parameter,

  • d is the number of differences needed to make the time series stationary, called the integrated parameter, and

  • q is the number of lags of the forecast errors, called the moving average parameter.

The names of the three terms follow from these definitions: “Auto Regressive” refers to the lagged values of the differenced series, “Integrated” to the level of differencing, and “Moving Average” to the lagged forecast errors.
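The roles of the autoregressive and moving average terms can be sketched in a few lines of Python. The example below is a minimal illustration, not the estimation procedure used in this study: it assumes the coefficients `mu`, `phi` and `theta` are already known, sets the pre-sample error to zero, and uses hypothetical throughput values. It computes one-step-ahead predictions for an ARIMA (1,0,1) model written as \(y_{t} = \mu + \phi y_{t - 1} - \theta \varepsilon_{t - 1} + \varepsilon_{t}\).

```python
from typing import List, Tuple


def arma11_one_step(
    y: List[float], mu: float, phi: float, theta: float
) -> Tuple[List[float], List[float]]:
    """One-step-ahead predictions for y_t = mu + phi*y_{t-1} - theta*e_{t-1} + e_t.

    Returns (predictions, residuals). Both lists start at t = 1 because the
    first observation has no predecessor. The pre-sample error e_0 is set to
    zero, which is a simplifying assumption.
    """
    predictions: List[float] = []
    residuals: List[float] = []
    prev_error = 0.0
    for t in range(1, len(y)):
        # AR part uses the previous observation, MA part the previous error.
        y_hat = mu + phi * y[t - 1] - theta * prev_error
        error = y[t] - y_hat
        predictions.append(y_hat)
        residuals.append(error)
        prev_error = error
    return predictions, residuals


# Hypothetical throughput figures (arbitrary units), for illustration only.
series = [100.0, 104.0, 103.0, 108.0, 111.0]
preds, errs = arma11_one_step(series, mu=5.0, phi=0.97, theta=0.3)
print([round(p, 2) for p in preds])
print([round(e, 2) for e in errs])
```

In practice the coefficients would be estimated from the data, for example by maximum likelihood in a statistics package, rather than supplied by hand.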

18.3.1.2 Seasonal ARIMA Model

The variation of a time series is usually affected by several different factors, including seasonality. Seasonality can cause a non-stationary time series to vary significantly, and, owing to environmental influences such as periodic trends, the variation induced by the seasonal factor sometimes dominates the variation of the original series. A seasonal time series is usually a non-stationary series that follows some kind of seasonal periodic trend. It can be made stationary by seasonal differencing, defined as the difference between one value and another at a lag that is a multiple of S. The seasonal ARIMA model incorporates both non-seasonal and seasonal factors in a multiplicative model of the form SARIMA (p, d, q) (P, D, Q) S, where p, d, and q are the parameters of the non-seasonal ARIMA model and:

  • P is the seasonal autoregressive order,

  • D is the number of seasonal differences,

  • Q is the seasonal moving average order, and

  • S is the time span of the repeating seasonal pattern.

The forecasting flow to which these parameters apply is shown in Fig. 18.1.

Fig. 18.1 Container throughput forecasting flow

18.4 Model Identification and Forecasting

The seasonal ARIMA method is commonly applied in time series analysis. ARIMA stands for its three components: autoregressive, integrated, and moving average. The fundamental step when developing a model is to understand the characteristics of the data series and how it behaves over time. An advantage of this strategy is that it gives the researcher the freedom to select the most appropriate model from all candidate models according to the time plot. In our case, the dataset was arranged by state, with each state's observations running from 2007 to 2015. A series with seasonality needs additional differencing to eliminate the seasonal effect. Let \(z_{t}\) be the seasonally differenced series, \(z_{t} = y_{t} - y_{t - 15}\) for the state data series. If \(z_{t}\) remains non-stationary, the next step is non-seasonal differencing, denoted by \(w_{t} = z_{t} - z_{t - 1}\). The specific name for the seasonal model is SARIMA (p, d, q)(P, D, Q)s. The steps of model identification are given below.
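The two differencing steps can be illustrated with a short Python sketch. It is not the code used in this study: the series is synthetic, and the seasonal lag of 4 is a hypothetical value chosen only to keep the example small. The helper `difference` is likewise an illustrative name.

```python
from typing import List


def difference(series: List[float], lag: int = 1) -> List[float]:
    """Return series[t] - series[t - lag] for t = lag, ..., n - 1."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must satisfy 1 <= lag < len(series)")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series: a linear trend plus a pattern repeating every 4 points.
season = [0.0, 5.0, -3.0, 2.0]
y = [2.0 * t + season[t % 4] for t in range(16)]

z = difference(y, lag=4)  # seasonal difference, z_t = y_t - y_{t-s} with s = 4
w = difference(z, lag=1)  # non-seasonal difference, w_t = z_t - z_{t-1}

print(z)  # the seasonal pattern is removed; only the trend increment remains
print(w)  # differencing once more removes the remaining constant level
```

Each differencing step shortens the series by the lag used, which matters for short samples such as annual data.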

Fig. 18.2 Time plot of throughput

Fig. 18.3 ACF plot of the original series

Step 1: Initial Data Investigation

A simple data investigation was conducted to understand the basic pattern of the Total Throughput Port (TTP) series, plotted in Fig. 18.2. The data were obtained from the maritime section of the Ministry of Transport (MOT). The series plot indicates that the series is not stationary and contains a seasonal component. Therefore, the data are used to build the SARIMA model. Figures 18.3 and 18.4 show a significant spike at lag 15, which indicates that a seasonal effect is present.
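The check for a significant spike in the ACF can be sketched as follows. This is a minimal illustration on a synthetic series with an assumed period of 4, not the TTP data, and `sample_acf` is an illustrative helper. The band \(\pm 1.96/\sqrt n\) is the usual approximate 95% limit under the white-noise hypothesis.

```python
from typing import List


def sample_acf(series: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0, ..., r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    acf = []
    for k in range(max_lag + 1):
        num = sum(centered[t] * centered[t - k] for t in range(k, n))
        acf.append(num / denom)
    return acf


# Hypothetical series with a pattern repeating every 4 observations.
pattern = [10.0, 14.0, 9.0, 12.0]
y = [pattern[t % 4] for t in range(40)]

acf = sample_acf(y, max_lag=8)
bound = 1.96 / len(y) ** 0.5  # approximate 95% band for white noise
significant = [k for k in range(1, 9) if abs(acf[k]) > bound]

print([round(r, 3) for r in acf])
print(significant)
```

A large positive autocorrelation at the seasonal lag and its multiples, as produced here at lags 4 and 8, is the pattern that suggests seasonal differencing.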

Fig. 18.4 PACF plot of the original series

Step 2: Performing Seasonal Differencing

The seasonal difference is given as \(z_{t} = y_{t} - y_{t - 9}\). Figures 18.3 and 18.4 show that the original series of the container total throughput port (TTP) is non-stationary and needs differencing of degree one. Figure 18.5 shows the time series plot of the differenced TTP series, from which it can be concluded that the series is stationary. Figures 18.6 and 18.7 show that, after non-seasonal and seasonal differencing, the series has become stationary. From the stationary series, four candidate models are proposed, as identified in Table 18.1.

Fig. 18.5 Time plot of the seasonally differenced series \(z_{t} = y_{t} - y_{t - 9}\)

Fig. 18.6 The ACF of \(z_{t}\)

Fig. 18.7 The PACF of \(z_{t}\)

Table 18.1 Summary of Portmanteau test for each model

Step 3: Models Identified

In order to determine the best model formulation to fit to the data series, one needs to look for significant spikes in Figs. 18.5 and 18.6. The analysis in Fig. 18.1 contains the seasonal component, and the general formulation is written as SARIMA (p, d, q)(P, D, Q)15. To identify the non-seasonal and seasonal parts, one needs to observe the spikes in the ACF and PACF of \(w_{t}\). The MA order can be identified from the ACF plot of \(w_{t}\) and the AR order from the PACF of \(w_{t}\), while the seasonal MA and seasonal AR orders can be obtained from the “irregular” spikes in most series. A significant spike is observed at lag 9, which suggests a seasonal SMA term (from the ACF) and a seasonal SAR term (from the PACF). All candidate models are checked for adequacy to ensure that a well-specified model is not overlooked.

Table 18.1 shows that all four proposed models are well specified, since the errors are white noise. After considering parsimony and the size of the respective MSE, SARIMA (1,0,1)(0,1,1) 1 is therefore selected as the best model to represent the data. After the model is selected, the next step is to forecast future values with it.
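A portmanteau test of the kind summarized in Table 18.1 checks whether the residual autocorrelations are jointly negligible. The sketch below assumes the Ljung-Box form of the statistic, \(Q = n(n + 2)\sum_{k = 1}^{h} r_{k}^{2} /(n - k)\), since the source does not state which variant was used, and it runs on hypothetical residuals rather than those of the fitted models.

```python
from typing import List


def ljung_box_q(residuals: List[float], max_lag: int) -> float:
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c * c for c in centered)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# Hypothetical residuals that alternate in sign, so they are clearly
# autocorrelated and should fail the white-noise check.
residuals = [1.0, -1.0] * 5
q_stat = ljung_box_q(residuals, max_lag=1)

CHI2_95_DF1 = 3.841  # 5% critical value of chi-square with 1 degree of freedom
print(round(q_stat, 3))
print(q_stat > CHI2_95_DF1)  # True means white noise is rejected
```

When the test is applied to residuals of a fitted model, the degrees of freedom are usually reduced by the number of estimated ARMA coefficients. A model is regarded as well specified when Q stays below the critical value.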

Step 4: Forecasting Values from the Obtained Model

Figure 18.8 depicts the forecasting result for the model, with the red line representing the forecasted values. The volatility of the series is pronounced and consistent throughout the plot.

Fig. 18.8 Time series plot with forecasting values

18.5 Results and Discussion

The best model derived in this research is based on throughput data for Malaysian container terminals from 2007 to 2015. The non-seasonal part is ARIMA (1,0,1), with AR(1) and MA(1) terms:

$$y_{t} = \mu + \phi_{1} y_{t - 1} - \theta_{1} \varepsilon_{t - 1} + \varepsilon_{t}$$

or equivalently \(y_{t} - \phi_{1} y_{t - 1} = \mu + \varepsilon_{t} - \theta_{1} \varepsilon_{t - 1}\), that is, \(\left( {1 - \phi_{1} B} \right)y_{t} = \mu + \left( {1 - \theta_{1} B} \right)\varepsilon_{t}\). The seasonal part is SARIMA (0,1,1)1 with an SMA(1) term. With \(z_{t} = y_{t} - y_{t - 1}\), it is \(z_{t} = \varepsilon_{t} - \Theta_{1} \varepsilon_{t - 1}\), that is, \(\left( {1 - B^{1} } \right)y_{t} = \left( {1 - \Theta_{1} B^{1} } \right)\varepsilon_{t}\), where \(\Theta_{1}\) denotes the seasonal moving average coefficient. Combining the ARIMA and SARIMA parts, the model is

$$\left( {1 - \phi_{1} B} \right)\left( {1 - B^{1} } \right)y_{t} = \left( {1 - \theta_{1} B} \right)\left( {1 - \Theta_{1} B^{1} } \right)\varepsilon_{t}$$

Table 18.2 presents the forecasts for 2017, 2018 and 2019 together with the actual throughput for 2017 and 2018. The forecasts indicate a significant increase in container throughput for all container terminals in Malaysia. The official figures for 2017 and 2018, however, tell a different story: except for AW, CP and EPP, all ports recorded throughput below the forecasted values. Several factors could explain this, and the length of the period taken for this study is a likely contributor. ARIMA and SARIMA models would perform better with a longer period than 8 years of testing [42]. External and internal factors might also have contributed to the outcome. For example, certain ports were changing their strategic outlook, which led to regulatory changes at the management level [43]. In addition, weak demand from importers and exporters calling at the port could be one of the issues, as could the expansion of competitors, which resulted in stiff competition that hindered the growth of container throughput. Apart from that, state interference in determining the pattern of trade could also indirectly affect the direction taken by the port authority in general.
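The comparison between forecasted and actual throughput can be quantified with simple percentage errors. The sketch below is illustrative only: the numbers are hypothetical placeholders, not the values of Table 18.2, and the mean absolute percentage error (MAPE) is one possible summary measure rather than the one used in this study.

```python
from typing import List


def percentage_errors(forecast: List[float], actual: List[float]) -> List[float]:
    """Signed percentage error 100 * (forecast - actual) / actual per period."""
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    return [100.0 * (f - a) / a for f, a in zip(forecast, actual)]


def mape(forecast: List[float], actual: List[float]) -> float:
    """Mean absolute percentage error over all periods."""
    errors = percentage_errors(forecast, actual)
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical forecast and actual throughput for two years (arbitrary units).
forecast = [110.0, 120.0]
actual = [100.0, 125.0]

print(percentage_errors(forecast, actual))  # positive means over-forecast
print(mape(forecast, actual))
```

A positive signed error corresponds to the situation described above, in which a port's actual throughput fell short of the forecast.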

Table 18.2 Estimation of TTP for 2017, 2018 and 2019

18.6 Conclusion

Forecasting container throughput is important for the operation and management of ports. This study applies the linear ARIMA and SARIMA models to predict the container throughput of Malaysian ports and sets out the resulting forecasts. It is notable, however, that for some ports the forecasted and actual container throughput differ substantially. This could be due to several factors, such as the length of the sample period, changes of regulation by the port authority in response to demand, a lack of demand from importers and exporters at the port, state interference in determining trade, and tough competition from nearby ports that hampers the growth of the port concerned. This study could therefore be enhanced with a longer data sample, since ARIMA and SARIMA models perform better over longer sample periods. In addition, further study could investigate the demand from importers and exporters, as well as the extent to which the state influences the direction taken by the port authority in handling trade activities.