1 Introduction

From a worldwide perspective, tourism makes a substantial contribution to economic growth [1, 2]. Take China as an example: according to the China National Tourism Administration, the total revenue of China’s tourism industry was 6.63 trillion yuan in 2019, an increase of 11% over 2018, accounting for more than 11% of China's GDP. Forecasting tourist arrivals therefore plays an important role in forecasting future economic growth. Moreover, tourist arrivals forecasts provide a valuable reference for subsequent strategic planning and policy formulation [3, 4]. Accurate forecasts can make the operation of travel agencies more effective and help tourist destinations to be better managed, which matters for the sustainable development of the tourism industry and of the wider economy. In general, the study of tourist arrivals forecasting is of great significance to society, both politically and economically. However, because tourist arrivals series are complex (e.g., seasonal, random, and non-linear), forecasting them remains a difficult problem.

To solve this problem, a growing number of researchers have turned to the analysis and prediction of tourist arrivals, and numerous models have been designed for this purpose. According to the related literature [5], the single forecasting models widely used for tourist arrivals fall into two main types: econometric models and artificial intelligence (AI) models. Econometric models, such as autoregressive moving average (ARMA) [6], autoregressive integrated moving average (ARIMA) [7], exponential smoothing (ES) [8], and generalized autoregressive conditional heteroskedasticity (GARCH) [9], are more suitable for forecasting relatively stable time series [10]. For data such as tourist arrivals, which are non-linear and change rapidly, econometric models have been reported to perform poorly [11]. As for AI models, the development of AI techniques has greatly promoted their application in various fields, including air quality early warning [12], crude oil price prediction [13], and electricity price prediction [14]. The AI models commonly used for forecasting tourist arrivals include artificial neural networks (ANNs) [15], extreme learning machine (ELM) [16], and support vector machine (SVM) [17]. Compared with econometric models, AI models are more effective owing to their strong robustness and fault tolerance. All these forecasting methods have significantly promoted the sustainable development of the world tourism industry.

However, almost every single forecasting model has its pros and cons, and even AI models are unlikely to perform satisfactorily in all scenarios. For example, owing to poor extrapolation, a narrow prediction scale, and high requirements on data quantity and quality, econometric models are unsuitable for data with high fluctuation and noise [18]; for ANNs, prediction performance is affected by the randomly generated initial weights and thresholds [19]. For this reason, researchers have turned their attention to developing hybrid forecasting models that combine existing single methods. Numerous studies have shown that hybrid forecasting models achieve relatively good results, and they have become the current mainstream forecasting approach [20].

In order to develop hybrid models for forecasting, some decomposition methods, such as variational mode decomposition (VMD) [21], empirical mode decomposition (EMD) [22], and wavelet transform (WT) [23], have been employed to extract the main features of raw series. Our previous work [24] has proved that data preprocessing with an effective decomposition method can significantly improve prediction performance. Specifically, data preprocessing strategies can fall into two types. One refers to “decomposition & de-noising” strategy [25]. Under this strategy, the noisy information of the original series is first removed, then the forecasting model is established by using the filtered time series. The other refers to “divide & conquer” strategy [26]. Under this strategy, raw series is first decomposed into several components, which then can be predicted using a determined prediction model respectively, and finally, the predicted values of all components are integrated to get the final results. In terms of tourist arrivals forecasting, Jiang and Ma [27] used fast ensemble EMD (FEEMD) method for data preprocessing to build a hybrid model, which performs well in forecasting future tourist arrivals. Similarly, by using WT for data preprocessing and kernel-based ELM and ARMA for forecasting, Yang et al. [28] developed a hybrid model for daily tourist arrivals forecasting, and the empirical results based on three real tourism markets show that the developed model has good linear and non-linear prediction abilities. In the above studies, the hybrid forecasting models can improve the prediction accuracy and thus perform better than all the considered benchmark models. Nevertheless, data preprocessing only using a single decomposition method in the hybrid model may not be able to fully extract the main features of the tourist arrivals series. 
Furthermore, inherent defects of some data decomposition methods, such as mode mixing and the endpoint effect, may also limit their application in feature extraction [29]. Incomplete feature extraction and these inherent defects make it difficult for a hybrid model to achieve satisfactory prediction results. Therefore, to improve prediction performance, it is worth further improving the data preprocessing techniques.

In addition, the commonly used single forecasting models offer poor interpretability of their prediction results. The fuzzy time series (FTS) model, which divides the universe of discourse based on the features of the historical data, can address this problem well. However, most traditional FTS models divide the universe of discourse into equal-width intervals and ignore the potential features of the data, so the prediction results remain unsatisfactory [30]. To address this issue, scholars have developed novel methods for dividing the universe of discourse, such as genetic algorithms and clustering algorithms. Therefore, to strengthen the interpretability of the results and improve model accuracy, it is of great value to further explore how to divide the universe of discourse of FTS with the fuzzy C-means (FCM) algorithm.

To sum up, the above analysis shows that existing studies are insufficient to comprehensively improve forecasting effectiveness. Thus, for the sake of sustainable economic and social development, the tourism industry urgently needs a novel tourist arrivals forecasting model that significantly improves forecasting effectiveness.

This paper proposes a novel hybrid forecasting model of tourist arrivals using dual decomposition strategy and an improved fuzzy time series method. Two stages are included in this hybrid model: dual decomposition, and integrated forecasting. In the first stage, the seasonal adjustment method (i.e., X12-ARIMA [31]) is employed to decompose the tourist arrivals data to extract its significant seasonal characteristics, and then an improved empirical mode decomposition method (i.e., ICEEMDAN [29]) is applied to decompose the remaining component sequences for reducing data complexity. Then in the second stage, the FTS model with the universe of discourse divided by the FCM algorithm, i.e., FCM-FTS method, is used to model and predict each component sequence after the second decomposition, and the predicted values of all the components are linearly summed up to get the final results.

The main contributions of this paper can be summarized as below:

(1) Most importantly, we develop a hybrid forecasting model with high accuracy and high robustness, and its effectiveness has been verified in forecasting Hong Kong’s inbound tourist arrivals. According to the experimental results, our hybrid model can decompose and extract the complex features of the raw series, thus obtaining more accurate and more robust prediction results. Hence, it is a very effective tool to predict real tourism markets and can provide valuable reference for tourism decision-making.

(2) Our hybrid forecasting model differs from traditional hybrid approaches in two major ways. Firstly, a different data preprocessing strategy is presented. Most former research adopted a single decomposition approach to decompose the raw series, and such a strategy may not fully extract the main features of the series. Therefore, this paper presents a dual decomposition strategy based on X12-ARIMA and ICEEMDAN, which can overcome the drawbacks of the traditional data preprocessing strategies and further improve the prediction performance. Secondly, an effective clustering algorithm, i.e., FCM, is adopted to optimize the domain partition module of the FTS model, whose performance is thereby improved.

(3) In terms of numerical experiments, this paper compares the proposed hybrid model not only with five commonly used single forecasting models but also with six other hybrid forecasting models using different data preprocessing strategies, which comprehensively demonstrates the superiority of our model. In addition, the benchmark models considered represent currently popular modeling strategies and ideas, similar to those in high-quality papers published in international journals in recent years. On the basis of this comparative study, the paper verifies in detail the significance of the components of our hybrid model, such as the validity of X12-ARIMA and ICEEMDAN, as well as the superiority of the dual decomposition strategy and the FCM method. Moreover, this paper also verifies the robustness of our hybrid model. To sum up, we demonstrate that the developed tourist arrivals forecasting model is superior to the benchmarks and has practical value for real tourism markets.

(4) To verify the model prediction performance, this paper provides a scientific evaluation and an in-depth discussion of the prediction results. We use six typical criteria, including average error (AE), mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE), Theil inequality coefficient (TIC), and index of agreement (IA), to evaluate the performance of the forecasting models. Moreover, we further demonstrate the superiority of the proposed model through a discussion from five aspects: (a) the model robustness according to the prediction performance in different years; (b) the significance of the model from the perspective of statistics; (c) the forecasting effectiveness based on the comparative studies; (d) the improvement percentage relative to the benchmarks; and (e) the grey relational analysis of all the models involved.

The rest of the paper is arranged as follows. Section 2 introduces the main methods involved and the overall framework of the developed hybrid forecasting model. Section 3 mainly describes the data, conducts the comparative experiments, and analyzes the prediction results. Section 4 presents the related discussions. Finally, Sect. 5 concludes the study.

2 Methods

This section presents a hybrid model of dual decomposition and an improved fuzzy time series method for tourist arrivals forecasting. Specifically, Sects. 2.1–2.5 describe the relevant methods for decomposition and prediction, and Sect. 2.6 provides the overall process of our hybrid model. Table 3 in Appendix 1 lists the nomenclature used in this paper.

2.1 X12-ARIMA

X12-ARIMA [31] is a popular seasonal adjustment method developed by the United States Census Bureau, which mainly includes two functional modules: regARIMA module and X-11 seasonal decomposition module. In particular, the regARIMA module can carry out various types of data preprocessing, such as outlier detection and correction, estimation, and elimination of the influence of calendar factors [32]. The X-11 seasonal decomposition module decomposes the preprocessed data through multiple iterations of moving average method to form a seasonal factors series and a seasonally adjusted series. For the purpose of this paper, we just introduce the basic algorithm for the X-11 seasonal decomposition module.

It is assumed that the monthly series can be decomposed into a seasonal factor (i.e., S), a trend-cycle factor (i.e., TC), and an irregular factor (i.e., I). Two main steps are involved in the X-11 seasonal decomposition module:

Step 1 Estimation of the initial components

Firstly, the 2 × 12 moving average method is applied to estimate the initial TC component sequences. Then, this TC component is subtracted from the raw time series to obtain the initial estimation of the seasonal-irregular component (i.e., SI). Next, the 3 × 3 moving average method is applied to estimate the initial seasonal component, which then is normalized by a 2 × 12 moving average. Finally, the normalized seasonal component is subtracted from the raw series to obtain the initial estimation of the seasonally adjusted series (i.e., SA).

Step 2 Final seasonal adjustment

Firstly, the Henderson moving average method is used to obtain the second estimation of the TC component from the initially estimated SA series. Then, this new TC component is subtracted from the raw series to obtain the second estimation of the SI component. Next, the 3 × 5 moving average method is applied to estimate a new seasonal component, which then is normalized by a 2 × 12 moving average. Finally, the normalized seasonal component is removed from the raw series to obtain the final SA series.

It is worth noting that the selection of the number of terms in the moving average is critical in the X-11 seasonal decomposition module. The higher the number of terms, the more irregular factors can be eliminated. But as the number of terms increases, more information is lost. For monthly series that change periodically on a 12-month basis, a centered 12-term moving average can be considered for obtaining the initial TC component and the normalized seasonal component. However, if the series to be decomposed is also an economic flow time series (such as the monthly tourist arrivals), a 2 × 12 moving average is required to ensure that each element of the newly generated sequence after using the moving average is aligned with that of the raw series. For other parts of the module, the number of terms in the moving average is specified with reference to the standard X-11 procedure [33].
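As a minimal illustration of the moving-average step described above, the following Python sketch applies a centered 2 × 12 moving average to a toy monthly series and forms the initial SI component by subtraction, as in Step 1. It is not the X12-ARIMA implementation: the function names `ma_2x12` and `initial_x11_pass` and the toy series are assumptions made for illustration, and the later 3 × 3, 3 × 5, and Henderson filters are omitted.

```python
def ma_2x12(series):
    """Centered 2x12 moving average (13 terms, end weights 1/24, inner weights 1/12).

    Returns None wherever the 13-term window does not fit inside the series.
    """
    weights = [1 / 24] + [1 / 12] * 11 + [1 / 24]
    half = 6
    out = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        out[t] = sum(w * v for w, v in zip(weights, window))
    return out


def initial_x11_pass(series):
    """Initial TC estimate and SI = series - TC (additive form, as in Step 1)."""
    tc = ma_2x12(series)
    si = [None if c is None else y - c for y, c in zip(series, tc)]
    return tc, si


# Toy monthly series (illustrative only): linear trend plus a fixed
# 12-month pattern whose values sum to zero.
pattern = [5, 3, 0, -2, -4, -5, -3, 0, 2, 4, 1, -1]
series = [100 + 2 * t + pattern[t % 12] for t in range(48)]
tc, si = initial_x11_pass(series)
```

Because the 13 weights are symmetric and sum to one, the filter reproduces the linear trend exactly at interior points and cancels the zero-sum 12-month pattern, so each output value stays aligned with a month of the raw series.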

2.2 ICEEMDAN

Traditional empirical mode decomposition methods, including empirical mode decomposition (EMD) [22], ensemble EMD (EEMD) [34], and complete ensemble EMD with adaptive noise (CEEMDAN) [35], suffer from problems such as mode mixing, residual noise and redundancy, and pseudo components after decomposition. To address these problems, Colominas et al. [29] proposed an improved complete ensemble EMD with adaptive noise (ICEEMDAN), which is better able to extract the components of a complex time series with different time scale features. The following are the main steps and relevant formulas of this algorithm:

Step 1 Calculate the first residue of the original series using the following equation:

$$r_{1} = \frac{1}{I}\sum\limits_{i = 1}^{I} {M\left( {x + \beta_{1} E_{1} \left( {w^{i} } \right)} \right)} ,$$
(1)

where \(E_{k}(\cdot)\) is an operator that uses the EMD method to decompose a series into several intrinsic mode functions (IMFs) and one residual and outputs the k-th IMF component (i.e., the k-th mode); \(M(\cdot)\) is an operator that produces the local mean (i.e., the mean of the upper and lower envelopes) of a series; \(x\) represents the original time series; \(w^{i}\) indicates a realization of white noise with zero mean and unit variance, \(i = 1,2,...,I\), and \(I\) is the number of times that white noise is added; \(\beta_{k}\) is the parameter that controls the energy of the white noise in each iteration, \(k = 1,2,...,K\), and \(K\) is the maximum number of iterations. Mode mixing is defined as either a single IMF consisting of components of widely disparate scales or a component of a similar scale residing in different IMFs [34]. The purpose of including white noise in this equation is to avoid the mode mixing problem so that the components of the complex time series with different time scales can be identified and extracted more accurately.

Step 2 Subtract the first residue from the original series to get the first mode \(d_{1}\):

$$d_{1} = x - r_{1} .$$
(2)

Step 3 Obtain the second residue of the original time series, i.e., \(r_{2}\), in the same way as in step 1, and finally obtain the second mode \(d_{2}\) by the following equation:

$$d_{2} = r_{1} - r_{2} = r_{1} - \frac{1}{I}\sum\limits_{i = 1}^{I} {M\left( {r_{1} + \beta_{2} E_{2} \left( {w^{i} } \right)} \right)} .$$
(3)

Step 4 Obtain the k-th residue and k-th mode by the following equation:

$$d_{k} = r_{k - 1} - r_{k} = r_{k - 1} - \frac{1}{I}\sum\limits_{i = 1}^{I} {M\left( {r_{k - 1} + \beta_{k} E_{k} \left( {w^{i} } \right)} \right)} .$$
(4)

Step 5 Return to step 4 for next k until the residue can no longer be decomposed or \(K\) is reached.
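The recursion in Eqs. (1)-(4) can be sketched in Python as follows. This is only a skeleton of the control flow under stated assumptions: the operators \(M(\cdot)\) and \(E_{k}(\cdot)\) are passed in as callables, and the two `toy_*` functions are crude smoothing stand-ins, not a real EMD. The function names, the fixed list of \(\beta_{k}\) values, and the use of a fixed number of iterations instead of the stopping rule in Step 5 are all illustrative choices.

```python
import math
import random


def iceemdan_skeleton(x, local_mean, emd_mode, betas, n_real=20, seed=0):
    """Recursion of Eqs. (1)-(4); `local_mean` plays M(.), `emd_mode(w, k)` plays E_k(w)."""
    rng = random.Random(seed)
    # I realizations of zero-mean, unit-variance white noise, reused for every k.
    noises = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(n_real)]
    modes, residue = [], list(x)
    for k, beta in enumerate(betas, start=1):
        avg = [0.0] * len(x)
        for w in noises:
            e_k = emd_mode(w, k)
            perturbed = [r + beta * e for r, e in zip(residue, e_k)]
            for t, v in enumerate(local_mean(perturbed)):
                avg[t] += v / n_real          # r_k: average of the local means
        modes.append([r - a for r, a in zip(residue, avg)])  # d_k = r_{k-1} - r_k
        residue = avg
    return modes, residue


def toy_local_mean(s):
    """Stand-in for M(.): 3-point moving average (NOT an envelope mean)."""
    n = len(s)
    return [(s[max(t - 1, 0)] + s[t] + s[min(t + 1, n - 1)]) / 3 for t in range(n)]


def toy_emd_mode(w, k):
    """Stand-in for E_k(.): detail removed by the k-th smoothing pass (NOT a real IMF)."""
    cur = list(w)
    for _ in range(k - 1):
        cur = toy_local_mean(cur)
    nxt = toy_local_mean(cur)
    return [a - b for a, b in zip(cur, nxt)]


x = [math.sin(0.3 * t) + 0.1 * t for t in range(40)]
modes, residue = iceemdan_skeleton(x, toy_local_mean, toy_emd_mode, betas=[0.2, 0.1, 0.05])
```

Whatever operators are supplied, the modes and the final residue sum back to the original series, because each mode is defined as the difference of two consecutive residues.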

2.3 Fuzzy C-means clustering

The fuzzy C-means (FCM) algorithm is one of the commonly used clustering methods [36]. The basic idea of the FCM algorithm is to iteratively update the cluster centers and the membership degree of each data point with respect to every cluster center until the dissimilarity index function and the iteration error fall to preset minimum values. The following are the main steps and related formulas of the FCM algorithm:

Step 1 Calculate the number of cluster centers:

$$c = \left[ \frac{x_{\max } - x_{\min }}{\frac{1}{n_{1} - 1}\sum\limits_{t = 2}^{n_{1}} \left| x_{t} - x_{t - 1} \right|} \right],$$
(5)

where \(x_{t} (t = 1,2, \cdots ,n_{1} ) \in R\) is the element of the original series \(x\), and \(n_{1}\) is the number of elements in \(x\). \(c\) is the number of cluster centers, \(c \in \left\{ {2,3,...,n_{1} - 1} \right\}\). \(x_{\max }\) and \(x_{\min }\) represent the maximum and minimum values in the original series, respectively; \([\cdot]\) represents the rounding operation.

Step 2 Initialize the cluster centers. Randomly select \(c\) samples in \(x\) as the initial cluster centers \(V(0) = \left\{ {{\text{v}}_{01} ,{\text{v}}_{02} ,...,{\text{v}}_{0c} } \right\}\).

Step 3 Calculate the membership matrix:

$$u_{ij} = \left( {\sum\limits_{r = 1}^{c} {\frac{{d_{ij} }}{{d_{rj} }}} } \right)^{ - 1} ,$$
(6)

where \(d_{ij}\) is the Euclidean distance from the element \(x_{j}\) to the cluster center \(v_{i} , \, i = 1,2,...,c, \, j = 1,2,...,n_{1}\).

Step 4 Iterate new cluster centers:

$$v_{i} = \frac{\sum\limits_{j = 1}^{n_{1}} u_{ij}^{m} x_{j}}{\sum\limits_{j = 1}^{n_{1}} u_{ij}^{m}},$$
(7)

where \(m\) is the weighted index of membership degree, which is used to adjust the fuzzy degree of the clustering results, generally \(m = 2\).

Step 5 Repeat steps 3 and 4 iteratively until the condition \(\left\| {V\left( {k + 1} \right) - V\left( k \right)} \right\| < \varepsilon\) is satisfied (\(\varepsilon\) is the iteration stop threshold) or the maximum iterations are reached.
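The five steps above can be sketched for a one-dimensional series in plain Python. Several choices here are assumptions rather than details given in the text: the rounding in Eq. (5) is taken as round-half-up and the result is clamped to \(\{2,...,n_{1}-1\}\); the membership update uses the textbook FCM exponent \(2/(m-1)\), which equals 2 for \(m = 2\) and is not shown explicitly in Eq. (6) as printed; the norm in the stopping rule is taken as the largest absolute shift of any center; and the function names and toy data are illustrative.

```python
import random


def n_clusters(x):
    """Eq. (5): range of the series divided by its mean absolute first difference."""
    mean_abs_diff = sum(abs(b - a) for a, b in zip(x, x[1:])) / (len(x) - 1)
    c = int((max(x) - min(x)) / mean_abs_diff + 0.5)   # round half up (assumption)
    return max(2, min(c, len(x) - 1))                  # keep c inside {2, ..., n1 - 1}


def fcm_1d(x, c, m=2.0, eps=1e-6, max_iter=200, seed=0):
    """Fuzzy C-means on scalar data; returns (centers, memberships from the last pass)."""
    rng = random.Random(seed)
    centers = rng.sample(list(x), c)                   # Step 2: random samples as centers
    p = 2.0 / (m - 1.0)                                # textbook exponent (assumption)
    u = []
    for _ in range(max_iter):
        u = []
        for xj in x:                                   # Step 3: membership of x_j
            d = [abs(xj - v) for v in centers]
            if min(d) == 0.0:                          # x_j sits exactly on a center
                hit = d.index(0.0)
                u.append([1.0 if i == hit else 0.0 for i in range(c)])
            else:
                u.append([1.0 / sum((d[i] / d[r]) ** p for r in range(c))
                          for i in range(c)])
        new_centers = []
        for i in range(c):                             # Step 4: Eq. (7)
            num = sum(u[j][i] ** m * x[j] for j in range(len(x)))
            den = sum(u[j][i] ** m for j in range(len(x)))
            new_centers.append(num / den)
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < eps:                                # Step 5: stopping rule
            break
    return centers, u


c_example = n_clusters([10, 12, 15, 11, 18, 20])       # range 10, mean |diff| 3.6
centers, u = fcm_1d([1.0, 2.0, 3.0, 9.0, 10.0, 11.0], c=2)
```

In the first toy series the range is 10 and the mean absolute difference is 3.6, so Eq. (5) gives 3 clusters after rounding. A constant series would make the denominator zero and needs separate handling.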

2.4 Fuzzy time series algorithm

On the basis of the fuzzy set theory and other concepts proposed by Zadeh [37], Song and Chissom [38, 39] established the fuzzy time series (FTS) model, which was successfully used to predict the enrollment data of the University of Alabama. Subsequently, the traditional FTS model and its variants were widely applied to forecasting in other fields (e.g., temperature, stock index, and network traffic) and achieved good results [40, 41]. The basic definitions of FTS are as follows:

Definition 1 It is assumed that \(U\) is a given universe of discourse, which can be divided into \(n_{2}\) subintervals in order, then \(U = \left\{ {u_{1} ,u_{2} , \cdots ,u_{{n_{2} }} } \right\}\). Define \(A\) as the fuzzy set on the universe \(U\), expressed as:

$$A = \frac{{f_{A} \left( {u_{1} } \right)}}{{u_{1} }} + \frac{{f_{A} \left( {u_{2} } \right)}}{{u_{2} }} + \cdots + \frac{{f_{A} \left( {u_{{n_{2} }} } \right)}}{{u_{{n_{2} }} }},$$
(8)

where \(f_{A} ( \, )\) is the membership function of fuzzy set \(A\), \(f_{A} ( \, ) \in \left[ {0,1} \right]\); \(f_{A} (u_{i} )\) represents the membership degree of the interval \(u_{i} (1 \le i \le n_{2} )\) with respect to the fuzzy set \(A\).

Definition 2 Let the original time series \(Y = \left\{ {y_{t} } \right\} = \left\{ {Y(t)} \right\}(t = 1,2,...)\) be a subset of the real number field R. Define a set of fuzzy sets \(f_{i} (t)\;(i = 1,2,...)\) on the series \(Y\), and let \(F\left( t \right) = \left\{ {f_{1} \left( t \right),f_{2} \left( t \right), \cdots } \right\}\); then \(F = \left\{ {F(t)} \right\}(t = 1,2,...)\) is a fuzzy time series defined on \(Y\).

Definition 3 Suppose there is a fuzzy logical relationship (FLR), i.e., \(R(t,t - 1)\), between \(F\left( t \right)\) and \(F\left( {t - 1} \right)\), which satisfies:

$$F\left( t \right) = F\left( {t - 1} \right) \odot R\left( {t,t - 1} \right),$$
(9)

then \(F\left( t \right)\) is said to be obtained only from \(F\left( {t - 1} \right)\) (\(\odot\) is a combination operator). If \(F\left( {t - 1} \right) = A_{i}\) and \(F\left( t \right) = A_{j}\), the FLR can also be expressed as \(A_{i} \to A_{j}\), where \(A_{i}\) and \(A_{j}\) are called the left-hand side (LHS) and right-hand side (RHS) of the FLR, respectively.

Definition 4 All the single FLRs with the same LHS can be composed into the same fuzzy logical relationship set (FLRS). For example, the three FLRs (\(A_{l} \to A_{r1}\), \(A_{l} \to A_{r2}\), \(A_{l} \to A_{r3}\)) with the same LHS can be composed into one FLRS, which is expressed as \(A_{l} \to A_{r1} ,\;A_{r2} ,A_{r3}\).
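Definitions 3 and 4 can be illustrated with a short Python sketch that reads first-order FLRs off a sequence of fuzzy-set labels and groups them by their LHS. The function name `build_flrs` and the label sequence are hypothetical, and dropping repeated relations within a group is an assumption; weighted FTS variants keep the repetition counts instead.

```python
from collections import defaultdict


def build_flrs(labels):
    """Group first-order FLRs A_i -> A_j by their left-hand side (Definition 4)."""
    groups = defaultdict(list)
    for lhs, rhs in zip(labels, labels[1:]):   # consecutive pairs F(t-1) -> F(t)
        if rhs not in groups[lhs]:             # keep each RHS once (assumption)
            groups[lhs].append(rhs)
    return dict(groups)


# Hypothetical fuzzified series, one label per time step.
labels = ["A1", "A2", "A2", "A3", "A2", "A1", "A2"]
groups = build_flrs(labels)
for lhs, rhs_list in groups.items():
    print(lhs, "->", ", ".join(rhs_list))
```

For this sequence the single relations are A1 → A2, A2 → A2, A2 → A3, A3 → A2, A2 → A1, and A1 → A2, which collapse into three groups, one per distinct LHS.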

2.5 FCM-FTS model

For fuzzy time series, an unsupervised discretization method has generally been used to obtain equal-width intervals, which is simple and convenient. However, the intervals produced by equal-width partitioning are hard to interpret, and the resulting forecasts are not accurate enough [42]. The FCM clustering algorithm partitions the universe of discourse according to the characteristics of the data, which is more objective. Furthermore, the actual meaning of each sub-interval can be explained through its cluster center, which is more scientific and reasonable than equal-width partitioning. In this paper, the FTS model optimized by Chen [43], with the universe of discourse partitioned by the FCM algorithm (i.e., the FCM-FTS model), is applied for prediction. The specific steps are as follows:

Step 1 Test the stationarity of the time series to be predicted with the augmented Dickey-Fuller (ADF) test [44]. If the series is stationary, go to step 2 directly. Otherwise, make the series stationary by differencing [7].

Step 2 Divide the universe \(U\) into \(n_{2}\) intervals by the FCM clustering algorithm, then \(U = \left\{ {u_{1} ,u_{2} ,...,u_{{n_{2} }} } \right\}\).

Step 3 Define the fuzzy sets for the raw time series by determining the fuzzy membership function. Construct the fuzzy set \(A_{i}\) based on the intervals, with the fuzzy membership function \(f_{A_{i}}(u_{j})\) defined as follows [45]:

$$f_{A_{i}}(u_{j}) = \begin{cases} 1, & i = j \\ 0.5, & i = j + 1 \\ 0, & \text{otherwise}. \end{cases}$$
(10)

Step 4 Fuzzify the actual values. Fuzzify a raw value to \(A_{i}\) when the highest degree of membership of that raw value is in \(A_{i}\) [43].

$$\mathrm{fuzzify}(\mathrm{actual}_{t}) = A_{i} \quad \text{if } f_{\mathrm{actual}_{t}}(A_{i}) = \max \left[ f_{\mathrm{actual}_{t}}(A_{z}) \right], \quad z = 1,2,\ldots,M,$$
(11)

where \(f_{{{\text{actual}}_{t} }} (A_{z} )\) denotes the degree of membership of the actual value at t under \(A_{z}\), and \(M\) denotes the number of the fuzzy sets.

Step 5 Establish and group the FLRs. According to Definitions 3 and 4 in Sect. 2.4, the first-order FLRs and FLRSs are constructed for all fuzzy sets of the fuzzy time series.

Step 6 Determine and standardize the weight matrix. The weights can be calculated and standardized based on step 5, and then the centroid defuzzification method can be used to further calculate the defuzzification matrix.

$$\begin{gathered} W\_s(t) = (W_{1}^{\prime}, W_{2}^{\prime}, \ldots, W_{k}^{\prime}), \\ W_{i}^{\prime} = W_{i} \Big/ \sum\limits_{l = 1}^{k} W_{l}, \end{gathered}$$
(12)

where \(W_{i}\) is an element of the unstandardized weighting matrix, and \(W_{i}^{\prime}\) denotes the corresponding standardized element. \(W\_s\) represents the standardized weighting matrix.

Step 7 Obtain the forecasting results. Multiply the defuzzified matrix by the standardized weighting matrix to obtain the preliminary forecasting results:

$$\hat{F}(t) = D(t - 1) \times W\_s(t - 1),$$
(13)

where \(\hat{F}(t)\) denotes the forecasting result and \(D\) denotes the defuzzified matrix.
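Steps 4 to 7 can be sketched in Python as follows, given cluster centers that have already been obtained and sorted. This is an illustrative reading, not the authors' code: fuzzifying a value to its nearest center, using the number of times each RHS follows an LHS as the unstandardized weight \(W_{i}\), and using the cluster centers as the defuzzified values \(D\) are all assumptions, as are the function names and the toy numbers.

```python
from collections import Counter, defaultdict


def fuzzify(values, centers):
    """Step 4: label each value with the index of its nearest center (assumption)."""
    return [min(range(len(centers)), key=lambda i: abs(v - centers[i])) for v in values]


def fit_rules(labels):
    """Steps 5-6: count how often each RHS label follows each LHS label."""
    rules = defaultdict(Counter)
    for lhs, rhs in zip(labels, labels[1:]):
        rules[lhs][rhs] += 1
    return rules


def forecast_next(last_label, rules, centers):
    """Step 7: standardized weights (Eq. 12) applied to defuzzified values (Eq. 13)."""
    rhs_counts = rules.get(last_label)
    if not rhs_counts:                         # no rule for this LHS: use its own center
        return centers[last_label]
    total = sum(rhs_counts.values())
    return sum(centers[rhs] * n / total for rhs, n in rhs_counts.items())


centers = [10.0, 20.0, 36.0]                   # hypothetical sorted cluster centers
history = [11.0, 19.0, 22.0, 35.0, 18.0, 9.0, 21.0]
labels = fuzzify(history, centers)             # [0, 1, 1, 2, 1, 0, 1]
rules = fit_rules(labels)
prediction = forecast_next(labels[-1], rules, centers)
```

Here the last observation belongs to the second fuzzy set, which was followed once each by all three sets, so the forecast is the equally weighted mean of the three centers, 22.0. If the series was differenced in Step 1, the forecast would still need to be converted back to the original scale.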

2.6 Overall process of the proposed model

To forecast tourist arrivals, we propose a novel hybrid model of X12-ARIMA, ICEEMDAN, FCM, and FTS, namely X12-ARIMA-ICEEMDAN-FCM-FTS model. This hybrid model includes two stages, i.e., dual decomposition and integrated forecasting. Figure 1 shows the overall process of our hybrid forecasting model, with four main steps involved as follows:

Fig. 1 Overall process of the proposed hybrid forecasting model

2.6.1 Stage 1: Dual decomposition

Step 1 Considering the seasonal characteristics of the tourist arrivals data, the original time series is first decomposed by the X12-ARIMA method, extracting the seasonal component and obtaining the seasonally adjusted series.

Step 2 ICEEMDAN is then used to decompose the seasonally adjusted series into n-1 intrinsic mode functions (\({\text{IMF}}_{1}\), \({\text{IMF}}_{2}\),…, \({\text{IMF}}_{n - 1}\)) with different time scale features and one smooth residual series (Residue), in order to reduce the data complexity.

2.6.2 Stage 2: Integrated forecasting

Step 3 The FCM-FTS method is used to model and predict the seasonal factors series, n-1 IMFs component series, and the residual series, respectively.

Step 4 Finally, the predicted values for all the components, respectively noted as SEA', \({\text{IMF}}_{1}\)', \({\text{IMF}}_{{\text{2}}}\)',…, \({\text{IMF}}_{{n - 1}}\)', and Residue', are linearly summed up to get the final prediction results.
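The four steps can be summarized as a small Python skeleton in which each method is a pluggable callable. The function `hybrid_forecast` and the three toy stand-ins are hypothetical; they only show how the components are forecast separately and summed, and they do not implement X12-ARIMA, ICEEMDAN, or FCM-FTS.

```python
def hybrid_forecast(series, seasonal_split, mode_split, forecaster, horizon):
    """Dual decomposition followed by component-wise forecasting and a linear sum."""
    seasonal, adjusted = seasonal_split(series)                 # Step 1
    components = [seasonal] + list(mode_split(adjusted))        # Step 2: IMFs + residue
    forecasts = [forecaster(c, horizon) for c in components]    # Step 3
    return [sum(vals) for vals in zip(*forecasts)]              # Step 4


# Toy stand-ins (assumptions for illustration only).
def toy_seasonal_split(s):
    return [0.25 * v for v in s], [0.75 * v for v in s]


def toy_mode_split(s):
    return [0.5 * v for v in s], [0.5 * v for v in s]


def naive_forecaster(component, horizon):
    return [component[-1]] * horizon            # repeat the last observed value


result = hybrid_forecast([8.0, 16.0], toy_seasonal_split, toy_mode_split,
                         naive_forecaster, horizon=3)
```

Because the toy splits are additive and the stand-in forecaster simply repeats the last value, the summed forecast equals the last observation of the original series, which makes the final summation step easy to check.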

3 Experiment

In this section, we used the developed hybrid model to forecast Hong Kong’s inbound tourist arrivals from three countries (i.e., USA, UK, and Germany) for illustration and verification purposes. In particular, several related experiments were carried out with multiple control groups set up, and comparison and analysis were made from various aspects to verify the performance of our proposed model, in which the main parameters involved can be seen in Table 4 (in Appendix 1). Furthermore, final prediction results were taken as the average of 100 runs to avoid the influence of random factors.

3.1 Data description

The monthly tourist arrivals to Hong Kong from USA, UK, and Germany (simply noted as GER) are selected as data samples, as shown in Fig. 2. For each series, there are 168 observations, covering the period from January 2006 to December 2019, which can be obtained from Wind Database (http://www.wind.com.cn/). Meanwhile, to evaluate the model robustness, the samples are rolled backward for one year at a time, thus each sample can produce three subsamples with the same number of observations, covering the periods from January 2006 to December 2017, January 2007 to December 2018, and January 2008 to December 2019, respectively. The sample data are shown in detail in Table 5 (in Appendix 1). In addition, a link to the supplementary material related to this article (including the data and the code) can be found in Appendix 2.

Fig. 2 The monthly tourist arrivals to Hong Kong from USA, UK and GER

In addition, the experiments conducted in this paper all perform one-step-ahead predictions. The data of each subsample can be divided into training set for model training and testing set for evaluating model performance. In particular, the data of the preceding 11 years (132 observations) are used as training set, while the following year (12 observations) as testing set. Finally, the monthly tourist arrivals in 2017, 2018, and 2019 are predicted, respectively. According to the results of the three forecasting years, the final prediction performance of the proposed model is evaluated.
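The sample construction described above (168 monthly observations, three 144-month subsamples rolled by one year, each split into 132 training and 12 testing observations) can be written as a short Python sketch. The function name `rolling_subsamples` is hypothetical, and month labels stand in for the actual arrivals data.

```python
def rolling_subsamples(series, window=144, step=12, train_len=132):
    """Return (train, test) pairs for windows rolled forward by `step` observations."""
    splits = []
    for start in range(0, len(series) - window + 1, step):
        sub = series[start : start + window]
        splits.append((sub[:train_len], sub[train_len:]))
    return splits


# (year, month) labels for January 2006 to December 2019.
months = [(2006 + i // 12, i % 12 + 1) for i in range(168)]
splits = rolling_subsamples(months)
for train, test in splits:
    print(train[0], "to", train[-1], "| test", test[0], "to", test[-1])
```

With these settings the three testing sets are the calendar years 2017, 2018, and 2019, matching the forecasting years stated in the text.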

3.2 Evaluation criteria

Considering that there is no universally applicable standard for evaluating prediction errors [46], we choose six popular criteria (i.e., AE, MAPE, RMSE, MAE, TIC, and IA) to evaluate the model prediction performance, as listed in Table 1. For every criterion except IA, a smaller value indicates a more accurate prediction, whereas a larger IA is better.

Table 1 Evaluation criteria
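Since the body of Table 1 is not reproduced here, the following Python sketch uses commonly cited definitions of the six criteria; these formulas, the sign convention of AE (forecast minus actual), and the function name `evaluate` are assumptions and may differ in detail from the definitions in Table 1.

```python
import math


def evaluate(actual, pred):
    """Six error criteria under commonly used definitions (assumed, see lead-in)."""
    n = len(actual)
    err = [p - a for a, p in zip(actual, pred)]
    mean_a = sum(actual) / n
    mse = sum(e * e for e in err) / n
    rms_a = math.sqrt(sum(a * a for a in actual) / n)
    rms_p = math.sqrt(sum(p * p for p in pred) / n)
    ia_den = sum((abs(p - mean_a) + abs(a - mean_a)) ** 2 for a, p in zip(actual, pred))
    return {
        "AE": sum(err) / n,                                          # average error
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(err, actual)) / n,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in err) / n,
        "TIC": math.sqrt(mse) / (rms_a + rms_p),                     # Theil coefficient
        "IA": 1 - sum(e * e for e in err) / ia_den,                  # index of agreement
    }


scores = evaluate([100, 200, 300, 400], [110, 190, 310, 390])
```

MAPE is undefined when an actual value is zero, and IA is undefined when all actual and predicted values equal the mean of the actual values, so real data should be checked for both cases.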

3.3 Experiment design

In this paper, three experiments were designed for comparison purposes. In Experiment I, the proposed model is compared with six other hybrid models based on different decomposition methods to demonstrate the superiority of the proposed dual decomposition strategy. Specifically, the six hybrid models selected as benchmarks are as follows: X12-ARIMA-FCM-FTS, ICEEMDAN-FCM-FTS, CEEMDAN-FCM-FTS, EEMD-FCM-FTS, WD-FCM-FTS, and ICEEMDAN(R)-FCM-FTS. In Experiment II, from a longitudinal perspective, the proposed model is compared with several partial hybrid models which use only some of the single methods involved in our model. On the basis of this experiment, we try to demonstrate the importance of the components of our model, including the effectiveness of the X12-ARIMA and ICEEMDAN methods, as well as the superiority of the FCM algorithm and the dual decomposition strategy. In Experiment III, we further compare the proposed model with some popular single models, such as typical econometric models and ANNs, to demonstrate the superiority of our model.

3.4 Experiment I

To fully verify the forecasting superiority of our proposed dual decomposition strategy, two types of comparative analysis were carried out. In Comparison I, we compare the performance of five hybrid models using different decomposition methods (X12-ARIMA, WD, EEMD, CEEMDAN, and ICEEMDAN) and the same forecasting method (i.e., the FCM-FTS model) to demonstrate the effectiveness of the individual decomposition methods used in our model. Tables 6 and 7 in Appendix 1 show the main parameters of the compared decomposition methods and the corresponding prediction results, respectively. In Comparison II, we compare our hybrid model with the ICEEMDAN-FCM-FTS, X12-ARIMA-FCM-FTS, and ICEEMDAN(R)-FCM-FTS models to further demonstrate the superiority of the proposed dual decomposition strategy, with the corresponding experimental results shown in Table 8 in Appendix 1. In detail, the ICEEMDAN(R)-FCM-FTS model is built under the decomposition & de-noising strategy [25], while the other compared hybrid models in this subsection are built under the divide & conquer strategy [26]. Generally speaking, these two data preprocessing strategies are widely used for forecasting complex data with high volatility and irregularity and can represent currently popular modeling strategies.

The detailed comparison and analysis are as below:

  (1) In Comparison I, comparing the prediction performance of four hybrid models, namely ICEEMDAN-FCM-FTS, CEEMDAN-FCM-FTS, EEMD-FCM-FTS, and WD-FCM-FTS, shows that the FCM-FTS forecasting model combined with ICEEMDAN is superior to the same model combined with CEEMDAN, EEMD, or WD, which reflects the advantages of ICEEMDAN over other traditional decomposition methods. Moreover, the X12-ARIMA-FCM-FTS model performs better than these four hybrid models in all cases. Taking case 1 as an example, the MAPE value of the X12-ARIMA-FCM-FTS model is the lowest (9.5122%) and that of the ICEEMDAN-FCM-FTS model is the second lowest (11.8851%), while those of the CEEMDAN-FCM-FTS, EEMD-FCM-FTS, and WD-FCM-FTS models are 12.6937%, 13.2176%, and 14.0263%, respectively. Similar results are obtained in the other two cases, which supports both the superiority of ICEEMDAN and the necessity of seasonal decomposition for tourist arrivals forecasting. In particular, for data with significant seasonal characteristics (such as Hong Kong’s tourist arrivals), using X12-ARIMA for seasonal decomposition can effectively improve the model prediction performance.

  (2) In Comparison II, comparing the proposed model with three other hybrid models, namely ICEEMDAN-FCM-FTS, X12-ARIMA-FCM-FTS, and ICEEMDAN(R)-FCM-FTS, shows that our model, i.e., X12-ARIMA-ICEEMDAN-FCM-FTS, has the best prediction performance. It can be seen from Table 8 (in Appendix 1) that, except for the AE criterion in case 2, the X12-ARIMA-ICEEMDAN-FCM-FTS model performs best on all evaluation criteria in all cases. The proposed dual decomposition strategy therefore has clear advantages over the above-mentioned traditional decomposition strategies and plays a significant role in improving the model prediction performance.

Remark 1

Based on the comparative analysis of the values of the AE, MAPE, RMSE, MAE, TIC, and IA criteria, ICEEMDAN is more effective than other traditional decomposition methods (such as WD and EEMD) when combined with the prediction model, reflecting the superiority of ICEEMDAN. Meanwhile, the X12-ARIMA-FCM-FTS model performs best among all the hybrid models in Comparison I, which again verifies the rationality and necessity of adopting a targeted data preprocessing strategy for tourist arrivals data with significant seasonal characteristics. Furthermore, Comparison II demonstrates the effectiveness of the proposed dual decomposition strategy compared with the traditional data preprocessing strategies. To sum up, by applying the divide & conquer strategy to both the raw series with seasonal patterns and the seasonally adjusted series, the proposed dual decomposition strategy can overcome the potential disadvantages of individual decomposition approaches and plays an important role in improving the model prediction performance.

3.5 Experiment II

Experiment II was designed mainly to verify the effectiveness of the hybrid modeling strategy based on X12-ARIMA, ICEEMDAN, FTS, and FCM. It consists of Comparison I and Comparison II, both for longitudinal comparison purposes. Generally speaking, the discretization method is important to the prediction performance of an FTS method. In related studies, the equal frequency (EF) and equal width (EW) interval algorithms, the most commonly used unsupervised discretization methods, cannot always achieve satisfactory forecasting results. Therefore, we use the FCM algorithm as the discretization method for the fuzzy time series. Accordingly, in Comparison I of this subsection, the FTS model that uses the FCM method to partition the universe of discourse (i.e., the FCM-FTS model) is compared with the FTS models that use the EW and EF methods for the same purpose (i.e., the EW-FTS and EF-FTS models) to demonstrate the effectiveness of the FCM algorithm. In Comparison II, we compare our model, i.e., X12-ARIMA-ICEEMDAN-FCM-FTS, with three other models, namely FCM-FTS, X12-ARIMA-FCM-FTS, and ICEEMDAN-FCM-FTS, which are partial hybrid models using only some of the single methods involved in the proposed model, to fully illustrate the rationality of our hybrid modeling strategy. The detailed comparison and analysis are shown below:

  (1) As demonstrated in Table 9 (in Appendix 1), compared with the EW-FTS and EF-FTS models, the FCM-FTS model achieves the best values on almost all evaluation criteria in all three cases. Taking case 1 as an example, except for the AE criterion, the values of MAE, RMSE, MAPE, TIC, and IA of the FCM-FTS model are 16,127.6028, 20,978.7502, 15.8852%, 0.1013, and 0.9910, respectively, all of which are better than those of the EW-FTS and EF-FTS models. This demonstrates that using the FCM algorithm as the discretization method improves the performance of the fuzzy time series model more effectively.

  (2) Table 10 in Appendix 1 shows the experimental results of Comparison II. The X12-ARIMA-FCM-FTS and ICEEMDAN-FCM-FTS models perform better than the FCM-FTS model, which once again shows the importance and necessity of applying data preprocessing to data with complex characteristics. Comparing the proposed hybrid model with the FCM-FTS, X12-ARIMA-FCM-FTS, and ICEEMDAN-FCM-FTS models shows that our model, i.e., X12-ARIMA-ICEEMDAN-FCM-FTS, has the best prediction performance. For example, the MAPE values of our proposed model in the three cases are 4.2343%, 3.4946%, and 4.7533%, respectively, clearly lower than those of the other three models. Moreover, for the IA criterion, the values of our model in the three cases are all greater than 0.999, while those of the three compared models are all below 0.999.

Remark 2

It is necessary to find a reasonable and effective discretization method for the fuzzy time series model. In Comparison I, the FTS model with FCM as the discretization method performs better than those with EW and EF as the discretization methods. In addition, the proposed forecasting model performs best in Comparison II, which shows that the proposed hybrid modeling strategy can significantly improve the performance of the benchmark FTS model by integrating the advantages of each single method.
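To make the three discretization schemes compared in Experiment II concrete, the following is a minimal sketch of how the universe of discourse could be partitioned by EW, EF, and FCM. It is an illustration under stated assumptions, not the implementation used in this paper: the function names, the fuzzifier `m = 2`, the quantile-based initialization, and the rule of placing interval boundaries at the midpoints between adjacent FCM cluster centres are all assumed here.

```python
import numpy as np

def equal_width_edges(series, k):
    """EW: split [min, max] into k intervals of identical length."""
    lo, hi = float(np.min(series)), float(np.max(series))
    return np.linspace(lo, hi, k + 1)

def equal_frequency_edges(series, k):
    """EF: place edges at quantiles so each interval holds a similar number of points."""
    return np.quantile(series, np.linspace(0.0, 1.0, k + 1))

def fcm_centers_1d(series, k, m=2.0, n_iter=100, tol=1e-6):
    """Plain fuzzy C-means on a 1-D series; returns the sorted cluster centres."""
    x = np.asarray(series, dtype=float)
    # Assumed deterministic initialization at evenly spaced quantiles.
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        dist = np.abs(x[:, None] - centers[None, :]) + 1e-12  # shape (n, k)
        # Membership u_ik is proportional to d_ik^(-2/(m-1)), normalized over clusters.
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        um = u ** m
        new_centers = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
        shift = np.max(np.abs(new_centers - centers))
        centers = new_centers
        if shift < tol:
            break
    return np.sort(centers)

def edges_from_centers(series, centers):
    """Assumed rule: boundaries at the midpoints between adjacent centres."""
    mids = (centers[:-1] + centers[1:]) / 2.0
    return np.concatenate(([np.min(series)], mids, [np.max(series)]))

# Hypothetical, unevenly distributed toy series (not data from this paper).
toy = np.array([10, 11, 12, 13, 14, 15, 40, 42, 80, 85, 90, 200], dtype=float)
ew_edges = equal_width_edges(toy, 3)
ef_edges = equal_frequency_edges(toy, 3)
fcm_edges = edges_from_centers(toy, fcm_centers_1d(toy, 3))
```

On skewed data such as the toy series, EW tends to leave some intervals nearly empty and EF tends to split dense regions arbitrarily, whereas the FCM-based boundaries follow where the observations actually cluster.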

3.6 Experiment III

In Experiment III, we further test the prediction performance of our model by taking some commonly used single forecasting models as benchmarks, including traditional econometric models, typical ANNs, and other popular models. Specifically, the extreme learning machine (ELM) and the backpropagation neural network (BPNN) are chosen as typical ANNs, while seasonal ARIMA (SARIMA) and double exponential smoothing (DES) are chosen as typical traditional econometric models. Support vector regression (SVR), which is popular in forecasting, is also included for comparison in this experiment.

Accordingly, two standard ANN models, i.e., BPNN (5-1-1) and ELM (5-1-1), are established, whose training set and testing set are presented in Fig. 3. For the SARIMA model, the parameters are determined based on a stability test and the Akaike Information Criterion (AIC) [47]. For the SVR model, the radial basis function (RBF), the most popular kernel function, is chosen. The main parameters of the comparison models are shown in Table 11 (in Appendix 1), and Table 12 in Appendix 1 presents the corresponding results.

Fig. 3 The training set and testing set of ANN models
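As a rough illustration of how training and testing samples for a network with five input nodes and one output node might be built from a univariate series, the sketch below forms sliding-window samples and splits them chronologically. It assumes that the five inputs are the five most recent observations, which the text does not state explicitly; the function names and the test-set size are hypothetical.

```python
import numpy as np

def make_lagged_samples(series, n_lags=5):
    """Turn a series into (X, y): each row of X holds n_lags past values, y is the next value."""
    x = np.asarray(series, dtype=float)
    X = np.stack([x[i:i + n_lags] for i in range(len(x) - n_lags)])
    y = x[n_lags:]
    return X, y

def chronological_split(X, y, n_test):
    """Keep the last n_test samples for testing; no shuffling, to respect time order."""
    return (X[:-n_test], y[:-n_test]), (X[-n_test:], y[-n_test:])

# Hypothetical toy series standing in for monthly arrivals.
toy_series = np.arange(10, dtype=float)
X_all, y_all = make_lagged_samples(toy_series, n_lags=5)
(train_X, train_y), (test_X, test_y) = chronological_split(X_all, y_all, n_test=2)
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for any out-of-sample comparison of forecasting models.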

According to Table 12 (in Appendix 1), the detailed experimental results and comparative analysis are shown below:

  (1) The proposed X12-ARIMA-ICEEMDAN-FCM-FTS model performs best in all three cases. Since the AE criterion alone is not sufficient to reflect the prediction accuracy, more attention should be paid to the MAPE, RMSE, and MAE criteria when evaluating the model prediction performance. Taking the MAPE criterion as an example, the proposed forecasting model has the lowest MAPE values in all three cases, reaching 3.4946% in case 2, which reflects its superior prediction performance.

  (2) Among the ANN models, the BPNN performs slightly better than the ELM. Among the traditional econometric models, the DES model performs worse, with a MAPE value as high as 25.5657% in case 3. By comparison, the SARIMA model performs better and is second only to our proposed model, which again shows that for data with significant seasonal characteristics, such as tourist arrivals series, the targeted use of seasonal forecasting methods can achieve better prediction results.

Remark 3

Compared with the five commonly used single forecasting models, our hybrid model performs best. Among all the single models selected, the SARIMA performs best, and the DES performs worst.

4 Discussion

This section presents an in-depth comparative analysis of the prediction results of all models for the years 2017, 2018, and 2019. Moreover, we further analyze the prediction performance of all the models involved from several different perspectives, including the DM statistics, forecasting effectiveness, improvement percentage, and grey relational degree.

4.1 Forecasting results at different years

Tourist arrivals data have complex characteristics and are extremely vulnerable to abnormal events, which cause abnormal fluctuations and greatly increase the difficulty of prediction. Therefore, it is particularly important for a model to maintain stable and accurate prediction performance when abnormal events occur. In view of this concern, based on three basic evaluation criteria of prediction accuracy, i.e., MAE, RMSE, and MAPE, this subsection compares and analyzes the prediction results of all the involved models for the years 2017, 2018, and 2019. It should be noted that Hong Kong's tourism industry was severely affected in 2019 by the outbreak of some social events, which caused the tourist arrivals to Hong Kong in that year to show a very irregular pattern compared with the previous years. The experimental results also show that the prediction performance of each model differs considerably across these three years.

As demonstrated in Table 13 (in Appendix 1), we can observe the following. (a) In all three cases, the proposed model performs best in all three years. Taking the prediction for 2017 in case 1 as an example, the MAPE value of our proposed model is as low as 2.30%, indicating excellent prediction performance. (b) Most of the prediction models perform better in 2017 and 2018 than in 2019. For example, the SARIMA model performs well in predicting the tourist arrivals to Hong Kong in 2017 and 2018 in case 2, where the MAE, RMSE, and MAPE values are 2440.96, 3057.41, and 4.93% in 2017 and 1241.60, 1952.32, and 2.44% in 2018, respectively. In 2019, the values of these three criteria are as high as 5997.63, 8297.25, and 15.69%, respectively, which are several times the values of the previous two years. This result confirms that the tourist arrivals to Hong Kong in 2019 underwent extremely irregular changes compared with the previous years due to social events. Therefore, models built on historical data have difficulty capturing the trend variation of the data in 2019. (c) Some models show little change in their prediction results across the three years, and their prediction performance in 2019 is even better than in the previous two years, such as the EEMD-FCM-FTS and WD-FCM-FTS models in case 2 and case 3, which indicates that these models are relatively stable and almost unaffected by the abnormal events in 2019. Nevertheless, the overall prediction performance of these models is still poor, and the proposed model performs best among all the compared models. Even for the 2019 data, which were affected by the abnormal events, our proposed model still shows superior prediction performance.
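The per-year breakdown discussed above can be sketched as follows, assuming the standard definitions of MAE, RMSE, and MAPE and a year label for each monthly observation. The function names and the toy numbers are hypothetical and are not taken from Table 13.

```python
import numpy as np

def mae(y, f):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - f)))

def rmse(y, f):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - f) ** 2)))

def mape(y, f):
    """Mean absolute percentage error, in percent (assumes no zero actual values)."""
    return float(np.mean(np.abs((y - f) / y)) * 100.0)

def criteria_by_year(years, actual, forecast):
    """Group observations by year and compute the three criteria within each group."""
    years = np.asarray(years)
    y = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    out = {}
    for yr in np.unique(years):
        mask = years == yr
        out[int(yr)] = {
            "MAE": mae(y[mask], f[mask]),
            "RMSE": rmse(y[mask], f[mask]),
            "MAPE": mape(y[mask], f[mask]),
        }
    return out

# Hypothetical toy data: two observations per year.
toy_result = criteria_by_year(
    years=[2017, 2017, 2018, 2018],
    actual=[100.0, 200.0, 100.0, 200.0],
    forecast=[110.0, 190.0, 100.0, 220.0],
)
```

Reporting the criteria per year, rather than only over the whole test period, exposes models whose aggregate error is dominated by a single irregular year.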

4.2 Statistical hypothesis testing: Diebold-Mariano test

From a statistical perspective, Diebold and Mariano [48] proposed the Diebold-Mariano (DM) statistic to test whether the difference between the prediction performances of two models is significant. For the DM test, the null hypothesis is that there is no significant difference between the prediction performances of the two compared models. If the test rejects the null hypothesis at a given significance level, the prediction performances of the two compared models are significantly different.

For further comparison, the DM test is used to examine whether the performance of our proposed model differs significantly from that of each of the fourteen benchmark models involved in the previously designed experiments. Using the mean square error as the loss function, the corresponding statistics are shown in Table 14 (in Appendix 1). For all cases and the average level, we can observe that: (a) the DM statistics of almost all benchmark models are greater than the critical value at the 1% significance level, which once again reflects the remarkable superiority of our model; (b) among all benchmark models, the DM statistics of the SARIMA and X12-ARIMA-FCM-FTS models are almost the lowest, showing that for tourist arrivals data with seasonal characteristics, the targeted use of seasonal forecasting methods can achieve better forecasting results, and also supporting the choice of the seasonal decomposition method in this paper.
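For readers who wish to reproduce this kind of test, the following is a minimal sketch of the DM statistic with squared-error loss. It assumes one-step-ahead forecasts, so the long-run variance of the loss differential is approximated by its sample variance without autocovariance terms, and it uses the standard normal distribution for the p-value. The function name and the toy numbers are hypothetical, and the exact variant used for Table 14 may differ.

```python
import math
import numpy as np

def dm_test_mse(actual, forecast_benchmark, forecast_proposed):
    """DM statistic for squared-error loss; positive values favour the proposed model."""
    y = np.asarray(actual, dtype=float)
    e_bench = y - np.asarray(forecast_benchmark, dtype=float)
    e_prop = y - np.asarray(forecast_proposed, dtype=float)
    d = e_bench ** 2 - e_prop ** 2        # loss differential per period
    n = d.size
    var_d = d.var(ddof=0)                 # lag-0 autocovariance only (assumed h = 1)
    if var_d == 0.0:
        raise ValueError("loss differential is constant; DM statistic is undefined")
    dm = d.mean() / math.sqrt(var_d / n)
    # Two-sided p-value under the standard normal approximation.
    normal_cdf = 0.5 * (1.0 + math.erf(abs(dm) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - normal_cdf)
    return dm, p_value

# Hypothetical toy data (not taken from Table 14).
toy_actual = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
toy_bench = toy_actual - np.array([3.0, -2.0, 4.0, -3.0, 2.0, -4.0])
toy_prop = toy_actual - np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
dm_stat, dm_p = dm_test_mse(toy_actual, toy_bench, toy_prop)
```

Under this sign convention, a statistic above the two-sided 1% critical value of about 2.576 indicates that the benchmark's squared errors are significantly larger than those of the proposed model.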

4.3 Forecasting effectiveness

Furthermore, we use forecasting effectiveness (FE) [12] to measure the model prediction accuracy. The higher the FE value, the better the prediction performance. Specifically, the first-order FE is the expected value of the prediction accuracy sequence, and the second-order FE is the difference between this expected value and the standard deviation of the sequence. To further compare the prediction accuracy, we calculate the first-order and second-order FE values for all the models involved in this paper, as shown in Table 15 (in Appendix 1). We can observe that: (a) in all three cases, the FE values of our hybrid model are always the highest, namely 0.957657, 0.965054, and 0.952467 for the first-order FE and 0.905740, 0.928907, and 0.903511 for the second-order FE, meaning that our model performs best; (b) among all the comparison models, the FE values of the EF-FTS model are the lowest, which means that the traditional equal frequency (EF) division method is not suitable for data such as the tourist arrivals to Hong Kong.
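The FE calculation described above can be sketched as follows. The sketch assumes that the prediction accuracy at each time point is one minus the absolute relative error, clipped to the range from 0 to 1, and that the population standard deviation is used; neither detail is stated explicitly here, and the function name is hypothetical.

```python
import numpy as np

def forecasting_effectiveness(actual, forecast):
    """Return (first-order FE, second-order FE) following the description in the text."""
    y = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    # Assumed accuracy sequence: 1 - |relative error|, clipped to [0, 1].
    accuracy = np.clip(1.0 - np.abs((y - f) / y), 0.0, 1.0)
    fe1 = accuracy.mean()                 # expected value of the accuracy sequence
    fe2 = fe1 - accuracy.std()            # expected value minus standard deviation
    return float(fe1), float(fe2)

# Hypothetical toy data (not taken from Table 15).
fe1_toy, fe2_toy = forecasting_effectiveness(
    actual=[100.0, 100.0, 100.0, 100.0],
    forecast=[90.0, 110.0, 95.0, 100.0],
)
```

When no relative error exceeds 100%, the first-order FE under this definition equals one minus the MAPE, which agrees with the reported figures (for example, 0.957657 in case 1 against a MAPE of 4.2343%).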

4.4 Improvement percentage

In this subsection, we use three criteria to measure the improvement percentages of our hybrid model relative to all the comparison models. The improvement percentage criteria are denoted as \(RE_{MAE}\), \(RE_{RMSE}\), and \(RE_{MAPE}\), representing the relative reduction in MAE, RMSE, and MAPE, respectively; their calculation formulas are given in Table 2. Table 16 in Appendix 1 shows the corresponding results, and further comparisons and analyses are given below.

Table 2 Improvement percentage criteria

As reported in Table 16 in Appendix 1: (a) Our proposed model shows the greatest improvement relative to the benchmark FTS models (i.e., EW-FTS and EF-FTS), with improvement percentages of about 80%. Meanwhile, the improvement percentages of our model relative to the hybrid models using traditional decomposition methods and strategies are also high, at around 70%. These results once again verify the rationality of the hybrid modeling strategy and the dual decomposition strategy used in our model, which greatly improve the model prediction performance by integrating the advantages of each method involved. (b) Relative to the traditional single prediction models, our proposed model shows a significant improvement, with almost all values of the three improvement percentage criteria above 50%. The proposed model can therefore achieve satisfactory prediction performance in tourist arrivals forecasting.
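As a worked illustration, the improvement percentage can be sketched as the relative reduction of an error criterion when moving from a benchmark model to the proposed model. This form is assumed here as the usual definition of such criteria; the function name is hypothetical. The example uses the case 1 MAPE values quoted in Experiment II for the FCM-FTS model (15.8852%) and the proposed model (4.2343%).

```python
def improvement_percentage(benchmark_value, proposed_value):
    """Relative reduction (in percent) of an error criterion such as MAE, RMSE, or MAPE."""
    return (benchmark_value - proposed_value) / benchmark_value * 100.0

# Case 1 MAPE values quoted in the text: FCM-FTS versus the proposed model.
re_mape_fcm_fts = improvement_percentage(15.8852, 4.2343)
```

With these two values the criterion comes out at a little over 73%, which is of the same order as the improvements over the other FTS-based benchmarks discussed above.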

4.5 Grey relational degree

In this subsection, the degree of correlation between the prediction results and the actual time series is measured by the grey relational degree (GRD) [49], for which a higher value means better prediction performance. The corresponding results are shown in Table 17 (in Appendix 1), with the detailed analysis as follows: (a) the prediction results of our model have the strongest correlation with the actual series, with GRD values greater than 0.9 in all three cases and an average of 0.924274; (b) the SARIMA model and the X12-ARIMA-FCM-FTS model also perform very well, with average GRD values of 0.858878 and 0.871148, respectively, which once again reflects the rationality of using a seasonal data preprocessing method for tourist arrivals series with significant seasonal characteristics. Therefore, we can reasonably conclude that our hybrid forecasting model differs significantly from the benchmark models in the level of prediction accuracy.
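A minimal sketch of a grey relational degree calculation is given below. It assumes Deng's classical formulation with a distinguishing coefficient of 0.5, scaling of all series by the mean of the actual series, and global minimum and maximum deviations taken over all compared models. These choices, the function name, and the toy data are assumptions for illustration and may differ from the procedure behind Table 17.

```python
import numpy as np

def grey_relational_degrees(actual, forecasts, rho=0.5):
    """Map each model name to its grey relational degree with the actual series."""
    y = np.asarray(actual, dtype=float)
    scale = y.mean()                      # assumed normalization by the actual mean
    ref = y / scale
    deltas = {
        name: np.abs(ref - np.asarray(f, dtype=float) / scale)
        for name, f in forecasts.items()
    }
    all_delta = np.concatenate(list(deltas.values()))
    d_min, d_max = all_delta.min(), all_delta.max()
    if d_max == 0.0:                      # every forecast matches the actual series
        return {name: 1.0 for name in deltas}
    # Relational coefficient per point, then averaged over time for each model.
    return {
        name: float(np.mean((d_min + rho * d_max) / (d + rho * d_max)))
        for name, d in deltas.items()
    }

# Hypothetical toy data (not taken from Table 17).
toy_grd = grey_relational_degrees(
    actual=[100.0, 120.0, 140.0, 160.0],
    forecasts={
        "close": [101.0, 119.0, 141.0, 158.0],
        "far": [80.0, 150.0, 110.0, 190.0],
    },
)
```

Because the minimum and maximum deviations are shared across models, the resulting degrees are comparable only within the same set of compared forecasts.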

5 Conclusions

Accurate prediction of tourist arrivals is important for the whole tourism industry and also the entire economy. Meanwhile, it is of reference value to both travel agencies and tourist destinations. Unfortunately, due to its complex characteristics, tourist arrivals forecasting remains a challenging task. Thus, a hybrid model using dual decomposition strategy and an improved FTS method is proposed to predict tourist arrivals.

In the empirical study, experiments are designed using Hong Kong’s tourist arrivals from the USA, the UK, and Germany as data samples. The results demonstrate that: (a) The novel dual decomposition strategy based on the X12-ARIMA and ICEEMDAN methods proposed in this study can not only overcome the inherent defects of individual decomposition methods, such as mode mixing, noise, and redundancy, but also fully extract the complex features of the original time series at different time scales. Compared with the traditional decomposition strategies used in hybrid forecasting models, the proposed dual decomposition strategy is more effective in improving the model prediction performance. (b) The combination of the X12-ARIMA-ICEEMDAN decomposition strategy and the FCM-FTS forecasting model is very effective, as it integrates the advantages of each single method involved. Our model performs better than the benchmark models in all three cases, indicating that it is a promising tool for tourist arrivals forecasting.

The current results have important practical implications. The findings imply that more accurate predictions of the monthly tourist arrivals to a country or region from other countries or regions can be obtained via the proposed model, which brings at least two benefits: (a) Taking the monthly dynamics of future tourist arrivals as a reference for decision-making, travel agencies and other tourism-related enterprises can operate more effectively, and tourist destinations can be managed more efficiently. (b) Based on accurate monthly forecasts of tourist arrivals, the performance of the whole tourism industry and the entire economy can be monitored in a real-life environment. The findings also imply that the proposed model can maintain stable and accurate prediction performance when abnormal events occur. This is important in practical applications, as it is common for data to be affected by abnormal events. Thus, the proposed model may also be applicable to other fields whose data have characteristics similar to those of tourist arrivals.

However, our proposed hybrid model still has limitations, as it does not incorporate qualitative analysis. In addition, factors related to tourism demand (such as per capita GDP and the number of air routes opened) could be added to the forecasting model. In future research, qualitative analysis and quantitative prediction can be combined so as to achieve better prediction performance.