Big Data: Forecasting and Control for Tourism Demand

Reina, Miguel Ángel Ruiz

doi:10.1007/978-3-030-56219-9_18

Miguel Ángel Ruiz Reina ORCID: orcid.org/0000-0001-6055-7810

Part of the book series: Contributions to Statistics (CONTRIB.STAT.)

Included in the following conference series:

International Conference on Time Series and Forecasting


Abstract

In this study, innovative forecasting techniques and Big Data sources are used to study Hotel Overnight Stays in Spain, with forecasts from January 2018 to June 2019. The rapid development of the Tourism sector, together with the application of Big Data technologies, allows economic agents to make efficient decisions. In this work, data collected with Google data mining tools provide knowledge about Hotel Tourism Demand in Spain. The analysis meets the four basic principles of Big Data analysis: volume, velocity, variety and veracity. In this setting, the methodology relies on ARDL and ECM models, and a Granger-Causality test extended to seasonality is developed. The first explains when economic agents will make their decisions, while the second allows short-term and long-term forecasting. As a result, tourism supply and demand can be closely adjusted at every moment of the year. As a model selection criterion, the Matrix U1 Theil is proposed, which quantifies how much better one model is than another in terms of forecasting.



1 Introduction

The use of massive data in a digital environment has led to a disruptive change in the developed economies of the world. Even before the Big Data concept appeared, the amount of data collected already exceeded the capacity to process and analyze it. The massive data generated by millions of device users, together with data analysis, have created a digital economy that was unforeseen decades ago [1].

The “Tourism Industry” [2] generates a large quantity of data to be analyzed. This sector carries an increasing weight in Gross Domestic Product (GDP) and in turn generates externalities for economic agents [3].

This paper introduces a novel analysis of the data generated on the Internet for the Spanish tourism accommodation market by country of origin. Data from primary sources (official sources) are modelled together with secondary sources from Big Data (Google Trends, GT), following four basic principles of analysis: volume, velocity, variety and veracity. GT tracks how searches shift over time and reveals consumer intentions.

The main objective of this paper is to forecast Hotel Overnight Demand in Spain (HODS) from January 2018 to June 2019 by establishing a causality model for monthly data. The multivariate method developed, Autoregressive Distributed Lags with seasonal variables (ARDL + seasonality), uses as explanatory variables for HODS a search interest index (generated by GT) and seasonal dummy variables for monthly data by country of origin. This contribution is relevant because it allows tourism agents to make efficient decisions in the tourism market. To explain causal relations, a Granger-Causality test extended with seasonality is developed; the modelling makes it possible to identify when consumer interest occurs. Finally, a criterion for the selection of models, the Matrix U1 Theil, has been developed and is applied in this paper [4]. The forecasts are compared with univariate techniques such as Seasonal Autoregressive Integrated Moving Average (SARIMA) and the relatively new non-parametric technique Singular Spectrum Analysis (SSA).

The remainder of this paper is organized as follows. Sect. 2 reviews the existing literature on Tourism Demand forecasting, which has been influenced by the techniques of each period. Sects. 3 and 4 present the methodological development, the model selection criteria and the data analysis. The criterion for selecting predictive models based on Theil’s index is considered a contribution to the literature. Sect. 5 carries out an empirical analysis that applies the proposed methodology. Sect. 6 presents the conclusions, future lines of research for Data Scientists and some economic implications. Finally, the bibliographical references are listed.

2 Literature Review

Data science is a fundamental field for exploiting data and generating knowledge to support efficient decision-making. The literature reviewed suggests that new datasets from open sources such as Google could change culture and business in the Tourism field [5].

Tourism Demand is driven by multiple exogenous factors, and techniques have focused on robustness, dynamic modelling, scalability and granularity [6]. A variety of Big Data studies have been applied to Tourism research, bringing great improvement to the area [7]. Traditionally, these studies have been influenced by the techniques of the moment [8,9,10,11]. However, researchers have identified the need for greater integration between computational and scientific fields [12].

In our study, we carry out an analysis with novel techniques and compare it with the most widely used techniques. A contribution of this study is the use of Big Data tools [13], summarized in an index of relevance provided by GT.

2.1 Forecasting Methods Using Google Search Engines (Google Trends)

Previous researchers such as Lu and Liu [14] found correlations between Internet search behaviour and tourist flows. Shimshoni et al. [15] concluded that 90% of the categories analyzed (categories in socio-economic fields) are predictable, making a great contribution to the scientific literature.

Choi and Varian [16] developed several examples in R that use the GT tool, among them an analysis of tourism demand in Hong Kong. They obtained models with high explanatory capacity (on average $ R^{2} = 73\% $) using ARDL. Gawlik et al. [17] concluded that the evolution of GT search popularity is a useful predictor of tourism rates for a series of arrivals to Hong Kong. For the Charleston region (USA), practical and interesting applications of search engine data were found; the main limitation is that the study covered only one city [18].

To forecast Chinese tourist volumes, Yang, Pan et al. [19] proposed and demonstrated the usefulness of web search data, comparing the Baidu search engine with GT. Similarly, with data obtained through GT, a comparison of purely autoregressive models with ARDL models with seasonal dummy variables gave short-term results for Vienna using image, word or YouTube video searches [20].

Studies using GT have improved predictions for the Caribbean area. Autoregressive Mixed-Data Sampling models represent an improvement over SARIMA (Seasonal Autoregressive Integrated Moving Average) and AR models for 12-month predictions [21].

Tourist flows from Japan to South Korea have been examined by constructing a Google variable and selecting on the lowest Mean Square Error (MSE) or the mean absolute forecast error for monthly data. The best results were found for the model that uses Google data [22].

In the case of tourist flows from Spain, Germany, UK and France, Google data were used to construct indicators through Dynamic and SARIMA models [23]. For tourist arrivals in the city of Vienna [24], Google Analytics data were used with Bayesian methods. In the case of Puerto Rico, the volume of searches has been studied to predict the hotel demand of non-residents with a Dynamic Linear Model. The results showed improvements for forecasting horizons greater than 6 months [25]. Google data have also been used for tourist flows in Portugal [26] and in Spain [27].

Önder [28] compared forecasting models with web and/or image search indices for two cities (Vienna and Barcelona) and two countries (Austria and Belgium). Tourist arrivals in Prague were analyzed by Zeynalov [29] to assess whether GT is useful for forecasting tourist arrivals and overnight stays in Prague with weekly data. The results confirm that predictions based on Google searches are advantageous for policymakers and businesses operating in the Tourism sector.

The online behaviour of hotel consumers in the United States of America was studied with the Discrete Fourier Transform using GT data, providing empirical evidence for its use in marketing strategies [30].

For Amsterdam, Rödel [31] investigated forecasting Tourism Demand using keywords related to “Amsterdam” in GT. With the development of Big Data technology in recent decades, collaborative economy companies have emerged; the authors of [32] studied a vacation rental company that operates worldwide, restricting the results to the Iberian Peninsula. In 2018, a study was published on the online and offline behaviour of consumers for US restaurants with Google and Baidu search engine data [33].

Google provides an index that summarizes the interest in the search terms. For Baidu data, Li et al. [34] developed an index of interest and demonstrated the capacity of the Generalized Dynamic Factor Model (GDFM) to forecast tourist demand at a destination, using monthly Beijing tourist volumes from January 2011 to July 2015. A relevant study using Machine Learning algorithms is that of Sun et al. [35], which uses model selection criteria such as the Normalized Root Mean Squared Error (NRMSE) and MAPE, together with the Diebold-Mariano test to determine whether the differences between predictions are significant.

Measures of forecasting. As observed above, the Tourist Industry has attracted research interest and will continue to do so, mainly because it is an indicator of the evolution of the service economy. The modelling used is very diverse, and one aspect to take into account is the criteria used for model selection. The literature review shows the use of the Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE); Theil’s index [36,37,38,39]; and the Symmetric Mean Absolute Percentage Error (SMAPE) [40]. Some authors developed the RMSE ratio [41, 42], and in this article we develop the Matrix U1 Theil as a criterion for the selection of forecasting models [4]. This method quantifies the gain from using one methodology versus another.

To summarize the literature review, new models have been used in Data Science. In this work, new methodologies are developed, such as the extended Granger causality test for seasonal data. Dynamic models are developed to analyze forecasting capacity in the short and long term. Big Data tools from one of the largest search engines in the world are used, and a decision matrix on predictive capacity is developed for different time horizons.

3 Methodology

In this section, the cycle between supply and demand in tourism (see Fig. 1) is laid out under the four basic principles of Big Data. The objective of our paper is modelling and forecasting, so we take the data from the Data Warehouse as given [43]. The data come from official sources of the INE (see Note 1) and from Google (see Note 2), so all of the Extraction, Transformation and Loading (ETL) work [44] comes from the data engineering of these entities. The main objective is to make efficient, knowledge-based predictions that improve the user experience on the Tourism Demand side and the offers of the stakeholders.

3.1 Modelling and Forecasting Evaluation

In this paper, an ARDL + seasonality model is proposed and its application to data from Big Data architectures is analyzed. This modelling shows how HODS is generated through the searches of Google users (by country of origin). The purpose of the model is to establish the causality relationship and to make forecasts. A test is developed to analyze Granger causality jointly with seasonality. To evaluate forecasting capacity, the Matrix U1 Theil is developed by country of origin; it provides a dimensionless measure for comparing models. For further detail on the alternative forecasting methods, the reader can refer to the references on SARIMA [45] and Singular Spectrum Analysis [46]. All models are estimated for different scenarios, and forecasts are compared for the time horizons h = 3, 6, 12, 18.

Granger causality and seasonality testing: ARDL and ECM. We develop the test proposed by Granger [47] and discussed by Montero [48] to detect causality, since causality cannot be established by a simple correlation analysis.

The model considered by Granger involves two variables $ (y_{t}, x_{t}) $. Owing to the great influence of seasonality in the Tourism sector [49], the following equation is proposed, estimated with the HAC covariance method, which provides robust standard errors for the estimated parameters:

$$ \ln \left( {y_{t} } \right) = \beta_{0} \ln \left( {x_{t} } \right) + \sum\limits_{j = 1}^{m} {\beta_{j} \ln \left( {x_{t - j} } \right)} + \sum\limits_{j = 1}^{m} {\alpha_{j} \ln \left( {y_{t - j} } \right)} + \sum\limits_{i = 1}^{12} {\delta_{i} } w_{i} + \varepsilon_{t}^{\prime } $$

(1)

where $ w_{i} $ (i = 1, …, 12) is a deterministic seasonal dummy component, defined for monthly data as follows:

$$ \begin{aligned} & w_{1} = -1, \; w_{i} = 0 \; \text{for all other } i \\ & w_{1} = -1, \; w_{2} = 1, \; w_{i} = 0 \; \text{for all other } i \\ & w_{1} = -1, \; w_{3} = 1, \; w_{i} = 0 \; \text{for all other } i \\ & \quad \vdots \\ & w_{1} = -1, \; w_{12} = 1, \; w_{i} = 0 \; \text{for all other } i \\ \end{aligned} $$
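The coding above can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: it assumes that the k-th row of the definition corresponds to calendar month k (row 1 = January), which the text does not state explicitly, and the function name `seasonal_dummies` is hypothetical.

```python
def seasonal_dummies(month):
    """Return [w_1, ..., w_12] for a calendar month (1 = January).

    Assumption: row k of the definition maps to calendar month k.
    w_1 is -1 in every row; for month k >= 2, w_k is additionally set to 1.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    w = [0] * 12
    w[0] = -1              # w_1 = -1 in all rows
    if month >= 2:
        w[month - 1] = 1   # w_k = 1 for the current month k
    return w


# Dummy rows for the first three months of a year
rows = [seasonal_dummies(m) for m in (1, 2, 3)]
```

Each observation in the monthly sample would receive the row that matches its calendar month.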

The HAC covariance method yields standard errors that are robust to heteroskedasticity and autocorrelation in the estimated model. Once estimated, the residual $ \varepsilon_{t}^{{\prime }} $ is expected to be distributed as white noise.

The causality decision with seasonal effects rests on testing linear restrictions on the parameters of $ x_{t - j} $ and $ w_{i} $; the test statistic is asymptotically ($ T \ge 60 $) Chi-squared distributed [50].
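One common way to write such a joint test of linear restrictions is the Wald form below. It is a sketch under assumptions: the symbols $ R $, $ \hat{\theta} $, $ \hat{V}_{HAC} $ and $ q $ are introduced here for illustration and do not appear in the original, and the exact set of restrictions used in the paper is not spelled out in the text.

$$ H_{0}: R\theta = 0, \qquad W = \left( R\hat{\theta} \right)^{\prime} \left[ R\,\hat{V}_{HAC}\,R^{\prime} \right]^{-1} \left( R\hat{\theta} \right) \overset{a}{\sim} \chi_{q}^{2} $$

Here $ \theta $ stacks the parameters of Eq. (1), $ R $ selects the $ q $ parameters under test (for instance, the $ \beta_{j} $ on $ \ln(x_{t-j}) $ and the $ \delta_{i} $ on $ w_{i} $), and $ \hat{V}_{HAC} $ is the HAC estimate of the covariance matrix of $ \hat{\theta} $. Rejecting $ H_{0} $ would be read as evidence of causality with seasonal effects.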

The most general expression of a dynamic ARDL(m, n) model (see Note 3) with seasonal components is as follows [51, 52]:

$$ \gamma \left( L \right)\ln \left( {y_{t} } \right) = \delta \left( L \right)\ln \left( {x_{t} } \right) + \sum\limits_{i = 1}^{12} {\alpha_{i} w_{i} } + \varepsilon_{t} $$

(2)
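To make Eq. (2) concrete, the sketch below computes a one-step forecast in levels once the error term is set to zero. It assumes the usual expansion of the lag polynomials, $ \gamma(L) = 1 - \gamma_{1}L - \dots - \gamma_{m}L^{m} $ and $ \delta(L) = \delta_{0} + \delta_{1}L + \dots + \delta_{n}L^{n} $, which the source does not write out. The function name and all numeric inputs are hypothetical, and the back-transform from logs is the naive exponential.

```python
import math


def ardl_log_forecast(y_lags, x_now_and_lags, gamma, delta, seasonal_effect=0.0):
    """One-step forecast of y_t from Eq. (2), with the error term set to zero.

    Assumed expansion (not stated in the source):
      gamma(L) = 1 - gamma_1 L - ... - gamma_m L^m
      delta(L) = delta_0 + delta_1 L + ... + delta_n L^n
    y_lags          = [y_{t-1}, ..., y_{t-m}]
    x_now_and_lags  = [x_t, x_{t-1}, ..., x_{t-n}]
    seasonal_effect = sum_i alpha_i * w_i for the target month
    """
    if len(y_lags) != len(gamma) or len(x_now_and_lags) != len(delta):
        raise ValueError("lag lists and coefficient lists must match in length")
    log_y = sum(g * math.log(y) for g, y in zip(gamma, y_lags))
    log_y += sum(d * math.log(x) for d, x in zip(delta, x_now_and_lags))
    log_y += seasonal_effect
    return math.exp(log_y)  # naive back-transform from logs to levels


# Hypothetical ARDL(1, 1) coefficients and inputs, for illustration only
forecast = ardl_log_forecast(
    y_lags=[15_000_000.0],
    x_now_and_lags=[60.0, 55.0],
    gamma=[0.6],
    delta=[0.3, 0.1],
    seasonal_effect=5.0,
)
```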

To evaluate the dynamic persistence of an effect of the exogenous variable at a given moment, the Error Correction Model (ECM regression, or ARDL Error Correction Regression) is constructed. The ECM regression (see Note 4) is as follows:

$$\begin{aligned} \Delta \ln \left( {y_{t} } \right) = & \delta _{0} \Delta \ln \left( {x_{t} } \right) + \sum\limits_{{j = 1}}^{n} {\lambda _{j} \Delta \ln \left( {x_{{t - j}} } \right)} + \sum\limits_{{j = 1}}^{m} {\delta _{j} \Delta \ln \left( {y_{{t - j}} } \right)} \\ & - \gamma \left( L \right)\left[ {\ln (y_{{t - 1}} ) - \beta \ln (x_{{t - 1}} )} \right] + \sum\limits_{{i = 1}}^{{12}} {\alpha _{i} w_{i} } + \varepsilon _{t} \\ \end{aligned} $$

(3)
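The mechanics of Eq. (3) can be illustrated with a deliberately simplified sketch. It assumes no lagged differences (all $ \lambda_{j} $ and $ \delta_{j} $ for $ j \ge 1 $ set to zero) and replaces $ \gamma(L) $ with a scalar adjustment coefficient `adj`; both simplifications, the function name and the inputs are assumptions made for illustration.

```python
import math


def ecm_step(y_prev, x_prev, x_now, delta0, adj, beta, seasonal_effect=0.0):
    """Predicted change in ln(y_t) from a simplified version of Eq. (3).

    Assumptions: no lagged differences, and a scalar adjustment
    coefficient `adj` in place of the lag polynomial gamma(L).
    """
    d_log_x = math.log(x_now) - math.log(x_prev)                  # short-run impulse
    disequilibrium = math.log(y_prev) - beta * math.log(x_prev)   # long-run gap
    return delta0 * d_log_x - adj * disequilibrium + seasonal_effect
```

When $ y_{t-1} $ sits above its long-run level, the bracketed gap is positive and the error correction term pulls the predicted change down, provided `adj` is positive.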

In this model, short-term effects are represented by the parameters of the first-differenced variables, while long-term effects, with $ |\gamma (L)| < 1 $, are represented by the Error Correction term. According to Zivot [53], if the long-term effect is not statistically significant, cointegration does not exist. The long-run multiplier is defined as $ \beta = \frac{\delta \left( L \right)}{\gamma \left( L \right)} $.

Forecasting Evaluation: Theil’s measures. To verify the forecasting accuracy of the different models, we adopt an evaluation criterion that compares out-of-sample forecasting performance. We work with Theil’s inequality index [36]:
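As a numerical illustration of the long-run multiplier, the sketch below evaluates both lag polynomials at $ L = 1 $. This is an assumption about how the ratio is computed in practice, and it uses the convention $ \gamma(L) = 1 - \gamma_{1}L - \dots $ and $ \delta(L) = \delta_{0} + \delta_{1}L + \dots $; the function name and the coefficients are hypothetical.

```python
def long_run_multiplier(delta, gamma):
    """beta = delta(1) / gamma(1).

    Assumptions: delta(L) = delta_0 + delta_1 L + ... and
    gamma(L) = 1 - gamma_1 L - ..., both evaluated at L = 1.
    """
    gamma_at_one = 1.0 - sum(gamma)
    if gamma_at_one == 0:
        raise ZeroDivisionError("gamma(1) = 0: no finite long-run multiplier")
    return sum(delta) / gamma_at_one


# Hypothetical coefficients: delta(1) = 0.3 and gamma(1) = 0.6
beta = long_run_multiplier(delta=[0.2, 0.1], gamma=[0.4])
```

With these illustrative values the multiplier is about 0.5, meaning a permanent 1% rise in the search index would be associated with roughly a 0.5% rise in overnight stays in the long run.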

$$ U_{1} = \frac{{\left[ {\frac{1}{h}\sum\limits_{k = 1}^{h} {\left( {y_{T + k} - \hat{y}_{T + k} } \right)^{2} } } \right]^{1/2} }}{{\left[ {\frac{1}{h}\sum\limits_{k = 1}^{h} {\left( {y_{T + k} } \right)^{2} } } \right]^{1/2} + \left[ {\frac{1}{h}\sum\limits_{k = 1}^{h} {\left( {\hat{y}_{T + k} } \right)^{2} } } \right]^{1/2} }} $$

(4)
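Eq. (4) translates directly into a few lines of Python. The sketch below is a plain implementation of the formula over an evaluation window of length h; the function name `theil_u1` is hypothetical.

```python
import math


def theil_u1(actual, forecast):
    """Theil's U1 inequality index over an evaluation window of length h."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    h = len(actual)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / h)
    scale = (math.sqrt(sum(a * a for a in actual) / h)
             + math.sqrt(sum(f * f for f in forecast) / h))
    if scale == 0:
        raise ZeroDivisionError("both series are identically zero")
    return rmse / scale
```

The index is 0 for a perfect forecast and is bounded above by 1, so lower values indicate better accuracy. Evaluating it on the first 3, 6, 12 and 18 out-of-sample observations gives one value per horizon.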

The Theil ratio (RT’s) is designed to compare the forecasts of two models at horizons h = 3, 6, 12, 18:

$$ RT's_{{y_{it} ,y_{jt} }} = \frac{{U_{1}^{{y_{it} }} }}{{U_{1}^{{y_{jt} }} }} $$

(5)

In the interpretation of the RT’s, three situations arise according to the predictive capacity of the models: if the RT’s is equal to one, both models have the same predictive capacity; if the ratio is greater than one, the denominator’s model has better predictive capacity than the numerator’s; if the ratio is less than one, the numerator’s model has better predictive results than the denominator’s.
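The three cases can be sketched as a small pairwise matrix. The helper names below are hypothetical, and the U1 values are invented for illustration; they are not results from this paper.

```python
def theil_ratio_matrix(u1_by_model):
    """RT's for every ordered pair: outer key = numerator, inner key = denominator."""
    return {
        num: {den: u1_by_model[num] / u1_by_model[den] for den in u1_by_model}
        for num in u1_by_model
    }


def better_model(numerator, denominator, ratio):
    """Apply the interpretation rule: a lower U1 means a better forecast."""
    if ratio < 1:
        return numerator
    if ratio > 1:
        return denominator
    return None  # equal predictive capacity


# Hypothetical U1 values, not results from the paper
u1 = {"ARDL": 0.1, "SARIMA": 0.2}
rt = theil_ratio_matrix(u1)
winner = better_model("ARDL", "SARIMA", rt["ARDL"]["SARIMA"])
```

In this made-up case the ratio of ARDL to SARIMA is 0.5, so the numerator model (ARDL) would be preferred and its U1 is half that of the denominator model.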

4 Data

The HODS data have been collected by INE. The dataset of the number of tourists in Spain, by country of origin, runs from January 2010 to June 2019. In the grouping of nationalities, the category “Resident abroad” should be noted: it includes all foreign nationalities except for the six main nationalities described in the table (Germany, France, Italy, Netherlands, UK, USA).

According to the data represented in Fig. 2, the average for Residents Abroad was 16,180,005.75 overnight stays in the period cited. The maximum was recorded in August 2017, with 29,594,071, and the minimum, 11,887,105, in January 2010.

To obtain data from Google, the Big Data tool GT has been used; as cited in the literature review, GT has previously been used for forecasting. For the keyword or Google Query (GQ) “visit Spain”, the lowest interest occurred in December 2010, and the greatest worldwide interest was in May 2017, three periods ahead of the historical maximum of overnight stays in Spain.

Comparing the maximum and minimum values of both series, it can be seen graphically that Internet searches are made at least one period in advance.

Table 1 displays a summary of the variables selected by nationality: Hotel demand and GQ. For the two series, only the variable “Google Queries” in the case of Residents abroad (and USA HODS) meets the hypothesis of normality at 95% confidence (Jarque-Bera). As for stochastic trends (ADF test), Hotel demand has a unit root for all nationalities, whereas for Google Queries there is evidence of a unit root in only three cases: Residents abroad, UK and USA. Regarding the KPSS stationarity test, a more stationary behaviour is observed in the Hotel Demand variable for all nationalities, including Residents abroad. In the Google Queries variable, on the other hand, there is clearly non-stationary behaviour in the series of Residents abroad, UK and USA.
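A simple way to check such a lead numerically is to correlate overnight stays with lagged searches and see which lag fits best. The sketch below uses synthetic data in which the lead is three periods by construction; the function names and the series are hypothetical and are not the paper's data or method.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def best_lead(searches, stays, max_lag):
    """Return the lag k in 0..max_lag maximizing corr(searches[t-k], stays[t])."""
    scores = {}
    for k in range(max_lag + 1):
        x = searches[:len(searches) - k]   # x_{t-k}
        y = stays[k:]                      # y_t
        scores[k] = pearson(x, y)
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic series: stays reproduce searches three periods later
searches = [50 + 20 * math.sin(0.7 * t) for t in range(48)]
stays = searches[:3] + searches[:-3]
lead, scores = best_lead(searches, stays, max_lag=6)
```

On real data the correlations would be noisier, and seasonality common to both series can inflate them, which is one reason the paper relies on a formal causality test instead.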

Table 1 Mean and stationarity analysis of HODS and the keyword “visit Spain”. Sample period Jan. 2010–December 2017. P-values in brackets. Own elaboration

5 Empirical Results

The empirical results obtained by applying the proposed methodology are briefly summarized below. Among the predictive techniques considered, we focus on the dynamic model with Internet searches (“visit Spain”) and seasonal factors as explanatory variables. The Granger-Causality test extended to seasonality confirms the causality hypothesis with at least 95% confidence. As usual in the literature, forecasts are made for time horizons h = 3, 6, 12, 18 months. The training period runs from January 2010 to December 2017 and the out-of-sample period from January 2018 to June 2019.

The results obtained through the Granger causality test including seasonal factors indicate that the number of HODS can be explained by the number of searches generated on the Internet and by systematic seasonality (Fig. 3).

The ECM with seasonality obtained for residents abroad is as follows (lags selected under the Akaike Information Criterion):
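The sample split described above can be sketched as follows. It assumes that the evaluation for horizon h uses the first h out-of-sample months, which the text implies but does not state; the function names are hypothetical.

```python
def month_count(start_year, start_month, end_year, end_month):
    """Number of monthly observations between two dates, both inclusive."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


n_train = month_count(2010, 1, 2017, 12)   # training sample
n_test = month_count(2018, 1, 2019, 6)     # out-of-sample period
horizons = (3, 6, 12, 18)


def split_by_horizon(series, n_train, horizons):
    """Return the training part and one out-of-sample window per horizon.

    Assumption: the window for horizon h is the first h months after
    the end of the training sample.
    """
    train = series[:n_train]
    windows = {h: series[n_train:n_train + h] for h in horizons}
    return train, windows
```

With monthly data this gives 96 training observations and 18 out-of-sample observations, so the longest horizon uses the whole evaluation period.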

$$ \Delta \ln \left( {\hat{y}_{t} } \right) = \mathop { - 0.28}\limits_{(0.00)} \Delta \ln \left( {x_{t} } \right) - \mathop {0.13}\limits_{(0.03)} \left[ {\ln \left( {y_{t - 1} } \right) - \mathop {0.55}\limits_{(0.00)} \ln \left( {x_{t} } \right)} \right] + \sum\limits_{i = 1}^{12} {\hat{\alpha }_{i} w_{i} } + \hat{\varepsilon }_{t} $$

Sample: 2010M1–2017M12; $ R^{2} = 0.9888 $

$$ \begin{aligned} \sum\limits_{{i = 1}}^{{12}} {\hat{\alpha }_{i} } w_{i} = & \mathop { - 22.41}\limits_{{(0.03)}} w_{1} \mathop { + 1.86}\limits_{{(0.02)}} w_{2} + \mathop {2.05}\limits_{{(0.01)}} w_{3} + \mathop {2.08}\limits_{{(0.01)}} w_{4} + \mathop {2.23}\limits_{{(0.00)}} w_{5} + \mathop {2.13}\limits_{{(0.01)}} w_{6} \\ & + \mathop {2.11}\limits_{{(0.01)}} w_{7} + \mathop {2.01}\limits_{{(0.02)}} w_{8} + \mathop {1.81}\limits_{{(0.04)}} w_{9} + \mathop {1.65}\limits_{{(0.06)}} w_{{10}} + \mathop {1.16}\limits_{{(0.18)}} w_{{11}} + \mathop {1.48}\limits_{{(0.08)}} w_{{12}} \\ \end{aligned} $$

In the model defined for the HODS of residents abroad, two aspects stand out (p-values in brackets): first, the existence of a cointegration relationship; second, the strong influence of seasonality. Table 2 shows the models and results for HODS by country of origin.

Table 2 Summary of ARDL + seasonality models by country of origin for HODS. Sample Jan. 2010–December 2017. The table lists the months with non-significant seasonality. Own elaboration

Two results stand out. On the one hand, all models show a long-term relationship (except for the UK) at a 95% confidence level (90% for the USA). On the other hand, all models are affected by monthly seasonality; for Germany in particular, the seasonal effect of every month is significantly different from zero.

Once the results of the three forecasting models cited in the methodology section have been obtained by nationality of the tourists who visit Spain, the RT’s can be applied to quantify which model is better in predictive terms.

The forecasting accuracy results (see Table 3) depend on the time horizon used and the country of origin analyzed.

Table 3 Matrix U1 Theil forecasting evaluation (Jan. 2018–June 2019): RT’s by country of origin. Own elaboration


In general, SARIMA models have obtained better results than SSA models (except for the Netherlands with h = 12, 18). On the other hand, the comparison with the ARDL causal models with seasonality gives mixed results that do not allow us to conclude which model has the best forecasting capacity. With a time horizon of 3 months, SARIMA presents the best results for three origins (Residents abroad, France, UK); for the rest, ARDL with seasonality forecasts better. For a 6-month time horizon, ARDL with seasonality outperforms SARIMA for France and the Netherlands. For the 12-month and 18-month time horizons, gains from using ARDL models with seasonality are observed for Germany and the Netherlands. In the remaining cases, the SARIMA models are superior to the other models analyzed in this paper.

6 Conclusions

In this paper, the importance of forecasting modelling has been highlighted, along with the historical analysis carried out in the literature review. The four dimensions of Big Data have been discussed. Volume: the technologies behind Google tools for data ETL have made it possible to analyze the main origin markets of tourism in Spain. Velocity: related to the volume of data, the data engineering provided by Google technologies allows us to monitor the search intentions of the main nationalities who visit Spain. Variety: the use of a primary data source (INE) and a secondary one (Google) has made it possible to build knowledge from the data; this is a novel aspect of the analysis, since users show their interest through Internet searches. Veracity: the data have been verified through the cointegration tests carried out. Together, these have allowed modelling the forecasts of Spanish hotel demand by country of origin.

In addition, this article has combined more common techniques (SARIMA or ARDL) with a newer technique, SSA. The contributions can be divided into the following points:

1. A Granger causality test extended to seasonality has been developed. In the literature, it was usual to test only the relationship between endogenous and exogenous variables.
2. A model selection criterion based on the predictive capacity of the models has been developed (RT’s). Previous literature has not quantified the gain from using one model over another; the Theil ratio quantifies the gain between pairs of models.
3. Related to the previous point, econometric modelling with data from Big Data technologies does not guarantee an improvement in forecasting capacity, as shown for the main nationalities who visit Spain.
4. Concerning the dynamic models with seasonality, we have shown empirically that hotel demand decisions are made at least one period in advance.
5. A cointegration relationship has been revealed, expressed in the ECM.

We can conclude that the models used in this work offer high explanatory capacity for causality (R² close to 1), demonstrate cointegration relationships and provide seasonal knowledge for decision making on Spanish Tourism Demand. According to the results obtained, it is not possible to conclude that tools from Big Data engineering yield a gain in forecasting, in contrast to what some authors claim [35]. The econometric and economic interpretation of the causality models can help adjust the offer, in terms of prices or even advertising, to the agents interested in visiting Spain. This article lays the basis for future research in which data from Big Data technologies are used to make efficient decisions. The theoretical framework could be extended to fields where online markets are relevant, such as Finance, Automotive, Insurance or any market in which Internet searches translate into a quantifiable final consumer decision.

Notes

1. INE: Instituto Nacional de Estadística, the National Statistics Institute (Spain).
2. www.google.com.
3. m is the number of lags of the endogenous variable $ y_{t} $ (HODS); n is the number of lags of the exogenous variable $ x_{t} $ (Google Queries). $ \ln $ is the natural logarithm. $ (L) $ is the lag operator. Stability condition: the inverted roots satisfy $ |\gamma (L)| < 1 $.
4. Following the Engle-Granger representation theorem, the parameters are estimated in two stages. The estimators are consistent and efficient.

References

García, J., Molina, J.M., Berlanga, A., Patricio, M.Á., Bustamante, Á.L., Padilla, W.R.: Ciencia de Datos: Técnicas Analíticas y Aprendizaje Estadístico. Un enfoque práctico. Alfaomega, Tarragona (2018)
Google Scholar
Juul, M.: Tourism and the European Union: Recents Trends and Policy Developments. European Parliamentary Research Service (2015)
Google Scholar
Pegg, S., Patterson, I., Vila Gariddo, P.: The impact of seasonality on tourism and hospitality operations in the alpine. Int. J. Hosp. Manage. 31, 659–666 (2012)
Google Scholar
Ruiz-Reina, M.Á.: Big Data: does it really improve forecasting techniques for tourism demand in Spain?. In: ITISE 2019: International Conference on Time Series and Forecasting on Proceedings of Papers, pp. 694–706. Godel Impresiones Digitales S.L. Granada (2019)
Google Scholar
Jansen, B.J.: Review of “The search: how Google and its rivals rewrote the rules of business and transformed our culture”. Inform. Process. Manage. Int. J. 2(5), 1399–1401 (2006)
Article Google Scholar
Wu, D.C., Song, H., Shen, S.: New developments in tourism and hotel demand modeling and forecasting. Int. J. Contemp. Hosp. Manage. 29(1), 507–529 (2017)
Article Google Scholar
Li, J., Xu, L., Tang, L., Wang, S., Li, L.: Big data in tourism research: a literature review. Tour. Manag. 68, 301–323 (2018)
Article Google Scholar
Li, C., Song, H., Wit, S.: Recent developments in econometric modeling and forecasting. J. Travel Res. 44(1) (2005)
Google Scholar
Song, H., Li, G.: Tourism demand modelling and forecasting: a review of Recent research. Tour. Manag. 29(2), 203–220 (2008)
Article Google Scholar
Peng, B., Song, H., Crouch, G.I.: A meta-analysis of international tourism. Tour. Manag. 45, 181–183 (2014)
Article Google Scholar
Xiaoying Jiao, E., Li Chen, J.: Tourism forecasting: a review of methodological developments over the last decade. Tour. Econ. 20(10), 1–24 (2018)
Google Scholar
Mariani, M., Baggio, R., Fuchs, M., Höepken, W.: Business intelligence and big data in hospitality and tourism: a systematic literature review. Int. J. Contemp. Hosp. Manage. (2018)
Google Scholar
Silva, E.S., Hassani, H., Heravi, S., Huang, X.: Forecasting tourism demand with denoised neural networks. Ann. Tour. Res. 74, 134–154 (2019)
Article Google Scholar
Lu, Z., Liu, N.: The guiding effect of information flow of Australian tourism website on tourist flow: process, intensity and mechanism. Hum. Geogr. 22(5), 88–93 (2007)
Google Scholar
https://www.researchgate.net/publication/238115677_On_the_Predictability_of_Search_Trends. Last accessed 06 Nov 2019
https://static.googleusercontent.com/media/www.google.com/es//googleblogs/pdfs/google_predicting_the_present.pdf. Last accessed 06 Nov 2019
http://cs229.stanford.edu/proj2011/GawlikKaurKabaria-PredictingTourismTrendsWithGoogleInsights.pdf. Last accessed 06 Nov 2019
Pan, B., Wu, D.C., Song, H.: Forecasting hotel room demand using search engine data. J. Hosp. Tour. Technol. 3(3), 196–210 (2012)
Google Scholar
Yang, X., Pan, B., Evans, J.A., Benfu, L.: Forecasting Chinese Tourist volume with search engine data. Tour. Manage. (2015)
Google Scholar
Onder, I., Gunter, U.: Forecasting tourism demand with Google trends: the case of Vienna. Tour. Anal. (2015)
Google Scholar
Bangwayo-Skeete, P., Skeete, R.W.: Can Google data improve the forecasting performance of tourist arrivals? Mixed-data sampling approach. Tour. Manage. 46, 454–464 (2015)
Article Google Scholar
Park, S., Lee, J., Song, W.: Short-term forecasting of Japanese tourist inflow to South Korea using Google trends data. J. Travel Tour. Market. 34(3), 357–368 (2017)
Article Google Scholar
Artola, C., Pinto, F., de Pedraza, P.: Can internet searches forecast tourism inflows. Int. J. Manpower 36(1), 103–116 (2015)
Article Google Scholar
Gunter, U., Onder, I.: Forecasting city arrivals with Google analytics. Ann. Tour. Res. 61, 199–212 (2016)
Article Google Scholar
Rivera, R.: A dynamic linear model to forecast hotel registrations in Puerto Rico using Google Trends data. Tour. Manag. 57, 12–20 (2016)
Article Google Scholar
Dinis, G., Costa, C., Pacheco, O.: The use of Google Trends data as proxy of foreign tourist inflows to Portugal. Int. J. Cult. Digital Tour. 3(1), 66–75 (2016)
Camacho, M., Pacce, M.J.: Forecasting travellers in Spain with Google’s search volume indices. Tour. Econ. 24(4), 434–448 (2017)
Önder, I.: Forecasting tourism demand with Google Trends: accuracy comparison of countries versus cities. Int. J. Tour. Res. 19(6), 1–39 (2017)
Zeynalov, A.: Forecasting tourist arrivals in Prague: Google econometrics. Munich Personal RePEc Archive (2017)
Liu, J., Li, X., Guo, Y.: Periodicity analysis and a model structure for consumer behavior on hotel online search interest in the US. Int. J. Contemp. Hosp. Manag. 29(5), 1486–1500 (2017)
Rödel, E.: Forecasting tourism demand in Amsterdam with Google Trends. Master's thesis (2017)
Palos-Sanchez, P.R., Correia, M.B.: The collaborative economy based analysis of demand: study of Airbnb case in Spain and Portugal. J. Theor. Appl. Electron. Commerce Res. 13(3), 85–98 (2018)
Tang, H., Qiu, Y., Liu, J.: Comparison of periodic behavior of consumer online searches for restaurants in the U.S. and China based on search engine data. IEEE Access (2018)
Li, X., Pan, B., Law, R., Huang, X.: Forecasting tourism demand with composite search index. Tour. Manag. 59, 57–66 (2017)
Sun, S., Wei, Y., Tsui, K.-L., Wang, S.: Forecasting tourist arrivals with machine learning and internet search index. Tour. Manag. 70, 1–10 (2019)
Theil, H.: Economic Forecasts and Policy (1958)
Theil, H.: Applied Economic Forecasting (1966)
Bliemel, F.W.: Theil’s forecast accuracy coefficient: a clarification. J. Mark. Res. 10(4), 444–446 (1973)
Ahlburg, D.A.: Forecast evaluation and improvement using Theil’s decomposition. J. Forecasting 3(3), 345–351 (1984)
Tofallis, C.: A better measure of relative prediction accuracy for model selection and model estimation. J. Oper. Res. Soc. 66(8), 1352–1362 (2015)
Hassani, H., Webster, A., Silva, E.S., Heravi, S.: Forecasting U.S. tourist arrivals using optimal singular spectrum analysis. Tour. Manag. 46, 322–335 (2015)
Hassani, H., Silva, E.S., Antonakakis, N., Filis, G., Gupta, R.: Forecasting accuracy evaluation of tourist arrivals. Ann. Tour. Res. 63, 112–127 (2017)
Dedić, N., Stanier, C.: An evaluation of the challenges of multilingualism in data warehouse development. In: 18th International Conference on Enterprise Information Systems—ICEIS 2016 (2016)
Dunning, T., Friedman, E.: Time series databases: new ways to store and access data. O’Reilly Media (2014)
Box, G.E.P., Jenkins, G.M., Reinsel, G.C.: Time Series Analysis: Forecasting and Control. Wiley, USA (2008)
Golyandina, N., Korobeynikov, A., Zhigljavsky, A.: Singular Spectrum Analysis with R. Springer (2018)
Granger, C.: Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37(3), 424–438 (1969)
Montero, R.: Test de Causalidad. Documentos de Trabajo en Economía Aplicada. Universidad de Granada, España (2013)
Vergori, A.S.: Forecasting tourism demand: the role of seasonality. Tour. Econ. 18(5), 915–930 (2012)
Buse, A.: The likelihood ratio, Wald, and Lagrange multiplier tests: an expository note. Am. Statistician 36(3), 153–157 (1982)
Hylleberg, S., Engle, R., Granger, C., Yoo, B.: Seasonal integration and cointegration. J. Econometrics 44, 215–238 (1990)
Nkoro, E., Uko, K.: Autoregressive Distributed Lag (ARDL) cointegration technique: application and interpretation. J. Stat. Econometric Methods 5(4), 63–91 (2016)
Zivot, E.: The power of single equation tests for cointegration when the cointegrating vector is prespecified. Econometric Theory 16(3), 407–439 (2000)

Author information

Authors and Affiliations

Department of Economic Theory and Economic History, PhD Program in Economics and Business, University of Málaga, S/N, Plaza Del Ejido, 29013, Málaga, Spain
Miguel Ángel Ruiz Reina

Corresponding author

Correspondence to Miguel Ángel Ruiz Reina.

Editor information

Editors and Affiliations

Faculty of Sciences, University of Granada, Granada, Spain
Olga Valenzuela
ETSIIT, CITIC-UGR, University of Granada, Granada, Spain
Fernando Rojas
ETSIIT, CITIC-UGR, University of Granada, Granada, Spain
Luis Javier Herrera
ETSIIT, CITIC-UGR, University of Granada, Granada, Spain
Héctor Pomares
ETSIIT, CITIC-UGR, University of Granada, Granada, Spain
Ignacio Rojas

Cite this paper

Reina, M.Á.R. (2020). Big Data: Forecasting and Control for Tourism Demand. In: Valenzuela, O., Rojas, F., Herrera, L.J., Pomares, H., Rojas, I. (eds) Theory and Applications of Time Series Analysis. ITISE 2019. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-56219-9_18

DOI: https://doi.org/10.1007/978-3-030-56219-9_18
Published: 21 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-56218-2
Online ISBN: 978-3-030-56219-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
