1 Introduction

In the 19th century, the astronomer and mathematician Simon Newcomb noticed a pattern in the wear of books of logarithm tables. He observed, in 1881, that the first pages of these books, corresponding to numbers whose first digit was 1, 2 or 3, were more worn, that is, more used, than the last pages, corresponding to numbers whose first digit was 7, 8 or 9 [1]. He therefore deduced that, in many quantitative phenomena, that is, in sets of numbers occurring in nature, more numbers start with the digit 1 than with 2, more with 2 than with 3, and so forth down to 9. He derived the probability distribution of the first digit of numbers in a data set, which turns out to be logarithmic [1].

The topic was taken up again in 1938 by the American physicist Frank Benford, who studied it empirically and disseminated it; it became known as Benford's Law, the First Digit Law, the Newcomb-Benford Law, or the Law of Anomalous Numbers. For the reader who is neither a physicist nor a mathematician, this law may seem intriguing. In a game that involves rolling a six-sided die, the probability of getting any one side is 1/6, and the probability of getting heads or tails when tossing a coin is 1/2. Following this logic, common sense would suggest that a number chosen at random from a data set has 1 as its first digit with probability 1/9, 2 with the same probability, and so on. That is, all digits (1, 2, …, 9) would occur with practically the same frequency. However, such a uniform distribution does not occur in many of the phenomena studied by Benford [2]. He analysed several data sets from various sources, such as river lengths and city populations, thus confirming what Newcomb had discovered almost six decades earlier.

In recent years, Benford's Law has been used to detect irregularities and manipulation in data, including electoral and tax fraud, and has been applied to economic and psychological data. The work in [3] showed that it is possible to detect false or fraudulent reporting of accounting data by applying Benford's Law to data from companies listed on the Johannesburg Stock Exchange. This information was then used by auditors and revenue services, and also by financial analysts as input to prediction models. Regarding electoral fraud, publications point in different directions; the work in [4] showed how, in this context, Benford's Law can be a problematic forensic tool. Benford's Law has also been successfully applied as a lie detector, representing an advance in cognitive psychology [5]. The work in [6] showed that the relative frequencies of the first and second digits in economic research are broadly consistent with Benford's Law. Since tourism demand data follow patterns of economic growth, the review in [6] supports the idea that Benford's Law could be applied to this area.

It should be noted that Benford's law is supported by empirical evidence, research and practical applications; however, like any law or model, it lacks a universal explanation. Even so, it can be used to test whether a data set contains genuine, unmanipulated numbers [7].

In the present study, Benford's law is used as a preliminary approach to check the reliability of the monthly data on overnight stays registered in accommodation establishments on the Island of Sal, Cape Verde. To our knowledge, only one recent work has explored the application of this law in the context of tourism, namely to tourist traffic in Croatia [8]. Nevertheless, it is important to analyse the applicability of the law to these numbers under the hypothesis that the number of overnight stays occurs at random. The implications for the estimation of tourism demand are crucial for the development and optimization of prediction models.

Tourism today is a highly competitive sector and has come to be regarded as an alternative, or even the main, driver of development for countries and localities; its dynamism and complexity permeate other cross-cutting sectors, corroborating its potential for profitability and sustainability. However, once these countries have integrated tourism products and a defined geographical area, establishing themselves as tourism destinations, they end up facing competitors for the same source markets [9].

In this paper, the concept of tourism demand is presented first. Then, an overview of tourism on the Island of Sal is presented, followed by the methodology (Benford's Law and the goodness-of-fit test) and the data description. The next section presents the implementation and analysis of the results. Finally, the conclusions summarise the essential elements of the study.

2 Tourism Demand

Tourism demand can be divided into those who travel to a specific destination (effective demand) and those who do not (latent demand), and it rests on three fundamental aspects: the existence of a destination, the availability of goods and services, and the price to be paid throughout the tourism process. It can be understood or expressed as the number of visitors travelling to a specific destination, away from their place of residence or work, together with the quantity of goods and services they consume, at the price they are willing to pay, during their stay and over a given period of time [10,11,12,13,14].

Tourism demand can be measured in different ways, depending on whether the approach is, for example, physical or monetary. Physical tourism demand can be measured through volume variables, namely ‘arrivals at the borders of each country’/‘number of visitors’ and/or ‘number of overnight stays’/‘number of guests’ in accommodation establishments. The monetary approach relies on ‘visitor consumption’ and/or ‘visitor expenses’. The factors that can influence tourism demand have been classified by Uysal [15] as psychosociological, economic and exogenous. Cunha [13], in turn, categorised them as structural, conjunctural and psychosociological. Others divide them into three types [16]: push factors, pull factors and resistance factors. Divisekera [17] proposes two categories: factors that affect travel (ex-ante) and factors that influence the choice of destination (ex-post). Many of these factors involve variables that are reflected in the Island of Sal as a tourism destination.

For this study, the number of overnight stays is used. The monthly series of overnight stays in accommodation establishments is considered a significant indicator of tourism activity, since it provides information about the number of visitors who have used tourism facilities [18]. Monthly overnight stays of visitors in accommodation establishments on the Island of Sal are recorded in two ways: (i) establishments with computer support receive a digital form from the National Statistics Institute of Cape Verde (INE), in which they enter data on guest movements; (ii) establishments without computer support receive a paper form. Once completed, the forms are returned to INE, which is responsible for processing and disseminating the data.

3 Overview of Tourism in the Island of Sal

The island's population has been growing at an average annual rate of 4.5%, and it currently has at least 39,693 resident inhabitants [19], approximately 7.2% of the population of Cape Verde. Its economy is markedly based on tourism activity, with a Gross Domestic Product (GDP) of 197,197.66 euros, dominated by the tertiary sector, which generates two-thirds of the island's wealth, and a GDP per capita of 4,496.69 euros. Transport and storage is the main economic activity, accounting for 23.6% of the island's wealth, driven mainly by the airport.

In 2019, the island had 30 accommodation establishments (10.6% of the total in the country), with an accommodation capacity of 13,219 guests; it received 372,785 guests (45.5%) and recorded 2,655,932 overnight stays. Among visitors arriving on the Island of Sal, those from the United Kingdom are the most numerous (89,798), followed by those from Portugal (39,533), Belgium and the Netherlands (39,515) and Germany (32,759). Accommodation establishments employ 4,787 people, of whom 54% are Cape Verdean and 56% are women. In terms of the number of rooms and beds, the Island of Sal has one of the largest shares at the national level (46.2% and 45.4%, respectively) [20].

Given the tourism potential of the Island of Sal, different types of tourism products can be developed and offered, ranging from ‘sun and beach’ tourism (diving, windsurfing, surfing, kitesurfing, kite beaching, Shark Bay, fishing and sports), nature tourism (hiking, cycling, quad biking, island tours) and cultural tourism (Museum of Sal, handicrafts, Palmeira town, Pedra de Lume village, gastronomy) to health tourism (massage/spa, mud baths, salt baths, specialised health services). Santa Maria beach, with 3 km of white sand, was named by TripAdvisor one of the 25 most beautiful beaches in the world, and it is also one of the seven wonders of Cape Verde.

Between 2000 and 2018, investments in sports infrastructure, combined with hotel infrastructure and the excellent natural conditions of the Island of Sal, made it possible to host several important international events. Events that attracted foreign and national visitors included the Kite-Surf World Tour (2018: February) and the finals of the 2018 national championships (Handball - May/June, Basketball - July, Karate - September, Athletics - June/July, Boxing - August, Chess - March). In addition, the opening of new internationally renowned hotels also increased the number of visitors; examples include the Hilton Hotel (2017: October; 2018: January), Riu Funaná (2006: October), later rebranded as Club-Hotel Riu Funaná (2016), Riu Garopa (2011), later rebranded as Riu Palace Cabo Verde (2016: March), Meliá Lana (2016: December), Meliá Tortuga (2011: May), Meliá Dunas (2014: November) and Tui Sensimar (2016: December). At the same time, cultural events such as the World of Literature (2017 and 2018) and the Santa Maria Festival (2000 to 2018) also boosted movement on the island and increased the number of visitors [21, 22].

4 Methodology

4.1 Benford’s Law

Benford's law has increasingly been used as a diagnostic tool to isolate, within large data sets, irregularities that merit closer inspection. Because a systematic sweep of such data is extremely arduous, they are natural candidates for an initial screening against the Benford distribution [23]. Benford's Law can be defined as a collection of empirical evidence about the frequency distribution of the leading digits in numerical data sets. The best-known version of the law states that, in data sets representing a collection of ‘natural’ data, the probability of observing a particular digit in the first position decreases as the value of the digit increases [24].

The law conceived by Newcomb and empirically verified by Benford states that the first digit of numbers in a data set follows a logarithmic distribution, in which the probability that the first digit of a number drawn from a universe of random numbers equals d is given by [1, 2]:

$$ P\left( {leading\;digit = d} \right) = \log_{10} \left( {1 + \frac{1}{d}} \right), \quad d = 1,2, \ldots ,9. $$
(1)

Using Eq. (1), it is then possible to determine the probability of occurrence of numbers whose first digit is 1, 2, …, or 9, as shown in Table 1. Numbers beginning with the digits 1 and 2 thus occur much more frequently in nature than numbers beginning with the digits 8 and 9 [25].

Table 1. Benford’s law for the first digit of numbers in a data set.

Eq. (1) and Table 1 make it clear that, according to Benford's Law, the distribution of leading digits is far from uniform. A randomly selected number should begin with the digit 1 about 30% of the time; more precisely, the proportion should be log10(2) ≈ 0.301. Numbers with 2 as the first digit should occur about 17.6% of the time (log10(3/2)), those with 3 about 12.5% (log10(4/3)), and so on, down to about 5.1% for the digit 8 and 4.6% for the digit 9 [25].
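As a check on Table 1, the nine probabilities can be computed directly from Eq. (1). The short Python sketch below is illustrative only; `benford_first_digit_probabilities` is a hypothetical helper name and is not part of the original study, which used Excel.

```python
import math


def benford_first_digit_probabilities() -> dict[int, float]:
    """P(leading digit = d) = log10(1 + 1/d) for d = 1, ..., 9, as in Eq. (1)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    probs = benford_first_digit_probabilities()
    for d, p in probs.items():
        print(f"{d}: {p:.3f} ({p:.1%})")  # first line: "1: 0.301 (30.1%)"
    print(f"sum = {sum(probs.values()):.6f}")
```

Because the terms of Eq. (1) telescope, log10(2/1) + log10(3/2) + … + log10(10/9) = log10(10) = 1, so the nine probabilities form a proper distribution.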

4.2 Goodness of Fit

To compare the observed frequencies of the first digit of the number of monthly overnight stays in accommodation establishments on the Island of Sal with the frequencies expected under Benford's Law, the chi-square (χ²) goodness-of-fit test was used. This test checks whether there are discrepancies between the observed and expected frequencies; its statistic is given by [26]:

$$ \chi^{2} = \frac{{\left( {o_{1} - e_{1} } \right)^{2} }}{{e_{1} }} + \frac{{\left( {o_{2} - e_{2} } \right)^{2} }}{{e_{2} }} + \ldots + \frac{{\left( {o_{k} - e_{k} } \right)^{2} }}{{e_{k} }} = \mathop \sum \limits_{i = 1}^{k} \frac{{\left( {o_{i} - e_{i} } \right)^{2} }}{{e_{i} }} $$
(2)

where,

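  • \({o}_{i}\) - observed frequency of the first digit i, where i = 1, 2, …, 9 (so k = 9);

  • \({e}_{i}\) - expected frequency of the first digit i under Benford's Law, i.e. \({e}_{i} = N\log_{10}\left(1 + \frac{1}{i}\right)\), where N is the number of observations and i = 1, 2, …, 9.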
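A minimal Python sketch of Eq. (2), assuming the expected frequencies are obtained by multiplying the Benford probabilities of Eq. (1) by the number of observations N. The helper names are hypothetical, and the sketch is not the Excel procedure used in the study.

```python
import math
from typing import Sequence


def benford_expected_counts(n: int) -> list[float]:
    """Expected frequencies e_i = n * log10(1 + 1/i) for first digits i = 1..9."""
    return [n * math.log10(1 + 1 / i) for i in range(1, 10)]


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson statistic of Eq. (2): sum over i of (o_i - e_i)**2 / e_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Sanity check: data that match Benford's Law exactly give a statistic of zero.
    n = 228  # number of months from 2000:01 to 2018:12
    expected = benford_expected_counts(n)
    print([round(e, 1) for e in expected])
    print(chi_square_statistic(expected, expected))  # 0.0
```

Passing the nine observed first-digit counts as `observed` and `benford_expected_counts(sum(observed))` as `expected` should reproduce the statistic computed in Excel, assuming the same expected frequencies were used there.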

The null hypothesis (\({H}_{0}\)) and the alternative hypothesis (\({H}_{1}\)) are:

  • \({H}_{0}\): The first digits of the number of monthly overnight stays recorded in accommodation establishments on the Island of Sal follow Benford's Law.

  • \({H}_{1}\): The first digits of the number of monthly overnight stays recorded in accommodation establishments on the Island of Sal do not follow Benford's Law.

4.3 Data Description

To test Benford's Law, monthly data on the number of overnight stays in accommodation establishments on the Island of Sal between 2000:01 and 2018:12 (228 months), provided by the National Statistics Institute of Cape Verde (INE), were used.

Figure 1 displays the data under analysis. Irregular oscillations are visible, suggesting that the mean is not stable, as well as seasonality (maximum values in the winter months, between October and December, and minimum values in the summer months, between May and September), indicating that overnight stays in accommodation establishments depend on the time of year. The series also shows an increasing trend after 2010, although values at the end of 2007 and the beginning of 2008 were already high compared with the previous year. Since 2014, there has been a significant increase because of the promotion of the island, and of the Cape Verde archipelago as a whole, among potential visitors in Europe.

Fig. 1. No. of monthly overnight stays in accommodation establishments on the Island of Sal [2000:01–2018:12].

Source: Own elaboration with data provided by INE.

5 Implementation and Analysis of Results

In this section, Benford’s law is applied to the monthly data of the number of overnight stays in accommodation establishments on the Island of Sal, between 2000:01 and 2018:12, to test the data reliability.

Based on the data presented in Table 2 and Fig. 1, it can be said that:

  • Digit 1 first appears as the leading digit when the number of overnight stays reaches six digits, which happens in the last quarter of 2007, one year after the opening of the international hotel Riu Funaná. From then until 2015, with only a few exceptions, all six-digit values start with the digit 1;

  • Digit 2 appears as the leading digit a few times in 2000, when the number of overnight stays had five digits, and only reappears more frequently from 2015, around the time large international hotels opened. In 2018, 2 is the leading digit of practically all monthly values. This may be due to the six national events and one international event that took place in seven months of that year;

  • The number of overnight stays has been increasing as a result of the opening of internationally renowned hotels and of occasional international and national sporting and cultural events; moreover, the observed percentage of values beginning with 1 exceeded the expected percentage, since the lack of significant events led to a slow increase in the number of overnight stays from 100,000 to 200,000;

  • The range of values starting with the digit 1 or 2 is wider than the range of values starting with the other digits. This is because there is a visible increasing trend in the number of overnight stays; the growth rate differs over time, suggesting a possible change in tourism demand after 2014; and there may be an underlying oscillatory/random behaviour in the data, which Benford's law could partially explain.

Table 2. Number of monthly overnight stays recorded in accommodation establishments on the Island of Sal [2000:01–2018:12].
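The tabulation behind Table 2 amounts to extracting the leading digit of each monthly value and counting occurrences per digit. The Python sketch below assumes the monthly counts are positive integers; the helper names and the five sample values are hypothetical and are not taken from the INE series.

```python
from collections import Counter
from typing import Iterable


def leading_digit(n: int) -> int:
    """Return the first (leading) digit of a positive integer, e.g. 187432 -> 1."""
    if n <= 0:
        raise ValueError("leading digit is only defined here for positive integers")
    return int(str(n)[0])


def first_digit_counts(values: Iterable[int]) -> list[int]:
    """Count how many values start with each digit; index 0 holds digit 1, index 8 digit 9."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts.get(d, 0) for d in range(1, 10)]


if __name__ == "__main__":
    # Hypothetical monthly values, for illustration only (not INE data).
    sample = [23_415, 98_760, 104_331, 187_002, 215_970]
    print(first_digit_counts(sample))  # [2, 2, 0, 0, 0, 0, 0, 0, 1]
```

As the sample shows, a value such as 98,760 (five digits, leading 9) is followed by 104,331 (six digits, leading 1), which is the transition described in the first bullet above.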

For this study, the χ² statistic was computed by summing over the nine possible first digits in the data set (Table 2). For a significance level of 5% and 8 degrees of freedom (df = 9 − 1 = 8), the Chi-Square table [26] gives a critical value of χ² = 15.507.

Table 3 shows the steps performed in Excel to calculate the χ² value for the data set. The calculated χ² (= 28.940) is greater than the critical value (= 15.507), so the null hypothesis (H0) is rejected at the 5% significance level. The test therefore indicates that the number of monthly overnight stays in accommodation establishments on the Island of Sal between 2000 and 2018 does not follow Benford's Law. These results were confirmed with IBM SPSS Software, with a p-value < 0.001.

Table 3. Calculation of the \(\chi^{2}\) value for the number of overnight stays data set.
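The critical value and p-value reported above can be reproduced without a statistical table. For an even number of degrees of freedom df = 2k, the chi-square survival function has the closed form \(P(\chi^{2} > x) = e^{-x/2}\sum_{j=0}^{k-1} (x/2)^{j}/j!\), which covers the df = 8 used here. The sketch below is a minimal standard-library illustration with hypothetical function names: it inverts this function by bisection to obtain the 5% critical value and evaluates it at the reported statistic 28.940. In practice a library routine such as SciPy's `scipy.stats.chi2` (`ppf`, `sf`) would normally be used instead.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with an EVEN df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    if x <= 0:
        return 1.0
    half = x / 2
    term = total = 1.0            # j = 0 term
    for j in range(1, df // 2):   # j = 1, ..., k - 1
        term *= half / j          # (x/2)**j / j!, built incrementally
        total += term
    return math.exp(-half) * total


def chi2_critical_value(alpha: float, df: int) -> float:
    """Return x such that P(X > x) = alpha, found by bisection (the tail decreases in x)."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even_df(hi, df) > alpha:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    df = 9 - 1                 # nine possible first digits
    alpha = 0.05               # significance level used in the study
    reported_stat = 28.940     # chi-square value reported in the text
    critical = chi2_critical_value(alpha, df)
    p_value = chi2_sf_even_df(reported_stat, df)
    print(f"critical value = {critical:.3f}, p-value = {p_value:.5f}")
    print("reject H0" if reported_stat > critical else "do not reject H0")
```

With these inputs the computed critical value agrees with the tabulated 15.507, and the p-value falls below 0.001, consistent with the SPSS result.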

Figure 2 shows the discrepancy between the observed and expected relative frequencies. Although the observed distribution approximates the shape of the Benford curve, the observed frequencies are above the expected ones for numbers of overnight stays beginning with the digits 1, 7, 8 and 9. For numbers beginning with 2 and 3, the observed frequencies are below the expected ones, whereas for numbers beginning with 4, 5 and 6 they are close to the expected values. This illustrates the sensitivity of the χ² test, and also suggests that there may be sporadic disturbances in the data and that a longer data collection period may be needed.

Fig. 2. Comparison of the observed and expected relative frequencies of the first digit of the number of monthly overnight stays in accommodation establishments on the Island of Sal [2000:01–2018:12].

Source: Own elaboration.

6 Conclusion

In this study, the hypothesis that the number of monthly overnight stays in accommodation establishments on the Island of Sal follows Benford's Law was rejected. The discrepancies between the observed and expected relative frequencies were statistically significant, although the observed curve has a shape similar to the expected one.

Benford's law was used as a preliminary approach to check the reliability of the monthly data on overnight stays registered in accommodation establishments on the Island of Sal and, to our knowledge, this is its first application in the context of tourism demand in Cape Verde.

Delays in diversifying the offer, in terms of infrastructure and of cultural and sporting events, may cause the number of overnight stays to deviate from Benford's Law, either falling short of or exceeding expectations. In addition, it is also necessary to consider visitors who spend their nights in complementary accommodation (such as private accommodation), which is not accounted for in these data and could affect the results, especially for national visitors.

In a recent study [8], Benford's Law was applied to the number of domestic and foreign overnight stays in Croatia, using monthly series over a period of three years. In that study, only domestic overnight stays did not follow Benford's Law. The authors did not identify the causes of conformity and non-conformity with Benford's Law, but assumed that the deviation was mainly due to a lack of accuracy and the absence of records in tourist accommodation.

On the Island of Sal, there are still legal tourist accommodations that, being unknown to INE, do not provide data for official statistics. This could affect conformity with Benford's Law and is in line with the conclusions in [8]. Therefore, the local government, through the Tourism Office, should continue its effort to update the information on these accommodation providers, so that the data are reported not only for statistical purposes but also to enhance the tourism experience on the island.

In contrast with the work in [8], the data for the Island of Sal are not separated into domestic and foreign overnight stays, and the data set is approximately six times larger, which enables a more detailed comparison. One aspect that could affect the conformity of these data with Benford's Law is that they do not span many orders of magnitude.
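The role of the span of the data can be illustrated with two synthetic series, neither of which is the Sal data: a geometric series covering six orders of magnitude, whose first digits closely follow Eq. (1), and a linear series between 100,000 and 250,000, which can only start with 1 or 2. The helper names in this Python sketch are hypothetical, and measuring the span as log10(max/min) is one simple choice among several.

```python
import math
from typing import Sequence


def orders_of_magnitude(values: Sequence[float]) -> float:
    """Span of the data in powers of ten, log10(max / min); positive values only."""
    if not values or min(values) <= 0:
        raise ValueError("need a non-empty sequence of positive values")
    return math.log10(max(values) / min(values))


def first_digit_shares(values: Sequence[float]) -> list[float]:
    """Share of values starting with each digit 1..9 (values must be >= 1)."""
    counts = [0] * 9
    for v in values:
        if v < 1:
            raise ValueError("values must be >= 1")
        # Truncating a value >= 1 to an integer does not change its leading digit.
        counts[int(str(int(v))[0]) - 1] += 1
    return [c / len(values) for c in counts]


if __name__ == "__main__":
    # Synthetic series for illustration only (NOT the Sal data).
    wide = [10 ** (6 * n / 6000) for n in range(6000)]  # geometric, from 1 to about 10**6
    narrow = [100_000 + 25 * n for n in range(6000)]     # linear, from 100,000 to 249,975
    benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
    print("Benford: ", [round(p, 3) for p in benford])
    for name, series in (("wide", wide), ("narrow", narrow)):
        shares = first_digit_shares(series)
        print(name, f"span = {orders_of_magnitude(series):.2f} orders of magnitude")
        print("  observed:", [round(s, 3) for s in shares])
```

The wide series yields first-digit shares close to Eq. (1), while the narrow one puts all of its mass on the digits 1 and 2; a series confined to such a range cannot reproduce the Benford pattern, regardless of data quality.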

Some aspects could be considered in future research, mainly: incorporating more data, including data from other islands in the archipelago; quantifying the magnitude of the impact of specific events; examining the rate of change of overnight stays; and distinguishing between national and foreign tourists' overnight stays.

The main conclusion of this study is that the results obtained not only support the reliability of the secondary data (from INE), but also suggest that the baseline number of monthly overnight stays in accommodation establishments can be simulated using Benford's law, with events treated as possible disturbances to the law.

Nevertheless, it was important to analyse the applicability of the law to these data under the hypothesis that the number of overnight stays occurs at random. The implications for the estimation of tourism demand are crucial for the future development and optimization of prediction models.