1 Introduction

Data generated ubiquitously during ordinary interaction between a mobile phone and the serving telecommunication network are a rich source of information. These pervasive datasets are recorded and stored in Base Transceiver Stations (BTSs), which are the basic devices providing wireless communication between mobile phones and a telecommunication network. The general availability of mobile phones, which accompany us most of the time in everyday life, creates great potential for identifying people's activities. (Anonymized) Call Detail Records (CDRs) produced during these interactions allow the locations of important places to be estimated, as well as other behavioural aspects of user/tourist activities, especially when other open and available technologies are applied in support. Thus, the purpose of this paper is to map population datasets into a collection of individual behaviours to support an urban ecosystem which produces context-aware and pro-active decisions. Activities and use cases for municipal services that support urban management, and that refer to tourist movement in a destination, are considered.

Quality is always crucial for a successful tourism industry. Thus, evaluating how effectively the goals of the assumed and expected sustainable tourism development are achieved is fundamental. Tourist trajectories and behaviour patterns might be extracted from mobile phone datasets, and thus replace inefficient traditional destination questionnaires, whether paper-based or web-based, within leisure, recreation and tourism settings. Since manual surveys are expensive to conduct in terms of time and money, automatic surveys based on the analysis of pervasive datasets from mobile networks seem to be an excellent alternative whose benefits are hard to overestimate. Other advantages of this approach are impartiality and the real-time generation of reliable information covering tourist activities.

The aim of this paper is to show how to use information about tourist activity in urban spaces, obtained from CDRs, for context-aware smart decisions in an urban ecosystem. The contribution of the paper is a unified approach consisting of sample use cases for an urban context-aware system and algorithms for software agents, which serve as evidence and an argument supporting the proposed system. Another contribution is a novel method of mapping filtered pervasive streams of datasets into a collection of individual and anonymized tourist activities located in a tourist destination. To the best of our knowledge, this early research paper presents the first study for the mentioned area as well as for the tourist movement case. This research opens some new directions, related in particular to implementation and experiments.

There are works considering behavior recognition in ubiquitous computing; however, only the subset that focuses on pervasive datasets stored in BTSs is relevant to this research. Moreover, most works focus on entire streams of behaviors rather than on individual ones. For example, in work [12] mobile phone data are analyzed as data that create a holistic and dynamic city system, which allows a dynamic, real-time, city-wide representation to be built. Work [6] provides a method of identifying inhabitants’ important locations by clustering and regression. Based on some simple rules, algorithms for selecting home and work locations are described, and both individual and population behavior is considered. Work [2] describes mobile phones in a real-time urban monitoring system based on fixed sensors and GPS receivers. These combined approaches allow a monitoring platform to be prepared that visualizes vehicular traffic and the movements of pedestrians. Behavioral patterns are discussed in work [4]. In work [13], current research trends for leisure, recreation and tourism are surveyed, and numerous questionnaires are also included. Summarizing these works, there is a lack of a strong focus limited to individual behaviors within mobile phone datasets. However, the works influence this research by showing a challenging research direction and by considering some patterns of behaviors.

2 Mobile Phone Infrastructure

Systems for mobile communications (e.g. GSM or UMTS) are well established. There are many works introducing the world of data communication procedures, e.g. work [5].

Fig. 1. A sample BTS network (source: btsearch.pl)

The most obvious part of the mobile phone network is a base station. A base transceiver station (BTS) is a piece of equipment that enables wireless communication between a user and the network. Nowadays, cities and regions are covered with a relatively dense network of BTSs, see for example Fig. 1. Although outside the cities networks are less dense, in each case they gather and store important and interesting information about different types of users’ activities.

A call detail record (CDR) contains data recorded and produced by telecommunications equipment. CDRs, as collections of information, have a special format [3]. Below is a sample fragment of a CDR text decoded from the binary format. The first row is a header row containing the field names:

figure a

The meaning of the columns is not analysed here since they are intuitive and the detailed discussion is outside the scope of the paper. Location information is extracted as part of the interaction data. These location observations, i.e. the moment of the phone’s/object’s entry into the area of a station (log in), and the moment they leave that area (log out), are of fundamental importance to the considerations given in the following sections of the paper.
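To illustrate how log in and log out observations might be turned into per-station dwell times, the following is a minimal sketch. The record layout (`phone_id`, `bts_id`, `kind`, `time`) is a hypothetical simplification introduced here for illustration and is not the CDR format of [3].

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Observation:
    """One location observation; all field names are hypothetical."""
    phone_id: str   # anonymized object identifier
    bts_id: str     # station whose area is entered or left
    kind: str       # "login" (entry) or "logout" (exit)
    time: datetime


def dwell_times(observations):
    """Sum, per (phone, station), the time between a login and the next logout."""
    open_logins = {}
    totals = {}
    for obs in sorted(observations, key=lambda o: o.time):
        key = (obs.phone_id, obs.bts_id)
        if obs.kind == "login":
            open_logins[key] = obs.time
        elif obs.kind == "logout" and key in open_logins:
            start = open_logins.pop(key)
            totals[key] = totals.get(key, timedelta()) + (obs.time - start)
    return totals


sample = [
    Observation("p1", "bts-A", "login", datetime(2024, 5, 1, 9, 0)),
    Observation("p1", "bts-A", "logout", datetime(2024, 5, 1, 9, 45)),
    Observation("p1", "bts-B", "login", datetime(2024, 5, 1, 9, 45)),
    Observation("p1", "bts-B", "logout", datetime(2024, 5, 1, 11, 0)),
]
result = dwell_times(sample)
```

A logout without a preceding login is ignored in this sketch; a real pipeline would need a policy for such incomplete pairs.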

3 Smart Urban Ecosystem

An ecosystem is a distributed, self-organized and open system gathering knowledge about (a selected aspect of) the smart city environment. It constitutes a community of digital devices and their environment functioning as a whole (hardware, software, services). This system might be extended to consider other aspects of a smart city, for example urban pollution, fire and emergency systems, water and sanitation, energy, etc.

Fig. 2. A use case diagram for a smart urban ecosystem (fragment)

A sample urban system is shown in Fig. 2. Some users (actors) of a context-aware smart ecosystem are identified: Emergency services, Municipal police, and Public transportation management. Emergency services are organizations that ensure public safety and deal with emergencies when they occur (ambulance service, the police, the fire brigade, and others). Municipal police are law enforcement agencies that are under the control of a town, city, or borough or its local government. Public transportation management comprises systematic processes that collect and analyze information on conditions and needs as inputs to urban planning, supporting decision-makers in choosing appropriate strategies.

The above actors operate in the context-aware urban system, which consists of the following sample use cases: Manage crisis (UC1), Urban monitoring (UC2), and Manage transport (UC3). Brief descriptions of use case features are provided instead of formal scenarios.

Manage crisis (UC1) means a process dealing with events that threaten the general public. When tourist activities in selected areas increase, responses might comprise: launching a special emergency call number, increasing the number of open pharmacies (including night shifts) in selected areas, increasing the number of hospital emergency rooms, improved security and enforcement of regulations, etc.

Urban monitoring (UC2) means checking or tracking processes in a systematic way and supervising activities in progress. When tourist activities in selected areas increase, responses might comprise: intensification of monitoring in selected areas, addressing increased energy consumption/production and increased waste and pollution issues, additional patrols, sending drones, etc.

Manage transport (UC3) means managing transportation operations in the public area. When tourist activities in selected areas increase, responses might comprise: increasing the frequency of bus/tram services, introducing shuttle services if necessary, activating additional bicycle rental systems, etc.
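The three use cases share one trigger: an increase of tourist activity in selected areas. A minimal sketch of such a trigger is given below. The threshold `factor`, the visitor counts, and the abbreviated response lists are assumptions made for illustration only; the paper does not define how an increase is measured.

```python
# Abbreviated sample responses per use case, taken from the descriptions above.
RESPONSES = {
    "UC1": ["launch a special emergency call number", "open more pharmacies"],
    "UC2": ["intensify monitoring", "send additional patrols"],
    "UC3": ["increase bus/tram frequency", "activate extra bicycle rentals"],
}


def triggered_areas(current, baseline, factor=1.5):
    """Return areas whose visitor count exceeds `factor` times the baseline.

    `factor` is a hypothetical threshold, not a value from the paper.
    """
    return sorted(
        area for area, count in current.items()
        if count > factor * baseline.get(area, 0)
    )


def plan_responses(current, baseline, factor=1.5):
    """Attach the candidate responses of every use case to each triggered area."""
    return {
        area: {uc: list(actions) for uc, actions in RESPONSES.items()}
        for area in triggered_areas(current, baseline, factor)
    }


# Hypothetical visitor counts per area (numbers invented for the example).
baseline = {"Sintra": 400, "Cascais": 280}
current = {"Sintra": 900, "Cascais": 300}
plan = plan_responses(current, baseline)
```

Which responses are actually selected would remain a decision of the respective actor; the sketch only lists candidates.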

4 Tourist Destination Questionnaires

A questionnaire is a form containing a set of questions, usually addressing statistically important tourist activities. A tourist questionnaire is a typical way of gathering information which can be used for managing a context-aware urban ecosystem. A questionnaire for tourist movement in destinations is discussed now to clarify how smart systems based on recognising tourist activities work.

Fig. 3. Lisbon and its close/distant surroundings.

Lisbon, the capital city of Portugal, as well as its surroundings are considered and used as an example.

Tourists/visitors stay in Lisbon and, probably day by day, visit its monuments and various tourist attractions. However, during their stay in the city some tourists may wish to visit its surroundings, e.g. Fátima (religious reasons) or Cascais (recreational reasons), as well as Sintra, which is known for its historical and architectural monuments and is classified as a UNESCO World Heritage Site, see Fig. 3. All these places/sub-destinations, except Fátima, are located in the Grande Lisboa subregion (footnote 1).

Table 1. A sample tourist questionnaire (a sketch).

Sample and common questions for visitors are shown in Table 1. Many other tourist questionnaires are also available, for example [1, 14]. These questionnaires are distributed to visitors during their stay at a destination. They refer to many details of the visitors’ trip and stay. Forms are usually designed by tourism organizations for people who are going to spend at least one night at the destination. Questionnaires are conducted anonymously.

One of the main objectives of the questionnaires is to learn more about visitor characteristics for marketing purposes, as well as to identify the size of the tourism activity. Other characteristics cover types of visitors (foreign or home, business or leisure, overnight or day trip). They also make it possible to identify where visitors go, if anywhere, outside the examined basic destination, and what the scale of sub-destination visits is.

The purpose of this paper is also to provide methods of automatically gathering information about tourist movements, that is, to replace manual surveys by a fully automatic process, and then to use this information for a smart urban ecosystem. It should be noted that the typical granularity of a BTS is about 500 m in a city (urban areas) and about 1,000 m outside a city. On the other hand, there are some advanced algorithms and models [2] that enable a phone position between stations to be estimated with an accuracy of about 150 m in urban areas. Let us also note that a Home Location Register (HLR) is maintained in mobile networks in order to provide information about subscribers who are registered in a core/local network. Its counterpart is the Visitor Location Register (VLR), which provides information about network visitors (from elsewhere in the country or from abroad). These two registers are important for the approach since they allow visitors to be distinguished from non-visitors. Although there are some exceptions, the probability of correct verification based on VLRs/HLRs is very high. In the case of any difficulties or doubts, the billing databases of mobile providers might additionally be examined.
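The visitor check described above can be sketched as a simple lookup. The sketch follows the simplified reading used in this paper (HLR for locally registered subscribers, VLR for network visitors); the function name, the set-based registers, and the `UNKNOWN` outcome for doubtful cases are assumptions introduced for illustration.

```python
from enum import Enum


class Status(Enum):
    RESIDENT = "resident"
    VISITOR = "visitor"
    UNKNOWN = "unknown"   # doubtful case; billing data might be consulted


def classify(phone_id, hlr_ids, vlr_ids):
    """Classify an object from (hypothetical) sets of HLR and VLR identifiers."""
    in_hlr = phone_id in hlr_ids
    in_vlr = phone_id in vlr_ids
    if in_vlr and not in_hlr:
        return Status.VISITOR
    if in_hlr and not in_vlr:
        return Status.RESIDENT
    # Present in both registers or in neither: left undecided in this sketch.
    return Status.UNKNOWN


hlr = {"p1", "p2"}
vlr = {"p3", "p2"}
statuses = {p: classify(p, hlr, vlr) for p in ("p1", "p2", "p3", "p4")}
```

Objects classified as `UNKNOWN` correspond to the exceptions mentioned above, for which additional sources would be needed.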

5 Towards Algorithm

The analysis of the points/questions in Table 1 leads to the following taxonomy, based on the information expected to be obtained from the BTS datasets, which constitutes an informally expressed algorithm:

  1. some answers are obviously easy to obtain, e.g. point 1 or 3;
  2. some answers are available through digging deeper, but still only direct analysis of the BTS data is needed, e.g. point 2 and the VLR/HLR records;
  3. a certain number of answers need a pattern analysis for individuals, e.g. the comparison of locations during day and night for point 4, or limited mobility (business) versus greater mobility (an active city exploration typical for tourists) for point 6;
  4. some answers require a pattern analysis for a group of visitors, if any; in other words, it is examined whether a group of objects is moving together, e.g. the city exploration with a group of mobile phones/visitors for point 8, or with one local (c.f. VLR/HLR records) mobile phone of a local guide for point 9;
  5. some points need additional (open) technologies to answer questions, e.g. OpenStreetMap (OSM, footnote 2) to locate/identify selected objects like airports or railway stations for point 7, hotels/hostels for point 5, museums/churches for point 6, or suburban areas (close or distant) for point 12;
  6. some answers require historical data analysis, e.g. previous presence in a destination for point 10;
  7. some answers require access to commercial/bank data, e.g. credit/debit cards used in the destination for point 11;
  8. several answers could be obtained by analyzing, for example, social networks, reservation systems or web vendors, e.g. sources of information for point 16;
  9. some answers could be obtained when web forms are sent directly to mobile phones after the visit to the destination is over, e.g. for points 13–15, 20;
  10. for some points, obtaining answers based on BTS datasets is impossible or problematic, e.g. points 17–19;
  11. last but not least, there is some information that could be extracted from the BTS data and which usually is not a subject of any questionnaire (thus, no points in Table 1 are indicated here), but which could be used to analyze other parameters of tourist activities, e.g. the intensity of call/sms/mms/web transmissions during the entire visit or in particular places, and thus numerous valuable conclusions might follow.

The above classification is crucial and gives an idea of the foundations for the solutions and methods proposed in the paper, that is, how to use information gathered in CDRs as a base for pro-active decisions of an urban ecosystem. In other words, the above classification constitutes a base for methods of building knowledge about tourist activities, c.f. line 2 in Algorithm 2.
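The classification can be written down as a lookup from questionnaire points to source categories. The sketch below uses only the example points named in the list; the short category labels and the function name `sources_for` are introduced here for illustration.

```python
# Category number -> (short label, example points of Table 1 named in the list).
TAXONOMY = {
    1: ("direct BTS data", [1, 3]),
    2: ("deeper direct analysis, VLR/HLR", [2]),
    3: ("pattern analysis for individuals", [4, 6]),
    4: ("pattern analysis for groups", [8, 9]),
    5: ("additional open technologies, e.g. OSM", [5, 6, 7, 12]),
    6: ("historical data", [10]),
    7: ("commercial/bank data", [11]),
    8: ("social networks, reservation systems, web vendors", [16]),
    9: ("web forms sent to mobile phones", [13, 14, 15, 20]),
    10: ("impossible or problematic from BTS data", [17, 18, 19]),
    11: ("extra information outside any questionnaire", []),
}


def sources_for(point):
    """Return the category numbers whose examples mention the given point."""
    return [cat for cat, (_, points) in TAXONOMY.items() if point in points]


point_6 = sources_for(6)      # needs both pattern analysis and OSM lookup
point_14 = sources_for(14)    # answered through a web form
```

A point may belong to more than one category, as point 6 does, in which case several sources would have to be combined.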

6 A Multi-agent System

A multi-agent system and its architecture are proposed in this Section. The system is used to solve the problem of surveying the tourist movement in a destination in the way described in the previous Section.

The following taxonomy of agents is proposed:

  • A: Angel-the-guard agent, that is, an agent created for a new object that appears in the entire destination network when this object is classified as a visitor. From this moment the agent exists in the system until the object leaves the entire network; the agent stores all events that refer to the object. After the object leaves the destination, the gathered information is passed to the agent Q and the agent A is removed from the system;

  • E: Event agent, that is, an agent that exists in the system permanently and whose purpose is to process new events that appear. It is assumed that a list of basic events is pre-defined and only these events are handled;

  • Q: Questionnaire agent, that is, an agent that exists in the system and whose purpose is to update a questionnaire which is built in this destination. The questionnaire is updated when an object leaves the entire destination and its agent A is to be removed. There is one questionnaire agent for one type of questionnaire;

  • M: Managing agent, that is, an agent that exists in the system permanently and whose purpose is to initiate system variables, to handle two (selected) events, and to manage the questionnaire agents in this destination.

Summing up, the number of agents A in a system is equal to the number of visitors in a given destination; there is only one agent E in a destination; the number of agents Q in a system is equal to the number of questionnaire types built in the system, and there is only one agent M in the system.

7 Methods and Algorithms

Several algorithms for handling the entire system are proposed in this Section. They refer to the classification of the agents defined in the previous Section, that is agents that operate in a system.

Some assumptions related to the algorithms are made. There is a pre-defined set/list BTSlist of BTSs that belong to a considered destination. This set/area is closed, and only these BTSs constitute the destination. (For surroundings, see Fig. 3, a “BTS corridor” must be built from a destination to a sub-destination.) An event loop (message dispatcher) EventList is the primary method of processing. There is also a predefined set of events PredefEvent which are registered and inserted into the loop EventList. Every event e defines an event type/name and the associated object o (a mobile phone). In other words, an event is always a pair of an event name and an object. The events describe different aspects of objects’ activities registered in BTSs and are not discussed in detail here. There are two special events, objCome and objLeave, which mean that an object enters or exits the destination defined by BTSlist, respectively. “nil” means an empty event. “others” stands for events that are outside the scope of (tourist) interest. The loop EventList as well as PredefEvent are placed in the basic/native mobile network system, and the system inserts every registered event into the loop, which is processed by Algorithms 1–4.

figure b

The entire system is initiated by agent M, whose operations are shown as Algorithm 1. Firstly, the agent initiates system/global variables and questionnaires. Secondly, agent M processes in a loop two (special) events that appear in the system. The global variables are Vis (all active visitors observed/registered in a destination), Res (all active residents observed/registered in a destination), PredefEvent (the list of legal and predefined events that are handled), and EventList (the list of events that are currently being processed in the system). Two of these events are handled directly in the main loop, that is, only the agent M processes these events.

figure c
figure d
figure e

The agent Q, shown as Algorithm 2, processes data gathered by the agents A. Data are (temporarily) stored in the system. Questionnaire Que is updated using the data of agent a in the way required by this questionnaire. The agent A gathers data for object o and event e and is shown as Algorithm 3. Object o is a visitor and event e belongs to the list of legal events PredefEvent. The agent E handles events from EventList and is shown as Algorithm 4. Events are processed in a (main) loop similar to the loop in the agent M’s operations; however, the right to process two events from EventList, objCome and objLeave, is reserved for the agent M.
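The interplay of the four agents can be sketched as follows. This is a reconstruction from the prose only and not the listing of Algorithms 1–4: it merges the loops of agents M and E into one sequential loop for brevity, and the class names, the `is_visitor` callback, and the event names `call` and `sms` are assumptions introduced for illustration.

```python
from collections import deque

OBJ_COME, OBJ_LEAVE, OTHERS = "objCome", "objLeave", "others"


class AngelAgent:
    """Agent A: stores all events that refer to one visitor object."""
    def __init__(self, obj):
        self.obj = obj
        self.events = []

    def record(self, name):
        self.events.append(name)


class QuestionnaireAgent:
    """Agent Q: updates one questionnaire from the data of a leaving agent A."""
    def __init__(self):
        self.visits = 0
        self.event_totals = {}

    def update(self, angel):
        self.visits += 1
        for name in angel.events:
            self.event_totals[name] = self.event_totals.get(name, 0) + 1


def run(event_list, predef_events, is_visitor):
    """Process (event name, object) pairs; returns (Que, Vis, Res)."""
    vis, res, que = {}, set(), QuestionnaireAgent()
    pending = deque(event_list)
    while pending:
        name, obj = pending.popleft()
        if name == OBJ_COME:                      # handled by agent M
            if is_visitor(obj):
                vis[obj] = AngelAgent(obj)
                vis[obj].record(name)
            else:
                res.add(obj)
        elif name == OBJ_LEAVE:                   # handled by agent M
            angel = vis.pop(obj, None)
            if angel is not None:
                angel.record(name)
                que.update(angel)                 # agent A is then removed
            res.discard(obj)
        elif name in predef_events and name != OTHERS and obj in vis:
            vis[obj].record(name)                 # handled by agent E
    return que, vis, res


events = [
    (OBJ_COME, "v1"), (OBJ_COME, "r1"), ("call", "v1"), ("sms", "v1"),
    ("call", "r1"), (OTHERS, "v1"), (OBJ_LEAVE, "v1"),
]
predef = {OBJ_COME, OBJ_LEAVE, "call", "sms", OTHERS}
que, vis, res = run(events, predef, is_visitor=lambda o: o.startswith("v"))
```

In this sketch events of residents and events of type `others` are dropped, and the questionnaire is reduced to simple counters; a real questionnaire agent would map the recorded events to the points of Table 1.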

The call “call process \(M\text{-}operations\)” (see also line 8 in Algorithm 1) starts the system. The system does not need synchronization because the subsets of events processed in the two separate loops are disjoint (see line 12 in Algorithm 1 and line 4 in Algorithm 4). If more agents E are introduced to speed up the entire system, e.g. by the call “call process \((1..p)~E\text{-}operations\)” (line 8 in Algorithm 1), synchronization is mandatory.
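Why several agents E require synchronization can be illustrated with worker threads that share per-object records. The sketch below is only an illustration under assumed names (`run_event_agents`, `workers`); it uses a lock to guard the shared dictionary and does not preserve the order of events for one object.

```python
import queue
import threading


def run_event_agents(events, workers=4):
    """Count events per object using several E-like workers (hypothetical)."""
    pending = queue.Queue()
    for event in events:
        pending.put(event)
    counts = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                _name, obj = pending.get_nowait()
            except queue.Empty:
                return                      # no more events to process
            with lock:                      # guards the shared record
                counts[obj] = counts.get(obj, 0) + 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counts


stream = [("call", "o1")] * 500 + [("sms", "o2")] * 300
totals = run_event_agents(stream)
```

With a single agent E the lock would be unnecessary, which matches the observation that the two loops of agents M and E work on disjoint subsets of events.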

Finding the time complexity, that is, the total time required by an algorithm to run to completion, for Algorithms 1–4 begins with Algorithms 2 and 3, which are elementary and are called from the other algorithms. The time complexity of Algorithm 2 depends on the length of a particular questionnaire and is \(\mathcal{O}(q_n)\), where \(q_n\) is the total number of questions in the questionnaire, c.f. Table 1. The time complexity of Algorithm 3 depends on the number of predefined and considered events for an agent and is \(\mathcal{O}(e_p)\). Line 6 in Algorithm 4 contains its dominant operation; hence, the time complexity of Algorithm 4 is \(\mathcal{O}(p')\), where \(p'=|(PredefEvent \setminus \{others\}) \setminus \{objCome,objLeave\}|\), that is, the number of all predefined events minus the other ones, which are omitted and not considered for questionnaires, and minus the two events processed separately in Algorithm 1. Most instructions of Algorithm 1 are fixed-cost instructions; even the loop instruction in line 8 has a fixed cost. The main purpose of the Algorithm is to pre-handle the two special events objCome and objLeave, as well as to create/delete the angel for an object. Moreover, Algorithm 3 is always called, and Algorithm 2 is called in one case. Thus, the worst-case time complexity of Algorithm 1 is \(\mathcal{O}(e_l + n\cdot q_{i=1,\ldots,n})\), where \(e_l\) is the number of predefined events, \(n\) is the number of considered questionnaires, and \(q_i\) is the number of questions in the \(i\)-th questionnaire, see lines 16, 20, and 22.

8 Conclusion

The paper presents a novel method for mining individual behaviours of visitors in a destination from pervasive BTS datasets. The handling of the questionnaire, see Table 1, is expressed informally in the introduced classification, which gives an idea of how it works, and is then supported by the architecture of a multi-agent system and, more formally, by Algorithms 1–4. The gathered information constitutes a base for the context-aware urban ecosystem shown in Fig. 2.

Future work should cover more detailed algorithms as well as an architecture for the multi-agent system. Formal logic is an appropriate background when considering workflows for software models [7, 8], applications of formal reasoning processes, e.g. [10, 11], or mining behaviours from datasets [9]. More theoretical and especially experimental evaluations are also required in future work.