1 Introduction

Nowadays, various tracking technologies embedded in mobile devices make it possible to gather data about the movements of people, vehicles, and even animals or general objects. The widespread use of these technologies has increased interest in collecting and analyzing data in order to model users’ trajectories and extract information about how they move, what kinds of places they visit and, more recently, to understand their habits and behaviors. Several applications have been developed for this purpose, in particular for tracking users’ movements during their daily activities and observing how they define their routines. The wide availability of these mobile applications makes it possible to gather a large amount of information about human activities, useful for understanding mobility patterns from both the individual and the community point of view.

Location data are often associated with several services in order to provide useful features to users. Common uses rely on the user’s current position, exploited to build navigation systems and fitness applications, or to add locations to social network activities. Originally used as stand-alone information, users’ positions have since been analyzed more deeply, often in combination with a history of places previously visited by the same users, or by others in the same area, in order to improve location-based services. This approach has led to progress in a more social direction, which takes the users’ perspective into account. A new set of mobile applications has begun to spread, the so-called personal assistants, which provide suggestions to users in several ways, such as textual messages, web pages to visit, and even other applications to download. To achieve this objective, new methodologies and algorithms have been designed to obtain information about users’ habits and behaviors.

By studying users’ trajectories it is possible, for instance, to analyze how often places are visited, and with which patterns, in order to understand which locations are the most important during a day or a week, and which locations are visited for extended periods or only for a few moments. With this kind of information it is possible to define a set of user behaviors that leads to a set of habits, useful for improving the algorithms used in mobile applications to provide location-based services and suggestions to users. This methodology can be applied to a wide range of users in order to perform a community mobility analysis, for instance on users who live in the same city, or on movements in a delimited area during a particular event.

With these premises, it is clear how important the new information extracted about users is for several research fields and also for industry. It can bring benefits to both the public and private sectors, for urban planning and transportation management, but also for security, privacy, and even marketing. Different research areas are affected by these applications: data mining, information retrieval, and statistical studies are some of the main fields that can exploit this social approach and provide improved and more user-oriented solutions.

The main contribution of the paper is a general survey of the state of the art of research on the analysis of moving objects, in particular users, focusing on trajectory modeling and then highlighting the recent interest in more user-oriented methodologies for analyzing social habits and behaviors. We describe social approaches used to extract new semantic knowledge, combining the often separate literatures on the study of trajectories and on the study of social habits and behaviors as a further step in trajectory modeling. In particular, we provide a general perspective on studies of human mobility by describing and comparing methods and algorithms focusing on two significant aspects, namely important place recognition and destination prediction.

The paper is organized as follows. Section 2 illustrates how trajectories model movements in different ways, depending on different applications. Section 3 describes processing algorithms that allow us, from raw data, to obtain different trajectory descriptions. Then, in Sect. 4, we focus on social trajectories and we survey different research areas on human mobility and social habits and behaviors, with particular attention to the recognition of points of interest (POIs) and the prediction of future ones. Finally, conclusions are drawn in Sect. 5.

2 Tracking Spatial Data and Trajectory Modeling

Using different technologies integrated in mobile devices, a large amount of real-life data can be collected. Nowadays mobile devices are usually equipped with receivers for the Global Positioning System (GPS), a space-based satellite navigation system that allows us to detect their spatial location at specific instants. These receivers are usually combined with other components, since there are several situations in which GPS is not available or not usable, such as indoors or among tall buildings. For these reasons, hybrid positioning systems are used to replace, or estimate, GPS positions through different wireless networks (Liu et al. 2007). Moreover, even if they are considered not very accurate, accelerometers, gyroscopes, magnetometers, etc. have a limited cost and can be used to produce more precise data (Sun et al. 2014; Shoaib et al. 2015), in particular for physical activity recognition.

Nowadays, academia and industry are very interested in both mobility data computation and related studies, due to their benefits in several applications. Tracking the movements of objects in general can provide information for tasks such as monitoring or optimizing their paths. With proper sensors and technologies, this kind of analysis can be performed on any kind of moving object: packages to deliver, vehicles to monitor, and even animals and people. Many mobile applications have been developed with the aim of exploiting information extracted from raw location data. The simplest ones track the user’s current position in order to feed navigation systems. Other applications, instead, track the user’s entire movement in order to build a history of visited locations, with related data, for instance about time, space and/or speed. Some of these exploit this kind of tracking during sport activities in order to monitor the user’s performance and provide information about the current training session, and even to give suggestions that help the user improve. Decisions affecting a whole community can also exploit mobility knowledge. Understanding the most frequent movements in a city, and their temporal ranges, supports the design of new public connections in a transportation network and the identification of traffic congestion situations where intervention strategies are required. These topics can also be exploited in environmental or marketing applications: areas where traffic is more intense are usually associated with pollution problems, and are also where advertising can be viewed by a large number of people.

A trajectory can be represented in different ways depending on the application at hand; therefore, the same movement can be described with different properties, chosen to provide the right information for the required purpose. In a similar way to that described by Yan et al. (2010, 2013), a hybrid model that includes several levels of trajectory description can be used to organize the set of operations and analyses. The state of the art in trajectory modeling focuses on the following three levels:

  • raw trajectory level: starting from raw data, i.e. sequences of positions associated with a timestamp, and applying cleaning and filtering algorithms, simple raw trajectories can be created;

  • structured trajectory level: a first level of abstraction can be applied to raw trajectories to obtain a structured description of a movement. A structured trajectory can be described as a sequence of episodes, groups of sequential GPS points with common properties. A classical example is the segmentation in stops/stay points and movements;

  • semantic trajectory level: trajectories can be enriched by adding annotations that, associated with different granularity levels (from simple points to whole trajectories), allow us to motivate and describe each component of a movement.

These different levels represent sequential steps of operations that support the requirements of trajectory applications. In Sect. 3 we summarize several analyses and data processing steps related to each level. Several related works in the literature do not use a strict breakdown into these levels; for instance, Marketos et al. (2013) focus on just two levels, ordering the operations slightly differently. They apply a preliminary structuring at the raw trajectory level, then integrate the definition of episodes into the semantic level. In Fig. 1 these levels are illustrated as layers 1, 2, and 3, and represent the stages of data processing, starting from the basic level composed only of raw data.

Fig. 1 Different levels in trajectory modeling

The spread of tracking technologies embedded in mobile devices, and the great interest in developing algorithms and software solutions to mine the data they gather, have led several companies to invest in this direction to better understand users’ behaviors and habits. Some companies use location data as a feature of social network based applications, in order to offer users new services based on their check-ins. Well-known examples are Foursquare, which bases its entire service on users’ location information to give suggestions about POIs, and Facebook and Twitter, which allow users to add their location while posting a new message on their account, in order to add more information for those who read their statuses. This kind of mobile application gives people the possibility to track their location data in association with several different services, and to share this increasingly important source of information with their friends. Sharing these data provides the additional advantage of improving the shared services offered to the community.

Focusing on these research areas and industrial applications, and starting from previous trajectory models, an additional extended level can be defined, namely the social layer (Fig. 1, layer 4), in order to take a further step toward user modeling. It allows trajectories to be defined from a new point of view: by analyzing the set of trajectories of a user, or of a group of users who share common properties (for instance, the same geographic area or the same working hours), it is possible to extract new data about the important places they visit, their types of movement, and more complex information about common habits and behaviors. As depicted in the top layer of Fig. 1, personal important places (points, lines or areas) can be identified for different purposes and described by extracting new information. For instance, some locations can delimit an area connected to a user activity (the leftmost pin, green), which involves movements among semantically connected places. Other locations can be characterized by the large amount of time spent in them, such as a train station, due to the waiting time, or a shopping mall where people walk around. Some places may be important for more than one person: a set of users can visit a place repeatedly (pin with a chat icon, blue) to meet other people for talking or eating. It is also possible to analyze user behaviors in relation to times and days, to understand which locations are suitable for spending free time, for instance entertainment venues (pin with a heart icon, red) such as bars, cinemas, shopping malls, etc. Moreover, personal places, such as home (pin with a house icon, yellow) or roads where users drive personal vehicles starting from their own houses (pin with a car icon, yellow), can be identified by observing in which time windows users tend to enter or leave those locations.

On this basis it is clear how the same locations can have different meanings for different users; indeed, a shopping center could be a workplace for those who have a job there, or a place to spend free time for other people. Likewise, a dwelling may be a place that users visit occasionally, such as a friend’s house, or their own home, visited intensively. The features used to describe places in this new “social” way highlight the different semantics deriving from different users. In Sect. 4, we summarize the main advances in trajectory mining based on social approaches, with a focus on users’ behaviors.

3 From Raw Data to Semantic Information

Starting from raw data, several trajectory reconstruction algorithms can be defined, depending on the needs and aims of the specific application. They integrate different processing steps that produce trajectories at the different modeling levels. As described in Sect. 2, and depicted in Fig. 1, it is possible to apply data mining to obtain only a structured trajectory, or to analyze the data more deeply to reach the semantic level. The following sections illustrate these steps.

3.1 Cleaning and Filtering Raw Data

Datasets collected by mobile sensors are often imprecise and incorrect due to noise. Raw data are exposed to two different kinds of errors (Jun et al. 2006): systematic errors, derived from the limitations of the positioning system (a low number of satellites while detecting the position, low accuracy due to signal problems, etc.), which affect the final quality of the data; and random errors, due to external causes such as clock and receiver issues, atmospheric and ionospheric effects, etc. Usually, methods based on parameters such as time and speed, or on geometric regression models, are used to solve these problems. For instance, Yan et al. (2013) describe a data preprocessing layer that cleans data in two ways: a speed threshold removes points that are not reasonably consistent with the expected speed, to deal with systematic errors, and a Gaussian regression model is used to deal with random ones.
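The speed-threshold idea can be sketched as follows. This is a minimal illustration, not the preprocessing layer of Yan et al. (2013): the function names, the tuple layout `(lat, lon, t)` and the default threshold of 50 m/s are assumptions made for the example, and the first point of the trace is assumed to be valid.

```python
import math
from typing import List, Tuple

# Hypothetical point layout: (latitude in degrees, longitude in degrees, time in seconds).
Point = Tuple[float, float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 coordinates."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def filter_by_speed(points: List[Point], max_speed_mps: float = 50.0) -> List[Point]:
    """Drop points that imply an implausible speed from the last kept point.

    The threshold value is illustrative only and should be tuned per application.
    """
    if not points:
        return []
    kept = [points[0]]  # assumption: the first fix is trusted
    for lat, lon, t in points[1:]:
        plat, plon, pt = kept[-1]
        dt = t - pt
        if dt <= 0:
            continue  # duplicated or out-of-order timestamp
        speed = haversine_m(plat, plon, lat, lon) / dt
        if speed <= max_speed_mps:
            kept.append((lat, lon, t))
    return kept
```

Comparing each candidate with the last kept point, rather than with its raw predecessor, prevents a single outlier from also causing the following valid point to be discarded.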

When working in a network (e.g., road and rail networks), different map-matching algorithms can be used to replace or clean the GPS positions of an object with points on the network. These algorithms can be divided into geometric, topological, probabilistic and advanced (Quddus et al. 2007; Velaga et al. 2010). Geometric and topological algorithms, which use geometric and topological information, are simple, fast and easy to implement in real time. Probabilistic and advanced ones, which use probabilistic information and more refined concepts such as the mathematical theory of evidence, fuzzy logic models, etc., offer higher accuracy but are generally slow and difficult to implement.

Data related to movements grow progressively and intensively as tracking time goes by, so data compression is an essential task that can be applied directly to raw data. On raw data, compression consists in reducing the number of points used to describe a trajectory. Different algorithms, trying to balance the trade-off between accuracy (i.e., information loss) and storage size, consider different spatial and temporal parameters. Muckell et al. (2014), in proposing a new approach to trajectory compression called SQUISH-E, perform a comprehensive evaluation and comparison of several of them: uniform sampling, Douglas-Peucker, opening window and dead reckoning.
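Of the compression algorithms listed, Douglas-Peucker is the easiest to illustrate. The sketch below is a plain recursive version working on planar coordinates; it assumes that positions have already been projected to `(x, y)` pairs in a metric system, and it uses the distance to the segment (rather than to the infinite line), which is a common variant. The names and the tolerance parameter `epsilon` are choices made for this example. It ignores timestamps, so it is a purely spatial simplification.

```python
import math
from typing import List, Tuple

XY = Tuple[float, float]  # assumed planar coordinates, e.g. metres


def point_segment_distance(p: XY, a: XY, b: XY) -> float:
    """Euclidean distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p on the line a-b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: List[XY], epsilon: float) -> List[XY]:
    """Keep only the points that deviate more than epsilon from the simplified path."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]  # every intermediate point is within tolerance
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears once
```

A larger `epsilon` yields fewer points and a larger spatial error, which is the accuracy versus storage trade-off mentioned above.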

3.2 Structuring Trajectories

Except for simple movement maps, raw trajectories are usually insufficient for meaningful trajectory applications. For this reason, a first basic analysis can be performed to structure trajectories into episodes, i.e., sequences of GPS points with common properties.

This step, usually called segmentation, can be defined using different features associated with GPS points. A frequently used approach recognizes two states in a trajectory, namely stops and movements. To obtain this segmentation, several works in the literature exploit threshold-based approaches in order to identify where the object is moving and where it is not. For instance, Yan et al. (2013) use time and speed to analyze each detected position in a trajectory. Li et al. (2008) identify two categories of stay points: points where a user remains stationary for a period of time, and points where a user moves slowly within a certain spatial region for a period of time. Other authors consider more properties of movements in order to improve the analysis. Buchin et al. (2011) propose a framework composed of several algorithms that segment any trajectory into a minimum number of sub-trajectories by combining techniques based on different sets of thresholds. They exploit basic attributes such as location, heading and speed, and more sophisticated ones such as curvature and sinuosity. Recently, Pavan et al. (2015) described how state-of-the-art techniques exploit threshold-based approaches to refine stay point identification, while highlighting several issues related to parameters that are too strict, such as acceleration, or that contribute little to the computation, such as heading change.
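A threshold-based stay point detector in the spirit of the approaches above can be sketched as follows. This is a simplified illustration and not the exact algorithm of any of the cited works: a stay point is reported when consecutive fixes remain within a distance threshold of an anchor fix for at least a minimum duration. The names `StayPoint` and `detect_stay_points`, and the default thresholds (200 m, 20 minutes), are assumptions made for the example.

```python
import math
from typing import List, NamedTuple, Tuple

# Hypothetical point layout: (latitude in degrees, longitude in degrees, time in seconds).
Point = Tuple[float, float, float]


class StayPoint(NamedTuple):
    lat: float      # mean latitude of the grouped fixes
    lon: float      # mean longitude of the grouped fixes
    arrive: float   # timestamp of the first fix in the group
    leave: float    # timestamp of the last fix in the group


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 coordinates."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_stay_points(points: List[Point],
                       dist_thresh_m: float = 200.0,
                       time_thresh_s: float = 1200.0) -> List[StayPoint]:
    """Group consecutive fixes that stay close to an anchor fix for long enough."""
    stays: List[StayPoint] = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the group while fixes stay within the distance threshold of the anchor.
        while j < n and haversine_m(points[i][0], points[i][1],
                                    points[j][0], points[j][1]) <= dist_thresh_m:
            j += 1
        duration = points[j - 1][2] - points[i][2]
        if duration >= time_thresh_s:
            group = points[i:j]
            lat = sum(p[0] for p in group) / len(group)
            lon = sum(p[1] for p in group) / len(group)
            stays.append(StayPoint(lat, lon, points[i][2], points[j - 1][2]))
            i = j       # continue after the detected stay
        else:
            i += 1      # no stay anchored here; try the next fix
    return stays
```

Because the group is defined by a spatial radius rather than by zero speed, this sketch captures both stationary periods and slow movement within a small region; the fixes between consecutive stay points can then be treated as movement episodes.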

The structuring phase may also include an additional segmentation into sub-trajectories, in order to identify portions of a path where the object’s movement is related to a specific context, such as a time period or a geographic area. Different policies can be applied to divide trajectories depending on the application concerned, for instance daily or weekly trajectories, if based on time, or provincial or regional trajectories, if based on space. Usually, this process is applied as the first step of trajectory structuring. Marketos et al. (2008) present a system where this process is adopted. They define a trajectory reconstruction algorithm that divides the movement of an object into sub-trajectories: it scans each point in the list of GPS detections in order to decide whether the incoming data should be appended to an existing trajectory or contribute to the creation of a new one. This technique is based on thresholds on space, time and speed, which drive the segmentation. Moreover, the thresholds help to remove noise and redundant raw data.
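The scan described above can be illustrated with a small sketch. It is not the algorithm of Marketos et al. (2008), only an example of the same general idea under stated assumptions: fixes are `(x, y, t)` tuples in projected metres and seconds, a fix implying an implausible speed is treated as noise, and a new sub-trajectory is started when the temporal or spatial gap from the previous fix exceeds a threshold. All names and default values are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical fix layout: (x in metres, y in metres, time in seconds).
Fix = Tuple[float, float, float]


def split_into_subtrajectories(fixes: List[Fix],
                               max_gap_s: float = 1800.0,
                               max_jump_m: float = 2000.0,
                               max_speed_mps: float = 70.0) -> List[List[Fix]]:
    """Scan fixes in order, appending to the current sub-trajectory or opening a new one."""
    trajectories: List[List[Fix]] = []
    current: List[Fix] = []
    for fix in fixes:
        if current:
            px, py, pt = current[-1]
            x, y, t = fix
            dt = t - pt
            dist = math.hypot(x - px, y - py)
            # Speed threshold: discard fixes that cannot belong to a real movement.
            if dt <= 0 or dist / dt > max_speed_mps:
                continue
            # Time and space thresholds: a large gap closes the current sub-trajectory.
            if dt > max_gap_s or dist > max_jump_m:
                trajectories.append(current)
                current = []
        current.append(fix)
    if current:
        trajectories.append(current)
    return trajectories
```

A calendar-based policy (daily or weekly trajectories) could replace the gap test by comparing the date of each fix with that of the previous one.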

3.3 Semantic Enrichment of Trajectories

Semantic trajectories make it possible, through annotations, to enrich data with additional information depending on the specific aim of the application and on the desired granularity of information. Annotating every point is unusual because it can produce a large amount of redundant data. As described by Parent et al. (2013), annotations are usually associated with episodes or with whole trajectories. Starting from contextual data repositories (e.g., OpenStreetMap and Google Maps), map-matching algorithms based on topological relationships make it possible to associate the episodes of a trajectory with points (e.g., restaurants and shops), lines (e.g., walking streets and train rails) or regions (e.g., buildings and administrative areas) of interest (Yan et al. 2013). Moreover, on the basis of similar associations and additional observations, episodes can be annotated with activities or transportation modes that motivate and describe them. More general annotations can characterize whole trajectories (e.g., work and tourist trajectories).

In many applications moving objects are restricted to move within a given network (e.g., vehicles on the road network). Particular kinds of annotations can be defined in a network. For instance, Richter et al. (2012) define a semantic trajectory as a sequence of points localized in a transportation network and annotated with specific events such as origin, destination, intersections or stops. Several proposals combine map-matching algorithms on networks with data compression, in order to save only the interesting and significant points in a transportation network, losing only an acceptable amount of data that does not compromise the resulting information (Richter et al. 2012; Kellaris et al. 2013).

In a more abstract description, aimed at the user level, the spatial details of movements from one place to another and the specific geographic positions of those locations can be dropped by mapping trajectories onto a graph structure that keeps only the relations between nodes and the attributes of edges. Abandoning the bond with a geographical map lets us focus on the elements that define user behaviors and habits, in order to build a more generic model that makes it possible to find similarities among users even if they live in different countries but have the same lifestyle, i.e., the same places and movement types. For instance, Zheng et al. (2009) and Xiao et al. (2010) build graphs over users’ locations, connecting nodes (i.e., clusters of positions with semantic annotations) with directed edges to study sequences of locations.
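A location graph of this kind can be sketched with a few lines of code. The example assumes that each trajectory has already been reduced to a sequence of semantic place labels (e.g., the annotated clusters visited in order); the function name and the choice of counting transitions as edge weights are illustrative and not taken from the cited works.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source place label, destination place label)


def build_location_graph(visit_sequences: List[List[str]]) -> Dict[Edge, int]:
    """Count directed transitions between consecutive places in each sequence."""
    edges: Counter = Counter()
    for sequence in visit_sequences:
        for src, dst in zip(sequence, sequence[1:]):
            if src != dst:  # ignore repeated detections of the same place
                edges[(src, dst)] += 1
    return dict(edges)
```

For example, the sequences `["home", "work", "gym", "home"]` and `["home", "work", "home"]` produce an edge from home to work with weight 2. Since the nodes are labels rather than coordinates, two users in different cities can yield structurally comparable graphs.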

4 Towards Social Habits and Behaviors

The social level introduced in Sect. 2 opens new challenges in trajectory mining and leads researchers to work on new systems based on social approaches, with a focus on users’ behaviors, in order to define this new layer of information. This new source of information, resulting from a user-oriented approach, can clearly be exploited to improve the methodologies currently used in research to understand people and their behaviors. To design and implement an effective extraction process, it is very important to obtain the right information from the collected raw location data. A further refinement is also valuable for deeper analysis, in order to infer additional knowledge about users.

Several researchers focus on recognizing patterns in mobile environments to analyze user communities. Karamshuk et al. (2011) present a survey of existing approaches to mobility modeling. Hui and Crowcroft (2008) propose a system for the analysis of human mobility that considers the community structure as a network, in order to emphasize relationships and improve the understanding of behaviors. Laxmi et al. (2012) present a study that analyzes user behavior patterns in relation to existing works of the past few years. In this direction, other authors present work on the analysis of user communities, in order to build human mobility models. Noulas et al. (2011) analyze a large dataset from Foursquare to find spatio-temporal patterns and to observe how users make use of the check-in feature provided by the social platform. Their results are useful in urban computing for studying user mobility and urban spaces. Mohbey and Thakur (2013) propose a system based on mobile access pattern generation, which is able to generate strong patterns among four different parameters, namely mobile user, location, time and mobile service. They focus on the mobile services exploited by users, and their approach proves to be very useful in the mobile service environment for predictions and recommendations. Zheng et al. (2008, 2009, 2010) develop a new social network system, called GeoLife. It is based on user locations and trajectories, and aims to mine correlations between them.

The interest in these issues is strong; therefore, some researchers also work on fundamental problems related to information extraction. A good starting point is to recognize the locations that are important for users: such places, a sort of personal POIs, can tell a lot about their routine, namely their daily behavior and habits. This process aims to identify places that have a particular meaning for users, such as home, work, or any place where they spend a considerable amount of time during the day or that they visit regularly.

4.1 Important Places Recognition

One of the most important issues underlying systems that analyze user behaviors and habits is the recognition of users’ important places. Several studies focus on this topic, proposing new approaches to the recognition process and thus providing novel algorithms for use in more complex systems. Moving from raw coordinates to semantically enhanced data (e.g., shop, work, bar) is an important aspect of the task of discovering important places.

Kang et al. (2004) introduce a time-based clustering algorithm for extracting significant places from a trace of coordinates, and evaluate it using real data from Place Lab (Schilit et al. 2003). Montoliu and Gatica-Perez (2010) and Montoliu et al. (2013) propose a system based on two levels of clustering to obtain POIs: first, a time-based clustering technique that discovers stay points, then a grid-based clustering of the stay points to obtain stay regions. Isaacman et al. (2011) propose new techniques based on clustering and regression to analyze anonymized cellular network usage data in order to identify generally important locations.

Hightower et al. (2005) exploit WiFi and GSM radio fingerprints, collected by mobile devices, to automatically discover the places visited by people, associating semantics with coordinates and detecting when people return to those locations. Their BeaconPrint algorithm, according to the authors, is also effective in discovering places visited infrequently or for a short time. De Sabbata et al. (2008, 2009) provide an adaptation of the well-known PageRank algorithm in order to estimate the importance of locations on the basis of their geographic features, focusing on aspects such as contiguity, and of the users’ movements. In particular, in the computation of the importance score of each location, speed can be used to highlight either places where the user has stopped or places where there is a high traffic density. Thus, the notion of importance of a location can be customized according to the current needs or situation.
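To give an intuition of how a PageRank-style score can rank locations, the sketch below runs the standard weighted PageRank power iteration on a directed graph of places. It is deliberately the textbook algorithm, not the geographic adaptation of De Sabbata et al., whose use of contiguity and speed is not reproduced here. The edge weights are assumed to be transition counts between places, and all names and parameter values are choices made for the example.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (source place, destination place)


def pagerank(edges: Dict[Edge, float],
             damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Standard weighted PageRank by power iteration over a directed place graph."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    n = len(nodes)
    # Total outgoing weight per node, used to normalise transition probabilities.
    out_weight = {node: 0.0 for node in nodes}
    for (src, _), weight in edges.items():
        out_weight[src] += weight
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank
```

An adaptation along the lines described above could, as one possibility, derive the edge weights from contiguity or from the speed observed along each transition instead of raw counts.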

Many of these approaches base their algorithms on the number of positions detected for a user within a geographic area, and in some works also on the time elapsed between one detected position and the next. For instance, Umair et al. (2014) introduce an algorithm for discovering personal POIs (PPOIs), exploiting a notion of “stable and dense logical neighborhood” of a GPS point. The latter is automatically determined using a threshold-based approach working on space, time and density of detections. To improve the recognition process, other factors and parameters are taken into consideration. Li et al. (2008) mine single-user movements in order to identify stay points where users spend time; then, by analyzing space and time thresholds, they compute a similarity function among users based on the important places that represent them. Xiao et al. (2010) add semantics to users’ locations by exploiting external knowledge, in the form of a database of POIs, in order to understand users’ interests and compute a similarity function between two users whose geographic spaces do not overlap. Bhattacharya et al. (2012) extracted significant places by exploiting speed and bearing change during user movement. More recently, Pavan et al. (2015) proposed a novel approach based on a feature space for mapping stay points. They first identify locations where users remain stationary, with state-of-the-art algorithms, then define a new space composed of features more closely related to users, by considering parameters that describe users’ behaviors and habits. The dimensions of the feature space are the area underlying the stay point, its intensity (the time spent in a location) and its frequency (the total number of visits). This approach makes it possible to model aspects that are more semantically related to users and better suited to reasoning about their similarities and differences than, e.g., latitude, longitude, and timestamp.
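The three dimensions of such a feature space (area, intensity, frequency) can be computed from detected visits as in the sketch below. How Pavan et al. (2015) measure the area is not specified here, so the bounding box of the visit points is used purely as an assumption; likewise, the `Visit` and `PlaceFeatures` types, the planar `(x, y)` coordinates in metres and the grouping by a precomputed `place_id` are hypothetical choices for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Visit:
    place_id: str                       # assumed label of the stay point/place
    points: List[Tuple[float, float]]   # (x, y) fixes of the visit, in metres
    arrive: float                       # arrival time in seconds
    leave: float                        # departure time in seconds


@dataclass
class PlaceFeatures:
    area_m2: float      # bounding-box area of all fixes (an assumption of this sketch)
    intensity_s: float  # total time spent at the place
    frequency: int      # total number of visits


def place_features(visits: List[Visit]) -> Dict[str, PlaceFeatures]:
    """Map each place to a point in the (area, intensity, frequency) space."""
    grouped: Dict[str, List[Visit]] = {}
    for visit in visits:
        grouped.setdefault(visit.place_id, []).append(visit)
    features: Dict[str, PlaceFeatures] = {}
    for place_id, group in grouped.items():
        xs = [x for visit in group for x, _ in visit.points]
        ys = [y for visit in group for _, y in visit.points]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys)) if xs else 0.0
        intensity = sum(visit.leave - visit.arrive for visit in group)
        features[place_id] = PlaceFeatures(area, intensity, len(group))
    return features
```

Places described in this way can be compared across users without reference to their coordinates: a small area with high intensity and high frequency is a candidate for home or work, whatever city it is in.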

Hang et al. (2013) adopt a different perspective, presenting Platys, an adaptive and semi-supervised solution for place recognition based on user labeling. It makes minimal assumptions about common parameters, such as the types and frequencies of sensor readings, which are usually tuned manually in other systems. Platys lets users label a place at any time, assuming that important locations are those visited sufficiently often.

The results of these recent works have laid the foundation for the next step toward understanding users’ behaviors: predicting their future interests in terms of the destinations they would like to visit in the near future.

4.2 Destination Prediction

Destination prediction is a useful feature for many mobile applications that provide services to users, for instance to recommend sightseeing places and to deliver targeted advertising based on the destination. A common approach to destination prediction is to derive the probability of a location being the destination from historical trajectories. Other researchers, instead, taking a user-oriented approach more focused on the analysis of habits and behaviors, have explored new techniques, in some cases exploiting external knowledge sources.
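The common approach based on historical trajectories can be illustrated with a simple frequency estimate. In this sketch, each past trip is assumed to be a sequence of place labels whose last element is the destination, and the probability of a destination given the current place is estimated as the fraction of past trips passing through that place that ended there. The function names are hypothetical, and practical systems refine this estimate considerably (e.g., with time of day or the full partial path).

```python
from collections import Counter, defaultdict
from typing import Dict, List


def fit_destination_model(trips: List[List[str]]) -> Dict[str, Counter]:
    """For every place seen along a trip, count the destinations of those trips."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for trip in trips:
        if len(trip) < 2:
            continue  # a trip needs at least an origin and a destination
        destination = trip[-1]
        for place in set(trip[:-1]):  # count each trip once per place visited
            counts[place][destination] += 1
    return counts


def predict_destination(counts: Dict[str, Counter], current_place: str) -> Dict[str, float]:
    """Estimate P(destination | current place) from the historical counts."""
    observed = counts.get(current_place)
    if not observed:
        return {}  # no history for this place
    total = sum(observed.values())
    return {destination: n / total for destination, n in observed.items()}
```

With the trips `["home", "cafe", "work"]`, `["home", "work"]` and `["home", "gym"]`, a user currently at home is assigned probability 2/3 for work and 1/3 for gym.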

Avasthi and Dwivedi (2013) propose a system for user behavior prediction based on clustering. They simultaneously analyze the different mobile behaviors of users and temporal periods in order to compute clusters of users with features in common, and then find similarities. Scellato et al. (2011) propose NextPlace, a novel approach to location prediction based on the time of arrival and the time users spend in relevant places. Zheng and Xie (2011) perform travel recommendation by mining multiple users’ GPS traces, analyzing the most interesting locations and the places that match a user’s travel preferences. Cheng et al. (2013) propose a method for POI recommendation in which personalized Markov chains and region localization are used to take the temporal dimension into account and to improve the quality of recommendations. Liu et al. (2013) propose a novel recommendation model for destinations, exploiting the transition patterns of users’ preferences over location categories in order to improve the accuracy of location recommendation. Lv et al. (2012) combine hierarchical clustering techniques, used to extract visited places from GPS trajectories, with Bayesian networks, identifying temporal patterns and analyzing custom databases of POIs. With this approach they extract semantics for each location, and are able to effectively discover users’ POIs. Gao et al. (2015) propose an approach based on content information related to different aspects of a user’s check-in action. Existing works on POI recommendation in location-based social networks usually discover the spatial, temporal, and social patterns of user check-in behavior, giving no particular importance to content information. Gao et al. instead model this kind of information, such as POI properties, user interests, and opinion expressions, in a framework that considers its relationship to check-in actions, in order to improve existing recommendation systems.

5 Conclusions

The spread of tracking technologies embedded in mobile devices has increased interest in collecting and analyzing the movements of people, vehicles, and even animals or general objects, due to the benefits in several applications. Ever-growing mobility data have introduced several research problems and challenges, such as storage and indexing issues, quality and uncertainty of data, querying methodologies, efficient retrieval and optimized data mining. Traditionally, considerable effort has been devoted to data modeling and management. Researchers have designed different methods and algorithms that, starting from raw data, i.e., sequences of positions associated with a timestamp, obtain different descriptions of movements through trajectories. Nowadays, data mining, and in particular trajectory data mining, is one of the most important research areas contributing innovative methodologies to extract new information from movement data, such as particular patterns, or representative and common trends.

The large amount of human mobility data and the great interest in social studies have opened new challenges. It is possible to gain knowledge about how users or groups of users move and what kinds of places they visit, and to understand their habits and behaviors. In this paper, we underlined how, starting from trajectory modeling, current research is working toward an additional social layer that deals with this kind of knowledge, in order to take a further step toward user modeling. Data about movements and locations have been processed to extract new knowledge about user habits and behaviors, making possible a deeper analysis of user preferences. This approach helps to provide more customized services to users and improves recommendation systems and personal assistants on mobile devices. The interest in these topics involves both academia and industry, due to benefits that have led to improvements in algorithms and to effects on products for final customers.

As summarized in this paper, the social layer, the new layer added to classical trajectory modeling, is clearly important for making improvements in various directions. Recent research works, focused on important place recognition and destination prediction, highlight how significant researchers currently consider the extraction of this new knowledge, also as a way to bridge the gap among different disciplines, e.g., computer science, civil engineering, sociology. Information fusion across heterogeneous data sources and the integration of algorithms from different domains can be explored to achieve further improvements and discover new knowledge.