
1 Introduction

Recent advances in technology allow for the efficient capture and storage of high-resolution, high-frequency person movement data. The advent of Wi-Fi position triangulation has allowed us to capture human movement with a great deal of accuracy inside a closed urban structure, e.g., a university or a shopping mall. While there have been significant advances in our ability to capture this data, advances in robust modeling techniques have been largely absent: in technology-driven applications, inferences are drawn mainly by visual techniques (heat maps, path plotting, etc.). In this paper, we present theoretical insights based on person movement data collected from Deloitte University, Westlake, Texas. We outline two independent approaches: the first models person movement and develops an appropriate prediction mechanism, while the second classifies people based on their movement history.

2 Data Description

We collect the movement data of 1,425 users who have logged into Deloitte University’s Wi-Fi. The exact x and y coordinates are calculated by triangulating the signals from two Wi-Fi routers every 3 s. For a given user, the data takes the form (MAC Address, Latitude, Longitude, Time), where the MAC address uniquely identifies a user’s mobile device. Every 3 s, a new row of this form is added; we have 435,290 such rows in total. For a given user, a “chain” of observations is a time-ordered sequence of 3-dimensional geo-location vectors of the form {(Latitude1, Longitude1, Time1),…,(LatitudeT, LongitudeT, TimeT)}, where T refers to the last observed time point in the dataset.
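Assembling the raw rows into per-user chains is a simple group-and-sort step. The sketch below illustrates it; the function name and tuple layout are ours, not part of the original pipeline:

```python
from collections import defaultdict

def build_chains(rows):
    """Group raw Wi-Fi triangulation rows into per-user chains.

    Each row is a tuple (mac, lat, lon, t); the chain for a given MAC
    address is the time-ordered sequence of (lat, lon, t) triples,
    matching the paper's definition of a chain of observations.
    """
    chains = defaultdict(list)
    for mac, lat, lon, t in rows:
        chains[mac].append((lat, lon, t))
    # Order each user's observations by timestamp.
    for mac in chains:
        chains[mac].sort(key=lambda obs: obs[2])
    return dict(chains)
```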

3 Preliminaries

We define a few mathematical preliminaries to aid our discussion. For a given individual k, we define the set \( p_{k} = \{ (x_{1} ,y_{1} ,t_{1} ), \ldots ,(x_{T} ,y_{T} ,t_{T} )\} \) such that ti < ti+1. For a given real space, define the set of labels L = {“A”, “B”, “C”, “D”,…}, where each label corresponds to a specific room in the retail space. For example, the label set could be written as L = {“Cinema Hall”, “Coffee Shop”, “Park Spot 453”,…}. For conciseness, we choose label names “A”, “B”, and so on rather than more illuminating names.

4 Part 1: Movement Prediction

The movement prediction approach uses two independent methods applied successively, as follows:

  • Part 1. Room assignment using k-means clustering. The objective is to discover the rooms inside Deloitte University. The basic idea is that the rooms inside D.U. will be densely populated with coordinate vectors, and we use the k-means algorithm to discover such dense regions.

  • Part 2. Movement prediction using the Hidden Markov Model. After every point in the coordinate space has been assigned to a room, we translate every user’s movement data into room data; the details are given in the following sections. Once we have translated every user’s movement data from a sequence of coordinates to a sequence of rooms, we can use a Hidden Markov Model trained on such discrete symbol sequences to predict newly observed sequences.

4.1 Room Assignment

One of the major challenges with person movement data is that, due to the continuous nature of the data and the complex layout of most urban structures, it is not easy to assign rooms to a specific collection of (x, y) coordinates. For example, suppose we have two points (xl, yl) and (xh, yh) such that xl ≠ xh and yl ≠ yh; both points may correspond to the same physical room. The simple way to go about this would be to use a set of inequalities on the real plane to assign rooms to points. This approach suffers from two drawbacks: first, it may be prohibitively complex for large spaces with many small rooms; second, and more subtly, it would eliminate any inferences we could draw from the spatial data on how people actually use a room. As an example, consider how people use a supermarket or a grocery store: while from a construction point of view the grocery store is a single large “room”, people may not visit all corners of that large “room”. In order to predict or draw inferences from movement data, we must not only gather the data but also assign it a label that makes sense from a business point of view; in this context, that is the specific section the coordinate refers to, e.g., “Fresh Vegetables” or “Cereal” (Fig. 1).

Fig. 1. Colors represent rooms (here we have 10 centers) and topography represents the density of observations (color figure online).

The approach we propose infers what the rooms are from the observed data. Not only is this computationally more efficient, it also allows retail space owners to understand how people actually use their retail space. We choose a distance metric (in most practical applications, the Euclidean metric should be sufficient) and cluster the data based on this metric, starting with an arbitrary number of cluster centers (usually set to the number of large rooms we see in the data). The choice of the number of cluster centers depends critically on the level of granularity we want in our predictions; we deal with this issue later in the text. The process of room assignment essentially defines a mapping from the real space to the space of labels for the rooms in the data, i.e., f: R × R → L.
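A minimal sketch of this room-assignment step, assuming the Euclidean metric and a toy first-k initialization (the function names and label alphabet are ours; a production version would use k-means++ and multiple restarts):

```python
def kmeans(points, k, iters=50):
    """Minimal k-means over (x, y) coordinate vectors.

    Returns (centers, labels), where labels[i] is the index of the
    room (cluster) assigned to points[i]. Initialization simply uses
    the first k points for reproducibility of this sketch.
    """
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center under the Euclidean metric.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, labels

def room_label(point, centers, names="ABCDEFGHIJ"):
    """The induced mapping f: R x R -> L from coordinates to room labels."""
    c = min(
        range(len(centers)),
        key=lambda j: (point[0] - centers[j][0]) ** 2
        + (point[1] - centers[j][1]) ** 2,
    )
    return names[c]
```

Once `kmeans` has been fitted, `room_label` is exactly the mapping f described above: every new coordinate pair is sent to the label of its nearest cluster center.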

4.2 Movement Modeling Using Hidden Markov Models

Recall that the path is defined as \( p_{k} = \{ (x_{1} ,y_{1} ,t_{1} ), \ldots ,(x_{T} ,y_{T} ,t_{T} )\} \), where x, y are the triangulated coordinates and t is the time stamp. Every point in that path has now been assigned to a room, so the sequence is discrete and given by a sequence of labels, e.g., \( {\text{P}} = \left\{ {{\text{A, A, A, B, B, C, D, A }} \ldots {\text{A, C, C, D, }} \ldots {\text{D}}} \right\} \), where every label represents the room to which the corresponding coordinate was allocated by the k-means clustering algorithm. Note that “P” represents a path as a sequence of room labels and “p” represents the path in the coordinate space. In our example path, it is necessarily true that (x1, y1) ∈ A and (xT, yT) ∈ D. Also note that the sequence denoted by P is time-ordered, i.e., the first observation occurred before the second, the second before the third, and so on. These path sequences form long chains of observations for any given individual, which naturally suggests the use of the discrete version of the Hidden Markov Model. For a given individual, we train a Hidden Markov Model of order 1 (conceptually extendable to higher-order Markov models) with two hidden states on the movement data of the whole dataset. Before we fit a Hidden Markov Model, we must completely specify the following parameters (for a further discussion of Hidden Markov Models and their specification, see Rabiner):

  1. N = 2 (hidden states play an important role in Hidden Markov theory; for a more complete discussion, refer to Rabiner. In our case, the hidden state can be intuitively thought of as the time of day, i.e., the sequence of room labels is very likely to differ depending on the time of day. We assume here that there are two major states, which emit two very different kinds of Markov sequences depending on the time of day).

  2. M = |L| (the number of rooms, i.e., the number of symbols we observe; as is standard for a Markov chain, the observed labels must come from a predefined finite set of labels).

  3. \( {\text{A}} = \left[ {\begin{array}{*{20}c} {a_{11} } & {a_{12} } \\ {a_{21} } & {a_{22} } \\ \end{array} } \right] \), the transition probabilities between hidden states Si and Sj, where \( a_{ij} = P[q_{t + 1} = S_{j} |q_{t} = S_{i} ] \), \( 1 \le i,j \le N \), and qt denotes the hidden state occupied at time t. Note that even though the variable qt assumes one of the values Si at every time t, we cannot observe it; it is purely theoretical (for a more complete discussion, refer to Rabiner).

  4. B = {bj(k)}, the emission probabilities of the observable labels, i.e., the probability of observing room Lk in state Sj.

  5. \( \Pi = \{ \pi_{i} \} \), the initial hidden state probabilities, where \( \pi_{i} = P[q_{1} = S_{i} ] \), \( 1 \le i \le N \).
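Once these five parameters are specified, the probability of an observed room sequence under the model can be computed with the standard forward algorithm. The sketch below shows how A, B, and Π enter that computation (the matrices in the test are illustrative, not our fitted values):

```python
def forward_likelihood(obs, A, B, pi):
    """Forward algorithm for a discrete HMM.

    obs: sequence of observed symbol indices (room labels);
    A[i][j]: transition probability from hidden state i to j;
    B[i][k]: probability of emitting symbol k while in state i;
    pi[i]: initial hidden state probabilities.
    Returns P(obs | model), summing over all hidden state paths.
    """
    n = len(A)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: P(obs | model) = sum_i alpha_T(i)
    return sum(alpha)
```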

5 Part 2: Path Clustering

Customer paths offer a huge amount of high-dimensional data that can be used to draw insights into customer behavior. Simply visualizing these patterns of movement can yield several powerful insights. However, in this section, we seek to formalize some techniques so that they can be employed over large data sets fairly easily. Clustering essentially defines an assignment of each path pi to a cluster ck, such that similar paths pi and pj are assigned to the same cluster. In this context, our algorithm works in several stages:

  Step 1: Path simplification using the Ramer–Douglas–Peucker (RDP) algorithm.

  Step 2: Defining a distance metric (we use the Hausdorff distance) and using this metric to cluster the data.

  Step 3: Choosing an appropriate clustering algorithm to cluster the various data points.

6 Experimental Results

In this section, we discuss our experimental results on the Deloitte University Westlake, Texas campus. We assign 6 symbols to the 6 main areas of the campus. We present the experimental results for the user ID “USCRESTRONMVSC”. The observation sequence for this user is 12,000 observations long; we train the HMM on 9,600 observations and test on the remaining 2,400. We report the following:

N = 2: we chose two hidden states because this gave the best results, and they correspond roughly to the two states we see in our data, “Work Hours” and “After-Work Hours”.

M = {“A”, “B”, “C”, “D”, “E”, “F”}: we label the six major areas of Deloitte University Westlake, Texas with the labels “A” through “F” (shown in Fig. 2). The major areas are as follows:

Fig. 2. Major locations in Deloitte University.

  A. Entrance 2, an alternate entry/exit point.

  B. Grand Ballroom, the largest conference hall on the campus.

  C. DFit, the gymnasium.

  D. Bistro 375.

  E. The Market, a large open cafeteria.

  F. Porte Cochere, the main entrance.

$$ {\text{A}} = \left[ {\begin{array}{*{20}c} {0.9897389} & {0.0102611} \\ {0.13262620} & {0.8673740} \\ \end{array} } \right] $$

In keeping with our theme that the two states in our model represent “Work Hours” (State 1) and “After-Work Hours” (State 2), the state transition matrix matches our intuition: the diagonal transitions have the highest probability and the off-diagonal transitions have fairly low probability.

$$ {\text{B}} = \left[ {\begin{array}{*{20}c} {0.009762765} & {0.32275700} & {0.31065118} & {0.009762765} & {0.009762765} & {0.33730352} \\ {0.086580087} & {0.08658009} & {0.086580087} & {0.086580087} & {0.5670099567} & {0.08658009} \\ \end{array} } \right] $$

The emission probability matrix also matches our intuition: symbol “E” represents The Market (the large cafeteria), where people go in the evening, and its emission probability is highest in State 2, while the emission probabilities of the other locations are fairly high in State 1, i.e., “Work Hours”. This confirms that the model is picking up on aspects of human behavior as we would expect.

$$ \Pi = \left[ {\begin{array}{*{20}c} {0.7} & {0.3} \\ \end{array} } \right] $$

The initial state probabilities are also skewed towards “Work Hours” because the Wi-Fi triangulation devices are switched off shortly after 9 pm. We use the Hidden Markov Model specified by the parameters above to predict the next observation in a sequence using the algorithm described below:

  Step 1: Train the Hidden Markov Model on the observation sequence up to length T.

  Step 2: Simulate observation T + 1 a total of \( 10^{5} \) times; call this set of simulated observations \( \tau = \{ T_{1}^{{\prime }} ,T_{2}^{{\prime }} ,T_{3}^{{\prime }} , \ldots ,T_{m}^{{\prime }} \} \).

  Step 3: Take a vote over the set τ and choose the prediction that occurs most frequently.

  Step 4: Call this prediction T′ + 1 and evaluate it against T + 1 (the actual value), giving it a score of 1 if correct and 0 otherwise.

  Step 5: Repeat from Step 1 for the observation chain up to T + 1.
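The simulate-and-vote portion of this procedure can be sketched as follows, assuming access to the fitted transition matrix A, emission matrix B, and the current distribution over hidden states (the function and parameter names are ours):

```python
import random
from collections import Counter

def predict_next(state_probs, A, B, symbols, n_sims=1000, rng=None):
    """Predict observation T+1 by repeated simulation and majority vote.

    state_probs: distribution over hidden states after the first T
    observations; A: state transition matrix; B: per-state emission
    probabilities; symbols: the label set L. Each simulation draws a
    hidden state, transitions it one step, and emits a room label;
    the most frequent label across n_sims simulations is returned.
    """
    rng = rng or random.Random()
    states = list(range(len(A)))
    votes = Counter()
    for _ in range(n_sims):
        # Draw the current hidden state, transition, then emit.
        s = rng.choices(states, weights=state_probs)[0]
        s_next = rng.choices(states, weights=A[s])[0]
        votes[rng.choices(symbols, weights=B[s_next])[0]] += 1
    return votes.most_common(1)[0][0]
```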

We report an accuracy of 65% for this user. Such a prediction algorithm would allow us to predict a user’s next location based on their current location. In this context, accuracy is not always the best measure of usefulness; it is merely a guide when designing algorithms. Since any targeted advertising based on this algorithm would also act as a nudge to the user, users may themselves boost the usefulness of the algorithm by treating its output as a behavioral nudge rather than rejecting it entirely. We could also boost the accuracy by increasing the order of the Markov chain used to make the prediction.

7 Part 2: Path Clustering (Experimental Results)

Path clustering is an extremely complex process, as one can see from the sheer number of pairwise calculations needed between the point sets of each path. Path simplification produces a reduced path that is still an accurate representation of the raw path. The RDP algorithm works by proposing a straight segment from the given start and end points, then either checking that all the points in between are not too distant from this segment, or including the most distant point as a necessary endpoint, cutting the proposed segment into two, and repeating recursively on the two smaller segments. In our case, we use unconstrained path simplification, but constrained path simplification is the best practice, as it takes obstacles into account, which may contain important person movement information.
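The recursion described above can be sketched directly (this is a standard textbook RDP implementation, not the exact code used in our experiments):

```python
import math

def rdp(path, epsilon):
    """Ramer–Douglas–Peucker path simplification.

    Keeps the endpoints, finds the intermediate point farthest from
    the straight segment between them, and either drops all
    intermediate points (if that farthest point is within epsilon of
    the segment) or keeps it and recurses on the two halves.
    """
    if len(path) < 3:
        return list(path)
    (x1, y1), (x2, y2) = path[0], path[-1]
    seg_len = math.hypot(x2 - x1, y2 - y1)

    def dist(p):
        # Perpendicular distance from p to the proposed segment.
        if seg_len == 0:
            return math.hypot(p[0] - x1, p[1] - y1)
        return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / seg_len

    idx = max(range(1, len(path) - 1), key=lambda i: dist(path[i]))
    if dist(path[idx]) <= epsilon:
        return [path[0], path[-1]]
    # Recurse on both halves, dropping the duplicated split point.
    return rdp(path[:idx + 1], epsilon)[:-1] + rdp(path[idx:], epsilon)
```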

Clustering algorithms necessitate a distance metric that judges the similarity of two objects in the dataset. The objects in our dataset are paths, and the similarity between any two paths is defined by the Hausdorff distance, a basic measure of similarity between sets.
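For two finite point sets, the symmetric Hausdorff distance can be computed by brute force; this is adequate for simplified paths (the function name is ours):

```python
import math

def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two 2-D point sets.

    d_H(P, Q) = max( max_p min_q |p - q|, max_q min_p |q - p| ):
    two paths are close exactly when every point of each lies near
    some point of the other. O(|P| * |Q|) per pair.
    """
    def directed(S, T):
        return max(
            min(math.hypot(s[0] - t[0], s[1] - t[1]) for t in T)
            for s in S
        )
    return max(directed(P, Q), directed(Q, P))
```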

For the choice of clustering algorithm, we use the DBSCAN algorithm, primarily because it does not require the number of clusters to be known beforehand.
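A minimal DBSCAN over a precomputed path-to-path distance matrix (e.g., pairwise Hausdorff distances) can be sketched as follows; in practice a library implementation such as scikit-learn's, with a precomputed metric, would be used instead:

```python
def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN over a precomputed distance matrix.

    dist[i][j] is the distance between objects i and j (for us, the
    Hausdorff distance between simplified paths). Returns a list of
    cluster ids, with -1 marking noise. No cluster count is required
    up front, which is why we favor DBSCAN for path clustering.
    """
    n = len(dist)
    labels = [None] * n
    cluster = -1

    def neighbors(i):
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # noise (may later become a border point)
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster    # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                seeds.extend(j_nbrs)   # j is also core: expand the cluster
    return labels
```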

Possible extensions to the method described above include extending the dimensions of the path vector in order to integrate higher-dimensional variables. Such extensions are simple from a theoretical standpoint but may be computationally expensive in practice. For example, a user path vector could include {x, y, t, W, D, …}, where W denotes the total money spent at a similar (x, y) coordinate on the previous trip to the same supermarket and D denotes the total distance traveled since the Wi-Fi first triangulated the user’s position. Such higher-dimensional path vectors would give us greater insight into user behavior.

8 Experimental Results of Path Clustering Using Hausdorff Distance

Here, we present a few examples of path clustering using our algorithm with the Hausdorff distance. Figure 3 shows three representative paths that motivate the need for path clustering. The green path shows the movement of a user who has entered Deloitte University and used The Market, Bistro 375, and DFit. The pink path shows a user who has used Bistro 375 and The Market. The blue path denotes the most typical Deloitte University user, who uses the Grand Ballroom during business hours, perhaps to attend a conference, then moves to The Market for a lunch break, and perhaps moves outside near Porte Cochere. Our algorithm, as described in the preceding sections, first simplifies the paths and then clusters them based on a similarity metric.

Fig. 3. Path clustering without simplification.

Figure 4 shows a larger number of representative paths from 3 similar user groups after simplification; the algorithm is able to segregate the users into 3 classes.

Fig. 4. Path clustering example showing 3 clusters after simplification.

9 Conclusions and Further Work

This paper presented a novel technique for modeling the movement data of a large population, as well as techniques to draw inferences about individuals from their path traces. We report an accuracy of 65% for a Hidden Markov Model of order 1 and expect that optimized models may yield higher prediction accuracy. Our results on a small number of paths show that our algorithm efficiently segregates paths into various path types. We recommend using HMMs of higher order, as these model human behavior more accurately, since human movement usually has memory ≥ 1.