
1 Introduction

Recent advances in technology allow for the efficient capture and storage of high-resolution, high-frequency person movement data. The advent of Wi-Fi position triangulation has allowed us to capture human movement with a great deal of accuracy inside a closed urban structure, e.g., a university or a shopping mall. While there have been significant advances in our ability to capture this data, advances in robust modeling techniques have been largely absent: in technology-driven applications, inferences are drawn mainly by visual techniques (heat maps, path plotting, etc.). In this paper, we present theoretical insights based on person movement data collected from Deloitte University, Westlake, Texas. We outline two independent approaches: the first models person movement and develops an appropriate prediction mechanism, while the second classifies people based on their movement history.

2 Data Description

We collect the movement data of 1,425 users who have logged into Deloitte University’s Wi-Fi. The exact x and y coordinates are calculated by triangulating the signals from two Wi-Fi routers every 3 s. For a given user, the data takes the form (MAC Address, Latitude, Longitude, Time), where the MAC address uniquely identifies a user’s mobile device. Every 3 s, a new row of this form is added; we have 435,290 such rows in total. For a given user, a “chain” of observations is a time-ordered sequence of 3-dimensional geo-location vectors of the form {(Latitude1, Longitude1, Time1),…,(LatitudeT, LongitudeT, TimeT)}, where T refers to the last observed time point in the dataset.
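Assembling the raw rows into per-user chains is a simple group-and-sort step. The sketch below illustrates it; the function name and tuple layout are ours, not part of the original pipeline:

```python
from collections import defaultdict

def build_chains(rows):
    """Group raw Wi-Fi triangulation rows into per-user chains.

    Each row is a tuple (mac, lat, lon, t); the chain for a given MAC
    address is the time-ordered sequence of (lat, lon, t) triples,
    matching the paper's definition of a chain of observations.
    """
    chains = defaultdict(list)
    for mac, lat, lon, t in rows:
        chains[mac].append((lat, lon, t))
    # Order each user's observations by timestamp.
    for mac in chains:
        chains[mac].sort(key=lambda obs: obs[2])
    return dict(chains)
```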

3 Preliminaries

We define a few mathematical preliminaries to aid our discussion. For a given individual k, we define the set \( p_{k} = \{ (x_{1} ,y_{1} ,t_{1} ), \ldots ,(x_{T} ,y_{T} ,t_{T} )\} \) such that ti < ti+1. For a given real space, define the set of labels L = {“A”, “B”, “C”, “D”,…}, where each label corresponds to a specific room in the retail space. For example, the label set could be written as L = {“Cinema Hall”, “Coffee Shop”, “Park Spot 453”,…}. For conciseness, we choose label names “A”, “B”, and so on rather than more illuminating names.

4 Part 1: Movement Prediction

The movement prediction approach uses two independent methods applied successively, as follows:

  • Part 1. Room assignment using k-means clustering. The objective is to discover the rooms inside Deloitte University. The basic idea is that the rooms inside D.U. will be densely populated with coordinate vectors, and we use the k-means algorithm to discover such dense regions.

  • Part 2. Movement prediction using the Hidden Markov Model. After every point in the coordinate space has been assigned to a room, we translate every user’s movement data into room data; the details are given in the following sections. Once we have translated every user’s movement data from a sequence of coordinates to a sequence of rooms, we can use a Hidden Markov Model trained on such discrete symbol sequences to predict newly observed sequences.

4.1 Room Assignment

One of the major challenges with person movement data is that, due to the continuous nature of the data and the complex layout of most urban structures, it is not easy to assign rooms to a specific collection of (x, y) coordinates. For example, suppose we have two points (xl, yl) and (xh, yh) such that xl ≠ xh and yl ≠ yh; both points may correspond to the same physical room. The simple way to go about this would be to use a set of inequalities on the real plane to assign rooms to points. This approach suffers from two drawbacks: first, it may be prohibitively complex for large spaces with many small rooms; second, and more subtly, it would eliminate any inferences we could draw from the spatial data on how people actually use a room. As an example, consider how people use a supermarket or a grocery store: while from a construction point of view the grocery store is a single large “room”, people may not visit all corners of that large “room”. In order to predict or draw inferences from movement data, we must not only gather the data but also assign it a label that makes sense from a business point of view; in this context, that is the specific section the coordinate refers to, e.g., “Fresh Vegetables” or “Cereal” (Fig. 1).

Fig. 1. Colors represent rooms (here we have 10 centers) and topography represents the density of observations (color figure online).

The approach we propose infers what the rooms are from the observed data. Not only is this computationally more efficient, it also allows retail space owners to understand how people actually use their retail space. We choose a distance metric (in most practical applications, the Euclidean metric should be sufficient) and cluster the data based on this metric, starting with an arbitrary number of cluster centers (usually set to the number of large rooms we see in the data). The choice of the number of cluster centers depends critically on the level of granularity we want in our predictions; we deal with this issue later in the text. The process of room assignment essentially defines a mapping from the real space to the space of labels for the rooms in the data, i.e., f: R × R → L.
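A minimal sketch of this room-assignment step, assuming the Euclidean metric and a toy first-k initialization (the function names and label alphabet are ours; a production version would use k-means++ and multiple restarts):

```python
def kmeans(points, k, iters=50):
    """Minimal k-means over (x, y) coordinate vectors.

    Returns (centers, labels), where labels[i] is the index of the
    room (cluster) assigned to points[i]. Initialization simply uses
    the first k points for reproducibility of this sketch.
    """
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center under the Euclidean metric.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, labels

def room_label(point, centers, names="ABCDEFGHIJ"):
    """The induced mapping f: R x R -> L from coordinates to room labels."""
    c = min(
        range(len(centers)),
        key=lambda j: (point[0] - centers[j][0]) ** 2
        + (point[1] - centers[j][1]) ** 2,
    )
    return names[c]
```

Once `kmeans` has been fitted, `room_label` is exactly the mapping f described above: every new coordinate pair is sent to the label of its nearest cluster center.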

4.2 Movement Modeling Using Hidden Markov Models

Recall that the path is defined as \( p_{k} = \{ (x_{1} ,y_{1} ,t_{1} ), \ldots ,(x_{T} ,y_{T} ,t_{T} )\} \), where x, y are the triangulated coordinates and t is the time stamp. Every point in that path has now been assigned to a room, so the sequence is discrete and given by a sequence of labels, e.g., \( {\text{P}} = \left\{ {{\text{A, A, A, B, B, C, D, A }} \ldots {\text{A, C, C, D, }} \ldots {\text{D}}} \right\} \), where every label represents the room to which the corresponding coordinate was allocated by the k-means clustering algorithm. Note that “P” represents a path as a sequence of room labels and “p” represents the path in the coordinate space. In our example path, it is necessarily true that (x1, y1) ∈ A and (xT, yT) ∈ D. Also note that the sequence denoted by P is time-ordered, i.e., the first observation occurred before the second, the second before the third, and so on. These path sequences form long chains of observations for any given individual, which naturally suggests the use of the discrete version of the Hidden Markov Model. For a given individual, we train a Hidden Markov Model of order 1 (conceptually extendable to higher-order Markov models) with two hidden states on the movement data of the whole dataset. Before we fit a Hidden Markov Model, we must completely specify the following parameters (for a further discussion of Hidden Markov Models and their specification, see Rabiner):

  1. N = 2 (hidden states play an important role in Hidden Markov theory; for a more complete discussion, refer to Rabiner. In our case, the hidden state can be intuitively thought of as the time of day, i.e., the sequence of room labels is very likely to differ depending on the time of day. We assume here that there are two major states, which emit two very different kinds of Markov sequences depending on the time of day).

  2. M = |L| (the number of rooms, i.e., the number of symbols we observe; as is standard for a Markov chain, the observed labels must come from a predefined finite set of labels).

  3. \( {\text{A}} = \left[ {\begin{array}{*{20}c} {a_{11} } & {a_{12} } \\ {a_{21} } & {a_{22} } \\ \end{array} } \right] \), the transition probabilities between hidden states Si and Sj, where \( a_{ij} = P[q_{t + 1} = S_{j} |q_{t} = S_{i} ] \), \( 1 \le i,j \le N \), and qt denotes the hidden state occupied at time t. Note that even though the variable qt assumes one of the values Si at every time t, we cannot observe it; it is purely theoretical (for a more complete discussion, refer to Rabiner).

  4. B = {bj(k)}, the emission probabilities of the observable labels, i.e., the probability of observing room Lk in state Sj.

  5. \( \Pi = \{ \pi_{i} \} \), the initial hidden state probabilities, where \( \pi_{i} = P[q_{1} = S_{i} ] \), \( 1 \le i \le N \).
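Once these five parameters are specified, the probability of an observed room sequence under the model can be computed with the standard forward algorithm. The sketch below shows how A, B, and Π enter that computation (the matrices in the test are illustrative, not our fitted values):

```python
def forward_likelihood(obs, A, B, pi):
    """Forward algorithm for a discrete HMM.

    obs: sequence of observed symbol indices (room labels);
    A[i][j]: transition probability from hidden state i to j;
    B[i][k]: probability of emitting symbol k while in state i;
    pi[i]: initial hidden state probabilities.
    Returns P(obs | model), summing over all hidden state paths.
    """
    n = len(A)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: P(obs | model) = sum_i alpha_T(i)
    return sum(alpha)
```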

5 Part 2: Path Clustering

Customer paths offer a huge amount of high-dimensional data that can be used to draw insights into customer behavior. Simply visualizing these patterns of movement can yield several powerful insights. However, in this section, we seek to formalize some techniques so that they can be employed over large data sets fairly easily. Clustering essentially defines an assignment of each path pi to a cluster ck, such that similar paths pi and pj are assigned to the same cluster. In this context, our algorithm works in several stages:

  Step 1: Path simplification using the Ramer–Douglas–Peucker (RDP) algorithm.

  Step 2: Defining a distance metric (we use the Hausdorff distance) and using this metric to cluster the data.

  Step 3: Choosing an appropriate clustering algorithm to cluster the various data points.

6 Experimental Results

In this section, we discuss our experimental results on the Deloitte University Westlake, Texas campus. We assign 6 symbols to the 6 main areas of the campus. We present the experimental results for the user ID “USCRESTRONMVSC”. The observation sequence for this user is 12,000 observations long; we train the HMM on 9,600 observations and test on the remaining 2,400. We report the following:

N = 2: we chose two hidden states because this gave the best results, and they correspond roughly to the two states we see in our data, “Work Hours” and “After-Work Hours”.

M = {“A”, “B”, “C”, “D”, “E”, “F”}: we label the six major areas of Deloitte University Westlake, Texas with the labels “A” through “F” (shown in Fig. 2). The major areas are as follows:

Fig. 2. Major locations in Deloitte University.

  A. Entrance 2, an alternate entry/exit point.

  B. Grand Ballroom, the largest conference hall on the campus.

  C. DFit, the gymnasium.

  D. Bistro 375.

  E. The Market, a large open cafeteria.

  F. Porte Cochere, the main entrance.

$$ {\text{A}} = \left[ {\begin{array}{*{20}c} {0.9897389} & {0.0102611} \\ {0.13262620} & {0.8673740} \\ \end{array} } \right] $$

In keeping with our theme that the two states in our model represent “Work Hours” (State 1) and “After-Work Hours” (State 2), the state transition matrix matches our intuition: the diagonal transitions have the highest probability and the off-diagonal transitions have fairly low probability.

$$ {\text{B}} = \left[ {\begin{array}{*{20}c} {0.009762765} & {0.32275700} & {0.31065118} & {0.009762765} & {0.009762765} & {0.33730352} \\ {0.086580087} & {0.08658009} & {0.086580087} & {0.086580087} & {0.5670099567} & {0.08658009} \\ \end{array} } \right] $$

The emission probability matrix also matches our intuition: symbol “E” represents The Market (the large cafeteria), where people go in the evening, and its emission probability is highest in State 2, while the emission probabilities of the other locations are fairly high in State 1, i.e., “Work Hours”. This confirms that the model is picking up on aspects of human behavior as we would expect.

$$ \Pi = \left[ {\begin{array}{*{20}c} {0.7} & {0.3} \\ \end{array} } \right] $$

The initial state probabilities are also skewed towards “Work Hours” because the Wi-Fi triangulation devices are switched off shortly after 9 pm. We use the Hidden Markov Model specified by the parameters above to predict the next observation in a sequence using the algorithm described below:

  Step 1: Train the Hidden Markov Model on the observation sequence up to length T.

  Step 2: Simulate observation T + 1 a total of \( 10^{5} \) times; call this set of simulated observations \( \tau = \{ T_{1}^{{\prime }} ,T_{2}^{{\prime }} ,T_{3}^{{\prime }} , \ldots ,T_{m}^{{\prime }} \} \).

  Step 3: Take a vote over the set τ and choose the prediction that occurs most frequently.

  Step 4: Call this prediction T′ + 1 and evaluate it against T + 1 (the actual value), giving it a score of 1 if correct and 0 otherwise.

  Step 5: Repeat from Step 1 for the observation chain up to T + 1.
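The simulate-and-vote portion of this procedure can be sketched as follows, assuming access to the fitted transition matrix A, emission matrix B, and the current distribution over hidden states (the function and parameter names are ours):

```python
import random
from collections import Counter

def predict_next(state_probs, A, B, symbols, n_sims=1000, rng=None):
    """Predict observation T+1 by repeated simulation and majority vote.

    state_probs: distribution over hidden states after the first T
    observations; A: state transition matrix; B: per-state emission
    probabilities; symbols: the label set L. Each simulation draws a
    hidden state, transitions it one step, and emits a room label;
    the most frequent label across n_sims simulations is returned.
    """
    rng = rng or random.Random()
    states = list(range(len(A)))
    votes = Counter()
    for _ in range(n_sims):
        # Draw the current hidden state, transition, then emit.
        s = rng.choices(states, weights=state_probs)[0]
        s_next = rng.choices(states, weights=A[s])[0]
        votes[rng.choices(symbols, weights=B[s_next])[0]] += 1
    return votes.most_common(1)[0][0]
```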

We report an accuracy of 65% for this user. Such a prediction algorithm would allow us to predict a user’s next location based on their current location. In this context, accuracy is not always the best measure of usefulness; it is merely a guide when designing algorithms. Since any targeted advertising based on this algorithm would also act as a nudge to the user, users may themselves boost the usefulness of the algorithm by treating its output as a behavioral nudge rather than rejecting it entirely. We could also boost the accuracy by increasing the order of the Markov chain used to make the prediction.

7 Part 2: Path Clustering (Experimental Results)

Path clustering is an extremely complex process, as one can see from the sheer number of pairwise calculations needed between the point sets of each path. Path simplification produces a reduced path that is still an accurate representation of the raw path. The RDP algorithm works by proposing a straight segment from the given start and end points, then either checking that all the points in between are not too distant from this segment, or including the most distant point as a necessary endpoint, cutting the proposed segment into two, and repeating recursively on the two smaller segments. In our case, we use unconstrained path simplification, but constrained path simplification is the best practice, as it takes obstacles into account, which may contain important person movement information.
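The recursion described above can be sketched directly (this is a standard textbook RDP implementation, not the exact code used in our experiments):

```python
import math

def rdp(path, epsilon):
    """Ramer–Douglas–Peucker path simplification.

    Keeps the endpoints, finds the intermediate point farthest from
    the straight segment between them, and either drops all
    intermediate points (if that farthest point is within epsilon of
    the segment) or keeps it and recurses on the two halves.
    """
    if len(path) < 3:
        return list(path)
    (x1, y1), (x2, y2) = path[0], path[-1]
    seg_len = math.hypot(x2 - x1, y2 - y1)

    def dist(p):
        # Perpendicular distance from p to the proposed segment.
        if seg_len == 0:
            return math.hypot(p[0] - x1, p[1] - y1)
        return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / seg_len

    idx = max(range(1, len(path) - 1), key=lambda i: dist(path[i]))
    if dist(path[idx]) <= epsilon:
        return [path[0], path[-1]]
    # Recurse on both halves, dropping the duplicated split point.
    return rdp(path[:idx + 1], epsilon)[:-1] + rdp(path[idx:], epsilon)
```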

Clustering algorithms necessitate a distance metric that judges the similarity of two objects in the dataset. The objects in our dataset are paths, and the similarity between any two paths is defined by the Hausdorff distance, a basic measure of similarity between sets.
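For two finite point sets, the symmetric Hausdorff distance can be computed by brute force; this is adequate for simplified paths (the function name is ours):

```python
import math

def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two 2-D point sets.

    d_H(P, Q) = max( max_p min_q |p - q|, max_q min_p |q - p| ):
    two paths are close exactly when every point of each lies near
    some point of the other. O(|P| * |Q|) per pair.
    """
    def directed(S, T):
        return max(
            min(math.hypot(s[0] - t[0], s[1] - t[1]) for t in T)
            for s in S
        )
    return max(directed(P, Q), directed(Q, P))
```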

For the choice of clustering algorithm, we use the DBSCAN algorithm, primarily because it does not require the number of clusters to be known beforehand.
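A minimal DBSCAN over a precomputed path-to-path distance matrix (e.g., pairwise Hausdorff distances) can be sketched as follows; in practice a library implementation such as scikit-learn's, with a precomputed metric, would be used instead:

```python
def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN over a precomputed distance matrix.

    dist[i][j] is the distance between objects i and j (for us, the
    Hausdorff distance between simplified paths). Returns a list of
    cluster ids, with -1 marking noise. No cluster count is required
    up front, which is why we favor DBSCAN for path clustering.
    """
    n = len(dist)
    labels = [None] * n
    cluster = -1

    def neighbors(i):
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # noise (may later become a border point)
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster    # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                seeds.extend(j_nbrs)   # j is also core: expand the cluster
    return labels
```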

Possible extensions to the method described above include extending the dimensions of the path vector in order to integrate higher-dimensional variables. Such extensions are simple from a theoretical standpoint but may be computationally expensive in practice. For example, a user path vector could include {x, y, t, W, D, …}, where W denotes the total money spent at a similar (x, y) coordinate on the previous trip to the same supermarket and D denotes the total distance traveled since the Wi-Fi first triangulated the user’s position. Such higher-dimensional path vectors would give us greater insight into user behavior.

8 Experimental Results of Path Clustering Using Hausdorff Distance

Here, we present a few examples of path clustering using our algorithm with the Hausdorff distance. Figure 3 shows three representative paths that motivate the need for path clustering. The green path shows the movement of a user who has entered Deloitte University and used The Market, Bistro 375, and DFit. The pink path shows a user who has used Bistro 375 and The Market. The blue path denotes the most typical Deloitte University user, who uses the Grand Ballroom during business hours, perhaps to attend a conference, then moves to The Market for a lunch break, and perhaps moves outside near Porte Cochere. Our algorithm, as described in the preceding sections, first simplifies the paths and then clusters them based on a similarity metric.

Fig. 3. Path clustering without simplification.

Figure 4 shows a larger number of representative paths from 3 similar user groups after simplification; the algorithm is able to segregate the users into 3 classes.

Fig. 4. Path clustering example showing 3 clusters after simplification.

9 Conclusions and Further Work

This paper presented a novel technique for modeling the movement data of a large population, as well as techniques to draw inferences about individuals from their path traces. We report an accuracy of 65% for a Hidden Markov Model of order 1 and expect that optimized models may yield higher prediction accuracy. Our results on a small number of paths show that our algorithm efficiently segregates paths into various path types. We recommend using HMMs of higher order, as these model human behavior more accurately, since human movement usually has memory ≥ 1.