Study on the Personal Information Anonymization Method for the Releasing of Navigation Data

Gao, Jiannan; Ying, Rendong; Liu, Peilin; Yu, Wenxian

doi:10.1007/978-3-642-54737-9_14


Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 303))


Abstract

This paper gives a quantitative, information-theoretic measure of personal information anonymization in released navigation data. An individual personal information privacy index is defined and calculated using a Markov chain model. Simulation of a specific state model shows that the proposed algorithm can evaluate how ambiguous different individuals are in the navigation data. Although the simulation is based on a Markov chain model, the proposed personal information privacy metric does not depend on the motion model; it can be applied to other, more accurate individual motion models and provides a quantitative basis for anonymizing personal information when navigation data are released.


1 Introduction

With the development of mobile communication technology, satellite navigation technology and data mining theory, mobile services based on location information have developed rapidly in recent years. Personal location information is used in many areas and gives customers convenient and efficient information services, including traffic analysis and optimization, mall pedestrian flow analysis, and advertising.

However, the widespread use of location-based services (LBS) also brings privacy and security issues. Once personal location information is stolen, behavioral patterns, hobbies, living habits and other information are likely to leak, which may in turn threaten personal security. LBS privacy protection has therefore become a focus of current theoretical research on LBS. Most existing algorithms based on k-anonymity meet the location privacy needs of users who issue single queries. For users who issue continuous queries these techniques do not work well, because analysis of the trajectory can still reveal private information. When additional ancillary information is available, an individual can readily be identified, and personal privacy is leaked.

Several trajectory information anonymization technologies already exist, including personalized k-anonymity [1], the silent period technique [2], the PRIVE method [3] and the MOBIHIDE method [4]. Lin et al. propose an anonymity measurement based on entropy theory [5].

In this paper, a new anonymity measurement is proposed, based on how the entropy of trace data changes with time. In addition, using a Markov motion model, the paper studies the personal information anonymization problem that arises when navigation trajectory information is disseminated.

2 Problem Description

The scenario of this study is as follows. Given the known initial location of a target A and a movement trajectory, how can the trajectory be prevented from being associated with target A?

An example application is a large shopping mall whose manager needs to optimize the layout of the shops by analyzing the trajectories of customers in the store. The customers, however, do not want the shopping mall to associate their personal information with their trajectories. In other words, the shopping mall may obtain the trajectory data but must not be able to identify the customers.

From a practical point of view, at some specific locations of a shopping mall, such as elevators with video cameras or the cashier section, the identity and location of a customer can be known simultaneously. To avoid this kind of information binding, the navigation information at these locations should be masked. Suppose that at time 0 the location of an individual target is C ₀. To avoid associating the published trajectory data with the target, one needs to eliminate the trajectory data between time 0 and time T. The question is: what is the minimal value of T? The following section discusses the mathematical model that defines the ambiguity index as a function of time T.

3 Mathematic Model

We define the ambiguity between trajectory data and individual targets from the perspective of probability. Assume that at time T there are M people (M > 1) who may be at position u. The probability distribution over the identity of the person appearing at this moment and this position can be used to infer that identity. This probability is a function of T, u and the individual identity m:

$$ p(m;T,{\mathbf{u}}) $$

(3.1)

where m = 1, 2,…, M represents the identity of each person and u is a location at which an individual can appear. The variation, or flatness, of the probability $ p(m;T,{\mathbf{u}}) $ across the individuals m can therefore be used to evaluate the ambiguity of each individual. In information theory, this ambiguity is measured by the entropy, namely:

$$ h(T;{\mathbf{u}}) = - \sum\nolimits_{m} {p(m;T,{\mathbf{u}})\log p(m;T,{\mathbf{u}})} $$

(3.2)

The entropy h(T; u) reaches its maximum when p(m; T, u) is the same for all m, i.e. when every identity looks the same from the point of view of probability. Since the entropy h(T; u) is a function of time T, it represents the level of personal information anonymization along the navigation trajectory.
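The behaviour of the entropy in formula (3.2) can be illustrated with a minimal sketch. The function name `entropy` and the two example distributions are illustrative assumptions, not values from the paper; the natural logarithm is used, and terms with zero probability are skipped.

```python
import math

def entropy(probabilities):
    """Shannon entropy (natural log) as in eq. (3.2); 0 * log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

# Hypothetical identity distributions for M = 4 people at one location.
uniform = [0.25, 0.25, 0.25, 0.25]   # every identity equally likely
peaked = [0.97, 0.01, 0.01, 0.01]    # one identity almost certain

print("uniform:", entropy(uniform), "log(4):", math.log(4))
print("peaked: ", entropy(peaked))
```

The uniform case attains log 4, while the peaked case gives a much smaller value, which matches the reading of entropy as ambiguity.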

In practice, customers will not linger in one area forever. As T increases they will eventually leave the area, which means that for large enough T,

$$ \sum\nolimits_{m} {p(m;T,{\mathbf{u}}) < 1} $$

(3.3)

To keep the definition of ambiguity in formula (3.2) meaningful in this case, the entropy must be “normalized”, which gives the following definition of the “information privacy metric”:

$$ g(T;{\mathbf{u}}) = - \frac{1}{a}\sum\nolimits_{m} {p(m;T,{\mathbf{u}})\log } \frac{{p(m;T,{\mathbf{u}})}}{a} $$

(3.4)

where $ a = \sum\nolimits_{m} {p(m;T,{\mathbf{u}})} $ is the normalization factor. In a scenario with M individuals, the maximum of $ g(T;{\mathbf{u}}) $ is log M, reached when all M probabilities are equal.
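As a hedged illustration of formula (3.4), the sketch below computes the metric for a list of unnormalized probabilities. The name `privacy_metric` and the choice to raise an error when a = 0 are assumptions made for this example; the paper does not specify how that case is handled.

```python
import math

def privacy_metric(p):
    """Normalized entropy g of eq. (3.4) for unnormalized probabilities p(m; T, u)."""
    a = sum(p)  # normalization factor a
    if a <= 0:
        # Assumption: the metric is undefined when nobody can be at the location.
        raise ValueError("normalization factor a is zero")
    return -sum(pm * math.log(pm / a) for pm in p if pm > 0) / a

# Four individuals, each with probability 0.1 of being at the location (a = 0.4).
print(privacy_metric([0.1, 0.1, 0.1, 0.1]), math.log(4))

# Scaling all probabilities by a common factor leaves g unchanged.
print(privacy_metric([0.1, 0.2, 0.3]), privacy_metric([0.2, 0.4, 0.6]))
```

Because g equals the entropy of the normalized values p/a, it depends only on the relative likelihoods of the identities and not on how much probability mass has already left the area.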

To find how $ g(T;{\mathbf{u}}) $ depends on T and u, we need to calculate the probability $ p(m;T,{\mathbf{u}}) $. To define the motion model, we first discretize time and denote each time step by an integer (n = 0, 1, 2,…). A first-order Markov chain is then selected as a simplified motion model for individuals. An example of the model is shown in Fig. 14.1.

In the state transition model of Fig. 14.1, each state represents a location, so five locations (u _k, k = 1, 2, 3, 4, 5) are considered. Among these, u ₅ is special: it is an “absorbing state” that represents customers who leave the mall and do not come back. In this model the position u(n) of an individual at any moment depends only on the position at the previous moment, u(n − 1), through the transition probability matrix of the Markov chain. The number on each arrow of the figure is the probability of moving from one location to another, i.e. the state transition probability. Consider the case of a single person in the store, and denote by p(n) the row vector whose entries are the probabilities that this person is at each position of the map at time n. This gives the state transition equation:

$$ {\mathbf{p}}(n) = {\mathbf{p}}(n - 1){\mathbf{H}} $$

(3.5)

where the matrix H is the state transition probability matrix. Corresponding to the example shown in Fig. 14.1, the value of H is given by:

$$ {\mathbf{H}} = \left[ {\begin{array}{*{20}c} {0.2} & {0.4} & {0.4} & 0 & 0 \\ {0.3} & {0.2} & 0 & {0.4} & {0.1} \\ {0.4} & 0 & {0.6} & 0 & 0 \\ 0 & {0.6} & 0 & {0.4} & 0 \\ 0 & 0 & 0 & 0 & 1 \\ \end{array} } \right] $$

(3.6)
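A quick sanity check on the matrix in (3.6) is that every row sums to one and that only the fifth state is absorbing. The helper names below are illustrative assumptions; indices are zero-based, so index 4 corresponds to u ₅.

```python
# Transition matrix H from eq. (3.6); row i holds the probabilities of leaving state i.
H = [
    [0.2, 0.4, 0.4, 0.0, 0.0],
    [0.3, 0.2, 0.0, 0.4, 0.1],
    [0.4, 0.0, 0.6, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

def is_row_stochastic(matrix, tol=1e-9):
    """True if all entries are non-negative and every row sums to 1 within tol."""
    return all(
        all(x >= 0 for x in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )

def absorbing_states(matrix, tol=1e-9):
    """Indices k whose self-transition probability is 1, so the state is never left."""
    return [k for k, row in enumerate(matrix) if abs(row[k] - 1.0) <= tol]

print(is_row_stochastic(H))
print(absorbing_states(H))
```

In this matrix the only direct route into the absorbing state is from u ₂, with probability 0.1 per step.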

From the state transition formula and the probability distribution of the initial position p(0), the probability distribution p(n) of an individual’s position at any time n follows from the above model, namely:

$$ {\mathbf{p}}(n) = {\mathbf{p}}(0){\mathbf{H}}^{n} $$

(3.7)
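Formula (3.7) can be evaluated by applying the one-step update (3.5) n times. The following is a minimal sketch in plain Python; the function names `step` and `distribution_at` and the choice of u ₂ as starting position are assumptions made for illustration.

```python
# Transition matrix H from eq. (3.6).
H = [
    [0.2, 0.4, 0.4, 0.0, 0.0],
    [0.3, 0.2, 0.0, 0.4, 0.1],
    [0.4, 0.0, 0.6, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

def step(p, H):
    """One application of eq. (3.5): row vector p multiplied by matrix H."""
    size = len(H)
    return [sum(p[i] * H[i][j] for i in range(size)) for j in range(size)]

def distribution_at(p0, H, n):
    """Eq. (3.7): p(n) = p(0) H^n, computed by n successive one-step updates."""
    p = list(p0)
    for _ in range(n):
        p = step(p, H)
    return p

p0 = [0.0, 1.0, 0.0, 0.0, 0.0]  # individual known to be at u2 at time 0
for n in range(4):
    print(n, [round(x, 4) for x in distribution_at(p0, H, n)])
```

After two steps the distribution is approximately [0.12, 0.40, 0.12, 0.24, 0.12]. The last entry, the probability of having left the mall, never decreases with n.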

Denote by

$$ {\mathbf{p}}(0) = [\,0 \;\; \cdots \;\; 0 \;\; \underbrace{1}_{i{\text{th}}} \;\; 0 \;\; \cdots \;\; 0\,] $$

(3.8)

the initial probability vector of individual m, who is at position u _i at the initial moment. The jth element of p(n) is then the probability that individual m appears at position u _j at time n, namely p(m; n, u _j) in formula (3.1). Substituting these probabilities into formula (3.4) gives the personal information anonymization degree function g(n; u _j), and whether to release the trajectory information at time n can be decided by how close this function is to its maximum possible value.
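One possible way to turn this decision rule into code is sketched below. The release criterion used here, g(n) reaching a chosen fraction of log M, is an assumption of this example, since the paper does not fix a threshold. The names `metric_over_time` and `first_release_time`, the value `fraction=0.5`, and returning 0 when a = 0 are likewise hypothetical choices.

```python
import math

# Transition matrix H from eq. (3.6); index 4 is the absorbing "left the mall" state.
H = [
    [0.2, 0.4, 0.4, 0.0, 0.0],
    [0.3, 0.2, 0.0, 0.4, 0.1],
    [0.4, 0.0, 0.6, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

def step(p, H):
    """One application of eq. (3.5)."""
    return [sum(p[i] * H[i][j] for i in range(len(H))) for j in range(len(H))]

def privacy_metric(p):
    """Eq. (3.4); returns 0.0 when a = 0 (an assumption for the undefined case)."""
    a = sum(p)
    if a <= 0:
        return 0.0
    return -sum(x * math.log(x / a) for x in p if x > 0) / a

def metric_over_time(H, starts, location, n_max):
    """g(n; u_location) for n = 0..n_max, one individual per entry of `starts`."""
    dists = [[1.0 if k == s else 0.0 for k in range(len(H))] for s in starts]
    values = []
    for _ in range(n_max + 1):
        values.append(privacy_metric([d[location] for d in dists]))
        dists = [step(d, H) for d in dists]
    return values

def first_release_time(values, n_individuals, fraction):
    """Smallest n with g(n) >= fraction * log(M), or None if it is never reached."""
    target = fraction * math.log(n_individuals)
    for n, g in enumerate(values):
        if g >= target:
            return n
    return None

# Four individuals starting at u1..u4 (indices 0..3), observed at location u1.
g = metric_over_time(H, starts=[0, 1, 2, 3], location=0, n_max=20)
print([round(v, 4) for v in g[:5]])
print(first_release_time(g, n_individuals=4, fraction=0.5))
```

A stricter rule could require g to stay above the target for all later n. The function may also return `None`, because g is not guaranteed to come arbitrarily close to log M for every transition matrix.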

4 Model Simulation

The simulation is based on the Markov motion model and the state transition matrix introduced above. Suppose that there are four people at the initial moment, one in each of the four positions u ₁–u ₄. The evolution of the information privacy metric over time is shown in Fig. 14.2.

The “information privacy metric” function in the figure increases with time n, reflecting the growing difficulty of identifying the corresponding individual from the location information.

Three further cases are shown in Figs. 14.3, 14.4 and 14.5, respectively: (1) at the initial time two persons are in u ₁ and the other two are in u ₂; (2) at the initial time one person is in u ₁ and the other three are in u ₂; (3) at the initial time all four persons are in u ₂. The diagrams show that the “privacy metric” function g increases with the concentration at the initial time: the more concentrated the people are at the initial moment, the more difficult it is to distinguish individuals by their trajectories.
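The effect of the initial concentration can be reproduced qualitatively with a short sketch. It does not regenerate Figs. 14.2 to 14.5: the observed location (u ₁), the time steps printed, and all function names are assumptions chosen for illustration, since the paper does not state which location the figures refer to.

```python
import math

# Transition matrix H from eq. (3.6).
H = [
    [0.2, 0.4, 0.4, 0.0, 0.0],
    [0.3, 0.2, 0.0, 0.4, 0.1],
    [0.4, 0.0, 0.6, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

def step(p, H):
    """One application of eq. (3.5)."""
    return [sum(p[i] * H[i][j] for i in range(len(H))) for j in range(len(H))]

def privacy_metric(p):
    """Eq. (3.4); returns 0.0 when a = 0 (an assumption for the undefined case)."""
    a = sum(p)
    if a <= 0:
        return 0.0
    return -sum(x * math.log(x / a) for x in p if x > 0) / a

def metric_at(H, starts, location, n):
    """g(n; u_location) when individual m starts at state starts[m]."""
    dists = [[1.0 if k == s else 0.0 for k in range(len(H))] for s in starts]
    for _ in range(n):
        dists = [step(d, H) for d in dists]
    return privacy_metric([d[location] for d in dists])

# Initial configurations from the text (zero-based state indices).
configs = {
    "one person in each of u1..u4": [0, 1, 2, 3],
    "two in u1, two in u2": [0, 0, 1, 1],
    "one in u1, three in u2": [0, 1, 1, 1],
    "all four in u2": [1, 1, 1, 1],
}

for label, starts in configs.items():
    row = [round(metric_at(H, starts, location=0, n=n), 4) for n in (1, 2, 5)]
    print(f"{label:30s} {row}")
```

When all four people start in the same state, their probabilities are identical at every step, so g equals log 4 wherever it is defined. At n = 1 and location u ₁, the four configurations are ordered from most spread out (lowest g) to most concentrated (highest g), consistent with the trend described above.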

5 Conclusion

This paper discusses a method for evaluating the extent of personal information anonymization along a navigation trajectory, based on a Markov chain model. It defines a “personal information privacy metric” function based on information entropy. With this function, the degree of association between data and personal information can be evaluated quantitatively, and the degree of personal information privacy is shown to be a function of time that can be calculated from the person's motion model. A simulation based on a Markov chain motion model is given, which shows how the ambiguity of a person changes with time and with the initial locations. The Markov chain model in the simulation is a simplified one. Building more accurate individual motion models requires analyzing a large amount of trajectory information, which is left for future work. The calculation of the “personal information privacy metric” proposed in this paper can be readily extended to other motion models to obtain more accurate evaluation results, and it provides a quantitative basis for anonymizing personal information in the publication of navigation data.

References

[1] Gedik B, Liu L (2008) Protecting location privacy with personalized k-anonymity: architecture and algorithms. IEEE Trans Mob Comput 7(1):1–18
[2] Huang L, Matsuura K, Yamane H (2005) Enhancing wireless location privacy using silent period. In: Proceedings of the IEEE wireless communications and networking conference, 2005, pp 1187–1192
[3] Ghinita G, Kalnis P, Skiadopoulos S (2007) PRIVE: anonymous location-based queries in distributed mobile systems. In: Proceedings of the 16th international conference on World Wide Web, 2007, pp 371–380
[4] Ghinita G, Kalnis P, Skiadopoulos S (2007) MobiHide: a mobile peer-to-peer system for anonymous location-based queries. In: Proceedings of the 10th international symposium on advances in spatial and temporal databases, 2007, pp 221–238
[5] Lin X, Li SP, Yang ZH (2009) Attacking algorithms against continuous queries in LBS and anonymity measurement. J Softw 20(4):1058–1068. http://www.jos.org.cn/1000-9825/3428.htm


Acknowledgments

The research work has been jointly funded by Beidou Navigation Satellite System Management Office (BDS office) and the Science and Technology Commission of Shanghai Municipality; the funding project number is BDZX005.

Author information

Authors and Affiliations

Shanghai Jiao Tong University, Shanghai, 200240, China
Jiannan Gao, Rendong Ying, Peilin Liu & Wenxian Yu


Corresponding authors

Correspondence to Jiannan Gao or Rendong Ying.

Editor information

Editors and Affiliations

China Aerospace Science and Technology Corporation, Chinese Academy of Sciences, Beijing, China
Jiadong Sun
China Satellite Navigation Office, Beijing, China
Wenhai Jiao
Navigation Headquarters, Chinese Academy of Sciences, Beijing, China
Haitao Wu
Department of Electronic Engineering, Tsinghua University, Beijing, China
Mingquan Lu


About this paper

Cite this paper

Gao, J., Ying, R., Liu, P., Yu, W. (2014). Study on the Personal Information Anonymization Method for the Releasing of Navigation Data. In: Sun, J., Jiao, W., Wu, H., Lu, M. (eds) China Satellite Navigation Conference (CSNC) 2014 Proceedings: Volume I. Lecture Notes in Electrical Engineering, vol 303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54737-9_14


DOI: https://doi.org/10.1007/978-3-642-54737-9_14
Published: 23 April 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54736-2
Online ISBN: 978-3-642-54737-9
eBook Packages: Engineering, Engineering (R0)
