Enhancing spatial and temporal utilities in differentially private moving objects database release

Deldar, Fatemeh; Abadi, Mahdi

doi:10.1007/s10207-020-00516-5

Regular contribution
Published: 24 July 2020

Volume 20, pages 511–533 (2021)

International Journal of Information Security

Abstract

The pervasive use of mobile technologies and GPS-equipped vehicles has resulted in a large number of moving objects databases. Privacy protection is one of the most significant challenges related to moving objects databases because of the legal requirements in many application domains. Over the last few years, several differentially private mechanisms have been proposed for moving objects databases. However, most of them aim to answer statistical queries and do not release a differentially private version of a moving objects database. In this paper, we present DP-MODR, a differentially private (DP) mechanism for synthetic moving objects database release (MODR). DP-MODR tries to efficiently and effectively release synthetic trajectories while preserving spatial and temporal utilities. In this way, the released differentially private moving objects database can be used for different purposes as well, including data analysis tasks. DP-MODR keeps some main spatial and temporal properties of original trajectories and defines a new differentially private tree structure to keep the most probable paths with different lengths and different starting points, which are then iteratively joined to generate synthetic trajectories in a bottom-up way. Also, we present an extension of DP-MODR to support moving objects databases whose locations are time-dependent. Extensive experiments on real moving objects datasets using multiple spatial and temporal evaluation measures show that DP-MODR enhances the utility of query answers and better preserves the main spatial and temporal properties of original trajectories in comparison with recent related work.

1 Introduction

The popularity of location-based services and applications is growing with the rapidly increasing number of smartphone owners, which in turn drives the rapid growth of moving objects databases. A moving objects database is a multiset of trajectories, each of which represents the movement history of a moving object during a period of time. Moving objects databases offer a vast application potential for researchers and enterprises, and there is great interest in mining these databases for purposes such as city planning, traffic control, trajectory pattern analysis, and municipal transportation. For example, transport authorities can use moving objects databases to better design transportation systems and optimize resource consumption. However, unauthorized exposure of moving objects’ trajectories may disclose their trip histories, home and work locations, frequent meeting points, or visits to sensitive locations such as hospitals, health clinics, and airports. The disclosure of such information has always been of concern to the owners of trajectories and prevents them from sharing their trajectories in moving objects databases.

Traditional privacy protection techniques for moving objects databases have mostly focused on location privacy, which is often achieved by perturbing or obfuscating each point of a trajectory. However, these location-based techniques are not usually sufficient for protecting the spatial and temporal properties of trajectories. Moreover, anonymized moving objects databases that do not contain personal identifiers or other evidence of identity still do not prevent the precise identification of moving objects [13]. For example, it was shown that 87% of the population in the USA had reported characteristics that likely made them uniquely identifiable, even though all explicit identifiers were removed from data records [32].

Differential privacy [9, 40] has emerged as one of the strongest definitions for privacy protection. The intuition is that the same conclusions must be reached independently of whether an individual data record opts into or opts out of a database. Specifically, it ensures that the probability that a statistical query will produce a given result is approximately the same whether or not any single data record is included in the database. Differential privacy provides strong privacy guarantees independently of the background knowledge of the adversary [12, 19]. This is because differential privacy is a property of the data release mechanism, not of an interaction between the mechanism and the adversary [10]. Thus, differentially private mechanisms are immune to a wide range of privacy attacks [12]. Initially, work on differential privacy mainly concentrated on answering statistical queries [5, 7, 9, 28]. However, some recent work has begun to use differential privacy for data release scenarios in different fields [1, 29, 39].

In recent years, several differentially private mechanisms have been proposed to answer statistical queries over moving objects databases [3, 8, 17, 33]. However, as mentioned, the majority of them do not release a differentially private (synthetic) version of an original moving objects database. Although a few mechanisms have been proposed to address this issue [15, 16], they cannot properly preserve the spatial and temporal properties of original trajectories. In this paper, we continue this line of research by presenting DP-MODR, a differentially private mechanism for synthetic moving objects database release that preserves spatial and temporal utilities as much as possible. In this mechanism, we first derive some useful properties of an original moving objects database, including the number of trajectories, the number of points in each trajectory, and the mobility patterns of trajectories, in a differentially private way. Then, we construct some so-called noisy cost-sensitive path trees to keep the most probable existing paths with different lengths (up to a maximum length) and different starting points. Finally, using these noisy cost-sensitive path trees and by considering the obtained differentially private spatial and temporal properties of original trajectories, we efficiently construct a synthetic moving objects database. Furthermore, we extend DP-MODR to support moving objects databases whose locations are time-dependent. In this new extension, also known as DP-MODRT, the synthetic moving objects database can preserve the time information of trajectories as well as the location information, in a differentially private way.

In the following, we list the main contributions of this paper:

We introduce DP-MODR, a differentially private mechanism for synthetic moving objects database release, which aims to enhance both spatial and temporal utilities simultaneously. DP-MODR achieves this aim by preserving the spatial and temporal properties of original trajectories in synthetic trajectories, in a differentially private manner.
We present a new tree structure, known as a noisy cost-sensitive path tree, to keep the most probable existing paths with different lengths and different starting points while satisfying differential privacy. We efficiently use the noisy cost-sensitive path trees to generate synthetic trajectories.
We efficiently construct a differentially private moving objects database by generating synthetic trajectories in a bottom-up way. Each synthetic trajectory is generated by iteratively joining the most probable paths until the intended length of that trajectory is reached.
We design an attack, called the sensitive locations disclosure attack, on synthetic moving objects databases and show to what extent DP-MODR is resilient to it.
We extend DP-MODR to support moving objects databases whose locations are time-dependent. The new differentially private mechanism, also known as DP-MODRT, is especially suitable for answering time-dependent queries over a synthetic moving objects database.
Through extensive experiments on real moving objects datasets, we show that DP-MODR enhances the utility of query answers and better preserves the main spatial and temporal properties of original trajectories in comparison with recent related work. Also, through some experiments, we show that DP-MODRT can preserve the time information of trajectories as well as the location information.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 provides some preliminaries and basic definitions. Section 4 introduces DP-MODR, explains it in detail and analyzes its privacy guarantee and performance. In Sect. 5, we extend DP-MODR to support moving objects databases whose locations are time-dependent. In Sect. 6, we report our experimental results in detail, and finally, in Sect. 7, we give a summary and discussion.

2 Related work

In this section, we review the state-of-the-art mechanisms for preserving differential privacy in moving objects databases.

The notion of differential privacy was introduced by Dwork [9] in 2006, and since then, it has been successfully applied to a wide range of data analysis tasks [4, 6, 18, 35, 37]. To maximize the utility of the results provided by differential privacy, the magnitude of the random noise should be as small as possible. The basic idea is to concentrate the probability mass around zero as much as possible. Dwork et al. [11] proposed the Laplace mechanism to preserve differential privacy for numerical values by calibrating the standard deviation of the noise according to the global sensitivity of the query function. Almost all the work done in the context of differential privacy for numerical values has used the Laplace mechanism to achieve differential privacy guarantees. However, there is little work on finding the optimal data-independent noise distribution to achieve differential privacy [14, 31]. For example, Soria-Comas et al. [31] proposed a general optimality criterion based on the concentration of the probability mass of the noise distribution around zero. They showed that any noise optimal under this criterion must be optimal under any other sensible criterion. They also built the optimal data-independent noise distribution. Geng et al. [14] derived the optimal $\varepsilon $-differentially private mechanism for single real-valued query functions under a very general utility maximization (or cost minimization) framework. They showed that the class of noise probability distributions in the optimal mechanism has staircase-shaped probability density functions that are symmetric (around the origin), monotonically decreasing, and geometrically decaying. Accordingly, in our work, optimal differential privacy can be achieved by applying the optimal noise distribution instead of the Laplace distribution.

In the last few years, some mechanisms have been proposed to enforce differential privacy in moving objects databases. Chen et al. [3] were the first to study the problem of differential privacy for moving objects databases. They proposed a data-dependent sanitization algorithm by constructing a noisy prefix tree over the underlying moving objects database. However, with the growth of the noisy prefix tree, the number of trajectories falling into the same branch decreases quickly, resulting in poor utility. To address this problem, in subsequent work, Chen et al. [2] employed a variable-length n-gram model that extracts the essential information in terms of a set of variable-length n-grams. The model makes use of an exploration tree based on the Markov assumption to decrease the magnitude of added noise. However, this work still suffers from the problem that by increasing the number of locations, the size of the exploration tree will grow exponentially, and thus, it is not scalable for spatial domains with a large number of locations. He et al. [17] presented DPT, a system to synthesize trajectories while ensuring differential privacy. DPT, which stands for differentially private trajectories, discretizes the spatial domain at multiple resolutions using a hierarchy of reference systems to capture movements at different speeds. However, DPT suffers from the problem that, for fine resolutions, the frequencies of subtrajectories will be small, and thus, the added noise will become relatively large. Wang et al. [34] proposed a private trajectories calibration and publication system (PTCP), which can be used to release trajectories in social media under differential privacy. PTCP adopts a noisy calibrated trajectories publication solution with privacy guarantees by building noise-enhanced prefix trees and extends the utility of released data through a differentially private post-processing sampling approach. However, all of these works represent a moving objects database with some tree structure, in which the noise added to nodes with small true values results in a large relative error. Moreover, leveraging tree structures to represent moving objects databases usually incurs high time and space overheads. In this paper, we preserve the mobility patterns of original trajectories using a so-called normalized frequency matrix, which reduces time and space overheads.

Li et al. [23] proposed a differentially private trajectory data release mechanism with a bounded noise generation and a trajectory merging algorithm. The noise generation algorithm is designed such that the noise added to true trajectory counts is sampled in a legal range. Xu et al. [36] proposed DP-LTOD, a differential privacy latent trajectory community discovering scheme, which obfuscates original trajectory sequences into differentially private trajectory sequences. DP-LTOD first partitions an original trajectory sequence into different segments. Then, it selects the suitable locations and segments to constitute an obfuscated trajectory sequence. Specifically, it formulates a trajectory obfuscation problem to select an optimal trajectory sequence which has the smallest difference with the original trajectory sequence. Wang et al. [33] proposed DP-PSP, a differentially private statistics publication mechanism for real-time trajectory streams. DP-PSP discovers sensitive anchor points and divides the road network into a number of segments. Each spatial location in a trajectory stream is then calibrated to its nearest anchor point to handle the heterogeneity of trajectories. DP-PSP allows users to specify their own dynamic privacy budget distribution to optimize their own privacy budget. It also presents a private k-nearest neighbor selection and perturbation algorithm to reduce the amount of perturbation distortion induced by adding random noise.

Gursoy et al. [16] presented AdaTrace, a utility-aware trajectory synthesizer with differential privacy guarantee. AdaTrace performs feature extraction, learning, and noise injection using a database of real trajectories. It then generates synthetic trajectories while preserving differential privacy, enforcing resilience to inference attacks, and upholding statistical and spatial utilities. They also presented DP-Star [15], similar work to AdaTrace, which uses a normalization algorithm to summarize raw trajectories using their representative points, in its first step. However, these works do not properly consider some useful properties of original trajectories, such as number of points and mobility patterns, in synthetic trajectories and, thus, cannot preserve some spatial and temporal properties of original trajectories (as we will show in our experiments). Moreover, they do not consider time-dependent locations and, thus, are not able to answer time-dependent queries.

Deldar and Abadi [8] presented PDP-SAG, a differentially private mechanism that combines the sensitive attribute generalization (SAG) with personalized differential privacy (PDP) in a unified manner. By this combination, they aimed to provide different levels of differential privacy protection for moving objects that also have non-spatiotemporal sensitive attributes. However, unlike the mechanism presented in this paper, PDP-SAG does not release synthetic moving objects databases.

3 Preliminaries

In this section, we give some definitions and preliminaries that are used throughout the paper.

3.1 Differential privacy

Differential privacy (DP) is one of the strongest privacy guarantees available today and provides a mathematically provable guarantee of privacy protection against a wide range of privacy attacks [12]. It guarantees that the adversary will learn almost nothing about an individual data record, even when observing sequences of query outputs from two neighboring databases, one with and the other without that data record. In the following, we define the concepts related to differential privacy.

Definition 1

(Neighboring databases) Two distinct databases $\mathcal {D}_1$ and $\mathcal {D}_2$ from the universe of databases $\mathfrak {D}$ are said to be neighbors, denoted by $\mathcal {D}_1\sim \mathcal {D}_2$, iff one can be obtained by adding or removing a single data record from the other.

Definition 2

($\varepsilon $-Differential privacy) A randomized algorithm $\mathcal {A}$ is said to be $\varepsilon $-differentially private or $\varepsilon $-DP iff for any two input neighboring databases $\mathcal {D}_1$ and $\mathcal {D}_2$, and any subset $O$ of all possible outputs of $\mathcal {A}$, we have

$$\begin{aligned} {\Pr [\mathcal {A}(\mathcal {D}_1)\in O]}\le \exp (\varepsilon )\times {\Pr [\mathcal {A}(\mathcal {D}_2)\in O]} , \end{aligned}$$

(1)

where $\varepsilon $ is a privacy parameter, known as the total privacy budget, that determines the strength of the privacy guarantee. A smaller $\varepsilon $ will result in a stronger privacy guarantee, and vice versa.

A popular and widely used mechanism for answering statistical queries under differential privacy is the Laplace mechanism [11], which adds random noise drawn from the Laplace distribution to the output of statistical queries. The magnitude of the noise is scaled according to the (global) sensitivity of the query function, which is a measure of the maximum possible change to query outputs over any two neighboring databases.

Definition 3

(Sensitivity) Let $f:\mathfrak {D}\rightarrow \mathbb {R}^d$ be a query function that maps any database in the universe of databases $\mathfrak {D}$ to a vector of $d$ real numbers. The sensitivity of $f$, denoted by $\sigma _f$, is defined as

$$\begin{aligned} \sigma _f=\max _{\mathcal {D}_1\sim \mathcal {D}_2}{\Vert f(\mathcal {D}_1)-f(\mathcal {D}_2)\Vert _1} , \end{aligned}$$

(2)

where $\Vert \cdot \Vert _1$ denotes the $L^1$-norm of a vector.

Definition 4

(Laplace mechanism) Let $f:\mathfrak {D}\rightarrow \mathbb {R}^d$ be a query function for the universe of databases $\mathfrak {D}$. A randomized algorithm $\mathcal {A}$ satisfies $\varepsilon $-DP if for any input database $\mathcal {D}\in \mathfrak {D}$, we have

$$\begin{aligned} \mathcal {A}(\mathcal {D})=f(\mathcal {D})+{\text {Lap}}(\sigma _f/\varepsilon ) , \end{aligned}$$

(3)

where ${\text {Lap}}(\lambda )$ denotes Laplace noise, drawn independently for each of the $d$ components of $f(\mathcal {D})$, with probability density function $h_{\lambda }(z)=\frac{1}{2\lambda }\exp (-|z|/\lambda )$ and variance $2\lambda ^2$.
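
Definition 4 can be illustrated with a minimal sketch in Python. It is not the paper's implementation: the function names, the example counts, and the choice $\varepsilon =0.5$ are assumptions made for illustration. The noise is sampled as the difference of two independent exponential variables, which follows the Laplace distribution with the same scale.

```python
import random


def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Add independent Lap(sensitivity / epsilon) noise to each component of the answer."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng if rng is not None else random.Random()
    scale = sensitivity / epsilon
    return [value + laplace_noise(scale, rng) for value in true_answer]


# Hypothetical histogram query: number of trajectories starting in each of four cells.
# Adding or removing one trajectory changes exactly one count by 1, so sigma_f = 1.
true_counts = [120, 45, 3, 0]
noisy_counts = laplace_mechanism(true_counts, sensitivity=1.0, epsilon=0.5,
                                 rng=random.Random(0))
```

With $\sigma _f=1$ and $\varepsilon =0.5$, the scale is $\lambda =2$ and each noisy count has variance $2\lambda ^2=8$, so small true counts are distorted relatively more than large ones.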

The Laplace mechanism does not apply to all statistical queries; in particular, it does not apply to those that have categorical (or discrete) outputs. The exponential mechanism [25] is more general than the Laplace mechanism and applies to all types of queries. It uses an arbitrary scoring function that, given an input database $\mathcal {D}$ and a discrete output $r$, assigns a real-valued score to $r$ to quantify its quality.

Definition 5

(Exponential mechanism) Let $q:\mathfrak {D}\times \mathcal {R}\rightarrow \mathbb {R}$ be an arbitrary scoring function for the universe of databases $\mathfrak {D}$ and a domain of discrete outputs $\mathcal {R}$. The randomized algorithm $\mathcal {A}$ that returns the discrete output $r\in \mathcal {R}$ for an input database $\mathcal {D}\in \mathfrak {D}$ with a probability proportional to $\exp (\varepsilon q(\mathcal {D},r)/(2\sigma _q))$ satisfies $\varepsilon $-DP, where $\sigma _q$ is the sensitivity of $q$ and defined as

$$\begin{aligned} \sigma _q=\max _{r\in \mathcal {R},\mathcal {D}_1\sim \mathcal {D}_2}{\Vert q(\mathcal {D}_1,r)-q(\mathcal {D}_2,r)\Vert _1} . \end{aligned}$$

(4)
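
The sampling rule in Definition 5 can be sketched as follows. This is an illustrative sketch, not the paper's code: the function names, the cell labels, and the scoring function (the number of trajectories that start in a cell, whose sensitivity is 1 under adding or removing one trajectory) are assumptions.

```python
import math
import random


def exponential_mechanism(database, outputs, score, sensitivity, epsilon, rng=None):
    """Return r from `outputs` with probability proportional to exp(eps * q(D, r) / (2 * sigma_q))."""
    rng = rng if rng is not None else random.Random()
    scores = [score(database, r) for r in outputs]
    # Subtracting the maximum score leaves the probabilities unchanged and avoids overflow.
    top = max(scores)
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(outputs, weights=weights, k=1)[0]


def start_count(trajectories, cell):
    """Hypothetical scoring function: number of trajectories whose head is `cell`."""
    return sum(1 for t in trajectories if t and t[0] == cell)


trajectories = [["A", "B"], ["A", "C"], ["A"], ["B", "A"]]
picked = exponential_mechanism(trajectories, ["A", "B", "C"], start_count,
                               sensitivity=1.0, epsilon=1.0, rng=random.Random(0))
```

A larger $\varepsilon $ concentrates the probability on the highest-scoring output, whereas a smaller $\varepsilon $ moves the choice toward uniform.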

Any sequence of differentially private computations is also differentially private. This property is known as compositionality and has two different types: sequential composition and parallel composition [26].

Theorem 1

Let $\varLambda =\{\mathcal {A}_1,\mathcal {A}_2,\dots ,\mathcal {A}_n\}$ be a set of randomized algorithms, where each $\mathcal {A}_i\in \varLambda $ satisfies $\varepsilon _i$-DP for an input database $\mathcal {D}$. Then, the sequential composition $\mathcal {A}_1\circ \mathcal {A}_2\circ \cdots \circ \mathcal {A}_n$ over $\mathcal {D}$ satisfies $(\sum _{i} {\varepsilon _i})$-DP and the parallel composition $\mathcal {A}_1\parallel \mathcal {A}_2\parallel \cdots \parallel \mathcal {A}_n$ over disjoint subsets of $\mathcal {D}$ satisfies $(\max _{i}{\varepsilon _i})$-DP [26].
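
As a worked instance of Theorem 1, suppose three algorithms satisfy $0.2$-DP, $0.3$-DP, and $0.5$-DP, respectively. These budgets are illustrative assumptions, not values taken from the paper.

```latex
\begin{aligned}
\text{sequential composition (same database):}\quad & \varepsilon = 0.2 + 0.3 + 0.5 = 1.0, \\
\text{parallel composition (disjoint subsets):}\quad & \varepsilon = \max \{0.2, 0.3, 0.5\} = 0.5 .
\end{aligned}
```

The sequential case is why a mechanism with several data-dependent steps has to split its total privacy budget among those steps.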

As mentioned above, differential privacy guarantees that the distribution of query results changes only slightly due to the modification of any one data record in the database. This allows protection against powerful adversaries who know the entire database except for one data record. On the other hand, differential privacy mechanisms implicitly assume that data records in a database are independent. To the best of our knowledge, all of the works that apply differential privacy to real databases such as moving objects databases also follow this assumption [15, 16, 17]. Similarly, we follow the same assumption in this paper. However, only a few works have addressed the issue of dependent data records in differential privacy [21, 24]. As discussed in [21], if we drop this assumption and consider a database in which some individuals may have multiple data records, then, according to the Pareto principle, most of the individuals in this database will have few data records (often only one), whereas a small proportion of them may have many more (see [21] for more details). Thus, we can separate a large number of low-frequency individuals from a small number of high-frequency ones and compute the sensitivity of the query function based on the group of low-frequency individuals [21]. This allows us to guarantee $\varepsilon $-DP for most individuals while having a slightly weaker differential privacy guarantee for the others.

3.2 Moving objects database

Moving objects databases store and manage discrete or continuous changes of moving objects over an underlying spatial domain.

Given a spatial domain within which the movement of moving objects is constrained, a moving objects database $\mathcal {D}$ over this spatial domain is a multiset of trajectories. Each trajectory $T\in \mathcal {D}$ is a sequence of points or latitude/longitude locations $\langle X_1,X_2,\dots ,X_{|T|}\rangle $, where $|T|$ is the length (or number of points) of $T$. The point $X_1$, the leading point of $T$, is called the head of $T$, and the subtrajectory $\langle X_2,X_3,\dots ,X_{|T|}\rangle $, obtained by removing the leading point, is called the tail of $T$.

Definition 6

(Subtrajectory) A trajectory $T_r=\langle X_1^r, X_2^r,\dots ,X_n^r\rangle $ is said to be a subtrajectory of a trajectory $T_s=\langle X_1^s,X_2^s,\dots ,X_m^s\rangle $, iff there exist $n$ consecutive integers $1\le i<i+1<\cdots <i+n-1\le m$ such that $X_1^r=X_i^s,X_2^r=X_{i+1}^s,\dots ,X_n^r=X_{i+n-1}^s$.
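
The head, tail, and subtrajectory notions can be checked with a short Python sketch. The function names and the coordinates are hypothetical, points are compared by exact equality, and an empty sequence is treated as not being a subtrajectory, which is an assumption because Definition 6 does not cover that case.

```python
def head(trajectory):
    """Return the leading point X_1 of a non-empty trajectory."""
    return trajectory[0]


def tail(trajectory):
    """Return the subtrajectory <X_2, ..., X_|T|> obtained by removing the head."""
    return list(trajectory[1:])


def is_subtrajectory(candidate, trajectory):
    """Check whether `candidate` occurs as a run of consecutive points of `trajectory`."""
    candidate, trajectory = list(candidate), list(trajectory)
    n, m = len(candidate), len(trajectory)
    if n == 0 or n > m:
        return False
    # Try every admissible starting index i (0-based here, 1-based in Definition 6).
    return any(trajectory[i:i + n] == candidate for i in range(m - n + 1))


# Hypothetical trajectory of four (latitude, longitude) points.
T = [(19.04, -98.20), (19.05, -98.19), (19.06, -98.19), (19.07, -98.18)]
```

Here `is_subtrajectory(T[1:3], T)` holds, whereas `[T[0], T[2]]` is not a subtrajectory of `T` because its points are not consecutive in `T`.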

4 Differentially private moving objects database release

Many companies, such as Google and Uber, collect a huge volume of data about the movements of moving objects every day through their mobile apps, resulting in large moving objects databases. Analyzing such databases is of great value for data analysts and has many applications in different tasks such as city planning, traffic analysis, taxi service prediction, and passenger demand analysis. However, because of concerns about disclosing information about moving objects, such as trip histories, home and work locations, frequent meeting points, or visits to sensitive locations like hospitals, health clinics, and airports, these companies often cannot safely provide their collected moving objects databases to data analysts.

In this section, we introduce DP-MODR, a differentially private mechanism for synthetic moving objects database release that preserves spatial and temporal utilities efficiently and effectively. DP-MODR consists of five main steps. In the first step, we discretize the continuous spatial domain into a finite set of domain cells and create a noisy histogram of starting domain cells of original trajectories to keep the distribution of trajectory heads. In the second step, we compute the noisy median length of original trajectories that start in each domain cell to preserve the distribution of trajectory lengths around their median. In the third step, we construct a noisy transition cost matrix to preserve the mobility patterns of original trajectories. In the fourth step, we construct some noisy cost-sensitive path trees using the noisy transition cost matrix to keep the most probable existing domain cell paths with different lengths and different starting domain cells. Finally, in the fifth step, we release synthetic trajectories by constructing a differentially private moving objects database using the information obtained in the previous steps. It should be mentioned that only the first three steps work on original trajectories; therefore, to satisfy differential privacy, we divide $\varepsilon $ into three parts, namely $\varepsilon _1$, $\varepsilon _2$, and $\varepsilon _3$, such that $\varepsilon _1+\varepsilon _2+\varepsilon _3=\varepsilon $, and give each part to one of these steps, respectively. For the rest of this section, we will assume that we are given a moving objects database $\mathcal {D}$, and our goal is to release a differentially private version of it, denoted by $\hat{\mathcal {D}}$. Table 1 summarizes the notations used throughout the paper.

Table 1 Notations used throughout the paper
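
The first step described above, together with the budget split, can be sketched roughly as follows. This is not the procedure of Sect. 4.1, which is not reproduced here. The uniform grid, the even split of $\varepsilon $, the function names, and the sample data are all assumptions made for illustration.

```python
import random


def cell_of(point, bounds, grid_size):
    """Map a (lat, lon) point to a (row, col) cell of an assumed uniform grid."""
    min_lat, min_lon, max_lat, max_lon = bounds
    lat, lon = point
    row = int((lat - min_lat) / (max_lat - min_lat) * grid_size)
    col = int((lon - min_lon) / (max_lon - min_lon) * grid_size)
    # Clamp so that points on or beyond the boundary fall into an edge cell.
    return (max(0, min(row, grid_size - 1)), max(0, min(col, grid_size - 1)))


def noisy_start_histogram(trajectories, bounds, grid_size, epsilon_1, rng=None):
    """Count trajectory heads per cell and add Lap(1 / epsilon_1) noise to every cell."""
    rng = rng if rng is not None else random.Random()
    # Every cell gets an entry, so empty cells also receive noise.
    counts = {(r, c): 0 for r in range(grid_size) for c in range(grid_size)}
    for trajectory in trajectories:
        if trajectory:
            counts[cell_of(trajectory[0], bounds, grid_size)] += 1
    # One trajectory changes one count by 1, so the sensitivity is 1 and the
    # Laplace scale is 1 / epsilon_1 (difference of two exponentials of rate epsilon_1).
    return {cell: count + rng.expovariate(epsilon_1) - rng.expovariate(epsilon_1)
            for cell, count in counts.items()}


epsilon = 1.0
epsilon_1 = epsilon_2 = epsilon_3 = epsilon / 3  # assumed even split of the budget

bounds = (0.0, 0.0, 1.0, 1.0)  # hypothetical (min_lat, min_lon, max_lat, max_lon)
trajectories = [[(0.1, 0.1), (0.6, 0.6)], [(0.1, 0.2), (0.9, 0.9)], [(0.8, 0.3)]]
histogram = noisy_start_histogram(trajectories, bounds, grid_size=2,
                                  epsilon_1=epsilon_1, rng=random.Random(0))
```

The noisy counts are real-valued and may be negative. Any rounding or clipping applied afterwards is post-processing and does not consume additional privacy budget.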
