
1 Introduction

With the rapid development of communication technologies and infrastructures, a huge volume of data is generated by various devices, including smartphones and wearable devices [24]. Data sharing and publishing greatly improve the convenience of people's daily lives through data analysis techniques such as service recommendation and data mining [5, 9, 19,20,21,22,23, 44]. For example, a mobile user may send health information collected by a smartwatch to medical experts and receive a scientifically grounded exercise plan. Mobile users also leverage their own data to obtain third-party applications and services, such as GPS navigation, shopping, and food delivery. Data collectors (e.g., hospitals) share medical data with disease prevention departments to predict possible infectious diseases (e.g., coronavirus disease 2019, COVID-19 [39]). Although users benefit from these applications and services, they incur a certain risk of leaking their sensitive information. Once data is sent to service providers, it leaves the hands of data owners (e.g., mobile users), who can no longer control how it is used. This risk may cause monetary or reputational losses for users and thus hinder data sharing and publishing [38, 40].

In recent years, a series of research works have proposed various privacy metrics and algorithms, including k-anonymity [31], \(\ell \)-diversity [14], t-closeness [11], differential privacy (DP) [7, 8, 34], and local differential privacy (LDP) [6]. k-anonymity requires that each record in a perturbed dataset be indistinguishable from at least \( k-1 \) other records. Differential privacy is a rigorous mathematical definition that uses a privacy parameter \( \epsilon \) to bound the probability of distinguishing any two neighboring datasets. LDP extends DP to the local setting, where each user perturbs data before it leaves his/her device. Many randomized mechanisms satisfy DP or LDP, including the Laplace mechanism [8], the exponential mechanism [16], and the randomized response mechanism [37]. Different privacy metrics and mechanisms are suitable for different scenarios. Although these metrics and mechanisms protect users' privacy, they decrease the accuracy of the data and thus degrade the quality of the services users receive.

Although many effective privacy metrics and mechanisms exist, it remains a challenge for users and data collectors (e.g., service providers) to choose proper privacy mechanisms and parameters and to interact with the other key participants. On the one hand, users and data collectors need to balance utility against privacy. On the other hand, they must also analyze the influence of important factors, including the attack strength of adversaries and the privacy levels of the other users and collectors. To this end, game theory is an effective theoretical tool for studying the behavior of various participants [17]. Indeed, a series of works have utilized game theory to analyze the interactions among multiple participants with respect to privacy. Moreover, game-theoretic analysis contributes to improving the efficiency of protection and reducing the cost of privacy.

However, to the best of our knowledge, few works survey the literature on game theory-based privacy analysis for data sharing and publishing. Most previous surveys [4, 15, 18, 25, 42, 43] focus on information security rather than game theory-based privacy analysis. For example, Manshaei et al. [15] and Pawlick et al. [18] mainly focus on game theory-based network security and privacy. Therefore, this paper surveys game theory-based privacy analysis in data sharing and publishing. We first introduce preliminaries regarding scenarios, privacy metrics, players, and game models. Then, we survey the existing works on privacy preservation in data sharing and publishing. Finally, we discuss possible future research directions.

The remainder of the paper is organized as follows. Section 2 introduces preliminaries about the key elements of game theory-based privacy analysis. Section 3 surveys the existing game-theoretic works on privacy analysis. Section 4 discusses the existing works and presents possible future research directions. Section 5 concludes the paper.

2 Preliminaries

In this section, we introduce preliminaries about scenarios, privacy metrics, game players, and game models.

2.1 Scenarios

In this paper, we focus on two scenarios: data sharing and data publishing [26]. In both scenarios, data can be exchanged for monetary reward or for services. In the former case, users send their data to the data collector and receive a monetary reward (i.e., data trading). In the latter case, users send their data to service providers and receive the corresponding services. In detail, the two scenarios are as follows:

  • Data Sharing. In order to obtain some service (e.g., a location-based or medical service), a user shares sensitive information (e.g., location, health information) with a third party.

  • Data Publishing. The third party collects a huge volume of data from users and may publish or share the perturbed data with the public or with other organizations.

For data sharing, users need to perturb their data to protect their privacy. For data publishing, data collectors (e.g., service providers) need to prevent users' data from being leaked.

2.2 Privacy Metrics

A series of privacy metrics and mechanisms have been proposed to protect privacy, including \(k\)-anonymity [31], differential privacy [8], and local differential privacy [6]. Their definitions are given below.

Definition 1

(\(k\)-anonymity [31]). A perturbation mechanism \( \mathcal {M} \) satisfies \(k\)-anonymity if, after perturbation of a dataset D, each record is indistinguishable from at least \( k-1 \) other records with respect to the quasi-identifier attributes.
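As an illustration (not taken from [31]), the following minimal Python sketch checks whether an already generalized table satisfies \(k\)-anonymity; the table, attribute names, and generalization levels are hypothetical.

    from collections import Counter

    def satisfies_k_anonymity(records, quasi_identifiers, k):
        # Group records by their quasi-identifier values and check that
        # every group contains at least k records.
        groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
        return all(count >= k for count in groups.values())

    # Hypothetical generalized table: ages coarsened to ranges, ZIP codes truncated.
    table = [
        {"age": "20-29", "zip": "130**", "disease": "flu"},
        {"age": "20-29", "zip": "130**", "disease": "cold"},
        {"age": "30-39", "zip": "148**", "disease": "flu"},
        {"age": "30-39", "zip": "148**", "disease": "asthma"},
    ]
    print(satisfies_k_anonymity(table, ["age", "zip"], k=2))  # True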

Both DP and LDP are rigorous mathematical definitions suited to different scenarios: DP is used in the centralized setting, while LDP is used in the local setting.

Definition 2

(Differential Privacy [8]). A randomized mechanism \(\mathcal {M}\) satisfies \(\epsilon \)-differential privacy if for any two datasets \( D_i \) and \( D_j \) that differ in at most one record, and for any set of outputs \( \mathcal {O} \) in the range of \( \mathcal {M} \), the output satisfies the following requirement:

$$\begin{aligned} \Pr [\mathcal {M}(D_i) \in \mathcal {O} ] \le \exp (\epsilon )\cdot \Pr [ \mathcal {M}(D_j) \in \mathcal {O} ] \end{aligned}$$
(1)
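The Laplace mechanism [8] mentioned above is a standard way to satisfy this definition for numeric queries. The following minimal Python sketch is an illustration only (not code from the surveyed works): it adds Laplace noise calibrated to the query's global sensitivity and the privacy parameter \( \epsilon \).

    import numpy as np

    def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
        # Add Laplace noise with scale sensitivity/epsilon to a numeric query
        # answer; with global sensitivity "sensitivity", this satisfies eps-DP.
        rng = rng or np.random.default_rng()
        return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Example: releasing a counting query (sensitivity 1) with epsilon = 0.5.
    print(laplace_mechanism(true_answer=1024, sensitivity=1.0, epsilon=0.5))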

Definition 3

(Local Differential Privacy [6]). A randomized mechanism \( \mathcal {M} \) satisfies \( \epsilon \)-local differential privacy if for any two input values \( v_i \) and \( v_j \) and any set of outputs \( \mathcal {O} \) in the range of \( \mathcal {M} \), the output satisfies the following requirement:

$$\begin{aligned} \Pr [\mathcal {M}(v_i) \in \mathcal {O} ] \le \exp (\epsilon )\cdot \Pr [ \mathcal {M}(v_j) \in \mathcal {O} ] \end{aligned}$$
(2)
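The randomized response mechanism [37] is a classical way to satisfy this definition for a single binary value. The following minimal Python sketch (an illustration only) reports the true bit with probability \( e^{\epsilon }/(e^{\epsilon }+1) \) and flips it otherwise, so the ratio of output probabilities for any two inputs is bounded by \( e^{\epsilon } \).

    import math
    import random

    def randomized_response(bit, epsilon, rng=random):
        # Report the true bit with probability e^eps / (e^eps + 1),
        # otherwise flip it; this satisfies eps-local differential privacy.
        p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
        return bit if rng.random() < p_truth else 1 - bit

    # Each user perturbs the value locally before sending it to the collector.
    reports = [randomized_response(v, epsilon=1.0) for v in [0, 1, 1, 0, 1]]
    print(reports)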

2.3 Players

In privacy games, there are three key participants, listed as follows:

  • Data Owners (DO). In order to obtain some service, these players share their personal information (e.g., trajectories, medical information) with data collectors. When the data collectors are untrusted, the owners should protect their information from leakage using privacy mechanisms.

  • Data Collectors (DC). Data collectors gather data from a large number of users. Trusted data collectors should protect users' data and avoid possible leakage, whereas untrusted data collectors act as adversaries.

  • Data Adversaries (DA). The objective of data adversaries is to obtain sensitive information about data owners. On the one hand, they acquire side information through illegal means (e.g., eavesdropping devices, tracking) to launch inference attacks. On the other hand, they may be untrusted collectors who receive perturbed data from users.

The objective of DOs and trusted DCs is to protect data from leakage, while the objective of untrusted DCs and DAs is to obtain sensitive information about DOs.

2.4 Game Model

According to the participants involved, existing works on game theory-based privacy analysis can be classified into the following four types:

  • Data Owner vs. Collector (DOC). In this case, data collectors are not trusted, so data owners need to adopt protection mechanisms to preserve their privacy. However, protection brings a certain amount of utility loss, which varies with the privacy parameters. Therefore, the objective of data owners is to minimize the utility loss, while the objective of data collectors is to obtain as much sensitive information as possible.

  • Data Owner vs. Owner (DOO). Under some privacy metrics (e.g., k-anonymity), the privacy decision of a data owner influences not only his/her own privacy but also the privacy of others. That is, a data owner benefits from the others' privacy protection, which may lead to unwillingness to protect his/her own privacy.

  • Data Collector vs. Adversary (DCA). Data collectors obtain a huge volume of data from users and are responsible for preventing privacy leakage to adversaries. Therefore, there is a defense-attack interaction between data collectors and adversaries.

  • Data Collector vs. Collector (DCC). Due to the privacy correlation among different datasets, the privacy level of a dataset is influenced by both its own privacy requirements and the privacy parameters of the other datasets. Therefore, there are interactions among multiple data collectors.

3 Privacy Games in Data Sharing and Publishing

Following the game models in Sect. 2.4, we present the detailed works on DOC, DOO, DCA, and DCC.

3.1 DOC

In the DOC setting, data owners seek to obtain the optimal utility (e.g., high-quality service) under a given privacy requirement.

Shokri [29] considers a common scenario in which data owners share their personal information with an untrusted third party (i.e., the data collector). Under a joint guarantee of differential privacy and distortion privacy, the objective of data owners is to minimize their utility loss. As an adversary, the data collector attempts to infer the data owners' secrets through an inference attack. This leads to a defense-attack interaction between data owners and adversaries. In order to analyze the actions of the different participants, Shokri [29] constructs a Stackelberg privacy game: the data owner first chooses a protection mechanism, and the adversary then follows by designing an optimal inference attack. Shokri demonstrates that the optimal inference attack results in the same level of privacy for the data owner. In addition, the work proposes linear programming solutions for the game analysis, in which each participant tries to obtain the optimal utility. The game result shows that the joint guarantee of differential privacy and distortion privacy is better than either differential privacy or distortion privacy alone.

In order to obtain users' data, the data collector either provides a specified service or offers a payment to the data owner. Different from the previous scenarios, Chessa et al. [3] consider a special case in which the data collector treats the results of data analysis as a public good. The authors construct a game-theoretic model in which each data owner contributes part or all of his/her data at a self-chosen level of precision, while the data collector controls the degree of data precision. Chessa et al. [3] discuss two cases, i.e., homogeneous and heterogeneous individuals, and demonstrate that the data collector can lower the bound on the precision of the data to increase the accuracy of the population estimate.

On the other hand, the data collector offers payments to incentivize data owners to report their real information.

Similar to [29], Wang et al. [36] consider the data-for-monetary-reward scenario. The data collector does not know the privacy costs of data owners and tries to design a payment strategy that spends the minimal payment to achieve the desired accuracy objective. The authors construct a Bayesian game model and analyze its Nash equilibrium points. Meanwhile, a sequence of mechanisms is designed to increase the number of data owners who report their real information. More importantly, the designed mechanism is asymptotically optimal.

Wang et al. [35] consider a case in which an untrusted data collector offers payments for noisy data, since data owners add noise out of privacy concerns. Although the noise protects the privacy of data owners, it degrades the quality of the data, which leads to an interaction between the data collector and data owners. In order to obtain a desired quality of noisy data, Wang et al. [35] construct a game model to design a payment mechanism. The privacy requirement of each data owner is controlled by the privacy parameter \( \epsilon \) in differential privacy, which also influences the quality of the perturbed data. The authors design a payment mechanism and analyze the Nash equilibrium of the game. Under this payment mechanism, the best strategy of each data owner is to report the real data with probability \( \frac{e^{\epsilon }}{e^{\epsilon } + 1} \). As a result, the mechanism satisfies \( \epsilon \)-differential privacy, since \( \frac{e^{\epsilon }}{e^{\epsilon } + 1} \big / \frac{1}{e^{\epsilon } + 1} = e^{\epsilon } \).

Sooksatra et al. [30] pose a novel challenge: how to design a scheme that benefits both data owners and collectors and promotes their cooperation to prevent secondary use of data (e.g., data releasing). The purpose of the scheme is to make data owners report their accurate information and to keep the data collector from reselling the data. The authors consider two cases, i.e., data for services and data for monetary reward, and construct an iterated data trading game model with asymmetric, incomplete information. They then reveal the data trading dilemma, which has two aspects: whether data owners report their data and face the risk of its release by the data collector, and whether the data collector resells the data to others. To this end, the authors propose a zero-determinant strategy that encourages data owners and collectors to cooperate rather than defect.

None of the above works considers the privacy correlation among data owners' data. Because data owners are socially correlated, their data is correlated as well. As a result, the privacy level of a data owner is influenced by both his/her own privacy requirement and those of the others. Liao et al. [13] construct a two-stage Stackelberg game in which the data collector chooses a certain number of data owners to report their perturbed data. The authors derive that, at the Nash equilibrium, only one data owner considers the privacy correlation while the others send their real data. For the data collector, the authors present an optimal privacy protection mechanism.

Sfar et al. [28] discuss the privacy preservation problem in the Internet of Things, where data owners (e.g., drivers and vehicles) send their data to a data collector (e.g., a data requester). The authors construct a game model to analyze the behaviors of data owners and obtain the optimal privacy mechanism.

Mobile crowd sensing (MCS) is one of the most common applications of data sharing. In an MCS system, noisy sensory data is sent to the data collector. In order to motivate data owners to report their true data, Jin et al. [10] propose a payment mechanism and construct a Bayesian game model to analyze the behaviors of data collectors and owners. As a result, the authors propose a truth discovery algorithm that motivates data owners to maximize their efforts. Meanwhile, the algorithm satisfies individual rationality and budget feasibility.

3.2 DOO

Kumari et al. [33] consider a privacy game in data publishing, in which each data owner reports his/her real or dummy data to the data collector. Under the privacy requirement of k-anonymity, the privacy level of each data owner is influenced by the others' privacy decisions. The authors construct a cooperative privacy game (CoPG), in which each player considers a real value called the cooperative value. At the Nash equilibrium, the authors use an information loss metric to evaluate the efficiency of the anonymization process.

3.3 DCA

Chen et al. [2] pose an interesting problem: how should a data owner behave to maximize his/her utility when privacy costs are explicit? The authors construct an interaction game between a data owner and an adversary, where the latter attempts to obtain the real value of the former's data. In detail, each data owner holds a value \( v\in \{0,1\} \) and reports a randomized value \( v' \) to the data collector, whose purpose is to recover the true value v. Since the data collector does not know the private value of the data owner, the authors construct a Bayesian game and analyze its Bayesian Nash equilibrium. According to different payment functions, the authors discuss three cases and derive that the data owner adopts a randomized strategy.

Vakilinia et al. [32] consider a cyber threat information sharing scenario, in which sharing provides proactive knowledge for cybersecurity defenses and improves defense efficiency. Although data collectors receive payments for sharing, they face the risk of privacy leakage. To this end, the authors construct a dynamic game model between the data collector and the adversary: the objective of the adversary is to maximize his/her utility by attacking, while the data collector needs to decide how much information to share. The authors propose a three-way game model with three main components: CYBEX, data collectors, and an adversary. They also derive the optimal amount of sanitization a data collector should choose to maximize his/her utility.

3.4 DCC

When there are multiple datasets, their privacy is correlated, which implies that the privacy level of a data collector is influenced by both his/her own and the others' privacy protection. To this end, Wu et al. [41] first present a novel definition of correlated differential privacy to describe the privacy relationship between different datasets. The authors then construct a game model in which each data collector publishes a dataset and chooses a proper privacy parameter \( \epsilon \) under differential privacy. They also analyze the existence and uniqueness of the pure Nash equilibrium.

In cybersecurity, multiple organizations share their network data to improve the defense of the whole network. Rawat et al. [27] introduce the blockchain concept to propose a novel information sharing system, iShare. The authors construct a Stackelberg game model to analyze the behaviors of the organizations.

4 Discussion and Future Research Directions

In this section, we briefly discuss the privacy games surveyed in Sect. 3 and then present possible future research directions.

4.1 Discussion

From the description in Sect. 3, we find that most existing works focus on DOC and DCA. These two game models involve three key participants: data owners, collectors, and adversaries. In particular, data collectors are the most important participants. A large number of works assume that the data collector in the game model is untrusted. The basic strategy is to treat privacy as a commodity and evaluate its price; the data collector then offers a certain amount of payment or service as compensation for the privacy risk. When the data collector is trusted, its objective is to maximize the utility of the perturbed data under a given privacy guarantee. A brief summary of privacy games in data sharing and publishing is shown in Table 1.

Table 1. Brief description of privacy games under data sharing and publishing

4.2 Future Research Directions

Game Model for Local Differential Privacy. Local differential privacy is a novel privacy metric for the local setting. Few existing works utilize local differential privacy to construct privacy game models, although some works (e.g., [35]) have taken advantage of ideas similar to LDP to construct game models. In fact, with the rapid development of the Internet of Things and mobile cloud computing, many efficient mechanisms satisfying local differential privacy have been proposed, including RR [37] and k-RR [1, 12], the latter sketched below. Therefore, utilizing these mechanisms to construct game models in the local setting is a promising research direction.
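For concreteness, the following minimal Python sketch illustrates k-RR (generalized randomized response over a categorical domain); it is an illustration of the mechanism family cited as [1, 12], not code from those works, and the domain values are hypothetical.

    import math
    import random

    def k_rr(value, domain, epsilon, rng=random):
        # Keep the true value with probability e^eps / (e^eps + k - 1); otherwise
        # report one of the other k - 1 domain values uniformly at random,
        # which satisfies eps-LDP over the given domain.
        k = len(domain)
        p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
        if rng.random() < p_keep:
            return value
        return rng.choice([v for v in domain if v != value])

    # Hypothetical categorical attribute with four possible values.
    print(k_rr("A", ["A", "B", "AB", "O"], epsilon=1.0))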

Game Analysis for DOO and DCC. Relatively speaking, there are few works on DOO and DCC for data sharing and publishing. At present, the condition for constructing such a privacy game is that there exists privacy correlation between owners or between collectors. This privacy correlation has two aspects: (i) when a data owner/collector improves his/her degree of privacy preservation, the privacy of the others also increases; and (ii) due to the correlation of the data, the risk grows as more data owners are added. It remains a challenge to define a proper privacy metric for computing the correlation of privacy.

Multiple Agents for Privacy Games. Existing privacy games usually involve two agents interacting with each other. However, in real environments, more than two agents may participate in the game. For example, data collectors receive data from data owners and then sell it to a third party. This implies that the privacy model should consider interactions not only between data owners and collectors, but also between data collectors and the third party. Constructing such a privacy model and analyzing its Nash equilibrium points is more challenging.

5 Conclusion

This paper surveys related work on privacy games in data sharing and publishing. We have classified privacy games into four types, i.e., DOO, DOC, DCA and DCC. We have presented a number of game theory-based works that analyze the behaviors of different participants with respect to data privacy. Our objective is to help readers understand the existing works and the possible future directions.