1 Introduction

Social media becomes popular, such as Facebook, Instagram, and Twitter in recent years. People enjoy to communicate and share information with friends on the air. The social media is also a useful service for making friends and allowing users to reach other users from friend lists of their friends easily. As already known, it incurs the potential threat to users and their friends [9]. For example, the scoundrel can distinguish who is closer to you through your friend list and kidnap your acquaintances to threaten you. It is common on the news. Currently, people become aware of these serious issues and realize the necessity of preventing their friend lists to leak out. [8].

Unfortunately, recent researches show that the adversary can infer friends through check-ins of users even though that users hide their friend lists. These studies indicate that the social strength between two users could be deduced by their co-occurrences [5, 7, 12, 16, 19, 24], and the adversary cannot infer the friendships between users only when all friends configure their privacy levels to conceal their check-ins. As one can be imagined, the concealment of check-ins violates the willingness of users to share check-ins during their social journey joyfully. Despite its importance, how to appropriately resolve this issue remains a challenging task thus far.

In this paper, we explore a novel mechanism to prevent the malicious inference of friendships from user check-ins. The inspiration comes from our investigation on the publicly-available check-in dataset [6], which is collected from a location-based social service Gowalla. As observed the check-in history from November 2009 to September 2010, we clearly see that the friend lists will be precisely inferred when users reveal their stays in the social network as time advances. We utilize the state-of-the-art PGT mechanism for the friendship interference [24]. The “Unshielding” curve with the cross marker in Figure 1 refers to the performance (in terms of Precision@10) of the PGT method for friendship inference using Gowalla data. Eventually, almost all friends in average (Precision@10 ≈ 0.9) will be precisely inferred on 2010/9. While considering the case that the user stops to use the check-in function after 2010/5, the concern for inferring friendships still cannot be effectively diminished (as shown in the curve of the square marker in Figure 1).

Figure 1
figure 1

Observation on the Gowalla data

In practice, the friendship inference generally relies on the fact that friends may reveal their co-location over time via the check-in actions in location-based social services. The spatial and temporal closeness between check-ins of two users will provide the clue to their friendship. Based on such observations, we explore the mechanism to make chaotic check-ins without much compromising the intention of check-ins in the spatial-temporal sense in this paper. For example, when a user submits a check-in action to share his/her activity in the campus, the social check-in system can suggest the user to defer the check-in time or suggest to check in at the location with a loose spatial granularity (e.g., change the exact classroom location to the department building). As such, the slight obfuscation of check-ins from friends can significantly increase the difficulty to precisely infer friendships. To achieve this, we devise in this paper the novel “shielding check-in”. The essential idea of shielding check-in is to recommend secure places for user to check in so that the potential that the adversary correctly infers the friendships between users can be significantly reduced. In other words, by guiding the users to check in at alternative secure places, rather than the unshielding ones, the acquaintance privacy of users can be better protected. We propose the C heck-in S hielding S cheme (CSS) framework to realize the idea of shielding check-in. The result of applying shielding check-in after May 2010 can also be found in Figure 1. It can be convincingly seen that the performance of PGT will be clearly diminished as time advances, showing the effectiveness of shielding check-in. Such results demonstrate that shielding check-in can effectively reduce the potential that the adversary infers the friendships between users.

In the literature, the most relevant problems include Privacy Preserving in Social Networks [11, 14, 22, 23, 25, 26] and Privacy Preserving in Geo-social Services [1, 2, 4, 15, 21]. The goal of privacy preserving in social network is to anonymize the social network data for publishing. The social network data processed in this task is static, but in our work, we consider the dynamics of user check-ins – given the social network provided by the service provider, the check-in records will be generated by users over time. Besides, these previous studies for privacy preserving in geo-social services devoted to protect users’ location privacy, i.e., preventing users’ visited locations or home places from being identified. The goal of our work is to prevent users’ acquaintances (i.e., social connections) from being inferred by the adversary. Recent studies started to obfuscate the check-in data for protecting friendship privacy [3], but their tackled problem is static. That says, they presume all check-in data are given, and attempt to obfuscate the static check-in data for publishing. We believe that such setting is unrealistic because in real applications, we need to advise users to choose a better place to check in so that their friendships cannot be attacked by the adversary in the future. In other words, the check-in data used in this work is dynamic. This practical setting brings a challenge – what we have to shield users’ check-in is partially observed, i.e., the past check-ins till the present, we do not know the future check-ins. To this end, we aim at recommending a set of secure places to the user for shielding check-ins against friendship inference deployed by the adversary.

Here we create an example to explain the differences between traditional check-in and shielding check-in scenarios, as shown in Figure 2, in which the upper half is traditional check-in while the lower half is shielding check-in. The traditional check-in system shows the geographical distance and the check-in count of each candidate place for the user. None of the displayed places are examined to ensure the protection of acquaintance privacy. The adversary may thus infer the friends of users with high accuracy using their existing check-in records [12, 19, 24]. To boost the privacy degree of acquaintance and to reduce the effectiveness of friendship inference from the adversary, in this paper, we propose to develop the shielding check-in scenario. The new scenario is expected to evaluate the privacy risk of each candidate place for users so that they can choose the one that ensures higher privacy while also delivering their check-in intent (e.g., close to the current coordinate). Specifically, the place having lower privacy risk means that this place is securer for shielding against acquaintance inference. On the contrary, the places with higher privacy risk will incur accurate friendship inference by the adversary. The shielding check-in system will generate the place list, in which places with lower privacy risk will be put at top positions. If the users choose to check-in at the place with lower privacy risk, the accuracy of successful acquaintance inference will decline.

Figure 2
figure 2

Traditional check-in versus shielding check-in

We propose C heck-in S hielding S cheme (CSS) to implement the shielding check-in system, which consists of two parts. The first part is social strength quantification. We estimate the social strength based on the co-occurrences between users using the check-in data. The utilization of social strength can be divided into two points of views. For the adversary, the attackers tend to devise unsupervised approaches to infer the social ties between users. For the defending party, we need to ensure that the recommended shielding places can avoid the successful inference of friendships. Here we consider four essential factors for social strength quantification, including Global, Personal, Temporal, and Diversity. Each of them can be treated as either the adversary’s attacking strategy or the shielding strategy of the defending party. The second part is shielding place list generation. While a check-in action is executed, a list of near-by places will be generated as shielding check-in recommendation. Under a particular social strength factor as the adversary’s attacking strategy (but we cannot know in prior), these shielding places are expected to significantly reduce the social strength values between users who are acquainted with each other. Meanwhile, the social strength values between strangers (non-acquaintances) are also boosted. Note that CSS aims at protecting friendship information by ranking the shielding power for recommended check-in places. In practice, the service vendor can re-order the shielding list in light of user preference (such as sightseeing places first) while providing their shielding gain for reference. In addition, when the new check-in data is produced by adopting the recommended secure places, we are required to ensure the intent of the check-in (geographically close to her current coordinate) and preserve the usability of such check-in data (in real applications like Point-of-Interest recommendation).

Experiments conducted on Gowalla and Foursquare check-in dataset reveals the following findings. First, under the evaluation metric of Precision@N, CSS shows significant performance compared with several common heuristic competing method, the accuracy that the friends of users being inferred is clearly reduced. Second, when CSS is applied and new check-in records are generated, the usability of new check-in data is validated in both geographical distances and Point-of-Interest (POI) recommender systems. Third, a meaning measure, social density, is created and proven as an effective criterion to determine whether or not users adopt the competing shielding method. Fourth, a set of parameter studies exhibit the robustness of CSS under different factors.

The contributions of this work can be summarized as follows.

  • We formulate the C heck-in S hielding against A cquaintance I nference (CSAI) problem: recommending secure places for user check-ins so that the adversary tends to incorrectly infer the friendships between users. We also show the hardness of the CSAI problem. We are the first attempt to recommend check-in places for anti-inferring acquaintance.

  • We develop the C heck-in S hielding S cheme (CSS) framework, which consists of social strength quantification and shielding place list generation for reducing the potential that acquaintances being inferred by the adversary. We propose four factors, including Personal, Global, Temporal, and Diversity, to measure the social strength between users.

  • Experiments conducted on Gowalla and Foursquare check-in datasets show that the proposed CSS framework can deliver promising results in terms of reduced accuracy, compared with several competing methods. The proposed CSS can also ensure not only the preservation of check-in distance towards the current locations but also the usability of check-in data for POI recommendation.

The remainder of this paper is organized as follows. In Section 2, we review several relevant studies. Section 3 gives the problem formulation and proves its hardness. The proposed Check-in Shielding Scheme (CSS) will be elaborated in Section 4. The experimental results are presented in Sections 5 and 6 concludes this work.

2 Related work

Today large populations of individuals spend a significant portion of their time on social networking services. The trend has given rise to the growth of social network analysis, and the privacy issue of users is getting more attention than before. In the literature, researches had proposed various privacy protection techniques against different attacking behaviors of the adversary in the social network. In this section, we review previous studies related to privacy preserving in social networks, privacy preserving in geo-social services, and social ties inference in location-based social networks.

Privacy Preserving in Social Networks

The privacy preserving problem in social networking industry was early addressed [11, 14, 26]. Zhou et al. [26] first develop the anonymization techniques for privacy preserving publishing of social network data. Three important dimensions are identified in this work: privacy, background knowledge, and data utility. The authors also categorize different anonymization methods into two categories, clustering-based approaches and graph modification approaches. In addition, Liu et al. [14] investigate a specific graph-anonymization problem for identity anonymization on social graphs. The anonymization mechanism prevents the identities of users from being predicted by the adversary with prior knowledge of the network structure. They propose principle-based algorithms to examine the realizability of degree sequences against identity attacking. Moreover, a resisting structural identification in anonymized social networks is developed [11]. The authors quantify the privacy risks based on the knowledge used by the adversary. They prove that risks of attacks are varied greatly based on the network structures.

Recently, to prevent structural re-identification attacks, various graph modification methods have been proposed [22, 23, 25]. Tai et al. [23] identify the degree-based friendship attack, propose the idea of k2-degree anonymity, and exploit Integer Programming formulation to against such attacks. In addition, to cope with the privacy attackers, the concept of k-NMF anonymity is introduced, and an algorithm to ensure the k-degree anonymity plus the k-NMF anonymity is developed [22], in which it allows the occasional addition but no deletion of vertices. Wang et al. [25] propose two k-anonymization algorithms to protect a social network against the friendship attacks. One algorithm is devised based on adding dummy vertices, and the other is invented based on edge modification. They prove that their algorithms are able to generate k-anonymized networks with good utility.

Privacy preserving in Geo-social Services

Recent studies shift to preserve location privacy in geo-social applications [1, 2, 4, 15, 21]. Some of them add noise to users’ locations for privacy preserving. For example, Darakhshan et al. [15] propose an algorithm modifying locations by adding controllable noise to achieve differential privacy in geo-social networks. In addition, Andres et al. [2] present a perturbation technique for achieving geo-indistinguishability, which is drawn from a planar Laplace distribution. Similarly, researchers in [4] also consider the geo-indistinguishability approach to protect location privacy. They construct a mechanism that minimizes the service quality loss by using linear programming techniques. On the other hand, Puttaswamy et al. [21] develop a novel alternative that provides improved location privacy without adding uncertainty into query results. They want to apply coordinate transformations for distance preserving so that servers are unable to infer the actual locations of users from the transformed data. Acs and Castelluccia [1] present a new anonymization scheme to compute the spatio-temporal density of users. Their scheme is devised based on the technology of differential privacy, and provides provable privacy guarantee to each individual in the dataset.

We consider in this paper a different perspective of privacy preserving in location-based social networks. First, we aim to recommend secure places for real-time user check-ins, rather than modifying some of the entire dataset for publishing. Second, the adversary mentioned here utilizes some friendship inference methods rather than de-anonymization techniques. Third, we intend to protect the acquaintance privacy of users, instead of user identities and their visited locations. Fourth, our goal is to recommend alternative places that preserves data usability and user check-in intent. As a result, the proposed method neither perturbs the existing check-ins nor to change nodes and edges in the social networks.

Social ties inference in location-based services

From the perspective of the adversary, the goal is to attack the friendships between users, i.e., inferring the acquaintances of any given user. Hence, we discuss the relevant studies of social tie inference based on check-in data [3, 5, 7, 12, 16, 19, 24]. Cranshaw et al. [7] introduce a set of location-based features such as location entropy for analyzing the social context of a geographic region, and devise a model to predict friendships between users by analyzing their location trails. Pham et al. [19] propose an entropy-based model to not only infer social connections but also estimate the strength of social connections by analyzing people’s co-occurrences using check-in data. Hsieh et al. [12] further develop a two-stage feature engineering by identifying the direct and indirect linkages between users according to a check-in co-location graph. Cheng et al. [5] also focus on predicting whether two individuals are friends based on their mobility information. They simultaneously consider the co-occurrences and the visiting time intervals between users. In fact, these social ties inferences methods are based on purely check-in data, and thus can be considered as the adversary in our work. Backes et al. [3] devise an advanced feature learning technique to summarize users’ mobility features. They also proposed three defense mechanisms, i.e., hiding, replacement, and generalization, for protecting the social link privacy. Nevertheless, their technical goal is to obfuscate the entire set of check-in data for publishing, and their methods cannot handle the dynamic check-ins against the adversary of acquaintance inference in our work.

3 Problem formulation

Suppose there is a check-in set of n users. We first define the check-in records of a user i as a sequence of places and timestamps: \(A_{i} = \langle {a^{i}_{1}}, {a^{i}_{2}},{\ldots } \rangle \), where \({a^{i}_{m}} = ({p^{i}_{m}},{t^{i}_{m}})\) denotes the m-th check-in of user i. The couple \(({p^{i}_{m}},{t^{i}_{m}})\) represents the ID of the place \({p^{i}_{m}}\), and the timestamp \({t^{i}_{m}}\) that user i performs the check-in.

Given any two users i and j, the traditional method to measure the social relationship between user i and j is to calculate their social score by a certain interaction scoring function \(\mathcal {F}\) based on the co-occurrenceOij information [19, 24]. A co-occurrence is formed when two users checked in at the same place within a pre-defined time period τ. Note that the setting of τ empirically depends on the applications. Let Oij = {o1,o2,… } denotes the sequence of co-occurrences between user i and j. Each co-occurrence om contains a place and a timestamp, om = (pm,tm), where tm can be obtained by averaging the timestamps that user i and j check-in at pm within a pre-defined time period τ.

Given the co-occurrences Oij between user i and j, we propose to estimate their social strengthsij, which is a quantitative score derived by the interaction function \(\mathcal {F}\) that depicts how two users interact with one another. A higher value of sij reflects that user i and j tend to be acquainted with each other. One can consider that the social strength sij can be regarded as an unsupervised measure to predict user friendships. The detailed realization of \(\mathcal {F}\) will be discussed in Section 4.1. Since each user i has a value of social strength with any other user, we can create the matrix S = [sij] to represent all of the social strength values between every pair of user i and j.

The essential assumption of this work is that the adversary will devise some attacking strategies to infer the friendships between users based on their check-in records. We believe that the adversary’s strategy usually resorts to confidently approximate the connections between users by estimating their social strength from check-in data. Therefore, as the defending party (i.e., the service providers), the fundamental shielding idea is naturally to cheat the adversary. That says, a good shielding method should make the adversary inaccurately infer the friendships, i.e., treating non-acquaintances as acquaintances and regarding acquaintances as non-acquaintances. Since the friendship inference is derived by estimating the social strength values between users, the design of a shielding method can be formulated as: increasing the social strength values between non-acquainted user pairs and simultaneously decreasing the social strength values between acquainted user pairs. To achieve such a goal, we define the A nti-I nferring Acquaintance R atio (AIR) as the objective to measure how effective a shielding method can be. The formal definition of AIR is presented as below.

Definition 1

Anti-inferring acquaintance ratio: Given the check-in record set A containing the check-ins of all users, and assume that the social strength matrix S (can be derived by a certain interaction function \(\mathcal {F}\)) is constructed using the unshielding check-in data. Suppose a shielding method \(\mathcal {M}\) can recommend alternative secure places and users adopt such places for check-ins, the new social strength matrix is \(\bar {S}\). We define the AIR score between S and \(\bar {S}\) under a certain shielding method \(\mathcal {M}\) as below:

$$AIR_{\mathcal{M}}(S,\bar{S})=\sum\limits_{i,j} \left\{\begin{array}{ll} \frac{s_{ij}-\bar{s}_{ij}}{s_{ij}} & \text{, if user } i \text{ is acquainted with } j \text{ and } s_{ij} \neq 0 \\ -\bar{s}_{ij} & \text{, if user } i \text{ is acquainted with } j \text{ and } s_{ij} = 0 \\ \frac{\bar{s}_{ij}-s_{ij}}{s_{ij}} & \text{, if user } i \text{ is not acquainted with } j \text{ and } s_{ij} \neq 0 \\ \bar{s}_{ij} & \text{, if user } i \text{ is not acquainted with } j \text{ and } s_{ij} = 0 \end{array}\right. $$

The AIR takes all user-pairs into consideration. If user i and j are acquainted with each other and the shielding method \(\mathcal {M}\) can lower down their social strength score, i.e., \(s_{ij} > \bar {s}_{ij}\) (sijS and \(\bar {s}_{ij} \in \bar {S}\)), then the AIR score will get increased. It is because the potential of successful friendship inference via social strength is reduced. On the contrary, if a pair of users i and j have no friendship and the shielding method \(\mathcal {M}\) can boost their social strength score, \(\bar {s}_{ij} > s_{ij}\), then the AIR score will also get boosted. It is because the possibility that two strangers being inferred to have the friendship is raised. In short, inaccurate inference of friendships will increase the score of AIR while precise identification of friendships and non-friendships will damage the AIR score. A good shield method should generate higher AIR score by recommending secure check-in places. Note that some social strengths in social strength matrix S may be zero. However, these zero social strengths will be potentially increased after several check-ins coming as time advances. We need to consider the situation of sij = 0; otherwise, the calculation will be divided by zero.

Here we provide an illustrative example to elaborate the computation of AIR. Suppose that there are three users i, j and k. Among them, user i is acquainted with user j, and user k is not acquainted with users i and j. We also assume that these three social strength scores among such three friendships are: sij = 1.5, sik = 0.6, and sjk = 0.3. That says, the expected friendship inference will be: the potential that i is acquainted with j is larger than the potential that i is acquainted with k. While applying a certain check-in shielding method \(\mathcal {M}_{1}\) to recommend alternative secure places for some of these users’ future check-ins, the social strength scores become sij = 1.8, sik = 2.5, and sjk = 0.9, indicating that the adversary will generate the following inference: the potential that i and k are friends is larger than the potential that i and j are friends. We can derive the corresponding AIR score is: \(AIR_{\mathcal {M}_{1}}(S,\bar {S})= -\frac {1.8-1.5}{1.5}+\frac {2.5-0.6}{0.6}+\frac {0.9-0.3}{0.3}=\frac {14.9}{3} \approx 4.97\). If another shielding method \(\mathcal {M}_{2}\) can generate the social strength scores sij = 2.8, sik = 1.4, and sjk = 0.8, the corresponding AIR score is: \(AIR_{\mathcal {M}_{2}}(S,\bar {S})=-\frac {2.8-1.5}{1.5}+\frac {1.4-0.6}{0.6}+\frac {0.8-0.3}{0.3}=\frac {6.4}{3} \approx 2.13\). Since the shielding method \(\mathcal {M}_{1}\) can generate higher AIR score than \(\mathcal {M}_{2}\), \(\mathcal {M}_{1}\) is believed to be a better shielding method as it can significantly reduce the probability that friendships are precisely inferred while also boosting the possibility that non-acquaintances are incorrectly inferred as friends.

Problem 1

Check-in shielding against acquaintance inference (CSAI): Given a timestamp t, and a set of targeted users U = {u1,u2,…,un}, along with their check-in data records At(U) till timestamp t, suppose that users will perform c times of check-ins in the future from t, the goal of a shielding method \(\mathcal {M}\) is to recommend alternative secure check-in places for these users at each of c check-ins such that the Anti-Inferring Acquaintance Ratio \(AIR_{\mathcal {M}}(S,\bar {S})\) is maximized, where S is the social strength matrix before c check-ins, and \(\bar {S}\) is the social strength matrix derived from the alternative secure check-in places via \(\mathcal {M}\).

The CSAI problem is defined as an online check-in process to recommend the best check-in place. In other words, we cannot know when and where the users will perform check-ins in the future from the current timestamp t. Consequently, it is unrealistic to assume S can be obtained at timestamp t to compute the AIR score and to accordingly recommend the secure check-in places. In order to understand the hardness of the CSAI problem, we define a relaxed problem of CSAI (abbreviated as R-CSAI), in which all check-in records of the targeted users are available. R-CSAI is a special case of CSAI. R-CSAI can be considered as an offline secure check-in obfuscation.

Problem 2

Relaxed check-in shielding against acquaintance inference (R-CSAI): Given a set of targeted users U, and all of the check-in records Ai for each user iU, that users will perform c times of check-ins in the future from t, suppose the check-in places in Ai can be modified at most m times, the goal of a shielding method \(\mathcal {M}\) is to recommend alternative m secure check-in places for these users such that the Anti-Inferring Acquaintance Ratio \(AIR_{\mathcal {M}}(S,\bar {S})\) is maximized, where S is the social strength matrix obtained from Ai (iU), and \(\bar {S}\) is the social strength matrix derived from the set of modified check-in places by the shielding method \(\mathcal {M}\).

Theorem 1

The R-CSAI problem is NP-hard.

Proof

We prove that R-CSAI is NP-hard with the reduction from p-dispersion-sum problem (PDSP) [20], which is NP-hard. Specifically, the PDSP is the task of locating p facilities at some of n predefined locations such that the sum of distance between p facilities is maximized. In other words, given a set of predefined location N = {1,2,…,n} for the facilities to locate, the distance between these facilities i and j is denoted as dij. The PDSP is utilized to maximize the sum of distance between the p established facilities, where the locations of p facilities are selected from N. If the boolean variable xj indicates whether or not a facility is opened, the PDSP can be formulated as the following optimization problem:

$$\begin{array}{@{}rcl@{}} &\text{maximize}& \sum\limits_{i \in N}\sum\limits_{j \in N} d_{ij} x_{i} x_{j} \\ &\text{subject to}& \sum\limits_{j \in N} x_{j} = p \\ &&x_{j} \in \{0,1\}, j \in N \end{array} $$

To simplify the analysis, we consider that two targeted users in the R-CSAI problem, in which such two targeted users are not acquainted with each other. We use R-CSAI_S to represent this special case of R-CSAI problem. Given two users i and j and the check-in records A, the goal is to select a set check-in records M to change such that their social strength value between targeted users is maximized.

A feasible solution to the R-CSAI_S problem can be written as follows:

$$\begin{array}{@{}rcl@{}} &\text{maximize} & \sum\limits_{a_{i} \in A}\sum\limits_{a_{j} \in A} w(a_{i}, a_{j}) x_{a_{i}} x_{a_{j}} \\ &\text{subject to} & \sum\limits_{a_{j} \in A} x_{a_{j}} = p \\ && x_{a_{j}} \in \{0,1\}, a_{j} \in A \end{array} $$

The R-CSAI_S problem is equivalent to PDSP if we map every location iN in PDSP onto aiA in R-CSAI_S, and map dij onto w(ai,aj), which is the social strength contributed by ai and aj. Thus if there exists a set of check-in records M to modify, there also exists a set M for p-dispersion-sum problem. Clearly, the R-CSAI_S problem is NP-hard.

The R-CSAI_S problem is a special case of R-CSAI. In R-CSAI, we need to consider all user pairs and whether each pair is an acquaintance. Consequently, R-CSAI is also NP-hard. □

Given the R-CSAI problem is NP-hard, it is natural that our targeted CSAI problem is more challenging since we cannot consider all of the future check-in records for shielding at the same time. Nevertheless, for practical applications, R-CSAI is unrealistic to presume that all check-ins are available. Therefore, we concentrate on solving the CSAI problem in this work. The shielding method proposed in the next section is to tackle the CSAI problem.

4 Methodology

In this section, we present the proposed Check-in Shielding Scheme (abbreviate as CSS) framework to tackle CSAI problem. The CSS framework consists of two major components: (a) social strength quantification and (b) shielding place list generation. Though we have the social network information from the aspect of service provider, we have no idea about the adversary’s acquaintance inference strategy. Hence, we leverage the former component to estimate the social strength value between users, and consider the derived value as an unsupervised approximation of their social relationship. Based on the obtained social strength values, in the latter component, we aim at generating the list of places as the shielding recommendation for each new-coming check-in action.

4.1 Social strength quantification

Given that the check-in records of users can be used to infer their friendships, in this section, we propose to estimate the social strength between users, which is designed to characterize user interaction in geography. From the check-in data, we can extract the sequence of co-occurrences between users. Co-occurrence information had been proven to be an essential and strong clue for friendship inference [5, 7, 12]. Along with the co-occurrence information, we believe that there are four factors significantly contributing to the estimation of social strength between two users. They are Personal, Global, Temporal, and Diversity. Note that each factor has its advantages and disadvantages. We cannot just exploit one factor to measure the social strength between users. We not only consider four factors to measure the social strength between users, but also ensemble them simultaneously. The ensembling of four factors will be discussed in Section 4.2. Below we elaborate the intuitions and the quantification details for these social strength factors. Note that given each of the following defined social strength factor function \(\mathcal {F}(O_{ij})\), where Oij is the set of co-occurrences between user i and j, each element sij in the social strength matrix S can be obtained via \(s_{ij} = \mathcal {F}(O_{ij})\).

4.1.1 Personal factor

Every user has some places that she frequently visits, and has some places that she rarely visits. A certain place may mean distinct for different users, and different places may also mean distinct for a particular user. Therefore, to capture the social relationship with other users, we need to quantify the importance of each place to the user based on her check-in history. For simplicity, we use \({A^{k}_{i}}\) to denote the set of check-in records of user i who had ever visited place k, given by:

$${A^{k}_{i}} = \{({p^{i}_{m}}, {t^{i}_{m}})\in A_{i}|{p^{i}_{m}}=p_{k}\}, $$

where condition \({p^{i}_{m}}=p_{k}\) can be examined according to place IDs. The personal factor models the probability that a user check-in at a place. The probability of user i check-in at place k can be computed by:

$$p(i,k) = \frac{|{A^{k}_{i}}|}{|A_{i}|}, $$

where Ai is the entire set of locations visited by user i.

Given the probability computed by the personal factor to estimate the place importance to users, we can determine the significance of a co-occurrence ok between user i and user j at place k. The potential that a co-occurrence happens if two users check-in at places where they frequently visit. On the contrary, if two users check-in at places where they rarely visit, then they tend to possess lower potential to be acquainted with one another. Consequently, we can weight a co-occurrence okOij based on the personal factor. The personal co-occurrence weight between user i and j at place k, denoted by \(w^{p}_{ij}(o_{k})\), can be defined as below:

$$w^{p}_{ij}(o_{k}) = -\log(p(i,k)\cdot p(j,k)) $$

Given the set of co-occurrences Oij between user i and j, based on the personal co-occurrence weight \(w^{p}_{ij}(o_{k})\), we propose the personal social strength, denoted by \(\mathcal {F}_{p}(O_{ij})\), as follows:

$$ \mathcal{F}_{p}(O_{ij}) = \sum\limits_{o_{k} \in O_{ij}} w^{p}_{ij}(o_{k}). $$

If two users co-occur at more places and these places are frequently visited by them, a higher value of personal social strength will be derived. It is expected that two users possessing higher value of personal social strength tend to be acquainted with each other.

4.1.2 Global factor

While the personal factor emphasizes the place importance to the targeted pair of users, the popularity of places is also an essential clue to estimate the social strength between users. The co-occurrences at popular places mean less to the acquaintance between users, but the co-occurrences at private places that are usually far less popular provide a stronger evidence to characterize social relationships. For example, a shopping center is much more popular than someone’s home, and thus the shopping center is checked in by a larger number of people. Therefore, we should treat these places differently based on their popularity when modeling the social strength between users. We term such consideration as the global factor.

To model the global factor, we measure the popularity g(k) of a place k using the place entropy based on the number of check-ins. The place entropy is originally proposed by a previous study [7] and also adopted in recent work [19, 24]. Based on user i’s check-in records at place k (i.e., \({A^{k}_{i}}\)), we first compute the probability P(k,i) that place k was checked in by user i, given by:

$$P(k,i) = \frac{|{A^{k}_{i}}|}{{\sum}_{i} |{A^{k}_{i}}|} $$

Then the Shannon entropy of place k, denoted by g(k), can be estimated using the probability values of all users who had ever checked in at place k:

$$g(k) = -\sum\limits_{\{i|P(k,i)\neq 0\}} P(k,i) \cdot \log P(k,i) $$

For a co-occurrence ok = (pk,tk), we consider the global factor to measure the social strength. Generally speaking, private places tend to possess lower entropy values because such places are almost visited by the same individuals. Public places tend to have higher entropy values since they are often visited by different people.

We can weight a co-occurrence okOij based on the global factor. The global co-occurrence weight between user i and j at place k, denoted by \(w^{g}_{ij}(o_{k})\), can be defined based on place entropy and the exponential function [19], given by:

$$w^{g}_{ij}(o_{k}) = exp(-g(k)) $$

Given the set of co-occurrences Oij between user i and j, based on the global co-occurrence weight \(w^{g}_{ij}(o_{k})\), we propose the global social strength, denoted by \(\mathcal {F}_{g}(O_{ij})\), as follows:

$$\mathcal{F}_{g}(O_{ij}) = \sum\limits_{{o}_{k} \in O_{ij}} w^{g}_{ij}(o_{k}) $$

The intuition of global social strength is that if two users frequently co-occur at private places, a higher value of global social strength can be obtained. It is expected that two users possessing higher value of global social strength tend to be acquainted with each other.

4.1.3 Temporal factor

In addition to considering the co-occurrences separately in both personal and global factors for social strength estimation, the consecutive co-occurrences over time provide complementary clues. The basic idea is that two users acquainted with one another tend to meet occasionally (e.g. the time gap between two co-occurrences may usually be several days, weeks, or months). Strangers have higher potential to meet several times within a very short time period (e.g. co-occurring more than 10 times within one day, which may result from taking the same train or staying at the same public place). Therefore, a co-occurrence ok should be penalized if other co-occurrences appear temporally close to co-occurrence ok.

To estimate the social strength from the temporal perspective, we first take advantage of the exponential function [24] to quantify the temporal correlation t(ok,ol) between co-occurrence ok and ol based on their time difference as below:

$$t(o_{k},o_{l}) = \exp(-c_{t} \dot |t_{k} - t_{l}|), $$

where tk is the time of the co-occurrence ok. The parameter ct should be properly determined to scale the significance of time difference between co-occurrences. Two consecutive co-occurrences with shorter time difference will derive higher temporal correlation while those with longer time difference will lead to lower temporal correlation. To reflect the acquaintance based on the temporal factor, we need to penalize co-occurrences with higher temporal correlation and let those with lower correlation have higher weight for acquaintance. Hence, we define the temporal co-occurrence weight \(w^{t}_{ij}(o_{k})\) between user i and j at place k based on existing existing co-occurrences, given by:

$$w^{t}_{ij}(o_{k}) = \left\{\begin{array}{ll} 1-t(o_{k},o_{k-1}) & ,\ k>1 \\ 1 & ,\ k = 1 \end{array}\right. $$

Given the set of co-occurrences Oij between user i and j, based on the temporal co-occurrence weight \(w^{t}_{ij}(o_{k})\), we propose the temporal social strength, denoted by \(\mathcal {F}_{t}(O_{ij})\), as follows:

$$\mathcal{F}_{t}(O_{ij}) = \sum\limits_{o_{k} \in O_{ij}} w^{t}_{ij}(o_{k}) $$

The intuition of temporal social strength is that if the consecutive co-occurrences between two users have larger time gaps, a higher value of temporal social strength can be derived. It is expected that two users possessing higher value of temporal social strength tend to be acquainted with one another.

4.1.4 Diversity factor

People acquainted with each other tend to visit multiple different places together, rather than almost visiting the same places few times [6, 7, 18]. That says, higher diversity of co-occurred places can be an important clue of acquaintance. Therefore, co-occurrences at diverse places should be considered in the measure of social strength. Here we propose to model the diversity of co-occurrence based on the Shannon entropy. Let \(O^{k}_{ij}=\{o_{m} \in O_{ij}|p_{m} = p_{k}\}\) be the set of co-occurrences between user i and j at place k. Oij is the set of all co-occurrences between user i and user j at all places. The probability \(P^{k}_{ij}\) of co-occurrence at place k between user i and j can be represented by:

$$P^{k}_{ij} = \frac{|O^{k}_{ij}|}{|O_{ij}|}. $$

We can use the idea of entropy to define the diversity based on the probability \(P^{k}_{ij}\). Given the set of all co-occurrences Oij between user i and j, the co-occurrence entropy Hij can be defined as below:

$$H_{ij} = -\sum\limits_{k}P^{k}_{ij} \log P^{k}_{ij}. $$

Based on the derived entropy Hij, we use the exponential function of the entropy to define the diverse social strength, denoted by \(\mathcal {F}_{d}(O_{ij})\), as below:

$$\mathcal{F}_{d}(O_{ij}) = \exp (H_{ij}) $$

The intuition of diverse social strength is that if the co-occurrences between two users appear at more different places, a higher value of diverse social strength can be obtained. It is expected that two users possessing higher value of diverse social strength tend to be acquainted with each other.

The co-occurrence weight of diversity \(w^{d}_{ij}(o_{k})\) between user i and j at place k is different from other factors. Because the calculation of diversity should consider all co-occurrences, the co-occurrence weight of diversity is the difference between co-occurrence set with new co-occurrence and co-occurrence set without new co-occurrence .It can be denoted as follows:

$$w^{d}_{ij}(o_{k}) = \mathcal{F}_{d}(O_{ij}) - \mathcal{F}_{d}(O_{ij} \setminus \{o_{k}\}) $$

4.2 Shielding place list generation

Recall that in our problem definition, our goal is to recommend a list of shielding places such that the score of Anti-Inferring Acquaintance Ratio (AIR) before and after adopting the shielding method \(\mathcal {M}\), i.e., \(AIR_{\mathcal {M}}(S,\bar {S})\), can be maximized, where S and \(\bar {S}\) are the social strength matrix before and after using \(\mathcal {M}\), respectively. Given a user i with existing check-in records Ai, at each time she would like to perform check-in at a certain geographical coordinate q, we aim at generating a list L of shielding places to meet such a requirement. The idea of maximizing the AIR score is to expect that the recommended shielding place for check-in can reduce the social strength values between user i and her friends and boost the social strength values between user i and her non-acquaintances. Then the potential that the adversary accurately infers the acquaintances of user i can be lowered down. In other words, given a candidate place p, if it can lead to higher the AIR score, we say it possesses good “shielding effect” and should be added into the shielding place list.

However, directly finding places in geography that maximizes AIR is very inefficient and unpractical for real-world check-in services. The reason is three fold. First, to derive the AIR score, we need to compute the values of social strength for every user pairs. Such an action involves in a large number of redundant computation because not all social strength values of user pairs will be affected when user i performs a check-in action at a certain coordinate q. In fact, only those co-occurred with user i at the neighborhood of coordinate q can be influenced. Second, in the original AIR computation, the social strength values (i.e., \(\bar {s}_{ij}\)) derived from all check-in records and the next check-in place p need to subtract (or be subtracted by) those without p (i.e., sij). Such subtraction would lead to additional computation cost while we have shown that using only the social strength values of \(\bar {s}_{ij}\) can nearly approximate the results with the subtraction. Third, another significant redundancy when computing AIR comes from existing check-in records of user i (i.e., Ai). Each time when we consider the shielding effect of a candidate place p, the AIR computation will repeatedly calculate social strength values using Ai, but it is clearly unnecessary. Our empirical study shows that considering only the social strength value of the candidate place p is enough.

Based on the above three observations and equipped with the four factors of social strength, to boost the computation efficiency when generating the shielding place list, we define the Shielding Gain to measure the shielding effect of a given candidate place. When user i wants to check in at coordinate q, the near-by places that can lead to higher scores of shielding gain will be added to the list of recommended shielding places.

Definition 2

Shielding gain: Given the set acquaintances Fi of user i with her check-in records Ai, assume user i aims to check-in at place p that is in the neighborhood of her current geographical coordinate q at time t. Let N(p,t) be the set of users who performed a check-in action at p at time t, where |tt|≤ τ and τ is a pre-defined time window. Also let \(\hat {o}\) be a pseudo co-occurrence at place p at time \(\frac {t+t^{\prime }}{2}\) between user i and j. The shielding gain that user i check-ins at place p, denoted by δi(p), is defined as below:

$${\delta^{y}_{i}}(p) = -\alpha \sum\limits_{j \in F_{i} \cap N(p,t^{\prime})} w^{y}_{ij}(\hat{o}) + (1-\alpha) \sum\limits_{k \in N(p,t^{\prime}) \cap U \setminus F_{i}} w^{y}_{ik}(\hat{o}), $$

where y is the selected factor of social strength (y ∈{p,g,t,d}), U is the set of targeted users (e.g. users who adopt our shielding method), N(p,t) = {j|(p,t) ∈ Aj and |tt|≤ τ}, \(w^{y}_{ij}(\hat {o})\) is the factor y’s co-occurrence weight between user i and j, and α ∈ [0,1] is the parameter that determines the importance of shielding effect contributed between acquaintances and non-acquaintances.

The value of shielding gain \({\delta ^{y}_{i}}(p)\) gets higher if two situations happen: (1) the candidate check-in place p can make the adversary harder to infer the friends of user i, i.e., reducing the co-occurrence weights \(w^{y}_{ij}(\hat {o})\) of social strength factor y, and (2) the candidate check-in place p can lead to incorrectly infer i’s non-acquaintances as her friends, i.e., increasing the co-occurrence weights \(w^{y}_{ik}(\hat {o})\). In other words, if place p can boost the value of \(w^{y}_{ij}(\hat {o})\), we add a negative effect in the shielding effect. If place p can raise the value of \(w^{y}_{ik}(\hat {o})\), a positive effect will be added into the shielding gain. We put such two parts together to be the shielding gain. It is worthwhile noticing that we consider N(p,t) to lower down the computation cost since only those who have ever performed check-ins at p at nearby timestamp t (|tt|≤ τ) can be considered as co-occurrences with user i.

Eventually, given the current geographical coordinate q of a user i at time t, we aim at finding a list of places as the recommended shielding ones from q’s neighboring location set Γr(q) = {p|dist(p,q) ≤ r}, where r is the radius centered at coordinate q in meters. The underlying idea is to avoid user i check-in with her friends but guide user i to check-in with non-acquaintances. Here we provide the complete algorithm Check-in Shielding Scheme (CSS), as presented in Algorithm 1, to generate the list of secure places for protecting the acquaintance privacy of user i.

When user i plans to check-in at her current geographical coordinate q, we first find the set Γr(q) of neighboring place with radius r centered at q (line 1). After preparing the final secure place list L (line 2), we examine all of the candidate places in Γr(q) to estimate the shielding gain δi(p) (line 3-9). For each place p ∈Γr(q), we first construct the set N(p,t) of users who have ever performed check-ins at place p at time t (|tt|≤ τ) (line 4). Then we create the pseudo co-occurrence \(\hat {o}_{ij}\) between user i and j, where jN(p,t) (line 5). By adding the co-occurrences \(\hat {o}_{ij}\) into the check-in records of user i (line 6), we can compute the shielding gain \({\delta ^{y}_{i}}(p)\) of user i at place p (line 7). Afterwards, place p is included in the final list L (line 8) and the co-occurrences \(\hat {o}_{ij}\) are removed from user i’s check-in records Ai (line 9). Last, we rank all places in final list L in descending order based on their shielding gain p.score (line 10).

figure a

In our CSS algorithm, the defending party (e.g., the service provider) is allowed to specify which factor of social strength (i.e., Personal, Global, Temporal, and Diversity) for estimating the shielding gain of places. One can imagine that the adversary may utilize one of the four factors as an unsupervised attacking method to estimate the potential friendships between users. According to existing literatures, unsupervised approaches to infer friendship based on check-in data are widely considered [5, 7, 12, 16, 19, 24]. It is worthwhile to mention that four factors will not combine together in this work. The reason is that if the proposed social strength can successfully capture the adversary inference strategy, the performance of shielding method would be better. If we incorporate four factors together, it will become less likely for our social strength to reflect the adversary’s strategy. Since it is impossible to know the adversary’s strategy in advance, we cannot prepare the corresponding factor in measuring the shielding gain. Therefore, to develop a more robust and stronger CSS method, we choose to consider such four factors simultaneously. In other words, we are neither to use only one factor, nor to combine them. We examine all of these four factors. And by a ranking-based ensemble method, as described below, we seek that any of them is the possible strategy adopted by the adversary.

The new version of CSS is to ensemble four social strength factors {p,g,t,d} based on their individual ranking results. For every place p ∈Γr(q) and for each factor y ∈{p,g,t,d}, we can generate a rank value \({\rho ^{y}_{i}}(p)\) based on the shielding gain \({\delta ^{y}_{i}}(p)\) in descending order. A higher \({\delta ^{y}_{i}}(p)\) value leads to a higher rank \({\rho ^{y}_{i}}(p)\) (i.e., lower rank values). In the re-ranking step of the CSS Algorithm (line 10 in the pseudo code), we can generate an ensembled rank value \(\rho _{i}(p) = {\sum }_{y \in \{p,g,t,d\}} {\rho ^{y}_{i}}(p)\). Eventually, places with lower rank values ρi(p) will be ranked at top positions and first recommended. Note that the CSS method throughout the experiments adopts such an ensemble approach to simulate the real cases.

Recall that in Figure 2, we mentioned that our shielding check-in system will provide a value of privacy risk for each candidate place. Places with lower values of privacy risk refer to more secure ones and should be chosen by the user. We can consider the ensembling rank ρi(p) of user i for place p as the corresponding privacy risk value. In other words, lower values ensembling rank lead to lower privacy risk.

5 Evaluation

In this section, we conduct experiments to examine our proposed Check-in Shielding Scheme (CSS). The evaluation goal is designed to answer the following questions. First, does the proposed CSS outperform with other competing methods? Second, in our CSS method, is the effect of acquaintance or non-acquaintance more significant than the other one in the performance? Third, how different social characteristics in places affect the shielding performance? Fourth, can the proposed method lead to better performance for a variety of adversary methods? Fifth, how does the number of existing check-ins of users affect the performance? Sixth, can the usability of the adjusted check-in data be maintained in terms of keeping users’ check-in intents and remain the quality of Point-of-Interest recommendation? All experiments are conducted on a Linux machine with a 3.40 GHz Intel i7 Core and 16G memory.

5.1 Data description

The data used for the experiments was collected from two location-based social networking services, Gowalla and Foursquare. Users share their visited locations by checking in at places on these two services. The check-in data of Gowalla are collected from February 2009 to October 2010 [6]. The check-in data of Foursquare are collected from January 2015 to February 2016 [13]. In both datasets, in addition to obtaining the check-in information: 〈 user ID, timestamp, latitude, longitude, location ID 〉, we also construct the friendship network. The statistics of both datasets are presented in Table 1. For the co-occurrence, it is worthwhile to mention that we do not aggregate the same location with the different names or different ranges for check-ins. We consider only check-ins with their corresponding location IDs. We do not aim at solving the problem of identifying various locations using check-ins, which goes beyond the goal of our work. We assume that check-in places can to some extent represent different venues even if they locate at the same coordinate. In fact, each check-in record is explicitly associated with a venue name, which reflects the user’s check-in intent. For the real deployment, it is necessary to deal with the issue of sharing the same location by different names and ranges, which is an important but orthogonal issue to our work.

Table 1 Statistics of Gowalla and Foursquare data

5.2 Evaluation settings

We compare the proposed CSS with the following six methods to recommend the places to shielding the friendships.

  • Random: It randomly chooses a place from all of the places.

  • Nearest: It selects the place nearest to the given place. This method is to preserve the distance usability.

  • Popular: It selects the most popular place in the neighborhood of the given place. In other words, the place with the most check-in count will be chosen. The method can obfuscate the friendships since many people check-in at the popular place.

  • Private: The most private place will be selected. That says, the place with the least check-in count will be chosen. In this way, this method can create the fake friendships to cheat the adversary.

  • node2vec [10]: node2vec is a well-known feature representation learning technique for nodes in network. The goal of node2vec is to learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods. node2vec can be extended to accommodate our problem as a competitive method. We follow the previous work to construct the user-place bipartite graph by check-in data [17]. The vectors of all nodes are produced by executing node2vec on the user-place bipartite graph. We calculate the cosine similarity between vectors of the given place and its neighboring places. Finally, the most similar place will be selected.

  • walk2friends-replacement [3]: Backes et al. proposed three mechanisms (i.e., hiding, replacement, and generalization) to mitigate the social link privacy risk in walk2friends. The replacement mechanism is similar to our goal that recommends a place for the user to check in. They adopt the random walk mechanism to find the locations close to the original ones. A user-place bipartite graph is constructed by using check-in data, and random walk starts from the user node in the graph. The last visited node in the trace of random walk will be selected for recommending. In our study, the number of steps in the random walk mechanism is 15 by following the conclusion in [3]. This competing method is abbreviated as “w2f-replace” in experimental figures.

Since there is no criterion to decide when users should adopt the places recommended by any method. Hence, we design the Social Density threshold (abbreviated as ξ) to determine whether one will use the recommended place. We consider that if a place exhibits high social density, it is exposed to the adversary with higher risk because more friendships can be observed here. The social density of a place is the percentage of friendships in all visiting user-pairs. Let Hi is the set of people who check in at location i, and the Ri is the set of their corresponding friendships. \(C^{|H_{i}|}_{2}\) presents the number of all user pairs in Hi. The social density can be presented as \(\frac {|R_{i}|}{C^{|H_{i}|}_{2}}\). When the social density of place is higher than the user-defined threshold ξ, user adopts the recommended check-in place. In other words, smaller threshold values of social density will make users more often to adopt the recommended places. The time-interval τ for the co-occurrence is set as one hour.

The evaluation metric used in our experiments is Precision@N, which estimates the percentage of relevant instances among the top-N ones. Here we consider N is a small number since investigators might concern more about those early reported. We empirically set N = 20 at most in this work. Let Fi(N) be the set of the top-N inferred acquaintances for user i and \(\hat {F}_{i}\) be the set of user i’s ground-truth friends. The score of Precision@N is defined as \(\frac {|F_{i}(N)\cap \hat {F}_{i}|}{N}\). A good shielding method against friendship inference by the adversary will lower down the score of Precision@N since it is able to mislead the adversary’s inference. In other words, under a particular adversary’s friendship inference method, if places derived by a shielding place recommendation method can generate the curve of Precision@N with a larger gap below the curve of using the unshielding places, then such a method can be considered as an effective one. Note that the adversary may exploit any subset of four factors as an unsupervised attacking method to infer the potential friendships between users. We shall not assume the utilized factors can be known in advance. To simulate the real cases, the CSS method in our empirical studies refers to the ensemble method which has been discussed in Section 4.2.

5.3 Experimental results

5.3.1 Comparison of recommended strategy

We first present the performance of our CSS framework, comparing to the six competitors. The experiments are conducted on Gowalla and Foursquare. The number of inferred friends N is varied from 1 to 20. Note that the parameter α of our CSS in this experiment is 0.8 by default. We will discuss the effectiveness of varying α values later in Section 5.3.2. In this experiment, we assume that the shielding mechanism works at all the time for users. Note that we will discuss the efficacy of different starting time of using the shielding place recommendation in Section 5.3.6. We aim at discussing the performance of various check-in place obfuscating methods, in which the better obfuscating methods can lead to lower precision. The friendship inference method used by the adversary is PGT [24], which is the state-of-the-art unsupervised friendship inference method.

Figure 3 presents the results in Gowalla data under social density thresholds 0.3 and 0.1. As can be apparently seen, CSS significantly outperforms all competitors. The lower social density threshold (i.e., ξ = 0.1 in this study) leads to the better performance for the competing methods. However, CSS is still better than competing methods. In addition, these methods, including nearest, popular, private and node2vec, walk2friends-replacement have no significant improvement as the social density threshold is decreased from 0.3 to 0.1 when N < 10. The result supports the argument that the number of recommended places will not significantly affect the early inferred friends. In addition, those places recommended by CSS can effectively shield the friendship inference.

Figure 3
figure 3

Comparison of different shielding methods in Gowalla (adversary:PGT)

Figure 4 shows the results in Foursquare data under social density thresholds equal to 0.003 and 0.001. Similarly, we can observe that Precision@N scores of CSS outperform competing methods with the largest reduced precision. The performance of competing methods merely gain a slight improvement when the social density threshold is reduced from 0.003 to 0.001, showing the situation as same as the experimental results in Gowalla. In addition, we also find that a larger number of recommended places might not lead to the better shielding improvement. Such results imply that recommending proper places is more important than recommending more places.

Figure 4
figure 4

Comparison of different shielding methods in Foursquare (adversary:PGT)

To clearly compare the performance between the competing methods and the proposed CSS, we create Tables 2 and 3 to show the improvement of CSS. Here we consider N = 1 to N = 10 since it is common for adversaries to concern more about those who are early reported as the inferred acquaintances. The best and worst competing methods refer to methods with the best and the worst performance among six competing methods (i.e., Random, Nearest, Popular, Private, node2vec, walk2friends-replacement). For example, the best competing method in P@8 is Random and the worst in P@8 is walk2friends-replacement. We can observe that improvement of CSS is up to 54.7% and 69.8% in Gowalla and Foursquare, respectively. Though the performance of lower social density is better than that of higher social density, the usability of lower social density is worse than that of higher social density. Such results imply that lower social density value leads to recommend more places for the user. When the more places are recommended to the user, instead of freewill places, the usability of check-in data will be reduced. The discussion of usability analysis is presented in Section 5.3.7. Specifically, the utilization of the shielding gain leads to the significant performance in CSS. Note that the shielding gain is determined according to the variation of social strength values between acquaintances and non-acquaintances when issuing a check-in action. CSS can thus recommend shielding places with better interference of friendship information. In contrast, the design of other competing methods generally lacks for the consideration of shielding effectiveness, incurring their weak effectiveness to protect the friendship information.

Table 2 Improvement on competitive performance in Gowalla data (WCM denotes worst competitive method, and BCM denotes best competitive method)
Table 3 Improvement on competitive performance in Foursquare data (WCM denotes worst competitive method, and BCM denotes best competitive method)

It can be apparently to notice that the main difference between Gowalla and Foursquare datasets lies in the social network density, as shown in Table 1. The social network density denotes the ratio of the number of existing edges to the total possible number of edges in the social network. Based on Figures 3 and 4, the Gowalla network possessing higher density leads to the higher accuracy of friendship inference than the Foursquare network. That says, it is more difficult to lower down the accuracy of acquaintance inference for the Gowalla network, compared with the Foursquare network. Another factor to affect the accuracy of friendship inference is the average number of check-ins per user. The precision of friendship inference in unshielding check-ins in Gowalla is similar to in Foursquare. However, the reduced precision caused by shielding methods in Foursquare is higher than that in Gowalla. The reason is that the social network density of Foursquare is relatively lower than of Gowalla so that the adversary is less likely to accurately infer friendships between users after adopting the shielding method. Experiments conducted on both Gowalla and Foursquare datasets with different methods demonstrate the consistent trend with respect to the social network density.

An important requirement of a good shielding method is not to destroy the property of unshielding data. When users perform check-ins and see the recommended places, it is usually that users want to exhibit themselves there. Hence, even she adopts the secure places recommended by a shielding method, such places must be quite close to the unshielding one. In this experiments, we aim to exhibit the distance between the unshielding check-in places and the adopted shielding places. A reliable method is believed to tend to recommend near-by and secure places. The average geographical distances between the unshielding check-in places and the adopted recommended places with different social density thresholds are shown in Figures 5 and 6, respectively. Note that walk2friends-replacement mechanism is not drawn in this experiment, since it always recommends places which are far away from the actual user location. We can find that it is natural that the Nearest method leads to the shortest average distance, and the average distances of Random and Popular are much longer than other methods. We think it is because Random would randomly select distant places, and the most popular place is usually far from users’ unshielding check-in locations. The Private method also generates short average distance since most places have low check-in count and such places distribute in geography. Although the average distance of our proposed CSS is 50 meters larger than the Nearest method, but the difference between Nearest and CSS is still small and acceptable. However, the performance in terms of reducing Precision@N, as shown in Figures 3 and 4, CSS is much better than all competing methods. In short, the proposed CSS can strike a balance between the usability regarding geographical distance and the privacy of friendships.

Figure 5
figure 5

Average distance to the unshielding check-in places in Gowalla

Figure 6
figure 6

Average distance to the unshielding check-in places in Foursquare

5.3.2 Effect on α in our CSS

We discuss the effect of α that determines the importance between acquaintance and non-acquaintance in our CSS method in this experiment. We set α = 0, 0.001, 0.01, 0.1, 0.3, 0.5, 0.8 and 1. We not only vary the parameter α, but also show the variation of α with four factors (Diversity, Global, Personal, Temporal) in both Gowalla and Foursquare data. The results are shown in Figures 7 and 8. We can observe that in general the performance gets improved as α increases in each factor. However, if we only consider the part of acquaintance (i.e., α = 1.0), the performance is the worst case (i.e., the curve of Precision@N is the highest one). We can find α = 0.8 leads to the best performance. Such results imply that we should not only consider the effect of acquaintances but also consider the effect of non-acquaintances. Empirically, we set α = 0.8 by default to execute all experiments.

Figure 7
figure 7

Performance by varying α in Gowalla (adversary:PGT)

Figure 8
figure 8

Performance by varying α in Foursquare (adversary:PGT)

5.3.3 Effect of social density threshold ξ

Recall that the social density threshold is created to help users determine when to adopt the places recommended by a shielding method. In this experiment, we aim at discussing the effect of social density threshold ξ. Figures 9 and 10 present the effect of social density for each method by varying the social density threshold from 0.0 to 0.8 in Gowalla and Foursquare, respectively. We can observe that the lower social density leads to better performance. The performance has no significant improvement until the social density is close to 0. The reason is that many places have low social density. The recommender system for the methods will not recommend a place for these situations which social density is low. Therefore, the difference is not apparent between ξ= 0.2 and ξ= 0.8.

Figure 9
figure 9

Results by varying the social density threshold in Gowalla (adversary:PGT)

Figure 10
figure 10

Results by varying the social density threshold in Foursquare (adversary:PGT)

5.3.4 Comparison of social strength factors in CSS

Since our CSS mechanism consists of four factors of social strength, including Personal, Global, Temporal, and Diversity, in this experiment, we aim to understand how each factor works by considering each of them as the adversary’s attacking strategy. The reason we choose these social strength factors as the adversary is that these factors are the most intuitive and essential clues of acquaintance privacy that a shielding method needs to defend. The results are shown in Figures 11 and 12. It can be apparently found that in both datasets, each of the proposed factors can significantly reduce the precision scores under various adversary’s strategies. However, no factor consistently possesses the lowest precision scores. One may believe that for different adversary’s strategies, the shielding method should be adjusted accordingly. While no perfect individual factor against all adversary strategies, we alternatively seek to make the combination of such four factors, as proposed in Section 4.2.

Figure 11
figure 11

Performance of four social strength factors in Gowalla

Figure 12
figure 12

Performance of four social strength factors in Foursquare

5.3.5 Relationship between AIR and precision

In the definition of our CSAI problem, we propose AIR as the objective to select secure check-in places. A good check-in shielding method is supposed to maximize the AIR score. In the experiments, the performance is also measured by to what extent a shielding method reduces Precision@N. Here we plan to examine the relationship between AIR and reduced Precision@N under CSS and other competing methods. Figures 13 and 14 shows results in Gowalla and Foursquare datasets. Note that in these sub-figures, the left y-axis is AIR drawn as bar charts while the right y-axis is the reduced score of Precision@10 drawn as line charts. The score of reduced precision is the difference between the original precision and the precision based on new check-ins generated by a certain shielding method. From the result, it is apparent that AIR is highly correlated with reduced Precision@10. That says, under different shielding methods, more reduced Precision@10 is consistent with higher AIR values against various adversary’s attacking strategies under different social density threshold ξ. Such results also implicitly unveil that the shielding gain in our CSS is able to effectively approximate AIR scores while reducing the computational cost.

Figure 13
figure 13

AIR versus precision in Gowalla (Adversary:PGT)

Figure 14
figure 14

AIR versus precision in Foursquare (Adversary:PGT)

5.3.6 Effect on existing check-ins before shielding

We aim at studying how existing check-in records of users affect the shielding performance. We wonder to know whether the early place shielding can more protect users’ acquaintance privacy and whether the late shielding will raise the risk that the friendships are inferred by the adversary. We observe Precision@N of different method under varies number of existing check-ins (i.e., 0, 100, 200 and 300) before applying a place shielding method. Figures 15 and 16 present the results in Gowalla and Foursquare, respectively. Some observations can be found in this study. First, it can be apparently observed that the performance of CSS is the best in any number of check-ins before using check-in place shielding method. Such results imply that our CSS method is reliable even if users have many check-ins that may have revealed their acquaintances. Second, we can see that more number of check-ins before shielding lead to the smaller difference between CSS and other competing methods. The reason is that our proposed CSS recommends fewer places for users when the number of check-ins before shielding increases. For example, when the check-in count before shielding increases to 300 in Gowalla, the ratio of check-ins recommended by CSS is 7.56% and the ratio of check-ins recommended by the competing methods is 11.83%. As the same situation in Foursquare, when the check-in count before shielding increases to 300, the ratio of check-ins recommended by CSS is 6.33% and the ratio of check-ins recommended by the competing methods is 11.19%. Such results reflect the fact that CSS uses the shielding gain to determine the places for check-ins while the competing methods rely on the social density thresholds.

Figure 15
figure 15

Results by varying the number of existing check-ins before shielding in Gowalla (adversary:PGT)

Figure 16
figure 16

Results by varying the number of existing check-ins before shielding in Foursquare (adversary:PGT)

5.3.7 Usability of recommended shielding places

An important application based on user check-in data is Point-of-Interest (POI) recommendation. Therefore, the service providers of location-based social networks need not only to prevent users’ profiles and their acquaintances from being inferred by the adversary, but also to maintain the usability of check-in records that may have been modified by some shielding method. In this experiment, we aim at examining the usability of the shielded check-in data under the application of POI recommendation. We take advantage of the state-of-the-art random walk-based model proposed by Likhyani et al. [17] as the POI recommendation method. The input and output of such POI recommender are the modified shielding check-in places and the top-K recommended POIs, respectively. The evaluation metric is that among the top-K POIs recommended by Likhyani et al. [17] as the ground truth, what percentage of the ground-truth POIs can be reported by using the check-in data modified by a certain shielding method. We term such percentage as “Coverage Ratio”. Let OC be the set of K POIs recommended using the unshielding check-in data, and RC be the set of K POIs recommended using the obfuscated check-in data. The coverage ratio is defined as: \(\frac {|OC \cap RC|}{|OC|}\).

The results of coverage ratios for Gowalla and Foursquare are reported in Figures 17 and 18, respectively. Obviously, the coverage ratio of CSS is better than that of other competing methods except for the walk2friends-replacement method. Since the random walk mechanism is adopted by the walk2friends-replacement method which can recommend the similar locations, walk2friends-replacement has the best coverage performance in this experiment. Nevertheless, recall that our CSS mechanism is capable to significantly reduce the Precision scores compared to these competing methods, as shown in Figure 3. We believe our CSS is the best choice to strike a balance between protecting user privacy of acquaintances and maintaining the usability for POI recommendation.

Figure 17
figure 17

Difference of venue recommendation in Gowalla

Figure 18
figure 18

Difference of venue recommendation in Foursquare

6 Conclusions

In this paper, we solve the novel Check-in Shielding against Acquaintance Inference (CSAI) problem, which is to recommend secure places for user check-ins so that the adversary is hard to identify the acquaintances of users. To tackle the CSAI problem, we develop a shielding check-in system, called C heck-in S hielding S cheme (CSS). CSS is capable of exploit user location visiting preferences to guide the selection of check-in places such that the potential friendships being inferred by the adversary is reduced. Meanwhile, the possibility of the strangers being predicted as acquaintances is boosted. Note that the goal of the CSS framework is to provide a list of nearby places with better shielding gain. In practice, the user can follow her/his personal preference to select the check-in place from the list. Finally, extensive experimental studies on two real check-in data demonstrate that CSS significantly outperforms other competing methods, showing its feasibility to be a promising service for user privacy protection.