1 Introduction

With the rapid development of mobile devices and wireless networks, Spatial Crowdsourcing (SC) has emerged as an extension of traditional crowdsourcing in which crowd workers are assigned by a crowdsourcing platform to perform spatial tasks. Spatial crowdsourcing provides on-demand, high-quality services at a lower monetary cost than hiring specialized technical personnel. It has been successfully applied in various industries, for example, online taxi-calling services (e.g., DiDi and Uber), handyman services (e.g., TaskRabbit), and food delivery services (e.g., Eleme and Meituan). In spatial crowdsourcing services, workers register with the SC platform and wait for assignment, requesters submit their requests to the platform, and the platform then assigns these tasks to workers. An assigned worker usually needs to physically travel to the location of the corresponding task. The basic problem in SC is how to choose suitable workers to perform the right tasks. To ensure practicality, it is important to study this problem in the online scenario (the online task assignment problem), where workers and tasks appear on the SC platform dynamically.

By examining the task assignment settings proposed in existing studies, we found that many works focus on optimizing the benefit of the SC platform during task assignment. For example, [29] and [36] aim to minimize the total cost, thereby maximizing the benefit of the platform. However, user experience is equally significant and cannot be ignored in practice. The foundation of an SC platform is its number of users. There are usually several similar platforms providing the same service, and users prefer the platform that offers the better user experience. As a result, it is vital for an SC platform to focus on improving user experience. Besides, many previous works use the travel distance instead of the travel time between workers and tasks. This saves some computation, but it makes the results inaccurate and impractical. The following example demonstrates the importance of user experience and the necessity of taking the velocity attribute into consideration:

We take an online taxi-calling service as an example. Users (passengers) submit their requests to the SC platform and hope to wait only a short time before a taxi reaches their location and takes them to their destination. As shown in Figure 1, we assume that a taxi-calling platform has 3 tasks (s1–s3) and 5 workers (u1–u5) in a 2D space; their IDs, distances, and locations, as well as the workers' speeds, are marked in the figure. The arrival times of all tasks and workers are shown in Table 1. We assume that workers u3 and u5 are stuck in traffic and their speeds are slower than the others; for example, the speed of u3 is 1. Moreover, a task can be assigned to only one worker and each worker can perform only one task. We now compare the average waiting time (the indicator of user experience) under two different strategies. In the first strategy, the optimization goal is to minimize the average waiting time of all tasks, and the solution is to match s1, s2, s3 to u1, u2, u4; the average waiting time is (24/6 + 18/3 + 25/5)/3 = 5 (here we ignore the time spent waiting for assignment, which is considered in the body of this paper). In the second strategy, the optimization goal is to minimize the average travel distance, and the solution is to match s1, s2, s3 to u3, u2, u5; the average waiting time is (10/1 + 18/3 + 18/2)/3 ≈ 8.3. Users have to wait for the taxi, and the waiting time should be as short as possible, because a long waiting time may push users toward competing platforms and significantly reduce the platform's benefits. In the second strategy, users (passengers) s1 and s3 wait too long and have a bad experience, so they may choose another taxi-calling platform next time. Clearly, the SC platform should assign tasks based on the workers' travel time rather than the distance between workers and tasks, because the speed of workers (i.e., traffic conditions) greatly influences users' waiting time and experience.

Figure 1 An example of 3 passengers and 5 taxis

Table 1 Arriving Time of workers and tasks

In order to make an optimal task assignment, the SC platform needs to know the locations and speeds of workers and tasks. However, these locations and speeds are private information, since disclosing them may lead to workers being tracked or tasks being attacked. Task assignment may become inefficient when workers and tasks are hesitant to share their locations or other data due to privacy concerns. For each worker, his/her location and speed are private and should be protected; for each task (user), the location is private and should be protected. We aim to encrypt the locations and speeds with the Paillier cryptosystem and then calculate the travel time over the encrypted data without compromising the privacy of workers and tasks. Calculating the travel time from locations and speeds requires a division operation, and how to perform division efficiently and accurately on encrypted data is still an open problem. In [16], the authors proposed a protocol that can divide data securely based on the ElGamal cryptosystem. However, this protocol cannot be applied to a large SC system, because its key length must be set large enough to avoid computation overflow, and such a large key size leads to a prohibitive computation cost on encrypted data. To overcome this weakness, we transform the secure division problem into a secure least common multiple (LCM) problem, so that we can calculate the travel time of workers securely while preserving the privacy of workers and tasks.

We summarize our main contributions as follows:

  • We formally define user experience-driven secure task assignment problem in Section 2, which tries to minimize average waiting time and protect private information during task assignment simultaneously.

  • We propose two methods to construct encrypted bipartite graph to protect private information of workers and users in Section 3.1.

  • We propose in Section 3.2 a secure Kuhn-Munkres algorithm which takes an encrypted bipartite graph as input and outputs a task assignment plan. The security and complexity of this approach are discussed in Section 4.

  • We conduct extensive experiments on synthetic data and show the effectiveness and efficiency of our approach in Section 5.

  • Compared with our previous work [13], the problem considered in this paper is more complex and more realistic. To handle it, we propose methods for constructing an encrypted bipartite graph and for making the KM algorithm work on encrypted data.

The rest of the paper is organized as follows. We present preliminaries in Section 2. In Section 3, we present two online privacy-preserving task assignment algorithms, SKM-EG and SKM-AG. Security and complexity analysis is presented in Section 4. We then report experimental results on a synthetic dataset in Section 5 and review related work in Section 6. Finally, we conclude the paper in Section 7.

2 Preliminaries

2.1 System model

The core of a spatial crowdsourcing system is a platform connecting a set of workers with a set of tasks. Workers and tasks arrive at the platform dynamically. Without loss of generality, we adopt a periodical task assignment model. Before explaining it, we give several basic definitions.

Definition 1 (Worker)

A worker u is denoted by a triple 〈l, v, x〉, where l is u’s location, v is u’s speed, and x is the time when u appears on the platform.

Definition 2 (Task)

A task s is denoted by a tuple 〈l, x〉, where l is its location and x is the time when s is posted on the platform.

Definition 3 (Task Assignment Instance Set)

Given a set of workers U and a set of tasks S, a task assignment instance set, denoted by I, is a set of pairs of the form pij = (ui, sj), where (ui, sj) means worker ui ∈ U is assigned to task sj ∈ S.

In the above definition, we assume that one worker is assigned to only one task, which is quite common in real spatial crowdsourcing applications such as DiDi (i.e., one taxi is assigned to one passenger). Once a worker ui is assigned to a task sj, both of them are removed from the platform. We do not care how long it takes ui to complete sj, but once sj is completed and ui becomes available, ui appears on the platform again. Based on this setting, we define the periodical task assignment model as follows:

Definition 4 (Periodical Task Assignment Model)

The platform performs task assignment every τ time units. More specifically, at the beginning of one cycle, the platform generates a task assignment instance set I, given all available workers U and all available tasks S at that time. The workers and tasks involved in I will be removed from U and S, respectively. In the following τ time units, U and S may be updated due to the appearance of new workers and new tasks, or the logout of workers and the cancellation of tasks. After τ time units, the platform generates another task assignment instance set, and so on.

Remark:

The periodical task assignment model can be regarded as a trade-off between offline task assignment and online task assignment. When τ is so small that every task is assigned to an available worker as soon as it is posted on the platform, the model is a completely online task assignment process. On the other hand, if τ is large enough, there is only one round of offline task assignment. Therefore, the exact value of τ is important and should be adjusted dynamically in practice. Further, it is very likely that not all tasks/workers can be matched in one round of assignment. In this case, these tasks/workers participate in the next rounds of assignment until they are assigned or go offline.

As mentioned earlier, user experience plays a crucial role in spatial crowdsourcing applications. In practice, once a task is posted on the platform, the sooner it is completed the better. Here, a task is considered completed once the assigned worker arrives at its location (e.g., a passenger has got on a taxi). Note that we do not consider the time a worker needs to perform a task, as this often depends on the task itself (e.g., the distance between the source and the destination in a taxi-calling task), which is beyond the scope of task assignment. Therefore, we use the interval between task posting and task completion as a concrete measure of user experience. In particular, we have the following definitions.

Definition 5 (Travel Time)

The travel time of worker ui to task sj, denoted by tij, is calculated as follows:

$$ t_{ij} = d(l_{i}, l_{j}) / v_{i} $$
(1)

where d is the direct Euclidean distance between location li and location lj.

Definition 6 (Average Waiting Time)

Given cycle τ in the periodical task assignment model and an interval of kτ time units, the average waiting time during this interval is calculated as follows:

$$ \omega =\left( \sum\limits_{c=0}^{k-1} \sum\limits_{p_{ij}\in I_{c} } (c\tau+t_{ij}-x_{j}) + \sum\limits_{s}k\tau\right) / n $$
(2)

where n is the total number of tasks posted during the period of kτ time units. The left part \({\sum }_{c=0}^{k-1} {\sum }_{p_{ij}\in I_{c} } (c\tau +t_{ij}-x_{j})\) is the sum of the waiting times of tasks that have been matched to workers, noting that task sj has waited cτ time units before being assigned in the c-th round of assignment. The right part \({\sum }_{s}k\tau \) is the sum of the waiting times of tasks that have not been assigned in the k rounds of assignment.

In the above definition, we do not consider overdue tasks for simplicity. In practice, it is sometimes not easy to know the exact deadline of a task, for example, a passenger can cancel a taxi-calling request anytime.
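To make (2) concrete, the following is a minimal Python sketch of the average waiting time computation; the data layout (a list of per-round assignments plus a count of unmatched tasks) is our own illustration, not the paper's implementation.

```python
# A minimal sketch of the average waiting time in Definition 6.
# Assumed layout: `rounds` is a list of k assignment rounds; round c contains
# (travel_time t_ij, post_time x_j) pairs for matched tasks; `unmatched` is the
# number of tasks still unassigned after k rounds.

def average_waiting_time(rounds, unmatched, tau):
    k = len(rounds)
    matched_wait = sum(
        c * tau + t_ij - x_j            # waited c*tau before being assigned in round c
        for c, assignments in enumerate(rounds)
        for (t_ij, x_j) in assignments
    )
    unmatched_wait = unmatched * k * tau  # tasks still waiting after k rounds
    n = sum(len(a) for a in rounds) + unmatched
    return (matched_wait + unmatched_wait) / n

# Example: k=2 rounds, tau=5, one matched task per round, one task unmatched.
print(average_waiting_time([[(3.0, 0.0)], [(2.0, 4.0)]], unmatched=1, tau=5))
```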

2.2 Problem definition

We focus on the user experience-driven secure task assignment problem in spatial crowdsourcing under the semi-honest model. In particular, our objective is to minimize the average waiting time of all tasks in a given period without disclosing private information of workers and tasks to unauthorized parties. Based on the aforementioned definitions, we can formalize the User experience-driven Secure Task Assignment (USTA) problem as follows:

Definition 7 (USTA Problem)

Given cycle τ in the periodical task assignment model and an interval of kτ time units, the USTA problem is to find k task assignment instance sets that minimize the average waiting time of all tasks defined in (2), while satisfying the security requirement defined in (3).

2.3 Adversary model

In this paper, we try to protect the following private data: the location and speed of workers, and the location of tasks. All these private data should not be disclosed to unauthorized parties during the procedure of task assignment. To accurately describe the ability of unauthorized parties, we adopt the well-known semi-honest model [9]. In this model, each party will stick to a pre-defined protocol, showing the honest aspect. On the other hand, each party will try to derive extra information from what it received in the execution of the protocol, showing the dishonest aspect. The security under the semi-honest model can be formally defined as follows:

Definition 8 (Security under Semi-honest Model [9])

Suppose that \(\mathcal {F}(x_{1},\cdots ,x_{n}){}={}(\mathcal {F}_{1},\cdots , \mathcal {F}_{n})\) is a functionality computed by n parties jointly, where xi and \(\mathcal {F}_{i}\) are the input and output of the i-th party (1 ≤ i ≤ n). For \(\mathcal {I} = \{{i_{1}},\cdots ,{i_{\kappa }}\} \subset \{1, \cdots , n\}\), we let \(\mathcal {F}_{\mathcal {I}}\) denote the subsequence \(\mathcal {F}_{i_{1}},\cdots , \mathcal {F}_{i_{\kappa }}\). Consider a protocol for computing \(\mathcal {F}\). The view of the i-th party during an execution of this protocol, denoted as VIEWi, is (xi, y, mi), where y represents the outcome of the i-th party's internal coin tosses (i.e., a random integer) and mi represents the messages that the party has received. In other words, VIEWi is all the data that the i-th party can observe during the execution of the protocol. Let \(VIEW_{\mathcal {I}}\overset {\text {\tiny {def}}}{=}(\mathcal {I}, VIEW_{i_{1}}, \cdots , VIEW_{i_{\kappa }})\). Then, we say that the protocol securely computes \(\mathcal {F}\) if there exists a polynomial-time algorithm, denoted as \(\mathcal {A}\), such that for every \(\mathcal {I}\) above

$$ \mathcal{A}(\mathcal{I}, (x_{i_{1}}, \cdots, x_{i_{\kappa}}, \mathcal{F}_{\mathcal{I}})) \overset{\tiny{C}}{=}\!VIEW_{\mathcal{I}}, $$
(3)

where \(\overset {\tiny {C}}{=}\) denotes computational indistinguishability.

To achieve privacy preservation, we introduce a semi-honest crypto cloud provider (CCP) who holds the secret keys and provides crypto services. We assume the CCP and the SC platform do not collude, observing that both SC platforms (e.g., DiDi and Uber) and CCPs (e.g., pCloud Crypto and Boxcryptor) are typically run by large companies. It is unlikely for a CCP and an SC platform to collude, as this would damage their reputations and in turn their revenues. This assumption has been widely used recently, for example, in [1, 7, 17, 19, 20].

2.4 Cryptosystems

The privacy-preserving property of our protocol is built on two well-known cryptographic primitives: a pseudo-random generator (PRG) [24] and the Paillier cryptosystem [22]. Their details can be found in the given references, and both have been proved secure. Here we only emphasize some important properties.

A pseudo-random number generator is an algorithm that deterministically generates a sequence of numbers from a seed (e.g., via a hash function). For Paillier, encryption is denoted by Ep and decryption by Dp. This cryptosystem has the following important properties, illustrated with a short code sketch after the list:

  • Homomorphic Properties of Paillier: Given two messages m1 and m2, we encrypt them and then we have:

    $$ E_{p}(m_{1})E_{p}(m_{2}) = E_{p}(m_{1}+m_{2}). $$
    (4)

    Beyond that, given a message m, we encrypt it and then we have:

    $$ E_{p}(m)^{k} = E_{p}(km). $$
    (5)
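Both properties can be checked with any additively homomorphic Paillier implementation; the sketch below uses the python-paillier (phe) package purely as an illustration, since the paper does not prescribe a library. Note that phe exposes the ciphertext multiplication in (4) as `+` and the exponentiation by a scalar in (5) as `*`.

```python
# A small illustration of properties (4) and (5) with python-paillier ("phe"),
# used here only as a stand-in implementation.
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)
c1, c2 = pub.encrypt(15), pub.encrypt(27)

# (4) additive homomorphism: combining E(m1) and E(m2) decrypts to m1 + m2
assert priv.decrypt(c1 + c2) == 42
# (5) scalar homomorphism: E(m)^k decrypts to k*m
assert priv.decrypt(c1 * 3) == 45
```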

3 User-experience-driven secure task assignment

USTA is essentially a global optimization problem. In general, the optimal solution cannot be found unless we have a global view of the problem. In practice, workers and tasks enter the SC platform dynamically, so the SC platform cannot have a global view; at any time, it only has a local view of the available tasks and workers with their reported locations. It is therefore impossible for the SC platform to perform globally optimal task assignment. As a result, we adopt a locally optimal task assignment strategy: we consider the k rounds of task assignment in USTA to be independent, and for each round our objective is to minimize the average waiting time of all tasks in that round. Note that this strategy is not new and has already been used in [11]. Solutions found by this strategy are typically suboptimal. To improve the quality of solutions, more advanced task assignment strategies, for example, those considering future tasks and workers [4, 31], have been proposed recently. In this paper, however, we only consider this simple local strategy, as it helps us focus on privacy preservation during task assignment, which is the main contribution of our work.

Each round of task assignment in USTA can be reduced to the weighted bipartite matching problem as follows. In round k, suppose Uk and Sk are the sets of available workers and tasks, respectively. Let Gk = (Vk, Ek) be a bipartite graph with Vk as the set of vertices and Ek as the set of edges. Each worker ui in Uk maps to a vertex vi in Vk and each task sj in Sk maps to a vertex \(v_{|U_{k}|+j}\) in Vk. If a worker ui ∈ Uk can be assigned to a task sj ∈ Sk, an edge eij connecting vertex vi to vertex \(v_{|U_{k}|+j}\) is added to Ek. Each edge eij is associated with a weight wij, which is the travel time of worker ui to task sj, that is, tij defined in (1). Clearly, Gk is a weighted bipartite graph, so finding a task assignment instance set that minimizes the average waiting time of all tasks in round k is equivalent to solving the matching problem on Gk.

Following the above framework, the weighted bipartite matching problem can be solved by the classic Hungarian algorithm (also known as the Kuhn-Munkres (KM) algorithm [12, 21]). In our problem setting, however, the weights of the edges in Gk cannot be obtained directly due to privacy concerns. More specifically, the locations of tasks, and the locations and speeds of workers, should not be disclosed to unauthorized parties such as the SC platform. The weights of the edges in Gk should also be encrypted because they are intermediate results of the computation. Next, we introduce two methods for constructing the encrypted graph Gk, named EG (encrypted exact graph construction) and AG (encrypted approximate graph construction). After that, we introduce the secure KM algorithm (denoted SKM) on the encrypted bipartite graph Gk, which solves the task assignment problem and produces the task assignment instance set Ik in round k. Combining the construction methods EG and AG with the secure KM algorithm SKM yields SKM-EG and SKM-AG, respectively.

3.1 Construction of encrypted bipartite graph

3.1.1 Encrypted exact bipartite graph (EG)

All available workers and tasks are held by the SC platform, so it is easy for the SC platform to obtain Vk, the set of vertices in Gk. If a worker ui can be assigned to a task sj, there is a corresponding edge eij in Gk whose weight is tij, the time ui takes to travel to the location of sj. To calculate tij, the SC platform needs to know dij, the distance between ui and sj, and vi, the speed of ui. Unfortunately, vi is private information of ui. Furthermore, to calculate dij, the SC platform needs the locations of ui and sj, which are also private. To enable travel-time computation without disclosing private information, we adopt a homomorphic encryption-based methodology. On one hand, we protect private data by encrypting them, so any party without the secret key cannot learn anything from the encrypted data. On the other hand, the distance and travel time can be calculated on ciphertexts thanks to the homomorphic properties of the encryption schemes.

Secure distance computation is a common problem in the security domain. Here we follow our previous work [17] and [32] and use the Paillier cryptosystem [22] to encrypt private data. Specifically, ui and sj do not report their locations to the SC platform directly. Instead, sj uses the public key of the CCP to encrypt its location, and the encrypted location is forwarded to ui via the SC platform. Using the homomorphic addition property, ui can calculate his/her squared Euclidean distance to sj as follows:

$$ E(d^{2}_{ij})=E((l_{ix})^{2}+(l_{iy})^{2})E((l_{jx})^{2}+(l_{jy})^{2})E(l_{jx})^{-2l_{ix}}E(l_{jy})^{-2l_{iy}} $$
(6)

In the above equation, E((ljx)2 + (ljy)2), E(ljx), and E(ljy) constitute the encrypted location of sj. In terms of computation cost, sj needs to perform three encryptions, whereas ui only needs one encryption, for (lix)2 + (liy)2.
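For illustration, the following hedged sketch evaluates (6) with python-paillier; in the real protocol the worker never holds the secret key, and the decryption here only verifies the arithmetic.

```python
# A sketch of (6) with python-paillier: the task encrypts its location once;
# the worker combines the ciphertexts with its own plaintext coordinates.
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)

# Task s_j encrypts (l_jx^2 + l_jy^2), l_jx, l_jy  -- three encryptions.
ljx, ljy = 7, 2
enc_j_sq, enc_jx, enc_jy = pub.encrypt(ljx**2 + ljy**2), pub.encrypt(ljx), pub.encrypt(ljy)

# Worker u_i (location l_ix, l_iy) evaluates (6) homomorphically:
# E(d^2) = E(l_ix^2 + l_iy^2) * E(l_jx^2 + l_jy^2) * E(l_jx)^(-2 l_ix) * E(l_jy)^(-2 l_iy)
lix, liy = 3, 5
enc_d2 = (pub.encrypt(lix**2 + liy**2) + enc_j_sq
          + enc_jx * (-2 * lix) + enc_jy * (-2 * liy))

# Decryption is only for checking the arithmetic of this sketch.
assert priv.decrypt(enc_d2) == (lix - ljx)**2 + (liy - ljy)**2   # = 25
```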

With the help of the workers' computation, the SC platform can easily obtain \(E(d^{2}_{ij})\) for every possible task assignment pair (ui, sj). Computing a square root over encrypted values is not supported by Paillier, so the SC platform needs to turn to the CCP for help. To prevent the CCP from learning the real distance values, the SC platform masks the encrypted message before sending it; a value a sent from the SC platform to the CCP is transformed into

$$ a^{*} = \alpha(a) + \beta $$
(7)

where α and β are two numbers randomly selected from the prime field \(\mathcal {Z}_{q}\), and a is the (encrypted) value sent from the SC platform to the CCP. Specifically, using the homomorphic properties of Paillier, the SC platform masks \(E(d^{2}_{ij})\) as \(E(\alpha (d^{2}_{ij}) + \beta )\) before sending it to the CCP for decryption. The real value dij can then be obtained by removing the random noise from the decryption result and taking the square root. Based on this real distance and his/her speed, ui can calculate tij and send E(tij) to the SC platform. Once the SC platform has received the data from all workers, it has all elements of Gk and can run the secure KM algorithm for task assignment, which we present in Section 3.2.
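The masking step can be sketched as follows (again with python-paillier; for brevity, the party that removes the noise and takes the square root is shown in one place, which is our simplification of the message flow):

```python
# A hedged sketch of the masking in (7): E(d^2) is hidden behind random
# (alpha, beta) before the CCP decrypts; the masked plaintext is then
# un-masked and square-rooted by the party holding (alpha, beta).
import math, secrets
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)   # priv lives at the CCP

d2 = 25                                   # squared distance, known only as ciphertext
enc_d2 = pub.encrypt(d2)

alpha, beta = secrets.randbelow(2**31) + 1, secrets.randbelow(2**31)
masked = enc_d2 * alpha + beta            # homomorphically computes E(alpha*d^2 + beta)

plain_masked = priv.decrypt(masked)       # CCP only ever sees alpha*d^2 + beta
d = math.sqrt((plain_masked - beta) / alpha)   # un-mask, then take the square root
assert d == 5.0
```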

3.1.2 Encrypted approximate bipartite graph (AG)

In terms of security, the method of constructing the encrypted bipartite graph presented in the last section is relatively involved, as it requires recovering exact distance values with the help of the CCP. To further improve the security of bipartite graph construction, we present another method in this section. Instead of computing the exact value of tij over ciphertexts, we compute an approximation of it and construct an approximate bipartite graph Gk. This is motivated by the fact that the square root of an encrypted value is hard to evaluate, and consequently, the distance between two objects is usually approximated by its square in the security domain. Performing task assignment on approximate values may lead to suboptimal results, so the approximation must be designed carefully so that a satisfactory assignment can still be obtained on the approximate graph. By sacrificing a little accuracy, we ensure that strong security is achieved during the assignment process.

Our task assignment approximation strategy is based on the following lemma:

Lemma 1

Let \(U = \left \{u_{1}, \cdots , u_{n}\right \}\) be a set of n workers, \(S = \left \{s_{1}, \cdots , s_{m}\right \} \) be a set of m tasks, and \(D =\left \{d_{11}, \cdots , d_{1m}, d_{21}, \cdots , d_{nm}\right \}\) be the distances between ui and sj. Let Vlcm be the least common multiple of all workers' speeds, and \(v_{i}^{\prime } = V_{lcm}/v_{i}\), where 1 ≤ i ≤ n. For any two different workers ui, uk ∈ U and two different tasks sj, sl ∈ S, dij/vi < dkl/vk holds if \(d_{ij}v_{i}^{\prime } < d_{kl}v_{k}^{\prime } \).

Proof

\(d_{ij}v_{i}^{\prime } < d_{kl}v_{k}^{\prime } \Longleftrightarrow d_{ij}v_{i}^{\prime }/V_{lcm} < d_{kl}v_{k}^{\prime }/V_{lcm} \Longleftrightarrow d_{ij}/v_{i} < d_{kl}/v_{k}\). □

The above lemma tells us that the travel-time inequality still holds when the scaled speed \(v_{i}^{\prime }\) is used. This is important, as our task assignment is based on travel time. Furthermore, no division is needed when comparing travel times; we only need to compute the product of two numbers, which is supported by the Paillier cryptosystem. Based on Lemma 1, we propose \(t^{\prime }_{ij}\), an approximation of tij, as follows:

$$ t^{\prime}_{ij}=d_{ij}^{2}v^{\prime2}_{i}=d_{ij}^{2}V_{lcm}^{2}/{v^{2}_{i}}. $$
(8)

As discussed in the last section, every worker can compute his/her encrypted squared Euclidean distance to a task directly on ciphertexts. According to (8), the encrypted approximate travel time, which is sent to the SC platform for graph construction, can be calculated by worker ui as follows:

$$ E(t^{\prime}_{ij}) = E(d^{2}_{ij})^{v^{\prime2}_{i}} $$
(9)
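Assuming Vlcm is already known to the worker, (9) is a single scalar exponentiation on the ciphertext, for example:

```python
# A sketch of (9): given V_lcm, worker u_i turns E(d_ij^2) into the encrypted
# approximate travel time with one scalar multiplication on the ciphertext.
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)

v_lcm, v_i = 18, 6                       # from the running example: lcm(6, 9) = 18
v_i_scaled = v_lcm // v_i                # v'_i = V_lcm / v_i = 3

enc_d2 = pub.encrypt(25)                 # E(d_ij^2), obtained as in (6)
enc_t_approx = enc_d2 * (v_i_scaled ** 2)   # E(t'_ij) = E(d_ij^2)^(v'_i^2)

assert priv.decrypt(enc_t_approx) == 25 * 9  # = d^2 * V_lcm^2 / v_i^2
```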

The value of \(v^{\prime 2}_{i}\) is easy to compute as long as ui knows Vlcm, which itself needs to be calculated securely. To compute Vlcm securely, we adopt an aggregation protocol, denoted AP [14], which calculates the sum of multiple numbers in a privacy-preserving manner. It works as follows:

Algorithm 1 (secure computation of Vlcm; pseudocode listing omitted)
Key generation::

Let X be a set of nc random numbers, where n is the number of workers and c is a random number. Divide X into n random disjoint subsets Xi of c numbers each, and define \(M = 2^{\left \lceil \log _{2}{n{\Delta }}\right \rceil }\), where Δ is the maximum value of the workers' data. Finally, send ki to ui and k0 to the SC platform, where \(k_{i} = \left ({\sum }_{x\in X_{i}}x\right )\bmod {M}\) and \(k_{0} = \left ({\sum }_{x\in X}x\right ){}\bmod {M}\).

Encryption Ea::

For each worker ui, he/she encrypts data mi by computing:

$$ c_{i} = (k_{i} + m_{i}){\kern2.2pt}\bmod{M} $$
(10)
Decryption Da::

The SC platform can decrypt the sum by computing:

$$ \sum\limits_{i=1}^{n} m_{i} = \left( \sum\limits_{i=1}^{n} c_{i} - k_{0}\right)\bmod{M} $$
(11)
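A minimal sketch of the AP scheme (key generation, (10), and (11)); the concrete parameters c and Δ below are illustrative:

```python
# A minimal sketch of the AP aggregation protocol: key generation, the
# per-worker encryption (10), and the platform-side sum decryption (11).
import math, secrets

def ap_keygen(n, c, delta):
    M = 2 ** math.ceil(math.log2(n * delta))
    pool = [secrets.randbelow(M) for _ in range(n * c)]          # n*c random numbers
    keys = [sum(pool[i*c:(i+1)*c]) % M for i in range(n)]        # k_i for worker u_i
    k0 = sum(pool) % M                                           # k_0 for the platform
    return keys, k0, M

def ap_encrypt(k_i, m_i, M):         # (10): c_i = (k_i + m_i) mod M
    return (k_i + m_i) % M

def ap_decrypt_sum(ciphers, k0, M):  # (11): sum(m_i) = (sum(c_i) - k_0) mod M
    return (sum(ciphers) - k0) % M

keys, k0, M = ap_keygen(n=3, c=4, delta=10)
data = [6, 9, 4]                     # e.g., workers' private flag values
ciphers = [ap_encrypt(k, m, M) for k, m in zip(keys, data)]
assert ap_decrypt_sum(ciphers, k0, M) == sum(data)
```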

Based on the reasonable assumption that the maximum worker speed is bounded and known to all, we explain Algorithm 1 as follows. In lines 1 and 2, a sieve-style exclusion algorithm is performed to get the list L of 2-tuples ⟨p, cp⟩, with complexity \(O(n \log (\log n))\). For example, if the maximum speed is 10, then 3 is a prime with 3 < 10, and its maximal exponent is 2 since 3² ≤ 10, so the tuple ⟨3, 2⟩ is inserted into the list. Besides, every worker calculates the factorization Fi of his/her own speed vi by Pollard's rho algorithm, with complexity \(O(n^{\frac {1}{4}})\). For example, the factorization of a worker with vi = 6 is F = 2 ∗ 3. Based on the list L, the AP generates \({\sum }_{p \in L} p*(c_{p}+1)\) different keys in line 3, since reusing a key may disclose workers' speeds. In lines 4 to 7, each worker ui generates his/her flag data f[k] (k ∈ [0, cp]) as follows:

$$ f[k] = \left\{ \begin{array}{lr} 1, & AT[p] = k\\ 0, & otherwise \end{array} \right. $$
(12)

where AT[p] is the number of times p appears in the corresponding Fi. Each worker then encrypts and sends the flag data. In the above example, when p = 3, the worker with vi = 6 generates the flag data f[0] = 0, f[1] = 1, f[2] = 0. In lines 9 to 14, the LCM is computed as \(V_{lcm} = {\prod }_{p\in L}p^{H}\). For example, the factorization of another worker with vi = 9 is 3 ∗ 3. For p = 3, this worker generates the flag data f[0] = 0, f[1] = 0, f[2] = 1, so the maximal exponent of 3 is 2, because the decrypted sum of f[2] meets the condition in line 12. Meanwhile, the maximal exponents of 2, 5, 7 are 1, 0, 0 respectively, so Vlcm = 2¹ ∗ 3² ∗ 5⁰ ∗ 7⁰ = 18 is returned.
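The following plaintext sketch mirrors the logic of Algorithm 1; in the real protocol the flag vectors are aggregated with the AP scheme above so that the platform only learns their sums, whereas here we sum them directly for brevity (sympy's factorint stands in for Pollard's rho):

```python
# A plaintext sketch of the LCM logic in Algorithm 1.
from sympy import primerange, factorint   # factorint plays the role of Pollard's rho

def secure_lcm_sketch(speeds, v_max):
    # lines 1-2: primes p <= v_max and the largest exponent c_p with p^c_p <= v_max
    caps = {p: 1 for p in primerange(2, v_max + 1)}
    for p in caps:
        while p ** (caps[p] + 1) <= v_max:
            caps[p] += 1
    # lines 4-7: every worker builds flag data f[k] from the factorization of v_i;
    # lines 9-14: the highest exponent whose aggregated flag sum is non-zero wins.
    v_lcm = 1
    for p, cap in caps.items():
        exps = [factorint(v).get(p, 0) for v in speeds]   # per-worker exponents of p
        flag_sums = [sum(1 for e in exps if e == k) for k in range(cap + 1)]
        highest = max((k for k in range(cap + 1) if flag_sums[k] > 0), default=0)
        v_lcm *= p ** highest
    return v_lcm

assert secure_lcm_sketch([6, 9], v_max=10) == 18   # matches the running example
```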

3.2 Secure KM on encrypted bipartite graph (SKM)

3.2.1 KM algorithm

In this section, the KM algorithm is adopted to solve the matching problem on the bipartite graph. Before presenting how to run the secure KM algorithm on an encrypted bipartite graph (SKM), we first give a brief introduction to KM, starting from some basic definitions.

Definition 9 (Feasible Vertex Labeling)

Let l(v) be the vertex labeling of vertex v and eij be the weight of edge between vi and vj. Given a weighted bipartite graph G, if we have l(vi) + l(vj) ≥ eij for each edge, the vertex labeling l is a feasible vertex labeling.

Definition 10 (Tight Edge)

Given a weighted bipartite graph G and feasible vertex labeling l, if l(vi) + l(vj) = eij, edge eij is a tight edge.

Definition 11 (Subgraph of Tight Edges)

Given a weighted bipartite graph G = {V, E} and a feasible vertex labeling l, let El = {eij ∈ E | l(vi) + l(vj) = eij}. The subgraph of tight edges is the graph Gl consisting of the edge set El and the corresponding vertices.

Definition 12 (Augmenting Path)

An augmenting path has the following properties: 1) all edges in the augmenting path are tight edges; 2) the number of edges in the path is odd, and the number of odd-numbered edges is one more than the number of even-numbered edges; 3) the vertices in the path start from a vertex corresponding to a worker and end at a vertex corresponding to a task, alternating between these two kinds of vertices; 4) there are no repeated vertices in the path; 5) both the starting and ending vertices of the path are not included in the selected matching pairs, while all other vertices belong to the selected matching pairs.

To find the perfect matching in which every worker is assigned one task with minimum total weight, KM starts by initializing all vertices with a feasible vertex labeling l. It then starts from vertex v1 and iterates over all vertices to find augmenting paths, inserting every found worker-task match into the subgraph Gl accordingly. Before each iteration over the vertices, it also updates the vertex labeling l according to the following rules:

$$ l^{\prime}(v) = \left\{\begin{array}{ll} l(v)+d , & v \in U \cap P^{\prime}\\ l(v)-d , & v \in S \cap P^{\prime}\\ l(v) , & else \end{array}\right. $$
(13)

where v is a vertex in graph Gl and \(P^{\prime }\) is the failed augmenting path corresponding to the minimum d = min{li + lj − eij}. The iteration stops when every worker is assigned one task, and the matching result is contained in the subgraph Gl.
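For reference, the plaintext counterpart of this matching step can be obtained with any Hungarian-algorithm implementation, e.g., SciPy's solver below (travel times are illustrative); the secure algorithm in the next subsection performs the same labeling and augmenting steps, but over Paillier ciphertexts with the CCP answering masked comparisons.

```python
# The plaintext counterpart of what SKM computes: a minimum-total-travel-time
# perfect matching on the bipartite graph, via SciPy's Hungarian solver.
import numpy as np
from scipy.optimize import linear_sum_assignment

# travel times t_ij for 3 workers (rows) x 3 tasks (columns); values illustrative
t = np.array([[4.0, 9.0, 6.0],
              [6.0, 5.0, 8.0],
              [7.0, 3.0, 5.0]])

rows, cols = linear_sum_assignment(t)               # minimum total travel time
print(list(zip(rows, cols)), t[rows, cols].sum())   # [(0, 0), (1, 1), (2, 2)] 14.0
```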

3.2.2 Secure KM algorithm

There are three main operations in the KM algorithm: initializing the feasible vertex labeling, determining whether an edge is tight, and modifying the vertex labeling. Specifically, at the beginning of the algorithm we initialize the vertex labeling by selecting the maximum edge weight associated with each vertex. We also need to determine whether an edge weight equals the sum of the corresponding vertex labels. Furthermore, we need to choose d = min{li + lj − eij} and modify the vertex labeling by addition and subtraction. All these operations are straightforward on plaintext. However, here we only have the encrypted bipartite graph, where all weights are ciphertexts, which makes the implementation very challenging, especially the maximum selection and comparison operations.

To implement secure max, min, and comparison operations without compromising the participants' privacy, we employ the CCP's decryption capability. A naive idea is that the SC platform sends encrypted messages a and b to the CCP, the CCP decrypts them, performs the operation on the plaintexts, and returns the result in ciphertext to the SC platform. However, since the encrypted messages can be sensitive information that should be kept private from the CCP as well, it is inappropriate to send them directly. Instead, we disguise every message as x∗ = xα·Ep(β), which can easily be computed via homomorphic encryption. By sending the disguised values to the CCP, the privacy of the original data is protected. Moreover, since all values are disguised with the same random values, the order relationships (max, min, comparison) still hold. For example, if the SC platform wants the minimum of two encrypted messages a and b, instead of the raw values it sends the disguised values a∗ = aα·Ep(β) and b∗ = bα·Ep(β) to the CCP, where α and β are generated according to (7). After decryption, the CCP can easily select the minimum (say a∗) and return its ciphertext to the SC platform, which can then identify the minimum (a) by comparing the returned ciphertext with the disguised values it sent.
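A hedged sketch of this masked comparison follows (the split of roles into two functions is ours for illustration):

```python
# Masked comparison for secure min/max: the SC platform blinds both ciphertexts
# with the same (alpha, beta); since alpha > 0, the affine map preserves order,
# so the CCP can report which operand is smaller without seeing the real values.
import secrets
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)

def sc_blind(enc_a, enc_b):                      # SC platform side
    alpha, beta = secrets.randbelow(2**31) + 1, secrets.randbelow(2**31)
    return enc_a * alpha + beta, enc_b * alpha + beta

def ccp_smaller_index(blind_a, blind_b):         # CCP side: only sees blinded values
    return 0 if priv.decrypt(blind_a) <= priv.decrypt(blind_b) else 1

enc_a, enc_b = pub.encrypt(12), pub.encrypt(7)   # SC holds only these ciphertexts
idx = ccp_smaller_index(*sc_blind(enc_a, enc_b))
enc_min = [enc_a, enc_b][idx]                    # SC keeps the ciphertext of the min
assert priv.decrypt(enc_min) == 7
```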

With the secure max, min, and comparison operations defined, we can design the secure KM algorithm. The details of secure KM are shown in Algorithm 2. Step 1 reads in the bipartite graph Gk as eij = Ep(wij), where j ≤ m, i ≤ n, and m = n. Ep(wij) is calculated as follows:

$$ E(w_{ij}) = \left\{ \begin{array}{lr} E(-\infty), & u_{i} \in LW(k)\\ E(-\infty), & s_{j} \in LS(k)\\ E(-t_{ij}^{\prime2}) \text{ or } E(-t_{ij}), & \text{otherwise.} \end{array} \right. $$
(14)

where LW(k) and LS(k) are the sets of logical workers and tasks, respectively. Logical workers or tasks are generated to make the numbers of workers and tasks equal so that the KM algorithm can be applied. Xvisit and Yvisit, initialized to False (steps 3 and 7), record whether a vertex has been visited during the search for an augmenting path. Steps 4 and 8 initialize the feasible vertex labels of part X (worker vertices) to the ciphertext of the maximum travel time among the edges connected to the corresponding vertex, and the labels of part Y (task vertices) to the ciphertext of 0. In step 5, the match array is set to 0, indicating that no worker is matched to a task yet. In steps 9 to 30, we continuously expand the subgraph of tight edges by traversing the worker vertices. Specifically, in each round of expansion, we initialize the difference value less of each task to the ciphertext of infinity in step 11. Then we repeat the process of finding an augmenting path for worker i from steps 12 to 30. During this process, we run the SecAP algorithm (detailed in Algorithm 3) in step 17 to get a maximum perfect matching of the current subgraph of tight edges. If no maximum perfect matching is found, we update the feasible vertex labels (steps 19 to 30) and then search for an augmenting path with SecAP again. Since the feasible vertex labels X and Y, the path differences diff and less, and the edge weights involved in step 5 of Algorithm 3 are all Paillier ciphertexts, we compute and modify them using the homomorphic properties of Paillier. Algorithm 3, SecAP, is a secure algorithm that recursively searches for an augmenting path (maximum perfect matching).

Algorithm 2 (SKM: secure KM on the encrypted bipartite graph; listing omitted)
Algorithm 3 (SecAP: secure search for an augmenting path; listing omitted)

4 Security and complexity analysis

4.1 Security analysis

Theorem 1

Our SKM-EG model is privacy-preserving given the extra knowledge K0 = Vlcm, \(K_{-1} = \{V_{lcm}, \alpha (d^{2}_{ij}) + \beta , \alpha (w_{ij}) + \beta \}\), and Ki = Vlcm (1 ≤ i ≤ n).

Proof

We first consider the privacy of the SC platform. For the SC platform, the observed view is \(V_{0} = \{ E_{p}(s_{j}), E_{p}(t_{ij}), E_{p}(t^{\prime }_{ij}), V_{lcm}\}\). We assume there is a probabilistic polynomial-time simulator P0 that generates \( V^{\prime }_{0} = \{ E_{p}(s_{j}), E_{p}(x_{i}), E_{p}(y_{i}), V_{lcm} \}\), where sj is a serial number defined randomly by the SC platform and xi (1 ≤ i ≤ mn) and yi (1 ≤ i ≤ mn) are random numbers uniformly distributed in \(\mathbb {Z}_{N}\). As Paillier is semantically secure, view V0 is indistinguishable from view \(V^{\prime }_{0}\); thus, the privacy of the SC platform is preserved.

Next, we analyze each worker ui with Ki = Vlcm (1 ≤ i ≤ n). The view of each worker ui is Vi = {Vlcm, Ep((ljx)2 + (ljy)2), Ep(ljx), Ep(ljy)}, and the view generated by simulator Pi is \(V^{\prime }_{i} = \{ V_{lcm}, E_{p}(x_{1}), E_{p}(x_{2}), E_{p}(x_{3}) \}\), where xi (i = 2, 3) are random numbers following the Gaussian \(\mathcal N(500, 400^{2})\) and x1 is the sum of the squares of x2 and x3. Based on the semantic security of Paillier, we can easily verify that \(V_{i} \equiv V^{\prime }_{i} (1 \leq i \leq n)\); thus, the privacy of each worker is also preserved.

Finally, we analyze the privacy of the CCP u−1 with \(K_{-1} = \{V_{lcm}, \alpha (d^{2}_{ij}) + \beta , \alpha (w_{ij}) + \beta \}\). The view observed by the CCP is \(V_{-1} = \{ V_{lcm}, \alpha (d^{2}_{ij}) + \beta , \alpha (w_{ij}) + \beta \}\). We assume there is a probabilistic polynomial-time simulator P−1 generating \(V^{\prime }_{-1} = \{ V_{lcm}, \alpha (x_{i}) + \beta , \alpha (y_{j}) + \beta \}\), where each xi (1 ≤ i ≤ mn) and each yj (1 ≤ j ≤ mn) are squared distances computed from two random numbers following the Gaussian \(\mathcal N(500, 400^{2})\), and α and β are two numbers randomly selected from the prime field \(\mathcal {Z}_{q}\). Given the randomness of α and β and the security of (7), V−1 is indistinguishable from \(V^{\prime }_{-1}\), which means the privacy of the CCP holds.

Based on the above proofs, our SKM-EG model is secure up to the disclosure of K, which has a negligible effect on individual privacy. □

Theorem 2

Our SKM-AG model is privacy-preserving given the extra knowledge K0 = Vlcm, K−1 = {Vlcm, α(wij) + β}, and Ki = Vlcm (1 ≤ i ≤ n).

Proof

Here, we focus on the security proof for the CCP, as the proofs for the SC platform u0 with K0 = Vlcm and for every worker ui with Ki = Vlcm (1 ≤ i ≤ n) are similar to those of Theorem 1. For the CCP u−1 with K−1 = {Vlcm, α(wij) + β}, the view is V−1 = {Vlcm, α(wij) + β}. We assume there is a probabilistic polynomial-time simulator P−1 that generates \(V^{\prime }_{-1} = \{ V_{lcm}, \alpha (x_{i}) + \beta \}\), where each xi (1 ≤ i ≤ mn) is a squared distance computed from two random numbers following the Gaussian \(\mathcal N(500, 400^{2})\). Since α and β are two random numbers selected from the prime field \(\mathcal {Z}_{q}\) and the security of (7) holds, it is easy to prove that V−1 is indistinguishable from \(V^{\prime }_{-1}\), and the security of the CCP holds.

Based on the above proofs, our SKM-AG model is secure up to the disclosure of K, which has a negligible effect on individual privacy. □

4.2 Complexity analysis

In our system, every worker computes and communicates in parallel. Ignoring some cheap operations, we analyze the complexity of SKM-EG and SKM-AG from the perspectives of the SC platform, the CCP, and each worker. Li (i = p, a) is the key size of the Paillier or AP encryption scheme. Since a Paillier ciphertext is larger than the corresponding plaintext and the AP ciphertext, we exclude the latter two from the communication cost. Ep and Ea denote the encryption (and corresponding decryption) operations of Paillier and AP, respectively. We assume there are n workers and m tasks in round p; in each round, the system runs the whole workflow once. The complexity of SKM-EG and SKM-AG in cycle p is shown in Tables 2 and 3.

Table 2 Computation Cost of SKM-EG and SKM-AG
Table 3 Communication Cost of SKM-EG and SKM-AG

5 Experiment study

5.1 Related algorithms introduction

There are many existing works on spatial crowdsourcing task assignment, most of which adopt greedy selection strategies. However, the spatial crowdsourcing models and problems in these works differ from ours, so the existing algorithms cannot be used for comparison directly. In order to evaluate the effectiveness of our privacy-preserving task assignment strategies SKM-EG and SKM-AG, we summarize the basic ideas of these works and design three task assignment algorithms based on different heuristic strategies for comparison: G-MP, G-MT, and G-MD. Like SKM-EG and SKM-AG, these three algorithms perform task assignment on the encrypted bipartite graph. Specifically, G-MP iteratively selects the task-worker pair with the minimum travel time based on the sorted travel times. G-MT selects the worker with the minimum travel time for each task as soon as the task is released on the SC platform. G-MD selects the worker with the minimum travel distance for each task.

5.2 Experiment settings

We conduct our experiments on a synthetic dataset, whose setup is as follows. We use a 1000 × 1000 two-dimensional space as the working space and generate workers and tasks that appear on the SC platform dynamically. Their locations follow the Gaussian \(\mathcal N(500, 400^{2})\) and the speeds of workers follow the Gaussian \(\mathcal N(5,2.5^{2})\). The arrival times of all workers and tasks follow a Poisson distribution. Moreover, to compare effectiveness, we also generate tasks and workers whose arrival times follow a uniform distribution. The distance function is the Euclidean distance.
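A sketch of this data generation with NumPy; the clipping bounds and the Poisson arrival rate are our own illustrative choices, as the paper only specifies the distributions:

```python
# Synthetic workers/tasks as described above (locations, speeds, arrival times).
import numpy as np

rng = np.random.default_rng(0)

def generate(n_workers, n_tasks, rate):
    # locations ~ N(500, 400^2), clipped to the 1000 x 1000 space (clipping is our choice)
    w_loc = np.clip(rng.normal(500, 400, size=(n_workers, 2)), 0, 1000)
    t_loc = np.clip(rng.normal(500, 400, size=(n_tasks, 2)), 0, 1000)
    # speeds ~ N(5, 2.5^2), truncated to stay positive
    w_speed = np.clip(rng.normal(5, 2.5, size=n_workers), 0.5, None)
    # Poisson arrival process: exponential inter-arrival times with the given rate
    w_arrive = np.cumsum(rng.exponential(1 / rate, size=n_workers))
    t_arrive = np.cumsum(rng.exponential(1 / rate, size=n_tasks))
    return w_loc, w_speed, w_arrive, t_loc, t_arrive

workers_loc, workers_speed, workers_t, tasks_loc, tasks_t = generate(300, 300, rate=3.0)
```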

Two criteria are used to evaluate our proposed framework: computing time and average waiting time. For computing time, we compare the two encrypted bipartite graph construction methods, EG and AG, and compare the SKM-EG and SKM-AG strategies to the three greedy strategies described in Section 5.1. For average waiting time, we compare the results produced by SKM-EG, SKM-AG, and the three greedy strategies.

Tables 4 and 5 summarize the parameters used in the comparisons. In our simulation, we set the number of workers and tasks to 100, 200, 300, 400, and 500, and the time interval τ to 2, 5, 10, 20, and 50. We assume that the time window of our system is 10.

Table 4 Evaluation Settings of Efficiency
Table 5 Evaluation settings of effectiveness

The algorithms are implemented in Python and the experiments are performed on a PC with i7-7700K CPU and 16G memory.

5.3 Experimental results

5.3.1 Efficiency

As shown in Figure 2a, the computing time of the EG strategy is longer than that of the AG strategy. This is easy to explain: the EG strategy must send the encrypted squared distances to the CCP for decryption in order to recover exact travel times, and the additional masking required for this communication also takes some time. Besides, the running times of both EG and AG are short enough to be applied in practice.

Figure 2 Effect of Workers/Tasks Number

Figure 2b shows the running times of secure KM (SKM), G-MP, G-MT, and G-MD. The running time of SKM is clearly longer than the others, but it remains acceptable for practical use. We can observe that the running time of SKM with 100 workers/tasks is nearly equal to that with 80 workers/tasks. The reason is that the instance with 100 workers/tasks happens to require fewer rounds of computation to obtain the assignment set. Moreover, our work overcomes the limitation met in [16], where the key size cannot be set large enough without incurring prohibitive cost: we can compute the real travel time without worrying that a large number of workers will cause the result to overflow. In other words, an important contribution of our work is to break through the scalability limitation of Liu et al.'s framework and to put the method into real-world practice.

5.3.2 Effectiveness

Figure 3a and b show the performance of our strategies when varying the number of workers and tasks. Our SKM-EG and SKM-AG strategies clearly perform much better than the others, and SKM-AG is very close to the locally optimal solution SKM-EG. Specifically, in Figure 3a, we fix the number of tasks at 300 and vary the number of workers from 100 to 500. We observe that the average waiting time when the number of workers is 300 (equal to the number of tasks) is higher than in the neighboring settings. The reason is that when the numbers of workers and tasks are equal, all of them can and must be matched, so some remote workers and tasks are matched to each other, causing long waiting times. Moreover, when the number of workers is less than 300, SKM-EG, SKM-AG, and G-MP perform better than the others. The reason is that G-MT and G-MD assign a worker to each task as soon as it arrives; when tasks outnumber workers, this myopic view may miss better matchings. When the number of workers exceeds 300, G-MD performs worst, which shows that it is better to use travel time instead of travel distance. In Figure 3b, we fix the number of workers at 300 and vary the number of tasks from 100 to 500. The results show characteristics similar to Figure 3a, regardless of whether the numbers of workers and tasks are equal. Finally, our SKM-AG strategy is not sensitive to the ratio between tasks and workers and always performs well.

Figure 3 Effect of Workers/Tasks Number

Figure 4a and b show the performance of our strategies when varying the time interval τ. We generate workers and tasks whose arrival times follow a Poisson distribution and a uniform distribution, respectively. First, we can observe that our SKM-AG strategy is very close to the locally optimal solution SKM-EG and performs much better than the others. Second, when the time interval τ is small, the average waiting time of all tasks is longer than for the adjacent values, because only a few workers are available in each round, so some worker-task pairs inevitably have long travel times. Third, when the time interval τ is large, the average waiting time of all tasks is also longer than for the adjacent values, because many tasks appear early but wait a long time before being matched. Finally, both Figure 4a and b show that it is better to set the time interval τ to between 10 and 20 rather than too large or too small. The distribution of the arrival times of tasks and workers has little effect on our SKM-AG strategy, which shows similar performance in Figure 4a and b.

Figure 4 Effect of time interval τ

6 Related work

In this section, we review representative work from three aspects: task assignment for spatial crowdsourcing, online matching problems, and privacy protection in spatial crowdsourcing.

Spatial crowdsourcing is becoming prevalent in both the research community (e.g., [2, 3, 28, 29]) and industry (e.g., Waze, DiDi). Task assignment is the core problem in spatial crowdsourcing [3,4,5,6, 11, 30, 31, 33,34,35, 38]. To the best of our knowledge, [11] first proposed the task assignment problem in spatial crowdsourcing, with the objective of maximizing the total number of assigned tasks. Some follow-up works focused on the same objective, such as [30]. Other works proposed different constraints and goals for new application scenarios: [11] aims to find a maximum-cardinality matching with minimum total distance, and [5] aims to maximize the reliability and diversity of finished tasks. In [38], worker acceptance is maximized to improve system throughput. In real-world scenarios, both workers and tasks arrive at the platform dynamically, yet some earlier studies focus on the static scenario; [10] first used an online model to describe the assignment process. Most existing works consider the static scenario and the economic benefit of the crowdsourcing platform, whereas we focus on user experience and aim to find a maximum-cardinality matching with minimum total time between the matched worker-task pairs in the online scenario.

In [28], Tong et al. categorize task assignment in spatial crowdsourcing into (static) offline and (dynamic) online scenarios. In online scenarios, there are two ways to deal with the workers and tasks that appear dynamically one by one in the physical space: (1) the batch-based matching approach [3, 6, 11, 29]; (2) the online matching approach (i.e., workers/tasks are assigned as soon as they reach the platform) [4, 31, 33, 35]. Our work adopts the batch-based matching approach in the online scenario and extends [3, 6, 11, 29] in the privacy protection aspect.

Privacy protection is an emerging issue in spatial crowdsourcing, focusing on protecting the information (e.g., location, speed) of workers and tasks in dynamic scenarios. State-of-the-art privacy protection techniques are as follows: (1) cloaked-location-based protection (i.e., the location of a worker is transformed into a cloaked area, as in [23]); (2) differential-privacy-based protection (i.e., workers send their locations to a trusted third party, which sanitizes them according to differential privacy techniques, such as [8, 26, 27, 37]); (3) encrypted-data-based protection (i.e., the exact distances between tasks and workers are computed from their encrypted locations, as in [15, 16, 18, 25]). Our work uses encrypted-data-based protection and improves upon [16, 18, 25] in two aspects. First, [18, 25] only compute the encrypted distance from the encrypted locations of workers and tasks, which involves only additions on encrypted data, whereas our work has to compute the encrypted travel time, which involves division over ciphertexts. Second, [16] only supports small-scale computation with its homomorphic division operation on ciphertexts, whereas our division is replaced by a least-common-multiple-based approach that can handle large-scale computation.

7 Conclusion

In this paper, we have studied a novel task assignment problem in spatial crowdsourcing, named the user experience-driven secure task assignment problem, which finds k task assignment instance sets in kτ time units to minimize the average waiting time of all tasks while satisfying the security requirement. We have presented two methods to construct encrypted bipartite graphs to protect the private data of both workers and users. We have also enhanced the conventional KM algorithm and made it workable on encrypted graphs. Extensive experiments demonstrate the efficiency and effectiveness of our proposed approach.