1 Introduction

Crowdsourcing platforms are disrupting traditional work marketplaces. Their ability to compute high-quality matchings between tasks and workers, instantly and worldwide, for paid or voluntary work, has made them unavoidable actors of the 21\(^{st}\) century economy. Early crowdsourcing platforms did not (and still do not) require strong or specific skills from their workers; they include for example Amazon Mechanical Turk (for online micro-tasks), Uber (for car-driving tasks), or TaskRabbit (for simple home-related tasks, e.g., cleaning or repairing). Today's crowdsourcing platforms go one step further by addressing skill-intensive contexts (e.g., general team building, collaborative engineering) through the collection and use of fine-grained worker profiles. Such platforms carry the promise to facilitate, accelerate, and spread innovation at an unprecedented scale.

However, abusive behaviors of crowdsourcing platforms against workers, whether deliberate or not, are frequently reported in the news or on dedicated websites (see, e.g., the privacy scandals due to illegitimate accesses to the geolocation data of a well-known drivers-riders company, or the large-scale exposure of workers' identifying and sensitive information, such as real name, book reviews, or wish list, through Amazon Mechanical Turk IDs [8]). The problem is even more acute with skill-intensive crowdsourcing platforms since they collect detailed worker profiles in order to compute highly accurate matchings (e.g., demographics, comprehensive set of skills, detailed past experiences, personal preferences, daily availabilities, tools possessed). We thus advocate a sound protection of workers' profiles against illegitimate uses: in addition to the necessary compliance with fundamental rights to privacy, it is a precondition for a wide adoption of crowdsourcing platforms by individuals.

Fig. 1. Our approach to privacy-preserving task assignment

Computing (or at least facilitating) the assignment of tasks to workers is the fundamental role of the platform. This paper considers precisely the problem of computing a high-quality matching between skill-intensive tasks and workers while preserving workers' privacy. To the best of our knowledge, this problem has only been addressed by a single recent work [6]. However, that work is based on costly homomorphic encryption primitives, which strongly hamper its performance and prevent it from reasoning about skills within the assignment algorithm (e.g., no use of semantic proximity).

We propose an approach (see Fig. 1) that addresses these issues by making the following contributions:

  1.

    A simple skills model for a worker’s profile: a bit vector and a taxonomy.

  2.

    An algorithm run independently by each worker for perturbing her profile locally before sending it to the platform. By building on the proven randomized response mechanism [3, 11], this algorithm is privacy-preserving (provides sound differential privacy guarantees [5]), and lightweight (no cryptography, no distributed computation, only bitwise operations).

  3.

    A suite of weight functions to be plugged into a traditional assignment algorithm run by the platform, dedicated to increasing the quality of matchings performed over perturbed profiles. Our weight functions reduce the impact of the perturbation by leveraging the skills taxonomy, vertically and horizontally: they average skills according to their semantic proximity, which reduces the variance of the differentially-private perturbation. The variance reduction is mathematically sound and does not jeopardize privacy.

  4.

    An experimental study (see the full version [1]), over a synthetic taxonomy and various synthetic datasets, that shows promising preliminary results about the practical adequacy of our approach from the sides of performance and quality.

For space reasons, we give in this paper an overview of our approach. We refer the interested reader to the full version of our work [1], which describes our approach in detail, presents its experimental results, and positions it with respect to related work.

The rest of the paper is organized as follows. Section 2 introduces the notions used in our approach and defines more precisely the problem we tackle. We overview our algorithms in Sect. 3 and conclude in Sect. 4 by outlining future work.

2 Problem Definition

Skills and Participants. The set of skills that can be possessed by a worker (resp. requested by a task) is denoted \(\mathcal {S}\). A worker's profile \(p_i \in \mathcal {P}\) (resp. a task \(t_i \in \mathcal {T}\)) is represented by a bit vector, i.e., \(p_i \in {\{0, 1\}}^{|\mathcal {S}|}\) (resp. \(t_i \in {\{0, 1\}}^{|\mathcal {S}|}\)), where each bit corresponds to a skill \(s_j \in \mathcal {S}\) and is set to 1 if the given worker has the given skill (resp. if the given task requests the given skill). Without loss of generality, we consider that each requester has a single task and that the number of workers and requesters is the same (i.e., \(|\mathcal {P}|=|\mathcal {T}|\)). Furthermore, we assume that a skills taxonomy \(\mathcal {S}_T\) exists [9], structuring the skills according to their semantic proximity, and such that the skills in \(\mathcal {S}\) are the leaves of \(\mathcal {S}_T\) (i.e., no non-leaf node can be possessed or requested). The non-leaf nodes of the taxonomy are called super-skills (Fig. 2).

Fig. 2. Example of a skill taxonomy \(\mathcal {S}_T\)
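To make the data model concrete, here is a minimal Python sketch of a profile encoded as a bit vector over the leaves of a taxonomy stored as child-to-parent pointers; the toy taxonomy and all names are purely illustrative (they are not taken from Fig. 2).

# Minimal sketch: hypothetical toy taxonomy stored as child -> parent pointers;
# leaves are concrete skills, internal nodes are super-skills, the root maps to None.
PARENT = {
    "python": "programming", "java": "programming",
    "welding": "manufacturing", "milling": "manufacturing",
    "programming": "engineering", "manufacturing": "engineering",
    "engineering": None,  # root
}
SKILLS = sorted(s for s in PARENT if s not in PARENT.values())  # leaves only

def make_bitvector(possessed):
    """Encode a set of skills as a bit vector aligned with SKILLS."""
    return [1 if s in possessed else 0 for s in SKILLS]

p = make_bitvector({"python", "welding"})  # a worker's profile
t = make_bitvector({"python"})             # a task uses the same encoding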

The platform is essentially in charge of intermediating between workers and requesters. The workers’ profiles are considered private while the requesters’ tasks are not. The platform holds the set of workers’ profiles, perturbed to satisfy differential privacy (defined below) and denoted \(\widetilde{\mathcal {P}}\), as well as the exact set of requesters’ tasks \(\mathcal {T}\). All participants, i.e., workers, requesters, and the platform, are considered to be honest-but-curious. This means that they participate in the protocol without deviating from its execution sequence (e.g., no message tampering, no data forging) but they will try to infer anything that is computationally-feasible to infer about private data (i.e., the set of workers’ non-perturbed profiles \(\mathcal {P}\)).

The Traditional Tasks-to-Workers Assignment Problem. In a traditional context, where workers' profiles are not considered private, the objective of the crowdsourcing platform is to assign a worker to each task such that the overall expected quality is maximized. This well-known combinatorial optimization problem is referred to as the assignment problem and can be expressed as a standard linear program [7] (assuming \(|\mathcal {T}| = |\mathcal {P}|\)). Assignment algorithms rely on a weight function \(\mathtt {C}:\mathcal {T}\times \mathcal {P}\rightarrow \mathbb {R}\) in charge of defining the cost of each assignment, i.e., the divergence between the requirements vector of a task and the skills vector of a worker. Common weight functions include the usual distance metrics (e.g., Hamming distance) or dissimilarities (e.g., cosine distance). Since our approach is independent of the assignment algorithm, we simply use the Hungarian method [7], a standard academic choice.
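As an illustration, the following sketch plugs a weight function into an off-the-shelf linear assignment solver (scipy's linear_sum_assignment, which solves the same optimization problem as the Hungarian method); the Hamming-style cost shown here is only a placeholder for the weight functions of Sect. 3.2.

# Illustrative sketch: task assignment with a pluggable weight function C.
import numpy as np
from scipy.optimize import linear_sum_assignment

def hamming_cost(task, profile):
    """Number of positions where the task and profile bit vectors differ."""
    return int(np.sum(np.asarray(task) != np.asarray(profile)))

def assign(tasks, profiles, weight=hamming_cost):
    """Return (task index, worker index) pairs minimizing the total cost."""
    cost = np.array([[weight(t, p) for p in profiles] for t in tasks])
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))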

Security. We say that our approach is secure against honest-but-curious participants if and only if no participant learns any information about the set of non-perturbed profiles \(\mathcal {P}\) that has not been perturbed by a differentially-private mechanism, where differential privacy [4], the current de facto standard model for disclosing personal information under sound privacy guarantees, is defined below. Differential privacy is self-composable [10] and secure under post-processing [5].

Definition 1

(Differential Privacy [4]). A randomized mechanism \(\mathtt {M}\) satisfies \(\epsilon \)-differential privacy with \(\epsilon >0\) if for any possible sets of workers' profiles \(\mathcal {P}\) and \(\mathcal {P}'\) such that \(\mathcal {P}'\) is \(\mathcal {P}\) with one additional profile (or one profile fewer), and any possible set of outputs \(O \subseteq Range(\mathtt {M})\),

$$\begin{aligned} \Pr [\mathtt {M}(\mathcal {P}) \in O] \le e^\epsilon \times \Pr [\mathtt {M}(\mathcal {P}') \in O] . \end{aligned}$$
(1)

Quality. The inherent information loss due to the differentially-private perturbation impacts the quality of the worker-to-task assignment. This is the price to pay to satisfy a sound privacy model. We quantify this impact by measuring the relative increase of the assignment cost as well as the fraction of profiles that have all the skills required by the task to which they are assigned (see the full version for formal definitions [1]).
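For concreteness, a possible reading of these two quality measures is sketched below (the formal definitions are in [1]); the function names and conventions are ours.

def relative_cost_increase(cost_perturbed, cost_exact):
    """Relative increase of the total assignment cost due to the perturbation."""
    return (cost_perturbed - cost_exact) / cost_exact

def fully_qualified_fraction(assignment, tasks, true_profiles):
    """Fraction of assigned workers possessing every skill required by their task."""
    ok = sum(
        all(p >= t for t, p in zip(tasks[ti], true_profiles[wi]))
        for ti, wi in assignment
    )
    return ok / len(assignment)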

3 A Flip-Based Approach

This section overviews our approach. We first focus on the workers’ side: we describe the algorithm that we propose for perturbing each worker’s profile, show its adequacy to our context, and demonstrate that it complies with our security model. Second, we shift to the platform’s side. We explain how to reduce the impact of the differentially private perturbation (while still satisfying differential privacy) and we describe the assignment algorithm based on perturbed profiles. Finally, we overview technical means for letting workers fetch their assignment in a secure way in order to complete it. We refer the interested reader to the full version [1] for more details (including the formal proofs).

3.1 At a Worker’s Side: Local Perturbation

Building Block: Randomized Response. Randomized response [11] is a simple yet powerful perturbation mechanism shown to satisfy differential privacy (see below). Basically, it inputs a single bit (e.g., the answer of an individual to a sensitive boolean question) and flips it randomly according to a well-chosen probability distribution. We describe below the variant called innocuous question that we use in this paper and show that it satisfies differential privacy. Let \(x \in \{0,1\}\) be a private value. The randomized response mechanism simply outputs the perturbed value of x, denoted \(\widetilde{x}\), as follows:

$$\begin{aligned} \widetilde{x} = {\left\{ \begin{array}{ll} x &{}\text {with probability } 1-\Pr _{flip} \\ 1 &{}\text {with probability } \Pr _{flip}\times \Pr _{inno} \\ 0 &{}\text {with probability } \Pr _{flip}\times (1-\Pr _{inno}) \\ \end{array}\right. } \end{aligned}$$

where \(\Pr _{flip}\) depends on \(\epsilon \) (see below) and \(\Pr _{inno} \in [0, 1]\). We use \(\Pr _{inno} = 0.5\) in the rest of the paper, since it minimizes the variance of the value of \(x\) estimated after perturbation [3].

Claim

For a given differential privacy parameter \(\epsilon >0\), and a worker’s profile made of a single bit to be flipped, the innocuous question randomized response scheme satisfies \(\epsilon \)-differential privacy if \(\Pr _{flip} = \frac{2}{1+e^\epsilon }\) (see [1] for the proof).
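The following Python sketch implements the innocuous-question randomized response described above, with \(\Pr _{inno} = 0.5\) and \(\Pr _{flip} = \frac{2}{1+e^\epsilon }\) as in the claim. As a quick sanity check, with these values the probability of outputting 1 is \(e^\epsilon /(1+e^\epsilon )\) when \(x=1\) and \(1/(1+e^\epsilon )\) when \(x=0\); their ratio is exactly \(e^\epsilon \), consistent with the claim.

import math
import random

def randomized_response(x, epsilon):
    """Innocuous-question randomized response on one private bit x (sketch).
    With probability 1 - pr_flip, report x truthfully; otherwise answer the
    innocuous question, i.e., output 1 with probability pr_inno = 0.5."""
    pr_flip = 2.0 / (1.0 + math.exp(epsilon))
    if random.random() < pr_flip:
        return 1 if random.random() < 0.5 else 0
    return x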

Flip Mechanism. Our Flip mechanism (Algorithm 1) essentially applies the randomized response mechanism to each binary skill of a worker's profile before sending it to the platform, and thus inherits its high efficiency. The self-composability of differential privacy allows this by distributing the privacy budget \(\epsilon \) over the bits of the skills vector.

Algorithm 1. The Flip mechanism

Claim

The Flip mechanism satisfies \(\epsilon \)-differential privacy (see [1] for the proof).
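A possible sketch of the Flip mechanism, reusing the randomized_response function above; the uniform split of the budget over the bits is our assumption for illustration (the exact budget allocation is given in Algorithm 1 of the full version [1]).

def flip(profile, epsilon):
    """Sketch of Flip: perturb each bit with randomized response, spending an
    equal share epsilon / len(profile) per bit so that, by self-composition,
    the whole perturbed profile satisfies epsilon-differential privacy."""
    eps_per_bit = epsilon / len(profile)
    return [randomized_response(bit, eps_per_bit) for bit in profile]

# Each worker runs flip locally, e.g., noisy_p = flip(p, epsilon=1.0),
# and uploads only noisy_p to the platform.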

3.2 At the Platform’s Side: Task Assignment

Efficient traditional assignment algorithms do not need any modification to work with perturbed profiles, which are bit vectors exactly like non-perturbed profiles. The main question is the impact of our perturbation on quality, and this impact is naturally related to the weight function \(\mathtt {C}:\mathcal {T}\times \mathcal {P}\rightarrow \mathbb {R}\) on which assignment algorithms rely. As there is no clear consensus on what makes a good weight function for task assignment, in the sequel we recall several reasonable functions, which either ignore or use the skill taxonomy. We also propose new weight functions and explain how they cope with the differentially-private perturbation.

Existing Weight Functions. Numerous weight functions have been proposed as metrics over skills. The Hamming distance is a common choice to compute the dissimilarity between two bit vectors, but it does not capture the semantics needed for crowdsourcing (e.g., a worker possessing all the skills has a high Hamming distance from a task requiring only one skill, although she is perfectly able to perform it). The weight function proposed in [9] addresses this problem by relying on a taxonomy. We slightly adapt it and call it the Ancestors weight function (AWF for short).

Definition 2

( \(\mathtt {Ancestors}\) Weight Function (adapted from [9])). Let \(d_{max}\) be the maximum depth of the taxonomy \(\mathcal {S}_T\). Let \(\mathrm {lca}(s, s') \in \mathcal {S}_T\) be the lowest common ancestor of skills \(s\) and \(s'\) in the taxonomy.

$$\begin{aligned} \mathtt {AWF}(t, \widetilde{p}) = { \sum _{s_i \in \widetilde{p}} { \min _{s_j \in t} \left( \frac{d_{max} - \mathrm {depth}(\mathrm {lca}(s_i, s_j))}{d_{max}} \right) } } \end{aligned}$$
(2)
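A direct transcription of AWF on the toy taxonomy sketched in Sect. 2 could look as follows (illustrative helper names; it assumes a single-rooted taxonomy and a task requiring at least one skill).

def depth(node):
    """Depth of a node in the taxonomy (the root has depth 0)."""
    d = 0
    while PARENT[node] is not None:
        node = PARENT[node]
        d += 1
    return d

def lca(a, b):
    """Lowest common ancestor of two skills in the taxonomy."""
    ancestors = set()
    node = a
    while node is not None:
        ancestors.add(node)
        node = PARENT[node]
    node = b
    while node not in ancestors:
        node = PARENT[node]
    return node

D_MAX = max(depth(s) for s in SKILLS)

def awf(task, profile):
    """Ancestors weight function over bit vectors (Definition 2)."""
    t_skills = [s for s, b in zip(SKILLS, task) if b]
    p_skills = [s for s, b in zip(SKILLS, profile) if b]
    return sum(
        min((D_MAX - depth(lca(si, sj))) / D_MAX for sj in t_skills)
        for si in p_skills
    )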

Naive Skill-Level Weight Functions. The Missing weight function (MWF for short) between a worker and a task revisits the Hamming distance. It defines a task-to-worker assignment cost that is intuitive in a crowdsourcing context: the number of skills required by the task that the worker does not have (see Definition 3).

Definition 3

( \(\mathtt {Missing}\) Weight Function \(\mathtt {(MWF)}\) ). \(\mathtt {MWF}:\mathcal {T}\times \widetilde{\mathcal {P}}\rightarrow \mathbb {R}\) is defined as follows:

$$\begin{aligned} \mathtt {MWF} (t, \widetilde{p}) = \sum _{\forall i} t[i] \wedge \lnot \widetilde{p}[i] \end{aligned}$$
(3)
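MWF translates directly into a one-line count over the bit vectors, for instance:

def mwf(task, profile):
    """Missing weight function: skills required by the task that the
    (perturbed) profile does not contain (Definition 3)."""
    return sum(t & (1 - p) for t, p in zip(task, profile))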

Leveraging the Taxonomy. In realistic contexts, the differentially-private perturbation may overwhelm the information contained in the original profiles and make the perturbed profiles close to uniformly random bit vectors. We cope with this challenging issue by building on the taxonomy \(\mathcal {S}_T\). Indeed, the taxonomy makes it possible to group large numbers of skills according to their semantic proximity and to reduce the variance of the perturbation by using group averages [2].

Climbing Weight Function. The Climbing weight function (CWF for short) leverages the vertical relationship given by the taxonomy by averaging, for each profile, the skills along the root-to-leaf paths. In other words, before performing the assignment, the platform converts each perturbed profile into a tree, i.e., the same tree as the taxonomy \(\mathcal {S}_T\), and for each node n of the tree, it computes the mean of the skills that appear below n (interpreting the boolean values 1 and 0 as integers). For a given node, this mean is a rough estimator of the fraction of descendant skills possessed; we call it the node's score below. During an assignment, given a task and a perturbed profile, the Climbing weight function essentially computes, at each level, the distance between the scores vector of the task and the scores vector of the profile. Definition 4 formalizes the Climbing weight function.

Definition 4

( \(\mathtt {Climbing}\) Weight Function \(\mathtt {(CWF)}\) ). Let \(v_i\) (resp. \(u_i\)) denote the scores vector at level i in the tree corresponding to the profile \(\widetilde{p}\) (resp. to the task t), and \(\mathtt {d} :\mathbb {R}^n\times \mathbb {R}^n\rightarrow \mathbb {R}\) be a classical distance function on real-valued vectors (e.g., Cosine). Then, \(\mathtt {CWF}:\mathcal {T}\times \widetilde{\mathcal {P}}\rightarrow \mathbb {R}\) is defined as follows:

$$\begin{aligned} \mathtt {CWF} (t, \widetilde{p}) = \sum _{\forall i} i \times \mathtt {d} (u_i, v_i) \end{aligned}$$
(4)
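One possible reading of CWF on the toy taxonomy above (reusing PARENT, SKILLS, depth and D_MAX from the previous sketches) is given below; including the leaf level, scoring a leaf by its own bit, and the cosine convention for all-zero score vectors are all our assumptions.

from math import sqrt

def ancestors_of(s):
    """Strict ancestors of a node, bottom-up."""
    out, node = [], PARENT[s]
    while node is not None:
        out.append(node)
        node = PARENT[node]
    return out

def leaves_below(node):
    """Leaf skills in the subtree rooted at node (the node itself if a leaf)."""
    return [s for s in SKILLS if node in ancestors_of(s) or s == node]

def level_scores(vec, level):
    """Scores (mean of descendant leaf bits) of all nodes at the given depth."""
    bits = dict(zip(SKILLS, vec))
    nodes = sorted(n for n in PARENT if depth(n) == level)
    return [sum(bits[l] for l in leaves_below(n)) / len(leaves_below(n)) for n in nodes]

def cosine_distance(u, v):
    """Cosine distance between two real-valued score vectors."""
    nu, nv = sqrt(sum(x * x for x in u)), sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 1.0  # convention when one score vector is all zeros
    return 1.0 - sum(x * y for x, y in zip(u, v)) / (nu * nv)

def cwf(task, profile):
    """Climbing weight function: level-weighted distance between score vectors."""
    return sum(
        level * cosine_distance(level_scores(task, level), level_scores(profile, level))
        for level in range(1, D_MAX + 1)
    )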

Touring Weight Function. The Touring weight function (TWF for short) leverages the horizontal relationship given by the taxonomy, i.e., the proximity between neighbouring skills. As described in Definition 5, it returns the average path-length, according to the taxonomy \(\mathcal {S}_T\), between the skills required by the task and the skills of the worker's profile. The expected variance reduction comes from averaging the path-lengths over the full Cartesian product of the skills required by a task and the skills set to 1 in a perturbed profile. The reduction depends on the taxonomy (similarly to Climbing) and on the number of skills averaged.

Definition 5

( \(\mathtt {Touring}\) Weight Function \(\mathtt {(TWF)}\) ). Let \(\leadsto \) denote the path-length operator between two skills in the taxonomy \(\mathcal {S}_T\). Then, \(\mathtt {TWF}:\mathcal {T}\times \widetilde{\mathcal {P}}\rightarrow \mathbb {R}\) is defined as follows:

$$\begin{aligned} \mathtt {TWF} (t, \widetilde{p}) = \frac{ \sum _{\forall i} \sum _{\forall j} (t[i] \wedge \widetilde{p}[j]) \times ( s_i \leadsto s_j ) }{ \sum _{\forall i} \widetilde{p}[i] \times \sum _{\forall i} t[i] } \end{aligned}$$
(5)
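TWF can be sketched as follows, reusing the depth and lca helpers above; the handling of a task or profile with no skill set to 1 is our own convention.

def path_length(a, b):
    """Number of taxonomy edges between two skills, going through their LCA."""
    return depth(a) + depth(b) - 2 * depth(lca(a, b))

def twf(task, profile):
    """Touring weight function: average path length between the skills required
    by the task and the skills set to 1 in the (perturbed) profile (Definition 5)."""
    t_skills = [s for s, b in zip(SKILLS, task) if b]
    p_skills = [s for s, b in zip(SKILLS, profile) if b]
    if not t_skills or not p_skills:
        return float("inf")  # our convention: no usable skill information
    total = sum(path_length(si, sj) for si in t_skills for sj in p_skills)
    return total / (len(t_skills) * len(p_skills))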

3.3 Post-assignment Phase

Workers need a secure way to fetch their own assignment. This can be solved easily by well-known technical means. For example, the platform could post the assignments on the Web (e.g., on a dedicated webpage for each perturbed profile) so that each worker would then access it through a secure web browser (e.g., TOR).

4 Conclusion

We have overviewed in this paper a lightweight privacy-preserving approach to the problem of assigning tasks to workers. Our approach allows each worker to perturb her skill profile locally in order to satisfy the stringent differential privacy model without any additional communication or computation cost. We have proposed novel weight functions that can easily be plugged into traditional centralized assignment algorithms, and that cope with the differentially-private perturbation by leveraging a skill taxonomy. Additionally, promising preliminary results of experiments performed over a synthetic taxonomy and synthetic datasets are presented in the full version of the paper [1]. Future work includes consolidating the experiments (e.g., more profiles and tasks, alternative quality measures), using the taxonomy within the Flip mechanism, collecting a large-scale skill dataset, and further exploring the performance/quality tradeoff by designing other profile perturbation strategies (e.g., collaborative perturbation protocols designed to minimize the perturbation).