1 Introduction

The development of mobile Internet breeds rich location-based applications and related research [5, 8, 11, 14, 15, 18]. In recent years, the combination of sharing economy and mobile Internet is initiating many kinds of spatial crowdsourcing platforms such as Gigwalk (http://www.gigwalk.com), TaskRabbit (http://www.taskrabbit.com) and gMission [30], where requesters of tasks and workers log in and off dynamically at different time and places. A fundamental research topic on spatial crowdsourcing is task assignment [21, 24], i.e., assigning suitable workers to nearby tasks with different optimization objectives, such as maximizing the total utility [27], maximizing the number of completed tasks [26] and minimizing the total moving distance of the workers [29].

Existing works on task assignment mainly assume that all the tasks are atomic and homogeneous, which can be completed byany single worker, such as picking up and delivering things [31].

However, like the research in the web-based crowdsourcing [22], there are also some tasks in spatial crowdsourcing which are complex and structural and require a group of workers with different kinds of skills to work collaboratively. For example, a requester on gMission can publish a task whose content is to organize a party. Then, workers who are good at preparing the food, the drinks and the lights may be needed. Or, if the task is yard work, workers who can mow lawns, repair the fence and take care of the flowers are necessary.

Although there are some studies on assigning workers for complex tasks such as [23, 28], they only consider the static scenario. In other words, they assume that the information on the tasks and workers is known in advance, based on which task assignment is performed. However, in practice, the tasks and workers log in and out of the spatial crowdsourcing platform dynamically and we always perform task assignment based on the information released currently. In this paper, we study the O nline M ulti-skill-A ware T ask A ssignment problem (Fig. 1).

Fig. 1
figure 1

Locations of Tasks and Workers

We next use an example to demonstrate the above motivation.

Example 1

Suppose we have three tasks t1-t3 whose information is shown in Table 1. Specifically, each task has a required skill list, a location and a budget. In this example, all the three tasks require workers with the skills of mixing drinks (s1), preparing food (s2), preparing barbecue (s3), decorating venues (s4) and taking photos (s5). They have different locations. For example, t1 is located at the position (1,1). They all have budget of 50.

Table 1 Information of tasks

We also have six workers whose information is shown in Table 2. Specifically, each worker has a mastered skill list and the corresponding fee of each skill. They also have locations. For example, the worker w1 masters the skill of s1 (mixing drinks), s2 (preparing food) and s3 (preparing barbecue). If s/he is assigned to a task and the content of the work is to mix drinks (s1) and prepare barbecue (s3), s/he will be paid 3 + 10 = 13.

Table 2 Information of workers

The arriving and leaving time of the above tasks and workers is shown in Table 3. For example, w1 arrives at 17:05 and leaves at 17:45. Note that we cannot wait until all the tasks and workers arrive and perform the task assignment, as some tasks or workers have left by then. Thus, we always need to perform task assignment based on partial information (namely the tasks and workers that have arrived) and cannot change the decisions that have been made.

Table 3 Timeline

Motivated by the above example, we formalize the O nline M ulti-skill-A ware T ask A ssignment (OMATA in short) problem. In our problem, the tasks with requirement on different skills and the workers mastering different skills appear and leave dynamically. We need to perform task assignment before the tasks and workers leave to maximize the total utility, which is defined as the remaining budget of all tasks after paying the assigned workers.

Briefly, we make the following contributions.

  • We formally define the OMATA problem and prove that even its offline version is NP-hard.

  • Inspired by the research of online micro task assignment, we propose the Online-Exact algorithm, which always calculates the optimal assignment through brute force search for each task.

  • To release the high time complexity of Online-Exact, we propose the Online-Greedy algorithm which assigns workers for each task via an effective heuristics method.

  • We conduct experiments on both synthetic and real dataset to evaluate the proposed methods.

The following of the paper is organized as follows. We review the related work in Section 2, formalize our problem and analyze its hardness in Section 3. Then we propose our solutions in Section 4 and conduct experiments in Section 5. We finally conclude in Section 6.

2 Related work

We review related works from two categories: task assignment in SC and team formation in social networks.

2.1 Task assignment in spatial crowdsourcing

2.1.1 Micro-task assignment

The early works on task assignment in SC mainly focus on micro-tasks, which can be completed by any single worker. Kazemi and Shahabi [3] propose task assignment problem in spatial crowdsourcing under the offline scenario and the goal is to maximize the total number of the assigned tasks. Tong et al. [27] consider online scenario of task assignment and proposes algorithms with theoretical guarantees to maximize the total utility score of the assignment. Song et al. [12] also focus on the online scenario and takes the influence of work space into consideration. Tong et al. [29] study online task assignment to minimize the total moving distance of the workers. Tong et al. [26] propose a prediction-based method to solve the online task assignment problem. Liu et al. [2] present a privacy-preserving task assignment protocol which can protect the privacy for both workers and tasks with acceptable overheads.

Tao et al. [13] recommend routes for workers to maximize the total utility. Zeng et al. [16] assign tasks to workers while trading off quality and latency of task completion. Tong et al. [17] propose a match-based approach to solve the dynamic pricing problem in spatial crowdsourcing. Tran et al. [20] propose a real-time framework for task assignment.

The differences between the above works and this paper is that we focus on the scenario where tasks require specific skills and cannot be completed by single workers generally. Thus, the methods of above works cannot apply to our problem.

2.1.2 Macro-task assignment

We use macro-task to denote the tasks that have requirement on different skills. Gao et al. [4, 7] recommend top-k teams with or without leaders to a macro-task, and the goal is to minimize the total cost of the recommended teams. Song and Cheng et al. [23, 28] study assigning workers for specialty-aware tasks to maximize the total utility score.

The biggest difference between our work and the above studies is that in our paper we consider the dynamics of the tasks and workers. Specifically, the tasks and workers appear dynamically and their information cannot be known in advance. Thus, we must make decisions (assignments) based on current information. Besides, the assignments that have been made cannot be canceled. Otherwise, the user experience of both the workers and the requesters of the tasks may be hurt. Because of the above challenges, existing algorithms such as those in [23, 28] cannot be applied or extended trivially to solve it, as they have to be performed on all of the tasks and workers. But in our problem, this information is inaccessible.

2.2 Team formation problem

Another related topic is the team formation problem, where teams of experts with different skills are formed to complete tasks requiring multiple skills and the goal is often to find a qualified team with the minimal communication cost, which is defined based on the social graph of the experts [6, 9, 10, 19], such as the diameter and the sum of the weights of the minimum spanning tree of the team members’ social graph in [9], the communication cost as the sum of the shortest distances between the members and the leader in [19]. Anagnostopoulos and Majumder et al. [6, 10] also consider the workload balance among the members of a team.

The difference between our problem and the above works on the team formation problem and its variants is that (i) the core focus in our paper is the total utility of the assignment and thus we do not consider the social relationships between users; (ii) we aim at assigning workers with multiple tasks while the above works mainly focus on assigning a team of experts with a single task.

3 Problem definition

In this section, we first define our problem and then analyze its hardness via the offline version.

Definition 1 (Worker)

A worker w is defined as < ow,bw,ew,Sw,Pw >, which means a worker appears at the location ow at time bw and will leave the platform at ew if no task is assigned to her/him. \(S_{w}=\{{s^{w}_{1}}, {s^{w}_{2}}, \cdots , s^{w}_{|S_{w}|}\}\) is the set of skills that w masters and \(P_{w}=\{{p^{w}_{1}}, {p^{w}_{2}}, \cdots , p^{w}_{|S_{w}|}\}\) is the multi-set of cost for each of the skills in Sw. In other words, if w is assigned to use her/his skill \({s^{w}_{i}}\) to perform a task, the responsible payment is \({p^{w}_{i}}\).

Definition 2 (Task)

A task t is defined as < ot,bt,et,St,Bt >, which means that a task t appears at the location ot at time bt and will leave if it is not assigned to workers before et. \(S_{t}=\{{s^{t}_{1}}, {s^{t}_{2}}, \cdots , s^{t}_{|S_{t}|}\}\) is the set of required skills of t. All skills in St should be mastered by an assigned worker or t cannot be completed. Bt is the total monetary budget.

We next define how to calculate a worker’s reward, which includes two parts: (1) reward for the skills that the worker is required to be responsible for and (2) reward for letting the worker move from her/his current location to the location of the assigned task. The reasons are as follows. First, the reward of a worker roots in the labour s/he contribute. Thus, the reward should include the fees for the specified skills that the workers are assigned. Besides, to complete a task, a worker should move from her/his location to the location of the task, which incurs extra cost. We think this cost should be paid to the assigned worker and the scale of the cost should be calculated according to the distance between the locations of the task and worker.

Definition 3 (Reward of Worker)

If worker w is assigned to perform task t and required to be responsible for the skills of \(S^{\prime }_{w}\subseteq S_{w}\), the reward of w is \(R(w,t,S^{\prime }_{w}) = \gamma \cdot dis(o_{w}, o_{t}) + {\sum }_{s\in S^{\prime }_{w}}{{p^{w}_{s}}}\), where dis(ow,ot) is the distance between Lw and Lt, which can be Euclidean distance or road network distance, γ is a global parameter representing the unit transportation fee.

We next define the utility of a task as follows.

Definition 4 (Utility of Task)

If a set of workers Wt is assigned to task t, the utility of task t is defined as \(U(t, W_{t}) = B_{t} - {\sum }_{t\in W_{t}}R(w,t,S^{\prime }_{w})\), where Bt is the budget of the task and \({\sum }_{t\in W_{t}}R(w,t,S^{\prime }_{w})\) is the summation of rewards of workers assigned to t. Note that if an assigned set of workers cannot complete the task, the reward of the task is zero.

We finally define our problem as follows.

Definition 5 (O nline M ulti-skill-A ware T ask A ssignment (OMATA) Problem)

Given a set of tasks T, a set of workers W and a global unit transportation fee γ, the problem is to assign workers to tasks to maximize the total utility of the completed tasks and the following constraints should be satisfied:

  • Invariable Constraint: once a set of workers is assigned to a task, it cannot be changed.

  • Skill Constraint: the workers assigned to a task should be able to cover the required skills;

  • Budget Constraint: the total rewards of workers assigned to a task cannot exceed the task’s total budget;

Hardness Analysis

We next prove that even the offline version of the OMATA problem is NP-hard. Thus, the OMATA problem is very difficult to address.

Theorem 1

The offline version of the OMATA problem is NP-hard.

Proof

We prove through a reduction from the weighted set cover (WSC) problem [25].

In an instance of the WSC problem, we are given a set U = {a1,a2,⋯ ,am} and its n subsets \(A_{1}, A_{2}, \cdots , A_{n} \subseteq U\). Each Ai has a weight wi. The goal is to find \(\mathcal {A}^{*}\subseteq \mathcal {A}=\{A_{1}, A_{2}, \cdots , A_{n}\}\) such that \(\cup \mathcal {A}^{*}=U\) and \({\sum }_{A_{j}\in \mathcal {A}^{*}}{w_{j}}\) is minimized.

We next show how to transform the WSC problem to an instance of the offline version of OMATA problem. Suppose we only have one task t which requires skills St = U and has much enough budget Bt. For n workers {w1,w2,⋯ ,wn}, their required fees for skills are all zero, and we adjust their locations and γ to make their transportation fee to perform t be ci. We aim to find a set of workers K to maximize the utility of t, which in this case equals \(B_{t}-{\sum }_{i\in K}c_{i}\). Thus we only need to minimize \({\sum }_{i\in K}c_{i}\). In this way, we reduce an instance of WSC problem to one of the OMATA problem. As the WSC problem is known to be NP-hard [25], the offline version of OMATA problem is also NP-hard. □

4 Method

We introduce the methods to solve the OMATA problem in this section.

4.1 Online-exact algorithm

Basic Idea

When a new object obj appears, which could be a task t or a worker w, we calculate the optimal assignment among the current set of tasks Tc, the current set of workers Wc through obj. In other words, obj must be involved in the assignment if such assignment exists.

Algorithm

Alg. (1) shows the pseudo-code of the Online-Exact algorithm. We first initialize the current assignment and the sets of tasks and workers which are not assigned currently in line 1. In Lines 2-3, for each new arrival object, which could be a task or a worker, we call the Exact Local Assignment (ELA in short) algorithm (see Alg. 2) which performs an optimal assignment for obj and update the assignment M. We update Tc,Wc in line 4 and return the final assignment in line 5. In Alg. 2, if obj is a task t (line 1), for each skill s required by t, we collect the workers who master this skill as set Ws (lines 3-4). We calculate the Cartesian product of all the Ws to enumerate all cases to assign workers for t in line 5 and return the case with the maximal utility in line 6. If obj is a worker w in lines 7-8, for each task tTc, we use the method in lines 1-6 to find the assignment \(M_l^t\) for t in lines 10-14. We finally update Ml using the assignment with the maximum utility among all \(\{M_l^t\}\) in line 15. We return the assignment in line 16.

figure f
figure g

Complexity

We assume the number of skills that a task requires or a worker masters is a constant C. For Alg. (2), the worst case happens when a new worker w arrives. For each task t that have arrived and have not left (line 10), we have to calculate all possible ways to assign a group of workers including w with t (in lines 11-14), which consumes O(|Wc|C) time. Thus the total complexity is |Tc||Wc|C. As a result, the time complexity of Alg. 1 is (|T| + |W|)|T||W|C.

We next give an running example of Alg. (1).

Example 2

After w4 arrives, we assign w1,w2,w3 and w4 to t1. Specifically, we let w1 perform the skill of s1, w2 perform the skill of s3, w3 perform the skill of s5 and w4 perform the skills of s2 and s4. Thus, the utility gained from t1 is 50 − 0.1 × (4.47 + 5 + 1 + 2.23) − (3 + 3 + 3 + 3 + 10) = 26.73. After that, Online-Exact does not assign workers for t2 and t3 as they cannot be completed by the remaining workers.

4.2 Online-Greedy algorithm

Basic Idea

Although Online-Exact can optimize the assignment for each task, it has two drawbacks. First, the time complexity is too high. Second, it may consume too many workers to pursue the optimal assignment for a single task and thus works bad when the workers are insufficient. In this subsection, we propose the Online-Greedy (OG for short) algorithm, which considers both the reward gained from a single task and the number of workers the task consumes.

Algorithm

Alg. (3) shows the pseudo-code of the Online-Greedy, which is similar to Alg. (1). The only difference lies in the line 3 where we use the Greedy Local Assignment (GLA) algorithm to perform the assignment for the new arriving object obj. Alg. (4) shows the pseudo-code of the GLA algorithm. When a task t appears in line 1, we iteratively assign workers with the minimal \(R(w, t, S^{\prime }_{w})/|S^{\prime }_{w}\cap S_{t}|\) to it in lines 4-8. The underlying idea is that we alway assign the worker with low reward and high skill coverage at the same time. Then we update the state of t by removing the covered skills in line 9 and calculating the remaining budget in line 10. If obj is a worker w in lines 12-14, we first find the task t with the minimum \(R(w,t,S^{\prime }_{w})\) in line 15. Then we assign workers to cover the remaining skills of t through recursion in line 16. We finally return the assignment in line 17.

figure h
figure i

Complexity

We still assume the number of skills that a task requires or a worker masters is a constant C. For Alg. (4), the worst case happens when a new worker w arrives. We first find a task t in line 15 using O(Tc) time. Then, we use O(C|Wc|) time to assign workers to t (in lines 4-10). Thus the total time complexity is O(|Tc| + |Wc|). As a result, the time complexity of Alg. 3 is (|T| + |W|)2.

We next give an running example of Alg. (3).

Example 3

When w4 arrives, we find t1 can be completed. We first assign w4 to t1 and let w4 to perform the skills of s1 and s5. Then, we assign w3 to t1 and let w3 to perform the skills of s2,s3 and s4 because w3 has the minimal value of \(R(w, t, S^{\prime }_{w})/|S^{\prime }_{w}\cap S_{t}|\). After that, we find t1 is completed and the utility is 10.77. Similarly, t2 cannot be completed until w5 comes. We assign w5 to t2 and let w5 perform the skills of s1,s2 and s4. Afterwards, we assign w2 who has the minimal value of \(R(w, t, S^{\prime }_{w})/|S^{\prime }_{w}\cap S_{t}|\) to t2. Thus, t2 can be completed and the utility is 5.5. When t3 comes, we first assign w6 to t3 and let w6 perform the skills of s2,s4 and s5 because this can get the minimal value of \(R(w, t, S^{\prime }_{w})/|S^{\prime }_{w}\cap S_{t}|\). We assign w1 and let w1 perform the skills of s1 and s3. Finally, the total utility is 24.76.

5 Evaluation

5.1 Experiment setup

Datasets

We use both real and synthetic datasets to evaluate our algorithms.

The real dataset is from [1], which is crawled from Meetup, an event-based social network. We use the data from the area of Hong Kong with latitude from 22.209 to 22.609 and longitude from 113.843 to 114.283. We use the events as the tasks and users as the workers. The tags are viewed as the skills. We remove the tasks which cannot be completed as they have the skills that are not mastered by any workers, the workers whose mastered skills do not have any overlap with all the tasks and the skills of workers which are not required by the tasks. Finally, the statistics of the remaining tasks and workers is shown in Table 4. Besides, we generate the budgets of tasks and costs of skills mastered by workers following the Gaussian distribution. Especially, because of the uncertain size of skills that both tasks require and workers master, we generate the mean of budget depending on the size of skills required by tasks. The variance of Bt related to one skill is fixed as 10 and the variance of pw is fixed as 5. The setting of these generated parameters can also be found in Table 4.

Table 4 Real dataset

We also use a synthetic dataset for experiments. Specifically, we generate data varying the following parameters.

  • |T|: the number of tasks;

  • |W|: the number of workers;

  • |St|: the number of skills that each task requires;

  • |Sw|: the number of skills that each worker masters;

  • Mean of Bt: the mean of budgets of different skills required by the tasks following Gaussian distribution;

  • Variance of Bt: the variance of budgets of different skills required by the tasks following Gaussian distribution;

  • Mean of pw: the mean of costs of different skills mastered by the workers following Gaussian distribution;

  • Variance of pw: the variance of costs of different skills mastered by the workers following Gaussian distribution;

  • Skill Range: the total number of skills;

  • γ: the global transaction fee for unit distance.

Statistics of the above parameters are shown in Table 5, where we mark our default settings in bold font. The appearing and leaving time of the tasks and workers are generated following uniform distribution.

Table 5 Synthetic dataset

Compared Algorithms

The following algorithms are compared in the experimental study.

  • Baseline, which assigns workers (tasks) to new arriving tasks (workers) according to the appearing order of the workers (tasks);

  • Online-Exact (Alg. (1)), which performs assignment for the new arriving tasks or workers via ELA algorithm (Alg. (2));

  • Online-Greedy (Alg. (3)), which performs assignment for the new arriving tasks or workers via GLA algorithm (Alg. (4));

Metrics

We compare the total utility, runtime and memory cost of the compared algorithms.

5.2 Experimental results on synthetic datasets

Because of the inefficiency of Online-Exact, we first study the performance of all the three algorithms in a small dataset where the number of tasks and workers is 100x times less than that of Table 5. We then only compare Baseline and Online-Greedy following the setting in Table 5.

Results on the small synthetic datasets

The results are shown in Fig. 2. It can be seen that the utility of Online-Greedy is a little less than Online-Exact and outperforms Baseline a lot. Reasonably, the time and memory cost of Online-Exact is extremely more than that of Online-Greedy and Baseline, making it useless in real applications. In Fig. 2e, the runtime of Online-Exact decreases overall with the increment of the number of tasks. The reason is that when there is only a small number of tasks, each task can be assigned to more workers. Due to the high time complexity of Online-Exact, more total time is needed. However, when there are more tasks, each task can only be assigned to a small number of workers. As a result, although more tasks should be assigned, the runtime is less.

Fig. 2
figure 2

Results on varying the number of the workers and tasks on a small dataset

Effects of the number of workers |W|

The results are presented in Fig. 3a-c. We first observe that the total utility increases reasonably with larger |W| for all two algorithms. The reason is that more workers are available and thus the utility gained from all the completed tasks increases. The increment of Baseline is not obvious. The reason is that Baseline does not try to select suitable workers who can bring more utility. As a result, Online-Greedy outperform Baseline a lot. We can find that Baseline is even slower than Online-Greedy when |W| is more than 6,000. The reason is that Baseline is very ineffective and as a result, there are always many tasks and workers waiting for assigning. Thus Baseline needs to spend much time to process each new arriving task and worker. While in Online-Greedy, the number of waiting tasks and workers is small, making it more efficient than Baseline. Finally, for the memory consumption, the observation is that generally the memory uses increase when |W| increases. Baseline consumes more memory as the heap of unassigned tasks and workers.

Fig. 3
figure 3

Results on varying the number of the workers and tasks

Effects of the number of tasks |T|

The results are presented in Fig. 3d-f. For the total utility score, we can first observe that Online-Greedy still outperforms Baseline. Besides, the tendency of Online-Greedy is complex. After analyzing the intermediate results of Online-Greedy, the phenomenon can be explained as follows. The total utility is effected by two factors, namely the number of finished tasks and the utility gained from each tasks. When |T| increases from 1,000 to 2,250, more tasks can be finished and the assignment for each task can be optimized as there are adequate available workers. However, when |T| increases from 2,250 to 3,000, although more tasks can be finished, the utility of each finished tasks decreases rapidly. As a result, the total utility deceases, too. When |T| increases more than 3,000, the number of finished workers is stable, as the workers are lacking. Thus, the utility is also steady. As for running time, we can find Online-Greedy is still faster than baseline. For the complex tendency of Online-Greedy, the reason is that first with the increase of |T|, nearly all the workers are assigned and thus few workers are accumulated, making the assignment for each new arriving task or worker faster. Then when |T| increases further, more tasks are accumulated, making the running time increase. For memory consumptions, still Baseline consumes more memory for the heap of tasks and workers.

Effects of mean of Bt

The results are presented in Fig. 4a-c. Firstly we can observe that the total utility increases reasonably when Bt increases as more utility scores can be obtained from each accomplished task. Online-Greedy still outperforms Baseline obliviously. As for running time, when the mean is 100, Online-Greedy is slower than Baseline. The reason is that the budget is small, thus many tasks and workers cannot be assigned and heap. In this case, Baseline is faster for its low time complexity. When the mean is bigger, for the effectiveness of Online-Greedy, less workers and tasks heap up, making Online-Greedy even faster than Baseline. When the mean is bigger than 400, Baseline can also consume the tasks and workers efficiently as Online-Greedy and runs faster than Online-Greedy. Finally, as for memory, when the mean is small, the heap of tasks and workers in Baseline makes it consume more memory than Online-Greedy. After the mean is large enough, the memory cost of the two algorithms is similar.

Fig. 4
figure 4

Results on varying the mean and variance of the tasks’ budget

Effects of variance of Bt

The results are presented in Fig. 4d-f. Firstly, we can see that the utility scores keep stable on varying the variance of Bt and Online-Greedy outperforms Baseline a lot. Second, as Baseline is ineffective, many tasks and workers heap up, making it consume more time and memory than Online-Greedy.

Effects of mean of the cost of workers’ each skill

The results are presented in Fig. 5a-c. As for the total utility, we can observe that the utility scores decrease as |pw| increases. This is reasonable as utility reduces when workers get more amount of payment. Similarly, with the increment of the mean, more workers and tasks heap up in Baselines, thus Baseline consumes more time and memory while the time and memory cost of Online-Greedy is stable.

Fig. 5
figure 5

Results on varying mean and variance of the cost of workers’ each skill

Effects of variance of the cost of workers’ each skill

The results are presented in Fig. 5d-f. Online-Greedy has more utility and less time and memory cost. The reason is similar. Baseline is ineffective and many tasks and workers heap up. Thus, although the time complexity of Baseline is lower than Online-Greedy, it still consumes more time and memory.

Effects of the number of required skills of tasks

The results are presented in Fig. 6a-c. Online-Greedy still outperforms Baseline in term of utility. As for time, initially, Baseline is more efficient. The reason is that the number of required skills is small. Thus all the tasks can be completed easily and as a result there are not many tasks and workers heaping up in Baseline. However, when the number is bigger, Baseline cannot find workers to complete enough tasks which results in the heap of tasks and workers and makes Baseline slower than Online-Greedy. In terms of memory, with the increment of the number of required skills, Baseline is influenced for the above reason and consumes more memory.

Fig. 6
figure 6

Results on varying the number of required skills of tasks, the number of mastered skills of workers and the number of total skills

Effects of the number of mastered skills of workers

The results are presented in Fig. 6d-f. Online-Greedy has more utility than Baseline. Still for the heap of the tasks and workers, Baseline consumes more time and memory.

Effects of the number of total skills

The results are presented in Fig. 6g-i. We find that both algorithms have less utility with the increment of the number of total skills. This is reasonable as generally the tasks become harder to complete. Besides, Online-Greedy outperforms Baseline again in terms of utility, time and memory.

Effects of γ

The results are presented in Fig. 7a-c. First, with the increment of γ both of the algorithms has less utility. This is reasonable as the the payment for the workers is increased. Again Online-Greedy outperforms Baseline. For the heap of the tasks and workers in Baseline, it consumes more time and memory. The time cost of Online-Greedy increases quickly as with a larger γ, the degree of accumulate of Online-Greedy is heavier, to which Online-Greedy is sensitive for its higher time complexity. In Fig. 7b, the runtime of Online-Greedy increases with the increment of γ. The reason is that the increment of γ is equivalent to the decrement of the budget. As a result, Online-Greedy has to use more time to find a group of workers through more loops in the algorithm. On the other hand, the runtime of Baseline is stable. The reason is that Baseline does not consider much on budget. It only cares about the order of the workers when assigning a task to them.

Fig. 7
figure 7

Results on varying γ

5.3 Experimental results on real datasets

In this subsection, we demonstrate the experimental results on real datasets. As in the real dataset, the total number of skills is 554 (see Table 4), Online-Exact cannot finish in 24 hours. Thus we do not show the results of Online-Exact.

The experimental results are similar with that of the synthetic dataset. Specifically, with the increment of the mean of the tasks’ budget on each skill (see Fig. 8a-c), both algorithms have more utility as they can get more profit from each completed tasks. With the increment of the mean of the cost of the workers’ each skill (see Fig. 8d-f) and γ (see Fig. 8g-i), both algorithms have less utility as they have to pay more to the workers. Baseline still consumes more time and memory for the heap of tasks and workers.

Fig. 8
figure 8

Results on varying the mean of the tasks’ budget for each skill, the mean of the cost of workers’ each skill and γ

5.4 Experimental summary

Online-Exact and Online-Greedy always perform better than Baseline in terms of total utility. Online-Exact is only a little better than Online-Greedy but consumes unbearable time and memory cost. In most cases, Online-Greedy even has less time and memory cost, as it can consumes the arriving workers and tasks effectively. While in Baseline, the heap of tasks and workers makes it even more inefficient.

6 Conclusion

In this paper we formalize a novel O nline M ulti-skill-A ware T ask A ssignment problem. We first analyze the difference of our problem and existing works. Then we prove that even the offline version of the problem is NP-hard. To solve the problem, we first propose Online-Exact algorithm, which always find an optimal assignment for the new coming task or worker. However, because of the high time complexity of Online-Exact, it is not practical in real applications. We next propose the Online-Greedy algorithm, which considers both the utility gained from a task and the occupation on the workers. We finally verify the effectiveness and efficiency of our problems via experiments on both synthetic and real datasets.