Keywords

1 Introduction

Scheduling of tasks in cloud computing is a well studied problem which is NP-complete in nature. Therefore, several heuristics and meta-heuristics approaches have been proposed to address this problem that deal with independent or dependent tasks. This article describes the scheduling of independent tasks on the virtual resources and present a novel algorithm which is based on an efficient meta-heuristics approach, called flower pollination algorithm (FPA) [1]. It is noteworthy that there exist \(m^{n}\) possible schedules for n tasks to be mapped on m VMs. Therefore, the computational complexity to generate an optimal task-VM mapping by a brute force approach would be exponential. This also consumes huge space. Thus, a nature-inspired approach such as flower pollination algorithm (FPA) [1] can be very effective to generate a near optimal solution for the aforesaid task scheduling problem in cloud environment. Note that the FPA can produce better solution than other meta-heuristic approaches such as GA, PSO and has the highest convergence rate among them [1].

The key goal of the proposed scheme is to map the tasks of a given application on the active virtual machines such that the overall application processing time namely makespan(M) is minimized and the average cloud resource utilization (\(Avg.U_c\)) is maximized. The proposed scheme is presented with the employed pollen representation scheme. A novel multi-objective fitness function is also derived by considering the aforesaid objectives which contradict each other. We perform rigorous experiments to compute the results of the proposed scheme using one synthetic dataset and two benchmark datasets instances. The performance results are compared with three other meta-heuristic based scheme i.e., PSO, GA and GSA. The simulated results demonstrate its superiority in terms of makespan and average cloud utilization.

Many researches have been carried out in recent years to address the task scheduling problem in  [2,3,4,5,6,7,8,9,10,11,12, 16, 19, 20]. Panda et al. [8] have proposed both on-line and off-line task scheduling algorithm to schedule the independent tasks for cloud environment. The proposed algorithms of task scheduling are shown to perform better than the cloud list scheduling (CLS) [17], Round Robin (RR) and Min-Min task scheduling algorithms [18]. In [16], the authors have used the linear programming to solve the problem of preemptive task scheduling in cloud computing. In [7], a two phase genetic algorithm with multi-parent crossover is suggested to minimize both makespan and energy consumption. However, they have not considered average cloud utilization. In [6], an improved genetic algorithm (IGA) is proposed, in which the virtual machines are selected on the basis of dividend policy. The performance of their algorithm is shown to outperform [7]. In [4], the authors have developed a PSO based algorithm for static task scheduling problem in a homogeneous cloud environment which supersedes the GA based approach. A multi-objective differential evolution (MODE) scheme is proposed in [2], as a solution to the task scheduling problem. In [5], the authors have addressed the problem of task scheduling in a grid environment and proposed a PSO based scheme to generate the task-VM mapping. The proposed algorithm is shown to produce the best solution for the task scheduling problem. Our proposed algorithm in this paper, is a novel approach in the sense that we are the first researchers to exploit the features of the meta-heuristic FPA to develop an efficient task scheduling algorithm for cloud computing. Further, there is novelty in the pollen representation and derivation of an efficient fitness function based on the two important objectives for task scheduling.

The remainder of the paper is systematized as follows. Section 2 describes the models and terminologies used in the proposed algorithm. The proposed algorithm is explained in Sect. 3. The simulation results are displayed in Sect. 4 followed by the conclusion in Sect. 5.

2 Preliminaries

2.1 Cloud and Task Model

Let us consider a cloud user’s request which is expressed in the form of an application A consisting of a set of n tasks such that A=\(\{T_1, T_2, \cdots , T_n\}\). The user submits his application to a cloud service provider (CSP) for the needful operations (i.e., computing, communication and storage). The CSP provides all the requested services to the users by negotiating a service level agreement (SLA).

Table 1. Notations used

Consider an IaaS cloud model similar to the Amazon EC2, where m heterogeneous virtual machines (VMs) are deployed on a cloud. Each VM has different computing capability. Therefore, all VMs are able to execute the tasks of the given application. The key components of the assumed cloud model are as follows.

  1. 1.

    Cloud user: This is the end user of cloud resources. The cloud users initiate the various cloud service requests to execute their applications. These requests are submitted to the cloud scheduler via cloud service provider (CSP).

  2. 2.

    Cloud scheduler: Cloud scheduler is the recipient of the cloud users’ service requests. The core responsibility of the cloud scheduler is to maintain status of all VMs and assign the tasks over the virtual machines so that the user’s requirement is fulfilled.

  3. 3.

    Cloud service provider (CSP): The cloud service provider acts as an intermediary between the cloud users and the cloud scheduler. The CSP contains the resource pool of the active VMs. The user’s application is executed on the active VMs as per the decided task scheduling algorithm.

2.2 Notations and Terminologies

Now, we define various terminologies used to derive the fitness function and analyze the simulation results as follows. The notations are shown in the Table 1 with their description.

  1. 1.

    Processing speed (PS): This is the processing speed of each VM say, \(VM_j\) which is measured in million instructions per second (MIPS) and denoted by \(PS_j\).

  2. 2.

    Execution time of a task (\(ET_{ij}\)): Consider an application consisting of n tasks \(\{T_1, T_2, T_3, \cdots , T_n\}\), where the size (\(I_i\)) of every task \(T_{i}\) is measured in million instructions (MIs). Therefore, the execution time of task \(T_{i}\) on virtual machine \(VM_{j}\) can be mathematically expressed as follows.

    $$\begin{aligned} ET_{ij}= \frac{I_j}{PS_j} \end{aligned}$$
    (1)
  3. 3.

    Makespan (M): Let \(M\_VM_{i}\) denote the makespan of the \(i^{th}\) virtual machine, where \(1\le i\le m\). Then, we can mathematically express the overall makespan as follows.

    $$\begin{aligned} M = \textit{max}\{M\_VM_{i}\mid 1\le i\le m\} \end{aligned}$$
    (2)
  4. 4.

    Average makespan (\(Avg.M\_VM\)): It is the mean of the makespan of all the VMs. In other words,

    $$\begin{aligned} Avg.M\_VM=\frac{\sum _{j=1}^{m}M\_VM_j}{m} \end{aligned}$$
    (3)
  5. 5.

    Standard deviation of VM’s makespan (\(Sd.M\_VM\)): It is the standard deviation of virtual machines’ makespan. Hence,

    $$\begin{aligned} Sd.M\_VM=\sqrt{\frac{\sum _{i=1}^{m}(Avg.M\_VM-M\_VM_{i})^2}{m}} \end{aligned}$$
    (4)
  6. 6.

    Average cloud utilization (\(Avg.U_c\)): It is the ratio of the average makespan (\(Avg.M\_VM\)) and the overall makespan (M). Therefore,

    $$\begin{aligned} Avg.U_c=\frac{Avg.M\_VM}{M} \end{aligned}$$
    (5)

2.3 Problem Definition

We address the following problem in this paper. Given an application with a set of n independent tasks, \(A= \{T_1, T_2, T_3, \cdots , T_n\}\) and given a set of m virtual machines \(C = \{VM_1, VM_2, VM_3, \cdots , VM_m\}\), the problem is to determine the task-VM mapping with the following objectives:

  • Objective 1: The overall makespan (M) is minimized.

  • Objective 2: The average cloud utilization (\(Avg.U_c\)) is maximized.

3 Proposed Work

In this section, we present a FPA-based scheme to address the problem defined in Sect. 2.3. We first show the pollen representation scheme used in the proposed algorithm. We then design an efficient multi-objective fitness function to incorporate in the proposed algorithm.

3.1 Pollen Representation

Each pollen (P) is represented by an n-dimensional vector, where each element (say \(p_{ij}\mid 1\le i, j\le n\)) has a random value in the range (0, 1) i.e., \(0< p_{ij}< 1\). The employed mapping function is given by Eq. 6.

$$\begin{aligned} T\_VM_{ij} = floor(p_{ij}\times m)+ 1\mid 1\le i\le p\_size~\text {and}~1\le j\le n \end{aligned}$$
(6)

where, m denotes the total number of VMs, \(p\_size\) denotes population size and n denotes the total number of tasks. This pollen representation is analogous to [15]. It is noteworthy that each pollen is able to produce entire solution for the task scheduling problem. To demonstrate this, we consider a hypothetical could-resource model where \(n=8\) and \(m=3\). The process of task-VM mapping is exhibited by considering a sample pollen as shown in Fig. 1.

Fig. 1.
figure 1

Retrieval of task-VM mapping from a pollen

In Fig. 1, we can observe that the first task is mapped on virtual machine 1 since the value of \(p_{i1}\) is 0.14 and therefore, \(floor(0.14\times 3)+1=1\). Remaining tasks are also mapped on the VMs following similar procedure. For instances, the tasks \(T_2\) and \(T_3\) are mapped onto \(VM_2\) and \(VM_3\) respectively using same formula given in Eq. 6.

3.2 Fitness Function

We consider two contradictory objectives to derive an efficient multi-objective fitness function for problem of independent task scheduling in cloud computing. The first objective is to minimize the makespan. The makespan can be minimized, if the standard deviation of makespan of all the VMs is minimized i.e., lower the value of \(Sd.M\_VM\), lower will be the makespan. Therefore,

$$\begin{aligned} \text {Objective 1: Minimize}~Sd.M\_VM \end{aligned}$$
(7)

Our second objective is to maximize the average cloud utilization. Therefore,

$$\begin{aligned} \text {Objective 2: Maximize} = Avg.U_c \end{aligned}$$
(8)

We combine both above-mentioned objectives into a single-objective for minimization problem by employing the weighted sum approach where the value of the first objective is multiplied with a weight value \(C_1\) and reciprocal of the second objective is multiplied by \(C_2\). Thus,

$$\begin{aligned} \text {Objective: Minimize}(C_1\times Avg.M\_VM+\frac{C_2}{Avg.U_C}) \end{aligned}$$
(9)

We choose this is as the single objective fitness function in our proposed algorithm. In other words,

$$\begin{aligned} \text {}~Fit = C_1\times Avg.M\_VM+\frac{C_2}{Avg.U_C} \end{aligned}$$
(10)
Fig. 2.
figure 2

Proposed FPA-based task scheduling algorithm

3.3 FPA-based Task Scheduling Algorithm

The pseudo code of the proposed FPA-based algorithm is given in Fig. 2. In step 1, population of size \(p\_size\) is initialized. In step 2 through step 4, the value of the fitness function for each pollen of the current population is computed using Eq. 10. In step 5 and step 6, the fitness value of the best pollen is initialized to maximum value of the integer and the best pollen is initialized as an empty set respectively. In step 7 through step 11, the best pollen from the initial population is identified and stored as \(Best\_pol\). In step 12 through step 27, at each iteration for every pollen, a random number between 0 and 1 is generated and if the generated value is greater than the \(Switch\_prob\) than the global pollination is applied to the pollen using Levy distribution, otherwise the local pollination is applied using uniform distribution [1].

The proposed algorithm FPA is tested through extensive simulations which were carried out on an Intel Core 2 Duo processor, 2.20 GHz CPU with 4 GB RAM running on Microsoft Windows 7 platform by using MATLAB R2012a version 7.14.0.739. We evaluate the performance for the simulation results on one synthetic dataset of 6 instances and two benchmark datasets of 12 instances. For the comparison purpose, we also simulate GA, GSA and PSO by employing the same solution representation and the fitness function. In simulation, the switching probability and the size of population (\(p\_size\)) is considered as 0.5 and 100 respectively.

3.4 Datasets

In simulation, we considered two benchmark datasets as given in [13] with 512 and 1024 tasks respectively. The benchmark datasets comprises of 12 diverse dataset instances. These instances are described by u_x_yyzz, where u implies uniform distribution to generate the dataset, x refers for the consistency i.e., inconsistent (i), semi-consistent (s) or consistent(c) and yyzz refers the heterogeneity of task-VM i.e., hihi, hilo, lohi, lolo respectively [13, 14]. Moreover, we created one synthetic dataset using the uniform distribution. This synthetic dataset also consists of 6 different instances, where size of each instance is \(n \times m\), for n tasks and m VMs.

3.5 Experimental Results

We consider makespan (M) and average cloud utilization (please refer Sect. 2.2) as the performance metrics to show the dominance of the proposed algorithm over other simulated schemes. The comparison of cloud makespan for the dataset instances of size \(512\times 16\) size and \(1024\times 32\) are shown in Figs. 3 and 4 respectively by the bar graph. We also plot the average cloud resource utilization for both the benchmark datasets as shown in Figs. 5 and 6 respectively. Similarly, for synthetic dataset, we also calculated the makespan and average cloud resource utilization as shown by Fig. 7. It can be observed that the proposed FPA-based scheme has the minimum makespan and maximum average cloud utilization for all the dataset instances in comparison with GA, GSA and the PSO based approaches.

Fig. 3.
figure 3

Comparison of cloud makespan for \(512\times 16\) size dataset instances

Fig. 4.
figure 4

Comparison of cloud makespan for \(1024\times 32\) size dataset instances

Fig. 5.
figure 5

Comparison of avg. cloud utilization for \(512\times 16\) size dataset instances

Fig. 6.
figure 6

Comparison of avg. cloud utilization for \(1024\times 32\) size dataset instances

Fig. 7.
figure 7

Comparison for synthetic dataset instances (a) cloud makespan and (b) avg. cloud utilization

4 Conclusion

We have proposed a FPA-based task scheduling scheme in a cloud computing environment. We have demonstrated an efficient scheme of pollen representation and the process to retrieve the task-VM mapping from a given pollen. We have also shown the derivation of a single objective fitness function by combining two conflicting objectives, i.e., minimization of makespan and maximization of average cloud resource utilization. Through simulation runs on one synthetic dataset and two benchmark datasets, we have shown that the proposed technique performs better than the similar meta-heuristics, namely, PSO, GA and GSA in terms of two performance metrics i.e., makespan and average cloud resource utilization. However, the proposed work considers the static behavior of the tasks and virtual machines. Our future work will be focused on dynamic workflow scheduling.