Keywords

1 Introduction

The huge requirement of internet-based technology and application heads toward the rapid development of the cloud computing environment. It offers numerous services over the internet. Users of cloud computing avail facilities like computing resources, storage for data, system resources, etc. with an illusion of infinite computing facilities. All these are handled by data centers that are geographically distributed and situated in many regions. Cloud computing provides mainly three types of services infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS) [1]. Resource provision along with elastic computing is provided by the IaaS cloud. In PaaS, clients are given authority to use the cloud environment to build their software in the platform provided by cloud service providers (CSP). Users can avail of PaaS to use the software directly from the CSP. Virtualization is an important aspect of cloud computing, which enables a required number of virtual machines (VM) to the users [2]. It plays a vital role in task scheduling in a cloud computing environment. Different users submit their requests to the cloud environment in the form of a set of tasks that are allocated to machines or VMs in the cloud. Cloud computing follows the technique behind a combination of parallel and distributed computing. In a cloud system, the user aims to achieve efficient scheduling parameters after allocating all the tasks to the available resources [3, 4]. To provide an efficient and optimal schedule, it is required to analyze different optimization parameters which can be implemented in cloud task scheduling. In most of the research studies, evaluation parameters like makespan, cloud utilization, deadline time, energy consumption, etc. are considered [5,6,7].

Nowadays, energy consumption has become the crucial factor in cloud datacenters. To reduce it, cloud engineers are approaching nature-inspired optimized scheduling techniques which focus on less energy consumption in data centers.

Energy estimation and cloud resources usage are profoundly coupled [8, 9]. Low cloud resource usage is hollering an unsuitable measure of energy when they are completely used or adequately stacked. A new review on energy utilization and carbon emission of huge data centers is exceptionally high in the year of 2005, in the USA. Data centers in European locale have been assessed to devoured 1% sum of carbon emission, while in the USA, it is 2.8% around the same time [9,10,11,12,13]. In distributed computing, the basic equipment framework is hidden from the end client. Even though application solicitation can be thought about for low utilization of energy and high usage of cloud resources, cloud resources ought not to be over-burden or underloaded by the undertakings, rather ought to be utilized ideally [14].

In cloud task scheduling, different types of techniques are like heuristics scheduling, workflow scheduling, static scheduling, and dynamic scheduling [15]. According to the complexity of an algorithm, task scheduling can be described as heuristic, meta-heuristic, and hybrid task scheduling approaches [16]. Heuristics algorithms are mainly used to evaluate task scheduling algorithms like minimum execution time (MET), minimum completion time (MCT), shortest job to fastest processor (SJFP), min-min, max–min. In a cloud environment, generating a task schedule using a meta-heuristic algorithm is becoming the most approachable area of research. It deals with a multi-modal optimization problem. Task scheduling in the cloud is an NP-complete problem using meta-heuristics methods. Traditional meta-heuristics algorithms are particle swarm optimization (PSO), ant colony optimization (ACO), genetic algorithm (GA). In the recent study, a variety of approaches are found like Jaya algorithm, social-group optimization (SGO), teaching learning-based optimization (TLBO), etc. All these algorithms focus on a set of populations. In PSO, these populations are considered as particles. Scheduling is applied on this set of solutions to generate an efficient schedule. Using popular cloud scheduling parameters like makespan, it is evaluated [17].

In PSO, a set of particles is considered and initialized randomly. Each particle is assumed to be a task and is allocated to the available VM. Random initialization of particles is an important key factor in PSO. It leads to providing a wide range of search space for the particles which is a boost to an optimal solution [18]. It improves the performance of the PSO algorithm. The working mechanism of PSO is given through a flowchart in Fig. 1.

Fig. 1
A flow diagram illustrates the particle swarm optimisation working mechanism. A set of particles is considered and randomly initialised in P S O.

PSO working mechanism

In this paper, the proposed method is to utilize heuristic scheduling algorithms to initialize the PSO search process. Makespan, cloud utilization, and energy consumption are also considered as optimization parameters [19]. The rest of the paper is listed as follows. Section 2 overviews the related works. Section 3 describes the system model of our scheduling approach. The scheduling algorithm is presented in Sect. 4. Experimental evaluations and importance of dataset are described in Sect. 5. Section 6 concludes our work and highlights the future works.

2 Background and Related Work

In the current research area of interest, optimizing the scheduling parameters using an efficient schedule is gaining popularity. Besides this, resource allocation, cloud load balancing, security aspects in the cloud, energy efficiency are also major issues that attract attention. Clients or users, data centers, task managers are the components of this model. Data centers are responsible for the provision of an ample number of virtual machines required for the processing of requests. A task manager and scheduler are implemented to track the tasks before and after submitting into the resources and to prepare a schedule for upcoming tasks. Client requests in the form of tasks are updated in the task queue and later allowed for submission into resources.

In recent years of the survey, it has been noticed that many task scheduling algorithms are generated. These algorithms are either heuristics or meta-heuristics. Some of the algorithms focus on reducing makespan, whereas some algorithm designs a schedule that maximizes the cloud utilization values. There are some static traditional cloud scheduling algorithms proposed among which min-min, max–min, suffrage heuristics are included. The most important aspect is energy consumption in the cloud environment. In cloud computing, virtualization plays an important role, where data centers are engaged to create VMs. Data centers are globally distributed, and the number of data centers is growing rapidly to fulfill the client requests. In a recent study, it has been observed that 16% of data centers have been added. This results in a high increase in carbon emission, and the computation rate raised to 76%. To neutralize this impact, the USA has taken initiatives like the European code of conduct, the Energy Star program, and the 80 PLUS. The cloud environment has an architecture of two types, i.e., hardware and software. It is a tedious task to manage this architecture. The hardware environment can be manipulated by modifying the circuits to reduce energy consumption. In cloud data centers, the resource and energy consumption using an efficient task scheduling method is applied as suggested by [14]. But, the performance evaluation is not conducted in this paper. Energy-aware task consolidation is proposed by Refs. [19, 20] which aims to restrict CPU utilization. Hierarchical scheduling method is proposed to optimize the energy parameters in many research articles.

3 Models Used in the Proposed Algorithm

3.1 Cloud Task Scheduling Model

A cloud computing scenario is an architecture that incorporates several servers inside a data center. Servers are nothing but high-end processors and are treated as the resources for the client's requests. Requests are submitted in the form of a task queue. Tasks follow a specific algorithm, pass through a queue and task manager, and result the output. These algorithms are judged or analyzed in terms of theirs efficiency. Data transfer cost is also one among the quality of service (QoS) parameters which are treated as a vital factor. However, in most of the research papers, it is considered negligible. Using PSO, an efficient schedule is prepared, and along with that, QoS parameters like makespan, completion time, cloud utilization, and energy consumption are tracked.

3.2 Application Model

A group of tasks is taken in the form of application model. These are prepared to form a schedule, and a queue is generated. All the tasks are treated as independent and have their individual properties like execution time, completion time, etc. In this application model, execution time is used and previously estimated. It is stored in a task to machine/cloud matrix popularly known as the expected time to compute (ETC) matrix [2]. This application model preferably runs on a static multi-cloud environment. It is assumed to be no interference between the tasks and other I/O requests, cloud storage, and cloud resources.

3.3 Energy Model

A new energy model [21] is referred to estimate the total amount of energy consumption using the PSO scheduling method in a heterogeneous multi-cloud environment. To calculate this estimation, makespan and cloud utilization are formulated. It is found that average cloud utilization and energy consumption are linearly proportional according to a few research propositions. The execution time of the task on each cloud is referred to from the ETC matrix. Makespan is calculated as the completion time of the last task. Alternatively, it refers to the overall completion time required to execute all tasks in a multi-cloud environment. It is mathematically defined as given by Eq. (1).

$$ \begin{aligned} M =& \max \left( {\sum\limits_{i = 1}^{n} {{\text{ETC}}(i,1) \times F(i,1),} \sum\limits_{i = 1}^{n} {{\text{ETC}}(i,2) \times F(i,2),....} p\sum\limits_{i = 1}^{n} {{\text{ETC}}(i,m) } } \right.\\ & \quad\left. { \times F(i,m) } \right)\end{aligned}$$
(1)

Energy consumption can be overly calculated as per Eq. (2). It is also seen that energy consumption is using overall cloud utilization for the calculation of energy used in a particular algorithm.

$$ E_{i} = (P_{\max } - P_{\min } ) \times U_{i} + P_{\min } $$
(2)

where

Ui = denotes the utilization of cloud i.

Pmax = power use at the maximum load (or 100% cloud utilization).

Pmin = minimum power consumption in the active mode (or as low as 1% utilization).

Here for idle resources, overhead to turnoff time is negligible. Hence, it is not considered.

Average energy consumption is denoted as in Eq. (3).

$$ E = \frac{{\left( {\sum\limits_{i = 0}^{m} {E_{i} } } \right)}}{m} $$
(3)

where 1 ≤ i ≤ n and 1 ≤ j ≤ m.

3.4 Scheduling Model in Cloud

In this scheduling model, a set of tasks and a set of machines are taken to compile the overall energy consumption using PSO algorithms. T = {T1, T2, T3, …….Tn} is a set of independent tasks, and C = {C1, C2, C3,….. Cm} is a set of machines or clouds. Each task has some execution time on each machine. This mapping is represented in the expected time to compute (ETC) matrix. ETC matrix is given in Eq. (4), where Tth task execution time on Cth cloud is denoted.

$$ {\text{ETC}} = \begin{array}{*{20}c} {{\text{ETC}}_{11} } & {{\text{ETC}}_{12} } & { \cdot \cdot \cdot } & { \cdot \cdot \cdot } & {{\text{ETC}}_{1m} } \\ {{\text{ETC}}_{21} } & {{\text{ETC}}_{22} } & { \cdot \cdot \cdot } & { \cdot \cdot \cdot } & {{\text{ETC}}_{2m} } \\ { \cdot \cdot \cdot } & { \cdot \cdot \cdot } & { \cdot \cdot \cdot } & { \cdot \cdot \cdot } & { \cdot \cdot \cdot } \\ { \cdot \cdot \cdot } & { \cdot \cdot \cdot } & { \cdot \cdot \cdot } & { \cdot \cdot \cdot } & { \cdot \cdot \cdot } \\ {{\text{ETC}}_{n1} } & {{\text{ETC}}_{n2} } & { \cdot \cdot \cdot } & { \cdot \cdot \cdot } & {{\text{ETC}}_{nm} } \\ \end{array} $$
(4)

4 Proposed PSO-Based Task Scheduling Algorithm

PSO is one of the traditional meta-heuristics scheduling algorithms, which provides the optimal solution for a group of particles. In task scheduling, PSO also outperforms to generate an efficient method. It is inspired by the social behavior of particles which later follows an evolutionary computational method. In this proposed task scheduling model, a group of swarm or particles is considered as a set of scheduled tasks. Each schedule has its behavior as the particles have. A predefined searched space is introduced to enhance the efficiency of the algorithm. Each particle represents a solution to the optimization problem, which is optimizing some cloud QoS parameters. Each particle is associated with position and velocity which helps them to move forward to the next step or position. Here, fitness function is evaluated at each step or position to identify the best solution or best particle. It will optimize the overall makespan, and the corresponding energy consumption will be calculated using the proposed algorithm. Velocity at the next position is defined by Eq. (5).

$$ v_{l}^{t + 1} = \omega v_{l}^{t} + c_{1} r_{1} \left( {p_{l}^{{{\text{best}}}} - p_{l}^{l} } \right) + c_{2} r_{2} \left( {p^{{{\text{gbest}}}} - p_{l}^{t} } \right) $$
(5)

where

\(p_{1}^{t}\) is the lth particle at iteration t.

\(v_{l}^{t}\) is the velocity of the lth particle at iteration t.

pbest is the best position found.

pgbest is the best global position among all pbest

1 < l < L, where L is the population size.

Parameters c1 and c2 are the acceleration constants, r1 and r2 are random numbers between 0 and 1, and ω is the inertia factor.

In the proposed cloud task scheduling algorithm using PSO, solution (particle) which is denoted as S1, S2, S3Sn is given in Fig. 1. For an instance, solution S1 is a schedule where T1 is allocated to C2, T2 is allocated to C2, etc. Optimization function (F(x)) is calculated for each solution after the end of each iteration, and pbest and pgbest are chosen.

The parameters pbest and pgbest are the velocities of the current particle and best particle. The best particle is one with less F(x) value (in of case minimization function). Here, schedule which gives less makespan is chosen. Hence, optimization parameter is a minimization function. A newly updated position of the particle can be calculated using Eq. (6).

$$ p_{t + 1} = \left( {\left| {v_{t + 1} } \right| + \bmod m} \right) + 1 $$
(6)

where pt+1 = next position of particle (here, it is cloud/machine).

vt+1 = velocity at next position.

m = number of cloud or machine.

For each solution, the updated position for each machine is calculated. Here, it is assumed that the number of iteration is the same as the number of solutions or particles. An algorithm is given in the following table.

Algorithm: PSO based task scheduling

Input:

1. A set of n independent tasks

2. A set of m cloud

3. An ETC matrix

Output:

1. Makespan

2. Energy consumption

1. Particle or solution = mn is randomly initialized and a set of solutions is randomly chosen(40% of solution size). iter= 40% of mn

2. F (x) = makespan of selected solution is calculated .

3. for l= 1 to iter and t = 1 to iter(where iter= 40% of mn)

  \(v_{l}^{t + 1} = \omega v_{l}^{t} + c_{1} r_{1} (p_{l}^{best} - p_{l}^{l} ) + c_{2} r_{2} (p_{{}}^{gbest} - p_{l}^{t} )\)

    For, m = number of cloud

      \(p_{t + 1} = (\left| {v_{t + 1} } \right| + \bmod m) + 1\)

4. Repeat step 3 until the termination condition is fulfilled.

5. Update fitness function value, choose the best schedule and calculate average energy consumption (Eqs. 2, 3)

4.1 Illustration

In Fig. 2, a set of solutions or particles is there which will undergo a series of steps to generate an optimized result. Best F(x) value is treated as gbest, and the makespan of each solution is pbest. In the first iteration, the new velocity and position of the particle are calculated using Eqs. 5 and 6. Here, the position of the particle indicates, to which cloud the task will be assigned. For each solution, this process will be repeated, and simultaneously value of F(x) is also calculated. After reaching the termination condition, the solution with the best F(x) (minimization function) value is chosen. An example is illustrated in Fig. 3a–d.

Fig. 2
A table in the figure illustrates the representation of a particle as a set of solutions that will undergo some steps to generate an optimised result.

Representation of a particle

Fig. 3
A four-part figure. Part a is an E T C matrix. Part b is the initial solution set. Part c is the updated position of cloud. Part d is the optimised schedule after the first iteration.

a ETC matrix. b Initial solution set. c Updated position of cloud. d Optimized schedule after first iteration

In this example, the proposed algorithm is running for only one iteration. After a desired number of iteration, an updated solution will be obtained with a task to cloud allocation sequence. Makespan, cloud utilization, and energy consumption are estimated for the solution. For the example given above, cloud utilization and the energy consumption are found to be 81% and 0.563 units, respectively.

5 Experimental Evaluation and Results

To evaluate the performance of different cloud scheduling parameters, both synthesized and benchmark datasets are taken. It is implemented using an Intel processor (2.6 GH) using. Benchmark datasets are heterogeneous. The 1024 × 32 and 512 × 16 datasets are represented in exam format for input to the proposed algorithm [22]. A geographical comparison of makespan, cloud utilization, and energy consumption for various cloud task algorithms like min-min, max–min, GA, etc. is given in Fig. 3 (1024 × 32) and Fig. 4 (512 × 16). Average cloud utilization and energy consumption are given in Table 1. In Table 1, instances of datasets are given in u_x_yyzz format, where

Fig. 4
A multiple bar graph illustrates the makespan comparison of 1024 times 32 benchmark dataset for various cloud task algorithms.

Makespan comparison of 1024 × 32 benchmark dataset

Table 1 Cloud utilization and energy consumption for benchmark dataset
  • u = uniform distribution to generate these instances.

  • x = type of consistency (i.e., consistent (C), inconsistent (I), or semi-consistent (S)).

  • yy = task heterogeneity (i.e., high (Hi) or low (Lw)).

  • zz = machine or cloud heterogeneity (i.e., high (Hi) or low (Lw)).

Note that these instances are especially used in cloud scheduling.

Fig. 5
A multiple bar graph illustrates the makespan comparison of 512 times 16 benchmark dataset for various cloud task algorithms.

Makespan comparison of 512 × 16 benchmark dataset

6 Conclusion

In this paper, the working of the PSO algorithm in a heterogeneous multi-cloud environment has been portrayed. It is resulting from an efficient scheduling mechanism by taking a large search space. Hence, it is improving the overall makespan and generates cloud utilization and average energy consumption in cloud systems. These are cloud parameters required to evaluate the performance of the scheduling algorithm. PSO is one of the traditional evolutionary algorithms which performs better in a large search space. In the proposed algorithm, it is evaluating the overall energy consumption after completion of the scheduling method. It outperforms the existing algorithms in a distributed environment and evolutionary task scheduling. Synthesized and benchmark datasets are tested upon the proposed algorithm. The future work will be the implementation of the proposed algorithm in a real cloud environment and analysis of the proposed algorithm with advanced evolutionary scheduling strategies.