1 Introduction

Cloud computing pools a massive amount of resources to support a wide variety of IT services delivered over the internet. These services are provisioned as software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). Enterprises and individual users no longer need to pay a fortune for expensive hardware; they can purchase computing power with minimal management effort [1] and scale it up or down on demand [2]. Nevertheless, the widespread use of cloud computing in different fields raises many challenges, including load balancing, power consumption, security, and resource scheduling.

Resource scheduling is considered one of the major problems in the cloud computing environment. It refers to the process of distributing the tasks of a given application onto the available resources in the cloud, namely virtual machines (VMs). Since VMs are limited in number and differ in their capabilities, an efficient scheduling method is required to carefully map application tasks onto them. Hence, the task scheduling problem is classified as an optimization problem [3], and many heuristic and meta-heuristic algorithms have been developed to solve it with different objectives [4]. These objectives include minimizing schedule length, computation time, memory usage, and economic cost, as well as maximizing throughput, balancing degree, and resource utilization.

In the early stages, various heuristic algorithms were designed to address the task scheduling problem [5]. However, most heuristic strategies fail to produce high-quality solutions for large-sized problems. On the other hand, meta-heuristic algorithms have been shown to be more efficient in solving a variety of complex optimization problems. Still, they have some weak points: poorly tuned parameters may lead to local optima, the convergence rate over iterations can be low, and memory usage can be high [6]. To cover these defects, one or more heuristic and meta-heuristic approaches may be combined, offsetting their weaknesses while retaining their advantages; such hybridization is a promising research direction.

The Harris hawks optimizer (HHO) is a novel population-based, nature-inspired meta-heuristic algorithm that imitates the social behavior of Harris hawks in nature [7]. It has been used to solve many real engineering problems such as image segmentation [8], renewable energy plants [9], and feature selection [10], among others [11]. In this research work, an enhanced version of the basic HHO, called the elite learning Harris hawks optimizer (ELHHO), is developed and implemented to address the multi-objective scheduling problem. The modification uses an intelligent method called elite opposition-based learning (EOBL) to generate a solution and its opposite at the same time, yielding more diversity in the search space [12]. Further, the minimum completion time (MCT) [13] heuristic algorithm is used as an initial phase to obtain a deterministic initial solution, rather than a random one in each run, in order to avoid local optima and satisfy QoS in terms of minimizing schedule length (SL) and execution cost while maximizing resource utilization, balance degree, and throughput.

The contributions of this work are as follows:

  • Formulating the scheduling problem as an optimization problem taking into account the availability of cloud resources.

  • Presenting an optimized multi-objective scheduling model for a minimum SL and minimum implementation cost. This model considers cost and scheduling performance as the budget constraints for the optimization problem.

  • Developing an enhanced version of the HHO, called elite learning Harris hawks optimizer (ELHHO), by using elite opposition-based learning strategy to overcome the drawbacks of the conventional HHO.

  • Implementing a hybrid approach combining the ELHHO and the MCT for optimum task scheduling in cloud computing.

The rest of this paper is organized as follows: Section 2 presents a literature survey of related work. Section 3 formulates the scheduling problem as an optimization problem. Section 4 describes the conventional HHO algorithm in some detail. Section 5 presents the proposed ELHHO in detail. Section 6 presents the experimental results and discussion. Finally, Sect. 7 presents the concluding remarks of this research work.

2 Related works

Recently, several methods have been proposed to solve the scheduling problem. In [14], the authors introduce a comparative analysis and comprehensive survey of various task scheduling strategies for cloud and grid environments, considering three popular meta-heuristic techniques: the genetic algorithm (GA), particle swarm optimization (PSO), and ant colony optimization (ACO), as well as the bat optimization algorithm (BAT) and the league championship algorithm (LCA). In [15], a hybrid scheduling algorithm is developed based on the integration of the imperialist competitive algorithm (ICA) and the firefly algorithm (FA) to improve both the schedule length and load balancing. In [16], a multi-objective nested PSO is proposed for the task scheduling problem in cloud computing to optimize energy consumption and overall computation time, without considering other metrics such as cost and balance degree. In [17], a multi-objective algorithm is proposed for task scheduling to improve the throughput of the data center and reduce the cost without violating the service level agreement (SLA). Nasr et al. [18] introduced a new strategy, called highest-priority first-execute, for scheduling multiple tasks from multiple users in the public cloud in a stable and optimized way, achieving minimum SL and finishing each user's tasks on time without violating the SLA.

A comparative study among various scheduling algorithms such as PSO, min-min, first come first served (FCFS), round-robin, GA, and a greedy algorithm, in terms of response time, load balance, execution time, and schedule length, is presented in [19], while [20] offers a survey in which several state-of-the-art scheduling algorithms based on heuristic, meta-heuristic, and hybrid approaches are presented; it also includes a systematic, taxonomic review of resource provisioning and scheduling algorithms, together with their benefits and drawbacks. A scheduling mechanism based on simulated annealing (SA) is developed in [21], while in [22] the SA is combined with the ACO for cloudlet scheduling, where the SA is applied as a first phase followed by the ACO in a second phase to improve system performance. Also, in [23], the SA is combined with the HHO to overcome locality in the exploration phase and to improve the convergence rate and the quality of the solutions generated by the conventional HHO algorithm. The introduced HHOSA obtains a lower average SL compared with the well-known salp swarm algorithm (SSA), PSO, moth-flame optimization (MFO), FA, and HHO algorithms.

In [24], a V-heuristics scheduling algorithm is introduced for cloud resource allocation under user constraints. The algorithm considers data transfer time as well as processing time for virtualized networks and virtual machines. Its simulation results are compared with three popular heuristic algorithms (MCT, min-min, and max-min) under virtual and conventional networks. In [25], a multi-objective scheduling algorithm is presented based on the ACO; it takes the resource cost model, the schedule length, and the user's budget as constraints of the optimization problem to jointly optimize performance and cost. In [26], an improved GA is proposed to minimize completion time and maximize resource utilization. In [27], deep reinforcement learning, an emergent artificial intelligence (AI) technique, is applied to resource scheduling in the cloud computing environment: since reinforcement learning operates on states, the cloud scheduling problem is restructured so that cloud resources are represented as states in distinct images. The results outperform the shortest job first (SJF), longest job first (LJF), and random algorithms.

There are two main entities in the cloud environment: cloud providers and cloud consumers. From the above review, most research on task scheduling has focused on the incentives of only one entity, either consumers or providers. Further, most formulations consider a single objective, whereas in practice the problem is naturally multi-objective: every cloud user aspires to obtain cloud services at low cost and high performance, while cloud providers strive to deliver services with high resource utilization, revenue, and profit. Therefore, we address task scheduling as a multi-objective problem. Also, most researchers formulate the problem without taking into consideration application requirements and cloud resource availability. So, in the following section, the scheduling problem is formulated as an optimization problem consisting of an objective function subject to a set of constraints representing both application requirements and cloud resource availability.

3 Task scheduling problem: description and formulation

In the IaaS model, cloud resources are grouped within cloud data centers as a pool of virtualized homogeneous and/or heterogeneous resources. Each server in a cloud data center can host one or more VMs running in a pseudo-parallel way under a time-shared or space-shared VM scheduling policy [28]. This virtualization layer provides cloud consumers with highly available and highly scalable resources. In the cloud environment, a data center broker, the backbone of the scheduling process, assigns user tasks to the available resources in the data center. When users submit their cloudlets to the cloud provider, the cloudlets first enter the task management component, which organizes submitted tasks and reports the status of each task to the user who sent it. The task manager then sends task requests to the task scheduler, which uses the scheduling algorithm to assign incoming tasks to available and appropriate VMs. The task scheduler takes its scheduling decisions according to the task and VM information held in the cloud information system (CIS). If no VMs are available for processing, tasks are transferred to the task queue [29]. The process of assigning users' cloudlets onto the available VMs in the cloud environment is referred to as the scheduling problem. Such a problem is NP-complete due to the differing task requirements and the dynamic nature of heterogeneous resource availability in the cloud [30].

The scheduling problem becomes more difficult with the rising complexity of the cloud environment. In general, developing algorithms that generate optimal task scheduling solutions in cloud computing is difficult. Because meta-heuristic algorithms can provide near-optimal solutions in a reasonable amount of time, using them to address task scheduling has recently received increasing attention [31]. For clarity, assume there is a set of n independent tasks {\(t_{1} ,~t_{2} ,~ \ldots t_{n}\)} submitted by cloud users to be processed, where each task has a specific length \(\ell (t_{i} )~\) in million instructions (MI). The scheduler (broker) allocates these tasks onto a set of \(m\) cloud resources {\({\text{vm}}_{1} ,~{\text{vm}}_{2} , \ldots ..{\text{vm}}_{m}\)}, where each \({\text{vm}}_{j}\) has a specific configuration: main memory, storage, processing power in MIPS (million instructions per second), price, and number of cores. These configurations are used by the scheduling algorithm to calculate the execution time of each task. The expected execution time of task \(t_{i}\) on a virtual machine \(~{\text{vm}}_{j}\), i.e., \({\text{ECT}}\left( {t_{i} ~,{\text{vm}}_{j} } \right)\), is calculated as in Eq. (1).

$$ {\text{ECT}}\left( {t_{i} ~,{\text{vm}}_{j} } \right) = \frac{{\ell (t_{{i~}} )~}}{{{\text{total}}\_{\text{MIPS}}\left( {{\text{vm}}_{j} } \right)}} $$
(1)

where \(\ell (t_{{i~}} )~\) refers to the required processing load of task \(t_{i}\). The processing power of each VM is measured in MIPS, with \({\text{total}}\_{\text{MIPS}}\left( {{\text{vm}}_{j} } \right) = {\text{no}}{\text{.}}\;{\text{of~cores}}\left( {{\text{vm}}_{j} } \right)*{\text{mips}}\left( {{\text{vm}}_{j} } \right).\)
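As a minimal sketch of Eq. (1), with illustrative figures that are not taken from the paper's experiments:

```python
# Sketch of Eq. (1): expected execution time of a task on a VM.
# Task lengths are in MI; VM speed is cores * MIPS per core.

def total_mips(cores: int, mips_per_core: float) -> float:
    """total_MIPS(vm_j) = no. of cores(vm_j) * mips(vm_j)."""
    return cores * mips_per_core

def ect(task_length_mi: float, vm_cores: int, vm_mips: float) -> float:
    """ECT(t_i, vm_j) = l(t_i) / total_MIPS(vm_j), in seconds."""
    return task_length_mi / total_mips(vm_cores, vm_mips)

# A 10,000 MI task on a 2-core, 1,000 MIPS-per-core VM takes 5 s.
print(ect(10_000, 2, 1_000))  # 5.0
```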

Since each task can be processed on any VM, the datacenter broker creates an \(n~ \times ~m\) ECT matrix, based on the characteristics of the VMs and cloudlets, where each element of the matrix is the expected completion time (\({\text{ECT}}_{{ij}}\)) of task \(t_{i}\) on \(~{\text{vm}}_{j}\), as in Eq. (2).

$$ {\text{ECT}} = ~\left[ {\begin{array}{*{20}c} {{\text{ECT}}_{{11}} } & \ldots & {{\text{ECT}}_{{1m}} } \\ {\begin{array}{*{20}c} {{\text{ECT}}_{{21}} } \\ {...} \\ \end{array} } & {\begin{array}{*{20}c} \ldots \\ \ldots \\ \end{array} } & {\begin{array}{*{20}c} {{\text{ECT}}_{{2m}} } \\ \ldots \\ \end{array} } \\ {{\text{ECT}}_{{n1}} } & \ldots & {{\text{ECT}}_{{nm}} } \\ \end{array} } \right] $$
(2)
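The ECT matrix of Eq. (2) can be built in a few lines; the task lengths and VM speeds below are illustrative, not taken from the paper's experiments.

```python
# Illustrative construction of the n x m ECT matrix of Eq. (2).

task_lengths = [4000, 8000, 12000]   # l(t_i) in MI, for n = 3 tasks
vm_speeds = [1000, 2000]             # total_MIPS(vm_j), for m = 2 VMs

ECT = [[length / speed for speed in vm_speeds] for length in task_lengths]

for row in ECT:
    print(row)
# [4.0, 2.0]
# [8.0, 4.0]
# [12.0, 6.0]
```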

Since each task can be processed on any VM and different VMs have different processing power, the cost \(C\left( {t_{i} ~,{\text{vm}}_{j} } \right)\) of processing task \(t_{i}\) differs across VMs. The main goal of a task scheduling algorithm is to assign tasks to appropriate resources in an optimal way that optimizes one or more objectives [32]. The main objective of the proposed approach is to schedule the submitted tasks on the available VMs so as to achieve a lower SL and a lower cost.

To formulate the scheduling problem, let \(~\pi _{{ij}}\) be a binary decision variable given by:

$$ \pi _{{ij}} = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {{\text{if}}\;{\text{task}}\;t_{i} \;{\text{is}}\;{\text{submitted}}\;{\text{to}}\;{\text{vm}}_{j} } \hfill \\ {0,} \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right. $$
(3)

For any schedule solution \(~\Phi ~\), the SL is calculated as follows:

$$ SL\left( \Phi \right) = \mathop \sum \limits_{{i = 1}}^{n} \mathop \sum \limits_{{j = 1}}^{{m~}} {\text{ECT}}\left( {t_{i} ~,{\text{vm}}_{j} } \right)*\pi _{{ij}} $$
(4)

The first objective \(F_{1} \left( \Phi \right)~\) of the proposed approach is to minimize the SL so it may be calculated as:

$$ F_{1} \left( \Phi \right) = \min \left( {{\text{SL}}\left( \Phi \right)} \right) $$
(5)

In cloud computing, the SLA provides users with efficient services based on their needs and budget. The second objective \(~F_{2} \left( \Phi \right)\) of the proposed approach is to minimize the execution cost \(~EC\left( \Phi \right),\) so it may be calculated as:

$$ F_{2} \left( \Phi \right) = \min \left( {{\text{EC}}\left( \Phi \right) = \mathop \sum \limits_{{i = 1}}^{n} \mathop \sum \limits_{{j = 1}}^{{m~}} C\left( {t_{i} ~,{\text{vm}}_{j} } \right)*\pi _{{ij}} } \right)~ $$
(6)

The proposed meta-heuristic ELHHO algorithm accepts or rejects a newly generated schedule solution \(\Phi\) based on the value of the fitness function for that solution. The fitness value may be expressed as in Eq. (7).

$$ F\left( \Phi \right) = F_{1} \left( \Phi \right)~ + F_{2} \left( \Phi \right)~ $$
(7)

The fitness function (F) is formulated to minimize both the total execution time and the execution cost of all submitted tasks for a feasible solution \(~\Phi\). Furthermore, several constraints must be considered to satisfy task requirements without wasting cloud resources.
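A minimal sketch of Eqs. (4)-(7), assuming a schedule is encoded as a list `phi` with `phi[i] = j` whenever \(\pi_{ij} = 1\); the ECT and cost matrices below are illustrative placeholders, not the paper's data.

```python
# Sketch of Eqs. (4)-(7): SL, execution cost, and the combined fitness.

ECT  = [[4.0, 2.0], [8.0, 4.0], [12.0, 6.0]]   # ECT(t_i, vm_j)
COST = [[0.4, 0.6], [0.8, 1.2], [1.2, 1.8]]    # C(t_i, vm_j)

def schedule_length(phi):                       # Eq. (4)
    return sum(ECT[i][j] for i, j in enumerate(phi))

def execution_cost(phi):                        # inner sum of Eq. (6)
    return sum(COST[i][j] for i, j in enumerate(phi))

def fitness(phi):                               # Eq. (7): F = F1 + F2
    return schedule_length(phi) + execution_cost(phi)

phi = [1, 0, 1]                  # t0 -> vm1, t1 -> vm0, t2 -> vm1
print(round(fitness(phi), 2))    # (2 + 8 + 6) + (0.6 + 0.8 + 1.8) = 19.2
```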

  1. (i)

    Each task must be submitted to only one VM, i.e.,

    $$ \mathop \sum \limits_{{j = 1}}^{m} \pi _{{ij}} = 1~~~~~\forall i \in \left\{ {1,2, \ldots .,~n} \right\} $$
  2. (ii)

    The required memory resources \({\text{mem}}\left( {t_{{i~}} } \right)\) for all tasks assigned to a virtual machine \(~({\text{vm}}_{j} )\) do not exceed the available memory of that virtual machine, i.e.,

    $$ \sum\limits_{{i = 1}}^{n} {{\text{mem}}(t_{i} ) * \pi _{{ij}} } \le (vm_{{j_{{{\text{mem}}}} }} )\quad \forall j \in \left\{ {1,2, \ldots ,m} \right\} $$
  3. (iii)

    The required load for all tasks assigned to a virtual machine \(~({\text{vm}}_{j} )\) at a time must be less than or equal to the processing power of that virtual machine \(~{\text{total}}\_{\text{MIPS}}~\left( {{\text{vm}}_{j} } \right)\) i.e.,

    $$ ~\mathop \sum \limits_{{i = 1}}^{n} \ell (t_{{i~}} )~~*\pi _{{ij}} ~~~~ \le ~~~{\text{total}}\_{\text{MIPS}}~\left( {{\text{vm}}_{j} } \right)~~~~~~~~~~\forall j \in \left\{ {1,2, \ldots .,~m} \right\}~~ $$
  4. (iv)

    Each task can be executed on any available virtual machine that satisfies the task's requirements; therefore, its execution time is dependent on VM capabilities.

  5. (v)

    The tasks are non-preemptive so that each task must be executed without any interruption.

  6. (vi)

    More than one task may be assigned to the same VM at a time.
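Constraints (i)-(iii) can be sketched as a feasibility check on an encoded schedule; the encoding `phi[i] = j` satisfies constraint (i) by construction, and all limits below are illustrative.

```python
# Sketch of a feasibility check for constraints (i)-(iii): per-VM
# memory and processing-load limits over all tasks assigned to it.

def is_feasible(phi, task_mem, task_load, vm_mem, vm_mips):
    m = len(vm_mem)
    used_mem = [0.0] * m
    used_load = [0.0] * m
    for i, j in enumerate(phi):          # constraint (i): one VM per task
        used_mem[j] += task_mem[i]       # constraint (ii) accumulator
        used_load[j] += task_load[i]     # constraint (iii) accumulator
    return all(used_mem[j] <= vm_mem[j] and used_load[j] <= vm_mips[j]
               for j in range(m))

print(is_feasible([1, 0, 1],
                  task_mem=[256, 512, 256], task_load=[400, 800, 500],
                  vm_mem=[1024, 1024], vm_mips=[1000, 1000]))  # True
```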

4 Harris hawks optimizer

The Harris hawks optimizer (HHO) is a novel population-based meta-heuristic algorithm introduced in 2019 for solving global optimization problems [7]. The optimizer simulates the cooperative behavior of intelligent Harris hawks while searching for and catching prey. Harris hawks follow a clever maneuver called the surprise pounce, also known as the "seven kills" strategy, when hunting escaping prey: the team members initiate attacks from various directions for a higher success rate and then converge on the intended rabbit. In the HHO algorithm, each hawk is a candidate solution in the population, while the rabbit location is the best (or near-optimal) solution. The HHO follows two main phases in the search process, exploration and exploitation, as shown in Fig. 1. During exploration, hawks are randomly distributed over different locations to monitor different areas in order to trace and detect the rabbit, while in the exploitation phase the hawks perform a surprise pounce, diving quickly as a team to exploit the neighborhood of the intended prey.

Fig. 1: Phases of the HHO [7]

4.1 Exploration phase

In the exploration phase, the hawks monitor and observe the desert to detect prey, which may take several hours. The hawks therefore perch randomly at some locations and wait to discover prey, following two different strategies with equal probability, as formulated in Eq. (8).

$$ X_{k} \left( {t + 1} \right) = \left\{ {\begin{array}{*{20}l} {X_{r} \left( t \right) - r_{1} \left| {X_{r} \left( t \right) - 2r_{2} X\left( t \right)} \right|} \hfill & {q \ge 0.5} \hfill \\ {\left( {X_{{{\text{rabbit}}}} \left( t \right) - X_{a} \left( t \right)} \right) - r_{3} \left( {{\text{LB}} + r_{4} \left( {{\text{UB}} - {\text{LB}}} \right)} \right)} \hfill & {q < 0.5} \hfill \\ \end{array} } \right. $$
(8)

where \(X_{r} \left( t \right)\) is a random hawk selected from the current population, \(X\left( t \right)~\) is the current location of the hawk, \(X_{{{\text{rabbit}}}} \left( t \right)\) is the rabbit location, and \(X_{k} \left( {t + 1} \right)\) is the location of hawk k in iteration t + 1. \(r_{1} ,~r_{2} ,~r_{3} ,~r_{4} ~{\text{and~}}q\) are random numbers in the range [0,1], updated in every iteration; UB and LB are the upper and lower bounds of the variables; and \(X_{a} \left( t \right)\) is the average position of all hawks in the current population, given as follows:

$$ X_{a} \left( t \right) = \frac{1}{M}\mathop \sum \limits_{{k = 1}}^{M} X_{k} \left( t \right)~ $$
(9)

where M is the number of hawks in iteration t, i.e., the population size. The main objective of this phase is to distribute the hawks over the search area according to the value of q: \(q \ge 0.5~\) denotes that a hawk perches randomly on tall trees to explore the largest possible area of the desert, while \(q < 0.5\) means that the hawk perches at a location determined by the positions of the other team members. In this way, all members can ensure they are close enough when attacking the intended prey [7].
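The exploration update of Eqs. (8)-(9) can be sketched as follows for a continuous search space; drawing fresh random numbers per dimension and clipping the result to [LB, UB] are implementation choices, not prescribed by the paper.

```python
import random

# Sketch of Eqs. (8)-(9): next position of hawk k in exploration.

def explore(hawks, k, rabbit, lb, ub):
    dim = len(hawks[k])
    # Eq. (9): average position of all hawks in the current population
    x_a = [sum(h[d] for h in hawks) / len(hawks) for d in range(dim)]
    x_r = random.choice(hawks)       # a random hawk from the population
    q = random.random()
    new = []
    for d in range(dim):
        r1, r2, r3, r4 = (random.random() for _ in range(4))
        if q >= 0.5:   # perch on a random tall tree
            v = x_r[d] - r1 * abs(x_r[d] - 2 * r2 * hawks[k][d])
        else:          # perch based on other members and the rabbit
            v = (rabbit[d] - x_a[d]) - r3 * (lb + r4 * (ub - lb))
        new.append(min(max(v, lb), ub))   # clip to [LB, UB]
    return new

random.seed(0)
hawks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(explore(hawks, 0, rabbit=[2.0, 2.0], lb=0.0, ub=10.0))
```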

4.2 Transition from exploration phase to exploitation phase

The HHO algorithm switches the search from the exploration phase to the exploitation phase according to the rabbit's escaping energy (E). The escaping energy decreases as the rabbit runs away from the hawks, and it is modeled as in Eq. (10).

$$ E = 2E_{0} \left( {1 - \frac{t}{T}~} \right)~ $$
(10)

where t is the current iteration, T refers to the maximum number of iterations, and \(E_{0}\) is the rabbit's initial energy, a random value in the range [− 1, 1]. When \(\left| E \right| \ge\) 1, the rabbit has the energy to run away from the hawks, so the hawks keep exploring the search space for the rabbit's location; when \(\left| E \right| <\) 1, the rabbit no longer has the energy to escape, and the algorithm starts the exploitation phase to search the neighborhood of the available solutions.
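The transition rule of Eq. (10) can be sketched as follows; \(E_0\) is fixed here for determinism, though the algorithm draws it randomly in [−1, 1].

```python
# Sketch of Eq. (10) and the exploration/exploitation switch.

def escaping_energy(t, T, e0):
    """E = 2 * E0 * (1 - t/T), Eq. (10)."""
    return 2 * e0 * (1 - t / T)

# With E0 = 0.8 the energy decays linearly from 1.6 to 0: the hawks
# explore while |E| >= 1 and switch to exploitation once |E| < 1.
for t in (0, 50, 100):
    e = escaping_energy(t, T=100, e0=0.8)
    print(t, e, "explore" if abs(e) >= 1 else "exploit")
# 0 1.6 explore
# 50 0.8 exploit
# 100 0.0 exploit
```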

4.3 Exploitation phase

In this phase, the hawks attack the prey detected in the previous phase. However, the prey often tries to escape from these attacks, depending on its escaping energy and the chasing strategies of the Harris hawks. There are four different chasing patterns, namely: (1) soft besiege, (2) soft besiege with progressive rapid dives, (3) hard besiege, and (4) hard besiege with progressive rapid dives. Before the surprise pounce, a random number r is generated to determine whether the prey escapes successfully (\(r < 0.5\)) or not (\(r \ge 0.5\)).

  1. (i)

    Soft-besiege

In this stage, \(r \ge 0.5~~\) and \(~\left| E \right| \ge 0.5\): the prey has enough energy to attempt escape by random jumps in various directions but ultimately fails. This can be modeled as follows [7]:

$$ X\left( {t + 1} \right) = \Delta X\left( t \right) - E\left| {J~X_{{{\text{rabbit}}}} \left( t \right) - X\left( t \right)} \right| $$
(11)
$$ J = 2\left( {1 - r_{5} } \right)~ $$
(12)
$$ \Delta X\left( t \right) = X_{{{\text{rabbit}}}} \left( t \right) - X\left( t \right) $$
(13)

where J refers to the strength of the rabbit's random jump during the escape, \(r_{5} ~\) is a random number in the range [0,1], and \(\Delta X\left( t \right)\) is the distance between the rabbit's location and the hawk's current location at iteration t.
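The soft-besiege update of Eqs. (11)-(13) can be sketched per dimension; treating positions as plain Python lists is an implementation choice.

```python
import random

# Sketch of Eqs. (11)-(13): one random jump strength J per update.

def soft_besiege(x, rabbit, energy):
    j = 2 * (1 - random.random())          # Eq. (12): J = 2(1 - r5)
    # Eq. (11): X(t+1) = dX(t) - E * |J*X_rabbit(t) - X(t)|,
    # with dX(t) = X_rabbit(t) - X(t) from Eq. (13)
    return [(rabbit[d] - x[d]) - energy * abs(j * rabbit[d] - x[d])
            for d in range(len(x))]

random.seed(42)
# With the rabbit at the origin, each component is -1 - 0.6 * 1 = -1.6.
print(soft_besiege([1.0, 1.0], rabbit=[0.0, 0.0], energy=0.6))
```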

  1. (ii)

    Soft-besiege with progressive rapid dives

In this stage, \(r < 0.5\) and the prey has enough energy to escape, where \(\left| E \right| \ge 0.5\). It is assumed that the hawks can decide their next actions according to the following equations:

$$ X\left( {t + 1} \right) = \left\{ {\begin{array}{*{20}l} Y \hfill & {{\text{if}}~F\left( Y \right) < F\left( {X\left( t \right)} \right)} \hfill \\ W \hfill & {{\text{if}}~F\left( W \right) < F\left( {X\left( t \right)} \right)} \hfill \\ \end{array} } \right. $$
(14)
$$ Y = X_{{{\text{rabbit}}}} \left( t \right) - E\left| {J~X_{{{\text{rabbit}}}} \left( t \right) - X\left( t \right)} \right| $$
(15)
$$ W = Y + S*LF\left( D \right) $$
(16)

where D refers to the problem dimension, S indicates a \(1 \times D\) random vector, and \({\text{LF}}\) is the Lévy flight function, given by:

$$ {\text{LF}}\left( x \right) = 0.01\frac{{u \times \sigma }}{{\left| v \right|^{{\frac{1}{\beta }}} }} $$
$$ \sigma = \left( {\frac{{\Gamma \left( {1 + \beta } \right) \times \sin \left( {\frac{{\pi \beta }}{2}} \right)}}{{\Gamma \left( {\frac{{1 + \beta }}{2}} \right) \times \beta \times 2^{{\left( {\frac{{\beta - 1}}{2}} \right)}} }}} \right)^{{1/\beta }} $$
(17)

where \(u,v~\) are random numbers in the interval [0, 1] and \(\beta\) is a constant equal to 1.5.
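Eq. (17) can be sketched directly; following the text, u and v are taken uniform in [0, 1] here (some HHO implementations use Gaussian variates instead), and W of Eq. (16) is then Y plus a random vector scaled by these steps.

```python
import math
import random

# Sketch of the Levy flight function LF of Eq. (17), with beta = 1.5.

def levy_flight(dim, beta=1.5):
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    # LF(x) = 0.01 * u * sigma / |v|^(1/beta), drawn per dimension
    return [0.01 * random.random() * sigma / random.random() ** (1 / beta)
            for _ in range(dim)]

random.seed(7)
print(levy_flight(3))   # three small heavy-tailed step sizes
```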

  1. (iii)

    Hard-besiege

In this stage, \(\left| E \right| < 0.5\) but \(~r \ge 0.5\). The prey is very exhausted and lacks the energy to escape, while it is completely surrounded by the hawks. The hawks' locations can be updated as follows [7]:

$$ X\left( {t + 1} \right) = X_{{{\text{rabbit}}}} \left( t \right) - E\left| {~X_{{{\text{rabbit}}}} \left( t \right) - X\left( t \right)} \right| $$
(18)
  1. (iv)

    Hard-besiege with progressive rapid dives

In this stage, both \(\left| E \right| < 0.5\) and \(~r < 0.5\), indicating that the prey cannot escape. The hawks therefore decrease the distance between themselves and the mean prey position. The hawk location update is similar to that of the soft besiege with progressive rapid dives, with the following mathematical model [7]:

$$ X\left( {t + 1} \right) = \left\{ {\begin{array}{*{20}l} Y \hfill & {{\text{if}}~F\left( Y \right) < F\left( {X\left( t \right)} \right)} \hfill \\ W \hfill & {{\text{if}}~F\left( W \right) < F\left( {X\left( t \right)} \right)} \hfill \\ \end{array} } \right. $$
(19)
$$ Y = X_{{{\text{rabbit}}}} \left( t \right) - E\left| {J~X_{{{\text{rabbit}}}} \left( t \right) - X_{a} \left( t \right)} \right| $$
(20)
$$ W = Y + S*{\text{LF}}\left( D \right) $$
(21)

where \(X_{a} \left( t \right)\) is given by Eq. (9). The implementation steps of the HHO algorithm are shown in Algorithm 1.

5 Proposed framework

The conventional HHO is simple and effective for solving optimization problems. Nevertheless, it has some weaknesses that affect its performance and the quality of the final solution: its exploration capacity is weak, and the final schedule varies from run to run. Briefly, when the HHO falls into a local optimum, it is difficult for the search to move toward the region of the global optimum. Further, the quality of the final solution depends on the quality of the initial population; since the initial population is generated randomly, it may in some cases lead to a local optimum and makes the final schedule differ in every run. Hence, it is very important to direct the search process toward the global optimum region and to guarantee the same schedule in every run.

In the proposed approach, two optimization techniques are combined with the conventional HHO to overcome these limitations: elite opposition-based learning [12] and the minimum completion time heuristic [13]. The EOBL strategy is used in the exploration process to improve the global search ability of the HHO and balance exploration and exploitation. The MCT strategy is used to generate the initial scheduled list that serves as the initial population of the proposed ELHHO, guaranteeing the same solution in every run.

Algorithm 1: Pseudocode of the conventional HHO

5.1 Elite opposition-based learning (EOBL)

Opposition-based learning (OBL) is a relatively new intelligent computation technique. It has been successfully incorporated into several evolutionary algorithms (EAs) to enhance their performance [33]. The main idea of OBL is to generate a solution and its opposite at the same time and select the better of the two for the population of the next iteration. Let \(X_{s} = \left( {x_{1} ,x_{2} ,x_{3} , \ldots x_{D} } \right)\) be a solution in the current population, where D is the dimension of the search space and \(~x_{d} \in \left[ {a_{d} ,b_{d} } \right],~~d = 1,2,3, \ldots D\); its opposite point \(X^{\sim} = \left( {x_{1}^{\sim} ,x_{2}^{\sim} ,x_{3}^{\sim} , \ldots x_{D}^{\sim} } \right)\) is given by Eq. (22).

$$ x_{d}^{\sim} = a_{d} + b_{d} - x_{d} $$
(22)

The elite opposition-based learning is a novel strategy in the intelligence computation field. If the elite individual (optimal solution) in the current population is \(~X_{e} = \left( {x_{{e1}} ,x_{{e2}} , \ldots x_{{eD}} } \right)\), the elite opposition-based solution of the individual \(X_{i}\) is \(X_{i}^{\sim} = \left( {x_{{i,1}}^{\sim} ,~x_{{i,2}}^{\sim} , \ldots x_{{i,D}}^{\sim} } \right)\) which can be given by Eq. (23).

$$ x_{{i,j}}^{\sim} = \eta *(da_{j} + db_{j} ) - x_{{e,j}} $$
(23)

where i = 1, 2, …, n (n is the population size), j = 1, 2, …, D (D is the dimension of X), and η is a generalized coefficient with \(\eta \in U\left( {0,1} \right)\). \(da_{j} ~,db_{j}\) are the dynamic bounds of the \(j{\text{th}}\) decision variable, given by Eq. (24).

$$ da_{j} = \min \left( {x_{{i,j}} } \right),~~db_{j} = \max \left( {x_{{i,j~}} } \right)~ $$
(24)

If the dynamic boundary operator makes the value of \(x_{{i,j}}^{\sim}\) jump out of the range \(\left[ {da_{j} ,db_{j} } \right]\), it may be reset by Eq. (25).

$$ ~x_{{i,j}}^{\sim} = {\text{rand}}\left( {da_{j} ,db_{j} } \right)\quad {\text{if}}\;x_{{i,j}}^{\sim} < da_{j} \;{\text{or}}\;x_{{i,j}}^{\sim} > db_{j} $$
(25)

The EOBL creates an opposite individual based on the existing elite individuals in the population. Furthermore, it makes full use of the elite individuals' characteristics, which carry more valuable search information than those of normal individuals, thereby improving the population's diversity to some degree.
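The EOBL step of Eqs. (23)-(25) can be sketched as follows; drawing one η per opposite individual and computing the dynamic bounds per dimension from the current population are implementation choices under the stated equations.

```python
import random

# Sketch of Eqs. (23)-(25): reflect the elite solution inside the
# dynamic bounds of each decision variable, resetting out-of-range
# components at random.

def elite_opposite(elite, population):
    dim = len(elite)
    eta = random.random()                      # generalized coefficient
    opposite = []
    for j in range(dim):
        da = min(x[j] for x in population)     # Eq. (24): dynamic bounds
        db = max(x[j] for x in population)
        v = eta * (da + db) - elite[j]         # Eq. (23)
        if v < da or v > db:                   # Eq. (25): reset if outside
            v = random.uniform(da, db)
        opposite.append(v)
    return opposite

random.seed(3)
pop = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
print(elite_opposite(pop[0], pop))   # components stay inside the bounds
```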

5.2 Minimum completion time algorithm (MCT)

The minimum completion time algorithm solves the scheduling problem by assigning tasks, in arbitrary order, to the virtual machine (or resource) with the minimum predicted completion time for each task: for every task, the expected completion times on all VMs are computed, and the task is assigned to the VM with the minimum ECT. As a consequence, some tasks may be allocated to VMs that do not give them the minimum execution time [34]. Algorithm 2 presents the main steps of the MCT.

Algorithm 2: Pseudocode of the MCT algorithm
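The MCT steps above can be sketched as follows; the completion time on each VM is taken as the VM's current ready time plus the task's ECT, and all numbers are illustrative, not the paper's workloads.

```python
# Sketch of the MCT heuristic: each task, in arrival order, goes to
# the VM whose ready time plus the task's ECT is smallest.

def mct_schedule(task_lengths, vm_speeds):
    ready = [0.0] * len(vm_speeds)    # current ready time of each VM
    phi = []
    for length in task_lengths:
        # completion time of this task on every VM
        ct = [ready[j] + length / vm_speeds[j] for j in range(len(vm_speeds))]
        best = ct.index(min(ct))      # VM with minimum completion time
        ready[best] = ct[best]
        phi.append(best)
    return phi

# Four tasks (MI) on two VMs of 1000 and 2000 total MIPS.
print(mct_schedule([4000, 8000, 12000, 2000], [1000, 2000]))  # [1, 1, 0, 1]
```

Because assignments depend only on the task list and VM speeds, the result is the same in every run, which is exactly why the proposed approach uses it to seed the ELHHO population.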

5.3 Proposed ELHHO algorithm

In this section, the proposed approach, the elite learning Harris hawks optimizer (ELHHO), is presented for task scheduling in the cloud computing environment. The main motivation of the proposed ELHHO is to improve the HHO's movement from a solution to its neighbor by using a principled intelligent method. In the original HHO, each generation requires every hawk to be explored once, and each hawk's search behavior is influenced by the other hawks in the group. To enhance the exploration process, the new hawk position is generated according to Eqs. (8) and (9), and the opposite hawk position is generated using Eq. (23) at the same time; the two generated solutions are then compared, and the better one is selected for the next iteration. The main objective is to explore more positions in the search space and improve the exploration process of the HHO algorithm. Algorithm 3 in Table 1 presents the main steps of the proposed ELHHO technique.

Table 1 ELHHO framework

The developed ELHHO starts by determining its control parameters, such as the population size, the number of submitted tasks, the number of available VMs, and the maximum number of iterations. The algorithm then generates a population X of dimension N × n from the initial solution produced by the MCT algorithm, which serves as the initial phase of the ELHHO approach. Each solution \(x_{i}\) ∈ X represents a candidate schedule and is assessed by its final schedule length and processing cost, where the best solution has the minimum SL and minimum cost. A solution is represented by a 2 × n matrix in which the first row holds the task number, with values between 0 and n − 1, while the second row holds the number of the virtual machine that runs that task, with values between 0 and m − 1. For example, a solution \(x_{i} ~\) for scheduling 5 tasks onto 2 virtual machines can be represented as follows:

$$ x_{i} = ~\left[ {\begin{array}{*{20}c} {1~~0~~3~~2~~4} \\ {1~~1~~0~~0~~~1} \\ \end{array} } \right] $$

This means that tasks 0, 1 and 4 are assigned to \({\text{vm}}\) number 1, while tasks 2 and 3 are assigned to \({\text{vm}}\) number 0. In the next step, ELHHO determines the best solution in the population, which is considered the rabbit location (the intended prey); at the same time, its elite opposite solution is generated, and the better of the two becomes the rabbit that all other hawks try to catch. To improve exploration, for each newly generated hawk the elite opposite hawk is also generated so that more locations in the search space are explored, and the better of the pair is inserted into the population for further improvement in the next iterations. ELHHO performs the exploitation phase as in the original HHO, switching among several strategies based on a few parameters to refine the obtained best solution. Finally, if the termination criterion is met, ELHHO stops and returns the final rabbit location; otherwise, it continues the search.
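To make the encoding concrete, the sketch below builds the 2 × n schedule matrix from the example above. The helper names are hypothetical, and the opposition step is a simplified mirror over the VM range; ELHHO's actual elite opposite solution follows Eq. (23), which uses the bounds of the elite group.

```python
# Illustrative sketch of the 2 x n schedule encoding used by ELHHO.
# Helper names are hypothetical; the real opposite solution follows Eq. (23).

n_vms = 2

# Row 0: task ids (0..n-1); row 1: VM id (0..m-1) running each task.
solution = [[1, 0, 3, 2, 4],
            [1, 1, 0, 0, 1]]

def assigned_vm(sol, task):
    """Return the VM that runs `task` under schedule `sol`."""
    col = sol[0].index(task)
    return sol[1][col]

def opposite_assignment(sol, m):
    """Simplified opposition: mirror each VM id over [0, m-1].
    ELHHO's elite opposition (Eq. (23)) uses the bounds of the
    elite group rather than the fixed VM range."""
    return [sol[0][:], [(m - 1) - v for v in sol[1]]]
```

For this example, tasks 0, 1 and 4 map to VM 1 and tasks 2 and 3 to VM 0, matching the matrix above; the simplified opposite schedule swaps the two VMs.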

6 Experimental results

This section introduces the experimental results of scheduling different tasks into a different number of VMs. The results of the proposed multi-objective ELHHO are compared with those obtained by the conventional HHO, genetic algorithm, water wave optimization, and ant colony optimization algorithm in terms of schedule length, execution cost, balancing degree, throughput and resource utilization.

6.1 Experimental environment and datasets

The simulation is run on a laptop with an Intel Core i5-2540M CPU @ 2.60 GHz, 8 GB of RAM, and a 64-bit Windows 7 operating system. To simulate the cloud system entities, the well-known CloudSim 3.0.3 toolkit is used [35]. The data center and host configurations are given in Table 2, the characteristics of the VMs in Table 3, and their different costs in Table 4. Several data sets of real applications from standardized workload archives are used. These data sets are taken from records of the Numerical Aerodynamic Simulation (NAS) systems division at the NASA Ames research center [36]; the cleaned log NASA-iPSC-1993-1.1-cln.swf (replaced on 1 August 2006) is used for the trials. For performance assessment, the HPC2N (North High-Performance Computing Center) log is also used. The NASA Ames iPSC/860 and HPC2N logs are two of the most commonly used traces for evaluating distributed system performance. The control parameters of the implemented algorithms are given in Table 5. For the comparison among the algorithms, each experiment is run 10 times and the average results are reported.

Table 2 Data center and host configuration
Table 3 VM characteristics
Table 4 Different costs for VMs

6.2 Evaluation metrics

As is well known, there are two primary entities involved in the cloud: cloud service provider and cloud consumer. Cloud service providers make their resources available to cloud consumers on a rental basis, while cloud consumers send their jobs for processing.

Consumers care more about the performance of their applications, while service providers care more about the efficient use of their resources to increase their profits. Thus, these optimization objectives fall into two main categories: consumer desires and provider desires. The proposed multi-objective optimization algorithm considers both categories in terms of SL, execution cost, throughput, balance degree (BD), and resource utilization (RU).

(A) Consumer desires

6.2.1 Schedule length (SL)

The schedule length refers to the maximum completion time of the assigned application tasks on the most loaded VM. This metric is essential for measuring the quality of the results obtained by any scheduling technique [37]. A low schedule length indicates a good scheduling strategy that efficiently assigns tasks to the resources, whereas a high value indicates a poor strategy. The value of SL is computed by Eq. (26).

$$ {\text{SL}} = \mathop {\max }\limits_{j = 1, \ldots ,m} \left( {\mathop \sum \limits_{i = 1}^{n} {\text{ECT}}\left( {t_{i} ,{\text{vm}}_{j} } \right)\pi _{ij} } \right) $$
(26)
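As a concrete reading of Eq. (26), the sketch below sums each VM's expected completion times over the tasks assigned to it (\(\pi_{ij} = 1\)) and takes the maximum. The ECT values and assignment are illustrative, not taken from the experiments.

```python
# Sketch of Eq. (26): SL is the load of the most loaded VM, where
# pi_ij = 1 iff task i runs on VM j. `ect[i][j]` stands for
# ECT(t_i, vm_j); `assign[i]` is the VM chosen for task i.

def schedule_length(ect, assign, m):
    loads = [0.0] * m
    for i, j in enumerate(assign):
        loads[j] += ect[i][j]
    return max(loads)

ect = [[4.0, 2.0],
       [3.0, 6.0],
       [5.0, 5.0]]
assign = [1, 0, 0]   # task 0 -> VM 1; tasks 1 and 2 -> VM 0
```

Here VM 0 carries 3.0 + 5.0 = 8.0 and VM 1 carries 2.0, so SL = 8.0.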

Tables 6 and 7 present the average SL of the HHO, ELHHO, WWO, GA, and ACO algorithms on the NASA Ames iPSC/860 and HPC2N workloads, respectively. The number of tasks varies from 200 to 2000, while the number of virtual machines is 10, 15, and 20. The results show that the SL value increases as the number of tasks increases, except in some cases with ACO (e.g., 10 VMs between the 500-task and 1000-task cases, and 15 VMs between the 200-task and 250-task cases) and one case with GA (10 VMs between the 200-task and 250-task cases), as shown in Table 6. Likewise, in Table 7, the SL of ACO decreases while the number of tasks increases from 250 to 500 with 15 VMs, and from 200 to 250 with 20 VMs. These results indicate that GA and ACO cannot reach the optimal point, or even a point near it, in most cases, while the proposed ELHHO algorithm obtains the optimal point in all cases.

Table 5 Algorithms parameters
Table 6 Average schedule length in sec with real workload NASA-iPSC-1993-1.1
Table 7 Average schedule length in sec with real workload HPC2N

This confirms that ACO cannot reach the optimal SL value and falls into a local optimum; at best it may reach a point close to the optimum. On the contrary, the proposed ELHHO achieves the best SL in all cases with different numbers of tasks and VMs, outperforming all other algorithms in minimizing SL. For the NASA-iPSC workload, ELHHO improves on HHO at different rates, for example 87%, 89.8% and 94.9% at 2000 tasks with 10, 15, and 20 VMs, respectively. Similarly, for the HPC2N workload, ELHHO improves on HHO by 92.3%, 89%, and 94%.

6.2.2 Execution cost

The execution cost (EC) is the total price for executing a user's application. It tends to be the most measurable metric, but it is important to express the cost in terms of the characteristics of the given resources. The user aims to reduce both the cost and the schedule length to obtain a fast response at minimum cost. In this scheme, the execution cost is computed as the cost of the VMs, as expressed in Eq. (27) [38].

$$ {\text{EC}} = \mathop \sum \limits_{j = 1}^{m} {\text{CT}}_{j} \times {\text{price}}_{j} $$
(27)

where \({\text{CT}}_{j}\) is the completion time of the \(j{\text{th}}\) virtual machine after executing its last task. Figures 2, 3, and 4 show the average execution cost of assigning different cloudlets onto 20, 15, and 10 VMs, respectively. From these figures, ELHHO outperforms the conventional HHO, GA, and WWO algorithms in terms of execution cost for both the HPC2N and NASA iPSC workloads, achieving the best results on both. This means that the proposed ELHHO improves performance in terms of execution cost as well as SL, as shown in the previous section; it therefore satisfies its objective of overcoming the weaknesses of HHO. The average execution cost of ELHHO is up to 60% less than that of HHO across all instances. ELHHO yields a higher execution cost than ACO in only five cases across both workloads, with a difference not exceeding $6.
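Eq. (27) can be sketched directly; the completion times and prices below are assumed values for illustration, not those of Table 4.

```python
# Sketch of Eq. (27): EC = sum_j CT_j * price_j, where CT_j is the
# completion time of VM j after its last task and price_j its price
# per time unit (both illustrative, not from Table 4).

def execution_cost(ct, price):
    return sum(c * p for c, p in zip(ct, price))

ct = [8.0, 2.0]       # per-VM completion times
price = [0.5, 1.0]    # per-VM price per time unit
```

With these values, EC = 8.0 × 0.5 + 2.0 × 1.0 = 6.0.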

Fig. 2
figure 2

Average execution cost with 20 VMs

Fig. 3
figure 3

Average execution cost with 15 VMs

Fig. 4
figure 4

Average execution cost with 10 VMs

(B) Provider desires

6.2.3 Throughput

Throughput is defined as the number of completed tasks per time unit [39]. It reflects the efficiency of the scheduling technique, where a high throughput yields a high execution rate and a faster response.
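Since the text gives no explicit formula, the sketch below assumes that, for a fully executed schedule, throughput reduces to the number of tasks divided by the schedule length; this reading is an assumption, not taken from [39].

```python
# Sketch: throughput as completed tasks per time unit. For a fully
# executed schedule this is assumed to reduce to n / SL.

def throughput(n_tasks, sl):
    return n_tasks / sl
```

For example, 200 tasks finished within a schedule length of 50 time units would give a throughput of 4 tasks per unit time.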

Figures 5, 6, and 7 show the throughput obtained when applying ELHHO, HHO, WWO, GA and ACO to schedule different numbers of tasks (200, 250, 500, 1000 and 2000) onto different numbers of VMs (20, 15, and 10) for both the HPC2N and NASA iPSC real workloads. The figures show that ELHHO outperforms the other techniques in maximizing throughput across the different workload cases. The proposed technique efficiently exploits the available VMs and enhances HHO performance. ELHHO attains the maximum throughput, and its value increases as the numbers of tasks and VMs increase for both workloads. Since ELHHO has the smallest SL, more tasks are completed per time unit, which improves the performance of the system.

Fig. 5
figure 5

Average throughput with 20 VMs

Fig. 6
figure 6

Average throughput with 15 VMs

Fig. 7
figure 7

Average throughput with 10 VMs

6.2.4 Resource utilization

Resource utilization (RU) is another essential metric. The cloud provider aims to maximize resource utilization to keep resources occupied as much as possible. This metric becomes more important as the service provider aims to maximize profit from a limited number of rented resources. The average RU can be calculated with Eq. (28) [40].

$$ {\text{average}}\;{\text{RU}} = \frac{{\mathop \sum \nolimits_{i = 1}^{m} {\text{CT}}_{i} }}{{{\text{SL}}}} $$
(28)
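Eq. (28) can be sketched as follows, with illustrative per-VM completion times; note that the value is unscaled as written (dividing the result by m would normalize it into [0, 1]).

```python
# Sketch of Eq. (28): average RU = (sum of per-VM completion times) / SL.
# The completion times are illustrative values.

def average_ru(ct, sl):
    return sum(ct) / sl

ct = [8.0, 6.0, 4.0]   # per-VM completion times
sl = max(ct)           # schedule length of this schedule
```

Here average RU = (8 + 6 + 4) / 8 = 2.25.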

Figures 8, 9, and 10 illustrate resource utilization when applying ELHHO, HHO, WWO, GA and ACO to schedule different numbers of tasks (200, 250, 500, 1000 and 2000) onto different numbers of VMs (20, 15, and 10) for both the HPC2N and NASA iPSC real workloads. The figures show that the proposed ELHHO outperforms all algorithms in terms of RU for the HPC2N workload because it uses more resources to reduce SL and execution cost. For the NASA iPSC workload, however, HHO, WWO, GA and ACO achieve better resource utilization than ELHHO for small numbers of tasks.

Fig. 8
figure 8

Average resource utilization with 20 VMs

Fig. 9
figure 9

Average resource utilization with 15 VMs

Fig. 10
figure 10

Average resource utilization with 10 VMs

6.2.5 Balance degree (BD)

Balance degree is the degree to which the workload is balanced over the available resources after the scheduling process. A higher BD indicates a more efficient scheduling algorithm. BD is calculated by Eq. (29):

$$ {\text{BD}} = {\text{SL}}_{{{\text{opt}}}} /{\text{SL}}_{{{\text{fin}}}} $$
(29)

where \({\text{SL}}_{{{\text{fin}}}}\) is the final SL obtained after applying the scheduling decision, and \({\text{SL}}_{{{\text{opt}}}}\) is the optimal schedule length, \({\text{SL}}_{{{\text{opt}}}} = {\text{MI}}_{t} /{\text{MIPS}}_{t}\), where \({\text{MI}}_{t}\) is the total MI of all submitted tasks and \({\text{MIPS}}_{t}\) is the sum of all available MIPS [41]. Figures 11, 12, and 13 show the BD of scheduling different cloudlets onto different numbers of VMs. From the figures, the proposed ELHHO has the highest BD ratio across all test cases because it achieves the minimum SL in these cases. The results show that the proposed solution assigns the submitted cloudlets to the available VMs with a high BD ratio, preventing any VM from being overloaded at any time.
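The BD computation of Eq. (29), with \({\text{SL}}_{{{\text{opt}}}}\) as defined above, can be sketched as follows (the MI and MIPS values are illustrative):

```python
# Sketch of Eq. (29): BD = SL_opt / SL_fin, with
# SL_opt = (total MI of all tasks) / (total MIPS of all VMs).
# MI and MIPS values below are illustrative.

def balance_degree(mi_per_task, mips_per_vm, sl_fin):
    sl_opt = sum(mi_per_task) / sum(mips_per_vm)
    return sl_opt / sl_fin

mi = [1000, 2000, 3000]   # task lengths in MI
mips = [500, 500]         # VM speeds in MIPS
```

Here \({\text{SL}}_{{{\text{opt}}}}\) = 6000 / 1000 = 6 time units; a final SL of 8 would give BD = 0.75, and BD approaches 1 as the schedule approaches perfect balance.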

Fig. 11
figure 11

Average balance degree with 20 VMs

Fig. 12
figure 12

Average balance degree with 15 VMs

Fig. 13
figure 13

Average balance degree with 10 VMs

7 Conclusions

In this article, a multi-objective task scheduling problem was discussed and an enhanced version of the HHO algorithm, called ELHHO, was implemented to solve it and to overcome the limitations of the conventional HHO. The proposed ELHHO combines two scientific intelligent methods, elite opposition-based learning (EOBL) and minimum completion time (MCT), with the Harris hawks optimizer. ELHHO achieves a shorter schedule length and lower execution cost than HHO, GA, WWO, and ACO, together with a higher balance degree, throughput, and resource utilization. It was tested on two types of real workloads (NASA Ames iPSC/860 and HPC2N). The results showed that ELHHO effectively assigns the submitted tasks to the available VMs: its average schedule length is up to 80% less than that of HHO for 200–2000 tasks, and its average execution cost is up to 60% less than that of HHO for all instances. Although the proposed ELHHO improves performance, further directions may be considered: a security phase could be added to protect cloud user data and reduce vulnerabilities, and ELHHO could be tested on private and hybrid clouds with multiple users.