1 Introduction

The cloud computing paradigm has brought a regime change to many industries: it transformed the way computing, storage and network infrastructures are utilized and provided a platform to cope with the growth of huge data volumes, especially for data-intensive computations and large-scale data storage. The paradigm evolved into a utility model in which computation, storage and networking are delivered to consumers as on-demand services. It initially emerged as virtual infrastructure (IaaS) for enterprises, but later grew into a full computing platform on which applications can be developed and software deployed using the services offered by the cloud provider. Today cloud computing is used across many sectors, including but not limited to healthcare, education, entertainment, government organizations, multimedia, transport, IoT, AI and ML. Services related to IoT, AI, ML and image processing require infrastructure with large processing capacity, since they involve multimedia data that must be processed accurately, and scheduling such multimedia workloads is a challenge in the cloud paradigm. All of the above domains consume cloud services under different service models governed by a Service Level Agreement (SLA); the SLA depends on the user or organization and the services to which they subscribe. It is the responsibility of the cloud provider to render services according to the agreement, and SLA violations on the provider side must be avoided. Because many users access virtual resources simultaneously, handling all these requests and assigning virtual resources according to the SLA is a challenging task; the cloud provisions resources to users automatically, without human intervention, and this provisioning of virtual resources to tasks is handled by a scheduler.

The effectiveness of the cloud computing paradigm depends mainly on how well the scheduler manages tasks and maps them onto suitable virtual resources. Scheduling also affects parameters such as energy consumption and SLA violations, which in turn create problems for both the cloud provider and the user. If the scheduler maps tasks to virtual resources poorly, makespan increases, execution times grow, and the quality of service degrades. A task that takes a long time to execute may also incur high energy consumption, which further degrades the quality of service. Finally, if a task cannot complete within the stipulated time, or if a user keeps accessing a virtual resource after its provisioned period has ended, the SLA is violated; such violations arise from improper scheduling of tasks to virtual resources and are a problem for the cloud provider. Many authors have used heuristic techniques [4, 12, 17, 22] and nature-inspired approaches [1,2,3] to tackle task scheduling, but an effective scheduler that maps dynamic tasks onto virtual resources while minimizing energy consumption, makespan and SLA violations is still lacking. Therefore, we use a machine learning technique, Q-learning based on reinforcement learning, to solve the task scheduling problem with a focus on multimedia tasks. Tasks are fed to the task manager, their priorities are calculated, and they are then passed to the Q-learning model, which takes decisions based on the upcoming tasks and the tasks already running on the virtual machines. Tasks already running on VMs are consolidated or migrated according to the tasks arriving at the cloud console and the decision taken by the ML model employed in the scheduling algorithm, while minimizing makespan, SLA violation and energy consumption.

1.1 Motivation and contributions

The cloud paradigm emerged as a utility computing approach in which computing, storage and network infrastructure are delivered to cloud users as a utility. When these services are provided with ease and seamless access, many users are attracted to the paradigm. End users around the world, working in different sectors, consume cloud services according to their requirements. Providing cloud services to all users without interruption is a huge challenge because cloud resources are heterogeneous in nature and incoming requests are both diverse and uncertain. Assigning virtual resources to user requests therefore requires an efficient scheduling approach that handles requests and maps them onto virtual resources while maintaining quality of service and avoiding SLA violations. This motivates our research in this area of cloud computing. We evaluate the primary parameters that influence the performance of the cloud model: makespan, the time taken to execute a task on a VM; SLA violation, measured against the agreement made between the cloud user and the provider for the services; and energy consumption, the energy consumed by VMs during computation and idle time. The objective of our research is to optimize all these parameters without violating the stated constraints.

The contributions of this article are as follows:

  1. A scheduling algorithm is proposed that employs an ML technique to take decisions dynamically according to upcoming and existing tasks.

  2. A Deep Q-Learning network model, an ML technique based on reinforcement learning, is used and integrated into the scheduling module.

  3. Extensive simulations are carried out on CloudSim. A random workload is considered first, and the efficacy of our algorithm is then tested using the HPC2N and NASA parallel worklogs, evaluating makespan, energy consumption and SLA violation.

  4. The experimental results show that the proposed DRLBTSA approach is superior to the existing Round Robin, FCFS, Earliest Deadline First, RATS-HM and MOABCQ algorithms.

The remainder of the paper is organized as follows: existing state-of-the-art approaches are presented and compared in Section 2; the problem formulation and the proposed ML-based methodology are discussed in Sections 3 and 4; the simulation results are discussed in Section 5; and the conclusion is given in the last section of the article.

2 Related works

The authors of [5] formulated a resource allocation and security mechanism that uses a hybrid ML approach, the RATS-HM technique. The work was carried out in three stages. In the first stage, a Cat Swarm optimization technique was used to address makespan and throughput. In the second stage, a DNN was used to address metrics such as bandwidth and load on resources for efficient allocation of resources to tasks. Finally, in the third stage a security authentication scheme was implemented to protect data stored in the cloud. CloudSim [7] was used as the simulation toolkit to assess performance against the FCFS and RR algorithms, and the results showed that the proposed RATS-HM mechanism surpasses the existing techniques for the mentioned parameters. The authors in [26] proposed a task scheduling model for large-scale cloud computing systems that addresses task execution delay and resource utilization. The chosen methodology is an ML approach based on reinforcement learning, and four techniques were used to build the scheduling mechanism: RL, DQN, RNN-LSTM and DRL-LSTM. Matlab was used for simulation, with a real-time dataset taken from a Google cluster as input. Among all techniques, DRL-LSTM performed best when compared with RR, PSO and SJF for the above-mentioned metrics.

The authors in [14] devised a scheduling mechanism, AIRL, based on reinforcement learning to schedule time-sensitive requests in the cloud. The main objectives of AIRL are to minimize request response time and maximize the success rate of user requests. AIRL was compared against different schedulers, namely RR, earliest, random and DQN, and the simulation results show a clear advantage over these baselines. In [8], the authors proposed a scheduling algorithm that addresses QoS, VM cost, success rate and response time. The framework uses a DQN model based on reinforcement learning. The experiments were conducted on a real-time cloud and evaluated against the random, RR and earliest schedulers; the simulation results show that the DQN approach outperforms these algorithms for the mentioned parameters. In [32], a scheduling framework was formulated to minimize the execution time and waiting time of tasks. The authors used an ML technique, CDDQLS, based on reinforcement learning. The simulation was carried out on CloudSim with deadline and resource constraints, and CDDQLS was evaluated against Random, Time-shared and Space-shared algorithms, showing a clear improvement over them. In [10], a task scheduling model was formulated to minimize makespan. It uses a DQN, an ML approach that applies a reinforcement learning strategy to schedule tasks. Experiments were conducted in MATLAB and compared against the HEFT and CPOP algorithms; the results show that makespan is greatly reduced compared with these baselines.

In [33], a scheduling scheme was designed to minimize makespan using a machine learning model, QL-HEFT, which combines Q-Learning and the HEFT algorithm. The process is carried out in two stages: in the first phase, tasks are sorted by Q-Learning to obtain an effective task allocation; in the second phase, processor allocation is performed based on HEFT. The scheme was implemented on CloudSim and compared with the existing HEFT and CPOP algorithms, showing an improvement in makespan. In [9], a dynamic task scheduling model was proposed that aims to minimize energy consumption and CPU utilization. It is modeled using Q-learning, an ML approach, and operates in two phases: in the first phase, incoming tasks are assigned to VMs in the cloud using an M/M/S queuing mechanism; in the second phase, tasks are allocated to the corresponding VMs based on the decisions of Q-Learning. The approach was implemented on CloudSim and evaluated against Random and Fair schedulers. In [25], a trust-aware scheduling mechanism was developed to minimize makespan, improve QoS and address security challenges in the cloud environment. The work was done in three phases: computation of trust levels of VMs, computation of task priorities, and careful scheduling of tasks based on these conditions. It was implemented on a Hadoop cluster, using real workload traces collected from the Google cloud platform, and evaluated against the PSO, SJF and RR algorithms; the results show that the trust-aware scheduler performs better than the existing approaches.

In [34], a scheduling technique was formulated to optimize significant QoS parameters, modeled by DQTS, a combination of Q-learning and a deep neural network. It was implemented on WorkflowSim, with the initial workload generated randomly and synthetic datasets used thereafter. Evaluated against existing models, the results show an improvement in load balancing. In [28], an edge-computing-based task scheduling algorithm was developed to maximize task satisfaction degree and success ratio. It was modeled with DRL to solve task scheduling and resource allocation, implemented in Python, and compared with the FCFS and SJF state-of-the-art algorithms; the results show that the above-mentioned parameters were improved considerably. In [19], DeepJS, a job scheduling mechanism, was developed to improve makespan and address scheduling issues in cloud datacenters. It uses reinforcement learning integrated with a bin-packing algorithm, was simulated on CloudSim with real-world workload traces, and was compared against existing heuristic-based models; the results show that DeepJS converges faster and minimizes makespan compared with the other approaches. In [36], the authors formulated a QoS-aware scheduler targeting response time, VM utilization and the distribution of user requests among VMs. It was modeled using deep reinforcement learning and implemented in a customized simulation environment. Real-world NASA workload traces were used for simulation, and the scheduler was evaluated against RR, FF, random, earliest and best-fit approaches; the results show that the DRL approach reduces average response time by 40% and improves the success rate to 93% compared with the baseline mechanisms. The authors in [11] formulated an effective scheduling mechanism in a fog environment that aims to reduce service delay and computational cost. It was modeled by combining Deep Q-Learning and double Q-Learning, implemented on iFogSim, and evaluated against the FF, GS and RS algorithms; the energy and cost metrics show a large improvement over the existing algorithms. In [35], a workflow scheduling technique was proposed to address makespan and cost. It employs a multi-agent DQN model based on reinforcement learning, with time and cost as rewards. It was implemented on a real-time cloud environment, AWS, and extensive experiments show a substantial improvement in the above-mentioned parameters. The authors in [27] developed an energy-efficient task scheduler that uses a RANN model; a GA was used to generate a dataset of 18 million instances. It was implemented in MATLAB and evaluated against existing approaches, outperforming them in makespan, energy consumption, required active racks and execution overhead. In [5], an efficient resource allocation scheme with lightweight authentication was developed. The hybrid mechanism, RATS-HM, consists of three steps: ICS-TS, which optimizes the makespan of the scheduling mechanism; GO-DDN, a deep neural network, for efficient allocation of resources; and finally a lightweight authentication mechanism. Extensive experiments were conducted on CloudSim, and the results show that RATS-HM allocates resources effectively to users while satisfying deadline constraints. In [16], a workload balancing strategy was proposed that addresses cost, degree of imbalance and resource utilization. The scheduling strategy, MOABCQ, adds Q-learning to a modified ABC approach. Extensive simulations were conducted on CloudSim using real-time workload datasets and synthetic workloads, and MOABCQ shows a significant improvement over the existing approaches. In [29], the authors used a deep reinforcement learning approach to propose an energy-aware task scheduling model that minimizes energy consumption and makespan while improving resource utilization; compared against state-of-the-art approaches, the deep reinforcement learning approach performed best for the specified parameters. In [37], a task scheduling approach was proposed that addresses energy concerns in datacenters for real-time workloads, using a DRL methodology for energy-aware scheduling. Extensive simulations revealed that the energy-aware scheduling mechanism handles real-time jobs in datacenters while minimizing energy consumption and improving the QoS delivered by the cloud provider.

As summarized in Table 1, the existing scheduling algorithms use different variations of reinforcement learning and address the metrics listed there. Despite this, task scheduling is still ineffective, and we therefore use a Deep Q-Learning network to schedule tasks effectively: task priorities are computed and tasks are scheduled according to the decisions of the ML model (DQN), addressing makespan, SLA violation and energy consumption.

Table 1 Existing Task scheduling mechanisms using ML Techniques

The next section precisely defines the problem and presents the proposed system architecture in detail.

3 Problem definition and proposed system architecture

This section presents the problem definition.

Definition

Assume we have K tasks, indicated as tK = {t1, t2, t3, …, tK}, n VMs indicated as VMn = {VM1, VM2, VM3, …, VMn}, p physical hosts indicated as Hp = {H1, H2, H3, …, Hp} and q datacenters indicated as DCq = {DC1, DC2, DC3, …, DCq}. The scheduling problem is to schedule the K tasks onto the n VMs residing on the p physical hosts, which in turn reside in the q datacenters. The priorities of incoming tasks are computed before scheduling and fed to the DQN model, which takes scheduling decisions based on the upcoming tasks and the tasks currently running on the underlying resources, minimizing makespan, SLA violation and energy consumption. Table 2 lists the notation used in the proposed architecture.

Table 2 Notations used in Proposed System Architecture
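To make the above definition concrete, the following minimal Python sketch models the entities involved (tasks, VMs, hosts, datacenters). It is illustrative only: the field names such as length_mi, pes and mips are our assumptions, not the paper's notation, and the sample values are arbitrary.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    tid: int
    length_mi: int          # task length in million instructions (assumed unit)
    deadline: float         # deadline dl^t for the task

@dataclass
class VM:
    vid: int
    pes: int                # number of processing elements (pr^no)
    mips: int               # MIPS per processing element (pr^mips)
    queue: List[Task] = field(default_factory=list)

@dataclass
class Host:
    hid: int
    vms: List[VM]

@dataclass
class Datacenter:
    did: int
    hosts: List[Host]

# The scheduling problem: map K tasks onto n VMs residing on p hosts in q datacenters.
tasks = [Task(tid=i, length_mi=1000 * (i + 1), deadline=50.0) for i in range(5)]
vms = [VM(vid=j, pes=2, mips=1000) for j in range(3)]
datacenter = Datacenter(did=0, hosts=[Host(hid=0, vms=vms)])
```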

The optimal task scheduling architecture is shown in Fig. 1; it receives diverse requests from different users simultaneously. After tasks are submitted to the cloud interface application, the task manager collects the requests and calculates the priority of every task based on task length and the processing capacities of the resources. The prioritized tasks are then fed to the DQN model integrated with the scheduling module and, following the recommendations of the DQN-based scheduler, the tasks are mapped appropriately onto the VMs. The scheduler first places the prioritized tasks in the execution queue and then dispatches them to the VMs. In this architecture, at every time interval T the scheduler keeps track of upcoming requests and consults the resource manager about the virtual resources; based on the upcoming requests, the requests executing on the VMs and the available virtual resources, the scheduler takes a dynamic decision, i.e. mapping a task to a new VM, mapping a task to an existing VM, or migrating existing tasks to other VMs if a VM has sufficient storage and processing capacity to accommodate the running tasks. We use the Deep Q-Learning model to schedule tasks intelligently under these conditions at every time interval T; it passes its scheduling decisions to the scheduler, which takes care of placing tasks appropriately onto the VMs. The main aim of this scheduler is to map tasks to VMs effectively based on their priorities while minimizing makespan, SLA violation and energy consumption. To evaluate task priorities, the dependencies of the tasks and the overall load on the VMs must first be determined. The overall load on the VMs is given by eq. 1.

$$lo^{VM}=\sum lo^{n}$$
(1)
Fig. 1
figure 1

Proposed optimal task scheduling Architecture

Where \(lo^{n}\) indicates the load on each of the n currently running VMs.

After calculating the current load on the VMs, and since all VMs run on p physical hosts, the overall load on the hosts is calculated using eq. 2.

$$lo^{H_p}=lo^{VM}/\sum H_{p}$$
(2)

Where \(lo^{H_p}\) indicates the load on the p physical hosts, \(lo^{VM}\) the current load on all VMs, and \(H_{p}\) the hosts.

After calculating the load on the VMs and physical hosts, a threshold value is identified: since the cloud computing paradigm is dynamic and the VMs must process a huge number of requests in a balanced manner, the load balancer has to adapt to the requests arriving at the VMs. To incorporate a load balancer into our model, a threshold value is therefore calculated. The threshold must be dynamic, because cloud workloads are not static and depend on parameters such as upcoming requests and the capacity of the existing resources. The threshold value in this model is calculated using eq. 3.

$$tr^{p}=\frac{\sum_{i=1}^{p} lo_{i}^{H_p}}{p}$$
(3)

Where \(tr^{p}\) is the dynamic threshold value used in our work and \(lo_{i}^{H_p}\) is the load on the i-th of the p physical hosts. The threshold value changes continuously because the cloud workload is dynamic, and based on it the utilization of the hosts is classified as under-utilized, balanced or over-utilized. Host utilization is determined using eqs. 4, 5 and 6.

Eq. 4 identifies over-utilized hosts.

$$VM_{n}>tr^{p}-\sum lo^{VM}$$
(4)

Eq. 5 identifies under-utilized hosts.

$$VM_{n}<tr^{p}-\sum lo^{VM}$$
(5)

Eq. 6 identifies balanced hosts.

$$VM_{n}=tr^{p}-\sum lo^{VM}$$
(6)

Using eqs. 4, 5 and 6 and the dynamic threshold value, the utilization level of each host is determined. Next, to schedule the workload appropriately over the cloud resources (VMs), the processing power of a resource is calculated as in eq. 7: it is defined as the product of the number of processing elements in the VM and the number of instructions processed per second (MIPS) by the VM.

$$pr_{ca}^{VM}=pr^{no}\ast pr^{mips}$$
(7)

Eq. 7 gives the processing capacity of a particular VM among the n VMs considered in our architecture; the overall processing capacity of all VMs is then calculated using eq. 8.

$$ovr_{vm}^{pr}=\sum pr_{ca}^{VM}$$
(8)

After calculating the processing capacities of the VMs, the priorities of incoming requests are calculated based on dependencies or inter-dependencies, task size, the resources required by the request and other parameters. The length of a task is calculated using eq. 9.

$$t_{k}^{l}=t_{mips}\ast t_{pr}$$
(9)

After calculating the task length, the priority of an incoming task at the scheduler is calculated using eq. 10.

$$t^{prio}=\frac{t_{k}^{l}}{pr_{ca}^{VM}}$$
(10)

Based on their priorities, tasks are moved to the execution queue and mapped by the scheduler to appropriate VMs. For this scheduling model we also consider a deadline constraint: each task should complete its execution before its deadline dlt.
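A minimal Python sketch of eqs. 1-10 is given below, assuming that per-VM load is available as a numeric value and that a VM's capacity is the product of its processing elements and MIPS rating; the helper names and sample values are illustrative, not the paper's implementation.

```python
def total_vm_load(vm_loads):
    """Eq. 1: lo^VM = sum of the loads of the n running VMs."""
    return sum(vm_loads)

def host_load(total_load, num_hosts):
    """Eq. 2: lo^{H_p} = lo^VM divided over the p physical hosts."""
    return total_load / num_hosts

def dynamic_threshold(host_loads):
    """Eq. 3: tr^p = average load over the p physical hosts."""
    return sum(host_loads) / len(host_loads)

def host_utilization(vm_load, threshold, total_load):
    """Eqs. 4-6: classify a host as over-, under- or balanced-utilized."""
    ref = threshold - total_load
    if vm_load > ref:
        return "over-utilized"
    if vm_load < ref:
        return "under-utilized"
    return "balanced"

def vm_capacity(num_pes, mips):
    """Eq. 7: pr_ca^VM = processing elements * MIPS."""
    return num_pes * mips

def overall_capacity(capacities):
    """Eq. 8: ovr_vm^pr = sum of the per-VM processing capacities."""
    return sum(capacities)

def task_length(task_mips, task_pes):
    """Eq. 9: t_k^l = t_mips * t_pr."""
    return task_mips * task_pes

def task_priority(length, capacity):
    """Eq. 10: t^prio = task length / VM processing capacity."""
    return length / capacity

# Illustrative usage with assumed numbers
vm_loads = [0.6, 0.4, 0.8]
lo_vm = total_vm_load(vm_loads)
tr_p = dynamic_threshold([host_load(lo_vm, 2), host_load(lo_vm, 2)])
cap = vm_capacity(num_pes=2, mips=1000)
print(host_utilization(vm_loads[0], tr_p, lo_vm))
print(task_priority(task_length(task_mips=4000, task_pes=2), cap))
```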

In this research work, our focus is on makespan, SLA violation and energy consumption. To evaluate makespan, the execution time must be calculated first, since makespan is determined by how long each task takes to run on a given VM. The execution time of a task is calculated using eq. 11.

$$et_{t_k}=\frac{et_{t}}{pr_{ca}^{VM}}$$
(11)

Every task placed in the execution queue is assigned a VM based on resource availability, which in turn depends on when the VM finishes its current work. The finish time of a task is therefore calculated using eq. 12.

$$ft^{t_k}=\sum VM_{n}+et_{t_k}$$
(12)

In this model, we assume that each task must complete its execution within the specified deadline. Therefore, for every task scheduled in this model the finish time must be less than or equal to its deadline, as expressed in eq. 13.

$$ft^{t_k}\le dl^{t}$$
(13)

Having defined the deadline constraint, execution time and finish time, we now calculate makespan, which any scheduling mechanism seeks to minimize. It is defined over the execution times of the tasks running on the virtual resources and is calculated as follows.

$$m^{k}=\max \left( ft^{VM_n}\right)$$
(14)
$$\min ft\left(t_{k},VM_{n}\right)=\sum\nolimits_{i=1}^{k}\sum\nolimits_{j=1}^{n}\delta_{ij}\, ft\left(t_{i},VM_{j}\right)$$
(15)

In eq. 15, δij is set to 1 if task ti is assigned to VMj and to 0 otherwise.

Thus, makespan is calculated from eqs. 14 and 15.
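The makespan computation of eqs. 11-14 can be sketched as follows. The sketch assumes execution time is the task length divided by the VM's processing capacity and, purely for illustration, places each task on the VM with the earliest finish time; in the proposed system the placement is decided by the DQN scheduler, not by this greedy rule.

```python
def execution_time(task_length, vm_capacity):
    """Eq. 11 (assumed reading): et = task length / VM processing capacity."""
    return task_length / vm_capacity

def makespan(task_lengths, vm_capacities):
    """Eqs. 12-14: accumulate per-VM finish times, makespan = max finish time."""
    finish = [0.0] * len(vm_capacities)           # current finish time of each VM
    for length in task_lengths:
        # illustrative earliest-finish-time placement (not the DQN decision)
        j = min(range(len(vm_capacities)),
                key=lambda v: finish[v] + execution_time(length, vm_capacities[v]))
        finish[j] += execution_time(length, vm_capacities[j])   # eq. 12
    return max(finish)                            # eq. 14

print(makespan(task_lengths=[4000, 8000, 2000, 6000],
               vm_capacities=[1000, 2000]))
```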

Our next focus is to minimize energy consumption in the cloud computing paradigm, one of its most significant and impactful parameters. Processing huge workloads requires large-scale infrastructure, which increases energy consumption as well as CO2 emissions [21, 38] and damages the environment. In the cloud model, energy consumption depends on the energy consumed during computing time and during idle time. In this model, the energy consumption of a VM is calculated as follows: any VM is either in the active state, executing instructions, or in the idle state, as represented in eq. 16.

$$VM_{n}=\begin{cases}\gamma_{n} & \text{Active state of VM}\\ \tau_{n} & \text{Idle state of VM}\end{cases}$$
(16)

The energy consumption of all n VMs is calculated using the following equations.

$$e_{VM_n}^{con}=ft_{n}\ast \gamma_{n}+\left(m^{k}-ft_{n}\right)\ast \tau_{n}$$
(17)
$$\min_{act}^{con}=\left(e^{mx}-e^{mn}\right)\ast res^{util}+e^{mn}$$
(18)

The energy consumption of the datacenter is calculated as follows.

$$e^{con}=\sum e_{VM_n}^{con}+\min_{act}^{con}$$
(19)

Thus, energy consumption in the cloud is calculated from eqs. 16, 17, 18 and 19.
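A hedged sketch of the energy model of eqs. 17-19 is given below; the active and idle power figures (γ, τ) and the e^mx, e^mn and utilization values used in the example are assumed placeholders, not measurements from the paper.

```python
def vm_energy(finish_time, makespan, active_power, idle_power):
    """Eq. 17: ft * gamma (active) + (makespan - ft) * tau (idle)."""
    return finish_time * active_power + (makespan - finish_time) * idle_power

def active_min_energy(e_max, e_min, resource_util):
    """Eq. 18: (e^mx - e^mn) * res^util + e^mn."""
    return (e_max - e_min) * resource_util + e_min

def datacenter_energy(vm_finish_times, makespan, active_power, idle_power,
                      e_max, e_min, resource_util):
    """Eq. 19: sum of per-VM energy plus the minimum active-state energy."""
    per_vm = sum(vm_energy(ft, makespan, active_power, idle_power)
                 for ft in vm_finish_times)
    return per_vm + active_min_energy(e_max, e_min, resource_util)

# Illustrative usage with assumed power figures (gamma, tau, e^mx, e^mn)
print(datacenter_energy(vm_finish_times=[10.0, 14.0], makespan=14.0,
                        active_power=0.9, idle_power=0.2,
                        e_max=250.0, e_min=120.0, resource_util=0.7))
```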

Our next focus is to minimize SLA violation in cloud computing. The Service Level Agreement is important from the perspective of both the user and the cloud provider: if the system does not operate according to the SLA, problems arise for both parties. It is therefore important that our scheduler does not violate the SLA made between the user and the cloud provider. We define SLA violation in cloud computing using eq. 20.

$$SLAV=\frac{1}{p}\sum\nolimits_{i=1}^{p}\frac{T_i^{over}}{T_i^{active}}\ast \frac{1}{n}\sum\nolimits_{j=1}^{n}\frac{T_j^{pd}}{T_j^{cpc}}$$
(20)

Where p indicates the number of physical hosts, \({T}_i^{over}\) the total time for which host i is overloaded, \({T}_i^{active}\) the amount of time host i is in the active state, \({T}_j^{pd}\) the estimated performance degradation of VM j, and \({T}_j^{cpc}\) the CPU capacity requested by VM j during its specified time.
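Eq. 20 can be computed directly as in the sketch below; the timing and CPU figures in the example are assumed placeholders.

```python
def sla_violation(overload_times, active_times, perf_degradations, cpu_requests):
    """Eq. 20: per-host overload-time term multiplied by per-VM degradation term."""
    p = len(overload_times)                      # number of physical hosts
    n = len(perf_degradations)                   # number of VMs
    host_term = sum(o / a for o, a in zip(overload_times, active_times)) / p
    vm_term = sum(d / c for d, c in zip(perf_degradations, cpu_requests)) / n
    return host_term * vm_term

# Illustrative usage with assumed timings (seconds) and CPU figures (MIPS)
print(sla_violation(overload_times=[120.0, 60.0], active_times=[3600.0, 3600.0],
                    perf_degradations=[50.0, 80.0, 30.0],
                    cpu_requests=[1000.0, 2000.0, 1500.0]))
```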

4 Methodology

This section describes the methodology used to design our scheduling algorithm. We use a reinforcement learning approach [31], which takes adaptive decisions. It is a machine learning approach that considers inputs and gives decisions based on the history of previous events; over time it learns from previous decisions and adapts. Any reinforcement learning approach has three basic parts. (1) Input and output states: the data supplied to the model is the input, and the output state represents the outcome the algorithm produces from that input. (2) Rewards: the outcome produced by the algorithm is scored with a positive or negative reward. (3) The artificial intelligence framework: it takes a decision based on the input supplied to the algorithm and produces outcomes from which rewards, good or bad, are generated. If the reward generated at time T is positive, the framework retains it and proceeds to the next state to make even more optimal decisions; if the reward is negative, it learns from that experience and tries to improve its decision making in the next state.
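These three parts correspond to the standard agent-environment interaction loop, sketched generically below; the toy environment and random policy are illustrative assumptions and are unrelated to the cloud scheduler itself.

```python
import random

def run_episode(env_step, choose_action, actions, initial_state, steps=10):
    """Generic RL loop: observe state, act, receive reward, move to next state."""
    state, total_reward = initial_state, 0.0
    for _ in range(steps):
        action = choose_action(state, actions)        # AI framework picks an action
        next_state, reward = env_step(state, action)  # environment returns outcome
        total_reward += reward                        # positive or negative reward
        state = next_state
    return total_reward

# Toy environment: reward is higher when the action matches the state parity
toy_env = lambda s, a: ((s + 1) % 4, 1.0 if a == s % 2 else -1.0)
random_policy = lambda s, acts: random.choice(acts)
print(run_episode(toy_env, random_policy, actions=[0, 1], initial_state=0))
```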

The complete functionality of the proposed methodology is presented below in the form of pseudo code.

figure a

Proposed Methodology Pseudo Code

Time complexity of the proposed Methodology:

The total time complexity of the proposed methodology depends on the time complexities of its individual components, which are analyzed below:

  1. Collect requests and calculate task priorities: the time complexity of this component is O(n), where n is the number of requests received from the cloud interface application.

  2. Feed task priorities to the DQN model: the time complexity of this component depends on the implementation of the DQN model. In general, a single forward pass through a neural network with n layers is O(n), since each layer involves a matrix multiplication.

  3. Schedule tasks onto VMs: the time complexity of this component depends on the implementation of the scheduler and the DQN model. Scheduling tasks with a DQN model is typically costlier than traditional scheduling algorithms such as round robin, as it involves training a neural network on a large dataset. The cost of adding tasks to the execution queue and dispatching them onto VMs depends on the implementation of the execution queue and the VM manager.

  4. Keep track of upcoming requests and resources every time interval T: the time complexity of this component is O(1), since it only involves fetching the upcoming requests and virtual resources from the resource manager.

  5. Make dynamic decisions: the time complexity of this component depends on the number of decisions to be made and the cost of each decision. In general it is O(m), where m is the number of decisions to be made.

Overall, the time complexity of the algorithm can be approximated as O(n + f + s + m), where n is the number of requests received, f is the cost of feeding the task priorities to the DQN model, s is the cost of scheduling tasks onto VMs, and m is the cost of making dynamic decisions.

The reinforcement learning approach uses an agent to take decisions based on the inputs given to the system. A machine learning model of this kind consists of different states: the agent takes specific actions based on the input, which generate rewards that are either good or bad, and these rewards inform the decision taken by the algorithm in the next state. Reinforcement learning works on these rewards, with the agent trying to choose the next action based on the reward generated in the current state. The approach learns from previous states, whether the reward was good or bad, and refines its decisions over time. This adaptive nature, i.e. self-learning based on previous states, is one of the advantages of the reinforcement learning approach.

Figure 2 illustrates task scheduling using deep reinforcement learning: the agent learns from the history of incoming user request sequences, i.e. it is trained on the previous user requests or tasks arriving at the cloud console. A prioritized user request is given as input to the agent, which makes a scheduling decision based on the current situation in the cloud environment. The outcome of executing the task or user request (in our study, the resulting makespan, energy consumption and SLA violation) serves as the reward for the agent. If the reward is bad, the agent improves its decision by updating the parameters of the model; if the reward is good, it is stored in the current state and used the next time the agent makes a decision in a subsequent state.

Fig. 2
figure 2

Deep reinforcement learning technique for scheduling the tasks

Within reinforcement learning, we use Q-learning for our scheduling model [30]. Q-learning is a powerful technique because it needs no prior knowledge of the system: it makes decisions based on past actions stored in a Q-function over state-action pairs, denoted q(S, A). The Q-values are updated using eq. 21.

$$q\left(S^{t},A^{t}\right)\leftarrow q\left(S^{t},A^{t}\right)+\sigma \ast \left[re^{t}+\pounds \ast \max_{A} q\left(S^{t+1},A\right)-q\left(S^{t},A^{t}\right)\right]$$
(21)

Where σ is the learning rate, whose value lies in (0,1), re^t is the reward for taking action A^t in state S^t, and £ is the discount factor, whose value also lies in (0,1).

At every iteration, the Q-learning model checks the rewards and updates its decisions according to eq. 21. In classical Q-learning, all Q-values are stored in a Q-table, but applying this classical model to a problem such as scheduling in cloud computing makes adaptive and optimal decisions difficult, because the number of states and actions in the task scheduling problem is comparatively high. We therefore combine a deep neural network with reinforcement learning, which suits our scheduling problem, and use a Deep Reinforcement Learning model [39] to tackle scheduling in cloud computing. Deep Reinforcement Learning has already proven itself in various scheduling techniques in cloud computing, as described in [15, 26]. The main reason for using this scheduling model is to obtain a smart scheduler: no prior knowledge is given to the agent, and the algorithm must take a decision when real-time data is given as input. As in Q-learning, it consists of different states and is described by an action space and a state space.
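For reference, the tabular update of eq. 21 can be written as in the sketch below (σ is the learning rate and £ the discount factor); in the proposed approach the Q-table is replaced by a deep network, so this is only the classical rule that the DQN approximates.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions,
             learning_rate=0.1, discount=0.9):
    """Eq. 21: q(S,A) <- q(S,A) + sigma * [re + L * max_a q(S', a) - q(S,A)]."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + discount * best_next
    q[(state, action)] += learning_rate * (td_target - q[(state, action)])

# Illustrative usage: states and actions are plain integers, q is a table
q_table = defaultdict(float)
q_update(q_table, state=0, action=1, reward=-5.0, next_state=1, actions=[0, 1, 2])
print(q_table[(0, 1)])
```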

4.1 Action space

The action space is formed by the n VMs considered in this work. All incoming requests are first fed to the task manager; the priority of each task is then calculated and passed to the scheduler, whose DQN model takes the decision and sends tasks to the execution queue according to their priorities. Based on the scheduler's decision, tasks are then executed on the VMs in the order in which they enter the queue. Thus, the action space in our model is defined as follows.

$$A=\left[VM_{1},VM_{2},VM_{3},\dots, VM_{n}\right]$$
(22)

4.2 State space

This subsection defines the state space, which consists of the state of a task at a specific time and the state of the VM at the moment the task arrives.

Let us assume that a task t arrives at time T and is represented as Tt. The state of this task can then be represented as follows.

$$S_{Tt}=S_{t}\cup S_{Tt}^{VM}$$
(23)

Where St is the state of task t at time T and \(S_{Tt}^{VM}\) is the state of the VM onto which task t arrives at time T.

$$S_{Tt}=\left[t_{k}^{l},t^{prio},et_{t_k},ft^{t_k},m^{k},e_{VM_n}^{con}, SLAV\right]$$
(24)

Where \(t_{k}^{l}\) is the length of the k tasks, t^{prio} their priorities, \(et_{t_k}\) their execution times, \(ft^{t_k}\) their finish times, m^k their makespan, \(e_{VM_n}^{con}\) the energy consumption of the n VMs, and SLAV the SLA violation.
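As a small sketch, the state vector of eq. 24 can be assembled as below; the numeric values are placeholders, and any normalization of the features is not specified in the paper.

```python
def build_state(task_length, priority, exec_time, finish_time,
                makespan, energy, slav):
    """Eq. 24: state = [t_k^l, t^prio, et, ft, m^k, e^con, SLAV]."""
    return [task_length, priority, exec_time, finish_time, makespan, energy, slav]

state = build_state(task_length=8000, priority=4.0, exec_time=4.0,
                    finish_time=12.0, makespan=14.0, energy=13.4, slav=0.02)
```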

4.3 Reward function

The aim of this study is to find an optimal mapping between cloud resources and mixed tasks with the help of our DRLBTSA scheduler so as to optimize the significant QoS parameters: energy consumption, time and SLA violation. Our reward function is therefore defined in terms of minimizing the metrics considered in this work, as follows.

$$re=\min \left(m^{k},e_{VM_n}^{con}, SLAV\right)$$
(25)

4.4 Training the agent

When incoming tasks arrive at the scheduler, the DRLBTSA agent must make a decision in the current state by considering the priorities of the tasks and the resources available on the physical hosts, and map tasks to VMs accordingly. For this to happen, our DQN model is trained so that initially it maps a task to a VM under the above conditions with a probability ρ, whose value decays towards zero over time. The DQN agent therefore explores randomly at first and later gives its decisions based on the q-values previously stored in the q-table. To achieve this, experience replay and fixed q-target values are used in the algorithm. As the agent takes decisions over time it gains experience, which is represented here as experience replay. Whenever the agent takes a decision it receives a reward, represented as re^t, and the subsequent state is represented as S^(t+1). These experiences are stored in a memory called the replay memory, indicated as ω; the values stored in the replay memory are (A^t, S^t, re^t, S^(t+1)), its capacity is denoted Mω, and batches drawn from it are denoted Gω. In every iteration, called a batch, the values in the replay memory are updated. In our work, the entire training was done offline; once training is complete, the agent is intelligent enough to take decisions in a smart way. We used 50 neurons in the hidden layers of the DQN model, set the scheduling decision time of the agent to 10 ms, set the learning frequency to f = 1, and denote the agent's learning time as ε. The proposed DRLBTSA algorithm is shown below.

figure b

Proposed DRLBTSA: Deep Reinforcement Learning Based Task Scheduling Algorithm in Cloud Computing.

The flow of the above algorithm is as follows. First, all parameters such as batch size, replay memory, learning frequency, learning rate and discount factor are initialized, and the q-function over the state and action spaces is initialized to zero. Next, for each event, i.e. for every incoming task arriving at the cloud platform, the task priority is calculated. For every event, the scheduler must choose from the action space (the VMs) for the corresponding state space (the tasks) based on task priority and the availability of resources on the physical hosts, and make a decision accordingly. Tasks are then scheduled to the corresponding VMs according to their priorities: with a random probability if it is the first task being scheduled, and otherwise according to a decision drawn by DRLBTSA from the existing q-table available to the agent. Once an action is chosen for a state, a reward is generated; in our work it corresponds to the minimization of makespan, energy consumption and SLA violation, and the reward value is calculated using eq. 25. If the reward is positive, the reward score is improved; if it is negative, the scheduler must improve based on its experience. When positive rewards are encountered, makespan, energy consumption and SLA violation are updated as the minimized values of the current state. Rewards, whether positive or negative, are stored in the replay memory. After evaluating the reward, the agent updates its state to the next state using eq. 21. This process continues until the last state, i.e. the last task, is encountered (Fig. 3).
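The training procedure described above (ε-greedy exploration, replay memory ω with capacity Mω, mini-batches Gω and fixed Q-targets) follows the generic offline DQN loop sketched below. The sketch is not the DRLBTSA pseudo code: q_net, q_target, update_net, env_reset and env_step are assumed callables supplied by the caller, and details such as the 50 hidden neurons and the 10 ms decision interval live inside those components.

```python
import random
from collections import deque

def train_agent(env_reset, env_step, q_net, q_target, update_net, num_actions,
                episodes=50, batch_size=32, memory_cap=10_000,
                eps_start=1.0, eps_end=0.05, eps_decay=0.995,
                target_sync=100):
    """Offline DQN training loop with experience replay and a fixed target network.

    q_net(state) and q_target(state) return a list of Q-values (one per action);
    update_net(batch, q_net, q_target) performs one TD update step (eq. 21).
    All of these are assumptions supplied by the caller, not the paper's code."""
    memory = deque(maxlen=memory_cap)      # replay memory (omega) with capacity M
    eps, steps = eps_start, 0
    for _ in range(episodes):
        state, done = env_reset(), False
        while not done:
            # epsilon-greedy: explore with probability rho, which decays towards 0
            if random.random() < eps:
                action = random.randrange(num_actions)
            else:
                action = max(range(num_actions), key=lambda a: q_net(state)[a])
            next_state, reward, done = env_step(state, action)
            memory.append((state, action, reward, next_state, done))
            if len(memory) >= batch_size:
                batch = random.sample(memory, batch_size)   # mini-batch G
                update_net(batch, q_net, q_target)          # TD update via eq. 21
            if steps % target_sync == 0:
                q_target = q_net            # refresh fixed targets (sketch only;
                                            # in practice copy the network weights)
            state, steps = next_state, steps + 1
            eps = max(eps_end, eps * eps_decay)
    return q_net
```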

Fig. 3
figure 3

Flowchart of proposed approach

5 Simulation set up and experimental results

This section discusses the simulation setup and the results of our work. The entire simulation was conducted on the CloudSim [7] simulator. We used real-time parallel worklogs from HPC2N [13] and NASA [23], both from high performance computing clusters, as input to our algorithm. In addition, we fabricated several datasets of our own with different distributions, which are explained in detail below.

5.1 Configuration settings for simulation

Our simulations were carried out extensively on CloudSim [7]. We first fabricated datasets in which tasks follow different distributions and fed these workload distributions to the scheduler; to test the efficiency of our approach we then used the HPC2N [13] and NASA [23] worklogs from high performance computing clusters. The fabricated datasets follow uniform, normal, left-skewed and right-skewed distributions, represented as d1, d2, d3 and d4 respectively; the HPC2N and NASA worklogs are represented as d5 and d6. A uniform distribution means all task sizes are distributed equally. A normal distribution contains mostly medium-sized tasks with fewer small and large tasks. A left-skewed distribution contains more small tasks and fewer large tasks, while a right-skewed distribution contains fewer small tasks and more large tasks. We intentionally fabricated these distributions to verify how our algorithm behaves with different types of tasks, and we finally supplied the HPC2N and NASA parallel worklogs (d5 and d6) as input to check its efficiency on real-time workloads. After supplying the workloads, we evaluated DRLBTSA against the existing baseline algorithms RR, FCFS, Earliest Deadline First, RATS-HM and MOABCQ. The standard configuration settings for our simulation were taken from [20]. Table 3 lists the configuration settings used in the simulation, and Table 4 lists the parameter settings of the compared approaches and the proposed DRLBTSA.
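As an illustration of how the fabricated distributions d1-d4 could be generated, a hedged sketch is given below; the task-length range and the distribution parameters are assumptions, since the paper does not list the exact sampling parameters.

```python
import random

def fabricate_workload(kind, num_tasks=1000, low=1_000, high=100_000, seed=1):
    """Generate task lengths (in MI) with the distribution shapes used for d1-d4."""
    rng = random.Random(seed)
    mid = (low + high) / 2
    tasks = []
    for _ in range(num_tasks):
        if kind == "uniform":          # d1: all task sizes equally likely
            x = rng.uniform(low, high)
        elif kind == "normal":         # d2: mostly medium-sized tasks
            x = rng.gauss(mid, (high - low) / 8)
        elif kind == "left_skewed":    # d3: more small tasks, fewer large (paper's definition)
            x = low + (high - low) * rng.betavariate(2, 5)
        elif kind == "right_skewed":   # d4: fewer small tasks, more large (paper's definition)
            x = low + (high - low) * rng.betavariate(5, 2)
        else:
            raise ValueError(kind)
        tasks.append(int(min(max(x, low), high)))
    return tasks

d1 = fabricate_workload("uniform")
d3 = fabricate_workload("left_skewed")
```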

Table 3 Configuration Settings in Simulation
Table 4 Parameter Settings for various algorithms

5.2 Calculation of makespan

Makespan is evaluated using the configuration settings in Table 3, with different workloads given to the DRLBTSA scheduler. We supplied the d1, d2, d3, d4, d5 and d6 datasets and evaluated makespan on each of them, comparing against the existing baseline algorithms RR, FCFS and Earliest Deadline First. DRLBTSA was run for 100 iterations. Table 5 reports the resulting makespan values.

Table 5 Evaluation of makespan

Table 5 shows the makespan for the fabricated datasets d1, d2, d3 and d4 with different distributions and for the HPC2N [13] and NASA [23] workloads. It clearly shows that our DRLBTSA algorithm reduces makespan compared with the RR, FCFS, EDF, RATS-HM and MOABCQ algorithms.

Figure 4 and Table 5 show that the proposed DRLBTSA approach, evaluated against the RR, FCFS, EDF, RATS-HM and MOABCQ algorithms, reduces makespan compared with all of them.

Fig. 4
figure 4

Calculation of makespan using DRLBTSA

5.3 Calculation of energy consumption

Energy consumption is evaluated using the configuration settings in Table 3, with different workloads given to the DRLBTSA scheduler. We supplied the d1, d2, d3, d4, d5 and d6 datasets and evaluated energy consumption on each of them, comparing against the existing baseline algorithms RR, FCFS, EDF, RATS-HM and MOABCQ. DRLBTSA was run for 100 iterations. Table 6 reports the resulting energy consumption values.

Table 6 Evaluation of Energy Consumption

Table 6 shows the energy consumption for the fabricated datasets d1, d2, d3 and d4 with different distributions and for the HPC2N [13] and NASA [23] workloads. It clearly shows that our DRLBTSA algorithm reduces energy consumption compared with the RR, FCFS, EDF, RATS-HM and MOABCQ algorithms.

Figure 5 and Table 6 show that the proposed DRLBTSA approach, evaluated against the RR, FCFS, EDF, RATS-HM and MOABCQ algorithms, reduces energy consumption compared with all of them.

Fig. 5
figure 5

Calculation of Energy Consumption using DRLBTSA

5.4 Calculation of SLA violation

SLA violation is evaluated using the configuration settings in Table 3, with different workloads given to the DRLBTSA scheduler. We supplied the d1, d2, d3, d4, d5 and d6 datasets and evaluated SLA violation on each of them, comparing against the existing baseline algorithms RR, FCFS, EDF, RATS-HM and MOABCQ. DRLBTSA was run for 100 iterations. Table 7 reports the resulting SLA violation values.

Table 7 Evaluation of SLA Violation

Table 7 shows the SLA violation for the fabricated datasets d1, d2, d3 and d4 with different distributions and for the HPC2N [13] and NASA [23] workloads. It clearly shows that our DRLBTSA algorithm reduces SLA violation compared with the RR, FCFS, EDF, RATS-HM and MOABCQ algorithms.

Figure 6 and Table 7 show that the proposed DRLBTSA approach, evaluated against the RR, FCFS, EDF, RATS-HM and MOABCQ algorithms, reduces SLA violation compared with all of them.

Fig. 6
figure 6

Calculation of SLA Violation using DRLBTSA

5.5 Results discussion

In this section, we discuss the results for the evaluated parameters and their improvement over the existing algorithms, i.e. RR, FCFS, EDF, RATS-HM and MOABCQ. The improvement of DRLBTSA over the baseline algorithms for makespan, energy consumption and SLA violation is shown in Tables 8, 9 and 10.

Table 8 Improvement of makespan over existing algorithms
Table 9 Improvement of Energy Consumption over existing algorithms
Table 10 Improvement of SLA Violation over existing algorithms

Table 8 shows that the proposed DRLBTSA improves makespan over the compared algorithms under varying workloads.

Table 9 shows that the proposed DRLBTSA improves energy consumption over the compared algorithms under varying workloads.

Table 10 shows that the proposed DRLBTSA improves SLA violation over the compared algorithms under varying workloads. In Tables 8, 9 and 10, improvement means the minimization of the parameters considered in our work.

6 Conclusion and future work

Scheduling diverse workloads over the cloud paradigm is a challenging issue due to the dynamism and heterogeneity of cloud computing, and mapping tasks to the right VMs is very difficult. Many authors have proposed scheduling mechanisms to map tasks to VMs, but there is still scope for research on mapping tasks to appropriate VMs. Scheduling in the cloud model is a highly dynamic scenario, as miscellaneous workloads request resources in a multi-tenant environment and demands must be met according to the available processing capacities. To map every task onto a suitable VM, we proposed the DRLBTSA approach, which finds the optimal resources by considering task priorities while minimizing makespan, SLA violation and energy consumption. We used a machine learning model, a DQN, which is a variant of deep reinforcement learning, to solve the task scheduling problem. We carried out extensive simulations on CloudSim, using fabricated datasets with different distributions and real-time parallel worklogs from HPC2N and NASA as input, and evaluated our algorithm against the existing baseline algorithms FCFS, RR, EDF, RATS-HM and MOABCQ over 100 iterations. The results show that the proposed DRLBTSA outperforms the baseline algorithms for the above-mentioned parameters. In future work, we will test the efficacy of DRLBTSA by deploying it on OpenStack, generating real-time workloads in the OpenStack environment to evaluate our scheduler.