1 Introduction

The word serverless does not mean that there are no servers. Rather, the cloud provider relieves customers of the burden of maintaining specific servers and offers an ephemeral computing service that executes a piece of software on request, triggered by programs and events, billing the client only for the execution time. Serverless computing [1, 2] is a fresh and impressive paradigm for cloud applications [3,4,5,6,7], largely due to the recent shift of enterprise applications to containers and micro-services [8]. Cloud infrastructure is becoming crucial as it is increasingly used to deliver a variety of IT services. However, cloud providers reveal only qualitative network performance information [9].

Serverless computing is a newly emerging paradigm that usually refers to a software architecture in which the application is broken down into events (triggers) and functions (actions), and in which a platform provides a hosting and execution environment that is easy to develop for, manage, scale, and operate. Serverless architectures offer enormous potential for systems research [10].

Serverless computing abstracts server management and low-level infrastructure decisions away from developers. It provides a genuine pay-as-you-go service with no idle-resource charges and lowers the barrier to entry by entrusting the cloud provider with all operational complexity. Compared with other cloud computing models, serverless computing comes closest to the original expectation that the cloud be treated as a utility service. It has emerged as a new and impressive cloud application deployment paradigm, mainly because enterprise application architectures have recently moved to containers and micro-services.

From the cloud provider's point of view, serverless technology provides an important incentive to control the entire development stack, lowers operating costs through efficient use and management of server resources, offers a mechanism that facilitates additional services in the provider's environment, and decreases the effort needed to create and maintain web-based applications.

The reader may ask how this differs from Platform-as-a-Service (PaaS), which also removes server management. A serverless system imposes a stripped-down, stateless programming model. In contrast to PaaS, programmers can write arbitrary code rather than merely configure a prepackaged program. Function-as-a-Service (FaaS) is a variant of serverless that explicitly uses functions as the unit of deployment.

In this study, we consider the characteristics of serverless computing on cloud platforms. However, these characteristics come with the following limitations, which need to be addressed for effective computing.

  • Use is typically metered, so clients pay only for the time and resources consumed by their serverless operations. This fine-grained billing is one of the key differentiators of a serverless system. The metered resources, such as memory or Central Processing Unit (CPU) time, and the pricing, such as off-peak discounts, tend to differ between providers.

  • A variety of limits are set on the runtime resources of a serverless application, including the number of concurrent requests and the total storage and CPU resources available to an invocation. Some limits, such as the concurrent request threshold, can be raised on user demand, while others, such as the maximum memory size, are inherent to the platform.
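As a concrete illustration of the metered-use model described above, the sketch below computes the charge for a batch of invocations from memory size and duration. The per-GB-second and per-request prices, and the function name itself, are illustrative assumptions, not any particular provider's tariff.

```python
def invocation_cost(memory_gb, duration_s, requests,
                    price_per_gb_s=0.0000166667,   # assumed $ per GB-second
                    price_per_request=0.0000002):  # assumed $ per request
    """Metered serverless billing sketch: pay only for memory x time
    actually consumed, plus a small flat fee per request."""
    per_call = memory_gb * duration_s * price_per_gb_s + price_per_request
    return requests * per_call
```

Under these assumed prices, one million invocations of a 0.5 GB function running 2 s each would cost about $16.87, which makes the pay-per-use differentiator concrete.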

Even though serverless runtimes are limited to applications with lightweight data and storage requirements, the proposed system performs workload prediction and allocation using a machine learning algorithm, improving serverless applications beyond stock cloud runtimes.

In this paper, we propose a machine learning model to parallelize the jobs allocated to the event queue and the dispatcher of the serverless framework. We use the Grey Wolf Optimization (GWO) model to improve the task allocation process. GWO [11,12,13,14,15] is combined with Reinforcement Learning (RIL), which tunes the GWO parameters and thereby helps in the proper allocation of tasks. The study concentrates mainly on intra-datacenter network paths rather than inter-datacenter ones.

The outline of the paper is as follows: Sect. 2 discusses the related works, Sect. 3 presents the preliminaries, and Sect. 4 the proposed method. Section 5 evaluates the entire work and Sect. 6 concludes with possible directions for future work.

2 Related Works

Various studies have been offered to improve serverless computing; however, very few of them focus on task allocation, and most of the other works address architecture design issues.

Li [16] describes MutualCast, a multi-party real-time audio-conferencing system built on serverless peer-to-peer (P2P) technology. The peers in MutualCast form a fully connected clique. Each peer takes turns mixing and distributing compressed audio during the conference session. The audio is split into frames, and the number of frames that a given peer mixes and produces is proportional to that peer's available resources, such as bandwidth or computing power. By splitting the audio into frames, MutualCast balances the service load required for mixing across all participants. It thus allows multi-party conferencing without a powerful server.

Alqaryouti and Siyam [17] use a hybrid approach that incorporates both FaaS and Infrastructure-as-a-Service (IaaS). Through FaaS, small tasks can be executed remotely so that the planner concentrates on big tasks only. This helps to increase resource utilization [7, 18,19,20,21,22] because minor activities need not be taken into account during the planning process. The serverless architecture thus acts as a time-limited extension of the cloud provider that relieves the rest of the system of the scheduling question.

Pinto et al. [23] established a method that analyzes awareness of the serverless environment and decides on a solution that improves the running time while simultaneously exploring different options that can yield better results. In addition, the technology they built can detect a serverless application remotely and resolve problems by running the task locally, successfully answering the query.

Denninnart et al. [24] optimize the robustness of a serverless computing system, especially when it is oversubscribed. Their approach is to build a task-dropping mechanism that can be implemented without altering current task-mapping heuristics. Dropping tasks that have a low probability of meeting their deadlines increases the probability that other tasks meet theirs, thus increasing processing capacity and overall Quality of Service (QoS).

Lloyd et al. [25] identified four states of serverless computing infrastructure (provider, Virtual Machine (VM), container, and warm/cold) and demonstrated how the performance of microservices depends on these states.

Jonas et al. [26] predicted that the open issues of serverless computing are solvable and that the paradigm will dominate the future of computing. Fox et al. [27] examine the issues of serverless computing, and Pérez et al. [28] address similar issues with a methodology for creating serverless containers. Their method supports highly dynamic, parallel, event-driven serverless applications running on existing runtime environments and helps to reduce the cost of massive image processing. Van Eyk et al. [29] provide a fundamental understanding of serverless computing, and its use is studied by Feng et al. [30] in runtimes where data parallelism is leveraged across large models. With similar motivation, a low-cost serverless computing model is demonstrated by Kumanov et al. [31] and offers a lower cost than Amazon Web Services.

In edge-computing applications, Nastic et al. [32] implement a novel approach to cloud-based, real-time data analytics. Their data analytics on a serverless edge platform is demonstrated on a real-life healthcare use case.

Cicconetti et al. [33] developed a framework that executes lambda functions on the serverless model. It supports distributed algorithms that adapt dynamically to the executed requests in order to optimize performance. Finally, Wurster et al. [34] developed an event-driven model that employs a standard lifecycle for provisioning and managing serverless applications (Table 1).

Table 1 Summary of various serverless frameworks

Most of these methods deploy serverless computing on a distributed platform, but most of them fall short of implementing a scheduling strategy that adopts a heuristic approach to improve task scheduling in the serverless setting.

3 Preliminaries

Numerous misconceptions around serverless start with the name. Servers are still important, but developers do not have to contend with maintaining them. The serverless system takes charge of decisions such as the number of servers and their capacities, and automatically supplies server resources when the workload requires them. This provides an abstraction in which computation, in the form of a stateless function, is disconnected from where it takes place.

A core processing system, as shown in Fig. 1, is the central functionality of a serverless platform. The system manages a set of user-defined functions; takes an event sent via Hypertext Transfer Protocol (HTTP) or retrieved from an event source; determines the function(s) to which the event should be dispatched; locates an existing instance of the function or creates a new one; and sends the event to the function instance.
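The dispatch loop just described can be sketched as follows. This is a minimal illustration of the warm/cold instance logic, not a real platform API; the names `Dispatcher`, `warm_pool`, and `cold_start` are assumptions for the sketch.

```python
class Dispatcher:
    """Sketch of the core loop of Fig. 1: route an incoming event to a
    warm instance of its function, or cold-start a new instance."""

    def __init__(self, functions):
        self.functions = functions   # function name -> handler callable
        self.warm_pool = {}          # function name -> list of idle instances

    def cold_start(self, name):
        # creating a fresh instance is the expensive "cold" path
        return {"fn": self.functions[name], "invocations": 0}

    def dispatch(self, event):
        name = event["function"]
        pool = self.warm_pool.setdefault(name, [])
        inst = pool.pop() if pool else self.cold_start(name)  # reuse if warm
        inst["invocations"] += 1
        result = inst["fn"](event.get("payload"))
        pool.append(inst)            # keep the instance warm for reuse
        return result
```

A second event for the same function reuses the warm instance instead of paying the cold-start cost, which is exactly the behavior the scheduler must exploit.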

Fig. 1

Serverless platform architecture

The challenge is to implement these functions while considering factors such as cost, scalability, and fault tolerance. A task must be started, and its data handled, quickly and efficiently. The system must also schedule activities, handle stopping instances, and allocate resources to idle running instances based on the state of the queues and the arrival time of the activity. It must further analyze carefully how failures in a cloud environment can be assessed and controlled.

The problem formulation for the present study is considered under two assumptions: (1) prior knowledge of the computation time of each task and (2) similar overheads prior to the scheduling of tasks.

Task Constraint is modeled as:

$$\begin{aligned} \gamma_{i} & = \frac{{c_{i} }}{{p_{i} }},\quad i = 1,2, \ldots ,N \\ r_{ij} (\forall i) & = \left\{ {\begin{array}{*{20}l} 0 \hfill & {j = 1} \hfill \\ {d_{i,j - 1} } \hfill & {j = 2,3, \ldots ,n_{i} } \hfill \\ \end{array} } \right. \\ d_{ij} & = r_{ij} + p_{ij} ,\quad i = 1,2, \ldots ,N,\;j = 1,2, \ldots ,n_{i} \\ s_{ij} & < s_{{i^{\prime}j^{\prime}}} < f_{ij} , \\ \text{if}\; & priority\left( {\tau_{{i^{\prime}}} } \right) > priority\left( {\tau_{i} } \right)\;\text{and}\;r_{{i^{\prime}j^{\prime}}} \ge r_{ij} \quad \forall i,j,i^{\prime},j^{\prime} \\ \end{aligned}$$

where \(i\) is the task index; \(j\) the index of the jth execution of a task; \(N\) the total number of tasks; \(T\) the scheduled time; \(n_i\) the total number of executions of task \(\tau_i\); \(c_i\) the computation time of \(\tau_i\); \(p_i\) the period of \(\tau_i\); \(r_{ij}\) the jth release time of task \(\tau_i\); \(s_{ij}\) the jth start time of \(\tau_i\); and \(f_{ij}\) the jth finish time of \(\tau_i\).
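The release and deadline recurrences above (\(r_{i1} = 0\), \(r_{ij} = d_{i,j-1}\), \(d_{ij} = r_{ij} + p_{ij}\)) can be computed directly. The sketch below is a straightforward transcription of those formulas; the function name is an assumption.

```python
def schedule_times(p):
    """Compute release times r[i][j] and deadlines d[i][j] from the
    processing times p[i][j] of the j-th execution of task i, following
    the paper's task model: each execution is released when the
    previous one's deadline passes."""
    r, d = [], []
    for p_i in p:
        r_i, d_i, prev_deadline = [], [], 0.0
        for p_ij in p_i:
            r_i.append(prev_deadline)        # r_ij = d_{i,j-1} (0 for j = 1)
            prev_deadline += p_ij            # d_ij = r_ij + p_ij
            d_i.append(prev_deadline)
        r.append(r_i)
        d.append(d_i)
    return r, d
```

For example, a task with two executions of 2 and 3 time units is released at 0 and 2 with deadlines 2 and 5.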

4 Gray Wolf Optimization (GWO) Based Reinforcement Learning (RIL)

The suggested technique is based on connecting the GWO [35] and RIL [36] algorithms. These two algorithms are used, as an approximation scheme, to determine a load balance that trades off time and cost between resources and output. With such hybridization, the process is accelerated, local optimization is improved, and precision increases [37]. The primary solutions to the stated problem are first obtained using the GWO and RIL algorithms.

In the current system, we consider a server cluster consisting of \(M\) servers offering \(D\) services with respect to the distribution of cloud resources and the energy management process. A server normally runs in either active mode or sleep mode, the latter to conserve energy. We refer to \(M\) as the physical server array and \(D\) as the resource collection.

Figure 2 shows the centralized distribution of cloud services and power management, which includes a global level and a local level. In the suggested centralized system, a job broker managed at the global level dispatches each job to one of the servers at its arrival time.

Fig. 2

Resource allocation framework in serverless computing

The server serves the dispatched jobs on a First Come First Serve (FCFS) basis and allocates resources to them. If a server does not have enough resources to perform a task, the task waits until enough resources become available. On the other side, the local level controls and switches every server on or off. Two significant influences on the overall power usage and efficiency of the server cluster are the job dispatch timing and the local power control.
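The FCFS behavior just described, where a job waits until enough of the server's capacity is free, can be sketched as a small simulation. This is an illustrative model, not the paper's implementation; the job tuple layout and function name are assumptions.

```python
import heapq

def fcfs_finish_times(jobs, capacity):
    """FCFS sketch: each job is a (arrival, duration, demand) tuple,
    sorted by arrival. A job starts only when 'demand' units of the
    server's capacity are free, as in the paper's local-level policy."""
    free = capacity
    running = []                       # min-heap of (finish_time, demand)
    finish = []
    for arrival, duration, demand in jobs:
        t = arrival
        # wait for earlier jobs to release enough resources
        while free < demand:
            done_at, freed = heapq.heappop(running)
            t = max(t, done_at)
            free += freed
        free -= demand
        heapq.heappush(running, (t + duration, demand))
        finish.append(t + duration)
    return finish
```

With capacity 2, two jobs each demanding 2 units and arriving together are serialized: the second starts only when the first finishes, which is the latency effect the broker must anticipate when it places jobs.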

The job broker must prevent overloading servers in order to decrease task latency. A scheduling system must therefore be built to dynamically allocate jobs to the servers and to distribute the resources within every server.

If a job is assigned to a server in sleep mode, it takes \(T_{on}\) to switch the server to active mode. Similarly, switching the server back to sleep mode, which is decided in a distributed manner by the local power manager, takes \(T_{off}\). We assume that in sleep mode the power consumption is zero, and in active mode it depends on CPU utilization [38],

$$P\left( {x_{t} } \right) = P\left( {0\% } \right) + \left( {P\left( {100\% } \right) - P\left( {0\% } \right)} \right)\left( {2x_{t} - x_{t}^{1.4} } \right)$$

where \(x_{t}\) denotes the CPU utilization of the server at time \(t\); \(P\left( {0\% } \right)\) denotes the power consumption of the server in idle mode; and \(P\left( {100\% } \right)\) denotes the power consumption of the server at full load.
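The utilization-based power model above is easy to check numerically. The sketch below implements it directly; the idle and full-load wattages are illustrative assumptions, not measured values from the paper.

```python
def server_power(x_t, p_idle=100.0, p_full=250.0):
    """Active-mode power model P(x_t) = P(0%) + (P(100%) - P(0%)) *
    (2*x_t - x_t**1.4), with CPU utilization x_t in [0, 1].
    p_idle = P(0%) and p_full = P(100%) are assumed values in watts."""
    return p_idle + (p_full - p_idle) * (2.0 * x_t - x_t ** 1.4)
```

By construction the model interpolates between the idle and full-load endpoints: \(P(0) = P(0\%)\) and \(P(1) = P(100\%)\), with a slightly super-linear rise in between that rewards consolidating load onto fewer active servers.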

The server power consumption during the transition from the sleep state to the active state, shown in Fig. 3, is considered to be higher than \(P\left( {0\% } \right)\) [39].

Fig. 3

Power management at server

The Dynamic Power Management (DPM) framework of local servers relies heavily on confident workload prediction and a properly designed power manager. In this paper, we want to perform a more accurate (time-series) prediction with continuous values, and thus the GWO becomes a good candidate for the workload predictor. With an accurate prediction and current information on the system under management, the power manager has to derive the most appropriate actions (timeout values) to simultaneously reduce the power consumption of the server and the job latency, and the model-free RIL technique [40] serves as a good candidate for the adaptive power management algorithm.

4.1 Gray Wolf Optimization (GWO)

The optimization problem is defined over a decentralized cloud-based network with resource structures S1, S2, S3,…,Sn. The resources are available to various nodes in the distributed network. Various tasks are sent through the nodes to the root networks. The scheduler is responsible for assigning one or more jobs in the distributed system. The developers provide a timetable for the distribution of resources [39]. Many jobs in the distributed system are allocated and executed concurrently in time t.

The total number of variables Tk is regarded as the combination of the resources and the jobs; this count is referred to as P and given by P = nm, where n is the total number of tasks and m the total number of resources.

Each node is assumed to carry several jobs j1, j2,…, jn. A collection of distinct resources R1, R2,…, Rm is needed for each job. If the resources allocated to a job are required with 1% of processing power, the technical model can be described as deciding which jobs use which resources so as to achieve the best average response time, maximum load efficiency, and minimal cost. For an exact solution, all potential assignment modes would have to be enumerated and the best one selected. The problem is an instance of the set packing problem, which is NP-complete, owing to the exponential number of assignment modes.

The optimization function can be defined in terms of a resource i allocated to a specific job j, with yi representing the total number of allocated resources, xij stating whether resource i is assigned to job j, K the maximum capacity of each resource, and wi the total number of jobs that resource i covers.

4.1.1 Fitness Function

The objective function for the required job allocation task is stated below:

$$\begin{aligned} & \hbox{min} B = a\left[ {1 - L_{{\left( {y_{j} } \right)}} } \right] + bC_{{\left( {y_{j} } \right)}} + cT_{{\left( {y_{j} } \right)}} \\ & \sum\limits_{i = 1}^{n} {w_{i} x_{ij} \le Ky_{j} ,} \forall j \\ & \sum\limits_{j = 1}^{n} {x_{ij} \le b_{j} ,} \forall j \\ & x_{ij} ,y_{i} = 0,1\,\,\,\,\forall i,j \\ & x_{j} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {job\; {\mkern 1mu} j\; {\mkern 1mu} is\; {\mkern 1mu} used} \hfill \\ 0 \hfill & {job\; {\mkern 1mu} j\; {\mkern 1mu} is\; {\mkern 1mu} not\; {\mkern 1mu} used} \hfill \\ \end{array} } \right. \\ & y_{j} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {resource\; {\mkern 1mu} j\; {\mkern 1mu} is\; {\mkern 1mu} used} \hfill \\ 0 \hfill & {resource\; {\mkern 1mu} j\; {\mkern 1mu} is\; {\mkern 1mu} not\; {\mkern 1mu} used} \hfill \\ \end{array} } \right. \\ \end{aligned}$$

The aim is to determine the minimum number of machines yj that minimizes the stated objective function. The values of load (L), cost (C), and time (T) are taken with respect to the resources yj. Here a, b, c represent weighting variables of the cloud system. The variable xij indicates whether job i is placed on machine j, where 0 means no resource is allocated and 1 means enough resource is allocated. wi is the capacity demand of a job and bj is the capacity offered by the virtual resources.
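The weighted objective and the capacity constraint above can be written down directly. The sketch below evaluates \(B = a[1 - L] + bC + cT\) for a candidate allocation and checks the constraint \(\sum_i w_i x_{ij} \le K y_j\); the weight values are illustrative assumptions.

```python
def objective_b(load, cost, time, a=0.4, b=0.3, c=0.3):
    """Fitness B = a*(1 - L) + b*C + c*T from the paper's formulation.
    The weights a, b, c are assumed example values, not tuned ones."""
    return a * (1.0 - load) + b * cost + c * time

def feasible(x, w, K, y):
    """Capacity constraint: sum_i w_i * x[i][j] <= K * y[j] for every
    resource j, where x[i][j] in {0, 1} marks job i using resource j."""
    n, m = len(w), len(y)
    return all(sum(w[i] * x[i][j] for i in range(n)) <= K * y[j]
               for j in range(m))
```

A scheduler would evaluate `objective_b` only on allocations for which `feasible` returns True, and pick the allocation with the smallest B.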

4.2 Optimization Algorithm for Scheduling

Given its stronger convergence toward the global optimum, the GWO algorithm is used as the base algorithm of the proposed method.

In the initial state, an original population with a uniform distribution is created from random numbers, forming basic solutions to the problem. Variables such as \(a, b, c\) are initialized; each candidate solution is called a wolf, and every wolf solves the problem. The wolves are divided into three groups: α, β, and γ. One of them provides the best solution under the fitness evaluation, depending on the task.

The approach then enters the main loop, where the best-fitness solution is found after several iterations. Each wolf's location is updated according to the formulas of the GWO algorithm, and the new positions are built around the first-class wolves. Further factors are then considered for the feasibility of a solution. Based on this, the β and γ group values, the current wolf locations, and their identification can be obtained.

Furthermore, the fitness function is recalculated to divide the wolves into three new groups. If no acceptable solution is found in the new classification, the algorithm must iterate further. The best option among the wolves is taken as the first population for the GWO algorithm, and the problem is then solved with the RIL algorithm.


Set the cloudlets and VMs in serverless computing. Initialize the parameter value \(a\) and \(t_{max}\) and generate an initial schedule. Estimate the objective function on all VMs and then group the VMs by fitness value as α, β, and γ VMs. The α VMs have the least fitness value and the γ VMs the maximum; VMs with values worse than γ are considered ω. At each iteration, update the \(a\) and \(t_{max}\) values for better scheduling. The value of \(a\) decreases at each iteration from 2 toward 0. As \(a\) decreases, the α VMs tend to attain the best value and are then chosen for the best allocation of tasks.
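One iteration of the position update described above can be sketched with the standard GWO encircling equations, using the paper's α/β/γ naming for the three leading wolves (the common GWO literature calls the third leader δ). This is a generic sketch over continuous positions, not the paper's VM-specific implementation.

```python
import random

def gwo_step(wolves, fitness, a):
    """One GWO iteration: rank wolves by fitness (lower is better),
    take the three best as alpha, beta, gamma, and move every wolf
    toward the average of its attraction to the three leaders.
    'a' decreases from 2 toward 0 over iterations, shrinking the
    exploration coefficient A = 2*a*r1 - a."""
    ranked = sorted(wolves, key=fitness)
    leaders = ranked[:3]                      # alpha, beta, gamma
    new_wolves = []
    for X in wolves:
        pos = []
        for d in range(len(X)):
            pulls = []
            for leader in leaders:
                r1, r2 = random.random(), random.random()
                A = 2.0 * a * r1 - a          # exploration/exploitation
                C = 2.0 * r2
                D = abs(C * leader[d] - X[d]) # distance to the leader
                pulls.append(leader[d] - A * D)
            pos.append(sum(pulls) / 3.0)
        new_wolves.append(pos)
    return new_wolves
```

Running this step repeatedly while shrinking \(a\) from 2 to 0 drives the pack toward the best-found solution, mirroring how the α VMs attain the best value as \(a\) decreases.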

4.2.1 Reinforcement Learning

As Fig. 4 shows, the modeling of the relation between a general agent and its environment (in both classic and current RIL) consists of a state set \(S\), a set of available actions \(A\), and a reward function \(S \times A \to R\). The decision-maker is named the agent. The interaction process is conditioned on how the agent operates. The agent communicates with everything outside itself, which is referred to as the environment.

Fig. 4

Agent-environment interaction system

The relationship between the agent and the world is a continuous process. The agent makes a decision \(a_k\) based on the present state \(s_k\) of the environment at every decision epoch \(k\). Once the decision is taken, the environment accepts it, makes the appropriate changes, and provides the agent with a new updated state \(s_{k+1}\) for future decisions. According to the decision \(a_k\), the environment also gives the agent a reward \(r_k\), and the agent tries over time to maximize the cumulative reward. For the agent to learn its optimal actions and policy, a clear feedback system is necessary.

With Q-learning [41], the agent aims at maximizing the value function \(Q\left( {s,a} \right)\), the reward it expects to accrue (with discounting) when the system starts in state \(s\) and takes action \(a\) (following a certain policy thereafter). It is the representative RIL algorithm used here. \(Q\left( {s,a} \right)\) for continuous-time systems is defined as:

$$Q\left( {s,a} \right) = E\left[ {\int\limits_{{t_{0} }}^{\infty } {e^{{ - \beta \left( {t - t_{0} } \right)}} } r\left( t \right)dt\left| {s_{0} = s,a_{0} = a} \right.} \right]$$
(1)

where \(r\left( t \right)\) is the reward rate function and \(\beta\) is the discount rate.

Q-learning is an online adaptive RIL technique that works on an event-driven basis over time [42], reducing the overheads related to the periodic value updates of discrete-time RIL [43].

The value function \(Q\left( {s,a} \right)\) is defined in continuous time in Eq. (1), and Eq. (2) gives its Q-learning update for the SMDP. The RIL agent takes action \(a_{k}\) with a greedy policy [44] at every decision epoch, analogous to discrete-time RIL strategies. The updating rule at the following decision epoch \(t_{k + 1}\), caused by a state transition, is:

$$Q^{{\left( {k + 1} \right)}} \left( {s_{k} ,a_{k} } \right) \leftarrow Q^{\left( k \right)} \left( {s_{k} ,a_{k} } \right) + \alpha \cdot \left( \begin{aligned} &\frac{{1 - e^{{ - \beta \tau_{k} }} }}{\beta }r\left( {s_{k} ,a_{k} } \right) + \hfill \\ &\mathop {\hbox{max} }\limits_{{a^{\prime}}} e^{{ - \beta \tau_{k} }} Q^{\left( k \right)} \left( {s_{k + 1} ,a^{\prime}} \right) \hfill \\ &- Q^{\left( k \right)} \left( {s_{k} ,a_{k} } \right) \hfill \\ \end{aligned} \right)$$
(2)

where \(Q^{\left( k \right)} \left( {s_{k} ,a_{k} } \right)\) is the estimated decision value at iteration \(t_k\); \(r\left( {s_{k} ,a_{k} } \right)\) is the reward function; \(\tau_k\) is the sojourn time, during which the RIL lies in state \(s_k\) before the transition occurs; \(\alpha \le 1\) is the learning rate; and \(\beta\) is the discount rate.
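The SMDP update of Eq. (2) translates almost line for line into code. The sketch below performs a single update on a tabular Q function; the dictionary representation and function name are assumptions for illustration.

```python
import math

def q_update(Q, s, a, reward_rate, s_next, actions, tau,
             alpha=0.1, beta=0.01):
    """One SMDP Q-learning step per Eq. (2): the reward rate is
    integrated with discounting over the sojourn time tau, giving
    (1 - e^{-beta*tau})/beta * r, then a temporal-difference update
    toward the discounted best next-state value."""
    disc = math.exp(-beta * tau)                      # e^{-beta * tau}
    accrued = (1.0 - disc) / beta * reward_rate       # discounted reward
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (accrued + disc * best_next - old)
    return Q[(s, a)]
```

As \(\tau \to 0\) the accrued term vanishes and as \(\beta \to 0\) it approaches \(\tau \cdot r\), so the rule degrades gracefully to the familiar discrete-time update.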

This section provides a more general RIL method than previous work, which can be used for resource allocation and other problems. The RIL technique consists of an offline GWO phase and an online Q-learning phase [45]. GWO is used during the offline process to evaluate the relationship between a state-action pair \(\left( {s,a} \right)\) and its value function \(Q\left( {s,a} \right)\). Adequate \(Q\left( {s,a} \right)\) value estimates and corresponding \(\left( {s,a} \right)\) samples must be collected in the offline construction phase to build a sufficiently precise RIL model from measurement data [46].

This procedure includes preprocessing the replay profiles, the state transition profile, and the estimated \(Q\left( {s,a} \right)\) values for the target applications [47]. The study uses real-world job arrival traces for the cloud resource allocation request to obtain adequate system transition profiles and \(Q\left( {s,a} \right)\) estimates, whose cost can be a composite of latency, power consumption, and durability metrics, for constructing the GWO.

This process can follow an arbitrary policy that is gradually refined. The transition profiles are placed in a memory \(D\) with capacity \(N_{D}\). The use of this memory promotes training and prevents parameter divergence. The GWO is trained on the stored state transition profiles and the \(Q\left( {s,a} \right)\) value estimates [48].

The scheduling of each task is estimated as \(O\left( {mn} \right)\), and hence it takes \(O\left( {m^{2} n} \right)\) to estimate the task scheduling in each iteration.

5 Results and Discussions

In this section, experimental verification is carried out by writing the application in Java using the CloudSim 3.0.3 library with 1000 queries per second. CloudSim simulates the cloud and helps to perform the task allocation to the VMs based on a first come first serve policy.

The performance is evaluated in terms of different metrics: cost, latency, and throughput. Cost is measured in $ per \(10^{6}\) queries, latency in milliseconds, and throughput in queries per second. In addition, the necessary CloudSim setup parameters [49] are provided in the following sections. The algorithm is iterated over 1000 epochs; the results show that the proposed RIL-GWO model offers higher throughput and lower latency and cost than the other existing methods, as is evident from Figs. 5, 6, 7, 8 and 9. Table 2 provides the CloudSim setup parameters.

Fig. 5

Throughput

Fig. 6

Latency with lesser loads

Fig. 7

Cost with lesser loads

Fig. 8

Time versus larger file size

Fig. 9

Time versus smaller file size

Table 2 Cloudsim setup parameters

Throughput may be stated as the rate at which data is processed per unit time; in this analysis we take the number of queries processed per second (QPS). The analysis proves the efficacy of RIL-GWO in terms of throughput under varying load conditions, represented in Fig. 5. It is observed that, for load value 5, RIL-GWO delivers a throughput of 9.2 QPS (Queries/sec), while RIL and GWO deliver 6.25 QPS and 7 QPS, respectively. For load value 10, RIL and GWO provide 12.59 QPS and 13 QPS respectively, whereas RIL-GWO gives 14.56 QPS. Similarly, for loads 20 and 30, RIL-GWO provides throughputs of 25.54 QPS and 35.54 QPS respectively, which are higher than those of RIL and GWO.

Latency in networking is the delay incurred in processing the data; it is measured in milliseconds (ms). Latency and throughput are inversely related: the higher the delay in processing the data, the lower the throughput rate, and vice versa. Figure 6 represents the latency incurred for varying loads. For load 5, RIL-GWO incurs a latency of 678 ms, whereas RIL and GWO suffer 701 ms and 785 ms, respectively. Likewise, for load values 10, 20 and 30, RIL-GWO incurs latencies of 722 ms, 799 ms and 817 ms respectively. Of the three schemes, RIL suffers the highest latency: for load values 10, 20 and 30 it incurs 853 ms, 907 ms and 947 ms respectively.

The cost, which can be a composite of latency, power consumption, and durability metrics, is measured in $ per \(10^{6}\) queries. Figure 7 analyzes the cost under varying load conditions. For load value 5, RIL-GWO imposes a cost of 1.34 $ per \(10^{6}\) queries, whereas RIL and GWO cost more, at 1.58 $ and 1.46 $ respectively. Correspondingly, for load values 10, 20 and 30, RIL-GWO requires 1.57 $, 1.79 $ and 1.83 $ per \(10^{6}\) queries, respectively. Abnormally, at load value 30, RIL generates a cost of 2.36 $ per \(10^{6}\) queries, due to the uncertain latencies and throughput conditions experienced by the server.

The size of the file is important, as it correlates with latency and throughput. The graphical representation in Fig. 8 illustrates the elapsed time to process different file sizes. For a 1 MB file, RIL-GWO takes 100 ms, GWO 700 ms, and RIL 1100 ms. For the maximum file size of 500 MB, RIL-GWO takes 2493 ms, GWO 2600 ms, and RIL 2746 ms. The processing time thus grows with the file size, though far less than proportionally.

Similarly, for lower file sizes, the processing time versus file size observations are represented in Fig. 9. For a 1 kb file, RIL-GWO takes 90 ms, whereas RIL and GWO take 120 ms and 99 ms respectively. For a 10,000 kb file, RIL-GWO takes 250 ms, whereas RIL and GWO take 295 ms and 259 ms respectively.

Table 3 shows a marginal difference between the computational complexity of the proposed and existing methods. The proposed method is compared against the RIL and GWO algorithms, from which it differs only marginally. On the other hand, it is seen that the computational complexity increases with the file size.

Table 3 Computational complexity

6 Conclusions

In this paper, we discuss the use of serverless runtimes, show the complexities and drawbacks of serverless systems that stem from their closely linked design, and suggest changes to underlying runtime implementations to minimize these problems. We show that serverless runtimes can gain significant advantages from optimization using GWO-RIL learning models. The proposed hierarchical framework comprises resource allocation to the servers. Besides the enhanced scalability and reduced state/action space dimensions, the proposed framework enables the local power management of the servers to be performed in a distributed manner, which enhances the degree of parallelism and reduces the computational complexity. Experimental results show that the proposed GWO-RIL framework reduces energy consumption significantly over the baseline while achieving a similar average latency. The suggested model can in effect achieve the best compromise between latency and power or energy consumption in a server cluster. The findings are promising, and our analysis of the current architecture offers several opportunities for further development and research. In future, task allocation [50, 51] to distributed deep learning can be devised for better network traffic analysis. Distributed computing can be utilized for the allocation of tasks in cloud-based applications. Further, the task allocation strategies can be applied effectively to schedule tasks in edge-based clouds, task offloading, etc.