1 Introduction

Cloud computing is a computing model that provides users with a virtualized pool of resources in a distributed environment and supports a pay-as-you-go model for resource utilization [1]. Cloud service providers (CSPs) aim to serve requests online according to the type of cloud environment (public, private, hybrid or community) and the services offered. In the early cloud era, IaaS, PaaS and SaaS were the three elastic and scalable services available in cloud computing. These are Internet-based rented services made available by CSPs and assured on subscription through service-level agreements (SLAs) [2]. Clients do not need to own the hardware or software; instead, they can use them online over the Internet.

As computing advances, cloud computing is maturing rapidly and can now offer Everything as a Service (EaaS). At the same time, the amount of data involved grows day by day and must be managed efficiently for effective and sustainable operation in a distributed cloud scenario [3]. Ensuring reliable and secure data handling in the cloud is one of the major responsibilities of the data handler. A large amount of data in the cloud increases the velocity of demands, which raises a new challenge of load balancing in cloud computing that targets effective scheduling and resource monitoring [4–6].

This paper motivates the need for load balancing in cloud computing and reviews supporting strategies that focus on efficient resource utilization.

1.1 Load Balancing in Cloud Computing

Load balancing is an approach to scheduling jobs uniformly among the available computing nodes. Its main objective is to monitor resources and utilize them effectively under broad network access [7]. A load balancing algorithm is considered effective when it is fault-tolerant and scalable and aims to maximize throughput. Numerous algorithms are available; they are categorized as static or dynamic and operate in different environments. Static algorithms run on predefined factors and states, whereas dynamic algorithms operate at run time and balance the traffic on servers according to the current state. The two types of load balancing algorithms are as follows [6, 8].

A. Static Load Balancing

Static load balancing techniques are non-preemptive. They follow strict predefined rules based on the input and do not depend on the current state of the machine to manage the workload. They require prior knowledge of the system setup and resource availability. Static load balancing is also known as policy-driven load balancing and is driven by parameters such as server capacity, throughput rate, fault tolerance and response time [9]. Examples of static load balancing techniques are artificial bee colony search, two-phase scheduling and the central manager algorithm.

B. Dynamic Load Balancing

Dynamic load balancing techniques are preemptive. They do not require prior knowledge of the input, depend on the current state of the system and improve its overall performance. They manage load dynamically and prevent nodes from becoming overloaded by transferring load between nodes at run time. Dynamic load balancing is also known as feedback-driven load balancing [10]. Examples of dynamic load balancing techniques are artificial ant colony search, the round robin algorithm and the throttled load balancing algorithm.
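The difference between the two categories can be illustrated with a minimal sketch. It is not taken from any of the cited works: the server names, capacities and load figures are hypothetical, the static policy is assumed to be a fixed capacity-weighted rotation, and the dynamic policy is assumed to pick the least-loaded server observed at run time.

```python
from itertools import cycle

def static_assigner(capacities):
    """Static policy: a fixed rotation derived from capacities known in advance."""
    # A server with capacity 2 appears twice in the rotation, and so on.
    rotation = [name for name, cap in capacities.items() for _ in range(cap)]
    order = cycle(rotation)
    return lambda: next(order)

def dynamic_assign(current_load):
    """Dynamic policy: inspect the load measured at run time, pick the lightest server."""
    return min(current_load, key=current_load.get)

capacities = {"s1": 2, "s2": 1}            # hypothetical, fixed before any request arrives
pick_static = static_assigner(capacities)
static_choices = [pick_static() for _ in range(6)]

current_load = {"s1": 7, "s2": 3}          # hypothetical run-time measurements
dynamic_choice = dynamic_assign(current_load)
```

The static assigner never looks at the servers after it is built, while the dynamic one needs a fresh load reading for every decision.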

1.2 Load Balancing Optimization Algorithms

This section reviews the literature on different load balancing optimization algorithms and describes the ways in which they balance large volumes of data in the cloud.

Throttled Load Balancing Algorithm

The throttled algorithm is a state-based algorithm that depends on the current state of each virtual machine, that is, whether it is available or busy. Load balancers [11] are modules of the operating environment that dynamically balance the load across the available virtual machines and maintain an index of the machines together with their associated states. If a virtual machine is available, the request is assigned to it; otherwise, the request is declined and the balancer searches for a machine in a safe (available) state. A virtual machine executes a request only after the load balancer has verified its state. In [12], the author compares the round robin and throttled algorithms in terms of time and cost. The throttled algorithm is considered superior to round robin in terms of cost, as it reduces the hourly usage cost of virtual machines (Fig. 1).
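The allocation step described above can be sketched as follows. This is an illustrative reading of the description, not the implementation evaluated in [12]; the class name, method names and VM identifiers are assumptions.

```python
class ThrottledLoadBalancer:
    """Keeps an index table of VM states and hands out only available VMs."""

    def __init__(self, vm_ids):
        # Index table: VM id -> True when available, False when busy.
        self.available = {vm_id: True for vm_id in vm_ids}

    def allocate(self):
        """Return the first available VM id and mark it busy, or None if all are busy."""
        for vm_id, free in self.available.items():
            if free:
                self.available[vm_id] = False
                return vm_id
        return None  # request declined; the caller may queue it or retry later

    def release(self, vm_id):
        """Mark a VM as available again once it has finished its request."""
        self.available[vm_id] = True

balancer = ThrottledLoadBalancer(["vm0", "vm1"])   # hypothetical VM ids
first = balancer.allocate()
second = balancer.allocate()
third = balancer.allocate()     # both VMs are busy at this point
balancer.release("vm0")
fourth = balancer.allocate()
```

In this sketch a declined request simply returns `None`; how declined requests are queued or retried is left open.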

Fig. 1 Process flow of throttled algorithm

Ant Colony Optimization Algorithm

Ants are small, blind insects that leave their nest in search of food. The way they find food by following the shortest path is analogous to load balancing in the cloud. Ants deposit pheromone along their way to the food at the same speed and rate, which helps other ants follow the path. The more followers a path has, the higher the concentration of pheromone deposited on it. The evaporation rate of this pheromone on the shortest path is quite low. In [13], the author shows that the same process can be adopted by the job scheduler in cloud computing to map tasks to the available executing nodes. Schedulers check the load in their surroundings and transfer requests accordingly for effective utilization, maintaining a descriptive table that contains all the necessary information about the virtual machines.
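A heavily simplified sketch of pheromone-guided task mapping is given below. It is not the scheduler of [13]: the selection rule (pheromone divided by estimated completion time), the evaporation and deposit constants, and all function and parameter names are assumptions chosen for illustration.

```python
import random

def aco_assign(task_lengths, vm_mips, evaporation=0.1, deposit=1.0, seed=0):
    """Assign each task to a VM with probability proportional to its trail strength."""
    rng = random.Random(seed)
    n_vms = len(vm_mips)
    pheromone = [1.0] * n_vms        # trail strength per VM
    busy_until = [0.0] * n_vms       # descriptive table: when each VM becomes free
    assignment = []
    for length in task_lengths:
        # Estimated completion time of this task on every VM.
        finish = [busy_until[v] + length / vm_mips[v] for v in range(n_vms)]
        # Desirability: stronger trail and earlier completion are both preferred.
        weights = [pheromone[v] / finish[v] for v in range(n_vms)]
        vm = rng.choices(range(n_vms), weights=weights)[0]
        assignment.append(vm)
        busy_until[vm] = finish[vm]
        # Evaporate all trails, then reinforce the chosen one (more for quick finishes).
        pheromone = [(1 - evaporation) * p for p in pheromone]
        pheromone[vm] += deposit / finish[vm]
    return assignment, busy_until

tasks = [400, 300, 200, 100, 500]    # hypothetical task lengths (instructions)
mips = [100, 200]                    # hypothetical VM speeds
assignment, busy_until = aco_assign(tasks, mips)
```

Task lengths and VM speeds are assumed to be positive so that every estimated completion time is non-zero.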

Honeybee Optimization Algorithm

The honeybee algorithm is one of the prominent load balancing algorithms in cloud computing and follows the behavior of honeybees. There are two categories of honeybees: detector (leader) honeybees, which go out in search of food, and follower honeybees, which follow the path indicated by the leaders. Leader honeybees return and perform the well-known waggle dance, in which they trace a figure eight. This dance conveys the quality and quantity of the food, and its duration indicates the distance of the food from the beehive together with its recorded profit [14]. Unemployed bees in the hive can become either detectors or followers. The paper [15] presents an improved artificial honeybee algorithm, since the basic algorithm may create a load imbalance among the nodes. In the improved algorithm, a threshold value is set for every server queue; when the length of a server queue exceeds this value, the excess load is transferred to another server, and the servers execute their tasks independently, which improves the throughput of the system.
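The threshold rule of the improved algorithm can be sketched as below. Only the queue-length threshold comes from the description above; the choice of the shortest queue as the transfer target, and all names and sample data, are assumptions.

```python
from collections import deque

def rebalance(queues, threshold):
    """Move tasks off any queue longer than `threshold` onto the shortest queue."""
    for queue in queues.values():
        while len(queue) > threshold:
            target = min(queues, key=lambda name: len(queues[name]))
            if len(queues[target]) >= threshold:
                break  # every server is already at its threshold
            queues[target].append(queue.pop())  # transfer the most recently queued task
    return queues

# Hypothetical server queues holding task ids.
queues = {
    "s1": deque(["t1", "t2", "t3", "t4", "t5"]),
    "s2": deque(["t6"]),
    "s3": deque(),
}
rebalance(queues, threshold=2)
```

After rebalancing, each server processes its own queue independently, which is where the throughput gain described above would come from.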

Genetic Algorithm

The genetic algorithm (GA) is a widely used technique for effective search and optimization in load balancing. A simple GA follows a three-step process: selecting a population, applying genetic operations and replacing the old population with the new one. The genetic-based load balancing strategy in [16] balances the load following this process. It first initializes the population and evaluates the fitness of each individual, and then applies crossover and mutation. The resulting offspring replace the old population, and acceptance testing is performed. This improves the QoS delivered to the client.
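A compact sketch of this cycle for task-to-VM assignment is shown below. It is not the strategy of [16]: the chromosome encoding (one VM index per task), the makespan fitness, one-point crossover, the survival of the fitter half, and every parameter value are assumptions. It expects at least two tasks and a population of at least four.

```python
import random

def makespan(chromosome, task_lengths, vm_mips):
    """Fitness: time at which the most heavily loaded VM finishes (lower is better)."""
    load = [0.0] * len(vm_mips)
    for task, vm in enumerate(chromosome):
        load[vm] += task_lengths[task] / vm_mips[vm]
    return max(load)

def genetic_schedule(task_lengths, vm_mips, pop_size=20, generations=50,
                     mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n_tasks, n_vms = len(task_lengths), len(vm_mips)
    fitness = lambda c: makespan(c, task_lengths, vm_mips)
    # Initialization: random assignments of tasks to VMs.
    population = [[rng.randrange(n_vms) for _ in range(n_tasks)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]          # selection: fitter half survives
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_tasks)            # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_tasks):                   # mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_vms)
            offspring.append(child)
        population = parents + offspring               # replacement
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history

tasks = [400, 300, 200, 100, 500, 250]   # hypothetical task lengths
mips = [100, 200]                        # hypothetical VM speeds
best, history = genetic_schedule(tasks, mips)
```

Because the fitter half is carried over unchanged, the best makespan recorded in `history` never gets worse from one generation to the next.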

Generalized Priority Task Scheduling Algorithm

Resource monitoring in cloud computing is done in three steps: discovering and filtering the resources in the network, selecting an appropriate resource from those available and finally submitting the task to the selected resource. In generalized priority optimal task scheduling [17], large tasks are given high priority, as are servers with high MIPS; each task is mapped accordingly to the virtual machine with the identified ID, and the available resources are updated. The paper reports improved execution time compared with the round robin and first come first serve algorithms.
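One way to read the priority rule is sketched below: rank tasks by size and VMs by MIPS, then pair them off in rank order. The wrap-around when tasks outnumber VMs, the function name and the sample figures are assumptions and are not taken from [17].

```python
def priority_schedule(task_sizes, vm_mips):
    """Map the largest tasks to the fastest VMs; returns {task_index: vm_index}."""
    # Highest priority first: larger tasks and higher-MIPS VMs.
    tasks = sorted(range(len(task_sizes)), key=lambda t: task_sizes[t], reverse=True)
    vms = sorted(range(len(vm_mips)), key=lambda v: vm_mips[v], reverse=True)
    # Walk both ranked lists together, wrapping around when tasks outnumber VMs.
    return {task: vms[rank % len(vms)] for rank, task in enumerate(tasks)}

# Hypothetical task sizes and VM speeds (MIPS).
mapping = priority_schedule([200, 900, 500, 100], [1000, 2500])
```

Here the largest task (index 1) lands on the fastest VM (index 1), the next largest on the other VM, and so on in rotation.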

Agent-Based Load Balancing

In traditional load balancing, load balancers schedule tasks to appropriate servers to avoid overloading the system. Agent-based dynamic load balancing uses a software entity called a mobile agent, which is an independent software program that runs on behalf of the network user. The agent covers the shared pool of resources in two walks (Figs. 2, 3 and 4).

Fig. 2 Process diagram of artificial honeybee

Fig. 3 Flowchart of genetic algorithm

Fig. 4 Agent-based load balancing

In its first walk, the agent gathers information about the status of all the servers and calculates the average number of jobs; in its second walk, it identifies the overloaded and underloaded servers and transfers load accordingly. The additional agent in dynamic load balancing improves the throughput and response time of the system [18].
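The two walks can be sketched as a single function over an in-memory view of the servers. This is an illustration only, not the agent of [18]: the data layout, the transfer rule (move jobs while the source is above average, the destination is below it, and the two differ by more than one job) and all names are assumptions.

```python
def agent_balance(server_jobs):
    """server_jobs maps a server name to its list of jobs; it is rebalanced in place."""
    # First walk: collect each server's job count and compute the average.
    counts = {server: len(jobs) for server, jobs in server_jobs.items()}
    average = sum(counts.values()) / len(counts)

    # Second walk: classify servers and move jobs from overloaded to underloaded ones.
    overloaded = [s for s in server_jobs if counts[s] > average]
    underloaded = [s for s in server_jobs if counts[s] < average]
    for src in overloaded:
        for dst in underloaded:
            while (len(server_jobs[src]) > average
                   and len(server_jobs[dst]) < average
                   and len(server_jobs[src]) - len(server_jobs[dst]) > 1):
                server_jobs[dst].append(server_jobs[src].pop())
    return server_jobs

# Hypothetical servers with job ids.
servers = {"a": [1, 2, 3, 4, 5, 6], "b": [7], "c": [8, 9]}
agent_balance(servers)
```

A real mobile agent would migrate between hosts to gather this information; the sketch replaces that with a dictionary lookup.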

Estimated Finish Time Task Scheduling

Several computing factors, such as throughput, processing time, finish time and response time, determine how efficient load balancing is. The faster the tasks execute, the more efficient the strategy. In estimated finish time task scheduling [19], the characteristics of a task are assessed at allocation and processing time in order to avoid blocking processes in the queue. The scheduler estimates the finish time of the task at an early stage, during allocation, and directs it to the appropriate server, which improves performance and resource utilization by ensuring maximum usage of the virtual machines.
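A minimal sketch of finish-time-based allocation follows. The estimate used here (time at which the VM becomes free plus task length divided by VM speed) is an assumption for illustration and may differ from the estimate in [19]; names and figures are hypothetical.

```python
def eft_schedule(task_lengths, vm_mips):
    """Send each task to the VM with the earliest estimated finish time.

    Returns a list of (vm_index, estimated_finish_time), one entry per task.
    """
    ready_at = [0.0] * len(vm_mips)   # time at which each VM becomes free
    plan = []
    for length in task_lengths:
        # Estimate the finish time on every VM before allocating the task.
        estimates = [ready_at[v] + length / vm_mips[v] for v in range(len(vm_mips))]
        vm = min(range(len(vm_mips)), key=lambda v: estimates[v])
        ready_at[vm] = estimates[vm]
        plan.append((vm, estimates[vm]))
    return plan

# Hypothetical task lengths and VM speeds.
plan = eft_schedule([1000, 600, 500], [500, 250])
```

In this example the second task goes to the slower VM because the faster one is still busy, which is the kind of queue blocking the estimate is meant to avoid.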

2 Comparison and Important Findings

This paper has discussed several load balancing techniques and summarized them according to their performance and results. This section presents some important findings and compares the existing strategies based on the measurement parameters listed and tabulated below:

I. Throttled and agent-based load balancing algorithms can be applied to various cost-oriented models and play an important role in business-oriented applications and government sectors.

II. Honeybee and generalized priority algorithms focus on improving the overall execution of the system and balance the load among nodes more efficiently, as they are predictive in nature and pre-analyze the data to be allocated at different locations.

III. Resources are the most valuable assets of the computing environment and need to be utilized effectively so that they contribute to the scalability and performance of the system. The ant colony, estimated finish time and genetic algorithms achieve commendable resource utilization and can be modeled further with other parameters.

IV. These techniques can be integrated with other load balancing techniques for use in sectors such as banking, medicine and forecasting (Table 1).

Table 1 Analysis of various load balancing algorithms

3 Conclusion

Cloud computing is a scalable Internet-based service that aims to improve the utility of computing resources as the velocity, volume and variability of incoming data increase. The rapid growth of data makes it difficult to handle and introduces a new challenge of load balancing in the distributed cloud environment. Researchers have proposed several load balancing algorithms to direct tasks efficiently to the computing nodes for smooth and uniform execution. This paper performs a comparative analysis of these algorithms based on different metrics. The analysis concludes that different algorithms work on different parameters, and none of them considers all the parameters. Each algorithm is efficient in one way or another, but none can claim to be the best. Therefore, these algorithms can be extended with new measurement parameters to improve the quality of data distribution, together with enhanced security and privacy methods.