1 Introduction

Distributed computing has evolved over the years alongside advances in network technologies. The term Cloud Computing has been widely used to refer to different technologies, services, and concepts. It is often associated with virtualized infrastructure or hardware on demand, utility computing, IT outsourcing, platform and software as a service, and many other things that are now the focus of the IT industry (Rodriguez and Buyya 2014). It aims to let users benefit from existing technologies without requiring deep knowledge of them. It offers three different kinds of service, namely:

  • Software as a Service (SaaS): SaaS provides applications and services to users on demand. SaaS generally covers applications that traditionally ran on the desktop, such as office automation, photo and video editors, and Enterprise Resource Planning software, which run on the provider’s infrastructure and are made accessible through a web browser at the user’s end whenever the user requests them.

  • Platform as a Service (PaaS): PaaS provides scalable and elastic runtime environments to the user on demand, and lets the user host and execute applications on the provider’s platform. The services provided by PaaS are supported by a middleware platform, which creates an abstract environment in which user applications are hosted and run. The PaaS provider must ensure that the platform scales when user demand grows and must manage fault tolerance. End users create applications using the service provider’s APIs and system libraries.

  • Infrastructure as a Service (IaaS): IaaS provides infrastructure such as virtual hardware, storage and networking on demand to the end users. Using IaaS, all end-user computations can be carried out remotely on virtual hardware through VM instances.

Virtualization is the core technology in cloud computing, particularly in infrastructure-based services. It is used to create a customizable execution environment for applications. Virtualization is realized by using software, or a combination of software and hardware, to emulate an execution environment that is distinct from the host on which it is created (Sanaei et al. 2014). The main components in virtualization are the guest, the host, and the virtualization layer:

  • The guest represents the system component that interacts with the virtualization layer.

  • The host represents the original environment where the guest is supposed to be managed.

  • The virtualization layer is responsible for recreating the same or a different environment where the guest will operate.

In the case of hardware virtualization, the guest is represented by a system image comprising an operating system and installed applications. These are installed on top of virtual hardware that is controlled and managed by the virtualization layer, also called the VM manager. The host is instead represented by the physical hardware, and in some cases the operating system, that defines the environment where the VM manager is running. The main common characteristic of all these different implementations is the fact that the virtual environment is created by means of a software program. The ability to use software to emulate such a wide variety of environments creates a lot of opportunities.

Cloud Computing is applied in diverse fields such as data analytics, machine learning and deep learning (Tokle et al. 2014; Fernandes et al. 2017). It is therefore important that cloud computing technologies and their associated algorithms are efficient.

Cloud Computing is a heterogeneous computing technology implemented at data centers. Data center heterogeneity is present at various levels, such as instruction level architecture, operating system and application level (Zhang et al. 2014). This poses certain challenges to data centers, which are addressed in this paper. The major challenges are (Liang and Yang 2013):

  1. The resources in the data centers have to be systematically provisioned in order to handle the incoming service requests.

  2. The various tasks have to be allocated efficiently to the available resources so that the load is balanced across all the resources worldwide.

If the above challenges are overcome, the Quality of Service (QoS) offered by the service provider improves and the risk of a single point of failure is reduced. The main challenges faced by load balancing algorithms are (Al Nuaimi et al. 2012):

  1. Distribution of cloud data centers: cloud data center servers are spatially distributed all over the globe. Most existing algorithms work well for servers that are within a grid and located close together, where the delay in data communication is minimal or negligible. Thus, there is a need for efficient load balancing algorithms for data center servers that are far apart and connected via networks prone to delays.

  2. Storage and replication: data centers replicate data so that a backup is available at all times, including during major catastrophic failures. Replicating the data fully on every server incurs a high cost, because the amount of storage required is large. Thus, data is partially replicated across data center servers based on their processing power and available storage. This approach leads to better utilization of the available servers but increases the complexity of load balancing algorithms, because the data is only partly replicated across multiple servers.

  3. Complexity of the algorithm: load balancing algorithms should have a short execution time because they need to run across multiple cloud servers. Complex algorithms with a longer execution time generally introduce delay and thereby reduce efficiency drastically.

Load balancing algorithms are classified mainly into two categories:

  1. Dynamic load balancing algorithm: in this category of algorithms, load is allotted to a VM based on the current state of the machine, i.e., during run time (Wang et al. 2014). The state of the VM can be described by characteristics such as available memory and CPU speed. Dynamic algorithms can be further classified into offline mode and online mode. In offline mode, tasks are allotted to the VMs only at predefined time intervals, whereas in online mode a task is allotted immediately.

  2. Static load balancing algorithm: in this category of algorithms, VM information such as capacity, memory size and performance is known in advance, and these details are used for all allocations. Changes in this information during run time do not affect the allotment of tasks to VMs. Static algorithms are easy to implement but are not a good option for a heterogeneous cloud environment (Liang and Yang 2013).
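
The difference between the two categories can be illustrated with a minimal Python sketch. The function names `static_assign` and `dynamic_assign`, the integer capacity weights and the per-VM load counters are assumptions made for illustration; they are not taken from the cited works.

```python
def static_assign(weights, num_tasks):
    """Static policy: the whole plan is fixed in advance from known VM weights.

    `weights` are hypothetical integer capacities; VM i appears weights[i]
    times in a repeating schedule. Run-time load is never consulted.
    """
    schedule = [vm for vm, w in enumerate(weights) for _ in range(w)]
    return [schedule[k % len(schedule)] for k in range(num_tasks)]


def dynamic_assign(current_load):
    """Dynamic (online) policy: pick the VM with the smallest load right now.

    `current_load` is a list of per-VM task counts observed at run time;
    it is updated in place so that the next call sees the new state.
    """
    vm = min(range(len(current_load)), key=lambda i: current_load[i])
    current_load[vm] += 1
    return vm


# The static plan ignores what happens at run time.
plan = static_assign([2, 1], 6)

# The dynamic policy reacts to the observed load.
load = [3, 1, 2]
picked = [dynamic_assign(load) for _ in range(3)]
```

In this sketch the static plan depends only on the weights, whereas each dynamic decision depends on the load left behind by the previous one.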

The load balancing algorithms are typically evaluated through the following performance evaluation metrics (Randles et al. 2010):

  • Reliability: the system is reliable if it performs consistently as per its specification. In case of a failure, the task is performed by another VM to increase the stability of the system.

  • Accuracy: it is the measure of difference between the expected result and the obtained result. As the difference decreases the accuracy increases.

  • Throughput: it is the measure of system performance where the number of instructions executed per unit time is considered. In load balancing, throughput specifies the number of tasks executed by the VM per unit time.

  • Scalability: the capacity of the system to keep performing acceptably under adverse conditions. In a load balancing system, scalability means continuing to perform when the task size or workload increases, by incorporating an appropriate mechanism.

  • Makespan: the time taken to complete all the tasks submitted to the system. If proper load balancing techniques are employed, the resulting makespan approaches the optimum.
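
As a small illustration of the last two metrics, the following Python sketch computes makespan and throughput for a given task-to-VM assignment. It assumes that each VM runs its tasks one after another, that task lengths are given in million instructions (MI) and VM speeds in MIPS; the function names and sample values are hypothetical.

```python
def makespan(assignment, task_lengths, vm_speeds):
    """Time at which the busiest VM finishes, assuming sequential execution.

    assignment[k] is the VM index chosen for task k.
    """
    busy = [0.0] * len(vm_speeds)
    for task, vm in enumerate(assignment):
        busy[vm] += task_lengths[task] / vm_speeds[vm]
    return max(busy)


def throughput(assignment, task_lengths, vm_speeds):
    """Tasks completed per unit time over the whole run."""
    return len(assignment) / makespan(assignment, task_lengths, vm_speeds)


task_lengths = [400, 200, 200, 400]   # MI, illustrative values
vm_speeds = [100, 200]                # MIPS, illustrative values

unbalanced = [0, 0, 0, 0]             # everything on the slower VM
balanced = [0, 1, 1, 1]               # faster VM takes more of the work

print(makespan(unbalanced, task_lengths, vm_speeds))  # 12.0
print(makespan(balanced, task_lengths, vm_speeds))    # 4.0
```

With these numbers, spreading the tasks so that both VMs finish together cuts the makespan from 12 to 4 time units and raises throughput accordingly.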

2 Related work

Over the years, several cloud load balancing algorithms have been designed and developed using various techniques. A biologically inspired bee colony optimization technique was developed by Nakrani and Tovey (2004) to balance load across distributed servers hosting web services. In this technique, the allocation of web servers varies with the number of requests for web services. The servers are treated as virtual servers, each of which contains a service request queue. A server that serves an application request calculates a measure called “profit”, based on performance metrics such as CPU time. There is also an “advert board”, which indicates to idle servers whether their services are needed. The overall profit of the corresponding infrastructure is calculated based on different metrics. Here, the servers play the role of bees, and idle servers are like bees waiting to fetch nectar. Another biologically inspired load balancing technique, based on a modified ant colony optimization (ACO) technique, was developed by Nishant et al. (2012). ACO is modeled on the way ants move in a certain direction. It is related to cloud load balancing by treating software modules as ants and the VMs as nodes. Initially, one node is selected as the Regional Load Balancing Node (RLBN), which acts as the head node. Ants originate from this RLBN and can move in either direction towards the nodes, based on the load of the nodes. In this approach a pheromone table is maintained, which holds information about the loads on the nodes. The main aim of an ant is to find the least loaded node and assign the application to it, thereby distributing the load equally across the nodes in the network.

A software module called a load balancer was proposed by Jain et al. (2013). This load balancer allocates VMs to multiple user application requests based on the availability of VMs, such that the allocation strategy becomes optimal. The load balancer maintains a queue of user application requests and assigns an appropriate VM to each. It also maintains information about the allotments made to the VMs, and therefore knows in advance which VMs are free. A variation of the load balancer, called the “active VM load balancer”, was developed by Adhikari and Patil (2013). In this approach, the VM with the least load is assigned to the new application request. The “active VM load balancer” sends the ID of that VM to the data center controller, which then sends the request to the VM with that ID.

Scheduling of the VMs is also an important factor in utilizing the resources to the maximum extent. Round Robin based load scheduling algorithms have the limitation that, once a VM is allocated to a user application request, its state is not maintained. This drawback leads to the algorithm being executed again for the same request. An improved Round Robin based scheduler was proposed by Mahajan et al. (2013), which maintains the state each time an application request is run by the server. This is done using two data structures: a hash map, which keeps information about the VM allocated to a specific user application request, and a VM state list, which maintains the status of each VM (i.e., whether it is busy or free). Simulation results showed that this algorithm improved the response time compared to the ordinary Round Robin scheduler. A modified throttled algorithm for load balancing was developed by Domanal and Reddy (2013). In this technique, the best available VM is allocated to an application request based on the response time and processing time. Here the “throttled VM load balancer” maintains the status of every VM. When an application request arrives, the data center controller asks the “throttled VM load balancer” to assign the most suitable VM based on the application requirements.
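
The idea of a Round Robin scheduler that remembers its allocations can be sketched in Python as follows. This is only an illustration of the two data structures described above (a hash map from request to VM and a VM state list); the class name `StatefulRoundRobin` and its methods are hypothetical and do not reproduce the implementation of Mahajan et al. (2013).

```python
class StatefulRoundRobin:
    """Round Robin allocation that keeps per-request and per-VM state."""

    def __init__(self, num_vms):
        self.num_vms = num_vms
        self.next_vm = 0                  # where the next cyclic scan starts
        self.allocation = {}              # hash map: request id -> VM id
        self.busy = [False] * num_vms     # VM state list: busy or free

    def allocate(self, request_id):
        # A request that already holds a VM is answered from the hash map,
        # so the scheduling step is not executed again for it.
        if request_id in self.allocation:
            return self.allocation[request_id]
        # Otherwise scan the VMs in cyclic order for the first free one.
        for offset in range(self.num_vms):
            vm = (self.next_vm + offset) % self.num_vms
            if not self.busy[vm]:
                self.busy[vm] = True
                self.allocation[request_id] = vm
                self.next_vm = (vm + 1) % self.num_vms
                return vm
        return None                       # every VM is busy

    def release(self, request_id):
        # Free the VM held by this request and forget the mapping.
        vm = self.allocation.pop(request_id)
        self.busy[vm] = False
```

For example, with two VMs, requests "a" and "b" receive VMs 0 and 1; a repeated call for "a" returns 0 again without rescanning, and a third request gets `None` until one of the VMs is released.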

Xu et al. (2013) developed a load balancing algorithm that applies concepts from game theory to partition the cloud so that higher efficiency is obtained. A stochastic model based job scheduling algorithm was developed by Maguluri and Srikant (2014), where the server at the data center chooses the jobs to be scheduled for execution based on the availability of resources. A scheduling and resource provisioning strategy for scientific workflows on IaaS clouds was proposed by Rodriguez and Buyya (2014), where large workflow datasets were managed efficiently by applying Particle Swarm Optimization (PSO). The PSO technique was implemented using CloudSim for four different workflows. Cao et al. (2014) developed techniques based on a queuing model to optimize performance and reduce power consumption in data centers. A heuristic based load balancing algorithm that uses a clustering approach was developed by Zhao et al. (2016). They applied Bayes’ theorem to obtain optimal clusters of the physical hosts available for load balancing. Data intensive applications that manage terabytes and petabytes of data use cloud based workflow applications to process the data. These workflow based clouds also need to balance their load and manage resources efficiently (Wang et al. 2014; Zhang et al. 2014). The power consumption of cloud data centers is very high, which has given rise to techniques for optimal power usage in heavily loaded data centers (Cao et al. 2014; Tai et al. 2014; Wang et al. 2014). Resource allocation and the scheduling of tasks onto various resources in cloud data centers have been studied extensively in the literature (Fang et al. 2010; Maguluri and Srikant 2014; Maguluri et al. 2014). The contributions and limitations of the various works are highlighted in Table 1.

Table 1 Summary of literature indicating the load balancing algorithms

3 Proposed methodology and architecture

Fig. 1 Data center model for the proposed MCLB algorithm

In the proposed algorithm, requests from users all around the world arrive at the data center controller, which is one of the major components of the cloud computing system. The data center controller forwards the requests to the proposed MCLB algorithm, which assigns each request to an available VM. The algorithm maintains a table containing the ID, priority and state of every VM. It searches the table for the VM with the highest priority that is available at that moment. If such a VM is found, the algorithm replies to the data center controller with the ID of that machine (\(VM_{id}\)), and the data center controller assigns the request to it; otherwise, the request waits until a VM becomes free and is then assigned to that machine. The data center model of the proposed MCLB algorithm is depicted in Fig. 1.
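
The table search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the `VMEntry` record, its field names and the `select_vm` function are hypothetical, and the priority values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class VMEntry:
    """One row of the table: VM id, priority and state (hypothetical layout)."""
    vm_id: int
    priority: float
    available: bool = True


def select_vm(table):
    """Return the id of the available VM with the highest priority.

    Returns None when every VM is busy, in which case the request
    has to wait until a VM becomes free.
    """
    candidates = [entry for entry in table if entry.available]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry.priority).vm_id


table = [
    VMEntry(vm_id=0, priority=0.4),
    VMEntry(vm_id=1, priority=0.9, available=False),  # best VM, but busy
    VMEntry(vm_id=2, priority=0.7),
]
print(select_vm(table))  # 2
```

Here VM 1 has the highest priority but is busy, so the search returns VM 2, the highest-priority VM that is currently available.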

Fig. 2 Flowchart of MCLB algorithm

4 Implementation details

The sequential flow of MCLB is shown in Fig. 2. The proposed algorithm mainly consists of the following modules:

  • User module: this module is used to create user bases. It specifies the number of user bases participating in the simulation, the ID and name of each user base, and other characteristics. The functions performed by this module are sending requests to the data center controller and receiving the response cloudlets from the data center controller.

  • Data center controller module: this module performs the load balancing with the help of the proposed algorithm. On receiving requests from the user bases, it uses the algorithm to find the VM that is available and has the highest priority. Once such a VM is found, the job is assigned to it. If all the VMs are busy serving requests, the algorithm replies to the data center controller that all machines are busy. The data center controller then has to wait for a signal from a VM that becomes free and assigns the job waiting in the queue to that VM. Once the job is assigned, the data center controller updates the table maintained by the algorithm to record the allocation. The VM processes the job assigned to it and replies to the data center controller with the response cloudlet. This response is sent to the user who sent the request. The table is also updated to record the de-allocation of the VM, and the state of that VM is set to available. This module communicates directly with the other three modules of the system. All requests from the users arrive at the data center controller, which forwards them to a VM for processing by consulting the algorithm and the table it maintains. The response cloudlet from the VM arrives at the data center controller, which then sends it to the appropriate user.

  • VM creation module: this module is used to create the VMs. VMs are machines that are not physical but replicate the properties of a real system. Such machines are used to utilize the available resources efficiently and to improve the speed of processing requests. VMs can be created at the data center as required using this module. It also allows the characteristics of the VMs, such as memory size and CPU speed, to be set.

  • Maintaining table module: this module maintains the table used by the proposed algorithm. The table contains the ID of the VM (\(VM_{id}\)), the state of the VM and the priority of the VM. The data center controller updates the table whenever a request is allocated to a VM and whenever the VM is de-allocated after the job has been processed. \(VM_{id}\) is the unique identification number used to identify the VM.

    Priority of the VM is calculated based on the CPU speed and the memory of the VM. The formula is given as:

    $$\begin{aligned} P_r(i)=t\times T_c(i)+s\times T_m(i) \end{aligned}$$
    (1)

    Here t represents the CPU weight, i.e., the share of host CPU time that is available for executing the virtual CPUs.

    s represents the memory weight, i.e., the share of memory available to the VM. \(T_m\) is the memory resource available and \(T_c\) is the CPU speed (MIPS). The weights satisfy \(t+s = 1\): t represents the percentage of the total CPU time available to a particular VM, and s the percentage of the total memory available to that VM.
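
A short worked example of Eq. (1) in Python may help. The function name `priority` and the sample VM characteristics are assumptions for illustration; the source does not state units or any normalization for \(T_m\), so the raw values are combined exactly as the formula is written.

```python
def priority(t, cpu_mips, memory):
    """Eq. (1): P_r(i) = t * T_c(i) + s * T_m(i), with s = 1 - t."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1] so that t + s = 1")
    s = 1.0 - t
    return t * cpu_mips + s * memory


# Two hypothetical VMs, both evaluated with a CPU weight of t = 0.6 (s = 0.4).
vm_a = priority(0.6, cpu_mips=1000, memory=512)   # 0.6*1000 + 0.4*512
vm_b = priority(0.6, cpu_mips=2000, memory=256)   # 0.6*2000 + 0.4*256
```

With these illustrative numbers, `vm_a` evaluates to approximately 804.8 and `vm_b` to approximately 1302.4, so the second VM would be ranked higher because the faster CPU outweighs its smaller memory under this weighting.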

The proposed MCLB algorithm is described in Algorithm 1.
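
The allocation and de-allocation cycle described in Sects. 3 and 4 can be sketched in Python as follows. This is a reading of the prose description under stated assumptions, not a transcription of Algorithm 1: the class name `Controller`, its methods and the dictionary-based table layout are hypothetical.

```python
from collections import deque


class Controller:
    """Sketch of a controller that allocates by priority and queues when full."""

    def __init__(self, priorities):
        # Table maintained by the algorithm: VM id -> priority and state.
        self.table = {vm_id: {"priority": p, "available": True}
                      for vm_id, p in priorities.items()}
        self.waiting = deque()   # requests waiting for a free VM
        self.running = {}        # request -> VM id currently serving it

    def _best_free_vm(self):
        free = [vm for vm, row in self.table.items() if row["available"]]
        if not free:
            return None
        return max(free, key=lambda vm: self.table[vm]["priority"])

    def _assign(self, request, vm):
        self.table[vm]["available"] = False   # record the allocation
        self.running[request] = vm

    def submit(self, request):
        """Assign the request to the best free VM, or queue it if all are busy."""
        vm = self._best_free_vm()
        if vm is None:
            self.waiting.append(request)
            return None
        self._assign(request, vm)
        return vm

    def complete(self, request):
        """Free the VM; hand it to the oldest waiting request, if any."""
        vm = self.running.pop(request)
        self.table[vm]["available"] = True    # record the de-allocation
        if self.waiting:
            queued = self.waiting.popleft()
            self._assign(queued, vm)
            return queued, vm
        return None
```

For instance, with two VMs of priority 0.5 and 0.9, the first request goes to the higher-priority VM, the second to the other one, and a third request waits in the queue until one of the first two completes.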

5 Results and analysis

The MCLB algorithm is simulated using the CloudSim framework, which is used for simulating cloud computing infrastructure and services (Calheiros et al. 2011). The MCLB algorithm is compared with existing algorithms and analyzed graphically with respect to the parameters stated in Sect. 1. The algorithms compared, and the abbreviations used for them, are:

  • Round Robin Algorithm (RRA)

  • Throttled Algorithm (TA)

  • Equally Spread Current Execution Load Algorithm (ESLBA)

5.1 Main configuration

As shown in Fig. 3, six user bases are created in six different regions. The number of requests per user per hour and the size of the data per request are kept the same for all user bases so that the results are comparable. Different peak hours, and different average numbers of peak and off-peak users, are set for the different user bases, as shown in Fig. 3. The service broker policy is set to Closest Data Center, under which a data center receives traffic from the user bases located nearest to it. Since only one data center is used, all the user bases send their requests to that data center.

Fig. 3 Snapshot of main configuration

One data center is created with 50 VMs. The configuration of the data center is shown in Fig. 4, including the region in which the data center is located, the operating system, the costs, etc.

Fig. 4 Snapshot of data center configuration

Fig. 5 Snapshot of advanced configuration

Figure 5 shows the advanced configuration, where the number of users forming one user base, the request grouping factor and the instruction length per request are set. All the configurations shown above remain the same for every simulation; only the load balancing policy is changed from one simulation to the next, and the resulting values are recorded for comparison and for plotting the graphs.

Fig. 6 Comparison of response time for various algorithms

Fig. 7 Comparison of average response time for various algorithms

Fig. 8 Comparison of data center processing time

Fig. 9 Comparison of total cost

5.2 Evaluation parameters

  1. Response time: for the configurations set as discussed above, we have considered six user bases to estimate the response time by region for the various algorithms. The result is depicted in Fig. 6. It is observed from the graph that the proposed MCLB algorithm has the minimum response time compared to the other algorithms for every user base. This is because MCLB assigns tasks based on priority and looks for VMs that are lightly loaded.

  2. Average response time: for the configurations set as discussed above, the average response time for the various algorithms is depicted in Fig. 7. From the graph we can conclude that the average response time for user requests is lower with the proposed MCLB algorithm than with the other algorithms when distributing the load among the available VMs.

  3. Data center processing time: this is the time required to process user requests at the data center. The data center has to run the algorithm and forward each request to the appropriate VM; the time taken for this process is the data center processing time. For the configurations set as discussed above, the data center processing time for the various algorithms is depicted in Fig. 8. It is observed that RRA and ESLBA take almost double the time taken by TA and MCLB to process a request at the data center. MCLB takes slightly less time than TA and is hence the most efficient of the compared algorithms.

  4. Cost: the total cost includes the cost per VM, storage cost, memory cost and data transfer cost. For the configurations set as discussed above, the total cost for the various algorithms is depicted in Fig. 9. Since the configurations used for comparing the algorithms are the same, there is not much difference in the total cost. However, MCLB reduces the overall cost compared to the other algorithms.

6 Conclusion and future work

In this work we have proposed the MCLB algorithm. The MCLB algorithm balances the load among all the available VMs and thus takes care of the overloading and underloading of VMs. Jobs are allocated by considering the priority and the state of each VM, which helps in the fair allocation of jobs and in efficient resource utilization. Simulation results have shown that the proposed MCLB is more efficient than the Round Robin Algorithm, the Throttled Algorithm and the Equally Spread Current Execution Load Algorithm in terms of performance evaluation metrics such as response time, average response time, data center processing time and cost.

As part of future work, live migration of VMs and more sophisticated auto-scaling approaches can be considered. A load balancing algorithm that takes processor utilization and memory utilization into account could also be developed.