1 Introduction

Cloud Computing is a model for enabling ubiquitous, on-demand access to a shared pool of computing resources [1]. These resources are made available to customers on a pay-per-use model, similar to utilities such as electricity and water supply. Cloud data centres are buildings in which a large number of inter-connected servers are operated. Each server hosts a number of virtual servers on which user requests are deployed. Cloud computing is being adopted at a phenomenal rate: recent reports have shown that there are over 140 million active Cloud users [2], growing at a rate of about 17% annually [3]. With this huge number of users, the allocation of requests to Cloud servers must be done efficiently. In [4], the authors grouped Cloud users into three classes – Gold, Silver and Bronze – and prioritized the allocation of their requests to Cloud servers. The Gold class enjoyed the highest priority and least delay; however, the Silver and particularly the Bronze class users were not treated fairly and suffered long delays.

This work therefore proposes a user request allocation scheme that takes multiple user classes into consideration and allocates requests to servers in a fair manner. In the remaining sections of this paper, the terms Cloud server and Physical Machine (PM) are used interchangeably, as are the terms user request and user workload. The rest of this paper is organized as follows: related works on workload allocation in Cloud computing are discussed in Sect. 2. A detailed description of the proposed approach is given in Sect. 3. The experimental setup and performance metrics are presented in Sect. 4, and the results in Sect. 5. Sections 6 and 7 respectively present the contributions and conclusion of this paper.

2 Related Works

Round Robin [5], though mostly suited to static environments, has been proposed by numerous authors as a technique for workload allocation in Cloud Computing. It is a circular variant of the First-Come-First-Serve (FCFS) allocation scheme, wherein workloads are allocated resources in a time-sharing manner. However, being a static approach, it does not take the status or current workload of the PMs into consideration when allocating workloads.

Round Robin with Server Affinity (RRSA) was proposed in [6]. The work introduced a variant of round robin that allocates workloads to PMs with a view to keeping the workload balanced amongst the PMs. RRSA distributes workloads using the conventional round robin algorithm but introduces a hash map and a PM state list, which store information about the last PM allocations and the current state of the PMs respectively. Experimental results showed that, compared to the classic Round Robin, response time improved as the number of data centres increased; processing time, however, was only marginally better. There is also a high possibility of a dip in performance during the process of searching the hash map.

In [7], an efficient version of the throttled allocation scheme was presented. Throttled allocation is a scheme that allocates workloads to VMs with a view to keeping the workload distribution amongst VMs balanced. Prior to workload allocation, the scheduler checks its index table for the status of all VMs. If a suitable VM is found for the workload, the workload is immediately allocated to that VM; otherwise it is enqueued. The proposed Efficient Throttled Algorithm (ETA) addressed the long execution times experienced by workloads when using the classic throttled algorithm. The authors compared the performance of ETA with the Round Robin, classic throttled and Equal Spread Current Execution algorithms and reported improvements in average processing time and overall response time. However, the proposed approach was only minutely better than the classic throttled algorithm and on par with the two other algorithms on all metrics. The Round Robin and throttled algorithms are two of the default algorithms included in CloudAnalyst [8] – a popular Cloud research and simulation toolkit based on CloudSim.
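The core of the throttled scheme described above can be sketched in a few lines. This is an illustrative sketch, not the authors' ETA implementation; the index table is modelled here as a plain dictionary, which is an assumption for illustration.

```python
def throttled_allocate(workload, vm_index):
    """Classic throttled allocation: scan the index table for an
    available VM; allocate the workload if one is found, otherwise
    signal that it should be enqueued.

    vm_index: dict mapping VM id -> True if busy, False if available.
    Returns the id of the VM the workload was allocated to, or None
    if all VMs are busy (caller then enqueues the workload).
    """
    for vm_id, busy in vm_index.items():
        if not busy:
            vm_index[vm_id] = True   # mark the VM busy for this workload
            return vm_id
    return None
```

The long execution times ETA addresses arise in the queueing path (the `None` branch), where workloads wait until a VM status flips back to available in the index table.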

Join Idle Queue (JIQ) was proposed in [9] as an approach that combines distributed dispatchers with Idle Queues (IQs). Each dispatcher has an IQ onto which idle PMs enqueue themselves. On arrival of a task, the dispatcher checks its IQ for idle PM(s). If one is found, it dispatches the task to the first PM on the IQ; otherwise the task is dispatched to a random PM. Idle PMs enqueue themselves onto a dispatcher's IQ either at random or based on the shortest IQ. Though effective, the approach does not take the workload imbalance between busy PMs into consideration. In [10], the author analysed the performance of JIQ [9] using a fluid limit approach and showed that it is advantageous for a PM to enqueue itself on a dispatcher's IQ while still actively servicing user requests. The use of an early threshold was also recommended.
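The two JIQ decisions described above (task dispatch, and an idle PM choosing an IQ to join) can be sketched as follows. This is an illustrative sketch of the scheme in [9]; the PM and queue representations are assumptions.

```python
import random
from collections import deque

def jiq_dispatch(idle_queue, all_pms):
    """Join-Idle-Queue dispatch: send the task to the first idle PM
    on this dispatcher's IQ if the IQ is non-empty; otherwise fall
    back to a randomly chosen PM (which may be busy)."""
    if idle_queue:
        return idle_queue.popleft()   # first idle PM on the IQ
    return random.choice(all_pms)     # random fallback

def pm_report_idle(idle_queues, pm):
    """An idle PM joins the shortest dispatcher IQ (one of the two
    policies mentioned above; joining a random IQ is the other)."""
    shortest = min(idle_queues, key=len)
    shortest.append(pm)
```

Note that neither function inspects the backlog of busy PMs, which is exactly the workload-imbalance limitation noted above.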

In [11, 12], a linear workload allocation scheme based on Best Fit Descending (BFD) was proposed. The scheme sorts all PMs in descending order of available resources, then searches through the list of PMs for the one which best matches the user workload's requirements (VM). Though an efficient allocation scheme in terms of resource utilization, the linear search process used can increase allocation time, especially in data centres with a large number of PMs and user workloads with multiple resource requirements.
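A minimal sketch of the BFD search described above follows, assuming a single scalar resource per PM for clarity (the workloads in [11, 12] have multiple resource requirements, which is precisely what lengthens the linear search):

```python
def bfd_select(pm_free_capacities, demand):
    """Best Fit Descending: sort PMs by free capacity in descending
    order, then linearly scan for the PM that most tightly fits the
    demand (the last PM in the sorted order that can still host it).

    Returns the index (into the original list) of the chosen PM,
    or None if no PM can host the workload.
    """
    order = sorted(range(len(pm_free_capacities)),
                   key=lambda i: pm_free_capacities[i], reverse=True)
    best = None
    for i in order:
        if pm_free_capacities[i] >= demand:
            best = i      # a later, tighter fit may still appear
        else:
            break         # all remaining PMs are too small
    return best
```

The sort and linear scan make each allocation O(P log P) in the number of PMs P, which is the allocation-time cost noted above for large data centres.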

All of the works reviewed above considered a single user workload queue, implying the assumption that all user workloads belong to a single class of requirement. This is not true in reality, as factors such as users' purchasing power and socio-economic status play a significant role in daily life and can therefore not be ignored. The authors in [4] therefore considered user workload classes and used these classes for workload allocation. The introduction of Double-Depth Half Interval Scheduling (2DHIS) as a replacement for the BFD scheme used in [11, 12] allowed the proposed algorithm to find suitable PMs for user workloads faster, thus reducing allocation delay time. However, similar to [11, 12], a First-Come-First-Serve (FCFS) approach was used for workload scheduling. Also, the authors primarily focused on workloads within the Gold class with the highest priority, resulting in increased delay times for workloads of lower-priority classes.

The authors in [13] introduced a modification of Weighted Round Robin (WRR) [14], called Carry-on Weighted Round Robin (cWRR). It is a multi-queue scheduling algorithm like WRR; unlike WRR, however, whenever a queue has fewer packets than its allocated service quantum, the excess quantum is carried over to the next active queue rather than being ignored, as is the case with WRR. Unlike Deficit Round Robin (DRR) [15], the extra quantum is used in the current round, not in a subsequent round. This in effect allows cWRR to better utilize the available bandwidth per round. The fact that cWRR refreshes each queue's service quantum every round also makes it a fair scheduling algorithm.
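The carry-on behaviour just described can be sketched in a few lines. This is an illustrative sketch based on the description above, not the implementation in [13]; queue contents and quanta are assumed.

```python
from collections import deque

def cwrr_round(queues, quanta):
    """One round of carry-on WRR. Quanta are refreshed every round
    (fairness); a queue that empties before exhausting its budget
    passes the excess quantum to the NEXT active queue in the SAME
    round (DRR instead defers the deficit to a later round).

    queues: list of deques of packets/workloads, in service order.
    quanta: per-queue service quanta for this round.
    Returns the packets served this round, in service order.
    """
    scheduled = []
    carry = 0
    for q, quantum in zip(queues, quanta):
        budget = quantum + carry
        served = 0
        while q and served < budget:
            scheduled.append(q.popleft())
            served += 1
        carry = budget - served   # excess carried to the next queue
    return scheduled              # leftover carry is dropped at round end
```

Dropping any leftover carry at the end of the round is what distinguishes this sketch from DRR's deficit counter, and refreshing `quanta` every round is what keeps the scheme fair.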

This paper adapts cWRR by applying it as a workload scheduling algorithm for Cloud environments with multiple workload classes. The new scheme, simply called adapted-cWRR or ACWRR, improves on the work of [4] through the introduction of cWRR, which is able to schedule multiple user workload classes efficiently and fairly across all classes.

3 Proposed Approach

In this section, the Adapted Carry-On Weighted Round Robin (ACWRR) scheme is presented. ACWRR groups users' workloads into three (3) classes – Gold, Silver and Bronze – similar to [4]. However, rather than the first-come-first-serve scheduler used in [4], a cWRR-based scheduler is used to allocate user workloads to VMs/PMs in the Cloud Data Centre (DC).

The ACWRR model shown in Fig. 1 can be broken into the following steps, while its logical flow is depicted in Fig. 2.

Fig. 1.
figure 1

The Adapted-Carry-On Weighted Round Robin (ACWRR) model

Fig. 2.
figure 2

ACWRR system flow chart

  1. Workloads are submitted by users.

  2. The workloads are grouped into three classes.

  3. cWRR is used to select the next workload to schedule; this is shown in Algorithm 1.

  4. Selected workloads are allocated to resources within the DC for processing.
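The steps above can be combined into a single sketch. The class names and the 5:3:2 dequeue ratio come from this paper; the `tier` field used for classification is a hypothetical attribute, since the paper does not specify the classification criterion, and the sketch is not the authors' implementation.

```python
from collections import deque

CLASSES = ("Gold", "Silver", "Bronze")           # classes from the paper
QUANTA = {"Gold": 5, "Silver": 3, "Bronze": 2}   # 5:3:2 dequeue ratio

def acwrr_schedule(workloads):
    """Steps 1-4: group submitted workloads into class queues, then
    repeatedly run cWRR rounds until every workload is scheduled.
    Returns the order in which workloads reach the DC for allocation.
    """
    queues = {c: deque() for c in CLASSES}
    for w in workloads:
        queues[w["tier"]].append(w)              # step 2: classification
    order = []
    while any(queues.values()):                  # step 3: cWRR selection
        carry = 0                                # quanta refresh each round
        for c in CLASSES:
            budget = QUANTA[c] + carry
            served = 0
            while queues[c] and served < budget:
                order.append(queues[c].popleft())  # step 4: allocation
                served += 1
            carry = budget - served              # carry-on within the round
    return order
```

Because every class receives a refreshed quantum each round, lower-priority queues keep draining even while Gold workloads are present, which is the fairness property evaluated in the experiments.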

3.1 Algorithm 1: CWRR

figure a

3.2 System Process Flow

4 Performance Evaluation

In evaluating the efficacy of the proposed ACWRR model, simulated experiments were conducted using CloudSim [16]. The experimental setup consisted of a DC with 800 host machines of two different processing capacities in Millions of Instructions Per Second (MIPS) – 1,860 MIPS and 2,660 MIPS. Real workload traces from PlanetLab day 3 [17] were used, consisting of 1078 workloads. The ACWRR model was benchmarked against the conventional WRR using the following metrics:

  1. Average workload delay per queue – a measure of the time spent by a user workload on a given queue waiting to be scheduled, given in (1).

    $$ \frac{\sum_{i=1}^{n} \left( D_{i} - A_{i} \right)}{n}. $$
    (1)

    Where Di is the departure time of workload i, Ai is the arrival time of workload i, and n is the number of workloads on the queue.

  2. Makespan per queue – a measure of the time it takes to schedule the last workload on a queue.

    $$ D_{l} - D_{o}. $$
    (2)

    In (2), Dl is the departure time of the last workload on a queue, and Do is the departure time of the first workload on the queue.

  3. Average overall workload delay – the average time spent by a workload across all queues, given in (3).

    $$ \frac{1}{m}\sum_{k=1}^{m} \frac{\sum_{i=1}^{n} \left( {D_{i,k} - A_{i,k} } \right)}{n}. $$
    (3)

    Where Di,k is the departure time of workload i on queue k, Ai,k is the arrival time of workload i on queue k, n is the number of workloads on queue k, and m is the number of queues.

  4. Latency per queue – the inverse of the number of workloads scheduled on a queue per unit time, given in (4).

    $$ \left( {\frac{w_{q}}{T_{q}}} \right)^{-1}. $$
    (4)

    Where wq is the number of workloads on queue q and Tq is the time taken to schedule all workloads on q.

  5. Average system throughput – the inverse of the average number of workloads scheduled per unit time across all queues.

    $$ \left( {\frac{1}{Q}\sum_{q=1}^{Q} \frac{w_{q}}{T_{q}}} \right)^{-1}. $$
    (5)

    In (5), Q is the total number of queues, and wq and Tq are the number of workloads and scheduling time on queue q.
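Following the textual definitions of the five metrics, they can be computed from per-workload arrival and departure timestamps as below. This is an illustrative sketch; the function and variable names are assumptions, not part of the experimental code.

```python
def avg_delay(arrivals, departures):
    """Metric 1: mean time workloads spend waiting on one queue."""
    return sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)

def makespan(departures):
    """Metric 2: departure time of the last workload minus that of the first."""
    return departures[-1] - departures[0]

def avg_overall_delay(per_queue):
    """Metric 3: average of the per-queue average delays.
    per_queue: list of (arrivals, departures) pairs, one per queue."""
    return sum(avg_delay(a, d) for a, d in per_queue) / len(per_queue)

def latency(n_workloads, sched_time):
    """Metric 4: inverse of workloads scheduled per unit time on a queue."""
    return sched_time / n_workloads

def avg_system_throughput_metric(queue_stats):
    """Metric 5: inverse of the average per-queue scheduling rate.
    queue_stats: list of (n_workloads, sched_time) per queue."""
    rates = [w / t for w, t in queue_stats]
    return 1.0 / (sum(rates) / len(queue_stats))
```

Note that metric 3 averages over the queues (division by the number of queues), consistent with its definition as an average across all queues, and metric 5 is likewise the inverse of a per-queue average rate.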

5 Results

Implementation was in two phases. In the first phase, workloads were enqueued onto the three queues, while in the second phase, the enqueued workloads were selected/dequeued for allocation using one of three arrival schemes, viz. random, equal and double. Under the random scheme, workloads were enqueued onto randomly selected queues. With the equal scheme, a matching proportion of workloads was enqueued, such that at each iteration the arrival rate of workloads equalled the departure rate. Finally, with the double scheme, the departure rate was twice the arrival rate. For each of these schemes, a dequeue ratio of 5:3:2 was maintained across the queues.

In Fig. 3, the latency (inverse of throughput) of ACWRR and WRR is depicted. For all queues, ACWRR outperformed WRR by at least 13.8%, which implies that ACWRR is able to schedule at least 38.7% more workloads on average than WRR.

Fig. 3.
figure 3

Latency of ACWRR vs. WRR using random enqueue scheme

A comparison of the makespan is shown in Fig. 4. Though WRR is shown to have a lower makespan than ACWRR for all three queues, the values vary across the queues. For queue 1, WRR results in 30,420 ns, while ACWRR gives 37,030 ns. For queue 2, makespans of 33,058 ns and 36,839 ns were recorded for WRR and ACWRR respectively, while for queue 3, WRR resulted in 35,672 ns versus ACWRR at 37,563 ns. These results show that the dequeue ratio largely influences the makespan when WRR is used, with queue 1 (with the highest dequeue ratio) having the shortest makespan and queue 3 (with the lowest dequeue ratio) the longest. When ACWRR is used, however, despite the dequeue ratio (5:3:2), all queues experience similar makespans, making ACWRR fairer to all its queues than WRR.

Fig. 4.
figure 4

Makespan of ACWRR vs. WRR using random enqueue scheme

Figure 5 shows the waiting time, or delay, of workloads on a queue. For queue 1, which has the highest dequeue ratio, workloads experience 10.5% less delay with ACWRR than with WRR. For the second queue, workload delays improved by 6.8% when ACWRR was used, while for the last queue, ACWRR resulted in significantly less waiting time, with an improvement of 150.9% versus WRR. On average, ACWRR resulted in 38.7% lower workload delay than WRR.

Fig. 5.
figure 5

Scheduling delay of ACWRR vs. WRR using random enqueue scheme

Due to the unpredictability of the random allocation scheme, and to obtain a fair benchmarking assessment, the equal and double schemes were also used. When using the equal scheme (equal arrival and departure rates), WRR resulted in an average 4.5% improvement over ACWRR with respect to latency and workload delays. However, it must be noted that when the arrival rate equals the departure rate, no queues would form, hence queue management algorithms such as ACWRR would not be needed. For this reason, these results are not shown, except for the makespan, which is a measure of the fairness of a scheme to its queues. This is shown in Fig. 6.

Fig. 6.
figure 6

Makespan of ACWRR vs. WRR using equal enqueue scheme

In Fig. 6, ACWRR is again shown to be fairer to all its queues than WRR. For queue 1, 34,453 ns and 32,989 ns were reported for ACWRR and WRR respectively. For queue 2, makespans of 34,790 ns and 29,014 ns were reported for ACWRR and WRR respectively, while for queue 3, the makespans were 34,202 ns and 33,588 ns. As with the random enqueue scheme, ACWRR is fairer to all its queues than WRR.

To further ascertain the results, a last test was conducted using the double enqueue scheme, wherein the arrival rate is half the departure rate. In Fig. 7, the latencies of ACWRR and WRR under the double enqueue scheme are compared. The results show that under the same enqueuing rate, ACWRR has significantly better latency (inverse of throughput) than WRR. On average, ACWRR has a latency of 91 versus WRR's 1977.

Fig. 7.
figure 7

Latency of ACWRR vs. WRR using double enqueue scheme

Figure 8 shows the makespan of ACWRR versus WRR. For all measurements, lower values are desirable. Across all queues, and consistent with the other enqueue schemes, ACWRR results in an even makespan: 73,755 ns, 73,466 ns and 71,400 ns for queues 1, 2 and 3 respectively. WRR likewise results in an even spread of makespans across the queues, at 353,479 ns, 353,322 ns and 352,875 ns.

Fig. 8.
figure 8

Makespan of ACWRR vs. WRR using double enqueue scheme

Finally, and in tandem with the latency and makespan results, Fig. 9 shows that ACWRR schedules workloads significantly faster than WRR across all queues, with average delays of 66,968 ns and 1,449,857 ns for ACWRR and WRR respectively. This implies that with ACWRR, workloads experience shorter delays than with WRR.

Fig. 9.
figure 9

Scheduling delay of ACWRR vs. WRR using double enqueue scheme

6 Contributions

The main contribution of this paper is the development of a workload scheduling scheme for Cloud computing, called ACWRR, that:

  a. Takes multiple user workload classes into consideration when scheduling.

  b. Is fair to all queues, ensuring that all queues experience similar makespans irrespective of their priority.

  c. Reduces the delays experienced by workloads waiting on queues to be scheduled.

7 Conclusion

In this paper, a workload scheduling scheme for Cloud Computing was developed. The scheme, called ACWRR, takes multiple workload classes into consideration and schedules workloads fairly across all queues. It ensures that all queues experience similar makespans irrespective of their priority. The developed scheme was benchmarked against the well-known Weighted Round Robin scheme and performed significantly better with respect to workload scheduling delays and makespan. The scheme can be applied to Cloud environments wherein prioritization or service differentiation is in use.

In future work, the ACWRR scheme will be tested against the FCFS scheme used by numerous authors, in order to assess its performance with respect to workload delay and makespan. SLA violations resulting from queuing delay will also be considered.