1 Introduction

Cloud computing has received increasing attention in both research and business for about a decade because it provides a great deal of benefits, e.g., elastic resource provisioning, pay-per-use, economies of scale, high reliability, and dynamic customization [175]. Although resources appear infinite to users, a real cloud has limited resources. Thus, a cloud should have enough resources to satisfy the peak demand of its users' requests, so that all Quality of Service (QoS) requirements are met and its reputation is preserved [45].

When a local cloud has insufficient resources to meet the peak demand of its users, three methods can be exploited. The first is to reject some unimportant requests, e.g., cheap ones, to make room for important requests whose rejection would cost much more [106]. However, this method reduces the cloud provider's reputation [45] and may further result in the loss of potential users. The second is for the cloud provider to expand its infrastructure to cover the peak demand. However, in real production environments, the peak resource demand of users is usually much higher than the average demand but transient [14, 99], which leaves plenty of resources idle most of the time under this method. Besides, most small to medium enterprises have insufficient capital for such infrastructure investments. The third method, the hybrid cloud (a.k.a. cloud bursting), is a cost-efficient way to address the problem: it elastically scales the cloud capacity up or down based on demand by combining local infrastructures (clusters, grids, or private clouds) with one or more public clouds. According to a survey by the European Network and Information Security Agency (ENISA), most small to medium enterprises prefer a mixture of cloud computing models (public cloud, private cloud) [26]. Nowadays, both commercial and open-source virtualization tools support basic cloud bursting functionalities, e.g., VMware [182], OpenNebula [150], and OpenStack [151]. As a new computing paradigm, hybrid cloud computing plays a crucial role not only in the development of cloud computing, but also in the integration of cloud computing and the Internet of Things (IoT) [25], like other emerging paradigms such as edge computing [165], fog computing [51], and mobile cloud computing [62].

To provide services, cloud providers must solve the problems of provisioning the optimal amount of resources based on demand, i.e., resource provisioning, and of mapping users' workloads to the available resources efficiently, i.e., workload scheduling [130]. Workload scheduling decides the order (priority) in which workloads are executed on available resources, for various scheduling units, e.g., virtual machines (VMs), tasks, or user requests, while resource provisioning decides which resources, and in what amounts, should be allocated to the scheduled workloads. To provide services optimally, the cloud provider should schedule workloads based on the characteristics of the available resources, e.g., heterogeneity [55, 89] and reliability [18], and provision resources considering the features of the workloads to be run, such as various QoS requirements [72], interdependences between service components [100], interference between workloads [195], etc. Thus, workload scheduling and resource provisioning are intimately related, and both are essential to cloud management.

However, for a service provider, hybrid cloud resource management not only has all of the challenges of provisioning services on a private cloud, e.g., server consolidation [183], and on public clouds, e.g., elasticity management [10], but also introduces new ones, such as the heterogeneity between clouds in terms of the various resources [131, 187], the decision of which services, or which parts of a service, should be outsourced to the public cloud [100, 145], the performance overhead caused by the much lower bandwidth of the network connection between clouds [73, 131], and so on.

In this paper, we survey articles on workload scheduling and resource provisioning for hybrid clouds published in the last 10 years. We first present a comprehensive taxonomy of workload scheduling and resource provisioning in hybrid cloud environments and use it to investigate these 146 related research works in detail. Then, based on this detailed investigation, we discuss the challenges that have not yet been addressed and suggest several promising directions for future research. To the best of our knowledge, no previous work has extensively and thoroughly reviewed workload scheduling and resource provisioning in hybrid clouds. We believe our review is helpful to academia and industry concerned with hybrid clouds.

The rest of this paper is organized as follows. Section 2 presents the hybrid cloud architecture, which is helpful for understanding the remainder of the paper, the survey works related to resource provisioning and workload scheduling in hybrid clouds, and the method used to collect the reviewed works. Section 3 introduces in detail the comprehensive taxonomy of workload scheduling and resource provisioning in hybrid cloud management and investigates related works in depth. Section 4 summarizes the challenges and opportunities for future work. Finally, Sect. 5 concludes the paper.

2 Background

In this section, we first provide a brief overview of the hybrid cloud architecture, which is helpful for understanding the remainder of the paper. Then, we present previous surveys of hybrid cloud resource management and the search method for the related literature reviewed in this paper.

2.1 Hybrid cloud architecture

In a hybrid cloud environment, as shown in Fig. 1, users request and pay for services with various QoS requirements from a service provider that owns local resources and can rent public resources, including computing, storage, and network resources. The service provider provisions its local resources, on which it schedules user requests, and rents resources from public clouds when the local resources are not enough to satisfy all QoS requirements.

Fig. 1 The hybrid cloud architecture

The local resources can be provisioned either as physical resources, e.g., clusters and grids, or as virtualized resources managed with virtualization tools, e.g., Xen [16], KVM [103], and Docker [64]. Virtualization introduces many benefits, e.g., better isolation and manageability of resources, but it can incur significant performance overheads [120, 185]. Public computing resources are usually provisioned in the form of VMs, e.g., by Amazon EC2 [2], Alibaba Cloud [1], etc. Local and public resources differ in various characteristics, which are explained in detail in Sect. 3.3, and resource provisioning policies should be designed with their respective peculiarities in mind. In a hybrid cloud, the public resources may be provisioned by more than one public cloud to avoid vendor lock-in. A multi-cloud service composition broker may then be introduced to spare the service provider the complexity of choosing the best-fit public cloud [12]. However, using a broker may cost the service provider some optimization opportunities, e.g., reducing the communication cost between public clouds, because the execution of workloads on the public clouds is then transparent to the provider.
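To make the bursting decision described above concrete, the following is a minimal Python sketch of a provider that fills its local capacity first and outsources the overflow to a public cloud. The request model, the single-dimension (CPU core) capacity, and the security flag are illustrative assumptions, not the policy of any particular surveyed work; real policies also weigh cost, data transfer, and QoS constraints, which are the subject of the taxonomy in Sect. 3.

from dataclasses import dataclass

@dataclass
class Request:
    cpu_cores: int           # cores required by the request
    sensitive: bool = False  # True -> must stay on the private cloud (see Sect. 3.1.5)

def place(requests, local_free_cores):
    """Serve requests from the private cloud while capacity lasts and burst the
    rest to a public cloud; sensitive requests are never outsourced."""
    placement = []
    for r in requests:
        if r.cpu_cores <= local_free_cores:
            placement.append((r, "private"))
            local_free_cores -= r.cpu_cores
        elif not r.sensitive:
            placement.append((r, "public"))    # rent public capacity on demand
        else:
            placement.append((r, "rejected"))  # no local room and cannot leave the private cloud
    return placement

if __name__ == "__main__":
    reqs = [Request(4), Request(8, sensitive=True), Request(16), Request(2)]
    for req, target in place(reqs, local_free_cores=12):
        print(f"{req.cpu_cores:2d} cores -> {target}")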

In this article, we focus on hybrid cloud resource management, i.e., workload scheduling and resource provisioning on hybrid clouds.

2.2 Related survey work

Although there have been plenty of studies surveying resource management on clouds, only a few concern the hybrid cloud.

Bittencourt et al. [22] compared the performance of seven scheduling algorithms on hybrid clouds, only two of which were specially designed for hybrid clouds. They also assessed the impact of communication links on schedules, concluding that increasing the bandwidth reduces both cost and makespan, and that HCOC [21, 23], a hybrid cloud scheduler, outperforms MDP [196], which was designed for utility computing.

Fadel and Fayoumi [68] surveyed 19 works, published over 5 years ago, tackling the issue of cloud bursting, whose main challenges are choosing the best workload to burst and the best resource to provision.

Chopra and Singh [42] investigated only six task scheduling methods for hybrid clouds, considering only three aspects: the optimization criteria, multi-core processor awareness, and the number of workflows supported. The optimization criteria involved only the cost of renting public computational resources and the workflows' finish times.

Manikandan and Suguna [134] reviewed 10 papers about resource provisioning for clouds, only one of which [217] focused on hybrid clouds.

Bhosale and Bandari [19] reviewed three works [27, 173, 191] on the Aneka cloud platform, developed by the CLOUDS Laboratory at the University of Melbourne, which provisions resources in hybrid clouds for multiple non-interactive workloads involving large numbers of files.

de Assunção et al. [53] investigated whether a local cluster can benefit from using clouds to improve the performance of its requests, by evaluating seven scheduling strategies based on conservative [143], aggressive [121], and selective [172] backfilling, in terms of various performance metrics, e.g., the average weighted response time, job slowdown, number of deadline violations, number of rejected jobs, and the cost of using clouds.

In this article, we review 146 research works studying workload scheduling and/or resource provisioning for various service deliveries, e.g., scientific computing, web services, and infrastructures (i.e., VMs), in hybrid clouds. We present a comprehensive taxonomy to categorize these related works, which helps us summarize the challenges that have not yet been addressed and propose promising directions for future research. We hope that our work is helpful for both research and business in hybrid cloud management.

2.3 Literature search

The literature we reviewed includes the following:

  1. the relevant papers obtained by querying the Engineering Village Compendex database [67] and the Web of Science Core Collection [190] with the search condition (expressed as a query statement for the Engineering Village Compendex database) (“hybrid cloud” OR “hybrid clouds” OR “cloud bursting”) AND (“task scheduling” OR “job scheduling” OR “application scheduling” OR “request scheduling” OR “service scheduling” OR “task management” OR “job management” OR “request management” OR “application management” OR “service management” OR “service migration” OR “resource scheduling” OR “resource provision” OR “resource provisioning” OR “resource management”), to cover all high-quality research papers;

  2. the relevant references of the papers obtained in (1), (2) and (3) (a recursive procedure);

  3. the relevant literature citing the papers obtained in (1), (2) and (3), retrieved via Google Scholar [77] (a recursive procedure).

3 Taxonomy

This section presents the detailed taxonomy for workload scheduling and resource provisioning in hybrid cloud environments. We classify related works in six ways according to the properties of the hybrid cloud optimization problem they solve, as shown in Fig. 2. This classification helps us review related works in detail and summarize them, leading to the challenges and opportunities of optimizing the use of resources in hybrid clouds. The taxonomy is detailed as follows.

  1. Requirement. When exploiting cloud resources, users have various QoS requirements for their workloads, e.g., response time, security, etc. In the reviewed works, these requirements are treated either as objectives, optimizing one or more QoS metrics, or as constraints restricting some metric values, as detailed in Sect. 3.1.

  2. Workload types. Various types of workload are executed on clouds. Distinct workload types require different resource characteristics and thus should be managed by different methods to achieve the best result [37]. The workload type is therefore a useful factor for differentiating the related literature and understanding the applicability of a scheduling/provisioning method to a type of workload. The workload types concerned are detailed in Sect. 3.2.

  3. Resource characteristics. Private and public resources differ in various aspects, e.g., cost-performance ratio, security, reliability, etc., as described in Sect. 3.3, leading to different types and amounts of hybrid resources being required by workloads with diverse QoS requirements. In general, cloud resources are heterogeneous because of continuous infrastructure updates during a cloud's operation, which is one of the basic challenges complicating scheduling in clouds [56, 57]. Mixing private and public clouds makes the resources even more heterogeneous, which brings further challenges. However, existing related works simplify the resource optimization problem by ignoring, to a greater or lesser extent, the heterogeneity or diversity of hybrid cloud resources, e.g., the heterogeneity between private and public resources, as illustrated in Sect. 3.3.

  4. Private resource cost model. The total cost of cloud operation is mainly composed of the investment costs for buying infrastructure and the operational costs, which consist of electricity costs, software copyright costs, hardware/software maintenance costs, and so on [171, 184]. In general, investment costs are considered "sunk costs", and operational costs are evaluated by the consumed power, which accounts for the largest part of them [71]. When considering the cost of executing workloads on hybrid clouds, one should account for the costs of both the private and the public resources used. The ways related works deal with private resource costs are shown in Sect. 3.4.

  5. Public resource costs. When renting resources from public clouds, service providers, which are also public cloud users, are charged based on the amount of rented resources and the rental time. In the existing literature, three types of resources are considered to be charged by the public cloud: computing (VMs), storage, and network bandwidth. The cost model for each resource type in each work is presented in Sect. 3.5.

  6. Factors of workload processing time. The performance of workloads, e.g., finish time or response time, is one of the foremost concerns in clouds. However, many factors affect performance, and no single work can address all of them. Therefore, each work focuses on some of these factors when solving the scheduling or provisioning problem in a hybrid cloud environment, making simplifications believed to be reasonable. The performance factors concerned by related works are shown in Sect. 3.6.

Fig. 2 The taxonomy of workload scheduling and resource provisioning in hybrid cloud environments

3.1 Requirements

Service providers have various requirements in different hybrid cloud environments, concerning various QoS aspects, when managing hybrid resources. These requirements can be handled as optimization objectives or as constraints in a hybrid cloud resource management problem. The requirements concerned by current works are as follows. Table 1 summarizes the requirements concerned by each work.

Table 1 Classifying the literature based on requirements

3.1.1 Profit or cost

For a service provider, the profit is the foremost concern. The profit is the difference between the revenue and the cost (Fig. 3),

$$ Profit = Revenue - Cost. $$
(1)

A user pays for its required services according to the service level agreement (SLA) contracts with the service provider. The revenue of a service provider is the accumulated payment from all of its users for their service requests. Thus, when the user requests and the payment for each request are known, the revenue of the service provider is a constant (C), and profit maximization is equivalent to cost minimization for the service provider.

$$ Profit = C - Cost \Rightarrow \max {Profit} = C - \min {Cost} . $$
(2)

In a hybrid cloud environment, the cost of a service provider includes the costs of operating the local resources and of renting the public resources, which are illustrated in Sects. 3.4 and 3.5, respectively, as well as the penalty cost due to SLA violations,

$$ Cost = Cost_{pri} + Cost_{pub} + Cost_{penalty}. $$
(3)

The service provider must pay a penalty whenever it breaches an SLA contract with a user. Moreover, SLA violations reduce the provider's reputation, so potential users may be lost and the revenue and profit may be further reduced.

Fig. 3 The taxonomy by concerned requirements

In most related works, the profit or the cost is considered as an optimization objective (Profit—profit maximization, Cost—cost minimization) or as a constraint (Budget—the upper limit on the cost).

From Table 1, we can see that the profit/cost is one of the foremost concerns when providing services in hybrid clouds, as it is one of the most important factors cloud providers consider for increasing their income.

3.1.2 Application performance

A user always wants to get the results of its requests as soon as possible. Thus, the turnaround time, the time between submitting a request instance and receiving the result, is usually considered as a QoS metric. For a batch job, the turnaround time is generally expressed as the finish time, or the makespan, which is the period elapsed from submission to completion, while for a web service it is expressed as the response time, which is influenced by many factors, e.g., the communication delay between the user and the private cloud, the queuing delay, the processing delay, and the communication delay between the private and public clouds.

For batch tasks/jobs, the finish time and the makespan are equivalent when the turnaround time is considered as the optimization objective (Finish/Makespan—finish time/makespan minimization). The finish time can also be considered as a constraint metric, Deadline—the time before which the job must be finished—which depends on various factors, e.g., the start time (Start), the execution time, the transfer time of input data (Transfer), etc.

For web services, some works consider the response time as an optimization objective or a constraint metric (Response). However, most works on hybrid cloud resource management for web services consider only a part of the response time, e.g., the queuing delay (Queuing), the processing delay (Delay), or the communication delay between the private and public clouds (Communication), again as an optimization objective or a constraint.

As shown in Table 1, performance (turnaround time) is one of the foremost concerns in hybrid clouds, as it is one of the most important factors for cloud users paying for qualified services. Failing to satisfy performance requirements increases the provider's cost through contract violations and hurts its reputation, leading to the loss of potential users.

3.1.3 Public resource amount

The amount of public resources is considered by several works in two ways. Some works note that, in practice, the number of VMs a user can rent from a public cloud may be limited; for example, a user can run at most 20 instances at a time under the default plan of Amazon EC2. Other works require that the rented public resources be sufficient to satisfy users' requirements, since the private resources are fixed.

In the first case, the public resource amount is considered as a constraint expressing the limit on rented public resources, either as a VM number (VM Number) [33, 40, 41, 74, 119, 138, 162, 213, 214] or as an amount of each resource type (Amount Limit) [131].

In the second case, the public resource amount is considered as an objective or a constraint metric. When considered as an objective, the amount is minimized while satisfying the performance requirements of the provided services (Amount Minimum) [3, 28, 29, 34, 35, 176, 178], which indirectly reduces the cost of renting public resources. When considered as a constraint, a lower bound is set, representing the minimum requirements of users (Amount Required) [170], to indirectly guarantee the application performance.

3.1.4 Resource utilization

Resource utilization is another metric of concern, since utilization and resource efficiency are generally positively correlated. A few works thus try to maximize the resource utilization (Utilization) [33, 213]. However, high utilization may reduce the reliability of the infrastructure [17], so some works restrict the utilization to an upper bound (Utilization Limit) [24].

3.1.5 Security

Security is an essential factor in whether users, especially enterprise users, exploit the public cloud, because of the security and privacy issues of internal data or code. Thus, several works treat security as a constraint (Security) restricting where workloads with high security requirements may be processed. There are usually multiple levels of security requirements, and most related works consider a simple two-level model: a high security level, requiring the workload to be processed in the private cloud, and a low security level, allowing the workload to be processed in either the private or the public cloud.

3.1.6 Reliability/availability

Due to the increasing functionality and complexity of hybrid cloud computing, failures are inevitable and may degrade the performance of processing workloads. For example, a task may meet its requirement if no failure occurs, but violate it if a failure interrupts its execution. Thus, a few works improve the reliability or the availability of processing workloads, which has a great influence on performance (Reliability [Arantes2017, Choi2015, Ben-Yehuda2012, Liu2015] and Availability [90, 91]).

Usually, both the reliability and the availability can be improved by redundant execution of workloads. However, redundant execution requires more resources and thus costs more, so there is a tradeoff between reliability (or availability) and cost.

3.1.7 SLA violation

SLA violations occur when no resource is powerful enough to satisfy a workload, or when the service provider puts profit first and rejecting some requests costs less than accepting them. Thus, some works treat SLA violations, e.g., request rejections/failures, as an objective or a constraint metric (SLA), either minimizing the number of violations [30, 50, 137, 159, 164, 216] or restricting it to an upper bound [94, 107, 118, 158, 177, 202–204].

3.1.8 Others

A very few works consider the load balance between the private and public clouds (Balance) [80, 125, 192], which may improve the turnaround time by letting the workloads in the two clouds finish close to one another; the length of request queues (QLength), which is bounded to control the queuing time [119]; or the resource contention between services (Contention) [86, 87], e.g., conflicting library and operating system version requirements or conflicting communication ports.

If all rented (homogeneous) VMs have the same price, minimizing the cost and minimizing the total rental time of public VMs are equivalent. Under this assumption, a very few works optimize the total rental time instead of the cost of public VMs (RentTime) [213] (Fig. 4).

Fig. 4 The taxonomy by workload types

3.1.9 Single- or multi-objective

Cloud services usually come with multiple QoS requirements. There are two ways to handle several of the metrics presented above simultaneously. The first is to take one metric as the optimization objective and the others as constraints, formulating hybrid cloud resource management as a single-objective optimization problem. Alternatively, some works take multiple metrics as objectives and model hybrid cloud resource management as a multi-objective problem.

For a single-objective optimization problem, there is always an optimal solution not inferior to any other solution, while there usually is no such solution for a multi-objective problem. Therefore, the Pareto-efficient solutions should be found to provide candidate solutions for service providers, where a Pareto-efficient solution is one not dominated by any other solution, i.e., one in which no objective can be improved without sacrificing another.
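As an illustration of Pareto efficiency, the following minimal Python sketch extracts the non-dominated schedules from a set of candidates evaluated on two objectives to be minimized (cost and makespan); the candidate values are hypothetical.

def pareto_front(solutions):
    """Return the solutions not dominated by any other solution.
    A solution dominates another if it is no worse in every objective
    and strictly better in at least one (all objectives are minimized)."""
    front = []
    for s in solutions:
        dominated = any(
            all(o <= so for o, so in zip(other, s)) and other != s
            for other in solutions
        )
        if not dominated:
            front.append(s)
    return front

if __name__ == "__main__":
    # (cost in $, makespan in minutes) of candidate schedules
    candidates = [(10, 50), (12, 40), (9, 70), (15, 40), (11, 45)]
    print(pareto_front(candidates))  # [(10, 50), (12, 40), (9, 70), (11, 45)]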

Most works optimizing multiple objectives focus on the tradeoff between cost and performance, two of the most important factors, where performance is generally improved by allocating more resources, which in turn increases the cost of a workload. Little research considers other factors, e.g., security, reliability, or availability, in such multi-objective tradeoffs.

3.2 Workloads

Related works mainly focus on two types of workloads, batch jobs and long-running web services, explained as follows. A few works also focus on infrastructure (VM) service delivery in hybrid clouds regardless of the workloads.

3.2.1 Batch jobs

Batch jobs, e.g., recommendation computing, financial analysis, and weather forecasting, generally require complex calculations, which take from a few seconds to a few days to complete, and thus are mostly insensitive to short-term performance fluctuations. A batch job is usually divided into multiple tasks that are dispatched to the available resources so that all tasks of the job complete as soon as possible or within its deadline.

Batch jobs can be classified into two categories: jobs with independent tasks and workflow jobs. There are plenty of jobs consisting of a number of trivially parallel tasks (called Jobs with Independent Tasks), such as parallel image rendering and data analysis. In such jobs, no two tasks depend on each other, so all tasks can be executed in parallel. Such jobs are a very common kind of application in parallel and distributed systems [75, 88]. Thus, a number of researchers focus on completing jobs with independent tasks on hybrid clouds.

In contrast, many scientific computing jobs consist of multiple tasks with logic or data dependencies (called Workflow Jobs), i.e., a task can start only after all tasks it depends on have finished. A workflow job is usually abstracted as a directed acyclic graph (DAG) whose nodes are tasks and whose edges represent the dependencies between them. Scheduling workflow jobs on hybrid clouds is harder than scheduling jobs with independent tasks, since task dependencies must be taken into account to maximize the degree of task parallelism and thus the performance of workflow jobs.
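The following minimal Python sketch illustrates list scheduling of such a DAG over a hybrid cloud under strong simplifying assumptions: one worker per cloud, known per-task runtimes, and a fixed inter-cloud transfer penalty. It illustrates the dependency handling described above and is not the algorithm of any particular surveyed work.

def schedule_dag(tasks, deps, runtime, xfer=5.0):
    """tasks: task ids in topological order.
    deps: dict task -> list of predecessor tasks.
    runtime: dict (task, cloud) -> execution time, cloud in {'private', 'public'}.
    Yields (task, (cloud, start, finish))."""
    finish, placement = {}, {}
    cloud_free = {"private": 0.0, "public": 0.0}  # one worker per cloud, for brevity
    for t in tasks:
        best = None
        for cloud in ("private", "public"):
            # a task is ready once every predecessor has finished and, if it ran
            # in the other cloud, its output has crossed the inter-cloud link
            ready = max(
                (finish[p] + (0.0 if placement[p] == cloud else xfer) for p in deps.get(t, [])),
                default=0.0,
            )
            start = max(ready, cloud_free[cloud])
            end = start + runtime[(t, cloud)]
            if best is None or end < best[2]:
                best = (cloud, start, end)
        placement[t], finish[t] = best[0], best[2]
        cloud_free[best[0]] = best[2]
        yield t, best

if __name__ == "__main__":
    tasks = ["a", "b", "c", "d"]
    deps = {"c": ["a", "b"], "d": ["c"]}
    runtime = {("a", "private"): 10, ("a", "public"): 8,
               ("b", "private"): 6,  ("b", "public"): 7,
               ("c", "private"): 4,  ("c", "public"): 3,
               ("d", "private"): 5,  ("d", "public"): 5}
    for task, (cloud, start, end) in schedule_dag(tasks, deps, runtime):
        print(f"{task} -> {cloud} [{start}, {end}]")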

3.2.2 Web services

Web services are long-running services handling short-lived, latency-sensitive requests, where each request takes only a few milliseconds to a few hundred milliseconds. Such services are used for end-user-facing products such as web search, online video, and business transactions, and for internal infrastructure services (e.g., distributed databases). Web services should return results as soon as possible, as the request response time has a significant influence on the service provider's profit [82].

A web service usually consists of several tiers (or components). For example, the 3-tier web application architecture, consisting of presentation, application, and data tiers, is widely used. The tiers of a web service have different resource requirements due to their different functionalities, which motivates the use of virtualization to consolidate the instances of web service tiers and benefit from the complementarity of the tiers' resource requirements.

However, a few works consider a web service instance as a whole for simplicity (One-tier Web Service). Such works study methods for scaling web services vertically (reconfiguring the resources allocated to service instances) and/or horizontally (tuning the number of service instances) to reduce the service provider's cost while guaranteeing performance requirements.

Works focusing on Multi-tier Web Services must consider the vertical and horizontal scaling of instances for each tier. They should also improve the performance of the connections between instances of different tiers, e.g., by reducing the communication distance via instance deployment/migration, which has a significant influence on the request response time. These concerns make providing web services considerably more complex.

3.2.3 Literature review

From Table 2, we can see that about half of the literature focuses on independent tasks. This is mainly because such jobs are well suited to being outsourced to public clouds: with (almost) no communication between tasks, their performance is not degraded by the scarce network resources between the two clouds. A further 21.2% of the works optimize the execution of workflow applications on hybrid clouds, where the problems of resource heterogeneity, poor network resources between clouds, etc. can be addressed by carefully designed scheduling methods that guarantee performance requirements (Fig. 5).

Fig. 5 The taxonomy by hybrid resource characteristics

Table 2 Classifying the literature based on workload types

Web services have stringent performance requirements because they interact with users, and each of their components can be tuned in instance number [188], which makes providing them in hybrid clouds more challenging. Accordingly, relatively few studies focus on providing web services on hybrid clouds, only about a quarter as many as those focusing on batch jobs.

Only about 11.6% of the literature focuses on infrastructure (VM) service delivery from the perspective of an IaaS provider, regardless of workload features, since taking application characteristics into account is generally more useful and more efficient when executing applications on clouds, even though ignoring them is simpler. To apply these research results, a service provider must still decide the amounts of private resources to own and public resources to rent according to its load and service characteristics, which is a challenging question.

3.3 Hybrid resource characteristics

The resources of a hybrid cloud are composed of Local Resources and resources rented from public clouds (Public Resources). Local and public resources differ in many characteristics, as shown in Table 3.

Table 3 The comparison between local resources and public resources

In general, local resources have better performance and lower costs than public resources, so almost all related works use local resources whenever possible because of their high performance-cost ratio. However, the amount of local resources is limited, and public resources are rented when the workload is so high that the local resources cannot satisfy all QoS requirements. Even though a public cloud provisions "unlimited" resources, public cloud users still face some restrictions on the resource amount (e.g., the VM instance number).

Usually, local resources are heterogeneous (Heter), as servers are gradually provisioned and replaced over the operation of the private cloud (or cluster, grid) [55]. Nevertheless, plenty of related works (about 65.5%, as shown in Table 4) consider local resources homogeneous for simplicity. Two degrees of homogeneity are considered in the existing literature: (i) all local resources, physical machines (PMs), VMs, or CPU cores, are homogeneous (Homo); (ii) similar to the public cloud, local resources are provisioned as VMs that are homogeneous within each type (HomoType). For simplicity, 88.2% of related works, as shown in Table 4, treat local resources the same as public resources, seamlessly combining the private and public clouds (Seamless). In general, a public cloud provisions VM instances that are homogeneous within each type (HomoType), while some related works regard all public resources as homogeneous for simplicity (Homo) or as heterogeneous for generality (Heter).

Table 4 Classifying the literature based on resource characteristics

Compared to the public cloud, the local cloud provides more secure/private services as it serves only internal users. However, local resources are often regarded as less reliable because maintaining highly reliable resources is expensive; for example, traditional and desktop grids have yearly resource availability averages of 70% or less [105]. Public clouds, in contrast, have SLAs that guarantee resource availability averages of over 99%. Thus, some related works migrate services to public clouds to increase reliability, which can prove prohibitively expensive [18].

3.4 Cost model of local resources

For service providers, the investment costs of local infrastructures, which cannot be reduced by runtime resource management, are usually considered "sunk costs". Thus, the vast majority of resource management works focus on reducing the operational costs, which consist of electricity costs, hardware/software maintenance costs, and so on [171, 184]. As electricity makes up the largest part of operational costs, and over 90% of the energy is consumed by computing (\(Energy_{Com}\)), networking (\(Energy_{Net}\)) and cooling (\(Energy_{Coo}\)) [71], the electricity costs for powering PMs, network and cooling equipment can be taken approximately as the operating cost of local resources (private cost for short),

$$\begin{aligned} Cost_{Pri}\approx & {} price_{e} \cdot (Energy_{Com} + Energy_{Net} \\ &\quad + Energy_{Coo}) + C, \end{aligned}$$
(4)

where \(price_{e}\) is the unit price of electricity, and C is a constant representing relatively fixed private costs including software copyright costs, electricity costs for powering auxiliary equipment, salaries of staff, etc. (Fig. 6).

Fig. 6 The taxonomy by local resource cost model

For the past decade, reducing the electricity costs of a private cloud, a cluster, or a grid has been studied by many works [179, 200], which are worth borrowing for optimizing costs in hybrid cloud environments. However, most existing works on hybrid cloud resource management, about 67.1% as shown in Table 5, do not consider the costs of local resources, treating them as costless (-) or as a Fixed value, on the grounds that the operational costs of local resources are much lower than the rental cost of public resources. Most works that do consider the costs of using local resources, about 75.1% (\(=24.7\%/(1-67.1\%)\)) as shown in Table 5, use the same cost model as for public resources (Same), without considering their differences. Only 7.5% of related works consider the computing Energy cost of the local resources, using simple energy consumption models, e.g., a linear relationship between the energy consumed by the private cloud and the number of tasks/requests [194, 197, 198], or between the energy consumed by a data center and the number of active PMs/VMs [128]. A very few related works assume that the service provider pays for the used network bandwidth (Net) in the local cloud [118].
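As an illustration of the linear energy models mentioned above, the following Python sketch estimates the private (energy) cost from the number and utilization of the active PMs; the idle/peak power values, the PUE-style overhead factor for cooling and networking, and the electricity price are hypothetical.

def private_energy_cost(active_pm_utils, hours, price_kwh=0.10,
                        p_idle=100.0, p_peak=250.0, pue=1.5):
    """active_pm_utils: CPU utilization in [0, 1] of each powered-on PM.
    Power per PM is linear in utilization; cooling/network overhead is a PUE factor."""
    watts = sum(p_idle + (p_peak - p_idle) * u for u in active_pm_utils)
    kwh = watts * pue * hours / 1000.0
    return price_kwh * kwh

if __name__ == "__main__":
    # 3 active PMs at 20%, 60% and 90% utilization for 24 hours
    print(round(private_energy_cost([0.2, 0.6, 0.9], hours=24), 2), "USD")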

Table 5 Classifying the literature based on the local resource costs concerned

3.5 Cost of public resources

Service providers must pay for the various resources, e.g., computing (C), network (N), and storage (S), rented from public clouds. Usually, public resources are charged per time unit (\(Time_{unit}\)); for example, Amazon EC2 charges its VMs hourly. A lease time shorter than one unit is charged as a full unit; for example, renting a VM priced at $0.2/h for 2.2 h costs $0.6 ($0.2/h \(\times \lceil 2.2 \rceil \) h) instead of $0.44 (Fig. 7).

Fig. 7 The taxonomy by public resource costs

In general, the computing resources provisioned by public clouds are in the form of VMs. Thus, the cost of renting a VM (\(Cost_{VM}\)) is

$$ Cost_{VM} = Price_{VM}\cdot \left\lceil \frac{Time_{VM}}{Time_{unit}} \right\rceil , $$
(5)

where \(Time_{VM}\) is the lease time of the VM.

There are usually three price models for renting VMs: On-demand, Spot, and Reserved. With the on-demand model, the public cloud provisions VMs as soon as its users pay and withdraws them when their rental periods expire. The price of on-demand VMs is usually stable over relatively long periods. Spot VMs are bid on by multiple users: the public cloud provisions spot VMs to a user only when its bid is higher than the spot price, which varies with market supply and demand, and withdraws them either when their rental periods expire or when the spot price rises above the user's bid. Most of the time, spot VMs are cheaper than on-demand VMs, providing cost-saving opportunities with a proper bidding strategy, although they are sometimes more expensive. Reserved VMs are rented and paid for over long periods, e.g., weeks or months; they are cheaper than on-demand VMs in unit price, saving cost for service providers with relatively steady, high workloads. Thus, combining these three kinds of VMs helps service providers optimize their costs while satisfying various requirements; however, almost all related works, as shown in Table 6, consider only on-demand VMs. Only 3.2% and 0.7% of related works exploit spot VMs and reserved VMs, respectively, in hybrid cloud environments.
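In the simplest case, the choice between reserved and on-demand VMs reduces to a break-even comparison between a flat reservation fee and the metered on-demand cost over the expected busy hours. The following Python sketch illustrates this reasoning with hypothetical prices; real price lists, upfront/partial-upfront reservation options, and spot bidding are considerably more involved.

def cheapest_plan(busy_hours, on_demand_per_hour=0.20, reserved_per_month=100.0):
    """Pick the cheaper option for one VM given its expected busy hours per month."""
    on_demand_cost = on_demand_per_hour * busy_hours
    if reserved_per_month < on_demand_cost:
        return "reserved", reserved_per_month
    return "on-demand", round(on_demand_cost, 2)

if __name__ == "__main__":
    for hours in (200, 600, 730):   # expected busy hours in a ~730-hour month
        print(hours, "h ->", cheapest_plan(hours))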

Table 6 Classifying the literature based on the public VM price model

Even though, in the real world, most public clouds charge their provisioned VMs in discrete units of time (Discrete), e.g., Hourly, plenty of related works (67.8%, as shown in Table 7) simplify the VM price model for their hybrid environments. 13% of related works assume that public VMs are charged by the second or by continuous time (Continuous). Many others (21.2%) use a price model whose billing unit is a Workload (task or request) or a Resource unit (e.g., a VM), regardless of the rental duration. Very few works assume a constant cost saving for using a public VM [212], optimizing the benefit of migrating some workloads to a public cloud. Several works do not consider the cost of renting public resources at all (-), optimizing instead the amount of rented public resources.

Table 7 Classifying the literature based on the charging unit of public VMs

For network resources, public clouds charge their users on the basis of network bandwidth (BW) and time. The time for using public network resources is proportional to the amount of transferred data (\({Data_{Transfer}}/{BW}\)). Thus, the cost of renting network resources in a public cloud is

$$\begin{aligned} Cost_{Net}& = {} Price_{BW}\cdot BW\cdot \left\lceil \frac{Data_{Transfer}}{BW} \right\rceil \\ & \approx {} Price_{BW}\cdot Data_{Transfer}, \end{aligned}$$
(6)

where \(Price_{BW}\) is the price of network resources per bandwidth unit and time unit. Network charges apply only to uplink and downlink data transfers, as the internal bandwidth of a public cloud is free to use; thus, the data transfers between two clouds are charged. A broker is not recommended for service providers, because the data transmission between two public clouds is then transparent to them in both performance and cost, losing some opportunities for optimization in workload scheduling or resource provisioning (Fig. 8).

Fig. 8 The taxonomy by factors of processing time

Public clouds charge for storage resources according to the amount of stored data (\(Data_{STO}\)) and its storage duration (\(Time_{STO}\)),

$$ Cost_{STO} = Price_{STO}\cdot Data_{STO} \cdot \left\lceil \frac{Time_{STO}}{Time_{unit}} \right\rceil , $$
(7)

where \(Price_{STO}\) is the cost of storing a unit of data, e.g., a kilobyte, megabyte, or gigabyte, per time unit.

In total, a user pays the following for rented computing, network, and storage resources in a public cloud:

$$ Cost_{public} = \sum _{VM} Cost_{VM} + Cost_{Net} + Cost_{STO}. $$
(8)
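Putting Eqs. (5)–(8) together, the following Python sketch computes the public-resource bill of a service provider, assuming hourly-billed VMs, transfer-metered network charges, and monthly-billed storage; the unit prices are placeholders rather than any real provider's price list.

import math

TIME_UNIT_H = 1.0  # VM billing unit (hourly), cf. Eq. (5)

def vm_cost(price_per_hour, lease_hours):
    # Eq. (5): partial billing units are rounded up
    return price_per_hour * math.ceil(lease_hours / TIME_UNIT_H)

def network_cost(price_per_gb, transferred_gb):
    # Eq. (6): only data crossing the cloud boundary is charged
    return price_per_gb * transferred_gb

def storage_cost(price_per_gb_month, stored_gb, months):
    # Eq. (7): storage is billed per started month here
    return price_per_gb_month * stored_gb * math.ceil(months)

def public_cost(vm_leases, transferred_gb, stored_gb, months,
                price_per_gb=0.09, price_per_gb_month=0.02):
    # Eq. (8): all rented VMs plus network and storage charges
    return (sum(vm_cost(p, h) for p, h in vm_leases)
            + network_cost(price_per_gb, transferred_gb)
            + storage_cost(price_per_gb_month, stored_gb, months))

if __name__ == "__main__":
    print(round(vm_cost(0.2, 2.2), 2))   # 0.6, as in the hourly-billing example above
    bill = public_cost([(0.2, 2.2), (0.1, 5.0)], transferred_gb=50, stored_gb=100, months=1)
    print(round(bill, 2))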

Although public clouds generally charge for computing, network, and storage resources, only about 6.2% of related works, as shown in Table 8, consider the costs of all three resource types, and more than half of the related research considers only the cost of rented VMs for the sake of simplicity.

In hybrid cloud environments, service providers aim to minimize their total cost. Outsourcing workloads may reduce the private cost while increasing the public cost through additional rented resources. Thus, a service provider should consider the tradeoff between the private and public costs to achieve the optimal overall cost.

Table 8 Classifying the literature based on the public resources charged

3.6 Factors of processing time

In a cloud, the turnaround time of a workload (a task or a request) depends on various factors, e.g., the computing time (C), the transfer time of input data (T), the startup time of provisioned resources (S), the queue time (Q), the recovery time after a failure (F), etc.

The computing time of a workload is determined by its computing load, e.g., the number of instructions to be executed, and the computing power of the resource (PM or VM) it is assigned to. VM performance can be affected by heterogeneity in the underlying hardware [89]; for example, VMs of the same type (resource configuration) hosted on heterogeneous architectures, e.g., POWER and x86, can have different performance. The computing time can be estimated either with a linear model of the load and the resource capacity or with more complex models captured by data analysis tools, e.g., machine learning [55].

The data transfer time is determined by the bandwidth of the transmission link and the amount of data. When outsourcing workloads to public clouds to improve performance, service providers should take into account the heterogeneity and dynamics of the available network resources [73]: (i) the bandwidth between two clouds is much lower than that within a cloud; (ii) the bandwidths in public clouds may fluctuate considerably since the public resources are shared by many users. However, as shown in Table 9, fewer than half of the related studies consider the delay of data transfer, and, to the best of our knowledge, no related work considers the fluctuation of bandwidths.
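Combining the factors listed at the start of this subsection with the simple linear models above, a first-order estimate of a task's turnaround time can be sketched as follows in Python; the workload size, VM capacity, and delay values are illustrative, and real estimators would replace the linear terms with measured or learned models.

def turnaround_time(instructions_mi, vm_mips, input_gb, bandwidth_gbps,
                    startup_s=0.0, queue_s=0.0):
    """First-order turnaround estimate: queue (Q) + startup (S) + transfer (T) + compute (C)."""
    compute_s = instructions_mi / vm_mips          # C: linear load/capacity model
    transfer_s = 8 * input_gb / bandwidth_gbps     # T: GB -> Gbit, then divide by Gbit/s
    return queue_s + startup_s + transfer_s + compute_s

if __name__ == "__main__":
    # a 6e5-MI task on a 2000-MIPS VM, 10 GB of input over a 1 Gbit/s inter-cloud link,
    # with a 60 s VM startup delay and a 30 s queueing delay
    print(turnaround_time(6e5, 2000, 10, 1.0, startup_s=60, queue_s=30), "s")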

Table 9 Classifying the literature based on the factors of workload processing time

The startup time/delay, considered by only 8.2% of the related literature, is the time between requesting a VM and being able to schedule workloads on it. Also known as the bootstrapping time, service initiation time, or VM provisioning delay, it is made up of the time for loading the VM image, starting the operating system, installing software, configuring the network, and so on. The available network thus affects the startup time by influencing the image loading time. The types of services and VMs, as well as the cloud service provider, are also important factors influencing the startup time [135]. The startup time ranges from seconds to dozens of minutes [135, 161], which may have a significant impact on the performance of applications, especially latency-sensitive web services.

The queue time quantifies how long a task/request waits between its arrival and the start of its execution, and is an important factor in workload performance, e.g., the finish time of batch jobs and the response time of web services. The queue time fluctuates strongly over time, so using its average as the evaluated/predicted value, as most related works on web services do, may lead to many SLA violations. Using a percentile of the queue time, e.g., the 90th or 95th, or more sophisticated tools such as stochastic process analysis [5], may be more suitable.
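The following short Python sketch illustrates why the mean is a poor stand-in for the queue time under heavy-tailed delays, contrasting it with the 95th percentile on synthetic samples.

import random
import statistics

random.seed(0)
# heavy-tailed synthetic queue times (seconds): most are short, a few are long
samples = [random.expovariate(1 / 0.2) for _ in range(1000)]

mean_q = statistics.mean(samples)
p95_q = statistics.quantiles(samples, n=100)[94]  # 95th percentile (Python >= 3.8)

print(f"mean = {mean_q:.2f} s, 95th percentile = {p95_q:.2f} s")
# Provisioning against the mean would leave roughly 5% of requests waiting
# longer than p95_q, each a potential SLA violation.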

With the increasing scale and complexity of clouds, failures during workload processing are inevitable and also affect workload performance. Service providers should apply recovery approaches to handle these failures, which take time and consume resources.

All of the above factors contribute to the turnaround time of workloads; however, to the best of our knowledge, all existing related works either consider only a subset of them or, as a simplification, use an average turnaround time for homogeneous workloads (P).

4 Challenges and directions

In this section, we summarize the main issues in hybrid cloud resource management that still require research effort and put forward some suggestions for future research directions.

4.1 Potential of using distributed public clouds

To avoid vendor lock-in and to reduce cost, service providers rent resources from multiple public clouds rather than a single one, since, due to commercial competition, no public cloud always has the best cost-performance ratio. However, using multiple public clouds increases complexity, as it introduces additional resource heterogeneity. The service provider must dispatch workloads among clouds carefully, as using several public clouds may degrade performance due to the low network performance between any two public clouds, an issue considered by few works. Especially in the era of big data, plenty of data analysis applications have performance that is largely limited by the network resources. Thus, there is a tradeoff between the benefits of a greater diversity of public clouds and the overall workload performance.

4.2 Cost evaluation

It is necessary to establish a cost model for hybrid clouds, providing both a cost optimization objective function and a way to evaluate resource management strategies for service providers. Modelling the cost is difficult, however, because different resources and different resource amounts incur different costs, and because the price models of private and public resources are very different. In the private cloud, the cost of resources is influenced by many factors [112], e.g., the utilization of the computing and network infrastructures and the supply of cooling, each of which presents its own challenges [52]. In a public cloud, the service provider pays for rented resources according to the resource amount and the rental time, but the prices of public resources, especially spot VM instances, vary with time [129] and should be modelled, for example, as a time series. Intuitively, the total cost in each cloud is positively correlated with the workload allocated to it, so there is a tradeoff between the costs of the private and public clouds; this tradeoff has not been considered by related works, as most of them ignore the cost of operating the private cloud.

4.3 Performance evaluation

In general, the performance requirements of users/workloads are defined as QoS, e.g., the finish time of batch jobs and the response time of web services. However, almost all related works express the requirements of workloads/users as resource amounts, which simplifies the hybrid cloud resource management problem. Applying these works therefore requires the relationship between QoS values and resource amounts in hybrid clouds, which has scarcely been studied. It is thus necessary to establish models mapping QoS requirements to the various resources, answering the question: how many resources, and which hybrid resources, should be provided to satisfy the QoS requirements?

4.4 The VM provisioning delay

In a cloud, starting a VM instance takes seconds or minutes [161]. Ignoring the time consumed by VM provisioning, as almost all related works do, may lead to QoS violations, e.g., of the deadline constraints of batch workloads or the response time requirements of web services. Thus, evaluating and accounting for the VM provisioning delay, the duration between requesting a VM instance and it being up and running, is essential to service provisioning in clouds. Many variables should be considered when estimating VM provisioning delays [135, 146], e.g., the virtualization technology, the instance type, the VM image loading, the software installation, the network configuration, the time of day, the data center location, etc. The heterogeneity between private and public clouds should also be considered, both in the infrastructure and in the information available to the service provider. Therefore, the evaluation of provisioning delays is still a challenging open problem.

4.5 Workload prediction

To eliminate the negative impact of the VM provisioning delay on workload performance, predicting workload sizes is necessary so that VMs can be provisioned in advance. Thus, the time taken to predict workload sizes must be no longer than the VM provisioning delay. However, few forecasting models are fast enough under such highly dynamic hybrid cloud circumstances [206], which may result in prediction delays and insufficient provisioning to deal with traffic bursts.

4.6 Reliability

The diversity, frequency, and number of failures all increase with the hardware and software complexity of cloud platforms [105, 168]. Failures may increase the penalty costs of service providers by degrading workload performance and thereby violating SLAs, and many service providers have lost substantial revenue because of failures [168]. Yet failures are hard to diagnose, forecast, or repair, owing to the high dynamics of operating clouds, the complex relationships among failures, the different characteristics of various resource/workload reliabilities (e.g., hardware vs. software, data vs. process) [144, 168], etc. Existing works concerning reliability in hybrid clouds simplify the reliability analysis by assuming that the reliability of running a workload on a resource is known. Reliability models applicable to real, comprehensive hybrid cloud environments need to be researched to avoid additional penalty costs for service providers.

4.7 Security

Users, especially enterprise users, have requirements on the security and privacy of their data. Security issues are one of the main factors deterring enterprises from moving their data to a public cloud [155]. Existing related works concerning security or privacy consider two levels of data security, private and public: a workload at the private level cannot be outsourced to the public cloud, i.e., it has a location constraint, while a workload at the public level can be processed in either the public or the private cloud. However, data protection technologies, e.g., encryption algorithms, data integrity auditing, and access control, provide opportunities to outsource some private workloads or data to public clouds; to the best of our knowledge, no research work on hybrid cloud resource management has yet employed them, whether to overcome the lack of private resources for private workloads or data, or to reduce the overhead of moving running workloads from the private to the public cloud to make room for new private workloads. Data protection technologies do consume resources, however, so there is a tradeoff between the overhead of using protection technologies and that of moving running workloads to free some private resources.

4.8 Optimization for hybrid workloads

In many cases, the resource requirements of different workload types are complementary in time and/or amount, e.g., compute-intensive batch jobs vs. network-intensive web services. In production environments, many service providers run mixed services; for example, Google clusters concurrently run long-running services handling short-lived latency-sensitive requests and batch jobs that take from a few seconds to a few days to complete [180, 207]. Thus, resource efficiency can be improved by consolidating heterogeneous workloads with different characteristics, i.e., by concurrently executing hybrid workloads, in hybrid clouds. However, existing related works do not focus on executing hybrid workloads in hybrid clouds, which is one of the most promising directions for improving the profit of service providers.

5 Conclusion

In this paper, we presented a taxonomy to classify research works on resource provisioning and workload scheduling in hybrid clouds according to the factors they consider: the optimization objectives, the constraints, the workload types, the heterogeneity of hybrid resources, the cost models of local and public resources, and the factors of turnaround time considered, and we investigated the current research status based on this taxonomy. Then, we presented several open issues and research directions in hybrid cloud management: the potential of using multiple public clouds, the cost model of hybrid resources, performance evaluation in hybrid clouds, the VM provisioning delay, reliability guarantees, security guarantees, and optimization for hybrid workloads in hybrid clouds. We believe our survey is helpful for both industry and academia interested in hybrid clouds.