1 Introduction

In a cloud computing environment, scheduling a scientific workflow is a crucial problem that is extensively covered in the literature. The issue attracted a lot of attention due to the powerful features provided by cloud environments, which encourage businesses to carry out their scientific workflows in such environments. Among these characteristics, we distinguish the ability of cloud service providers to offer a pool of resources such as computing power, storage, and bandwidth in the form of on-demand services that users can rent via the Internet. Additionally, the cloud computing platform provides rapid elasticity, extensive network access, and measured services [32].

The problem was designated as multi-objective and multi-constrained due to the varying user needs, e.g., some users need to optimize the workflow execution time, while others are concerned with cost optimization or other objectives such as resource utilization and reliability of the system. In addition, a user can express one or more constraints like deadline, budget, security, and others.

Since security is one of the most important user concerns, researchers have constantly sought the most efficient techniques to prevent security-related issues and to ensure privacy and data confidentiality during scheduling. Thus, in the present paper, we introduce a literature review, taxonomy, and comparative analysis of the research works that investigate this problem.

The remainder of this paper is structured as follows. Related work is presented in Sect. 2. In Sect. 3, we provide details about our systematic review process. The main aspects surrounding the workflow scheduling problem are presented in Sect. 4. Additionally, the proposed taxonomy is illustrated in Sect. 5. Then, we summarize the existing secure cloud workflow scheduling approaches in Sect. 6, and we present a comprehensive analysis in Sect. 7 with a highlight of the open issues and challenges in Sect. 8. Finally, Sect. 9 brings the paper to a close.

2 Related Work

Scheduling a scientific workflow in a cloud computing environment is a critical issue that is widely discussed in the literature. In fact, some researchers have tended to provide various heuristic, meta-heuristic, and hybrid approaches to solve the problem while considering different objectives and QoS requirements. However, other researchers have tackled the problem by reviewing, classifying, and analyzing existing solutions.

In this field, certain systematic studies define and assess a wide range of workflow scheduling techniques, classifying them in accordance with various criteria [2, 9, 22, 23, 36]. For example, in [17, 51] the authors concentrated on some of the key workflow scheduling techniques and divided them into static and dynamic scheduling strategies, while in [44] the categorization was done in terms of scheduling criteria, schedule generation, and task-resource mapping. Furthermore, in [31] the authors divided the existing scheduling schemes into task scheduling, workflow scheduling, and task and workflow scheduling schemes. After concentrating on the workflow scheduling schemes, the authors divided them into heuristic and meta-heuristic techniques, and then further divided them based on the type of scheduling algorithm. The existing schemes were similarly categorized into heuristic, meta-heuristic, and hybrid methods by [25]. However, the authors’ focus in [21, 43] was solely on meta-heuristic-based methods for cloud task scheduling, while [4] limited their survey to the existing cost optimization scheduling approaches.

Since security is one of the most important user concerns, researchers have constantly sought the most efficient techniques to solve the problem under security and privacy considerations. However, until now, no study in the literature summarizes and classifies all the existing secure scheduling schemes. Such a study would make it easier for Cloud Service Providers (CSPs) and users to choose the approach best suited to their needs, and would allow researchers to identify the gaps and existing problems and thus provide more effective solutions. The only exception is [18], which investigated secured workflow scheduling systems in the cloud setting. However, that work is limited in the number of studied approaches: it explored only 9 research works, did not provide a critical literature review, and did not mention the limitations of each approach. Consequently, we introduce in the present paper a systematic literature review, taxonomy, and comprehensive analysis of almost all the research works that have investigated the aforementioned problem over the last decade. Additionally, we highlight and discuss limitations, gaps, and problem aspects that are not sufficiently investigated in the literature, as well as future research challenges.

3 Systematic Review Process

In this section, we outline the procedure we used to conduct this review. First, we gathered research papers published in international journals and conferences during the period 2010–2022 from well-known digital libraries such as Google Scholar, Springer LNCS, ScienceDirect (Elsevier), and IEEE Xplore. We used different search keywords, e.g., secure workflow scheduling in the cloud, secure optimization, QoS parameters, etc. Second, we examined the collected papers and discarded those that do not address our problem or that are not published and indexed in well-known databases. Third, we studied and classified the selected papers and extracted the gaps of each reviewed paper, as described in Sect. 6. Fourth, we discussed and analyzed the reviewed approaches, as illustrated in Sect. 7. Finally, we identified the open issues and challenges surrounding the problem, as presented in Sect. 8 (see Fig. 1).

4 Workflow Scheduling in Cloud

In this section, we describe the most important notions surrounding the cloud workflow scheduling problem.

Scientific Workflow: It is a group of computational tasks that reveal dependencies among them in order to address a scientific issue.
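As a minimal sketch of this notion, a workflow is commonly modeled as a directed acyclic graph (DAG) whose nodes are tasks and whose edges are dependencies. The four-task workflow and the helper `topological_order` below are illustrative assumptions, not taken from any reviewed paper; the helper returns one execution order that respects every dependency.

```python
from collections import deque

# Hypothetical four-task workflow: each task maps to the tasks it depends on.
workflow = {
    "t1": [],
    "t2": ["t1"],
    "t3": ["t1"],
    "t4": ["t2", "t3"],
}

def topological_order(deps):
    """Return one valid execution order using Kahn's algorithm."""
    indegree = {task: len(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)
    # Tasks without unfinished parents are ready to run.
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        raise ValueError("dependency cycle: not a valid workflow DAG")
    return order

order = topological_order(workflow)
```

Any scheduler, static or dynamic, has to respect such an order: a task can start only after all of its parent tasks have finished.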

Cloud Environment: Cloud service providers can offer their services in public, private, community, or hybrid environments [32].

  • Public Cloud is a computing environment that is open for use by the general public. A corporation, academic institution, government agency, or a combination of them may own, manage, and operate the cloud infrastructure, which exists on the premises of the cloud service provider [32].

  • Private Cloud is a computing environment that allows one company to access resources privately. The business, a third party, or a combination of them may own, manage, and operate the cloud infrastructure, which is present on or off the cloud service provider’s premises [32].

  • Community Cloud is a computing environment that allows a community of organizations to access resources privately. The cloud infrastructure may be owned, managed, and maintained by one or more community groups, a third party, or a mix of them. It may exist on or off the cloud service provider’s premises [32].

  • Hybrid Cloud is a computing environment that combines two or more distinct cloud computing environments (public, private, or community) that are linked by standardized or proprietary technology that enables portability of data and applications [32].

Fig. 1 Systematic review process

Scheduling Strategy: Workflow scheduling strategies fall into three categories (static, dynamic, and combined static and dynamic) depending on the information about the workflow and the resources that is available during the scheduling process [36, 51].

  • Static Scheduling: this type of scheduling is completed before the workflow execution begins. The advantage of this strategy is that it produces high-quality solutions, as it can compare several feasible solutions to obtain the best fit. However, its disadvantage is that it relies on estimates for missing information such as task execution time and communication time, which may be inaccurate or poorly suited to the real system [36, 49, 51].

  • Dynamic Scheduling: this type of scheduling is done during the execution of the workflow. The advantage of this strategy is that it adapts to real systems, as it takes into account unexpected events that may occur during execution and relies on exact information about the workflow and resources. However, its disadvantage is that it cannot produce high-quality solutions, because it cannot try several feasible solutions and choose the best one [36, 49, 51].

  • Static and Dynamic: this category combines the two aforementioned scheduling strategies to achieve their advantages simultaneously, where a static mapping is performed before the start of execution based on approximate estimations. Then, during the execution, the assignment is adapted and redone if necessary [36, 51].

Figure 2 shows the scheduling strategies with their advantages and disadvantages.

Fig. 2 Scheduling strategies

Workflow Scheduling Objectives and constraints: Usually, the workflow scheduling process has some objectives to achieve and QoS constraints to meet, where these objectives and constraints express the user’s requirements. In the following, we explain these concepts.

4.1 Scheduling Objectives

In the literature, researchers have introduced various scheduling objectives, but the most studied are makespan and cost besides other objectives like resource utilization, energy consumption, load balancing, reliability of the system, throughput, etc.

  • Makespan is the time passed between the beginning of the first workflow task’s execution and the end of the last workflow task [10, 25].

  • Cost is the price that the user has to pay to run his workflow [11].

  • Resource Utilization refers to the optimal use of resources by minimizing time slots of resource inactivity [10].

  • Energy consumption refers to the consumed energy by the cloud servers including the use of electricity and the carbon emissions [2].

  • Load Balancing refers to the balanced distribution of workloads between different computing resources to avoid overloading one of the resources [5, 12].

  • Reliability is the system’s capacity to deliver service for a specific time frame without interruption or failure [39].

  • Throughput is the number of tasks completed in a predetermined period [33].
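The definitions above can be made concrete with a small sketch. The schedule format `(task, vm, start_time, finish_time)`, the VM names, and the per-second prices are assumptions introduced for illustration; in particular, charging only for busy time is a simplification, since real providers typically bill by lease interval.

```python
# Hypothetical schedule: (task, vm, start_time, finish_time), times in seconds.
schedule = [
    ("t1", "vm_a", 0.0, 10.0),
    ("t2", "vm_a", 10.0, 25.0),
    ("t3", "vm_b", 10.0, 30.0),
    ("t4", "vm_a", 30.0, 40.0),
]
# Assumed price per second for each VM (illustrative values only).
price_per_second = {"vm_a": 0.5, "vm_b": 0.25}

def makespan(entries):
    """Time between the first task's start and the last task's finish."""
    return max(f for _, _, _, f in entries) - min(s for _, _, s, _ in entries)

def cost(entries, prices):
    """Busy time on each VM multiplied by that VM's price, summed over tasks."""
    return sum((f - s) * prices[vm] for _, vm, s, f in entries)

def utilization(entries):
    """Average fraction of the makespan that each used VM spends executing tasks."""
    span = makespan(entries)
    busy = {}
    for _, vm, s, f in entries:
        busy[vm] = busy.get(vm, 0.0) + (f - s)
    return sum(b / span for b in busy.values()) / len(busy)

def throughput(entries, window):
    """Number of tasks finished within the first `window` seconds."""
    return sum(1 for _, _, _, f in entries if f <= window)
```

For this schedule, `makespan(schedule)` is 40.0 seconds and `cost(schedule, price_per_second)` is 22.5 price units, which shows how the two most studied objectives are computed from the same task-to-resource mapping.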

4.2 User’s Constraints

In the literature, researchers have introduced various scheduling QoS constraints, but the most expressed are deadline, budget, and security.

  • Deadline is the maximum time a user can wait to complete the execution of his workflow.

  • Budget is the maximum price a user can pay to run his workflow in a cloud environment.

  • Security defines the user’s needs in terms of security services and scheduling policies to ensure high protection of his sensitive tasks and data.
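The three constraints can be read as a feasibility test on a candidate schedule. The sketch below assumes a simple model in which each resource has an integer security level and each sensitive task states a minimum level; the `Candidate` class and `is_feasible` function are hypothetical names, not an interface from the reviewed works.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Summary of one candidate schedule (all fields are illustrative)."""
    makespan: float    # total execution time
    cost: float        # total monetary cost
    placement: dict    # task -> security level of the resource it runs on

def is_feasible(candidate, deadline, budget, required_level):
    """Check the deadline, the budget, and a per-task security-level constraint."""
    if candidate.makespan > deadline:
        return False
    if candidate.cost > budget:
        return False
    # Tasks without a stated requirement default to level 0 (no constraint).
    return all(
        level >= required_level.get(task, 0)
        for task, level in candidate.placement.items()
    )

candidate = Candidate(makespan=40.0, cost=22.5, placement={"t1": 2, "t2": 1})
```

With a deadline of 60 and a budget of 30, this candidate is feasible when only `t1` requires level 2, and infeasible when `t2` also requires level 2.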

Workflow Scheduling Algorithms: Generally, researchers have introduced heuristics, meta-heuristics, and hybrid algorithms to fix the cloud workflow scheduling issues.

  • Heuristics are generally used to quickly find satisfactory solutions.

  • Meta-Heuristics are generally used to find satisfactory solutions for large-scale problems [15].

  • Hybrid algorithms combine two or more heuristic or meta-heuristic algorithms to gain more advantages simultaneously.
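To illustrate what a heuristic looks like in this setting, the following sketch implements a simple greedy list scheduler: each task, taken in dependency order, is assigned to the VM on which it would finish earliest. This is a generic illustration under simplifying assumptions (no data transfer time, no security constraints), not an algorithm from any reviewed paper; the task runtimes and VM names are hypothetical.

```python
def greedy_schedule(deps, runtime, vms):
    """Assign each task, in dependency order, to the VM where it finishes earliest.

    deps:    task -> list of parent tasks (keys assumed to be in a valid order)
    runtime: task -> {vm: execution time on that vm}
    """
    vm_free = {vm: 0.0 for vm in vms}   # time at which each VM becomes idle
    finish = {}                         # task -> finish time
    assignment = {}                     # task -> chosen VM
    for task, parents in deps.items():
        # A task is ready once all of its parents have finished.
        ready = max((finish[p] for p in parents), default=0.0)
        best_vm = min(vms, key=lambda vm: max(ready, vm_free[vm]) + runtime[task][vm])
        start = max(ready, vm_free[best_vm])
        finish[task] = start + runtime[task][best_vm]
        vm_free[best_vm] = finish[task]
        assignment[task] = best_vm
    return assignment, max(finish.values())

deps = {"t1": [], "t2": ["t1"], "t3": ["t1"], "t4": ["t2", "t3"]}
runtime = {
    "t1": {"fast": 2.0, "slow": 4.0},
    "t2": {"fast": 3.0, "slow": 6.0},
    "t3": {"fast": 3.0, "slow": 5.0},
    "t4": {"fast": 1.0, "slow": 2.0},
}
assignment, total_time = greedy_schedule(deps, runtime, ["fast", "slow"])
```

The heuristic makes one pass and never revisits a decision, which is why it is fast but may miss better schedules; meta-heuristics such as PSO or GA instead explore many candidate assignments.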

5 Taxonomy of Secured Workflow Scheduling Approaches

In this paper, we have reviewed about 32 scientific papers dealing with the secure workflow scheduling problem in cloud computing. These papers were selected from the period 2010–2022 using well-known digital libraries such as Google Scholar, Springer LNCS, ScienceDirect (Elsevier), and IEEE Xplore. While reviewing the papers, we noticed that they can be classified according to the aspects described in Sect. 4, in addition to two further parameters that we extracted while reviewing: the first relates to the security policy they use and the second to the security level they target. Fig. 3 illustrates the taxonomy we propose for secure workflow scheduling approaches in cloud environments.

5.1 Security Policy

The research works we have reviewed use different policies to meet the user requirements in terms of security. Among them, we distinguish two policies: task/data placement policy and security-enhanced scheduling policy.

Task/Data Placement Policy: This category includes the studies that assume the CSP offers secure resources, e.g., virtual machines that are pre-provisioned with a certain level of security services. In the case of the hybrid cloud, they assume that the private cloud is secure. Therefore, they propose data/task placement policies so that sensitive data/tasks are mapped to secure resources.

Security-Enhanced Scheduling Policy: This category includes the set of studies assuming that the user has some security requirements, and the available cloud resources cannot meet these requirements, so they enhance the scheduling approach with some hash functions, encryption/decryption algorithms, and security services like authentication, confidentiality, and integrity services.
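The task/data placement policy can be sketched as a filter followed by a selection: sensitive tasks may only use resources assumed to be secure, and each task then takes the cheapest resource it is allowed to use. The resource list, the `secure` flag, the prices, and the function `place` are illustrative assumptions rather than the method of any particular reviewed paper.

```python
# Hypothetical resources in a hybrid cloud; `secure` marks resources assumed trusted.
resources = [
    {"name": "private_vm", "secure": True, "price": 0.8},
    {"name": "public_vm_1", "secure": False, "price": 0.3},
    {"name": "public_vm_2", "secure": False, "price": 0.5},
]
tasks = {"t1": True, "t2": False, "t3": True}   # task -> is it sensitive?

def place(tasks, resources):
    """Map each task to the cheapest resource allowed by its sensitivity."""
    mapping = {}
    for task, sensitive in tasks.items():
        # Sensitive tasks are restricted to secure resources; others may go anywhere.
        allowed = [r for r in resources if r["secure"] or not sensitive]
        if not allowed:
            raise ValueError(f"no secure resource available for {task}")
        mapping[task] = min(allowed, key=lambda r: r["price"])["name"]
    return mapping

mapping = place(tasks, resources)
```

A security-enhanced policy would differ at the filtering step: instead of excluding public resources, it would allow them and add the time and cost overhead of the required security services (for example, encryption) to the task.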

5.2 Security Level

The research works we have reviewed can be classified according to the level of security they target in three categories: task level, data level, and task and data level.

Task Level: In this category, the scheduler must preserve sensitive tasks, either by adding security services or by implementing placement strategies so that sensitive tasks will be protected.

Data Level: In this category, the scheduler must preserve sensitive data, either by adding security services or by implementing placement strategies so that sensitive data will be protected.

Task and Data Level: In this category, the scheduler must preserve both tasks and data simultaneously by combining the two aforementioned strategies.

Fig. 3 Taxonomy of secured workflow scheduling techniques in cloud

6 Summary of Secured Workflow Scheduling Approaches

In this section, we list and summarize the existing secure workflow scheduling approaches. In addition, we classify them according to the taxonomy proposed in the previous section (Tables 1 and 2). Furthermore, we critically assess the existing approaches and extract the disadvantages of each reviewed paper, as shown in Table 3, in order to identify the gaps and help researchers improve this research area and provide more effective solutions.

6.1 Task/Data Placement Policy

In the literature, various studies have proposed task/data placement strategies to secure workflow scheduling, so that sensitive tasks/data will be mapped to the most secure resources without affecting the temporal and monetary cost of the workflow scheduling, as described in Sect. 5.1. Table 1 summarizes and classifies the existing studies regarding the taxonomy proposed in the previous section.

  • Xiaoyong et al. [52] offered a Security Driven Scheduling (SDS) algorithm for heterogeneous distributed systems in order to meet task security requirements with minimum makespan. The SDS algorithm measures the trust level of system nodes dynamically, then provides a secure scheduling list that considers security overhead and risk probability.

  • Liu et al. [28] presented a Variable Neighborhood Particle Swarm Optimization (PSO) (VNPSO) to solve the scheduling issues for workflow applications in data-intensive distributed computing environments, where the goal of the schedule was to reduce the makespan while maintaining security requirements. To evaluate the proposed VNPSO, they compared it with Multi-Start Genetic Algorithm (GA) (MSGA) and Multi-Start PSO (MSPSO) and they found that VNPSO is the most feasible and efficient.

  • Marcon et al. [30] proposed a scheduling approach in order to reduce the tenant cost while preserving the security and time requirements in a hybrid cloud.

  • Jianfang et al. [24] utilized a cloud model to assess the security level of tasks and resources as well as the user’s degree of security satisfaction. They developed a Cloud Workflow Discrete PSO (CWDPSO) algorithm to overcome the security issues while scheduling workflows in the cloud. The scheduling goals were to accelerate the execution, reduce the monetary cost, and preserve security requirements.

  • Liu et al. [29] proposed a security-aware placement strategy based on the ACO algorithm, which uses dynamic selection to choose the best data centers for intermediate data while taking data transmission time into account.

  • Zeng et al. [54] introduced the concept of an immovable dataset, which limits the transfer of some information for economic and security reasons. In order to offer higher security services and a quicker response time, they also recommended a security and budget-constrained workflow scheduling approach (SABA).

  • Chen et al. [13] presented a Genetic Algorithm (GA)-based technique to reduce the processing cost and preserve data privacy while scheduling data-intensive workflow applications in the cloud. They considered that private data can only be placed and scheduled in a specified data center and cannot be transmitted to or duplicated from other data centers during the scheduling process.

  • Sharif et al. [37] presented the MPHC algorithm with three policies (MPHC-P1/P2/P3) to address task/data privacy and time requirements while reducing workflow execution costs. Regarding privacy protection, they applied and studied two different policies: Multi-terminal Cut and Bell-LaPadula.

  • Li et al. [27] introduced a PSO-based security and cost-aware scheduling (SCAS) algorithm, where the scheduling objective was to reduce monetary cost and meet the user’s deadline and risk rate constraints.

  • Prince et al. [35] proposed a hybrid workflow scheduling algorithm that combines the HEFT and SABA algorithms, in order to reduce the workflow execution time under deadline, budget, and security limitations.

  • Shishido et al. [41] studied the impact of three meta-heuristic algorithms, Particle Swarm Optimization (PSO), GA, and Multi Population GA (MPGA), on the cloud workflow scheduling problem; they aimed to minimize cost and meet the user’s deadline and risk rate constraints. The experimental results show that in terms of cost and time, GA-based algorithms outperform PSO-based algorithms.

  • Sujana et al. [45] proposed a PSO-based approach to overcome the secure workflow scheduling issue. They used a Smart PSO algorithm to minimize cost and time and meet security requirements, and they relied on a Variable Neighborhood PSO algorithm to fix the local optima issue.

  • Thanka et al. [47] suggested a more effective Artificial Bee Colony (ABC) algorithm to reduce time and cost while maintaining the risk rate restriction. According to the experimental findings, the algorithm guarantees security and is better than other similar algorithms in terms of cost, time, and task migration throughout the schedule.

  • Bidaki et al. [8] suggested a Symbiotic Organism Search (SOS)-based method to reduce the processing time and cost while maintaining security. The simulations demonstrate that in terms of cost, makespan, and level of security, the SOS-based method performs better than the PSO-based method.

  • Naidu and Bhagat [34] suggested a Modified-PSO with a Scout-Adaption (MPSO-SA) scheduling approach to reduce workflow execution time under security restrictions. In order to preserve the security constraint, they schedule the workflow tasks in three modes (secure mode, risky mode, and gamma-risky mode) according to the task security requirements. The simulation findings demonstrate that MPSO-SA offers a more cost-effective, low-security risk alternative to GA, PSO, and VNPSO.

  • Arunarani et al. [6] presented the FFBAT algorithm that hybridizes the Firefly and BAT algorithms to reduce the cost of workflow execution while satisfying deadline and risk restrictions. The simulation results demonstrate that, in terms of cost, time, and risk level, FFBAT is preferable to Firefly and BAT algorithms.

  • Xu et al. [53] proposed a data placement method to save costs and energy consumption in the hybrid cloud environment, where the scheduling goals were to retain sensitive data while reducing energy use in the private cloud and financial costs in the public cloud.

  • Shishido et al. [42] suggested a Workflow Scheduling Task Selection Policy (WS-TSP), where they used an MPGA to reduce the execution cost while maintaining the deadline.

  • Swamy and Mandapati [46] proposed a fuzzy logic algorithm to address the dynamic cloud workflow scheduling problem while minimizing energy consumption and preserving the user’s security requirements. According to the simulation findings, the algorithm outperforms Max-Min and Min-Min algorithms in terms of makespan, resource utilization, and degree of imbalance.

  • Wen et al. [50] proposed a Multi-Objective Privacy-Aware (MOPA) workflow scheduling system, which intended to reduce cost while maintaining the privacy protection requirement. The algorithm uses the Pareto optimality technique to determine the trade-off between the scheduling objectives.

  • Abdali et al. [3] suggested a hybrid meta-heuristic scheduling technique that combines the Chaotic PSO algorithm and the GA algorithm to minimize cost and load balance deviation while satisfying risk rate and user constraints.
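Several of the approaches above handle more than one objective, and Wen et al. [50] explicitly rely on Pareto optimality to trade them off. The sketch below shows the underlying idea only: given candidate schedules scored on objectives to be minimized, it keeps those that no other candidate dominates. It is not the MOPA algorithm itself, and the (cost, makespan) values are hypothetical.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and better in at least one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (cost, makespan) pairs of candidate schedules.
candidates = [(10, 50), (12, 40), (15, 45), (20, 30), (11, 55)]
front = pareto_front(candidates)
```

Here `(15, 45)` is dropped because `(12, 40)` is both cheaper and faster, and `(11, 55)` is dropped because `(10, 50)` is; the three remaining points are the trade-off solutions among which a user or a further selection rule would choose.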

6.2 Security-Enhanced Scheduling Policy

In the literature, various studies have proposed security-enhanced scheduling strategies, where they proposed optimized scheduling schemes that meet the scheduling objectives and QoS constraints, then they add some security service to preserve the task/data privacy as described in Sect. 5.1. Table 2 summarizes and classifies the existing studies regarding the taxonomy proposed in the previous section.

  • Zhu et al. [55] provided a Security-Aware Workflow Scheduling algorithm (SAWS) to save cost and time, enhance resource usage of virtual machines, and maintain security measures. To achieve these objectives and meet the constraints, the approach takes advantage of task slack time, intermediate data encryption, and selective task duplication to fill empty slots.

  • Chen et al. [14] extended the Zhu et al. [55] work, where they presented a scheduling approach with selective tasks duplication, named SOLID, which is a developed version of the SAWS algorithm. The simulation results show that SOLID is more efficient in terms of makespan, monetary costs, and resource efficiency compared to existing similar algorithms.

  • Hammouti et al. [20] established a new workflow scheduling strategy for hybrid cloud to provide clients high security systems at minimal cost and time. The proposed strategy consists of three modules: the Pre-Scheduler, which assigns each task or dataset to be executed or stored in either the private or public cloud; the Security Enhancement Module, which adds the dataset’s necessary security services while minimizing the generated cost overhead; and the Post-Scheduler, which assigns each task or dataset to be executed or stored in the appropriate VM. The results of the experiment demonstrate that the suggested technique maintains the same cost but increases the execution time.

  • Dubey et al. [16] created a new Management System for supplying a Community cloud with multiple organizations (MSMC). They proposed a new cloud framework enhanced with some security services, in addition to three algorithms: the first is an allocation method that effectively divides the available VMs across different companies with a variety of employees; the second is a scheduling algorithm, called Ideal Distribution Algorithm (IDA), that takes into account cost and time restrictions; and the third is an Enhanced IDA (EIDA) algorithm to improve the workload balance.

  • Abazari et al. [1] developed a scheduling technique to reduce makespan and satisfy the task’s security criteria. Moreover, they enhanced the system security by introducing a novel attack response strategy to lessen some cloud security vulnerabilities.

  • Hammed and Arunkumar [19] proposed a new cloud workflow scheduling approach to minimize cost and time and protect sensitive data. The proposed approach consists of four phases: categorization of tasks, scheduling of tasks, resource allocation, and security provisioning. The first phase categorizes the workflow tasks into highly sensitive and less sensitive based on user requirements, the second and third phases are concerned with workflow scheduling and resource allocation at minimum cost and time, and the fourth phase provides security provisioning only for the sensitive tasks identified in the first phase.

  • Wang et al. [48] proposed a Task Scheduling method concerning Security (TSS) in hybrid clouds, where the scheduling objectives are to maximize the number of completed tasks and to minimize the total cost of renting public resources while meeting the user’s security and deadline requirements. Concerning security, they assume that the private cloud is secure, but for tasks assigned to the public cloud, they provide authentication, integrity, and confidentiality services at different levels according to the user requirements. In addition, the TSS algorithm can control the overhead generated by the security services, minimize the total cost, and meet the deadline constraint.

  • Shishido et al. [40] proposed a scheduling strategy employing a multi-population genetic algorithm to save costs and meet the user-defined deadline. In addition, they introduced a user annotation technique for workflow tasks according to sensitiveness, and subsequently examined the effects of using security services for sensitive tasks. The simulation findings demonstrate that, when compared to existing techniques in the literature, the proposed method can more effectively protect sensitive tasks at a lower cost.

  • Lei et al. [26] developed a hybrid encryption technique using hash functions to maintain data security while data moves between multiple clouds, and offered a novel privacy- and security-aware scheduling model. Then, in order to minimize the cost within the restrictions of deadline and privacy, they introduced a simulated annealing method and a privacy- and security-aware list scheduling algorithm.

  • Bal et al. [7] proposed a combined Resource Allocation security with an efficient Task Scheduling algorithm in cloud computing using a Hybrid Machine learning (RATS-HM) technique. It first schedules tasks with minimum makespan and maximum throughput using an improved cat swarm optimization algorithm, and then allocates resources under bandwidth and resource load constraints using a group optimization-based deep neural network. Finally, an authentication method is suggested for data encryption to secure data storage.

Table 1 Secured workflow scheduling approaches with task/data placement policy
Table 2 Secured workflow scheduling approaches with security-enhanced scheduling policy

Table 3 summarizes the gaps and disadvantages we found while reviewing the papers, which may help researchers to develop, improve, and cover this research field and provide more effective solutions.

Table 3 Gaps of secure workflow scheduling papers

7 Analysis and Discussion

In the present paper, we reviewed about 32 secure scheduling approaches in cloud computing as shown in Tables 1 and 2. In this section, we provide a comprehensive analysis and discussion of the reviewed approaches regarding different aspects, including cloud environment, scheduling strategy, security levels, proposed algorithms, and scheduling objectives as illustrated in Fig. 4.

Fig. 4 Secured cloud scheduling approaches

Cloud Environment. Figure 4a illustrates that the public cloud gains the interest of about 75% of the approaches reviewed, while 22% focus on the hybrid cloud. However, the community cloud is not yet sufficiently studied.

Scheduling Strategy. Figure 4b indicates that most of the reviewed studies (84%) use static scheduling strategies, while 13% use dynamic strategies and only 3% use static and dynamic strategies, even though the hybrid strategy is more useful than the others.

Security Level. Figure 4c shows that 58% of the reviewed studies aim to secure tasks, while 35% aim to secure data and only 7% aim to secure both tasks and data simultaneously.

Proposed Algorithm. Figure 4d demonstrates that 50% of the reviewed studies proposed meta-heuristic algorithms to solve the problem, while 41% of them proposed heuristic algorithms and only 9% proposed hybrid algorithms.

Optimization problem and Scheduling Objectives. From Fig. 4e, we notice that 62% of the reviewed studies investigated single-objective optimization problems, while 38% considered multi-objective optimization problems. Concerning the scheduling objectives, we observe that Cost and Makespan are the most targeted objectives.

Figure 5 indicates that the problem of secure workflow scheduling has not been discussed sufficiently in the past decade and that most of the papers reviewed were published in 2017. In contrast, the problem received little attention during 2010–2014 and 2020–2022.

Fig. 5 Secured cloud scheduling papers per year

8 Open Issues and Challenges

From the previous sections, we conclude that the problem of secure workflow scheduling in cloud computing is not yet sufficiently discussed in the literature. Consequently, we extract the following issues and challenges:

  • Introduce dynamic scheduling strategies, as they are more adaptable to real systems and take more parameters into consideration.

  • Introduce hybrid scheduling strategies (static and dynamic), as they provide high-quality solutions and they are adaptable to the real systems at the same time.

  • Provide scheduling approaches that aim to secure both task and data, in order to improve and ensure the security of all the workflow.

  • Provide more secure scheduling approaches that address the hybrid and community clouds as they are insufficiently studied.

  • Introduce secure scheduling approaches that address more scheduling objectives, e.g., energy, load balancing, etc.

  • Focus on multi-objective optimization approaches, as they aim to provide trade-off solutions between different scheduling objectives.

  • Provide more robust security-enhanced scheduling policies to keep tasks and data secure simultaneously.

9 Conclusion

In this paper, we reviewed the existing secure workflow scheduling strategies, outlined important aspects surrounding the problem, and introduced a taxonomy to categorize existing research papers. Moreover, we provided a comprehensive analysis, from which we derived gaps, issues, and challenges that can help researchers improve this research area and provide more effective solutions. In future studies, we can investigate the aforementioned issues and challenges.