1 Introduction

As the Internet of Things (IoT) continues to encompass a wide spectrum of devices and sensors, an unprecedented volume and variety of data is generated at staggering speeds, often requiring processing in real time, within strict time constraints [7, 10, 23]. Healthcare and traffic monitoring IoT devices and sensors are examples of such time-critical cases [9, 19, 30]. Often the IoT data are processed by real-time jobs, consisting of tasks with precedence constraints among them, forming a workflow, where the output data of a task are used as input by other tasks. A task in the workflow without any parent tasks is called an entry task, whereas a task without any child tasks is called an exit task [5, 25, 26, 27].

Due to the explosive growth of the IoT, fog computing emerged as a new paradigm, complementing cloud computing. The fog extends the cloud to the network edge, close to where the IoT data are generated, instead of sending vast amounts of IoT data to the cloud, as physical proximity affects end-to-end latency [2, 6, 11]. Typically, only selected data are sent to the cloud, for example for further historical analysis. The heterogeneous computational resources in the fog can be virtualized, as in the case of cloud computing. Consequently, a fog node can be a virtual machine (VM) [16]. However, the number and computational capacity of the resources in a fog platform are typically limited, compared to those in a cloud environment.

1.1 Motivation

As the volume, variety and velocity of the generated IoT data continue to increase, their real-time processing requires resources with higher computational capacity than those traditionally found in a fog environment. On the other hand, the computational capacity of the cloud is virtually unlimited, but entails a higher communication latency, as well as monetary cost [17].

Consequently, the workload operating on IoT data should be appropriately distributed to resources in both the fog and cloud layers, taking into account the computational and communication characteristics of each individual job [14]. The workload orchestration in such a framework requires the utilization of an effective and dynamic fog and cloud-aware scheduling heuristic. This is especially crucial in time-critical environments, such as traffic control systems [13].

1.2 Contribution

Towards this direction, in this paper we propose a hybrid fog and cloud-aware heuristic for the dynamic scheduling of multiple real-time IoT workflows in a three-tiered architecture. In contrast to traditional approaches where the main processing of IoT jobs is performed in the fog layer, our approach attempts to schedule computationally demanding tasks with low communication requirements in the cloud and communication intensive tasks with low computational demands in the fog, utilizing possible gaps in the schedule of the fog and cloud VMs. Furthermore, during the scheduling process, our approach takes into account the communication cost incurred by the transfer of data from the sensors and devices in the IoT layer to the VMs in the fog layer.

The remainder of the paper is organized as follows: Section 2 provides an overview of related literature. Section 3 presents the system and workload models, as well as the employed cloud pricing scheme. Section 4 describes the proposed scheduling heuristic, while Section 5 describes the performance metrics and the experimental setup, and analyzes the simulation results. Finally, Section 6 summarizes and concludes the paper.

2 Related work

One of the most well-known workflow scheduling techniques in heterogeneous environments is the Heterogeneous Earliest Finish Time (HEFT) strategy, proposed by Topcuoglu et al. [29]. It consists of a task selection phase and a processor selection phase. During the task selection phase, tasks are prioritized according to their position in the workflow graph and the task with the highest priority is selected. Subsequently, in the processor selection phase, the selected task is assigned to the processor that can provide it with the earliest finish time, utilizing idle time slots in the processor’s schedule. In [1], Arabnejad and Barbosa propose the Predict Earliest Finish Time (PEFT) strategy, which is essentially an enhanced version of the HEFT policy. It introduces a look-ahead feature based on an optimistic cost table. The authors show that their proposed approach outperforms HEFT in terms of the scheduling length ratio metric.

In [12], Jiang et al. present a novel clustering algorithm, the Path Clustering Heuristic with Distributed Gap Search (PCH-DGS), for the scheduling of multiple workflows in a heterogeneous cloud. The tasks of a workflow are partitioned into groups in an attempt to minimize the communication cost between them. The proposed method tries to insert each group of tasks into the first available gap in a processor’s schedule. In case the time gap cannot accommodate all of the tasks of the group, the rest of the group’s tasks are inserted into the next available gap in the schedule of the same or another processor in the system, in a recursive manner. Even though all of the above algorithms are suitable for scheduling workflows in a heterogeneous environment, they do not consider the characteristics of a fog and cloud architecture. More importantly, they are static and do not take into account any timing constraints.

Scheduling in hybrid fog and cloud environments has been attracting more and more attention [4, 20, 28]. A workload allocation approach in a fog-cloud architecture is proposed in [8] by Deng et al. The authors investigate the tradeoff between power consumption and transmission delay in the two-tiered architecture. Their approach attempts to determine the optimal workload allocation between the fog and cloud layers, based on these two factors. Based on simulation experiments and analytical solutions, it is shown that by sacrificing modest computational resources in order to save communication bandwidth and reduce transmission latency through the proposed approach, fog computing can significantly improve the performance of cloud computing. However, the proposed approach cannot be applied to workflow applications, as data dependencies and precedence constraints are not considered between the tasks of the workload.

In [15], Nan et al. propose an online algorithm, called Unit-slot Optimization, for scheduling applications in a three-tiered architecture, consisting of an IoT layer, a fog layer and a cloud layer. The fog nodes do not involve any monetary cost for processing the applications, but have limited computational capacity. On the other hand, the cloud nodes are more computationally capable, but involve monetary cost. A portion of the applications that arrive at the fog layer are offloaded to the cloud layer in an attempt to find a balance between the average application response time and the average monetary cost. The proposed approach dynamically adjusts the tradeoff between these two factors, based on the technique of Lyapunov optimization. It is shown that the proposed approach can provide cost-effective processing, while guaranteeing average response time. However, even though the response time of the applications is considered in this work, no real-time constraints (i.e. deadlines) are taken into account. Furthermore, the proposed policy is not suitable for scheduling workflow applications, as no inter-task dependencies are considered.

On the other hand, a first attempt at workflow scheduling based on collaboration between cloud and fog computing is presented by Pham et al. in [18]. The major objective of the proposed heuristic, Cost-Makespan aware Scheduling (CMaS), is to achieve a tradeoff between the application time and the cost of the use of cloud resources, under user-defined constraints. Even though this approach is both fog and cloud-aware and is suitable for real-time workflows, utilizing idle time slots during the scheduling process, it exhibits the following drawbacks:

  • It is static and thus not practically suitable for the dynamic nature of IoT applications.

  • It only considers a single workflow for scheduling.

  • During the scheduling process, it does not take into account the communication cost incurred by the transfer of data from the IoT layer.

On the contrary, the fog and cloud-aware heuristic proposed in this paper is suitable for the dynamic scheduling of multiple real-time workflows, utilizing possible schedule gaps. Furthermore, during the scheduling process, it takes into account the communication cost incurred by the transfer of data from the sensors and devices in the IoT layer to the VMs in the fog layer.

3 Problem formulation

3.1 System model

The three-tiered environment under study is depicted in Fig. 1. The IoT layer consists of sensors and devices that transmit data through a WiFi or cellular (4G/LTE) network to the fog. The fog layer consists of a set of fog nodes. Specifically, the fog environment has an underlying infrastructure that consists of a set \(\mathcal {H}^{\text {fog}}=\{ host_{1}^{\text {fog}},..., host_{h}^{\text {fog}} \}\) of \(h^{\text {fog}}\) physical hosts with heterogeneous processors, connected via Ethernet. Each host \(host_{i}^{\text {fog}}\) has a multi-core processor that consists of \(c_{i}^{\text {fog}}\) identical cores. There is a set \(\mathcal {V}^{\text {fog}}=\{ vm_{1}^{\text {fog}},...,vm_{v}^{\text {fog}} \}\) of \(v^{\text {fog}}\) VMs in the fog, where each VM is assigned a virtual CPU (vCPU).

Fig. 1 The IoT, fog and cloud layers of the architecture under study

The cloud layer consists of a set of reserved dedicated cloud nodes. Specifically, there is a set \(\mathcal {H}^{\text {cloud}}=\{ host_{1}^{\text {cloud}},..., host_{h}^{\text {cloud}} \}\) of \(h^{\text {cloud}}\) physical hosts with heterogeneous processors, connected via Ethernet. Each host \(host_{i}^{\text {cloud}}\) has a multi-core processor that consists of \(c_{i}^{\text {cloud}}\) identical cores. There is a set \(\mathcal {V}^{\text {cloud}}=\{ vm_{1}^{\text {cloud}},...,vm_{v}^{\text {cloud}} \}\) of \(v^{\text {cloud}}\) VMs in the cloud, where each VM is assigned a virtual CPU (vCPU). The vCPU of a fog or cloud VM corresponds to a physical core of the respective host. The operating frequency \(f_{i}\) of a VM \(vm_{i}\) corresponds to the operating frequency of its assigned physical core. All of the cores in the fog and cloud layers require the same number of clock cycles per instruction.

The data transfer rate between the IoT devices and sensors and the fog layer is denoted by \(l^{\text {IoT}}\) and is uniformly distributed in the range \(\left [\overline {l^{\text {IoT}}}\cdot \left (1-L^{\text {IoT}}/2\right ),\overline {l^{\text {IoT}}}\cdot \left (1+L^{\text {IoT}}/2\right )\right ]\), where \(L^{\text {IoT}}\) is the heterogeneity degree of the network that connects the IoT layer and the fog layer, whereas \(\overline {l^{\text {IoT}}}\) is the mean data transfer rate between the two layers.

The VMs in the fog and cloud layers are fully connected by a virtual network that connects the two layers over the Internet (e.g. through a site-to-site VPN). The data transfer rate between two fog VMs \(vm_{i}^{\text {fog}}\) and \(vm_{j}^{\text {fog}}\) is denoted by \(l_{ij}^{\text {fog}}\) and is uniformly distributed in the range \(\left [\overline {l^{\text {fog}}}\cdot \left (1-L^{\text {fog}}/2\right ),\overline {l^{\text {fog}}}\cdot \left (1+L^{\text {fog}}/2\right )\right ]\), where \(L^{\text {fog}}\) is the heterogeneity degree of the virtual network in the fog layer, whereas \(\overline {l^{\text {fog}}}\) is the mean data transfer rate of the respective communication links.

On the other hand, the data transfer rate between two cloud VMs \(vm_{i}^{\text {cloud}}\) and \(vm_{j}^{\text {cloud}}\) is denoted by \(l_{ij}^{\text {cloud}}\) and is uniformly distributed in the range \(\left [\overline {l^{\text {cloud}}}\cdot \left (1-L^{\text {cloud}}/2\right ),\overline {l^{\text {cloud}}}\cdot \left (1+L^{\text {cloud}}/2\right )\right ]\), where \(L^{\text {cloud}}\) is the heterogeneity degree of the virtual network in the cloud layer, whereas \(\overline {l^{\text {cloud}}}\) is the mean data transfer rate of the respective communication links. Finally, the data transfer rate between a fog VM \(vm_{i}^{\text {fog}}\) and a cloud VM \(vm_{j}^{\text {cloud}}\) is denoted by \(l_{ij}^{\text {inter}}\) and is uniformly distributed in the range \(\left [\overline {l^{\text {inter}}}\cdot \left (1-L^{\text {inter}}/2\right ),\overline {l^{\text {inter}}}\cdot \left (1+L^{\text {inter}}/2\right )\right ]\), where \(L^{\text {inter}}\) is the heterogeneity degree of the virtual network that connects the two layers, whereas \(\overline {l^{\text {inter}}}\) is the mean data transfer rate of the respective communication links. It is noted that the superscript indicators in the variable names are used in order to differentiate between the variables corresponding to each layer. There is a fog and cloud-aware central scheduler running on a dedicated host in the fog layer that is responsible for scheduling the tasks to the VMs in the fog and the cloud.
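The same link-rate model is used for all four networks (IoT, fog, cloud and inter-layer): a rate is drawn uniformly around a mean, with the heterogeneity degree controlling the width of the range. A minimal Python sketch of this sampling follows. The function name, the seed and the mean rate of 1.0 (in arbitrary units) are illustrative assumptions, not values taken from the paper.

```python
import random


def sample_transfer_rate(mean_rate: float, heterogeneity: float,
                         rng: random.Random) -> float:
    """Draw a link rate uniformly from [mean*(1 - L/2), mean*(1 + L/2)]."""
    low = mean_rate * (1.0 - heterogeneity / 2.0)
    high = mean_rate * (1.0 + heterogeneity / 2.0)
    return rng.uniform(low, high)


# Hypothetical inputs: mean rate 1.0 (arbitrary units), heterogeneity degree 0.5.
rng = random.Random(42)  # fixed seed only to make the illustration repeatable
rates = [sample_transfer_rate(1.0, 0.5, rng) for _ in range(1000)]

# With L = 0.5 every sampled rate lies within 25% of the mean.
print(min(rates), max(rates))
```

A heterogeneity degree of 0 collapses the range to the mean, i.e. a homogeneous network.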

3.2 Workload model

The data generated and transmitted to the fog by the devices and sensors of the IoT layer are processed by multiple real-time workflow jobs, which arrive dynamically at the central scheduler in a Poisson stream with rate λ. Each workflow job is represented by a directed acyclic graph (DAG) \(G=(\mathcal {N},\mathcal {E})\), where \(\mathcal {N}\) is the set of the nodes of the graph and \(\mathcal {E}\) is the set of the directed edges between the nodes. Each node represents a component task \(n_{i}\) of the workflow, whereas a directed edge \(e_{ij}\) between two tasks \(n_{i}\) and \(n_{j}\) represents the data that must be transferred from task \(n_{i}\) to task \(n_{j}\). The component tasks of a workflow are not preemptible, as preemption of real-time tasks may lead to performance degradation [3, 22, 24]. In the rest of the paper, the terms job, workflow and DAG are used interchangeably.

Each task \(n_{i}\) has a weight \(w_{i}\) that denotes its computational volume, i.e. the number of clock cycles required to execute the instructions of the particular task. The computational volume of each task is exponentially distributed with mean \(\overline {w}\). The computational cost of the task \(n_{i}\) on a VM \(vm_{j}\) is given by:

$$ Comp(n_{i},vm_{j})=w_{i}/f_{j} $$
(1)

where \(f_{j}\) is the operating frequency of VM \(vm_{j}\).

Each edge \(e_{ij}\) between two tasks \(n_{i}\) and \(n_{j}\) has a weight \(z_{ij}\) that represents its communication volume, i.e. the number of GB of data needed to be transferred between the two tasks. The communication volume of each edge is exponentially distributed with mean \(\overline {z}\). The communication cost of the edge \(e_{ij}\) is incurred when data are transferred from task \(n_{i}\) (scheduled on VM \(vm_{m}\)) to task \(n_{j}\) (scheduled on VM \(vm_{n}\)) and is defined as:

$$ Comm\left( (n_{i},vm_{m}),(n_{j},vm_{n})\right)=z_{ij}/l_{mn} $$
(2)

where \(l_{mn}\) is the data transfer rate of the communication link between the VMs \(vm_{m}\) and \(vm_{n}\), which may belong to the same layer, i.e. fog (\(l^{\text {fog}}_{mn}\)) or cloud (\(l^{\text {cloud}}_{mn}\)), or different layers, i.e. one in fog and one in cloud (\(l^{\text {inter}}_{mn}\)). In case both tasks \(n_{i}\) and \(n_{j}\) are scheduled on the same VM or on VMs that run on the same physical host, the communication cost of the edge \(e_{ij}\) is considered negligible.
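Equations (1) and (2) can be illustrated with a short Python sketch. The `VM` record, the VM and host names, and all numeric values are hypothetical. The sketch also assumes consistent units (clock cycles and Hz for computation, GB and GB/s for communication) and treats the "negligible" same-host communication cost as zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    host: str         # physical host the VM runs on
    frequency: float  # operating frequency f_j, in cycles per second


def comp_cost(w: float, vm: VM) -> float:
    """Eq. (1): computational cost w_i / f_j, in seconds."""
    return w / vm.frequency


def comm_cost(z: float, src: VM, dst: VM, rate: float) -> float:
    """Eq. (2): communication cost z_ij / l_mn, in seconds.

    Returns 0 when both tasks run on the same physical host
    (which also covers the same-VM case).
    """
    if src.host == dst.host:
        return 0.0
    return z / rate


fog_vm = VM("vm1_fog", "host1_fog", 2.0e9)
cloud_vm = VM("vm1_cloud", "host1_cloud", 3.0e9)

print(comp_cost(1.0e12, fog_vm))                 # 500.0 seconds
print(comm_cost(10.0, fog_vm, cloud_vm, 0.125))  # 80.0 seconds
print(comm_cost(10.0, fog_vm, fog_vm, 0.125))    # 0.0 (same host)
```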

Each entry task of a workflow requires input data that may vary in size. The input data size \(d_{i}\) of an entry task \(n_{i}\) is exponentially distributed with mean \(\overline {d}\). The communication cost incurred by the transfer of input data from the IoT layer to a task \(n_{i}\) scheduled on a fog VM \(vm_{m}\) is given by:

$$ Comm\left( n_{i},vm_{m}^{\text{fog}}\right)=d_{i}/l^{\text{IoT}} $$
(3)

where \(l^{\text {IoT}}\) is the data transfer rate between the IoT and the fog layer. In case the input data are required to be transferred from the IoT layer to the cloud layer (i.e. to a cloud VM \(vm_{n}\)), they are first uploaded to the fog layer and then forwarded to the cloud layer, incurring an additional overhead. Hence, the communication cost in this case is given by:

$$ Comm\left( n_{i},vm_{n}^{\text{cloud}}\right)=d_{i} \cdot \left( 1/l^{\text{IoT}} + 1/l^{\text{inter}}\right) $$
(4)
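The difference between (3) and (4) is the extra fog-to-cloud hop. A small Python sketch makes the overhead concrete. The function names and the rates used below are illustrative assumptions, with data sizes in GB and rates in GB/s.

```python
def input_cost_fog(d: float, l_iot: float) -> float:
    """Eq. (3): input data go from the IoT layer straight to a fog VM."""
    return d / l_iot


def input_cost_cloud(d: float, l_iot: float, l_inter: float) -> float:
    """Eq. (4): input data are uploaded to the fog, then forwarded to the cloud."""
    return d * (1.0 / l_iot + 1.0 / l_inter)


# Hypothetical values: 100 GB of input, 0.5 GB/s IoT-to-fog, 0.25 GB/s fog-to-cloud.
d, l_iot, l_inter = 100.0, 0.5, 0.25
print(input_cost_fog(d, l_iot))             # 200.0 seconds
print(input_cost_cloud(d, l_iot, l_inter))  # 600.0 seconds
```

Under these assumed rates, placing an entry task in the cloud triples its input transfer time, which is why the scheduler has to weigh this cost against the faster cloud processors.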

The length of a path in the graph is the sum of the computational and communication costs of all of the tasks and edges, respectively, on the path, including the input data communication cost of the respective entry task on the particular path. The critical path length \(CPL\) is the length of the longest path in the graph. Each real-time job has an end-to-end firm deadline \(D\) within which all of its component tasks must finish execution. It is defined as:

$$ D=A+RD $$
(5)

where A is the arrival time of the workflow and RD is its relative deadline, which is uniformly distributed in the range [CPL,2CPL]. In the time-critical environment under study, the deadline of each job must be met, otherwise its results would be useless. Therefore, in such a case, the job is considered lost.
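As an illustration of how the critical path length and the deadline in (5) could be derived for a given DAG, the following Python sketch computes the longest path in topological order and then draws a relative deadline. The four-task graph, its costs, the arrival time and the function names are hypothetical; the sketch uses average costs per task and edge, as in the workflow example of this section.

```python
import random
from graphlib import TopologicalSorter


def critical_path_length(comp, comm, input_cost):
    """Longest path length, including entry-task input costs.

    comp:       task -> average computational cost
    comm:       (parent, child) -> average communication cost
    input_cost: entry task -> input data communication cost
    """
    parents = {n: set() for n in comp}
    for (u, v) in comm:
        parents[v].add(u)
    finish = {}
    # static_order() yields every task after all of its parents.
    for n in TopologicalSorter(parents).static_order():
        if parents[n]:
            start = max(finish[p] + comm[(p, n)] for p in parents[n])
        else:
            start = input_cost.get(n, 0.0)
        finish[n] = start + comp[n]
    return max(finish.values())


def deadline(arrival, cpl, rng):
    """Eq. (5): D = A + RD, with RD uniform in [CPL, 2*CPL]."""
    return arrival + rng.uniform(cpl, 2.0 * cpl)


comp = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 5.0}
comm = {("a", "c"): 1.0, ("b", "c"): 6.0, ("c", "d"): 2.0}
input_cost = {"a": 2.0, "b": 1.0}

cpl = critical_path_length(comp, comm, input_cost)  # path b -> c -> d
d = deadline(100.0, cpl, random.Random(7))
print(cpl, d)
```

Here the critical path is b, c, d with length 1 + 3 + 6 + 2 + 2 + 5 = 19, so a job arriving at time 100 receives a deadline between 119 and 138.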

The communication to computation ratio \(CCR\) of a workflow is the ratio of its average communication cost to its average computational cost on the target system and is given by:

$$ CCR=\frac {{\sum}_{e_{ij} \in \mathcal{E}} {\overline{Comm(e_{ij})}} } {{\sum}_{n_{i} \in \mathcal{N}} {\overline{Comp(n_{i})}} } $$
(6)

where \(\mathcal {N}\) and \(\mathcal {E}\) are the sets of the nodes and the edges of the workflow, respectively. \(\overline {Comm(e_{ij})}\) is the average communication cost of the edge \(e_{ij}\) over all of the communication links in the system, whereas \(\overline {Comp(n_{i})}\) is the average computational cost of the task \(n_{i}\) over all of the VMs in the system. An example of a workflow job is shown in Fig. 2.
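A minimal Python sketch of (6) is given below. The average costs are hypothetical and the function name is an assumption made for illustration.

```python
def ccr(avg_comm_costs, avg_comp_costs):
    """Eq. (6): sum of average edge costs over sum of average task costs."""
    return sum(avg_comm_costs) / sum(avg_comp_costs)


# Hypothetical workflow: three edges and four tasks (costs in seconds).
edge_costs = [2.0, 4.0, 6.0]
task_costs = [3.0, 5.0, 7.0, 9.0]

print(ccr(edge_costs, task_costs))  # 0.5
```

A value below 1, as in this example, corresponds to a computationally intensive workflow, whereas a value above 1 corresponds to a communication intensive one.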

Fig. 2 An IoT data-processing workflow represented as a directed acyclic graph with four entry tasks and five exit tasks. The number in each node denotes the average computational cost of the represented task. The number on each edge denotes the average communication cost between the two tasks that it connects. The blue arrows pointing to the entry tasks of the graph indicate the average communication cost incurred by the transfer of the required input data from the IoT devices. The critical path of the graph is depicted with thick arrows

3.3 Cloud pricing scheme

As mentioned earlier, the cloud layer consists of reserved dedicated physical hosts, each of which is charged at an effective average hourly rate \(C_{\text {host}}\). Furthermore, the data that are transferred in and out of the cloud are charged per TB at a rate \(C_{\text {data}}\).
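Under this pricing scheme, the total cloud cost over an observation period is the sum of a host component and a data transfer component. The Python sketch below assumes that reserved hosts are billed for every hour of the period regardless of utilization; this billing assumption, the function name and all input values are illustrative and not taken from the paper.

```python
def total_cloud_cost(num_hosts: int, hours: float, c_host: float,
                     data_tb: float, c_data: float) -> float:
    """Host charges (per host-hour) plus data transfer charges (per TB)."""
    host_cost = num_hosts * hours * c_host
    transfer_cost = data_tb * c_data
    return host_cost + transfer_cost


# Hypothetical inputs: 25 hosts for 720 hours at $1 per host-hour,
# plus 1000 TB transferred at $1 per TB.
print(total_cloud_cost(25, 720.0, 1.0, 1000.0, 1.0))  # 19000.0
```

With reserved hosts the first term does not depend on how many tasks are actually offloaded, so it tends to dominate the bill.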

4 Hybrid fog and cloud-aware scheduling heuristic

A hybrid heuristic is employed that schedules a ready task of a workflow to a fog or a cloud VM, depending on the task’s potential communication and computational cost. Specifically, in contrast to traditional approaches where the main processing of IoT jobs is performed in the fog layer, our approach attempts to schedule computationally demanding tasks with low communication requirements in the cloud (which has resources with greater computational capacity than the fog, but higher communication latency between the cloud VMs and the IoT layer) and communication intensive tasks with low computational demands in the fog (which has limited computational resources, but lower communication latency between the fog VMs and the IoT layer, compared to the cloud), utilizing possible gaps in the schedule of the fog and cloud VMs. Furthermore, during the scheduling process, our approach takes into account the communication cost incurred by the transfer of data from the sensors and devices in the IoT layer to the VMs in the fog layer. The proposed scheduling strategy consists of two phases: (a) a task selection phase and (b) a VM selection phase.

4.1 Task selection phase

Tasks are prioritized according to their job’s deadline: the task whose job has the earliest deadline has the highest priority. Consequently, tasks are prioritized according to the Earliest Deadline First (EDF) policy. In case two or more ready tasks have the same priority, the task with the highest average computational cost is selected first.
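The selection rule can be expressed as a single ordering key: earliest job deadline first, with ties broken in favor of the highest average computational cost. The following Python sketch shows this; the `ReadyTask` record and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadyTask:
    name: str
    job_deadline: float   # absolute deadline D of the task's job
    avg_comp_cost: float  # average computational cost over all VMs


def select_task(ready):
    """EDF selection; ties go to the task with the highest average cost."""
    return min(ready, key=lambda t: (t.job_deadline, -t.avg_comp_cost))


ready = [
    ReadyTask("n1", job_deadline=50.0, avg_comp_cost=3.0),
    ReadyTask("n2", job_deadline=40.0, avg_comp_cost=2.0),
    ReadyTask("n3", job_deadline=40.0, avg_comp_cost=7.0),
]

print(select_task(ready).name)  # n3
```

Tasks n2 and n3 share the earliest deadline, so the more computationally demanding n3 is selected first.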

4.2 VM selection phase

Once a task is selected by the scheduler, it is allocated to the VM that can provide it with the earliest estimated finish time \(EFT\). All VMs in the fog and cloud layers are considered. The \(EFT\) of a ready task \(n_{i}\) on a VM \(vm_{k}\) is given by:

$$ EFT(n_{i},vm_{k})=\max {\left\{t_{\text{data}}(n_{i},vm_{k}),t_{\text{idle}}(n_{i},vm_{k})\right\}}+Comp(n_{i},vm_{k}) $$
(7)

where \(t_{\text{data}}(n_{i},vm_{k})\) is the time at which all input data of task \(n_{i}\) will be available on VM \(vm_{k}\), whereas \(t_{\text{idle}}(n_{i},vm_{k})\) is the time at which \(vm_{k}\) will be able to execute task \(n_{i}\).
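A minimal Python sketch of the VM selection based on (7) is shown below. It assumes that the data-ready and idle times have already been estimated for every candidate VM; the dictionary layout, VM names and numeric values are hypothetical.

```python
def eft(t_data: float, t_idle: float, w: float, f: float) -> float:
    """Eq. (7): max(t_data, t_idle) + Comp, with Comp = w / f from Eq. (1)."""
    return max(t_data, t_idle) + w / f


def select_vm(task_volume: float, candidates):
    """Return the candidate VM offering the earliest estimated finish time."""
    return min(
        candidates,
        key=lambda c: eft(c["t_data"], c["t_idle"], task_volume, c["frequency"]),
    )


# Hypothetical candidates: the fog VM has the data sooner but is busy longer
# and slower; the cloud VM is free but receives the data later.
candidates = [
    {"name": "vm1_fog", "t_data": 10.0, "t_idle": 30.0, "frequency": 2.0e9},
    {"name": "vm1_cloud", "t_data": 25.0, "t_idle": 0.0, "frequency": 3.0e9},
]

best = select_vm(6.0e9, candidates)
print(best["name"])  # vm1_cloud (EFT 27.0 versus 33.0 on the fog VM)
```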

In order to calculate the term \(t_{\text{idle}}(n_{i},vm_{k})\), we must determine the position at which task \(n_{i}\) would be placed in the queue of VM \(vm_{k}\). Firstly, we find the initial position at which the ready task \(n_{i}\) would be placed in the VM’s queue, according to its priority. Subsequently, we check whether a schedule gap exists that can be utilized by the task, as follows:

  • Step 1: In case all of the required input data of the ready task \(n_{i}\) are available on VM \(vm_{k}\) (i.e. \(t_{\text{data}}(n_{i},vm_{k}) = t_{\text{current}}\), where \(t_{\text{current}}\) is the current time), we check whether a schedule gap exists. A schedule gap is formed when the VM is idle and the task \(n_{q}\) placed at the head of the queue is still in the process of receiving its required input data from other hosts. The capacity \(g\) of the schedule gap is calculated as:

    $$ g=t_{\text{data}}(n_{q},vm_{k}) - t_{\text{current}} $$
    (8)

    where \(t_{\text{data}}(n_{q},vm_{k})\) is the time at which all of the required input data of task \(n_{q}\) will be received.

  • Step 2: If a schedule gap exists, we try to fill it with the ready task \(n_{i}\), which fits into the gap if:

    $$ g \geq w_{i}/f_{k} $$
    (9)

    where \(w_{i}\) is the computational volume of task \(n_{i}\) and \(f_{k}\) is the operating frequency of VM \(vm_{k}\). In case the ready task \(n_{i}\) cannot be placed into a schedule gap or a schedule gap does not exist, the position of task \(n_{i}\) in the queue of \(vm_{k}\) is determined only by its priority.

The pseudocode corresponding to the above procedure is shown in Algorithm 1. The first step of the procedure is described in lines 5-10, whereas the second step is described in lines 12-16. The utilization of schedule gaps is also performed in the same manner for a task waiting in a queue when all of its input data become available on its assigned VM. Furthermore, it is also performed for tasks that are waiting for service in a queue and either the task in service completes execution or a task is discarded from the queue because its job’s deadline has been reached. In the last two cases, eligible tasks are considered according to their priority.
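The two-step gap check, i.e. conditions (8) and (9), can also be sketched in Python. This is a simplified illustration written for this text, not a transcription of Algorithm 1: the function names and arguments are assumptions, and the availability of the task's input data is tested as "data-ready time not later than the current time".

```python
def gap_capacity(t_data_head: float, t_current: float) -> float:
    """Eq. (8): time left until the head-of-queue task has all of its data."""
    return t_data_head - t_current


def fits_in_gap(vm_idle: bool, t_data_task: float, t_data_head: float,
                t_current: float, w: float, f: float) -> bool:
    """Return True if the ready task can run inside the schedule gap."""
    # Step 1: the task's input data must already be on the VM, and the VM
    # must be idle while the head-of-queue task is still receiving data.
    if not vm_idle or t_data_task > t_current:
        return False
    g = gap_capacity(t_data_head, t_current)
    if g <= 0.0:
        return False
    # Step 2, Eq. (9): the task's computational cost must fit in the gap.
    return g >= w / f


# Hypothetical case: 60 s gap on an idle 2 GHz VM.
print(fits_in_gap(True, 100.0, 160.0, 100.0, 1.0e11, 2.0e9))  # True  (50 s)
print(fits_in_gap(True, 100.0, 160.0, 100.0, 2.0e11, 2.0e9))  # False (100 s)
```

Because tasks are not preemptible, a task is only inserted when it can finish before the head-of-queue task becomes ready, so gap filling does not delay higher-priority work.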

We refer to our proposed scheduling heuristic as Hybrid-EDF. For comparison purposes and in order to examine the same heuristic, but in a cloud-unaware setting, an alternative version of our proposed approach was considered in our experiments, Fog-EDF. This alternative baseline approach only considers VMs in the fog layer during the VM selection phase. That is, no cloud VMs are utilized for the processing of the workflows.


5 Performance evaluation

5.1 Performance metrics

The following metrics were employed for the evaluation of the performance of our proposed scheduling heuristic, Hybrid-EDF, and its comparison to the baseline strategy, Fog-EDF:

  • Deadline Miss Ratio, which is the ratio of the number of jobs that did not finish their execution within their deadline (and were thus lost) to the number of all of the jobs that arrived at the central scheduler of the fog layer, during the observed time period.

  • Percentage of Tasks Executed on Cloud, which is the percentage of tasks of completed jobs that were executed on cloud VMs, during the observed time period.

  • Total Monetary Cost, which is the total monetary cost (in US dollars) for the utilization of the resources of the cloud layer. This cost concerns the reserved dedicated hosts in the cloud layer, as well as the data transfers in and out of the cloud, during the observed time period.
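The first two metrics are simple ratios over counters collected during a run. A minimal Python sketch follows; the function names and the counter values are hypothetical.

```python
def deadline_miss_ratio(jobs_lost: int, jobs_arrived: int) -> float:
    """Percentage of arrived jobs that missed their deadline."""
    return 100.0 * jobs_lost / jobs_arrived


def cloud_task_percentage(cloud_tasks: int, completed_tasks: int) -> float:
    """Percentage of tasks of completed jobs that ran on cloud VMs."""
    return 100.0 * cloud_tasks / completed_tasks


# Hypothetical counters from one simulation run.
print(deadline_miss_ratio(3, 200))       # 1.5
print(cloud_task_percentage(600, 1000))  # 60.0
```

Note that the two metrics use different populations: the first counts all arrived jobs, whereas the second counts only the tasks of jobs that completed.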

5.2 Experimental setup

The performance of the system was evaluated by conducting a series of simulation runs using the independent replications method. Due to the complexity of the system and the workload model under study and in order to have full control over all of the required parameters, we implemented our own discrete-event simulation program in C++, tailored to the specific requirements of the particular problem. The hosts in the fog and cloud layers were based on real-world processors (one processor per host). The fog layer consisted of a small number of hosts with low to moderate computational capacity. Specifically, the fog layer consisted of 1 small (in terms of operating frequency) processor, modeled after the Intel Xeon Bronze 3106 processor, 1 medium processor, modeled after the Intel Xeon Silver 4108 processor, and 1 large processor, modeled after the Intel Xeon Silver 4109T processor. On the other hand, the cloud layer consisted of a larger number of hosts with greater computational capacity than those in the fog layer. Specifically, the cloud layer consisted of 5 small processors, modeled after the Intel Xeon Platinum 8158 processor, 10 medium processors, modeled after the Intel Xeon Gold 6134 processor, and 10 large processors, modeled after the Intel Xeon Platinum 8156 processor. The operating frequency and the number of cores of each processor are shown in Table 1.

Table 1 Simulation input parameters

In order to directly control the workload parameters and obtain unbiased, general results, not applicable only to particular workload traces, a synthetic workload was used. The workflows were generated randomly, using our own random DAG generator, as described in [21]. Each generated workflow was a weakly connected graph, having a path between any pair of tasks, without taking into account the direction of the edges. There was at least one entry and one exit task in each generated workflow. We conducted three sets of experiments, for (a) computationally intensive (CCR = 0.5), (b) moderate (CCR = 1) and (c) communication intensive (CCR = 2) workflows.

For computationally intensive workflows (CCR = 0.5), the mean computational volume of the tasks was selected to be equal to \(\overline {w}= 1.1 \cdot 10^{12}\) clock cycles, so that on average, a task would take 10 minutes to execute on a fog VM. For moderate and communication intensive workflows (CCR = {1,2}), the mean computational volume of the tasks was selected to be equal to \(\overline {w}= 0.55 \cdot 10^{12}\) clock cycles, so that on average, a task would take half the time (i.e. 5 minutes) to execute on a fog VM, compared to the computationally intensive case.
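As a consistency check, (1) can be inverted to obtain the mean fog VM operating frequency that these choices imply. The value of roughly 1.83 GHz below is inferred from the stated 10-minute execution time and is not quoted from Table 1:

$$ \overline{f^{\text{fog}}} \approx \frac{\overline{w}}{600\ \text{s}} = \frac{1.1 \cdot 10^{12}}{600} \approx 1.83 \cdot 10^{9}\ \text{Hz}, \qquad \frac{0.55 \cdot 10^{12}}{1.83 \cdot 10^{9}} \approx 300\ \text{s} = 5\ \text{min} $$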

For each CCR and mean computational volume \(\overline {w}\), the mean communication volume \(\overline {z}\) was calculated from (6). In order for the system to be stable, the job arrival rate was chosen to be λ = 0.002. As we wanted to examine data-intensive workflows, the mean input data size of the entry tasks was chosen to be \(\overline {d}= 100\) GB. In order to be in line with the prices of real-world cloud vendors, such as Amazon Web Services and Google Cloud Platform, the effective average hourly rate of a reserved dedicated cloud host (for compute optimized hosts) was chosen to be \(C_{\text {host}}\) = $1 per host, whereas the charge for data transfers in and out of the cloud was chosen to be \(C_{\text {data}}\) = $1 per TB. The heterogeneity degree of the networks in all layers was chosen to be equal to L = 0.5, since most modern networks feature moderate heterogeneity. All of the input parameters of the simulation model are shown in Table 1.

We ran 30 replications of the simulation with different seeds of random numbers, for each set of input parameters. Each replication was terminated when \(10^{4}\) workflows had been completed. We found by experimentation that this simulation run length was long enough to minimize the effects of warm-up time. For every mean value, a 95% confidence interval was calculated. The half-widths of all of the confidence intervals were less than 5% of their respective mean values. Furthermore, in order to evaluate whether the differences between the mean values obtained by each scheduling method were statistically significant, a 95% confidence interval was calculated for the difference between each pair of mean values. The calculated confidence intervals did not include 0 and thus the differences in the results between the employed scheduling policies were statistically significant.
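The confidence interval computation for the independent replications method can be sketched in Python as follows. The sketch assumes a Student-t interval over the 30 replication means, which is the usual choice for this method but is not stated explicitly in the text; the sample values are synthetic placeholders, not simulation results.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 29 degrees of freedom
# (30 replications); hard-coded to keep the sketch dependency-free.
T_CRIT_95_DF29 = 2.045


def confidence_interval(samples, t_crit):
    """Return (mean, half-width) of the confidence interval of the mean."""
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


# Synthetic per-replication values of some metric (placeholders only).
samples = [1.5 + 0.01 * ((i % 5) - 2) for i in range(30)]

mean, half_width = confidence_interval(samples, T_CRIT_95_DF29)
print(mean, half_width, half_width / mean < 0.05)
```

The same function can be applied to the per-replication differences between two policies; if the resulting interval excludes 0, the difference is statistically significant at the chosen level.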

5.3 Simulation results

The simulation results reveal that, in terms of the deadline miss ratio metric, the proposed fog and cloud-aware scheduling heuristic, Hybrid-EDF, outperforms the cloud-unaware baseline strategy, Fog-EDF, for all cases of workload. This is shown in Fig. 3. It is noted that in this case a logarithmic scale is used for the deadline miss ratio values, as they are highly skewed. In the case of computationally intensive (CCR = 0.5), moderate (CCR = 1) and communication intensive (CCR = 2) workflows, Hybrid-EDF yields a much lower deadline miss ratio than Fog-EDF. Especially in the case of computationally intensive workflows, the difference is significant. Specifically, Hybrid-EDF yields a deadline miss ratio equal to 1.51%, whereas Fog-EDF yields a deadline miss ratio equal to 85.76%.

Fig. 3 Deadline Miss Ratio (%) for computationally intensive, moderate and communication intensive workflows (Hybrid-EDF & Fog-EDF)

This is due to the fact that the proposed scheduling heuristic attempts to assign computationally demanding tasks with low communication requirements to VMs in the cloud, where there is a larger number of VMs with greater computational capacity than in the fog layer. On the other hand, it tries to schedule communication intensive tasks with low computational demands to fog VMs, which are closer to the IoT sources of the generated data, in an attempt to minimize the incurred communication cost. Overall, Hybrid-EDF provides on average 76.69% lower deadline miss ratio, compared to the baseline policy, Fog-EDF, under all cases of workload.

The scheduling decisions of the proposed strategy are clearly shown in Fig. 4, where it is apparent that for computationally intensive workflows, the majority (61.65%) of the tasks are scheduled on cloud VMs. In the case of moderate workflows, about half (48.88%) of the tasks are scheduled in the cloud and the other half in the fog. On the other hand, in the case of communication intensive workflows, the majority (57.45%) of the tasks are assigned to fog VMs. However, the impressive results of the proposed scheduling heuristic come at a significant monetary cost, as shown in Fig. 5. Specifically, for the simulated duration of about a month, the total monetary cost incurred by the data transfers in and out of the cloud and the reserved dedicated hosts in the cloud, was $19,441.61 in the case of computationally intensive workflows, $18,884.21 in the case of moderate workflows and $19,189.14 in the case of communication intensive workflows.

Fig. 4 Percentage of Tasks Executed on Cloud (%) for computationally intensive, moderate and communication intensive workflows (Hybrid-EDF)

Fig. 5 Total Monetary Cost ($) for computationally intensive, moderate and communication intensive workflows (Hybrid-EDF)

Thus, even though the proposed Hybrid-EDF policy outperforms the baseline Fog-EDF strategy, it requires a significant monetary cost in order to effectively utilize the cloud resources (as shown in Table 2), in addition to the fog resources, which are free. The simulation results are summarized in Table 3.

Table 2 Cloud resource usage statistics (Hybrid-EDF)
Table 3 Simulation results summary

6 Conclusions and future directions

In this paper, we proposed a hybrid fog and cloud-aware heuristic, Hybrid-EDF, for the dynamic scheduling of multiple real-time IoT workflows in a three-tiered architecture. In contrast to traditional approaches where the main processing of IoT jobs is performed in the fog layer, our approach attempts to schedule computationally demanding tasks with low communication requirements in the cloud and communication intensive tasks with low computational demands in the fog, utilizing possible gaps in the schedule of the fog and cloud VMs. Furthermore, during the scheduling process, our approach takes into account the communication cost incurred by the transfer of data from the sensors and devices in the IoT layer to the VMs in the fog layer.

The performance of the proposed heuristic was evaluated and compared to a baseline cloud-unaware strategy, Fog-EDF, via a series of simulation experiments, for computationally intensive, moderate and communication intensive workflows. The simulation results reveal that Hybrid-EDF outperforms Fog-EDF in the framework under study, providing on average 76.69% lower deadline miss ratio. However, this comes at a significant monetary cost, due to the usage of cloud resources. In an attempt to minimize the monetary cost, we plan to apply our approach in architectures where the cloud layer consists of on-demand multi-tenant VMs, instead of reserved dedicated hosts.