1 Introduction

New-generation information technologies such as the Internet of Things, cloud computing, fog computing, artificial intelligence, and big data have brought valuable development opportunities to many industries [2, 21]. Traditional industry is being transformed by information technology, and the smart factory was born against this background [6, 12]. Compared with traditional factories, smart factories must process large amounts of data. One option is remote cloud computing, but it has notable disadvantages [5]: large delay, high bandwidth requirements, and weak guarantees of security and privacy. Fog computing can alleviate these problems [1]. It moves computing, storage, control, and network functions from the cloud to edge devices, reducing data transmission delay and required bandwidth, and it allows a group of nearby end users, network edge devices, and access devices to cooperate on resource-intensive tasks. Many computing tasks that originally required the cloud can therefore be completed effectively at the network edge, using the decentralized computing resources around the data-generating equipment.

A smart factory can realize resource virtualization and automatic service deployment through container technology and container orchestration tools [8]. A container is a virtualization unit that, compared with a virtual machine, is lighter and can be quickly deployed across different operating platforms; Docker containers are currently the most common. The related orchestration tool is Kubernetes (K8S), a platform that can span multiple computing nodes and manage the containers running on them. We can package an application in the smart factory [9] into a Docker container and then use Kubernetes to automatically deploy it to an appropriate fog computing node [13, 16].

Reasonably deploying these containers onto the fog computing nodes of a smart factory while making full use of fog resources is essentially a resource allocation and scheduling problem, and there are several related studies. Shahab Tayeb [15] proposed a new three-tier fog computing architecture but did not study the scheduling of fog computing resources. Skarlat [14] proposed a two-layer fog computing architecture and optimized delay but did not consider resource balance. Sagar Verma [17] proposed a cloud architecture and considered load balancing, but that work did not consider delay optimization. Hamid Reza Faragardi [4] proposed applying fog computing in smart factories. Ruilong Deng [3] mainly studied the energy consumption and time delay of cloud computing systems but did not consider the delay caused by network communication. Hend Gedawy [7] used a group of heterogeneous fog computing devices to form an edge micro-cloud. Wan Jiafu [18] proposed a load-balancing scheduling method based on fog computing to address the complex energy-consumption problem of manufacturing clusters in smart factories. The latter two works consider delay and resource balance simultaneously, but their algorithms can be further improved. Wenzhu Wang [19] proposed a dominant-resource allocation method for heterogeneous resources, but the final result is not Pareto-optimal. Zening Liu [10, 11] proposed a scalable and stable distributed task scheduling algorithm, but it lacks consideration of resource balance and of the smart-factory application scenario.

In summary, the existing research has several deficiencies. First, it mostly optimizes task processing time without considering the limited computing resources in a smart factory. Second, each work improves a single aspect rather than optimizing comprehensively around the characteristics of the smart factory as a whole. Third, these optimizations are usually based on pure fog computing; few articles address cloud-fog collaborative computing.

In contrast to other studies, this paper uses Kubernetes to realize automatic task deployment in the smart factory according to the task characteristics of such factories. The contributions of this paper are summarized as follows:

  1. 1)

    Framework: A framework combining fog and cloud computing is proposed, in which tasks are categorized into different types and matched to the corresponding types of fog computing resources. Compared with a traditional computing framework, it is more flexible and can adaptively adjust the use of cloud and fog computing resources.

  2. 2)

    Scheduling algorithm for fog-only resource scheduling: We propose IDGSA (Interval Division Genetic Scheduling Algorithm) to optimize task scheduling when only fog resources are used. Unlike other works, we jointly optimize task processing delay and resource usage balance. The scheduling problem is NP-hard and is not well handled by the traditional genetic algorithm, which struggles with the dual optimization objectives and easily falls into local optima. We therefore construct a dual-objective fitness function, allocate individuals into three intervals based on their fitness values, and apply different operators to the individuals of different intervals during evolution. This preserves more diversity and helps avoid local optima.

  3. 3)

    Scheduling algorithm for joint cloud and fog resource scheduling: For the joint cloud-fog case, we propose IDGSA-P (Interval Division Genetic Scheduling Algorithm with Penalty factor) to optimize resource allocation. The scheduling problem is modeled as a constrained NP-hard problem, and a penalty factor is introduced to solve it.

  4. 4)

    Simulations: We carried out simulation experiments to verify the proposed algorithms. Compared with the traditional genetic algorithm and the default K8S scheduler, our algorithms iterate faster and achieve better results. IDGSA reduces data processing time by 50% and improves fog computing resource utilization by 60% compared with the K8S default scheduling method; compared with the traditional genetic algorithm, it needs fewer iterations while reducing processing time by 7% and improving resource utilization by 9%. Under our simulation settings, IDGSA-P converges faster than the Joines & Houck method and obtains better results; joint cloud-fog scheduling is 18% and 7% faster than cloud-only and fog-only computing, respectively.

2 Scheduling framework in smart factories

In a smart factory with cloud computing resources, task allocation and management are very important. Mainstream virtualization technology such as Docker is commonly used for resource management and allocation and brings many benefits. Kubernetes is a widely used Docker management framework that enables flexible distribution and deployment of tasks. In this section, we introduce a computing framework that incorporates Kubernetes in smart factories.

2.1 Task classification and fog resource classification

Firstly, in order to process tasks efficiently, we divide factory tasks into three categories based on their properties:

  1. 1)

    Real-Time Tasks (RTT): delay-sensitive tasks, such as judging the operating status and faults of key smart devices. Tasks of this type must be completed within a specified delay.

  2. 2)

    General Tasks (GT): tasks that can tolerate a certain amount of delay, such as monitoring product quality or processing video information in the factory.

  3. 3)

    Storage Tasks (ST): tasks that have no latency requirement but need storage space, such as data analysis for each production line or analysis of energy consumption.

Secondly, we categorize fog nodes by their characteristics:

  1. 1)

    Exclusive Fog Computing Nodes (EFCN): these nodes are close to the equipment, have outstanding performance, and can return result feedback in the shortest time.

  2. 2)

    Balanced Fog Nodes (BFN): nodes whose processing and storage performance are both relatively good; they can process most tasks in the smart factory.

  3. 3)

    Fog Storage Nodes (FSN): nodes with average processing performance but good storage performance; they are closer to the cloud and can upload data to the cloud data center for processing at an appropriate time.

2.2 Task assignment

Next, we study how to assign the different types of tasks to appropriate fog computing nodes after classification. Previous works usually deployed tasks with a fog/cloud split but did not involve automatic deployment or container application monitoring. Therefore, this paper combines Kubernetes to further improve the fog computing framework in the smart factory.

The components in Kubernetes mainly include:

  • Etcd: stores the configuration and object state information of the whole cluster.

  • API server: provides the API interface and serves as the hub for data interaction and communication between the other modules.

  • Scheduler: the scheduling module of Kubernetes; it dispatches each task to an appropriate node according to the scheduling algorithm.

  • RC (Replication Controller)/Deployment: monitors the number of task replicas in the Kubernetes cluster to keep that number stable.

The scheduling framework of the smart factory proposed in this paper is shown in Fig. 1. First, the tasks in the smart factory are containerized, and the containerized tasks are labeled with their types; this information is stored in Etcd. The Scheduler then interacts with the API server to obtain the tasks that have not yet been deployed in the Kubernetes cluster and automatically deploys them to the corresponding nodes according to their labels: container applications labeled RTT are assigned to EFCN, those labeled ST are deployed on FSN, and those labeled GT are deployed on BFN. Throughout this process, the Deployment module monitors the container applications in Kubernetes and recreates any application that fails.

Fig. 1

Scheduling frameworks in smart factories incorporating Kubernetes
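As an illustrative sketch (not taken from the paper) of how such label-based placement could be expressed in Kubernetes, a Deployment can pin an RTT-labeled task to EFCN nodes via `nodeSelector`. All names, labels, and resource values below are assumptions:

```yaml
# Hypothetical manifest: an RTT task constrained to Exclusive Fog
# Computing Nodes. Assumes EFCN nodes carry the label node-role=efcn.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rtt-fault-detector
  labels:
    task-type: rtt            # task-type label read by the scheduler
spec:
  replicas: 1
  selector:
    matchLabels:
      task-type: rtt
  template:
    metadata:
      labels:
        task-type: rtt
    spec:
      nodeSelector:
        node-role: efcn       # restrict placement to EFCN nodes
      containers:
      - name: fault-detector
        image: factory/fault-detector:latest   # illustrative image name
        resources:
          requests:
            cpu: "500m"
            memory: "256Mi"
```

A custom scheduler such as the IDGSA-based one described later can replace the default placement logic while the labels remain the interface between tasks and node types.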

Because GT-labeled containers account for the majority of tasks, reducing their delay and improving resource utilization is the most important question. Therefore, we propose a corresponding system model and algorithms to allocate such tasks reasonably.

3 Fog resource scheduling and balancing in smart factories

In this section, we discuss resource scheduling and balancing in the smart factory under the fog-only computing scenario. We establish a delay optimization model and a resource balance model to describe the whole fog computing system and design the TSB factor to trade off delay against resource usage. Finally, we design the IDGSA algorithm to solve the optimization problem.

3.1 System model

In a smart manufacturing factory, a production line is a service. The services in the smart factory are defined by app_j ∈ A, where j denotes the j-th service and A is the collection of all services in the whole smart factory. All container applications used during the execution of a production line app_j form the set S, and the i-th container application is denoted ms_i. ms_{i,cpu} and ms_{i,mem} denote the minimum CPU and memory requirements of container application ms_i on a fog computing node. When a container application monopolizes one CPU for task processing, the time required is one unit time, written ut. Because a production line may need different numbers of each container application when processing tasks, \(m{s}_i^{req}\) indicates how many instances of container application ms_i the production line needs.

During task processing, container applications execute in sequence, so one container application may use another's data or processing results. Two container applications with a consumption relationship are written (ms_prov, ms_cons), indicating that ms_cons needs the processing results of ms_prov. The fog computing node resource pool is defined as the set P, and a fog computing node in the pool is denoted pm_l. If a container application ms_i is deployed on node pm_l, this is written alloc(ms_i) = pm_l. pm_{l,cpu} and pm_{l,mem} denote the CPU and memory resources of the fog computing node.

3.2 Balance factor

In this paper there are three optimization-related quantities: (1) task execution time; (2) cluster resource balance; (3) a tradeoff factor (TSB) that combines the two.

3.2.1 Task execution time

The execution time of the task can be expressed by the maximum of the completion time of all container tasks, as shown below

$${T}_{service}=\max \left\{S\left(m{s}_i\right)\right\}\ i=1,2,3\dots$$
(1)

where S(ms_i) denotes the data processing time of container application ms_i.

The calculation time of a single container application can be expressed as

$$S\left(m{s}_i\right)=\max \left\{{T}_{run}\left(m{s}_i\right),{T}_{wait}\left(m{s}_i\right)\right\}$$
(2)

where \({T}_{run}\left(m{s}_i\right)=\frac{m{s}_{i, cpu}\times R}{p{m}_{l, cpu}}\) is the processing time of container application ms_i (unit: ut), and R in the numerator is the number of container applications deployed on pm_l.

If the container application consumes the results of other container applications, then its waiting time is:

$${T}_{wait}\left(m{s}_i\right)=\max \left\{S\left(m{s}_j\right)+{T}_{trans}\left(m{s}_j\right)\right\},\forall m{s}_j\mid {\left(m{s}_j,m{s}_i\right)}_{prov/ cons}$$
(3)

where T_trans(ms_j) is the time for the provider container application to transmit its processing results to the consumer container application. For ease of calculation, T_trans(ms_j) = 0 if the two container applications are deployed on the same fog computing node, and T_trans(ms_j) = 0.1 × S(ms_j) if they are deployed on different nodes.
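Eqs. (1)-(3) can be sketched as a small recursive computation. The dict-based data model and function names below are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of Eqs. (1)-(3): task execution time under a given
# placement of container applications onto fog nodes.

def run_time(ms, node, placement, containers, nodes):
    """T_run = ms_cpu * R / pm_cpu, where R = containers on the node."""
    R = sum(1 for n in placement.values() if n == node)
    return containers[ms]["cpu"] * R / nodes[node]["cpu"]

def service_time(ms, placement, containers, nodes, providers):
    """S(ms_i) = max{T_run, T_wait} (Eq. 2), recursing over providers (Eq. 3)."""
    node = placement[ms]
    t_wait = 0.0
    for prov in providers.get(ms, []):
        s_prov = service_time(prov, placement, containers, nodes, providers)
        # T_trans = 0 on the same node, otherwise 0.1 * S(ms_prov)
        t_trans = 0.0 if placement[prov] == node else 0.1 * s_prov
        t_wait = max(t_wait, s_prov + t_trans)
    return max(run_time(ms, node, placement, containers, nodes), t_wait)

def t_service(placement, containers, nodes, providers):
    """T_service = max_i S(ms_i) (Eq. 1)."""
    return max(service_time(ms, placement, containers, nodes, providers)
               for ms in placement)
```

For example, two containers on separate nodes with a provider/consumer link incur the 10% transmission surcharge only on the consumer's waiting path.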

3.2.2 Cluster resource balance

To make full use of the fog computing resources in the cluster and keep the cluster balanced, we should ensure that the usage of the various resources within each node is balanced, and that resource usage is consistent across the nodes of the cluster. Cluster resource balance therefore has two parts: (1) the balanced use of the various resources on a single fog computing node; (2) the balanced use of resources among the different nodes of the whole cluster.

$${B}_{all}={B}_{cluster}+{B}_{single}$$
(4)

And we define:

$${B}_{cluster}=\sigma \left(P{M}_{usage}^{p{m}_l}, if\exists m{s}_i| alloc\left(m{s}_i\right)=p{m}_l\right)$$
(5)
$$P{M}_{usage}^{p{m}_l}=\frac{1}{2}\left(\frac{\Sigma_{m{s}_i}m{s}_{i, cpu}\ }{p{m}_{l, cpu}}+\frac{\Sigma_{m{s}_i}m{s}_{i, mem}\ }{p{m}_{l, mem}}\right)$$
(6)

B_cluster measures the balance across the different fog computing nodes of the whole cluster (σ denotes the standard deviation over the nodes that host at least one container application). The smaller B_cluster, the more uniform the resource usage across the nodes of the cluster.

Similarly, we have:

$${B}_{single}=\mid \frac{p{m}_{l, cpu usage}}{p{m}_{l, cpu}}-\frac{p{m}_{l, mem usage}}{p{m}_{l, mem}}\mid$$
(7)

B_single represents the resource-usage balance of a single fog computing node, where pm_{l,cpuusage} and pm_{l,memusage} are the CPU and memory already used on the node. This ensures that no single node heavily uses one resource while barely using another, so that all resources on the node can be fully utilized.
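The balance terms of Eqs. (4)-(7) can be sketched as follows. We take σ in Eq. (5) to be the population standard deviation and sum Eq. (7) over the used nodes; both readings, and the data model, are our assumptions:

```python
# Sketch of Eqs. (4)-(7): cluster-level and node-level resource balance.
from statistics import pstdev

def node_usage(node, placement, containers, nodes):
    """PM_usage (Eq. 6): mean of CPU and memory utilization ratios."""
    on_node = [containers[m] for m, n in placement.items() if n == node]
    cpu = sum(c["cpu"] for c in on_node) / nodes[node]["cpu"]
    mem = sum(c["mem"] for c in on_node) / nodes[node]["mem"]
    return 0.5 * (cpu + mem), cpu, mem

def b_all(placement, containers, nodes):
    """B_all = B_cluster + B_single (Eqs. 4, 5, 7)."""
    used = set(placement.values())          # nodes hosting >= 1 container
    usages, singles = [], []
    for n in used:
        usage, cpu, mem = node_usage(n, placement, containers, nodes)
        usages.append(usage)                # input to sigma (Eq. 5)
        singles.append(abs(cpu - mem))      # Eq. (7), per used node
    return pstdev(usages) + sum(singles)
```

A perfectly even placement (identical utilization ratios everywhere) yields B_all = 0.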

3.2.3 Tradeoff between service-time and balance (TSB)

This paper defines a balance factor as the overall optimization objective of the model. It integrates the two objectives of task computation time and cluster balance, and by adjusting the weight of cluster balance in TSB the factory can bias the schedule toward T_service or B_all. It is expressed as:

$$TSB(i)=\beta \times {B}_{all}^{norm}(i)+\left(1-\beta \right)\times {T}_{service}^{norm}(i)$$
(8)

where \({B}_{all}^{norm}(i)\) and \({T}_{service}^{norm}(i)\) are the normalized values of B_all(i) and T_service(i).
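Eq. (8) can be sketched directly. The paper does not specify the normalization; min-max normalization over the candidate set is our assumption:

```python
# Sketch of Eq. (8): TSB combines normalized balance and service time.
def tsb(beta, b_i, t_i, b_values, t_values):
    """TSB(i) = beta * B_all_norm(i) + (1 - beta) * T_service_norm(i)."""
    def norm(v, values):
        lo, hi = min(values), max(values)
        return (v - lo) / (hi - lo) if hi > lo else 0.0
    return beta * norm(b_i, b_values) + (1 - beta) * norm(t_i, t_values)
```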

In summary, the container application scheduling problem in smart manufacturing plants can be stated as follows:

$${\displaystyle \begin{array}{l}\kern3em \min\ {T}_{service},{B}_{all}\\ {}s.t.\kern0.5em alloc\left({ms}_i\right)={pm}_l,\forall {ms}_i\in {app}_j\end{array}}$$
(9)

The above problem is NP-hard, so a heuristic genetic algorithm can be used to solve it. For assigning tasks in a factory, the genetic algorithm is the most common choice. This paper proposes the IDGSA algorithm, which introduces interval partitioning to improve the performance of the traditional genetic algorithm.

3.3 IDGSA algorithm

When applied to smart factories, the traditional genetic algorithm cannot handle the dual-objective problem well, may not properly handle invalid individuals, iterates relatively slowly, and tends to get stuck in local optima.

Therefore, this paper presents the IDGSA algorithm, which repairs infeasible individuals produced by initialization (reassigning container applications on over-utilized nodes to other nodes) and applies interval partitioning to the crossover/mutation and roulette selection operators of the traditional genetic algorithm. Compared with the traditional genetic algorithm, it iterates faster and achieves better results while preserving population diversity and avoiding local optima.

The traditional genetic algorithm selects good individuals with a roulette selection operator, then crosses over and mutates them to generate the next generation.

In the traditional roulette operator, the probability of an individual being selected is proportional to its fitness value: with population size N and the fitness of individual x_i denoted f(x_i), the selection probability of x_i is

$$P\left({x}_i\right)=\frac{f\left({x}_i\right)}{\Sigma_jf\left({x}_j\right)}$$
(10)

Although this selection operator is simple to construct and widely used, it has a drawback: while it preserves excellent genes, those genes are always carried by the individuals with the highest fitness values, so the diversity of the population is poor and the results tend toward local optima. To avoid premature convergence and the discarding of parts of the search space, this paper proposes an interval division method that treats individuals with different fitness values differently. Figure 2 shows the diagram of the IDGSA algorithm incorporating interval division. First, the population is initialized and the rationality of each individual is evaluated; an irrational individual is modified until it becomes rational. For rational individuals, we calculate the fitness value in each generation. Individuals with high fitness are kept; individuals with low fitness are mutated so that they may become high-fitness individuals in the next generation; and individuals with medium fitness are selected by the proposed interval division selection operator to generate new individuals. We call this operator the interval-partitioning roulette selection operator.

Fig. 2

The proposed IDGSA algorithm incorporating interval division selection based on individual’s fitness values

The steps of the proposed interval-partitioning roulette selection operator are as follows:

  1. 1)

    Calculate the fitness values of all individuals in the population using the dual-objective fitness function of the algorithm (TSB, Eq. 8).

  2. 2)

    Find the best and worst fitness values in the whole population, divide the range between them into M grades, and assign each individual of the population to the corresponding grade region according to its fitness value.

  3. 3)

    Calculate the average fitness value of each of the M regions (the sum of the fitness values of the individuals in the region divided by the number of individuals in it);

  4. 4)

    For the M regions, let P_m be the probability that a region is selected: the average fitness value of the region divided by the sum of the average fitness values of all M regions; then calculate P_m;

  5. 5)

    Let \({P}_m^{x_i}\) be the probability that individual x_i is selected within its region: the fitness value of the individual divided by the sum of the fitness values of all individuals in that region;

  6. 6)

    Calculate the probability that each individual in the entire population is selected: \(P\left({x}_i\right)={P}_m\times {P}_m^{x_i}\).

The whole population is defined as P, with N individuals. The fitness value of individual x_i is f(x_i), calculated from the dual-objective fitness function. At the T-th iteration, the fitness values of the individuals in the population P can be expressed as

$$P(T)=\left\{f\left({x}_1\right),f\left({x}_2\right),\dots f\left({x}_N\right)\right\}$$
(11)

The width of each fitness subinterval of population P is

$$diff=\frac{f{\left({x}_i\right)}_{max}-f{\left({x}_j\right)}_{min}}{M}$$
(12)

Therefore, the population P of the T − th iteration can be divided into

$$P(T)=\left\{f{\left({x}_1\right)}_1^T,f{\left({x}_2\right)}_2^T,\dots f{\left({x}_i\right)}_m^T\right\}$$
(13)

where \(f{\left({x}_i\right)}_m^T\in \left[f{\left({x}_i\right)}_{min}+ diff\times \left(m-1\right),f{\left({x}_i\right)}_{min}+ diff\times m\right]\). Combining this with the previous analysis, we obtain

$${P}_m=\frac{f_m^T{\left({x}_i\right)}_{avg}\ }{\Sigma_m{f}_m^T{\left({x}_i\right)}_{avg}}$$
(14)
$$P\left({x}_i\right)=\frac{f_m^T\left({x}_i\right)}{n_m\times {\Sigma}_m{f}_m^T{\left({x}_i\right)}_{avg}}$$
(15)

It can be seen that P(x_i) is inversely proportional to n_m: if an interval contains too many individuals, each one's probability of being selected decreases, and if an interval contains few individuals, their probability increases. Thus, when the fitness values across the population differ greatly, the interval-partitioning roulette selector avoids the early elimination of low-fitness individuals and improves the diversity of selection.
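The six steps above can be sketched compactly. The use of `random.choices`, the default M = 3 grades, and the assumption of positive fitness values are our choices, not details from the paper:

```python
# Sketch of the interval-partitioning roulette operator (Eqs. 10-15).
import random

def interval_roulette_select(population, fitness, M=3):
    """Select one individual: pick a grade region with probability P_m
    (Eq. 14), then an individual within it with probability P_m^{x_i}
    (Eq. 15)."""
    scored = [(x, fitness(x)) for x in population]
    f_min = min(fx for _, fx in scored)
    f_max = max(fx for _, fx in scored)
    diff = (f_max - f_min) / M or 1.0                 # Eq. (12)
    intervals = [[] for _ in range(M)]
    for x, fx in scored:
        m = min(int((fx - f_min) / diff), M - 1)      # grade of x
        intervals[m].append((x, fx))
    occupied = [iv for iv in intervals if iv]
    avg = [sum(fx for _, fx in iv) / len(iv) for iv in occupied]
    region = random.choices(occupied, weights=avg, k=1)[0]   # P_m
    total = sum(fx for _, fx in region)
    return random.choices([x for x, _ in region],
                          weights=[fx / total for _, fx in region],
                          k=1)[0]                            # P_m^{x_i}
```

Because the region is chosen by its average fitness rather than its total, a sparsely populated region with good individuals is not drowned out by a crowded one, which is exactly the diversity effect argued above.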

4 Joint scheduling of fog and cloud resource in smart factories

Above, we discussed task scheduling and resource balance in a smart factory with fog-only computing resources and proposed IDGSA for optimization. In this section, we further discuss task scheduling under cloud and fog collaborative computing: we introduce cloud computing into the fog computing resource pool to achieve cloud-fog collaborative computing.

Therefore, we set up a fog management node in the fog computing resource pool to assign tasks to specific fog nodes and to decide when to collaborate with cloud resources. The specific scheduling architecture is shown in Fig. 3.

Fig. 3

Task assignment framework in case with joint cloud and fog computing

4.1 System model of cloud and fog collaborative computing

The task assignment framework can be regarded as a weighted undirected graph G(V, E), as shown in Fig. 4, where V = {F_1, F_2, …, F_i, F_M, …, F_m, C} is the vertex set: F_i is a fog node, F_M is the fog manager, and C is the cloud server. \(E=\left\{{e}_{F_M{F}_1},\dots, {e}_{F_M{F}_i},\dots, {e}_{F_MC}\right\}\) is the edge set, where \({e}_{F_M{F}_i}\) is the communication link between fog node F_i and the fog manager, with weight \({W}_{F_M{F}_i}\). The fog manager only allocates tasks and does not execute them. The computing power of F_i is denoted \({A}_{F_i}\) and that of C is denoted A_C. For m fog nodes and one cloud server, a task D can be divided into subtasks of different sizes, denoted D = {d_1, d_2, …, d_j, …, d_C, …, d_m}, where d_C is the subtask assigned to the cloud server.

Fig. 4

Undirected graph of fog exclusive node

The total time consumed by node F_i to execute subtask d_j is

$${T}_a\left({F}_i,{d}_j\right)=\frac{d_j}{A_{F_i}}+{T}_t\left({F}_i,{d}_j\right)$$
(16)

where \(\frac{d_j}{A_{F_i}}\) is the computing delay and T_t(F_i, d_j) is the communication delay.

Similarly, the total time consumed by C is:

$${T}_a\left(C,{d}_C\right)=\frac{d_c}{A_C}+{T}_t\left(C,{d}_c\right)$$
(17)

The total time delay T(d_j, d_C) of the task is:

$$T\left({d}_j,{d}_C\right)=\max \left\{{T}_a\left({F}_i,{d}_j\right),{T}_a\left(C,{d}_C\right)\right\}\ j=1,2,3,\dots$$
(18)
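Eqs. (16)-(18) reduce to a max over per-executor delays. The argument layout (parallel lists of subtask size, computing power, and communication delay, with the cloud treated as one more executor) is our assumption:

```python
# Sketch of Eqs. (16)-(18): total delay of a task split across fog
# nodes and the cloud server.
def total_delay(sizes, powers, comm_delays):
    """T(d_j, d_C) = max over all executors of d / A + T_t."""
    return max(d / a + t for d, a, t in zip(sizes, powers, comm_delays))
```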

Finally, we have a constrained optimization problem:

$${\displaystyle \begin{array}{l}\min \kern0.5em \max \kern0.5em \left\{T\left({d}_j,{d}_c\right)\right\}j=1,2,3,\dots \\ {}\kern3em s.t.0\le {d}_{j,}{d}_c\le D\\ {}\kern3em \sum \limits_{j=1}^{m-1}{d}_j+{d}_c=D\end{array}}$$
(19)

4.2 IDGSA-P algorithm

The original problem is a constrained optimization problem that IDGSA cannot solve directly, so we propose IDGSA-P, which converts it into an unconstrained problem.

We design an adaptive penalty function based on the concept of offset. First, construct the chromosome of the genetic algorithm: x_i = {δ_1, δ_2, …, δ_j, …, δ_m, δ_C}, where δ_j is the scale factor:

$${\delta}_j=\frac{d_j}{D}\ j=1,2,3,\dots$$
(20)

Then, the original constraints can be converted to:

$${\displaystyle \begin{array}{l}s.t.\kern0.5em 0\le {g}_j\left({x}_i\right)\le 1\kern0.5em j=1,2,3\dots \\ {}\kern3em h\left({x}_i\right)-1=0\end{array}}$$
(21)

where g_j(x_i) and h(x_i) are:

$${g}_j\left({x}_i\right)={\delta}_j\ j=1,2,3\dots$$
(22)
$$h\left({x}_i\right)=\sum_{j=1,}^{m-1}{\delta}_j+{\delta}_C$$
(23)

We define g_z(x_i) and g_f(x_i) as the positive and negative offset values of g_j(x_i). Then:

$${g}_z\left({x}_i\right)=\max \left\{0,{g}_j\left({x}_i\right)-1\right\}\ j=1,2,3\dots$$
(24)
$${g}_f\left({x}_i\right)=\min \left\{0,{g}_j\left({x}_i\right)\right\}\ j=1,2,3\dots$$
(25)

Similarly, the positive and negative offset values of h(x_i) are:

$${h}_z\left({x}_i\right)=\max \left\{0,h\left({x}_i\right)-1\right\}$$
(26)
$${h}_f\left({x}_i\right)=\min \left\{0,h\left({x}_i\right)-1\right\}$$
(27)

Within the chromosome set {x_i}, we can find the infeasible solution set according to (24)-(27), denoted \(\left\{{x}_j^{reject}\right\}\). The positive and negative offset degrees of the constraints in Eq. (19) are then introduced:

$${D}_z\left(f\left({x}_i\right)\right)=\frac{f_z\left({x}_i\right)}{\max \left\{{f}_z\left({x}_j^{reject}\right)\right\}-{f}_z\left({x}_i\right)}$$
(28)
$${D}_f\left(f\left({x}_i\right)\right)=\frac{\mid {f}_f\left({x}_i\right)\mid }{\mid \min \left\{{D}_f\left({f}_j^{reject}\right)\right\}\mid -\mid {f}_f\left({x}_i\right)\mid }$$
(29)

Based on (28)-(29), we define ϕ_j(x_i) and ξ(x_i) as the offset degrees of the current solution x_i with respect to the inequality and equality constraint functions:

$${\phi}_j\left({x}_i\right)=\left\{\begin{array}{c}1\ 0\le {g}_j\left({x}_i\right)\le 1\\ {}{e}^{-{D}_z\left({g}_z\left({x}_i\right)\right)}\ {g}_j\left({x}_i\right)>1\ \\ {}{e}^{-{D}_f\left({g}_f\left({x}_i\right)\right)}\ {g}_j\left({x}_i\right)<1\end{array}\right.$$
(30)
$$\xi \left({x}_i\right)=\left\{\begin{array}{c}1\ h\left({x}_i\right)=1\\ {}{e}^{-{D}_z\left({h}_z\left({x}_i\right)\right)}\ h\left({x}_i\right)>1\ \\ {}{e}^{-{D}_f\left({h}_f\left({x}_i\right)\right)}\ h\left({x}_i\right)<1\end{array}\right.$$
(31)

Finally, according to (30) ~ (31), we set the penalty function in the genetic algorithm as:

$$\psi \left({x}_i,\beta \right)={\left(2-\frac{1}{2}\left(\xi \left({x}_i\right)+\frac{1}{m}\sum_{j=1}^m{\phi}_j\left({x}_i\right)\right)\right)}^{\beta }$$
(32)

where β is the penalty factor.

By introducing a penalty function, the original constrained optimization problem is finally transformed into an unconstrained optimization problem [19]:

$$\min \max \left\{ TSB\left(j,\gamma \right)\times \psi \left(x,\beta \right)\ \right\}\kern0.5em j=1,2,3,\dots$$
(33)
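The penalty pipeline of Eqs. (20)-(32) can be sketched as follows. For brevity, the offset degrees D_z and D_f are taken here as the raw offsets rather than the normalization over the infeasible set in Eqs. (28)-(29); this simplification, and all naming, are our assumptions:

```python
# Simplified sketch of the adaptive penalty psi(x, beta) (Eq. 32).
import math

def penalty(x, beta):
    """Penalty for a chromosome x = [delta_1, ..., delta_m, delta_C]."""
    phi = []
    for g in x:                                   # g_j(x) = delta_j (Eq. 22)
        if 0.0 <= g <= 1.0:
            phi.append(1.0)                       # feasible branch of Eq. (30)
        elif g > 1.0:
            phi.append(math.exp(-(g - 1.0)))      # positive offset (Eq. 24)
        else:
            phi.append(math.exp(-abs(g)))         # negative offset (Eq. 25)
    h = sum(x)                                    # h(x) (Eq. 23)
    xi = 1.0 if h == 1.0 else math.exp(-abs(h - 1.0))       # Eq. (31)
    return (2.0 - 0.5 * (xi + sum(phi) / len(x))) ** beta   # Eq. (32)
```

A fully feasible chromosome yields ψ = 1 (no penalty), and ψ grows above 1 with the degree of constraint violation; the fitness in Eq. (33) is then TSB multiplied by ψ.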

5 Simulation and results

In this section, we design experiments to test our algorithms. The parameters and equipment used in the experiments are given below.

5.1 Simulation environment

For the simulation of the IDGSA algorithm, this paper uses Sock Shop, a demo that simulates a smart factory producing socks; the parameter values in the experiment come from analysis of Sock Shop [20]. Sock Shop is a microservice demo application that simulates the actual operation of a smart sock factory. The resource usage of each container application comes from load testing of the demo, and the number of instances of each container application required by a task comes from analysis of the customer behavior model graph (CBMG) [7]. Table 1 shows the resource consumption.

Table 1 Container application in socks shop

The IDGSA-P algorithm is simulated in the cloud-fog collaborative computing scenario, so a cloud server is added alongside the fog nodes. The configuration is shown in Table 2.

Table 2 Parameters related to fog node and cloud server

5.2 Results and analysis

This part presents the simulation results that verify the proposed scheduling algorithms, together with analysis.

5.2.1 Results of IDGSA with different optimization weights

We experimentally simulate the performance of the IDGSA algorithm under different balance-optimization weights by varying the weight β of the balance objective in the factor TSB.

As can be seen in Fig. 5(a), as β increases, the computing time of tasks increases and the cluster balance value decreases (representing a gradual increase in resource utilization). As shown in Fig. 5(b), when β is 0.9, TSB reaches its minimum; that is, both the task computing time and the cluster balance value are small (cluster resource utilization is high), and the two optimization goals reach a relative optimum.

Fig. 5

Simulation results under different equalization optimization weights

5.2.2 Comparison between IDGSA algorithm and traditional genetic algorithm (traditional GA)

To demonstrate the advantages of IDGSA, it is compared with the traditional GA by simulation. In the experiment, the population size is 200 and the number of iterations is 4000; the horizontal coordinate is the number of iterations, and the vertical coordinates are the balance factor TSB, task completion time, and cluster balance, respectively.

Figure 6(a) shows the balance factor TSB obtained by IDGSA and the traditional GA. IDGSA reaches its optimal solution at about 500 iterations, while the traditional GA needs about 1500 iterations, and IDGSA also obtains a better final result. Figure 6(b) and (c) show the cluster balance and task execution time obtained by the two algorithms; again IDGSA achieves better results at about 500 iterations, whereas the traditional GA needs about 1500. We therefore draw two conclusions: first, the final results of IDGSA are better than those of the traditional GA; second, IDGSA reaches its results in fewer iterations.

Fig. 6
Performance comparison between IDGSA and traditional GA

5.2.3 Comparison of IDGSA algorithm and Kubernetes default algorithm

The IDGSA algorithm is intended to replace the default scheduling algorithm of Kubernetes, so the performance of the two is compared. By changing the number of fog computing nodes in the fog computing resource pool, the algorithms are compared as the available resources change. In Fig. 7(a) and (b), the horizontal coordinate is the number of fog computing nodes. As the number of nodes increases, the task execution time and the cluster balance degree of both algorithms decrease. The results show that, compared with the Kubernetes default scheduling algorithm, IDGSA reduces the task processing time by about 50% and increases resource utilization by about 60%. IDGSA yields shorter task processing time and higher resource utilization regardless of whether fog computing resources are plentiful or scarce.
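For context on the baseline, the following sketch mimics the greedy, per-pod style of the Kubernetes default scheduler's "least allocated" scoring (each pod goes to the node with the most free CPU), without requiring Kubernetes itself; the data values are illustrative assumptions. A batch optimizer like IDGSA instead considers all tasks jointly, which is the source of the gains reported above.

```python
def least_allocated_place(pods, capacity):
    """Greedy baseline: pods are CPU requests, capacity is per-node CPU.
    Each pod is placed on the node with the most free CPU at that moment.
    Returns the pod -> node index list."""
    free = list(capacity)
    placement = []
    for req in pods:
        node = max(range(len(free)), key=lambda n: free[n])  # most free CPU wins
        free[node] -= req
        placement.append(node)
    return placement
```

Because each pod is placed in isolation, this greedy rule can leave the cluster unbalanced for a batch of heterogeneous tasks, which is the situation the joint optimization targets.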

Fig. 7
Performance comparison of IDGSA and the Kubernetes default algorithm

5.2.4 Comparison between IDGSA-P and the Joines & Houck method

The Joines & Houck method is a commonly used approach to constrained optimization. It also builds on the traditional genetic algorithm and converts the constrained problem into an unconstrained one by adding a penalty term to the objective.
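The core of the method is a dynamic penalty that grows with the generation counter, so infeasible solutions are tolerated early in the search and driven out later. A minimal sketch follows, assuming the commonly cited form F(x, t) = f(x) + (C·t)^α · Σᵢ vᵢ^β with the typical constants C = 0.5 and α = β = 2; the violation values are illustrative.

```python
def dynamic_penalty_fitness(f, violations, t, C=0.5, alpha=2.0, beta=2.0):
    """Penalized objective (minimized): raw objective f plus a penalty on the
    non-negative constraint violation amounts that grows with generation t."""
    svc = sum(v ** beta for v in violations)
    return f + (C * t) ** alpha * svc
```

Feasible individuals (all violations zero) keep their raw fitness at every generation, while an infeasible individual's penalty increases quadratically with t, gradually pushing the population toward the feasible region.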

In this simulation, IDGSA-P and the Joines & Houck method are compared over 4000 iterations, with the results shown in Fig. 8. IDGSA-P converges faster and finds a better solution than the Joines & Houck method.

Fig. 8
Performance comparison between IDGSA-P and the Joines & Houck method

5.2.5 Performance comparison under different computing scenarios

To verify the benefit of the computing method adopted in this paper (dedicated fog nodes combined with a cloud server), it is compared by simulation with pure cloud computing and pure fog computing. The number of iterations is 3000, and IDGSA-P is adopted.

As can be seen from Fig. 9, pure cloud computing offers powerful computing capacity and can reduce the computation delay of tasks, but the cloud server is generally far from the smart factory and the network link bandwidth is limited, so the communication delay is large. The total task processing delay of pure cloud computing is therefore greater, and the performance gap widens as the number of tasks increases. Compared with pure fog computing, introducing a cloud server with powerful computing capacity and reasonably allocating the task proportion reduces the total task delay. When the number of users is 500, the method adopted in this paper reduces the total task delay by 18% and 7% compared with pure cloud computing and pure fog computing, respectively, thereby improving plant efficiency when processing the real-time tasks of the smart factory.
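The trade-off above can be captured with a simplified delay model: cloud nodes compute fast but sit behind a long, bandwidth-limited link, fog nodes are local but slow, and splitting the workload (a fraction x offloaded to the cloud, processed in parallel with the fog share) can beat either pure strategy. All speeds, bandwidths, and latencies below are illustrative assumptions, not measured values from the paper.

```python
def cloud_delay(work, data, cloud_speed=10.0, bandwidth=2.0, rtt=0.5):
    """Cloud path: fast computation, plus transfer and round-trip delay."""
    return work / cloud_speed + data / bandwidth + rtt

def fog_delay(work, fog_speed=3.0):
    """Fog path: local, so no transfer delay, but slower computation."""
    return work / fog_speed

def collaborative_delay(work, data, steps=100):
    """Scan the cloud share x in [0, 1]; the two parts run in parallel,
    so the total delay is the slower of the two paths."""
    best = float("inf")
    for i in range(steps + 1):
        x = i / steps
        d = max(cloud_delay(x * work, x * data), fog_delay((1 - x) * work))
        best = min(best, d)
    return best
```

With these assumed numbers, the optimal split equalizes the two paths' delays and undercuts both pure strategies, consistent with the qualitative trend in Fig. 9.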

Fig. 9
Performance under different computing scenarios

6 Conclusion

In the literature, there are few studies of task scheduling frameworks for fog and cloud computing resources in smart factories that consider task scheduling tools such as Kubernetes, and there is a lack of studies that jointly optimize task processing delay and resource balance. Based on the characteristics and requirements of smart factories, this paper improves the fog computing architecture for smart factories. Firstly, we discuss the scheduling and resource balancing of smart factory tasks in the fog computing scenario and design the balance factor TSB to jointly consider the delay and balance objectives; to solve this problem, we propose the IDGSA algorithm, which achieves better optimization results and faster convergence than the traditional genetic algorithm. Secondly, we establish a cloud and fog collaborative computing model to flexibly allocate fog and cloud computing resources, and design IDGSA-P on the basis of IDGSA to optimize the resulting constrained problem. Finally, we conduct simulation experiments to verify the proposed algorithms. The results show that IDGSA outperforms both the traditional genetic algorithm and the Kubernetes default algorithm, and that, under IDGSA-P, cloud and fog collaborative computing uses resources more effectively and achieves lower computing delay than cloud-only or fog-only computing.

Future work includes considering more optimization objectives such as limited computing resources and/or communication bandwidth, a real-time analysis model that accounts for resource faults, and implementation of the algorithm in a smart manufacturing testbed using K8S.