Introduction

The complex and stochastic nature of production optimization is one of the greatest challenges standing in the way of Industry 4.0 manufacturing. Automation alone can hardly resolve the complexity and uncertainty of a modern manufacturing environment with mixed product variety, and scholars and practitioners of production optimization have been fighting both for decades. Advanced Planning and Scheduling (APS) is a typical complex and stochastic optimization problem. For example, the Hybrid Flow Shop (HFS) problem, which determines the sequence of a number of products produced at several production stages, is NP-hard in most instances (Ruiz and Vázquez-Rodríguez 2010). The advent of Cyber-Physical Systems (CPS) provides a promising direction forward and paves the way to Industry 4.0 manufacturing (Zhong et al. 2017; Yang et al. 2019; Oztemel and Gursev 2020). This study is therefore concerned with how the planning and scheduling processes are reshaped in such a visible, traceable, and interconnected real-time environment, and how the APS problem can be solved by leveraging the strengths of CPS factories.

Considerable efforts have been made by industry and academia to resolve manufacturing complexity and uncertainty. Leading manufacturers have developed various APS systems to allocate materials and production capacity to meet demand optimally. The usefulness of these systems is widely appreciated. However, current APS systems lack versatility, and customization costs are usually high. Without prompt use of real-time shop floor data, it is also hard for these systems to respond to disturbances and uncertainties in actual production progress. Besides, frequent rescheduling may cause resistance to change, which is counterproductive for improving efficiency (Rahmani and Ramezanian 2016). Manufacturers have also invested massively in highly automated production lines. Still, the performance has not come up to expectations: automation executes static schedules precisely and efficiently, but it is not smart enough to tackle the various uncertainties of a dynamic environment.

On the other hand, due to the NP-hardness of the planning and scheduling problem, researchers have tried a large variety of mathematical methods and computational algorithms. These approaches have generated similar solutions that are optimal in theory but can hardly be put into practice because the uncertainties are not well tackled. Subsequently, Hierarchical Planning and Scheduling (HPS) and Multiperiod Planning and Scheduling (MPS) emerged as two typical methods to locate and resolve complexity and uncertainty. HPS discretizes the original complex problem into a series of relatively deterministic subproblems (Hax and Meal 1973). MPS divides the decision horizon into multiple short periods with limited uncertainty (Li and Ierapetritou 2010). Nevertheless, both HPS and MPS require coordination and integration mechanisms for connecting subproblems, and both suffer from the lack of them.

Nowadays, the power of information sharing for reducing the bullwhip effect is widely known. The new hope is to develop big data analytics to minimize complexity and uncertainty (Yang et al. 2019). Fortunately, the Internet of Things (IoT) devices in CPS promise to capture real-time factory data (Sisinni et al. 2018; Fang and Zheng 2020). However, IoT, as it is, does not readily serve as an effective solution for sharing information, and only very few companies have implemented such IoT solutions on a large scale. Moreover, how real-time shop floor data can be utilized to support decision-making is still an open question.

This paper proposes a “divide and conquer” approach, Spatial–Temporal Out-Of-Order execution (ST-OOO), for achieving Real-Time Advanced Planning and Scheduling (RT-APS) in cyber-physical factories. ST-OOO divides the space and time scopes of a workshop into finite space and time units to reduce complexity and localize uncertainties, so that the complex optimization problem is decomposed into a set of small-size subproblems with different spatial and temporal characteristics. These subproblems can be tackled in a rolling spatial–temporal manner to generate a global solution. Three research questions are addressed in this paper. First, how can the traditional monolithic APS decision be discretized into a series of real-time decisions, and how can their connections and dependencies be established using real-time visibility and traceability? Second, how can a dynamic job distribution and execution mechanism be designed that considers the actual shop floor situation, such as the availability of men, machines, and materials, to organize production activities in a simple and resilient manner? Third, how do the key parameters of the proposed method, namely the spatial and temporal factors, affect its performance?

This study aims to develop ST-OOO for real-time advanced production, planning, scheduling, and execution for cyber-physical factories. There are three research objectives:

  • To develop a novel “divide and conquer” approach, ST-OOO, to resolve the complexity and uncertainty and to localize the disturbances in real-life manufacturing optimization problems;

  • To apply the proposed ST-OOO in a hybrid flow shop scenario with detailed steps for practically achieving RT-APS;

  • To conduct a case study for evaluating the performance of ST-OOO and investigating the impacts of the spatial and temporal factors, and some parameters of shopfloor configuration on the performance.

In terms of research significance, this study offers a brand-new perspective of complexity and uncertainty management using real-time data and information visibility and traceability. It contributes to the theoretical basis for solving optimization problems in the CPS environment. From a practical perspective, this article develops a new effective solution for optimization problems encountered in the real-life factory to reduce operational errors and improve efficiency, productivity, and resource utilization. In short, this study provides a novel solution that can obtain a good trade-off between theory and practice.

The rest of this paper is organized as follows. “Literature review” section gives a literature review. The general idea and two key components of ST-OOO are presented in “Spatial–temporal out-of-order execution” section. “Five steps of the ST-OOO application in CP-HFS” section explains the detailed steps of ST-OOO with a HFS example. The case study is conducted in “Case study” section. Finally, “Conclusions” section summarizes this study and gives several research perspectives.

Literature review

Hybrid flow shop scheduling problem

HFS scheduling has been receiving considerable research attention for decades (Ruiz and Vázquez-Rodríguez 2010). Given its theoretical complexity and practical significance, researchers have presented a wide variety of mathematical models and computational algorithms to tackle the problem, including exact approaches, heuristics, and metaheuristics.

Exact approaches, including Branch and Bound (B&B) and mathematical programming, were preferred in the early literature on HFS scheduling. The B&B algorithm was first proposed by Brah and Hunsucker (1991) to tackle a general m-stage HFS scheduling problem for minimizing the makespan. Afterwards, Vandevelde et al. (2005) presented more advanced lower and upper bounds. Wittrock (1988) decomposed the HFS problem and used dynamic programming (DP) to find a proper loading sequence. Besides, Liu and Karimi (2008) formulated several mixed-integer linear programming (MILP) models for m-stage HFS with parallel processors and evaluated their quality. These methods have proven effective for small-size HFS problems, but handling real-world large-scale HFS problems with exact methods is computationally impractical.

Heuristics and metaheuristics usually require less computational effort when dealing with large-size HFS problems. Brah and Wheeler (1998) investigated 9 dispatching rules in an HFS with average flow time and makespan objectives. S. Wang and Liu (2013) designed a heuristic algorithm based on B&B for two-stage HFS scheduling and obtained near-optimal results in numerical studies. Metaheuristics, including Genetic Algorithms (GA) and Ant Colony Optimization (ACO), are also commonly used in HFS problems. Komaki et al. (2016) addressed a two-stage HFS problem followed by an assembly machine using an Artificial Immune System (AIS) algorithm. Qin et al. (2018) designed an ACO-based strategy to solve an HFS problem with uncertain processing time. Scholars have tried various algorithms to solve HFS problems given their NP-hardness. Nevertheless, these algorithms produce similar results, which are near-optimal in theory but still lack the resilience to handle frequent uncertain events in an industrial environment.

Hierarchical/multiperiod planning and scheduling

It is generally accepted that APS problems are complex and stochastic in nature (Efthymiou et al. 2016; Keller and Bayraksan 2009). Scholars realize that the breakthrough to Industry 4.0 manufacturing is impossible without resolving the complexity and uncertainty.

Hierarchical Planning and Scheduling (HPS) is a typical approach to reduce complexity by decomposing the APS problem into a set of subproblems (Bitran et al. 1982). Dempster et al. (1981) introduced a stochastic programming framework that encompasses the entire multi-level decision process. More recently, O’Reilly et al. (2015) proposed a theoretical framework of HPS for food manufacturers. Menezes et al. (2016) investigated bulk cargo terminals and presented a HPS approach with a math model for scheduling and lot-sizing decisions integration. Multiperiod Planning and Scheduling (MPS) is another method to manage complexity and uncertainty by discretizing the decision horizon into multiple shorter time intervals (Sridharan et al. 1987). Balakrishnan and Cheng (2007) reviewed the research addressing the reconfiguration and uncertainty issues in cellular manufacturing with multiperiod planning horizons. Torkaman et al. (2017) studied a multi-stage multi-product multi-period capacitated flow shop planning problem and developed MIP-based heuristics with rolling horizons.

These efforts are widely appreciated, but the complexity and uncertainty issues remain, since both HPS and MPS require real-time coordination and integration of subproblems and both suffer from the lack of it.

CPS for smart manufacturing

In recent years, frontier Industry 4.0 concepts such as IoT, blockchain, and CPS have provided new paradigms to enhance visibility and traceability on the shop floor. Udoka (1991) presented an overview of automated data capture technologies and claimed that these technologies are crucial to the success of production automation. Zhong et al. (2013) investigated RFID-enabled manufacturing and proposed a real-time MES for planning and scheduling. Lin et al. (2018b) used iBeacon technologies to construct visibility in an HFS to facilitate real-time decisions and operations. In addition, the applications of industrial wearables have been widely studied (Ming Li et al. 2019; Kong et al. 2019). Thanks to the deployment of IoT devices, abundant data are accessible in real time. Kusiak (2017) revealed the crucial role of big data in Industry 4.0 manufacturing and summarized five gaps to fill. Moreover, L. Wang and Haghighi (2016) provided a concept map and associated a CPS with the joint strength of operators, agents, and functional blocks to obtain higher performance. A CPS-based Industrial IoT Hub was presented by Tao et al. (2017) to achieve real-time shop floor interconnection.

Indeed, manufacturing is getting smarter; however, the technologies themselves do not readily serve as effective solutions for supporting APS decision-making. Moreover, how these cutting-edge concepts reshape the APS process, and what the new characteristics of APS in CPS factories are, deserve more exploration.

Finite elements method and out-of-order execution

This study also draws inspiration from the Finite Element Method (FEM) and the Out-Of-Order execution mechanism. FEM originated from structural analysis problems in the early 1940s and obtained its real impetus in the 1960s thanks to the power of software programs and computers (Zienkiewicz et al. 2013). The essential idea of FEM is to divide a complex system into smaller, simpler parts (finite elements) over which the unknown function is approximated, so that simple equations are easy to establish and solve. These simple equations are then connected into an equivalent system of equations to model the original system. The method captures local effects and seeks a global solution. Inspired by FEM, this study divides a factory into finite elements by meshing so that the complexity and uncertainty can be significantly reduced. Straightforward models can thus be formulated and tackled, and then integrated using visibility and traceability.

Out-Of-Order (OOO) execution, introduced in the 1960s and now widely implemented, has contributed to the advancement of high-performance CPUs. OOO makes full use of instruction cycles to overcome efficiency problems encountered in in-order processors. The concept of OOO is to allow the CPU to process instructions according to the real-time availability of input data and arithmetic logic units, rather than by their initial order in a program. In this way, OOO avoids a class of stalls that occur in in-order processors and improves the efficiency of instruction processing. OOO offers complex logic for dynamic analysis and resolution of data dependencies to enable more efficient use of multiple execution units. It inspires this study to introduce an OOO execution mechanism in CPS-enabled factories.

Spatial–temporal out-of-order execution

This section presents the general idea of the Spatial–Temporal Out-Of-Order execution (ST-OOO) and how this method leverages the strengths of Cyber-Physical Factories (CPF) for achieving Real-Time Advanced Planning and Scheduling (RT-APS).

Real-time visibility and traceability in CPF

In a CPF, the physical space is characterized by advanced connectivity between physical entities and by real-time data and information acquisition. Cloud services, intelligent data analytics, and information sharing mechanisms are integrated into cyberspace to support decision-making. Real-Time Visibility and Traceability (RTVT) is the most crucial and indispensable characteristic of a CPF. RTVT visualizes the flows of men, machines, and materials along 4 dimensions (space, time, information, status) to provide a thorough understanding of the real-time situation. The dimensions of space and time describe “where and when” and present a straightforward spatial–temporal trajectory. The dimension of information usually specifies “what” (basic facts such as ID, type, customer order, due date, etc.), while the dimension of status refers to “how” (such as whether the WIP is waiting in a buffer or in operation, whether there are any errors, etc.). These help to establish dependencies such as how the job pools update over time and how the jobs flow between stages. RTVT extends the manufacturing facts to a higher reality in which data are fully utilized along multiple dimensions. The novel ST-OOO can accommodate such a highly transparent, traceable, and interconnected environment with real-time decision-making. Instead of generating an optimal solution at the beginning and rescheduling frequently to cope with uncertainties, ST-OOO keeps monitoring the actual situation on the shop floor using RTVT to support real-time production decision-making. The role of RTVT is further explained with a hybrid flow shop example in “Five steps of the ST-OOO application in CP-HFS” section. Spatial–Temporal Analytics and Out-Of-Order execution are the two key components of ST-OOO.

Key component 1: spatial–temporal analytics

Inspired by the “divide and conquer” spirit of the Finite Element Method (FEM), a Spatial–Temporal Analytics (STA) is proposed in this subsection as the first key component of ST-OOO (Mingxing Li et al. 2020). The FEM becomes feasible thanks to modern computing power, while STA relies on the strengths of RTVT in a CPF.

Figure 1 shows the general process of STA. Firstly, the space scope \( {\fancyscript{S}} \) and time scope \( {\fancyscript{T}} \) of the original problem \( {\fancyscript{P}} \) are divided into a series of simple subproblems \( {\fancyscript{p}} \), which are called Spatial–Temporal Elements (ST-Elements). ST-Elements are of smaller space scale \( \Delta {\fancyscript{s}} \) and shorter time scale \( \Delta {\fancyscript{t}} \). Secondly, the element decision models are formulated; they should be relatively simple, as the complexity and uncertainty of ST-Elements are slashed in meshing. Thirdly, the connectivity and dependency of ST-Elements, as well as the boundary conditions, are identified using RTVT for the assembly of all isolated ST-Elements. Lastly, the subproblems are solved under RTVT in a rolling spatial–temporal manner to generate a global solution.

Fig. 1 Spatial–Temporal Analytics

STA provides a new perspective to solve manufacturing optimization problems by minimizing complexity, approximating uncertainties, and localizing disturbances. Real-time information and data collected in a CPF are fully exploited in STA to facilitate decision-making.

Key component 2: out-of-order execution

The key idea of OOO in computer engineering is to allow the CPU to process instructions according to the real-time availability of input data and arithmetic logic units. In this way, OOO avoids a class of stalls and improves resource utilization and processing efficiency. It inspires this study to introduce OOO in a CPF. That is, within a ST-Element, jobs are processed in an order governed by the availability of materials, machines, and men.

Figure 2 presents an example of OOO. Consider a simple CPU with 4 independent instructions in the queue, where each instruction consists of 4 steps: Fetch (F), Decode (D), Execution (E), and Writeback (W). An in-order processor processes instructions one by one in the original order. If a hazard occurs, for instance a cache miss (the data requested for \( {\text{I}}_{2} \) is not found in the cache, so extra time is needed to look up the address and retrieve the data from RAM), the execution of \( {\text{I}}_{2} \) is delayed and \( {\text{I}}_{3} , {\text{I}}_{4} \) have to wait in the queue. In an OOO processor, when such a hazard occurs, the CPU checks the availability of data for \( {\text{I}}_{3} , {\text{I}}_{4} \) and processes instructions out of the original order to avoid Stalls (S). Similarly, consider a simple shop floor with 4 jobs in the queue, where each job consists of 3 steps: Fetch (F), Execution (E), and Transfer (T). When a disturbance such as material deficiency or loss occurs, the operator can decide to process other available jobs rather than wait.

Fig. 2 An example of OOO in CPU and shop floor

OOO offers a robust way and complex logic for dynamic job distribution to cope with disturbances and enable more efficient use of production facilities in the shopfloor by giving operators a certain degree of autonomy.
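The shop-floor analogy can be illustrated with a short simulation. The following is only a minimal sketch under stated assumptions, not the mechanism implemented in this study: it assumes a single operator, and the job names, `ready_at` times (when the material becomes available), and durations are hypothetical.

```python
def run(jobs, out_of_order):
    """Simulate one operator processing jobs; return {job: completion time}.

    jobs: list of (name, ready_at, duration) in the original queue order.
    ready_at is a hypothetical stand-in for a disturbance such as late material.
    """
    pending = list(jobs)
    clock = 0
    done = {}
    while pending:
        if out_of_order:
            # Look ahead in the queue for the first job whose material is ready;
            # if none is ready, wait for the job that becomes ready soonest.
            ready = [j for j in pending if j[1] <= clock]
            job = ready[0] if ready else min(pending, key=lambda j: j[1])
        else:
            # In-order: always take the head of the queue, waiting if needed.
            job = pending[0]
        name, ready_at, duration = job
        clock = max(clock, ready_at) + duration
        done[name] = clock
        pending.remove(job)
    return done


# J2's material arrives late (time 6); the other jobs are ready at time 0.
queue = [("J1", 0, 2), ("J2", 6, 2), ("J3", 0, 2), ("J4", 0, 2)]
in_order = run(queue, out_of_order=False)
ooo = run(queue, out_of_order=True)
print(max(in_order.values()), max(ooo.values()))
```

In this toy instance the in-order operator idles while waiting for J2, whereas the out-of-order operator fills the waiting time with J3 and J4.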

Five steps of the ST-OOO application in CP-HFS

As shown in Fig. 3, this section gives detailed steps and explanations on how to achieve RT-APS using the proposed ST-OOO in a Cyber-Physical HFS. Firstly, the space and time scopes of the factory are discretized to generate a spatial–temporal mesh for minimizing complexity, localizing disturbances, and approximating uncertainty. Secondly, the elemental models are formulated; they are relatively simple, as the complexity and uncertainty of APS are slashed in meshing. Thirdly, job pools and the Out-Of-Order execution mechanism are implemented to support decision-making within each ST-Element. Fourthly, the connectivity between ST-Elements and the boundary conditions of the problem are identified for the assembly of isolated ST-Elements. Lastly, a clustering-based synchronization strategy is designed for making the decision. The notations are given in Table 1.

Fig. 3 Five steps of Spatial–Temporal Out-Of-Order execution

Table 1 Notations and descriptions

Step 1 Spatial–Temporal Meshing in CPS-enabled HFS

In a highly visible, transparent, and interconnected CPS-enabled HFS, the first step is to generate spatial–temporal mesh for minimizing complexity, localizing disturbances, and approximating uncertainty.

In this step, the original problem \( {\fancyscript{P}}\left( {{\fancyscript{S}},{ \fancyscript{T}}} \right) \) is discretized into finite ST-Elements for generating the spatial–temporal mesh. Each ST-Element denotes a simpler subproblem \( {\fancyscript{p}}\left( {\Delta {\fancyscript{s}},\Delta {\fancyscript{t}}} \right) \) with limited system size. The ST-Elements should be small enough to reduce complexity and uncertainty and yet large enough to give valid results. All the uncertain events that occur in the current period \( \Delta {\fancyscript{t}} \) are postponed to the next \( \Delta {\fancyscript{t}} \). The loss of service quality (as postponement cost) is negligible since the time scale \( \Delta {\fancyscript{t}} \) is short enough relative to the original decision horizon \( {\fancyscript{T}} \). In the HFS case, machines at different stages usually have different functionalities, and the space scale \( \Delta {\fancyscript{s}} \) of a ST-Element is a single stage. There are two main reasons. On the one hand, traditional approaches solved HFS problems by considering all production stages to generate results close to the theoretical optimum, but these solutions can hardly be applied when the problem size grows beyond the available computing power. On the other hand, if the space scale \( \Delta {\fancyscript{s}} \) of a ST-Element is set as a single machine, there is barely room for further optimization due to the limited capacity. Moreover, weak resilience is another problem if a rigid job processing sequence is determined for every single machine. Frequent rescheduling may cause resistance to change, which is counterproductive for improving efficiency (Rahmani and Ramezanian 2016). From the temporal point of view, the overall APS decision horizon \( {\fancyscript{T}} \) is discretized into a series of medium-length time intervals \( {\fancyscript{t}} \), each of which is a day or a standard work shift. Subsequently, \( {\fancyscript{t}} \) is subdivided into elementary time units \( \Delta {\fancyscript{t}} \) that represent 1 h or several hours. At this point, the complex HFS has been spatiotemporally discretized into an equivalent system of finite space and time units. Hence, for a ST-Element \( \left( {\Delta {\fancyscript{s}},\Delta {\fancyscript{t}}} \right) \) (at a single stage \( \Delta {\fancyscript{s}} \) in an elementary time unit \( \Delta {\fancyscript{t}} \)), the downsized subproblem considers only a small set of jobs. Complexity and uncertainty are slashed.
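The meshing described above can be pictured as enumerating (time unit, stage) pairs. The sketch below is illustrative only: the function name `build_mesh`, its arguments, and the tuple layout are assumptions introduced here, and it simply nests elementary time units inside shifts for a given number of stages.

```python
from itertools import product


def build_mesh(num_stages, horizon_hours, shift_hours, unit_hours):
    """Return a list of ST-Elements as (shift, tau, k) tuples.

    shift: index of the medium-length interval (e.g. a work shift)
    tau:   index of the elementary time unit within the whole horizon
    k:     stage index (the space unit is a single stage)
    All names are hypothetical and chosen for illustration.
    """
    if horizon_hours % shift_hours or shift_hours % unit_hours:
        raise ValueError("horizon, shift and unit lengths must nest evenly")
    units_per_shift = shift_hours // unit_hours
    num_units = horizon_hours // unit_hours
    return [
        (tau // units_per_shift, tau, k)
        for tau, k in product(range(num_units), range(1, num_stages + 1))
    ]


# Two 8 h shifts, 2 h elementary time units, three stages: 8 x 3 elements.
mesh = build_mesh(num_stages=3, horizon_hours=16, shift_hours=8, unit_hours=2)
print(len(mesh), mesh[0], mesh[-1])
```

Each tuple identifies one small subproblem; shorter time units or more stages yield a finer mesh with more, but smaller, elements.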

Step 2 Mathematical Formulation of ST-Elements

As the complexity and uncertainty of the APS problem are slashed in meshing, the element models should be relatively simple. In the discretized HFS, a ST-Element represents a single stage in a time unit (i.e., the parallel machine scheduling problem in a given time unit). All ST-Elements have their own decision autonomy and are assumed to be self-centered: supervisors and operators of a single production stage intend to complete the assigned jobs as soon as possible. This assumption is in line with observations in real-life industry. Therefore, the objective function for each ST-Element is the makespan of that element. The overall benefits of the system are considered using a synchronization mechanism, which is discussed further in Step 5. The other assumptions are: (1) all processing times and setup times are known; (2) parallel machines at the stage are identical; (3) preemption of jobs is not allowed. The ST-Elements are formulated as MILP models.

$$ {\text{Minimize}}\, C^{k, \tau } $$
$$ {\text{Subject}} \,{\text{to}}: $$
$$ \mathop \sum \limits_{l = 1}^{{m^{k} }} Y_{j, l}^{k} = 1, \forall j $$
(1)
$$ X_{j, i}^{k} + X_{i, j}^{k} \le 1, \forall i \ne j, j $$
(2)
$$ C_{j}^{k} \ge S_{j}^{k} + p_{j}^{k} , \forall j $$
(3)
$$ S_{i}^{k} \ge C_{j}^{k} + s_{j, i}^{k} - M \cdot \left( {3 - X_{j, i}^{k} - Y_{j, l}^{k} - Y_{i, l}^{k} } \right) , \forall i \ne j, j, l $$
(4)
$$ S_{j}^{k} \ge C_{i}^{k} + s_{i, j}^{k} - M \cdot X_{j, i}^{k} - M \cdot \left( {2 - Y_{j, l}^{k} - Y_{i, l}^{k} } \right) , \forall i \ne j, j, l $$
(5)
$$ C_{i}^{k} \ge C_{j}^{k} + p_{i}^{k} + s_{j, i}^{k} - M \cdot \left( {3 - X_{j, i}^{k} - Y_{j, l}^{k} - Y_{i, l}^{k} } \right) , \forall i \ne j, j, l $$
(6)
$$ C_{j}^{k} \ge C_{i}^{k} + p_{j}^{k} + s_{i, j}^{k} - M \cdot X_{j, i}^{k} - M \cdot \left( {2 - Y_{j, l}^{k} - Y_{i, l}^{k} } \right) , \forall i \ne j, j, l $$
(7)
$$ C^{k, \tau } \ge C_{j}^{k} , \forall j $$
(8)
$$ X_{j, i}^{k} \in \left\{ {0, 1} \right\} , \forall i, j $$
(9)
$$ Y_{j, l}^{k} \in \left\{ {0, 1} \right\} , \forall j, l $$
(10)
$$ S_{j}^{k} , C_{j}^{k} \ge 0, \forall j $$
(11)
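To make the element model concrete, the sketch below evaluates the makespan of one ST-Element for a given sequence and machine assignment, following constraints (3) to (5) and (8), and then searches all combinations exhaustively. Exhaustive search is a substitute used here for illustration, not the MILP solution approach of this study, and it is only viable for a handful of jobs. The job data, the product families, and the function names are hypothetical.

```python
from itertools import permutations, product


def element_makespan(order, assign, p, s):
    """Makespan of one ST-Element for a given sequence and machine assignment.

    order:  tuple of job ids giving the precedence (the role of X in the model)
    assign: dict job -> machine (the role of Y in the model)
    p[j]:   processing time; s[j][i]: setup time when i is processed after j
    """
    completion = {}
    for i in order:
        # Constraints (4)-(5): i starts after every earlier job on its machine
        # has completed, plus the corresponding setup time.
        start = max(
            (completion[j] + s[j][i] for j in completion if assign[j] == assign[i]),
            default=0,
        )
        completion[i] = start + p[i]  # constraint (3), taken with equality
    return max(completion.values())  # constraint (8)


def solve_element(p, s, num_machines):
    """Exhaustively search sequences and assignments (tiny instances only)."""
    jobs = list(p)
    best = None
    for order in permutations(jobs):
        for machines in product(range(num_machines), repeat=len(jobs)):
            assign = dict(zip(jobs, machines))
            value = element_makespan(order, assign, p, s)
            if best is None or value < best[0]:
                best = (value, order, assign)
    return best


# Hypothetical data: jobs A and B share a product family, as do C and D;
# changing family costs a setup of 2, staying within a family costs 0.
p = {"A": 3, "B": 3, "C": 2, "D": 2}
fam = {"A": 0, "B": 0, "C": 1, "D": 1}
s = {j: {i: (0 if fam[j] == fam[i] else 2) for i in p} for j in p}
best_value, best_order, best_assign = solve_element(p, s, num_machines=2)
print(best_value, best_assign)
```

In this small instance the best assignment keeps each family on its own machine, which avoids every changeover.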

Step 3 Implementation of Job Pool and Out-Of-Order Execution

Step 3 aims at implementing job pools and the OOO execution to support decision-making within each ST-Element. The concept of the job pool is proposed to manage production activities in real time with simplicity and resilience (Guo et al. 2020; Lin et al. 2018a). By introducing the concepts of order pool, stage pool, and machine pool in a CPF, supervisors can easily obtain precise information on orders and on jobs at each stage, or even at each machine, for better monitoring and control of the manufacturing process. The job pool offers a simple but robust way to manage orders and jobs in a synchronized manner and facilitates the implementation of OOO execution. OOO is a paradigm used in modern CPUs to avoid stalls and improve processing efficiency. OOO allows the CPU to process instructions according to the real-time availability of input data and arithmetic logic units. By analogy, the OOO execution in factories organizes onsite production execution according to the real-time availability of materials, machines, and men. Operators look ahead in a window of jobs through smart devices and find those that are ready to be processed. The key features of OOO are the high degree of autonomy, flexibility, and resilience at the operational level. Therefore, the influence of uncertainties such as stochastic processing time, material deficiency/loss, or machine failure on the whole system is negligible compared to the effects of those uncertainties on a rigid production schedule. Operators have autonomy because the jobs assigned to the ST-Element are similar, which is further discussed in the synchronization mechanism in Step 5.

In the HFS scenario, a ST-Element represents a single stage during a given period. The scale of the job pool of a ST-Element is the size of the job (instruction) window in the OOO mechanism. Job pools are updated in real time by utilizing RTVT. For example, when a logistics operator completes a logistics job for transferring the material/WIP to the stage, the material/WIP is ready; thus, the corresponding production job is validated in the job pool of the current ST-Element. The first available production operator looks ahead in the job pool and picks the ready production job. Once the operation is completed and submitted by the production operator, a new logistics job (transferring the material/WIP to the next stage) is generated in the logistics job pool for logistics operators. The job pool and OOO execution mechanism offer robust and straightforward logic to tackle frequent disturbances within ST-Elements. Operators make decisions and act under the OOO mechanism in a highly visible and transparent environment supported by RTVT.
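The interplay between logistics jobs and production jobs can be sketched as a small data structure. This is a hedged illustration rather than the system used in this study: the class `JobPool`, its method names, and the job identifiers are hypothetical, and real-time events from RTVT are replaced by explicit method calls.

```python
class JobPool:
    """Job pool of one ST-Element; the pool size is the OOO look-ahead window."""

    def __init__(self, jobs):
        # Each production job starts unvalidated: its material/WIP has not arrived.
        self.ready = {job: False for job in jobs}

    def logistics_done(self, job):
        """A logistics operator delivered the material/WIP for this job."""
        self.ready[job] = True

    def pick(self):
        """The first available operator takes the first ready job, if any."""
        job = next((j for j, ok in self.ready.items() if ok), None)
        if job is not None:
            del self.ready[job]
        return job


stage_pool = JobPool(["J1", "J2", "J3"])
logistics_pool = []  # logistics jobs waiting for logistics operators

stage_pool.logistics_done("J2")  # only J2's material has arrived so far
picked = stage_pool.pick()  # the operator skips J1 and takes J2
logistics_pool.append((picked, "to next stage"))  # generated on completion
print(picked, logistics_pool)
```

Because readiness rather than queue position decides what is picked, a late delivery for J1 does not block the stage.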

Step 4 Assembly Using Real-Time Visibility and Traceability

Step 4 aims to identify the connectivity between ST-Elements and boundary conditions of the problem for the assembly of isolated ST-Elements using RTVT. The nodes and nodal equilibrium are indispensable to connect the elements for obtaining the global model in FEM. In comparison, the Elemental Connectivities (ECs) in ST-OOO refer to how the job pools update over time, how the jobs are transferred between ST-Elements, and the elemental time dependency. RTVT plays a crucial role in identifying and establishing ECs because all these data can only be accessed through RTVT (space, time, information, status). In the HFS case, three kinds of ECs are considered:

Firstly, the update of job pools utilizes four dimensions of RTVT. The input of the element \( \left( {\tau ,k} \right) \) includes two parts. The first one is the output of \( \left( {\tau - 1,k} \right) \)(space and time dimensions); the second one is the new information in time unit \( \tau - 1 \) that is postponed to \( \tau \) (information dimension). And the sets \( Z_{\tau - 1,k,g}^{in} \) and \( Z_{\tau - 1,k,h}^{out} \) are also affected by various uncertain events such as stochastic operation time, machine breakdown, etc. (status dimension). The job pools are updated as

$$ \begin{array}{*{20}c} {Z_{\tau ,k} = \left( {Z_{\tau - 1,k} \backslash \mathop \cup \limits_{h = k + 1}^{K} Z_{\tau - 1,k,h}^{out} } \right) \cup \left( {N_{\tau - 1,k} \cup \mathop \cup \limits_{g = 1}^{k - 1} \;Z_{\tau - 1,k,g}^{in} } \right), \forall k \in K,\forall \tau \in T } \\ \end{array} $$
(12)
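Equation (12) is a plain set update, which the following sketch mirrors with Python sets. The function name and the sample job identifiers are assumptions made for illustration; the arguments correspond to the four terms on the right-hand side of (12).

```python
def update_pool(z_prev, out_sets, new_info, in_sets):
    """Equation (12): pool of element (tau, k) from element (tau-1, k).

    z_prev:   Z_{tau-1,k}, jobs in the pool during the previous time unit
    out_sets: iterable of Z^out_{tau-1,k,h} for downstream stages h > k
    new_info: N_{tau-1,k}, jobs that arrived in tau-1 and were postponed
    in_sets:  iterable of Z^in_{tau-1,k,g} for upstream stages g < k
    """
    remaining = set(z_prev) - set().union(*out_sets)
    arrived = set(new_info).union(*in_sets)
    return remaining | arrived


# Hypothetical example: J1 left for a downstream stage, J5 arrived from an
# upstream stage, and J7 is a new job postponed from the previous time unit.
pool = update_pool(
    z_prev={"J1", "J2", "J3"},
    out_sets=[{"J1"}, set()],
    new_info={"J7"},
    in_sets=[{"J5"}],
)
print(sorted(pool))
```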

The second EC uses the space and time dimensions of RTVT. The jobs that are completed in element \( \left( {\tau - 1,k} \right) \) flow to the subsequent space units \( k + 1, \ldots ,K \), and the jobs that flow into the job pool of element \( \left( {\tau ,k} \right) \) are composed of the jobs completed in the previous space units \( 1, 2, \ldots ,k - 1 \).

$$ \begin{array}{*{20}c} {Z_{\tau - 1,k}^{out} = \mathop \cup \limits_{u = k + 1}^{K} Z_{\tau ,k,u}^{in} , \forall k \in K,\forall \tau \in T } \\ \end{array} $$
(13)
$$ \begin{array}{*{20}c} {Z_{\tau ,k}^{in} = \mathop \cup \limits_{v = 1}^{k - 1} \;Z_{\tau - 1,v,k}^{out} , \forall k \in K,\forall \tau \in T } \\ \end{array} $$
(14)

The following formula (15) defines the time dependency between element models using the space and time dimensions of RTVT. An operation of job \( j \) can start in space unit \( k \) only when the operation of that job in space unit \( k - 1 \) has been completed.

$$ \begin{array}{*{20}c} {S_{j}^{k} \ge C_{j}^{k - 1} , \forall k \in K, \forall j } \\ \end{array} $$
(15)

In FEM, certain Boundary Conditions (BCs) must be specified for each point on the solid surface to remove the singularity problem. In the HFS, three kinds of BCs are considered: (1) the initial state of the workshop, including the configuration and the production capacity of the HFS, and the job pools \( Z_{0,k} , \forall k \in {\mathcal{K}} \); (2) the new information (e.g., new orders, availability of machines and workers) to be considered, \( N_{\tau ,k} , \forall k \in {\mathcal{K}},\forall \tau \in {\mathcal{T}} \); (3) the objectives, which also affect the APS decisions. There are three objectives to minimize: (1) Makespan (MS); (2) Total Setup Time (TST); (3) Mean Order Flow Time (MOFT, where the flow time of an order is defined as the difference between the starting time of the first job in the order and the completion time of the last job in the order). MOFT incorporates both the job flow time measure and the waiting time/holding time measure. The waiting time/holding time is commonly used as a manufacturing synchronization measure (Lin et al. 2018b; Chen et al. 2019; Luo et al. 2019).

$$ MS \ge C_{j}^{K} , \forall j $$
(16)
$$ TST = \mathop \sum \limits_{i} \mathop \sum \limits_{j} \mathop \sum \limits_{k} \mathop \sum \limits_{l} Y_{j, l}^{k} \cdot Y_{i, l}^{k} \cdot \left( {X_{j, i}^{k} \cdot s_{j, i}^{k} - X_{i, j}^{k} \cdot s_{i, j}^{k} } \right) $$
(17)
$$ MOFT = \frac{{\mathop \sum \nolimits_{o} \left[ {\mathop {\hbox{max} }\limits_{{h_{j,o} = 1}} \left( {C_{j}^{K} } \right) - \mathop {\hbox{min} }\limits_{{h_{j,o} = 1}} \left( {S_{j}^{k} } \right)} \right]}}{O} $$
(18)
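To make the measures in (16) and (18) concrete, the following Python sketch computes MS and MOFT from a finished schedule. The lookups `first_start`, `final_completion`, and `order_of` are hypothetical names introduced only for illustration, and the numbers are toy values rather than experiment data. TST is left out because it depends on the sequencing and assignment variables of (17).

```python
def makespan(final_completion):
    """MS as the smallest value satisfying MS >= C_j^K for every job j."""
    return max(final_completion.values())


def mean_order_flow_time(first_start, final_completion, order_of):
    """MOFT: mean over orders of (latest completion - earliest start)."""
    spans = {}
    for job, order in order_of.items():
        lo, hi = spans.get(order, (first_start[job], final_completion[job]))
        spans[order] = (min(lo, first_start[job]), max(hi, final_completion[job]))
    return sum(hi - lo for lo, hi in spans.values()) / len(spans)


# Illustrative data (minutes): two orders with two jobs each.
order_of = {"j1": "o1", "j2": "o1", "j3": "o2", "j4": "o2"}
first_start = {"j1": 0, "j2": 10, "j3": 5, "j4": 20}
final_completion = {"j1": 30, "j2": 50, "j3": 40, "j4": 45}

ms = makespan(final_completion)
moft = mean_order_flow_time(first_start, final_completion, order_of)
```

Here order o1 spans 0 to 50 min and order o2 spans 5 to 45 min, so the two order flow times are 50 and 40 min.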

With the ECs and BCs, all isolated ST-Elements can be assembled into a global structure.

Step 5 Solve the Problem with a Clustering-based Synchronization Strategy

After meshing, formulating the models, establishing elemental connectivities, and identifying boundary conditions, the critical decision to be made for achieving RT-APS is which jobs to release to each ST-Element. This last step proposes a Clustering-based Synchronization Strategy (CSS) for making that decision.

Computationally efficient algorithms are preferable since the decisions need to be made in real time. A CSS is therefore proposed to make decisions for the HFS case. The spatial–temporal similarity of jobs is measured from the perspective of Horizontal and Vertical Synchronization (HSync, VSync). Other synchronization requirements, such as material delivery, operator skills, and order due dates, can also be taken into account. HSync controls the consistency of production progress within one order to lower the holding/waiting time of finished jobs, while VSync coordinates similar jobs to reduce the setup time for changeovers (Lin et al. 2018a). Thus, jobs within the same customer order and jobs that need less setup for changeover tend to be clustered. The similarity between each pair of jobs is given by their Euclidean distance. Linkages are then generated between pairs of jobs that are close together to form binary job clusters. These newly formed binary job clusters are further linked to each other to create bigger clusters until all the jobs are linked together in a hierarchical tree. The distance between clusters \( a \) and \( b \) is given as

$$ d\left( {a,b} \right) = \sqrt {\frac{{2n_{a} n_{b} }}{{n_{a} + n_{b} }}} \left\| {\bar{a} - \bar{b}} \right\|_{2} $$
(19)
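The agglomerative procedure and the cluster distance (19), which has the form commonly known as Ward's linkage, can be sketched in plain Python as follows. The job feature vectors and the function names are assumptions made for illustration; the source does not specify how jobs are encoded as points.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_distance(cluster_a, cluster_b):
    """Eq. (19): sqrt(2*na*nb/(na+nb)) * ||mean(a) - mean(b)||_2."""
    na, nb = len(cluster_a), len(cluster_b)
    gap = math.dist(centroid(cluster_a), centroid(cluster_b))
    return math.sqrt(2 * na * nb / (na + nb)) * gap


def build_tree(features):
    """Merge the closest pair of clusters until one cluster remains.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = {(name,): [vec] for name, vec in features.items()}
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        a, b = min(
            ((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
            key=lambda pair: ward_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((a, b, ward_distance(clusters[a], clusters[b])))
        clusters[tuple(sorted(a + b))] = clusters.pop(a) + clusters.pop(b)
    return merges


# Hypothetical two-dimensional job features (not taken from the paper).
features = {"j1": [0.0, 0.0], "j2": [0.0, 1.0], "j3": [5.0, 5.0], "j4": [5.0, 7.0]}
merges = build_tree(features)
```

With these toy points, j1 and j2 are linked first, j3 and j4 second, and the two binary clusters are joined last, which mirrors the bottom-up construction of the hierarchical tree described above. The exhaustive pair search is chosen for readability; a production implementation would normally rely on an optimized library routine.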

The original intention of ST-OOO is to cope with shifting events by sticking to a fundamental principle. Instead of generating a rigid schedule, ST-OOO clusters similar jobs and assigns them to each ST-Element. Precisely because the jobs in the same cluster are similar, the exact processing sequence within the cluster is less significant. Moreover, since there is no absolute division of job clusters, the hierarchical tree can be cut into clusters of arbitrary sizes that best fit the preset ST-Element. These clusters are then released to ST-Elements based on spatial and temporal characteristics, the level of urgency, customer requirements, etc. This is the decision made by supervisors to answer “which jobs to release”. When a job cluster is released to the shopfloor, the related production and logistics tasks are generated accordingly, and operators process these tasks under the OOO, which guarantees great flexibility and resilience for onsite production execution. Thus, RT-APS is achieved through the coordination of CSS and OOO.
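One possible reading of the release decision is sketched below in Python. It is not the authors' procedure: the greedy rule, the workload-based capacity (for example, number of machines multiplied by the temporal scale), and all names are assumptions introduced purely to illustrate how whole clusters might be matched to one ST-Element.

```python
def release_clusters(clusters, processing_time, capacity_minutes):
    """Greedily pick whole clusters, in the given priority order, that fit.

    clusters: list of job-name lists, already sorted by urgency (assumed).
    processing_time: hypothetical lookup of each job's workload in minutes.
    capacity_minutes: workload one ST-Element can absorb (assumed measure).
    """
    released, held, used = [], [], 0
    for cluster in clusters:
        load = sum(processing_time[job] for job in cluster)
        if used + load <= capacity_minutes:
            released.append(cluster)
            used += load
        else:
            held.append(cluster)  # waits for a later ST-Element
    return released, held, used


# Toy data: three clusters and an ST-Element of 3 machines x 120 min.
clusters = [["j1", "j2"], ["j3"], ["j4", "j5"]]
processing_time = {"j1": 100, "j2": 120, "j3": 200, "j4": 60, "j5": 50}
released, held, used = release_clusters(clusters, processing_time, 3 * 120)
```

Because clusters are released as a whole and no sequence is fixed inside them, the operators remain free to process the released jobs out of order, which is the flexibility the text attributes to OOO.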

Case study

This section presents a prototype case study to examine the performance of the ST-OOO for general hybrid flow shop scenarios, including four parts: (1) implementation of cyber-physical HFS with RTVT, (2) parameter setting, (3) performance evaluation, (4) sensitivity analysis.

Implementation of CP-HFS with RTVT

Motivated by the case from a collaborative company of the research group, a prototype of cyber-physical HFS is implemented in the laboratory for demonstration, as shown in Fig. 4.

Fig. 4 Cyber-Physical HFS

There are three kinds of smart devices in the physical HFS: smart tags (iBeacon tags) attached to production resources, gateways deployed at key areas, and visual devices for supervisors and operators. iBeacon tags serve as flexible and reusable carriers of manufacturing data (basic facts such as ID, location, type, etc.). Smart gateways are set up to monitor critical areas, such as the raw material/finished product areas, buffer areas, and machine areas. Gateways detect the iBeacon tags to capture real-time location data, check job status, and monitor mistakes and disturbances automatically. The production and logistics departments play the principal roles in the CP-HFS. Smart services are provided to operators and supervisors through smart visual devices (mobiles and desktops). Operation execution and control services are offered in the mobile application for onsite operators. Planning and scheduling services are integrated into the desktop application for supervisors.

The seamless connectivity among production entities and multi-dimensional real-time data capture are the key characteristics of the physical HFS. Master gateways are applied to achieve cyber-physical synchronization, capture the state of entities, and define the interoperability of entities. The manufacturing data are collected and transmitted to the cyber HFS through master gateways to achieve RTVT. Cloud services, intelligent data analytics, and sharing mechanisms are integrated into the cyber HFS. Supported by cyber services and RTVT, managers can easily make APS decisions based on the real-time system status and state. CPS provides the HFS with higher visibility and traceability of resources and better circulation of manufacturing data, which together form a solid foundation for applying ST-OOO.

Parameter setting

Three performance measures are used: (1) MS; (2) TST; (3) MOFT. Table 2 gives the details of the experiment data. As this paper aims at providing a general solution for HFS scenarios, these data and their distributions, as well as the assumptions, are similar to the literature that also investigated HFS (Kurz and Askin 2004; Chen et al. 2019; Weng et al. 2012). The assumptions are: (1) All processing time and setup time are known; (2) Parallel machines at each stage are identical; (3) A job might skip some production stages; (4) Preemption of jobs is not allowed; (5) The transportation time of jobs between stages is negligible.

Table 2 Experiment data

In the clustering-based synchronization strategy, the distances of jobs are given as:

$$ d\left( {i,j} \right) = w_{1} \cdot d_{1} + w_{2} \cdot d_{2} $$
(20)

The descriptions of the notations are in Table 1. The more similar two jobs are, the smaller their distance is. 21 combinations of weights are considered in Table 3.
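A minimal Python sketch of the weighted distance (20) is given below. Since Table 1 is not reproduced here, the component distances are assumptions: \( d_{1} \) is taken as an order-membership indicator and \( d_{2} \) as a normalized changeover setup time between hypothetical job families. The weight grid with a step of 0.05 and \( w_{2} = 1 - w_{1} \) is likewise an assumed way to obtain 21 combinations.

```python
def job_distance(job_i, job_j, setup_time, max_setup, w1, w2):
    """Eq. (20): d(i, j) = w1 * d1 + w2 * d2 with illustrative d1 and d2."""
    # d1 (assumed): 0 when both jobs belong to the same customer order, else 1.
    d1 = 0.0 if job_i["order"] == job_j["order"] else 1.0
    # d2 (assumed): setup time between the two job families, scaled to [0, 1].
    d2 = setup_time[(job_i["family"], job_j["family"])] / max_setup
    return w1 * d1 + w2 * d2


# Assumed weight grid: w1 = 0.00, 0.05, ..., 1.00 with w2 = 1 - w1.
weight_grid = [(k / 20, 1 - k / 20) for k in range(21)]

# Hypothetical jobs and setup times in minutes (not experiment data).
jobs = {
    "j1": {"order": "o1", "family": "A"},
    "j2": {"order": "o1", "family": "B"},
    "j3": {"order": "o2", "family": "A"},
}
setup_time = {("A", "A"): 0, ("A", "B"): 30, ("B", "A"): 30, ("B", "B"): 0}

d_same_order = job_distance(jobs["j1"], jobs["j2"], setup_time, 30, 0.2, 0.8)
d_same_family = job_distance(jobs["j1"], jobs["j3"], setup_time, 30, 0.2, 0.8)
```

Under these assumptions, a larger \( w_{1} \) pulls jobs of the same order together, whereas a larger \( w_{2} \) pulls together jobs that need little setup between them.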

Table 3 The performance of different combinations of weights

As shown in Fig. 5, when \( w_{1} \) increases and \( w_{2} \) decreases, the MS ranges from about 7750 min to over 8000 min, and the TST ranges from slightly over 13,500 min to almost 17,500 min. Both MS and TST show a rising trend with fluctuations, while the MOFT generally shows a declining trend, ranging from nearly 4850 min to around 4500 min. It is noteworthy that the MS and MOFT show obvious and similar fluctuations, which may be due to the mutual effect between MS and MOFT. For example, as \( w_{1} \) grows, jobs in the same customer order tend to be clustered to reduce MOFT, so the TST tends to grow. Consequently, the MS is prolonged, and some in-process jobs stay on the shopfloor for a longer time. To attain well-balanced performance, the weights are set as (0.2, 0.8) for the subsequent experiments.

Fig. 5 The performance of different combinations of weights

Performance evaluation

This subsection evaluates the performance of ST-OOO under various patterns of customer demand. Three typical dispatching rules are adopted as references. Since the decisions should be made frequently and in real time, it is more appropriate to use algorithms that require little computational effort, even for large instances: (1) Least Work Remaining (LWKR), which is found to be effective in reducing job flow time (Kia et al. 2010); (2) Shortest Processing Time (SPT), which is one of the most classical rules in the literature; (3) First-Come First-Served (FCFS), which intuitively should be useful for reducing MOFT, as orders are processed one by one. These rules are very flexible: no matter what kinds of uncertainties occur, jobs are always dispatched based on their priority.
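The three reference rules can be sketched as simple priority functions over the queue of a stage. The job record layout below (an `arrival` time and a per-stage `proc` list indexed from 0) is a hypothetical choice for illustration and is not taken from the experiment setup.

```python
def fcfs(queue):
    """First-Come First-Served: earliest arrival first."""
    return min(queue, key=lambda job: job["arrival"])


def spt(queue, stage):
    """Shortest Processing Time at the current stage."""
    return min(queue, key=lambda job: job["proc"][stage])


def lwkr(queue, stage):
    """Least Work Remaining: smallest total processing time from this stage on."""
    return min(queue, key=lambda job: sum(job["proc"][stage:]))


# Toy queue at the first stage (index 0); processing times in minutes.
queue = [
    {"id": "j1", "arrival": 0, "proc": [8, 9, 9]},
    {"id": "j2", "arrival": 2, "proc": [5, 20, 20]},
    {"id": "j3", "arrival": 4, "proc": [6, 3, 2]},
]

next_fcfs = fcfs(queue)["id"]
next_spt = spt(queue, stage=0)["id"]
next_lwkr = lwkr(queue, stage=0)["id"]
```

On this toy queue the three rules pick three different jobs, which illustrates why they can trade off the measures differently. Each rule only needs one pass over the queue, so the dispatching decision stays cheap even for large instances.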

The number of customer orders is set as 48, 60, and 72, and the ratio of dynamically arriving orders is set as 1/3, 1/2, and 2/3. Ten instances are generated for each setting. The total number of customer orders represents the total demand, and the ratio of dynamically arriving orders reflects the dynamics of the market. The primary purpose of this setting is to evaluate the stability and adaptiveness of ST-OOO in a dynamic environment. Table 4 presents the average values of the results. FCFS performs better in terms of MOFT but compromises on the other measures. ST-OOO obtains well-balanced solutions that minimize MS and TST with reasonable MOFT. To obtain a more reliable comparison, Fig. 6 gives the scattered boxplots of the different methods. One noteworthy finding is that although ST-OOO does not show a significant improvement in terms of average MS in Table 4, the interval plots of ST-OOO and the other rules are basically disjoint in Fig. 6. This implies that ST-OOO performs statistically better in terms of MS and TST and is more stable than FCFS. In contrast, although FCFS obtains smaller average values of MOFT, the interval plots of ST-OOO and FCFS overlap in Fig. 6, which indicates that the difference between the two methods is not statistically significant.

Table 4 Comparison with traditional strategies
Fig. 6 Scattered boxplots of different methods

Sensitivity analysis

This subsection carries out the sensitivity analysis to investigate the effects of the spatial and temporal scales \( \Delta s \) and \( \Delta t \), and gives some managerial insights based on the results.

In the HFS case, the space scale \( \Delta s \) of an ST-Element is a single stage. However, a stage may not necessarily be the most appropriate unit. One reason is that the number of machines per stage may influence the performance. Therefore, the number of machines is set as 3, 6, and 9 per stage in the following experiment (see Table 5). It is worth noting that as the number of machines increases, the performance of ST-OOO deteriorates. When the production capacity is doubled and tripled, the MS and MOFT of the LWKR and SPT rules are reduced to around 1/2 and 1/3 of their original values, while those of ST-OOO remain above 1/2 and 1/3, especially the MOFT. Besides, the TST of the LWKR and SPT rules is basically unchanged, while the TST of ST-OOO shows an obvious increase. This implies that discretizing the space scope by stage is more suitable when each stage has fewer machines. Yet, it is not the most cost-effective \( \Delta s \) in meshing, especially when the number of machines per stage is relatively large.

Table 5 Effect of the number of machines

Another crucial factor in meshing is \( \Delta t \). In the subsequent experiments, \( \Delta t \) = 60, 90, 120, 150, 180, 210, and 240 min and 36, 48, 60, and 72 orders are considered (see Table 6). Figure 7 gives the curves of the measures for different values of \( \Delta t \). Generally, the tendencies of the measures are similar regardless of the number of orders. As \( \Delta t \) is extended, the MS and TST first decrease and then increase, with the minimum value reached at \( \Delta t \) = 120 min in most instances, while the MOFT generally shows a declining trend with fluctuations. Thus, it is preferable to set \( \Delta t \) = 120 min when meshing this case to obtain overall well-balanced performance with ST-OOO.

Table 6 Effect of \( \Delta t \)
Fig. 7 Effect of \( \Delta t \) on the Performance

Managerial implications

Based on the numerical results given in the above case study, several managerial implications can be concluded for practitioners as follows.

Firstly, massive shop floor data are captured by smart sensors and devices in the modern manufacturing industry. These data and information provide managers with real-time visibility and traceability, and the proposed method proved effective in generating real-time APS decisions using this visibility and traceability.

Secondly, in comparison with traditional planning and scheduling rules, ST-OOO can obtain overall balanced solutions regarding multiple measures, and its performance is more stable and resilient in a dynamic environment. Key parameters can be adjusted to fit in with the actual shop floor situation.

Thirdly, the size of the ST-Element should be carefully decided according to the actual conditions of the factory. When the size is too small, the schedule becomes too rigid and loses resilience; when the size is too large, the schedule lacks responsiveness to frequent disturbances and the similarity of clustered jobs is weakened.

Conclusions

This paper has investigated the emerging features of traditional optimization problems in cyber-physical factories. In such a highly visible, traceable, transparent, and interconnected factory, RTVT means that the data regarding the space, time, status, and state of the workshop are available in real time to managerial decision-makers. Thus, the novel ST-OOO that capitalizes on the RTVT of the CPF was proposed. By reducing the complexity and uncertainty in meshing, the complex APS problem can be discretized and tackled in a rolling spatial–temporal manner to obtain a global solution. The detailed steps of ST-OOO were presented with an HFS example. Finally, a prototype case study was conducted to examine the superiority of ST-OOO, and a sensitivity analysis was conducted to investigate the impacts of two crucial factors on the performance.

In the case study, the performance of ST-OOO was evaluated comprehensively. The results showed that ST-OOO had a well-balanced performance on the selected measures. By contrast, other strategies might be better on one specific measure but compromise on the others. Moreover, ST-OOO performed more stably in a dynamic environment, while other strategies were volatile when the ratio of dynamic orders changed. In the sensitivity analysis, the effect of the number of machines per stage was first investigated. The performance of ST-OOO deteriorated as the number of machines increased. This implied that, although discretizing the space scope of the HFS according to the stage was straightforward, the stage might not be the most cost-effective scale of space in meshing. Besides, the effect of the temporal scale \( \Delta t \) was also examined. The results showed that as \( \Delta t \) grew, the MS and TST first decreased and then increased, and the best overall solution was obtained when \( \Delta t = 120\;{\text{min}} \). These two findings suggested that the most cost-effective scale of ST-Elements should be small enough to reduce complexity and uncertainty and yet large enough to give valid results.

The contributions of this paper are threefold: (1) It proposed a novel “divide and conquer” approach, ST-OOO, which provided a new perspective for solving production optimization problems using RTVT in cyber-physical factories; (2) It applied the proposed ST-OOO to an HFS case with detailed steps and explanations for achieving RT-APS in a practical way; (3) It presented a case study and sensitivity analysis to verify the effectiveness of ST-OOO and to investigate how spatial and temporal factors affect its performance.

Some research perspectives can be derived from this paper. Firstly, a more comprehensive case study might be conducted in future work to consider various uncertain events, adopt more measures like schedule stability and robustness, and compare the ST-OOO with more advanced strategies. Secondly, the most cost-effective temporal scale was found in sensitivity analysis, yet the spatial scale was not. How to find the most cost-effective scale of ST-Element is the question to be answered. Last but not least, the ST-OOO may be applied to other production scenarios such as job shop and even be applied to solve other real-life complex optimization problems.