Keywords

8.1 Introduction

Enterprises obtain goods and services in complex, global Supply Chains (SC). SC systems consist of four closely interrelated elements: Suppliers, manufacturing, distribution network, and customers. Each of these elements affects the behavior and performance of the entire system. This results in the necessity to consider all interactions, limitations, and uncertainties when making decisions for running a profitable SC. In recent years, research in the area of SC has predominantly focused on the nature of the relationships and processes in a closed loop, circular environment (Golan et al., 2020). Real-world environments require the consideration of uncertain behavior, for example, of competitors, suppliers, and customers. However, uncertainty cannot be appropriately predicted or expected (Knight, 2014). In SC, two basic approaches of uncertainty can be differentiated, referring to (1) the decision-making process and (2) shifted balance and changed profitability. In the former case, a planner has not all needed information to form an informed decision. This can come from a lack of transparency, missing information, and the unknown impact and interrelations of actions. The latter case is caused by potential unpredictable events such as pandemic, social or economic instabilities (Bonde, 2018). Consequently, enterprises aim for the use of systems that facilitate making better and more informed decisions. Large amounts of data are needed to make those systems possible. This data can then be used to develop Artificial Intelligence (AI) models. Especially in the field of Supply Chain Management (SCM), countless use cases for AI can be found. As a result, large enterprises have started experimenting with AI solutions very early on, to better understand how their business works and what events are most likely to happen in the future. This is also reflected in a study by Gartner, which shows that 72% of all study participants find investments in digitization as a competitive advantage (Klappich & Muynck, 2020). Results of this early phase of experimenting are now available on the market from SC application vendors. State-of-the-art AI tools have the ability to analyze large amounts of data, clean and identify patterns, themes, and trends and generate related action plans. This is achieved in either a predictive or prescriptive manner (Klappich & Muynck, 2020). As of today, there are two main options that allow enterprises to perform analyses in their SCs. The first option is the traditional platform approach is based on an integrated control tower capability, which is part of the SCM platform. This platform supports a combination of SC planning and execution. A supply chain control tower (SCCT) is a notable example, i.e., a central date hub and customized dashboard of data, key business metrics, and events across, that captures the necessary technologies, organizations, and processes, and accumulates required data for short- and long-term decisions (Rölli, 2021). The second option in the field of SCM is the data lake approach, which mainly relies on the visualization of data. In contrast to business intelligence, where analysis models would be built based on the data lake to enable deeper insights, a data lake is a repository that stores the collected data in its natural format, i.e., raw and unprocessed format (Giebler et al., 2020).

Disruptions in an SC can occur due to various reasons. It can be a breakdown in a production line, IT problems, demand fluctuations, strikes, war, changes in the legal framework, environmental conditions or pandemics, e.g., COVID-19 (Ivanov, 2020; Scholten et al., 2020). Some of the aforementioned disturbances are in advance predictable and even controllable. However, there are also uncontrollable disturbances. The impact of a disturbance on the performance of an SC depends strongly on the duration and severity of the event. It should, therefore, be the goal of an enterprise to be particularly resilient to such disruptions (Scholten et al., 2020). Supply chain resilience (SCRes) is an extremely important strategic tool for gaining a market advantage. SCRes is the ability of organizations to withstand disruptions and disturbances with no, or limited performance deterioration (Christopher & Peck, 2004). The importance of SCRes is particularly highlighted by a study of the World Economic Forum (Bhatia et al., 2013), which shows that 80% of enterprises are concerned about the resilience of their SC. In the “Resilien-Tech” project by acatech (2014), lessons learned were defined in seven topic areas in order to be able to develop a resilient enterprise, as well as regulatory requirements. The topic areas include (1) development of regulations on the interface between the state and the private sector, (2) implementations of insurance obligations, (3) conduct of resilience monitoring and incentivization, (4) introduction of early warning system and mandatory reporting, (5) use of incentive systems, (6) introduction of regulations for the implementation of emergency and crisis exercises, and (7) evaluation of cyber risks. These topic areas are defined in a vertical structural going from macro perspective of regulatory government frameworks for dealing with complex (systems of) systems in order to increase resilience, to the micro perspective of the SC where machine breakdowns and subsequent production line failures can lead to the need to anticipate unpredictable events. The ability to quickly adapt to disruptions and produce the same quality and quantity despite unexpected events becomes an even more important challenge after COVID-19 in today’s agile business world. This can be highlighted due to a study by Capgemini (2020), which shows that it took 68% of manufacturing enterprises 3 months to recover from SC disruptions caused by COVID-19.

An emerging dimension in the consideration of supply chain is viability. This is defined by Ivanov (2020) as “the ability of a supply chain (SC) to maintain itself and survive in a changing environment through a redesign of structures and replanning of performance with long-term impacts.” The Viable Supply Chain (VSC) comprises three dimensions: agility, resilience, and sustainability. Ivanov and Dolgui (2021) designed a conceptual framework for VSC with a focus on aligning resilience, sustainability, profitability, and digitalization.

The increase of the robustness and resilience of the production and thus of the SC can be ensured by appropriate maintenance leading to achieve and preserve desired uptime (i.e., low failure rate) of machine and plant as well as production system. This enables flexible resource management and keeps losses to a minimum. This is further reinforced by the fact that modern production systems are complex interactions of production machines, sensors, and IT systems, which in turn represent complex, self-contained systems. So-called Cyber Physical Production System (CPPS) consist of autonomous and cooperative elements as well as subsystems (Monostori et al., 2016). These subsystems are interconnected through and within all levels of production and logistic networks (Ansari et al., 2018). CPPS have three main characteristics, (1) intelligence (smartness), (2) connectedness, and (3) responsiveness, which enable viable production and impact VSC (Panetto et al., 2019). In other words, prediction of machine and plant breakdowns should be considered for modeling and estimation of SCRes. This can be achieved through the use of AI methods and technologies as well as knowledge-based maintenance methodologies in particular predictive and prescriptive maintenance (Ansari et al., 2019; Ivanov et al., 2021a, 2021b). In addition to predicting machine breakdowns, AI-enhanced maintenance must also recommend actions, as in prescriptive maintenance, in order to be able to react flexibly to changes. Predictive analysis in the context of SCM and production planning focuses on historical dataset and retrospective analysis to extract patterns used to forecast planning and scheduling. This allows management to increase flexibility and robustness as the core values of a SCRes. Hence, AI contributes to the predictability of risk, reduce risk in manufacturing enterprises and thus reduce uncertainty in SC. Based on the prediction of future events, prescriptive analysis makes it possible to act optimally in response to disturbances, disruptions, and changes. Using a diverse set of methods including mathematical modeling, simulations, statistical learning, machine learning, and semantic technologies (e.g., Knowledge Graphs, Bayesian Networks), prescriptive analytics enables the development of flexible and robust plans that take uncertainties into account. Nevertheless, the application of AI in SCM and production and logistics management does not end with planning and scheduling. AI can also implement recommendations leading to more responsive and flexible SC. This is especially relevant when real-time rescheduling is needed. AI systems dealing with dynamic time series data need to be able to constantly adopt to changing conditions and reflect them in decision-making parameters, preferences, and recommendations. This includes adopting equipment parameters and processes resulting in a range of alternative schedules. Accordingly, AI can either recommend those plans or schedule them automatically depending on the degree of automation. AI systems for prescriptive maintenance should consequently be able to work with a temporal component in addition to a complex, uncertain system in order to be able to realize resilient manufacturing and ultimately, building resilience in SCs. This failure resistance can be achieved by focusing on the concept of Reliability, Availability, Maintainability, and Safety (RAMS). Concentrating on reliability and availability in particular, major improvements can be made in industrial maintenance using prescriptive maintenance. The achieved improvement is evaluated using metrics and Key Performance Indicators (KPI). The most important of these are the Remaining Useful Lifetime (RUL), Mean-time Between Failure (MTBF), and Uptime. These KPIs have an impact on the Overall Equipment Efficiency (OEE) of production systems. An improvement of the OEE, therefore, leads to an increase in the reliability of manufacturing processes and thus to improved resilience in the SC (Karl et al., 2018). In particular, industrial AI and the associated ability to adapt itself can improve the aforementioned KPIs and lead to the ultimate goal of SCRes (Esmaeel et al., 2018). Notably, Ivanov et al. (2021b) proposed a three-dimensional framework for analyzing the impact of AI methods on SC.

Considering the above discussion into account, this paper provides an AI-enhanced approach for integrative modeling and analysis of related KPIs toward building resilience and viability in manufacturing and SCs, aided by Dynamic Bayesian Networks (DBN).

The rest of the paper is structured as follows: Sect. 8.2 provides a brief literature review discussing current research in the area of (1) resilience in SCM as well as (2) Dynamic Bayesian Networks (DBN). Section 8.3 presents an application of a DBN in an industrial maintenance use case. Section 8.4 discusses the results, limitations, and possibilities of the proposed AI-enhanced approach. Finally, Sect. 8.5 explores the current state of applications of DBN as well as future outlooks in SCM.

8.2 Literature Analysis

8.2.1 Resilience and Viability in Supply Chain Management

The study of the impact of economic, environmental, geopolitical, societal, and technical uncertainties on SC is being closely examined in research by public organizations (World Economic Forum, 2017). Yet, tools for measurable monitoring and deduction in the form of KPIs to derive recommendations for action and consequent improvement of SCM are needed. The relationship between KPIs and SCRes has been examined by Karl et al. (2018). They divided the influence of KPIs on SCRes into three phases: (1) before, (2) during, and (3) after the disruption phase. The consequent literature analysis showed a very strong correlation between non-financial KPIs and resilience. In particular, KPIs for order and delivery times, inventory levels and customer satisfaction have been identified as suitable indicators that support resilience. A study by Werner et al. (2021) in the manufacturing sector shows that the optimization of non-financial KPIs can greatly increase the resilience of enterprises as well. It also reveals that monitoring KPIs can help to detect early signs of vulnerability and to take targeted actions.

Pursuing this line of research, it is important to design the strategy from three points of view: (1) identifying the KPIs for building resilience strategy, (2) classifying the KPIs to identify which actions should be taken to respond in the event of a disruption, and (3) developing contingency plans based on the identified KPIs. The issue of a quantitative assessment of SC reliability, resilience, and viability has been investigated by Chen et al. (2017) and Ivanov (2022) leading to the development of a unified framework for evaluating SC reliability and resilience. Stavropoulos et al. (2020) have established a corresponding decision-making framework after analyzing the manufacturing processes of medical equipment in the COVID-19 pandemic.

Weichhart et al. (2021) focused on adaptivity in resilient manufacturing, which can be implemented in three levels, namely (1) the use of robotics for intra-logistics, (2) a planning system that can reschedule manufacturing on an ad hoc basis, and (3) a modular process model and execution system to ensure adaptivity at the process level. Bauer et al. (2021) emphasize that AI is an enabler to increase the performance of SCs, as heuristic models can be used to understand the complex nature of such networks. Machine Learning (ML) in particular is well suited for this purpose as it allows for generalization and works very well with previously unknown data. The use of algorithms to improve resilience in complex industrial CPPS has been also investigated by Stavropoulos (2020). Here, they adopted a chaos engineering approach to ensure the requirements of available, secure, safe, and reliable system operation. Industrial AI and its impact on KPIs have been studied in detail by Bai et al. (2021) in the use case of truck platooning. In analytical experiments, a positive correlation was found between the AI model used and related KPIs, namely Availability, Mean Time to Failure (MTTF), and Mean Security Capacity to Failure (MSCF). Reliability and availability are particularly important here as parts of RAMS. The RAMS process can be used to analyze potential hazards and the effects of failures. This also includes Failure Mode and Effects Analysis (FMEA). As described in Passath et al. (2021), the RAMS process can be extended to include an asset criticality analysis. This asset criticality is then used as a parameter of a DBN, which enables the calculation of relevant KPIs over the product life cycle using an additional, temporal dimension. Considering the correlation between resilience and operational KPIs like availability as well as financial KPIs like profitability, Schenkelberg et al. have investigated the impact of maintenance on profitability using various AI methods like Bayesian Networks (Schenkelberg et al., 2020a), supervised ML (Schenkelberg et al., 2020b) and simulation (Schenkelberg et al., 2020c), respectively. Due to the advantage of BNs on combining expert opinions and data for integrative modeling and analysis of KPIs over time, this paper mainly explores BNs application for building resilience in manufacturing and SCs.

8.2.2 Dynamic Bayesian Networks

A Bayesian Network (BN) is a graphical model that represents probabilistic relationships between variables. A BN consists of a qualitative and a quantitative part. The qualitative part consists of directed, acyclic graphs. Here, each variable presents a node. A causal relationship between nodes is modeled with edges. The quantitative part of the BN is formed by the conditional probability tables (CPT), which are assigned to each node. In each CPT, the defined states of the considered node are assigned for each possible state combination (Russel et al., 2010). The creation of BN is done in the following three steps as discussed by Ansari et al. (2020): (1) creation of an Object-oriented Bayesian Network, (2) building a static BN, (3) Incorporating temporal component for deriving dynamic BN (DBN) from the BN. This makes it possible to map the relationships of the variables over time. DBN are dynamic models, which allow what-if analysis and reasoning over time, considering the evolution of variables and temporal distributions of discrete time points i in the interval 0 ≤ i ≤ T (Ansari et al., 2020). DBN can be constructed manually with the help of domain experts who build the network and assign CPT. However, DBN can also be learned automatically, but this requires the use of special algorithms such as the Expectation Maximization (EM), General Expectation Maximization (GEM) algorithm (Mihajlovic & Petkovic, 2001), or Markov Chain Monte Carlo (MCMC) models (Liang et al., 2020). Ansari et al. (2020) see DBN as ideal models for the necessary predictive capabilities of SCM and industrial maintenance in particular. Hosseini and Ivanov (2019) studied the OEM (Original Equipment Manufacturer) exposure to the disruption propagation of its supply networks, where they developed a function for assessing the vulnerability and recoverability using BN. This enabled to measure the resilience of the SC of OEMs in the aerospace and automobile industry. BN can be used in combination with FMEA for risk analysis (Rastayesh et al., 2020) of the power conditions in polymer electrolyte membrane fuel cells. Kulkarni et al. (2021) integrated FMEA into BN in order to enable health monitoring and increase the reliability of critical infrastructure in the aerospace industry. To tackle the increased scale and complexity in software intensive manufacturing systems Yang et al. (2018) developed a framework, based on case-based reasoning, FMEA and BN for dynamic multi-fault diagnosis, considering uncertainty, in the aerospace industry. Further in the aerospace industry, Li et al. (2017) developed a DBN for health monitoring of airframes for the prediction of crack growth. A generic DBN-enhanced methodology of improving KPIs, especially RAMS for OEMs and machine users along product lifecycle was introduced by Passath et al. (2021). To combine pre- and post-failure phases in risk assessments, a DBN was developed by Tong et al. (2020) in order to increase resilience. The developed methodology was then applied to historical data from refinery accidents in order to demonstrate the applicability.

DBNs can be categorized into two groups stationary and non-stationary DBNs. Stationary DBNs do not consider the evolving nature of edges over time, whereas non-stationary DBNs (nsDBN) allow the use of a temporal dimension and simultaneously considering uncertainty (Ansari et al., 2020). nsDBNs need to learn conditional dependencies from complex multivariant time-series data. Thus, new learning approaches should be used. Hourbracq et al. (2016) propose an algorithm that decides at each time step, based on the likelihood in a data stream and a sliding window, whether to use an already known model or a new one for the prediction. The starting point of the algorithm is a given, initial network. Furthermore, nsDBN can currently handle abrupt, but not gradual, concept drift well. Meng et al. (2019) present a learning algorithm that addresses this problem and continuously updates the network through a logical search and global optimization. This enables various applications of nsDBN. Serras et al. (2021) designed the ETEOR method for outlier detection in multivariate time series and validated the approach using data from electrocardiogram alert systems, historical data compromising male mortality in France and pen-digit recognition. Quesada et al. (2021) developed a trend forecasting algorithm using DBN in non-stationary time series for industrial furnaces. nsDBNs were used by Zhang et al. (2021) for dynamic risk analysis in tunnel construction processes for non-stationary time-series recognition. The application of nsDBN in the area of maintenance of highly flexible production systems was proposed by Ansari et al. (2020). This would enable the unification of Event-Cost Schema with the temporal dimension of cause-effect analysis (cf. Fig. 8.1). In the context of maintenance, multi-channels of data sources are involved in the creation of nsDBN, as shown in Fig. 8.1. The results from object-oriented analyses such as the FMEA analysis are particularly worth mentioning here. Therefore, systems, subsystems, related components, and their possible fault conditions are analyzed. The results of these analyses represent the tokenized risks for the creation of the network. The statistical probabilities for these risks can be taken from fault databases. In turn, the expert knowledge expressed in form of troubleshooting reports and maintenance documentations can be analyzed and reflected opinions, recommendations, and measures for handling problems can be extracted using AI-enhanced approaches presented by Ansari et al. (2021). Furthermore, expert knowledge formalized as cases including solutions for solving previous problems (Riester et al., 2020) can be analyzed with the help of similarity learning algorithms to enrich the nsDBN. In order to map the temporal and changeable components in nsDBN, real-time data is needed to enable the constant evolution of the networks. This includes condition monitoring data and data from production and maintenance systems including maintenance and production plans and schedules as well as failure databases.

Fig. 8.1
figure 1

Possible data sources for DBN in a resilient SC

Considering the above discussion, the advantages and disadvantages of DBN should be considered as well (McCloskey, 2000). The major disadvantage of DBN is that there is no universal way to create them and that the creation requires a very high resource investment. Since a DBN also uses causal relationships, which are based on the knowledge of the experts involved, a DBN is also limited in this respect. However, these disadvantages are also the advantages of DBN, since new knowledge can easily be incorporated. Probably the biggest advantage of DBN is its ability to reason in two directions and that the result is explicit, unlike in other Machine Learning techniques such as Neural Networks. The probability, as a DBN output, can be interpreted in a deterministic way as KPIs by defining a threshold. This is in fact the main advantage of integrative modeling and analysis of multiple KPIs and their interrelations for building resilience in manufacturing and SCs.

The results presented can also be discussed qualitatively as shown in Table 8.1. It can be clearly seen that DBN is mainly used in the field of manufacturing. The aircraft industry is particularly strong in this respect, where the advantages of BN are clearly evident. The publications also show that nsDBNs have only been used in practical industrial examples in recent years. Sectors such as energy and maintenance also benefit from BN, especially DBN and nsDBN.

Table 8.1 Classification of the results of the literature analysis of the application of BN

8.3 Application of Dynamic Bayesian Network in Industrial Maintenance

The characteristics and applications of DBN identified in the literature analysis can be illustrated using a practical example from industrial maintenance in the consumer goods industry. The production process is divided into four sub-processes: (1) in the filling step the product is filled into the empty containers. The incorrectly filled containers are sorted out by means of a weight check. In the stage of (2) packaging, the carton is erected, the filled container with consumables is placed inside and sealed. Then the label is applied to the packaging. The filled production cartons are then (3) placed in a display carton. The display cartons are then (4) packed into transport cartons. In the following example, the packing process (2) of the containers into the product carton is examined. This process can again be broken down into sub-processes. In the production plant, the machine states and the individual feeder states of the process steps are automatically recorded in a database consisting of a table called history list. In addition, the piece counters are recorded before the start of the packaging process. This is done via the weight check. Defective products are rejected in the process. Machine breakdowns often lead to long downtimes in these highly automated systems. For this reason, a preventive maintenance strategy is currently being pursued. However, this ties up many highly specialized maintenance technicians. Particularly in the wake of the COVID-19 pandemic, this is leading to staff shortages. A maintenance strategy that only requires the situational deployment of personnel would lead to a strengthening of resilience in manufacturing and SC. The DBN model presented in the following section provides clear guidelines for the introduction of a predictive maintenance strategy in the use case described.

The modeling of the DBN is based on the process shown in Fig. 8.2. The approach is an adopted version of the approach presented by (Ansari et al., 2020). This is due to the fact that all necessary data exists in a database, where the rows represent the objects of an Object-oriented Bayesian Network. The starting point here is the history list table, which in a first step must be (1) prepared and then (2) analyzed. Furthermore, the DBN is created manually. This starts with the (3) creation of the structure which is then (4) simplified. After that, the (5) states of the nodes are determined and these are filled with (6) CPT tables. This allows consequently the (7) generation of a DBN from the BN by adding a temporal component.

Fig. 8.2
figure 2

Implementation process of the DBN in an application from the consumer goods industry

8.3.1 Data Preparation and Analysis

During the process, both the change in the machine state and the occurrence of a fault state are listed and stored in a so-called history list. The data of the history list is analyzed in order to be able to use it profitably for the BN. In the history list, machine states, fault states and piece count records are listed in equal measure. The Value ID determines whether the lines in the history list are a machine or fault state or a piece count record. The Value ID thus enables a distinction to be made between the types machine status, fault status and piece count recording. However, based on the Value ID, no statement can be made as to which machine or fault status is involved. The lines of the history list should be clearly assigned to the machine and fault states. A primary key is required for a unique assignment. For this purpose, one line of the history list is used as an example, see Table 8.2.

Table 8.2 Excerpt from the history list

The columns Machine ID, Value ID and Value are subsequently combined to form the primary key State ID to ensure unique identification of the machine and fault states (e.g., 103-200-16). With the help of experts, the fault states are assigned to five different categories with the state IDs 103-210-X (category M1) to 103-214-X (category M5). A fault condition of the respective category is considered to be eliminated when the associated Value 0 appears. In order to be able to determine the duration of a fault condition that has occurred, the faults must be separated and differentiated in their five categories M1 to M5, since a fault of a different category can occur between the fault event message and the fault correction message (Table 8.3).

Table 8.3 Excerpt from the list of machine states

Three machine states can occur during the process, namely (1) Machine activated, (2) Machine off, and (3) Setup process. A machine state is active until it is replaced by another machine state. Subsequently, for each machine status, the total duration, the number of appearances and the average duration per appearance are determined (Table 8.4).

Table 8.4 Excerpt from the list of failure conditions

The fault states are assigned to the five different categories with the state IDs 103-210-X through 103-214-X. A fault condition of the respective category is considered to be eliminated when the associated condition ID 103-21X-0 appears in further sequence, the total duration, the number of appearances and the duration per appearance are determined for each fault condition.

8.3.2 Manual Modeling of the Dynamic Bayesian Network

The manual modeling of the BN is conducted in several steps. First, the nodes of the model are determined and classified into the different levels of the model. Then, the connections of the nodes are created using arrows, which represent the causal relationships of the nodes. By visualizing the dependencies of the nodes, it also becomes clear which nodes are not crucial for the validity of the model. These are eliminated in the further consequence. In order to complete the model, the probability tables of the individual nodes are populated with their conditional probability values. Subsequently, a temporal component is introduced to make the BN dynamic and thus to model a DBN. In order to generate the model, the nodes must first be determined. The goal of this use case is the modeling of a DBN for the reliable prediction of predictive maintenance and simulation for introducing a predictive maintenance strategy with the help of KPIs to increase the resilience of the production process. For the DBN, maintenance relevant KPIs of the management level are required, which describe the maintenance characteristics of the production process. For the creation of the BN, data of the production process from the evaluated database of the history list, which originates from the operational level, is used. This data consists of the machine status, the fault status, and the production quantities at the time in question. The fault statuses are thereby separated into five categories.

A connection between the nodes based on the process data (operational level) and the nodes of the KPIs (management level) should be created with the help of nodes on an intermediate level, see Fig. 8.3. The required node points thus also function as a link between the operational level and the management level. In addition, the wanted nodes (effect) result from the nodes of the process data (cause), but at the same time, they themselves represent a cause for the nodes of the KPIs (effect). The nodes of the intermediate level result from the formulas for the calculation of the performance indicators. The nodes from which the arrow originates (shaft of the arrow) are called parent nodes of the nodes in which the arrow ends (arrowhead). In order to create a certain overview, the causal relationships are presented in Table 8.5.

Fig. 8.3
figure 3

Nodes of the intermediate level with their causal relations

Table 8.5 Causal relationships between nodes
Fig. 8.4
figure 4

Simplified model as the basis for the BN, with KPIs as model levels

The introduced causal relationships illustrate the fact that some introduced nodes have no meaning for the prediction of maintenance properties, see Table 8.5. These facts result in a simplified model that forms the basis for the BN, see Fig. 8.4. In the BN, maintenance-relevant KPIs are used as the basis of the model. The formation of the individual nodes is designed in such a way that the nodes within a level (operational level, intermediate level, and management level) already correspond to the assignment to the individual KPI types. Subsequently, the levels are named according to the key figure types. The naming of the levels is therefore Key Result Indicator (KRI) level, Result Indicator/Performance Indicator (RI/PI) level and KPI level.

Each node can have two or more states. The number of columns, and states, of the CPT of a node is given by the number of states of the parent nodes. The number of states is defined by the column count of the nodes, see Eq. (8.1). After determining the states, the cells of the CPT of each node are filled with the corresponding values. The states of the individual nodes are described in Table 8.6.

Table 8.6 Possible states of nodes in the BN

Calculation of the number of states per node as follows:

$$ \mathrm{Number}\ \mathrm{of}\ \mathrm{States}\ \mathrm{per}\ \mathrm{Node}=\prod \mathrm{States}\ \mathrm{of}\ \mathrm{the}\ \mathrm{Parent}\ \mathrm{Nodes} $$
(8.1)

8.3.3 Probability Values of KRIs

First, the probability values for the states of the fault status nodes (cf. Table 8.6) and the machine status node should be determined. The probability values of the individual machine states are entered in the CPT of the machine status node. The probability that a certain machine state exists is expressed by the relative duration of this state. This is obtained by dividing the total duration of the machine state under consideration with the sum of the total durations of all states. Each fault status category (M1–M5) gets its own fault status node. The possible states of the fault status nodes are reduced to two (“fault” or “no fault”), because if all fault states are taken into account, the CPT of the subsequent nodes would be reduced to a size that would no longer justify the workload. The second variant limits the number of states of the individual fault status nodes to two. If all fault states were considered, there would be 3,779,136 nodes; by limiting them to two fault states, only 96 nodes need to be modeled. To obtain the column count of the CPT, the number of possible states of each parent node is multiplied by each other. The probability of having a particular fault state is expressed by the relative total duration of that state. This is obtained by dividing the total duration of the considered fault state with the sum of the total duration of all states.

8.3.4 Determination of the Probability Values of the RIs and PIs

The fault and machine status nodes at the KRI level are parent nodes of the RIs and PIs, respectively, and feed directly into the calculation of the probability values of the RIs and PIs. The KRIs can thus be seen as the cause on the effect of the nodes from the RI/PI level. For the calculation of the probability values in the RI/PI level, machine or fault statuses are classified according to their properties. These properties can be taken from the analysis of the machine and fault states. A distinction is made between preventive maintenance or repair. For the calculation of the machine or fault states the equipment time is to be considered. The distinction between preventive maintenance and repair must be taken into account when calculating Mean Time To Repair (MTTR) and Mean Time To Preventive Maintenance (MTTPM). While a condition that is relevant for an MTTR calculation results in a repair activity, the MTTPM-relevant conditions cause a preventive maintenance activity. For the calculation of the utilization, performance, and quality grade probabilities of importance, resource states are required. To be able to calculate the RIs and PIs of the individual state combinations, the mean value of the failure-free time of the failure state categories is determined. In order to determine the probability of the states of the node points in the next step, the probabilities for the occurrence of the individual state conditions are calculated. These are calculated from the product of the probabilities of the parent nodes. The CPT of the considered node are multiplied by the corresponding values of the probability of the state combination. This gives the probability of a state of the node for a considered state combination. Modern programs for modeling BNs such as GeNIe SMILE automatically compute the probabilities of the states of the nodes using the distribution function of the BN.

8.3.5 Determination of the Probability Values of the KPIs

The nodes of the RI/PI level are parent nodes of the KPI nodes and are directly involved in the calculation of the probability values of the KPIs. The RI/PI nodes can thus be seen as the cause on the effect of the nodes from the KPI level. In the following, the filling of the CPT cells of the nodes is explained. The number of columns of the matrix is given by the number of possible state combinations of the parent nodes. To fill the CPT of the node, the minimum and maximum values of the node are calculated for each state combination of the parent nodes. For filling the CPT, the calculated minimum and maximum values are considered. These two values yield a range in which the actual value of the node for the considered state combination will lie. It is necessary to classify this range into the states of the node. If a state of the node contains a subset of the range of the node, the share of this subset is considered in the CPT. The calculated values of the CPT are entered into the CPT of the node points. Modern programs for modeling BNs, such as GeNIe SMILE, automatically calculate the probabilities of the states of the nodal points with the help of the distribution function of the BN. For the probabilities of the states of the node points, the probability for the occurrence of the individual state combinations is required. The probability of the state combination is calculated from the product of the probabilities of the parent nodes for their considered states.

8.3.6 Manual Modeling of the DBN

The created BN is a snapshot of the system at a given time and is used to model systems that are in an equilibrium state. However, in reality, systems change over time and it is consequently of great interest to see how these systems evolve over time. Therefore, a model capable of modeling a dynamic system is needed. The use of DBN allows the extension of the BN with a temporal component. In doing so, the network structure or its parameters do not change. The underlying process is thus stationary. However, the system becomes dynamic. To form a DBN from the manually created BN, a number of time steps t are assigned to the nodes. In GeNIe SMILE, this is done by moving the nodes into the so-called Temporal Plate. It is important to note that all nodes must be moved into the area at the same time, otherwise the causal relationships will not be converted correctly. In the present model, the nodes of the KRI level influence themselves over the time intervals. This can be illustrated by an example.

Fig. 8.5
figure 5

The resulting DBN describing the production process

If no fault occurs at fault status M1 in the past time interval, the probabilities of fault status M1 remain identical for the considered time interval in the CPT. If a fault message appears at fault status M1 in the previous time interval, the probabilities of fault status M1 in its CPT are different from those in the intervals before. Therefore, it is necessary to consider the state of the previous time interval. For each node of the KRI level, a temporal link (arrow) to itself is needed. The result of the previous steps is the DBN as seen in Fig. 8.5. For the creation of a model that corresponds to the real production process, expert knowledge for the method and the process is needed. This ensures that the prediction generated by the model will be true in the future.

8.4 Discussion of Results

The effort for data preparation is considerable for the manual creation of a DBN, because first the database has to be processed and then machine and fault states have to be analyzed in order to calculate the corresponding KPIs. Subsequently, the CPTs of the individual nodes are calculated. The calculation of the ratios (RI/PIs and KPIs) and the subsequent calculation of the CPTs of the nodes represent the time-consuming work of data preparation. The manual creation of the BN with the modeling of the nodes and the creation of the causal relationships, as well as the filling of the CPTs is also very time-intensive. However, the information content is significantly higher than with automatically created DBN. Since exact calculations and no approximations form the basis and the modeled relationships were created with application experts. This allows the DBN to display exactly the information (e.g., KPIs) required by the user. The robustness, adaptability, and therefore resilience of the network created with experts is also significantly higher than that of automatically created networks. A DBN allows production and SC managers to plan their processes. The model thus provides clear recommendations, based on visualization of the network and KPIs’ interrelations, on how, for example, availability can be increased and what the corresponding KRIs for this purpose must look like. A DBN also helps on the operational level because it can show through the use of KPIs how failure states can be reduced in general or in particular case. This enables the choice of the right maintenance strategy, as DBNs allow its evaluation before implementation. The deteriorating condition of the components over time is also taken into account. This is an important advantage, especially when estimating maintenance costs. DBN enable the transition from a preventive maintenance strategy to a predictive maintenance approach. This avoids unplanned downtime as far as possible and thus not only increases productivity but also optimizes product quality, effectiveness, resilience viability of the production process.

In practical terms, the potential for improvements by DBN can be illustrated using the use case from the consumer goods industry. By using the DBN, a specific machine down event was reduced by 30%. The benefits of the implemented DBN can also be seen in the entire use case, where an improvement in overall availability of 9% has been achieved.

8.5 Outlook

DBN should be adapted with the help of expert knowledge in order to represent reality, as optimal as possible and thus ensure a reliable forecast of maintenance KPIs, or disruption and changes along SCs. The DBN model presented in this chapter is based on a stationary production process. The limitations of a stationary model are that the defined connections cannot change over time. This does not necessarily reflect the reality of the process. Changes in the relations occur is due to increasing market volatility, which is mainly characterized by customer demand fluctuations, but also changes in the SC. Due to these uncertainties, a higher flexibility in the production process is needed. This is realized by a non-stationary production process. Non-stationary processes can be found not only in production, but also in social networks, in reconfigurable construction as well as in SC. All these examples have in common that elements in these networks are interconnected, their relationships change over time, and also that the relationships themselves are not stationary. These characteristics can be modeled by nsDBN. Hence, it can be concluded that nsDBNs enable new possibilities in planning, monitoring, and controlling in production SC and therefore ultimately strengthen resilience in SC.

Besides, the future research agenda should reinforce the use of DBNs by means of multi-channel data pipelines. This will be driven in particular by the use of new trends in the field of AI. The application of federated learning (FL) enables the use of assistance systems in manufacturing and logistics even in the event of IT and infrastructure breakdowns. At the same time, these assistance systems can be used in privacy-sensitive areas on heterogeneous hardware. The combination of such FL approaches with further AI models to a cognitive maintenance system for decision support was presented in Kohl et al. (2021). The presented approach supports the resilience viability of the SC by the possibility to react flexibly and proactively to events.