1 Prediction of Quality Characteristics

As Industry 4.0 strategies are rolled out progressively, process data is becoming accessible in large amounts. The available data offers engineers and scientists numerous opportunities to analyze and improve production processes. Exemplary applications are predictive maintenance and process mining [23]. The research field Predictive Quality describes the user’s ability to optimize product- and process-related quality characteristics by using data-driven forecasts as a basis for actions to be taken [5]. The foundation for all predictive quality applications is the prediction of quality characteristics (PQC).

The prediction of these characteristics can be regarded as a virtual inspection process, as it replaces a physical inspection.

In conventional physical inspection processes for determining product quality, a specific operation (e.g., measuring or gauging) is used to decide whether a quality characteristic meets a pre-defined requirement. In order to make this decision, it is checked whether the considered quality characteristic lies within previously defined specification limits.

Since every inspection process is subject to uncertainties (e.g., due to the uncertainty of the underlying measurement process), the decision whether the characteristic meets the requirement is also uncertain. Due to the uncertainty of inspection results, an erroneous decision is possible: characteristics that lie within the specification limits may be rejected (α-error), and characteristics that lie outside the specification limits may be accepted (β-error). Both errors entail technical, economic, and legal consequences. To reduce the risk of a wrong decision, the limits of conformity are set narrower than the specification limits to account for the uncertainty of the inspection process (e.g., the measurement uncertainty). To guarantee a product within the specification limits, the process variance, the variance of the test process, and the specification limits must be aligned according to DIN EN ISO 14253-1 (see Fig. 1) [41].

Fig. 1 Limitation of the specification range due to measurement uncertainty according to ISO 14253-1 (see [28, 41])

In order to consider an inspection process as suitable, it must be ensured that the quotient of the uncertainty of the inspection process U and the tolerance of the considered quality characteristic T does not exceed a certain threshold. This threshold value is defined differently in various standards and guidelines (see MSA [18], VDA5 [40], ISO 22514-7 [42]). As a rule of thumb, the golden rule of metrology states that the ratio \(\frac{U}{T}\) should not be greater than one-tenth to one-fifth [28, 39]. To deploy PQC in industry, the suitability of the (virtual) inspection process must be guaranteed. Hence, the uncertainty of the underlying model must be quantified. The determination of the uncertainty of a model is a typical example from the mathematical field of Uncertainty Quantification [37].
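
As a minimal illustration of this rule, the following Python sketch checks the suitability ratio. The function name, the numbers, and the default threshold are hypothetical; the threshold must be chosen according to the applicable standard.

```python
def inspection_suitable(u: float, tolerance: float, threshold: float = 0.2) -> bool:
    """Golden rule of metrology: the ratio U/T of the inspection
    uncertainty U to the tolerance T should not exceed a threshold,
    commonly between 1/10 and 1/5 (cf. [28, 39])."""
    return u / tolerance <= threshold

# Hypothetical example: U = 0.05 mm against a tolerance of T = 0.6 mm
print(inspection_suitable(0.05, 0.6))       # True, since U/T ~ 0.083
print(inspection_suitable(0.05, 0.6, 0.1))  # True even under the stricter 1/10 rule
```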

Uncertainty Quantification (UQ) focuses on the quantitative characterization of uncertainties in both real and computer-based applications. UQ methods are used to quantify the probability of certain results if some or all input variables are uncertain. A mathematical model is used to describe the system’s behavior extracted from the measured data. UQ problems are divided into two classes: forward uncertainty propagation and inverse uncertainty quantification. Forward uncertainty propagation aims to propagate the different sources of uncertainty acting on a model to predict an overall uncertainty of the system response. Inverse uncertainty quantification involves estimating the so-called bias correction (i.e., the discrepancy between the measured value and the model) and the unknown parameters of the model [6, 37].

In PQC, we estimate the parameters for a given model structure from data. The data used for parameter estimation are usually measurement data and, therefore, affected by uncertainty [28]. For a given model structure and some data, the objective is to minimize the model prediction’s uncertainty by setting the parameters appropriately. The determination of uncertainty in the field of predictive quality can, therefore, be considered an inverse uncertainty quantification problem by definition [37].

2 Definition of Prediction of Quality Characteristics

We first define PQC in a deterministic way before introducing a Bayesian perspective. The definition is provided for a single product in discrete manufacturing. Thus, the index \(i \in \mathbb {N}\) identifies a unique part of one product type. With minor modifications, the definition of PQC can be extended to the process industry. The foundation for any machine learning (ML) application is a sufficient database. In the case of PQC it contains the quality characteristics and the process data on a per-part basis. PQC is an inverse problem, as we want to infer a function H from some infinite-dimensional function space predicting the quality characteristics from process data [37].

We define process data and quality characteristics before constructing a database and deriving the resulting inverse problem.

Definition 1

The process data \(x_i\) for part i is generated by \(m \in \mathbb {N}\) sensors, where the readings of every sensor \(s_j\), \(0 \leq j < m\), are given as a function of time \(s_j: T \rightarrow S\) with \(t \in T \subset \mathbb {R}^+\). Accordingly, the process data is modelled by \(x_i: T \rightarrow S^m\) with \(x_i(t) := [s_0(t), \ldots, s_{m-1}(t)]^T\).

Definition 2

The measurements of the quality characteristics \(y_i\in \mathbb {R}^n\) for part i are given by \(n \in \mathbb {N}\) measurements, where every measurement \(v_l\), \(0 \leq l < n\), is a fixed value and \(y_i := [v_l]^T\).

In comparison to the process data \(x_i\) we assume that the quality characteristics are time-invariant or measured only once. Based on Definitions 1 and 2 the data for a unique part i is given by the tuple \((x_i, y_i)\). Hence, we denote by \(\mathcal {D} := \{(x_i,y_i)\}\, (0 \leq i<k)\) the database for a given PQC application with \(k \in \mathbb {N}\) entries.
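
To make the construction of \(\mathcal {D}\) concrete, the following sketch assembles such a database in Python. All shapes and values are placeholders, and we assume for simplicity that every sensor is sampled on a common time grid.

```python
import numpy as np

m, n, k = 3, 2, 5                # sensors, quality characteristics, parts
t = np.linspace(0.0, 1.0, 100)   # common sampling grid for the time domain T

rng = np.random.default_rng(0)
database = []                    # D = {(x_i, y_i)}, 0 <= i < k
for i in range(k):
    x_i = rng.normal(size=(t.size, m))  # x_i(t) = [s_0(t), ..., s_{m-1}(t)]^T
    y_i = rng.normal(size=n)            # y_i = [v_l]^T, time-invariant
    database.append((x_i, y_i))
```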

Given the database \(\mathcal {D}\) we want to determine the parameters \(\mathbf{w} \in W\) of the mapping \(H_{\mathbf{w}}\) with

$$\displaystyle \begin{aligned} y_i = H_{\mathbf{w}} (x_i) \quad \forall (x_i,y_i) \in \mathcal{D}. \end{aligned} $$
(1)

Thus, the inverse problem has become a parameter estimation problem, which is usually ill-posed [37]. A common approach is the computation of a least-squares solution:

$$\displaystyle \begin{aligned} \operatorname*{\mbox{arg min}}_{\mathbf{w}} \| y_i-H_{\mathbf{w}} (x_i) \|^2_{\mathcal{D}}. \end{aligned} $$
(2)

Note that some form of regularization usually improves the solution, as it accounts for the noise in the data [37]. The presence of noise in the data motivates the expansion of this deterministic interpretation of the parameter estimation using a Bayesian perspective.

The measurement of a quality characteristic is subject to measurement uncertainty; thus, it is better represented by a random variable. All sensor readings are also subject to measurement uncertainty and hence – to preserve the time dependency – interpreted as a stochastic process, which we define as follows:

Definition 3

Let \(u(t,\omega ): T \times \Omega \xrightarrow []{} S\) be a stochastic process, where \(t \in T \subset \mathbb {R}^+\) and ω ∈ Ω. Here Ω is the sample space of the probability space \((\Omega , \mathcal {F}, P)\) with \(\mathcal {F}\) being a σ-algebra and P a probability measure.

Accordingly we give the definitions of process data and quality characteristics in the Bayesian sense:

Definition 4

The process data X is generated by \(m \in \mathbb {N}\) sensors, where the sensor readings \(u_j\), \(0 \leq j < m\), are given by stochastic processes. Accordingly, the process data is modelled by \(X: T \times \Omega ^m \rightarrow S^m\) with \(X(t,\bar {\omega }):=[ u_j(t,\omega _j) ]^T\), where \(\bar {\omega }:= [\omega _j]^T\).

Definition 5

The measurements of the quality characteristics \(Y: \Omega ^n \rightarrow \mathbb {R}^n\) are given by \(n \in \mathbb {N}\) measurements, where every measurement \(v_l\), \(0 \leq l < n\), is a random variable, \(Y(\bar {\omega }) := [v_l(\omega _l)]^T\) with \(\bar {\omega }:= [\omega _l]^T\).

Based on Definitions 4 and 5 the data of a single part i is given by \((x_i, y_i)\), where \((x_i = X(\cdot , \bar {\omega }_i),\, y_i=Y(\bar {\omega }_i))\) is a realization of (X, Y). Taking a Bayesian point of view, Eq. (1) introduces the conditioned random variable Y |X, w, and the solution to the inverse problem is the conditioned random variable \(\mathbf {w}|\mathcal {D}\) [37]. The parameters can be determined with maximum likelihood estimation (MLE) as

$$\displaystyle \begin{aligned} {\mathbf{w}}^{MLE} = \operatorname*{\mbox{arg max}}_{\mathbf{w}} \log P(\mathcal{D}| \mathbf{w}) = \operatorname*{\mbox{arg max}}_{\mathbf{w}} \sum_i \log P(y_i|x_i, \mathbf{w}) \end{aligned} $$
(3)

or by introducing a prior P(w) on the parameters and finding the maximum a posteriori (MAP) parameters

$$\displaystyle \begin{aligned} {\mathbf{w}}^{MAP} = \operatorname*{\mbox{arg max}}_{\mathbf{w}} \log P(\mathbf{w}| \mathcal{D}) = \operatorname*{\mbox{arg max}}_{\mathbf{w}} \log P(\mathcal{D}| \mathbf{w}) + \log P(\mathbf{w}). \end{aligned} $$
(4)
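
For intuition, consider a linear model with Gaussian noise: the MLE of Eq. (3) then coincides with the ordinary least-squares solution, and a zero-mean Gaussian prior turns the MAP estimate of Eq. (4) into ridge regression. The following sketch illustrates this standard correspondence on synthetic data; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))                # synthetic process data
w_true = rng.normal(size=6)
y = X @ w_true + 0.1 * rng.normal(size=50)  # noisy quality characteristic

# MLE under Gaussian noise = ordinary least squares, cf. Eq. (3)
w_mle = np.linalg.lstsq(X, y, rcond=None)[0]

# MAP with prior w ~ N(0, tau2 * I) and noise variance sigma2 reduces to
# ridge regression with regularization lambda = sigma2 / tau2, cf. Eq. (4)
sigma2, tau2 = 0.1**2, 1.0
lam = sigma2 / tau2
w_map = np.linalg.solve(X.T @ X + lam * np.eye(6), X.T @ y)
```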

Example 1

Let the product have n = 2 quality characteristics, and let the total number of sensors on the involved machinery be m = 3. Then the database \(\mathcal {D}\) is constructed from Table 1. For sensor j = 0 there are two readings, for sensor j = 1 there is one reading, and for sensor j = 2 there are three readings. We concatenate all sensor readings into a single vector \(x \in \mathbb {R}^6\). The same procedure applies to the quality characteristics, which form the vector \(y \in \mathbb {R}^2\).

Table 1 Database entries for an exemplary predictive quality application

Assume that \(H_{\mathbf{w}}(x) := \mathbf{w} x = y\) is a linear operator with \(\mathbf {w} \in \mathbb {R}^{2 \times 6}\), then the least-squares solution \(\hat {\mathbf {w}}\) according to Eq. (2) is

$$\displaystyle \begin{aligned} \hat{\mathbf{w}} \approx \begin{pmatrix} -12.92 \, & -89.1 \, & 1.69 \, & 3.35 \, & -3.17 \, & 24.28 \\ 0.93 \, & 0.12 \, & -0.002 \, & 0.04 \, & 0.002 \, & -0.16 \end{pmatrix}. \end{aligned} $$
(5)
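
The entries of Table 1 are not reproduced here, so the following sketch uses random placeholder matrices of the same shape; applied to the actual table, the same call yields the matrix in Eq. (5).

```python
import numpy as np

rng = np.random.default_rng(42)
k = 10                       # hypothetical number of parts in the database
X = rng.normal(size=(k, 6))  # stacked process-data vectors x_i
Y = rng.normal(size=(k, 2))  # stacked quality characteristics y_i

# Least-squares solution of Y ~ X w^T, i.e. Eq. (2) for H_w(x) = w x
w_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T  # shape (2, 6), cf. Eq. (5)
```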

3 State of Uncertainty Quantification for Predictive Quality

The formal proof of suitability requires a determination of the measurement uncertainty. We present the results of our literature review on the prediction of quality characteristics and on uncertainty quantification in (deep) machine learning. The UQ methods are the designated keystones for providing a measurement uncertainty for PQC applications.

Current State of Predictive Quality

The quality of a product depends on the interaction of the individual production steps, the condition of the components and machinery, and the material characteristics. Due to the increasing complexity of production processes, the number of interactions between individual processes is rising. Further, the increasing individualization of products leads to a significant increase in process variance [5].

To improve the understanding of products and processes in production engineering, data analytics methods are used to extract information from data and derive actions based on this information [15, 35]. In this sense, data analytics describes the steps of data investigation, data understanding, and knowledge acquisition, which aim to uncover new relationships within the production process [11]. There are many different methods for implementing this decision support, ranging from statistical methods to complex machine learning models; which method is appropriate depends on various factors such as purpose, expertise, and available resources. Data analytics methods can be categorized as descriptive, diagnostic, predictive, or prescriptive analytics. The categories can be seen as steps in the data analysis, which partly build on each other [26].

Considering these categories, PQ focuses on the application of predictive analytics to determine product quality based on process data [5]. Besides considering data from different process steps, existing information on intermediates and the individual assembly can also be taken into account. This enables a comprehensive optimization of the production process. By including data from product usage, the fulfillment of customer requirements can be improved [16, 36].

In recent years, the use of ML algorithms for PQC has been investigated in manifold applications. The use of neural networks in particular has shown potential for predicting quality characteristics, as they are capable of mapping and detecting complex cause-effect dependencies while the user is not required to contribute a high amount of expert knowledge [28, 34]. As early as 2007, Chen et al. used a back-propagation neural network and the Taguchi method for quality prediction in plasma-enhanced chemical vapor deposition for semiconductor manufacturing [12]. Ogorodnyk et al. introduced a neural network approach for PQ in the injection molding process; the task there was to classify the product quality based on 18 machine and process parameters [30]. Baturynska et al. describe a prediction model for selective laser sintering; they use neural networks to predict the deviation of manufactured parts in three dimensions depending on their orientation and positioning in the 3D printer [3].

The examples have in common that a model is set up to predict quality characteristics without quantifying the model’s uncertainty. Thus, no proof of suitability is obtained, making the use as an inspection tool in an industrial environment challenging. There are, however, machine learning methods which can be used to quantify the uncertainty of the model. These are introduced in the following.

Uncertainty Quantification in (Deep) Machine Learning

With the rise of (deep) machine learning since the 2010s, the importance of UQ has been underestimated in the scientific community. As the adoption of ML progresses in industrial and consumer applications, safety and security regulations make some types of UQ necessary: verification, robustness, and interpretability [13]. Verification of an ML system provides formal guarantees about its behavior [8, 33, 44]. Robustness (i.e., the reaction to novel or noisy data) is highly relevant for industrial applications, such as self-learning robots, and consumer applications, such as autonomous vehicles [10, 27, 32]. Interpretability is another active field, where researchers try to understand why an ML system behaves in a certain way [31]. We argue that verification and robustness are forms of UQ and that at least a subset of interpretability can be classified as UQ. In all cases, uncertainty in the model or the data is investigated.

Uncertainty in the data and the model has been studied using Bayesian approaches since 1989. Early examples of Bayesian learning and Bayesian approaches to neural networks are [25] and [22]. In the 1980s, data sets were significantly smaller than today, and computational power was expensive. Since then, the definition of UQ has been expanded significantly: Sullivan et al. consider the treatment of all uncertainties in real and computer-based applications [37]. Especially in the simulation community, where finite element and finite volume methods and their variants are commonly used, UQ did not gain traction until the early 2000s [43]. This was mainly due to the curse of dimensionality and the lack of computational power to perform the simulations for all parameter sets to be investigated [4]. The development of improved methods (e.g., sparse collocation) opened novel possibilities to overcome the curse of dimensionality and explore large parameter spaces efficiently [37].

In deep learning, there are three main movements for UQ [9]. The first is Concrete Dropout [14]: the dropout rate becomes a learnable parameter, and nodes are also dropped during evaluation. Thus, a sample from a posterior distribution is generated from a single neural network by randomly omitting a certain percentage of neurons in each layer at each evaluation. This method is an extension of Dropout, which is used as a regularization method to prevent overfitting during model training [19]. Secondly, Deep Ensembles, as introduced in [24], are more sophisticated than Concrete Dropout: depending on the algorithm’s variant, multiple neural networks are trained with different initializations and on different data subsets. At evaluation, the outputs of all the neural networks are interpreted as samples from a posterior distribution. If we expand the number of models to infinity, we converge to Bayesian Neural Networks (BNN). In a BNN, the weights of each layer are represented by probability distributions [17]. These networks are evaluated by sampling multiple times from the posterior distributions. In [20] a different classification is discussed, which takes other approaches into account that do not apply to PQC.
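
The following sketch illustrates the sampling idea behind the dropout-based movement: dropout is kept active at prediction time, and repeated forward passes are treated as samples from an approximate posterior predictive distribution. Note that this is plain Monte Carlo dropout with a fixed rate, not the learnable-rate Concrete Dropout of [14]; the architecture and numbers are illustrative.

```python
import torch
import torch.nn as nn

# Untrained toy network; in practice the model would be fitted first.
model = nn.Sequential(
    nn.Linear(14, 32), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(32, 1),
)

x = torch.randn(1, 14)  # one hypothetical process-data vector
model.train()           # keeps the Dropout layer stochastic at evaluation
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(1000)])

mean, std = samples.mean(), samples.std()  # predictive mean and spread
```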

BNN are capable of representing aleatoric uncertainty (e.g., variability in the data) and epistemic uncertainty (e.g., the model neglecting effects, or missing data) via the posterior distribution [7]. This is a crucial feature for PQC applications, as by Definitions 4 and 5 we have (commonly unknown) uncertainty in our data and no indication of whether an employed model structure is sufficiently expressive. Even though we have seen successful applications of neural networks to PQC (cf. [3, 12, 30] and more), assumptions regarding the structure or the hyperparameters of the models may be inherently flawed. BNN have been successfully applied in various disciplines such as physics [38] and civil engineering [1], among others [2, 21, 45]. BNN have shown excellent results not only on theoretical toy problems (cf. [7]) but also in real-world applications. Thus, we focus on BNN given their benefits and apply them to production engineering, in particular to PQC. We briefly demonstrate how we apply BNN to PQC when predicting a quality characteristic \(\hat {y}\) from process data \(\hat {x}\).

The (posterior) predictive distribution of the unknown value \(\hat {y}\) for the test item \(\hat {x}\) is given by \(P(\hat {y}|\hat {x}) = \mathbb {E}_{P(\mathbf {w}|\mathcal {D})}\left [ P(\hat {y}|\hat {x},\mathbf {w}) \right ]\). The unknown distribution \(P(\mathbf {w}|\mathcal {D})\) can be rewritten using Bayes’ theorem:

$$\displaystyle \begin{aligned} P(\mathbf{w}|\mathcal{D}) = \frac{P(\mathcal{D}|\mathbf{w}) P(\mathbf{w})}{P(\mathcal{D})}, \end{aligned} $$
(6)

where P(w) is the prior on the weights, \(P(\mathcal {D})\) is a normalizing constant, and \(P(\mathcal {D}|\mathbf {w})\) is the likelihood of the observations. To enable PQC in industrial settings, the predictive distribution \(P(\hat {y}|\hat {x})\) requires a small variance \(\sigma^2\). However, a small variance is not a specific goal of training a BNN, since the method aims to approximate the distribution implied by the given data. Hence, the ambitions of quality engineers and mathematicians are not necessarily aligned.
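
In practice, the expectation defining \(P(\hat {y}|\hat {x})\) is approximated by Monte Carlo sampling: draw weight samples from (an approximation of) \(P(\mathbf {w}|\mathcal {D})\) and aggregate the resulting predictions. A minimal one-dimensional sketch, where the posterior samples are stand-ins for the output of an actual inference method:

```python
import numpy as np

rng = np.random.default_rng(7)
S = 5000
w_samples = rng.normal(loc=1.2, scale=0.05, size=S)  # stand-in for P(w|D)
x_hat, noise_std = 0.8, 0.02                         # hypothetical test item

# One prediction per weight sample approximates P(y_hat | x_hat)
y_samples = w_samples * x_hat + noise_std * rng.normal(size=S)
mean, var = y_samples.mean(), y_samples.var()
```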

There is not yet a consensus on how to quantify the quality of an uncertainty quantification. Standard measures for a good fit of the posterior are the average marginal log-likelihood, the prediction interval coverage probability, and the mean prediction interval width. However, Yao et al. show that these measures depend on the inference method used to determine the posterior distribution; we refer to [46] for a discussion of this matter.

Interim Conclusion

As detailed above, ML algorithms are successfully applied to PQC applications. In special use cases, we even see deployments in industrial applications, although uncertainties are not considered. Further, we established that UQ is an essential part of PQC and of almost all other ML applications outside of laboratories.

To accomplish the overall goal of certifying PQC methods as an inspection process, the application of UQ to PQC methods is imperative. We focus our upcoming research on BNN, as we see them as the most comprehensive and expressive method.

4 Application of Bayesian Neural Networks to the Prediction of Quality Characteristics

We apply a BNN to an injection molding process of a thin-walled thermoplastic part. In expert interviews, 14 process parameters (e.g., tool temperature, cycle time, pressure) were identified, each of which is recorded with one sensor. Hence, the machine provides m = 14 sensors for process data. We focus on n = 1 quality characteristic, i.e., a length of the exemplary part with a nominal value of 72.6 mm. The database \(\mathcal {D}\) was generated using a full-factorial design of experiments (DoE), where machine settings are explicitly varied, with k = 600 experiments. The measurements of the quality characteristic were performed on a coordinate-measuring machine, whose suitability was proven by a Gage R&R Study (MSA) in advance [29].

The data quality is excellent, as it was manually verified during the recording and before model training. All sensors and the quality characteristic are scaled to the interval [0, 1] to facilitate efficient model training. The original scaling is used for the interpretation in the industrial context in Sect. 4.1.

We use a feed-forward neural network with two hidden layers and leaky ReLU activation functions. The first hidden layer has four nodes, while the second hidden layer has two nodes. The second layer’s output is used to parametrize a normal distribution \(\mathcal {N}(\mu ,\sigma )\): the first node is interpreted as the mean μ, while the second node is understood as the variance σ.

Comparably to [7], we use a prior P(w) on our weights w and fit a posterior \(P(\mathbf {w}|\mathcal {D})\). A prior is placed on the weights \(P_t(\mathbf {w}) = \prod _j \mathcal {N} ({\mathbf {w}}_j | t_j, \sigma _p)\) where \(\mathcal {N} (x | \mu _p, \sigma _p)\) is the Gaussian density evaluated at x with mean μ p and variance σ p. The prior is learnable as the means t j are fitted during training, while σ p = 1 is fixed. We use a Gaussian variational posterior with trainable mean and variance.

The network is trained for 1250 epochs with a learning rate of 0.001 using the Adam optimizer. The other hyperparameters of the optimizer are left at their default values.Footnote 1 For the loss L we use the sum of the Kullback–Leibler divergences of both hidden layers plus the negative log-likelihood:

$$\displaystyle \begin{aligned} L = KL_1 + KL_2 - \mathbb{E}_{q_1({\mathbf{w}}_1|\theta_1),\,q_2({\mathbf{w}}_2|\theta_2)}\left[ \log P(\mathcal{D} | \mathbf{w}) \right]. \end{aligned} $$
(7)

Here \(KL_i = KL\left [q_i({\mathbf {w}}_i|\theta _i) \,||\, P({\mathbf {w}}_i) \right ]\), where i = 1, 2 indexes the hidden layers and \(\theta_i\) are the parameters of the variational distribution on the weights. We keep the notation of [7] and refer the interested reader there for details. The loss L over the 1250 epochs is shown in Fig. 2. After plateauing for about 1000 epochs, the loss drops once more over the final 200 epochs before the best performance is reached.
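
To make the setup tangible, the following self-contained PyTorch sketch mirrors the described architecture (14 inputs, hidden layers with four and two nodes, leaky ReLU, outputs parametrizing a normal distribution) and the loss of Eq. (7). It follows the general Bayes-by-Backprop recipe of [7] with mean-field Gaussian variational posteriors; the initialization, the omitted biases, the softplus positivity transform, and the placeholder data are our own simplifications, not the exact implementation from our repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Linear layer with a mean-field Gaussian variational posterior over
    its weights, in the spirit of [7]. Prior: N(t_j, 1) with learnable
    means t_j and fixed sigma_p = 1, as described in the text."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.mu = nn.Parameter(0.1 * torch.randn(n_out, n_in))
        self.rho = nn.Parameter(torch.full((n_out, n_in), -3.0))
        self.prior_mean = nn.Parameter(torch.zeros(n_out, n_in))
        self.kl = torch.tensor(0.0)

    def forward(self, x):
        sigma = F.softplus(self.rho)                   # posterior std > 0
        w = self.mu + sigma * torch.randn_like(sigma)  # reparametrization trick
        # KL[q(w|theta) || P(w)] between diagonal Gaussians (prior std 1)
        self.kl = (-torch.log(sigma)
                   + 0.5 * (sigma**2 + (self.mu - self.prior_mean)**2)
                   - 0.5).sum()
        return x @ w.t()

class BNN(nn.Module):
    """14 -> 4 -> 2 network; the two outputs parametrize N(mu, sigma)."""
    def __init__(self):
        super().__init__()
        self.h1, self.h2 = BayesLinear(14, 4), BayesLinear(4, 2)

    def forward(self, x):
        out = self.h2(F.leaky_relu(self.h1(x)))
        return out[:, 0], F.softplus(out[:, 1]) + 1e-6  # mean, positive scale

# Placeholder data standing in for the scaled injection-molding set
x = torch.rand(540, 14)
y = torch.rand(540)

model = BNN()
opt = torch.optim.Adam(model.parameters(), lr=0.001)  # other settings default

for epoch in range(1250):
    opt.zero_grad()
    mu, sigma = model(x)
    nll = -torch.distributions.Normal(mu, sigma).log_prob(y).sum()
    loss = model.h1.kl + model.h2.kl + nll            # Eq. (7)
    loss.backward()
    opt.step()
```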

Fig. 2 Loss L during the training with 1250 epochs

We train the BNN on 540 data points (≈ 90%) and randomly select the remaining 60 points (≈ 10%) for evaluation. We sample the trained BNN 5000 times for each evaluation point to generate as many pairs \((\mu_i, \sigma_i)\) for the parametrized normal distribution. Figure 3 depicts the means \(\mu_i\) in a box plot for the first 15 evaluation points, and Table 2 lists the results for the first 10 as tabular data. The actual quality characteristics \(y_1\) are given in blue in the box plot for comparison. The mean absolute error (MAE) between the mean of means \(\frac {1}{5000}\sum _{i=1}^{5000} \mu _i\) and the actual value \(y_1\) is ≈ 0.1814. In relation to the size of the data set, this is a reasonably low MAE. In Fig. 3 only sample i = 7 is an outlier regarding the mean of means. A more extensive data set would allow more rigorous training of the BNN and yield a better MAE. We provide code and the scaled data set in our GitHub repository.Footnote 2
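
A sketch of this evaluation protocol, with placeholder predictions standing in for actual samples from the trained BNN:

```python
import numpy as np

rng = np.random.default_rng(3)
S, n_test = 5000, 60
mu = 0.5 + 0.05 * rng.normal(size=(n_test, S))  # hypothetical mu_i samples
y_true = 0.5 + 0.05 * rng.normal(size=n_test)   # hypothetical measurements

mean_of_means = mu.mean(axis=1)              # (1/S) * sum_i mu_i per test point
mae = np.abs(mean_of_means - y_true).mean()  # cf. the MAE of ~ 0.1814 above
```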

Fig. 3 Quality characteristic \(y_1\) (blue) with a box plot of the prediction \(H_{\mathbf{w}}(x)\) based on 5000 model evaluations for 15 samples from the test data set

Table 2 Quality characteristic \(y_1\) with the mean prediction \( \mathbb {E} \left [ H_{\mathbf {w}}(x) \right ]\) and the variance of the prediction \( \mathbb {V} \left [ H_{\mathbf {w}}(x) \right ]\) after the training

4.1 Interpretation in the Industrial Context

For the industrial practitioner, the raw results of the BNN need further interpretation. Primarily, we have to restore the original scaling to evaluate the PQC in context. In Table 3 and Fig. 4 the predicted values are restored to their original scaling. It is notable how the variance decreases after the rescaling. This does not indicate a better model performance but is an effect of the linear rescaling, which multiplies the variance by the squared data range. Similarly, the MAE decreases to ≈ 0.0641.
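
Assuming plain min-max scaling to [0, 1], the back-transform is linear, so the mean maps directly into the original scale while the variance is multiplied by the squared data range. A minimal sketch with hypothetical bounds:

```python
lo, hi = 72.3, 72.9   # hypothetical min/max of the training data in mm

mu_scaled, var_scaled = 0.55, 0.01
mu_orig = lo + mu_scaled * (hi - lo)    # 72.63 mm
var_orig = var_scaled * (hi - lo) ** 2  # 0.0036, shrinks since hi - lo < 1
```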

Fig. 4 Quality characteristic \(y_1\) (blue) in its original scale with a box plot of the prediction \(H_{\mathbf{w}}(x)\) based on 5000 model evaluations for 15 samples from the test data set

Table 3 Quality characteristic \(y_1\) with the mean prediction \( \mathbb {E} \left [ H_{\mathbf {w}}(x) \right ]\) and the variance of the prediction \( \mathbb {V} \left [ H_{\mathbf {w}}(x) \right ]\) after transformation to the original scale

To prove the suitability of this virtual inspection process, we apply the golden rule of metrology, according to which the ratio \(\frac {U}{T}\) of the uncertainty of measurement U to the tolerance T shall not be greater than one-tenth to one-fifth [39]. For our example, we can interpret the 2σ-interval γ of \(H_{\mathbf{w}}(x)\) as the uncertainty of measurement. Then, with \(\mathbb {V}\left [ H_{\mathbf {w}}(x) \right ] < 0.0167\):

$$\displaystyle \begin{aligned} \gamma = 2 \sqrt{\mathbb{V}\left[ H_{\mathbf{w}}(x) \right]} \leq 0.258. \end{aligned} $$
(8)

Given T = 0.6 and choosing U as the upper bound of γ, we derive

$$\displaystyle \begin{aligned} \frac{U}{T} \leq \frac{0.258}{0.6} = 0.43 \overset{!}{\leq} 0.2. \end{aligned} $$
(9)

Thus, based on this conservative estimate of the uncertainty of measurement, this BNN is not suitable as an inspection process. However, the following aspects need further consideration:

  • Using a more advanced inference method (e.g., Hamiltonian Monte Carlo) can better approximate the posterior and generate more favorable results regarding the suitability.

  • As the database was generated by a DoE, the process variation is deliberately high. This is in stark contrast to a real production environment, where the variation is usually low and process capability is ensured.

  • The size of the database is relatively small compared to the number of trainable parameters (≈ 210) in the BNN.

  • The hyperparameters have a significant influence on the performance of the BNN. Deliberate, application-specific manual tuning or the use of AutoML methods could enable a proof of suitability.

Overall, we are certain that BNN are a well-suited method for PQC, but we openly acknowledge that more research is necessary before adoption in industrial applications.

Furthermore, for a formal evaluation of the suitability, the measurement uncertainty must be determined by an approved procedure such as the GUM or the VDA 5 (see [39] for details). However, none of these procedures considers algorithms based on process data. Many aspects of physical inspection procedures are transferable to PQC, yet some error sources (e.g., numerical concerns) are not addressed. As the adoption and development of PQC methods progress, the procedures to determine suitability will be extended as well.

5 Concluding Remarks

We identified the prediction of quality characteristics as the foundation of every predictive quality method. To give a framework for future research, we provided a formal definition of the prediction of quality characteristics. Further, we established PQC as a virtual inspection process, which can complement and/or reduce costly physical inspections. For every inspection process, a proof of suitability is necessary, which requires the determination of the measurement uncertainty of the underlying method. Hence, we added a Bayesian perspective to our definition of PQC to consider model- and data-inherent uncertainties.

Based on our literature review, we reason that existing machine learning methods, such as BNN, can provide an adequate uncertainty estimation. The uncertainty estimates are a decisive keystone to establish PQC as a virtual inspection process and permit a proof of suitability. As a showcase, we applied a BNN to an injection molding process and gave several hints on how to improve the uncertainty estimate for future applications. To facilitate adoption in industry, we advocate for a revision of standards such as the VDA5 or the ISO 22514-7 to accommodate virtual inspection processes.