1 Introduction

A random field characterizes randomness as a function of spatial (or temporal) variables. So far, little effort has been made to consider the random field in engineering probability analysis and design (Choi et al. 2006; Missoum 2008; Chen et al. 2010; Basudhar and Missoum 2009). This is mainly due to the lack of effective approaches for random field modeling, the misconception that the random field has only a minor influence on system performance, or both. Hence, the random parameter approach (RPA) has been popular in probability analysis and design for engineering products and processes. The RPA models manufacturing and operational variability while simplifying or ignoring the spatial or temporal variability (i.e., the random field). For instance, the spatial variability of a plate thickness can be modeled in a simplified form as a single thickness random parameter. Such simplification of the random field in engineering practice may lead to an unreliable design. It has recently been acknowledged (Missoum 2008; Chen et al. 2010; Basudhar and Missoum 2009; Rajaee et al. 1994; Tamura et al. 1999; Berkooz et al. 1996; Fukunaga 1990; Yin et al. 2009; Xi et al. 2010) that consideration of the random field is key to accurately predicting variability in system performances, especially for field-sensitive failures (e.g., notch, buckling) and small-scale applications, in which tolerance control is more important but challenging. The random field can be found in the form of geometry, material properties, and loads in micro-electro-mechanical systems (MEMS), mechanical products, and electronics.

It is challenging to take the random field into account for probability analysis and design for three primary reasons. First, it can be complicated and expensive to characterize the random field in engineered systems. Second, very few techniques are available to account for both random parameter and field variables in probability analysis and design. Third, the number of available random field realizations is often insufficient in most engineering problems. To resolve the first challenge, various methods for random field characterization have been proposed, including the midpoint method (Der Kiureghian and Ke 1988), the spatial averaging method (Vanmarcke and Grigoriu 1983), the shape function method (Liu et al. 1986), and the proper orthogonal decomposition (POD) method (Turk and Pentland 1991). The first two methods can represent a random field in a discrete manner using a random vector. However, they require a large number of random field variables to describe the field reasonably well (Li and Der Kiureghian 1993). The shape function method can express a random field using a continuous function, but its accuracy depends upon the selection of shape functions (Li and Der Kiureghian 1993). The POD method can describe a random field in both continuous and discrete ways. Furthermore, it has been shown that the POD method is much more efficient than the other three methods while meeting a given accuracy level (Sudret and Der Kiureghian 2000). To resolve the second challenge, techniques have been sought to approximate a random field with a set of random field variables in a parametric form (Missoum 2008; Chen et al. 2010; Basudhar and Missoum 2009). It was, however, assumed that the random field variables are statistically independent. This assumption may not be valid because random field variables obtained using the POD may be statistically dependent, thus resulting in inaccurate probability analysis (Xi et al. 2010).
In order to address this limitation, the authors proposed a random field modeling method in previous work that accounts for the statistical dependence among the random field variables (Xi et al. 2010). To resolve the third challenge, an effective random field modeling technique that works with insufficient random field realizations is needed. Our study shows that available approaches for random field modeling are inadequate when random field realizations are insufficient. This paper thus addresses the challenge by proposing a Bayesian Copula based random field modeling technique. Combined with a random field characterization approach such as the POD method, the proposed technique can leverage any advanced probability analysis method, such as the dimension reduction (DR) method (Rabitz et al. 1999; Rabitz and Alis 1999; Xu and Rahman 2004; Huang and Du 2006), the eigenvector dimension reduction (EDR) method (Youn and Xi 2009), the polynomial chaos expansion (PCE) method (Lee and Chen 2009; Hu and Youn 2009), etc.

This paper is structured as follows. Section 2 reviews the POD method for random field modeling. Section 3 presents the proposed Bayesian Copula based random field modeling technique for insufficient field realizations. Section 4 presents a mathematical case study demonstrating the accuracy of the proposed random field modeling technique with insufficient random field data. Section 5 presents an industrial case study that conducts robust design optimization (RDO) of door misalignment in a two-door refrigerator assembly process while considering the random field of tolerance with insufficient realizations. Section 6 concludes the paper with a description of the contributions and limitations.

2 Review of the proper orthogonal decomposition (POD) method

A random field θ can be decomposed into a mean part μ and a variation part ν. The k th random field realization can be generally expressed as

$$ {\theta}_k=\mu +{\nu}_k $$
(1)

As an example, Fig. 1 shows a V-8 engine head and one random field realization of the head surface height (Wang et al. 2009) (see Fig. 1b), where the color denotes the height (μm) measured by a Coherix Shapix™ surface inspection system. The random field reflects the performance of the machining system, which in turn affects engine performance. Other examples can be found in various engineering fields: chassis and frames in automotive and aerospace structures, channel diameters in microfluidics, and beam thicknesses in MEMS bistable mechanisms.

Fig. 1 V-8 engine head and its random field realization of the surface height

Using the POD approach, a random field can be represented by Eq. (2) given a sufficient number (m) of sampled random field realizations.

$$ \theta (x)=\mu (x)+{\displaystyle \sum_{i=1}^m{\alpha}_i\frac{\phi_i(x)}{\left\Vert {\phi}_i(x)\right\Vert }} $$
(2)

where x is the spatial measurement location; ϕ i (x) is the i th field signature of an ensemble of the random field variation ν(x); α i is the coefficient of the i th field signature and its value can be obtained through the inner product operation (●) between the field variation ν(x) and the unit field signature as

$$ {\alpha}_i=\nu (x)\bullet \frac{\phi_i(x)}{\left\Vert {\phi}_i(x)\right\Vert } $$
(3)

Theoretically, all signatures are required to exactly represent the random field, which can be realized with a sufficiently large number (m) of sampled realizations as described in Eq. (2). In practice, however, only a few important signatures may be vital for the representation of the random field. From Eq. (3), a dataset of the coefficient (α i ) of the i th signature can be obtained from all sampled realizations. Hence, V i is defined as a random field variable that statistically models the coefficient dataset (α i ) of the i th signature. The random field can therefore be represented as a function of deterministic field signatures (ϕ i ) and the corresponding random field variables (V i ) as shown in Eq. (4).

$$ \theta (x)\cong \mu (x)+{\displaystyle \sum_{i=1}^N{V}_i\frac{\phi_i(x)}{\left\Vert {\phi}_i(x)\right\Vert }} $$
(4)

where V i is the i th random field variable and N is the number of the vital field signatures. The distributions and statistical parameters of the random field variable V i can be identified using the maximum likelihood estimation (MLE) and goodness-of-fit tests.
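The distribution identification for each V i (MLE followed by a goodness-of-fit test) can be sketched as below; the coefficient dataset and the two candidate families are hypothetical stand-ins for whatever Eq. (3) produces:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical coefficient dataset alpha_i of one field signature (Eq. (3))
alpha = rng.normal(loc=0.5, scale=0.1, size=200)

# Fit each candidate family by MLE, then rank by a Kolmogorov-Smirnov test
candidates = {"norm": stats.norm, "gumbel_r": stats.gumbel_r}
best_name, best_p = None, -1.0
for name, dist in candidates.items():
    params = dist.fit(alpha)                       # maximum likelihood estimates
    _, p = stats.kstest(alpha, name, args=params)  # goodness-of-fit p-value
    if p > best_p:
        best_name, best_p = name, p
```

Here the highest p-value selects the marginal for V i ; a richer candidate set (lognormal, Weibull, etc.) plugs into the same loop.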

In engineering applications, it is more practical to represent a random field in a discrete manner than in a continuous one because a finite amount of field data is given at discrete field locations. As shown in Fig. 1, each realization has a finite number of measurement locations (n), and the physical quantity at each measurement location varies over a finite number of sampled realizations (m). The physical quantities could be geometric tolerances, material properties, and/or loads. Thus, an m × n matrix representing the random field (Θ) can be constructed as

$$ \boldsymbol{\uptheta} =\left[\begin{array}{cccc}\hfill {\theta}_{11}\hfill & \hfill {\theta}_{12}\hfill & \hfill \cdots \hfill & \hfill {\theta}_{1n}\hfill \\ {}\hfill {\theta}_{21}\hfill & \hfill {\theta}_{22}\hfill & \hfill \cdots \hfill & \hfill {\theta}_{2n}\hfill \\ {}\hfill \vdots \hfill & \hfill \vdots \hfill & \hfill \ddots \hfill & \hfill \vdots \hfill \\ {}\hfill {\theta}_{m1}\hfill & \hfill {\theta}_{m2}\hfill & \hfill \cdots \hfill & \hfill {\theta}_{mn}\hfill \end{array}\right] $$
(5)

where θ lj indicates the measured physical quantity at the j th measurement location of the l th sampled realization. Such representation works for multi-dimensional problems. Regardless of the dimension of the random field, the scanned multi-dimensional data are listed in a one-dimensional array from θ l1 to θ ln for the l th sampled realization. The mean of the random field is estimated as

$$ \boldsymbol{\upmu} =\left[{\overline{\theta}}_{\bullet 1},{\overline{\theta}}_{\bullet 2},\cdots, {\overline{\theta}}_{\bullet n}\right] $$
(6)

where \( {\overline{\theta}}_{\bullet j} \) stands for the average of the j th measured physical quantity over the sampled realizations. Hence the variation part is expressed as

$$ \boldsymbol{\upnu} =\left[\begin{array}{cccc}\hfill {\theta}_{11}-{\overline{\theta}}_{\bullet 1}\hfill & \hfill {\theta}_{12}-{\overline{\theta}}_{\bullet 2}\hfill & \hfill \cdots \hfill & \hfill {\theta}_{1n}-{\overline{\theta}}_{\bullet n}\hfill \\ {}\hfill {\theta}_{21}-{\overline{\theta}}_{\bullet 1}\hfill & \hfill {\theta}_{22}-{\overline{\theta}}_{\bullet 2}\hfill & \hfill \cdots \hfill & \hfill {\theta}_{2n}-{\overline{\theta}}_{\bullet n}\hfill \\ {}\hfill \vdots \hfill & \hfill \vdots \hfill & \hfill \ddots \hfill & \hfill \vdots \hfill \\ {}\hfill {\theta}_{m1}-{\overline{\theta}}_{\bullet 1}\hfill & \hfill {\theta}_{m2}-{\overline{\theta}}_{\bullet 2}\hfill & \hfill \cdots \hfill & \hfill {\theta}_{mn}-{\overline{\theta}}_{\bullet n}\hfill \end{array}\right] $$
(7)

The signature vector ϕ can then be obtained by solving an eigen-problem as

$$ \boldsymbol{\Sigma}\phi =\lambda \phi $$
(8)

where λ is the eigenvalue of the covariance matrix Σ (n × n) that is defined as

$$ \boldsymbol{\Sigma} ={\boldsymbol{\upnu}}^{\mathrm{T}}\boldsymbol{\upnu} $$
(9)
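Equations (5)–(9) amount to an eigen-decomposition of ν^T ν. A minimal sketch, using a synthetic data matrix (a sine trend plus noise) in place of measured realizations:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 20                        # m sampled realizations, n measurement locations
x = np.linspace(0.0, 1.0, n)
# Hypothetical random field matrix Theta (Eq. (5))
Theta = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal((m, n))

mu = Theta.mean(axis=0)              # mean vector (Eq. (6))
nu = Theta - mu                      # variation matrix (Eq. (7))
Sigma = nu.T @ nu                    # covariance matrix (Eq. (9)), n x n
lam, Phi = np.linalg.eigh(Sigma)     # eigen-problem (Eq. (8))
lam, Phi = lam[::-1], Phi[:, ::-1]   # sort signatures by decreasing eigenvalue

# Signature coefficients via inner products (Eq. (3)); eigh already returns
# unit-norm eigenvectors, so no extra normalization is needed
A = nu @ Phi                         # row l holds the alpha_i of realization l

# Truncated representation with the N most energetic signatures (Eq. (4))
N = 1
Theta_hat = mu + A[:, :N] @ Phi[:, :N].T
```

Retaining all n signatures reproduces Θ exactly; the columns A[:, i] are the coefficient datasets to which the random field variables V i are fitted.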

3 Random field modeling with insufficient realizations

When the mean (μ) and variation (ν) of the random field do not converge to their true values, the available realizations are said to be insufficient for random field modeling. In such cases, the existing techniques for random field modeling (Yin et al. 2009; Xi et al. 2010; Der Kiureghian and Ke 1988; Vanmarcke and Grigoriu 1983; Liu et al. 1986; Turk and Pentland 1991; Li and Der Kiureghian 1993) are inadequate because the random field may be inaccurately represented. This paper proposes a new Bayesian Copula based random field modeling technique for cases where insufficient random field realizations are provided.

Let θ(x) be a continuous random field of interest. The discretized random field with n measurement locations can be represented as θ = [θ(x 1), …, θ(x n )], where x j is the j th measurement location and θ(x j ) is a vector of the measured physical quantities over all sampled realizations at the j th measurement location. The randomness of θ(x j ) can be represented by a random variable Θ j . Hence, the random field is represented by a collection of random variables Θ j (j = 1, …, n), namely a joint probability density function (PDF). Random field modeling thus essentially becomes a process of identifying an n-dimensional joint PDF with insufficient random realizations of Θ j . The proposed approach employs a Copula for the joint PDF modeling because the Copula is capable of modeling a multivariate distribution with various statistical dependence patterns.

3.1 Joint PDF modeling of the random field

A Copula is a joint distribution function of standard uniform random variables. According to Sklar’s theorem (Sklar 1959), there exists an n-dimensional Copula C such that for all Θ j in a real random space

$$ F\left({\varTheta}_1, \dots, {\varTheta}_n\right) = C\left({F}_1\left({\varTheta}_1\right), \dots, {F}_n\left({\varTheta}_n\right)\right) $$
(10)

where F is an n-dimensional joint distribution function with marginal distribution functions F 1, …, F n and C is the Copula cumulative distribution function (CDF). To date, most Copulas deal with bivariate data due to the lack of a practical n-dimensional generalization of the coupling parameters (Huard et al. 2006; Roser 1999). One way to deal with multivariate data is to analyze the data pair-by-pair using two-dimensional Copulas. Hence, a total of C(n,2) = n(n−1)/2 bivariate Copula models would be required for the joint PDF modeling of the random field, where C(n,2) denotes the number of 2-combinations from n elements. To reduce the computational burden when the number of measurement locations n is large, it is desirable to model only the pairs with strong statistical dependence, because accurate generation of random field realizations depends largely on modeling the strongly dependent pairs. A numerical example further illustrates this matter in Section 4. It is, however, impossible to know the level of statistical dependence among pairs unless all the pair modeling has been conducted. Therefore, it is assumed that the random field realization at the j th measurement location, i.e. θ(x j ), has the strongest statistical dependence with its closest neighbor, which can be identified by the Euclidean distance in general situations. Compared to the Copula modeling, calculation of the Euclidean distance is much more computationally efficient. To facilitate detailed modeling, a location index is used to indicate the closest neighbor of each measurement location. For instance, x j+1 is assigned as the closest neighbor of x j , and x j+2 as the closest neighbor of x j+1 excluding x j . Therefore, Copula modeling is performed only between the random variables Θ j′ and Θ j′+1 (j′ = 1, …, n−1), resulting in a significant reduction of the Copula modeling from C(n,2) to n−1.
The overall steps of the random field modeling are first outlined; technical details follow in the subsections.

Step 1: Optimal marginal distributions are modeled for all random variables Θ j (j = 1, 2, …, n) using Bayesian statistics to describe random field realizations at each measurement location. Technical details are illustrated in subsection 3.1.1.

Step 2: Bivariate Copula modeling is performed pair-by-pair for random variables Θ j’ and Θ j’+1 (j′ = 1, 2, …, n−1) using a Bayesian Copula approach. Technical details are illustrated in subsection 3.1.2.

Step 3: Sufficient random field samples are generated using Monte Carlo simulation (MCS): initial random samples are drawn from the marginal distribution f(θ 1), and subsequent samples are drawn from the conditional PDFs f(θ j’+1|θ j’ ).
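The three steps can be sketched end-to-end under simplifying assumptions: normal marginals with hypothetical means and standard deviations (Step 1), and a single Gaussian Copula with a fixed parameter ρ for every neighboring pair in place of the pairwise Bayesian selection (Step 2), so that Step 3 reduces to a conditional sampling walk:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100                                  # measurement locations
rho = 0.9                                # hypothetical Gaussian-Copula parameter
mu = np.sin(np.linspace(0.0, np.pi, n))  # hypothetical marginal means
sd = np.full(n, 0.05)                    # hypothetical marginal std devs

def sample_field(n_samples):
    """Step 3: draw theta_1 from f(theta_1), then walk through the
    conditional PDFs f(theta_{j'+1} | theta_{j'}) pair by pair."""
    z = np.empty((n_samples, n))
    z[:, 0] = rng.standard_normal(n_samples)
    for j in range(n - 1):
        # Gaussian-Copula conditional in standard-normal space
        z[:, j + 1] = rho * z[:, j] + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_samples)
    # Map to physical space through the normal marginals
    # (equivalent to norm.ppf(norm.cdf(z), mu, sd) for this special case)
    return mu + sd * z

fields = sample_field(500)               # sufficient MCS realizations
```

With measured data, ρ, the Copula family, and the marginals would of course come from Steps 1 and 2 rather than being fixed a priori.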

3.1.1 Determination of the optimal marginal distribution

Bayesian statistics is employed to select the optimal marginal distribution for Θ j . Let the random variable Θ j follow a PDF with an unknown distribution parameter vector Δ j . From the Bayesian point of view, δ j is interpreted as a realization of a random vector Δ j with a PDF \( {f}_{{\boldsymbol{\Delta}}_{\mathbf{j}}}\left({\boldsymbol{\updelta}}_{\mathbf{j}}\right) \). This density function expresses one's belief about Δ j before any observation of Θ j is taken, that is, a prior distribution. Based on Bayes' theorem, the posterior distribution of Δ j given a new observation of Θ j can be expressed as

$$ {f}_{{\boldsymbol{\Delta}}_{\mathbf{j}}\Big|{\varTheta}_j}\left({\boldsymbol{\updelta}}_{\mathbf{j}}\Big|{\theta}_j\right)=\frac{f_{\varTheta_j\Big|{\boldsymbol{\Delta}}_{\mathbf{j}}}\left({\theta}_j\Big|{\boldsymbol{\updelta}}_{\mathbf{j}}\right){f}_{{\boldsymbol{\Delta}}_{\mathbf{j}}}\left({\boldsymbol{\updelta}}_{\mathbf{j}}\right)}{f_{\varTheta_j}\left({\theta}_j\right)} $$
(11)

The Bayesian approach is used to update information about the parameter vector Δ j . First, a prior distribution of Δ j is assigned before any observation of Θ j is taken. Then, the prior distribution of Δ j is updated to the posterior distribution as data on Θ j are obtained. It is extremely difficult to compute an exact analytical form of the posterior distribution of the parameter vector Δ j since the normalization factor (the denominator in Eq. (11)) requires a complicated, multi-dimensional integration. Various approaches have been developed to overcome this difficulty, including the conjugate Bayesian approach (Zhan et al. 2012), Laplace approximation (Boone et al. 2005; Wang and Blei 2013), entropy-based methods (Zhu and Xing 2009; Guan et al. 2012), the Metropolis-Hastings algorithm (Berg 2004), etc. With the posterior distribution of the parameter vector Δ j in Eq. (11), the optimal marginal distribution of Θ j can then be determined by generating sufficient random samples using MCS based on the underlying PDF of Θ j .
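As one concrete instance of the listed approaches, a random-walk Metropolis-Hastings sampler sidesteps the normalization factor entirely, since only posterior ratios are needed. The sketch below assumes a scalar Δ j (the unknown mean of a normal likelihood with known σ) and a flat prior; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.2
# Hypothetical (insufficient) realizations of Theta_j at one location
data = rng.normal(1.0, sigma, size=15)

def log_posterior(delta):
    """Log of Eq. (11) up to the normalization factor: normal likelihood
    with known sigma times a flat prior on delta."""
    return -0.5 * np.sum((data - delta) ** 2) / sigma**2

samples, delta = [], data.mean()
for _ in range(5000):
    prop = delta + 0.1 * rng.standard_normal()       # random-walk proposal
    # Accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_posterior(prop) - log_posterior(delta):
        delta = prop
    samples.append(delta)
post = np.array(samples[1000:])                      # discard burn-in
```

The retained samples approximate the posterior of Δ j ; pushing them through the likelihood of Θ j then yields the MCS samples of the marginal distribution.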

3.1.2 Determination of the optimal copula

The common methods for selecting the optimal Copula are based on the maximum likelihood approach (Fermanian 2005; Chen and Fan 2005; Panchenko 2005), which estimates an optimal parameter set. Recently, a Bayesian Copula approach (Huard et al. 2006) was proposed to select the optimal Copula independently of the parameter estimation. It was further shown in that study that this approach provides more reliable identification of the true Copula even with a lack of samples (Huard et al. 2006). Hence, the Bayesian Copula approach is employed for the statistical dependence modeling between random variables Θ j’ and Θ j’+1. For the sake of completeness, we briefly describe the procedure for selecting the optimal Copula using the Bayesian approach. Interested readers should refer to the reference (Huard et al. 2006) for details.

A set of hypotheses is first made as follows using the Bayesian Copula approach.

  • H k : The data come from Copula C k , k = 1, . . ., Q

The objective is to find the Copula with the highest probability Pr(H k | D), i.e. the optimal Copula, from a finite set (Q) of Copulas, where D represents the bivariate data in the standard uniform space. Based on the Bayes’ theorem, the probability that the bivariate data come from the Copula C k is expressed as

$$ \Pr \left({H}_k\Big|D\right)=\frac{ \Pr \left(D\Big|{H}_k\right) \Pr \left({H}_k\right)}{ \Pr (D)}={\displaystyle {\int}_{-1}^1\frac{ \Pr \left(D\Big|{H}_k,\tau \right) \Pr \left({H}_k\Big|\tau \right) \Pr \left(\tau \right)d\tau }{ \Pr (D)}} $$
(12)

where τ is a non-parametric measure of statistical dependence and ranges from −1 to 1 for all Copulas. Eq. (12) can be rewritten as Eq. (13) because Pr(τ) is the same for every Copula C k and Pr(H k |τ) is equally probable for a given τ, which reflects no preference for a specific Copula.

$$ \Pr \left({H}_k\Big|D\right)={\displaystyle {\int}_{-1}^1\frac{ \Pr \left(D\Big|{H}_k,\tau \right)d\tau }{ \Pr (D)}} $$
(13)

Pr(D|H k , τ) can be calculated from the Copula PDF as

$$ \Pr \left(D\Big|{H}_k,\tau \right)={\displaystyle \prod_{l=1}^m{c}_k\left({\theta}_{j\hbox{'},l},{\theta}_{j\hbox{'}+1,l}\Big|\tau \right)} $$
(14)

where c k (•) is the PDF of the k th Copula, m is the number of bivariate data points, and θ j’,l and θ j‘+1,l are the l th bivariate data realized from random variables Θ j’ and Θ j’+1. The normalization factor Pr(D) can be computed using the sum rule (Jaynes and Bretthorst 2003). This study employs four representative Copulas, i.e. Clayton, Gaussian, Frank, and Gumbel, for the bivariate statistical modeling.
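A sketch of the selection for one pair, restricted to two of the four Copulas (Clayton and Gaussian) and to positive dependence so that each Copula parameter maps cleanly from τ; the data are synthetic and the τ-integral of Eq. (13) is taken on a grid:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
# Synthetic bivariate data D in the standard uniform space (drawn from a
# Gaussian Copula with rho = 0.8, purely for illustration)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=60)
u, v = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])

def clayton_logpdf(u, v, tau):
    th = 2.0 * tau / (1.0 - tau)              # Kendall's tau -> Clayton theta
    a, b = -th * np.log(u), -th * np.log(v)
    mx = np.maximum(a, b)
    # log(u^-th + v^-th - 1), computed stably in the log domain
    log_s = mx + np.log(np.exp(a - mx) + np.exp(b - mx) - np.exp(-mx))
    return np.log1p(th) - (1.0 + th) * np.log(u * v) - (2.0 + 1.0 / th) * log_s

def gaussian_logpdf(u, v, tau):
    r = np.sin(np.pi * tau / 2.0)             # Kendall's tau -> Pearson rho
    x, y = norm.ppf(u), norm.ppf(v)
    return (2.0 * r * x * y - r**2 * (x**2 + y**2)) / (2.0 * (1.0 - r**2)) \
           - 0.5 * np.log(1.0 - r**2)

taus = np.linspace(0.01, 0.99, 99)            # positive-dependence grid
copulas = {"Clayton": clayton_logpdf, "Gaussian": gaussian_logpdf}
# log Pr(D | H_k, tau) from Eq. (14), evaluated on the tau grid
logliks = {k: np.array([f(u, v, t).sum() for t in taus]) for k, f in copulas.items()}
lmax = max(ll.max() for ll in logliks.values())
evidence = {k: np.exp(ll - lmax).sum() for k, ll in logliks.items()}  # Eq. (13)
total = sum(evidence.values())
posterior = {k: e / total for k, e in evidence.items()}               # Pr(H_k | D)
```

The common shift lmax keeps the products of densities from underflowing while leaving the posterior ratios intact; the Frank and Gumbel families would enter the same dictionary.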

3.2 Remarks on the random field modeling

With sufficient samples of the random field generated from the joint PDF modeling, existing random field characterization methods such as the POD method can readily be employed for the random field characterization using Eq. (4). Then, any advanced probability analysis method can be used to further perform probability analysis and engineering design. In addition, the bounds of the random field may be too wide due to insufficient field realizations. It is feasible to narrow the bounds of the random field by forcing the generation of the random field realizations to lie within a reasonable confidence interval (CI) (e.g. 95 %) in Step 3 of Section 3.1.

It is worth noting that a generated field realization could be bumpy due to the sample randomness from the Copula model. Unlike the Gaussian random field (Haran 2012), the proposed approach does not assume any explicit functional form of the covariance function, which is what produces the smooth realizations of a Gaussian field. Such smoothness of the random field may not hold in reality, as shown by machined surface roughness (Bhushan et al. 2012). However, if the random field realization is believed to be smooth, various smoothing filters (e.g. the Savitzky-Golay smoothing filter (Orfanidis 1996)) can be used to smooth out the bumpy realization. Furthermore, the Gaussian random field assumes that the field realizations are extracted from multivariate normal distributions. The proposed random field modeling approach eliminates this artificial assumption and hence is more general than the Gaussian approach.
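If smoothness is warranted, the Savitzky-Golay filter cited above fits a local least-squares polynomial inside a sliding window; the bumpy realization, window length, and polynomial order below are illustrative:

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(5)
x = np.linspace(0.0, 1.0, 100)
# A bumpy generated realization: smooth trend plus Copula-sampling noise
bumpy = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(100)

# 11-point window, cubic local polynomial (both are tuning choices)
smooth = savgol_filter(bumpy, window_length=11, polyorder=3)
```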

3.3 Accuracy quantification of the random field modeling

It is desirable to quantify the accuracy of the random field modeling for various reasons. For example, a good metric can provide an unbiased accuracy measure for random field models built with different approaches. In addition, the metric should indicate an accuracy level approaching 100 % when the approximate random field converges to the true random field as sufficient random field realizations become available. Due to the scarcity of research on random field modeling with insufficient realizations, however, such a metric is not readily available.

The cross entropy (Kullback and Leibler 1951), also called the Kullback–Leibler (KL) distance, was proposed to measure the similarity between a true PDF and an estimated PDF. The smaller the cross entropy, the more similar the approximate PDF is to the true PDF. This concept is adopted in this paper to quantify the accuracy of the random field modeling because the random field is essentially represented by a joint PDF. Let f(θ) and \( \widehat{f} \) (θ) denote the true and approximate joint PDFs, respectively, with regard to a random field realization θ. The cross entropy or KL distance is defined as

$$ \zeta \left(f,\widehat{f}\right)={\displaystyle \int f\left(\theta \right)}\cdot \ln \left[\frac{f\left(\theta \right)}{\widehat{f}\left(\theta \right)}\right]d\theta $$
(15)

Based on the concept of Shannon’s entropy, Eq. (15) computes the difference in the expected information between two distributions as

$$ \zeta \left(f,\widehat{f}\right)={E}_f\left[ \ln (f)\right]-{E}_f\left[ \ln \left(\widehat{f}\right)\right] $$
(16)

where the random field realization θ is omitted for clarity. It should be noted that ζ is not a physical distance between f and \( \widehat{f} \) in the common sense, since the KL distance is not symmetric in general, i.e., ζ(\( \widehat{f} \), f) ≠ ζ(f, \( \widehat{f} \)). Nevertheless, ζ is an information-theoretic distance measure between two different distributions.
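For two univariate normal PDFs the KL distance of Eq. (16) has a closed form, which makes the asymmetry concrete (the distributions below are arbitrary examples):

```python
import numpy as np

def kl_normal(m1, s1, m2, s2):
    """zeta(f, fhat) for f = N(m1, s1^2) and fhat = N(m2, s2^2):
    ln(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 s2^2) - 1/2."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2.0 * s2**2) - 0.5

z_fg = kl_normal(0.0, 1.0, 0.5, 1.2)   # zeta(f, fhat)
z_gf = kl_normal(0.5, 1.2, 0.0, 1.0)   # zeta(fhat, f): a different value
```

Both directions are non-negative and vanish only when the two PDFs coincide, but they generally disagree, which is why ζ is a similarity measure rather than a metric.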

4 Case study: a mathematical example of random field modeling

A mathematical example is used to demonstrate the accuracy of the proposed approach for random field modeling with insufficient realizations. The analytic random field is formulated as

$$ \theta =0.1{\displaystyle \sum_{r=1}^5\left[ \sin \left(\frac{K_r\pi x}{r}\right)+ \sin \left(\frac{K_r\pi \left(100-x\right)}{r}\right)\right]} $$
(17)

where K r  ~ Normal(2, 0.02²) and x denotes the measurement location, ranging from 1 to 100. Sufficient field realizations can be obtained by generating sufficient random samples of K r , as shown in Fig. 2. Convergence of the mean and variance of the random field was studied by increasing the number of realizations. It was found that the mean and variance of the random field converge to the true values with one hundred realizations (see Fig. 3a) and five hundred realizations (see Fig. 3b), respectively. Hence, it is concluded from this example that about five hundred realizations are required to accurately represent the random field using existing random field modeling approaches (e.g., the POD method in Eq. (4)).
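A sketch of generating realizations of the analytic field in Eq. (17), assuming the distribution of K r is Normal(2, 0.02²):

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.arange(1, 101)                    # measurement locations 1..100

def realize_field():
    """One realization of the analytic random field in Eq. (17)."""
    theta = np.zeros(x.size)
    for r in range(1, 6):
        K = rng.normal(2.0, 0.02)        # K_r ~ Normal(2, 0.02^2), assumed
        theta += np.sin(K * np.pi * x / r) + np.sin(K * np.pi * (100 - x) / r)
    return 0.1 * theta

fields = np.array([realize_field() for _ in range(500)])
mean_hat = fields.mean(axis=0)           # converges with ~100 realizations
var_hat = fields.var(axis=0)             # converges with ~500 realizations
```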

Fig. 2 One thousand random field realizations

Fig. 3 Convergence of the mean and variance of the random field

In this study, only five realizations (see Fig. 4) are assumed to be available for the random field modeling, to demonstrate the proposed approach with insufficient realizations. As described in Section 3.1.2, the joint PDF of the random field was obtained by determining the optimal Copulas between Θ j’ and Θ j’+1, where j′ ranges from 1 to 99. Figure 5 shows part of the results, where the first row illustrates the optimal marginal distributions of Θ j and the second row indicates the optimal Copulas between Θ j’ and Θ j’+1.

Fig. 4 Five random field realizations

Fig. 5 Joint PDF modeling of the random field

MCS was employed to generate sufficient random field realizations based on the Copula modeling of the random field, following Step 3 in Section 3.1. As noted in Fig. 6a, a generated realization is a little bumpy because of the sample randomness from the Copula model. To suppress this noise, the Savitzky-Golay smoothing filter was used to smooth the generated realization, as shown in Fig. 6a. The cross entropy ζ was calculated as 0.16, and a visual comparison between the approximate and actual random fields is shown in Fig. 6b, where only the lower and upper bounds of the actual random field are plotted. Given that the cross entropy ζ is relatively small and the trend of the sampled random field matches the actual one quite well, the proposed random field modeling method represents the actual random field reasonably well even with insufficient random field realizations. When insufficient field realizations are provided, it is desirable that the approximate random field have wider variability than the true one, since overestimating the variability is the conservative error. However, the proposed approach is able to generate the random field realizations within a given CI to reduce the large variability of the approximate random field, as shown in Fig. 6c. In addition, it is worth noting that the bounds of the approximate random field relative to the true one depend not only on the number of field realizations but also on their specific realization locations.

Fig. 6 Random field modeling with insufficient field realizations based on the Copula

A comparison study was further conducted by performing bivariate Copula modeling only between Θ 1 and Θ k , where k ranges from 2 to 100, and then generating sufficient random field realizations based on the conditional PDF f(θ k |θ 1) and initial random sample generation for the marginal distribution f(θ 1). In this scenario, the visual comparison between the approximate and actual random fields is very similar to Fig. 6b because the marginal distributions of Θ j are the same in both cases. However, the cross entropy ζ was calculated as 0.35, indicating a poorer representation of the actual random field due to the relatively poor joint PDF modeling. The results indicate that modeling the pairs with stronger statistical dependence helps produce more accurate random field realizations.

5 Case study: random field modeling for robust design of door misalignment reduction in a two-door refrigerator assembly process

Many consumers are concerned about the door misalignment of a multi-door refrigerator. The door misalignment is caused by parametric and field variability in the assembly process of the refrigerator. Once the field variability is modeled with random field variables, probability analysis and design can handle both random parameters and random fields in a parametric form (Xi et al. 2010). This study incorporates the proposed random field modeling into robust design optimization (RDO) to minimize the door misalignment in the presence of both random parameters and fields. The RDO problem can be formulated as

$$ \begin{array}{l}\mathrm{minimize}\kern0.9em {\mu}_y\left(\mathbf{d},\kern0.3em \boldsymbol{\upgamma} \right)+6{\sigma}_y\left(\mathbf{d},\kern0.3em \boldsymbol{\upgamma} \right)\\ {}\mathrm{subject}\kern0.3em \mathrm{t}\mathrm{o}\kern0.6em {G}_i\left(\mathbf{X},\kern0.3em \boldsymbol{\Theta} \left(\mathbf{V}\right)\right)\le 0,\kern0.3em i=1,\cdots, NC\\ {}\kern4.5em {\mathbf{d}}^{\mathbf{L}}\le \mathbf{d}\le {\mathbf{d}}^{\mathbf{U}},\kern1.6em \mathbf{d}\in {R}^{ND}\kern0.7em and\kern1.1em \mathbf{X}\in {R}^{NP}\\ {}\kern4.5em {\boldsymbol{\upgamma}}^{\mathbf{L}}\le \boldsymbol{\upgamma} \le {\boldsymbol{\upgamma}}^{\mathbf{U}},\kern1.5em \boldsymbol{\upgamma} \in {R}^{MD}\kern0.6em and\kern1.1em \boldsymbol{\Theta} \in {R}^M\end{array} $$

where μ y and σ y are the mean and standard deviation of the objective function, respectively; d is the design vector of random parameter variables; γ is the design vector of random field variables; G i is the i th constraint function; X is the random parameter vector; V is the random field variable vector; and NC, ND, NP, MD, and M are the numbers of constraints, random parameter design variables, random parameters, random field design variables, and random fields, respectively.
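The μ y + 6σ y objective is estimated by propagating the random inputs through the response. A toy sketch with a hypothetical quadratic misalignment response and MCS moment estimates (any of the DR/EDR/PCE methods cited earlier could replace the MCS):

```python
import numpy as np

rng = np.random.default_rng(7)

def objective(d, n_mc=20000):
    """mu_y + 6 sigma_y for a hypothetical misalignment response y(X),
    where the design vector d shifts the means of two random parameters."""
    X1 = rng.normal(d[0], 0.05, n_mc)
    X2 = rng.normal(d[1], 0.05, n_mc)
    y = (X1 - 1.0) ** 2 + 0.5 * (X2 - 2.0) ** 2   # toy response surface
    return y.mean() + 6.0 * y.std()

# Moving the design toward the response minimum shrinks mu_y + 6 sigma_y
f_far = objective([0.5, 1.0])
f_near = objective([1.0, 2.0])
```

An optimizer would iterate over d (and γ) to drive this quantity down subject to the constraints G i .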

5.1 Probability analysis of the door misalignment

Let us consider the assembly process of a two-door refrigerator, which comprises the main parts (front-L, inner case, hinge, freezer and refrigerator sides) shown in Fig. 7. Door misalignment of a two-door refrigerator characterizes the performance of the three-stage assembly process, which is outlined with its process inputs and outputs in Fig. 8. The objective of this study is to statistically predict and then minimize the door misalignment by analyzing the propagation of parametric and field variability through the three assembly stages: 1) insertion of the front-L component into the inner case on both the freezer and refrigerator sides, 2) the foaming process that increases the stiffness of the main frame of the two-door refrigerator, and 3) installation of the door hinge on the front-L component. These assembly processes produce the deformation field of the front-L component in a random manner. To demonstrate the effectiveness of the proposed random field modeling approach for probability analysis and design, this study conducts: 1) probability analysis of the door misalignment, and 2) robust design optimization of the door misalignment.

Fig. 7 A two-door refrigerator and its main parts used in the assembly process

Fig. 8 Assembly process of a two-door refrigerator

HyperWorks was used to build the insertion model shown in Fig. 9a. The simulation model, with 128,916 degrees of freedom, is mostly composed of Shell43 elements, except in the contact region between the front-L and the flange, where Targe170 and Conta173 elements were employed. The edge of the inner case and the bottom of the front-L were fixed, as indicated in Fig. 9a. ANSYS was then employed to perform the nonlinear contact analysis to identify the final balance position between the front-L and the flange.

Fig. 9

FE model of the insertion process and the input random parameter variables

As shown in Figs. 8 and 9b, eight random input parameter variables were defined for the freezer and refrigerator sides. Table 1 shows the baseline statistical properties of the random parameter variables for statistical calibration of the front-L deformation after the insertion process, where the subscripts R and F indicate the refrigerator and freezer sides, respectively. Given measured data for the random input parameters, their statistical properties were determined using maximum likelihood estimation (MLE) and goodness-of-fit tests. The means and standard deviations of the random input parameters were treated as design variables in the RDO problem. The deformation of the front-L (Y1) after the insertion process was represented by two discrete random fields, one each for the freezer and refrigerator sides, where each random field is composed of six measurement locations in two-dimensional space.

Table 1 Statistical properties of random parameter variables in the insertion process
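The marginal fitting step described above can be sketched in Python: each candidate distribution is fitted by MLE and the fits are ranked by the Kolmogorov-Smirnov goodness-of-fit statistic. The candidate family list, the simulated sample, and the selection rule are illustrative assumptions, not the exact procedure used in the study.

```python
import numpy as np
from scipy import stats

def fit_best_marginal(data, candidates=("norm", "lognorm", "weibull_min")):
    """Fit each candidate distribution by MLE and rank the fits by the
    Kolmogorov-Smirnov statistic (smaller means a better fit)."""
    best = None
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)                       # MLE parameter estimates
        ks_stat, p_val = stats.kstest(data, name, args=params)
        if best is None or ks_stat < best[1]:
            best = (name, ks_stat, p_val, params)
    return best

# Example with hypothetical thickness-like measurement data
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=0.05, size=200)
name, ks, p, params = fit_best_marginal(sample)
```

In practice one would also inspect the p-value of the goodness-of-fit test rather than relying on the statistic alone.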

The foaming process further deforms the random deformation field (Y1) of the front-L into a new random deformation field (Y2), as defined in Eq. (18), where Θfoaming is the front-L deformation caused by the foaming process. Realizations of Θfoaming were obtained by measuring the actual deformation before and after the foaming injection process and subtracting Y1 from Y2, per Eq. (18). The proposed Bayesian Copula approach was used to model the random field (Θfoaming) with nine realizations, as shown in Fig. 10. Given sufficient random field realizations sampled from the joint PDF of the random field, the POD method was then used to characterize the foaming deformation with six important field signatures and random field variables (V1R to V6R and V1F to V6F), as shown in Table 2. Their statistical properties were determined using maximum likelihood estimation (MLE) and goodness-of-fit tests. The probability analysis for the door misalignment thus involves the twelve random field variables defined in Table 2 and the eight random parameter variables defined in Table 1. Like the random parameter variables, the random field variables were also treated as design variables whose means and standard deviations were adjustable.

Fig. 10

Foaming deformation for both freezer and refrigerator sides

Table 2 Statistical properties of random field variables for the foaming deformation
$$ {Y}_2={Y}_1+{\varTheta}_{foaming} $$
(18)
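Given plentiful realizations sampled from the fitted joint PDF, the POD characterization can be sketched as a singular value decomposition of the centered snapshot matrix: the right singular vectors are the deterministic field signatures and the projection coefficients are the random field variables. The synthetic low-rank data and the 99% energy cutoff below are assumptions for illustration.

```python
import numpy as np

# Each row is one realization of the deformation field at n_locations points.
# Hypothetical low-rank synthetic data stand in for the sampled realizations.
rng = np.random.default_rng(1)
n_samples, n_locations = 1000, 12
fields = rng.standard_normal((n_samples, 3)) @ rng.standard_normal((3, n_locations))

mean_field = fields.mean(axis=0)
centered = fields - mean_field

# SVD of the centered snapshot matrix: rows of Vt are the POD signatures.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.99)) + 1   # modes capturing 99% of the energy
signatures = Vt[:k]                          # deterministic field signatures
coeffs = centered @ signatures.T             # random field variables V_1 .. V_k
```

The truncated expansion mean_field + coeffs @ signatures then reconstructs each realization to within the retained energy fraction.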

In the last stage, the hinges were installed on the deformed front-L (Y2) on both the freezer and refrigerator sides, as shown in Fig. 11, where Y3 is the horizontal distance from the hinge shaft to the hinge edge assembled with the deformed front-L (Y2). The difference in the distance Y3 between the freezer and refrigerator sides causes the door misalignment Y4 defined in Eq. (19).

Fig. 11

Side view of the hinge installation

$$ {Y}_4={Y}_{3F}-{Y}_{3R} $$
(19)

The horizontal distances (Y3F and Y3R) were calculated using Eqs. (20) and (21).

$$ {Y}_{3F}={X}_{5F} \cos \left(\alpha \left({Y}_{2F}\right)\right)-d\left({Y}_{2F}\right) $$
(20)
$$ {Y}_{3R}={X}_{5R} \cos \left(\alpha \left({Y}_{2R}\right)\right)-d\left({Y}_{2R}\right) $$
(21)

where X5 is the distance from the hinge shaft to the hinge edge, and α and d are the tilted hinge angle and hinge thickness, respectively, both of which are determined by the deformed front-L (Y2). Specifically, X5F ~ Weibull(52.9525, 105.7080) and X5R ~ Lognormal(3.9814, 0.0097), as determined by goodness-of-fit tests with sufficient sample measurement data.
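The propagation of Eqs. (19)-(21) can be sketched with plain Monte Carlo sampling. The Weibull (shape, scale) and lognormal (log-mean, log-sigma) parametrizations, the α(Y2) and d(Y2) functions, and the Y2 samples below are all hypothetical stand-ins; in the paper these quantities come from the assembly-process simulations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
N = 100_000

# Sample X5 on each side; the scipy parametrization chosen here
# (shape/scale for Weibull, log-mean/log-sigma for lognormal) is an assumption.
x5f = stats.weibull_min.rvs(c=52.9525, scale=105.7080, size=N, random_state=rng)
x5r = stats.lognorm.rvs(s=0.0097, scale=np.exp(3.9814), size=N, random_state=rng)

# Hypothetical stand-ins for the hinge tilt angle alpha(Y2) and thickness d(Y2)
def alpha(y2):
    return 0.01 * y2            # tilt angle in radians (illustrative)

def d(y2):
    return 2.0 + 0.05 * y2      # hinge thickness in mm (illustrative)

# Placeholder samples of the deformed front-L field on each side
y2f = rng.normal(1.0, 0.2, N)
y2r = rng.normal(1.0, 0.2, N)

y3f = x5f * np.cos(alpha(y2f)) - d(y2f)   # Eq. (20)
y3r = x5r * np.cos(alpha(y2r)) - d(y2r)   # Eq. (21)
y4 = y3f - y3r                            # Eq. (19): door misalignment
```

A histogram of y4 would then play the role of the MCS reference against which the EDR prediction is compared.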

In summary, executing the probability analysis of the door misalignment with a large number of random variables (ten random parameter variables and twelve random field variables) demands an efficient probability analysis method. The EDR method was used with 2N′ + 1 eigenvector samples (45 analyses) for the probability analysis of the door misalignment, where N′ is the total number of random variables, i.e., random parameter plus random field variables. As shown in Fig. 12, the proposed method for random field modeling and probability analysis accurately predicted the PDF of the door misalignment compared with the normalized histogram obtained from MCS.

Fig. 12

Prediction of the door misalignment considering the random field and parameter variables
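The 2N′ + 1 sample count can be sketched as follows. This simplified version places the samples along the coordinate axes rather than along the covariance eigenvector directions used by the full EDR method, and the ±3σ offsets are an assumption.

```python
import numpy as np

def edr_samples(mu, sigma, k=3.0):
    """Generate 2N'+1 axis samples: the mean point plus two samples at
    +/- k*sigma along each of the N' random-variable axes (a simplified
    sketch of the EDR sampling pattern)."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    pts = [mu.copy()]
    for i in range(mu.size):
        for sgn in (+1.0, -1.0):
            p = mu.copy()
            p[i] += sgn * k * sigma[i]
            pts.append(p)
    return np.array(pts)

# 22 random variables (10 parameter + 12 field) -> 2*22 + 1 = 45 analyses
pts = edr_samples(np.zeros(22), np.ones(22))
```

Each of the 45 points corresponds to one deterministic assembly-process analysis, from which the response moments are reconstructed.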

5.2 Robust design optimization of the door misalignment

This section presents the RDO problem (Youn et al. 2005) of minimizing the door misalignment and its variability in the presence of many random parameter and field variables in the refrigerator assembly process. The RDO problem was formulated to minimize the variability of the door misalignment (Y4) subject to an expense constraint ($30,000) on the process design by managing forty-four design variables, namely the means and standard deviations of the twenty-two random design variables.

$$ \begin{array}{l}\mathrm{minimize}\kern0.5em {\mu}_{y4}\left({\boldsymbol{\upmu}}_{\mathbf{X}},{\boldsymbol{\upsigma}}_{\mathbf{X}},{\boldsymbol{\upmu}}_{\mathbf{V}},{\boldsymbol{\upsigma}}_{\mathbf{V}}\right)+6{\sigma}_{y4}\left({\boldsymbol{\upmu}}_{\mathbf{X}},{\boldsymbol{\upsigma}}_{\mathbf{X}},{\boldsymbol{\upmu}}_{\mathbf{V}},{\boldsymbol{\upsigma}}_{\mathbf{V}}\right)\\ {}\mathrm{subject}\ \mathrm{t}\mathrm{o}\kern0.5em G\left({\boldsymbol{\upmu}}_{\mathbf{X}},{\boldsymbol{\upsigma}}_{\mathbf{X}},{\boldsymbol{\upmu}}_{\mathbf{V}},{\boldsymbol{\upsigma}}_{\mathbf{V}}\right)\le 30,000\\ {}\kern4.25em {{\boldsymbol{\upmu}}_{\mathbf{X}}}^{\mathbf{L}}\le {\boldsymbol{\upmu}}_{\mathbf{X}}\le {{\boldsymbol{\upmu}}_{\mathbf{X}}}^{\mathbf{U}},{{\boldsymbol{\upsigma}}_{\mathbf{X}}}^{\mathbf{L}}\le {\boldsymbol{\upsigma}}_{\mathbf{X}}\le {{\boldsymbol{\upsigma}}_{\mathbf{X}}}^{\mathbf{U}}\\ {}\kern5em {{\boldsymbol{\upmu}}_{\mathbf{V}}}^{\mathbf{L}}\le {\boldsymbol{\upmu}}_{\mathbf{V}}\le {{\boldsymbol{\upmu}}_{\mathbf{V}}}^{\mathbf{U}},{{\boldsymbol{\upsigma}}_{\mathbf{V}}}^{\mathbf{L}}\le {\boldsymbol{\upsigma}}_{\mathbf{V}}\le {{\boldsymbol{\upsigma}}_{\mathbf{V}}}^{\mathbf{U}}\end{array} $$

The expense function G was defined in Eq. (22) as the sum of four expense components for changing the means and standard deviations of the random parameter and field variables.

$$ G\left({\boldsymbol{\upmu}}_{\mathbf{X}},{\boldsymbol{\upsigma}}_{\mathbf{X}},{\boldsymbol{\upmu}}_{\mathbf{V}},{\boldsymbol{\upsigma}}_{\mathbf{V}}\right)={G}_1\left({\boldsymbol{\upmu}}_{\mathbf{X}}\right)+{G}_2\left({\boldsymbol{\upmu}}_{\mathbf{V}}\right)+{G}_3\left({\boldsymbol{\upsigma}}_{\mathbf{X}}\right)+{G}_4\left({\boldsymbol{\upsigma}}_{\mathbf{V}}\right) $$
(22)

It was assumed that $1,000 and $3,000 are the expenses of changing each mean value of a random parameter and a random field variable, respectively. G1(μX) is thus calculated as the product of $1,000 and the number of changed mean values of the random parameter variables, and G2(μV) as the product of $3,000 and the number of changed mean values of the random field variables. For example, $10,000 is required to change the means of all ten random parameter variables from the baseline design to the optimal design. The linear cost functions in Eqs. (23) and (24) were assumed to estimate the expenses of changing the standard deviations of the random parameter and field variables, respectively.

$$ {G}_3\left({\boldsymbol{\upsigma}}_{\mathbf{X}}\right)={\displaystyle \sum_{i=1}^{10}{R}_i\left({\sigma}_{X_i,0}/{\sigma}_{X_i, new}-1\right)} $$
(23)
$$ {G}_4\left({\boldsymbol{\upsigma}}_{\mathbf{V}}\right)={\displaystyle \sum_{j=1}^{12}{S}_j\left({\sigma}_{V_j,0}/{\sigma}_{V_j, new}-1\right)} $$
(24)

where σXi,0 and σXi,new denote the baseline and new standard deviations of the ith random parameter variable, respectively; σVj,0 and σVj,new denote the baseline and new standard deviations of the jth random field variable, respectively; and the coefficients Ri and Sj were assumed to be $2,000 and $6,000, respectively.
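Under the stated fee assumptions, the expense function of Eqs. (22)-(24) can be sketched directly; the function and argument names are illustrative.

```python
import numpy as np

def expense(mu_x, mu_v, sig_x, sig_v, mu_x0, mu_v0, sig_x0, sig_v0,
            c_mu_x=1_000.0, c_mu_v=3_000.0, R=2_000.0, S=6_000.0):
    """Expense G of Eq. (22): a fixed fee per changed mean value plus the
    linear penalties of Eqs. (23)-(24) for tightening standard deviations."""
    g1 = c_mu_x * np.count_nonzero(~np.isclose(mu_x, mu_x0))   # changed parameter means
    g2 = c_mu_v * np.count_nonzero(~np.isclose(mu_v, mu_v0))   # changed field means
    g3 = np.sum(R * (sig_x0 / sig_x - 1.0))                    # Eq. (23)
    g4 = np.sum(S * (sig_v0 / sig_v - 1.0))                    # Eq. (24)
    return g1 + g2 + g3 + g4
```

For instance, changing all ten random parameter means while leaving every standard deviation at its baseline costs $10,000, and halving one field variable's standard deviation adds S × (2 − 1) = $6,000.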

The EDR method was used to evaluate the mean and standard deviation of the door misalignment at each design iteration, with sequential quadratic programming as the optimizer in the RDO problem. The objective function was reduced from 9.837 mm to 3.614 mm within the $30,000 budget after eleven design iterations, as shown in Table 3. The PDFs of the door misalignment at the baseline and optimal designs are compared in Fig. 13.

Table 3 Design history of the objective function and expense constraint
Fig. 13

PDF comparison of the door misalignment at the initial and optimal designs

6 Conclusion

This paper addresses random field modeling with insufficient field realizations for probability analysis and design of engineering problems. Existing random field methods do not work when random field data are lacking. Random field modeling essentially becomes a process of identifying an n-dimensional joint probability density function (PDF) of the random field from insufficient random realizations. This paper therefore proposed a Bayesian Copula based random field modeling technique for probability analysis and robust design optimization (RDO). The proposed approach employs the Bayesian Copula for joint PDF modeling because Copulas can model bivariate distributions with various statistical dependence patterns. The joint PDF modeling is conducted in three steps: i) determination of the optimal marginal distributions, ii) determination of the optimal Copulas, and iii) joint PDF modeling of the random field. Bayesian statistics is employed to determine the optimal marginal distribution of the random field realizations at each measurement location. Given the optimal marginal distributions, this study uses the Bayesian approach for reliable determination of the optimal Copula. It is noted that the proposed approach can handle not only non-Gaussian random fields but also insufficient field realizations. By simulating the joint PDF of the random field, sufficient random field samples can be generated from limited field data. Given these samples, existing random field characterization methods such as the POD method can be used to characterize the random field as a function of a few random field variables and deterministic field signatures. Two examples, a mathematical problem and the refrigerator assembly problem, demonstrated that the proposed approach is effective for probability analysis and design while accounting for the insufficiency of random field realizations.
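As an illustration of steps i)-iii), the sketch below generates new field realizations from a handful of measurements using normal marginals coupled by a Gaussian copula built from the empirical rank correlation. This replaces the paper's Bayesian selection of marginal and Copula families with fixed, clearly labeled assumptions.

```python
import numpy as np
from scipy import stats

def gaussian_copula_samples(data, n_new, rng=None):
    """Generate new field realizations from limited data: fit a normal
    marginal at each measurement location (step i, simplified), couple them
    with a Gaussian copula from the empirical correlation of normal scores
    (step ii, simplified), and sample the resulting joint PDF (step iii)."""
    rng = np.random.default_rng(rng)
    n_obs, n_loc = data.shape
    # i) marginal fits (normal here; the paper selects the family via Bayes)
    mus = data.mean(axis=0)
    sds = data.std(axis=0, ddof=1)
    # ii) dependence: correlation matrix of the normal scores of the ranks
    u = (stats.rankdata(data, axis=0) - 0.5) / n_obs
    z = stats.norm.ppf(u)
    corr = np.corrcoef(z, rowvar=False)
    # iii) sample correlated normals and map through the fitted marginals
    zs = rng.multivariate_normal(np.zeros(n_loc), corr, size=n_new)
    us = stats.norm.cdf(zs)
    return stats.norm.ppf(us, loc=mus, scale=sds)

# Nine realizations at six locations, echoing the foaming example (synthetic)
rng = np.random.default_rng(3)
obs = rng.multivariate_normal(np.zeros(6), np.eye(6) * 0.04 + 0.01, size=9)
new = gaussian_copula_samples(obs, n_new=500, rng=4)
```

The enlarged sample set `new` is what the POD characterization would then consume.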

The proposed approach requires modeling n − 1 pairwise statistical dependencies of the random field realizations using two-dimensional Copulas, where n is the number of spatial measurement locations in the random field. As the number of measurement locations increases, modeling the joint PDF of a random field becomes computationally more expensive. Hence, the proposed approach is limited when the random field is characterized by a large number (n) of spatial measurement locations. This will be addressed in our future research.