1 Introduction

The objective of the Taguchi parameter design (Taguchi and Yokoyama 1993) is to determine optimal settings of design parameters such that the performance characteristics become robust to uncontrollable noise variables. It utilizes orthogonal arrays as experimental designs to study a large number of variables with a relatively small number of experimental runs. Once experimental data are collected, a performance measure called the signal-to-noise (SN) ratio is calculated for each run and used to determine the optimal settings of design parameters. The Taguchi method is traditionally used to deal with single-characteristic parameter design (SCPD) problems. Even though problems with multiple characteristics are more common in practice, they are much more difficult to solve than SCPD problems. In fact, settings that yield a higher SN ratio for one characteristic may yield a lower SN ratio for another. Consequently, an overall evaluation of multiple SN ratios is required for solving multi-characteristic parameter design (MCPD) problems.

Principal component analysis (PCA) has been widely used for solving MCPD problems. PCA is a dimensionality reduction technique that linearly transforms a number of possibly correlated variables into a small number of uncorrelated variables called principal components (Jolliffe 2002). For solving MCPD problems, Su and Tong (1997) applied PCA to multiple SN ratios and then selected the principal component scores corresponding to eigenvalues greater than 1 as aggregate performance measures. However, if more than one principal component score is selected, the problem still involves multiple performance measures rather than a single aggregated one. Moreover, the discarded principal component scores, corresponding to eigenvalues less than 1, might still contain useful information. To overcome these problems, Jean and Wang (2006) used all principal component scores to construct a single aggregate performance measure by taking the logarithm of the sum of all principal component scores. Liao (2006) proposed the weighted sum of principal component scores as a single aggregate performance measure, where the weight of each principal component score is the ratio of the variance of that principal component to the total variance of all principal components. Datta et al. (2009) proposed a single aggregate performance measure obtained by taking the geometric mean of principal component scores. Sibalija and Majstorovic (2009) performed grey relational analysis (GRA) (Wang et al. 1996), another popular dimensionality reduction method, on all principal component scores to construct a single aggregate performance measure.

The above PCA-based methods have shown good performance in solving MCPD problems by converting them to SCPD ones. However, if the data have a complicated structure that cannot be well represented in a linear subspace, the original PCA may not work well for MCPD problems. To address this problem, a kernel PCA-based method is proposed in this paper. The kernel PCA (Schölkopf et al. 1998), a generalization of the original PCA, allows nonlinear feature extraction using kernel methods. By employing the kernel PCA, an effective single aggregate performance measure can be constructed even if multiple performance characteristics are nonlinearly related to each other.

Even though this paper focuses on PCA-based approaches, it is worth mentioning other approaches to MCPD problems. The GRA was first employed by Lin and Tarng (1998) and has been applied to MCPD problems in various fields by Lin and Lin (2002), Tarng et al. (2002), Lin (2004), Tzeng et al. (2009), Jung and Kwon (2010) and Yang et al. (2014). Artificial intelligence techniques have also been used, including the artificial neural network (ANN) (Sukthomya and Tannock 2005; Tsao and Hocheng 2008), genetic algorithm (GA) (Forouraghi 2000; Jeyapaul et al. 2006; Yildiz et al. 2007), and fuzzy theory (Tong and Su 1997; Lin et al. 2000; Sharma et al. 2011). Some of these dimensionality reduction approaches have been combined into hybrid approaches, including a PCA-GRA approach by Sibalija and Majstorovic (2009), an ANN-Fuzzy approach by Antony et al. (2006), an ANN-PCA approach by Hsu (2001), a GRA-Fuzzy approach by Lin and Lin (2005), an ANN-GA approach by Huang and Tang (2006), and a PCA-GRA-ANN-GA approach by Sibalija and Majstorovic (2012).

The remainder of this paper is organized as follows. Section 2 introduces the PCA and kernel PCA. A brief account of the PCA is given in Sect. 2.1. A basic concept of the kernel function and the kernel PCA algorithm are introduced in Sects. 2.2 and 2.3, respectively. Then, the kernel PCA-based method for MCPD problems is proposed in Sect. 3. The proposed method is applied to simulated and real experimental data in Sects. 4 and 5, respectively. The paper concludes with a discussion in Sect. 6.

2 PCA and kernel PCA

2.1 PCA

Using orthogonal linear projection, PCA transforms a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The first principal component has the largest possible variance, the second principal component has the second largest variance not represented by the first principal component, and so on. The values of these new variables are called principal component scores, which are the projections of the observations onto the principal component space. By representing the data with only the first few principal components, the dimensionality of the data can be reduced.

PCA is performed as follows. Let \(\mathbb {R}^{p}\) be the space of \(p\)-dimensional real vectors. Suppose that \(\mathbf{x}_i =(x_{i1}, x_{i2},\ldots ,x_{ip} {)}'\in \mathbb {R}^{p}\) is centered as

$$\begin{aligned} \tilde{\mathbf{x}}_i =(x_{i1} -\bar{{x}}_1, x_{i2} -\bar{{x}}_2, \ldots , x_{ip} -\bar{{x}}_p {)}', i=1, \ldots , n \end{aligned}$$

where \(\bar{{x}}_j =\frac{1}{n}\mathop {\sum }\limits _{i=1}^n {x_{ij} } , j=1, \ldots , p\). A sample variance-covariance matrix of \(\tilde{\mathbf{x}}_i \) is defined as

$$\begin{aligned} {\hat{\varvec{\Sigma }}}_{\tilde{\mathbf{x}}} =\frac{1}{n-1}\sum _{i=1}^n {\tilde{\mathbf{x}}_i \tilde{\mathbf{x}}_i ^{\prime }}. \end{aligned}$$

Then, the eigenvalue problem for \({\hat{\varvec{\Sigma }}}_{\tilde{\mathbf{x}}} \) is constructed as follows.

$$\begin{aligned} {\hat{\varvec{\Sigma }}}_{\tilde{\mathbf{x}}} \mathbf{v}_j =\lambda _j \mathbf{v}_j , j=1, \ldots , p \end{aligned}$$
(1)

where \(\lambda _j \) and \(\mathbf{v}_j =(v_{1j} ,v_{2j} ,\ldots ,v_{pj})'\) are the \(j\)th largest eigenvalue and the corresponding eigenvector of \({\hat{\varvec{\Sigma }}}_{\tilde{\mathbf{x}}} \), respectively. \(\mathbf{v}_j \) is also called the \(j\)th principal component. Then, \(\tilde{\mathbf{x}}_i \) is transformed by orthogonal linear projection as follows.

$$\begin{aligned} t_{ij} =\mathbf{v}'_j \tilde{\mathbf{x}}_i =v_{1j} \tilde{x}_{i1} +v_{2j} \tilde{x}_{i2} +\cdots +v_{pj} \tilde{x}_{ip} , j=1, \ldots , p \end{aligned}$$

where \(t_{ij} \) is a linearly projected value on \(\mathbf{v}_j \) and is called a principal component score.
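For concreteness, a minimal numerical sketch of this procedure in Python/NumPy is given below; the function name and the use of `numpy.linalg.eigh` are illustrative choices, not part of the original formulation.

```python
import numpy as np

def pca_scores(X):
    """Principal components and scores via the eigenvalue problem in Eq. (1).

    X : (n, p) array of observations x_1', ..., x_n' stacked row-wise.
    """
    n, p = X.shape
    X_tilde = X - X.mean(axis=0)                # center each variable
    S = (X_tilde.T @ X_tilde) / (n - 1)         # sample variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    T = X_tilde @ eigvecs                       # scores t_ij = v_j' x~_i
    return eigvals, eigvecs, T
```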

When the data are geometrically distributed in an ellipsoidal form (e.g., as in a normal distribution), they can be easily represented by linear principal components. In practice, however, the data may have a complicated structure that cannot be explained well enough by conventional linear principal components. An example of this situation is illustrated in Fig. 1a, in which the first principal component of the data is indicated as a solid line. It appears that the first principal component does not adequately explain the major structure of the data. In other words, when the data have a complicated structure, the original PCA, which only allows linear feature extraction, may result in an unreasonable representation of the data. The kernel PCA is a nonlinear generalization of the original PCA. In the kernel PCA, the data in the original space are mapped into a higher dimensional feature space where the data can be linearly modeled (see Fig. 1b). Then, the original PCA can be performed in the feature space via the so-called kernel trick (see Sect. 2.2), and the linearly extracted principal components in the feature space can better explain the nonlinear structure of the data in the original space (see Fig. 1c).

Fig. 1 Basic idea of kernel PCA

2.2 Kernel functions and kernel tricks

If the data in the original input space are transformed into a potentially much higher-dimensional feature space through a nonlinear mapping, then nonlinear relations of the data in the original space may be discovered using linear learning algorithms in the feature space. Define such a nonlinear mapping \(\Phi \) as follows.

$$\begin{aligned} \Phi :\mathbb {R}^{p}\rightarrow F \\ \mathbf{x} \mapsto \Phi (\mathbf{x}) \end{aligned}$$

where \(F\) is the feature space, the dimension of which is higher than that of the original space, \(\mathbf{x}\) is the \(p\)-dimensional input data, and \(\Phi (\mathbf{x})\) is the mapped data by a nonlinear mapping function \(\Phi (\cdot )\).

In general, the feature space has a very high or even infinite dimension, and therefore, it is cumbersome to construct the mapping function \(\Phi (\cdot )\) and evaluate the mapped data. Fortunately, many linear algorithms including the PCA can be reformulated in such a way that the inner product arises naturally (Bishop 2006). The inner product in the feature space can be calculated directly as a function of the original input. Consider the following mapping.

$$\begin{aligned} \Phi :(x_1 ,x_2 )'\rightarrow (x_1^2 ,\sqrt{2}x_1 x_2 ,x_2^2 )' \end{aligned}$$

Then, the inner product in the feature space can be reformulated in terms of an algebraic expression in the original space as follows (Müller et al. 2001).

$$\begin{aligned} \Phi (\mathbf{x}{)}'\Phi (\mathbf{y})= & {} \left( {x_1^2 ,\sqrt{2}x_1 x_2 ,x_2^2 } \right) \left( {y_1^2 ,\sqrt{2}y_1 y_2 ,y_2^2 } \right) ^{\prime } \nonumber \\= & {} \left( {\left( {x_1 ,x_2 } \right) \left( {y_1 ,y_2 } \right) ^{\prime }} \right) ^{2} \nonumber \\= & {} \left( {\mathbf{{x}'y}} \right) ^{2}\nonumber \\= & {} k(\mathbf{x},\mathbf{y}) \end{aligned}$$
(2)

where \(k(\mathbf{x},\mathbf{y})\) is called a kernel function. The inner product of mapped data is replaced by the kernel function in the original space, and therefore, the inner product in the feature space can be calculated in the original space without performing the nonlinear mapping \(\Phi (\cdot )\). This procedure is called the kernel trick (Schölkopf and Smola 2002).
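The identity in Eq. (2) can be checked numerically; the small sketch below (with arbitrary example vectors) compares the explicit mapping with the kernel evaluated in the original space.

```python
import numpy as np

def phi(x):
    """Explicit feature map (x1^2, sqrt(2) x1 x2, x2^2)' for two-dimensional input."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
lhs = phi(x) @ phi(y)      # inner product computed in the feature space
rhs = (x @ y) ** 2         # kernel k(x, y) = (x'y)^2 computed in the original space
print(lhs, rhs)            # both evaluate to 1.0
```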

There are various kernel functions, and the performance of kernel-based algorithms depends on the type of kernel function. Commonly used kernel functions are summarized in Table 1. Mercer's theorem guarantees that there exists a mapping \(\Phi (\cdot )\) which satisfies \(k(\mathbf{x},\mathbf{y})=\Phi (\mathbf{x}{)}'\Phi (\mathbf{y})\) for such kernel functions (Müller et al. 2001).

Table 1 Common kernel functions

2.3 Kernel PCA

The kernel PCA algorithm proceeds as follows (Schölkopf et al. 1998). Suppose that \(\mathbf{x}_i \in \mathbb {R}^{p},\, i=1, \ldots , n ,\) are mapped into a feature space as \(\Phi (\mathbf{x}_i ),\, i=1, \ldots , n\). Assume further that the \(\Phi (\mathbf{x}_i )\)'s are centered. For notational convenience, the same notation \(\Phi (\mathbf{x}_i ),\, i=1, \ldots , n ,\) will be used to denote the \(\Phi (\mathbf{x}_i )\)'s after centering. Then, the following holds.

$$\begin{aligned} \sum _{i=1}^n {\Phi (\mathbf{x}_i )} =\mathbf{0} \end{aligned}$$
(3)

In addition, the sample variance-covariance matrix \({\hat{\varvec{\Sigma }}}_{\Phi (\mathbf{x})} \) in the feature space is defined as

$$\begin{aligned} {\hat{\varvec{\Sigma }}}_{\Phi (\mathbf{x})} =\frac{1}{n}\sum _{i=1}^n {\Phi (\mathbf{x}_i )\Phi (\mathbf{x}_i {)}'} \end{aligned}$$
(4)

In order to perform the PCA in the feature space, the eigenvalue \(\lambda \) and the corresponding eigenvector v for \({\hat{\varvec{\Sigma }}}_{\Phi (\mathbf{x})} \) are computed by solving the following eigenvalue problem:

$$\begin{aligned} {\hat{\varvec{\Sigma }}}_{\Phi (\mathbf{x})} \mathbf{v}=\lambda \mathbf{v} \end{aligned}$$
(5)

Since an eigenvector v lies in the span of \(\Phi (\mathbf{x}_1 ), \ldots ,\Phi (\mathbf{x}_n )\), the following holds.

$$\begin{aligned} \mathbf{v}=\sum _{l=1}^n {\alpha _l \Phi (\mathbf{x}_l )} \end{aligned}$$
(6)

Premultiplying both sides of Eq. (5) by \(\Phi (\mathbf{x}_k {)}'\) yields

$$\begin{aligned} \Phi (\mathbf{x}_k {)}'{\hat{\varvec{\Sigma }}}_{\Phi (\mathbf{x})} \mathbf{v}=\lambda \Phi (\mathbf{x}_k {)}'\mathbf{v}, k=1, \ldots , n \end{aligned}$$
(7)

Inserting Eqs. (4) and (6) into Eq. (7) yields

$$\begin{aligned} \mathbf{K}^{2}\varvec{\upalpha }=n\lambda \mathbf{K}\varvec{\upalpha } \end{aligned}$$
(8)

where \(\mathbf{K}\), which is called the gram matrix, is defined as

$$\begin{aligned} \mathbf{K}=\left[ {{\begin{array}{lll} {\Phi (\mathbf{x}_1 {)}'\Phi (\mathbf{x}_1 )}&{} \cdots &{} {\Phi (\mathbf{x}_1 {)}'\Phi (\mathbf{x}_n )} \\ \vdots &{} \ddots &{} \vdots \\ {\Phi (\mathbf{x}_n {)}'\Phi (\mathbf{x}_1 )}&{} \cdots &{} {\Phi (\mathbf{x}_n {)}'\Phi (\mathbf{x}_n )} \\ \end{array} }} \right] \end{aligned}$$
(9)

and \(\varvec{\upalpha }=(\alpha _1 , \ldots , \alpha _n {)}'\). Solving a generalized eigenvalue problem in Eq. (8) is equivalent to solving the following eigenvalue problem:

$$\begin{aligned} \mathbf{K}{{\varvec{\upalpha }} }=n\lambda \varvec{\upalpha } \end{aligned}$$
(10)

Since \(\mathbf{K}\) is positive semidefinite, there are \(n\) nonnegative eigenvalues and corresponding eigenvectors. Let \(n\lambda _k \) be the \(k\)th largest eigenvalue and \(\varvec{\upalpha }_k =(\alpha _{1k} , \ldots , \alpha _{nk} {)}'\) the corresponding eigenvector for \(k=1, \ldots , n\). Then, \(\varvec{\upalpha }_k \) is scaled so that the corresponding eigenvector \(\mathbf{v}_k \) in the feature space has unit norm, as follows.

$$\begin{aligned} 1={\mathbf{v}'}_k \mathbf{v}_k =\sum _{i,l=1}^n {\alpha _{ik} \alpha _{lk} } (\Phi (\mathbf{x}_i )'\Phi (\mathbf{x}_l ))={\varvec{\upalpha }'}_k \mathbf{K}\varvec{\upalpha }_k =n\lambda _k ({\varvec{\upalpha }'}_k \varvec{\upalpha }_k ) \end{aligned}$$

In order to extract the principal component score \(t_{ik} ,\, \Phi (\mathbf{x}_i )\) is projected onto the eigenvector \(\mathbf{v}_k \) as follows [see Eq. (6)].

$$\begin{aligned} t_{ik} =\mathbf{{v}'}_k \Phi (\mathbf{x}_i )=\sum _{l=1}^n {\alpha _{lk} \Phi (\mathbf{x}_l {)}'\Phi (\mathbf{x}_i )} =\sum _{l=1}^n {\alpha _{lk} \mathbf{K}(\mathbf{x}_l ,\mathbf{x}_i )} , k=1, \ldots , n \end{aligned}$$
(11)

where \(\mathbf{K}(\mathbf{x}_l ,\mathbf{x}_i )\) is the \((l,\, i)\)th element of the gram matrix, namely, \(\Phi (\mathbf{x}_l {)}'\Phi (\mathbf{x}_i )\). As shown in Eq. (2), \(\Phi (\mathbf{x}_l {)}'\Phi (\mathbf{x}_i )\) can be replaced by a kernel function \(k(\mathbf{x}_l ,\mathbf{x}_i )\) such as the one in Table 1, and therefore, the explicit form of \(\Phi (\mathbf{x}_i )\) is not required.

In the kernel PCA algorithm, it is assumed that the mapped data are centered as in Eq. (3). However, the explicit form of \(\Phi (\cdot )\) is unknown in practice, and therefore, the mean of the uncentered \(\Phi (\mathbf{x}_i )\)'s cannot be calculated. This implies that the matrix \(\mathbf{K}\) in Eq. (9), which is based on the centered \(\Phi (\mathbf{x}_i )\)'s, cannot be constructed directly. Fortunately, however, \(\mathbf{K}\) can be constructed as follows without actually centering the \(\Phi (\mathbf{x}_i )\)'s (Schölkopf et al. 1998).

$$\begin{aligned} \mathbf{K}=\mathbf{K}_u -\mathbf{1}_n \mathbf{K}_u -\mathbf{K}_u \mathbf{1}_n +\mathbf{1}_n \mathbf{K}_u \mathbf{1}_n \end{aligned}$$
(12)

where \(\mathbf{K}_u \) is the gram matrix with uncentered \(\Phi (\mathbf{x}_i )\)’s and its \((l,\, i)\)th element is \(k(\mathbf{x}_l ,\mathbf{x}_i ),\) and \((\mathbf{1}_n )_{ij} =1/n\).

Obtaining the principal component scores in the kernel PCA requires solving an eigenvalue problem similar to that in Eq. (1) for the original PCA. The only difference is that the kernel PCA deals with the gram matrix \(\mathbf{K}\) instead of the sample variance-covariance matrix \({\hat{\varvec{\Sigma }}}_{\tilde{\mathbf{x}}} \). The kernel PCA algorithm is applied to solve MCPD problems in Sect. 3.
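The following sketch implements the procedure of this section in Python/NumPy for an arbitrary kernel function; the function name and the tolerance used to discard near-zero eigenvalues are illustrative choices.

```python
import numpy as np

def kernel_pca_scores(X, kernel, tol=1e-10):
    """Kernel principal component scores following Eqs. (10)-(12).

    X      : (n, p) array of input data.
    kernel : function k(x, y) returning a scalar, e.g. lambda x, y: (x @ y) ** 2.
    Returns the eigenvalues lambda_k of the feature-space covariance and the scores T.
    """
    n = X.shape[0]
    Ku = np.array([[kernel(xl, xi) for xi in X] for xl in X])  # uncentered gram matrix
    one_n = np.full((n, n), 1.0 / n)
    K = Ku - one_n @ Ku - Ku @ one_n + one_n @ Ku @ one_n      # centering, Eq. (12)
    nlam, A = np.linalg.eigh(K)                                # eigenvalues of K are n*lambda_k
    order = np.argsort(nlam)[::-1]
    nlam, A = nlam[order], A[:, order]
    keep = nlam > tol                                          # ignore near-zero eigenvalues
    nlam, A = nlam[keep], A[:, keep]
    A = A / np.sqrt(nlam)                                      # so that alpha_k' K alpha_k = 1
    T = K @ A                                                  # scores t_ik, Eq. (11)
    return nlam / n, T
```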

3 Proposed method

A kernel PCA-based method is developed to deal with MCPD problems. The proposed method makes it possible to capture nonlinear relationships among multiple performance characteristics in constructing a single aggregate performance measure. The proposed method proceeds according to the following steps.

(1) Calculate the SN ratio for each performance characteristic

Let \(y\) be a performance characteristic. Then, the Taguchi SN ratio is calculated for each \(y\). Depending on the type of \(y\), the expected loss and the corresponding SN ratio are defined differently, as shown in Table 2 (Yum et al. 2013).

Table 2 Expected loss and SN ratio for each type of performance characteristic

For \(y\) of the NB type, it is assumed that the mean is adjusted to the target using an adjustment parameter, and therefore, the expected loss after adjustment is considered. Note from Table 2 that maximizing an SN ratio is equivalent to minimizing the corresponding expected loss (after adjustment when applicable).
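As a sketch of this step, the function below estimates the SN ratio from the replicate observations of one run. The NB and SB formulas are consistent with the worked values in Sect. 4; the LB formula is the standard Taguchi form and is stated here as an assumption, since Table 2 is not reproduced.

```python
import numpy as np

def sn_ratio(y, ctype):
    """Estimated SN ratio for one experimental run from replicate observations y."""
    y = np.asarray(y, dtype=float)
    if ctype == "NB":   # nominal-the-best (after adjustment): -10 log10(s^2 / ybar^2)
        return -10 * np.log10(y.var(ddof=1) / y.mean() ** 2)
    if ctype == "SB":   # smaller-the-better: -10 log10(mean(y^2))
        return -10 * np.log10(np.mean(y ** 2))
    if ctype == "LB":   # larger-the-better (standard Taguchi form, assumed here)
        return -10 * np.log10(np.mean(1.0 / y ** 2))
    raise ValueError("ctype must be 'NB', 'SB', or 'LB'")
```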

(2) Standardize the SN ratio

For each performance characteristic, the estimated SN ratio is standardized as follows.

$$\begin{aligned} SN_{ij}^N =\frac{SN_{ij} -\overline{SN}_j }{SN_j^s } \end{aligned}$$
(13)

where \(SN_{ij} \) is the SN ratio at the \(i\)th experimental run for the \(j\)th performance characteristic for \(i=1,\ldots ,n\) and \(j=1,\ldots ,p ,\) and \(\overline{SN}_j \) and \(SN_j^s \) are respectively the mean and standard deviation of \(SN_{ij} \)’s for the \(j\)th performance characteristic.

(3) Perform the kernel PCA on the standardized SN ratios

Let \(\mathbf{x}_i =(SN_{i1}^N , SN_{i2}^N , \ldots , SN_{ip}^N {)}'\) for \(i=1, \ldots , n.\) Then, the matrix \(\mathbf{K}_u \) is constructed using a selected kernel function, and subsequently \(\mathbf{K}\) is constructed using Eq. (12). Any of the commonly used kernel functions in Table 1 can be chosen. The eigenvectors \(\varvec{\upalpha }_k =(\alpha _{1k} , \ldots , \alpha _{nk} {)}',\, k=1, \ldots ,n ,\) of \(\mathbf{K}\) are calculated by solving the eigenvalue problem in Eq. (10). The principal component scores \(t_{ik} ,\, k=1, \ldots ,n ,\) are then calculated using Eq. (11), and combined to form an aggregate performance measure (APM) as follows.

$$\begin{aligned} APM_i =\sum _{k=1}^n {w_k t_{ik} } \end{aligned}$$
(14)

where \(APM_i \) is the value of the aggregate performance measure at the \(i\)th experimental run, and \(w_k \) is the proportion of variance explained by the eigenvector corresponding to the \(k\)th largest eigenvalue of \(\mathbf{K}\). That is,

$$\begin{aligned} w_k =\frac{n\lambda _k }{\sum _{k=1}^n {n\lambda _k } }=\frac{\lambda _k }{\sum _{k=1}^n {\lambda _k } } \end{aligned}$$

where \(\lambda _k \) is the \(k\)th largest eigenvalue of \({\hat{\varvec{\Sigma }}}_{\Phi (\mathbf{x})} \).

If some \(\lambda _k \)’s are small (e.g., see the simulated example in Sect. 4), the contributions of the corresponding \(t_{ik} \)’s to the \(APM_i \) are negligible, and therefore, the corresponding \(t_{ik} \)’s can be ignored when calculating the \(APM_i\) for computational convenience. Also note that for \(y\) of the NB type, the adjustment is considered when the corresponding SN ratio is calculated before the kernel PCA is performed on the standardized SN ratios.
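Combining steps (2) and (3), the sketch below standardizes the SN ratios, applies the kernel PCA, and forms the APM of Eq. (14); it assumes the `kernel_pca_scores` function from the sketch in Sect. 2.3 is available and uses the sample standard deviation in Eq. (13).

```python
import numpy as np

def aggregate_performance_measure(SN, kernel):
    """APM_i computed from the (n runs x p characteristics) matrix of estimated SN ratios."""
    # step (2): standardize each characteristic's SN ratios, Eq. (13)
    SN_std = (SN - SN.mean(axis=0)) / SN.std(axis=0, ddof=1)
    # step (3): kernel PCA on the standardized SN ratios
    lam, T = kernel_pca_scores(SN_std, kernel)  # lambda_k and scores t_ik (Sect. 2.3 sketch)
    w = lam / lam.sum()                         # w_k: proportion of variance explained
    return T @ w                                # APM_i = sum_k w_k t_ik, Eq. (14)
```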

(4) Determine the optimal levels of design parameters

The \(APM_i \) can be statistically analyzed (e.g., by analysis of variance) to identify those design parameters that have a significant effect on it. The optimal level of such a significant parameter is chosen as the one at which the \(APM_i \) is maximized. The level of a non-significant parameter may be selected based on non-statistical factors such as ease of operation, the cost of maintaining the level, etc. If such non-statistical information is not available, or if the levels of a design parameter are indifferent with respect to those non-statistical factors, then the optimal level of a design parameter may simply be chosen as the one at which the APM is maximized (e.g., see the simulated example in Sect. 4).
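As a sketch of this step (omitting the ANOVA screening of significant parameters), the function below selects, for each design parameter, the level with the largest mean APM; `design` is assumed to hold the level indices of the orthogonal array.

```python
import numpy as np

def optimal_levels(design, apm):
    """For each design parameter (column of design), pick the level maximizing the mean APM."""
    best = {}
    for j in range(design.shape[1]):
        levels = np.unique(design[:, j])
        level_means = [apm[design[:, j] == lev].mean() for lev in levels]
        best[j] = levels[int(np.argmax(level_means))]   # level with the largest mean APM
    return best
```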

4 Simulated example

The proposed method is applied to a simulated example based on the \(L_{18}\) orthogonal array experimental design. The homogeneous polynomial kernel is employed since it allows sufficient flexibility in capturing nonlinear structures in the data (Ben-Hur and Weston 2010). Note also that the kernel PCA employing the homogeneous polynomial kernel with degree \(d = 1\) reduces to the original PCA. Although the simulated examples in this section adopt the homogeneous polynomial kernel, other kernel functions could also be employed.

Suppose that there are two performance characteristics \(y_j ,\, j=1, 2\), of which \(y_1 \) is an NB type and \(y_2 \) is an SB type. Experimental data are generated using the \(L_{18} \) orthogonal array for six design parameters (A, B, C, D, E, and F) with three levels each. Interaction effects among design parameters are assumed to be negligible. Let \(y_{1,ijklmn} \) be the observation of \(y_1 \) when the design parameters A, B, C, D, E, and F are at the levels \(i,\, j,\, k,\, l,\, m\), and \(n\) (i.e., \(\hbox {A}_i \hbox {B}_j \hbox {C}_k \hbox {D}_l \hbox {E}_m \hbox {F}_n )\), respectively. Then, \(y_{1,ijklmn} \) is generated from a normal distribution as follows.

$$\begin{aligned} y_{1,ijklmn} \sim N(\mu _{1,ijklmn} , \sigma _{1,ijklmn}^2 ) \end{aligned}$$
(15)

where \(\mu _{1,ijklmn} \) and \(\sigma _{1,ijklmn} \) are the true mean and standard deviation of \(y_1 \), respectively, at \(\hbox {A}_i \hbox {B}_j \hbox {C}_k \hbox {D}_l \hbox {E}_m \hbox {F}_n \). The Taguchi SN ratio for the NB type performance characteristic depends on \(\sigma /\mu \) (see Table 2), and therefore, in generating simulated data for \(y_1 ,\, (\sigma _1 /\mu _1 )_{ijklmn} \) and \(\mu _{1,ijklmn} \) are specified, and \(\sigma _{1,ijklmn} \) is determined using the following relationship.

$$\begin{aligned} \sigma _{1,ijklmn} =(\sigma _1 /\mu _1 )_{ijklmn} \times \mu _{1,ijklmn} \end{aligned}$$

When the generated \(y_{1,ijklmn} \) is negative, it is replaced with 0. Similarly, let \(y_{2,ijklmn} \) be the observation of \(y_2 \) at \(\hbox {A}_i \hbox {B}_j \hbox {C}_k \hbox {D}_l \hbox {E}_m \hbox {F}_n\). Then, \(y_{2,ijklmn} \) is generated from a normal distribution as follows.

$$\begin{aligned} y_{2,ijklmn} \sim N(\mu _{2,ijklmn} , \sigma _{2,ijklmn}^2 ) \end{aligned}$$
(16)

where \(\mu _{2,ijklmn} \) and \(\sigma _{2,ijklmn} \) are the true mean and standard deviation of \(y_{2,ijklmn} \), respectively, at \(\hbox {A}_i \hbox {B}_j \hbox {C}_k \hbox {D}_l \hbox {E}_m \hbox {F}_n\). When the generated \(y_{2,ijklmn} \) is negative, it is replaced with 0.

The effects of design parameters on \(\sigma _1 /\mu _1 ,\, \mu _1 ,\, \sigma _2 \) and \(\mu _2 \) are assumed to be additive. For example, \(\sigma _1 /\mu _1 \) at \(\hbox {A}_i \hbox {B}_j \hbox {C}_k \hbox {D}_l \hbox {E}_m \hbox {F}_n \) is assumed as

$$\begin{aligned} (\sigma _1 /\mu _1 )_{ijklmn} =m+a_i +b_j +c_k +d_l +e_m +f_n ,\quad i,j,k,l,m,n=1,2,3 \end{aligned}$$
(17)

where \(m\) is the overall mean, and \(a_i ,\, b_j ,\, c_k ,\, d_l ,\, e_m \), and \(f_n \) denote the effects of respective design parameters on \((\sigma _1 /\mu _1 )_{ijklmn} \) at \(\hbox {A}_i \hbox {B}_j \hbox {C}_k \hbox {D}_l \hbox {E}_m \hbox {F}_n \), and it is assumed that \(\sum _{i=1}^3 {a_i } =\sum _{j=1}^3 {b_j } =\sum _{k=1}^3 {c_k } =\sum _{l=1}^3 {d_l } =\sum _{m=1}^3 {e_m } =\sum _{n=1}^3 {f_n } =0\). Similarly, \(\mu _1 ,\, \sigma _2 \), and \(\mu _2 \) at \(\hbox {A}_i \hbox {B}_j \hbox {C}_k \hbox {D}_l \hbox {E}_m \hbox {F}_n \) are modeled in terms of the effects of design parameters as in Eq. (17). The effects of each design parameter on \(\sigma _1 /\mu _1 \) and \(\mu _1 \) for \(y_1 \), and on \(\sigma _2 \) and \(\mu _2 \) for \(y_2 \) are summarized in Tables 3a, b, respectively. ‘–’ indicates that the corresponding design parameter has no effect. Then, the true means and standard deviations at each run are determined as shown in Table 4.

Table 3 Effects of design parameters on (a) \(\sigma _1 /\mu _1 \) and \(\mu _1 \) of \(y_1 \), (b) \(\sigma _2 \) and \(\mu _2 \) of \(y_2 \)
Table 4 True means and standard deviations of simulated data
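To make the generation scheme concrete, the sketch below draws the replicates of \(y_1\) at one run according to Eqs. (15) and (17); the effect values would be taken from Table 3 (not reproduced here), and the seed and replicate count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2023)   # illustrative seed

def generate_y1_run(levels, m_ratio, ratio_effects, m_mu, mu_effects, n_rep=9):
    """Draw n_rep replicates of y1 at one run (levels: 0-based indices of A..F)."""
    # additive models as in Eq. (17): overall mean plus the effect of each parameter level
    ratio = m_ratio + sum(eff[lev] for eff, lev in zip(ratio_effects, levels))  # sigma1/mu1
    mu = m_mu + sum(eff[lev] for eff, lev in zip(mu_effects, levels))           # mu1
    sigma = ratio * mu                                                          # sigma1
    y = rng.normal(mu, sigma, size=n_rep)                                       # Eq. (15)
    return np.where(y < 0, 0.0, y)    # negative values are replaced with 0
```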

Using \(\mu \)’s and \(\sigma \)’s in Table 4, the true SN ratios can be calculated based on the formulas in Table 2. Figure 2 shows a scatter plot of the true standardized SN ratios of the two performance characteristics. It is observed that the standardized SN ratios do not have an ellipsoidal structure, which means that the structure of the simulated data may not be explained well by the principal components of the original PCA.

Fig. 2 Scatter plot of standardized SN ratios of simulated data

To evaluate the performance of the proposed method, a total of 10 simulated datasets are generated. Each dataset consists of nine replicates of \(y_1 \) and of \(y_2 \) at each experimental run, generated by Eqs. (15) and (16), respectively, using the parameter values in Table 4. This section illustrates the procedure of the proposed method with one dataset, which is shown in Tables 5a, b for \(y_1\) and \(y_2\), respectively. First, the sample mean, sample standard deviation, SN ratio, and standardized SN ratio for each performance characteristic at each run are calculated as in Table 6, where the SN ratios are estimated based on the formulas in Table 2 and then standardized using Eq. (13).

Table 5 Nine replicates of (a) \(y_1\) and (b) \(y_2\) in a dataset
Table 6 Sample means, sample standard deviations, estimated SN ratios, and standardized SN ratios of simulated data

Next, the kernel PCA is performed on the standardized SN ratios. The homogeneous polynomial kernel of degree \(d\) is chosen, and three cases where \(d = 1, 2\), and 3 are considered. When \(d = 1\), the kernel PCA reduces to the original PCA. When \(d = 2\) and \(d = 3\), the data structure can be explained by quadratic and cubic polynomial models, respectively. For each case, the matrix \(\mathbf{K}_u \) is constructed using the homogeneous polynomial kernel in Table 1, and the matrix \(\mathbf{K}\) is constructed using Eq. (12). Then, the eigenvalue problem for \(\mathbf{K}\) is solved to find the eigenvalue \(\lambda _k \) and the corresponding eigenvector \(\varvec{\upalpha }_k \) for \(k=1,\ldots ,n\). A total of eighteen eigenvalues and corresponding eigenvectors are obtained since \(\mathbf{K}\) is an \(18\times 18\) matrix. However, the eigenvalues that are close to zero (less than \(10^{-10}\) in this example) and the corresponding eigenvectors are ignored for computational convenience since their contributions to the APM in Eq. (14) are negligible. Table 7 shows the computational results.

Table 7 Gram matrix \(\mathbf{K}\) and the solution of eigenvalue problem with respect to the degree of the homogeneous polynomial kernel

Now the standardized SN ratios are converted into principal component scores using Eq. (11), which are then combined into the aggregate performance measure APM using Eq. (14). Table 8 shows the resulting principal component scores and APM values. ANOVA is performed on the APM to identify statistically significant parameters. A design parameter whose \(F\) ratio is larger than two is considered statistically significant (Phadke 1989). When \(d = 1\) or 3, the design parameter E is identified as significant, while the design parameters A, C, and E are identified as significant when \(d = 2\). To determine the optimal levels of the design parameters, main-effect plots are constructed. Figure 3 shows the main-effect plots for the APM with respect to the degree \(d\) of the homogeneous polynomial kernel. It is observed that the optimal levels of the design parameters for \(d = 1, 2\), and 3 are \(\hbox {A}_1 \hbox {B}_1 \hbox {C}_2 \hbox {D}_2 \hbox {E}_1 \hbox {F}_1 ,\, \hbox {A}_1 \hbox {B}_1 \hbox {C}_2 \hbox {D}_1 \hbox {E}_1 \hbox {F}_3 \), and \(\hbox {A}_1 \hbox {B}_3 \hbox {C}_2 \hbox {D}_2 \hbox {E}_1 \hbox {F}_3 \), respectively.

Fig. 3 The means of APM at each level of design parameters with respect to the degree of the homogeneous polynomial kernel

Table 8 Principal component scores \(t\) and APM of the simulated dataset with respect to the degree of the homogeneous polynomial kernel

The SN ratio for each performance characteristic at the optimal levels is calculated based on the formulas in Table 2 to see whether the proposed method performs effectively for each performance characteristic. For example, for \(d = 1\), the SN ratio for each performance characteristic is calculated as follows.

$$\begin{aligned} \frac{\sigma _1 }{\mu _1 }= & {} 0.6-0.1-0.1-0.1=0.3\hbox { from Table 3(a)}, \\ \mu _2= & {} 10+2+2.5=14.5\hbox { and } \sigma _2 =0.6-0.05+0.1=0.65\hbox { from Table 3(b),} \\ SN_1= & {} -10\log \left( {\frac{\sigma _1^2 }{\mu _1^2 }} \right) =-10\log \left( {0.3^{2}} \right) \approx 10.4576, \\ SN_2= & {} -10\log \left[ {E\left( {y^{2}} \right) } \right] =-10\log \left( {\mu _2^2 +\sigma _2^2 } \right) =-10\log \left( {14.5^{2}+0.65^{2}} \right) \approx -23.2361. \end{aligned}$$

Similarly, \(SN_1 \) and \(SN_2 \) at the optimal levels for \(d = 2\) are calculated as 9.1186 and -22.2974, respectively, and for \(d = 3\) as 12.0412 and -22.2933, respectively.

The proposed method is then applied to the remaining nine datasets using the same procedure as above. Table 9 shows the separate SN ratios for each performance characteristic at the optimal conditions. The largest SN ratio is italicized in each trial. The kernel PCA produces the largest SN ratios for both performance characteristics in trials 1, 2, 4, 6, 7, 9, and 10, while the original PCA never produces the largest SN ratios for both characteristics simultaneously.

Table 9 Separate SN ratios for each performance characteristic at the optimal conditions with respect to the degree of the homogeneous polynomial kernel for \(L_{18} \) case

The proposed method is also applied to the case of an \(L_9\) orthogonal array simulated experimental design under the same assumptions on the performance characteristics as in the \(L_{18}\) case. Table 10 shows the results for the \(L_9\) case. The largest SN ratio is italicized in each trial. It is observed that the kernel PCA-based method usually leads to larger SN ratios for each characteristic than the original PCA-based method.

Table 10 Separate SN ratios for each performance characteristic at the optimal conditions with respect to the degree of the homogeneous polynomial kernel for \(L_{9}\) case

5 Low-pressure cold spray process example

The proposed method is also applied to the analysis of low-pressure cold spray (LPCS) process data (Goyal et al. 2013). The experiments were performed to determine the optimal levels of LPCS parameters with respect to three performance characteristics: coating thickness \((y_1)\) and coating density \((y_2)\), both of the LB type, and surface roughness \((y_3)\), of the SB type. Five LPCS parameters were considered using the \(L_{18} \) orthogonal array.

Figure 4 shows a scatter plot of standardized SN ratios of three performance characteristics. The estimated SN ratios for each type are standardized using Eq. (13). It is observed that the standardized SN ratios do not have an ellipsoidal structure, which means that the structure of the LPCS data may not be explained well enough by the principal components of the original PCA.

Fig. 4 Scatter plot of standardized SN ratios of LPCS data

The homogeneous polynomial kernel is chosen, and the optimal levels of the design parameters are determined using the proposed method as illustrated in Sect. 4. Table 11 shows the eigenvalue \(\lambda _k \) and the corresponding eigenvector \(\varvec{\upalpha }_k \) obtained by solving the eigenvalue problem for the gram matrix \(\mathbf{K}\) in the LPCS process example. The eigenvalues that are close to zero (less than \(10^{-10}\) in this example) and the corresponding eigenvectors are ignored for computational convenience.

Table 11 Gram matrix \(\mathbf{K}\) and the solution of eigenvalue problem with respect to the degree of the homogeneous polynomial kernel for LPCS process example

Then, the principal component scores are calculated using Eq. (11), and they are combined into the aggregate performance measure APM using Eq. (14). Table 12 shows the APM values for each degree \(d\) of the homogeneous polynomial kernel.

Table 12 APM of the LPCS process data with respect to the degree of the homogeneous polynomial kernel

The optimal levels of the design parameters are determined based on the main-effect plots of the APM, as in Fig. 3. Since, for a reported case, it is in general impossible to evaluate the performance of the method in terms of the true parameter values, the SN ratio for each performance characteristic at the optimal condition is predicted instead. To predict the SN ratio for each performance characteristic, an additive model is used. For example, suppose that the optimal levels of the design parameters \(\hbox {A} \sim \hbox {E}\) are identified as \(\hbox {A}_i \hbox {B}_j \hbox {C}_k \hbox {D}_l \hbox {E}_m \), and the design parameters A and B have a significant effect on the SN ratio of a performance characteristic. Then, the SN ratio of this performance characteristic at \(\hbox {A}_i \hbox {B}_j \hbox {C}_k \hbox {D}_l \hbox {E}_m \) is predicted as follows.

$$\begin{aligned} \widehat{SN}_{opt} =\overline{SN} +\left( {\overline{SN}_{A_i } -\overline{SN} } \right) +\left( {\overline{SN}_{B_j } -\overline{SN} } \right) \end{aligned}$$

where \(\overline{SN} \) is the overall mean of the SN ratios, and \(\overline{SN}_{A_i} \) and \(\overline{SN}_{B_j } \) are the means of the SN ratios when the design parameters A and B are at levels \(i\) and \(j\), respectively.
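A small sketch of this additive prediction is given below; `design` and `sn` are assumed to hold the level indices and the estimated SN ratios over the experimental runs, and only the significant parameters contribute correction terms.

```python
import numpy as np

def predict_sn_optimal(sn, design, significant, opt_levels):
    """Additive-model prediction of the SN ratio at the optimal condition."""
    grand_mean = sn.mean()                                   # overall mean of the SN ratios
    pred = grand_mean
    for j in significant:                                    # significant design parameters only
        lev = opt_levels[j]
        pred += sn[design[:, j] == lev].mean() - grand_mean  # (mean at optimal level) - (overall mean)
    return pred
```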

The significance of the design parameters is identified by checking the F ratio from the analysis of variance (ANOVA). In general, if the F ratio is less than two, the corresponding effect is considered insignificant. An F ratio larger than two indicates that the corresponding effect is not negligible, whereas an F ratio larger than four means that the corresponding effect is statistically significant (Phadke 1989). In this paper, a design parameter whose F ratio is larger than two is considered statistically significant.

Table 13 shows the predicted SN ratio for each performance characteristic at the optimal condition. The ANOVA results indicate that all design parameters \(\hbox {A} \sim \hbox {E}\) have a significant effect on each SN ratio, and therefore, all of their effects are reflected in predicting the SN ratios. The largest predicted SN ratio is italicized for each degree \(d\) of the homogeneous polynomial kernel.

Table 13 Predicted SN ratios for each performance characteristic at the optimal levels of LPCS parameters with respect to the degree of the homogeneous polynomial kernel

In Table 13, the original PCA \((d = 1)\) produces a larger predicted SN ratio for the first performance characteristic, while the kernel PCA \((d = 2, 3)\) yields larger predicted SN ratios for the second and third performance characteristics. Although neither the original PCA nor the kernel PCA produces consistently larger SN ratios for all performance characteristics, the kernel PCA-based method would be preferred unless the importance of the first performance characteristic is substantially higher than the others.

6 Conclusion

A kernel PCA-based method is developed to deal with MCPD problems. The proposed method makes it possible to capture possible nonlinear relationships among multiple performance characteristics in constructing a single aggregate performance measure, and is therefore more flexible than the existing PCA-based methods that only allow linear feature extraction.

Computational results for the problems with simulated data indicate that the kernel PCA-based method generally performs better than the original PCA-based method. Application of the proposed method to a real dataset also shows its potential for better performance than the original PCA.

The performance of the proposed method depends on the choice of a kernel function and its parameters. However, there is no universally accepted guideline on how to choose the proper kernel function (Lampert 2009). There are some heuristic approaches (e.g., see Hsu et al. 2003) for determining kernel parameters using so-called cross-validation. However, those approaches are usually applicable to large datasets and may not be adequate for the experimental data in MCPD problems. Further research needs to be conducted to provide guidelines for selecting kernel parameters as well as kernel functions in solving various MCPD problems.

The proposed method was applied to simulated and real datasets. For a more thorough evaluation, it is desired that the proposed method be tested and compared with other existing methods for MCPD problems with various experimental designs and with diverse types and numbers of performance characteristics.