1 Introduction

Semiconductor manufacturing involves highly complex and lengthy wafer fabrication processes with 300–500 process steps and a large number of interrelated variables. As the feature sizes and critical dimensions of integrated circuits (ICs) continue to shrink into the nanoscale generations, maintaining process specifications and controlling quality become increasingly difficult in semiconductor manufacturing (Chien and Hsu 2011). Thus, process monitoring and profile analysis are critical for detecting abnormal events in wafer fabrication to enhance yield and control quality. Process monitoring and fault detection involve detecting abnormal processes and equipment, including mean and variance shifts in one or more process variables, spikes, and drifts, so that the assignable causes can be removed quickly. However, process measurements are often insufficient, incomplete, or unreliable because of poor data quality or various other causes (Venkatasubramanian et al. 2003).

Therefore, there is a critical need in the semiconductor industry to monitor processes effectively and efficiently in real time and to extract useful profile information to support process monitoring and fault detection. The extracted information helps engineers understand the process status and quickly remove abnormal behaviors and assignable causes. Good process monitoring leads to less downtime, improved production quality, and reduced manufacturing costs. With advanced information technology and metrology sensors, massive amounts of data, such as temperature, pressure, flow rate, and power, are routinely collected during wafer processing. The temporal patterns of these equipment parameters and process variables signal equipment behavior. Early detection and quick diagnosis of process faults are important to ensure tool effectiveness while process operation remains controllable and to reduce yield loss.

With the increasing demand for high-quality products and reliable processes, multivariate statistical process control (MSPC) is widely used in the manufacturing industry to ensure that equipment is “statistically controlled” by monitoring two or more related quality characteristics simultaneously (Montgomery 2005). Conventional univariate charts such as the Shewhart control chart, cumulative sum (CUSUM) control chart, and exponentially weighted moving average (EWMA) control chart have been used to monitor the deviation of key variables or parameter performance on the final product. However, univariate control charts have weaknesses. First, an engineer must monitor a large number of charts as processes grow in complexity. Second, univariate control charts have difficulty detecting changes in the correlation between variables. Third, the overall type I error increases with the number of charts. MSPC monitors correlated variables with few control charts. The Hotelling T² control chart is a popular MSPC method for detecting out-of-control signals. Although the T² control chart is useful and powerful, it assumes that variables are normally distributed and observations are independent, assumptions that rarely hold in practice.

Projection methods, which reduce the dimensionality of the process variables, are alternatives to MSPC. They include principal component analysis (PCA) and partial least squares (PLS) (Skagerberg et al. 1992; MacGregor et al. 1994; Kourti and MacGregor 1996; Ralson et al. 2001). Multi-way PCA (MPCA) integrates a time dimension into the PCA model and was developed for fault detection in batch process monitoring (Wold et al. 1987b; Nomikos and MacGregor 1994). Although MSPC methods can detect process deviations, real-time equipment monitoring is difficult because diagnosis procedures in real settings often rely on many manual actions by human operators.

To address the requirements of real settings, this study develops a manufacturing intelligence approach for semiconductor fault detection and classification (FDC). It uses MPCA and data mining approaches to monitor and diagnose the semiconductor fabrication process. With advanced information technology, data mining approaches have been used to explore large databases automatically or semi-automatically and to extract useful rules and patterns that improve decision quality (Chien et al. 2007). Manufacturing intelligence approaches have also been developed to derive decision rules that enhance operational efficiency and effectiveness (Chen and Chien 2011; Chien et al. 2010, 2011; Kuo et al. 2010). MSPC methods and data mining approaches can discover and extract information from historical process data and can assist fault diagnosis and recovery. MPCA is used to unfold the three-dimensional (3D) process data and reduce the dimensionality of the batch process data by rotating the original axes of the process data. A few principal components (PCs) are extracted to explain the maximal amount of variation. Control limits for the D statistic and the Q statistic are then constructed from the score space and residual space, respectively, to detect abnormal wafers. The set of process variables most relevant to the detected abnormal events is then identified. Finally, decision trees are used to derive fault classification rules. To validate the proposed framework, an empirical study was conducted on the Chemical Vapor Deposition (CVD) process in a leading semiconductor company in Taiwan. The proposed approach and the derived rules allow the results of routine monitoring and the information extracted from process data to be used to identify critical process variables and remove fault causes.

The remainder of this study is organized as follows. Section 2 reviews the relevant research on fault detection and classification and on MSPC approaches. Section 3 describes the proposed approach for semiconductor fault detection and classification. Section 4 presents an empirical study conducted in a leading semiconductor company to validate the proposed approach. Section 5 concludes the paper with a discussion of the findings and future research directions.

2 Literature review

The notations and terminologies used in this paper are as follows:

i : Wafer index
j : Process measurement variable index
k : Time-sample index within a wafer
n : Number of observations
I : Number of wafers
J : Number of process variables
K : Number of time samples recorded per wafer
\( \underline{\mathbf{X}} \) : Three-dimensional historical process data array
X : Unfolded two-dimensional data matrix
\( \underline{\mathbf{E}} \) : Three-dimensional residual array
E : Unfolded two-dimensional residual data matrix

2.1 Fault detection and classification

For real semiconductor fabrication facilities (fabs), process control and monitoring are necessary to ensure yield and, thus, the profitability of large investments. There are four process monitoring procedures: fault detection, fault identification, fault diagnosis, and process recovery (Raich and Cinar 1996). Fault detection determines whether a fault has occurred in the process; early detection gives engineers more time to respond appropriately and avoid serious equipment abnormalities. Fault identification identifies the main effects on the observed variables and concentrates on the process variables most relevant to diagnosing the abnormality. Fault diagnosis determines which fault has occurred, that is, the cause of the observed out-of-control status, including the type, location, magnitude, and time of the fault (Isermann 1995). Process recovery removes the cause of the fault to reduce yield loss.

MSPC is widely used for chemical, biotechnical, polymer, and pharmaceutical applications. MSPC methods such as PCA and PLS have been used in many applications. Skagerberg et al. (1992) used PLS to predict polymer properties from measured temperature profiles in a tubular low-density polyethylene reactor and to interpret process behavior. MacGregor et al. (1994) developed a multiblock PLS method to detect and diagnose process faults. Wikström et al. (1998) applied multivariate process monitoring charts, including the multivariate Shewhart, multivariate CUSUM, and multivariate EWMA control charts, to an electrolysis process. Ralson et al. (2001) developed a PCA model for process monitoring and fault diagnosis in chemical processes.

Several studies have examined fault detection and diagnosis applications in batch process monitoring. Nomikos and MacGregor (1994) proposed the MPCA method to monitor batch chemical processes. The MPCA method unfolds a 3D data matrix into a two-dimensional (2D) matrix and then performs PCA to explain the variability among batches, process variables, and time. Wise et al. (1999) compared conventional PCA, MPCA, trilinear decomposition, and parallel factor analysis for fault detection in the semiconductor etching process. Yue et al. (2000) introduced batch process monitoring to semiconductor fabrication for plasma etchers, using emission spectra and the MPCA method to analyze the sensitivity of multiple scans within a wafer for several typical faults. Wise and Gallagher (1996) also used the MPCA and MSPC methods for process monitoring and fault detection in a chemical process. Spitzlsperger et al. (2005) presented an adaptive Hotelling T² control chart for the semiconductor etching process. In addition to statistical methods, the distance-based k-nearest neighbor (kNN) method has been used to classify faults in semiconductor manufacturing (He and Wang 2007; Verdier and Ferreira 2011).

2.2 Principal component analysis

PCA is a multivariate method that explains the covariance structure of multivariate data using a few linear combinations of the original variables (Wold et al. 1987a; Wise and Gallagher 1996). PCA decomposes the original data matrix X, whose rows are measurements and whose columns are process variables, into an uncorrelated score matrix T via an orthogonal loading matrix P, plus the unexplained variation of X, the residual matrix E.

$$ {\mathbf{X}} = {\mathbf{TP}}^{T} + {\mathbf{E}}. $$
(1)

Most of the variation can be explained by the first r PCs, captured in the matrix \( {\mathbf{TP}}^{T} \). Given a new measurement \( {\mathbf{X}} \), the PC score \( {\mathbf{t}} \), the predicted measurement \( {\hat{\mathbf{X}}} \), and the residual matrix \( {\hat{\mathbf{E}}} \) are defined as follows:

$$ {\mathbf{t}} = {\mathbf{P}}^{T} {\mathbf{X}} $$
(2)
$$ {\hat{\mathbf{X}}} = {\mathbf{PP}}^{T} {\mathbf{X}} $$
(3)
$$ {\hat{\mathbf{E}}} = {\mathbf{(I}} - {\mathbf{PP}}^{T} {\mathbf{)X}}. $$
(4)

After the PCA model is built, variation in the original data can be detected in the score space and the residual space. The Hotelling T² statistic and the squared prediction error (SPE) are used to measure the divergence of a new measurement. The Hotelling T² statistic estimates the variation of a new measurement explained by the PCA model in the score space, as shown in (5).

$$ T^{2} = {\mathbf{t}}^{T} {\mathbf{S}}^{ - 1} {\mathbf{t}}. $$
(5)

The matrix \( {\mathbf{S}}^{-1} \) is the diagonal matrix containing the inverse eigenvalues associated with the r retained eigenvectors (PCs). The T² statistic follows an F distribution. The SPE estimates the unexplained variation of a new measurement in the residual space and is defined as

$$ SPE = {\hat{\mathbf{E}}}^{T} {\hat{\mathbf{E}}} = {\mathbf{X}}^{T} {\mathbf{(I}} - {\mathbf{PP}}^{T} {\mathbf{)X}} $$
(6)

where SPE represents the squared error of the residual. PCA thus supports process monitoring through control charts constructed for these two statistics.
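To make these statistics concrete, the following minimal numpy sketch fits a PCA model by singular value decomposition and computes T² and SPE for a new measurement, following Eqs. (1)–(6). It is an illustration under our own naming and design choices, not the implementation used in this study.

```python
import numpy as np

def fit_pca(X, r):
    """Fit a PCA model on data X (n observations x J variables), keeping r PCs."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:r].T                              # loading matrix (J x r), Eq. (1)
    eigvals = (s[:r] ** 2) / (len(X) - 1)     # variances of the r score columns
    return P, eigvals, mean

def monitor(x_new, P, eigvals, mean):
    """Return (T2, SPE) for a new measurement vector x_new of length J."""
    x = x_new - mean
    t = P.T @ x                               # score vector, Eq. (2)
    e = x - P @ t                             # residual, Eqs. (3)-(4)
    T2 = t @ ((1.0 / eigvals) * t)            # Hotelling T2, Eq. (5)
    SPE = e @ e                               # squared prediction error, Eq. (6)
    return T2, SPE
```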

3 Proposed approach

With advanced information technology and sensors for data collection, real-time tool data can be recorded for tool or process monitoring in advanced 300 mm fabrication. All identification information and equipment parameters are collected as wafers pass through the process. All process data should be treated as multivariate rather than mutually independent because the variables may be correlated with each other. This study proposes a manufacturing intelligence approach for fault detection, diagnosis, and classification based on MPCA and data mining approaches, as shown in Fig. 1. After problem definition and data preparation, MPCA unfolds the 3D process data and reduces the batch process data to a few PCs. In fault detection, the D and Q statistics are calculated to detect abnormal events from the score space and residual space, respectively; different types of process faults can be detected based on the constructed control limits of the two statistics. All data points are then clustered using a self-organizing map network, and the process variables most relevant to the detected abnormal events are identified. Decision trees are then used to extract rules from the generated groups to describe the various faults in the process. The resulting simple rules are used to predict the wafer class defined by performance.

Fig. 1 Research framework for semiconductor FDC

3.1 Problem definition

Advanced sensors and information technologies enable process data and tool parameters to be recorded for fault detection and diagnosis. Process data can be recorded immediately during operation in a semiconductor fab. Historical data, including nominal process information and quantitative measurements, are continuously recorded at nine sites per wafer. To reduce yield loss during the manufacturing process, tool abnormalities should be detected early during process monitoring. MPCA unfolds the data and extracts a few PCs for process monitoring by considering the three dimensions of the data array in semiconductor manufacturing: wafers, process measurement variables, and samples recorded over time. If an abnormal wafer is detected, the fault type must be identified by classifying it into one of several subgroups for quick diagnosis and process recovery. This study also uses data mining to extract valuable process information and manufacturing intelligence to support fault detection and classification.

3.2 Data preparation

The FDC system collects a large amount of historical data, including missing, noisy, and redundant data. Therefore, the data must be prepared to improve their quality for effective model construction (Chien et al. 2007). Because the FDC system records historical data every second, both the quantity of data and the noise grow, and both influence the effectiveness of the constructed model. Domain knowledge is used to select the key process step in the temporal window. Historical data features are then extracted to detect faults and generate fault classes, and equipment parameters with constant values are removed. Before the MPCA method is applied, the data should be transformed to zero mean and unit variance: for each process variable, subtract its sample mean and divide by its standard deviation. This standardization prevents particular variables from dominating the model results because of differences in data scale.
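As an illustration of this step, the sketch below standardizes a 3D wafer data array per process variable and removes constant parameters. It assumes the historical data are already arranged as a numpy array of shape (I, J, K); the numerical tolerance for "constant" is an arbitrary choice.

```python
import numpy as np

def standardize_batches(X3d, tol=1e-12):
    """Standardize an (I wafers x J variables x K samples) array to zero mean
    and unit variance per process variable, dropping constant variables."""
    mean = X3d.mean(axis=(0, 2), keepdims=True)   # per-variable mean
    std = X3d.std(axis=(0, 2), keepdims=True)     # per-variable std deviation
    keep = std.squeeze() > tol                    # constant parameters carry no info
    Z = (X3d[:, keep, :] - mean[:, keep, :]) / std[:, keep, :]
    return Z, keep
```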

3.3 MPCA model construction

MPCA unfolds the 3D data matrix into a 2D matrix and then performs PCA (Wold et al. 1987b; Nomikos and MacGregor 1994). For batch process monitoring in a semiconductor fab, a set of process variables is measured over time intervals and wafers. All historical data are arranged into the 3D array \( \underline{\mathbf{X}} \) (I × J × K), where I is the number of wafers, J is the number of variables, and K is the number of time samples per wafer. After unfolding the 3D array into the 2D matrix X (I × JK), ordinary PCA is performed on X. The objective of MPCA is to decompose \( \underline{\mathbf{X}} \) into a few PCs. As shown in Fig. 2, \( \underline{\mathbf{X}} \) is decomposed into the score space (\( \sum\nolimits_{r = 1}^{R} {{\mathbf{t}}_{r} {\mathbf{p}}_{r}^{T} } \)) spanned by the first R PCs and the residual space (\( \underline{\mathbf{E}} \)), where \( {\mathbf{t}}_{r} \) and \( {\mathbf{p}}_{r} \) denote the rth score vector and loading vector, respectively.

$$ \begin{aligned} \underline{{\mathbf{X}}} & = {\mathbf{TP}}^{T} + \underline{{\mathbf{E}}} = \sum\limits_{r = 1}^{R} {{\mathbf{t}}_{r} {\mathbf{p}}_{r}^{T} } + \underline{{\mathbf{E}}} \\ & = {\hat{\mathbf{X}}} + \underline{{\mathbf{E}}} \\ \end{aligned} $$
(7)

P is the loading matrix, which contains the loading vectors \( {\mathbf{p}}_{r} \), and T is the score matrix containing the projection locations on the PC subspace. \( \underline{\mathbf{E}} \) is the variation that the MPCA model cannot explain.

Fig. 2 Decomposed parts of the three-dimensional array
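A minimal sketch of this wafer-wise unfolding and of the decomposition in Eq. (7), assuming the array has already been standardized as in Sect. 3.2. Retaining the PCs whose explained-variance share exceeds 5 % anticipates the selection rule used in Sect. 4.3; the SVD route and the names are our own choices.

```python
import numpy as np

def fit_mpca(X3d, share_threshold=0.05):
    """Unfold an (I x J x K) array wafer-wise into (I x JK) and fit PCA,
    keeping the PCs whose explained-variance share exceeds the threshold."""
    I, J, K = X3d.shape
    X = X3d.reshape(I, J * K)                 # unfolded 2D matrix
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)       # explained-variance share per PC
    R = max(1, int(np.sum(explained > share_threshold)))
    P = Vt[:R].T                              # loading matrix (JK x R)
    T = Xc @ P                                # score matrix (I x R)
    E = Xc - T @ P.T                          # unexplained residual, Eq. (7)
    return P, T, E, explained[:R]
```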

3.4 Fault detection

MPCA decomposes normal wafer variation into a model space and a residual space, and the D and Q statistics are used to monitor the multivariate batch process. The D statistic is similar to the Hotelling T², that is, the Mahalanobis distance between the new wafer data and the normal condition in the score space formed by the few retained PCs; it monitors systematic variation in the score space. The Q statistic monitors the variation not explained by the PCs retained in the MPCA model. The D and Q statistics are calculated using the following equations:

$$ D_{i} = {\mathbf{t}}_{i}^{T} {\mathbf{S}}_{R}^{ - 1} {\mathbf{t}}_{i} \times \frac{I(I - R)}{{R(I^{2} - 1)}} \sim F_{(R,\;I - R)} $$
(8)
$$ Q_{i} = {\mathbf{e}}_{i} \,{\mathbf{e}}_{i}^{T} , $$
(9)

where R is the number of PCs retained by MPCA; \( {\mathbf{t}}_{i} \) is the R-dimensional score vector corresponding to wafer i; \( {\mathbf{S}}_{R} \) is the (R × R) covariance matrix of the t scores calculated by the MPCA model, whose diagonal elements are the variances of the first R PCs; and \( {\mathbf{e}}_{i} \) is the ith residual vector of the residual matrix E. Therefore, variability in the original data can be detected in both the model space and the residual space.

The reference wafers determine the control limits for the D and Q statistics under normal conditions. Based on significance level α determined by the user, the D and Q statistic control limits are determined as follows (Jackson and Mudholkar 1979; Nomikos 1996):

$$ D_{{\alpha ,\;{\text{limit}}}} = F_{{\alpha ;\;R,\;I - R}} $$
(10)
$$ Q_{{\alpha ,\;{\text{limit}}}} = \theta_{1} \left[ {1 - \frac{{\theta_{2} h_{0} \left( {1 - h_{0} } \right)}}{{\theta_{1}^{2} }} + \frac{{z_{\alpha } \left( {2\theta_{2} h_{0}^{2} } \right)^{{{1 \mathord{\left/ {\vphantom {1 2}} \right. \kern-\nulldelimiterspace} 2}}} }}{{\theta_{1} }}} \right]^{{{1 \mathord{\left/ {\vphantom {1 {h_{0} }}} \right. \kern-\nulldelimiterspace} {h_{0} }}}} , $$
(11)

where \( h_{0} = 1 - 2\theta_{1} \theta_{3} /3\theta_{2}^{2} \), \( \theta_{1} = {\text{trace}}({\mathbf{V}}) \), \( \theta_{2} = {\text{trace}}({\mathbf{V}}^{2}) \), \( \theta_{3} = {\text{trace}}({\mathbf{V}}^{3}) \), V is the covariance matrix of E, \( {\mathbf{V}} = {\mathbf{E}}^{T} {\mathbf{E}}/(I - 1) \), and \( z_{\alpha} \) is the critical value of the standard normal distribution at significance level α.
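The sketch below computes the two statistics and their control limits from Eqs. (8)–(11), assuming the score matrix T and residual matrix E come from an MPCA fit such as the one sketched in Sect. 3.3. It relies on scipy for the F and normal quantiles; treat it as an illustration rather than the authors' implementation.

```python
import numpy as np
from scipy import stats

def d_statistic(T, t_i):
    """D statistic for a wafer score vector t_i, Eq. (8); T is the (I x R)
    reference score matrix."""
    I, R = T.shape
    S = np.cov(T, rowvar=False)                 # (R x R) score covariance
    return t_i @ np.linalg.solve(S, t_i) * (I * (I - R)) / (R * (I ** 2 - 1))

def q_statistic(e_i):
    """Q statistic (SPE) for a wafer residual vector e_i, Eq. (9)."""
    return e_i @ e_i

def d_limit(I, R, alpha=0.01):
    """D control limit: the upper-alpha quantile of F(R, I - R), Eq. (10)."""
    return stats.f.ppf(1 - alpha, R, I - R)

def q_limit(E, alpha=0.01):
    """Q control limit of Jackson and Mudholkar (1979), Eq. (11)."""
    V = E.T @ E / (E.shape[0] - 1)              # residual covariance matrix
    th = [np.trace(np.linalg.matrix_power(V, p)) for p in (1, 2, 3)]
    h0 = 1 - 2 * th[0] * th[2] / (3 * th[1] ** 2)
    z = stats.norm.ppf(1 - alpha)
    return th[0] * (1 - th[1] * h0 * (1 - h0) / th[0] ** 2
                    + z * np.sqrt(2 * th[1] * h0 ** 2) / th[0]) ** (1 / h0)
```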

3.5 Faults clustering and identification

Once an out-of-control signal has been detected on the D or Q control chart, the cause of the out-of-control event must be identified. An out-of-control signal indicates abnormal behavior in the equipment or process. Violations of the D and Q control charts stem from two types of process faults: systematic variation in the t-scores, and variation in the residual space that the MPCA model cannot explain. Clustering analysis groups the observed wafers so that they can be investigated under normal and abnormal process conditions with possible faults; it segments a diverse set of observations into groups with similar characteristics. For fault clustering, a self-organizing map (SOM) neural network is used to cluster observations based on PC similarity. The SOM is preferred here over classical clustering methods because it produces graphic visualizations and parallels the architecture of a neural network: its 2D topological map shows the grouping results based on the relationships among observations.
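A minimal clustering sketch using the third-party minisom package, assuming the D and Q statistics of all wafers have already been computed and passed in as vectors. A 3 × 1 map yields three clusters, mirroring the empirical study in Sect. 4.5; the map size and training parameters are illustrative choices.

```python
import numpy as np
from minisom import MiniSom  # third-party package: pip install minisom

def som_cluster(d_values, q_values, n_clusters=3, seed=0):
    """Cluster wafers on a one-dimensional SOM using their D and Q statistics."""
    feats = np.column_stack([d_values, q_values]).astype(float)
    feats = (feats - feats.min(axis=0)) / np.ptp(feats, axis=0)  # scale to [0, 1]
    som = MiniSom(n_clusters, 1, input_len=2, sigma=0.5,
                  learning_rate=0.5, random_seed=seed)
    som.train_random(feats, num_iteration=1000)
    # each wafer is assigned to the map unit of its winning neuron
    return np.array([som.winner(x)[0] for x in feats])
```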

Fault identification is difficult for process engineers and operators because many process variables and equipment parameters must be monitored, and a single abnormal fault can affect several process variables. The aim of fault identification is to identify the process variables that matter most in judging the out-of-control signal. A contribution plot shows the contribution of the jth variable to the D and Q statistics; process variables with relatively large contributions should be diagnosed further.
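The sketch below shows one common way to compute such contributions, reusing the MPCA quantities defined earlier. The Q contribution of a variable is its squared residuals summed over the K time samples; for the D statistic we use a widely cited loading-weighted approximation. The paper does not spell out its exact contribution formulas, so these are representative rather than definitive.

```python
import numpy as np

def q_contributions(e_i, J, K):
    """Per-variable contribution to Q: squared residuals of wafer i,
    summed over the K time samples of each of the J process variables."""
    return (e_i.reshape(J, K) ** 2).sum(axis=1)

def d_contributions(x_i, P, eigvals):
    """Approximate per-column contribution to D: each column's deviation
    weighted by variance-scaled loadings, sum_r (t_r / lambda_r) * p_mr * x_m."""
    t = P.T @ x_i                      # scores of the wafer being diagnosed
    return (P * (t / eigvals)).sum(axis=1) * x_i
```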

3.6 Fault classification and diagnosis

Fault classification separates the normal and abnormal clusters by considering the process variables most relevant to the faults. The decision tree method is used to extract rules that explain the relationship between the wafer groups and the process variables. Because of the nature of the problem and the data characteristics, the Chi-squared Automatic Interaction Detector (CHAID) is used (Kass 1980). The tree is grown iteratively until no attribute yields a statistically significant split or the number of instances in a node falls below a specified threshold. Decision trees are better suited than other classification techniques to exploring the data and extracting decision rules that process engineers can understand. The extracted information assists fault diagnosis so that process deviations can be corrected quickly.
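As an illustration, the sketch below fits a tree on per-wafer variable averages and prints readable if-then rules. scikit-learn implements CART rather than CHAID, so this is a stand-in for the CHAID tree used in the study; the feature names, labels, and hyperparameters are assumptions for the example.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

def extract_rules(X_avg, y, names=("V5", "V9", "V11", "V12", "V13")):
    """Fit a shallow tree on per-wafer averages X_avg with peeling labels y
    (1 = peeling, 0 = not) and print the extracted classification rules."""
    tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                  random_state=0)
    tree.fit(X_avg, y)
    print(export_text(tree, feature_names=list(names)))
    return tree
```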

4 Empirical study

4.1 Problem definition

To validate the proposed approach, an empirical study was conducted on the CVD Ti/TiN process in a leading semiconductor company in Taiwan. CVD is a chemical process used to deposit thin films in ICs. The abnormal event is peeling, caused by poor bonding between the film and the substrate. In practice, peeling is related to the uniformity of the film thickness, which is inspected by scanning electron microscope analysis after several process steps. Equipment sensors therefore record equipment parameters to determine whether the tool condition is normal.

4.2 Data preparation

In total, 100 wafers with 21 process variables were collected during the CVD process, and 23 records from the main CVD process step were collected for each wafer. There were 68 normal wafers and 32 abnormal wafers; a normal wafer has no peeling faults. Four variables were excluded because their constant values provided no process information, leaving 17 process variables for building the monitoring model. Table 1 lists these variables. All data were standardized to zero mean and unit variance for MPCA modeling.

Table 1 Process variable description

The trend charts of the process variables are shown in Fig. 3. Identifying the process variable that is the main cause of the peeling fault from these charts is difficult: the 17 process variables vary to different degrees, and some are correlated with each other. Conventional PCA cannot explain the variability among wafers, process variables, and recorded time simultaneously.

Fig. 3 Trend charts for process variables

4.3 MPCA model construction

To characterize equipment behavior under normal conditions, the 68 normal wafers were used to construct a reference model for equipment monitoring. First, MPCA unfolded the original 3D array X (68 × 17 × 23) into the 2D data set X (68 × 391), where each row represents wafer i and each column represents process variable j at time k. MPCA then transformed the original data space into a score space and a residual space: the score space accumulates the variation of the first r PCs, and the residual space holds the variation that the model cannot explain. The maximal number of PCs is 68, and the explained variance percentage of each PC is shown in Fig. 4. The explained variation per PC decreases as the number of PCs increases. A PC is retained if its proportion of explained variation exceeds 5 %; the first three PCs together explain 50.87 % of the process variation.

Fig. 4 Explained variation for each principal component

4.4 Fault detection

To evaluate the effectiveness of peeling fault detection, the unfolded data matrix X (100 × 391), including the abnormal wafers (32 × 391), was decomposed into the score space and residual space by multiplying it by the loading matrix (391 × 3) of the first three PCs. The D statistic summarizes overall wafer performance for comparison with normal wafer performance in the MPCA model, and the Q statistic represents equipment behavior in the residual space and monitors each wafer. The D and Q statistic control charts, shown in Figs. 5 and 6, identify out-of-control signals for process monitoring; the dashed line is the 95 % control limit, and the solid line is the 99 % control limit. In total, 98 % of the wafers fall under the D statistic control limit, showing that most wafers are similar in the score space. The detection results for normal and abnormal wafers can be compared in the t-score scatter plots of Fig. 7, where a circle represents a wafer without the peeling fault and a solid dot represents a wafer with the peeling fault. Normal and abnormal wafers with prior knowledge of peeling faults do not separate significantly in the score space, implying that the equipment behavior of abnormal wafers cannot be identified there.

Fig. 5 D statistic control chart

Fig. 6 Q statistic control chart

Fig. 7 T-score scatter plots: a first and second PCs, b first and third PCs, c second and third PCs

In the D statistic control chart, the 64th and 71st wafers are identified as abnormal at the 99 % and 95 % control limits, respectively. Figure 7 shows that the 64th and 71st wafers differ from the other wafers, especially on the second and third PCs. The D control chart result implies that differentiating between peeling and non-peeling wafers using the score space alone is difficult. In the Q statistic control chart, several wafers exceed the control limit: 47 wafers were flagged as abnormal beyond the 99 % control limit, and 85 wafers beyond the 95 % control limit. The MPCA model explains only 50.87 % of the variation in the process data, which led the Q statistic to misclassify 25 % of the normal wafers.

4.5 Fault clustering and identification

Once a deviation is detected on the D or Q statistic control chart, the wafer should be diagnosed to identify the cause of the abnormal event. Fault identification locates the process variables most responsible for the abnormal event. In the D statistic control chart, the 64th and 71st wafers exceed the control limit. Figure 8 shows the contribution of each process variable to the D statistic: for the 64th wafer, process variables V6, V7, V15, V16, and V17 contribute more than the other variables, while V10, V16, and V17 contribute the most for the 71st wafer.

Fig. 8 D statistic contribution plots: a 64th wafer, b 71st wafer

The wafers were grouped by the SOM into three clusters based on the D and Q statistics. Table 2 lists the basic statistics of each cluster. The percentage of peeling wafers in Clusters (i) and (ii) is below 15 %, so these clusters are regarded as normal. Cluster (iii) represents the peeling wafer class because 91.67 % of its wafers have the peeling problem. The peeling wafers in Cluster (iii) were selected to identify the process variables associated with the abnormality. In the residual space, the Q statistic chart is used to monitor and investigate the MPCA model residuals. By averaging the process variable contributions of the selected abnormal events, process variables V13, V11, V5, V9, and V12 were identified first for fault diagnosis, as shown in Fig. 9.

Table 2 Basic statistics for each cluster
Fig. 9 Average Q contribution plot for Cluster (iii)

4.6 Fault classification and diagnosis

After the MPCA model was built to decompose the variation into score space and residual space, three groups of wafers were clustered using the SOM method. A decision tree was then used to extract the relationship between peeling wafers and process variables. The target variable is the cluster result, labeled “1” if the wafer is peeling and “0” otherwise. The five variables identified from the contribution analysis of Cluster (iii) in Fig. 9, namely V5, V9, V11, V12, and V13, were used, and the average values of each process variable per wafer (100 × 17) served as the independent variables. Figure 10 shows the peeling fault rules extracted by the decision tree. The overall model fit is 97 % using V12 and V11, and three rules were extracted for classifying the peeling fault. The first rule is, “If the value of variable V12 is less than 0.8, then the wafer has the peeling issue.” The second rule is, “If the value of variable V12 is greater than 0.6 and the value of variable V11 is less than 1, then the wafer does not have the peeling issue.” The third rule is, “If the value of variable V12 is greater than 0.6 and the value of variable V11 is between 1 and 1.5, then the wafer has the peeling issue.” Based on the classification results, 29 peeling wafers were detected, and the variables “RF Vpp” and “RF tune position” were extracted for a process engineer to diagnose the cause of the peeling issue.

Fig. 10 Decision tree for classifying peeling wafers

4.7 Results and discussion

The peeling and non-peeling wafers can be identified using the three rules. After abnormal wafers were detected, a set of process variables with large contributions to the D or Q statistics was identified by the fault identification procedure. Process variables V5 (Stage Heater Temperature), V9 (RF Power), V11 (RF Tune Position), V12 (RF Vpp), and V13 (RF Vdc) were regarded as the causes of the peeling issue. Variable V12 (RF Vpp) was selected at the first level of the decision tree and classified 23 peeling wafers, implying that it had the largest impact on classifying peeling wafers.

To evaluate the effectiveness and practical viability of the proposed approach, two conventional approaches, MPCA alone and the kNN method, were selected for comparison. The kNN method categorizes an unlabeled wafer as normal or abnormal according to its k nearest wafers in the training data. First, the k nearest neighbors of each sample among the normal wafers are determined; the sum of the k smallest squared Euclidean distances from the ith wafer to its neighboring wafers is then calculated. The number of neighbors was set to five based on domain knowledge and data quality, and the fault detection threshold was estimated at the 95 % and 99 % confidence levels from the 68 normal wafers. Figure 11 shows that the kNN method detected 24 abnormal wafers at the 99 % control limit, and only 1 normal wafer was falsely identified as abnormal.

Fig. 11 Fault detection using the kNN method
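A minimal sketch of this distance-based detector, where reference holds the normal training wafers as feature rows. The percentile-based threshold is a simplification of the confidence-limit estimation described above.

```python
import numpy as np

def knn_fault_scores(reference, wafers, k=5):
    """kNN fault score: the sum of the k smallest squared Euclidean distances
    from each wafer to the reference (normal) wafers."""
    d2 = ((wafers[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
    d2.sort(axis=1)                     # ascending distances per wafer
    return d2[:, :k].sum(axis=1)

def knn_threshold(reference, k=5, q=99):
    """Detection threshold: the qth percentile of the normal wafers' own
    scores, excluding each wafer's zero distance to itself."""
    d2 = ((reference[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
    d2.sort(axis=1)
    return np.percentile(d2[:, 1:k + 1].sum(axis=1), q)
```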

Table 3 compares the three methods by sensitivity and specificity. Sensitivity measures the proportion of peeling wafers correctly classified into the peeling wafer group; specificity measures the proportion of non-peeling wafers correctly classified into the non-peeling wafer group. The control limit was set at the 99 % confidence level. MPCA with the D and Q statistic control charts has high sensitivity (93.75 %) but low specificity (77.94 %). The kNN method has low sensitivity (75 %) but high specificity (98.53 %). The proposed method achieves both high sensitivity (90.63 %) and high specificity (100 %). Although the results indicate that the proposed method detects abnormal wafers effectively, demonstrating significant differences among the methods is difficult because of the small number of wafers in the study. Reproducing this study with a larger data set from a semiconductor company should yield a more robust comparison.
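For reference, the short sketch below reproduces the proposed method's figures from the counts reported in this study: 29 of the 32 peeling wafers detected and no false alarms among the 68 normal wafers.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity: peeling wafers correctly flagged; specificity: non-peeling
    wafers correctly passed."""
    return tp / (tp + fn), tn / (tn + fp)

# Proposed method: 29 of 32 peeling wafers detected, 0 of 68 normals flagged.
sens, spec = sensitivity_specificity(tp=29, fn=3, tn=68, fp=0)
# sens = 0.90625 (90.63 % in Table 3), spec = 1.0 (100 %)
```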

Table 3 Comparison results for fault detection

Comparing the fault detection and classification results shows that the accuracy of the MPCA method alone in classifying the peeling issue is low, and many non-peeling wafers exceeded the Q control limit. Because only 50.87 % of the variation is explained in the score space, the remaining 49.13 % stays in the residual space, which pushes many wafers beyond the Q control limit and produces many false alarms. However, most peeling wafers were classified correctly by the clustering analysis and the decision tree method. This means that peeling wafers can be detected from average process variable performance, and that fluctuations over process time should be smoothed during data preparation when constructing the monitoring model.

5 Conclusion

This study develops an effective approach for semiconductor fault detection and classification that integrates MSPC methods and data mining approaches for manufacturing intelligence and yield enhancement. The proposed approach detects potential process faults and abnormal events effectively by monitoring fewer key variables than conventional approaches, and the results demonstrate its practical viability. The MPCA method unfolds the 3D historical data and projects the data onto the score space and residual space. Abnormal wafers are then detected with the D and Q statistic control charts, and the process variables most critical to the abnormal events are identified by SOM analysis and contribution plots. Understandable rules are extracted to classify the normal and abnormal wafer classes. With the information extracted from the historical process data, abnormal events can be diagnosed systematically and effectively, and the decision to remove them and reset the process to normal conditions can be made by engineers or domain experts in a timely and precise manner. Moreover, simple and understandable rules are generated to predict wafer performance and classify wafers into different classes using classification approaches. The proposed approach has been implemented in the FDC system of the case company for in situ real-time monitoring for yield enhancement and quality control.

In the semiconductor manufacturing process, some recipes and products lead to different behaviors on the same equipment and processes. The effects of recipe or product changes on equipment abnormalities are not considered in this study. To detect faults across different recipes within the same monitoring model, recipe effects must be eliminated before model construction. Further studies can develop an integrated model for fault detection that considers recipe changes and different products. Most existing methods build a reference model from a group of normal measurements and compare different conditions against it. However, equipment conditions are not fixed because of the complex processes and varied products in a fab. Further research should develop a rolling scheme for adaptive monitoring and equipment diagnostics and prognostics.