Introduction

Semiconductor products have become one of the most important components of various supply chains that leading countries have reemphasized the importance of semiconductor manufacturing (Chien et al., 2020b). Indeed, driven by Moore’s Law (Moore, 1965) that the number of transistors fabricated on a wafer will be doubled every 12 or 24 months, semiconductor industry has continuously migrated the technologies to shrink the feature size of the integrated circuit (IC). Thus, semiconductor industry is highly capital intensive, with complex and lengthy manufacturing process. In order to meet customer requirements and maintain competitive advantage, continuous yield enhancement for overall wafer effectiveness is critical for the semiconductor manufacturing companies (Chien & Hsu, 2014; Chien et al., 2013a, 2020a). In particular, wafer probe test plays a crucial role to distinguish the good dies from the remaining defected dies on each of the fabricated wafers that can reduce both the producer risk and customer risk via effective probing test. However, as the IC device features are continuously shrinking via technology migration, the fabrication processes as well as IC testing are increasingly complicated.

Semiconductor manufacturing consists of two phases for IC testing including circuit probing conducted on each fabricated wafer for sorting and final testing performed on the packaged IC (Hsu & Chien, 2007). The functionality and design specifications of each die can then be ensured by conducting IC testing. IC probe card is a crucial component as the signal interface between the tester and the tested wafers to detect failures against the designed electrical specification such as current, voltage, leakage, trigger and other functional speed as illustrated in Fig. 1. To obtain reliable test results, the probe card that is characterized by a set of mechanical and electrical parameters should match with the tester and the devices to be tested such as device size and shape, the number of bond pads, and signal characteristics. Advanced quality control (AQC) is critical for yield enhancement in advance among the suppliers of semiconductor supply chain for virtual vertical integration to address the increasing challenges for maintaining technology migration for IC shrinkage (Chien et al., 2020a). Since probe card quality will affect the sensitivity and specificity of testing outcomes, it is important to enhance advanced quality control of the probe cards to ensure data integrity of IC probe testing results.

Fig. 1
figure 1

IC probe card and wafer probing test

As one of the suppliers for semiconductor supply chain, it is difficult for probe card manufacturers to verify the usage quality of their products in real setting. Design of experiments will be employed for electromagnetic analysis and circuit model verification of new probe cards for in-house research and development. However, owing to trade secret and confidentiality, little usage data can be collected from the customers for probing test on their IC devices. Thus, most of the probe card manufacturers reply on domain knowledge and the process of trial and error for troubleshooting. With the advancement of semiconductor process technology and shrinking critical dimensions, the diagnosing and troubleshooting procedure for probe card has become much more complicated and time consuming (Fu et al., 2022). The probe card fault diagnosis procedure will differ based on multiple machine parameter settings and the types of failures. Thus, it is challenging for new engineers lacking sufficient fault diagnosis know-how, increasing the difficulty for finding effective troubleshooting solutions in short time. Therefore, it is crucial to develop a systematic fault diagnosis framework to improve the efficiency of probe card troubleshooting.

Once probe card fault occurs during wafer testing, probe card manufacturer needs to conduct a series of on-site function testing to clarify the fault issues and then execute troubleshooting solutions to ensure the testing quality of restored testers. The whole fault diagnosis process may take up to30 days, causing long machine downtime. In practice, as the customer inform a fault occurred, the engineers rely on domain knowledge and the process of trial and error for fault diagnosis and troubleshooting that are time-consuming. However, little research has been done to conduct probe card fault diagnosis with a systematic and effective method. Furthermore, since the customers will not share on-site testing data for proprietary information protection, only limited amount of data can be collected in real settings. Thus, potential data-driven approaches such as decision tree or random forest are limited by the data of the present problem. Rough set theory (RST) has been effectively employed for data mining and knowledge discovery in the incomplete or insufficient information (Chien & Chen, 2007; Kusiak, 2001; Pawlak, 1982, 1997; Peng et al., 2017).

Focusing on real setting to address the limitation and data quality, this study aims to construct a data-driven framework that integrates RST and domain knowledge to enhance the efficiency and decision quality of probe card troubleshooting for advanced quality control and smart manufacturing for Industry 3.5. To ensure data integrity for further analysis, data preparation is conducted based on domain knowledge. Domain knowledge will be incorporated to verify the derived rules from RST and thus support the decision making of on-site service engineers for troubleshooting effectively. An empirical study was conducted in a leading semiconductor testing company in Taiwan for the approach validation. The results have shown the effectiveness and feasibility of the proposed approach than the existing approach of trial and error to derive decision rules to effectively support the engineers shortening troubleshooting and equipment downtime. Indeed, the proposed framework is implemented in this company.

This paper is organized as follows. “Introduction” section introduces the background and motivation of this research. “Literature review” section reviews the related studies on IC testing fault diagnosis and rough set theory method. “UNISON framework” Section proposes a probe card troubleshooting framework. “Empirical study” section validates the proposed framework with empirical study in IC probe card troubleshooting. “Conclusion” section concludes thus study with contributions, limitations and future research directions.

Literature review

IC testing fault diagnosis

With the rapid development of consumer electronics, market demand of high-quality IC chips is continuously growing. Indeed, real-time monitoring of the manufacturing process is needed to support fault detection to effectively eliminate the cause of the faults and thus reduce abnormal yield loss (Chien et al., 2013a). With the shrinkage of critical dimensions and feature sizes of IC, semiconductor manufacturing process has become much more sophisticated, thereby increasing testing and package cost (Fu et al., 2022).

In order to ensure IC quality and save packaging cost, semiconductor testing is an important process that involves the entire semiconductor supply chain. To ensure the produced wafers meet specifications, IC testing is performed to distinguish the good products from the defected ones. Each die on the fabricated wafer will be probing tested by the probe card of the tester to filter electrically dysfunctional dies. The testing results are shown as the wafer bin maps with spatial defect patterns for further diagnosis and yield enhancement (Chien et al., 2013b). Probe card that serves as the mechanical and electrical signal interface between the wafer and testing equipment is a critical component of the wafer testing system, since probe card quality strongly influences IC yield rate. Misjudgment of specifications is not allowed, a stable and reliable contact interface is needed, therefore high reliability of probe card is required in wafer testing. Figure 2 illustrates the testing procedure for probe card.

Fig. 2
figure 2

Probe card testing procedure

The whole semiconductor manufacturing process involves hundreds of steps and various equipment are used, such as wafer fabrication equipment, wafer testing system and package machine. Since equipment breakdown may cause significant loss of productivity and profit, a systematic fault diagnosis approach is needed to eliminate the root cause quickly and effectively. Furthermore, it is crucial to enhance data integrity by combining the domain knowledge and careful data preparation to find potentially useful troubleshooting solutions.

A number of studies have been done for semiconductor equipment fault diagnosis. Hong et al. (2011) presented a fault detection and classification (FDC) method for semiconductor manufacturing equipment by applying modular neural network in tool data modeling and address the associated uncertainty by Dempster-Shafer (D-S) theory. Li et al. (2013) developed a semiconductor equipment fault diagnosis expert system based on Bayesian networks with improved causal relationship questionnaire and probability scale methods as inference machine. Nawaz et al. (2014) applied Bayesian networks to construct a semiconductor etching equipment fault diagnosis framework using principal component analysis (PCA) to analyze status variable identification (SVID) data. Kim et al. (2017) proposed a time of propagation delay pass fail approach to diagnose interconnect failures of high parallelism probe card. Khakifirooz et al. (2018) employed Bayesian inference for mining semiconductor manufacturing big data for defect detection and yield enhancement. Rostami et al. (2018) applied Support Vector Machine (SVM) to detect abnormal observations and extract and classify fault fingerprints. Fu et al. (2022) employed Bayesian network to model the causal relationship between diagnosis variables for IC testing probe card fault diagnosis. Little research has been done on probe card troubleshooting for advanced quality control.

Rough set theory

RST is a data mining methodology to discover hidden patterns and derive useful decision rules under the incomplete or insufficient information (Pawlak, 1982). Different from conventional approaches, RST does not need to make the assumptions about the independence of variables and normality of data distribution (Chien et al., 2016). Furthermore, RST can deal with incomplete information for the present problem, while the “IF–THEN” rules derived by RST can provide simple and explainable information for the decision makers. Thus, RST is particularly suitable to support the engineers who should employ a series of steps to detect the faults for troubleshooting.

A number of studies have applied RST in various domains such as semiconductor manufacturing (Kusiak, 2001), printed circuit board manufacturing (Tseng et al., 2004), decision rule mining for machining method chains (Wang et al., 2022), and product feature design (Chien et al., 2014, 2016; Wu et al., 2020). In particular, Kusiak (2001) introduced rough set theory and data mining to extract effective decision rules from data sets for making predictions in the semiconductor industry. Chien et al. (2016) developed a data-driven framework based on RST to effectively extract product visual aesthetics and identify useful design concepts for product design. Wu et al. (2020) integrated rough set theory and information entropy to develop a knowledge recommender to enhances knowledge acquisition and reuse for new product development. Wang et al. (2022) developed a decomposition-reorganization approach for mining decision rules for machining method chains based RST.

Furthermore, RST has been applied for fault diagnosis in various areas including power system (Muralidharan & Sugumaran, 2013; Peng et al., 2004, 2017; Shen et al., 2000), medical diagnosis (Jothi & Inbarani, 2016), and image classification (Hassanien et al., 2009). Shen et al. (2000) applied RST in diagnosing the valve fault for diesel engine and proposed a method suitable for discretizing the frequency and time domain attributes. Peng et al. (2004) used RST as a data mining tool for fault diagnosis on distribution feeder, that useful patterns and rules are derived for faulty equipment diagnosis and fault location. Muralidharan and Sugumaran (2013) presented a rough set based rule generation and fuzzy classification of wavelet features for fault diagnosis of monoblock centrifugal pump. Peng et al. (2017) applied RST in fault diagnosis of different cable fault types by rejecting interference signals and recognizing partial discharge signals from different sources, and results showed that the proposed method have higher accuracy than SVM and Back-propagation Neural Network. Jothi and Inbarani (2016) presented a hybrid tolerance rough set and firefly based approach to classify MRI brain tumor image, that can select imperative features of brain tumor. Hassanien et al. (2009) presents a review of rough set and near set applications in medical imaging such as image segmentation, object extraction and image classification, while hybrid approaches including neural networks, particle swarm optimization, SVM, and fuzzy sets are discussed. Extensive review including extensions, theory and applications of RST can also be found, for example, in the research by Zhang et al. (2016). However, little research has been done in applying RST for probe card troubleshooting.

UNISON framework

Focusing on the present problem for probe card troubleshooting, this study develops a UNISON decision framework that integrates RST and domain knowledge to derive rules from incomplete information including six phases: understand and define the decision problem, identify the niche for decision quality improvement, structure the influence relationships among uncertain events, sense and describe expected outcomes, judge and measure overall performance, and tradeoff and decision making, as illustrated in Fig. 3. Indeed, UNSION framework has been revised and effectively applied to various contexts including IC final testing strategy (Chien et al., 2007), overlay error reduction in semiconductor manufacturing (Chien & Hsu, 2011), demand forecast (Chien et al., 2020b; Fu & Chien, 2019), product design innovation based on user experience (Chien et al., 2016), knowledge management (Hu et al., 2019), IC probe card fault diagnosis (Fu et al., 2022), and wastewater treatment and recycle (Lin et al., 2022).

Fig. 3
figure 3

RST-based UNISON framework for probe card troubleshooting

Understand and define the problem

Probe card is a highly customized product, its design may be different based on different tester and testing programs. With the technology migration of semiconductor manufacturing, probe card manufacturing has become much more complicated. Unexpected probe card failure happen during the testing process will cause machine downtime, which reduces overall equipment efficiency. Thus it is crucial for probe card manufacturer to immediately respond and provide technical service to recover the failure as soon as possible and ensure customer satisfaction.

Probe card troubleshooting is challenging due to the problem complexity and information limitation (Fu et al., 2022). There are many factors that affect the troubleshooting solutions, various solutions may map to the same abnormal situation under different circumstances. It is difficult to comprehensively consider all the possible influential factors by human judgement, and even more difficult for new coming engineers to quickly identify effective troubleshooting solutions. Due to proprietary information protection of the IC testing companies, it is difficult to perform on-site testing and obtain complete probe usage information, only partial data can be collected, leading to incomplete or insufficient information.

Identify the niche

The objective of probe card troubleshooting is to shorten diagnosis and processing time and improve IC testing efficiency. Each of the component in the test system may be an uncertain event that affects probe card troubleshooting decisions. The test system consists of the automated test equipment (ATE), probe card and device under test. ATE is an equipment that measures the functionality and performance of a device under test that consists of tester heads, prober and testing programs. The probe card includes printed circuit boards, electronic components, and probes. By constructing a rough set based decision analysis framework, experience and knowledge of domain experts are converted into systematic analytics for Industry 3.5 that employs intelligent solutions to empower decision makers (Chien et al., 2020b; Fu et al., 2022; Lin et al., 2022). To enhance data integrity, the data needed for analytics is also defined after discussion with the domain experts. With the proposed framework, relationship between uncertain events can be structured and troubleshooting solutions can be kept consistent, as well as improving the decision quality.

Structure influence relationship

In real settings, troubleshooting actions and procedures will be different according to different product types and testing conditions. Based on domain knowledge, the influence relationship for probe card troubleshooting processes was structured. Due to the difficulty of collecting on-site testing data, engineers often diagnose the fault cases using the abnormal situation data and product information available. Thus, the abnormality situation and product information are two main factors that influence probe card troubleshooting solutions. Experienced domain experts have their own troubleshooting procedures for frequently occurring abnormal situations, but the expert knowledge is not recorded, leading to difficulty of experience sharing and knowledge management. Therefore, this study standardizes the troubleshooting procedure and derive useful decision rules from multiple attributes.

Data preparation process includes data collection, data inspection and cleaning, data transformation and partition to ensure the data is in good quality and can be effectively analyzed (Lee & Chien, 2022). First, three types of data were collected based on domain knowledge, including abnormality data, product data and troubleshooting solution data from historical probe card fault cases.

  1. (i)

    Abnormality data: The abnormality data includes the abnormal situation of the probe card and on-site testing fail items of the fault cases.

  2. (ii)

    Product data: The product data is the basic information related to each probe card, including product type, design house for the tested wafer, customer and tester type.

  3. (iii)

    Solution data: The troubleshooting solutions for the fault cases are regarded as the solution data. The solution data contains the solution codes and its effectiveness of solving the abnormal situation.

Second, consistency and completeness of the data is checked by correcting errors, removing noise and deleting incomplete objects. Third, the data was transformed into analytical form with condition and outcome attributes. Fourth, the collected data is partitioned into training dataset (k%) and testing dataset (100-k%), where the decision rules were derived from the training dataset and the testing dataset were used for rule validation.

Sense and describe the outcomes

RST based data analysis starts from a decision table, called an information system (Pawlak, 2002). The decision table presents the relationship between the decisions that will be made when the data meets specified conditions. In the decision table, each row stands for an object and each column stands for an attribute. The attributes can be further classified into condition and decision attributes.

Information system

In particular, the information system S can be defined as follows:

$$S=(U, A, V, f)$$
(1)

where U denotes the universe of all objects\({u}_{j }, {u}_{j}\in U\). A is the finite set of the attributes\({a}_{k}\). V is the set of all attribute value, such that

$$\mathrm{V}= {\cup }_{{a}_{k}\in A}{V}_{{a}_{k}}$$
(2)

where \({V}_{{a}_{k}}\) is finite attribute domain of the attribute value of attribute \({a}_{k}\). Finally, f denotes an information function such that, for every \({u}_{j}\in U\) and \({a}_{k}\in A\),

$$f\left({u}_{j}, {a}_{k}\right)\in {V}_{{a}_{k}}$$
(3)

Indiscernibility relation

In RST data analysis, considering specific attributes, objects that have identical attribute values are indiscernible. For example, Objects 1 and 2 in Table 1 have the same attribute in “Customer”, indicating that Objects 1 and 2 were indiscernible in the attribute “Customer”. Let D be a non-empty subset of set A of all attributes, i.e., \(D\subseteq A\). The D-indiscernibility relation (ID) defines that the objects x1 and x2 are D-indiscernible with respect to D as follows:

Table 1 Illustration of the decision table
$$\left({x}_{1}, {x}_{2}\right)\in {I}_{D}\iff f\left({x}_{1}, {a}_{d}\right)=f\left({x}_{2}, {a}_{d}\right) \forall {a}_{d}\in D$$
(4)

Based on the indiscernibility relations, the object universe can be decomposed into blocks of indiscernible objects (i.e., elementary sets). For example, considering the attribute “design house”, two elementary sets {1, 3, 5} and {2, 4} will be generated. That is, objects 1, 3, 5 are indiscernible since their “design house” is the same. Furthermore, if D = {“design house”, “tester”}, then three elementary sets {1, 3}, {2, 4} and {5}will be derived. The combination of the elementary sets \({I}_{D}\left(.\right)\) having the indiscernible relations of ID is denoted as U|ID. For the above example D = {“design house”, “tester”}, U|ID = {{1, 3}, {2, 4}, {5}}.

Table 1 illustrates an example decision table of five objects in the rows that are characterized with attributes in the columns including one decision attribute (i.e., troubleshooting solution) and four condition attributes (i.e., design house, customer, tester and fault problem).

Approximation of sets

In RST, the lower and upper approximations are used to represent the uncertain relationship among the objects. The lower approximation of a set consists of all the D-indiscernible elementary sets included in X. That is, under given attributes, all the objects that can be certainly classified are included. The upper approximation of a set is the union of indiscernible objects having non-empty intersection with X. That is, The upper approximation of a set consists of all the objects that is possible to be classified into the set, considering specific attributes.

The lower approximation of X in D is denoted as \(\underline{D}\left(X\right)\), while the upper approximation of X in D is denoted as \(\overline{D}\left(X\right)\). The definition is as follows:

$$\underline{D}\left(X\right)=\left\{x\in U|{I}_{D}(x)\subseteq X\right\}$$
(5)
$$\overline{D}\left(X\right)=\left\{x\in U|{I}_{D}(x)\cap X\ne \varnothing \right\}$$
(6)

The boundary region \({BN}_{D}\left(X\right)\) is defined as the difference between the upper and lower approximation of X in D and could be denoted as follows:

$${BN}_{D}\left(X\right)={\underline{D}}\left(X\right)-{\overline{D}}\left(X\right)$$
(7)

For example, as shown in Table 1, the objects can be classified into two sets, i.e., X0 and X1, according to the decision attribute of “troubleshooting solution” as follows:

  • X0 = {1, 2, 4} denotes the objects using “ SB0311” solution.

  • X1 = {3, 5} denotes the objects using “ SB0501” solution.

Let D = {“ design house”, “tester”}, then U|ID = {{1, 3}, {2, 4}, {5}}. The objects that can be included in X0 is {2, 4}, which is the lower approximation of X0. That is,\(\underline{D}{X}_{0}=\left\{2, 4\right\}\). For the upper approximation, \(\overline{D}{X}_{0}=\{1, 2, 3, 4\}\).

Therefore, the boundary region can be derived as follows:

$${BN}_{D}{(X}_{0})=\overline{D}{X}_{0}-\underline{D}{X}_{0}=\left\{1, 2, 3, 4\right\}-\left\{2, 4\right\}=\{1, 3\}$$
(8)

Furthermore, the accuracy of the approximation for \({X}_{i}\) in D is defined as follows:

$${\alpha }_{D}\left({X}_{i}\right)=\frac{card\underline{D}\left(X\right)}{card\overline{D}\left(X\right)}$$
(9)

In which the cardinality, denoted as card(.), is the number of objects in the set. The range of \({\alpha }_{D}\left({X}_{i}\right)\) is between 0 and 1. If \({\alpha }_{D}\left({X}_{i}\right)=1, { X}_{i}\) is a exact set with respect to D, while \({\alpha }_{D}\left({X}_{i}\right)<1, { X}_{i}\) is a rough set with respect to D. Following the example mentioned above, consider D = {“design house”, “tester”},

$$\underline{D}{X}_{0}= \left\{2, 4\right\}\Rightarrow card(\underline{D}{X}_{0})=2$$
(10)
$$\overline{D}{X}_{0}=\left\{1, 2, 3, 4\right\}\Rightarrow card(\overline{D}{X}_{0})=4$$
(11)

Thus,

$${\alpha }_{D}\left({X}_{0}\right)=\frac{card\underline{D}{X}_{0}}{card\overline{D}{X}_{0}}=\frac{2}{4}=0.5$$
(12)

Since the value of \({\alpha }_{D}\left({X}_{0}\right)\) is smaller than one, \({X}_{0}\) is a rough set with respect to D.

Attribute reduction and rule extraction

A main purpose of the RST approach is to derive compact rules from the selected reducts, where reduct is a subset from the original set of attributes that can perceive the information as applying the whole set of attributes (Pawlak, 1997). Given a subset of attributes \(D\subseteq A\), considering \({a}_{d}\in D\). If \({I}_{D}={I}_{D-\{{a}_{d}\}}\), then \({a}_{d}\) is dispensable in D; elsewise \({a}_{d}\) is indispensable in D. Thus, dispensable attributes can be excluded without original classification. Let D and E have the equivalence relation over U, where \(\mathrm{D}\subseteq \mathrm{A and E}\subseteq \mathrm{A}.\) The D-positive region of E is defined as:

$${POS}_{D}\left(E\right)= \bigcup_{X\in U/{I}_{E}}\underline{D}X$$
(13)

that denotes the set of objects that can be categorized into the E-elementary sets employing the knowledge expressed by ID. Suppose \({a}_{i}\in D\), if \({POS}_{D}\left(E\right)={POS}_{D-\left\{{a}_{i}\right\}}\left(E\right)\), then \({a}_{i}\) is E-dispensable in D; otherwise \({a}_{i}\) is E-indispensable in D. Reduct is the set of indispensable attributes, which is the essential knowledge that can capture all the fundamental concepts. The subset F of D is called the E-reduct if and only if F is the E-dispensable subset of F, where \({POS}_{D}\left(E\right)={POS}_{F}\left(E\right)\).

For instances, suppose D = {“design house”, “tester”} and E = {“troubleshooting solution}, the elementary sets generated for D is \(U/{I}_{D}\) = {{1, 3}, {2, 4}, {5}}, while the elementary sets generated for E is \(U/{I}_{E}=\) {{1, 2, 4}, {3, 5}}. Thus, \({POS}_{D}\left(E\right)\)={2, 4, 5}.

In RST, \(Core\left(D\right)\) is defined as the intersection of the reducts as follows:

$$Core\left(D\right)=\cap Reduct(D)$$
(14)

where \(Reduct(D)\) is the set of all the reducts of D. In particular, \(Core\left(D\right)\) contains the most important subset of attributes, in which any removal of an attribute would decrease the classification power.

Moreover, for better explainability, the derived reducts can be transformed into decision rules based on the essential attributes. Following the above example, Objects 2, 4 and 5 belong to E-positive region of D, with the same condition attributes of reduct = {“ design house”, “tester”}. The derived rules are follows:

  • IF “Design house = DH2"” and “Tester = ND2”, THEN “Troubleshooting solution = SB0311.”

  • IF “Design house = DH1” and “Tester = ND2”, THEN “Troubleshooting solution = SB0501.”

Rules generation

RST is applied to generate the reducts and derive candidate decision rules from the training dataset, which is shown as follows.

Step 1::

Construct the decision table. If there are numerical attributes, data discretization must be performed to transform the continuous data into several intervals; otherwise, go to Step 2.

Step 2::

Generate all possible reducts with removing inessential attributes and retain attributes which have obvious relationship with decision variables.

Step 3::

Evaluate the generated reducts based on domain knowledge and obtain the sets of satisfied reducts.

Step 4::

Generate rules from the extracted reducts.

Step 5::

Delete the rules where the decision attributes are not the focus class.

Step 6::

Calculate the support value for each generated rule.

Step 7::

For each generated rule, if support value ≥ threshold value, the rule is then selected and added to the candidate rules pool.

Step 8::

Arrange all the candidate rules as output.

Overall judgment and measurement

Three indices of support, confidence, and lift were applied to validate the rules derived from the proposed approach. Firstly, support denotes the significance of the candidate rule. Secondly, confidence denotes the prediction accuracy of the rule. Thirdly, lift is the information gain ratio of the rule.

The following steps are employed to examine the objects in the testing data set to estimate the validity of the derived rules.

Step 1::

Define the threshold for confidence and lift as \({\theta }_{c} \mathrm{and} {\theta }_{l}\).

Step 2::

Input the testing dataset to calculate confidence and lift value for each candidate rule.

Step 3::

If confidence value > \({\theta }_{c}\) and lift value > \({\theta }_{l}\), select this rule into the decision rules pool; or else delete this rule.

Step 4::

After checking the confidence and lift value for all the decision rules, have a discussion with domain experts to confirm the meaning and interpretation of the derived rules.

Trade-off and decision

By applying RST analysis, troubleshooting suggestions can be provided for different circumstances. While the threshold of support can be set to denote the significance of the candidate rules to be investigated, the confidence of a derived rule denotes a percentage value showing how frequently the rule is correct. Thus, the threshold for confidence should be higher is better. Nevertheless, since probe card manufacturers need to eventually find out defect causes for troubleshooting. We used confidence level to prioritize the derived rules in order to support troubleshooting process. Lift is a measure of the performance of a derived rule based on RST at predicting as having an enhanced response with respect to the measure based on a random choice with the population as a whole. Thus, the threshold for lift should be larger than 1. The decision rules can be further expressed in “IF–THEN” form, thus the relationships between the abnormal situations, product information and troubleshooting solutions can be structured. Important features that mainly affect the troubleshooting solutions are also identified.

In addition, the derived rules were discussed with domain experts for further verification and implication interpretation in probe card troubleshooting. With useful information provided by the decision rules, engineers can improve the efficiency and decision quality of probe card fault diagnosis. Since new abnormal situations may appear from time to time, the decision rules obtained from RST should be updated regularly to capture the latest information.

Empirical study

An empirical study was conducted in a leading semiconductor testing company in Taiwan for validation. Historical data provided by the case company are systematically transformed for proprietary information protection.

Understand and define advanced quality control for probe cards

The case company is customer oriented, providing testing related products for various industries. Although the case company aims to enhance customer competiveness by developing latest technology and advanced manufacturing techniques, it is challenging to avoid abnormal situations of probe card, since its quality cannot be tested before actual usage during the wafer testing process. It is crucial for the case company to provide local maintenance services to ensure customer satisfaction. In actual situations, customer service engineers need to first clarify with the customer the abnormal situations happened, and then provide the relevant information to troubleshooting engineers for further analysis and troubleshooting. This case company is a probe card manufacturer with limited and incomplete testing information that can be collected from experiments and the manufacturing process while the customers may not be able to share usage information in real settings.

Identify the niche for probe card troubleshooting

The possible niche for the problem lies in finding the possible influence factors for probe card troubleshooting based on domain expertise to improve IC testing efficiency. Through integration of data mining techniques and domain know-how, useful decision rules for probe card troubleshooting can be generated for different abnormal situations, shorten fault diagnosis time and reduce equipment downtime, to support probe card quality improvement and subsequent packaging cost reduction. Uncertainties that mainly affects probe card troubleshooting includes ATE, DUT and probe card. In the troubleshooting process, engineers first need to validate condition of the ATE by using the probe card on another ATE. After checking the ATE is in normal condition, perform testing on another DUT to ensure that the testing failure is not caused by the previous DUT. By executing the above mentioned steps, uncertainties of the first two factors can be eliminated. Digital decision solutions should be developed to support the engineers for troubleshooting via a number of decision rules to diagnose the faults effectively.

Structure the influence relationships of probe card defects

To empower digital decision and improve probe card troubleshooting efficiency, it is crucial to transform domain expertise into record data and construct a systematic troubleshooting approach. According to the proposed framework, product information and abnormal situation are two important factors that affects the troubleshooting procedure, where the abnormal situation data is collected from the testing machine. For analytics purpose, the troubleshooting solution data are transformed from textual narratives to systematic coding. For example, the solution described as “Add a capacitor on the spider side probe”, can be extracted as “Electrical components-capacitor-specifications-add”, named as “SD111”. Following this procedure, 15 abnormal situations and 248 solution codes for probe card troubleshooting were defined based on domain knowledge and historical data.

For data preparation, historical probe card fault cases data including product data, abnormality data and solution data were collected, and then merged into a dataset referring to record ID for decision table construction. The proposed RST decision system consists of condition attributes (1 for product information and 3 for abnormal situation) and 1 decision attribute (troubleshooting solutions), as shown in Table 2. For data cleaning, the missing and inconsistent data were completed and corrected by verifying the rationality of results with domain experts. In addition, 21 objects including 21 types of troubleshooting solution were also removed from the target data set, because none of them had sufficient supporting objects. (with less than 2 supporting objects). After the data cleaning process, we collected 215 pieces of data regarding the historical probe card fault cases and troubleshooting solution. The levels of the attributes is shown in Table 3. Finally, the data were randomly divided into training dataset that contained 172 objects (80%) and testing dataset that contained 43 objects (20%).

Table 2 Illustration of probe card troubleshooting
Table 3 Number of levels of the attributes

Sense and describe the outcomes of derived decision rules

Following the proposed framework, RST is applied to derive candidate decision rules for probe card troubleshooting. Due to the less amount of data collected and problem complexity, candidate rules were selected if there were at least two items supporting the rules. The decision system comprised 4 condition attributes and one troubleshooting solution code; 46 candidate rules were derived, and a sample of them is shown in Table 4. Rule 1 was a reduct with only one attribute, and 8 supporting data, meaning “IF the customer is OC1, THEN the solution code is SF131”. In addition, rule 4 was another reduct with three attributes and 6 supporting data, meaning “IF the customer is OC1, abnormal situation is ATPG, and characteristic result is Voltage, THEN the solution code is SF131”.

Table 4 Partial candidate rules for probe card troubleshooting

Overall judgment and measurement for decision support

To estimate the validity of the derived candidate rules, the threshold of the lift was 1. The confidence level denotes the one-shot probability that is the probability for successful troubleshooting via firstly employing this rule. A sample of the results of the candidate rules is shown in Table 5. Following the result of the first cross validation, take Rule 1 in Table 5 as an illustration. Rule 1 could be accepted since its confidence value 0.5 is larger than the one-shot probability threshold of 0.25 and the lift is 1.95 that is larger than the lift threshold. However, Rule 4 would be rejected since the lift is 0.98 less than 1, though its confidence value 0.25 is larger than the one-shot probability of 0.06. Therefore, through the validation process for the testing data, 39 rules were extracted.

Table 5 The validation results of partial candidate rules

Among the rules extracted, there are rules with the same condition attributes, but different solution code, which can be further sorted according to the confidence level, with a sample of rules for ATPG abnormal situation are illustrated in Table 6. Where engineers can adopt the rules with a higher accuracy first, which implicates a higher probability to successfully fix the abnormal situation. Through the derived ‘‘IF–THEN” rules for probe card troubleshooting, the relationships between the abnormal situations, product information, and the troubleshooting solutions can be identified.

Table 6 Illustration of the suggested solutions for “CR10”

Trade-off and decision for troubleshooting

Based on the proposed approach, decision rules for probe card troubleshooting are extracted based on historical fault cases and domain knowledge, where the validity and implication of the results are checked by discussion with domain experts. The derived rules which passed the validation criteria can provide insight for troubleshooting engineers to identify the key factor that influences the troubleshooting solutions, as well as providing information for troubleshooting solutions suggestions, which is beneficial for improving probe card troubleshooting efficiency. Indeed, when taking into account the factor of customer, the case company can provide extra care and service for specific important customers to enhance their satisfactory. Furthermore, the decision rules can be especially useful in providing troubleshooting solution suggestions when limited information are available, which can also speed up the time needed for fault analysis. The derived rules according to the ranking of confidence and life can be employed to prioritize the potential troubleshooting solutions for support the engineers in light of the obtained information to enhance the effectiveness and efficiency for advanced quality control of probe cards. The developed solution is thus implemented in this company.

Conclusion

Focusing on realistic needs, this study has developed a systematic framework that integrates RST and domain knowledge to systematically derive decision rules from incomplete information including the abnormal situations and product information to suggest prioritized solutions for probe card troubleshooting. Indeed, probe card that serves as the testing signal interface between the tester and wafer is an indispensable component to ensure the integrity of IC testing that is increasing challenging owing to the shrinking IC feature sizes. The main contribution and novelty of this research can be summarized as follows: Firstly, domain knowledge and troubleshooting know-how are effectively incorporated for defining the condition attributes for developing RST and validating the derived rules. Secondly, an empirical study was conducted for validation in a leading probe card manufacturer, in which the results have shown practical viability of the proposed approach. Indeed, the reducts and the rules derived by RST can provide effective suggestions for the engineers to shorten the troubleshooting time and reduce the equipment downtime. Thirdly, the validated solution has been embedded in the digital decision system that is implemented in real settings. The probe card manufacturers can effectively employ the proposed solution to integrate domain knowledge and troubleshooting experience that can reduce the loss of core know-how due to retirement or resignation of talents via maintaining troubleshooting techniques.

As critical dimensions of IC are continuously shrinking, high quality and reliability of probe card is crucial to ensure the sensitivity and integrity of IC probing test for advanced quality control. Future research can be done in a number of directions as follows: Firstly, the proposed approach can be employed for similar troubleshooting issues with incomplete information for other semiconductor testing tools such as prober and tester for advanced quality control. Secondly, more research can be done to retrain the RST model with updated data, while more decision rules can be extracted by applying the proposed approach in different manufacturing contexts. Thirdly, more studies can be done to incorporate other variables including product related factors such as IC design house and product type as well as the data of equipment status variables collected from the sensors to explore potential causal relationships to support trouble shooting. Fourthly, future research can be done to derive more comprehensive decision rules by employing other data-driven approaches such as decision tree and Bayesian network for comparison. Future studies should be done to develop a digital decision system for real-time decisions with integration of customized planned data storage system and data-driven approaches for smart manufacturing.