# **3 Qualification Tests for Components and Assemblies**

Components, materials, and assemblies have a great impact on the quality and reliability of the equipment and systems in which they are used. Their *selection and qualification* has to be considered with care for new technologies or important redesigns, on a *case-by-case basis.* Besides cost and availability on the market, important selection criteria are *intended application, technology, quality, long-term behavior* of relevant parameters, and *reliability.* A *qualification test* includes *characterization* at different stresses (for instance, electrical and thermal for electronic components), *environmental tests, reliability tests,* and *failure analysis.* After some considerations about *selection criteria* for electronic components (Section 3.1), this chapter deals with *qualification tests* for complex integrated circuits (Section 3.2) and electronic assemblies (Section 3.4), and discusses *basic aspects of failure modes, mechanisms,* and *analysis* of electronic components (Section 3.3). Procedures given in this chapter can be extended to nonelectronic components and materials as well. Reliability-related basic technological properties of electronic components are summarized in Appendix A10. Statistical tests are in Chapter 7, test and screening strategies in Chapter 8, *design guidelines* in Chapter 5.

# **3.1 Basic Selection Criteria for Electronic Components**

As given in Section 2.2 (Eq. (2.18)), the failure rate of equipment and systems without redundancy is the *sum* of the failure rates of their elements. Thus, for large equipment or systems *without redundancy,* high reliability can only be achieved by selecting components and materials with sufficiently *low failure rates.* Useful *information for such a selection* includes:

- 1. Intended application, in particular required function, environmental conditions, as well as reliability and safety targets.
- 2. Specific properties of the component or material considered, in particular technological limits, useful life, long term behavior of relevant parameters.
- 3. Possibility for accelerated tests.
- 4. Results of qualification tests on similar components or materials.
- 5. Experience from field operation.
- 6. Influence of derating, influence of screening.
- 7. Potential design problems, in particular sensitivity of performance parameters, interface problems, EMC.
- 8. Limitations due to standardization or logistic aspects.
- 9. Potential production problems (assembling, testing, handling, storing, etc.).
- 10. Purchasing considerations (cost, delivery time, second sources, long-term availability, quality level).

As many of the above requirements are conflicting, component selection often results in a *compromise.* The following is a brief discussion of the most important aspects in selecting electronic components.

### **3.1.1 Environment**

*Environmental conditions* have a major impact on the functionality and reliability of electronic components, equipment, and systems. They are defined in *international standards* [3.8]. Such *standards* specify stress limits and test conditions, among others for

heat (steady-state, rate of temperature change), cold, humidity, precipitation (rain, snow, hail), radiation (solar, heat, ionizing), salt, sand, dust, noise, vibration (sinusoidal, random), shock, fall, acceleration.

Several combinations of stresses have also been defined, for instance,

temperature and humidity, temperature and vibration, humidity and vibration.

Not all stress combinations are relevant and by combining stresses, or in defining sequences of stresses, care must be taken to avoid the activation of failure mechanisms which would *not appear in the field.* 

Environmental conditions at equipment and system level are given by the *application.* They can range from severe, as in aerospace and defense fields (with extreme low and high ambient temperatures, 100% relative humidity, rapid thermal changes, vibration, shock, and high electromagnetic interference), to favorable, as in computer rooms (with forced cooling at constant temperature and no mechanical stress). *International standards* can be used to fix representative environmental conditions for many applications, e.g. IEC 60721 [3.8]. Table 3.1 gives *examples* for environmental test conditions for electronic/electromechanical equipment and systems. The stress conditions given in Table 3.1 have indicative purpose and have to be refined according to the specific application, to be cost and time effective.

| Environmental condition | Stress profile: Procedure | Induced failures |
|---|---|---|
| Dry heat | 48 or 72 h at 55, 70 or 85 °C:<br>El. test, warm up (2 °C/min), hold (80% of test time), power-on (20% of test time), el. test, cool down (1 °C/min), el. test between 2 and 16 h | Physical: Oxidation, structural changes, softening, drying out, viscosity reduction, expansion<br>Electrical: Drift parameters, noise, insulating resistance, opens, shorts |
| Damp heat (cycles) | 2, 6, 12 or 24 × 24 h cycles 25 to 55 °C with rel. humidity over 90% at 55 °C and 95% at 25 °C:<br>El. test, warm up (3 h), hold (9 h), cool down (3 h), hold (9 h), at the end dry with air and el. test between 6 and 16 h | Physical: Corrosion, electrolysis, absorption, diffusion<br>Electrical: Drift parameters, insulating resistance, leakage currents, shorts |
| Low temperature | 48 or 72 h at -25, -40 or -55 °C:<br>El. test, cool down (2 °C/min), hold (80% test time), power-on (20% test time), el. test, warm up (1 °C/min), el. test between 6 and 16 h | Physical: Ice formation, structural changes, hardening, brittleness, increase in viscosity, contraction<br>Electrical: Drift parameters, opens |
| Vibrations (random) | 30 min random acceleration with rectangular spectrum 20 to 2000 Hz and an acceleration spectral density of 0.03, 0.1, or $0.3\,g_n^2/\mathrm{Hz}$:<br>El. test, stress, visual inspection, el. test | Physical: Structural changes, fracture of fixings and housings, loosening of connections, fatigue<br>Electrical: Opens, shorts, contact problems, noise |
| Vibrations (sinusoidal) | 30 min at $2\,g_n$ (0.15 mm), $5\,g_n$ (0.35 mm), or $10\,g_n$ (0.75 mm) at the resonant freq. and the same test duration for swept freq. (3 axes):<br>El. test, resonance determination, stress at the resonant frequencies, stresses at swept freq. (10 to 500 Hz), visual inspection, el. test | Physical: Structural changes, fracture of fixings and housings, loosening of connections, fatigue<br>Electrical: Opens, shorts, contact problems, noise |
| Mechanical shock (impact) | 1000, 2000 or 4000 impacts (half sine curve, 30 or $50\,g_n$ peak value and 6 ms duration) in the main loading direction or distributed in the various impact directions:<br>El. test, stress (1 to 3 impacts/s), inspection (shock absorber), visual inspection, el. test | Physical: Structural changes, fracture of fixings and housings, loosening of connections, fatigue<br>Electrical: Opens, shorts, contact problems, noise |
| Free fall | 26 free falls from 50 or 100 cm drop height distributed over all surfaces, corners and edges, with or without transport packaging:<br>El. test, fall onto a 5 cm thick wooden block (fir) on a 10 cm thick concrete base, visual insp., el. test | Physical: Structural changes, fracture of fixings and housings, loosening of connections, fatigue<br>Electrical: Opens, shorts, contact problems, noise |

Table 3.1 Examples for environmental test conditions for electronic/electromechanical equipment and systems (according to IEC 60068 [3.8])

 $g_n \approx 10 \text{ m/s}^2$ ; el. = electrical

At *component level,* the stresses produced by the component itself, due to its internal electrical or mechanical load, *add* to those caused by the equipment or system environmental conditions. The *sum* of these stresses gives the *operating conditions,* necessary to determine the *stress at component level* and the corresponding *failure rate.* For instance, the ambient temperature *inside* an electronic assembly can be just a few °C higher than the temperature of the cooling medium, if forced cooling is used, but can become more than 30°C higher than the ambient temperature if cooling is poor.

### **3.1.2 Performance Parameters**

The required *performance parameters* at component level are defined by the intended application. Once these requirements are established, the necessary *derating* is determined taking into account the quantitative relationship between failure rate and stress factors (Sections 2.2.3, 2.2.4, 5.1.1). It must be noted that the use of "better" components does not necessarily imply better performance and / or reliability. For instance, a faster IC family can cause EMC problems, besides higher power consumption and chip temperature. In critical cases, component selection should not be based only on short data sheet information. Knowledge of parameter sensitivity can be mandatory for the application considered.

## **3.1.3 Technology**

Technology is rapidly evolving for many electronic components, see Fig. 3.1 and Table A10.1 for some basic information. As each technology has its advantages and weaknesses with respect to performance parameters and / or reliability, it is necessary to have a set of rules which can help to select a technology. Such rules (*design guidelines* in Section 5.1) are evolving and have to be periodically refined.

Of particular importance for *integrated circuits* (ICs) is the selection of the packaging form and type.

For the *packaging form,* distinction is made between inserted and surface-mounted devices. *Inserted devices* offer the advantage of easy handling during the manufacture of PCBs and also of lower sensitivity to *manufacturing defects or deviations.* However, the number of pins is limited. *Surface mount devices* (SMD) allow a large number of pins (more than 196 for PQFP and BGA), are cost and space saving, and have better electrical performance because of the shortened and symmetrical bond wires. However, compared to inserted devices, they have greater junction to ambient *thermal resistance,* are more stressed during soldering, and solder joints have a much lower mechanical strength (Section 3.4). Difficulties can be expected with pitch lower than 0.3 mm, in particular if thermal and / or mechanical stresses occur in field (Sections 3.4 and 8.3).

Figure 3.1 Basic IC technology evolution

*Packaging types* are subdivided into *hermetic* (ceramic, cerdip, metal can) and *nonhermetic* (plastic) packages. *Hermetic packages* should be preferred in applications with high humidity or in corrosive ambiance, in any case if moisture condensation occurs on the package surface. Compared to plastic packages they offer lower *thermal resistance* between chip and case (Table 5.2), but are more expensive and sensitive to damage (microcracks) caused by inappropriate handling (mechanical shocks during testing or PCB production). *Plastic packages*  are inexpensive, less sensitive to thermal or mechanical damage, but are permeable to *moisture* (other problems related to epoxy, such as ionic contamination and low *glass-transition temperature,* have been solved). However, better epoxy quality as well as new *passivation* (glassivation) based on silicon nitride leads to a much better protection against corrosion than formerly (Section 3.2.3, point 8).

If the results of qualification tests are good, the *use of ICs in plastic packages*  can be allowed if one of the following conditions is satisfied:

- 1. Continuous operation, relative humidity < 70%, noncorrosive or marginally corrosive environment, junction temperature $\leq 100\,^{\circ}\mathrm{C}$, and equipment useful life less than 10 years.
- 2. Intermittent operation, relative humidity < 60%, noncorrosive environment, no moisture condensation on the package, junction temperature $\leq 100\,^{\circ}\mathrm{C}$, and equipment useful life less than 10 years.

For ICs with silicon nitride *passivation* (glassivation), the conditions stated in Point 1 above should also apply for the case of intermittent operation.
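The two conditions above, together with the silicon nitride remark, can be read as a simple decision rule. The following is a minimal sketch of such a rule, not a replacement for the qualification itself; the `Application` container, its field names, and the three corrosivity labels are hypothetical choices made for illustration, and the limits are copied literally from Points 1 and 2.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """Hypothetical container for the application data used by Points 1 and 2."""
    continuous_operation: bool      # False means intermittent operation
    relative_humidity_pct: float
    corrosivity: str                # "none", "marginal", or "corrosive" (assumed labels)
    condensation: bool              # moisture condensation on the package
    junction_temp_c: float
    useful_life_years: float
    si3n4_passivation: bool = False


def plastic_package_allowed(app: Application, qualification_ok: bool) -> bool:
    """Literal encoding of Points 1 and 2 for ICs in plastic packages."""
    if not qualification_ok:
        return False  # good qualification results are a precondition
    common = app.junction_temp_c <= 100 and app.useful_life_years < 10
    point_1 = (common and app.relative_humidity_pct < 70
               and app.corrosivity in ("none", "marginal"))
    point_2 = (common and app.relative_humidity_pct < 60
               and app.corrosivity == "none" and not app.condensation)
    if app.continuous_operation or app.si3n4_passivation:
        # Point 1 limits; with Si3N4 passivation they also cover intermittent use.
        return point_1
    return point_2


# Intermittent operation at 65% relative humidity, marginally corrosive environment.
app = Application(False, 65.0, "marginal", False, 90.0, 8.0)
print(plastic_package_allowed(app, qualification_ok=True))
```

In this sketch the same application is rejected without silicon nitride passivation (Point 2 requires less than 60% relative humidity and a noncorrosive environment) and accepted with it, since the Point 1 limits then apply.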

## **3.1.4 Manufacturing Quality**

The *quality of manufacture* has a great influence on electronic component reliability. However, information about *global* defective probabilities (fraction of defective items) or agreed AQL values (even *zero defects)* are often *not sufficient* to monitor the *reliability level* (AQL is nothing more than an agreed upper limit of the defective probability, generally at a *producer risk*  $\alpha \approx 10\%$ , see Section 7.1.3). Information about changes in the *defective probability* and the results of the corresponding *fault analysis* are important. For this, a direct *feedback* to the component manufacturer is generally more useful than an agreement on an AQL value.

## **3.1.5 Long-Term Behavior of Performance Parameters**

The *long-term stability* of performance parameters is an important selection criterion for electronic components, allowing differentiation between good and poor manufacturers (Fig. 3.2). Verification of this behavior is generally undertaken with *accelerated reliability tests* (trends are often enough for many practical applications).

### **3.1.6 Reliability**

The reliability of an electronic component can often be specified by its *failure rate* $\lambda$. Failure rate figures obtained from field data are valid if *intrinsic* failures can be *separated* from *extrinsic* ones and reliable data / information are available. Those figures given by component manufacturers are useful if calculated with appropriate values for the (global) *activation energy* (for instance, 0.4 to 0.6 eV for ICs) and *confidence level* (> 60% two sided or > 80% *one sided*, see Section 7.1.1). Moreover, besides the numerical value of $\lambda$, the influence of the *stress factor* (derating) $S$ is important as a selection criterion (Eq. (2.1), Table 5.1).



Figure 3.2 Long-term behavior of performance parameters

# **3.2 Qualification Tests for Complex Electronic Components**

The purpose of a *qualification test* is to verify the *suitability* of a given item (material, component, assembly, equipment, system) for a stated application. Qualification tests are often a part of a *release procedure,* for instance, prototype release for a manufacturer and release for acceptance in a *preferred list* (*qualified part list*) for a user. Such a test is generally necessary for new technologies or after important redesigns or production process changes. Additionally, periodic *requalification* of critical parameters is often necessary to monitor quality and reliability.

Electronic component qualification tests cover *characterization, environmental and special tests,* as well as *reliability tests.* They must be supported by intensive *failure (fault) analysis* to investigate *relevant failure mechanisms* (and *causes).*  For a user, such a qualification test must consider:

- 1. Range of validity, narrow enough to be representative, but sufficiently large to cover company's needs and to repay test cost.
- 2. Characterization, to investigate the electrical performance parameters.
- 3. Environmental and special tests, to check technology limits.
- 4. Reliability tests, to gain information on the failure rate.
- 5. Failure analysis, to identify failure causes and investigate failure mechanisms.
- 6. Supply conditions, to define cost, delivery schedules, second sources, etc.
- 7. Final report and feedback to the manufacturer.

The extent of the above steps depends on the *importance* of the component being considered, the *effect* (consequence) of its failure in an equipment or system, and the *experience* previously gained with similar components and with the same manufacturer. National and international activities are moving toward agreements which should make a qualification test by the user unnecessary for many components [3.8, 3.18]. Procedures for environmental tests are often defined in *standards* [3.8, 3.12].

A comprehensive qualification test procedure for ICs in *plastic packages* is given in Fig. 3.3. One recognizes the major steps (characterization, environmental and special tests, reliability tests, and failure analysis) of the above list. Environmental tests cover the thermal, climatic, and mechanical stresses expected in the application under consideration. The number of devices required for the reliability tests should be determined in order to *expect 3 to 6 failures during burn-in.* The procedure of Fig. 3.3 has been applied extensively (with device-specific aspects like data retention and programming cycles for nonvolatile memories, or modifications because of ceramic packages) to 12 memories each with 2 to 4 manufacturers for comparative investigations [3.2 (1993), 3.6, 3.16]. The cost for a qualification test based on Fig. 3.3 for 2 manufacturers (comparative studies) can exceed US\$ 50,000.
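The sample-size rule (3 to 6 expected failures) can be illustrated with a short calculation. The sketch below assumes a constant failure rate $\lambda$ at operating conditions and an acceleration factor $A$ during burn-in, so that each device fails within the test time $t$ with probability $1 - e^{-A \lambda t}$. The function name and all numerical values are hypothetical and are not taken from Fig. 3.3.

```python
import math


def devices_for_expected_failures(expected_failures, failure_rate_per_h,
                                  acceleration, test_hours):
    """Smallest n such that n * (1 - exp(-A * lambda * t)) >= expected_failures.

    Assumes a constant failure rate lambda and a constant acceleration
    factor A over the whole test (an illustrative simplification).
    """
    p_fail = 1.0 - math.exp(-acceleration * failure_rate_per_h * test_hours)
    return math.ceil(expected_failures / p_fail)


# Hypothetical values: lambda = 1e-7 per hour, A = 100, 2000 h burn-in.
for k in (3, 6):
    n = devices_for_expected_failures(k, 1e-7, 100, 2000)
    print(f"{k} expected failures -> {n} devices")
```

With these assumed values the rule leads to roughly 150 to 300 devices; a lower failure rate or a smaller acceleration factor increases the number accordingly.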

### **3.2.1 Electrical Test of Complex ICs**

Electrical test of VLSI ICs is performed according to the following three steps:

- 1. Continuity test.
- 2. Test of DC parameters.
- 3. Functional and dynamic test (AC).

The *continuity test* checks whether every pin is connected to the chip. It consists in forcing a prescribed current  $(100\mu A)$  into one pin after another (with all other pins grounded) and measuring the resulting voltage. For inputs with protection diodes and for normal outputs this voltage should lie between  $-0.1$  and  $-1.5$  V.
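The pass criterion of the continuity test is a simple window check on the measured pin voltages. The following minimal sketch assumes that the voltages have already been measured (with the prescribed current forced into each pin in turn) and are available as a mapping from pin name to volts; the function name and the example readings are hypothetical.

```python
def continuity_failures(pin_voltages, v_min=-1.5, v_max=-0.1):
    """Return the pins whose measured voltage lies outside [v_min, v_max].

    The default window is the -0.1 V to -1.5 V range stated for inputs
    with protection diodes and for normal outputs.
    """
    return [pin for pin, v in pin_voltages.items()
            if not (v_min <= v <= v_max)]


# Hypothetical readings in volts for four pins.
readings = {"A0": -0.62, "A1": -0.65, "D0": -0.02, "D1": -2.40}
print(continuity_failures(readings))
```

Here the pins `D0` and `D1` would be flagged, since their readings fall outside the window; interpreting the kind of defect behind each out-of-window value is left to the test engineer.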

Verification of *DC parameters* is simple. It is performed according to the manufacturer's specifications without restrictions (disregarding very low input currents). For this purpose a *precision measurement unit* (PMU) is used to force a current and measure a voltage ($V_{OH}$, $V_{OL}$, etc.) or to force a voltage and measure a current ($I_{IH}$, $I_{IL}$, etc.). Before each step, the IC inputs and outputs are brought to the logical state necessary for the measurement.

The *functional test* is performed together with the verification of the dynamic parameters, as shown in Figure 3.4. The generator in Fig. 3.4 delivers one row after another of the *truth table* which has to be verified, with a frequency $f_o$. For a 40-pin IC, these are 40-bit words. Of these binary words, called *test vectors*, the inputs are applied to the *device under test* (DUT) and the expected outputs to a logical comparator. The actual outputs from the DUT and the expected outputs are compared at a time point selected with high accuracy by a strobe. Modern VLSI *automatic test equipment* (ATE) for digital ICs has test frequencies $f_o > 600$ MHz and an overall precision better than 200 ps (resolution < 30 ps). In a VLSI ATE not only the strobe but other pulses can be varied over a wide range. The *dynamic parameters* can be verified in this way. However, the direct measurement of a time delay or of a rise time is in general time-consuming. The main problem with a functional test is that it is not possible to verify all the states and state sequences of a VLSI IC. To see this, consider, for instance, that for an $n \times 1$ cell memory there are $2^n$ states and $n!$ possible address sequences; the corresponding *truth table* would contain $2^n \cdot n!$ rows, giving more than $10^{100}$ for $n = 64$.

Figure 3.4 Principle of functional and AC testing for LSI and VLSI ICs

Figure 3.3 Example for a comprehensive qualification test procedure for complex ICs in plastic (Pl) packages (industrial application with normal environmental conditions ($G_B$ in Table 2.3), 3 to 6 expected failures during reliability test ($A\lambda \approx 2 \cdot 10^{-3}\,\mathrm{h}^{-1}$ in this example), RH = relative humidity); +) 150°C by Epoxy resin, 175°C by Silicon resin; +) 1000 h by Si$_3$N$_4$ passivation

The procedure used in practical applications takes into account one or more of the following:

- *partitioning* the device into modules and testing each of them separately,
- finding out *regularities* in the truth table or given by technological properties,
- limiting the test to the part of the truth table which is important for the application under consideration.
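The size of the exhaustive truth table quoted above ($2^n \cdot n!$ rows for an $n \times 1$ cell memory) can be checked directly with exact integer arithmetic. The short sketch below only evaluates that expression; the function name is chosen for illustration.

```python
import math


def truth_table_rows(n):
    """Number of rows 2^n * n! for an n x 1 cell memory (exact integer)."""
    return 2 ** n * math.factorial(n)


rows = truth_table_rows(64)
print(f"n = 64: about {rows:.2e} rows")
print("more than 10^100:", rows > 10 ** 100)
```

Even for this very small memory the count exceeds $10^{100}$, which is why the strategies listed above (partitioning, exploiting regularities, restricting the test to the relevant part of the truth table) are needed in practice.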

The above limitations raise the question of *test coverage,* i.e., the percentage of faults which are detected by the test. A precise answer to this question can only be given in *some particular cases,* because information about the faults which *actually* appear in a given IC is often lacking. *Fault models,* such as stuck-at-zero, stuck-at-one, or bridging are useful for PCB testing, but generally of limited utility for a test engineer at the component level.

For packaged VLSI ICs, the electrical test should be performed at 70°C or at the highest specified operating temperature.

## **3.2.2 Characterization of Complex ICs**

*Characterization* is a parametric, experimental analysis of the electrical properties of a given IC. Its purpose is to investigate the influence of different operating conditions such as supply voltage, temperature, frequency, and logic levels on the IC's behavior and to deliver a cost-effective test program for incoming inspection. For this reason a characterization is performed at 3 to 5 different temperatures and with a large number of different patterns.



Figure 3.5 Example of test patterns for memories (see Table 3.2 for pattern sensitivity)

| Test pattern | Functional: D, H, S, O | Functional: C (pattern dependent) | Dyn. parameters: A, RA | Dyn. parameters: C (pattern and level dependent) | Number of test steps | Approx. test time [s], bit addr. | Approx. test time [s], word addr. |
|---|---|---|---|---|---|---|---|
| Checkerboard | fair | poor | | | $4n$ | | 0.05 |
| March | good | poor | poor | | $5n$ | | 0.06 |
| Diagonal | good | fair | poor | poor | $10n$ | | 0.13 |
| Surround | good | good | fair | fair | $26n - 16\sqrt{n}$ | 27 | 0.34 |
| Butterfly | good | good | good | fair | $8n^{3/2} + 2n$ | $8 \cdot 10^{3}$ | 38 |
| Galloping one | good | good | good | good | $4n^2 + 6n$ | $4 \cdot 10^{5}$ | $7 \cdot 10^{3}$ |

Table 3.2 Suitability of various test patterns for detecting faults in SRAMs, and approximate test times for a 100 ns 128K × 8 SRAM (tests on a Sentry S50, scrambling table with IDS5000 EBT)

A = addressing, C = capacitive coupling, D = decoder, H = stuck at 0 or at 1, O = open, S = short, RA = read amplifier recovery time

Referring to the functional and AC measurements, Figure 3.5 shows some *basic patterns* for memories. These patterns are generally performed twice, direct and inverse. For the patterns of Fig. 3.5, Table 3.2 gives a qualitative indication of the corresponding pattern sensitivity for static random access memories (SRAMs), and the approximate test time for a 128K × 8 SRAM. *Quantitative* evaluation of *pattern sensitivity* or of *test coverage* is seldom possible in general, because of the limited validity of the *fault models* available (Sections 4.2.1 and 5.2.2). As shown in Table 3.2, test time strongly depends on the pattern selected. As test times greater than 10 s per pattern are long also in the context of a characterization (the same pattern will be repeated several thousand times, see e.g. Fig. 3.6), development of efficient test patterns is mandatory [3.2 (1989), 3.6, 3.16, 3.20]. For such investigations, the relationship between address and physical location (scrambling table) of the corresponding cell on the chip is important (in particular considering the increased presence of spare rows/columns in large memories [3.11]). If design information is not available, an *electron beam tester* (EBT) can be helpful to establish the *scrambling table.*
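The dependence of test time on the pattern follows from the step counts listed in Table 3.2. As a rough check, the sketch below multiplies each step count by the cycle time, assuming one test step per 100 ns cycle and word addressing ($n = 128\text{K} = 131072$ addresses); both assumptions, and the function and dictionary names, are illustrative and ignore tester overhead.

```python
import math

# Number of test steps as a function of the number of addresses n (Table 3.2).
STEP_COUNTS = {
    "Checkerboard": lambda n: 4 * n,
    "March": lambda n: 5 * n,
    "Diagonal": lambda n: 10 * n,
    "Surround": lambda n: 26 * n - 16 * math.sqrt(n),
    "Butterfly": lambda n: 8 * n ** 1.5 + 2 * n,
    "Galloping one": lambda n: 4 * n ** 2 + 6 * n,
}


def pattern_time_s(pattern, n, cycle_s=100e-9):
    """Approximate time for one pass of a pattern: steps times cycle time."""
    return STEP_COUNTS[pattern](n) * cycle_s


n_words = 128 * 1024  # word addressing of a 128K x 8 SRAM
for name in STEP_COUNTS:
    print(f"{name:14s} {pattern_time_s(name, n_words):10.2f} s")
```

Under these assumptions the results are close to the word-addressed column of Table 3.2, from about 0.05 s for the checkerboard to several thousand seconds for the galloping one pattern.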

An important evaluation tool during a characterization of complex ICs is the *shmoo plot.* A shmoo plot is the representation in an $x/y$-diagram of the operating region of an IC as a function of two parameters. As an example, Fig. 3.6 gives the shmoo plots for $t_A$ versus $V_{CC}$ of a 128K × 8 SRAM for two patterns and two ambient temperatures [3.6]. For Fig. 3.6, the test pattern has been performed about 4000 times (2 × 29 × 61), each with a different combination of $V_{CC}$ and $t_A$. If no fault is detected, an x, otherwise a $\cdot$, is plotted (defective cells are generally retested once, to confirm the fault). As shown in Fig. 3.6, a small (probably capacitive) coupling between nearby cells exists for this device, as a butterfly pattern is more sensitive than the diagonal pattern to this kind of fault. Statistical evaluation of shmoo plots is often done with *composite shmoo-plots* in which each record is labeled in 10% steps.
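The construction of a shmoo plot can be sketched in a few lines: the test is repeated for every combination of the two parameters and the pass/fail result is recorded in a grid. In the sketch below the function `run_pattern` stands in for a real test run on the ATE; its linear pass boundary is purely hypothetical and does not describe the device of Fig. 3.6.

```python
def shmoo(passes, x_values, y_values, pass_mark="x", fail_mark="."):
    """Return the shmoo plot as text rows, largest y value on top.

    passes(x, y) must return True if no fault is detected at that point.
    """
    rows = []
    for y in reversed(y_values):
        rows.append("".join(pass_mark if passes(x, y) else fail_mark
                            for x in x_values))
    return rows


def run_pattern(vcc, t_a):
    """Hypothetical device model: passes if the access time t_A (ns)
    allowed by the tester is long enough for the given V_CC (V)."""
    return t_a >= 140 - 10 * vcc


vcc_values = [4.0, 4.5, 5.0, 5.5, 6.0]   # x axis, in V
t_a_values = [80, 85, 90, 95, 100]       # y axis, in ns
for t_a, row in zip(reversed(t_a_values), shmoo(run_pattern, vcc_values, t_a_values)):
    print(f"{t_a:4d} ns  {row}")
```

Comparing two such grids point by point, for instance for two patterns or two temperatures, is what reveals pattern- or temperature-dependent sensitivities of the kind discussed above.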

| Parameter | | 25°C, $V_{DD}$ = 12 V | 25°C, $V_{DD}$ = 15 V | 25°C, $V_{DD}$ = 18 V | 70°C, $V_{DD}$ = 12 V | 70°C, $V_{DD}$ = 15 V | 70°C, $V_{DD}$ = 18 V |
|---|---|---|---|---|---|---|---|
| $I_{DD}$ ($\mu$A) | min | 310 | 410 | 560 | 260 | 340 | 470 |
| | mean | 331 | 435 | 588 | 270 | 358 | 504 |
| | max | 340 | 450 | 630 | 290 | 390 | 540 |
| $V_{OH}$ (V), $I_{OH}$ = 2.4 mA | min | 11.04 | 14.16 | 17.24 | 10.96 | 14.12 | 17.16 |
| | mean | 11.14 | 14.25 | 17.32 | 11.03 | 14.15 | 17.24 |
| | max | 11.20 | 14.33 | 17.40 | 11.12 | 14.20 | 17.32 |
| $V_{OL}$ (V), $I_{OL}$ = 2.4 mA | min | 0.40 | 0.36 | 0.32 | 0.44 | 0.24 | 0.32 |
| | mean | 0.47 | 0.42 | 0.38 | 0.52 | 0.45 | 0.41 |
| | max | 0.52 | 0.44 | 0.44 | 0.60 | 0.52 | 0.48 |
| $V_{Hyst}$ (V) | min | 2.65 | 3.19 | 3.89 | 2.70 | 3.19 | 3.79 |
| | mean | 2.76 | 3.33 | 3.97 | 2.75 | 3.32 | 3.93 |
| | max | 2.85 | 3.44 | 4.09 | 2.85 | 3.44 | 4.04 |

Table 3.3 DC parameters for a 40-pin CMOS ASIC specially developed for high noise immunity and with Schmitt-trigger inputs (20 ICs)

From the above considerations one recognizes that in general only a *small part* of the possible states and state sequences can be tested. The definition of appropriate test patterns must thus pay attention to the specific device, its technology and regularities in the truth table, as well as to information about its application and experience with similar devices [3.2 (1989), 3.6]. A close *cooperation* between test engineer and user, and also if possible with the device designer and manufacturer, can help to reduce the amount of testing.

As stated in Section 3.2.1, measurement of DC parameters presents no difficulties. As an example, Table 3.3 gives some results for an application specific CMOS-IC (ASIC) specially developed for high noise immunity.

## **3.2.3 Environmental and Special Tests of Complex ICs**

The aim of *environmental and special tests* is to submit a given IC to stresses which can be more severe than those encountered in field operation, in order to investigate *technological limits* and *failure mechanisms.* Such tests are often destructive. A failure analysis after each stress is important to evaluate *failure mechanisms* and to detect *degradation* (Section 3.3). Kind and extent of environmental and special tests depend on the intended application ( $G_F$  for Fig. 3.3) and specific characteristics of the component considered. The following is a description of the environmental and special tests given in Fig. 3.3 (considerations on production related potential reliability problems are in Sections 3.3 & 3.4, see also Figs. 3.7, 3.9, 3.10):



Figure 3.6 Shmoo plots of a 100 ns  $128 K \times 8$  SRAM for test patterns a) Diagonal and b) Butterfly at two ambient temperatures  $0^{\circ}C$  ( $\cdot$ ) and  $70^{\circ}C$  (x)

- 1. *Internal Visual Inspection:* Two ICs are inspected and then kept as a reference for comparative investigation (check for damage after stresses). Before opening (using wet chemical or plasma etching), the ICs are x-rayed to locate the chip and to detect irregularities (package, bonding, die attach, etc.) or impurities. After opening, inspection is made with optical microscopes (conventional or stereo) and SEM if necessary. Improper placement of bonds, excessive height and looping of the bonding wires, contamination, etching, or metallization defects can be seen. Many of these deficiencies often have only a marginal effect on reliability. Figure 3.7a shows a limiting case (mask misalignment). Figure 3.7b shows voids in the metallization of a 1M DRAM.
- 2. *Passivation Test:* Passivation (glassivation) is the protective coating, usually *silicon dioxide* (PSG) and / or *silicon nitride,* placed on the entire (die) surface. For ICs in plastic packages it should ideally be free from *cracks* and *pinholes*. To check this, the chip is immersed for about 5 min in a 50°C warm mixture of nitric and phosphoric acid and then inspected with an optical microscope (e.g. as in *MIL-STD-883 method 2021* [3.12]). *Cracks* occur in a silicon dioxide passivation if the content of phosphorus is $< 2\%$. However, more than 4% phosphorus activates the formation of phosphoric acid. As a solution, *silicon nitride* passivation (often together with silicon dioxide in separate layers) has been introduced. Such a passivation shows much more resistance to the penetration of moisture (see humidity tests in Point 8 below) and of ionic contamination.
- *3. Solderability:* Solderability of tinned pins should no longer constitute a problem today, except after a very long storage time in a non-protected ambient or after a long burn-in or high-temperature storage. However, problems can arise with gold or silver plated pins, see Section 5.1.5.4. The solderability test is performed according to established *standards* (e.g. *IEC 60068-2* or *MIL-STD-883* [3.8, 3.12]) after conditioning, generally using the solder bath or the meniscograph method.
- *4. Electrostatic Discharge (ESD):* Electrostatic discharges during *handling, assembling,* and *testing* of electronic components and populated printed circuit boards (PCBs) can destroy or damage sensitive components, particularly *semiconductor devices*. All IC families and many discrete electronic components are sensitive to ESD. Integrated circuits have in general *protection circuits,* passive and more recently active (better protection by a factor $\geq 2$). To determine *ESD immunity*, i.e., the voltage value at which damage occurs, different pulse shapes (models) and procedures to perform the test have been proposed. For semiconductor devices, the *human body model* (HBM) and the *charged device model* (CDM) are the most widely used. The CDM seems to apply better than the HBM in reproducing some of the damage observed in field applications (see Section 5.1.4 for further details). Based on the experiences gained in qualifying 12 memory types according to Fig. 3.3 [3.2 (1993), 3.6], the following procedure can be suggested for the HBM:
	- 1. 9 ICs divided into 3 equal groups are tested at 500, 1000, and 2000 V, respectively. Taking note of the results obtained during these preliminary tests, 3 new ICs are stressed with steps of 250 V up to the voltage at which damage occurs ($V_{ESD}$). 3 further ICs are then tested at $V_{ESD} - 250$ V to confirm that no damage occurs.
	- 2. The test consists of 3 positive and 3 negative pulses applied to each pin within 30 s. Pulses are generated by discharging a 100 pF capacitor through a $1.5\,\text{k}\Omega$ resistor placed in series with the capacitor (HBM), wiring inductance $< 10\,\mu\text{H}$. Pulses are between pin and ground, unused pins open.
	- 3. Before and after each test, leakage currents (when possible with the limits ±1 pA for open and ±200 nA for short) and electrical characteristics are measured (electrical test as after any other environmental test).

Experience shows that an electrostatic discharge often occurs between 1000 and 4000 V. The model parameters of 100 pF and $1.5\,\text{k}\Omega$ for the HBM are average values measured with humans (80 to 500 pF, 50 to 5000 $\Omega$, 2 kV on synthetic floor and 0.5 kV on an antistatic floor with a relative humidity of about 50%). A new model for latent damage caused by ESD has been developed in [3.60 (1995)]. Protection against ESD is discussed in Sections 5.1.4 and 5.1.5.4, see also Section 3.3.4.
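The HBM network used in step 2 of the procedure (100 pF discharged through 1.5 kΩ) can be put into numbers with a short calculation. The following Python sketch assumes an ideal RC discharge into a short circuit, i.e. it neglects the wiring inductance and the impedance of the device under test; the function name `hbm_pulse` is a hypothetical helper chosen for illustration, not part of any standard.

```python
def hbm_pulse(voltage_v, capacitance_f=100e-12, resistance_ohm=1500.0):
    """Idealized HBM discharge figures for a given charging voltage.

    Assumption: pure RC discharge into a short circuit (wiring inductance
    and device impedance neglected), so the peak current is an upper bound.
    Returns (peak current in A, time constant in s, stored energy in J).
    """
    peak_current_a = voltage_v / resistance_ohm        # I = V / R at t = 0
    time_constant_s = resistance_ohm * capacitance_f   # tau = R * C
    energy_j = 0.5 * capacitance_f * voltage_v ** 2    # E = C * V^2 / 2
    return peak_current_a, time_constant_s, energy_j


if __name__ == "__main__":
    # Preliminary test voltages of the suggested HBM procedure
    for v in (500, 1000, 2000):
        i_peak, tau, energy = hbm_pulse(v)
        print(f"{v:5d} V: I_peak = {i_peak:.2f} A, "
              f"tau = {tau * 1e9:.0f} ns, E = {energy * 1e6:.1f} uJ")
```

Under these assumptions the time constant is fixed by the network (150 ns), while peak current grows linearly and stored energy quadratically with the charging voltage.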



a) Alignment error at a contact window (SEM, $\times$10,000)



b) Opens in the metallization of a 1 M DRAM bit line, due to particles present during the photolithographic process (SEM,  $\times$ 2,500)



c) Cross section through two trench-capacitor cells of a  $4M$  DRAM (SEM,  $\times$ 5,000)



d) Silver dendrites near an Au bond ball (SEM, x800)



e) Electromigration in a 16K Schottky TTL PROM after 7 years field operation (SEM, x500)



f) Bond wire damage (delamination) in a plastic-packaged device after  $500 \times -50$  / +150°C thermal cycles (SEM, x500)



- *5. Technological Characterization:* Technological investigations are performed to check technological and process parameters with respect to *adequacy* and *maturity*. The extent of these investigations can range from a simple check (Fig. 3.7c) to a comprehensive analysis, because of detected weaknesses. Refinement of techniques and evaluation methods for *technological characterization* is still in progress, see e.g. [3.31 - 3.66, 3.70 - 3.93]. The following is a simplified, short description of some important *technological characterization methods:* 
	- *Latch-up* is a condition in which an IC latches into a nonoperative state drawing an excessive current (often a short between power supply and ground), and can only be returned to an operating condition through removal and reapplication of the power supply. It is typical for CMOS structures, but can also occur in other technologies where a PNPN structure appears. Latch-up is primarily induced by voltage overstresses (on signals or power supply lines) or by radiation. Modern devices often have a relatively high latch-up immunity (up to 200 mA injection current). A verification of latch-up sensitivity can become necessary for some special devices (ASICs for instance). Latch-up tests stimulate voltage overstresses on signal and power supply lines as well as power-on/power-off sequences.
	- *Hot Carriers* arise in micron and submicron MOSFETs as a consequence of *high electric fields* ($10^4$ to $10^5$ V/cm) in transistor channels. Carriers may gain sufficient kinetic energy (some eV, compared to 0.02 eV in thermal equilibrium) to surmount the potential barrier at the oxide interface. The injection of carriers into the gate oxide is generally followed by the creation of electron-hole pairs and causes an increasing degradation of the transistor parameters, in particular an increase with time of the threshold voltage $V_{TH}$, which can be measured in NMOS transistors. Effects on VLSI and ULSI ICs are an increase of switching times (access times in RAMs for instance), possible data retention problems (soft writing in EPROMs), and in general an increase of noise. Degradation through hot carriers is accelerated with increasing drain voltage and lowering temperature (negative activation energy of about $-0.03$ eV). The test is generally performed under dynamic conditions, at high power supply voltages (7 to 9 V) and at low temperatures ($-70$ to $-20$ °C).
	- *Time-Dependent Dielectric Breakdown* (TDDB) occurs in very thin gate oxide layers (< 20 nm) as a consequence of *extremely high electric fields* ($10^7$ to $10^8$ V/cm). The mechanism is described by the thermochemical (E) model up to about $10^7$ V/cm and by the carrier injection (1/E) model up to about $2 \cdot 10^7$ V/cm. An approach to unify both models has been proposed in [3.47 (1999)]. As soon as the critical threshold is reached, breakdown takes place, often suddenly. The effects of gate oxide breakdowns are increased leakage currents or shorts between gate and substrate. The development in time of this failure mechanism depends on process parameters and oxide defects. Particularly sensitive are memories > 4M. An *Arrhenius model* can be used for the temperature. Time-dependent dielectric breakdown tests are generally performed on special test structures (often capacitors).

	- *Electromigration* is the migration of metal atoms, and also of Si at the Al/Si interface, as a result of *very high current densities,* see Fig. 3.7e for an example of a 16K TTL PROM after 7 years of field operation. Earlier limited to ECL, electromigration also occurs today with other technologies (because of scaling). The median $t_{50}$ of the failure-free time as a function of the current density and temperature can be obtained from the empirical model given by Black [3.45], $t_{50} = B j^{-n} e^{E_a/kT}$, where $E_a = 0.55$ eV for pure Al (0.75 eV for Al-Cu alloy), $n = 2$, and $B$ is a process-dependent constant. Electromigration tests are generally performed at wafer level on test structures. Measures to avoid electromigration are optimization of grain structure (bamboo structures), use of Al-Si-Cu alloys for the metallization and of compressive passivation, as well as introduction of multilayer metallizations.
	- *Soft errors* can be caused by the process or chip design as well as by process deviations. Key parameters are MOSFET threshold voltages, oxide thickness, doping concentrations, and line resistance. If, for instance, the post-implant of a silicon layer has been improperly designed, its conductivity might become too low. In this case, the word lines of a DRAM could suffer from signal reductions and at the end of the word line soft errors could be observed on some cells. As a further example, if logical circuits with different signal levels are unshielded and arranged close to the border of a cell array, stray coupling may destroy the information of cells located close to the circuit (chip design problem). Finally, process deviations can cause soft errors. For instance, signal levels can be degraded when metal lines are locally reduced to less than half of their width by the influence of dirt particles. The characterization of soft errors is difficult in general. At the chip level, an electron beam tester allows the measurement of signals within the chip circuitry. At the wafer level, single test structures located in the space between the chips (kerf) can be used to measure and characterize important parameters independently of the chip circuitry. These structures can usually be contacted by needles, so that a well equipped bench setup with high-resolution I-V and C-V measurement instrumentation would be a suitable characterization tool.
	- *Data Retention* and *Program/Erase Cycles* are important for nonvolatile memories (EPROM, EEPROM, FLASH). A test for data retention generally consists of storage (bake) at high temperature (2000 h at 125°C for plastic packages and 500 h at 250°C for ceramic packages) with an electrical test at 70°C at 0, 250, 500, 1000, and 2000 h (often using a checkerboard pattern with measurement of $t_{AA}$ and of the margin voltage). Experimental investigation of EPROM data retention at temperatures higher than 250°C has shown a deviation from the charge loss predicted by the thermionic model [3.6, 3.36]. Typical values for program/erase cycles during a qualification test are 100 for EPROMs and 10,000 for EEPROMs and Flash memories.
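Black's model quoted above for electromigration, $t_{50} = B j^{-n} e^{E_a/kT}$, lends itself to a quick comparison between stress and use conditions, because the process-dependent constant $B$ cancels in the ratio of two median lifetimes. The Python sketch below is only an illustration: the function names and the current densities and temperatures in the usage example are hypothetical choices, while $E_a = 0.55$ eV and $n = 2$ are the values given in the text for pure Al.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # eV/K (the text rounds this to 8.6e-5)


def black_t50(b_const, current_density, temp_c, e_a_ev=0.55, n=2.0):
    """Median failure-free time from Black's model: B * j**(-n) * exp(Ea / kT)."""
    temp_k = temp_c + 273.15
    return (b_const * current_density ** (-n)
            * math.exp(e_a_ev / (BOLTZMANN_EV_PER_K * temp_k)))


def em_acceleration(j_use, j_stress, temp_use_c, temp_stress_c,
                    e_a_ev=0.55, n=2.0):
    """Ratio t50(use) / t50(stress); the constant B cancels out."""
    return (black_t50(1.0, j_use, temp_use_c, e_a_ev, n)
            / black_t50(1.0, j_stress, temp_stress_c, e_a_ev, n))


if __name__ == "__main__":
    # Hypothetical conditions: current densities in A/cm^2, temperatures in deg C
    factor = em_acceleration(j_use=2e5, j_stress=2e6,
                             temp_use_c=85, temp_stress_c=200)
    print(f"Acceleration factor (illustrative conditions): {factor:.3g}")
```

With $n = 2$, doubling the current density at constant temperature shortens the median lifetime by a factor of 4; the temperature contribution follows the usual Arrhenius form.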

- *6. High-Temperature Storage:* The purpose of high-temperature storage is the stabilization of the thermodynamic equilibrium, and consequently of the IC's electrical parameters. Failure mechanisms related to surface problems (contamination, oxidation, contacts, charge induced failures) are activated. To perform the test, the ICs are placed on a metal tray (pins on the tray to avoid thermal voltage stresses) in an oven at 150°C for 200 h. Should solderability be a problem, a protective atmosphere  $(N_2)$  can be used. Experience shows that for a mature technology (design and production processes), high temperature storage produces only a very few failures (see also Section 8.2.2).
- *7. Thermal Cycles:* The purpose of thermal cycles is to test the IC's ability to support rapid temperature changes. This activates failure mechanisms related to mechanical stresses caused by mismatch in the expansion coefficients of the materials used, as well as wearout because of fatigue, see Fig. 3.7f for an example. Thermal cycles are generally performed from air to air in a two-chamber oven (transfer from one chamber to the other with a lift). To perform the test, the ICs are placed on a metal tray (pins on the tray to avoid thermal voltage stresses) and subjected to 2,000 thermal cycles from $-65$°C $(+0, -10)$ to $+150$°C $(+15, -0)$, transfer time $\leq 1$ min, time to reach the specified temperature $\leq 15$ min, dwell time at the temperature extremes $\geq 10$ min. Should solderability be a problem, a protective atmosphere ($N_2$) can be used. Experience shows that for a mature technology (design and production processes), failures should not appear before some thousand thermal cycles (lower figures for power devices).
- *8. Humidity or Damp Heat Test, 85/85* and *pressure cooker:* The aim of humidity tests is to investigate the influence of *moisture* on the chip surface, in particular corrosion. The following two procedures are often used:
	- (i) Atmospheric pressure, $85 \pm 2$°C and $85 \pm 5$% rel. humidity (*85/85 test*) for 168 to 5,000 h.
	- (ii) Pressurized steam, $110 \pm 2$°C or $120 \pm 2$°C or $130 \pm 2$°C and $85 \pm 5$% rel. humidity (*pressure-cooker test* or *highly accelerated stress test* (HAST)) for 24 to 408 h (1,000 h for silicon nitride passivation).

In both cases, a *voltage bias* is applied during exposure in such a way that power consumption is as low as possible, while the voltage is kept as high as possible (*reverse bias* with adjacent metallization lines alternately polarized high and low, e.g. 1 h *on* / 3 h *off* intermittently if power consumption is greater than 0.01 W). For a detailed procedure one may refer to *IEC 60749* [3.8]. In the procedure of Fig. 3.3, both 85/85 and HAST tests are performed in order to correlate results and establish (empirically) a conversion factor. Of great importance for applications is the relation between the *failure rates* at elevated temperature and humidity (e.g. 85/85 or 120/85) and at field operating conditions (e.g. 40/60). A large number of models have been proposed in the literature to empirically fit the *acceleration factor A* associated with the 85/85 test

$$
A = \frac{\text{mean time to failure at lower stress } (\theta_1 / RH_1)}{\text{mean time to failure at 85/85 } (\theta_2 / RH_2)}. \tag{3.1}
$$

The most important of these models are

$$
A = \left(\frac{RH_2}{RH_1}\right)^3 e^{\frac{E_a}{k} \left(\frac{1}{T_1} - \frac{1}{T_2}\right)}, \tag{3.2}
$$

$$
A = e^{E_a [C_1 (\theta_2 - \theta_1) + C_2 (RH_2 - RH_1)]}, \tag{3.3}
$$

$$
A = e^{\left[\frac{E_a}{k} \left(\frac{1}{T_1} - \frac{1}{T_2}\right) + C_3 \left(RH_2^2 - RH_1^2\right)\right]}, \tag{3.4}
$$

$$
A = e^{\left[\frac{E_a}{k}\left(\frac{1}{T_1} - \frac{1}{T_2}\right) + C_4\left(\frac{1}{RH_1} - \frac{1}{RH_2}\right)\right]}, \tag{3.5}
$$

$$
A = e^{\left[\frac{1}{k} \left(\frac{E_a (RH_1)}{T_1} - \frac{E_a (RH_2)}{T_2}\right) + (RH_2 - RH_1)\right]}. \tag{3.6}
$$

In Eqs. (3.2) to (3.6), $E_a$ is the *activation energy*, $k$ the Boltzmann constant ($8.6 \cdot 10^{-5}$ eV/K), $\theta$ the temperature in °C, $T$ the absolute temperature (K), $RH$ the relative humidity, and $C_1$ to $C_4$ are constants. Equations (3.2) to (3.6) are based on the *Eyring model* (Eq. (7.59)); the influence of the temperature and the humidity is multiplicative in Eqs. (3.2) to (3.5). Eq. (3.2) has the same structure as in the case of electromigration (Eq. (7.60)). In all models, the technological parameters (type, thickness, and quality of the passivation, kind of epoxy, type of metallization, etc.) appear indirectly in the activation energy $E_a$ or in the constants $C_1$ to $C_4$. Relationships for HAST are more empirical. From the above considerations, 85/85 and HAST tests can be used as *accelerated tests* to assess the effect of damp heat combined with bias on ICs by accepting a numerical uncertainty in calculating the acceleration factor. As a *global value* for the acceleration factor referred to operating field conditions of 40°C and 60% RH, one can assume for PSG a value between 100 and 150 for the 85/85 test and between 1,000 and 1,500 for the 120/85 test. To assure 10 years field operation at 40°C and 60% RH, PSG-ICs should thus pass without evident corrosion damage about 1,000 h at 85/85 or 100 h at 120/85. Practical results show that *silicon-nitride glassivation* offers a much greater resistance to moisture than PSG by a factor up to 10 [3.6].
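To see how the quoted global values and test durations relate to each other, Eq. (3.2) can be evaluated numerically. The Python sketch below is an illustration under a stated assumption: the activation energy $E_a = 0.8$ eV is a hypothetical value chosen for this example (the text gives no figure, and $E_a$ depends on the technology). The function name is likewise illustrative, and applying Eq. (3.2) to the 120/85 condition is only indicative, since relationships for HAST are more empirical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # eV/K (the text rounds this to 8.6e-5)
HOURS_PER_YEAR = 8760


def humidity_acceleration(theta_use_c, rh_use, theta_test_c, rh_test,
                          e_a_ev=0.8, rh_exponent=3.0):
    """Acceleration factor per Eq. (3.2).

    A = (RH2 / RH1)**3 * exp(Ea / k * (1/T1 - 1/T2)), index 1 = field
    conditions, index 2 = test conditions. Ea = 0.8 eV is an ASSUMED value.
    """
    t1 = theta_use_c + 273.15   # absolute field temperature (K)
    t2 = theta_test_c + 273.15  # absolute test temperature (K)
    humidity_term = (rh_test / rh_use) ** rh_exponent
    temperature_term = math.exp(
        e_a_ev / BOLTZMANN_EV_PER_K * (1.0 / t1 - 1.0 / t2))
    return humidity_term * temperature_term


if __name__ == "__main__":
    field_hours = 10 * HOURS_PER_YEAR  # 10 years of field operation
    for theta_test, rh_test in ((85, 85), (120, 85)):
        a = humidity_acceleration(40, 60, theta_test, rh_test)
        print(f"{theta_test}/{rh_test} vs 40/60: A = {a:.0f}, "
              f"equivalent test time = {field_hours / a:.0f} h")
```

With the assumed activation energy, both factors should land within the global ranges quoted in the text, and dividing 10 years (87,600 h) by the factor gives test durations of the same order as the 1,000 h and 100 h mentioned above.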

Also related to the effects of humidity is metal migration in the presence of reactive chemicals and voltage bias, leading to the formation of conductive paths (*dendrites*) between electrodes, see an example in Fig. 3.7d. A further problem related to *plastic packaged ICs* is that of bonding a *gold wire* to an *aluminum* contact surface. Because of the different interdiffusion constants of gold and aluminum, an inhomogeneous *intermetallic layer* (Kirkendall voids) appears at high temperature and / or in the presence of contaminants, considerably reducing the electrical and mechanical properties of the bond. Voids grow into the gold surface like a plague, from which the name *purple plague* derives. Purple plague was an important reliability problem in the sixties. It propagates exponentially at temperatures greater than about 180°C. Although almost generally solved (bond temperature, Al alloy, metallization thickness, wire diameter, etc.), verification after high temperature storage and thermal cycles is a part of a qualification test, especially for ASICs and devices in small-scale production.

| Component                      |                 | Shorts          | Opens    | Drift | Functional     |
|--------------------------------|-----------------|-----------------|----------|-------|----------------|
| Digital bipolar ICs            |                 | $50^{*\Delta}$  | $30^{*}$ |       | 20             |
| Digital MOS ICs                |                 | $20^{\Delta}$   | $60^{*}$ |       | 20             |
| Linear ICs                     |                 |                 | $25^{+}$ |       | $75^{++}$      |
| Bipolar transistors            |                 | 85              | 15       |       |                |
| Field effect transistors (FET) |                 | 80              | 15       | 5     |                |
| Diodes (Si)                    | general purpose | 80              | 20       |       |                |
|                                | Zener           | 70              | 20       | 10    |                |
| Thyristors                     |                 | 20              | 20       | 50    | $10^{\circ}$   |
| Optoelectronic (optocoupler)   |                 | 10              | 50       | 40    |                |
| Resistors, fixed (film)        |                 |                 | 40       | 60    |                |
| Resistors, variable (Cermet)   |                 |                 | 70       | 20    | $10^{\#}$      |
| Capacitors                     | foil            | 15              | 80       | 5     |                |
|                                | ceramic         | 70              | 10       | 20    |                |
|                                | Ta (solid)      | 80              | 15       | 5     |                |
|                                | Al (wet)        | 30              | 30       | 40    |                |
| Coils                          |                 | 20              | 80       |       |                |
| Relays (electromec.)           |                 | 20              |          |       | $80^{\dagger}$ |
| Quartz crystals                |                 |                 | 80       | 20    |                |

Table 3.4 Indicative values for failure modes of electronic components (%)

$^{*}$ input and output half each; $^{\Delta}$ short to $V_{CC}$ or to GND half each; $^{+}$ no output; $^{++}$ improper output; $^{\circ}$ fail to off; $^{\#}$ localized wearout; $^{\dagger}$ fail to trip / spurious trip $\approx 3/2$

### **3.2.4 Reliability Tests**

The aim of a *reliability test* for electronic components is to obtain information about

- failure rate,
- long-term behavior of critical parameters,
- effectiveness of screening to be performed at the incoming inspection.

The test consists in general of a *dynamic burn-in* with electrical measurements and *failure analysis* at appropriate time points (Fig. 3.3), also for some components which have not failed (check for degradation). The number ($n$) of devices under test can be estimated from the *predicted failure rate* $\lambda$ and the *acceleration factor* $A$ (Eq. (7.56)) in order to expect *3 to 6 failures* ($k$) during a burn-in of duration $t$ ($n \approx k/(\lambda A t)$). Half of the devices can be submitted to a *screening* (Section 8.2.2) to better isolate *early failures.* Statistical data analyses are given in Section 7.2 and Appendix A8.
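The sample-size estimate $n \approx k/(\lambda A t)$ can be evaluated directly. The Python sketch below is a minimal illustration; the function name and all numerical inputs (predicted failure rate, acceleration factor, burn-in duration) are hypothetical values chosen for the example, not figures from the text.

```python
import math


def devices_for_burn_in(expected_failures, failure_rate_per_h,
                        acceleration, test_hours):
    """Number of devices n ~ k / (lambda * A * t), rounded up.

    The result is rounded to 6 decimals before taking the ceiling so that
    floating-point noise does not add a spurious extra device.
    """
    n = expected_failures / (failure_rate_per_h * acceleration * test_hours)
    return math.ceil(round(n, 6))


if __name__ == "__main__":
    # Hypothetical inputs: lambda = 1e-7 per hour, A = 50, t = 2000 h
    for k in (3, 6):
        n = devices_for_burn_in(k, 1e-7, 50, 2000)
        print(f"k = {k} expected failures -> n = {n} devices")
```

The estimate scales inversely with each of $\lambda$, $A$, and $t$, so a low predicted failure rate has to be compensated by a higher acceleration factor, a longer burn-in, or more devices.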

# **3.3 Failure Modes, Failure Mechanisms, and Failure Analysis of Electronic Components**

This section introduces some basic concepts and considerations on failure mechanisms and failure analysis of electronic components. It aims to draw attention to a field that is important for both equipment and system level reliability engineering. For greater detail see e.g. [3.1 - 3.93], in particular [3.1, 3.10, 3.31, 3.48, 3.54].

## **3.3.1 Failure Modes of Electronic Components**

A *failure mode* is the *symptom* (local effect) through which a failure is observed. Typical failure modes are *opens, shorts, drift, functional faults* for electronic components, and brittle fracture, creep, buckling, fatigue for mechanical components. *Average values* for the relative frequency of failure modes in electronic components are given in Table 3.4. These values are indicative and have to be supplemented by application specific results, as far as necessary.

The different failure modes of *hardware,* often influenced by the specific application, cause difficulties in investigating the *effect* of a given failure, and thus in the concrete implementation of *redundancy* (series if short, parallel if open). For critical situations it can become necessary to use *quad redundancy* (Section 2.3.6). Quad redundancy is the simplest *fault tolerant structure* which can accept at least one failure (short or open) of any one of the 4 elements involved in the redundancy.

### **3.3.2 Failure Mechanisms of Electronic Components**

A *failure mechanism* is the physical, chemical, or other process which results in failure. A large number of failure mechanisms have been investigated in the literature, e.g. [3.31 - 3.66, 3.70 - 3.93]. For some of them, appropriate physical explanations have been found. For others, models are *empirical* and often of limited validity. Evaluation of models for failure mechanisms should be developed in two steps: (i) verify the *physical validity* of the model and (ii) give its *analytical formulation* with the appropriate set of parameters to *fit the model to the data.* In any case, *experimental verification* of the model should be performed with at least a second, independent experiment, and *limits of the model* should be clearly indicated. The two most important models used to describe failure mechanisms of electronic components, the *Arrhenius* and *Eyring models,* are introduced in Section 7.4 with *accelerated tests* (Eqs. (7.56) - (7.60)). Models to describe the influence of temperature and humidity in damp heat tests have been given with Eqs. (3.2) - (3.6). A new model for latent damage caused by ESD is given in [3.60 (1995)]. Table 3.5 summarizes some important failure mechanisms for ICs, specifying influencing factors and the approximate distribution of the failure mechanisms for plastic-packaged ICs in industrial applications ($G_B$ in Table 2.3). The percentage of misuse and mishandling failures can vary over a large range (20-80%) depending on the design engineer using the device, the equipment manufacturer, and the end user. For ULSI ICs one can expect that the percentage of failure mechanisms related to *oxide breakdown* and *hot carriers* will grow in the future. Comments on failure mechanisms are also in Sections 3.4, 8.2 & 8.3.

## **3.3.3 Failure Analysis of Electronic Components**

The aim of a *failure analysis* is to investigate the *failure mechanisms* and find out possible *failure causes.* A procedure for failure analysis of complex ICs (from a user's point of view) is shown in Fig. 3.8. It is based on the following steps:

- *1. Failure detection and description:* A careful description of the failure, as observed in situ, and of the surrounding circumstances (operating conditions at the failure occurrence) is important. Also necessary is information on the IC itself (type, manufacturer, manufacturing data, etc.), on the electrical circuit in which it was used, on the operating time, and if possible on the tests to which the IC was submitted prior to the final use (evaluation of possible damage, e.g. ESD). In a few cases the failure analysis procedure can be terminated at this point, if an evident mishandling or misuse failure can be confirmed.
- *2. Nondestructive analysis:* The nondestructive analysis begins with an *external visual inspection* (mechanical damage, cracks, corrosion, burns, overheating, etc.), followed by an *x-ray inspection* (evident internal fault or damage) and a careful *electrical test* (Section 3.2.1). For ICs in hermetic packages, it can also

| 3.3                                                                                          | $\mathbf 1$<br>Failure Modes, Failure Mechanisms, and Failure Analysis of Electronic Components                                                                                                                               |                                                                                                                                                                                           |                                                                                                                                                                                                                                   |                                                                                                                                                                                                     |                                                                                                                                                  |                                                                                                                                          |                                                                                                                          |                                                                                                                                |                                                                  |                                          |                                                                  |                                 |                                                              |                                                                                                                                                                          |
|----------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|------------------------------------------|------------------------------------------------------------------|---------------------------------|--------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
**Table 3.5** Basic failure mechanisms of integrated circuits (ICs) in plastic package

| Failure mechanism | Short description | Causes | Acceleration factors |
|---|---|---|---|
| **Bonding**<br>• Fatigue | Mechanical fatigue of bonding wires or bonding pads because of thermomechanical stress (also because of vibrations at the resonance frequency for hermetically sealed devices) | Different expansion coefficients of the materials in contact (for hermetically sealed devices also wire resonance) | Thermal cycles with $\Delta\theta > 150^{\circ}$C (vibrations at res. freq. for hermetic dev.) |
| • Purple plague | Formation of an intermetallic layer at the interface between wire (Au) and metallization (Al) causing a brittle region (voids in Au due to diffusion) which can provoke bond lifting | Different interdiffusion constants of Au and Al, bonding temperature, contamination, too thick metallization | Temperature $> 180^{\circ}$C ($E_a = 0.7 - 1.1$ eV) |
| **Surface**<br>• Charge spreading (leakage currents, inversion) | Charge spread laterally from the metallization or along the isolation interface, resulting in an inversion layer outside the active region which can provide for instance a conduction path between two diffusion regions | Contamination with Na$^+$, K$^+$, etc., too thin oxide layer (MOS), package material | $E$, $\theta_J$ ($E_a = 0.5 - 1.2$ eV, up to 2 eV for linear ICs) |
| **Metallization**<br>• Corrosion | Electrochemical or galvanic reaction in the presence of humidity and ionic contamination (P, Na, Cl etc.), critical for PSG (SiO$_2$) passivation with > 4% P (< 2% P gives cracks) | Humidity, voltage, contamination (Na$^+$, Cl$^-$, K$^+$), cracks or pinholes in the passivation | RH, $E$, $\theta_J$ |
| • Metal migration | Migration of metal atoms in the presence of reactive chemicals, water, and bias, leading to conductive paths (dendrites) between electrodes | Humidity, voltage, migrating metals (Au, Ag, Pd, Cu, Pb, Sn), contaminant (encapsulant) | |
| • Electromigration | Migration of metal atoms (also of Si at contacts) in the direction of the electron flow, creating voids or opens in the structure | Current density ($> 10^6$ A/cm$^2$), temperature gradient, anomalies in the metallization | $j^n$, $\theta_J$ ($E_a = 0.5 - 0.7$ eV, 1 eV for large-grain Al) |
| **Oxide**<br>• Time-dependent dielectric breakdown (TDDB) | Breakdown of thin oxide layers occurring suddenly when sufficient charge has been injected to trigger a runaway process | High voltages, thin oxides, oxide defects | $E$, $\theta_J$ ($E_a = 0.2 - 0.4$ eV for oxide with defects, 0.5 - 0.6 eV for intrinsic oxide) |
| • Ion migration (parasitic transistors, inversion) | Carrier injection in the gate oxide because of $E$ and $\theta_J$; creation of charges in the SiO$_2$/Si interface | Contamination with alkaline ions, pinholes, oxide or diffusion defects | |
| **Others**<br>• Intermetallic compound | Formation of an intermetallic layer between metallization (Al) and substrate (Si) | Mask defects, overheating, pure Al | $E$, $\theta$, thermal cycles ($\Delta\theta > 200^{\circ}$C) |
| • Hot carriers | Injection of electrons because of high $E$ | Dimensions, diffusion profiles, $E$ | $E$ |
| • $\alpha$-particles | Generation of electron-hole pairs by $\alpha$-particles (DRAMs) | Package material, external radiation | External radiation |
| • Latch-up, etc. | Activation of PNPN paths | PNPN paths | Voltage overstress |
| **Misuse / Mishandling** | Electrical (ESD/EOS), thermal, mechanical, or climatic overstress | Application, design, handling, test | |

$E$ = electric field, RH = relative humidity, $j$ = current density, $\theta_J$ = junction temperature, passivation (= glassivation)


be necessary to perform a seal test and if possible a dew-point test. The result of the nondestructive analysis is a careful description of the *external failure mode* and initial information about possible failure causes and mechanisms. For *evident failure causes,* the failure analysis can be terminated.

- *3. Semidestructive analysis:* The semidestructive analysis begins by *opening the package,* mechanically for hermetic packages and by wet chemical (or plasma) etching for plastic ICs. A careful *internal visual* check is then performed with optical microscopes, conventional $1000\times$ or stereo $100\times$. This evaluation covers opens, shorts, state of the passivation / glassivation, bonding, damage due to ESD, corrosion, cracks in the metallization, electromigration, particles, etc. If the IC is still operating (at least partially), other procedures can be used to *localize* the fault on the die more accurately. Among these are the *electron beam tester* (or other voltage contrast techniques), *liquid crystals* (LC), *infrared thermography* (IRT), *emission microscopy* (EMMI), or one of the methods to detect irregular recombination centers, like *electron beam induced current* (EBIC) or *optical beam induced current* (OBIC). For further investigations it is then necessary to use a *scanning electron microscope* (SEM). The result of the semidestructive analysis is a careful description of the *internal failure mode* and improved information about possible failure causes and failure mechanisms. In the case of *evident failure causes,* the failure analysis procedure can be terminated.
- *4. Destructive analysis:* A destructive analysis is performed if the previous investigations yield unsatisfactory results and there is a *realistic chance* of success through further analyses. After removal of the passivation and other layers (as necessary), an inspection is carried out with a *scanning electron microscope* supported by a material investigation (e.g. *EDX spectrometry*). Analyses are then continued using methods of microanalysis (electron microprobe, ion probe, diffraction, etc.) and performing *microsections.* The destructive analysis is the last possibility to recognize the *original failure cause* and the *failure mechanisms* involved. However, it cannot guarantee success, even with skilled personnel and suitable analysis equipment.
- *5. Failure mechanism analysis:* This step implies a correct interpretation of the results from steps 1 through 4. Additional investigations have to be made in some cases, but questions related to failure mechanisms can still remain open. In general, *feedback to the manufacturer* at this stage is mandatory.
- *6. Final report:* All relevant results of the steps 1 to 5 above and the agreed corrective actions must be included in a (short and clear) final report.
- *7. Corrective actions:* Depending on the identified failure causes, appropriate corrective actions should be started. These have to be discussed with the IC manufacturer as well as with the equipment designer, manufacturer, or user depending on the failure causes which have been identified.



Figure 3.8 Basic procedure for failure analysis of complex ICs from a user's point of view (see e.g. [3.48 (2005/2009)] for greater details from a manufacturing point of view)

The failure analysis procedure described in Section 3.3.3 for ICs can be applied to other electronic or mechanical components and extended to cover populated printed circuit boards (PCBs) as well as subassemblies or assemblies.

### **3.3.4 Present VLSI Production-Related Reliability Problems**

Production-related potential reliability problems, i.e., flaws or damages which can lead to failures, can occur for VLSI devices at packaging or soldering level (Fig. 3.10), as well as on silicon dies. Those on dies are often more difficult to identify. The following examples show some production-related potential reliability problems on silicon dies, in order of increasing difficulty of identification [3.48 (2005/2009)] (see also Fig. 3.7 for further examples).

Fig. 3.9a shows a contact step coverage flaw. The contact to a diffusion in bulk silicon is made by the first metal layer, which usually is protected by a barrier against Al penetration into bulk silicon. However, the first metal layer often must adapt itself to some topography. Design rules make sure that the contact is flat enough. However, if the contact slopes are too steep (e.g. because of an etching process problem), the step coverage may be reduced. In this case, electric contact is often still given, but melting or electromigration may start, leading to a failure. OBIRCH (optical beam induced resistivity change) can help to detect such weak contacts.

Fig. 3.9b shows a wafer processing flaw. Semiconductor devices include at least one poly-Si layer, which usually forms the MOS-transistor gates. It is isolated versus bulk silicon by a thin (some nm) gate oxide, or by a thicker field oxide in active regions. The isolation against further poly-Si layers is given by a self-grown re-oxidation of the poly-Si surface and (in part) by doped silicate glass (PSG, BPSG). In the structuring process of poly-Si (usually photolithography and plasma etching), an improper etching process may result in poly-Si *residues* or particles, which during subsequent re-oxidation form an irregular and thin oxide around themselves. A short at $t = 0$ will be avoided; however, a latent short path is created and a small voltage peak may be enough to break down the oxide, causing a leakage path.

Figs. 3.9c and 3.9d show ESD damage giving failures at $t=0$ or *latent failures*, formerly considered as mechanical surface damage. Silicon dies are often delivered as wafers to customers, who perform subsequent pre-assembly processes (wafer dicing, back grinding, and pick & place). These operations can carry great risks of electrostatic discharge from robotics equipment to the device via the device passivation (e.g. when the picker setup of the pneumatic handler moves rapidly on a Teflon bearing). The term ESDFOS (electrostatic discharge from outside-to-surface) has been introduced to describe this failure cause. Like a lightning strike, the electrostatic spark comes onto the passivation, cracks it, melts the aluminum of the top metal, and cracks the interlevel dielectric (ILD), where the metal underneath locally melts and penetrates into the crack. Depending on the degree of Al penetration, the damage causes a failure at $t = 0$ or a latent failure. Periodic audits with survey and location of air ionizer fans, grounding concepts, materials, etc. are an effective method against this damage.

Further examples related to wafer sawing, poly-Si residues, and RFID devices are in [3.48 (2008, 2009)].



a) A steep slope topography causing a bad contact coverage with Al  $(\times 5000)$ 



b) Slightly oxidized poly residue (small white line) buried between a poly-Si gate and a neighboring contact  $(\times 5000)$ 



c) Latent ESDFOS damage, see also Fig. 3.9d  $(\times 5000)$ 



d) Short between two top metal layers as a consequence of ESDFOS damage  $(\times 5000)$

Figure 3.9 Examples of production-related (hidden) potential reliability problems in Si-dies [3.48] (see also Figs. 3.7 & 3.10)

# **3.4 Qualification Tests for Electronic Assemblies**

As outlined in Section 3.2 for components, the purpose of a *qualification test* is to verify the suitability of a given item (electronic assemblies in this section) for a stated application. Such a qualification involves *performance, environmental,* and *reliability* tests, and has to be supported by a careful *failure (fault) analysis.* To be efficient, it should be performed on *prototypes* which are *representative* of the production line in order to check not only the design but also the *production process.* Results of qualification tests are an important input to the *critical design review* (Table A3.3). This section deals with *qualification tests of electronic assemblies,* in particular of *populated printed circuit boards (PCBs).* 

The aim of the *performance test* is similar to that of the *characterization*  discussed in Section 3.2.2 for complex ICs. It is an experimental analysis of the electrical properties of the given assembly, with the purpose of investigating the influence of the most *relevant electrical parameters* on the behavior of the assembly at different *ambient temperatures* and power supply conditions (see Section 8.3 for considerations on electrical tests of PCBs).

*Environmental tests* have the purpose of submitting the given assembly to stresses which can be more severe than those encountered in the field, in order to investigate *technological limits and failure mechanisms* (see Section 3.2.3 for complex ICs). The following *procedure,* based on experience with a large number of equipment [3.76], can be recommended for assemblies of mixed technology used in *high reliability* (or safety) applications (total  $\geq$  10 assemblies):

- 1. Electrical behavior at extreme temperatures with functional monitoring, 100 h at  $-40^{\circ}\text{C}$ ,  $0^{\circ}\text{C}$ , and  $+80^{\circ}\text{C}$  (2 assemblies, as reference for failure analysis).
- 2. 4,000 thermal cycles  $-40/+120^{\circ}$C with functional monitoring,  $\leq 5^{\circ}$C/min or  $\geq 20^{\circ}$C/min within the components according to the field application,  $\geq 10$ min dwell time at $-40^{\circ}$C and  $\geq 5$ min at $120^{\circ}$C after the thermal equilibrium has been reached within  $\pm 5^{\circ}$C (total dwell times of about 30 & 15 min, 60 & 30 min for lead-free solder;  $\geq 3$ assemblies, metallographic analysis after 2,000 and 4,000 cycles).
- 3. Random vibrations at low temperature, 1 h with  $2 - 6\,g_{rms}$,  $20 - 500$ Hz at $-20^{\circ}$C (2 assemblies).
- 4. EMC and ESD tests (2 assemblies).
- 5. Humidity tests, 240 h *85/85* test (1 assembly).
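The time budget implied by the procedure above can be estimated with simple arithmetic. The following is a minimal sketch, not part of the original procedure: it assumes linear ramps at the stated gradient and uses the total dwell times quoted in step 2; the class and function names are hypothetical, introduced only for illustration. It also checks that the minimum assembly counts of steps 1 to 5 add up to the stated total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermalCycleProfile:
    """One thermal-cycling profile (temperatures in deg C, times in minutes)."""
    t_low: float           # lower temperature of the cycle
    t_high: float          # upper temperature of the cycle
    ramp_rate: float       # temperature gradient in deg C per minute
    dwell_low_min: float   # total dwell time at t_low
    dwell_high_min: float  # total dwell time at t_high

    def cycle_minutes(self) -> float:
        """Duration of one cycle: two linear ramps plus both dwell times."""
        ramp = (self.t_high - self.t_low) / self.ramp_rate
        return 2.0 * ramp + self.dwell_low_min + self.dwell_high_min


def chamber_hours(profile: ThermalCycleProfile, cycles: int) -> float:
    """Total chamber time in hours for the given number of cycles."""
    return cycles * profile.cycle_minutes() / 60.0


# Minimum number of assemblies per step, as listed in steps 1 to 5.
MIN_ASSEMBLIES = {
    "extreme temperatures": 2,
    "thermal cycles": 3,
    "random vibrations": 2,
    "EMC and ESD": 2,
    "humidity 85/85": 1,
}

# Step 2 with Sn-Pb dwell times (30 & 15 min), slow and fast gradient.
slow = ThermalCycleProfile(-40.0, 120.0, 5.0, 30.0, 15.0)
fast = ThermalCycleProfile(-40.0, 120.0, 20.0, 30.0, 15.0)

print("assemblies needed:", sum(MIN_ASSEMBLIES.values()))
print("slow gradient, 4,000 cycles [h]:", round(chamber_hours(slow, 4000)))
print("fast gradient, 4,000 cycles [h]:", round(chamber_hours(fast, 4000)))
```

Under these assumptions, one cycle at 5°C/min lasts 109 min, so 4,000 cycles take roughly 7,300 h of chamber time, against roughly 4,100 h at 20°C/min. This suggests why the number of assemblies and cycles has to be planned early in a qualification.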

Experience shows [3.76] that electronic equipment often behaves well even under extreme environmental conditions (operation at  $+120^{\circ}$C and  $-60^{\circ}$C, thermal cycles $-40/+120^{\circ}$C with up to $60^{\circ}$C/min within the components, humidity test *85/85,* cycles of 4 h *95/95* followed by 4 h at $-20^{\circ}$C, random vibrations 20 - 500 Hz at 4 g<sub>rms</sub> and $-20^{\circ}$C, ESD/EMC with pulses up to 15 kV). However, problems related to crack propagation *in solder joints* appear, and metallographic investigations on more than 1,000 microsections [3.76] confirm that *cracks* in solder joints are initiated by production flaws (Fig. 3.10 d - f) or by microvoids caused by creep. The above holds in particular for Sn-Pb solder. For lead-free solder, greater sensitivity to fast thermal cycles and vibrations can be expected, see e.g. [3.79 (2005, 2008)].

Many of the production flaws with *inserted components* (Fig. 3.10 a - c) can be avoided and would cause only *minor* reliability problems (for instance, voids can be eliminated by a better plating of the through-holes). Since even voids up to 50% of the solder volume do not severely reduce the reliability of solder joints, it is preferable to *avoid rework.* Poor wetting of the leads or the excessive formation of brittle intermetallic layers are *major* potential reliability problems for solder joints. Defects of this last kind must be avoided through a better production process.


More critical are *surface mount devices* (SMD), for which a detectable *crack propagation* in solder joints often begins after a few thousand thermal cycles. Extensive investigations [3.79 (1996)] show that crack propagation is almost independent of pitch, at least down to a pitch of 0.3 mm, and solder joints of ICs with shrinking pitches are less critical (due to lead flexibility). A new model based on *creep* (intended as elevated temperature, time dependent deformation) to describe the *viscoplastic behavior* of SMT solder joints has been proposed in [3.92], see also [3.79 (02), 3.81, 3.86]. The model outlines the strong impact of *deformation energy* on *damage evolution.* Besides *diffusion creep,* at very low stress (thermal gradient), basically two different *deformation mechanisms* are present, *grain boundary sliding* at low thermal gradient and *dislocation climbing* at high thermal gradient. Each mechanism causes microvoids, in locally restricted recrystallized areas within the joint, that evolve to cracks. The strain rate in the steady-state stage can be described by an Eyring model similar to that for electromigration (Eq. (7.60)) with two additive terms [3.79 (02, 05, 08)]; other models are e.g. in [3.89 (Chapter 3)]. Hence, attention must be paid in defining environmental and reliability tests or screening procedures (Section 8.3) for assemblies in SMT. In such a test, or screening, it is mandatory to activate only the failure mechanism which would also be activated in the field. *Dwell time during thermal cycles* also plays an important role. It must be long enough to allow *relaxation* of the stresses, and depends on temperature, temperature swing, and material stiffness. As for the thermal gradient, it is difficult to give general rules; however, dwell times of about 20 min at  $-20^{\circ}$C and 10 min at  $100^{\circ}$C (40 and 20 min for lead-free solder) seem reasonable.

*Reliability tests* at assembly and higher integration level have as a primary purpose the detection of all *early failures* (Section 7.7) and an estimation of the *failure rate* (Section 7.2.3). Precise information on the failure rate shape is seldom possible from reliability tests, because of cost and time limits. If reliability tests are necessary, the following procedure can be used (total  $\geq 8$  assemblies):

- 1. 4,000 h dynamic burn-in at $80^{\circ}$C ambient temperature ($\geq 2$ assemblies, functional monitoring, intermediate el. tests at 24, 96, 240, 1,000, and 4,000 h).
- 2. 5,000 thermal cycles  $-20/+100^{\circ}$C with  $\leq 5^{\circ}$C/min for applications with slow heat up and  $\geq 20^{\circ}$C/min for rapid heat up, dwell time  $\geq 10$  min at  $-20^{\circ}$C and  $\geq 5$ min at $100^{\circ}$C after the thermal equilibrium has been reached within  $\pm 5^{\circ}$C (total dwell times of about 20 & 10 min, 40 & 20 min for lead-free solder;  $\geq 3$ assemblies, metallographic analysis after 1,000, 2,000, and 5,000 cycles; crack propagation can be estimated using a Coffin-Manson relationship of the form  $N = A\,\varepsilon^{n}$  with  $\varepsilon = (\alpha_B - \alpha_C)\, l\, \Delta\theta / d$, where the parameter $A$ has to be determined with tests at different temperature swings).
- 3. 5,000 thermal cycles  $0/+80^{\circ}$C, with temperature gradient as in point 2 above, combined with random vibrations  $1\,g_{rms}$,  $20 - 500$  Hz ($\geq 3$  assemblies, metallographic analysis after 1,000, 2,000, and 5,000 cycles).
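The Coffin-Manson relationship quoted in point 2 can be illustrated with a short numerical sketch. The following assumes, beyond what the text states, that $\alpha_B$ and $\alpha_C$ are the thermal expansion coefficients of board and component, $l$ a characteristic component length, and $d$ the solder joint height, and that both $A$ and $n$ are determined from tests at two temperature swings (the text mentions only $A$). All numerical values and function names are hypothetical and serve only to show the arithmetic.

```python
import math


def strain(alpha_b: float, alpha_c: float, length: float,
           delta_theta: float, height: float) -> float:
    """Strain eps = (alpha_B - alpha_C) * l * delta_theta / d."""
    return (alpha_b - alpha_c) * length * delta_theta / height


def fit_coffin_manson(eps1: float, cycles1: float,
                      eps2: float, cycles2: float) -> tuple[float, float]:
    """Solve N = A * eps**n for (A, n) from two (strain, cycles) pairs."""
    n = math.log(cycles1 / cycles2) / math.log(eps1 / eps2)
    a = cycles1 / eps1 ** n
    return a, n


def predicted_cycles(a: float, n: float, eps: float) -> float:
    """Number of cycles N = A * eps**n for a given strain."""
    return a * eps ** n


# Hypothetical joint: 16 and 6 ppm/K, l = 5 mm, d = 0.1 mm (illustration only).
ALPHA_B, ALPHA_C, LENGTH_MM, HEIGHT_MM = 16e-6, 6e-6, 5.0, 0.1

# Hypothetical test results at two temperature swings (120 K and 60 K).
eps_120 = strain(ALPHA_B, ALPHA_C, LENGTH_MM, 120.0, HEIGHT_MM)
eps_60 = strain(ALPHA_B, ALPHA_C, LENGTH_MM, 60.0, HEIGHT_MM)
a, n = fit_coffin_manson(eps_120, 2000.0, eps_60, 8000.0)

# Extrapolation to an intermediate swing of 80 K.
eps_80 = strain(ALPHA_B, ALPHA_C, LENGTH_MM, 80.0, HEIGHT_MM)
print("A =", a, " n =", n)
print("predicted cycles at 80 K swing:", predicted_cycles(a, n, eps_80))
```

With these invented inputs, the fit gives an exponent close to $-2$ and about 4,500 cycles for the 80 K swing. In practice, more than two temperature swings and a regression on $\log N$ versus $\log \varepsilon$ would give a more robust estimate.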




a) Void caused by an s-shaped pin gassing out in the area A  $(\times 20)$

b) Flaw caused by the insertion of the insulation of a resistor network  $(\times 20)$

c) Defect in the copper plating of a hole in a multilayer printed board  $(\times 50)$

d) A row of voids along the pin of an SOP package  $(\times 30)$

e) Soldering defect in a surface mounted resistor, area A  $(\times 30)$

f) Detail A of Fig. 3.10e  $(\times 500)$

**Figure 3.10** Examples of production flaws responsible for the initiation of cracks in solder joints: a) - c) inserted devices, d) - f) SMD (Reliability Laboratory at the ETH Zurich); see also Figs. 3.7 & 3.9




*Thermal cycles with random vibrations* highly activate failure mechanisms at the assembly level, in particular *crack propagation in solder joints.* If such a stress occurs in the field, insertion technology should be preferred to SMT for high reliability or safety applications. Figure 3.11 shows a comparative investigation of crack propagation [3.79 (1993)].

Preliminary results show that lead-free solder joints are more sensitive than Sn-Pb solder joints to manufacturing flaws or defects, mechanical vibrations, and fast thermal cycles. For this reason, tests and/or screening on assemblies (PCBs) manufactured with lead-free solder should reflect the stresses actually encountered in the field (see also Sections 5.1.5.4 and 8.3).



Figure 3.11 Crack propagation in different SMD solder joints as a function of the number of thermal cycles ($\delta l / l$ = crack length in % of the solder joint length, mean over 20 values, thermal cycles $-20/+100^{\circ}$C with $60^{\circ}$C/min inside the solder joint; Reliability Laboratory at the ETH Zurich)