8.1 Introduction

Although there has been evident progress in component quality and reliability [1], there are also signs of degradation of component reliability. One reason for this is the abandonment of the military handbooks that used to provide clear guidelines; as a result, common requirements on acceptable reliability levels no longer exist. Today's market is driven by consumer application-oriented systems rather than by systems requiring long-lasting, high-reliability performance. This has sometimes resulted in a shortage of components conforming to high-reliability requirements, which has caused problems especially in application areas where long lifetime and high reliability are required, such as military [2] and telecommunications infrastructure products [3].

New surface mount component types without interconnection leads cannot always be adopted in demanding applications due to their limited reliability [4]. Several new component types have been introduced to the market, but the second-level interconnection reliability of these components is not always at a sufficient level. In Fig. 8.1, some thermal cycling test results are depicted [5]; it can easily be seen that most of the components do not meet the no-failures-in-1,000-cycles criterion.

Fig. 8.1

Thermal cycling (−40 to +125°C, 1-h cycle) test results of some leadless components. Characteristic lifetimes in cycles are depicted [5]

The complexity of products is increasing, which may create further demands on component reliability. Outsourcing the design and manufacturing of IP blocks does not eliminate the responsibility of the end-product manufacturer. Outsourcing may even be seen as a threat to reliability and quality, unless the end-product manufacturer carefully communicates the reliability targets and controls the fulfillment of the reliability requirements.

In Chap. 3, general failure mechanisms are discussed, whereas in this chapter, the component reliability is looked at from an alternative point of view – empirical models.

8.2 Empirical Models

While physical models address a specific failure mechanism and estimate lifetime from the rate at which degradation evolves, empirical models give generic failure rate estimates for a certain component type or technology. Although based on empirical data, they account for the field environment through "factors" that capture the degradation effects related to temperature, voltage, or some other stress. Therefore, the two ways of making lifetime estimates, physical and empirical, are not complete opposites: both apply physical and chemical relations. In empirical models, however, the actual failure mechanism is not directly visible but is more or less buried inside the model.

Empirical models have a long history and they are still widely applied. One major reason for their popularity is that they are relatively simple and easy to use. When using empirical models, it is also easy to expand the reliability analysis from the component level to the subassembly or system level, and many software tools support their use. Empirical models can, in principle, also take early failures and random failures into account, which is usually not the case with physical models.

Since the early 1970s, the failure rates of microdevices have fallen by roughly 50% every 3 years [6], while the empirical models have been updated on average only every 6 years; as a result, the models have become overly pessimistic. In 1994, the US Military Specifications and Standards Reform initiative led to the cancellation of many military specifications and standards, including MIL-HDBK-217 [7]. This was not, however, the end of the story. PRISM and 217Plus are updated versions of the old military handbook, prepared by the Reliability Information Analysis Center (RIAC). Owing to the popularity of MIL-HDBK-217, the Defense Standardization Program Office (DSPO) decided to revitalize the handbook, and the initial 217WG meeting was held in Indianapolis on May 8, 2008. At the time of writing, the updating is ongoing with the help of volunteering industry partners.

Besides MIL-HDBK-217, there are several other standards based on empirical models, such as the Bellcore Reliability Prediction Procedure (Telcordia) [8], the Nippon Telegraph and Telephone (NTT) procedure [9], the British Telecom Handbook [10], the CNET procedure [11], and the Siemens procedure [12]. The failure rates predicted by the different standards may, however, deviate considerably from each other [6, 13].

8.3 The Methodology

Although each empirical model differs somewhat from the others, there are many similarities between the models, and the basic methodology is quite similar. For each component technology, a base failure rate λb is defined. This failure rate is considered a typical or average failure rate representative of that specific component technology, and its value is chosen based on field failure data.

The base failure rate alone is rarely used; it is usually multiplied by so-called pi-factors that can take several effects into account: the operational conditions (temperature πT and voltage πv), the quality of the component πQ, a "learning factor" based on the age of the component technology πL, and an "environmental factor" accounting for the ambient conditions of device use πe. The end result is the failure rate prediction λ for a certain component:

$$ \lambda = \lambda_{b}\prod\limits_{i = 1}^n \pi_i. $$
(8.1)

One should note that even though the formulae may resemble each other, the parameter values (the base failure rate λb and the pi-factors πi) may vary a lot between different empirical models, as may the resulting failure rate prediction λ.
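As a concrete illustration, the following sketch evaluates (8.1) for a single component. The base failure rate and all pi-factor values are invented for the example only and do not come from any particular handbook; in a real analysis they would be taken from the chosen standard.

# Sketch of a prediction per (8.1): lambda = lambda_b * product(pi_i).
# All numbers below are illustrative placeholders, not values from
# MIL-HDBK-217 or any other standard.
from math import prod

lambda_b = 0.010e-6   # base failure rate, failures per hour (hypothetical)

pi_factors = {
    "pi_T": 2.5,   # temperature factor (hypothetical)
    "pi_V": 1.1,   # voltage stress factor (hypothetical)
    "pi_Q": 1.0,   # quality factor (hypothetical)
    "pi_L": 1.5,   # learning factor (hypothetical)
    "pi_E": 4.0,   # environmental factor (hypothetical)
}

lambda_pred = lambda_b * prod(pi_factors.values())
print(f"Predicted failure rate: {lambda_pred:.3e} failures/hour")
print(f"...or {lambda_pred * 1e9:.1f} FIT")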

Usually, reliability prediction with empirical models is started in an early phase of product development, when only limited information on the actual design is available. At that point the effect of stress factors is therefore often neglected; this kind of analysis is called the parts-count method. When the electrical design matures, more information becomes available and the effects of voltage and temperature can be better taken into account; the methodology is then called the parts-stress method.
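A parts-count style estimate can then be obtained by summing such contributions over the bill of materials, using only base failure rates, quality factors, and part quantities when no stress information is yet available. The sketch below is a minimal illustration with invented numbers:

# Minimal parts-count sketch: without stress information, only base failure
# rates, quality factors and part quantities are used. Numbers are hypothetical.
bom = [
    # (part type, quantity, base failure rate [1/h], quality factor)
    ("ceramic capacitor", 120, 0.002e-6, 1.0),
    ("chip resistor",     200, 0.001e-6, 1.0),
    ("SRAM",                2, 0.050e-6, 2.0),
]

lambda_sys = sum(qty * lam_b * pi_q for _, qty, lam_b, pi_q in bom)
print(f"Parts-count system failure rate: {lambda_sys:.3e} failures/hour")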

8.4 Empirical Models in System Reliability Analysis

As was implied in Sect. 7.3, the empirical models give a constant failure rate for a certain component. This is not always a realistic assumption because, strictly speaking, it is valid only during the so-called useful life period of a component's lifetime, i.e., the "middle part" of the bathtub curve, after the early-failure period and before the wear-out period.

However, reasonable approximations are available to turn nonconstant failure rates into quasi-constant values, as will be discussed in Chap. 9. Furthermore, the constant failure rate assumption makes system-level reliability analysis very simple, because the reliability function of a component with a constant failure rate can be expressed as:

$$ R(t) = {{{\it e}}^{ - \lambda t}}. $$
(8.2)

Assuming that a system contains two components with failure rates λ1 and λ2, and that both components must be functional for the system to be operational (i.e., a series connection), the reliability function of the system can be given as:

$$ R{(t)_{\it{sys}}} = R{(t)_1} \cdot R{(t)_2} = {{\it{e}}^{ - {\lambda_1}t}} \cdot {{\it{e}}^{ - {\lambda_2}t}}, $$
(8.3)

which is equivalent to:

$$ {\lambda_{\it{sys}}} = {\lambda_1} + {\lambda_2}. $$
(8.4)

For a system consisting of n components, the same can be written as:

$$ {\lambda_{\it{sys}}} = \sum\limits_{i = 1}^n {{\lambda_i}}. $$
(8.5)

As can be seen, the system failure rate is obtained simply by summing the failure rates of the individual components. To be able to write (8.5), one must assume that the component failures are statistically independent. In a general system this assumption can be difficult to justify, since each component's reliability may be a complex function of time, stress level, and other factors.

Mean time to failure (MTTF) for a system is simply:

$$ {\it MTT}{{\it F}}_{\it sys} = \frac{1}{{{\lambda_{\it sys}}}}. $$
(8.6)
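Under these assumptions, a system-level estimate reduces to a summation. The short sketch below combines (8.2), (8.5), and (8.6); the component failure rates are arbitrary example values, not data from any standard.

# Series-system reliability under constant failure rates.
# Example component failure rates (failures per hour) are arbitrary.
from math import exp

component_lambdas = [0.05e-6, 0.12e-6, 0.30e-6, 0.08e-6]

lambda_sys = sum(component_lambdas)        # (8.5)
mttf_sys = 1.0 / lambda_sys                # (8.6)

t = 10 * 365 * 24                          # 10 years of operation, in hours
r_sys = exp(-lambda_sys * t)               # (8.2) applied at system level

print(f"System failure rate: {lambda_sys:.3e} /h")
print(f"System MTTF: {mttf_sys:.3e} h")
print(f"R(10 years) = {r_sys:.4f}")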

One can, however, argue that it is not realistic to require all components to operate for the system to be considered operational. This is a valid argument, since not all components are equally critical, and sometimes redundancy is deliberately introduced so that the system can continue to operate even if a certain component or subsystem fails.

If more complex scenarios are to be studied, more powerful tools are needed, such as the Reliability Block Diagram (RBD) technique or Markov chain analysis [14]. With these methods, however, the mathematical analysis becomes more complex, and in many cases, especially for repairable systems, a simulation method such as Monte Carlo needs to be utilized.

Using empirical models together with the series-system assumption (which keeps the mathematics easy) is, however, not mandatory, even though it is more or less standard practice. In principle, nothing prevents the use of empirical component models as part of an RBD analysis; in practice, however, software tools are often organized so that empirical reliability prediction and RBD are separate modules. Assuming a constant failure rate is not mandatory either; alternative approaches do exist [15].
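To illustrate why the series-only view can be pessimistic, the sketch below compares a purely series arrangement with a simple RBD in which one block is duplicated (1-out-of-2 active redundancy). Independent failures and constant failure rates are assumed, and the numbers are again only illustrative.

# Reliability block diagram sketch: series chain vs. one redundant (parallel) block.
# Constant failure rates and independent failures are assumed; values are illustrative.
from math import exp

def r(lam, t):
    """Reliability of a single block with constant failure rate lam at time t."""
    return exp(-lam * t)

t = 5 * 365 * 24            # mission time: 5 years in hours
lam_a, lam_b, lam_c = 0.2e-6, 0.5e-6, 0.1e-6

# All three blocks in series: every block must survive.
r_series = r(lam_a, t) * r(lam_b, t) * r(lam_c, t)

# Same system, but block B duplicated (1-out-of-2): it fails only if both copies fail.
r_b_parallel = 1.0 - (1.0 - r(lam_b, t)) ** 2
r_redundant = r(lam_a, t) * r_b_parallel * r(lam_c, t)

print(f"Series-only reliability: {r_series:.5f}")
print(f"With redundant block B:  {r_redundant:.5f}")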

8.5 Limitations of Empirical Models and Recommendations on Use

As discussed earlier in Sects. 8.2 and 8.4, there are several drawbacks and limitations to the use of empirical models. One of the most severe is the validity and currency of the data on which the models are based: due to the rapid development of component technologies, many empirical models can become obsolete unless they are frequently updated.

It may also be argued that the use profile and use environment on which an empirical model is based may be very different from those in which the component is about to be applied. Therefore, selecting a model based on a telecom standard is a good idea if the design is to be used in a telecom application; a military standard model may not be an equally good choice in that case.

Not all stress factors are comprehensively taken into account when the models are developed. For example, the effect of vibration is not explicitly visible in the models, even though this kind of stress may be embedded in the field data on which a model is based.

Interconnections are usually not taken into account in empirical models, even though their effect on reliability is increasing both in absolute terms (surface mount technology is dominant and solder interconnections are getting smaller) and in relative terms (semiconductor devices have become more rugged due to significant improvements in manufacturing processes). There is, however, no reason why interconnections could not be satisfactorily taken into account when using empirical models; one just needs to insert a representative interconnection model into the analysis.

Furthermore, the physical models and parameters embedded in the empirical models have been heavily criticized. One example is that only an Arrhenius-type (exponential) dependency on absolute temperature is used, even though it is well known that the actual dependency can be more complicated [16]. Another criticism relates to the selection of parameter values, which may not always have been the best possible.
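To show how sensitive such temperature factors are, the sketch below evaluates a plain Arrhenius acceleration factor for two assumed activation energies; the values are chosen only for illustration, but the spread also hints at why different activation energy choices lead to the diverging temperature dependencies discussed below.

# Arrhenius acceleration factor between a use temperature and a stress temperature:
# AF = exp((Ea / k) * (1/T_use - 1/T_stress)), with temperatures in kelvin.
# The activation energies below are merely assumed values for illustration.
from math import exp

K_BOLTZMANN_EV = 8.617e-5   # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))

for ea in (0.3, 0.7):       # two assumed activation energies in eV
    af = arrhenius_af(ea, t_use_c=40.0, t_stress_c=85.0)
    print(f"Ea = {ea} eV -> acceleration factor 40°C to 85°C: {af:.1f}")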

A further word of caution: since empirical models are based on field failure data, they may not be very suitable for new, radically different component types. Components that deviate only slightly from existing types in a reliability sense can, however, be analyzed relatively easily using the empirical models of those existing component types.

Regarding the simplifications related to system-level analysis, the main argument is that the constant failure rate assumption and the series-connection relation between components (all components are needed to keep the system operating) are not necessarily realistic.

In refs. [6] and [13], the different models and the reliability estimates obtained with them are studied. Both studies show very large deviations between the results obtained with alternative models. In ref. [6], a single component is analyzed, whereas in ref. [13], a system consisting of several components is also studied; nevertheless, significant deviations are found in both cases. Table 8.1 lists the failure rates predicted for a memory component: not only do the absolute values vary a lot, but the temperature dependency is also quite different, owing to the different choices of activation energy values.

Table 8.1 Failure rates for a memory component at different temperatures (© 1992 IEEE) [6]

The situation is unfortunately not much better for whole systems. In a study of six different circuit board assemblies, the deviation could be as high as 500% (over-pessimism) (Fig. 8.2). However, in certain cases the failure rate proved to be much lower than anticipated.

Fig. 8.2

Deviation from the observed failure rate for six different circuit board assemblies (© 1999 IEEE) [13]

When looking at these predictions and the evident deviations from the observed failure rates, one should, however, remember that predicting reliability always means working with models and parameters that carry considerable uncertainty. The strong dependence of the results on the chosen parameter values is unfortunately real, but it does not depend on the model type: finding the right activation energy value, for instance, is a common task whether the model is empirical or physical.

To obtain the best possible accuracy when using empirical models, it is recommended that a company updates the model parameters based on its own field data; the data then best reflect the use and use environment the components are likely to encounter. It is also recommended that interconnections are taken into account in the analysis, especially if the product applies novel interconnection technologies, where the risk of premature failure is larger.
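As a minimal illustration of how such an in-house figure can be derived, the sketch below computes an observed failure rate from invented field return data; a real analysis would also attach confidence bounds (e.g., via the chi-squared distribution) and map the result back onto the model's base failure rates.

# Point estimate of a constant failure rate from (hypothetical) in-house field data:
# lambda_hat = number of observed failures / cumulative operating hours.
units_in_field = 50_000
avg_hours_per_unit = 8_760          # roughly one year of continuous operation
observed_failures = 12

cumulative_hours = units_in_field * avg_hours_per_unit
lambda_hat = observed_failures / cumulative_hours

print(f"Observed failure rate: {lambda_hat:.3e} failures/hour "
      f"({lambda_hat * 1e9:.2f} FIT)")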

As the primary use of empirical models is in the early phase of product development, a clever reliability engineer can benefit greatly by recognizing the potential risks early on, when changes are still relatively easy to make. The relatively poor accuracy may therefore be compensated by the ease of use and the possibility of being involved early in an R&D project. Quite often reliability models are used to compare different designs; in that case the absolute accuracy of the reliability predictions is not of primary importance, but the indication of the primary risks is vital.