Keywords

1 Introduction

With the extensive application of digital equipment in nuclear power plants (NPPs), computer software plays an increasingly important role in the digital instrumentation and control (I&C) systems of NPPs. The scale and importance of software are on the rise. However, with the increase of software scale, defect density increases geometrically. The following Table 1 gives the defect density of some applications. The increase of software defects not only leads to a great increase in the cost of defect location and repair, but also may lead to a large number of software failure and produce serious consequences. Therefore, the reliability of the software must be considered before the digital equipment is put into practical use of the safety I&C systems in NPPs.

Table 1. Defect density of some applications [1].

This research mainly uses the static test technology of software reliability qualitative analysis to detect and eliminate software defects, that is to say, to find and eliminate software defects without actually running the software. Software reliability analysis method is different from the defect detection methods of software testing. This method can be applied to the early days of software development process, find out the defects and the weak link in the software requirements or the software design, to avoid the continuation of defects to the subsequent development stage. In addition, it also reflects the principle of defect detection as early as possible, reducing the cost and time of defect repair. Commonly used software reliability analysis methods include software failure modes and effects analysis (SFMEA), software fault tree analysis (FTA), event tree analysis (ETA) and Petri net analysis.

This research firstly introduces the necessity of software reliability analysis of digital equipment, and then explains the advantages and disadvantages of FMEA and FTA and the principle of comprehensive analysis using the two methods. After that, according to the principle of comprehensive analysis, the application software of I&C system of nuclear power plant is taken as an example, and an analysis model is established to qualitatively analyze the reliability of the software. Finally, the software reliability analysis is summarized and prospected.

2 Software FMEA and FTA Comprehensive Analysis

2.1 The Advantages and Disadvantages of the Analysis Methods

The software FMEA is a bottom-up single factor failure analysis method, which cannot perfectly express the various logical relationships among failure reasons. In addition, the analysis process and results of the software FMEA are presented in table form, which is not as intuitive as the graphical expression of FTA.

Software FTA is a top-down method of pushing fault causes backward according to tree structure, which may miss potential key top-level failure events and fault causes, affect the importance ranking of bottom events, and thus affect fault control and judgment of implementing improvement measures. The tree structure of FTA is not as detailed as the form of FMEA in describing the analysis results and other information.

As mentioned above, the respective advantages of FMEA and FTA make up for each other’s shortcomings. In order to better reflect the completeness of software reliability analysis process and results and the intuitiveness of logical expression, find out the potential defects or problems as early as possible and improve the quality of software reliability analysis, this research comprehensively applies the two analysis techniques to case analysis.

2.2 The Principle of the Comprehensive Analysis

By referring to the general steps of software reliability qualitative analysis of nuclear power plant safety system in GB/T 9225, the analysis steps in Fig. 1 are formulated [2].

Fig. 1.
figure 1

Qualitative analysis steps of software reliability.

Among the steps, the most important analysis steps are the comprehensive analysis of FMEA and FTA (see Fig. 2). Starting from the failure mode of a certain function of the analysis object, the failure impact is firstly obtained through the software FMEA analysis. The failure impact can be used as the source of the top event of the software FTA, and the priority of the analysis of the software FTA can be determined according to the impact degree. It can also analyze the fault mode of the object as an intermediate event, and conduct downward analysis of the fault cause of software FTA. Based on the software FMEA, the tree structure is used to construct a more intuitive logic relationship between fault causes. Then, according to the constructed tree logic relationship, the minimum cut set is solved to judge the importance ranking of bottom events, and the key fault modes and their causes are found so as to formulate control or improvement measures.

Fig. 2.
figure 2

Principle of FMEA and FTA comprehensive analysis [3, 4].

3 Modeling and Application

3.1 System Function and Structure

In order to facilitate analysis and modeling, the schematic diagram of reactor trip protection system (RTS) shown in Fig. 3 is presented. The system consists of two mutually redundant protection channels, which acquire the same sensor analog signals respectively, and send them to each other’s voting logic through network communication at the same time. After the voting logic processing, both channels get the digital signal of trip protection and output it to the actuators or the reactor trip breakers.

Fig. 3.
figure 3

The schematic diagram of RTS.

3.2 Basic Conditions and Constraints

Based on the system characteristics of RTS, the corresponding basic conditions or constraints should be given before the software reliability analysis. Specific as follows:

  1. 1.

    For software common cause failure (CCF) that may affect RTS protection or critical safety functions, additional diversity system is set up to mitigate and protect.

  2. 2.

    Communication between the channel processors is confirmed by the receiving processor and, if necessary, the data defaults to a secure state.

  3. 3.

    If the processor loses the current signal, the processor replaces it with the last known valid signal in its history.

  4. 4.

    In the event of a single fault, the system operates normally and without bypass.

  5. 5.

    For a device that outputs trip/no-trip status signals, when it loses power or signal, the final output signal will be the RT safety status signal.

3.3 FMEA and FTA Comprehensive Analysis

According to the comprehensive analysis principle in Fig. 2, combining with the characteristics, basic conditions and constraints of the software system, the fault state or reason of the processor that may reduce the system reliability is identified by taking the software of the case system as the research object. At the same time, the fault states related to hardware are eliminated, and the software reliability is qualitatively analyzed based on the software fault mode. The analysis mainly includes five aspects: input fault mode, program fault mode, output fault mode, communication fault mode and human error. In the reliability analysis, the software system can only be accurately modeled when human factors are considered [5]. The analysis of software fault mode is shown in Table 2.

Table 2. Software fault mode analysis [6].

Based on the software fault mode analysis in Table 2, by analyzing the causes of failure of software execution functions, FTA identifies various logical relationships among the causes of failure, finds out the key causes of failure of software function, judges the effectiveness of measures to correct and prevent failure causes, and qualitatively evaluates the reliability of software.

As shown in the Fig. 4, the implementation logic of reactor trip (RT) function is 1/2 voting logic. In order to invalidate this function, the signal of two redundant channels shall be invalidated or the software common cause failure occurs between channels. The failure of single channel signal means that the processor of the channel is not available. Failure of hardware is not analyzed in this research. Software failure can be caused by input fault or program fault or output fault or communication fault or human error. Each fault mode can be subdivided into specific bottom events, as shown in Fig. 4.

Fig. 4.
figure 4

Fault tree model of RT function failure.

3.4 Solve the Minimum Cut Set

According to the fault tree model obtained in Fig. 4, the logical relationship between various bottom events has been obtained, and the downlink method is adopted to solve the minimum cut set.

The analysis rules of the downlink method are:

  1. 1.

    The order of cut set is increased when “and” gate is encountered (the number of base events contained in cut set), and;

  2. 2.

    The number of cut sets is increased when “or” gate is encountered.

The solution of the minimum cut set for the failure of RT function is shown in Table 3. It needs to be particularly noted that the two channels of the RTS are mutually redundant, so the intermediate event A3 is not further subdivided.

Table 3. The downlink method to solve the minimum cut set.

According to the data in the fifth column of Table 3, the number of cut sets in which RT function failure is 16.

The cut set is used to judge the importance of bottom events, and the influence of the occurrence of bottom events on the occurrence of top events is only analyzed from the fault tree structure without considering the probability of bottom events.

Basic principles of importance (\( {\text{I}}\varphi \)) analysis of ground events:

  1. 1.

    Only all basic events that appear in the same minimum cut set have equal importance;

  2. 2.

    In the minimum cut set, the fewer the number of basic event, the more important the basic event is;

  3. 3.

    The more times the basic event appears in different minimum cut sets, the more important it is.

To sum up, the importance (\( {\text{I}}\varphi \)) order of the bottom event of RT function failure is:

$$ \begin{aligned} & I\varphi \left( {X1} \right) > I\varphi \left( {X2} \right) = I\varphi \left( {X3} \right) = I\varphi \left( {X4} \right) = I\varphi \left( {X5} \right) = I\varphi \left( {X6} \right) = I\varphi \left( {X7} \right) = I\varphi \left( {X8} \right) = I\varphi \left( {X9} \right) = \\ & I\varphi \left( {X10} \right) = I\varphi \left( {X11} \right) = I\varphi \left( {X12} \right) = I\varphi \left( {X13} \right) = I\varphi \left( {X14} \right) = I\varphi \left( {X15} \right) = I\varphi \left( {X16} \right) \\ \end{aligned} $$

3.5 Develop Control or Improvement Measures

According to the order of the bottom event and its importance in the above cut set, it analyzes whether the RTS and its software have designed corresponding corrective and preventive measures to mitigate or eliminate the impact of the bottom event, as shown in Table 2.

Control or improvement measures have been taken to reduce or avoid the occurrence of top event for bottom events that may lead to the failure of RT function of top event (see Table 2). A single fault will not lead to the harmful consequences of the software, but the software common cause fault adopts a diversity system to provide protection, and the trip protection function achieved by the software design of the system has high reliability.

4 Conclusions

By combining the advantages and disadvantages of FMEA and FTA, this research adopted the qualitative reliability analysis method of FMEA and FTA comprehensive analysis to study the reliability of the software of nuclear power plant trip protection system. According to the principles and steps of comprehensive analysis, the FMEA analysis form and FTA fault tree model are established. On this basis, the importance ranking of the bottom event is obtained, and the corresponding prevention or improvement measures are designed to ensure the reliability of the studied software system according to the importance ranking. In conclusion, the research may probably conclude that the software of this RTS has high reliability. A whole set of qualitative and comprehensive analysis methods of software reliability proposed in this research can achieve quantitative evaluation of software reliability if the occurrence probability of bottom events can be obtained.