1 Introduction

Developing high-quality software is the ultimate goal of any software development organization, yet quality is also one of the most challenging aspects of development: it can usually be measured only after delivery, and reliability is its primary measure. Many attempts are therefore made to assess software quality during development. Such estimates can aid the engineering of high-quality software by providing useful insights to project managers. Data mining is the use of heuristic methods and tools to analyze a large set of data stored in a database or data warehouse. The goal is to discover hidden patterns and relationships in the data set and to summarize the data in a form that decision makers can understand, that is, to realize the transformation "data, information, knowledge, value". In software engineering, data mining technology can be used to collect effective data, store and organize it in a database, improve document security, and filter out dangerous and useless information.

Visagan et al. noted that organizations typically hold data on their past software versions that can be used to build models for predicting the reliability of the software under development. Their study used data collected from Sonar Cloud to build such a model (Visagan et al. 2020a). He et al. proposed an optimized software fault pattern miner based on complex networks. By analyzing the relationship between multiple execution traces of the software and its functions, a weighted software execution dependency graph model is established, a traversal database is generated by a depth-first search strategy, and the software path traversals are extracted to evaluate the reliability of the software (He et al. 2020). Agrawal proposed an unsupervised software fault prediction method to improve software reliability. The purpose of this method is to determine whether measures available early in the life cycle (requirements metrics) or measures available later in the life cycle (code metrics) can be used with clustering or classification techniques to identify failure-prone modules, and which technique gives better results (Agrawal 2017; Bhola et al. 2021). Many researchers (Li et al. 2017; Rajbahadur et al. 2017; Kishor et al. 2021a; Lu et al. 2015) have tried to use the data of one project to predict defects in other projects, which is known as cross-project defect prediction (CPDP).

The literature survey shows that achieving the optimum quality level is very important when developing new software. A number of techniques, methods, and models can be used for training and testing. The key point of software reliability evaluation is how to model the reliability of a software system. The BP artificial neural network is an important branch of data mining technology. It offers self-learning, self-organization, good fault tolerance, and good nonlinear approximation ability. The purpose of this study is therefore to verify the advantages of the artificial neural network, as a nonlinear model, over traditional linear models in software reliability prediction and evaluation.

2 Methods

2.1 Principle of BP neural network

The learning rule of the BP neural network uses the gradient descent method to adjust the weights and thresholds of the network continually through back propagation, so as to minimize the sum of squared errors of the network. The topological structure of the BP neural network model includes an input layer, a hidden layer, and an output layer, as shown in Fig. 1. There can be one or more hidden layers, and there is no coupling between the nodes of the same layer. The input signal is transmitted from the input layer nodes to the hidden layer nodes and then to the output layer nodes. The output of each layer of nodes affects only the output of the next layer of nodes.

Fig. 1 Neural network structure based on BP algorithm

Each node mimics the three most basic and important functions of a biological neuron: weighting, summation, and transfer. The neuron model is shown in Fig. 2, where \(x_{1} ,\;x_{2} ,\; \ldots ,\;x_{i} ,\; \ldots ,\;x_{n}\) represent the inputs from neurons 1, 2, …, i, …, n of the previous layer; \(w_{j1} ,\;w_{j2} ,\; \ldots ,\;w_{ji} ,\; \ldots ,\;w_{jn}\) are the weights, that is, the connection strengths between the neurons of the previous layer and the jth neuron of this layer; \(b_{j}\) is the threshold; \(f\left( \cdot \right)\) is the transfer function; and \(y_{j}\) is the output of the jth neuron of this layer, as given by formula (1).

$$y_{j} = f\left( {\sum\limits_{i = 1}^{n} {w_{ji} \cdot x_{i} + b_{j} } } \right)$$
(1)

Fig. 2 BP neurons

In a BP neural network, the transfer function of a node is usually a sigmoid function, although the output layer sometimes uses a linear function. A BP neural network can be regarded as a highly nonlinear mapping from input to output. Hornik et al. proved that if the input and output layers use linear transfer functions and the hidden layer uses a sigmoid transfer function, then a network with one hidden layer can approximate any rational function with arbitrary precision.
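As a minimal sketch of formula (1), the following Python fragment computes the output of a single neuron with either a sigmoid or a linear transfer function. The function names and the numeric values are assumptions chosen only for illustration; they do not come from the study.

```python
import math


def sigmoid(x):
    """Logistic sigmoid transfer function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias, transfer=sigmoid):
    """Formula (1): weighted sum of the inputs plus the threshold, then transfer."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(net)


# Hypothetical inputs, weights, and threshold (illustration only).
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.1]
b = 0.1

y_sigmoid = neuron_output(x, w, b)                       # hidden-layer style node
y_linear = neuron_output(x, w, b, transfer=lambda n: n)  # linear output node
```

The same weighted sum feeds both variants; only the transfer function differs, which is the distinction the paragraph above draws between hidden and output nodes.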

2.2 BP network learning algorithm

The standard BP neural network learning algorithm is as follows:

  • Compute the output of the nodes at each layer of the network, as shown in formula (2).

    $${\text{Out}}_{j} = f\left( {\sum\limits_{i = 1}^{n} {w_{ji} x_{i} + b_{j} } } \right)\;\;j = 1,\;2,\; \ldots ,\;m$$
    (2)

    \({\text{Out}}_{j}\) represents the output of the jth neuron in this layer, the transfer function is \(f(x) = 1/(1 + \exp ( - x))\), \(w_{ji}\) stands for the weight, \(b_{j}\) stands for the threshold, n is the number of neurons in the previous layer, and m is the number of neurons in this layer.

  • Calculate the total error E and check whether it has reached the tolerance threshold. The total error E between the expected output and the actual output is given by formula (3):

    $$E = \frac{1}{2}\sum\limits_{p} {\sum\limits_{i} {\left( {d_{pi} - o_{pi} } \right)^{2} } }$$
    (3)

where \(d_{pi}\) is the expected output of neuron i for the pth input pattern and \(o_{pi}\) is the actual output of that neuron. If the total error E reaches the tolerance threshold, learning is finished; otherwise, back-propagation learning is carried out.

  • Calculate the error signals of the network.

For neuron j in the output layer, the error signal \(\delta_{j}\) is given by formula (4):

$$\delta_{j} = {\text{Out}}_{j} \left( {1 - {\text{Out}}_{j} } \right)\left( {d_{pj} - o_{pj} } \right)$$
(4)

For neuron j in the hidden layer, there is no explicit target value. The error signal \(\delta_{j}\) is determined recursively from the error signals of all neurons k that are directly connected to neuron j and from the corresponding weights, as shown in formula (5).

$$\delta_{j} = {\text{Out}}_{j} \left( {1 - {\text{Out}}_{j} } \right)\left( {\sum\limits_{k} {\delta_{k} w_{kj} } } \right)$$
(5)

  • Modify the network weights.

During the nth iteration, the weights are adjusted according to formulas (6) and (7):

$$\Delta W_{ij} (n + 1) = \eta \delta_{j} {\text{Out}}_{i}$$
(6)
$$W_{ij} (n + 1) = W_{ij} (n) + \Delta W_{ij} (n + 1)$$
(7)

where \(W_{ij} (n + 1)\) is the weight, in round n+1, of the connection from neuron i, which provides the input, to neuron j in the next hidden or output layer; \(\eta\) is the learning rate; and \({\text{Out}}_{i}\) is the output of the ith neuron.

The entire learning process iterates until the total error E reaches the tolerance threshold. In practice, a maximum number of iterations M is also specified, and learning ends when the number of iterations exceeds M.
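The steps above can be sketched in plain Python for a network with one hidden layer. This is an illustrative sketch under stated assumptions, not the implementation used in the study: both layers use the sigmoid transfer function so that formula (4) applies directly, thresholds are updated like weights with a constant input of 1, weights are updated after every pattern, and the toy data set (logical OR), the learning rate, and the initial weight range are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hid, b_hid, w_out, b_out):
    """Formula (2), applied to the hidden layer and then to the output layer."""
    hid = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
           for ws, b in zip(w_hid, b_hid)]
    out = [sigmoid(sum(w * h for w, h in zip(ws, hid)) + b)
           for ws, b in zip(w_out, b_out)]
    return hid, out


def total_error(samples, params):
    """Formula (3): half the sum of squared errors over all patterns and outputs."""
    return 0.5 * sum((d - o) ** 2
                     for x, target in samples
                     for d, o in zip(target, forward(x, *params)[1]))


def train(samples, n_in, n_hid, n_out, eta=0.5, tol=1e-3, max_iter=5000, seed=0):
    rng = random.Random(seed)
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
    b_hid = [rng.uniform(-0.5, 0.5) for _ in range(n_hid)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
    b_out = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    params = (w_hid, b_hid, w_out, b_out)
    for _ in range(max_iter):              # stop after at most M = max_iter rounds
        if total_error(samples, params) <= tol:
            break                          # tolerance threshold reached
        for x, target in samples:
            hid, out = forward(x, *params)
            # Formula (4): error signals of the output neurons.
            d_out = [o * (1 - o) * (d - o) for o, d in zip(out, target)]
            # Formula (5): error signals of the hidden neurons (old weights).
            d_hid = [h * (1 - h) * sum(d_out[k] * w_out[k][j] for k in range(n_out))
                     for j, h in enumerate(hid)]
            # Formulas (6) and (7): corrections of weights and thresholds.
            for k in range(n_out):
                for j in range(n_hid):
                    w_out[k][j] += eta * d_out[k] * hid[j]
                b_out[k] += eta * d_out[k]
            for j in range(n_hid):
                for i in range(n_in):
                    w_hid[j][i] += eta * d_hid[j] * x[i]
                b_hid[j] += eta * d_hid[j]
    return params


# Hypothetical toy data: logical OR.
samples = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
initial_error = total_error(samples, train(samples, 2, 2, 1, max_iter=0))
final_error = total_error(samples, train(samples, 2, 2, 1))
```

Calling `train` with `max_iter=0` returns the untrained network for the same seed, so `initial_error` and `final_error` can be compared to see the effect of learning.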

2.3 Reliability modeling based on BP network

Software reliability evaluation predicts the reliability of the target software from an analysis of its failure data. Dynamic prediction of the number of software failures can be defined simply as follows: given the cumulative execution times collected during software testing up to the current time, together with the corresponding cumulative numbers of observed faults, and given the cumulative execution time of a future test phase, calculate the corresponding predicted cumulative number of failures (Ding and Xing 2020). In terms of a neural network mapping, this can be described as \(P:\left\{ {(I_{t} ,O_{t} ),i_{t + h} } \right\} \to O_{t + h}\), where \(\left\{ {(I_{t} ,O_{t} )} \right\}\) represents the failure history of the software system and \(O_{t + h}\) is the prediction for the future time \(i_{t + h}\).

After a certain number of tests have been carried out on the software system, the collected software failure data can be used as training samples for the neural network to build a model and predict the cumulative number of failures in the future. The basic method is as follows: the structure of the BP network is fixed in advance according to the training algorithm and user experience, and the historical software failure data are then used to train the network until its performance meets the stated requirements. The trained neural network can then evaluate the reliability of the software in the future (Visagan et al. 2020b).

The BP network is designed with three layers: an input layer, a hidden layer, and an output layer. For dynamic prediction of the number of software failures, the input layer has only one node, which represents the cumulative execution time t, and the number of nodes in the hidden layer is determined by experience, generally 2 to 3 times the number of input nodes. The output layer also has only one node, which represents the cumulative number of failures. In practice, such a network structure is found to converge poorly. The main reason is that the input pattern space is one-dimensional and does not provide enough information for network learning. Therefore, the input pattern space must be extended (Krejsa et al. 2017; Jairath et al. 2021). The usual method is to use a complete set of orthogonal functions as extension functions, for example \(\sin \pi x\), \(\cos \pi x\), \(\sin 2\pi x\), \(\cos 2\pi x\), or the powers \(x\), \(x^{2}\), \(x^{3}\), …, \(x^{12}\); with such an extension the prediction effect is better.
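The power-series extension of the one-dimensional input can be sketched as follows. The function name `expand_input` is hypothetical, and the sketch assumes the scalar input has already been scaled to the range 0 to 1 so that the higher powers stay bounded.

```python
def expand_input(x, degree=12):
    """Map a scalar x to the feature vector [x, x**2, ..., x**degree]."""
    return [x ** k for k in range(1, degree + 1)]


# Hypothetical normalized cumulative execution time (illustration only).
features = expand_input(0.5)
```

With `degree=12` the resulting vector has 12 components, which matches the 12 input nodes of the network described in this section.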

Therefore, a 12-24-1 BP feedforward network is adopted, that is, there are 12 nodes in the input layer, 24 nodes in the hidden layer, and 1 node in the output layer. The network training process is as follows:

  (1) Normalize the inputs \(t\), \(t^{2}\), …, \(t^{12}\) of the input nodes to the range 0~1, and normalize the expected value of the output node, that is, the cumulative number of faults found in testing, to the range 0~1.

  (2) Initialize the weights of each layer of the network.

  (3) Compute the output of each layer of nodes in the network.

  (4) Calculate the global error E and check whether the tolerance threshold is reached. If the allowable range is reached, the training ends. Otherwise, go to step (5).

  (5) Back-propagate the error and correct the weights and thresholds to reduce the error between the actual output and the expected output.

  (6) After all weights and thresholds are corrected, return to step (3) to continue the training.

After the network is trained, the cumulative execution time of the software is fed to the network, and the value of the output layer is the cumulative number of faults predicted by the BP network. The flow chart is shown in Fig. 3.

Fig. 3 Neural network training and prediction process
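The normalization in the first training step, and the inverse normalization needed to turn a network output back into a failure count, can be sketched as below. The source does not state which scaling formula was used, so min-max scaling is an assumption here, and the sample values are made up for illustration.

```python
def fit_min_max(values):
    """Return the (lo, hi) bounds used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant sequence")
    return lo, hi


def normalize(value, lo, hi):
    """Scale a value so that lo maps to 0 and hi maps to 1."""
    return (value - lo) / (hi - lo)


def denormalize(scaled, lo, hi):
    """Inverse of normalize: map a scaled value back to the original units."""
    return scaled * (hi - lo) + lo


# Hypothetical cumulative test times in hours (not the data of Table 1).
times = [40.0, 80.0, 120.0, 160.0]
lo, hi = fit_min_max(times)
scaled_times = [normalize(t, lo, hi) for t in times]
restored = denormalize(scaled_times[2], lo, hi)
```

If the bounds are taken from the training data only, later test times map to values above 1, so the bounds would need to cover the prediction horizon for every input to stay within 0~1.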

3 Examples and conclusions

3.1 System structure of test software

A bank's general security management system, BHRF, was tested with the BP network model. After several rounds of modification and testing, the functions of the BHRF system had become stable, and its reliability needed to be tested and evaluated. To this end, a large amount of failure data for the software was collected in advance. The severity of bugs was divided into four levels: Urgent (the most serious), High, Medium, and Low. The distribution of bug severity is shown in Fig. 4.

Fig. 4 Bug severity distributions

Since bugs with low severity are mostly related to the display of interface elements, while software reliability concerns the failure of software functions, only bugs with medium severity are counted (Nosratian et al. 2020; Kishor et al. 2021b). The cumulative test time of the BHRF system and the corresponding cumulative numbers of failures are shown in Table 1.

Table 1 BHRF failure data statistics

To verify the advantage of the nonlinear BP network in software reliability evaluation, it must be compared with traditional linear evaluation models.

3.2 Reliability evaluation based on BP network

MATLAB 6.5 was used to design and simulate the BP neural network. The structure of the BP neural network follows the reliability evaluation principle described above. The network consists of three layers: an input layer, a hidden layer, and an output layer (Zhang et al. 2018; Mahajan et al. 2021). The dimension of the input layer is 12, the hidden layer has 24 neurons, and the output layer contains 1 neuron, as shown in Fig. 5. The hidden layer uses the log-sigmoid transfer function (LOGSIG), whose output is \({\text{logsig}}(n) = 1/(1 + \exp ( - n))\). The output layer uses a linear transfer function (PURELIN), which passes its input through unchanged: \({\text{purelin}}(n) = n\).

Fig. 5 Neural network simulation structure designs

TRAINGDM was used as the training function. TRAINGDM trains the network with the gradient descent with momentum back-propagation algorithm. When updating weights and thresholds, this function considers not only the current gradient direction but also the gradient direction of the previous step, which reduces the sensitivity of network performance to parameter adjustment and helps the training avoid being trapped in local minima (Eppelbaum 2015). The other training parameters are set as follows:

net.trainParam.epochs = 100000

net.trainParam.goal = 0.00001

net.trainParam.lr = 0.01

net.trainParam.max_fail = 5

net.trainParam.mc = 0.9

net.trainParam.min_grad = 1e-10

net.trainParam.show = 25

net.trainParam.time = inf
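To illustrate how the learning rate (lr = 0.01) and the momentum constant (mc = 0.9) interact, the following Python sketch applies one common textbook form of gradient descent with momentum to a toy objective. It is not the TRAINGDM implementation; the exact update rule of the toolbox is not given in the source, and the function `momentum_step` and the objective are hypothetical.

```python
def momentum_step(weights, grads, prev_deltas, lr=0.01, mc=0.9):
    """One update: the new step blends the previous step with the current gradient."""
    deltas = [mc * pd - lr * g for pd, g in zip(prev_deltas, grads)]
    new_weights = [w + d for w, d in zip(weights, deltas)]
    return new_weights, deltas


# Toy objective f(w) = w**2 with gradient 2*w (illustration only).
w, delta = [1.0], [0.0]
for _ in range(200):
    w, delta = momentum_step(w, [2 * w[0]], delta)
```

Because each step retains 90% of the previous step, the update keeps moving in a consistent direction even when the current gradient is small, which is the behavior the paragraph above attributes to the momentum term.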

The first 16 groups of data in Table 1 are taken as the training set. The input pattern space is extended according to the \(x,\;x^{2} ,\; \ldots ,\;x^{12}\) function form described above, and the cumulative test times and cumulative numbers of failures in the training set are normalized. The data are then fed into the network for training. When the network converges after 100,000 iterations, the training ends.

The last four groups of data in Table 1 are taken as the test set. The cumulative test times of the test set are extended and normalized in the same way, and the processed data are then fed into the BP neural network trained above (Chakraborty et al. 2015; Dash et al. 2021) to obtain the simulation results. After inverse normalization, the predicted cumulative numbers of failures when the cumulative test time reaches 680, 720, 760, and 800 h are 204.8970, 210.1200, 223.8680, and 239.0520, respectively.

3.3 Reliability evaluation of the comprehensive model based on linear network

To compare model performance, three classic software reliability models are selected to model the software failure data in Table 1: the Jelinski-Moranda model (JM), the Goel-Okumoto model (GO), and the Yamada S-Shaped model (S-Shaped) (Bhuyan et al. 2016). The comprehensive model is a linear neural network with two layers, an input layer and an output layer. The dimension of the input layer is 3, and the dimension of the output layer is 1, as shown in Fig. 6. The output layer uses a linear transfer function (PURELIN), which passes its input through unchanged: purelin(n) = n.

Fig. 6 Neural network simulation structure design
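For orientation, two of the classic models named above are usually written in terms of a mean value function m(t), the expected cumulative number of failures by time t. The sketch below uses the usual textbook forms: m(t) = a(1 - exp(-bt)) for the Goel-Okumoto model and m(t) = a(1 - (1 + bt)exp(-bt)) for the Yamada delayed S-shaped model. The source gives neither these equations nor the fitted parameters, so the parameter values a and b here are purely hypothetical.

```python
import math


def go_mean_failures(t, a, b):
    """Goel-Okumoto mean value function: a * (1 - exp(-b * t))."""
    return a * (1.0 - math.exp(-b * t))


def s_shaped_mean_failures(t, a, b):
    """Yamada delayed S-shaped mean value function: a * (1 - (1 + b*t) * exp(-b*t))."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


# Hypothetical parameters: a = total expected failures, b = detection rate.
a, b = 100.0, 0.01
go_at_50 = go_mean_failures(50.0, a, b)
s_at_50 = s_shaped_mean_failures(50.0, a, b)
```

Both curves start at 0 and saturate at a; for the same a and b the S-shaped curve rises more slowly at first, which is what distinguishes the two growth shapes.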

Similarly, we modeled the first 16 groups of data in Table 1 and predicted the last 4 groups. The actual cumulative numbers of failures and the simulation results of the JM model, the GO model, the S-Shaped model, the BP neural network model, and the comprehensive model based on the linear neural network are listed in Tables 2 and 3.

Table 2 Actual cumulative failure times and simulation results of each model
Table 3 Test data sets

The comparison in Tables 2 and 3 shows that the BP neural network prediction method has higher prediction accuracy than the traditional models (JM, GO, and S-Shaped), which demonstrates the feasibility and effectiveness of the artificial neural network for software reliability prediction and evaluation. However, the forecast data in Table 3 also show that the BP neural network predicts the 17th, 18th, and 19th groups of cumulative failure counts more accurately, while the prediction deviation for the 20th group is large. Repeated simulations of this kind show that the short-term prediction of the BP neural network is better than its long-term prediction, and that the average error tends to grow as time goes on (He et al. 2020). If long-term prediction is required, a rolling training method can be adopted: as time goes by, new samples are continuously added for training, which can improve the long-term prediction accuracy of the neural network.
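The rolling training idea can be sketched as a loop that refits the model each time a new sample arrives and then predicts only the next point. In the sketch below a least-squares line stands in for retraining the BP network, purely to keep the example short and self-contained; the function names and the data are hypothetical.

```python
def fit_line(ts, ys):
    """Least-squares line through the samples; a stand-in for retraining the network."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / sxx
    return slope, mean_y - slope * mean_t


def rolling_forecast(ts, ys, n_initial):
    """Refit on all samples seen so far, then predict one step ahead."""
    predictions = []
    for k in range(n_initial, len(ts)):
        slope, intercept = fit_line(ts[:k], ys[:k])    # retrain with the newest samples
        predictions.append(slope * ts[k] + intercept)  # short-term prediction only
    return predictions


# Hypothetical cumulative test times and failure counts (illustration only).
ts = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
predictions = rolling_forecast(ts, ys, n_initial=3)
```

Each prediction is made only one step beyond the data the model has seen, so the forecast horizon stays short even as the test campaign grows longer.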

4 Conclusions

Data mining plays an important role in many fields, including software reliability prediction and evaluation. After software development is completed, a large amount of failure data is generated during testing. A BP neural network model is built from these data; by learning from the data, it acquires the ability to judge and predict faults, and the reliability of the software can then be evaluated. The topological structure of the BP neural network model includes an input layer, a hidden layer, and an output layer, and the network can be regarded as a highly nonlinear mapping from input to output. If the input and output layers use linear transfer functions and the hidden layer uses a sigmoid transfer function, then a network with one hidden layer can approximate any rational function with arbitrary precision. In this study, the BP neural network model is compared with traditional linear data mining models. According to the experimental data, the BP neural network model, as a nonlinear data mining method, achieves higher accuracy in fault prediction and troubleshooting than the traditional linear models.

Reliability is the first prerequisite for using software, and evaluating it is an urgent computing problem. Software with poor reliability not only fails to function as it should, but also causes failures and increases operating costs. The results indicate that nonlinear data mining has broad application prospects for software reliability evaluation. However, there are still some caveats. For example, although it has been shown that the hidden layer of the BP neural network model can approximate any function arbitrarily closely, this is only a theoretical result. In actual operation, the design still relies on the knowledge and practical experience of practitioners, so there is still broad room for future research.

Based on the forecast data in Table 3, it can be concluded that the BP neural network predicts the 17th, 18th, and 19th groups of cumulative failure counts more accurately, while the prediction deviation for the 20th group is large.