Abstract
Software reliability is an important quantifiable attribute of software products, assessed by measuring faults during testing. Time-based software reliability models are used to identify defects in a product, but they are not suitable for dynamic situations. A few studies replace time with test effort through an effort function, which is unrealistic under infinite testing time. Since estimating the number of defects is essential in software reliability modelling, this research work presents a Pareto distribution (PD) combined with an artificial neural network (ANN) to predict the fault distribution of software under homogeneous and non-homogeneous conditions. The methodology enables the parallel evolution of a product through neural network models that exhibit estimated Pareto optimality with respect to multiple error measures. The proposed PD-ANN-based SRGM describes several types of failure data and improves the accuracy of parameter estimation over existing growth models such as the homogeneous Poisson process and two fuzzy time series-based software reliability models. Experimental evidence for the general applicability of the proposed framework is presented by generating solutions for different product and developer indexes.
1 Introduction
According to a survey by the International Telecommunication Union, approximately 50% of the world's population now uses computers, the Internet and other electronic equipment. Digital inclusion is concerned with the needs and demands of people who use interactive applications. The backbone of these applications is software, so it is essential to deliver each application as an error-free module. Software engineering aims to produce reliable software that satisfies user requirements, and reliability plays a vital role in development because the quality of a software model is measured by it. Software testing is the specific phase of development in which such application-based software products are examined: developers identify bugs through continuous execution under different test cases. Test cases (Zhang et al. 2018) and reliability are proportional to each other, as reliability increases when the number of test cases increases. However, increasing reliability through additional test cases requires more effort and raises the cost, so it is essential to plan test-case allocation before testing begins in order to reduce the development cost.
Software reliability plays a significant role in many everyday applications, and it is defined as the probability of failure-free operation over a specified time under a specified environment. In the software development life cycle, failures are introduced by designers, developers, analysts and managers at different phases. By testing the models, users can assess present and future stability and decide whether the product can be released in its present state or requires further testing to enhance its quality. To identify such errors and fix the bugs, the entire software system is tested so that the quality of the product increases. Figure 1 illustrates the software reliability engineering process as a cycle for achieving a reliable model.
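As a minimal illustration of this definition, under a constant failure rate \( \lambda \) the probability of failure-free operation over \( [0, t] \) is \( R(t) = e^{-\lambda t} \). The following sketch computes it; the failure rate and mission time are hypothetical values, not data from this study:

```python
import math

def reliability(failure_rate, t):
    """Probability of failure-free operation over [0, t] for a constant
    failure rate: R(t) = exp(-failure_rate * t)."""
    return math.exp(-failure_rate * t)

# Hypothetical example: 0.01 failures per hour over a 100-hour mission.
print(round(reliability(0.01, 100), 4))
```

A zero failure rate gives a reliability of exactly 1, and reliability decays exponentially as either the rate or the mission time grows.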
In general, reliability models are categorized into deterministic and probabilistic models. A deterministic model identifies faults based on the certainty of the assertions used; it makes prediction-based performance estimates over different positions and times and is reliable for simple manipulations. A probabilistic model uses probability values and is further divided into failure rate models, error seeding models, fault count models and software reliability growth models (SRGM). The failure rate model focuses on the failure rate rather than the number of failures; the rate is obtained over a specific duration rather than between failure gaps. Error seeding models estimate the faults from the known number of seeded defects in the product. The software reliability growth model is applied in the last phase of software development and improves reliability by estimating the residual defects.
A major problem in software testing is deciding when to stop the testing process. The decision is difficult because, owing to the repetitive examination process and the developer's own confidence level, the developer may fail to identify faults in the software. Revealing faults through software testing metrics measures how much a module deviates from the desired output. Conventional testing metrics measure the violation rate and thereby uncover errors: each statement is evaluated and its violations are identified. However, a program contains a large number of statements, so the estimation must measure both the violations and the errors. Research effort in developing software for testing assertions would benefit the validation of the product and satisfy the user.
2 Related works
Many research works on software reliability growth models are available, and this section provides a summary of the existing models. These models are analysed with respect to the reliability term and the stage of software development at which they are used. The failure rate in the product development life cycle can be obtained through relevant reliability measures, the level of the developed product and user opinion. Metrics play an important role in software testing for early prediction even before any failure data is available. Wang and Zhang (2018) used a deep learning neural network to achieve reliability in software testing: size and fitness functions are considered to predict faults in the assertions using a recurrent neural network encoder and decoder, which helps predict the fault count and improve the reliability of the product. Özakıncı and Tarhan (2018) review early software defect prediction, highlighting the various prediction methods and their detection performance; by including a metric function for defining faults, the early defect prediction model is regarded as the better model, as it yields quantitative results.
The findings of Ivanov et al. (2018) are useful for analysing software reliability, as that work compares and summarizes various reliability models. Open source software (Yang et al. 2016) such as Sailfish and Tizen is used in the comparison to project the experimental results. Software metrics (Amara and Rabai 2017) are considered as parameters when estimating faults in reliability measurements. Conventional software measurements use basic metrics to stop the evaluation of the assertions, so it is essential to develop a model that assesses reliability through proper decisions to improve the quality of the product. In software estimation, Bayesian network models (Dragicevic et al. 2017; Rana et al. 2016) are used in some research models to predict faults; data sets are used to estimate parameters such as relative error, prediction level, accuracy and mean squared error. Bayesian models have an advantage over other models because they can evaluate agile data sets and provide better prediction accuracy and reliability.
To consolidate the test maturity levels of developers, the challenges present in the developed assertions must be examined through a proper state of the art. Garousi et al. (2017) present a survey of the various software testing models in terms of maturity level; this comprehensive study helps identify the advantages of, and the issues in, the evaluation process so as to improve the maturity level. Wang et al. (2016) introduce a Poisson process for analysing faults in the software reliability process; a non-homogeneous model is considered for fault prediction and classification through the Weibull distribution function (Washizaki 2017; Awad 2016), and the prediction performance is analysed similarly to the Poisson distribution function. Pitfalls (Byun et al. 2017) are an important concern in practical software quality measurement; typical pitfalls include Hawthorne effects, uncertain functions and quality issues that must be addressed to improve the prediction level of machine learning-based models. A matrix-based reliability model is discussed in the literature (Sagar et al. 2016) for the growth analysis of systems; this model has unique properties, as its functions are defined in matrix terms to analyse the faults in the system.
A feedback-based prediction mechanism for software testing is discussed by Xiao et al. (2018): the results of the executed assertions are analysed, and the variations are fed back to the developer to improve the product according to the requirements. Identifying faults in software is essential, and in some cases it is even useful to inject faults (Kooli et al. 2017) into the product to improve reliability; various injection and prediction models are discussed in that work. Test cases play a vital role in software engineering, and Zhang et al. (2018) employ unlabelled test case functions to analyse fault prediction results for a large number of assertions. Shao et al. (2018) propose software fault prediction based on a rule mining approach for defining the faults in the assertions; atomic class association rules are used to identify defects in the data set, and a comparative experiment with other classifiers is performed.
Fuzzy logic-based software reliability quantification models (Rizvi et al. 2016) make the testing process more accurate as an early-stage perspective approach. The earlier distribution models use performance metrics to make the results converge and focus only on the faults rather than on their classification; introducing GUI-based (Banerjee 2017), artificial intelligence-based (Elmishali et al. 2018), fuzzy and machine learning-based (Singh et al. 2018) approaches into the software testing process improves the system efficiency. Summarizing the related works, it is evident that combining a distribution function with classification methodologies provides better fault prediction and classification. The proposed model therefore combines a Pareto distribution function with an artificial neural network to obtain an improved reliability model for software engineering.
3 Proposed work
The mathematical model of the Pareto distribution (PD) for identifying the fault distribution under non-homogeneous conditions is presented in this section, and a neural network-based classification is combined with it to obtain a suitable reliability model for software development. The Pareto distribution model captures the random behaviour of the faults in the developed product. The distribution function is given as

$$F(x) = 1 - \left( \frac{e}{x} \right)^{\alpha }, \quad x \ge e,$$

where \( \alpha \) > 0 and e > 0. The mean and variance of the distribution are

$$E(X) = \frac{\alpha }{\alpha - 1}e \; (\alpha > 1), \qquad {\text{Var}}(X) = \frac{\alpha }{\left( \alpha - 1 \right)^{2} \left( \alpha - 2 \right)}e^{2} \; (\alpha > 2).$$
The density function of the Pareto distribution is

$$f(x) = \frac{\alpha e^{\alpha }}{x^{\alpha + 1}}, \quad x \ge e,$$

and the transformed variable \( Y = \ln (X/e) \) derived from this density is exponentially distributed with rate \( \alpha \).
The distribution function is based on the information about the faults, and the excess faults are identified using the maximum likelihood estimator \( \hat{e} = \min_{i} X_{i} \). For the independent Pareto-distributed random variables \( x_{1} ,x_{2} , \ldots ,x_{n} \), the likelihood is given as
The statistic \( T = \sum\nolimits_{i = 1}^{n} \ln (x_{i} /\hat{e}) \) is \( \varGamma \)-distributed, the maximum likelihood estimator of the shape parameter is \( \hat{\alpha } = n/T \), and the above function is changed into
The joint density function of \( x_{1} ,x_{2} , \ldots ,x_{n} \) is given as
Since \( x_{1} ,x_{2} , \ldots ,x_{n} \) are independent Pareto-distributed random variables, the outputs \( y_{1} ,y_{2} , \ldots ,y_{n} \) are also independent and identically exponentially distributed, and they follow the central limit theorem:
The moment estimator of \( \alpha \) is used, and when X is Pareto distributed the central limit theorem statement becomes
The above expression is normally distributed with parameters \( \frac{\alpha }{\alpha - 1}e \) and \( \frac{\alpha }{{\left( {\alpha - 1} \right)^{2} \left( {\alpha - 2} \right)}}e^{2} \) for \( \alpha > 2 \). In some cases a fault cannot be predicted, and the estimator is then asymptotically distributed with these parameters. The asymptotic variance satisfies \( {\text{ASy}}.{\text{Var}}\left( {\alpha^{0} } \right) > {\text{Var}}\left( {\alpha^{*} } \right) \). Instead of identifying single faults, the total number of faults, and the number of faults that exceed the limit, are obtained using the estimator value. The efficiency is defined as the ratio of the asymptotic variance to the normal variance, and it is given as
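The estimators above can be sketched numerically. The following fragment is an illustrative sketch, not the authors' implementation; the true parameters and the sample size are hypothetical. It draws a Pareto sample, applies the maximum likelihood estimators \( \hat{e} = \min_{i} X_{i} \) and \( \hat{\alpha } = n/\sum\nolimits_{i} \ln (x_{i} /\hat{e}) \), and checks the sample mean against the stated formula \( \frac{\alpha }{\alpha - 1}e \):

```python
import math
import random

def pareto_mle(xs):
    """Maximum likelihood estimates for a Pareto(alpha, e) sample:
    e_hat = min_i x_i and alpha_hat = n / sum_i ln(x_i / e_hat)."""
    e_hat = min(xs)
    return len(xs) / sum(math.log(x / e_hat) for x in xs), e_hat

random.seed(0)
alpha, e = 3.0, 1.0                         # hypothetical true parameters
# random.paretovariate(alpha) samples a Pareto with unit scale; rescale by e.
xs = [e * random.paretovariate(alpha) for _ in range(20000)]
a_hat, e_hat = pareto_mle(xs)

sample_mean = sum(xs) / len(xs)
theoretical_mean = alpha / (alpha - 1) * e  # the mean stated in the text
print(round(a_hat, 2), round(e_hat, 4), round(sample_mean, 3))
```

With a large sample, both estimators land close to the true parameters, and the empirical mean matches the closed-form mean, which is the consistency property the asymptotic-variance discussion above relies on.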
Machine learning techniques are introduced to select the suitable data for identifying fault classes; this is achieved by combining a neural network with the Pareto distribution model as a hybrid system. The fitness function in the machine learning approach helps assess the quality of the product by classifying faults through matching with existing data. Traditional models have limitations in terms of validity and of long-term and short-term testing reliability when predicting faults. The ANN in the proposed model uses the random functions of the Pareto distribution described in the previous section. Fault selection is based on the sequence with minimum training error, which produces accurate data for the test product: a low training error rate and high accuracy as an early-stage perspective approach. Using a feedforward neural network, the proposed algorithm obtains the mean function as
where \( w_{3} ,w_{2} ,w_{1} ,w_{0} \) are the fault weights assigned to the data obtained from the Pareto distribution function. These values activate the ANN model for the selected software reliability growth model and its test function. To estimate the failure data based on the weights, the cumulative testing time is obtained from the cumulative number of failures. Using the software failure data pairs and the mean value, the estimated weights define the activation function for the hidden layers, and a linear function is used as the activation module in the output layer. The time function \( t \) is then replaced with the test effort function W(t), and the mean value becomes
where W(t) is the test effort accumulated over the time interval [0, n], \( E_{\text{er}} \) is the expected number of detected errors and b is a constant. The validity of the proposed model is assessed using the failure intensity function, which is given as
The proposed model combines the Pareto distribution function and the ANN model. The failure intensity is obtained from the ANN model using the test data and the training data. Figure 2 illustrates the neural network's input, hidden and output layers as a function of the failure data.
Applying the ANN to the results of the Pareto distribution function improves fault identification and thus provides better reliability for the software product. The process starts from the user requirements, followed by developing attributes for those requirements. The final product is tested using the Pareto distribution function to obtain the fault ratio as a level-based examination. The collected data is then given to the ANN model for further classification, and the faults are classified. The proposed model is illustrated as a block diagram in Fig. 3.
The pseudocode of the proposed model is summarized in the following steps:

- Step 1: Normalize the data set patterns.
- Step 2: Obtain the error rate and the failure intervals using the distribution function.
- Step 3: Obtain the average mean and the asymptotic mean for the failure rate and the errors.
- Step 4: Initialize the weight values for the random functions.
- Step 5: Define the error rate criterion.
- Step 6: Obtain the activation function for the layers of the ANN using the mean values from Step 3.
- Step 7: Calculate the neurons of the proposed model using the activation function.
- Step 8: Calculate the error from the output layer and the hidden layer recursively.
- Step 9: Adjust the weights to classify the faults using the output and hidden layer values.
- Step 10: Repeat the process until the defined criterion is met.
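The ten steps above can be sketched end to end as follows. This is an illustrative reconstruction, not the authors' MATLAB/.NET implementation: the synthetic Pareto failure data, the network size and the learning rate are all assumptions.

```python
import math
import random

random.seed(1)

# Steps 1-3: generate failure intervals from a Pareto model, then normalize
# the (cumulative test time, cumulative failure count) patterns.
alpha, scale = 2.5, 1.0
intervals = [scale * random.paretovariate(alpha) for _ in range(60)]
times = [sum(intervals[:i + 1]) for i in range(60)]
counts = list(range(1, 61))
X = [t / times[-1] for t in times]        # normalized cumulative test time
Y = [c / counts[-1] for c in counts]      # normalized cumulative failures

# Step 4: initialize weights for a 1-H-1 feedforward network.
H = 5
w1 = [random.uniform(-1, 1) for _ in range(H)]   # input -> hidden
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]   # hidden -> output
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    """Steps 6-7: sigmoid hidden layer, linear output unit."""
    h = [sigmoid(w1[j] * x + b1[j]) for j in range(H)]
    return sum(w2[j] * h[j] for j in range(H)) + b2, h

# Step 5: error criterion; Steps 8-10: backpropagate until it is met.
lr, target_mse = 0.3, 1e-3
mse = float("inf")
for epoch in range(20000):
    mse = 0.0
    for x, y in zip(X, Y):
        out, h = forward(x)
        err = out - y                     # Step 8: output-layer error
        mse += err * err
        b2 -= lr * err
        for j in range(H):
            grad_h = err * w2[j] * h[j] * (1.0 - h[j])  # hidden-layer error
            w2[j] -= lr * err * h[j]      # Step 9: weight adjustment
            w1[j] -= lr * grad_h * x
            b1[j] -= lr * grad_h
    mse /= len(X)
    if mse < target_mse:                  # Step 10: stop criterion
        break

print(round(mse, 5))
```

After training, the network maps normalized test time to the expected normalized failure count, i.e. it plays the role of the mean value function fitted in Sect. 4.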
4 Result and discussion
The proposed model is built around an artificial neural network; therefore, the implementation has been performed in MATLAB 14.1 through the neural network toolbox. The implementation and the classification of the failure and error rate data from the Pareto distribution are done through a .NET-based model. For the analysis, the design metrics of the neural network, and the neurons for the input and output functions, use the mean and variance values produced by the Pareto distribution function.
Figure 4 gives the validation performance (R-value) of the neural network model as a function of the number of neurons. It is observed that as the number of neurons increases, the R-value also increases. The number of neurons is limited to 15 in the proposed model.
Four data sets are used to analyse the proposed framework, and Table 1 describes the failure data sets obtained in the initial stage of the proposed model.
Using multiple weights in the ANN improves the quality of the product, and these failure data sets, which include the duration and frequency of failures and the overall analysis time, are used to analyse the proposed model. Four data sets with different values, obtained as a subset of the database of the Tandem Computers company, are used to evaluate performance. The minimum training error for the proposed model is defined over the 65% of the data used as the training set for the selected weights, and about 15% of the data, non-overlapping with the training set, is used to verify the prediction and detection accuracy of the proposed model. Table 2 gives the MSE values of the proposed model for training and validation.
Figure 5 depicts the expected number of faults from the Pareto distribution ANN model, compared with the fuzzy model and the conventional ANN model, for the four data sets over the specified testing time. The essential fitness function is based on the adjustments to the parameters. It is observed that the proposed PD-ANN model estimates the number of faults for the data sets and provides the best matching function for all four data sets.
The mean squared error for training and validation of the four weight sets of the proposed model is given in Table 2. The training error is similar across data sets that contain similar failure rates; if the failure rate increases, the MSE value also increases. The final result for each data set was obtained on the basis of three trials: even when the validation error is large on the first trial, it reduces to small values by the third trial, and only validation errors lower than those of the third trial are expected to fit future data equally well. Estimating the release time is essential for concluding the test phase, and Table 3 gives the release time estimates for the data sets used in the proposed model.
The experimental results for the software reliability R(t, T) at the 10th, 20th and 30th weeks are given in Fig. 6 as the reliability growth rate for the corresponding weeks. It is observed that the curves drop in the initial stage while the faults are slowly removed; once the faults are removed, the proposed model reaches a stationary point.
Figure 7 gives the evolution of the reliability R(t, T), analogous to the reliability growth rate analysis, at the 10th, 20th and 30th weeks. It is observed that the initial values fall to a stationary zero value as the weeks increase.
The performance of the proposed model is compared with the existing conventional artificial neural network model and the fuzzy model. It is observed that the proposed model confirms the selected faults and enhances system performance on the 15% test data; the MSE calculated for the test data is given in Table 4.
5 Conclusions and future work
In this research, we presented a novel software testing technique: a software reliability growth model based on a Pareto distribution and an artificial neural network. The main aim of the research is to enhance the performance of software testing systems for large sets of assertions and to provide an error-free product that meets the users' requirements through the software developers. Identifying and classifying faults helps the developers improve the product and also indicates the level at which to stop the assertion analysis. To evaluate the proposed model, experiments were performed on the theoretical model with different sets of assertions. The performance of the proposed model is considerably better than that of the existing neural network-based models, as the Pareto distribution function had not been applied in earlier research models. In future work, we intend to extend this research with optimization models to enhance the performance for commercial software products.
Change history
03 September 2024
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s00500-024-10136-z
References
Amara D, Rabai LBA (2017) Towards a new framework of software reliability measurement based on software metrics. Proc Comput Sci 109:725–730
Awad M (2016) Economic allocation of reliability growth testing using Weibull distributions. Reliab Eng Syst Saf 152:273–280
Banerjee I (2017) Advances in model-based testing of GUI-based software. Adv Comput 105:45–78
Byun J-E, Noh H-M, Song J (2017) Reliability growth analysis of k-out-of-N systems using matrix-based system reliability method. Reliab Eng Syst Saf 165:410–421
Dragicevic S, Celar S, Turic M (2017) Bayesian network model for task effort estimation in agile software development. J Syst Softw 127:109–119
Elmishali A, Stern R, Kalech M (2018) An Artificial Intelligence paradigm for troubleshooting software bugs. Eng Appl Artif Intell 69:147–156
Garousi V, Felderer M, Hacaloğlu T (2017) Software test maturity assessment and test process improvement: a multivocal literature review. Inf Softw Technol 85:16–42
Ivanov V, Reznik A, Succi G (2018) Comparing the reliability of software systems: a case study on mobile operating systems. Inf Sci 423:398–411
Kooli M, Kaddachi F, Di Natale G, Bosio A, Torres L (2017) Computing reliability: on the differences between software testing and software fault injection techniques. Microprocess Microsyst 50:102–112
Özakıncı R, Tarhan A (2018) Early software defect prediction: a systematic map and review. J Syst Softw 144:216–239
Rana R, Staron M, Berger C, Hansson J, Meding W (2016) Analyzing defect inflow distribution and applying Bayesian inference method for software defect prediction in large software projects. J Syst Softw 117:229–244
Rizvi SWA, Singh VK, Khan RA (2016) Fuzzy logic based software reliability quantification framework: early stage perspective (FLSRQF). Proc Comput Sci 89:359–368
Sagar BB, Saket RK, Singh G (2016) Exponentiated Weibull distribution approach-based inflection S-shaped software reliability growth model. Ain Shams Eng J 7(3):973–991
Shao Y, Liu B, Wang S, Li G (2018) A novel software defect prediction based on atomic class-association rule mining. Expert Syst Appl 114:237–254
Singh A, Bhatia R, Singhrova A (2018) Taxonomy of machine learning algorithms in software fault prediction using object-oriented metrics. Proc Comput Sci 132:993–1001
Wang J, Zhang C (2018) Software reliability prediction using a deep learning model based on the RNN encoder–decoder. Reliab Eng Syst Saf 170:73–82
Wang J, Zhibo W, Shu Y, Zhang Z (2016) An optimized method for software reliability model based on nonhomogeneous Poisson process. Appl Math Model 40(13–14):6324–6339
Washizaki H (2017) Pitfalls and countermeasures in software quality measurements and evaluations. Adv Comput 107:1–22
Xiao P, Liu B, Wang S (2018) Feedback-based integrated prediction: defect prediction based on feedback from software testing process. J Syst Softw 143:159–171
Yang J, Liu Y, Xie M, Zhao M (2016) Modeling and analysis of reliability of multi-release open source software incorporating both fault detection and correction processes. J Syst Softw 115:102–110
Zhang X-Y, Zheng Z, Cai K-Y (2018) Exploring the usefulness of unlabelled test cases in software fault localization. J Syst Softw 136:278–290
Ethics declarations
Conflict of interest
We do not have any conflicts of interest.
Research involving human participants and/or animals
No animals or humans were involved.
Informed consent
We use our own content.
Additional information
Communicated by Sahul Smys.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sudharson, D., Prabha, D. RETRACTED ARTICLE: A novel machine learning approach for software reliability growth modelling with pareto distribution function. Soft Comput 23, 8379–8387 (2019). https://doi.org/10.1007/s00500-019-04047-7