1 Introduction

Nowadays, software is an integral part of most complex applications, and people work under its direct or indirect influence. It is therefore very important to ensure the reliability of software systems. The reliability of a software system depends on the number of residual defects, where a defect is a product anomaly (IEEE 1988). A general way to assess the reliability of software is to reveal the defects present in it, and the metric usually used for this purpose is defect density (DD). DD is defined as the total number of defects divided by the size of the software (IEEE 1990). The software defect density indicator metric provides information about reliability improvement during the development phases.
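As a minimal illustration of this definition, the following Python sketch computes defect density as defects per unit of size. The function name, the choice of KLOC as the size unit, and the sample numbers are assumptions made for illustration only; the text states only that DD is the number of defects divided by the software size.

```python
def defect_density(total_defects: int, size_kloc: float) -> float:
    """Return defects per KLOC (the size unit is an assumption for illustration)."""
    if size_kloc <= 0:
        raise ValueError("size must be positive")
    return total_defects / size_kloc


# Hypothetical project: 30 defects found in 12 KLOC of code.
print(defect_density(30, 12.0))  # 2.5
```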

Software reliability is an important factor of software quality. It is the probability that software will not cause the failure of a system for a specified period of time under specified conditions (IEEE 1990). Reliability must be assured in almost all safety-critical systems. Software reliability models were designed to quantify the likelihood of software failure (IEEE 1988; Lyu 1996), where a failure is the termination of the ability of a functional unit to perform its required function (IEEE 1990). Software reliability plays an important role in the early software development phases (Musa et al. 1987). Many studies have addressed software reliability estimation and prediction (Lyu 1996; Pham 2007). Gaffney and Davis (1988) and Gaffney and Pietrolewiez (1990) proposed a phase-based model for predicting reliability using fault statistics. Rome Laboratory developed a model for early software reliability prediction (McCall et al. 1992; Friedman et al. 1992); the model is mainly based on the software requirement specification and data collected by the organization. Agresti and Evanco (1992) proposed a model to predict defect density on the basis of process and product characteristics. Smidts et al. (1998) developed a reliability prediction model based on requirement change requests during the software development life cycle (SDLC). The traditional models for software reliability prediction are neither universally successful in predicting reliability behavior nor generally tractable to users (Cai et al. 1991). The majority of these models are based on a probabilistic approach.

A causal model for defect prediction using Bayesian nets was developed by Fenton and Neil (1999) and Fenton et al. (2007, 2008). Its main feature is that it does not require detailed domain knowledge and that it combines qualitative and quantitative data. Mohanta et al. (2010, 2011) proposed a bottom-up model to predict the reliability of object-oriented systems during the early stages of product development. In this approach, the reliability of the overall system is estimated from the operational profile and the reliabilities of the classes. Octane and Yildiz (2014) proposed a novel method that uses Bayesian networks to explore the relationships among software metrics and defect proneness.

Pandey and Goyal (2009) proposed an early fault prediction model using process maturity and software metrics. They considered fuzzy profiles of various software metrics on different scales but did not explain the criteria used for developing these profiles. Method-level metrics are used in most fault prediction models. Yadav et al. (2012) proposed a software defect prediction model that considered only the uncertainty associated with the assessment of the software size metric and three metrics of the requirement analysis phase. Catal and Diri (2009) and Catal (2011) provided a systematic review of software fault prediction studies with a focus on metrics, methods and datasets. Radjenovic et al. (2013) reported that process metrics are successful in finding faults. Can et al. (2013) suggested a software defect prediction model that exploits the non-linear computing capability of support vector machines and the parameter optimization capability of particle swarm optimization. More recently, Maa et al. (2014) analyzed the ability of requirement metrics to predict software defects during the design phase.

Most software reliability models are based on failure data. However, failure data are not available in the early phases of the SDLC. Many factors affect software reliability during the SDLC. Zhang and Pham (2000) identified thirty-two factors that have an impact on software reliability. In other studies, Li et al. (2000) and Li and Smidts (2003) identified thirty software metrics that influence software reliability.

In fact, most software metrics are associated with uncertainty. The small size of software testing data sets, unrealistic assumptions, and the fact that some measures cannot be defined precisely are the key reasons for developing a fuzzy logic approach to predict software reliability in the early phases of the SDLC.

The rest of the paper is organized as follows. The proposed model and methodology are discussed in Sect. 2. A case study is presented in Sect. 3. Results and validation are discussed in Sects. 4 and 5, respectively. The conclusion is presented in Sect. 6.

2 Proposed model and methodology

In the proposed model, the defect density indicator for each early phase of the SDLC is predicted from the measures available in that phase. The model therefore uses the top-ranked reliability-relevant metrics (Li et al. 2000; Li and Smidts 2003) of the early phases of the SDLC. In the requirement analysis phase, the defect density indicator is predicted using the requirement fault density (RFD), requirement stability (RS), and review, inspection and walk-through (RIW) software metrics.

Fig. 1 Proposed model architecture

The defect density indicator predicted at the end of the requirement phase (RPDDI) is taken as input in the design phase, along with cyclomatic complexity (CC) and design review effectiveness (DRE), to predict the defect density indicator at the end of the design phase. Similarly, the defect density indicator predicted at the end of the design phase (DPDDI) is taken as input in the coding phase, along with programmer capability (PC) and process maturity (PM) (Fig. 1). At the end of the coding phase, the total number of defects in the software before the testing phase is predicted using the coding phase defect density indicator (CPDDI).
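The data flow described above can be sketched as three chained calls in which each phase consumes the indicator produced by the previous one. The three phase functions below are hypothetical stand-ins (plain averages of normalized inputs, with directions of influence chosen for illustration); they are not the fuzzy inference systems of the proposed model. Only the cascading structure is taken from the text.

```python
from statistics import mean

# Hypothetical stand-ins for the per-phase fuzzy inference systems.
# All inputs and outputs are assumed to be normalized to [0, 1].
def requirement_phase(rs, rfd, riw):
    # Assumption: higher stability and review quality lower the indicator.
    return mean([1 - rs, rfd, 1 - riw])

def design_phase(rpddi, cc, dre):
    # Assumption: higher complexity raises it, better reviews lower it.
    return mean([rpddi, cc, 1 - dre])

def coding_phase(dpddi, pc, pm):
    # Assumption: higher capability and maturity lower the indicator.
    return mean([dpddi, 1 - pc, 1 - pm])

def predict_indicators(rs, rfd, riw, cc, dre, pc, pm):
    """Cascade RPDDI -> DPDDI -> CPDDI: each output feeds the next phase."""
    rpddi = requirement_phase(rs, rfd, riw)
    dpddi = design_phase(rpddi, cc, dre)
    cpddi = coding_phase(dpddi, pc, pm)
    return rpddi, dpddi, cpddi

# Made-up normalized metric values for one project.
print(predict_indicators(0.8, 0.2, 0.7, 0.4, 0.6, 0.5, 0.5))
```

Because each phase sees only three inputs, a change in a coding phase metric such as PC affects CPDDI but leaves RPDDI and DPDDI untouched.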

The proposed model involves the following steps:

  A. Selection of software metrics

  B. Define the membership function of each input and output variable

  C. Design fuzzy rules

  D. Perform fuzzy inference and defuzzification

2.1 Selection of software metrics

Software metrics that are considered in the proposed model are explained as follows:

2.1.1 Requirement phase software metrics

  (i) Requirement stability (RS): Requirement stability is inversely proportional to the number of requirement change requests. Requirements may change at any time during software project development. Studies have shown that more than half of the errors in software development are due to imprecisely defined requirements.

  (ii) Requirement fault density (RFD): This metric measures the fraction of faulty requirement specification documents. It provides an indicator of the quality of the software under development during the requirement analysis phase.

  (iii) Review, inspection and walk-through (RIW): This activity refines the software product and can be applied at various points during software project development. The goal of the review process is to ensure that the software requirement specification is feasible, complete, consistent and accurate. From a quality point of view, it is a very important metric.

2.1.2 Design phase software metrics

  (i) Cyclomatic complexity (CC): The cyclomatic complexity measure introduced by McCabe (Kan 2002) was intended to indicate a program’s understandability and testability. It can be used as an upper bound in the model for estimating the number of remaining software defects.

  (ii) Design review effectiveness (DRE): Design defects are usually found by a design review process during software project development. The goal of the design review is to make sure that the design meets the stakeholders’ requirements or to determine whether the design requires modification.

2.1.3 Coding phase software metrics

  (i) Programmer capability (PC): Software complexity depends on the experience of the staff and their intelligence. A programmer with experience and a sound technical background will develop quality software with the fewest defects.

  (ii) Process maturity (PM): In a software company, the capability maturity model (CMM) plays a key role in defining software development process improvement. CMM has five levels, and software defect density decreases as an organization moves from one CMM level to the next.

2.2 Define the membership function of each input and output variable

There are many methods of membership value assignment, such as rank ordering, intuition, and inference (Ross 2004; Verma et al. 2007; Yadav et al. 2011, 2012; Yadav and Yadav 2013, 2014). In the intuition method, a fuzzy profile is derived from the ability of humans to develop it through their own innate intelligence and understanding. In the inference method, one uses knowledge to perform deductive reasoning. In the rank ordering method, the preferences of a single individual, a committee, a poll, or other opinion methods are used to assign membership values to a fuzzy variable. Membership functions for all the input and output software metrics considered in the proposed model should be defined by domain experts. Developing a fuzzy profile of the selected software metrics with the help of domain expert knowledge is one of the basic steps in designing a solution based on fuzzy set theory. There are no standard guidelines or rules for choosing an appropriate membership function construction technique. The lack of consensus on the definition and interpretation of membership functions makes their construction a further challenge. The majority of the methods are application-domain dependent and complex. It is impractical to use a different membership function construction technique for each application problem; however, it is not impossible to come up with a single construction technique that works for most application problems.

Membership functions can have a variety of shapes, such as polygonal, trapezoidal, and triangular (Ross 2004; Yadav et al. 2012; Yadav and Yadav 2013, 2014). However, triangular and trapezoidal shapes provide a convenient representation of domain expert knowledge and also simplify computation (Kaya and Alhajj 2003). In the proposed model, the membership functions of all input and output metrics are defined with the help of domain experts, and triangular and trapezoidal membership functions are used to represent the linguistic states.
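The two shapes can be written down directly as piecewise-linear functions. The sketch below is a generic Python illustration; the breakpoints in `PROFILE` are hypothetical values on a normalized [0, 1] scale and are not the expert-defined profiles of the proposed model.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], flat on [b, c], falls on [c, d].

    a == b or c == d is allowed, which gives the shoulder shapes that are
    commonly used at the ends of a normalized range.
    """
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical fuzzy profile of one normalized metric (illustrative values).
PROFILE = {
    "low": lambda x: trapezoidal(x, 0.0, 0.0, 0.2, 0.4),
    "medium": lambda x: triangular(x, 0.2, 0.5, 0.8),
    "high": lambda x: trapezoidal(x, 0.6, 0.8, 1.0, 1.0),
}


def fuzzify(x):
    """Return the degree of membership of x in each linguistic state."""
    return {state: mu(x) for state, mu in PROFILE.items()}


print(fuzzify(0.3))  # partly "low", partly "medium", not "high"
```

A crisp metric value usually belongs to two neighbouring linguistic states at once, which is how the uncertainty in the metric assessment enters the model.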

2.3 Design fuzzy rules

In this step, each fuzzy rule is defined in the form of an IF–THEN conditional statement:

  IF A is X THEN B is Y

The IF part of the rule is known as the antecedent and the THEN part as the consequent (Zadeh 1989). The fuzzy rules required for predicting the defects of software projects are defined using human intuition. Considering all the selected software metrics at the same time would require a large number of rules. Instead of using a generalized fuzzy inference method, the proposed model cascades the input variables, so that only the software metrics of one phase are considered at a time. Therefore, fewer rules are required.
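The saving from cascading can be made concrete with a small count. The sketch below assumes, purely for illustration, that every input variable has three linguistic states; the actual number of states per metric is defined by the fuzzy profiles of the model. A complete rule base needs one rule per combination of input states, so a single stage over all seven metrics is compared with three stages of three inputs each (RS, RFD, RIW; then RPDDI, CC, DRE; then DPDDI, PC, PM).

```python
from math import prod


def full_rule_count(states_per_input):
    """Upper bound on rules when all inputs share one rule base."""
    return prod(states_per_input)


def cascaded_rule_count(stages):
    """Upper bound when each stage has its own, smaller rule base."""
    return sum(full_rule_count(stage) for stage in stages)


k = 3  # assumed number of linguistic states per input (hypothetical)

single_stage = full_rule_count([k] * 7)             # all seven metrics at once
cascaded = cascaded_rule_count([[k] * 3] * 3)       # three phases, three inputs each

print(single_stage)  # 2187
print(cascaded)      # 81
```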

2.4 Perform fuzzy inference, and defuzzification

The fuzzy inference engine evaluates and combines the results of the individual fuzzy rules; it maps fuzzy sets into fuzzy sets. A fuzzy max–min operator is used for this step. In many applications, a crisp value needs to be obtained as output. A defuzzification method, such as centroid, max–min or bisection, maps a fuzzy set into a crisp value (Ross 2004). The process of fuzzy inference and defuzzification is shown in Fig. 2. The centroid method of defuzzification is used to calculate the crisp output value z* in this model.
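A minimal sketch of max–min inference followed by centroid defuzzification is given below in plain Python. The two generic inputs `x1` and `x2`, the fuzzy sets, and the four rules are hypothetical and chosen only to keep the example short; they are not the rule base of the proposed model, which was evaluated with the MATLAB fuzzy inference tool. The centroid is approximated on a discrete grid as z* = Σ z·μ(z) / Σ μ(z).

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets on a normalized [0, 1] scale. The feet extend
# beyond [0, 1] so that the end sets reach their peak at 0 and at 1.
IN_SETS = {"low": (-1.0, 0.0, 1.0), "high": (0.0, 1.0, 2.0)}
OUT_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Hypothetical rules: IF x1 is s1 AND x2 is s2 THEN z is out.
RULES = [
    ("low", "low", "low"),
    ("low", "high", "medium"),
    ("high", "low", "medium"),
    ("high", "high", "high"),
]


def infer(x1, x2, steps=101):
    """Max-min inference with centroid defuzzification on a grid of z values."""
    zs = [i / (steps - 1) for i in range(steps)]
    aggregated = [0.0] * steps
    for s1, s2, out in RULES:
        # AND of the antecedents is the minimum of their memberships.
        strength = min(tri(x1, *IN_SETS[s1]), tri(x2, *IN_SETS[s2]))
        for i, z in enumerate(zs):
            clipped = min(strength, tri(z, *OUT_SETS[out]))  # min implication
            aggregated[i] = max(aggregated[i], clipped)      # max aggregation
    total = sum(aggregated)
    if total == 0.0:
        raise ValueError("no rule fired for these inputs")
    # Discrete centroid: z* = sum(z * mu(z)) / sum(mu(z)).
    return sum(z * m for z, m in zip(zs, aggregated)) / total


print(infer(0.2, 0.3))
```

With this symmetric rule base, inputs near 0 give a crisp output near the low end of the scale and inputs near 1 give an output near the high end.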

Fig. 2 Process of fuzzy inference and defuzzification

3 Case studies

3.1 The data set used

To validate the proposed model, data sets from twenty real software projects (Fenton et al. 2008) are used as case studies; they are reproduced in Table 1.

Table 1 Software projects metrics data

3.2 Model illustration: case study 1

In this case study, software project one is used to explain the proposed approach. The following steps determine the defect density indicator and the total number of residual defects for software project one before the testing phase.

3.2.1 Selection of software metrics

The selected software metrics, together with their fuzzy ranges and values for the early phases of the SDLC, are shown in Tables 2, 3 and 4.

Table 2 Requirement analysis phase software metrics
Table 3 Design phase software metrics
Table 4 Coding phase software metrics

3.2.2 Define the membership function of input and output variable

This section illustrates the membership functions of the individual software metrics. The membership functions for each input and output software metric are shown in Figs. 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12. The proposed model consists of a set of input and output variables, and the ranges of the input and output metrics are in normalized form.

Fig. 3 Requirement stability

Fig. 4 Requirement fault density

Fig. 5 Review, inspection and walkthrough

Fig. 6 Requirement phase defect density indicator

Fig. 7 Cyclomatic complexity

Fig. 8 Design review effectiveness

Fig. 9 Design phase defect density indicator

Fig. 10 Programmer capability

Fig. 11 Process maturity

Fig. 12 Coding phase defect density indicator

3.2.3 Design fuzzy rules

The fuzzy rules for project one in the early phases of the SDLC are shown phase by phase in Tables 5, 6 and 7.

Table 5 Requirements phase fuzzy rule
Table 6 Design phase fuzzy rule
Table 7 Coding phase fuzzy rule

  (i) Requirements phase fuzzy rule: If RS is high, the defect level will be low, and if RFD is high, the defect level will be high; this relation does not apply to RIW. The fuzzy rules of Table 5 are interpreted accordingly.

  (ii) Design phase fuzzy rule: For lower values of CC, the defect level will be lower, but for lower values of DRE, the defect level will be higher. The fuzzy rules of Table 6 are developed accordingly.

  (iii) Coding phase fuzzy rule: If PC and PM are high, the defect level in a software project will be low. The fuzzy rules of Table 7 are developed accordingly.

3.2.4 Perform fuzzy inference, and defuzzification

The defect density indicator value at the end of the requirement analysis, design and coding phases is obtained using the fuzzy inference tool of MATLAB. The result of case study one is shown in Table 8.

Table 8 Defect density indicator of project one

4 Prediction result

The prediction results for the 20 case studies are shown in Table 9, which lists the actual defects, the defects predicted by the proposed model, and the defects predicted by Yadav et al. (2012) and Fenton et al. (2008). The defects of each software project are obtained from the defect density indicator in the coding phase of the respective project and are compared with the corresponding results of Fenton et al. (2008) and Yadav et al. (2012).

Table 9 Predicted defect density indicator in requirement analysis, design, and coding phase

We can observe from Fig. 13 that the highest defect density occurs in the requirement analysis phase, which also affects the design and coding phases later on. It is also observed that the software metrics responsible for the defect density in the initial phases of the SDLC need more attention than the metrics that become available in later phases. Early prediction of the software defect density indicator can improve the reliability of a software project and help software managers deliver reliable software within time and cost constraints.

Fig. 13 Defect density indicator in early phases

In case studies 8, 15 and 18, defects in the design phase are higher than in the requirement analysis phase, so design phase metrics are critical in these projects. Similarly, in case studies 1, 2 and 13, coding phase metrics require close attention along with design phase metrics.

5 Model validation

5.1 Evaluation measures

To validate the prediction accuracy of the proposed model, commonly used and recommended evaluation measures are applied (Fenton et al. 2008; Yadav et al. 2012; Chulani et al. 1999; Kitchenham et al. 2001).

  (i) Mean magnitude of relative error (MMRE)

MMRE is the mean of the absolute percentage errors and a measure of the spread of the variable Z, where Z = estimate/actual:

$$\text{MMRE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left| y_i - \hat{y}_i \right|}{y_i}$$

where \(y_i\) is the actual value and \(\hat{y}_i\) is the estimated value of the variable of interest.

  (ii) Balanced mean magnitude of relative error (BMMRE)

MMRE is unbalanced and penalizes overestimates more than underestimates. For this reason, a balanced mean magnitude of relative error is also considered, defined as follows:

$$\text{BMMRE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left| y_i - \hat{y}_i \right|}{\min\left( y_i, \hat{y}_i \right)}$$

Lower values of MMRE and BMMRE indicate better prediction accuracy.
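Both measures translate directly into code. The following Python sketch implements the two formulas as given above; the function names and the sample defect counts are made up for illustration and are not taken from Table 9.

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |y - y_hat| / y."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) / y for y, p in zip(actual, predicted)) / len(actual)


def bmmre(actual, predicted):
    """Balanced MMRE: mean of |y - y_hat| / min(y, y_hat)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) / min(y, p) for y, p in zip(actual, predicted)) / len(actual)


# Made-up defect counts for three projects.
actual = [100, 200, 50]
predicted = [110, 180, 50]
print(mmre(actual, predicted), bmmre(actual, predicted))

# Overestimating by a factor of two versus underestimating by a factor of two:
print(mmre([100], [200]), mmre([100], [50]))    # 1.0 0.5
print(bmmre([100], [200]), bmmre([100], [50]))  # 1.0 1.0
```

The last two lines show the imbalance: MMRE scores a twofold overestimate twice as badly as a twofold underestimate, whereas BMMRE treats both the same. Both functions assume strictly positive actual and predicted values, since a zero would cause a division by zero.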

5.2 Validation results

The proposed model is validated against the actual defects and against the predictions of Yadav et al. (2012) and Fenton et al. (2008). Fenton et al. (2008) proposed a Bayesian net model for predicting the software defects of the same software projects.

It can be observed in Table 10 that the MMRE and BMMRE for the proposed model are 0.0687 and 0.0757, respectively. Both values are considerably lower than those of the Fenton et al. (2008) and Yadav et al. (2012) models.

Table 10 Values of model evaluation measures

It can also be observed that the predictive accuracy of the models, as expressed by the different measures, increases with the size of the project. For all three models, the measures based on relative error (MMRE, BMMRE) decrease significantly as project size increases.

6 Conclusion

In this paper, a fuzzy logic based model is proposed for predicting the software defect density indicator in the early phases of the SDLC. The proposed model considers only reliability-relevant software metrics of the early phases of the SDLC and takes into account the uncertainty associated with them. The defects predicted for the 20 software projects are found to be very close to the actual defects detected during testing. The predicted defect density indicators are helpful for analyzing defect severity in the different artifacts of the SDLC of a software project. This provides guidance to software managers for the early identification of cost overruns, schedule mismatches and software development process issues, and for software resource allocation and release decisions.