
1 Introduction

Software Cost Estimation (SCE) of a software project starts from the initial phase of software development, which includes generating proposal requests, analysis, contract negotiation, planning, scheduling, design, implementation, maintenance, and monitoring and control. The estimation process includes size and effort estimation, initial project scheduling and, finally, estimation of the overall cost of the project. Accurate estimation of software cost is necessary to complete a project within time and budget and to prevent project failure. If effort is estimated too low, it may lead to problems in managing the project, delayed delivery, budget overruns and low software quality. If effort is estimated too high, it may cause business loss and inefficient use of resources. Accuracy in software estimation is important for developers as well as customers, as it determines what, where and when resources will be used, and allows the impact of requirement changes to be analyzed. Various SCE models have been developed to manage a software project's budget and schedule. Each estimation model developed so far has its own significance and is applicable to specific types of projects. The criteria used to evaluate the accuracy of a software estimation model are therefore very important for completing a software project successfully. The techniques used in estimating software cost have their own features as well as limitations, some of which are described in Table 1.

Table 1 Features and limitations of techniques used in SCE

2 Review of SCE Models Based on the Technique Used

A lot of research has been carried out on the implementation of SCE models using various methodologies. A brief overview of past research on developing SCE models is given here.

While developing an SCE model, an ANN acts as a proven, practical way to reduce the model's input space (and thus computational complexity and human effort) while maintaining the same level of effort-prediction accuracy. An automated SCE applied to the COCOMO dataset using a feed-forward BPNN and tested on the COCOMO NASA 2 dataset may help project managers make fast and realistic estimates of project effort and development time [1, 2]. The Matlab Neural Network toolbox with data from multiple projects can be used to validate, train and simulate the network, with the observations that the neural network performs better than COCOMO, and that cascade correlation performs better than the neural network [3]. A BPNN model with COCOMO data works well for small projects, while a neural network with Resilient Back Propagation is better suited to large projects [4]. A Radial Basis Function Neural Network with the K-means clustering algorithm can perform better in terms of accurate cost estimation [5]. A neuro-fuzzy Constructive Cost Model (COCOMO) shows that estimation accuracy can be improved over the COCOMO model using industry project data [6, 7].
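As an illustration of this family of models, the following minimal Python sketch (synthetic COCOMO-style data; not the configuration of any cited study) trains a small feed-forward network by back-propagation to map size and an aggregate cost driver onto effort:

```python
# Minimal sketch of a feed-forward BPNN effort estimator on synthetic data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
kloc = rng.uniform(2, 400, 200)             # project size in KLOC
eaf = rng.uniform(0.7, 1.4, 200)            # aggregate effort-adjustment factor
effort = 2.94 * kloc**1.05 * eaf            # COCOMO-like ground truth (person-months)

X = np.column_stack([np.log(kloc), eaf])    # log of size linearizes the power law
y = np.log(effort)

net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(X[:150], y[:150])                   # back-propagation training
pred = np.exp(net.predict(X[150:]))         # back to person-months
mmre = np.mean(np.abs(pred - effort[150:]) / effort[150:])
print(f"MMRE on held-out projects: {mmre:.3f}")
```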

FL addresses the problems of vague, imprecise and incomplete data to make reliable and accurate effort estimates. FL can be used to develop an SCE model by fuzzifying function points, applying membership functions (e.g. triangular, trapezoidal or Gaussian) to represent the cost drivers, and defuzzifying the results to obtain the resulting effort. An SCE model developed using FL with membership functions gives better performance than the COCOMO model, as tested and evaluated on a dataset of software projects [8]. Triangular fuzzy logic applied to NASA software projects, representing linguistic terms in Function Point Analysis (FPA) with complexity metrics, estimates size in person-hours [9]. FL with a Gaussian Membership Function (GMF) applied to COCOMO cost drivers gives results closer to the actual effort than the trapezoidal function [10]. FL with the Takagi-Sugeno technique applied to COCOMO and SLOC using Function Points (FP) gives simple, better estimation capabilities and a mathematical relationship between the effort and the inputs [11, 12, 13].
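A minimal sketch of the fuzzification step, assuming illustrative (uncalibrated) break-points for a single cost-driver rating scale:

```python
# Fuzzifying one crisp cost-driver value with triangular and Gaussian
# membership functions; the break-points below are illustrative only.
import numpy as np

def tri_mf(x, a, m, b):
    """Triangular membership T(a, m, b) with a <= m <= b."""
    return np.maximum(np.minimum((x - a) / (m - a), (b - x) / (b - m)), 0.0)

def gauss_mf(x, c, sigma):
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

rating = 0.87                               # crisp cost-driver value
low = tri_mf(rating, 0.70, 0.85, 1.00)
nominal = tri_mf(rating, 0.85, 1.00, 1.15)
high = tri_mf(rating, 1.00, 1.15, 1.30)
print("triangular memberships (L, N, H):", low, nominal, high)
print("Gaussian membership (nominal):", gauss_mf(rating, 1.0, 0.07))
```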

Genetic Programming provides a more advanced mathematical function to predict estimated effort more accurately. A data mining tool can be used to increase the accuracy of effort estimation by selecting a subset of highly predictive attributes, such as project size and development- and environment-related attributes. A GA can be used to assess a software project in terms of effort computation, taking much less time and performing better than the COCOMO model on the NASA software project dataset. A GA can also provide better results than COCOMO II, as tested on Turkish and industry datasets [14–16].
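The following toy sketch shows the general idea of GA-based calibration, here applied to the COCOMO parameters A and B on synthetic data; the selection and crossover operators are deliberately simple and are not those of the cited studies:

```python
# Toy GA calibrating effort = A * KLOC^B against synthetic project data.
import numpy as np

rng = np.random.default_rng(1)
kloc = rng.uniform(5, 300, 60)
actual = 3.1 * kloc**0.95 * rng.normal(1.0, 0.05, 60)    # synthetic measured effort

def fitness(pop):
    pred = pop[:, [0]] * kloc**pop[:, [1]]               # one prediction row per individual
    return -np.mean(np.abs(pred - actual) / actual, axis=1)  # negative MMRE

pop = np.column_stack([rng.uniform(1, 5, 40), rng.uniform(0.8, 1.2, 40)])
for _ in range(100):
    f = fitness(pop)
    parents = pop[np.argsort(f)[-20:]]                   # truncation selection
    cross = (parents[rng.integers(0, 20, 40)] + parents[rng.integers(0, 20, 40)]) / 2
    pop = cross + rng.normal(0, 0.01, cross.shape)       # arithmetic crossover + mutation
best = pop[np.argmax(fitness(pop))]
print(f"calibrated A = {best[0]:.2f}, B = {best[1]:.3f}")
```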

PSO with clustering can perform efficient effort estimation with learning ability, providing an efficient, flexible and user-friendly way to perform the task. More accuracy in SCE can be achieved than with standard COCOMO by applying PSO with K-means clustering to the COCOMO model, which enables learning from past project data and domain-specific projection of future resource requirements. PSO with inertia weight applied to COCOMO data of NASA software projects can be used to calculate the Mean Absolute Relative Error (MARE), Variance Absolute Relative Error (VARE) and Variance Accounted For (VAF) [17, 18, 19].
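A compact sketch of PSO with inertia weight tuning the same COCOMO parameters, in the spirit of the cited work; the swarm constants and data are illustrative:

```python
# PSO with inertia weight minimizing MARE over COCOMO parameters (A, B).
import numpy as np

rng = np.random.default_rng(2)
kloc = rng.uniform(5, 300, 60)
actual = 3.1 * kloc**0.95 * rng.normal(1.0, 0.05, 60)

def mare(params):                                        # mean absolute relative error
    pred = params[:, [0]] * kloc**params[:, [1]]
    return np.mean(np.abs(actual - pred) / actual, axis=1)

n, w, c1, c2 = 30, 0.7, 1.5, 1.5                         # swarm size, inertia, accel. coeffs
x = np.column_stack([rng.uniform(1, 5, n), rng.uniform(0.8, 1.2, n)])
v = np.zeros_like(x)
pbest, pbest_f = x.copy(), mare(x)
for _ in range(200):
    g = pbest[np.argmin(pbest_f)]                        # global best position
    v = w * v + c1 * rng.random((n, 1)) * (pbest - x) + c2 * rng.random((n, 1)) * (g - x)
    x = x + v
    f = mare(x)
    better = f < pbest_f
    pbest[better], pbest_f[better] = x[better], f[better]
print("best (A, B):", pbest[np.argmin(pbest_f)], " MARE:", pbest_f.min())
```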

Lines of Code, Function Points or COSMIC FFP can each be used to measure the size of a software project. COSMIC FFP provides a simple, easy-to-use, proven and practical solution for software size estimation and quality improvement. COSMIC FFP uses a functional size unit for SCE, where one COSMIC Functional Size Unit (CFSU) is assigned for each entry/exit of a data group and for each read/write operation on a data group [20].
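The CFSU counting rule quoted above can be transcribed directly; the 'update customer record' process below is a hypothetical example:

```python
# One CFSU per data movement: Entry, Exit, Read or Write, as stated above.
from collections import Counter

def cosmic_cfsu(data_movements):
    """data_movements: list of 'E' (entry), 'X' (exit), 'R' (read), 'W' (write)."""
    counts = Counter(data_movements)
    return sum(counts[m] for m in "EXRW")

# A hypothetical 'update customer record' functional process:
print(cosmic_cfsu(["E", "R", "W", "X"]), "CFSU")   # -> 4 CFSU
```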

3 Statistical Criteria to Analyze and Evaluate Performance of SCE Model

The statistical criteria used to analyze and evaluate the efficiency of a software cost estimation model are shown in Table 2.

Table 2 Evaluation criteria for SCE models based on actual effort and predicted effort
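For concreteness, the sketch below computes criteria commonly used in such tables from actual and predicted effort; the definitions follow their usual forms in the SCE literature, and Table 2 carries the exact variants used in this paper:

```python
# Common SCE evaluation criteria computed from actual vs. predicted effort.
import numpy as np

def mre(actual, pred):                        # magnitude of relative error per project
    return np.abs(actual - pred) / actual

def mmre(actual, pred):                       # mean magnitude of relative error
    return np.mean(mre(actual, pred))

def pred_n(actual, pred, n=0.25):             # PRED(25): share of projects with MRE <= 25%
    return np.mean(mre(actual, pred) <= n)

def vare(actual, pred):                       # variance of the absolute relative error
    return np.var(mre(actual, pred))

def vaf(actual, pred):                        # variance accounted for, in percent
    return (1 - np.var(actual - pred) / np.var(actual)) * 100

actual = np.array([120.0, 60.0, 310.0, 45.0])
pred = np.array([110.0, 72.0, 290.0, 50.0])
print(f"MMRE={mmre(actual, pred):.3f}  PRED(25)={pred_n(actual, pred):.2f}  "
      f"VARE={vare(actual, pred):.4f}  VAF={vaf(actual, pred):.1f}%")
```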

4 Proposed Model

Analysis of the literature reveals that SCE models developed using neural networks, fuzzy logic or a combination of both provide good results compared to other soft computing techniques. A neuro-fuzzy model acts as a powerful tool for predicting cost and quality by integrating numerical data and expert knowledge. The proposed neuro-fuzzy model has been derived from [2, 6, 7, 12]. The model has been validated using data on 93 NASA projects obtained from the PROMISE Software Engineering Repository. For the calculation of effort, the COCOMO II model has been used:

$$ Effort = A \times \left( KLOC \right)^{B + 0.01 \times \sum_{i = 1}^{5} SF_{i}} \times \prod_{j = 1}^{17} EM_{j} $$
$$ Schedule\;(in\;months) = C \times Effort^{D + 0.2 \times 0.01 \times \sum_{i = 1}^{5} SF_{i}} $$

where A, B, C and D are domain-specific parameters (by default A = 2.94, B = 0.91, C = 3.67, D = 0.28), SF_i are the five scale factors and EM_j are the seventeen effort multipliers.
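A worked evaluation of the two equations with the default parameters; the scale-factor values and nominal effort multipliers below are illustrative, not taken from the paper's dataset:

```python
# Worked COCOMO II calculation with the default (A, B, C, D) from the text.
import numpy as np

A, B, C, D = 2.94, 0.91, 3.67, 0.28
kloc = 100.0
sf = [3.72, 3.04, 4.24, 3.29, 4.68]          # five scale factors (illustrative values)
em = np.ones(17)                             # all 17 effort multipliers at nominal (1.0)

E = B + 0.01 * sum(sf)                       # scale exponent
effort = A * kloc**E * np.prod(em)           # person-months
schedule = C * effort**(D + 0.2 * 0.01 * sum(sf))   # months
print(f"effort = {effort:.1f} PM, schedule = {schedule:.1f} months")
```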

Cost drivers such as analyst capability and application experience are used in the calculation of development effort. Fuzzification converts the crisp data into linguistic variables, which are passed to the inference engine. A fuzzy set has been defined for six qualitative rating levels for every cost driver, expressed in the linguistic terms very low (VL), low (L), nominal (N), high (H), very high (VH) and extra high (XH). The membership function used is the triangular function, a three-point function defined by its minimum (α), modal (m) and maximum (β) values, i.e. T(α, m, β), where α ≤ m ≤ β. The rules can be based on a single parameter or a combination of parameters, e.g. (a small sketch of evaluating such rules follows the list):

  • if (PREC is Very Low) then (EFFORT is Extra High)
  • if (PREC is Low) then (EFFORT is Very High)
  • if (FLEX is Very Low) then (EFFORT is Extra High) etc.
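The sketch below fires the single-antecedent rules listed above: it fuzzifies the PREC scale factor with triangular membership functions and collects the EFFORT conclusions, assuming illustrative (uncalibrated) break-points:

```python
# Firing the single-antecedent rules from the list above on a crisp PREC value.
def tri(x, a, m, b):
    """Triangular membership T(a, m, b)."""
    return max(min((x - a) / (m - a), (b - x) / (b - m)), 0.0)

prec = 1.2                                            # crisp PREC rating
memberships = {
    "Very Low": tri(prec, 0.0, 0.5, 1.5),
    "Low": tri(prec, 0.5, 1.5, 2.5),
}
rules = {"Very Low": "Extra High", "Low": "Very High"}   # rule base from the list above
fired = {rules[lvl]: mu for lvl, mu in memberships.items() if mu > 0}
print("fired EFFORT conclusions:", fired)
```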

For defuzzification, the centroid method, which calculates the Centre of Gravity (COG) of the area under the curve, has been used.

$$ E = \frac{w_{1} \left( a\alpha^{b} \right) + w_{2} \left( am^{b} \right) + w_{3} \left( a\beta^{b} \right)}{w_{1} + w_{2} + w_{3}} $$

where w1, w2 and w3 are the weights of the optimistic, most likely and pessimistic estimates respectively; the maximum weight is given to the most likely estimate. (aα^b) denotes the optimistic estimate, (am^b) the most likely estimate and (aβ^b) the pessimistic estimate.
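Evaluating the defuzzification formula directly, with an assumed weight split that favours the most likely estimate and illustrative values for a, b, α, m and β:

```python
# Direct evaluation of the weighted defuzzification formula above.
a, b = 2.94, 0.91                      # COCOMO-style multiplier and exponent
alpha, m, beta = 80.0, 100.0, 130.0    # optimistic / most likely / pessimistic size
w1, w2, w3 = 1.0, 4.0, 1.0             # the most likely estimate gets the largest weight

E = (w1 * a * alpha**b + w2 * a * m**b + w3 * a * beta**b) / (w1 + w2 + w3)
print(f"defuzzified effort E = {E:.1f}")
```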

The proposed model has been implemented in MATLAB R2013 using ANFIS (Adaptive Neuro-Fuzzy Inference System). A hybrid learning algorithm combining the least-squares method with back-propagation gradient descent (for small projects) or Resilient BPNN (for large projects) is used to identify the parameters of the Sugeno-type fuzzy inference system, as shown in Fig. 1.
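Since the paper's implementation uses MATLAB's ANFIS, the following Python sketch only illustrates one forward pass of a first-order Sugeno ANFIS and the least-squares half of the hybrid learning rule, on synthetic data; all names and constants are assumptions:

```python
# One ANFIS hybrid-learning pass: Gaussian premise MFs, rule firing strengths,
# then the Sugeno consequent parameters fitted by least squares (the LSE half
# of the hybrid algorithm; back-propagation would tune the premises).
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(93, 2))            # 93 projects, 2 inputs (e.g. size, a driver)
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 0.05, 93)   # synthetic effort

def gauss(x, c, s):
    return np.exp(-0.5 * ((x - c) / s) ** 2)

# Two Gaussian MFs per input -> 2 x 2 = 4 Sugeno rules.
centers = np.array([[0.25, 0.75], [0.25, 0.75]])
sigma = 0.3

# Layers 1-3: membership degrees, rule firing strengths, normalization.
m0 = gauss(X[:, [0]], centers[0], sigma)       # (93, 2)
m1 = gauss(X[:, [1]], centers[1], sigma)       # (93, 2)
w = np.stack([m0[:, i] * m1[:, j] for i in range(2) for j in range(2)], axis=1)
wn = w / w.sum(axis=1, keepdims=True)          # normalized firing strengths

# Layers 4-5: first-order consequents y_r = p*x1 + q*x2 + r, fitted by LSE.
Phi = np.hstack([wn[:, [r]] * np.column_stack([X, np.ones(len(X))]) for r in range(4)])
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("training RMSE:", np.sqrt(np.mean((Phi @ theta - y) ** 2)))
```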

Fig. 1 ANFIS generation using clustering of training data, model structure and testing, with error 0.0058664

5 Empirical Analysis and Evaluation

Here we evaluate some popular models together with our proposed model, based on the statistical criteria defined in Sect. 3. The empirical evaluations are derived from a statistical analysis of the predicted effort and the actual effort given by the model of each particular technique.

6 Result Analysis

From the data obtained by empirical calculation for the selected SCE models, it can be verified that models based on neural networks, fuzzy logic or their combination perform better than the other methods, i.e. GA and PSO.

Using Table 3 to find the optimized model, Table 4 and Fig. 2 reveal that no model ranks first on all statistical criteria, while the proposed model ranks at least second on each. Although the proposed neuro-fuzzy model does not give the best results for every statistical parameter, taken as a whole it provides an optimized result for SCE.

Table 3 Evaluation of techniques used in SCE model using statistical parameters
Table 4 Evaluation of SCE techniques w.r.t. statistical parameters
Fig. 2 Performance of the proposed model compared with the other considered SCE models

7 Conclusion and Future Work

In this paper, a detailed empirical analysis and evaluation of SCE models developed through soft computing techniques (e.g. ANN, FL, GA, PSO) has been carried out using an in-depth review and statistical criteria. The final results indicate that none of the models shows perfect behavior: in terms of a certain measure one model qualifies as better than another, but for other measures it may be worse. An analytical review of the considered models shows that SCE models based on NN, FL or a combination of NN and FL can give better results than SCE models based on other techniques. With this in view, an optimized neuro-fuzzy SCE model has been proposed that provides optimum results for the considered statistical parameters compared to the other considered SCE models. Due to the limitations of NN and FL, the proposed model depends on the size and type of project and on the data used for training/learning. The empirical analysis suggests that there is still scope for improvement in neuro-fuzzy techniques for developing SCE models, which can be pursued in the near future. In future, improvements may be made by developing SCE models using other optimization techniques, such as Ant Colony Optimization and Bee Colony Optimization, to surpass the performance of the proposed neuro-fuzzy SCE model.