1 Introduction

Software development is becoming a necessity at a grandiose rate among all types and size of organizations. Software practitioners have become more and more apprehensive about their software cost and development. Varied software cost estimation models have been proposed over the past few years. However, they are unable to cope with the realistic realities of software engineering like handling imprecise information, dealing with uncertainty and many more [1,2,3,4].

The model proposed in this manuscript has been validated for its accuracy and estimation by using publicly available NASA93 software project data consisting of 20 projects with their values allocated to each cost driver. Results have been tabulated after comparing basic COCOMO and proposed fuzzy model. Results prove that the proposed model is more accurate and precise due to machine learning algorithm application that discovers knowledge and produces expertise results. Basic COCOMO model generates assumption-based results using historical data without applying any algorithms or sets.

The rest of this paper is categorized as follows: Sect. 2 reviews available literature and work done in field of software cost estimation. Section 3 describes easy and efficient way of estimating software cost parameters by using Costar software estimation tool based on COCOMO II model. It depicts how parameters like effort, schedule are estimated using pre-defined COCOMO equations. Section 4 details proposed neuro fuzzy model. Section 5 provides validation of given model followed by conclusion and references.

2 Literature review

Whenever an estimate is generated, all requirements and business rules are verified with previous estimations taken into account. As per Standish report [5], cost estimation survey states that 30% of projects never complete, average project exceeds cost by 90%, schedule by 67% and 15% of projects do not deliver anything.

Cai et al. [6] replaced probabilistic software models (PSRM) by fuzzy software reliability modeling. It is said that world is full of fuzziness and partial knowledge beliefs that can only be handled by fuzzy model. Karunanithi et al. [7] proposed connectionist models to predict software reliability and cost. Sitte [8] analyzed neural networks to conclude that these networks are simpler to use and prove best in nonlinear models. Tian and Noore [9] worked on evolutionary neural network approach founded on multi input and single output design. This approach performs better in case of failure time prediction. Su and Huang [10] suggested working and development of neural network approach to build dynamic weighted combinational model (DWCM) that predicts software maintainability. Madsen [11] designed soft computing framework to apply intelligent techniques for monitoring software reliability engineering. Kumar et al. [12] designed a framework for measuring software complexity, adaptability, security and usability but the model failed to measure interaction among various metrics. Wason et al. [13] proposed automata-based model to control runtime software reliability.

A plethora of studies led by researchers in context of measuring software quality to estimate cost of software have been conducted. A comparative study on algorithms used in NN model was conducted by Aggarwal et al. [14]. It includes quasi network method, levenberg model etc. A fuzzy model has also been suggested by Aggarwal et al. [15] for estimating software maintainability and usability. It is based on fuzzy logic approach that identifies error prone software components. Yang et al. [16] devised a model on fuzzy neural network in order to forecast software quality. The model produces empirical results by taking primary data and knowledge gained from human experiences into consideration. Despite, such large number of significant research to improve the process of software development and design, gap within estimated and predicted software costs does not seem to reduce. To achieve the same we propose an innovative, hybrid approach using COCOMO II to estimate software cost factors and NFNN model to calculate cost. The details of the proposed model are discussed in the next sections.

3 Using costar to estimate cost in COCOMO’81 and COCOMO II models

The traditional COCOMO’81 model uses metric in terms of Delivered Source Instructions (DSI), which is similar to SLOC while COCOMO II model defines metric in terms of SLOC. The major difference between both metrics is that SLOC consist of various physical lines.

3.1 Scale drivers

COCOMO II considers five scale drivers as follows:

  • Precedentedness.

  • Development flexibility.

  • Architecture/risk resolution.

  • Team cohesion.

  • Process maturity.

The above Scale Drivers replace the COCOMO’81 Development Mode (Organic, Semidetached, or Embedded).

3.2 Cost drivers

COCOMO II has 22 cost drivers consisting of 17 Effort multipliers and five scalar factors. For example, if our plan needs to build up software on airline reservation system in that case we have to place the Required Software Reliability (RELY) expenditure driver to very high.

A. Results produced by costar

Figures 1 and 2 below describe the process of cost estimation factors—effort and schedule in both models. They also display that cost of software project is less in COCOMO II as compared to basic COCOMO model (COCOMO’81).

Fig. 1
figure 1

COCOMO 81 intermediate results in semi-detached mode with SLOC 2000 as total size

Fig. 2
figure 2

Cost in COCOMO 81 = 0.022$

4 Proposed neuro-fuzzy model

COCOMO II model has 22 cost estimators with five scale factors (SF) and 17 effort multipliers (EM) that acts as input to proposed model. The intended outcome is effort estimated using COCOMO II post architectural model equation as [17]:

$$ Effort = A \times \left( {Size} \right)^{B + 0.01 \times } \sum\limits_{i = 1}^{S} {SF_{i} } \times \prod\limits_{i = 1}^{17} {EM_{i} }, $$

where A, B are calibration constants.

Rating of the cost estimators can be continuous or linguistic terms (very low, low, nominal, high, very high and extra high). Every rating value of cost estimator is related to value used in COCOMO model because each cost estimator represents one factor that is linked to effort. The proposed model is divided into two parts:

  1. a.

    Set of Neuro fuzzy sub models (NF’s).

  2. b.

    One cost driver to each sub model.

Rating value of cost driver acts as input to each NF and output of each NF is their effort multiplier corresponding to each cost driver. Each NF sub model translates linguistic terms of cost drivers into qualitative multipliervalues by using fuzzy sets. They are described as membership functions. Each NF is represented by ANFIS (Adaptive Neuro fuzzy inference system) [18].

ANFIS technique is a combination of Neural Network NN and Fuzzy Inference System FIS that considers input nodes as fuzzy sets and provides learning process for fuzzy modeling to deal with various data sets. It calculates membership function attributes that apply fuzzy inference system components (membership editor, rule editor) to follow the provided input or output data. It permits fuzzy system to adapt and study latest information to reach at some conclusion.

FIS is based on three strategies-

  • Outlier identification: it includes model type, rules to be applied and interval values.

  • Parameter estimation: it deals with association rules. Represented as: “If Then” rules. If is called Antecedent, Then is called consequent [19]. They are used to show relationship among various data items.

  • Validating model: the model is validated on MATLAB simulation tool to produce membership functions (Fig. 3).

    Fig. 3
    figure 3

    Fuzzy rules IF THEN applied on data set

The complete working of the NFNN COCOMO II model is depicted below in Fig. 4.

Fig. 4
figure 4

NFNN COCOMO II model framework

A. Layout of proposed model.

B. Results of ANFIS using simulation tool—MATLAB. (Figs. 5, 6).

Fig. 5
figure 5

FIS output produced on trained set of data (fuzzification)

Fig. 6
figure 6

Output produced after defuzzyfication of Fig. 5

4.1 Validation by NASA project data

This section validates the modeling performance and accuracy of the proposed neuro fuzzy COCOMO II model. NASA93 publicly available data from original COCOMO’81 database [20] was used for validation purpose. A comparative analysis was made between proposed model and COCOMO’81 model (traditional model) because COCOMO II does not have trained data present and also cost drivers of COCOMO’81 are compatible for validation. The dataset consists of two independent variables Lines of Code LoC and ME and one dependent variable Effort. The dataset of NASA93 project consists of 20 projects out of which few are outlined in table below.

The validation results of experiment were assessed using the concept of Mean Magnitude Relative Error (MMRE) to estimate accuracy of proposed model (Table 1). MMRE is calibrated using following equation:

$$ MMRE = O \, \left( {I \, = \, 1\;to \, n} \right) \, \left[ {\left( {Estimated \, effort \, {-} \, Actual \, effort} \right) \, / \, Actual \, effort} \right]. $$
Table 1 NASA software project dataset [19]

The results obtained after calculation are reported in Table 2 below.

Table 2 Effort estimation

4.2 Conclusions and future work

The paper proposes an expert model that is combination of algorithmic approach namely COCOMO II and machine learning algorithm namely Neuro Fuzzy (NF) approach. The Size of Project and Output of sub models Neuro Fuzzy acts as input to COCOMO II model that is amalgamated with neuro fuzzy technique and produces final cost metric. The validation of proposed model is done by taking data from NASA project of original database under consideration and results are compared with COCOMO’81 model.

As future enhancement, the same neuro fuzzy technique can be applied to object oriented metrics to predict software quality attributes. This approach can further be extended to predict quality of tools like SLIM (Putnam), CA estimates and others.