1 Introduction

Software companies today outsource a wide variety of jobs to offshore organizations in order to maximize return on investment. Estimating the effort, time, and cost required to develop an information system is a critical project management issue. Producing credible, long-range, and near-optimal estimates in the early stages of a project’s life cycle is, however, an almost intractable problem. Key information about real-life projects, such as size, complexity, system documentation, vocabulary, annual change traffic, client attitude, and multilocation teams, is often unavailable. Despite more than 100 estimation tools being available on the market, experience-based reasoning remains the most commonly applied estimation approach, owing to fundamental estimation issues that software developers have long struggled with [1].

2 Literature Review

A review of studies on expert estimation of software development (SD) effort was presented in [2, 3]. An exploratory analysis of the state of the practice in schedule estimation and software project success prediction is presented in [4]. It was found that the data collection approach, role of respondents, and type of analysis have an important impact on software estimation error [5]. Soft-computing and artificial intelligence (AI) based approaches have recently been applied for more accurate prediction of software effort/cost. Artificial neural networks (ANNs) offer a powerful computing architecture capable of learning from experimental data and representing complex, nonlinear, multivariate relationships [6, 7]. Kumar et al. compared the effectiveness of variants of the wavelet neural network (WNN) with many other techniques for forecasting SD effort [8]. Genetic algorithms (GAs) were used to estimate the COCOMO model parameters of NASA SD projects in [9], while several fuzzy logic-based studies have also been conducted [10–12]. Many hybrid schemes (neuro-GA, neuro-fuzzy, grey-GA, fuzzy-grey, etc.) have been investigated as well [13–15]. Many studies on software prediction have focused on the development of regression models based on historical data [16, 17].

Table 1 SD effort dataset [18]

3 Statistical Modeling

This modeling study is based on the SD effort dataset of Bailey and Basili [18] (Table 1, shown only in part for brevity). The six input factors are the total lines of code, new lines of code, developed lines of code (DL) (all in kloc), total methodology (ME), cumulative complexity, and cumulative experience; the output is effort (in man-months). A preliminary statistical analysis of the dataset was conducted beforehand, covering (1) correlation coefficients, (2) covariance, (3) kurtosis, and (4) R-square, as presented in Table 1. A multivariable linear regression model (Eq. 1) was first fitted using ANOVA in Minitab [19]. The goodness of fit of this model is compared in Table 2 with two other models (Eqs. 2 and 3) given by Sheta and Al-Afeef [15]. Based on the high T (or low P) values, the following ranking of the six effort drivers, in decreasing order of significance, has been established: (1) methodology, (2) new LoC, (3) total LoC, (4) cumulative experience, (5) developed LoC, and (6) cumulative complexity. The high R-Sq value of 98.3 % and R-Sq(adj) value of 97.4 % confirm the adequacy of the fitted model.

$$\begin{aligned} \text{Effort}&=41.6+0.314\;\text{Tot\_LoC}+0.986\;\text{New\_LoC}+0.116\;\text{Develop\_LoC} \nonumber \\&\quad -1.57\;\text{Meth}-0.112\;\text{Cum\_Complex}+0.376\;\text{Cum\_Exper} \end{aligned}$$
(1)
$$\begin{aligned} E=1.75992 \times DL-4.56 \times 10^{-3}\times DL^{2} \end{aligned}$$
(2)
$$\begin{aligned} E=2 \times DL-0.59 \times 10^{-3}ME^{2} \times DL \end{aligned}$$
(3)
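A regression of this form can be reproduced with ordinary least squares. The following is a minimal sketch (not the authors' Minitab session); the predictor vectors tot_loc, new_loc, dev_loc, meth, cum_complex, cum_exper and the effort vector are assumed to hold the columns of Table 1 and are hypothetical names.

```matlab
% Illustrative least-squares fit of a model of the form of Eq. (1).
% The variables below are assumed to be in the workspace, one entry per project.
X = [tot_loc new_loc dev_loc meth cum_complex cum_exper];  % n-by-6 effort drivers
y = effort;                                                % n-by-1 effort (man-months)

Xd   = [ones(size(X,1),1) X];   % prepend a column of ones for the intercept
beta = Xd \ y;                  % least-squares coefficients (intercept first)

yhat  = Xd * beta;                      % fitted effort
SSres = sum((y - yhat).^2);
SStot = sum((y - mean(y)).^2);
R2    = 1 - SSres/SStot;                % coefficient of determination (cf. R-Sq = 98.3 %)
```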

The main effects plot for the six effort drivers is shown in Fig. 1.

Fig. 1 Main effects plot

4 Neural Network Modeling

Back-propagation (BP) NN modeling for effort estimation was carried out in this work using the MATLAB (R2007b) NN toolbox. Initially, a simple two-layer BP (6-6-1) network was employed, with the number of nodes in the hidden layer kept equal to the number of inputs (six here). The number of hidden neurons was then increased in a systematic trial-and-error manner to decide the final structure of the network, while monitoring the convergence of the training, testing, and validation errors as well as the average percentage error, as sketched below. The learning rate and momentum can also be adjusted for this purpose (although they were not varied in the present work).
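A minimal sketch of this trial-and-error search over hidden-layer sizes follows, using the toolbox calls newff, train, and sim. The input matrix P (6-by-n) and target vector T (1-by-n) are assumed to be in the workspace, and the selection criterion in the loop is illustrative rather than the exact bookkeeping used in this work.

```matlab
% Assumed: P is the 6-by-n matrix of effort drivers, T the 1-by-n target effort.
bestErr = Inf;
for h = 6:20                                   % candidate hidden-layer sizes
    net = newff(P, T, h, {'tansig','purelin'}, 'trainlm');
    net.divideParam.trainRatio = 0.6;          % 60 % training
    net.divideParam.valRatio   = 0.2;          % 20 % validation
    net.divideParam.testRatio  = 0.2;          % 20 % testing
    net = train(net, P, T);
    Y   = sim(net, P);                         % network estimate of effort
    err = mean(abs((T - Y) ./ T)) * 100;       % mean % relative error (illustrative criterion)
    if err < bestErr
        bestErr = err; bestNet = net; bestH = h;
    end
end
```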

Before the network is ready to make estimates, the input–output combinations of the dataset [18] are passed through the network for training (60 %), validation (20 %), and testing (20 %). In our case, the activation functions of both the hidden and output layers were initially chosen as tan-sigmoid; the output-layer function was later changed to purelin(ear). The two most popular training algorithms, Levenberg–Marquardt (LM) and Bayesian regularization (BR), were used. The training performance and the linear regression analysis between the network outputs and the corresponding targets are shown in Figs. 2 and 3. For the LM algorithm, the output tracks the targets reasonably well, and the regression coefficient (R) value is mostly above 0.97. Similarly, for BR-based training with the purelin output function, the R values are above 0.99 in nearly all cases (Fig. 3).
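The BR variant and the output-versus-target regression behind Figs. 2 and 3 can be sketched as follows; the hidden-layer size of 10 is purely illustrative, and P and T are the assumed input and target arrays as above.

```matlab
% Bayesian-regularization training with a purelin output layer (sketch).
net = newff(P, T, 10, {'tansig','purelin'}, 'trainbr');  % 10 hidden neurons, illustrative
net.divideParam.trainRatio = 0.6;
net.divideParam.valRatio   = 0.2;
net.divideParam.testRatio  = 0.2;
net = train(net, P, T);

Y = sim(net, P);               % network estimates of effort
[m, b, r] = postreg(Y, T);     % slope, intercept, and correlation coefficient R
```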

Fig. 2 Levenberg–Marquardt training with tan-sigmoid function in the output layer

Fig. 3 Bayesian regularization training with purelin(ear) function in the output layer

4.1 NN Modeling Tips

Listed below are some practical tips for efficient NN modeling.

  • NNs are rather sensitive to the number of neurons in the hidden layers: too few neurons often lead to underfitting, while too many can contribute to overfitting, in which case, even though all the training points are fitted well, the fitted curve oscillates wildly between them [20].

  • The NN dataset is generally divided in the following proportions: training (50–60 %), validation (20–25 %), and testing (20–25 %).

  • The learning rate (alpha), which represents how quickly an NN learns, ranges from 0 to 1 and is initialized randomly. As with linear networks, too large a learning rate leads to unstable learning; conversely, too small a learning rate results in much longer training times. Typical values are 0.01–0.05.

  • Momentum is a parameter that helps the NN break out of local minima. It may range from 0 to 1; typical values are around 0.5.

  • The selection of the threshold (transfer) function (logsig, tansig, purelin, etc.) is critical, as it determines when a node fires and propagates a value further through the network. The choice depends essentially on the range and sign of the inputs/outputs.

  • The BR algorithm, a modification of the LM algorithm, is often used because it generalizes well and reduces the difficulty of determining the optimum network architecture.

  • LM training would normally be used for small- and medium-size networks, if enough memory is available. If memory is a problem, a variety of other fast algorithms are available; for large networks, one would probably use the trainscg (scaled conjugate gradient) or trainrp (resilient BP) algorithms.

  • Overfitting is one of the most common problems in NN training: the error on the training set is driven to a very small value, but the error becomes large when new data are presented to the network. Collecting more data and increasing the size of the training set should be attempted to prevent overfitting [20].

  • One suggested way to improve network generalization is to use a network that is just large enough to provide an adequate fit: the larger the network, the more complex the functions it can create, whereas a small enough network will not have enough power to overfit the data. Two methods for improving generalization that are implemented in the MATLAB NN toolbox are regularization and early stopping [20] (see the configuration sketch after this list).
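The sketch below shows where the knobs mentioned in these tips live in the toolbox. It uses gradient-descent-with-momentum training (traingdx) purely to expose the learning rate and momentum parameters; the present work uses trainlm/trainbr, which do not expose these two settings. P, T, and the hidden-layer size of 10 are assumptions as before.

```matlab
% Illustrative configuration of learning rate, momentum, early stopping,
% and weight-decay regularization in the MATLAB NN toolbox.
net = newff(P, T, 10, {'tansig','purelin'}, 'traingdx');
net.trainParam.lr = 0.05;        % learning rate (typical 0.01-0.05)
net.trainParam.mc = 0.5;         % momentum constant (typical value around 0.5)

% Early stopping: hold out a validation set and stop when its error keeps rising.
net.divideParam.trainRatio = 0.6;
net.divideParam.valRatio   = 0.2;
net.divideParam.testRatio  = 0.2;
net.trainParam.max_fail    = 6;  % allowed consecutive validation failures

% Regularization: penalize large weights via the msereg performance function
% (an alternative to using trainbr).
net.performFcn = 'msereg';
net.performParam.ratio = 0.9;    % weight of mse vs. the weight-size penalty

net = train(net, P, T);
```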

Table 2 Comparison of empirical model fitted and NN effort with target effort

5 Results and Discussion

The degree to which a model’s estimated effort (\(\text {MM}_\mathrm{est}\)) matches the actual or target effort (\(\text {MM}_\mathrm{act}\)) is measured by the percentage relative error. The magnitude of relative error (MRE), which accounts for both under- and overestimates, together with its mean over all projects, the mean magnitude of relative error (MMRE), is often used in effort estimation analysis.

$$\begin{aligned} \text{MRE}=\left| \frac{\text{MM}_\text{act} -\text{MM}_\text{est}}{\text{MM}_\text{act}} \right| \end{aligned}$$
(4)
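These measures reduce to a few vector operations. In the sketch below, mm_act and mm_est are assumed vectors of actual and estimated effort (in man-months), one entry per project.

```matlab
% Error measures used in the comparison tables (illustrative computation).
re_pct = 100 * (mm_act - mm_est) ./ mm_act;   % signed percentage relative error
mre    = abs((mm_act - mm_est) ./ mm_act);    % magnitude of relative error, Eq. (4)
mmre   = 100 * mean(mre);                     % mean magnitude of relative error, %
```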

Table 2 presents, in abbreviated form, a comparison of the effort fitted by the empirical models (Eqs. 1–3) and the NN effort (for different configurations) with the target effort of [18]. It can be concluded that the present NN framework successfully models the dataset, with approximately the following percentage relative error and percentage mean relative error:

  1. \(-10.58\) to 8.36 and 4.68 %, respectively, for trainlm with the tansig function in the output layer and hidden neurons varied from 6 to 20.

  2. \(-12.5\) to \(-9.62\) and \(-10.3\) %, respectively, for trainbr with the tansig function in the output layer and hidden neurons varied from 6 to 20.

  3. 0.65 to \(-3.12\) and 0.79 %, respectively, for trainbr with the purelin function in the output layer and hidden neurons varied from 6 to 20.

Table 3 Comparison of different models

The relative error obtained from the developed multi-regression model (Eq. 1) is comparable to that of the other models (Eqs. 2 and 3). A comparison between the mean relative error of the developed NN and regression models and the MMRE of the Halstead, Walston–Felix, Bailey–Basili, and Doty models is shown in Table 3 [16].

6 Conclusions

Effort estimation is a complex task, and research studies indicate that results vary considerably in general. The market potential for SD and maintenance is huge and constantly growing, mainly for financial and online applications. In this work, a twofold approach based on NN and multilinear regression modeling has been applied for more accurate SD effort estimation.