107.1 Introduction

One of the major activities in software project management is software development effort estimation (SDEE). Recently, machine learning and data mining techniques have been receiving increased attention [1, 2]. Among the model-based techniques, regression is the most frequently used and commonly serves as the default baseline when comparing methods. Comparing one method with another is difficult because there are many criteria for accuracy evaluation, and accuracy depends on both the evaluation data and the chosen criterion. SDEE methods can generally be classified into four groups:

  (i) Analogy-based methods

  (ii) Expert estimation, Delphi and Wideband Delphi

  (iii) Model-based methods such as COCOMO, SLIM, etc.

  (iv) Artificial Intelligence (AI) methods such as neural networks, fuzzy logic, genetic algorithms, or combinations thereof

Data from past projects are used directly or indirectly in all of these methods. Analogy-based methods compare the current project with similar past projects. In expert estimation, experts are asked for their opinion on effort values. In model-based methods, the relationship between effort and project parameters is obtained from historical data. Among the AI methods, neural networks are the most commonly used [3]. Here we use the General Regression Neural Network (GRN), which is easy to implement with only one parameter to tune, and compare its performance with classical Linear Least Squares Regression (LSR).

Section 107.2 reviews related work, followed by the estimation problem and measurement data in Sect. 107.3. LSR and GRN are explained in Sects. 107.4 and 107.5, respectively. Comparative results are provided in Sect. 107.6, and Sect. 107.7 presents conclusions and future research. References are listed at the end. We have followed the empirical software engineering approaches given in [4, 5].

107.2 Related Work

Software development effort estimation remains an active research topic despite sustained work by researchers across many countries. The major problems relate to input data, algorithms, and accuracy evaluation criteria, and all three factors must be considered to reach a conclusion. Boehm et al. [6] suggest that no single technique should be relied upon for SDEE; instead, multiple methods should be compared for decision making. Moreover, when students learn estimation and apply it in a college setting, it becomes easier for organizations to implement SDEE. As discussed earlier, SDEE is a function of its inputs, among which the size of the software project plays an important role; for small projects the required effort is also small. Lopez-Martin [7] used a fuzzy logic model based on two independent variables, New and Changed (N&C) code and Reused (R) code, and compared its performance with a multiple regression model; the results indicate no difference between the two. Two fuzzy logic models, Mamdani and Takagi-Sugeno, are studied in [8]; their evaluation against linear regression showed that the Takagi-Sugeno fuzzy system performs better. There, only New and Changed code is used as the independent variable. GRN has been used to predict the effort of industrial projects [9], where ANOVA and Kruskal-Wallis tests showed that GRN is an alternative to the regression model. None of the above works compares SDEE using one versus two independent variables. It is suggested that effect size be reported in the statistical testing of all randomized algorithms [5].

107.3 Estimation Problem and Measurements

SDEE generally consists of two stages: model building and model evaluation, also known as verification (training) and validation (testing). Part of the measurements is used to build the model and the remaining data are used to validate it. Here we use the verification and validation data given in [7], consisting of Actual Effort (AE), N&C code, and Reused (R) code for small projects in an academic setting. Effort is the dependent variable (response) and the two independent variables (predictors) are N&C code and R code. For training, 163 projects are used; 68 projects are used for testing. Table 107.1 summarizes both training and testing data (N&CT, RT, AET). Pearson correlation coefficients of the different variables are given in Table 107.2. It can be observed that the linear correlation of Reused code with Actual Effort is small compared with that of New and Changed code. More details of the data are available in [7].
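The correlation screening step can be reproduced with a few lines of Python; the arrays below are illustrative placeholders, since the actual project measurements are tabulated in [7]:

```python
import numpy as np

# Illustrative placeholder data; the real 163 training projects are given in [7].
nc = np.array([30.0, 52.0, 41.0, 75.0, 60.0])    # New and Changed code
r = np.array([10.0, 4.0, 20.0, 6.0, 15.0])       # Reused code
ae = np.array([70.0, 95.0, 88.0, 130.0, 110.0])  # Actual Effort

# Pearson correlation of each predictor with Actual Effort (cf. Table 107.2)
r_nc = np.corrcoef(nc, ae)[0, 1]
r_r = np.corrcoef(r, ae)[0, 1]
print(f"corr(N&C, AE) = {r_nc:.3f}, corr(R, AE) = {r_r:.3f}")
```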

Table 107.1 Characteristics of training and testing data
Table 107.2 Pearson correlation coefficients of different variables

The estimation problem aims at finding a relationship between the dependent and independent variables using the training data; the test data are then used to validate the developed model. We have used the General Regression Neural Network (GRN) [10] and compared the results with Linear Least Squares Regression (LSR) for one and two independent variables. Accuracy is evaluated by the magnitude of error relative to the estimate (MER) for each project and each model. It has been strongly suggested not to use the magnitude of relative error (MRE), which normalizes by actual effort, as the criterion [11]. Also, the error (Actual effort_i − Predicted effort_i), known as the residual in the statistics literature, is uncorrelated with the predicted effort. For each project:

$$ \mathrm{MER}_i = \frac{\left| \text{Actual effort}_i - \text{Predicted effort}_i \right|}{\text{Predicted effort}_i} $$

The aggregate over all $n$ projects is $\mathrm{MMER} = (1/n)\sum_{i=1}^{n} \mathrm{MER}_i$.
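As a sketch, the MER and MMER criteria can be computed as follows:

```python
import numpy as np

def mer(actual, predicted):
    """Magnitude of error relative to the estimate, per project."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted) / predicted

def mmer(actual, predicted):
    """Mean MER over all n projects."""
    return mer(actual, predicted).mean()

# Example with three hypothetical projects
print(mmer([100, 80, 120], [90, 100, 110]))
```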

107.4 Linear Least Squares Regression (LSR)

107.4.1 Using Two Independent Variables (N&C, R)

We have used MINITAB® to obtain the following results. The least squares method fits the training data in the two variables as AE = 44.7 + 1.08 N&C − 0.146 R.

The contribution of Reused code is about one tenth that of New and Changed code, and the coefficient signs are intuitively correct.

$$ \text{R-Sq} = 57.2\,\% \quad \text{R-Sq(adj)} = 56.6\,\% \quad \text{R-Sq(pred)} = 55.43\,\% $$

The R-Sq value indicates that the predictors explain 57.2 % of the variance in Actual Effort. The R-Sq(adj) value of 56.6 % accounts for the number of predictors in the model, and the R-Sq(pred) value is 55.43 %. Because the predicted R-Sq value is close to the R-Sq and R-Sq(adj) values, the model does not appear to be overfit.
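The fit above was produced with MINITAB®; an equivalent least-squares fit can be sketched in Python (the data arrays here are placeholders for the 163 training projects of [7]):

```python
import numpy as np

# Placeholder training data; the real measurements are in [7].
nc = np.array([30.0, 52.0, 41.0, 75.0, 60.0, 48.0])
r = np.array([10.0, 4.0, 20.0, 6.0, 15.0, 8.0])
ae = np.array([70.0, 95.0, 88.0, 130.0, 110.0, 92.0])

# Design matrix with an intercept column: AE ~ b0 + b1*N&C + b2*R
X = np.column_stack([np.ones_like(nc), nc, r])
coef, *_ = np.linalg.lstsq(X, ae, rcond=None)

# R-Sq from residual and total sums of squares
fitted = X @ coef
ss_res = np.sum((ae - fitted) ** 2)
ss_tot = np.sum((ae - ae.mean()) ** 2)
r_sq = 1.0 - ss_res / ss_tot
print("coefficients:", coef, "R-Sq:", r_sq)
```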

The P-value of 0.000 in the Analysis of Variance table (Table 107.3) shows that the model estimated by the regression procedure is significant at an α-level of 0.05, indicating that at least one coefficient differs from zero.

Table 107.3 Analysis of variance for two variables

The P-values for the estimated coefficients of N&C and R are both less than 0.05, indicating that both are significantly related to AE. The residual plots for model validation are shown in Fig. 107.1. The normal probability plot shows an approximately linear pattern, consistent with a normal distribution, and the plot of residuals versus fitted values shows residuals distributed on both sides of the reference line. The graphs do not show any abnormality. From both the graphical and tabular analyses, we can accept the regression equation for two variables. The accuracy in terms of MMER for training and testing is 0.274 and 0.287, respectively.

Fig. 107.1
figure 1

Residual plots for training data for two variables

107.4.2 Using One Independent Variable (N&C)

Since the correlation of Reused code with AE is small, we have used only N&C. The regression equation is

$$ \begin{aligned} \text{AE} & = 39.3 + 1.06\,\text{N\&C} \\ \text{R-Sq} & = 55.8\,\% \quad \text{R-Sq(adj)} = 55.5\,\% \quad \text{R-Sq(pred)} = 54.63\,\% \\ \end{aligned} $$

These values are not much different from the two-variable results. Both the ANOVA table (not shown) and the coefficients table (Table 107.4) indicate the significance of the regression and the coefficient. The residual plots also do not show any problem. One interesting observation is that no residual exceeds three sigma, whereas in the two-variable case one observation (160) lies outside three sigma. The accuracy in terms of MMER for training and testing is 0.272 and 0.276, respectively.

Table 107.4 Coefficients table for two variables

107.5 General Regression Neural Network (GRN)

The GRN is a type of neural network that can be used to perform regression on continuous data [10]. It learns quickly compared with a standard back-propagation multilayer perceptron, as the output is obtained in a single pass. Also, the GRN has only one parameter, the spread, to be tuned. The network can be used for any regression problem, with no linearity assumption required. We have used MATLAB® for the GRN application. Figure 107.2 gives the architecture of the GRN. The network contains one radial basis (hidden) layer and one linear (output) layer. The radial basis layer's activation function is exponential, and its spread needs to be adjusted empirically for a particular problem. We need to find two optimal spreads: one for the two-variable case and another for the one-variable input.

Fig. 107.2
figure 2

Generalized regression neural network architecture

The spread is varied in steps of 0.01 from 0.01 to 1.0, and the inputs and output are scaled to [0, 1]. For the two-variable input (N&C and R), training and testing MMER are nearly equal at a spread of 0.22: 0.2976 for training and 0.2986 for testing. For the one-variable input (N&C), training and testing MMER are nearly equal at a spread of 0.08: 0.2725 for training and 0.2733 for testing. Although the accuracy is better (reduced error) for one variable, we need to validate this statistically.
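The GRN prediction and the spread sweep can be sketched as follows. This uses a common Gaussian-kernel formulation of the GRN (a kernel-weighted average of training targets, in the spirit of [10]); the data are synthetic placeholders scaled to [0, 1], standing in for the actual project measurements:

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, spread):
    """GRN output: Gaussian-kernel weighted average of training targets."""
    x_train = np.atleast_2d(x_train)
    y_train = np.asarray(y_train, dtype=float)
    preds = []
    for q in np.atleast_2d(x_query):
        d2 = np.sum((x_train - q) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * spread ** 2))
        total = w.sum()
        # Guard against weight underflow at very small spreads
        preds.append(w @ y_train / total if total > 0 else y_train.mean())
    return np.array(preds)

def mmer(actual, predicted):
    return np.mean(np.abs(np.asarray(actual) - predicted) / predicted)

# Synthetic placeholder data scaled to [0, 1]
rng = np.random.default_rng(0)
x_tr = rng.random((40, 1)); y_tr = 0.8 * x_tr[:, 0] + 0.1
x_te = rng.random((10, 1)); y_te = 0.8 * x_te[:, 0] + 0.1

# Vary the spread from 0.01 to 1.0 in steps of 0.01 and pick the value
# where training and testing MMER are closest (as described in the text)
gap, best_spread = min(
    (abs(mmer(y_tr, grnn_predict(x_tr, y_tr, x_tr, s))
         - mmer(y_te, grnn_predict(x_tr, y_tr, x_te, s))), round(s, 2))
    for s in np.arange(0.01, 1.01, 0.01)
)
print("selected spread:", best_spread)
```

At a tiny spread the network memorizes the training set (training MMER near zero) while testing error stays high; at a large spread predictions flatten toward the mean, which is why the spread is chosen where the two errors meet.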

107.6 Comparative Analysis

We have used MINITAB® for testing the equality of two means and of two medians. The two-sample t-test is used for the former, as it is robust to moderate departures from the normality assumption. The non-parametric Mann-Whitney (M-W) test, which requires no distributional assumption, is applied for comparing medians. As suggested in [5], we give P values for both tests. It is also necessary to report effect size, since statistically significant results can be obtained for large samples with the t-test and M-W test. Here we report the non-parametric effect size measure A12 given in [5].

A12 = {R1/m − (m + 1)/2}/n, where R1 is the rank sum of the first data group, m is the number of observations in the first sample, and n is the number of observations in the second sample. Given a performance measure M, the A12 statistic measures the probability that running the first algorithm yields higher M values than running the second. Table 107.5 compares the mean, median, and standard deviation of MER for LSR and GRN with one and two variables, for both training and testing data. The differences are small, and we confirm this with statistical tests. The P values for the different statistical tests comparing LSR and GRN are given in Table 107.6. It can be seen from the t-test and M-W test that the P values are greater than 0.05 in all cases, so we conclude that GRN performs equal to LSR for one or two variables, for both training and testing. Since the effect size for all four cases is near 0.5, we can conclude that LSR and GRN perform equally well. Table 107.7 compares the performance of one and two variables. As expected, the P values for the t-test and M-W test are greater than 0.05, and the effect size values are close to 0.5. We conclude that there is no performance difference between one and two variables for either LSR or GRN.
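The A12 statistic can be computed from rank sums, with the M-W test alongside; a minimal sketch assuming SciPy is available (the MER values below are illustrative, not those of Tables 107.5-107.7):

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

def a12(sample1, sample2):
    """Vargha-Delaney effect size A12 = (R1/m - (m + 1)/2) / n,
    i.e. the probability that a value from sample1 exceeds one from
    sample2, with ties counted half."""
    m, n = len(sample1), len(sample2)
    ranks = rankdata(np.concatenate([sample1, sample2]))
    r1 = ranks[:m].sum()
    return (r1 / m - (m + 1) / 2.0) / n

# Illustrative per-project MER values for two algorithms
mer_lsr = [0.25, 0.30, 0.22, 0.31, 0.27]
mer_grn = [0.27, 0.29, 0.24, 0.30, 0.28]

stat, p = mannwhitneyu(mer_lsr, mer_grn, alternative="two-sided")
print(f"M-W p-value = {p:.3f}, A12 = {a12(mer_lsr, mer_grn):.3f}")
```

An A12 of 0.5 means neither algorithm tends to produce larger errors than the other, which is the pattern reported in Tables 107.6 and 107.7.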

Table 107.5 Comparison of statistical parameters for LSR and GRN
Table 107.6 Comparison of statistical tests for LSR and GRN
Table 107.7 Comparison of statistical tests for one (N&C) and two variables (N&C, R)

107.7 Conclusions

The software effort estimation accuracy of GRN is equal to that of LSR for small projects under the MMER criterion. Estimation accuracy using one variable is equal to that using two variables for small projects under the MMER criterion, for both algorithms. We therefore recommend using one variable, N&C, for the estimation of small projects. Further research is needed to theoretically justify the equal performance of GRN and LSR.