107.1 Introduction

One of the major activities in software project management is software development effort estimation (SDEE). Recently, machine learning and data mining techniques have been receiving increased attention [1, 2]. Among the model-based techniques, regression is the most frequently used and commonly serves as the default baseline when comparing methods. Comparing one method with another is difficult because there are many criteria for accuracy evaluation, and accuracy depends on both the evaluation data and the chosen criterion. SDEE methods can generally be classified into four groups:

  (i) Analogy-based methods

  (ii) Expert estimation, Delphi and Wideband Delphi

  (iii) Model-based methods such as COCOMO, SLIM, etc.

  (iv) Artificial Intelligence (AI) methods such as neural networks, fuzzy logic, genetic algorithms, or combinations thereof

Data from past projects are used directly or indirectly in all of these methods. Analogy-based methods compare the current project with similar past projects. In expert estimation, experts are asked for their opinion on effort values. In model-based methods, the relationship between effort and project parameters is obtained from historical data. Among the AI methods, neural networks are the most commonly used [3]. Here we use the General Regression Neural Network (GRN), which is easy to implement with only one parameter to tune, and compare its performance with classical Linear Least Squares Regression (LSR).

Section 107.2 reviews related work, followed by the estimation problem and measurement data in Sect. 107.3. LSR and GRN are explained in Sects. 107.4 and 107.5, respectively. Comparative results are provided in Sect. 107.6, and Sect. 107.7 presents conclusions and future research. References are listed at the end. We have followed the empirical software engineering approaches given in [4, 5].

107.2 Related Work

Software development effort estimation remains an active research topic despite sustained work by researchers across many countries. The major problems relate to input data, algorithms, and accuracy evaluation criteria, and all three factors must be considered to reach a conclusion. Boehm et al. [6] suggest that no single technique should be relied upon for SDEE; instead, multiple methods should be compared for decision making. Moreover, when students learn estimation and apply it in a college setting, it becomes easier for organizations to implement SDEE. As discussed earlier, SDEE is a function of its inputs, among which the size of the software project plays an important role; for small projects the required effort is also small. Lopez-Martin [7] used a fuzzy logic model based on two independent variables, New and Changed (N&C) code and Reused (R) code, and compared its performance with a multiple regression model; the results indicate no difference between the two. Two fuzzy logic models, Mamdani and Takagi-Sugeno, are studied in [8]; their evaluation against linear regression showed that the Takagi-Sugeno fuzzy system performs better. There, only New and Changed code is used as the independent variable. GRN has been used to predict the effort of industrial projects [9], where ANOVA and Kruskal-Wallis tests showed that GRN is an alternative to the regression model. None of the above works compares SDEE using one versus two independent variables. It is suggested that effect size be reported in the statistical testing of all randomized algorithms [5].

107.3 Estimation Problem and Measurements

SDEE generally consists of two stages: model building and model evaluation, also known as verification (training) and validation (testing). Part of the measurements is used to build the model and the remaining data are used to validate it. Here we use the verification and validation data given in [7], consisting of Actual Effort (AE), N&C code, and Reused (R) code for small projects in an academic setting. Effort is the dependent variable (response) and the two independent variables (predictors) are N&C code and R code. For training, 163 projects are used; 68 projects are used for testing. Table 107.1 summarizes both training and testing data (N&CT, RT, AET). Pearson correlation coefficients of the different variables are given in Table 107.2. It can be observed that the linear correlation of Reused code with Actual Effort is small compared with that of New and Changed code. More details of the data are available in [7].
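The correlation screening step can be reproduced with a few lines of Python; the arrays below are illustrative placeholders, since the actual project measurements are tabulated in [7]:

```python
import numpy as np

# Illustrative placeholder data; the real 163 training projects are given in [7].
nc = np.array([30.0, 52.0, 41.0, 75.0, 60.0])    # New and Changed code
r = np.array([10.0, 4.0, 20.0, 6.0, 15.0])       # Reused code
ae = np.array([70.0, 95.0, 88.0, 130.0, 110.0])  # Actual Effort

# Pearson correlation of each predictor with Actual Effort (cf. Table 107.2)
r_nc = np.corrcoef(nc, ae)[0, 1]
r_r = np.corrcoef(r, ae)[0, 1]
print(f"corr(N&C, AE) = {r_nc:.3f}, corr(R, AE) = {r_r:.3f}")
```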

Table 107.1 Characteristics of training and testing data
Table 107.2 Pearson correlation coefficients of different variables

The estimation problem aims at finding a relationship between the dependent and independent variables using the training data; the test data are then used to validate the developed model. We have used the General Regression Neural Network (GRN) [10] and compared the results with Linear Least Squares Regression (LSR) for one and two independent variables. Accuracy is evaluated by the magnitude of error relative to the estimate (MER) for each project and each model. It has been strongly suggested not to use the magnitude of relative error (MRE), which normalizes by actual effort, as the criterion [11]. Also, the error (Actual effort_i − Predicted effort_i), known as the residual in the statistics literature, is uncorrelated with the predicted effort. For each project:

$$ \mathrm{MER}_i = \frac{\left| \text{Actual effort}_i - \text{Predicted effort}_i \right|}{\text{Predicted effort}_i} $$

The aggregate over all $n$ projects is $\mathrm{MMER} = (1/n)\sum_{i=1}^{n} \mathrm{MER}_i$.
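As a sketch, the MER and MMER criteria can be computed as follows:

```python
import numpy as np

def mer(actual, predicted):
    """Magnitude of error relative to the estimate, per project."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted) / predicted

def mmer(actual, predicted):
    """Mean MER over all n projects."""
    return mer(actual, predicted).mean()

# Example with three hypothetical projects
print(mmer([100, 80, 120], [90, 100, 110]))
```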

107.4 Linear Least Squares Regression (LSR)

107.4.1 Using Two Independent Variables (N&C, R)

We have used MINITAB® to obtain the following results. The least squares method fits the training data in the two variables as AE = 44.7 + 1.08 N&C − 0.146 R.

The contribution of Reused code is about one tenth that of New and Changed code, and the coefficient signs are intuitively correct.

$$ \text{R-Sq} = 57.2\,\% \quad \text{R-Sq(adj)} = 56.6\,\% \quad \text{R-Sq(pred)} = 55.43\,\% $$

The R-Sq value indicates that the predictors explain 57.2 % of the variance in Actual Effort. The R-Sq(adj) value of 56.6 % accounts for the number of predictors in the model, and the R-Sq(pred) value is 55.43 %. Because the predicted R-Sq value is close to the R-Sq and R-Sq(adj) values, the model does not appear to be overfit.
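The fit above was produced with MINITAB®; an equivalent least-squares fit can be sketched in Python (the data arrays here are placeholders for the 163 training projects of [7]):

```python
import numpy as np

# Placeholder training data; the real measurements are in [7].
nc = np.array([30.0, 52.0, 41.0, 75.0, 60.0, 48.0])
r = np.array([10.0, 4.0, 20.0, 6.0, 15.0, 8.0])
ae = np.array([70.0, 95.0, 88.0, 130.0, 110.0, 92.0])

# Design matrix with an intercept column: AE ~ b0 + b1*N&C + b2*R
X = np.column_stack([np.ones_like(nc), nc, r])
coef, *_ = np.linalg.lstsq(X, ae, rcond=None)

# R-Sq from residual and total sums of squares
fitted = X @ coef
ss_res = np.sum((ae - fitted) ** 2)
ss_tot = np.sum((ae - ae.mean()) ** 2)
r_sq = 1.0 - ss_res / ss_tot
print("coefficients:", coef, "R-Sq:", r_sq)
```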

The P-value of 0.000 in the Analysis of Variance table (Table 107.3) shows that the model estimated by the regression procedure is significant at an α-level of 0.05, indicating that at least one coefficient differs from zero.

Table 107.3 Analysis of variance for two variables

The P-values for the estimated coefficients of N&C and R are both less than 0.05, indicating that both are significantly related to AE. The residual plots for model validation are shown in Fig. 107.1. The normal probability plot shows an approximately linear pattern, consistent with a normal distribution, and the plot of residuals versus fitted values shows residuals distributed on both sides of the reference line. The graphs do not show any abnormality. From both the graphical and tabular analyses, we can accept the regression equation for two variables. The accuracy in terms of MMER for training and testing is 0.274 and 0.287, respectively.

Fig. 107.1
figure 1

Residual plots for training data for two variables

107.4.2 Using One Independent Variable (N&C)

Since the correlation of Reused code with AE is small, we have used only N&C. The regression equation is

$$ \begin{aligned} \text{AE} & = 39.3 + 1.06\,\text{N\&C} \\ \text{R-Sq} & = 55.8\,\% \quad \text{R-Sq(adj)} = 55.5\,\% \quad \text{R-Sq(pred)} = 54.63\,\% \\ \end{aligned} $$

These values are not much different from the two-variable results. Both the ANOVA table (not shown) and the coefficients table (Table 107.4) indicate the significance of the regression and the coefficient. The residual plots also do not show any problem. One interesting observation is that no residual exceeds three sigma, whereas in the two-variable case one observation (160) lies outside three sigma. The accuracy in terms of MMER for training and testing is 0.272 and 0.276, respectively.

Table 107.4 Coefficients table for two variables

107.5 General Regression Neural Network (GRN)

The GRN is a type of neural network that can be used to perform regression on continuous data [10]. It learns quickly compared with a standard back-propagation multilayer perceptron, as the output is obtained in a single pass. Also, the GRN has only one parameter, the spread, to be tuned. The network can be used for any regression problem, with no linearity assumption required. We have used MATLAB® for the GRN application. Figure 107.2 gives the architecture of the GRN. The network contains one radial basis (hidden) layer and one linear (output) layer. The radial basis layer's activation function is exponential, and its spread needs to be adjusted empirically for a particular problem. We need to find two optimal spreads: one for the two-variable case and another for the one-variable input.

Fig. 107.2
figure 2

Generalized regression neural network architecture

The spread is varied in steps of 0.01 from 0.01 to 1.0, and the inputs and output are scaled to [0, 1]. For the two-variable input (N&C and R), training and testing MMER are nearly equal at a spread of 0.22: 0.2976 for training and 0.2986 for testing. For the one-variable input (N&C), training and testing MMER are nearly equal at a spread of 0.08: 0.2725 for training and 0.2733 for testing. Although the accuracy is better (reduced error) for one variable, we need to validate this statistically.
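The GRN prediction and the spread sweep can be sketched as follows. This uses a common Gaussian-kernel formulation of the GRN (a kernel-weighted average of training targets, in the spirit of [10]); the data are synthetic placeholders scaled to [0, 1], standing in for the actual project measurements:

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, spread):
    """GRN output: Gaussian-kernel weighted average of training targets."""
    x_train = np.atleast_2d(x_train)
    y_train = np.asarray(y_train, dtype=float)
    preds = []
    for q in np.atleast_2d(x_query):
        d2 = np.sum((x_train - q) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * spread ** 2))
        total = w.sum()
        # Guard against weight underflow at very small spreads
        preds.append(w @ y_train / total if total > 0 else y_train.mean())
    return np.array(preds)

def mmer(actual, predicted):
    return np.mean(np.abs(np.asarray(actual) - predicted) / predicted)

# Synthetic placeholder data scaled to [0, 1]
rng = np.random.default_rng(0)
x_tr = rng.random((40, 1)); y_tr = 0.8 * x_tr[:, 0] + 0.1
x_te = rng.random((10, 1)); y_te = 0.8 * x_te[:, 0] + 0.1

# Vary the spread from 0.01 to 1.0 in steps of 0.01 and pick the value
# where training and testing MMER are closest (as described in the text)
gap, best_spread = min(
    (abs(mmer(y_tr, grnn_predict(x_tr, y_tr, x_tr, s))
         - mmer(y_te, grnn_predict(x_tr, y_tr, x_te, s))), round(s, 2))
    for s in np.arange(0.01, 1.01, 0.01)
)
print("selected spread:", best_spread)
```

At a tiny spread the network memorizes the training set (training MMER near zero) while testing error stays high; at a large spread predictions flatten toward the mean, which is why the spread is chosen where the two errors meet.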

107.6 Comparative Analysis

We have used MINITAB® for testing the equality of two means and of two medians. The two-sample t-test is used for the former, as it is robust to moderate departures from the normality assumption. The non-parametric Mann-Whitney (M-W) test, which requires no distributional assumption, is applied for comparing medians. As suggested in [5], we give P values for both tests. It is also necessary to report effect size, since statistically significant results can be obtained for large samples with the t-test and M-W test. Here we report the non-parametric effect size measure A12 given in [5].

A12 = {R1/m − (m + 1)/2}/n, where R1 is the rank sum of the first data group, m is the number of observations in the first sample, and n is the number of observations in the second sample. Given a performance measure M, the A12 statistic measures the probability that running the first algorithm yields higher M values than running the second. Table 107.5 compares the mean, median, and standard deviation of MER for LSR and GRN with one and two variables, for both training and testing data. The differences are small, and we confirm this with statistical tests. The P values for the different statistical tests comparing LSR and GRN are given in Table 107.6. It can be seen from the t-test and M-W test that the P values are greater than 0.05 in all cases, so we conclude that GRN performs equal to LSR for one or two variables, for both training and testing. Since the effect size for all four cases is near 0.5, we can conclude that LSR and GRN perform equally well. Table 107.7 compares the performance of one and two variables. As expected, the P values for the t-test and M-W test are greater than 0.05, and the effect size values are close to 0.5. We conclude that there is no performance difference between one and two variables for either LSR or GRN.
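The A12 statistic can be computed from rank sums, with the M-W test alongside; a minimal sketch assuming SciPy is available (the MER values below are illustrative, not those of Tables 107.5-107.7):

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

def a12(sample1, sample2):
    """Vargha-Delaney effect size A12 = (R1/m - (m + 1)/2) / n,
    i.e. the probability that a value from sample1 exceeds one from
    sample2, with ties counted half."""
    m, n = len(sample1), len(sample2)
    ranks = rankdata(np.concatenate([sample1, sample2]))
    r1 = ranks[:m].sum()
    return (r1 / m - (m + 1) / 2.0) / n

# Illustrative per-project MER values for two algorithms
mer_lsr = [0.25, 0.30, 0.22, 0.31, 0.27]
mer_grn = [0.27, 0.29, 0.24, 0.30, 0.28]

stat, p = mannwhitneyu(mer_lsr, mer_grn, alternative="two-sided")
print(f"M-W p-value = {p:.3f}, A12 = {a12(mer_lsr, mer_grn):.3f}")
```

An A12 of 0.5 means neither algorithm tends to produce larger errors than the other, which is the pattern reported in Tables 107.6 and 107.7.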

Table 107.5 Comparison of statistical parameters for LSR and GRN
Table 107.6 Comparison of statistical tests for LSR and GRN
Table 107.7 Comparison of statistical tests for one (N&C) and two variables (N&C, R)

107.7 Conclusions

The software effort estimation accuracy of GRN is equal to that of LSR for small projects under the MMER criterion. Estimation accuracy using one variable is equal to that using two variables for small projects under the MMER criterion, for both algorithms. We therefore recommend using one variable, N&C, for the estimation of small projects. Further research is needed to theoretically justify the equal performance of GRN and LSR.