A Comparative Analysis on Effort Estimation for Agile and Non-agile Software Projects Using DBN-ALO

Kaushik, Anupama; Tayal, Devendra Kr.; Yadav, Kalpana

doi:10.1007/s13369-019-04250-6

RESEARCH ARTICLE - SPECIAL ISSUE - INTELLIGENT COMPUTING and INTERDISCIPLINARY APPLICATIONS
Published: 19 November 2019

Volume 45, pages 2605–2618, (2020)
Arabian Journal for Science and Engineering

Abstract

At present, in the software industry, agile and non-agile software development approaches are followed and effort estimation is an intrinsic part of both the approaches. This work investigates the application of deep belief network (DBN) along with antlion optimization (ALO) technique for effort prediction in both agile as well as non-agile software development environment. The study also provides a prediction interval of effort to handle uncertainty in estimation. This will help the project managers to estimate the effort in ranges instead of a crisp value. The proposed DBN-ALO approach is applied on four promise repository datasets for traditional software development (non-agile), and on three agile datasets. It provides the best results in all the evaluation criteria used. The proposed approach is also statistically validated using nonparametric tests, and it is found that DBN-ALO worked best for both agile and non-agile development approaches.

1 Introduction

Software effort estimation is required by every software development firm. Project managers cannot avoid this activity, and accurate estimation is essential for software development. Accurate estimates lead a project toward success, and inaccurate estimates can result in its failure. Many effort estimation approaches for non-agile software development are available in the literature. These approaches are based on neural networks, fuzzy logic [1,2,3], and various other soft computing techniques [4, 5]. Recently, nature-inspired algorithms such as the firefly algorithm (FA), cuckoo search (CS), bat algorithm (BA), and particle swarm optimization (PSO) have attracted attention from researchers. These algorithms have also been used in many software estimation studies [6,7,8,9]. They are simpler to understand and implement, and they require fewer operators than evolutionary algorithms. Most importantly, when these algorithms are integrated with machine learning techniques, they strengthen the machine learning technique.

Nowadays, the software development paradigm is shifting toward agile software development, which was introduced in 2001. Agile methodologies offer advantages such as flexibility, responsiveness to changes even in later stages of development, customer friendliness, and rapid delivery of working software [10]. Effort prediction for agile software development has evolved slowly in comparison with non-agile software development, and very few studies based on machine learning approaches exist [11, 12]. This may be due to the lack of publicly available agile data in comparison with non-agile data; moreover, research on agile development is more conceptual than model-driven.

This study provides a model-driven approach of effort estimation for agile and non-agile projects. An accurate estimation is often necessary for agile and non-agile software projects. The model used for effort prediction in both the development approaches is DBN-ALO. Deep belief network (DBN) is a powerful category of deep neural networks. Nowadays, it is extensively used in various applications due to its unique learning capability [13]. The meta-heuristic techniques are primarily used for obtaining optimal solutions. ALO [14] is a meta-heuristic technique based on the hunting nature of antlions. It is chosen over other optimization techniques due to its unique characteristics of exploration and exploitation of the search space. As the algorithm is population-based, it has very few parameters to be adjusted, and local optimum avoidance is also high. To the best of our knowledge, none of the studies have reported DBN-ALO for effort estimation in both traditional and agile software development environments so far.

Several effort estimation models exist, but none of them provides fully accurate results. Some uncertainty and inaccuracy are always associated with the estimates. This estimation uncertainty is handled by providing a range into which the actual effort will presumably fall [15]. This range is known as the prediction interval (PI), proposed by Jorgensen and Sjoberg [16]. Our study also computes a PI of effort for software projects, which will help software managers to predict an interval of effort rather than a single numeric effort value for a project.

The remainder of the paper is organized as follows. Section 2 discusses the related work on effort estimation for both agile and non-agile environments. Section 3 briefly describes the background techniques used in the study. Section 4 describes the proposed work. Section 5 discusses the experimental setup and results. Section 6 provides the statistical analysis. Section 7 discusses the threats to validity, and Sect. 8 concludes the paper.

2 Related Work

This section reviews effort estimation work from the past few years in non-agile and agile software development environments, covering a range of techniques and methodologies.

In the non-agile software development environment, the effort estimation studies are broadly classified into parametric and nonparametric studies. The parametric studies are based on mathematical equations, and nonparametric studies use various machine learning approaches. The nonparametric studies are preferred over parametric studies as they are more effective and robust in their results. They can easily model the complex relationships among the contributing factors and provide good reasoning capabilities. Though there are many nonparametric studies available in the literature, we have given only a few of them in brief.

Rijwani and Jain [1] used backpropagation feed-forward neural network for tuning the COCOMO (Constructive Cost Model) parameters. Laqrichi et al. [2] introduced uncertainty in effort estimation using neural networks based on bootstrap technique. They evaluated their technique on ISBSG (International Software Benchmarking Standards Group) dataset, and it provided more realistic effort. Nassif et al. [3] compared Mamdani and Sugeno with constant output, and Sugeno with linear output fuzzy models, and performed regression analysis. They evaluated these models using standard accuracy measures used in effort estimation on ISBSG dataset, and concluded that Sugeno fuzzy inference system with linear output performed best in comparison with other models. Zare et al. [4] proposed a methodology for effort estimation with three levels of Bayesian network, and used fuzzy numbers to depict the interval of all the nodes of the network. They have also used evolutionary algorithms, and their proposed model provided more accurate results than the other models discussed in their work.

Sehra et al. [5] contributed a hybrid model for effort estimation using a multi-criteria decision making (MCDM) approach and a machine learning algorithm. Their model gave more accurate results than bee colony optimization and the basic radial basis function (RBF) kernel-based model. Kaushik et al. [6] provided a hybrid effort estimation approach by integrating the cuckoo optimization algorithm and the fuzzy inference system. They evaluated their technique on promise datasets for effort estimation, and found their results on par with the other techniques discussed in their work. Kaushik et al. [7] integrated the firefly algorithm with a radial basis function network and a functional link artificial neural network to predict software cost. They found that their proposed methodology worked best, and used statistical tests to support this. Sivanageswara Rao et al. [8] proposed multiobjective particle swarm optimization for tuning COCOMO-based projects, and found that their model was better than the COCOMO model. Venkataiah et al. [9] provided an ant colony optimization model for software cost prediction, and evaluated their model on three datasets.

Abdelali et al. [17] provided a technique of effort estimation using random forest. They provided this empirical approach by varying its key parameters, and validated their approach using ISBSG Release 8, Tukutuku, and COCOMO datasets. Pai et al. [18] proposed a software effort estimation approach using neural network ensemble and regression analysis. They used 163 software development projects to validate the approach, and found that the project size is the main element in determining the effort of a project. They also observed that neural networks ensemble technique outperforms the regression analysis.

Benala and Mall [19] reported the use of differential evolution for effort estimation in analogy-based software development, and compared their approach with many existing approaches using datasets from the promise repository. Ezghari and Zahi [20] proposed the Consistent Fuzzy Analogy-based Software Effort Estimation (C-FASEE) model, which overcame the drawbacks of Fuzzy Analogy-based Software Effort Estimation (FASEE). Their model was validated on 13 project datasets, and it provided better estimation accuracy than the other models used for comparison. Abdelali et al. [21] proposed an ensemble approach on optimal trees for effort estimation, and found that it worked well in comparison with random forest and regression tree models. Nguyen et al. [22] proposed a fixed window-based effort estimation model to calibrate COCOMO project data and improved the estimation accuracy of software projects.

In agile software development, the story point estimation is the commonly used effort estimation technique for real-time agile projects, and most of the studies are based on it.

Satapathy and Rath [11] presented a study to improve the story point approach of effort estimation used in agile development environment with different machine learning approaches, and compared the performance of these techniques with the other techniques existing in the literature. Panda et al. [12] contributed to agile effort estimation based on story point approach using different types of neural networks. They also compared the results of these models, and assessed them using standard effort evaluation criteria. Ziauddin et al. [23] provided an effort estimation model for the agile projects based on story point approach, and validated their model using data collected from 21 agile projects. Martínez et al. [24] provided a Bayesian network technique to model the complexity and importance of user stories using planning poker in scrum projects. Their model was based on the data provided by the students and professionals, and found that their model gave better estimates than the traditional planning poker.

Dragicevic et al. [25] proposed a Bayesian model for effort prediction in an agile development environment, which can be used during the planning stage of project development. They evaluated their model on completed agile projects taken from a single software company, and used standard accuracy measures to assess the model. Tanveer [26] proposed a hybrid methodology of effort estimation for agile projects, which aimed at change impact analysis for software artifacts. It also included the cost drivers suggested by experts for improving the effort estimation. Tanveer et al. [27] contributed to effort estimation of agile projects from the agile development team perspective, and presented a case study on three agile development teams of a German multinational software corporation. Bilgaiyan et al. [28] proposed effort estimation for 21 agile projects from six different software houses using feed-forward backpropagation ANN and Elman ANN. Their model was evaluated with three commonly used performance metrics for effort estimation. Britto et al. [29] presented a study on effort estimation in agile global software development by combining the results reported in the literature on effort estimation and on the global software development context.

Usman et al. [30] investigated distributed large-scale agile projects to explore the effort estimation process, and identified various elements affecting their accuracy. Satapathy et al. [31] optimized the story point approach using support vector regression (SVR) kernel methods, and found that support vector regression with RBF kernel (SVR-RBF) provided the best results for agile projects. Tung and Hanh [32] proposed an integration of artificial bee colony (ABC) and particle swarm optimization (PSO) for effort estimation in agile software development environment. They used velocity and the story points as the main inputs, and found that their technique outperformed the earlier existing studies. Zakrani et al. [33] proposed an improved effort estimation model for agile projects using support vector regression (SVR) optimized by grid search method, and found their model outperformed the SVR kernel methods.

The techniques available in the literature for effort estimation in agile and non-agile projects are distinct, and each has its own merits and demerits. However, there is no standard technique that is commonly accepted for effort estimation. Recent studies in the non-agile context have used machine learning and various other soft computing approaches for effort prediction, whereas studies in the agile context have made little use of these methods.

The current study is an attempt toward providing a model-driven effort estimation approach for agile and non-agile projects. In order to improve the accuracy of predictions, data from real agile and non-agile projects are used. The novelty here is the integration of DBN and ALO, which is not available in the literature for both agile and non-agile contexts.

3 Background Techniques

This section provides some background concepts used in constructing the proposed framework.

3.1 Deep Belief Network (DBN)

DBN belongs to a set of deep neural networks. The heart of DBN is restricted Boltzmann machine (RBM). All the layers of DBN are placed upon each other as a stack of RBMs. The RBM is a two-layer neural network with a visible layer and a hidden layer as shown in Fig. 1. All the visible layer neurons are connected to the hidden layer neurons, but the nodes of the same layer are not linked to each other. For learning, the inputs are mapped to the visible layer, where they are passed to hidden layer after multiplying with their respective weights, and passing through an activation function producing one output per hidden node.

The structure of DBN is shown in Fig. 2. In DBN, the outputs of hidden layer 1 are passed as input to hidden layer 2, and so on, until a final classifying layer is reached. DBN uses a greedy training approach and the contrastive divergence method to train the stack of RBMs. Since an RBM has no intra-layer connections and has the shape of a bipartite graph, the hidden neurons are mutually independent given the visible neurons, and vice versa. Given m visible neurons and n hidden neurons, the conditional probability of the hidden neurons H given the visible neurons V is P(H = 1|V). This is known as the positive phase. Conversely, the conditional probability of V given H is P(V = 1|H). This is known as the negative phase.

3.1.1 Training of RBM

1. Map the training dataset to the neurons of the visible layer.
2. Positive Phase: In this phase, all the hidden neurons are updated in parallel. Compute the positive statistics for edge E_ij, which is P(H_j = 1|V), given as:

$$ {\text{Positive}}\left( {E_{ij} } \right):\quad P\left( {H_{j} = 1 \mid V} \right) = \sigma \left( {B_{j} + \sum\limits_{i = 1}^{m} W_{ij} V_{i} } \right) $$
(1)

Here, B_j is the bias associated with the hidden neuron H_j, W_ij is the weight associated with the hidden neuron H_j and visible neuron V_i, and $ \sigma $ represents the sigmoid function.
3. Negative Phase: This phase reconstructs all the visible units. Compute the negative statistics for edge E_ij, which is P(V_i = 1|H), given as:

$$ {\text{Negative}}\left( {E_{ij} } \right):\quad P\left( {V_{i} = 1 \mid H} \right) = \sigma \left( {A_{i} + \sum\limits_{j = 1}^{n} W_{ij} H_{j} } \right) $$
(2)

Here, A_i is the bias associated with the visible neuron V_i, W_ij is the weight associated with the hidden neuron H_j and visible neuron V_i, and $ \sigma $ represents the sigmoid function.
4. Update the weights: The previous weight W_ij is updated as:

$$ {\text{Updt}}\left( {W_{ij} } \right) = W_{ij} + L*\left( {{\text{Positive}}\left( {E_{ij} } \right) - {\text{Negative}}\left( {E_{ij} } \right)} \right) $$
(3)
where L is the learning rate.
5. Transpose the weights and repeat with all the training examples until the required threshold is achieved.
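
The training steps above can be sketched in plain Python as a single contrastive-divergence (CD-1) update. This is a minimal illustration and not the authors' MATLAB implementation: the function names, the default learning rate, and the use of the products V_i · P(H_j = 1|V) as edge statistics (the common CD-1 form) are assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, B):
    # Eq. (1): P(H_j = 1 | V) = sigma(B_j + sum_i W_ij * V_i)
    return [sigmoid(B[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(B))]


def visible_probs(h, W, A):
    # Eq. (2): P(V_i = 1 | H) = sigma(A_i + sum_j W_ij * H_j)
    return [sigmoid(A[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(A))]


def cd1_step(v0, W, A, B, lr=0.1, rng=random):
    """One CD-1 update in place; W[i][j] links visible i to hidden j."""
    # Positive phase: hidden probabilities driven by the data.
    ph0 = hidden_probs(v0, W, B)
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # Negative phase: reconstruct the visible units, then the hidden ones.
    v1 = visible_probs(h0, W, A)
    ph1 = hidden_probs(v1, W, B)
    # Eq. (3): move weights toward the data and away from the reconstruction.
    for i in range(len(v0)):
        for j in range(len(B)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    for i in range(len(A)):
        A[i] += lr * (v0[i] - v1[i])
    for j in range(len(B)):
        B[j] += lr * (ph0[j] - ph1[j])
    return v1
```

Calling `cd1_step` once per training example, and repeating over the dataset until the reconstruction error falls below a chosen threshold, corresponds to step 5.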

3.2 Antlion Optimization Algorithm

The antlion optimization algorithm (ALO) was proposed by Mirjalili [14]. It is a meta-heuristic technique based on the hunting mechanism of antlions. In this algorithm, antlions catch the ants by building cone-shaped traps. Once the ants enter the trap, the antlions consume these ants, but many times the ants also try to evade the confinement. For such case, the antlions throw the soil toward the outer edge of the pit, so that the ants trying to escape slide down.

3.2.1 Random Walk of Ants

The location of ants and antlions is stored in two different matrices, and each of them is evaluated using a fitness function. The ants move randomly in the search space which is modeled as:

$$ Z\left( s \right) = \left[ {0,{\text{ cs}}\left( {2y\left( {s_{1} } \right) - 1} \right),{\text{ cs}}\left( {2y\left( {s_{2} } \right) - 1} \right), \ldots ,{\text{ cs}}\left( {2y\left( {s_{n} } \right) - 1} \right)} \right] $$

(4)

where cs is the cumulative sum, n is the maximal number of iterations, s is the step of random walk, and y(s) is defined as:

$$ y\left( s \right) = \left\{ {\begin{array}{*{20}c} 1 & {{\text{if}}\quad {\text{rno}} > 0.5} \\ 0 & { {\text{if}}\quad {\text{rno}} \le 0.5} \\ \end{array} } \right. $$

(5)

where rno is a random number in the interval [0, 1].

This random walk is restricted within the bounds of the search space which is done using min–max normalization equation given as follows.

$$ Z_{i}^{s} = \frac{{\left( {Z_{i}^{s} - e_{i} } \right) \times \left( {h_{i}^{s} - g_{i}^{s} } \right)}}{{\left( {h_{i}^{s} - e_{i} } \right)}} + g_{i}^{s} $$

(6)

where $ e_{i} $ provides the minimal value of random walk, $ g_{i}^{s} $ provides the minimal of the ith variable at the sth iteration, and $ h_{i}^{s} $ provides the maximal of the ith variable at the sth iteration.
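
The random walk of Eqs. (4) and (5) and its rescaling into the search bounds can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the rescaling uses ordinary min-max normalization with the walk's own minimum and maximum, which is one reading of how Eq. (6) is applied.

```python
import random


def random_walk(n_steps, rng=random):
    """Cumulative sum of +1/-1 steps, starting at 0 (Eqs. 4 and 5)."""
    walk = [0.0]
    for _ in range(n_steps):
        y = 1.0 if rng.random() > 0.5 else 0.0   # y(s) from Eq. (5)
        walk.append(walk[-1] + 2.0 * y - 1.0)    # step is 2*y(s) - 1
    return walk


def rescale_walk(walk, lower, upper):
    """Min-max rescale a walk into [lower, upper] (assumed form of Eq. 6)."""
    lo, hi = min(walk), max(walk)
    if hi == lo:
        # Degenerate walk with a single value: place it at the lower bound.
        return [lower for _ in walk]
    return [(z - lo) * (upper - lower) / (hi - lo) + lower for z in walk]
```

Here `lower` and `upper` play the role of the per-variable bounds at the current iteration, so every position of the rescaled walk stays inside the search space.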

3.2.2 Entrapping in Pits

The random walk of ants in the search space is affected by the antlion’s trap which is modeled using the equation:

$$ g_{i}^{s} = {\text{Ant}}\;L_{j}^{s} + g^{s} $$

(7)

$$ h_{i}^{s} = {\text{Ant}}\;L_{j}^{s} + h^{s} $$

(8)

where $ g^{s} $ is the minimal value at the sth iteration, $ h^{s} $ is the maximal value at the sth iteration, $ g_{i}^{s} $ provides the minimal value for the ith ant, $ h_{i}^{s} $ provides the maximal value for the ith ant, and $ {\text{Ant}}\;L_{j}^{s} $ is the location of the chosen jth antlion at the sth iteration.

The ants walk around a selected antlion randomly in a hyperspace as given by Eqs. (7) and (8).

3.2.3 Building Trap and Elitism

The antlions build traps to catch the ants, but only the fitter antlions can catch the prey. This hunting capability of antlions is modeled by a roulette wheel operator. The fittest antlion is obtained in every iteration and termed the elite. Each ant walks around both the elite and an antlion selected by the roulette wheel, and its new position is the average of the two walks:

$$ {\text{Ant}}\;_{i}^{s} = \frac{{Z_{A}^{s} + Z_{E}^{s} }}{2} $$

(9)

where $ Z_{A}^{s} $ is the random walk about the antlion at the sth iteration, $ Z_{E}^{s} $ is the random walk surrounding the elite at the sth iteration, and $ Ant_{i}^{s} $ shows the location of the ith ant at the sth iteration.
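
A minimal sketch of the roulette wheel selection and the averaging in Eq. (9) is given below. The function names are hypothetical, and the sketch assumes non-negative fitness values where a larger value means a fitter antlion; for an error-minimization problem, a transformed score would have to be passed instead.

```python
import random


def roulette_select(fitness, rng=random):
    """Return an index chosen with probability proportional to its fitness."""
    total = sum(fitness)
    pick = rng.random() * total
    running = 0.0
    for idx, value in enumerate(fitness):
        running += value
        if pick < running:
            return idx
    # Fallback for rounding effects or an all-zero fitness list.
    return len(fitness) - 1


def ant_position(walk_antlion, walk_elite):
    """Eq. (9): average the walk around the chosen antlion and the elite."""
    return [(a + e) / 2.0 for a, e in zip(walk_antlion, walk_elite)]
```

Averaging the two walks pulls every ant toward the best solution found so far while still letting the selected antlion influence the search.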

3.2.4 Sliding Ants Toward Antlion

The ants trying to evade the trap are controlled by antlions by throwing soil outward so that the ants slide inside the trap. This is modeled using the equations:

$$ g^{s} = \frac{{g^{s} }}{L} $$

(10)

$$ h^{s} = \frac{{h^{s} }}{L} $$

(11)

where $ g^{s} $ provides the minimal value at the sth iteration, $ h^{s} $ provides the maximal value at the sth iteration, and L is a ratio given by:

$$ L = 10^{w} \frac{s}{T} $$

(12)

where s is the current iteration, T is the maximal number of iterations, and w is a constant that regulates the accuracy of exploitation.
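
The bound updates of Eqs. (7), (8), and (10) to (12) can be combined in a short sketch. The function names are hypothetical, and the sketch applies the ratio exactly as written, without clamping; note that for early iterations the ratio can be below 1, in which case the bounds widen rather than shrink.

```python
def shrink_ratio(s, T, w):
    """Eq. (12): L = 10^w * s / T, with s >= 1 the current iteration."""
    return (10.0 ** w) * s / T


def trap_bounds(antlion, g, h, s, T, w):
    """Return the (min, max) bounds of a walk around one antlion coordinate."""
    ratio = shrink_ratio(s, T, w)
    g_s = g / ratio            # Eq. (10): shrink the lower bound
    h_s = h / ratio            # Eq. (11): shrink the upper bound
    # Eqs. (7) and (8): centre the shrunken interval on the antlion.
    return antlion + g_s, antlion + h_s
```

As s grows toward T the interval around the antlion tightens, which moves the search from exploration to exploitation.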

3.2.5 Capturing Prey and Rebuilding the Pit

The ants are caught and consumed by the antlions when they reach the bottom of the pit. This behavior of antlions moves them to the current position of the hunted ant. It also enhances their chance of catching a new prey, and is modeled by:

$$ {\text{Ant}}\;L_{j}^{s} = {\text{Ant}}_{i}^{s} \quad {\text{if}}\;f\left( {{\text{Ant}}_{i}^{s} } \right) > f\left( {{\text{Ant}}\;L_{j}^{s} } \right) $$

(13)

where s is the current iteration, $ {\text{Ant}}\;L_{j}^{s} $ is the location of the chosen jth antlion at the sth iteration, $ {\text{Ant}}_{i}^{s} $ is the location of the ith ant at the sth iteration, and f is the fitness function.
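
The replacement rule of Eq. (13) and the elite selection can be sketched as below. The function names are hypothetical, and the sketch assumes that a larger fitness value means a fitter individual, matching the direction of the inequality in Eq. (13).

```python
def capture_prey(antlion, antlion_fitness, ant, ant_fitness):
    """Eq. (13): the antlion moves to the ant's position if the ant is fitter."""
    if ant_fitness > antlion_fitness:
        return list(ant), ant_fitness
    return antlion, antlion_fitness


def find_elite(antlions, fitness):
    """Return the fittest antlion and its fitness."""
    best = max(range(len(fitness)), key=fitness.__getitem__)
    return antlions[best], fitness[best]
```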

The ALO algorithm is represented using a flow diagram in Fig. 3.

3.3 Story Point Approach

In an agile development approach, the project teams develop user stories of a project. A user story is a high-level description of a requirement, which helps the developers to estimate the effort needed to implement it. The user stories are assigned story points, a metric for estimating the effort of implementing a user story. The proposed work uses the user story estimation approach suggested by Ziauddin et al. [23], which assigns story points to a user story based on its size and complexity. Another important factor is velocity, the amount of work done in a sprint, where a sprint is the time allocated for a specific piece of work to be completed and reviewed. In various machine learning studies, the final velocity and the story points of an agile project are taken as input arguments to estimate the effort.
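
How story points and velocity might be combined can be illustrated with a small sketch. Everything here is an assumption made for illustration: the text does not give formulas, so the sketch takes the story points of a story as the product of its size and complexity ratings and divides the total by the velocity to obtain a number of sprints. The function names and the sample ratings are hypothetical.

```python
def story_points(size, complexity):
    """Assumed rule: story points = size rating * complexity rating."""
    return size * complexity


def total_story_points(stories):
    """Sum the story points over (size, complexity) pairs."""
    return sum(story_points(size, complexity) for size, complexity in stories)


def sprints_needed(total_points, velocity):
    """Assumed rule: sprints = total story points / points done per sprint."""
    return total_points / velocity


# Hypothetical backlog of three user stories as (size, complexity).
backlog = [(3, 2), (5, 1), (2, 4)]
points = total_story_points(backlog)
```

In a learning-based model such as the one studied here, the total story points and the velocity would serve as input features rather than being combined by a fixed formula.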

3.4 Prediction Interval Approach of Effort Estimation

Jorgensen and Sjoberg proposed the prediction interval (PI) technique based on empirical distribution [16]. In this technique, the effort PI of a new project depends on the estimation accuracy of earlier software projects. Each PI has a minimum and maximum effort value, and a confidence level. The confidence level expresses how likely the PI is to contain the actual effort.

The confidence level used in the study is 90%. The metric used for the PI computation is the balanced relative error (BRE), computed as:

$$ {\text{BRE}} = \frac{{{\text{Actual}}\;{\text{Effort}} - {\text{Estimated}}\;{\text{Effort}}}}{{{\text{Min}}\left( {{\text{Actual}}\;{\text{Effort,}}\;{\text{Estimated}}\;{\text{Effort}}} \right)}} $$

(14)

To estimate PI of a new software project, completed projects of similar nature are required. We calculate the BRE values of all the completed projects, and find the minimum and maximum BRE. The PI of a new software project is calculated as:

$$ {\text{PI}} = \left\{ {\begin{array}{*{20}c} {\frac{{{\text{Estimated}}\;{\text{Effort}}_{\text{newproject}} }}{{1 - {\text{BRE}}}}, } & {{\text{BRE}} \le 0} \\ {{\text{Estimated}}\;{\text{Effort}}_{\text{newproject}} \cdot \left( {1 + {\text{BRE}}} \right), } & {{\text{BRE}} > 0} \\ \end{array} } \right. $$

(15)
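
Equations (14) and (15) can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function names and made-up project values; it assumes that the minimum BRE of the completed projects gives the lower limit of the interval and the maximum BRE gives the upper limit, and it does not trim the BRE values to a particular confidence level.

```python
def bre(actual, estimated):
    """Eq. (14): balanced relative error of one completed project."""
    return (actual - estimated) / min(actual, estimated)


def pi_limit(estimated_new, bre_value):
    """Eq. (15): one limit of the prediction interval for a new project."""
    if bre_value <= 0:
        return estimated_new / (1.0 - bre_value)
    return estimated_new * (1.0 + bre_value)


def prediction_interval(history, estimated_new):
    """history is a list of (actual, estimated) efforts of completed projects."""
    values = [bre(actual, estimated) for actual, estimated in history]
    return pi_limit(estimated_new, min(values)), pi_limit(estimated_new, max(values))


# Hypothetical completed projects: one underestimated, one overestimated.
history = [(120.0, 100.0), (80.0, 100.0), (100.0, 100.0)]
low, high = prediction_interval(history, 200.0)
```

With these made-up values the BRE ranges from -0.25 to 0.2, so a new estimate of 200 is reported as an interval of roughly 160 to 240 instead of a single number.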

4 Proposed Work

This work proposes DBN-ALO and investigates whether the same effort estimation technique can be used for both agile and non-agile software development approaches. It also computes an effort prediction interval for a software project (Sect. 3.4). The block diagram of DBN-ALO is given in Fig. 4. The work is evaluated on four datasets for non-agile software development and three datasets for agile software development.

All the data are normalized before being fed into DBN-ALO. The key elements to determine in DBN-ALO are the number of RBM stacks, the input nodes, the hidden nodes in each RBM stack, and the output layer. The number of RBM stacks and the number of nodes in their hidden layers are determined through experiments. The number of nodes at the visible layer of each RBM is equal to the number of attributes used from the datasets. In our architecture, the number of nodes at the hidden layer of each RBM is five for non-agile inputs and three for agile inputs. There is only one node at the output layer, which provides the effort.

The input data are mapped to the visible layer of the DBN, processed according to the training algorithm given in Sect. 3.1, and passed through three RBM stacks. At the output layer, the effort is computed as a linear weighted sum of the final RBM outputs. The backpropagation algorithm is run between the hidden layer of the final RBM and the output layer to reduce the error between the estimated and actual effort, and the delta rule is employed to update the weights between them. ALO is applied to initialize the weights between the hidden layer of the third RBM stack and the output layer, as it provides optimal values. Hence, the error between the estimated and actual effort is reduced in minimum time.

The parameters and their values used in DBN-ALO are given in Table 1. These values were chosen after running the DBN-ALO program multiple times with different parameter values, and the values that gave the best results are tabulated. The procedure of DBN-ALO is given in Fig. 5. The whole procedure is implemented in MATLAB R2018b.
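
The output stage described above, a linear weighted sum refined with the delta rule, can be sketched as follows. This is an illustrative Python sketch and not the authors' MATLAB code: the function names and the learning rate are assumptions, and the initial weights are passed in as plain numbers standing in for the values that ALO would supply.

```python
def predict_effort(features, weights, bias):
    """Linear weighted sum of the final RBM's hidden outputs."""
    return bias + sum(w * x for w, x in zip(weights, features))


def delta_rule_step(features, target, weights, bias, lr=0.05):
    """One delta-rule update; returns new weights, new bias, and the error."""
    error = target - predict_effort(features, weights, bias)
    new_weights = [w + lr * error * x for w, x in zip(weights, features)]
    new_bias = bias + lr * error
    return new_weights, new_bias, error


def fit_output_layer(samples, weights, bias, lr=0.05, epochs=100):
    """Repeat the delta rule over (features, target) pairs."""
    for _ in range(epochs):
        for features, target in samples:
            weights, bias, _ = delta_rule_step(features, target, weights, bias, lr)
    return weights, bias
```

A good starting point for `weights` shortens this loop, which is the role the text assigns to ALO.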

Table 1 Parameters of DBN-ALO
