1 Introduction

Time and cost are two important parameters in any type of project, but their importance in software projects is much greater than in other projects. There are two main reasons why being on time and on budget is the primary focus in software projects. The first is the rapid and dynamic advancement of hardware platforms, which forces software developers to deliver consistent and adaptable software within a limited time frame to close the gap between software and hardware progress. The second is the constant change in customer demands. Nowadays, the diversity of software is very high and the software market has become ever more competitive. Customer requests change every day, and software developers must respond by providing software that is on time and on budget. Moreover, software projects are based on logical and analytical work, whereas most other projects are based on physical work. We cannot measure the complexity of a software project until we actually begin working on it, and the final output is an intangible product that is very difficult to estimate (Stepanek 2005). Because of the lack of standards for determining the cost and effort of a project, managers and developers are often forced to make decisions regarding these parameters based on prior experience.

Estimation methods have improved steadily: estimation began with very simple assumptions, and today it involves complicated equations and techniques. A discussion of existing estimation methods requires dividing them into two main groups: algorithmic and nonalgorithmic. The first idea for algorithmic software development effort estimation appeared in 1950 in the form of a manual rule of thumb (Jones 2007). As the number of software projects and users' demand for high-quality software increased, new estimation models based on linear equations and regression techniques were presented in 1965 (Boehm and Valerdi 2008). Larry Putnam, Barry Boehm, and Joe Aron were notable pioneers in software estimation methods (Jones 2007). Later, in 1973, IBM researchers presented the first automated tool, Interactive Productivity and Quality (IPQ) (Jones 2007). Barry Boehm proposed a new method called COCOMO that utilized experimental equations to estimate software development effort (Boehm 1981). In addition, Boehm (1981) explained several algorithms in his book Software Engineering Economics that are still used today by researchers. Other models, such as Putnam Lifecycle Management (SLIM) and the Software Evaluation and Estimation of Resources-Software Estimating Model (SEER-SEM), continued the principles of COCOMO (Boehm and Valerdi 2008).

The introduction of the function point (FP) by Albrecht (Albrecht and Gaffney 1983) was another important event in that decade. FP was proposed to measure the size of a project in its early stages. All previous estimation methods used lines of code (LOC) as the main input parameter, but LOC was a subjective and inaccurate parameter, and most methods suffered from its negative effect on the accuracy of estimates. In contrast, FP was more accurate than LOC in determining the size of a software project because it measures the functionality of a system (Khatibi Bardsiri and Jawawi 2012). Changes in software development methods and rapid progress in software methodologies led to a new version of COCOMO, called COCOMO II, in 2000 (Boehm 2000). The new version covered the new features and requirements of software projects, included additional details regarding system functionality, and utilized FP to determine the scope of projects.

On the other hand, nonalgorithmic estimation methods are constructed by analyzing completed software projects. Since software projects naturally differ from other types of projects, in many cases algorithmic methods are unable to deal with their dynamic behavior. Moreover, the lack of information in the early stages of software projects makes estimating very difficult when algorithmic methods are employed. Therefore, researchers have proposed nonalgorithmic methods to overcome these barriers. Some researchers believed that an estimating process should include expert consultation and opinion. Hence, expert judgment methods were proposed in 1963 (Dalkey and Helmer 1963). In these methods, experts share their ideas in prearranged sessions to reach agreement on estimation issues. Several studies have tried to analyze the principles of expert judgment methods and to make them easier to use (Molokken and Jorgensen 2005; Jorgensen and Halkjelsvik 2010).

The classification and regression tree (CART) (Breiman, Friedman et al. 1984) was proposed in 1984 as another nonalgorithmic method in this field. Researchers used this method to analyze prior software project data and construct a regression tree in which leaves represented the amount of effort. In such trees, a path is traversed from root to leaf based on the target project features (Briand et al. 1999; Mendes et al. 2003).

The most common non-algorithmic method, analogy-based estimation (ABE), was proposed as a comparative method in 1997 (Shepperd and Schofield 1997). This method predicts the software development effort by comparing the target project with past completed projects and finding projects that most resemble the target project. A comparison process is performed based on project features. Each software project is described through several features (e.g., FP, development type, and application type), which are used by ABE to identify the similarity level of projects.

Due to variations in estimates, some researchers preferred to consider an interval instead of a fixed value for the purpose of estimating (Bakır et al. 2011). Moreover, the effect of data set quality has been investigated as a key point in this area (Bakır et al. 2010).

As neural networks have been widely used for estimation purposes in various sciences, they have also been employed in software development effort estimation (Li et al. 2009a, b; Bhatnagar et al. 2010).

Finally, fuzzy and neuro-fuzzy systems have been used in this field to interpret the behavior of software projects from the perspective of effort estimation. Most research studies in this area have tried to model the size of a software project using fuzzy logic in order to estimate the effort. The COCOMO model has been widely used as the basic model in fuzzy-based research (Ahmed et al. 2005; Azzeh et al. 2010). This paper is organized in six sections as follows. The ABE method and the particle swarm optimization (PSO) algorithm are explained in Sects. 2 and 3, respectively, and the proposed hybrid model is presented in Sect. 4. Section 5 presents the obtained results, and, finally, the conclusion is stated in Sect. 6.

2 Analogy-based estimation (ABE)

The ABE method was proposed by Shepperd in 1997 (Shepperd and Schofield 1997) as a substitute for algorithmic methods. In this method, the estimation of software project metrics is performed by comparing the target project with previously completed projects and finding the projects that are most similar to the target project. Due to its simplicity, ABE has been widely used for software projects that have many similar predecessors. Basically, ABE consists of four main parts (Khatibi Bardsiri and Jawawi 2011):

  (i) Historical data set,
  (ii) Similarity function,
  (iii) Associated retrieval rules,
  (iv) Solution function.

The estimation process of ABE consists of the following steps:

  1. Gather data from previous projects and produce a basic data set.
  2. Choose proper measurement parameters such as FP and LOC.
  3. Retrieve previous projects and calculate the similarities between the target project and previous projects.
  4. Estimate the target project effort.

2.1 Similarity function

ABE uses a similarity function that compares the features of two projects and determines the level of similarity between them. Two similarity functions are popular: Euclidean similarity (ES) and Manhattan similarity (MS) (Shepperd and Schofield 1997). In mathematics, the Euclidean distance or Euclidean metric is the ordinary straight-line distance between two points, and it is frequently used in optimization problems in which distances only have to be compared. The Manhattan distance comes from taxicab geometry, in which the usual distance function of Euclidean geometry is replaced by a metric in which the distance between two points is the sum of the absolute differences of their coordinates.

Both of these functions have been widely used to measure the degree of similarity between software projects. There is no basis for selecting Euclidean or Manhattan functions for a particular data set as this selection is performed by trial and error. The nature of the projects in the data set and the normality level can have a considerable effect on the performance of similarity functions. Equation (1) shows the ES function:

$$ \text{Sim}(p,p') = \frac{1}{\sqrt{\sum\nolimits_{i = 1}^{n} w_{i}\,\text{Dis}(f_{i},f'_{i})} + \delta },\quad \delta = 0.0001 $$
(1)
$$ \text{Dis}(f_{i},f'_{i}) = \begin{cases} (f_{i} - f'_{i})^{2} & \text{if } f_{i} \text{ and } f'_{i} \text{ are numerical or ordinal} \\ 0 & \text{if } f_{i} \text{ and } f'_{i} \text{ are nominal and } f_{i} = f'_{i} \\ 1 & \text{if } f_{i} \text{ and } f'_{i} \text{ are nominal and } f_{i} \ne f'_{i} \end{cases} $$

where \( p \) and \( p' \) are the projects, \( w_{i} \) is the weight, ranging between 0 and 1, assigned to each feature, \( f_{i} \) and \( f'_{i} \) denote the ith feature of each project, and n indicates the number of features. δ is used to obtain nonzero results. The MS formula is very similar to the ES one, but it computes the absolute difference between features. Equation (2) shows the MS function:

$$ \text{Sim}(p,p') = \frac{1}{\sum\nolimits_{i = 1}^{n} w_{i}\,\text{Dis}(f_{i},f'_{i}) + \delta },\quad \delta = 0.0001 $$
(2)
$$ \text{Dis}(f_{i},f'_{i}) = \begin{cases} |f_{i} - f'_{i}| & \text{if } f_{i} \text{ and } f'_{i} \text{ are numerical or ordinal} \\ 0 & \text{if } f_{i} \text{ and } f'_{i} \text{ are nominal and } f_{i} = f'_{i} \\ 1 & \text{if } f_{i} \text{ and } f'_{i} \text{ are nominal and } f_{i} \ne f'_{i} \end{cases} $$
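For concreteness, the following is a minimal Python sketch of the ES and MS functions of Eqs. (1) and (2); the function and variable names (`euclidean_similarity`, `weights`, and so on) are illustrative and not taken from the original implementation.

```python
import math

DELTA = 0.0001  # small constant used to avoid division by zero


def _dis(fi, fj, squared):
    """Per-feature distance Dis(f_i, f'_i) as defined above."""
    if isinstance(fi, (int, float)) and isinstance(fj, (int, float)):
        d = fi - fj
        return d * d if squared else abs(d)   # numerical or ordinal feature
    return 0.0 if fi == fj else 1.0           # nominal feature


def euclidean_similarity(p, q, weights):
    """ES, Eq. (1): 1 / (sqrt(sum_i w_i * Dis(f_i, f'_i)) + delta)."""
    total = sum(w * _dis(fi, fj, squared=True)
                for w, fi, fj in zip(weights, p, q))
    return 1.0 / (math.sqrt(total) + DELTA)


def manhattan_similarity(p, q, weights):
    """MS, Eq. (2): 1 / (sum_i w_i * Dis(f_i, f'_i) + delta)."""
    total = sum(w * _dis(fi, fj, squared=False)
                for w, fi, fj in zip(weights, p, q))
    return 1.0 / (total + DELTA)
```

With all weights set to 1 these reduce to the unweighted similarity functions; the PSO algorithm described later searches for the weight vector that maximizes estimation accuracy.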

In addition, there are other similarity functions, such as rank mean similarity (Walkerden and Jeffery 1997), maximum distance similarity, and Minkowski similarity (Angelis and Stamelos 2000), that have also been used in this field.

In some previous works, several types of similarity function have been employed to determine the optimum performance of ABE because there is no specific method to indicate the best similarity function to be used in defined conditions (Shepperd and Schofield 1997; Chiu and Huang 2007; Li et al. 2007a, b, 2009a, b; Li and Ruhe 2008a, b).

2.2 Solution function

A solution function is used to estimate the software development effort from the similar projects identified by the similarity function. Common solution functions include the most similar project (known as the closest analogy) (Walkerden and Jeffery 1999), the average of the most similar projects (mean) (Shepperd and Schofield 1997), the median of the most similar projects (median) (Angelis and Stamelos 2000), and the inverse distance weighted mean (inverse) (Kadoda et al. 2000). The mean is the average amount of effort of the K most similar projects, where K > 1. The median is the median amount of effort of the K most similar projects, where K > 2. Finally, the inverse distance weighted mean adjusts the contribution of each project to the estimate using Eq. (3):

$$ C_{p} = \sum_{k = 1}^{K} \frac{\text{Sim}(p,p_{k})}{\sum\nolimits_{i = 1}^{K} \text{Sim}(p,p_{i})}\, C_{p_{k}} $$
(3)

where \( p \) is the new project, \( p_{k} \) the kth most similar project, \( C_{p_{k}} \) the effort value of the kth most similar project \( p_{k} \), \( \text{Sim}(p,p_{k}) \) the similarity between projects \( p_{k} \) and \( p \), and K the total number of most similar projects. Different types of solution functions have been tried; some studies used just one solution function (Huang and Chiu 2006; Li and Ruhe 2008a, b), while other studies considered several types of solution function (Angelis and Stamelos 2000; Li et al. 2009a, b).
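The common solution functions, including the inverse distance weighted mean of Eq. (3), might be sketched as follows; the names are illustrative, and each function is assumed to receive the K most similar projects as a list of (similarity, effort) pairs sorted by decreasing similarity.

```python
import statistics


def closest_analogy(neighbors):
    """Effort of the single most similar project (K = 1)."""
    return neighbors[0][1]            # neighbors assumed sorted, most similar first


def mean_solution(neighbors):
    """Average effort of the K most similar projects (K > 1)."""
    return statistics.mean(effort for _, effort in neighbors)


def median_solution(neighbors):
    """Median effort of the K most similar projects (K > 2)."""
    return statistics.median(effort for _, effort in neighbors)


def inverse_weighted_mean(neighbors):
    """Eq. (3): weight each effort by its normalized similarity."""
    total_sim = sum(sim for sim, _ in neighbors)
    return sum(sim / total_sim * effort for sim, effort in neighbors)
```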

2.3 K nearest neighborhood (KNN)

Finding the best value of K [in Eq. (3)] has been the subject of several studies in the last decade because the value of K affects the level of accuracy in estimates and may vary in different data sets. The research studies related to analogy methods can be divided based on the method used to find the best number of analogs (K). In a number of studies, the authors considered a fixed number for K (Walkerden and Jeffery 1999; Chiu and Huang 2007). On the other hand, a number of studies determined a range for K and then tried to find the best number (Shepperd and Schofield 1997; Huang and Chiu 2006; Li et al. 2009a, b).

Recently, some researchers have proposed a dynamic search method to find the optimal value of K so that this value may be different for different projects (Li et al. 2007a, b).

2.4 Previous attempts to improve the performance of ABE

Correlation coefficient analysis has been employed to improve the performance of ABE. It can be used for feature weighting and feature selection in software development effort estimation. Features that are weakly correlated with effort are given a low weight, features that are strongly correlated are given a high weight, and features with no correlation are removed. Some studies have demonstrated improvements in the performance of ABE using this technique (Keung et al. 2008; Jianfeng et al. 2009).

Rough set analysis (Pawlak 1991) is a weighting technique that suggests proper weights for independent features based on a series of predefined criteria. All features are divided into condition features (independent) and decision features (dependent). In this technique, analyzing dependencies between features leads to several reducts (classes), which are subsets of features; subset discovery is performed using decision rules. The intersection of all classes constitutes the core features, which are treated as the most important features. The weighting system in the rough set technique is constructed based on the frequency of features in the reducts, the presence of features in the core set, and the number of times a feature appears in the decision rules. This technique has been used to estimate software development effort (Li et al. 2007a, b; Li and Ruhe 2008a, b).

Gray theory was introduced in 1982 (Deng 1982). In this theory, black represents incomplete and unknown information, white denotes complete and known information, and gray lies between the two, covering partially known information. Gray relational analysis is a statistical technique used to determine the similarity level between two observations based on a comparison of their features. Use of this technique can improve the performance of the ABE method in terms of comparison (Huang et al. 2008; Azzeh et al. 2010; Hsu and Huang 2011; Song and Shepperd 2011).

One of the most important parts of the ABE method is the solution function because it has a significant effect on the accuracy of predictions. Therefore, several studies have tried to enhance the solution function by applying adjustment expressions (Jorgensen et al. 2003; Chiu and Huang 2007; Li et al. 2009a, b). An adjustment expression refines the results of solution functions to produce more accurate estimations.

Search-based software engineering (SBSE) is a method to apply metaheuristic search techniques like genetic algorithms (GAs), simulated annealing, and tabu search to software engineering problems. Due to the complexity and uncertainty of these problems, common optimization techniques in operations research like linear programming or dynamic programming are mostly impractical for large-scale software engineering problems. This is why researchers and practitioners have utilized metaheuristic search techniques in this area (Harman and Jones 2001).

Despite the fact that the main application of SBSE is in software testing (McMinn 2004), it has been applied to other software engineering activities such as requirements analysis (Greer and Ruhe 2004), software design (Clark and Jacob 2001), software development (Alba and Chicano 2007), and software maintenance (Antoniol et al. 2005). In this paper, SBSE is employed in the field of software development effort estimation by combining ABE and PSO optimization algorithms. In the following sections, studies that have used the SBSE method to increase the accuracy of effort estimation are reviewed.

Optimization techniques can be useful when adjusting the feature weights in the ABE similarity function. GAs, as the most common optimization method, have been used to determine the feature weights in the ABE method (Huang and Chiu 2006; Li et al. 2007a, b, 2008, 2009a, b). Huang and Chiu (2006) defined several linear and nonlinear equations in which the weights are determined for effort drivers. In this study, a GA was employed to search for the best parameters that must be used in the mentioned equations. The results showed that a GA could significantly improve the performance of ABE by adjusting the parameters involved in the equations.

A combination of GA and other techniques has been commonly used to increase the accuracy of effort estimates. Linear adjustment (Chiu and Huang 2007), a gray relational similarity technique (Huang et al. 2008; Hsu and Huang 2011), and regression methods (Oliveira et al. 2010) have been combined with GAs to obtain more accurate results.

Due to the high level of uncertainty, fitness functions (optimization targets) play a significant role in effort estimation for software projects. Ferrucci et al. (2010) investigated the effect of different fitness functions on the accuracy of estimates. The results showed that an appropriate selection of the performance metrics to be optimized can significantly increase the accuracy of estimates.

The use of PSO algorithms has so far been limited to enhancing the performance of the COCOMO model (Lin 2010; Sheta et al. 2010; Reddy 2011). PSO has no crossover or mutation operations; the search is driven by the velocities of the particles. Over successive generations, only the best particle passes information on to the others, so the search converges quickly, and the underlying calculations are simple. Compared with other evolutionary techniques, PSO is therefore easy to implement and tune (Bai 2010). Consequently, PSO can be more computationally efficient than GAs in some cases, and it is reasonable to employ this algorithm in the field of effort estimation.

3 Particle swarm optimization (PSO) algorithm

Particle swarm optimization (PSO), inspired by the social behavior of bird flocking and fish schooling, is a population-based stochastic optimization technique developed by Kennedy and Eberhart (1995). The main strength of PSO is its fast convergence, which compares favorably with that of many global optimization algorithms such as GAs and simulated annealing.

PSO shares many similarities with evolutionary computation techniques such as GAs. The system is initialized with a population of random solutions and searches for optima by generational updating. However, unlike GAs, PSO has no evolution operators such as crossover and mutation. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. Detailed information will be given in the following sections.

3.1 Optimization process

PSO learns from the bird-flocking scenario and uses it to solve optimization problems. In PSO, each single solution is a "bird," known as a particle, in the search space. All particles have fitness values, evaluated by the fitness function to be optimized, and velocities that direct their flight. The particles fly through the problem space by following the current optimum particles.

PSO is initialized with a group of random particles (solutions) and then searches for optima by updating generations. In each iteration, every particle is updated by following two "best" values. The first is the best solution (fitness) the particle has achieved so far (the fitness value is also stored); this value is called Pbest. The other "best" value tracked by the particle swarm optimizer is the best value obtained so far by any particle in the population; this is the global best, called Gbest. When a particle takes only part of the population as its topological neighbors, the best value is a local best, called lbest. After finding these two best values, the particle updates its velocity and position according to Eqs. (4) and (5):

$$ v[i] = w*v[i] + c_{1} *{\text{rand}}()*({\text{Pbest}}[i] - {\text{present}}[i]) + c_{2} *{\text{rand}}()*\left( {{\text{Gbest}} - {\text{present}}[i]} \right) $$
(4)
$$ {\text{present}}[i] = {\text{present}}[i] + v[i] $$
(5)

where v[i] is the particle velocity and present[i] is the current particle (solution). Pbest[i] and Gbest are defined as stated previously, rand() is a random number in the interval (0,1), \( c_{1} \) is the "cognitive parameter," \( c_{2} \) is the "social parameter," and w is the inertia weight. The combination of these parameters determines the convergence properties of the algorithm.

3.2 Parameter value selection

For the initial version of PSO, the values of \( c_{1} \), \( c_{2} \), and w must be selected. This selection affects the convergence speed and the ability of the algorithm to find the optimum, but different values may be better for different problems. Much work has been done to select a combination of values that works well across a wide range of problems.

The inertia weight determines how the previous velocity of a particle influences the velocity in the next iteration:

  • If w = 0, then the velocity of the particle is determined only by the Pbest and Gbest positions; this means that the particle may change its velocity instantly if it is moving far from the best positions recorded in its history. Thus, low inertia weights favor exploitation (local search).

  • If w is high, the rate at which a particle may change its velocity is lower (it has an “inertia” that makes it follow its original path) even when better fitness values are known. Thus, high inertia weights favor exploration (global search).

In Shi and Eberhart (1998, 1999), a decaying inertia weight is proposed and tested, with the aim of favoring global search at the start of the algorithm and local search later on. If the inertia weight is not reduced with time, the authors suggest selecting a value of w ∈ [0.8, 1.2]. The values of the cognitive and social factors are not critical for the algorithm, but the selection of proper values may result in better performance, both in terms of convergence speed and avoidance of local minima. Their values must be taken into account when choosing the inertia weight. Several studies propose different values for these parameters that are considered adequate for the usual benchmark functions (Table 1).
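As an illustration of the decaying inertia weight mentioned above (the exact schedule used by Shi and Eberhart is not reproduced here), a commonly cited linearly decreasing form is

$$ w(t) = w_{\max } - \left( w_{\max } - w_{\min } \right)\frac{t}{T}, $$

where \( t \) is the current iteration, \( T \) the maximum number of iterations, and values around \( w_{\max } \approx 0.9 \) and \( w_{\min } \approx 0.4 \) are frequently reported in the literature; this realizes the intended shift from global search early in the run to local search toward the end.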

Table 1 Suggested PSO parameter values

The pseudocode of the procedure is as follows:

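The pseudocode figure from the original article is not reproduced here; the following Python sketch of a standard global-best PSO loop (minimization) illustrates the procedure described above, including the velocity and position updates of Eqs. (4) and (5) and the velocity clamping discussed next. All names and default values are illustrative.

```python
import random


def pso(fitness, dim, n_particles=100, iterations=100,
        w=1.0, c1=2.0, c2=2.0, v_max=0.5, lo=0.0, hi=1.0):
    """Minimize `fitness` over a dim-dimensional box [lo, hi]^dim."""
    # Random initialization of positions; velocities start at zero.
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                # Velocity update, Eq. (4), clamped to [-v_max, v_max].
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, vel[i][d]))
                # Position update, Eq. (5), kept inside the search range.
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f < pbest_fit[i]:               # update personal best (Pbest)
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:              # update global best (Gbest)
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```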

Particle velocities in each dimension are clamped to a maximum velocity \( V_{\max } \), a parameter specified by the user. If the sum of accelerations causes the velocity in a given dimension to exceed \( V_{\max } \), then the velocity in that dimension is limited to \( V_{\max } \).

4 PSO-based estimation model

Even though ABE is a simple, fast, and straightforward estimation method, the nonnormality of software project data sets makes the process of estimation quite difficult. Indeed, the nonnormality of software projects is the main problem that all comparison-based methods like ABE suffer from. As a solution, accurate and efficient feature weighting can considerably improve the performance of ABE. This paper aims to overcome this problem by suggesting appropriate weights for project attributes. To explicitly determine the research goal, three research questions are defined as follows:

  1. How can the PSO algorithm be combined with ABE to design an efficient attribute-weighting system?
  2. How do the different structures of ABE affect the accuracy of the weighting system?
  3. Can a combination of ABE and the PSO algorithm increase the accuracy of estimates achieved by other estimation methods?

Since the comparison of a new project with completed projects is the main feature of the ABE method, the accuracy of estimations strongly depends on the integrity of comparisons. Determination of the similarity level that exists between two projects, without regard to the importance of each feature, may have a negative impact on the integrity of comparisons. Due to the uncertain and complex nature of software projects, the process of comparison requires more attention as compared to other projects. Comprehensive feature analysis prior to project comparison can improve the performance of the ABE method. As stated previously, the comparison step is performed through the similarity function in the ABE method. Therefore, the proposed model emphasizes improving the performance of the similarity function. In the proposed hybrid estimation model, the PSO algorithm is combined with the ABE method to increase the accuracy of estimates. Flexibility and adaptability are two valuable specifications of PSO that enable it to overcome the complexity and vagueness of software project features. Indeed, the role of PSO is to find the most appropriate feature weights for use in the similarity function. Weight allocation is performed with regard to the optimization of ABE performance parameters described in the following section. The process of model construction is displayed through a framework including training and testing stages.

4.1 Performance metrics

The performance of estimation methods is evaluated using four main metrics: relative error (RE), magnitude of relative error (MRE), mean magnitude of relative error (MMRE), and percentage of predictions (PRED), which are computed as follows (Shepperd and Schofield 1997):

$$ \text{RE} = \frac{\text{Estimated} - \text{Actual}}{\text{Actual}} $$
(6)
$$ \text{MRE} = \frac{\left| \text{Estimated} - \text{Actual} \right|}{\text{Actual}} $$
(7)
$$ \text{MMRE} = \frac{\sum\nolimits_{i = 1}^{N} \text{MRE}_{i}}{N} $$
(8)
$$ \text{PRED}(X) = \frac{A}{N} $$
(9)

where A is the number of projects with MRE less than or equal to X, and N is the number of estimated projects. An acceptable level of X in software effort estimation methods is 0.25, and the proposed methods are compared based on this level. MMRE as the total amount of error must be minimized, whereas PRED(0.25) must be maximized.
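A minimal sketch of these metrics in Python (the function names are illustrative) is given below.

```python
def mre(estimated, actual):
    """Magnitude of relative error for a single project, Eq. (7)."""
    return abs(estimated - actual) / actual


def mmre(estimates, actuals):
    """Mean magnitude of relative error over N projects, Eq. (8)."""
    return sum(mre(e, a) for e, a in zip(estimates, actuals)) / len(actuals)


def pred(estimates, actuals, x=0.25):
    """Fraction of projects with MRE <= x, Eq. (9)."""
    hits = sum(1 for e, a in zip(estimates, actuals) if mre(e, a) <= x)
    return hits / len(actuals)
```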

4.2 Training stage

In the training stage, the proposed estimation model is constructed based on adjusting the feature weights by means of the PSO algorithm to use in the ABE similarity function. The dependent feature (target feature) is development effort; all others are considered independent features. The training stage begins by dividing all available projects into three main groups—basic, training, and testing projects. Projects in the first and second groups are used to construct the estimation model, while testing projects are employed to evaluate the performance of the proposed model. Indeed, the basic projects are the basis of comparison in the proposed model.

In other words, training projects are compared with basic projects to find the most appropriate weights, and testing projects are also compared with the basic projects to examine the accuracy of the estimation model. A project is removed from the training group and applied to the similarity function as a new project whose development effort needs to be estimated. The PSO algorithm assigns weights (in the range [0, 1]) to the independent features considered in the similarity function (in the first iteration, the suggested weights are generated randomly). The removed project is compared with the basic projects by applying the PSO-suggested weights to Eq. (1). The similarity function identifies the basic projects most similar to the removed project and sends them to the solution function. Eventually, the effort is estimated by the solution function, and the MRE performance metric is computed. This process is repeated until all training projects have been estimated.

In the next step, MMRE and PRED(0.25) are computed for the training group based on the obtained MRE values, and these values are passed to the PSO algorithm. Since increasing PRED(0.25) and decreasing MMRE are the main goals of any estimation model in this field, the PSO algorithm is set to minimize the value of (MMRE − PRED(0.25)) as the target parameter. The most efficient weights are found by the PSO algorithm only if both MMRE and PRED(0.25) are considered in the optimization process: a low value of MMRE regardless of PRED(0.25), or a high value of PRED(0.25) regardless of MMRE, cannot be interpreted as an accurate result, because biased estimates may push a single performance metric to an unrealistic value. If the termination criteria (number of iterations or level of error) are satisfied, the weights are recorded (as optimized weights) for use in the testing stage; otherwise, the PSO algorithm modifies the weights with consideration of the obtained performance parameters. The new weights are applied to the similarity function, and all computations are repeated for the training projects.

This process continues until the termination criteria are satisfied (the maximum number of iterations or the desired error rate is reached). Figure 1 depicts the training stage of the proposed estimation model. As seen in the figure, there are two cycles in the training stage: the first is related to the computation of MMRE for the training projects, and the second to the adjustment of weights by means of the PSO algorithm.
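Under the assumptions of the earlier sketches, the fitness function minimized by PSO in the training stage could look as follows. Projects are represented here as dictionaries with hypothetical "features" and "effort" keys, and the similarity and solution functions (for example, the `euclidean_similarity` and `inverse_weighted_mean` sketches above) are passed in as parameters; none of these names come from the original implementation.

```python
def abe_estimate(target, basic_projects, weights, similarity, solution, k=2):
    """Estimate the effort of `target` from its K most similar basic projects."""
    scored = sorted(((similarity(target["features"], b["features"], weights),
                      b["effort"]) for b in basic_projects),
                    key=lambda t: t[0], reverse=True)   # most similar first
    return solution(scored[:k])


def training_fitness(weights, training_projects, basic_projects,
                     similarity, solution, k=2):
    """Target minimized by PSO in the training stage: MMRE - PRED(0.25)."""
    mres = []
    for p in training_projects:
        est = abe_estimate(p, basic_projects, weights, similarity, solution, k)
        mres.append(abs(est - p["effort"]) / p["effort"])
    mmre_val = sum(mres) / len(mres)
    pred25 = sum(1 for m in mres if m <= 0.25) / len(mres)
    return mmre_val - pred25
```

With the PSO sketch shown earlier, the weight search could then be launched as, for example, `pso(lambda w: training_fitness(w, training, basic, euclidean_similarity, inverse_weighted_mean), dim=n_features)`.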

Fig. 1
figure 1

Training stage of proposed model

4.3 Testing stage

The main goal of the testing stage is to evaluate the accuracy of the constructed estimation model by applying unseen projects. In this stage, for the purpose of exploring the performance of the proposed model, basic and testing projects are employed as similarity function inputs. In addition, optimized weights, obtained from the training stage, are applied to the similarity function.

Similar to what was done in the training stage, a project is removed from the testing projects and compared with the basic projects using the similarity function. The projects most similar to the removed project are determined and sent to the solution function. The effort is estimated, and then the MRE is computed. This process is repeated for all testing projects and, finally, the performance metrics MMRE and PRED(0.25) are computed. The testing stage is shown in Fig. 2. As stated in Sect. 4.2, the weights proposed by PSO for the project features are produced so that ABE can estimate the training projects as accurately as possible. In this case, the basic projects are utilized by ABE for the purpose of comparison. Therefore, two-thirds of the existing projects in the data set (the basic and training sets) are used to reach the best possible weights, and the remaining projects are considered the test set. PSO randomly generates the weights and adjusts them over the iterations to reach the desired accuracy.
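Continuing with the hypothetical helpers from the training-stage sketch (in particular `abe_estimate`), the testing stage reduces to re-running the ABE estimation with the optimized weights on the held-out projects:

```python
def evaluate(test_projects, basic_projects, optimized_weights,
             similarity, solution, k=2):
    """MMRE and PRED(0.25) of the trained model on unseen test projects."""
    mres = []
    for p in test_projects:
        est = abe_estimate(p, basic_projects, optimized_weights,
                           similarity, solution, k)   # from the training sketch
        mres.append(abs(est - p["effort"]) / p["effort"])
    return sum(mres) / len(mres), sum(1 for m in mres if m <= 0.25) / len(mres)
```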

Fig. 2
figure 2

Testing stage of proposed model

In the proposed model feature weighting is performed so that the combination of weights helps ABE to produce the most accurate results. The value assigned to a feature is a number in the range [0, 1], which is meaningless by itself. PSO is a stochastic algorithm that produces different weights at each iteration, and the best combination of feature weights can be different from one execution to another (different combinations may lead to the same accuracy level). Therefore, the weight assigned to a feature cannot be interpreted as the importance of that feature but as a part of the combination that must be utilized by ABE.

4.4 Evaluation process

If the accuracy of the effort estimation model is computed by applying projects that have been used in the construction of the model, the performance evaluation may be too optimistic. The estimation error may be artificially low and may not reflect the real performance of the model on unseen projects (Hayes 1994). A cross-validation approach gives a more realistic accuracy assessment. It involves dividing the whole data set into multiple training and test sets.

The estimation results in the training stages present the estimation accuracies from the model construction data sets. The testing stages evaluate the estimation accuracies of the models using the other unseen data sets. In the training stage, a cross-validation approach calculates the accuracy for each training data set and then combines the accuracies across all training sets for the training result. In the testing stage, the approach calculates the accuracy for each test set and then combines the accuracies across all test sets for the test result. In this paper, the proposed model is evaluated using threefold cross validation as stated in the following section.

4.4.1 Cross validation

Basic, training, and testing are the three main groups that must be involved in a cross-validation process. Because these three groups are randomly selected from the available projects, they can be employed in a k-fold cross validation. Since there are three separate groups of projects, threefold cross validation seems the natural choice for evaluation, but the situation here differs slightly. In threefold cross validation, all samples are randomly divided into three sets, two of which are combined to form the training set while the other is used as the test set; this partitioning is performed three times to cover all possible situations. Similar to the threefold technique, there are three separate sets in the proposed model, but the order of the first two sets (basic and training) considerably affects the model construction process. Therefore, a modification must be applied to the validation process in this case.

Six different sequences can be considered for the proposed model, as seen in Table 2, where S is the set of all projects and \( S_{1} \), \( S_{2} \), and \( S_{3} \) are three subsets selected randomly from S as basic, training, and test sets, respectively. These subsets include the same (or approximately the same) number of projects. At each stage, performance metrics are computed for two separate sequences, and their average is taken as the result of that stage. Final results are determined as the average of the results obtained from all three stages.

Table 2 Applying threefold cross validation to proposed model
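As a sketch of this modified threefold procedure (the exact sequences are those of Table 2), the code below simply enumerates the six orderings of the three subsets as basic/training/test and averages the resulting metrics; `run_model` is a hypothetical callable that builds and tests the model for one sequence.

```python
from itertools import permutations
from statistics import mean


def modified_threefold(s1, s2, s3, run_model):
    """run_model(basic, training, test) -> (MMRE, PRED(0.25)).

    Averaging the two sequences of each stage and then the three stages
    is equivalent to averaging all six sequence results directly."""
    results = [run_model(basic, training, test)
               for basic, training, test in permutations((s1, s2, s3))]
    return mean(r[0] for r in results), mean(r[1] for r in results)
```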

4.4.2 Data set description

To explore the real performance, the evaluation of the estimation model must be carried out on real data sets. In this study, three data sets are employed to investigate the accuracy of the proposed model. The first data set comes from the IBM data processing services (DPS) organization (Matson et al. 1994) and includes 24 projects developed with third-generation languages. Five numerical features that may affect the project effort are input count (IC), output count (OC), query count (QC), file count (FC), and adjustment factor (AF). In this data set, there is one project whose effort is less than half that of the project with the second smallest effort; in practice, such a project is not suitable as an analog for other projects. Therefore, it is excluded as an outlier in order to compare the results with previous findings (Chiu and Huang 2007). Table 3 shows the statistical information about this data set.

Table 3 Description of DPS data set

The second data set, which includes 21 projects, is related to a major Canadian financial (CF) organization (Abran 1996). The collected projects are within the same application domain and developed using a standard development process model. Most of the collected projects developed on the IBM mainframe are included in the data set (Chiu and Huang 2007). IC, OC, inquiry count (IQC), internal logical file (ILF) count, external interface file (EIF), and AF are the main features considered in model construction. Statistical information related to this data set is presented in Table 4.

Table 4 Description of CF data set

The third data set is that of the International Software Benchmarking Standards Group (ISBSG 2011), which has been developed and refined through metrics-based data collection over a 10-year period. The latest release of this data set is the ISBSG R11 data repository (ISBSG), which includes a total of 5,052 projects from 24 countries. Statistical information regarding the selected features [input count (Inpcont), output count (Outcont), enquiry count (EnqCont), file count (FileCont), interface count (Intcont), adjusted function point (AFP), and normalized effort in hours (NorEffort)] is displayed in Table 5. Projects with missing values in any of the selected features are excluded from the subset. In the ISBSG data set, project data quality is rated, and only projects with an A or B rating are used in published research papers. Therefore, projects with ratings other than A and B are excluded from the subset.

Table 5 Description of ISBSG data set

Moreover, since normalized effort (NorEffort) is used as the target for estimation, the risk from using normalized effort should be noted. For projects covering less than a full development life cycle, normalized effort is an estimate of the full development effort, and this may introduce bias. Hence the normalized ratio (normalized effort/summary effort) is used to refine the project subset. As suggested by ISBSG (2011), a ratio of up to 1.2 is acceptable. Projects with a normalized ratio larger than 1.2 are excluded. Finally, the subset is further reduced to the projects with “Insurance” as “OrgType.” In the end, the foregoing procedures generated a subset with 134 projects, which are used to evaluate the proposed method.

4.5 Primary adjustments

Data preprocessing is an important part of estimation problems because it can significantly affect the quality of training. In this study, all independent features are normalized to the range [0, 1] to ensure that they have the same effect on the dependent feature (effort); this also expedites training in the PSO algorithm. In all data sets, the population size and the number of iterations of the PSO algorithm are set to 100. In addition, \( c_{1} \), \( c_{2} \), and w are assigned values of 2, 2, and 1, respectively. These parameters were initialized based on comprehensive trial and error using the PSO algorithm on the training data. MMRE and PRED(0.25) are employed to construct and evaluate the performance of the model because numerous recent studies have used these metrics to compare the accuracy of models (Li et al. 2009a, b; Attarzadeh and Ow 2011; Hsu and Huang 2011; Khatibi Bardsiri et al. 2011). To explore the role of KNN (described in Sect. 2.3) in the accuracy of the proposed model, five values, {1, 2, 3, 4, 5}, are applied to the solution function separately [the value of K in Eq. (3) varies from 1 to 5], and the results obtained for each value of K are recorded. Moreover, Euclidean and Manhattan similarity functions are used to assess whether a change in similarity function has any effect on the accuracy of estimates. Three main solution functions, including inverse, mean, and median (Sect. 2.2), are investigated as well. In total, the investigation scope comprises 24 different structures for the proposed model based on combinations of KNN, similarity function, and solution function. The results presented in Tables 6, 7, and 8 were obtained from iterative executions of PSO (average of 30 executions).
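For instance, the min-max normalization of the independent features to [0, 1] could be sketched as follows (a straightforward implementation written for illustration, not the authors' code):

```python
def min_max_normalize(rows):
    """Scale each feature column of `rows` (a list of numeric lists) to [0, 1]."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0   # constant columns map to 0
             for v, lo, hi in zip(row, mins, maxs)]
            for row in rows]
```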

Table 6 Results on DPS data set
Table 7 Results on CF data set
Table 8 Results on ISBSG data set

5 Experimental results

5.1 Results on DPS data set

Table 6 shows the results obtained from applying the proposed model to the DPS data set based on the variation of ABE parameters. According to the table, different combinations of model parameters (KNN, similarity function, and solution function) slightly change the accuracy of the model. Regarding the similarity function, since the diversity of results is quite high, it is not possible to determine which similarity function is the best one.

Concerning the solution function, inverse presents more accurate estimations as compared to the others. Moreover, among all values of KNN, k = 5 and k = 2 produce the worst and best estimates, respectively. In the case of k = 1, there is no choice for the solution function and the closest project is selected to estimate the target project. In the case of k = 2, mean and median solution functions produce the same results, which is why the median solution function is not shown in the table for this case.

The most accurate estimation (MMRE = 0.26, PRED(0.25) = 0.62) is achieved when KNN, the similarity function, and the solution function are 2, Euclidean, and inverse, respectively. This structure is utilized to compare the proposed model with other estimation models.

5.2 Results on CF data set

The results obtained from applying the proposed estimation model to the CF data set are presented in Table 7. Similar to the DPS data set, there is no significant difference between Euclidean and Manhattan similarity functions in the CF data set. Additionally, the mean solution function produces the best performance metrics in all combinations, whereas median produces the worst results.

Regarding the value of KNN, k = 1 presents the worst results, and k = 4 and k = 5 produce slightly more accurate estimates as compared to the other k values. In general, the best performance of the model [MMRE = 0.38, PRED(0.25) = 0.69] is achieved by the structure k = 5, SimFunction = Manhattan, SolFunction = mean, which is considered for the purpose of comparing the proposed model with other estimation models.

5.3 Results on ISBSG data set

Table 8 displays the estimates obtained by applying the proposed model to the ISBSG data set. As seen in the table, the similarity function cannot significantly change the accuracy of estimates. The variation in similarity function leads to quite negligible changes in the accuracy of estimates. The level of nonnormality and the diversity of projects in this data set are much greater than in the DPS and CF data sets. Therefore, the range of performance metrics for this data set is quite different from the other ones. Excluding the case of k = 1, the inverse solution function produces the most accurate estimates for all other values of KNN. The best performance of the proposed method is achieved by a structure of KNN = 4, SimFunction = Euclidean, SolFunction = inverse, where the value of MMRE is 0.64 and that of PRED(0.25) is 0.51. This structure is considered for the purpose of comparing the proposed method with other estimation methods.

5.4 Proposed model versus other estimation models

The performance of the proposed model is compared with eight well-known estimation models. Since the primary goal of this paper is to enhance the accuracy of the ABE method, performance metrics obtained from the proposed model must be compared with ABE. Therefore, all possible structures of ABE (similar to structures considered for the proposed model) are applied to all data sets and the best performance metrics are recorded for comparison purposes. Moreover, regression-based estimation methods including CART, stepwise regression (SWR), and multiple regression (MLR) are involved in the comparison process because they have been widely used in other research studies (Li et al. 2009a, b; Mittas and Angelis 2010; Hsu and Huang 2011). Finally, artificial neural network (ANN) and three adjusted ABEs (Chiu and Huang 2007)—AAE, AAMH, and AAMK—are compared with the proposed method. All results presented in the following sections are computed using a threefold cross-validation technique.

Table 9 summarizes the results obtained from applying the selected estimation models to the DPS data set (testing data) based on MMRE and PRED(0.25). It can be seen that the proposed model produces the most accurate estimates (MMRE = 0.26, PRED(0.25) = 0.62) in the testing stage as compared to the other methods, followed by AAE [MMRE = 0.38, PRED(0.25) = 0.57]. The worst estimates come from SWR [MMRE = 0.93, PRED(0.25) = 0.16].

Table 9 Comparison of estimation models on DPS data set

The results of applying different estimation models to the CF data set in the testing stage are shown in Table 10. As seen in the table, the proposed model achieves the best performance metrics [MMRE = 0.38, PRED(0.25) = 0.69], among all other estimation models, followed by AAE [MMRE = 0.52, PRED(0.25) = 0.43]. In addition, SWR produces the worst estimates [MMRE = 0.97, PRED(0.25) = 0.02].

Table 10 Comparison of estimation models on CF data set

Table 11 shows the performance metrics obtained from applying the proposed method to the ISBSG data set. The range of performance metrics for this data set shows its complexity and high nonnormality. As seen in the table, the most accurate estimates are achieved by the proposed estimation model [MMRE = 0.64, PRED(0.25) = 0.51]. The method closest to the proposed model is AAMK [MMRE = 0.89, PRED(0.25) = 0.27]. Moreover, MLR presents the worst estimates on this data set [MMRE = 1.32, PRED(0.25) = 0.16].

Table 11 Comparison of estimation models on ISBSG data set

5.5 Improvement analysis

Since improving the accuracy of ABE was the main idea behind this paper, the obtained results must be analyzed comprehensively to clarify how the proposed model improves the performance of ABE. In this section, the percentage of improvement over ABE, AAE, AAMH, and AAMK is investigated (Figs. 3, 4).

Fig. 3
figure 3

Percentage of improvement obtained by proposed method on DPS

Fig. 4
figure 4

Percentage of improvement obtained by proposed method on CF

Figure 3 depicts the percentage of improvement achieved by applying the proposed model to the DPS data set. It is observed that the proposed model improves the accuracy of ABE in both performance metrics of MMRE and PRED(0.25) by 49 and 48 %, respectively. Since both performance metrics were improved in the same range, it is confirmed that the improvement domain involves a wide range of testing projects. Increasing the accuracy by almost 50 % implies that applying PSO can considerably enhance the performance of ABE on the DPS data set.

The performance of AAE, AAMH, and AAMK was improved as well, but the percentage of improvement is not as good as that of ABE. In terms of AAE and AAMK, improvement of MMRE is quite a bit more than PRED(0.25). As seen in Fig. 3, in these models, PRED(0.25) was improved by less than 10 %. This shows that the range of improvement is limited to a few projects but the percentage of improvement in these few projects is convincing (32 and 40 %).

Regarding AAMH, MMRE and PRED(0.25) were improved by almost the same percentage. Compared to AAE and AAMK, the AAMH estimation model shows a lower improvement percentage for MMRE (28 %) and a higher improvement percentage for PRED(0.25) (19 %). In other words, the percentage of accuracy improvement in AAMH is less than in the other two models, but the number of projects affected by this improvement is greater than the number of improved projects in the other two models. It can be concluded that the enhancement of MMRE is greater than that of PRED(0.25) in all of the mentioned estimation models.

The percentage of improvement obtained from using the proposed model in the CF data set is shown in Fig. 4. Since the performance metric values for AAMH and AAMK are the same (Table 10), these models are combined in Fig. 4. According to the figure, the greatest enhancement is achieved for ABE, where MMRE and PRED(0.25) are improved by 55 and 138 %, respectively, which indicates that estimation accuracy was increased over a wide range of projects by a convincing percentage. Therefore, PSO can significantly improve the performance of ABE in the CF data set.

For AAE, MMRE and PRED(0.25) were improved by almost the same percentage. Although the improvement of PRED(0.25) in this case is a bit less than that observed for ABE, the performance enhancement is still more than 50 %, which certifies the superiority of the proposed model.

In terms of the last two models (AAMH and AAMK), PRED(0.25) is improved by a promising 82 %, which indicates that a large number of estimates are improved, but the percentage of improvement in total accuracy (MMRE = 27 %) is less than in the other two models. The improvement percentage of PRED(0.25) is higher than that of MMRE in the CF data set. In addition, the proposed model shows a greater improvement percentage in the CF data set than in the DPS data set.

Figure 5 depicts the percentage of improvement achieved by applying the proposed model to the ISBSG data set. It is observed that the value of PRED(0.25) is increased significantly for all types of ABE (improvements of 168, 82, 183, and 89 %). The largest improvement in MMRE, 47 %, occurred for ABE, and the lowest percentage of improvement, 28 %, is related to AAMK. Compared to the other data sets, the improvement percentage of PRED(0.25) is higher, whereas the improvement percentage of MMRE is lower. This indicates that the number of outliers and nonnormal projects in ISBSG is greater than in the other data sets.

Fig. 5
figure 5

Percentage of improvement obtained by proposed method on ISBSG

Regarding the research questions, the first question was addressed by the explanation of the hybrid model in Sects. 4.2 and 4.3. The training and testing stages showed how the PSO algorithm could be combined with ABE. In addition, the comparison between the proposed hybrid model and ABE (Sect. 5.4) certified the efficiency of the weighting system, as mentioned in the first research question. The second question was investigated in Sects. 5.1–5.3, where the different structures of ABE were employed in the proposed hybrid model. The results proved that the proposed weighting system is flexible enough to be used by different structures of ABE. Finally, the third research question was addressed in Sect. 5.4, where the performance of the proposed model was evaluated against the other estimation models. This evaluation certified that the proposed model outperformed the other models on all three data sets. Therefore, it can be concluded that the combination of ABE and the PSO algorithm can outperform other estimation models.

6 Conclusions

It is undeniable that development effort estimation plays a vital role in software project management. Due to the complexity and inconsistency of software projects, inaccurate estimates have become a common and challenging issue that troubles developers and managers throughout the development phases. The comparison between new projects and previously completed projects is the main idea behind much research in this field. Although ABE is the best-known comparison-based estimation model and has been widely used in software development effort estimation, it is still unable to produce accurate estimates in many situations. PSO, a fast optimization algorithm with low computational cost, was combined with ABE using the framework proposed in this paper. The proposed framework consists of training and testing stages in which an estimation model is constructed and evaluated. PSO explores the possible weights and selects those that lead to the most accurate estimates. Indeed, the quality of the comparison process in the ABE method was improved by assigning the most appropriate weights to project features. To evaluate the performance of the proposed model, three real data sets were employed, and the performance metrics MMRE and PRED(0.25) were computed using a cross-validation technique. The encouraging results showed that the proposed model can significantly increase the accuracy of estimates based on MMRE and PRED(0.25). The obtained results were compared with those of eight common estimation models, which showed the superiority of the proposed model on all data sets. According to the results obtained from three real data sets, it can be concluded that the combination of PSO and ABE leads to a high-performance model for software development effort estimation. Besides achieving promising estimates, the proposed model can be used on a wide range of software project data sets by modifying a few parameters (primary adjustments). There are no prior assumptions or prerequisites for using the proposed model, which means that it is a consistent, flexible, and adaptable estimation model suitable for various types of software projects. As future work, we will use other new optimization algorithms to increase the accuracy of development effort estimation in the ABE method.