1 Introduction

The software engineering (SE) discipline has evolved since the 1960s and has accumulated a significant body of knowledge (Zelkowitz et al. 1984). Academia and industry have invested in SE research and development over the past decades, resulting in improved tools, methodologies and techniques. Over the years, SE research has drawn intense criticism for advocating more than it evaluates (Glass et al. 2002). Many researchers have attempted to characterise SE research, but none has presented a comprehensive picture (Jørgensen et al. 2009; Shaw 2002).

Software effort estimation (SEE) is a critical activity that predicts the effort required to accomplish development or maintenance tasks based on historical data. Accurate estimates are critical to both the company and its customers because they help company personnel classify, prioritise and determine the resources to be committed to a project (Nisar et al. 2008). Researchers and practitioners have addressed the problems and issues in SEE since the inception of SE as a research area, proposing numerous estimation methods (Jørgensen and Shepperd 2007; Rastogi et al. 2014; Trendowicz et al. 2008). The models developed have been found appropriate only for specific types of development environments. Advances in the technology stack and frequently changing user requirements have made the SEE process difficult. Numerous approaches have been tried to predict this probabilistic process accurately, but no single technique has performed consistently; a few researchers have therefore employed a combination of approaches rather than a single one. A major reason for inaccurate estimates is that datasets of past projects are usually sparse, incomplete, inconsistent and poorly documented. Another reason is that the SEE process depends on multiple seen and unseen factors.

Despite extensive research, the community has been unable to develop and accept a single model that can be applied in diverse environments and handle multiple environmental factors. Recently, multi-criteria decision-making (MCDM) methods have emerged as well-qualified approaches for multi-factored decision making. Moreover, incorporating machine learning (ML) approaches into a primary estimation model has consistently enhanced performance. The current paper therefore focuses on the development of a hybrid model based on ML and MCDM. The fuzzy analytic hierarchy process (FAHP) has been used for feature ranking, and the ranks generated from FAHP have been integrated into a weighted kernel least squares support vector machine (LSSVM) for effort estimation. The model has been empirically validated on data repositories available for SEE.

The paper is divided into six sections. The following section describes related work in detail. The third section elaborates the methodology followed. The proposed hybrid model is presented in the fourth section. The fifth section provides the empirical validation of the proposed model. The final section concludes the paper and provides directions for future work.

2 Related work

Algorithmic SEE models have been studied for many years (Jørgensen and Shepperd 2007). The application of ML techniques to effort prediction has gained significant momentum since 2006. The most explored algorithmic approaches, such as “fuzzy logic” (Ryder 1998; Wong et al. 2009), “neural networks” (Attarzadeh and Ow 2009; Azzeh and Nassif 2016; Dasheng and Shenglan 2012; Idri et al. 2006, 2002; Sheta et al. 2015) and genetic algorithms (Gharehchopogh et al. 2015; Milios et al. 2013; Oliveira et al. 2010), have been consistently used in every aspect of SEE. Other identified areas include “nature-inspired algorithms”, which employ algorithms based on various natural phenomena (Chalotra et al. 2015a, b; Dave and Dutta 2011; Gharehchopogh et al. 2015; Idri et al. 2007; Madheswaran and Sivakumar 2014; Reddy et al. 2010), “feature selection in the problem domain” (Kocaguneli et al. 2015; Liu et al. 2014), “support vector regression” (Braga et al. 2007; Corazza et al. 2011) and “case-based reasoning” (Mendes et al. 2002).

Wen et al. (2012) investigated 84 primary studies of ML techniques in SEE to identify the techniques used, their estimation accuracy, comparisons between models and estimation contexts. They found that eight types of ML techniques have been applied in SEE and concluded that ML models provide more accurate estimates than non-ML models.

The effectiveness of support vector regression (SVR) for web effort estimation using a cross-company dataset has been investigated by Corazza et al. (2011). The analysis showed that different preprocessing strategies and kernels can significantly affect the prediction effectiveness of the SVR approach. It also revealed that logarithmic preprocessing in combination with the radial basis function (RBF) kernel provides the best results. ML techniques have also been used to estimate software effort in place of subjective and time-consuming estimation methods (Hidmi et al. 2017): models have been proposed using two ML techniques, viz. support vector machine (SVM) and k-nearest neighbour (k-NN), applied separately and combined using ensemble learning.

SVM has been widely used in classification and nonlinear function estimation. However, the major drawback of SVM is the high computational burden of its constrained optimisation programming. This disadvantage is overcome by LSSVM, which solves a set of linear equations instead of a quadratic programming problem. LSSVM has been employed in numerous applications including stock market trend prediction (Marković et al. 2015), project risk forecasting (Liyi et al. 2010) and analogy-based estimation (ABE) (Benala and Bandarupalli 2016).

There may be numerous factors affecting software effort estimates, but a few dominate in a given software development environment, and those must be identified. Effort estimation can thus be understood as a problem that depends on multiple factors, both qualitative and quantitative.

Minku and Yao (2013) argued that SEE can be considered as a multi-objective problem. Ferrucci et al. (2011) studied the efficacy of multi-criteria genetic programming on effort estimation and stated that effort estimation is an inherently multi-criteria problem. Jiang and Naudé (2007) found project size as the crucial factor in effort estimation.

Shepperd and Cartwright (2001) surveyed a company and discovered that, for effort estimation in a particular project, project managers rely on parameters such as the count of programs, functionality, difficulty level, personnel skill and similarity to past work. Morgenshtern et al. (2007) considered factors in four main dimensions, namely project uncertainty, use of estimation development, estimation management and the experience of the estimator. Furulund and Molokken-Ostvold (2007) highlighted and confirmed the importance of using experience-based data and checklists. Jiang and Naudé (2007) considered the following factors for decision making: project size, average team size, development language, computer-aided software engineering (CASE) tools, development type and computer platform. They also discussed that historical data can be used as a means to increase SEE accuracy.

Liu et al. (2017) suggested a new method based on LSSVM and K-means clustering for ranking the optimal solutions to the multi-objective allocation of water resources. Marković et al. (2017) proposed an approach combining feature ranking and feature selection with weighted kernel LSSVMs, in which the feature weights obtained by the analytic hierarchy process (AHP) are used for feature ranking and selection and incorporated into the LSSVMs through a weighted kernel.

3 Methodology

3.1 Multi-criteria decision-making (MCDM)

MCDM approaches qualify for SEE because they can combine historical data and expert judgement by quantifying the subjectivity in that judgement. MCDM is the process of making decisions in the presence of multiple criteria or objectives, which may be quantifiable or non-quantifiable, from which an expert is required to choose. The solution of the problem depends on the inclination of the expert, since the objectives are usually contradictory (Belton and Stewart 2002). A further concern is the difficulty of developing a selection criterion that precisely describes the preference of one alternative over another. MCDM models include the preference ranking organisation method for enrichment evaluation (PROMETHEE), the technique for order of preference by similarity to ideal solution (TOPSIS), AHP, elimination and choice expressing reality (ELECTRE) and VIKOR, each of which has a different algorithm for solving problems (Lee and Tu 2011).

AHP (Menzies et al. 2006) is the most explored MCDM technique, having characteristics of both model-based systems and expert judgement. AHP, developed by Saaty (2004), has been applied in a wide range of areas. It has also been identified as one of the prime approaches for supporting software estimation planning. AHP can combine historical data and expert judgement by quantifying subjective judgement. The result of this method is a formal and systematic way of extracting, combining and capturing expert judgements and their relationship to similar reference data (Menzies et al. 2006). Despite its popularity, certain issues need to be addressed. Firstly, as the judgements given by experts are relative, an arbitrary change in the value of one alternative may affect the weights of the other alternatives, a problem known as rank reversal (Wang and Luo 2009). Another issue is the subjectivity and imprecision of Saaty’s nine-point scale (Saaty 2008). These issues can be handled by adding fuzziness to AHP, resulting in a new approach called the fuzzy analytic hierarchy process (FAHP).

3.1.1 Fuzzy analytic hierarchy process (FAHP)

Fuzzy logic is a well-established approach for handling the subjectivity of human judgements and the vagueness of data (Zadeh 1988). The combination of fuzzy logic and AHP is a hybrid approach for comparing both qualitative and quantitative criteria, using expert judgements to find weights and relative rankings. Since most multi-criteria methods suffer from vagueness, FAHP can better tolerate it, as experts are always more confident giving intervals for their estimates rather than fixed values (Mikhailov and Tsvetinov 2004). The combination of fuzzy logic and structural analysis generates more credible results than conventional AHP (Liao 2011; Tang et al. 2005).

In FAHP, expert judgement is represented as a range of values instead of a single crisp value (Kuswandari 2004). The range can be given as optimistic, pessimistic or moderate. A triangular fuzzy number (TFN) is represented by Eq. (1), where l, m and u are the pessimistic, moderate and optimistic values, respectively. The difference \(u - l\) describes the degree of fuzziness of the judgement.

$$\begin{aligned} a_{ij} = ( l_{ij}, m_{ij}, u_{ij}) \end{aligned}$$
(1)
Table 1 Linguistic scale for FAHP

There are various methods available to calculate the weights and prioritise the ranking of the alternatives (Buckley 1985; Chang 1992, 1996; Csutora and Buckley 2001; Kahraman et al. 2003; Van Laarhoven and Pedrycz 1983). Naghadehi et al. (2009) discussed these methods and recommended the one suggested by Chang (1996). Thus, in this paper the synthetic extent analysis method of Chang (1996) has been utilised for finding the weights. The method is based on pairwise comparisons, with TFNs expressing the preference of one alternative over another.

Table 2 Random index

Algorithm 1 describes the steps of FAHP. The membership function used for creating the fuzzy set is given in Eq. (2), where x is the weight of the relative importance of one criterion over another.

$$\begin{aligned} \mu _A(x) = \left\{ \begin{array}{l} 0 \quad \mathrm{for} \quad x \le l\\ \frac{x - l}{m - l}\quad \mathrm{for} \quad l \le x \le m\\ \frac{u - x}{u - m}\quad \mathrm{for} \quad m \le x \le u\\ 0 \quad \mathrm{for} \quad x \ge u \end{array} \right. \end{aligned}$$
(2)
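As a concrete illustration, the piecewise membership function of Eq. (2) can be written in a few lines; this is a minimal sketch, and the function name `mu` is ours rather than the paper's.

```python
def mu(x, l, m, u):
    """Membership degree of x in the TFN (l, m, u), per Eq. (2)."""
    if x <= l or x >= u:
        return 0.0
    if x <= m:
        return (x - l) / (m - l)   # rising edge from l to the modal value m
    return (u - x) / (u - m)       # falling edge from m to u
```

For example, membership peaks at 1 when x equals the moderate value m and falls to 0 at the interval ends l and u.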

The modified Saaty’s scale using TFNs (presented in Table 1) has been used to construct the FAHP comparison matrices. The reciprocal of a TFN entry is generated as \(a_{ji} = ( 1/u_{ij}, 1/m_{ij}, 1/l_{ij})\). The next step is to use the extent analysis method to calculate the relative ranking of the alternatives; the synthetic extent values are obtained by Eq. (3).
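The reciprocal construction above can be sketched as follows; this is an illustrative helper with hypothetical TFN values, not entries taken from the paper's comparison matrices.

```python
def reciprocal(tfn):
    """Reciprocal of a TFN (l, m, u): a_ji = (1/u_ij, 1/m_ij, 1/l_ij)."""
    l, m, u = tfn
    return (1.0 / u, 1.0 / m, 1.0 / l)

def fuzzy_matrix(upper):
    """Build a full fuzzy comparison matrix from upper-triangular judgements.

    upper: dict {(i, j): TFN} for i < j; the diagonal is (1, 1, 1) and the
    lower triangle is filled with reciprocals.
    """
    n = max(max(i, j) for i, j in upper) + 1
    mat = [[(1.0, 1.0, 1.0)] * n for _ in range(n)]
    for (i, j), tfn in upper.items():
        mat[i][j] = tfn
        mat[j][i] = reciprocal(tfn)
    return mat
```

For instance, a single judgement `(2, 3, 4)` for the pair (0, 1) yields the reciprocal `(1/4, 1/3, 1/2)` in position (1, 0).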

$$\begin{aligned} S_i = \sum \limits _{j = 1}^{m} N_{ci}^{j} \otimes \left[ \sum \limits _{i = 1}^{n} \sum \limits _{j = 1}^{m} N_{ci}^{j}\right] ^{ - 1} \end{aligned}$$
(3)

The degree of possibility of \(N_1\ge N_2\) is defined in Eq. (4).

$$\begin{aligned}&V \left( N_1\ge N_2 \right) = \sup \limits _{x \ge y} \left[ \min \left( \mu _{N_1}(x), \mu _{N_2}(y)\right) \right] \end{aligned}$$
(4)
$$\begin{aligned}&V \left( N_2\ge N_1\right) = \mathrm{hgt}\left( N_1 \cap N_2 \right) =\mu _{N_1}(d) \\&\quad = \left\{ \begin{array}{ll} 1 &{} \mathrm{if} \quad m_2 \ge m_1\\ 0 &{} \mathrm{if} \quad l_1 \ge u_2\\ \frac{l_1 - u_2}{(m_2 - u_2) - (m_1 - l_1)} &{} \mathrm{otherwise} \end{array} \right. \end{aligned}$$
(5)

In Eq. (5), d represents the ordinate of the highest intersection point between \(\mu _{N_1}\) and \(\mu _{N_2}\). The degree of possibility for a convex fuzzy number to be greater than k convex fuzzy numbers is defined by Eq. (6).

$$\begin{aligned} \begin{aligned}&V\left( N \ge N_1,N_2,\ldots ,N_k \right) \\&\quad = V\left[ \left( N \ge N_1\right) \text { and } \ldots \text { and } \left( N \ge N_k \right) \right] = \min \limits _{i} V\left( N \ge N_i \right) \end{aligned} \end{aligned}$$
(6)

In order to normalise the weight vector, Eq. (7) is used.

$$\begin{aligned} W_A=\frac{W^T}{\sum (W^T)} \end{aligned}$$
(7)

After calculating the weights of the criteria, the scores of the alternatives with respect to each criterion are evaluated, and the composite weights of the decision alternatives are determined by aggregating the weights through the hierarchy.
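Under the definitions of Eqs. (3)-(7), Chang's extent analysis can be sketched as below. This is a minimal sketch assuming NumPy; the function and variable names are ours, and the input is any square matrix of TFNs.

```python
import numpy as np

def extent_weights(mat):
    """Chang's synthetic extent analysis (Eqs. 3-7) on a matrix of TFNs."""
    a = np.asarray(mat, dtype=float)            # shape (n, n, 3): (l, m, u)
    row = a.sum(axis=1)                         # fuzzy row sums, Eq. (3) numerator
    total = a.sum(axis=(0, 1))                  # overall fuzzy sum (l, m, u)
    # S_i = row_i (x) total^{-1}; the TFN inverse reverses (l, m, u)
    s = row * (1.0 / total[::-1])

    def v(n1, n2):
        """Degree of possibility V(N1 >= N2), Eqs. (4)-(5)."""
        l1, m1, u1 = n1
        l2, m2, u2 = n2
        if m1 >= m2:
            return 1.0
        if l2 >= u1:
            return 0.0
        return (l2 - u1) / ((m1 - u1) - (m2 - l2))

    n = len(s)
    # Eq. (6): minimum degree of possibility over all other extents
    d = np.array([min(v(s[i], s[j]) for j in range(n) if j != i)
                  for i in range(n)])
    return d / d.sum()                          # Eq. (7): normalised weights
```

On a perfectly indifferent matrix (all entries (1, 1, 1)) the method returns equal weights, as expected.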

In situations where many pairwise comparisons are performed, inconsistencies typically arise. An effective method for identifying inconsistency is to compute the consistency index (CI), which refers to the average of the remaining eigenvalues of the characteristic equation of the inconsistent matrix A. This index increases in proportion to the inconsistency of the estimates. The most commonly used formula for CI is given in Eq. (8).

$$\begin{aligned} \mathrm{CI} = {\text { }}\frac{{{\lambda _{\max } - n }}}{{n - 1}} \end{aligned}$$
(8)

where \(\lambda _{\max }\) denotes the maximal eigenvalue of the matrix and n is the number of factors in the judgement matrix. When the matrix is consistent, then \(\lambda _{\max }=n\) and CI = 0.

The consistency ratio (CR) measures how a given matrix compares, in terms of consistency index, to a purely random matrix. Equation (9) is used to calculate CR.

$$\begin{aligned} \mathrm{CR} = {\text { }}\frac{{\mathrm{CI}}}{{\mathrm{RI}}} \end{aligned}$$
(9)

Here, the random index (RI) is the average CI of 100 randomly generated (inconsistent) pairwise comparison matrices; its values for different n are tabulated in Table 2. A CR of less than 0.1 (i.e. less than “10% of the average inconsistency” of randomly generated matrices) is usually acceptable. If the CR is not acceptable, the judgements should be revised.
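The consistency check of Eqs. (8)-(9) can be sketched as follows; the RI values below are Saaty's commonly tabulated random indices, assumed here to match Table 2.

```python
import numpy as np

# Saaty's random indices by matrix order n (assumed values, per Table 2).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
      7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(a):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1), Eqs. (8)-(9)."""
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    lam_max = np.max(np.linalg.eigvals(a).real)  # maximal eigenvalue
    ci = (lam_max - n) / (n - 1)
    return ci / RI[n]
```

A perfectly consistent matrix has \(\lambda_{\max} = n\), so its CR is 0; values below 0.1 are accepted.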

It has been observed that the ranking in FAHP is still based on the decisions of the expert. Further, the literature shows that ML algorithms provide promising results for labelled data. Thus, there is a need for a hybrid model that can exploit knowledge from past datasets while utilising expert judgement and removing the uncertainty of expert recall. To address these aspects, the current research merges MCDM (for handling uncertainty) and an ML algorithm (for handling imprecision) into a robust hybrid method. The rankings of the projects have been calculated using FAHP, and these weights are then used to modify the kernel. The modified kernel is utilised by the LSSVM algorithm for prediction. In this work the RBF kernel has been applied, although the set of weights generated by FAHP can equally be incorporated into other kernel functions.

3.2 Support vector machine (SVM)

SVM is a discriminative classifier formally defined by a separating hyperplane, as presented in Fig. 1. Given labelled training data (supervised learning), the algorithm outputs an optimal hyperplane that categorises new examples. This hyperplane gives the largest minimum distance to the training examples; that is, the optimal separating hyperplane maximises the margin of the training data.

Fig. 1 SVM as a binary classifier

Fig. 2 Conversion of nonlinear data to linear in n-dimensional space

SVM is capable of handling nonlinear classification by mapping the data using the kernel trick. The input data are mapped to an n-dimensional feature space using a function \(\phi \), and a linear model is then applied in that space, as shown in Fig. 2. Equation (10) describes the resulting decision function.

$$\begin{aligned} y= f(x) = \mathrm{sign}[w^T\phi (x)+ b] \end{aligned}$$
(10)

Here, w is the weight vector in the input space, \(\phi \) is a nonlinear function providing the mapping from the input space to a higher (possibly infinite)-dimensional feature space, and b is a bias parameter. For data classification, it is assumed that \(y_k[w^T \phi (x_k)+ b] \ge 1,\ k= 1,\ldots ,N\). The classifier is determined by solving the minimisation of Eq. (11), subject to \(y_k[w^T \phi (x_k)+ b] \ge 1-\epsilon _k\), where \(\epsilon _k \ge 0,\ k= 1,\ldots ,N\).

$$\begin{aligned} \min \, J_p(w,\epsilon ) = \frac{1}{2}w^Tw + c\sum _{k=1}^{N}\epsilon _k \end{aligned}$$
(11)

Based on this, the Lagrangian of the problem is presented in Eq. (12). Its dual is a quadratic maximisation problem and hence requires the application of quadratic programming methods; it is given in Eq. (13), subject to \(\sum _{k=1}^{N}\alpha _ky_k = 0\) and \(0 \le \alpha _k \le c,\ k =1,\ldots ,N\).

$$\begin{aligned} L(w, b, \epsilon , \alpha , v)= & {} J_p(w, \epsilon ) - \sum _{k=1}^{N}\alpha _k\left\{ y_k[w^T\phi (x_k)+b] - 1 + \epsilon _k\right\} \nonumber \\&- \sum _{k=1}^{N}v_k\epsilon _k \end{aligned}$$
(12)
$$\begin{aligned} \max \,Q(\alpha )= & {} -\frac{1}{2}\sum _{k,l=1}^{N}y_ky_lK(x_k,x_l)\alpha _k\alpha _l + \sum _{k=1}^{N}\alpha _k \end{aligned}$$
(13)

Here, the kernel function \(K(x_k,x_l)\) acts as a dot product mapping the input data into a higher-dimensional space, given as \(K(x_k,x_l)= \phi (x_k)^T \phi (x_l)\). A detailed explanation of SVM is given by Vapnik (2013).

3.3 Least squares support vector machines (LSSVMs)

LSSVMs are least squares versions of SVMs: a set of related supervised learning methods that analyse data and recognise patterns and are used for classification and regression analysis. Rather than solving the quadratic program of the standard SVM, LSSVM adopts equality constraints and a linear Karush–Kuhn–Tucker system, giving it more powerful computational ability for nonlinear and small-sample problems (Suykens and Vandewalle 1999). However, the modelling accuracy of a single LSSVM is influenced not only by the input data source but also by its kernel function and regularisation parameters. LSSVM is a class of kernel-based learning methods and has been used effectively for estimation and nonlinear classification problems. Van Gestel et al. (2004) stated that the standard SVM and LSSVM perform consistently when their hyperparameters are tuned in combination. The advantages of LSSVM are (1) a lower computational burden for the constrained optimisation and (2) better behaviour on higher-dimensional data.


In this paper, LSSVM has been utilised to predict the effort of software projects while learning from past available project data. The function \(f(x_k)\) is determined by solving the minimisation of Eq. (14), subject to \(y_k[w^T \phi (x_k)+ b] = 1-e_k,\ k= 1,\ldots ,N\).

$$\begin{aligned} \min \,J_p(w,e) = \frac{1}{2}w^Tw + \gamma \frac{1}{2}\sum _{k=1}^{N}e_k^2 \end{aligned}$$
(14)

A trade-off is made between minimisation of the squared error, \(e_k^2\), and minimisation of the dot product of the weight vectors, \(w^Tw\), through an optimisation parameter \(\gamma \). Based on this, the Lagrangian of the problem is presented in Eq. (15), where the \(\alpha _k\) are Lagrange multipliers.

$$\begin{aligned} L(w,b,e,\alpha ) = \frac{1}{2}w^Tw+\gamma \frac{1}{2}\sum _{k=1}^{N}e_k^2 -\sum _{k=1}^{N} \alpha _k\left\{ y_k[w^T\phi (x_k)+b]-1+e_k\right\} \end{aligned}$$
(15)
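The least squares formulation above leads to a linear KKT system rather than a quadratic program. A minimal sketch of LSSVM regression with an RBF kernel, under that standard formulation (Suykens and Vandewalle 1999), might look like the following; the function names are ours.

```python
import numpy as np

def rbf(a, b, sigma2):
    """RBF kernel exp(-||a - b||^2 / sigma^2)."""
    return np.exp(-np.sum((a - b) ** 2) / sigma2)

def lssvm_train(x, y, gamma, sigma2):
    """Solve the LSSVM KKT system  [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(x)
    omega = np.array([[rbf(x[k], x[l], sigma2) for l in range(n)]
                      for k in range(n)])
    top = np.concatenate(([0.0], np.ones(n)))
    body = np.hstack((np.ones((n, 1)), omega + np.eye(n) / gamma))
    A = np.vstack((top, body))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                       # bias b, multipliers alpha

def lssvm_predict(x_new, x, b, alpha, sigma2):
    """y(x) = b + sum_k alpha_k K(x, x_k)."""
    return b + sum(a * rbf(x_new, xk, sigma2) for a, xk in zip(alpha, x))
```

With a large \(\gamma \) the model nearly interpolates the training data, since the allowed errors \(e_k\) are penalised heavily.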

The weighted kernel function is defined as \( K(\theta x_k,\theta x_l) \), where \( \theta \) is a weight vector over the dataset features. This weighted kernel makes the LSSVM a weighted LSSVM, whose classification model is represented in Eq. (16).

$$\begin{aligned} y(x)= \mathrm{sign}{\left[ \sum _{k=1}^{N} \alpha _k y_k K(\theta x,\theta x_k)+b \right] } \end{aligned}$$
(16)

At this point, a kernel function is applied that computes the dot product in the higher-dimensional feature space using the original attribute set. Some common kernel functions are listed below:

  • Linear: \( K(x_k,x_l)= {x_l}^Tx_k+c \)

  • Polynomial: \( K(x_k,x_l)= (\gamma {x_l}^Tx_k+r)^d,\ \gamma > 0 \)

  • Multilayer perceptron: \(K(x_k,x_l) = \tanh (\gamma x_l^Tx_k + r)\)

  • Radial basis function: \(K(x_k,x_l)= \exp (-\frac{{\vert \vert x_k-x_l \vert \vert }^2}{\sigma ^2})\)

Here \(\gamma \), \(\sigma \), r and d are kernel parameters. Kernel-based estimation techniques, such as SVM and LSSVM, have been shown to be powerful nonlinear classification and regression methods. In this study, the RBF kernel has been used since it has previously been found to be a good choice for LSSVM (Van Gestel et al. 2004).
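For illustration, the four kernels listed above can be written directly; this is a sketch, and the parameter defaults are ours (\(\sigma^2\) is passed as `sigma2`).

```python
import numpy as np

def linear(xk, xl, c=0.0):
    """Linear kernel x_l^T x_k + c."""
    return xl @ xk + c

def polynomial(xk, xl, gamma=1.0, r=1.0, d=2):
    """Polynomial kernel (gamma * x_l^T x_k + r)^d, gamma > 0."""
    return (gamma * (xl @ xk) + r) ** d

def mlp(xk, xl, gamma=1.0, r=0.0):
    """Multilayer perceptron kernel tanh(gamma * x_l^T x_k + r)."""
    return np.tanh(gamma * (xl @ xk) + r)

def rbf(xk, xl, sigma2=1.0):
    """RBF kernel exp(-||x_k - x_l||^2 / sigma^2)."""
    return np.exp(-np.sum((xk - xl) ** 2) / sigma2)
```

Note that the RBF kernel of identical inputs is always 1, its maximum value.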

4 Proposed model

A hybrid model has been developed by combining the MCDM approach FAHP with the ML approach LSSVM. FAHP has been used to generate feature weights, and a weighted kernel (\(W_k\)) assimilates the generated weights into the LSSVM. Algorithm 2 describes the steps followed to develop the model.

In this model, the chosen criteria are effort and lines of code (LOC), and the alternatives are the projects. The first step is a pairwise comparison of the projects on the effort and LOC criteria. Weights are generated for the projects using the FAHP methodology discussed in Sect. 3.1.1 and the fuzzy linguistic scale described in Table 1. The weight vector generated by FAHP is used as input to the LSSVM model: it modifies the kernel function as in Eq. (17), as discussed by Xing et al. (2009).

$$\begin{aligned} K(x_k, x_l)= \exp \frac{- \vert \vert S{(x_k-x_l)\vert \vert }^{2}}{\sigma ^{2}} \end{aligned}$$
(17)

In this equation, \({\vert \vert {(x_k-x_l)\vert \vert }^{2}} \) is the squared Euclidean distance between the two feature vectors and S = diag[\(\theta _1,\theta _2,\ldots ,\theta _n\)], where \(\theta _1,\theta _2,\ldots ,\theta _n\) are the weights generated by FAHP. As discussed by Guo et al. (2008) and Xing et al. (2009), the feature weights are subject to the conditions \(0 \le \theta _k \le 1\), \(k = 1,\ldots ,n\), and \(\sum _{k=1}^{n}\theta _k=1\).
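A minimal sketch of the weighted RBF kernel of Eq. (17) follows, with `theta` standing for the FAHP-generated feature weights (illustrative name, not from the paper).

```python
import numpy as np

def weighted_rbf(xk, xl, theta, sigma2):
    """Weighted RBF kernel: exp(-||S(x_k - x_l)||^2 / sigma^2), S = diag(theta).

    theta: per-feature weights with 0 <= theta_k <= 1 and sum(theta) == 1.
    """
    s = np.asarray(theta, dtype=float)
    diff = s * (np.asarray(xk, dtype=float) - np.asarray(xl, dtype=float))
    return np.exp(-np.sum(diff ** 2) / sigma2)
```

The diagonal matrix S simply rescales each feature difference by its weight before the distance is taken, so highly weighted features dominate the kernel similarity.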

Table 3 Description of attributes of COCOMO dataset
Table 4 Statistical analysis of COCOMO dataset
Table 5 Comparison matrix of COCOMO dataset with respect to effort criterion
Table 6 Comparison matrix of COCOMO dataset with respect to LOC criterion
Table 7 Normalised weights
Fig. 3 Hyperplane created by using cross-validation and tuning parameters \(\gamma = 1.4221\) and \(\sigma ^2=2.399\)

Further, the LOC and effort values of the projects have been used to train the LSSVM model. An optimal combination of the parameters (\(\gamma \), \(\sigma \)) has been sought, where \(\gamma \) denotes the relative weight of the allowed errors \(e_k\) during the training phase and \( \sigma \) is a kernel parameter. They have been tuned with respect to the fitness value of the problem; in this case the fitness value to be optimised is MMRE (Eq. 19), which should be minimal for accurate predictions. For tuning the parameters, n-fold cross-validation has been used. After tuning, the trained model has been provided LOC as input and the effort has been calculated for both methods, viz. the RBF kernel-based LSSVM (RBF-LSSVM) of Eq. (18) and the FAHP-modified RBF kernel-based LSSVM (FAHP-RBF-LSSVM) of Eq. (17).

$$\begin{aligned} K(x_k,x_l)= \exp \left( -\frac{{\vert \vert x_k-x_l \vert \vert }^2}{\sigma ^2}\right) \end{aligned}$$
(18)

5 Empirical validation

For empirical validation of the algorithmic models, datasets from a public repository and a private vendor's project data have been used. The COCOMO, Kemerer and NASA datasets are available in the public domain (Menzies et al. 2012). The private vendor dataset has been taken from Srivastava et al. (2012).

5.1 Performance measures

The performance of the different approaches has been evaluated using established performance measures: the mean magnitude of relative error (MMRE) and the root-mean-square error (RMSE), depicted in Eqs. (19) and (20), respectively. The actual effort is taken from the dataset, and the predicted effort is calculated using the proposed technique. These measures are widely used by the research community.

$$\begin{aligned} \text {MMRE}= & {} \frac{1}{N} \sum \limits _{{i}{ = } {1}}^{N} {\frac{\vert \text {Actual effort} - \text {Predicted effort}\vert }{\text {Actual effort}}} \end{aligned}$$
(19)
$$\begin{aligned} \text {RMSE}= & {} \sqrt{\frac{1}{N}\sum \limits _{i = 1}^N {{{(\text {Actual effort} - \text {Predicted effort})}^2}} }\nonumber \\ \end{aligned}$$
(20)
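Both measures are straightforward to compute; a minimal sketch follows (note that MMRE takes the magnitude, i.e. absolute value, of the relative error).

```python
import numpy as np

def mmre(actual, predicted):
    """Mean magnitude of relative error, Eq. (19)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(actual - predicted) / actual)

def rmse(actual, predicted):
    """Root-mean-square error, Eq. (20)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))
```

Lower values of both measures indicate more accurate predictions, which is why MMRE is used as the fitness value when tuning \(\gamma \) and \(\sigma \).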

5.2 COCOMO dataset

The COCOMO dataset is the most commonly used dataset for evaluating the performance of proposed techniques. It consists of 63 software projects, each characterised by 17 cost drivers (independent features).

Table 3 presents the complete description of the effort multipliers. The effort value (dependent feature) is measured in person-months and depends upon LOC and 15 effort multipliers. Each multiplier receives a rating on a six-point scale ranging from “very low” to “extra high” (in importance or value).

The statistical analysis of the COCOMO dataset is depicted in Table 4. From it, it is evident that the LOC values of the software projects are dispersed over a range from 1.98 to 150. Similarly, the effort values span a wide range, from 5.9 to 11400. These data thus represent the development of diverse projects.

The experts opined that equal importance should be given to the two criteria, viz. effort and LOC. Thus, based on the fuzzy scale presented in Table 1, the comparison matrices have been constructed as given in Tables 5 and 6. Data of 10 projects from the 63-project COCOMO dataset have been used to illustrate the methodology, since the matrix would have become unwieldy if all projects had been included in the pairwise comparisons. The comparison matrix has been created by taking effort and LOC as the criteria and the projects as alternatives; the other parameters of the dataset have been held constant and not used in the analysis. The projects have been relatively ranked based on their effort and LOC values. In Table 5, the first row represents the relative ranking of projects \(P_1\) to \(P_{10}\) with reference to project \(P_1\); relative ranks for all other projects have been generated in the same way by comparative judgements with effort as the criterion. The same methodology has been used to construct the comparison matrix depicted in Table 6. The CR values of the matrices based on effort and LOC are 0.088 and 0.049, respectively; both are less than 0.1, and hence the matrices are considered consistent for weight calculation. The ranks generated have been used for computing the weights following the FAHP methodology discussed in Sect. 3.1.1. The normalised weights thus calculated are shown in Table 7.

Table 8 Effort comparison for RBF-LSSVM and FAHP-RBF-LSSVM using COCOMO dataset
Fig. 4 Empirical validation of hybrid model using COCOMO dataset. a Effort comparison. b MMRE comparison. c RMSE comparison

The normalised weights thus obtained have been used to modify the RBF kernel as given in Eq. (17), and the modified kernel has been used in the LSSVM model to generate the effort values. As a numerical example, the effort has been computed for a LOC value of 30. For this purpose, the LOC and effort values of the remaining projects have been provided as training data, and the normalised weights obtained earlier have modified the kernel function as represented in Eq. (17). Then a LOC value of 30 has been provided as input, and the values of \(\gamma \) and \(\sigma \) have been tuned to minimise MMRE, yielding \(\gamma = 1.4221\) and \(\sigma ^2 = 2.399\). The effort values computed for a LOC of 30 are 329.83 and 396.74 for RBF-LSSVM and FAHP-RBF-LSSVM, respectively; this is presented in the form of a hyperplane in Fig. 3. The effort values for the other projects have been computed in the same manner. The effort values obtained with the modified and unmodified RBF kernels are presented in Table 8 and Fig. 4a.

Table 9 depicts the MMRE and RMSE values of RBF-LSSVM and FAHP-RBF-LSSVM compared with BCO. The MMRE and RMSE values for FAHP-RBF-LSSVM are 0.57 and 569.43, respectively, showing that FAHP-RBF-LSSVM outperforms both RBF-LSSVM and BCO. The MMRE and RMSE comparisons are also depicted in Fig. 4b, c.

Table 9 MMRE and RMSE comparison of BCO, RBF-LSSVM and FAHP-RBF-LSSVM using COCOMO dataset
Table 10 Statistical analysis of NASA dataset
Table 11 Comparison matrix of NASA dataset with respect to effort criterion
Table 12 Comparison matrix of NASA dataset with respect to LOC criterion

5.3 NASA dataset

The NASA dataset is composed of 4 attributes, of which 2 are independent, viz. methodology (ME) and developed lines (DL). DL depicts the program size, including both new source code and code reused from other projects, and is given in KLOC (thousands of lines of code). The ME attribute relates to the methodology used in the development of each software project. The dataset has 1 dependent attribute, effort, given in man-months. The statistical analysis of the NASA dataset is presented in Table 10: the DL attribute ranges from 2.1 to 100.8, and the effort attribute spans the range from 5 to 138.3. This dataset is less diverse than the COCOMO dataset.

Table 13 Effort comparison of RBF-LSSVM and FAHP-RBF-LSSVM using NASA dataset
Table 14 MMRE and RMSE comparison of BCO, RBF-LSSVM and FAHP-RBF-LSSVM using NASA dataset

The projects of the NASA dataset are compared on the basis of the effort and LOC values given in the dataset, using the linguistic scale given in Table 1. The comparison matrix based on effort values is depicted in Table 11. The CR value of this matrix is 0.097, which is less than 0.1; thus, the matrix is considered consistent. The comparison matrix based on LOC values is depicted in Table 12. The CR value for this matrix is also 0.097, justifying its consistency.
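The consistency check applied to these matrices follows Saaty's standard procedure: CI = (\(\lambda _{max}\) − n)/(n − 1) and CR = CI/RI, where RI is the random index for a matrix of order n. A minimal sketch (the RI table and the 0.1 threshold are the standard AHP values; valid for n ≥ 3):

```python
import numpy as np

# Saaty's random index values for matrix orders 1..10 (standard AHP table)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(M):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    M = np.asarray(M, float)
    n = M.shape[0]
    # principal eigenvalue of the pairwise comparison matrix
    lam_max = max(np.linalg.eigvals(M).real)
    ci = (lam_max - n) / (n - 1)
    return ci / RI[n]
```

A matrix with CR below 0.1 is accepted as consistent; otherwise the pairwise judgements would need to be revised.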

Fig. 5 Empirical validation of hybrid model using NASA dataset. a Effort comparison. b MMRE comparison. c RMSE comparison

The effort values have been computed using the RBF kernel function and the RBF kernel function modified by the weight vector generated using FAHP. The comparison of the values obtained using RBF-LSSVM and FAHP-RBF-LSSVM is presented in Table 13. Table 14 depicts the MMRE and RMSE values for the NASA dataset. The MMRE and RMSE values for FAHP-RBF-LSSVM are 0.19 and 5.99, respectively, which are lower than those of RBF-LSSVM and the BCO technique. Thus, the RBF kernel weighted using FAHP clearly performs better than the other methods. Figure 5a–c presents the comparison of effort, MMRE and RMSE in pictorial form.

5.4 Kemerer dataset

The Kemerer dataset consists of 15 software projects characterised by 6 independent attributes and 1 dependent attribute. Table 15 presents the description of the attributes. The independent attributes include 2 categorical and 4 numerical features. The categorical attributes are “Language” and “Hardware”, and the numerical attributes are “Duration”, “KSLOC”, “AdjFP” and “RawFP”. Effort (the dependent attribute) is measured in man-months.

The statistical analysis of the Kemerer dataset is depicted in Table 16. The numerical attributes have been analysed statistically: KSLOC varies from 39 to 450, and the effort value spans the range from 23.2 to 1107.31. This dataset contains less diverse projects compared to the previously discussed datasets.

Table 15 Description of attributes of Kemerer dataset
Table 16 Statistical analysis of Kemerer dataset

The comparison matrices have been obtained from the effort and LOC values of the 15 projects of the Kemerer dataset. The matrices based upon effort and LOC values are presented in Tables 17 and 18, respectively. The CR values for the matrices based upon effort and LOC are 0.096 and 0.067, respectively. Since both values are less than 0.1, the matrices are considered consistent. The effort values computed using FAHP-RBF-LSSVM and RBF-LSSVM are presented in Table 19.
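Once a comparison matrix passes the consistency check, crisp weights are derived from its fuzzy judgements. The paper's exact FAHP aggregation is defined earlier in the text; the sketch below uses Buckley's geometric-mean method over triangular fuzzy numbers (l, m, u) as one common FAHP variant, with hypothetical function names.

```python
import numpy as np

def fahp_weights(tfn_matrix):
    """Normalised crisp weights from a triangular-fuzzy pairwise
    comparison matrix via Buckley's geometric-mean method (a common
    FAHP variant; the paper's exact aggregation may differ).
    tfn_matrix[i][j] is a triangular fuzzy number (l, m, u)."""
    A = np.asarray(tfn_matrix, float)          # shape (n, n, 3)
    n = A.shape[0]
    # fuzzy geometric mean of each row, component-wise
    g = A.prod(axis=1) ** (1.0 / n)            # shape (n, 3)
    # fuzzy weights: divide by the column-sum with l and u swapped,
    # since inverting a TFN reverses its lower and upper bounds
    total = g.sum(axis=0)
    fw = g / total[::-1]
    # defuzzify by centre of gravity and renormalise
    crisp = fw.mean(axis=1)
    return crisp / crisp.sum()
```

The resulting vector plays the role of the FAHP rank weights that are assimilated into the weighted kernel.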

The comparison of MMRE and RMSE values for RBF-LSSVM and FAHP-RBF-LSSVM is presented in Table 20 and Fig. 6a–c. The results clearly reveal that incorporating the weight vector generated by FAHP into RBF-LSSVM has resulted in better effort estimates. The MMRE and RMSE values obtained for FAHP-RBF-LSSVM are 0.31 and 149.5, respectively, showing its dominance over the other methods.

5.5 Interactive voice response (IVR) dataset

The IVR dataset has been collected through a survey conducted in a software company that develops IVR applications, with questionnaires administered directly to the company's project managers and senior software development professionals (Srivastava et al. 2012). It consists of 4 attributes, viz. project no., KLOC, actual effort and actual time.

The statistical analysis of IVR dataset is presented in Table 21. From Table 21, it is evident that the values of the attributes are not spread over a wide range. This reveals that projects are of similar category.

The comparison matrix of the IVR dataset has been obtained by comparing the effort values of 10 projects from the dataset, as a comparison matrix of all 48 projects would have been overly complex. The alternatives have been relatively ranked based on multiple criteria, viz. LOC and effort. Based on the dataset, the expert has given equal importance to LOC and effort for the relative ranking. The comparison matrices thus created are depicted in Tables 22 and 23. The comparison matrices based upon effort and LOC have CR values of 0.039 and 0.045, respectively, both less than 0.1. Hence, the comparison matrices are considered consistent for further calculations. The effort values computed using RBF-LSSVM and FAHP-RBF-LSSVM are shown in Table 25.

Table 17 Comparison matrix of Kemerer dataset with respect to effort criterion
Table 18 Comparison matrix of Kemerer dataset with respect to LOC criterion

Figure 7a–c depicts the comparison of effort, MMRE and RMSE values of RBF-LSSVM, FAHP-RBF-LSSVM and BCO. Table 24 reveals MMRE and RMSE values of 0.07 and 5.23 for FAHP-RBF-LSSVM, which are less than the values for RBF-LSSVM and BCO, showing the dominance of FAHP-RBF-LSSVM.

6 Conclusion and future scope

It has been identified that SEE acts as a base point for many project management activities, including planning, budgeting and scheduling. Thus, it is crucial to obtain accurate estimates. Researchers have proposed numerous estimation methods since the inception of SE as a research area. Further, it has been witnessed that the SEE process depends on multiple intrinsic and extrinsic factors. Despite extensive research, the community has been unable to develop and accept a single model that can be applied in diverse environments and that can handle multiple environmental factors.

Table 19 Effort comparison for RBF-LSSVM and FAHP-RBF-LSSVM using Kemerer dataset
Table 20 MMRE and RMSE comparison of BCO, RBF-LSSVM and FAHP-RBF-LSSVM using Kemerer dataset
Table 21 Statistical analysis of IVR dataset

Therefore, MCDM has been utilised for the process of SEE. FAHP, an extension of AHP, has been proposed in the current research for the purpose. It has been identified that FAHP can handle subjectivity and uncertainty using fuzzy numbers and CR.

Fig. 6 Empirical validation of hybrid model using Kemerer dataset. a Effort comparison. b MMRE comparison. c RMSE comparison

Table 22 Comparison matrix of IVR dataset with respect to effort criterion
Table 23 Comparison matrix of IVR dataset with respect to LOC criterion
Table 24 MMRE and RMSE comparison for BCO, RBF-LSSVM and FAHP-RBF-LSSVM using IVR dataset
Fig. 7 Empirical validation of hybrid model using IVR dataset. a Effort comparison. b MMRE comparison. c RMSE comparison

Table 25 Effort comparison for RBF-LSSVM and FAHP-RBF-LSSVM using IVR dataset

Further, to provide a robust method, a hybrid model has been developed that combines an MCDM approach (for handling uncertainty) with an ML algorithm (for handling imprecision) to predict effort more accurately. The hybrid model amalgamates the MCDM approach (FAHP) and the ML approach (LSSVM). FAHP has been used to generate feature weights, calculated using effort and LOC as the criteria, with the projects considered as alternatives. The ranks generated by FAHP have been utilised to modify the kernel: a weighted kernel (\(W_k\)) has been used to assimilate the generated weights into the LSSVM. The performance of the proposed model has been compared with BCO. The combination of feature ranking by FAHP and the weighted kernel has resulted in more accurate effort estimates.

Future work may focus on the development of hybrid approaches in which weights generated from other MCDM techniques are amalgamated into ML techniques. Also, prediction accuracy can be analysed by applying other kernel-based functions modified by incorporating the weights generated by MCDM approaches.