1 Introduction

The software engineering (SE) discipline has evolved since the 1960s and has accumulated a significant body of knowledge (Zelkowitz et al. 1984). Academia and industry have invested in SE research and development over the past decades, resulting in improved tools, methodologies and techniques. Over the years, SE research has drawn intense criticism for advocating more than it evaluates (Glass et al. 2002). Many researchers have attempted to characterise SE research, but none has presented a comprehensive picture (Jørgensen et al. 2009; Shaw 2002).

Software effort estimation (SEE) is a critical activity that predicts the effort required to accomplish development or maintenance tasks based on historical data. Accurate estimates are critical to both the company and its customers because they help company personnel classify, prioritise and determine the resources to be committed to a project (Nisar et al. 2008). Researchers and practitioners have addressed the problems and issues in SEE since the inception of SE as a research area, proposing numerous estimation methods (Jørgensen and Shepperd 2007; Rastogi et al. 2014; Trendowicz et al. 2008). The models developed have been found appropriate only for specific types of development environments. Advances in the technology stack and frequently changing user requirements have made the SEE process difficult. Numerous approaches have been tried to predict this probabilistic process accurately, but no single technique has performed consistently; a few researchers have therefore employed a combination of approaches rather than a single one. A major reason for inaccurate estimates is that datasets of past projects are usually sparse, incomplete, inconsistent and poorly documented. Another reason is that the SEE process depends on multiple seen and unseen factors.

Despite extensive research, the community has been unable to develop and accept a single model that can be applied in diverse environments and handle multiple environmental factors. Recently, multi-criteria decision-making (MCDM) methods have emerged as well-qualified approaches for multi-factored decision making. Moreover, incorporating machine learning (ML) approaches into a primary estimation model has consistently enhanced performance. The current paper therefore focuses on the development of a hybrid model based on ML and MCDM. The fuzzy analytic hierarchy process (FAHP) has been used for feature ranking, and the ranks generated from FAHP have been integrated into a weighted kernel least squares support vector machine (LSSVM) for effort estimation. The model has been empirically validated on data repositories available for SEE.

The paper is divided into six sections. The following section describes related work in detail. The third section elaborates the methodology followed. The proposed hybrid model is presented in the fourth section. The fifth section provides the empirical validation of the proposed model. The final section concludes the paper and provides directions for future work.

2 Related work

Algorithmic SEE models have been studied for many years (Jørgensen and Shepperd 2007). The application of ML techniques to effort prediction has gained significant momentum since 2006. The most explored algorithmic approaches, such as “fuzzy logic” (Ryder 1998; Wong et al. 2009), “neural networks” (Attarzadeh and Ow 2009; Azzeh and Nassif 2016; Dasheng and Shenglan 2012; Idri et al. 2006, 2002; Sheta et al. 2015) and genetic algorithms (Gharehchopogh et al. 2015; Milios et al. 2013; Oliveira et al. 2010), have been consistently used in every aspect of SEE. Other identified areas include “nature-inspired algorithms”, which employ algorithms based on various natural phenomena (Chalotra et al. 2015a, b; Dave and Dutta 2011; Gharehchopogh et al. 2015; Idri et al. 2007; Madheswaran and Sivakumar 2014; Reddy et al. 2010), “feature selection in the problem domain” (Kocaguneli et al. 2015; Liu et al. 2014), “support vector regression” (Braga et al. 2007; Corazza et al. 2011) and “case-based reasoning” (Mendes et al. 2002).

Wen et al. (2012) investigated 84 primary studies of ML techniques in SEE to identify the techniques used, their estimation accuracy, comparisons between models and estimation contexts. They found that eight types of ML techniques have been applied in SEE and concluded that ML models provide more accurate estimates than non-ML models.

The effectiveness of support vector regression (SVR) for web effort estimation using a cross-company dataset has been investigated by Corazza et al. (2011). The analysis showed that different preprocessing strategies and kernels can significantly affect the prediction effectiveness of the SVR approach. It also revealed that logarithmic preprocessing in combination with the radial basis function (RBF) kernel provides the best results. ML techniques have also been used to estimate software effort in place of subjective and time-consuming estimation methods (Hidmi et al. 2017): models have been proposed using two ML techniques, viz. support vector machine (SVM) and k-nearest neighbour (k-NN), applied separately and combined using ensemble learning.

SVM has been widely used in classification and nonlinear function estimation. However, the major drawback of SVM is the high computational burden of its constrained optimisation programming. This disadvantage is overcome by LSSVM, which solves a set of linear equations instead of a quadratic programming problem. LSSVM has been employed in numerous applications including stock market trend prediction (Marković et al. 2015), project risk forecasting (Liyi et al. 2010) and analogy-based estimation (ABE) (Benala and Bandarupalli 2016).

There may be numerous factors affecting software effort estimates, but a few dominate in a given software development environment, and those must be identified. Effort estimation can thus be understood as a problem that depends on multiple factors, both qualitative and quantitative.

Minku and Yao (2013) argued that SEE can be considered as a multi-objective problem. Ferrucci et al. (2011) studied the efficacy of multi-criteria genetic programming on effort estimation and stated that effort estimation is an inherently multi-criteria problem. Jiang and Naudé (2007) found project size as the crucial factor in effort estimation.

Shepperd and Cartwright (2001) surveyed a company and discovered that, for effort estimation in a particular project, project managers rely on parameters such as the count of programs, functionality, difficulty level, personnel skill and similarity to past work. Morgenshtern et al. (2007) considered factors in four main dimensions, namely project uncertainty, use of estimation development, estimation management and the experience of the estimator. Furulund and Molokken-Ostvold (2007) highlighted and confirmed the importance of using experience-based data and checklists. Jiang and Naudé (2007) considered the following factors for decision making: project size, average team size, development language, computer-aided software engineering (CASE) tools, development type and computer platform. They also discussed that historical data can be used as a means to increase SEE accuracy.

Liu et al. (2017) suggested a new method based on LSSVM and K-means clustering for ranking the optimal solutions to the multi-objective allocation of water resources. Marković et al. (2017) proposed an approach combining feature ranking and feature selection with weighted kernel LSSVMs, in which the feature weights obtained by the analytic hierarchy process (AHP) are used for feature ranking and selection and incorporated into the LSSVMs through a weighted kernel.

3 Methodology

3.1 Multi-criteria decision-making (MCDM)

MCDM approaches qualify for SEE because they can combine historical data and expert judgement by quantifying the subjectivity in that judgement. MCDM is the process of making decisions in the presence of multiple criteria or objectives, which may be quantifiable or non-quantifiable, from which an expert is required to choose. The solution of the problem depends on the inclination of the expert, since the objectives are usually contradictory (Belton and Stewart 2002). A further concern is the difficulty of developing a selection criterion that precisely describes the preference of one alternative over another. MCDM models include the preference ranking organisation method for enrichment evaluation (PROMETHEE), the technique for order of preference by similarity to ideal solution (TOPSIS), AHP, elimination and choice expressing reality (ELECTRE) and VIKOR, each of which has a different algorithm for solving problems (Lee and Tu 2011).

AHP (Menzies et al. 2006) is the most explored MCDM technique, having characteristics of both model-based systems and expert judgement. AHP, developed by Saaty (2004), has been applied in a wide range of areas. It has also been identified as one of the prime approaches for supporting software estimation planning. AHP can combine historical data and expert judgement by quantifying subjective judgement. The result of this method is a formal and systematic way of extracting, combining and capturing expert judgements and their relationship to similar reference data (Menzies et al. 2006). Despite its popularity, certain issues need to be addressed. Firstly, as the judgements given by experts are relative, an arbitrary change in the value of one alternative may affect the weights of the other alternatives, a problem known as rank reversal (Wang and Luo 2009). Another issue is the subjectivity and imprecision of Saaty’s nine-point scale (Saaty 2008). These issues can be handled by adding fuzziness to AHP, resulting in a new approach called the fuzzy analytic hierarchy process (FAHP).

3.1.1 Fuzzy analytic hierarchy process (FAHP)

Fuzzy logic is a well-established approach for handling the subjectivity of human judgements and the vagueness of data (Zadeh 1988). The combination of fuzzy logic and AHP is a hybrid approach for comparing both qualitative and quantitative criteria, using expert judgements to find weights and relative rankings. Since most multi-criteria methods suffer from vagueness, FAHP can better tolerate it, as experts are always more confident giving intervals for their estimates rather than fixed values (Mikhailov and Tsvetinov 2004). The combination of fuzzy logic and structural analysis generates more credible results than conventional AHP (Liao 2011; Tang et al. 2005).

In FAHP, expert judgement is represented as a range of values instead of a single crisp value (Kuswandari 2004). The range can be given as optimistic, pessimistic or moderate. A triangular fuzzy number (TFN) is represented by Eq. (1), where l, m and u are the pessimistic, moderate and optimistic values, respectively. The difference \(u - l\) describes the degree of fuzziness of the judgement.

$$\begin{aligned} a_{ij} = ( l_{ij}, m_{ij}, u_{ij}) \end{aligned}$$
(1)
Table 1 Linguistic scale for FAHP

There are various methods available to calculate the weights and prioritise the ranking of the alternatives (Buckley 1985; Chang 1992, 1996; Csutora and Buckley 2001; Kahraman et al. 2003; Van Laarhoven and Pedrycz 1983). Naghadehi et al. (2009) discussed these methods and recommended the one suggested by Chang (1996). Thus, in this paper the synthetic extent analysis method of Chang (1996) has been utilised for finding the weights. The method is based on pairwise comparisons, with TFNs expressing the preference of one alternative over another.

Table 2 Random index

Algorithm 1 describes the steps of FAHP. The membership function used for creating the fuzzy set is given in Eq. (2), where x is the weight of the relative importance of one criterion over another.

$$\begin{aligned} \mu _A(x) = \left\{ \begin{array}{l} 0 \quad \mathrm{for} \quad x \le l\\ \frac{x - l}{m - l}\quad \mathrm{for} \quad l \le x \le m\\ \frac{u - x}{u - m}\quad \mathrm{for} \quad m \le x \le u\\ 0 \quad \mathrm{for} \quad x \ge u \end{array} \right. \end{aligned}$$
(2)
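As a concrete illustration, the piecewise membership function of Eq. (2) can be written in a few lines; this is a minimal sketch, and the function name `mu` is ours rather than the paper's.

```python
def mu(x, l, m, u):
    """Membership degree of x in the TFN (l, m, u), per Eq. (2)."""
    if x <= l or x >= u:
        return 0.0
    if x <= m:
        return (x - l) / (m - l)   # rising edge from l to the modal value m
    return (u - x) / (u - m)       # falling edge from m to u
```

For example, membership peaks at 1 when x equals the moderate value m and falls to 0 at the interval ends l and u.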

The modified Saaty’s scale using TFNs (presented in Table 1) has been used to construct the FAHP comparison matrices. The reciprocal of a TFN entry is generated as \(a_{ji} = ( 1/u_{ij}, 1/m_{ij}, 1/l_{ij})\). The next step is to use the extent analysis method to calculate the relative ranking of the alternatives; the synthetic extent values are obtained by Eq. (3).
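The reciprocal construction above can be sketched as follows; this is an illustrative helper with hypothetical TFN values, not entries taken from the paper's comparison matrices.

```python
def reciprocal(tfn):
    """Reciprocal of a TFN (l, m, u): a_ji = (1/u_ij, 1/m_ij, 1/l_ij)."""
    l, m, u = tfn
    return (1.0 / u, 1.0 / m, 1.0 / l)

def fuzzy_matrix(upper):
    """Build a full fuzzy comparison matrix from upper-triangular judgements.

    upper: dict {(i, j): TFN} for i < j; the diagonal is (1, 1, 1) and the
    lower triangle is filled with reciprocals.
    """
    n = max(max(i, j) for i, j in upper) + 1
    mat = [[(1.0, 1.0, 1.0)] * n for _ in range(n)]
    for (i, j), tfn in upper.items():
        mat[i][j] = tfn
        mat[j][i] = reciprocal(tfn)
    return mat
```

For instance, a single judgement `(2, 3, 4)` for the pair (0, 1) yields the reciprocal `(1/4, 1/3, 1/2)` in position (1, 0).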

$$\begin{aligned} S_i = \sum \limits _{j = 1}^{m} N_{ci}^{j} \otimes \left[ \sum \limits _{i = 1}^{n} \sum \limits _{j = 1}^{m} N_{ci}^{j}\right] ^{ - 1} \end{aligned}$$
(3)

The degree of possibility of \(N_1\ge N_2\) is defined in Eq. (4).

$$\begin{aligned}&V \left( N_1\ge N_2 \right) = \sup \limits _{x \ge y} \left[ \min \left( \mu _{N_1}(x), \mu _{N_2}(y)\right) \right] \end{aligned}$$
(4)
$$\begin{aligned}&V \left( N_2\ge N_1\right) = \mathrm{hgt}\left( N_1 \cap N_2 \right) =\mu _{N_1}(d) \\&\quad = \left\{ \begin{array}{ll} 1 &{} \mathrm{if} \quad m_2 \ge m_1\\ 0 &{} \mathrm{if} \quad l_1 \ge u_2\\ \frac{l_1 - u_2}{(m_2 - u_2) - (m_1 - l_1)} &{} \mathrm{otherwise} \end{array} \right. \end{aligned}$$
(5)

In Eq. (5), d represents the ordinate of the highest intersection point between \(\mu _{N_1}\) and \(\mu _{N_2}\). The degree of possibility for a convex fuzzy number to be greater than k convex fuzzy numbers is defined by Eq. (6).

$$\begin{aligned} \begin{aligned}&V\left( N \ge N_1,N_2,\ldots ,N_k \right) \\&\quad = V\left[ \left( N \ge N_1\right) \text { and } \ldots \text { and } \left( N \ge N_k \right) \right] = \min \limits _{i} V\left( N \ge N_i \right) \end{aligned} \end{aligned}$$
(6)

In order to normalise the weight vector, Eq. (7) is used.

$$\begin{aligned} W_A=\frac{W^T}{\sum (W^T)} \end{aligned}$$
(7)

After calculating the weights of the criteria, the scores of the alternatives with respect to each criterion are evaluated, and the composite weights of the decision alternatives are determined by aggregating the weights through the hierarchy.
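Under the definitions of Eqs. (3)-(7), Chang's extent analysis can be sketched as below. This is a minimal sketch assuming NumPy; the function and variable names are ours, and the input is any square matrix of TFNs.

```python
import numpy as np

def extent_weights(mat):
    """Chang's synthetic extent analysis (Eqs. 3-7) on a matrix of TFNs."""
    a = np.asarray(mat, dtype=float)            # shape (n, n, 3): (l, m, u)
    row = a.sum(axis=1)                         # fuzzy row sums, Eq. (3) numerator
    total = a.sum(axis=(0, 1))                  # overall fuzzy sum (l, m, u)
    # S_i = row_i (x) total^{-1}; the TFN inverse reverses (l, m, u)
    s = row * (1.0 / total[::-1])

    def v(n1, n2):
        """Degree of possibility V(N1 >= N2), Eqs. (4)-(5)."""
        l1, m1, u1 = n1
        l2, m2, u2 = n2
        if m1 >= m2:
            return 1.0
        if l2 >= u1:
            return 0.0
        return (l2 - u1) / ((m1 - u1) - (m2 - l2))

    n = len(s)
    # Eq. (6): minimum degree of possibility over all other extents
    d = np.array([min(v(s[i], s[j]) for j in range(n) if j != i)
                  for i in range(n)])
    return d / d.sum()                          # Eq. (7): normalised weights
```

On a perfectly indifferent matrix (all entries (1, 1, 1)) the method returns equal weights, as expected.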

In situations where many pairwise comparisons are performed, inconsistencies typically arise. An effective method for identifying inconsistency is to compute the consistency index (CI), which refers to the average of the remaining eigenvalues of the characteristic equation of the inconsistent matrix A. This index increases in proportion to the inconsistency of the estimates. The most commonly used formula for CI is given in Eq. (8).

$$\begin{aligned} \mathrm{CI} = {\text { }}\frac{{{\lambda _{\max } - n }}}{{n - 1}} \end{aligned}$$
(8)

where \(\lambda _{\max }\) denotes the maximal eigenvalue of the matrix and n is the number of factors in the judgement matrix. When the matrix is consistent, then \(\lambda _{\max }=n\) and CI = 0.

The consistency ratio (CR) measures how a given matrix compares, in terms of consistency index, to a purely random matrix. Equation (9) is used to calculate CR.

$$\begin{aligned} \mathrm{CR} = {\text { }}\frac{{\mathrm{CI}}}{{\mathrm{RI}}} \end{aligned}$$
(9)

Here, the random index (RI) is the average CI of 100 randomly generated (inconsistent) pairwise comparison matrices; its values for different n are tabulated in Table 2. A CR of less than 0.1 (i.e. less than “10% of the average inconsistency” of randomly generated matrices) is usually acceptable. If the CR is not acceptable, the judgements should be revised.
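The consistency check of Eqs. (8)-(9) can be sketched as follows; the RI values below are Saaty's commonly tabulated random indices, assumed here to match Table 2.

```python
import numpy as np

# Saaty's random indices by matrix order n (assumed values, per Table 2).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
      7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(a):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1), Eqs. (8)-(9)."""
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    lam_max = np.max(np.linalg.eigvals(a).real)  # maximal eigenvalue
    ci = (lam_max - n) / (n - 1)
    return ci / RI[n]
```

A perfectly consistent matrix has \(\lambda_{\max} = n\), so its CR is 0; values below 0.1 are accepted.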

It has been observed that the ranking in FAHP is still based on the decisions of the expert. Further, the literature shows that ML algorithms provide promising results for labelled data. Thus, there is a need for a hybrid model that can exploit knowledge from past datasets while utilising expert judgement and removing the uncertainty of expert recall. To address these aspects, the current research merges MCDM (for handling uncertainty) and an ML algorithm (for handling imprecision) into a robust hybrid method. The rankings of the projects have been calculated using FAHP, and these weights are then used to modify the kernel. The modified kernel is utilised by the LSSVM algorithm for prediction. In this work the RBF kernel has been applied, although the set of weights generated by FAHP can equally be incorporated into other kernel functions.

3.2 Support vector machine (SVM)

SVM is a discriminative classifier formally defined by a separating hyperplane, as presented in Fig. 1. Given labelled training data (supervised learning), the algorithm outputs an optimal hyperplane that categorises new examples. This hyperplane gives the largest minimum distance to the training examples; that is, the optimal separating hyperplane maximises the margin of the training data.

Fig. 1 SVM as a binary classifier

Fig. 2 Conversion of nonlinear data to linear in n-dimensional space

SVM is capable of handling nonlinear classification by mapping the data using the kernel trick. The input data are mapped to an n-dimensional feature space using a function \(\phi \), and a linear model is then applied in that space, as shown in Fig. 2. Equation (10) describes the resulting decision function.

$$\begin{aligned} y= f(x) = \mathrm{sign}[w^T\phi (x)+ b] \end{aligned}$$
(10)

Here, w is the weight vector in the input space, \(\phi \) is a nonlinear function providing the mapping from the input space to a higher (possibly infinite)-dimensional feature space, and b is a bias parameter. For data classification, it is assumed that \(y_k[w^T \phi (x_k)+ b] \ge 1,\ k= 1,\ldots ,N\). The classifier is determined by solving the minimisation of Eq. (11), subject to \(y_k[w^T \phi (x_k)+ b] \ge 1-\epsilon _k\), where \(\epsilon _k \ge 0,\ k= 1,\ldots ,N\).

$$\begin{aligned} \min \, J_p(w,\epsilon ) = \frac{1}{2}w^Tw + c\sum _{k=1}^{N}\epsilon _k \end{aligned}$$
(11)

Based on this, the Lagrangian of the problem is presented in Eq. (12). Its dual is a quadratic maximisation problem and hence requires the application of quadratic programming methods; it is given in Eq. (13), subject to \(\sum _{k=1}^{N}\alpha _ky_k = 0\) and \(0 \le \alpha _k \le c,\ k =1,\ldots ,N\).

$$\begin{aligned} L(w, b, \epsilon , \alpha , v)= & {} J_p(w, \epsilon ) - \sum _{k=1}^{N}\alpha _k\left\{ y_k[w^T\phi (x_k)+b] - 1 + \epsilon _k\right\} \nonumber \\&- \sum _{k=1}^{N}v_k\epsilon _k \end{aligned}$$
(12)
$$\begin{aligned} \max \,Q(\alpha )= & {} -\frac{1}{2}\sum _{k,l=1}^{N}y_ky_lK(x_k,x_l)\alpha _k\alpha _l + \sum _{k=1}^{N}\alpha _k \end{aligned}$$
(13)

Here, the kernel function \(K(x_k,x_l)\) acts as a dot product mapping the input data into a higher-dimensional space, given as \(K(x_k,x_l)= \phi (x_k)^T \phi (x_l)\). A detailed explanation of SVM is given by Vapnik (2013).

3.3 Least squares support vector machines (LSSVMs)

LSSVMs are least squares versions of SVMs: a set of related supervised learning methods that analyse data and recognise patterns and are used for classification and regression analysis. Rather than solving the quadratic program of the standard SVM, LSSVM adopts equality constraints and a linear Karush–Kuhn–Tucker system, giving it more powerful computational ability for nonlinear and small-sample problems (Suykens and Vandewalle 1999). However, the modelling accuracy of a single LSSVM is influenced not only by the input data source but also by its kernel function and regularisation parameters. LSSVM is a class of kernel-based learning methods and has been used effectively for estimation and nonlinear classification problems. Van Gestel et al. (2004) stated that the standard SVM and LSSVM perform consistently when their hyperparameters are tuned in combination. The advantages of LSSVM are (1) a lower computational burden for the constrained optimisation and (2) better behaviour on higher-dimensional data.


In this paper, LSSVM has been utilised to predict the effort of software projects while learning from past available project data. The function \(f(x_k)\) is determined by solving the minimisation of Eq. (14), subject to \(y_k[w^T \phi (x_k)+ b] = 1-e_k,\ k= 1,\ldots ,N\).

$$\begin{aligned} \min \,J_p(w,e) = \frac{1}{2}w^Tw + \gamma \frac{1}{2}\sum _{k=1}^{N}e_k^2 \end{aligned}$$
(14)

A trade-off is made between minimisation of the squared error, \(e_k^2\), and minimisation of the dot product of the weight vectors, \(w^Tw\), through an optimisation parameter \(\gamma \). Based on this, the Lagrangian of the problem is presented in Eq. (15), where the \(\alpha _k\) are Lagrange multipliers.

$$\begin{aligned} L(w,b,e,\alpha ) = \frac{1}{2}w^Tw+\gamma \frac{1}{2}\sum _{k=1}^{N}e_k^2 -\sum _{k=1}^{N} \alpha _k\left\{ y_k[w^T\phi (x_k)+b]-1+e_k\right\} \end{aligned}$$
(15)
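The least squares formulation above leads to a linear KKT system rather than a quadratic program. A minimal sketch of LSSVM regression with an RBF kernel, under that standard formulation (Suykens and Vandewalle 1999), might look like the following; the function names are ours.

```python
import numpy as np

def rbf(a, b, sigma2):
    """RBF kernel exp(-||a - b||^2 / sigma^2)."""
    return np.exp(-np.sum((a - b) ** 2) / sigma2)

def lssvm_train(x, y, gamma, sigma2):
    """Solve the LSSVM KKT system  [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(x)
    omega = np.array([[rbf(x[k], x[l], sigma2) for l in range(n)]
                      for k in range(n)])
    top = np.concatenate(([0.0], np.ones(n)))
    body = np.hstack((np.ones((n, 1)), omega + np.eye(n) / gamma))
    A = np.vstack((top, body))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                       # bias b, multipliers alpha

def lssvm_predict(x_new, x, b, alpha, sigma2):
    """y(x) = b + sum_k alpha_k K(x, x_k)."""
    return b + sum(a * rbf(x_new, xk, sigma2) for a, xk in zip(alpha, x))
```

With a large \(\gamma \) the model nearly interpolates the training data, since the allowed errors \(e_k\) are penalised heavily.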

The weighted kernel function is defined as \( K(\theta x_k,\theta x_l) \), where \( \theta \) is a weight vector over the dataset features. This weighted kernel makes the LSSVM a weighted LSSVM, whose classification model is represented in Eq. (16).

$$\begin{aligned} y(x)= \mathrm{sign}{\left[ \sum _{k=1}^{N} \alpha _k y_k K(\theta x,\theta x_k)+b \right] } \end{aligned}$$
(16)

At this point, a kernel function is applied that computes the dot product in the higher-dimensional feature space using the original attribute set. Some common kernel functions are listed below:

  • Linear: \( K(x_k,x_l)= {x_l}^Tx_k+c \)

  • Polynomial: \( K(x_k,x_l)= (\gamma {x_l}^Tx_k+r)^d,\ \gamma > 0 \)

  • Multilayer perceptron: \(K(x_k,x_l) = \tanh (\gamma x_l^Tx_k + r)\)

  • Radial basis function: \(K(x_k,x_l)= \exp (-\frac{{\vert \vert x_k-x_l \vert \vert }^2}{\sigma ^2})\)

Here \(\gamma \), \(\sigma \), r and d are kernel parameters. Kernel-based estimation techniques, such as SVM and LSSVM, have been shown to be powerful nonlinear classification and regression methods. In this study, the RBF kernel has been used since it has previously been found to be a good choice for LSSVM (Van Gestel et al. 2004).
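For illustration, the four kernels listed above can be written directly; this is a sketch, and the parameter defaults are ours (\(\sigma^2\) is passed as `sigma2`).

```python
import numpy as np

def linear(xk, xl, c=0.0):
    """Linear kernel x_l^T x_k + c."""
    return xl @ xk + c

def polynomial(xk, xl, gamma=1.0, r=1.0, d=2):
    """Polynomial kernel (gamma * x_l^T x_k + r)^d, gamma > 0."""
    return (gamma * (xl @ xk) + r) ** d

def mlp(xk, xl, gamma=1.0, r=0.0):
    """Multilayer perceptron kernel tanh(gamma * x_l^T x_k + r)."""
    return np.tanh(gamma * (xl @ xk) + r)

def rbf(xk, xl, sigma2=1.0):
    """RBF kernel exp(-||x_k - x_l||^2 / sigma^2)."""
    return np.exp(-np.sum((xk - xl) ** 2) / sigma2)
```

Note that the RBF kernel of identical inputs is always 1, its maximum value.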

4 Proposed model

A hybrid model has been developed by combining the MCDM approach FAHP with the ML approach LSSVM. FAHP has been used to generate feature weights, and a weighted kernel (\(W_k\)) assimilates the generated weights into the LSSVM. Algorithm 2 describes the steps followed to develop the model.

In this model, the chosen criteria are effort and lines of code (LOC), and the alternatives are the projects. The first step is a pairwise comparison of the projects on the effort and LOC criteria. Weights are generated for the projects using the FAHP methodology discussed in Sect. 3.1.1 and the fuzzy linguistic scale described in Table 1. The weight vector generated by FAHP is used as input to the LSSVM model: it modifies the kernel function as in Eq. (17), as discussed by Xing et al. (2009).

$$\begin{aligned} K(x_k, x_l)= \exp \frac{- \vert \vert S{(x_k-x_l)\vert \vert }^{2}}{\sigma ^{2}} \end{aligned}$$
(17)

In this equation, \({\vert \vert {(x_k-x_l)\vert \vert }^{2}} \) is the squared Euclidean distance between the two feature vectors and S = diag[\(\theta _1,\theta _2,\ldots ,\theta _n\)], where \(\theta _1,\theta _2,\ldots ,\theta _n\) are the weights generated by FAHP. As discussed by Guo et al. (2008) and Xing et al. (2009), the feature weights are subject to the conditions \(0 \le \theta _k \le 1\), \(k = 1,\ldots ,n\), and \(\sum _{k=1}^{n}\theta _k=1\).
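A minimal sketch of the weighted RBF kernel of Eq. (17) follows, with `theta` standing for the FAHP-generated feature weights (illustrative name, not from the paper).

```python
import numpy as np

def weighted_rbf(xk, xl, theta, sigma2):
    """Weighted RBF kernel: exp(-||S(x_k - x_l)||^2 / sigma^2), S = diag(theta).

    theta: per-feature weights with 0 <= theta_k <= 1 and sum(theta) == 1.
    """
    s = np.asarray(theta, dtype=float)
    diff = s * (np.asarray(xk, dtype=float) - np.asarray(xl, dtype=float))
    return np.exp(-np.sum(diff ** 2) / sigma2)
```

The diagonal matrix S simply rescales each feature difference by its weight before the distance is taken, so highly weighted features dominate the kernel similarity.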

Table 3 Description of attributes of COCOMO dataset
Table 4 Statistical analysis of COCOMO dataset
Table 5 Comparison matrix of COCOMO dataset with respect to effort criterion
Table 6 Comparison matrix of COCOMO dataset with respect to LOC criterion
Table 7 Normalised weights
Fig. 3 Hyperplane created by using cross-validation and tuning parameters \(\gamma = 1.4221\) and \(\sigma ^2=2.399\)

Further, the LOC and effort values of the projects have been used to train the LSSVM model. An optimal combination of the parameters (\(\gamma \), \(\sigma \)) has been sought, where \(\gamma \) denotes the relative weight of the allowed errors \(e_k\) during the training phase and \( \sigma \) is a kernel parameter. They have been tuned with respect to the fitness value of the problem; in this case the fitness value to be optimised is MMRE (Eq. 19), which should be minimal for accurate predictions. For tuning the parameters, n-fold cross-validation has been used. After tuning, the trained model has been provided LOC as input and the effort has been calculated for both methods, viz. the RBF kernel-based LSSVM (RBF-LSSVM) of Eq. (18) and the FAHP-modified RBF kernel-based LSSVM (FAHP-RBF-LSSVM) of Eq. (17).

$$\begin{aligned} K(x_k,x_l)= \exp \left( -\frac{{\vert \vert x_k-x_l \vert \vert }^2}{\sigma ^2}\right) \end{aligned}$$
(18)

5 Empirical validation

For empirical validation of the algorithmic models, datasets from a public repository and a private vendor's project data have been used. The COCOMO, Kemerer and NASA datasets are available in the public domain (Menzies et al. 2012). The private vendor dataset has been taken from Srivastava et al. (2012).

5.1 Performance measures

The performance of the different approaches has been evaluated using established performance measures: the mean magnitude of relative error (MMRE) and the root-mean-square error (RMSE), depicted in Eqs. (19) and (20), respectively. The actual effort is taken from the dataset, and the predicted effort is calculated using the proposed technique. These measures are widely used by the research community.

$$\begin{aligned} \text {MMRE}= & {} \frac{1}{N} \sum \limits _{{i}{ = } {1}}^{N} {\frac{\vert \text {Actual effort} - \text {Predicted effort}\vert }{\text {Actual effort}}} \end{aligned}$$
(19)
$$\begin{aligned} \text {RMSE}= & {} \sqrt{\frac{1}{N}\sum \limits _{i = 1}^N {{{(\text {Actual effort} - \text {Predicted effort})}^2}} }\nonumber \\ \end{aligned}$$
(20)
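Both measures are straightforward to compute; a minimal sketch follows (note that MMRE takes the magnitude, i.e. absolute value, of the relative error).

```python
import numpy as np

def mmre(actual, predicted):
    """Mean magnitude of relative error, Eq. (19)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(actual - predicted) / actual)

def rmse(actual, predicted):
    """Root-mean-square error, Eq. (20)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))
```

Lower values of both measures indicate more accurate predictions, which is why MMRE is used as the fitness value when tuning \(\gamma \) and \(\sigma \).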

5.2 COCOMO dataset

The COCOMO dataset is the most commonly used dataset for evaluating the performance of proposed techniques. It consists of 63 software projects, each characterised by 17 cost drivers (independent features).

Table 3 presents the complete description of the effort multipliers. The effort value (dependent feature) is measured in person-months and depends upon LOC and 15 effort multipliers. Each multiplier receives a rating on a six-point scale ranging from “very low” to “extra high” (in importance or value).

The statistical analysis of the COCOMO dataset is depicted in Table 4. From it, it is evident that the LOC values of the software projects are dispersed over a range from 1.98 to 150. Similarly, the effort values span a wide range, from 5.9 to 11400. These data thus represent the development of diverse projects.

The experts opined that equal importance should be given to the two criteria, viz. effort and LOC. Thus, based on the fuzzy scale presented in Table 1, the comparison matrices have been constructed as given in Tables 5 and 6. Data of 10 projects from the 63-project COCOMO dataset have been used to illustrate the methodology, since the matrix would have become unwieldy if all projects had been included in the pairwise comparisons. The comparison matrix has been created by taking effort and LOC as the criteria and the projects as alternatives; the other parameters of the dataset have been held constant and not used in the analysis. The projects have been relatively ranked based on their effort and LOC values. In Table 5, the first row represents the relative ranking of projects \(P_1\) to \(P_{10}\) with reference to project \(P_1\); relative ranks for all other projects have been generated in the same way by comparative judgements with effort as the criterion. The same methodology has been used to construct the comparison matrix depicted in Table 6. The CR values of the matrices based on effort and LOC are 0.088 and 0.049, respectively; both are less than 0.1, and hence the matrices are considered consistent for weight calculation. The ranks generated have been used for computing the weights following the FAHP methodology discussed in Sect. 3.1.1. The normalised weights thus calculated are shown in Table 7.

Table 8 Effort comparison for RBF-LSSVM and FAHP-RBF-LSSVM using COCOMO dataset
Fig. 4 Empirical validation of hybrid model using COCOMO dataset. a Effort comparison. b MMRE comparison. c RMSE comparison

The normalised weights thus obtained have been used to modify the RBF kernel as given in Eq. (17), and the modified kernel has been used in the LSSVM model to generate the effort values. As a numerical example, the effort has been computed for a LOC value of 30. For this purpose, the LOC and effort values of the remaining projects have been provided as training data, and the normalised weights obtained earlier have modified the kernel function as represented in Eq. (17). Then a LOC value of 30 has been provided as input, and the values of \(\gamma \) and \(\sigma \) have been tuned to minimise MMRE, yielding \(\gamma = 1.4221\) and \(\sigma ^2 = 2.399\). The effort values computed for a LOC of 30 are 329.83 and 396.74 for RBF-LSSVM and FAHP-RBF-LSSVM, respectively; this is presented in the form of a hyperplane in Fig. 3. The effort values for the other projects have been computed in the same manner. The effort values obtained with the modified and unmodified RBF kernels are presented in Table 8 and Fig. 4a.

Table 9 depicts the MMRE and RMSE values of RBF-LSSVM and FAHP-RBF-LSSVM compared with BCO. The MMRE and RMSE values for FAHP-RBF-LSSVM are 0.57 and 569.43, respectively, showing that FAHP-RBF-LSSVM outperforms both RBF-LSSVM and BCO. The MMRE and RMSE comparisons are also depicted in Fig. 4b, c.

Table 9 MMRE and RMSE comparison of BCO, RBF-LSSVM and FAHP-RBF-LSSVM using COCOMO dataset
Table 10 Statistical analysis of NASA dataset
Table 11 Comparison matrix of NASA dataset with respect to effort criterion
Table 12 Comparison matrix of NASA dataset with respect to LOC criterion

5.3 NASA dataset

The NASA dataset is composed of 4 attributes, of which 2 are independent, viz. methodology (ME) and developed lines (DL). DL depicts the program size, including both new source code and code reused from other projects, and is given in KLOC (thousands of lines of code). The ME attribute relates to the methodology used in the development of each software project. The dataset has 1 dependent attribute, effort, given in man-months. The statistical analysis of the NASA dataset is presented in Table 10: the DL attribute ranges from 2.1 to 100.8, and the effort attribute spans the range from 5 to 138.3. This dataset is less diverse than the COCOMO dataset.

Table 13 Effort comparison of RBF-LSSVM and FAHP-RBF-LSSVM using NASA dataset
Table 14 MMRE and RMSE comparison of BCO, RBF-LSSVM and FAHP-RBF-LSSVM using NASA dataset

The projects of the NASA dataset are compared on the basis of the effort and LOC values given in the dataset, using the linguistic scale given in Table 1. The comparison matrix based on effort values is depicted in Table 11. The CR value of this matrix is 0.097, which is less than 0.1; thus, the matrix is considered consistent. The comparison matrix based on LOC values is depicted in Table 12. The CR value for this matrix is also 0.097, justifying its consistency.
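The consistency check applied to these matrices follows Saaty's standard procedure: CI = (\(\lambda _{max}\) − n)/(n − 1) and CR = CI/RI, where RI is the random index for a matrix of order n. A minimal sketch (the RI table and the 0.1 threshold are the standard AHP values; valid for n ≥ 3):

```python
import numpy as np

# Saaty's random index values for matrix orders 1..10 (standard AHP table)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(M):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    M = np.asarray(M, float)
    n = M.shape[0]
    # principal eigenvalue of the pairwise comparison matrix
    lam_max = max(np.linalg.eigvals(M).real)
    ci = (lam_max - n) / (n - 1)
    return ci / RI[n]
```

A matrix with CR below 0.1 is accepted as consistent; otherwise the pairwise judgements would need to be revised.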

Fig. 5 Empirical validation of hybrid model using NASA dataset. a Effort comparison. b MMRE comparison. c RMSE comparison

The effort values have been computed using the RBF kernel function and the RBF kernel function modified by the weight vector generated using FAHP. The comparison of the values obtained using RBF-LSSVM and FAHP-RBF-LSSVM is presented in Table 13. Table 14 depicts the MMRE and RMSE values for the NASA dataset. The MMRE and RMSE values for FAHP-RBF-LSSVM are 0.19 and 5.99, respectively, which are lower than those of RBF-LSSVM and the BCO technique. Thus, the RBF kernel weighted using FAHP clearly performs better than the other methods. Figure 5a–c presents the comparison of effort, MMRE and RMSE in pictorial form.

5.4 Kemerer dataset

The Kemerer dataset consists of 15 software projects characterised by 6 independent attributes and 1 dependent attribute. Table 15 presents the description of the attributes. The independent attributes include 2 categorical and 4 numerical features. The categorical attributes are “Language” and “Hardware”, and the numerical attributes are “Duration”, “KSLOC”, “AdjFP” and “RawFP”. Effort (the dependent attribute) is measured in man-months.

The statistical analysis of the Kemerer dataset is depicted in Table 16. The numerical attributes have been analysed statistically: KSLOC varies from 39 to 450, and the effort value spans the range from 23.2 to 1107.31. This dataset contains less diverse projects compared to the previously discussed datasets.

Table 15 Description of attributes of Kemerer dataset
Table 16 Statistical analysis of Kemerer dataset

The comparison matrices have been obtained from the effort and LOC values of the 15 projects of the Kemerer dataset. The matrices based upon effort and LOC values are presented in Tables 17 and 18, respectively. The CR values for the matrices based upon effort and LOC are 0.096 and 0.067, respectively. Since both values are less than 0.1, the matrices are considered consistent. The effort values computed using FAHP-RBF-LSSVM and RBF-LSSVM are presented in Table 19.
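Once a comparison matrix passes the consistency check, crisp weights are derived from its fuzzy judgements. The paper's exact FAHP aggregation is defined earlier in the text; the sketch below uses Buckley's geometric-mean method over triangular fuzzy numbers (l, m, u) as one common FAHP variant, with hypothetical function names.

```python
import numpy as np

def fahp_weights(tfn_matrix):
    """Normalised crisp weights from a triangular-fuzzy pairwise
    comparison matrix via Buckley's geometric-mean method (a common
    FAHP variant; the paper's exact aggregation may differ).
    tfn_matrix[i][j] is a triangular fuzzy number (l, m, u)."""
    A = np.asarray(tfn_matrix, float)          # shape (n, n, 3)
    n = A.shape[0]
    # fuzzy geometric mean of each row, component-wise
    g = A.prod(axis=1) ** (1.0 / n)            # shape (n, 3)
    # fuzzy weights: divide by the column-sum with l and u swapped,
    # since inverting a TFN reverses its lower and upper bounds
    total = g.sum(axis=0)
    fw = g / total[::-1]
    # defuzzify by centre of gravity and renormalise
    crisp = fw.mean(axis=1)
    return crisp / crisp.sum()
```

The resulting vector plays the role of the FAHP rank weights that are assimilated into the weighted kernel.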

The comparison of MMRE and RMSE values for RBF-LSSVM and FAHP-RBF-LSSVM is presented in Table 20 and Fig. 6a–c. The results clearly reveal that incorporating the weight vector generated by FAHP into RBF-LSSVM has resulted in better effort estimates. The MMRE and RMSE values obtained for FAHP-RBF-LSSVM are 0.31 and 149.5, respectively, showing its dominance over the other methods.

5.5 Interactive voice response (IVR) dataset

The IVR dataset has been collected through a survey conducted in a software company that develops IVR applications, with questionnaires administered directly to the company's project managers and senior software development professionals (Srivastava et al. 2012). It consists of 4 attributes, viz. project no., KLOC, actual effort and actual time.

The statistical analysis of IVR dataset is presented in Table 21. From Table 21, it is evident that the values of the attributes are not spread over a wide range. This reveals that projects are of similar category.

The comparison matrix of the IVR dataset has been obtained by comparing the effort values of 10 projects from the dataset, as a comparison matrix of all 48 projects would have been overly complex. The alternatives have been relatively ranked based on multiple criteria, viz. LOC and effort. Based on the dataset, the expert has given equal importance to LOC and effort for the relative ranking. The comparison matrices thus created are depicted in Tables 22 and 23. The comparison matrices based upon effort and LOC have CR values of 0.039 and 0.045, respectively, both less than 0.1. Hence, the comparison matrices are considered consistent for further calculations. The effort values computed using RBF-LSSVM and FAHP-RBF-LSSVM are shown in Table 25.

Table 17 Comparison matrix of Kemerer dataset with respect to effort criterion
Table 18 Comparison matrix of Kemerer dataset with respect to LOC criterion

Figure 7a–c depicts the comparison of effort, MMRE and RMSE values of RBF-LSSVM, FAHP-RBF-LSSVM and BCO. Table 24 reveals MMRE and RMSE values of 0.07 and 5.23 for FAHP-RBF-LSSVM, which are less than the values for RBF-LSSVM and BCO, showing the dominance of FAHP-RBF-LSSVM.

6 Conclusion and future scope

It has been identified that SEE acts as a base point for many project management activities, including planning, budgeting and scheduling. Thus, it is crucial to obtain accurate estimates. Researchers have proposed numerous estimation methods since the inception of SE as a research area. Further, it has been witnessed that the SEE process depends on multiple intrinsic and extrinsic factors. Despite extensive research, the community has been unable to develop and accept a single model that can be applied in diverse environments and that can handle multiple environmental factors.

Table 19 Effort comparison for RBF-LSSVM and FAHP-RBF-LSSVM using Kemerer dataset
Table 20 MMRE and RMSE comparison of BCO, RBF-LSSVM and FAHP-RBF-LSSVM using Kemerer dataset
Table 21 Statistical analysis of IVR dataset

Therefore, MCDM has been utilised for the process of SEE. FAHP, an extension of AHP, has been proposed in the current research for the purpose. It has been identified that FAHP can handle subjectivity and uncertainty using fuzzy numbers and CR.

Fig. 6 Empirical validation of hybrid model using Kemerer dataset. a Effort comparison. b MMRE comparison. c RMSE comparison

Table 22 Comparison matrix of IVR dataset with respect to effort criterion
Table 23 Comparison matrix of IVR dataset with respect to LOC criterion
Table 24 MMRE and RMSE comparison for BCO, RBF-LSSVM and FAHP-RBF-LSSVM using IVR dataset
Fig. 7 Empirical validation of hybrid model using IVR dataset. a Effort comparison. b MMRE comparison. c RMSE comparison

Table 25 Effort comparison for RBF-LSSVM and FAHP-RBF-LSSVM using IVR dataset

Further, to provide a robust method, a hybrid model has been developed that combines an MCDM approach (for handling uncertainty) with an ML algorithm (for handling imprecision) to predict effort more accurately. The hybrid model amalgamates the MCDM approach (FAHP) and the ML approach (LSSVM). FAHP has been used to generate feature weights, calculated using effort and LOC as the criteria, with the projects considered as alternatives. The ranks generated by FAHP have been utilised to modify the kernel: a weighted kernel (\(W_k\)) has been used to assimilate the generated weights into the LSSVM. The performance of the proposed model has been compared with BCO. The combination of feature ranking by FAHP and the weighted kernel has resulted in more accurate effort estimates.

Future work may focus on the development of hybrid approaches in which weights generated from other MCDM techniques are amalgamated into ML techniques. Also, prediction accuracy can be analysed by applying other kernel-based functions modified by incorporating the weights generated by MCDM approaches.