
1 Introduction

Service-Oriented Architecture (SOA) is a layered architectural style that helps organizations share information and logic between different applications and usage modes. A well-designed SOA solution leads to loosely coupled services that provide the agility needed to align IT with the business. A wide variety of technologies, notably OSGi, SCA, and web services, are used to implement SOA. Various service-based systems (SBSs), ranging from business frameworks to cloud-based frameworks, are built using SOA. Evolving customer requirements force SBSs to change to meet new user needs. This evolution may cause the design and quality of service-based systems to deteriorate, resulting in common but ineffective solutions to recurring problems, referred to as anti-patterns [4]. Anti-patterns are design structures that indicate violations of fundamental design principles and negatively affect design quality [4]. They are not accidental but rather common mistakes, and are almost always introduced with good intentions. Anti-patterns make the development and maintenance of software systems more challenging; at the same time, they help in identifying problems in the design, the code, and the management of software projects. In this paper, we develop models for the detection of four anti-patterns, namely: AP1: Chatty Web Service (CWS); AP2: Fine-Grained Web Service (FGWS); AP3: Data Web Service (DWS); AP4: God Web Service (GWS).

The primary motivation of the work presented in this paper is to investigate the use of ensemble learning techniques for the detection of web-service anti-patterns. The work is driven by the need to build techniques and tools that detect anti-patterns in web services automatically.

2 Related Work

Moha et al. [6] introduced a novel framework for the specification and detection of anti-patterns in service-based systems, detecting anti-patterns such as Tiny Service and Multi Service with a precision above 0.9. Ouni et al. [8] introduced a genetic-programming approach that detects web service anti-patterns by generating detection rules based on threshold values and combinations of different metrics; the approach was validated on 310 web services for five anti-patterns. Dimitrios et al. [12] used the Protege platform, a web-based environment, to facilitate collaborative ontology editing, allowing multiple users to edit and enrich the anti-pattern ontology simultaneously. Palma et al. [9] used a rule-based search to detect BP anti-patterns in Business Process Execution Language (BPEL) processes generated by orchestrating web services. Coscia et al. [7] performed a statistical correlation analysis between WSDL-level service metrics and a number of traditional OO metrics and found a correlation between them. SODA-W, an extension of the SOFA framework, detects SOAP and REST anti-patterns using a pre-established DSL. Upadhyaya et al. [15] proposed an approach to detect nine SOA patterns. Dimitrios et al. [13] proposed SPARSE, a novel OWL ontology-based knowledge system that helps in detecting anti-patterns; the ontology documents each anti-pattern and describes its relationship with other anti-patterns through causes, symptoms, and consequences. Jaffar et al. [3] argued that classes taking part in anti-patterns and design patterns have dependencies with other classes, i.e., unvarying and mutating dependencies, which may propagate problems to those classes.
They empirically investigated the consequences of dependencies in object-oriented systems by analysing the relationship between the presence of co-change and static dependencies and the change-proneness, fault-proneness, and fault types that the classes exhibit. Kumar et al. [5] proposed an approach for the automatic detection of anti-patterns by static analysis of the source code, showing that aggregate values of source code metrics computed at the web-service level can be used as predictors for anti-pattern detection. Saluja et al. [11] proposed a new optimized algorithm that uses dynamic execution metrics in addition to static metrics; the results are further optimized using genetic algorithms, outperform existing methods, and achieve a recall of approximately 0.9. It is observed from the literature reviewed here that research on SOA anti-pattern detection still needs to be explored thoroughly.

3 Dataset

A dataset of 226 publicly available web services shared by Ouni et al. on GitHub (Footnote 1) is used for the experiments in this paper. Figure 1 shows the distribution of web services in which each anti-pattern exists (\(\#\)AP) and does not exist (\(\#\)NAP).

Fig. 1. Distribution of anti-patterns in web services

4 Proposed Solution Framework

Figure 2 shows a detailed overview of the proposed framework. The framework is a multi-step procedure consisting of: computing CK metrics from the WSDL files, applying aggregation measures to lift the file-level metrics to the system level, handling the class imbalance problem using different variants of SMOTE (Sect. 4.2), removing irrelevant features using techniques such as PCA and RSA (Sect. 4.3), and finally developing anti-pattern prediction models using five different ensemble learning techniques. First, Java files are extracted from each WSDL file, and the CK metrics listed in Fig. 3 are computed for them using the CKJM tool. To convert the metrics computed at the file level to the system level, the aggregation measures listed in Fig. 3 are applied; this forms the dataset from which the anti-pattern prediction models are developed. Next, we use different variants of the SMOTE technique to handle the class imbalance problem and compare models trained on balanced data with models developed on the original data. We then select features using three different techniques, namely significant features from the rank-sum test, Rough Set Analysis (RSA), and Principal Component Analysis (PCA). Finally, five ensemble learning techniques, namely the Bagging Classifier (EST1), Random Forest Classifier (EST2), Extra Trees Classifier (EST3), AdaBoost Classifier (EST4), and Gradient Boosting Classifier (EST5), are used to generate models for the prediction of web service anti-patterns. We use performance measures such as AUC, F-measure, and accuracy to compute and compare the performance of the generated models.
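The modelling stages of this framework can be sketched with scikit-learn. This is a minimal illustration only: the WSDL-to-metrics extraction and the SMOTE balancing step are omitted, a synthetic imbalanced dataset stands in for the real metric data, and all parameter choices are ours, not the paper's settings.

```python
# Sketch of the prediction stage: feature reduction (PCA) followed by an
# ensemble classifier (Random Forest, i.e. EST2). A synthetic imbalanced
# dataset stands in for the aggregated system-level CK metrics.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Stand-in for the 226-service metric dataset (imbalanced classes).
X, y = make_classification(n_samples=226, n_features=30,
                           weights=[0.8, 0.2], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

model = Pipeline([
    ("pca", PCA(n_components=10)),                      # feature extraction
    ("clf", RandomForestClassifier(random_state=42)),   # EST2
])
model.fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
print("AUC:", roc_auc_score(y_te, proba))
```

In the actual framework, the balancing step would sit between the metric dataset and this pipeline, and each of the five ensemble learners would replace the final estimator in turn.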

Fig. 2. Proposed framework

4.1 Preprocessing of the Dataset

Preprocessing of the dataset involves the extraction of Java files from the WSDL files (raw data), from which the source code metrics are computed. The dataset considered is a collection of 226 WSDL files. A step-wise procedure for preprocessing the data is detailed here.

  • Step-1: Source code metrics computation:

    We used the WSDL2Java tool to extract Java files from each WSDL file; the Chidamber and Kemerer (CK) metrics, along with other Java metrics, are computed for each Java file using the CKJM extended tool (Footnote 2). The CK metrics used in this paper are listed in Fig. 3. The definition of each CK metric, along with its computation formula, is documented in [2].

  • Step-2: Aggregation measures on the source code metrics:

    In this study, our objective is to develop one model for predicting the anti-pattern present in a WSDL file. We use the CK metrics to measure each Java file, so the metrics computed here are at the file level. A total of sixteen aggregation measures are then applied to the file-level metrics to obtain metrics at the system level. The aggregation measures used in this paper are listed in Fig. 3.
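As an illustration of the aggregation step, the sketch below lifts one file-level CK metric to a handful of the system-level measures (min, max, mean, quartiles, skewness, Gini and Hoover indices). The Gini and Hoover implementations follow the standard inequality-index definitions, which we assume match the paper's usage; the WMC values are invented.

```python
# Illustrative aggregation of a file-level CK metric (here, hypothetical WMC
# values per Java file) into system-level features.
import numpy as np
from scipy.stats import skew

def gini_index(x):
    """Mean absolute difference over all pairs, normalised by 2 * mean."""
    x = np.asarray(x, dtype=float)
    mad = np.abs(x[:, None] - x[None, :]).mean()
    return mad / (2 * x.mean())

def hoover_index(x):
    """Share of the total that would have to move to equalise all values."""
    x = np.asarray(x, dtype=float)
    return 0.5 * np.abs(x - x.mean()).sum() / x.sum()

def aggregate(values):
    """A subset of the sixteen system-level aggregation measures."""
    v = np.asarray(values, dtype=float)
    return {
        "min": v.min(), "max": v.max(), "mean": v.mean(),
        "Q1": np.percentile(v, 25), "Q3": np.percentile(v, 75),
        "skewness": skew(v),
        "gini": gini_index(v), "hoover": hoover_index(v),
    }

wmc_per_file = [3, 7, 2, 12, 5, 9]   # invented WMC values for six Java files
features = aggregate(wmc_per_file)   # one row of the system-level dataset
```

Applying such measures to every CK metric yields the fixed-length feature vector that describes one web service, regardless of how many Java files it contains.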

Fig. 3. List of CKJM metrics and aggregation measures

4.2 Data Sampling Techniques

The selection of an appropriate sampling technique plays a critical role in this study, as it significantly impacts the quality of the results and findings. As discussed in Sect. 3, the dataset suffers from a class imbalance problem, which we address using the SMOTE data sampling technique and its variants [1]. We consider five data sampling techniques, namely SMOTE, Borderline-SMOTE (BSMOTE), SVM-SMOTE (SVMSMOTE), SMOTE-Edited Nearest Neighbour (SMOTEENN), and SMOTETOMEK, along with the original dataset (OD), to generate the predictive models.
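The core idea shared by SMOTE and the variants listed above can be sketched as follows: each synthetic minority sample is an interpolation between a minority point and one of its minority-class nearest neighbours. This is a simplified sketch, not the exact algorithm of any particular variant; real experiments would use a library implementation such as imbalanced-learn.

```python
# Minimal sketch of the core SMOTE idea: synthesise new minority-class points
# by interpolating between a minority sample and one of its k nearest
# minority neighbours.
import numpy as np

def smote_sketch(X_min, n_new, k=3, rng=None):
    """Generate n_new synthetic samples from the minority matrix X_min."""
    if rng is None:
        rng = np.random.default_rng(0)
    X_min = np.asarray(X_min, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # k nearest minority neighbours of sample i (excluding itself)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]
        j = rng.choice(neighbours)
        gap = rng.random()   # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
new_points = smote_sketch(minority, n_new=4)  # 4 synthetic minority samples
```

The variants differ in where they synthesise (e.g. near the class border for BSMOTE) or in the cleaning step applied afterwards (edited nearest neighbours for SMOTEENN, Tomek links for SMOTETOMEK).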

Table 1. Source code metrics selected using RSA: all anti-patterns

4.3 Effectiveness of Metrics

Three sets of features are considered in this study for model development. In each case, the occurrence of an anti-pattern is the dependent variable, and the computed source code metrics are the independent variables used to develop the relation.

  • Subset of Features Selected as Significant Features (SIGF): In our previous work, we applied a set of feature selection techniques to the original source code metrics to obtain the significant source code metrics (SM). These features are used as input to develop the predictive models for the prediction of the various anti-patterns [14].

    $$\begin{aligned} \textit{Anti-pattern} \text { predictability}= f(\text {Significant features}) \end{aligned}$$
    (1)
  • Subset of Features Selected Using RSA: To reduce the complexity of the developed model, it is important to remove irrelevant features. For this purpose, we use a feature reduction technique known as "Rough Set Analysis (RSA)" to obtain a reduced set of features. Here, anti-pattern predictability is defined as a function of the reduced set of metrics.

    $$\begin{aligned} \textit{Anti-pattern} \text { predictability}= f(\text {Reduced set of features}) \end{aligned}$$
    (2)

    RSA enables the developer to find the subset of the original source code metrics that is most informative, removing all irrelevant attributes with minimal information loss [10]. Table 1 shows the reduced significant feature set for all the anti-patterns considered in this study. For example, Hoover index (WMC), Min (DIT), Mean (DIT), Max (RFC), skewness (LCOM), skewness (Ca), Q1 (DAM), Gini index (DAM), and Gini index (IC) are the features selected using RSA for the GOWS anti-pattern.

  • Subset of Features Selected Using PCA: A feature extraction technique known as "Principal Component Analysis (PCA)" is used to develop a less complex model from a reduced set of features. PCA reduces the dimensionality of the data and thereby the computational complexity of the model. The primary idea of PCA is to reduce the dimensionality of a dataset consisting of many features correlated with one another, either strongly or weakly, while retaining the variation present in the dataset to the greatest possible extent. Table 2a lists the eigenvalue, \(\%\) variance, and \(\%\) cumulative variance for the principal components (PCs) selected for the GOWS anti-pattern. Similarly, 22, 21, and 21 PCs are selected for the FGWS, DWS, and CWS anti-patterns, respectively.

    $$\begin{aligned} \textit{Anti-pattern} \text { predictability}\,=\,f(\text {Extracted set of features}) \end{aligned}$$
    (3)

    This is done by transforming the features into a new set of features known as the principal components (PCs).
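The PCA step above can be sketched with scikit-learn on invented data. The eigenvalues, % variance, and cumulative % variance reported in Table 2a correspond to the quantities computed below; the 95% variance threshold used here is an illustrative assumption, not necessarily the paper's selection criterion.

```python
# Sketch of PCA-based feature extraction: fit PCA on the metric matrix and
# keep enough components to cover most of the variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
X = rng.normal(size=(226, 20))                             # invented metrics
X[:, 1] = 0.9 * X[:, 0] + rng.normal(scale=0.1, size=226)  # correlated pair

pca = PCA().fit(X)
eigenvalues = pca.explained_variance_           # "eigenvalue" column
pct_variance = pca.explained_variance_ratio_ * 100   # "% variance" column
cumulative = np.cumsum(pct_variance)            # "% cumulative" column

# Number of PCs needed to retain at least 95% of the variance (assumed cutoff).
n_pcs = int(np.searchsorted(cumulative, 95.0) + 1)
X_reduced = PCA(n_components=n_pcs).fit_transform(X)
```

`X_reduced` then replaces the original metric matrix as input to the prediction models.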

4.4 Classifier Techniques

In this paper, we apply five ensemble techniques to train the predictive models for the detection of web service anti-patterns: the Bagging classifier (EST1), Random Forest classifier (EST2), Extra Trees classifier (EST3), AdaBoost classifier (EST4), and Gradient Boosting classifier (EST5).
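The five ensemble learners map directly onto scikit-learn estimators; the sketch below trains each of EST1-EST5 on an invented dataset with five-fold cross-validation. Hyperparameters are library defaults, which may differ from the paper's settings.

```python
# EST1-EST5 as scikit-learn classifiers, scored with 5-fold cross-validation
# on an invented dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=20, random_state=1)

estimators = {
    "EST1": BaggingClassifier(random_state=1),
    "EST2": RandomForestClassifier(random_state=1),
    "EST3": ExtraTreesClassifier(random_state=1),
    "EST4": AdaBoostClassifier(random_state=1),
    "EST5": GradientBoostingClassifier(random_state=1),
}
scores = {name: cross_val_score(est, X, y, cv=5, scoring="roc_auc").mean()
          for name, est in estimators.items()}
```

EST1-EST3 are bagging-style ensembles that average independent trees, while EST4 and EST5 are boosting methods that fit trees sequentially to the errors of their predecessors.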

Table 2. Subset of features selected using PCA for anti-patterns
Fig. 4. Confusion matrix of EST1

Table 3. Accuracy of all models

5 Experimental Results

In this work, five sampling techniques besides the original dataset (OD), features selected by three different feature selection techniques, and five ensemble techniques are applied to generate models for the detection of four web service anti-patterns. A total of 6 \(\times \) 3 \(\times \) 5 \(\times \) 4 = 360 predictive models are built for anti-pattern detection in this study. The predictive ability of these models is evaluated using accuracy and AUC. Table 3 shows the accuracy values for all the generated models. Figure 4 shows the confusion matrix obtained for the ensemble technique EST1, i.e., the Bagging classifier. From Table 3 and Fig. 4, we observe that:

  • The performance values of models trained on data balanced by the sampling techniques are better than those of models trained on the original data.

  • SMOTEENN shows the best performance, while the model developed using the original data (ORG) shows the worst performance.

  • Models trained using the features selected by PCA as input achieve better performance.

  • The model trained using EST3, i.e., the Extra Trees classifier, achieves a mean accuracy of 97.13, higher than the models trained using the other ensemble techniques.
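The reported measures are related through the confusion matrix; the sketch below shows, on invented predictions, how accuracy and F-measure can be recomputed from the matrix entries.

```python
# Reading accuracy and F-measure off a confusion matrix; the label vectors
# are invented, not the paper's data.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels {0, 1}.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
f_measure = 2 * tp / (2 * tp + fp + fn)

# The hand-computed values agree with scikit-learn's metric functions.
assert accuracy == accuracy_score(y_true, y_pred)
assert f_measure == f1_score(y_true, y_pred)
```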

6 Competitive Analysis

In this section, we compare the performance of the various generated models using box plots, descriptive statistics, and the Wilcoxon signed-rank test.

Fig. 5. Box-plot: data sampling techniques

6.1 Data Sampling Techniques

Figure 5 depicts the performance values, i.e., accuracy, AUC, and F-measure, of the models developed using the different variants of SMOTE as box-plot diagrams. Table 4 shows the descriptive statistics for the different SMOTE techniques used in this study. From Fig. 5 and Table 4, we infer that SMOTEENN shows the best performance, with mean, max, Q1, and Q3 AUC values of 0.989, 1.000, 0.986, and 0.999, respectively. The model developed using the original data (ORG) shows the worst performance, with an AUC value of 0.823. We also observe that models developed after applying any of the data sampling techniques perform better than the model developed using the original data, as the sampling techniques deal with the class imbalance problem.

Table 4. Descriptive statistics of data sampling techniques
Table 5. P-value: data sampling techniques

The Wilcoxon signed-rank test is used in this study to statistically compare the performance of the web service anti-pattern prediction models developed using the different variants of SMOTE-sampled data and the original data, i.e., to determine whether there is a significant difference between the performance of these models. The null hypothesis considered is: "The web service anti-pattern prediction models trained using different variants of SMOTE-sampled data are not significantly different." The null hypothesis is accepted at a comparison point if the p-value of the Wilcoxon signed-rank test is above the 0.05 significance level (reported as '1' in Table 5) and rejected otherwise (reported as '0'). Table 5 shows the p-values obtained for the models developed using all the data sampling techniques along with the original dataset. A close inspection of Table 5 shows that most of the comparison points have a value of '0', i.e., the null hypothesis is rejected. Hence, we conclude that there is a significant difference between the performance of the models generated using the different variants of SMOTE-sampled data and the original data.
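The statistical comparison can be sketched with SciPy's paired Wilcoxon signed-rank test; the two score vectors below are invented for illustration and do not reproduce the paper's values.

```python
# Paired Wilcoxon signed-rank test between two model families' scores.
# The accuracy vectors are invented; in the paper, each pair of sampling
# techniques is compared over the models built on them.
from scipy.stats import wilcoxon

acc_smoteenn = [0.97, 0.98, 0.99, 0.96, 0.955, 0.94, 0.93, 0.992]
acc_original = [0.82, 0.84, 0.80, 0.83, 0.81, 0.85, 0.86, 0.87]

stat, p_value = wilcoxon(acc_smoteenn, acc_original)
# Reject the null hypothesis of "no difference" when p < 0.05.
significantly_different = p_value < 0.05
```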

Fig. 6. Box-plot: feature selection techniques

6.2 Feature Selection Techniques

Figure 6 depicts the performance values, i.e., accuracy, AUC, and F-measure, of the models developed using the features selected by the different feature selection techniques as box-plot diagrams. Table 6 shows the descriptive statistics for the feature selection techniques used in this study. Table 6 shows that the mean performance of the model developed using the features selected by PCA is higher than that of the models developed using the features selected by the rank-sum test and RSA. From Fig. 6, we observe that the inter-quartile range of the AUC values for the model generated using PCA is comparatively small. This indicates that the performance parameters obtained over multiple executions with PCA show less variation than those of the other models.

Table 6. Descriptive statistics of feature selection techniques

The null hypothesis considered in this section is: "The performance of the anti-pattern prediction models developed using features selected by different feature selection techniques is not significantly different." The null hypothesis is accepted at a comparison point if the p-value of the Wilcoxon signed-rank test is above the 0.05 significance level (reported as '1' in Table 7) and rejected otherwise. Table 7 shows the p-values of the models developed using the various feature subsets as input. From Table 7, we observe that most of the comparison points have a value of '1', i.e., the null hypothesis is accepted. Therefore, there is no significant difference between the performance of the models built using the features selected by the three different feature selection techniques.

Table 7. P-value: feature selection

6.3 Classifier Techniques

Figure 7 shows the box-plot diagrams of the AUC, accuracy, and F-measure values of the classifier techniques. Table 8 shows the descriptive statistics for the models trained using the distinct ensemble techniques. From Table 8 and Fig. 7, we observe that the performance of the model trained using EST3, i.e., the Extra Trees classifier, is higher than that of the models trained using the other ensemble techniques. The model trained using EST3 shows good performance, with a mean accuracy of 97.13, a median accuracy of 97.18, and a minimum accuracy of 91.59.

Fig. 7. Box-plot: classifier techniques

Table 8. Statistical description for ensemble techniques

The null hypothesis considered here is: "The performance of the anti-pattern prediction models trained using the various ensemble techniques is not significantly different." The null hypothesis is accepted at a comparison point if the p-value of the Wilcoxon signed-rank test is above the 0.05 significance level (reported as '1' in Table 9) and rejected otherwise (reported as '0'). Table 9 shows the p-values of the models trained using the different ensemble techniques. From Table 9, we observe that most of the comparison points have a value of '0', i.e., the null hypothesis is rejected. Hence, we conclude that there is a significant difference between the performance of the models trained using the various ensemble techniques.

Table 9. P-value: ensemble techniques

7 Conclusion

We presented an empirical analysis of anti-pattern prediction models developed using data sampling, feature selection, and ensemble techniques. Five-fold cross-validation is used to validate the performance of the models, and three performance parameters, i.e., accuracy, F-measure, and AUC, are used to compare them. We observed that the performance of models trained on data balanced by the sampling techniques is better than that of models trained on the original data; the Wilcoxon signed-rank test confirmed that models trained on balanced data show a significant improvement in predicting anti-patterns. Models trained using the features selected by PCA perform better, although the Wilcoxon signed-rank test indicated no significant difference between the performance of the models built using the features selected by the three different feature selection techniques. Finally, the model trained using EST3, i.e., the Extra Trees classifier, shows good performance, with a mean accuracy of 97.13.