1 Introduction

Service Oriented Architecture (SOA) is emerging as the primary integration and architectural framework in today's complex and heterogeneous computing environment. Web services are the preferred standards-based approach to realizing SOA. Like object-oriented programs, web services suffer from design degradation and declining quality of service (QoS), which leads to poor solutions known as anti-patterns. An anti-pattern is a recurring code or design practice that leads to a bad outcome. Research has revealed that the presence of anti-patterns hinders the evolution and maintenance of a software system. Several web service anti-patterns have been cataloged over time [1]; in this paper, we consider the following four: God Object Web Service (GOWS, AP1), Fine-Grained Web Service (FGWS, AP2), Chatty Web Service (CWS, AP3), and Data Web Service (DWS, AP4).

The primary motivation of this study is to show that source code metrics, combined with a machine learning framework, play a pivotal role in the detection of web service anti-patterns. The secondary objective is to develop models for the automatic detection of web service anti-patterns with the best predictive capability. The Area Under the ROC Curve (AUC) and a hypothesis testing approach are used to compare the performance of different variants of the SMOTE data sampling technique, different feature selection techniques, and machine learning algorithms in the detection of SOA anti-patterns. In this work, we attempt to answer the following research questions:

RQ1: What is the impact of applying data sampling techniques on the performance of anti-pattern prediction models?

RQ2: Is there any significant difference between the performance of models built using subsets of features selected by different feature selection techniques?

RQ3: Does there exist a neural network model that outperforms all others?

2 Related Work

Palma et al. [2] proposed SODA-W, a framework for specifying and detecting 10 anti-patterns present in weather- and finance-related web services. The framework achieved an accuracy of 75% and a recall of 100%. Ouni et al. [3] used cooperative parallel evolutionary algorithms (P-EA), an automated approach to detecting anti-patterns; the idea behind their innovation is that several detection algorithms executing as parallel optimization processes would together give better results. Settas et al. [4] used the Protege platform, a web-based environment that facilitates collaborative ontology editing. Their model rectifies false and imprecise information in SPARSE (which uses an anti-pattern ontology as its knowledge base), an intelligent system that can detect anti-patterns in a software project. The statistical results confirm that the proposed technique outperforms other existing techniques.

3 Experimental Dataset

The dataset of 226 publicly available web services shared by Ouni et al. on GitHub (Footnote 1) is used for the experiments in this paper. The dataset is of high quality, as Ouni et al. [5], who shared it publicly on GitHub, manually validated the anti-patterns. The raw data in the dataset is in WSDL format. A close observation of the dataset reveals that the percentage of each anti-pattern present varies from 5.75% to 10.62%; e.g., GOWS exists in 21 out of the 226 WSDL files. Similarly, FGWS, DWS, and CWS are present in 13, 14, and 21 of the 226 WSDL files, respectively.

4 Research Framework

Figure 1 illustrates the methodology for anti-pattern prediction in web services. As discussed in Sect. 3, the dataset is a collection of web services from various domains in WSDL format. CKJM metrics are computed for each Java file (a WSDL file maps to multiple Java files) using the CKJM Extended tool. Aggregation measures are then applied to the file-level CKJM metrics to obtain system-level metrics, which form the dataset. After formulating the dataset, we apply different variants of SMOTE, i.e., SMOTE, BSMOTE, SVMSMOTE, SMOTEENN, and SMOTETOMEK, to address the class imbalance problem. We then apply two feature selection techniques, PCA and RSA, to select the significant features in the dataset. Further, we use the subsets of features selected by PCA and RSA, along with the essential metrics (SM) selected in our previous paper [6], to generate the models for predicting web service anti-patterns. In this paper, we use different variants of neural networks, along with an ensemble technique, to generate the models. Lastly, the performance of the models is evaluated using different evaluation metrics, and the impact of the various techniques used to generate the models is assessed based on the results of hypothesis testing.

Fig. 1. Research framework for web service anti-pattern prediction

5 Experimental Results

Artificial Neural Networks (ANNs) are known for their ability to learn and model non-linear and complex relationships. A neural network is a collection of interconnected nodes: input patterns are collected by the nodes in the input layer and mapped to the target variables in the output layer. In this work, we apply five variants of the neural network (NN) by changing the number of hidden layers (HL), i.e., NN with 1 HL (HL-1), 2 HL (HL-2), 3 HL (HL-3), 4 HL (HL-4), and 5 HL (HL-5). The feature matrices selected by the different feature selection techniques are taken as input to each of the models. In addition to these models, we use an ensemble technique for anti-pattern prediction: the outputs of the previous models are given as input to the ensemble. Five-fold cross-validation is applied to validate the results of the generated models. The models were trained using both the original (imbalanced) dataset and the balanced datasets obtained after applying the data sampling techniques. Table 1 depicts the results of the models generated for the prediction of the GOWS anti-pattern using five-fold cross-validation. Table 1 shows that the models developed using a neural network with 2 or 3 hidden layers have better predictive ability than the others. Similarly, the models trained on balanced data predict anti-patterns better than the models trained on the original data.
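The paper does not name a neural network library; as a hedged sketch, the five variants and the five-fold AUC evaluation can be reproduced with scikit-learn's MLPClassifier, where the layer width of 16 units is an illustrative assumption:

```python
# Sketch of the five NN variants (HL-1 .. HL-5) evaluated with
# five-fold cross-validated AUC; layer width (16) is an assumption.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Placeholder dataset mimicking the paper's size and class imbalance.
X, y = make_classification(n_samples=226, n_features=10,
                           weights=[0.91], random_state=0)

for k in range(1, 6):                       # HL-1 .. HL-5
    clf = MLPClassifier(hidden_layer_sizes=(16,) * k,
                        max_iter=2000, random_state=0)
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"HL-{k}: mean AUC = {auc:.3f}")
```

The same loop is simply re-run on each sampled/feature-selected variant of the data to fill a results table like Table 1.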

Table 1. Accuracy & AUC values for GOWS anti-pattern

6 Comparative Analysis

In this section, we discuss and analyze the results obtained by applying the various data sampling techniques, feature selection techniques, and machine learning classifiers to the considered dataset for anti-pattern prediction. The empirical analysis is carried out methodically by answering the research questions defined in Sect. 1.

RQ1: What is the impact of applying data sampling techniques on the performance of anti-pattern prediction models?

The impact of data sampling techniques is evaluated by analyzing the performance measures (AUC, accuracy, and F-measure) of anti-pattern prediction models developed before and after the application of data sampling. We employ box-plots and statistical hypothesis testing to evaluate the significance and reliability of the generated models.

Comparison of the data sampling techniques based on descriptive statistics and box-plots: Figure 2 depicts the box-plots for the data sampling techniques and the original data. These are useful for comparing the minimum, maximum, median, and inter-quartile range (Q1–Q3) of the various developed models. Figure 2 shows that the mean value of the model developed using the SMOTEENN sampling technique is higher than the corresponding values of the other models. It is also observed that the inter-quartile range for the model generated using SMOTE is larger than for the models generated using the other sampling techniques, which indicates that its performance parameters computed over multiple executions show more variation.
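A box-plot comparison of this kind can be produced with a few lines of matplotlib; the AUC samples below are random placeholders, not the paper's measurements:

```python
# Sketch of a Fig. 2-style box-plot; the per-technique AUC vectors
# are illustrative random data, not the paper's results.
import matplotlib
matplotlib.use("Agg")                      # render off-screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
techniques = ["Original", "SMOTE", "BSMOTE",
              "SVMSMOTE", "SMOTEENN", "SMOTETOMEK"]
aucs = [rng.uniform(0.6, 0.95, size=20) for _ in techniques]

fig, ax = plt.subplots()
ax.boxplot(aucs, showmeans=True)           # means support the comparison
ax.set_xticks(range(1, len(techniques) + 1), labels=techniques)
ax.set_ylabel("AUC")
fig.savefig("sampling_boxplot.png")
```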

Fig. 2. Box-plot for accuracy and AUC: data sampling techniques

Comparison of the data sampling techniques based on statistical hypothesis testing: We use the Wilcoxon signed-rank test to statistically evaluate the performance of the data sampling techniques. The null hypothesis investigated by the Wilcoxon signed-rank test is defined as follows:

Null hypothesis: The AUC values of the models developed for web service anti-pattern prediction using the various data sampling techniques are not significantly different.

The null hypothesis is accepted if the pair-wise p-value is greater than the threshold value of 0.05. From Table 2, it is noticed that most of the comparison points have a p-value higher than 0.05. Hence we conclude that the null hypothesis is accepted, i.e., there is no significant difference between the performance of the models generated using the various sampling techniques.
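One pair-wise comparison of this test is a single call to scipy; the two paired AUC vectors below are illustrative stand-ins for the per-fold results of two sampling techniques:

```python
# Sketch of one pair-wise Wilcoxon signed-rank comparison; the paired
# AUC vectors are illustrative, not the paper's measurements.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(42)
auc_original = rng.uniform(0.60, 0.80, size=20)
auc_sampled = auc_original + rng.normal(0.0, 0.02, size=20)

stat, p = wilcoxon(auc_original, auc_sampled)
# Accept the null hypothesis (no significant difference) when p > 0.05.
print(f"p-value = {p:.3f}, reject H0: {p < 0.05}")
```

Repeating this call for every pair of techniques yields a p-value matrix like Table 2.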

Table 2. Wilcoxon signed test: Data sampling techniques

RQ2: Is there any significant difference between the performance of models built using subsets of features selected by different feature selection techniques?

The impact of the feature selection techniques is evaluated by analyzing the performance measures (AUC, accuracy, and F-measure) of the models developed using the selected feature subsets on the considered dataset.

Comparison of the feature selection techniques based on descriptive statistics and box-plots: Figure 3 shows the box-plots for the models trained using the selected features and all features. Figure 3 shows that the mean value of the model developed using the subset of features selected by PCA is higher than that of the models developed using the subsets selected by the other feature selection techniques.
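The PCA step can be sketched with scikit-learn; retaining 95% of the variance is an illustrative assumption, as the paper does not state its component-selection criterion:

```python
# Sketch of PCA-based feature reduction; the 95% retained-variance
# threshold is an assumption, and the data is a placeholder.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = make_classification(n_samples=226, n_features=10, random_state=0)
X_std = StandardScaler().fit_transform(X)   # PCA is scale-sensitive
pca = PCA(n_components=0.95)                # keep 95% of the variance
X_reduced = pca.fit_transform(X_std)
print(X.shape, "->", X_reduced.shape)
```

The reduced matrix then replaces the full feature set as model input when generating the PCA variant of each classifier.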

Fig. 3. Box-plot for accuracy and AUC: feature selection techniques

Comparison of the feature selection techniques based on statistical hypothesis testing: The Wilcoxon signed-rank test is used to evaluate the performance of the models generated using the various feature selection techniques. From Table 3, it is observed that many of the comparison points have a p-value less than 0.05. Hence we conclude that the null hypothesis is rejected, i.e., there is a significant difference between the performance of the models generated using the feature subsets selected by the various feature selection techniques.

Table 3. Wilcoxon signed test: Feature selection techniques

RQ3: Does there exist a neural network model that outperforms all others?

The impact of the models generated using neural networks with different numbers of hidden layers and the ensemble technique is evaluated by analyzing the performance measures (accuracy, AUC, and F-measure) on the considered dataset.

Comparison of the classifier techniques based on descriptive statistics and box-plots: Figure 4 shows the box-plots for the models generated using neural networks with a varying number of hidden layers and the ensemble technique. Figure 4 shows that the model developed using a neural network with two hidden layers (HL-2) has a higher mean value than the models developed using neural networks with other numbers of hidden layers. HL-2 also outperforms the model generated using the ensemble technique (EST).
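Since the ensemble feeds the outputs of the individual models into a second-level learner, it can be sketched as a stacking classifier; the logistic-regression meta-learner, layer widths, and reduced fold counts are illustrative assumptions:

```python
# Sketch of the ensemble technique (EST) as stacking over the five NN
# variants; meta-learner and sizes are assumptions, data is a placeholder.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=226, n_features=10,
                           weights=[0.91], random_state=0)

base = [(f"HL-{k}", MLPClassifier(hidden_layer_sizes=(8,) * k,
                                  max_iter=1000, random_state=0))
        for k in range(1, 6)]
# Base-model predictions become the meta-learner's input features.
est = StackingClassifier(estimators=base,
                         final_estimator=LogisticRegression(),
                         cv=3)             # folds reduced for brevity
est.fit(X, y)
print("EST training accuracy:", est.score(X, y))
```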

Fig. 4. Box-plot for accuracy and AUC: classifier techniques

Comparison of the classifier techniques based on statistical hypothesis testing: The Wilcoxon signed-rank test is used to evaluate the performance of the models developed using neural networks with a varying number of hidden layers and the ensemble technique. Considering only the neural network models, we notice from Table 4 that many of the comparison points have a p-value higher than 0.05, so the null hypothesis is accepted: there is no significant variation between the performance of the models generated using neural networks with different numbers of hidden layers. Comparing the neural network models with the ensemble technique, however, most of the comparison points have a p-value less than 0.05. Hence the null hypothesis is rejected, and we infer that there is a significant variation between the performance of the models generated using the neural networks (HL-1, HL-2, HL-3, HL-4, and HL-5) and the ensemble technique (EST).

Table 4. Wilcoxon signed test: Classifier techniques

7 Conclusion

The principal inference of this work is that a neural network with a small number of hidden layers can be used for the effective prediction of web service anti-patterns. In this paper, the application of five data sampling techniques (along with the original data), three feature selection techniques, and six classifier techniques, i.e., neural networks (HL-1, HL-2, HL-3, HL-4, and HL-5) and the ensemble technique (EST), is investigated empirically. A significant finding of this experimental work is that feature selection techniques play a crucial role in removing irrelevant features. Experimental results reveal that SMOTEENN shows better performance than the other sampling techniques. We also infer that the model developed using the metrics selected by Principal Component Analysis (PCA) as input obtained better performance than the models developed using other metrics. Experimental results also show that the neural network model developed with two hidden layers outperformed all the other models developed with a varying number of hidden layers.