1 Introduction

The emergence of cloud computing has brought about a major shift in how research and business organizations view IT infrastructure. It is the next step in the development of information technology services and products. Instead of paying a high price for hardware and spending heavily on maintenance, IT organizations can obtain the services they need at little cost. The growth of the web services of companies such as Amazon [1, 19, 35], Google [18] and Salesforce [44, 47] shows how desirable cloud computing has become in recent times. Amazon.com is one of the most important and heavily trafficked web sites in the world. It provides a vast selection of products using an infrastructure based on web services. Google is the prototypical cloud computing services company, and it supports some of the largest web sites and services in the world. Salesforce.com is a web application suite offered as ‘Software as a Service’ (SaaS), and Force.com is Salesforce.com’s ‘Platform as a Service’ (PaaS) offering for building one’s own services.

Owing to the growth of public cloud offerings, cloud consumers find it increasingly difficult to decide which provider can fulfill their Quality of Service (QoS) requirements. “QoS is the service providers’ capability to achieve the service users’ requirements, such as response time, throughput, availability, security and so forth”. Cloud providers offer similar services at various prices and performance levels with different sets of features. Given the multiplicity of cloud service offerings, an important concern for consumers is how to discover the “right” cloud providers that can satisfy their requirements. Therefore, it is not sufficient just to discover multiple cloud services; it is also important to evaluate which cloud service is the most suitable.

In conventional component-based systems, software components are invoked locally. Client-side evaluation of web services, in contrast, requires real-world web service invocations and suffers from the following shortcomings. First, real-world web service invocations impose costs on the user and consume resources of the service providers, and some invocations may be charged. Second, there may be too many candidate services to evaluate, and some suitable candidates may not even appear in the evaluation list. Finally, not all users of web services are well-versed or highly trained in web service evaluation, and tight time-to-market constraints restrict an in-depth evaluation of the target web services.

A common assumption is that the QoS values of all candidate services for the target users are known. In practice, however, this may not hold. Owing to factors such as location and network environment, the QoS of the same service may differ between users. For example, the response time for user (IP:12.108.128.196, RUSSIA) to invoke Web service (WSDL:http://biomoby.org/services/wsdl/mmb.pcb.ub.es/parseFeatureAASequenceFromFSOLVText, located in USA) is 5916 ms, whereas that for user (IP:183.1.74.162, Kenya) to invoke the same service is 690 ms. A user can hardly invoke all services, which means that the QoS values of the services that the user has not invoked are unknown. Hence, a personalized cloud service QoS ranking is required for different cloud applications.

Most of the existing methods [22, 57, 59] focus on finding the similarity between users and their services and then predicting the users’ missing QoS values. The most straightforward approach to personalized cloud service QoS ranking is to evaluate all the candidate services at the user’s side and rank the services based on the observed QoS values. However, this approach is impractical, since invocations of cloud services may be charged. Additionally, when the number of candidate services is large, it is not easy for the cloud application designer to evaluate all the cloud services efficiently.

To overcome this critical challenge, Zheng [58] took the first step towards a personalized ranking prediction framework for a current user. This approach predicts the QoS ranking of a set of cloud services, even though some services have not been invoked by the current user. The author proposed two ranking prediction algorithms for computing the service ranking based on the cloud application designer’s preferences. Those two ranking algorithms perform well compared to the traditional greedy [31] and rating based [17] approaches. However, in this approach, the rank is predicted on the basis of a single QoS value. As a result, the same service may receive different rank positions under different QoS parameters. Hence, we need to provide a single personalized ranking by using the correlation properties of combined QoS values. This motivates us to produce a correlated QoS ranking for cloud services to improve the ranking accuracy.

2 System Overview

QoS is an essential concept in the ranking process for users in cloud computing. This paper mainly focuses on a correlated ranking of user-side QoS properties, which are likely to have different values for different users of the same cloud service. More accurate correlated QoS ranking results can be achieved when QoS values on additional cloud services are provided, because the characteristics of an active user can be extracted from the given information. This paper focuses on examining the response time and throughput of different web services and service users. Response time is defined as the time taken between a service user sending a request and receiving the corresponding response. Throughput is defined as the average rate of successful service delivery.

Figure 1 depicts the system architecture of our CorQoSCloudRank framework, which provides correlated QoS ranking for cloud services. Here, the service users who require QoS ranking are called active or current users. When an active user’s request arrives, the process of finding correlated similar users is engaged first. In this process, we compute user similarity with the Pearson Correlation Coefficient (PCC), Spearman Rank Correlation Coefficient (SRCC) and Kendall Rank Correlation Coefficient (KRCC) using normalized QoS values; the datasets consist of normalized QoS values. After finding the QoS correlated similar users, we apply a data smoothing technique to the normalized QoS datasets. Data smoothing is an efficient technique for improving the accuracy of QoS ranking. For data smoothing, we employ the Fuzzy C-Means (FCM) algorithm. An FCM cluster is generally characterized either as a grouping of similar data values around a center or by the prototype data instance nearest to the center. Then, the preference function is estimated. During this process, the unknown QoS values of every user are predicted with the help of the QoS correlated similar users. On the basis of the estimated preference function values with data smoothing, the correlated QoS ranking is produced.

Fig. 1 System architecture of CorQoSCloudRank

The framework introduced in this work improves the accuracy of the QoS ranking with the help of a correlated QoS ranking technique. Correlated QoS ranking is implemented using an algorithm called CorQoSCloudRank. With the help of this algorithm, all services are ranked by QoS in an efficient way.

3 Correlated QoS Ranking Methodology

This section presents our correlated QoS personalized ranking prediction framework for cloud services. Section 3.1 calculates the similarity of the active user with the training users based on correlated QoS values of commonly invoked cloud services. Section 3.2 describes the data smoothing concept that improves the ranking accuracy of our approach. Section 3.3 presents the preference function on which the correlated QoS ranking is based. Section 3.4 describes the proposed correlated QoS ranking prediction algorithm, named CorQoSCloudRank.

3.1 Finding QoS Correlated Similar Users

This section introduces the similarity estimation method for service users. Let R be the user-service matrix of observed QoS values, of size M × N, where M denotes the number of users and N denotes the number of services. Here, \(I = \{i_{1}, i_{2}, i_{3}, \ldots, i_{N}\}\) denotes the set of services and \(U = \{u_{1}, u_{2}, u_{3}, \ldots, u_{M}\}\) denotes the set of users. Each entry \(R_{a,i}\) of this matrix is a vector of QoS values observed by the service user a on the service item i. If user a did not invoke the service item i in previous transactions, then \(R_{a,i}\) = null.

Similarity estimation is useful for identifying users who utilize the same resources. It is applied to compute the similarity between users who use the same type of cloud resources, as well as among users who share at least some of the resources. The similarity relationship among users is denoted by an M × M matrix, called the user–user similarity matrix. Similarity values generally range from 0 to 1, where 1 signifies complete similarity and 0 signifies no similarity. Table 1 lists the similarity values and their variables. In this approach, three well-known similarity measures are used: the PCC, SRCC and KRCC.

Table 1 Similarity values and variables

First, the PCC [21, 22] measures the similarity between two users based on the normalized QoS values of their commonly invoked services as

$$SIM(a,b) = \frac{\sum\nolimits_{i \in I_{a} \cap I_{b}} (r_{a,i} - \bar{r}_{a})(r_{b,i} - \bar{r}_{b})}{\sqrt{\sum\nolimits_{i \in I_{a} \cap I_{b}} (r_{a,i} - \bar{r}_{a})^{2} \sum\nolimits_{i \in I_{a} \cap I_{b}} (r_{b,i} - \bar{r}_{b})^{2}}},$$
(1)

where a and b denote users, i and j denote services, \(I_{a} \cap I_{b}\) is the subset of cloud services commonly invoked by users a and b, \(r_{a,i}\) is the normalized QoS value of service ‘i’ observed by user a, \(r_{b,i}\) is the normalized QoS value of service ‘i’ observed by user b, and \(\bar{r}_{a}\), \(\bar{r}_{b}\) are the average values observed by users a and b. From this definition, the similarity of two service users, SIM(a, b), lies in the interval [−1, 1], where a larger Pearson value indicates that service users a and b are more similar [59]. User-based collaborative filtering by means of PCC has been employed in a number of recommender systems [22, 43].
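As an illustration of Eq. (1), the following minimal sketch computes the PCC between two users. The data layout (one dictionary per user, mapping a service identifier to a normalized QoS value, with missing keys standing for services that were not invoked) and the function name `pcc_similarity` are assumptions made for this example. The averages are taken over the commonly invoked services, which is one possible reading of the definition above.

```python
import math


def pcc_similarity(qos_a, qos_b):
    """Pearson similarity of two users over their commonly invoked services.

    qos_a, qos_b: dict service id -> normalized QoS value (hypothetical layout).
    Returns None when the similarity cannot be computed.
    """
    common = sorted(set(qos_a) & set(qos_b))
    if len(common) < 2:
        return None
    # Assumption: user averages are computed over the common services only.
    mean_a = sum(qos_a[i] for i in common) / len(common)
    mean_b = sum(qos_b[i] for i in common) / len(common)
    num = sum((qos_a[i] - mean_a) * (qos_b[i] - mean_b) for i in common)
    den = math.sqrt(
        sum((qos_a[i] - mean_a) ** 2 for i in common)
        * sum((qos_b[i] - mean_b) ** 2 for i in common)
    )
    # A constant QoS vector has zero variance, so the PCC is undefined.
    return num / den if den > 0 else None


user_a = {"s1": 0.2, "s2": 0.4, "s3": 0.6}
user_b = {"s1": 0.1, "s2": 0.2, "s3": 0.3, "s4": 0.9}
print(pcc_similarity(user_a, user_b))
```

In this example the two users agree perfectly (up to scale) on the three services they share, so the similarity is close to 1; service `s4` is ignored because only one of them invoked it.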

Second, the SRCC measures the strength of association between two ranked variables. It measures the similarity between two users a and b as

$$SIM(a,b) = \frac{\sum\nolimits_{i} (a_{i} - \bar{a})(b_{i} - \bar{b})}{\sqrt{\sum\nolimits_{i} (a_{i} - \bar{a})^{2} \sum\nolimits_{i} (b_{i} - \bar{b})^{2}}},$$
(2)

where ‘i’ indexes the commonly invoked services, \(a_{i}\) and \(b_{i}\) are the ranks of the ith service for users a and b, and \(\bar{a}\), \(\bar{b}\) are the corresponding average ranks. When two service users have a null service intersection, the value of SIM(a,b) cannot be computed (SIM(a,b) = null). If there are no repeated values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.
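The SRCC can be sketched as the Pearson formula applied to ranks rather than raw values. The example below is illustrative only: the dictionary layout and the helper names `average_ranks` and `srcc_similarity` are assumptions, and ties are given their average rank, a common convention that the text above does not prescribe.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) to n; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def srcc_similarity(qos_a, qos_b):
    """Spearman similarity over commonly invoked services (None if undefined)."""
    common = sorted(set(qos_a) & set(qos_b))
    if len(common) < 2:
        return None
    ra = average_ranks([qos_a[i] for i in common])
    rb = average_ranks([qos_b[i] for i in common])
    mean_a = sum(ra) / len(ra)
    mean_b = sum(rb) / len(rb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in ra) * sum((y - mean_b) ** 2 for y in rb)
    )
    return num / den if den > 0 else None


# A monotone but non-linear relationship still yields a perfect correlation.
print(srcc_similarity({"s1": 0.1, "s2": 0.2, "s3": 0.9}, {"s1": 1, "s2": 10, "s3": 11}))
```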

Third, the KRCC measures the similarity between two service rankings,

$$SIM(a,b) = 1 - \frac{4 \times \sum\nolimits_{i,j \in I_{a} \cap I_{b}} \tilde{I}\left( (r_{a,i} - r_{a,j})(r_{b,i} - r_{b,j}) \right)}{\left| I_{a} \cap I_{b} \right| \times \left( \left| I_{a} \cap I_{b} \right| - 1 \right)}$$
(3)

where \(I_{a} \cap I_{b}\) is the subset of cloud services commonly used by users a and b, \(r_{a,i}\) is the normalized QoS value of service ‘i’ used by user a, and \(\tilde{I}\)(x) is an indicator function given as

$$\tilde{I}(x) = \left\{ {\begin{array}{*{20}l} 1 & {if\;x < 0} \\ 0 & {otherwise} \\ \end{array} } \right..$$
(4)

From the above definition, the similarity between two rankings lies in the interval [−1, 1], where −1 is obtained when the order of user a is the exact reverse of that of user b. Because KRCC compares service pairs, the intersection between two users has to contain at least two services, \(\left| I_{a} \cap I_{b} \right| \ge 2\), for a similarity computation to be made.
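The KRCC of Eqs. (3) and (4) can be illustrated by counting the service pairs on which two users disagree. In this sketch each unordered pair {i, j} is counted once, which is an assumption made so that the result stays within [−1, 1]; the dictionary layout and the name `krcc_similarity` are likewise hypothetical.

```python
from itertools import combinations


def krcc_similarity(qos_a, qos_b):
    """Kendall similarity over commonly invoked services (None if undefined)."""
    common = sorted(set(qos_a) & set(qos_b))
    n = len(common)
    if n < 2:
        return None  # KRCC compares pairs, so at least two common services are needed
    # Indicator of Eq. (4): a pair is discordant when the product is negative.
    discordant = sum(
        1
        for i, j in combinations(common, 2)
        if (qos_a[i] - qos_a[j]) * (qos_b[i] - qos_b[j]) < 0
    )
    return 1 - 4 * discordant / (n * (n - 1))


same_order = krcc_similarity({"s1": 0.1, "s2": 0.5, "s3": 0.9}, {"s1": 0.2, "s2": 0.3, "s3": 0.4})
reverse_order = krcc_similarity({"s1": 0.1, "s2": 0.5, "s3": 0.9}, {"s1": 0.4, "s2": 0.3, "s3": 0.2})
print(same_order, reverse_order)
```

With three common services there are three pairs; when all of them are discordant the value is 1 − 12/6 = −1, matching the exact-reverse case described above.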

A set of similar users S(a) is identified for the active user a by

$${\text{S}}(a) = \{ b \mid b \in {\text{Top}}\_{\text{K}}(a),\ {\text{SIM}}(a,b) > 0,\ b \ne a\} ,$$
(5)

where Top_K(a) is the set of the Top-K similar users of active user a. The Top-K similar users are identified by arranging their similarity values in descending order. The condition SIM(a,b) > 0 excludes dissimilar users with negative similarity values. In this paper, we make use of the hybrid Top-K algorithm to select neighbors. First, a similarity threshold value is set for the candidate users. Then the dissimilar users are removed by applying the threshold value.
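The selection in Eq. (5), combined with the threshold step, might be sketched as follows. This is only one possible reading of the hybrid Top-K selection: candidates with an undefined or non-positive similarity, or with a similarity below the threshold, are dropped, and the K most similar remaining users are kept. The function name and the input layout are assumptions.

```python
def select_similar_users(similarities, k, threshold=0.0):
    """Return up to k neighbor ids, most similar first.

    similarities: dict user id -> SIM(a, b) for candidates b != a,
    where None marks a similarity that could not be computed.
    """
    candidates = [
        (b, sim)
        for b, sim in similarities.items()
        if sim is not None and sim > 0 and sim >= threshold
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [b for b, _ in candidates[:k]]


sims = {"u1": 0.9, "u2": 0.3, "u3": -0.5, "u4": 0.1, "u5": None, "u6": 0.6}
print(select_similar_users(sims, k=10, threshold=0.25))
```

Here `u3` is excluded for its negative similarity, `u5` because no similarity could be computed, and `u4` because it falls below the threshold of 0.25.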

In this correlation technique, two or more QoS parameters are combined to exploit their correlated properties. Here, throughput and response time are combined. For throughput, a higher QoS value is better, whereas for response time a lower value is better. Since the two QoS parameters have opposite orientations, normalization has to be performed to put both values in a common range (0–1). Here, the response time is transformed as (1 − response time).
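A small sketch of this normalization step is given below. Min-max scaling is assumed here as the way of mapping values into the range 0–1, since the normalization method is not specified above; the function names are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1] (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def normalize_qos(throughputs, response_times):
    """Return (throughput, response time) lists where larger always means better."""
    tp = min_max_normalize(throughputs)
    # Response time is inverted as (1 - response time) after scaling.
    rt = [1.0 - v for v in min_max_normalize(response_times)]
    return tp, rt


tp, rt = normalize_qos([10, 20, 30], [0.5, 1.5, 2.5])
print(tp, rt)
```

After this step the fastest service (0.5) has the highest normalized response-time score, so both parameters can be compared and summed in the same direction.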

Let \(S_{i}(a)\) be the set of similar users for the active user a based on the ith normalized QoS parameter. In our approach, a set of QoS correlated similar users CS(a) is identified for the active user a by

$$CS(a) = S_{1} (a) \cap S_{2} (a) \cap \cdots \cap S_{n} (a)$$
(6)

where ‘n’ represents the number of QoS parameters.
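Eq. (6) amounts to a set intersection, which the following short sketch illustrates. The function name and the example user identifiers are made up for the illustration.

```python
def correlated_similar_users(similar_sets):
    """Intersect the per-parameter similar-user sets S_1(a), ..., S_n(a)."""
    sets = [set(s) for s in similar_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


by_response_time = ["u1", "u2", "u3"]
by_throughput = ["u2", "u3", "u4"]
print(sorted(correlated_similar_users([by_response_time, by_throughput])))
```

Only users who are similar to the active user under every QoS parameter remain, so the correlated set can be much smaller than any single-parameter set.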

3.2 QoS Data Smoothing

Data smoothing is an important technique that removes noise from a dataset, allowing important patterns to stand out and improving the accuracy of the QoS prediction. Assume that \(u_{t}\) is one of the similar users of u, and we want to predict the QoS of service ‘i’ for user u. Traditional methods replace the QoS of service ‘i’ for \(u_{t}\) with 0 if \(u_{t}\) has not invoked service ‘i’, which lowers the accuracy of the predicted QoS. This problem is handled in our paper with the help of data smoothing.

K-Means and FCM are two of the most widely used partition-based clustering algorithms. K-Means is one of the simplest unsupervised learning algorithms for solving clustering problems [54]. The performance-based comparison of Velmurugan [54] indicates that the results of FCM are more accurate and more easily understandable than those of K-Means. Hence, in this approach, in order to improve the accuracy of the QoS value prediction, the data smoothing, as described in [42, 51], is done with the help of FCM clustering instead of the K-Means algorithm.

3.3 Correlated QoS Ranking

Given the user-observed QoS values, in normalized form, on two cloud services, the user preference function between these two services can be easily derived by comparing the normalized QoS values, where

$$\pounds\left( {{\text{i}},{\text{j}}} \right) = {\text{q}}_{\text{i}} {-}{\text{q}}_{\text{j}} .$$
(7)

\(\pounds\left( {{\text{i}},{\text{j}}} \right)\) is the preference function, where i and j denote services and \(q_{i}\) and \(q_{j}\) denote the normalized QoS values of services ‘i’ and ‘j’. Since our goal is to produce a ranking for users, we focus on modelling a user’s preference function of the form used in [39, 58], \(\pounds\): I × I → ℝ, where \(\pounds\)(i,j) > 0 means that service ‘i’ is preferred to service ‘j’.

Here, the ith service is compared with the jth service: a positive value means that service ‘i’ is preferable to service ‘j’ for the active user a, and a negative value means the opposite. Suppose user a’s QoS throughput values on cloud services ‘i’ and ‘j’ are 5 and 3, respectively. This clearly indicates that the user prefers cloud service ‘i’ to cloud service ‘j’, as indicated by \(\pounds\)(i, j) > 0.

The magnitude of this preference function |\(\pounds\)(i,j)| indicates the strength of preference and a value of zero means that there is no preference between the two services. Assume that \(\pounds\)(i, i) = 0 for all i ∈ I and that \(\pounds\) is anti-symmetric, i.e., \(\pounds\)(i, j) = −\(\pounds\)(j, i) for all i, j ∈ I.
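The preference function of Eq. (7) and its anti-symmetry can be illustrated with a few lines of code. The list-based layout, in which index i holds the QoS value of service i and `None` marks a service that was not observed, is an assumption for this sketch.

```python
def preference_matrix(q):
    """Build the pairwise preference values q_i - q_j for one user.

    q: list of QoS values indexed by service; None marks an unobserved service.
    Entries involving an unobserved service are left as None.
    """
    n = len(q)
    pref = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if q[i] is not None and q[j] is not None:
                pref[i][j] = q[i] - q[j]
    return pref


# Throughput values 5 and 3 for services i and j, as in the example in the text.
pref = preference_matrix([5, 3])
print(pref[0][1], pref[1][0])
```

The value for (i, j) is 2 and the value for (j, i) is −2, so service i is preferred and the matrix is anti-symmetric with a zero diagonal.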

With the help of the user preference function calculation, the preferences between all pairs of services are obtained as an N × N matrix \(\pounds\). As a result, the M users have M user preference functions,

$$[\pounds_{1} ,\pounds_{2} ,\pounds_{3} ,\pounds_{4} , \ldots ,\pounds_{\text{M}} ],$$
(8)

To obtain the preference values for pairs of services that have not been invoked by the active user, the preference values of the correlated similar users CS(a) are used. Generally, the more often the correlated similar users in CS(a) observe service ‘i’ to be of higher quality than service ‘j’, the stronger the evidence that \(\pounds\)(i,j) > 0 for the active user. This leads to the following formula for estimating the value of the preference function \(\pounds\)(i, j), where service ‘i’ and service ‘j’ are not explicitly observed by the active user a,

$$\pounds({\text{i}},{\text{j}}) = \sum\nolimits_{{b \in {\text{CS}}(a)}} {{\text{W}}_{b} ({\text{q}}_{{b,{\text{i}}}} - {\text{q}}_{{b,{\text{j}}}} )} ,$$
(9)

where b is a correlated similar user of the active user a, CS(a) is the set of correlated similar users of a, and \(q_{b,i}\) and \(q_{b,j}\) are the QoS values of services i and j observed by user b. \(W_{b}\) is the weighting factor of the correlated similar user b, which can be estimated by

$$W_{b} = \frac{SIM(a,b)}{{\sum\nolimits_{b \in CS(a)} {SIM(a,b)} }}.$$
(10)
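Eqs. (9) and (10) can be combined into a single weighted sum, as in the sketch below. It assumes that each neighbor is given as a pair of its similarity to the active user and a dictionary of (already smoothed) QoS values containing both services; these structures and the function name are hypothetical.

```python
def predicted_preference(i, j, neighbors):
    """Estimate the preference of service i over j from correlated similar users.

    neighbors: list of (similarity, qos) pairs, where qos maps a service id
    to a normalized QoS value and is assumed to contain both i and j.
    """
    total = sum(sim for sim, _ in neighbors)
    if total <= 0:
        return None  # no usable neighbors, so no estimate
    # Each neighbor's weight is its similarity divided by the sum of similarities.
    return sum((sim / total) * (qos[i] - qos[j]) for sim, qos in neighbors)


neighbors = [
    (0.6, {"i": 0.8, "j": 0.5}),
    (0.2, {"i": 0.4, "j": 0.6}),
]
print(predicted_preference("i", "j", neighbors))
```

The first neighbor carries a weight of 0.75 and prefers service i, while the second carries 0.25 and prefers service j, so the estimate is 0.75 × 0.3 − 0.25 × 0.2 = 0.175 in favor of service i.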

\(W_{b}\) ensures that a correlated similar user with a higher similarity value has a greater impact on the preference value prediction in Eq. (9). In the existing system, only a user b who accessed both services ‘i’ and ‘j’ is taken as a similar user of a. In our approach, if the correlated user did not access service i or j, then FCM clustering based data smoothing is performed to estimate \({\text{q}}_{{b,{\text{i}}}} ,{\text{q}}_{{b,{\text{j}}}}\) as described in [26]. Fuzzy C-Means clustering is applied to the users on the basis of user similarity. We group all users \(U = \{u_{1}, u_{2}, \ldots, u_{n}\}\) into k clusters, and the clustering results are represented as \(\{C_{u}^{1}, C_{u}^{2}, \ldots, C_{u}^{k}\}\). Given that b belongs to cluster \(C_{u}\), i.e., \(C_{u} \in \{C_{u}^{1}, C_{u}^{2}, \ldots, C_{u}^{k}\}\), the QoS value \(q_{b,i}\) is given as

$$q_{b,i} = \tilde{q}_{b} + \Delta rC_{u} (i),$$
(11)

where \(\Delta rC_{u} (i)\) is the average QoS deviation of service ‘i’ over the users in cluster \(C_{u}\) who have invoked it,

$$\Delta rC_{u} (i) = \frac{\sum\nolimits_{u^{\prime} \in C_{u}(i)} (q_{u^{\prime},i} - \tilde{q}_{u^{\prime}})}{\left| C_{u}(i) \right|}$$
(12)

where \(C_{u}(i) \subseteq C_{u}\) is the set of users in cluster \(C_{u}\) who have invoked service ‘i’, \(\tilde{q}_{u^{\prime}}\) is the average QoS value of user u′, and \(|C_{u}(i)|\) is the cardinality of \(C_{u}(i)\). In this way, after clustering based on user similarity, the average QoS deviation within the cluster is used for the particular service.
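The smoothing step of Eqs. (11) and (12) can be sketched as follows, assuming the clustering has already been performed and each user has been assigned to a single cluster (for FCM, for instance, the cluster with the highest membership degree; this hard assignment is an assumption of the sketch, as are the data layout and the fallback to the user's own mean when nobody in the cluster invoked the service).

```python
def smooth_missing_value(user, service, qos, cluster_members):
    """Estimate a missing QoS value as user mean + average cluster deviation.

    qos: dict user id -> dict service id -> normalized QoS value
         (every user is assumed to have at least one observed value).
    cluster_members: ids of the users in the same cluster as `user`.
    """

    def user_mean(u):
        values = list(qos[u].values())
        return sum(values) / len(values)

    invokers = [u for u in cluster_members if service in qos[u]]
    if not invokers:
        return user_mean(user)  # assumed fallback: no deviation information
    deviation = sum(qos[u][service] - user_mean(u) for u in invokers) / len(invokers)
    return user_mean(user) + deviation


qos = {
    "b": {"s1": 0.4, "s2": 0.6},
    "u1": {"s1": 0.5, "s3": 0.9},
    "u2": {"s2": 0.2, "s3": 0.4},
}
print(smooth_missing_value("b", "s3", qos, ["b", "u1", "u2"]))
```

In this example the two cluster members who invoked `s3` rate it 0.2 and 0.1 above their own averages, so user `b` (average 0.5) receives a smoothed value of about 0.65 instead of 0.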

3.4 CorQoSCloudRank

Qiu et al. [57] proposed a reputation-aware QoS value prediction approach that first calculates the reputation of each user based on their contributed values, and then takes advantage of its reputation-based ranking to exclude the values contributed by untrustworthy users. Wu et al. [41] presented a ranking method called ServiceRank that considers QoS aspects, such as response time and availability, as well as the social perspectives of services.

In collaborative filtering, the ranking of services is estimated on the basis of the correlation among all users’ service behavior. Each user has his/her own ranking of preferred services according to past QoS observations. For the ‘Employed’ service set, the ranking is computed with the help of the preference function. For the ‘Unemployed’ service set, the ranking is computed by picking the correlated neighbors who have chosen the service. The algorithm takes the following values as input: the full service set (FS), the accessed service set of each of the m users (AS1, AS2,…,ASm), the summed normalized QoS values of each user (SNQ1, SNQ2,…,SNQn) and the summation of the preference functions over the QoS values for all m users.

CorQoSCloudRank algorithm explanation:

  • Step 1 (lines 1–8). Rank the accessed cloud services in W based on the summation of normalized observed QoS values. The resulting ranking stores the ranks of the accessed services and returns the rank of any service m, where m is a cloud service that has been accessed by the active user a.

This procedure yields the ranks of the accessed services. Figure 2 shows that the active user a has ranks for the accessed services among s1 to sn, but there is no rank for service s2 because it has not been accessed. The following procedure is used to find the rank of all services that have not been accessed by the active user a.

Fig. 2 Active user a ranking services

  • Step 2 (lines 9–11). For each service in the full service set FS, the sum of the summed preference values with all other services is calculated by \(\varPi_{\text{a}} ({\text{i}}) = \sum\nolimits_{{j \in {\text{FS}}}} {{\text{S\pounds}}_{\text{a}} (i,j)}\), where a is the active user. A larger \(\varPi_{a}(i)\) value indicates that active user a prefers the ith service more than the other services.

  • Step 3 (lines 12–70). Here, the correlated ranking scheme is applied to find the rank of each service in FS. Line 14 finds the service m that has the maximum Πa(m) value. When the mth service has been accessed by active user a, the updaternk1 procedure (lines 71–82) is called. In that procedure, the accessed service m is assigned the rank rnk, following lines 65–70.

After that, as per the procedure from lines (78–81) the selected accessed service m is then removed from the full service set FS. The preference function values Π a (i) of the remaining services are updated to remove the effects of the selected accessed service m.

If the mth service has not been accessed by active user a, then lines 20–28 are executed. Here we check whether the correlated neighbors have chosen the mth service; if they have, their ranks (nrnk) for the mth service are retrieved with the help of their preference functions. There are three possibilities for choosing the rank on the basis of the correlated ranks (nrnk) provided by the correlated neighbors (Fig. 3).

Fig. 3 First possibility

In Fig. 3, the 2nd service is assigned rank 3, because none of the correlated neighbors has chosen the service and the rank estimated by active user a is 3.

  • Step 4 (lines 29–31). The first possibility is that the nrnk set is empty, i.e., none of the correlated neighbors chose the service m. In this case, the updaternk1 procedure is called to assign the rank rnk.

  • Step 5 (lines 32–38). In the second possibility, only one correlated neighbor provides a rank nrnk(1) for the service m. Line 33 then checks whether the rank value rnk (estimated by the active user on the basis of the preference function) differs by more than one position from nrnk(1). If so, the updaternk1 procedure (lines 71–82) is called; otherwise, the updaternk2 procedure (lines 83–95) is called (Fig. 4).

    Fig. 4 Second possibility with single rank

In Fig. 4, the 2nd service is assigned rank 5, because the rank provided by user 2 is 5 and the rank estimated by active user a is 4. The two ranks differ by only one position; the opposite case is shown in Fig. 5.

Fig. 5 Second possibility with update rank

In Fig. 5, the 2nd service is assigned rank 7, because the rank provided by user 2 is 3 and the rank estimated by active user a is 7, so the two ranks differ by more than one position and do not satisfy the condition. Therefore, the 7th rank is assigned to the 3rd service.

In the updaternk2 procedure, the service m is assigned rank nrnk(1). Before that, the ranks of the services that have already been assigned rank nrnk(1) or above are incremented by one level.

After that, as per the procedure from (lines 84–87) the selected service m is then removed from the full service set FS. The preference function values Πa(i) of the remaining services are updated to remove the effects of the selected service m.

  • Step 6 (lines 39–63). In the third possibility, as shown in Figs. 6 and 7, more than one correlated neighbor provides a rank for the service that is not accessed by the active user a. In this case, lines 40–47 find the most frequent rank (mfrnk) among the correlated ranks (nrnk). Line 48 then checks whether a most frequent rank is available; if so, the most frequent rank (mfrnk) is used to update the rank; otherwise, the average of the correlated neighbors’ ranks (avgrnk) is chosen.

    Fig. 6 Third possibility with frequent rank

    Fig. 7 Third possibility with average rank

  • In Fig. 6, correlated users u2 and u4 have both assigned rank 2, and since rank 2 occurs most frequently, the frequent rank is taken as 2.

  • In both cases, lines 49 and 56 check whether the chosen rank (avgrnk/mfrnk) differs by more than one position from the rank rnk estimated by the active user a. If it does, the updaternk1 procedure is called; otherwise, the updaternk2 procedure is called.

Figure 7 shows a case in which no frequent rank is available, so the average rank is computed from the correlated neighbors’ ranks.

  • Step 7 (lines 65–70). In this part, the ranks of the accessed services in the final ranking are corrected with the help of the ranks of the accessed services stored in Step 1.
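Two parts of the steps above can be illustrated in code: the greedy selection by Π values (Steps 2 and 3) and the choice between the estimated rank and the neighbors' ranks (Steps 4 to 6). The sketch below is a simplification rather than the full CorQoSCloudRank listing: the function names are hypothetical, a most frequent rank is treated as "available" only when a single rank occurs more often than every other, and the average rank is rounded to the nearest integer. The last two conventions are assumptions.

```python
from collections import Counter


def greedy_order(pref_sum):
    """Order services by repeatedly picking the one with the largest Pi value.

    pref_sum[i][j]: summed preference of service i over service j.
    """
    remaining = set(range(len(pref_sum)))
    pi = {i: sum(pref_sum[i][j] for j in remaining) for i in remaining}
    order = []
    while remaining:
        best = max(sorted(remaining), key=lambda i: pi[i])
        order.append(best)
        remaining.remove(best)
        for i in remaining:
            pi[i] -= pref_sum[i][best]  # remove the effect of the selected service
    return order


def resolve_rank(rnk, nrnk):
    """Pick the rank for a service the active user has not accessed.

    rnk: rank estimated from the active user's preference function.
    nrnk: ranks given to the service by the correlated neighbors.
    Returns (rank, name of the update procedure that would be called).
    """
    if not nrnk:  # Step 4: no neighbor chose the service
        return rnk, "updaternk1"
    if len(nrnk) == 1:  # Step 5: a single neighbor rank
        chosen = nrnk[0]
    else:  # Step 6: most frequent rank if available, else the average
        counts = Counter(nrnk).most_common()
        top_rank, top_count = counts[0]
        unique_mode = top_count > 1 and (len(counts) == 1 or counts[1][1] < top_count)
        chosen = top_rank if unique_mode else round(sum(nrnk) / len(nrnk))
    if abs(rnk - chosen) > 1:
        return rnk, "updaternk1"
    return chosen, "updaternk2"


q = [0.2, 0.9, 0.5]
pref = [[qi - qj for qj in q] for qi in q]
print(greedy_order(pref))
print(resolve_rank(4, [5]), resolve_rank(7, [3]))
```

The two calls to `resolve_rank` mirror the cases described for Figs. 4 and 5: ranks 4 and 5 differ by one position, so the neighbor's rank 5 is taken, whereas ranks 7 and 3 differ by more than one position, so the estimated rank 7 is kept.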

4 Experiments

4.1 Dataset Description

To evaluate the correlated QoS ranking accuracy, we used the WSDream-QoS dataset that was publicly released online by Zheng et al. [60]. This dataset consists of QoS values of 500 real-world web services observed by 300 service users. It is represented as a 300 × 500 user-item matrix, where each entry in the matrix is the QoS value of a web service observed by a user. In total, 150,000 web service invocations are provided, and the response time and throughput values of each invocation are given. Our experiment was carried out with MATLAB 13. In our experiment, the QoS values are used to rank the services that are to be correlated.

4.2 Evaluation Metric

The aim of QoS ranking prediction is to predict the QoS ranking as accurately as possible. If the QoS numerical values are given as classes or labels, our algorithm also provides good results for qualitative variables. In order to evaluate the ranking prediction accuracy, we employ the Normalized Discounted Cumulative Gain (NDCG) [58, 61] metric, which is a popular metric for evaluating ranking results. The NDCG measures the performance of a recommendation system based on the graded relevance of the recommended entities. It varies from 0.0 to 1.0, with 1.0 representing the ideal ranking of the entities. Given an ideal service QoS ranking (used as ground truth) and a correlated QoS ranking, the NDCG value of the Top-K ranked services can be estimated by

$$NDCG_{k} = \frac{{DCG_{k} }}{{IDCG_{k} }},$$
(13)

where \(DCG_{k}\) and \(IDCG_{k}\) are the Discounted Cumulative Gain (DCG) values of the Top-K services of the correlated ranking and the ideal ranking, respectively. The value of \(DCG_{k}\) can be estimated by

$${\text{DCG}}_{\text{k}} = rel_{1} + \mathop \sum \limits_{i = 2}^{k} \frac{{rel_{i} }}{{\log_{2} i}}.$$
(14)

where \(rel_{i}\) is the graded relevance QoS value of the service at position i of the ranking.

The premise of DCG is that a high-quality web service appearing lower in the ranking list should be penalized, as the graded relevance value is reduced logarithmically in proportion to the position of the result. The DCG value is accumulated from the top of the result list to the bottom, with the gain of each result discounted at lower ranks. The ideal ranking achieves the highest gain among all possible rankings. The \(NDCG_{k}\) value lies in the interval 0–1, where a larger value stands for a better ranking accuracy, indicating that the correlated ranking is closer to the ideal ranking. The value of ‘k’ lies in the interval from 1 to m, where m is the total number of cloud services.
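Eqs. (13) and (14) translate directly into code. In the sketch below, the predicted ranking is given as a list of service identifiers and the ground-truth relevance as a dictionary; both structures and the function names are assumptions made for the illustration.

```python
import math


def dcg(relevances, k):
    """DCG_k = rel_1 + sum_{i=2..k} rel_i / log2(i)."""
    top = relevances[:k]
    if not top:
        return 0.0
    return top[0] + sum(
        rel / math.log2(pos) for pos, rel in enumerate(top[1:], start=2)
    )


def ndcg(predicted_order, true_qos, k):
    """NDCG_k of a predicted service order against the ideal order.

    predicted_order: service ids, best first, as produced by the ranking.
    true_qos: dict service id -> graded relevance (ground-truth QoS value).
    """
    predicted = [true_qos[s] for s in predicted_order]
    ideal = sorted(true_qos.values(), reverse=True)
    ideal_dcg = dcg(ideal, k)
    return dcg(predicted, k) / ideal_dcg if ideal_dcg > 0 else 0.0


truth = {"s1": 3, "s2": 2, "s3": 1}
print(ndcg(["s1", "s2", "s3"], truth, 3))
print(ndcg(["s3", "s2", "s1"], truth, 3))
```

The first call reproduces the ideal order and therefore yields 1.0; the second places the best service last and yields a smaller value, since its gain is discounted by the logarithm of its position.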

4.3 Performance Comparison

The ranking prediction is analyzed through six variants of the correlated QoS ranking scheme. In the first three, the correlation properties of multiple QoS parameters are used only in the user similarity estimation process. The latter three utilize the correlation property for user similarity as well as ranking prediction. The six proposed correlated QoS ranking algorithms are as follows:

  1. QoS Correlated User based CloudRank without Data Smoothing (QCUCR).

    • This method employs correlation property among multiple QoS values only for user similarity estimation. The preference function value of each QoS parameter is added together to predict the rank on the basis of ideal ranking in descending order.

  2. QoS Correlated User based CloudRank with (Data Smoothing) K-Means (QCUCRK).

    • This method is similar to QCUCR. In this approach, a pre-processing technique, such as data smoothing is employed with the help of the K-Means algorithm.

  3. QoS Correlated User based CloudRank with (Data Smoothing) FCM (QCUCRFC).

    • This method is similar to QCUCR. In this approach, a pre-processing technique, such as data smoothing is employed with the help of the FCM algorithm.

  4. Correlated QoS CloudRank without Data Smoothing (CQCR).

    • This method employs correlation property among multiple QoS values for user similarity estimation as well as to predict personalized ranking.

  5. Correlated QoS CloudRank with (Data Smoothing) K-Means (CQCRK).

    • This method is similar to CQCR. In this approach, a pre-processing technique such as data smoothing is employed with the help of the K-Means algorithm.

  6. Correlated QoS CloudRank with (Data Smoothing) FCM (CQCRFC).

    • This method is similar to CQCR. In this approach, a pre-processing technique, such as data smoothing is employed with the help of the FCM algorithm.

In the real world, user-item matrices are generally very sparse, since a user normally invokes only a very small number of cloud services. To make our experiments realistic, we randomly remove entries from the user-item matrix to make it sparser, with various densities. The user-item matrix density (i.e., the proportion of nonzero entries) is reduced randomly to d %.
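The random removal of entries could be carried out along the following lines. This is a sketch under stated assumptions: the matrix is a list of rows that is initially full, removed entries are marked with `None` rather than zero, and the `seed` parameter is added only to make the example reproducible.

```python
import random


def sparsify(matrix, density, seed=None):
    """Keep a random fraction `density` of the entries; the rest become None."""
    rng = random.Random(seed)
    rows, cols = len(matrix), len(matrix[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    keep = set(rng.sample(cells, round(density * len(cells))))
    return [
        [matrix[r][c] if (r, c) in keep else None for c in range(cols)]
        for r in range(rows)
    ]


full = [[1.0] * 10 for _ in range(10)]
sparse = sparsify(full, 0.3, seed=1)
kept = sum(value is not None for row in sparse for value in row)
print(kept)  # 30 of the 100 entries remain at 30 % density
```

Repeating the call with different seeds gives different sparse matrices of the same density, which is what averaging over repeated runs requires.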

In our experiment, all six proposed correlated QoS ranking algorithms are evaluated with matrix densities from 10 to 50 % in steps of 5 %. While evaluating the ranking accuracy, each of the six proposed correlated QoS ranking algorithms is executed up to 30 times and the average value is reported in this paper. The rankings based on the original full matrix are used as ideal rankings to evaluate the QoS ranking accuracy. The Top-K value is set to 10 in the prediction process. The threshold assigned for the hybrid Top-K algorithm is 0.25.

Three traditional similarity computation methods, the PCC, SRCC and KRCC, are employed for all six proposed correlated QoS ranking algorithms, whose performance is evaluated in terms of NDCG1, NDCG10 and NDCG100.

Tables 2, 3 and 4 show the performance analysis of the NDCG based correlated cloud rank approach with the PCC, SRCC and KRCC, calculated for 10, 30 and 50 % density of the user-item matrix (Figs. 8, 9, 10).

Table 2 Analysis on NDCG versus manifest using PCC
Table 3 Analysis on NDCG versus manifest using SRCC
Table 4 Analysis on NDCG versus manifest using KRCC
Fig. 8 Impact of PCC with correlation ranking scheme

Fig. 9 Impact of SRCC with correlation ranking scheme

Fig. 10 Impact of KRCC with correlation ranking scheme

The results in Tables 2, 3 and 4 show that the ranking accuracy (in terms of NDCG values) increases as the density of the user-item matrix increases from 10 to 50 %. A denser user-item matrix provides more data for ranking prediction, which improves accuracy.

Figure 11 shows that the ranking accuracy improves as the density of the user-item matrix increases from 10 to 50 %. In our work, CQCR-PCC at 50 % density outperforms CQCR-PCC at 10 % density by 0.0203 NDCG; the corresponding gains are 0.1094 NDCG for SRCC and 0.1278 NDCG for KRCC.

Fig. 11 Matrix density 10, 30 and 50 %

Including a pre-processing step, i.e., data smoothing, enhances the accuracy. Compared with K-Means-based data smoothing, FCM-based data smoothing consistently achieves better ranking accuracy.

Figure 12 shows that CQCRFC outperforms CQCR by 0.0298 NDCG for PCC, 0.0075 NDCG for SRCC and 0.0106 NDCG for KRCC, and outperforms CQCRK by 0.023 NDCG for PCC, 0.0032 NDCG for SRCC and 0.0097 NDCG for KRCC. This indicates that combining the correlation property of multiple QoS values with data smoothing improves the ranking accuracy under multiple QoS constraints.

Fig. 12 Impact of data smoothing

Among the similarity computation techniques, KRCC obtains better results than PCC and SRCC.

Figure 13 shows that the algorithm with KRCC outperforms SRCC by 0.0427 NDCG and PCC by 0.1699 NDCG.

Fig. 13 Impact of similarity computation methods
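To make the KRCC-based similarity concrete, the following sketch computes the Kendall rank correlation between two users over the services both have invoked. It uses the simple concordant-minus-discordant form without tie correction, which is an assumption; the function name `krcc` and the sample QoS values are hypothetical and not taken from the experiments.

```python
from itertools import combinations


def krcc(values_u, values_v):
    """Kendall rank correlation over services observed by both users.

    values_u, values_v: dict of service id -> QoS value. Returns a value in
    [-1, 1], or 0.0 when fewer than two services are shared.
    """
    common = sorted(set(values_u) & set(values_v))
    if len(common) < 2:
        return 0.0
    concordant = discordant = 0
    for a, b in combinations(common, 2):
        # Positive product: both users order services a and b the same way.
        direction = (values_u[a] - values_u[b]) * (values_v[a] - values_v[b])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    total_pairs = len(common) * (len(common) - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical response times (seconds) observed by three users.
user_u = {"s1": 0.3, "s2": 1.2, "s3": 0.8}
user_v = {"s1": 0.4, "s2": 1.0, "s3": 0.9, "s4": 2.0}
user_w = {"s1": 2.0, "s2": 0.5, "s3": 1.0}
print(krcc(user_u, user_v))  # 1.0
print(krcc(user_u, user_w))  # -1.0
```

Because only the relative order of service pairs matters, this measure is unaffected by differences in the absolute scale of the QoS values observed by the two users.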

Employing the correlation property of multiple QoS values for both similarity estimation and ranking prediction yields a better result than applying it only for similarity estimation.

Figure 14 shows that the Correlated QoS CloudRank without Data Smoothing (CQCR) outperforms the QoS Correlated User based CloudRank without Data Smoothing (QCUCR) by 0.027 NDCG for PCC, 0.0141 NDCG for SRCC and 0.0036 NDCG for KRCC.

Fig. 14 Impact of QoS correlation property in ranking

The Correlated QoS CloudRank with (Data Smoothing) FCM (CQCRFC) technique consistently obtains the best prediction accuracy (largest NDCG values) with combined QoS under all the experimental settings, achieving a higher ranking accuracy than the other five techniques. Figure 15 shows that, with KRCC, CQCRFC outperforms CQCR by 0.0106 NDCG and CQCRK by 0.0097 NDCG.

Fig. 15 Impact of Kendall with FCM

In [58], the CloudRank2 (response time) approach, compared with its ideal ranking (single QoS), achieves a ranking accuracy of 0.8884 NDCG at 50 % matrix density with KRCC for similarity estimation; for CloudRank2 (throughput) the value is 0.8943 NDCG. The proposed CQCRFC approach, compared with its ideal ranking (multiple QoS), achieves 0.9091 NDCG at 50 % matrix density.

Figure 16 shows that the CQCRFC technique improves the ranking accuracy by 0.0148 NDCG compared with CloudRank2 (throughput) and by 0.0207 NDCG compared with CloudRank2 (response time).

Fig. 16 Impact of correlated property of multiple QoS

5 Related work

Cloud computing is gaining popularity. There are numerous works on cloud computing, covering topics such as multimedia communication [13], virtualization [38, 46, 50], load balancing [10], fault tolerance [24, 40], service pricing [55], profit maximization [2, 4], admission control [32], service composition [6], queuing systems [3, 8], and resource monitoring [56]. QoS is also an important concept in cloud computing [5, 34, 41]. Different applications have different QoS requirements [5, 15, 57]. The QoS of cloud services can be compared either from the client side or at the server side (e.g., price, availability, etc.). QoS measures have been used in various approaches such as resource management [11, 15], resource scheduling [9, 20, 23, 30, 48], service measurement [45], data replication [25], swarm optimization [14], and resource optimization [49]. This paper focuses on predicting the optimal service selection using correlated QoS ranking to improve accuracy.

Recommendations are made to a user based on the assessment of items by other users from the same group, with whom the user shares common preferences. If an item has been positively rated by the community, it will be recommended to the user. Collaborative filtering methods are widely adopted in recommender [28] and QoS systems [17]. Two types of collaborative filtering approaches are widely studied: memory based [21, 22, 27, 43] and model based [21, 22, 27, 37, 43]. Model-based approaches group the users in the training database into a small number of classes based on their rating patterns. To predict the rating of a test user on a particular item, these approaches first assign the test user to one of the predefined user classes and then use the rating of that class on the targeted item as the prediction [37, 43]. Algorithms within this category include Bayesian network approaches [27], the aspect model [33, 52], gradient descent [12] and latent class models [52, 53].

Two types of memory-based methods [29] have been studied: user-based [21, 43] and item-based [7, 21, 37, 39]. User-based methods first look for users whose rating styles are similar to that of the active user and then employ the ratings from those similar users to predict the ratings for the active user. Item-based methods share the same idea. The only difference is that user-based methods find similar users for an active user, whereas item-based methods find similar items for each item. The most common collaborative filtering method is the user-based model [7, 43]. To find the similarity between users, Vector Similarity (VS) [16, 39], Cosine Based Similarity [7, 36, 37], KRCC [39, 58], SRCC and PCC [21, 22, 43] have been employed. The present work differs from the preceding work [57, 58, 59] in that the accuracy of QoS-based ranking is improved with the help of the correlation and combination of QoS properties. Considerable effort is still needed to apply collaborative filtering methods to Web service QoS value prediction.
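The user-based scheme described above can be illustrated with a short sketch: PCC is computed over the items two users have in common, and the active user's missing value is predicted from the mean-centred values of the most similar users. This is a generic textbook formulation given under stated assumptions, not the method of any cited work; the names `pcc` and `predict`, the `top_k` parameter and the example data are hypothetical.

```python
import math


def pcc(values_u, values_v):
    """Pearson correlation over the items that both users have values for."""
    common = set(values_u) & set(values_v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(values_u[i] for i in common) / len(common)
    mean_v = sum(values_v[i] for i in common) / len(common)
    cov = sum((values_u[i] - mean_u) * (values_v[i] - mean_v) for i in common)
    var_u = sum((values_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((values_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def predict(active, others, item, top_k=2):
    """Predict the active user's value for `item` from similar users."""
    mean_active = sum(active.values()) / len(active)
    # Keep only positively correlated users who have a value for the item.
    neighbours = [(pcc(active, other), other) for other in others if item in other]
    neighbours = [(sim, other) for sim, other in neighbours if sim > 0]
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    neighbours = neighbours[:top_k]
    if not neighbours:
        return mean_active  # fall back to the active user's own mean
    weighted = sum(sim * (other[item] - sum(other.values()) / len(other))
                   for sim, other in neighbours)
    return mean_active + weighted / sum(sim for sim, _ in neighbours)


# Hypothetical values: the active user has not used item "d".
active_user = {"a": 1.0, "b": 2.0, "c": 3.0}
similar_user = {"a": 2.0, "b": 3.0, "c": 4.0, "d": 5.0}
dissimilar_user = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.0}
prediction = predict(active_user, [similar_user, dissimilar_user], "d")
print(prediction)  # 3.5
```

An item-based variant would follow the same pattern with the roles of users and items exchanged, computing similarity between item columns rather than user rows.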

6 Conclusion and Future Work

In this paper, we propose a correlated QoS ranking algorithm to predict a personalized ranking for service selection for an active user. Six different correlated ranking algorithms are proposed and their accuracy is compared in terms of NDCG. Selecting similar neighbors for an active user is critical to the accuracy of prediction; hence, in this scheme we propose a user selection based on the QoS correlation property with a hybrid Top-K algorithm. Multiple QoS correlation properties are extracted and combined to predict the rank of a cloud service. The experimental results show that our approach improves the accuracy of the QoS ranking prediction.

For future work, we would like to investigate time-aware correlated QoS ranking approaches for cloud services by using data collected from service users, cloud services, and time. Beyond the correlation property of QoS, the scheme can be extended in the future to location-aware ranking prediction. Furthermore, we also plan to detect and handle malicious QoS values provided by users.