1 Introduction

In the cloud computing environment, computation is not performed on local computers; instead, it is carried out by cloud servers located in data centers, which provide infrastructure, software and platforms as Internet-based services. In fact, the goal of cloud computing is to integrate hardware and software as a service accessible to users through the Internet (Aslanpour et al. 2017; Buyya et al. 2010; Ghobaei-Arani et al. 2016, 2017a). The rapid development of cloud computing has led to the publication of many different web services throughout the world (Ghobaei-Arani and Shamsi 2015). Nowadays, service-based applications are one of the newer solutions for building enterprise application systems, and service-oriented architecture (SOA) is seen as a leading method for architecting enterprise solutions. It provides a rapid response to changes in requirements and service adaptation, and it draws on the various services available in the organization. For this reason, it is necessary to identify the key elements of SOA and the restrictions on service composition support, composition verification and automated composition (Simon et al. 2013; Piprani et al. 2013). The quality of service (QoS) of web services refers to different non-functional properties such as response time, availability and reliability. For a composition request, several candidate services may be available that provide similar functionality with different QoS. Based on these non-functional properties, a QoS-aware web service composition engine selects the best candidate services from the service collection (Portchelvi et al. 2012). With the increasing demands of cloud service customers, cloud services have expanded and continue to grow rapidly (Rahmanian et al. 2017). Since the capacity of a data center is limited, distributing the load over globally distributed data centers appears to be a suitable approach for achieving more stable services (Ghobaei-Arani et al. 2017b). 
Most web services are hosted in cloud data centers distributed geographically around the world. These data centers depend on the network to communicate with each other and with cloud users, and network communication can affect the performance of the services provided by the involved data centers. Network QoS is therefore a significant metric for the web service composition problem, and it is important to avoid violating the service-level agreement (SLA) during the composition process (Gholami and Ghobaei-Arani 2015; Jula et al. 2014). The service composition problem is known to be an NP-hard optimization problem in which a larger service is built by integrating existing services. In many cases, web services are deployed by multiple service providers, and no single service can satisfy a user’s needs. Therefore, a service composition algorithm is needed to combine several web services from different providers to fulfill the user’s requirements. In this paper, we apply an evolutionary optimization algorithm, the cuckoo search algorithm (CSA) (Rajabioun 2011), to the web service composition problem in the geographically distributed cloud environment; we call the resulting algorithm CSA-WSC. We evaluate response time, cost, availability and reliability of the composition process as four major QoS criteria. The CSA is a nature-inspired algorithm based on the lifestyle of the cuckoo bird family. It is well suited to settings with incomplete information about the environment and to dynamic, complex or random environments with many uncertainties, such as clouds, and it often outperforms other well-known algorithms such as particle swarm optimization (PSO), ant colony optimization (ACO) and the genetic algorithm (GA).

The main contributions of this research can be summarized as follows:

  • We present a meta-heuristic algorithm called CSA-WSC that reduces the complexity of the approach compared with other algorithms.

  • Our algorithm targets the distributed network environment and considers not only the QoS of the web services but also the QoS of the network.

  • The applied cuckoo algorithm offers further advantages, such as faster convergence, higher precision, the ability to perform both local and global search, and a much lower likelihood of becoming trapped in local optima.

The rest of this paper is organized as follows: In Sect. 2, we focus on a survey of related works. Section 3 provides the necessary backgrounds for the proposal of this paper. In Sect. 4, we describe the proposed method. Section 5 presents an experimental evaluation of the proposal, and we finally present the conclusions and future works in Sect. 6.

2 Related works

This section reviews research on service composition in the cloud environment.

Zhou and Yao (2016) proposed a hybrid artificial bee colony (HABC) algorithm for the optimal selection of QoS-based cloud manufacturing service compositions. The HABC algorithm employs both the probabilistic model of the Archimedean copula estimation of distribution algorithm (ACEDA) and the chaos operators of a global best-guided artificial bee colony to generate offspring individuals, taking into account quality of service and the cloud manufacturing environment. Experimental results have shown that the HABC algorithm finds better solutions than the genetic algorithm, particle swarm optimization and the basic artificial bee colony algorithm.

Yu et al. (2015) presented an ant colony optimization-based algorithm for web service composition called ACO-WSC, which attempts to select cloud combinations that are feasible and use the minimum number of clouds. In the ACO-WSC algorithm, artificial ants travel on a logical digraph to construct cloud combinations. The ACO-WSC algorithm achieves a superior tradeoff between time and quality, and it is a practical solution for deploying in multi-cloud web service provision environments.

Wang et al. (2015) developed a genetic-based approach to the web service composition problem in a geo-distributed cloud environment that simultaneously minimizes SLA violations. To deal with cases where cloud data centers are geographically dispersed, a QoS-based composition model for the cloud network environment is considered.

Seghir and Khababa (2016) proposed a hybrid genetic algorithm (HGA) to solve the QoS-aware cloud service composition problem. The HGA combines two phases to perform the evolutionary search: a genetic algorithm phase and a fruit fly optimization phase. The genetic algorithm phase performs the global search, while the fruit fly optimization phase carries out the local search.

Chen et al. (2016) designed a flexible QoS-aware web service composition (QWSC) method by multi-objective optimization in cloud manufacturing. The QWSC method considers both the QoS performance and the QoS risk constraints as the optimization objectives. Moreover, a \(\upvarepsilon \)-dominance multi-objective evolutionary algorithm (EDMOEA) is developed to solve the large-scale QWSC.

Karimi et al. (2016) proposed a QoS-aware service composition approach in cloud computing using data mining techniques and genetic algorithm. In their proposed approach, a genetic algorithm was used to achieve global optimization about service-level agreement. Moreover, service clustering was used for reducing the search space of the problem, and association rules were used for a composite service based on their histories to enhance service composition efficiency.

Kurdi et al. (2015) proposed a novel combinatorial optimization algorithm for cloud service composition (COM2) that can efficiently utilize multiple clouds. The COM2 algorithm ensures that the cloud with the maximum number of services will always be selected before other clouds, which increases the possibility of fulfilling service requests with minimal overhead.

Liu and Zhang (2016) presented an approach of synergistic elementary service group-based service composition (SESG-SC) for the QoS-aware service composition problem in cloud manufacturing. Their approach relaxes the assumption of a one-to-one mapping between elementary services and subtasks, allowing multiple functionally equivalent elementary services to be freely combined into a synergistic elementary service group (SESG) that performs each subtask collectively, thereby improving the overall QoS and achieving a higher success rate.

Qi et al. (2017) applied a knowledge-based differential evolution (KDE) algorithm to the web service composition optimization problem in cloud environments. Their algorithm accelerates convergence by incorporating structural knowledge. The simulation results indicate the effectiveness of KDE in solving a complex optimization problem with a large-scale solution space compared with the original differential evolution and particle swarm optimization algorithms.

Wang et al. (2013) presented a particle swarm optimization approach with the skyline operator for fast cloud-based web service (CWS) composition in cloud environments. Their approach adopts the skyline operator to prune redundant CWS candidates and then employs particle swarm optimization (PSO) to select CWSs from the remaining candidates, composing single services into a more powerful composite service.

Lartigau et al. (2015) described a method based on QoS evaluation along a geo-perspective using an improved artificial bee colony (ABC) optimization algorithm in cloud manufacturing. The modification of the original ABC initialization gives certain advantages at the start, targeting the neighborhood directly around the shortest transportation path. From a computational efficiency perspective, their proposed algorithm is also considerably faster than PSO and GA optimization.

Huo et al. (2015) proposed the discrete Gbest-guided artificial bee colony (DGABC) algorithm for cloud service composition which simulates the search for the optimal service composition solution through the exploration of bees for food. The DGABC algorithm has advantages in terms of the quality of solution and efficiency compared with other algorithms, especially for large-scale data.

Klein et al. (2014) developed the self-adaptive network-aware approach to service composition in cloud environments. Their approach employs a self-adaptive genetic algorithm which balances the optimization of latency and other QoS as needed, and it handles the QoS of services and the QoS of the network independently, improves the convergence speed and finds compositions with low latency for a given execution policy.

Zhao et al. (2015) presented a systematic approach based on a fuzzy preference model and evolutionary algorithms for the SLA-constrained service composition problem. The authors model this multi-objective optimization problem using the weighted Tchebycheff distance rather than a linear utility function and present a fuzzy preference model for preference representation and weight assignment. They also present two evolutionary algorithms (EA), single_EA and hybrid_EA, that attempt to determine different types of optimization solutions for service composition.

Faruk et al. (2016) presented a heuristic method for QoS-aware cloud service selection using an enhanced genetic particle swarm optimization (GPSO) algorithm, which is widely used to solve large-scale optimization problems. An adaptive non-uniform mutation (ANUM) approach is also proposed to obtain the globally best particle and increase population diversity, thereby overcoming the premature convergence of the GPSO algorithm.

Table 1 sums up some of the most relevant works related to web service composition techniques in cloud environments based on four characteristics: (1) optimization techniques used, (2) QoS criteria, (3) advantages and (4) disadvantages.

Table 1 Survey of works related to web service composition techniques in cloud environment

3 Preliminary

In this section, we first introduce the QoS-based web services, location-based web services and composition services. We then provide a brief overview of the cuckoo search algorithm.

3.1 The QoS-based web services

The QoS of web services describes non-functional properties such as availability, reliability and response time. The QoS of individual services is supplied by service providers, and the values of each QoS criterion are collected from user feedback. In this paper, we focus on four QoS criteria: response time, cost, availability and reliability, as shown in Table 2. The SLA is also defined in terms of these four QoS criteria.

Table 2 Examples of QoS criteria of single and independent services

3.2 Location-oriented web services

Since individual services reside in different data centers, the degree of distribution of these services affects the QoS of the composited services. For example, two individual services located in the same data center behave differently from two services deployed in data centers in Asia and Europe, where the network delay between them in communicating with each other is a significant parameter. Network performance is therefore critical to composited services. There are two types of network latency: the latency between services and the latency between services and users. The network latency between services, denoted by the variable dt1 in Table 3, is mainly determined by the geographical location of the data centers in which those services are deployed. The latency between data centers is measurable and predictable, since the number of data centers per cloud provider is limited and stable; cloud providers can therefore cache the inter-data-center latencies for easy reuse. The network latency between the service provider and the user, denoted by the variable dt2 in Table 3, is mainly determined by the network environment between the user and the services, and it can also be obtained from network feedback (Wang et al. 2015).

3.3 Composited services

An SLA is a contract between users and providers guaranteeing the QoS criteria. To determine whether a service composition can meet the SLA, its quality must be checked by aggregating the QoS criteria of the individual services together with the network QoS. The QoS of a composition service depends on the structure of the composition path. As shown in Fig. 1, there are three composition structures: sequential, parallel and conditional (Wang et al. 2015). Computing the QoS of the sequential structure provides the basis for computing the QoS of the other structures. The aggregation functions for computing the QoS criteria of composition services are shown in Table 3, where \(t_i\) represents the response time, \(p_i\) the price, \(a_i\) the availability and \(r_i\) the reliability of the \(i\mathrm{th}\) service or branch.

Table 3 Aggregation functions for quality of service computation (Wang et al. 2015)
Fig. 1
figure 1

The structure of the composition services. a Sequential. b Parallel. c Conditional
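As a concrete illustration of the aggregation rules summarized in Table 3, the following Python sketch computes the QoS of sequential and parallel structures; the `QoS` field names and the choice to sum prices in the parallel case are illustrative assumptions, not taken verbatim from the table.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass
class QoS:
    time: float          # response time t_i
    price: float         # price p_i
    availability: float  # availability a_i
    reliability: float   # reliability r_i

def aggregate_sequential(parts):
    # sequential: times and prices add up; availability/reliability multiply
    return QoS(
        time=sum(q.time for q in parts),
        price=sum(q.price for q in parts),
        availability=reduce(lambda x, y: x * y, (q.availability for q in parts)),
        reliability=reduce(lambda x, y: x * y, (q.reliability for q in parts)),
    )

def aggregate_parallel(parts):
    # parallel: branches run concurrently, so response time is the slowest branch
    return QoS(
        time=max(q.time for q in parts),
        price=sum(q.price for q in parts),
        availability=reduce(lambda x, y: x * y, (q.availability for q in parts)),
        reliability=reduce(lambda x, y: x * y, (q.reliability for q in parts)),
    )
```

The conditional structure would additionally weight each branch by its execution probability.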

3.4 Cuckoo search algorithm

Cuckoo search algorithms (Rajabioun 2011; Wang et al. 2016a, b; Zhou et al. 2016; Fouladgar and Lotfi 2016) find answers within a certain distance of the optimal solution, called suboptimal solutions. The cuckoo search algorithm is one of the strongest evolutionary algorithms and has a greater ability to find the global optimum than many other evolutionary algorithms. Algorithm 1 provides the pseudo-code of the cuckoo search algorithm. The algorithm starts with an initial population: a population of cuckoos, each with some eggs that will be laid in the nests of host birds.

figure a

Eggs that are similar to the host bird's eggs have a greater chance of growing into mature cuckoos; the other eggs are recognized by the host bird and destroyed. The number of surviving eggs indicates the suitability of the nests in an area: the more eggs that survive in a region, the more attention that region deserves. The fraction of eggs that survive is therefore the quantity that the cuckoo search algorithm (CSA) optimizes, and the cuckoos search for the best place to maximize egg survival. After hatching and growing into adult cuckoos, they form homogeneous groups, and each group selects a place to live. The best place among all groups becomes the destination for the next generation: each group migrates toward that optimal place and settles near it. Based on the number of eggs of each cuckoo and its distance from the optimal place, an egg-laying radius is computed, and each cuckoo then lays its eggs randomly in nests within that radius. This process continues until the optimal place for laying is reached, namely the place where most of the cuckoos gather.
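The lifecycle described above can be sketched as a simple optimization loop. This is an illustrative reading over real-valued habitats, not the authors' exact pseudo-code; `mutate` and `move_toward` are hypothetical stand-ins for egg laying and group migration.

```python
import random

def mutate(habitat, radius=0.5):
    # egg laying: a new candidate within the egg-laying radius (hypothetical operator)
    return [x + random.uniform(-radius, radius) for x in habitat]

def move_toward(habitat, target, step=0.3):
    # group migration: move partway toward the best habitat found so far
    return [x + step * (t - x) for x, t in zip(habitat, target)]

def cuckoo_search(fitness, init_habitat, n_cuckoos=20, iterations=100):
    """Illustrative CSA loop: lay eggs, keep survivors, migrate toward the best."""
    population = [init_habitat() for _ in range(n_cuckoos)]
    best = max(population, key=fitness)
    for _ in range(iterations):
        offspring = [mutate(h) for h in population]
        # eggs too unlike the host's are destroyed: keep the fitter of parent/offspring
        population = [max(pair, key=fitness) for pair in zip(population, offspring)]
        best = max(best, *population, key=fitness)
        population = [move_toward(h, best) for h in population]
    return best
```

For instance, running the loop on a simple sphere function drives the population toward the origin over successive generations.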

Fig. 2
figure 2

Service composer model for cloud computing environments

Fig. 3
figure 3

An example of web service composition

4 The proposed algorithm

In this section, we first model the cloud web service composition problem. After presenting the composition model, a workflow example in cloud computing shows how web service composition serves the tasks of a cloud application. The objective function of the problem is then stated. Finally, the proposed algorithm, CSA-WSC, is described.

4.1 Web service composition model

The abstract model of the web service composer is depicted in Fig. 2. A workflow of user’s requests is specified by the service requester. Service composition request section in the figure is used to serve the given tasks. Cloud combiner determines the candidate cloud web services for the requested tasks. Composition planner selects a subset of candidate cloud web services to execute the tasks. Finally, a service composition sequence for the given workflow is the output of the cloud web service composition model.

Let \(\mathrm{sc}=\{s_{1},s_{2},{\ldots },s_{n}\}\) denote a collection of web services, where \(s_{i }(1\le i\le n)\) denotes the \(i\mathrm{th}\) web service. Each service provider publishes a set of web services, as shown in Fig. 2. A cloud set (CS) is a set of m clouds \(\mathrm{CS}=\{C_{1},C_{2} ,{\ldots },C_{m}\}\) where \({C}_{i}(1\le i \le m)\) denotes the \(i\mathrm{th}\) cloud.

Generally, a cloud web service composition problem can be defined by a triple \(<I,G,W>\), where I is the start point provided by the customer’s request (initial interface), G is the goal interface derived from the customer’s request, and W denotes the set of available web services in the clouds. The solution to the problem is a sequence of web services ordered by their usage, such that the solution sequence \(\pi \subseteq W\).

Hence, a composite service is the process of selecting a subset of provided web services for tasks. As an example, Fig. 3 shows that several similar services are available for each task.
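A minimal sketch of the \(<I,G,W>\) formulation: services are modeled as interface transformers, and a breadth-first search finds a usage-ordered sequence \(\pi\) leading from I to G. The dictionary encoding and the single-interface assumption are simplifications for illustration; a real composer would also rank candidate services by QoS.

```python
from collections import deque

def compose(start, goal, services):
    """Find a usage-ordered sequence of services leading from interface
    `start` (I) to interface `goal` (G). `services` maps a service name to
    a (required_interface, provided_interface) pair."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        iface, path = queue.popleft()
        if iface == goal:
            return path  # the solution sequence pi, a subset of W
        for name, (required, provided) in services.items():
            if required == iface and provided not in seen:
                seen.add(provided)
                queue.append((provided, path + [name]))
    return None  # no composition satisfies the request
```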

4.2 Objective function of the WSC problem

According to the described problem and the aggregation functions for quality of service given above, the objective function of the web service composition problem can be formulated as follows:

$$\begin{aligned}&\mathrm{Max}\mathop {\sum }\limits _{i=1}^x \mathop {\sum }\limits _{j=1}^k {\mathrm{Quality}}_j \left( {\pi _i } \right) .w_j \end{aligned}$$
(1)
$$\begin{aligned}&\mathrm{Subject~~to}:\nonumber \\&\quad \mathrm{Quality}_j \left( {\pi _i } \right) \le \mathrm{SLA}_{i,j} \end{aligned}$$
(2)
$$\begin{aligned}&0\le w_j \le 1 \end{aligned}$$
(3)
$$\begin{aligned}&\mathop {\sum }\limits _{j=1}^k w_j =1 \end{aligned}$$
(4)

where \(\pi _i \) is a subset of candidate web services that are chosen in a sequence to execute tasks; \(\mathrm{Quality}_j \left( {\pi _i } \right) \) function calculates the \(j\mathrm{th}\) qualitative parameter that is specified for the \(i\mathrm{th}\) workflow by the user. \(\mathrm{SLA}_{i,j}\, (1\le i\le x\) and \(1\le j\le k)\) denotes the \(j\mathrm{th}\) qualitative constraint for the \(i\mathrm{th}\) workflow that is specified by the user. The weight of the \(j\mathrm{th}\) qualitative parameter is denoted by \(w_{j}\, (1\le j\le k)\).
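Equations (1)-(4) can be read as the following sketch, where `quality[i][j]` stands for \(\mathrm{Quality}_j(\pi_i)\) and compositions violating Eq. (2) are discarded; the names and the use of a \(-\infty\) penalty for infeasible solutions are illustrative assumptions.

```python
def objective(quality, weights, sla):
    """quality[i][j]: Quality_j(pi_i) for workflow i; sla[i][j]: the SLA
    bound on that value (Eq. 2); weights w_j lie in [0, 1] and sum to 1
    (Eqs. 3-4)."""
    assert abs(sum(weights) - 1.0) < 1e-9  # Eq. (4)
    for i, row in enumerate(quality):
        if any(q > s for q, s in zip(row, sla[i])):
            return float("-inf")  # infeasible: an SLA constraint is violated
    # Eq. (1): weighted sum over workflows and quality criteria
    return sum(q * w for row in quality for q, w in zip(row, weights))
```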

4.3 The CSA-WSC

In this section, the proposed algorithm, i.e. the cuckoo search algorithm for web service composition (CSA-WSC), is provided step by step based on the structure of cuckoo search algorithm. The steps of the proposed algorithm for selecting web services are shown in Fig. 4.

Fig. 4
figure 4

The flowchart of the CSA-WSC

4.3.1 The first step: the production of the initial habitat of cuckoo based on the needed services

To solve the given problem, the values of the problem variables must be arranged in an array. In the genetic algorithm (GA) and particle swarm optimization (PSO), these arrays are known as chromosomes and particle positions, respectively; in the cuckoo search algorithm, the array is called a habitat. In the proposed algorithm, two criteria are considered for each cuckoo, and the candidate services are stored in a table together with their indices. The initial population of cuckoos is shaped according to the services needed for each request. To illustrate the production of the initial population and the proposed algorithm, consider an example. Suppose there are 30 types of services; each request may need between 1 and 30 services, and each service type has 200 atomic services. If a request needs six types of services, the cuckoo algorithm parameters are determined as in Table 4. Based on the parameters of this example, the length of each cuckoo in the population is 6, as shown in Table 5.

Table 4 The parameters initialization of cuckoo search algorithm
Table 5 The sample of request structure

If the optimization problem requires finding \(N_\mathrm{var}\) values, a habitat is an array of length \(N_\mathrm{var}\) representing the current position of a cuckoo. It is defined as \({\mathrm{Habitat}=[X}_\mathrm{1} {,X}_\mathrm{2} {,X}_\mathrm{3},....,X_{N_\mathrm{var}}]\), where each X corresponds to one component of a cuckoo’s position. Each cuckoo has two parts: its habitat and its QoS. Table 6 shows the position of a sample cuckoo. Each X is the index of a service among the atomic services for the corresponding task of the request.

To improve solution quality and convergence speed, one part of the initial population is produced based on the concept of the horizon line (skyline) according to Eq. (5), and the rest is produced randomly, where P is the resulting proportion, N is the length of the habitat array, and SL is the number of services selected into the skyline set.

$$\begin{aligned} P = \mathrm{SL} / N + (N - \mathrm{SL}) / N \end{aligned}$$
(5)

For example, if the number of cuckoo habitat variables X is six, then half of the services (three) are specified by the skyline set and the other three are selected randomly. In other words, three-sixths of the initial population is generated based on the skyline concept, and the remaining three-sixths is generated randomly.
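The mixed initialization described above might be sketched as follows, where part of each habitat is drawn from the skyline set and the rest uniformly at random; all names are illustrative.

```python
import random

def init_habitat(skyline, candidates, n):
    """Build one habitat of length n: up to half its services come from the
    skyline (non-dominated) set, the rest are drawn uniformly at random."""
    k = min(len(skyline), n // 2)
    habitat = random.sample(skyline, k)
    habitat += [random.choice(candidates) for _ in range(n - k)]
    random.shuffle(habitat)  # avoid a fixed skyline/random ordering
    return habitat
```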

Definition 1

(Dominance) For two individual services \(S_1 \) and \(S_2 \) in the service set, \(S_1 \) is said to dominate \(S_2 \) when the following conditions are met:

$$\begin{aligned}&\left\{ {\forall \left( {q_{i,1}^{-} ,q_{i,2}^{-} } \right) |q_{i,1}^{-} \le q_{i,2}^{-} } \right\} \wedge \left\{ {\forall \left( {q_{i,1}^+ ,q_{i,2}^+ } \right) |q_{i,1}^+ \ge q_{i,2}^+ } \right\} \end{aligned}$$
(6)
$$\begin{aligned}&\left\{ \exists \left( {q_{i,1}^{-} ,q_{i,2}^{-} } \right) |q_{i,1}^{-} <q_{i,2}^{-} \right\} \vee \left\{ {\exists \left( {q_{i,1}^+ ,q_{i,2}^+ } \right) |q_{i,1}^+ >q_{i,2}^+ } \right\} \end{aligned}$$
(7)

where \(q_{i,j}^{-} \) is the ith negative QoS criterion of atomic service \(S_j\) and \(q_{i,j}^+ \) is the ith positive QoS criterion of atomic service \(S_j\). An example of the dominance definition is shown in Figs. 5, 6 and 7 and the corresponding Tables 7, 8 and 9. Four criteria are used to cover the quality of service: two positive QoS criteria, reliability and availability, and two negative QoS criteria, cost and response time.
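Equations (6) and (7) translate directly into a Pareto-dominance check; the dictionary keys below are illustrative stand-ins for the four criteria.

```python
def dominates(s1, s2,
              negative=("cost", "time"),
              positive=("availability", "reliability")):
    """Eqs. (6)-(7): s1 dominates s2 iff it is no worse on every criterion
    and strictly better on at least one."""
    no_worse = (all(s1[q] <= s2[q] for q in negative) and
                all(s1[q] >= s2[q] for q in positive))
    strictly_better = (any(s1[q] < s2[q] for q in negative) or
                       any(s1[q] > s2[q] for q in positive))
    return no_worse and strictly_better
```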

According to Fig. 5, individual service \(S_5 \) dominates individual service \(S_3 \), whereas individual service \(S_2\) is not dominated by any other individual service; that is, \(S_2\) is non-dominated.

According to Fig. 6, individual service \(S_1\) is dominated by individual service \(S_3\), whereas individual service \(S_5\) is not dominated by any other individual service; in other words, \(S_5\) is non-dominated. We can then combine all four QoS criteria and obtain the averages of the positive and negative QoS criteria in Fig. 7.

Fig. 5
figure 5

Weight of the positive QoS criteria

Fig. 6
figure 6

Reverse weight of negative QoS criteria

Table 6 The sample of the cuckoo structure to sample of request structure Table 5
Fig. 7
figure 7

The average of positive and negative QoS criteria

Table 7 The services weight for each criterion
Table 8 Reverse the negative weight
Table 9 Average of positive and negative QoS criteria

Definition 2

(Horizon line set) It is the subset of services in the service set that are not dominated by any other available service. For example, according to Fig. 7, the horizon line set of the positive values is \((S_2 , S_4 , S_6 )\), the horizon line set of the negative values is \((S_2 , S_4 , S_5 )\), and the horizon line set over all four values is \((S_5 , S_2 , S_4 )\).

The computational cost of the horizon line set therefore increases as the number of individual services grows.

4.3.2 The second step: the evaluation of initial population using fitness function

The fitness function is used to evaluate the suitability of the current habitat. Since each cuckoo position is assigned an adaptability value based on the SLA, we consider two classes of QoS criteria: positive and negative. Increasing positive QoS criteria such as availability and reliability is beneficial, while decreasing negative QoS criteria such as time and cost is also important for users. The fitness function should therefore reward increases in positive QoS criteria and decreases in negative ones. The fitness function also needs to reflect user priorities, since users prefer different QoS criteria; for example, some users prefer services with higher availability and lower response time in the SLA. To calculate the fitness value, the values of the QoS criteria are normalized according to Eqs. (8) and (9) for negative and positive QoS criteria, respectively.

$$\begin{aligned}&\zeta _i^{-} \left( \mathrm{CS} \right) =\frac{Sq_i^{-} -q_i^{-} \left( \mathrm{CS} \right) }{Sq_i^{-} } \end{aligned}$$
(8)
$$\begin{aligned}&\zeta _i^+ \left( \mathrm{CS} \right) =\frac{q_i^+ \left( \mathrm{CS} \right) -Sq_i^+ }{Sq_i^+ } \end{aligned}$$
(9)

where \(\zeta _i^{-} \) and \(\zeta _i^+ \) are the normalized values of the ith QoS criterion of the composited services, \(q_i \) is the ith QoS criterion, and \(Sq_i^{-} \) and \(Sq_i^+ \) are the corresponding QoS constraints specified in the SLA. To avoid SLA violations, positive QoS criteria should be increased while negative QoS criteria are decreased. We also need to consider the user’s preference in the fitness value calculation; we use \(\alpha _i \) as a parameter to reflect it. A higher value of \(\alpha _i \) indicates a higher priority for that QoS criterion. Accordingly, the fitness function is expressed by Eq. (10):

$$\begin{aligned}&f\left( \mathrm{cs} \right) =\mathop {\sum }\limits _{i=1}^o \alpha _i \times \zeta _i \left( \mathrm{cs} \right) \end{aligned}$$
(10)
$$\begin{aligned}&\mathop {\sum }\limits _{i=1}^o \alpha _i\, =1 \end{aligned}$$
(11)

where o is the number of QoS criteria. The aim of the proposed algorithm is to find composited services with a high fitness value.
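Equations (8)-(11) can be sketched as follows, assuming each criterion is normalized by its own SLA bound; the function and parameter names are illustrative.

```python
def normalize_negative(q, sla):
    # Eq. (8): how far a negative criterion (e.g. cost) stays below its SLA bound
    return (sla - q) / sla

def normalize_positive(q, sla):
    # Eq. (9): how far a positive criterion (e.g. availability) exceeds its bound
    return (q - sla) / sla

def fitness(zetas, alphas):
    # Eq. (10): preference-weighted sum; Eq. (11): the weights sum to one
    assert abs(sum(alphas) - 1.0) < 1e-9
    return sum(a * z for a, z in zip(alphas, zetas))
```

Both normalized values are positive exactly when the composition satisfies the corresponding SLA constraint with some margin, so a higher fitness indicates a safer composition.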

4.3.3 The third step: movement toward optimal cuckoo with applying Levy flight

Using the fitness function, the value of each cuckoo, that is, the quality of the service composition it encodes, is measured. In this step, the most valuable cuckoo is chosen first; the other cuckoos then move toward it in limited steps using Levy flights. This means that the selected services have the best values with respect to cost and response time as the two negative QoS criteria and availability and reliability as the two positive QoS criteria. Figure 8 shows a sample Levy flight.

Fig. 8
figure 8

Levy flight to move toward to the optimal cuckoo

Note that if the less valuable cuckoos moved to the exact position of the optimal cuckoo, no new responses would be produced, making the movement and exchange useless. Each Levy flight function is given by Eq. (12):

$$\begin{aligned}&{L}\left( {{s}.{\gamma }.{\mu }} \right) =\sqrt{\frac{{\gamma }}{2{\pi }}}\frac{1}{\left( {{s}-{\mu }} \right) ^{\frac{3}{2}}}\nonumber \\&\quad \mathrm{exp}\left( {-\frac{{\gamma }}{2\left( {{s}-{\mu }} \right) }} \right) 0<\mu<s<\infty \end{aligned}$$
(12)

where \(\mu \) is the minimum step and \(\gamma \) is the scale parameter. If \(s\rightarrow \infty \):

$$\begin{aligned}&{L}\left( {{s}.{\gamma }.{\mu }} \right) =\sqrt{\frac{{\gamma }}{2{\pi }}}\frac{1}{\left( {{s}-{\mu }} \right) ^{\frac{3}{2}}} \nonumber \\&s\rightarrow \infty \end{aligned}$$
(13)

We use Eq. (14) to generate the random step s.

$$\begin{aligned} {s}=\frac{{u}}{\left| {v} \right| ^{1/{\beta }}} \end{aligned}$$
(14)

where u and v are normally distributed random variables:

$$\begin{aligned} {u}\sim {N}\left( {0,{\sigma }_{{u}}^2 } \right) ,\quad {v}\sim {N}\left( {0,{\sigma }_{{v}}^2 } \right) ,\quad {\sigma }_{{v}} =1 \end{aligned}$$
(15)

Subject to:

$$\begin{aligned} {\sigma }_{u} =\left\{ {\frac{{{\Gamma } }\left( {1+{\beta }} \right) \sin \left( {{{\pi \beta }}/2} \right) }{{{\Gamma } }\left( {{\left( {1+{\beta }} \right) }/2} \right) {\beta }2^{{\left( {{\beta }-1} \right) }/2}}} \right\} ^{1/{\beta }} \end{aligned}$$
(16)

where \({\varGamma } \) is the gamma function. The distribution of s obtained from Eq. (14) is a Levy distribution for \(|s|\ge |s_0 |\), where \(s_0\) is the smallest step. After a Levy flight, if the new position has a better fitness, it replaces the current position; otherwise, the previous position is preserved.
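Equations (14)-(16) correspond to Mantegna's algorithm for generating Levy-distributed step lengths; the sketch below assumes \(\sigma_v = 1\), the standard choice.

```python
import math
import random

def levy_step(beta=1.5):
    """Draw one Levy-distributed step via Mantegna's algorithm, matching
    Eqs. (14)-(16): s = u / |v|^(1/beta), u ~ N(0, sigma_u^2), v ~ N(0, 1)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
               (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
               ) ** (1 / beta)  # Eq. (16)
    u = random.gauss(0, sigma_u)
    v = random.gauss(0, 1)  # sigma_v = 1 assumed
    return u / abs(v) ** (1 / beta)
```

The heavy tail of the resulting distribution occasionally produces very long jumps, which is what lets the cuckoos escape local optima.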

4.3.4 The fourth step: creation random movement pattern for all cuckoo’s population

To achieve a better distribution between the lower bound (LB) and the upper bound (UB) and reach more promising answers, a random movement with Levy flight is applied to the entire population. In this step, the value of S is calculated by Eq. (17): two cuckoo positions are selected randomly, the difference between them is calculated, and the result is multiplied by a random value.

$$\begin{aligned} S= & {} \mathrm{rand}.\left( \mathrm{nest}\left( \mathrm{{randperml}\left( n \right) .:} \right) \right. \nonumber \\&\left. -\mathrm{nest}\left( {\mathrm{randperm2}\left( n \right) .:} \right) \right) \nonumber \\&\mathrm{nest}^{t+1}=\mathrm{nest}^{t}+S*P \end{aligned}$$
(17)

where nest is the position matrix of all cuckoos, \(\mathrm{nest}^{t}\) is the current position and \(\mathrm{nest}^{t+1}\) is the next position. To calculate the next position, the value of P is obtained from Eq. (18):

$$\begin{aligned} P=\left\{ { \begin{array}{ll} 1&{}\quad \mathrm{if \,\,rand}\,<P_a \\ 0&{}\quad \mathrm{if \,\,rand}\,\ge P_a \\ \end{array}} \right. \end{aligned}$$
(18)

The value of P is determined by the initial parameter \(P_a\): a random number rand is generated; if it is greater than or equal to \(P_a\), the previous position is maintained, whereas if it is smaller than \(P_a\), the next position is obtained from S.
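The population-wide random walk of Eqs. (17) and (18) can be sketched as follows. This is an illustrative sketch assuming positions are stored as a list of coordinate lists; the function and parameter names are ours, not from the paper.

```python
import random

def random_walk(nest, pa=0.25):
    """One population-wide random walk per Eqs. (17)-(18).

    nest: list of positions, each a list of floats.
    Each coordinate is perturbed by the scaled difference of two
    randomly permuted rows of the population; the move is applied
    only when a uniform draw falls below pa (i.e., P = 1).
    """
    n, d = len(nest), len(nest[0])
    perm1 = random.sample(range(n), n)  # randperm1(n)
    perm2 = random.sample(range(n), n)  # randperm2(n)
    new_nest = []
    for i in range(n):
        row = []
        for j in range(d):
            s = random.random() * (nest[perm1[i]][j] - nest[perm2[i]][j])
            p = 1 if random.random() < pa else 0  # Eq. (18)
            row.append(nest[i][j] + s * p)       # Eq. (17)
        new_nest.append(row)
    return new_nest
```

With `pa = 0` every P is 0 and the population is left unchanged, matching the text: positions move only when the random draw falls below \(P_a\).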

4.3.5 The fifth step: selection of the best cuckoo

The best cuckoo is selected as the response of the generation. After several iterations, the whole cuckoo population converges toward the point with the best fitness, where the fewest eggs are lost. The best position corresponds to the set of services selected to serve the request.

5 Performance evaluation

In this section, we evaluate the effectiveness of the proposed CSA-WSC for composing web services in distributed cloud environments. Web services are selected, composed, and delivered to the applicant as a service set. The major role of the proposed algorithm is the selection of the best web services for a workflow based on the QoS criteria. First, we describe the performance criteria and the simulation settings, and then we present and discuss the experimental results.

5.1 Performance criteria

We applied the following QoS performance criteria to compare the proposed algorithm with the other algorithms:

Response time is the amount of time between the start of the request and the receipt of the response by the user from the cloud.

Cost is the amount of money the user pays for each requested virtual machine, based on the amount of memory, processing, and bandwidth consumed. It is calculated by Eq. (19):

$$\begin{aligned} \mathrm{Cost}=\sum _{i=1}^K {C_i \times } T_i \end{aligned}$$
(19)

where K is the number of virtual machines allocated to user requests, \(C_i\) is the cost of a virtual machine, and \(T_i\) is the time for which the user can use the virtual machine.
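Eq. (19) amounts to a simple sum of per-VM products. A minimal sketch (the function name and the pair representation are our own choices):

```python
def total_cost(vms):
    """Eq. (19): sum of C_i * T_i over the K allocated VMs.

    vms: iterable of (cost_per_time_unit, time_used) pairs,
    one per allocated virtual machine.
    """
    return sum(c * t for c, t in vms)
```

For example, two VMs priced at 0.10 and 0.20 per hour, used for 5 and 2 hours respectively, incur a total cost of 0.9.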

Reliability is an indicator of the successful running of a task by the virtual machine in a datacenter and is expressed by Eq. (20) as follows (Koren and Krishna 2010):

$$\begin{aligned} \mathrm{Reliability (RE)} V_k =\frac{C_k }{A_k } \end{aligned}$$
(20)

where \(A_k \) is the number of tasks accepted by a virtual machine \(V_k \), and \(C_k\) is the number of tasks completed by the virtual machine of \(V_k\) in a time limit T.

Availability is the possibility of accessing the virtual machines requested by a user from a datacenter. Suppose \(V_1, V_2, V_3, \ldots , V_n\) are virtual machines in a datacenter; for any \(k=1,2,3,\ldots ,n\) the percentage availability for a virtual machine of a datacenter is calculated using Eq. (21) as follows (Bauer and Adams 2012):

$$\begin{aligned} \mathrm{Availability (AV)} V_k =\frac{\mathrm{MTTF}_k }{\mathrm{MTBF}_k }=\frac{\mathrm{MTTF}_k }{\mathrm{MTTF}_k +\mathrm{MTTR}_k } \end{aligned}$$
(21)

where MTTF is the mean time for which the resource works correctly without failure; MTTR is mean time to repair the resource after failure, and MTBF is the mean time between two failures in a resource.
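Eqs. (20) and (21) are straightforward ratios; a sketch of both per-VM metrics (helper names are illustrative, not from the paper):

```python
def reliability(completed, accepted):
    """Eq. (20): RE(V_k) = C_k / A_k, the fraction of accepted
    tasks the VM completed within the time limit T."""
    return completed / accepted if accepted else 0.0

def availability(mttf, mttr):
    """Eq. (21): AV(V_k) = MTTF / (MTTF + MTTR) = MTTF / MTBF."""
    return mttf / (mttf + mttr)
```

For instance, a VM that completes 8 of 10 accepted tasks has reliability 0.8, and a resource with MTTF = 90 h and MTTR = 10 h has availability 0.9.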

Note that for any datacenter, the average availability, the average reliability, the average cost and the average response time can be computed.

5.2 Experimental setup

The experiments presented in this section were developed using the CloudSim toolkit (Calheiros et al. 2011). The structure of the datacenters used in the simulation is shown in Table 10. Each datacenter contains one host, whose specification is detailed in Table 11. The more powerful the host hardware, the higher the cost of accessing the resources of the virtual machines on that host.

Table 10 The datacenters specification
Table 11 The host specification in a datacenter

Since response time is one of the effective criteria, the latency between a user and a datacenter is set to a random value with a normal distribution based on the distances between the users, the datacenters and the service sets, as shown in Table 12. Furthermore, parameter settings for the algorithms are provided in Table 13. The values of the four QoS criteria of the datacenters (response time, cost, availability and reliability), produced at the start of the simulation, are shown in Table 14.

Table 12 Values of communication delay time between user, datacenter and service set
Table 13 Parameter settings of the algorithms
Table 14 Values of four QoS criteria of datacenters

5.3 Experimental analysis

To evaluate the proposed algorithm, the experimental evaluation comprises three different scenarios according to Table 15. We compared the proposed algorithm with three of the most effective WSC algorithms: a genetic-based algorithm that considers the network delay (GS-S-Net) (Wang et al. 2015); a genetic particle swarm optimization algorithm (GAPSO-WSC) (Faruk et al. 2016), which discovers optimal regions of complex search spaces via the collaboration of individuals in a swarm of particles; and a greedy-based algorithm for web service composition (Greedy). In the Greedy algorithm, the user's region is first determined; then, the datacenter with the most free resources among those available is chosen and allocated to the user. The simulation results for the QoS criteria under the CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms are reported for each scenario.

Table 15 Proposed scenarios for evaluating algorithms

5.3.1 The first scenario

In this scenario, the number of sample services is set to 10, and the size of the service set is set to 200, 400 and 500, respectively. First, the effect of the number of atomic services per set on the fitness values of the GS-S-Net, GAPSO-WSC and CSA-WSC algorithms is evaluated (see Fig. 9). The fitness values of CSA-WSC outperform those of GS-S-Net and GAPSO-WSC, and CSA-WSC is more consistent in finding solutions. It can also be seen that CSA-WSC has less deviation in the fitness function because it converges faster than GS-S-Net, so there is less risk of missing the expected fitness value. GAPSO-WSC performs better than GS-S-Net because it combines PSO with the genetic algorithm, which provides better exploitation and exploration of the search space. Since an increase in fitness corresponds to a decrease in cost, and Fig. 9 shows that CSA-WSC achieves better fitness than the other algorithms, the cost incurred by applying CSA-WSC is reduced (see Fig. 10).

Fig. 9
figure 9

The comparison of fitness values in the CSA-WSC, GAPSO-WSC and GS-S-Net algorithms in the first scenario

Based on the QoS criteria discussed in Sect. 5.1, Fig. 10 shows the cost of the CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms with the number of requests from 1000 to 10,000.

Fig. 10
figure 10

The cost of CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the first scenario

According to Fig. 10, it is clear that the cost of the Greedy algorithm is higher than that of the other algorithms, although it is based on a very simple idea of finding a suboptimal solution without any complexity. In contrast, the proposed CSA-WSC decreases the WSC cost to an acceptable level: the cuckoos are able to find better solutions than the other algorithms, and thus the incurred cost of allocation and service composition is reduced. According to Fig. 10, as the number of requests increases, the cost reduction becomes more significant. The second most important criterion is service availability. Figure 11 shows the providers' availability over different numbers of requests for the first scenario.

Fig. 11
figure 11

The service availability of the CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the first scenario

According to Fig. 11, the providers' availability under CSA-WSC is higher than under the other algorithms, which indicates that the proposed CSA-WSC provides a more appropriate distribution of requests over the web services. Requests are better distributed among the service sets, which leads to an increase in availability. The reliability, or trustworthiness, of the service provider is another important criterion. Figure 12 shows the service providers' reliability under the different algorithms for the first scenario.

Fig. 12
figure 12

The service reliability of the CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the first scenario

Fig. 13
figure 13

Average request response time of the CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the first scenario

Figure 12 indicates that CSA-WSC achieves higher reliability. When a service is selected appropriately, the number of failed or uncompleted tasks at a provider is reduced; a low number of failed tasks means that the reliability of the provider is high. Another important criterion is the response time of requests. We executed each simulation five times, and the average of the results is provided in the figures. The results in Fig. 13 show that the Greedy algorithm has a lower response time and responds quickly. However, as the number of requests increases, the execution time of the Greedy algorithm grows faster. GS-S-Net and CSA-WSC have higher execution times for lower numbers of requests; in contrast, as the number of requests increases, the cost incurred by the Greedy algorithm becomes more significant, while GS-S-Net and CSA-WSC are still able to choose the best composition of web services despite the growing number of possible solutions. We can therefore conclude that a better service composition matters more than execution time, since an inaccurate service composition results in higher cost, lower reliability and/or lower availability. Furthermore, Fig. 14 illustrates the box plots of the different algorithms for the different QoS criteria. The median line of the positive QoS criteria (availability and reliability) for the proposed CSA-WSC is higher than for the comparison algorithms, which shows improvement in these terms, and the median line of the negative QoS criteria (cost and time) is lower. Thus, the ranges of all quartiles for all QoS criteria confirm that the proposed algorithm outperforms the other algorithms.

Fig. 14
figure 14

Boxplot of different algorithms for different QoS criteria in the first scenario

5.3.2 The second scenario

In this scenario, the number of sample services is set to 30, and the size of the service set is 200, 400 and 500. Figure 15 shows the fitness values of the CSA-WSC, GS-S-Net and GAPSO-WSC algorithms over different numbers of atomic services per set. Due to the fast convergence of CSA-WSC, its fitness is better than that of GAPSO-WSC and GS-S-Net. Since an increase in fitness corresponds to a decrease in cost, and the proposed CSA-WSC provides better fitness than the other algorithms, the cost incurred by applying CSA-WSC is reduced (see Fig. 15).

Fig. 15
figure 15

The comparison of the fitness values in the CSA-WSC, GAPSO-WSC and GS-S-Net algorithms in the second scenario

Figure 16 shows the cost of the CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms for numbers of requests between 1000 and 10,000. As seen in Fig. 16, the cost incurred by the Greedy algorithm is higher than by the other algorithms. Compared with GAPSO-WSC and GS-S-Net, CSA-WSC decreases the allocation and service composition cost: the accurate composition of web services by the cuckoos reduces the incurred costs. Figure 16 also illustrates that the larger the number of requests, the more considerable the cost reduction of the proposed CSA-WSC compared to the other algorithms. The next criterion is the availability of the service sets. Figure 17 shows the providers' availability over the whole second scenario.

Fig. 16
figure 16

The cost of CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the second scenario

Fig. 17
figure 17

The service availability of CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the second scenario

According to Fig. 17, the providers' availability under CSA-WSC is higher than under the other algorithms, which indicates that the proposed algorithm provides a more appropriate distribution of requests. Requests are better distributed among the service sets; decreasing the busy time of the service sets and increasing their availability depend on this more suitable distribution. The third important criterion is service provider reliability. Figure 18 shows the providers' service reliability over the whole second scenario.

Fig. 18
figure 18

The service reliability of CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the second scenario

Figure 18 indicates that the reliability rate of the CSA-WSC algorithm is higher. When a service is selected appropriately, failed or uncompleted tasks occur less often at a provider; a reduced number of failed tasks means that the reliability of the provider is increased. Another important criterion is request response time. Figure 19 shows the average request response time of the different algorithms over different numbers of requests.

Fig. 19
figure 19

Average request response time of the CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the second scenario

Figure 19 shows that the Greedy algorithm has a lower response time and responds quickly for cases with fewer than three thousand requests. However, an increase in the number of requests results in a higher response time for the Greedy algorithm compared to the other algorithms. The weaker performance of the metaheuristic algorithms for small numbers of requests stems from the selection overhead of the cuckoo search and genetic algorithms; as the number of requests increases, they select better web service composition sets. We can thus conclude that composition quality matters more than execution time, since an inaccurate web service composition for cloud applications results in higher cost, lower reliability and/or lower availability. Figure 20 depicts the box plots of the different algorithms for the different QoS criteria. The median line of the positive QoS criteria for the proposed CSA-WSC is higher, while the median line of the negative QoS criteria is lower, compared to the GAPSO-WSC, GS-S-Net and Greedy algorithms. Therefore, the ranges of the upper and lower quartiles of the QoS criteria show that CSA-WSC outperforms the other algorithms.

Fig. 20
figure 20

Boxplot of different algorithms for different QoS criteria in the second scenario

5.3.3 The third scenario

Similarly to the other scenarios, the number of sample services is set to 60 and the size of the service set to 200, 400 and 500, respectively. To compare the performance of the GAPSO-WSC, GS-S-Net and CSA-WSC algorithms, the fitness values are evaluated over different numbers of atomic services per set in Fig. 21. Due to the faster convergence of CSA-WSC, the fitness values generated by this algorithm are better than those of GAPSO-WSC and GS-S-Net. Given that the task of the fitness function is to minimize the cost, Fig. 22 indicates that CSA-WSC reduces the incurred cost more than GAPSO-WSC and GS-S-Net.

Fig. 21
figure 21

The comparison of fitness values in the CSA-WSC, GAPSO-WSC and GS-S-Net algorithms in the third scenario

Given the criteria discussed at the beginning of the evaluation, Fig. 22 shows the cost of the CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms for numbers of requests between 1000 and 10,000. According to Fig. 22, the cost of the Greedy algorithm is higher than that of the other algorithms, and the gap becomes more significant as the number of requests increases, since the algorithm then faces a wider pool of candidate solutions. Compared with GAPSO-WSC and GS-S-Net, the proposed CSA-WSC reduces the total incurred cost. The reason is its faster convergence, which yields better fitness values and thus more optimal solutions.

Fig. 22
figure 22

The cost of CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the third scenario

According to Fig. 22, when the number of requests increases, the cost also increases significantly. However, the proposed algorithm is more stable in finding near-optimal solutions as the number of requests grows; consequently, the costs incurred by the proposed CSA-WSC are much lower than those of the other algorithms when the number of requests is high. Service availability, another important criterion, is also better for the proposed CSA-WSC than for the other algorithms (see Fig. 23). This shows that requests are better distributed among the service sets; decreasing the busy time of the service sets and increasing their availability depend on this more suitable distribution.

Fig. 23
figure 23

The service availability of CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the third scenario

Fig. 24
figure 24

The service reliability of CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the third scenario

Fig. 25
figure 25

Average request response time of the CSA-WSC, GAPSO-WSC, GS-S-Net and Greedy algorithms in the third scenario

Fig. 26
figure 26

Boxplot of different algorithms for different QoS criteria in the third scenario

Figure 24 shows the providers' service reliability in the third scenario. It indicates that the reliability of CSA-WSC is higher: when a service set is selected appropriately, failed or incomplete tasks occur less often, and a reduced number of failed tasks means that the reliability of the provider is increased. Figure 25 shows the request response time of the different algorithms over different numbers of requests.

The results in Fig. 25 show that the increase in the number of services results in higher response times for the GAPSO-WSC, GS-S-Net and Greedy algorithms, while the proposed CSA-WSC is much less affected by the number of requests; it works better because it finds better solutions. CSA-WSC handles a very large pool of solutions as well as a small one, and its fitness convergence is considerably faster than that of the other algorithms. Figure 26 illustrates the box plots of the different algorithms for the different QoS criteria. The median line of the positive QoS criteria for the proposed CSA-WSC is higher, while the median line of the negative QoS criteria is lower, compared to the GAPSO-WSC, GS-S-Net and Greedy algorithms. Thus, the ranges of the upper and lower quartiles of the QoS criteria show that CSA-WSC outperforms the other algorithms.

5.3.4 Total comparison

In this section, we sum up the evaluations of the three preceding scenarios by assessing the superiority of the algorithms with respect to quality of service.

As mentioned in the previous subsections, the convergence speed of the cuckoos is very effective in finding near-optimal solutions. Here, we first evaluate the convergence speed of the proposed CSA-WSC in the three scenarios. Figure 27 illustrates the fitness values of the three scenarios over the generations. The fitness values for the first scenario grow faster than for the other scenarios, which suggests that the problems in this scenario were easier to solve. The fitness values in the second scenario also increase faster than in the third scenario; the fitness curves thus indicate that the most challenging scenario is the last one.

According to Fig. 27, the proposed CSA-WSC converges at a faster rate than the baseline algorithms. Besides convergence speed, the proposed algorithm provides better fitness in all scenarios. Thus, Fig. 27 confirms that the cuckoos provide a fast increase in the fitness value over the generations, so the convergence of the algorithm to the target solution is quite fast.

Fig. 27
figure 27

Performance of the CSA-WSC, GAPSO-WSC and GS-S-Net algorithms over generations

Fig. 28
figure 28

Average request response time for 10, 30 and 60 sample services

According to Fig. 28, it is clear that the average response time of the Greedy algorithm is higher than that of the other algorithms. Comparing the values of the GAPSO-WSC, GS-S-Net and CSA-WSC algorithms in Fig. 28 shows that, when service diversity is low and the service sets are small, there is only a slight time difference between the algorithms. As service diversity increases, the response time of CSA-WSC decreases compared to GAPSO-WSC and GS-S-Net, which demonstrates the strong performance of this algorithm under high service diversity.

Fig. 29
figure 29

Average cost of service composition for 10, 30 and 60 sample services

Table 16 Statistical comparison of proposed CSA-WSC with other baseline algorithms

Figure 29 shows the resource allocation costs of the algorithms. According to Fig. 29, when the number of providers, and thus the number of sample services, increases, CSA-WSC decreases the service composition and allocation cost. Regarding the two main criteria of response time and resource cost, we conclude that the proposed CSA-WSC is the more suitable algorithm for the web service composition problem in geographically distributed cloud environments. Since the experimental results for some cases and criteria are hard to compare visually, we conducted paired t tests to determine whether there are significant differences between the performances of the algorithms. To do so, we executed the algorithms under three different numbers of instance services and ten different numbers of requests, giving 30 different experiments. Each experiment with a specific number of instance services and number of requests was executed ten times, and the median of the results is taken as the experiment result.

Table 16 shows the statistical results of the paired t tests that determine the significance level of the proposed algorithm compared to the other algorithms regarding fitness, cost, availability and reliability. The table provides the criterion, the algorithm compared with the proposed algorithm, the number of tests, the mean and standard deviation of the differences between the proposed and baseline algorithms, the degrees of freedom, the t value and the p value. A paired t test with a significance level of \(p<0.05\) was used to evaluate whether the differences are statistically significant. As seen in Table 16, there is a meaningful difference between the proposed algorithm and the other algorithms, since the p value in all cases is lower than 0.05. A confidence level above 0.95 shows that the proposed algorithm achieves a significant improvement in terms of all criteria. Thus, the null hypothesis is rejected, and the differences compared with the baseline algorithms are significant.
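The paired t statistic underlying Table 16 can be computed as follows. This is a generic sketch of the paired t test, not the authors' evaluation code; the sample values in the usage line are invented for illustration.

```python
import math
from statistics import mean, stdev

def paired_t(xs, ys):
    """Paired t statistic for matched samples xs, ys.

    t = mean(d) / (stdev(d) / sqrt(n)), with d_i = x_i - y_i
    and n - 1 degrees of freedom.
    """
    d = [x - y for x, y in zip(xs, ys)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1

# e.g. matched fitness results of two algorithms over 5 runs
t, df = paired_t([9, 8, 7, 9, 8], [8, 6, 4, 5, 3])
```

The resulting t value is then compared against the critical value of the t distribution with n − 1 degrees of freedom at the chosen significance level (here, p < 0.05).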

6 Conclusion and future work

Today, users increasingly use the Internet to obtain software resources in the form of web services. Through service composition technologies, loosely coupled services that are independent of each other can be integrated into value-added composite services. Most web services are deployed on cloud data centers distributed geographically around the world. In this paper, we have addressed the problem of web service composition in geo-distributed cloud environments and proposed a cuckoo search algorithm to solve it. Our algorithm considers not only the QoS of the web services but also the network QoS. The simulation results indicate that our algorithm achieves close-to-optimal results in terms of the QoS criteria. In future work, we aim to develop a linear programming strategy, which will make our algorithm more practical and effective.