1 Introduction

A Web service is a self-describing, modular application identified by a URI, whose public interface and bindings are defined, published and discovered using XML-based protocols. However, the functionality of a single Web service is too simple and limited to satisfy the complex requirements of users. With the emergence of Service Oriented Architecture (SOA), businesses have rapidly started transitioning from monolithic application designs to heterogeneous ones by decomposing their business functionalities into multiple services. SOA models a business on coarse-grained Web services provided by different organizations to generate a service-based system (SBS), which makes the system easier to reuse and more responsive to changes in the enterprise's business. Finding suitable component services and then composing them efficiently are the key steps of the SBS engineering process, and they can be translated into the problem of functional-request-oriented service composition. A typical process for solving this problem is as follows: by analyzing the user's functional request, an executable service composition solution that satisfies the user's requirements is formed through effective discovery, selection, and matching in the Web service repository.

Graph search technology is widely used to solve such functional-request-oriented service composition problems. Most current works are based on the semantic information of the service interface parameters (inputs and outputs) [3, 4, 30]. They usually employ the similarity of semantic ontologies among Web services to build a Service Ontology Dependent Call Graph, and then apply a graph traversal algorithm to find a composition path that satisfies the user's requirements. However, a solution obtained by matching only inputs and outputs among services cannot accurately satisfy personalized functional requirements when the semantic matching is ambiguous. Some works have attempted to improve the semantic service model [2, 19]. Although they make the matching of dependencies between services more accurate, they cannot guarantee that the tasks included in the composition meet the user's specific functional requirements. In addition to the input and output (i.e., starting and ending) conditions, users also need a convenient way to express the key tasks they would like to perform.

In fact, there are already some applications that allow users to search for Web services by entering keywords, such as ProgrammableWeb (www.programmableweb.com), Mashape (www.mashape.com), and WebServiceList (www.webservicelist.com). However, in these applications the keywords are only used to query a single Web service rather than to compose Web services. On the other hand, a few related works on keyword-query-based Web service composition have emerged, published in venues such as ICSOC [12] and TSE [13]. For example, Q. He et al. introduced keyword search techniques into the process of building SBSs [13]. With their approach, system engineers only need to enter some keywords to generate a composite Web service that covers the functional requirements described by these keywords. However, since it connects different Web services based on their collaboration history rather than on input-output semantic associations and interactions, it ignores the special process structure of SBSs, such as AND/OR structures. As a result, the constructed SBS is not guaranteed to be executable. Besides, it is also difficult to determine the starting and ending tasks in an SBS.

To address these problems, we propose adding keyword query on top of the traditional graph search approach based on input-output matching. We use keywords to denote the functional information of Web services, which can be extracted from service specifications such as WSDL, Web API documents and Web pages that contain references to Web services [10]. In addition to the initial inputs and final outputs, users can also request key tasks to be contained in the SBS in the form of keywords, according to their functional requirements. In other words, the user's requirements are compared with the keywords of Web services, and the matched Web services are considered for service composition. There have been many approaches to matching Web services by keywords [20,21,22, 33], and keyword searching that considers semantic similarity has also been studied [26]. Because this is not the focus of our work, we do not discuss in detail how user requirements are matched with Web services based on keywords. Furthermore, our approach also allows users to indicate the execution order among the tasks represented by keywords. The initial inputs and final outputs are used to connect discrete services in the Web service repository into a directed AND/OR graph. Given a set of keywords that describe the tasks of an SBS, our approach returns a subgraph of the directed AND/OR graph that contains all the keywords required by the user. Finally, in large-scale scenarios there are always a great number of possible composition solutions, among which those with fewer services reduce communication overhead and are easier to monitor and deploy. Therefore, our approach also aims to efficiently find the composition solution with the minimum number of services.

The main contributions of this paper are as follows. 1) We propose a novel paradigm for automatically composing Web services in a convenient way, by introducing keyword query over the AND/OR graph constructed through semantically matching the input-output interfaces of Web services. 2) We design an efficient heuristic search approach, called UP-DFS, and combine indexing with search to enforce the uniqueness (the same keyword appears only once) and sequentiality (the keywords appear in a given order) constraints on keywords in the result. 3) We propose an efficient upper-bound global pruning strategy that first eagerly finds a good solution to serve as an initial upper bound, and then uses this upper bound to prune unpromising solutions based on the estimated minimum cost of the services to be included or of the service nodes to be traversed. 4) We conducted extensive experiments on five datasets that were also employed in [32]. According to the evaluation results, the proposed UP-DFS can efficiently generate a semantic input-output-based Web service composition that contains all the keywords given by users while minimizing the number of services in the composition.

The rest of this paper is organized as follows. After giving an overview of the related work in Section 2, Section 3 presents a motivating example and introduces the relevant concepts and the formal problem definition. Section 4 then describes a baseline approach and an advanced one for solving the proposed problem. The experimental results are presented in Section 5. Finally, Section 6 concludes our work and outlines future work.

2 Related work

The traditional SBS construction process is usually divided into three stages: system planning, service discovery and service selection. This is mainly a manual Web service composition process that requires the designer to have a good understanding of SOA. Therefore, automatic Web service composition has been extensively studied by industry and academia during the past few years from various research perspectives. Graph-based modeling is one of the primary means in early research on automatic Web service composition. Rodriguez-Mier et al. proposed a formal graph-based composition framework anchored on the integration of service discovery and matchmaking within the composition process [31]. The framework includes an optimal composition search algorithm that extracts the best composition from the graph, minimizing both the length and the number of services. They later extended this work with a hybrid algorithm that automatically builds semantic input-output based compositions minimizing the total number of services while guaranteeing optimal quality of service (QoS) [32].

Recently, Web service composition based on automatic planning (AP) has gradually become a research focus. Graiet et al. proposed a formal model for the SCA specification focusing on its behavioral aspect [11]; the Event-B formalism was employed for modeling behavioral properties. Jungmann et al. applied the service composition paradigm to the image processing domain [16], proposing a knowledge-based specification of image processing functionality and a planning-based composition algorithm. Similarly, Abdullah et al. combined the service and agent computing paradigms and proposed an agent-based approach called WSC [1]. In their approach, upon receiving a user composition request, agents perform internal reasoning and cooperate through a communication protocol to find a solution.

There are also some influential works that use AI search algorithms to solve the automatic Web service composition problem. For example, Oh et al. proposed a planning-based service composition algorithm called WSPR [27]. After two steps of heuristic forward search and backward search, WSPR generates a service composition solution in polynomial time. To address the problem that the WSPR algorithm must repeatedly parse the input and output parameters of services during the forward search, Jiang et al. proposed generating a service inverted index over the Web service library, thereby removing the performance bottleneck caused by frequent parsing of Web services [15]. To avoid the backward regression search in WSPR, Zheng et al. proposed a composition algorithm based on a planning graph [34]. The algorithm uses the initial state of the composition request as the initial proposition of the planning graph, and then iteratively expands the action layer and proposition layer of the composite services through four dead-end service deletion strategies until the propositions reach the target state or a stable state. However, these methods consider only the initial state and the final target state, paying no attention to the functionalities of specific services in the solution.

Setting functional keywords for Web services is an effective way to refine user requirements. Many approaches have been proposed to query the corresponding Web services by keywords. Their main idea is to describe Web services using ontology-based semantic Web service description languages and to design logic-based service retrieval inference algorithms. For example, Klusch et al. proposed service matchmakers for different types of semantic services, e.g., SAWSDL-MX [21], OWLS-MX [20], and WSMO-MX [22]. Zhang et al. extended the traditional query method by mining domain knowledge about service functions from textual descriptions of services [33]. In this way, SBS designers can accurately obtain the services related to given keywords. However, these keyword-based approaches are only used to find a set of functionally similar candidate Web services for a single defined task in an SBS, and thus apply only to the service discovery stage.

An innovative keyword-based searching approach is indispensable if designers want to build SBSs directly from queried services rather than go through all the complicated phases (system planning, service discovery and service selection). Riabov et al. developed an intelligent automatic composition engine that uses tag-based service descriptions [29]. Inspired by this, the authors of [24] proposed a graph-based planning technique that recommends possible compositions by matching the entered source tag and end tag with the first and last services in the composition. Later, Huang et al. described a generic data-driven model to specify Web services and their combinatorial logic [14]; the model applies a Steiner-tree-based algorithm to retrieve and then rank the possible compositions. To break the limit on the number of keywords in a single query, He et al. proposed KS3, a method that models Web service composition as a keyword-driven integer programming problem and uses the CPLEX solver to generate a composition solution with optimal QoS [13]. They later improved this work in [12] by proposing KS3+, a highly efficient approach using dynamic programming. However, these approaches have some important limitations: 1) they require that the keywords of the first and last tasks be given in each query; 2) they cannot deal with the AND/OR relations that are unavoidable for branching when composing Web services; 3) they fail to solve the cold-start problem.

Keyword search over graphs has also been studied in the database community, where the results are modelled as minimal connected trees or graphs that contain the query keywords. For group Steiner tree semantics, BANKS-I [5] and BANKS-II [17] are the pioneering works. BANKS-I was a backward search algorithm for finding the results, while BANKS-II extended BANKS-I to search bi-directionally. To improve efficiency, Ding et al. proposed a parameterized dynamic programming algorithm using the number of groups as a parameter [8], which generalizes the Dreyfus-Wagner algorithm for the traditional Steiner tree problem [9]. To further speed up the process, Li et al. proposed optimal-tree decomposition and conditional tree merging techniques, and developed several pruning bounds [23]. Other popular graph-based approaches modeled the result as an r-clique [7, 18], which captures a more compact relationship between different keyword nodes. To avoid duplication among results, the work in [25] defined the smallest k-compact tree as the result of a keyword query. However, to the best of our knowledge, no work has been reported on keyword search over AND/OR graphs, which is the focus and contribution of this paper.

In summary, despite the large number of approaches for automatic service composition, there is still a lack of effective techniques capable of generating a composition solution that satisfies a user's specific functional requirements. In this paper, we advance the current works on Web service composition based on input-output matching and keyword query. We investigate how to effectively employ keyword search technology to efficiently find a near-optimal solution for Web service composition. Our approach can help a user quickly build a Web service composition solution by entering a few keywords that represent the key tasks (functions) of the SBS. It breaks through the limitations of existing approaches in the following respects. First, our approach can handle AND/OR graphs driven by data flow relationships. Second, users may specify the execution order of the tasks represented by the given keywords. Third, it can effectively minimize the number of services in the composition.

3 Problem description and definition

3.1 A motivating example

Figure 1 shows a motivating example. Negative competition is common on the Taobao website. Unscrupulous merchants may employ the 'Online Water Army' (hordes of people who are paid to post comments on the Internet) to post malicious comments on the commodities of their competitors. Suppose a regulator wants to review whether there are malicious comments on a Taobao order by performing several key tasks: extracting the buyer's identity information, extracting his or her account information, and reviewing all the comments he or she posted. For convenience, these functional requirements can be represented by the following keywords: Get Identity Info, Get Taobao Account and Get Comments. The Web services corresponding to these three key phrases (or keywords) are marked in the figure.

Figure 1: Example of a services dependency network to determine whether there are malicious comments toward a Taobao order

Figure 1 shows the service dependency network containing all the relevant services, constructed from the initial inputs {Order Info} and the final outputs {Boolean}. Services and input-output parameters are represented by rectangles and circles, respectively. The edges connecting outputs and inputs represent semantic matches between input-output parameters. A service can only be executed if each of its inputs is matched by an output of a preceding service, which corresponds to the AND structure in the graph. Besides, as we can see, some inputs can be matched by more than one output. For example, Identity Info can be matched by either the Check Identity or the Extract Identity service, colored pink in Figure 1. This corresponds to the OR structure in the graph, implying that the graph contains different service composition solutions. Obviously, the final composition solution should contain the services corresponding to the user-given keywords together with any other services that assist them. It is also important to note that the task Get Identity Info must be executed before Get Taobao Account: the current Taobao account may not have been authenticated, in which case we cannot obtain the buyer's exact identity information from the Taobao account. Conversely, in any case, we can first obtain the buyer's identity information from the order information, and then query the Taobao account(s) associated with the buyer based on his or her identity information.

Given such a composition request, the goal of this paper is first to automatically build a graph with AND/OR structures like the one shown in Figure 1, and then to find, within that graph, a subgraph for the final composition solution with the minimum number of services and the required execution sequence. Because the keywords reflect what a user actually requires, we believe that service composition based on both keyword query and input-output matching is more applicable in real scenarios than composition based on input-output matching alone.

3.2 Semantics-based Web service model

A Web service is a software component with a set of inputs and a set of outputs. In this paper, we connect individual Web services by matching their inputs and outputs. In other words, service A may be a predecessor of service B if some outputs of service A semantically match at least one input of service B. We define the following concepts for building the Web service model, which extends the traditional model by allowing keywords to be specified.

Definition 1

Web Service. A Web service is defined as a triple \(s = \langle In_s, Out_s, k_s\rangle \in W\), where \(In_s\) denotes the set of inputs required to invoke s, \(Out_s\) denotes the set of outputs generated after the execution of s, \(k_s\) denotes the functional description of service s, and W denotes the Web service library that contains all Web services in a particular domain.

It should be noted that the Web services in this paper are fine-grained, each targeting exactly one piece of functionality and therefore indicated by one keyword. If one Web service relates to more than one keyword, meaning it fulfills multiple functions, it is split into multiple service nodes in the initial matching graph introduced later, each corresponding to one keyword. On the other hand, different services may share the same keyword, or synonymous keywords, if they correspond to the same function. For example, both the Baidu Map API and the Google Map API are marked with the keyword Provide Map Service. Besides, the word keyword in this paper may actually refer to a phrase. For example, although Extract Taobao Account consists of three words, it is considered one keyword, not three.
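The triple of Definition 1 maps naturally onto a small data structure. The following is a minimal Python sketch of our own; the field names and the Address/Map concept labels are illustrative assumptions, not prescribed by the model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WebService:
    """A fine-grained Web service s = <In_s, Out_s, k_s> (Definition 1)."""
    name: str
    inputs: frozenset    # In_s: concepts required to invoke s
    outputs: frozenset   # Out_s: concepts produced by executing s
    keyword: str         # k_s: the single function of s, as one keyword

# Two services sharing a keyword: same function, different providers.
baidu_map = WebService("BaiduMapAPI", frozenset({"Address"}),
                       frozenset({"Map"}), "Provide Map Service")
google_map = WebService("GoogleMapAPI", frozenset({"Address"}),
                        frozenset({"Map"}), "Provide Map Service")
```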

Definition 2

Ontology Concept Set. Given a Web service s, the semantic concepts of its inputs and outputs belong to a set of concepts \(C\) \((In_s, Out_s \subseteq C)\), which is defined in an ontology Ont.

An ontology provides a unified way to define and describe classification standards and classification networks, and is mainly used to capture the connections among entity concepts. According to the concept set C and the ontology Ont, we can determine the semantic relationship between any input and output from different services.

Definition 3

Matching Function Between Concepts. We employ four traditional ontology semantic matching rules (exact, plugin, subsume and fail) to measure the matching degree between two concepts [28]. Given two concepts \(c_i, c_j \in C\), we define the function MATCH\((c_i, c_j)\) to check whether \(c_i\) and \(c_j\) match. MATCH\((c_i, c_j)\) returns true if their matching degree is not 'fail'.

Definition 4

Matching Degree Between Concept Sets. Given two sets of concepts \(C_1, C_2 \subseteq C\), we use a class/subclass relationship to measure the matching degree between them. Three cases are distinguished:

1) If MATCH\((c_1, c_2)\) = true \((\forall c_2 \in C_2, \exists c_1 \in C_1)\), we consider \(C_2\) a subclass of \(C_1\) \((C_2 \subseteq C_1)\), i.e., \(C_2\) is fully matched by \(C_1\);

2) If MATCH\((c_1, c_2)\) = true \((\exists c_2 \in C_2, \exists c_1 \in C_1)\) and MATCH\((c_{1'}, c_{2'})\) = false \((\exists c_{2'} \in C_2, \forall c_{1'} \in C_1)\), we consider that \(C_1\) and \(C_2\) have common elements \((C_1 \cap C_2 \neq \emptyset)\), i.e., \(C_2\) is partially matched by \(C_1\);

3) If MATCH\((c_1, c_2)\) = false \((\forall c_2 \in C_2, \forall c_1 \in C_1)\), we consider \(C_2\) disjoint from \(C_1\) \((C_1 \cap C_2 = \emptyset)\), i.e., \(C_2\) is not matched by \(C_1\).

Based on the above definitions, for a single output parameter \(o_{s_i}\) of service \(s_i\) and a single input parameter \(i_{s_j}\) of service \(s_j\), we consider that \(i_{s_j}\) is matched by \(o_{s_i}\) when MATCH\((o_{s_i}, i_{s_j})\) = true. Furthermore, given a set of available outputs \(Out \subseteq C\), we consider that a service s can be executed on the condition that its inputs \(In_s\) are satisfied by \(Out\), that is, \(In_s\) must be fully matched by \(Out\) \((In_s \subseteq Out)\).
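Definitions 3 and 4 can be sketched in a few lines of Python. The `subsumes` oracle standing in for the exact/plugin/subsume rules of [28], and the toy concept hierarchy, are our own assumptions for illustration:

```python
def match(c1, c2, subsumes):
    """MATCH(c1, c2): true unless the ontology matching degree is 'fail'.
    `subsumes(a, b)` is an assumed oracle implementing the plugin/subsume
    rules of [28]; equality covers the 'exact' case."""
    return c1 == c2 or subsumes(c1, c2) or subsumes(c2, c1)

def matching_degree(C1, C2, subsumes):
    """Matching degree between concept sets (Definition 4): how far C2
    is matched by C1."""
    matched = {c2 for c2 in C2 if any(match(c1, c2, subsumes) for c1 in C1)}
    if matched == set(C2):
        return "full"     # C2 is a subclass of C1
    if matched:
        return "partial"  # C1 and C2 share elements
    return "none"         # disjoint

def executable(in_s, available_out, subsumes):
    """A service runs only if In_s is fully matched by Out."""
    return matching_degree(available_out, in_s, subsumes) == "full"

# Example with a toy subsumption oracle (assumed concept hierarchy):
is_a = {("OrderInfo", "Info")}
subsumes = lambda a, b: (a, b) in is_a
print(matching_degree({"OrderInfo"}, {"Info"}, subsumes))  # 'full'
```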

3.3 Graph model for Web service composition

In this paper, we organize a Web service library according to the ontology information between the input-output interfaces of services. We define two heterogeneous directed graph models, the Initial Matching Graph and the Composition Graph, to formally state the composition problem based on keyword search.

Definition 5

Initial Matching Graph. Given a Web service library W, an ontology Ont and a composition request \(R = \{I_R, O_R\}\) \((I_R, O_R \subseteq C)\), where \(I_R\) represents the initial inputs that the user provides and \(O_R\) represents the final outputs the user expects, we extract the relevant services from W and construct an initial matching graph \(G_I = (V, E)\) by matching the input and output parameters between services one by one. The vertex set V is constructed as \(S \cup P\), where S denotes the set of service nodes and P denotes the set of parameter nodes. Here, \(S = S_R \cup \{s_o, s_d\}\), where \(S_R \subseteq W\) is the set of relevant services in the graph, and \(s_o = \langle \emptyset, I_R, \emptyset\rangle\) and \(s_d = \langle O_R, \emptyset, \emptyset\rangle\) are two special virtual services corresponding to the origin and destination vertices of the graph.

In an initial matching graph, \(s_o\) has no inputs and its outputs are the user's initial inputs \(I_R\). Conversely, \(s_d\) has no outputs and its inputs are the final outputs \(O_R\). Neither of them contains any keyword. Besides, we merge semantically matched input-output parameters into one parameter node. Any parameter node \(p \in P\) can serve both as an output of a predecessor service and as an input of a successor service. The edge set E is constructed as \(SP \cup PS\), where \(SP \subseteq \{(s, p)\,|\,s \in S \wedge p \in P\}\) is the set of edges connecting services to their output parameters, and \(PS \subseteq \{(p, s)\,|\,p \in P \wedge s \in S\}\) is the set of edges connecting input parameters to the services they flow into.

In fact, the initial matching graph is an AND/OR graph with the following constraints, where \(d_{in}(v)\) returns the in-degree of vertex \(v \in V\) and \(|In_s|\) denotes the number of inputs of service s.

1) AND condition: \(\forall s \in S_R \cup \{s_d\}\) \((S_R, \{s_d\} \subset V)\), \(d_{in}(s) = |In_s|\). For any service node s except \(s_o\) in the graph, all directed edges connected to it are logically ANDed; in other words, the service node s is an AND node;

2) OR condition: \(\forall p \in P\) \((P \subset V)\), \(d_{in}(p) \geq 1\). For any parameter node p in the graph, all directed edges connected to it are logically ORed; in other words, the parameter node p is an OR node.

For the AND condition, a service can be executed only when all of its inputs are available. As for the OR condition, when more than one service can generate the same parameter p, only one of them will be chosen.
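The two structural conditions can be checked mechanically. Below is a minimal sketch, assuming an adjacency representation of our own choosing (`in_edges`, `service_inputs` and `param_nodes` are illustrative names, not part of the formal model):

```python
def check_and_or_conditions(service_inputs, param_nodes, in_edges, s_o):
    """Sanity-check the AND/OR constraints of an initial matching graph.
    `service_inputs` maps each service node to its input set In_s,
    `param_nodes` is the set of parameter nodes, and `in_edges[v]` lists
    the predecessors of node v."""
    for s, ins in service_inputs.items():
        if s == s_o:
            continue
        # AND condition: d_in(s) == |In_s| -- every input of s is wired.
        assert len(in_edges.get(s, [])) == len(ins), f"{s} not fully matched"
    for p in param_nodes:
        # OR condition: d_in(p) >= 1 -- at least one producer; several
        # producers mean alternative (ORed) composition choices.
        assert len(in_edges.get(p, [])) >= 1, f"{p} has no producer"
```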

Figure 2 formalizes the service dependency network in Figure 1 into an initial matching graph, where the rectangles represent Web services (service nodes, or AND nodes) and the circles represent input and output parameters (parameter nodes, or OR nodes) after merging. As we can see, the graph implies different composition solutions because some parameter nodes can be matched by different predecessors (preceding service nodes). For example, we can choose either \(s_1\) or \(s_5\) to resolve the input of \(s_{10}\). Given an initial matching graph, our goal is to find the optimal solution in which each parameter node is matched by only one service node and the number of services is minimized.

Figure 2: An initial matching graph formalized from the services dependency network in Figure 1

Definition 6

Composition Graph. A composition graph \(G_C = (V, E)\) is a subgraph of \(G_I\) representing a feasible Web service composition solution. Unlike \(G_I\), \(G_C\) is a Directed Acyclic Graph (DAG) satisfying the following conditions:

1) \(\forall p \in P\) \((P \subset V)\), \(d_{in}(p) = 1\). In other words, each parameter in the composition graph is generated by exactly one service node;

2) \(\forall s \in S_R \cup \{s_o\}\) \((S_R, \{s_o\} \subset V)\), \(\forall p \in P\) \((P \subset V)\), \(d_{out}(s) \geq 1\) and \(d_{out}(p) \geq 1\). In other words, no service node except the destination node is dead-ended. Here, \(d_{out}(v)\) returns the out-degree of vertex \(v \in V\).

3.4 Problem statement

Given a Web service library W, a semantic ontology Ont, and a keyword-based composition request \(R = \{I_R, O_R, K_R, Q_R\}\), where \(K_R = \{k_1, k_2, \dots, k_n\}\) \((n \geq 1)\) is the set of keywords to be included in the final composition solution and \(Q_R\) represents the sequence requirements on specific keywords, the problem of generating an optimal Web service composition solution consists of building the initial matching graph \(G_I\) according to \(I_R\) and \(O_R\), and then using \(G_I\) as the search space to find a composition graph \(G_C\) such that the following conditions are satisfied:

1) \(G_C\) covers all the keywords \(K_R = \{k_1, k_2, \dots, k_n\}\) and each keyword appears only once, to avoid functional redundancy;

2) The execution sequence of the specified keywords satisfies \(Q_R\);

3) The number of service nodes in \(G_C\), i.e., \(|S|\) \((S \subseteq V)\), is minimized.

We denote this composition graph \(G_C\) as the final composition graph \(G_F\). Figure 3 shows an example of applying the keyword constraints to Figure 2 to find the final composition graph, marked in bold lines. In this case, we define \(k_1\) = GetComments, \(k_2\) = GetTaobaoAccount and \(k_3\) = GetIdentityInfo. Here, \(K_R = \{k_1, k_2, k_3\}\), \(Q_R = \{\langle k_3, k_2\rangle\}\), and the green rectangles represent the service nodes that contain these keywords (called keyword nodes in this paper). The directed bold-line portion of the graph in Figure 3 shows the final result, which contains all the keywords, each appearing only once. As we can see, we choose \(s_1\) instead of \(s_5\) because choosing \(s_1\) results in fewer service nodes in the final solution. Besides, we choose \(\{s_4, s_8\}\) instead of \(\{s_3, s_7\}\) because \(k_3\) is required to be invoked before \(k_2\).

Figure 3: Example of composing Web services by applying the keyword constraints on Figure 2

4 Composing Web services by keyword search

In this section, we introduce our solution to the composition problem based on keyword query. We first select the relevant Web services from a Web service library to generate the initial matching graph according to the user's functional requirements. To improve the performance of the optimal composition search, we then pre-process the initial matching graph using two graph optimization strategies: processing equivalent bridging services and eliminating dead-end services. These strategies can effectively reduce the size of the initial matching graph. Finally, we propose a baseline algorithm and an enhanced Upper-bound-Pruning-based Depth First Search (UP-DFS) algorithm that perform keyword search on the initial matching graph to obtain the final composition graph.

4.1 Generating and pre-processing initial matching graph

Given a Web service library and a semantic ontology, the first step is to construct the initial matching graph based on the inputs that the user provides and the final outputs the user expects. The construction algorithm starts with the initial service \(s_o\), and relevant services are then selected layer by layer using the semantic matching rules presented in Section 3.2. At each layer, the algorithm selects a set of potentially relevant services whose inputs can be matched by some outputs generated by the previously selected services. The process ends when the destination service \(s_d\) has been resolved and no more services can be added.

The pseudocode of the construction algorithm is shown in ALGORITHM 1. It first adds \(s_o\) and \(s_d\) to \(S_{inv}\) as the services of the first and last layers, respectively. It then adds the outputs of \(s_o\) (the user's initial inputs) to the set of available outputs \(C_{out}\) as the set of concepts to be matched (Line 1). For each service to be selected, the algorithm uses the function MATCHED to check the matching possibilities between its inputs and the current set of available outputs (Line 7). This function returns the matched inputs of the service. If all the inputs of the service are matched, the algorithm adds the service to \(S_{sel}\) and then adds its outputs to \(C_{out}\) (Lines 10-13). Otherwise, the algorithm uses a map to record these inputs for the next round of matching verification (Line 14). This process is repeated layer by layer until no new service can be added. Finally, the algorithm uses the function COMPLEMENT to complete the matching relationships between all inputs and outputs in the graph (Line 18). The generated initial matching graph may contain cycles, which are handled when generating the final composition graph.

ALGORITHM 1: Construction of the initial matching graph (pseudocode)
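The layered expansion can be sketched as follows. This is our own simplified reconstruction, reusing the WebService sketch above: exact concept equality stands in for the semantic MATCHED function, and the COMPLEMENT step is reduced to a final output-to-input edge pass:

```python
def build_initial_matching_graph(library, I_R, O_R):
    """Layer-by-layer service selection in the spirit of ALGORITHM 1."""
    available = set(I_R)                 # C_out: concepts matched so far
    selected, remaining = [], set(library)
    changed = True
    while changed:                       # one pass per layer
        changed = False
        for s in list(remaining):
            if s.inputs <= available:    # all inputs matched -> select s
                selected.append(s)
                remaining.discard(s)
                available |= s.outputs
                changed = True
    # Complement the output-to-input matching relations among selected services.
    edges = [(a.name, b.name) for a in selected for b in selected
             if a is not b and (a.outputs & b.inputs)]
    feasible = set(O_R) <= available     # can the destination s_d be resolved?
    return selected, edges, feasible
```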

Once the initial matching graph is built, it is pre-processed with optimization strategies that reduce the graph size and remove useless Web services. We propose two sequential optimization strategies for the initial matching graph.

Processing equivalent bridging services

A service between any two service nodes is called a bridging service of those two. As shown in Figure 4a, Services A and B are connected via Service C, which is thus a bridging service of Services A and B. Because the repository may contain multiple alternative services providing the same functionality, the initial matching graph may contain multiple equivalent bridging services between two services, carrying the same inputs, outputs and keyword. In fact, services with the same functionality usually share the same inputs and outputs. However, such services unnecessarily increase the complexity of the graph search. To solve this problem, we merge equivalent bridging services. We check three properties to determine whether services are equivalent: input parameters, output parameters and keyword. If all three properties are the same, we abstract the services into a single new service. Figure 4b shows the merging process.

Figure 4: Processing equivalent bridging services

Furthermore, we allow a sequence of services to be considered as one service set, which can be merged with other bridging services or service sets if their inputs, outputs and keywords completely match. As shown in Figure 4c, the inputs of both Services C and D match the output of Service A, and the outputs of both Services C and E match the input of Service B. Consequently, Services D and E together constitute an equivalent bridging service of Service C, and we can replace Services D and E with Service C. However, the replacement must follow two important principles:

1) The input of each bridging service cannot have any branch; that is, the 'bridge' must form a strictly sequential structure;

2) If a bridging service to be replaced contains some query keyword that is not contained in its counterpart bridging services, the replacement must not be performed.

Eliminating dead-end services

Dead-end services are services that do not have any successor or predecessor in the initial matching graph [6]. After the above optimization, many dead-end services may be produced. Such services cannot contribute to the execution of the composition and should obviously be eliminated.

To remove the dead-end services, we check for each service in the graph whether it has a successor and a predecessor. If not, we remove the service and the edges connected to it. This may create new dead-end services, so we repeat the process until no dead-end service remains.
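The fixpoint character of this step is easy to capture in code. A minimal sketch, assuming an edge-set representation of our own choosing:

```python
def eliminate_dead_ends(nodes, edges, s_o, s_d):
    """Iteratively remove services lacking a predecessor or a successor.
    `edges` holds (src, dst) pairs; the virtual services s_o and s_d are
    exempt since they have no predecessor / successor by design."""
    nodes, edges = set(nodes), set(edges)
    while True:
        has_pred = {d for _, d in edges}
        has_succ = {s for s, _ in edges}
        dead = {n for n in nodes - {s_o, s_d}
                if n not in has_pred or n not in has_succ}
        if not dead:
            return nodes, edges
        nodes -= dead                        # removing a node may create new
        edges = {(a, b) for a, b in edges    # dead ends, hence the loop
                 if a not in dead and b not in dead}
```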

The above two optimization strategies reduce the size of the initial matching graph and eliminate the service nodes that do not contribute to subsequent processing. Next, we introduce how to extract the final composition graph from the pre-processed initial matching graph with the two proposed algorithms.

4.2 Baseline algorithm for Web service composition

Rodriguez-Mier et al. [32] proposed a global search algorithm that obtains the optimal composition by exhaustively exploring the space of possible solutions. In this section, we propose a baseline algorithm that follows the idea of their global search algorithm. Given an initial matching graph \(G_I = (V, E)\), a set of keywords \(K_R = \{k_1, k_2, \dots, k_n\}\) and the sequence requirements \(Q_R\) on specific keywords, the baseline algorithm performs a backward expansion from the destination service \(s_d\), preferentially traversing a wide area around it until \(s_o\) is reached, to extract the final composition graph \(G_F\) that satisfies the keyword constraints with a minimized number of services. This mimics breadth-first search. On each expansion, one input in \(G_I\) is resolved and the current partial solution is stored in a priority queue \(Q_{ps}\), ordered by the ascending number of services selected so far. For an input of a service in \(G_I\), a service whose output matches this input is called a candidate service of this input. When several candidate services can resolve the same input, the algorithm generates, for each candidate service, a copy of the current partial solution in which the input is resolved by that candidate. Whenever a candidate service whose keyword belongs to \(K_R\) is encountered, the algorithm performs a keyword uniqueness check and a keyword sequence check to ensure that the same keyword appears only once and that the corresponding services are executed in the order specified by \(Q_R\).

ALGORITHM 2 shows the pseudocode of the baseline algorithm. It starts with an initial matching graph \(G_I\) and a set of keywords \(K_R\). First, the algorithm adds \(s_d\) to \(S_{sel}\), the set of services selected to constitute the solution (Line 1). With the function INSERT, all the inputs of \(s_d\) are stored in \(In_{un}\), a queue recording the unresolved inputs of the current partial solution (Line 3). Here, an unresolved input is one for which more than one candidate service could resolve it but no decision has been made yet. To prioritize easy-to-resolve inputs, the unresolved inputs in \(In_{un}\) are sorted by their number of candidate services.

To record which keywords are included in the current partial solution, the algorithm defines a key-value table \(K_{sel}\langle k, v\rangle\), where each key represents a keyword in \(K_R\) and each value represents the service node corresponding to this keyword (Line 4). The current partial solution is defined as a tuple \(\langle In_{un}, S_{sel}, K_{sel}\rangle\) and stored in a queue \(Q_{ps}\) sorted by the size of \(S_{sel}\) (Line 5), so that the partial solution with the minimum number of services is always expanded first. In each iteration, the algorithm pops the head element of the queue for the next expansion (Line 7). If there are no unresolved inputs in the partial solution to be expanded and all the keywords queried by the user are present in \(K_{sel}\), this partial solution becomes the final composition graph \(G_F\) (Lines 8-9). Otherwise, the algorithm pops the head element of the queue \(In_{un}\) as the input to be resolved (Line 10). The two queues (\(Q_{ps}\) and \(In_{un}\)) let the algorithm expand in an overall breadth-first fashion while giving priority to the input with the fewest candidate services, which avoids generating too many copies of intermediate solution graphs and thus reduces space consumption.

For each candidate service, the algorithm uses the function CYCLED to check whether invoking this service would cause an inevitable cycle (Line 12). The function performs a forward breadth-first expansion from the current candidate service along the input to be resolved. During the expansion, the function always follows determined inputs (resolved inputs or inputs with a unique candidate) in order to guarantee that any detected cycle is inevitable. If the function reaches the original service, the candidate is considered to produce a cycle. For example, as shown in Figure 5, assume that we have selected \(s_{11}\) to resolve the input of \(s_9\), and that there are now two candidate services (\(s_8\) and \(s_9\)) for \(s_{11}\). If we select \(s_9\) to resolve the input of \(s_{11}\), a cycle consisting of \(s_9\) and \(s_{11}\) is created. If the current candidate service causes a cycle, it is discarded and the next candidate service is tried (Line 12). This process continues until the algorithm finds a suitable service to resolve the current input.
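The CYCLED check is a plain breadth-first reachability test. A minimal sketch, assuming a `determined_successors` helper of our own that yields the services reachable via resolved or single-candidate inputs:

```python
from collections import deque

def cycled(candidate, origin, determined_successors):
    """Would resolving `origin`'s input with `candidate` close an
    inevitable cycle? Forward BFS from `candidate` over determined
    inputs only; reaching `origin` means yes."""
    seen, frontier = {candidate}, deque([candidate])
    while frontier:
        s = frontier.popleft()
        if s == origin:
            return True          # expansion came back to the origin: cycle
        for nxt in determined_successors(s):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```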

ALGORITHM 2: The baseline algorithm for Web service composition (pseudocode)

Figure 5: Example of the graph expansion process for the baseline algorithm

If the keyword of the current candidate service belongs to \(K_R\), we call the service a keyword node. For a keyword node, the algorithm performs the functions UNIQUE and ORDERED to check whether the keyword satisfies the uniqueness and sequence requirements (Line 12). The function UNIQUE checks, using \(K_{sel}\langle k, v\rangle\), whether the current partial solution already contains the keyword, ensuring that each keyword appears only once. The function ORDERED checks whether the composed services satisfy the required order: if there is a path between the current keyword node and an already selected keyword node, there is a reachability relationship between the two keywords, meaning the former is executed before the latter. For example, the highlighted portion of the graph in Figure 5 indicates a partial solution. Assume that the candidate service \(s_6\), carrying the sequence-constrained keyword \(k_3\) (a keyword with a sequence requirement), is selected to resolve the input of \(s_8\); we then check the connectivity between \(s_6\) and the other keyword nodes contained in the partial solution. A forward breadth-first expansion from \(s_6\) reveals a path to \(s_{10}\); that is, \(k_3\) will be executed before \(k_4\) if \(s_6\) is selected.

If the current candidate service of the input satisfies all the requirements, the input is marked as resolved by this candidate, which means the edges between the other candidates and the input are deleted from the current partial solution. Afterwards, the algorithm removes the input from \(In_{un}\) and moves the corresponding candidate service into \(S_{sel}\) unless it is already there. In addition, it adds all inputs of the selected service to \(In_{un}\) and updates \(K_{sel}\langle k, v\rangle\) if the selected service is a keyword node. Finally, this new partial solution is inserted into the queue for the next expansion (Line 20). If no eligible \(G_F\) is found by the time \(Q_{ps}\) becomes empty, the algorithm returns 'fail'.

4.3 Upper-bound pruning based depth first search algorithm (UP-DFS)

Although the baseline algorithm can find the optimal solution, it suffers from several problems. 1) It uses exhaustive search and stores every possible solution during the search. This causes the number of elements stored in the queue to grow exponentially when the graph contains many OR-branches, possibly resulting in memory overflow. 2) It checks whether an OR-branch satisfies the keyword uniqueness and sequentiality requirements only when the candidate service itself contains a queried keyword, which lowers the search efficiency. Therefore, in this section we propose the UP-DFS algorithm, which finds a Web service composition solution satisfying all the requirements more efficiently.

Due to the specific features of AND/OR graphs, the complexity of optimal service composition is much higher than on ordinary graphs, which makes traditional search algorithms unsuitable for large graphs. To overcome the defects of the baseline algorithm and find the optimal solution efficiently within a reasonable time, UP-DFS uses carefully designed pruning strategies and heuristic rules during the search.

Unlike the baseline algorithm, UP-DFS uses backtracking to expand the graph in a depth-first fashion. For an input of a service in \(G_I\), a service whose output matches this input is called a candidate service of this input. On each expansion, the algorithm resolves one input in \(G_I\) by selecting one of its candidate services. When the algorithm finds, at some node, that the current choice is not good enough or cannot reach the target, it prunes the branch rooted at this node and backtracks to the previous step for a reselection. In this way, a solution can be obtained quickly. Although this first solution may not be optimal in terms of the number of services, it provides a reasonable result that serves as an upper bound for finding the final result. This is why we call the strategy Upper-bound Pruning. It avoids exploring unnecessary search paths, greatly reducing the size of the search space.

Apart from the Upper-bound Pruning strategy, we propose two other pruning strategies, Pruning for Keyword Uniqueness and Pruning for Keyword Sequentiality, to prune candidates that violate the keyword uniqueness and sequentiality requirements whenever an OR-branch is encountered. Among the branches that survive pruning, UP-DFS preferentially expands the most promising candidate, i.e., the one most likely to lead to the minimum number of services in the final composition solution.

4.3.1 Three types of pruning strategies

Pruning for Keyword Uniqueness

This pruning strategy ensures the uniqueness of keywords. In the baseline algorithm, a keyword can be verified against the partial solution only when the corresponding keyword node is traversed. However, if we can predict the occurrence of keyword nodes in advance, the verification can be performed early. Therefore, for each service node in \(G_I\), we pre-compute the essential keyword nodes that have a sequence relationship with the service. The result is a collection of Node-preKeyword Lists, denoted \(L_{NP}\). For a service node s, \(L_{NP}(s)\) is the list of keyword nodes that s will inevitably reach during the reverse expansion (if s is selected, these keyword nodes must be selected later). Each element of a Node-preKeyword List consists of two parts (preknode, keyword), where preknode is the ID of the keyword node and keyword is the keyword contained in that node. The elements of the list are sorted by order of addition. Figure 6a shows parts of the Node-preKeyword List built for the graph in Figure 5. For example, in the list for node \(s_6\), the first element \((s_1, k_1)\) reflects the fact that node \(s_1\), with keyword \(k_1\), will inevitably reach \(s_6\).

Figure 6: Three storage structures (in partial) built for the graph in Figure 5

The pseudocode for building the Node-preKeyword List is given in ALGORITHM 3. The algorithm first performs a breadth-first forward expansion from every service node in the initial matching graph to find the successor service nodes that exclusively belong to it. It then creates lists for these successor nodes, setting their preknode fields to the ID of the corresponding service node and their keyword fields to the keyword contained in that service node. The process repeats until a branching input node is encountered.

ALGORITHM 3: Building the Node-preKeyword List (pseudocode)

Based on the Node-preKeyword List, we can predict the occurrence of keyword nodes in advance when deciding among OR-branches. For a branch that includes keyword nodes, we must make sure that these keywords have not already appeared in the current partial solution; otherwise, the branch can be pruned.
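Given a precomputed \(L_{NP}\), the uniqueness check on an OR-branch reduces to a lookup. A minimal sketch with dictionary representations of our own choosing:

```python
def prune_for_uniqueness(candidate, L_NP, K_sel):
    """Keyword-uniqueness pruning on an OR-branch. `L_NP[s]` is the
    Node-preKeyword List of s as (preknode, keyword) pairs; `K_sel`
    maps each keyword already in the partial solution to its chosen
    node. A branch that would force a second occurrence of a kept
    keyword is pruned."""
    for preknode, keyword in L_NP.get(candidate, []):
        if keyword in K_sel and K_sel[keyword] != preknode:
            return True   # the keyword would inevitably appear twice
    return False
```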

Pruning for Keyword Sequentiality

This pruning strategy ensures the sequentiality of the keyword nodes. Through the Node-preKeyword List introduced above, we can predict the keyword node that exclusively belongs to each OR-branch. If we can further determine the sequence relationship between that keyword node and the keyword nodes already present in the partial solution, the sequentiality verification can be performed in advance. Therefore, for each unresolved service in each partial solution, we compute its essential successor keyword nodes whose keywords are sequence-constrained. The result is stored in the Node-sucKeyword List, denoted \(L_{NS}\). Figure 6b shows parts of the Node-sucKeyword List built for a highlighted partial solution of the graph in Figure 5. For a service node s, \(L_{NS}(s)\) is the list of keyword nodes that will definitely appear after s (if s is executed, these keyword nodes will be executed later). Each element of \(L_{NS}(s)\) consists of two parts (sucknode, keyword), where sucknode is the ID of the keyword node and keyword is the keyword contained in it. The elements of \(L_{NS}(s)\) are sorted by order of addition.

In fact, \(L_{NP}\) records the relationship between the current input and the untraversed keyword nodes, while \(L_{NS}\) records the relationship between the current input and the traversed keyword nodes. Each partial solution therefore has its own \(L_{NS}\), which is dynamically updated during expansion. In this way, when processing an OR-branch, we can use the two lists to check the sequence relationship between the keyword node to be traversed next and the keyword nodes already contained in the current partial solution. If the execution sequence between two such keyword nodes does not satisfy \(Q_R\), the algorithm cuts the current OR-branch and selects another candidate for expansion.

ALGORITHM 4: Building the Node-sucKeyword List (pseudocode)

The pseudocode for building the Node-sucKeyword List is given in ALGORITHM 4. The algorithm starts from each sequence-constrained keyword node contained in the current partial solution and performs a backward expansion along its inputs to find its predecessor nodes. It then creates Node-sucKeyword Lists for these nodes, setting their sucknode fields to the ID of the corresponding keyword node and their keyword fields to the keyword contained in it. The algorithm continues this process, building lists for further predecessor services, until no new list can be built.

Upper-bound Pruning

Both of the above pruning strategies are applied to OR-branches. During the expansion of a partial solution, they effectively reduce the number of candidates for the input to be resolved and avoid searching invalid paths. Here, we propose another pruning strategy, applied to each partial solution in the stack, called Upper-bound Pruning. We use the idea of the A* algorithm, a famous heuristic search algorithm, to implement this strategy [25]. Unlike the breadth-first strategy used by the baseline algorithm, the A* algorithm considers not only the distance between the intermediate point and the origin, but also the distance between the intermediate point and the destination, thereby avoiding traversal of every possible solution. Following A*, given a partial solution \(G_P\), we estimate the minimum number of service nodes traversed from \(s_d\) to \(s_o\) (passing through the service nodes in \(G_P\)) as \(f(G_P) = g(G_P) + h(G_P)\), where \(g(G_P)\) is the exact cost of the traversed nodes in \(G_P\) (the number of service nodes in \(G_P\)) and \(h(G_P)\) is the estimated minimum cost of the service nodes still to be traversed from \(G_P\).

To facilitate the calculation of \(h(G_P)\), we build the Node-cost Map, which stores the lowest cost from \(s_o\) to each service node in \(G_I\) (the minimum number of nodes is used to represent the lowest cost). The structure of the Node-cost Map built for the graph in Figure 5 is shown in Figure 6c. For any service node s in the graph, the Node-cost Map \(M_{ND}(s)\) keeps two values (cost, prenodeset), where cost is the minimum number of service nodes between \(s_o\) and s, and prenodeset is the optimal predecessor service set for each input of s. We use the idea of Dijkstra's algorithm to build the map. As shown in ALGORITHM 5, the algorithm first initializes the cost value of each service to infinity (the cost of \(s_o\) is 0), meaning the service has not yet been resolved. It then traverses forward along each output of \(s_o\), updating the cost values of successor service nodes. The cost value of a service equals the sum of the cost values of the predecessor services corresponding to each of its inputs, plus 1. If multiple services match the same input, the service with the lowest cost value is selected.

After constructing the Node-cost Map, \(h(G_P)\) is calculated as follows. For each unresolved input in \(G_P\), we first query the Node-cost Map to select an optimal predecessor candidate (the one with the lowest cost value). We then sum the cost values of the selected candidates to obtain \(h(G_P)\). Since these predecessor node sets may share common service nodes, deduplication must be performed during the summation (the prenodeset field in \(M_{ND}\) can serve as a pathfinding pointer).
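The cost propagation can be approximated by a simple fixpoint iteration. This is a sketch under our own simplifying assumptions: it replaces the Dijkstra-style ordering of ALGORITHM 5 with repeated relaxation, and omits the deduplication step and the prenodeset bookkeeping described above:

```python
import math

def build_node_cost_map(service_inputs, producers, s_o):
    """Approximate Node-cost Map: the lowest number of service nodes
    needed from s_o to each service. `service_inputs[s]` is In_s and
    `producers[c]` lists the services outputting concept c."""
    cost = {s: math.inf for s in service_inputs}
    cost[s_o] = 0
    changed = True
    while changed:
        changed = False
        for s, ins in service_inputs.items():
            if s == s_o:
                continue
            total = 1  # count s itself
            for c in ins:  # cheapest producer per input, summed
                total += min((cost.get(p, math.inf)
                              for p in producers.get(c, [])),
                             default=math.inf)
            if total < cost[s]:
                cost[s] = total
                changed = True
    return cost
```

\(h(G_P)\) is then obtained by summing, over the unresolved inputs of \(G_P\), the cost of each input's cheapest producer (with shared predecessors counted once).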

When running UP-DFS, once we obtain a solution that satisfies all requirements, we use its number of service nodes as an upper bound. For each partial solution \(G_P\) expanded later, we compare \(f(G_P)\) with the upper bound. If \(f(G_P)\) is no less than the upper bound, the minimum number of services this partial solution can achieve is no better than that of the existing solution, so the algorithm discards the partial solution and stops expanding it. Note that the upper bound keeps decreasing: whenever a new solution with fewer service nodes is found, the upper bound is updated and becomes tighter.

ALGORITHM 5: Building the Node-cost Map (pseudocode)

4.3.2 The UP-DFS algorithm

Based on the three types of indexes defined above, we now describe the full UP-DFS algorithm, whose pseudocode is shown in ALGORITHM 6. To record which keywords are included in the current partial solution, the algorithm defines a key-value table \(K_{sel}\{\langle k, v\rangle\}\), where k represents a keyword in \(K_R\) and v represents the service node corresponding to this keyword (Line 4). The current partial solution is defined as a tuple \(\langle In_{un}, S_{sel}, K_{sel}, L_{NS}\rangle\) and stored in a stack \(T_{ps}\) (Line 5). For the current partial solution to be expanded, the algorithm first compares its f value with the upper bound (Line 8). If the f value is greater than the upper bound, the algorithm pops the next element from the top of the stack to expand. If the f value is smaller than the upper bound and the partial solution still has unresolved inputs, it selects an unresolved input from \(In_{un}\) (Line 13). For each candidate service of the input, the function CYCLED checks whether invoking this service would cause an inevitable cycle, and the functions UNIQUE and ORDERED perform the pruning for Keyword Uniqueness and Keyword Sequentiality (Lines 14-16). Afterwards, the function SORT sorts the remaining candidate services (Line 17), giving priority to candidates with necessary predecessor keyword nodes, which results in a smaller \(h(G_P)\).

For each candidate service that has not been pruned, the algorithm generates a new partial solution in which the current input is matched only by the selected candidate service. It then removes the input from the current \(In_{un}\) and adds the candidate service to \(S_{sel}\) unless it is already there (Lines 18-20). All inputs of the candidate service are added to \(In_{un}\), and \(K_{sel}\langle k, v\rangle\) is updated if the selected service contains a new keyword (Lines 21-24). The algorithm then uses the function REBUILD to build \(L_{NS}\) for the new partial solution (Line 25). Finally, the new partial solution is pushed onto \(T_{ps}\) for the next expansion (Line 26). The search terminates when all the partial solutions in \(T_{ps}\) have been processed.
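The overall control flow condenses to a short loop. The sketch below shows only the skeleton; `expand`, `is_complete` and `size` are placeholders of our own for the paper's CYCLED/UNIQUE/ORDERED/SORT machinery and partial-solution bookkeeping:

```python
def up_dfs(initial, f, expand, is_complete, size):
    """Control-flow skeleton of UP-DFS. `expand(P)` must yield the
    children that survive pruning, already sorted so the most promising
    child is pushed last (and thus popped first); `f(P) = g(P) + h(P)`;
    `size(P)` is the number of services in a complete solution."""
    best, upper = None, float("inf")
    stack = [initial]
    while stack:
        P = stack.pop()
        if f(P) >= upper:
            continue                  # upper-bound pruning
        if is_complete(P):            # no unresolved inputs, keywords covered
            best, upper = P, size(P)  # tighten the bound
            continue
        stack.extend(expand(P))
    return best                       # None if no feasible composition exists
```

Because the first complete solution is found greedily along a depth-first path, a finite upper bound is usually available early, after which most of the stack is discarded by the \(f(P) \geq\) upper-bound test.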

ALGORITHM 6: The UP-DFS algorithm (pseudocode)

5 Experimental evaluation

5.1 Experimental configuration

We implemented the baseline and UP-DFS algorithms and evaluated them on 5 datasets generated by Rodriguez-Mier et al. using different random models [32]. We chose these datasets because they are well acknowledged by the community; moreover, they have different sizes and are based on different models, which helps verify the practicability of our approach. It is also noteworthy that we do not distinguish among the relationships Exact, Plug-in and Subsume, all of which are regarded as Matched in our approach. We randomly assigned a keyword to each Web service in these datasets; in real cases, the keywords would be extracted from service specifications such as WSDL, Web API documents and Web pages containing references to Web services. Since the datasets provide the initial inputs, the outputs and the semantic matching relationships between service interfaces, we further constructed the initial matching graph. Before running UP-DFS, we also applied the pre-processing described in Section 4.1 to the initial matching graph. Table 1 shows the scale of each dataset and the results after constructing the initial matching graph.

Table 1 Dataset statistics

To locate the Web service(s) corresponding to each queried keyword in a real application scenario, we need to pre-build and maintain an inverted index for each Web service library, recording the set of Web services corresponding to each keyword. For example, suppose services \(s_1\), \(s_9\) and \(s_{24}\) cover keyword \(k_6\); the service set corresponding to \(k_6\) in the inverted index is then \(S(k_6) = \{s_1, s_9, s_{24}\}\). To facilitate the experiments, we constructed the keyword inverted index only for the service nodes in the initial matching graph. Given an example query, the two algorithms (the baseline and UP-DFS) searched the initial matching graph and returned the final composition graph satisfying all the requirements specified in the query. All experiments were executed with a time limit of 5 minutes (following [32]); if an execution did not return a valid result within 5 minutes, it was considered a failure. The experiments were conducted on a machine with a 2.0 GHz CPU, 32 GB of memory and a 64-bit Ubuntu 16.04 system.
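The inverted index itself needs only a few lines; a minimal sketch reusing the WebService type from the earlier sketch:

```python
from collections import defaultdict

def build_inverted_index(services):
    """Keyword inverted index: keyword -> set of service names, so a
    queried keyword is located in O(1), e.g. S(k6) = {s1, s9, s24}."""
    index = defaultdict(set)
    for s in services:
        index[s.keyword].add(s.name)
    return index
```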

5.2 Experiment setup

To analyze the impact of the number of keywords, the graph density and the graph size on performance, we set up three series of experiments, namely Series A, B and C. In each series, we employed two frequently used indicators, success rate and computation time, to evaluate the practicability and efficiency of the algorithms. The success rate was calculated as the percentage of successful runs out of 50 runs, and the computation time was averaged over all runs that the competitors executed successfully on the test data. Here, a successful run is one that returns a result within a fixed period of time. To demonstrate the ability of UP-DFS to ensure the uniqueness and sequentiality of keywords, the keyword query instances in each series were divided into two types: queries with the uniqueness constraint and queries with sequentiality constraints. A query with the uniqueness constraint requires that no more than one service in the result contain the same keyword; a query with sequentiality constraints additionally imposes sequentiality constraints on the queried keywords.

Series A was designed to evaluate the effect of the number of keywords in a query on the success rate and computation time, based on the preprocessed D#1 dataset with a fixed graph size of 22 service nodes and 422 edges. The experiments were divided into four groups with 2, 3, 4 or 5 keywords, respectively. In each group, we randomly selected 50 sets of keywords and applied them to the two algorithms (the baseline and UP-DFS). For the queries with the uniqueness constraint, we randomly set the number of occurrences of each keyword in the range from 1 to 5. For the queries with sequentiality constraints, we further added random sequentiality constraints to these 50 sets of keywords. Note that the constraints for the two algorithms were identical within the same set of comparative experiments; a sketch of a possible generation procedure follows.
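The following sketch shows how such query sets could be generated under our assumptions; the exact generation procedure is not prescribed by the setup above, and the helper names are hypothetical.

import random

def make_query(all_keywords, n, with_order, rng=random):
    keywords = rng.sample(sorted(all_keywords), n)
    # Uniqueness-constraint variant: each keyword occurs in 1 to 5 services.
    occurrences = {k: rng.randint(1, 5) for k in keywords}
    order = []
    if with_order:
        # Sequentiality variant: chain adjacent pairs as (before, after) constraints.
        shuffled = list(keywords)
        rng.shuffle(shuffled)
        order = list(zip(shuffled, shuffled[1:]))
    return keywords, occurrences, order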

Series B was designed to evaluate the effect of the graph size on the experimental results. To demonstrate how the performance changes over a wide range of graph sizes, we conducted the experiments of Series B on all datasets from D#1 to D#5, with the number of service nodes ranging from 22 to 115 and the number of edges from 422 to 1557. As in Series A, we conducted 50 sets of comparative experiments; in each set, we randomly selected two keywords to test the two algorithms (the baseline and UP-DFS). The success rate and the computation time were calculated in the same way as in Series A, and the two types of queries used the same experimental conditions as in Series A.

Series C was designed to evaluate the effect of the graph density on the experimental results. The density of a graph is reflected in its number of edges: when the number of nodes remains unchanged, the density increases as the number of edges increases. We chose D#1, which has a fixed graph size, for this series of experiments. We continuously reduced the number of edges in the graph by gradually reducing the number of OR-branches of each parameter node (see the sketch below), thereby obtaining 5 graphs with the same number of service nodes but different numbers of edges. We also tested the two types of queries on each graph; for each type, we conducted 50 sets of comparative experiments with two keywords. The other experimental conditions were the same as in Series A.
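A possible way to obtain the reduced-density graphs is sketched below; representing the OR-branches as a mapping from each parameter node to its providing services is our assumption, not the paper's data structure.

import random

def reduce_density(or_branches, max_or, seed=0):
    # or_branches: parameter node -> list of services (OR-branches) producing it.
    rng = random.Random(seed)
    reduced = {}
    for param, providers in or_branches.items():
        # Keep at most max_or OR-branches per parameter node, dropping the rest.
        reduced[param] = (rng.sample(providers, max_or)
                          if len(providers) > max_or else list(providers))
    return reduced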

5.3 Performance evaluation

Impact of number of keywords

We investigated how the number of keywords impacts the performance of the proposed algorithms in Series A. As shown in Figure 7a, as the number of keywords increases, the success rate of the baseline algorithm decreases significantly, whereas the success rate of UP-DFS always remains at a high level (no less than 80%). The proposed UP-DFS algorithm thus performs well on queries with different numbers of keywords. In addition, compared with queries with the uniqueness constraint, when sequentiality constraints are added to the keywords, the success rates of both algorithms decrease. One reason is that the addition of sequentiality constraints makes it harder for the algorithms to find a solution within the specified time; the other is that a solution satisfying the sequentiality constraints may not exist at all.

Figure 7 Success rate and computation time with different numbers of keywords

Figure 7b shows the average computation time taken by the two algorithms to answer queries with different numbers of keywords. The computation time of both algorithms increases as the number of keywords in the query grows from 2 to 5, and for each number of keywords the baseline algorithm needs more time than UP-DFS. UP-DFS is thus more efficient for both types of queries, and its computation time changes little with the number of keywords, which also indicates its stability. In addition, the baseline algorithm requires more time to respond to queries with sequentiality constraints; in particular, the more keywords there are, the greater the impact of the sequentiality constraints on the baseline algorithm. This is because, as the number of keywords increases, the sequentiality constraints become more diversified, so the baseline algorithm spends more time on sequentiality detection. In contrast, the Keyword Sequentiality Pruning Strategy of the UP-DFS algorithm effectively reduces the time overhead caused by blind search.

Impact of graph size

To investigate how the graph size impacts the performance, we ran the two algorithms with 2-keyword queries on the 5 preprocessed datasets with different numbers of service nodes. Figure 8a shows the success rates in Series B, which decrease as the graph size increases; the larger the graph, the more complex its structure becomes. Nevertheless, the success rates of UP-DFS are significantly higher than those of the baseline algorithm, and the gap widens as the graph size increases. The baseline algorithm occupies a large amount of memory: when the graph becomes large, its space consumption grows exponentially, leading to memory overflow. Our algorithm is therefore more suitable for keyword search problems on large-scale graphs.

Figure 8 Success rate and computation time with different graph sizes

Figure 8b shows how the graph size impacts the efficiency of the two algorithms. As we can see, the graph size has a greater impact on the computation time than the number of keywords does. The baseline algorithm takes a long time (up to 58 seconds) to answer queries on large data graphs because it has to traverse every path. For the UP-DFS algorithm, the computation time also grows with the graph size: in a large data graph, the number of composition solutions covering all the keyword nodes becomes larger even when only a few keywords are requested, so UP-DFS needs to identify and check more candidate composition solutions and verify the uniqueness and sequentiality constraints, which lengthens its computation time. Nevertheless, the computation time of UP-DFS grows much more slowly than that of the baseline algorithm. In addition, both algorithms run faster with sequentiality constraints on the D#4 dataset, because on this dataset the sequentiality constraints enable more pruning, and the efficiency gained from pruning compensates for the time spent on sequentiality verification.

Impact of graph density

Figure 9 shows how the graph density (the number of edges in the data graph) impacts the performance of the proposed algorithms, based on the experiments of Series C. We derived five sub-datasets from the D#1 dataset by randomly pruning edges, with the number of edges decreasing from more than 1,000 to more than 200. From Figure 9a we can see that as the number of edges increases, the success rates of both algorithms decrease. A higher graph density means more OR-branches for each input parameter; as a result, two queried keywords are more likely to be in an OR relationship, which reduces the success rate of the test cases. The success rates of UP-DFS always remain at a high level, while the success rate of the baseline algorithm drops dramatically as the number of edges increases. This is because more OR-branches cause the baseline algorithm to produce more copies of partial solutions, leading to space overflow.

Figure 9 Success rate and computation time with different graph densities

In terms of efficiency, Figure 9b shows that the computation time of the baseline algorithm increases significantly with the graph density, while that of UP-DFS increases only moderately. In fact, the number of OR-branches in the graph is the main factor affecting the practicability and efficiency of the baseline algorithm; owing to the series of effective pruning strategies we introduced, it has much less impact on the practicability and efficiency of the UP-DFS algorithm.

5.4 Effectiveness evaluation

Our final optimization goal is to minimize the number of services in the Web service composition solution. Therefore, in addition to the performance, we also need to evaluate the effectiveness of the UP-DFS algorithm. Since the final composition graph obtained by the baseline algorithm with breadth-first traversal is clearly optimal in this respect, for each set of queried keywords we compare the numbers of services in the solutions obtained by the UP-DFS algorithm and the baseline algorithm, and record the difference between them (denoted as d). Then, for each set of experiments in Series A-C, we evaluate the effectiveness of the UP-DFS algorithm by calculating the average difference over the queries that both algorithms answered successfully (denoted as \(\bar{d}\)). The statistical results are shown in Table 2.
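Stated formally (in our notation, where \(S^{(i)}\) denotes the set of services in the solution returned for query \(i\), and \(Q\) the set of queries that both algorithms answered successfully):

\[ d_i = \left|S^{(i)}_{\text{UP-DFS}}\right| - \left|S^{(i)}_{\text{baseline}}\right|, \qquad \bar{d} = \frac{1}{|Q|}\sum_{i \in Q} d_i \]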

Table 2 Average difference of service numbers between UP-DFS and the baseline algorithm

As Table 2 indicates, under the different experimental conditions, the difference between the number of services in the solutions obtained by UP-DFS and in the optimal solutions obtained by the baseline algorithm is rather small. We therefore consider the solution obtained by UP-DFS to be approximately optimal; that is, UP-DFS minimizes the number of services in the solution as far as possible while guaranteeing computational efficiency.

6 Conclusion and future work

Service composition makes it possible to combine single services into a composite one that fulfills the complex requirements of users, and such composition should be functional-request oriented. However, traditional solutions usually compose services by semantically matching inputs and outputs among services, which ignores the user's functional requirements. In fact, in addition to the input and output conditions, users also need a convenient way to express the key tasks they would like to perform.

This paper is dedicated to automating Web service composition more effectively and reasonably through keyword query. For this purpose, we first proposed a baseline search algorithm driven by keyword query, which generates a semantic input-output based Web service composition that covers all the keywords, satisfies both the uniqueness and sequentiality constraints and minimizes the number of services. To improve the efficiency of the search process, we then proposed a heuristic DFS algorithm, UP-DFS, with a series of effective pruning strategies. Extensive experiments on public datasets demonstrated the effectiveness and performance of UP-DFS. We advanced the state of the art by employing a heuristic search approach on an AND/OR graph with the required execution order and a minimized number of services, which provides a better experience when creating a service-based system (SBS).

Although the experiments show promising results for UP-DFS, the space consumption of the search process in the proposed algorithms is still somewhat high, and we will look into a more efficient storage scheme in future work. In addition, the inputs and outputs required when running UP-DFS limit its practicability under some circumstances; in the future, we will consider composing Web services using only keywords, without providing the initial inputs and final outputs. Last but not least, studying and proposing a comprehensive QoS framework for evaluating different composition solutions is also part of our future work.