1 Introduction

More and more tools and methods for geospatial data analysis are being developed and distributed on the web, which makes it easier to solve everyday problems [1]. For example, GIS services can help find the best location for a fire station quickly and easily [2]. GIS services are also used in agriculture, medical care, transportation, and various other fields. As a result, there is a trend toward composing multiple GIS services to provide added value and meet users’ requirements. Automatic service composition can be of great value to GIS users because it greatly broadens the range of requirements that can be handled [3]. However, making a GIS service composition fulfill functional requirements remains a major challenge for service developers [4].

The process of service discovery, selection and composition is a crucial task in web service based application development [5]. The method of [6] is based on syntax matching and does not take the semantic information of services into account. Later, the author of [11] proposed to optimize service composition by considering QoS, but again without considering semantic information. Some scholars have argued that automating the interactions between web services is important [7]. The concept of ontology was therefore introduced in [8,9,10] to measure the semantic distance between services. However, building a comprehensive and standard ontology library for GIS remains a major problem.

To solve the problem mentioned above, we propose a method that composes and recommends GIS services in a semantic way. The main contributions of this article are summarized as follows:

  • To get semantic relationships between services, we use Normalized Google Distance (NGD) to discover the actual inter-service invocation status.

  • Considering the hierarchical structure of ArcGIS Services, a round of filtering is carried out before the network is built in order to reduce the retrieval time.

  • To shorten the search time in the network, this paper uses an improved simulated annealing algorithm to obtain a relatively good solution.

The rest of this paper is organized as follows. Section 2 defines the relevant concepts. Section 3 introduces the mechanism for constructing the dynamic semantic model. Section 4 uses an improved heuristic optimization algorithm to accelerate the selection of the DAG. Section 5 presents our experimental evaluation and analyzes the results. Section 6 introduces related work on service recommendation. Finally, Section 7 concludes this work.

2 Preliminaries

Definition 1 (User Requirement). A user requirement is a tuple req = (IuP, OutP), where:

  • IuP is a parameter set containing all user input parameters;

  • OutP is a parameter set containing all user output parameters.

A req consists of the input and output parameter sets given by the user.

Definition 2 (Directed Acyclic Graph). A Directed Acyclic Graph, which can be executed to meet the user requirement, is a tuple DAG = (S, INV), where:

  • S is the set of ArcGIS Services contained in this DAG, which can also be regarded as the vertices of this DAG;

  • INV is the set of directed links, which represent the invocation relationships between the ArcGIS Services contained in this DAG.

A DAG describes the invocation relationships between services and is generated to meet user requirements.

Definition 3 (ArcGIS Service). An ArcGIS Service is a tuple s = (nm, dsc, IuP, OutP), where:

  • nm is the name of the ArcGIS Service;

  • dsc is an explanation of the functionality of this ArcGIS Service;

  • IuP is the set of input parameters of this ArcGIS Service;

  • OutP is the set of output parameters of this ArcGIS Service.

Each s has a specific function, which can be used to solve a specific problem.

Definition 4 (Semantic Services Network Model). A Dynamic Semantic Services Network is a triple SNetM = (S, INV, WGT), where:

  • S is the set of services contained in this Dynamic Semantic Services Network;

  • INV is the set of directed links between ArcGIS Services, which represent the ability of an ArcGIS Service to invoke others;

  • WGT is the set of weights defined on the directed links INV, which represent the possibility that an ArcGIS Service is invoked by another.

An example is shown in Fig. 5. Each vertex represents a service, each directed edge represents the direction of service execution, and the value on an edge represents the semantic similarity between the services it connects.

3 Construction of Semantic Network Model

3.1 Hierarchical Structure of ArcGIS Services

ArcGIS offers users advanced GIS functionality in the form of geoprocessing tools, which are organized in a tree structure [12], as shown in Fig. 1. This structure helps us discard unnecessary ArcGIS services and thus save time and computing resources. For example, if the output parameter of the previous service(s) is vector data, there is no need to retrieve the cluster of ArcGIS services in the same subtree that accept only raster data as input. Using the ArcGIS services tree structure therefore reduces the scope of the search and speeds up retrieval.

Fig. 1. ArcGIS services tree structure.

3.2 Services Semantic Calculation

(1) Normalized Google Distance (NGD)

Based on the principle that words with similar meanings tend to appear together more frequently on web pages, we use NGD to estimate the invocability between services. NGD is calculated by Eq. 1:

$$\begin{aligned} NGD(x,y) = \frac{\max (\log f(x), \log f(y)) - \log f(x,y)}{\log M - \min (\log f(x), \log f(y))} \end{aligned}$$
(1)

In Eq. 1, M represents the total number of pages indexed by Google, f(x) and f(y) are the numbers of hits for the search terms x and y, respectively, and f(x, y) is the number of pages on which both x and y appear. If two search terms x and y never appear together on the same page, the normalized Google distance between them is infinite. Thus, the value of NGD ranges from 0 to infinity: a larger value indicates a greater semantic distance between the two words, and vice versa.
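Eq. 1 can be evaluated directly once the hit counts are known. The following is a minimal Python sketch, assuming the counts f(x), f(y), f(x, y) and the index size M have already been obtained by some means; the function name `ngd` and its argument names are illustrative and not taken from the original implementation.

```python
import math


def ngd(fx: float, fy: float, fxy: float, m: float) -> float:
    """Normalized Google Distance of Eq. 1, computed from raw hit counts.

    fx, fy : hit counts of the two search terms (assumed positive)
    fxy    : number of pages containing both terms
    m      : total number of indexed pages (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("hit counts of single terms must be positive")
    if fxy <= 0:
        # The terms never co-occur, so the distance is infinite.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(m) - min(log_fx, log_fy)
    return numerator / denominator
```

Two terms that always appear together give a distance of 0, while terms that never co-occur give an infinite distance, matching the range described above.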

(2) Services Semantic Calculation

The name of each GIS service is first broken down into multiple words. A method based on the minimum-cost maximum-flow algorithm [13, 14] is then used to compute the cost between the word sets \(WD_{serNm1}\) and \(WD_{serNm2}\). The name similarity is then computed by Eq. 2.

$$\begin{aligned} \begin{aligned} sim_{serNm}&\!\!(ser{.}nm_{1},ser{.}nm_{2})\\&\!\!= 1-\frac{cost }{max (SizeOf (WD_{serNm1},WD_{serNm2}))} \end{aligned} \end{aligned}$$
(2)

The similarity of the text descriptions of ArcGIS Services is calculated by Eq. 3, which uses xsimilarity [15]. In this method, the similarity of the words in the sentences (denoted as wordSim) and the similarity of the word order (denoted as ordSim) are taken as parameters. The specific formula is:

$$\begin{aligned} \begin{aligned} sim_{serDsc}&\!\!(ser_{1}{.}dsc_{1},ser_{2}{.}dsc_{2})\\&\!\!= \xi \times wordSim +(1-\xi )\times ordSim \end{aligned} \end{aligned}$$
(3)

The similarity between ArcGIS Services is calculated from \(sim_{serNm}\) and \(sim_{serDsc}\) by Eq. 4.

$$\begin{aligned} \begin{aligned} sim_{act}&\!\!(act_{1},act_{2})\\&\!\!= \varrho \times {sim_{serNm}}({ser_{1}}{.}{nm_{1}},{ser_{2}}{.}{nm_{2}})\\&\!\!+ (1-{\varrho })\times {sim_{serDsc}}({ser_{1}}{.}{dsc_{1}},{ser_{2}}{.}{dsc_{2}}) \end{aligned} \end{aligned}$$
(4)
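The way Eqs. 2 to 4 fit together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the matching cost of Eq. 2 is taken as an already computed input (the min-cost max-flow step is not reproduced), Eq. 3 is read as a convex combination of wordSim and ordSim, and the default weights of 0.5 are placeholders rather than values reported in this paper. All function names are hypothetical.

```python
def name_similarity(cost: float, words1: list, words2: list) -> float:
    """Eq. 2: 1 - cost / max(size of the two word sets).

    `cost` is assumed to come from the min-cost max-flow matching
    between the two word sets, which is not implemented here.
    """
    return 1.0 - cost / max(len(words1), len(words2))


def description_similarity(word_sim: float, ord_sim: float, xi: float = 0.5) -> float:
    """Eq. 3 read as a weighted mix of word similarity and word-order similarity."""
    return xi * word_sim + (1.0 - xi) * ord_sim


def service_similarity(name_sim: float, desc_sim: float, rho: float = 0.5) -> float:
    """Eq. 4: weighted mix of name similarity and description similarity."""
    return rho * name_sim + (1.0 - rho) * desc_sim
```

With the weights chosen in [0, 1], each result stays within the range spanned by its two inputs, so the combined similarity remains comparable across service pairs.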

(3) Calculating the Semantic Value of Workflow Patterns

There are two common workflow patterns for GIS service composition, the sequential workflow pattern and the parallel workflow pattern, as shown in Fig. 2. The semantic values of the sequential and parallel workflow patterns are calculated by Eqs. 5 and 6, respectively.

$$\begin{aligned} SIM_{seq}=\sum _{i=1}^{n} S_{i} \end{aligned}$$
(5)

$$\begin{aligned} SIM_{para}=\frac{\sum _{i=1}^{n} S_{i}}{n} \end{aligned}$$
(6)

Fig. 2. Sequential workflow pattern and parallel workflow pattern.
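Eqs. 5 and 6 amount to a sum and a mean. The short Python sketch below assumes that each \(S_{i}\) is the semantic value attached to the i-th element of the pattern, which the text does not state explicitly; the function names are illustrative.

```python
def sim_sequential(values: list) -> float:
    """Eq. 5: the semantic value of a sequential pattern is the sum of its parts."""
    return float(sum(values))


def sim_parallel(values: list) -> float:
    """Eq. 6: the semantic value of a parallel pattern is the mean of its branches."""
    if not values:
        raise ValueError("a parallel pattern needs at least one branch")
    return sum(values) / len(values)
```

Averaging the parallel branches keeps a wide parallel block from dominating the total merely because it contains many services.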

3.3 Network Construction

(1) Narrowing Candidate Service Set

Retrieving the entire set of ArcGIS services every time candidate services are selected would be very expensive. We therefore exploit the tree structure of the ArcGIS services (see Sect. 3.1) to reduce the service search space. Algorithm 1 describes how the candidate service set is narrowed.

Algorithm 1. Narrowing the candidate service set.

Algorithm 1 produces the candidate set S. T represents the set of all GIS services organized in a tree structure, I represents all the input parameters, and S represents all candidate services that take these parameters as input. First, S is initialized with a copy of all GIS services in T, and the number of first-level subtrees in the tree is counted and assigned to the variable n (lines 1–2). By checking the required parameter types of each subtree against I, the subtrees that do not match are removed from S. When all subtrees have been checked, the number of remaining subtrees is counted and assigned to the variable k (lines 3–8). For each sub-subtree of the remaining subtrees, the parameter types are checked again: if the services in a sub-subtree require more parameters than the parameter types available in I, that sub-subtree is deleted (lines 9–15). Then the services in \(Var_S\) that take all parameters in I as input (denoted as \(Var_S(I)\)) are found and assigned to S. Finally, the output parameters of S are put into P (lines 16–18).
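The pruning idea behind Algorithm 1 can be sketched as a tree traversal that skips whole subtrees whose parameter types do not match. The Python code below is a simplified illustration, not the algorithm as published: the `Service` and `Node` classes, the `accepted` field summarising the parameter types handled below a node, and the function `narrow_services` are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    inputs: frozenset   # parameter types the service needs
    outputs: frozenset  # parameter types the service produces


@dataclass
class Node:
    name: str
    accepted: frozenset                      # parameter types handled below this node
    services: list = field(default_factory=list)
    children: list = field(default_factory=list)


def narrow_services(root: Node, available) -> list:
    """Collect services runnable with `available`, skipping unmatched subtrees."""
    available = frozenset(available)
    candidates = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not (node.accepted & available):
            continue  # prune: nothing under this node consumes the available types
        # Keep only services whose required inputs are all available.
        candidates.extend(s for s in node.services if s.inputs <= available)
        stack.extend(node.children)
    return candidates
```

Because a mismatched subtree is discarded after a single set intersection, none of the services below it is inspected, which is where the saving in retrieval time comes from.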

(2) Building Semantic Networks

Algorithm 2 builds the solution space network from which the DAG is generated and recommended to users. It takes the user requirement req as input and produces the solution space network model SNetM as output. First, the input parameters IuP of req are copied to P, and \(Var\_S\) and INV are set to empty sets, where \(Var\_S\) stores the services generated during the process and INV records the invocation relationships between services. The number of parameters in P is recorded in the variable n, and \(Var\_P\) is set to null so that it can store the generated parameters (lines 1–2). For each parameter in P, if Algorithm 1 (denoted as NarrSer) finds a narrowed service set, the appropriate services are selected from that set and put into \(Var\_S\). The output parameters of all services generated during this process are put into \(Var\_P\), and the relationships and semantic values between these services are recorded in INV (lines 3–9). Looking for candidate services that take multiple parameters as input is similar to looking for candidate services that take one parameter (lines 10–18). Then the iteration counter k is incremented by one, and the parameters in the intermediate variable \(Var\_P\) are copied into P. SNetM is output if the generated parameters include the parameters required by the user or if the number of iterations exceeds the threshold. Otherwise, the algorithm jumps back to line 2 and repeats the above procedure (lines 19–24).

Algorithm 2. Building the semantic network.

In this way, given the input and output parameters of the user requirement, Algorithms 1 and 2 construct a dynamic semantic network model that contains the DAG needed by the user. Fig. 5 shows an example of such an SNetM.
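The round-by-round expansion described for Algorithm 2 can be approximated by a forward-chaining loop over the known parameters. The Python sketch below is a simplification under stated assumptions: it scans a flat service list instead of calling Algorithm 1, the `similarity` callback stands in for the semantic value of Eq. 4, and the names `Service`, `build_network` and `max_rounds` are hypothetical.

```python
from collections import namedtuple

Service = namedtuple("Service", ["name", "inputs", "outputs"])


def build_network(services, req_inputs, req_outputs, similarity, max_rounds=10):
    """Grow a weighted invocation network until the required outputs are reachable."""
    known = set(req_inputs)   # parameters available so far (P in the text)
    selected = {}             # service name -> Service (S of the network)
    edges = {}                # (caller, callee) -> weight (INV with WGT)
    producers = {}            # parameter -> names of services that output it
    for _ in range(max_rounds):
        if set(req_outputs) <= known:
            break             # the user's outputs can already be produced
        added = [s for s in services
                 if s.name not in selected and set(s.inputs) <= known]
        if not added:
            break             # no further service can be invoked
        for s in added:
            selected[s.name] = s
            for p in s.inputs:
                for prev in producers.get(p, ()):
                    edges[(prev, s.name)] = similarity(selected[prev], s)
        # Outputs become visible only after the round, so every edge points
        # from an earlier round to a later one and the network stays acyclic.
        for s in added:
            for p in s.outputs:
                producers.setdefault(p, set()).add(s.name)
                known.add(p)
    return selected, edges
```

The `max_rounds` bound plays the role of the iteration threshold mentioned above: expansion stops either when the requested outputs are covered or when the bound is reached.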

4 Recommendation System Based on Improved Simulated Annealing Algorithm

4.1 Generating New Path

To reach a globally optimal solution rather than a locally optimal one, the simulated annealing algorithm must accept new solutions with a certain probability. This section therefore describes how a new path is generated.

Fig. 3. Dividing into blocks.

Fig. 4. Dividing into blocks.

  • Dividing the Solution into Small Modules: The resulting graph solution is divided into blocks according to the workflow patterns (Fig. 2).

  • Selecting the Replacement Module: A random number generator is used to select one block, which is marked and will be replaced by other block(s) in the SNetM, as shown in Fig. 4.

  • Generating a New Solution: The selected block is replaced, and the replacement block is connected between the preceding block and the following block. A new graph result is thus produced, as illustrated by example B in Fig. 3.

4.2 Improved Simulated Annealing Algorithm

Algorithm 3. Improved simulated annealing algorithm.

The simulated annealing algorithm starts with an initial solution i and a control parameter t. The process is controlled by the cooling schedule, which in Algorithm 3 includes the initial value of the control parameter t, its attenuation factor \(\alpha \), the number of iterations ILOOP for each value of t, and the stop condition EPS. \(Cur\_DAG\) is a result randomly found in the network that meets the user’s input and output requirements. \(Best\_DAG\) represents the DAG that best meets the user’s requirement so far. Coolingtable represents the set of parameters that control the progress of the algorithm.

The parameters \(P\_L\) and \(P\_F\) record, respectively, the number of times a worse result is accepted in a given stage of the annealing process and the number of times this process is repeated. \(Best\_DAG\) and \(New\_DAG\) are initially set to the same value as \(Cur\_DAG\) (lines 1–2). The procedure changeSolution() is used to generate \(New\_DAG\), and the difference in semantic value between the two paths (denoted as dE) is calculated. If the semantic value of \(New\_DAG\) (denoted as \(SIM_{New\_DAG}\)) is higher than that of \(Cur\_DAG\) (denoted as \(SIM_{Cur\_DAG}\)), \(New\_DAG\) is accepted as \(Cur\_DAG\). Otherwise, \(New\_DAG\) is accepted only with a certain probability, to avoid being trapped in a local optimum, and \(P\_L\) is incremented by 1. If \(P\_L\) is greater than LIMIT, the loop is exited (lines 3–19). After this process, if \(SIM_{Cur\_DAG}\) is higher than \(SIM_{Best\_DAG}\), \(Best\_DAG\) is replaced with \(Cur\_DAG\) (lines 20–22). The algorithm then checks whether it is finished by testing whether \(P\_F\) is greater than OLOOP or the temperature t has reached the minimum value EPS. If the exit condition is not met, the temperature is cooled using the attenuation coefficient \(\alpha \) and the cycle continues (lines 20–27). As a result, the DAG is found in the semantic network of Fig. 5 and recommended to the user.
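The core accept-or-reject loop and the geometric cooling schedule can be illustrated with a generic simulated annealing routine. The Python sketch below is a textbook variant, not Algorithm 3 itself: it omits the \(P\_L\)/LIMIT and \(P\_F\)/OLOOP counters, uses the standard Metropolis acceptance probability exp(dE / t) as an assumption, and takes the neighbor generator and scoring function as hypothetical callbacks standing in for changeSolution() and the semantic value of a DAG. The default `eps` of 1e-3 is chosen only to keep the example short.

```python
import math
import random


def simulated_annealing(initial, neighbor, score, t0=1000.0, alpha=0.9,
                        iloop=80, eps=1e-3, rng=None):
    """Maximise `score` using geometric cooling t <- alpha * t.

    neighbor(solution, rng) -> new candidate solution
    score(solution)         -> value to maximise (semantic value in this paper)
    """
    rng = rng or random.Random(0)
    current = best = initial
    cur_score = best_score = score(initial)
    t = t0
    while t > eps:
        for _ in range(iloop):
            candidate = neighbor(current, rng)
            cand_score = score(candidate)
            d_e = cand_score - cur_score
            # Always accept improvements; accept worse candidates with
            # probability exp(d_e / t), which shrinks as t cools.
            if d_e > 0 or rng.random() < math.exp(d_e / t):
                current, cur_score = candidate, cand_score
            if cur_score > best_score:
                best, best_score = current, cur_score
        t *= alpha
    return best, best_score
```

Keeping the best solution separately from the current one means that accepting a worse candidate to escape a local optimum never loses the best DAG seen so far.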

Fig. 5. The dynamic semantic network model.

5 Experiment

5.1 Dataset Description and Precision

To verify the effectiveness of the proposed method, we implemented it in Java and used a MySQL database to store the data. The experiments were conducted on a desktop with an Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz, 8.00 GB of memory, and a 64-bit Windows 10 operating system.

The dataset consists of 300 geoprocessing services from the ArcGIS Toolbox, organized in a tree. In addition, 112 DAG rules, which represent the invocation rules between services under different requirements, were collected from online communities such as CSDN.

Our experimental results are evaluated by precision and running time. The precision is computed as follows:

$$\begin{aligned} precision = \frac{|DAG_{P} \cap DAG_{R}|}{N} \end{aligned}$$
(7)

In Eq. 7, \(DAG_{P}\) represents the DAG generated by our method, and \(DAG_{R}\) represents the correct DAG in the DAG rule set that actually meets the requirements of the user. N is the number of operations contained in \(DAG_{P}\). To obtain a more reliable value of precision, we ran the experiment 112 times with different user requirements. The average precision is 76.4%.
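As a small worked illustration of Eq. 7, the Python sketch below treats each DAG as the set of operation names it contains, which is an assumption made for this example; the function name `dag_precision` is hypothetical.

```python
def dag_precision(predicted_ops, reference_ops) -> float:
    """Eq. 7: share of the predicted operations that also occur in the reference DAG."""
    predicted = set(predicted_ops)
    if not predicted:
        return 0.0  # an empty prediction contains no correct operations
    return len(predicted & set(reference_ops)) / len(predicted)
```

For instance, a generated DAG with four operations, two of which appear in the reference DAG, has a precision of 0.5 under this reading.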

5.2 Impact of Parameters in Cooling Table

To investigate the effect of the Cooling Table parameters on the proposed method in Algorithm 3, we set the parameters in the Cooling Table to three different sets of values and compared them, as shown in Fig. 6.

The Cooling Table contains four parameters: \(t_{0}\), \(\alpha \), EPS and ILOOP. Normally, the values of \(t_{0}\) and EPS are 1000 and 0, so we only consider \(\alpha \) and ILOOP, which are denoted as (\(\alpha \), ILOOP) in Fig. 6. \(\alpha \) represents the rate of temperature decay and ILOOP represents the number of iterations at each temperature stage; the two are mutually dependent. Although a higher value of \(\alpha \) gives a better cooling behavior in Fig. 6(a), it also takes much more time. Similarly, a higher value of ILOOP costs more computing resources, as shown in Fig. 6(b), so the value of ILOOP should not be very large. Overall, the experimental accuracy is relatively high and the computation time is relatively small when \(\alpha \) is 0.9 and ILOOP is 80.

Fig. 6. Influence of parameters in cooling tables.

5.3 Comparison with Another Method

The number of services in a DAG varies from 3 to 9, so we consider the impact of the number of services on the service composition. We compare our method with the method proposed in [16], which uses a genetic algorithm (GA) as the heuristic optimization algorithm.

Figure 7 shows that the precision reaches its peak when the number of services in the DAG is between 4 and 6. The reasons are as follows. If the required DAG contains a large number of services, some detailed or transitional services may not be found during actual execution, which lowers the precision. This is why precision decreases as the number of services increases. Because the compared method can only produce service chains, the precision of our method is higher than that of the compared method in Fig. 7(a). As the number of services in the DAG increases, the service composition also consumes more time. Our method takes more time, as shown in Fig. 7(b), because our graph-structured solution is much more complex than a chain structure. However, the usability of our proposal is much higher than that of the compared method.

Fig. 7. The precision and run time of proposed method.

6 Related Work

6.1 Service Composition Technology

Web service composition technology, which aims to provide added value by loosely coupling web services, has been used to efficiently find near-optimal composite services that satisfy users’ requirements reasonably well [17]. Syntax-based service composition depends on matching selected keywords against web service descriptions [18] and takes little account of the semantics of web services. To capture the relationships between concepts, scholars use a certain criterion to measure the semantic distance in [19]. In [10], the authors proposed a novel Permutation-based Multifactorial Evolutionary Algorithm to solve the fully automated semantic service composition problem for diverse user segments with different QoSM preferences. The principle of [20, 21] is to use an ontology as the fundamental criterion for measuring the conceptual distance between the user’s requirement and the services. The ontology-based approach is not suitable for direct application in the GIS domain, because constructing the ontology is hard work. Clearly, the accuracy of the semantic annotations of web services significantly affects the effectiveness of web service discovery, recommendation and composition [22]. In [11], the author proposed an invocation-based technique to verify the QoS accuracy by using annotations.

6.2 GIS Services Composition

Service composition in the GIS domain can be divided into three categories: semi-automated GIS services composition, syntax-based GIS services composition, and semantic GIS services composition. In [23], the author proposed the registration-binding-lookup mechanism, which is a semi-automated approach to service composition recommendation. To provide services to users automatically, some authors suggest taking the service context into consideration. In [24], the authors proposed an active proxy, which considers the service context and the user’s requirement, extracts useful information and sends it to the server. However, this method can only be used in location-based services. In [25], the authors mapped the OWS input/output messages to WSRF ResourceProperties, which could bring higher efficiency. However, this method does not incorporate many useful WSRF functions. In addition, high-performance data transfer remains a challenge in GIS services.

7 Conclusion

Advances in Internet technologies have improved GIS services discovery, composition and recommendation. It is becoming increasingly important to combine GIS services to help users solve various problems. In this paper, we therefore discuss the related technologies of service composition in the GIS and computer fields and analyze the principles of these technologies. We find that effective use of the semantic information between services can improve the quality of service composition and thus meet users’ requirements better. To this end, we use the tree organization structure of ArcGIS services to quickly select the set of services that meet the requirements according to the syntax matching relationships between services. To further explore the semantic correlation between services, we use the NGD to build the dynamic semantic network. A simulated annealing algorithm is then used to find the DAG with a high semantic value and recommend it to the user. Experiments show that our method can recommend a meaningful DAG with higher precision.