An Enhanced Method to Compute the Similarity Between Concepts of Ontology

Daoui, Abdelhadi; Gherabi, Noreddine; Marzouk, Abderrahim

doi:10.1007/978-3-319-64719-7_9

Abdelhadi Daoui¹⁶,
Noreddine Gherabi¹⁷ &
Abderrahim Marzouk¹⁶

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 640))

Included in the following conference series:

International Conference on Information Technology and Communication Systems

459 Accesses
10 Citations

Abstract

With the use of ontologies in several domains such as semantic web, information retrieval, artificial intelligence, the concept of similarity measuring has become a very important domain of research. Therefore, in the current paper, we propose our method of similarity measuring which uses the Dijkstra’s algorithm to define and compute the shortest path. Then, we use this one to compute the semantic distance between two concepts defined in the same hierarchy of ontology. Afterward, we base on this result to compute the semantic similarity. Finally, we present an experimental comparison between our method and other methods of similarity measuring.

Access provided by CONRICYT-eBooks. Download conference paper PDF

A New Similarity Measure for an Ontology Matching System

Combination of Lexical and Structure-Based Similarity Measures to Match Ontologies Automatically

An Enhanced Ontology Based Measure of Similarity between Words and Semantic Similarity Search

Keywords

1 Introduction

Today, ontologies play an important role in many domains related to the semantic Web [1], information retrieval [2], knowledge engineering [3] and knowledge management [4]. Therefore, several researches and studies have been developed or are being done to cover this fertile area. These researches can be used in different approaches such as concepts creation, ontology design [5], classification [6], or segmentation [7]. The latter is useful for the processing of large ontologies, which is difficult to maintain, namely the addition, modification or deletion of large ontology parts.

Our work will focus on the measuring of the semantic similarity between concepts of ontology. This one is an important concept used in different areas of research. Jeffrey Hau, William Lee and John Darlington [8] use the semantic similarity to define compatibility between semantic web services [9, 10] annotated by OWL ontologies [11]. In [12] the authors present a method based on multiple information resources (lexical taxonomy, corpus…) to measure the semantic similarity between words. The similarity is also used in the correspondence between the shapes for example, the authors in [13] compute the similarity between outlines of 2D shapes by using a technique based on the extracting of the shapes contours which are represented by a set of points, then the authors describe each segment of this contours by a local and global features, these ones will be coded in string of symbols and stored into XML files On which the similarity calculation will be executed.

Also, several techniques are proposed to compute the semantic similarity between ontologies [14, 15]. Where, the authors, in [15] propose a new method to compute the semantic similarity which is based on three steps. In the first the authors compute the semantic similarity of nodes, and then they compute the semantic similarity of relations between these nodes, at last they combine these two results to form a unified value of semantic similarity.

There are two families of approaches to compute the semantic similarity between concepts:

1.
A family based on computing the geometric distance between concepts to define their semantic similarity, where the less distance gives the more similarity [16].
2.
A family based on degree of information sharing, more common information between two concepts means more similarity [8].

The principal idea of our method is defining the shortest path between any node of a graph (in the current paper the term graph is used to describe ontology) and the root node. Then, we base on these shortest paths and our formula for computing the rate of semantic similarity between the concepts of this graph.

This paper is organized as follows: in Sect. 2 we describe our method. The next section presents an experimental comparison with some other methods of similarity measuring, followed by a discussion of the changes made to the methodology. Finally, the Sect. 4 presents our conclusion.

2 Proposed Method

Our method is designed to compute the semantic similarity between two concepts that exist in the same hierarchy of ontology, where all their nodes are connected by “is-a” relations type. This method is summarized in the algorithm shown below in Fig. 1.

The first step in our method is reserved for weight allocation to the arcs which represent the relations between the nodes of the studied ontology (Sect. 2.1). Then, we calculate the shortest path from the node, which we want to measure its similarity to the root node (Sect. 2.2). Afterward, Sect. 2.3 is devoted to compute the semantic distance between the two concepts which we need to measure the similarity between them. Then, we use this semantic distance to define the rate of similarity between these two concepts (Sect. 2.4). Finally, in (Sect. 2.5) we present our global algorithm which summarizes our proposed method.

2.1 Weight Allocation

In the similarity measuring literature, several methods of weight allocation exist, we distinguish between those that affect the value of weight to the nodes like the methods proposed in [17, 18] and the others that allocate the value of weight to the relations (arcs) between nodes [16]. Our method is designed to compute the semantic similarity rate between two concepts of ontology whose all their nodes are connected by the relations of “is-a” type (inheritance type). Therefore, in this paper, we have adjusted one from the second type of weight allocation methods which allows allocating the weight W (m, n) computed using the formula 1 to all the arcs in the ontology.

$$ W({\text{m,}}\;{\text{n}}) = \left[ {\hbox{max} ({\text{depth}}({\text{m}})) + \frac{N^\circ (n)}{NTNodes(G) + 1} + 1} \right]^{ - 1} $$

(1)

Where, m and n represent two nodes directly connected, max (depth (m)) represents the maximum depth of the node m (the depth of the root node is equal to 0, 1 for the nodes directly connected to the root node and so on), NTNodes and N° (n) represent successively the total Number of nodes in graph G and the order number of the node n between their siblings. And this later (N° (n)) is an integer greater or equal to 0.

We consider the following ontology:

The weights of arcs mentioned in the Fig. 2 are calculated by our formula (formula 1) of weight allocation. In this formula, we have taked max (depth (m)) for ensuring that the weight of the current arc is always less than the weights of their previous arcs. Therefore, the semantic similarity between two concepts more specific is greater than two concepts more generalist.

2.2 Shortest Path Defining

The shortest path is a famous problem in the science research domain. Therefore, several algorithms are proposed such as Dijkstra’s algorithm [19], Bellman-Ford algorithm [20], Floyd–Warshall algorithm [21], but each one of these is designed to define the shortest path under some criteria. In this paper, with our criteria which are: (1) the weight is strictly greater than zero. (2) The relations between the nodes are “is-a” type.

The best solution adapted to our case is Dijkstra’s algorithm with the use of the formulas (2) and (3). For this, we have adapted this one with some modifications that allow stopping the algorithm once the root node is visited, which is beneficial to us either at the performance or at the execution time. From our second criterion (The relations between the nodes are “is-a” type), we can deduce that all the arcs of the graph will be oriented to the root node. At the level of calculation of the shortest path, from a given node to the root node, only the nodes that can be considered a generalization of the current node will be visited and not all the nodes of the graph.

$$ W_{0} [m,n] = \left\{ \begin{aligned} 0\;if\;m = n \hfill \\ \infty \;m \ne n \hfill \\ \end{aligned} \right\} $$

(2)

$$ \begin{aligned} & For\;all\;0 \le k \le S - 1 \\ & W_{k + 1} [m,n] = { \hbox{min} }\left\{ {\begin{array}{*{20}l} {W_{k} [m,n]} \hfill \\ {W_{k + 1} [m,\text{x}] + G[\text{x},n]} \hfill \\ \end{array} } \right\} \\ \end{aligned} $$

(3)

m, n and x represent three nodes, S represents the set of all nodes of graph. The nodes x and n are directly connected and W_k [m, n] represents the weight of arc (path) [m, n] at iteration k,

In our adapted algorithm, for defining the shortest path, we use the formula 2 for initializing the nodes weight, where we give the value zero to the start node and infinity to all other nodes in the graph, and we rely on formula 3 to compute the weight of the shortest path which exists between the start node and the root node.

Our adapted algorithm is designed as follows (Fig. 3):

This algorithm will define the shortest path between a given node in the ontology and the root node. In contrast to the Dijkstra’s algorithm, this algorithm will stop the execution once the root node is visited.

In addition, our algorithm allows not only defining the shortest path, but also defining the value of its weight. And we can’t define the shortest path without defining its weight.

For example, we consider the ontology presented in the Fig. 2. To define the shortest path for the node H. There are three paths from H to A (the root node): (H, E, FC, A), (H, G, F, D, FC, A) and (H, G, C, FC, A) (Table 1).

Table 1. An example to define the weight of shortest path.

Full size table

We can easily observe that the shortest path between the node H and the root node is equal to 1.768.

From the Table 2, we can observe that:

Table 2. An example to define the shortest path.

Full size table

Where, SPath (H, E, FC, A) represents the shortest path between the nodes H and A defined by our adapted algorithm.

2.3 Semantic Distance Computing

At this level, to compute the semantic distance between two concepts we use a new technique based on the shortest path calculated in the previous phase. This technique allows eliminating unnecessary parts in the shortest paths and only keeps the necessary parts to calculate the semantic distance between two concepts.

This technique is designed as follows:

We consider the colored segment in the figure below which represents a segment of the ontology mentioned in Fig. 2.

Where, SPath1 and SParh2 represent the shortest paths between the nodes (H and B) and the root node. FC represents the first common node between these two shortest paths, if we come from the nodes in question towards the root node. And CSPath represents common sub shortest path between the FC node and the root node.

$$ {\text{SDis }}\left( {{\text{C1}},{\text{ C2}}} \right) = {\text{W }}\left[ {\text{SPath1}} \right] + {\text{W }}\left[ {\text{SPath2}} \right]{-}2*{\text{W }}\left[ {\text{CSPath}} \right] $$

(4)

Where, C1 and C2 represent two concepts (in our Fig. 4 C1 represents the concept (B, FC, A) and C2 represents the concept (H, E, FC, A)) and

$$ W \, \left[ {SPath_{i} } \right] = \sum\limits_{j = 1}^{k} {W_{j} [m,n]} $$

(5)

m and n represent two nodes directly connected in SPath_i and k represents the set of arcs in SPath_i.

The formula 4 allows calculating the semantic distance between any two concepts exist in the same graph.

2.4 Semantic Similarity Computing

There is an inverse relation between semantic similarity and semantic distance. Increasing of one among them decrease the other. We can categorize the output of the semantic similarity function in three categories:

1.
The two concepts are the same.
2.
Nothing in common between them.
3.
There is a rate of semantic similarity between them.

Therefore, the function of semantic similarity should verify these three conditions:

1.
$ \forall \;(\text{C}1,\text{C}2) \in G:0 \le SSim(\text{C}1,\text{C}2) \le 1 $
2.
$ \forall \;\text{C}1 \in G:SSim(\text{C}1,\text{C}1) = 1 $
3.
$ \begin{aligned} & \forall \;(\text{C}1,\text{C}2,C3) \in G:if\;SDis(\text{C}1,\text{C}2) > \text{SDis}(\text{C}1,\text{C}3)then \\ & \text{SSim}(\text{C}1,\text{C}2) < \text{SSim}(\text{C}1,\text{C}3) \\ \end{aligned} $

Where, SSim represents the semantic similarity, SDis represents the semantic distance and (C1, C2 and C3) represent three concepts of graph G. Ci represents the set of nodes constituting the shortest path between a given node and the root node.

In this paper, we have used the function of semantic similarity computing proposed in [16].

$$ SSim(\text{C}1,C2) = \frac{1}{{\deg^{*} SDis(\text{C}1,\text{C}2) + 1}} $$

(6)

C1 and C2 represent two concepts and the parameter “deg” represents the impact degree of Semantic distance on semantic similarity, and it should be between 0 < deg ≤ 1 (the concrete value of “deg” is defined in the experience).

2.5 Global Algorithm

This section is devoted to our global algorithm that includes all the previous algorithms and techniques (Fig. 5).

3 Experiments

In this section, by consulting the WordNet, we have got a fragment of ontology hierarchy shown in Fig. 6, concerning the terms: Vehicle, truck, car, family car, sport car, luxury car and bus which we want to compare the similarity between them.

After computation of semantic similarity between these concepts by using our method (where, we have fixed the parameters deg to 0,4) and two others, we have obtained the results presented in the following Tables 3, 4, and 5.

Table 3. Application of the first method [22].

Full size table

Table 4. Application of the second method [16].

Full size table

Table 5. Application of our method.

Full size table

In this experimental comparison, each node presents in the following tables such as Truck, Car and so on, represents a concept which is constituted from the set of all nodes of the shortest path which exists between this node and the root node. For example the node “SportCar” presents in these tables represents the concept (SportCar, Car, Vehicle).

By analyzing of the obtained results of semantic similarity, we observe that the first method [22] can only find the total similarity (otherwise, it can only discover the similarity between the same concepts). The second method [16] can find the semantic similarity between concepts but with a low score. Finally, our method can also calculate the semantic similarity between concepts but with a high score.

The second method [16] allows affecting the same value of the semantic similarity between the sibling concepts and their parent concept (for example, the semantic similarity between the concepts (Vehicle) and (Vehicle, Truck) is equal to 0.71 and the same value of semantic similarity between (Vehicle) and (Vehicle, Car)), but in the reality the rate of the semantic similarity between these concepts differs. For this reason, we have proposed our method which allows avoiding this type of problems by affecting the different values of semantic similarity for similar cases.

Also our method uses a simple function of weight allocation and bases on famous and robust algorithms with some adaptations to minimize the time of execution. Therefore, this one can calculate the semantic similarity in very short time even in industrial size ontologies.

4 Conclusion

In this paper, we have presented a new method to calculate the similarity between two concepts of ontology. Then, we have compared it against other methods that already exist. The obtained results are very interesting and prove the strength of our method.

We are interested in the future works to adjust this method to support the calculation of similarity between any two concepts that may exist in the same ontology or not and be related by any type of relations.

References

Lee, T.B., Hendler, J., Lassila, O.: The semantic web. Sci. Am. 284(5), 1–18 (2001)
Google Scholar
Vallet, D., Fernández, M., Castells, P.: An ontology-based information retrieval model. In: Gómez-Pérez, A., Euzenat, J. (eds.) ESWC 2005. LNCS, vol. 3532, pp. 455–470. Springer, Heidelberg (2005)
Google Scholar
Begam, F., Ganapathy, G.: Knowledge Engineering approach for constructing ontology for e-learning services. In: IEEE ICACSIS 2011, pp. 125–132 (2011)
Google Scholar
Brandt, S.C., Morbach, J., Miatidis, M., TheiBen, M., Jarke, M., Marquardt, W.: An ontology-based approach to knowledge management in design processes. Comput. Chem. Eng. 32(2008), 320–342 (2007)
Google Scholar
Trumbach, C.C., McKesson, C., Ghandehari, P., DeCan, L., Eslinger, O.: Innovation and design process ontology. In: IEEE Portland International Conference on Management of Engineering and Technology, pp. 2121–2132 (2015)
Google Scholar
Melo, G.D., Siersdorfer, S.: Multilingual text classification using ontologies. In: Amati, G., Carpineto, C., Romano, G. (eds.) ECIR 2007l. LNCS, vol. 4425, pp. 541–548. Springer, Heidelberg (2007)
Google Scholar
Seidenberg, J., Rector, A.: Web ontology segmentation: analysis, classification and use. In: The 15th International Conference on World Wide Web, pp. 13–22. ACM, New York (2006)
Google Scholar
Hau, J., Lee, W., Darlington, J.: A semantic similarity measure for semantic web services. In: The 14th International Conference on World Wide Web, Chiba, Japan (2005)
Google Scholar
Burstein, M., Bussler, C., Zaremba, M., Finin, T., Huhns, M.N., Paolucci, M., Sheth, A.P., Williams, S.: A semantic web services architecture. IEEE Internet Comput. 5(9), 52–61 (2005)
Google Scholar
McIlraith, S.A., Martin, D.L.: Bringing semantics to web services. IEEE Intell. Syst. 1(18), 90–93 (2003)
Article Google Scholar
LNCS Homepage. https://www.w3.org/TR/owl2-overview. Accessed 17 Nov 2016
Li, Y., Bandar, Z.A., McLean, D.: An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. Knowl. Data Eng. 4(15), 871–882 (2003)
Google Scholar
Gherabi, N., Bahaj, M.: Outline matching of the 2D shapes using extracting XML data. In: Elmoataz A., et al. (eds.) ICISP 2012. LNCS 7340, pp. 502–512. Springer, Heidelberg (2012)
Google Scholar
Li, F.: An improved method about the similarity calculation of ontology. In: IEEE International Conference on Multimedia Technology, pp. 1–4 (2010)
Google Scholar
Wang, H., Han, X.: Research on similarity of semantic web. In: IEEE International Conference on Computer Application and System Modeling, pp. 166–169 (2010)
Google Scholar
Ge, J., Qiu, Y.: Concept similarity matching based on semantic distance. In: IEEE 4th International Conference on Semantics, Knowledge and Grid, pp. 380–383 (2008)
Google Scholar
Zhong, J., Zhu, H., Li, J., Yu, Y.: Conceptual graph matching for semantic search. In: Priss U., Corbett D., Angelova G. (eds.): ICCS 2002, LNAI 2393, pp. 92–106. Springer, Heidelberg (2002)
Google Scholar
Ganjisaffar, Y., Abolhassani, H., Neshati, M., Jamali, M.: A similarity measure for OWL-S annotated web services. In: IEEE International Conference on Web Intelligence, pp. 621–624 (2006)
Google Scholar
Zhan, F.B.: Three fastest shortest path algorithms on real road networks: data structures and procedures. J. Geogr. Inf. Decis. Anal. 1(1), 69–82 (2001)
Google Scholar
Ma, Q., Steenkiste, P.: Quality-of-service routing for traffic with performance guarantees. In: The 5th International Workshop on Quality of Service, pp. 115–126. Springer, Berlin (1997)
Google Scholar
Hougardy, S.: The Floyd-Warshall algorithm on graphs with negative cycles. Inf. Process. Lett. 110(2010), 279–281 (2010)
Article MathSciNet MATH Google Scholar
Giunchiglia, F., Shvaiko, P., Yatskevich, M.: S-Match: an algorithm and an implementation of semantic matching. In: Davies, J., et al. (eds.): ESWS 2004, LNCS 3053, pp. 61–75. Springer, Heidelberg (2004)
Google Scholar

Download references

Author information

Authors and Affiliations

IR2M Laboratory, FSTS, Hassan 1st University, Settat, Morocco
Abdelhadi Daoui & Abderrahim Marzouk
LIPOSI Laboratory, ENSAK, Hassan 1st University, Khouribga, Morocco
Noreddine Gherabi

Authors

Abdelhadi Daoui
View author publications
You can also search for this author in PubMed Google Scholar
Noreddine Gherabi
View author publications
You can also search for this author in PubMed Google Scholar
Abderrahim Marzouk
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Abdelhadi Daoui .

Editor information

Editors and Affiliations

ENSA, Khouribga, Morocco
Gherabi Noreddine
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Janusz Kacprzyk

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Daoui, A., Gherabi, N., Marzouk, A. (2018). An Enhanced Method to Compute the Similarity Between Concepts of Ontology. In: Noreddine, G., Kacprzyk, J. (eds) International Conference on Information Technology and Communication Systems. ITCS 2017. Advances in Intelligent Systems and Computing, vol 640. Springer, Cham. https://doi.org/10.1007/978-3-319-64719-7_9

Download citation

DOI: https://doi.org/10.1007/978-3-319-64719-7_9
Published: 02 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64718-0
Online ISBN: 978-3-319-64719-7
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics

An Enhanced Method to Compute the Similarity Between Concepts of Ontology

Abstract

Similar content being viewed by others

A New Similarity Measure for an Ontology Matching System

Combination of Lexical and Structure-Based Similarity Measures to Match Ontologies Automatically

An Enhanced Ontology Based Measure of Similarity between Words and Semantic Similarity Search

Keywords

1 Introduction