1 Introduction

Ontology alignment is a well-known, broadly researched topic addressing the problem of establishing a set of mappings between independently created and maintained ontologies. These mappings connect elements from ontologies that relate to the same real-world objects. In other words, ontology alignment is the task of designating pairs of elements taken from different ontologies that are as similar as possible. The topic has recently regained researchers’ attention due to the emergence of knowledge graphs and entity alignment, which put more focus on matching instances, representing real-world objects, than on matching the schemas describing those objects.

Even though the task is simple to describe and understand, a solution is not easy to achieve. First of all, ontologies are very complex knowledge representation methods. Using them as a backbone of a knowledge base entails decomposing some domain into elementary classes (which form the level of concepts), defining how they can interact with each other (forming the level of relations), and finally defining instances of concepts. In consequence, independent ontologies representing the same universe of discourse may differ significantly.

In the literature, a plethora of different solutions to this task can be found. A majority of methods can be stripped down to calculating similarities between two concepts taken from two ontologies.

These similarities can be based on the analysis of different aspects that describe the content of ontologies. For example, they can involve comparing concept names, how their hierarchies are constructed, which instances they include, what are their attributes, etc.

Such methods, when used separately, cannot be expected to yield satisfying outcomes. Therefore, there must be a method of combining several different values into one interpretable output. The most obvious approach would involve calculating an average similarity value from all partial functions. If this average is higher than some accepted threshold, then the pair of concepts is added to the final result.
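The averaging baseline mentioned above can be sketched in a few lines. This is only an illustration: the function name `average_match`, the threshold of 0.7 and the sample values are assumptions made for the example and do not come from any particular alignment system.

```python
from statistics import mean


def average_match(similarities, threshold=0.7):
    """Return True when the mean of the partial similarities reaches the threshold.

    The default threshold is an assumed value, chosen only for illustration.
    """
    if not similarities:
        raise ValueError("at least one partial similarity is required")
    return mean(similarities) >= threshold


# Hypothetical partial similarities calculated for one pair of concepts.
partial = [0.9, 0.8, 0.4, 0.6]
print(average_match(partial))
```

A single low partial value can pull the mean below the threshold even when the remaining measures agree, which is one reason to look for a more expressive way of combining them.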

In our previous publications, we have proposed a different approach to combining several similarity values to decide whether or not some elements taken from independent ontologies are matchable. We developed a set of fuzzy inference rules that can be used to aggregate the aforementioned similarities. In other words, we proposed including another layer of experts’ knowledge about ontology alignment in the process.

Since the level of concepts is the most widely covered in the literature, in [8] we focused on the level of relations, and in [9] on the level of instances. Therefore, the level of concepts remained to be addressed with our fuzzy approach. Thus, the main research tasks solved in the paper involve:

  1. To develop a set of functions for calculating similarities between concepts based on their different features.

  2. To develop a set of fuzzy inference rules that could be used to reason about how close two concepts taken from different ontologies are.

Along with the previously developed tools (which target the levels of instances and relations) presented in our earlier publications, the described methodology forms a fuzzy logic-based framework for ontology alignment. It is the main contribution of this article, which is organized as follows. Section 2 contains an overview of similar research along with its critical analysis. Section 3 provides the mathematical foundations used throughout the paper. Section 4 forms the main contribution of the article. It describes partial similarity measures calculated between concepts of two ontologies and how they can be combined into a single method for concept alignment employing fuzzy logic. The proposed solution was experimentally verified, and the collected results can be found in Sect. 5. Section 6 provides a summary and a brief overview of our upcoming research plans.

2 Related Works

Since their introduction, ontologies have become a popular knowledge representation format. They provide a flexible structure that allows one to model, store, and process knowledge concerning some assumed universe of discourse ([19]). However, in many real cases, the knowledge can be distributed among multiple independent ontologies ([7]). Informally speaking, to better understand the modelled part of the universe of discourse we need the whole picture of knowledge provided by these distributed ontologies, thus a “bridge” between such ontologies is required ([12]).

As mentioned in the previous section, a vast majority of ontology alignment methods come down to calculating similarities between two concepts taken from two ontologies. This approach spans both the initial research in the field ([3]) and more contemporary publications ([17]).

These similarities can be based on the analysis of different aspects that describe the content of ontologies. For example, they can involve comparing concept names ([2]) or how their hierarchies are constructed ([1]), etc.

To compare different approaches to ontology alignment, an experimental verification is needed, which requires solid input data. In the context of ontology alignment, widely accepted benchmark datasets are provided by the Ontology Alignment Evaluation Initiative ([16]). They contain a set of carefully curated ontologies along with alignments between them that are treated as correct. Having such a dataset, it is easy to compare the results obtained from some ontology alignment tool with these reference mappings and calculate measures like Precision and Recall.
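The comparison against a reference alignment can be illustrated with a short sketch. It assumes that an alignment is represented as a set of concept-name pairs; the function name `evaluate` and the sample pairs are hypothetical and serve only as an example.

```python
def evaluate(found, reference):
    """Compute Precision, Recall and F-measure of a found alignment.

    Both arguments are collections of (concept_1, concept_2) pairs.
    """
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical reference alignment and tool output.
reference = {("Paper", "Article"), ("Author", "Writer"),
             ("Review", "Review"), ("Chair", "Chairman")}
found = {("Paper", "Article"), ("Review", "Review"), ("Person", "Author")}

precision, recall, f_measure = evaluate(found, reference)
print(f"P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")
```

The F-measure used here is the harmonic mean of Precision and Recall, which is the usual balanced variant.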

According to [15], two of the most prominent ontology alignment systems are AML (also referred to as AgreementMaker) and LogMap. The description of AML can be found in ([4]). The solution is built on top of a set of matchers, each calculating different similarities among elements extracted from the processed ontologies. The matchers are used in a cascade, each narrowing the results created by previous matchers to form the final result. The main focus is put on computational efficiency: AML has been designed to handle large ontologies while providing good quality outcomes.

[11] is solely devoted to LogMap. It is based on indexing lexical data extracted from ontologies and enriching them using WordNet. Then it uses an interval labelling schema to create extended concept hierarchies, which are then used as a base for the final ontology alignments. Further mapping repair steps involve finding internally inconsistent alignments, which can be excluded from the final outcome.

A machine learning approach was proposed in [13], where the authors develop semantic similarity measures and use them as background knowledge, which in turn provides constraints that improve machine learning models.

To the best of our knowledge, very little research attempts to utilize fuzzy logic in the task of ontology alignment. Only a few publications ([5]) describe useful research on the given subject. Therefore, having in mind the good results reported in our previous publications, in this paper we incorporate fuzzy logic for the task of ontology alignment on the level of concepts.

3 Basic Notions

A pair (A,V), in which A denotes a set of attributes and V denotes a set of valuations of these attributes (\(V = \bigcup _{a \in A}V_a\), where \(V_a\) is a domain of an attribute a), represents the real world. A single (A,V)-based ontology taken from the set \(\tilde{O}\) (which contains all (A,V)-based ontologies) is defined as a tuple:

$$\begin{aligned} O=(C, H, R^C, I, R^I) \end{aligned}$$
(1)

where:

  • C is a finite set of concepts,

  • H is a concepts’ hierarchy,

  • \(R^C\) is a finite set of binary relations between concepts \(R^C =\{r_1^C, r_2^C, ..., r_n^C\} \), \( n \in N \), such that each \(r_i^C \in R^C \) (\(i \in [1,n]\)) is a subset of the Cartesian product of concepts, \(r_i^C \subset C \times C\),

  • I is a finite set of instance identifiers,

  • \(R^I =\{r_1^I, r_2^I, ..., r_n^I\} \) is a finite set of binary relations between instances.

As can be easily seen, the proposed ontology definition distinguishes four levels of abstraction: concepts, relations between concepts, instances, and relations between instances. In this paper, we will mainly focus on the level of concepts. Each concept c from the set C is defined as:

$$\begin{aligned} c=(id^c,A^c,V^c,I^c) \end{aligned}$$
(2)

where:

  • \(id^c\) is an identifier (name) of the concept c,

  • \(A^c\) represents the set of attributes of the concept c,

  • \(V^c\) represents the set of domains of attributes from \(A^c\), \(V^c = \bigcup \limits _{a \in A^c}V_a\),

  • \(I^c\) represents the set of instances of the concept c.
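The concept definition above maps naturally onto a small data structure. The sketch below is one possible encoding, not the representation used in our implementation: the class name `Concept`, its field names and the sample values are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept c = (id^c, A^c, V^c, I^c) of an (A,V)-based ontology."""

    identifier: str                                  # id^c
    domains: dict = field(default_factory=dict)      # maps attribute a to its domain V_a
    instances: set = field(default_factory=set)      # I^c

    @property
    def attributes(self):
        """A^c: the set of attributes of the concept."""
        return set(self.domains)

    @property
    def values(self):
        """V^c: the union of the domains of all attributes from A^c."""
        return set().union(*self.domains.values())


# Hypothetical concept used only to show the structure.
person = Concept("Person", {"name": {"Ann", "Bob"}, "age": {30}}, {"p1"})
print(sorted(person.attributes))
```

Storing the domains per attribute keeps \(A^c\) and \(V^c\) consistent with each other, because both are derived from the same mapping.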

In order to “translate” the content of some source ontology to the content of some other target ontology, one must provide a set of correspondences between their elements. This set is called an alignment, and between two (A,V)-based ontologies \(O_1={(C_1,H_1,R^{C_1},I_1,R^{I_1})}\) and \(O_2={(C_2,H_2,R^{C_2},I_2,R^{I_2})}\) it can be defined in the following way:

$$\begin{aligned} Align(O_1,O_2)=\{Align_C(O_1,O_2), Align_I(O_1,O_2), Align_R(O_1,O_2)\} \end{aligned}$$
(3)

It includes three sets each containing correspondences between elements taken from the level of concepts, instances, and relations. Due to the limited space and the fact that this paper is solely devoted to the level of concepts, we will provide a detailed definition only for the level of concepts:

$$\begin{aligned} Align_C(O_1,O_2)= \lbrace (c_1,c_2) | c_1 \in C_1 \wedge c_2 \in C_2 \rbrace \end{aligned}$$
(4)

where \(c_1,c_2\) are concepts from \(O_1\) and \(O_2\) respectively. The set \(Align_C(O_1,O_2)\) contains only those concept pairs that have been processed and eventually marked as equivalent by the fuzzy-based alignment algorithm described in the next section.

4 Fuzzy Based Approach to Concept Alignment

The main aim of our work is to determine the mappings between two ontologies on the concept level. In the proposed fuzzy method, we distinguish four input variables and one output. The input elements are measures that examine the lexical, semantic, and structural degrees of similarity of the two concepts. We incorporate four similarity functions. For the two given concepts \(c_1 \in C_1\) and \(c_2 \in C_2\) taken from two ontologies \(O_1\) and \(O_2\) they include:

  1. the value of similarity between sets of concepts’ attributes \(attributesSim(c_1,c_2)=\frac{|A^{c_1}\cap A^{c_2}|}{|A^{c_1}\cup A^{c_2}|}\);

  2. the value \(jaroWinklerSim(c_1,c_2)\) of Jaro-Winkler similarity ([6]) between identifiers \(id^{c_1}\) and \(id^{c_2}\);

  3. the value \(levenshteinSim(c_1,c_2)\) of Levenshtein similarity ([6]) between identifiers \(id^{c_1}\) and \(id^{c_2}\);

  4. the value \(wordNetSim(c_1,c_2)\) of Wu-Palmer ([14]) similarity between identifiers \(id^{c_1}\) and \(id^{c_2}\) calculated by incorporating WordNet as an external knowledge source.

Function 1 is based on a simple Jaccard index. However, attributes are taken from the global set A of all possible attributes; therefore, it is possible to designate structural concept similarity this way. Functions 2–4 compare identifiers of concepts in order to provide a similarity value, which may look counterintuitive from the point of view of the formal model. However, in practical applications concepts are usually identified by their names, which express their nature. Moreover, in OWL (which is the most commonly used ontology representation format [10]) the given names are enforced to be unique. Therefore, it is very straightforward to use them to calculate a similarity between concepts. In our method, we initially perform basic preprocessing of those names, which involves lemmatization and stop word removal. This step is omitted from the further description for clarity.
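Functions 1 and 3 can be sketched in plain Python as follows. The Jaccard index follows the formula given for \(attributesSim\). The normalization of the Levenshtein distance by the length of the longer identifier is an assumption, since the exact normalization is not stated here, and the function names are chosen only for this example.

```python
def attributes_sim(attrs1, attrs2):
    """Jaccard index of two attribute sets; defined as 0.0 when both are empty."""
    a, b = set(attrs1), set(attrs2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def levenshtein_distance(s, t):
    """Classic dynamic-programming edit distance computed row by row."""
    previous = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        current = [i]
        for j, char_t in enumerate(t, start=1):
            cost = 0 if char_s == char_t else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_sim(id1, id2):
    """Similarity in [0, 1]; assumes normalization by the longer identifier."""
    longest = max(len(id1), len(id2))
    if longest == 0:
        return 1.0
    return 1 - levenshtein_distance(id1, id2) / longest


# Hypothetical attribute sets and identifiers.
print(attributes_sim({"name", "email"}, {"name", "email", "address"}))
print(levenshtein_sim("author", "authors"))
```

Identifiers would normally pass through the preprocessing described above (lemmatization and stop word removal) before being compared; that step is left out of the sketch.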

Eventually, all input variables are associated with the following set of linguistic terms: low, medium, high, which allows us to define them through triangular-shaped membership functions presented in Fig. 1.

Fig. 1. Input variables of the fuzzy framework for ontology alignment on the concept level
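Fuzzification with triangular membership functions can be illustrated as below. The breakpoints in `TERMS` are assumptions made for the example, because the exact shapes are given only graphically in Fig. 1, and the function names are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and the peak at b.

    Setting a == b or b == c produces a left or right shoulder.
    """
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed breakpoints for the three linguistic terms on the [0, 1] scale.
TERMS = {
    "low": (0.0, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.0),
}


def fuzzify(value):
    """Map a crisp similarity value to membership degrees of all terms."""
    return {term: triangular(value, *points) for term, points in TERMS.items()}


print(fuzzify(0.75))
```

With these assumed breakpoints, a similarity of 0.75 belongs equally to the terms medium and high and not at all to low.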

The output of the proposed system (referred to as connection) can take one of three values: {equivalent, related, independent}. Based on these assumptions, we decided that two concepts can be mapped if the output of the fuzzy processing of the similarities calculated between them identifies the connection as equivalent. The described framework uses the minimum rule for conjunction, the maximum rule for fuzzy aggregation, and Mamdani-type rule inference. The eventual defuzzification is performed using the centroid method over the output membership functions presented in Fig. 2. The inference rules were prepared by a group of experts and are presented in Table 1.

Fig. 2. Output variable of the fuzzy framework for ontology alignment on the concept level
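The inference scheme described above (minimum for conjunction, maximum for aggregation, Mamdani implication and centroid defuzzification) can be sketched as follows. The membership breakpoints and the four rules are assumptions introduced only for this example and do not reproduce Fig. 1, Fig. 2 or Table 1. Mapping the crisp centroid back to the output term with the highest membership is likewise an assumed final step.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and the peak at b."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


# Assumed membership functions (not the ones from Fig. 1 and Fig. 2).
IN_TERMS = {"low": (0.0, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}
OUT_TERMS = {"independent": (0.0, 0.0, 0.5), "related": (0.0, 0.5, 1.0),
             "equivalent": (0.5, 1.0, 1.0)}

# Hypothetical rules (not Table 1): antecedent terms joined by AND, then the consequent.
RULES = [
    ({"jaroWinkler": "high", "levenshtein": "high"}, "equivalent"),
    ({"wordNet": "high", "attributes": "high"}, "equivalent"),
    ({"wordNet": "medium"}, "related"),
    ({"jaroWinkler": "low", "wordNet": "low"}, "independent"),
]


def infer(inputs, steps=1000):
    """Return (crisp value, label) for a dictionary of four similarity values."""
    strengths = {}
    for antecedent, consequent in RULES:
        # Conjunction of the antecedent terms uses the minimum rule.
        strength = min(triangular(inputs[var], *IN_TERMS[term])
                       for var, term in antecedent.items())
        # Rules sharing a consequent are combined with the maximum rule.
        strengths[consequent] = max(strengths.get(consequent, 0.0), strength)

    numerator = denominator = 0.0
    for k in range(steps + 1):
        y = k / steps
        # Clip each output set at its firing strength, then aggregate with max.
        mu = max(min(s, triangular(y, *OUT_TERMS[t])) for t, s in strengths.items())
        numerator += y * mu
        denominator += mu
    if denominator == 0:
        return 0.0, "independent"  # assumption: no fired rule means no connection
    crisp = numerator / denominator  # discretized centroid
    label = max(OUT_TERMS, key=lambda t: triangular(crisp, *OUT_TERMS[t]))
    return crisp, label


print(infer({"attributes": 0.8, "jaroWinkler": 0.9, "levenshtein": 0.85, "wordNet": 0.7}))
```

The centroid is approximated on an evenly spaced grid over [0, 1]; a finer grid (a larger `steps` value) gives a closer approximation at a higher cost.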

The fuzzy-based algorithm for creating an ontology alignment on the concept level is presented as Algorithm 1. The procedure accepts two ontologies \(O_1\) and \(O_2\) as its input and initially creates an empty set \(Align_C(O_1,O_2)\) for the found mappings (Line 1). Then it produces the Cartesian product of the two sets of concepts \(C_1\) and \(C_2\) (Line 2), creating a set of all concept pairs from the aligned ontologies. It iterates over its content (Lines 3–7). In each iteration the algorithm calculates four similarities \(attributesSim(c_1,c_2)\), \(jaroWinklerSim(c_1,c_2)\), \(levenshteinSim(c_1,c_2)\) and \(wordNetSim(c_1,c_2)\) (Line 3). Then the collected values are fuzzified (Line 4) and a potential mapping of the two concepts is created with the fuzzy inference rules taken from Table 1. This step yields the final result. If the found mapping is identified as “equivalent”, then it is added to the set \(Align_C(O_1,O_2)\) (Lines 5–7).
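The overall loop of the procedure described above can be sketched as follows. The function `align_concepts` and its parameters are hypothetical names; the similarity functions and the classifier are passed in as callables, and the toy stand-ins used in the example are placeholders rather than the four measures and the fuzzy rules of the framework.

```python
from itertools import product


def align_concepts(concepts1, concepts2, similarity_functions, classify):
    """Collect the concept pairs whose connection is classified as equivalent.

    similarity_functions maps a name to a function of two concepts;
    classify maps the dictionary of computed similarities to a label.
    """
    alignment = set()
    # Iterate over the Cartesian product of both sets of concepts.
    for c1, c2 in product(concepts1, concepts2):
        similarities = {name: function(c1, c2)
                        for name, function in similarity_functions.items()}
        if classify(similarities) == "equivalent":
            alignment.add((c1, c2))
    return alignment


# Toy stand-ins: a case-insensitive name match and a trivial classifier.
toy_similarities = {"name": lambda a, b: 1.0 if a.lower() == b.lower() else 0.0}


def toy_classify(similarities):
    return "equivalent" if similarities["name"] == 1.0 else "independent"


print(align_concepts(["Paper", "Author"], ["paper", "Reviewer"],
                     toy_similarities, toy_classify))
```

Because every pair from the Cartesian product is examined, the running time grows with the product of the sizes of both concept sets.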

Table 1. Fuzzy inference rules

5 Experimental Verification

5.1 Evaluation Procedure and Statistical Analysis

The proposed fuzzy framework for ontology alignment on the concept level, described in Sect. 4, has been implemented and verified against a benchmark dataset provided by the Ontology Alignment Evaluation Initiative (OAEI) ([16]). OAEI is a non-profit organisation which, since 2004, has organised annual campaigns aimed at evaluating ontology mapping solutions. The organisers provide benchmarks that include a set of pairs of ontologies with their corresponding reference alignments, which should be treated as correct. These pairs of ontologies are grouped into tracks, and each track allows testing alignment systems for different features. For example, the Interactive matching track offers the possibility to compare different interactive matching tools which require user interaction, and the Link Discovery track verifies how alignment systems deal with link discovery for spatial data, where spatial data are represented as trajectories. Having the aforementioned reference alignments at one’s disposal makes it very easy to confront them with the alignments generated by the evaluated solution, eventually calculating the common measures of Precision, Recall and F-measure.

This section presents the results of an experimental verification of the developed solution. We wanted to verify the hypothesis that the quality of mappings created by our framework is better, or at least not worse, than that of the other alignment systems described in Sect. 2. The experiment was based on the Conference track from the OAEI dataset from the campaign conducted in 2019 ([18]). The track includes seven ontologies: cmt, conference, confOf, edas, ekaw, iasted and sigkdd. The experiment began with designating an alignment for each of the 21 pairs of ontologies from this set using the implemented fuzzy framework. Then, thanks to the provided reference alignments, we were able to calculate the values of Precision, Recall and F-measure. The obtained results can be found in Table 2.

Table 2. Results of fuzzy-based concept alignment framework

Data from Table 2 have been subjected to statistical analysis. All tests have been conducted at the significance level \(\alpha =0.05\). Before selecting a proper statistical test, we checked the distribution of the obtained samples of Precision, Recall and F-measure using the Shapiro-Wilk test. The p-value calculated for the sample of Precision equals 0.07538, the p-value for the Recall sample equals 0.113911 and the p-value for the F-measure sample equals 0.768263. The p-values of all samples were greater than the assumed \(\alpha \), therefore there was no reason to reject the hypothesis that they come from a normal distribution. Thus, we have chosen Student’s t-test for further analysis of the experimental results.

The first null hypothesis we checked claims that the mean value of the Precision measure equals 0.82. The calculated p-value of Student’s t-test equals 0.0282232 and the value of the statistic equals 2.025. We can reject this hypothesis in favour of the alternative that the mean value of the Precision measure is greater than 0.82.

The second verified null hypothesis claims that the mean value of the Recall measure equals 0.52. The obtained p-value equals 0.048257 and the value of the statistic equals 1.744. This allows us to reject the null hypothesis and claim that the mean value of the Recall measure is greater than 0.52.

The final verified null hypothesis claims that the mean value of the F-measure equals 0.63. Student’s t-test yielded a statistic of 1.887 with a p-value equal to 0.036876. As previously, it is possible to accept the alternative hypothesis, which claims that the mean value of the F-measure is greater than 0.63.
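The one-sided, one-sample t-test used above can be sketched with the standard library alone. The sample below is hypothetical and is not the data from Table 2; the critical value 1.725 corresponds to 20 degrees of freedom (21 ontology pairs) at the one-sided significance level of 0.05.

```python
from math import sqrt
from statistics import mean, stdev

# Critical value of Student's t distribution: one-sided, alpha = 0.05, df = 20.
T_CRITICAL_DF20 = 1.725


def t_statistic(sample, mu0):
    """One-sample t statistic for the null hypothesis that the mean equals mu0."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


def reject_null(sample, mu0, critical=T_CRITICAL_DF20):
    """Reject the null hypothesis in favour of a greater mean when t exceeds the critical value."""
    return t_statistic(sample, mu0) > critical


# Hypothetical Precision values for 21 pairs of ontologies (not the real results).
precision_sample = [0.80, 0.85, 0.90] * 7
print(round(t_statistic(precision_sample, 0.82), 3))
print(reject_null(precision_sample, 0.82))
```

Comparing the statistic with a tabulated critical value avoids the need for the cumulative distribution function of the t distribution, which the standard library does not provide; a statistical package would report the p-value directly.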

5.2 Results Interpretation

Unfortunately, the OAEI organization did not provide partial results obtained by individual tools on specific pairs of ontologies, but only a summary of mean values for the entire dataset. Therefore, in order to compare the developed method with others, we decided to calculate the average values of Precision, Recall and F-measure obtained for individual ontology pairs from the dataset. The collected values of other tools and the proposed framework are presented in Table 3.

As can be easily seen in Table 3, the proposed tool for determining mappings between ontologies obtained a very good value of Precision equal to 0.87, which is only 0.01 worse than the best result of 0.88 obtained by the edna and StringEquiv tools. On the other hand, the average value of the Recall measure obtained on the entire dataset was 0.59, which may not be a very good result, but is still satisfactory considering the results obtained by other tools. Most of the solutions achieved an average value of Recall ranging from 0.54 to 0.64, with the best value of Recall equal to 0.76 for SANOM, and the worst, equal to 0.49, for ONTMAT1.

In the case of the most reliable indicator for assessing a solution, the F-measure, which takes into account both Precision and Recall, the proposed tool was at the forefront of the tested solutions with a score of 0.69. The best result of the F-measure was obtained by the SANOM system (0.77), while the worst, equal to 0.61, was obtained by Lily.

Table 3. Average results of ontology alignment tools

The conducted experiment showed that the developed tool obtains better values of the measures for assessing the quality of solutions than most other tools tested in the Conference track. The dataset used in the experiment contained many reference mappings whose validity, from a user’s subjective point of view, seems questionable. For example, the concepts Country and State may seem similar, but the Cambridge Dictionary defines the former as “an area of land that has its own government, army, etc.”, and the latter as “one of the parts that some countries such as the US are divided into”.

Due to the nature of the benchmark dataset provided by OAEI, which included many difficult or unintuitive mappings, it can be assumed that the developed method is capable of achieving even better results under real-world conditions on real ontologies, and not on a synthetic dataset.

6 Future Works and Summary

In recent years ontologies have become more and more popular because they provide a flexible structure that allows one to model, store, and process knowledge concerning some assumed universe of discourse. In many real cases, there is a need for integrating ontologies into a single unified knowledge representation and the initial step to achieving such a goal is designating a “bridge” between two ontologies. In the literature, this issue is referred to as ontology alignment. Formally, it can be treated as a task for providing a set of correspondences (mappings) of elements taken from two aligned ontologies. Even though the described task is very simple to describe and understand, the solution is not easy to achieve.

The practical application of the tackled issue appears when communication between two independently created information systems is required. Informally speaking, some kind of a bridge between them is expected. Utilizing ontologies in such systems is not uncommon. For example, many medical systems operating different therapeutic devices (e.g. CT or linear accelerator) incorporate ontologies as backbone healthcare vocabularies (e.g. SNOMED-CT or ICD10) used to describe patients’ treatment courses. Providing interoperability of such systems requires their terminologies to be initially matched.

The article is the final element of our fuzzy logic ontology alignment framework, which addresses the level of concepts. We have taken inspiration from our previous publications ([8, 9]) in which we focused on mapping ontologies on the level of relations and the level of instances. We proposed to aggregate several similarity values calculated between concepts taken from independently created ontologies by introducing another level of expert knowledge in the form of fuzzy inference rules. Such an approach makes it possible to decide whether or not some elements taken from independent ontologies are matchable.

The developed framework was experimentally verified and compared with other solutions known from the literature (SANOM, AML, LogMap, Wiktionary, DOME, edna, LogMapLt, ALIN, StringEquiv, ONTMAT1 and Lily). This comparison was performed using a commonly accepted dataset provided by the Ontology Alignment Evaluation Initiative. The collected results allow us to claim that our approach yields very good alignments.

In the near future, we plan to focus on the scalability of the developed fuzzy framework. To do so, we plan to conduct more extensive experiments using different datasets created by the Ontology Alignment Evaluation Initiative. Additionally, we plan to develop other similarity functions for the level of concepts to increase the flexibility of the proposed framework.