1 Introduction

Ontology mapping is a well-studied problem, and several matching approaches have been proposed [10]. These methods aim at finding correspondences between semantically related entities of different ontologies. Among them we can distinguish extensional approaches and intensional approaches. The majority of these methods find only equivalence relations (CIDER-CL [6], YAM++ [7], LogMap [8]) and do not consider asymmetric relations such as subsumption. Most of the proposed approaches are symmetric and intensional. The only extensional and asymmetric method is AROMA [11]. However, this method discovers only simple relationships.

Most existing matching approaches concentrate on finding 1-1 mappings between two given ontologies. However, complex mappings are very useful in practice. Simple correspondences are not sufficient to express the relationships between entities, since (1) simple correspondences may be difficult to discover (or may not exist) in certain cases, and (2) they do not allow relationships between entities to be expressed accurately.

As a motivating example, consider two ontologies \(\mathcal {O}_1\) and \(\mathcal {O}_2\) (Fig. 1) describing cell types. \(\mathcal {O}_1\) is a part of the ontology CL (footnote 1) and \(\mathcal {O}_2\) is an extract of the ontology BCGO (footnote 2). The existing methods [7, 8] cannot find an entity in \(\mathcal {O}_2\) that maps to the entity phagocyte in \(\mathcal {O}_1\). However, the entity phagocyte can be matched to the intersection of the three entities motile cell, native cell and stuff accumulating cell in \(\mathcal {O}_2\): the terms describing the phagocyte concept belong to the term sets of all three concepts.

Fig. 1. Ontology \(\mathcal {O}_1\) and ontology \(\mathcal {O}_2\)

The rest of this paper is organized as follows. First, we review related work and illustrate the limitations of existing complex mapping approaches. Next, we introduce the proposed method. Finally, we present experimental results and concluding remarks.

2 Related Work

Several approaches have been proposed to find complex correspondences. We present the most interesting ones below. Doan and colleagues [12] developed a system, CGLUE, that uses machine learning techniques to semi-automatically generate semantic matchings. This system finds disjunctions and equivalence relations between concepts, and it finds complex matchings between taxonomies. CGLUE is based on a notion of semantic similarity expressed in terms of the joint probability distribution of the concepts involved. The system calculates the joint distribution of the concepts and uses it to compute any appropriate similarity measure.
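To make the idea of a similarity derived from a joint distribution concrete, here is a minimal Python sketch. It assumes that the instance sets of both concepts are known over a shared universe of instances, whereas the system described above estimates the distribution with learned classifiers. The Jaccard coefficient is used only as one possible measure, and the function names are illustrative.

```python
def joint_distribution(instances_a, instances_b, universe):
    """Estimate the joint distribution of two concepts A and B.

    ASSUMPTION: the instance sets are given explicitly; this sketch does
    not learn anything, it only counts over a shared universe.
    """
    universe = set(universe)
    a = set(instances_a) & universe
    b = set(instances_b) & universe
    n = len(universe)
    return {
        "a_b": len(a & b) / n,                       # P(A, B)
        "a_not_b": len(a - b) / n,                   # P(A, not B)
        "not_a_b": len(b - a) / n,                   # P(not A, B)
        "not_a_not_b": len(universe - a - b) / n,    # P(not A, not B)
    }


def jaccard_similarity(joint):
    """Jaccard coefficient P(A, B) / P(A or B) computed from the joint distribution."""
    denominator = joint["a_b"] + joint["a_not_b"] + joint["not_a_b"]
    return joint["a_b"] / denominator if denominator else 0.0


joint = joint_distribution({0, 1, 2, 3}, {2, 3, 4, 5}, range(10))
similarity = jaccard_similarity(joint)
```

Any other measure that depends only on the four joint probabilities could be substituted for `jaccard_similarity` without changing `joint_distribution`.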

CGLUE is based on the assumption that the children of any ontology entity are mutually exclusive and exhaustive. This assumption holds for many real ontologies, in which the further specialization of an entity usually gives a partition of the instances of that entity. However, in many other real ontologies, sibling entities share instances. For these domains the approximating assumption does not hold.

The two approaches [13, 14] are based on inductive logic programming (ILP) and attempt to create alignments by using learning theory. These approaches take complex correspondences into account, not only equivalence correspondences. However, they cannot create complex mappings without learning correspondences from instances, and ontologies often do not contain any instances. Hence learning theory cannot be applied to find complex correspondences in ontologies without instances.

The pattern-based ontology matching approach presented in [15] defines patterns to discover complex correspondences automatically. A master alignment of the ontologies is necessary. To detect these correspondences, a set of simple conditions must be satisfied for each pattern. These conditions combine structural techniques, linguistic techniques and type compatibility. The defined patterns are the following (the notation \(i\#C\) denotes an entity C from ontology \(O_i\)):

  1. CAT (Class by Attribute Type Pattern): this pattern detects correspondences of the form \(1\#A\equiv \exists 2\#R.2\#B\);

  2. \(CAT^{-1}\) (Class by Inverse Attribute Type Pattern): this pattern detects correspondences of the form \(1\#A\equiv 2\#B\cap 2\#R_1.T \);

  3. CAV (Class by Attribute Value Pattern): this pattern detects correspondences of the form \(1\#A\equiv \exists 2\#R.\left\{ ...\right\} \), where \(\left\{ ...\right\} \) is a set of concrete data values;

  4. PC (Property Chain Pattern): this pattern detects correspondences of the form \(1\#R\equiv 2\#P\circ 2\#Q\).

This method can find many complex correspondences. However, the patterns used cover only particular ontology domains.

After analysing these approaches, we note that the above-mentioned methods only consider equivalence relations between concepts and do not take into account asymmetric relations such as subsumption. To overcome this significant limitation, we have developed a new complex mapping methodology named ARCMA [4, 5] (Association Rules Complex Matching Approach), which makes it possible to map subsumption relations between entities.

3 A New Method for Complex Matching

The alignment method ARCMA [4] aims at finding complex correspondences between two OWL ontologies.

ARCMA follows three consecutive steps: (1) extraction of the term or data sets (the pre-processing step), (2) detection of association rules between the entities of the two ontologies, and (3) post-processing of the results.

In the pre-processing step, a set of relevant terms embedded in the descriptions and instances of the entities is generated by using natural language processing tools. We represent the entities (concepts and properties) by sets of terms and data generated from their descriptions and instances. For each entity, we extract the name and the terms contained in the annotations (labels, comments, etc.). We also add the local name, the annotations and the values of its instances [5].
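The pre-processing step can be illustrated with a minimal Python sketch. It is not the ARCMA implementation: the stop-word list, the tokenisation rule and the function names are assumptions made for illustration, and a real pipeline would rely on a natural language processing toolkit.

```python
import re

# ASSUMPTION: a tiny illustrative stop-word list, not the one used by ARCMA.
STOP_WORDS = {"a", "an", "the", "of", "that", "is", "and", "or", "to", "in"}


def split_identifier(text):
    """Split text such as 'MotileCell' or 'motile_cell' into candidate words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)  # break camelCase
    return re.split(r"[^A-Za-z0-9]+", spaced)


def extract_terms(local_name, annotations=(), instance_values=()):
    """Build the term set of one entity from its name, annotations and instance values."""
    terms = set()
    for text in (local_name, *annotations, *instance_values):
        for word in split_identifier(text):
            word = word.lower()
            if word and word not in STOP_WORDS:
                terms.add(word)
    return terms


# Hypothetical entity description used only as an example.
terms = extract_terms("MotileCell", annotations=["A cell that moves."])
```

Each entity of both ontologies would be mapped to such a term set, so that the rule detection step can compare entities through set inclusion.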

In the second step, ARCMA detects the complex matchings between two OWL ontologies using the association rule model and a statistical measure, the implication intensity [2]. A valid association rule \(x\rightarrow y_1\wedge ...\wedge y_i.. \wedge y_n\) means that the vocabulary associated with a source entity x tends to be included in the intersection of the relevant terms of the set of entities \(y_i\). For example, the valid rule \(phagocyte \rightarrow motile\ cell \wedge native\ cell \wedge stuff\ accumulating\ cell\) can be interpreted as follows: the entity phagocyte corresponds to the intersection of the three entities motile cell, native cell and stuff accumulating cell. The post-processing step eliminates redundancies in the matchings found.
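As an illustration of how such a rule could be scored, the following Python sketch computes the implication intensity with its usual Gaussian approximation: with \(n\) terms overall, \(n_a\) terms for the source, \(n_b\) terms in the target intersection and \(n_{a\bar{b}}\) counter-examples, \(\varphi = 1-\Phi(q)\) where \(q = (n_{a\bar{b}} - n_a(n-n_b)/n)/\sqrt{n_a(n-n_b)/n}\). The choice of this approximation, the convention \(\varphi = 0\) in the degenerate case, and the function names are assumptions of the sketch, not details taken from the ARCMA implementation.

```python
import math


def implication_intensity(n, n_a, n_b, n_counter):
    """Gaussian approximation of the implication intensity of a rule a -> b.

    n: size of the term universe, n_a: terms of the source,
    n_b: terms of the target, n_counter: terms of the source missing from the target.
    """
    expected = n_a * (n - n_b) / n  # counter-examples expected under independence
    if expected == 0:
        return 0.0  # ASSUMPTION: degenerate rules are given intensity 0
    q = (n_counter - expected) / math.sqrt(expected)
    return 0.5 * math.erfc(q / math.sqrt(2))  # equals 1 - Phi(q)


def rule_intensity(universe, source_terms, target_term_sets):
    """Score x -> y1 ^ ... ^ yn, each entity being represented by its term set."""
    target = set.intersection(*map(set, target_term_sets))
    counter_examples = set(source_terms) - target
    return implication_intensity(
        len(universe), len(source_terms), len(target), len(counter_examples)
    )


# Toy data: integers stand in for terms.
universe = set(range(100))
source = set(range(10))
targets = [set(range(30)), set(range(25)) | {50}, set(range(20)) | {60}]
phi = rule_intensity(universe, source, targets)
```

A rule would then be retained when its intensity reaches the selection threshold \(\varphi_r\). An intensity of 0.5 corresponds to exactly as many counter-examples as expected under independence.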

Fig. 2. The ARCMA process

Figure 2 illustrates the process our method follows to discover complex mappings between OWL ontologies. First, we take two OWL ontologies with multiple inheritance as input. Then we apply a pre-processing step to define their relationship on a common extension. We also consider a reference alignment between these two ontologies. Next, we use association rules to find complex correspondences of the type \( x\Rightarrow y_1\wedge ...\wedge y_i...\wedge y_n\). Finally, we reduce the redundancy in the extracted rule set. A rule is selected if none of its generative rules has an implication intensity (\(\varphi \)) greater than or equal to its own \(\varphi \) value.
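The redundancy reduction can be sketched in Python as follows. The text does not define generative rules precisely, so the sketch makes a loud assumption: the generative rules of \(x\rightarrow y_1\wedge ...\wedge y_n\) are taken to be the rules with the same source and a non-empty proper subset of the targets. The actual definition used by ARCMA may differ, and `generative_rules` can be swapped out accordingly.

```python
from itertools import combinations


def generative_rules(rule):
    """Yield the generative rules of a rule (source, frozenset of targets).

    ASSUMPTION: same source, non-empty proper subset of the targets.
    """
    source, targets = rule
    for size in range(1, len(targets)):
        for subset in combinations(sorted(targets), size):
            yield (source, frozenset(subset))


def remove_redundant(intensity):
    """Keep a rule only if no extracted generative rule has phi >= its own phi.

    intensity: dict mapping each extracted rule to its implication intensity.
    """
    kept = {}
    for rule, phi in intensity.items():
        dominated = any(
            intensity.get(g, float("-inf")) >= phi for g in generative_rules(rule)
        )
        if not dominated:
            kept[rule] = phi
    return kept


# Hypothetical intensities for three rules sharing the source "x".
rules = {
    ("x", frozenset({"a"})): 0.95,
    ("x", frozenset({"a", "b"})): 0.93,
    ("x", frozenset({"a", "b", "c"})): 0.97,
}
selected = remove_redundant(rules)
```

In this toy input the rule with targets a and b is discarded because the simpler rule with target a already has a higher intensity, while the three-target rule is kept because it scores above all of its generative rules.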

4 Evaluation

To estimate the performance of our approach, we implemented a prototype in Java. Our system takes two OWL ontologies and a reference alignment as input, and then compares the correspondences obtained by our tool with those of a manual mapping. The evaluation uses two alignment quality metrics: precision and recall [16]. Precision measures the ratio of correctly found correspondences to the total number of returned correspondences. Recall measures the ratio of correctly found correspondences to the total number of expected correspondences.
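The two metrics defined above can be written as a short Python sketch. Correspondences are represented here as hashable values (a source name paired with a frozen set of targets), which is an illustrative choice rather than the format used by the prototype; the rule names are placeholders.

```python
def precision_recall(found, expected):
    """Return (precision, recall) of the found correspondences against the expected ones."""
    found, expected = set(found), set(expected)
    correct = found & expected
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    return precision, recall


# Placeholder correspondences: 4 returned, 6 expected, 3 in common.
found = {("x1", frozenset({"a", "b"})), ("x2", frozenset({"c"})),
         ("x3", frozenset({"d", "e"})), ("x4", frozenset({"f"}))}
expected = {("x1", frozenset({"a", "b"})), ("x2", frozenset({"c"})),
            ("x3", frozenset({"d", "e"})), ("x5", frozenset({"g"})),
            ("x6", frozenset({"h"})), ("x7", frozenset({"i"}))}
precision, recall = precision_recall(found, expected)  # 3/4 and 3/6
```

Returning 0.0 when a set is empty is a convention chosen for the sketch so that the function never divides by zero.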

The experiment is performed on the Large Biomedical Ontologies track and the Anatomy track available in the test library of the Ontology Alignment Evaluation Initiative, OAEI (footnote 3). The Large Biomedical track contains the mapping of FMA (78,989 classes), NCI Thesaurus (66,724 classes) and SNOMED CT (306,591 classes) and uses the UMLS Metathesaurus as the basis for the track’s reference mappings. The reference mappings only include subsumption and equivalence relations between classes. The track consists of three matching problems: FMA-NCI, FMA-SNOMED CT and SNOMED CT-NCI. The Anatomy track includes the mapping of two ontologies: the Adult Mouse Anatomy (AM) and the part of the NCI Thesaurus describing human anatomy. Its reference mapping includes only equivalence correspondences between classes.

Our method ARCMA requires that the source ontology supports multiple inheritance. In the Large Biomedical track and the Anatomy track, only three ontologies contain multiple inheritance (SNOMED, AM and human). Hence, we use these three ontologies and two reference alignments: the SNOMED CT-NCI reference and the Anatomy track reference. The characteristics of these ontologies are shown in Table 1.

Table 1. Description of the ontologies used for the evaluation of ARCMA

Table 2 shows the results obtained by the alignment method ARCMA with the rule selection threshold \(\varphi _r=0.9\).

Table 2. Performance measures of ARCMA
Fig. 3. Values of precision as a function of the threshold value \(\varphi _r\)

Fig. 4. Values of recall as a function of the threshold value \(\varphi _r\)

In Table 2 we note that in some tests, such as \(Small SNOMED\_nci-Small NCI\_fma\), the precision is 1, which means that the correspondences found by our method are the same as those given by an expert. For many other tests the precision is higher than 0.75, so our system gives encouraging results. For example, ARCMA discovered the following meaningful implications from SNOMED_small_overlapping_nci to NCI_small_overlapping_fma:

R1. Central_nervous_system_tract_structure \(\rightarrow \) Central_Nervous_System_Part AND Nerve

R2. Duodenal_papilla_structure \(\rightarrow \) Biliary_Tract AND Duodenum AND Pancreatic_Duct

R3. Colonic_muscularis_propria_structure \(\rightarrow \) Colon AND Muscularis_Propria

Figures 3 and 4 show the influence of the rule selection threshold \(\varphi _r\) on the precision and recall of ARCMA. We note that precision increases as the threshold increases. This shows a correlation between the deviation from independence and the relevance of the rules. In general, we can conclude that ARCMA achieved good precision and recall values. The high recall value can be explained by the fact that the UMLS thesaurus contains definitions of highly technical medical terms.

5 Conclusion

In this paper, we proposed a new approach for discovering complex mappings between two OWL ontologies. We used association rules and a statistical measure, the implication intensity, to detect implicative and conjunctive mappings containing complex correspondences. We implemented the approach and evaluated it experimentally on the Large Biomedical Ontologies track and the Anatomy track, which demonstrated the high precision of the discovered correspondences. The principal advantage of this approach is its simplicity. In addition, the implication intensity measure makes it possible to assess the validity of the complex correspondences, which accounts for the good precision values obtained by ARCMA.