1 Introduction

Ontologies, which can be used to describe concepts and relations pertaining to various domains, have been a field of intensive study for many years. They can be applied in many areas such as the Semantic Web [13], eCommerce and eBusiness [11], analysis of gene functions [21, 22], control of autonomous mobile platforms [10] and many others. As mentioned in [4], ontologies can be classified into upper ontologies, which describe the most general concepts and an example of which is SUMO [27], and domain ontologies, which describe the concepts related to a particular domain. Moreover, ontologies can also be classified as [4, 31]:

  • informal ontologies, in which the types are undefined or defined in natural language,

  • formal ontologies, in which the concepts and relations are named and partially ordered,

  • axiomatized ontologies, which are special forms of formal ontologies, in which the subtypes are defined in a formal language,

  • prototype-based ontologies, which are special types of formal ontologies, in which the subtypes are defined by means of comparison with typical members,

  • terminological ontologies, in which the concepts are described by labels without an axiomatic foundation. An example of such an ontology is a wordnet.

Various methods of ontology creation are known, and they are typically divided into manual and automatic or semi-automatic methods. Among the existing automatic and semi-automatic methods one may find methods based on data extraction from text, as described in [4, 20]. Quite frequently, the corpora from which the ontologies are built or enriched originate from web resources [1, 26]. Furthermore, the learning methods can be divided into clustering methods [5, 9] and pattern matching methods [16]. For a broader overview of available (semi-)automatic methods of ontology creation we refer the reader to [14].

There are also numerous papers dealing with ontology reuse methods, which typically use Wikipedia or WordNet to acquire existing knowledge, e.g. [2, 17, 29, 30]. Arnold and Rahm [2] propose a method that extracts semantic relations from Wikipedia definitions using semantic pattern matching. The pattern application phase is preceded by text preprocessing, which is similar to the approach in [17], where the Robust Minimal Recursion Semantics (RMRS) representation is used for data preprocessing. The works by Ruiz-Casado et al. [29, 30] describe an approach in which the internal connections within Wikipedia (the hyperlinks), combined with WordNet relations, are used to extract only the relevant elements of the definitions. A set of generalized patterns is then devised, which allows for the identification of new relations.

Apart from the aforementioned approaches, which mainly focus on the analysis of unstructured text, there are also solutions that exploit the already existing structures of the reused resources, such as the Wikipedia infoboxes used for the creation of DBpedia [3]. Finally, there are also approaches which aim at explicit modularisation of existing ontologies for the purpose of partial reuse. For examples of these and similar approaches see [12, 33].

The main focus of the current paper is the manual creation of ontologies with the use of ontology editors. In particular, we focus on the Ontolis ontology editor described in [8]. It is a multi-platform, highly adaptable tool for visual definition of ontologies, supporting the standard ontology data formats such as OWL and RDF, as well as custom formats [6]. There are also alternative ontology editors, one of the most popular being Protégé, an open source ontology editor available as a desktop as well as a web application [15]. There are also similar solutions supporting the editing of terminological ontologies, i.e. wordnets, for instance the Dictionary Editor and Browser (DEB) platform described in [18].

The main contribution of the paper is the idea of ontology reuse for the purpose of prototyping new concepts. We consider an approach in which, instead of building the complete concept definition manually, it is possible to obtain partial or even full information on the concept’s meaning from existing and widely available resources. The main motivation behind the proposed idea is to ease the process of manual ontology creation, which can be time-consuming and troublesome. The key difference between our approach and the ones described by Arnold and Rahm [2] or Ruiz-Casado et al. [29, 30] is that we do not aim at large-scale automatic data extraction. Instead, we propose to reuse existing resources as guidance for the user of the ontology editor.

In the context of manual ontology editing, our approach is similar to the work of Xiang et al. [33]. However, the Ontofox system they propose is targeted at the life sciences domain and supports specialized ontologies only, while our approach does not impose such restrictions. In particular, we focus on using generic wordnets, which are lexical databases [24, 25] or terminological fundamental ontologies [4], and Wikidata, which in turn is a collaborative knowledge base [32], as potential sources of information. As for the wordnets, we focus our attention on one of two Polish wordnets, namely plWordNet [23, 28]. The reason for choosing this particular resource is that its contents have already been investigated by one of the authors in [19] and were found suitable for the purposes of the current paper.

The main drawbacks of the Ontofox system [33] are that it only allows for the reuse of a single ontology at a time, and that it provides a limited set of configuration settings (such as to only include all, computed or no intermediates). Furthermore, some of the settings require additional knowledge of the proper format of the parameters (see the Annotation/Axiom Specification part). The approach we propose is free from these restrictions, as we allow for full and user-friendly customization of the data retrieval process. Although currently we support only the Wikidata resource, the extension to use plWordNet should not require much additional effort.

The paper is divided into four sections. Section 2 describes our proposal of ontology reuse for the purpose of defining new concepts. We also present an example of how the existing resources, including Wikidata and plWordNet, can be reused. In Sect. 3 we discuss the implemented extension to the Ontolis ontology editor and present a preliminary evaluation of this extension. Finally, Sect. 4 contains the concluding remarks and future research perspectives.

2 The Idea of Ontology Reuse

The proposed idea of ontology reuse is primarily aimed at the simplification of the process of new concepts definition. We assume here that the new concept can be defined by means of an ontology, which may become a means of communication and collaborative learning as suggested in [6, 7]. Let us then describe the steps necessary to perform the ontology reuse process:

  1. A concept to be defined is selected. The concept can be represented by a single word or a compound phrase describing certain information. Let us denote the concept to be defined by \(c_0\). We assume that the concept is given in a certain language, but one may consider the following two extensions, which can increase the probability of ontology reuse:

    (a) the user provides the concept in a multi-lingual form, i.e. the translations of the concept are provided manually, or

    (b) the user provides the concept in a single language, but the translations are searched for automatically. For this purpose a previously prepared dictionary can be used or some online resource may be consulted for possible translations.

    The increase in the probability of ontology reuse mentioned above follows from the fact that the concept may not be defined in the language in which it was originally provided, but it may have been defined in some other language, so a chance of ontology reuse still exists.

  2. A set of resources to be consulted is defined. This set can contain local or online resources, or a mix of both types. Let us denote the set by \(R = \{R_1, R_2, \ldots , R_k\}\), where k is the number of available resources. The resources may be consulted sequentially, according to some predefined order, or they may be accessed simultaneously (in parallel) to make the process faster.

  3. Given the concept \(c_0\) and the set R, the resources are asked for the definition of the concept \(c_0\). The outcome of this step can be three-fold:

    (a) none of the resources is able to locate a definition of the searched concept. If this is the case, the user has to provide an additional part of the ontology defining the concept. This means that an element \(c_1\) related to the original concept \(c_0\) has to be defined, together with the name of the relation connecting the two elements. Element \(c_1\) then becomes the new concept to be defined and the procedure described above is repeated.

    (b) only one resource is able to locate a definition of the searched concept. If this is the case, the definition can be reused completely, with modifications and extensions if needed. Such a result ends the concept definition process.

    (c) multiple resources are able to locate a definition of the searched concept. If this is the case, the definitions can either be compared with each other, leaving the user to decide which ontology to reuse, or they can be integrated with each other by comparing their relations and concepts. Regardless of the approach taken, the reused ontology can be further edited or modified as needed. Such a result also ends the concept definition process.

  4. The concept definition process ends with either complete or partial reuse of existing resources or, if none of the concept parts can be found in the resources belonging to set R, with the ontology being built manually by the user.

Let us observe, based on the description provided above, that in the best case the concept to be defined can be retrieved in its entirety from the external resources. The worst-case scenario assumes that no part of the concept definition can be reused, but we conjecture that such a case should not be very frequent. We think so, because the concepts constituting the parts of the ontology will probably become more general and more common, thus the chance of finding their definitions will also increase.
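The procedure described in steps 1–4 can be illustrated with a minimal Python sketch. It is not the Ontolis implementation: the names `lookup`, `define_concept` and `ask_user`, the representation of a definition as a list of (relation, target) pairs, and the `max_steps` limit are all assumptions made for illustration. Resources are consulted sequentially, and cases 3(b) and 3(c) are both handled by returning every definition found, leaving the choice or integration to the caller.

```python
from typing import Callable, Dict, List, Optional, Tuple

Relation = Tuple[str, str]  # (relation name, target concept); assumed format
Resource = Callable[[str], Optional[List[Relation]]]


def lookup(concept: str, resources: Dict[str, Resource]) -> Dict[str, List[Relation]]:
    """Step 3: ask every resource in R for a definition of the concept."""
    found = {}
    for name, resource in resources.items():
        definition = resource(concept)
        if definition:
            found[name] = definition
    return found


def define_concept(c0, resources, ask_user, max_steps=10):
    """Return (manual_part, reached_concept, reused_definitions).

    manual_part holds the (concept, relation, related concept) triples that
    the user had to supply by hand (case 3a). reused_definitions is empty
    when nothing could be reused (the worst case of step 4).
    """
    manual_part = []
    concept = c0
    for _ in range(max_steps):
        found = lookup(concept, resources)
        if found:  # cases 3(b) and 3(c): one or more definitions located
            return manual_part, concept, found
        answer = ask_user(concept)  # case 3(a): user supplies (relation, c1)
        if answer is None:  # user gives up; ontology stays manual
            break
        relation, c1 = answer
        manual_part.append((concept, relation, c1))
        concept = c1  # c1 becomes the new concept to be defined
    return manual_part, concept, {}
```

For instance, if a hypothetical concept "family doctor" is unknown to all resources and the user links it to "physician" with a subclass of relation, the second iteration can reuse whatever the resources hold for "physician".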

Let us also consider the possible resources that may be contained in set R. We claim that both wordnets and resources such as Wikidata can be used to aid ontology construction. Although both types of resources provide some information on the words or phrases (i.e. concepts) and the relations between them, the type of information differs, for example in terms of naming.

In the case of wordnets, the primary relation is synonymy of words, since wordnet contents are typically organized in the form of synsets (i.e. synonym sets). Apart from synonymy we often consider hyponymy and hypernymy, which can be regarded as specialization and generalization, respectively. Furthermore, certain words may have a meaning similar to that of some other words. In this case we may say that the words are inexact synonyms, as opposed to exact synonyms, which can be used interchangeably without changing the meaning of the sentence. The drawback of using wordnets for the purpose of ontology reuse is that their relations do not map directly to the ontological relations, or the mapping is ambiguous. For instance, a hyponym can represent the subclass of relation, but in some cases it will also represent the part of or instance of relations. Nevertheless, the information provided by the hypernymy/hyponymy hierarchy may still be helpful and, what is more, it can be at least partially reused. Furthermore, although the synonyms (both exact and inexact) do not have corresponding relations in the constructed ontology, they can serve the purpose of disambiguating terms, as some words can have multiple meanings and the user has to decide which ontology to reuse.

As for the Wikidata-like resources, the relations can be considered in direct ontological terms, including the aforementioned subclass of, part of, and instance of relations, as well as some other relations. The synonymy relation, although not represented directly, can also be partially seen in the form of the also known as descriptor available in Wikidata, which contains alternative names for a particular term or concept. What is more, Wikidata also provides some specific relations appearing only in certain contexts. Thus we may assume that the information provided by Wikidata can be much broader than the information provided by wordnets. Whether the additional information provided by Wikidata is an advantage or a drawback probably depends on the particular user’s needs.

2.1 An Example

To illustrate the ideas presented above, and in particular to show the different types of information that can be provided by Wikidata and a wordnet, we will consider the concept of doctor. In the example we use plWordNet as the wordnet resource, but this should not affect the general conclusions drawn from the example. Following the procedure described above, the concept \(c_0\) is thus defined as \(c_0 = \text{ doctor }\) and the set of resources is given as \(R = \{\text{ Wikidata }, \text{ plWordNet }\}\).

The search for the concept in Wikidata shows that the word is considered a synonym of the word physician. The information is obtained by observing the also known as descriptor. Focusing on the most basic relations, it can be noticed that a physician is an instance of profession, and also a subclass of two classes, namely health professional and scientist. Following these relations leads to the discovery that profession is a subclass of occupation, which in turn is a subclass of activity and human behavior. Since both the activity and the human behavior concepts are subclasses of behavior, and there also exists a path of subclass of relations of the form human behavior \(\rightarrow \) animal behavior \(\rightarrow \) behavior, we can conclude that the analysis of the profession concept leads to the behavior concept appearing on different levels of the hierarchy. Similarly, an activity is a subclass of an event, which also appears in the tree of the behavior definition. Finally, the analysis of the health professional and scientist concepts leads to the concept of a person, which, through some intermediate relations, leads to the concept of an entity. Thus, from the analysis of the ontology, which is also partially shown in Fig. 1, we may conjecture that a doctor can be very generally described as an entity with certain behavior.
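The observation that behavior appears on different levels of the hierarchy can be checked with a small Python sketch. The `SUBCLASS_OF` dictionary encodes only the subclass of relations named in the paragraph above (it is not a dump of Wikidata), and the helper `depths_of` is a hypothetical name introduced for this illustration.

```python
from collections import deque

# subclass of relations quoted in the text; the intermediate steps between
# person and entity are not listed there and are therefore left out.
SUBCLASS_OF = {
    "physician": ["health professional", "scientist"],
    "profession": ["occupation"],
    "occupation": ["activity", "human behavior"],
    "activity": ["behavior", "event"],
    "human behavior": ["behavior", "animal behavior"],
    "animal behavior": ["behavior"],
}


def depths_of(start, target, edges):
    """Return the sorted path lengths at which target is reached from start."""
    depths = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            depths.add(depth)
            continue
        for parent in edges.get(node, []):
            queue.append((parent, depth + 1))
    return sorted(depths)


levels = depths_of("profession", "behavior", SUBCLASS_OF)
```

In this fragment `levels` evaluates to `[3, 4]`: behavior is reached after three steps via activity or human behavior, and after four steps via animal behavior.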

Fig. 1. The partial view of the ontology obtained from Wikidata for the concept doctor

The search for the doctor concept in plWordNet leads to the following discoveries. The wordnet provides three synonyms, namely lek. (dr.), doktor (doctor) and konsyliarz (old name for a doctor), which can be considered equivalents of the also known as descriptor available in Wikidata. Furthermore, there are two hypernym paths, i.e. paths representing a sequence of subclass of relations, which “meet” at the level of the człowiek (human) concept. By “meeting” of the hypernym paths we mean that at one point of the hypernymy hierarchy, two different concepts become subclasses of a human. The complete information on the ontology available in plWordNet is presented in Fig. 2 (the English translations of the respective concepts were added for clarity).

Fig. 2. The view of the ontology obtained from plWordNet for the concept doctor

Comparing the results presented in Figs. 1 and 2 as well as their respective descriptions, the following conclusions can be drawn. Firstly, both resources provide a ready-to-use ontology describing the doctor concept. Thus it is possible to reuse the existing data without having to provide one’s own definition. Secondly, although we have restricted our attention to only selected relations available in Wikidata, the amount of information available from this resource is much greater than the data obtained from plWordNet. Thirdly, for both resources we can observe that initially different relations lead to some common concepts appearing at different points of the hierarchy. In particular, the behavior and event concepts appear in the Wikidata ontology a couple of times, while the human concept is the common hypernym for the two paths in the wordnet. Finally, comparing the actual ontologies, it can be observed that the wordnet-based ontology is more human-centered, i.e. it focuses on the meaning of a doctor as a person, while Wikidata, apart from the human-centric definition, also provides a definition centered on the more abstract behavior concept.

3 A Proof of Concept

In this section we show how the idea presented in Sect. 2 has been implemented in practice in the Ontolis ontology editor.

3.1 Ontolis Extension Implementation

As already mentioned, Ontolis is a highly adaptable ontology editor which mainly focuses on representing ontologies as graphs. It addresses the problem of usability and adaptability to the user’s needs. This goal is achieved by using a metaontology, which describes such features as the visualization of supported node and relation types, supported input/output file formats (e.g. OWL), and advanced plugins such as mergers and graph alignment tools.

The problem of (semi-)automatic addition of related concepts to the concepts defined in Ontolis is not trivial. It is so, because various ontological knowledge bases provide different APIs. To tackle this problem, we plan to present a generic API and a number of wrappers for popular ontology resources’ HTTP/SPARQL APIs. Thus far we were able to enhance Ontolis with a functionality of adding related concepts for any concept based on the Wikidata API. We have wrapped Wikidata API functions for searching entities by their labels and getting entities which are related to them. Searching by label is used for retrieving alternative meanings of the concept. In the future, we want to order the alternative meanings using the context of the concept in an ontology.
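A wrapper of this kind could look roughly like the Python sketch below. It is an illustration only, not the code used in Ontolis: the class name `WikidataResource`, its two methods and the injectable `fetch` function are assumptions. The two calls it relies on, `wbsearchentities` (search by label) and `wbgetentities` (retrieve an entity with its claims), are actions of the public Wikidata API.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def http_get_json(url):
    """Default transport; can be replaced by a stub for offline use."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "ontology-reuse-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


class WikidataResource:
    """Hypothetical wrapper offering two generic operations."""

    def __init__(self, fetch=http_get_json, language="en"):
        self.fetch = fetch
        self.language = language

    def _url(self, **params):
        params["format"] = "json"
        return WIKIDATA_API + "?" + urllib.parse.urlencode(params)

    def search_by_label(self, label, limit=5):
        """Return candidate meanings as (id, label, description) tuples."""
        data = self.fetch(self._url(action="wbsearchentities", search=label,
                                    language=self.language, limit=limit))
        return [(hit["id"], hit.get("label", ""), hit.get("description", ""))
                for hit in data.get("search", [])]

    def related_entities(self, entity_id):
        """Return (property id, target entity id) pairs, e.g. P31 = instance
        of, P279 = subclass of, P361 = part of. Non-entity values are skipped."""
        data = self.fetch(self._url(action="wbgetentities", ids=entity_id,
                                    props="claims"))
        claims = data.get("entities", {}).get(entity_id, {}).get("claims", {})
        pairs = []
        for prop, statements in claims.items():
            for statement in statements:
                value = (statement.get("mainsnak", {})
                         .get("datavalue", {}).get("value"))
                if isinstance(value, dict) and "id" in value:
                    pairs.append((prop, value["id"]))
        return pairs
```

Passing the transport in as `fetch` keeps the wrapper testable without network access and would allow wrappers for other resources to expose the same two methods.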

The process of data retrieval is as follows. First, the concept for which the ontology is being built is specified. Then it is searched for in Wikidata and, once found, a set of its meanings is presented to the user. Selection of a particular meaning triggers the retrieval of entities related to this particular meaning, as shown in Fig. 3. As mentioned before, the retrieved ontology can also be freely edited, taking into consideration the following possible situations:

  • The user wants to add a relation to a concept not yet defined in the ontology. If this is the case, we automatically add a new concept. Moreover, we also save the information on the authorship of the concept in the concept’s metadata and the Internationalized Resource Identifier (IRI) of the concept (and relation) in an external repository.

  • The user wants to add a relation to a concept with an existing external IRI. If this is the case, we add the new relation only.

  • The user wants to add a relation to a concept with a non-existing IRI, but with a matching label. If this is the case, we let the user decide whether a new concept should be created or the existing one should be used.
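The three situations listed above amount to a small decision procedure, sketched here in Python. The data classes, the function `add_relation` and the `reuse_existing` callback (standing in for the user's decision) are hypothetical, and the external repository of IRIs is simplified to an `iri` field stored on the concept.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Concept:
    label: str
    iri: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Ontology:
    concepts: List[Concept] = field(default_factory=list)
    # (source label, relation name, target label)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def find_by_iri(self, iri):
        return next((c for c in self.concepts if c.iri == iri), None)

    def find_by_label(self, label):
        return next((c for c in self.concepts if c.label == label), None)


def add_relation(onto, source, relation, target_label, target_iri,
                 author, reuse_existing):
    """Add a relation from source to the target described by label and IRI."""
    # Situation 2: the IRI is already known, so only the relation is added.
    target = onto.find_by_iri(target_iri)
    if target is None:
        # Situation 3: unknown IRI but matching label; the user decides.
        candidate = onto.find_by_label(target_label)
        if candidate is not None and reuse_existing(candidate):
            target = candidate
    if target is None:
        # Situation 1: create the concept and record its authorship.
        target = Concept(target_label, target_iri, {"author": author})
        onto.concepts.append(target)
    onto.relations.append((source.label, relation, target.label))
    return target
```

Checking the IRI before the label means that label matching is consulted only when the identifier gives no answer, which mirrors the order of the situations in the list.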

Fig. 3. The user interface for Wikidata’s data retrieval in Ontolis

The important and interesting aspect of the use of both IRIs and labels is that we can model ontologies in various languages, an example of which is shown in Fig. 4. Both ontologies present the partial ontology for the concept doctor, giving the possibility of finding the English and Russian counterparts describing the various concepts and relations.

Fig. 4. Ontology for the concept of doctor built in Ontolis using the proposed extension, in English (left) and Russian

To implement the extension we have also used the fact that Ontolis uses a metaontology in which plugins can be described. We created a new plugin node named #ExternalOntologyResources, in which we described how the external plugin is run. The plugin configuration includes the program name, the working directory, and general program arguments for the search functions. The label of the concept to be searched and the chosen identifier are passed as additional arguments. Although currently the interface for external ontological resources plugins is hard-coded into Ontolis, new implementations can be added in the future without source code modification and even without program restart.
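The way such a plugin description could be turned into a process invocation is sketched below in Python. The field names, the argument order and the example program name are assumptions made for illustration; the actual metaontology attributes of the #ExternalOntologyResources node and the plugin's command-line contract are not specified here.

```python
import subprocess
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExternalResourcePlugin:
    """Configuration mirroring the description above (field names assumed)."""
    program: str
    working_directory: str
    search_arguments: List[str] = field(default_factory=list)

    def command(self, label: str, identifier: Optional[str] = None) -> List[str]:
        """Build argv: general arguments first, then label and identifier."""
        argv = [self.program, *self.search_arguments, label]
        if identifier is not None:
            argv.append(identifier)
        return argv

    def run(self, label: str, identifier: Optional[str] = None) -> str:
        """Run the external program and return what it writes to stdout."""
        completed = subprocess.run(
            self.command(label, identifier),
            cwd=self.working_directory,
            capture_output=True, text=True, check=True)
        return completed.stdout


# Hypothetical configuration; "wikidata-wrapper" is a placeholder name.
plugin = ExternalResourcePlugin("wikidata-wrapper", ".", ["--lang", "en"])
argv = plugin.command("doctor")
```

Because the configuration is plain data, a new resource wrapper would only need a new configuration entry rather than a change to the editor's code.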

3.2 Evaluation

To preliminarily evaluate our approach we created a new ontology with the root doctor and checked how difficult it is to create the ontology presented in Figs. 1 and 4. We found that the whole ontology could be created by successively extending the data with information from Wikidata, using the mouse only, without any need for typing. The main advantage of being able to retrieve the information from Wikidata is that it contains connections that are verified (at least to some degree), which may aid the teaching process during ontology engineering courses.

The usefulness of the proposed extension also stems from the fact that we were able to automatically translate concepts and relations which were created from Wikidata. This is quite useful for the ontology-based data access system Reply, described in [7], where the user can formulate queries using natural language. Such queries may be used e.g. to manually control the autonomous mobile platforms described in [10, 34]. With automatic translation we were able to add support for a new language for basic queries without any source code modification. We have tested only basic queries in Polish, because advanced queries require deep language-specific analysis, which is not yet available for Polish.

To evaluate our approach more extensively we decided to examine Wikidata’s coverage of the Data Science Ontology (http://www.datascienceontology.com) and the bilingual (Russian, English) Ria News Ontology. The version of the Data Science Ontology we used contains 322 concepts and 329 relations. It is a tree-structured ontology, so the only type of relation is subclass of. Furthermore, each concept is associated with a single label. The Ria News Ontology, in turn, includes 76 concepts and 129 relations, used mainly in news tagging, describing for instance places, main characters, etc.

The experiments were conducted according to the following scheme:

  1. Wikidata was queried for concepts with labels taken from the analyzed ontologies.

  2. For each concept found, its relations were retrieved.

  3. The number of concepts related in the initial ontologies which were also connected in Wikidata was determined.

In the course of the experiment we noticed that the concepts with corresponding meanings can typically be found among the first two results of a Wikidata search. Thus, in the analysis performed in steps 2–3 we included at most 2 results per ontology concept. In the sequel we refer to this reduction as a disambiguation step.
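The experimental scheme, including the disambiguation step, can be summarized by the following Python sketch. It is a reconstruction under stated assumptions rather than the script used in the experiments: the `search` and `related` callables stand for the Wikidata lookups, a relation counts as preserved when any pair of candidate entities is connected in either direction, and the average is taken as the number of kept Wikidata concepts divided by the number of ontology concepts found.

```python
def coverage(ontology_relations, search, related, max_results=2):
    """Compute coverage statistics for one ontology.

    ontology_relations: list of (subject label, object label) pairs
    search(label): ordered list of Wikidata ids matching the label
    related(entity_id): iterable of Wikidata ids related to the entity
    """
    labels = {label for pair in ontology_relations for label in pair}

    # Step 1 with disambiguation: keep at most max_results hits per label.
    candidates = {label: search(label)[:max_results] for label in labels}
    found = {label: ids for label, ids in candidates.items() if ids}

    # Step 2: retrieve the relations of every kept Wikidata concept.
    neighbours = {eid: set(related(eid))
                  for ids in found.values() for eid in ids}

    # Step 3: relations whose two ends were found, and those also
    # connected in Wikidata (direction ignored; an assumption).
    both_found = [(s, o) for s, o in ontology_relations
                  if s in found and o in found]
    preserved = [
        (s, o) for s, o in both_found
        if any(o_id in neighbours[s_id] or s_id in neighbours[o_id]
               for s_id in found[s] for o_id in found[o])
    ]

    n_o = len(found)
    n_w_d = sum(len(ids) for ids in found.values())
    return {
        "N_o": n_o,
        "N_W_d": n_w_d,
        "avg_W_per_o_d": n_w_d / n_o if n_o else 0.0,
        "both_found": len(both_found),
        "preserved": len(preserved),
    }
```

Setting `max_results=None` disables the truncation and yields the statistics before disambiguation.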

The statistics gathered for the three analyzed ontologies (Data Science, Russian Ria News and English Ria News) are shown in Table 1. From the results in Table 1 we can observe that none of the ontologies could be completely covered by Wikidata. However, we have determined that in the case of the Data Science Ontology, the concepts that were not found usually corresponded to tree leaves denoting algorithm or technology names. Thus, such a result can be justified.

We can also notice that the average number of Wikidata concepts per ontology concept is greater than 1 for all three ontologies, even after the meaning disambiguation step (see \(\overline{N_{W/o}}\) and \(\overline{N^d_{W/o}}\) statistics). Furthermore, looking at the median number of relations in Wikidata (\(N_r\) statistic), we may conclude that all three ontologies could benefit from the reuse of Wikidata, acquiring new relations.

Table 1. Statistical comparison of Data Science and Ria News ontologies with respect to Wikidata. \(N_o\), \(N_W\) – number of found ontology and Wikidata concepts, \(N^d_W\) – number of Wikidata concepts after meanings disambiguation, \(\overline{N_{W/o}}\), \(\overline{N^d_{W/o}}\) – average number of Wikidata concepts per ontology concept (before and after disambiguation), \(N_r\) – number of relations

Apart from collecting the aforementioned statistics we have also investigated the most frequently appearing relations. Regardless of the analyzed ontology, the most common relations were instance of, subclass of, category’s main topic, topic’s main category and part of. The only difference concerned the fifth most common relation, which in the case of the Data Science Ontology was official website, while in the case of both Ria News Ontologies it was is a list of. The counts of relation occurrences are shown in Table 2. The table contains the 5 relations that are among the most frequent in all analyzed ontologies.

Table 2. The summary of most common Wikidata relations appearing in concepts found for Data Science and Ria News ontologies

Finally, we investigated the correspondence of relations between Wikidata and the analyzed ontologies. At first, we rejected all the relations for which either the domain (the subject) or range (the object) was not found in Wikidata. This way 136, 42 and 77 relations were left in Data Science, Russian Ria News and English Ria News ontologies, respectively. However, out of these relations only 13, 3 and 3 relations were found both in Wikidata and in the analyzed ontologies. It can be concluded that although Wikidata provides many new relations, the existing relations are rarely preserved between the analyzed ontologies and Wikidata.
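To make the proportion explicit, the share of preserved relations can be computed directly from the counts reported above. The short Python snippet below uses only those numbers; the variable names are ours.

```python
# Relations whose subject and object were both found in Wikidata.
both_found = {"Data Science": 136, "Ria News (Russian)": 42,
              "Ria News (English)": 77}
# Of these, relations that also exist in Wikidata.
preserved = {"Data Science": 13, "Ria News (Russian)": 3,
             "Ria News (English)": 3}

# Percentage of preserved relations, rounded to one decimal place.
share = {name: round(100 * preserved[name] / both_found[name], 1)
         for name in both_found}
```

This gives roughly 9.6% for Data Science, 7.1% for Russian Ria News and 3.9% for English Ria News, i.e. fewer than one in ten relations is preserved in every case.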

4 Conclusions

The paper discusses the idea of ontology reuse for the purpose of defining new concepts. As potential sources of information to be reused, we consider the Wikidata knowledge base as well as the plWordNet semantic dictionary, one of two Polish wordnets. We present an example of ontology reuse based on these two resources. An extension to one of the popular ontology editors, i.e. Ontolis, is also discussed and preliminarily evaluated. The evaluation shows that the proposed idea can be applied in practice.

The conducted experiments have shown that for the concepts that were found both in Wikidata and in the analyzed ontologies, the information provided by Wikidata is typically richer. This observation holds both for the number of concepts and for the number of relations. However, as discussed in Sect. 3.2, Wikidata and the analyzed ontologies very rarely share common relations. This may indicate that Wikidata contains more intermediate nodes than the Data Science or Ria News ontologies.

In the future we plan to extend the Ontolis system even further to enable the reuse of wordnet data. We will probably begin with plWordNet, although the idea presented in the paper can be easily applied to other wordnets, such as the Princeton WordNet or other resources participating in the EuroWordNet project. We also think that the ability to obtain word translations automatically is an interesting direction, which we plan to pursue.