
1 Introduction

With the rise of the semantic web, a multitude of ontologies is emerging. Quite a few of these ontologies are huge and contain hundreds of thousands of concepts. Although some of them fit into one of OWL's three tractable profiles, e.g., the well-known Snomed ontology is in the \( \mathcal {EL} \) profile, there still exists a variety of OWL ontologies that make full use of OWL DL and require long processing times, even when highly optimized OWL reasoners are employed. Moreover, although most of the huge ontologies are currently restricted to one of the tractable profiles in order to ensure fast processing, it is foreseeable that some of them will require an expressivity outside of the tractable OWL profiles.

The research presented in this paper targets better OWL reasoning scalability by making efficient use of modern hardware architectures such as multi-processor/core computers. This is especially important for ontologies that require long processing times even when highly optimized OWL reasoners are used. We consider our research an important basis for the design of next-generation OWL reasoners that can work efficiently in a parallel/concurrent or distributed context on modern hardware. One of the major obstacles to be addressed in the design of corresponding algorithms and architectures is the overhead introduced by concurrent computing and its impact on scalability.

Traditional divide and conquer algorithms split problems into independent sub-problems before solving them, under the premise that little communication is needed while the sub-problems are solved independently, so shared data is excluded to a great extent. Divide and conquer algorithms are therefore in principle suitable for concurrent computing, including shared-memory parallelization and distributed systems.

Furthermore, ontology partitioning has recently been proposed and investigated as a means of dealing with monolithic ontologies. Some research results, e.g., ontology modularization [10], can be used to decrease the scale of an ontology-reasoning problem, and the reasoning over a set of sub-ontologies can then be executed in parallel. However, a solution is still needed for reassembling the sub-ontologies. The algorithms presented in this paper can also serve as a solution for this problem.

In the remaining sections, we present our merge-classification algorithm, which uses a divide and conquer strategy and a heuristic partitioning scheme, report on the experiments we conducted and their evaluation, and discuss related research.

2 A Parallelized Merge Classification Algorithm

In this section, we present an algorithm for classifying Description Logic (DL) ontologies. Due to lack of space we refer to [3, 13] for preliminaries about DLs, DL reasoning, and the semantic web.

We present the merge-classification algorithm in pseudo code. Part of the algorithm is based on standard top- and bottom-search techniques to incrementally construct the classification hierarchy (e.g., see [2]). Due to the symmetry between top-down (\( \top \_search \)) and bottom-up (\( \bot \_search \)) search, we only present the former. In the pseudo code, we use the following notational conventions: \( \varDelta _i \), \( \varDelta _\alpha \), and \( \varDelta _\beta \) designate sub-domains obtained by dividing \( \varDelta \); we consider a subsumption hierarchy as a partial order over \( \varDelta \), denoted as \( \le \); a subsumption relationship where \(C\) is subsumed by \(D\) (\( C \sqsubseteq D \)) is expressed by \( C \le D \) or by \( \langle C , D \rangle \in \; \le \), and \( \le _i \), \( \le _\alpha \), and \( \le _\beta \) are subsumption hierarchies over \( \varDelta _i \), \( \varDelta _\alpha \), and \( \varDelta _\beta \), respectively; in a subsumption hierarchy over \( \varDelta \), \( C \prec D \) designates that \( C \sqsubseteq D \) and there does not exist a named concept \( E \), distinct from \( C \) and \( D \), such that \( C \le E \) and \( E \le D \); \( \prec _i \), \( \prec _\alpha \), and \( \prec _\beta \) are defined analogously over \( \varDelta _i \), \( \varDelta _\alpha \), and \( \varDelta _\beta \), respectively.

Our merge-classification algorithm classifies a taxonomy by classifying its divided sub-domains and then merging the classified sub-taxonomies together. The algorithm makes use of two facts: (i) if it holds that \( B \le A \), then the subsumption relationships between \( B \)'s descendants and \( A \)'s ancestors are determined; (ii) if it is known that \( B \not \le A \), the subsumption relationships between \( B \)'s descendants and \( A \)'s ancestors remain undetermined. The canonical DL classification algorithms, top-search and bottom-search, have been modified and integrated into the merge-classification. The algorithm consists of two stages: divide and conquer, and combining. Algorithm 1 shows the main part of our parallelized DL classification procedure. The keyword spawn indicates that the following computation is executed in parallel, either by creating a new thread in a shared-memory context or by generating a new process or session in a non-shared-memory context. The keyword sync always follows spawn and suspends the current computation until all computations invoked by spawn have returned.

In the first stage, the domain \( \varDelta \) is divided into smaller partitions. Then, classification is executed over each sub-domain \( \varDelta _i \), inferring a classified sub-terminology \( \le _i \) over \( \varDelta _i \). These divide and conquer operations can proceed in parallel.

The classified sub-terminologies are merged in the combining stage, where told subsumption relationships are utilized. Algorithm 2 outlines the master procedure; the slave procedures are given in Algorithms 3, 4, 5, and 6.

[Algorithm 1 (pseudo code)]
[Algorithm 2 (pseudo code)]
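To make the spawn/sync structure concrete, the following minimal Java sketch mirrors the divide and conquer skeleton of Algorithm 1 with a ForkJoinPool, where fork plays the role of spawn and join the role of sync. The taxonomy representation (a map from each concept to its set of subsumers), the size threshold, and the classify/merge placeholders are our illustrative assumptions, not the authors' implementation:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Minimal sketch of the spawn/sync skeleton of Algorithm 1. A taxonomy is
// represented as a map from each concept name to its set of subsumers;
// classify() and merge() are illustrative placeholders, not JFact calls.
public class MergeClassify extends RecursiveTask<Map<String, Set<String>>> {
    private static final int THRESHOLD = 4;   // assumed partition-size bound
    private final List<String> domain;        // the sub-domain Delta_i

    MergeClassify(List<String> domain) { this.domain = domain; }

    @Override
    protected Map<String, Set<String>> compute() {
        if (domain.size() <= THRESHOLD) {
            return classify(domain);                      // conquer sequentially
        }
        int mid = domain.size() / 2;                      // naive even divide
        MergeClassify alpha = new MergeClassify(domain.subList(0, mid));
        MergeClassify beta  = new MergeClassify(domain.subList(mid, domain.size()));
        beta.fork();                                      // "spawn"
        Map<String, Set<String>> ta = alpha.compute();
        Map<String, Set<String>> tb = beta.join();        // "sync"
        return merge(ta, tb);                             // combining stage (Sect. 2.2)
    }

    // Placeholder for a real DL classifier (e.g., a JFact reasoning kernel).
    private static Map<String, Set<String>> classify(List<String> d) {
        Map<String, Set<String>> t = new HashMap<>();
        for (String c : d) t.put(c, new HashSet<>(Set.of("TOP")));
        return t;
    }

    // Placeholder for the top-/bottom-merge of Sect. 2.2; here it just unions.
    private static Map<String, Set<String>> merge(Map<String, Set<String>> a,
                                                  Map<String, Set<String>> b) {
        Map<String, Set<String>> t = new HashMap<>(a);
        t.putAll(b);
        return t;
    }

    public static void main(String[] args) {
        List<String> concepts =
            List.of("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8");
        System.out.println(new ForkJoinPool().invoke(new MergeClassify(concepts)));
    }
}
```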

2.1 Divide and Conquer Phase

The first task is to divide the universe \( \varDelta \) into sub-domains. Without loss of generality, \( \varDelta \) contains only significant concepts, i.e., concept names (atomic concepts) that are explicitly declared in some ontology \( \mathcal {O} \); intermediate, non-significant concepts only play a role in subsumption tests. Each sub-domain is classified independently. The divide operation can be naively implemented as an even partitioning of \( \varDelta \), or by more sophisticated clustering techniques such as the heuristic partitioning presented in Sect. 3, which may result in better performance. The conquer operation can be any standard DL classification method. We first present the most popular classification methods: top-search (Algorithm 3) and bottom-search (omitted here).
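Returning to the divide operation for a moment, a naive even partitioning can be sketched as a round-robin assignment of concept names to \( k \) sub-domains. This is a minimal illustrative sketch; the parameter \( k \) and the helper name are our assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class EvenPartition {
    // Round-robin assignment of concept names to k sub-domains: a minimal
    // sketch of the naive "even partitioning" divide operation.
    static List<List<String>> divide(List<String> domain, int k) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < k; i++) parts.add(new ArrayList<>());
        int i = 0;
        for (String c : domain) parts.get(i++ % k).add(c);
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(divide(List.of("A1", "A2", "A3", "A4", "A5"), 2));
        // [[A1, A3, A5], [A2, A4]]
    }
}
```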

The DL classification procedure determines the most specific super-concepts and the most general sub-concepts of each significant concept in \( \varDelta \). The classified concept hierarchy is a partial order, \( \le \), over \( \varDelta \). \( \top \_search\) recursively calculates a concept's immediate predecessors, i.e., its immediate ancestors in the hierarchy, yielding the relation \( \prec _i \) underlying \( \le _i \).

[Algorithm 3 (pseudo code)]
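The following is a minimal Java reconstruction of the canonical top-search from the literature (cf. Algorithm 3 and [2]), not the authors' exact code: starting from \( \top \), it descends through every child that subsumes the new concept and returns the most specific subsumers found. Caching of test results and marking of visited nodes, which real implementations rely on, are omitted:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Textbook-style top-search sketch. children maps each concept to its
// immediate successors in the current hierarchy; subsumes.test(d, c) stands
// for a full subsumption test c <= d performed by a DL reasoner.
public class TopSearch {
    static Set<String> topSearch(String c, String current,
                                 Map<String, Set<String>> children,
                                 BiPredicate<String, String> subsumes) {
        Set<String> positives = new HashSet<>();
        for (String d : children.getOrDefault(current, Set.of())) {
            if (subsumes.test(d, c)) positives.add(d);   // c <= d holds
        }
        if (positives.isEmpty()) return Set.of(current); // current is a parent of c
        Set<String> parents = new HashSet<>();
        for (String d : positives) {
            parents.addAll(topSearch(c, d, children, subsumes));
        }
        return parents;
    }

    public static void main(String[] args) {
        // Toy hierarchy TOP -> A5 -> A7; toy oracle: A4 <= A5 but A4 </= A7.
        Map<String, Set<String>> children = Map.of(
            "TOP", Set.of("A5"), "A5", Set.of("A7"));
        BiPredicate<String, String> subsumes =
            (d, c) -> d.equals("A5") && c.equals("A4");
        System.out.println(topSearch("A4", "TOP", children, subsumes)); // [A5]
    }
}
```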

2.2 Combining Phase

The independently classified sub-terminologies must be merged in the combining phase. The original top-search (Algorithm 3) and bottom-search have been modified to merge two sub-terminologies \( \le _\alpha \) and \( \le _\beta \). The basic idea is to iterate over \( \varDelta _\beta \) and to use top-search (and bottom-search) to insert each element of \( \varDelta _\beta \) into \( \le _\alpha \), as shown in Algorithm 4.

[Algorithm 4 (pseudo code)]
[Algorithm 5 (pseudo code)]
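Under the same assumptions, the basic merge of Algorithm 4 can be sketched as a loop that inserts every concept of \( \varDelta _\beta \) into \( \le _\alpha \) via the topSearch method from the previous sketch; the symmetric bottom-search pass, which determines each inserted concept's children and rewires the \( \prec \) edges accordingly, is omitted:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Sketch of the basic merge of Algorithm 4, reusing TopSearch.topSearch from
// the sketch above: each concept of Delta_beta gets a fresh top-search into
// <=_alpha; the analogous bottom-search pass is omitted.
public class BasicMerge {
    static void merge(List<String> deltaBeta,
                      Map<String, Set<String>> childrenAlpha,
                      BiPredicate<String, String> subsumes) {
        for (String b : deltaBeta) {
            Set<String> parents =
                TopSearch.topSearch(b, "TOP", childrenAlpha, subsumes);
            for (String p : parents) {
                // Record b as an immediate successor of each parent found.
                childrenAlpha.computeIfAbsent(p, k -> new HashSet<>()).add(b);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> childrenAlpha = new HashMap<>();
        childrenAlpha.put("TOP", new HashSet<>(Set.of("A5")));
        childrenAlpha.put("A5", new HashSet<>(Set.of("A7")));
        BiPredicate<String, String> subsumes =          // toy oracle: A4 <= A5
            (d, c) -> d.equals("A5") && c.equals("A4");
        merge(List.of("A4"), childrenAlpha, subsumes);
        System.out.println(childrenAlpha);              // A4 now listed under A5
    }
}
```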

However, this method does not make use of the so-called told subsumption (and non-subsumption) information contained in the merged sub-terminology \( \le _\beta \). For example, it is unnecessary to test \( \le ? ( B_2 , A_1 ) \) when we already know \( B_1 \le A_1 \) and \( B_2 \le B_1 \), given that \( A_1, A_{2}\) occur in \(\varDelta _\alpha \) and \( B_1,B_{2}\) occur in \(\varDelta _\beta \).

[Algorithm 6 (pseudo code)]

Therefore, we designed a novel algorithm that utilizes the properties stated by Propositions 1 to 6. The computation starts with top-merge (Algorithm 5), which uses a modified top-search algorithm (Algorithm 6). This pair of procedures finds the most specific subsumers in the master sub-terminology \( \le _\alpha \) for every concept from the sub-terminology \( \le _\beta \) that is being merged into \( \le _\alpha \).

Proposition 1

When merging sub-terminology \( \le _\beta \) into \( \le _\alpha \), if \( \langle B , A \rangle \in \, \prec _i \) is found in top-search, \( \langle A , \top \rangle \in \, \le _\alpha \), and \( \langle B , \top \rangle \in \, \le _\beta \), then for all \( b \in \{ b \mid \langle b , B \rangle \in \, \le _\beta \}\) and all \( a \in \{ a \mid \langle A , a \rangle \in \, \le _\alpha \} \cup \{ A \} \) it is unnecessary to calculate whether \( b \le a \).

Proposition 2

When merging sub-terminology \( \le _\beta \) into \( \le _\alpha \), if \( \langle B , A \rangle \in \, \prec _i \) is found in top-search, \( \langle A , \top \rangle \in \, \le _\alpha \), and \( \langle B , \top \rangle \in \, \le _\beta \), then for all \( b \in \{ b \mid \langle b , B \rangle \in \, \prec _\beta \wedge b \ne B \} \) and all \( a \in \{ a \mid \langle a , A \rangle \in \, \prec _\alpha \wedge a \ne A \} \) it is necessary to calculate whether \( b \le a \).

Proposition 3

When merging sub-terminology \( \le _\beta \) into \( \le _\alpha \), if \( B \not \le A \) is found in top-search, \( \langle A , \top \rangle \in \, \le _\alpha \), and \( \langle B , \top \rangle \in \, \le _\beta \), then for all \( b \in \{ b \mid \langle b , B \rangle \in \, \le _\beta \wedge b \ne B \} \) and all \( a \in \{ a \mid \langle a , A \rangle \in \, \le _\alpha \} \cup \{ A \} \) it is necessary to calculate whether \( b \le a \).

Proposition 4

When merging sub-terminology \( \le _\beta \) into \( \le _\alpha \), if \( \langle A , B \rangle \in \, \prec _i \) is found in bottom-search, \( \langle \bot , A \rangle \in \, \le _\alpha \), and \( \langle \bot , B \rangle \in \, \le _\beta \), then for all \( b \in \{ b \mid \langle B , b \rangle \in \, \le _\beta \} \) and all \( a \in \{ a \mid \langle a , A \rangle \in \, \le _\alpha \} \cup \{ A \} \) it is unnecessary to calculate whether \( a \le b \).

Proposition 5

When merging sub-terminology \( \le _\beta \) into \( \le _\alpha \), if \( \langle A , B \rangle \in \, \prec _i \) is found in bottom-search, \( \langle \bot , A \rangle \in \, \le _\alpha \), and \( \langle \bot , B \rangle \in \, \le _\beta \), then for all \( b \in \{ b \mid \langle B , b \rangle \in \, \prec _\beta \wedge b \ne B \} \) and all \( a \in \{ a \mid \langle A , a \rangle \in \, \prec _\alpha \wedge a \ne A \} \) it is necessary to calculate whether \( a \le b \).

Proposition 6

When merging sub-terminology \( \le _\beta \) into \( \le _\alpha \), if \( A \not \le B \) is found in bottom-search, \( \langle \bot , A \rangle \in \, \le _\alpha \), and \( \langle \bot , B \rangle \in \, \le _\beta \), then for all \( b \in \{ b \mid \langle B , b \rangle \in \, \le _\beta \wedge b \ne B \} \) and all \( a \in \{ a \mid \langle A , a \rangle \in \, \le _\alpha \} \cup \{ A \} \) it is necessary to calculate whether \( a \le b \).

When merging a concept \( B \) with \( \langle B , \top \rangle \in \; \le _\beta \), the top-merge algorithm first finds the most specific position for \( B \) in the master sub-terminology \( \le _\alpha \) by means of top-down search. When such a most specific super-concept is found, this concept and all its super-concepts are naturally super-concepts of every sub-concept of \( B \) in the sub-terminology \( \le _\beta \), as stated by Proposition 1. However, this newly found predecessor of \( B \) is not necessarily an immediate predecessor of \( B \)'s descendants in \( \le _\beta \). Therefore, the algorithm continues to find the most specific positions for all sub-concepts of \( B \) in \( \le _\beta \), according to Proposition 2. Algorithm 5 implements this procedure.
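A hedged sketch of this idea follows, again reusing topSearch from the earlier sketch; it is our reading of Algorithm 5, not the authors' code. Concepts of \( \varDelta _\beta \) are visited top-down: the subsumers found for a concept are inherited by its descendants without any test (Proposition 1), and the searches for its children start only below the parents just found (Proposition 2):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Sketch of top-merge (Algorithm 5). For completeness, the real Algorithm 6
// additionally propagates failed candidates (the "red" set) so that children
// are also tested against them (Proposition 3); that part, and the marking of
// already visited concepts in a DAG, are omitted here.
public class TopMerge {
    static void topMerge(String b, String start,
                         Map<String, Set<String>> childrenAlpha, // succ. in <=_alpha
                         Map<String, Set<String>> childrenBeta,  // prec_beta edges
                         Map<String, Set<String>> subsumers,     // all found subsumers
                         BiPredicate<String, String> subsumes) {
        Set<String> parents =
            TopSearch.topSearch(b, start, childrenAlpha, subsumes);
        subsumers.computeIfAbsent(b, k -> new HashSet<>()).addAll(parents);
        for (String child : childrenBeta.getOrDefault(b, Set.of())) {
            // Proposition 1: parents subsume child as well; no test needed.
            subsumers.computeIfAbsent(child, k -> new HashSet<>()).addAll(parents);
            // Proposition 2: search for child's position only below each parent.
            for (String p : parents) {
                topMerge(child, p, childrenAlpha, childrenBeta, subsumers, subsumes);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> childrenAlpha = Map.of("TOP", Set.of("A"));
        Map<String, Set<String>> childrenBeta = Map.of("B", Set.of("B1"));
        BiPredicate<String, String> subsumes =           // toy oracle
            (d, c) -> d.equals("A") && (c.equals("B") || c.equals("B1"));
        Map<String, Set<String>> subsumers = new HashMap<>();
        topMerge("B", "TOP", childrenAlpha, childrenBeta, subsumers, subsumes);
        System.out.println(subsumers); // {B=[A], B1=[A]} (order may vary)
    }
}
```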

Non-subsumption information can also be exploited in the top-merge phase, since the top-down search employed by top-merge must perform subsumption tests. In the canonical top-search procedure (Algorithm 3), when a test fails, the search along that branch simply stops. However, the conclusion that a merged concept \( B \), \( \langle B , \top \rangle \in \; \le _\beta \), is not subsumed by a concept \( A \), \( \langle A , \top \rangle \in \; \le _\alpha \), does not rule out the possibility that \( b \le A \) for some \( b \in \{ b \mid \langle b , B \rangle \in \, \prec _\beta \} \). Such tests are not required in traditional top-search but may abound in the top-merge procedure; they must therefore be performed explicitly, otherwise the algorithm would be incomplete. Proposition 3 captures this observation. For this reason, the original top-search algorithm must be adapted to the new situation; Algorithm 6 is the updated version of the top-search procedure.

Algorithm 6 not only maintains told subsumption information in the set \( green \), but also propagates told non-subsumption information in the set \( red \) for further inference. As stated by Proposition 3, when the position of a merged concept has been determined, the subsumption relations between its successors and the members of the \( red \) set are calculated. Furthermore, the subsumption relation between the concepts \( C \) and \( D \) in Algorithm 6 must be calculated explicitly even when the set \( green \) is empty. In the original top-search procedure (Algorithm 3), \( C \prec _i D \) is implicitly derivable if the set \( green \) is empty; this does not hold in the modified top-search procedure (Algorithm 6), since it no longer always starts from \( \top \) when searching for the most specific position of a concept.

2.3 Example

We use an example to further illustrate the algorithm. Given an ontology whose TBox, shown in Fig. 1(a), only contains simple concept subsumption axioms, Fig. 1(b) shows the corresponding subsumption hierarchy.

Fig. 1. An example ontology.

Fig. 2. The subsumption hierarchy over divisions.

Suppose that the ontology is clustered into two groups in the divide phase: \( \varDelta _\alpha = \{ A_2 , A_3 , A_5 , A_7 \} \) and \( \varDelta _\beta = \{ A_1 , A_4 , A_6 , A_8 \} \). They can be classified independently; the corresponding subsumption hierarchies are shown in Fig. 2.

Fig. 3. The computation path of determining \( A_4 \le _i A_5 \).

Fig. 4. The subsumption hierarchy after \( A_4 \le A_5 \) has been determined.

In the merge phase, the concepts from \( \le _\beta \) are merged into \( \le _\alpha \). For example, Fig. 3 shows a possible computation path in which \( A_4 \le A_5 \) is determined. If we assume that a subsumption relationship between two concepts is proven when the parent is added to the set \( box \) (see Line 15, Algorithm 6), Fig. 4 shows the subsumption hierarchy after \( A_4 \le A_5 \) has been determined.

[Algorithm 7 (pseudo code)]

3 Partitioning

Partitioning is an important part of the algorithm and the main task of the divide phase. In contrast to simple problem domains such as sorting integers, where the merge phase of a standard merge-sort does not require any further sorting, DL ontologies may entail numerous subsumption relationships among concepts, so merging classified sub-terminologies is itself costly. Building a terminology with respect to the entailed subsumption hierarchy is the primary function of DL classification. We therefore assumed that heuristic partitioning schemes that make use of known subsumption relationships may improve reasoning efficiency by requiring a smaller number of subsumption tests; this assumption was confirmed by our experiments, described in Sect. 4.

So far, we have presented an ontology partitioning algorithm that uses only told subsumption relationships, i.e., those directly derived from concept definitions and axiom declarations. Any concept that has at least one told super- and one told sub-concept can be used to construct a told subsumption hierarchy. Although such a hierarchy is usually incomplete and misses many entailed subsumptions, it contains the already known subsumptions, which indicate the closeness of concepts w.r.t. subsumption. Such a raw subsumption hierarchy can be represented as a directed graph with a single root, the \( \top \) concept. A heuristic partitioning method can then be defined by traversing the graph breadth-first, starting from \( \top \), and collecting the traversed concepts into partitions. Algorithms 7 and 8 implement this procedure; a sketch follows below.
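The following minimal Java sketch reflects our reading of this breadth-first scheme (cf. Algorithms 7 and 8); the partition-size bound is a hypothetical tuning parameter. Concepts that are close in the told hierarchy tend to end up in the same partition:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Heuristic divide: breadth-first traversal of the told-subsumption graph
// from TOP, filling one partition after another up to maxSize concepts.
public class HeuristicPartition {
    static List<Set<String>> divide(Map<String, Set<String>> toldChildren,
                                    int maxSize) {
        List<Set<String>> parts = new ArrayList<>();
        Set<String> current = new HashSet<>();
        Set<String> seen = new HashSet<>(Set.of("TOP"));
        Deque<String> queue = new ArrayDeque<>(List.of("TOP"));
        while (!queue.isEmpty()) {
            String c = queue.poll();
            if (!c.equals("TOP")) {            // TOP belongs to every sub-domain
                current.add(c);
                if (current.size() == maxSize) {
                    parts.add(current);
                    current = new HashSet<>();
                }
            }
            for (String d : toldChildren.getOrDefault(c, Set.of())) {
                if (seen.add(d)) queue.add(d); // enqueue each concept once
            }
        }
        if (!current.isEmpty()) parts.add(current);
        return parts;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> told = Map.of(
            "TOP", Set.of("A5", "A1"),
            "A5", Set.of("A7", "A4"),
            "A1", Set.of("A6"));
        System.out.println(divide(told, 3));   // two partitions of sizes 3 and 2
    }
}
```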

4 Evaluation

Our experimental results clearly show the potential of merge-classification. Depending on the benchmark ontology, we achieved speedups of up to a factor of 4 using a maximum of 8 parallel workers. This speedup is in the range of what we expected and comparable to other reported approaches; e.g., the experiments reported for the ELK reasoner [16, 17] also show speedups of up to a factor of 4 when using 8 workers, although ELK uses a specialized polynomial procedure for \( \mathcal{EL}^+ \) reasoning that seems to be more amenable to concurrent processing than standard tableau methods.
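As a rough plausibility check (our own back-of-the-envelope computation, not part of the reported experiments), a speedup of \( S(8) = 4 \) with \( p = 8 \) workers corresponds to a parallel efficiency of
\[ E(8) = \frac{S(8)}{8} = \frac{4}{8} = 0.5 , \]
and, under Amdahl's law \( S(p) = 1/(f + (1-f)/p) \), to a sequential fraction of \( f = 1/7 \approx 0.14 \), which would be consistent with, e.g., merge work and concurrency overhead that do not parallelize.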

So far, we have designed and implemented a concurrent version of the algorithm. Our program is implemented on the basis of the well-known reasoner JFact, which is open source and implemented in Java. We modified JFact such that a set of JFact reasoning kernels can be executed in parallel to perform the merge-classification computation. By adapting such a mature DL reasoner, we can examine the effectiveness of the merge-classification algorithm.

[Algorithm 8 (pseudo code)]

4.1 Experiment

The program was tested on a multi-processor computer with 4 octa-core processors and 128 GB of memory, running Linux with a 64-bit OpenJDK 6 JVM. The JVM was initially allocated at least 16 GB of memory and could access at most 64 GB of physical memory. Most of the test cases were chosen from the OWL Reasoner Evaluation Workshop 2012 (ORE 2012) data sets. Table 1 shows the metrics of the test cases.

Table 1. Metrics of the test cases.
Fig. 5. The performance of parallelized merge-classification (I).

Each test-case ontology was classified with the same settings except for an increasing number of workers. Each worker is mapped to an OS thread, as is the case for standard Java threads. Figures 5 and 6 show the test results.

In our initial implementation, we used an even-partitioning scheme, i.e., concept names were randomly assigned to a set of partitions. For the majority of the above-mentioned test cases, we observed a small performance improvement below a speedup factor of 1.4; for a few, an improvement of up to 4; and for the others only a decrease in performance, i.e., the parallelization introduced nothing but overhead.

As mentioned in Sect. 3, we assumed that a heuristic partitioning, e.g., a partitioning scheme that takes subsumption axioms into account, might lead to better reasoning performance. This idea is realized by Algorithms 7 and 8.

We implemented Algorithms 7 and 8 and tested the program. The tests confirmed our assumption: heuristic partitioning can improve reasoning performance where blind partitioning cannot.

Fig. 6. The performance of parallelized merge-classification (II).

4.2 Discussion

Our experiments show that with a heuristic divide scheme the merge-classification algorithm can increase reasoning performance. However, such performance gains are not always tangible; in a few cases, the parallelized merge-classification even degrades reasoning performance. The divide phase of our algorithm influences the overall performance by creating better or worse partitions.

A heuristic divide scheme may result in better performance than a blind one. In our experience, when the concepts of the domain are divided essentially at random, the resulting divisions sometimes improve reasoning performance and sometimes do not. A promising heuristic divide scheme seems to be to group families of concepts with potential subsumption relationships into the same partition. Evidently, due to the presence of non-obvious subsumptions, it is hard to guess how to achieve such a good partitioning. We therefore use the obvious subsumptions in axioms to place closely related concepts into the same group. The tests demonstrate a clear performance improvement in a number of cases.

While in many cases merge-classification can improve reasoning performance, for some test cases its practical effectiveness is not yet convincing. We are still investigating the factors that influence the reasoning performance in these cases and cannot give a definitive answer yet. One cause may be the large number of general concept inclusion (GCI) axioms in some ontologies: even with a more refined divide scheme, GCI axioms can cause inter-dependencies between partitions and may increase the number of subsumption tests in the merge phase. Also, the non-determinism of the merging schedule, i.e., the unpredictable order in which divisions are merged, needs to be handled effectively in the implementation; race conditions between merging workers as well as the introduced overhead may decrease performance. In addition, the experimental environment limits performance: compared with a single-chip architecture, the distribution of the 32 processors over 4 chips incurs extra communication overhead, and the memory and thread management of the JVM may slow down our program.

5 Related Work

A key functionality of a DL reasoning system is classification, i.e., computing all entailed subsumption relationships among named concepts. The generic top-search and bottom-search algorithms were introduced by [19] and extended by [2]; they are the standard technique for incrementally creating subsumption hierarchies of DL ontologies. Reference [2] also presented some basic traversal optimizations. Since then, a number of optimization techniques have been explored [8, 9, 26], most of which exploit partial transitivity information during the search. However, research on how to use concurrent computing for optimizing DL reasoning has started only recently.

The merge-classification algorithm is suitable for concurrent implementation, including both shared-memory parallelization and (shared-memory or non-shared-memory) distributed systems. Several concurrency-oriented DL reasoning schemes have been investigated recently. Reference [18] reported on experiments with a parallel \( \mathcal {SHN} \) reasoner that processes disjunction and at-most cardinality restriction rules in parallel and implements some primary DL tableau optimization techniques. Reference [1] presented the first algorithms for parallelizing TBox classification using a shared global subsumption hierarchy; the experimental results demonstrate the feasibility of parallelized DL reasoning. References [16, 17] reported on the ELK reasoner, which classifies \( \mathcal {EL} \) ontologies concurrently and whose speed in reasoning about \( \mathcal{EL}^+ \) ontologies is impressive. References [28, 29] studied a parallel DL reasoning system, and [20, 21] proposed the idea of applying a constraint programming solver. Besides the shared-memory concurrent reasoning research mentioned above, non-shared-memory distributed concurrent reasoning has recently been investigated by [22, 25].

Merge-classification needs to divide ontologies. Ontology partitioning can be considered a kind of clustering problem; such problems have been extensively investigated in network research, e.g., [6, 7, 30]. Algorithms adopting more sophisticated heuristics for ontology partitioning have been presented in [5, 10–12, 14].

Our merge-classification approach employs the well-known divide and conquer strategy. There is ample evidence that these types of algorithms are well suited for parallel processing [4, 15, 27]. Experimental work on parallelized merge sort is reported in [23, 24].

6 Conclusion

The approach presented in this paper is motivated by the observations that (i) multi-processor/core hardware has become ubiquitously available, but standard OWL reasoners do not yet make use of these resources; and (ii) although most OWL reasoners are highly optimized and impressive speed improvements have been reported for reasoning in the three tractable OWL profiles, there exists a multitude of OWL ontologies outside of these profiles that require long processing times even for highly optimized OWL reasoners. Concurrent computing has recently emerged as a possible solution for achieving better scalability in general, and especially for such difficult ontologies. We consider the research presented in this paper an important step towards designing adequate OWL reasoning architectures based on concurrent computing.

One of the most important obstacles to successfully applying concurrent computing is the management of the overhead caused by concurrency, which is usually considerable when concurrent computing is used for DL reasoning. Concurrent algorithms that cause only a small overhead therefore seem to be the key to successfully applying concurrent computing to DL reasoning.

Our merge-classification algorithm uses a divide and conquer scheme, which is potentially suitable for low-overhead concurrent computing since it rarely requires communication among divisions. Although the empirical tests show that the merge-classification algorithm does not always improve reasoning performance to a great extent, they make us confident that further research is promising. For example, investigating which factors impact the effectiveness and efficiency of merge-classification may help us improve the performance of the algorithm further.

At present, our work adopts a heuristic partitioning scheme in the divide phase. Different divide schemes may result in different reasoning performance, and we are planning to investigate better divide methods. Furthermore, our work has so far only examined the performance of the concurrent merge-classification. How the number of divisions impacts the reasoning performance in single-threaded and multi-threaded settings needs to be investigated in more detail.