1 Introduction

An ontology is “a formal, explicit specification of a shared conceptualization” [1]. It has numerous capabilities, such as analyzing domain knowledge, making implicit knowledge explicit, and sharing a common understanding of the structure of information among people or software agents [2]. Thus, ontologies are incorporated in information systems as a component to manage heterogeneous information and high-volume data in domains such as medicine, agriculture, defense, and finance. This helps information consumers or software agents make the right decisions when tackling practical problems. However, the right decisions depend on the quality of the information provided and, in the case of ontology-based information systems, on the quality of the ontology.

Ontology quality is a promising research area in the semantic web that has been discussed under ontology evaluation [3]. Ontology quality assessment helps ontology consumers select a suitable ontology from a set of ontologies or assess the fitness of an ontology for an intended purpose [4, 5]. Moreover, quality assessment should not be limited to evaluating the product at the final stage; an ontology also needs to be evaluated across the entire ontology life cycle. An ontology consists of levels, also called layers, namely syntactic, vocabulary (i.e., terminology), architecture, semantic, and context [6, 7]. Evaluating an ontology with respect to each level/layer reduces the complexity of the overall ontology quality assessment.

However, building a good-quality ontology is not straightforward, as it requires considering several aspects, such as logic, reasoning, structure, and the domain knowledge to be modeled, with respect to the specified tasks [8, 9]. To this end, ontology quality criteria are important for assessing the components of an ontology. Each criterion may have several measures (i.e., metrics) that give ontology developers an objective and quantitative basis for insight, so that they can understand which areas need to be revised to achieve a good-quality ontology. The authors of the article [6] first proposed a significant set of ontology quality criteria (i.e., characteristics): correctness, soundness, consistency, completeness, conciseness, expandability, and sensitiveness. Later, many more criteria were added to this list by scholars from different points of view [3, 10,11,12,13,14]. For instance, the scholars of the article [3] proposed a set of criteria by considering an ontology as a software artifact. They adopted the standard ISO/IEC 25000:2005, titled SQuaRE [15], and suggested the criteria functional adequacy, reliability, performance efficiency, operability, maintainability, capability, transferability, and structural. OntoQA is an approach that describes eleven quality measures that can be used to evaluate the quality of an ontology at the schema and data (i.e., knowledge base) levels [14]. The schema-level measures are the richness of relationships, inheritance, and attributes. The measures class richness, average population, cohesion, fullness, and connectivity are classified as data-level measures. Furthermore, OntoMetric is a web-based tool that assesses ontology quality under five criteria (basic, schema, graph, knowledge, and class), which include 160 measures [13].

At present, many quality criteria and measures have been defined to assess the quality of an ontology from different perspectives, such as the ontology perspective (i.e., inherent ontology quality), the real-world perspective, and the users’ perspective [8, 13, 14, 16]. However, the criteria and measures defined in the literature so far are inconsistent and vaguely defined. For example, in most cases it is difficult to understand the quality criteria, and thus the quality measures relevant to a given criterion, because the terminology has been used loosely. No distinction is made between the two concepts of quality criterion and quality measure from an ontology quality point of view. Moreover, an ontology has many levels, as described in Sect. 4.1, and the quality of each level is significant for the overall quality of the ontology. Although the quality of certain levels has been discussed in the literature in an ad hoc way, none of the existing definitions or approaches defines quality criteria or measures methodically for all of the levels. Consequently, no proper guidelines exist so far for ontology quality evaluation of the kind that exist in software engineering. For instance, when an ontology is evaluated through an application-based approach, it is necessary to understand which quality criteria are to be adopted at the contextual, semantic, and structural levels. Currently, ontology researchers and practitioners limit quality evaluation to a small set of criteria, namely expressiveness and usefulness, due to the absence of proper guidelines [16].

Therefore, our effort is to analyze the quality criteria and measures that have been identified in previous studies and to synthesize them into an overview, organized by ontology levels and evaluation approaches, that supports producing a good-quality ontology. To achieve this aim, data quality theories have been adopted. This overview would guide ontology developers and researchers in understanding which quality criteria are to be assessed at each level (i.e., layer) and which approaches could be used to evaluate ontologies.

2 Related Work

We analyzed the existing survey studies that focus on ontology evaluation criteria, metrics, and approaches. Among them, a small number of survey studies [16,17,18,19,20] reviewed the related work comprehensively or systematically. However, none of them provides a model, matrix, or overview relating quality criteria, approaches, and ontology levels. This makes it difficult for researchers to gain insight into which quality criteria to consider when performing ontology evaluation and which criteria are more appropriate for assessing each level of an ontology.

The author of the article [17] has highlighted the important quality criteria consistency, completeness, conciseness, expandability, and sensitiveness through theoretical analysis and based on her experience. Then, a set of errors that ontology developers can make has been classified under each quality criterion except expandability and sensitiveness. Finally, the article presents ways of detecting inconsistency, incompleteness, and redundancy. Moreover, the work also highlights the need to develop language-dependent evaluation tools and the importance of documenting ontology quality with criteria. The research article [18] takes automatic, domain- and task-independent ontology evaluation as the scope of the study and also focuses on a set of ontology quality criteria that have been explained in five articles, including [11, 17]. Furthermore, the evaluation of each ontology level with related measures has been described. For instance, the structure level can be evaluated by considering sub-graph measures (depth, breadth, and fan-outness), and the context level can be assessed with competency questions or through unit tests. Nevertheless, there is no clear comparison or discussion of ontology criteria and of how they can be associated with each level when evaluating an ontology.

Fifty-one structural quality measures have been explored in [19]. Definitions given in natural language can be interpreted differently by researchers with different perspectives. The authors of the article [19] have addressed that issue by constructing formal definitions for the measures, based on the Ontology Definition Model (ODM), to provide a common understanding. They have presented formal definitions for the quality measures Richness, Cohesion, Class Importance, Fullness, Coupling, Class Connectivity, and Class Readability. These formal definitions help researchers compare the definitions and intents of measures when evaluating the structure of an ontology.

As ontology quality assessment is required in each stage of the ontology life cycle, it is vital to be aware of which criteria are to be considered in each stage. Thus, the researcher of the article [16] has explored the quality criteria that are relevant for the evaluation of the design and implementation stages. For that, a systematic review was performed by retrieving articles from two reputed journals: the Journal of Web Semantics and the Semantic Web Journal. The author identified accuracy, adaptability, cognitive adequacy, completeness, conciseness, consistency, expressiveness, and grounding as relevant criteria for evaluating an ontology in the design stage. To evaluate the quality of an ontology in the implementation stage, the criteria computational efficiency, congruency, practical usefulness, precision, and recall have been recommended. Moreover, it has been revealed that only a few quality criteria, such as expressiveness and practical usefulness, have been used in practice, although many quality criteria are defined in theoretical approaches [16].

As a diverse set of ontology quality criteria exists, it is difficult for researchers to find a suitable set of quality criteria for assessing a particular ontology based on the intended purpose. To mitigate this issue, scholars have adopted well-defined theories and standards from the software engineering discipline [3, 20]. In the article [20], the authors conducted a systematic review to identify ontology quality criteria and grouped the measures of the quality criteria into the categories Inherent and Inherent-System, which are defined in the ISO/IEC 25012 Data Quality Standard. The inherent quality criteria adapted from this standard are accuracy, completeness, consistency, and currentness. The inherent-system criteria are compliance, understandability, and availability. The ontology measures identified through the survey have been mapped under these criteria. For instance, the accuracy criterion includes the measures Incorrect Relationship, Hierarchy Over-specialization, Class Precision, Number of Deprecated Classes and Properties, etc., and the completeness criterion includes the measures Number of Isolated Elements, Missing Domain or Range in Properties, Class Coverage, and Relation Coverage. However, while this classification can be applied to compare the quality of two or more ontologies in a similar domain, it is not sufficient for assessing a single ontology to determine which components (i.e., levels) of the ontology have good quality and which need to be improved.

Moreover, the scholars [7, 21,22,23,24,25] have discussed several ontology evaluation approaches, criteria, and ontology levels to focus on when assessing ontology quality. Only the researchers of the articles [22, 23] have attempted to provide a comparison between ontology quality approaches and criteria. However, the comparisons are abstract and difficult to interpret. The authors in [23] have stated that it is difficult to associate criteria with ontology approaches and ontology levels due to their diversity. According to our study, one reason for having so many criteria is that different definitions are available for the same criterion, and vice versa (i.e., two or more closely related criteria may have the same definition). This issue is further discussed in Sect. 4.2. Thus, we made an effort to carefully analyze these different definitions and to specify possible ontology quality criteria related to approaches and levels (i.e., layers). To this end, a comprehensive theoretical analysis was conducted on ontology quality criteria and metrics (i.e., measures).

3 Methodology

To address the gap highlighted in Sects. 1 and 2, we performed a theoretical review by following the procedure proposed in [26]. To this end, the relevant background and the gaps to be addressed have been explained in the previous sections. As the next step, we defined the search terms used to find relevant papers in the databases ACM Digital Library, IEEE Xplore Digital Library, Science Direct, and Springer Link, and in the search engine Google Scholar. The terms are ontology, ontology quality criteria, measures, metrics, quality assessment, and ontology evaluation. Then, the general search strategy for the databases was developed: “[ontology AND [Quality OR Evaluation OR Assessment] AND [Criteria OR Measures OR Metric]]”. At this stage, the articles were filtered purposefully by analyzing titles and abstracts, as the intention of the study is not to explore the state of the art in ontology quality assessment but to analyze the ontology criteria that have been covered across the ontology levels, possibly with approaches. To reduce the search results and to retrieve quality studies, the following inclusion criteria were applied:

  • studies in English,

  • studies published during 2010–2021,

  • peer-reviewed,

  • full papers, and

  • studies focused on quality assessment, criteria, and measures.

Finally, the relevant articles were downloaded through the reference management tool (i.e., Mendeley). Moreover, a few potential articles were retrieved by looking up the references of the filtered articles. Thereafter, we selected the articles [3, 6, 10, 11, 27,28,29,30,31,32,33,34,35] for the analysis.
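The screening step described above can be illustrated with a minimal Python sketch. It is not the tooling used in the study; the `Study` record, its field names, and the `on_topic` flag (standing in for the manual title-and-abstract judgment) are assumptions introduced only for illustration.

```python
from dataclasses import dataclass

# Boolean search string quoted in the text, kept verbatim.
SEARCH_QUERY = (
    "[ontology AND [Quality OR Evaluation OR Assessment] "
    "AND [Criteria OR Measures OR Metric]]"
)


@dataclass(frozen=True)
class Study:
    """Hypothetical record for one retrieved study (field names are assumptions)."""
    title: str
    language: str
    year: int
    peer_reviewed: bool
    full_paper: bool
    on_topic: bool  # manual judgment from title and abstract


def meets_inclusion_criteria(study: Study) -> bool:
    """Apply the five inclusion criteria listed in the methodology."""
    return (
        study.language == "English"
        and 2010 <= study.year <= 2021
        and study.peer_reviewed
        and study.full_paper
        and study.on_topic
    )


def screen(studies):
    """Keep only the studies that satisfy every inclusion criterion."""
    return [s for s in studies if meets_inclusion_criteria(s)]


candidates = [
    Study("Survey A", "English", 2015, True, True, True),
    Study("Poster B", "English", 2018, True, False, True),
    Study("Old paper C", "English", 2004, True, True, True),
]
selected = screen(candidates)
```

Articles found by following references, as mentioned in the text, would be added to the selected set separately and are not modeled here.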

4 Data Analysis and Synthesis

4.1 Prerequisite

In previous studies, the following terms have been used interchangeably in explanations of ontology quality. In this study, we mostly use the terms criteria (i.e., characteristics), metrics (i.e., measures), dimensions, ontology levels, and ontology approaches to describe the theories in order to maintain consistency.

Criteria (i.e., Characteristics)

Ontology criteria (i.e., characteristics) describe a set of attributes. An attribute is a measurable physical or abstract property of an entity [36, 37]. In ontology quality, an entity can be a set of concepts, properties, or an ontology.

Metric (i.e., Measure)

A metric (i.e., measure) describes an attribute quantitatively or defines an attribute formally [38]. In other words, an ontology quality metric is used to measure a characteristic of an ontology that can be represented formally.

For instance, conceptual complexity is a criterion (i.e., characteristic) that is used to evaluate an ontology (i.e., entity), and it can be quantitatively measured by using the metric size, which may have measurements such as the number of concepts and properties in the structure, the number of leaf concepts, and the number of attributes per concept.
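To make the distinction between criterion, metric, and measurement concrete, the following minimal Python sketch computes three measurements of the size metric on a toy is-a hierarchy. The dictionary-based representation and the function name `size_measurements` are assumptions made for illustration and are not taken from the surveyed works.

```python
def size_measurements(parents, attributes):
    """Compute simple measurements of the 'size' metric.

    parents:    dict mapping each concept to the set of its direct parents (is-a)
    attributes: dict mapping a concept to the list of its attributes
    """
    # Every concept that appears as a key or as a parent counts once.
    concepts = set(parents)
    for direct_parents in parents.values():
        concepts |= set(direct_parents)

    # A leaf concept is one that is never used as a parent.
    used_as_parent = set().union(*parents.values())
    leaves = concepts - used_as_parent

    total_attributes = sum(len(attributes.get(c, ())) for c in concepts)
    per_concept = total_attributes / len(concepts) if concepts else 0.0

    return {
        "number_of_concepts": len(concepts),
        "number_of_leaf_concepts": len(leaves),
        "attributes_per_concept": per_concept,
    }


toy_parents = {
    "Animal": set(),
    "Dog": {"Animal"},
    "Cat": {"Animal"},
    "Puppy": {"Dog"},
}
toy_attributes = {"Animal": ["name", "age"], "Dog": ["breed"]}
result = size_measurements(toy_parents, toy_attributes)
```

In this toy case the criterion would be conceptual complexity, the metric is size, and the three dictionary entries are its measurements.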

Dimension (i.e., Aspects)

Dimensions (i.e., aspects) have been defined to classify several criteria/attributes based on different views. For example, if a dimension describes the content of an ontology, it may include a set of criteria related to content assessment. Criteria such as graph complexity, modularity, and graph consistency can be grouped into the structural dimension.

Therefore, similar to software data quality, dimensions are qualitative and are associated with several characteristics (i.e., attributes) that can be directly or indirectly measured through quantitative metrics.

Hereinafter, we use the terms criteria and metrics instead of the terms ontology quality criteria and ontology quality metrics, respectively.

Ontology Levels (i.e., Layers)

In ontology quality assessment, three levels (also known as layers) to focus on were initially proposed in [6], namely content, syntactic & lexicon, and architecture. Later, this was expanded by including structural and context [7, 18]. These levels/layers focus on different aspects of ontological information. The syntactic level considers the features related to the formal language that is used to represent the ontology. The lexicon level, also named the vocabulary or data layer, takes into account the vocabulary that is used to describe concepts, properties, instances, and facts. The structural level/architectural layer focuses on the is-a (i.e., hierarchical) relationship, which is more important in ontology modeling than the other relations. Moreover, it considers the design principles and other structural features required to represent the ontology. Other non-hierarchical relationships and semantic elements are considered under the semantic level. The context level concerns the application level that the ontology is built for. It is important to assess whether the ontology conforms to the real application requirements as a component of an information system or a part of a collection of ontologies [7, 39].

Ontology Evaluation Approaches (i.e., Methods, Techniques)

Ontology evaluation has mainly been conducted under four approaches: application-based, data-driven, golden standard-based, and human-based [7]. In brief, the application-based approach assesses the ontology when it is attached to the application and used in practice [39]. The data-driven approach assesses the ontology against the data source (i.e., corpus) that is used for the ontology modeling. The golden standard-based approach compares the candidate ontology with an ontology of agreed quality, or assesses the ontology against a benchmark or vocabulary defined by experts. The human-based approach assesses the ontology with the intervention of domain experts and ontology engineers based on a set of criteria, requirements, and standards [6, 7].

Table 1. The existing quality models for ontologies

4.2 Ontology Quality Dimensions, Criteria, and Metrics

Ontology quality evaluation throughout the ontology life cycle ensures that a good-quality ontology is being developed. However, a major issue is the lack of an agreed methodology for it. As a result, several criteria and metrics have been defined without a strong theoretical foundation. From our analysis, we identified a set of criteria and metrics, which are presented in Table 2. The related measurements are not listed, as hundreds of measurements are available altogether; only the relevant citations are provided for further reference. Moreover, metric definitions similar to the ones in [11] and [3] can also be found in [12] and [40], respectively.

There are significant attempts in the literature to introduce generalized dimensions to classify criteria and metrics (see Table 1). The authors of the article [10] have introduced the quality dimensions syntactic, semantic, pragmatic, and social by adopting a semiotic framework named the semiotic metric suite. In the article [11], the researchers have classified metrics into three dimensions: structural, functional, and usability-related. In addition, the scholars in [30] have introduced a set of dimensions, namely presentation, content, and usage, considering the web service domain. Analyzing the metrics defined under each dimension shows that the proposed dimensions overlap. For instance, the criteria modularity and size (concepts/relations), defined under the structural dimension in [11], also appear in the presentation dimension in [30]. Moreover, the criteria in the dimensions semantic [10], functional [11], and content [30] have been defined with respect to the domain that the ontology models. The pragmatic [10], usability-related [11], and usage [30] dimensions consider quality when an ontology is at the application level, and to this end criteria such as functional accuracy, relevancy, adaptability, efficiency, and comprehensibility have been considered. Based on that, we have identified five main distinct dimensions into which the criteria can be grouped: syntactic, structural, semantic, pragmatic, and social. This can be seen as an extended version of the semiotic metric suite [10]. Then, taking this as a basis, the criteria and metrics identified in the literature were mapped by analyzing the given definitions (see Table 2).

Classification of Criteria and Metrics

After a thorough analysis, we identified fourteen main ontology criteria, namely syntactic correctness, cognitive complexity, conciseness, modularity, consistency, accuracy, completeness, adaptability, applicability, efficiency, understandability, relevance, usability, and accessibility, which are classified as follows (the possible metrics follow each criterion after the colon).

  • Syntactic: describes the conformance to the rules of the language in which the ontology is written [10, 30]

    • Syntactic correctness: lawfulness, richness

  • Structural: describes the topological and logical properties of an ontology [11]

    • Cognitive Complexity: size, depth, breadth, fan-outness

    • Modularity: cohesion, coupling

    • Internal Consistency: tangledness, circularity, partition

  • Semantic: describes the characteristics related to the semantics (meanings) of an ontology [10, 34]

    • Conciseness: precision

    • Coverage: recall

    • External Consistency: clarity, interpretability

  • Pragmatic: describes the appropriateness of an ontology for its intended purpose(s)

    • Functional Completeness: competency questions, precision

    • Accuracy

    • Adaptability

    • Applicability

    • Efficiency

    • Understandability

    • Relevance

    • Usability: ease of use

    • Accessibility

  • Social: describes the characteristics related to ontology quality in use (user satisfaction/social acceptance)

    • recognition, authority, history
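The classification above can also be written down as a nested mapping from dimensions to criteria to metrics, which makes it easy to query. The following Python sketch is one possible encoding, not part of the original study; the names `QUALITY_MODEL` and `dimensions_of` are hypothetical, and the placeholder key under the social dimension is an assumption, since the list names metrics there without naming a criterion.

```python
# Dimension -> criterion -> possible metrics, transcribed from the list above.
QUALITY_MODEL = {
    "syntactic": {
        "syntactic correctness": ["lawfulness", "richness"],
    },
    "structural": {
        "cognitive complexity": ["size", "depth", "breadth", "fan-outness"],
        "modularity": ["cohesion", "coupling"],
        "internal consistency": ["tangledness", "circularity", "partition"],
    },
    "semantic": {
        "conciseness": ["precision"],
        "coverage": ["recall"],
        "external consistency": ["clarity", "interpretability"],
    },
    "pragmatic": {
        "functional completeness": ["competency questions", "precision"],
        "accuracy": [],
        "adaptability": [],
        "applicability": [],
        "efficiency": [],
        "understandability": [],
        "relevance": [],
        "usability": ["ease of use"],
        "accessibility": [],
    },
    "social": {
        # Placeholder key (assumption): no criterion is named for these metrics.
        "(no criterion named)": ["recognition", "authority", "history"],
    },
}


def dimensions_of(metric):
    """Return the sorted list of dimensions under which a metric is listed."""
    return sorted(
        dimension
        for dimension, criteria in QUALITY_MODEL.items()
        if any(metric in metrics for metrics in criteria.values())
    )
```

For example, `dimensions_of("precision")` returns both the pragmatic and the semantic dimension, since precision is listed under conciseness and under functional completeness.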

A few of the criteria can be further decomposed into sub-criteria related to different perspectives: inherent to the ontology (i.e., ontology perspective), domain-dependent (i.e., real-world perspective), and user-dependent (i.e., user perspective). For instance, there are two types of consistency: internal consistency and external consistency [30].

Table 2. Associated ontology quality dimensions of criteria, and metrics

Internal consistency is an inherent characteristic of ontologies that considers whether there is any self-contradiction within the ontology (i.e., ontology perspective) [18, 30]. In the article [17], the author has classified three kinds of inconsistency in this regard: circularity errors, partition errors, and semantic errors. Circularity and partition errors describe logical inconsistencies related to the structure and the relations of an ontology; thus, both are inherent to the ontology. Tangledness, which has been described in [11, 12], is also considered a measure of the internal consistency of the structure, as tangledness occurs when a class has multiple parent classes.
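Because tangledness and circularity depend only on the is-a structure, both can be checked without domain knowledge. The Python sketch below is a simplified illustration under stated assumptions: the hierarchy is given as a dictionary from each class to its direct parents, tangled classes are those with more than one direct parent, and a circularity error is taken to be any cycle in the is-a links. The function names are hypothetical, and the formal definitions in [11, 12, 17] may differ in detail.

```python
def tangled_classes(parents):
    """Return the classes that have more than one direct parent."""
    return sorted(c for c, direct in parents.items() if len(direct) > 1)


def has_circularity(parents):
    """Return True if some class reaches itself by following is-a links."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(cls):
        state[cls] = IN_PROGRESS
        for parent in parents.get(cls, ()):
            parent_state = state.get(parent, UNSEEN)
            if parent_state == IN_PROGRESS:
                return True  # back edge: the hierarchy loops
            if parent_state == UNSEEN and visit(parent):
                return True
        state[cls] = DONE
        return False

    return any(state.get(c, UNSEEN) == UNSEEN and visit(c) for c in parents)


diamond = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
looped = {"A": {"C"}, "B": {"A"}, "C": {"B"}}
```

Here `diamond` contains one tangled class ("D") and no cycle, while `looped` has no tangled class but is circular. The recursive traversal is adequate for small hierarchies; a very deep hierarchy would call for an iterative version.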

To determine semantic correctness (i.e., to detect semantic errors), it is necessary to consider the domain knowledge that the ontology uses to specify the conceptualization. Thus, it comes under external consistency, which considers consistency from the real-world perspective. Furthermore, clarity and interpretability can also be considered as metrics of semantic correctness [33].

Moreover, the definitions given for the criteria completeness, coverage, and expressiveness are closely related (see Table 3). Based on the completeness definition given in [41, 42] for data quality, we identified that ontology completeness can also be further decomposed into coverage: syntactic completeness, coverage: semantic completeness, and functional completeness, considering different perspectives. For instance, coverage (or semantic completeness) describes completeness from the real-world perspective and determines the degree to which the entities of the domain (i.e., concepts, relations, attributes, instances) are covered [17, 28, 33]. Moreover, the measures missing instances, missing properties, isolated relations, and incomplete formats are also used to assess coverage from the ontology-inherent point of view, and they can be detected without domain knowledge. Thus, this is named syntactic completeness. In addition, we defined the criterion functional completeness with respect to the user perspective, in which completeness is measured by whether the ontology provides complete answers to the users’ queries (i.e., competency questions), which is more subjective and difficult to measure. In the case of data-driven ontologies, functional completeness is measured against the corpus that the ontology is to cover.
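Since the classification pairs conciseness with precision and coverage with recall, semantic completeness can be illustrated with a set comparison against a reference vocabulary. The sketch below assumes exact matching of term labels, which is a simplification; the function name and the toy term sets are hypothetical.

```python
def precision_recall(ontology_terms, reference_terms):
    """Compare ontology terms with a reference (domain) vocabulary.

    precision: share of ontology terms that are in the reference (conciseness)
    recall:    share of reference terms that the ontology contains (coverage)
    """
    onto, ref = set(ontology_terms), set(reference_terms)
    shared = onto & ref
    precision = len(shared) / len(onto) if onto else 0.0
    recall = len(shared) / len(ref) if ref else 0.0
    return precision, recall


ontology_terms = {"patient", "doctor", "drug", "invoice"}
reference_terms = {"patient", "doctor", "drug", "disease", "symptom"}
precision, recall = precision_recall(ontology_terms, reference_terms)
```

With these toy sets, three of the four ontology terms are in the reference (precision 0.75) and three of the five reference terms are covered (recall 0.6). In practice the reference could be a golden standard or terms extracted from a corpus, and label matching would usually need normalization or synonym handling.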

Table 3. The definitions given for the criteria and metrics in previous studies.

5 Results and Discussion

When mapping the criteria to the dimensions defined in our study, a few deviations were observed in the literature. For instance, in the article [34], the scholars have defined structural complexity (the number of subclasses in the ontology) as a syntactic characteristic. Although it describes a static property of the ontology structure, it is not a property that reflects the syntactic feature as defined in [34]. Importantly, the authors in [32] show that the structural metric cohesion can be adopted to measure the semantics instead of the structural features of an ontology. This implies that metrics are not strictly attached to one dimension and that they can be measured in different ways to achieve the desired quality objectives. Thus, the measures would influence many dimensions through several criteria, which could be mapped to several ontology levels, as shown in Table 2.

Moreover, Table 4 presents the metrics with levels and evaluation approaches, giving researchers an overview for identifying possible quality metrics to consider at each level with a suitable approach. It also indicates whether those metrics can be assessed manually or (semi-)automatically. Based on our analysis, the metrics related to the structural and syntactic levels can be automated, as they are domain-independent. The metrics that come under the semantic and lexicon levels are also automatable; however, they need extra effort, as they are domain-dependent. To assess the context-level criteria, manual methods are mostly required, as these criteria are relative to the users and may not have specific quantitative metrics.

The metric formalism describes the capabilities of the language in which the ontology is written, such as machine understanding, reasoning, and defining the required features (i.e., entities, properties, relations). It has not been included under the proposed dimensions, as it is considered before the ontology is modeled. However, formalism is worth considering when selecting a suitable ontology language for modeling. Thus, we included it in the matrix as a metric that can be assessed manually by ontology engineers.

For each metric, at least one approach is available to measure it. If several approaches are applicable, a suitable approach can be selected based on the purpose and the availability of resources (i.e., time, experts, type of users, standards). For example, accuracy can be assessed through expert intervention or through the application-based or golden standard-based approaches [10]. However, in most cases, a golden standard (i.e., standard ontology, vocabularies, rules) is not available for comparison, and it is then necessary to use one of the other two approaches. If it is hard to evaluate an ontology in a real environment (i.e., application-based) with naïve users, then expert-based methods are acceptable.
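The selection reasoning for the accuracy example can be sketched as a small decision function. This is only an illustration of the reasoning in the paragraph above; the function name, the two boolean inputs, and the assumption that expert-based assessment always remains available as a fallback are ours.

```python
def candidate_approaches_for_accuracy(gold_standard_available,
                                      real_environment_feasible):
    """List the evaluation approaches that remain applicable for accuracy."""
    approaches = []
    if gold_standard_available:
        approaches.append("golden standard-based")
    if real_environment_feasible:
        approaches.append("application-based")
    # Assumption: expert intervention is always possible as a fallback.
    approaches.append("human-based (experts)")
    return approaches


# Typical case described in the text: no golden standard exists and a
# real-environment evaluation with naive users is hard to arrange.
fallback = candidate_approaches_for_accuracy(False, False)
```

A fuller version would also weigh the other resources named in the text, such as time and the type of users, which are not modeled here.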

Table 4. Matrix of ontology quality metrics, levels, and approaches.

When identifying the possible metrics with respect to the levels, we ignored the metrics defined in [3], since they have been defined by assuming that an ontology is a software artifact. As a result, the provided metrics are more subjective, and it is hard to match them with the ontology levels, except for the metrics defined under the structural dimension. Moreover, the authors in [22] have provided a criteria selection framework without differentiating between metrics and criteria. However, we adopted a few metrics from it and included them in the matrix according to their classification (i.e., those in italic format in Table 4).

6 Conclusion and Future Work

A comprehensive theoretical analysis was performed on ontology quality assessment criteria with the aim of providing an overview of criteria and possible metrics. The outcome of this analysis can be used to assess each ontology level with possible approaches, as this has not been covered in previous studies. To this end, we analyzed the definitions provided in the research works and clarified the vague definitions by studying the theories in [36,37,38, 41,42,43,44]. Consequently, we were able to identify fourteen ontology quality criteria, namely syntactic correctness, cognitive complexity, conciseness, modularity, consistency, accuracy, completeness, adaptability, applicability, efficiency, understandability, relevance, usability, and accessibility. These criteria have been classified under five dimensions: syntactic, structural, semantic, pragmatic, and social. Finally, a matrix was constructed that presents the association among the ontology levels, approaches, and criteria/metrics (see Table 4). This would be useful for researchers to gain insight when dealing with ontology quality assessment. Moreover, the absence of empirical evidence on ontology quality assessment has limited the use of criteria in practice, and finding a methodological approach to derive ontology quality criteria with respect to the users’ requirements (i.e., fitness for the intended purpose) remains an open research problem.