
1 Introduction

The Semantic Web is “a set of standards for knowledge representation and exchange that is aimed at providing interoperability across applications and organizations” [1]. The degree of this interoperability between human and software agents depends upon how many communities they have in common and how many ontologies they share [1]. An ontology, which has been called the third component of the Semantic Web, is defined simply as a group of consistent and related terms [1] and more formally as “a formalization of a shared conceptualization” [2]. The latter definition, and the idea that the conceptualization is “shared,” is expanded further by Hepp et al. (2006), who asserted that “ontologies are not just formal representations of a domain, but much more community contracts about such formal representations” [3].

A community consists of a set of relationships between people sharing a common interest [4]. An online community can then be considered as a community that employs the Internet for communication among its members [4]. Berners-Lee and Kagal described the Semantic Web as composed of overlapping online communities of varying sizes and fractal in nature, as membership in these communities changes frequently [1]. Many online communities allow members to participate fully in the site through contributing and accessing information, as well as by commenting on the information added by other members. The BioPortal ontology repository [5], for example, considers anyone who uses this portal to be a member and allows them to actively contribute to the content in the library, a fact that its designers claim should increase the quality of that content [7].

This feeling of shared responsibility within a community for the overall improvement of the ontological content is consistent with what Shadbolt and Berners-Lee have asserted will greatly reduce the effort involved in developing an ontology as the size of the community grows [6]. Noy et al. contend that the Wisdom of the Crowd could even replace knowledge experts when a consensus can be reached within a community [7]. Reaching this consensus, however, is not always easy: it requires time, effort, and a large number of dedicated participants. Therefore, the degree of participation in the process of revising, adopting, expanding and reviewing any ontology is a factor in the assessment of that ontology’s value.

The selection of an ontology from among the options available in an ontology repository should be made based upon a broad set of attributes that may be weighted depending upon the requirements of each application [24]. One of the attributes to include in such a list of criteria should be the acceptance of the ontology within its community. Metrics to assess this acceptance should include measures of how many community members endorse the ontology, how long the ontology has been available, and how actively community members have participated in the ontology’s development. This community acceptance attribute is difficult to assess, and metrics to measure it have not been applied successfully in the past [18]. While much work has been carried out developing metrics related to syntactic, semantic and pragmatic aspects of ontologies, the social quality of ontologies has not been thoroughly investigated. The objective of this research, therefore, is to investigate it.

This research introduces new metrics for social quality assessment, defines them formally, applies them to existing ontologies, and analyzes the challenges involved in using them. The aim is to show how these attributes provide valuable insight into ontology quality and should, therefore, be included in any rigorous ontology evaluation. The results of this assessment could promote interoperability between systems and help advance the use of ontologies in the Semantic Web. Terms related to social quality assessment used in this paper are defined in Table 1.

Table 1. Definitions of terms related to social quality assessment

The next section provides an overview of prior work on assessing ontology quality based on its social valuation. Sections 3 and 4 present history and authority metrics for assessing ontology social quality and outline the implementation of these metrics. Section 5 describes case studies validating the results of the social quality metrics. Section 6 summarizes the work and suggests future research directions.

2 Related Research

In the decade and a half since the introduction of the Semantic Web [14], much work has been carried out on ontology evaluation. Many researchers have addressed the complexity of choosing a high-quality ontology for a particular task or domain. Attributes considered to be valid measures of ontology quality include adaptability, clarity, comprehensiveness, conciseness, correctness, craftsmanship, relevance, reusability, richness and stability as well as many others [14]. Numerous metrics have been developed to assess these and other aspects of ontology quality. Specific metrics which assess one particular attribute and broad suites of metrics that attempt to provide an overall picture of an ontology’s quality have been developed [15–25].

D’Aquin and Noy (2012) defined an ontology library as “a Web-based system that provides access to an extensible collection of ontologies with the primary purpose of enabling users to find and use one or several ontologies from this collection” [26]. Although ontologies should reside in libraries and be developed and endorsed by communities that share a common interest [6], little work has been conducted to develop a means for assessing the amount of recognition received by each ontology within a library. To provide a comprehensive picture of an ontology’s quality, factors such as how much the ontology is being used, how many other ontologies refer to this one as an authority, and how long the ontology has been in existence, should all be taken into consideration [18].

2.1 Ontology Role in Communities

A community can no longer be considered as a physical place, but, rather, as a set of relationships between people who interact socially for their mutual benefit [4]. An online community is a social network that uses the Internet to facilitate the communication among its members rather than face-to-face meetings [4]. These virtual social networks are frequently used for information sharing and problem solving among members who share common interests [12].

Ontologies have been defined as formal representations of a domain, but in order for those representations to be meaningful, they must be agreed upon by the members of a community [6]. This type of meaningful discourse between members of a group is a dynamic social process consisting of shared topics being added, expanded, revised or even discarded. Therefore, an ontology representing the shared communication between members should not be static, but should be able to reflect the community consensus of meaning at any particular time [1]. When a community shows its approval of an ontology by actively participating in its ongoing evolution, the quality of the ontology is more likely to be high within that community [26]. A way of measuring this type of active participation would be helpful in assessing community endorsement of a particular ontology.

2.2 Metrics Suites

The usefulness of metrics to provide a quantified measurement of ontological quality has long been recognized [19] with many metric suites being created that attempt to provide a broad picture of many aspects of an ontology’s quality. OntoQA [19], OQuaRE [25], OntoMetric [17], and AKTiveRank [20] are a few of the most comprehensive suites of metrics. Table 2 summarizes these, and other, metric suites currently available for broad ontology assessment, identifies the number of metrics, and specifies how many of them measure an ontology’s social importance within a particular library.

Table 2. Examples of broad metrics suites

2.3 Social Assessment Within Metrics Suites

Although communities should support the development, maintenance and endorsement of ontologies [6], very few assessment systems have a means by which to measure an ontology’s value within its community. OntoMetric [17], the BioPortal Recommender [22], and the Semiotic Metrics Suite [18] are among the few suites that attempt to assess an ontology’s acceptance within a community as one of the factors to measure its quality. Unfortunately, none of these assessment suites are able to fully evaluate the level of acceptance an ontology receives within its community.

OntoMetric [17] contains approximately 160 metrics for assessing ontology quality, which focus primarily on the fitness of an ontology for a particular software project for which it will be used. However, only three of its metrics relate to its relationship with other ontologies. The large number of metrics makes the OntoMetric system difficult to employ [19]. The OntoMetric system reflects the fact that part of the suitability of an ontology for a given project is the methodology used to create it. It, therefore, assesses the social acceptance of that methodology by counting the number of other ontologies that were created with it, the number of domains that have been expressed with its developed ontologies, and how important the ontologies developed with this methodology have become. Unfortunately, in most situations, a user must attempt to answer these questions (perhaps by conducting additional research) as well as to provide an answer expressed on a scale between “very low” and “very high,” reducing the accuracy of the results in this factor’s assessment.

The BioPortal recommender system includes Acceptance metrics as part of the ranking system that it provides as a tool for choosing an ontology for a particular purpose [22]. Users enter desired keywords and the recommender system presents a list of ontologies from the BioPortal repository containing the keywords. The list of applicable ontologies is ranked in order of each ontology’s score on four individually weighted attributes, one of which is the Acceptance of the ontology within the BioPortal community. The other three attributes that are included in the Recommender system are Coverage, Detail of knowledge and Specialization. Unfortunately, the metrics used by the BioPortal recommender system to assess Acceptance are based on factors such as the number of site visits to the BioPortal website, membership in the UMLS database and mentions in the BioPortal journal, so those metrics cannot be used on ontologies in other libraries without access to this information.

The Semiotic Metrics Suite developed by Burton-Jones et al. [18] is based upon the theory of semiotics, the study of signs and their meanings, and builds upon Stamper et al.’s [27] framework for assessing the quality of signs. One of the layers of the framework is the Social layer, which evaluates a sign’s usefulness on a social level by evaluating its “potential and actual social consequences” and asks the question “Can it be trusted?” [27]. The Semiotic Metrics Suite includes the Social layer, which measures an ontology’s recognition within a community by two metrics: (1) Authority, which measures the links from an ontology to other ontologies in the same library; and (2) History, which measures the frequency with which these links are employed. Unfortunately, the calculations for these measurements require information that is not available for most ontologies. The number of links from other ontologies to a particular one, and the number of times the linking ontologies have been used for other applications, are usually not provided by ontology libraries, making these metrics difficult to use for ontology assessment. This research introduces new Authority and History metrics using information available for most ontologies and includes case studies demonstrating their effectiveness.

3 Metrics for Assessing Social Quality

Social Quality is “the level of agreement among participants’ interpretations” [27] and reflects the fact that, because agents and ontologies exist in communities, agreement in meaning is essential within the community. This research proposes two new metrics to measure the level of an ontology’s recognition within its community by measuring its authority within the library and the history of its participation and use in the library. These metrics can be combined to determine the overall assessment for Social Quality within the library.

Stvilia defines Authority as the “degree of reputation of an ontology in a given community or culture” [8]. One way to measure Authority is by the number of other ontologies that link to it as well as how many shared terms there are within those linked ontologies. More authoritative ontologies signal that the knowledge they provide is accurate or useful [18].

Another social metric is the History of an ontology. The history of a conceptualization is a valuable part of its definition [3]. The History metric measures the number of years an ontology has existed in a library, as well as the number of revisions made to it during the course of its residence there. Ontologies with longer histories are expected to be more dependable because each new revision should improve upon the previous version showing a pattern of active participation by community members resulting in additions and modifications.

3.1 Social Quality Metric

The Social Quality metric, denoted SQ, is computed as the weighted combination of the scores on these two measurements. The weights of the History and Authority metrics may be equal, but a user can adjust the significance of each for a particular task by varying the values of the weights.

Definition 1: The Social Quality (SQ) of an ontology is defined as the weighted average of Authority (SQA) and History (SQH), where wa represents the percentage assigned by the user to the authority attribute and wh represents the weight assigned to the history attribute.

$$ SQ = w_{a} *SQA + w_{h} *SQH $$
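Definition 1 can be illustrated with a minimal Python sketch. The function name `social_quality` and the requirement that the two weights sum to 1 are assumptions made for illustration; the text only states that the weights are user-assigned percentages.

```python
def social_quality(sqa: float, sqh: float, w_a: float = 0.5, w_h: float = 0.5) -> float:
    """Combine Authority and History scores as a weighted average (Definition 1)."""
    # Assumption: the weights are fractions that sum to 1, so that SQ stays
    # on the same scale as the two scores it combines.
    if abs(w_a + w_h - 1.0) > 1e-9:
        raise ValueError("w_a and w_h must sum to 1")
    return w_a * sqa + w_h * sqh


# Illustrative inputs only; these are not values from the case studies.
print(social_quality(80.0, 50.0))            # equal weights
print(social_quality(80.0, 50.0, 0.7, 0.3))  # authority emphasised
```

Keeping the weights as explicit parameters mirrors the idea that a user may shift emphasis between authority and history for a particular task.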

3.2 Social Authority Metric

The Authority of a particular ontology is determined by the number of other ontologies that link to it. By scanning all of the other ontologies in the library for links to this ontology, two counts are determined: the total number of links to the ontology, and the number of ontologies that include one or more references to it. The two counts are weighted depending on the user’s task, and the result is normalized to range from 0, meaning no links at all, to 100, indicating that this ontology is the one in the library with the most links to it. This metric is denoted SQA. External links can also be considered in the determination of SQA if available. Many ontologies, such as the Gene Ontology [28], are in multiple libraries. SQA should then take into consideration all of the links to the Gene Ontology from all of the libraries of which it is a part.

Definition 2: The Social Quality Authority (SQA) of an ontology is defined as the weighted average of the number of linking ontologies (LO) and the number of total linkages (LT) where wo represents the percentage assigned by the user to the number of linking ontologies and wt represents the weight assigned to the total number of links.

$$ SQA = w_{o} *LO + w_{t} *LT $$
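The scan described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in this research: the library is modelled as a hypothetical dictionary mapping each ontology name to the list of ontology names it references, self-references are ignored, and normalization is done by dividing each weighted score by the largest weighted score in the library (the text does not give the exact normalization formula).

```python
def count_links(references, target):
    """Return (LO, LT): the number of linking ontologies and the total links to target."""
    linking, total = 0, 0
    for source, refs in references.items():
        if source == target:
            continue  # assumption: an ontology's references to itself are not counted
        hits = sum(1 for r in refs if r == target)
        if hits:
            linking += 1
            total += hits
    return linking, total


def authority_scores(references, w_o=0.5, w_t=0.5):
    """Weight LO and LT for every ontology, then scale so the library maximum is 100."""
    raw = {}
    for name in references:
        lo, lt = count_links(references, name)
        raw[name] = w_o * lo + w_t * lt
    top = max(raw.values(), default=0.0)
    if top == 0:
        return {name: 0.0 for name in raw}  # no ontology in the library is linked to
    return {name: 100.0 * value / top for name, value in raw.items()}


# Hypothetical four-ontology library: A references B twice and C once, and so on.
library = {"A": ["B", "B", "C"], "B": ["C"], "C": [], "D": ["B"]}
print(count_links(library, "B"))
print(authority_scores(library))
```

An ontology with no inbound links receives 0 and the most-linked ontology receives 100, matching the range described for this metric.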

3.3 Social History Metric

History is determined by calculating the number of years that an ontology has been a member of a community as well as the number of revisions made to the ontology during those years. The two counts are weighted depending on the user’s task, and the result is normalized to range from 1, indicating only one submission that was never updated, to 100, indicating that this ontology is the one in the library with the most total revisions over the longest number of years.

Definition 3: The Social Quality History (SQH) of an ontology is defined as the weighted average of the number of years it has been in the library (Y) and the number of submissions (including revisions) that have been uploaded (S) where wy represents the percentage assigned by the user to the number of years and ws represents the weight assigned to the total number of submissions.

$$ SQH = w_{y} *Y + w_{s} *S $$
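A corresponding sketch for the History metric is given below. The input format (a hypothetical dictionary mapping each ontology name to a pair of years in the library and number of submissions) and the min-max rescaling onto the range 1 to 100 are assumptions; the text specifies the range of the normalized score but not the formula used to obtain it.

```python
def history_scores(records, w_y=0.5, w_s=0.5):
    """Weight years (Y) and submissions (S), then rescale the result onto [1, 100]."""
    raw = {name: w_y * years + w_s * subs for name, (years, subs) in records.items()}
    if not raw:
        return {}
    low, high = min(raw.values()), max(raw.values())
    if high == low:
        # Assumption: when every ontology has the same raw score,
        # all of them are given the minimum score of 1.
        return {name: 1.0 for name in raw}
    # Min-max rescaling: the lowest raw score maps to 1, the highest to 100.
    return {name: 1.0 + 99.0 * (value - low) / (high - low) for name, value in raw.items()}


# Hypothetical records: (years in the library, submissions including the original).
records = {"X": (10, 21), "Y": (0, 1), "Z": (5, 11)}
print(history_scores(records))
```

Under this rescaling, a newly added ontology with a single submission scores 1 only if it also has the lowest weighted score in the library, which is one reason the formula is labelled an assumption.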

4 Implementation

A system has been developed to assess community recognition of an ontology by applying the revised Social Quality metrics. This system can be employed by any community containing an ontology repository, and aids in the selection of an ontology when multiple options are available. By entering relevant keywords and desired metric weights into the system, a user retrieves a set of potential ontologies containing the keywords. The system then assesses the Authority of each of those ontologies by searching all the other ontologies in the repository, counting the number of ontologies that link to each of the potential ontologies as well as the total number of links. Each ontology in the list of potential ontologies then has its History assessment computed by counting the number of years it has been stored in the library and the number of revisions made to it during that time. The Authority and History metrics are then weighted according to the metric weights entered by the user, and the list of potential ontologies is sorted in decreasing order of the overall Social Quality score. The user then receives a list of recommended ontologies that contain the desired keywords and that rank high in social recognition from the community. The specific steps carried out for Social Quality metric assessment and ontology ranking are shown in Fig. 1.

Fig. 1. Social quality assessment and ranking of ontologies
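The retrieval and ranking steps described in this section can be sketched as follows. The `Candidate` record, the `recommend` function, and the placeholder ontology names are hypothetical; the sketch assumes that Authority and History scores have already been computed and normalized, and that a keyword matches when it equals one of an ontology's terms, ignoring case.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    terms: set   # terms contained in the ontology
    sqa: float   # normalized Authority score
    sqh: float   # normalized History score


def recommend(library, keyword, w_a=0.5, w_h=0.5, top=3):
    """Return up to `top` (name, SQ) pairs for ontologies containing the keyword."""
    wanted = keyword.lower()
    matches = [c for c in library if wanted in {t.lower() for t in c.terms}]
    scored = [(c.name, w_a * c.sqa + w_h * c.sqh) for c in matches]
    # Highest Social Quality first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


# Placeholder data for illustration; not drawn from BioPortal.
library = [
    Candidate("ONT_A", {"heart", "valve"}, 90.0, 70.0),
    Candidate("ONT_B", {"Heart"}, 20.0, 40.0),
    Candidate("ONT_C", {"kidney"}, 100.0, 100.0),
]
print(recommend(library, "heart"))
```

Passing the weights at query time keeps the ranking adjustable per task without recomputing the underlying link and revision counts.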

5 Case Studies

To obtain an understanding of how information about an ontology’s acceptance within its community could help a user choose an appropriate ontology from a list of options, two case studies were carried out using the social quality metrics. The BioPortal ontology library was chosen for both studies as an example of a large, well maintained ontology repository that has been deemed useful to the biomedical community [5]. The BioPortal website was also selected because of the availability of additional information included in the library that could be used to examine the results of the case studies with other information on its ontology profile pages. The BioPortal website allows members of the community to contribute reviews to its ontologies, list projects, and make suggestions. BioPortal also keeps track of the number of site visits for each of the ontologies, and provides annotation and term mappings services for its ontologies [7].

The first case study applied the social quality metrics to all 383 of the ontologies currently in the library, ranking them from highest Social Quality score to lowest. This case study was carried out to assess whether the highest-ranking ontologies in the library were in fact the ones most endorsed by the biomedical community. The second case study searched the BioPortal library for ontologies matching key terms and produced, for each term, a list of recommended ontologies ranked by their SQ scores. The ontology list for each term was then examined to ascertain whether the highest-ranking ontologies on each list were actually more likely to be frequently accessed than the ontologies that appeared later on the list. Although the individual metrics can be weighted differently depending on the particular task requirements, all metrics were weighted equally in both case studies.

The two case studies showed that useful information could be obtained from assessing the ontologies on their level of endorsement within the BioPortal community. By examining the difference between ontologies high on the list and those that ranked lower, a pattern can be easily observed about whether the ontologies are well supported by the BioPortal membership.

5.1 Case Study 1

The Authority metrics were first applied to all 383 of the ontologies currently part of the BioPortal library assigning equal weights to the number of links and the number of linking ontologies. For each ontology, all other ontologies were scanned for references to that ontology and counts made of the number of ontologies that included at least one reference to the ontology, as well as the total number of links to the ontology.

The History metric was then computed for all of the BioPortal ontologies using equal weighting for the number of years that the ontology has been in the library and the number of revisions that have been made to it, including the original submission.

The scores for Authority and History were individually normalized between 1 and 100, and the two scores were then averaged to generate the overall Social Quality metric for each ontology. The list of ontologies was ordered from 100 to 1 in order to place the most highly ranked ontologies at the top of the list.

Table 3 shows the highest ranked ontologies from the library on the combined social metrics. Examining other information available on the BioPortal website, it is clear that the ontologies that scored high on the Social Quality metric were the ones involved in many biomedical projects and with good reviews from members who have used them. On the other hand, 70 of the ontologies tested scored only 1 out of 100 on the combined social quality metrics. Exploring the BioPortal website revealed that these 70 had no other ontologies linking to them and no revisions after the initial submission, which was often several years prior, and were not currently involved in any listed projects.

Table 3. Highest ranked ontologies from BioPortal using History and Authority metrics

5.2 Case Study 2

The BioPortal repository was searched for ontologies containing each of ten preselected keywords. A list of applicable ontologies was generated for each of the keywords, and each ontology’s Social Quality score was determined by applying the method outlined in Case Study 1. Each potential ontology list was sorted in descending order to identify the highest results for each of the terms based upon social quality. The keyword searches each retrieved at least 30 potential ontologies. The ranked lists were used to identify the best candidates for a possible task requiring each of the keywords. The top three ontologies recommended for each of the keywords are shown in Table 4.

Table 4. Top three ontologies recommended for each term with corresponding SQ scores.

Additional information provided by the BioPortal website showed that these listed ontologies are favorably reviewed and frequently accessed. In comparison, ontologies retrieved by the keyword search but scoring low on the Social Quality metric were accessed infrequently, indicating little use within the community. For example, the Current Procedural Terminology ontology (CPT), which ranked highest for two of the keywords and obtained an SQ score of 53, received over 35,000 site visits in the last two years. In contrast, the Bone Dysplasia Ontology (BDO), which contains the same two keywords, received an SQ score of 2 and only 894 site visits.

6 Conclusions and Future Work

This research has introduced two metrics for assessing the authority and history of an ontology within a community and illustrated their effectiveness by applying them to approximately four hundred of the ontologies in the BioPortal library. Results from the case studies showed that application of the metrics was feasible and provided useful information regarding an ontology’s recognition within its community.

Future work will consider other factors such as the number of times an ontology has been viewed or downloaded; user comments/ranking of ontologies; and the usability of ontologies to gain a more comprehensive view of the social quality metric. In addition, the social quality metrics need to be incorporated into broad metric suites that assess various attributes of ontologies. When users select an ontology from a number of options, a broad overview is required that considers syntax, semantics, pragmatics, as well as social acceptance, to make an appropriate recommendation to a user. Furthermore, it is necessary for any recommendation system to consider the task for which an ontology will be needed. Merely matching keywords is not enough to select an appropriate ontology; the specific characteristics of the actual task to be completed must also be taken into account.