1 Introduction

In recent years, with informatics being omnipresent in medicine, an increasing number of healthcare term representations have been created. Such representations are used to systematically denote, categorize, and relate healthcare data, allowing easier handling of the data in healthcare information systems (healthcare IS). The coexistence of multiple representations introduced a major problem in healthcare IS development: the problem of healthcare IS integration. In order for a healthcare IS to fulfill its purpose of assisting medical personnel in their activities, it must be able to exchange data with other healthcare ISs. For example, a patient's medical history is transferred when the patient changes their place of residence, anonymized data is provided to research facilities, and so on.

Many standards have been created to allow interchange of data and integration of such healthcare ISs, usually focusing on low-level protocols and predefined message formats. In the era of the Internet, high connectivity and openness have introduced an opportunity for a different kind of integration approach. Such an approach may utilize semantic-based technologies to represent and communicate knowledge between healthcare ISs. Ontologies are often used as a way of representing such knowledge. The two main reasons for using them are their ability to capture healthcare knowledge in a formal way and the ease of applying the reasoning processes performed by a medical decision support system. Resource Description Framework (RDF)Footnote 1 in conjunction with Web Ontology Language (OWL)Footnote 2 can be considered a de facto standard when it comes to semantic web and linked data technologies, and represents a foundation for defining healthcare ontologies. Despite the popularity of the OWL/RDF format, systems are often centered around traditional eXtensible Markup Language (XML) technology and relational databases, partly because of good validation tools and support from major manufacturers. The majority of traditionally used data representation languages offer some sort of input validation, such as parsing grammars and meta-models for domain-specific languages, XML Schema (XSD) for XML, and Data Definition Language for Structured Query Language (SQL). These properties of traditional systems can be preserved while using the OWL/RDF technology as an additional layer of integration [13], in order to obtain a semantically rich representation of the underlying knowledge.

OWL/RDF-based integration approaches are not a new idea, as evidenced both by many research papers published in previous years and by projects and movements such as the Yosemite ManifestoFootnote 3, SemanticHealthNetFootnote 4, and Clinical Information Modeling InitiativeFootnote 5. The goal of this paper is to provide an overview and a critical review of existing healthcare ontologies and approaches to healthcare IS integration, with a focus on OWL/RDF-based solutions. With this review we want to show that although a lot of work has been done in this area, no universal or omnipresent solution has surfaced to allow automatic or at least semi-automatic integration of healthcare ISs. Solutions are usually confined to a specific part of the healthcare domain and are used for specific use cases. As there is a large number of established and emerging ontologies covering this subject, our review does not provide an exhaustive collection of all the references in the area, but presents the most notable standards, ontologies, taxonomies, and integration approaches.

Apart from the Introduction and Conclusion, the paper is organized in three sections. In Sect. 2, we give an overview of currently used standards in the healthcare domain. We also provide information about existing OWL/RDF ontologies that describe these standards. Existing approaches to the integration of healthcare ontologies are described in Sect. 3. In Sect. 4, we discuss related work.

2 An Overview of EHR Standards and Healthcare Ontologies

Currently, there is a plethora of electronic health record (EHR) standards covering many aspects of healthcare including the management of clinical records. These standards cover exchange of messages and patient data between healthcare institutions, integration of medical devices, interfaces with clinical decision support systems, etc. Examples of standards that try to prescribe common building blocks of EHRs are: Health Level Seven (HL7)Footnote 6, in particular the Clinical Document Architecture (CDA) part of the standard; CEN/ISO EN13606Footnote 7; openEHRFootnote 8; and The Clinical Element Model (CEM) [7]. These share some common elements and visions of how an EHR should be structured and how a system should be implemented [32]. However, as these standards are managed by different companies from different countries, the specifications have diverged significantly. This makes the integration and exchange of data between systems implementing two different standards a major issue to be addressed.

Before starting with the integration process, one must gain a deeper knowledge of the underlying EHR elements from the aforementioned standards. Additionally, as we plan an approach based on OWL/RDF, it is necessary to find all ontologies implemented in these technologies for each of the standards. Furthermore, it is important to find health ontologies not implemented according to these standards, so as to have a wider picture of the current state of the art in this area.

In this section, we present standards that are directly related to the information architecture for communicating patient EHRs. Where applicable, we give an overview of OWL/RDF-based ontologies related to each of the standards.

2.1 Health Level Seven (HL7) Suite

Health Level Seven International (HL7) is a non-profit organization whose goal is to develop healthcare standards in order to increase the interoperability of healthcare information systems. It created a set of standards for the exchange, integration, sharing, and retrieval of electronic health information. These standards aim to support clinical practice and the management, delivery, and evaluation of health services. The current version of the standard suite is HL7 Version 3, which is centered around the HL7 Reference Information Model (RIM)Footnote 9. RIM is an object model representing HL7 clinical data (domains) and identifies the life cycle that a message or groups of related messages will carry. In addition to the RIM, other HL7 standards specify elements whose data is of different types. Therefore, an additional standard named HL7 V3 Data Types (DT) was created and is extensively used throughout the standard suite.

The part of HL7 that aims to standardize EHR structure and semantics is the Clinical Document Architecture (CDA) document markup standard. The machine-derivable meaning of CDA components is defined using the HL7 RIM, HL7 DT, and by referencing a shared medical terminology such as The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT)Footnote 10. The shared terminology serves two purposes: it represents the semantics of clinical documents, and it makes a CDA document interoperable with other standards that use the same terminology. The CDA uses eXtensible Markup Language (XML) to specify clinical information, with references to SNOMED CT terms.

Because of the benefits of OWL/RDF with regard to universal knowledge representation and automatic inference of rules, many papers deal with the representation of HL7 RIM and HL7 DT in this technology [11, 14, 28]. Of all the papers, [14] offers the most complete and most recent implementation of HL7 RIM. In [27], an ontology for HL7 v3 was developed. However, due to some reported shortcomings of HL7 from the ontological viewpoint [35, 38, 39] and new explanations provided by the authors of HL7 RIM in [31], we feel that the HL7 RIM OWL implementation must be further developed and refined from the ontological point of view. These RIM-based ontologies can be considered meta-ontologies that are used as building blocks of all other ontologies. In this way, an ontology of a specific part of the EHR can be modeled using the classes specified in the RIM ontology. One such example is the definition of a dental ontology in [8]. Developing the CDA ontology is an important step in providing means to integrate HL7-based systems with systems implementing other standards. The CDA can be used in ontology-based integration solutions in two ways: CDA-based XML documents may be transformed into RDF triples (the iSMART approach [19]), or a CDA ontology may be developed with OWL/RDF tools [12].
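As a rough illustration of the first option, the sketch below walks a small CDA-like XML fragment and emits subject-predicate-object triples. It is not the iSMART algorithm from [19]: the XML fragment is not a valid CDA document, and the `http://example.org/cda#` namespace, the node naming scheme, and the helper `xml_to_triples` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified CDA-like fragment (not a valid CDA document).
CDA_SNIPPET = """
<ClinicalDocument>
  <recordTarget>
    <patient id="p1"/>
  </recordTarget>
  <observation code="placeholder-code" codeSystem="SNOMED CT"/>
</ClinicalDocument>
"""

BASE = "http://example.org/cda#"  # assumed namespace, for illustration only


def xml_to_triples(element, subject, counter=None):
    """Recursively turn an XML element into (subject, predicate, object) triples.

    Attributes become literal-valued triples; child elements become new
    nodes linked from their parent by a predicate named after the tag.
    """
    if counter is None:
        counter = [0]
    triples = []
    for name, value in element.attrib.items():
        triples.append((subject, BASE + name, value))
    for child in element:
        counter[0] += 1
        child_node = f"{BASE}node{counter[0]}"
        triples.append((subject, BASE + child.tag, child_node))
        triples.extend(xml_to_triples(child, child_node, counter))
    return triples


root = ET.fromstring(CDA_SNIPPET)
triples = xml_to_triples(root, BASE + "doc")
for triple in triples:
    print(triple)
```

A generic walk like this keeps the document structure but carries no clinical semantics of its own; attaching the nodes to RIM or CDA ontology classes is a separate step that this sketch does not attempt.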

The HL7 set of standards is also being extended with the Fast Healthcare Interoperability Resources (FHIR)Footnote 11. FHIR is developed using currently popular web technologies, and implementing FHIR using OWL is therefore one of the main goals of HL7 International. This will have a positive influence on ontology-based integration approaches, as HL7's OWL/RDF implementations of FHIR concepts are added to third-party implementations such as [2, 21].

2.2 openEHR and CEN/ISO EN13606 Standards

openEHR is an open-standard set of specifications in healthcare informatics that aims to standardize the management, storage, retrieval, and exchange of health data in EHRs. It follows the dual-model approach [1], which differentiates between two levels: (i) an information level, represented by a reference model specifying statements that apply to all entities of the same class, and (ii) a knowledge level, represented through archetypes, which are statements about specific entities. The openEHR standard provides both functional and semantic interoperability, allowing records to be read and processed by both humans and machines.
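The two levels can be pictured with a minimal sketch in which a generic reference-model class can hold any observation and an archetype adds the constraints of one clinical concept on top of it. The class names `Element` and `Archetype`, the field names, and the numeric range are hypothetical and far simpler than the actual openEHR specifications.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Information level: a generic name/value/unit holder valid for any entry."""
    name: str
    value: float
    unit: str


@dataclass
class Archetype:
    """Knowledge level: constraints for one specific clinical concept."""
    concept: str
    unit: str
    minimum: float
    maximum: float

    def validate(self, element: Element) -> bool:
        """Check that a generic element satisfies this archetype's constraints."""
        return (
            element.name == self.concept
            and element.unit == self.unit
            and self.minimum <= element.value <= self.maximum
        )


# Illustrative archetype; the range is an assumption, not clinical guidance.
systolic = Archetype(concept="systolic_bp", unit="mmHg", minimum=0, maximum=1000)

ok = systolic.validate(Element("systolic_bp", 120, "mmHg"))
wrong_unit = systolic.validate(Element("systolic_bp", 120, "kPa"))
print(ok, wrong_unit)
```

The point of the separation is that new clinical concepts are added by writing new archetypes, while the reference-model class and any software built on it stay unchanged.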

The Electronic Health Record Communication (EN13606) is a European norm from the European Committee for Standardization (CEN) that specifies the rules for exchanging patient records between EHR systems. Although a stand-alone standard, it can be viewed as a subset of openEHR. Both EN13606 and openEHR follow the dual-model approach; however, slightly different archetypes were defined. Transformations between archetypes were developed in [22], allowing future ontology-based integration approaches to consider the two standards in the following ways: (i) separately, where an ontology for each standard should be developed, or reused if one already exists, and (ii) together, using one ontology and transforming the EHRs described in one standard to the other using the predefined archetype transformations.

An openEHR OWL ontology covering the reference model, data types, and data structures of openEHR was developed by RomanFootnote 12. In addition, the authors of [24] have developed an OWL ontology of the archetype library by following the guidelines from the official openEHR specification. Together, these two ontologies can be viewed as a single, complete ontology that can be used in integration approaches. Ontologies for both EN13606 and openEHR were developed in [16] as part of an attempt to provide semantic interoperability between the standards. However, a more complete ontology for EN13606 was developed by the authors of [30] while designing an architecture in which different EHR systems interoperate to offer an integrated service, using interoperability patterns based on EN13606 and semantic technologies such as OWL. A partial OWL ontology for the definition and validation of archetypes was also developed in [25].

2.3 The Clinical Element Model (CEM)

The goal behind the development of the Clinical Element Model (CEM) [7] was to provide a single reference architecture for representing information in EHRs. CEM comprises two models: the Abstract Instance Model for representing individual instances of collected data, and the Abstract Constraint Model for representing constraints on the data instances. These two models are abstract specifications and can be implemented using different programming languages. The main purpose of such an abstract specification is to provide a way to normalize different data from EHRs.

Originally, CEM was implemented using the Clinical Element Modeling Language (CEML) [7], which has an XML-like syntax. An additional implementation uses the Constraint Definition Language (CDL) [15], which extends CEML to allow the specification of new constraints. In order for CEM to be integrated using OWL/RDF technologies, the CEM-OWL ontology was developed by the authors of [37]. They have also developed an automatic transformation from a CEM-XML specification to the corresponding CEM-OWL specification. This will allow future researchers to focus more on the process of integration than on data acquisition.

2.4 Other Healthcare Ontologies and Vocabularies

In addition to the previously described ontologies, which are based on widely used standards, several other healthcare ontologies have been developed. These ontologies usually focus on specific parts of the EHR, but in the context of globally transferable healthcare data it could be beneficial to integrate systems that use them.

Open Biomedical Ontologies (OBO) [34] is a set of ontologies developed and maintained by the scientific community with the goal of allowing easier representation and integration of biomedical data. To clarify the terminology, the biomedical domain is broader than the healthcare domain, as it also comprises knowledge that is not specific to patient medical care and EHRs. Disease Ontology (DO) [33] is a part of the OBO repository and can be used to describe patient disease history in EHRs. A benefit of using this ontology is that it heavily references SNOMED and other medical thesauri. Another ontology from the OBO repository is the Gene Ontology (GO) [10], which provides structured, controlled vocabularies and classifications used in the annotation of genes, gene products, and sequences.

The Foundational Model of Anatomy Ontology (FMA)Footnote 13 is the representation of classes or types and relationships necessary for the symbolic representation of the phenotypic structure of the human body. FMA is a domain ontology that represents a coherent body of explicit declarative knowledge about human anatomy, but it can also be applied and extended to all other species.

Due to the lack of a common vocabulary, healthcare ontologies often reference terms from various existing vocabularies. Vocabulary standards are used to describe clinical problems, terms, categories, procedures, medications, and allergies. Various medical vocabulary standards exist, and in order to achieve usable interoperability between healthcare standards, these vocabularies must also be taken into consideration. We have already described SNOMED CT in Subsect. 2.1. Medical Subject Headings (MeSH) [18] consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity. It is not an ontology per se, but it is referenced from a vast majority of ontologies to provide classification and categorization of biomedical terms. Therefore, it should be considered in all integration approaches, as it provides structure and hierarchical information about the medical categories. MeSH is often used in conjunction with RxNorm [20], a pharmaceutical vocabulary used for e-prescribing, medication history, government reporting, and drug compendium mapping, and Logical Observation Identifiers Names and Codes (LOINC) [23], a database and universal standard for identifying medical laboratory observations. Another classification commonly referenced from ontologies is the International Statistical Classification of Diseases and Related Health Problems (ICD) [40]. It is a medical classification provided by the World Health Organization (WHO) and contains codes for diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or diseases. More information about these vocabularies can be found in [29]. Another thesaurus, not covered by [29], is provided by the National Cancer Institute (NCIt)Footnote 14. NCIt is a widely recognized standard for biomedical coding and reference, used by a broad variety of public and private partners both in the U.S. and internationally.

The Unified Medical Language System (UMLS) [4] aims to alleviate the problem that exists when using multiple vocabularies in a healthcare informatics system. UMLS comprises several controlled vocabularies in the biomedical sciences, including SNOMED CT, ICD, RxNorm, etc. It provides a mapping structure among these vocabularies and may also be viewed as a comprehensive thesaurus and ontology of biomedical concepts. Therefore, it is often referred to as the UMLS meta-thesaurus. Although it provides a mapping structure, it does not by itself yield a semantically integrated, interoperable terminology. However, it provides enough information about term relations to be used in an integration process. Additionally, UMLS provides facilities for natural language processing.
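The mapping role of such a meta-thesaurus can be sketched as a lookup table keyed by a shared concept identifier. This is only a toy model of the idea: the identifiers and codes below are invented placeholders, not real UMLS, SNOMED CT, ICD, or RxNorm entries, and `build_index` and `translate` are hypothetical helpers.

```python
# Each shared concept identifier maps to the codes used by individual vocabularies.
# All identifiers and codes are invented placeholders.
CONCEPTS = {
    "CUI-0001": {"SNOMED CT": "S-111", "ICD": "I-AAA"},
    "CUI-0002": {"SNOMED CT": "S-222", "ICD": "I-BBB", "RxNorm": "R-999"},
}


def build_index(concepts):
    """Invert the table so that (vocabulary, code) points to the shared concept."""
    index = {}
    for cui, codes in concepts.items():
        for vocabulary, code in codes.items():
            index[(vocabulary, code)] = cui
    return index


def translate(code, source, target, concepts=CONCEPTS):
    """Return the target-vocabulary code for a source code, or None if unmapped."""
    cui = build_index(concepts).get((source, code))
    if cui is None:
        return None
    return concepts[cui].get(target)


print(translate("S-111", "SNOMED CT", "ICD"))
print(translate("S-111", "SNOMED CT", "RxNorm"))
```

The second call returns `None` because the placeholder concept has no RxNorm code, which mirrors the caveat above: a mapping structure relates terms but does not guarantee that every term has a counterpart in every vocabulary.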

3 Approaches to the Integration of Health Ontologies

Ontology alignment in healthcare has been a subject of research for many years, with a wide variety of approaches. The main differences between them are the degree of automation in the process of integration and the number of ontologies they cover. Ideally, we would prefer systems that offer a high degree of automation, can cover multiple ontologies, and even offer support for multi-domain ontology alignment. Below we give an overview of some of the different approaches and their key characteristics.

The authors of [3] have proposed a system called Artemis Message Exchange Framework (AMEF) for mediating messages between HL7 v2 and HL7 v3, which they claim might be generalized to any two different healthcare standards. The ontology alignment itself is solved manually, while the main focus of the paper is on the format conversion (HL7 v2's EDI \(\rightarrow \) XML \(\rightarrow \) OWL and OWL \(\rightarrow \) XML \(\rightarrow \) HL7 v3 message).
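The first conversion step, from the pipe-delimited HL7 v2 encoding to XML, can be sketched as follows. This is not the AMEF implementation from [3]: the sample segment is invented, the element naming (`PID.1`, `PID.3.1`, ...) is a simplification, and only the field (`|`) and component (`^`) separators are handled.

```python
import xml.etree.ElementTree as ET

# Invented sample segment: segment name, then fields separated by "|".
SEGMENT = "PID|1||12345^^^HOSP||DOE^JOHN"


def segment_to_xml(segment):
    """Convert one HL7 v2 segment into a nested XML element.

    Fields become <SEG.n> children and non-empty components become
    <SEG.n.m> grandchildren; empty fields are skipped.
    """
    fields = segment.split("|")
    name = fields[0]
    root = ET.Element(name)
    for field_no, field in enumerate(fields[1:], start=1):
        if not field:
            continue
        field_el = ET.SubElement(root, f"{name}.{field_no}")
        components = field.split("^")
        if len(components) == 1:
            field_el.text = field
            continue
        for comp_no, component in enumerate(components, start=1):
            if component:
                comp_el = ET.SubElement(field_el, f"{name}.{field_no}.{comp_no}")
                comp_el.text = component
    return root


xml_text = ET.tostring(segment_to_xml(SEGMENT), encoding="unicode")
print(xml_text)
```

Once the message is in XML, the positional meaning of each field is explicit in the element names, which is what makes the subsequent XML to OWL step a structural mapping rather than a parsing problem.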

In [9], the authors attempted to integrate SNOMED CT and disease ontologies. Relying on the UMLS meta-thesaurus as a reference to disambiguate term meaning, they calculated the semantic similarity of concepts using Wu and Palmer's algorithm and Jiang and Conrath's semantic similarity measure. This paper exemplifies the use of a reference knowledge base (the UMLS meta-thesaurus), a pattern followed in other recent papers.
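For readers unfamiliar with the first of these measures, the Wu and Palmer similarity of two concepts is 2 · depth(lcs) / (depth(c1) + depth(c2)), where lcs is their lowest common subsumer in the is-a hierarchy. The sketch below computes it on a toy hierarchy; the concept names are illustrative only, depth is counted in nodes so that the root has depth 1 (an assumption, as conventions vary), and nothing here reproduces the setup of [9].

```python
# Toy is-a hierarchy (child -> parent); the concept names are illustrative only.
PARENT = {
    "disease": None,
    "infectious_disease": "disease",
    "viral_infection": "infectious_disease",
    "bacterial_infection": "infectious_disease",
    "heart_disease": "disease",
}


def path_to_root(concept):
    """Return the list of concepts from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for candidate in path_to_root(b):  # walks from b upwards, deepest first
        if candidate in ancestors_of_a:
            return candidate
    raise ValueError("concepts do not share a root")


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("viral_infection", "bacterial_infection"))
print(wu_palmer("viral_infection", "heart_disease"))
```

Sibling concepts under a deep common ancestor score higher than concepts that only meet at the root, which is the property an alignment system exploits when ranking candidate matches.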

In [17], the authors used HL7 and openEHR ontologies and created an ontology matching system by applying the well-known tools FalconFootnote 15 and Agreement MakerFootnote 16 to the healthcare domain. Although Agreement Maker is primarily used for the biomedical domain, both of these systems are general-purpose ontology matching tools, which gives some hope that part of this problem can be generalized to different domains.

There have also been attempts to merge healthcare ontologies with ontologies from different domains. This kind of use case is interesting even for healthcare institutions that rely on a single healthcare IS provider, as it allows integration with systems that are not directly related to healthcare. One such paper [36] proposes a method to merge three multi-disciplinary ontologies related to diseases, places, and environments, relying on expert knowledge and statistical data analysis.

There are also cases of purely automated ontology matching systems. In one such example [26], the authors present a machine-learning-based approach to ontology alignment using the AdaBoost ensemble technique. The system is trained on a similarity matrix computed by one of the similarity methods (string-based, linguistic, and structural).
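To make the idea concrete, the sketch below trains a small discrete AdaBoost ensemble of threshold stumps on similarity scores of candidate label pairs. It is not the system from [26]: the two features (a character-level ratio from Python's `difflib` and a token overlap score), the toy label pairs, and all function names are assumptions made for illustration, and linguistic or structural similarities are not modeled.

```python
import math
from difflib import SequenceMatcher


def features(label_a, label_b):
    """Two simple similarity scores for a candidate pair of concept labels."""
    a, b = label_a.lower(), label_b.lower()
    string_sim = SequenceMatcher(None, a, b).ratio()
    tokens_a, tokens_b = set(a.split()), set(b.split())
    token_sim = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
    return [string_sim, token_sim]


def stump_predict(x, feature, threshold):
    """Weak learner: predict a match (+1) when one score exceeds a threshold."""
    return 1 if x[feature] > threshold else -1


def train_adaboost(X, y, rounds=5):
    """Discrete AdaBoost over threshold stumps; returns (alpha, feature, threshold)."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        for feature in range(len(X[0])):
            for threshold in sorted({x[feature] for x in X}):
                error = sum(w for x, label, w in zip(X, y, weights)
                            if stump_predict(x, feature, threshold) != label)
                if best is None or error < best[0]:
                    best = (error, feature, threshold)
        error, feature, threshold = best
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, feature, threshold))
        # Increase the weight of misclassified pairs, decrease the rest.
        weights = [w * math.exp(-alpha * label * stump_predict(x, feature, threshold))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps: +1 means 'same concept', -1 means 'different'."""
    score = sum(alpha * stump_predict(x, f, t) for alpha, f, t in ensemble)
    return 1 if score > 0 else -1


# Toy training pairs (label_a, label_b, is_match); invented for illustration.
PAIRS = [
    ("Myocardial infarction", "myocardial infarction", 1),
    ("Heart attack", "Heart attack (disorder)", 1),
    ("Type 2 diabetes", "Diabetes type 2", 1),
    ("Asthma", "Fracture of femur", -1),
    ("Hypertension", "Appendicitis", -1),
]
X = [features(a, b) for a, b, _ in PAIRS]
y = [label for _, _, label in PAIRS]
model = train_adaboost(X, y)
predictions = [predict(model, x) for x in X]
print(predictions)
```

On such a tiny, cleanly separable set a single stump already suffices; the ensemble only becomes useful when no single similarity score separates matches from non-matches.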

Even though these papers cover a majority of the ontologies mentioned in the previous section, none of these approaches can overcome the problem of aligning ontologies implemented in different technologies. Each of these approaches works reasonably well on its target ontologies, but there is still a need for a universal solution. Therefore, we should combine the advantages of these approaches in a single solution and transform different technologies into one. Currently, there are movements such as the Yosemite initiative that aim to create, or at least propose a theoretical background for, a unified solution by using the OWL/RDF technology as a common ground for ontology alignment.

The Yosemite initiative suggests a two-step approach to healthcare ontology integration: (1) transforming any ontology format to OWL/RDF and (2) creating an integration algorithm for two OWL/RDF ontologies. Once the ontologies are transformed to an OWL/RDF representation, the second step requires an ontology alignment algorithm. Although this is the hardest and most work-intensive part, it is always easier to perform integration and create more general solutions for a single representation technology than to create transformations on a per-representation basis. The first step is therefore the prerequisite for having such a universal technology, and for that reason it is the subject of numerous discussions and criticism. The main issue concerns the choice of OWL/RDF as the universal representation technology for both ontologies and the alignments/mappings between ontology concepts. There are several reasons why OWL/RDF is a suitable technology [5]:

  • It is possible to map any other representation to RDF. RDF is made up of atomic statements (triples of subject, predicate, and object). As triples are atomic pieces of information, all other more complex information can be represented by a set of triples. Sometimes, this may lead to more verbose representations.

  • RDF captures information, not syntax. Therefore, many different syntaxes (XML, JSON, etc.) may be used to serialize triples, and the usual storage mechanisms may be used for RDF-based solutions.

  • RDF is self-describing, as it uses Uniform Resource Identifiers (URIs) as its main identifiers. This reduces ambiguity and allows term definitions to be created once and referenced from any other document, providing single points of knowledge.

  • OWL/RDF enables inference that derives new assertions from existing ones. This can lead to more automation of data translation processes.

In addition to these benefits, the fact that the most notable healthcare ontologies and standards have an official RDF representation, or are getting an RDF implementation for their next release (FHIR), supports the claim that RDF is a valid option. In the end, we feel that using OWL/RDF can lead to a great reduction in complexity and bring more order to the already very confusing world of healthcare ontologies.
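The listed properties can be illustrated without any RDF library. The sketch below stores statements as plain tuples identified by URIs, serializes them in an N-Triples-like text form, and derives new type assertions from a subclass hierarchy. The `rdf:type` and `rdfs:subClassOf` URIs are the standard ones; everything under `http://example.org/` and the class names are hypothetical, and a real system would use an RDF store and an OWL reasoner rather than this single hand-written rule.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # assumed namespace, for illustration only

triples = {
    (EX + "ViralPneumonia", SUBCLASS_OF, EX + "Pneumonia"),
    (EX + "Pneumonia", SUBCLASS_OF, EX + "LungDisease"),
    (EX + "diagnosis1", RDF_TYPE, EX + "ViralPneumonia"),
}


def to_ntriples(statements):
    """Serialize URI-only triples in an N-Triples-like syntax, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(statements))


def infer_types(statements):
    """Apply one rule until nothing changes:
    (x type C) and (C subClassOf D)  =>  (x type D).
    """
    inferred = set(statements)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        typed = [(s, o) for s, p, o in inferred if p == RDF_TYPE]
        for instance, cls in typed:
            for sub, sup in subclass:
                new = (instance, RDF_TYPE, sup)
                if cls == sub and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


closure = infer_types(triples)
print(to_ntriples(closure - triples))
```

The printed statements are the two assertions that were never stated explicitly: the diagnosis is also a `Pneumonia` and a `LungDisease`. A data translation step that only understands the broader class can therefore still pick up the record.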

4 Related Work

While there have been a number of survey papers on ontology alignment as well as on healthcare ontologies in general, the number of papers focusing strictly on state-of-the-art healthcare ontology alignment systems is far smaller.

The authors of [29] mention five different ontologies and taxonomies. They also divide ontology alignment into three major categories by purpose: (1) mapping a global ontology view to a local ontology view in order to describe a proprietary local ontology better, (2) semantic mapping between parts of the local and the target ontology, and (3) mapping multiple ontologies in order to provide new knowledge not contained within the separate instances.

Paper [6] provides a slightly more detailed overview. The authors mention a wide variety of mappings between pairs of concrete ontology types, many of which base their approach on the UMLS meta-thesaurus. They also show that it is possible to do ontology mapping by relying on First Order Logic (FOL).

Both papers deal with the issue of aligning ontologies without tackling the problem of ontology format. They do not discuss the benefits and drawbacks of an approach to ontology alignment in which all ontologies share a common implementation technology such as OWL/RDF.

5 Conclusion

Ontology alignment in the healthcare domain is far from solved. There is a large number of ontologies, vocabularies, and taxonomies, and even though there are attempts such as HL7 to standardize and cover most of the healthcare field, a single general ontology does not seem likely any time soon. Far more likely, new ontologies will continue to appear and some of the old ones will disappear for quite a while. Furthermore, even if there were a single healthcare ontology, alignment between multi-disciplinary ontologies would remain a problem. It is therefore extremely important to consider creating automatic ontology alignment algorithms: manual ontology matching is not only hard, time-consuming, and error-prone, but it also cannot offer a complete solution, simply because of the ever-increasing number of different ontologies.

To better understand the current state of the art of automated algorithms in ontology alignment, an initiative (OAEIFootnote 17) to benchmark them has been devised. Of particular interest to the healthcare field is the largebio challenge, featuring the alignment detection problem between three ontologies (FMA, NCIt, and SNOMED CT) using UMLS as a meta-thesaurus. The last competition (OAEI 2015) had 12 participating groups in this challenge category. The benchmark data and framework provided by OAEI can also be used after the competition, as was done in [26].
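Alignment benchmarks of this kind are commonly scored by comparing the correspondences a system proposes against a reference alignment, using precision, recall, and F-measure. A minimal sketch of that scoring, with invented concept identifiers and a hypothetical `evaluate` helper (it does not reproduce the OAEI evaluation framework):

```python
def evaluate(proposed, reference):
    """Return (precision, recall, f_measure) for two sets of correspondences.

    Each correspondence is a (source_concept, target_concept) pair.
    """
    correct = len(proposed & reference)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented identifiers standing in for concepts from two ontologies.
reference = {("A:heart", "B:heart"), ("A:lung", "B:lung"),
             ("A:liver", "B:liver"), ("A:kidney", "B:kidney")}
proposed = {("A:heart", "B:heart"), ("A:lung", "B:lung"), ("A:liver", "B:spleen")}

precision, recall, f_measure = evaluate(proposed, reference)
print(precision, recall, f_measure)
```

Here two of the three proposed correspondences are correct and two of the four reference correspondences are found, so precision is 2/3, recall is 1/2, and the F-measure is 4/7.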

The main healthcare ontologies today are defined using a number of different formats. Writing an automated tool that would work between any two ontologies is therefore a challenging task, due to the number of possible combinations. We think that in order to solve this problem a common ontology format should be used. Our stance is in accordance with the Yosemite initiative, which proposes OWL/RDF as a common ontology representation format. The Yosemite initiative also suggests a two-step approach to healthcare ontology integration: (1) transforming any ontology format to OWL/RDF and (2) creating an integration algorithm for two OWL/RDF ontologies. We think that following such an approach to integration will simplify the currently complicated field of ontology alignment.