1 Introduction

Pharmacovigilance is the process of detecting, assessing, understanding and preventing adverse effects of medicines [1]. The Food and Drug Administration (FDA) has operated the Adverse Event Reporting System (AERS), which is the world’s largest database for the spontaneous reporting of adverse drug reactions (ADRs), since the 1960s [2]. In South Korea, a spontaneous reporting system for adverse drug reactions has been in use since 1985. A spontaneous reporting system, however, is limited in its ability to detect adverse drug reactions effectively [3]. Collecting data through such a system takes considerable time and cost. In addition, because the system depends mainly on voluntary reports from health care providers, the detection of adverse drug reactions may be delayed. Accordingly, drug safety monitoring based on electronic medical record (EMR) databases is becoming more important for the early detection of potentially harmful adverse drug reactions.

Several representative drug safety surveillance projects use EMR systems. In the United States, the Mini-Sentinel project was initiated in 2009 to perform active surveillance of the safety of marketed medical products, including drugs, biologics, and medical devices [4]. The Mini-Sentinel project, which received funding from the US FDA [5], is a collaborative project that includes 25 institutions [6]. In the fourth quarter of 2008, the Observational Medical Outcomes Partnership (OMOP) began work to improve the monitoring of drugs for safety and effectiveness. The OMOP was a public–private partnership among the FDA, academia, data owners, and the pharmaceutical industry [7]. The OMOP was funded and managed through the Foundation for the National Institutes of Health. In Europe, the European Commission funded the EU-ADR project, which began in February 2008 to develop an approach to the early detection of ADRs [8]. Four European countries, Italy, the Netherlands, the United Kingdom and Denmark, participated in the EU-ADR project [9]. Each of these drug safety surveillance projects adopted a common data model (CDM) to create harmonized input files from the distributed databases.

Several studies on drug safety surveillance have been conducted [10–12]; a variety of studies were required for the development of a CDM for drug safety surveillance based on multi-center EMR systems. Defining a common data model is a robust way to overcome the risk that differing structures and coding schemes limit the rapid evaluation of adverse drug reactions across multi-center EMR systems [13]. A common data model, which expresses the functional relationships within each database it represents, provides an abstraction for accessing data through a common interface.

While there are advanced common data models, such as the OMOP data model and the Mini-Sentinel data model, it is difficult to apply these models within South Korea because treatment practices and database structures differ. Additionally, most of the EMR systems in Korea were developed in-house, each with its own existing structure. Furthermore, hospitals have heterogeneous EMR database structures and different terminologies, except for diagnosis codes, which makes it difficult to integrate information. This heterogeneity also makes it difficult to analyze information consistently [14].

Accordingly, an approach utilizing a common data model is important in developing an ADR surveillance system. Once a CDM for Korean hospitals has been developed, each hospital can transform its heterogeneous EMR data into the CDM, so that the EMR data can be analyzed consistently and ADRs can be detected early. Based on these considerations, we developed the EADR CDM for the early detection of ADRs based on multi-center EMR systems. In addition, we validated the EADR CDM by exporting the actual EMR data of an academic hospital and comparing the EADR CDM with the EMR data tables.

2 Drug safety surveillance model

The US NIH’s OMOP and the US FDA’s Mini-Sentinel were started in 2008 and 2009, respectively, and both comply with the HIPAA Security Rule. The OMOP and the Mini-Sentinel program have explored the opportunities associated with implementing and using a CDM. In addition, both have shared detailed resources regarding their CDMs. Therefore, we analyzed the characteristics of the OMOP and the Mini-Sentinel data models to develop the EADR CDM.

2.1 Mini-Sentinel data model

The Mini-Sentinel program has 19 partnerships and was funded by the US FDA. The Mini-Sentinel project used a distributed database for organizing and standardizing databases. The standardized data at the individual level remained in the local database [15]. The Mini-Sentinel program used claims data, outpatient and inpatient EMR data, and registries. The distributed database included administrative and claims data from 2000 to 2011 for over 300 million person-years, 2.4 billion encounters, 38 million inpatient hospitalizations, and 2.9 billion dispensed drugs [4]. In the Mini-Sentinel project, data partners such as HealthCore, Inc., the HMO Research Network, Humana, the Kaiser Permanente Center for Effectiveness and Safety Research and Vanderbilt University developed the Mini-Sentinel distributed data system [5]. The Mini-Sentinel project provided a CDM for organizing and standardizing the databases.

The Mini-Sentinel CDM (MSCDM) contained many of the data elements necessary for medical product safety evaluations [5]. The MSCDM V1.0 consisted of eight tables. The MSCDM V2.0 added new tables, including laboratory, vitals, and summary tables, and revised the definitions of encounter table variables such as discharge disposition and discharge status. The MSCDM V2.0 consisted of fourteen tables.

The MSCDM V2.0 was modified in January 2012. The MSCDM V2.1 consists of ten tables that represent information on most of the key data elements required for the Mini-Sentinel program activities. We used the MSCDM V2.1 to develop the EADR CDM for multi-center EMR systems. Details of the MSCDM V2.1 are provided in Table 1.

Table 1 List of the Mini-Sentinel CDM V2.1 tables

2.2 OMOP data model

The OMOP was initiated in 2008 to conduct research activities concerning the governance, data access, technology, and methods for drug safety and benefit monitoring [7]. The goal of the OMOP was to refine the secondary use of multiple observational databases [16]. The OMOP had 39 members and was funded by the US NIH. The OMOP methods were developed for distributed analysis, and an automatic mapping procedure using a standard vocabulary was applied. This project had a dictionary of coding guidelines for health outcomes of interest (HOI) that could be applied to electronic data such as health insurer claims and EMR databases [17]. The OMOP used data from administrative claims and electronic health records (EHRs).

The OMOP CDM V3.0 consisted of eighteen tables that included person, drug exposure, drug era, condition occurrence, condition era, visit occurrence, procedure occurrence, observations, observation period, death, drug cost, procedure cost, location, provider, organization, care site, payer plan period, and cohort tables [18]. Details of the OMOP CDM V3.0 are provided in Table 2.

Table 2 List of the OMOP CDM V3.0 tables

3 Methods

This section describes the study methods in detail. First, we analyzed the data models from two prominent drug safety surveillance studies, the Mini-Sentinel project and the OMOP, to develop the EADR CDM. Second, based on the elements common to the Mini-Sentinel data model and the OMOP data model, we developed the EADR CDM. Although the Mini-Sentinel and OMOP data models are advanced models, there are limitations in adopting them directly in a domestic system because Korean hospitals have heterogeneous EMR databases. Third, we exported the EMR data of the S hospital and converted this information based on the EADR CDM to evaluate the possibility of applying the EADR CDM to Korean hospitals. In addition, we exported data to detect ADRs according to research topic using the database that was converted based on the EADR CDM. We used data for which the first prescription date fell between January 2008 and December 2012. We counted the number of prescriptions based on patient visits. The descriptive analysis was conducted using the data extracted from 135,745 patient records. Age was calculated as the prescription date minus the birth date. We focused on six drugs that are among the most commonly prescribed medications in clinical practice: ranitidine, clopidogrel, rosuvastatin, ciprofloxacin, fluorouracil, and celecoxib. Through this process, we validated the EADR CDM for adverse drug reaction signal detection. In addition, we evaluated whether the exported data were sufficient to detect ADRs to the six drugs.
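The age calculation described above can be sketched as follows. The text gives only the subtraction of the birth date from the prescription date, so the unit (completed years) and the ten-year grouping used later in the descriptive results are assumptions made for illustration, as are the function names.

```python
from datetime import date


def age_at_prescription(birth_date: date, prescription_date: date) -> int:
    """Completed years between birth date and prescription date (assumed unit)."""
    years = prescription_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the prescription year.
    if (prescription_date.month, prescription_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_decade(age: int) -> str:
    """Ten-year age group label, e.g. '50s' for ages 50 to 59 (assumed grouping)."""
    return f"{(age // 10) * 10}s"


# Hypothetical patient: born 15 June 1955, first prescription on 1 March 2010.
example_age = age_at_prescription(date(1955, 6, 15), date(2010, 3, 1))
print(example_age, age_decade(example_age))
```

Computing age at the prescription date rather than at the extraction date keeps the age groups comparable across the five-year study period.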

4 EADR common data model

4.1 The EADR CDM

The EADR CDM consisted of eight tables: demographics, visit type, procedure, death, diagnosis, drug, laboratory, and organization. Each table consisted of 3 to 16 variables. Figure 1 displays the entity-relationship diagram (ER-Diagram) for the EADR CDM.

Fig. 1 ER-Diagram for the EADR CDM

The demographics table included the patient ID, birth date, gender, race, and nationality, as shown in Table 3. The visit-type table consisted of eight columns: patient ID, visit type, visit start date, visit end date, admission date, discharge date, department and discharge disposition. The procedure table included patient ID, procedure date, procedure code, and procedure name. The death table consisted of patient ID, death date, death diagnosis code and death diagnosis name. The diagnosis table included eleven columns: patient ID, diagnosis date, diagnosis name, diagnosis code, diagnosis department, allergy code, allergy name, allergy count, medical history code, medical history name and medical history count.

Table 3 Demographics, visit type, procedure, death and diagnosis tables

As shown in Table 4, the drug table consisted of sixteen columns: patient ID, drug name, drug ingredients code, drug ingredients name, drug chemistry, ATC code, drug group, drug count, drug unit, drug route, dosage start date, dosage end date, dosage date, dosage purpose, dosage stop flag and dosage flag.

Table 4 Drug, laboratory and organization tables

The laboratory table had eleven columns: patient ID, order date, order name, lab type, lab data, result date, observation result, result unit, low range, high range and abnormal check. Finally, the organization table included three columns: organization ID, organization type code and organization source value.
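As an illustration of how two of these tables relate through the patient ID, the following is a minimal sketch using an in-memory SQLite database. The column names follow the table descriptions above; the SQL types, the snake_case spelling, the use of ISO date strings and the sample rows are assumptions and are not taken from the EADR CDM specification.

```python
import sqlite3

# In-memory database; column types and sample values are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demographics (
        patient_id  TEXT PRIMARY KEY,
        birth_date  TEXT,
        gender      TEXT,
        race        TEXT,
        nationality TEXT
    );
    CREATE TABLE drug (
        patient_id            TEXT REFERENCES demographics(patient_id),
        drug_name             TEXT,
        drug_ingredients_code TEXT,
        drug_ingredients_name TEXT,
        drug_chemistry        TEXT,
        atc_code              TEXT,
        drug_group            TEXT,
        drug_count            REAL,
        drug_unit             TEXT,
        drug_route            TEXT,
        dosage_start_date     TEXT,
        dosage_end_date       TEXT,
        dosage_date           TEXT,
        dosage_purpose        TEXT,
        dosage_stop_flag      INTEGER,
        dosage_flag           INTEGER
    );
    """
)

# Hypothetical patient and prescription; unspecified columns stay NULL.
conn.execute(
    "INSERT INTO demographics (patient_id, birth_date, gender) VALUES (?, ?, ?)",
    ("P0001", "1955-06-15", "F"),
)
conn.execute(
    "INSERT INTO drug (patient_id, drug_name, atc_code, dosage_start_date) "
    "VALUES (?, ?, ?, ?)",
    ("P0001", "ranitidine", "A02BA02", "2010-03-01"),
)

# Join prescriptions to patient demographics through the shared patient ID.
rows = conn.execute(
    """
    SELECT d.patient_id, d.gender, g.drug_name, g.atc_code
    FROM drug AS g
    JOIN demographics AS d ON d.patient_id = g.patient_id
    """
).fetchall()
print(rows)
```

Columns that a hospital cannot supply, such as race or the dosage flags, simply remain NULL in this sketch, which makes gaps in a site's data visible after conversion.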

4.2 Data extraction from the EMR system to validate the EADR CDM

From a total of 2,931,060 patients recorded between January 2008 and December 2012 in the clinical data warehouse of the S hospital, we extracted the data of the 135,745 patients who were prescribed any of the six drugs listed below, in order to determine whether the exported data were sufficient to detect ADRs to these drugs. The total number of prescriptions for the period was 51,947,144, of which 1,449,376 were prescriptions of the six drugs. We focused on six drugs that are among the most commonly prescribed medications in clinical practice: ranitidine, clopidogrel, rosuvastatin, ciprofloxacin, fluorouracil, and celecoxib. Through this process, we validated the EADR CDM for adverse drug reaction signal detection.

Several issues had to be resolved when converting the data based on the EADR CDM.

In the demographics table, three columns, patient ID, birth date and gender, were in the EMR system of the S hospital. All the columns of the visit type table were in the EMR system of the S hospital. Regarding the visit type, there were different codes for admission, outpatient clinic, emergency medicine and others. The discharge disposition column contained only data related to death in the hospital; if a patient died after being discharged from the hospital, it was difficult to trace the effects of drugs. All the columns in the procedure table were in the EMR system of the S hospital. All the columns of the death table, which were patient ID, death date, death diagnosis code and death diagnosis name, were also in the EMR system of the S hospital. The reference code for the death and diagnosis codes was ICD-10. Among the eleven columns of the diagnosis table, five columns, patient ID, diagnosis date, diagnosis name, diagnosis code, and diagnosis department, were in the EMR system of the S hospital. Thirteen columns of the drug table, patient ID, drug name, drug ingredients code, drug ingredients name, drug chemistry, ATC code, drug group, drug count, drug unit, drug route, dosage start date, dosage end date, and dosage date, were in the EMR system of the S hospital. The dosage purpose, dosage stop flag, and dosage flag were not included in the EMR system of the S hospital. Although there were standard terms for the drugs, each hospital used different codes for the drug name. In the case of the drug group, there was no standard classification. Finally, none of the columns in the organization table were in the EMR system of the S hospital.
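The comparison above can be summarized as a per-table coverage count. The sketch below encodes the column lists and the missing columns as reported in the text; the dictionary layout and the function name are illustrative assumptions. The laboratory table is omitted because its availability at the S hospital is not reported column by column.

```python
# EADR CDM columns per table, as listed in the text.
EADR_CDM = {
    "demographics": ["patient ID", "birth date", "gender", "race", "nationality"],
    "visit type": ["patient ID", "visit type", "visit start date", "visit end date",
                   "admission date", "discharge date", "department",
                   "discharge disposition"],
    "procedure": ["patient ID", "procedure date", "procedure code", "procedure name"],
    "death": ["patient ID", "death date", "death diagnosis code",
              "death diagnosis name"],
    "diagnosis": ["patient ID", "diagnosis date", "diagnosis name", "diagnosis code",
                  "diagnosis department", "allergy code", "allergy name",
                  "allergy count", "medical history code", "medical history name",
                  "medical history count"],
    "drug": ["patient ID", "drug name", "drug ingredients code",
             "drug ingredients name", "drug chemistry", "ATC code", "drug group",
             "drug count", "drug unit", "drug route", "dosage start date",
             "dosage end date", "dosage date", "dosage purpose",
             "dosage stop flag", "dosage flag"],
    "organization": ["organization ID", "organization type code",
                     "organization source value"],
}

# Columns reported as absent from the EMR system of the S hospital.
MISSING_AT_S_HOSPITAL = {
    "demographics": {"race", "nationality"},
    "diagnosis": {"allergy code", "allergy name", "allergy count",
                  "medical history code", "medical history name",
                  "medical history count"},
    "drug": {"dosage purpose", "dosage stop flag", "dosage flag"},
    "organization": set(EADR_CDM["organization"]),
}


def coverage(table: str) -> tuple:
    """Return (available columns, total columns) for one CDM table."""
    columns = EADR_CDM[table]
    missing = MISSING_AT_S_HOSPITAL.get(table, set())
    return len([c for c in columns if c not in missing]), len(columns)


for table in EADR_CDM:
    available, total = coverage(table)
    print(f"{table}: {available}/{total}")
```

Running the same check against each participating hospital would show in advance which CDM columns need new data capture before multi-center analysis is feasible.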

We then examined the characteristics of the EMR data actually exported from the academic hospital, using the data from a total of 135,745 patient records. The most prescribed drug was ranitidine, as shown in Fig. 2.

Fig. 2 Prescription status of six drugs according to age and gender

By age, ranitidine was prescribed most often to patients in their fifties (n = 20,461, 19 %). Clopidogrel was prescribed most often to those in their seventies (n = 1795, 34 %). Clopidogrel is prescribed increasingly for the elderly because it is an antithrombotic drug [19]. In our results, clopidogrel was also prescribed increasingly with age and especially to males, which is consistent with the general prescription pattern of clopidogrel. Rosuvastatin was prescribed most often to patients in their sixties (n = 3022, 30 %). Ciprofloxacin was prescribed most often to those in their fifties (n = 4035, 19 %). Fluorouracil was prescribed most often to patients in their sixties (n = 1034, 31 %). Celecoxib was prescribed most often to those in their seventies (n = 3980, 29 %).

By gender, ranitidine was prescribed more to females (n = 64,410, 60 %). Clopidogrel was prescribed more to males (n = 3202, 61 %). Rosuvastatin was prescribed with similar frequencies to females and males. Ciprofloxacin was prescribed more to females (n = 12,339, 57 %). Fluorouracil was prescribed more to males (n = 2077, 63 %). Finally, celecoxib was prescribed more to females (n = 9935, 73 %), as shown in Table 5.

Table 5 Prescription distribution of six drugs according to age and gender

5 Conclusions

We developed the EADR CDM for the early detection of ADRs, and it can be extended to multiple EMR systems. The EADR CDM included eight tables that contain demographic, drug, visit, procedure, diagnosis, death, laboratory and organization information. The EADR CDM was implemented using an authentic EMR system within the S hospital, which raised the following concerns and issues for further study.

Since race and nationality were not included in the EMR of the S hospital, the current EMR system cannot detect different effects or side effects according to race or nationality. If race and nationality are important factors, then this information should be captured at the point of care in the hospital for a straightforward analysis.

The lack of standard codes was found to be a major concern in integrating multiple EMR databases. For example, there is no national standard code for visit type. Because the S hospital used internal codes, we transformed the visit data to our standard codes: inpatient, outpatient, emergency visit and other visit type. The OMOP project proposed using SNOMED-CT for the procedure data. However, many Korean hospitals do not use SNOMED-CT codes for procedures. Therefore, standard procedure codes are essential when using the EADR CDM.
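The visit type transformation described above amounts to a lookup from hospital-internal codes to the four standard categories. The sketch below assumes hypothetical internal codes ("I", "O", "E"), since the actual codes of the S hospital are not given in the text; only the four target categories come from the source.

```python
# The four standard visit types named in the text.
STANDARD_VISIT_TYPES = {"inpatient", "outpatient", "emergency visit", "other visit type"}

# Hypothetical internal codes of one hospital; each site would supply its own table.
LOCAL_TO_STANDARD = {
    "I": "inpatient",
    "O": "outpatient",
    "E": "emergency visit",
}


def standardize_visit_type(local_code: str) -> str:
    """Map a hospital-internal visit code to a standard visit type.

    Codes without a known mapping fall back to 'other visit type'.
    """
    return LOCAL_TO_STANDARD.get(local_code.strip().upper(), "other visit type")


for code in ["I", " e ", "X9"]:
    print(repr(code), "->", standardize_visit_type(code))
```

Keeping the mapping in a per-hospital table rather than in the conversion logic lets each site maintain its own codes without changing shared code.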

Regarding the drug table, many issues arose when the data related to the drug code were extracted because hospitals had different codes for the drug name, drug ingredients code, drug ingredients name, and drug chemistry. Some drugs did not map onto an ATC code. The drug code should be standardized and should map onto the ATC code.
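One way to find drugs that do not map onto an ATC code is to check each mapped value against the shape of a fifth-level ATC code (one letter, two digits, two letters, two digits). The sketch below is an illustration only: the local drug codes are hypothetical, and the check validates the format of a code, not whether the code exists in the official ATC index.

```python
import re

# Shape of a fifth-level ATC code, e.g. A02BA02 (ranitidine).
ATC_LEVEL5 = re.compile(r"[A-Z]\d{2}[A-Z]{2}\d{2}")

# Hypothetical local drug codes mapped to ATC codes.
local_drug_to_atc = {
    "DRG0001": "A02BA02",  # ranitidine
    "DRG0002": "B01AC04",  # clopidogrel
    "DRG0003": "",         # no ATC mapping recorded
    "DRG0004": "C10AA7",   # malformed: one digit missing
}


def unmapped_drug_codes(mapping: dict) -> list:
    """Return local codes whose ATC value is missing or not a level-5 ATC shape."""
    return sorted(
        code for code, atc in mapping.items()
        if not ATC_LEVEL5.fullmatch(atc or "")
    )


print(unmapped_drug_codes(local_drug_to_atc))
```

A list like this could be handed to pharmacists at each site for manual mapping before the data are pooled.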

In the case of the laboratory table, because each hospital used different codes and units for order name, laboratory type, result, result unit and abnormal check, standardization would be required. To detect ADR signals in multi-center EMR systems, standardized guidelines regarding laboratory codes must be developed for a multi-institutional pharmacovigilance database system. In addition, when the data related to the observation result were extracted, many columns contained character-type data. Accordingly, data cleaning was absolutely necessary, and an effective data cleaning tool should be developed for this purpose. Free text is another major concern. The medical history was written in a free text format and should therefore be re-coded for data analysis.
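A first step in cleaning the observation results could look like the following sketch. It converts plainly numeric strings to numbers, leaves everything else for manual review, and derives an abnormal check from the low and high range columns. The function names, the "L"/"N"/"H" labels and the sample values are assumptions; the text does not describe the cleaning rules actually used.

```python
import re
from typing import Optional

# Accept only plain decimal numbers such as "13.5" or "-2".
PLAIN_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_result(raw: str) -> Optional[float]:
    """Convert a raw observation result to a number.

    Returns None for character-type values (e.g. 'negative', '<0.5'),
    which are left for manual review rather than guessed.
    """
    text = raw.strip()
    return float(text) if PLAIN_NUMBER.fullmatch(text) else None


def abnormal_check(value: Optional[float], low: float, high: float) -> Optional[str]:
    """Return 'L', 'N' or 'H' relative to the reference range (assumed labels)."""
    if value is None:
        return None
    if value < low:
        return "L"
    if value > high:
        return "H"
    return "N"


# Hypothetical raw results with a reference range of 12.0 to 16.0.
for raw in [" 13.5 ", "18", "negative", "<0.5"]:
    value = parse_result(raw)
    print(repr(raw), value, abnormal_check(value, 12.0, 16.0))
```

Returning None instead of guessing keeps ambiguous entries such as "<0.5" out of the numeric analysis until a site-specific rule is agreed.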

The quality of the data in the EMR system is not satisfactory from a research perspective. When we extracted the data from the EMR database, we found many missing values. Most of the medical history was missing or incomplete. To resolve these problems, guidelines for medical record keeping should be prepared in cooperation with a pharmacovigilance team.

Although the reasons for stopping a medication or changing the dose are important information, that information was not collected from the EMR system of the hospital. Accordingly, systems for the dosage stop flag and dosage flag must be developed to indicate the occurrence of adverse drug events.

In our results, none of the columns of the organization table were in the EMR system of the hospital. In addition, there were some problems when we extracted the data concerning hospitalization history because hospitals used different names for medical departments, and a single code sometimes mapped onto several medical departments. Therefore, a standard organization type code is required to use the EADR CDM based on multi-center EMR systems.

Controlled terminology should be applied to integrate different EMR systems. Further efforts to standardize procedure and laboratory codes will be required to develop a multi-institutional pharmacovigilance database system.

We extracted the data from 135,745 patient records to evaluate whether the exported data were sufficient to detect ADRs to the six drugs. After exporting the data, we found that diverse data from multi-center EMR systems are required, although for certain drugs such as clopidogrel a single system holds enough relevant data to detect ADRs. More research into the characteristics of ADRs and the causes of side effects in elderly patients is required [20]. Drug safety monitoring based on an EMR database provides information to detect the characteristics of ADRs and the causes of side effects in elderly patients. In addition, there is little information on drug safety for pediatric patients and inadequate scientific evidence to prove safety. An ADR surveillance system based on multi-center EMR databases using the EADR CDM will aid in finding the scientific evidence needed. Accordingly, the collection of diverse data from multi-center EMR systems using the EADR CDM is an effective way to detect ADRs. Drug safety monitoring based on EMR databases is increasingly important for the early detection of potentially harmful adverse drug reactions.

This study had some limitations. We compared the EADR CDM with the EMR data of the S hospital only, i.e., we extracted the data from a single EMR system to validate the EADR CDM. However, the EADR common data model is intended for adverse drug signal detection based on multi-center EMR systems. Therefore, future studies should focus on multi-center EMR databases, as in Overhage et al. [18], in which ten different datasets were converted into the OMOP CDM to validate the OMOP CDM. Despite these limitations, we provide guidelines for developing a CDM for the early detection of ADRs, and we plan to develop a drug safety surveillance system based on multi-center EMR systems.