Key Points

Detection of adverse reactions to therapies is an important component of regulatory approval and post-market evaluation of pharmacotherapies.

Adverse event surveillance frequently uses real-world data sources, including insurance claims and electronic health records data, and there is a need for computable phenotypes to identify acute adverse events.

Different structured data elements found in electronic health records data can inform and improve the performance of computable phenotypes for common adverse events.

1 Introduction

Detection and evaluation of adverse events (AEs) are critical components of the regulatory approval pathway for medical therapies and products, both during the prospective clinical trials stage and in post-market monitoring. The United States Food and Drug Administration (FDA) defines an AE as any undesirable experience associated with the use of a medical product in a patient [1]. Real-world data sources are increasingly used for post-market surveillance of AEs, including health insurance claims and electronic health record (EHR) data, with each of these sources presenting different strengths and weaknesses [2]. Insurance claims typically include large numbers of patients and capture all of a given individual’s healthcare interactions, regardless of setting. Importantly, insurance claims data primarily provide information related to the types of procedures and diagnoses for which an insurer was charged, but do not include contextual details that may provide insights into the relationships between different clinical events. In contrast, EHR data only provide information related to clinical care that occurs within a given health system and do not include data for healthcare encounters that occur outside of that system. However, EHR data provide a vast amount of clinical information about diagnoses, laboratory analyses, procedures, and medication administrations that occur during a given encounter, allowing incorporation of additional contextual information to infer associations between clinical events. Given the contrasting strengths and weaknesses of these data sources, it is important to have a clear understanding of the relative utility of data elements contained within each source when designing strategies to detect AEs.

Computable phenotypes are used to define a specific condition, clinical characteristic, or medical event from health data and can use a variety of approaches, ranging from the simple presence or absence of a diagnostic code to machine learning approaches that can analyze multiple forms of data to create trained classifiers [3]. Many computable phenotypes consist of Boolean algorithms that are built using one or more structured data elements such as International Classification of Diseases (ICD) codes, laboratory results, or medication orders, wherein the algorithm identifies a phenotype based on the presence, absence, or specific value of the different data elements. The appeal of such algorithms is that they are a relatively simple approach to identify cohorts with specific conditions or clinical features for inclusion in comparative outcomes research, disease registries, and studies of population health [4]. Many existing computable phenotypes have been developed for the identification of chronic conditions [5, 6]. Phenotypes for chronic diseases tend to have good specificity, but variable sensitivity [7]. It is rare for an individual who does not have a chronic condition to have data elements associated with that condition, though it is possible to misclassify individuals as being unaffected by a chronic disease if, for example, they do not have recent clinical encounters that indicate the presence of the condition. In contrast, acute events represent a particular challenge for computable phenotype development as they are temporally circumscribed and only applicable to a given patient within a specific time period. Thus, there is a need to develop computable phenotypes that are applicable to acute conditions that appear and resolve over different time horizons, and to evaluate associations between these conditions and other events within a clinical encounter.
Given the increasing availability and use of EHR data for these analyses, there is a particular need to assess the utility of different types of data elements to computable phenotypes for acute clinical events, and to evaluate their impact on the sensitivity and specificity of a given phenotype [8].
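As an illustration of the Boolean approach described above, the following is a minimal Python sketch and not the code used in this study (which was written in R). The record structure, the placeholder code `X00.0`, the laboratory name `example_lab`, the threshold, and the choice to treat values below the threshold as abnormal are all hypothetical assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EncounterRecord:
    """Hypothetical, simplified view of the structured data for one encounter."""
    icd_codes: set = field(default_factory=set)
    lab_results: dict = field(default_factory=dict)  # lab name -> numeric value


def boolean_phenotype(record, target_codes, lab_name, lab_threshold):
    """Evaluate three Boolean definitions for one encounter.

    Assumption: a laboratory value strictly below `lab_threshold` is abnormal.
    A missing laboratory result is treated as "not abnormal".
    """
    has_code = bool(record.icd_codes & set(target_codes))
    value = record.lab_results.get(lab_name)
    abnormal_lab = value is not None and value < lab_threshold
    return {
        "icd": has_code,                        # diagnostic code present
        "lab": abnormal_lab,                    # abnormal laboratory value present
        "compound": has_code and abnormal_lab,  # both elements required
    }


# Placeholder inputs; none of these values come from the study.
example = EncounterRecord(icd_codes={"X00.0"}, lab_results={"example_lab": 12.0})
flags = boolean_phenotype(example, {"X00.0"}, "example_lab", lab_threshold=30.0)
```

The compound flag is the logical AND of the other two, so it can only be as frequent as the rarer of its components.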

The objective of this study was to evaluate the utility of different types of data elements to the performance of computable phenotypes for the detection of four AEs after medication administration using EHR data. We used intravenous immunoglobulin (IVIG) as a model therapy, as administration occurs in a clinical setting and receipt can be verified within the EHR. A diverse set of AEs are associated with IVIG administration [9,10,11,12,13,14], including proximal or immediate infusion-related AEs such as headache, backache, chills, nausea, cardiac arrhythmia, muscle pain, and anaphylaxis, and distal AEs, which occur several days to a week or more after infusion, such as thrombotic events, hemolysis, and urticaria. We specifically evaluated the utility of data elements that are available in EHRs but not in health insurance claims data, including laboratory values and clinical events. Below, we assess the utility of these data elements to the performance of computable phenotypes for different AEs.

2 Methods

2.1 Data Source and Extraction

The study was conducted using retrospective EHR data from Duke University Health System (DUHS). DUHS consists of one tertiary care and two community-based hospitals, and a network of primary-care and specialty clinics that have utilized a single EHR system since 2014. We extracted clinical data through Duke’s Clinical Research Datamart, a PCORnet common data model-based EHR database, from 1 January 2014 to 31 December 2019 [4].

2.2 Identification of Intravenous Immunoglobulin (IVIG) Administrations

We used RxNorm Concept Unique Identifier (RxCUI) codes (see Online Supplemental Material (OSM), Resource 1, Supplemental Table 1) to identify patients who were administered an IVIG product, extracting time stamps for infusion start times and the location of the administration, classified as either inpatient (IP) or outpatient (OP). For patients who underwent more than one IVIG administration during the study period, we assessed the median number of days between the administrations (administration cadence). Indication for IVIG administration was determined based on the primary ICD code associated with the encounter during which IVIG was administered, and indications for administration were grouped as described in Supplemental Table 2 (see OSM Resource 1). We extracted age at time of administration, sex, race/ethnicity, and insurance status at the time of administration.
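The administration cadence could be computed along the following lines. This is a hedged Python sketch rather than the study's R code; it assumes that cadence is the median gap between consecutive administration dates for one patient, and the function name and example dates are hypothetical.

```python
from datetime import date
from statistics import median


def administration_cadence(admin_dates):
    """Median number of days between consecutive administrations.

    Returns None when fewer than two administrations are available,
    because no interval can be computed.
    """
    ordered = sorted(admin_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps) if gaps else None


# Illustrative dates only (not study data): three infusions, four weeks apart.
cadence = administration_cadence([date(2019, 1, 1), date(2019, 1, 29), date(2019, 2, 26)])
```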

2.3 Development of Computable Phenotypes

Computable phenotypes for anaphylaxis, bradycardia, tachycardia, thrombosis, and hemolysis are detailed in Supplemental Table 3 (see OSM Resource 1); the development of these computable phenotypes is described below. For each of the five AEs, we considered an ICD-based phenotype, a phenotype based on EHR-derived contextual information, such as laboratory values, medication administrations, or vital signs, and a compound phenotype that required an ICD code for the AE in combination with EHR-derived contextual information. For the three proximal AEs (anaphylaxis, tachycardia, and bradycardia), the ICD code for the AE was required to occur within the same encounter as the IVIG administration, and any EHR-derived contextual information used for the phenotype had to occur within 6 h of the IVIG administration. For the two distal AEs (thrombosis, hemolysis), the AE had to occur within 7 days of IVIG administration. R scripts were created for each computable phenotype and run against the dataset to create sub-cohorts that were evaluated as described below.
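The time windows described above can be expressed as a simple check. The sketch below is in Python for illustration (the study scripts were written in R) and assumes that the window is measured forward from the infusion start time and is inclusive at both ends; the source does not state these details, and the names used are hypothetical.

```python
from datetime import datetime, timedelta

# Window lengths taken from the text: 6 h for proximal AEs, 7 days for distal AEs.
WINDOWS = {"proximal": timedelta(hours=6), "distal": timedelta(days=7)}


def in_window(infusion_start, event_time, ae_class):
    """True if the event falls within the window after the infusion start.

    Assumption: events before the infusion start are excluded and the
    upper bound is inclusive.
    """
    elapsed = event_time - infusion_start
    return timedelta(0) <= elapsed <= WINDOWS[ae_class]


# Illustrative time stamps only.
start = datetime(2019, 3, 1, 9, 0)
vital_sign_ok = in_window(start, datetime(2019, 3, 1, 14, 0), "proximal")  # 5 h later
vital_sign_late = in_window(start, datetime(2019, 3, 1, 16, 0), "proximal")  # 7 h later
```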

2.4 Chart Review Validation of Computable Phenotypes

To assess the validity of different computable phenotypes, we conducted a chart review of a subset of patient charts. Chart review was conducted by a group of six medical students and subspecialty fellows. For each chart, reviewers were provided with the encounter date for the IVIG administration and the type of AE that was expected to have occurred based upon the computable phenotype. The reviewers were asked to select the indication for IVIG administration and to indicate whether they thought the AE had occurred based on their review of the information in the chart. If the reviewer determined that the AE had not occurred, they were asked to record their reasoning to help inform further refinement of the computable phenotypes. To ensure concordance among reviewers, all reviewers completed a review of a random sample of 20 charts. We determined areas of discordance and reviewed results to ensure that all reviewers understood the criteria for each aspect of the chart review. Additionally, results were reviewed at study team meetings to ensure that the team agreed on the findings for the chart review.

2.5 Statistical Analysis

We summarized characteristics of the study population with medians (interquartile range (IQR)) for continuous variables and counts (percentages) for categorical variables. The population-level rate of administrations per person-year was defined as the total number of IVIG administrations that occurred between each patient’s first and last IVIG administration within the study window, divided by the total person-time contributed by all patients between their first and last IVIG administration.
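The rate definition can be illustrated with a short Python sketch (the study analyses were performed in R). Two details are assumptions of this example rather than statements from the text: patients with a single administration are skipped because they contribute no person-time, and a year is taken as 365.25 days.

```python
from datetime import date


def administrations_per_person_year(dates_by_patient, days_per_year=365.25):
    """Total administrations divided by total person-years between first and last dose.

    Assumption: patients with fewer than two administrations are excluded,
    since their first-to-last interval is zero.
    """
    total_admins = 0
    total_days = 0
    for dates in dates_by_patient.values():
        if len(dates) < 2:
            continue
        total_admins += len(dates)
        total_days += (max(dates) - min(dates)).days
    if total_days == 0:
        raise ValueError("no person-time available to compute a rate")
    return total_admins / (total_days / days_per_year)


# Illustrative data only: one patient with two doses a year apart, one with a single dose.
rate = administrations_per_person_year({
    "patient_a": [date(2018, 1, 1), date(2019, 1, 1)],
    "patient_b": [date(2018, 6, 1)],
})
```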

The sample of patients selected for chart review was based on unique IVIG administration encounters drawn at random without replacement. The sampling was stratified by encounter setting (IP vs. OP) and phenotyping approach for each AE, except for anaphylaxis, for which all putative events were reviewed. We treated the chart review results as the true value and report positive predictive value (PPV) as a measure of phenotype accuracy. While we cannot calculate true measures of sensitivity because we did not review charts of phenotype-negative individuals, we generated estimates of phenotype sensitivity. Specifically, we assumed that the different computable phenotype approaches captured all true events in the data. The number of true events was estimated by the product of the PPV and number of computable phenotype-identified AEs. The estimated sensitivity of each computable phenotype for a given AE was calculated as the proportion of estimated true events that were captured by the phenotype. All analyses were performed in R 4.0.2 [15].
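The estimation procedure described above can be sketched as follows. This Python example is illustrative only (the study used R) and assumes that the phenotyping approaches for an AE define mutually exclusive groups of encounters, so that the estimated true events can be summed across approaches; the function name and all counts are hypothetical.

```python
def estimate_sensitivity(strata):
    """Estimate PPV and sensitivity for each phenotyping approach.

    `strata` maps an approach name to a tuple
    (n_identified, n_reviewed, n_confirmed).
    PPV = n_confirmed / n_reviewed; estimated true events = PPV * n_identified;
    estimated sensitivity = share of all estimated true events in that approach.
    """
    ppv = {}
    estimated_true = {}
    for name, (n_identified, n_reviewed, n_confirmed) in strata.items():
        if n_reviewed <= 0:
            raise ValueError(f"no reviewed charts for approach {name!r}")
        ppv[name] = n_confirmed / n_reviewed
        estimated_true[name] = ppv[name] * n_identified
    total_true = sum(estimated_true.values())
    if total_true == 0:
        raise ValueError("no estimated true events across approaches")
    return {
        name: {"ppv": ppv[name], "estimated_sensitivity": estimated_true[name] / total_true}
        for name in strata
    }


# Hypothetical counts, not study results.
results = estimate_sensitivity({
    "icd_only": (100, 20, 10),
    "ehr_only": (50, 10, 9),
    "compound": (10, 10, 10),
})
```

Because every true event is assumed to be captured by one of the approaches, the estimated sensitivities sum to 1 and should be read as relative shares, not as sensitivities measured against phenotype-negative charts.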

3 Results

3.1 Study population

We identified 3,897 individuals who had at least one IVIG administration between 1 January 2014 and 31 December 2019, and a total of 29,968 IVIG administrations (Table 1). The median age at the earliest encounter was 47 (IQR 16, 64) years, 49% of the cohort was female, and 47% had public insurance (i.e., Medicaid and/or Medicare). The majority of IVIG administrations (68%) occurred in an OP setting. The most common indications for IVIG administration (as determined by ICD codes associated with IVIG administration encounters; more than one code could be associated with an encounter) were immunodeficiency (45%), followed by transplant (40%, including solid organ and hematopoietic transplant), autoimmune disorder (38%), and hematologic malignancy (17%). We found that 54% of patients received two or more IVIG administrations, with a median time between administrations of 28 days (IQR: 7, 38), and 11.92 unique administrations per person-year, suggesting that most patients with multiple administrations received IVIG on a monthly basis.

Table 1 Characteristics of patients receiving intravenous immunoglobulin (IVIG), 2014–2019

3.2 Identification of Potential Adverse Events (AEs) Across Computable Phenotypes

Among 29,968 IVIG administrations, we identified 6,692 potential AEs for anaphylaxis, tachycardia, bradycardia, thrombosis, and hemolysis across OP and IP encounters that included an IVIG administration (Table 2). Anaphylaxis was the rarest AE, with only 18 potential events identified. Potential tachycardia or bradycardia was identified in 5,743 encounters. We also evaluated two distal AEs, thrombosis (including myocardial infarction, stroke, and embolism) and hemolysis. In contrast to the potential proximal AEs, the majority of potential distal AEs were identified in patients who received IVIG in an IP setting, including 70% of thrombotic events and 73% of potential hemolytic events. Most potential thrombotic events (70%) were identified with ICD codes alone, whereas potential hemolytic events were largely identified through the presence of abnormal laboratory values (67%). Few potential distal AEs were identified with the presence of both an ICD code and laboratory values.

Table 2 Prevalence of potential adverse events among patients receiving intravenous immunoglobulin (IVIG), 2014–2019

3.3 Validation of Computable Phenotypes

We assessed the accuracy of different phenotyping strategies by comparing the results of different computable phenotypes to the results of a manual chart review (Table 3). The anaphylaxis phenotype based only on ICD codes had a PPV of 57% and an estimated sensitivity of 29%. In contrast, a computable phenotype based on the intramuscular (IM) or subcutaneous (SQ) administration of epinephrine (a typical treatment for anaphylaxis) had a higher PPV (90%) and higher estimated sensitivity (64%), as epinephrine IM/SQ has few other indications, making the route of administration highly specific for anaphylaxis. Of note, epinephrine administration by any route had a lower PPV of 37% (data not shown), consistent with the use of this medication for other indications. The compound phenotype for anaphylaxis was highly predictive; however, we identified only one patient with both a diagnosis of anaphylaxis (as indicated by the presence of an ICD code) and EHR data confirming receipt of epinephrine IM or SQ; the estimated sensitivity of this phenotype was 7%.

Table 3 Positive predictive value and sensitivity of acute adverse event (AE) phenotypes

The compound phenotype for tachycardia or bradycardia had the highest PPV (80%), and the vital signs-based phenotype had the highest estimated sensitivity (71%). Notably, these potential AEs are inherently defined by quantitative measures; thus, it is expected that vital signs-based phenotypes will have both a high PPV and a high sensitivity. Moreover, we observed that in several cases, patients had documented changes in heart rate that were adjudicated by chart review as tachycardia or bradycardia, but were not accompanied by an ICD code, suggesting the change in heart rate may be due to other patient characteristics or clinical events such as fever, anxiety, or use of other medications that contribute to heart rate differences. Thus, some AEs may be captured through other coding methods that encompass that AE along with other clinical features or may not be coded due to patient characteristics that account for the event, such as the duration of the AE or other parameters that are not typically captured in individual structured data elements [16].

We similarly evaluated PPV and estimated sensitivity for the two distal AEs, thrombosis and hemolysis (Table 4). Of 50 encounters identified as having a potential thrombotic event by ICD-based computable phenotype, 23 were confirmed to have a thrombotic event by chart review. Using a laboratory test-based computable phenotype, we found that of the 45 patients who had abnormal laboratory values for troponin or D-dimer, 6 were confirmed to have a thrombotic event (PPV of 13%). Only 2.5% of all patients in our cohort had abnormal laboratory values, making the estimated sensitivity of this phenotype 9%. The compound phenotype for thrombosis, which used both ICD codes and laboratory values, had a PPV of 89%; however, the estimated sensitivity was low (10%). The ICD code-based phenotype for hemolysis had a relatively low PPV of 24%. We next evaluated the PPV for abnormal laboratory values for hematocrit and haptoglobin. Of note, multiple laboratory tests are used in combination to help identify hemolysis, including hematocrit and haptoglobin (Supplemental Table 4; see OSM Resource 1) [17]. Abnormal hematocrit values had a low PPV for hemolysis (11%); however, abnormal haptoglobin values had a comparatively high PPV of 67%. Compound phenotypes including an ICD code for hemolysis and an abnormal laboratory value (either hematocrit or haptoglobin) had PPVs of 100%; however, only one patient had both an ICD code for hemolysis and an abnormal laboratory value for hematocrit and only four patients had an ICD code and an abnormal haptoglobin, resulting in sensitivities of 15% or less.

Table 4 Positive predictive value and sensitivity of distal adverse event phenotypes

3.4 Accounting for Prior Medical History in Acute Event Phenotypes

During the chart review of distal AEs, reviewers noted that many patients who were identified as having an ICD code for an AE of interest did not experience the AE during the encounter under review; rather, many patients had a prior history of the specific AE that was relevant for patient care in subsequent encounter(s). We therefore evaluated the effect of prior medical history on detection of AEs to determine if a history of these events negatively impacted the PPV of ICD-based computable phenotypes (Table 5). A patient was determined to have a prior medical history of a given AE if they had an ICD code associated with that AE during the time period from the beginning of the study period or the date of the patient’s first encounter within the dataset until their IVIG administration encounter.
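The prior-history definition can be sketched as a look-back check. This Python example is illustrative and is not the study's R code; it assumes that codes dated strictly before the IVIG administration encounter count as prior history and uses the start of the extraction period (1 January 2014) as the default lower bound. The function name and the dates are hypothetical.

```python
from datetime import date


def has_prior_history(code_dates, ivig_encounter_date, study_start=date(2014, 1, 1)):
    """True if any ICD code for the AE is dated within the look-back period.

    Assumption: the look-back period runs from `study_start` up to, but not
    including, the date of the IVIG administration encounter.
    """
    return any(study_start <= d < ivig_encounter_date for d in code_dates)


# Illustrative dates only.
prior = has_prior_history([date(2016, 5, 2)], ivig_encounter_date=date(2017, 3, 1))
same_day_only = has_prior_history([date(2017, 3, 1)], ivig_encounter_date=date(2017, 3, 1))
```

A flag of this kind could be used to stratify ICD-based phenotype results by prior history before computing PPV within each stratum.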

Table 5 Prior medical history and detection of adverse events (AEs)

We found that we were more likely to detect a true AE (i.e., verified by chart review) in patients without a prior history of either thrombosis or hemolysis. For example, among 30 potential thrombosis AEs identified with an ICD-based computable phenotype in patients with no prior history of thrombosis, a manual chart review found that 76% of phenotyped AEs did in fact occur after the infusion encounter of interest, compared to a PPV of 33% in patients with a prior history of thrombosis. Similarly, among potential hemolysis AEs phenotyped with ICD codes, the PPV of the computable phenotype was 8% among patients with a prior history of hemolysis and 78% among patients with no prior history. Of note, 85% of prior ICD codes for thrombotic events and 96% of ICD codes for hemolysis were documented within the year prior to the IVIG administration encounter associated with that AE (data not shown). These findings indicate that prior medical history likely impacts the specificity of computable phenotypes based on ICD codes, particularly for conditions in which past medical history may influence subsequent patient care.

4 Discussion

Real-world data, including health insurance claims and EHR data, are increasingly being used to support pharmacovigilance processes [18]. This study evaluated the contribution of different types of EHR-derived structured data elements to computable phenotype specificity and sensitivity for AEs linked to a medication administration, with a focus on data elements that are not generally available in insurance claims data. Unlike chronic conditions, some AEs occur and then resolve within a relatively short time period, making it challenging to construct computable phenotypes with high accuracy. We found that the compound computable phenotypes using both ICD codes and contextual information, including medication administration and vital signs, had high PPV for proximal events such as anaphylaxis and bradycardia or tachycardia; however, few patients had both ICD codes and the relevant EHR-derived contextual data, thereby decreasing sensitivity. In contrast, patients identified by computable phenotypes for distal AEs (i.e., thrombotic events or hemolysis) frequently had ICD codes for these conditions even in the absence of an AE during that particular encounter, owing to a prior history of such events. Therefore, patient medical history of distal AEs negatively impacted the PPV of computable phenotypes based on ICD codes. Taken together, we demonstrate the utility of different types of structured data in computable phenotypes for AEs linked to IVIG administration.

Most prior work in the development of computable phenotypes to detect IVIG-associated AEs has been performed using manual chart reviews for specific patient populations, though insurance claims and EHR data are increasingly being used for such studies [11]. Martinez and colleagues used the US Premier Healthcare Database to evaluate associations between different IVIG formulations and anaphylaxis in 24,919 hospitalized patients over a 9-year period based on administration of epinephrine on the same day as IVIG administration. Manual chart review identified 128 episodes of anaphylaxis among the 494 cases of epinephrine administration [19]. We did not identify any prior studies that retrospectively identified tachycardia or bradycardia associated with IVIG administration, though several prospective studies reported this AE in association with IVIG administration for specific indications [20,21,22]. Notably, these prior studies have been conducted in patients who were directly observed as inpatients or through clinical trials, increasing the likelihood that all relevant data required to identify an acute AE would be recorded. Manual chart review allows investigators to account for multiple forms of data that are not available within insurance claims or in structured EHR data, including clinical notes. Such unstructured data frequently include the interpretations of the provider that specifically address the relationships between different clinical events. For example, we observed that tachycardia and bradycardia often occurred but were not reported using diagnostic coding, suggesting that other patient characteristics or clinical events may explain these changes in heart rate, including individual physiologic changes, medications (e.g., beta blockers), changes in blood volume resulting from the IVIG infusion itself, or the presence of other acute or chronic conditions that could account for a change in heart rate.
Future studies will be required to develop methods that can accurately identify changes in baseline heart rate to more reliably attribute specific vital signs to drugs or other exposures of interest.

Unlike anaphylaxis or cardiac arrhythmia, thrombotic events and hemolysis may occur several days after IVIG administration and would generally only be detected if a patient remained under medical care (i.e., was hospitalized) or returned for medical care to the same institution within the specified time frame. A study by Jin and colleagues used the State Inpatient Databases and State Emergency Department Databases from three states [23] to identify thrombotic events among patients who had received IVIG within 120 days prior to the event based on ICD-9 codes. By comparison, we used a 7-day window to detect thrombotic events after IVIG exposure, potentially limiting our ability to capture this AE, but also making it less likely that we would identify events unrelated to treatment. Other studies have reported that the highest incidence of thrombotic events after IVIG exposure occurs within one week of treatment, though incidence has typically been reported as less than 1% and risk of thrombosis is likely to differ among the various patient populations receiving IVIG for different indications [24]. A study by Amman and colleagues used the FDA-sponsored Sentinel Distributed Database to assess risk of venous thromboembolism (VTE) after IVIG administration [25]. VTE cases were identified using ICD-9 codes associated with hospitalizations, and patient charts identified as cases by ICD-9 codes were then reviewed to determine validity of the ICD-9-based diagnosis. The investigators identified 75 post-IVIG VTE cases over a 6-year period, 38 of which were confirmed by chart adjudication. The authors found that patient history of VTE reduced the PPV of ICD-9 codes for VTE. Similarly, we also determined that a significant proportion of thrombotic events identified by ICD codes were not confirmed by manual adjudication, and that patient past history of thrombosis was common among potential events not confirmed by chart review. 
The inclusion of imaging studies such as ultrasound, computed tomography pulmonary angiography (CTPA), or ventilation/perfusion (V/Q) scans could potentially improve detection of thrombotic events; however, the evaluation of such studies would likely require analysis of text reports describing the findings to delineate positive and negative findings. The detection of hemolytic disease as an AE after IVIG administration is complicated by the fact that many indications for IVIG administration are associated with baseline hemolysis [26]. As with thrombosis, hemolytic disease is a rare side effect of IVIG administration, and it is likely that sensitive computable phenotypes for this AE will have relatively low specificity [27]; however, because thrombosis and hemolytic anemia are serious AEs that can result in significant morbidity and mortality, it may be desirable to have a highly sensitive computable phenotype. Ultimately, the tradeoff between sensitivity and specificity must be determined by the goal of a given study (e.g., screening or identification of definite cases), with computable phenotype performance adjusted accordingly.

Given the potential applications of real-world data sources in post-market surveillance and effectiveness analyses, there is a growing need for computable phenotypes that can be used across different datasets, including EHR data from various health systems [28]. The ability to share phenotypes across data sources allows for the creation of the large patient cohorts necessary for the identification of AEs. Health data networks that utilize common data models, such as the National Patient-Centered Clinical Research Network (PCORnet) and the Observational Medical Outcomes Partnership (OMOP), provide expanded opportunities to integrate EHR data from multiple sources. They also provide additional motivation to develop phenotypes that rely on structured data elements regularly captured at all partnering facilities and that can therefore be easily applied across data from different institutions [29, 30]. Notably, Boolean algorithms that rely on structured data elements are of particular interest because they are readily applicable to nearly all institutions and do not require the organization of unstructured data elements or the substantial computing power that is often necessary for machine learning-based approaches. Such Boolean algorithm-based phenotypes can be used to create scripts to automate capture of clinical events of interest for patient populations that receive care in a variety of settings, thereby enhancing post-market surveillance.

4.1 Strengths and Limitations

Our study has several strengths and limitations. A strength of the study is that it included a relatively large number of patients with detailed EHR data. Moreover, the study included patients receiving IVIG for a variety of indications in multiple inpatient and outpatient settings. The size and diversity of this cohort enhance the generalizability of the computable phenotypes to multiple patient populations. A limitation of this study is that it evaluated a limited number of AEs in individuals receiving a particular class of biologic product and at a single institution. Future studies will be required to evaluate the utility of these AE computable phenotypes associated with other therapies and across different sites with different coding and practice patterns. Moreover, given the limited size of the cohort and the nature of the analysis, we were only able to estimate the sensitivity of the phenotypes developed in this study. Future studies will need to evaluate phenotype sensitivity through chart review of patients who are negative for the phenotype of interest.

5 Conclusions

In conclusion, we evaluated methods for creating computable phenotypes for four AEs based on ICD codes and on structured data elements from EHRs. We observed that the positive predictive value of phenotypes for acute AEs was enhanced by the inclusion of additional contextual information (e.g., vital signs, treatment administration); however, sensitivity was significantly reduced by the requirement for multiple data elements. Detection of distal AEs (i.e., thrombosis and hemolysis) using ICD codes was highly sensitive; however, specificity was markedly reduced due to the inclusion of these codes for patients who had a history of these AEs. Taken together, our results provide evidence for the utility of different structured data elements in deriving computable phenotypes for AEs. Such computable phenotypes can be used across different data sources for the detection of infusion-related AEs.