Purpose

A compilation of potentially fatal adverse drug reactions to cancer drugs, previously published in an oncology journal, concluded that such events may be discovered as many as 36 years after a drug receives United States Food and Drug Administration (FDA) approval [8]. The authors of that compilation inferred a need for continued vigilance for such reactions with cancer drugs. Several computer-assisted statistical signal detection algorithms, also known as data mining algorithms (DMAs), are being studied in hopes of enhancing the ability to screen large databases of adverse event (AE) reports and thereby improving safety surveillance [1–7, 9–11]. These algorithms might be especially useful for cancer drugs that (1) can be approved on an accelerated basis, (2) are known to have serious toxicity, (3) are administered to patients with substantial and complicated comorbid illness, (4) are not available to the general medical community, and (5) may have a high frequency of “off-label” use. In this study, we applied two DMAs, the multi-item gamma-Poisson shrinker (MGPS) and proportional reporting ratios (PRRs), to the FDA Adverse Event Reporting System (AERS) database to determine whether these methods would have flagged a sample of serious AEs drawn from the aforementioned compilation in advance of the “traditional” methods that were in operation at the time.

Methods

The FDA AERS database is a computerized information database for post-approval safety surveillance. It functions as an early warning system for adverse drug reactions not detected during pre-approval testing. It contains AE reports for approved drugs and therapeutic biological products, submitted by pharmaceutical companies in accordance with mandatory reporting obligations and voluntarily by health care professionals and consumers. AEs are submitted on MedWatch forms. AE reports are reviewed and coded for data entry in accordance with the standardized terminology of the Medical Dictionary for Regulatory Activities (MedDRA). Quarterly extracts are available through the National Technical Information Service (NTIS). These quarterly updates are subjected to extensive cleaning (i.e., removal of redundant drug nomenclature and duplicate reports) prior to data mining. The data extract used for the current analysis covered reports through the first quarter of 2003 [12].

The two DMAs chosen for this analysis were PRRs [2] and the empirical Bayesian MGPS (Lincoln Technologies, Wellesley Hills, MA, USA) [11].

The PRR is a simple metric that compares the proportion of reports for the drug of interest that mention the event of interest with the corresponding proportion among all other drugs in the database (Table 1). For this analysis, a PRR>2 with an associated χ²>4 (with Yates correction) was considered a “signal” of disproportionate reporting (SDR). This combination of thresholds has been frequently cited in published studies of data mining, usually in conjunction with a case count threshold (e.g., n>2) [2]. We also examined whether imposing a case count threshold (n>2) in series with PRRs affected the results.

Table 1 Proportional reporting ratios
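
As an illustration of the PRR criterion described above, the following minimal sketch computes the PRR and the Yates-corrected χ² from a 2×2 table of report counts and applies the PRR>2, χ²>4 rule, optionally in series with a case count threshold. The cell labels a, b, c, d, the function names, and the example counts are assumptions made for illustration; they are not taken from Table 1 or from the AERS data.

```python
import math

def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 table of report counts.

    a: drug of interest, event of interest
    b: drug of interest, all other events
    c: all other drugs, event of interest
    d: all other drugs, all other events
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each row of the table needs at least one report")
    if c == 0:
        # Event never reported with other drugs: ratio is unbounded (or undefined).
        return math.inf if a > 0 else math.nan
    return (a / (a + b)) / (c / (c + d))

def yates_chi2(a, b, c, d):
    """Chi-square statistic for a 2x2 table with Yates continuity correction."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    return n * corrected ** 2 / denominator

def is_sdr(a, b, c, d, prr_min=2.0, chi2_min=4.0, min_cases=None):
    """Apply PRR > prr_min and chi2 > chi2_min; optionally require a > min_cases."""
    if min_cases is not None and not a > min_cases:
        return False
    return prr(a, b, c, d) > prr_min and yates_chi2(a, b, c, d) > chi2_min

# Hypothetical counts: 2 reports of the event among 20 reports for the drug.
print(is_sdr(2, 18, 100, 9880))               # thresholds on PRR and chi2 only
print(is_sdr(2, 18, 100, 9880, min_cases=2))  # adds the n>2 case count filter
```

With these made-up counts the pair meets the PRR and χ² thresholds but fails the n>2 filter, which is the kind of timing difference the case count threshold can introduce.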

The theoretical basis of MGPS has been described in detail elsewhere [1, 11] but briefly is as follows. Expected counts for item sets (i.e., drug–event combinations or DECs) are based on the product of the marginal probabilities of each item (drug and event) in the database. The observed to expected (O/E) ratio is initially calculated as a crude disproportionality metric. Because the same ratio could be obtained from cell counts (frequencies) of markedly different sizes, and O/E ratios based on smaller cell counts are considered more variable or imprecise, further modeling using maximum likelihood estimation and Bayesian inference is used to adjust the crude O/E ratios according to the respective cell counts. Each cell is considered to represent a Poisson process in which the Poisson parameter follows a mixture of two gamma distributions. The prior probability distribution of the gamma parameters is obtained by applying an iterative maximum likelihood algorithm to a negative binomial mixture likelihood. Posterior estimates of the gamma parameters are obtained by updating the prior with the individual cell counts via Bayes' theorem.

Two summary metrics are derived from the posterior distribution, and both adjust for variability by down-weighting or “shrinking” the estimates associated with low cell counts. The first, obtained using logarithmic transformations, is the empirical Bayes geometric mean (EBGM); the second is the lower 5% cut-off of the posterior distribution (EB05). An EB05 of 8 may therefore be interpreted to mean that reports of the particular DEC occur in the database eight times more frequently than would be expected if drug and event were independently distributed in the database. The signal metric used for a threshold in the current analysis was the frequently cited lower 5% cut-off of the EBGM greater than two (EB05>2) [1]. The developers of MGPS have stated that for EB05≥2, “our experience indicates that the signals using this cutoff have high enough specificity to deserve further investigations.”
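
The shrinkage behavior described above can be sketched for a single cell under the two-component gamma mixture prior. This is a minimal illustration, not the MGPS software: the prior values in HYPOTHETICAL_PRIOR are invented rather than fitted to AERS, the fitting step (maximum likelihood on the negative binomial mixture) is omitted, and only the EBGM is computed. An EB05 would additionally require the 5th percentile of the mixture posterior.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence, then an asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def log_negbin(n, e, alpha, beta):
    """Log marginal probability of count n given expected count e,
    when the relative rate has a Gamma(alpha, beta) prior (beta is a rate)."""
    return (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
            + alpha * math.log(beta / (beta + e))
            + n * math.log(e / (beta + e)))

def ebgm(n, e, alpha1, beta1, alpha2, beta2, p):
    """Posterior geometric mean of the relative rate for one cell."""
    # Posterior weight of the first mixture component, computed in log space.
    w1 = math.log(p) + log_negbin(n, e, alpha1, beta1)
    w2 = math.log(1 - p) + log_negbin(n, e, alpha2, beta2)
    m = max(w1, w2)
    q = math.exp(w1 - m) / (math.exp(w1 - m) + math.exp(w2 - m))
    # Each component's posterior is Gamma(alpha + n, beta + e);
    # E[log(rate)] for Gamma(a, b) is digamma(a) - log(b).
    mean_log = (q * (digamma(alpha1 + n) - math.log(beta1 + e))
                + (1 - q) * (digamma(alpha2 + n) - math.log(beta2 + e)))
    return math.exp(mean_log)

# Invented prior parameters, for illustration only.
HYPOTHETICAL_PRIOR = dict(alpha1=0.2, beta1=0.1, alpha2=2.0, beta2=4.0, p=0.1)

# Both cells have a crude O/E ratio of 10, but very different counts.
print(ebgm(1, 0.1, **HYPOTHETICAL_PRIOR))    # small count: strongly shrunk
print(ebgm(100, 10, **HYPOTHETICAL_PRIOR))   # large count: stays near 10
```

Under this assumed prior, the cell with a single report is pulled far below its crude ratio, while the cell with 100 reports is barely adjusted.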

A variety of data mining options and parameters exist, including basic covariate adjustment (stratification by age, gender, and year of report) and cumulative subsetting. Stratification tends to reduce spurious associations due to confounding and markedly decreases the number of disproportionalities detected [1, 5].

For the present analysis, the data mining was performed on suspect drug–AE pairs using stratification by age, gender, and FDA year of report with cumulative subsetting by year.
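
To illustrate how stratification enters the expected count, the sketch below computes an observed count and a stratified expected count for one drug–event pair by summing, over strata, the product of the within-stratum marginal frequencies. It is a simplified sketch under stated assumptions: it counts reports rather than individual drug–event pairs, which may differ from the MGPS implementation, and the record layout, function name, and sample data are hypothetical.

```python
from collections import defaultdict

def observed_and_expected(reports, drug, event):
    """Return (observed, expected) counts for one drug-event pair.

    reports: iterable of (stratum, drugs, events), where stratum is any
    hashable key (e.g. an (age_group, gender, year) tuple) and drugs and
    events are sets of coded terms on that report.
    """
    # Per stratum: [total reports, reports with drug, reports with event]
    totals = defaultdict(lambda: [0, 0, 0])
    observed = 0
    for stratum, drugs, events in reports:
        has_drug = drug in drugs
        has_event = event in events
        counts = totals[stratum]
        counts[0] += 1
        counts[1] += has_drug
        counts[2] += has_event
        observed += has_drug and has_event
    # Independence within each stratum: n_drug * n_event / n_total, then sum.
    expected = sum(nd * ne / n for n, nd, ne in totals.values())
    return observed, expected

# Hypothetical reports in two strata.
sample = [
    ("s1", {"drugA"}, {"eventX"}),
    ("s1", {"drugA"}, {"eventY"}),
    ("s1", {"drugB"}, {"eventX"}),
    ("s1", {"drugB"}, {"eventY"}),
    ("s2", {"drugA"}, {"eventX"}),
    ("s2", {"drugB"}, {"eventY"}),
]
obs, exp = observed_and_expected(sample, "drugA", "eventX")
print(obs, exp, obs / exp)  # crude O/E ratio before any shrinkage
```

Cumulative subsetting by year could be sketched on top of this by rerunning the function on all reports received up to and including each successive year.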

A recent peer-reviewed publication summarizing all AEs associated with oncology drugs reported from 2000 to 2002 using a pre-defined search strategy provided the sample of DECs for this analysis [8]. In addition to the drug and event, this publication provided the year the drug was approved, the data source for the AE (e.g., FDA MedWatch Program, investigative team’s comprehensive cancer center), and the time interval between approval and package insert revision. It also supplied the year and reference in which ten or more cases (i.e., a “case series”) of the AE were described in a single article. The year in which the signal metric first exceeded the specified threshold was compared with the year of the package insert revision and/or publication of a “case series.” For each specific AE or group of AEs, the verbatim term(s) from the paper and all MedDRA Preferred Terms that an experienced reviewer considered clinically equivalent or closely related to the verbatim term were used for data mining. A second experienced reviewer reviewed and verified the initial findings.
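
The year-level comparison described above can be sketched as follows. The function names and the yearly metric values are hypothetical, and taking the earlier of the package insert revision year and the case series year as the reference point is an assumption about how “and/or” might be operationalized, not a statement of the procedure actually used.

```python
def first_signal_year(metric_by_year, threshold=2.0):
    """Earliest year whose cumulative signal metric exceeds the threshold."""
    for year in sorted(metric_by_year):
        if metric_by_year[year] > threshold:
            return year
    return None

def lead_time_years(metric_by_year, label_change_year=None,
                    case_series_year=None, threshold=2.0):
    """Years by which the SDR preceded the earlier reference event.

    Positive: the SDR came first. Zero: same year. Negative: the SDR came later.
    Returns None if no SDR occurred or no reference year is available.
    """
    signal_year = first_signal_year(metric_by_year, threshold)
    reference_years = [y for y in (label_change_year, case_series_year)
                       if y is not None]
    if signal_year is None or not reference_years:
        return None
    return min(reference_years) - signal_year

# Hypothetical cumulative EB05 values by year for one drug-event combination.
eb05_by_year = {1998: 1.2, 1999: 2.4, 2000: 3.1}
print(lead_time_years(eb05_by_year, label_change_year=2002,
                      case_series_year=2001))
```

Because the comparison is made on calendar years, a lead time of 1 cannot distinguish a gap of a few weeks from one of nearly two years.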

Results

The peer-reviewed published analysis contained 21 drugs and 26 DECs that were considered sufficiently specific for data mining. Ten of the drugs were approved in the United States before 1995. Of the 11 drugs approved in 1995 or later, 6 underwent a standard approval process and the remaining 5 underwent an accelerated approval process. Of the DECs, 24 generated a signal of disproportionate reporting with PRR (6 at 1 year and 16 from 2 to 18 years prior to either a published “case series” or a package insert change) and 20 with MGPS (3 at 1 year and 11 from 2 to 16 years prior to either a published “case series” or a package insert change). One DEC did not generate a signal with either algorithm. With PRRs, 18 DECs (7 involving drugs that underwent accelerated approval) would have been highlighted in the year the first MedWatch form was received (range for number of reports received in first year: 1–14). With MGPS, 6 DECs (4 involving drugs that underwent accelerated approval) would have been highlighted in the year the first MedWatch form was received (range 5–24). These and other findings are provided in Table 2. Imposing a case count threshold of n>2 in series with a PRR threshold affected the timing of the initial SDR for three DECs. For only one of these DECs (gemtuzumab–hepatic veno-occlusive disease) did this threshold affect the timing of the SDR in relation to the package insert change or “case series” (i.e., the SDR occurred one year after instead of in the same year).

Table 2 Data mining on 26 potentially fatal drug–event combinations with 21 oncology drugs

Discussion and conclusion

At least one DMA appeared to generate a signal of disproportionate reporting for 22 of 26 DECs 1 year or more prior to publication of a case series and/or a package insert change for selected cancer drugs. The temporal resolution of 1 year is a limitation of this study: where the difference was only 1 year, it is difficult to determine with any degree of certainty whether the signal from the DMA truly predated the initial signal identified by traditional means. Nevertheless, for 16 DECs with PRRs and 11 DECs with MGPS, one could conclude that an SDR was significant in that it would have been generated well in advance (≥2 years) of the standard techniques in use at the time. Therefore, both methods could have had potential utility in this setting.

PRRs have certain advantages (e.g., they are simpler and more “sensitive”), but a potential drawback of such methods is reduced “specificity,” which could result in an overabundance of signals, including “false-positive” signals that would be difficult to manage in general pharmacovigilance settings. However, in the specific situation studied here, the enhanced “sensitivity” may justify the reduced “specificity.” Going forward, any performance differentials between PRR and MGPS similar to those we observed are likely to be significantly mitigated when these methods are used as one element of a comprehensive pharmacovigilance program that utilizes multiple approaches to signal detection. Additionally, given that currently cited thresholds are unvalidated, somewhat subjective, and adjustable, performance gradients would be mitigated by titrating or optimizing thresholds. Noteworthy is the finding that performance differentials seemed to narrow with the most recently approved drugs, especially those that underwent accelerated approval (e.g., four of the eight listed DECs in this category “signaled” in the same year, which was the first year of reporting for the relevant DEC). One possible explanation is that there might have been more intensive reporting for these drugs because of proactive targeted surveillance by the manufacturer.

It should be noted that our retrospective analysis may not reflect “real-life” pharmacovigilance, in that we preselected our event terms and did not perform “open-ended” data mining (i.e., reviewing all events exceeding designated thresholds for relevance), and that an SDR reflects reporting behavior and may or may not correlate with causality. Currently, drugs in this class may be approved on an accelerated basis, are known to have serious toxicity, are administered to patients with substantial and complicated comorbid illness, are not available to the general medical community, and may have a high frequency of “off-label” use. Automated methods of drug surveillance might usefully supplement traditional surveillance strategies for oncology drugs and drugs marketed under similar circumstances.