Introduction

Regulatory agencies and clinical researchers from academia and industry accumulate, monitor, and analyze adverse event data to perform continued safety assessment of medical products. Safety assessment can be based on a single data set, with respect to a single or multiple adverse events (AEs), or on a synthesis of information from multiple data sources. When multiple data sources are available, standard meta-analysis techniques such as fixed and random effects models can be used to produce an overall risk metric such as a risk ratio or an odds ratio. Random effects models include study-specific random effects to account for random variation across studies. While meta-analytic estimates have been used to gauge the existence of an overall safety-signal, signals detected in one study may not be detected in other related studies because of inherent differences between them, including differences in patient population or location of the treatment sites. Several articles discuss statistical methodologies for safety-signal detection from large drug safety databases. For example, Huang et al. [1] developed a Likelihood Ratio Test (LRT) based framework for safety-signal detection from the FDA Adverse Event Reporting System (FAERS). Huang et al. [2] further extended this method and developed a weighted LRT framework, and Huang et al. [3] discuss a ZIP (Zero-inflated Poisson)-LRT methodology for settings with excessive zero events (i.e., events with zero counts) in these large drug safety databases. Their methods, however, do not directly apply to a meta-analysis of a safety-signal from multiple randomized controlled trials (RCTs) when there are studies with zero events in either the treatment arm or the control arm.

Dong et al. [4] discuss the limitations of popular meta-analysis methods, such as fixed and random effects models, in the presence of zero events in the treatment arm or the control arm for one or more trials, and propose a frequentist zero-inflated binomial (ZIB) model to address this issue. Unlike popular approaches such as DerSimonian-Laird (DL), Peto or Mantel-Haenszel (MH), their method does not require a continuity correction. For the ZIB setup, Dong et al. propose a modified odds ratio (MOR) measure and argue that it is a more appropriate measure of treatment effect than the odds ratio. The ZIB model by Dong et al. [4] does not account for across-study variability. Muthukumarana, Martell and Tiwari [5] propose a Bayesian ZIB model that accounts for between-study variation by including study-specific effects in their logit model.

In this article, we describe a method that further extends the Bayesian zero-inflated binomial model for meta-analysis and develops a Bayesian hypothesis testing framework to explore a safety-signal for each clinical-trial study (henceforth, we use study and trial interchangeably) in this setting. This is achieved by assigning a spike-and-slab prior to the study-specific treatment effects. This prior is essentially a mixture distribution with a point mass at zero and a continuous distribution. Scott and Berger [6] use this prior to detect inactive genes in the micro-array context. Several articles, including Scott and Berger [7], Westfall [8], and Berry and Berry [9], discuss the use of this prior in the multiple testing context. These types of priors are extensively used in Bayesian variable selection in regression; George and McCulloch [10] and Ishwaran and Rao [11] are among the widely cited articles on this topic.

Berry and Berry [9] developed a model-based safety assessment approach using a logistic regression model with a spike-and-slab prior for the treatment effect; their proposed hierarchical model accounts for the relationship among different types of adverse events (AEs) that are classified into different body systems. This approach makes it possible to study how one type of AE affects the other AEs. DuMouchel [12] extends this approach and proposes a multivariate logistic regression model that includes covariates to detect possible subgroup effects through treatment-by-covariate interactions. In the same context, Xia et al. [13] use a spike-and-slab prior in a log-linear regression setup, where they include total subject-time at risk in the model. Tan et al. [14] propose a hierarchical frequentist testing approach for analyzing adverse event data. These articles explore the hierarchical structure or categorization of the adverse events. Like Berry and Berry [9] and Xia et al. [13], we also use a spike-and-slab prior for the treatment effect, but on a single adverse event at a time. Our focus is to explore the probability of a safety-signal and to summarize the overall effect using our meta-analytic modeling approach, borrowing strength across all the trials. In addition, our approach is designed for a zero-inflated binomial setup.

In the following section, we discuss the proposed approach and the prior distributions for the parameters. In the “Data Analysis” section, we illustrate our approach using two published datasets. The first dataset, published by Katsanos et al. [15], provides mortality in patients treated with paclitaxel drug-coated devices compared to uncoated devices for peripheral arterial disease (PAD) in femoropopliteal arteries. The second dataset was obtained from Nissen and Wolski [16] and provides summary-level information on myocardial infarction (MI) and cardiovascular (CV) death among patients who received Rosiglitazone, for 48 trials. We summarize our findings and discuss potential future research directions in the “Discussion” section.

Methods

Suppose there are \(k\) trials and \({X}_{Ti}\) and \({X}_{Ci}\) are the number of events for the treatment (T) and control (C) arm, respectively, for the \(i\)th trial. We assume, \({X}_{Ti}\sim Bin({n}_{Ti},{p}_{Ti})\) and \({X}_{Ci}\sim Bin({n}_{Ci},{p}_{Ci})\), where \({n}_{Ti}\) and \({n}_{Ci}\) are the number of enrolled patients and \({p}_{Ti}\), \({p}_{Ci}\) are the probability of observing an event for the treatment and control group, respectively.

A zero-inflated binomial model formulated by Dong et al. [4] can be expressed as,

$${Y}_{Ti}\sim p{I}_{\left[{Y}_{Ti}=0\right]}+\left(1-p\right)Bin\left({n}_{Ti},{p}_{Ti}\right); {Y}_{Ci}\sim q{I}_{[{Y}_{Ci}=0]}+(1-q)Bin({n}_{Ci},{p}_{Ci}),$$

where \({Y}_{Ti}\) and \({Y}_{Ci}\) are the number of events for the treatment and the control arm, and \(p\) and \(q\) are the probabilities of the “zero-state” for the treatment and control group, respectively. We assume a \(Beta(1,1)\) prior for both \(p\) and \(q\), indicating that, a priori, the “Binomial” state is approximately equally likely to occur for the treatment and control arms.
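The mixture above can be made concrete with a small numerical sketch. The function name `zib_pmf` and the arm-level values (50 patients, event probability 0.02, zero-state probability 0.3) are hypothetical choices for illustration and do not come from the datasets analyzed in this article.

```python
from math import comb

def zib_pmf(y, n, prob, zero_prob):
    """P(Y = y) for a zero-inflated binomial: a point mass at zero with
    weight zero_prob, mixed with Bin(n, prob) with weight 1 - zero_prob."""
    binomial_part = comb(n, y) * prob**y * (1.0 - prob) ** (n - y)
    point_mass = 1.0 if y == 0 else 0.0
    return zero_prob * point_mass + (1.0 - zero_prob) * binomial_part

# Hypothetical arm: 50 patients, event probability 0.02, zero-state probability 0.3
n, prob, zero_prob = 50, 0.02, 0.3
p_zero = zib_pmf(0, n, prob, zero_prob)
total = sum(zib_pmf(y, n, prob, zero_prob) for y in range(n + 1))
```

In this sketch the probability of observing zero events exceeds the zero-state probability, because the binomial component can also produce a zero count.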

Let the log-odds-ratio for the treatment over the control in the \(i\)th study be denoted by \({\delta }_{i}\); that is, \({\delta }_{i}=logit({p}_{Ti})-logit({p}_{Ci})=\log \frac{{p}_{Ti}}{1-{p}_{Ti}}-\log \frac{{p}_{Ci}}{1-{p}_{Ci}}\). Similar to Lunn et al. [17], we consider a one-to-one transformation from \(({p}_{Ti},{p}_{Ci})\) to \(({\delta }_{i},{\mu }_{i})\), where \({\delta }_{i}\) is the parameter of interest, and \({\mu }_{i}\) is the nuisance parameter defined as \({\mu }_{i}=\frac{1}{2}\left(logit({p}_{Ti})+logit({p}_{Ci})\right)\).

Therefore,

$$ logit\left( {p_{{Ti}} } \right) = \mu _{i} + \frac{1}{2}\delta _{i} ;\,{\text{and}}\,logit(p_{{Ci}} ) = \mu _{i} - \frac{1}{2}\delta _{i} . \tag{1} $$
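As a quick check of the reparameterization in Eq. (1), the following sketch maps \(({\mu }_{i},{\delta }_{i})\) to the two event probabilities and back. The helper names and the input values (\(\mu =-3\), \(\delta =0.5\)) are illustrative assumptions only.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def to_probabilities(mu, delta):
    """Eq. (1): logit(p_T) = mu + delta/2 and logit(p_C) = mu - delta/2."""
    return expit(mu + 0.5 * delta), expit(mu - 0.5 * delta)

def to_mu_delta(p_t, p_c):
    """Inverse map: mu is the average logit, delta is the log-odds-ratio."""
    return 0.5 * (logit(p_t) + logit(p_c)), logit(p_t) - logit(p_c)

# Hypothetical study-level values
p_t, p_c = to_probabilities(mu=-3.0, delta=0.5)
mu_back, delta_back = to_mu_delta(p_t, p_c)
```

A positive \(\delta \) places the treatment-arm probability above the control-arm probability, and the round trip recovers the original pair, which is what a one-to-one transformation requires.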

We further define \({\delta }_{i}={\lambda }_{i}{\delta }_{i}^{*}\), where \({\lambda }_{i}\) is a binary variable assumed to follow Bernoulli(\(\pi \)), i.e., \(P({\lambda }_{i}=1)=\pi \), and \({\delta }_{i}^{*}|\delta ,\tau \sim f(\delta ,\tau )\). Essentially, we are assigning a spike-and-slab prior for \({\delta }_{i}\), i.e.,

$${\delta }_{i} \sim \left(1-\pi \right){I}_{\left[{\delta }_{i}=0\right]}+\pi f(\delta ,\tau ). \tag{2}$$

We explore the following two choices of the “slab” distribution \(f(\delta ,\tau )\):

$$ f\left( {\delta ,\tau } \right) \equiv N\left( {\delta ,\tau ^{2} } \right),\quad \delta \sim Uniform\left( { - 3,3} \right); \tag{3} $$
$$ f\left( {\delta ,\tau } \right) \equiv DP\left( {\alpha ,G_{0} } \right),\quad G_{0} = N\left( {\delta ,\tau ^{2} } \right),\quad \delta \sim Uniform\left( { - 3,3} \right). \tag{4} $$
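To illustrate what the spike-and-slab construction with a Normal slab implies for the study-specific effects, here is a minimal simulation sketch. It fixes \(\pi \), \(\delta \) and \(\tau \) at hypothetical values instead of drawing them from their hyperpriors, and the function name is an assumption made for this example.

```python
import random

def draw_spike_and_slab(k, pi, delta, tau, rng):
    """Draw k effects delta_i = lambda_i * delta_i_star with
    lambda_i ~ Bernoulli(pi) and delta_i_star ~ N(delta, tau^2)."""
    effects = []
    for _ in range(k):
        lam = 1 if rng.random() < pi else 0   # inclusion indicator
        delta_star = rng.gauss(delta, tau)    # Normal slab draw
        effects.append(lam * delta_star)
    return effects

# Hypothetical fixed hyperparameter values, for illustration only
rng = random.Random(2024)
effects = draw_spike_and_slab(k=10000, pi=0.3, delta=0.5, tau=0.4, rng=rng)
share_zero = sum(e == 0.0 for e in effects) / len(effects)
```

Roughly a fraction \(1-\pi \) of the simulated effects sits exactly at zero (the spike), while the remainder follows the slab.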

We assume \({\mu }_{i}\sim N(0,{10}^{2})\), \({\tau }^{2}\sim \) Inverse-Gamma(0.01, 0.01) and \(\alpha \sim \) Uniform(1, 10). The posterior distribution based on the model with prior (2) allows us to compute the posterior probabilities \(P({\delta }_{i}=0|y)\) and \(P({\delta }_{i}>0|y)\), i.e., the probabilities of no-signal and of a safety-signal for the \(i\)th study, respectively. These can be used as a Bayesian substitute for a frequentist test for signal detection: \({H}_{0}: {\delta }_{i}=0\) vs \({H}_{a}: {\delta }_{i}>0\). A higher value of \(P({\delta }_{i}>0|y)\) indicates evidence against \({H}_{0}\), i.e., the possibility of a safety-signal associated with the treatment. We assign a noninformative Beta(1, 1) prior to \(\pi \); other informative or weakly informative priors can also be used depending on the availability of information about \(\pi \). Note that unlike Dong et al. [4], the proposed ZIB model accounts for study-specific variability; the parameter \(\tau \) measures heterogeneity across studies/trials.
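Given MCMC output, the two posterior probabilities reduce to simple proportions over the draws. The sketch below assumes the draws of \({\delta }_{i}\) are stored as a list of rows (one row per iteration, one column per study); this layout, the function name and the toy numbers are assumptions for illustration, not actual output from the fitted model.

```python
def signal_probabilities(delta_draws):
    """delta_draws[r][i] is draw r of delta_i. Returns, per study, the pair
    (P(delta_i = 0 | y), P(delta_i > 0 | y)) as Monte Carlo proportions."""
    n_draws = len(delta_draws)
    n_studies = len(delta_draws[0])
    result = []
    for i in range(n_studies):
        column = [row[i] for row in delta_draws]
        p_null = sum(d == 0.0 for d in column) / n_draws
        p_signal = sum(d > 0.0 for d in column) / n_draws
        result.append((p_null, p_signal))
    return result

# Toy draws for two hypothetical studies (four iterations)
toy_draws = [
    [0.0, 0.8],
    [0.0, 1.1],
    [0.3, 0.0],
    [0.0, 0.6],
]
probs = signal_probabilities(toy_draws)
```

In this toy example the first study is mostly in the spike, while the second has most of its draws above zero.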

Besides estimating study-specific odds ratios, we also estimate the overall modified odds ratio (MOR), defined as \(MOR=OR(1-p)/(1-q)\). Here, \(p\) and \(q\) are the probabilities of zero events for the treatment and control groups, respectively. Under our modeling approach, the meta-analytic treatment effect on the log-odds scale is \(\pi \delta \); with \(\pi =1\) the model reduces to a simpler model for which \(\delta \) is the overall effect. Let \({p}^{(r)}\), \({q}^{(r)}\), \({\pi }^{(r)}\) and \({\delta }^{(r)}\) be the posterior draws of \(p\), \(q\), \(\pi \) and \(\delta \), respectively, and let \({OR}^{(r)}=\exp \left({\pi }^{(r)}{\delta }^{(r)}\right)\), \(r=1,\dots ,R\). Based on these draws, the posterior means of OR and MOR, i.e., \(E(OR|data)\) and \(E(MOR|data)\), are estimated by \(\widehat{OR}=\frac{1}{R}{\sum }_{r=1}^{R}\exp \left({\pi }^{(r)}{\delta }^{(r)}\right)\) and \(\widehat{MOR}=\frac{1}{R}{\sum }_{r=1}^{R}{OR}^{(r)}(1-{p}^{(r)})/(1-{q}^{(r)})\), respectively. Note that unlike Dong et al. [4], the proposed approach accounts for study-specific variability by introducing the parameter \({\tau }^{2}\), which measures heterogeneity across trials. In the safety-signal evaluation context, the modified odds ratio or MOR (introduced by Dong et al. [4]) can be used as a measure to identify the existence of an overall safety-signal based on the available studies. We implement the model using the Just Another Gibbs Sampler (JAGS) software [18] via the R2jags [19] package.
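The Monte Carlo estimators of OR and MOR described above can be sketched as follows. The posterior draws are toy numbers chosen for illustration, and the function name is hypothetical; in practice the draws would come from the sampler output.

```python
import math

def or_and_mor(pi_draws, delta_draws, p_draws, q_draws):
    """Posterior-mean estimates of OR = exp(pi * delta) and
    MOR = OR * (1 - p) / (1 - q), averaged over the draws."""
    or_draws = [math.exp(pi * d) for pi, d in zip(pi_draws, delta_draws)]
    mor_draws = [o * (1.0 - p) / (1.0 - q)
                 for o, p, q in zip(or_draws, p_draws, q_draws)]
    n = len(or_draws)
    return sum(or_draws) / n, sum(mor_draws) / n

# Toy posterior draws (hypothetical values, R = 3)
pi_draws = [0.5, 0.6, 0.4]
delta_draws = [0.2, 0.4, 0.1]
p_draws = [0.10, 0.20, 0.15]
q_draws = [0.20, 0.25, 0.15]
or_hat, mor_hat = or_and_mor(pi_draws, delta_draws, p_draws, q_draws)
```

When the draws of \(p\) are smaller than those of \(q\), the adjustment factor \((1-p)/(1-q)\) exceeds 1 and pushes the MOR above the OR; when \(p=q\) in every draw, the two estimates coincide.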

Data Analysis

Analysis of Long-Term All-Cause Mortality Data for Safety-Signal Detection

Katsanos et al. [15] published a meta-analysis reporting the all-cause mortality risk of paclitaxel (PTX) drug-coated balloons and stents compared to uncoated devices. These devices are used for treating peripheral arterial disease in femoropopliteal arteries. The article reported a 1-year mortality analysis based on 28 studies, and 2- and 5-year mortality results based on 12 and 3 studies, respectively. Since several studies have zero events in one arm or both arms, a continuity correction was used in the standard fixed and random effects meta-analyses to produce the overall odds ratio. In this section, the proposed method is used to analyze the 12 studies with available 2-year mortality data discussed in Katsanos et al. [15].
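For readers unfamiliar with continuity corrections, the following sketch shows one common convention: adding 0.5 to every cell of a 2×2 table that contains a zero before computing the odds ratio. The constant 0.5, the function name and the counts are assumptions for illustration; the text above does not state which correction was applied in the published analysis.

```python
def odds_ratio(events_t, n_t, events_c, n_c, correction=0.5):
    """Odds ratio for a single trial; if any cell of the 2x2 table is zero,
    add `correction` to all four cells first (a common convention)."""
    a, b = events_t, n_t - events_t   # treatment: events, non-events
    c, d = events_c, n_c - events_c   # control: events, non-events
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)

# Hypothetical trials: one without zero cells, one with zero control events
or_plain = odds_ratio(4, 100, 2, 100)
or_corrected = odds_ratio(3, 100, 0, 100)
```

Without the correction the second trial's odds ratio would be undefined (division by zero), and the corrected value depends on the arbitrary constant, which is the kind of sensitivity that zero-inflated models aim to avoid.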

In order to assess the mortality risk for each study and to explore the existence of an overall indication of a safety-signal, we first implement the frequentist Likelihood Ratio Test (LRT) approach discussed by Huang et al. [2], Jung et al. [20] and the references therein. Table 1 shows the resulting p values based on this approach. The LRT approach provides a low p value for Study 3 (p value < 0.05), indicating a potential safety-signal (in this case mortality risk) for this study. Based on the weighted LRT approaches proposed by Huang et al. [2] and Jung et al. [20], the overall p values (combining information from all the studies) are 0.016 and 0.012, respectively. These methods, however, may not be suitable as the dataset contains zero events in studies 2, 5, and 10 (Fig. 1A.2 of Appendix IA).

Table 1 Individual likelihood ratio test (LRT) based p values for the PTX data (Katsanos et al. [15]) with 12 studies

We fit the zero-inflated binomial model with spike-and-slab parametrization to the data and compare the results based on two choices of the prior for the treatment effects: a Normal prior (ZIB + Normal) and a Dirichlet process prior (ZIB + DPP). Table 2 shows the modified odds ratio (MOR) estimates and the posterior probability of MOR being greater than 1. This posterior probability is large for this dataset under both choices of the prior distribution. Figure 1 presents the posterior probability of no safety-signal. In Fig. 2a and b, we plot the posterior probability of a safety-signal, \(P({\delta }_{i}>0|y)\), for each study (i = 1,…,12) based on the two different “slab” distributions, and observe that Study 3 has a noticeably higher posterior probability than the other studies. This finding is similar to what we obtain using the LRT approach [20]. Besides MOR, one may alternatively consider \(\max_{i} P({\delta }_{i}>0|y)\) as a potential measure of a signal.

Table 2 Odds ratio (OR) and modified odds ratio (MOR) estimates for the PTX data (Katsanos et al. [15]), using the ZIB model with spike-and-slab prior for the treatment effect
Fig. 1 Posterior probability of no-signal (\({\delta }_{i}=0\)) based on PTX data (Katsanos et al. [15]). Spike-and-slab formulation was used with different choices of the slab distribution: a Normal; b Dirichlet process prior (DPP). A high value of the posterior probability of \({\delta }_{i}=0\) indicates no safety-signal

Fig. 2 Posterior probability of safety-signal (\({\delta }_{i}>0\)) based on PTX data (Katsanos et al. [15]). Spike-and-slab formulation was used with different choices of the slab distribution: a Normal; b Dirichlet process prior (DPP). A high value of the posterior probability of \({\delta }_{i}>0\) indicates a potential safety-signal for the treatment arm

In order to compare the models and to study the model complexity, we compute the Deviance Information Criterion (DIC) and the model complexity measure \({p}_{D}\) [21] for all the models discussed in this section. DIC is defined as \(DIC=-2\left\{2E\left[\log p(y|\theta )\right]-\log p(y|\widehat{\theta })\right\}\), where \( \theta \) is a vector of model parameters, \( \widehat{\theta } \) is an estimate of \( \theta \), and the expectation is taken with respect to the posterior distribution of \( \theta \). Let \( \theta ^{{(1)}} \),…,\( \theta ^{{(R)}} \) be the draws from the posterior of \( \theta \). The DIC is computed as:

$$ DIC = -2\left\{ \frac{2}{R}\sum _{{r = 1}}^{R} \log p\left( {y|\theta ^{{(r)}} } \right) - \log p\left( {y|\widehat{\theta }} \right) \right\}, $$

where \( \widehat{\theta } \) is a plug-in Bayes estimate of \( \theta \). A model with lower DIC is preferred. Table 3 shows the DIC and \({p}_{D}\) (effective number of parameters in a model) estimates for the different variations of the ZIB model fitted to the data. The model diagnostic measures differ slightly between the two choices of the “slab” distribution. For exploratory purposes, we also fit the models without the spike-and-slab formulation by setting \(\pi =1\) in Eq. (2), i.e., we use a normal or DP prior for the treatment effects. The models without spike-and-slab perform slightly better in terms of DIC. However, unlike the proposed model, these models do not provide a signal detection probability or an overall odds-ratio estimate that accounts for the studies with no treatment effect.
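A minimal sketch of a DIC computation from posterior draws follows, using the standard form \(DIC=-2\log p(y|\widehat{\theta })+2{p}_{D}\) with \({p}_{D}=2\{\log p(y|\widehat{\theta })-\frac{1}{R}\sum_{r}\log p(y|{\theta }^{(r)})\}\). For simplicity it assumes a single binomial observation and takes the posterior mean as the plug-in estimate; the draws and the data values are hypothetical and unrelated to the models in Table 3.

```python
import math

def binomial_loglik(y, n, theta):
    """Log-likelihood of y events out of n with event probability theta."""
    return (math.log(math.comb(n, y))
            + y * math.log(theta) + (n - y) * math.log(1.0 - theta))

def dic_from_draws(y, n, theta_draws):
    """Return (DIC, p_D) using the posterior mean as the plug-in estimate."""
    mean_loglik = sum(binomial_loglik(y, n, t) for t in theta_draws) / len(theta_draws)
    theta_hat = sum(theta_draws) / len(theta_draws)
    loglik_hat = binomial_loglik(y, n, theta_hat)
    p_d = 2.0 * (loglik_hat - mean_loglik)   # effective number of parameters
    dic = -2.0 * loglik_hat + 2.0 * p_d
    return dic, p_d

# Toy posterior draws of a single event probability (hypothetical)
theta_draws = [0.08, 0.10, 0.12, 0.09, 0.11]
dic, p_d = dic_from_draws(y=5, n=50, theta_draws=theta_draws)
```

Because the binomial log-likelihood is concave in \(\theta \), the plug-in log-likelihood at the posterior mean is at least as large as the average log-likelihood over the draws, so \({p}_{D}\) is nonnegative in this example.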

Table 3 Model diagnostics for the PTX data under different modeling approaches; ZIB model with spike-and-slab prior for \({\delta }_{i}\) based on (2), with slab distribution Normal (3) and DPP (4)

Overall, the two choices we consider for the slab distribution in this analysis yield similar results. The ZIB + DPP model provides a slightly wider 95% CI for OR and MOR. The odds ratio (OR) estimates based on frequentist fixed and random effects models with continuity correction are presented in Fig. 1A.2 (in Appendix IA). The proposed approach seems to be more robust, as it accounts for “zero events” in the model without requiring a continuity correction.

Rosiglitazone Dataset

We further analyze the Rosiglitazone dataset from the Nissen and Wolski article [16], which used a meta-analysis to study the rates of myocardial infarction (MI) and cardiovascular (CV) death among patients who received Rosiglitazone vs patients who received placebo. There are 48 studies; Nissen and Wolski [16] excluded six trials with zero total events in both the treatment and control arms. Their meta-analytic estimate of the odds ratio for MI was 1.43 with 95% CI (1.03, 1.98), and the odds ratio for CV death was 1.64 with 95% CI (0.98, 2.74). Subsequently, many articles analyzed these data with a continuity correction and published the results [22, 23]. However, either applying a continuity correction or excluding the trials with zero events from the meta-analysis may introduce bias [24, 25]. To address this issue, Dong et al. [4] employed a frequentist ZIB model to analyze the data; their analysis provided a modified odds ratio for MI of 1.19 with 95% CI (0.95, 1.49) and for CV death of 1.80 with 95% CI (1.30, 2.50). Recently, Muthukumarana et al. [5] proposed a Bayesian extension of this model that accounts for between-trial variability and assumes a DP prior for the treatment effect; they reported a modified odds ratio for MI of 1.45 with 95% CI (1.05, 2.11) and an odds ratio for CV death of 2.21 with 95% CI (1.15, 4.27). Note that, like Nissen and Wolski [16], Muthukumarana et al. [5] considered 42 trials for their meta-analysis. Table 1A of Appendix IA presents a comparison of the published results and the results based on our proposed approach.

Our spike-and-slab approach in the ZIB setup allows us to compute the posterior probability of a safety-signal for each trial. We implement the proposed model and present the probabilities for CV and MI in Figures IB.1 and IB.2 of Appendix IB, respectively. These plots are for the Normal slab distribution; results based on the DPP slab distribution are similar. The plots suggest that none of the trials shows a high probability of a potential signal. This is true for both adverse events, CV and MI. Note that here we considered all 48 studies and did not use continuity corrections.

Table 4 provides the posterior means and 95% credible intervals for OR and MOR. These OR and MOR estimates suggest that there is no strong signal in terms of overall treatment effect for either CV or MI. As mentioned in Sect. 2, MOR is defined as \(OR(1-p)/(1-q)\), where the adjustment factor \((1-p)/(1-q)\) plays an important role. Figures IB.3 and IB.4 (in Appendix IB) present the posterior distributions of OR, MOR and the adjustment factor \((1-p)/(1-q)\) for the CV and MI data, respectively. For the MI analysis, the posterior probability of OR > 1 is approximately 0.7, while the posterior probability of MOR > 1 is approximately 0.8. This difference is due to the right-skewed posterior distribution of the adjustment factor \((1-p)/(1-q)\).

Table 4 Estimate of MOR for the rosiglitazone data (cardiovascular (CV)-related death and myocardial infarction (MI)), based on 48 studies

Tables 5 and 6 provide the DIC and the model complexity measure (\({p}_{D}\)) for the different models fitted to the CV and MI data, respectively. The DICs of the models with and without spike-and-slab are close, while the model with the spike-and-slab prior provides additional information such as the posterior probability of observing a safety-signal.

Table 5 Model diagnostics of different ZIB models for Rosiglitazone CV data, with 48 studies; ZIB model with spike-and-slab prior for \({\delta }_{i}\) based on (2), with slab distribution Normal (3) and DPP (4)
Table 6 Model diagnostics for Rosiglitazone MI data, with 48 studies; ZIB model with spike-and-slab prior for \({\delta }_{i}\) based on (2), with slab distribution Normal (3) and DPP (4)

Discussion

In this article, we describe a Bayesian methodological framework that provides the posterior probability of a potential safety-signal for each study, for a single adverse event, along with a meta-analytic estimate of the overall signal. The proposed methodology is based on a zero-inflated binomial (ZIB) model with spike-and-slab parameterization for the treatment effects, and it accounts for between-study variability. The framework may serve as an alternative to the frequentist LRT approach discussed in Jung et al. [20] when there are zero events in the treatment or the control group. Furthermore, this approach avoids continuity correction for the zero events and can be used instead of fixed and random effects meta-analysis when there are zero events. We illustrated the proposed framework with case studies using published data sets and compared the model performance for different choices of priors for the treatment effect.

The advantages of the Bayesian approach with a spike-and-slab prior, in the multiplicity adjustment context, have previously been discussed in several articles. The mixture prior with a point mass at zero can be a reasonable choice for signal detection, as some adverse events may not be associated with the treatment. Information related to these types of adverse events is routinely collected in clinical trials, and hypotheses related to these events are not typically prespecified [9]. The proposed method produces an overall meta-analytic estimate of the effect size (e.g., odds ratio or risk ratio) of the adverse event, adjusting for the studies with zero treatment effect; however, it is only suitable for analyzing a single adverse event at a time. Therefore, it does not account for the complex relationship (i.e., correlation) between the AEs. This is a potential topic for future research.