Background

Randomized clinical trials (RCTs) are often considered the gold standard for establishing treatment efficacy and safety of a new intervention and are critical for the purposes of establishing the evidence supporting product licensure. The risks associated with the new intervention may not be all known at the time of approval, because safety data are collected from studies that involve a relatively small number of subjects during a relatively short follow-up period of time. Furthermore, RCTs typically involve stringent study designs, employing carefully selected populations, and have been criticized for not providing results that can be generalized to routine clinical practice [1]. In contrast, the use of real-world effectiveness and safety data collected outside of highly controlled traditional randomized trials has received increasing attention in the medical literature [2,3,4]. In response to requirements under the 21st Century Cures Act and the sixth Prescription Drug User Fee Act (PDUFA VI) [5, 6], the FDA is developing the process and guidance for using real-world data (RWD) and its resultant real-world evidence (RWE) to support the assessment of safety and effectiveness in regulatory submissions.

Real-world data, as defined by the U.S. FDA RWE framework [7], are data related to patient health status and/or the delivery of health care collected in the setting of routine clinical care. Examples of RWD include data derived from electronic health records (EHRs), medical claims and billing data (used for administrative purposes), and data from product and disease registries [7]. These data sources often include a large number of patients and can be considered representative of real-world patient populations. Based on the research objectives and questions, the specific target patient population can be defined and the study sample then constructed from RWD [8]. The rich data sources in a real-world setting also promote external validity, which is a commonly raised concern in RCTs because the enrolled participants do not fully represent the patients seen in routine clinical practice. Meanwhile, the increasing development of such large databases, as well as the maturation of information technology, invites their use in health care and clinical decision-making. An acknowledged limitation of real-world databases is that treatment use reflects physician/patient choice rather than randomization. As a result, baseline covariates of treated subjects often differ systematically from those of untreated subjects. Randomization is routinely used in experimental/clinical research to reduce or eliminate systematic differences between treatment groups in baseline covariates that can influence treatment use and/or outcomes (also known as confounding factors).

Considering the respective strengths and limitations of RCT and RWD, there is a growing interest in utilizing data from both sources of evidence to inform regulatory decision-making. Safety evaluation is a continual and iterative process throughout the drug development life cycle and requires long time horizons and large amounts of data to fully understand the safety profile of a medical product. While clinical trials provide high-quality data for an initial assessment of the safety signals of a new drug, they alone cannot fully characterize the safety profile. Post-marketing RWD plays a critical role in further understanding the safety profile of a drug once it has been licensed and is being used in clinical practice. A well-designed observational study (either retrospective or prospective design), together with RCT, can be used to answer broader regulatory safety questions.

There is little methodological work assessing the statistical analysis strategy in conjunction with RCT and observational study design. And, significant challenges remain regarding potential study design and analytic methodologies to determine whether observational studies can reliably generate evidence on the safety and effectiveness of new interventions to inform regulatory decision-making. In this paper, we focus on the integrative analysis of RCTs and observational studies to inform post-marketing safety decision-making. We propose a three-stage statistical analysis strategy that utilizes data fusion techniques (originally developed for computer science) to integrate these two data sources in a cooperative manner, designed to enhance clinical trial results while mitigating several of the limitations associated with observational data. The remainder of this paper is organized as follows. In Section 2, we introduce statistical issues and methods for synthesizing RCT and RWD evidence. Section 3 presents the Cardiovascular Outcome Trial (CVOT) as a motivating example. Section 4 describes the proposed three-stage statistical analysis strategy. Practical considerations and related issues are discussed in Section 5.

Introduction

Meta-analysis is widely used in medical research to derive a pooled effect estimate based upon a synthesis of findings of multiple studies that frequently employ diverse designs [9]. Assuming the included studies are random samples drawn from a hypothetical population of pooled studies, the meta-analysis result can be interpreted as an objective estimation of the mean treatment effect across this hypothetical population [10]. Due to the heterogeneity across studies, this assumption may not hold and random-effect models may be able to provide “an overall summary of what has been learned” and “a quantitative measure of how results differ, above and beyond sampling error” [11]. In meta-analysis of RCTs and observational studies, more complicated models have been proposed to accommodate additional heterogeneity between these two types of studies, including naïve pooling that does not differentiate between-trial designs, inclusion of observational study data as prior information, a power prior approach where information from the observational study is down-weighted to reflect confidence in the study findings, and a three-level hierarchical model where one level of the model accounts for differences in RCT and observational study designs in addition to study level and participant level [12,13,14,15,16]. A more detailed discussion of statistical modeling can be found from the review paper by Schmitz et al. [12] and Efthimiou et al. [15].
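As a minimal sketch of the random-effects idea described above, the following Python code pools study-level estimates with the DerSimonian-Laird moment estimator of between-study variance. The choice of this particular estimator, the function name, and the three log hazard ratios and standard errors are illustrative assumptions, not values or methods taken from the cited studies.

```python
import math

def random_effects_pool(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of study-level estimates
    (for example, log hazard ratios) with their standard errors."""
    w = [1.0 / se ** 2 for se in std_errors]            # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q measures dispersion of the estimates beyond sampling error
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    k = len(estimates)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                  # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, se_pooled, tau2

# Hypothetical inputs: three studies reporting hazard ratios 0.90, 1.10, 1.30
log_hr = [math.log(0.90), math.log(1.10), math.log(1.30)]
se = [0.10, 0.15, 0.12]
pooled, se_pooled, tau2 = random_effects_pool(log_hr, se)
print("pooled HR:", round(math.exp(pooled), 3), "tau^2:", round(tau2, 4))
```

The estimated between-study variance is the "quantitative measure of how results differ, above and beyond sampling error"; when it is zero the result reduces to the usual fixed-effect inverse-variance pool.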

As described, meta-analysis provides a quantitative measure of treatment effect across RCTs and observational studies. However, its applicability in combining these two study types is limited, since subjects in observational studies are typically drawn from a different population than those in RCTs, and the treatment effect may not be the same across those populations [15, 17]. The restrictive procedures used to select RCT participants mean that the RCT population is not a random sample of the general patient population [18]. In particular, the distribution of factors that modify the treatment effect and influence treatment choice in RCTs often differs from that of subjects seen in routine clinical practice. Moreover, the clinical question and trial objective may differ from those of the observational study. The design and conduct of an RCT make its participants more likely to be adherent and less likely to deviate from the prespecified protocol schedule, whereas the observational study is constrained by the data observed under actual practice patterns. Directly combining these two types of studies in a meta-analysis cannot produce estimates with a clear causal interpretation for any reasonable hypothetical population, as meta-analyses combine study-specific effects based on the precision of the estimates rather than their relevance to the target population [17]. Moreover, the design of an RCT is different from that of an observational study, usually having shorter follow-up time, a smaller sample size, a bigger departure from routine clinical practice, and ethical restrictions. When observational study data are included in a meta-analysis, this amplifies concerns about whether the efficacy and safety of the corresponding interventions can be translated into real-world effectiveness and safety [15]. Given the challenge of defining a common clinical question across studies and the large heterogeneity (including unexplained inconsistency) of treatment effect, meta-analysis may be inappropriate.

An alternative approach is to present RCT and observational study results side by side, where the observational study is considered to complement RCTs or address some of their limitations [17]. There is a stronger scientific justification for deriving evidence of a drug effect from RCTs than from observational studies [7]. The data produced from observational studies can be a useful complement to those generated by RCTs and help to establish real-world effectiveness, harm, use, and value of treatments in a broad population of patients from routine care [19, 20]. In the literature, there are examples where RCTs and observational studies have reached similar conclusions about a treatment effect, and there are also examples where an effect identified in an observational study is discordant with the previously characterized effect (effect size, direction, or magnitude) [7]. Observational study results that are concordant with the RCT can provide regulators with greater confidence in the new drug application review and approval process, whereas discordant results could warrant deeper reexamination of the RCT or observational study to identify the reasons for the discordance, and additional research may be necessary [3, 21]. This approach has been used in regulatory approvals. For example, an FDA advisory meeting was held in January 2019 to discuss the new drug application for Sotagliflozin as an adjunct to insulin therapy to improve glycemic control in adults with type 1 diabetes mellitus (T1DM). Concern about the risk of diabetic ketoacidosis (DKA), a life-threatening complication caused by a lack of insulin in the body, with the use of Sotagliflozin for treatment of T1DM was raised because RCTs showed an increased risk of DKA with Sotagliflozin compared with placebo [22]. The U.S. FDA presented both RCT and FDA Sentinel data to support the review; analyses with Sentinel data showed that DKA rates observed in off-label use were higher than expected based on RCTs.

The complementary approach provides a qualitative way to review these two data sources together and draw joint findings, but it does not provide an integrative quantitative assessment that accommodates both types of data sources. Furthermore, it is still questionable whether or not the observational study is sufficiently credible, interpretable, and ultimately acceptable for the regulatory purpose [16]. Due to the lack of randomization and the presence of confounding bias in an observational study, statistical methods have been developed to adjust for confounding bias and conduct causal inference. The commonly used methods include multivariable risk models, propensity score adjustment, and instrumental variable analysis [23, 24]. The propensity score for a subject, defined as the conditional probability of being treated given the subject’s covariates, can be used to balance the covariates between treatment groups and thus reduce estimation bias. In the absence of random assignment (which is typical of RWD study designs), propensity scoring is a tool designed to mitigate inter-group differences in important known baseline covariates. It is important to note that propensity score methods work best in large datasets in which one can obtain a reasonable overlap of confounding factors between treatment groups. In addition, the propensity score will not adjust for the impact of unknown or unmeasured confounders that are not included in the propensity score model [23, 25]. An instrumental variable is a third variable that influences the explanatory (treatment) variable but affects the outcome only through that variable. Instrumental variable methods are not based on the hard-to-verify assumption of “no unmeasured confounding,” but require identification of a valid instrumental variable that is correlated with treatment status and does not independently affect the outcomes of interest (except through a treatment effect). Proper instrumental variables are notoriously difficult to identify, so propensity score adjustment may be the only practical choice available. The causal estimation may also be compromised by measurement bias, i.e., when data items are measured with error. Assessment of data sources (including completeness, consistency, and trends over time) is needed to make sure that the medical codes or combinations of codes, such as the World Health Organization’s International Classification of Diseases (ICD) codes and the ICD-9-CM codes used in vaccine safety research, adequately capture the underlying medical concepts they are intended to represent [7, 26].
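To make the propensity score idea concrete, the sketch below uses inverse probability of treatment weighting with a single categorical covariate, so that the propensity score is simply the observed proportion treated within each stratum. The data, the function names, and the choice of weighting (rather than matching or stratification) are illustrative assumptions; a real analysis would model the score on many covariates and check overlap.

```python
from collections import defaultdict

def crude_risk_difference(records):
    """Unadjusted risk difference (treated minus untreated)."""
    risk = {}
    for arm in (0, 1):
        outcomes = [y for _, t, y in records if t == arm]
        risk[arm] = sum(outcomes) / len(outcomes)
    return risk[1] - risk[0]

def ipw_risk_difference(records):
    """records: (stratum, treated, outcome) tuples, treated/outcome in {0, 1}.
    Propensity score = observed P(treated | stratum)."""
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for s, t, _ in records:
        n[s] += 1
        n_treated[s] += t
    ps = {s: n_treated[s] / n[s] for s in n}
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for s, t, y in records:
        p = ps[s]
        if p in (0.0, 1.0):
            continue  # no overlap in this stratum, so it cannot be weighted
        weight = 1.0 / p if t == 1 else 1.0 / (1.0 - p)
        num[t] += weight * y
        den[t] += weight
    return num[1] / den[1] - num[0] / den[0]

def make(stratum, treated, n, events):
    return [(stratum, treated, 1)] * events + [(stratum, treated, 0)] * (n - events)

# Hypothetical data: severe patients are treated more often, and the outcome
# risk depends on severity only (50% if severe, 10% if mild), not on treatment.
records = (make("severe", 1, 80, 40) + make("mild", 1, 20, 2)
           + make("severe", 0, 20, 10) + make("mild", 0, 80, 8))
print("crude:", round(crude_risk_difference(records), 3))
print("IPW:  ", round(ipw_risk_difference(records), 3))
```

In this constructed example the crude comparison suggests a harmful effect that is entirely due to confounding by severity, and weighting removes it. The adjustment works only because severity is measured; an unmeasured confounder would leave the bias in place, as noted above.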

Addressing residual confounding and other biases in causal estimation is critical to improving the validity of observational studies and subsequently drawing causal conclusions with both RCTs and observational studies. Current gaps or perceived obstacles to increasing the use of RWD include insufficient confidence in data sources and difficulty integrating them with clinical data [27]. Although there has been considerable growth in the development of advanced causal inference methods in the past decades, it is still not guaranteed that causal inference methods for observational studies can address all of the potential bias and achieve the same level of evidence as RCTs. The credibility of observational studies can be increased at the design and/or analysis stage of the study. It is best to address the potential for bias early, as statistical methods cannot replace a good research design. Since a well-designed RCT can provide high-quality evidence to establish causality, it can provide the foundation for framing observational studies. Several observational studies have successfully emulated the target trial using RWD and demonstrated that this approach can help avoid common methodological pitfalls and explain between-study differences [28,29,30,31,32]. Systematic comparisons of matched observational studies that emulate RCTs as closely as possible in terms of populations, exposure, and outcomes can provide confidence that the observational study design and analysis plan are sufficiently valid to draw causal conclusions with RWD [33].

Motivating Example

Diabetes mellitus is a metabolic disorder characterized by the presence of hyperglycemia (elevated blood glucose) due to defective insulin secretion, insulin action, or both. Over time, poorly controlled serum glucose damages multiple organ systems and elevates the risk for cardiovascular (CV) disease. Numerous antidiabetic drugs have been approved in Europe and the USA to control serum glucose. Although the correlation between poor glycemic control and cardiovascular risk is clear, it has proven difficult to demonstrate a causal relationship between improved diabetes control and reduced cardiovascular risk in type 2 diabetes mellitus (T2DM) [34]. Concerns about this gap in the assessment of CV safety grew after two highly controversial meta-analyses of CV risk were published [35, 36]. Given these concerns and the prevalence of CV disease in diabetic patients, both the U.S. FDA (in 2008) and the European Medicines Agency (EMA, in 2012) issued guidance for industry to address CV risk in new antidiabetic therapies to treat T2DM [37, 38]. In particular, the U.S. FDA advised that pre-marketing phase 2 and phase 3 trials should rule out an 80% excess risk (upper bound of the 2-sided 95% confidence interval for the rate ratio less than 1.8) for major adverse cardiovascular events (MACE). If the upper bound of the 95% confidence interval is between 1.3 and 1.8, a dedicated post-marketing cardiovascular outcome trial (CVOT) will be required to rule out a 30% excess risk after approval. The EMA guideline is similar to the FDA guidance but does not prospectively define any pre- or post-approval CV event risk margins.
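The 1.3 and 1.8 margins can be illustrated with a short calculation. The sketch below assumes a ratio estimate reported on the log scale with a standard error and a normal-approximation 95% confidence interval; the function names, the labels returned, and the numeric inputs are hypothetical and are not taken from the guidance.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def upper_bound(log_ratio, se):
    """Upper limit of the two-sided 95% confidence interval for the ratio."""
    return math.exp(log_ratio + Z_975 * se)

def classify_cv_risk(log_ratio, se):
    """Compare the upper confidence limit with the 1.3 and 1.8 margins."""
    ub = upper_bound(log_ratio, se)
    if ub < 1.3:
        return "below 1.3"
    if ub < 1.8:
        return "between 1.3 and 1.8"
    return "1.8 or above"

# Hypothetical estimates: the same point estimate (ratio 1.0), three precisions
for se in (0.10, 0.20, 0.35):
    print(se, round(upper_bound(0.0, se), 3), classify_cv_risk(0.0, se))
```

With the point estimate held at 1.0, the classification is driven entirely by the standard error, which is why the number of accrued CV events determines whether a margin can be excluded.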

After the publication of the 2008 U.S. FDA guidance, every novel antidiabetic agent approved has undergone a dedicated CVOT, typically involving 5,000–15,000 people with type 2 diabetes and high CV risk and planned to last 3–5 years [34]. None of the CVOTs completed and published to date has identified an increased risk of CV events; some have instead demonstrated a reduced risk [39]. On October 24, 2018, the FDA held a 2-day Advisory Committee meeting to review the utility and impact of the 2008 guidance. The issues addressed by the FDA advisory board were as follows: (1) the impact of the recommendations in the 2008 guidance on the assessment of CV risk for drugs indicated to improve glycemic control in patients with T2DM; (2) the transferability of CV safety findings from members of a drug class that were studied to all drugs in the class (class effect); and (3) whether an unacceptable increase in CV risk needs to be excluded for all new drugs to improve glycemic control in patients with T2DM, regardless of the presence or absence of a signal for CV risk in the development program [40]. The panel made several recommendations for future regulatory guidance and CV outcome trials regarding antidiabetic therapies, including requiring only the 1.3 non-inferiority margin for regulatory approval, conducting trials for longer durations, considering head-to-head active comparator trials, increasing the diversity of patient populations, collecting safety data beyond cardiovascular events, and identifying ways to improve translation of trial results to general practice. Following the discussions at the Advisory Committee meeting, the U.S. FDA revisited the recommendations of the 2008 guidance and published new draft guidance in March 2020 [41]. The new draft guidance recommends that sponsors enroll a broader range of patients with comorbidities and diabetes-associated conditions, including patients with chronic kidney disease and older patients. It also removes the stringent criterion that required sponsors to uniformly rule out a specific degree of risk margin for CV adverse outcomes (as recommended in the 2008 guidance) and focuses instead on meaningful and reliable estimation of risk. However, the question remains of how to rule out excessive CV risk if a clinical trial shows some degree of CV risk for a new antidiabetic treatment. In the new draft guidance, the importance of substantial safety information to support an indication for glycemic control is still well recognized, and the U.S. FDA continues to support CVOTs for patients with diabetes and intends to work with sponsors to find efficient designs. Given the increasing availability of RWD supported by sophisticated statistical analytic tools, along with regulatory interest in the use of such data to inform regulatory decision-making, observational study data could be an appealing alternative for a post-marketing CVOT.

Statistical Analysis Strategy

To address the acknowledged limitations of RCTs and RWD, we explore the utility of data fusion techniques that were originally developed for computer science to integrate these two data sources in a cooperative manner. By definition, data fusion is the process of integrating multiple datasets collected under heterogeneous conditions (i.e., different populations, regimes, and sampling methods) to produce more consistent, accurate, and useful information than that provided by any individual dataset in isolation [42, 43]. It considers data integration as transporting heterogeneous datasets from one experimental condition to another via a causal diagram. Bareinboim and Pearl have derived a theoretical framework for causal analysis in combined experimental and observational datasets and extended it to clinical research settings [43]. In this manuscript, we present a novel statistical analysis strategy that applies data fusion to the integration of RCT and observational study evidence. Based on the relationship between the data sources, data fusion classifies the analysis methods into three categories: complementary fusion, where the data sources are not directly dependent on each other, represent non-overlapping information, and can be combined to give a more complete picture of the object under observation; competitive fusion (also called redundant fusion or meta-analysis), where data sources provide independent and overlapping information about the same target and can thus be fused to increase the confidence and robustness of a system; and cooperative fusion, where the data sources are made dependent on each other (e.g., one source of information is used to guide the search for new observations from another source) and eventually create new information that is more complex and not available from the original information. These three categories of data fusion methods are not mutually exclusive. Many applications implement aspects of more than one of the three types. The first two methods have been discussed in Section 2. When an RCT and an observational study are conducted on different populations and under different sets of conditions, cooperative fusion aims to exploit design distinctions among the available studies and synthesize an aggregate measure of “targeted” effect size instead of “averaging out” differences. Using the cooperative fusion idea to integrate RCT and RWD evidence, we propose a three-stage statistical analysis strategy: (1) feasibility analysis that uses an existing RCT to validate and plan an observational study, applying the estimand framework and emulating the RCT with RWD; (2) integrative analysis that combines evidence from the RCT and observational study data cooperatively; and (3) sensitivity analysis that examines the consistency of the previous analyses. Figure 1 summarizes the working process of the proposed statistical analysis strategy.
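The notion of "transporting" an effect from one condition to another can be written compactly. As a sketch, one basic form of the transport formula in the Bareinboim and Pearl framework is given below. The notation is introduced here for illustration: P denotes the study (RCT) population, P* the target population, x the treatment, y the outcome, and Z a set of measured covariates assumed to account for all effect-relevant differences between the two populations.

```latex
P^{*}\bigl(y \mid do(x)\bigr) \;=\; \sum_{z} P\bigl(y \mid do(x),\, z\bigr)\, P^{*}(z)
```

Under that assumption, the stratum-specific causal effects estimated in the trial are reweighted by the covariate distribution of the target population, which yields a "targeted" effect rather than an average over studies. Whether a suitable Z exists is a causal assumption that must be justified from the causal diagram and cannot be verified from the data alone.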

Fig. 1 Flow chart of proposed statistical analysis strategy

Feasibility Analysis

Feasibility analysis is necessary before fully considering an observational study for regulatory purposes. It examines whether the data source is fit for use and whether the study design is sufficient to answer the intended research questions. To ensure the reliability and relevance of RWD submitted for use in regulatory decisions, FDA requires an assessment of whether the data capture relevant information on exposure, outcomes, and covariates [7]. Moreover, whether the observational study can provide credible causal estimation will need to be verified. The recently published EMA draft guideline on registry-based studies specifies that analyses of potential information bias, selection bias, and confounding bias should be submitted either separately or as part of the proposed protocol [44]. An RCT can provide an excellent venue for the evaluation of an observational study. Although there is significant existing research on the validity of observational studies, the most credible approach to evaluating their utility to inform regulatory decision-making is to compare the findings of such studies with the findings of RCTs addressing similar clinical questions, under the assumption that the RCTs represent a reference standard for the underlying true treatment effect [32]. As an example, one study used records of trial participants linked to Medicare claims data to assess how well cardiovascular events ascertained by algorithms applied to Medicare data corresponded to those from the original Women’s Health Initiative (WHI) trial, which was conducted with protocol-driven data collection and physician adjudication [45]. Agreement between the two data sources was quite high, and the resultant hazard ratios and 95% confidence intervals were quite comparable. These encouraging results provided the needed support to launch a new embedded pragmatic trial that will rely heavily on Medicare claims to ascertain outcomes data. We propose the following three steps of feasibility analysis.

The first step of feasibility analysis is to emulate the target trial with RWD. Recent advances in comparative effectiveness research methods can help us design an observational study that mimics an RCT [46]. Hernán and Robins developed a framework for comparative effectiveness research using large observational databases to emulate the target trial [30]. Several observational studies have successfully emulated the target trial with RWD and demonstrated that this approach can help avoid common methodological pitfalls and explain between-study differences [28, 29]. In the process of designing an observational study to emulate an existing RCT, the clinical question of the study can be framed and the study design can be adjusted to match the RCT as closely as possible. The estimand framework, described in the ICH E9(R1) Guideline, can be applied to frame the estimand of the observational study [47, 48]. So that the data can be summarized together with the RCT, the estimand of the observational study should be aligned with the RCT estimand, which forms the basis of the feasibility analysis. The five estimand attributes should be carefully examined by a multi-disciplinary team: (1) treatment condition of interest; (2) population of patients targeted by the clinical question; (3) variable (or endpoint) to be obtained for each patient that is required to address the clinical question; (4) intercurrent events; and (5) population-level summary [47]. Similar to the five estimand attributes, data fusion considers four dimensions in transporting findings across studies with heterogeneous conditions (population, observational/experimental, sampling, measure) [43].

Completely replicating an RCT is not possible. One important reason is that the RCT population is not a random sample from the target population. Differences in the distribution of patient baseline characteristics may cause the causal estimates to vary between studies. Table 1 provides an extreme hypothetical example of such differences. A direct comparison of the treatment effect from the RCT and from an observational study matched to the RCT design, in the upper portion of Table 1, suggests a lower treatment effect estimate from the observational study than from the RCT. However, when the comparison is stratified by baseline disease severity, as in the lower portion of Table 1, the response rate for each treatment (A and B) is revealed to be the same in the RCT and the observational study. The difference observed in the naïve comparison between the RCT and the observational study illustrates that heterogeneity in patient population between studies can be difficult to account for. Thus, the second step is to match the distribution of the patient population between the observational study and the RCT data. If balance of the patient population cannot be achieved after adjusting the observational study design, the assessment of agreement between RCT data and RWD is not feasible. Similarly, if balance is achieved only in a very small subset of the RWD population, the analysis is again infeasible in the given RWD source. The third step of feasibility analysis is to compare the causal estimates from the RCT and the matched observational study. If treatment assignment in the observational study were unconfounded, the treatment effects in the two studies would be expected to be similar. Differences in the estimated causal effects between the two studies can be considered evidence of unobserved confounders.

Table 1 A hypothetical example of the observed difference in treatment effect estimation due to differences in patient baseline characteristics
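The following sketch reproduces the kind of pattern described for Table 1 using made-up counts; these are not the values of Table 1, and the dictionary layout and function names are assumptions for illustration. Response rates within each severity stratum are identical in the two studies, but the RCT enrolls mostly mild patients and the observational study mostly severe patients.

```python
# (responders, patients) by study, treatment, and baseline severity.
# All counts are hypothetical.
data = {
    "RCT": {"A": {"mild": (64, 80), "severe": (8, 20)},
            "B": {"mild": (48, 80), "severe": (6, 20)}},
    "OBS": {"A": {"mild": (16, 20), "severe": (32, 80)},
            "B": {"mild": (12, 20), "severe": (24, 80)}},
}

def pooled_rate(arm):
    """Response rate ignoring severity."""
    return sum(r for r, _ in arm.values()) / sum(n for _, n in arm.values())

def stratum_rate(arm, stratum):
    """Response rate within one severity stratum."""
    responders, patients = arm[stratum]
    return responders / patients

for study, arms in data.items():
    diff = pooled_rate(arms["A"]) - pooled_rate(arms["B"])
    print(study, "pooled A - B:", round(diff, 2))
    for s in ("mild", "severe"):
        d = stratum_rate(arms["A"], s) - stratum_rate(arms["B"], s)
        print("   ", s, "A - B:", round(d, 2))
```

With these counts the pooled difference in response rate is 0.18 in the RCT and 0.12 in the observational study, while the stratum-specific differences are 0.20 (mild) and 0.10 (severe) in both. The apparent discrepancy is produced entirely by the different severity mix, which is what the second feasibility step is meant to detect and remove.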

Comparing the findings from the RCT and the matched observational study shows whether the data, estimand, study design, and analysis plan are sufficiently valid to detect the known causal associations in the RCT. If no substantial inconsistencies are found in the matched observational study, one can proceed with the integrative analysis of the RCT and matched observational study. Disagreements might occur due to residual confounding, important differences between routine clinical practice and treatment in the RCT, or systematic differences in matching the observational study with the RCT. If inconsistent causal effects are seen, further examination of their magnitude and of the reasons for such differences is warranted to gain a broader understanding of the RWD and of whether the observational study can be used to support regulatory safety decision-making.
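One simple way to quantify the comparison is a z-test on the difference between the two log-scale estimates, sketched below. This particular test, the assumption that the two studies are independent, the function name, and the numeric inputs are illustrative choices and are not prescribed by the strategy itself.

```python
import math

def compare_log_estimates(log_rct, se_rct, log_obs, se_obs):
    """Difference between two independent log-scale effect estimates,
    with its z statistic and two-sided normal p-value."""
    diff = log_obs - log_rct
    se_diff = math.sqrt(se_rct ** 2 + se_obs ** 2)
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return diff, z, p_value

# Hypothetical hazard ratios: 1.00 in the RCT, 1.05 in the matched study
diff, z, p_value = compare_log_estimates(math.log(1.00), 0.10,
                                         math.log(1.05), 0.08)
print("ratio of HRs:", round(math.exp(diff), 3), "p:", round(p_value, 3))
```

A non-significant difference does not by itself establish agreement, particularly when either study is imprecise, so the magnitude of the difference and its confidence interval should be examined alongside the p-value.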

Integrative Analysis

In this section, we describe two applications that combine RCT data and RWD once the observational study has been validated via feasibility analysis. If the matched observational study can be shown to match the RCT estimand, meta-analysis can be applied to combine the RCT data and the matched observational study data. Moreover, the validated observational study design can then be extended more reliably to a broader patient population that was not previously studied in the RCT. Other applications are also possible but will not be discussed in this paper.

The first application is meta-analysis of the RCT and the matched observational study. As we discussed in Section 2, in the absence of a common target population, assessing causality when directly combining RCT and observational study findings in a meta-analysis may not be feasible. By matching the observational study with the RCT, the clinical question of the study can be framed as a question similar to that of the RCT. Meanwhile, the study design and patient population are similar across the matched observational study and the RCT. This meta-analysis approach can provide a quantitative measure of the pooled causal effect across the RCT and observational study, focused on the RCT population. As discussed in Section 3 with respect to the CVOT example, sponsors need to compare the incidence of major CV events occurring with the investigational agent to that in the control group to show that the upper bound of the 2-sided 95% confidence interval for the estimated risk ratio is less than 1.8 for the pre-marketing submission. However, the number of CV events accrued during a typical phase 2 or 3 development program is usually insufficient to provide high statistical confidence. Thus, sponsors have typically proposed to conduct a CV meta-analysis that includes completed phase 2 and phase 3 studies [49]. Although the 2020 U.S. FDA draft guidance removed the uniform degree of risk for CV events, it is still not clear how to exclude an unacceptable increase in CV or other identifiable risks associated with the new agent if the data lack precision. To answer this question, the meta-analysis of an RCT and matched observational study could be performed to increase the precision of estimation and exclude a much lower degree of CV risk (such as the 1.3 criterion for the post-marketing setting defined in the 2008 U.S. FDA guidance).
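The gain in precision can be sketched with the power prior approach mentioned in Section 2, in which the observational estimate is down-weighted by a factor between 0 and 1. The code below assumes normal approximations on the log scale and a flat initial prior; the weight `a0`, the function names, and all numeric inputs are hypothetical, and the choice of `a0` would in practice need to be prespecified and justified.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def power_prior_pool(log_rct, se_rct, log_obs, se_obs, a0):
    """Posterior mean and SD of the log ratio when the observational
    likelihood is raised to the power a0 (0 = ignore, 1 = full weight)."""
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    w_rct = 1.0 / se_rct ** 2
    w_obs = a0 / se_obs ** 2        # down-weighted observational precision
    mean = (w_rct * log_rct + w_obs * log_obs) / (w_rct + w_obs)
    sd = math.sqrt(1.0 / (w_rct + w_obs))
    return mean, sd

def upper_limit(mean, sd):
    """Upper limit of the 95% interval on the ratio scale."""
    return math.exp(mean + Z_975 * sd)

# Hypothetical estimates: RCT ratio 1.05 (SE 0.20), matched study 1.00 (SE 0.08)
for a0 in (0.0, 0.5, 1.0):
    mean, sd = power_prior_pool(math.log(1.05), 0.20, math.log(1.00), 0.08, a0)
    print(a0, round(math.exp(mean), 3), round(upper_limit(mean, sd), 3))
```

In this example the RCT alone cannot exclude the 1.3 margin, whereas adding the matched observational study at half weight can. Setting `a0` to 1 reproduces the fixed-effect inverse-variance pool, and setting it to 0 returns the RCT result unchanged.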

As described, the meta-analysis approach is only applicable when the analysis population can be properly defined. When performing integrative analysis of an RCT and the matched observational study under the meta-analysis framework, we can only provide answers for the RCT population, which may not be generalizable to the RWD population. The second application is to generalize RCT findings. As illustrated in Fig. 1, the RCT population can differ substantially from the RWD population. In particular, RCTs generally include only patients who are adherent to therapy, monitoring, and follow-up, which is not typical of real-world clinical practice. For example, current CVOTs have typically studied patients with advanced CV risk or already established CV disease to ensure accrual of sufficient events in a timely manner and sufficient statistical power to detect CV risk differences [50, 51]. Such patients cannot be entirely representative of the general population, and therefore there is less certainty that the new treatment will improve CV outcomes for patients with a shorter duration of diabetes or without established CV complications. Studies in lower-risk populations could determine whether diabetes medications offer CV protection for those who do not yet have CV disease [50]. Leveraging the findings established by feasibility analysis, the design of the matched observational study can be modified to reflect patients who are not well represented in the clinical trial. Since the study design and its causal relationship have been examined against the corresponding RCT, the extended observational study (with, for example, broader inclusion/exclusion criteria or longer follow-up times) could yield more credible results. In this sense, the RCT is used to validate the RWD data source and observational study design, and it may not be necessary to combine its data with RWD.
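A related and simpler calculation shows why the population mix matters when generalizing. The sketch below reweights stratum-specific effects by the stratum shares of a target population (direct standardization). The strata, risk differences, and population shares are hypothetical, and the calculation assumes that every target stratum is represented in the source study and that stratum-specific effects carry over unchanged.

```python
def standardized_effect(stratum_effects, target_shares):
    """Average stratum-specific effects using the stratum distribution
    of the target population (shares must sum to 1)."""
    if abs(sum(target_shares.values()) - 1.0) > 1e-9:
        raise ValueError("target shares must sum to 1")
    missing = set(target_shares) - set(stratum_effects)
    if missing:
        raise ValueError(f"no effect estimate for strata: {sorted(missing)}")
    return sum(stratum_effects[s] * target_shares[s] for s in target_shares)

# Hypothetical risk differences (treatment minus control) by CV history
effects = {"established CVD": -0.03, "no established CVD": -0.01}
trial_mix = {"established CVD": 0.8, "no established CVD": 0.2}
routine_care_mix = {"established CVD": 0.3, "no established CVD": 0.7}

print("trial population:  ", round(standardized_effect(effects, trial_mix), 4))
print("routine-care population:", round(standardized_effect(effects, routine_care_mix), 4))
```

When a stratum is absent from the trial altogether, no reweighting can supply its effect, and this is the situation in which the extended observational study described above is needed.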

Sensitivity Analysis

Despite attempts to emulate the RCT design as closely as possible, differences between the RCT and the corresponding RWE study populations remain. When emulating an RCT with RWD, a number of compromises will have to be made, for example in the eligibility criteria, endpoint definitions, treatment strategies to be compared, and follow-up time [30]. Given the constraints of the RWD, the design and analysis strategy must be adjusted to match the target RCT. We then choose the design that is closest to the target trial and can best answer the clinical question of interest. The differences between the matched observational study and the corresponding RCT should be documented and explored by evaluating alternative approaches to controlling the remaining confounding and its resultant bias. By altering some aspects of the study design and analysis, the sensitivity analysis can assess whether possible variations would lead to different results. This is aligned with the ICH E9 (R1) addendum for clinical trials and the EMA draft guideline on registry-based studies [44, 47]. For example, death is recorded in Medicare claims but the cause of death is not [32]. The primary endpoint of a CVOT is typically a composite of CV death, non-fatal MI, and non-fatal stroke. In addition, some CVOTs added hospitalization for unstable angina and heart failure to the primary endpoint. It is expected that a significant proportion of all-cause deaths may be CV deaths in diabetic patients with high CV risk [32]. To match the observational study with the existing CVOT, it is possible to include all-cause death in the primary endpoint. However, the assumptions about the cause of death should be further explored via a sensitivity analysis with all-cause death excluded. The cause of death may be ascertained via linked data from EHRs or the National Death Index (NDI). In this case, the robustness of the linked data can also be examined by sensitivity analysis.
Through extensive sensitivity analyses, the robustness of the findings can be explored to identify the range of plausible effect estimates as the untested assumptions are varied.
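As one possible illustration of such a range, the sketch below varies the assumed fraction of all-cause deaths that are CV deaths in each arm and recomputes the risk ratio for the composite endpoint. This is a deliberately simplified, hypothetical calculation: the counts are invented, the function name is ours, and it ignores confounding adjustment, censoring, and patients who experience both a non-fatal event and death.

```python
from itertools import product


def composite_risk_ratio(treated, control, p_cv_treated, p_cv_control):
    """Risk ratio for non-fatal events plus an assumed fraction of deaths counted as CV deaths."""
    events_t = treated["nonfatal"] + p_cv_treated * treated["deaths"]
    events_c = control["nonfatal"] + p_cv_control * control["deaths"]
    return (events_t / treated["n"]) / (events_c / control["n"])


# Hypothetical arm-level counts (not from any real data source).
treated = {"n": 5000, "nonfatal": 120, "deaths": 80}
control = {"n": 5000, "nonfatal": 110, "deaths": 90}

# Assumed fractions of all-cause deaths that are CV deaths, varied per arm.
grid = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
estimates = {
    (pt, pc): composite_risk_ratio(treated, control, pt, pc)
    for pt, pc in product(grid, grid)
}

primary = estimates[(1.0, 1.0)]  # all-cause death used as the endpoint component
low, high = min(estimates.values()), max(estimates.values())
print(f"primary RR {primary:.2f}; range over assumptions {low:.2f} to {high:.2f}")
```

A wide range, as in this invented example, would indicate that conclusions depend heavily on the cause-of-death assumption and that linked cause-of-death data would be worth obtaining.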

Discussion

In this paper, we discussed the utility of RWD for post-marketing safety decision-making, in conjunction with RCT data. The observational study can provide complementary evidence for the RCT in several dimensions, such as routine clinical practice, a broader patient population, and longer-term follow-up. If the questions that both studies are trying to answer are clinically identical or similar, meta-analysis methods can be used to synthesize study findings. However, there are still considerable obstacles to increasing the use of RWD, such as insufficient confidence in the validity of RWD, the causal interpretation of observational studies, and the integration of these studies with RCTs. We proposed a three-stage statistical analysis strategy in this paper: a feasibility analysis that uses an RCT to validate an observational study, an integrative analysis that combines evidence from an RCT and an observational study, and a sensitivity analysis that examines the consistency of the previous analyses. The proposed approach allows us to validate the causal effects in an observational study by matching the study with an RCT in all dimensions, such as the patient population distribution, treatment strategy, and outcome. The subsequent integrative analysis is feasible and interpretable only if the observational study is consistent with the RCT in causal estimation for a target population. Finally, the sensitivity analyses provide an additional examination of the robustness of the findings by exploring assumptions in the study design and analysis. A potential application of the proposed approach to post-marketing CVOTs that address the CV risk of antidiabetic therapies has been discussed.

Although there are some recent examples where RWD was successfully used to emulate an RCT, not every RCT can be closely emulated by RWD. For example, the treatment regimens in RCTs are often highly structured and may be difficult to replicate in normal practice settings. It has also been argued that RCTs do not accurately reflect the real-world circumstances under which patients are treated, which limits the applicability of RCT results to real life [52, 53]. To yield meaningful data, RCTs must adapt and evolve so that the knowledge they generate is more clinically relevant and useful. Such differences in study settings contribute to different treatment effect estimates from an RCT and the matched observational study. Furthermore, the results of a matched observational study are not always consistent with the RCT even when the RWD can successfully emulate the RCT and provide valid causal estimation. There have been several instances in which the efficacy and/or safety of marketed drugs differed substantially from that assessed in RCTs. Such phenomena can be considered a problem of variability in drug response (efficacy or toxicity), due to either biological sources (e.g., previously unidentified patients’ genetic biomarkers and other intrinsic/extrinsic factors) or behavioral sources (e.g., inappropriate prescribing and drug handling, poor patient adherence) [54]. Overall, variability is expected to be higher in a real-world setting than under clinical trial conditions, which may reduce the likelihood of detecting a true effect in the matched observational study. For these reasons, assessing concordance between RCTs and observational studies requires the application of quantitative methods that are interpreted and informed by medical judgment.
Several statistical methods may be applied to assess agreement between RCT and observational study results, such as estimation agreement, standardized difference, and difference-in-difference [32, 55]. As these methods are applied to assess concordance between RCT and observational study results, there is the opportunity to further refine them within the setting of medical judgment. Nevertheless, comparing an RCT with a matched observational study provides insight into whether and to what extent differences in treatment effect estimates are due to bias related to the study setting, patient population, treatment strategy, clinical practice, randomization, and/or other differences between the RCT and the observational study. The additional sensitivity analysis provides further exploratory insights and generates hypotheses for future studies.
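Two of these agreement measures can be sketched as follows. The definitions used here are assumptions made for illustration and may differ in detail from those in the cited works: estimation agreement is taken to mean that the observational point estimate falls within the 95% confidence interval of the RCT estimate, and the standardized difference is taken as the difference in log-scale estimates divided by the square root of the sum of their variances. The function names and all numeric inputs are hypothetical.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a 2-sided 95% interval


def estimate_agreement(log_est_obs, log_est_rct, se_rct):
    """True if the observational estimate lies inside the RCT 95% confidence interval."""
    lower = log_est_rct - Z_975 * se_rct
    upper = log_est_rct + Z_975 * se_rct
    return lower <= log_est_obs <= upper


def standardized_difference(log_est_obs, se_obs, log_est_rct, se_rct):
    """Difference in log-scale estimates, scaled by its standard error."""
    return (log_est_obs - log_est_rct) / math.sqrt(se_obs ** 2 + se_rct ** 2)


# Hypothetical ratio estimates and standard errors on the log scale.
log_rct, se_rct = math.log(0.87), 0.07
log_obs, se_obs = math.log(0.82), 0.05

agrees = estimate_agreement(log_obs, log_rct, se_rct)
z_diff = standardized_difference(log_obs, se_obs, log_rct, se_rct)
print(f"estimate agreement: {agrees}")
print(f"standardized difference: {z_diff:.2f} "
      f"(within +/-1.96: {abs(z_diff) < Z_975})")
```

Neither measure establishes the absence of bias on its own; a small standardized difference can simply reflect imprecise estimates, which is why these measures are best read alongside clinical judgment.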

Previous studies comparing the results of RCTs and observational studies suggested that the lack of concordance can be attributed to differences in the populations being investigated or to bias in the observational studies resulting from the lack of randomization [56]. The proposed approach may rule out possible differences in study populations and allow a closer assessment of bias due to imbalanced unmeasured confounding between treatment arms in the observational study. If the feasibility analysis indicates that the matched observational study cannot provide estimates consistent with the RCT and no clinical rationale can explain such discordance, a randomized study will need to be conducted instead. This type of randomized study, such as a pragmatic randomized controlled trial (pRCT) or registry-based randomized controlled trial (R-RCT), which combines the advantages of a prospective RCT design with real-world features, could be seen as an expanded use of RWD that meets regulatory criteria [27]. Nevertheless, the proposed approach could provide a feasibility assessment for conducting such a real-world trial and also inform the trial design. Such a staggered approach to developing an appropriate study type gives drug developers and regulators flexibility in evaluating which study is most appropriate to answer post-marketing safety questions, while still maintaining the highest standard of evidence to support regulatory decision-making.