Introduction

The growing, global pandemic of type 2 diabetes (T2DM) is a major public health concern. Randomized clinical trials (RCTs) have convincingly shown that lifestyle interventions consisting of exercise and diet behavioral modifications are highly efficacious in preventing or delaying the onset of T2DM for those at risk (Knowler et al. 2002; Norris et al. 2005; Pan et al. 1997; Tuomilehto et al. 2001). A critical next step in stemming this epidemic is to translate interventions developed in rigorously controlled clinical trials into everyday settings. A number of translational diabetes prevention initiatives have yielded promising results, although outcomes vary considerably and the magnitude of risk reduction is generally smaller than that achieved in the landmark clinical trials (Cefalu et al. 2016; Dunkley et al. 2014; Wareham 2015).

Because they implement interventions of proven efficacy, these translational projects often forgo a randomized control group, given ethical concerns about denying or delaying treatment. Thus, quasi-experimental designs are common in community-based translational research (Henry et al. 2017). Evaluations of such designs must be conducted with great caution: they are prone to mis-estimation of intervention effects due to potential biases resulting from selective enrollment and/or lack of control for placebo, historical, and other effects (Buntin et al. 2009; Flamm et al. 2012). Hence, analytical methods that can account for these potential biases while remaining practical for program evaluation in routine settings are highly desirable.

Data from the National Institutes of Health-funded Diabetes Prevention Program (DPP), a large-scale randomized trial that provided evidence for the efficacy of lifestyle intervention in a diverse sample of US adults (Knowler et al. 2002), have become publicly available through the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Data Repository. These data could serve as historical controls for evaluating the effectiveness of translated versions of the DPP lifestyle intervention, provided the translational projects have similar eligibility criteria and outcome measures. The Special Diabetes Program for Indians Diabetes Prevention (SDPI-DP) demonstration project (Jiang et al. 2013) meets these criteria. Modeled on the DPP and using very similar eligibility criteria and outcome measures, this single-arm intervention project implemented the DPP lifestyle intervention among over 2500 American Indian and Alaska Native (AI/AN) participants across 36 diverse grantee sites.

Simply combining intervention data with historical control data and analyzing the combined data as if they came from a single randomized trial can yield a biased treatment effect estimate, because observed confounders may be imbalanced between treatment conditions. A statistical method often used to estimate causal effects in observational studies without randomization is the propensity score (PS) approach, whereby the distributions of potential confounders are statistically balanced between comparison groups (Guo and Fraser 2009; Rosenbaum and Rubin 1983). Another type of summary score, the disease risk score (DRS), has been suggested as an alternative method to control for confounding (Sturmer et al. 2005). Disease risk scores have been widely applied to predict the occurrence of chronic diseases such as cardiovascular disease and diabetes (D'Agostino Sr. et al. 2008; Kahn et al. 2009; Lee et al. 2006; Lindstrom and Tuomilehto 2003; Noble et al. 2011; Wilson et al. 1998). Recently, simulation studies have shown that when the PS distributions of the comparison groups overlap poorly, the DRS approach may allow researchers to assess treatment effects in a larger proportion of the treated population and yield effect estimates with improved precision (Wyss et al. 2015). In this study, we explored and compared the use of PS and DRS approaches in evaluating the effectiveness of SDPI-DP, using the DPP data as historical controls.

Research Design and Methods

Data Sources

SDPI-DP

The SDPI-DP program is a demonstration project designed to reduce diabetes incidence among American Indians and Alaska Natives (AI/ANs) with prediabetes through local translation of the DPP lifestyle intervention. The details of this project are described elsewhere (Jiang et al. 2013). Briefly, 36 AI/AN health care programs implemented the 16-session Lifestyle Balance Curriculum drawn from the DPP (Knowler et al. 2002). The primary goal of the intervention was to achieve and maintain a weight reduction of at least 7% of initial body weight through a healthy diet and increased physical activity. Grantees used the DPP curriculum covering diet, exercise, and behavior modification to help participants achieve this goal. Adaptation to local culture and conditions was allowed, provided the same basic information was presented and adaptations were well documented.

The eligibility criteria of SDPI-DP included being AI/AN (based on eligibility to receive Indian Health Service [IHS] services), being 18+ years of age, no previous diagnosis of diabetes, and having either impaired fasting glucose (i.e., a fasting blood glucose (FBG) level of 100 to 125 mg/dL and an oral glucose tolerance test (OGTT) result < 200 mg/dL) or impaired glucose tolerance (i.e., an OGTT of 140 to 199 mg/dL 2 h after a 75-g oral glucose load and a FBG level < 126 mg/dL). Enrollment began January 2006. The analyses here included baseline and annual data for up to 3 years for 2553 participants who completed the baseline assessment and started intervention by July 31, 2008.

During the design phase of SDPI-DP, the inclusion of a control group was deemed an unethical delay of treatment due to strong evidence supporting the efficacy of the lifestyle intervention (Knowler et al. 2002; Norris et al. 2005; Pan et al. 1997; Tuomilehto et al. 2001). Rather, the goal of SDPI-DP was to pursue a comprehensive public health evaluation of the translation of a proven intervention in diverse AI/AN communities. Therefore, all SDPI-DP participants received the intervention. Previous analyses of SDPI-DP data based on the single-arm design revealed significant improvements in weight and a number of secondary outcomes (Jiang et al. 2013). However, without a control group, a causal interpretation of intervention effectiveness is not straightforward.

DPP

We used de-identified DPP data obtained from the NIDDK Data Repository as historical controls (Cuticchia et al. 2006). The DPP was an RCT conducted at 27 US sites enrolling individuals at high risk for diabetes. Its methods are published elsewhere (The Diabetes Prevention Program 1999). Briefly, eligible participants were randomly assigned to one of three groups: (1) placebo medication twice daily and standard lifestyle recommendations; (2) metformin twice daily and standard lifestyle recommendations; or (3) intensive lifestyle modification. The first (placebo) group served as the historical control to evaluate the effectiveness of SDPI-DP.

The eligibility criteria for most DPP participants were age ≥ 25 years, BMI ≥ 24 kg/m², FBG level of 95–125 mg/dL, and OGTT 2-h result of 140–199 mg/dL. Compared with SDPI-DP, DPP had more stringent eligibility criteria because its participants needed to have both elevated fasting glucose and impaired glucose tolerance; additionally, BMI defined eligibility in DPP.
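As a minimal sketch, the glycemic parts of the two eligibility definitions can be written as predicates. The function names are hypothetical, glucose values are assumed to be whole numbers in mg/dL, and the non-glycemic SDPI-DP criteria (AI/AN status, age ≥ 18 years, no prior diabetes) are omitted. The example shows why DPP was more restrictive: a person with isolated impaired fasting glucose qualifies for SDPI-DP but not for DPP.

```python
# Hypothetical helpers; thresholds (mg/dL) transcribed from the eligibility text.
# Glucose values are assumed to be whole numbers, as usually reported.

def sdpi_dp_glycemic_eligible(fbg: float, ogtt_2h: float) -> bool:
    """SDPI-DP glycemic criterion: impaired fasting glucose OR impaired glucose tolerance."""
    ifg = 100 <= fbg <= 125 and ogtt_2h < 200
    igt = 140 <= ogtt_2h <= 199 and fbg < 126
    return ifg or igt


def dpp_eligible(age: float, bmi: float, fbg: float, ogtt_2h: float) -> bool:
    """Criteria for most DPP participants: elevated FBG AND impaired glucose tolerance."""
    return age >= 25 and bmi >= 24 and 95 <= fbg <= 125 and 140 <= ogtt_2h <= 199


# Isolated impaired fasting glucose: eligible for SDPI-DP, not for DPP
print(sdpi_dp_glycemic_eligible(fbg=110, ogtt_2h=130))     # True
print(dpp_eligible(age=50, bmi=31, fbg=110, ogtt_2h=130))  # False
```

Applying the DPP predicate to SDPI-DP records is one way to form a subgroup that meets the DPP eligibility criteria.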

We obtained the DPP data following established policies of the NIDDK Data Repository. The University of Colorado Anschutz Medical Center and University of California Irvine IRBs approved this supplementary analysis.

Measures

Both studies have the same primary outcome: incidence of diabetes diagnosed by an annual OGTT or a semiannual FBG test, according to the American Diabetes Association 2004 criteria: a FBG ≥ 126 mg/dL or a 2-h result ≥ 200 mg/dL after a 75-g oral glucose load. In addition to the semiannual tests, FBG was measured if symptoms suggestive of diabetes developed. The diagnosis required confirmation by a second test, usually within 6 weeks of the first test.
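The outcome definition can be sketched as two small checks, assuming results are recorded in mg/dL. The function names are hypothetical, and the roughly 6-week confirmation window is not enforced in this sketch.

```python
from typing import Optional

# Hypothetical helpers; ADA 2004 thresholds (mg/dL) as stated in the text.

def meets_diabetes_threshold(fbg: Optional[float] = None,
                             ogtt_2h: Optional[float] = None) -> bool:
    """True if a single test result is in the diabetic range (FBG >= 126 or 2-h >= 200)."""
    return (fbg is not None and fbg >= 126) or (ogtt_2h is not None and ogtt_2h >= 200)


def confirmed_diabetes(first: dict, second: dict) -> bool:
    """Incident diabetes: a diagnostic result confirmed by a second diagnostic test.
    The usual ~6-week confirmation window is not modelled here."""
    return meets_diabetes_threshold(**first) and meets_diabetes_threshold(**second)


print(confirmed_diabetes({"fbg": 131}, {"ogtt_2h": 212}))  # True
print(confirmed_diabetes({"fbg": 131}, {"fbg": 118}))      # False: not confirmed
```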

Basic demographic characteristics and key diabetes risk factors comprised the common baseline measurements of SDPI-DP and DPP. Age, gender, and race were available in both data sets. Because the publicly available DPP data had to be de-identified, race/ethnic groups were coded simply as Caucasian, African American, Hispanic, and All Other. Similarly, age at baseline was collapsed into 5-year age groups, with all participants < 40 years and all participants ≥ 65 years each grouped into a single category.
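A sketch of this age coding follows. The exact category labels used in the repository are not given here, so the function and its labels are hypothetical; presumably the same coding has to be applied to SDPI-DP ages before the two data sets are pooled.

```python
def age_group(age: float) -> str:
    """Collapse age into 5-year bands, with open-ended <40 and >=65 groups (hypothetical labels)."""
    if age < 40:
        return "<40"
    if age >= 65:
        return ">=65"
    lower = 40 + 5 * int((age - 40) // 5)  # 40, 45, ..., 60
    return f"{lower}-{lower + 4}"


print([age_group(a) for a in (33, 40, 47.5, 64.9, 71)])
# ['<40', '40-44', '45-49', '60-64', '>=65']
```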

Well-known diabetes risk factors measured at baseline in both data sources were BMI, family history of T2DM, FBG, OGTT, systolic blood pressure (SBP), diastolic blood pressure (DBP), triglycerides, high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C). In both studies, the baseline physical examination included measurements of height, weight, and sitting SBP and DBP. BMI was calculated as weight divided by height squared (kg/m²). Blood was drawn from DPP and SDPI-DP participants after a 9–12-h fast to measure blood glucose, triglycerides, HDL-C, and LDL-C.

Statistical Analysis

To quantify the effectiveness of the lifestyle intervention among SDPI-DP participants, Cox proportional hazards regression models were constructed after merging the SDPI-DP data with those from the DPP placebo group. We investigated four approaches to estimate the hazard ratio (HR) of developing T2DM among SDPI-DP vs. the DPP placebo group: (1) no adjustment for confounders; (2) multivariable regression adjustment; (3) propensity score matching; (4) disease risk score matching. We describe each of these methods in turn.

Unadjusted Estimate

After the two data sources were merged, a Cox regression model whose only independent variable was a dummy variable indicating membership in SDPI-DP (intervention group) or the DPP placebo cohort (control group) produced the unadjusted HR of diabetes for SDPI-DP vs. DPP placebo participants.

Multivariable Regression Adjustment

The Cox regression models were then adjusted for previously reported diabetes risk factors identified in three published diabetes prediction models validated by Mann et al. in a multiethnic population of US adults (Mann et al. 2010). These models were based on the Framingham Offspring (FO) Study (Wilson et al. 2007), the Atherosclerosis Risk in Communities (ARIC) Study (Schmidt et al. 2005), and the San Antonio Heart (SAH) Study (Stern et al. 2002). The risk factors in the FO model were overweight and obesity, impaired fasting glucose, low HDL-C, elevated triglycerides, high blood pressure, and parental history of diabetes. The ARIC model included age, height, waist circumference, black race/ethnicity, SBP, FBG, HDL-C, triglycerides, and parental history of diabetes. Finally, the SAH model included age, sex, Mexican-American ethnicity, FBG, SBP, HDL-C, BMI, and family history of diabetes. In addition to the risk factors in these three models, we added the OGTT 2-h result to each model to account for a potential imbalance in this variable between SDPI-DP and DPP participants. Thus, for each risk prediction model, we estimated adjusted HRs with and without the OGTT 2-h result.
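To illustrate the contrast between the unadjusted model and regression adjustment, here is a minimal NumPy sketch of a Cox model fitted by Newton-Raphson on the Breslow partial likelihood. The function name `cox_fit`, the simulated data, and the single confounder standing in for the OGTT 2-h result are assumptions for illustration only; an established survival package would normally be used. In the simulation, omitting the confounder pushes the estimated HR below its true value of 0.6 (overstating effectiveness), mirroring the with/without OGTT comparison described above.

```python
import numpy as np


def cox_fit(time, event, X, n_iter=50, tol=1e-9):
    """Maximise the Cox partial likelihood (Breslow ties) by Newton-Raphson.
    Returns (coefficients, standard errors)."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    X = np.asarray(X, float).reshape(len(time), -1)
    order = np.argsort(-time, kind="stable")       # longest follow-up first
    t, d, Z = time[order], event[order], X[order]
    # Cumulative sums up to `last` cover everyone with follow-up >= t (ties included)
    last = np.searchsorted(-t, -t, side="right") - 1
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        w = np.exp(Z @ beta)
        s0 = np.cumsum(w)[last]
        s1 = np.cumsum(w[:, None] * Z, axis=0)[last]
        s2 = np.cumsum(w[:, None, None] * Z[:, :, None] * Z[:, None, :], axis=0)[last]
        zbar = s1 / s0[:, None]                    # risk-set weighted covariate mean
        grad = (Z - zbar)[d].sum(axis=0)           # score vector
        info = (s2 / s0[:, None, None]
                - zbar[:, :, None] * zbar[:, None, :])[d].sum(axis=0)
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


# Simulated example (all numbers hypothetical): a confounder, e.g. a standardised
# 2-h OGTT value, is lower in the intervention group and raises diabetes risk.
rng = np.random.default_rng(1)
n = 6000
ogtt = rng.normal(size=n)
treated = (rng.random(n) < 1 / (1 + np.exp(ogtt))).astype(float)
hazard = 0.1 * np.exp(np.log(0.6) * treated + 0.8 * ogtt)  # true HR = 0.6
t_event = rng.exponential(1 / hazard)
time = np.minimum(t_event, 3.0)                             # censor at 3 years
event = t_event <= 3.0

beta_unadj, _ = cox_fit(time, event, treated)
beta_adj, se_adj = cox_fit(time, event, np.column_stack([treated, ogtt]))
print("unadjusted HR:", np.exp(beta_unadj[0]))
print("adjusted HR:  ", np.exp(beta_adj[0]))
```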

Propensity Score Method

Regression adjustment has several drawbacks: strong assumptions about the specification of the outcome model, computational complexity when there are many potential confounders, and the danger of extrapolation when the comparison groups have insufficient common support. The propensity score (PS) approach was proposed to overcome these challenges. The PS, as defined by Rosenbaum and Rubin (Rosenbaum and Rubin 1983), is the probability of receiving the exposure/intervention conditional on a vector of observed covariates. It is considered a balancing score and is usually estimated using a logistic regression model with the exposure variable as the response and all variables related to the outcome of interest as the covariates (D'Agostino Jr. 1998; Guo and Fraser 2009; Rosenbaum and Rubin 1983). Here, a logistic regression model was used to estimate the PS of belonging to the SDPI-DP cohort. To allow the methods to be compared, the covariates used to estimate the PS were the same as the adjustment variables in the multivariable regression models described above. Once estimated, the PS can be used in various ways to obtain adjusted estimates of intervention effectiveness.
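A minimal sketch of the PS step, assuming plain NumPy and a logistic model with main effects only (no interactions or splines); the function names and simulated data are hypothetical. One property worth checking in practice: with an intercept in the model, the fitted scores average exactly to the observed proportion in the intervention group.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression (with intercept) fitted by Newton-Raphson / IRLS."""
    Xd = np.column_stack([np.ones(len(y)), np.asarray(X, float)])
    y = np.asarray(y, float)
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - p)                          # score vector
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def propensity_score(X, treated):
    """Estimated P(intervention = 1 | covariates) for every participant."""
    beta = fit_logistic(X, treated)
    Xd = np.column_stack([np.ones(len(treated)), np.asarray(X, float)])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))


# Hypothetical data: two standardised covariates that predict group membership
rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 2))
true_logit = -0.5 + 1.0 * X[:, 0] - 0.7 * X[:, 1]
treated = (rng.random(5000) < 1 / (1 + np.exp(-true_logit))).astype(float)
ps = propensity_score(X, treated)
```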

This study focused on PS matching. We used nearest-neighbor matching with a caliper of 0.1 standard deviations of the PS (Rosenbaum and Rubin 1985), matching at most one DPP placebo participant to each SDPI-DP participant. After matching on the PS, intervention effectiveness was estimated using Cox regression models that included the exposure as the only independent variable, because the comparison samples had already been matched. Covariate balance before and after matching was checked by calculating the absolute standardized difference (ASD) of each variable between the two treatment groups.
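The matching and balance check can be sketched as follows. This is a greedy 1:1 matcher that processes treated units in data order; whether controls could be reused is not stated above, so matching without replacement is an assumption, as are the function names. The ASD uses the common pooled-SD convention, |mean₁ − mean₀| / √((s₁² + s₀²)/2), and the simulated data are hypothetical.

```python
import numpy as np


def caliper_match(score, treated, caliper_sd=0.1):
    """Greedy 1:1 nearest-neighbour matching without replacement.
    A pair is kept only if the scores differ by <= caliper_sd * SD(score)."""
    score = np.asarray(score, float)
    treated = np.asarray(treated, bool)
    caliper = caliper_sd * score.std()
    available = ~treated                      # controls not yet used
    pairs = []
    for i in np.flatnonzero(treated):         # treated units in data order
        cand = np.flatnonzero(available)
        if cand.size == 0:
            break
        j = cand[np.argmin(np.abs(score[cand] - score[i]))]
        if abs(score[j] - score[i]) <= caliper:
            pairs.append((i, j))
            available[j] = False
    return pairs


def abs_std_diff(x, group):
    """|mean_1 - mean_0| / pooled SD, with pooled SD = sqrt((var_1 + var_0) / 2)."""
    x = np.asarray(x, float)
    group = np.asarray(group, bool)
    x1, x0 = x[group], x[~group]
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2)
    return abs(x1.mean() - x0.mean()) / pooled_sd


# Hypothetical data: one confounder and its (known) propensity score
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-(-0.3 + 1.2 * x)))
treated = rng.random(n) < ps
pairs = caliper_match(ps, treated)
t_idx = np.array([i for i, _ in pairs])
c_idx = np.array([j for _, j in pairs])
in_treated = np.r_[np.ones(len(pairs), bool), np.zeros(len(pairs), bool)]
print("ASD before:", abs_std_diff(x, treated))
print("ASD after: ", abs_std_diff(x[np.r_[t_idx, c_idx]], in_treated))
```

Treated units whose nearest available control falls outside the caliper remain unmatched, which is how poor PS overlap translates into fewer matched pairs.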

Disease Risk Score Method

The disease risk score (DRS), sometimes referred to as the prognostic score, summarizes the associations between a set of observed covariates and a disease outcome, such as diabetes incidence. It was originally proposed by Miettinen as a "multivariate confounder score" to overcome the difficulty of multiple cross-classification in stratified analyses involving many confounding factors (Miettinen 1976). Specifically, if the association between the observed covariates and the outcome follows a generalized linear model, the DRS is usually calculated as the conditional expectation of the outcome given the observed covariates, using the estimated parameters of that model. In most previous studies, the DRS was developed for prediction and has mainly been used to estimate the probability of developing a disease among individuals not exposed to an intervention.
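For reference, the two balancing scores can be written side by side. The notation is ours: T is the intervention indicator, Y the outcome, x the covariate vector, g the link function of the generalized linear model, and the β̂ coefficients are estimated among unexposed individuals. For a logistic outcome model, g is the logit and the DRS is a predicted probability.

```latex
% Propensity score: models the intervention given covariates x
e(x) = \Pr(T = 1 \mid X = x)

% Disease risk score: models the outcome given x among the unexposed
\mathrm{DRS}(x) = \mathrm{E}[\,Y \mid X = x,\ T = 0\,] = g^{-1}\!\left(\hat{\beta}_0 + x^{\top}\hat{\beta}\right)
```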

Recently, it has also been suggested that the DRS could be used as a balancing score similar to the PS, because the DRS can account not only for unbalanced propensities of exposure but also for different disease risks between comparison groups (Hansen 2008). Recent simulation studies suggested that the DRS could be a reasonable alternative to the PS approach when the association between covariates and exposure is at most moderate (Arbogast et al. 2008; Arbogast and Ray 2009, 2011). Additionally, when the PS distributions are severely separated, matching on the DRS is often able to match a larger proportion of the treated population and yield effect estimates with improved precision (Wyss et al. 2015). In this study, DRSs were calculated from the three diabetes risk prediction models described above: FO, ARIC, and SAH. DRS matching was then performed using an approach similar to the PS methods. Because none of the original versions of these diabetes risk models included the OGTT 2-h test result, we fitted the three DRS models with and without the OGTT 2-h result within the DPP placebo group and then calculated the DRSs for all participants.
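A minimal sketch of the "fit in the control group, score everyone" step, assuming a logistic outcome model for diabetes by the end of follow-up (a simplification). The FO, ARIC, and SAH covariate sets are not reproduced; all names and data are hypothetical. Because only control outcomes enter the fit, the intervention group's outcomes cannot influence its own scores. The resulting scores could then be passed to the same nearest-neighbor caliper matching used for the PS, and adding a column for the OGTT 2-h result would give the "with OGTT" version.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression (with intercept) fitted by Newton-Raphson / IRLS."""
    Xd = np.column_stack([np.ones(len(y)), np.asarray(X, float)])
    y = np.asarray(y, float)
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        step = np.linalg.solve(Xd.T @ (Xd * (p * (1.0 - p))[:, None]), Xd.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def disease_risk_score(X, outcome, treated):
    """Fit the outcome model in controls only, then score every participant."""
    X = np.asarray(X, float)
    treated = np.asarray(treated, bool)
    beta = fit_logistic(X[~treated], np.asarray(outcome, float)[~treated])
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))


# Hypothetical data: two standardised risk factors; the intervention lowers risk
rng = np.random.default_rng(4)
n = 3000
X = rng.normal(size=(n, 2))
treated = rng.random(n) < 0.4
risk = 1 / (1 + np.exp(-(-1.0 + 0.9 * X[:, 0] + 0.5 * X[:, 1])))
diabetes = (rng.random(n) < risk * np.where(treated, 0.6, 1.0)).astype(float)
drs = disease_risk_score(X, diabetes, treated)
```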

How well DRS matching controls confounding cannot be evaluated through the balance checks commonly used for the PS approach. Here, we used a recently proposed alternative, the “dry-run” analysis (Wyss et al. 2017), to assess the ability of DRS matching to control potential confounding. Briefly, in the dry-run analysis, we split the unexposed population (i.e., the DPP placebo group) into “pseudo-exposed” and “pseudo-unexposed” groups such that the differences in observed covariates between the two pseudo groups are similar to those between the DPP placebo and SDPI-DP participants. Because neither pseudo group received the intervention, the true effect between them is null. We then evaluated each DRS model's confounding control by calculating the pseudo-bias, defined as the difference between the pseudo-effect estimate and the true null effect. A pseudo-bias close to 0 indicates that DRS matching adequately recovers the unconfounded null estimate.
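The dry-run logic can be sketched under several simplifying assumptions, all ours: a single confounder; pseudo-exposure drawn from an illustrative logistic model (not necessarily how Wyss et al. assign it); a Mantel-Haenszel incidence-rate ratio in place of the Cox hazard ratio; and stratification on risk-score quintiles standing in for DRS matching. Pseudo-bias is taken on the log scale, where the true null effect is 0. Because outcomes are generated independently of pseudo-exposure, any nonzero estimate reflects confounding plus random error.

```python
import numpy as np


def log_rate_ratio(time, event, group, strata=None):
    """Mantel-Haenszel pooled log incidence-rate ratio (group 1 vs 0).
    With strata=None it reduces to the crude log rate ratio."""
    time = np.asarray(time, float)
    event = np.asarray(event, float)
    group = np.asarray(group, bool)
    strata = np.zeros(len(time), int) if strata is None else np.asarray(strata)
    num = den = 0.0
    for k in np.unique(strata):
        s = strata == k
        d1, d0 = event[s & group].sum(), event[s & ~group].sum()
        t1, t0 = time[s & group].sum(), time[s & ~group].sum()
        num += d1 * t0 / (t1 + t0)
        den += d0 * t1 / (t1 + t0)
    return float(np.log(num / den))


# Dry run inside a hypothetical control-only cohort
rng = np.random.default_rng(5)
n = 4000
x = rng.normal(size=n)                        # confounder, e.g. standardised 2-h OGTT
# Step 1: "pseudo-exposed" are more likely to have low x, mimicking the real imbalance
pseudo = rng.random(n) < 1 / (1 + np.exp(x))
# Outcomes ignore pseudo-exposure, so the true log effect is 0
hazard = 0.1 * np.exp(0.8 * x)
t_event = rng.exponential(1 / hazard)
time = np.minimum(t_event, 3.0)
event = t_event <= 3.0
# Step 2: estimate the null effect crudely and within risk quintiles
# (x stands in for a fitted DRS here because risk is monotone in x)
quintile = np.digitize(x, np.quantile(x, [0.2, 0.4, 0.6, 0.8]))
pseudo_bias_crude = log_rate_ratio(time, event, pseudo) - 0.0
pseudo_bias_drs = log_rate_ratio(time, event, pseudo, quintile) - 0.0
print(pseudo_bias_crude, pseudo_bias_drs)
```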

Results

Table 1 compares baseline characteristics of the DPP placebo group and the SDPI-DP participants. Overall, compared to the DPP, SDPI-DP participants were younger (46.8 vs. 50.3 years old), included more females (75% vs. 68%) and more obese participants (80% vs. 68%). They also had significantly lower FBG, OGTT 2-h result, and LDL-C level, but significantly higher weight, waist circumference, and systolic BP. The SDPI-DP subgroup (n = 648) who met the DPP eligibility criteria had similar differences when compared with the DPP placebo group. Here, however, FBG and OGTT 2-h result were not significantly different between the SDPI-DP subgroup and the DPP placebo group.

Table 1 Baseline characteristics of DPP placebo group and SDPI-DP participants

Unadjusted and adjusted HRs estimated by the various statistical models are presented in Table 2. When all SDPI-DP participants (N = 2553) were compared with the DPP placebo group (N = 1030), the unadjusted HR for diabetes was 0.35, suggesting a 65% risk reduction from the lifestyle intervention among SDPI-DP participants. When the OGTT 2-h test result was not included in the adjustment methods, all the statistical models produced similar HR estimates, ranging from 0.29 to 0.41 (all P < 0.0001). However, when the OGTT 2-h test result was included, all the adjusted HR estimates were larger than the unadjusted HR, ranging from 0.56 to 0.69 (Table 2), indicating weaker effectiveness of the SDPI-DP lifestyle intervention. These estimates are close to the unadjusted HR comparing the SDPI-DP participants who met the DPP eligibility criteria with the DPP placebo group, 0.64 (95% CI 0.49–0.84). Regardless of whether the OGTT 2-h test result was included, DRS matching yielded more matched pairs of SDPI-DP and DPP participants than PS matching. All the estimated HRs were significantly different from 1 (P < 0.05), suggesting that the lifestyle intervention was effective at reducing diabetes risk among SDPI-DP participants.

Table 2 Intervention effectiveness of SDPI-DP based on different estimation methods using data from SDPI-DP participants and DPP placebo group

When comparing the DPP placebo participants and the SDPI-DP participants who met the DPP eligibility criteria, the unadjusted HR for diabetes risk is 0.64, indicating a 36% risk reduction by the SDPI-DP lifestyle intervention. This HR estimate is closer to the HRs estimated using the adjustment models with OGTT 2-h result included in Table 2, but is much larger than the unadjusted HR when comparing all SDPI-DP participants with the DPP placebo group. The adjusted HRs for SDPI-DP subgroup vs. the DPP placebo group are all slightly smaller than the unadjusted HR, but fairly close to it in general, regardless of including OGTT 2-h result in the model or not (Supplementary Table 1).

Figure 1 illustrates the PS and DRS distributions by intervention group before matching. The PS distributions of the SDPI-DP and the DPP placebo group do not overlap very well with each other (overlapping coefficient [a measure of the agreement between two probability distributions (Inman and Bradley 1989)] ranges from 0.34 to 0.40) with many SDPI-DP participants having very high probability of belonging to the intervention group. Meanwhile, for DRS, the overlapping coefficients are much larger (0.49–0.66).
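The overlapping coefficient used here is OVL = ∫ min{f₁(s), f₀(s)} ds, the area shared by the two score densities (1 for identical distributions, 0 for disjoint ones). A minimal histogram-based estimate is sketched below; the function name and bin count are arbitrary assumptions, and a kernel density estimate would be a smoother alternative. For two unit-variance normal distributions one SD apart, the true value is 2Φ(−0.5) ≈ 0.62, which gives a quick sanity check.

```python
import numpy as np


def overlap_coefficient(a, b, bins=50):
    """Histogram estimate of OVL = integral of min(f_a(s), f_b(s)) ds, in [0, 1]."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    edges = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), bins + 1)
    f_a, _ = np.histogram(a, bins=edges, density=True)
    f_b, _ = np.histogram(b, bins=edges, density=True)
    return float(np.sum(np.minimum(f_a, f_b) * np.diff(edges)))


# Sanity check with simulated (hypothetical) scores: two normals 1 SD apart
rng = np.random.default_rng(6)
a = rng.normal(0.0, 1.0, 20000)
b = rng.normal(1.0, 1.0, 20000)
print(overlap_coefficient(a, b))  # roughly 0.6; theory gives 2 * Phi(-0.5) = 0.617
```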

Fig. 1

Propensity score and disease risk score distributions across treatment groups. Abbreviations: ARIC, Atherosclerosis Risk in Communities Study; DPP, Diabetes Prevention Program; FO, Framingham Offspring Study; SAH, San Antonio Heart Study; SDPI-DP, Special Diabetes Program for Indians Diabetes Prevention Program

As shown in Fig. 2, before PS matching, the absolute standardized differences (ASDs) between the two treatment groups were larger than 0.1 for most of the diabetes risk factors included in our regression models. However, after matching on a PS estimated without the OGTT 2-h test result, the ASDs were smaller than or close to 0.1 for all risk factors except OGTT 2-h. After matching on a PS estimated with the OGTT 2-h test result in the model, the ASD was smaller than 0.1 even for OGTT 2-h. In particular, after matching on the PS calculated from the ARIC model covariates, the ASDs were smaller than 0.1 for all risk factors except gender. The fitted PS model based on the ARIC covariates had high discriminative ability, with a C statistic of 0.885.
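The C statistic equals the area under the ROC curve: the probability that a randomly chosen SDPI-DP participant has a higher estimated PS than a randomly chosen DPP placebo participant, with ties counted as one half. A pairwise sketch follows. The function name is hypothetical, and the approach uses O(n₁n₀) memory, which is adequate for samples of a few thousand. For a PS model, a value near 1 also signals that the two groups' score distributions are well separated, which is the overlap problem visible in Fig. 1.

```python
import numpy as np


def c_statistic(score, treated):
    """P(score of a random treated unit > score of a random control); ties count 1/2."""
    score = np.asarray(score, float)
    treated = np.asarray(treated, bool)
    diff = score[treated][:, None] - score[~treated][None, :]
    concordant = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(concordant / diff.size)


# Toy check: treated scores 0.35, 0.8 vs control scores 0.1, 0.4 -> 3 of 4 pairs concordant
print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```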

Fig. 2

Absolute standardized differences before and after propensity score matching in covariate values for all SDPI-DP vs. DPP placebo participants. (a) Without the OGTT 2-h test result included in the PS models. (b) With the OGTT 2-h test result included in the PS models. Abbreviations: ARIC, Atherosclerosis Risk in Communities Study; DPP, Diabetes Prevention Program; FO, Framingham Offspring Study; OGTT, oral glucose tolerance test; SAH, San Antonio Heart Study; SDPI-DP, Special Diabetes Program for Indians Diabetes Prevention Program

Table 3 shows the estimated dry-run pseudo-bias before and after DRS matching. Before matching, the pseudo-bias of the unadjusted HR was about −0.30 (95% CI −0.47, −0.12), indicating that the differences in observed covariates between the DPP placebo and SDPI-DP participants alone would produce an estimated preventive effect of −0.30 even if the true intervention effect were 0. After matching on DRSs without the OGTT 2-h test result in the models, the mean pseudo-biases ranged from −0.14 to −0.36. After matching on DRSs with the OGTT 2-h result included, the estimated pseudo-biases were much closer to 0. The smallest mean pseudo-bias, 0.01 (95% CI −0.25, 0.22), was obtained by matching on the ARIC score.

Table 3 “Dry-run” analysis evaluating disease risk scores for confounding control

Discussion

Using historical control data, this study attempted to formally evaluate the translational effectiveness of a single-arm project that translated an evidence-based intervention into a community setting. Many translational or programmatic diabetes prevention programs have sought to translate the well-established evidence on lifestyle interventions into “real-world” settings. Given the strong existing evidence supporting the intervention being implemented, such programs often adopt a pre-post design without a concurrent control group. Evaluating the translational effectiveness of such programs is thus challenging. While methods have been developed for continuous outcome variables in single-arm intervention designs (Chevreul et al. 2014), approaches are needed to assess intervention effects on a time-to-event outcome such as diabetes incidence.

The current study illustrates a potential solution to this challenge, based on publicly available data from the original RCT that generated the evidence for the intervention being translated. While our results provide initial evidence for the usefulness of such an approach, they also underscore that great care is essential. As shown in Table 2, after simply merging the DPP and all SDPI-DP data, the unadjusted HR greatly overestimated the effectiveness of the SDPI-DP lifestyle intervention. When we restricted the analysis to the SDPI-DP subgroup meeting DPP eligibility criteria or adjusted for all baseline diabetes risk factors, the estimated risk reduction was smaller. These observations highlight the importance of considering the eligibility criteria and baseline characteristics of the two studies involved when conducting program evaluations using historical control data (Baker and Lindeman 2001).

Further, we found that omitting an unbalanced confounder, the OGTT 2-h result, produced biased estimates of intervention effectiveness no matter which statistical method was employed. As noted in the PS literature (Drake 1993), even sophisticated methods cannot correct the bias introduced by omitting important confounders. Many DPP translational projects did not conduct an OGTT because of cost and feasibility considerations. Yet our results suggest that, for accurate estimation of the diabetes prevention effect of a translational intervention, this variable may be too important to ignore. Furthermore, emerging evidence has highlighted the importance of the OGTT in detecting prediabetes and T2DM (NCD Risk Factor Collaboration 2015). Indeed, a recent study in a high-risk population found that 47.3% of newly diagnosed patients with T2DM would have been missed if OGTTs had not been performed (Meijnikman et al. 2017).

Regression adjustment is the standard method for controlling potentially unbalanced confounders, but it suffers from computational complexity and model selection issues. It has been shown to produce biased regression coefficient estimates when there are fewer than 10 events per covariate (Harrell Jr. et al. 1985; Peduzzi et al. 1996). Hence, dimension reduction methods such as the PS or DRS are preferred when there are many potential confounders. Comparing PS matching with DRS matching in this study, we found that the DRS approach yielded more matched pairs than the PS approach. This is consistent with a recent simulation study demonstrating that DRS matching can match a larger proportion of the treated population when the PS distributions of the comparison groups are strongly separated (Wyss et al. 2015). Consequently, DRS matching can improve the precision and potential generalizability of the effect estimate owing to the larger sample size. The DRS approach has been shown to require a weaker condition than the positivity assumption of the PS approach (Hansen 2008): it assumes only that there is no level of disease risk at which either the intervention or the control condition is received with certainty. DRS matching can therefore allow researchers to include individuals who would otherwise be excluded by PS matching, especially in regression discontinuity designs such as the current study, where participants with an OGTT 2-h result < 140 mg/dL were excluded from the DPP.

All three DRS models we explored demonstrated adequate capability to correct the bias in the effectiveness estimates, as long as all important confounders were considered. Although the lack of a true control group makes it difficult to determine which method produced the least biased effectiveness estimate, the PS-matched samples based on the ARIC model exhibited the best covariate balance, with ASD < 0.1 for almost all diabetes risk factors. Similarly, among the three DRS models, matching on the ARIC score (with OGTT 2-h included in the score) produced the smallest pseudo-bias, very close to 0. The ARIC model includes waist circumference instead of BMI, and waist circumference has been shown to be more predictive of diabetes risk than BMI (Klein et al. 2007). This might explain the better confounding control of the ARIC model shown here.

Several limitations exist in this study. First, the race/ethnic compositions of the two data sources were substantially different. The SDPI-DP only recruited AI/ANs while the DPP was a multiethnic cohort with only 49 AI/ANs in the placebo group (3). Since the publicly available DPP data coded AIs in the “Other” category along with other race/ethnicity groups, adjusting for AI/AN status was impossible. However, the DPP findings showed no significant racial differences in the efficacy of lifestyle intervention, including the AI/AN subgroup (Knowler et al. 2002). Second, except for the OGTT 2-h result, we only adjusted for diabetes risk factors that were included in one of the three diabetes risk models, which may not capture all the potentially unbalanced confounders. Third, we could only find a match for < 40% of the SDPI-DP participants. This means valid inference can only be made for a proportion of the SDPI-DP participants. Yet, a previous study reported no treatment heterogeneity in lifestyle intervention effects based on baseline diabetes risk of the DPP participants, suggesting potential generalizability of our results (Sussman et al. 2015).

Last, although the PS and DRS approaches appear to be useful statistical tools for evaluating intervention effectiveness in studies with observational data or quasi-experimental designs, they cannot substitute for RCTs in assessing the efficacy of a new intervention. The Women’s Health Initiative (WHI) provides a well-known example in which conclusions from observational studies differed from those of an RCT: although several large observational studies with sound statistical design and analyses suggested that postmenopausal hormone use reduced CHD risk (Grodstein et al. 1996; Varas-Lorenzo et al. 2000), the WHI randomized trial reported that women in the hormone therapy arm had a higher incidence of CHD than women in the placebo group (Manson et al. 2003). The discrepancies between the results of the WHI RCT and the observational studies could largely be explained by time-varying HRs of the treatment effects (Prentice et al. 2005), which cannot be detected or addressed by the statistical methods used here. Furthermore, several threats to internal validity listed by Campbell and Stanley (Campbell and Stanley 1963) might exist in our study. The potential applicability of those threats to our study is discussed in Supplementary Table 2.

In summary, this study illustrates how one can use publicly available RCT data as historical controls to evaluate the intervention effectiveness of community translational projects without a concurrent control. Carefully employed, this approach shows promise in obtaining relatively accurate estimates for the translational effectiveness of projects wherein the eligibility criteria and outcome measures are similar. Indeed, future translational initiatives without a control group may consider using similar eligibility criteria and outcomes as the original clinical trial(s), at least for a proportion of the participants, in order to allow for formal evaluation of translational effectiveness using historical control data. To overcome potentially severe selection bias while using historical controls, it is critical to employ a proper statistical method to balance the distributions of potential confounders between comparison groups. Both PS matching and DRS matching are good choices when the number of confounders that needs to be adjusted is large (Cepeda et al. 2003; Harrell Jr. et al. 1985; Peduzzi et al. 1996). Further, the DRS approach may be particularly suitable in circumstances when the PS distributions of the comparison groups do not overlap well with each other.