Most users of meta-analyses of randomized trials want to synthesize available evidence in order to draw causal inferences about a target population of substantive interest. Unfortunately, results obtained by conventional meta-analysis methods do not have a clear causal interpretation when the distribution of effect modifiers differs among the populations underlying the included trials and the target population (Dahabreh et al., 2020a; Sobel et al., 2017). This problem cannot be addressed in meta-analyses of effect sizes or other trial-level summary statistics because aggregate data cannot be used to fully account for individual-level effect modifiers. The problem is also not addressed by standard approaches to individual participant data meta-analyses, which can account for individual characteristics (e.g., via covariate-adjusted outcome regression) and heterogeneity of treatment effects across trials (e.g., via mixed-effects models; Tierney et al., 2015), because the output of these approaches cannot be interpreted in the context of the target population. In fact, meta-analyses often do not explicitly specify a target population.

The results of each trial in a meta-analysis apply to the population underlying the trial, reflecting the trial’s eligibility criteria and recruitment practices; this population will have a different distribution of effect modifiers than most target populations of substantive interest, such as patients who are candidates for treatment in a particular setting. Specialized methods are needed to transport inferences from each trial to the target population (Dahabreh et al., 2020b; Pearl & Bareinboim, 2011).

To illustrate, consider a hypothetical meta-analysis where, on average, an experimental treatment shows a benefit compared to some control treatment, and where the treatment effect is moderated by baseline (pre-treatment) mental health symptom severity, such that individuals with low severity experience benefit from the treatment, whereas individuals with high severity experience no effect. Suppose we are interested in using meta-analysis to inform treatment decisions for a target population characterized by high symptom severity. If individuals with higher symptom severity are less likely to be recruited and less likely to participate in trials, then trial samples would have lower symptom severity compared to the target population. Treatment effect estimates from the trials, and any conventional meta-analysis of these estimates, would show benefit from the experimental treatment, but these estimates are unlikely to apply to the target population where the treatment benefit will be attenuated due to the higher proportion of individuals with high symptom severity.
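The arithmetic behind this example can be made concrete with a minimal sketch. All numbers below are hypothetical and chosen only for illustration: a risk difference of -0.20 among low-severity individuals, no effect among high-severity individuals, and assumed shares of low-severity individuals of 80% in the trials and 20% in the target population.

```python
# Hypothetical stratum-specific treatment effects (risk differences);
# these values are illustrative assumptions, not estimates from any trial.
effect_by_severity = {"low": -0.20, "high": 0.0}

# Assumed share of each severity stratum in the trial samples and the target.
trial_mix = {"low": 0.8, "high": 0.2}
target_mix = {"low": 0.2, "high": 0.8}


def average_effect(effects, mix):
    """Average stratum-specific effects over a population's severity mix."""
    return sum(effects[level] * mix[level] for level in effects)


trial_ate = average_effect(effect_by_severity, trial_mix)
target_ate = average_effect(effect_by_severity, target_mix)
print(f"trial ATE:  {trial_ate:.2f}")
print(f"target ATE: {target_ate:.2f}")
```

Under these assumed numbers, the average effect shrinks from -0.16 in the trials to -0.04 in the target population, even though the stratum-specific effects are identical in both.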

As in our hypothetical example, participants in the vast majority of trials are purposely recruited rather than randomly sampled. Recruitment of participants who meet a trial’s inclusion and exclusion criteria results in trials with underlying populations that differ from one another and from the target population. Moreover, participation in trials is voluntary and subject to self-selection: individuals who choose to participate in research likely differ from those who will ultimately receive the treatment (Elwood, 1982). Thus, the target population typically differs from the populations underlying the trials. Conventional meta-analysis methods do not address these differences, and causal inference in the context of meta-analysis requires use of specialized “transportability methods” to draw causal inferences about the target population.

Transportability methods address differences between the population underlying a trial and the target population of interest by combining background knowledge, statistical methods, data from the trials, and data from a sample of the target population to extend causal inferences from the trial to the target population (Cole & Stuart, 2010; Dahabreh et al., 2019b, 2020b; Rudolph & van der Laan, 2017; Westreich et al., 2017). We recently proposed extensions of these methods that combine individual participant data from multiple trials with baseline covariate data from a sample of the target population to estimate treatment effects relevant to the target population (Dahabreh et al., 2019a, 2020a). In this manuscript, we briefly describe transportability methods for individual participant data meta-analysis, provide a worked example in HIV prevention using data for which conventional meta-analytic methods would have limited usefulness, and discuss challenges and opportunities in using transportability methods for causally interpretable meta-analysis.

Transportability Methods for Meta-analysis

Identification of Treatment Effects in Each Trial

The potential outcomes framework facilitates the formal definition of causal estimands (e.g., average treatment effects) and articulation of assumptions needed for these causal estimands to be identifiable (Neyman, 1923; Robins & Greenland, 2000; Rubin, 1974). Briefly, the framework posits that each individual has a well-defined potential outcome under each treatment being considered. In a trial, at most one potential outcome may be observed for each trial participant (because they are assigned to one treatment), while the other potential outcomes (those under treatments to which the individual is not assigned) remain unobserved (counterfactual). The potential outcome mean under a given treatment for the population underlying the trial is the average outcome that would be observed if everyone in the population had received that treatment. It is identifiable when certain conditions are met, including consistency of potential outcomes, exchangeability between treatment groups, and positivity of the treatment assignment probability (Hernán & Robins, 2020). The consistency condition states that the observed outcome for an individual receiving a specific treatment is equal to that individual’s potential outcome under that treatment. This condition holds when there is no interference, that is, each participant’s potential outcome does not depend on the treatment of others, and when all participants receive the same version of treatment (or when treatment variation is irrelevant to the outcome of interest; Hernán & VanderWeele, 2011; VanderWeele, 2009). These components of the consistency condition are sometimes referred to as the stable unit treatment value assumption (SUTVA; Rubin, 1980, 2010). Conditional exchangeability among treatment groups states that potential outcomes are independent of treatment assignment conditional on baseline covariates. Finally, positivity of the treatment assignment probability states that all participants in the trial have a non-zero probability of being assigned to each treatment.
Randomization of well-defined treatments in a controlled trial helps ensure that these assumptions are met, and thus, the potential outcome means and average treatment effects can be expressed as functions of the observable data obtained from the population underlying the trial.

Identification of Treatment Effects in the Target Population

Transporting causal inferences from controlled trials to a target population requires conditions beyond those needed for identification of causal estimands in the population underlying each trial. First, it requires a stronger version of consistency that holds across the populations underlying the trials and the target population and also encodes an assumption that trial participation does not affect the outcome except through the treatments (Dahabreh et al., 2019c, d, 2020c). Second, it requires conditional exchangeability among the populations underlying the randomized trials and the target population, which states that the potential outcome is independent of population membership (trial or target), conditional on baseline covariates. Third, it requires positivity of the trial participation probability, which states that, for the baseline covariates needed to satisfy the condition of exchangeability among populations, every covariate pattern that occurs in the target population should have non-zero probability of occurring in at least one trial that evaluated each treatment of interest (Dahabreh et al., 2019a, 2020a, discuss several versions of the exchangeability and positivity conditions and explore their implications for identifying causal estimands). When these assumptions are met, it is possible to transport inferences about potential outcome means using data from a collection of trials to a target population, allowing for direct comparison of potential outcome means in the context of the target population.
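The positivity condition lends itself to a simple empirical screen. The sketch below is illustrative only: the function name, the representation of covariate patterns as tuples, and the toy data are all assumptions. It can flag target covariate patterns that never appear among trial participants assigned to a treatment, but a pattern being absent from a finite sample does not prove that its probability is zero.

```python
def positivity_gaps(target_patterns, trial_patterns_by_treatment):
    """Return, per treatment, target covariate patterns unseen in the trials.

    target_patterns: iterable of hashable covariate patterns observed in the
        sample from the target population.
    trial_patterns_by_treatment: dict mapping each treatment to the patterns
        observed among trial participants assigned to it, pooled over all
        trials that evaluated that treatment.
    """
    target = set(target_patterns)
    return {
        treatment: sorted(target - set(patterns))
        for treatment, patterns in trial_patterns_by_treatment.items()
    }


# Toy patterns of the form (gender, recent substance use); purely hypothetical.
target_sample = [("F", "yes"), ("M", "no"), ("F", "no")]
trial_arms = {
    "ST": [("F", "yes"), ("M", "no"), ("F", "no"), ("M", "yes")],
    "FM": [("F", "yes"), ("M", "no")],
}
gaps = positivity_gaps(target_sample, trial_arms)
print(gaps)
```

In this toy input, every target pattern is covered for ST, while the pattern `("F", "no")` has no counterpart among participants assigned to FM.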

The identifiability conditions stated above can be used to express the potential outcome means in the target population as a function of the observed data distribution (Dahabreh et al., 2019a). To introduce some notation, let \(X\) be a set of baseline covariates collected from all trial participants and from a sample of the target population that are sufficient to satisfy the identifiability conditions. Informally, the sufficient set of covariates in \(X\) are all those that either predict the outcome (prognostic indicators) or modify response to treatment and relate to trial participation or treatment assignment. Let \(S\) denote the random variable for the data source from which an observation is obtained and \(\mathcal{S}\) the collection of trials. Our data application involves three randomized trials; we use \(S\) to denote the source trial for trial participants, with \(S\in \mathcal{S}=\left\{{1},{2},{3}\right\}\). We use the convention that \(S=0\) denotes the target population. Let \(Y\) be the outcome of interest examined in the trials and \(A\) the assigned treatment. We use lowercase letters to denote realizations of these random variables; for example, \(s\) denotes a specific study, and \(a\) denotes a specific treatment. We require data on covariates, treatment assignment, and outcomes from trial participants; but only data on covariates from the target population sample. Data from the trials and the sample from the target population are combined in a composite dataset where the total number of observations is denoted by \(n\), and observations are indexed by \(i\). We denote the potential outcome for participant i under treatment a as \({Y}_{i}^{a}\). The target of inference is the potential outcome mean in the target population, \(\mathrm{E}\left[{Y}^{a}|S=0\right]\). Other parameters of interest are functions of these potential outcome means. 
For example, the average treatment effect comparing two treatments, \(a\) and \({a}^{{'}}\), is defined as \(\mathrm{E}\left[{Y}^{a}-{Y}^{{a}^{{'}}}|S=0\right]=\mathrm{E}\left[{Y}^{a}|S=0\right]-\mathrm{E}\left[{Y}^{{a}^{{'}}}|S=0\right]\).

Under the identifiability conditions, the potential outcome mean for treatment \(a\) is identified by the following function of the observed data distribution (Dahabreh et al., 2019a): \(\phi \left(a\right)=\mathrm{E}\left[\mathrm{E}\left[Y|X,S\ne 0,A=a\right]|S=0\right]\). Informally, we are marginalizing the expectation of the outcome conditional on covariates and assignment to treatment \(a\) in the collection of trials (\(S\ne 0\)) over the covariate distribution of the target population (\(S=0\)). When the identifiability conditions hold, \(\phi \left(a\right)\) can be interpreted as the potential outcome mean under intention to assign members of the target population to treatment \(a\).
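The reasoning behind this expression can be sketched in four steps. This is an informal outline that takes the conditions as stated above; in particular, it assumes that exchangeability between treatment groups holds conditional on \(X\) in the pooled collection of trials. The cited work gives the exact versions of the conditions.

$$\begin{aligned}\mathrm{E}\left[{Y}^{a}|S=0\right]&=\mathrm{E}\left[\mathrm{E}\left[{Y}^{a}|X,S=0\right]|S=0\right]\\ &=\mathrm{E}\left[\mathrm{E}\left[{Y}^{a}|X,S\ne 0\right]|S=0\right]\\ &=\mathrm{E}\left[\mathrm{E}\left[{Y}^{a}|X,S\ne 0,A=a\right]|S=0\right]\\ &=\mathrm{E}\left[\mathrm{E}\left[Y|X,S\ne 0,A=a\right]|S=0\right]=\phi \left(a\right).\end{aligned}$$

The first equality is the law of iterated expectations. The second uses exchangeability among populations, the third uses exchangeability among treatment groups in the trials, and the fourth uses consistency. Positivity of trial participation and of treatment assignment ensures that the inner conditional expectation is defined for every covariate pattern that occurs in the target population.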

Estimation of Potential Outcome Means in the Target Population

The identification results above can be used to build an estimator of the potential outcome mean using a model for the expectation of the outcome conditional on covariates among trial participants assigned to treatment \(a\), that is, \(\mathrm{E}\left[Y|X,S\ne 0,A=a\right]\). Note that there are alternative approaches that rely on modeling the probability of trial participation instead of modeling the expectation of the outcome (Dahabreh et al., 2019a, 2020a), but for the purposes of this paper, we will focus on modeling the expectation of the outcome. Specifically, for every treatment of interest, \(a\), we propose to estimate \(\phi \left(a\right)\) as

$$\widehat{\phi }\left(a\right)={\left\{\sum _{i=1}^{n}I\left({S}_{i}=0\right)\right\}}^{-1}\sum _{i=1}^{n}I\left({S}_{i}=0\right){\widehat{g}}_{a}\left({X}_{i}\right).$$

The estimator uses an outcome regression model \({\widehat{g}}_{a}(X)\) estimated using trial data, generates model-based predictions using the covariates of everyone in the sample from the target population, and then averages the predictions in the sample of the target population to generate an estimate of the potential outcome mean in that population. We can obtain \({\widehat{g}}_{a}(X)\) using parametric approaches (e.g., generalized linear models) or more flexible data-adaptive approaches (e.g., regularized regression methods, random forests, or other machine learning methods). We can compare treatments \(a\) and \({a}^{{'}}\) in the target population by taking the difference \(\widehat{\psi }\left(a,{a}^{{'}}\right)=\widehat{\phi }\left(a\right)-\widehat{\phi }\left({a}^{{'}}\right)\). The estimators \(\widehat{\phi }\left(a\right)\) and \(\widehat{\psi }\left(a,{a}^{{'}}\right)\) can be interpreted as estimators of the potential outcome mean and the average treatment effect in the target population, respectively, provided the identifiability conditions hold and the model underlying \({\widehat{g}}_{a}(X)\) is correctly specified. Standard errors for \(\widehat{\phi }\left(a\right)\) and \(\widehat{\psi }\left(a,{a}^{{'}}\right)\) can be obtained using bootstrapping.
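The following minimal sketch illustrates the three steps of the estimator (fit on trial data, predict for the target sample, average). To keep it dependency-free, it swaps in a saturated outcome model for a single discrete covariate, so that \({\widehat{g}}_{a}(x)\) is simply the stratum-specific mean outcome. This is an assumption made for illustration and is not the logistic regression used in the worked example. The record layout, function names, and toy data are likewise hypothetical.

```python
from collections import defaultdict


def fit_stratum_means(records, treatment):
    """Estimate g_a(x) as the mean outcome among trial participants (S != 0)
    assigned to `treatment`, separately for each covariate pattern x."""
    outcomes = defaultdict(list)
    for rec in records:
        if rec["s"] != 0 and rec["a"] == treatment:
            outcomes[rec["x"]].append(rec["y"])
    return {x: sum(ys) / len(ys) for x, ys in outcomes.items()}


def transported_mean(records, treatment):
    """Average the predictions g_a(X_i) over the target sample (S == 0)."""
    g_hat = fit_stratum_means(records, treatment)
    target_x = [rec["x"] for rec in records if rec["s"] == 0]
    missing = {x for x in target_x if x not in g_hat}
    if missing:  # target pattern never observed under this treatment
        raise ValueError(f"no trial data for covariate patterns: {sorted(missing)}")
    return sum(g_hat[x] for x in target_x) / len(target_x)


def rows(s, x, a, ys):
    """Build toy trial records sharing source s, covariate x, and arm a."""
    return [{"s": s, "x": x, "a": a, "y": y} for y in ys]


# Toy composite dataset: one trial (S = 1) plus a covariate-only target sample.
toy = (
    rows(1, 0, "ST", [1, 0, 0, 0]) + rows(1, 1, "ST", [1, 1, 1, 0])
    + rows(1, 0, "HP", [0, 0, 1, 1]) + rows(1, 1, "HP", [1, 1, 1, 1])
    + [{"s": 0, "x": x, "a": None, "y": None} for x in (1, 1, 1, 0)]
)
phi_st = transported_mean(toy, "ST")
phi_hp = transported_mean(toy, "HP")
psi = phi_st - phi_hp
print(phi_st, phi_hp, psi)
```

Only the covariates of the target sample enter the calculation; its treatment and outcome fields are placeholders, mirroring the data requirements described above.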

Defining the Target Population

An important aspect of transportability methods is the focus on specifying the target population and obtaining a sample from it. Target populations should be chosen based on substantive considerations by specifying the population in which the users of the meta-analysis want to better understand the impact of one or more treatments. For example, eligibility criteria for target populations can be defined on the level of a country, region, hospital, or program. It is also possible to examine subgroups of patients within a target population who share characteristics enabling more specific treatment recommendations. Once the target population has been specified, a suitable sample needs to be obtained. Causal inferences are interpreted in the context of the target population and are expected to change depending on the distribution of prognostic indicators or treatment modifiers in the target population. When the identifiability conditions hold and representative data from the target population are available, transportability analyses enable the comparison of multiple treatments by estimating how the target population would have responded to each of the treatments being considered. Thus, estimates for causal parameters in the target population can be compared overcoming between-trial differences in participant characteristics. In the spirit of precision medicine (Dahabreh et al., 2016), such comparative effectiveness results can help decision-makers identify the most promising treatments for specific target populations or subpopulations. Transportability methods thus provide a promising solution to between-trial differences in participant characteristics and differences between trials and the target population. The remainder of this manuscript will focus on applying these approaches to three adolescent HIV prevention trials among youth receiving mental health care.

Transportability Analysis with Data from HIV Prevention

Methods

We harmonized individual participant data across three clinical trials of HIV prevention designed for adolescents in mental health care (NCT00603369, NCT00500487, NCT00496691). The trials evaluated approaches focusing on emotion regulation (ER; Brown et al., 2011, 2017), family processes (FM; e.g., parent-adolescent communication, parental monitoring and supervision; Barker et al., 2019a; Brown et al., 2014), and HIV-related knowledge and skills training (ST; e.g., partner negotiation, condom use; Brown et al., 2011). All trials included a general health promotion control (HP; e.g., sleep, nutrition, sexual health). Trials included youth ages 13 to 19 sampled from mental health hospitals and clinics, or from therapeutic schools. All trials included interim (3–6 months) and extended follow-up (9–12 months) assessments. The harmonized dataset represents the largest collection of adolescents receiving mental health services who participated in HIV prevention trials (n = 1323), with 4081 completed assessments out of a possible 4875. The largest of the three trials individually randomized adolescents to three arms, while the other two were implemented in therapeutic schools and used cluster-randomized cross-over designs. Specifically, participants received only one treatment, and to avoid crosstalk among students, only one treatment was administered at each school during each semester. Schools were randomized to treatment and rerandomized each semester so that by the end of the trial(s), all treatments were administered in each school. The larger of the two cluster-randomized studies was a three-arm trial and the smaller a two-arm trial. Both cluster-randomized trials had small intra-class correlations (i.e., < 0.01), suggesting that within-cluster dependence was not influential.
Although individuals were not randomized in these two trials, we proceeded with analyses assuming that treatment assignment was essentially random given covariates. Baseline characteristics for the three trials are presented in Table 1 and show marked between-trial differences in patient characteristics.

Table 1 Baseline characteristics

Measures

Measures in all three trials included demographics, psychiatric diagnosis using the Computerized Diagnostic Interview Schedule for Children (C-DISC; Schwab-Stone et al., 1996), functional impairment measured using the Columbia Impairment Scale (CIS; Bird et al., 1993), and risk behaviors associated with HIV transmission using the Adolescent Risk Behavior Assessment (ARBA; Donenberg et al., 2001). Also common across trials were 11 items addressing HIV knowledge and four items addressing self-efficacy for condom use. Items were averaged for HIV knowledge and for self-efficacy for condom use. All trials used audio-computer–assisted self-interviews to collect measures. The primary outcome was defined as reporting any occurrence of condomless sex across the extended follow-up period (i.e., across all follow-up assessments). We selected this cumulative definition to help address the developmentally expected sparsity and instability in adolescent sexual partnerships (Barker et al., 2019b) by increasing the opportunity for observing the risk behavior.

Analytic Approach

The goal of the analysis was to estimate (1) potential outcome means for each treatment (ER, FM, ST, HP) in a target population, (2) average treatment effects comparing treatments in the target population, and (3) conditional average treatment effects within prespecified subgroups identified as being key subpopulations in HIV prevention due to elevated risk for HIV infection including Black or African American young men, Black or African American young women, Hispanic youth, and youth reporting substance use. There were not sufficient numbers of other key subpopulations in HIV prevention, such as youth identifying as a sexual or gender minority, to support statistical analysis.

Target Population

We were interested in drawing inferences about a target population of US adolescents receiving mental health treatment in routine clinical practice. Unfortunately, we were unable to obtain data from such a target population where covariates (e.g., previous sexual history, mental health symptoms) sufficient to satisfy the identifiability conditions had been assessed. In particular, some of the larger surveillance datasets for adolescent populations provide information for some of these variables but differ from the trial data in the assessment of mental health symptoms. Thus, to illustrate the methods, we used a 15% holdout sample from Trial 1 as a sample from a target population of youth actively receiving mental health care and a 15% holdout sample from Trial 2 as a sample from a target population of youth attending therapeutic schools. Only the covariate data from the holdout samples were used, and the holdout observations were not used in the estimation of the outcome models. We used this somewhat artificial approach to illustrate the methods appropriate for drawing causal inferences about a new target population.

Estimation

We used logistic regression to estimate outcome models. The models included predictors that defined the subgroups of interest (i.e., gender, race, ethnicity, substance use) plus important predictors of the outcome including gender, ever engaging in vaginal or anal sex, any condomless sex in the past 3–6 months, and having an externalizing diagnosis. These important predictors were identified using a conditional random forest approach with conditional permutation variable importance (Strobl et al., 2009).

We built two types of outcome models using trial data. The first type was built to conduct one-trial-at-a-time transportability analyses, by extending inferences from each trial to the two target populations. Thus, we built separate outcome models for each trial that included interactions between the covariates and treatments evaluated in the trial. These models were then applied to covariate distributions in the target populations to generate predicted probabilities (under each treatment) which were then averaged over each target population. As a point of comparison, we also estimated potential outcome means in each trial, in effect using the covariate distribution of each trial as representative of the trial’s underlying population.

The second type of outcome model was built to synthesize information across trials. We fit one model using data from all three trials that included interactions between the covariates and treatments. This model was then applied to the covariate distributions of the target populations. To estimate average treatment effects, we subtracted the four transported potential outcome means within each target population to estimate pairwise contrasts (i.e., ERvFM, ERvST, ERvHP, FMvST, FMvHP, STvHP). Conditional average treatment effects were calculated by marginalizing over subgroups of each target population.
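The pairwise contrasts can be enumerated mechanically once the four transported means are available. The sketch below uses placeholder values for the potential outcome means (they are not results from the analysis) and a hypothetical helper name. It only shows how the six contrasts listed above arise from four means.

```python
from itertools import combinations

# Hypothetical transported potential outcome means for one target population;
# the numbers are placeholders, not estimates from the trials.
phi_hat = {"ER": 0.30, "FM": 0.28, "ST": 0.33, "HP": 0.31}


def pairwise_contrasts(means):
    """Return {"AvB": mean_A - mean_B} for every unordered pair of treatments,
    in the order the treatments appear in `means`."""
    return {f"{a}v{b}": means[a] - means[b] for a, b in combinations(means, 2)}


contrasts = pairwise_contrasts(phi_hat)
for label, diff in contrasts.items():
    print(f"{label}: {diff:+.2f}")
```

Repeating the same calculation with means averaged over a subgroup of the target sample, rather than the full sample, yields the corresponding subgroup contrasts.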

Provided the conditions of consistency, exchangeability across populations, and positivity of trial participation hold for each of the trials, and that outcome models are correctly specified, we expect potential outcome mean estimates for the target population to be the same for trials that evaluated the same treatment (Dahabreh et al., 2020a, 2019a). Thus, we compared the estimates from one-trial-at-a-time transportability analyses for each treatment evaluated in each trial. Marked differences in these estimates may suggest one or more identifiability conditions are violated for one or more of the trials and/or for one or more of the treatments. For example, differences among estimates can suggest lack of exchangeability (i.e., omission of important covariates from outcome models), or positivity violations, or variation in how treatments were administered across trials. Nevertheless, lack of differences in the estimates does not guarantee that the conditions hold. Furthermore, it is possible for the identifiability conditions to hold for the aggregate collection of trials, but not for each trial. For example, positivity of participation may be violated for individual trials but not for the aggregate collection (e.g., when a specific trial does not have data on a subgroup of the target population, but at least one other trial in the meta-analysis does; Dahabreh et al., 2019a).

Our data have features that complicate application of conventional approaches to individual participant data meta-analysis, including a small number of trials, treatments that were not evaluated in all trials, and a mixture of two- and three-arm trials. More importantly, when effects are heterogeneous over baseline covariates, conventional meta-analysis methods estimate different parameters compared to the transportability methods. To numerically compare our estimates with those of conventional approaches, we estimated the average treatment effect for the STvHP contrast using data from the two trials that reported a direct comparison between ST and HP groups. We fit a linear model using study fixed effects with the same covariates as the transportability analyses. Missing data were addressed using multiple imputation.

Bootstrap Sampling and Missing Data

We used nonparametric bootstrap to obtain standard errors and 95% confidence intervals. Participants were sampled with replacement from each treatment within each trial. Nonparametric bootstrap was selected because it does not require the distributional assumptions of parametric bootstrapping. Missing data on baseline characteristics ranged from 0.2% for gender to 8% for condom use self-efficacy. For the outcome, 65% of participants completed all assessments and 11% completed no follow-up assessments. Assuming data were missing at random (given baseline covariates), we performed a single imputation within each bootstrapped sample using predictive mean matching with chained equations (van Buuren & Groothuis-Oudshoorn, 2011). The imputation model included trial and treatment, all outcome assessments, and the covariates listed in Table 1. We then obtained estimates of potential outcome means and treatment effects in each of 1000 bootstrapped and imputed samples (Schomaker & Heumann, 2018).
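A minimal sketch of the resampling scheme, under stated assumptions, is given below. It resamples records with replacement within each trial-by-treatment stratum and summarizes a statistic by its bootstrap mean, standard error, and a normal-approximation interval. The 1.96 multiplier, the function names, the record layout, and the toy data are assumptions for illustration, and the within-sample imputation step is omitted.

```python
import random
import statistics
from collections import defaultdict


def stratified_resample(records, rng):
    """Resample with replacement within each (trial, treatment) stratum,
    keeping every stratum at its original size."""
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["s"], rec["a"])].append(rec)
    resampled = []
    for stratum_rows in strata.values():
        resampled.extend(rng.choices(stratum_rows, k=len(stratum_rows)))
    return resampled


def bootstrap_ci(records, statistic, n_boot=1000, seed=2024):
    """Bootstrap mean, standard error, and a normal-approximation 95% interval."""
    rng = random.Random(seed)
    draws = [statistic(stratified_resample(records, rng)) for _ in range(n_boot)]
    center = statistics.mean(draws)
    se = statistics.stdev(draws)
    return center, se, (center - 1.96 * se, center + 1.96 * se)


def risk_difference(records):
    """Toy statistic: mean outcome under ST minus mean outcome under HP."""
    def mean_y(arm):
        ys = [r["y"] for r in records if r["a"] == arm]
        return sum(ys) / len(ys)
    return mean_y("ST") - mean_y("HP")


# Hypothetical binary outcomes from two trials, each with ST and HP arms.
toy = (
    [{"s": 1, "a": "ST", "y": y} for y in [1, 0, 0, 1, 0, 0, 0, 1]]
    + [{"s": 1, "a": "HP", "y": y} for y in [1, 1, 0, 1, 0, 1, 0, 1]]
    + [{"s": 2, "a": "ST", "y": y} for y in [0, 0, 1, 0, 1, 0]]
    + [{"s": 2, "a": "HP", "y": y} for y in [1, 0, 1, 1, 0, 0]]
)
center, se, (lower, upper) = bootstrap_ci(toy, risk_difference, n_boot=500)
print(f"{center:.3f} (SE {se:.3f}), 95% CI [{lower:.3f}, {upper:.3f}]")
```

In a full analysis, the statistic passed to `bootstrap_ci` would first impute missing values in the resampled data and then compute the transported estimate, so that imputation uncertainty is reflected in the bootstrap distribution.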

Results

Estimated potential outcome means in the target populations from one-trial-at-a-time transportability analyses are shown in Fig. 1. There are marked differences between estimates from transportability analyses and those for each trial’s underlying population. For both target populations, there appear to be some differences between one-trial-at-a-time transportability analyses for trials that examined ST and HP, suggesting that there may be violations of the identifiability conditions. We will return to the implications of these between-trial differences in the Discussion. Estimated potential outcome means combining data across trials are also depicted in Fig. 1 and suggest that treatments are largely similar to one another.

Fig. 1

Estimated potential outcome means. Original estimates were based on the covariate distribution within each trial. By trial, estimates were transported from each trial to the target populations. Combined estimates use causally interpretable meta-analysis to transport the potential outcome means to the target population. FM family process; ER emotion regulation; ST skills training; HP health promotion

Average treatment effects and conditional average treatment effects for subgroups of the target populations are shown in Fig. 2. These results suggest some variability in treatment response among subgroups, with some evidence that Black or African American young women in therapeutic schools may not respond as well to the ST treatment. There is also some evidence that Hispanic youth in both target populations may not respond as well to the FM treatment, which is consistent with recent work adapting the FM treatment to better meet the needs of this population (Lescano et al., 2020). The subgroup analyses, however, result in wide confidence intervals, suggesting that more evidence is needed before making clinical determinations regarding heterogeneity of treatment effects. Importantly, we were able to obtain estimates for subgroups defined by multiple covariates, such as race and gender, and to estimate treatment effects comparing treatments that had not been directly compared in any trial (ER vs. FM).

Fig. 2

Estimated treatment effects. The figure summarizes average treatment effects for each treatment transported to the target samples, for the full target sample as well as for youth who recently used substances, Hispanic youth, Black or African American young women, and Black or African American young men. Shown are mean average treatment effects across bootstrapped samples and 95% confidence intervals calculated using bootstrapped standard errors. FM family process; ER emotion regulation; ST skills training; HP health promotion

For comparison, using conventional meta-analysis methods to combine evidence from the two trials that directly compared ST to HP, the treatment effect was estimated to be −0.04 [95% CI −0.10; 0.02], which differed from the causally interpretable meta-analysis estimates when using a target population of youth receiving mental health care, 0.01 [−0.06; 0.09] or a target population of youth in therapeutic schools, 0.03 [−0.05; 0.12]. The numerical differences between the estimates likely reflect the fact that the coefficients in the conventional meta-analysis do not estimate the same parameter as the novel estimators described in this paper. In fact, in causally interpretable meta-analysis, the parameter is expected to change with each target population. Furthermore, the two approaches rely on different model specification assumptions and do not use the same data (e.g., conventional methods do not use data from the target population).

Discussion

This manuscript presents an application of recently proposed methods that allow for causal interpretation of individual-participant-data meta-analysis (Dahabreh et al., 2019a, 2020a). Under explicit causal and statistical modeling assumptions, the methods provide estimates of potential outcome means and treatment effects in a target population of substantive interest and its subgroups and enable comparisons among treatments that have not been compared directly in the same trial. As with any new analytical tool, more work is needed to understand the utility of the methods in practical applications.

Defining the Target Population

Clearly specifying a target population and obtaining a representative sample of that population are defining features of transportability analyses. Ideally, sample data from the target population include all predictors of the outcome and effect modifiers that have a different distribution in the collection of trials and target population; the same variables need to be measured in the trial data. In practice, it can be difficult to obtain such trial and target population data. For example, in our analyses, there were marked differences in which covariates were measured across the three trials limiting the number of potential covariates available for analysis. We were also unable to obtain data from a target population of adolescents in mental health care that assessed the same key covariates as the trials. Data quality also impacts the ability to quantify whether a given sample from the target population is representative of that population. In settings with rich data resources, such as hospital systems with electronic medical records, target populations can be specified and sampled, presuming that the medical records contain assessments of the same patient characteristics as the trial data. In other settings, it is important to continue developing data resources that utilize common data elements in clinical trials as well as in data sampled from target populations of interest (Cella et al., 2007; Ohmann et al., 2017; Polanin & Williams, 2016).

The analyses reported here attempt to address differences in the covariate distributions of the trial samples and the target population. The target population distribution can be estimated from data that are masked or otherwise altered to maintain confidentiality. Currently, the approach assumes that an individual contributes data to only one sample from the trials or target population. There are situations where an individual may contribute data to the target population sample (e.g., through a medical record) and to a trial sample. If overlap can be identified, then it can be easily addressed in statistical analyses; unidentifiable overlap is usually more challenging to address (Saegusa, 2019).

Evaluating Assumptions

The identifiability conditions necessary for causal inference in meta-analysis must be carefully considered in light of substantive knowledge. For identification of treatment effects in each trial, the consistency, exchangeability among treatment groups, and positivity of treatment assignment probability will usually hold in well-designed randomized controlled trials with well-defined treatments. The no-interference component of the consistency condition depends on the nature of the treatments being compared and the structure of the population underlying the trials. For instance, interference may be present when treatments are implemented on the level of pre-existing groups such as schools or communities.

Evaluating the additional identifiability conditions needed for transportability analyses is challenging. Comparing potential outcome means from each trial in target populations like those depicted in Fig. 1 can provide an indication that assumptions are violated either for individual trials or for the aggregate collection of trials. It is possible for assumptions to hold for some treatments but not others. For example, the differences between one-trial-at-a-time transportability analyses and analyses pooling data across trials for the ST and HP treatments suggest that one or more identifiability conditions were violated for one or more of the trials, in relation to the target population; these differences, however, do not pinpoint where violations occurred or their exact nature. In general, differences can indicate that exchangeability across populations does not hold (necessitating sensitivity analyses; Dahabreh et al., 2019d); that there exists outcome-relevant treatment variation across trials; that positivity of participation does not hold for one or more trials; or that models are incorrectly specified.
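The comparison of one-trial-at-a-time and pooled transportability estimates can be illustrated with a minimal sketch. It is not the estimator used in the analyses reported here: it assumes a single covariate `x`, a continuous outcome, a linear outcome model, and simulated data in which a hypothetical third trial delivers a different version of arm "A". All function and variable names (`fit_ols`, `transported_mean`, `simulate_trial`) are illustrative assumptions.

```python
import random
from statistics import mean

random.seed(2021)


def fit_ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def transported_mean(rows, arm, target_x):
    """Fit the outcome model in one arm, then average its predictions over the target sample."""
    x = [r["x"] for r in rows if r["arm"] == arm]
    y = [r["y"] for r in rows if r["arm"] == arm]
    intercept, slope = fit_ols(x, y)
    return mean(intercept + slope * xt for xt in target_x)


def simulate_trial(n, shift):
    """Hypothetical trial: randomized arm, covariate centered at 0, arm-A outcome shifted by `shift`."""
    rows = []
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        arm = random.choice(["A", "B"])
        base = 2.0 - 0.5 * x if arm == "A" else 2.5
        y = base + (shift if arm == "A" else 0.0) + random.gauss(0.0, 1.0)
        rows.append({"x": x, "arm": arm, "y": y})
    return rows


# Trials 1 and 2 share one version of arm A; trial 3 delivers a different version.
trials = {1: simulate_trial(300, 0.0), 2: simulate_trial(300, 0.0), 3: simulate_trial(300, 2.0)}
# The target sample has a shifted covariate distribution (mean 1 rather than 0).
target_x = [random.gauss(1.0, 1.0) for _ in range(500)]

per_trial = {k: transported_mean(rows, "A", target_x) for k, rows in trials.items()}
pooled = transported_mean([r for rows in trials.values() for r in rows], "A", target_x)

for k, est in per_trial.items():
    print(f"trial {k}: transported mean under A = {est:.2f}")
print(f"pooled: transported mean under A = {pooled:.2f}")
```

In this simulation the estimate from the third trial departs from the other two, and the pooled estimate falls in between. As in the text, such a discrepancy signals that some condition fails without identifying which one.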

In our worked example, the analysis included many variables that have been linked to condomless sex. However, other prognostic indicators were not included in the modeling because they were not assessed in all trials, including emotion regulation, family functioning, parent-adolescent communication, parental monitoring, partner characteristics, HIV stigma, abuse history, and parental psychopathology, among others (Barker et al., 2012, 2019a; Hadley et al., 2015, 2017). If one or more of these variables predict the outcome and differ in distribution between the trials and the target population, their omission may bias our estimates, and including them in the models would generally improve validity.

Another explanation for the between-trial discrepancy in potential outcome means is between-trial differences in treatment content and/or implementation that lead to outcome-relevant variation. The "single version of each treatment" component of the consistency condition needs additional consideration in the context of individual participant data meta-analysis because it requires a treatment to be similarly structured and implemented across trials (Hernán & VanderWeele, 2011; VanderWeele, 2009). This is perhaps one of the most difficult assumptions to satisfy in behavioral trials, as there are almost always some differences in how a treatment is designed and implemented across trials. In our example, the core content of each treatment type was similar across trials, yet there were differences in how treatments were implemented, such as the number of sessions, length of sessions, and time between sessions. By combining the treatments across trials, we assume that these variations in implementation do not affect treatment potency. That assumption may not hold: HP, for instance, was delivered at a shorter dose in Study 3 than in the other two trials. An intriguing direction for future investigation would be to further develop transportability methods to examine treatment variation between trials.

We evaluated the positivity of trial participation assumption by comparing the distributions of the estimated probability of participation between the collection of trials and the samples of the target populations. We also compared the distribution for each individual trial with that of the samples of the target populations. Estimated probabilities were generated by logistic regression with the same covariates as the outcome models. We found adequate overlap in the distributions of the estimated probabilities, suggesting that the positivity condition was not grossly violated for the variables included in our models.
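A minimal sketch of this kind of overlap check follows. It is an illustration under stated assumptions rather than the code used in the analyses: the data are simulated, there is a single hypothetical standardized covariate, the logistic regression is fitted by plain gradient ascent to keep the example dependency-free, and the names `fit_logistic`, `predict_proba`, and `overlap_summary` are invented for the example.

```python
import math
import random

random.seed(7)


def predict_proba(beta, x):
    """Logistic probability; the linear predictor is clamped to avoid overflow in exp."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Fit logistic regression by batch gradient ascent; returns [intercept, slopes...]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            resid = yi - predict_proba(beta, xi)
            grad[0] += resid
            for j in range(p):
                grad[j + 1] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def overlap_summary(p_trial, p_target):
    """Ranges of estimated probabilities and share of target values inside the trial range."""
    lo, hi = min(p_trial), max(p_trial)
    share = sum(lo <= p <= hi for p in p_target) / len(p_target)
    return {
        "trial_range": (lo, hi),
        "target_range": (min(p_target), max(p_target)),
        "target_share_in_trial_range": share,
    }


# Simulated, hypothetical data: one standardized covariate (e.g., symptom severity),
# lower on average among trial participants than in the target sample.
trial_x = [[random.gauss(0.0, 1.0)] for _ in range(200)]
target_x = [[random.gauss(0.8, 1.0)] for _ in range(200)]
X = trial_x + target_x
s = [1] * len(trial_x) + [0] * len(target_x)  # 1 = trial participant, 0 = target sample

beta = fit_logistic(X, s)
p_trial = [predict_proba(beta, x) for x in trial_x]
p_target = [predict_proba(beta, x) for x in target_x]
summary = overlap_summary(p_trial, p_target)
print(summary)
```

Target-sample probabilities that sit near zero, or a low share of target values inside the range observed among trial participants, would point to covariate patterns with little or no trial representation. Plotting the two distributions, as described above, conveys the same information in more detail.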

Obtaining and Harmonizing Individual Participant Data

Obtaining data from investigators continues to be a primary challenge to using these and other methods that rely on individual data (Polanin & Williams, 2016). There has been a consistent effort to improve data accessibility by requiring researchers to make their data publicly available (Cella et al., 2007; Ohmann et al., 2017; Polanin & Williams, 2016). As these efforts continue to mature, more individual participant data will become available for use. It remains an open question how data from trials that predate these efforts will be included, as it is costly to acquire and prepare older datasets. Once trial data are publicly available and appropriately documented, they must be harmonized across trials. Harmonization entails equating variable names, coding, and, where possible, the meaning of the questions and measures used in each trial (Curran & Hussong, 2009). Using common data elements across trials facilitates harmonization. When common data elements are not present, appropriately harmonizing data across trials requires a significant investment of time and expertise. These costs need to be accounted for when planning analyses with individual participant data.
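The mechanical part of harmonization, equating variable names and codings, can be sketched as a per-trial codebook that maps each source variable to a common name and recoding. The codebooks, variable names, and codings below are hypothetical and do not describe the trials analyzed here; equating the meaning of measures requires substantive judgment that no such mapping captures.

```python
# Hypothetical per-trial codebooks: source variable -> (common name, value recoding or None).
CODEBOOKS = {
    "trial_1": {"sex": ("female", {"F": 1, "M": 0}), "age_yrs": ("age", None)},
    "trial_2": {"gender": ("female", {2: 1, 1: 0}), "age": ("age", None)},
}


def harmonize(record, trial):
    """Rename and recode one participant record into the common schema."""
    out = {"trial": trial}
    for source_name, (common_name, recode) in CODEBOOKS[trial].items():
        value = record.get(source_name)  # None marks a variable the record lacks
        if value is not None and recode is not None:
            value = recode[value]
        out[common_name] = value
    return out


pooled = [
    harmonize({"sex": "F", "age_yrs": 15}, "trial_1"),
    harmonize({"gender": 1, "age": 16}, "trial_2"),
]
print(pooled)
```

Keeping the mapping in data rather than in code makes each harmonization decision explicit and reviewable, which matters when the same recoding must later be applied to a sample from the target population.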

Systematically Missing Data

Properly addressing missing data is essential to obtaining valid inference from clinical trials and for individual participant data meta-analysis. Much has been written about addressing missing data when some covariate is unavailable from some, but not all, individuals in a trial; we addressed potential bias introduced by such missingness by imputing each bootstrapped sample under a missing-at-random assumption. When dealing with data from multiple trials, differences in which patient characteristics are assessed and in which instruments are used to measure them result in systematically missing data, where some covariates are not available for any individuals in one or more trials. There has been work addressing systematically missing data between trials (Curran et al., 2016, 2018; Hong et al., 2018; Jolani et al., 2015; Kunkel & Kaizar, 2017), but to our knowledge, this work has not considered transportability analyses. To address systematically missing data in transportability analyses, it will be necessary to define additional identifiability conditions and develop novel estimators.
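For sporadically missing covariates, the order of operations described above (resample first, then impute within each bootstrap sample) can be sketched as follows. This is a simplified illustration, not the imputation model used in the analyses: it assumes one partially observed covariate `x`, one fully observed covariate `z`, missingness that depends only on `z`, and a stochastic regression imputation; the data are simulated and the names `fit_ols` and `impute` are hypothetical.

```python
import random
from statistics import mean

random.seed(11)


def fit_ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def impute(sample):
    """Stochastic regression imputation of x from z, fitted on the complete cases of `sample`."""
    complete = [(z, x) for z, x in sample if x is not None]
    zs, xs = zip(*complete)
    b0, b1 = fit_ols(zs, xs)
    residuals = [x - (b0 + b1 * z) for z, x in complete]
    return [x if x is not None else b0 + b1 * z + random.choice(residuals) for z, x in sample]


# Hypothetical data: x depends on z, and x is missing more often when z is high (MAR given z).
data = []
for _ in range(300):
    z = random.gauss(0.0, 1.0)
    x = 1.0 + 0.8 * z + random.gauss(0.0, 1.0)
    if random.random() < (0.5 if z > 0 else 0.1):
        x = None
    data.append((z, x))

# Bootstrap first, then impute within each bootstrap sample and compute the estimate.
estimates = []
for _ in range(200):
    boot = random.choices(data, k=len(data))
    estimates.append(mean(impute(boot)))

estimates.sort()
lower, upper = estimates[5], estimates[194]  # approximate 95% percentile interval
complete_case_mean = mean(x for _, x in data if x is not None)
print(f"complete-case mean of x: {complete_case_mean:.2f}")
print(f"bootstrap-then-impute mean of x: {mean(estimates):.2f} ({lower:.2f}, {upper:.2f})")
```

Imputing inside each bootstrap replicate lets the interval reflect uncertainty from the imputation model as well as from sampling. The sketch does not extend to systematically missing covariates, where no complete cases exist within a trial to fit the imputation model.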

Implications for HIV Prevention

Results from the worked example suggest that differences among treatments for condomless sex are minimal across the extended follow-up. There was limited evidence of effect modification by factors associated with elevated risk for HIV acquisition, including that Black or African American young women in therapeutic schools do not appear to benefit from the skills training compared to other treatments, and Hispanic youth tend not to respond to the family-based treatment as implemented in Study 1. These results are valid if the identifiability conditions and missing data assumptions are met, and models are correctly specified. As discussed previously, there is evidence of violations of the identifiability conditions in our analyses. Our results, therefore, should be interpreted with caution. Future work, collecting additional information on prognostic indicators of condomless sex and including more trials, could improve precision and allow a more thorough evaluation of the identifiability conditions by examining patterns when transporting inferences from different trials to the same target population (e.g., to explore differences in treatment content and implementation across trials).

Implications for Evidence Synthesis

The conditions that allow estimation of treatment effects in a target population using treatment and outcome data from multiple trials suggest new directions and opportunities for evidence synthesis. Perhaps the most important consideration is that the utility of inferences drawn by synthesizing evidence from diverse sources depends on both the quality of evidence in each source and the relevance of that evidence to the target population of interest. Inference is connected to the target population in that results will vary depending on how the target population is defined and which participant characteristics influence treatment response. Guidelines and reporting standards for evidence synthesis tend to focus on evidence quality, with little emphasis on the relevance of evidence to the target population. Evidence synthesis would be strengthened by including more explicit discussion of the relation between trials included in a meta-analysis and the target population(s) of substantive interest, as well as more explicit consideration of the numerous assumptions undergirding results and conclusions.

Conclusions

Transportability methods promise to provide individual participant data meta-analysis results that have a causal interpretation and thus facilitate the assessment of comparative effectiveness among treatments in target populations of substantive interest and their subgroups, even when the treatments have not been compared in the same head-to-head trial. To realize this promise, effort is needed to improve measurement consistency and data availability across trials and samples from target populations.