G-computation demonstration in causal mediation analysis

Wang, Aolin; Arah, Onyebuchi A.

doi:10.1007/s10654-015-0100-z

G-computation demonstration in causal mediation analysis

METHODS
Published: 04 November 2015

Volume 30, pages 1119–1127, (2015)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

European Journal of Epidemiology Aims and scope Submit manuscript

G-computation demonstration in causal mediation analysis

Download PDF

Aolin Wang^1,2 &
Onyebuchi A. Arah^1,2,3

4810 Accesses
54 Citations
8 Altmetric
Explore all metrics

Abstract

Recent work has considerably advanced the definition, identification and estimation of controlled direct, and natural direct and indirect effects in causal mediation analysis. Despite the various estimation methods and statistical routines being developed, a unified approach for effect estimation under different effect decomposition scenarios is still needed for epidemiologic research. G-computation offers such unification and has been used for total effect and joint controlled direct effect estimation settings, involving different types of exposure and outcome variables. In this study, we demonstrate the utility of parametric g-computation in estimating various components of the total effect, including (1) natural direct and indirect effects, (2) standard and stochastic controlled direct effects, and (3) reference and mediated interaction effects, using Monte Carlo simulations in standard statistical software. For each study subject, we estimated their nested potential outcomes corresponding to the (mediated) effects of an intervention on the exposure wherein the mediator was allowed to attain the value it would have under a possible counterfactual exposure intervention, under a pre-specified distribution of the mediator independent of any causes, or under a fixed controlled value. A final regression of the potential outcome on the exposure intervention variable was used to compute point estimates and bootstrap was used to obtain confidence intervals. Through contrasting different potential outcomes, this analytical framework provides an intuitive way of estimating effects under the recently introduced 3- and 4-way effect decomposition. This framework can be extended to complex multivariable and longitudinal mediation settings.

Mendelian randomisation for mediation analysis: current methods and challenges for implementation

Article Open access 07 May 2021

Estimating the natural indirect effect and the mediation proportion via the product method

Article Open access 20 November 2021

Testing the Causal Direction of Mediation Effects in Randomized Intervention Studies

Article 21 May 2018

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Recent work has considerably advanced the definition, identification and estimation of different controlled or natural effects in causal mediation analysis. As a result, various estimation methods such as linear structural equation modeling, outcome and mediator regression-based methods, parametric g-computation, inverse-probability-weighted (IPW) fitting of marginal structural models (MSMs), and sequential g-estimation [1, 2] have been applied to mediation settings. With some notable exceptions [3–5], relatively few approaches can incorporate general types of mediator and outcome variables. The simulation-based approach introduced by Imai et al. [3] and the regression-based approach proposed by Valeri and VanderWeele [5] are increasingly popular approaches; the former may be unfamiliar to epidemiologists, and the latter requires different regressions or approximations for different mediator and outcome types. Meanwhile, g-computation [1], as an alternative for computing marginal effects over IPW fitting of MSM [6], holds promise to provide a unified framework for effect estimation in causal mediation analysis, especially given its capability in dealing with time-varying exposure and confounding [4, 7]. However, a didactic demonstration of the application of g-computation in causal mediation analysis, reflecting recent decompositions for mediation and interaction, is lacking.

In this paper, we demonstrate the utility of (parametric) g-computation in estimating various components of the total effect, such as (1) natural direct and indirect effects, (2) standard and stochastic controlled direct effects, and (3) reference and mediated interactions, using standard statistical software. We focus on marginal effects (standardized over covariates). The current approach extends the previous work on g-computation demonstration for total effect [6] and the gformula package (mediation option) in Stata [4] by showing the actual steps in estimation and incorporating estimation for various component effects under 2-, 3- and 4-way effect decomposition [8, 9]. The paper is organized as follows: we will first review the definition and identification criteria for different effects in mediation context. Then we will review the g-computation algorithm and introduce steps for mediation analysis. After an illustrative example using a partially simulated data set, we will discuss the strengths and limitations of g-computation and its relation to other existing estimation procedures. Readers familiar with the background material on mediation analysis under the potential outcomes framework can go directly to g-computation steps section.

Notation and definitions

Let X denote the exposure of interest, Y the outcome of interest, M the mediator of interest, and Z a set of covariates not affected by the exposure but which are assumed to be sufficient for confounding control for total, direct and indirect effects estimation. Throughout, we assumed X preceded M, which preceded Y. Let Y _x and M _x denote respectively the potential values of the outcome and mediator that would have occurred had exposure X been set, possibly counter to fact, to a specific value x. Similarly, let Y _xm denote the potential value of Y that would have occurred had X and M been set, possibly counter to fact, to x and m respectively. We use \(Y_{{xM_{{x^{*} }}}}\) to express potential outcome value had the exposure X been set to x and M to \(M_{{x^{*} }}\). Let x (index) and \(x^{*}\) (reference) denote two values of the exposure we wish to compare, and m (index) and \(m^{*}\) (reference) denote two controlled values of the mediator we wish to compare.

Total effect (TE) compares exposure level x to \(x^{*}\), while allowing the mediator to obtain its natural value under each exposure level. Thus, by assuming generalized consistency [10], TE can also be defined as \({\text{E}}\left[ {Y_{{xM_{x} }} - Y_{{x^{*} M_{{x^{*} }} }} } \right]\). TE can be decomposed into different types of component effects. There are two-way, three-way, and four-way decompositions of the total effect as presented in Table 1 [8, 9, 11, 12]. The counterfactual definitions are listed in Table 2 (left column). The choice of effect decomposition should be guided by substantive research questions (Table 3).

Table 1 Summary of effect decomposition in causal mediation analysis

Full size table

Table 2 Definition and empirical analog of total effect and component effects

Full size table

Table 3 Potential research questions related to various effects under effect decomposition

Full size table

There are four types of “direct” effects. The standard controlled direct effect (CDE) compares exposure level x to \(x^{*}\) while fixing the mediator to a specific level. The CDE estimates the effect of X on Y while fixing M to m for every individual in the population and it can be different for different levels of m [11, 12]. The stochastic CDE (CDE_sto) compares exposure level x to \(x^{*}\) while randomizing the mediator to a pre-specified distribution \(M^{\prime }\). Accordingly, the stochastic CDE subsumes both the standard CDE and the stochastic mediation contrast [13] in that the standard CDE corresponds to \(M^{\prime }\) being a constant m for the total population (i.e., full intervention) while the stochastic mediation contrast corresponds to \(M^{\prime }\) being a constant for a subset of the population and being the observed distribution for the rest (i.e., partial intervention).

The average pure direct effect (PDE) compares exposure level x to \(x^{*}\) while the mediator M is set to the natural value it would have attained under the reference level \(x^{*}\) of exposure (i.e., \(M_{{x^{*} }}\)). Accordingly, the average total direct effect (TDE) differs from PDE in that the mediator M is set to the natural value it would have attained under the index level x of exposure (i.e., M _x). Direct effects are relative to the mediator M of interest, that is, they are effects through alternative pathways other than through M.

The average pure indirect effect (PIE) compares mediator level M _x to \(M_{{x^{*} }}\) while setting exposure to reference level \(x^{*}\). The average total indirect effect (TIE) differs from the PIE in that the exposure is set to index level x.

Reference interaction effect (RIE) and mediated interaction effect (MIE) capture the effect of X on Y due to interaction only and the effect of X on Y due to both interaction and mediation respectively [9]. RIE and MIE are sometimes combined to reflect the effect of X on Y due to overall interaction and termed “portion attributable to interaction” (PAI) [9].

Assumptions for identification and estimation

To identify and estimate the effect decomposition quantities, we invoke the stable unit treatment value assumption (SUTVA) [1, 15], and assumptions of consistency [16], conditional exchangeability (no-uncontrolled-confounding), and positivity [17]. The conditional exchangeability assumption for mediation analysis includes the following [12, 18]: (1) the effect of the exposure X on the outcome Y is unconfounded conditional on a set Z of measured covariates; (2) the effect of the mediator M on Y is unconfounded conditional on both X and Z; (3) the effect of X on M is unconfounded conditional on Z. For identifying standard CDE, (1) and (2) are sufficient but for stochastic CDE, all three assumptions are needed. Successful randomization of the exposure will support the assumption of no uncontrolled confounding of the exposure–mediator and exposure–outcome relations but will not guarantee the absence of uncontrolled confounding of the mediator–outcome relation [19]. In the presence of possible violation of this assumption, sensitivity analysis are needed [3, 20]. To identify natural effects, a fourth conditional exchangeability assumption is needed: (4) none of the mediator–outcome confounders are affected by exposure. Assumption (4) is known as the cross-world independence assumption [21] because it requires that, conditional on Z, the mediator that would have been observed in a world under \(X = x^{*}\) is independent of the outcome that would have been observed in a world under X = x (i.e., ). This assumption is problematic because these two variables \(M_{{x^{*} }}\) and \(Y_{{xM_{x} }}\) can never be observed together [18, 22]. Recent research proposed identification criteria [23] and effect decomposition [18] aimed at circumventing the violation of this fourth assumption. However, this issue is beyond the scope of this article. In addition, we assume no selection bias and measurement error. Under the above assumptions, different types of effect can be nonparametrically identified and estimated using the empirical analogs listed in Table 2 (right column). For the current paper, we adopted a fully parametric approach, i.e., positing parametric (regression) models for each expectation in these empirical analogs to estimate their parameters from the observed data and then integrating over the covariate and/or mediator distribution [24]. We further assumed no model misspecification.

G-computation

G-computation algorithm was first introduced by Robins [1] to estimate the causal effect of a time-varying exposure in the presence of time-varying confounders that are affected by exposure, a scenario where traditional regression-based methods would fail. In recent years, several didactic examples were given in the literature [6, 25, 26], promoting the use of this causal analytic technique. Increasingly, more studies have applied this methodology in estimating the effect of dynamic treatment regimes [27, 28] or projecting the impact of hypothetical interventions [29–33]. In a simple setting with a single-time exposure and an outcome, g-computation can be seen as the generalization of standardization. Accessible examples of g-computation with detailed discussion of its strengths and limitations can be found elsewhere [2, 4, 6, 30].

G-computation steps in causal mediation analysis

The g-computation steps have been summarized in Fig. 1. Step 1 involves obtaining the parameters of the assumed covariate distributions, and fitting the assumed models for the mediator (M model) and the outcome (Y model) using observed data. The key covariates needed for the M model are confounders of the exposure–mediator relation. The key covariates needed for the Y model are confounders of the exposure–outcome relation and mediator–outcome relation. To avoid simulating the covariate set Z, one can replace steps 1a and 2a with resampling (Fig. 1). Step 2 entails simulating the covariate set Z (step 2a), an exposure intervention variable X (step 2b), and the potential mediator (step 2c), and outcome Y (step 2d) sequentially for J (J can be as large as computationally feasible) copies of the original sample. The simulation repetition done here is to reduce Monte Carlo simulation error. The intervention variable X should be distinguished from the observed exposure variable as the intervention X and the simulated covariates are marginally independent of each other. In step 2c, we simulate each potential mediator as a function of the simulated covariates and intervention X from the previous steps (2a and 2b), using the parameters obtained from the M model in step 1b. Similarly, in step 2d, we simulate a potential outcome variable that corresponds to each specific type of effect as a function of the simulated covariates, exposure intervention, and mediator from the previous steps, using the parameters obtained from the Y model in step 1c. Step 3 involves regressing each different potential outcome variable on the intervention variable X to obtain estimates of each marginal effect using the pooled data with J copies of the original sample. Repeat step 2–3 on K (usually 200 or more) bootstrapped samples taken at random with replacement from the original data. The Wald type 95 % confidence interval (CI) was calculated as: point estimate ±1.96 × SD, where SD was the standard deviation of the K resultant point estimates from the final regression in the third step.

Illustration

We used a directed acyclic graph (DAG) [34] to represent the data generating process for the illustration example (Fig. 2). We used the India sample from the World Health Survey (WHS) [35]. We used all the observed covariates to simulate exposure smoking, mediator body mass index (BMI, 5-unit increase), and outcome composite health score (0–100) sequentially according to the data generating process shown in Fig. 2. Covariates for confounding control included age, gender, education, urbanicity, and depression. Detailed description for generating this partially simulated data set can be found in the supplementary materials. In the illustrative example, we will focus on the interpretations for PDE and TIE from the most common decomposition used in epidemiology and the interpretations for component effects based on the recently introduced 4-way decomposition. Since smoking was binary, we used 1 to represent “yes” and 0 “no”. A second illustrative example using binary exposure, mediator and outcome was included in the supplementary materials. All analyses were done using SAS version 9.4 (SAS Institute Inc., Cary, NC) and the accompanying SAS code can be found in the supplementary materials.

To estimate different component effects of smoking on health, we implemented the following steps:

Step 1 Obtaining empirical parameters
- (1a) We obtained the marginal expectation of each variable except the outcome, and the standard deviation for the continuous age variable.
- (1b) The mediator BMI was regressed on smoking, age, gender, education, urbanicity and depression to obtain the regression coefficients and root mean square error (RMSE) for the linear M model.
- (1c) The outcome, overall health score, was then regressed on smoking, BMI, smoking × BMI, age, gender, education and depression to obtain the regression coefficients and RMSE for the linear Y model.
Step 2 Simulating the potential mediators and outcomes
- (2a) We created 1000 copies of the original sample and simulated age, gender, education, urbanicity, and depression that followed the same distribution as the observed variables.
- (2b) We simulated a smoking intervention variable (X) that followed the observed smoking prevalence but was marginally independent of all simulated covariates.
- (2c) We simulated each potential BMI variable as a function of the smoking intervention, age, gender, education, urbanicity and depression (Table 4, simulating M), using the regression coefficients and RMSE from the M model fit in (1b).
  Table 4 Equations used to simulate potential mediators and outcomes in step 2 of the G-computation marginal structural model for the illustrative example
  Full size table
- (2d) We simulated a potential health score variable for each type of effect as a function of the smoking intervention, potential BMI from (2c), product term between smoking intervention and potential BMI, age, gender, education and depression (Table 4, simulating Y), using the regression coefficients and RMSE from the Y model fit in (1c).

We will use PDE and TIE as examples to explain steps (2c) and (2d) further. For binary exposure, recall that PDE compares X = 1 to X = 0 while setting the mediator M to M ₀ (natural value under reference exposure). In (2c), we simulated the potential BMI variable (M ₀) as a function of non-smoking (setting X = 0 in the equation for simulating M) and other determinants of BMI. Next, we simulated the potential health score variable (Y _PDE) as a function of the smoking intervention variable (X), potential BMI variable (M ₀) from (2c), and other determinants of health in (2d). On the other hand, TIE compares mediator level M ₁ to M ₀ while setting exposure to index level 1. In (2c), we simulated the potential BMI variable (M _x) as a function of smoking intervention (setting X = x in the equation for simulating M) and other determinants of BMI. Then, we simulated the potential health score variable (Y _TIE) as a function of smoking (setting X = 1 in the equation for simulating Y), potential BMI variable (M _x) from (2c), and other determinants of health in (2d). In this way, the potential BMI variable (M _x) transmitted the effect of smoking intervention to health.

Step 3 Fitting final marginal structural models (MSMs)

We regressed each different potential health score variable on the smoking intervention to obtain point estimates of each marginal effect using the pooled sample. We repeated step 2–3 on 200 bootstrapped samples of the same size taken at random with replacement from the original data to obtain Wald type 95 % CIs.

Results from the illustrative example were presented in Table 5. This example is for illustration purpose and thus the results were not intended for quotation. Estimates followed the effect decompositions as described in Table 1. Smoking had an overall negative impact on health (TE: −0.96, 95 % CI −1.79, −0.13), but the majority of this impact was through pathways other than changing BMI (PDE: −0.70, 95 % CI −1.54, 0.14). When BMI was fixed at 24 for everyone, smoking did not appear to affect health directly (CDE: 0.27, 95 % CI −0.95, 1.50). In a hypothetical intervention where BMI was no longer affected by smoking and other covariates, smoking still had a negative impact on health (CDE_sto: −0.81, 95 % CI −1.63, 0.02). The impact of smoking on health that was due to interaction with BMI (RIE: −0.99, 95 % CI −1.71, −0.26) was much larger than the part that was due to both mediation and interaction (MIE: −0.14, 95 % CI −0.27, −0.01). The presence of such smoking-BMI interaction contributed to the difference seen when comparing CDE to PDE and CDE_sto.

Table 5 Effect estimate (95 % CI) for the illustrative example using g-computation of marginal structure models (N = 5326)

Full size table

Discussion

In this article we demonstrated the utility of parametric g-computation in estimating various marginal effects under different and detailed effect decompositions as well as stochastic controlled direct effects, using Monte Carlo simulations in standard statistical software. To our knowledge, this is the first use of g-computation for 3- and 4-way decomposition of effects recently introduced by VanderWeele [9]. Our approach yielded similar results as those obtained from VanderWeele’s approach [9] (online supplementary materials). However, marginal (standardized) effect measures obtained via g-computation approach are not conceptually equal to the conditional (on covariates) effect measures obtained from the latter approach and results from the two approaches may differ [36]. Alternative imputation [37] or simulation [3] based methods are also available for common mediation parameters.

G-computation has several strengths. It uses models for the outcome and the mediator, which produces more efficient estimates (with narrower confidence intervals) than the weighting approaches that use models for the mediator and/or the exposure [37, 38]. It can be used to estimate various types of effect of interest, incorporate nonlinearities and exposure-covariate and mediator-covariate interactions, and deal with general types of outcome, exposure and mediator. This simulation-based approach can be used to estimate various effects on both difference and ratio scales.

However, g-computation is not without its limitations. The parametric g-computation method applied in mediation settings, like the general g-formula, relies on a correctly specified model for the outcome. For natural effects specifically, it additionally requires that the model for the mediator is correctly specified, as with other approaches published previously [38–42]. When such parametric distribution for M is in doubt, a distribution-free approach with regard to the mediator [43] can be used. Alternative approaches are to use non-parametric g-computation method that combines bootstrapping and simulation as suggested in Imai et al. [3], or implement a doubly robust estimator that requires at least one of the model for exposure and mediator being correctly specified [37]. In addition, the computation time depends on the sample size and the number of covariates. As these two numbers increase, a random subset of the sample can be selected to perform the Monte Carlo simulation [4].

G-computation in mediation analysis, especially parametric g-formula implemented via Monte Carlo simulation, can be seen as a special application of the longitudinal time-varying g-computation formula [1, 7]. In this case, both post-baseline exposure and confounders that are affected by exposure can be seen as mediators. The steps are similar in simulating the baseline confounders and exposure intervention first and then the consequences of the exposure following the data structure represented in a specific DAG, though additional assumptions are needed in the mediation setting. G-computation as a unifying and flexible framework will gain popularity with the increase in applications of mediation analysis to answer mechanistic questions about either contextual or individual level causes [13, 44, 45].

By showing the steps for g-computation in estimating different quantities of interest in causal mediation analysis, we hope to encourage a wider audience of applied researchers to implement this framework, using software packages of their choice. Given the growing interest in adopting and applying complex systems approaches to examine complex disease etiologies, this method, especially with its simulation component, will be an important intermediate step towards this journey [46].

References

Robins J. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math Model. 1986;7(9–12):1393–512.
Article Google Scholar
Robins JM, Hernan MA. Estimation of the causal effects of time-varying exposures. In: Verbeke G, Davidian M, Fitzmaurice G, Molenberghs G, editors. Longitudinal data analysis. London: Chapman and Hall/CRC; 2009. p. 553–99.
Google Scholar
Imai K, Keele L. Tingley DA general approach to causal mediation analysis. Psychol Methods. 2010;15(4):309–34.
Article PubMed Google Scholar
Daniel RM, De Stavola BL, Cousens SN. gformula: estimating causal effects in the presence of time-varying confounding or mediation using the g-computation formula. Stata J. 2011;11(4):479–517.
Google Scholar
Valeri L, Vanderweele TJ. Mediation analysis allowing for exposure–mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychol Methods. 2013;18(2):137–50.
Article PubMed Central PubMed Google Scholar
Snowden JM, Rose S, Mortimer KM. Implementation of G-computation on a simulated data set: demonstration of a causal inference technique. Am J Epidemiol. 2011;173(7):731–8.
Article PubMed Central PubMed Google Scholar
VanderWeele T, Tchetgen ET. Mediation analysis with time-varying exposures and mediators. Harvard Univ Biostat Work Pap Ser. 2014. Working Paper 168. http://biostats.bepress.com/harvardbiostat/paper168.
VanderWeele TJ. A three-way decomposition of a total effect into direct, indirect, and interactive effects. Epidemiology. 2013;24(2):224–32.
Article PubMed Central PubMed Google Scholar
VanderWeele TJ. A unification of mediation and interaction: a 4-way decomposition. Epidemiology. 2014;25(5):749–61.
Article PubMed Central PubMed Google Scholar
Vansteelandt S. Estimation of direct and indirect effects. In: Berzuini C, Dawid P, Bernardinelli L, editors. Causality: statistical perspectives and applications. Chichester: Wiley; 2012. p. 126–50.
Chapter Google Scholar
Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology. 1992;3(2):143–55.
Article CAS PubMed Google Scholar
Pearl J. Direct and indirect effects. In: Proceedings of the seventeenth conference on uncertainty in artificial intelligence. San Francisco: Morgan Kaufmann; 2001. p. 411–20. http://ftp.cs.ucla.edu/pub/stat_ser/R273-U.pdf.
Naimi AI, Moodie EEM, Auger N, Kaufman JS. Stochastic mediation contrasts in epidemiologic research: interpregnancy interval and the educational disparity in preterm delivery. Am J Epidemiol. 2014;180(4):436–45.
Article PubMed Google Scholar
Rubin DB. Discussion of “Randomization analysis of experimental data in the Fisher randomization test” by Basu. J Am Stat Assoc. 1980;75(371):591–3.
Google Scholar
Rubin DB. Neyman (1923) and causal inference in experiments and observational studies. Stat Sci. 1990;5:472–80.
Google Scholar
Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–60.
Article CAS PubMed Google Scholar
Hernán MA, Robins JM. Estimating causal effects from epidemiological data. J Epidemiol Community Health. 2006;60:578–86.
Article PubMed Central PubMed Google Scholar
Vanderweele TJ, Vansteelandt S, Robins JM. Effect decomposition in the presence of an exposure-induced mediator–outcome confounder. Epidemiology. 2014;25:300–6.
Article PubMed Central PubMed Google Scholar
Imai K, Keele L, Tingley D, Yamamoto T. Unpacking the black box of causality: learning about causal mechanisms from experimental and observational studies. Am Polit Sci Rev. 2011;105:765–89.
Article Google Scholar
VanderWeele TJ. Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology. 2010;21:540–51.
Article PubMed Central PubMed Google Scholar
Richardson TS, Robins JM. Single world intervention graphs (SWIGs): a unification of the counterfactual and graphical approaches to causality. 2013. https://www.csss.washington.edu/Papers/wp128.pdf.
Naimi AI, Kaufman JS, MacLehose RF. Mediation misgivings: ambiguous clinical and public health interpretations of natural direct and indirect effects. Int J Epidemiol. 2014;43(5):1656–61.
Article PubMed Google Scholar
Tchetgen Tchetgen EJ. Vanderweele TJ. Identification of natural direct effects when a confounder of the mediator is directly affected by exposure. Epidemiology. 2014;25(2):282–91.
Article PubMed Google Scholar
Daniel RM, De Stavola BL, Cousens SN, Vansteelandt S. Causal mediation analysis with multiple mediators. Biometrics. 2015;71:1–14.
Article PubMed Central CAS PubMed Google Scholar
Daniel RM, Cousens SN, De Stavola BL, Kenward MG, Sterne JAC. Methods for dealing with time-dependent confounding. Stat Med. 2013;32:1584–618.
Article CAS PubMed Google Scholar
Keil AP, Edwards JK, Richardson DB, Naimi AI, Cole SR. The parametric G-formula for time-to-event data: intuition and a worked example. Epidemiology. 2014;25:889–97.
Article PubMed Central PubMed Google Scholar
Young JG, Cain LE, Robins JM, O’Reilly EJ, Hernán MA. Comparative effectiveness of dynamic treatment regimes: an application of the parametric g-formula. Stat Biosci. 2011;3:119–43.
Article PubMed Central PubMed Google Scholar
Westreich D, Cole SR, Young JG, Palella F, Tien PC, Kingsley L, et al. The parametric g-formula to estimate the effect of highly active antiretroviral therapy on incident AIDS or death. Stat Med. 2012;31:2000–9.
Article PubMed Central PubMed Google Scholar
Ahern J, Hubbard A, Galea S. Estimating the effects of potential public health interventions on population disease burden: a step-by-step illustration of causal inference methods. Am J Epidemiol. 2009;169:1140–7.
Article PubMed Central PubMed Google Scholar
Taubman SL, Robins JM, Mittleman MA, Hernán MA. Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. Int J Epidemiol. 2009;38:1599–611.
Article PubMed Central PubMed Google Scholar
Cole SR, Richardson DB, Chu H, Naimi AI. Analysis of occupational asbestos exposure and lung cancer mortality using the g formula. Am J Epidemiol. 2013;177:989–96.
Article PubMed Central PubMed Google Scholar
Danaei G, Pan A, Hu FB, Hernán MA. Hypothetical midlife interventions in women and risk of type 2 diabetes. Epidemiology. 2013;24:122–8.
Article PubMed Central PubMed Google Scholar
Garcia-Aymerich J, Varraso R, Danaei G, Camargo CA, Hernán MA. Incidence of adult-onset asthma after hypothetical interventions on body mass index and physical activity: an application of the parametric g-formula. Am J Epidemiol. 2014;179:20–6.
Article PubMed Central PubMed Google Scholar
Pearl J. Causal diagrams for empirical research. Biometrika. 1995;82(4):669–88.
Article Google Scholar
World Health Organization. World Health Survey: guide to administration and question by question specifications. Geneva. 2002. http://www.who.int/healthinfo/survey/whsshortversionguide.pdf. Accessed 22 Oct 2015.
Simpson EH. The interpretation of interaction in contingency tables. J R Stat Soc Ser B. 1951;13:238–41.
Google Scholar
Vansteelandt S, Bekaert M, Lange T. Imputation strategies for the estimation of natural direct and indirect effects. Epidemiol Method. 2012;1(1):131–58.
Article Google Scholar
Tchetgen EJT, Shpitser I. Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness and sensitivity analysis. Ann Stat. 2012;40(3):1816–45.
Article Google Scholar
Tchetgen Tchetgen E, Shpitser I. Semiparametric estimation of models for natural direct and indirect effects. Harvard Univ Biostat Work Pap Ser. 2011.
Van der Laan MJ, Petersen ML. Direct effect models. Int J Biostat. 2008;4(1):1–27.
VanderWeele TJ. Marginal structural models for the estimation of direct and indirect effects. Epidemiology. 2009;20(1):18–26.
Article PubMed Google Scholar
Imai K, Keele L, Yamamoto T. Identification, inference and sensitivity analysis for causal mediation effects. Stat Sci. 2010;25(1):51–71.
Article Google Scholar
Albert JM. Distribution-free mediation analysis for nonlinear models with confounding. Epidemiology. 2012;23(6):879–88.
Article PubMed Central PubMed Google Scholar
Zhang YT, Laraia BA, Mujahid MS, et al. Does food vendor density mediate the association between neighborhood deprivation and BMI? Epidemiology. 2015;26(3):344–52.
Article PubMed Central PubMed Google Scholar
Jackson JW, VanderWeele TJ, Viswanathan A, Blacker D, Schneeweiss S. The explanatory role of stroke as a mediator of the mortality risk difference between older adults who initiate first- versus second-generation antipsychotic drugs. Am J Epidemiol. 2014;180(8):847–52.
Article PubMed Central PubMed Google Scholar
Hernán MA. Invited commentary: agent-based models for causal inference—reweighting data and theory in epidemiology. Am J Epidemiol. 2015;181(2):103–5.
Article PubMed Google Scholar
Pearl J. The causal mediation formula—a guide to the assessment of pathways and mechanisms. Prev Sci. 2012;13(4):426–36.
Article PubMed Google Scholar
Imai K, Keele L, Tingley D, Yamamoto T. Unpacking the black box: learning about causal mechanisms from experimental and observational studies. Am Polit Sci Rev. 2011;105(4):765–89.
Article Google Scholar

Download references

Funding

AW was supported by a doctoral scholarship from the Chinese Scholarship Council (CSC), and the Dissertation Year Fellowship from the University of California, Los Angeles. OAA was partly supported by grant R01-HD072296-01A1 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). The authors benefited from facilities and resources provided by the California Center for Population Research at UCLA (CCPR), which receives core support (R24-HD041022) from the Eunice Kennedy Shriver NICHD.

Author information

Authors and Affiliations

Department of Epidemiology, Fielding School of Public Health, University of California, Los Angeles (UCLA), 650 Charles E. Young Drive South, Los Angeles, CA, 90095-1772, USA
Aolin Wang & Onyebuchi A. Arah
California Center for Population Research (CCPR), Los Angeles, CA, USA
Aolin Wang & Onyebuchi A. Arah
UCLA Center for Health Policy Research, Los Angeles, CA, USA
Onyebuchi A. Arah

Authors

Aolin Wang
View author publications
You can also search for this author in PubMed Google Scholar
Onyebuchi A. Arah
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Aolin Wang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 86 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wang, A., Arah, O.A. G-computation demonstration in causal mediation analysis. Eur J Epidemiol 30, 1119–1127 (2015). https://doi.org/10.1007/s10654-015-0100-z

Download citation

Received: 19 May 2015
Accepted: 28 October 2015
Published: 04 November 2015
Issue Date: October 2015
DOI: https://doi.org/10.1007/s10654-015-0100-z

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

G-computation demonstration in causal mediation analysis

Abstract

Similar content being viewed by others

Mendelian randomisation for mediation analysis: current methods and challenges for implementation

Estimating the natural indirect effect and the mediation proportion via the product method

Testing the Causal Direction of Mediation Effects in Randomized Intervention Studies

Introduction

Notation and definitions

Assumptions for identification and estimation

G-computation

G-computation steps in causal mediation analysis

Illustration

Discussion

References

Funding

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (DOCX 86 kb)

Rights and permissions

About this article

Cite this article

Keywords

Navigation

G-computation demonstration in causal mediation analysis

Abstract

Similar content being viewed by others

Mendelian randomisation for mediation analysis: current methods and challenges for implementation

Estimating the natural indirect effect and the mediation proportion via the product method

Testing the Causal Direction of Mediation Effects in Randomized Intervention Studies

Introduction

Notation and definitions

Assumptions for identification and estimation

G-computation

G-computation steps in causal mediation analysis

Illustration

Discussion

References

Funding

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (DOCX 86 kb)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation