1 Introduction

This article is motivated by an analysis of the English Longitudinal Study of Ageing, a longitudinal cohort study of individuals aged 50 and older living in the community in England, which follows participants biennially since 2002/03 (Steptoe et al. 2013; Davies et al. 2017). In previous studies, it was shown that both self-reported hearing loss and loneliness are significantly associated with a higher risk of dementia (Davies et al. 2017; Davies-Kershaw et al. 2018; Rafnsson et al. 2020). The question of interest, which we shall address here, is whether loneliness mediates the impact of hearing loss on incident physician-diagnosed dementia.

The focus on time-to-event endpoints (dementia) complicates the planned mediation analysis. It renders the popular difference- and product-of-coefficient methods inappropriate (VanderWeele 2016; Robins and Greenland 1992; Pearl 2001), necessitating the use of more complex causal mediation analysis methods. While such methods have been developed for the analysis of time-to-event endpoints, most ignore that the mediator is not assessed at baseline and that subjects may therefore experience the event prior to the mediator being assessed(Lange et al. 2013; Huang and Yang 2017; Vandenberghe et al. 2018). Further complications arise from the mediator, loneliness, being repeatedly measured. While useful to better capture mediation via the entire longitudinal mediator process (Vansteelandt et al. 2019), this also gives rise to complex time-varying confounding patterns whereby mediators (e.g. loneliness) and confounders (e.g. comorbidities) mutually influence each other over time. These complications have been addressed in a number of recent works (Zheng and van der Laan 2017; Lin et al. 2017; Vansteelandt et al. 2019).

A further complication that we must consider, and that we have not previously found being addressed in the mediation analysis literature, is the presence of competing risks by death. With observed - as opposed to counterfactual - event times, the modelling of cause-specific hazards is well known to simplify the handling of competing risks. However, the modelling of (cause-specific) hazards is not readily possible in the previous mediation analysis works (Zheng and van der Laan 2017; Lin et al. 2017; Vansteelandt et al. 2019), which instead focus on the analysis of survival chances. In view of this, in this paper, we will introduce so-called path-specific effect proportional (cause-specific) hazard models, which directly parameterise the direct and indirect effects of a given exposure on the cause-specific hazard of the considered event. We will extend the weighting-based approach proposed by Mittinty and Vansteelandt (2020) for longitudinal natural effect models to enable the planned mediation analysis of an exposure via repeatedly measured mediators on a time-to-event endpoint subject to competing risks.

We proceed as follows. In the next section, we first describe the setting of interest. We then extend the path-specific effect proportional hazard model to take into account the longitudinal nature of the mediator. In the same section, we discuss the assumptions under which the path-specific direct and indirect effects derived from such model are identifiable from data, and provide a step-by-step procedure for estimating these effects. In sect. 3, we apply the proposed approach to analyze data from the English Longitudinal Study of Ageing (Steptoe et al. 2013; Davies et al. 2017). We end with some final remarks and a discussion.

2 Proposal

2.1 Path-specific models for longitudinal mediators and a time-to-event endpoint subject to competing risks

Consider an observational study in which independent individuals \(i = 1, \ldots , n\) are exposed to a categorical factor \(A_i\) coded as \(0, 1,\ldots , P-1\) for P different categories. Longitudinal measurements of the mediator \(M_{i0},M_{i1}\ldots , M_{iK}\) and of the covariates \(L_{i0},L_{i1}\ldots , L_{iK}\) are subsequently recorded at baseline (subscript 0) and at visits \(1, \ldots , K\), along with (i) a time-to-event endpoint \(T_i\) and (ii) an index \(D_i\) specifying whether the main event (\(D_i = 1\)) or the competing one (\(D_i = 2\)) happens. Denote \(t_{(k)}\), \(k = 1\ldots K\) the fixed time point after the onset of the exposure at which the measurements of \(M_k\) and \(L_k\) are pre-planned for all patients. Assume that these measurements are only recorded until the last visit K or until event \(D_i = 1\) or \(D_i = 2\) happens, whichever comes first. The time-to-event endpoint may be censored administratively or due to loss to follow-up, in which case \(D_i = 0\).

The causal diagram in Fig. 1 depicts the relationships between the variables over time. In the diagram, \(L_k\) includes the indicator \(I(T\ge t_{(k)})\) of having survived visit k. Throughout, we will denote the history of measurements up to visit k using a bar, i.e. \({\overline{M}}_k = (M_1,\ldots M_k)\) and \({\overline{L}}_k = (L_1,\ldots L_k)\).

Fig. 1
figure 1

Causal diagram. \(U_l\): unmeasured confounders affecting L and (TD). \(U_m\): unmeasured confounders affecting different measurements of M over time. Note that in this figure, \(L_k=M_k = 0\) if \(T<t_{(k)}\). This is to avoid that future measurements of L and M after the time-to-event T can influence T

To define the direct and indirect effects of interest, we will make use of so-called path-specific effects, expressed as a (cause-specific) hazard ratio. In particular, we define the counterfactual variables \(T_{a,a^*}\) and \(D_{a,a^*}\) as the time to the main or competing event (whichever comes first) and the corresponding event index that would be observed if the exposure A were set to a and the mediator levels changed to the levels that we would have seen if the exposure were set to \(a^*\) and the levels of the time-varying confounders were as observed under this joint intervention on A and \({\overline{M}}_K\), respectively. The cause-j-specific hazard \(\lambda ^j_{a, a^*}(t)\) observed under such intervention can then be expressed as:

$$\begin{aligned} \lambda ^j_{a, a^*}(t) = \lim _{\delta t\rightarrow 0^+} \frac{P(t\le T_{a,a^*}\le t+\delta t,D_{a,a^*}=j | T_{a,a^*}\ge t)}{\delta t} \end{aligned}$$

The total causal effect (TE) on the cause-j-specific hazard when the exposure changes from a to \(a^*\) is thus \( HR^j_{TE}(t) = \frac{\lambda ^j_{a^*, a^*}(t)}{\lambda ^j_{a, a}(t)}\), which can be decomposed into the direct effect (DE) \(HR^j_{DE}(t) = \frac{\lambda ^j_{a^*, a}(t)}{\lambda ^j_{a, a}(t)}\) and the indirect effect (IE) \(HR^j_{IE}(t) = \frac{\lambda ^j_{a^*, a^*}(t)}{\lambda ^j_{a^*, a}(t)}\), where \(HR^j_{TE}(t) = HR^j_{IE}(t) \times HR^j_{DE}(t)\). The indirect effect \(HR^j_{IE} (t) \) hence reflects the part of the treatment effect (on the cause-j-specific hazard) that is mediated via the pathways \(A \rightarrow M_k \rightarrow \ldots \rightarrow T\), where \(k = 1, 2,\ldots \). On Fig. 1, these pathways start from the treatment A and go directly to one of the mediators before getting to the event time T by any intermediate path. In contrast, the direct effect \(HR^j_{DE} (t) \) reflects the part of the treatment effect (on the cause-j-specific hazard) that does not go through any of the above pathways.

There are many reasons why we choose to focus on the aforementioned pathways. As visualized in Fig. 1, one may nearly always expect the mediators and confounders to influence each other mutually over time. Due to this, many pathways will involve both mediators and confounders. This blurs a good understanding which pathways can be viewed as representing a mechanism via the mediator versus a mechanism via confounders. The decomposition that corresponds best with our intuition of a mechanism via a given mediator, is arguably the one we choose to focus on. The treatment or exposure first affects the mediator, which may in turn generate a cascade of effects, possibly involving confounders and mediators at later time points, to subsequently affect the outcome. Pathways whereby the exposure first influences confounders before in turn affecting mediator and then outcome, correspond better with our intuition of capturing (part of) the mechanism via that confounder. As will be shown below, this combination of effects, whereby the exposure first affects the mediator (at any time) and then in turn outcome by no matter what pathways, is also what can be identified from the observed data under reasonable assumptions.

One further challenge when defining the causal indirect effect via the proposed pathway is the truncation-by-death problem. This arises as a result of the difference in a person’s time-to-death with versus without the considered exposure. Indeed, even if a given person would survive a given time point t in the absence of exposure (\(A=0\)), that person’s counterfactual mediator value in the presence of exposure (\(A=1\)) is ill-defined at time t, if that person would have died by that time in the presence of the exposure. In that case, it is wise to interpret the identified effects differently as expressing the effect when setting the mediator at each time point at a random draw from its distribution in the exposed who are still alive and have the same considered observed data history. A more detailed discussion on this can be found elsewhere (Vansteelandt et al. 2019).

Finally, note that the total, direct and indirect effects on the (cause-specific) hazard ratio scale have a subtle interpretation. In the causal inference literature, the hazard ratio is often criticized due to the fact that it may be affected by selection effects caused by unobserved heterogeneity over time, and hence does not generally have a causal interpretation (Young et al. 2020; Martinussen et al. 2020). As an example, consider the cause-j specific hazard ratio \(HR_{TE}^j\) defined above as:

$$\begin{aligned} HR_{TE}^j=\frac{\lambda _{a^*,a^*}^j(t)}{\lambda _{a,a}^j(t)} = \frac{\lim _{\delta t \rightarrow 0^+}P(t\le T_{a^*,a^*}\le t+\delta t,D_{a^*,a^* } =j | T_{a^*,a^* }\ge t)/\delta t}{\lim _{\delta t\rightarrow 0^+} P(t\le T_{a,a}\le t+\delta t,D_{a,a} =j | T_{a,a}\ge t)/\delta t} \end{aligned}$$

The righthand expression makes clear that \(HR_{TE}^j\) contrasts the cause-j specific hazard functions with and without the exposure for two separate groups of individuals, namely (i) those who survive time t if exposed (i.e. \(T_{a^*,a^*}\ge t\)) and (ii) those survive time if unexposed t. (i.e. \(T_{a,a} \ge t\)). These two groups are in general not comparable if the exposure has a non-null causal effect on (TD) (except at baseline if the exposure is ‘randomized’). This selection effect likewise impacts the two causal contrasts \(HR_{DE}^j\) and \(HR_{IE}^j\).

To overcome the above limitation, the total, direct and indirect effect could be alternatively defined on the cause-specific cumulative incidence scale. For instance, under the joint intervention on A and \({\overline{M}}_K\) described above, one then has:

$$\begin{aligned} F^j_{a,a^*}(t) := P(T_{a,a^*}\le t,D_{a,a^*}=j)=\int _0^t \exp \bigg (-\int _0^u\sum _{j}\lambda ^j_{a,a^*}(s)ds\bigg )\lambda ^j_{a,a^*}(u)du, \end{aligned}$$
(1)

which expresses the incidence of the occurrence of event j at time t, while taking other competing risks into account. Contrasts of \(F^j_{a^*,a^*}(t)\) and \(F^j_{a,a}(t)\) (e.g. on the relative scale) hence quantify the total effect of the exposure on the event j at time t. Similarly, contrasts of \(F^j_{a^*,a^*}(t)\) and \(F^j_{a^*,a}(t)\) (or \(F^j_{a^*, a}(t)\) and \(F^j_{a,a}(t)\)) quantify the indirect effect of A on the event j via the pathways \(A \rightarrow M_k \rightarrow \ldots \rightarrow T\) (or the direct effect of A on the event j not via these pathways) at time t. Unlike hazard ratios, these contrasts are marginally defined and hence are causally interpretable.

Even so, exposure effects on the cumulative incidence functions also have a subtle interpretation because of the exposure effect on competing causes. For instance, the exposure may seem to be protective of the event of interest merely because it increases the likelihood of the competing event (Putter et al. 2007). Young et al. (2020) therefore evaluate effects under hypothetical interventions which eliminate the exposure effect on the competing events. We here avoid this as it demands additional assumptions, which we judged to be implausible in our study.

We are now ready to define the cause—j—specific path-specific effect proportional hazard model, accounting for longitudinal mediators, as follows:

$$\begin{aligned} \lambda ^j_{a, a^*}(t) = \lambda _0^j(t)e^{\alpha _{1j}a + \alpha _{2j}a^*} \end{aligned}$$
(2)

for all \(a,a^*\), where \(\lambda _0^j(t), \alpha _{1j}\) and \(\alpha _{2j}\) are unknown. This model is inspired by the so-called natural effect models proposed by Steen et al. (2017b), and subsequently extended by Mittinty and Vansteelandt (2020) to a longitudinal outcome setting. The difference, however, is that model (2) targets path-specific effects rather than natural indirect effects as in Steen et al. (2017b). Also, note that when \(a=a^*\), the proposed model (2) reduces to the standard marginal structural model proposed by Robins et al. (2000). It can thus be viewed as a generalization of marginal structural models to investigate path-specific effects.

Under model (2), the total, direct and indirect effect can be expressed on the hazard ratio scale as \(HR^j_{TE} = e^{(\alpha _{1j}+\alpha _{2j})(a^* - a)}\); \(HR^j_{IE} = e^{\alpha _{2j}(a^* - a)}\) and \(HR^j_{DE} = e^{\alpha _{1j}(a^* - a)}\), respectively. To assess the possibility of mediator-exposure interaction, one can alternatively consider model:

$$\begin{aligned} \lambda ^j_{a, a^*}(t) = \lambda _0^j(t)e^{\alpha _{1j}a + \alpha _{2j}a^* + \alpha _{3j}a.a^*} \end{aligned}$$
(3)

Under model (3), the total causal effect on the hazard ratio scale is expressed as \(HR^j_{TE} = e^{\left( \alpha _{1j}+\alpha _{2j} + \alpha _{3j}(a^*+a)\right) (a^* - a)}\) and is decomposed into the indirect effect \( HR^j_{IE} = e^{(\alpha _{2j} + \alpha _{3j}a)(a^* - a)}\) and the direct effect \(HR^j_{DE} = e^{(\alpha _{1j} + \alpha _{3j}a^*)(a^* - a)}\). The cumulative incidence function \(F_{a,a^*}^j(t)\) at any time t can also be computed from the estimates of model (2) and (3) (see below). Finally, note that other models for survival outcomes, such as the Aalen model, can also be extended to the current context.

2.2 Identification

In this section, we discuss the assumptions that allow one to identify the distribution of \(Y_{a,a^*} = (T_{a,a^*}, D_{a,a^*})\) from the observed data. For now, we will ignore the problem of censoring and will specifically focus on identifying the cause-j-specific cumulative distribution function \(F^j_{a,a^*}(t)\) at a time t between \(t_3\) and \(t_4\), i.e. the last measurements of M and L are \(M_3\) and \(L_3\), respectively. All results given below generalize to any time t with measurements for more than three study visits.

First, we make use of the following recanting witness (or cross-world) assumption:

  1. (i)

This assumption is satisfied in the causal graph depicted in Fig. 1, provided that it represents a non-parametric structural equation model with independent errors. Under this assumption, \(F^j_{a,a^*}(t)\) for \(t_3 < t \le t_4\) can be written as:

$$\begin{aligned} F^j_{a,a^*}(t) =&\int \text {Pr}\{T(a, {\overline{m}}_3) \le t, D(a, {\overline{m}}_3) = j, \nonumber \\&\quad ~ L_1(a) = l_1, L_2(a, l_1, m_1)=l_2, L_3(a, {\overline{l}}_2, {\overline{m}}_2) = l_3| L_0=l_0\} \nonumber \\&\times f\{ M_1 (a^*,l_1)=m_1, M_2(a^*,{\overline{l}}_2,m_1) = m_2, M_3(a^*,{\overline{l}}_3, ) = m_3, {\overline{m}}_2|L_0=l_0 \} \nonumber \\&\times f(L_0 = l_0) ~d{\overline{l}}_3 ~ d{\overline{m}}_3 \end{aligned}$$
(4)

where \(f\{\cdot \}\) denotes the (joint) density. To link \(F^j_{a,a^*}(t)\) to the observed data, we will further assume that the set of baseline covariates \(L_0\) is sufficient to control for confounding of the relationship between A and \(Y = (T,D)\), as well as between A and \(M_k\) or \(L_k\) at any time \(t_{(k)}\).

  1. (ii)

    and for all \(k = 1, 2, 3\)

  2. (iii)

Besides, we assume that conditional on the observed history, there are no unmeasured confounders of the relationship between the time-to-event outcome and the mediator, as well as between the mediator and the longitudinal confounder at any time. More precisely,

  1. (iv)

    for \(k = 1,2\) and

  2. (v)

    for all \(k = 1, \ldots , 3\)

For instance, in Fig. 1, assumption (iv) is satisfied since conditioning on the exposure A and the history of the time-varying confounder L up to time t is sufficient to adjust for confounding of the relationship between the time-to-event outcome T and the mediator level at time t. Our development allows for the presence of unmeasured common causes of the mediators over time (i.e. denoted \(U_m\) in Fig. 1) and separate, independent unmeasured common causes of the baseline/time-varying confounders and the survival time (TD) (i.e. denoted \(U_l\)).

Finally, we make use of the standard consistency assumption, i.e.:

  1. (vi)

    \(\text {Pr}(L_1(a) = L_1| a) = \text {Pr}(M_1(a,l_1)=M_1|a, l_1) = \text {Pr}(L_2(a,l_1,m_1)=L_2|a, l_1,m_1) = \ldots = 1\)

In Appendix 1, we show that under the above assumptions, \(F^j_{a,a^*}(t)\) can be linked to the observed data as:

$$\begin{aligned} F^j_{a,a^*}(t) =&\int \text {Pr}\{T \le t, D = j|a,{\overline{l}}_3, {\overline{m}}_3, l_0\} \cdot \prod _{s:0<s\le 3} f(m_{s}| a^*, {\overline{m}}_{s - 1}, {\overline{l}}_{s}, l_{0}, T\ge t_{(s)})\cdot \\&\cdot f(l_{s}|{\overline{l}}_{s-1}, {\overline{m}}_{s - 1}, l_{0}, a, T\ge t_{(s)})\cdot f(l_0)~d{\overline{m}}_3~d{\overline{l}}_3~d l_0 \end{aligned}$$

2.3 Estimation

In what follows, we generalize the standard estimation procedure for the popular marginal structural models to fit models (2) and (3). For this, we make an additional assumption that censoring is non-informative, in the sense that where C denote the time-to-censoring. At any timepoint, the instantaneous risk of the event among patients who then drop out of the study is hence not different (at all future times) from that of patients who remain, conditional on the exposure level. In the next section, we will relax this assumption to allow for more complicated censoring mechanisms. Under the aforementioned assumptions, Appendix 2 shows that consistent estimators of the parameters indexing model (2) can be obtained by solving the estimating equation:

(5)

where \(R_i^j(t) = I(T_i\ge t, D_i = j)\); \(dN^j_i(t) = I(T_i = t, D_i = j)\) and I(.) denotes the indicator function. Besides,

$$\begin{aligned} W_i(\lfloor t \rfloor ,a,a^*) =&\frac{\prod _{s\in {\mathbb {N}},t_s\le \lfloor t \rfloor } {\hat{\text {Pr}}}(M_{s,i}|A_i = a^*, {\overline{M}}_{s - 1,i}, {\overline{L}}_{s,i}, L_{0,i}, T_i \ge t_{(s)})}{\prod _{s \in {\mathbb {N}}, t_s\le \lfloor t \rfloor } {\hat{\text {Pr}}}(M_{s,i}|A_i = a, {\overline{M}}_{s - 1,i}, {\overline{L}}_{s,i}, L_{0,i}, T_i \ge t_{(s)})}\\&\times \frac{I(A_i = a)}{{\hat{\text {Pr}}}(A_i=a|L_{0,i})} \end{aligned}$$

denotes the weight of individual i at time t and at exposure levels a and \(a^*\), \({\hat{\text {Pr}}}(\cdot )\) denotes a consistent estimator of the corresponding probability \(\Pr (\cdot )\), and \({\hat{E}}(\cdot )\) denotes the sample average. Here, the notation \(\lfloor t \rfloor \) is slightly different from its standard definition, to take into account the fact that if a patient experiences an event or leaves the study at time \(t=t_{(k)}\) of visit k, no measurement of \(M_k\) and \(L_k\) is possible at that time. More precisely,

$$\begin{aligned}\lfloor t \rfloor = {\left\{ \begin{array}{ll} t_{(k-1)}&{}\quad \mathrm {if}~~~ t=t_{(k)}\\ t_{(k)}&{}\quad \mathrm {if}~~~ t_{(k)}< t < t_{(k+1)} \end{array}\right. }\end{aligned}$$

where \( k =1,\ldots ,K\). The second component of the weight ensures that the exposure-outcome association is adjusted for confounding by \(L_0\). It creates a pseudo-population in which the exposure is no longer associated with \(L_0\) and hence removes confounding by \(L_0\) (Lange et al. 2012). The first component of the weight then distinguishes between the direct and indirect paths by correcting for the fact that the observed mediator value at each time point before t may differ from the counterfactual value that is of interest at that time. From this, the fitting procedure is described as follows:

Step 1 Postulate and fit a suitable model for the exposure A conditional on the baseline confounders (\(L_0\)) based on the original data set. For instance, a multinomial logistic model can be used for a categorical exposure with P possible values:

$$\begin{aligned} \log \frac{\text {Pr}(A = a|L_0 = l_0)}{\text {Pr}(A = 0|L_0 = l_0)} = \beta _{0,a} + \beta _{1,a}\,l_0, \end{aligned}$$

where \(a = 1, \ldots , P - 1\) and \(\text {Pr}(A = 0|L_0 = l_0) = 1/(1 + \sum _{a = 1}^{P - 1} e^{\beta _{0,a} + \beta _{1,a}\,l_0})\).

Step 2 Convert the original dataset to a long or counting-process format, in which the observation period \([0, T_i]\) of individual i (where \(T_i\) is the observation time of this individual) is broken into \(k' + 1\) intervals where \(k' = \mathrm {argmax}_k\{t_{(k)} \le \lfloor T \rfloor \}\). Each interval k \((1\le k \le k')\) will have the following information encoded:

  1. (a)

    The beginning of the interval, which equals the time \(t_{(k-1)}\) of visit \(k-1\), with \(t_{(0)} = 0\).

  2. (b)

    The end of the interval, which equals the time \(t_{(k)}\) of visit k or the event/censoring time \(T_i\) for the last interval.

  3. (c)

    The event status at the end of the interval.

  4. (d)

    The exposure \(A_i\) and the baseline covariates \(L_{0i}\), whose values remain unchanged across all intervals.

  5. (e)

    The history of the mediator and the longitudinal confounders recorded up to the end of the interval. Note that the history of L and M for subject i up to the time \(T_i\) is similar to their history up to the last visit prior to time \(T_i\).

Table 1 provides a toy example in which three patients receive a binary treatment and are followed up for a total of three years, with two visits pre-planned at the end of year 1 and 2. Patient 3 is free of event at the end of the study and hence has the mediator level fully recorded at the two intermediate visits. In contrast, patient 1 and 2 experience an event after 1.5 and 0.9 years, due to which they have no (i.e. patient 2) or only one (i.e. patient 1) mediator level recorded. Table 2 illustrates how the information of these three hypothetical individuals is encoded in a counting-process format.

Table 1 Illustrating example: A toy dataset in standard short format
Table 2 The counting-process format of the dataset in Table 1

Step 3 Postulate and fit a suitable model for the mediator at each time \(t_{(k)}\), conditional on the exposure, the longitudinal confounder \({\overline{L}}_k\) and previous measurements of the mediator (i.e. \({\overline{M}}_{k - 1}\)), by using the long data set. For instance, one may assume multinomial logistic models for a categorical mediator \(M_k\) with possible values \(0, \ldots , Q\), that is:

$$\begin{aligned}&\log \frac{\text {Pr}(M_k = q|A = a, {\overline{M}}_{k - 1} = {\overline{m}}_{k - 1}, {\overline{L}}_k = \bar{l}_k, L_0=l_0, T\ge t_{(k)})}{\text {Pr}(M_k = 0|A = a, {\overline{M}}_{k - 1} = {\overline{m}}_{k - 1}, {\overline{L}}_k ={\overline{l}}_k, L_0=l_0, T\ge t_{(k)})} \\&= \gamma _{0q} + \gamma _{1q} t_{(k)} + \gamma _{2q}a + \gamma _{3q}'{\overline{m}}_{k - 1} + \gamma _{4q}'{\overline{l}}_k + \gamma _{5q}l_0 \end{aligned}$$

where \(q = 1, \ldots , Q\). Note that other non-linear terms such as two-by-two interactions between baseline covariates and previous measurements of M before k can also be added to the above model.

Step 4 A new data set is then constructed by copying the original data set (in long format) P times and including an additional variable \(A^*\) to capture the P possible values of the exposure relative to the indirect path. \(A^*\) is set to the actual value of the exposure A for the first replication, to the other potential values of A for the remaining replications. This step ensures that the estimating equation solved by standard software is summing over levels of \(a^*\) as in equation (5). Such a step is also standard in the estimation procedure of natural effect models and extensions thereof, which was first proposed by Lange et al. (2012) in the simple setting of single mediator. For the example discussed in Tables 1 and 2, the corresponding extended data set is provided in Table 3.

Table 3 Extended data set for the example in Table 1

Step 5 Compute weights by applying the fitted models from steps 1 and 3 to the new data set. At visit k, the weight for the \(i^{th}\) individual is \(w_{i}(k, a,a^*) = w_{i}^{ttm}(a)\cdot w_{i}^{med}(k,a,a^*),\) where:

$$\begin{aligned} w_i^{ttm}(a) = \sum _{a = 0}^{P-1} \frac{I(A_i = a)}{{\hat{\text {Pr}}}(A_i = a|L_{0i} = l_{0i})} \end{aligned}$$

and

$$\begin{aligned}&w_{i}^{med}(k,a,a^*) \nonumber \\&= \prod _{s\in {\mathbb {N}}, t_{(s)} \le t_{(k)}} \frac{{\hat{\text {Pr}}}(M_{s,i} = m_{s,i}|A_i = a^*, {\overline{M}}_{s - 1,i} = {\overline{m}}_{s - 1,i}, {\overline{L}}_{s,i} = \bar{l}_{s,i}, L_{0,i} = l_{0,i}, T_i\ge t_s)}{{\hat{\text {Pr}}}(M_{s,i} = m_{s,i}|A_i = a, {\overline{M}}_{s - 1,i} = {\overline{m}}_{s - 1,i}, {\overline{L}}_{s,i} ={\overline{l}}_{s,i}, L_{0,i} = l_{0,i}, T_i\ge t_s)} \end{aligned}$$

where the superscript ttm and med denotes treatment and mediator, respectively. At the end of the follow-up time, the weight for a patient having \(T_i = t_i\) is \(w_{i}(t_i, a,a^*) = w_{i}(\lfloor t_i \rfloor ,a,a^*)\).

Step 6 Fit the path-specific effect cause-specific proportional hazard model (2) and (3) by proportional hazard regression of the cause-specific event time on A and \(A^*\) on the basis of the expanded data set, using the weights computed in the previous step. Derive confidence intervals for the parameters in model (2) and (3) using the non-parametric bootstrap. For this, one first generates S bootstrap samples with replacement from the original dataset, then repeat all the above steps for each bootstrap sample. The 95% confidence interval for each parameter in model (2) and (3) is computed by using the 2.5% and 97.5% quantiles of the bootstrap distribution of the corresponding estimator.

Step 7 Establish the cause-specific cumulative incidence curves under different sets of a and \(a^*\). To estimate the curves, consider the different event types as terminal states of a multi-state model where the transition from one state to the other is treated as an absorbing state, i.e. the one that subjects never exist (Putter et al. 2007).

Fig. 2
figure 2

(Simplified) causal diagram when censoring presents—a Censoring is non-informative conditional on the exposure and b Censoring is non-informative at time t conditional on the exposure and the history up to that time. \(R_C(t) = I(C\ge t)\), where C is the time-to-censoring

2.4 Addressing complications due to censoring

As stated above, when the censoring is non-informative conditional on the exposure (Fig. 2a), the provided estimating procedure remains valid without further adjustment. When censoring is dependent upon the baseline covariate vector \(L_0\) and the exposure A (i.e. ), one may adjust for censoring by alternatively focusing on the so-called conditional cause-specific path-specifci effect proportional hazard model, that is,

$$\begin{aligned} \lambda ^j_{a, a^*}(t|L_0 = l_0) = \lambda _0^j(t)e^{\alpha _{1j}a + \alpha _{2j}a^* + \alpha _{2j}'l_0} \end{aligned}$$

Note that an interaction between \(a^*\) and \(l_0\) could also be permitted in such model to assess the possibility of mediator-baseline covariate interaction. The procedure discussed in Sect. 2.2 can then be applied to estimate the parameters in this model, with a slight adjustment in step 6 where apart from A and \(A^*\), the covariates \(L_0\) (and the product of \(A^*\) and \(L_0\) if mediator- baseline covariate interaction is assessed) are also included into the proportional hazard regression model. In practice, censoring may however also depend upon post-baseline factors such as the longitudinal mediator and confounder levels that are measured prior to censoring (Fig. 2b). In that case, progress can be made upon assuming that at any time t, the risk of dropping out of the study for patients not yet experiencing an event does not depend on when they will experience an event in the future and of what type this event will be, given the history up to time t, i.e. \(\lambda _C(t|T>t,T,D,A,{\overline{M}}_t,{\overline{L}}_t)=\lambda _C(t|T>t,A,{\overline{M}}_t,{\overline{L}}_t)\) where \(\lambda _C(\cdot )\) denoting the cause-specific hazard function of the time-to-censoring. In that case, the so-called inverse probability of censoring weighting approach can be used to account for censoring. More precisely, the parameters indexing models (2) can be estimated by solving the estimating Eq. (5) with the following weight for each individual i (see appendix A2 for a formal proof):

$$\begin{aligned} W_i(\lfloor t \rfloor ,a,a^*) =&\frac{\prod _{s: t_{(s)} \le \lfloor t \rfloor } {\hat{\text {Pr}}}(M_{s,i}|A_i = a^*, {\overline{M}}_{s-1,i}, {\overline{L}}_{s,i}, L_{0,i}, T_i\ge t_{(s)}, C_i \ge t_{(s)})}{\prod _{s: t_{(s)} \le \lfloor t \rfloor } {\hat{\text {Pr}}}(M_{s,i}|A_i = a, {\overline{M}}_{s-1,i}, {\overline{L}}_{s,i}, L_{0,i}, T_i\ge t_{(s)}, C_i \ge t_{(s)})}\\&\quad \cdot \frac{I(A_i = a)}{{\hat{\text {Pr}}}(A_i=a|L_{0,i})}\cdot \\&\,\cdot \frac{1}{\varvec{\prod }_{0<s\le t} \left[ 1- \hat{\lambda }_C(s|T_{i}>s, {\overline{M}}_{\lfloor s \rfloor ,i}, {\overline{L}}_{\lfloor s \rfloor ,i},L_{0,i},A_i=a)ds\right] } \end{aligned}$$

Here, \(\varvec{\prod }_{s} x_s\) is defined as a product limit and \(\hat{\lambda }_C(\cdot )\) is a consistent estimator of \(\lambda _C(\cdot )\). With the above weight \(W_i(\lfloor t \rfloor ,a,a^*)\), the additional component that accounts for the informative censoring can make the overall weight become unstable (e.g. when the censoring hazard is close to 1 in some strata). To overcome this, one can then use stabilized (censoring) weights which incorporate a numerator defined in the same way as the denominator but adjusting only for the exposure, that is:

$$\begin{aligned} \frac{\varvec{\prod }_{s: 0\le s \le t} \left[ 1- \hat{\lambda }_C(s|A_i=a)ds\right] }{\varvec{\prod }_{s: 0\le s \le t} \left[ 1- \hat{\lambda }_C(s|T_{i}>s, {\overline{M}}_{\lfloor s \rfloor ,i}, {\overline{L}}_{\lfloor s \rfloor ,i},L_{0,i},A_i=a)ds\right] } \end{aligned}$$

One then needs to postulate two models for the censoring hazard at time t, with one conditioning on exposure and the other conditioning on exposure, baseline covariates and the history of the longitudinal mediator and confounders up to time t, where only the latter model needs to be correct. For instance,

$$\begin{aligned}&\lambda _C(t|a_i)=\lambda '_{0C}(t)e^{\eta a_i} \nonumber \\&\lambda _C(t|T_{i}>t, {\overline{m}}_{\lfloor t \rfloor ,i}, {\overline{l}}_{\lfloor t \rfloor ,i},l_{0,i},a_i)=\lambda _{0C}(t)e^{\theta _0a_i + \theta _1l_{0,i} + \theta _2'{\overline{m}}_{\lfloor t \rfloor ,i} + \theta _3'{\overline{l}}_{\lfloor t \rfloor ,i}} \end{aligned}$$
(6)

Note that non-proportional hazards can be addressed in the standard way, e.g. the time-varying effect can be modeled parametrically or with a step function.

In step 5 of the estimation procedure, apart from computing the mediator and treatment weights, one needs to additionally derive the censoring weight. More precisely, the weight for the \(i^{th}\) individual at visit k is now \(w_i^{ttm}(a,a^*)\cdot w_i^{med}(k,a)\cdot w_i^{cen}(k,a)\), where the superscript cen denotes censoring, i.e.,

$$\begin{aligned} w_i^{cen}(k,a)=\frac{\varvec{\prod }_{s: 0\le s \le t} \left[ 1- \hat{\lambda }_C(s|A_i=a)ds\right] }{\varvec{\prod }_{s: 0\le s \le t} \left[ 1- \hat{\lambda }_C(s|T_{i}>s, {\overline{M}}_{\lfloor s \rfloor ,i}, {\overline{L}}_{\lfloor s \rfloor ,i},L_{0,i},A_i=a)ds\right] } \end{aligned}$$

while \(w_i^{ttm}(a,a^*)\) and \(w_i^{med}(k,a,a^*)\) are computed as above. Of note, some R packages such as ipcwswitch provide useful built-in functions to calculate the censoring weights (Graffeo et al. 2019). We thus refer the reader to these references for more practical guidance.

Once the individual weights are computed, the natural effect cause-specific proportional hazard model (2) and (3) can be fitted using these weights and the confidence intervals for the estimates can be derived via the nonparametric bootstrap, as described above.

3 Illustrating example

We illustrate the proposed approach on the ELSA dataset (https://www.elsa-project.ac.uk), which includes 4232 patients over 50 years of age. In this ongoing study, the first contact with the participants was in 2002/03 (wave 1). These participants were then followed up biennially, with measures collected via computer-assisted face-to-face interview and self-completion questionnaires. As stated above, the question of interest here is whether the feeling of loneliness mediates the impact of hearing loss on time-to-dementia (measured in years), accounting for mortality as a competing event.

For this analysis, we used the hearing measurement recorded at wave 2 (e.g. 2004/05) and dichotomized subjects into two groups, namely normal (\(A=0\)) and limited (\(A=1\)) hearing ability. Among 4232 patients at baseline, 772 (18.2%) had limited hearing ability. The longitudinal mediator "loneliness" was recorded from wave 3 (2006/07) to wave 7 (2014/15) and had two potential values, namely frequent (\(M=1\)) vs. infrequent feeling of loneliness (\(M=0\)). Alongside the mediator, four longitudinal confounders were recorded over time (i.e. wave 3 to wave 7), namely depression status (yes vs. no), mobility score (continuous), smoking status (non-smoker vs. current smoker) and alcohol status (non-drinker vs. current drinker). The baseline covariates consisted of 15 variables, including age at wave 2, gender, ethnicity (white vs. non-white), wealth (1=low, 5=high), education level (1 = no formal qualification, 2 = intermediate and 3 = higher education), marital status (yes vs. no), the use of hearing aids (yes vs. no), the presence of hypertension (yes vs. no), diabetes (yes vs. no), stroke (yes vs. no) and cancer (yes vs. no), alongside the baseline values of the four aforementioned time-varying confounders. A detailed description of these covariates, as well as the mediator distribution, the event and censoring rate over time is provided in the appendix and in previous works (Davies et al. 2017; Hackett et al. 2018; Davies-Kershaw et al. 2018; Rafnsson et al. 2020). Of note, all patients were followed up until one year after the last wave in 2014/15, which equals to a total of 11 years of follow-up.

We assumed that the relationship between the variables obeys the causal structure depicted in Fig. 1. Here, the mediators and confounders measured at time \(t_{k - 1}\) are time-varying confounders of the relationship between the mediator measured at time \(t_k\) and the outcome. To estimate the exposure weights (step 1), we first considered a logistic (exposure) model adjusting for the main effects of all baseline covariates. To assess the potential of covariate-covariate interactions, we used a LASSO variable selection process (R package glmnet) to select the most important interaction terms from the set of all 105 possible two-by-two covariate interactions (i.e. a zero penalty was given to the main effect of all variables). The chosen interactions were then added into the exposure model. Of note, we selected the value of the tuning parameter that minimized the mean cross-validated error in a ten-fold cross-validation with deviance as loss function.

To estimate the mediation weights (step 3), we first considered a logistic (mediator) model adjusting for \(t_k\), \(M_{k-1}\), \(L_k\) and the main effects of all baseline covariates, i.e.:

$$\begin{aligned}&\log \frac{\text {Pr}(M_k = 1|A = a, {M}_{k - 1} = {m}_{k - 1}, {L}_k = {l}_k, L_0=l_0, T\ge t_{(k)})}{\text {Pr}(M_k = 0|A = a, {M}_{k - 1} = {m}_{k - 1}, {L}_k ={l}_k, L_0=l_0, T\ge t_{(k)})} \\&= \gamma _{0} + \gamma _{1} t_{(k)} + \gamma _{2}a + \gamma _{3}{m}_{k - 1} + \gamma _{4}{l}_k + \gamma _{5}'l_0 \end{aligned}$$

We then improved this mediator model in a sequential manner. First, to assess whether the conditional distribution of \(M_k\) has a residual dependence upon the measurements of M and L that preceded \(M_{k-1}\) and \(L_k\), we used the LASSO to determine the first measurement of M and L prior to \(M_{k-1}\) and \(L_k\) that were predictive for \(M_k\), conditional on \(t_k\), \(M_{k-1}\), \(L_k\) and the main effects of all baseline covariates in \(L_0\). This measurement and all measurements following this one up to \(M_{k-1}\) for M and \(L_k\) for L were included into the mediator model. Next, we assessed whether there were important (i) treatment-baseline/longitudinal covariate interactions, (ii) time-baseline covariate interactions and (iii) baseline covariate-covariate interactions that should be adjusted for. For each step, an independent LASSO variable selection process was performed to select the most important interaction terms from all possible interactions. The interactions that were chosen in the previous step were always included in the model of the subsequent steps (which implies no shrinkage on these terms in the subsequent steps). In each LASSO procedure, we selected the value of the tuning parameter that minimized the mean cross-validated error in a ten-fold cross-validation with deviance as loss function. The final model was refitted before calculating the mediation weights.

Table 4 Data analysis: estimation of the path-specific effect models (1) and (2)
Table 5 Data analysis: the effect of limited vs. normal hearing ability on the time-to-event outcomes, mediated through the feeling of loneliness

To estimate the censoring weights, we first considered a cause-specific proportional hazard model adjusting for \(M_{k}\), \(L_k\), the exposure A and the main effects of all baseline covariates \(L_0\). We then improved this censoring model in a sequential manner. First, to assess whether the censoring hazard at time t had a residual dependence upon the history of M and L that preceded the time \(t_{k}\) for M and for L, we implemented a backward elimination process, using the Akaike information criterion to determine the first post-baseline measurement of M and L that were predictive for the censoring hazard at time t, conditional on the later measurements. This measurement and all measurements following this one were included into the censoring proportional hazard model. Next, as for the mediator model, we assessed whether there were important (i) treatment-baseline/longitudinal covariate interactions and (ii) baseline covariate-covariate interactions that should be adjusted for. For each step, an independent backward elimination process was performed to select the most important interaction terms from all possible interactions of the same type. The interactions that were chosen in the previous step were always included in the model of the subsequent steps (which implies no exclusion of these terms in the subsequent steps). Note that we used backward elimination for the construction of the censoring models (as opposed to LASSO) due to the lack of prepackaged software that can apply LASSO or other advanced variable selection methods on a counting format survival dataset (Bien et al. 2013; Bickel et al. 2010). Results of the variable selection processes for the treatment, mediator and censoring models are reported in the Online Supplementary Materials.

The two path-specific effect proportional hazard models specific for dementia and for death (i.e. model (1) without interaction between a and \(a^*\), and model (2) with this interaction) were then fitted using the calculated weights. The confidence intervals of the total, direct and indirect hazard ratios were derived by the non-parametric bootstrap method, with 5000 samples taken from the original data set by sampling with replacement. We then established the cumulative incidence curves of dementia and of death under different sets of a and \(a^*\). As the interaction between a and \(a^*\) was not statistically significant at the 5% level, we only use the results of the path-specific effect model (1) (which omits the interaction between a and \(a^*\)) to establish the cumulative incidence curves.

Results of the estimation procedures are provided in Tables 4 and 5. As can be seen from these tables, the total effect of hearing loss on the time to dementia diagnosis was statistically significant (Model 1, HR = 1.84; 95%CI 1.10 to 2.71). Model (1) further suggested that this total effect was weakly mediated through feelings of loneliness, with a non-statistically significant indirect effect equal to 1.01 (HR scale; 95%CI 0.99 to 1.05). This expresses that the hazard of dementia would increase by 1% if all patients were to have limited hearing ability but loneliness levels were switched from the values that would have been observed if they had normal hearing ability to the value observed under limited hearing. In contrast, the total effect of hearing loss on mortality was not statistically significant (Model 1, HR = 0.88; 95%CI 0.63 to 1.15). There was no statistical evidence of an indirect effect through feelings of loneliness (HR = 1.00, 95%CI 0.99 to 1.01). These findings did not change when considering model (2) with interaction (Table 4—p-value of the interaction coefficient equals 0.52 for dementia and 0.63 for death).

Figure 3 provides the estimated cumulative incidence curve of dementia for different a and \(a^*\), which visualizes the weak indirect effect of hearing loss on dementia through the suggested longitudinal mediator. At some time points, there are large jumps in these curves due to the fairly high rate of interval censoring in the dataset (i.e. if the date for dementia diagnosis (or for death) was not known but the person had a new diagnosis (or passed away) from one visit to the next, then we considered the midpoint between two visits as the event date). Besides, note that the exposure (hearing impairment) was not found to affect mortality. This implies that interpretation of the exposure effect on the cumulative incidence of dementia is not hindered by the exposure effect on the competing event.

Fig. 3
figure 3

Estimated cumulative incidence curves of dementia diagnosis and of death over time, if the whole population were to have hearing ability level a, but the loneliness levels \(M_1,M_2,\ldots \) were switched from the values observed under hearing ability level a to the values observed under hearing ability level \(a^*\) (while the levels of the time-varying confounders were as observed under this joint intervention on A and M

The above findings should be interpreted with caution due to many potential limitations. First, there might be important baseline or time-varying confounders that were not measured and taken into account. For instance, genes affecting the likelihood of hearing loss may well be correlated with genes increasing the risk of dementia. Second, note that choosing time zero at wave 2 prevents bias due to left truncation, but renders interpretation slightly more subtle by ignoring age at study entry, which we did in order to display population-level results. Third, the obtained findings can be biased due to the involved models being incorrectly specified or censoring being informative (e.g. elderly patients who live alone might not come to the control visit due to dementia-related problems). Fourth, the data-adaptive methods used for constructing exposure, mediator and censoring models may affect the validity of the obtained confidence intervals and p-values. Addressing this complication is non-trivial and beyond the scope of this paper. In practice, a possible (heuristic) approach to overcome this could be to adjust for the union of all variables and/or interactions selected in all models, to ensure that no important confounders are omitted. This is inspired by the double selection principle proposed in the recent literature (Belloni et al. 2014). Further study might give insight into the extent to which this improves results, though we are not hopeful that a rigorous justification is possible.

4 Discussion

In this paper, we have generalized the weighting-based strategy proposed for natural effect models in single mediation analysis to the setting where the mediator of interest is repeatedly measured over time (hence subject to longitudinal confounders) and the primary outcome is a time-to-event endpoint, subject to competing risks. The proposed approach yields consistent estimates for the suggested path-specific direct and indirect effects if the causal assumptions hold, and the path-specific effect model and the conditional distribution of the exposure, mediator and censoring are correctly specified. As noted by Steen et al. (2017b), the mediator model needs careful consideration, especially when the exposure (and baseline covariates) are highly predictive of the mediator, for then even minor misspecification can have a major impact on the weights, lead to biased results and large variance due to extreme weights. While the presence of extreme weights might appear as a limitation at first, it may also diagnose severe model extrapolation that often goes unnoticed when using a repeated regression approach proposed for the same setting (Steen et al. 2017b). Simple weighting-based approaches also tend to yield larger standard errors (compared to imputation or regression-based approaches) due to lack in efficiency. This can be especially problematic when the mediator is continuous. In that case, the weight-based approaches tend to be unstable even under proper model specification. The repeated regression approach may be more appropriate for continuous mediators (Steen et al. 2017a; Vansteelandt et al. 2012, 2019).

Like in other settings, the presence of competing risks complicates the interpretation of the results. This is because the cumulative incidence of the event of interest is determined by the cause-specific hazards for both competing events, which may both be impacted by the exposure. Further insight may sometimes be gained by studying how the direct and indirect effects of exposure on the cumulative incidence of the event of interest would be like if we could intervene to hold the cause-specific hazard for the competing event fixed (Gran et al. 2015). While of interest, we have chosen not to do this in our study because it is unlikely that we have access to all common causes of dementia and mortality to render such further adjustment trustworthy.

Several proposals can be made to improve the suggested approach. For instance, instead of modeling the cause-specific hazard as discussed above, one may alternatively consider the Fine and Gray subdistribution hazard model to directly quantify the impact of covariates on the cumulative incidence function (Fine and Gray 1999). Future research might also focus on the development of doubly or multiply robust estimators (Bang and Robins 2005) to improve the robustness and efficiency of the current weight-based approach. The proposed strategy can also be easily extended to take into account multiple mediators \(M^{(1)}, \ldots , M^{(V)}\) that are repeatedly measured over time. When these mediators are causally ordered then as suggested by VanderWeele and Vansteelandt (2014), one can first evaluate the effect mediated through \(M^{(1)}\), then examine how much this changes when \(M^{(1)}\) and \(M^{(2)}\) are jointly considered as mediators. This then reveals the additional contribution of \(M^{(2)}\) beyond \(M^{(1)}\) alone. The process is then carried on by sequentially adding one mediator at a time until all V mediators are included. By accounting for multiple, repeatedly measured mediators, results of the analysis may allow one to get closer to evaluating the entire mediation process that underlies the treatment mechanism in practice. Finally, future research should also extend the proposed approach to account for continuous exposures, which are quite common in epidemiology and social science.