Abstract
Causal inference and missing data have attracted significant research interest in recent years, but the current literature usually focuses on only one of these two issues. In this paper, we develop two multiply robust methods to estimate the quantile treatment effect (QTE) in the presence of missing data. Compared with the commonly used average treatment effect, the QTE provides a more complete picture of the difference between the treatment and control groups. The first method is based on inverse probability weighting, and the resulting QTE estimator is root-n consistent and asymptotically normal as long as the class of candidate models for the propensity score contains the correct model and so does the class for the probability of being observed. The second method is based on augmented inverse probability weighting, which further relaxes the requirement on the probability of being observed. Simulation studies are conducted to investigate the performance of the proposed methods, and the motivating CHARLS data are analyzed, exhibiting different treatment effects at various quantile levels.
1 Introduction
Causal inference, which aims to identify the extent and nature of cause-and-effect relationships, has attracted increasing attention. For example, in the motivating Chinese Health and Retirement Longitudinal Study (CHARLS) data analyzed in Sect. 5, we aim to study the causal effect of social activities on cognitive functions among middle-aged and older adults in China. Because of possible heterogeneity in the data, it is necessary to provide a complete picture of the relationship between social activities and cognitive functions, rather than only the conditional mean relationship. Furthermore, approximately 24% of the responses are missing, which further complicates the analysis.
In causal analysis, one of the most popular frameworks is the potential outcome model [19, 20], in which the average treatment effect (ATE), the average difference between the treatment and control groups, is a key measure; see the fruitful studies on the ATE by Rosenbaum and Rubin [19, 20], Lunceford and Davidian [17], Tan [23], Imbens and Rubin [14], and Hernán and Robins [12]. However, the ATE is not informative enough when the data are heterogeneous, and it is sensitive to outliers. The quantile treatment effect (QTE) gives a more complete picture of the causal effect and is robust to heavy-tailed responses, and has thus attracted growing interest. To estimate the QTE in the absence of randomization, classical methods for dealing with confounding include propensity score, outcome regression, and doubly robust methods (e.g., [7, 8, 18, 27]).
As in the motivating CHARLS data, missing responses are often encountered in real applications, particularly in observational studies. Among the various missing mechanisms, the most widely used is missing at random (MAR), which assumes that the missingness depends only on the observed values. To avoid the inconsistent estimation caused by simply ignoring the missing data, several classes of methods have been proposed, such as inverse probability weighting (IPW, [22]), augmented IPW (AIPW, [1]), and imputation [21]. Recently, doubly robust and multiply robust methods have been developed under the MAR assumption [3, 9,10,11, 15] to provide robustness against model misspecification. Xie et al. [25] further extended the multiply robust method to QTE estimation.
The current literature on quantile regression focuses on either causal inference [8, 27] or missing data [9, 24]. To the best of our knowledge, no existing work studies both issues simultaneously, possibly because of the complexity introduced by the combination of confounding and missing responses. In practice, however, the two issues frequently occur together, as in the motivating CHARLS data in Sect. 5. To handle both, we need to correctly specify the propensity score (PS) model for confounding adjustment, and the probability of being observed (PO) model or the outcome regression (OR) model to deal with the missing data, which is challenging in practice.
In this paper, we develop two multiply robust methods to estimate the QTEs with responses missing at random. The first method plugs multiply robust estimates of the PS and PO into an IPW objective function, from which the QTEs are estimated. The second method, a multiply robust AIPW approach, provides further resistance to model misspecification by relaxing the requirement on the PO. Our contributions can be summarized as follows. First, the proposed methods provide a complete picture of the causal effects, which helps to identify possible heterogeneity and thus to design personalized interventions; this merit is confirmed in the motivating CHARLS data. Second, the proposed methods adjust for confounding and handle missing data, both of which are common in real applications, and provide consistent QTE estimates. Finally, both proposed estimators are multiply robust. For the IPW method, the resulting estimator is consistent if the class of candidate PS models contains the correct model and so does the class of candidate PO models. For the AIPW method, the requirement on the PO is relaxed to either the PO class or the OR class containing a correctly specified model. Furthermore, establishing consistency and asymptotic normality is nontrivial, especially for the AIPW estimator, mainly because of the combined challenges of the augmentation and the confounding adjustment.
The rest of the paper is organized as follows. In Sect. 2, we introduce the potential outcome framework and some key assumptions, and propose the multiply robust IPW and AIPW estimators of the QTE with missing responses. We present the asymptotic properties in Sect. 3. Simulation studies investigating the finite-sample performance are reported in Sect. 4. Finally, in Sect. 5, the proposed methods are applied to the motivating CHARLS data, showing that the causal effects of social activities on cognitive functions vary across quantile levels of the responses. All technical proofs are provided in Section S2 of the online Supplementary Materials.
2 The Proposed Method
2.1 Notation and Identification
Let Y(1) and Y(0) be the potential outcomes for the treated and control groups, \(\textbf{X}\) be the covariate vector, and T be the treatment status where \(T=1\) means treated and \(T=0\) means untreated. Under the consistency assumption [25], we observe the outcome \(Y_i = T_iY_i(1)+(1-T_i)Y_i(0)\). Let \(\eta (\textbf{X})=Pr(T=1\mid \textbf{X})\) be the propensity score (PS), satisfying \(0<\eta (\textbf{X})<1\) for all \(\textbf{X}\).
We are interested in the \(\tau \)-quantile treatment effect (\(\tau \)-QTE), defined as \(q_1(\tau )-q_0(\tau )\), where \(\tau \in (0,1)\) is a quantile index, and \(q_1(\tau )\) and \(q_0(\tau )\) are the \(\tau \)-quantiles of the random variables Y(1) and Y(0), respectively.
Assume that the treatment assignment is strongly ignorable, i.e., (Y(1), Y(0)) is independent of T conditional on \(\textbf{X}\). Under this assumption, \(q_1(\tau )\) and \(q_0(\tau )\) are identified by the following moment equalities,
Let \(\alpha _0(\tau )=q_0(\tau )\) and \(\alpha _1(\tau )=q_1(\tau )-q_0(\tau )\), so that \(\alpha _1(\tau )\) is the \(\tau \)-QTE of interest. Equation (2.1) is equivalent to
where \({\widetilde{\textbf{T}}}=(1,T)^{\top }\), \({\widetilde{\varvec{\alpha }}}_{0}(\tau )=(\alpha _0(\tau ),\alpha _1(\tau ))^{\top }\). Then, \(\alpha _1(\tau )\) can be identified.
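The following standard derivation shows why weighting by the PS identifies the marginal quantiles. It uses only consistency, strong ignorability and \(0<\eta (\textbf{X})<1\), with q any fixed value:

$$\begin{aligned} E\Big [\frac{T}{\eta (\textbf{X})}I(Y\le q)\Big ] &= E\Big [\frac{T}{\eta (\textbf{X})}I\{Y(1)\le q\}\Big ] = E\Big [\frac{E(T\mid \textbf{X})\,\textrm{Pr}\{Y(1)\le q\mid \textbf{X}\}}{\eta (\textbf{X})}\Big ] = \textrm{Pr}\{Y(1)\le q\},\\ E\Big [\frac{T}{\eta (\textbf{X})}\{I(Y\le q)-\tau \}\Big ] &= \textrm{Pr}\{Y(1)\le q\}-\tau \,E\Big [\frac{T}{\eta (\textbf{X})}\Big ] = \textrm{Pr}\{Y(1)\le q\}-\tau . \end{aligned}$$

The first equality uses consistency (\(TY=TY(1)\)), the second conditions on \(\textbf{X}\) and uses strong ignorability, and the last line uses \(E\{T/\eta (\textbf{X})\}=1\). The weighted moment therefore vanishes exactly at \(q=q_1(\tau )\) (unique under A3); replacing T and \(\eta (\textbf{X})\) by \(1-T\) and \(1-\eta (\textbf{X})\) gives the same statement for \(q_0(\tau )\).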
In practice, responses may be missing. Let R be the indicator of observing Y, with \(R=1\) if Y is observed and \(R=0\) otherwise. We consider the MAR mechanism, which assumes \(\textrm{Pr}(R=1\mid Y,\textbf{X},T) = \textrm{Pr}(R=1\mid \textbf{X},T)\). Denote the probability of being observed (PO) by \(\pi (\textbf{X},T)=\textrm{Pr}(R=1\mid \textbf{X},T)\).
2.2 Multiply Robust IPW Estimator
To handle confounding and missingness simultaneously, we use the PS \(\eta (\textbf{X})=\textrm{Pr}(T=1\mid \textbf{X})\) to account for confounding and the PO \(\pi (\textbf{X},T)=\textrm{Pr}(R=1\mid \textbf{X},T)\) to deal with missingness, and estimate the \(\tau \)-QTE by minimizing
with respect to \(\textbf{a}=(a_0,a_1)^{\top }\), where \(\rho _{\tau }(u)=u\{\tau -I(u<0)\}\) is the check loss function. To deal with missingness, we weight each observed subject by \(1/\pi (\textbf{X}_i,T_i)\), which recovers population-level information from the biased sample \(\{i: R_i=1\}\). To handle confounding, \(\eta (\textbf{X}_i)\) is used to balance the weighted covariate distributions of the two groups.
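Because the design \({\widetilde{\textbf{T}}}=(1,T)^{\top }\) is saturated, the weighted check-loss problem separates by treatment arm, which makes the estimator easy to compute and to check. The sketch below assumes that objective (2.2) has the form \(\sum _i R_i\pi (\textbf{X}_i,T_i)^{-1}\{T_i/\eta (\textbf{X}_i)+(1-T_i)/(1-\eta (\textbf{X}_i))\}\rho _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})\), which matches the \((T_i,T_i)^{\top }/\eta \) and \((1-T_i,0)^{\top }/(1-\eta )\) score structure of the estimating equation in Sect. 2.3; the function names are hypothetical.

```python
import numpy as np


def check_loss(u, tau):
    """Check loss rho_tau(u) = u * {tau - I(u < 0)}."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def weighted_quantile(y, w, tau):
    """A minimizer over a of sum_i w_i * rho_tau(y_i - a), for w_i >= 0.

    The objective is piecewise linear in a, so a minimizer is the smallest
    sorted y_(k) whose cumulative weight reaches tau * sum(w)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y, kind="stable")
    cum = np.cumsum(w[order])
    k = np.searchsorted(cum, tau * cum[-1], side="left")
    return y[order][k]


def ipw_qte(y, t, r, eta, pi, tau):
    """Hypothetical IPW QTE sketch with known (or plugged-in) eta and pi.

    Assumed objective: sum_i R_i / pi_i * {T_i / eta_i + (1 - T_i) / (1 - eta_i)}
    * rho_tau(Y_i - a0 - a1 * T_i). With the saturated design (1, T) the
    problem splits by arm: a0 is a weighted control quantile and a0 + a1 is
    a weighted treated quantile."""
    y, t, r = np.asarray(y, dtype=float), np.asarray(t), np.asarray(r)
    eta, pi = np.asarray(eta, dtype=float), np.asarray(pi, dtype=float)
    w = np.where(t == 1, 1.0 / eta, 1.0 / (1.0 - eta)) / pi
    ctrl = (r == 1) & (t == 0)
    trt = (r == 1) & (t == 1)
    a0 = weighted_quantile(y[ctrl], w[ctrl], tau)
    a0_plus_a1 = weighted_quantile(y[trt], w[trt], tau)
    return a0, a0_plus_a1 - a0
```

With equal weights and no missing data, `ipw_qte` reduces to the difference of ordinary sample quantiles; in R, the same weighted problem can be passed to `quantreg::rq` through its `weights` argument.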
In practice, \(\eta (\textbf{X})\) and \(\pi (\textbf{X},T)\) are unknown and need to be estimated. Generally, we can estimate them parametrically, assuming that the PS and PO follow generalized linear models, say logistic models, and then obtaining their maximum likelihood estimates. However, parametric methods require the PS and PO models to be correctly specified, which is usually infeasible in practice, so some robustness against model misspecification is needed. In this paper, we follow the idea of Han [11] and Han et al. [9] and obtain the estimates through a multiply robust approach.
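Each candidate PS or PO model is first fitted on its own by maximizing a binomial likelihood. A minimal Newton-Raphson sketch for a logistic candidate model follows; the helper name, the toy data (with T drawn independently of the covariates for brevity) and the two design matrices, which mimic the correct and misspecified PO models of Sect. 4, are illustrative assumptions.

```python
import numpy as np


def fit_logistic(z, d, max_iter=50, tol=1e-10):
    """Maximize sum_i [d_i log p_i + (1 - d_i) log(1 - p_i)] with
    logit(p_i) = z_i' beta, by Newton-Raphson.

    z: (n, p) design matrix that already includes an intercept column.
    d: binary response (R for a PO model, T for a PS model)."""
    z = np.asarray(z, dtype=float)
    d = np.asarray(d, dtype=float)
    beta = np.zeros(z.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(z @ beta)))
        score = z.T @ (d - p)                          # gradient of the log-likelihood
        info = (z * (p * (1.0 - p))[:, None]).T @ z    # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Usage: two candidate PO models, one correctly specified and one not.
rng = np.random.default_rng(0)
n = 2000
x3 = rng.uniform(-0.5, 0.5, n)
x4 = 0.6 * rng.binomial(1, 0.5, n)
t = rng.binomial(1, 0.5, n)
r = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(2.5 + 0.5 * x3 - 3.0 * x4 + t)))
designs = {
    "pi1": np.column_stack([np.ones(n), x3, x4, t]),   # correctly specified
    "pi2": np.column_stack([np.ones(n), np.exp(x3)]),  # misspecified
}
fits = {name: fit_logistic(z, r) for name, z in designs.items()}
```

Each fitted vector plays the role of \({\widehat{\varvec{\theta }}}^j\); the fitted probabilities then enter the empirical likelihood constraints.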
We first present the estimator of \(\pi (\cdot )\). Let \(\mathcal {P}_{\pi }=\{\pi ^j(\textbf{X},T;\varvec{\theta }^j): j=1,\ldots ,J\}\) denote the set of candidate models for \(\pi (\textbf{X},T)\), where \(\varvec{\theta }^j\) is the corresponding parameter vector. We use \({\widehat{\varvec{\theta }}}^j\) to denote the estimator of \(\varvec{\theta }^j\), usually taken to be the maximizer of the binomial likelihood
Let \(m=\sum _{i=1}^n R_i\) be the number of observed subjects and, without loss of generality, assume that \(R_1=\cdots =R_m=1, R_{m+1}=\cdots =R_n=0\). Letting \(\omega _1(\textbf{X},T)=1/\pi (\textbf{X},T)\), it is easy to verify that
Therefore, we define the weights \(\omega _{1,i}\), \(i=1,\ldots ,m\), subject to
where \({\widehat{\zeta }}^{1,j}(\varvec{\theta }^j)=n^{-1}\sum _{i=1}^n\pi ^j(\textbf{X}_i,T_i;\varvec{\theta }^j)\). We obtain the empirical likelihood estimates \({\widehat{\omega }}_{1,i}\) by maximizing \(\prod _{i=1}^m\omega _{1,i}\) subject to the constraints in (2.4), and then define the estimate \({\widehat{\pi }}(\textbf{X}_i,T_i)\), \(i=1,\ldots ,m\), as
Next, we obtain the PS estimates for the treatment and control groups separately. Let \(\mathcal {P}_{\eta }=\{\eta ^l(\textbf{X};\varvec{\gamma }^l): l=1,\ldots ,L\}\) denote the set of candidate models for \(\eta (\textbf{X})\), where \(\varvec{\gamma }^l\) is the corresponding parameter vector. We use \({\widehat{\varvec{\gamma }}}^l\) to denote the estimator of \(\varvec{\gamma }^l\), usually taken to be the maximizer of the binomial likelihood
Step 1: we obtain the PS estimates for the treatment group. Denote \(\mathcal {T}_n=\{i: T_i=1\}\), and let
where \(\{{\widehat{\omega }}_{2,i},i\in \mathcal {T}_n\}\) are obtained by maximizing \(\prod _{i\in \mathcal {T}_n}\omega _{2,i}\), subject to
with \(\zeta ^{2,l}(\varvec{\gamma }^l)=n^{-1}\sum _{i=1}^n\eta ^l(\textbf{X}_i;\varvec{\gamma }^l)\).
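The constraints for \(\omega _{1,i}\), \(\omega _{2,i}\) and \(\omega _{3,i}\) all rest on one inverse-weighting identity. The short derivation below uses only the definitions of \(\pi (\textbf{X},T)\) and \(\eta (\textbf{X})\) and their positivity, with h any integrable function; the control-group form is stated on the assumption that the Step 2 constraints mirror those of Step 1 with T and \(\eta \) replaced by \(1-T\) and \(1-\eta \).

$$\begin{aligned} E\Big \{\frac{R}{\pi (\textbf{X},T)}h(\textbf{X},T)\Big \} &= E\Big \{\frac{E(R\mid \textbf{X},T)}{\pi (\textbf{X},T)}h(\textbf{X},T)\Big \}=E\{h(\textbf{X},T)\},\\ E\Big \{\frac{T}{\eta (\textbf{X})}h(\textbf{X})\Big \} &= E\Big \{\frac{E(T\mid \textbf{X})}{\eta (\textbf{X})}h(\textbf{X})\Big \}=E\{h(\textbf{X})\}=E\Big \{\frac{1-T}{1-\eta (\textbf{X})}h(\textbf{X})\Big \}. \end{aligned}$$

Taking \(h\equiv 1\) gives the sum-to-one constraints, and taking \(h=\pi ^j(\textbf{X},T;\varvec{\theta }^j)-E\{\pi ^j(\textbf{X},T;\varvec{\theta }^j)\}\) or \(h=\eta ^l(\textbf{X};\varvec{\gamma }^l)-E\{\eta ^l(\textbf{X};\varvec{\gamma }^l)\}\) gives mean-zero moments. Their sample versions, with \(\omega _{1,i}\), \(\omega _{2,i}\) and \(\omega _{3,i}\) playing the roles of \(1/\{n\pi (\textbf{X}_i,T_i)\}\), \(1/\{n\eta (\textbf{X}_i)\}\) and \(1/[n\{1-\eta (\textbf{X}_i)\}]\), are the empirical likelihood constraints. Because \(\{1-\eta ^l\}-\{1-\zeta ^{2,l}\}=-\{\eta ^l-\zeta ^{2,l}\}\), the control-group constraints are the same whether they are written in terms of \(\eta ^l\) or \(1-\eta ^l\).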
Step 2: we obtain the PS estimates for the control group. Let \(\mathcal {T}_n^c=\{1,\ldots ,n\}\backslash \mathcal {T}_n\) and let
where \(\{{\widehat{\omega }}_{3,i},i\in \mathcal {T}_n^c\}\) are obtained by maximizing \(\prod _{i\in \mathcal {T}_n^c}\omega _{3,i}\), subject to
Given \(\{{\widehat{\eta }}(\textbf{X}_i), i=1,\ldots ,n\}=\{{\widehat{\eta }}_{\mathcal {T}_n}(\textbf{X}_i):i\in \mathcal {T}_n\}\cup \{{\widehat{\eta }}_{\mathcal {T}_n^c}(\textbf{X}_i):i\in \mathcal {T}_n^c\}\) and \(\{{\widehat{\pi }}(\textbf{X}_i,T_i), i=1,\ldots ,m\}\), we propose to estimate the \(\tau \)-QTE by minimizing
with respect to \(\textbf{a}\). Denote the minimizer by \({\widehat{\textbf{a}}}(\tau )=({\widehat{a}}_0(\tau ),{\widehat{a}}_1(\tau ))^{\top }\); then \({\widehat{a}}_1(\tau )\) is the proposed estimator of the \(\tau \)-QTE \(\alpha _1(\tau )\), while \({\widehat{a}}_0(\tau )\) is the estimator of \(\alpha _0(\tau )\), the marginal \(\tau \)-th quantile of the control group. In Sect. 3, we prove that, if \(\mathcal {P}_{\pi }\) and \(\mathcal {P}_{\eta }\) contain the correct PO and PS models, respectively, \({\widehat{a}}_1(\tau )\) is root-n consistent for the \(\tau \)-QTE and asymptotically normal.
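Each of the three empirical likelihood problems in this subsection (for \(\omega _{1,i}\), \(\omega _{2,i}\) and \(\omega _{3,i}\)) has the same structure: maximize \(\prod _i\omega _i\) subject to \(\omega _i\ge 0\), \(\sum _i\omega _i=1\) and \(\sum _i\omega _i\textbf{g}_i=\textbf{0}\), where \(\textbf{g}_i\) stacks the centred candidate-model values. By the Lagrange-multiplier argument, \({\widehat{\omega }}_i=1/\{m(1+\varvec{\lambda }^{\top }\textbf{g}_i)\}\), with \(\varvec{\lambda }\) solving \(\sum _i\textbf{g}_i/(1+\varvec{\lambda }^{\top }\textbf{g}_i)=\textbf{0}\). The sketch below computes \(\varvec{\lambda }\) by damped Newton steps. The helper name is hypothetical, and the plug-in \({\widehat{\pi }}(\textbf{X}_i,T_i)=1/(n{\widehat{\omega }}_{1,i})\) used in the usage lines (and, analogously, \({\widehat{\eta }}_{\mathcal {T}_n}(\textbf{X}_i)=1/(n{\widehat{\omega }}_{2,i})\) and \(1-{\widehat{\eta }}_{\mathcal {T}_n^c}(\textbf{X}_i)=1/(n{\widehat{\omega }}_{3,i})\)) is our assumption; it guarantees \(n^{-1}\sum _iR_i/{\widehat{\pi }}(\textbf{X}_i,T_i)=1\).

```python
import numpy as np


def el_weights(g, max_iter=100, tol=1e-10):
    """Maximize prod_i w_i subject to w_i >= 0, sum_i w_i = 1 and
    sum_i w_i g_i = 0, where row g_i stacks the centred candidate-model
    values of subject i (e.g. pi^j(X_i, T_i; theta_hat^j) - zeta_hat^{1,j}).

    Solution: w_i = 1 / {m (1 + lam' g_i)}, where lam maximizes the
    concave function sum_i log(1 + lam' g_i)."""
    g = np.asarray(g, dtype=float)
    if g.ndim == 1:
        g = g[:, None]
    m = g.shape[0]
    lam = np.zeros(g.shape[1])

    def obj(l):
        return np.sum(np.log(1.0 + g @ l))

    for _ in range(max_iter):
        d = 1.0 + g @ lam
        grad = g.T @ (1.0 / d)
        if np.max(np.abs(grad)) < tol:
            break
        neg_hess = (g / d[:, None] ** 2).T @ g
        step = np.linalg.solve(neg_hess, grad)   # Newton ascent direction
        s = 1.0
        # Halve the step until every w_i stays in (0, 1] and obj does not drop.
        while s > 1e-12 and (np.any(1.0 + g @ (lam + s * step) < 1.0 / m)
                             or obj(lam + s * step) < obj(lam) - 1e-12):
            s /= 2.0
        lam = lam + s * step
    return 1.0 / (m * (1.0 + g @ lam)), lam


# Usage: two hypothetical candidate PO models, evaluated at fitted parameters.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, size=(n, 2))
pi1 = 1.0 / (1.0 + np.exp(-(1.0 + x[:, 0])))
pi2 = 1.0 / (1.0 + np.exp(-(0.5 + x[:, 1])))
r = rng.uniform(size=n) < pi1
zeta = np.array([pi1.mean(), pi2.mean()])       # averages over all n subjects
g = np.column_stack([pi1[r], pi2[r]]) - zeta    # centred values, observed subjects
w, lam = el_weights(g)
pi_hat = 1.0 / (n * w)                          # assumed plug-in for pi(X_i, T_i)
```

The same function serves the PS steps by passing the centred values \(\eta ^l(\textbf{X}_i;{\widehat{\varvec{\gamma }}}^l)-\zeta ^{2,l}\) for \(i\in \mathcal {T}_n\) or \(i\in \mathcal {T}_n^c\); a solution exists only when the zero vector lies inside the convex hull of the rows of g.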
2.3 Multiply Robust AIPW Estimator
In the objective function (2.2), only a model for the PO is used to address missingness. To gain more protection against model misspecification, we develop an augmented inverse probability weighted (AIPW) method that estimates the \(\tau \)-QTE by solving
where \(\psi _{\tau }(u)=\tau -I(u<0)\) is the derivative of \(\rho _{\tau }(u)\) for \(u\ne 0\), and \(\{Y^{s}_{\textbf{X},T}\}_{s=1}^{S}\) is a random sample of size S from the conditional density \(f_{Y\mid \textbf{X},T}(\cdot )\). The first term is the score function of the IPW estimator [see expression (2.2)]; the augmentation term, motivated by the imputation estimator, is added to extract more information from the outcome regression (OR) and provides double protection against model misspecification.
Since \(f_{Y\mid \textbf{X},T}(\cdot )\) is unknown, it must be estimated before random samples can be drawn. To gain resistance against model misspecification, we again adopt the multiply robust idea. Let \(\mathcal {P}_{f}=\{f^{k}_{Y\mid \textbf{X},T}(y;\varvec{\xi }^{k}): k=1,\ldots ,K\}\) denote the set of candidate models for \(f_{Y\mid \textbf{X},T}(\cdot )\), where \(\varvec{\xi }^{k}\) is the corresponding parameter vector. We use \({\widehat{\varvec{\xi }}}^k\) to denote the estimator of \(\varvec{\xi }^k\), usually taken to be the maximizer of the likelihood function
Letting \(\omega (\textbf{X},T)=1/\pi (\textbf{X},T)\), it is easy to verify that
where \(\{Y^{s}(\varvec{\xi }^k)\}_{s=1}^{S}\) denotes a random sample of size S from \(f^{k}_{Y\mid \textbf{X},T}(y;\varvec{\xi }^{k})\).
Based on the sample versions of the above two equations, we use the empirical likelihood method to estimate \(\tau \)-QTE by solving
where the weighted estimating equation is solved by converting it into a weighted quantile regression fitted with the R package quantreg, \(\widehat{\eta }(\textbf{X}_i)\) is obtained by the multiply robust method in Sect. 2.2, and the \(\widehat{w}_{i}\)’s are calculated by the following steps.
-
We first calculate \({\widehat{\varvec{\xi }}}^k\), \(k=1,2,\dots ,K\), by maximizing expression (2.6), and then draw a random sample \(\{Y^{s}(\widehat{\varvec{\xi }}^k)\}_{s=1}^{S}\) from \(f^{k}_{Y\mid \textbf{X},T}(y;\widehat{\varvec{\xi }}^{k})\).
-
Based on the estimator \(\widehat{\eta }(\textbf{X}_i)\), calculate \(\widehat{\textbf{a}}^{k}_{S}(\tau )\), \(k=1,2,\dots ,K\), by solving
$$\begin{aligned} & \frac{1}{n}\sum _{i=1}^n\Bigg (R_i\Big \{ \frac{(T_i,T_i)^{\top }}{\widehat{\eta }(\textbf{X}_i)}\psi _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})+\frac{(1-T_i,0)^{\top }}{1-\widehat{\eta }(\textbf{X}_i)}\psi _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})\Big \}\\ & \qquad +(1-R_i)\Bigg [ \frac{(T_i,T_i)^{\top }}{\widehat{\eta }(\textbf{X}_i)}\frac{1}{S}\sum _{s=1}^S\psi _{\tau }\{Y_{i}^{s}(\widehat{\varvec{\xi }}^k)-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a}\}+\frac{(1-T_i,0)^{\top }}{1-\widehat{\eta }(\textbf{X}_i)}\frac{1}{S}\sum _{s=1}^S\psi _{\tau }\{Y_{i}^{s}(\widehat{\varvec{\xi }}^k)-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a}\}\Bigg ]\Bigg )\approx 0, \end{aligned}$$
where the weighted estimating equation is again solved by converting it into a weighted quantile regression fitted with the R package quantreg.
-
Finally, \(\widehat{w}_{i}\) can be obtained by maximizing \(\prod _{i=1}^m\omega _{i}\) subject to
$$\begin{aligned} & \omega _{i}\ge 0~(i=1,\ldots ,m),\quad \sum _{i=1}^m\omega _{i}=1,\\ & \sum _{i=1}^m\omega _{i}\{\pi ^j(\textbf{X}_i,T_i;{\widehat{\varvec{\theta }}}^j)-{\widehat{\zeta }}^{1,j}({\widehat{\varvec{\theta }}}^j)\}=0~(j=1,\ldots ,J),\\ & \sum _{i=1}^m\omega _{i}\{\textbf{U}^{k}(\textbf{X}_i,T_i;{\widehat{\varvec{\xi }}}^k,\widehat{\eta }(\textbf{X}_{i}),\widehat{\textbf{a}}^{k}_{S}(\tau ))-{\widehat{\zeta }}^{k}({\widehat{\varvec{\xi }}}^k,\widehat{\eta }(\textbf{X}_{i}),\widehat{\textbf{a}}^{k}_{S}(\tau ))\}=0~(k=1,\ldots ,K), \end{aligned}\tag{2.9}$$
where \({\widehat{\zeta }}^{1,j}(\varvec{\theta }^j)\) is defined in (2.4) and
$$\begin{aligned} {\widehat{\zeta }}^{k}(\varvec{\xi }^k,\eta (\textbf{X}),\textbf{a}^{k}_{S}(\tau ))=n^{-1}\sum \limits _{i=1}^n\textbf{U}^{k}(\textbf{X}_i,T_i;\varvec{\xi }^k,\eta (\textbf{X}),\textbf{a}^{k}_{S}(\tau )). \end{aligned}$$
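The equation defining \(\widehat{\textbf{a}}^{k}_{S}(\tau )\) in the second step has the same saturated \((1,T)\) structure as the IPW objective, so for a given candidate OR model it can be solved arm by arm: each observed outcome carries weight \(1/\widehat{\eta }(\textbf{X}_i)\) (treated) or \(1/\{1-\widehat{\eta }(\textbf{X}_i)\}\) (control), and each of the S draws for a missing subject carries that weight divided by S. The sketch below assumes a normal linear candidate model fitted by ordinary least squares on the complete cases, as in Sect. 4; the function names are hypothetical, and a weighted \(\tau \)-quantile solves the \(\psi _{\tau }\) equation only up to the \(\approx \) in the display.

```python
import numpy as np


def weighted_quantile(y, w, tau):
    """Smallest sorted y_(k) whose cumulative weight reaches tau * sum(w),
    i.e. a minimizer of sum_i w_i * rho_tau(y_i - a)."""
    order = np.argsort(y, kind="stable")
    cum = np.cumsum(w[order])
    return y[order][np.searchsorted(cum, tau * cum[-1], side="left")]


def draw_from_normal_or(z, y, r, S, rng):
    """Hypothetical normal linear OR candidate: fit Y | X, T ~ N(z'xi, sigma^2)
    by OLS on complete cases and return an (n, S) array of draws (only the
    rows of missing subjects are used later)."""
    obs = r == 1
    xi, *_ = np.linalg.lstsq(z[obs], y[obs], rcond=None)
    resid = y[obs] - z[obs] @ xi
    sigma = np.sqrt(resid @ resid / (obs.sum() - z.shape[1]))
    return (z @ xi)[:, None] + sigma * rng.standard_normal((len(y), S))


def imputation_qte(y, t, r, eta, draws, tau):
    """Solve the imputation estimating equation arm by arm. Observed Y_i get
    weight 1/eta_i (treated) or 1/(1 - eta_i) (control); each of the S draws
    of a missing subject gets that weight divided by S."""
    S = draws.shape[1]
    base = np.where(t == 1, 1.0 / eta, 1.0 / (1.0 - eta))
    q = {}
    for arm in (0, 1):
        o = (t == arm) & (r == 1)
        mis = (t == arm) & (r == 0)
        vals = np.concatenate([y[o], draws[mis].ravel()])
        wts = np.concatenate([base[o], np.repeat(base[mis] / S, S)])
        q[arm] = weighted_quantile(vals, wts, tau)
    return q[0], q[1] - q[0]   # (a0_hat, a1_hat)
```

Because `draws[mis].ravel()` lists the S draws of each missing subject consecutively, `np.repeat(..., S)` lines the weights up with the right draws.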
Denote the solution of Eq. (2.8) by \(\breve{\textbf{a}}(\tau )=(\breve{a}_0(\tau ),\breve{a}_1(\tau ))^{\top }\); then \(\breve{a}_1(\tau )\) is the proposed multiply robust AIPW estimator of the \(\tau \)-QTE \(\alpha _1(\tau )\). In Sect. 3, we prove that, as long as \(\mathcal {P}_{\eta }\) contains the correct PS model and either \(\mathcal {P}_{\pi }\) contains the correct PO model or \(\mathcal {P}_{f}\) contains the correct OR model, \(\breve{a}_1(\tau )\) is root-n consistent and asymptotically normal.
3 Asymptotic Properties
In this section, we present the asymptotic properties of the two proposed QTE estimators. Without loss of generality, let \(\pi ^1(\textbf{X},T;\varvec{\theta }^1)\), \(\eta ^1(\textbf{X};\varvec{\gamma }^1)\) and \(f^{1}_{Y\mid \textbf{X},T}(y;\varvec{\xi }^{1})\) be the correctly specified models for \(\pi (\textbf{X},T)\), \(\eta (\textbf{X})\) and \(f_{Y\mid \textbf{X},T}(y)\), respectively, with \(\varvec{\theta }_0^1\), \(\varvec{\gamma }_0^1\) and \(\varvec{\xi }_{0}^{1}\) being the true parameter vectors. Let \(\varvec{\Phi }_1\) and \(\varvec{\Phi }_2\) be the score functions for \(\varvec{\theta }^1\) and \(\varvec{\gamma }^1\), respectively, and let \(\varvec{\Phi }_{1,i}\) and \(\varvec{\Phi }_{2,i}\) be the score functions evaluated at the ith subject. The following conditions are needed for the asymptotic properties of the proposed estimators.
-
A1.
The treatment assignment is strongly ignorable: (Y(1), Y(0)) are independent of T conditional on \(\textbf{X}\). We also assume that \(\sum _{i=1}^nT_i/n\rightarrow c_1\) for some \(0<c_1<1\).
-
A2.
The response is missing at random, \(\textrm{Pr}(R_i=1\mid \textbf{X}_i,T_i,Y_i)=\textrm{Pr}(R_i=1\mid \textbf{X}_i,T_i)\). We also assume that \(\sum _{i=1}^nR_i/n\rightarrow c_2\) for some \(0<c_2<1\).
-
A3.
The values \(\alpha _0(\tau )\) and \(\alpha _0(\tau )+\alpha _1(\tau )\) are the unique \(\tau \)-quantiles of the potential outcomes Y(0) and Y(1), respectively. The density functions \(f_{Y(0)}(\cdot )\) and \(f_{Y(1)}(\cdot )\) of Y(0) and Y(1), evaluated at \(\alpha _0(\tau )\) and \(\alpha _0(\tau )+\alpha _1(\tau )\), respectively, are bounded away from zero. Assume that \(\textrm{E} \Vert \textbf{X}\Vert ^{4}<\infty \).
-
A4.
For \(j=1,\ldots ,J, l=1,\ldots ,L\), \(\pi ^j(\textbf{X},T;\varvec{\theta }^j)\) and \(\eta ^l(\textbf{X};\varvec{\gamma }^l)\) have bounded derivatives in \(\textbf{X}\) up to the second order, and are continuously differentiable in \(\varvec{\theta }^j\) and \(\varvec{\gamma }^l\), respectively. Assume that \(\inf \limits _{\textbf{X},T}\inf \limits _{\varvec{\theta }^j}\pi ^j(\textbf{X},T;\varvec{\theta }^j)>0\), \(\inf \limits _{\textbf{X}}\inf \limits _{\varvec{\gamma }^l}\eta ^l(\textbf{X};\varvec{\gamma }^l)>0\).
-
A5.
The matrices \(\textbf{D}\) and \(\textbf{G}_1, \textbf{G}_2, \textbf{G}_3, \textbf{G}_4\), defined in Section S1 of the online Supplementary Materials, are invertible.
Assumptions A1–A5 are mild regularity conditions. Assumptions A1 and A2 are commonly used in the causal inference and missing data literatures, respectively. Assumptions A3 and A4 are similar to the conditions adopted in Han et al. [9] and ensure the consistency of the proposed estimators. Assumption A5 is a technical assumption needed for asymptotic normality.
Before presenting the asymptotic properties of the proposed estimators, we first give a lemma showing that the IPW estimator is consistent when the true \(\eta (\textbf{X})\) and \(\pi (\textbf{X},T)\) are known.
Lemma 3.1
Let \({{\widetilde{\textbf{a}}}}(\tau )=({\widetilde{a}}_0(\tau ), \widetilde{a}_1(\tau ))^{\top }\) be the minimizer of (2.2) with true \(\eta (\textbf{X})\) and \(\pi (\textbf{X},T)\). Under assumptions A1 and A2, we have that \({{\widetilde{\textbf{a}}}}(\tau )\) is a consistent estimator of \({\widetilde{\varvec{\alpha }}}_0(\tau )\).
Theorem 3.2
Assume that \(\mathcal {P}_{\pi }\) contains a correctly specified model for \(\pi (\textbf{X},T)\) and so does \(\mathcal {P}_{\eta }\) for \(\eta (\textbf{X})\). Under assumptions A1–A4, we have that \({\widehat{\textbf{a}}}(\tau ){\mathop {\rightarrow }\limits ^{P}}{{\widetilde{\varvec{\alpha }}}}_0(\tau )\) as \(n\rightarrow \infty \).
Theorem 3.3
Assume that the conditions in Theorem 3.2 hold. Under the additional assumption A5, we have that
where \(\textbf{D}\) and \(V_{1,i}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)\) are defined in Section S1 of the online Supplementary Materials, \(\textbf{A}^{\otimes 2}=\textbf{A}\textbf{A}^{\top }\) for a vector/matrix \(\textbf{A}\). Furthermore, we can derive that
where
Finally, for the \(\tau \)-QTE, let \(\textbf{e}_{1}=(1,0)^{\top }\), we have
Theorem 3.2 shows that the proposed IPW estimator is consistent as long as \(\mathcal {P}_{\pi }\) contains a correctly specified model for \(\pi (\textbf{X},T)\) and so does \(\mathcal {P}_{\eta }\) for \(\eta (\textbf{X})\), which establishes the multiple robustness of \({\widehat{\textbf{a}}}(\tau )\). Theorem 3.3 gives an asymptotic representation of the estimated QTE, where the first part of \(V_{1,i}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)\) is the representation that would hold if the true \(\pi (\textbf{X},T)\) and \(\eta (\textbf{X})\) were known, and the remaining parts account for the multiply robust estimation of \(\pi (\textbf{X},T)\) and \(\eta (\textbf{X})\). Because the covariance matrix in Theorem 3.3 is complicated, the bootstrap is a convenient choice for quantifying the uncertainty of the QTE estimator.
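A nonparametric bootstrap resamples subjects with replacement and recomputes the whole estimator on each resample. The sketch below returns a percentile interval; the helper name and the toy estimator (a difference of arm-wise medians in a randomized, fully observed sample) are illustrative assumptions. For the multiply robust estimators, the estimator passed in would have to refit every candidate model and redo the empirical likelihood weighting on each bootstrap sample, since that estimation step contributes to the variance.

```python
import numpy as np


def bootstrap_ci(data, estimator, B=500, level=0.95, seed=0):
    """Nonparametric bootstrap percentile interval.

    data: tuple of equal-length arrays (e.g. Y, T, R, X) whose rows are
    resampled jointly, preserving the dependence between them.
    estimator: maps such a tuple to a scalar estimate."""
    rng = np.random.default_rng(seed)
    n = len(data[0])
    stats = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        stats[b] = estimator(tuple(np.asarray(a)[idx] for a in data))
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2.0, 1.0 - alpha / 2.0])
    return lo, hi


def median_diff(d):
    """Toy estimator: difference of arm-wise medians (no confounding or missingness)."""
    y, t = d
    return np.median(y[t == 1]) - np.median(y[t == 0])


rng = np.random.default_rng(3)
t = rng.binomial(1, 0.5, 2000)
y = 2.0 * t + rng.standard_normal(2000)
lo, hi = bootstrap_ci((y, t), median_diff, B=200)
```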
Theorem 3.4
Assume that \(\mathcal {P}_{\eta }\) contains the correct PS model and that either \(\mathcal {P}_{\pi }\) contains the correct PO model or \(\mathcal {P}_{f}\) contains the correct OR model. Under assumptions A1–A4, we have \(\breve{\textbf{a}}(\tau ){\mathop {\rightarrow }\limits ^{P}}{{\widetilde{\varvec{\alpha }}}}_0(\tau )\) as \(n\rightarrow \infty \).
Theorem 3.5
Assume that the conditions in Theorem 3.4 hold. Under the additional assumption A5, we have
where \(\textbf{e}_{1}=(1,0)^{\top }\) and
where \(\textbf{D}\) and \(V_2(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)\) are defined in Section S1 of the online Supplementary Materials.
Theorem 3.4 demonstrates that the proposed AIPW estimator is multiply robust: as long as \(\mathcal {P}_{\eta }\) contains the correct PS model and either \(\mathcal {P}_{\pi }\) contains the correct PO model or \(\mathcal {P}_{f}\) contains the correct OR model, the resulting estimator \(\breve{\textbf{a}}(\tau )\) is consistent. Theorem 3.5 gives the asymptotic normality of the proposed AIPW estimator. As in Theorem 3.3, we first derive the asymptotic representation of \(\breve{\textbf{a}}(\tau )\), and for brevity we only present the asymptotic distribution of \(\breve{a}_{1}(\tau )\). The proof of Theorem 3.5 is more challenging than that of Theorem 3.3 because of the additional outcome regression model, and, compared with results in the missing data literature, it is further complicated by the confounding adjustment.
4 Simulation Study
In this section, simulation studies are conducted to evaluate the finite-sample performance of the proposed \(\tau \)-QTE estimators. The simulations illustrate both the effectiveness of the proposed estimators in dealing with confounding and missingness and the resistance of the multiply robust IPW and AIPW estimators to model misspecification.
The data are generated as follows. First, we generate the covariates as \(X_{i,1}\sim \text{ Unif }(-0.25, 0.25)\), \(X_{i,2}\sim \text{ Binom }(0.5)\), \(X_{i,3}\sim \text{ Unif }(-0.5, 0.5)\), \(X_{i,4}\sim \text{ Binom }(0.5)\times 0.6\). Second, the treatment indicator \(T_i\) is generated from a logistic regression model as
where \(\varvec{\gamma }=(\gamma _0, \gamma _1,\gamma _2)^{\top }=(0.5,-1.5,-1)^{\top }\). Third, the response is generated from the outcome regression model
where \((\alpha _0,\alpha _1,\ldots , \alpha _5)^{\top }=(3.5,2,1,1,0.5,2)^{\top }\), \(\varepsilon _i\)’s are random errors. Finally, the missing mechanism follows
where \(\varvec{\theta }=(\theta _0,\theta _1,\theta _2,\theta _3)^{\top }\). For the random errors, we consider two cases: (1) \(\varepsilon _i\sim N(0,1)\), (2) \(\varepsilon _i\sim t(3)\), both leading to \(\tau \)-QTE \(\alpha _1=2\). To investigate the impact of different missing probabilities, we choose \(\varvec{\theta }=(2.5,0.5,-3,1)^{\top }\) and \((1.5,0.5,-3,1)^{\top }\), which on average result in \(15\%\) and \(30\%\) missing proportions, respectively.
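The design can be reproduced in a few lines. The sketch below reconstructs it from the parameter values above and the correct working models listed in the next paragraph; the pairing of \((\alpha _2,\ldots ,\alpha _5)\) with \((X_{i,1},\ldots ,X_{i,4})\) and of \(\varvec{\theta }\) with \((1,X_{i,3},X_{i,4},T_i)\) is our assumption, and `simulate` is a hypothetical helper. Under this reading, the implied average missing rates are close to the stated 15% and 30%, and because the treatment effect is additive, \(Y(1)-Y(0)=\alpha _1=2\) for every subject, so the \(\tau \)-QTE equals 2 at every \(\tau \).

```python
import numpy as np


def simulate(n, theta0=2.5, errors="normal", rng=None):
    """One data set from the Sect. 4 design (hypothetical helper).

    Assumes (alpha_2, ..., alpha_5) multiply (X1, ..., X4) and theta
    multiplies (1, X3, X4, T), matching the correct working models."""
    rng = np.random.default_rng() if rng is None else rng
    x1 = rng.uniform(-0.25, 0.25, n)
    x2 = rng.binomial(1, 0.5, n)
    x3 = rng.uniform(-0.5, 0.5, n)
    x4 = 0.6 * rng.binomial(1, 0.5, n)
    # PS: logit eta = 0.5 - 1.5 * X1 - 1 * X2
    eta = 1.0 / (1.0 + np.exp(-(0.5 - 1.5 * x1 - x2)))
    t = (rng.uniform(size=n) < eta).astype(int)
    eps = rng.standard_normal(n) if errors == "normal" else rng.standard_t(3, n)
    # OR with (alpha_0, ..., alpha_5) = (3.5, 2, 1, 1, 0.5, 2)
    y0 = 3.5 + x1 + x2 + 0.5 * x3 + 2.0 * x4 + eps   # potential outcome Y(0)
    y1 = y0 + 2.0                                     # Y(1) = Y(0) + alpha_1
    y = np.where(t == 1, y1, y0)                      # consistency
    # PO: logit pi = theta0 + 0.5 * X3 - 3 * X4 + 1 * T
    pi = 1.0 / (1.0 + np.exp(-(theta0 + 0.5 * x3 - 3.0 * x4 + t)))
    r = (rng.uniform(size=n) < pi).astype(int)
    x = np.column_stack([x1, x2, x3, x4])
    return {"x": x, "t": t, "r": r, "y": np.where(r == 1, y, np.nan),
            "y0": y0, "y1": y1}
```

Setting `theta0=1.5` gives the higher-missingness scenario, and `errors="t3"` (any value other than `"normal"`) switches to t(3) errors.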
According to the data generating process, the correct working models for the PS and PO are \(\text{ logit }\{\eta ^{1}(\textbf{X}_i;\varvec{\gamma }^{1})\}=\gamma _0^1+\gamma _1^1X_{i,1}+\gamma _2^1X_{i,2}\) and \(\text{ logit }\{\pi ^{1}(\textbf{X}_i,T_i; \varvec{\theta }^{1})\}= \theta _0^1+\theta _1^1X_{i,3}+\theta _2^1X_{i,4}+\theta _3^1T_i\), respectively; for the AIPW estimator, the correct working model for \(f_{Y\mid \textbf{X},T}(\cdot )\) is the probability density function of \(N(\xi ^{1}_0+\xi ^{1}_1T_i+\xi ^{1}_2X_{i,1}+\xi ^{1}_3X_{i,2}+\xi ^{1}_4X_{i,3}+\xi ^{1}_5X_{i,4},1)\), or of the non-central t(3) distribution with non-centrality parameter \(\xi ^{1}_0+\xi ^{1}_1T_i+\xi ^{1}_2X_{i,1}+\xi ^{1}_3X_{i,2}+\xi ^{1}_4X_{i,3}+\xi ^{1}_5X_{i,4}\). The additional incorrect models for the PS and PO are \(\text{ logit }\{\eta ^{2}(\textbf{X}_i;\varvec{\gamma }^{2})\}=\gamma _0^2+\gamma _1^2\exp (X_{i,1})\) and \(\text{ logit }\{\pi ^{2}(\textbf{X}_i,T_i; \varvec{\theta }^{2})\}= \theta _0^2+\theta _1^2\exp (X_{i,3})\), respectively, and the incorrect OR models are \(N(\xi ^{2}_0+\xi ^{2}_1T_i,1)\) for normal errors and the non-central t(3) distribution with non-centrality parameter \(\xi ^{2}_0+\xi ^{2}_1T_i\) for t(3) errors. The OR parameters are estimated by quasi-maximum likelihood, that is, by ordinary least squares on the complete cases; we also tried estimating them by maximum likelihood, but this depended heavily on the initial values and converged very slowly. We consider two imputation sample sizes, \(S=10\) and 50, and report only the results for \(S=10\), as those for \(S=50\) are similar.
To the best of our knowledge, this is the first work on causal inference for quantiles with missing responses, so we compare the proposed IPW and AIPW estimators, denoted IPW.1111 and AIPW.111111, with four classes of alternative estimators; all estimators are described in Table 1. (I) TM-type estimators: the minimizer of the objective function (2.5), where \(\widehat{\eta }(\textbf{X}_i)\) and \(\widehat{\pi }(\textbf{X}_i,T_i)\) are obtained by the empirical likelihood method as in IPW.1111, but with only one model each for \(P(T_{i}=1\mid \textbf{X}_i)\) and \(P(R_{i}=1\mid \textbf{X}_i,T_i)\). (II) TMO-type estimators: the solution of the estimating Eq. (2.8), where the weights are obtained by the empirical likelihood method as in AIPW.111111; we include five representative TMO-type estimators. The TMO-type estimators provide double protection against missingness and only require that either the PO or the OR model be correct. (III) T-type estimators: the minimizer of an objective function similar to (2.5), where \(\widehat{\eta }(\textbf{X}_i)\) is obtained as for the TM-type estimators, while \(\widehat{\pi }(\textbf{X}_i,T_i)=1\) if \(R_{i}=1\). The T-type estimators account for confounding but ignore missingness. (IV) The naive estimator, denoted Naive.0000: the minimizer of an objective function similar to (2.5), where \(\widehat{\eta }(\textbf{X}_i)=T_{i}\) and \(\widehat{\pi }(\textbf{X}_i,T_i)=1\) if \(R_{i}=1\). The naive estimator ignores both confounding and missingness.
A sample size of \(n = 400\) and two quantile levels, \(\tau =0.3\) and 0.5, are considered, and all simulation results are based on 1000 replicates. Table 2 summarizes the finite-sample performance of the QTE estimators, including the bias (BIAS), the root mean squared error (RMSE), and the empirical coverage probability (ECP) and empirical mean length (EML) of confidence intervals with nominal level 95%. The confidence intervals are constructed by the bootstrap, implemented by the summary.rq function of the R package quantreg with 500 bootstrap replicates. From Table 2, we have the following findings. First, when both the PS and PO candidate models include a correctly specified model, the estimator TM.1010 has noticeably smaller BIAS and RMSE than the estimators under model misspecification. The proposed IPW.1111 performs almost the same as TM.1010, and even better in some scenarios. Second, when the PS model is correctly specified, the estimators are reasonably good even if the PO model is misspecified, though their BIAS and RMSE are somewhat larger than those of IPW.1111 and TM.1010; see the performance of TM.1001, T.1100, and T.1000. However, if the PS model is misspecified, the BIAS and RMSE are much larger than those of IPW.1111, regardless of whether the PO model is correctly specified. Third, the proposed AIPW estimator has lower RMSE than the IPW estimator. As long as \(\mathcal {P}_{\eta }\) contains the correct PS model and either \(\mathcal {P}_{\pi }\) contains the correct PO model or \(\mathcal {P}_{f}\) contains the correct OR model, the resulting estimators are close to TMO.101010 and AIPW.111111, reflecting that the AIPW estimator provides more protection against model misspecification. As with the TM-type estimators, if the PS model is misspecified, the BIAS and RMSE are much larger than those of the other estimators, even if the PO and OR models are correct.
Fourth, the ECPs of IPW.1111, AIPW.111111, TM.1010 and the TMO-type estimators whose labels begin with TMO.11 (except TMO.110101) are close to the nominal level, while the other estimators have low ECPs. Table 3 summarizes the results for t(3) errors, and the comparisons are similar.
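For reference, the four summary measures can be computed from the replicate-level output as follows, assuming the usual definitions (BIAS as the mean estimate minus the true value, ECP as the fraction of intervals covering the true value, EML as the mean interval length); `summarize` is a hypothetical helper.

```python
import numpy as np


def summarize(estimates, lower, upper, truth):
    """BIAS, RMSE, empirical coverage probability (ECP) and empirical mean
    length (EML) of confidence intervals over simulation replicates."""
    est = np.asarray(estimates, dtype=float)
    lo = np.asarray(lower, dtype=float)
    hi = np.asarray(upper, dtype=float)
    return {
        "BIAS": est.mean() - truth,
        "RMSE": np.sqrt(np.mean((est - truth) ** 2)),
        "ECP": np.mean((lo <= truth) & (truth <= hi)),
        "EML": np.mean(hi - lo),
    }


# Four toy replicates with true QTE 2; the third interval misses the truth.
s = summarize([1.9, 2.1, 2.0, 2.2], [1.7, 1.9, 2.1, 1.8], [2.1, 2.3, 2.4, 2.6], 2.0)
```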
Finally, we comment on the comparison between AIPW.111111 and IPW.1111. In theory, the AIPW approach is more efficient than the IPW approach because it uses more information. However, even if the true outcome model is included, several sources lead to additional variation in the estimator: (i) estimation of the parameters in the true outcome model; (ii) inclusion of the misspecified outcome model in the empirical likelihood approach; and (iii) sampling from the outcome models. Therefore, the AIPW approach does not benefit from augmentation when the missing rate is low (say 15%), but performs slightly better when the missing rate reaches 30%.
5 Analysis of CHARLS Data
Cognitive function is a critical dimension of quality of life in later life, and its decline disrupts daily functioning and happiness. It is thus important to study the determinants of cognitive function in order to learn how to delay and/or slow down its eventual decline. A growing body of research suggests that social activities or social engagement among the elderly have a positive effect on cognitive function [13, 16]. In this section, we apply the proposed methods to the motivating data set from the Chinese Health and Retirement Longitudinal Study (CHARLS), collected in 2018, to study the causal effect of social activities on cognitive functions among middle-aged and older adults in China.
Following the analysis of Hu et al. [13], we use episodic memory, a necessary component of reasoning in many cognitive dimensions, as the outcome variable. Episodic memory was measured as the mean of the immediate and delayed word recall scores, with scores ranging from 0 to 10. After excluding participants who were younger than 45 years old or whose social activity information (the covariates) was missing, 7818 participants are included in our study; the outcome is not observed for 1889 of them, yielding about 24% missingness. The treatment variable is whether the participant takes part in certain common social activities in China, such as playing chess, card games or Mahjong, interacting with friends, and other social activities. The covariates (confounders) in the analysis consist of sociodemographic characteristics and health status related to cognitive function, including age, gender, residential area (urban or rural), education level (illiterate, primary education, secondary or above), marital status (currently married or not), smoking (yes or no), hypertension (yes or no), diabetes (yes or no), cardiopathy (yes or no), and apoplexy (yes or no). Age is standardized, and education level is coded by two dummy variables, \(\textrm{education}_\textrm{a}\) (1 for primary education, 0 otherwise) and \(\textrm{education}_\textrm{b}\) (1 for secondary or above, 0 otherwise). The QTE is defined as the difference between the quantiles of the cognitive-score distributions of the treatment group (participating in at least one social activity) and the control group (not participating in any social activity).
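The covariate coding described above can be sketched as follows. This is a minimal illustration, assuming age is standardized by its sample mean and standard deviation (the exact scaling is not stated) and that illiterate is the reference education level, as implied by the two dummies; the function name and category labels are hypothetical. The last line checks the reported missingness.

```python
import numpy as np

def build_covariates(age, education):
    """Standardized age plus the two education dummies (hypothetical helper).

    education: labels in {"illiterate", "primary", "secondary+"};
    "illiterate" is the reference level, coded 0 in both dummies.
    """
    age = np.asarray(age, dtype=float)
    age_std = (age - age.mean()) / age.std(ddof=1)      # standardized age
    education = np.asarray(education)
    edu_a = (education == "primary").astype(float)      # education_a
    edu_b = (education == "secondary+").astype(float)   # education_b
    return np.column_stack([age_std, edu_a, edu_b])

# Share of participants with unobserved outcomes: 1889 of 7818.
missing_rate = 1889 / 7818  # about 0.242, i.e. roughly 24%
```

The remaining binary covariates (gender, residential area, marital status, smoking, and the health conditions) would be appended as 0/1 columns in the same way.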
To apply our method, we need to specify candidate models. For the PS \(\eta (\textbf{X})\), we consider two candidate models: \(\text{ logit }\{\eta ^{1}(\textbf{X}_i;\varvec{\gamma }^{1})\}=\gamma _0^{1}+\varvec{\gamma }_1^{\top }\textbf{X}_{i}\) and \(\text{ logit }\{\eta ^{2}(\textbf{X}_i^*;\varvec{\gamma }^{2})\}=\gamma _0^{2}+\varvec{\gamma }_2^{\top }\textbf{X}_{i}^*\), where \(\textbf{X}_{i}\) contains only the covariates listed above, while \(\textbf{X}_{i}^*\) includes \(\textbf{X}_i\) and the interactions between age and the other covariates. For the PO, we consider \(\text{ logit }\{\pi ^{1}(\textbf{X}_i,T_i; \varvec{\theta }^{1})\}= \theta _0^{1}+\varvec{\theta }_1^{\top }\textbf{X}_{i}+\theta _{11}T_i\) and \(\text{ logit }\{\pi ^{2}(\textbf{X}_i^*,T_i; \varvec{\theta }^{2})\}= \theta _0^{2}+\varvec{\theta }_2^{\top }\textbf{X}_{i}^*+\theta _{21}T_i\). For the AIPW estimator, the two candidate models for \(f_{Y\mid \textbf{X},T}(\cdot )\) are the probability density functions of \(N(\xi ^{1}_0+\xi _{11}T_i+\varvec{\xi }^{\top }_1\textbf{X}_{i},\sigma ^{2}_{1})\) and \(N(\xi ^{2}_0+\xi _{21}T_i+\varvec{\xi }^{\top }_2\textbf{X}_{i}^*,\sigma ^{2}_{2})\). We compare the proposed IPW (IPW.1111) and AIPW (AIPW.111111) estimators with the four classes of competing estimators considered in Sect. 4.
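To make the roles of the candidate models concrete, the sketch below fits one PS model and one PO model by logistic regression, forms \(\textbf{X}^*\) by adding age interactions, and plugs the fitted probabilities into a conventional single-model IPW estimator of the QTE. Assuming the outcome is missing at random given \((\textbf{X},T)\), an observed treated unit receives weight \(1/\{\eta (\textbf{X})\pi (\textbf{X},T)\}\), an observed control receives \(1/[\{1-\eta (\textbf{X})\}\pi (\textbf{X},T)]\), and each arm's \(\tau \)-quantile is the weighted sample quantile. This is only a baseline: the proposed multiply robust estimator combines all candidate models through empirical likelihood weights, which is not implemented here. All function names, the hand-written Newton-Raphson fit, and the simulated demo data are illustrative assumptions.

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def add_age_interactions(X, age_col=0):
    """Build X*: the columns of X plus age times each other covariate."""
    age = X[:, [age_col]]
    others = np.delete(X, age_col, axis=1)
    return np.hstack([X, age * others])

def fit_logistic(Z, y, n_iter=25):
    """Logistic regression with intercept by Newton-Raphson; returns (coef, fitted probs)."""
    A = np.column_stack([np.ones(len(y)), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = expit(A @ beta)
        grad = A.T @ (y - p)                              # score vector
        hess = A.T @ (A * (p * (1.0 - p))[:, None])       # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta, expit(A @ beta)

def weighted_quantile(y, w, tau):
    """Smallest y value whose normalized cumulative weight reaches tau."""
    order = np.argsort(y)
    cum = np.cumsum(w[order]) / w.sum()
    return y[order][np.searchsorted(cum, tau)]

def ipw_qte(Y, T, R, eta_hat, pi_hat, tau):
    """Single-model IPW estimate of q1(tau) - q0(tau); Y may be NaN where R == 0."""
    w1 = 1.0 / (eta_hat * pi_hat)               # weights for observed treated units
    w0 = 1.0 / ((1.0 - eta_hat) * pi_hat)       # weights for observed controls
    t1 = (R == 1) & (T == 1)
    t0 = (R == 1) & (T == 0)
    q1 = weighted_quantile(Y[t1], w1[t1], tau)
    q0 = weighted_quantile(Y[t0], w0[t0], tau)
    return q1 - q0

if __name__ == "__main__":
    # Simulated data with a constant treatment effect of 2 (so the true QTE is 2).
    rng = np.random.default_rng(2023)
    n = 2000
    X = rng.normal(size=(n, 3))
    T = rng.binomial(1, expit(0.5 * X[:, 0]))
    Y = 1.0 + 2.0 * T + X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
    R = rng.binomial(1, expit(1.0 + 0.5 * X[:, 1] + 0.3 * T))
    Y_obs = np.where(R == 1, Y, np.nan)
    _, eta_hat = fit_logistic(X, T)                        # PS model
    _, pi_hat = fit_logistic(np.column_stack([X, T]), R)   # PO model, includes T
    print(ipw_qte(Y_obs, T, R, eta_hat, pi_hat, tau=0.5))
```

Replacing `X` by `add_age_interactions(X)` in either fit gives the second candidate form of each model; choosing between (or combining) such fits is exactly what the multiply robust weighting automates.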
Table 4 summarizes the point estimates, the bootstrap standard errors (BSE), and the 95% confidence intervals at quantile levels \(\tau =0.3, 0.5\) and 0.7, where the confidence intervals are constructed from 500 bootstrap samples. We have the following findings. First, all methods yield positive estimates, indicating that social activities have a positive effect on cognition; the effect is significant, since none of the 95% confidence intervals includes zero, which is consistent with previous literature [6, 13]. Second, the causal effect of social activities on cognitive function varies across quantiles. The QTE at the lower quantile is larger than those at the median and the upper quantile, which indicates that social activities have a greater effect on the cognitive improvement of individuals with low cognitive function. This new finding provides evidence that can inform personalized interventions for improving cognitive function. Third, the proposed estimators differ somewhat from T.$$$$ and Naive.0000, showing the necessity of dealing with both confounding and missingness. Finally, the proposed IPW.1111 is very close to TM.0101, which indicates the suitability of \(\text{ logit }\{\eta ^{2}(\textbf{X}_i^*;\varvec{\gamma }^{2})\}\) for PS and \(\text{ logit }\{\pi ^{2}(\textbf{X}_i^*,T_i; \varvec{\theta }^{2})\}\) for PO; the proposed AIPW.111111 deviates from IPW.1111 to some extent, and, according to our theory, the augmentation term accounts for this difference and produces more trustworthy results.
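The bootstrap standard errors and intervals in Table 4 can be sketched generically: resample participants with replacement, recompute the estimator on each of B = 500 resamples, and take the standard deviation and the 2.5% and 97.5% quantiles of the bootstrap estimates. The percentile interval is our assumption, as is the hypothetical `naive_qte` example, a simple difference of sample quantiles used only to exercise the code; for the IPW and AIPW estimators, all PS, PO, and OR models would have to be refitted inside each resample.

```python
import numpy as np

def bootstrap_se_ci(data, estimator, n_boot=500, level=0.95, seed=0):
    """Nonparametric bootstrap SE and percentile CI (hypothetical helper).

    data: array whose rows are participants; estimator: maps such an
    array to a scalar estimate and must refit any nuisance models itself.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        stats[b] = estimator(data[idx])
    alpha = 1.0 - level
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return stats.std(ddof=1), (lower, upper)

def naive_qte(data, tau=0.3):
    """Unadjusted QTE: difference of sample tau-quantiles (column 0 = y, column 1 = t)."""
    y, t = data[:, 0], data[:, 1]
    return np.quantile(y[t == 1], tau) - np.quantile(y[t == 0], tau)
```

A zero-free interval from such a routine is what underlies the significance statement above, although the paper's estimators, not `naive_qte`, would be bootstrapped.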
6 Discussion
In this paper, we have proposed multiply robust IPW and AIPW estimators for the QTE with missing responses. The proposed IPW estimator is robust against model misspecification for the PS and PO, and the proposed AIPW estimator further strengthens this robustness. Several topics merit further research. First, the proposed methods could be generalized to estimate the QTE at extreme quantile levels [26] with missing responses. Second, under memory constraints, our method could be extended to distributed computing [4, 5]. Third, we could further consider statistical inference for the QTE [2] with missing responses, such as testing whether the QTE is zero, whether it is heterogeneous across covariates, or whether it varies across quantiles.
References
Bang, H., Robins, J.M.: Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–973 (2005)
Cai, Z.W., Fang, Y., Lin, M., Tang, S.F.: Inferences for partially conditional quantile treatment effect model. Technical Report 202005, Department of Economics, University of Kansas (2020)
Chen, C.X., Shen, B.Y., Liu, A.Y., Wu, R.L., Wang, M.: A multiple robust propensity score method for longitudinal analysis with intermittent missing data. Biometrics 77(2), 519–532 (2021)
Chen, L.J., Li, D.Y., Zhou, C.: Distributed inference for the extreme value index. Biometrika 109(1), 257–264 (2021)
Chen, X., Liu, W.D., Zhang, Y.C.: Quantile regression under memory constraint. Ann. Stat. 47(6), 3244–3273 (2019)
Christelis, D., Dobrescu, L.I.: The causal effect of social activities on cognition: evidence from 20 European countries. Soc. Sci. Med. 247, 1–9 (2020)
Donald, S., Hsu, Y.C.: Estimation and inference for distribution functions and quantile functions in treatment effect models. J. Econom. 178, 383–397 (2014)
Firpo, S.: Efficient semiparametric estimation of quantile treatment effects. Econometrica 75(1), 259–276 (2007)
Han, P.S., Kong, L.L., Zhao, J.W., Zhou, X.C.: A general framework for quantile estimation with incomplete data. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 81(2), 305–333 (2019)
Han, P.S., Wang, L.: Estimation with missing data: beyond double robustness. Biometrika 100, 417–430 (2013)
Han, P.S.: Multiply robust estimation in regression analysis with missing data. J. Am. Stat. Assoc. 109(507), 1159–1173 (2014)
Hernán, M.A., Robins, J.M.: Causal Inference: What If. Chapman & Hall/CRC, Boca Raton (2020)
Hu, Y.Q., Lei, X.Y., Smith, J.P., Zhao, Y.H.: Effects of social activities on cognitive functions: evidence from CHARLS. In: Aging in Asia: Findings From New and Emerging Data Initiatives, pp. 279–308, Chap. 12. National Academies Press (US), Washington (2012)
Imbens, G.W., Rubin, D.B.: Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, Cambridge (2015)
Lin, H.Z., Zhou, F.Y., Wang, Q.X., Zhou, L., Qin, J.: Robust and efficient estimation for the treatment effect in causal inference and missing data problems. J. Econom. 205(2), 363–380 (2018)
Liu, H.X., Fan, X.J., Luo, H.Y., Zhou, Z.L., Shen, C., Hu, N.B., Zhai, X.M.: Comparison of depressive symptoms and its influencing factors among the elderly in urban and rural areas: evidence from the China Health and Retirement Longitudinal Study (CHARLS). Int. J. Environ. Res. Public Health 18(8), 3886 (2021)
Lunceford, J.K., Davidian, M.: Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat. Med. 23, 2937–2960 (2004)
Melly, B.: Estimation of counterfactual distributions using quantile regression. Discussion paper, University of St. Gallen (2006)
Rosenbaum, P.R., Rubin, D.B.: Reducing bias in observational studies using subclassification on the propensity score. J. Am. Stat. Assoc. 79(387), 516–524 (1984)
Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983)
Rubin, D.B.: Multiple Imputation for Nonresponse in Surveys. Wiley, New York (2004)
Seaman, S.R., White, I.R.: Review of inverse probability weighting for dealing with missing data. Stat. Methods Med. Res. 22, 278–295 (2013)
Tan, Z.Q.: Bounded, efficient and doubly robust estimation with inverse weighting. Biometrika 97(3), 661–682 (2010)
Wei, Y., Ma, Y.Y., Carroll, R.J.: Multiple imputation in quantile regression. Biometrika 99, 423–438 (2012)
Xie, Y.Y., Cotton, C., Zhu, Y.Y.: Multiply robust estimation of causal quantile treatment effects. Stat. Med. 39, 4238–4251 (2020)
Zhang, Y.C.: Extremal quantile treatment effects. Ann. Stat. 46(6B), 3707–3740 (2018)
Zhang, Z.W., Chen, Z., Troendle, J.F., Zhang, J.: Causal inference on quantiles with an obstetric application. Biometrics 68, 697–706 (2012)
Acknowledgements
The authors would like to thank two anonymous reviewers, an associate editor and the editor for constructive comments and helpful suggestions. This publication is based upon work partially supported by National Natural Science Foundation of China grants 11871376 and 11871164, Natural Science Foundation of Shanghai 21ZR1420700 and Open Research Fund of KLATASDS-MOE.
Ethics declarations
Conflict of interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplementary Information
The online Supplementary Materials include some notations used in the asymptotic properties and all technical proofs.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, X., Qin, G., Tang, Y. et al. Multiply Robust Estimation of Quantile Treatment Effects with Missing Responses. Commun. Math. Stat. (2023). https://doi.org/10.1007/s40304-023-00380-4