1 Introduction

Causal inference, which aims to identify the extent and nature of cause-and-effect relationships, has attracted increasing attention. For example, in the motivating China Health and Retirement Longitudinal Study (CHARLS) analyzed in Sect. 5, we aim to study the causal effect of social activities on cognitive functions among middle-aged and older adults in China. Because the data may be heterogeneous, it is necessary to provide a complete picture of the relationship between social activities and cognitive functions, rather than only the conditional mean relationship. Furthermore, approximately 24% of the responses are missing, which further complicates the analysis.

In causal analysis, one of the most popular techniques is the potential outcome model [19, 20], in which the average treatment effect (ATE), the average difference between the treatment group and the control group, is an important measure; see the extensive studies on the ATE by Rosenbaum and Rubin [19, 20], Lunceford and Davidian [17], Tan [23], Imbens and Rubin [14], and Hernán and Robins [12]. However, the ATE is not informative enough when the data are heterogeneous, and it is sensitive to outliers. The quantile treatment effect (QTE) gives a more complete picture of the causal effect and is robust to heavy-tailed responses, and is thus of growing interest. To estimate the QTE in the absence of randomization, classical methods for handling confounding include propensity score, outcome regression, and doubly robust methods (e.g., [7, 8, 18, 27]).

As in the motivating CHARLS data, missing responses are often encountered in real applications, particularly in observational studies. Among the various missingness mechanisms, the most useful one is missing at random (MAR), which assumes that the missingness depends only on the observed values. To avoid the inconsistent estimation that results from simply ignoring the missing data, several classes of methods have been proposed, such as inverse probability weighting (IPW, [22]), augmented IPW (AIPW, [1]), and imputation [21]. Recently, doubly robust and multiply robust methods have been developed under the MAR assumption [3, 9, 10, 11, 15] to provide robustness against model misspecification. Xie et al. [25] further extended the multiply robust method to handle QTE estimation.

The current literature on quantile regression focuses on either causal inference [8, 27] or missing data [9, 24]. To the best of our knowledge, no existing work studies both issues simultaneously, possibly because of the complexity introduced by the combination of confounding and missing responses. In practice, however, the two issues are frequently encountered together, as in the motivating CHARLS data in Sect. 5. To handle both issues, we need to correctly specify the propensity score (PS) model for confounding adjustment and the model for the probability of being observed (PO) or the outcome regression (OR) to deal with missing data, which is challenging in practice.

In this paper, we develop two multiply robust methods to estimate the QTEs when the responses are MAR. The first method uses multiply robust estimates of the PS and PO within the IPW approach to construct a weighted objective function for estimating the QTEs. The second method, multiply robust AIPW, provides further resistance to model misspecification by relaxing the restriction on the PO. The contributions can be summarized as follows. First, the proposed methods provide a complete picture of the causal effects, which helps to identify possible heterogeneity and thus to design personalized interventions; this merit is confirmed in the motivating CHARLS data. Second, the proposed methods adjust for confounding and deal with missing data, both of which are commonly encountered in real applications, and provide consistent QTE estimates. Finally, the two proposed estimators are multiply robust. For the IPW method, the resulting estimator is consistent if the class of candidate models for the PS contains the correct model and so does that for the PO. For the AIPW method, the restriction on the PO is relaxed: it suffices that either the class of candidate models for the PO or that for the OR contains a correctly specified model. Furthermore, the proof of consistency and asymptotic normality is not trivial, especially for the proposed AIPW estimator, mainly because of the combined challenges of augmentation and confounding adjustment.

The rest of the paper is organized as follows. In Sect. 2, we introduce the potential outcome framework and some key assumptions, and propose the multiply robust IPW and AIPW estimators of the QTE with missing responses. We present the asymptotic properties in Sect. 3. Simulation studies are conducted to investigate the finite-sample performance in Sect. 4. Finally, in Sect. 5, the proposed methods are applied to the motivating CHARLS data, showing that the causal effects of social activities on cognitive functions vary across quantile levels of the responses. All technical proofs are provided in Section S2 of the online Supplementary Materials.

2 The Proposed Method

2.1 Notation and Identification

Let Y(1) and Y(0) be the potential outcomes for the treated and control groups, \(\textbf{X}\) be the covariate vector, and T be the treatment status, where \(T=1\) means treated and \(T=0\) means untreated. Under the consistency assumption [25], we observe the outcome \(Y_i = T_iY_i(1)+(1-T_i)Y_i(0)\). Let \(\eta (\textbf{X})=\textrm{Pr}(T=1\mid \textbf{X})\) be the propensity score (PS), satisfying \(0<\eta (\textbf{X})<1\) for all \(\textbf{X}\).

We are interested in the \(\tau \)-quantile treatment effect (\(\tau \)-QTE), defined as

$$\begin{aligned} \Delta _q(\tau ) = q_1(\tau )-q_0(\tau ), \end{aligned}$$

where \(\tau \in (0,1)\) is a quantile index, and \(q_1(\tau )\) and \(q_0(\tau )\) are the \(\tau \)-quantile of random variables Y(1) and Y(0), respectively.

Assume that the treatment assignment is strongly ignorable, i.e., (Y(1), Y(0)) are independent of T conditional on \(\textbf{X}\). Under this assumption, \(q_1(\tau )\) and \(q_0(\tau )\) are identified by the following moment equalities,

$$\begin{aligned} \textrm{E}\left( \frac{T}{\eta (\textbf{X})}[\tau -I\{Y\le q_1(\tau )\}]\right) =0,~\textrm{E}\left( \frac{1-T}{1-\eta (\textbf{X})}[\tau -I\{Y\le q_0(\tau )\}]\right) =0.\nonumber \\ \end{aligned}$$
(2.1)
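For completeness, the first equality in (2.1) can be checked by iterated expectations; the second is analogous. The sketch below additionally assumes that the distribution function of Y(1) is continuous at \(q_1(\tau )\), so that \(\textrm{Pr}\{Y(1)\le q_1(\tau )\}=\tau \). The first step uses consistency (\(TY=TY(1)\)), and the second uses strong ignorability:

$$\begin{aligned} \textrm{E}\left( \frac{T}{\eta (\textbf{X})}[\tau -I\{Y\le q_1(\tau )\}]\right)&=\textrm{E}\left( \frac{T}{\eta (\textbf{X})}[\tau -I\{Y(1)\le q_1(\tau )\}]\right) \\&=\textrm{E}\left( \frac{\textrm{E}(T\mid \textbf{X})}{\eta (\textbf{X})}\,\textrm{E}[\tau -I\{Y(1)\le q_1(\tau )\}\mid \textbf{X}]\right) \\&=\tau -\textrm{Pr}\{Y(1)\le q_1(\tau )\}=0. \end{aligned}$$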

Let \(\alpha _0(\tau )=q_0(\tau )\) and \(\alpha _1(\tau )=q_1(\tau )-q_0(\tau )\), so that \(\alpha _1(\tau )\) is the \(\tau \)-QTE of interest. Equation (2.1) is equivalent to

$$\begin{aligned} \textrm{E}\left( \frac{(T,T)^{\top }}{\eta (\textbf{X})}[\tau -I\{Y\le {\widetilde{\textbf{T}}}^{\top }{\widetilde{\varvec{\alpha }}}_{0}(\tau )\}]+\frac{(1-T,0)^{\top }}{1-\eta (\textbf{X})}[\tau -I\{Y\le {\widetilde{\textbf{T}}}^{\top }{\widetilde{\varvec{\alpha }}}_{0}(\tau )\}]\right) =\textbf{0}, \end{aligned}$$

where \({\widetilde{\textbf{T}}}=(1,T)^{\top }\) and \({\widetilde{\varvec{\alpha }}}_{0}(\tau )=(\alpha _0(\tau ),\alpha _1(\tau ))^{\top }\), so that \({\widetilde{\textbf{T}}}^{\top }{\widetilde{\varvec{\alpha }}}_{0}(\tau )\) equals \(q_1(\tau )\) when \(T=1\) and \(q_0(\tau )\) when \(T=0\). The second component of this vector equation is the first equality in (2.1), and the first component is the sum of the two equalities. Then, \(\alpha _1(\tau )\) can be identified.

In practice, the responses may be missing. Let R be an indicator of observing Y, with \(R=1\) if Y is observed and \(R=0\) otherwise. We consider the MAR mechanism, which assumes \(\textrm{Pr}(R=1\mid Y,\textbf{X},T) = \textrm{Pr}(R=1\mid \textbf{X},T)\). Denote the probability of being observed (PO) by \(\pi (\textbf{X},T)=\textrm{Pr}(R=1\mid \textbf{X},T)\).

2.2 Multiply Robust IPW Estimator

To handle confounding and missingness simultaneously, we use the PS \(\eta (\textbf{X})=\textrm{Pr}(T=1\mid \textbf{X})\) to account for the confounding and the PO \(\pi (\textbf{X},T)=\textrm{Pr}(R=1\mid \textbf{X},T)\) to deal with the missingness, and estimate the \(\tau \)-QTE by minimizing

$$\begin{aligned}{} & {} Q_{n,\tau }(\textbf{a};\pi ,\eta )\nonumber \\{} & {} \quad =\frac{1}{n}\sum _{i=1}^n\frac{R_i}{\pi (\textbf{X}_i,T_i)}\Big \{ \frac{T_i}{\eta (\textbf{X}_i)}\rho _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})+\frac{1-T_i}{1-\eta (\textbf{X}_i)}\rho _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})\Big \},~~~~~~ \end{aligned}$$
(2.2)

with respect to \(\textbf{a}=(a_0,a_1)^{\top }\), where \(\rho _{\tau }(u)=u\{\tau -I(u<0)\}\) is the check loss function. To deal with the missingness, we use \(1/\pi (\textbf{X}_i,T_i)\) to weight each observed subject, which helps to recover the population information from the biased sample \(\{i: R_i=1\}\). To handle the confounding effect, \(\eta (\textbf{X}_i)\) is used to balance the weighted distribution of the covariates in the two groups.
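As an illustration of how (2.2) can be minimized when \(\pi \) and \(\eta \) are given, note that \({\widetilde{\textbf{T}}}_i^{\top }\textbf{a}\) equals \(a_0+a_1\) for treated subjects and \(a_0\) for controls, so the objective separates into two weighted check-loss problems, each solved by a weighted sample quantile. The Python sketch below is only a minimal illustration under that observation: the helper names (`weighted_quantile`, `ipw_qte`) and the toy data-generating process are hypothetical, and the true \(\pi \) and \(\eta \) are plugged in rather than estimated.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_quantile(y, w, tau):
    """Smallest y whose normalized cumulative weight reaches tau.

    This value minimizes sum_i w_i * rho_tau(y_i - q) over q.
    """
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    cum = np.cumsum(w_sorted) / np.sum(w_sorted)
    idx = np.searchsorted(cum, tau, side="left")
    return y_sorted[min(idx, len(y_sorted) - 1)]

def ipw_qte(y, t, r, pi, eta, tau):
    """Minimize objective (2.2) for given pi and eta; returns (a0, a1)."""
    treated = (r == 1) & (t == 1)
    control = (r == 1) & (t == 0)
    # Treated subjects carry weight 1 / (pi * eta), controls 1 / (pi * (1 - eta)).
    q1 = weighted_quantile(y[treated], 1.0 / (pi[treated] * eta[treated]), tau)
    q0 = weighted_quantile(y[control], 1.0 / (pi[control] * (1.0 - eta[control])), tau)
    return q0, q1 - q0

# Hypothetical toy data (not the paper's design): one confounder x,
# a constant treatment effect of 2, and responses missing at random.
rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(-1.0, 1.0, n)
eta = expit(x)                      # true propensity score
t = rng.binomial(1, eta)
y = 1.0 + 2.0 * t + x + rng.standard_normal(n)
pi = expit(1.0 + t + x)             # true probability of being observed
r = rng.binomial(1, pi)             # y is used only where r == 1

a0_hat, a1_hat = ipw_qte(y, t, r, pi, eta, tau=0.5)
print(a0_hat, a1_hat)
```

In this toy design the median of Y(0) is 1 and the QTE is 2 at every quantile level, so the two printed values should be close to those targets.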

In practice, \(\eta (\textbf{X})\) and \(\pi (\textbf{X},T)\) are unknown and need to be estimated. Generally, we can use parametric methods to estimate \(\eta (\textbf{X})\) and \(\pi (\textbf{X},T)\), assuming that the PS and PO follow a generalized linear model, say the logistic model, and then obtain their maximum likelihood estimates. However, parametric methods require the models for the PS and PO to be correctly specified, which is usually infeasible in practice, so some robustness against model misspecification is needed. In this paper, we follow the idea in Han [11] and Han et al. [9] and obtain the estimates through the multiply robust approach.

We first present the estimator for \(\pi (\cdot )\). Let \(\mathcal {P}_{\pi }=\{\pi ^j(\textbf{X},T;\varvec{\theta }^j): j=1,\ldots ,J\}\) denote the set of candidate models for \(\pi (\textbf{X},T)\), where \(\varvec{\theta }^j\) is the corresponding parameter vector. We use \({\widehat{\varvec{\theta }}}^j\) to denote the estimator of \(\varvec{\theta }^j\), usually taken to be the maximizer of the binomial likelihood

$$\begin{aligned} \prod _{i=1}^n\{\pi ^j(\textbf{X}_i,T_i;\varvec{\theta }^j)\}^{R_i}\{1-\pi ^j(\textbf{X}_i,T_i;\varvec{\theta }^j)\}^{1-R_i}. \end{aligned}$$

Let \(m=\sum _{i=1}^n R_i\) be the number of subjects observed, and without loss of generality, assume that \(R_1=\cdots =R_m=1, R_{m+1}=\cdots =R_n=0\). Letting \(\omega _1(\textbf{X},T)=1/\pi (\textbf{X},T)\), it is easy to verify that

$$\begin{aligned} \textrm{E} \left( \omega _1(\textbf{X},T)[\pi ^j(\textbf{X},T;\varvec{\theta }^j)-\textrm{E} \{\pi ^j(\textbf{X},T;\varvec{\theta }^j)\}] \mid R=1\right) =0~(j=1,\ldots ,J).\nonumber \\ \end{aligned}$$
(2.3)

Therefore, we define the weights \(\omega _{1,i}\), \(i=1,\ldots ,m\), subject to

$$\begin{aligned}{} & {} \omega _{1,i}\ge 0~(i=1,\ldots ,m),~\frac{1}{n}\sum _{i=1}^m\omega _{1,i}=1,~\nonumber \\{} & {} \quad \frac{1}{n}\sum _{i=1}^m\omega _{1,i}\{\pi ^j(\textbf{X}_i,T_i;{\widehat{\varvec{\theta }}}^j)-{\widehat{\zeta }}^{1,j}({\widehat{\varvec{\theta }}}^j)\}=0~(j=1,\ldots ,J),~~ \end{aligned}$$
(2.4)

where \({\widehat{\zeta }}^{1,j}(\varvec{\theta }^j)=n^{-1}\sum _{i=1}^n\pi ^j(\textbf{X}_i,T_i;\varvec{\theta }^j)\). We obtain the empirical likelihood estimate of \(\omega _{1,i}\) by maximizing \(\prod _{i=1}^m\omega _{1,i}\) subject to the constraints in (2.4). We then give

$$\begin{aligned} {\widehat{\pi }}(\textbf{X}_i,T_i)=\frac{1}{{\widehat{\omega }}_{1,i}}, ~i=1,\ldots ,m. \end{aligned}$$
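The constrained maximization of \(\prod _{i=1}^m\omega _{1,i}\) under (2.4) can be carried out through its Lagrangian dual. The Python sketch below is one standard way to do this and is not necessarily the implementation used for the results in this paper: writing \(g_i\) for the vector with entries \(\pi ^j(\textbf{X}_i,T_i;{\widehat{\varvec{\theta }}}^j)-{\widehat{\zeta }}^{1,j}({\widehat{\varvec{\theta }}}^j)\), the solution has the form \(\omega _{1,i}=(n/m)/(1+\varvec{\rho }^{\top }g_i)\), where \(\varvec{\rho }\) minimizes \(-\sum _{i=1}^m\log (1+\varvec{\rho }^{\top }g_i)\); a damped Newton iteration is used for that minimization. The function name `el_weights`, the toy data, and the use of fixed stand-in values in place of fitted candidate models are all assumptions made for illustration.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def el_weights(g, n, max_iter=100, tol=1e-10):
    """Empirical likelihood weights for the m observed subjects.

    g : (m, J) array, rows are pi^j(X_i, T_i; theta_hat^j) - zeta_hat^{1,j}.
    Returns omega with omega_i > 0, sum(omega) / n = 1 and sum(omega * g) / n = 0.
    """
    m, J = g.shape
    rho = np.zeros(J)
    for _ in range(max_iter):
        denom = 1.0 + g @ rho
        scaled = g / denom[:, None]
        grad = -scaled.sum(axis=0)       # gradient of -sum log(1 + g rho)
        hess = scaled.T @ scaled         # Hessian of the same convex function
        step = np.linalg.solve(hess, -grad)
        f_old = -np.sum(np.log(denom))
        s = 1.0
        for _halving in range(60):       # halve until feasible and non-increasing
            cand = 1.0 + g @ (rho + s * step)
            if np.all(cand > 0) and -np.sum(np.log(cand)) <= f_old:
                break
            s *= 0.5
        rho = rho + s * step
        if np.linalg.norm(s * step) < tol:
            break
    return (n / m) / (1.0 + g @ rho)

# Hypothetical toy data; the two candidate columns stand in for fitted
# values pi^1 and pi^2 (the first is correct, the second misspecified).
rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)
t = rng.binomial(1, 0.5, n)
pi_true = expit(0.5 + x + 0.5 * t)
r = rng.binomial(1, pi_true)
cand_models = np.column_stack([pi_true, expit(0.5 + 0.5 * x**2)])
zeta_hat = cand_models.mean(axis=0)          # averages over all n subjects
g = cand_models[r == 1] - zeta_hat           # rows for observed subjects only

omega_hat = el_weights(g, n)
pi_hat = 1.0 / omega_hat                     # estimated PO for observed subjects
print(omega_hat.sum() / n, np.abs((omega_hat[:, None] * g).sum(axis=0) / n).max())
```

The same routine could in principle be reused for the PS weights in the treatment and control groups by replacing `g` with the corresponding constraint functions and restricting to the relevant subjects.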

Next, we obtain the PS estimates for the treatment group and the control group, respectively. Let \(\mathcal {P}_{\eta }=\{\eta ^l(\textbf{X};\varvec{\gamma }^l): l=1,\ldots ,L\}\) denote the set of candidate models for \(\eta (\textbf{X})\), where \(\varvec{\gamma }^l\) is the corresponding parameter vector. We use \({\widehat{\varvec{\gamma }}}^l\) to denote the estimator of \(\varvec{\gamma }^l\), usually taken to be the maximizer of the binomial likelihood

$$\begin{aligned} \prod _{i=1}^n\{\eta ^l(\textbf{X}_i;\varvec{\gamma }^l)\}^{T_i}\{1-\eta ^l(\textbf{X}_i;\varvec{\gamma }^l)\}^{1-T_i}. \end{aligned}$$

Step 1: we obtain the PS estimates for the treatment group. Denote \(\mathcal {T}_n=\{i: T_i=1\}\), and let

$$\begin{aligned} {\widehat{\eta }}_{\mathcal {T}_n}(\textbf{X}_i)=\frac{1}{{\widehat{\omega }}_{2,i}}, ~i\in \mathcal {T}_n, \end{aligned}$$

where \(\{{\widehat{\omega }}_{2,i},i\in \mathcal {T}_n\}\) are obtained by maximizing \(\prod _{i\in \mathcal {T}_n}\omega _{2,i}\), subject to

$$\begin{aligned}{} & {} \omega _{2,i}\ge 0~(i\in \mathcal {T}_n),~\frac{1}{n}\sum _{i\in \mathcal {T}_n}\omega _{2,i}=1,\\{} & {} \quad \frac{1}{n}\sum _{i\in \mathcal {T}_n}\omega _{2,i}\{\eta ^l(\textbf{X}_i;{\widehat{\varvec{\gamma }}}^l)-\zeta ^{2,l}({\widehat{\varvec{\gamma }}}^l)\}=0~(l=1,\ldots ,L), \end{aligned}$$

with \(\zeta ^{2,l}(\varvec{\gamma }^l)=n^{-1}\sum _{i=1}^n\eta ^l(\textbf{X}_i;\varvec{\gamma }^l)\).

Step 2: we obtain the PS estimates for the control group. Let \(\mathcal {T}_n^c=\{1,\ldots ,n\}\backslash \mathcal {T}_n\) and let

$$\begin{aligned} 1-{\widehat{\eta }}_{\mathcal {T}_n^c}(\textbf{X}_i)=\widehat{\textrm{Pr}}(T_i=0\mid \textbf{X}_i)=\frac{1}{{\widehat{\omega }}_{3,i}},~i\in \mathcal {T}_n^c, \end{aligned}$$

where \(\{{\widehat{\omega }}_{3,i},i\in \mathcal {T}_n^c\}\) are obtained by maximizing \(\prod _{i\in \mathcal {T}_n^c}\omega _{3,i}\), subject to

$$\begin{aligned}{} & {} \omega _{3,i}\ge 0~(i\in \mathcal {T}_n^c),~\frac{1}{n}\sum _{i\in \mathcal {T}_n^c}\omega _{3,i}=1,\\{} & {} \quad \frac{1}{n}\sum _{i\in \mathcal {T}_n^c}\omega _{3,i}[\{1-\eta ^l(\textbf{X}_i;{\widehat{\varvec{\gamma }}}^l)\}-\{1-\zeta ^{2,l}({\widehat{\varvec{\gamma }}}^l)\}]=0~(l=1,\ldots ,L). \end{aligned}$$

Given \(\{{\widehat{\eta }}(\textbf{X}_i), i=1,\ldots ,n\}=\{{\widehat{\eta }}_{\mathcal {T}_n}(\textbf{X}_i):i\in \mathcal {T}_n\}\cup \{{\widehat{\eta }}_{\mathcal {T}_n^c}(\textbf{X}_i):i\in \mathcal {T}_n^c\}\) and \(\{{\widehat{\pi }}(\textbf{X}_i,T_i), i=1,\ldots ,m\}\), we propose to estimate the \(\tau \)-QTE by minimizing

$$\begin{aligned}{} & {} Q_{n,\tau }(\textbf{a};{\widehat{\pi }},{\widehat{\eta }})\nonumber \\{} & {} \quad =\frac{1}{n}\sum _{i=1}^n\frac{R_i}{{\widehat{\pi }}(\textbf{X}_i,T_i)}\Big \{ \frac{T_i}{\widehat{\eta }(\textbf{X}_i)}\rho _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})+\frac{1-T_i}{1-\widehat{\eta }(\textbf{X}_i)}\rho _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})\Big \},~~~~~~ \end{aligned}$$
(2.5)

with respect to \(\textbf{a}\). Denote the minimizer by \({\widehat{\textbf{a}}}(\tau )=({\widehat{a}}_0(\tau ),{\widehat{a}}_1(\tau ))^{\top }\); then \({\widehat{a}}_1(\tau )\) is the proposed estimator of the \(\tau \)-QTE \(\alpha _1(\tau )\), while \({\widehat{a}}_0(\tau )\) is the estimator of \(\alpha _0(\tau )\), the marginal \(\tau \)-th quantile of the control group. In Sect. 3, we will prove that, if \(\mathcal {P}_{\pi }\) and \(\mathcal {P}_{\eta }\) contain the correct PO and PS models, respectively, \({\widehat{a}}_1(\tau )\) is root-n consistent for the \(\tau \)-QTE and asymptotically normal.

2.3 Multiply Robust AIPW Estimator

In the objective function (2.2), only a model for the PO is specified to address the missingness. To gain more resistance against model misspecification, we develop an augmented inverse probability weighted (AIPW) method to estimate the \(\tau \)-QTE by solving

$$\begin{aligned}{} & {} \frac{1}{n}\sum _{i=1}^n\frac{R_i}{\pi (\textbf{X}_i,T_i)}\left\{ \frac{(T_i,T_i)^{\top }}{\eta (\textbf{X}_i)}\psi _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})+\frac{(1-T_i,0)^{\top }}{1-\eta (\textbf{X}_i)}\psi _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})\right\} -\left\{ \frac{R_i}{\pi (\textbf{X}_i,T_i)}-1\right\} \\{} & {} \quad \quad \times \left\{ \frac{(T_i,T_i)^{\top }}{\eta (\textbf{X}_i)}\frac{1}{S}\sum _{s=1}^S\psi _{\tau }(Y^{s}_{\textbf{X}_{i},T_i}-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})+\frac{(1-T_i,0)^{\top }}{1-\eta (\textbf{X}_i)}\frac{1}{S}\sum _{s=1}^S\psi _{\tau }(Y^{s}_{\textbf{X}_{i},T_i}-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})\right\} \\{} & {} \quad \approx 0, \end{aligned}$$

where \(\psi _{\tau }(u)=\tau -I(u<0)\) is the derivative of the check loss \(\rho _{\tau }(u)\) for \(u\ne 0\), and the set \(\{Y^{s}_{\textbf{X},T}\}_{s=1}^{S}\) is a random sample of size S from the conditional density \(f_{Y\mid \textbf{X},T}(\cdot )\). The first term is the score function of the IPW estimator [see expression (2.2)]. The augmentation term, motivated by the imputation estimator, is added to draw more information from the outcome regression (OR) and to provide double protection against model misspecification.
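The double protection can be seen from a short heuristic calculation, given here only as a sketch with \(\eta (\textbf{X})\) treated as known. As shorthand introduced for this illustration, let \(\textbf{h}\) denote the bracketed vector \(\frac{(T,T)^{\top }}{\eta (\textbf{X})}\psi _{\tau }(Y-{\widetilde{\textbf{T}}}^{\top }\textbf{a})+\frac{(1-T,0)^{\top }}{1-\eta (\textbf{X})}\psi _{\tau }(Y-{\widetilde{\textbf{T}}}^{\top }\textbf{a})\), let \(\textbf{m}=\textrm{E}(\textbf{h}\mid \textbf{X},T)\), let \(\textbf{m}^{*}\) be any function of \((\textbf{X},T)\) produced by a possibly misspecified OR model, and let \(\pi ^{*}\) be a possibly misspecified PO model. With the average over the S draws replaced by its conditional expectation, each summand of the estimating equation is \(\frac{R}{\pi }\textbf{h}-(\frac{R}{\pi }-1)\textbf{m}^{*}\). If the PO model is correct, the augmentation term has mean zero whatever \(\textbf{m}^{*}\) is; if the OR model is correct, the summand equals \(\textbf{m}+\frac{R}{\pi ^{*}}(\textbf{h}-\textbf{m})\), and the second part has mean zero under MAR whatever \(\pi ^{*}\) is:

$$\begin{aligned} \textrm{E}\left[ \left\{ \frac{R}{\pi (\textbf{X},T)}-1\right\} \textbf{m}^{*}\right]&=\textrm{E}\left[ \left\{ \frac{\textrm{Pr}(R=1\mid \textbf{X},T)}{\pi (\textbf{X},T)}-1\right\} \textbf{m}^{*}\right] =\textbf{0},\\ \textrm{E}\left\{ \frac{R}{\pi ^{*}(\textbf{X},T)}(\textbf{h}-\textbf{m})\right\}&=\textrm{E}\left\{ \frac{\textrm{Pr}(R=1\mid \textbf{X},T)}{\pi ^{*}(\textbf{X},T)}\,\textrm{E}(\textbf{h}-\textbf{m}\mid \textbf{X},T)\right\} =\textbf{0}. \end{aligned}$$

In either case the expectation of the summand reduces to \(\textrm{E}(\textbf{h})\), the complete-data moment that identifies \({\widetilde{\varvec{\alpha }}}_{0}(\tau )\).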

Since \(f_{Y\mid \textbf{X},T}(\cdot )\) is unknown, we need to estimate it in order to draw random samples. To gain some resistance against model misspecification, we again adopt the multiply robust idea. Let \(\mathcal {P}_{f}=\{f^{k}_{Y\mid \textbf{X},T}(y;\varvec{\xi }^{k}): k=1,\ldots ,K\}\) denote the set of candidate models for \(f_{Y\mid \textbf{X},T}(\cdot )\), where \(\varvec{\xi }^{k}\) is the corresponding parameter vector. We use \({\widehat{\varvec{\xi }}}^k\) to denote the estimator of \(\varvec{\xi }^k\), usually taken to be the maximizer of the likelihood function

$$\begin{aligned} \prod _{i=1}^n f^{k}_{Y\mid \textbf{X}_i,T_i}(Y_{i};\varvec{\xi }^{k})^{R_i}. \end{aligned}$$
(2.6)

Letting \(\omega (\textbf{X},T)=1/\pi (\textbf{X},T)\), it is easy to verify that

$$\begin{aligned}{} & {} \textrm{E} \left( \omega (\textbf{X},T)[\pi ^j(\textbf{X},T;\varvec{\theta }^j)-\textrm{E} \{\pi ^j(\textbf{X},T;\varvec{\theta }^j)\}] \mid R=1\right) =0, (j=1,\ldots ,J),\nonumber \\{} & {} \textrm{E} \left\{ \omega (\textbf{X},T)\left( \left[ \frac{(T,T)^{\top }}{\eta (\textbf{X})}\frac{1}{S}\sum _{s=1}^S\psi _{\tau }\{Y^{s}(\varvec{\xi }^k)-{\widetilde{\textbf{T}}}^{\top }\textbf{a}\}\right. \right. \right. \nonumber \\{} & {} \quad \left. \left. \left. +\frac{(1-T,0)^{\top }}{1-\eta (\textbf{X})}\frac{1}{S}\sum _{s=1}^S\psi _{\tau }\{Y^{s}(\varvec{\xi }^k)-{\widetilde{\textbf{T}}}^{\top }\textbf{a}\}\right] \right. \right. \nonumber \\{} & {} \quad -\textrm{E} \left[ \frac{(T,T)^{\top }}{\eta (\textbf{X})}\frac{1}{S}\sum _{s=1}^S\psi _{\tau }\{Y^{s}(\varvec{\xi }^k)-{\widetilde{\textbf{T}}}^{\top }\textbf{a}\}\right. \nonumber \\{} & {} \quad \left. \left. \left. +\frac{(1-T,0)^{\top }}{1-\eta (\textbf{X})}\frac{1}{S}\sum _{s=1}^S\psi _{\tau }\{Y^{s}(\varvec{\xi }^k)-{\widetilde{\textbf{T}}}^{\top }\textbf{a}\}\right] \right) \mid R=1\right\} \nonumber \\{} & {} \triangleq \textrm{E}\left( \omega (\textbf{X},T)\left[ \textbf{U}^{k}(\textbf{X},T;\varvec{\xi }^k,\eta (\textbf{X}),\textbf{a})-\textrm{E}\{\textbf{U}^{k}(\textbf{X},T;\varvec{\xi }^k,\eta (\textbf{X}),\textbf{a})\}\right] \mid R=1\right) =0, \nonumber \\{} & {} \quad (k=1,\ldots ,K),~~~~ \end{aligned}$$
(2.7)

where \(\{Y^{s}(\varvec{\xi }^k)\}_{s=1}^{S}\) denotes a random sample of size S from \(f^{k}_{Y\mid \textbf{X},T}(y;\varvec{\xi }^{k})\).

Based on the sample versions of the above two equations, we use the empirical likelihood method to estimate \(\tau \)-QTE by solving

$$\begin{aligned} \sum _{i=1}^m\widehat{w}_{i}\Big \{ \frac{(T_i,T_i)^{\top }}{\widehat{\eta }(\textbf{X}_i)}\psi _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})+\frac{(1-T_i,0)^{\top }}{1-\widehat{\eta }(\textbf{X}_i)}\psi _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})\Big \}\triangleq M_{n,\tau }(\textbf{a};{\widehat{\omega }},{\widehat{\eta }})\approx 0, \end{aligned}$$
(2.8)

where the weighted estimating equation is solved by converting it into a weighted quantile regression and calling the R package quantreg, \(\widehat{\eta }(\textbf{X}_i)\) is obtained by the multiply robust method in Sect. 2.2, and the \(\widehat{w}_{i}\)’s are calculated by the following steps.

  • We first calculate \({\widehat{\varvec{\xi }}}^k\), \(k=1,2,\dots ,K\), by maximizing the expression (2.6), and then draw a random sample \(\{Y^{s}(\widehat{\varvec{\xi }}^k)\}_{s=1}^{S}\) from \(f^{k}_{Y\mid \textbf{X},T}(y;\widehat{\varvec{\xi }}^{k})\).

  • Based on the estimator \(\widehat{\eta }(\textbf{X}_i)\), calculate \(\widehat{\textbf{a}}^{k}_{S}(\tau )\), \(k=1,2,\dots ,K\), by solving

    $$\begin{aligned}{} & {} \frac{1}{n}\sum _{i=1}^nR_i\Big \{ \frac{(T_i,T_i)^{\top }}{\widehat{\eta }(\textbf{X}_i)}\psi _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})+\frac{(1-T_i,0)^{\top }}{1-\widehat{\eta }(\textbf{X}_i)}\psi _{\tau }(Y_i-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a})\Big \}+(1-R_i)\\{} & {} \qquad \times \Bigg [ \frac{(T_i,T_i)^{\top }}{\widehat{\eta }(\textbf{X}_i)}\frac{1}{S}\sum _{s=1}^S\psi _{\tau }\{Y_{i}^{s}(\widehat{\varvec{\xi }}^k)-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a}\}+\frac{(1-T_i,0)^{\top }}{1-\widehat{\eta }(\textbf{X}_i)}\frac{1}{S}\sum _{s=1}^S\psi _{\tau }\{Y_{i}^{s}(\widehat{\varvec{\xi }}^k)-{\widetilde{\textbf{T}}}_i^{\top }\textbf{a}\}\Bigg ]\\{} & {} \quad \approx 0, \end{aligned}$$

    where the weighted estimating equation is again solved by converting it into a weighted quantile regression and calling the R package quantreg.

  • Finally, \(\widehat{w}_{i}\) can be obtained by maximizing \(\prod _{i=1}^m\omega _{i}\) subject to

    $$\begin{aligned}{} & {} \omega _{i}\ge 0~(i=1,\ldots ,m),~\sum _{i=1}^m\omega _{i}=1,~\nonumber \\{} & {} \quad \sum _{i=1}^m\omega _{i}\{\pi ^j(\textbf{X}_i,T_i;{\widehat{\varvec{\theta }}}^j)-{\widehat{\zeta }}^{1,j}({\widehat{\varvec{\theta }}}^j)\}=0~(j=1,\ldots ,J),\nonumber \\{} & {} \quad \sum _{i=1}^m\omega _{i}\{\textbf{U}^{k}(\textbf{X}_i,T_i;{\widehat{\varvec{\xi }}}^k,\widehat{\eta }(\textbf{X}_{i}),\widehat{\textbf{a}}^{k}_{S}(\tau ))-{\widehat{\zeta }}^{k}({\widehat{\varvec{\xi }}}^k,\widehat{\eta }(\textbf{X}_{i}),\widehat{\textbf{a}}^{k}_{S}(\tau ))\}=0~\nonumber \\{} & {} \quad \quad \quad \quad \quad (k=1,\ldots ,K), \end{aligned}$$
    (2.9)

    where \({\widehat{\zeta }}^{1,j}(\varvec{\theta }^j)\) is defined in (2.4) and

    $$\begin{aligned} {\widehat{\zeta }}^{k}(\varvec{\xi }^k,\eta (\textbf{X}),\textbf{a}^{k}_{S}(\tau ))=n^{-1}\sum \limits _{i=1}^n\textbf{U}^{k}(\textbf{X}_i,T_i;\varvec{\xi }^k,\eta (\textbf{X}),\textbf{a}^{k}_{S}(\tau )). \end{aligned}$$
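The second step above, which computes \(\widehat{\textbf{a}}^{k}_{S}(\tau )\) from observed and imputed responses, can be sketched as a weighted quantile over a pooled pseudo-sample: each observed response enters once with its PS weight, and each of the S imputed draws for a subject with a missing response enters with that weight divided by S. The Python sketch below is a hedged illustration rather than the quantreg-based implementation described in the text; the helper names, the toy design, the use of the true \(\eta \), and the single normal working model with unit variance fitted by complete-case least squares are all assumptions.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_quantile(y, w, tau):
    """Smallest y whose normalized cumulative weight reaches tau."""
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    cum = np.cumsum(w_sorted) / np.sum(w_sorted)
    idx = np.searchsorted(cum, tau, side="left")
    return y_sorted[min(idx, len(y_sorted) - 1)]

def imputed_qte(y, t, r, eta, draws, tau):
    """Solve the imputation-type estimating equation for (a0, a1).

    draws : (n, S) array of imputed responses; only rows with r == 0 are used.
    """
    n, S = draws.shape
    base_w = np.where(t == 1, 1.0 / eta, 1.0 / (1.0 - eta))
    obs = r == 1
    # Pool observed responses with the S draws of each missing response.
    y_pool = np.concatenate([y[obs], draws[~obs].ravel()])
    w_pool = np.concatenate([base_w[obs], np.repeat(base_w[~obs] / S, S)])
    t_pool = np.concatenate([t[obs], np.repeat(t[~obs], S)])
    q1 = weighted_quantile(y_pool[t_pool == 1], w_pool[t_pool == 1], tau)
    q0 = weighted_quantile(y_pool[t_pool == 0], w_pool[t_pool == 0], tau)
    return np.array([q0, q1 - q0])

# Hypothetical toy data with a constant treatment effect of 2.
rng = np.random.default_rng(2)
n, S = 5000, 10
x = rng.uniform(-1.0, 1.0, n)
eta = expit(x)
t = rng.binomial(1, eta)
y_full = 1.0 + 2.0 * t + x + rng.standard_normal(n)
r = rng.binomial(1, expit(1.0 + t + x))
y = np.where(r == 1, y_full, np.nan)         # missing responses are unavailable

# Working OR model N(xi0 + xi1*T + xi2*X, 1), fitted on complete cases.
design = np.column_stack([np.ones(n), t, x])
xi_hat, *_ = np.linalg.lstsq(design[r == 1], y[r == 1], rcond=None)
draws = (design @ xi_hat)[:, None] + rng.standard_normal((n, S))

a_hat = imputed_qte(y, t, r, eta, draws, tau=0.5)
print(a_hat)
```

In this toy design the working OR model is correct, so the second entry of `a_hat` should be close to 2.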

Denote the solution of Eq. (2.8) by \(\breve{\textbf{a}}(\tau )=(\breve{a}_0(\tau ),\breve{a}_1(\tau ))^{\top }\); then \(\breve{a}_1(\tau )\) is the proposed multiply robust AIPW estimator of the \(\tau \)-QTE \(\alpha _1(\tau )\). In Sect. 3, we will prove that, as long as \(\mathcal {P}_{\eta }\) contains the correct model for the PS, and so does either \(\mathcal {P}_{\pi }\) for the PO or \(\mathcal {P}_{f}\) for the OR, the estimator \(\breve{a}_1(\tau )\) is root-n consistent for the \(\tau \)-QTE and asymptotically normal.

3 Asymptotic Properties

In this section, we present the asymptotic properties of the two proposed QTE estimators. Without loss of generality, let \(\pi ^1(\textbf{X},T;\varvec{\theta }^1)\), \(\eta ^1(\textbf{X};\varvec{\gamma }^1)\) and \(f^{1}_{Y\mid \textbf{X},T}(y;\varvec{\xi }^{1})\) be the correctly specified models for \(\pi (\textbf{X},T)\), \(\eta (\textbf{X})\) and \(f_{Y\mid \textbf{X},T}(y)\), respectively, with \(\varvec{\theta }_0^1\), \(\varvec{\gamma }_0^1\) and \(\varvec{\xi }_{0}^{1}\) being the true parameter vectors. Let \(\varvec{\Phi }_1\) and \(\varvec{\Phi }_2\) be the score functions for \(\varvec{\theta }^1\) and \(\varvec{\gamma }^1\), respectively, and let \(\varvec{\Phi }_{1,i}\) and \(\varvec{\Phi }_{2,i}\) be the score functions evaluated at the ith subject. The following conditions are needed for the asymptotic properties of the proposed estimators.

  1. A1.

    The treatment assignment is strongly ignorable: (Y(1), Y(0)) are independent of T conditional on \(\textbf{X}\). We also assume that \(\sum _{i=1}^nT_i/n\rightarrow c_1\) for some \(0<c_1<1\).

  2. A2.

    The response is missing at random, \(\textrm{Pr}(R_i=1\mid \textbf{X}_i,T_i,Y_i)=\textrm{Pr}(R_i=1\mid \textbf{X}_i,T_i)\). We also assume that \(\sum _{i=1}^nR_i/n\rightarrow c_2\) for some \(0<c_2<1\).

  3. A3.

    The quantities \(\alpha _0(\tau )\) and \(\alpha _0(\tau )+\alpha _1(\tau )\) are the unique \(\tau \)-quantiles of the potential outcomes Y(0) and Y(1), respectively. The density functions of Y(0) and Y(1), \(f_{Y(0)}(\cdot )\) and \(f_{Y(1)}(\cdot )\), are bounded away from zero at \(\alpha _0(\tau )\) and \(\alpha _0(\tau )+\alpha _1(\tau )\), respectively. Assume that \(\textrm{E} \Vert \textbf{X}\Vert ^{4}<\infty \).

  4. A4.

    For \(j=1,\ldots ,J, l=1,\ldots ,L\), \(\pi ^j(\textbf{X},T;\varvec{\theta }^j)\) and \(\eta ^l(\textbf{X};\varvec{\gamma }^l)\) have bounded derivatives in \(\textbf{X}\) up to the second order, and are continuously differentiable in \(\varvec{\theta }^j\) and \(\varvec{\gamma }^l\), respectively. Assume that \(\inf \limits _{\textbf{X},T}\inf \limits _{\varvec{\theta }^j}\pi ^j(\textbf{X},T;\varvec{\theta }^j)>0\), \(\inf \limits _{\textbf{X}}\inf \limits _{\varvec{\gamma }^l}\eta ^l(\textbf{X};\varvec{\gamma }^l)>0\).

  5. A5.

    The matrices \(\textbf{D}\) and \(\textbf{G}_1, \textbf{G}_2, \textbf{G}_3, \textbf{G}_4\), defined in Section S1 of the online Supplementary Materials, are invertible.

Regularity assumptions A1–A5 for the main results are mild. Assumptions A1 and A2 are commonly used in the literature on causal inference and missing data, respectively. Assumptions A3 and A4 are similar to the conditions adopted in Han et al. [9] and ensure the consistency of the proposed estimators. Assumption A5 is a technical assumption needed for the asymptotic normality.

Before presenting the asymptotic properties of the proposed estimators, we first give a lemma which shows that the IPW estimation is consistent when the true \(\eta (\textbf{X})\) and \(\pi (\textbf{X},T)\) are given.

Lemma 3.1

Let \({{\widetilde{\textbf{a}}}}(\tau )=({\widetilde{a}}_0(\tau ), \widetilde{a}_1(\tau ))^{\top }\) be the minimizer of (2.2) with true \(\eta (\textbf{X})\) and \(\pi (\textbf{X},T)\). Under assumptions A1 and A2, we have that \({{\widetilde{\textbf{a}}}}(\tau )\) is a consistent estimator of \({\widetilde{\varvec{\alpha }}}_0(\tau )\).

Theorem 3.2

Assume that \(\mathcal {P}_{\pi }\) contains a correctly specified model for \(\pi (\textbf{X},T)\) and so does \(\mathcal {P}_{\eta }\) for \(\eta (\textbf{X})\). Under assumptions A1–A4, we have that \({\widehat{\textbf{a}}}(\tau ){\mathop {\rightarrow }\limits ^{P}}{{\widetilde{\varvec{\alpha }}}}_0(\tau )\) as \(n\rightarrow \infty \).

Theorem 3.3

Assume that the conditions in Theorem 3.2 hold. Under the additional assumption A5, we have that

$$\begin{aligned}{} & {} n^{1/2}\{{\widehat{\textbf{a}}}(\tau )-{\widetilde{\varvec{\alpha }}}_0(\tau )\}\nonumber \\{} & {} \quad =\textbf{D}^{-1}n^{-1/2}\sum _{i=1}^n\left[ V_{1,i}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)+\textrm{E} \Big \{\frac{\partial V_{1}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)}{\partial \varvec{\theta }^1},\frac{\partial V_{1}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)}{\partial \varvec{\gamma }^1}\Big \}\begin{pmatrix}\{\textrm{E} (\varvec{\Phi }_1^{\otimes 2})\}^{-1}\varvec{\Phi }_{1,i}\\ \{\textrm{E} (\varvec{\Phi }_2^{\otimes 2})\}^{-1}\varvec{\Phi }_{2,i}\end{pmatrix}\right] +o_p(1), \end{aligned}$$

where \(\textbf{D}\) and \(V_{1,i}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)\) are defined in Section S1 of the online Supplementary Materials, \(\textbf{A}^{\otimes 2}=\textbf{A}\textbf{A}^{\top }\) for a vector/matrix \(\textbf{A}\). Furthermore, we can derive that

$$\begin{aligned} n^{1/2}\{{\widehat{\textbf{a}}}(\tau )-{\widetilde{\varvec{\alpha }}}_0(\tau )\}{\mathop {\rightarrow }\limits ^{d}}N(\textbf{0},\varvec{\Sigma }_1), \end{aligned}$$

where

$$\begin{aligned} \varvec{\Sigma }_1=\textbf{D}^{-1} \textrm{Cov}\left[ V_{1}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)+\textrm{E} \Big \{\frac{\partial V_{1}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)}{\partial \varvec{\theta }^1},\frac{\partial V_{1}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)}{\partial \varvec{\gamma }^1}\Big \}\begin{pmatrix}\{\textrm{E} (\varvec{\Phi }_1^{\otimes 2})\}^{-1}\varvec{\Phi }_{1}\\ \{\textrm{E} (\varvec{\Phi }_2^{\otimes 2})\}^{-1}\varvec{\Phi }_{2}\end{pmatrix}\right] (\textbf{D}^{-1})^{\top }. \end{aligned}$$

Finally, for the \(\tau \)-QTE, let \(\textbf{e}_{1}=(0,1)^{\top }\), which selects the second component of \(\textbf{a}=(a_0,a_1)^{\top }\); we have

$$\begin{aligned} n^{1/2}\{\widehat{a}_{1}(\tau )-\alpha _{1}(\tau )\}{\mathop {\rightarrow }\limits ^{d}}N(0,\textbf{e}_{1}^{\top }\varvec{\Sigma }_1\textbf{e}_{1}). \end{aligned}$$

Theorem 3.2 shows that the proposed IPW estimator is consistent as long as \(\mathcal {P}_{\pi }\) contains a correctly specified model for \(\pi (\textbf{X},T)\) and so does \(\mathcal {P}_{\eta }\) for \(\eta (\textbf{X})\), which implies the robustness of \({\widehat{\textbf{a}}}(\tau )\). Theorem 3.3 gives the asymptotic representation of the estimated QTE, where the first part in \(V_{1,i}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)\) is the representation as if the true \(\pi (\textbf{X},T)\) and \(\eta (\textbf{X})\) were known, and the remaining parts account for the multiply robust estimation of \(\pi (\textbf{X},T)\) and \(\eta (\textbf{X})\). Because the expression of the covariance matrix in Theorem 3.3 is complicated, the resampling bootstrap is a practical choice for uncertainty quantification of the QTE.
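A nonparametric bootstrap for the QTE can be sketched as follows: resample subjects with replacement, recompute the estimator on each resample, and summarize the resulting estimates. The Python code below is a minimal illustration under simplifying assumptions. In particular, `qte_estimate` is a hypothetical placeholder that uses known weights stored with the data, whereas in the procedure of Sect. 2 the multiply robust estimates of \(\pi \) and \(\eta \) would be recomputed within every resample; the toy data and all function names are likewise assumptions.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_quantile(y, w, tau):
    """Smallest y whose normalized cumulative weight reaches tau."""
    order = np.argsort(y)
    cum = np.cumsum(w[order]) / np.sum(w)
    idx = np.searchsorted(cum, tau, side="left")
    return y[order][min(idx, len(y) - 1)]

def qte_estimate(d, tau):
    """Placeholder estimator: IPW QTE using the weights stored in d."""
    tr = (d["r"] == 1) & (d["t"] == 1)
    co = (d["r"] == 1) & (d["t"] == 0)
    q1 = weighted_quantile(d["y"][tr], 1.0 / (d["pi"][tr] * d["eta"][tr]), tau)
    q0 = weighted_quantile(d["y"][co], 1.0 / (d["pi"][co] * (1.0 - d["eta"][co])), tau)
    return q1 - q0

def bootstrap_qte(d, tau, n_boot=200, seed=0):
    """Bootstrap standard error and 95% percentile interval for the QTE."""
    rng = np.random.default_rng(seed)
    n = len(d["y"])
    est = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample subjects with replacement
        est[b] = qte_estimate({k: v[idx] for k, v in d.items()}, tau)
    return est.std(ddof=1), np.quantile(est, [0.025, 0.975])

# Hypothetical toy data with a constant treatment effect of 2.
rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
eta = expit(x)
t = rng.binomial(1, eta)
y = 1.0 + 2.0 * t + x + rng.standard_normal(n)
pi = expit(1.0 + t + x)
r = rng.binomial(1, pi)
data = {"y": y, "t": t, "r": r, "pi": pi, "eta": eta}

se, ci = bootstrap_qte(data, tau=0.5)
print(qte_estimate(data, 0.5), se, ci)
```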

Theorem 3.4

Assume that \(\mathcal {P}_{\eta }\) contains the correct model for the PS, and so does either \(\mathcal {P}_{\pi }\) for the PO or \(\mathcal {P}_{f}\) for the OR. Under assumptions A1–A4, we have \(\breve{\textbf{a}}(\tau ){\mathop {\rightarrow }\limits ^{P}}{{\widetilde{\varvec{\alpha }}}}_0(\tau )\) as \(n\rightarrow \infty \).

Theorem 3.5

Assume that the conditions in Theorem 3.4 hold. Under the additional assumption A5, we have

$$\begin{aligned} n^{1/2}\{\breve{a}_{1}(\tau )-\alpha _{1}(\tau )\}{\mathop {\rightarrow }\limits ^{d}}N(0,\textbf{e}_{1}^{\top }\varvec{\Sigma }_2\textbf{e}_{1}), \end{aligned}$$

where \(\textbf{e}_{1}=(0,1)^{\top }\) selects the second component of \(\textbf{a}=(a_0,a_1)^{\top }\) and

$$\begin{aligned} \varvec{\Sigma }_2=\textbf{D}^{-1} \textrm{Cov}\left[ V_{2}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)+\textrm{E} \Big \{\frac{\partial V_{2}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)}{\partial \varvec{\theta }^1},\frac{\partial V_{2}(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)}{\partial \varvec{\gamma }^1}\Big \}\begin{pmatrix}\{\textrm{E} (\varvec{\Phi }_1^{\otimes 2})\}^{-1}\varvec{\Phi }_{1}\\ \{\textrm{E} (\varvec{\Phi }_2^{\otimes 2})\}^{-1}\varvec{\Phi }_{2}\end{pmatrix}\right] (\textbf{D}^{-1})^{\top }, \end{aligned}$$

where \(\textbf{D}\) and \(V_2(\varvec{\theta }_0^1,\varvec{\gamma }_0^1)\) are defined in Section S1 of the online Supplementary Materials.

Theorem 3.4 demonstrates that the proposed AIPW estimator is multiply robust, i.e., as long as \(\mathcal {P}_{\eta }\) contains the correct model for the PS, and so does either \(\mathcal {P}_{\pi }\) for the PO or \(\mathcal {P}_{f}\) for the OR, the resulting estimator \(\breve{\textbf{a}}(\tau )\) is consistent. Theorem 3.5 gives the asymptotic normality of the proposed AIPW estimator. As in Theorem 3.3, we first obtain the asymptotic representation of \(\breve{\textbf{a}}(\tau )\), but present only the asymptotic distribution of \(\breve{a}_{1}(\tau )\) for brevity. The proof of Theorem 3.5 is more challenging than that of Theorem 3.3 because of the additional outcome regression model, and more challenging than its counterparts in the missing-data literature because of the confounding adjustment.

4 Simulation Study

In this section, simulation studies are conducted to evaluate the finite-sample performance of the proposed \(\tau \)-QTE estimators. On the one hand, the simulations illustrate the effectiveness of the proposed estimators in dealing with the confounding effect and missingness. On the other hand, they examine the extent to which the proposed multiply robust IPW and AIPW estimators resist model misspecification.

The data are generated as follows. First, we generate the covariates as \(X_{i,1}\sim \text{ Unif }(-0.25, 0.25)\), \(X_{i,2}\sim \text{ Binom }(0.5)\), \(X_{i,3}\sim \text{ Unif }(-0.5, 0.5)\), \(X_{i,4}\sim \text{ Binom }(0.5)\times 0.6\). Second, the treatment indicator \(T_i\) is generated from a logistic regression model as

$$\begin{aligned} \text{ logit }\{\eta (\textbf{X}_i; \varvec{\gamma })\}= \gamma _0+\gamma _1X_{i,1}+\gamma _2X_{i,2}, \end{aligned}$$

where \(\varvec{\gamma }=(\gamma _0, \gamma _1,\gamma _2)^{\top }=(0.5,-1.5,-1)^{\top }\). Third, the response is generated from the outcome regression model

$$\begin{aligned} Y_i=\alpha _0+\alpha _1T_i+\alpha _2X_{i,1}+\alpha _3X_{i,2}+\alpha _4X_{i,3}+\alpha _5X_{i,4}+\varepsilon _{i}, i=1,\ldots ,n, \end{aligned}$$

where \((\alpha _0,\alpha _1,\ldots , \alpha _5)^{\top }=(3.5,2,1,1,0.5,2)^{\top }\) and the \(\varepsilon _i\) are random errors. Finally, the missing mechanism follows

$$\begin{aligned} \text{ logit }\{\pi (\textbf{X}_i,T_i; \varvec{\theta })\}= \theta _0+\theta _1X_{i,3}+\theta _2X_{i,4}+\theta _3T_i, \end{aligned}$$

where \(\varvec{\theta }=(\theta _0,\theta _1,\theta _2,\theta _3)^{\top }\). For the random errors, we consider two cases: (1) \(\varepsilon _i\sim N(0,1)\), (2) \(\varepsilon _i\sim t(3)\), both leading to \(\tau \)-QTE \(\alpha _1=2\). To investigate the impact of different missing probabilities, we choose \(\varvec{\theta }=(2.5,0.5,-3,1)^{\top }\) and \((1.5,0.5,-3,1)^{\top }\), which on average result in \(15\%\) and \(30\%\) missing proportions, respectively.
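
The data-generating process above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the code used for the reported simulations: the function and variable names (`generate`, `missing_rate`, `THETA_15`, `THETA_30`) are hypothetical, \(\text{Binom}(0.5)\) is read as a single Bernoulli draw with success probability 0.5, and \(R_i=1\) is taken to mean that \(Y_i\) is observed.

```python
import math
import random

GAMMA = (0.5, -1.5, -1.0)                # treatment (PS) model coefficients
ALPHA = (3.5, 2.0, 1.0, 1.0, 0.5, 2.0)   # outcome model coefficients
THETA_15 = (2.5, 0.5, -3.0, 1.0)         # roughly 15% missing responses
THETA_30 = (1.5, 0.5, -3.0, 1.0)         # roughly 30% missing responses


def expit(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def draw_error(rng, dist):
    """Draw one error from N(0, 1) or from t(3)."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    # t(3): a standard normal divided by sqrt(chi-square(3) / 3)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3.0)


def generate(n, theta, dist="normal", seed=0):
    """Simulate n units; r = 1 means the response y is observed."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(-0.25, 0.25)
        x2 = 1.0 if rng.random() < 0.5 else 0.0   # assumed Bernoulli(0.5)
        x3 = rng.uniform(-0.5, 0.5)
        x4 = 0.6 if rng.random() < 0.5 else 0.0   # Bernoulli(0.5) times 0.6
        eta = expit(GAMMA[0] + GAMMA[1] * x1 + GAMMA[2] * x2)
        t = 1 if rng.random() < eta else 0
        y = (ALPHA[0] + ALPHA[1] * t + ALPHA[2] * x1 + ALPHA[3] * x2
             + ALPHA[4] * x3 + ALPHA[5] * x4 + draw_error(rng, dist))
        pi = expit(theta[0] + theta[1] * x3 + theta[2] * x4 + theta[3] * t)
        r = 1 if rng.random() < pi else 0
        rows.append({"x": (x1, x2, x3, x4), "t": t, "y": y, "r": r,
                     "eta": eta, "pi": pi})
    return rows


def missing_rate(rows):
    """Proportion of units whose response is missing."""
    return 1.0 - sum(row["r"] for row in rows) / len(rows)
```

Calling `missing_rate(generate(20000, THETA_15))` and the analogous call with `THETA_30` gives a quick check that the two choices of \(\varvec{\theta }\) produce missing proportions near the stated 15% and 30%.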

According to the data-generating process, the correct working models for PS and PO are given by \(\text{ logit }\{\eta ^{1}(\textbf{X}_i;\varvec{\gamma }^{1})\}=\gamma _0^1+\gamma _1^1X_{i,1}+\gamma _2^1X_{i,2}\) and \(\text{ logit }\{\pi ^{1}(\textbf{X}_i,T_i; \varvec{\theta }^{1})\}= \theta _0^1+\theta _1^1X_{i,3}+\theta _2^1X_{i,4}+\theta _3^1T_i\), respectively; for the AIPW estimator, the correct working model for \(f_{Y\mid \textbf{X},T}(\cdot )\) is given by the probability density function of \(N(\xi ^{1}_0+\xi ^{1}_1T_i+\xi ^{1}_2X_{i,1}+\xi ^{1}_3X_{i,2}+\xi ^{1}_4X_{i,3}+\xi ^{1}_5X_{i,4},1)\) or of the non-central t(3) distribution with non-centrality parameter \(\xi ^{1}_0+\xi ^{1}_1T_i+\xi ^{1}_2X_{i,1}+\xi ^{1}_3X_{i,2}+\xi ^{1}_4X_{i,3}+\xi ^{1}_5X_{i,4}\). The additional incorrect models for PS and PO are \(\text{ logit }\{\eta ^{2}(\textbf{X}_i;\varvec{\gamma }^{2})\}=\gamma _0^2+\gamma _1^2\exp (X_{i,1})\) and \(\text{ logit }\{\pi ^{2}(\textbf{X}_i,T_i; \varvec{\theta }^{2})\}= \theta _0^2+\theta _1^2\exp (X_{i,3})\), respectively. The incorrect model for OR is \(N(\xi ^{2}_0+\xi ^{2}_1T_i,1)\) with normal errors, and the non-central t(3) distribution with non-centrality parameter \(\xi ^{2}_0+\xi ^{2}_1T_i\) with t(3) errors. The parameters in the OR models are estimated by quasi-maximum likelihood, that is, by ordinary least squares regression on the complete cases; we also tried the maximum likelihood method, but it depends heavily on the initial values and converges very slowly. We consider two imputation sample sizes, \(S=10\) and 50, and report only the results based on \(S=10\), as those from \(S=50\) are similar.

To the best of our knowledge, this is the first work discussing causal inference on quantiles with missing responses, so we only compare the proposed IPW and AIPW estimators, denoted as IPW.1111 and AIPW.111111, with four different classes of estimators; the descriptions of all estimators are summarized in Table 1. (I) TM-type estimators, the minimizers of the objective function (2.5), where \(\widehat{\eta }(\textbf{X}_i)\) and \(\widehat{\pi }(\textbf{X}_i,T_i)\) are obtained by the empirical likelihood method as in IPW.1111, but with only one model each for \(P(T_{i}=1\mid \textbf{X}_i)\) and \(P(R_{i}=1\mid \textbf{X}_i,T_i)\). (II) TMO-type estimators, the solutions of the estimating equation (2.8), where the weights are obtained by the empirical likelihood method as in AIPW.111111; we include five representative TMO-type estimators. The TMO-type estimators provide double protection against missingness and only require that either the PO or the OR model is correct. (III) T-type estimators, the minimizers of an objective function similar to (2.5), where \(\widehat{\eta }(\textbf{X}_i)\) is obtained in the same way as for the TM-type estimators, while \(\widehat{\pi }(\textbf{X}_i,T_i)=1\) if \(R_{i}=1\). The T-type estimators account for confounding but ignore the missingness. (IV) The naive estimator, denoted as Naive.0000, the minimizer of an objective function similar to (2.5), where \(\widehat{\eta }(\textbf{X}_i)=T_{i}\) and \(\widehat{\pi }(\textbf{X}_i,T_i)=1\) if \(R_{i}=1\). The naive estimator ignores both confounding and missingness.
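
To make the difference between these weighting schemes concrete, the following standard-library Python sketch contrasts an inverse-probability-weighted quantile difference with the naive one. It rests on several assumptions that are not taken from the paper: the weight of an observed unit is taken to be \(1/\{\eta (\textbf{X}_i)\pi (\textbf{X}_i,T_i)\}\) for treated units and \(1/[\{1-\eta (\textbf{X}_i)\}\pi (\textbf{X}_i,T_i)]\) for controls, the true probabilities are plugged in instead of the empirical likelihood estimates, and the QTE is computed as a difference of weighted sample quantiles rather than by minimizing (2.5) directly. All function names are hypothetical.

```python
import math
import random


def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative normalized weight reaches tau."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    running = 0.0
    for v, w in pairs:
        running += w
        if running >= tau * total:
            return v
    return pairs[-1][0]


def qte(y, t, r, eta, pi, tau):
    """Difference of weighted tau-quantiles between treated and controls."""
    values = {0: [], 1: []}
    weights = {0: [], 1: []}
    for yi, ti, ri, ei, pii in zip(y, t, r, eta, pi):
        if ri == 1:  # only observed responses contribute
            p_arm = ei if ti == 1 else 1.0 - ei
            values[ti].append(yi)
            weights[ti].append(1.0 / (p_arm * pii))
    return (weighted_quantile(values[1], weights[1], tau)
            - weighted_quantile(values[0], weights[0], tau))


def simulate(n, seed):
    """Compact version of the normal-error design with about 15% missing."""
    rng = random.Random(seed)
    y, t, r, eta, pi = [], [], [], [], []
    for _ in range(n):
        x1, x3 = rng.uniform(-0.25, 0.25), rng.uniform(-0.5, 0.5)
        x2 = 1.0 if rng.random() < 0.5 else 0.0
        x4 = 0.6 if rng.random() < 0.5 else 0.0
        e = 1.0 / (1.0 + math.exp(-(0.5 - 1.5 * x1 - x2)))
        ti = 1 if rng.random() < e else 0
        p = 1.0 / (1.0 + math.exp(-(2.5 + 0.5 * x3 - 3.0 * x4 + ti)))
        y.append(3.5 + 2.0 * ti + x1 + x2 + 0.5 * x3 + 2.0 * x4
                 + rng.gauss(0.0, 1.0))
        t.append(ti)
        r.append(1 if rng.random() < p else 0)
        eta.append(e)
        pi.append(p)
    return y, t, r, eta, pi


y, t, r, eta, pi = simulate(20000, seed=1)
ones = [1.0] * len(y)
# Oracle weights: the true PS and PO probabilities.
ipw_oracle = qte(y, t, r, eta, pi, tau=0.5)
# T-type weighting: adjust for confounding, ignore missingness (pi = 1).
t_type = qte(y, t, r, eta, ones, tau=0.5)
# Naive weighting: eta = T_i and pi = 1, so every observed unit has weight 1.
naive = qte(y, t, r, [float(v) for v in t], ones, tau=0.5)
```

In this design the true \(\tau \)-QTE is \(\alpha _1=2\) at every quantile level, so comparing `ipw_oracle`, `t_type` and `naive` with 2 shows how much each source of bias contributes when its adjustment is dropped.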

A sample size of \(n = 400\) and two quantile levels \(\tau =0.3,0.5\) are considered, and all simulation results are based on 1000 replicates. Table 2 summarizes the finite-sample performance of the QTE estimators, including the bias (BIAS), the root mean squared error (RMSE), and the empirical coverage probabilities (ECP) and empirical mean lengths (EML) of the confidence intervals with nominal level 95%. The confidence intervals are constructed by the bootstrap method, implemented through the summary.rq function of the R package quantreg, with 500 resampling replicates. From Table 2, we have the following findings. First, when both the PS and PO candidate models include a correctly specified model, the resulting estimator TM.1010 has clearly smaller BIAS and RMSE than the estimators under model misspecification. The proposed IPW.1111 performs almost the same as TM.1010, and even better in some scenarios. Second, when the PS model is correctly specified, the estimators are reasonably good even if the PO model is misspecified, though their BIAS and RMSE are somewhat larger than those of IPW.1111 and TM.1010; see the performance of TM.1001, T.1100, and T.1000. However, if the PS model is misspecified, the BIAS and RMSE are much larger than those of IPW.1111, regardless of whether the PO model is correctly specified. Third, the proposed AIPW estimator has lower RMSE than the IPW estimator. As long as \(\mathcal {P}_{\eta }\) contains the correct model for PS, and either \(\mathcal {P}_{\pi }\) contains the correct model for PO or \(\mathcal {P}_{f}\) contains the correct model for OR, the resulting estimators are close to TMO.101010 and AIPW.111111, which reflects that the AIPW estimator provides more protection against model misspecification. Similar to the TM-type estimators, if the PS model is misspecified, the BIAS and RMSE are much larger than those of the other estimators, even if the models for PO and OR are correct. Fourth, the ECPs of IPW.1111, AIPW.111111, TM.1010, and the TMO-type estimators whose labels begin with TMO.11 (except TMO.110101) are close to the nominal level, while the others lead to low ECPs. Table 3 summarizes the results for the t(3) error, and the comparisons are similar.
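
For reference, the four summary measures can be computed from the replicate-level output as in the following Python sketch. The function name `summarize` and its argument layout are hypothetical; the scaling (BIAS and RMSE multiplied by 100, ECP in percent) follows the captions of Tables 2 and 3, and the true value is \(\alpha _1=2\) in this design.

```python
import math


def summarize(estimates, lowers, uppers, truth):
    """Summarize replicate-level point estimates and confidence limits.

    estimates, lowers, uppers: one entry per simulation replicate.
    truth: the true QTE (2 in the simulation design above).
    """
    m = len(estimates)
    bias = sum(estimates) / m - truth
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / m)
    # Share of replicates whose interval covers the true value.
    ecp = sum(1 for lo, hi in zip(lowers, uppers) if lo <= truth <= hi) / m
    # Average interval length across replicates.
    eml = sum(hi - lo for lo, hi in zip(lowers, uppers)) / m
    return {"BIAS": 100.0 * bias, "RMSE": 100.0 * rmse,
            "ECP": 100.0 * ecp, "EML": eml}
```

An ECP well below 95 together with a short EML points to intervals centered on a biased estimate, which is the pattern described above for the misspecified estimators.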

Finally, we comment on the comparison between AIPW.111111 and IPW.1111. In theory, the AIPW approach is more efficient than the IPW approach, as it uses more information. However, even if the true outcome model is included, several sources introduce additional variation into the estimator: (i) the estimation of the parameters in the true outcome model; (ii) the inclusion of the misspecified outcome model in the empirical likelihood approach; (iii) sampling from the outcome models. Consequently, the AIPW approach does not benefit from augmentation when the missing rate is low (say 15%), but performs slightly better when the missing rate reaches 30%.

Table 1 Descriptions of several QTE estimators
Table 2 BIAS(\(\times \)100), RMSE(\(\times \)100), ECP(%) and EML for normal error, with \(n=400\), based on 1000 replicates
Table 3 BIAS(\(\times \)100), RMSE(\(\times \)100), ECP(%) and EML for t(3) error, with \(n=400\), based on 1000 replicates

5 Analysis of CHARLS Data

Cognitive function is a critical dimension of quality of life in later years, and its decline disrupts daily functioning and happiness. It is thus important to study the determinants of cognitive function in order to know how to delay or slow down its eventual decline. A growing body of research suggests that social activities or social engagement among the elderly have a positive effect on cognitive function [13, 16]. In this section, we apply the proposed method to the motivating data set from the Chinese Health and Retirement Longitudinal Study (CHARLS), collected in 2018, to study the causal effect of social activities on cognitive functions among middle-aged and older adults in China.

Following the analysis of Hu et al. [13], we use episodic memory, a necessary component of reasoning in many cognitive dimensions, as the outcome variable. Episodic memory was evaluated by the mean of the scores in immediate and delayed word recall, with scores ranging from 0 to 10. After discarding participants who are younger than 45 years old or whose social activity information (the covariates) is missing, 7818 participants are included in our study, of whom 1889 have outcome variables that are not completely observed, yielding around 24% missingness. The treatment variable is whether the participant takes part in certain common activities in China, such as playing chess, card games or Mahjong, interacting with friends, and other social activities. The other covariates (confounders) in the analysis are sociodemographic characteristics and health status related to cognitive function, including age, gender, resident area (urban or rural), education level (illiterate, primary education, secondary or above), marital status (currently married or not), smoking (yes or no), hypertension (yes or no), diabetes (yes or no), cardiopathy (yes or no), and apoplexy (yes or no), where age is standardized and education level is coded by two dummy variables, \(\textrm{education}_\textrm{a}\) (primary education as 1, otherwise 0) and \(\textrm{education}_\textrm{b}\) (secondary and above as 1, otherwise 0). The QTE is defined as the quantile difference of the cognitive distributions between the treatment group (participating in at least one social activity) and the control group (not participating in any social activity).

To apply our method, we need to specify candidate models. For PS \(\eta (\textbf{X})\), we consider two candidate models: \(\text{ logit }\{\eta ^{1}(\textbf{X}_i;\varvec{\gamma }^{1})\}=\gamma _0^{1}+\varvec{\gamma }_1^{\top }\textbf{X}_{i}\) and \(\text{ logit }\{\eta ^{2}(\textbf{X}_i^*;\varvec{\gamma }^{2})\}=\gamma _0^{2}+\varvec{\gamma }_2^{\top }\textbf{X}_{i}^*\), where \(\textbf{X}_{i}\) contains only the covariates mentioned above, while \(\textbf{X}_{i}^*\) includes \(\textbf{X}_i\) and the interactions between age and the other covariates. For PO, we consider \(\text{ logit }\{\pi ^{1}(\textbf{X}_i,T_i; \varvec{\theta }^{1})\}= \theta _0^{1}+\varvec{\theta }_1^{\top }\textbf{X}_{i}+\theta _{11}T_i\) and \(\text{ logit }\{\pi ^{2}(\textbf{X}_i^*,T_i; \varvec{\theta }^{2})\}= \theta _0^{2}+\varvec{\theta }_2^{\top }\textbf{X}_{i}^*+\theta _{21}T_i\). For the AIPW estimator, the two models specified for \(f_{Y\mid \textbf{X},T}(\cdot )\) are given by the probability density functions of \(N(\xi ^{1}_0+\xi _{11}T_i+\varvec{\xi }^{\top }_1\textbf{X}_{i},\sigma ^{2}_{1})\) and \(N(\xi ^{2}_0+\xi _{21}T_i+\varvec{\xi }^{\top }_2\textbf{X}_{i}^*,\sigma ^{2}_{2})\). We compare the proposed IPW (IPW.1111) and AIPW (AIPW.111111) estimators with four different classes of estimators as in Sect. 4.
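
The construction of the two covariate vectors can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the helper names, the string labels for the education levels, and the position of age as the first entry of \(\textbf{X}_i\) are all hypothetical, and \(\textbf{X}_i^*\) is assumed to append the product of age with every other covariate.

```python
def education_dummies(level):
    """Return (education_a, education_b); illiterate is the baseline level."""
    education_a = 1.0 if level == "primary" else 0.0
    education_b = 1.0 if level == "secondary_or_above" else 0.0
    return education_a, education_b


def with_age_interactions(x, age_index=0):
    """Build X* from X by appending age times each of the other covariates."""
    age = x[age_index]
    interactions = [age * v for j, v in enumerate(x) if j != age_index]
    return list(x) + interactions


# Hypothetical participant: standardized age 0.8, then gender, resident area,
# the two education dummies, marital status, smoking, hypertension,
# diabetes, cardiopathy and apoplexy.
edu_a, edu_b = education_dummies("primary")
x_i = [0.8, 1.0, 0.0, edu_a, edu_b, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
x_star_i = with_age_interactions(x_i)
```

With the eleven covariates listed above, \(\textbf{X}_i^*\) has 21 entries under this reading, so the second candidate models for PS and PO each carry ten more slope parameters than the first.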

Table 4 summarizes the point estimates, the bootstrap standard errors (BSE), and the 95% confidence intervals at quantile levels \(\tau =0.3, 0.5\) and 0.7. The confidence intervals are constructed from 500 bootstrap samples. We have the following findings. First, the positive estimates from all the methods indicate that social activities have positive effects on cognition, and the effect is significant since none of the 95% confidence intervals includes zero, which is consistent with previous literature [6, 13]. Second, the causal effects of social activities on cognitive functions vary across quantiles. The QTE at the lower quantile is larger than those at the median and the higher quantile, which indicates that social activities have greater effects on the cognition of individuals with low cognitive function. This new finding provides important evidence for personalized interventions to improve cognitive function. Third, the proposed estimators differ noticeably from the T-type estimators and Naive.0000, showing the necessity of dealing with confounding and missingness. Finally, the proposed IPW.1111 is very close to TM.0101, which indicates the suitability of \(\text{ logit }\{\eta ^{2}(\textbf{X}_i^*;\varvec{\gamma }^{2})\}\) for PS and \(\text{ logit }\{\pi ^{2}(\textbf{X}_i^*,T_i; \varvec{\theta }^{2})\}\) for PO; the proposed AIPW.111111 deviates from IPW.1111 to some extent, and according to our theory, the inclusion of the augmentation term plays a role, producing more trustworthy results.
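
A bootstrap standard error and confidence interval of this kind can be obtained along the lines of the following Python sketch. It is a generic nonparametric bootstrap and only an assumption about the procedure: the text does not state which type of bootstrap interval is used, so the percentile interval below is one possible choice, and the function name `bootstrap` and its arguments are hypothetical. In practice `statistic` would be the full QTE estimator applied to the resampled participants.

```python
import math
import random
import statistics


def bootstrap(data, statistic, n_boot=500, level=0.95, seed=0):
    """Bootstrap standard error and percentile interval of statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n units with replacement and re-evaluate the statistic.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    replicates.sort()
    bse = statistics.stdev(replicates)
    alpha = 1.0 - level
    lower = replicates[math.floor(alpha / 2.0 * (n_boot - 1))]
    upper = replicates[math.ceil((1.0 - alpha / 2.0) * (n_boot - 1))]
    return bse, (lower, upper)
```

The significance statement above then corresponds to checking whether zero lies outside the returned interval.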

Table 4 Analysis results for the effects of social activities on cognition (\(n=7818\))

6 Discussion

In this paper, we proposed multiply robust IPW and AIPW estimators for the QTE with missing responses. The proposed IPW estimator is robust against model misspecification for PS and PO, and the proposed AIPW estimator further strengthens this robustness. Several topics merit further research. First, the proposed methods can be generalized to estimate the QTE at extreme quantile levels [26] with missing responses. Second, under memory constraints, our method can be extended to distributed computing [4, 5]. Third, statistical inference for the QTE [2] with missing responses can be considered, for example testing whether the QTE is zero, whether it is heterogeneous across covariates, or whether it varies across quantiles.