1 Introduction

Besides parametric models, various semi-/nonparametric models have been used to describe longitudinal data. The varying coefficient model (VCM), first proposed by Hastie and Tibshirani (1993), has been a popular approach for modeling longitudinal data due to its flexibility and interpretability: the coefficients are smooth nonparametric functions of the measurement time. The varying coefficient model for longitudinal data can be written as

$$\begin{aligned} y_{ij}=\varvec{X}_{ij}^T\varvec{\alpha }(t_{ij})+\epsilon _{ij},\ i=1,\ldots ,n, \ j=1,\ldots ,m_i, \end{aligned}$$
(1)

where \(\varvec{X}_{ij}=(x_{ij1},\ldots ,x_{ijp})^T\) is a p-dimensional covariate associated with the j-th measurement for the i-th subject at time \(t_{ij}\), \(\varvec{\alpha }(t)=(\alpha _1(t),\ldots , \alpha _p(t))^T\) comprises p unknown nonparametric functions and \(\epsilon _{ij}\) is the random error.

There exist many studies on model (1) in the framework of mean regression; see, for example, Fan and Zhang (1999, 2000), Chiang et al. (2001), Huang et al. (2002), Qu and Li (2006), Şentürk and Müller (2008). Quantile regression (Koenker 2005) is a valuable alternative to least squares-based methods. Kim (2007) and Cai and Xu (2008) proposed quantile VCMs for cross-sectional data using polynomial splines and local polynomial smoothing, respectively. Andriyana et al. (2014) investigated the quantile VCM for longitudinal data through a penalized splines approach. Compared with mean regression for the VCM, quantile regression provides a more comprehensive summary of the response distribution and describes the dynamic functional relationship between covariates and response at different percentiles of the distribution. In addition, quantile regression is more robust than least squares regression.

For longitudinal data analysis, it is important to properly account for the correlation within each subject; ignoring it can yield less efficient estimators. Several strategies have been developed to incorporate this correlation in mean regression. Lin et al. (2007) studied the VCM based on generalized estimating equations (GEE, Liang and Zeger 1986). Wang et al. (2005) and Huang et al. (2007) considered efficient estimation for partially linear semiparametric models, and Lian et al. (2014) investigated partially linear additive models in high dimensions. Although GEE-based estimators are consistent, they lose efficiency if the working correlation is misspecified. To alleviate the impact of correlation misspecification and improve estimation efficiency, Qu and Li (2006) applied quadratic inference functions (QIF, Qu et al. 2000) to the VCM using penalized splines. The QIF approach takes the within-cluster correlation into account and is more efficient than the GEE approach when the working correlation is misspecified. Studies have also been carried out for other nonparametric/semiparametric models using QIF; see, for example, Xue et al. (2010), Li et al. (2014), Ma et al. (2014).

Quantile regression has been widely used for longitudinal data. However, most existing works propose estimators that ignore the within-subject correlation for simplicity, which results in a loss of efficiency in estimation and inference. Compared with mean regression, accounting for correlation in quantile regression is more challenging because of the computational issues pointed out by Leng and Zhang (2014). In view of the flexibility of the VCM and the good performance of QIF, in this paper we study estimation and inference for the quantile VCM. By approximating the nonparametric functions with polynomial splines and taking the correlation into account, we propose a new estimation procedure for longitudinal data and establish the theoretical properties of the resulting estimators. To further improve estimation, we develop a method to select the important variables using the adaptive group SCAD penalty. The theoretical challenge largely lies in the discontinuity of the objective function and the diverging dimensionality resulting from the spline approximation, which we handle using empirical process theory (Van der Vaart 2000). Simulation results and a real data analysis show that the proposed method outperforms existing methods.

The rest of the paper is organized as follows. In Sect. 2, we present the estimation approach to quantile VCM based on QIF, where the nonparametric functions are approximated by polynomial splines. The large sample properties of the proposed estimators are established, and we also develop an estimation procedure using induced smoothing. In Sect. 3, to select the important nonparametric components, we develop a variable selection procedure based on the adaptive group SCAD penalty, and its oracle properties are also investigated. Simulation studies in Sect. 4 and real data analysis in Sect. 5 are used to illustrate the performance of the proposed approach. Finally, some concluding remarks are given in Sect. 6. Technical proofs are contained in Appendix in Supplementary Material.

2 Methodology

2.1 Estimation based on QIF

We assume in (1) that the conditional \(\tau \)-th quantile of error \(\epsilon _{ij}\) given \({\varvec{X}}_{ij}\) and \(t_{ij}\) is zero, i.e.,

$$\begin{aligned} Q_\tau (y_{ij}|\varvec{X}_{ij})=\varvec{X}_{ij}^T\varvec{\alpha }_\tau (t_{ij}), \ i=1,\ldots ,n,\ j=1,\ldots , m_i, \end{aligned}$$

and that the observations are independent across different subjects. For notation simplicity, we omit the subscript \(\tau \) in \(\varvec{\alpha }_\tau (t)\) in the following.

In our estimation procedure, we approximate the smooth functions \(\{\alpha _l(\cdot )\}_{l=1}^p\) by polynomial splines (De Boor 2001; He and Shi 1996). We assume without loss of generality that the covariates \(t_{ij}\) are scaled to take value in the interval [0, 1]. For each \(1\le l \le p\), let

$$\begin{aligned} \xi _{l,-(d-1)}=\cdots =\xi _{l,0}=0<\xi _{l,1}<\cdots<\xi _{l,K_l}<1=\xi _{l,K_l+1}=\cdots =\xi _{l,K_l+d} \end{aligned}$$

be a partition of [0, 1] into subintervals \([\xi _{l,k},\xi _{l,k+1}), k=0,\ldots , K_l\) with \(K_l\) interior knots. A polynomial spline of order d is a function which is a polynomial of degree \(d-1\) in each subinterval and globally \(d-2\) times continuously differentiable on [0, 1]. Denote the d-th order B-spline basis as \(\varvec{B}_l(t)=(B_{l1}(t),\ldots , B_{lJ_l}(t))^T\), \(J_l=K_l+d\), and its normalized basis as \(\sqrt{J_l}\varvec{B}_l(t)=(\sqrt{J_l}B_{l1}(t),\ldots , \sqrt{J_l} B_{lJ_l}(t))^T\). With an abuse of notation, the normalized basis is still denoted by \(\varvec{B}_l(t)\). Then, each unknown function \(\alpha _l(\cdot )\) can be approximated by a linear combination of the normalized basis such that

$$\begin{aligned} \alpha _l(t)\approx s_l(t)=\sum _{k=1}^{J_l}B_{lk}(t)\gamma _{lk},\ \ l=1,\ldots ,p. \end{aligned}$$
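This approximation step can be sketched in Python using `scipy.interpolate.BSpline` (the knot placement, number of knots, and target function below are illustrative choices, not the paper's code):

```python
import numpy as np
from scipy.interpolate import BSpline

d = 4                      # spline order (cubic => degree d - 1 = 3)
K = 5                      # number of interior knots
interior = np.linspace(0, 1, K + 2)[1:-1]
# Knot vector with d-fold boundary knots, matching the partition above.
knots = np.concatenate([np.zeros(d), interior, np.ones(d)])
J = K + d                  # number of basis functions J_l = K_l + d

def basis_matrix(t):
    """Evaluate the J B-spline basis functions at the points t."""
    t = np.atleast_1d(t)
    B = np.empty((t.size, J))
    for k in range(J):
        coef = np.zeros(J)
        coef[k] = 1.0
        B[:, k] = BSpline(knots, coef, d - 1)(t)
    return B

# Least-squares spline approximation of alpha(t) = 3 sin(2 pi t).
t_grid = np.linspace(0, 1, 200)
alpha = 3 * np.sin(2 * np.pi * t_grid)
gamma, *_ = np.linalg.lstsq(basis_matrix(t_grid), alpha, rcond=None)
approx_err = np.max(np.abs(basis_matrix(t_grid) @ gamma - alpha))
```

With a cubic basis and a handful of interior knots, the maximum approximation error for such a smooth function is already small, which is what condition (C3) below formalizes.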

Define \(\varvec{\gamma }_l=(\gamma _{l1},\ldots ,\gamma _{lJ_l})^T\), \(\varvec{\gamma }=(\varvec{\gamma }_1^T,\ldots ,\varvec{\gamma }_p^T)^T\), \(\varvec{U}_{ij}^T=\varvec{X}^T_{ij}\bar{\varvec{B}}(t_{ij})\),

$$\begin{aligned} \bar{\varvec{B}}(t)=\left( \begin{array}{llll} \varvec{B}_1^T(t) &{}\quad {\varvec{0}} &{}\quad \cdots &{}\quad {\varvec{0}} \\ {\varvec{0}} &{}\quad \varvec{B}_2^T(t) &{}\quad \cdots &{}\quad {\varvec{0}} \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ {\varvec{0}} &{}\quad {\varvec{0}} &{}\quad \cdots &{}\quad \varvec{B}_p^T(t) \end{array}\right) , \end{aligned}$$

\(\varvec{U}_i=(\varvec{U}_{i1},\ldots ,\varvec{U}_{im_i})^T\) and \(\varvec{\epsilon }_i=(\epsilon _{i1},\ldots ,\epsilon _{im_i})^T\). If we ignore the correlation within subjects, we can obtain the estimator by minimizing the following objective function (Kim 2007):

$$\begin{aligned} G(\varvec{\gamma })=\sum _{i=1}^n\sum _{j=1}^{m_i} \rho _\tau \left( y_{ij}-\varvec{U}_{ij}^T\varvec{\gamma }\right) , \end{aligned}$$
(2)

where \(\rho _\tau (u)=u(\tau -I(u<0))\) is the check function. The estimating equation derived from (2) is

$$\begin{aligned} \sum _{i=1}^n\varvec{U}_i^T \psi _\tau (\varvec{y}_i-\varvec{U}_{i}\varvec{\gamma })=0, \end{aligned}$$

where \(\varvec{y}_i=(y_{i1},\ldots ,y_{im_i})^T\), \(\psi _\tau (u)=\tau -I(u<0)\) and \(\psi _\tau (\varvec{y}_i-\varvec{U}_{i}\varvec{\gamma })=(\psi _\tau ({y}_{i1}-\varvec{U}_{i1}^T{\varvec{\gamma }}),\ldots , \psi _\tau ({y}_{im_i}-\varvec{U}_{im_i}^T\varvec{\gamma }))^T\).
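For reference, the check function and its subgradient can be sketched as follows (toy values only):

```python
import numpy as np

# The check function rho_tau and the score psi_tau used in the
# working-independence estimating equation.
def rho(u, tau):
    return u * (tau - (u < 0))

def psi(u, tau):
    return tau - (u < 0).astype(float)

u = np.array([-2.0, -0.5, 0.5, 2.0])
tau = 0.25
# rho is piecewise linear: slope tau for u > 0, slope tau - 1 for u < 0.
losses = rho(u, tau)
scores = psi(u, tau)
```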

As in Jung (1996) and Leng and Zhang (2014), a more efficient estimating equation takes the form

$$\begin{aligned} \sum _{i=1}^n\varvec{U}_i^T \varvec{\Gamma }_i\varvec{A}^{-1}_i\psi _\tau (\varvec{y}_i-\varvec{U}_{i}\varvec{\gamma })=0, \end{aligned}$$
(3)

where \(\varvec{\Gamma }_i=\mathrm{diag}(f_{i1}(0),\ldots ,f_{im_i}(0))\) with \(f_{ij}(\cdot )\) being the conditional pdf of \(\epsilon _{ij}\), and \(\varvec{A}_i\) is the working correlation matrix that may depend on some nuisance parameters which, however, may be difficult to estimate. To circumvent this problem, we apply the QIF approach by approximating \(\varvec{A}^{-1}_i\) with a linear combination of basis matrices as

$$\begin{aligned} \varvec{A}^{-1}_i=\sum _{k=1}^\upsilon a_k\varvec{M}_{ki}, \end{aligned}$$

where \(\varvec{M}_{ki}\)’s are known symmetric matrices and \(a_1, \ldots , a_\upsilon \) are unknown constants. As stated in Qu et al. (2000), this family is rich enough to accommodate, or at least approximate, many commonly used correlation structures. For example, if the working correlation has a compound symmetry structure with parameter \(\varpi \), then \(\varvec{A}^{-1}_i\) can be represented as \(a_1\varvec{M}_{1i}+a_2\varvec{M}_{2i}\), where \(a_1=-\{(m_i-2)\varpi +1\}/k_1\), \(a_2=\varpi /k_1\), \(k_1=(m_i-1)\varpi ^2-(m_i-2)\varpi -1\), \(\varvec{M}_{1i}\) is the identity matrix and \(\varvec{M}_{2i}\) has 0 on the diagonal and 1 off the diagonal. A similar linear representation of \(\varvec{A}^{-1}_i\) also holds for the AR(1) correlation structure with appropriate basis matrices. The advantage of the QIF approach is that it does not need to estimate the coefficients \(a_k\).
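As a quick numerical check (a sketch, not from the paper), the inverse of an exchangeable correlation matrix is exactly such a linear combination of the identity and the off-diagonal indicator; the coefficients follow from inverting \((1-\varpi )I+\varpi \varvec{1}\varvec{1}^T\) directly:

```python
import numpy as np

m, w = 4, 0.6                              # cluster size, CS parameter
A = (1 - w) * np.eye(m) + w * np.ones((m, m))
M1 = np.eye(m)                             # identity basis matrix
M2 = np.ones((m, m)) - np.eye(m)           # off-diagonal indicator
k1 = (m - 1) * w**2 - (m - 2) * w - 1
a1 = -((m - 2) * w + 1) / k1
a2 = w / k1
# The linear combination reproduces the exact inverse.
assert np.allclose(np.linalg.inv(A), a1 * M1 + a2 * M2)
```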

In QIF, we use estimating equations defined as

$$\begin{aligned} \varvec{S}(\varvec{\gamma })=\frac{1}{n}\sum _{i=1}^n\varvec{S}_i(\varvec{\gamma }), \end{aligned}$$
(4)

where

$$\begin{aligned} \varvec{S}_i(\varvec{\gamma }) =\left( \begin{array}{c} \varvec{U}_i^T\varvec{\Gamma }_i\varvec{M}_{1i}\psi _\tau (\varvec{y}_i-\varvec{U}_i\varvec{\gamma })\\ \vdots \\ \varvec{U}_i^T\varvec{\Gamma }_i\varvec{M}_{\upsilon i}\psi _\tau (\varvec{y}_i-\varvec{U}_i\varvec{\gamma }) \end{array}\right) . \end{aligned}$$
(5)

Note that there are more estimating equations in (4) than unknown parameters. Following Qu et al. (2000), \(\varvec{\gamma }\) can be estimated as

$$\begin{aligned} \hat{\varvec{\gamma }}=\mathrm{argmin}_{\varvec{\gamma }}\varvec{Q}_n(\varvec{\gamma }), \end{aligned}$$
(6)

where

$$\begin{aligned} \varvec{Q}_n(\varvec{\gamma })= & {} n\varvec{S}^T(\varvec{\gamma })\varvec{\Sigma }_n^{-1}(\varvec{\gamma })\varvec{S}(\varvec{\gamma }),\\ \varvec{\Sigma }_n(\varvec{\gamma })= & {} \frac{1}{n}\sum _{i=1}^n\varvec{S}_i(\varvec{\gamma })\varvec{S}_i^T(\varvec{\gamma }). \end{aligned}$$

As a result, we define the estimator of the unknown functions as

$$\begin{aligned} {\hat{\alpha }}_l(t)=\varvec{B}_l^T(t)\hat{\varvec{\gamma }}_l, \ l=1,\ldots ,p. \end{aligned}$$
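To fix ideas, the following minimal Python sketch (toy data, not the paper's implementation) evaluates the quadratic inference function \(\varvec{Q}_n\), taking \(\varvec{\Gamma }_i\) as the identity and using two basis matrices, the identity and the off-diagonal indicator:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, q = 30, 4, 3                       # subjects, cluster size, dim(gamma)
U = rng.normal(size=(n, m, q))           # stacked spline designs U_i
gamma0 = np.array([1.0, -0.5, 0.25])
y = np.einsum('imq,q->im', U, gamma0) + rng.normal(size=(n, m))
tau = 0.5
M = [np.eye(m), np.ones((m, m)) - np.eye(m)]   # basis matrices

def S_i(gamma, i):
    # extended score for subject i: stack U_i' M_k psi_tau(residuals)
    psi = tau - (y[i] - U[i] @ gamma < 0).astype(float)
    return np.concatenate([U[i].T @ Mk @ psi for Mk in M])

def Q_n(gamma):
    S = np.stack([S_i(gamma, i) for i in range(n)])   # (n, 2q)
    Sbar = S.mean(axis=0)
    Sigma = S.T @ S / n
    return n * Sbar @ np.linalg.solve(Sigma, Sbar)

val = Q_n(gamma0)   # nonnegative; small near the truth
```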

Remark 1

Note that the estimating Eq. (4) involves the unknown error density \(f_{ij}(0)\). In this paper, we adopt the method of Hendricks and Koenker (1992) and estimate \(f_{ij}(0)\) by the difference quotient,

$$\begin{aligned} {\hat{f}}_{ij}(0)=2h_n\left\{ \varvec{X}_{ij}^T\left[ \check{\varvec{\alpha }}(t_{ij},\tau +h_n)- \check{\varvec{\alpha }}(t_{ij},\tau -h_n)\right] \right\} ^{-1}, \end{aligned}$$

where the estimators \(\check{\varvec{\alpha }}(t,\tau )\) are obtained from (6) with the term \(\varvec{\Gamma }_i\) omitted, evaluated at quantile levels \(\tau \pm h_n\), and \(h_n\) is a bandwidth parameter tending to zero as \(n\rightarrow \infty \). In our numerical studies, we choose \(h_n=1.57n^{-1/3} \left( 1.5\phi ^2\{\Phi ^{-1}(\tau )\}/(2\{\Phi ^{-1}(\tau )\}^2+1)\right) ^{2/3}\) following Hall and Sheather (1988), where \(\phi (\cdot )\) and \(\Phi (\cdot )\) are the pdf and cdf of the standard normal distribution. For simplicity of the proofs, we assume in the following that the \(f_{ij}(0)\)'s are known when deriving the asymptotic properties of the estimators. This is similar to the approach adopted in the literature, for example in Leng and Zhang (2014); it appears very hard to establish the asymptotic properties when the density is estimated. On the other hand, we believe a highly accurate estimator of the density is not critical in our context, unlike in classical density estimation problems: even if the density is replaced by 1, the estimating equation remains consistent. It requires further study to see whether this gap in the theory can be closed with more technical advances.
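For concreteness, the Hall–Sheather bandwidth above can be computed as follows (a sketch using the exponent as stated in the text):

```python
import numpy as np
from scipy.stats import norm

def hall_sheather(n, tau):
    """Hall-Sheather bandwidth for the difference-quotient density estimate."""
    z = norm.ppf(tau)
    return 1.57 * n**(-1 / 3) * (1.5 * norm.pdf(z)**2 / (2 * z**2 + 1))**(2 / 3)

h = hall_sheather(200, 0.25)
# tau - h and tau + h must stay inside (0, 1) for the quotient to exist
assert 0 < 0.25 - h and 0.25 + h < 1
```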

2.2 Theoretical properties

In this subsection, we study the rate of convergence for \({\hat{\alpha }}_l(t)\). The following assumptions are needed to derive the asymptotic properties of \({\hat{\alpha }}_l(t)\).

(C1):

The cluster sizes \(m_i\) are uniformly bounded for all \(i=1,\ldots ,n\).

(C2):

The conditional density \(f_{ij}\) of \(\epsilon _{ij}\) is uniformly bounded and bounded away from zero, and has a bounded first derivative in a neighborhood of zero.

(C3):

\(\alpha _l (\cdot )\in {\mathcal {H}}_r\), \(l=1,\ldots ,p\), for some \(r>1/2\), where \({\mathcal {H}}_r\) is the collection of all functions on [0, 1] whose s-th order derivative satisfies the Hölder condition of order \(\vartheta \) with \(r=s+\vartheta \). The spline order \(d\ge r+1\).

(C4):

The covariates \(\varvec{X}_{ij}\) are bounded in probability for all i and j.

(C5):

The conditional distribution of t given \(\varvec{X}=\varvec{x}\) has a density \(f_{t|\varvec{X}}\) which satisfies that \(0<c_1\le f_{t|\varvec{X}}(t|\varvec{x})\le c_2<\infty \) uniformly in \(\varvec{x}\) and t for some positive constants \(c_1\) and \(c_2\).

(C6):

The knot sequences for \(l=1, \ldots , p\) are quasi-uniform. That is, there exists \(c_3>0\) such that \(\max _l\left( \max _k h_{l,k}/\min _k h_{l,k}\right) \le c_3\), where \(h_{l,k}=\xi _{l,k+1}-\xi _{l,k}\) (\(0\le k \le K_l\)) are the distances between neighboring knots. Furthermore, the number of interior knots satisfies \(K_l\asymp n^{1/(2r+1)}\).

(C7):

The eigenvalues for each \(\varvec{M}_{ki}\) are bounded away from 0 and \(\infty \).

These conditions are common in the literature; see, for example, Kim (2007), Huang et al. (2002), Wang et al. (2009) and Xue et al. (2010). Conditions (C1) and (C2) are standard assumptions used in longitudinal studies and quantile regression, respectively. Condition (C3) is a smoothness assumption on the nonparametric functions. The boundedness assumption in (C4) is mainly for convenience of proof. Condition (C5) is commonly used in VCM. Condition (C6) is a mild assumption on the choice of knots. Finally, Condition (C7) is satisfied for commonly used basis matrices.

Theorem 1

Under Conditions (C1)–(C7), there exists a local minimizer of (6) such that

$$\begin{aligned} \Vert {\hat{\alpha }}_l-\alpha _l\Vert _2^2=O_p\left( n^{-2r/(2r+1)}\right) , \ \ l=1, \ldots , p, \end{aligned}$$

where \(\Vert .\Vert _2\) is the \(L_2\) norm for functions on [0, 1].

Remark 2

The rate of convergence here is the same as that in Kim (2007) for independent data (\(m_i\equiv 1\)) and in Wang et al. (2009) for longitudinal data, where, however, the correlation is ignored. In practice, the main advantage of the QIF approach is that it incorporates the within-cluster correlation by optimally combining estimating equations without estimating the correlation parameters.

Remark 3

Because \({\varvec{Q}}_n(\varvec{\gamma })\) is not continuous, we are not aware of a satisfactory algorithm for computing its minimizer. We propose a smoothing method that yields a feasible objective function in the next subsection. The theoretical results here nevertheless provide a benchmark against which the rates of the feasible estimator can be compared. Similarly, in Sect. 3, the infeasible estimator is also studied theoretically when penalization is used for variable selection.

2.3 Induced smoothing

It is difficult to solve the minimization problem (6) directly because \(\varvec{S}(\varvec{\gamma })\) is not continuous. To overcome this difficulty, we apply the induced smoothing method, which has been used in linear quantile regression by Fu and Wang (2012) and Leng and Zhang (2014).

Let

$$\begin{aligned} \tilde{\varvec{S}}(\varvec{\gamma })=\mathrm{E}\left[ \varvec{S}(\varvec{\gamma }+(nh)^{-1/2}\varvec{\Omega }^{1/2}\varvec{\delta })\right] , \end{aligned}$$
(7)

where \(h=n^{-1/(2r+1)}\), the expectation is taken with respect to \(\varvec{\delta }\sim N(0, \varvec{I}_q)\) with \(q=\sum _{l=1}^p J_l\), and \(\varvec{\Omega }\) is a \(q\times q\) positive definite matrix. These choices are certainly not the only possibilities; we simply follow the induced smoothing literature and use a multivariate Gaussian disturbance. The smoothing merely ensures that the objective function is differentiable: as long as the disturbance is small enough, \(\tilde{\varvec{Q}}_n\) should be close to \(\varvec{Q}_n\) and have comparable performance.

By some simple calculations, \(\tilde{\varvec{S}}_i(\varvec{\gamma })\) can be written as

$$\begin{aligned} \tilde{\varvec{S}}_i(\varvec{\gamma })=\mathrm{E}{\varvec{S}}_i(\varvec{\gamma }+(nh)^{-1/2} \varvec{\Omega }^{1/2}\varvec{\delta })=\left( \begin{array}{c} \varvec{U}_i^T\varvec{\Gamma }_i\varvec{M}_{1i}\left[ \Phi \left( \sqrt{nh}\frac{ \varvec{y}_i-\varvec{U}_i\varvec{\gamma }}{\varvec{r}_i}\right) -\varvec{1}(1-\tau )\right] \\ \vdots \\ \varvec{U}_i^T\varvec{\Gamma }_i\varvec{M}_{\upsilon i}\left[ \Phi \left( \sqrt{nh} \frac{\varvec{y}_i-\varvec{U}_i\varvec{\gamma }}{\varvec{r}_i}\right) -\varvec{1}(1-\tau )\right] \end{array} \right) , \end{aligned}$$
(8)

where \(\varvec{r}_i=(r_{i1},\ldots ,r_{im_i})^T\) with \(r_{ij}=\sqrt{\varvec{U}_{ij}^T \varvec{\Omega }\varvec{U}_{ij}}\), \(j=1,\ldots ,m_i\), \(\varvec{1}\) being the \(m_i\)-dimensional column vector with all elements 1, and \(\Phi \left( \sqrt{nh}\frac{ \varvec{y}_i-\varvec{U}_i\varvec{\gamma }}{\varvec{r}_i}\right) \) denotes an \(m_i\)-dimensional vector with j-th element being \(\Phi \left( \sqrt{nh}\frac{ {y}_{ij}-\varvec{U}_{ij}^T\varvec{\gamma }}{r_{ij}}\right) \).
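The effect of the smoothing can be seen in a small sketch: the normal-cdf surrogate reduces to the exact (discontinuous) score \(\psi _\tau \) as the scale factor \(\sqrt{nh}\) grows (toy values; \(\varvec{\Gamma }_i\) and the basis matrices are omitted):

```python
import numpy as np
from scipy.stats import norm

tau, r = 0.25, 1.0

def psi_smooth(u, nh):
    # smoothed score: Phi(sqrt(nh) u / r) - (1 - tau)
    return norm.cdf(np.sqrt(nh) * u / r) - (1 - tau)

def psi_exact(u):
    # exact score: tau - I(u < 0)
    return tau - (u < 0).astype(float)

u = np.array([-0.8, -0.1, 0.1, 0.8])
gap = np.max(np.abs(psi_smooth(u, nh=1e6) - psi_exact(u)))
```

For large \(nh\) the surrogate agrees with \(\psi _\tau \) away from zero, while remaining differentiable everywhere.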

The smoothing estimator \(\tilde{\varvec{\gamma }}\) can be obtained as

$$\begin{aligned} \tilde{\varvec{\gamma }}=\mathrm{argmin}_{\varvec{\gamma }} \tilde{\varvec{Q}}_n(\varvec{\gamma }), \end{aligned}$$
(9)

where

$$\begin{aligned} \tilde{\varvec{Q}}_n(\varvec{\gamma })=n\tilde{\varvec{S}}^T(\varvec{\gamma })\tilde{\varvec{\Sigma }}^{-1}_n(\varvec{\gamma })\tilde{\varvec{S}}(\varvec{\gamma }) \end{aligned}$$

with \(\tilde{\varvec{\Sigma }}_n(\varvec{\gamma })=\frac{1}{n}\sum _{i=1}^n \tilde{\varvec{S}}_i(\varvec{\gamma })\tilde{\varvec{S}}_i^T(\varvec{\gamma })\). Then, we can set

$$\begin{aligned} {\tilde{\alpha }}_l(t)=\varvec{B}_l^T(t)\tilde{\varvec{\gamma }}_l, l=1,\ldots ,p. \end{aligned}$$

Theorem 2 establishes the theoretical property of the estimator after smoothing, which enjoys the same convergence rate.

Theorem 2

Let \(\varvec{\Omega }\) be any symmetric and positive definite matrix with bounded eigenvalues. Under Conditions (C1)–(C7), there exists a local minimizer of (9) such that

$$\begin{aligned} \Vert \tilde{\alpha }_l-\alpha _l\Vert _2^2 =O_p\left( n^{-2r/(2r+1)}\right) , \ \ l=1, \ldots , p. \end{aligned}$$

With the above smoothing, the standard Newton–Raphson method can be used to obtain the estimator. However, the closed-form derivative of \(\tilde{\varvec{Q}}_n(\varvec{\gamma })\) with respect to \(\varvec{\gamma }\) is unwieldy, so we use the two-stage method mentioned in Greene (2011). First, an initial estimator is obtained with the R package quantreg using the weights \(\varvec{\Gamma }_i, i=1,\ldots ,n\), with correlations ignored. Then, the initial estimator is used to compute \(\varvec{\Sigma }_n\), which is held fixed while the standard Newton–Raphson method is applied to obtain the refined estimator. In our experience, the resulting algorithm is very fast and typically converges within 20 iterations. Although a multiple-stage approach could also be used, in which the refined estimator further updates \(\varvec{\Sigma }_n\), we found that this does not improve over the two-stage approach and therefore do not pursue it.
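The two-stage idea can be sketched on toy data as follows; this is only an illustration, with \(\varvec{\Gamma }_i\) taken as the identity, a smoothed check loss standing in for the quantreg initial fit, and a generic quasi-Newton routine standing in for Newton–Raphson:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

n, m, q, tau = 100, 4, 2, 0.5
U = rng.normal(size=(n, m, q))
gamma_true = np.array([1.0, -0.5])
y = np.einsum('imq,q->im', U, gamma_true) + 0.5 * rng.normal(size=(n, m))
M = [np.eye(m), np.ones((m, m)) - np.eye(m)]   # basis matrices
nh = n * n**(-1 / 5)                           # smoothing scale (toy choice)

def S_stack(gamma):
    # smoothed extended scores, one row per subject
    out = []
    for i in range(n):
        z = norm.cdf(np.sqrt(nh) * (y[i] - U[i] @ gamma)) - (1 - tau)
        out.append(np.concatenate([U[i].T @ Mk @ z for Mk in M]))
    return np.stack(out)

# Stage 1: working-independence fit via a smoothed check loss.
def wi_loss(gamma):
    res = y - np.einsum('imq,q->im', U, gamma)
    return np.sum(res * (tau - norm.cdf(-np.sqrt(nh) * res)))

gamma_init = minimize(wi_loss, np.zeros(q)).x

# Stage 2: fix Sigma_n at the initial estimate, then minimize Q_n.
S0 = S_stack(gamma_init)
Sigma0 = S0.T @ S0 / n

def Q_fixed(gamma):
    Sbar = S_stack(gamma).mean(axis=0)
    return n * Sbar @ np.linalg.solve(Sigma0, Sbar)

gamma_hat = minimize(Q_fixed, gamma_init).x
```

Fixing \(\varvec{\Sigma }_n\) in the second stage is what keeps each iteration cheap: only the smoothed scores need to be re-evaluated.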

Remark 4

In practice, the matrix \(\varvec{\Omega }\) can be updated as an estimate of the covariance matrix of the estimators, which can be simply calculated as

$$\begin{aligned} \hat{\varvec{\Omega }}=\left( \{{\tilde{\varvec{S}}}^\prime (\tilde{\varvec{\gamma }})\}^T\tilde{\varvec{\Sigma }}_n^{-1}(\tilde{\varvec{\gamma }}) {\tilde{\varvec{S}}}^\prime (\tilde{\varvec{\gamma }})\right) ^{-1}, \end{aligned}$$

where \({\tilde{\varvec{S}}}^\prime (\tilde{\varvec{\gamma }})\) is the derivative of \({\tilde{\varvec{S}}}(\varvec{\gamma })\) evaluated at \(\tilde{\varvec{\gamma }}\). This is the induced smoothing method proposed by Brown and Wang (2005); Leng and Zhang (2014) adopted the same formula for linear quantile regression with longitudinal data. However, owing to the nonsmooth nature of the objective function and the difficulty of handling the spline approximation bias, we are not able to establish the validity of this formula rigorously. The proposed asymptotic variance formula therefore ignores the bias in the function estimation and applies the formula as if the model were linear after the spline expansion.

Remark 5

Although we are not able to provide rigorous asymptotic normality results for the nonparametric functions, we can informally argue that the QIF estimator is more efficient than the estimator assuming working independence, as follows. Since the QIF estimator is a GMM (generalized method of moments) estimator, its asymptotic variance is at least as small as that obtained by any linear combination of the estimating equations. When one of the basis matrices in (5) is the identity matrix, corresponding to the estimating equations that ignore correlations, QIF is therefore at least as efficient as working independence. In practice, one always sets one of the basis matrices to be the identity matrix.

To implement the above estimation procedure, one needs to select the order of the splines and the number of knots. In all simulation studies and the real data analysis, for simplicity, the nonparametric functions are approximated by cubic splines with the same number of interior knots, i.e., \(K:= K_1=\cdots =K_p\). The number of interior knots is chosen from the interval \(\left[ n^{1/(2d+1)}, 5n^{1/(2d+1)}\right] \) by minimizing the following BIC criterion

$$\begin{aligned} \mathrm{BIC}(K)=\tilde{\varvec{Q}}_n(\tilde{\varvec{\gamma }})+p(K+d){\log }(n). \end{aligned}$$

This range for K satisfies the order assumption on the number of interior knots in Theorem 1 and has also been used in Ma et al. (2014).
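The knot-selection rule can be sketched as follows (toy values; in practice \(\tilde{\varvec{Q}}_n(\tilde{\varvec{\gamma }})\) would come from the fitted model at each candidate K):

```python
import numpy as np

n, d, p = 100, 4, 3   # sample size, spline order (cubic), no. of functions

def candidate_K(n, d):
    # integer K in [n^{1/(2d+1)}, 5 n^{1/(2d+1)}]
    lo = int(np.ceil(n ** (1 / (2 * d + 1))))
    hi = int(np.floor(5 * n ** (1 / (2 * d + 1))))
    return list(range(lo, hi + 1))

def bic(Qn_value, K, n, d, p):
    # BIC(K) = Q_n + p (K + d) log n, as in the text
    return Qn_value + p * (K + d) * np.log(n)

Ks = candidate_K(n, d)
```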

3 Variable selection

In practice, there are often many covariates in model (1). With high-dimensional covariates, sparse modeling is often considered superior, owing to enhanced model predictability and interpretability. In this section, we address the variable selection problem for quantile varying coefficient model based on the QIF method.

There exist some works focusing on the variable selection methods for conditional mean or conditional quantile of VCM. For example, Wang and Xia (2009) proposed the variable selection approach for VCM using kernel smoothing adaptive group LASSO (Tibshirani 1996; Yuan and Lin 2006; Zou 2006), and Zhao et al. (2013) extended the method to the quantile VCM. Noh et al. (2012) applied the polynomial spline approximation and group SCAD with local linear approximation (LLA, Zou and Li 2008) to investigate variable selection for quantile VCM. Verhasselt (2014) discussed the variable selection for the generalized VCM using P-splines. For longitudinal data, Wang et al. (2008) studied the variable selection approach via group SCAD penalty (Fan and Li 2001). However, it ignored the correlation within subjects, which may lead to some efficiency loss in estimation and variable selection. Here, we consider variable selection of quantile VCM that incorporates the correlations within subjects.

To conduct variable selection, we propose the penalized estimator by minimizing the following penalized QIF, defined as

$$\begin{aligned} \hat{\varvec{\gamma }}^P=\mathrm{argmin}_{\varvec{\gamma }}\left\{ \varvec{Q}_n(\varvec{\gamma })+n \sum _{l=1}^p p_{\lambda _l}\left( \Vert \varvec{\gamma }_l\Vert _{\varvec{H}_l}\right) \right\} , \end{aligned}$$
(10)

where \(p_{\lambda _l}(\cdot )\) is a given penalty function depending on the tuning parameter \(\lambda _l\), \(l=1,\ldots ,p\), and \(\Vert \varvec{\gamma }_l\Vert ^2_{\varvec{H}_l} =\varvec{\gamma }_l^T\varvec{H}_l\varvec{\gamma }_l\) with \(\varvec{H}_l=\int \varvec{B}_l(t)\varvec{B}_l^T(t)dt\). Notice that \(\Vert \varvec{\gamma }_l\Vert _{\varvec{H}_l}=\Vert s_l\Vert _2\), the \(L_2\) norm of the spline function \(s_l(\cdot )\). Shrinking \(\Vert s_l\Vert _2\) to 0 entails \(s_l\equiv 0\). In addition, the tuning parameters \(\lambda _l\) for the penalty functions in (10) are not necessarily the same for different l, which can provide further flexibility.

There are several possible choices for the penalty function \(p_{\lambda _l}(\cdot )\), such as LASSO (Tibshirani 1996), MCP (Zhang 2010) and SCAD penalty (Fan and Li 2001). Here, we choose the SCAD penalty, which is defined as

$$\begin{aligned} p_{\lambda }(\theta )=\left\{ \begin{array}{ll} \lambda |\theta |, &{}\quad |\theta |\le \lambda , \\ -(\theta ^2-2a\lambda |\theta |+\lambda ^2)/[2(a-1)], &{}\quad \lambda <|\theta |\le a\lambda ,\\ (a+1)\lambda ^2/2, &{}\quad |\theta |>a\lambda , \end{array}\right. \end{aligned}$$

for given \(a>2\) and \(\lambda >0\). The SCAD penalty is continuously differentiable on \((-\infty , 0)\cup (0, \infty )\) but singular at 0, and its derivative vanishes outside \([-a\lambda , a\lambda ]\), which can produce sparse estimates for small coefficients and unbiased estimates for large coefficients. Therefore, we obtain the penalized estimator of \(\alpha _l(t)\) as

$$\begin{aligned} {\hat{\alpha }}^P_l(t)=\varvec{B}_l^T(t)\hat{{\varvec{\gamma }}}^P_l, \ \ l=1,\ldots ,p. \end{aligned}$$
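The SCAD penalty can be sketched and its key properties checked numerically: it coincides with the lasso penalty near zero and is constant beyond \(a\lambda \):

```python
import numpy as np

def scad(theta, lam, a=3.7):
    """SCAD penalty p_lambda(theta) with the usual default a = 3.7."""
    t = np.abs(theta)
    return np.where(
        t <= lam, lam * t,
        np.where(t <= a * lam,
                 -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1)),
                 (a + 1) * lam**2 / 2))

lam = 1.0
assert np.isclose(scad(0.5, lam), 0.5)               # lasso region
assert np.isclose(scad(10.0, lam), (3.7 + 1) / 2)    # flat region
```

The flat region beyond \(a\lambda \) is what yields (nearly) unbiased estimates for large coefficients, while the lasso region near zero produces exact sparsity.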

We next discuss the asymptotic properties of the resulting penalized estimator. Without loss of generality, we assume that \(\alpha _{l}(\cdot )\equiv 0, l=d_0+1,\ldots ,p\), and that \(\alpha _{l}(\cdot ), l=1,\ldots ,d_0\), are the nonzero components of \(\varvec{\alpha }(\cdot )\). We first show in Theorem 3 that the penalized QIF estimators \(\{{\hat{\alpha }}^P_l(t)\}_{l=1}^p\) have the same convergence rate as the unpenalized estimators \(\{{\hat{\alpha }}_l(t)\}_{l=1}^p\). Moreover, Theorem 4 establishes the sparsity of the penalized estimators, that is, \({\hat{\alpha }}^P_l(t)=0\) with probability approaching one for \(l=d_0+1,\ldots ,p\).

Theorem 3

Under the conditions of Theorem 1, if the tuning parameters satisfy \(\max _l\lambda _l\rightarrow 0\) in probability as \(n\rightarrow \infty \), then there exists a local minimizer of (10) such that

$$\begin{aligned} \Vert {\hat{\alpha }}_l^P-{\alpha }_l\Vert _2^2=O_p\left( n^{-2r/(2r+1)}\right) , \ \ l=1, \ldots , p. \end{aligned}$$

Theorem 4

Under the same conditions of Theorem 3, if the tuning parameters further satisfy \(\min _{d_0+1\le l\le p}\lambda _l n^{r/(2r+1)}\rightarrow \infty \) in probability as \(n\rightarrow \infty \), then, with probability approaching 1, \({\hat{\alpha }}^P_l(t)=0\) for \(l=d_0+1,\ldots ,p\).

Since the penalty function is not differentiable, to obtain the penalized estimator \(\hat{\varvec{\gamma }}^P\) we need to smooth both terms of the penalized objective function in (10). For \(\varvec{Q}_n(\cdot )\), we use the induced smoothing method of Sect. 2.3, and for the penalty \(p_{\lambda _l}(\cdot )\), we use the local quadratic approximation (LQA, Fan and Li 2001). The Newton–Raphson algorithm then yields the penalized smoothed QIF estimator, denoted by \(\tilde{\varvec{\gamma }}^P\). As in Theorem 2, we can establish that \(\tilde{\varvec{\gamma }}^P\) enjoys the same asymptotic properties as \(\hat{\varvec{\gamma }}^P\); the details are omitted.

Unlike our approach, Noh et al. (2012) used the local linear approximation (LLA) for the penalty terms, which allows the optimization problem to be converted to a second-order cone program. We use the simpler LQA approach to make the optimization problem differentiable. Our model is more complicated, and it remains to be seen whether LLA would permit a more efficient algorithm.

Note that there are p tuning parameters to be chosen in conducting the variable selection procedure. To reduce the computational burden, we propose to use the following strategy by setting

$$\begin{aligned} \lambda _{l}=\frac{\lambda }{\Vert \tilde{{\varvec{\gamma }}}_l\Vert _{H_l}},\ \ l=1,\ldots ,p, \end{aligned}$$

where \(\tilde{{\varvec{\gamma }}}_l\) is the initial unpenalized estimator of \({\varvec{\gamma }}_l\) obtained in Sect. 2.3. Note that the above strategy has also been used in Wang and Xia (2009). Then, we use the following criterion to obtain the optimal tuning parameter

$$\begin{aligned} {\hat{\lambda }}^{\mathrm{opt}}=\mathrm{argmin}_\lambda \left\{ \tilde{\varvec{Q}}_n(\tilde{\varvec{\gamma }}^P) + J\cdot {\log }(n)\cdot df_\lambda \right\} , \end{aligned}$$

where \(df_\lambda \) is the number of nonzero coefficient functions for a given tuning parameter \(\lambda \).

4 Simulation studies

In this section, we conduct simulation studies to illustrate the proposed methods. Note that the minimizer of \(\varvec{Q}_n\) is presented merely for theoretical reasons and cannot be computed due to the discontinuity of the objective function, whereas the minimizer of \(\tilde{\varvec{Q}}_n\) is easily found with the two-stage approach of Sect. 2. For each example, we focus on \(\tau =0.25\) and 0.5, and 500 data sets are generated to evaluate the results.

Example 1

In this example, the responses \(y_{ij}\) are generated from

$$\begin{aligned} y_{ij}=\alpha _0(t_{ij})+x_{ij1}\alpha _1(t_{ij})+x_{ij2}\alpha _2(t_{ij})+(1+\kappa |t_{ij}|)\epsilon _{ij}, \ i=1,\ldots ,n, \ j=1,\ldots ,m_i, \end{aligned}$$

where the number of subjects is \(n=50, 100\) or 150. Each subject is scheduled to be measured at the time points \(\{0,0.1,0.2,\ldots ,1\}\), each of which has a 20% probability of being skipped. The actual measurement times are generated by adding a \(U(-0.05, 0.05)\) random variable to the scheduled times. The three nonparametric functions are

$$\begin{aligned} \alpha _0(t)=3\sin (2\pi t), \ \ \alpha _1(t)=8t(1-t) \ \ \mathrm{and} \ \ \alpha _2(t)=2\cos (2\pi t). \end{aligned}$$
Table 1 Simulation results of AIMSE when the true error structure is compound symmetry

The marginal distributions of the two covariates are standard normal with correlation coefficient 0.5. The random error \(\varvec{\epsilon }_i =(\epsilon _{i1},\ldots ,\epsilon _{im_i})^T\) follows a multivariate normal distribution or a multivariate t-distribution (3 degrees of freedom) with location parameter \(-q_\tau \) and covariance matrix \(\Sigma \), where \(q_\tau \) is the \(\tau \)-th quantile of the standard normal or \(t_3\) distribution, so that the \(\tau \)-th quantile of \({\epsilon }_{ij}\) is zero. The covariance matrix \(\Sigma \) follows either the compound symmetry (CS) or the AR(1) structure with parameter \(\rho =0.8\). The quantity \(\kappa \) equals 0 or 1, corresponding to the homoscedastic model (HM) and the heteroscedastic model (HT), respectively.
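Under these settings, the data-generating mechanism can be sketched as follows (a sketch of the Gaussian CS case with \(\kappa =0\) and \(\tau =0.5\); function and parameter choices are those stated above):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n, tau, rho, kappa = 50, 0.5, 0.8, 0

def gen_subject():
    sched = np.arange(0, 1.01, 0.1)
    keep = rng.random(sched.size) > 0.2          # 20% chance of skipping
    t = sched[keep] + rng.uniform(-0.05, 0.05, keep.sum())
    m = t.size
    # two covariates with correlation 0.5
    C = np.array([[1.0, 0.5], [0.5, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), C, size=m)
    # CS error covariance; shift by -q_tau so the tau-quantile of eps is 0
    Sigma = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
    eps = rng.multivariate_normal(np.full(m, -norm.ppf(tau)), Sigma)
    a0 = 3 * np.sin(2 * np.pi * t)
    a1 = 8 * t * (1 - t)
    a2 = 2 * np.cos(2 * np.pi * t)
    y = a0 + X[:, 0] * a1 + X[:, 1] * a2 + (1 + kappa * np.abs(t)) * eps
    return t, X, y

data = [gen_subject() for _ in range(n)]
```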

Table 2 Simulation results of AIMSE when the true error structure is AR(1)

To assess the estimation efficiency for nonparametric functions, we calculate the integrated mean square error (IMSE) defined as

$$\begin{aligned} \mathrm{IMSE}(\alpha _l)=\frac{1}{n_{\mathrm{grid}}}\sum _{k=1}^{n_{\mathrm{grid}}} \left\{ \alpha _l(t_k)-{\hat{\alpha }}_l(t_k)\right\} ^2 \end{aligned}$$

averaged over 500 data sets and report the average of the integrated mean square error (AIMSE)

$$\begin{aligned} \mathrm{AIMSE}=\frac{1}{p}\sum _{l=1}^p \mathrm{IMSE}(\alpha _l), \end{aligned}$$

where \(\{t_k: k=1,\ldots ,n_{\mathrm{grid}}\}\) with \(n_{\mathrm{grid}}=200\) are the grid time points at which the functions \(\{\alpha _l(\cdot )\}\) are evaluated. The simulation results are shown in Tables 1 and 2, where we also report the corresponding results by assuming working independence (WI) for comparison.
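The IMSE and AIMSE computations above amount to averaging squared errors over a time grid and then over components. A short sketch (the helper name `aimse` is ours; true and estimated coefficient functions are passed as callables):

```python
import numpy as np

def aimse(alpha_true, alpha_hat, n_grid=200):
    """Average the integrated mean square errors of the p coefficient
    functions over a grid of n_grid time points (hypothetical helper)."""
    t = np.linspace(0, 1, n_grid)
    imse = [np.mean((f(t) - g(t)) ** 2) for f, g in zip(alpha_true, alpha_hat)]
    return np.mean(imse)
```

In the simulations this quantity would additionally be averaged over the 500 replications to obtain the reported AIMSE.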

Table 1 summarizes the estimation results when the error correlation has the compound symmetry structure. The AIMSEs of each method become smaller as the sample size increases. Moreover, the estimators with the correctly specified CS working correlation have the smallest AIMSE, and even with the misspecified AR(1) working correlation the efficiency gains over WI, which ignores the within-subject correlation, remain obvious. If the model is heteroscedastic and/or the error follows a multivariate t-distribution, the efficiency gain is more pronounced. Similar phenomena are observed when the true error correlation is AR(1), as shown in Table 2. Further simulation studies (not shown) indicate that incorporating \(\varvec{\Gamma }_i\) performs only slightly better than replacing \(\varvec{\Gamma }_i\) with an identity matrix. In addition, we have also run the simulations with lower error correlations \(\rho =0.3\) and \(\rho =0.5\); our proposed approach performs better than, or at least as well as, WI. We omit these results to save space.

Example 2

In this example, the data are generated from the following model:

$$\begin{aligned} y_{ij}=\sum _{l=1}^p x_{ijl}\alpha _l(t_{ij})+(1+\kappa |x_{ij3}|)\epsilon _{ij}, \ i=1,\ldots ,250, \ j=1,\ldots ,m_i. \end{aligned}$$

Each subject is scheduled to be measured at the time points \(\{0,1, 2, \ldots , 20\}\), and each time point has a 50% probability of being skipped. Similar to Example 1, the actual measurement times are generated by adding a \(U(-0.5, 0.5)\) random variable to the scheduled times. The three relevant variables, \(x_{ijl}, l=1,2,3\), are simulated as follows: \(x_{ij1}\) is generated from \(U(t_{ij}/10,2+t_{ij}/10)\); \(x_{ij2}\), conditional on \(x_{ij1}\), is \(N(0,(1+x_{ij1})/(2+x_{ij1}))\); and \(x_{ij3}\), independent of \(x_{ij1}\) and \(x_{ij2}\), is a Bernoulli random variable with success probability 0.8. The remaining \(p-3\) redundant variables, \(x_{ijl}, l=4,\ldots ,p\), are mutually independent, and for each l, \(x_{ijl}\) is generated from a multivariate Gaussian distribution with zero mean and an exponentially decaying covariance

$$\begin{aligned}\mathrm{cov}(x_{ijl},x_{uvl})=\left\{ \begin{array}{ll} 4\exp (-|t_{ij}-t_{uv}|), &{}\quad \mathrm{if} \ i=u\\ 0, &{}\quad \mathrm{if} \ i\ne u \end{array}. \right. \end{aligned}$$

The three nonparametric functions are

$$\begin{aligned} \alpha _1(t)=15+20\sin \left( \frac{\pi t}{40}\right) ,\ \alpha _2(t)=2-3\cos \left( \frac{(t-25)\pi }{15}\right) ,\ \alpha _3(t)=6-0.6 t. \end{aligned}$$

The marginal variance of the random error \(\varvec{\epsilon }_i =(\epsilon _{i1},\ldots ,\epsilon _{im_i})^T\) is 4, the correlation settings are the same as in Example 1, and the cases \(\kappa =0\) and 1 are again denoted by HM and HT, respectively.
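The covariate design of Example 2 can be sketched as follows (a Python/NumPy illustration for a single subject; the helper name `covariates_example2` and the seed are our own, and \(N(\mu ,\sigma ^2)\) is read with its second argument as a variance):

```python
import numpy as np

rng = np.random.default_rng(1)

def covariates_example2(t, p=8):
    """Sketch of the p covariates for one subject in Example 2,
    given that subject's measurement times t (hypothetical helper)."""
    m = t.size
    x1 = rng.uniform(t / 10, 2 + t / 10)                  # relevant, uniform
    x2 = rng.normal(0, np.sqrt((1 + x1) / (2 + x1)))      # relevant, given x1
    x3 = rng.binomial(1, 0.8, m)                          # relevant, Bernoulli(0.8)
    X = [x1, x2, x3]
    # redundant covariates: Gaussian, cov 4*exp(-|t_ij - t_uv|) within subject
    Sigma = 4 * np.exp(-np.abs(np.subtract.outer(t, t)))
    for _ in range(p - 3):
        X.append(rng.multivariate_normal(np.zeros(m), Sigma))
    return np.column_stack(X)
```

Covariates from different subjects are independent, matching the zero covariance for \(i\ne u\) in the display above.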

Table 3 Mean and standard deviation of AIMSE (\(\times 100\)) over 500 replications for Example 2 with \(p=8\)
Table 4 Variable selection results for Example 2 with \(p=8\)

We report the results with \(p=8\) in Tables 3 and 4, where the simulation results obtained by ignoring the correlation are also shown. For comparison, we also report the results with the group LASSO penalty in Tables 3 and 4. In Table 3, the oracle estimator is obtained using only the first three relevant variables. In Table 4, two quantities, the false positive rate (FPR) and the false negative rate (FNR), are used to evaluate the performance of variable selection: FNR denotes the proportion of nonzero varying coefficient functions incorrectly estimated as zero coefficients, while FPR denotes the proportion of zero coefficients incorrectly estimated as nonzero functions, and we present the mean proportions over 500 replications. In addition, following the suggestion of a reviewer, we also report the simulation results for the case of a divergent dimension \(p=[n^{1/2}]\) in the Appendix of the Supplementary Material (Tables 1 and 2).
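The FPR and FNR definitions above can be written compactly as follows (a sketch with a hypothetical helper `fpr_fnr`; `selected` and `truth` are boolean vectors with `True` marking a coefficient function declared, respectively being, nonzero):

```python
import numpy as np

def fpr_fnr(selected, truth):
    """FPR and FNR for one replication (hypothetical helper)."""
    selected, truth = np.asarray(selected), np.asarray(truth)
    fpr = np.mean(selected[~truth])    # zero coefficients estimated as nonzero
    fnr = np.mean(~selected[truth])    # nonzero coefficients estimated as zero
    return fpr, fnr
```

The reported figures would then be these rates averaged over the 500 replications.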

The results in Tables 3 and 4 indicate that the performance of our proposed variable selection approach is satisfactory. Both the nonzero and zero components can be correctly identified in terms of FPR and FNR. For variable selection, the difference between the proposed QIF method and the method assuming working independence is very small for both the homoscedastic and heteroscedastic models, with either the group SCAD or the group LASSO penalty. Moreover, as seen from Table 3, the AIMSEs based on the SCAD-penalized estimates are closer to those of the oracle estimator than the AIMSEs based on the group LASSO penalty. In addition, the QIF-based estimation performs slightly better than working independence even when the working correlation matrix is misspecified. This shows that our penalized QIF estimators can simultaneously estimate and select important variables, gaining estimation accuracy by effectively removing the zero-component variables. These findings confirm our theoretical results and demonstrate the efficiency gains of our proposed approach over ignoring the within-subject correlations.

For the case of the divergent dimension \(p=[n^{1/2}]\), the FNR in Table 2 of the Supplementary Material remains very small, while the FPR for the heteroscedastic model is noticeably larger. Nevertheless, the gains of our method remain obvious in terms of the AIMSEs in Table 1.

5 Real data analysis

In this section, we apply our proposed method to longitudinal AIDS data from the Multi-Center AIDS Cohort Study between 1984 and 1991. This cohort study was first reported in Kaslow et al. (1987), where each individual was scheduled to undergo measurements at semiannual visits. Because many individuals missed some of their scheduled visits and the human immunodeficiency virus (HIV) infections occurred randomly during the study, there were unequal numbers of repeated measurements and different measurement times across individuals. As a subset of the cohort, our analysis focuses on the 283 homosexual men who were infected with HIV during the follow-up period. The main interest in these data is to describe the trend of CD4 percentage depletion over time and to evaluate the effects of cigarette smoking, pre-HIV infection CD4 percentage and age at infection on the CD4 percentage after infection. These data have been studied in several papers, including Huang et al. (2002), Qu and Li (2006), Fan et al. (2007) and Wang et al. (2009). Recently, Wang et al. (2008) considered variable selection in varying coefficient models for these data based on mean regression and quantile regression. However, they did not account for the correlation between measurements on the same individual, which may result in a loss of estimation efficiency.

In the following, our analysis focuses on evaluating the time-dependent effects of smoking status (\(x_{i1}\), equal to 1 for smokers and 0 for nonsmokers), age (\(x_{i2}\)), preCD4 (\(x_{i3}\), pre-HIV infection CD4 percentage) and the interactions among these covariates at different quantile levels. We consider the following varying coefficient model:

$$\begin{aligned} y_{ij}= & {} \alpha _0(t_{ij},\tau )+\alpha _1(t_{ij},\tau )x_{i1} +\alpha _2(t_{ij},\tau )x_{i2}+\alpha _3(t_{ij},\tau )x_{i3}+\alpha _4(t_{ij},\tau )x_{i2}^2\\&\quad +\,\alpha _5(t_{ij},\tau )x_{i3}^2+\alpha _6(t_{ij},\tau )x_{i1}x_{i2}+\alpha _7(t_{ij},\tau )x_{i1}x_{i3} +\alpha _8(t_{ij},\tau )x_{i2}x_{i3}+\epsilon _{ij}(\tau ), \end{aligned}$$

where \(y_{ij}\) is the i-th individual’s CD4 percentage at time \(t_{ij}\) (in years), and \(x_{i2}\) and \(x_{i3}\) are standardized. The baseline function \(\alpha _0(t,\tau )\) represents the \(\tau \)-th quantile of the CD4 percentage t years after infection for a nonsmoker with average preCD4 percentage and average age at HIV infection. Here, we focus on three quantile levels, \(\tau = 0.25\), 0.5 and 0.75, and use the compound symmetry working correlation to fit the data; the results are compared with working independence. The results under the AR(1) working correlation are very similar.

Table 5 Selected components of two different methods at three quantile levels for AIDS data
Fig. 1 Estimated coefficient curves at \(\tau =0.5\). a The baseline coefficient function, b coefficient for preCD4, c coefficient for the interaction of smoking and age and d coefficient for the interaction of smoking and preCD4. The shaded area indicates the 95% point-wise confidence band

Table 5 shows the selected components at different quantile levels. We find that our QIF method selects one or two more variables than the WI approach at quantile levels \(\tau =0.25\) and 0.5. In particular, the smoking effect is selected at \(\tau =0.25\) but not at \(\tau =0.5\) or 0.75, which indicates that smoking affects the CD4 percentage when it is low. The estimated curves at \(\tau =0.5\), together with their 95% point-wise confidence bands, for the four important nonparametric components are shown in Fig. 1. Note that the zero line is not completely contained in any of the 95% point-wise confidence bands, which indicates that the selected variables are reasonable. Figure 1a implies that the baseline coefficient function \(\alpha _0(t,\tau )\) decreases over time. From Fig. 1b, preCD4 has an overall positive effect on the post-infection CD4 percentage, with a rate that decreases at first and increases later. It is noteworthy from Fig. 1c that the interaction between smoking and age is significant, even though the individual variables have no effect on the post-infection CD4 percentage; in particular, older smokers tend to have lower CD4 percentages. Furthermore, Fig. 1d shows that the interaction between smoking and preCD4 increases at first and then decreases, which implies that smokers with high preCD4 tend to have higher CD4 percentages after infection, though this effect diminishes after a period of follow-up. One possible explanation is that a smoking patient with a lower preCD4 may at first quit smoking due to medical concerns and start smoking again once the preCD4 percentage improves.

6 Concluding remarks

In this paper, we have investigated statistical methods and theory for estimation and variable selection in the quantile varying coefficient model for longitudinal data. By using the QIF approach, the within-subject correlation is incorporated, and both the estimation efficiency and the variable selection accuracy are improved compared to quantile regression under working independence. We further propose induced smoothing for estimation, which makes the procedure easy to implement. Simulation studies and real data analysis show that the performance of our proposed approach is encouraging.

Several problems can be investigated in the future. The dimension of the nonparametric components is fixed in this work; it may be extended to the case with a diverging number of covariates. One can also extend this work to the partially linear varying coefficient model, which is more flexible than the VCM. In addition, it is important to distinguish varying from constant coefficients among the nonzero coefficients; for that purpose, one can use a hypothesis testing approach such as that of Wang et al. (2009) after identifying the nonzero coefficients with our method. Finally, since the quantile regression curves are estimated individually, one interesting and important problem is how to avoid crossing of the estimated quantile curves at adjacent quantile levels (Bondell et al. 2010).