1 Introduction

Single-index models are becoming increasingly popular due to their flexibility and interpretability. They can also effectively overcome the “curse of dimensionality” by projecting multivariate covariates onto the one-dimensional index variate \({\varvec{x}}^T {\beta }\). Longitudinal data frequently occur in the biomedical, epidemiological, social, and economic fields. In longitudinal studies, subjects are measured repeatedly over a given time period, so observations from the same subject are correlated while those from different subjects are typically independent. The generalized estimating equation (GEE) technique proposed by Liang and Zeger (1986) is widely used for longitudinal data. The GEE method produces consistent estimators of the mean parameters by specifying a working correlation structure. However, its main drawback is that it may suffer a great loss of efficiency when the working covariance structure is misspecified. Modeling the covariance structure is therefore an important topic. Recently, the modified Cholesky decomposition has been demonstrated to be effective for this purpose: it not only permits more general forms of the correlation structures, but also automatically yields a positive definite covariance matrix. Ye and Pan (2006) utilized the modified Cholesky decomposition to decompose the inverse of the covariance matrix and proposed a joint mean–covariance model for longitudinal data. Leng et al. (2010) constructed a more flexible semiparametric mean–covariance model by relaxing the parametric assumptions. Zhang and Leng (2012) used a new Cholesky factor to deal with the within-subject structure by decomposing the covariance matrix rather than its inverse. Other related references include Mao et al. (2011), Zheng et al. (2014), Liu and Zhang (2013), Yao and Li (2013), and Liu and Li (2015).

In recent years, several statistical inference methods have been proposed for longitudinal single-index models. Xu and Zhu (2012) proposed a kernel GEE method. Lai et al. (2012) presented a bias-corrected GEE estimation and variable selection procedure for the index coefficients. Zhao et al. (2017) constructed a robust estimation procedure based on quantile regression and a specific correlation structure (e.g., compound symmetry (CS) or first-order autoregressive (AR(1)) structures). All of these articles adopt specific correlation structures to account for the within-subject correlation, so they may lose efficiency when the true correlation structure is not correctly specified. Recently, Lin et al. (2016) developed a new efficient estimation procedure for single-index models by combining the modified Cholesky decomposition and the local linear smoothing method. Motivated by Leng et al. (2010), Guo et al. (2016) proposed a two-step estimation procedure for single-index models based on the modified Cholesky decomposition and the GEE method. These two papers are built on mean regression, which is very sensitive to outliers and heavy-tailed errors. In contrast to mean regression, quantile regression not only can describe the entire conditional distribution of the response variable, but also can accommodate non-normal errors; it has therefore emerged as a powerful complement to mean regression. Although the modified Cholesky decomposition has been well studied for mean regression models, it has not been explored for longitudinal single-index quantile models. In this paper, we use the modified Cholesky decomposition to parameterize the within-subject covariance structure and construct a more efficient estimation procedure for the index coefficients and the link function. Compared with existing results, the new method has several advantages. Firstly, the proposed method does not require a prespecified working correlation structure to improve estimation efficiency; our approach therefore not only takes the within-subject correlation into account, but also permits more general forms of covariance structures, making it more flexible than most existing methods. Secondly, since the proposed estimating equations involve the discrete quantile score function, we construct new smoothed estimating equations for fast and accurate computation of the parameter estimates. Thirdly, the estimators of the index coefficients and the link function are shown to be asymptotically efficient.

The rest of this article is organized as follows: In Sect. 2, within the framework of an independent working structure, we propose quantile score estimating functions for the index coefficients based on the “remove–one–component” method and establish the corresponding theoretical properties. In Sect. 3, we apply the modified Cholesky decomposition to decompose the within-subject covariance matrix into moving average coefficients and innovation variances, which are estimated by constructing two estimating equations. In Sect. 4, more efficient quantile estimating functions are derived based on the estimated covariance matrix. In Sect. 5, extensive simulation studies are carried out to evaluate the finite sample performance of the proposed method. In Sect. 6, we illustrate the proposed method through a real data analysis. Finally, all the conditions and the proofs of the main results are provided in “Appendix.”

2 Estimation procedure under the independent structure

A marginal quantile single-index model with longitudinal data has the following structure

$$\begin{aligned} {Y_{ij}} = g_{0\tau }\left( {{\varvec{X}}_{ij}^T{\beta }_{0\tau }} \right) + \varepsilon _{ij,{\tau }}, i=1,\ldots ,n,j=1,\ldots ,m_i, \end{aligned}$$

where \({Y_{ij}} = Y\left( {{t_{ij}}} \right) \in \mathbb {R}\) is the jth measurement of the ith subject, \({{\varvec{X}}_{ij}} = {\varvec{X}}\left( {{t_{ij}}} \right) \in \mathbb {R} ^p\), \(g_{0\tau }(\cdot )\) is an unknown differentiable univariate link function, \({\varepsilon _{ij, \tau }}\) is the random error term with an unspecified density function \(f_{ij}(\cdot )\) and \(P\left( {{\varepsilon _{ij,\tau }} < 0} \right) = \tau \) for any i, j and \(\tau \in \left( {0,1} \right) \), and \({\beta }_{0\tau } \) is an unknown parameter vector which belongs to the parameter space

$$\begin{aligned} {\varvec{\varTheta }}=\left\{ {{\beta } = {{\left( {{\beta _1},\ldots ,{\beta _p}} \right) }^T}:\left\| {\beta } \right\| = 1 \text { and the } r\text {th component is positive}} \right\} , \end{aligned}$$

where \({\left\| \cdot \right\| }\) is the Euclidean norm. Without loss of generality, we assume that the true vector \({\beta }\) has a positive component \({\beta _r}\) (otherwise, consider \(-{\beta }\)). For simplicity, we omit \(\tau \) from \({\varepsilon _{ij, \tau }}, {\varvec{\beta }}_{0\tau }\) and \(g_{0\tau }\left( \cdot \right) \) in the rest of this article, but we should remember that they are \(\tau \)-specific. Let \({{\varvec{Y}}_i} = {\left( {{Y_{i1}},\ldots ,{Y_{i{m_i}}}} \right) ^T}\), \({{\varvec{X}}_i} = \left( {{{\varvec{X}}_{i1}},\ldots ,{{\varvec{X}}_{i{m_i}}}} \right) ^T\), and \({{\varvec{\varepsilon }}_i} = {\left( {{\varepsilon _{i1}},\ldots ,{\varepsilon _{i{m_i}}}} \right) ^T}\). In this paper, we assume the number of measurements \(m_i\) is uniformly bounded for each i, which means that n and N (\(N = \sum _{i = 1}^n {{m_i}}\)) have the same order.

2.1 Estimations of \(g_0(\cdot )\) and its first derivative \({g'_0}(\cdot )\)

B-splines are commonly used to approximate nonparametric functions because of their efficiency in function approximation and numerical computation; see, e.g., Ma and He (2016), Guo et al. (2016), and Zhao et al. (2017). In this paper, we adopt B-spline basis functions to approximate the unknown link function \(g_0(\cdot )\). We assume \({\varvec{X}}_{ij}^T{\beta }\) is confined to a compact set [a, b]. Consider a knot sequence with \(N_n\) interior knots, denoted by \({\xi _1} = \cdots = {\xi _q} = a< {\xi _{q + 1}}< \cdots< {\xi _{q + {N_n}}} < b = {\xi _{q + {N_n} + 1}} = \cdots = {\xi _{2q + {N_n}}}\). We set the B-spline basis functions as \({{\varvec{B}}_q}(u) = {\left( {{B_{1,q}}(u),\ldots ,{B_{{J_n,q}}}(u)} \right) ^T}\) with order q (\(q\ge 2\)) and \(J_n=N_n+q\). We approximate the link function \(g_0(u)\) by \(g_0\left( u \right) \approx {{\varvec{B}}_q{\left( u \right) ^T}}{\varvec{\theta }}\), where \({\varvec{\theta }} = {\left( {{\theta _1},\ldots ,{\theta _{{J_n}}}} \right) ^T}\) is the spline coefficient vector. For a given \({\beta }\), we obtain the estimator \({\hat{{\varvec{\theta }}}}\left( {\beta } \right) \) of \({\varvec{\theta }}\) by minimizing the following objective function

$$\begin{aligned} {L_n}\left( {\beta };{\varvec{\theta }}\right) =\sum \limits _{i = 1}^n {\sum \limits _{j = 1}^{{m_i}} {{\rho _\tau }\left( {{Y_{ij}} - {{\varvec{B}}_q}{{\left( {\varvec{X}}_{ij}^T{\beta } \right) }^T}{\varvec{\theta }} } \right) } }, \end{aligned}$$
(1)

where \({\rho _\tau }\left( u \right) = u\left\{ {\tau - I\left( {u < 0} \right) } \right\} \) is the quantile loss function. Then, the link function \(g_0(\cdot )\) is estimated by the spline function \(\hat{g}\left( u;{\beta } \right) = {{\varvec{B}}_q{\left( u \right) ^T}}{\hat{{\varvec{\theta }}}} \left( {\beta } \right) \). Following Ma and Song (2015), the estimator of \({{ { g'_0(\cdot )}}}\) is defined by

$$\begin{aligned} {\hat{g}'}(u;{\beta } ) = \sum \limits _{s = 1}^{{J_n}} {{ B'_{s,q}}} (u){{ \hat{\theta } }_s}\left( {\beta } \right) = \sum \limits _{s = 2}^{{J_n}} {{ B_{s,q - 1}}} (u){{\hat{\omega } }_s}\left( {\beta } \right) , \end{aligned}$$

where \({{\hat{\omega } }_s}\left( {\beta } \right) = {{\left( {q - 1} \right) \left\{ {{{ \hat{\theta } }_s}\left( {\beta } \right) - {{ \hat{\theta } }_{s - 1}}\left( {\beta } \right) } \right\} } \Big /{\left( {{\xi _{s + q - 1}} - {\xi _s}} \right) }}\) for \(2\le s\le J_n\). Thus, we have \({{ {\hat{g}'}}}({u};{\beta } )={\varvec{B}}_{q-1}(u)^T {\varvec{D}}_1 {{\hat{{\varvec{\theta }}}}} ({\beta })\), where \({\varvec{B}}_{q-1}(u)=\left( {{B_{s,q - 1}}(u):2\le s\le J_n} \right) ^T\) is the \((q-1)\)th-order B-spline basis and

$$\begin{aligned} {{\varvec{D}}_1} = (q - 1)\left[ {\begin{array}{*{20}{c}} {\frac{{ - 1}}{{{\xi _{q + 1}} - {\xi _2}}}} &{} {\frac{1}{{{\xi _{q + 1}} - {\xi _2}}}} &{} 0 &{} \cdots &{} 0 \\ 0 &{} {\frac{{ - 1}}{{{\xi _{q + 2}} - {\xi _3}}}} &{} {\frac{1}{{{\xi _{q + 2}} - {\xi _3}}}} &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} {\frac{{ - 1}}{{{\xi _{{N_n} + 2q - 1}} - {\xi _{{N_n} + q}}}}} &{} {\frac{1}{{{\xi _{{N_n} + 2q - 1}} - {\xi _{{N_n} + q}}}}} \\ \end{array}} \right] _{({J_n} - 1) \times {J_n}}. \end{aligned}$$
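To make the spline construction concrete, the following Python sketch fits \(\hat{g}\) and \(\hat{g}'\) at a fixed index \(u={\varvec{X}}^T{\beta }\) by minimizing the check loss (1). It assumes scipy and statsmodels are available; the function name fit_link and the default knot settings are ours for illustration only.

```python
import numpy as np
from scipy.interpolate import BSpline
from statsmodels.regression.quantile_regression import QuantReg

def fit_link(u, y, tau=0.5, q=4, Nn=5):
    """Spline quantile fit of the link g at a fixed index u = X beta (a sketch)."""
    a, b = u.min(), u.max()
    inner = np.linspace(a, b, Nn + 2)[1:-1]   # N_n equally spaced interior knots
    t = np.r_[[a] * q, inner, [b] * q]        # boundary knots repeated q times
    J = len(t) - q                            # J_n = N_n + q basis functions
    B = BSpline(t, np.eye(J), q - 1)(u)       # basis matrix, one row per observation
    theta = QuantReg(y, B).fit(q=tau).params  # spline coefficients minimizing (1)
    g_hat = BSpline(t, theta, q - 1)          # estimated link function
    return g_hat, g_hat.derivative()          # g-hat and its derivative g-hat'
```

The derivative returned by g_hat.derivative() coincides with the lower-order B-spline representation displayed above.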

2.2 The profile-type estimating equations for \({\beta }\)

The parameter space \({{{\varvec{\varTheta }}}}\) means that \({\beta }\) lies on the boundary of the unit ball. Therefore, \(g_0\left( {{\varvec{X}}_{ij}^T{\beta } } \right) \) is non-differentiable at the point \({\beta }\). However, we must use the derivative of \(g_0\left( {{\varvec{X}}_{ij}^T{\beta } } \right) \) with respect to \({\beta }\) when constructing the profile-type estimating equations. To solve this problem, we employ the “remove–one–component” method (Cui et al. 2011) to transform the boundary of the unit ball in \(\mathbb {R} ^p\) to the interior of the unit ball in \(\mathbb {R} ^{p-1}\). Specifically, let \({{\beta } ^{(r)}} = {\left( {{\beta _1},\ldots ,{\beta _{r - 1}},{\beta _{r + 1}},\ldots ,{\beta _p}} \right) ^T}\) be the \((p-1)\)-dimensional vector obtained by removing the rth component \(\beta _{r}\) from \({\beta }\). Then, \({\beta }\) can be rewritten as \({\beta } ={ {\beta }} ({\beta }^{(r)} ) = {\left( {{\beta _1},\ldots ,{\beta _{r - 1}},{{\left( {1 - {{\left\| {{{\beta } ^{(r)}}} \right\| }^2}} \right) }^{{1/2}}},{\beta _{r + 1}},\ldots ,{\beta _p}} \right) ^T}\), where \({\beta } ^{(r)}\) belongs to the parameter space

$$\begin{aligned} {{\varvec{\varTheta }}^{(r)}} = \Bigg \{ {{{\beta } ^{(r)}} = {{\left( {{\beta _1},\ldots ,{\beta _{r - 1}},{\beta _{r + 1}},\ldots ,{\beta _p}} \right) }^T}:\left\| {{{\beta } ^{(r)}}} \right\| < 1} \Bigg \}. \end{aligned}$$

So \({\beta }\) is infinitely differentiable with respect to \({\beta }^{(r)}\) and the Jacobian matrix is

$$\begin{aligned} {{\varvec{J}}_{{{\beta } ^{(r)}}}} = \frac{{\partial {\beta } }}{{\partial {{\beta } ^{(r)}}}} = {\left( {{\varsigma _1},\ldots ,{\varsigma _p}} \right) ^T}, \end{aligned}$$

where \({\varsigma _r} = - {\left( {1 - {{\left\| {{{\beta } ^{(r)}}} \right\| }^2}} \right) ^{ - {1 / 2}}}{{\beta } ^{(r)}}\) and \({\varsigma _s}\left( {1 \le s \le p,s \ne r} \right) \) is a \(\left( p-1 \right) \times 1\) unit vector with sth component 1.
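For illustration, the Jacobian can be assembled in a few lines of Python (a sketch of ours; r is passed as a zero-based index):

```python
import numpy as np

def jacobian(beta_r, r):
    """Jacobian of beta(beta^{(r)}) with respect to beta^{(r)}, a p x (p-1) matrix."""
    br = np.sqrt(1.0 - beta_r @ beta_r)   # the recovered positive component beta_r
    # rows s != r form an identity block; row r equals -beta^{(r)} / beta_r
    return np.insert(np.eye(len(beta_r)), r, -beta_r / br, axis=0)
```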

Motivated by the idea of GEE (Liang and Zeger 1986), together with the estimators of \(g_0(\cdot )\) and \({g'_0}(\cdot )\), we construct the profile-type estimating equations for the \((p-1)\)-dimensional vector \({\beta }^{(r)}\)

$$\begin{aligned} R \left( {\beta }^{(r)}\right) =\sum \limits _{i = 1}^n {{{\varvec{J}}_{{{\beta } ^{(r)}}}^T}{\hat{{\varvec{X}}}}_i^T{\hat{{\varvec{G'}}}} \left( {{\varvec{X}}_i}{\beta }; {\beta }\right) {{\varvec{\varLambda }} _i} {\psi _\tau }\left( {{{\varvec{Y}}_i} - \hat{g}\left( {{{\varvec{X}}_i}{\beta } ;{\beta } } \right) } \right) } = {\varvec{0}}, \end{aligned}$$
(2)

where \({\hat{{\varvec{G'}}}}\left( {{\varvec{X}}_i}{\beta };{\beta }\right) = {\mathrm{diag}}\left\{ {{ \hat{g}'}\left( {{\varvec{X}}_{i1}^T{\beta } } ;{\beta }\right) ,\ldots , {\hat{g}'}\left( {{\varvec{X}}_{i{m_i}}^T{\beta } } ;{\beta }\right) } \right\} \), \({\psi _\tau }\left( u \right) = I\left( {u < 0} \right) -\tau \) is the quantile score function, \({\psi _{{\tau }}}\left( {{{\varvec{u}}_i}} \right) = {\left( {{\psi _{{\tau }}}\left( {{u_{i1}}} \right) ,\ldots ,{\psi _{{\tau }}}\left( {{u_{i{m_i}}}} \right) } \right) ^T}\), \(\hat{g}\left( {{\varvec{X_i}}{\beta } ;{\beta } } \right) = {\left( {\hat{g}\left( {{\varvec{X}}_{i1}^T{\beta } ;{\beta } } \right) ,\ldots ,\hat{g}\left( {{\varvec{X}}_{i{m_i}}^T{\beta } ;{\beta } } \right) } \right) ^T}\), \({{\hat{{\varvec{X}}}}_i} = {\left( {{{\hat{{\varvec{X}}}}_{i1}},\ldots ,{{\hat{{\varvec{X}}}}_{i{m_i}}}} \right) ^T}\), \({{\hat{{\varvec{X}}}}_{ij}} = {{\varvec{X}}_{ij}} - \hat{E}\left( {{{\varvec{X}}_{ij}}\left| {\varvec{X}}_{ij}^T {\beta }\right. } \right) \) and \(\hat{E}\left( {{{\varvec{X}}_{ij}}\left| {\varvec{X}}_{ij}^T {\beta } \right. } \right) \) is the spline estimator of \(E\left( {{{\varvec{X}}_{ij}}\left| {\varvec{X}}_{ij}^T {\beta }_0 \right. } \right) \). In estimating Eq. (2), the term \({\varvec{\varLambda }}_i = {\mathrm{diag}}\left\{ {{f_{i1}}\left( 0 \right) ,\ldots ,{f_{i{m_i}}}\left( 0 \right) } \right\} \) describes the dispersions in \(\varepsilon _{ij}\). In some cases when \(f_{ij}\) is difficult to estimate, \({\varvec{\varLambda }}_i\) can be simply treated as an identity matrix with a slight loss of efficiency (Jung 1996). We define the solution of (2) as \({\hat{{\beta }}}^{(r)}\) and then use the fact \({\beta _r} = \sqrt{1 - {{\left\| {{{\beta } ^{(r)}}} \right\| }^2}} \) to obtain \({{\hat{{\beta }}}}\). The asymptotic property of \({{\hat{{\beta }}}}\) is given in Lemma 3 of “Appendix.”

2.3 Computational algorithm

Solving estimating Eq. (2) is challenging because of the discontinuous indicator function. To overcome this difficulty, we approximate \({\psi _\tau }\left( \cdot \right) \) by a smooth function \({\psi _{h\tau } }\left( \cdot \right) \) based on the idea of Wang and Zhu (2011). Define \(G\left( x \right) = \int _{u < x} {K\left( u \right) } du\) and \({G_h}\left( x \right) = G\left( {{x \big / h}} \right) \), where \(K\left( \cdot \right) \) is a kernel function and h is a positive bandwidth parameter. Then, we approximate \({\psi _\tau }\left( {{Y_{ij}} - \hat{g}\left( {{\varvec{X}}_{ij}^T{\beta } ;{\beta } } \right) } \right) \) by \({\psi _{h\tau }}\left( {{Y_{ij}} - \hat{g}\left( {{\varvec{X}}_{ij}^T{\beta } ;{\beta } } \right) } \right) = 1-{G_h}\left( {Y_{ij}}-\hat{g}\left( {{\varvec{X}}_{ij}^T{\beta } ;{\beta } } \right) \right) - \tau \). Therefore, based on this approximation, estimating Eq. (2) can be replaced by the following smoothed estimating equations

$$\begin{aligned} \tilde{R} \left( {\beta }^{(r)}\right) =\sum \limits _{i = 1}^n {{{\varvec{J}}_{{{\beta } ^{(r)}}}^T}{\hat{{\varvec{X}}}}_i^T{\hat{{\varvec{G'}}}}\left( {{\varvec{X}}_i}{\beta }; {\beta }\right) {{\varvec{\varLambda }} _i} {\psi _{h\tau } }\left( {{{\varvec{Y}}_i} - \hat{g}\left( {{{\varvec{X}}_i}{\beta } ;{\beta } } \right) } \right) } = {\varvec{0}}. \end{aligned}$$
(3)

We define the solution of (3) as \({\tilde{{\beta }}^{(r)}}\). Since (3) is a system of nonlinear equations in \({\beta }^{(r)} \), the Fisher–Scoring iterative algorithm can be used to solve it. Specifically, the iterative algorithm proceeds as follows:

Step 0 Start with an initial value \({\varvec{ \beta }} ^{(0)}\), obtained by the method of Ma and He (2016), and set \(k=0\).

Step 1 Use the current estimate \({\beta } ^{\left( k \right) }\) and minimize \(L_n({\varvec{\beta ^{(k)}}};{\varvec{\theta }})\) with respect to \({\varvec{\theta }}\) to obtain the estimator \({\hat{{\varvec{\theta }}}}^{(k)}\). Then, we can obtain \(\hat{g}^{(k)}\left( u;{\beta }^{(k)} \right) ={{\varvec{B_q}}{{\left( u \right) }^T}{\hat{{\varvec{\theta }}}}^{(k)} }\) and \({\hat{g}'^{(k)}} \left( u ;{\beta }^{(k)}\right) = {{{\varvec{ B}}}_{q - 1}}{\left( u \right) ^T}{{\varvec{D_1}}}{\hat{{\varvec{\theta }}}}^{(k)}\).

Step 2 Using the estimators \(\hat{g}^{(k)}\) and \({\hat{g}'^{(k)}}\) obtained in Step 1, update \({\left( {{{\beta } ^{(r)}}} \right) ^{(k)}}\) by

$$\begin{aligned} {\left( {{{\beta } ^{(r)}}} \right) ^{(k + 1)}} = {\left( {{{\beta } ^{(r)}}} \right) ^{(k)}} - {\tilde{D}} {\left( {{{\beta } ^{(r)}}} \right) ^{ - 1}}{\tilde{R}}\left( {{{\beta } ^{(r)}}} \right) {|_{{{\beta } ^{(r)}} = {{\left( {{{\beta } ^{(r)}}} \right) }^{(k)}}}}, \end{aligned}$$

where

$$\begin{aligned} {\tilde{D}}\left( {{{\beta } ^{(r)}}} \right) \buildrel \varDelta \over = \frac{{\partial \tilde{R}\left( {{{\beta } ^{(r)}}} \right) }}{{\partial {{\beta } ^{(r)}}}} = \sum \limits _{i = 1}^n {{\varvec{J}}_{{{\beta } ^{(r)}}}^T{\hat{{\varvec{X}}}}_i^T{\hat{{\varvec{G}}}'}\left( {{{\varvec{X}}_i}{\beta } ;{\beta } } \right) {{\varvec{ \varLambda }}_i} {{\tilde{{\varvec{\varLambda }}} }_i}\left( {\beta }\right) {\hat{{\varvec{G}}}'}\left( {{{\varvec{X}}_i}{\beta } ;{\beta } } \right) {{\hat{{\varvec{X}}}}_i}{{\varvec{J}}_{{{\beta } ^{(r)}}}}} \end{aligned}$$

and

$$\begin{aligned}&{{\tilde{{\varvec{\varLambda }}} }_i}\left( {\beta }\right) \\&\quad = diag\Bigg \{ {{h^{ - 1}}K\left[ {{{\left( {{Y_{i1}} - \hat{g}\left( {{\varvec{X}}_{i1}^T{\beta } ;{\beta } } \right) } \right) } \Big / h}} \right] ,\ldots ,{h^{ - 1}}K\left[ {{{\left( {{Y_{i{m_i}}} - \hat{g}\left( {{\varvec{X}}_{i{m_i}}^T{\beta } ;{\beta } } \right) } \right) } \Big / h}} \right] } \Bigg \}. \end{aligned}$$

Step 3 Set \(k=k+1\) and repeat Steps 1 and 2 until convergence.

Step 4 With the final estimators \({\tilde{{\beta }}^{(r)}}\) and \({\hat{{\varvec{\theta }}}}\) obtained from Step 3, we can get the final estimator of \( g_0\left( u \right) \) by \(\hat{g}\left( {u;{\tilde{{\beta }}} } \right) ={{\varvec{B}}_q}{({u})^T}{{{\hat{{\varvec{\theta }}}}}}({\tilde{{\beta }}} )\), where \({{\tilde{{\beta }}}}\) is obtained by the fact \({\beta _r} = \sqrt{1 - {{\left\| {{{\beta } ^{(r)}}} \right\| }^2}} \).

Remark 1

If the sum of \(\left| {\left( {{{\beta } ^{(r)}}} \right) ^{(k + 1)}} - {\left( {{{\beta } ^{(r)}}} \right) ^{(k)}}\right| \) over all components is smaller than a cutoff value (such as \(10^{-6}\)), we stop the iteration. Our simulation studies indicate that the Fisher–Scoring algorithm finds the numerical solution of (3) quickly.
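The following schematic Python loop renders Steps 0–4; the callables update_spline, score, and info are placeholders of ours for refitting the spline (Step 1) and evaluating \(\tilde{R}\) and \(\tilde{D}\), not part of the original algorithm.

```python
import numpy as np

def fisher_scoring(beta_r, update_spline, score, info, tol=1e-6, max_iter=200):
    """Fisher-scoring iteration for the smoothed estimating equations (3) (a sketch).

    update_spline(beta_r): refit g-hat and g-hat' at the current index (Step 1)
    score(beta_r):         the estimating function R-tilde(beta^{(r)})
    info(beta_r):          its derivative D-tilde(beta^{(r)})
    """
    for _ in range(max_iter):
        update_spline(beta_r)                                        # Step 1
        new = beta_r - np.linalg.solve(info(beta_r), score(beta_r))  # Step 2
        if np.abs(new - beta_r).sum() < tol:                         # Remark 1 rule
            return new
        beta_r = new                                                 # Step 3
    return beta_r
```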

2.4 Asymptotic properties

Let \(g_0\left( {u } \right) \) and \({\beta }_0\) be the true values of \(g\left( {u } \right) \) and \({\beta }\), respectively. In the following theorems, we restrict \({{\beta } } \in {\varvec{\varTheta }}_n\), where \({{\varvec{\varTheta }}_{n}} = \left\{ {{{\beta } } \in {\varvec{\varTheta }}:\left\| {{\beta } }-{{\beta } _0} \right\| \le C {n^{{{ - 1} / 2}}}} \right\} \) for some positive constant C. Since we anticipate that the estimators of \({\beta }_0\) are root-n consistent, we look for solutions of (3) within an \(n^{-1/2}\)-neighborhood of \({\beta }_0 \). Define

$$\begin{aligned} {\varvec{\varPhi }} = \mathop {\lim }\limits _{n \rightarrow \infty } \frac{1}{n}\sum \limits _{i = 1}^n {{\varvec{J}}_{{{\beta }_0 ^{(r)}}}^T {\tilde{{\varvec{X}}}}_i^T {{\varvec{G'}}\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) }{{\varvec{\varLambda }} _i}{{\varvec{\varLambda }} _i}{{\varvec{G'}}\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) }{{\tilde{{\varvec{X}}}}_i}{{\varvec{J}}_{{{\beta }_0 ^{(r)}}}}} , \end{aligned}$$

and

$$\begin{aligned} {\varvec{\varPsi }} = \mathop {\lim }\limits _{n \rightarrow \infty } \frac{1}{n}\sum \limits _{i = 1}^n {{\varvec{J}}_{{{\beta }_0 ^{(r)}}}^T{\tilde{{\varvec{X}}}}_i^T{{\varvec{G'}}\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) }{{\varvec{\varLambda }} _i}{{\varvec{\varSigma }} _{\tau i}}{{\varvec{\varLambda }} _i}{{\varvec{G'}}\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) }{{\tilde{{\varvec{X}}}}_i}{{\varvec{J}}_{{{\beta }_0 ^{(r)}}}}}, \end{aligned}$$

where \({{\varvec{\varSigma }}_{\tau i}} = Cov\left( {{\psi _\tau }\left( {{{\varvec{\varepsilon }}_i}} \right) } \right) \) , \({{\varvec{G'}}\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) } = {\mathrm{diag}}\left\{ {{ g'_0}\left( {{\varvec{X}}_{i1}^T{{\beta } _0}} \right) ,\ldots ,{g'_0}\left( {{\varvec{X}}_{i{m_i}}^T{{\beta } _0}} \right) } \right\} \), \({{\tilde{{\varvec{X}}}}_i} = {\left( {{{\tilde{{\varvec{X}}}}_{i1}},\ldots ,{{\tilde{{\varvec{X}}}}_{i{m_i}}}} \right) ^T}\) and \({{\tilde{{\varvec{X}}}}_{ij}} = {{\varvec{X}}_{ij}} - E\left( {{{\varvec{X}}_{ij}}\left| {{\varvec{X}}_{ij}^T{{\beta } _0}} \right. } \right) \).

Theorem 1

Suppose conditions (C1)–(C7) in “Appendix” hold and the number of knots satisfies \({n^{{1/{(2d + 2)}}}} \ll {N_n} \ll {n^{{1/4}}}\), where d is given in condition (C2) of “Appendix.” Then we have

$$\begin{aligned} \sqrt{n} \left( {\tilde{{\beta }}} - {\beta }_0\right) \mathop \rightarrow \limits ^d N\left( {{\varvec{0}},{\varvec{J}}_{{{\beta }_0 ^{(r)}}}{{\varvec{\varPhi }} ^{ -1}}{\varvec{\varPsi }} {{\varvec{\varPhi }} ^{ -1}}} {\varvec{J}}_{{{\beta }_0^{(r)}}}^T\right) \end{aligned}$$

as \(n\rightarrow \infty \), where \(\mathop \rightarrow \limits ^d\) denotes convergence in distribution.

Theorem 2

Let \({\varvec{\theta }} ^0\) be the best approximation coefficient of \(g_0\left( {u } \right) \) in the B-spline space. Suppose \({\beta }\) is a known constant \({\beta }_0\) or estimated to the order \({{O_p}\left( {{n^{{{ - 1}/2}}}} \right) }\), conditions (C1)–(C7) in “Appendix” hold, and the number of knots satisfies \({n^{{1/{(2d + 2)}}}} \ll {N_n} \ll {n^{{1/ 4}}}\). Then (i) \(\left| {{{\hat{g}}}({u};{{\beta } }) - {g_0}({u})} \right| = O_p\left( {\sqrt{{{{N_n}} / n}} + N_n^{ - d}} \right) \) uniformly in \(u \in \left[ {a,b} \right] \); and (ii) under \({n^{{1 / {(2d + 1)}}}} \ll {N_n} \ll {n^{{1 / 4}}}\),

$$\begin{aligned} \sigma _n^{-1}\left( u \right) \left( {\hat{g}\left( {u;{\beta } } \right) - \check{g} \left( {u;{\beta } } \right) } \right) \mathop \rightarrow \limits ^d N\left( {0,1} \right) , \end{aligned}$$

where \(\sigma _n^2\left( u \right) = {\varvec{B}}_q^T\left( u \right) {{\varvec{V}}^{ - 1}}\left( {{{\beta } _0}} \right) \sum _{i = 1}^n {{\varvec{B}}_q^T\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\varvec{\varLambda }} _{ i}}{{\varvec{\varSigma }} _{\tau i}}{{\varvec{\varLambda }} _{ i}}{{\varvec{B}}_q}\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) } {{\varvec{V}}^{ - 1}}\left( {{{\beta } _0}} \right) {{\varvec{B}}_q}\left( u \right) \), \(\check{g} \left( u;{\beta } \right) = {\varvec{B}}_q^T\left( u \right) {{\varvec{\theta }} ^0\left( {\beta }\right) }\), \({\varvec{B}}_q\left( {{{\varvec{X}}_i}{\beta }_0 } \right) = {\left( {{{\varvec{B}}_q}\left( {{\varvec{X}}_{i1}^T{\beta }_0 } \right) ,\ldots ,{{\varvec{B}}_q}\left( {{\varvec{X}}_{i{m_i}}^T{\beta }_0 } \right) } \right) ^T}\) and \( {\varvec{V}}\left( {{{\beta } _0}} \right) = \sum _{i = 1}^n {{\varvec{B}}_q^T\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\varvec{\varLambda }} _{ i}}{{\varvec{\varLambda }} _{ i}}{{\varvec{B}}_q}\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) }. \)

3 Modeling the within-subject covariance matrix via the modified Cholesky decomposition

To incorporate the correlation within subjects, following the idea of GEE (Liang and Zeger 1986), we can use the estimating equations that take the form

$$\begin{aligned} \sum \limits _{i = 1}^n {\left( \begin{array}{l} {{\varvec{B}}_q^T}\left( {{{\varvec{X}}_i}{\beta } } \right) \\ {\varvec{J}}_{{{\beta } ^{(r)}}}^T{\hat{{\varvec{X}}}_i}^T{{{\varvec{ G'}}}\left( {{{\varvec{X}}_i}{\beta } } \right) } \\ \end{array} \right) }{{\varvec{\varLambda }} _i}{\varvec{\varSigma }}_{\tau i}^{-1} {\psi _\tau }\left( {{{\varvec{Y}}_i} - {{\varvec{B}}_q}\left( {{{\varvec{X}}_i}{\beta } } \right) {\varvec{\theta }} } \right) ={\varvec{0}}. \end{aligned}$$
(4)

Unfortunately, estimating Eq. (4) involves an unknown covariance matrix \({\varvec{\varSigma }} _{\tau i}\), so our primary task is to estimate it. To guarantee the positive definiteness of \( {\varvec{\varSigma }} _{\tau i}\), we first apply the modified Cholesky decomposition to decompose \( {\varvec{\varSigma }} _{\tau i}\) as

$$\begin{aligned} {{\varvec{\varSigma }} _{\tau i}} = {{\varvec{L}}_{\tau i}}{{\varvec{D}}_{\tau i}}{\varvec{L}}_{\tau i}^T, \end{aligned}$$

where \({\varvec{L}}_{\tau i}\) is a lower triangular matrix with 1’s on its diagonal and \({\varvec{D}}_{\tau i}\) is an \(m_i \times m_i\) diagonal matrix. Let \({{\varvec{e}}_{\tau i}} ={\varvec{L_{\tau i}}}^{-1} {\psi _\tau }\left( {{{\varvec{\varepsilon }} _{i}}} \right) = {\left( {{e_{\tau ,i1}},\ldots ,{e_{\tau ,i{m_i}}}} \right) ^T}\), so that \(Cov\left( {{{\varvec{e}}_{\tau i}}} \right) ={\varvec{L}}_{\tau i}^{-1} {\varvec{\varSigma }} _{\tau i} \left( {\varvec{L}}_{\tau i}^{-1}\right) ^T= {{\varvec{D}}_{\tau i}} \buildrel \varDelta \over = diag\left( { d_{\tau , i1}^2,\ldots ,d _{\tau ,i{m_i}}^2} \right) \), where \(d _{\tau ,ij}^2\) is called the innovation variance. Furthermore, denoting the below-diagonal entries of \({\varvec{L}}_{\tau i}\) by \(l_{\tau , ijk} (k<j=2,\ldots ,m_i)\), the relation \({{\varvec{e}}_{\tau i}} ={\varvec{L_{\tau i}}}^{-1} {\psi _\tau }\left( {{{\varvec{\varepsilon }} _{i}}} \right) \) can be rewritten as

$$\begin{aligned} {\psi _\tau }\left( {{\varepsilon _{ij}}} \right) = \sum \limits _{k = 1}^{j - 1} {{l_{\tau , ijk}}} {e_{\tau , ik }} + {e_{\tau , ij}}, \end{aligned}$$

where the \(l_{\tau , ijk}\) are the so-called moving average coefficients, and we adopt the convention that \(\sum _{k = 1}^{0} \) is zero, so the case \(j=1\) is covered. The main advantage of the modified Cholesky decomposition is that \(l_{\tau , ijk}\) and \(d _{\tau , ij}^2\) are unconstrained. In order to estimate the moving average coefficients \({l_{\tau ,ijk}} \) and the innovation variances \(d _{\tau ,ij}^2\), we construct two generalized linear models as follows:

$$\begin{aligned} {l_{\tau ,ijk}} ={\varvec{w}}_{ijk }^T{{\gamma }_\tau },\log \left( {d _{\tau ,ij}^2} \right) ={\varvec{z}}_{ij}^T{{\varvec{\lambda }}_\tau }, \end{aligned}$$
(5)

where \({{\gamma }_\tau } = {\left( {{\gamma _{\tau 1}},\ldots ,{\gamma _{\tau p_1}}} \right) ^T}\) and \({{\varvec{\lambda }}_\tau } = {\left( {{\lambda _{\tau 1}},\ldots ,{\lambda _{\tau p_2}}} \right) ^T}\). Following the idea of Zhang and Leng (2012), the covariates \({\varvec{z}}_{ij}\) are those used in the regression analysis, and \({\varvec{w}}_{ijk }\) is usually taken to be a polynomial of the time difference \(t_{ij}-t_{ik}\). Adopting the GEE approach, we construct two estimating equations for \({{\gamma }_\tau }\) and \({\varvec{\lambda }} _\tau \):

$$\begin{aligned} {U_1}\left( {\gamma }_\tau \right)= & {} \sum \limits _{i = 1}^n {\left( {\frac{{\partial {\varvec{e}}_{\tau i}^T}}{{\partial {\gamma }_\tau }}} \right) {\varvec{D}}_{\tau i}^{ - 1}{{\varvec{e}}_{\tau i}}} = {\varvec{0}}, \end{aligned}$$
(6)
$$\begin{aligned} {U_2}\left( {\varvec{\lambda }}_\tau \right)= & {} \sum \limits _{i = 1}^n {{\varvec{z}}_i^T{{\varvec{D}}_{\tau i}}{\varvec{W}}_{\tau i}^{ - 1}\left( {{\varvec{e}}_{\tau i}^2 - {\varvec{d}}_{\tau i}^2} \right) } = {\varvec{0}}, \end{aligned}$$
(7)

where \({{\partial {{\varvec{e}}_{\tau i}^T}} \big /{\partial {{\gamma }_\tau }}}\) is a \(p_1\times m_i\) matrix whose first column is zero and whose jth (\(j\ge 2\)) column is \({{\partial {e_{\tau ,ij}}} \big / {\partial {{\gamma }_\tau }}} = - \sum _{k = 1}^{j - 1} {\left[ {{{\varvec{w}}_{ijk}}{e_{\tau ,ik }} + {l_{\tau ,ijk }}{{\partial {e_{\tau ,ik }}} \big / {\partial {{\gamma }_\tau }}}} \right] } \), \({{\varvec{z}}_i} = {\left( {{{\varvec{z}}_{i1}},\ldots ,{{\varvec{z}}_{i{m_i}}}} \right) ^T}\) and \({\varvec{d}}_{\tau i}^2 = {\left( {d _{\tau ,i1}^2,\ldots ,d _{\tau ,i{m_i}}^2} \right) ^T}\). Here, \({\varvec{W}}_{\tau i}\) is the covariance matrix of \({\varvec{e}}_{\tau i}^2\), namely \({{\varvec{W}}_{\tau i}} = Cov\left( {{\varvec{e}}_{\tau i}^2} \right) \). The true \({{\varvec{W}}_{\tau i}}\) is unknown and can be approximated by a sandwich “working” covariance structure \({{\varvec{W}}_{\tau i}} = {\varvec{A}}_{\tau i}^{1/2}{{\varvec{R}}_{\tau i}}\left( \rho \right) {\varvec{A}}_{\tau i}^{1/2}\) (Liu and Zhang 2013), where \({{\varvec{A}}_{\tau i}} = 2diag\left( {d _{\tau ,i1}^4,\ldots ,d _{\tau ,i{m_i}}^4} \right) \) and \({{\varvec{R}}_{\tau i}}\left( \rho \right) \) mimics the correlation between \(e_{\tau , ij}^2\) and \(e_{\tau ,ik}^2\) \((j\ne k)\) with a new parameter \(\rho \). Common choices of \({{\varvec{R}}_{\tau i}}\left( \rho \right) \) include compound symmetry and first-order autoregressive structures. Let \({\hat{{\gamma }}}_\tau \) and \({\hat{{\varvec{\lambda }}}}_\tau \) denote the solutions of estimating Eqs. (6) and (7). Liu and Zhang (2013) pointed out that \({\hat{{\gamma }}}_\tau \) and \({\hat{{\varvec{\lambda }}}}_\tau \) are not sensitive to the parameter \(\rho \), so we take \(\rho =0\) in our simulation studies and real data analysis.
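For intuition, the decomposition itself is elementary linear algebra. The sketch below (ours, using numpy and scipy) extracts \({\varvec{L}}_{\tau i}\) and \({\varvec{D}}_{\tau i}\) from a given covariance matrix and recovers the moving-average residuals \({\varvec{e}}_{\tau i}\) by forward substitution:

```python
import numpy as np
from scipy.linalg import solve_triangular

def modified_cholesky(sigma):
    """Decompose sigma = L D L^T with L unit lower triangular and D diagonal."""
    C = np.linalg.cholesky(sigma)      # ordinary lower Cholesky factor
    d = np.diag(C)
    return C / d, np.diag(d ** 2)      # L (unit diagonal) and D (innovation variances)

def ma_residuals(L, psi):
    """e_i = L^{-1} psi_tau(eps_i), so that Cov(e_i) = D."""
    return solve_triangular(L, psi, lower=True, unit_diagonal=True)
```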

Let \({\gamma }_{\tau 0}\) and \({\varvec{\lambda }}_{\tau 0}\) be the true values of \({\gamma }_\tau \) and \({\varvec{\lambda }}_\tau \), respectively. Meanwhile, we define the covariance matrix of the function \({{{{\left( {{{\varvec{U}}_1}{{\left( {\gamma }_{\tau 0} \right) }^T},{{\varvec{U}}_2}{{\left( {\varvec{\lambda }}_{\tau 0} \right) }^T}} \right) }^T}} \Bigg / {\sqrt{n} }}\) by \({{\varvec{V}}_{\tau n}} = {\left( {{\varvec{v}}_{\tau n}^{jl}} \right) _{j,l = 1,2}}\), where \({\varvec{v}}_{\tau n}^{jl} = {n^{ - 1}}Cov\left( {{{\varvec{U}}_j},{{\varvec{U}}_l}} \right) \) for \(j,l=1,2\). Furthermore, we assume that the covariance matrix \({\varvec{V}}_{\tau n}\) is positive definite at the true value \(({\gamma }_{\tau 0}^T,{\varvec{\lambda }}_{\tau 0}^T)^T\) and

$$\begin{aligned} {\varvec{V}}_{\tau n} = \left( {\begin{array}{*{20}{c}} {{\varvec{v}}_{\tau n}^{11}} &{} {{\varvec{v}}_{\tau n}^{12}} \\ {{\varvec{v}}_{\tau n}^{21}} &{} {{\varvec{v}}_{\tau n}^{22}} \\ \end{array}} \right) \mathop \rightarrow \limits ^p {\varvec{V}}_{\tau } = \left( {\begin{array}{*{20}{c}} {{\varvec{v}}_{\tau }^{11}} &{} {{\varvec{v}}_{\tau }^{12}} \\ {{\varvec{v}}_{\tau }^{21}} &{} {{\varvec{v}}_{\tau }^{22}} \\ \end{array}} \right) , \end{aligned}$$

where \(\mathop \rightarrow \limits ^p\) denotes convergence in probability. Then, the proposed estimators \(\left( {{\hat{{\gamma }}}_\tau ^T,{\hat{{\varvec{\lambda }}}}_\tau ^T} \right) ^T\) are \(\sqrt{n} \)-consistent and have the following asymptotic distribution

$$\begin{aligned} \sqrt{n} \left( {\begin{array}{*{20}{c}} {{{\hat{{\gamma }} }_\tau } - {{\gamma } _{\tau 0}}} \\ {{{\hat{{\varvec{\lambda }}} }_\tau } - {{\varvec{\lambda }} _{\tau 0}}} \\ \end{array}} \right) \mathop \rightarrow \limits ^d N\left\{ {{\varvec{0}},{{\left( {\begin{array}{*{20}{c}} {{\varvec{v}}_{\tau }^{11}} &{} {\varvec{0}} \\ {\varvec{0}} &{} {{\varvec{v}}_{\tau }^{22}} \\ \end{array}} \right) }^{ - 1}}{\varvec{V}}_{\tau } {{\left( {\begin{array}{*{20}{c}} {{\varvec{v}}_{\tau }^{11}} &{} {\varvec{0}} \\ {\varvec{0}} &{} {{\varvec{v}}_{\tau }^{22}} \\ \end{array}} \right) }^{ - 1}}} \right\} . \end{aligned}$$

The proof is omitted since it is similar to the proof of Theorem 1 of Lv et al. (2017). Now, we show that the estimated covariance matrix \({\hat{{\varvec{\varSigma }}}}_{\tau i}\) is consistent. For a matrix \({\varvec{A}}\), \(\left\| {\varvec{A}} \right\| = {\left[ {tr\left( {{\varvec{A}}{{\varvec{A}}^T}} \right) } \right] ^{{1 / 2}}}\) denotes its Frobenius norm.

Theorem 3

Let \({\varvec{\varSigma }}_{\tau i}\) and \({\hat{{\varvec{\varSigma }}}}_{\tau i}\) be the true and estimated covariance matrices within the ith cluster, respectively. Suppose that the regularity conditions in “Appendix” hold and that the covariance matrix has the model structure (5). Then \(\left\| {{\hat{{\varvec{\varSigma }}}}_{\tau i}-{\varvec{\varSigma }}_{\tau i}} \right\| = {O_p}\left( {{n^{{{ - 1} / 2}}}} \right) \).

4 Efficient estimating equations for the index coefficients \({\beta }\) and the link function \(g(\cdot )\)

Based on the discussion in Sect. 3, the covariance matrix \({\varvec{\varSigma }}_{\tau i}\) can be estimated by \({{\hat{{\varvec{\varSigma }}}} _{\tau i}} = {\hat{{\varvec{L}}}_{\tau i}}{\hat{{\varvec{D}}}_{\tau i}}\hat{{\varvec{L}}}_{\tau i}^T\), where \(\hat{{\varvec{D}}}_{\tau i}= diag\left( {\hat{d} _{\tau ,i1}^2,\ldots ,\hat{d} _{\tau ,i{m_i}}^2} \right) \) with \(\hat{d} _{\tau ,ij}^2= \exp \left( {\varvec{z}}_{ij}^T{\hat{{\varvec{\lambda }}}}_\tau \right) \), and the (jk) element of \({\hat{{\varvec{L}}}}_{\tau i}\) is \(\hat{l}_{\tau ,ijk}= {\varvec{w}}_{ijk}^T{{\hat{{\gamma }}}_\tau }\) for \(k<j=2,\ldots ,m_i\). Firstly, for a given \({\beta }\), we construct efficient smoothed estimating equations for \({\varvec{\theta }}\):

$$\begin{aligned} \sum \limits _{i = 1}^n {{{\varvec{B}}_q^T}\left( {{{\varvec{X}}_i}{\beta } } \right) {{\varvec{\varLambda }} _i}{\hat{{\varvec{\varSigma }}}} _{\tau i}^{ - 1} {\psi _{h\tau }}\left( {{{\varvec{Y}}_i} - {{\varvec{B}}_q}\left( {{{\varvec{X}}_i}{\beta } } \right) {\varvec{\theta }} } \right) } ={\varvec{0}}. \end{aligned}$$
(8)

Then, the efficient smoothed estimating equations for \({\beta }^{(r)}\) are constructed similarly:

$$\begin{aligned} \sum \limits _{i = 1}^n {{{\varvec{J}}_{{{\beta } ^{(r)}}}^T}{\hat{{\varvec{X}}}}_i^T{\bar{{\varvec{G}}'}}\left( {{\varvec{X}}_i}{\beta };{\beta }\right) {{\varvec{\varLambda }} _i}{\hat{{\varvec{\varSigma }}}}_{\tau i}^{-1} {\psi _{h\tau } }\left( {{\varvec{Y_i}} - {{\varvec{B}}_q}\left( {{{\varvec{X}}_i}{\beta } } \right) }{\bar{{\varvec{\theta }}}}\left( {\beta }\right) \right) } = {\varvec{0}}, \end{aligned}$$
(9)

where \({\bar{{\varvec{\theta }}}}({\beta })\) is the solution of (8), \({\varvec{{\bar{G}'}}}\left( {{\varvec{X}}_i}{\beta };{\beta }\right) = {\mathrm{diag}}\Big \{ { \bar{g}'}\left( {{\varvec{X}}_{i1}^T{\beta } } ;{\beta }\right) ,\ldots , {\bar{g}'}\Big ( {{\varvec{X}}_{i{m_i}}^T{\beta } } ;{\beta }\Big ) \Big \}\) and \({\bar{g}'}(u;{\beta } ) = \sum _{s = 1}^{{J_n}} {{ B'_{s,q}}} (u){{ \bar{\theta } }_s}\left( {\beta } \right) \). Let \({\bar{{\beta }}^{(r)}}\) be the solution of (9). The efficient estimator of \( g_0\left( u \right) \) is then obtained as \(\bar{g}\left( {u;{\bar{{\beta }}} } \right) = {{\varvec{B}}_q}{\left( u \right) ^T}{\bar{{\varvec{\theta }}}} \left( {\bar{{\varvec{{\beta }}}}} \right) \).

Remark 2

There are two main differences between the proposed estimating Eqs. (8) and (9) and Zhao et al. (2017)’s estimating Eq. (6). On the one hand, we apply a different method to smooth the discontinuous estimating equations. On the other hand, Zhao et al. (2017) applied a working correlation matrix \({\varvec{C}}_i\) which needs to be prespecified to improve the estimation efficiency. Therefore, Zhao et al. (2017)’s approach cannot accommodate more general forms of the correlation structures and loses efficiency when \({\varvec{C}}_i\) is far from the true correlation structure.

Theorem 4

Suppose conditions (C1)–(C7) in “Appendix” hold and the number of knots satisfies \({n^{{1 / {(2d + 2)}}}} \ll {N_n} \ll {n^{{1/4}}}\). Then we have

$$\begin{aligned} \sqrt{n} \left( {\bar{{\beta }}} - {\beta }_0\right) \mathop \rightarrow \limits ^d N\left( {{\varvec{0}},{\varvec{J}}_{{{\beta }_0 ^{(r)}}}{{\varvec{\varGamma }} ^{ -1}}} {\varvec{J}}_{{{\beta }_0^{(r)}}}^T\right) , \end{aligned}$$

as \(n\rightarrow \infty \), where

$$\begin{aligned} {\varvec{\varGamma }} = \mathop {\lim }\limits _{n \rightarrow \infty } \frac{1}{n}\sum \limits _{i = 1}^n {{\varvec{J}}_{{{\beta }_0 ^{(r)}}}^T {\tilde{{\varvec{X}}}}_i^T {{\varvec{G'}}\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) }{{\varvec{\varLambda }} _i}{\varvec{\varSigma }}_{\tau i}^{-1}{{\varvec{\varLambda }} _i}{{\varvec{G'}}\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) }{{\tilde{{\varvec{X}}}}_i}{{\varvec{J}}_{{{\beta }_0 ^{(r)}}}}} \end{aligned}$$

and the other symbols are the same as those in Theorem 1.

Let \({{\varvec{\varUpsilon }} _i} = {{\varvec{\varPhi }} ^{ - 1}}{\varvec{J}}_{{\beta } _0^{(r)}}^T{\tilde{{\varvec{X}}}}_i^T{\varvec{G}}'\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\varvec{\varLambda }} _i}{\varvec{\varSigma }} _{\tau i}^{{{ 1} / 2}} - {{\varvec{\varGamma }} ^{ - 1}}{\varvec{J}}_{{\beta } _0^{(r)}}^T{\tilde{{\varvec{X}}}}_i^T{\varvec{G}}'\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\varvec{\varLambda }} _i}{\varvec{\varSigma }} _{\tau i}^{{{ - 1}/2}}\). Then,

$$\begin{aligned} \begin{aligned} {{\varvec{\varUpsilon }} _i}{\varvec{\varUpsilon }} _i^T&= {{\varvec{\varPhi }} ^{ - 1}}{\varvec{J}}_{{\beta } _0^{(r)}}^T{\tilde{{\varvec{X}}}}_i^T{\varvec{G}}'\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\varvec{\varLambda }} _i}{{\varvec{\varSigma }} _{\tau i}}{{\varvec{\varLambda }} _i}{\varvec{G}}'\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\tilde{{\varvec{X}}}}_i}{{\varvec{J}}_{{\beta } _0^{(r)}}}{{\varvec{\varPhi }} ^{ - 1}} \\&\quad - {{\varvec{\varPhi }} ^{ - 1}}{\varvec{J}}_{{\beta } _0^{(r)}}^T{\tilde{{\varvec{X}}}}_i^T{\varvec{G}}'\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\varvec{\varLambda }} _i}{{\varvec{\varLambda }} _i}{\varvec{G}}'\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\tilde{{\varvec{X}}}}_i}{{\varvec{J}}_{{\beta } _0^{(r)}}}{{\varvec{\varGamma }} ^{ - 1}} \\&\quad - {{\varvec{\varGamma }} ^{ - 1}}{\varvec{J}}_{{\beta } _0^{(r)}}^T{\tilde{{\varvec{X}}}}_i^T{\varvec{G}}'\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\varvec{\varLambda }} _i}{{\varvec{\varLambda }} _i}{\varvec{G}}'\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\tilde{{\varvec{X}}}}_i}{{\varvec{J}}_{{\beta } _0^{(r)}}}{{\varvec{\varPhi }} ^{ - 1}} \\&\quad + {{\varvec{\varGamma }} ^{ - 1}}{\varvec{J}}_{{\beta } _0^{(r)}}^T{\tilde{{\varvec{X}}}}_i^T{\varvec{G}}'\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\varvec{\varLambda }} _i}{\varvec{\varSigma }} _{\tau i}^{ - 1}{{\varvec{\varLambda }} _i}{\varvec{G}}'\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\tilde{{\varvec{X}}}}_i}{{\varvec{J}}_{{\beta } _0^{(r)}}}{{\varvec{\varGamma }} ^{ - 1}}. \\ \end{aligned} \end{aligned}$$

Since \({{\varvec{\varUpsilon }} _i}{\varvec{\varUpsilon }} _i^T\) is nonnegative definite, we have

$$\begin{aligned} 0 \le \mathop {\lim }\limits _{n \rightarrow \infty } \frac{1}{n}\sum \limits _{i = 1}^n {{{\varvec{\varUpsilon }} _i}{\varvec{\varUpsilon }} _i^T}&= {{\varvec{\varPhi }} ^{ - 1}}{\varvec{\varPsi }} {{\varvec{\varPhi }} ^{ - 1}} - {{\varvec{\varPhi }} ^{ - 1}}{\varvec{\varPhi }} {{\varvec{\varGamma }} ^{ - 1}} - {{\varvec{\varGamma }} ^{ - 1}}{\varvec{\varPhi }} {{\varvec{\varPhi }} ^{ - 1}} + {{\varvec{\varGamma }} ^{ - 1}}{\varvec{\varGamma }} {{\varvec{\varGamma }} ^{ - 1}} \\&= {{\varvec{\varPhi }} ^{ - 1}}{\varvec{\varPsi }} {{\varvec{\varPhi }} ^{ - 1}} - {{\varvec{\varGamma }} ^{ - 1}}. \end{aligned}$$

Thus, \({{\varvec{\varGamma }} ^{ - 1}} \le {{\varvec{\varPhi }} ^{ - 1}}{\varvec{\varPsi }} {{\varvec{\varPhi }} ^{ - 1}}\). This implies that \({\bar{{\beta }}}\) has a smaller asymptotic covariance matrix than \({\tilde{{\beta }}}\), so the proposed estimator \({\bar{{\beta }}}\) is asymptotically more efficient than \({\tilde{{\beta }}}\).
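The inequality \({{\varvec{\varGamma }} ^{ - 1}} \le {{\varvec{\varPhi }} ^{ - 1}}{\varvec{\varPsi }} {{\varvec{\varPhi }} ^{ - 1}}\) can also be checked numerically on random positive definite stand-ins for the limiting matrices (a toy verification of ours, with \({\varvec{\varLambda }}_i\) taken as the identity; this is not the paper's simulation design):

```python
import numpy as np

rng = np.random.default_rng(1)
m, p = 6, 3
Sigma = np.cov(rng.standard_normal((m, 50)))     # a positive definite Sigma_{tau,i}
M = rng.standard_normal((m, p))                  # stand-in for G' X-tilde J
Phi = M.T @ M
Psi = M.T @ Sigma @ M
Gamma = M.T @ np.linalg.inv(Sigma) @ M
diff = np.linalg.inv(Phi) @ Psi @ np.linalg.inv(Phi) - np.linalg.inv(Gamma)
print(np.linalg.eigvalsh(diff).min() >= -1e-10)  # nonnegative definite: True
```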

Motivated by (9), a sandwich formula for estimating the covariance of \({\bar{{\beta }}}\) is

$$\begin{aligned} Cov\left( {\bar{{\varvec{{\beta }}}} } \right) \approx {{\varvec{J}}_{{{\bar{{\beta }}}^{(r)}}}}{\bar{{\varvec{\varGamma }}}} _n^ {-1} {{\bar{{\varvec{\varOmega }}}}_n}{ {\bar{{\varvec{\varGamma }}} _n^{-1}} }{\varvec{J}}_{{{\bar{{\beta }} }^{(r)}}}^T, \end{aligned}$$
(10)

where

$$\begin{aligned}&{{\bar{{\varvec{\varGamma }}}} _n} = \sum \limits _{i = 1}^n {{\varvec{J}}_{{{\beta } ^{(r)}}}^T{\hat{{\varvec{X}}}}_i^T{\bar{{\varvec{G}}}'}\left( {{{\varvec{X}}_i}{\beta } ;{\beta } } \right) {{\varvec{\varLambda }}_i}{\hat{{\varvec{\varSigma }}}} _{\tau i}^{ - 1}{{\bar{{\varvec{\varLambda }}}}_i}\left( {\beta } \right) {\bar{{\varvec{G}}'}}\left( {{{\varvec{X}}_i}{\beta } ;{\beta } } \right) {{\hat{{\varvec{X}}}}_i}{{\varvec{J}}_{{{\beta } ^{(r)}}}}\left| {_{{{\beta } ^{(r)}} = {{\bar{{\beta }}}^{(r)}}}} \right. }, \\&{{\bar{{\varvec{\varOmega }}}} _n} = \sum \limits _{i = 1}^n {{\varvec{J}}_{{{\beta } ^{(r)}}}^T{\hat{{\varvec{X}}}}_i^T{\bar{{\varvec{G}}'}}\left( {{{\varvec{X}}_i}{\beta } ;{\beta } } \right) {{\varvec{\varLambda }}_i}{\hat{{\varvec{\varSigma }}}} _{\tau i}^{ - 1}{{\bar{{\varvec{S}}}}_{\tau i}\left( {\beta } \right) }{\bar{{\varvec{S}}}}_{\tau i}^T\left( {\beta } \right) {\hat{{\varvec{\varSigma }}} } _{\tau i}^{ - 1}{{\varvec{ \varLambda }}_i}{\bar{{\varvec{G}}}'}\left( {{{\varvec{X}}_i}{\beta } ;{\beta } } \right) {{\hat{{\varvec{X}}}}_i}{{\varvec{J}}_{{{\beta } ^{(r)}}}}\left| {_{{{\beta } ^{(r)}} = {{\bar{{\beta }} }^{(r)}}}} \right. }, \\&{{\bar{{\varvec{S}}}}_{\tau i}}\left( {\beta } \right) = {\Bigg \{ {{\psi _{h\tau }}\left( {{Y_{i1}} - \bar{g}\left( {{\varvec{X}}_{i1}^T{\beta } ;{\beta } } \right) } \right) ,\ldots ,{\psi _{h\tau }}\left( {{Y_{i{m_i}}} - \bar{g}\left( {{\varvec{X}}_{i{m_i}}^T{\beta } ;{\beta } } \right) } \right) } \Bigg \}^T}, \end{aligned}$$

and

$$\begin{aligned}&{{\bar{{\varvec{\varLambda }}} }_i}\left( {\beta } \right) \\&\quad = diag\Bigg \{ {{h^{ - 1}}K\left( {{{\left( {{Y_{i1}} - \bar{g}\left( {{\varvec{X}}_{i1}^T{\beta } ;{\beta } } \right) } \right) } \Big / h}} \right) ,\ldots ,{h^{ - 1}}K\left( {{{\left( {{Y_{i{m_i}}} - \bar{g}\left( {{\varvec{X}}_{i{m_i}}^T{\beta } ;{\beta } } \right) } \right) } \Big / h}} \right) } \Bigg \}. \end{aligned}$$

Theorem 5

Suppose \({\beta }\) is a known constant \({\beta }_0\) or estimated to the order \({{O_p}\left( {{n^{{{ - 1}/2}}}} \right) }\), conditions (C1)–(C7) in “Appendix” hold, and the number of knots satisfies \({n^{{1 / {(2d + 2)}}}} \ll {N_n} \ll {n^{{1 /4}}}\). Then (i) \(\left| {{\bar{g}}({u};{{\beta } }) - {g_0}({u})} \right| = O_p\left( {\sqrt{{{{N_n}} \big / n}} + N_n^{ - d}} \right) \) uniformly in \(u \in \left[ {a,b} \right] \); and (ii) under \({n^{{1 /{(2d + 1)}}}} \ll {N_n} \ll {n^{{1/4}}}\),

$$\begin{aligned} \sigma _n^{*-1}\left( u \right) \left( {\bar{g}\left( {u; {\beta } } \right) - \check{g}\left( {u;{\beta } } \right) } \right) \mathop \rightarrow \limits ^d N\left( {0,1} \right) , \end{aligned}$$

where \(\sigma _n^{*2}\left( u \right) = {\varvec{B}}_q^T\left( u \right) {{\varvec{V}}^{ *- 1}}\left( {{{\beta } _0}} \right) {{\varvec{B}}_q}\left( u \right) , {\varvec{V}}^*\left( {{{\beta } _0}} \right) = \sum _{i = 1}^n {\varvec{B}}_q^T\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) {{\varvec{\varLambda }} _{ i}}{\varvec{\varSigma }} _{\tau i}^{ - 1}{{\varvec{\varLambda }} _{ i}}{{\varvec{B}}_q}\left( {{{\varvec{X}}_i}{{\beta } _0}} \right) .\)

Remark 3

We can adopt an iterative algorithm similar to that in Sect. 2.3 to find the solutions of estimating Eqs. (8) and (9); we omit the details to save space. Furthermore, using the argument above, we can also show that \({\bar{g}}({u};{{\beta } })\) is asymptotically more efficient than \({\hat{g}}({u};{{\beta } })\).

5 Simulation studies

In this section, we conduct simulation studies to compare our approach with some existing methods. The major aim is to show that the proposed method not only can deal with complex correlation structures, but also produces more efficient estimates of the index coefficients and the link function. Specifically, we compare the proposed estimators \({\bar{{\beta }}}\) and \(\bar{g}\) (denoted as \({\hat{{\beta }}}_{pr}\) and \(\hat{g}_{pr}\)) with three other types of estimators: (i) the estimator proposed by Ma and He (2016) without using the “remove–one–component” method (denoted as \({\hat{{\beta }}}_{ma}\) and \(\hat{g}_{ma}\)); (ii) the estimators \({\tilde{{\beta }}}\) and \(\hat{g}\) (denoted as \({\hat{{\beta }}}_{in}\) and \(\hat{g}_{in}\)) given in Sect. 2.3, which ignore the within-subject correlation; (iii) the estimators proposed by Zhao et al. (2017) with the AR(1) working correlation structure (denoted as \({\hat{{\beta }}}_{ar}\) and \(\hat{g}_{ar}\)) and the compound symmetry structure (denoted as \({\hat{{\beta }}}_{cs}\) and \(\hat{g}_{cs}\)), which involve a tuning parameter \(h_1\). We set \(h_1=n^{-1/2}\) in our simulations and real data analysis following their suggestion. Similar to Wang and Zhu (2011), we smooth the quantile score function by the following second-order (\(\nu =2\)) Bartlett kernel

$$\begin{aligned} K\left( u \right) = \frac{3}{{4\sqrt{5} }}\left( {1 - {{{u^2}} \Big / 5}} \right) I\left( {\left| u \right| \le \sqrt{5} } \right) . \end{aligned}$$

In order to achieve good numerical performance, we need to select several parameters appropriately. Firstly, we fix the spline order at \(q=4\), namely we use cubic B-splines to approximate the nonparametric link function in our numerical examples. Meanwhile, we use equally spaced knots with the number of interior knots \(N_n=[n^{1/(2q+1)}]\), which satisfies the theoretical requirement; a similar strategy was adopted by Ma and Song (2015). Secondly, Wang and Zhu (2011) proved that the smoothed approach is robust to the bandwidth h. Thus, we fix \(h=n^{-0.3}\), which satisfies the theoretical requirement \( n{h^{2\nu }} \rightarrow 0\) with \(\nu =2\).
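Since G has a closed form for the Bartlett kernel, the smoothed score is cheap to evaluate; a minimal sketch (ours) of K, G, and \({\psi _{h\tau }}\):

```python
import numpy as np

SQRT5 = np.sqrt(5.0)

def K(u):
    """Second-order Bartlett kernel."""
    return np.where(np.abs(u) <= SQRT5,
                    3.0 / (4.0 * SQRT5) * (1.0 - u ** 2 / 5.0), 0.0)

def G(x):
    """G(x) = integral of K(u) over u < x, in closed form."""
    core = 0.5 + 3.0 / (4.0 * SQRT5) * (x - x ** 3 / 15.0)
    return np.where(x <= -SQRT5, 0.0, np.where(x >= SQRT5, 1.0, core))

def psi_h(resid, tau, h):
    """Smoothed quantile score psi_{h,tau}(r) = 1 - G(r / h) - tau from Sect. 2.3."""
    return 1.0 - G(np.asarray(resid) / h) - tau

# e.g., with n = 100 the bandwidth used in the simulations is h = 100 ** (-0.3)
```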

Example 1

Similar to Ma and He (2016), we generate the data from the following longitudinal single-index regression model

$$\begin{aligned} {Y_{ij}} = {g_0}\left( {{\varvec{X}}_{ij}^T{{\beta } _0}} \right) + \delta {\varepsilon _{ij}}, i=1,\ldots ,n,j=1,\ldots ,m_i, \end{aligned}$$

where \(\delta =0.5\), \(g_{0}(u)=\sin \left( {\frac{{0.2 \pi \left( {u - A} \right) }}{{B - A}}} \right) \) with \(A = \sqrt{3} /2 - 1.645/\sqrt{12} ,B = \sqrt{3} /2 + 1.645/\sqrt{12}\), \({\beta }_0=\left( \beta _{01},\beta _{02},\beta _{03} \right) ^T=\left( 3,2,0.4\right) ^T/\sqrt{3^2+2^2+0.4^2}\), and the covariate \({{\varvec{X}}_{ij}} = {\left( {{X_{ij1}},{X_{ij2}},{X_{ij3}}} \right) ^T}\) follows a multivariate normal distribution \(N(0,{\varvec{\varSigma }})\) with \(({\varvec{\varSigma }})_{k,l}=0.5^{|k-l|}\) for \(1\le k,l\le 3\). Here, we define \({\varepsilon _{ij }} ={\xi _{ij}}- {c_\tau } \), where \(c_\tau \) is the \(\tau \)th quantile of the random error \(\xi _{ij}\), so that the \(\tau \)th quantile of \(\varepsilon _{ij}\) is zero. Meanwhile, we consider two error distributions of \({\varvec{\xi }} _{i}=\left( \xi _{i1},\ldots ,\xi _{im_i} \right) ^T\) to assess the robustness and effectiveness of the proposed method.

Case 1 Correlated normal errors: \({{\varvec{\xi }}}_i\) is generated from a multivariate normal distribution \(N({\varvec{0}},{\varvec{\varXi }}_i)\), where \({\varvec{\varXi }}_i\) is specified below.

Case 2 Correlated t errors: \({{\varvec{\xi }}}_i\) is generated from a multivariate t-distribution with 3 degrees of freedom and covariance matrix \({\varvec{\varXi }}_i\).

Table 1 Simulation results of the bias (\(\times 10^{-2}\)) and SD (\(\times 10^{-2}\)) for \({\beta }=(\beta _1,\beta _2,\beta _3)^T\) with \(n = 100, 400\) and Case 1 in Example 1
Table 2 Simulation results of the bias (\(\times 10^{-2}\)) and SD (\(\times 10^{-2}\)) for \({\beta }=(\beta _1,\beta _2,\beta _3)^T\) with \(n = 100,400\) and Case 2 in Example 1
Table 3 Simulation results of the bias (\(\times 10^{-2}\)) and SD (\(\times 10^{-2}\)) for \({\beta }=(\beta _1,\beta _2,\beta _3)^T\) with \(n = 100,400\) and Case 1 in Example 2
Table 4 Simulation results of the bias (\(\times 10^{-2}\)) and SD (\(\times 10^{-2}\)) for \({\beta }=(\beta _1,\beta _2,\beta _3)^T\) with \(n = 100,400\) and Case 2 in Example 2
Table 5 Simulation results of the bias (\(\times 10^{-2}\)) and SD (\(\times 10^{-2}\)) for \({\beta }=(\beta _1,\beta _2,\beta _3)^T\) with \(n = 100\) in Example 3

Using a strategy similar to that of Liu and Zhang (2013), the covariance matrix \({\varvec{\varXi }}_i\) is constructed as \({\varvec{\varXi }}_i= {\varvec{L}}_i {\varvec{D}} _i {{\varvec{L}}_i^T}\), where \({\varvec{D}}_i=diag\left( \exp ({\varvec{h}}_{i1}^T{\varvec{\alpha }}),\ldots ,\exp ({\varvec{h}}_{im_i}^T{\varvec{\alpha }}) \right) \) and \( {\varvec{L}}_i \) is a unit lower triangular matrix with (jk) element \({\varvec{\omega }}_{ijk}^T{{\phi }} \) \((k<j=2,\ldots ,m_i)\), where \({\phi }=\left( -0.3,0.2,0,0.5\right) ^T\), \({\varvec{\alpha }}=\left( -0.3,0.5,0.4,0\right) ^T\), \({\varvec{h}}_{ij}=\left( 1, h_{ij1},h_{ij2},h_{ij3} \right) ^T\), \({\varvec{\omega }}_{ijk}=\left( 1, (t_{ij}-t_{ik}),(t_{ij}-t_{ik})^2,(t_{ij}-t_{ik})^3 \right) ^T\), \(h_{ijl}\) follows a standard normal distribution for \(l=1,2,3\), and \(t_{ij}\) is generated from the standard uniform distribution. In addition, we follow Liu and Zhang (2013) to generate unbalanced longitudinal data. Specifically, each subject is measured \(m_i\) times with \(m_i-1 \sim Binomial(11,0.8)\), which results in different numbers of repeated measurements across subjects. For the covariates \({\varvec{z_{ij}}}\) and \({\varvec{w}}_{ijk}\) of the covariance model (5), we set \({\varvec{z}}_{ij}={\varvec{h}}_{ij}\) and \({\varvec{w}}_{ijk}={\varvec{\omega }}_{ijk}\).
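A sketch (variable names ours) of how one subject's measurement times and covariance matrix \({\varvec{\varXi }}_i={\varvec{L}}_i {\varvec{D}}_i {\varvec{L}}_i^T\) can be generated under this design:

```python
import numpy as np

rng = np.random.default_rng(2024)
phi = np.array([-0.3, 0.2, 0.0, 0.5])     # moving-average coefficients
alpha = np.array([-0.3, 0.5, 0.4, 0.0])   # log innovation-variance coefficients

def one_subject():
    """Times t_ij and Xi_i = L_i D_i L_i^T for one subject (Example 1 design)."""
    m = 1 + rng.binomial(11, 0.8)          # m_i - 1 ~ Binomial(11, 0.8)
    t = np.sort(rng.uniform(size=m))
    h = np.column_stack([np.ones(m), rng.standard_normal((m, 3))])
    D = np.diag(np.exp(h @ alpha))         # D_i = diag(exp(h_ij' alpha))
    L = np.eye(m)
    for j in range(1, m):
        for k in range(j):
            dt = t[j] - t[k]
            L[j, k] = np.array([1.0, dt, dt ** 2, dt ** 3]) @ phi
    return t, L @ D @ L.T
```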

Example 2

The model setup is similar to that of Example 1. Firstly, we take \(\delta =1\) and \({\beta }_0=\left( \beta _{01},\beta _{02},\beta _{03} \right) ^T=\left( 3,2,1\right) ^T/\sqrt{14}\). Secondly, we define the covariance matrix \({\varvec{\varXi }}_i\) by \({{\varvec{\varXi }} _i} = {\varvec{\varDelta }} _i^{ - 1}{{\varvec{B}}_i}{\left( {{\varvec{\varDelta }} _i^T} \right) ^{ - 1}}\), where \({\varvec{B}}_i\) is an \(m_i \times m_i\) diagonal matrix with jth diagonal element \(\sin \left( {\pi {\varsigma _{ij}}} \right) /3 + 0.5\) and \(\varsigma _{ij} \sim U(0,2)\), and \({\varvec{\varDelta }} _i \) is a unit lower triangular matrix with (jk) element \(-\delta _{j,k}^{(i)}\) \((k<j=2,\ldots ,m_i)\), where \(\delta _{j,k}^{(i)} = 0.2 + 0.5\left( {{t_{ij}} - {t_{ik}}} \right) \) and \(t_{ij} \sim U(0,1)\). For the covariates \({\varvec{z}}_{ij}\) and \({\varvec{w}}_{ijk}\) of the covariance model (5), we set \({\varvec{z}}_{ij}=\left( 1, t_{ij},t_{ij}^2,t_{ij}^3 \right) ^T\), and \({\varvec{w}}_{ijk}\) is the same as in Example 1. Meanwhile, we set \(m_i=12\), but each measurement has a \(20\%\) probability of being missing at random, which leads to unbalanced longitudinal data. Other settings are the same as in Example 1.

Example 3

For a clear comparison, we adopt a strategy similar to that of Zhao et al. (2017) to construct the covariance matrix \({\varvec{\varXi }}_i\). We define \({\varvec{\varXi }}_i={\varvec{B}}_i^{1/2}{{\varvec{H}}_i}{\varvec{B}}_i^{1/2}\), where \({\varvec{B}}_i\) is given in Example 2 and \({\varvec{H}}_i\) follows either the compound symmetry structure (cs) or the AR(1) structure (ar1) with correlation coefficient \(\rho =0.85\). In addition, we take \(\delta =1\) and \({\beta }_0=\left( \beta _{01},\beta _{02},\beta _{03} \right) ^T=\left( 3,2,0\right) ^T/\sqrt{13}\). The scheme for generating unbalanced longitudinal data and the covariates \({\varvec{z}}_{ij}\) and \({\varvec{w}}_{ijk}\) are the same as in Example 2. Other settings are the same as in Example 1.

Tables 1, 2, 3, 4, and 5 give the biases and standard deviations (SD) of \({\hat{{\beta }}}_{ma}\), \({\hat{{\beta }}}_{cs}\), \({\hat{{\beta }}}_{ar}\), \({\hat{{\beta }}}_{in}\), and \({\hat{{\beta }}}_{pr}\) at \(\tau =0.5, 0.75\) for \(n=100\) and 400. Several observations can be drawn from these tables. Firstly, all methods yield approximately unbiased estimators of the index coefficients \({\beta }\), since the corresponding biases are small; moreover, \({\hat{{\beta }}}_{pr}\) has the smaller bias in most cases. Secondly, the estimator \({\hat{{\beta }}}_{in}\) performs better than \({\hat{{\beta }}}_{ma}\), which indicates that the “remove–one–component” method leads to more efficient estimators. Thirdly, the proposed smoothed estimator \({\hat{{\beta }}}_{pr}\) performs best (with the smallest SD) among all methods. It is not surprising that the SDs of \({\hat{{\beta }}}_{cs}\) and \({\hat{{\beta }}}_{ar}\) are larger than those of \({\hat{{\beta }}}_{pr}\) in Examples 1 and 2, because \({\hat{{\beta }}}_{cs}\) and \({\hat{{\beta }}}_{ar}\) use misspecified correlation structures there, which results in a loss of efficiency. Fourthly, to our knowledge, the correlation structure of \({\psi _\tau }\left( {{{\varvec{\varepsilon }} _i}} \right) \) is also exchangeable when the correlation structure of \({\varvec{\varepsilon }} _i\) is exchangeable, so \({\hat{{\beta }}}_{cs}\) uses a correct working correlation structure under the compound symmetry (cs) structure in Table 5. Nevertheless, \({\hat{{\beta }}}_{pr}\) still has a slight advantage over \({\hat{{\beta }}}_{cs}\). The main reasons are as follows. On the one hand, we adopt a smoothing approach different from that of Zhao et al. (2017) to construct the smoothed estimating equations. On the other hand, the moment approach does not work well and yields an inaccurate estimator of the correlation coefficient for unbalanced longitudinal data with large \(m_i\), which leads to a poor estimator of \({\varvec{C}}_i\) (given in Sect. 2.3 of Zhao et al. 2017). Finally, the correlation structure of \({\psi _\tau }\left( {{{\varvec{\varepsilon }} _i}} \right) \) does not possess the AR(1) correlation structure when \({\varvec{\varepsilon }} _i\) has the AR(1) correlation structure. Hence, both \({\hat{{\beta }}}_{cs}\) and \({\hat{{\beta }}}_{ar}\) work under misspecified correlation structures in the AR(1) case of Example 3 and consequently perform worse than \({\hat{{\beta }}}_{pr}\). In addition, for the nonparametric link function \(g_0(\cdot )\), we use the mean squared error (MSE) to evaluate the accuracy of the estimators, defined as

$$\begin{aligned} \textit{MSE}\left( { g} \right) = \frac{1}{500}\sum \limits _{k = 1}^{500}{{{\left[ {\frac{1}{N}\sum \limits _{i = 1}^n {\sum \limits _{j = 1}^{{m_i}} {{{\left( {{{ g}^{(k)}}\left( {{u_{ij}}} \right) - g_0\left( {{u_{ij}}} \right) } \right) }^2}} } } \right] }}}, \end{aligned}$$

where \({{ g}^{(k)}}\left( {{u_{ij}}} \right) \) is the estimate of \(g_0(u_{ij})\) from the kth replication. From Table 6, the proposed \(\hat{g}_{pr}\) generally achieves the smallest MSE among all methods, which indicates that \(\hat{g}_{pr}\) outperforms the existing approaches. Overall, the proposed estimators \({\hat{{\beta }}}_{pr}\) and \(\hat{g}_{pr}\) achieve better efficiency than the existing methods.
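Once the 500 fitted link functions are stored, the MSE above is a direct Monte Carlo average; a minimal sketch (names ours):

```python
import numpy as np

def mse_of_g(g_fits, g0, u_all):
    """Average over replications k of the mean squared error of g^(k) at all u_ij."""
    return np.mean([np.mean((g(u_all) - g0(u_all)) ** 2) for g in g_fits])
```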

In order to evaluate the accuracy of the sandwich formula (10), we report the ratio of the sample standard deviation (SD) to the estimated standard error (SE). For brevity, we only list the results of Example 2. In Table 7, “SD” represents the sample standard deviation of the 500 parameter estimates and can be taken as the true standard deviation of the resulting estimators; “SE” represents the sample average of the 500 standard errors estimated by the sandwich formula (10). Table 7 indicates that the sandwich formula (10) works well in different situations, especially for the large sample size (\(n=400\)), since the value of SD/SE is very close to one. Compared with the method of Zhao et al. (2017), our method provides more accurate variance estimation. In addition, we use \(P_{0.95}\) to denote the coverage probability of the 95\(\%\) confidence intervals over 500 repetitions. From Table 7, the proposed estimator \({\hat{{\beta }}}_{pr}\) consistently achieves coverage probabilities closer to the nominal level.

Table 6 Simulation results of MSE(\(\times 10^{-3}\)) for \(\tau =0.5\) with \(n=100,400\) in Examples 1–3
Table 7 Simulation results of the SE/SD and \(P_{0.95}\) for \({\beta }=(\beta _1,\beta _2,\beta _3)^T\) with \(n = 100,400\) in Example 2
Fig. 1

Estimators of \({\beta }=(\beta _{1},\beta _{2},\beta _{3})^T\) and \(g(\cdot )\) at \(\tau =0.5\) (black) and \(\tau =0.75\) (blue) for Case 1 in Example 2 (color figure online)

Table 8 Estimated regression coefficients and their standard errors (SE), and PMSE for progesterone data

Finally, it is interesting to examine whether the proposed estimators \({\bar{{\beta }}}\) and \(\bar{g}(\cdot )\) are sensitive to the dimensions of the covariates \({\varvec{z}}_{ij}\) and \({\varvec{w}}_{ijk}\). For brevity, we only present the results of \({\bar{{\beta }}}\) and \(\bar{g}(\cdot )\) for Case 1 in Example 2. We set \({{\varvec{w}}_{ijk }} = {\left\{ {1,{t_{ij}} - {t_{ik }},\ldots ,{{\left( {{t_{ij}} - {t_{ik }}} \right) }^{p_1-1}}} \right\} ^T}\) and \({\varvec{z}}_{ij}={\left( {1,{t_{ij}},\ldots ,t_{ij}^{p_2-1}} \right) ^T}\), where \((p_1,p_2)\) are given in Fig. 1. From Fig. 1, we can see that \({\bar{{\beta }}}\) and \(\bar{g}(\cdot )\) are not very sensitive to the dimensions \((p_1,p_2)\).

6 Real data analysis

In this section, we illustrate the proposed estimation method through an empirical example previously studied by Zhang et al. (1998). The data include 34 women whose urine samples were collected in one menstrual cycle and whose urinary progesterone was assayed on alternate days; each woman was measured 11–28 times, yielding a total of 492 observations. Our goal is to explore the relationship between the progesterone level and the following covariates: patient’s age and body mass index. Therefore, we take the log-transformed progesterone level as the response (Y) and age and body mass index as the covariates. We use the following longitudinal single-index quantile regression model to analyze this data set

$$\begin{aligned} {Y_{ij}} = g\left( {{\beta _1}{X_{ij1}} + {\beta _2}{X_{ij2}}} \right) + {\varepsilon _{ij}}, \end{aligned}$$

where \((Y_{ij},X_{ij1},X_{ij2})\) is the jth observation at time \(t_{ij}\) for the ith woman, and \(X_{ij1}\) and \(X_{ij2}\) are the standardized variables of age and body mass index, respectively. Meanwhile, the repeated measurement times \(t_{ij}\) are rescaled to the interval [0, 1]. For the covariance model (5), we take the corresponding covariates as

$$\begin{aligned} {{\varvec{w}}_{ijk }} = {\left\{ {1,{t_{ij}} - {t_{ik }},\ldots ,{{\left( {{t_{ij}} - {t_{ik }}} \right) }^{p_1-1}}} \right\} ^T}, {\varvec{z}}_{ij}={\left( {1,{t_{ij}},\ldots ,t_{ij}^{p_2-1}} \right) ^T}. \end{aligned}$$

We consider different \((p_1,p_2)\) for this data set. Six estimators are considered: \(\hat{{\beta }}_{cs}\), \(\hat{{\beta }}_{ar}\), and \(\hat{{\beta }}_{in}\) are given in Sect. 5, and \(\hat{{\beta }}_{32}\), \(\hat{{\beta }}_{23}\), and \(\hat{{\beta }}_{44}\) represent the proposed estimators with \(( p_1=3,p_2=2)\), \((p_1=2,p_2=3)\), and \((p_1=4, p_2=4)\), respectively. The leave–one–out cross-validation procedure is used to evaluate forecasting accuracy via the prediction mean squared error (PMSE), defined as

$$\begin{aligned} {\mathrm{PMSE} }= \frac{1}{n}\sum \limits _{i = 1}^n {{{\left\| {{{\varvec{Y}}_i} - g\left( {{\varvec{X}}_i{{\beta } _{( - i)}}} \right) } \right\| }^2}}, \end{aligned}$$

where \({{\varvec{Y}}_i} = {({Y_{i1}},\ldots ,{ Y_{i{m_i}}})^T}, {{\varvec{X}}_i} = {({{\varvec{X}}_{i1}},\ldots ,{{\varvec{X}}_{i{m_i}}})^T},{{\varvec{X}}_{ij}} = {({X_{ij1}},{X_{ij2}})^T}\), and \({\beta } _{( - i)}\) is the estimator obtained from the data of the remaining 33 subjects, excluding the ith subject.

In Table 8, we report the PMSE, the estimated regression coefficients, and the corresponding standard errors obtained by the sandwich formula (10). Compared with the method of Zhao et al. (2017), our proposed method generally possesses smaller standard errors. Meanwhile, our method has a smaller PMSE, which indicates better forecasting accuracy. In addition, scatter plots of the response and the estimated link functions with 95\(\%\) confidence intervals for \(\tau =0.5\) and 0.75 are displayed in Fig. 2. It is clear that there is a nonlinear trend; thus, a nonlinear term in the regression is perhaps more appropriate than a linear one.
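Operationally, the PMSE is a standard leave-one-out loop; in the sketch below (ours), fit is a placeholder for the full estimation procedure of Sect. 4:

```python
import numpy as np

def pmse(subjects, fit):
    """Leave-one-out PMSE: refit without subject i, then predict Y_i.

    subjects: list of (X_i, Y_i) pairs; fit(train) -> (beta_hat, g_hat).
    """
    total = 0.0
    for i, (Xi, Yi) in enumerate(subjects):
        beta, g = fit(subjects[:i] + subjects[i + 1:])  # drop the ith subject
        total += np.sum((Yi - g(Xi @ beta)) ** 2)
    return total / len(subjects)
```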

Fig. 2

Scatter plots of the response and the estimated link functions (solid curve) with 95\(\%\) confidence intervals (dashed curve) for \(\tau =0.5\) (black) and \(\tau =0.75\) (red) (color figure online)