1 Introduction

Since it was first proposed by Koenker and Bassett (1978), quantile regression has emerged as an important statistical methodology. By estimating various conditional quantile functions, quantile regression complements the focus of classical least squares regression on the conditional mean and explores the effects of covariates on the location, scale, and shape of the distribution of the response variable. It has been used in a wide range of applications, including economics, biology, and finance. A comprehensive review of the theory of quantile regression and some of its most recent developments can be found in Koenker (2005).

In order to reduce the severe modeling biases caused by mis-specified parametric models, there has been an upsurge of interest in nonparametric models. Although the nonparametric approach is useful for exploring hidden structures and reducing modeling biases, it can be too flexible to yield concise conclusions, and it faces the curse of dimensionality when the number of covariates is large. To overcome these shortcomings, Engle et al. (1986) proposed the partially linear regression model (PLRM), which allows some explanatory variables to act in a nonparametric manner while the others have a linear relation with the response variable. The PLRM not only avoids the curse of dimensionality inherent in nonparametric regression, but also retains the interpretability of the explanatory variables' effects in linear regression. It has therefore received considerable attention. For example, several papers have studied this class of models in the independent and identically distributed case (see Speckman 1988; Shi and Li 1994; Mammen and Geer 1997) as well as for dependent data (see Gao 1995; Fan and Li 1999; Liu 2011); for more details see Härdle et al. (2000). The estimation methods in the above-mentioned papers are mainly based on mean regression. As for the partially linear quantile regression model, He and Shi (1996) used bivariate tensor-product B-splines to approximate the nonparametric function, and He and Liang (2000) introduced partially linear quantile regression in errors-in-variables models.

There is a large functional data literature on mean regression (see Ramsay and Silverman 2005 for a review), but relatively few studies take a quantile regression perspective; only Cardot et al. (2005) introduced penalized quantile regression with functional covariates. It is well known that the quantile regression approach is insensitive to outliers and more robust than the ordinary least squares method. Moreover, it can still work well when the variance of the random error is infinite, while the least squares method breaks down. In addition, fitting data at a set of quantiles provides a more comprehensive description of the response distribution than the mean does. In many applications, the impacts of the covariates on the response may vary across percentiles of the distribution (see Wang et al. 2009). In many practical applications, we are interested in a low or high quantile of the response variable when some covariates are functions and others are categorical variables. In conditional growth charts, for example, one often uses gender as a categorical variable, since adult males are in general taller than adult females, and uses the entire growth curve as the functional covariate. The ultimate outcome of interest is adult height: we are particularly concerned with its lower quantiles and wish to determine which factors stunt growth and which facilitate it. All of this motivates us to combine the quantile and semiparametric approaches with functional regression, resulting in the functional partially linear quantile regression model (FPLQRM). The proposed FPLQRM is clearly more flexible than the functional linear quantile regression model of Cardot et al. (2005), since it contains not only the nonparametric function but also linear covariates, which may be categorical.
To the best of our knowledge, no existing work combines the semiparametric and quantile approaches in a functional semiparametric model. We attempt to fill this gap. However, the task presents several difficulties. First, unlike the squared loss, the quantile loss function is not differentiable at the origin, which makes it difficult to derive the large sample properties of the resulting estimators. Second, the parametric and nonparametric components, which have different convergence rates, must be estimated simultaneously. In this paper, we develop new estimators for the functional partially linear quantile regression model from the perspective of the Karhunen–Loève expansion and establish the consistency and asymptotic normality of the resulting estimators.

The remainder of the paper is organized as follows. Section 2 introduces the functional partially linear quantile regression model. Section 3 develops the estimation procedure. The large sample properties of the proposed estimators are given in Sect. 4. Simulation studies are presented in Sect. 5. The proofs of the main results are collected in the “Appendix”.

2 Functional partially linear quantile regression model

Let \(Y\) be a real-valued random variable defined on a probability space \((\Omega ,\mathcal{B },P)\), and let \({{\varvec{Z}}}=(Z_1,Z_2,\ldots ,Z_p)\) be a \(p\)-dimensional random vector. Let \(\{X(t): t \in \mathcal{F }\}\) be a zero-mean, second-order stochastic process defined on \((\Omega ,\mathcal{B },P)\) with sample paths in \(L^2(\mathcal{F })\), the Hilbert space of square integrable functions with inner product \(\langle x, y\rangle =\int _\mathcal{F } x(t)y(t)dt, \forall x, y \in L^2(\mathcal{F })\) and norm \(\Vert x\Vert = \langle x, x\rangle ^{1/2}\). Without loss of generality, we suppose throughout the paper that \( \mathcal{F }=[0,1]\). At a given quantile level \(\tau \in (0, 1)\), the dependence between \(Y\) and \((X,{{\varvec{Z}}})\) is expressed as

$$\begin{aligned} Y=\int \limits _0^1\beta (t,\tau )X(t)dt+{{\varvec{Z}}}^T{\varvec{\theta }}(\tau )+\varepsilon (\tau ), \end{aligned}$$
(1)

where \(\varepsilon (\tau )\) is a random error whose \(\tau \)th quantile equals zero, \(\beta (t,\tau )\) is a square integrable function on \([0,1]\), and \({\varvec{\theta }}(\tau )\) is a \(p\)-dimensional unknown real vector. In the rest of the article, we suppress \(\tau \) in \({\varvec{\theta }}(\tau )\) and \(\beta (t,\tau )\) for notational simplicity.

Remark 1

Model (1) generalizes both the linear quantile regression model and the functional linear model, which correspond to the cases \(\beta =0\) and \({\varvec{\theta }}=0\), respectively. If \(\tau = 0.5\) and \(\varepsilon \) has a symmetric distribution with a finite mean, then the median FPLQRM coincides with the conditional mean model. Therefore, model (1) also includes the partially functional linear regression model proposed by Shin (2009) and the semi-functional partially linear regression model of Aneiros-Pérez and Vieu (2006), given by \(Y={\varvec{\beta }}^T{{\varvec{z}}} + m(X) +\varepsilon \), when \(m(X)=\int _0^1\gamma (t)X(t) dt\).

3 Estimation methods

Let \(\{(X_i,{{\varvec{Z}}}_i, Y_i),i=1,\ldots ,n\}\) be an independent and identically distributed sample which is generated from model (1). Define the covariance function and the empirical covariance function respectively as

$$\begin{aligned} K(s,t)=\text{ Cov }(X(t),X(s)) \end{aligned}$$

and

$$\begin{aligned} \hat{K}(s,t)=\frac{1}{n}\sum \limits _{i=1}^n X_i(s)X_i(t). \end{aligned}$$

The covariance function \(K\) defines a linear operator which maps a function \(f\) to \(Kf\) given by \((Kf)(u) =\int K(u,v)f(v)dv\). We shall assume that the linear operator with kernel \(K\) is positive definite. Let \(\lambda _1>\lambda _2>\cdots >0\) and \(\hat{\lambda }_1\ge \hat{\lambda }_2\ge \cdots \ge 0\) be the ordered eigenvalue sequences of the linear operators with kernels \(K\) and \(\hat{K}\), \(\{\phi _j\}\) and \(\{\hat{\phi }_j\}\) be the corresponding orthonormal eigenfunction sequences respectively. It is clear that the sequences \(\{\phi _j\}\) and \(\{\hat{\phi }_j\}\) each forms an orthonormal basis in \(L^2([0, 1])\). Then, the spectral decompositions of the covariance functions \(K\) and \(\hat{K}\) can be written as

$$\begin{aligned} K(s,t)=\sum _{j=1}^{\infty } \lambda _j\phi _j(s)\phi _j(t) \end{aligned}$$

and

$$\begin{aligned} \hat{K}(s,t)=\sum _{j=1}^{\infty } \hat{\lambda }_j\hat{\phi }_j(s)\hat{\phi }_j(t), \end{aligned}$$

respectively.

According to the Karhunen-Loève representation, we have

$$\begin{aligned} X(t)=\sum \limits _{i=1}^{\infty } \xi _i\phi _i(t) \end{aligned}$$

and

$$\begin{aligned} \beta (t)=\sum \limits _{i=1}^{\infty } \gamma _i\phi _i(t) \end{aligned}$$
(2)

where the \(\xi _i\) are uncorrelated random variables with mean 0 and variance \(E[\xi _i^2]=\lambda _i\), and \(\gamma _i=\langle \beta , \phi _i \rangle \); for more details see Ramsay and Silverman (2005). Substituting (2) into model (1), we obtain

$$\begin{aligned} Y=\sum \limits _{j=1}^{\infty }\gamma _j\langle \phi _j,X\rangle +{{\varvec{Z}}}^T {\varvec{\theta }}+\varepsilon (\tau ). \end{aligned}$$
(3)

Therefore, the regression model in (3) can be well approximated by

$$\begin{aligned} Y\approx \sum \limits _{j=1}^m\gamma _j\langle \phi _j,X\rangle +{{\varvec{Z}}}^T{\varvec{\theta }}+\varepsilon (\tau ), \end{aligned}$$
(4)

where \(m\le n\) is the truncation level that trades off approximation error against variability and typically diverges with \(n\). Replacing \(\phi _j\) by \(\hat{\phi }_j\) for \(j=1,\ldots ,m\), model (4) can be rewritten as

$$\begin{aligned} Y\approx {{\varvec{Z}}}^T {\varvec{\theta }} +{{\varvec{U}}}{\varvec{\gamma }} + \varepsilon , \end{aligned}$$

where \({{\varvec{U}}}=\{\langle X,\hat{\phi }_j \rangle \}_{j=1,\ldots ,m}\), \({\varvec{\gamma }}=(\gamma _1,\ldots ,\gamma _m)^T\). The quantile coefficient estimates of \({\varvec{\gamma }}\) and \({\varvec{\theta }}\) can be obtained by minimizing

$$\begin{aligned} \sum \limits _{i=1}^n\rho _\tau \left( Y_i-{{\varvec{U}}}_i{\varvec{\gamma }}-{{\varvec{Z}}}_i^T{\varvec{\theta }}\right) \!, \end{aligned}$$
(5)

where \(\rho _\tau (u) = u\left\{ \tau -I(u < 0)\right\} \) is the quantile loss function. The solution to (5) satisfies the following gradient condition:

$$\begin{aligned} \sum \limits _{i=1}^n\psi _\tau \left( Y_i-{{\varvec{U}}}_i{\varvec{\gamma }}-{{\varvec{Z}}}_i^T{\varvec{\theta }}\right) \left( {{\varvec{U}}}_i,{{\varvec{Z}}}_i^T\right) ^T=\mathbf{0}, \end{aligned}$$

where \(\psi _\tau (u)=\tau -I(u<0)\) is the score function of \(\rho _\tau \).
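The minimization in (5) is a standard quantile-regression program and can be solved exactly as a linear program via the usual reformulation with positive and negative residual parts. A minimal sketch (the LP reformulation is standard; the design matrix and data below are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linprog

def quantile_fit(D, y, tau):
    """Minimize sum_i rho_tau(y_i - D_i beta) via the standard LP reformulation:
    write y - D beta = u - v with u, v >= 0 and minimize tau*1'u + (1-tau)*1'v."""
    n, k = D.shape
    c = np.concatenate([np.zeros(k), tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    A_eq = np.hstack([D, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:k]

# Illustrative design: the columns of D play the roles of (U_i, Z_i^T)
rng = np.random.default_rng(1)
n = 300
D = np.column_stack([rng.normal(size=n), rng.binomial(1, 0.5, size=n)])
y = D @ np.array([2.0, 1.0]) + rng.normal(scale=0.1, size=n)
beta_hat = quantile_fit(D, y, tau=0.5)    # median regression recovers (2, 1) closely
```

In the actual procedure, \(D\) would be the \(n\times (m+p)\) matrix whose \(i\)th row is \(({{\varvec{U}}}_i, {{\varvec{Z}}}_i^T)\), built from the estimated eigenfunctions.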

4 Large sample properties

Before presenting the main asymptotic results, we first introduce some conditions required for our asymptotic properties. Throughout this paper, the constant \(C\) may change from line to line for convenience.

  • C1: \(E\Vert X\Vert ^4<C<\infty \).

  • C2: For each \(j\), \(E[U_j^4]\le C \lambda _j\). For the eigenvalues \(\lambda _j\) and Fourier coefficients \(\gamma _j\), we require that \(\lambda _j-\lambda _{j+1}\ge C^{-1}j^{-a-1}\) and \(|\gamma _j|\le Cj^{-b}\) for \(j>1\), \(a>1\) and \(b>a/2+1\).

  • C3: For the tuning parameter \(m\), we assume that \(m\sim n^{1/(a+2b)}\).

  • C4: \(E\Vert {{\varvec{Z}}}\Vert ^4<\infty \).

Conditions C1–C3 are common in functional linear regression models (see Hall and Horowitz 2007; Shin 2009). Condition C4 is common in functional partially linear regression models (see Shin 2009).

One complicating issue for the FPLQRM comes from the dependence between \({{\varvec{Z}}}\) and \(X\). Similar to Shin (2009), we write \({{\varvec{Z}}}={\varvec{\eta }}+\langle {{\varvec{g}}}, X \rangle \), where \({\varvec{\eta }}=(\eta _{1},\ldots , \eta _{p})\) is a zero-mean random vector and \({{\varvec{g}}}=(g_{1},\ldots , g_{p})\) with \(g_j \in L_2([0,1]), j=1,\ldots ,p.\)

  • C5: \(E[{\varvec{\eta }}] = 0\) and \( E[{{\varvec{\eta }}} {\varvec{\eta }}^T ]=\Sigma \), where \(\Sigma \) is a positive definite matrix.

Condition C5 controls the limiting behaviour of the variance of \(\hat{{\varvec{\theta }}}\). Speckman (1988), Moyeed and Diggle (1994), He et al. (2002) and Shin (2009) used a similar device for modeling the dependence between the parametric and nonparametric components.

The following Theorem 1 gives the rate of convergence of the estimator of the functional slope parameter \(\beta \) and the asymptotic normality of the estimator of the constant slope parameter \({\varvec{\theta }}\).

Theorem 1

Under conditions C1–C5, we have

$$\begin{aligned} \Vert \beta _0-\hat{\beta }\Vert ^2= O_p\left( \delta _n^2\right) \!, \end{aligned}$$

and

$$\begin{aligned} \sqrt{n}(\hat{{\varvec{\theta }}}-{{\varvec{\theta }}})\rightarrow N\left( 0,\varpi \Sigma ^{-1}\right) \end{aligned}$$
(6)

where \(\delta _n^2=n^{-(2b+1)/(a+2b)}\) and \(\varpi =\frac{\tau (1-\tau )}{f^2(0)}\), with \(f\) the density function of the random error \(\varepsilon (\tau )\).
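To illustrate the scale factor \(\varpi \) in (6): for a standard normal error and \(\tau = 0.5\), the \(\tau \)th quantile of \(\varepsilon \) is 0 and \(f(0)=1/\sqrt{2\pi }\), so \(\varpi = \tau (1-\tau )/f^2(0) = \pi /2 \approx 1.57\). A quick numerical check (illustrative only):

```python
import math

tau = 0.5
f0 = 1.0 / math.sqrt(2.0 * math.pi)   # standard normal density at 0, the tau-quantile of eps
varpi = tau * (1.0 - tau) / f0 ** 2   # scale factor in the asymptotic variance (6)
# varpi equals pi/2, i.e. about 1.57 -- larger than the OLS asymptotic factor sigma^2 = 1
```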

5 Simulation studies

In this section, we investigate the finite sample performance of the proposed estimation method with Monte Carlo simulation studies. We consider two sample sizes \(n=100\) and \(n=400\). We focus on \(\tau = 0.25,0.5\) and \(\tau =0.75\) in this study.

The data are generated from the following quantile regression model

$$\begin{aligned} Y= z_{1}\theta _1+ z_{2}\theta _2+\int \limits _0^1 X(t)\beta (t) dt +\varepsilon (\tau ), \end{aligned}$$

where \(z_{1}\) follows the standard normal distribution, \(z_{2}\) follows a Bernoulli distribution with success probability 0.5, \(\theta _1=2\) and \(\theta _2=1\). We set \(\varepsilon (\tau ) = \varepsilon -F^{-1}(\tau )\), with \(F\) the CDF of \(\varepsilon \); subtracting \(F^{-1}(\tau )\) makes the \(\tau \)th quantile of \(\varepsilon (\tau )\) zero, for identifiability. For the functional linear component, we take the same form as Shin (2009): the functional coefficient is \(\beta (t)=\sqrt{2} \sin (\pi t/2)+3 \sqrt{2} \sin (3 \pi t/2)\) and \(X(t)=\sum \nolimits _j \xi _j \phi _j(t)\), where the \(\xi _j\) are independent normal with mean 0 and variance \(\lambda _j=((j-0.5)\pi )^{-2}\), and \(\phi _j(t)=\sqrt{2} \sin ((j-0.5)\pi t)\).
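This simulation design can be reproduced directly. A minimal data-generation sketch (the grid size and the truncation of the expansion of \(X\) at 50 terms are illustrative choices of ours, not specified in the paper):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, T, J = 100, 101, 50                 # sample size, grid size, truncation (illustrative)
t = np.linspace(0.0, 1.0, T)
tau = 0.25

j = np.arange(1, J + 1)
lam = ((j - 0.5) * np.pi) ** -2                                   # Var(xi_j)
phi = np.sqrt(2.0) * np.sin(np.outer(t, (j - 0.5) * np.pi))       # T x J basis
beta_t = np.sqrt(2.0) * np.sin(np.pi * t / 2) \
         + 3.0 * np.sqrt(2.0) * np.sin(3.0 * np.pi * t / 2)       # true slope function

xi = rng.normal(0.0, np.sqrt(lam), size=(n, J))                   # KL scores
X = xi @ phi.T                                                    # n x T sample curves
z1 = rng.normal(size=n)
z2 = rng.binomial(1, 0.5, size=n)

# Case 1 errors, recentered so that the tau-th quantile of eps(tau) is zero
eps = rng.normal(size=n) - norm.ppf(tau)

functional_part = np.trapz(X * beta_t, t, axis=1)                 # int_0^1 X_i(t) beta(t) dt
Y = 2.0 * z1 + 1.0 * z2 + functional_part + eps
```

For Cases 2 and 3, `rng.normal(size=n)` and `norm.ppf` would be replaced by the \(t(3)\) and Cauchy counterparts.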

We consider three cases for generating random error \(\varepsilon \).

  • Case 1. \(\varepsilon \) follows a standard normal distribution.

  • Case 2. \(\varepsilon \) follows a \(t(3)\) distribution. This yields a model with heavy-tailed errors.

  • Case 3. \(\varepsilon \) follows a standard Cauchy distribution. This yields a model in which the expectation of the response does not exist.

Throughout our numerical studies, we choose the number of eigenfunctions as the minimizer of the following Schwarz-type information criterion,

$$\begin{aligned} \text{ SIC }(m)=\log \left\{ \sum \limits _{i=1}^n\rho _\tau \left( Y_i-{{\varvec{z}}}_i^T \hat{{\varvec{\theta }}}_{(m)}-{{\varvec{U}}}_i \hat{{\varvec{\gamma }} }_{(m)} \right) \right\} +\frac{\log (n)}{2n}(m+p), \end{aligned}$$

where \(p=2\), and \({\hat{\varvec{\theta }}}_{(m)}\) and \({\hat{\varvec{\gamma }}}_{(m)}\) are the \(\tau \)th quantile estimators obtained by minimizing (5) with \(m\) eigenfunctions; see He et al. (2002) and Wang et al. (2009) for similar criteria for tuning parameter selection. By Condition C3, the optimal \(m\) should be of the same order as \(n^{1/(a+2b)}\) with \(a>1\) and \(b>a/2+1\). Similar to Zhang and Liang (2011), we propose to choose the optimal truncation number \(m\) from a neighborhood of \(n^{1/5.5}\). In our simulation studies, we have used \([2/3Nr, 4/3Nr]\), where \(Nr =\text{ ceiling }(n^{1/5.5})\) and ceiling\((\cdot )\) returns the smallest integer not less than its argument. The optimal truncation number \(m_{opt}\) is then the minimizer of the SIC value, that is

$$\begin{aligned} m_{opt}=\arg \min \limits _{m\in [2/3Nr,4/3Nr]}\text{ SIC } (m). \end{aligned}$$
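The selection rule above can be sketched in a few lines. The summed check losses below are hypothetical placeholders for the values that would be obtained by refitting (5) at each candidate \(m\):

```python
import math

def sic(loss_sum, n, m, p):
    """Schwarz-type criterion: log of the summed check loss plus log(n)/(2n)*(m+p)."""
    return math.log(loss_sum) + math.log(n) / (2.0 * n) * (m + p)

n, p = 100, 2
Nr = math.ceil(n ** (1.0 / 5.5))                          # ceiling(n^(1/5.5)) = 3 for n = 100
candidates = range(math.ceil(2.0 * Nr / 3.0), math.floor(4.0 * Nr / 3.0) + 1)

# Hypothetical summed check losses; in practice each value comes from
# re-minimizing (5) with m estimated eigenfunctions
loss = {2: 55.0, 3: 41.0, 4: 40.5}

m_opt = min(candidates, key=lambda m: sic(loss[m], n, m, p))
```

The \(\log (n)/(2n)\) penalty makes large \(m\) unattractive unless the check loss drops appreciably, so the criterion balances fit against the number of eigenfunctions.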

Based on 1,000 replications, Table 1 summarizes the bias (Bias) and mean squared error (MSE) of the estimates of \({\varvec{\theta }}\) for different sample sizes, quantile levels, and error cases. The bias is reasonably small in general. We may conclude that this simulation study provides strong evidence in support of the asymptotic theory derived in Sect. 4.

Table 1 Simulations results

Figure 1 shows the boxplots for the standard normal random error with different sample sizes and quantile levels. For the \(t\) distribution with 3 degrees of freedom and the standard Cauchy random error, the figures are similar to those for the standard normal distribution; to save space, we omit them. The boxplots suggest that the proposed method is consistent.

Fig. 1
figure 1

The boxplots of resulting estimator of \(\hat{\theta }-\theta _0\) at different quantile levels with different sample sizes for standard normal random error

Figures 2, 3 and 4 demonstrate the performance of the estimation of the slope function \(\beta (\cdot )\) for the different cases under different quantile levels with \(n=100\), and show that the estimated curves are very close to the true curve \(\beta (\cdot )\). We may conclude that the proposed estimator of the function \(\beta (\cdot )\) performs reasonably well.

Fig. 2
figure 2

The true \(\beta (t)\) (blue) and \(\hat{\beta }(t)\) (red) for standard normal random error with \(\text{ n }=100\) (color figure online)

Fig. 3
figure 3

The true \(\beta (t)\) (blue) and \(\hat{\beta }(t)\) (red) for standard Cauchy random error with \(\text{ n }\,=\,100\) (color figure online)

Fig. 4
figure 4

The true \(\beta (t)\) (blue) and \(\hat{\beta }(t)\) (red) for t(3) random error with \(\text{ n }=100\) (color figure online)