1 Introduction

Quantile regression (QR), introduced by Koenker and Bassett (1978), has been widely used to estimate the conditional quantiles of a response variable distribution. QR thus provides a much more comprehensive picture of the conditional distribution of a response variable than the conditional mean function alone. Furthermore, QR is a useful and robust statistical method for estimating and conducting inference about models for conditional quantile functions (Yu et al. 2003). QR has been applied in many different areas, including medicine (Cole and Green 1992; Heagerty and Pepe 1999), survival analysis (Ying et al. 1995; Koenker and Geling 2001; Shim and Hwang 2009), econometrics (Hendricks and Koenker 1992; Koenker and Hallock 2001; Shim et al. 2011), and growth charts (Wei and He 2006).

To address the curse of dimensionality in regression, the additive model of Breiman and Friedman (1985) and the varying coefficient (VC) model of Hastie and Tibshirani (1993) have been proposed. It is well known that the general form of the VC model includes the additive model as a special case. VC models constitute an important class of nonparametric models, yet they inherit the simplicity and easy interpretation of classical linear models. Introductions, various applications, and current research directions for VC models can be found in Hastie and Tibshirani (1993), Hoover et al. (1998), Fan and Zhang (2008), and Park et al. (2015). Recently, QR with VCs has been studied. Honda (2004) considered the estimation of conditional quantiles in VC models, estimating the coefficients by local \(L_1\) regression. Kim (2007) also considered conditional quantiles with VCs and proposed a methodology for their estimation and assessment using polynomial splines. Cai and Xu (2008) considered QR with VCs for a time series model, using local polynomial schemes to estimate the coefficients. In this paper, we propose a support vector quantile regression (SVQR) with VCs and two estimation methods for it, which can be applied effectively to high-dimensional cases. This is the first article to deal with SVQR with VCs. We note that we do not address the quantile crossing problem in this paper.

The support vector machine (SVM), first developed by Vapnik (1995) and his group at AT&T Bell Laboratories, has been successfully applied to a number of real-world classification and regression problems. Takeuchi and Furuhashi (2004) first considered QR by SVM. Li et al. (2007) proposed an SVQR using quadratic programming (QP) and derived a simple formula for the effective dimension of the SVQR, which allows convenient selection of hyperparameters. Shim and Hwang (2009) considered a modified SVQR using an iteratively reweighted least squares (IRWLS) procedure.

In this paper we present an SVQR with nonlinear coefficient functions and two estimation methods for it: one uses QP and the other an IRWLS procedure based on a modified check function. The IRWLS procedure makes it possible to derive a generalized cross validation (GCV) method for choosing hyperparameters and to construct pointwise confidence intervals for the coefficient functions. We also investigate the performance of the SVQR estimators through numerical studies. The rest of this paper is organized as follows. Section 2 introduces two versions of SVQR with VCs. Sections 3 and 4 present our numerical studies and conclusions, respectively.

2 SVQR with VCs

In this section we propose two versions of SVQR with VCs and their hyperparameter selection procedures.

2.1 SVQR with VCs using QP

We now describe SVQR with VCs using QP and its hyperparameter selection procedure. In this section we adopt a dimension-reduction modeling method, the VC modeling approach, to explore dynamic patterns.

We assume the \(\theta \)th QR with VCs takes the form

$$\begin{aligned} q_{\theta }\left( \varvec{x}_{i}, \varvec{u}_{i}\right) = \sum _{k=0}^{d_x} x_{ik} \beta _{k,\theta }\left( \varvec{u}_{i}\right) = \varvec{\beta }^t_\theta \left( \varvec{u}_i\right) \varvec{x}_i , \end{aligned}$$
(1)

where the superscript t denotes the transpose, \(\varvec{u}_{i}\) is the vector of smoothing variables, \(\varvec{x}_{i} =(x_{i0}, x_{i1}, \ldots , x_{i d_x})^t\) with \(x_{i0} \equiv 1\) is the input vector, \(\{\beta _{k,\theta }(\cdot )\}\) are smooth coefficient functions, and \(\varvec{\beta }_\theta (\varvec{u}_i)=(\beta _{0,\theta }(\varvec{u}_i), \ldots , \beta _{{d_x},\theta }(\varvec{u}_i))^t\). Here all of the \(\{\beta _{k,\theta }(\cdot )\}\) are allowed to depend on \(\theta \); for simplicity, we drop \(\theta \) from \(\{\beta _{k,\theta }(\cdot )\}\) hereafter. The QR model (1) has been widely used to analyze conditional quantiles due to its flexibility and interpretability. In fact, this model constitutes an important class of nonparametric models.

We first estimate the coefficients \(\{\beta _{k}(\cdot )\}\) using the basic principle of SVQR based on the training data set \(\mathcal{D}=\{ ( \varvec{x}_{i} , \varvec{u}_{i} , y_{i})\}_{i=1}^{n}\). Then we estimate the conditional quantile \(q_{\theta }(\cdot , \cdot )\) in the VC model through the estimated coefficients. For the SVQR with VCs we assume that each coefficient function \(\beta _k (\varvec{u}_{i})\) is nonlinearly related to the smoothing variables \(\varvec{u}_{i}\) such that \(\beta _{k}({\varvec{u}}_{i})=\varvec{w}_{k}^t \varvec{\phi }(\varvec{u}_{i})+b_{k}\) for \( k=0, \ldots , d_x\), where \(\varvec{w}_{k}\) is a corresponding weight vector of size \(d_f \times 1\). Here the nonlinear feature mapping function \(\varvec{\phi }: R^{d_u} \rightarrow R^{d_f}\) maps the input space to a higher dimensional feature space, where the dimension \(d_f\) is defined in an implicit way. An inner product in feature space has an equivalent kernel in input space, \(\varvec{\phi }(\varvec{u}_{i})^t \varvec{\phi }(\varvec{u}_{j}) = K(\varvec{u}_{i},\varvec{u}_{j})\), provided certain conditions hold (Mercer 1909). Among several kernel functions, in this paper we use the Gaussian, polynomial, and Epanechnikov kernels, defined respectively as

$$\begin{aligned} K \left( \varvec{u}_{i}, \varvec{u}_{j}\right)= & {} \exp \left( - \Vert \varvec{u}_{i} - \varvec{u}_{j} \Vert ^2 / 2 \sigma ^2 \right) , \\ K \left( \varvec{u}_{i}, \varvec{u}_{j} \right)= & {} \left( 1 + \varvec{u}_{i}^{t} \varvec{u}_{j} \right) ^d , \quad i, j =1, \ldots , n, \\ K\left( \varvec{u}_i, \varvec{u}_j \right)= & {} 0.75 \left( 1- \Vert \frac{\varvec{u}_i - \varvec{u}_j}{h} \Vert ^{2}\right) I\left( \Vert \frac{\varvec{u}_i - \varvec{u}_j}{h} \Vert < 1\right) , \end{aligned}$$

where \(\sigma \), h and d are kernel parameters.
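For concreteness, the following Python sketch shows one way to evaluate these three kernels on samples of smoothing variables; the vectorized pairwise computation and the function names are our own illustration, not part of the original method.

```python
import numpy as np

def gaussian_kernel(U1, U2, sigma):
    """K(u_i, u_j) = exp(-||u_i - u_j||^2 / (2 sigma^2)) for all pairs."""
    sq = np.sum(U1**2, 1)[:, None] + np.sum(U2**2, 1)[None, :] - 2 * U1 @ U2.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def polynomial_kernel(U1, U2, d):
    """K(u_i, u_j) = (1 + u_i' u_j)^d."""
    return (1.0 + U1 @ U2.T) ** d

def epanechnikov_kernel(U1, U2, h):
    """K(u_i, u_j) = 0.75 (1 - ||(u_i - u_j)/h||^2) I(||(u_i - u_j)/h|| < 1)."""
    sq = np.sum(U1**2, 1)[:, None] + np.sum(U2**2, 1)[None, :] - 2 * U1 @ U2.T
    r2 = np.maximum(sq, 0.0) / h**2
    return 0.75 * (1.0 - r2) * (r2 < 1.0)
```

Here U1 and U2 are arrays of shapes \((n_1, d_u)\) and \((n_2, d_u)\), so each function returns the \(n_1 \times n_2\) matrix of kernel values.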

Then, using the basic principle of SVQR, the coefficient estimators \(\{\hat{\beta }_{k}(\cdot )\}\) of SVQR with VCs can be obtained by minimizing the following objective function,

$$\begin{aligned} L = {1 \over 2}\sum _{k=0}^{d_x} \left\| \varvec{w}_{k} \right\| ^{2}+{C} \sum _{i=1}^{n} \rho _{\theta }\left( y_{i}-\sum _{k=0}^{d_x} x_{ik}\left( \varvec{w}_{k}^t \varvec{\phi }\left( \varvec{u}_{i}\right) +b_{k}\right) \right) , \end{aligned}$$
(2)

where \(\rho _\theta (r) = \theta r I(r \ge 0 ) -(1- \theta ) r I(r<0)\) is the check function with the indicator function \(I (\cdot )\), and \(C>0\) is a penalty parameter which controls the trade-off between the smoothness of the QR estimator and its fidelity to the data.
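A direct vectorized implementation of the check function is immediate; the helper below is our illustration.

```python
import numpy as np

def check_loss(r, theta):
    """rho_theta(r) = theta*r*I(r >= 0) - (1-theta)*r*I(r < 0), elementwise."""
    r = np.asarray(r, dtype=float)
    return np.where(r >= 0, theta * r, -(1.0 - theta) * r)
```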

We can express the optimization problem (2) by the formulation for SVQR as follows:

$$\begin{aligned} L={1 \over 2}\sum _{k=0}^{d_x} \left\| \varvec{w}_{k} \right\| ^{2} + C \theta \sum _{i=1}^{n}\xi _{i} +C(1-\theta )\sum _{i=1}^{n}\xi _{i}^* \end{aligned}$$

subject to

$$\begin{aligned} \left\{ \begin{array}{l} y_{i}-\sum _{k=0}^{d_x}x_{ik}\left( \varvec{w}_{k}^t\varvec{\phi }(\varvec{u}_{i}) + b_{k}\right) \le \xi _i, \\ \,\\ - y_{i}+\sum _{k=0}^{d_x}x_{ik}\left( \varvec{w}_{k}^t \varvec{\phi }(\varvec{u}_{i}) + b_{k}\right) \le \xi _i^*, \quad i=1,\ldots ,n. \\ \end{array} \right. \end{aligned}$$

We construct a Lagrange function as follows:

$$\begin{aligned} L= & {} {1 \over 2}\sum _{k=0}^{d_x} \Vert \varvec{w}_{k} \Vert ^{2}+C \theta \sum _{i=1}^{n}\xi _{i} +C (1-\theta )\sum _{i=1}^{n} \xi _{i}^* \nonumber \\&-\,\sum _{i=1}^{n} \alpha _{i}\left( \xi _{i}-y_{i}+\sum _{k=0}^{d_x}x_{ik}(\varvec{w}_{k}^t \varvec{\phi }(\varvec{u}_{i}) + b_{k})\right) \nonumber \\&-\,\sum _{i=1}^{n}\alpha _{i}^{*}\left( \xi _{i}^{*}+y_{i}-\sum _{k=0}^{d_x}x_{ik}(\varvec{w}_{k}^t \varvec{\phi }(\varvec{u}_{i}) + b_{k})\right) \nonumber \\&-\,\sum _{i=1}^{n}\eta _{i}\xi _{i} -\sum _{i=1}^{n}\eta _{i}^{*}\xi _{i}^*. \end{aligned}$$
(3)

The Lagrange multipliers must satisfy the non-negativity constraints \(\alpha _{i}^{(*)},\eta _{i}^{(*)} \ge 0\). Taking partial derivatives of Eq. (3) with respect to the primal variables \((\varvec{w}_{k}, \xi _{i}^{(*)},b_k)\), we have

$$\begin{aligned}&{{\partial L} \over {\partial \varvec{w}_{k}}}= \varvec{0} \Rightarrow \varvec{w}_{k}= \sum _{i=1}^{n} x_{ik} \varvec{\phi }(\varvec{u}_{i})(\alpha _{i}-\alpha _{i}^{*}),\quad k=0, 1, \ldots , d_x , \nonumber \\&{{\partial L} \over {\partial \xi _{i}}}= 0 \Rightarrow C \theta = \alpha _{i}+\eta _{i}, \quad i=1,\ldots ,n, \nonumber \\&{{\partial L}\over {\partial {\xi }_{i}^*}}={ 0} \Rightarrow C(1-\theta )= \alpha _{i}^{*}+\eta _{i}^*,\quad i=1,\ldots ,n, \nonumber \\&{{\partial L}\over {\partial b_{k}}}={ 0} \Rightarrow \sum _{i=1}^{n}x_{ik}\left( \alpha _{i}-\alpha _{i}^{*}\right) =0, \quad k=0, 1, \ldots , d_x. \end{aligned}$$

Plugging the above results into Eq. (3), we obtain the dual optimization problem: maximize

$$\begin{aligned} -{1 \over 2} \sum _{i,j=1}^{n}\left( \alpha _{i}-\alpha _{i}^{*}\right) \left( \alpha _{j}-\alpha _{j}^{*}\right) \sum _{k=0}^{d_x}x_{ik}x_{jk} K(\varvec{u}_{i},\varvec{u}_{j})+\sum _{i=1}^{n}y_{i}\left( \alpha _{i}-\alpha _{i}^{*}\right) \end{aligned}$$
(4)

subject to

$$\begin{aligned} \left\{ \begin{array}{l} \sum _{i=1}^{n}x_{ik}\left( \alpha _{i}-\alpha _{i}^{*}\right) =0,\quad k=0,1, \ldots , d_x, \\ \,\\ 0 \le \alpha _{i} \le C \theta , \; 0 \le \alpha _{i}^{*} \le C (1-\theta ),\quad i=1, \ldots , n. \end{array} \right. \end{aligned}$$

We notice that this SVQR with VCs works by solving a constrained QP problem.
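To make this concrete, the sketch below stacks \((\varvec{\alpha }, \varvec{\alpha }^*)\) into a single variable and passes problem (4) to a generic QP solver (here cvxopt); the stacking, the small ridge added for numerical stability, and the function interface are our assumptions rather than the authors' implementation.

```python
import numpy as np
from cvxopt import matrix, solvers

def svqr_vc_qp_dual(X, K, y, theta, C):
    """Solve the dual QP (4): X is n x (d_x+1) with first column 1's,
    K is the n x n kernel matrix on the smoothing variables.
    Returns (alpha, alpha_star)."""
    n, p = X.shape
    y = np.asarray(y, dtype=float)
    G = (X @ X.T) * K                       # sum_k x_ik x_jk K(u_i, u_j)
    # z = (alpha, alpha*), so alpha - alpha* = [I, -I] z.
    P = np.block([[G, -G], [-G, G]])
    P += 1e-8 * np.eye(2 * n)               # small ridge, our numerical safeguard
    q = np.concatenate([-y, y])             # minimize (1/2) z'Pz + q'z
    A = np.hstack([X.T, -X.T])              # one equality constraint per k
    b = np.zeros(p)
    # Box constraints: 0 <= alpha <= C*theta, 0 <= alpha* <= C*(1-theta).
    Gin = np.vstack([-np.eye(2 * n), np.eye(2 * n)])
    h = np.concatenate([np.zeros(2 * n),
                        C * theta * np.ones(n),
                        C * (1 - theta) * np.ones(n)])
    sol = solvers.qp(matrix(P), matrix(q), matrix(Gin), matrix(h),
                     matrix(A), matrix(b))
    z = np.array(sol['x']).ravel()
    return z[:n], z[n:]
```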

Solving the QP problem (4) subject to these constraints determines the optimal Lagrange multipliers \((\hat{\alpha }_{i}, \hat{\alpha }_{i}^{*})\). Thus, for a given \((\varvec{x}_{t},\varvec{u}_{t})\) the coefficient function estimator of SVQR with VCs using QP takes the form:

$$\begin{aligned} {\hat{\beta }}_{k}(\varvec{u}_{t})=\sum _{i=1}^{n}x_{ik}K(\varvec{u}_{t},\varvec{u}_{i})\left( {\hat{\alpha }}_{i}-{\hat{\alpha }}_{i}^{*}\right) +{\hat{b}}_{k}, \end{aligned}$$

and the QR function estimator takes the form:

$$\begin{aligned} {\hat{q}_\theta } (\varvec{x}_t,\varvec{u}_t) = \sum _{i=1}^n \sum _{k=0}^{d_x} x_{tk} x_{ik}K(\varvec{u}_t,\varvec{u}_i) \left( {\hat{\alpha }}_i-{\hat{\alpha }}_i^*\right) + \sum _{k=0}^{d_x} x_{tk}{\hat{b}}_k. \end{aligned}$$

We remark that \((\varvec{x}_{t},\varvec{u}_{t})\) can be an observation in the training data set or a new observation. Here \({\hat{b}}_k\) for \(k=0, 1, \ldots , d_x\) is obtained via the Kuhn–Tucker conditions (Kuhn and Tucker 1951) as

$$\begin{aligned} \left( \begin{array}{c} {\hat{b}}_0 \\ {\hat{b}}_1 \\ \vdots \\ {\hat{b}}_{d_x} \\ \end{array} \right) = \left( \varvec{X}_s^t \varvec{X}_s \right) ^{-1} \varvec{X}_{s}^{t} {\varvec{y}}_{s}, \end{aligned}$$
(5)

where \(\varvec{X}_{s}\) is an \(n_s \times (d_{x}+1)\) matrix with ith row \(\varvec{x}_{i}^t\) for \(i \in I_s= \{ i=1,\ldots ,n | 0< \alpha _i< C \theta , 0< \alpha _i^* < C (1-\theta ) \} \), \({\varvec{y}}_s\) is an \(n_s \times 1\) vector with ith element \(\left( y_i-\sum _{j=1}^n\sum _{k=0}^{d_x} x_{ik} x_{jk} K(\varvec{u}_i,\varvec{u}_j)\left( {\hat{\alpha }}_j-{\hat{\alpha }}_j^*\right) \right) \) for \(i \in I_{s}\) and \(n_s\) is the size of \(I_s\).
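A sketch of the computation of the \(\hat{b}_k\)'s in (5) follows; the tolerance used to decide which multipliers lie strictly inside their bounds is an implementation choice of ours.

```python
import numpy as np

def bias_terms(X, K, y, alpha, alpha_star, theta, C, tol=1e-6):
    """Recover (b_0, ..., b_{d_x}) from (5), assuming X_s'X_s is invertible."""
    da = alpha - alpha_star
    fit = ((X @ X.T) * K) @ da   # sum_j sum_k x_ik x_jk K(u_i,u_j)(a_j - a_j*)
    # Index set I_s: points whose multipliers are strictly inside the box.
    in_bounds = ((alpha > tol) & (alpha < C * theta - tol)) | \
                ((alpha_star > tol) & (alpha_star < C * (1 - theta) - tol))
    Xs, ys = X[in_bounds], (y - fit)[in_bounds]
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
```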

We now consider the hyperparameter selection problem, that is, the choice of appropriate hyperparameters for the proposed SVQR with VCs using QP. The functional structure of the SVQR with VCs using QP is characterized by hyperparameters such as the regularization parameter C and the kernel parameter \(\gamma \in \{\sigma , h, d\}\). To choose the hyperparameter values of the SVQR with VCs using QP, we first consider the cross validation (CV) function as follows:

$$\begin{aligned} CV(\varvec{\lambda })={\sum _{i=1}^n \rho _\theta \left( y_i- {\hat{q}}_{\theta }^{(-i)} (\varvec{x}_{i},\varvec{u}_{i} ) \right) }, \end{aligned}$$

where \(\varvec{\lambda }=(C, \gamma )\) is the set of hyperparameters, and \({\hat{q}}_{\theta }^{(-i)} (\varvec{x}_{i},\varvec{u}_{i} )\) is the \(\theta \)th QR function estimated without the ith observation. Since \({\hat{q}}_{\theta }^{(-i)} (\varvec{x}_{i},\varvec{u}_{i} )\) must be evaluated for every \(i=1,\ldots ,n\) and every candidate set of hyperparameters, selecting hyperparameters with the CV function is computationally formidable. Following Yuan (2006), a generalized approximate cross validation (GACV) function for selecting the set of hyperparameters \(\varvec{\lambda }\) of SVQR with VCs using QP is given as follows:

$$\begin{aligned} GACV(\varvec{\lambda })={ {\sum _{i=1}^{n} \rho _{\theta }\left( y_{i}-{\hat{q}}_{\theta }\left( \varvec{x}_{i},\varvec{u}_{i}\right) \right) } \over {n-df}}, \end{aligned}$$

where df is a measure of the effective dimensionality of the fitted model. In this paper we use \(df = n_s \), the size of the index set \(I_s\) appearing in (5), following Li et al. (2007). Another common criterion is the Schwarz information criterion (SIC) (Schwarz 1978; Koenker et al. 1994):

$$\begin{aligned} SIC(\varvec{\lambda }) = \ln \left( {1 \over n} \sum _{i=1}^n \rho _{\theta } \left( y_i -{\hat{q}}_{\theta }\left( \varvec{x}_{i},\varvec{u}_{i}\right) \right) \right) + {{\ln n} \over {2n}} df. \end{aligned}$$
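Both criteria are inexpensive once the fit is available; the helpers below (our illustration) compute them from the residuals and df.

```python
import numpy as np

def check_loss(r, theta):
    return np.where(r >= 0, theta * r, -(1.0 - theta) * r)

def gacv(y, qhat, theta, df):
    """GACV criterion with df = n_s as in Li et al. (2007)."""
    return check_loss(y - qhat, theta).sum() / (len(y) - df)

def sic(y, qhat, theta, df):
    """Schwarz information criterion for a quantile fit."""
    n = len(y)
    return np.log(check_loss(y - qhat, theta).mean()) + np.log(n) / (2 * n) * df
```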

2.2 SVQR with VCs using IRWLS

We now describe SVQR with VCs using the IRWLS procedure and its hyperparameter selection procedure. This method enables us to derive a GCV function for selecting hyperparameters and to obtain the variance of \({\hat{\beta }}_{k}(\varvec{u}_{t})\), so that an approximate pointwise confidence interval for \({\beta }_{k}(\varvec{u}_{t})\) can be constructed.

The check function \(\rho _\theta (\cdot )\) used in SVQR with VCs using QP can be expressed as a weighted quadratic loss,

$$\begin{aligned} \rho _\theta (r)=\upsilon (\theta )r^2, \end{aligned}$$

where the weight \(\upsilon (\theta )=( {\theta } I{(r \ge 0)} + (1-\theta ) I{(r <0)})/|r|\) depends on the residual \(r\) as well as on \(\theta \). Now the optimization problem (2) becomes the problem of finding the \((\varvec{w}_k , b_k)\)'s which minimize

$$\begin{aligned} L={1 \over 2} \sum _{k=0}^{d_x} \Vert \varvec{w}_{k} \Vert ^{2}+{C \over 2}\sum _{i=1}^{n}\upsilon _i(\theta ) \left( y_{i}-\sum _{k=0}^{d_x}x_{ik}\left( \varvec{w}_{k}^t \varvec{\phi }(\varvec{u}_{i}) + b_{k}\right) \right) ^2, \end{aligned}$$
(6)

where \(\upsilon _i(\theta )= ( {\theta } I{(e_i \ge 0)} + (1-\theta ) I{(e_i <0)})/|e_i|\) with \(e_i=y_{i}-\sum _{k=0}^{d_x}x_{ik}(\varvec{w}_{k}^t \varvec{\phi }(\varvec{u}_{i}) + b_{k})\) and \(C>0\) is a penalty parameter.

We can express the optimization problem (6) in the weighted least squares SVM formulation as follows:

$$\begin{aligned} L={1\over 2}\sum _{k=0}^{d_x} \Vert \varvec{w}_{k} \Vert ^{2}+{C \over 2}\sum _{i=1}^{n}\upsilon _i(\theta ) e_{i}^2 \end{aligned}$$

subject to

$$\begin{aligned} y_{i}-\sum _{k=0}^{d_x}x_{ik}\left( \varvec{w}_{k}^t \varvec{\phi }(\varvec{u}_{i}) + b_{k}\right) = e_i,\quad i=1,\ldots ,n. \end{aligned}$$

We construct a Lagrange function as follows:

$$\begin{aligned} L = {1 \over 2 } \sum _{k=0}^{d_x} \Vert \varvec{w}_{k} \Vert ^{2} + {C \over 2}\sum _{i=1}^{n}\upsilon _i(\theta ) e_{i}^2 -\sum _{i=1}^{n}\alpha _{i}\left( e_{i}-y_{i}+ \sum _{k=0}^{d_x}x_{ik}\left( \varvec{w}_{k}^t \varvec{\phi }(\varvec{u}_{i}) + b_{k}\right) \right) , \end{aligned}$$
(7)

where the \(\alpha _i\)’s are Lagrange multipliers. Taking partial derivatives of Eq. (7) with respect to \((\varvec{w}_{k}, b_k , e_i , \alpha _{i})\), we have

$$\begin{aligned}&{{\partial L}\over {\partial \varvec{w}_{k}} }= \varvec{0} \Rightarrow \varvec{w}_{k}=\sum _{i=1}^{n} x_{ik} \varvec{\phi }(\varvec{u}_{i}) \alpha _{i},\quad k=0,\ldots , d_x , \\&{{\partial L}\over {\partial b_{k}}}=0 \Rightarrow \sum _{i=1}^n x_{ik} \alpha _{i}=0,\quad k=0,\ldots , d_x , \\&{{\partial L}\over {\partial e_{i}}}= 0 \Rightarrow C \upsilon _i(\theta ) e_i -\alpha _{i}=0,\quad i=1,\ldots ,n, \\&{{\partial L}\over {\partial \alpha _{i}}}= 0 \Rightarrow e_{i}-y_{i}+\sum _{k=0}^{d_x}x_{ik}(\varvec{w}_{k}^t\varvec{\phi }(\varvec{u}_{i}) + b_{k})=0,\quad i=1,\ldots ,n. \end{aligned}$$

After eliminating the \(e_i\)’s and \(\varvec{w}_k\)’s, we obtain the optimal values of the \(\alpha _i\)’s and \(b_k\)’s from the linear system as follows:

$$\begin{aligned} \left( \begin{array}{cc} \varvec{X}\varvec{X}^t \odot \varvec{K} + \frac{1}{C} \varvec{V}(\theta )^{-1} &{}\quad \varvec{X}\\ \varvec{X}^t &{}\quad \varvec{0}_{(d_x+1) \times (d_x+1)} \end{array} \right) \left( \begin{array}{c} \varvec{\alpha }\\ \varvec{b}\end{array} \right) = \left( \begin{array}{c} \varvec{y}\\ \varvec{0}_{(d_{x}+1) \times 1} \end{array} \right) \end{aligned}$$
(8)

where \(\varvec{X}=(\varvec{x}_1, \ldots , \varvec{x}_n)^t\), \(\varvec{K}\) is an \( n \times n \) kernel matrix with \((i,j)\hbox {th}\) element \(K(\varvec{u}_{i},\varvec{u}_{j})\), \(\varvec{V}(\theta )\) is an \( n \times n \) diagonal matrix with diagonal elements \(\upsilon _{i}(\theta )\), \(\varvec{0}_{p \times q}\) is a \(p \times q\) zero matrix, \({\varvec{\alpha }}=({\alpha }_{1}, \ldots , {\alpha }_{n})^t\), \({\varvec{b}}=(b_{0}, \ldots , b_{d_x})^t\) and \(\odot \) denotes the componentwise product. We notice that the solution to (8) cannot be obtained in a single step since \(\varvec{V}(\theta )\) depends on \((\varvec{\alpha }, \varvec{b})\); this leads us to apply the IRWLS procedure, which starts from initial values of (\(\varvec{\alpha }, \varvec{b}\)).
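A minimal sketch of the IRWLS iteration for system (8) follows, starting from zero initial values as in Section 3; the residual floor eps and the fixed iteration count are our choices, introduced to keep the weights \(\upsilon _i(\theta )\) finite near zero residuals.

```python
import numpy as np

def svqr_vc_irwls(X, K, y, theta, C, n_iter=50, eps=1e-4):
    """Iteratively solve system (8); returns (alpha, b)."""
    n, p = X.shape
    G = (X @ X.T) * K                        # XX' (.) K in (8)
    alpha, b = np.zeros(n), np.zeros(p)
    for _ in range(n_iter):
        e = y - G @ alpha - X @ b            # current residuals
        w = np.where(e >= 0, theta, 1.0 - theta)
        v_inv = np.maximum(np.abs(e), eps) / w   # diagonal of V(theta)^{-1}
        A = np.block([[G + np.diag(v_inv) / C, X],
                      [X.T, np.zeros((p, p))]])
        sol = np.linalg.solve(A, np.concatenate([y, np.zeros(p)]))
        alpha, b = sol[:n], sol[n:]
    return alpha, b
```

In practice one would monitor the change in \((\varvec{\alpha }, \varvec{b})\) between iterations and stop at convergence; we use a fixed iteration count only to keep the sketch short.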

Solving the linear system (8) determines the optimal Lagrange multipliers \(\hat{\alpha }_{i}\)’s and bias terms \(\hat{b}_{k}\)’s. Thus, for a given \((\varvec{x}_{t},\varvec{u}_{t})\) the coefficient function estimator of SVQR with VCs using IRWLS takes the form:

$$\begin{aligned} {\hat{\beta }}_{k}(\varvec{u}_{t})=\sum _{i=1}^{n}x_{ik}K(\varvec{u}_{t},\varvec{u}_{i}) {\hat{\alpha }}_{i} +{\hat{b}}_{k}, \end{aligned}$$
(9)

and the QR function estimator takes the form:

$$\begin{aligned} {\hat{q}_\theta } (\varvec{x}_t,\varvec{u}_t) = \sum _{i=1}^n \sum _{k=0}^{d_x} x_{tk} x_{ik}K(\varvec{u}_t,\varvec{u}_i) {\hat{\alpha }}_i + \sum _{k=0}^{d_x} x_{tk}{\hat{b}}_k. \end{aligned}$$
(10)

To construct confidence intervals for \(\beta _{k}(\varvec{u}_{t})\) and \({q}_{\theta }(\varvec{x}_{t},\varvec{u}_{t})\), we express \(\hat{\beta }_{k}(\varvec{u}_{t})\) and \({\hat{q}}_{\theta }(\varvec{x}_{t},\varvec{u}_{t})\) as linear combinations of \(\varvec{y}\) in what follows. From (8) we can express \(\hat{\beta }_{k}(\varvec{u}_{t})\) as follows:

$$\begin{aligned} {\hat{\beta }}_{k}(\varvec{u}_{t})= & {} \left( \varvec{x}_{(k)}^t \odot \varvec{k}_{t}, {\varvec{\nu }}_{d_x +1}^t(k)\right) \varvec{M} \varvec{y}\nonumber \\= & {} \varvec{s}_{k}(\varvec{u}_{t}) \varvec{y}, \end{aligned}$$
(11)

where \(\varvec{s}_{k}(\varvec{u}_{t})=( \varvec{x}_{(k)}^t \odot \varvec{k}_{t}, {\varvec{\nu }}_{d_x +1}^{t} (k)) \varvec{M}\), \(\varvec{x}_{(k)}\) is the \((k+1)\)th column of \(\varvec{X}\), \(\varvec{k}_{t} = (K(\varvec{u}_t , \varvec{u}_1),\ldots , K(\varvec{u}_t , \varvec{u}_n))\), \(\varvec{\nu }_{d_x +1}(k)\) is a vector of length \(d_{x}+1\) whose entries are all 0 except for a 1 in the \((k+1)\)th position, and \(\varvec{M}\) is the \((n+d_{x}+1) \times n\) submatrix formed by the first \(n\) columns of the inverse of the leftmost matrix in (8). For a point \((\varvec{x}_{t},\varvec{u}_{t})\) we can also express \({\hat{q}}_{\theta }(\varvec{x}_{t},\varvec{u}_{t})\) as follows:

$$\begin{aligned} {\hat{q}}_{\theta }(\varvec{x}_{t},\varvec{u}_{t})= \varvec{h}_{t}(\theta ){\varvec{y}}, \end{aligned}$$

where \(\varvec{h}_{t}(\theta )= \left( (\varvec{x}^t_{t} \varvec{X}^t) \odot \varvec{k}_t, \varvec{x}^t_{t} \right) \varvec{M}\). From (11) we can obtain the estimator of \(Var({\hat{\beta }}_{k}(\varvec{u}_{t}))\) for \(k=0,1,\ldots ,d_x\) as follows:

$$\begin{aligned} \widehat{Var} \left( {\hat{\beta }}_{k}(\varvec{u}_{t})\right) = \varvec{s}_{k}(\varvec{u}_{t}) \hat{\varvec{{\varSigma }}} \varvec{s}_{k}^{t}(\varvec{u}_{t}), \end{aligned}$$
(12)

where \(\hat{\varvec{{\varSigma }}}\) is an estimator of \(Var(\varvec{y})\).

Confidence intervals are very useful for nonparametric inference. There are two types: pointwise confidence intervals and simultaneous confidence intervals. Our interest here is in estimating the coefficient functions rather than the QR function itself, so we illustrate pointwise confidence intervals only for the coefficient functions in the SVQR with VCs using IRWLS; a pointwise confidence interval for the QR function can be derived in the same way. The estimated variance (12) can be used to construct pointwise confidence intervals. Under certain regularity conditions (Shiryaev 1996), the central limit theorem for linear smoothers is valid and we can show asymptotically

$$\begin{aligned} \frac{\hat{\beta }_{k} (\varvec{u}_{t})- E \left( \hat{\beta }_{k} (\varvec{u}_{t}) \right) }{ \sqrt{\widehat{Var} \left( \hat{\beta }_{k} (\varvec{u}_{t}) \right) }} \rightarrow ^{D} N(0,1), \quad k = 0, 1, \ldots , d_{x}, \end{aligned}$$

where \(\rightarrow ^{D}\) denotes convergence in distribution. If the estimator is conditionally unbiased, i.e., \(E ( \hat{\beta }_{k} (\varvec{u}_{t}) ) = \beta _{k} (\varvec{u}_{t})\) for \( k= 0, 1, \ldots , d_{x}\), an approximate \(100(1- \alpha )\%\) pointwise confidence interval takes the form

$$\begin{aligned} \left( \hat{\beta }_{k} (\varvec{u}_{t}) \pm z_{1 - \frac{\alpha }{2}} \sqrt{\widehat{Var} \left( \hat{\beta }_{k} (\varvec{u}_{t}) \right) } \right) , \quad k= 0, 1, \ldots , d_{x}, \end{aligned}$$
(13)

where \(z_{1 - \alpha /2}\) denotes the \((1 - \alpha /2)\)th quantile of the standard normal distribution. In fact, the interval (13) is a confidence interval for \(E ( \hat{\beta }_{k} (\varvec{u}_{t}) )\); it is a confidence interval for \(\beta _{k} (\varvec{u}_{t})\) only under the assumption \(E ( \hat{\beta }_{k} (\varvec{u}_{t}) ) = \beta _{k} (\varvec{u}_{t})\). Thus it is, strictly speaking, a bias-ignored approximate \(100(1- \alpha )\%\) pointwise confidence interval.
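The sketch below assembles the smoother row \(\varvec{s}_k(\varvec{u}_t)\) of (11) and the interval (13), assuming the converged system matrix from (8) and an estimate of \(Var(\varvec{y})\) are available; the function interface is ours.

```python
import numpy as np
from scipy.stats import norm

def beta_ci(X, k, k_t, A, Sigma_hat, y, level=0.95):
    """Pointwise CI (13) for beta_k(u_t).
    X: n x (d_x+1) design; k_t: vector (K(u_t,u_1),...,K(u_t,u_n));
    A: converged (n+d_x+1) x (n+d_x+1) matrix of (8); Sigma_hat: Var(y) estimate."""
    n, p = X.shape
    M = np.linalg.inv(A)[:, :n]              # first n columns, as in (11)
    row = np.concatenate([X[:, k] * k_t, np.eye(p)[k]])
    s = row @ M                              # s_k(u_t)
    beta_hat = s @ y
    se = np.sqrt(s @ Sigma_hat @ s)          # square root of (12)
    z = norm.ppf(0.5 + level / 2.0)          # z_{1 - alpha/2}
    return beta_hat - z * se, beta_hat + z * se
```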

We now consider the hyperparameter selection problem for the proposed SVQR with VCs using IRWLS. To determine the hyperparameter values we first consider the CV function as follows:

$$\begin{aligned} CV(\varvec{\lambda })= {{1}\over {n}}\sum _{i=1}^{n} \upsilon _{i} (\theta ) \left( y_{i}-{\hat{q}}_{\theta }^{(-i)}(\varvec{x}_{i},\varvec{u}_{i}) \right) ^2. \end{aligned}$$

By the leave-one-out lemma of Craven and Wahba (1979),

$$\begin{aligned} \left( { y}_{i}-{\hat{q}}_{\theta }^{(-i)}\left( \varvec{x}_{i},\varvec{u}_{i}\right) \right) - \left( {y}_{i}-{\hat{q}}_{\theta }\left( \varvec{x}_{i},\varvec{u}_{i}\right) \right)= & {} {\hat{q}}_{\theta }\left( \varvec{x}_{i},\varvec{u}_{i}\right) -{\hat{q}}_{\theta }^{(-i)}\left( \varvec{x}_{i},\varvec{u}_{i}\right) \\\simeq & {} {{\partial {\hat{q}}_{\theta }\left( \varvec{x}_{i},\varvec{u}_{i}\right) } \over {\partial y_i}} \left( y_i-{\hat{q}}_{\theta }^{(-i)}\left( \varvec{x}_i,\varvec{u}_i\right) \right) \end{aligned}$$

we have

$$\begin{aligned} \left( y_{i}-{\hat{q}}_{\theta }^{(-i)}\left( \varvec{x}_{i},\varvec{u}_{i}\right) \right) \simeq {{y_{i}-{\hat{q}}_{\theta }\left( \varvec{x}_{i},\varvec{u}_{i}\right) } \over {1- {{\partial {\hat{q}}_{\theta }\left( \varvec{x}_{i},\varvec{u}_{i}\right) } \over {\partial y_{i}}}}}. \end{aligned}$$

Then the ordinary cross validation (OCV) function can be obtained as

$$\begin{aligned} OCV({\varvec{\lambda }})= & {} {{1}\over {n}}\sum _{i=1}^{n} \upsilon _{i}(\theta ) \left( {{y_{i}-{\hat{q}}_{\theta }(\varvec{x}_{i},\varvec{u}_{i})} \over {1- {{\partial {\hat{q}}_{\theta }(\varvec{x}_{i},\varvec{u}_{i})} \over {\partial y_{i}}}}} \right) ^2 \\= & {} {{1} \over {n}} \sum _{i=1}^{n} \upsilon _{i}(\theta ) \left( {{y_{i}-{\hat{q}}_{\theta }(\varvec{x}_{i},\varvec{u}_{i})} \over {1- h_{ii}}} \right) ^2, \end{aligned}$$

where \(h_{ij} = {\partial {\hat{q}}_{\theta }(\varvec{x}_{i},\varvec{u}_{i})}/{\partial y_{j}}\) is the \((i,j)\)th element of the hat matrix \(\varvec{H}\). Replacing each \(h_{ii}\) by the average \(tr(\varvec{H})/n\), the GCV function is obtained as

$$\begin{aligned} GCV(\varvec{\lambda })= {{n \sum _{i=1}^n \upsilon _{i}(\theta ) \left( y_{i} - {\hat{q}}_{\theta }\left( \varvec{x}_i , \varvec{u}_i\right) \right) ^2} \over {\left( n-tr(\varvec{H})\right) ^2}}. \end{aligned}$$
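Given the hat matrix \(\varvec{H}\), the GCV criterion is a one-line computation; the sketch below (ours) floors \(|e_i|\) to keep the weights finite.

```python
import numpy as np

def gcv(y, qhat, theta, H, eps=1e-12):
    """GCV criterion using tr(H) in place of the individual h_ii's."""
    e = y - qhat
    v = np.where(e >= 0, theta, 1.0 - theta) / np.maximum(np.abs(e), eps)
    n = len(y)
    return n * np.sum(v * e**2) / (n - np.trace(H))**2
```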

3 Numerical studies

In this section, we illustrate the performance of the SVQR with VCs using QP and IRWLS with synthetic data and the wage data in Wooldridge (2003). For our numerical studies, we compare the proposed methods with the SVQR of Li et al. (2007) and the local polynomial quantile regression with VCs (LPQRVC) of Cai and Xu (2008). Throughout this paper, we use the Epanechnikov kernel for the LPQRVC and the Gaussian kernel for the SVQR and the SVQR with VCs using QP and IRWLS. For hyperparameter selection we use the CV function for the LPQRVC method, the GCV function for the SVQR with VCs using IRWLS, and the GACV function for the SVQR and the SVQR with VCs using QP. These kernels and criteria are chosen so as to obtain the best performance from each method. The hyperparameters are selected to minimize each criterion by a grid search, as sketched below. The candidate sets for the regularization parameter C and the kernel parameter \(\sigma \) in the SVQR with VCs using QP and IRWLS, and in the SVQR, are \(\{10, 20, 40, 70, 100, 200, 400, 600, 800, 1000, 1200 \}\) and \(\{ 0.5, 1, 2, \ldots , 8 \}\), respectively. The parameter h in the Epanechnikov kernel is selected from the set \(\{0.1, 0.2, \ldots ,1 \}\). We use \(\varvec{0}\)'s as the initial values of \(\varvec{\alpha }\) and \(\varvec{b}\) for the IRWLS procedure associated with the SVQR with VCs using IRWLS.
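The grid search is a plain exhaustive minimization over the candidate sets listed above; in the sketch below, the fit_and_score callback (an interface we assume for illustration) fits the model for one candidate pair and returns the chosen criterion value.

```python
# Candidate sets from Section 3.
C_grid = [10, 20, 40, 70, 100, 200, 400, 600, 800, 1000, 1200]
sigma_grid = [0.5, 1, 2, 3, 4, 5, 6, 7, 8]

def select_hyperparameters(fit_and_score, C_grid, sigma_grid):
    """Return the (C, sigma) pair minimizing the criterion."""
    scores = {(C, s): fit_and_score(C, s) for C in C_grid for s in sigma_grid}
    return min(scores, key=scores.get)
```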

Table 1 Comparison of the MISE and SDISE values for the case that \(e_{i} \sim \; i.i.d. \; N(0, 1)\)

3.1 Synthetic data example

For the synthetic data example we generate \(\{ ( \varvec{x}_{i} , u_{i} , y_{i})\}_{i=1}^{n}\) from the location-scale model,

$$\begin{aligned} y_{i} = \beta _{1}(u_{i}) x_{1i} + \beta _{2}(u_{i}) x_{2i}+ \sigma (u_{i}) e_{i},\quad i=1,\ldots ,n, \end{aligned}$$

where \(\beta _{1}(u_{i})=\mathrm{sin}(\sqrt{2}\pi u_{i})\), \(\beta _{2}(u_{i})=\mathrm{cos}(\sqrt{2}\pi u_{i})\), \(\sigma ( u_{i})= \exp ( \sin (0.5 \pi u_{i} ))\), \(u_{i} \sim \; i.i.d. \; U(0,3)\), \(x_{1i}, x_{2i} \sim \; i.i.d. \; N(1,1)\), and \(e_{i} \sim \; i.i.d. \; N(0, 1)\) or Student’s t with three degrees of freedom. The \(\theta \)th QR is

$$\begin{aligned} q_{\theta }( u_{i}, x_{1i}, x_{2i})= \beta _{0}(u_{i}) + \beta _{1}(u_{i}) x_{1 i} + \beta _{2} ( u_{i}) x_{2 i} , \end{aligned}$$
(14)

where \( \beta _{0}(u_{i}) = \sigma (u_{i}) \Phi ^{-1}(\theta )\) and \(\Phi ^{-1}(\theta )\) is the \(\theta \hbox {th}\) quantile of the standard normal.
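A sketch of this data-generating process follows; the random-seed handling is our choice.

```python
import numpy as np

def generate_data(n, error="normal", seed=None):
    """One synthetic data set from the location-scale model above."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 3.0, n)
    x1, x2 = rng.normal(1.0, 1.0, n), rng.normal(1.0, 1.0, n)
    b1 = np.sin(np.sqrt(2.0) * np.pi * u)
    b2 = np.cos(np.sqrt(2.0) * np.pi * u)
    s = np.exp(np.sin(0.5 * np.pi * u))
    e = rng.standard_normal(n) if error == "normal" else rng.standard_t(3, n)
    y = b1 * x1 + b2 * x2 + s * e
    return u, x1, x2, y
```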

The performance of the estimators \({\hat{q}}_{\theta }\)’s and \(\hat{\beta }_k\)’s is assessed by the mean integrated squared error (MISE) and the standard deviation of the ISEs (SDISE), defined as

$$\begin{aligned} MISE= & {} \frac{1}{N}\sum _{j=1}^{N} ISE_j, \\ SDISE= & {} \left( \frac{1}{N} \sum _{j=1}^{N} \left( ISE_j - MISE \right) ^2 \right) ^{1/2}, \end{aligned}$$

where \(ISE_j = \frac{1}{n} \sum _{i=1}^n ( \hat{f}_i - f_i )^2\), \(f_i = q_{\theta }( u_{i}, \varvec{x}_i)\) or \(\beta _k ( u_{i})\), \(k=0, 1, 2\), for the \(j\hbox {th}\) data set, and n and N are the numbers of observations and data sets, respectively. For our experiment, we generate \(N=100\) data sets, each of sample size \(n=100\), for each \(\theta =0.1, 0.5\) and 0.9.

Table 2 Comparison of the MISE and SDISE values for the case that \(e_{i} \sim \; i.i.d. \; t_{3}\)

Tables 1 and 2 show the MISE and SDISE values of the estimators of \(q_{\theta }\) and \(\beta _k\) for \(\theta =0.1, 0.5, 0.9\) when the distribution of the error term is the standard normal N(0, 1) and Student’s t with three degrees of freedom, respectively. The SDISE values are in parentheses, and boldfaced values indicate the best performance for the given quantity. Table 1 shows that, for the standard normal error distribution, the proposed SVQR with VCs using QP and IRWLS outperform the SVQR and the LPQRVC in estimating all \(q_{\theta }\)’s, and outperform the LPQRVC in estimating all \(\beta _k\)’s. In particular, the SVQR with VCs using IRWLS has the smallest MISE and SDISE values for all \(\theta \)’s. Table 2 shows that, for the \(t_{3}\) error distribution, the SVQR with VCs using IRWLS outperforms the SVQR and the LPQRVC in estimating the \(q_{\theta }\)’s, and outperforms the LPQRVC in estimating the \(\beta _k\)’s, except at \(\theta = 0.5\). At \(\theta = 0.5\) for the \(t_{3}\) error distribution, the SVQR with VCs using QP performs best in estimating \(q_{\theta }\), \(\beta _{1}\) and \(\beta _{2}\), though not \(\beta _{0}\).

Table 3 Estimated coefficients for linear QR for \(\theta =0.1, 0.5\) and 0.9 for the wage data set
Fig. 1 Plots of the estimated coefficient functions by SVQR with VCs using IRWLS (SVQRVCLS) for three quantiles, \(\theta = 0.1\) (solid line), \(\theta = 0.5\) (dashed line) and \(\theta = 0.9\) (dotted line). Top left \(\beta _0 (u)\) versus u, top right \(\beta _1 (u)\) versus u, bottom left \(\beta _2 (u)\) versus u, and bottom right \(\beta _3 (u)\) versus u

3.2 Real data example

For a real example we consider a subset of the wage data set studied in Wooldridge (2003), which consists of observations on 526 working individuals for the year 1976. The dependent variable y is the logarithm of wages in dollars per hour. Among the major independent variables possibly affecting wages, we use years of education (u), an indicator of gender (\(x_1\)), marital status (\(x_2\)), and years of potential labor force experience (\(x_3\)). Two variables, \(x_1\) and \(x_2\), are binary in nature and serve to indicate qualitative features of the individual. We define \(x_1\) to be a binary variable taking on the value one for males and the value zero for females. We also define \(x_2\) to be one if the person is married and zero otherwise. For a complete description of all 24 variables, refer to http://fmwww.bc.edu/ec-p/data/wooldridge/wage1.des.

Simple correlation analysis shows that the variables \(u, x_1, x_2\) and \(x_3\) all have positive correlation coefficients with y: 0.4311, 0.3737, 0.2707, and 0.1114, respectively. From these coefficients we might infer that a married man with more education and longer experience has a higher chance of earning higher wages. We also fitted a linear QR for \(\theta =0.1, 0.5\), and 0.9; the coefficient estimates are shown in Table 3. From Table 3 we see that marital status and gender are more important factors than the length of education in predicting wages for the low and median wage groups (\(\theta =0.1, 0.5\)). For the high wage group (\(\theta =0.9\)), the effect of years of education is greater than that of marital status, and gender is still a major factor. Gender has the largest coefficient values, 0.3759 and 0.3399, for \(\theta =0.5\) and 0.9, respectively. The coefficient of \(x_3\) is negligibly small for all \(\theta \)’s; it is even negative for \(\theta =0.1\).

We now analyze the wage data with the SVQR with VCs using IRWLS only. Figure 1 depicts the estimated coefficient functions for three quantiles, \(\theta = 0.1\) (solid line), \(\theta = 0.5\) (dashed line), and \(\theta = 0.9\) (dotted line): \(\beta _0 (u)\) vs. u in the top left, \(\beta _1 (u)\) vs. u in the top right, \(\beta _2 (u)\) vs. u in the bottom left, and \(\beta _3 (u)\) vs. u in the bottom right. As seen in Fig. 1, wages increase as the years of education increase for the high and median wage groups and remain almost unchanged for the low wage group. The positive effect of gender on wages for the high wage group is strong for subjects with low education status, but the effect gradually disappears as the years of education increase. In contrast, in the low wage group gender barely affects wages for subjects with low education status, but it has a strong effect on subjects with high education status. The positive effect of gender remains almost unchanged regardless of the years of education for the median wage group.

Figure 1 also shows that the effect of marriage slightly increases for the low and median wage groups as the years of education increase, and it remains almost unchanged for the high wage group. The experience length barely affects wages for subjects with low education status in all wage groups. A slight positive effect of experience length on wages is seen for subjects with high education status in the high and median wage groups. However, experience length does not help to increase wages regardless of education status for the low wage group. Thus, we notice that the smoothing variable, u, and the independent variables have different effects on the different quantiles of the conditional distribution of wages.

According to the linear QR analysis, the coefficients of \(x_1\) are 0.1948, 0.3759, and 0.3399 for \(\theta =0.1, 0.5\) and 0.9, respectively. Figure 1 shows that the ordering of these coefficient values is maintained only in the vicinity of \(u=13\). Moreover, the strong positive effect of gender on wages in the case of the low wage group with high education status is not revealed by the linear QR. Thus, the SVQR with VCs using IRWLS reveals features that we cannot observe through linear QR.

4 Conclusion

In this paper, we considered the estimation of conditional quantiles in VC models by estimating the coefficient functions. We proposed the SVQR with VCs using QP and IRWLS for estimating quantiles. The coefficient functions are estimated using the kernel trick of SVM, and the proposed estimators are easy to compute via standard SVQR algorithms. Through two examples, we observed that the proposed methods yield satisfactory results and overall give more accurate and stable estimators than the SVQR and the LPQRVC. Thus, our methods appear to be useful in estimating the QR function and nonlinear coefficient functions. In particular, the SVQR with VCs using IRWLS is preferred since it makes it possible to construct confidence intervals for the coefficient functions and saves computing time. The SVQR with VCs using QP and IRWLS also make hyperparameter selection easier and faster than leave-one-out CV or k-fold CV. Thus, the SVQR with VCs using QP and IRWLS methods can be easily and effectively applied to nonlinear regression coefficients depending on a high-dimensional vector of smoothing variables. We conclude that the SVQR with VCs using IRWLS is a promising nonparametric estimation method for the QR function.