Introduction

Quantile regression is a semiparametric technique that has been gaining considerable popularity in economics (for example, Buchinsky 1994). It was introduced by Koenker and Bassett (1978b) as an extension of ordinary quantiles to a location model. In this model, the conditional quantiles have linear forms. A well-known special case of quantile regression is the least absolute deviation (LAD) estimator of Koenker and Bassett (1978a), which fits medians to a linear function of covariates. In an important generalization of the quantile regression model, Powell (1984, 1986) introduced the censored quantile regression model. This model is an extension of the ‘Tobit’ model and is designed to handle situations in which some of the observations on the dependent variable are censored.

The quantile regression model has some very attractive features: (a) it can be used to characterize the entire conditional distribution of a dependent variable given a set of regressors; (b) it has a linear programming representation which makes estimation easy; (c) it gives a robust measure of location; (d) typically, the quantile regression estimator is more efficient than the least squares estimator when the error term is non-normal; and (e) L-estimators, based on linear combinations of quantile estimators (for example, Portnoy and Koenker 1989), are, in general, more efficient than least squares estimators.

This article presents the basic structure of the quantile regression model. It highlights the most important features and provides the elementary tools for using quantile regressions in empirical applications. The article concentrates on cross-section applications, where the observations are assumed to be independently and identically distributed (i.i.d.).

The Model

Definitions and Estimator

Any real-valued random variable z is completely characterized by its (right continuous) distribution function F(a) = Pr(z ≤ a). For any 0 < θ < 1, the quantity Qθ(z) ≡ F−1(θ) = inf {a : F(a) ≥ θ} is called the θth quantile of z. This quantile is obtained as the solution to a minimization problem for a particular objective function, the check function, given by ρθ(λ) = λ(θ − I(λ < 0)), where I( ) denotes the usual indicator function. That is,

$$ {Q}_{\theta }(z)\equiv \arg \min \limits_{a} E\left[{\rho}_{\theta}\left(z-a\right)\right] $$

An estimate for the θth quantile of z can be obtained from i.i.d. data zi, i=1,…n, by minimizing the sample analogue of the population objective function defined above. That is,

$$ {\widehat{Q}}_{\theta }(z)\equiv \arg \min \limits_{a} \frac{1}{n}\sum \limits_{i=1}^n{\rho}_{\theta}\left({z}_i-a\right) $$

or alternatively

$$ {\widehat{Q}}_{\theta }(z)\equiv \arg \min \limits_{a} \left\{\sum \limits_{i:{z}_i\ge a}\theta \left|{z}_i-a\right|+\sum \limits_{i:{z}_i<a}\left(1-\theta \right)\left|{z}_i-a\right|\right\} $$

The last equation provides a clear intuition for the quantile estimates. The θth quantile estimate is obtained by weighting the positive residuals by θ, while the negative residuals are weighted by the complement of θ, namely, 1 − θ.
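
As a minimal illustration of this weighting scheme, the following Python sketch recovers the θth sample quantile by minimizing the average check-function loss; the grid search and the simulated data are purely illustrative.

```python
import numpy as np

def check_loss(a, z, theta):
    """Average check loss (1/n) * sum rho_theta(z_i - a), with rho_theta(u) = u*(theta - 1(u < 0))."""
    u = z - a
    return np.mean(u * (theta - (u < 0)))

rng = np.random.default_rng(0)
z = rng.normal(size=1_000)
theta = 0.25

# Minimize over a fine grid of candidate values; the minimizer matches the usual sample quantile.
grid = np.linspace(z.min(), z.max(), 2_001)
a_hat = grid[np.argmin([check_loss(a, z, theta) for a in grid])]
print(a_hat, np.quantile(z, theta))
```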

The extension of this idea to the case of a conditional quantile is straightforward. Suppose that the θth conditional quantile of y, conditional on a K × 1 vector of regressors x = (1, x2, … , xK), is

$$ {Q}_{\theta}\left(y|x\right)={x}^{\prime }{\beta}_{\theta } $$

This implies that the model can be written as

$$ y={x}^{\prime }{\beta}_{\theta }+{u}_{\theta } $$
(1)

and, by construction, it follows that Qθ(uθ|x) = 0.

This model, which was first introduced by Koenker and Bassett (1978b), can be viewed as a location model. That is,

$$ \Pr \left(y\le \tau |x\right)={F}_{u_{\theta }}\left(\tau -{x}^{\prime }{\beta}_{\theta }|x\right) $$

where uθ has the (right continuous) conditional distribution function \( {F}_{u_{\theta }}\left(\cdot |x\right), \) satisfying Qθ(uθ| x) = 0.

Similar to the unconditional case presented above, the population parameter vector βθ is defined by

$$ {\beta}_{\theta }=\arg \min \limits_{\beta } E\left[{\rho}_{\theta}\left(y-{x}^{\prime}\beta \right)|x\right] $$

The sample analogue for the θth conditional quantile is defined in a similar manner. Let (yi, xi), i = 1, …, n, be an i.i.d. sample from the population. Then \( {\widehat{\beta}}_{\theta } \), the estimator for βθ, is defined by

$$ {\widehat{\beta}}_{\theta }=\arg \min \limits_{\beta } \frac{1}{n}\sum \limits_{i=1}^n{\rho}_{\theta}\left({y}_i-{x}_i^{\prime}\beta \right) $$
(2)

or, alternatively,

$$ {\widehat{\beta}}_{\theta }=\arg \min \limits_{\beta }\frac{1}{n}\left\{\sum \limits_{i:{y}_i\ge {x}_i^{\prime}\beta}\theta \left|{y}_i-{x}_i^{\prime}\beta \right|+\sum \limits_{i:{y}_i<{x}_i^{\prime}\beta}\left(1-\theta \right)\left|{y}_i-{x}_i^{\prime}\beta \right|\right\} $$

The θth quantile regression problem in (2) can also be rewritten as

$$ {\widehat{\beta}}_{\theta }=\arg \min \limits_{\beta }\frac{1}{n}\sum \limits_{i=1}^n\left(\theta -1/2+1/2\operatorname{sgn}\left({y}_i-{x}_i^{\prime}\beta \right)\right)\left({y}_i-{x}_i^{\prime}\beta \right) $$

where sgn(λ) = I(λ ≥ 0) − I(λ < 0). The last equation gives, in turn, the K × 1 vector of first-order conditions (F.O.C.):

$$ \frac{1}{n}\sum \limits_{i=1}^n\left(\theta -1/2+1/2\operatorname{sgn}\left({y}_i-{x}_i^{\prime}{\widehat{\beta}}_{\theta}\right)\right){x}_i=\frac{1}{n}\sum \limits_{i=1}^n\psi \left({x}_i,{y}_i,{\widehat{\beta}}_{\theta}\right)=0 $$
(3)

where ψ(x, y, β) = (θ − 1/2 + 1/2 sgn(y − x′β))x. It is straightforward to show that under the quantile restriction Qθ(uθi| xi) = 0 the moment function ψ( ) satisfies \( E\left[\psi \left({x}_i,{y}_i,{\beta}_{\theta}\right)\right]\equiv E\left[\psi \left({x}_i,{y}_i,\beta \right)\right]\Big|{}_{\beta ={\beta}_{\theta }}=0 \). In the jargon of the generalized method of moments (GMM) framework, this establishes the validity of ψ( ) as a moment function. Consequently, using the methodology of Huber (1967), one can establish consistency and asymptotic normality of \( {\widehat{\beta}}_{\theta } \).
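
The following sketch evaluates this moment function at the fitted coefficients and checks that its sample mean is close to zero; it assumes the QuantReg class from statsmodels is available, and the simulated design is purely illustrative.

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg  # assumed available

def psi(X, y, beta, theta):
    """Moment function psi(x, y, beta) = (theta - 1/2 + 1/2*sgn(y - x'beta)) * x, row by row."""
    resid = y - X @ beta
    sgn = np.where(resid >= 0, 1.0, -1.0)
    return (theta - 0.5 + 0.5 * sgn)[:, None] * X

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=n)

theta = 0.75
beta_hat = QuantReg(y, X).fit(q=theta).params
print(psi(X, y, beta_hat, theta).mean(axis=0))  # approximately zero at the estimate
```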

For illustration and discussion below, it is convenient to define the following: Let y denote the stacked vector of yi, i = 1, … n, and let X denote the stacked matrix of the row vectors \( {x}_i^{\prime },i=1,\dots, n \).

Linear Programming and Quantile Regression

The problem in (2) can be shown to have a linear programming (LP) representation. This feature has some important consequences from both theoretical and practical standpoints.

Let the K × 1 vector β be written as the difference of two non-negative vectors β+ and β−, that is, β = β+ − β−, with β+, β− ≥ 0. Similarly, let the n × 1 residual vector u be written as the difference of two non-negative vectors u+ and u−, that is, u = u+ − u−, with u+, u− ≥ 0. Furthermore, define the following quantities: A = (X, −X, In, −In), where In is the n-dimensional identity matrix, \( z={\left({\beta}^{+\prime },{\beta}^{-\prime },{u}^{+\prime },{u}^{-\prime}\right)}^{\prime } \), and \( c={\left({0}^{\prime },{0}^{\prime },\theta \cdot {l}^{\prime },\left(1-\theta \right)\cdot {l}^{\prime}\right)}^{\prime } \), where 0 is a K × 1 vector of zeros and l is an n × 1 vector of ones.

When written in matrix notation the problem in (2) takes the familiar primal problem of LP:

$$ {\displaystyle \begin{array}{l}{\mathrm{min}}_z{c}^{\prime }z\\ {}\mathrm{subject}\kern0.24em \mathrm{to}\kern0.24em Az=y,z\ge 0.\end{array}} $$

Furthermore, the dual problem of the LP is (approximately) the same as the F.O.C. given above, namely

$$ {\displaystyle \begin{array}{l}{\mathrm{max}}_w{w}^{\prime }y\\ {}\mathrm{subject}\kern0.24em \mathrm{to}\kern0.24em {w}^{\prime }A\le {c}^{\prime }.\end{array}} $$

The duality theorem of LP implies that feasible solutions exist for both the primal and the dual problems, provided that the design matrix X is of full column rank, that is, rank(X) = K. The equilibrium theorem of LP then guarantees that this solution is optimal.
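
A minimal sketch of the primal LP formulation, using scipy.optimize.linprog as the solver; the helper name and the simulated data are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_reg_lp(y, X, theta):
    """Solve the quantile regression primal LP: min c'z subject to Az = y, z >= 0."""
    n, K = X.shape
    A = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    c = np.concatenate([np.zeros(2 * K), theta * np.ones(n), (1 - theta) * np.ones(n)])
    res = linprog(c, A_eq=A, b_eq=y, bounds=(0, None), method="highs")
    z = res.x
    return z[:K] - z[K:2 * K]  # beta = beta_plus - beta_minus

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)
print(quantile_reg_lp(y, X, 0.5))  # roughly (0.5, 1.5)
```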

The LP representation of the quantile regression problem has several important implications from both computational and conceptual standpoints. First, it is guaranteed that an estimate will be obtained in a finite number of simplex iterations.

Second, the parameter estimate is robust to outliers. That is, for any observation with \( {y}_i-{x}_i^{\prime }{\widehat{\beta}}_{\theta }>0 \), yi can be increased toward ∞, and for any observation with \( {y}_i-{x}_i^{\prime }{\widehat{\beta}}_{\theta }<0 \), yi can be decreased toward − ∞, without altering the solution \( {\widehat{\beta}}_{\theta } \). In other words, what matters is not the exact value of yi, but rather on which side of the estimated hyperplane it lies. This is important for many economic applications in which yi might be censored at, say, \( {y}_i^0 \). For example, for the right-censored model \( {\widehat{\beta}}_{\theta } \) will not be affected as long as for all i we have \( {y}_i^0-{x}_i^{\prime }{\widehat{\beta}}_{\theta }>0 \).

Equivariance Properties

The quantile regression estimator has several important equivariance properties which help facilitate the computation procedure. That is, data-sets that are based on certain transformations of the original data set lead to estimators which are simple transformations of the original estimator. Denote the set of feasible solutions to the problem defined in (2) by B(θ, y, X). Then for every \( {\widehat{\beta}}_{\theta}\equiv \widehat{\beta}\left(\theta, y,X\right)\in B\left(\theta, y,X\right) \) we have (see Koenker and Bassett 1978b: Theorem 3.2):

$$ \begin{array}{l} \widehat{\beta}\left(\theta, \lambda y,X\right)=\lambda \widehat{\beta}\left(\theta, y,X\right),\kern0.5em \mathrm{for}\;\lambda \in \left[0,\infty \right),\\ \widehat{\beta}\left(1-\theta, \lambda y,X\right)=\lambda \widehat{\beta}\left(\theta, y,X\right),\kern0.5em \mathrm{for}\;\lambda \in \left(-\infty, 0\right),\\ \widehat{\beta}\left(\theta, y+X\gamma, X\right)=\widehat{\beta}\left(\theta, y,X\right)+\gamma, \kern0.5em \mathrm{for}\;\gamma \in {\mathrm{R}}^K,\\ \widehat{\beta}\left(\theta, y, XA\right)={A}^{-1}\widehat{\beta}\left(\theta, y,X\right),\kern0.5em \mathrm{for}\;\mathrm{nonsingular}\;K\times K\;\mathrm{matrix}\;A. \end{array} $$

These properties help in reducing the number of simplex iterations (of any LP algorithm) required for obtaining \( {\widehat{\beta}}_{\theta } \). For example, suppose that \( {\widehat{\beta}}_{\theta}^0 \) is a good starting value for \( {\widehat{\beta}}_{\theta } \) (for example, the least-squares estimate from the regression of y on X, or an estimate obtained from only a small subset of the data available). Let \( {\widehat{\beta}}_{\theta}^R \) denote the estimate from the θth quantile regression of \( {y}^R=y-X{\widehat{\beta}}_{\theta}^0 \) on X. Then \( {\widehat{\beta}}_{\theta }={\widehat{\beta}}_{\theta}^R+{\widehat{\beta}}_{\theta}^0 \). In many cases it is faster to obtain the two estimates \( {\widehat{\beta}}_{\theta}^R \) and \( {\widehat{\beta}}_{\theta}^0 \) than to estimate \( {\widehat{\beta}}_{\theta } \) directly.
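
A small sketch of this warm-start idea, again assuming statsmodels' QuantReg is available; the two estimates agree only up to solver tolerance.

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg  # assumed available

rng = np.random.default_rng(3)
n, theta = 400, 0.25
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)

# Shift equivariance: beta_hat(theta, y - X @ gamma, X) + gamma = beta_hat(theta, y, X).
gamma = np.linalg.lstsq(X, y, rcond=None)[0]                 # cheap least-squares starting value
beta_resid = QuantReg(y - X @ gamma, X).fit(q=theta).params  # regression on the recentred outcome
beta_direct = QuantReg(y, X).fit(q=theta).params
print(beta_resid + gamma)
print(beta_direct)  # the two coincide up to solver tolerance
```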

Efficient Estimation

The quantile regression estimator described above is not the efficient estimator for βθ. An efficient estimator can be obtained by solving

$$ \min \limits_{\beta }\frac{1}{n}\sum \limits_{i=1}^n{f}_{u_{\theta }}\left(0|{x}_i\right)\left(\theta -1/2+1/2\operatorname{sgn}\left({y}_i-{x}_i^{\prime}\beta \right)\right)\left({y}_i-{x}_i^{\prime}\beta \right). $$

That is, each observation is weighted by the conditional density of its error evaluated at zero. This estimation procedure requires the use of an estimate for the unknown density \( {f}_{u_{\theta }}\left(0|x\right) \). Below we provide details about the estimation of the asymptotic covariance matrix, which, in turn, also provides information about possible estimates for \( {f}_{u_{\theta }}\left(0|x\right) \). (For a more complete discussion of this estimator, see Newey and Powell 1990.)

Interpretation of the Quantile Regression

How can the quantile regression coefficients be interpreted? Consider the partial derivative of the conditional quantile of y with respect to one of the regressors, say the jth, that is, ∂Qθ(y| x)/∂xj. This derivative is interpreted as the marginal change in the θth conditional quantile due to a marginal change in the jth element of x. If x contains K distinct variables, then this derivative is given simply by βθj, the coefficient on the jth variable. One should be careful not to confuse this result with the location of an individual within the conditional distribution. In general, an observation that happens to lie at the θth quantile of one conditional distribution need not lie at the same quantile if x were to change. The derivative above reflects changes in the conditional distribution but says nothing about the location of a particular observation within that distribution.

Note that an estimate for the θth conditional quantile of y given x is given by \( {\widehat{Q}}_{\theta}\left(y|x\right)={x}^{\prime }{\widehat{\beta}}_{\theta } \). Hence, by varying θ between 0 and 1 and estimating a separate quantile regression for each θ, one can trace the entire distribution of y, conditional on x.
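
As an illustrative sketch (statsmodels' QuantReg assumed, simulated heteroskedastic data), the conditional distribution at a fixed covariate vector can be traced by estimating the regression over a grid of θ values:

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg  # assumed available

rng = np.random.default_rng(4)
n = 1_000
x = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + (1 + x) * rng.normal(size=n)  # heteroskedastic errors

thetas = np.round(np.arange(0.1, 1.0, 0.1), 2)
x0 = np.array([1.0, 1.5])                         # evaluation point (includes the constant)
cond_quantiles = [x0 @ QuantReg(y, X).fit(q=t).params for t in thetas]
print(dict(zip(thetas.tolist(), np.round(cond_quantiles, 2))))  # Q_theta(y | x = x0), increasing in theta
```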

Large Sample Properties of \( {\widehat{\boldsymbol{\beta}}}_{\boldsymbol{\theta}} \)

We denote the conditional distribution function of uθ by \( {F}_{u_{\theta }}\left(\cdot |x\right) \) and the corresponding density function by \( {f}_{u_{\theta }}\left(\cdot |x\right) \).

Assumption A.1

The distribution functions \( \left\{{F}_{u_{\theta i}}\left(\cdot |{x}_i\right)\right\} \) are absolutely continuous, with continuous density functions \( {f}_{u_{\theta i}}\left(\cdot |{x}_i\right) \) uniformly bounded away from 0 and ∞ at the point 0, for i = 1 , 2 , …

Assumption A.2

There exist positive definite matrices Δθ and Λ0 such that

  1. (i)

    \( {lim}_{n\to \infty}\frac{1}{n}{\sum}_{i=1}^n{x}_i{x}_i^{\prime }={\Lambda}_0; \)

  2. (ii)

    \( {lim}_{n\to \infty}\frac{1}{n}{\sum}_{i=1}^n{f}_{u_{\theta i}}\left(0|{x}_i\right){x}_i{x}_i^{\prime }={\Delta}_{\theta };\mathrm{and} \)

  3. (iii)

    \( {\mathrm{max}}_{i=1,\dots, n}\left\Vert {x}_i\right\Vert /\sqrt{n}\to 0 \)

Assumption A.3

The parameter vector βθ is in the interior of the parameter space \( {\mathcal{B}}_{\theta } \).

Assumption A.1 requires that the conditional density of uθi, conditional on xi, be bounded and that there be no mass point at the conditional θth quantile at which βθ is estimated.

Assumptions A.2 and A.3 provide regularity conditions very similar to those used for the usual least-squares estimator. Assumptions A.1 and A.2 are sufficient for establishing that \( {\widehat{\beta}}_{\theta}\to {\beta}_{\theta } \) as n → ∞, while Assumption A.3 is needed in addition for establishing the asymptotic normality of \( {\widehat{\beta}}_{\theta } \) in the following theorem.

Theorem 1

Under Assumptions A.1, A.2, and A.3

  1. (i)

    \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right){\to}^{\mathrm{L}}N\left(0,\theta \left(1-\theta \right){\Delta}_{\theta}^{-1}{\Lambda}_0{\Delta}_{\theta}^{-1}\right); \)

  2. (ii)

    if in addition \( {f}_{u_{\theta }}\left(0|x\right)={f}_{u_{\theta }}(0) \) with probability 1, then

\( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right){\to}^{\mathrm{L}}N\left(0,{w}_{\theta}^2{\Lambda}_0^{-1}\right) \), where \( {w}_{\theta}^2=\theta \left(1-\theta \right)/{f}_{u_{\theta}}^2(0) \).

The result in (i) uses the fact that the (yi,xi) are independent, but need not be identically distributed. This is the case when \( {f}_{u_{\theta }}\left(\cdot |x\right) \) depends on x, as is the case, for example, with heteroskedasticity. The result in (ii) simplifies the result in (i) when \( \left({y}_i,{x}_i^{\prime}\right) \) are i.i.d.

Estimation of the Asymptotic Covariance Matrix

Several estimators for the asymptotic covariance matrix are readily available. Some are valid under the conditions of Theorem 1(i), while others are valid only under the independence assumption of Theorem 1(ii). In what follows, we refer to the former as the general case and to the latter as the i.i.d. case. Note that in either case Λ0 can easily be estimated by its sample analogue, namely,

$$ \widehat{\Lambda}={n}^{-1}{\sum}_{i=1}^n{x}_i{x}_i^{\prime } $$

The i.i.d. Case

In this case the problem centers around estimating \( {\omega}_{\theta}^2 \), or more specifically around estimating \( 1/{f}_{u_{\theta }}(0) \). Let \( {\widehat{u}}_{\theta (1)},\dots, {\widehat{u}}_{\theta (n)} \) be the ordered residuals from the θth quantile regression.

Order estimator: Following Siddiqui (1960), an estimator for \( 1/{f}_{u_{\theta}}^2(0) \) is provided by

$$ \frac{1}{{\widehat{f}}_{u_{\theta}}^2(0)}=\frac{{\left({\widehat{u}}_{\theta \left(\left[n\left(\theta +{h}_n\right)\right]\right)}-{\widehat{u}}_{\theta \left(\left[n\left(\theta -{h}_n\right)\right]\right)}\right)}^2}{4{h}_n^2}, $$

for some bandwidth hn = op(1). Bofinger (1975) provides an optimal choice of bandwidth that minimizes the mean squared error, based on a normal approximation to the true \( {f}_{u_{\theta }}\left(\cdot \right): \)

$$ {h}_n={n}^{-1/5}{\left(\frac{4.5{\varphi}^4\left({\Phi}^{-1}\left(\theta \right)\right)}{{\left[2{\left({\Phi}^{-1}\left(\theta \right)\right)}^2+1\right]}^2}\right)}^{1/5}, $$

where Φ and φ denote the distribution function and density function of a standard normal variable, respectively.
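
A sketch of the order estimator combined with the Bofinger bandwidth (Python with scipy; the simulated residuals stand in for actual quantile regression residuals):

```python
import numpy as np
from scipy.stats import norm

def bofinger_bandwidth(theta, n):
    """Bofinger (1975) bandwidth based on the normal reference density."""
    q = norm.ppf(theta)
    return n ** (-1 / 5) * (4.5 * norm.pdf(q) ** 4 / (2 * q ** 2 + 1) ** 2) ** (1 / 5)

def siddiqui_sparsity(resid, theta):
    """Order-statistic (Siddiqui 1960) estimate of the sparsity 1 / f_u(0)."""
    n = len(resid)
    h = bofinger_bandwidth(theta, n)
    u = np.sort(resid)
    hi = min(int(np.floor(n * (theta + h))), n - 1)
    lo = max(int(np.floor(n * (theta - h))), 0)
    return (u[hi] - u[lo]) / (2 * h)

rng = np.random.default_rng(5)
resid = rng.normal(size=2_000)                             # stand-in for QR residuals
print(siddiqui_sparsity(resid, 0.5), np.sqrt(2 * np.pi))   # true 1/f(0) for N(0,1) is sqrt(2*pi)
```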

Kernel estimator

The density \( {f}_{u_{\theta }}(0) \) can be estimated directly by

$$ {\widehat{f}}_{u_{\theta }}(0)={\left({c}_nn\right)}^{-1}\sum \limits_{i=1}^n\kern0.24em \kappa \left({\widehat{u}}_{\theta i}/{c}_n\right), $$

where κ( ) is a kernel function and cn = op(1) is the kernel bandwidth, which can be chosen optimally using a variety of cross-validation methods (for example, least-squares, log likelihood, and so on).
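
A minimal kernel sketch, using a Gaussian kernel and a simple plug-in bandwidth as a stand-in for cross-validation:

```python
import numpy as np

def kernel_density_at_zero(resid, c_n=None):
    """Gaussian-kernel estimate of f_u(0): (1/(n*c_n)) * sum kappa(u_i / c_n)."""
    n = len(resid)
    if c_n is None:
        c_n = 1.06 * np.std(resid) * n ** (-1 / 5)  # Silverman-type plug-in bandwidth
    kappa = np.exp(-0.5 * (resid / c_n) ** 2) / np.sqrt(2 * np.pi)
    return kappa.mean() / c_n

rng = np.random.default_rng(6)
resid = rng.normal(size=2_000)                                 # stand-in for QR residuals
print(kernel_density_at_zero(resid), 1 / np.sqrt(2 * np.pi))   # true f(0) for N(0,1)
```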

Bootstrap estimator for \( {\omega}_{\theta}^2 \): This estimator relies on bootstrapping the residual series \( {\widehat{u}}_{\theta i},i=1,\dots, n \). Specifically, one can obtain B bootstrap estimates of qθ, the θth quantile of uθ, say \( {\widehat{q}}_{\theta 1}^{\ast },\dots, {\widehat{q}}_{\theta B}^{\ast } \), from B bootstrap samples drawn from the empirical distribution \( {\widehat{F}}_{u_{\theta }} \). An estimator for \( {\omega}_{\theta}^2 \) is then obtained as

$$ {\widehat{\omega}}_{\theta}^2=\frac{n}{B}\sum \limits_{j=1}^B{\left({q}_{\theta j}^{\ast }-{\overline{q}}_{\theta}^{\ast}\right)}^2 $$

where \( {\overline{q}}_{\theta}^{\ast }=\frac{1}{B}{\sum}_{j=1}^B{\widehat{q}}_{\theta j}^{\ast } \).
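
A minimal sketch of this residual bootstrap (the function name and the simulated residuals are illustrative):

```python
import numpy as np

def bootstrap_omega2(resid, theta, B=500, seed=0):
    """Bootstrap estimate of omega_theta^2 = (n/B) * sum_j (q*_j - qbar*)^2."""
    rng = np.random.default_rng(seed)
    n = len(resid)
    q_star = np.array([np.quantile(rng.choice(resid, size=n, replace=True), theta)
                       for _ in range(B)])
    return n * np.var(q_star)   # np.var already averages over the B replications

rng = np.random.default_rng(7)
theta = 0.5
resid = rng.normal(size=1_000)                                            # stand-in for QR residuals
print(bootstrap_omega2(resid, theta), theta * (1 - theta) * 2 * np.pi)    # theta(1-theta)/f(0)^2 for N(0,1)
```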

The General Case

There are several alternative estimators for the general case. Here we provide two possible estimators that have been proven accurate in a variety of Monte Carlo studies (for example, Buchinsky 1995).

Kernel estimator

Powell (1986) considered the following kernel estimator for Δθ

$$ {\widehat{\Delta}}_{\theta }={\left({c}_nn\right)}^{-1}\sum \limits_{i=1}^n\;\kappa \left({\widehat{u}}_{\theta i}/{c}_n\right){x}_i{x}_i^{\prime } $$

where κ(⋅) is some kernel function and cn = op(1) is the kernel bandwidth. Note that the top left-hand element of the matrix \( {\widehat{\Delta}}_{\theta } \) is an estimate of the density\( {f}_{u_{\theta }}(0). \)Hence, the same cross-validation methods discussed before can be used to optimally choose cn.
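
A sketch that combines the kernel estimate of Δθ with the sample Λ̂ into the sandwich covariance of Theorem 1(i); the Gaussian kernel, the plug-in bandwidth, and the function name are illustrative stand-ins.

```python
import numpy as np

def qr_sandwich_cov(X, resid, theta, c_n=None):
    """Estimate Var(beta_hat) = theta(1-theta) * Delta^{-1} Lambda0 Delta^{-1} / n (general case)."""
    n, K = X.shape
    if c_n is None:
        c_n = 1.06 * np.std(resid) * n ** (-1 / 5)      # plug-in bandwidth as a stand-in
    kappa = np.exp(-0.5 * (resid / c_n) ** 2) / np.sqrt(2 * np.pi)
    Delta = (X * (kappa / c_n)[:, None]).T @ X / n      # kernel estimate of Delta_theta
    Lambda0 = X.T @ X / n                               # sample estimate of Lambda_0
    Dinv = np.linalg.inv(Delta)
    return theta * (1 - theta) * Dinv @ Lambda0 @ Dinv / n
```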

Design matrix bootstrap estimators

There are several alternative ways for employing the bootstrap method of Efron (1979). The most general method is what is termed the design matrix bootstrapping, whereby one re-samples from the joint distribution of (y, x). Specifically, let \( \left({y}_i^{\ast },{x}_i^{\ast}\right) \), i = 1 , … , n be a randomly drawn sample from the empirical distribution of (x,y), denoted \( {\widehat{F}}_{xy} \). Let \( {\widehat{\beta}}_{\theta}^{\ast } \) denote the quantile regression estimate based on the bootstrap sample. If we repeat this process B times, then an estimate for \( {V}_{\theta }=\theta \left(1-\theta \right){\Delta}_{\theta}^{-1}{\Lambda}_0{\Delta}_{\theta}^{-1} \) is given by

$$ {\widehat{V}}_{\theta }=\frac{n}{B}\sum \limits_{j=1}^B\left({\widehat{\beta}}_{\theta j}^{\ast }-{\overline{\beta}}_{\theta}^{\ast}\right){\left({\widehat{\beta}}_{\theta j}^{\ast }-{\overline{\beta}}_{\theta}^{\ast}\right)}^{\prime } $$

where \( {\overline{\beta}}_{\theta}^{\ast }=\frac{1}{B}{\sum}_{j=1}^B{\widehat{\beta}}_{\theta j}^{\ast } \). The estimate \( {\widehat{V}}_{\theta } \) is a consistent estimator for Vθ in the sense that the conditional distribution of \( \sqrt{n}\left({\widehat{\beta}}_{\theta}^{\ast }-{\widehat{\beta}}_{\theta}\right) \), conditional on the data, weakly converges to the unconditional distribution of \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right) \).
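
A sketch of the design matrix bootstrap (statsmodels' QuantReg assumed; B is kept modest for illustration):

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg  # assumed available

def pairs_bootstrap_cov(y, X, theta, B=200, seed=0):
    """Design-matrix bootstrap: resample (y_i, x_i) pairs, re-estimate, and use the spread of the estimates."""
    rng = np.random.default_rng(seed)
    n = len(y)
    betas = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        betas.append(QuantReg(y[idx], X[idx]).fit(q=theta).params)
    betas = np.array(betas)
    dev = betas - betas.mean(axis=0)
    return dev.T @ dev / B   # estimates Var(beta_hat); multiply by n to recover V_theta in the text
```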

One important caveat about bootstrapping is in order. If one uses the bootstrap method at all, it can be used more efficiently and effectively by taking advantage of its higher-order refinement properties. For example, one can construct confidence intervals, test statistics, and so forth, directly from the bootstrap estimates without first computing an estimate for Vθ. The number of bootstrap repetitions required will vary across applications; the necessary number of repetitions can be computed using the method proposed by Andrews and Buchinsky (2000).

Set of Quantile Regressions

The model presented in (1) considered only the estimation for a single quantile θ. In practice one would like to estimate several quantile regressions at distinct points of the conditional distribution of the dependent variable. This section outlines the estimation of a finite sequence of quantile regressions and provides its asymptotic distribution.

Estimation and Large Sample Properties

Consider the model given in (1) (dropping the i subscript) for p alternative θ’s:

$$ {\displaystyle \begin{array}{l}y={x}^{\prime }{\beta}_{\theta_j}+{u}_{\theta_j}\;\mathrm{where}\\ {}{Q}_{\theta_j}\left({u}_{\theta_j}|x\right)=0,\end{array}} $$

for j = 1, …, p. Without loss of generality, assume that 0 < θ1 < θ2 < ⋯ < θp < 1. Estimating the p quantile regressions amounts to running p separate regressions, one for each of θ1 through θp. Let \( {\beta}_{\theta }={\left({\beta}_{\theta_1}^{\prime },\dots, {\beta}_{\theta_p}^{\prime}\right)}^{\prime } \) denote the stacked vector of true population parameters and let \( {\widehat{\beta}}_{\theta }={\left({\widehat{\beta}}_{\theta_1}^{\prime },\dots, {\widehat{\beta}}_{\theta_p}^{\prime}\right)}^{\prime } \) denote its corresponding estimate.

Theorem 2

Under Assumptions A.1, A.2, and A.3

  1. (i)

    \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right){\to}^{\mathrm{L}}N\left(0,{\Lambda}_{\theta}\right) \), where \( {\Lambda}_{\theta }={\left\{{\Lambda}_{\theta_{jk}}\right\}}_{j,k=1,\dots, p} \) and \( {\Lambda}_{\theta_{jk}}=\left(\min \left\{{\theta}_j,{\theta}_k\right\}-{\theta}_j{\theta}_k\right){\Delta}_{\theta_j}^{-1}{\Lambda}_0{\Delta}_{\theta_k}^{-1} \);

  2. (ii)

    if in addition \( {f}_{u_{\theta_j}}\left(0|x\right)={f}_{u_{\theta_j}}(0) \) for j = 1, …, p with probability 1, then \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right){\to}^{\mathrm{L}}N\left(0,{\Lambda}_{\theta}\right) \), where \( {\Lambda}_{\theta }={\Omega}_{\theta}\otimes {\Lambda}_0^{-1} \) and \( {\Omega}_{\theta_{jk}}=\left[\min \left\{{\theta}_j,{\theta}_k\right\}-{\theta}_j{\theta}_k\right]/\left[{f}_{u_{\theta_j}}(0){f}_{u_{\theta_k}}(0)\right] \).

Crossing of Quantiles

Note that the estimated conditional quantiles, conditional on x, are given by \( {x}^{\prime }{\widehat{\beta}}_{\theta_1},\dots, {x}^{\prime }{\widehat{\beta}}_{\theta_p} \). Since the estimates \( {\widehat{\beta}}_{\theta_j}\left(j=1,\dots, p\right) \) for the p quantiles are obtained from separate quantile regressions, it is possible that for some vector x0, \( {x}_0^{\prime }{\widehat{\beta}}_{\theta_j}>{x}_0^{\prime }{\widehat{\beta}}_{\theta_k} \) even though θj < θk; that is, the conditional quantiles may cross each other. This may be of no practical consequence, since there may be no such vector within the relevant range of plausible x’s. Nevertheless, in any empirical application these potential crossings need to be examined.
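
A quick numerical check for crossing over a grid of plausible covariate values (QuantReg assumed; the design is illustrative):

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg  # assumed available

rng = np.random.default_rng(8)
n = 300
x = rng.uniform(0, 3, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + x + (0.2 + x) * rng.normal(size=n)

thetas = [0.1, 0.25, 0.5, 0.75, 0.9]
betas = np.array([QuantReg(y, X).fit(q=t).params for t in thetas])

# Evaluate the fitted conditional quantiles over a grid of x values and check monotonicity in theta.
grid = np.column_stack([np.ones(50), np.linspace(x.min(), x.max(), 50)])
fitted = grid @ betas.T                        # one column per theta, ordered from low to high
print(np.all(np.diff(fitted, axis=1) >= 0))    # True means no crossing on this grid
```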

Testing for Equality of Slope Coefficients

Under the i.i.d. assumption the p coefficient vectors \( {\beta}_{\theta_1},\dots, {\beta}_{\theta_p} \) should be the same, except for the intercept coefficients. There are a number of ways for testing the null hypothesis of i.i.d. errors. Only two testing procedures are provided here. For other alternative methods see Koenker (2005).

Wald-Type Testing

This testing procedure is based on the optimal minimum distance (MD) estimator under the null hypothesis. Denote the parameter vector under the null by \( {\beta}_{\theta}^R \) and note that \( {\beta}_{\theta}^R={\left({\beta}_{\theta_11},\dots, {\beta}_{\theta_p1},{\beta}_2,\dots, {\beta}_K\right)}^{\prime } \) is a (p + K − 1) × 1 vector, with p distinct intercepts \( {\beta}_{\theta_11},\dots, {\beta}_{\theta_p1} \) and K − 1 common slope parameters β2, … , βK.

An optimal estimate for the restricted coefficient vector \( {\beta}_{\theta}^R \) is defined by

$$ {\widehat{\beta}}_{\theta}^R=\arg \min \limits_{\beta^R} {\left({\widehat{\beta}}_{\theta }-R{\beta}^R\right)}^{\prime }{\widehat{V}}_{\theta}^{-1}\left({\widehat{\beta}}_{\theta }-R{\beta}^R\right) $$

where \( {\widehat{V}}_{\theta } \) is a consistent estimate for the covariance matrix of \( {\widehat{\beta}}_{\theta } \), the unrestricted parameter estimate from the p quantile regressions, estimated under the null (that is, under Theorem 2(ii)). The matrix R is simply a pK × (p + K − 1) restriction matrix which imposes the restrictions implied by the i.i.d. assumption. A test statistic for equality of the slope coefficients is then provided by

$$ {W}_n=n{\left({\widehat{\beta}}_{\theta }-R{\widehat{\beta}}_{\theta}^R\right)}^{\prime }{\widehat{V}}_{\theta}^{-1}\left({\widehat{\beta}}_{\theta }-R{\widehat{\beta}}_{\theta}^R\right) $$

Under the null hypothesis \( {W}_n\overset{D}{\to }{\upchi}^2\left( pK-p-K+1\right) \) as n → ∞, so the null hypothesis is rejected if \( {W}_n>{\upchi}_{1-\alpha}^2\left( pK-p-K+1\right) \), where \( {\upchi}_{1-\alpha}^2(m) \) denotes the 1 − α quantile of a χ2-distribution with m degrees of freedom.
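
A sketch of the minimum distance step and the Wald statistic. The restriction matrix R below is one concrete construction, assuming the coefficients within each quantile are ordered as (intercept, slopes), and V_hat is an estimate of the covariance matrix of \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right) \); all names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def wald_slope_equality(beta_hat, V_hat, p, K, n):
    """Minimum-distance Wald test that slope coefficients are equal across p quantiles.

    beta_hat : stacked (p*K,) vector of unrestricted quantile regression estimates
    V_hat    : (p*K, p*K) covariance estimate for sqrt(n)*(beta_hat - beta_theta)
    """
    # R maps the restricted vector (p intercepts, K-1 common slopes) into the stacked pK vector.
    R = np.zeros((p * K, p + K - 1))
    for j in range(p):
        R[j * K, j] = 1.0                               # quantile-specific intercept
        R[j * K + 1:(j + 1) * K, p:] = np.eye(K - 1)    # common slope block
    Vinv = np.linalg.inv(V_hat)
    beta_R = np.linalg.solve(R.T @ Vinv @ R, R.T @ Vinv @ beta_hat)   # optimal MD estimate
    diff = beta_hat - R @ beta_R
    W = n * float(diff @ Vinv @ diff)
    df = p * K - p - K + 1
    return W, df, chi2.sf(W, df)   # statistic, degrees of freedom, p-value
```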

GMM-Type Testing

An alternative testing procedure can be applied using Hansen’s (1982) GMM method. Define a moment function ψ(x, y, β) by stacking the p individual moment functions as defined in (3). While this moment function is a pK × 1 vector, under the null there are only p + K − 1 parameters to be estimated. Hansen’s GMM framework provides an estimator for \( {\beta}_{\theta}^R \), say \( {\widehat{\beta}}_{\theta}^R \), defined by

$$ {\widehat{\beta}}_{\theta}^R=\arg \min \limits_{b}{\left(\frac{1}{n}\sum \limits_{i=1}^n\psi \left({x}_i,{y}_i,b\right)\right)}^{\prime }{A}^{-1}\left(\frac{1}{n}\sum_{i=1}^n\psi \left({x}_i,{y}_i,b\right)\right), $$
(4)

An efficient estimator can be obtained if A is chosen so that \( A\overset{p}{\to }E\left[\psi \left(x,y,{\beta}_{\theta}\right)\psi {\left(x,y,{\beta}_{\theta}\right)}^{\prime}\right] \) as n → ∞. This framework provides us with a straightforward testing procedure. Under the null hypothesis

$$ n{\left(\frac{1}{n}\sum \limits_{i=1}^n\psi \left({x}_i,{y}_i,{\widehat{\beta}}_{\theta}^R\right)\right)}^{\prime }{A}^{-1}\left(\frac{1}{n}\sum \limits_{i=1}^n\psi \left({x}_i,{y}_i,{\widehat{\beta}}_{\theta}^R\right)\right)\overset{D}{\to }{\upchi}^2\left( pK-p-K+1\right), $$

as n → ∞.

Note that, because of the linearity of the conditional quantiles, the GMM test provides a test statistic that is (asymptotically) equivalent to the one provided by the MD (Wald-type) test.

Censored Quantile Regression

An important extension of the quantile regression model was suggested by Powell (1984, 1986). This extension considers the case in which some of the observations are censored. The model is essentially a semiparametric extension of the well-known ‘Tobit’ model and can be written as

$$ {y}_i=\min \left\{{y}_i^0,{x}_i^{\prime }{\beta}_{\theta }+{u}_{\theta i}\right\} $$

for i = 1, …, n, where \( {y}_i^0 \) is the (known) top-coding value of yi in the sample. (For simplicity of presentation it will be assumed that \( {y}_i^0={y}^0 \) for all i = 1, …, n.)

This model can be written as a latent variable model. That is, we have \( {y}_i^{\ast }={x}_i^{\prime }{\beta}_{\theta }+{u}_{\theta i} \), where Qθ(uθi| xi) = 0 and \( {y}_i={y}_i^{\ast }I\left({y}_i^{\ast}\le {y}^0\right) \). It is easy to see that the observed conditional θth quantile of yi, conditional on xi, is given by

\( {Q}_{\theta}\left({y}_i|{x}_i\right)=\min \left\{{y}^0,{x}_i^{\prime }{\beta}_{\theta}\right\} \).

Hence, Powell suggested the following estimator for βθ

$$ {\widehat{\beta}}_{\theta }=\arg \min \limits_{\beta }\frac{1}{n}\sum \limits_{i=1}^n{\rho}_{\theta}\left({y}_i-\min \left\{{y}^0,{x}_i^{\prime}\beta \right\}\right) $$
(5)

where ρθ(λ) is the same check function as defined above. Note that, in order to obtain a consistent estimator of βθ, it is necessary that \( {x}_i^{\prime }{\beta}_{\theta }<{y}^0 \) for at least a positive fraction of the sample. Intuitively, the larger this fraction, the more precise the estimator will be.
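
A crude sketch that minimizes Powell's objective by a derivative-free search; this is purely illustrative, since the objective is not globally convex and in practice iterative LP-based algorithms or multiple starting values are used. All names and the simulated data are assumptions of the example.

```python
import numpy as np
from scipy.optimize import minimize

def censored_qr(y, X, y0, theta, beta_start):
    """Minimize (1/n) * sum rho_theta(y_i - min(y0, x_i'beta)) from a given starting value."""
    def objective(beta):
        u = y - np.minimum(y0, X @ beta)
        return np.mean(u * (theta - (u < 0)))
    return minimize(objective, beta_start, method="Nelder-Mead").x

rng = np.random.default_rng(9)
n, theta, y0 = 1_000, 0.5, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = X @ np.array([1.0, 1.0]) + rng.normal(size=n)   # latent outcome
y = np.minimum(y_star, y0)                               # observed, top-coded at y0

beta_start = np.linalg.lstsq(X, y, rcond=None)[0]        # naive starting value
print(censored_qr(y, X, y0, theta, beta_start))          # roughly (1.0, 1.0) if the search succeeds
```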

Powell (1986) showed that, under certain regularity conditions, similar to those established by Huber (1967), the estimator is asymptotically normal. That is, \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right)\overset{D}{\to }N\left(0,{V}_{\theta}^C\right) \) as n → ∞, where

$$ {\displaystyle \begin{array}{l}{V}_{\theta}^{\mathrm{C}}=\theta \left(1-\theta \right){\Delta}_{\mathrm{C}\theta}^{-1}{\Lambda}_{\mathrm{C}\theta }{\Delta}_{\mathrm{C}\theta}^{-1},\\ {}{\Delta}_{\mathrm{C}\theta }=E\left[{f}_{u_{\theta }}\left(0|x\right)I\left({x}^{\prime }{\beta}_{\theta}\le {y}^0\right){xx}^{\prime}\right],\mathrm{and}\\ {}{\Lambda}_{\mathrm{C}\theta }=E\left[I\left({x}^{\prime }{\beta}_{\theta }<{y}^0\right){xx}^{\prime}\right].\end{array}} $$

As in the basic quantile regression model, if \( {f}_{u_{\theta }}\left(0|x\right)={f}_{u_{\theta }}(0) \) with probability 1, then \( {V}_{\theta}^C \) simplifies to \( {V}_{\theta}^C={\omega}_{\theta}^2{\Lambda}_{C\theta}^{-1} \), where \( {\omega}_{\theta}^2=\theta \left(1-\theta \right)/{f}_{u_{\theta}}^2(0) \).

It is important to note that if \( {x}_i^{\prime }{\widehat{\beta}}_{\theta}\le {y}^0 \) for all observations, then the censored quantile regression estimate coincides with the basic quantile regression.

The simple intuition for this estimation procedure is that βθ can be estimated only from that part of the sample for which the latent variable is observed, that is, the fraction of the sample for which \( y={y}^{\ast }={x}^{\prime }{\beta}_{\theta }+{u}_{\theta}\le {y}^0 \). As a result, the asymptotic covariance matrix is ‘adjusted’ for this fact: the indicator term I(x′βθ ≤ y0) appears in both \( {\Delta}_{\mathrm{C}\theta } \) and \( {\Lambda}_{\mathrm{C}\theta } \).

A considerable drawback of the censored quantile regression model is that it does not have the attractive LP representation and the objective function is not globally convex in β.

Concluding Remarks

The main goal of this article is to provide the basic structure of the quantile regression model. Versions of this model have been widely used in the empirical literature in a variety of situations not covered by this article. Furthermore, there have been substantial advancements in the theoretical literature as well. This literature includes quantile regression for nonlinear models, time-series models, and others. There are also a number of empirical studies that have used quantile regression extensively, in a variety of data configurations and economic contexts. For a brilliant in-depth exposition of a wide variety of topics related to quantile regression, interested readers should refer to Koenker (2005).

See Also