Introduction

Quantile regression is a semiparametric technique that has been gaining considerable popularity in economics (for example, Buchinsky 1994). It was introduced by Koenker and Bassett (1978b) as an extension of ordinary quantiles to a location model. In this model, the conditional quantiles have linear forms. A well-known special case of quantile regression is the least absolute deviation (LAD) estimator of Koenker and Bassett (1978a), which fits medians to a linear function of covariates. In an important generalization of the quantile regression model, Powell (1984, 1986) introduced the censored quantile regression model. This model is an extension of the ‘Tobit’ model and is designed to handle situations in which some of the observations on the dependent variable are censored.

The quantile regression model has some very attractive features: (a) it can be used to characterize the entire conditional distribution of a dependent variable given a set of regressors; (b) it has a linear programming representation which makes estimation easy; (c) it gives a robust measure of location; (d) typically, the quantile regression estimator is more efficient than the least squares estimator when the error term is non-normal; and (e) L-estimators, based on linear combinations of quantile estimators (for example, Portnoy and Koenker 1989), are, in general, more efficient than least squares estimators.

This article presents the basic structure of the quantile regression model. It highlights the most important features and provides the elementary tools for using quantile regressions in empirical applications. The article concentrates on cross-section applications, where the observations are assumed to be independently and identically distributed (i.i.d.).

The Model

Definitions and Estimator

Any real-valued random variable z is completely characterized by its (right continuous) distribution function F(a) = Pr(z ≤ a). For any 0 < θ < 1, the quantity Qθ(z) ≡ F−1(θ) = inf {a : F(a) ≥ θ} is called the θth quantile of z. This quantile is obtained as the solution to a minimization problem for a particular objective function, the check function, given by ρθ(λ) = λ(θ − I(λ < 0)), where I( ) denotes the usual indicator function. That is,

$$ {Q}_{\theta }(z)\equiv \arg \min \limits_{a} E\left[{\rho}_{\theta}\left(z-a\right)\right] $$

An estimate for the θth quantile of z can be obtained from i.i.d. data zi, i=1,…n, by minimizing the sample analogue of the population objective function defined above. That is,

$$ {\widehat{Q}}_{\theta }(z)\equiv \arg \min \limits_{a} \frac{1}{n}\sum \limits_{i=1}^n{\rho}_{\theta}\left({z}_i-a\right) $$

or alternatively

$$ {\widehat{Q}}_{\theta }(z)\equiv \arg \min \limits_{a} \left\{\sum \limits_{i:{z}_i\ge a}\theta \left|{z}_i-a\right|+\sum \limits_{i:{z}_i<a}\left(1-\theta \right)\left|{z}_i-a\right|\right\} $$

The last equation provides a clear intuition for the quantile estimates. The θth quantile estimate is obtained by weighting the positive residuals by θ, while the negative residuals are weighted by the complement of θ, namely, 1 − θ.
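
As a minimal illustration of this weighting scheme, the following Python sketch recovers the θth sample quantile by minimizing the average check-function loss; the grid search and the simulated data are purely illustrative.

```python
import numpy as np

def check_loss(a, z, theta):
    """Average check loss (1/n) * sum rho_theta(z_i - a), with rho_theta(u) = u*(theta - 1(u < 0))."""
    u = z - a
    return np.mean(u * (theta - (u < 0)))

rng = np.random.default_rng(0)
z = rng.normal(size=1_000)
theta = 0.25

# Minimize over a fine grid of candidate values; the minimizer matches the usual sample quantile.
grid = np.linspace(z.min(), z.max(), 2_001)
a_hat = grid[np.argmin([check_loss(a, z, theta) for a in grid])]
print(a_hat, np.quantile(z, theta))
```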

The extension of this idea to the case of a conditional quantile is straightforward. Suppose that the θth conditional quantile of y, conditional on a K × 1 vector of regressors x = (1, x2, … , xK), is

$$ {Q}_{\theta}\left(y|x\right)={x}^{\prime }{\beta}_{\theta } $$

This implies that the model can be written as

$$ y={x}^{\prime }{\beta}_{\theta }+{u}_{\theta } $$
(1)

and, by construction, it follows that Qθ(uθ|x) = 0.

This model, which was first introduced by Koenker and Bassett (1978b), can be viewed as a location model. That is,

$$ \Pr \left(y\le \tau |x\right)={F}_{u_{\theta }}\left(\tau -{x}^{\prime }{\beta}_{\theta }|x\right) $$

where uθ has the (right continuous) conditional distribution function \( {F}_{u_{\theta }}\left(\cdot |x\right), \) satisfying Qθ(uθ| x) = 0.

Similar to the unconditional case presented above, the population parameter vector βθ is defined by

$$ {\beta}_{\theta }=\arg \min \limits_{\beta } E\left[{\rho}_{\theta}\left(y-{x}^{\prime}\beta \right)|x\right] $$

The sample analogue for the θth conditional quantile is defined in a similar manner. Let (yi, xi), i = 1, …, n, be an i.i.d. sample from the population. Then \( {\widehat{\beta}}_{\theta } \), the estimator for βθ, is defined by

$$ {\widehat{\beta}}_{\theta }=\arg \min \limits_{\beta } \frac{1}{n}\sum \limits_{i=1}^n{\rho}_{\theta}\left({y}_i-{x}_i^{\prime}\beta \right) $$
(2)

or, alternatively,

$$ {\widehat{\beta}}_{\theta }=\arg \min \limits_{\beta }\frac{1}{n}\left\{\sum \limits_{i:{y}_i\ge {x}_i^{\prime}\beta}\theta \left|{y}_i-{x}_i^{\prime}\beta \right|+\sum \limits_{i:{y}_i<{x}_i^{\prime}\beta}\left(1-\theta \right)\left|{y}_i-{x}_i^{\prime}\beta \right|\right\} $$

The θth quantile regression problem in (2) can also be rewritten as

$$ {\widehat{\beta}}_{\theta }=\arg \min \limits_{\beta }\frac{1}{n}\sum \limits_{i=1}^n\left(\theta -1/2+1/2\operatorname{sgn}\left({y}_i-{x}_i^{\prime}\beta \right)\right)\left({y}_i-{x}_i^{\prime}\beta \right) $$

where sgn(λ) = I(λ ≥ 0) − I(λ < 0). The last equation gives, in turn, the K × 1 vector of first-order conditions (F.O.C.):

$$ \frac{1}{n}\sum \limits_{i=1}^n\left(\theta -1/2+1/2\operatorname{sgn}\left({y}_i-{x}_i^{\prime}{\widehat{\beta}}_{\theta}\right)\right){x}_i=\frac{1}{n}\sum \limits_{i=1}^n\psi \left({x}_i,{y}_i,{\widehat{\beta}}_{\theta}\right)=0 $$
(3)

where ψ(x, y, β) = (θ − 1/2 + 1/2 sgn(y − x′β))x. It is straightforward to show that under the quantile restriction Qθ(uθi| xi) = 0 the moment function ψ( ) satisfies \( E\left[\psi \left({x}_i,{y}_i,{\beta}_{\theta}\right)\right]\equiv E\left[\psi \left({x}_i,{y}_i,\beta \right)\right]\Big|{}_{\beta ={\beta}_{\theta }}=0 \). In the jargon of the generalized method of moments (GMM) framework, this establishes the validity of ψ( ) as a moment function. Consequently, using the methodology of Huber (1967), one can establish consistency and asymptotic normality of \( {\widehat{\beta}}_{\theta } \).
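
The following sketch evaluates this moment function at the fitted coefficients and checks that its sample mean is close to zero; it assumes the QuantReg class from statsmodels is available, and the simulated design is purely illustrative.

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg  # assumed available

def psi(X, y, beta, theta):
    """Moment function psi(x, y, beta) = (theta - 1/2 + 1/2*sgn(y - x'beta)) * x, row by row."""
    resid = y - X @ beta
    sgn = np.where(resid >= 0, 1.0, -1.0)
    return (theta - 0.5 + 0.5 * sgn)[:, None] * X

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=n)

theta = 0.75
beta_hat = QuantReg(y, X).fit(q=theta).params
print(psi(X, y, beta_hat, theta).mean(axis=0))  # approximately zero at the estimate
```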

For illustration and discussion below, it is convenient to define the following: Let y denote the stacked vector of yi, i = 1, … n, and let X denote the stacked matrix of the row vectors \( {x}_i^{\prime },i=1,\dots, n \).

Linear Programming and Quantile Regression

The problem in (2) can be shown to have a linear programming (LP) representation. This feature has some important consequences from both theoretical and practical standpoints.

Let the K × 1 vector β be written as the difference of two non-negative vectors β+ and β−, that is, β = β+ − β−, with β+, β− ≥ 0. Similarly, let the n × 1 residual vector u be written as the difference of two non-negative vectors u+ and u−, that is, u = u+ − u−, with u+, u− ≥ 0. Furthermore, define the following quantities: A = (X, −X, In, −In), where In is the n-dimensional identity matrix, \( z={\left({\beta}^{+\prime },{\beta}^{-\prime },{u}^{+\prime },{u}^{-\prime}\right)}^{\prime } \), and \( c={\left({0}^{\prime },{0}^{\prime },\theta \cdot {l}^{\prime },\left(1-\theta \right)\cdot {l}^{\prime}\right)}^{\prime } \), where 0 is a K × 1 vector of zeros and l is an n × 1 vector of ones.

When written in matrix notation the problem in (2) takes the familiar primal problem of LP:

$$ {\displaystyle \begin{array}{l}{\mathrm{min}}_z{c}^{\prime }z\\ {}\mathrm{subject}\kern0.24em \mathrm{to}\kern0.24em Az=y,z\ge 0.\end{array}} $$

Furthermore, the dual problem of the LP is (approximately) the same as the F.O.C. given above, namely

$$ {\displaystyle \begin{array}{l}{\mathrm{max}}_w{w}^{\prime }y\\ {}\mathrm{subject}\kern0.24em \mathrm{to}\kern0.24em {w}^{\prime }A\le {c}^{\prime }.\end{array}} $$

The duality theorem of LP implies that feasible solutions exist for both the primal and the dual problems, provided that the design matrix X is of full column rank, that is, rank(X) = K. The equilibrium theorem of LP then guarantees that this solution is optimal.
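
A minimal sketch of the primal LP formulation, using scipy.optimize.linprog as the solver; the helper name and the simulated data are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_reg_lp(y, X, theta):
    """Solve the quantile regression primal LP: min c'z subject to Az = y, z >= 0."""
    n, K = X.shape
    A = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    c = np.concatenate([np.zeros(2 * K), theta * np.ones(n), (1 - theta) * np.ones(n)])
    res = linprog(c, A_eq=A, b_eq=y, bounds=(0, None), method="highs")
    z = res.x
    return z[:K] - z[K:2 * K]  # beta = beta_plus - beta_minus

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)
print(quantile_reg_lp(y, X, 0.5))  # roughly (0.5, 1.5)
```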

The LP representation of the quantile regression problem has several important implications from both computational and conceptual standpoints. First, it is guaranteed that an estimate will be obtained in a finite number of simplex iterations.

Second, the parameter estimate is robust to outliers. That is, for any observation with \( {y}_i-{x}_i^{\prime }{\widehat{\beta}}_{\theta }>0 \), yi can be increased toward ∞, and for any observation with \( {y}_i-{x}_i^{\prime }{\widehat{\beta}}_{\theta }<0 \), yi can be decreased toward − ∞, without altering the solution \( {\widehat{\beta}}_{\theta } \). In other words, what matters is not the exact value of yi, but rather on which side of the estimated hyperplane it lies. This is important for many economic applications in which yi might be censored at, say, \( {y}_i^0 \). For example, for the right-censored model \( {\widehat{\beta}}_{\theta } \) will not be affected as long as for all i we have \( {y}_i^0-{x}_i^{\prime }{\widehat{\beta}}_{\theta }>0 \).

Equivariance Properties

The quantile regression estimator has several important equivariance properties which help facilitate the computation procedure. That is, data-sets that are based on certain transformations of the original data set lead to estimators which are simple transformations of the original estimator. Denote the set of feasible solutions to the problem defined in (2) by B(θ, y, X). Then for every \( {\widehat{\beta}}_{\theta}\equiv \widehat{\beta}\left(\theta, y,X\right)\in B\left(\theta, y,X\right) \) we have (see Koenker and Bassett 1978b: Theorem 3.2):

$$ \begin{array}{l} \widehat{\beta}\left(\theta, \lambda y,X\right)=\lambda \widehat{\beta}\left(\theta, y,X\right),\kern0.5em \mathrm{for}\;\lambda \in \left[0,\infty \right),\\ \widehat{\beta}\left(1-\theta, \lambda y,X\right)=\lambda \widehat{\beta}\left(\theta, y,X\right),\kern0.5em \mathrm{for}\;\lambda \in \left(-\infty, 0\right),\\ \widehat{\beta}\left(\theta, y+X\gamma, X\right)=\widehat{\beta}\left(\theta, y,X\right)+\gamma, \kern0.5em \mathrm{for}\;\gamma \in {\mathrm{R}}^K,\\ \widehat{\beta}\left(\theta, y, XA\right)={A}^{-1}\widehat{\beta}\left(\theta, y,X\right),\kern0.5em \mathrm{for}\;\mathrm{nonsingular}\;K\times K\;\mathrm{matrix}\;A. \end{array} $$

These properties help in reducing the number of simplex iterations (of any LP algorithm) required for obtaining \( {\widehat{\beta}}_{\theta } \). For example, suppose that \( {\widehat{\beta}}_{\theta}^0 \) is a good starting value for \( {\widehat{\beta}}_{\theta } \) (for example, the least-squares estimate from the regression of y on X, or an estimate obtained from only a small subset of the data available). Let \( {\widehat{\beta}}_{\theta}^R \) denote the estimate from the θth quantile regression of \( {y}^R=y-X{\widehat{\beta}}_{\theta}^0 \) on X. Then \( {\widehat{\beta}}_{\theta }={\widehat{\beta}}_{\theta}^R+{\widehat{\beta}}_{\theta}^0 \). In many cases it is faster to obtain the two estimates \( {\widehat{\beta}}_{\theta}^R \) and \( {\widehat{\beta}}_{\theta}^0 \) than to estimate \( {\widehat{\beta}}_{\theta } \) directly.
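
A small sketch of this warm-start idea, again assuming statsmodels' QuantReg is available; the two estimates agree only up to solver tolerance.

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg  # assumed available

rng = np.random.default_rng(3)
n, theta = 400, 0.25
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)

# Shift equivariance: beta_hat(theta, y - X @ gamma, X) + gamma = beta_hat(theta, y, X).
gamma = np.linalg.lstsq(X, y, rcond=None)[0]                 # cheap least-squares starting value
beta_resid = QuantReg(y - X @ gamma, X).fit(q=theta).params  # regression on the recentred outcome
beta_direct = QuantReg(y, X).fit(q=theta).params
print(beta_resid + gamma)
print(beta_direct)  # the two coincide up to solver tolerance
```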

Efficient Estimation

The quantile regression estimator described above is not the efficient estimator for βθ. An efficient estimator can be obtained by solving

$$ \min \limits_{\beta }\frac{1}{n}\sum \limits_{i=1}^n{f}_{u_{\theta }}\left(0|{x}_i\right)\left(\theta -1/2+1/2\operatorname{sgn}\left({y}_i-{x}_i^{\prime}\beta \right)\right)\left({y}_i-{x}_i^{\prime}\beta \right). $$

That is, each observation is weighted by the conditional density of its error evaluated at zero. This estimation procedure requires the use of an estimate for the unknown density \( {f}_{u_{\theta }}\left(0|x\right) \). Below we provide details about the estimation of the asymptotic covariance matrix, which, in turn, also provides information about possible estimates for \( {f}_{u_{\theta }}\left(0|x\right) \). (For a more complete discussion of this estimator, see Newey and Powell 1990.)

Interpretation of the Quantile Regression

How can the quantile regression coefficients be interpreted? Consider the partial derivative of the conditional quantile of y with respect to one of the regressors, say the jth, that is, ∂Qθ(y| x)/∂xj. This derivative is interpreted as the marginal change in the θth conditional quantile due to a marginal change in the jth element of x. If x contains K distinct variables, then this derivative is given simply by βθj, the coefficient on the jth variable. One should be careful not to confuse this result with the location of an individual within the conditional distribution. In general, an observation that happens to lie at the θth quantile of one conditional distribution need not lie at the same quantile if x were to change. The derivative above reflects changes in the conditional distribution but says nothing about the location of a particular observation within that distribution.

Note that an estimate for the θth conditional quantile of y given x is given by \( {\widehat{Q}}_{\theta}\left(y|x\right)={x}^{\prime }{\widehat{\beta}}_{\theta } \). Hence, by varying θ between 0 and 1 and estimating a separate quantile regression for each θ, one can trace the entire distribution of y, conditional on x.
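
As an illustrative sketch (statsmodels' QuantReg assumed, simulated heteroskedastic data), the conditional distribution at a fixed covariate vector can be traced by estimating the regression over a grid of θ values:

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg  # assumed available

rng = np.random.default_rng(4)
n = 1_000
x = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + (1 + x) * rng.normal(size=n)  # heteroskedastic errors

thetas = np.round(np.arange(0.1, 1.0, 0.1), 2)
x0 = np.array([1.0, 1.5])                         # evaluation point (includes the constant)
cond_quantiles = [x0 @ QuantReg(y, X).fit(q=t).params for t in thetas]
print(dict(zip(thetas.tolist(), np.round(cond_quantiles, 2))))  # Q_theta(y | x = x0), increasing in theta
```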

Large Sample Properties of \( {\widehat{\boldsymbol{\beta}}}_{\boldsymbol{\theta}} \)

We denote the conditional distribution function of uθ by \( {F}_{u_{\theta }}\left(\cdot |x\right) \) and the corresponding density function by \( {f}_{u_{\theta }}\left(\cdot |x\right) \).

Assumption A.1

The distribution functions \( \left\{{F}_{u_{\theta i}}\left(\cdot |{x}_i\right)\right\} \) are absolutely continuous, with continuous density functions \( {f}_{u_{\theta i}}\left(\cdot |{x}_i\right) \) uniformly bounded away from 0 and ∞ at the point 0, for i = 1 , 2 , …

Assumption A.2

There exist positive definite matrices Δθ and Λ0 such that

  1. (i)

    \( {lim}_{n\to \infty}\frac{1}{n}{\sum}_{i=1}^n{x}_i{x}_i^{\prime }={\Lambda}_0; \)

  2. (ii)

    \( {lim}_{n\to \infty}\frac{1}{n}{\sum}_{i=1}^n{f}_{u_{\theta i}}\left(0|{x}_i\right){x}_i{x}_i^{\prime }={\Delta}_{\theta };\mathrm{and} \)

  3. (iii)

    \( {\mathrm{max}}_{i=1,\dots, n}\left\Vert {x}_i\right\Vert /\sqrt{n}\to 0 \)

Assumption A.3

The parameter vector βθ is in the interior of the parameter space \( {\mathcal{B}}_{\theta } \).

Assumption A.1 requires that the conditional density of uθi, conditional on xi, be bounded and that there be no mass point at the conditional θth quantile at which βθ is estimated.

Assumptions A.2 and A.3 provide regularity conditions very similar to those used for the usual least-squares estimator. Assumptions A.1 and A.2 are sufficient for establishing that \( {\widehat{\beta}}_{\theta}\to {\beta}_{\theta } \) as n → ∞, while Assumption A.3 is needed in addition for establishing the asymptotic normality of \( {\widehat{\beta}}_{\theta } \) in the following theorem.

Theorem 1

Under Assumptions A.1, A.2, and A.3

  1. (i)

    \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right){\to}^{\mathrm{L}}N\left(0,\theta \left(1-\theta \right){\Delta}_{\theta}^{-1}{\Lambda}_0{\Delta}_{\theta}^{-1}\right); \)

  2. (ii)

    if in addition \( {f}_{u_{\theta }}\left(0|x\right)={f}_{u_{\theta }}(0) \) with probability 1, then

\( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right){\to}^{\mathrm{L}}N\left(0,{w}_{\theta}^2{\Lambda}_0^{-1}\right) \), where \( {w}_{\theta}^2=\theta \left(1-\theta \right)/{f}_{u_{\theta}}^2(0) \).

The result in (i) uses the fact that the (yi,xi) are independent, but need not be identically distributed. This is the case when \( {f}_{u_{\theta }}\left(\cdot |x\right) \) depends on x, as is the case, for example, with heteroskedasticity. The result in (ii) simplifies the result in (i) when \( \left({y}_i,{x}_i^{\prime}\right) \) are i.i.d.

Estimation of the Asymptotic Covariance Matrix

Several estimators for the asymptotic covariance matrix are readily available. Some are valid under the conditions of Theorem 1(i), while others are valid only under the independence assumption of Theorem 1(ii). In what follows, we refer to the former as the general case and to the latter as the i.i.d. case. Note that in either case Λ0 can easily be estimated by its sample analogue, namely,

$$ \widehat{\Lambda}={n}^{-1}{\sum}_{i=1}^n{x}_i{x}_i^{\prime } $$

The i.i.d. Case

In this case the problem centers around estimating \( {\omega}_{\theta}^2 \), or more specifically around estimating \( 1/{f}_{u_{\theta }}(0) \). Let \( {\widehat{u}}_{\theta (1)},\dots, {\widehat{u}}_{\theta (n)} \) be the ordered residuals from the θth quantile regression.

Order estimator: Following Siddiqui (1960), an estimator for \( 1/{f}_{u_{\theta}}^2(0) \) is provided by

$$ \frac{1}{{\widehat{f}}_{u_{\theta}}^2(0)}=\frac{{\left({\widehat{u}}_{\theta \left(\left[n\left(\theta +{h}_n\right)\right]\right)}-{\widehat{u}}_{\theta \left(\left[n\left(\theta -{h}_n\right)\right]\right)}\right)}^2}{4{h}_n^2}, $$

for some bandwidth hn = op(1). Bofinger (1975) provides an optimal choice of bandwidth that minimizes the mean squared error, based on a normal approximation to the true \( {f}_{u_{\theta }}\left(\cdot \right): \)

$$ {h}_n={n}^{-1/5}{\left(\frac{4.5{\varphi}^4\left({\Phi}^{-1}\left(\theta \right)\right)}{{\left[2{\left({\Phi}^{-1}\left(\theta \right)\right)}^2+1\right]}^2}\right)}^{1/5}, $$

where Φ and φ denote the distribution function and density function of a standard normal variable, respectively.
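
A sketch of the order estimator combined with the Bofinger bandwidth (Python with scipy; the simulated residuals stand in for actual quantile regression residuals):

```python
import numpy as np
from scipy.stats import norm

def bofinger_bandwidth(theta, n):
    """Bofinger (1975) bandwidth based on the normal reference density."""
    q = norm.ppf(theta)
    return n ** (-1 / 5) * (4.5 * norm.pdf(q) ** 4 / (2 * q ** 2 + 1) ** 2) ** (1 / 5)

def siddiqui_sparsity(resid, theta):
    """Order-statistic (Siddiqui 1960) estimate of the sparsity 1 / f_u(0)."""
    n = len(resid)
    h = bofinger_bandwidth(theta, n)
    u = np.sort(resid)
    hi = min(int(np.floor(n * (theta + h))), n - 1)
    lo = max(int(np.floor(n * (theta - h))), 0)
    return (u[hi] - u[lo]) / (2 * h)

rng = np.random.default_rng(5)
resid = rng.normal(size=2_000)                             # stand-in for QR residuals
print(siddiqui_sparsity(resid, 0.5), np.sqrt(2 * np.pi))   # true 1/f(0) for N(0,1) is sqrt(2*pi)
```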

Kernel estimator

The density \( {f}_{u_{\theta }}(0) \) can be estimated directly by

$$ {\widehat{f}}_{u_{\theta }}(0)={\left({c}_nn\right)}^{-1}\sum \limits_{i=1}^n\kern0.24em \kappa \left({\widehat{u}}_{\theta i}/{c}_n\right), $$

where κ( ) is a kernel function and cn = op(1) is the kernel bandwidth, which can be chosen optimally using a variety of cross-validation methods (for example, least-squares, log likelihood, and so on).
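
A minimal kernel sketch, using a Gaussian kernel and a simple plug-in bandwidth as a stand-in for cross-validation:

```python
import numpy as np

def kernel_density_at_zero(resid, c_n=None):
    """Gaussian-kernel estimate of f_u(0): (1/(n*c_n)) * sum kappa(u_i / c_n)."""
    n = len(resid)
    if c_n is None:
        c_n = 1.06 * np.std(resid) * n ** (-1 / 5)  # Silverman-type plug-in bandwidth
    kappa = np.exp(-0.5 * (resid / c_n) ** 2) / np.sqrt(2 * np.pi)
    return kappa.mean() / c_n

rng = np.random.default_rng(6)
resid = rng.normal(size=2_000)                                 # stand-in for QR residuals
print(kernel_density_at_zero(resid), 1 / np.sqrt(2 * np.pi))   # true f(0) for N(0,1)
```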

Bootstrap estimator for \( {\omega}_{\theta}^2 \): This estimator relies on bootstrapping the residual series \( {\widehat{u}}_{\theta i},i=1,\dots, n \). Specifically, one can obtain B bootstrap estimates of qθ, the θth quantile of uθ, say \( {\widehat{q}}_{\theta 1}^{\ast },\dots, {\widehat{q}}_{\theta B}^{\ast } \), from B bootstrap samples drawn from the empirical distribution \( {\widehat{F}}_{u_{\theta }} \). An estimator for \( {\omega}_{\theta}^2 \) is then obtained as

$$ {\widehat{\omega}}_{\theta}^2=\frac{n}{B}\sum \limits_{j=1}^B{\left({q}_{\theta j}^{\ast }-{\overline{q}}_{\theta}^{\ast}\right)}^2 $$

where \( {\overline{q}}_{\theta}^{\ast }=\frac{1}{B}{\sum}_{j=1}^B{\widehat{q}}_{\theta j}^{\ast } \).
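
A minimal sketch of this residual bootstrap (the function name and the simulated residuals are illustrative):

```python
import numpy as np

def bootstrap_omega2(resid, theta, B=500, seed=0):
    """Bootstrap estimate of omega_theta^2 = (n/B) * sum_j (q*_j - qbar*)^2."""
    rng = np.random.default_rng(seed)
    n = len(resid)
    q_star = np.array([np.quantile(rng.choice(resid, size=n, replace=True), theta)
                       for _ in range(B)])
    return n * np.var(q_star)   # np.var already averages over the B replications

rng = np.random.default_rng(7)
theta = 0.5
resid = rng.normal(size=1_000)                                            # stand-in for QR residuals
print(bootstrap_omega2(resid, theta), theta * (1 - theta) * 2 * np.pi)    # theta(1-theta)/f(0)^2 for N(0,1)
```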

The General Case

There are several alternative estimators for the general case. Here we provide two possible estimators that have been proven accurate in a variety of Monte Carlo studies (for example, Buchinsky 1995).

Kernel estimator

Powell (1986) considered the following kernel estimator for Δθ

$$ {\widehat{\Delta}}_{\theta }={\left({c}_nn\right)}^{-1}\sum \limits_{i=1}^n\;\kappa \left({\widehat{u}}_{\theta i}/{c}_n\right){x}_i{x}_i^{\prime } $$

where κ(⋅) is some kernel function and cn = op(1) is the kernel bandwidth. Note that the top left-hand element of the matrix \( {\widehat{\Delta}}_{\theta } \) is an estimate of the density\( {f}_{u_{\theta }}(0). \)Hence, the same cross-validation methods discussed before can be used to optimally choose cn.
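
A sketch that combines the kernel estimate of Δθ with the sample Λ̂ into the sandwich covariance of Theorem 1(i); the Gaussian kernel, the plug-in bandwidth, and the function name are illustrative stand-ins.

```python
import numpy as np

def qr_sandwich_cov(X, resid, theta, c_n=None):
    """Estimate Var(beta_hat) = theta(1-theta) * Delta^{-1} Lambda0 Delta^{-1} / n (general case)."""
    n, K = X.shape
    if c_n is None:
        c_n = 1.06 * np.std(resid) * n ** (-1 / 5)      # plug-in bandwidth as a stand-in
    kappa = np.exp(-0.5 * (resid / c_n) ** 2) / np.sqrt(2 * np.pi)
    Delta = (X * (kappa / c_n)[:, None]).T @ X / n      # kernel estimate of Delta_theta
    Lambda0 = X.T @ X / n                               # sample estimate of Lambda_0
    Dinv = np.linalg.inv(Delta)
    return theta * (1 - theta) * Dinv @ Lambda0 @ Dinv / n
```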

Design matrix bootstrap estimators

There are several alternative ways for employing the bootstrap method of Efron (1979). The most general method is what is termed the design matrix bootstrapping, whereby one re-samples from the joint distribution of (y, x). Specifically, let \( \left({y}_i^{\ast },{x}_i^{\ast}\right) \), i = 1 , … , n be a randomly drawn sample from the empirical distribution of (x,y), denoted \( {\widehat{F}}_{xy} \). Let \( {\widehat{\beta}}_{\theta}^{\ast } \) denote the quantile regression estimate based on the bootstrap sample. If we repeat this process B times, then an estimate for \( {V}_{\theta }=\theta \left(1-\theta \right){\Delta}_{\theta}^{-1}{\Lambda}_0{\Delta}_{\theta}^{-1} \) is given by

$$ {\widehat{V}}_{\theta }=\frac{n}{B}\sum \limits_{j=1}^B\left({\widehat{\beta}}_{\theta j}^{\ast }-{\overline{\beta}}_{\theta}^{\ast}\right){\left({\widehat{\beta}}_{\theta j}^{\ast }-{\overline{\beta}}_{\theta}^{\ast}\right)}^{\prime } $$

where \( {\overline{\beta}}_{\theta}^{\ast }=\frac{1}{B}{\sum}_{j=1}^B{\widehat{\beta}}_{\theta j}^{\ast } \). The estimate \( {\widehat{V}}_{\theta } \) is a consistent estimator for Vθ in the sense that the conditional distribution of \( \sqrt{n}\left({\widehat{\beta}}_{\theta}^{\ast }-{\widehat{\beta}}_{\theta}\right) \), conditional on the data, weakly converges to the unconditional distribution of \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right) \).
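
A sketch of the design matrix bootstrap (statsmodels' QuantReg assumed; B is kept modest for illustration):

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg  # assumed available

def pairs_bootstrap_cov(y, X, theta, B=200, seed=0):
    """Design-matrix bootstrap: resample (y_i, x_i) pairs, re-estimate, and use the spread of the estimates."""
    rng = np.random.default_rng(seed)
    n = len(y)
    betas = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        betas.append(QuantReg(y[idx], X[idx]).fit(q=theta).params)
    betas = np.array(betas)
    dev = betas - betas.mean(axis=0)
    return dev.T @ dev / B   # estimates Var(beta_hat); multiply by n to recover V_theta in the text
```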

One important caveat about bootstrapping is in order. If one uses the bootstrap method at all, it can be used more efficiently and effectively by taking advantage of its higher-order refinement properties. For example, one can construct confidence intervals, test statistics, and so forth, directly from the bootstrap estimates without first computing an estimate for Vθ. The number of bootstrap repetitions required will vary across applications; the necessary number of repetitions can be computed using the method proposed by Andrews and Buchinsky (2000).

Set of Quantile Regressions

The model presented in (1) considered only the estimation for a single quantile θ. In practice one would like to estimate several quantile regressions at distinct points of the conditional distribution of the dependent variable. This section outlines the estimation of a finite sequence of quantile regressions and provides its asymptotic distribution.

Estimation and Large Sample Properties

Consider the model given in (1) (dropping the i subscript) for p alternative θ’s:

$$ {\displaystyle \begin{array}{l}y={x}^{\prime }{\beta}_{\theta_j}+{u}_{\theta_j}\;\mathrm{where}\\ {}{Q}_{\theta_j}\left({u}_{\theta_j}|x\right)=0,\end{array}} $$

for j = 1, …, p. Without loss of generality, assume that 0 < θ1 < θ2 < ⋯ < θp < 1. Estimating the p quantile regressions amounts to running p separate regressions, one for each of θ1 through θp. Let \( {\beta}_{\theta }={\left({\beta}_{\theta_1}^{\prime },\dots, {\beta}_{\theta_p}^{\prime}\right)}^{\prime } \) denote the stacked vector of true population parameters and let \( {\widehat{\beta}}_{\theta }={\left({\widehat{\beta}}_{\theta_1}^{\prime },\dots, {\widehat{\beta}}_{\theta_p}^{\prime}\right)}^{\prime } \) denote its corresponding estimate.

Theorem 2

Under Assumptions A.1, A.2, and A.3

  1. (i)

    \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right){\to}^{\mathrm{L}}N\left(0,{\Lambda}_{\theta}\right) \), where \( {\Lambda}_{\theta }={\left\{{\Lambda}_{\theta_{jk}}\right\}}_{j,k=1,\dots, p} \) and \( {\Lambda}_{\theta_{jk}}=\left(\min \left\{{\theta}_j,{\theta}_k\right\}-{\theta}_j{\theta}_k\right){\Delta}_{\theta_j}^{-1}{\Lambda}_0{\Delta}_{\theta_k}^{-1} \);

  2. (ii)

    if in addition \( {f}_{u_{\theta_j}}\left(0|x\right)={f}_{u_{\theta_j}}(0) \) for j = 1, …, p with probability 1, then \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right){\to}^{\mathrm{L}}N\left(0,{\Lambda}_{\theta}\right) \), where \( {\Lambda}_{\theta }={\Omega}_{\theta}\otimes {\Lambda}_0^{-1} \) and \( {\Omega}_{\theta_{jk}}=\left[\min \left\{{\theta}_j,{\theta}_k\right\}-{\theta}_j{\theta}_k\right]/\left[{f}_{u_{\theta_j}}(0){f}_{u_{\theta_k}}(0)\right] \).

Crossing of Quantiles

Note that the estimated conditional quantiles, conditional on x, are given by \( {x}^{\prime }{\widehat{\beta}}_{\theta_1},\dots, {x}^{\prime }{\widehat{\beta}}_{\theta_p} \). Since the estimates \( {\widehat{\beta}}_{\theta_j}\left(j=1,\dots, p\right) \) for the p quantiles are obtained from separate quantile regressions, it is possible that for some vector x0, \( {x}_0^{\prime }{\widehat{\beta}}_{\theta_j}>{x}_0^{\prime }{\widehat{\beta}}_{\theta_k} \) even though θj < θk; that is, the conditional quantiles may cross each other. This may be of no practical consequence, since there may be no such vector within the relevant range of plausible x’s. Nevertheless, in any empirical application these potential crossings need to be examined.
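
A quick numerical check for crossing over a grid of plausible covariate values (QuantReg assumed; the design is illustrative):

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg  # assumed available

rng = np.random.default_rng(8)
n = 300
x = rng.uniform(0, 3, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + x + (0.2 + x) * rng.normal(size=n)

thetas = [0.1, 0.25, 0.5, 0.75, 0.9]
betas = np.array([QuantReg(y, X).fit(q=t).params for t in thetas])

# Evaluate the fitted conditional quantiles over a grid of x values and check monotonicity in theta.
grid = np.column_stack([np.ones(50), np.linspace(x.min(), x.max(), 50)])
fitted = grid @ betas.T                        # one column per theta, ordered from low to high
print(np.all(np.diff(fitted, axis=1) >= 0))    # True means no crossing on this grid
```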

Testing for Equality of Slope Coefficients

Under the i.i.d. assumption the p coefficient vectors \( {\beta}_{\theta_1},\dots, {\beta}_{\theta_p} \) should be the same, except for the intercept coefficients. There are a number of ways for testing the null hypothesis of i.i.d. errors. Only two testing procedures are provided here. For other alternative methods see Koenker (2005).

Wald-Type Testing

This testing procedure is based on the optimal minimum distance (MD) estimator under the null hypothesis. Denote the parameter vector under the null by \( {\beta}_{\theta}^R \) and note that \( {\beta}_{\theta}^R={\left({\beta}_{\theta_11},\dots, {\beta}_{\theta_p1},{\beta}_2,\dots, {\beta}_K\right)}^{\prime } \) is a (p + K − 1) × 1 vector, with p distinct intercepts \( {\beta}_{\theta_11},\dots, {\beta}_{\theta_p1} \) and K − 1 common slope parameters β2, … , βK.

An optimal estimate for the restricted coefficient vector \( {\beta}_{\theta}^R \) is defined by

$$ {\widehat{\beta}}_{\theta}^R=\arg \min \limits_{\beta^R} {\left({\widehat{\beta}}_{\theta }-R{\beta}^R\right)}^{\prime }{\widehat{V}}_{\theta}^{-1}\left({\widehat{\beta}}_{\theta }-R{\beta}^R\right) $$

where \( {\widehat{V}}_{\theta } \) is a consistent estimate for the covariance matrix of \( {\widehat{\beta}}_{\theta } \), the unrestricted parameter estimate from the p quantile regressions, estimated under the null (that is, under Theorem 2(ii)). The matrix R is simply a pK × (p + K − 1) restriction matrix which imposes the restrictions implied by the i.i.d. assumption. A test statistic for equality of the slope coefficients is then provided by

$$ {W}_n=n{\left({\widehat{\beta}}_{\theta }-R{\widehat{\beta}}_{\theta}^R\right)}^{\prime }{\widehat{V}}_{\theta}^{-1}\left({\widehat{\beta}}_{\theta }-R{\widehat{\beta}}_{\theta}^R\right) $$

Under the null hypothesis \( {W}_n\overset{D}{\to }{\upchi}^2\left( pK-p-K+1\right) \) as n → ∞, so the null hypothesis is rejected if \( {W}_n>{\upchi}_{1-\alpha}^2\left( pK-p-K+1\right) \), where \( {\upchi}_{1-\alpha}^2(m) \) denotes the 1 − α quantile of a χ2-distribution with m degrees of freedom.
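
A sketch of the minimum distance step and the Wald statistic. The restriction matrix R below is one concrete construction, assuming the coefficients within each quantile are ordered as (intercept, slopes), and V_hat is an estimate of the covariance matrix of \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right) \); all names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def wald_slope_equality(beta_hat, V_hat, p, K, n):
    """Minimum-distance Wald test that slope coefficients are equal across p quantiles.

    beta_hat : stacked (p*K,) vector of unrestricted quantile regression estimates
    V_hat    : (p*K, p*K) covariance estimate for sqrt(n)*(beta_hat - beta_theta)
    """
    # R maps the restricted vector (p intercepts, K-1 common slopes) into the stacked pK vector.
    R = np.zeros((p * K, p + K - 1))
    for j in range(p):
        R[j * K, j] = 1.0                               # quantile-specific intercept
        R[j * K + 1:(j + 1) * K, p:] = np.eye(K - 1)    # common slope block
    Vinv = np.linalg.inv(V_hat)
    beta_R = np.linalg.solve(R.T @ Vinv @ R, R.T @ Vinv @ beta_hat)   # optimal MD estimate
    diff = beta_hat - R @ beta_R
    W = n * float(diff @ Vinv @ diff)
    df = p * K - p - K + 1
    return W, df, chi2.sf(W, df)   # statistic, degrees of freedom, p-value
```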

GMM-Type Testing

An alternative testing procedure can be applied using Hansen’s (1982) GMM method. Define a moment function ψ(x, y, β) by stacking the p individual moment functions as defined in (3). While this moment function is a pK × 1 vector, under the null there are only p + K − 1 parameters to be estimated. Hansen’s GMM framework provides an estimator for \( {\beta}_{\theta}^R \), say \( {\widehat{\beta}}_{\theta}^R \), defined by

$$ {\widehat{\beta}}_{\theta}^R=\arg \min \limits_{b}{\left(\frac{1}{n}\sum \limits_{i=1}^n\psi \left({x}_i,{y}_i,b\right)\right)}^{\prime }{A}^{-1}\left(\frac{1}{n}\sum_{i=1}^n\psi \left({x}_i,{y}_i,b\right)\right), $$
(4)

An efficient estimator can be obtained if A is chosen so that \( A\overset{p}{\to }E\left[\psi \left(x,y,{\beta}_{\theta}\right)\psi {\left(x,y,{\beta}_{\theta}\right)}^{\prime}\right] \) as n → ∞. This framework provides us with a straightforward testing procedure. Under the null hypothesis

$$ n{\left(\frac{1}{n}\sum \limits_{i=1}^n\psi \left({x}_i,{y}_i,{\widehat{\beta}}_{\theta}^R\right)\right)}^{\prime }{A}^{-1}\left(\frac{1}{n}\sum \limits_{i=1}^n\psi \left({x}_i,{y}_i,{\widehat{\beta}}_{\theta}^R\right)\right)\overset{D}{\to }{\upchi}^2\left( pK-p-K+1\right), $$

as n → ∞.

Note that, because of the linearity of the conditional quantiles, the GMM test provides a test statistic that is (asymptotically) equivalent to the one provided by the MD (Wald-type) test.

Censored Quantile Regression

An important extension of the quantile regression model was suggested by Powell (1984, 1986). This extension considers the case in which some of the observations are censored. The model is essentially a semiparametric extension of the well-known ‘Tobit’ model and can be written as

$$ {y}_i=\min \left\{{y}_i^0,{x}_i^{\prime }{\beta}_{\theta }+{u}_{\theta i}\right\} $$

for i = 1, …, n, where \( {y}_i^0 \) is the (known) top-coding value of yi in the sample. (For simplicity of presentation it will be assumed that \( {y}_i^0={y}^0 \) for all i = 1, …, n.)

This model can be written as a latent variable model. That is, we have \( {y}_i^{\ast }={x}_i^{\prime }{\beta}_{\theta }+{u}_{\theta i} \), where Qθ(uθi| xi) = 0 and \( {y}_i={y}_i^{\ast }I\left({y}_i^{\ast}\le {y}^0\right) \). It is easy to see that the observed conditional θth quantile of yi, conditional on xi, is given by

\( {Q}_{\theta}\left({y}_i|{x}_i\right)=\min \left\{{y}^0,{x}_i^{\prime }{\beta}_{\theta}\right\} \).

Hence, Powell suggested the following estimator for βθ

$$ {\widehat{\beta}}_{\theta }=\arg \min \limits_{\beta }\frac{1}{n}\sum \limits_{i=1}^n{\rho}_{\theta}\left({y}_i-\min \left\{{y}^0,{x}_i^{\prime}\beta \right\}\right) $$
(5)

where ρθ(λ) is the same check function as defined above. Note that, in order to obtain a consistent estimator of βθ, it is necessary that \( {x}_i^{\prime }{\beta}_{\theta }<{y}^0 \) for at least a positive fraction of the sample. Intuitively, the larger this fraction, the more precise the estimator will be.
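
A crude sketch that minimizes Powell's objective by a derivative-free search; this is purely illustrative, since the objective is not globally convex and in practice iterative LP-based algorithms or multiple starting values are used. All names and the simulated data are assumptions of the example.

```python
import numpy as np
from scipy.optimize import minimize

def censored_qr(y, X, y0, theta, beta_start):
    """Minimize (1/n) * sum rho_theta(y_i - min(y0, x_i'beta)) from a given starting value."""
    def objective(beta):
        u = y - np.minimum(y0, X @ beta)
        return np.mean(u * (theta - (u < 0)))
    return minimize(objective, beta_start, method="Nelder-Mead").x

rng = np.random.default_rng(9)
n, theta, y0 = 1_000, 0.5, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = X @ np.array([1.0, 1.0]) + rng.normal(size=n)   # latent outcome
y = np.minimum(y_star, y0)                               # observed, top-coded at y0

beta_start = np.linalg.lstsq(X, y, rcond=None)[0]        # naive starting value
print(censored_qr(y, X, y0, theta, beta_start))          # roughly (1.0, 1.0) if the search succeeds
```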

Powell (1986) showed that, under certain regularity conditions, similar to those established by Huber (1967), the estimator is asymptotically normal. That is, \( \sqrt{n}\left({\widehat{\beta}}_{\theta }-{\beta}_{\theta}\right)\overset{D}{\to }N\left(0,{V}_{\theta}^C\right) \) as n → ∞, where

$$ {\displaystyle \begin{array}{l}{V}_{\theta}^{\mathrm{C}}=\theta \left(1-\theta \right){\Delta}_{\mathrm{C}\theta}^{-1}{\Lambda}_{\mathrm{C}\theta }{\Delta}_{\mathrm{C}\theta}^{-1},\\ {}{\Delta}_{\mathrm{C}\theta }=E\left[{f}_{u_{\theta }}\left(0|x\right)I\left({x}^{\prime }{\beta}_{\theta}\le {y}^0\right){xx}^{\prime}\right],\mathrm{and}\\ {}{\Lambda}_{\mathrm{C}\theta }=E\left[I\left({x}^{\prime }{\beta}_{\theta }<{y}^0\right){xx}^{\prime}\right].\end{array}} $$

As in the basic quantile regression model, if \( {f}_{u_{\theta }}\left(0|x\right)={f}_{u_{\theta }}(0) \) with probability 1, then \( {V}_{\theta}^C \) simplifies to \( {V}_{\theta}^C={\omega}_{\theta}^2{\Lambda}_{C\theta}^{-1} \), where \( {\omega}_{\theta}^2=\theta \left(1-\theta \right)/{f}_{u_{\theta}}^2(0) \).

It is important to note that if \( {x}_i^{\prime }{\widehat{\beta}}_{\theta}\le {y}^0 \) for all observations, then the censored quantile regression estimate coincides with the basic quantile regression.

The simple intuition for this estimation procedure is that βθ can be estimated only from that part of the sample for which the latent variable is observed, that is, the fraction of the sample for which \( y={y}^{\ast }={x}^{\prime }{\beta}_{\theta }+{u}_{\theta}\le {y}^0 \). As a result, the asymptotic covariance matrix is ‘adjusted’ for this fact: the indicator term I(x′βθ ≤ y0) appears in both \( {\Delta}_{\mathrm{C}\theta } \) and \( {\Lambda}_{\mathrm{C}\theta } \).

A considerable drawback of the censored quantile regression model is that it does not have the attractive LP representation and the objective function is not globally convex in β.

Concluding Remarks

The main goal of this article is to provide the basic structure of the quantile regression model. Versions of this model have been widely used in the empirical literature in a variety of situations not covered by this article. Furthermore, there have been substantial advancements in the theoretical literature as well. This literature includes quantile regression for nonlinear models, time-series models, and others. There are also a number of empirical studies that have used quantile regression extensively, in a variety of data configurations and economic contexts. For a brilliant in-depth exposition of a wide variety of topics related to quantile regression, interested readers should refer to Koenker (2005).

See Also