1 Introduction

Quantile regression was introduced by Koenker and Bassett [64] as an extension of standard mean regression: by modelling the conditional quantiles of a continuous response, it provides a more complete picture of its conditional distribution. It has become a very popular and consolidated approach in the statistical literature and is nowadays applied in a wide range of fields, including econometrics, finance, biomedicine and ecology. Comprehensive reviews can be found, among others, in [54, 63, 67, 118]. Specific examples are provided by Machado and Mata [88] in econometrics, by Austin and Schull [4] in epidemiology, and by Cade et al. [15] in ecology.

The literature on quantile regression methods is now extremely vast. In this work we review a specific area of application; in particular, we focus on linear quantile regression models for longitudinal observations. In the last two decades, longitudinal study designs have attracted considerable attention. Obtaining additional information from a unit already in the study is cheaper than adding a new one; also, longitudinal studies make it possible to monitor the evolution of individual trajectories over time. Weiss [113] lists a number of other benefits of collecting longitudinal in place of cross-sectional data. Additional issues have to be faced when dealing with longitudinal studies, though. Observations from the same individual are naturally dependent, and this has to be taken into consideration to avoid potential bias in parameter estimates; moreover, individuals may leave the study before the planned end, thus presenting incomplete data records. In such a context, standard regression models cannot be directly used and need to be extended to avoid misleading inferential conclusions.

We will not propose a unifying framework, but rather try to discuss the available options in a logically sound sequence, presenting the advantages and limitations of each of them. We will approach both the model specification and the estimation of model parameters mainly from a statistical perspective. Quantile regression is indeed a popular methodology also in the econometric framework; however, in some cases, a clash does exist between the two contexts, with rationale, language and problems to be faced that can be quite different. We will also try to underline that longitudinal quantile regression shares some challenges with standard longitudinal regression (e.g., dependence) and raises some of its own. Quantile regression is more appropriate than mean regression in a number of situations. In some cases, the researcher’s interest may not lie in the center of the distribution, but rather in its tails; furthermore, covariates may have different effects on different quantiles. In HIV research, for instance, the effect of the treatment is more important on the left tail of the CD4 count distribution, where individuals are at higher risk. Additionally, a treatment might be beneficial for patients at lower risk but detrimental (due to complications or side effects) for those in the left tail. Other examples include longitudinal fetal growth studies (e.g., [25]), which usually focus on low/high quantiles of key anthropometric measurements and, more generally, growth curves [111, 112]. An example regarding monitoring physical activity in children is given in [47]. Another crucial point is that outliers may be present in the observed data, and this raises questions on the reliability of mean regression estimates. In these cases, one could use robust regression approaches (see [34, 35, 57], for reviews) or focus on the median of the outcome distribution. Quantile regression, additionally, allows the user to avoid transformations of the outcome in many cases, making parameter estimates more readily interpretable. While in cross-sectional designs a satisfactory transformation to normality can often be found, in longitudinal studies the outcome distribution may change in shape (e.g., skewness) at each time occasion. Thus, a global transformation may make time-specific distributions far from Gaussian, and time-specific transformations may lead to parameter estimates that are quite hard to interpret.

The rest of the paper is structured as follows: in the next section, we briefly review the essentials of cross-sectional linear quantile regression. In Sect. 3, we provide an overview of modelling specifications for longitudinal quantile regression, while estimation strategies are presented in Sect. 4. In Sect. 5, we discuss several extensions and open issues; available software is briefly listed in Sect. 6.

2 Quantile regression for independent observations

Let \(y_1, \ldots , y_n\) be the realizations of n independent and identically distributed random variables, \(Y_1, \ldots , Y_n\), denoting n copies of a continuous outcome of interest. Let also \(\mathbf x _{i} = (x_{i1}, \ldots , x_{ip}) \) be a p-dimensional vector of explanatory variables, recorded for unit \(i = 1,\ldots , n\). In this section, we deal with a cross-sectional experiment based on independent data.

A possible way to characterize the distribution of \(Y_i\) is in terms of quantiles. The \(\tau \)-th quantile \(Q_{\tau }(y)\) of a variable Y is conveniently defined as

$$\begin{aligned} Q_{\tau }(y) = \arg \min _{c} \mathbb E[ \rho _{\tau } ( Y - c) ], \end{aligned}$$

where \(\rho _{\tau } (u) = u[ \tau - \mathbb I (u<0)]\) is an asymmetric absolute loss function that assigns weights \(\tau \) and \(( \tau -1)\) to positive and negative deviations, respectively, and is usually referred to as the quantile loss function.
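
For concreteness, a minimal Python sketch of this loss function follows (the numerical values are illustrative):

```python
import numpy as np

def rho(u, tau):
    """Quantile (check) loss: weight tau on positive deviations, tau - 1 on negative ones."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

# Example: tau = 0.9 penalizes positive deviations more heavily than negative ones.
print(rho(2.0, 0.9))   # 1.8
print(rho(-2.0, 0.9))  # 0.2
```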

Now, suppose we want to estimate a conditional quantile, that is, the quantile of \(Y_i\) conditional on a given covariate configuration \(\mathbf x _i\). For a fixed \(\tau \in (0,1)\), the following quantile regression model can be specified

$$\begin{aligned} Q_{\tau } (y_i \mid \varvec{\beta }, \mathbf x _i) = \mathbf x _{i}^\prime \varvec{\beta }_\tau , \end{aligned}$$
(1)

where \(\varvec{\beta }_\tau \in \mathbb R^{p}\) denotes a vector of unknown, fixed, parameters summarizing the effects of \(\mathbf x _i\) on the \(\tau \)-th (conditional) response quantile. Expression (1) can be, alternatively, formulated through the linear model

$$\begin{aligned} y_i = \mathbf x _{i}^\prime \varvec{\beta }_\tau + \varepsilon _{i}, \end{aligned}$$
(2)

where \(\varepsilon _i\) is a random error term. The assumption

$$\begin{aligned} Q_{\tau } (\varepsilon _i \mid \varvec{\beta }, \mathbf x _i) = 0 \end{aligned}$$

is introduced to guarantee that the random errors are centred on the \(\tau \)-th quantile.

Interpretation of \({\varvec{\beta }}_\tau \) is straightforward: the intercept term simply represents the baseline predicted quantile, while each slope can be interpreted as the rate of change of the \(\tau \)-th response quantile per unit change in the value of the corresponding covariate (the others being fixed). Estimation of \(\varvec{\beta }_\tau \) in (1) or (2) proceeds by solving

$$\begin{aligned} \hat{\varvec{\beta }}_\tau = \arg \min _{\varvec{\beta }_\tau } \sum _{i: y_i \ge \mathbf x _i ^\prime \varvec{\beta }_\tau } \tau \mid y_i - \mathbf x _i ^\prime \varvec{\beta }_\tau \mid + \sum _{i: y_i < \mathbf x _i ^\prime \varvec{\beta }_\tau } (1- \tau ) \mid y_i - \mathbf x _i ^\prime \varvec{\beta }_\tau \mid . \end{aligned}$$
(3)

As suggested by Koenker and Bassett [64], optimal solutions can be derived via appropriate linear programming algorithms. The most common is a modified version of the Barrodale and Roberts algorithm for \(L_1\)-regression, as described by Koenker and d’Orey [65, 66]. For large problems, the Frisch–Newton interior point method (possibly after preprocessing) is a better option, as illustrated by Portnoy and Koenker [97]. Newey and Powell [95] show that asymmetric least squares estimators have properties similar to those of the solutions obtained from (3).
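
As an illustration (and not as a substitute for the specialized algorithms just mentioned), problem (3) can be written as a standard linear program, with the positive and negative residual parts as auxiliary variables, and handed to a generic solver; a minimal sketch with simulated data:

```python
import numpy as np
from scipy.optimize import linprog

def qreg_lp(X, y, tau):
    """Quantile regression (3) as a linear program:
    min tau * 1'u_pos + (1 - tau) * 1'u_neg
    s.t. X beta + u_pos - u_neg = y, with u_pos, u_neg >= 0 and beta free."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=n)
print(qreg_lp(X, y, tau=0.5))  # roughly (1, 2): the median regression fit
```

In practice, dedicated implementations such as those in the R package quantreg are preferable; the sketch only makes the linear programming structure explicit.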

A natural link between minimization of the quantile loss function and maximum likelihood theory is given by the assumption that the error term in Eq. (2) follows an asymmetric Laplace distribution (ALD), see, among others, [68, 119]. An ALD random variable has density

$$\begin{aligned} f_{y} (y \mid \mu , \sigma , \tau ) = \frac{\tau (1-\tau )}{ \sigma } \exp \left\{ - \rho _{\tau } \left( \frac{ y - \mu }{\sigma } \right) \right\} , \end{aligned}$$
(4)

where \(\rho _\tau (u)\) is the quantile loss function we have previously defined and \(\mu , \sigma \) and \(\tau \) are the location, the scale and the skewness parameter, respectively. By assuming that \(y_1, \ldots , y_n\) are independent realizations of random variables \(Y_i \sim \text {ALD}(\mu _i, \sigma , \tau )\), where \(\mu _i = \mathbf x _{i}^\prime \varvec{\beta }_\tau , i = 1, \ldots , n\), the following likelihood function can be derived

$$\begin{aligned} L(\varvec{\beta }, \sigma , \tau ) = \left[ \frac{\tau (1-\tau )}{\sigma } \right] ^{n} \exp \left\{ - \sum _{i = 1}^n \rho _\tau \left( \frac{y_i - \mathbf x _{i}^\prime \varvec{\beta }_\tau }{\sigma } \right) \right\} . \end{aligned}$$
(5)

It is worth noticing that the assumption of ALD errors is introduced only as a convenient computational device that allows quantile regression optimization to be recast in a maximum likelihood framework. Maximizing the likelihood (5) is equivalent to minimizing expression (3) with respect to \(\varvec{\beta }_\tau \). Furthermore, such a distributional assumption allows several extensions of the basic framework, including the modelling of dependent observations.
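
The equivalence is easy to verify numerically: for any fixed \(\sigma \), the ALD log-likelihood is a decreasing affine function of the quantile loss. A small sketch (data and coefficient vectors are illustrative):

```python
import numpy as np

def rho(u, tau):
    return u * (tau - (u < 0))

def ald_loglik(beta, X, y, tau, sigma=1.0):
    """Log of likelihood (5): n*log(tau(1-tau)/sigma) - (1/sigma) * sum of quantile losses."""
    u = y - X @ beta
    return len(y) * np.log(tau * (1 - tau) / sigma) - np.sum(rho(u, tau)) / sigma

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = rng.normal(size=50)
b1, b2 = np.array([0.0, 0.0]), np.array([0.1, -0.2])
# The coefficient vector with the smaller quantile loss always has the larger
# ALD log-likelihood, so the two printed booleans coincide.
print(ald_loglik(b1, X, y, 0.25) > ald_loglik(b2, X, y, 0.25),
      np.sum(rho(y - X @ b1, 0.25)) < np.sum(rho(y - X @ b2, 0.25)))
```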

3 Quantile regression for longitudinal observations

When data are repeatedly collected on a sample of individuals across time, the independence assumption may no longer hold. Three different sources of variability can influence the dependence between observations from the same individual: between-individual variability (reflecting individual propensities shared by all the repeated measures coming from the same unit), within-individual variability (that is, serial correlation between measurements from the same individual taken at different time points) and random error. Diggle et al. [30] and Fitzmaurice et al. [39], among others, give a detailed discussion of the topic. If dependence is not properly taken into consideration, model parameter estimates may be severely biased. Two main approaches to dealing with dependent observations can be distinguished, corresponding essentially to the families of marginal and conditional models. In the first case, the association structure is explicitly specified together with the model for the response quantiles. Here, model parameters have a population-averaged interpretation, as they describe changes in the response values between separate groups of individuals (in the study population), as distinguished by their covariates. In the second case, the response quantiles and the dependence between repeated measurements are jointly specified. Individual-specific (fixed or random) parameters are introduced in the model specification; these are shared by all the responses coming from the same sample unit and describe sources of unobserved heterogeneity that influence the dependence between longitudinal observations. In this context, model parameters have an individual-specific interpretation, that is, they reflect variations in the individual response values over time associated with a change in that individual’s covariates [38].

An approach which is in-between the marginal and the conditional one is discussed in [98]. The author considers individual-specific fixed effects for identification purposes only. Parameters associated with the observed covariates are, instead, defined as a function of the “total disturbance”, that is, the individual-specific effect plus the random error, and thus have a population-averaged interpretation.

We introduce marginal and conditional models in Sects. 3.1 and 3.2, respectively. Differences between the two approaches are discussed in Sect. 3.3.

3.1 Marginal models

Let \(Y_{it}\) denote a continuous longitudinal response recorded on \(i=1, \ldots , n\) individuals at time occasions \(t = 1,\ldots ,T_i\) and let \(\mathbf x _{it}\) be a p-dimensional vector of explanatory variables associated with the parameter vector \(\varvec{\beta }_{\tau }\). Marginal models are specified like cross-sectional ones, that is

$$\begin{aligned} Q_{\tau } (y_{it} \mid \varvec{\beta }, \mathbf x _{it}) = \mathbf x _{it} ^\prime \varvec{\beta }_{\tau }. \end{aligned}$$
(6)

The resulting error terms \(\varepsilon _{it} = y_{it} - \mathbf x _{it} ^\prime \varvec{\beta }_{\tau }\) must fulfil the following assumptions: first, \(\Pr (\varepsilon _{it} \le 0 \mid \mathbf x _{it}) = \tau \); second, the vector of error terms has independent components across different individuals but dependent components across repeated measurements on the same individual.

The model is completed by an assumption on the association structure for measures coming from the same individual. This structure clearly represents an approximation of the true underlying dependence between repeated measurements and is treated as a nuisance parameter. As is well known since [110], consistent estimates of the \(\varvec{\beta }_{\tau }\) parameters can be obtained even if the covariance matrix is misspecified. Even an independence assumption, albeit obviously incorrect, yields consistent estimates. On the other hand, a good approximation of the true association substantially decreases the MSE of parameter estimates. When compared to mean regression, the identification of a reasonable covariance structure in the quantile regression framework is more challenging. For example, if the random noise vector for the repeated measures from an individual follows an AR(1) structure, the corresponding noise vector for the quantile regression is no longer AR(1) [75]. This makes any assumption difficult to motivate; based on these considerations, He et al. [56] recommend the working independence assumption.

Despite these difficulties, some proposals beyond the working independence assumption can be found in the literature. Assuming \( \Pr (\varepsilon _{it} \le 0 ,\varepsilon _{it^\prime } \le 0 \mid \mathbf x _{it}) = \delta , t\ne t^\prime = 1, \dots , T_i\), Fu and Wang [40] define an exchangeable covariance matrix for \(\mathbf S _{i} = (S_{i1}, \dots , S_{iT_{i}})\), with \(S_{it} = \tau - \mathbb I(y_{it} - \mathbf x _{it}^\prime \varvec{\beta }_\tau \le 0)\), as

$$\begin{aligned} \mathbf V _i= \tau (1-\tau ) [ (1-c)\mathbf I _{T_i} + c \mathbf J _{T_i}]. \end{aligned}$$
(7)

In the expression above, \(c = \text {Cor}(S_{it}, S_{it^\prime }) = (\delta - \tau ^2) / (\tau -\tau ^2), \forall t \ne t^\prime ,\) while \(\mathbf I _{T_i}\) and \(\mathbf J _{T_i}\) represent an identity matrix and a square matrix of ones, respectively (both of dimension \(T_i\)). Simulation results in [40] show that assuming an exchangeable structure in place of independence results in higher efficiency when strong within-subject correlation exists, at the price of a larger bias in the parameter estimates. A general stationary autocorrelation structure has been proposed by Lu and Fan [84]; here, the covariance matrix of \(\mathbf S _{i}\) is given by

$$\begin{aligned} \mathbf V _i(\mathbf r ) = \mathbf A _i^{1/2} \mathbf C _i (\mathbf r ) \mathbf A _i^{1/2}, \end{aligned}$$
(8)

where \(\mathbf A _i = \text {diag}\{\text {Var}(S_{i1}), \dots , \text {Var}(S_{iT_{i}})\}\) and \(\mathbf C _i(\mathbf r ) \) is an autocorrelation matrix indexed by the parameter vector \(\mathbf r = (r_1, \ldots , r_{T_{i}-1})\) and defined as

$$\begin{aligned} \mathbf C _{i}(\mathbf r ) = \left( \begin{array}{cccc} 1 &{}\quad r_1 &{}\quad \dots &{}\quad r_{T_i -1}\\ r_1 &{}\quad 1 &{}\quad \dots &{}\quad r_{T_i -2}\\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ r_{T_i -1} &{}\quad r_{T_i -2} &{}\quad \dots &{}\quad 1 \\ \end{array} \right) . \end{aligned}$$

Estimates of the autocorrelation parameters \(r_l, l = 1, \ldots , T_i -1,\) are obtained via the method of moments. A homogeneous unstructured covariance matrix may also be assumed; see [56].
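
To make the two working structures concrete, a short numpy sketch constructing the exchangeable matrix (7) and a stationary Toeplitz autocorrelation matrix follows (the numerical values of \(\delta \) and \(\mathbf r \) are illustrative):

```python
import numpy as np
from scipy.linalg import toeplitz

def exchangeable_V(T, tau, delta):
    """Exchangeable working covariance (7), with c = (delta - tau^2) / (tau - tau^2)."""
    c = (delta - tau**2) / (tau - tau**2)
    return tau * (1 - tau) * ((1 - c) * np.eye(T) + c * np.ones((T, T)))

def stationary_C(r):
    """Toeplitz autocorrelation matrix C_i(r), first row (1, r_1, ..., r_{T-1})."""
    return toeplitz(np.concatenate([[1.0], np.asarray(r)]))

print(exchangeable_V(T=3, tau=0.5, delta=0.35))  # here c = 0.4
print(stationary_C([0.6, 0.3]))
```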

3.2 Conditional models

An alternative approach to accounting for the dependence between longitudinal observations is based on the inclusion, in the linear predictor, of sources of unobserved heterogeneity. This heterogeneity derives either from omitted covariates or from a different effect of measured covariates on the response due to genetic, environmental, social and/or economic factors. Individual-specific parameters fruitfully describe these individual features. For a given \(\tau \in (0, 1)\), a conditional quantile regression model is defined by

$$\begin{aligned} Q_{\tau } (y_{it} \mid b_i, \varvec{\beta }, \mathbf x _{it}) = b_i + \mathbf x _{it} ^\prime \varvec{\beta }_\tau , \end{aligned}$$
(9)

which may be, equivalently, written as

$$\begin{aligned} y_{it} = b_i + \mathbf x _{it} ^\prime \varvec{\beta }_\tau + \varepsilon _{it}. \end{aligned}$$
(10)

In the equation above, \(\varepsilon _{it}\) is an error term whose \(\tau \)-th conditional quantile is identically null, that is \(Q_{\tau } (\varepsilon _{it} \mid b_i, \varvec{\beta }, \mathbf x _{it}) = 0\), while \(\varvec{\beta }_\tau \) summarizes the relation between the covariates in \(\mathbf x \) and the \(\tau \)-th response quantile for an individual whose baseline level is equal to \(b_i\). The dependence between observations from the same individual \(i = 1, \ldots , n\) arises since they share the same \(b_i\): conditional on this parameter, repeated measures are no longer dependent.

Two different approaches to (conditional) quantile regression may be distinguished, referring to distribution-free and likelihood-based methods [46]. Within the former, fixed individual-specific intercepts are considered and treated as pure location-shift parameters common to all conditional quantiles. This implies that the conditional distribution for each individual has the same shape but a different location, as long as the \(b_i\)’s differ. Fixed-effects quantile regression for longitudinal data was introduced by Koenker [62] and has subsequently been extended to allow for general endogenous covariates [55] and lagged responses [41, 42].

On the other hand, in the likelihood framework, the individual-specific parameters \(b_i\) are assumed to be independent and identically distributed random variables; the corresponding distribution explains differences in the response quantiles across individuals [45]. General random parameters have been considered by Liu and Bottai [83] and Geraci and Bottai [46]. Let \(\mathbf b _i = (b_{i1}, \ldots , b_{iq})\) represent a q-dimensional vector of individual random parameters, with density \(f_b(\cdot ; \mathbf D _\tau )\), where \(\mathbf D _\tau \) is usually a \(\tau \)-dependent covariance matrix. A linear quantile mixed model is, therefore, defined by

$$\begin{aligned} y_{it} =\mathbf x _{it} ^\prime \varvec{\beta }_\tau + \mathbf z _{it}^\prime \mathbf b _i +\varepsilon _{it}, \end{aligned}$$
(11)

where \(\mathbf z _{it}\) denotes a subset of \(\mathbf x _{it}\) and, as before, \(Q_\tau (\varepsilon _{it}\mid \mathbf b _i, \varvec{\beta }, \mathbf x _{it}, \mathbf z _{it})=0\).

As pointed out by Geraci and Bottai [46], the random structure above makes it possible to account for between-individual heterogeneity associated with given explanatory variables and does not require orthogonality between the observed and the omitted covariates. We mention that, when the error terms in Eq. (11) are Gaussian random variables, a standard (mean) mixed model for longitudinal observations [72] is straightforwardly obtained.
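
To illustrate the structure of model (11), the following sketch simulates from its random-intercept special case, with ALD errors used purely as a generating device so that the \(\tau \)-th conditional quantile of \(\varepsilon _{it}\) is exactly zero (all numerical values are illustrative):

```python
import numpy as np

def rald(rng, tau, size, sigma=1.0):
    """Draw ALD(0, sigma, tau) errors by inverting the ALD cdf;
    by construction their tau-th quantile is exactly zero."""
    p = rng.uniform(size=size)
    y = np.where(p <= tau,
                 np.log(p / tau) / (1 - tau),
                 -np.log((1 - p) / (1 - tau)) / tau)
    return sigma * y

# y_it = b_i + x_it' beta_tau + eps_it, with b_i ~ N(0, 1) time-constant
rng = np.random.default_rng(2)
n, T, tau = 100, 5, 0.75
b = rng.normal(size=n)                 # random intercepts, shared over time
x = rng.normal(size=(n, T))
y = b[:, None] + 1.0 + 2.0 * x + rald(rng, tau, size=(n, T))
# The tau-th quantile of the errors is zero, as required by the model:
print(np.quantile(y - b[:, None] - 1.0 - 2.0 * x, tau))   # close to 0
```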

When sources of unobserved heterogeneity are time-varying, the time-constant random parameters we have described so far may fail to recover the true parameter values, especially at quantiles that are far from the median. The assumption that \(b_i\) is time-constant has been relaxed by Farcomeni [33] through the specification of a latent Markov model for conditional quantiles. Along these lines, Marino et al. [89] describe a mixed latent Markov model for quantiles where both time-constant and time-varying sources of unobserved heterogeneity are taken into consideration. In Sect. 3.2.1, alternative distributions for the random parameters are discussed.

3.2.1 The random parameter distribution

When unobserved heterogeneity is captured by individual-specific random parameters, the specification of the corresponding distribution requires some caution. Time-constant individual-specific parameters distributed according to a (zero-mean) Gaussian, a symmetric Laplace or a multivariate t random variable have been considered by Geraci and Bottai [45], Liu and Bottai [83], Geraci and Bottai [46] and Farcomeni and Viviani [36]. The last two distributions represent robust alternatives to the more typical Gaussian assumption.

Farcomeni [33] considers random intercepts that vary over time according to a homogeneous, first order, latent Markov (LM) chain. The resulting model represents a semi-parametric alternative to the models we have discussed so far; it allows to account for time-varying unobserved heterogeneity and avoids the potential bias deriving from a misspecification of the random parameter distribution. When adopting this approach, the model in Eq. (10) becomes

$$\begin{aligned} y_{it} = b_{it} + \mathbf x _{it} ^\prime \varvec{\beta }_\tau + \varepsilon _{it}, \end{aligned}$$
(12)

where \(b_{it}\) varies according to the states of a (quantile-specific) latent Markov chain and \(Q_\tau (\varepsilon _{it} \mid b_{it}, \varvec{\beta }, \mathbf x _{it}) = 0\). Under the assumption of Gaussian random errors, Eq. (12) reduces to the standard latent Markov model for longitudinal data; see [7] for a detailed discussion of this class of models. By assuming that the hidden transition matrix is diagonal, the random intercepts are time-constant and a parsimonious (quantile) latent class model is obtained. This comes at the price of losing some goodness of fit, so that this assumption should tentatively be tested [5].
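
A minimal sketch of the latent Markov mechanism behind Eq. (12) follows: time-varying random intercepts are generated from a homogeneous first-order chain (the support points, initial and transition probabilities below are illustrative):

```python
import numpy as np

def simulate_lm_intercepts(rng, n, T, support, delta, Q):
    """Simulate b_it from a homogeneous first-order Markov chain with
    state-specific intercepts `support`, initial probabilities `delta`
    and transition matrix `Q`, as in the latent Markov model (12)."""
    m = len(support)
    states = np.empty((n, T), dtype=int)
    states[:, 0] = rng.choice(m, size=n, p=delta)
    for t in range(1, T):
        for i in range(n):
            states[i, t] = rng.choice(m, p=Q[states[i, t - 1]])
    return np.asarray(support)[states]

rng = np.random.default_rng(3)
Q = np.array([[0.9, 0.1],    # nearly diagonal Q: intercepts rarely switch state;
              [0.2, 0.8]])   # a diagonal Q would give the latent class special case
b = simulate_lm_intercepts(rng, n=4, T=6, support=[-1.0, 1.0],
                           delta=np.array([0.5, 0.5]), Q=Q)
print(b)
```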

The choice of a specific random parameter distribution should be data driven; models based on different specifications might be fit and compared according to penalized likelihood criteria, such as AIC [2] or BIC [103]. A few additional guidelines follow. First of all, since linear quantile mixed models are essentially a specific kind of standard mixed model, when we focus on the center of the distribution and the number of repeated measurements per unit is large, the modelling structure is rather robust to misspecification of the random parameter distribution [101]. Even with small \(\max _i T_i\), the bias is often negligible and essentially related to the effects of covariates associated with random parameters [94]. Serious bias might be expected when focusing on quantiles corresponding to low-density regions of the conditional outcome distribution, as well as when the covariates associated with random slopes are misspecified. A related matter is the use of random slopes and similar polynomial models to describe time-varying unobserved heterogeneity: the latent Markov framework naturally adapts to the true underlying dynamic heterogeneity. It is therefore resistant to misspecification of the random parameter distribution (as the discrete support approximates a possibly continuous underlying distribution), and the time dynamics do not need to be parametrically specified. Of course, there are limitations to these statements, and we point the reader to the discussion in [8] for more details.

3.3 Marginal vs conditional models

To better highlight the differences between the two families of models, we conclude this section by contrasting conditional and marginal approaches. In the mean regression context, Zeger et al. [123] state that marginal models describe the dependence among repeated observations; in this setting, the target population is considered as composed of homogeneous individuals with dependent repeated measurements and, therefore, population quantiles can be directly modelled as a function of the observed covariates. Due to these features, marginal models are often referred to as population-averaged models.

On the other hand, conditional models try to describe the potential sources of this dependence. They include in the model specification individual-specific parameters that account for unobserved factors which, once known, would make longitudinal observations from the same individual independent. In this perspective, individual characteristics play a central role: the target population is considered as composed by heterogeneous individuals and the effect of covariates on individual quantiles is analysed.

The choice between marginal and conditional models is always context-specific and has to be related to the analysed outcome; see e.g. [38, 74, 79, 93]. The most important thing to keep in mind is that the interpretation of parameter estimates differs. In the marginal formulation, the parameters \(\varvec{\beta }_\tau \) describe the effect of covariates on the \(\tau \)-th population response quantile. In the conditional framework, on the other hand, regression coefficients have an individual-specific interpretation: \(\beta _{\tau j}\) represents the change in the \(\tau \)-th quantile of the outcome distribution (associated with a unit increment in the corresponding covariate) for a unit with individual-specific parameter equal to \(b_i\).

Therefore, the applied researcher must choose the fitting strategy according to the research aims. The use of marginal models for repeated measurements is often discouraged [24, 74, 79], as predictions correspond to hypothetical individuals only. In almost all cases, therefore, one should fit conditional models. However, population-averaged models may be useful in epidemiological or ecological studies, for instance, if the key questions involve a comparison between disjoint groups. When the central questions entail the evaluation of changes for any given individual (e.g., what happens if one treatment instead of another is used on an individual, what happens after t time units, etc.), conditional models are the only sensible choice.

It should be noticed that conditional models can be specified in conjunction with the assumption that subject-specific parameters are fixed or random. Choosing between fixed and random effects is often a delicate matter. Here, we do not go into much detail about the former and point the interested reader to the econometric literature (see e.g. [17] and references therein).

We only mention some general considerations that may help decide which approach is more appropriate for a given problem. First, random intercept models rely on the assumption of independence between sources of observed and unobserved heterogeneity, while such an assumption is not needed in the fixed-effects counterpart. Therefore, if one thinks that unobserved features are related to the observables (\(\mathbf x _{it}\)), then conditional models with fixed effects should be preferred. However, when general random coefficients are considered, one can let any of the model parameters vary from individual to individual, not just the intercept. This relaxes the aforementioned independence assumption and leads to a more general model formulation. On the other hand, in the context of conditional models with fixed effects, only individual-specific intercepts can be considered and, although one is able to control for unobserved characteristics, their effect cannot be directly estimated. Second, time-constant covariates cannot be accommodated in conditional models with fixed effects, as they would be confounded with the individual-specific intercepts; such a constraint does not exist in the random parameter framework. Last, conditional models with fixed individual-specific parameters require complex identifiability constraints and, in general, cannot be estimated when the available number of repeated measures per individual is small. Furthermore, the consistency of parameter estimates is ensured only when \(T_i\) diverges. This is another strong limitation in many common applications, where \(T_i\) is generally small and only n is large. A consequence is that, in many practical situations in the applied areas of statistics, random parameter models are the only viable choice.

We conclude this discussion by mentioning few examples. A benchmark dataset in longitudinal quantile regression is the labor pain data [26], which involves evaluation of pain levels at different stages of labor. A psychological component intuitively makes treatment less effective when we look at high quantiles of the distribution. In this application, population averaged effects would be misleading as we are interested in how pain evolves for each woman.

Fabrizi et al. [32] evaluate wage dynamics in the Italian labor market, where differences, for instance, among qualifications might be evaluated through marginal models. A longitudinal study on child obesity is described by Fenske et al. [37], and the effects of fast-food prices on adult obesity are explored in [52]. Obesity is naturally studied by focusing on high quantiles. Bottai et al. [11] discuss data related to depression in adolescents. In all these cases, conditional quantile regression has been exploited to analyse the effect of observed covariates on the evolution of individual trajectories over time. These data could probably be tackled from different perspectives, with a more descriptive approach based on marginal modelling to highlight differences between separate groups (e.g., males vs females), and conditional modelling to provide “individual” estimates of risk factors.

4 Estimation

In this section, we discuss the estimation of model parameters for both the marginal and the conditional formulations. Generally, the former are computationally much simpler than the latter, as the same (weighted) routines used for cross-sectional data can be used for parameter estimation. On the other hand, when a working covariance structure is assumed, the covariance matrix of the estimates needs to be adjusted by the usual sandwich formula [59, 78]. For general expressions of the sandwich formula in longitudinal quantile regression see, for instance, [40, 75].

4.1 Marginal models

Marginal models proposed by He et al. [56], Chen et al. [19] and Yin and Cai [116] use generalized estimating equations [78] based on an independence working model; these are simple and have desirable properties, but can lead to a loss of efficiency when strong correlation exists. For simplicity, we focus on the estimation of single quantiles; a generalization to the simultaneous estimation of several quantiles (as in the next section) is quite straightforward.

Obtaining estimates under an independence working model simply involves a system of estimating equations of the kind

$$\begin{aligned} \sum _{i=1}^n \sum _{t=1}^{T_i} \mathbf x _{it} (\tau - \mathbb I(y_{it} - \mathbf x _{it} ^\prime \varvec{\beta }_{\tau }\le 0)) = \mathbf 0 . \end{aligned}$$

Clearly, the results derived from the expression above are equivalent to those obtained from the minimization of the quantile loss function presented in Sect. 2 for cross-sectional data. As remarked above, the only difference is the robustification of the covariance matrix of parameter estimates.
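
In code, the working independence fit is therefore just a pooled cross-sectional quantile regression on the stacked records \((i, t)\); a sketch with simulated data (any cross-sectional routine would do, here statsmodels):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, T = 100, 4
b = rng.normal(size=n)                    # induces within-unit dependence
x = rng.normal(size=(n, T))
y = 1.0 + 2.0 * x + b[:, None] + rng.normal(size=(n, T))

# Pooled quantile regression on the stacked observations:
X = sm.add_constant(x.ravel())
fit = sm.QuantReg(y.ravel(), X).fit(q=0.5)
print(fit.params)   # point estimates are consistent; the naive standard errors
                    # are not, hence the sandwich correction discussed above
```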

Obviously, when a given covariance matrix \(\mathbf V _i\) is assumed for the error terms, the estimating equations above need to be properly extended. In the context of median regression models, Jung [60] suggests a quasi-likelihood approach [110] based on the following set of weighted estimating equations

$$\begin{aligned} \sum _{i=1}^n \mathbf X _{i}^{\prime } \varvec{\Gamma }_i \mathbf V _i^{-1} (\tau - \mathbb I(\mathbf y _{i} - \mathbf X _{i} \varvec{\beta }_{\tau }\le 0)) = \mathbf 0 . \end{aligned}$$
(13)

Here, the matrix \(\varvec{\Gamma }_i = \text {diag}(f_{i1}(0), \dots , f_{iT_{i}}(0))\), with \(f_{it}\) being the probability density function of \(\varepsilon _{it}\), is introduced to account for possible overdispersion in the error distribution. Clearly, Jung’s estimator requires this matrix to be known. Only in the case of identically distributed errors can this quantity be ignored; the resulting estimating functions can be shown to be optimal if \(\mathbf V _i\) is correctly specified.

Lipsitz et al. [80] extend the previous approach by proposing a weighted GEE model for longitudinal data affected by non-informative drop-out (MAR data). Karlsson [61] develops a weighted quantile regression approach for non-linear longitudinal models. Tang and Leng [105] suggest deriving parameter estimates via an empirical likelihood approach, in which a working model for the conditional mean is specified to enhance efficiency. Fu and Wang [40] split the weighted estimating Eq. (13) into two components, the within-group and between-group estimating equations. The resulting estimator minimizes a combination of these two functions, in a way similar to the generalized method of moments [53]. The method suggested by Fu and Wang [40] is robust to the choice of the error correlation structure and leads to estimates with higher efficiency than those derived under the independence assumption.

A different approach is proposed by Leng and Zhang [75]. To circumvent the problem of correctly specifying the error covariance matrix, the authors suggest combining multiple sets of estimating equations of the kind defined in (13), that is

$$\begin{aligned} \frac{1}{n} \sum _{i = 1}^n \left( \begin{array}{c} \mathbf X _{i}^{\prime } \varvec{\Gamma }_i \mathbf M _{i1}^{-1} (\tau - \mathbb I(\mathbf y _{i} - \mathbf X _{i} \varvec{\beta }_{\tau }\le 0))\\ \vdots \\ \mathbf X _{i}^{\prime } \varvec{\Gamma }_i \mathbf M _{iK}^{-1} (\tau - \mathbb I(\mathbf y _{i} - \mathbf X _{i} \varvec{\beta }_{\tau }\le 0)) \end{array} \right) = \mathbf 0 , \end{aligned}$$
(14)

where \(\mathbf M _{ik}, k = 1, \ldots , K,\) are known matrices (e.g., the identity matrix, the matrix with 0 on the diagonal and 1 off-diagonal, and so on). The last two proposals, as well as that of [84], extend the induced smoothing method [12] to quantile regression. The corresponding (smoothed) estimating equations can be solved via a standard Newton–Raphson algorithm.

Within the context of marginal quantile regression for longitudinal data, asymptotic normality of the parameter estimators is guaranteed and a closed-form expression for the asymptotic covariance matrix is available. In most cases, resampling and perturbation methods are used to estimate the latter quantity [116].

4.2 Conditional models

Parameter estimation in the conditional quantile regression framework is closely related to the assumptions postulated for the individual-specific parameters. In Sects. 4.2.1 and 4.2.2, we discuss estimation procedures for conditional models with fixed and random parameters, respectively.

4.2.1 Fixed effect models

Following the proposal by Koenker [62], when fixed individual-specific parameters account for dependence between longitudinal data, conditional quantiles are estimated simultaneously by minimizing a weighted piecewise linear quantile loss function. Let \(\tau = (\tau _1, \ldots , \tau _q)\) be the set of quantiles of interest; parameter estimates are obtained by solving

$$\begin{aligned} \arg \min _{\beta , b} \sum _{k = 1}^q \sum _{i = 1}^n \sum _{t = 1}^{T_i} \omega _k \rho _{\tau _k} [ y_{it} - b_i - \mathbf x _{it} ^\prime \varvec{\beta }_{\tau _k}], \end{aligned}$$
(15)

where \(\rho _{\tau _k}, k = 1, \ldots , q,\) is the quantile loss function introduced by Koenker and Bassett [64], and the weights \(\omega _k\) control the influence of the different quantiles on the estimation of the individual parameters. The author shows that, under some regularity conditions, parameter estimates obtained from (15) are consistent and asymptotically Gaussian. However, when n is large relative to \(T_i\), a penalized approach is suggested. In this case, an \(\ell _1\) penalty term is added to shrink the individual-specific intercepts towards a common value, and parameter estimates are computed via

$$\begin{aligned} \arg \min _{\beta , b} \sum _{k = 1}^q \sum _{i = 1}^n \sum _{t = 1}^{T_i} \omega _k \rho _{\tau _k} [ y_{it} - b_i - \mathbf x _{it} ^\prime \varvec{\beta }_{\tau _k}] + \lambda \sum _{i = 1}^n |b_i|. \end{aligned}$$
(16)

Koenker [62] shows that fixed parameter estimators obtained under this approach are asymptotically unbiased and Gaussian; an asymptotic approximation of the optimal value of the tuning parameter \(\lambda \) can be found in [73]. Harding and Lamarche [55], Galvao and Montes-Rojas [42] and Galvao [41] build on the proposal by Chernozhukov and Hansen [23] and develop a two-step procedure that exploits instrumental variables to derive parameter estimates. In the first step, a weighted quantile loss function is minimized with respect to the model parameters, keeping the individual-specific effects constant. In the second step, the latter are estimated by minimizing a weighted distance function defined on the basis of the instrumental variable coefficients. A simpler two-step procedure is used by Canay [17]. A transformation of the data is first applied to eliminate the individual-specific terms, as in the first-difference approach to mean regression; then, the remaining parameters are estimated. The resulting estimators are shown to be consistent and asymptotically Gaussian when both n and \(T_i\) go to infinity.
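
The penalized problem retains a linear programming structure: writing \(b_i = v_i^+ - v_i^-\) turns the \(\ell _1\) penalty into a linear objective. A minimal single-quantile sketch of (16) follows (Koenker’s proposal handles several quantiles jointly; the data and the value of \(\lambda \) are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def penalized_fe_qreg(X, y, groups, tau, lam):
    """Single-quantile version of (16):
    min sum_it rho_tau(y_it - b_i - x_it' beta) + lam * sum_i |b_i|,
    as an LP with b_i = v_pos_i - v_neg_i and residuals u_pos - u_neg."""
    N, p = X.shape
    n = groups.max() + 1
    Z = np.zeros((N, n))
    Z[np.arange(N), groups] = 1.0        # incidence matrix: observation -> individual
    c = np.concatenate([np.zeros(p), lam * np.ones(2 * n),
                        tau * np.ones(N), (1 - tau) * np.ones(N)])
    A_eq = np.hstack([X, Z, -Z, np.eye(N), -np.eye(N)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n + 2 * N)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p], res.x[p:p + n] - res.x[p + n:p + 2 * n]

rng = np.random.default_rng(5)
n, T = 30, 5
groups = np.repeat(np.arange(n), T)
x = rng.normal(size=n * T)
y = rng.normal(size=n)[groups] + 1.0 + 2.0 * x + rng.normal(size=n * T)
X = np.column_stack([np.ones(n * T), x])  # global intercept; the penalty shrinks the b_i
beta, b = penalized_fe_qreg(X, y, groups, tau=0.5, lam=1.0)
print(beta)
```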

Identifiability of conditional quantile regression models with fixed effects has been extensively discussed in the econometric literature. When the number of repeated measurements is small, fixed effects cannot be directly estimated. In the context of pure location-shift effects (that is, when \(b_i\) is common to all quantiles), Canay [17] proves identifiability of the model parameters under the assumption of independence between the random errors and the individual parameters, and the existence of first-order moments. Under these conditions, model parameters are identified as long as at least two observations are available for each unit. Rosen [102] focuses on the identification of model parameters outside the pure location-shift context; here, identification holds under support conditions and conditional independence between the random errors \(\varepsilon _{it}\). We point the interested reader to these works and references therein.

4.2.2 Random parameter models

When the individual-specific parameters are considered as random and the conditional assumption of asymmetric Laplace errors holds, parameter estimates can be obtained via a maximum likelihood approach. In the case of time-constant random parameters \(\mathbf b _i = (b_{i1}, \ldots , b_{iq})\), the marginal likelihood function is defined by:

$$\begin{aligned} L(\varvec{\beta }, \sigma , \mathbf D ; \tau ) = \prod _{i = 1}^n \int \prod _{t = 1}^{T_i} f_y (y_{it} \mid \mathbf b _i; \varvec{\beta }, \sigma , \tau ) f_b ( \mathbf b _i; \mathbf D _\tau ) \text {d} \mathbf b _i. \end{aligned}$$
(17)

The integral in the expression above does not have a closed-form solution, and numerical integration methods are required. Geraci and Bottai [45] and Liu and Bottai [83] describe the use of a Monte Carlo EM (MCEM) algorithm [122] borrowed from the context of generalized linear mixed models [10]. Because of the computational inefficiency of such an approach, Geraci and Bottai [46] suggest the use of Gaussian quadrature methods to directly maximize the likelihood in (17).
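
A sketch of the quadrature idea for a Gaussian random intercept: the integral in (17) is replaced by a Gauss–Hermite sum, and the resulting function can be handed to a generic numerical optimizer (the model and data below are illustrative; actual implementations, such as the lqmm package mentioned in Sect. 6, are considerably more refined):

```python
import numpy as np

def ald_logpdf(u, sigma, tau):
    """Log of the ALD density (4) with location 0 at residual u."""
    return np.log(tau * (1 - tau) / sigma) - u * (tau - (u < 0)) / sigma

def marginal_loglik(beta, psi, sigma, tau, X, y, groups, n_nodes=20):
    """Marginal log-likelihood (17) for b_i ~ N(0, psi^2),
    approximated by Gauss-Hermite quadrature with log-sum-exp over nodes."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b_vals = np.sqrt(2.0) * psi * nodes        # quadrature points on the b scale
    u = y - X @ beta
    ll = 0.0
    for i in range(groups.max() + 1):
        ui = u[groups == i]
        log_fk = np.array([ald_logpdf(ui - bk, sigma, tau).sum() for bk in b_vals])
        log_terms = log_fk + np.log(weights / np.sqrt(np.pi))
        m = log_terms.max()
        ll += m + np.log(np.exp(log_terms - m).sum())
    return ll

rng = np.random.default_rng(6)
n, T = 50, 4
groups = np.repeat(np.arange(n), T)
X = np.column_stack([np.ones(n * T), rng.normal(size=n * T)])
y = rng.normal(size=n)[groups] + X @ np.array([1.0, 2.0]) + rng.normal(size=n * T)
print(marginal_loglik(np.array([1.0, 2.0]), psi=1.0, sigma=1.0, tau=0.5,
                      X=X, y=y, groups=groups))
```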

The marginal likelihood function for the case of time-varying random intercepts with a Markovian distribution, as in [33], can be expressed as

$$\begin{aligned} L(\varvec{\beta }, \sigma , \varvec{\delta }, \mathbf Q ; \tau ) = \prod _{i = 1}^n \sum _\mathbf{b _i} f_y (\mathbf y _{i} \mid \mathbf b _{i}; \varvec{\beta }, \sigma , \tau ) f_b (\mathbf b _i; \varvec{\delta }_\tau , \mathbf Q _{\tau }), \end{aligned}$$
(18)

where \(\mathbf b _i = (b_{i1}, \ldots , b_{i{T_i}})\) is the vector of individual, time-dependent intercepts evolving according to the latent Markov chain described by \(\varvec{\delta }_\tau \) and \(\mathbf Q _{\tau }\); these represent the \(\tau \)-dependent initial probability vector and transition probability matrix, respectively. Farcomeni [33] adapts the Baum–Welch algorithm [9] to estimate the model parameters. It should be noticed that this algorithm is a particular specification of a general EM algorithm [28], so that the two names may be used interchangeably. Marino et al. [89] adopt a nonparametric distribution for the time-constant and time-varying random parameters and exploit a nonparametric maximum likelihood approach to derive parameter estimates.

A description of a general EM algorithm in the context of quantile regression models follows. Starting from the definition of the complete-data log-likelihood, the following two steps are alternated until convergence.

  • In the E-step, the expected value of the complete-data log-likelihood, given the observed data and the current parameter estimates, is computed. Monte Carlo integration or quadrature methods are needed, while a closed-form expression is available when a discrete distribution is assumed for the random parameters (a minimal sketch of this case is given after this list).

  • In the M-step, the expected complete-data log-likelihood is maximized. The parameters of the longitudinal data model are usually updated through a block algorithm, that is, fixed and random parameters are updated separately. Under the latent Markov formulation [33], closed-form expressions for the initial and transition probabilities can be derived; under the mixed latent Markov formulation [89], closed-form expressions for the mixture component probabilities are available as well. Fixed parameters are updated by using an algorithm for cross-sectional quantile regression with an offset given by the (current) random predictor. Finally, the scale parameter of the ALD and the random parameter covariance matrix (if present) are updated through moment matching.
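
As anticipated, a minimal sketch of the closed-form E-step for discrete random intercepts (a latent class quantile model) follows; the class supports, prior probabilities and remaining parameters are treated as the current estimates:

```python
import numpy as np

def ald_logpdf(u, sigma, tau):
    return np.log(tau * (1 - tau) / sigma) - u * (tau - (u < 0)) / sigma

def e_step(y, xb, groups, support, pi, sigma, tau):
    """Posterior probability that unit i belongs to class k, given current
    estimates: proportional to pi_k * prod_t f_ALD(y_it - xb_it - support_k)."""
    n, K = groups.max() + 1, len(support)
    log_post = np.tile(np.log(pi), (n, 1))
    for i in range(n):
        ui = y[groups == i] - xb[groups == i]       # residuals net of fixed effects
        for k in range(K):
            log_post[i, k] += ald_logpdf(ui - support[k], sigma, tau).sum()
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

In the M-step, the prior probabilities are then updated as the column means of these posterior weights, while the remaining parameters are updated as described above.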

In the conditional mixed model formulation, standard errors are usually obtained by means of bootstrap routines; an excellent discussion of resampling in quantile regression is provided by Buchinsky [13]. Given that for each bootstrap sample we must run an algorithm which relies on non-convex optimization, resampling is often cumbersome. For this reason, the number of replicates (number of bootstrap samples) may be tuned as in [3]. An additional issue is that we are bootstrapping longitudinal data, which is often a delicate matter. We do not go into a detailed comparison of the available methods here and simply mention that the usual strategy is to resample individuals, rather than single measurements, in order to preserve the dependence structure between repeated measures. See also Parzen et al. [96] and Yin and Cai [116].
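
A sketch of this usual resampling scheme: individuals, with all their repeated measures, are drawn with replacement, and the model is refit on each replicate:

```python
import numpy as np

def cluster_bootstrap_indices(rng, groups):
    """Resample individuals (clusters) with replacement, keeping each unit's
    repeated measures together so as to preserve the within-unit dependence."""
    ids = np.unique(groups)
    sampled = rng.choice(ids, size=len(ids), replace=True)
    return np.concatenate([np.flatnonzero(groups == i) for i in sampled])

rng = np.random.default_rng(7)
groups = np.repeat(np.arange(5), 3)
idx = cluster_bootstrap_indices(rng, groups)
# y[idx], X[idx] form one bootstrap sample; refitting on many such samples and
# taking the empirical covariance of the estimates yields the standard errors.
print(groups[idx])
```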

We conclude this section by mentioning Bayesian approaches to mixed quantile regression. Reich et al. [99] discuss a quasi distribution-free Bayesian approach; the error distribution is assumed to arise from an infinite mixture of Gaussian densities subject to a stochastic constraint which enables inference on the quantile of interest. Luo and Lian [87] consider ALD errors and propose a Metropolis–Hastings algorithm or a Gibbs sampling approach to derive parameter estimates.

5 Extensions and open issues

In this section, we briefly review some extensions and generalizations of the linear longitudinal quantile regression models described in the previous sections, and list some open issues. All paragraphs refer to longitudinal data analysis; we report results for the cross-sectional case only as a preamble to the longitudinal one, or when they can be directly extended to the longitudinal setting.

5.1 Missing values

Missing data are ubiquitous in all statistical applications and, especially, in longitudinal studies. Missing values may be ignorable (non-informative) or non-ignorable (informative). We refer the reader to [29, 81, 82] for a detailed treatment of the topic. In the ignorable case, standard estimation routines might be used by simply discarding the missing values. Use of (multiple) imputation may improve the efficiency of the estimators, as suggested by Geraci [43], who does not focus specifically on longitudinal data but rather on potentially clustered data. It is often difficult to assume that missing values are ignorable in longitudinal studies with monotone missing data patterns, that is, when there is drop-out from the study. Drop-out may occur for reasons that are linked to the unobserved values of the outcome; for instance, in HIV studies patients may die before the last scheduled follow-up, in wage studies individuals may lose their job or retire, and so on. There are very few approaches to quantile regression in the presence of informative drop-out. Lipsitz et al. [80] and Yi and He [115] discuss the extension of the estimating equation approach to handle missing data in a MAR context; the observed values are weighted by the inverse of the probability of drop-out. In a Bayesian framework, Yuan and Yin [120] model missingness as a binary time series sharing random parameters with the quantile regression process, as in shared parameter models (see e.g. [114]). A joint model for a right-censored time-to-event and the quantile of a continuous longitudinal response is proposed by Farcomeni and Viviani [36]. Marino et al. [89] define a latent drop-out class approach to handle potentially informative drop-out in the context of linear quantile latent Markov models.
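
To fix ideas on the inverse-probability-weighting scheme of [80, 115], the sketch below simulates MAR drop-out driven by the last observed response and fits the drop-out hazard; the cumulative products of the estimated staying probabilities would then be used to weight the observed records in the estimating equations (the drop-out mechanism and all numerical values are our own illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, T = 200, 4
y = rng.normal(size=(n, T))
observed = np.ones((n, T), dtype=bool)
for t in range(1, T):
    # MAR drop-out: the hazard depends on the last observed response only
    p_drop = 1 / (1 + np.exp(2.0 - 0.8 * y[:, t - 1]))
    observed[:, t] = observed[:, t - 1] & (rng.uniform(size=n) > p_drop)

# Fit the drop-out hazard among units still in the study at t - 1
rows = [(y[i, t - 1], int(~observed[i, t]))
        for i in range(n) for t in range(1, T) if observed[i, t - 1]]
prev_y, dropped = map(np.array, zip(*rows))
haz = sm.Logit(dropped, sm.add_constant(prev_y)).fit(disp=0)
p_stay = 1 - haz.predict(sm.add_constant(prev_y))
# Cumulating p_stay within each unit gives the probability of still being
# observed; its inverse weights the corresponding record in the equations.
```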

5.2 Spatial quantile regression

Spatial data arise when measurements are taken at different sites over a geographical territory. In general, such data are dependent, and this dependence may be expressed as a function of the distance between locations in a multidimensional (usually, two- or three-dimensional) space. This kind of data comes from various areas of research, including econometrics, epidemiology and environmental science. Spatial quantile regression methods deal with the peculiar spatial dependence structure while modelling conditional quantiles of the outcome of interest. They can be thought of as a direct extension of longitudinal quantile regression methods, where dependence arises according to distance in a one-dimensional space (i.e., time). Hallin et al. [50] introduce a nonparametric conditional spatial quantile regression model; the asymptotic behaviour of the local linear quantile regression estimates is derived through methods that are typically used when dealing with time series. Kostov [71] uses a spatial quantile regression model in the context of hedonic models for agricultural land prices to ensure robustness against the possible presence of outliers in the data. Dependence between adjacent sites is modelled by considering lagged responses in the linear predictor, and an instrumental quantile regression approach is exploited to compute parameter estimates. Reich et al. [100] develop a spatial quantile model that incorporates spatial dependence through spatially varying regression parameters. These are expressed as a weighted sum of Bernstein basis polynomials, where the weights are constrained spatial Gaussian processes. Lum and Gelfand [86] specify a spatial quantile regression model for all quantiles of a response variable. Starting from the asymmetric Laplace distribution, a process for quantile regression with spatially dependent errors is defined and exploited to model the dependence between observations coming from neighbouring sites. Finally, under a robust semi-parametric framework, Lu et al. [85] propose a spatial quantile regression model with functional coefficients to deal with the curse of dimensionality when there are more than three covariates. For a general overview of quantile regression models for spatial data, we refer to the monograph by McMillen [90].

5.3 Nonparametric models

The literature on nonparametric quantile regression models is very rich. We only briefly mention the approaches that are tailored (or that can be tailored) to longitudinal studies. Koenker et al. [70] and Koenker and Mizera [69] use a total variation regularization approach to estimate possible univariate and bivariate nonparametric terms; De Gooijer and Zerom [27], Yu and Lu [117], Horowitz and Lee [58], Cai and Xu [16] base their estimation on local polynomial fitting; Takeuchi et al. [104] and Li et al. [77] explore reproducing kernel Hilbert space (RKHS) norms for nonparametric quantile estimation. Fenske et al. [37] use a boosting algorithm to allow for data-driven determination of the amount of smoothness required for nonlinear effects and combine model selection with an automatic variable selection property. Bayesian approaches are available, including Yue and Rue [121] who use integrated nested Laplace approximation and MCMC methods. Finally, Mu and Wei [92] propose a varying parameter model for conditional quantiles in longitudinal studies. Censored data are considered by Wang and Fygenson [107], while Wang et al. [109] deal with partially linear varying parameter models.

5.4 Open issues

Longitudinal quantile regression is still a relatively recent field. There are many open issues and many of them would require an extension of methods that have already been established for cross-sectional quantile regression.

First, some works deal with outcome transformations, in order to move from non-linear to linear relationships or to handle bounded outcomes. These include [11, 14, 48, 91]; all these works focus on cross-sectional models and, as outlined before, transformations are more delicate when there are repeated measurements.

A further issue that has been almost completely overlooked in longitudinal quantile regression is the joint modelling of more than one outcome. Methods for independent data have been proposed in [49, 51]. In the longitudinal setting, this situation is easier to work with, as additional random parameters can be used in conjunction with a conditional independence assumption. An example can be found, for instance, in [18]. A similar issue is the simultaneous modelling of more than one quantile. A problem that arises when separately modelling response quantiles (as in conditional mixed quantile regression) is that predictions may cross, that is, for certain covariate combinations a low quantile might be predicted to be larger than a higher one. There is a lot of literature on simultaneous modelling for independent data [63]. The problem is also closely linked to density regression; see e.g. [31]. In the longitudinal setting, we are only aware of the Bayesian approach proposed by Tokdar and Kadane [106].

A specific issue is related to extremal quantiles: while estimation far in the tails may be extremely useful in many cases, the tails are often associated with low-density regions. Important contributions in the cross-sectional context include [20, 22]. Traditional approaches break down as standard errors are inflated and parameter estimates (excluding the intercepts) are shrunk towards zero. To tackle these issues, Wang et al. [108] propose a two-step approach: quantile regression is first applied to intermediate conditional quantiles, yielding reliable estimates; the estimates are then extrapolated to the tails based on assumptions on tail behaviour; see also Li et al. [76]. To the best of our knowledge, the issue of dealing with extremal quantiles in longitudinal studies has not been considered so far.

A further issue is that of planning the sample size, which, to our knowledge, has been considered only in [21]. We finally mention that many results and properties of quantile regression which are well known and established in the cross-sectional framework have not yet been assessed in the longitudinal one. Work still has to be done in this area to give more substantial theoretical support to the procedures and methods discussed throughout.

6 Software

We briefly mention in this section the readily available software to fit longitudinal quantile regression models. We report only on software which can be directly used for conditional or marginal modelling (with sandwich estimators) in the longitudinal quantile regression framework.

Conditional models with Gaussian and AL random parameters can be fit using the R package lqmm [44]. Linear quantile mixed models with a latent Markov random intercept can be fit using the R function lmqr [33]. A non-optimized version is currently available at http://www.afarcome.altervista.org/lmqr.r, while a refined version will soon be included in the R package LMest [6]. The lmqr function can be used both for latent Markov (using option lm=T) and latent class (lm=F) models. Conditional models with fixed individual-specific parameters can be fit using the methods in the package rqpd, available from R-Forge. In rqpd, one can obtain both the penalized longitudinal quantile regression of Koenker [62] and the corresponding generalization based on correlated random effects [1]. Finally, we mention the qreg procedure in Stata, which provides sandwich estimates of the covariance matrix.