1.1 Introduction

As mentioned earlier, a probability distribution can be specified either in terms of the distribution function or by the quantile function. Although both convey the same information about the distribution, with different interpretations, the concepts and methodologies based on distribution functions are traditionally employed in most forms of statistical theory and practice. One reason for this is that quantile-based studies have been carried out mostly when the traditional approach is either difficult to apply or fails to provide the desired results. Except in a few isolated areas, there have been no systematic parallel developments aimed at replacing distribution functions by quantile functions in modelling and analysis. However, the view that, through an appropriate choice of the domain of observations, a better understanding of a chance phenomenon can be achieved by the use of quantile functions, is fast gaining acceptance.

Historically, many facts about the potential of quantiles in data analysis were known even before the nineteenth century. It appears that the Belgian sociologist Quetelet [499] initiated the use of quantiles in statistical analysis in the form of the present day inter-quantile range. A formal representation of a distribution through a quantile function was introduced by Galton (1822–1911) [206], who also initiated the ordering of observations along with the concepts of median, quartiles and interquartile range. Subsequently, the research on quantiles was directed towards estimation problems with the aid of sample quantiles, their large sample behaviour and limiting distributions (Galton [207, 208]). A major development in portraying quantile functions to represent distributions is the work of Hastings et al. [264], who introduced a family of distributions by a quantile function. This was refined later by Tukey [568]. The symmetric distribution of Tukey [568] and his articulation of exploratory data analysis sparked considerable interest in quantile functional forms that continues to this day. Various aspects of the Tukey family and generalizations thereof were studied by a number of authors including Hogben [273], Shapiro and Wilk [536], Filliben [197], Joiner and Rosenblatt [304], Ramberg and Schmeiser [504], Ramberg [501], Ramberg et al. [502], MacGillivray [407], Freimer et al. [203], Gilchrist [215] and Tarsitano [563]. We will discuss all these models in Chap. 3. Another turning point in the development of quantile functions is the seminal paper by Parzen [484], in which he emphasized the description of a distribution in terms of the quantile function and its role in data modelling. Parzen [485–487] exhibits a sequential development of the theory and application of quantile functions in different areas and also as a tool in the unification of various approaches.

Quantile functions have several interesting properties that are not shared by distribution functions, which make them more convenient for analysis. For example, the sum of two quantile functions is again a quantile function. There are explicit general distributional forms for the quantile functions of order statistics. In Sect. 1.2, we mention these and some other properties. Moreover, random numbers from any distribution can be generated using appropriate quantile functions, a purpose for which lambda distributions were originally conceived. The moments in different forms such as raw, central, absolute and factorial have been used effectively in specifying the model, describing the basic characteristics of distributions, and in inferential procedures. Some of the methods of estimation like least squares, maximum likelihood and method of moments often provide estimators and/or their standard errors in terms of moments. Outliers have a significant effect on the estimates so derived. For example, in the case of samples from the normal distribution, all the above methods give the sample mean as the estimate of the population mean, whose value changes significantly in the presence of an outlying observation. Asymptotic efficiency of the sample moments is rather poor for heavy tailed distributions, since the asymptotic variances are mainly in terms of higher order moments that tend to be large in this case. In reliability analysis, a single long-term survivor can have a marked effect on mean life, especially in the case of heavy tailed models which are commonly encountered for lifetime data. In such cases, quantile-based estimates are generally found to be more precise and robust against outliers. Another advantage in choosing quantiles is that in life testing experiments, one need not wait until the failure of all the items on test, but only a portion of them, to propose useful estimates. Thus, there is a case for adopting quantile functions as models of lifetime and basing their analysis on functions derived from them. Many other facets of the quantile approach will become more explicit in the sequel in the form of alternative methodology, new opportunities and unique cases where there are no corresponding results if one adopts the distribution function approach.

1.2 Definitions and Properties

In this section, we define the quantile function and discuss some of its general properties. The random variable considered here has the real line as its support, but the results are valid for lifetime random variables which take on only non-negative values.

Definition 1.1.

Let X be a real valued continuous random variable with distribution function F(x) which is continuous from the right. Then, the quantile function Q(u) of X is defined as

$$\displaystyle{ Q(u) = {F}^{-1}(u) =\inf \{ x: F(x) \geq u\},\quad 0 \leq u \leq 1. }$$
(1.1)

For every − ∞ < x < ∞ and 0 < u < 1, we have

$$\displaystyle{F(x) \geq u\text{ if and only if }Q(u) \leq x.}$$

Thus, if there exists an x such that F(x) = u, then F(Q(u)) = u and Q(u) is the smallest value of x satisfying F(x) = u. Further, if F(x) is continuous and strictly increasing, Q(u) is the unique value x such that F(x) = u, and so by solving the equation F(x) = u, we can find x in terms of u which is the quantile function of X. Most of the distributions we consider in this work are of this form and nature.

Definition 1.2.

If f(x) is the probability density function of X, then f(Q(u)) is called the density quantile function. The derivative of Q(u), i.e.,

$$\displaystyle{q(u) = Q^{\prime}(u),}$$

is known as the quantile density function of X. By differentiating F(Q(u)) = u, we find

$$\displaystyle{ q(u)f(Q(u)) = 1. }$$
(1.2)
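
As a quick numerical check of (1.2), the following minimal Python sketch (the exponential distribution and the rate λ = 2 are illustrative choices, not part of the text) evaluates q(u)f(Q(u)) on a grid of u values:

```python
import numpy as np

# Exponential distribution: F(x) = 1 - exp(-lam*x), so
# Q(u) = -log(1 - u)/lam and q(u) = 1/(lam*(1 - u)).
lam = 2.0                            # illustrative rate parameter
u = np.linspace(0.01, 0.99, 99)

Q = -np.log(1.0 - u) / lam           # quantile function
q = 1.0 / (lam * (1.0 - u))          # quantile density function
f = lam * np.exp(-lam * Q)           # density quantile function f(Q(u))

assert np.allclose(q * f, 1.0)       # verifies q(u) f(Q(u)) = 1, as in (1.2)
```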

Some important properties of quantile functions required in the sequel are as follows.

  1.

    From the definition of Q(u) for a general distribution function, we see that

    (a)

      Q(u) is non-decreasing on (0, 1) with Q(F(x)) ≤ x for all − ∞ < x < ∞ for which 0 < F(x) < 1;

    (b)

      F(Q(u)) ≥ u for any 0 < u < 1;

    (c)

      Q(u) is continuous from the left, i.e., Q(u − ) = Q(u);

    (d)

      \(Q(u+) =\inf \{ x: F(x)> u\}\) so that Q(u) has limits from above;

    (e)

      Any jumps of F(x) are flat points of Q(u) and flat points of F(x) are jumps of Q(u).

  2.

    If U is a uniform random variable over [0, 1], then X = Q(U) has its distribution function as F(x). This follows from the fact that

    $$\displaystyle{P(Q(U) \leq x) = P(U \leq F(x)) = F(x).}$$

    This property enables us to conceive a given data set as arising from the uniform distribution transformed by the quantile function Q(u).

  3.

    If T(x) is a non-decreasing function of x, then T(Q(u)) is a quantile function. Gilchrist [215] refers to this as the Q-transformation rule. On the other hand, if T(x) is non-increasing, then T(Q(1 − u)) is also a quantile function.

Example 1.1.

Let X be a random variable with Pareto type II (also called Lomax) distribution with

$$\displaystyle{F(x) = 1 - {\alpha }^{c}{(x+\alpha )}^{-c},\quad x> 0;\;\alpha,c> 0.}$$

Since F(x) is strictly increasing, setting F(x) = u and solving for x, we obtain

$$\displaystyle{x = Q(u) = \alpha \left [{(1 - u)}^{-\frac{1} {c} } - 1\right ].}$$

Taking T(X) = X^β, β > 0, we have a non-decreasing transformation which results in

$$\displaystyle{T(Q(u)) = {\alpha }^{\beta }{\left [{(1 - u)}^{-\frac{1} {c} } - 1\right ]}^{\beta }.}$$

When T(Q(u)) = y, we obtain, on solving for u,

$$\displaystyle{u = G(y) = 1 -{\left (1 + \frac{{y}^{\frac{1} {\beta } }} {\alpha } \right )}^{-c}}$$

which is a Burr type XII distribution with T(Q(u)) being the corresponding quantile function.
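
Property 2 above is the basis of inverse transform sampling. As an illustration, the following minimal sketch (in Python; the parameter values are arbitrary) generates Lomax variates through the quantile function of Example 1.1 and compares the empirical distribution function of the sample with F(x):

```python
import numpy as np

rng = np.random.default_rng(1)

alpha, c = 2.0, 3.0                                     # illustrative parameters
Q = lambda u: alpha * ((1.0 - u) ** (-1.0 / c) - 1.0)   # Lomax quantile function

x = Q(rng.uniform(size=100_000))                        # X = Q(U), Property 2

# empirical CDF vs. F(x) = 1 - alpha^c (x + alpha)^(-c) at a few points
for t in [0.5, 1.0, 2.0, 5.0]:
    F_t = 1.0 - alpha**c * (t + alpha) ** (-c)
    print(t, np.mean(x <= t), F_t)                      # should nearly agree
```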

Example 1.2.

Assume X has Pareto type I distribution with

$$\displaystyle{F(x) = 1 -{\left (\frac{\sigma } {x} \right )}^{\alpha },\quad x>\sigma;\ \alpha> 0,\ \sigma> 0.}$$

Then, working as in the previous example, we see that

$$\displaystyle{Q(u) =\sigma {(1 - u)}^{-\frac{1} {\alpha } }.}$$

Applying the transformation T(X) = Y = X^{ − 1}, which is non-increasing, we have

$$\displaystyle{T(Q(1 - u)) = {\sigma }^{-1}{u}^{\frac{1} {\alpha } }}$$

and equating this to y and solving, we get

$$\displaystyle{G(y) = {(y\sigma )}^{\alpha },\quad 0 \leq y \leq \frac{1} {\sigma }.}$$

G(y) is the distribution function of a power distribution with T(Q(1 − u)) being the corresponding quantile function.

  4.

    If Q(u) is the quantile function of X with continuous distribution function F(x) and T(u) is a non-decreasing function satisfying the boundary conditions T(0) = 0 and T(1) = 1, then Q(T(u)) is a quantile function of a random variable with the same support as X.

Example 1.3.

Consider a non-negative random variable with continuous distribution function F(x) and quantile function Q(u). Taking \(T(u) = {u}^{\frac{1} {\theta } }\), for θ > 0, we have T(0) = 0 and T(1) = 1. Then,

$$\displaystyle{Q_{1}(u) = Q(T(u)) = Q({u}^{\frac{1} {\theta } }).}$$

Further, if y = Q 1(u), then \({u}^{\frac{1} {\theta } } = F(y)\), and so the distribution function corresponding to Q 1(u) is

$$\displaystyle{G(x) = {F}^{\theta }(x).}$$

The random variable Y with distribution function G(x) is called the proportional reversed hazards model of X. There is considerable literature on such models in reliability and survival analysis. If we take X to be exponential with

$$\displaystyle{F(x) = 1 - {e}^{-\lambda x},\quad x> 0;\;\lambda> 0,}$$

so that

$$\displaystyle{Q(u) = {\lambda }^{-1}(-\log (1 - u)),}$$

then

$$\displaystyle{Q_{1}(u) = {\lambda }^{-1}(-\log (1 - {u}^{\frac{1} {\theta } }))}$$

provides

$$\displaystyle{G(x) = {(1 - {e}^{-\lambda x})}^{\theta },}$$

the generalized or exponentiated exponential law (Gupta and Kundu [250]). In a similar manner, Mudholkar and Srivastava [429] take the baseline distribution as Weibull. For some recent results and survey of such models, we refer the readers to Gupta and Gupta [240]. In Chap. 3, we will come across several quantile functions that represent families of distributions containing some life distributions as special cases. They are highly flexible empirical models capable of approximating many continuous distributions. The above transformation on these models generates new proportional reversed hazards models of a general form. The analysis of lifetime data employing such models seems to be an open issue.
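
As a numerical sketch of this construction (with illustrative parameter values, not from the text), samples drawn through Q 1(u) = Q(u^{1 ∕ θ}) for the exponential baseline should follow the exponentiated exponential distribution function G(x) = (1 − e^{ − λx})^θ:

```python
import numpy as np

lam, theta = 1.0, 2.5                          # illustrative parameters
rng = np.random.default_rng(7)

u = rng.uniform(size=200_000)
x = -np.log(1.0 - u ** (1.0 / theta)) / lam    # Q1(u) = Q(u**(1/theta))

for t in [0.5, 1.0, 2.0]:
    G_t = (1.0 - np.exp(-lam * t)) ** theta    # exponentiated exponential CDF
    print(t, np.mean(x <= t), G_t)             # empirical vs. theoretical CDF
```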

Remark 1.1.

From the form of G(x) above, it is clear that for positive integral values of θ, it is simply the distribution function of the maximum of a random sample of size θ from the exponential population with distribution function F(x) above. Thus, G(x) may be simply regarded as the distribution function of the maximum from a random sample of real size θ (instead of an integer). This viewpoint was discussed by Stigler [547] under the general idea of ‘fractional order statistics’; see also Rohatgi and Saleh [509].

Remark 1.2.

Just as G(x) can be regarded as the distribution function of the maximum from a random sample of (real) size θ from the population with distribution function F(x), we can consider \({G}^{{\ast}}(x) = 1 - {(1 - F(x))}^{\theta }\) as a generalized form corresponding to the minimum of a random sample of (real) size θ. The model G  ∗ (x) is, of course, the familiar proportional hazards model. It is important to mention here that these two models are precisely the ones introduced by Lehmann [382], as early as in 1953, as stochastically ordered alternatives for nonparametric tests of equality of distributions.

Remark 1.3.

It is useful to bear in mind that for distributions closed under minima such as exponential and Weibull (i.e., the distributions for which the minima have the same form of the distribution but with different parameters), the distribution function G(x) would provide a natural generalization while, for distributions closed under maxima such as power and inverse Weibull (i.e., the distributions for which the maxima have the same form of the distribution but with different parameters), the distribution function G  ∗ (x) would provide a natural generalization.

  5.

    The sum of two quantile functions is again a quantile function. Likewise, two quantile density functions, when added, produce another quantile density function.

  6.

    The product of two positive quantile functions is a quantile function. In this case, the condition of positivity cannot be relaxed, as in general, there may be negative quantile functions that affect the increasing nature of the product. Since we are dealing primarily with lifetimes, the required condition will be automatically satisfied.

  7.

    If X has quantile function Q(u), then \(\frac{1} {X}\) has quantile function 1 ∕ Q(1 − u).

Remark 1.4.

Property 7 is illustrated in Example 1.2. Chapter 3 contains some examples wherein quantile functions are generated as sums and products of quantile functions of known distributions. It becomes evident from Properties 3–7 that they can be used to produce new distributions from the existing ones. Thus, in our approach, a few basic forms are sufficient to begin with since new forms can always be evolved from them that match our requirements and specifications. This is in direct contrast to the abundance of probability density functions built up, each to satisfy a particular data form in the distribution function approach. In data analysis, the crucial advantage is that if one quantile function is not an appropriate model, the features that produce lack of fit can be ascertained and rectification can be made to the original model itself. This avoids the question of choice of an altogether new model and the repetition of all inferential procedures for the new one as is done in most conventional analyses.

  8.

    The concept of residual life is of special interest in reliability theory. It represents the lifetime remaining in a unit after it has attained age t. Thus, if X is the original lifetime with quantile function Q(u), the associated residual life is the random variable X t  = (X − t | X > t). Using the definition of conditional probability, the survival function of X t is

    $$\displaystyle{\bar{F}_{t}(x) = P(X_{t}> x) = \dfrac{\bar{F}(x + t)} {\bar{F}(t)},}$$

    where \(\bar{F}(x) = P(X> x) = 1 - F(x)\) is the survival function. Thus, we have

    $$\displaystyle{ F_{t}(x) = \frac{F(x + t) - F(t)} {1 - F(t)}. }$$
    (1.3)

    Let F(t) = u 0, F(x + t) = v and F t (x) = u. Then, with

    $$\displaystyle{x + t = Q(v),\quad x = Q_{1}(u),\text{ say,}}$$

    we have

    $$\displaystyle{Q_{1}(u) = Q(v) - Q(u_{0})}$$

    and consequently from (1.3),

    $$\displaystyle{u(1 - u_{0}) = v - u_{0}}$$

    or

    $$\displaystyle{v = u_{0} + (1 - u_{0})u.}$$

    Thus, the quantile function of the residual life X t becomes

    $$\displaystyle{ Q_{1}(u) = Q(u_{0} + (1 - u_{0})u) - Q(u_{0}). }$$
    (1.4)

    Equation (1.4) will be made use of later in defining the mean residual quantile function in Chap. 2; a small numerical check of (1.4) is given below.
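
    For instance, for the exponential distribution the lack of memory property implies that X t has the same distribution as X, and (1.4) reproduces this. A minimal sketch (the values of λ and u 0 are illustrative) is:

```python
import numpy as np

lam, u0 = 1.5, 0.4                       # illustrative rate and u0 = F(t)
Q = lambda u: -np.log(1.0 - u) / lam     # exponential quantile function

u = np.linspace(0.01, 0.99, 99)
Q1 = Q(u0 + (1.0 - u0) * u) - Q(u0)      # residual-life quantile function (1.4)

assert np.allclose(Q1, Q(u))             # memorylessness: Q1(u) = Q(u)
```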

  9.

    In some reliability and quality control situations, truncated forms of lifetime models arise naturally, and the truncation may be on the right or on the left or on both sides. Suppose F(x) is the underlying distribution function and Q(u) is the corresponding quantile function. Then, if the distribution is truncated on the right at x = U (i.e., the observations beyond U cannot be observed), then the corresponding distribution function is

    $$\displaystyle{F_{RT}(x) = \frac{F(x)} {F(U)},\quad 0 \leq x \leq U,}$$

    and its quantile function is

    $$\displaystyle{Q_{RT}(u) = Q(u{Q}^{-1}(U)).}$$

    Similarly, if the distribution is truncated on the left at x = L (i.e., the observations below L cannot be observed), then the corresponding distribution function is

    $$\displaystyle{F_{LT}(x) = \frac{F(x) - F(L)} {1 - F(L)},\quad x \geq L,}$$

    and its quantile function is

    $$\displaystyle{Q_{LT}(u) = Q(u + (1 - u){Q}^{-1}(L)).}$$

    Finally, if the distribution is truncated on the left at x = L and also on the right at x = U, then the corresponding distribution function is

    $$\displaystyle{F_{DT}(x) = \frac{F(x) - F(L)} {F(U) - F(L)},\quad L \leq x \leq U,}$$

    and its quantile function is

    $$\displaystyle{Q_{DT}(u) = Q(u{Q}^{-1}(U) + (1 - u){Q}^{-1}(L)).}$$

Example 1.4.

Suppose the underlying distribution is logistic with distribution function F(x) = 1 ∕ (1 + e^{ − x}) on the whole real line \(\mathbb{R}\). It is easily seen that the corresponding quantile function is \(Q(u) =\log \left ( \frac{u} {1-u}\right )\). Further, suppose we consider the distribution truncated on the left at 0, i.e., L = 0, for proposing a lifetime model. Then, from the expression above and the fact that \({Q}^{-1}(0) = \frac{1} {2}\), we arrive at the quantile function

$$\displaystyle{Q_{LT}(u) = Q\left (u + (1 - u)\frac{1} {2}\right ) =\log \left ( \frac{u + \frac{1} {2}(1 - u)} {1 - u -\frac{1} {2}(1 - u)}\right ) =\log \left (\frac{1 + u} {1 - u}\right )}$$

corresponding to the half-logistic distribution of Balakrishnan [47, 48]; see Table 1.1.
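
The closed form just derived can also be checked numerically; the following minimal sketch compares Q LT (u) computed from the general left-truncation formula with the half-logistic quantile function:

```python
import numpy as np

Q = lambda u: np.log(u / (1.0 - u))     # logistic quantile function
F0 = 0.5                                # F(0) = 1/2 for the standard logistic

u = np.linspace(0.01, 0.99, 99)
Q_LT = Q(u + (1.0 - u) * F0)            # left truncation at L = 0

assert np.allclose(Q_LT, np.log((1.0 + u) / (1.0 - u)))   # half-logistic form
```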

1.3 Quantile Functions of Life Distributions

As mentioned earlier, we concentrate here on distributions of non-negative random variables representing the lifetime of a component or unit. The distribution function of such random variables is such that F(0 − ) = 0. Often, it is more convenient to work with

$$\displaystyle{\bar{F}(x) = 1 - F(x) = P(X> x),}$$

which is the probability that the unit survives beyond time x (referred to as the age of the unit). It is also called the reliability or survival function, since it expresses the probability that the unit is still reliable at age x.

In the previous section, some examples of quantile functions and a few methods of obtaining them were explained. We now present in Table 1.1 quantile functions of many distributions considered in the literature as lifetime models. The properties of these distributions are discussed in the references cited below each of them. Models like gamma, lognormal and inverse Gaussian do not find a place in the list as their quantile functions are not in a tractable form. However, in the next chapter, we will see quantile forms that provide good approximations to them.

Table 1.1 Quantile functions of some lifetime distributions

1.4 Descriptive Quantile Measures

The advent of the Pearson family of distributions was a major turning point in data modelling using distribution functions. The fact that members of the family can be characterized by the first four moments gave an impetus to the extensive use of moments in describing the properties of distributions and their fitting to observed data. A familiar pattern of summary measures took the form of the mean for location, the variance for dispersion, and the Pearson coefficients \(\beta _{1} = \frac{\mu _{3}^{2}} {\mu _{2}^{3}}\) for skewness and \(\beta _{2} = \frac{\mu _{4}} {\mu _{2}^{2}}\) for kurtosis. While the mean and variance claimed universal acceptance, several limitations of β 1 and β 2 were subsequently exposed. Some of the concerns with regard to β 1 are: (1) it becomes arbitrarily large or even infinite, making comparison and interpretation difficult, as relatively small changes in parameters produce abrupt changes; (2) it does not reflect the sign of the difference (mean − median), which is a traditional basis for defining skewness; (3) there exist asymmetric distributions with β 1 = 0; and (4) the sample estimate of β 1 is unstable when matched with the population value. Similarly, for a standardized variable X, the relationship

$$\displaystyle{ E({X}^{4}) = 1 + V ({X}^{2}) }$$
(1.5)

would mean that the interpretation of kurtosis depends on the concentration of the probabilities near μ ± σ as well as in the tails of the distribution.

The specification of a distribution through its quantile function takes away the need to describe a distribution through its moments. Alternative measures in terms of quantiles that reduce the shortcomings of the moment-based ones can be thought of. A measure of location is the median defined by

$$\displaystyle{ M = Q(0.5). }$$
(1.6)

Dispersion is measured by the interquartile range

$$\displaystyle{ \text{IQR } = Q_{3} - Q_{1}, }$$
(1.7)

where Q 3 = Q(0.75) and Q 1 = Q(0.25).

Skewness is measured by Galton’s coefficient

$$\displaystyle{ S = \frac{Q_{1} + Q_{3} - 2M} {Q_{3} - Q_{1}}. }$$
(1.8)

Note that in the case of extreme positive skewness, Q 1 → M while in the case of extreme negative skewness Q 3 → M so that S lies between − 1 and + 1. When the distribution is symmetric, \(M = \frac{Q_{1}+Q_{3}} {2}\) and hence S = 0. Due to the relation in (1.5), kurtosis can be large when the probability mass is concentrated near the mean or in the tails. For this reason, Moors [421] proposed the measure

$$\displaystyle{ T = [Q(0.875) - Q(0.625) + Q(0.375) - Q(0.125)]/\text{IQR} }$$
(1.9)

as a measure of kurtosis. As an index, T is justified on the grounds that the differences Q(0.875) − Q(0.625) and Q(0.375) − Q(0.125) become large (small) if relatively small (large) probability mass is concentrated around Q 3 and Q 1 corresponding to large (small) dispersion in the vicinity of μ ± σ.

Given the form of Q(u), the calculations of all these coefficients are very simple, as we need only substitute the appropriate fractions for u. On the other hand, calculation of moments given the distribution function involves integration, which occasionally may not even yield closed-form expressions.

Example 1.5.

Let X follow the Weibull distribution with (see Table 1.1)

$$\displaystyle{Q(u) =\sigma {(-\log (1 - u))}^{\frac{1} {\lambda } }.}$$

Then, we have

$$\displaystyle\begin{array}{rcl} M& =& Q\left (\frac{1} {2}\right ) =\sigma {(\log 2)}^{\frac{1} {\lambda } }, {}\\ S& =& \frac{{(\log 4)}^{\frac{1} {\lambda } } + {(\log \frac{4}{3})}^{\frac{1} {\lambda } } - 2{(\log 2)}^{\frac{1} {\lambda } }} {{(\log 4)}^{\frac{1} {\lambda } } -{\left (\log \frac{4}{3}\right )}^{\frac{1} {\lambda } }}, {}\\ \text{IQR}& =& \sigma \left [{(\log 4)}^{\frac{1} {\lambda } } -{\left (\log \frac{4} {3}\right )}^{\frac{1} {\lambda } }\right ], {}\\ \end{array}$$

and

$$\displaystyle{T = \frac{{(\log 8)}^{\frac{1} {\lambda } } -{\left (\log \frac{8}{3}\right )}^{\frac{1} {\lambda } } +{ \left (\log \frac{8}{5}\right )}^{\frac{1} {\lambda } } -{\left (\log \frac{8}{7}\right )}^{\frac{1} {\lambda } }} {{(\log 4)}^{\frac{1} {\lambda } } -{\left (\log \frac{4}{3}\right )}^{\frac{1} {\lambda } }}.}$$
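
The four measures are equally easy to evaluate numerically from any given Q(u); a minimal sketch for the Weibull model above, with illustrative parameter values σ = 1 and λ = 2, is:

```python
import numpy as np

sigma, lam = 1.0, 2.0                              # illustrative parameters
Q = lambda u: sigma * (-np.log(1.0 - u)) ** (1.0 / lam)

M = Q(0.5)                                         # median (1.6)
IQR = Q(0.75) - Q(0.25)                            # interquartile range (1.7)
S = (Q(0.25) + Q(0.75) - 2.0 * M) / IQR            # Galton's skewness (1.8)
T = (Q(0.875) - Q(0.625) + Q(0.375) - Q(0.125)) / IQR   # Moors' kurtosis (1.9)
print(M, IQR, S, T)
```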

The effect of a change of origin and scale on Q(u) and the above four measures is of interest in later studies. Let X and Y be two random variables such that Y = aX + b, with a > 0. Then,

$$\displaystyle{F_{Y }(y) = P(Y \leq y) = P\left (X \leq \frac{y - b} {a} \right ) = F_{X}\left (\frac{y - b} {a} \right ).}$$

If Q X (u) and Q Y (u) denote the quantile functions of X and Y, respectively,

$$\displaystyle{F_{X}\left (\frac{y - b} {a} \right ) = u \Rightarrow Q_{X}(u) = \frac{y - b} {a} = \frac{Q_{Y }(u) - b} {a} }$$

or

$$\displaystyle{Q_{Y }(u) = aQ_{X}(u) + b.}$$

So, we simply have

$$\displaystyle{M_{Y } = Q_{Y }(0.5) = aQ_{X}(0.5) + b = aM_{X} + b.}$$

Similar calculations using (1.7), (1.8) and (1.9) yield

$$\displaystyle{\text{IQR}_{Y } = a\text{IQR}_{X},\ S_{Y } = S_{X}\text{ and }T_{Y } = T_{X}.}$$

Other quantile-based measures have also been suggested for quantifying spread, skewness and kurtosis. One measure of spread, similar to mean deviation in the distribution function approach, is the median of absolute deviation from the median, viz.,

$$\displaystyle{ A = \text{Med}\,(\vert X - M\vert ). }$$
(1.10)

For further details and justifications for (1.10), we refer to Falk [194]. A second popular measure that has received wide attention in economics is Gini’s mean difference defined as

$$\displaystyle\begin{array}{rcl} \Delta & =& \int _{-\infty }^{\infty }\int _{ -\infty }^{\infty }\vert x - y\vert f(x)f(y)dxdy \\ & =& 2\int _{-\infty }^{\infty }F(x)(1 - F(x))dx, {}\end{array}$$
(1.11)

where f(x) is the probability density function of X. Setting F(x) = u in (1.11), we have

$$\displaystyle\begin{array}{rcl} \Delta & = 2\int _{0}^{1}u(1 - u)q(u)du&{}\end{array}$$
(1.12)
$$\displaystyle\begin{array}{rcl} & = 2\int _{0}^{1}(2u - 1)Q(u)du.&{}\end{array}$$
(1.13)

The expression in (1.13) follows from (1.12) by integration by parts. One may use (1.12) or (1.13) depending on whether q(u) or Q(u) is specified. Gini’s mean difference will be further discussed in the context of reliability in Chap. 4.

Example 1.6.

The generalized Pareto distribution with (see Table 1.1)

$$\displaystyle{Q(u) = \frac{b} {a}\left \{{(1 - u)}^{- \frac{a} {a+1} } - 1\right \}}$$

has its quantile density function as

$$\displaystyle{q(u) = \frac{b} {a + 1}{(1 - u)}^{- \frac{a} {a+1} -1}.}$$

Then, from (1.12), we obtain

$$\displaystyle{\Delta = \frac{2b} {a + 1}\int _{0}^{1}u{(1 - u)}^{- \frac{a} {a+1} }du = \frac{2b} {a + 1}B\left (2, \frac{1} {a + 1}\right ),}$$

where \(B(m,n) =\int _{ 0}^{1}{t}^{m-1}{(1 - t)}^{n-1}dt\) is the complete beta function. Thus, we obtain the simplified expression

$$\displaystyle{\Delta = \frac{2b(a + 1)} {a + 2}.}$$
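
As a sanity check, the closed form of Δ obtained above can be compared with a direct numerical evaluation of (1.12); the parameter values below are illustrative:

```python
from scipy.integrate import quad

a, b = 1.0, 2.0                                     # illustrative parameters
q = lambda u: (b / (a + 1.0)) * (1.0 - u) ** (-a / (a + 1.0) - 1.0)

delta_num, _ = quad(lambda u: 2.0 * u * (1.0 - u) * q(u), 0.0, 1.0)
delta_exact = 2.0 * b * (a + 1.0) / (a + 2.0)       # Example 1.6
print(delta_num, delta_exact)                       # should agree closely
```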

Hinkley [271] proposed a generalization of Galton’s measure of skewness of the form

$$\displaystyle{ S(u) = \frac{Q(u) + Q(1 - u) - 2Q(0.5)} {Q(u) - Q(1 - u)}. }$$
(1.14)

Obviously, (1.14) reduces to Galton’s measure when u = 0.75. Since (1.14) is a function of u and u is arbitrary, an overall measure of skewness can be provided as

$$\displaystyle{S_{2} =\sup _{\frac{1} {2} \leq u\leq 1}S(u).}$$

Groeneveld and Meeden [227] suggested that the numerator and denominator in (1.14) be integrated with respect to u to arrive at the measure

$$\displaystyle{S_{3} = \frac{\int _{\frac{1} {2} }^{1}\{Q(u) + Q(1 - u) - 2Q(0.5)\}du} {\int _{\frac{1} {2} }^{1}\{Q(u) - Q(1 - u)\}du}.}$$

Now, in terms of expectations, we have

$$\displaystyle\begin{array}{rcl} \int _{\frac{1} {2} }^{1}Q(u)du& =& \int _{ M}^{\infty }xf(x)dx, {}\\ \int _{\frac{1} {2} }^{1}Q(1 - u)du& =& \int _{ 0}^{\frac{1} {2} }Q(u)du =\int _{ 0}^{M}xf(x)dx, {}\\ \int _{\frac{1} {2} }^{1}Q(0.5)du& =& \frac{1} {2}M, {}\\ \end{array}$$

and thus

$$\displaystyle\begin{array}{rcl} S_{3} = \frac{E(X) - M} {\int _{M}^{\infty }xf(x)dx -\int _{0}^{M}xf(x)dx} = \frac{\mu -M} {E(\vert X - M\vert )}.& &{}\end{array}$$
(1.15)

The numerator of (1.15) is the traditional term (being the difference between the mean and the median) indicating skewness and the denominator is a measure of spread used for standardizing S 3. Hence, (1.15) can be thought of as an index of skewness in the usual sense. If we replace the denominator by the standard deviation σ of X, the classical measure of skewness will result.

Example 1.7.

Consider the half-logistic distribution with (see Table 1.1)

$$\displaystyle\begin{array}{rcl} Q(u)& =& \sigma \log \left (\frac{1 + u} {1 - u}\right ), {}\\ \mu & =& \int _{0}^{1}Q(u)du =\sigma \log 4, {}\\ \int _{\frac{1} {2} }^{1}Q(u)du& =& \sigma \left (\log 16 -\frac{3} {2}\log 3\right ), {}\\ \int _{0}^{\frac{1} {2} }Q(u)du& =& \sigma \left (\frac{3} {2}\log 3 - 2\log 2\right ), {}\\ \end{array}$$

and hence \(S_{3} =\log (\frac{4} {3})/\log (\frac{64} {27})\).

Instead of using quantiles, one can also use percentiles to define skewness. Galton [206] in fact used the middle 50 % of observations, while Kelly’s measure takes 90 % of observations to propose the measure

$$\displaystyle{S_{4} = \frac{Q(0.90) + Q(0.10) - 2M} {Q(0.90) - Q(0.10)}.}$$

For further discussion of alternative measures of skewness and kurtosis, a review of the literature and comparative studies, we refer to Balanda and MacGillivray [63], Tajuddin [559], Joanes and Gill [299], Sulewski [552] and Kotz and Seier [355].

1.5 Order Statistics

In life testing experiments, a number of units, say n, are placed on test and the quantity of interest is their failure times which are assumed to follow a distribution F(x). The failure times \(X_{1},X_{2},\ldots,X_{n}\) of the n units constitute a random sample of size n from the population with distribution function F(x), if \(X_{1},X_{2},\ldots,X_{n}\) are independent and identically distributed as F(x). Suppose the realization of X i in an experiment is denoted by x i . Then, the order statistics of the random sample \((X_{1},X_{2},\ldots,X_{n})\) are the sample values placed in ascending order of magnitude denoted by \(X_{1:n} \leq X_{2:n} \leq \cdots \leq X_{n:n}\), so that \(X_{1:n} =\min _{1\leq i\leq n}X_{i}\) and \(X_{n:n} =\max _{1\leq i\leq n}X_{i}\). The sample median, denoted by m, is the value for which approximately 50 % of the observations are less than m and 50 % are more than m. Thus

$$\displaystyle{ m = \left \{\begin{array}{@{}l@{\quad }l@{}} X_{\frac{n+1} {2}:n} \quad &\mbox{ if $n$ is odd} \\ \frac{1} {2}(X_{\frac{n} {2}:n} + X_{\frac{n} {2} +1:n})\quad &\mbox{ if $n$ is even.} \end{array} \right. }$$
(1.16)

Generalizing, we have the percentiles. The 100p-th percentile, denoted by x p , in the sample corresponds to the value for which approximately np observations are smaller than this value and n(1 − p) observations are larger. In terms of order statistics we have

$$\displaystyle{ x_{p} = \left \{\begin{array}{@{}l@{\quad }l@{}} X_{[np]:n} \quad &\mbox{ if $ \frac{1} {2n} <p <0.5$} \\ X_{(n+1)-[n(1-p)]:n}\quad &\text{if }0.5 <p <1 - \frac{1} {2n} \end{array} \right., }$$
(1.17)

where the symbol [t] is defined as [t] = r whenever r − 0.5 ≤ t < r + 0.5, for all positive integers r. We note that in the above definition, if x p is the ith smallest observation, then the ith largest observation is x 1 − p . Obviously, the median m is the 50th percentile and the lower quartile q 1 and the upper quartile q 3 of the sample are, respectively, the 25th and 75th percentiles. The sample interquartile range is

$$\displaystyle{ iqr = q_{3} - q_{1}. }$$
(1.18)

All the sample descriptive measures are defined in terms of the sample median, quartiles and percentiles analogous to the population measures introduced in Sect. 1.4. Thus, iqr in (1.18) describes the spread, while

$$\displaystyle{ s = \frac{q_{3} + q_{1} - 2m} {q_{3} - q_{1}} }$$
(1.19)

and

$$\displaystyle{ t = \frac{e_{7} - e_{5} + e_{3} - e_{1}} {iqr}, }$$
(1.20)

where e i denotes the (i ∕ 8)th sample quantile (the ith octile), i = 1, 3, 5, 7, describe the skewness and kurtosis, respectively.

Parzen [484] introduced the empirical quantile function

$$\displaystyle{\bar{Q}(u) = F_{n}^{-1}(u) =\inf (x: F_{ n}(x) \geq u),}$$

where F n (x) is the proportion of \(X_{1},X_{2},\ldots,X_{n}\) that is at most x. In other words,

$$\displaystyle{ \bar{Q}(u) = X_{r:n}\quad \text{ for }\frac{r - 1} {n} <u <\frac{r} {n},\quad r = 1,2,\ldots,n, }$$
(1.21)

which is a step function with jump \(\frac{1} {n}\). For u = 0, \(\bar{Q}(u)\) is taken as X 1: n or a natural minimum if one is available. In the case of lifetime variables, a natural choice is \(\bar{Q}(0) = 0\). When a smooth function is required for \(\bar{Q}(u)\), Parzen [484] suggested the use of

$$\displaystyle{\bar{Q}_{1}(u) = n\left ( \frac{r} {n} - u\right )X_{r-1:n} + n\left (u -\frac{r - 1} {n} \right )X_{r:n}}$$

for \(\frac{r-1} {n} \leq u \leq \frac{r} {n}\), r = 1, 2, …, n. The corresponding empirical quantile density function is

$$\displaystyle{\bar{q}_{1}(u) = \frac{d} {du}\bar{Q}_{1}(u) = n(X_{r:n} - X_{r-1:n}),\quad \mbox{ for }\frac{r - 1} {n} <u <\frac{r} {n}.}$$

In this set-up, we have \(q_{i} =\bar{ Q}( \frac{i} {4})\), i = 1, 3 and \(e_{i} =\bar{ Q}( \frac{i} {8})\), i = 1, 3, 5, 7.
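
A minimal sketch of the empirical quantile function (1.21) and the sample measures (1.18)–(1.20), applied here to the failure-time data that will appear later in Example 1.12, is the following:

```python
import numpy as np

x = np.sort([16, 34, 53, 75, 93, 120, 150, 191, 240, 390])   # ordered sample
n = len(x)

def Q_bar(u):
    # step function: X_{r:n} for (r - 1)/n < u <= r/n, as in (1.21)
    r = np.clip(np.ceil(n * np.asarray(u, dtype=float)).astype(int), 1, n)
    return x[r - 1]

q1, m, q3 = Q_bar(0.25), Q_bar(0.5), Q_bar(0.75)
iqr = q3 - q1                                  # sample interquartile range (1.18)
s = (q3 + q1 - 2 * m) / iqr                    # sample skewness (1.19)
e = Q_bar([1/8, 3/8, 5/8, 7/8])                # sample octiles e_1, e_3, e_5, e_7
t = (e[3] - e[2] + e[1] - e[0]) / iqr          # sample kurtosis (1.20)
print(m, iqr, s, t)
```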

It is well known that the distribution of the rth order statistic X r: n is given by (see Arnold et al. [37])

$$\displaystyle{ F_{r}(x) = P(X_{r:n} \leq x) =\sum _{ k=r}^{n}{n\choose k}{F}^{k}(x){(1 - F(x))}^{n-k}. }$$
(1.22)

In particular, X n: n and X 1: n have their distributions as

$$\displaystyle{ F_{n}(x) = {F}^{n}(x) }$$
(1.23)

and

$$\displaystyle{ F_{1}(x) = 1 - {(1 - F(x))}^{n}. }$$
(1.24)

Recalling the definitions of the beta function

$$\displaystyle{B(m,n) =\int _{ 0}^{1}{t}^{m-1}{(1 - t)}^{n-1}dt,\quad m,n> 0,}$$

and the incomplete beta function ratio

$$\displaystyle{I_{x}(m,n) = \frac{B_{x}(m,n)} {B(m,n)},}$$

where

$$\displaystyle{B_{x}(m,n) =\int _{ 0}^{x}{t}^{m-1}{(1 - t)}^{n-1}dt,}$$

we have the upper tail of the binomial distribution and the incomplete beta function ratio to be related as (Abramowitz and Stegun [15])

$$\displaystyle{ \sum _{k=r}^{n}\binom{n}{k}{p}^{k}{(1 - p)}^{n-k} = I_{ p}(r,n - r + 1). }$$
(1.25)

Comparing (1.22) and (1.25) we see that, if a sample of n observations from a distribution with quantile function Q(u) is ordered, then the quantile function of the rth order statistic is given by

$$\displaystyle{ Q_{r}(u_{r}) = Q(I_{u_{r}}^{-1}(r,n - r + 1)), }$$
(1.26)

where

$$\displaystyle{ u_{r} = I_{u}(r,n - r + 1) }$$
(1.27)

and I  − 1 is the inverse of the beta function ratio I. Thus, the quantile function of the rth order statistic has an explicit distributional form, unlike the expression for the distribution function in (1.22). However, the expression for Q r (u r ) is not explicit in terms of Q(u). This is not a serious handicap as the I u ( ⋅, ⋅) function is tabulated for various values of n and r (Pearson [489]) and also available in all statistical software packages for easy computation. The distributions of X n: n and X 1: n have simple quantile forms

$$\displaystyle{Q_{n}(u_{n}) = Q\left (u_{n}^{ \frac{1} {n} }\right )}$$

and

$$\displaystyle{Q_{1}(u_{1}) = Q[1 - {(1 - u_{1})}^{ \frac{1} {n} }].}$$

The probability density function of X r: n becomes

$$\displaystyle{f_{r}(x) = \frac{n!} {(r - 1)!\,(n - r)!}{F}^{r-1}(x){(1 - F(x))}^{n-r}f(x)}$$

and so

$$\displaystyle\begin{array}{rcl} \mu _{r:n}& =& E(X_{r:n}) =\int xf_{r}(x)dx \\ & =& \frac{n!} {(r - 1)!(n - r)!}\int _{0}^{1}{u}^{r-1}{(1 - u)}^{n-r}Q(u)du.{}\end{array}$$
(1.28)

This mean value is referred to as the rth mean rankit of X. For reasons explained earlier with reference to the use of moments, often the median rankit

$$\displaystyle{ M_{r:n} = Q(I_{0.5}^{-1}(r,n - r + 1)), }$$
(1.29)

which is robust, is preferred over the mean rankit.
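
Since I 0.5 −1(r, n − r + 1) is just the median of a beta(r, n − r + 1) distribution, the median rankits are easy to compute with standard software. A minimal sketch for the exponential quantile function (the choices λ = 1 and n = 10 are illustrative) is:

```python
import numpy as np
from scipy.stats import beta

n, lam = 10, 1.0                          # illustrative sample size and rate
r = np.arange(1, n + 1)

u_med = beta.ppf(0.5, r, n - r + 1)       # I_{0.5}^{-1}(r, n - r + 1)
M_rn = -np.log(1.0 - u_med) / lam         # median rankits M_{r:n} of (1.29)
print(np.round(M_rn, 4))
```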

The importance and role of order statistics in the study of quantile functions become clear from the discussions in this section. Added to this, there are several topics in reliability analysis in which order statistics appear quite naturally. One of them is system reliability. We consider a system consisting of n components whose lifetimes \(X_{1},X_{2},\ldots,X_{n}\) are independent and identically distributed. The system is said to have a series structure if it functions only when all the components are functioning, and the lifetime of this system is the smallest among the X i ’s or X 1: n . In the parallel structure, on the other hand, the system functions if and only if at least one of the components works, so that the system life is X n: n . These two structures are embedded in what is called a k-out-of-n system, which functions if and only if at least k of the components function. The lifetime of such a system is obviously X n − k + 1: n .

In life testing experiments, when n identical units are put on test to ascertain their lengths of life, there are schemes of sampling wherein the experimenter need not wait until all units fail. The experimenter may choose to observe only a prefixed number of failures of, say, n − r units and terminate the experiment as soon as the (n − r)th unit fails. Thus, the lifetimes of the r units that are still working get censored. This sampling scheme is known as type II censoring. The data consist of realizations of \(X_{1:n},X_{2:n},\ldots,X_{n-r:n}\). Another sampling scheme is to prefix a time T  ∗  and observe only those failures that occur up to time T  ∗ . This scheme is known as type I censoring, and in this case the number of failures to be observed is random. One may refer to Balakrishnan and Cohen [51] and Cohen [154] for various methods of inference for type I and type II censored samples from a wide array of lifetime distributions. Yet another sampling scheme is to prefix the number of failures at n − r and also a time T  ∗ . If (n − r) failures occur before time T  ∗ , then the experiment is terminated; otherwise, all failures until time T  ∗  are observed. Thus, the time of truncation of the experiment is now min(T  ∗ , X n − r: n ). This is referred to as type I hybrid censoring; see Balakrishnan and Kundu [53] for an overview of various developments on this and many other forms of hybrid censoring schemes. A third important application of order statistics is in the construction of tests regarding the nature of ageing of a device; see Lai and Xie [368]. For an encyclopedic treatment of the theory, methods and applications of order statistics, one may refer to Balakrishnan and Rao [56, 57].

1.6 Moments

The emphasis given to quantiles in describing the basic properties of a distribution does not in any way minimize the importance of moments in model specification and inferential problems. In this section, we look at various types of moments through quantile functions. The conventional moments

$$\displaystyle{\mu _{r}^{\prime} = E({X}^{r}) =\int _{ 0}^{\infty }{x}^{r}f(x)dx}$$

are readily expressible in terms of quantile functions, by the substitution x = Q(u), as

$$\displaystyle{ \mu _{r}^{\prime} =\int _{ 0}^{1}\{Q{(u)\}}^{r}du. }$$
(1.30)

In particular, as already seen, the mean is

$$\displaystyle{ \mu =\int _{ 0}^{1}Q(u)du =\int _{ 0}^{1}(1 - u)q(u)du. }$$
(1.31)

The central moments and other quantities based on them are obtained through the well-known relationships they have with the raw moments μ ′ r in (1.30).

Some of the difficulties experienced while employing the moments in descriptive measures as well as in inferential problems have been mentioned in the previous sections. The L-moments to be considered next can provide a competing alternative to the conventional moments. Firstly, by definition, they are expected values of linear functions of order statistics. They have generally lower sampling variances and are also robust against outliers. Like the conventional moments, L-moments can be used as summary measures (statistics) of probability distributions (samples), to identify distributions and to fit models to data. The origin of L-moments can be traced back to the work on linear combinations of order statistics in Sillitto [537] and Greenwood et al. [226]. It was Hosking [276] who presented a unified theory on L-moments and made a systematic study of their properties and role in statistical analysis. See also Hosking [277, 279, 280] and Hosking and Wallis [282] for more elaborate details on this topic.

The rth L-moment is defined as

$$\displaystyle{ L_{r} = \frac{1} {r}\sum _{k=0}^{r-1}{(-1)}^{k}\binom{r - 1}{k}E(X_{ r-k:r}),\quad r = 1,2,3,\ldots }$$
(1.32)

Using (1.28), we can write

$$\displaystyle{L_{r} = \frac{1} {r}\sum _{k=0}^{r-1}{(-1)}^{k}\binom{r - 1}{k} \frac{r!} {(r - k - 1)!k!}\int _{0}^{1}{u}^{r-k-1}{(1 - u)}^{k}Q(u)du.}$$

Expanding (1 − u)k in powers of u using binomial theorem and combining powers of u, we get

$$\displaystyle{ L_{r} =\int _{ 0}^{1}\sum _{ k=0}^{r-1}{(-1)}^{r-1-k}\binom{r - 1}{k}\binom{r - 1 + k}{k}{u}^{k}Q(u)du. }$$
(1.33)

Jones [306] has given an alternative method of establishing the last relationship. In particular, we obtain:

$$\displaystyle{ L_{1} =\int _{ 0}^{1}Q(u)du =\mu, }$$
(1.34)
$$\displaystyle{ L_{2} =\int _{ 0}^{1}(2u - 1)Q(u)du, }$$
(1.35)
$$\displaystyle{ L_{3} =\int _{ 0}^{1}(6{u}^{2} - 6u + 1)Q(u)du, }$$
(1.36)
$$\displaystyle{ L_{4} =\int _{ 0}^{1}(20{u}^{3} - 30{u}^{2} + 12u - 1)Q(u)du. }$$
(1.37)

Sometimes, it is convenient (to avoid integration by parts while computing the integrals in (1.34)–(1.37)) to work with the equivalent formulas

$$\displaystyle{ L_{1} =\int _{ 0}^{1}(1 - u)q(u)du, }$$
(1.38)
$$\displaystyle{ L_{2} =\int _{ 0}^{1}(u - {u}^{2})q(u)du, }$$
(1.39)
$$\displaystyle{ L_{3} =\int _{ 0}^{1}(3{u}^{2} - 2{u}^{3} - u)q(u)du, }$$
(1.40)
$$\displaystyle{ L_{4} =\int _{ 0}^{1}(u - 6{u}^{2} + 10{u}^{3} - 5{u}^{4})q(u)du. }$$
(1.41)

Example 1.8.

For the exponential distribution with parameter λ, we have

$$\displaystyle{Q(u) = -{\lambda }^{-1}\log (1 - u)\quad \mbox{ and }\quad q(u) = \frac{1} {\lambda (1 - u)}.}$$

Hence, using (1.38)–(1.41), we obtain

$$\displaystyle\begin{array}{rcl} L_{1}& =& \int _{0}^{1}\frac{1} {\lambda } du {=\lambda }^{-1}, {}\\ L_{2}& =& \int _{0}^{1}u(1 - u)q(u)du =\int _{ 0}^{1}\frac{u} {\lambda } du = {(2\lambda )}^{-1}, {}\\ L_{3}& =& \int _{0}^{1}u(1 - u)(2u - 1)q(u)du = {(6\lambda )}^{-1}, {}\\ L_{4}& =& \int _{0}^{1}u(1 - u)(1 - 5u + 5{u}^{2})q(u)du = {(12\lambda )}^{-1}. {}\\ \end{array}$$

More examples are presented in Chap. 3 when properties of various distributions are studied.
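
The integrals (1.34)–(1.37) are also convenient for numerical work; the following minimal sketch recovers the exponential L-moments of Example 1.8 by quadrature (the rate λ = 2 is an illustrative choice):

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0                                      # illustrative rate parameter
Q = lambda u: -np.log(1.0 - u) / lam           # exponential quantile function

P = [lambda u: 1.0,
     lambda u: 2*u - 1,
     lambda u: 6*u**2 - 6*u + 1,
     lambda u: 20*u**3 - 30*u**2 + 12*u - 1]   # weight polynomials in (1.34)-(1.37)

L = [quad(lambda u, p=p: p(u) * Q(u), 0, 1)[0] for p in P]
print(np.round(L, 6))   # expect 1/lam, 1/(2 lam), 1/(6 lam), 1/(12 lam)
```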

The L-moments have the following properties that distinguish themselves from the usual moments:

  1.

    The L-moments exist whenever E(X) is finite, while additional restrictions may be required for the conventional moments to be finite for many distributions;

  2.

    A distribution whose mean exists is characterized by (L r : r = 1, 2, …). This result can be compared with the moment problem discussed in probability theory. However, any set that contains all L-moments except one is not sufficient to characterize a distribution. For details, see Hosking [279, 280];

  3.

    From (1.12), we see that \(L_{2} = \frac{1} {2}\Delta\), and so L 2 is a measure of spread. Thus, the first (being the mean) and second L-moments provide measures of location and spread. In a recent comparative study of the relative merits of the variance and the mean difference Δ, Yitzhaki [596] noted that the mean difference is more informative than the variance in deriving properties of distributions that depart from normality. He also compared the algebraic structure of the variance and Δ and examined the relative superiority of the latter from the stochastic dominance, exchangeability and stratification viewpoints. For further comments on these aspects and some others in the reliability context, see Chap. 7;

  4.

    Forming the ratios \(\tau _{r} = \frac{L_{r}} {L_{2}}\), r = 3, 4, …, for any non-degenerate X with μ < ∞, the result | τ r  | < 1 holds. Hence, the quantities τ r are dimensionless and bounded;

  5.

    The skewness and kurtosis of a distribution can be ascertained through the moment ratios. The L-coefficient of skewness is

    $$\displaystyle{ \tau _{3} = \frac{L_{3}} {L_{2}} }$$
    (1.42)

    and the L-coefficient of kurtosis is

    $$\displaystyle{ \tau _{4} = \frac{L_{4}} {L_{2}}. }$$
    (1.43)

    These two measures satisfy the criteria presented for coefficients of skewness and kurtosis in terms of order relations. The range of τ 3 is ( − 1, 1) while that of τ 4 is \(\frac{1} {4}(5\tau _{3}^{2} - 1) \leq \tau _{ 4} <1\). These results are proved in Hosking [279] and Jones [306] using different approaches. It may be observed that both τ 3 and τ 4 are bounded and do not assume arbitrarily large values as β 1 does (for example, in the case of \(F(x) = 1 - {x}^{-3}\), x > 1);

  6.

    The ratio

    $$\displaystyle{ \tau _{2} = \frac{L_{2}} {L_{1}} }$$
    (1.44)

    is called L-coefficient of variation. Since X is non-negative in our case, L 1 > 0, L 2 > 0 and further

    $$\displaystyle{L_{2} =\int _{ 0}^{1}u(1 - u)q(u)du <\int _{ 0}^{1}(1 - u)q(u)du = L_{ 1}}$$

    so that 0 < τ 2 < 1.

The above properties of L-moments have made them popular in diverse applications, especially in hydrology, civil engineering and meteorology. Several empirical studies (such as the one by Sankarasubramanian and Srinivasan [517]) comparing L-moments and the usual moments reveal that estimates based on the former are less sensitive to outliers. Just as the population and sample moments are matched for the estimation of parameters, the same method (the method of L-moments) can be applied with L-moments as well. Asymptotic approximations to sampling distributions are better achieved with L-moments. An added advantage is that standard errors of sample L-moments exist whenever the underlying distribution has a finite variance, whereas for the usual moments this may not be enough in many cases.

When dealing with the conventional moments, the (β 1, β 2) plot is used as a preliminary tool to discriminate between candidate distributions for the data. For example, if one wishes to choose a distribution from the Pearson family as a model, (β 1, β 2) provides an exclusive classification of the members of this family. Distributions with no shape parameters are represented by points in the β 1-β 2 plane, those with a single shape parameter have their (β 1, β 2) values lying on the line 2β 2 − 3β 1 − 6 = 0, while two shape parameters in the distribution ensure that for them, (β 1, β 2) falls in a region between the lines 2β 2 − 3β 1 − 6 = 0 and β 2 − β 1 − 1 = 0. These cases are, respectively, illustrated by the exponential distribution (which has (β 1, β 2) = (4, 9) as a point), the gamma family and the beta family; see Johnson et al. [302] for details. In a similar manner, one can construct (τ 2, τ 3)-plots or (τ 3, τ 4)-plots for distribution functions or quantile functions to give a visual identification of which distribution can be expected to fit a given set of observations. Vogel and Fennessey [574] articulate the need for such diagrams and provide several examples on how to construct them. Some refinements of the L-moments have also been studied in the name of trimmed L-moments (Elamir and Seheult [187], Hosking [281]) and LQ-moments (Mudholkar and Hutson [424]).

Example 1.9.

The L-moments of the exponential distribution were calculated earlier in Example 1.8. Applying the formulas for τ 2, τ 3 and τ 4 in (1.44), (1.42) and (1.43), we have

$$\displaystyle{\tau _{2} = \frac{1} {2},\;\tau _{3} = \frac{1} {3},\;\tau _{4} = \frac{1} {6}.}$$

Thus, \((\tau _{2},\tau _{3}) = (\frac{1} {2}, \frac{1} {3})\) and \((\tau _{3},\tau _{4}) = (\frac{1} {3}, \frac{1} {6})\) are points in the τ 2-τ 3 and τ 3-τ 4 planes, respectively.

Example 1.10.

The random variable X has generalized Pareto distribution with

$$\displaystyle{Q(u) = \frac{b} {a}\{{(1 - u)}^{- \frac{a} {a+1} } - 1\},\quad a> -1,\;b> 0.}$$

Then, straightforward calculations yield

$$\displaystyle\begin{array}{rcl} L_{1}& =& b,\qquad L_{2} = b(a + 1){(a + 2)}^{-1}, {}\\ L_{3}& =& b(a + 1)(2a + 1){[(2a + 3)(a + 2)]}^{-1}, {}\\ L_{4}& =& b(a + 1)(6{a}^{2} + 7a + 2){[(a + 2)(2a + 3)(3a + 4)]}^{-1}, {}\\ \end{array}$$

so that

$$\displaystyle{\tau _{2} = \frac{a + 1} {a + 2},\quad \tau _{3} = \frac{2a + 1} {2a + 3}\quad \mbox{ and}\quad \tau _{4} = \frac{6{a}^{2} + 7a + 2} {6{a}^{2} + 17a + 12}.}$$

Then, eliminating a between τ 2 and τ 3, we obtain

$$\displaystyle{\tau _{3} = \frac{3\tau _{2} - 1} {\tau _{2} + 1}.}$$

Thus, the plot of (τ 2, τ 3) for all values of a and b lies on the curve (τ 2 + 1)(3 − τ 3) = 4. Note that the exponential point \((\frac{1} {2}, \frac{1} {3})\), which arises in the limit as a → 0, lies on this curve. The estimation and other related inferential problems are discussed in Chap. 7.

We now present probability weighted moments (PWMs), which are a forerunner to the concept of L-moments. Introduced by Greenwood et al. [226], PWMs are of considerable interest when the distribution is expressed in quantile form. The PWMs are defined as

$$\displaystyle{ M_{p,r,s} = E[{X}^{p}{F}^{r}(X)\bar{{F}}^{s}(X)], }$$
(1.45)

where p, r, s are non-negative real numbers and E | X | p  < ∞. Two special cases of (1.45) in general use are

$$\displaystyle\begin{array}{rcl} \beta _{p,r}& =& E({X}^{p}{F}^{r}(X)) \\ & =& \int {x}^{p}{F}^{r}(x)f(x)dx \\ & =& \int _{0}^{1}{(Q(u))}^{p}{u}^{r}du{}\end{array}$$
(1.46)

and

$$\displaystyle\begin{array}{rcl} \alpha _{p,s}& =& E({X}^{p}\bar{{F}}^{s}(X)) \\ & =& \int _{0}^{1}{(Q(u))}^{p}{(1 - u)}^{s}du.{}\end{array}$$
(1.47)

Like L-moments, PWMs are more robust to outliers in the data. They have less bias in estimation even for small samples and converge rapidly to asymptotic normality.

Example 1.11.

The PWMs of the Pareto distribution with (see Table 1.1)

$$\displaystyle{Q(u) =\sigma {(1 - u)}^{-\frac{1} {\alpha } },\quad \sigma,\alpha> 0,}$$

are

$$\displaystyle{\alpha _{p,s} = {\sigma }^{p}\int _{ 0}^{1}{(1 - u)}^{s-\frac{p} {\alpha } }du = \frac{{\sigma }^{p}\alpha } {\alpha (s + 1) - p},\quad \alpha (s + 1)> p.}$$

Similarly, for the power distribution with (see Table 1.1)

$$\displaystyle{Q(u) =\alpha {u}^{\frac{1} {\beta } },\quad \alpha,\beta> 0,}$$

we have

$$\displaystyle{\beta _{p,r} = {\alpha }^{p}\int _{ 0}^{1}{u}^{\frac{p} {\beta } +r}du = \frac{{\alpha }^{p}\beta } {p +\beta (r + 1)}.}$$

Further specializing (1.46) for p = 1, we see that the L-moments are linear combinations of the PW moments. The relationships are

$$\displaystyle\begin{array}{rcl} L_{1}& =& \beta _{1,0}, {}\\ L_{2}& =& 2\beta _{1,1} -\beta _{1,0}, {}\\ L_{3}& =& 6\beta _{1,2} - 6\beta _{1,1} +\beta _{1,0}, {}\\ L_{4}& =& 20\beta _{1,3} - 30\beta _{1,2} + 12\beta _{1,1} -\beta _{1,0} {}\\ \end{array}$$

in the first four cases. Generally, we have the relationship

$$\displaystyle{L_{r+1} =\sum _{ k=0}^{r}\frac{{(-1)}^{r-k}(r + k)!} {{(k!)}^{2}(r - k)!} \beta _{1,k}.}$$

The conventional moments can also be deduced as M p, 0, 0 or β p, 0 or α p, 0 . The role of PW moments in reliability analysis will be taken up in the subsequent chapters. In spite of their advantages, Chen and Balakrishnan [140] have pointed out some infeasibility problems in estimation. While estimating the parameters of some distributions, like the generalized forms of Pareto, the estimated distributions have an upper or lower bound and one or more of the data values may lie outside this bound.
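
A brief numerical sketch of these relations (again using the exponential distribution of Example 1.8, with an illustrative rate λ = 2) computes β 1, k by (1.46) and assembles the first four L-moments from them:

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0
Q = lambda u: -np.log(1.0 - u) / lam

# beta_{1,k} = integral of Q(u) u^k over (0, 1), from (1.46) with p = 1
b = [quad(lambda u, k=k: Q(u) * u**k, 0, 1)[0] for k in range(4)]

L = [b[0],
     2*b[1] - b[0],
     6*b[2] - 6*b[1] + b[0],
     20*b[3] - 30*b[2] + 12*b[1] - b[0]]
print(np.round(L, 6))   # matches Example 1.8: 1/lam, 1/(2 lam), 1/(6 lam), 1/(12 lam)
```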

1.7 Diagrammatic Representations

In this section, we demonstrate a few graphical methods other than the conventional ones. The primary goal is the choice of a model, specified by a quantile function, for the data. An important tool in this category is the Q-Q plot, which is the plot of the points (Q(u r ), x r: n ), r = 1, 2, …, n, where \(u_{r} = \frac{r-0.5} {n}\). For application purposes, we may replace Q(u r ) by the fitted quantile function. One use of this plot is to ascertain whether the sample could have arisen from the target population Q(u). In the ideal case, the graph should show a straight line that bisects the axes, since we are plotting the sample and population quantiles. However, since the sample is random and the fitted values of Q(u) are used, points lying approximately around the line are indicative of the model being adequate. The points in the Q-Q plot are always non-decreasing when viewed from left to right.

The Q-Q plot can also be used for comparing two competing models by plotting the rth quantile of one against the rth quantile of the other. When the two distributions are similar, the points on the graph should lie approximately along the straight line y = x. A general trend in the plot, like steeper (flatter) than y = x, will mean that the distribution plotted on the y-axis (x-axis) is more dispersed. On the other hand, S-shaped plots often suggest that one of the distributions exhibits more skewness or tail-heaviness. It should also be noted that the relationship in the quantile plot can be linear when the constituent distributions are linearly related. This procedure is direct when the data sets from the two distributions contain the same number of observations. Otherwise, it is necessary to use interpolated quantile estimates in the shorter set to equal the number in the larger set. Often, Q-Q plots are found to be more powerful and informative than histogram comparisons.

Example 1.12.

The times to failure of a set of 10 units are given as 16, 34, 53, 75, 93, 120, 150, 191, 240 and 390 h (Kececioglu [322]). A Weibull distribution with quantile function

$$\displaystyle{Q(u) =\sigma {(-\log (1 - u))}^{1/\lambda }}$$

is proposed for the data. The parameters of the model were estimated by the method of maximum likelihood as \(\hat{\sigma }= 146.2445\) and \(\hat{\lambda }= 1.973\). The Q-Q plot pertaining to the model is presented in Fig. 1.1. From the figure, it is seen that the above model seems to be adequate.

Fig. 1.1 Q-Q plot for Example 1.12
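
A minimal sketch reproducing such a plot (using matplotlib; the estimates quoted above are taken as given) is:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.sort([16, 34, 53, 75, 93, 120, 150, 191, 240, 390])   # data (hours)
n = len(x)
sigma_hat, lam_hat = 146.2445, 1.973       # ML estimates from Example 1.12

u = (np.arange(1, n + 1) - 0.5) / n        # plotting positions u_r
Q_fit = sigma_hat * (-np.log(1.0 - u)) ** (1.0 / lam_hat)

plt.scatter(Q_fit, x)
plt.plot([0, x.max()], [0, x.max()])       # 45-degree reference line
plt.xlabel("fitted Weibull quantiles")
plt.ylabel("ordered failure times")
plt.show()
```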

A second useful graphical representation is the box plot introduced by Tukey [569]. It depicts the numerical data through a five-number summary in the form of extremes, quartiles and the median. The steps required for constructing a box plot are (Parzen [484]):

  (i)

    compute the median \(m =\bar{ Q}(0.50)\), the lower quartile \(q_{1} =\bar{ Q}(0.25)\) and the upper quartile \(q_{3} =\bar{ Q}(0.75)\);

  (ii)

    draw a vertical box of arbitrary width and length equal to q 3 − q 1;

  (iii)

    a solid line is marked within the box at a distance m − q 1 above the lower end of the box. Dashed lines are extended from the upper and lower ends of the box to distances x n: n  − q 3 and q 1 − x 1: n away, respectively. This constitutes the H-plot, H standing for hinges or quartiles. Instead, one can use \(\bar{Q}(0.125) = e_{1}\) and \(\bar{Q}(0.875) = e_{7}\) resulting in E-box plots. Similarly, the quantiles \(\bar{Q}(0.0625)\) and \(\bar{Q}(0.9375)\) constitute the D-box plots;

  (iv)

    A quantile box plot consists of the graph of \(\bar{Q}(u)\) on [0, 1] along with the three boxes in (iii), superimposed on it.

Parzen [484] proposed the following information to be drawn from the plot. By drawing a perpendicular line to the median line at its midpoint and of length \(\pm {n}^{-\frac{1} {2} }(q_{3} - q_{1})\), a confidence interval for the median can be obtained. The graph \(x =\bar{ Q}(u)\) exhibiting sharp rises is likely to have a density with more than one mode. If such points lie inside the H-box, the presence of several distinct populations generating the data is to be suspected, while, if they are outside the D-box, the presence of outliers is indicated. Horizontal segments in the graph may be the result of discrete distributions. By calculating

$$\displaystyle{\dfrac{\bar{Q}(\frac{1} {2}) -\frac{1} {2}[\bar{Q}(u) +\bar{ Q}(1 - u)]} {\bar{Q}(1 - u) -\bar{ Q}(u)} }$$

for several u values, one can get a feel for the skewness, with a value near zero suggesting symmetry. Parzen [484] also suggested some measures of tail classification.

Example 1.13.

The box plot corresponding to the data in Example 1.12 is exhibited in Fig. 1.2. It may be noticed that the observation 390 is a likely outlier.

A stem-leaf plot can also be informative about some meaningful characteristics of the data. To obtain such a plot, we first arrange the observations in ascending order. The leaf is the last digit in a number, and the stem contains all the other digits (when the data consist of very large numbers, values rounded to a particular place, like hundreds or thousands, are used as stems and leaves). In the stem-leaf plot, there are two columns, the first representing the stems, separated by a line from the second column representing the leaves. Each stem is listed only once and the leaves are entered in a row. The plot helps us understand the relative density of the observations as well as the shape of the distribution. The mode is easily displayed along with the potential outliers. Finally, the descriptive statistics can be easily worked out from the diagram.

Fig. 1.2 Box plot for the data given in Example 1.12

Example 1.14.

We illustrate the stem-leaf plot for a small data set: 36, 57, 52, 44, 47, 51, 46, 63, 59, 68, 66, 68, 72, 73, 75, 81, 84, 106, 76, 88, 91, 41, 84, 68, 34, 38, 54.

$$\displaystyle{ \begin{array}{r|l} 3&4\quad 6\quad 8\\ 4&1\quad 4\quad 6\quad 7\\ 5&1\quad 2\quad 4\quad 7\quad 9\\ 6&3\quad 6\quad 8\quad 8\quad 8\\ 7&2\quad 3\quad 5\quad 6\\ 8&1\quad 4\quad 4\quad 8\\ 9&1\\ 10&6\end{array} }$$