1 Introduction

In probability theory, skewness means asymmetry, or departure from symmetry. The curve of the density function appears distorted or skewed either to the left or to the right, which is interpreted as the tail on the curve’s right-hand side (left-hand side) being longer than the tail on the left-hand side (right-hand side). The study of skewness is of great interest in diverse areas of statistics, economics and related fields. For example, it is well known in statistics that tails have a direct effect on the efficiency of estimators of location parameters. In finance, the departure from symmetry of financial returns is crucial for the assessment of financial risk. In practice, however, such situations involve more than one random variable and require the analysis of the whole problem from a multivariate point of view (see, e.g., Serfling 2004, 2006). The study and comparison of skewness in the multivariate case is an important topic not only in finance or risk theory, but also in environmental sciences and other research fields. In hydrology, for instance, extreme events are of great interest and usually involve multivariate skewed distributions. The study of skewness becomes even more delicate in the multivariate case, where the need for skewed multivariate distributions to fit multivariate data sets has motivated several proposals of multivariate skewed distributions. The papers by Arellano-Valle and Azzalini (2006), Azzalini (1985, 2005), Azzalini and Capitanio (1999) and Azzalini and Dalla-Valle (1996) are examples of work on multivariate skew-normal and related distributions. Most proposals in the multivariate case consider multivariate extensions of “skewing mechanisms” developed in the univariate case. We describe two different approaches.

First, Ferreira and Steel (2006) developed a skewing mechanism in which a symmetric distribution \(F\) is skewed through a distortion. More precisely, given a symmetric distribution \(F\) and a distribution \(P\) with support on the interval \([0,1]\), the distorted version of \(F\) through \(P\), that is, \(P\circ F\), is the skewed version of \(F\). Recently, Abtahi and Towhidi (2013) gave a unified representation of multivariate skewed distributions, extending to the multivariate case the proposal of Ferreira and Steel (2006) for the representation of univariate skewed distributions. They used the Rosenblatt construction (see, e.g., Rosenblatt 1952) to provide such a representation of multivariate skewed distributions.

Another approach is the one developed by Ley and Paindaveine (2010). Given a random variable \(X\) with symmetric distribution \(F\), they considered an increasing transformation \(\varPhi (X)\) of \(X\) which provides a skewed version of \(X\). When \(\varPhi \) is increasing, it can easily be seen that \(\varPhi = G^{-1} \circ F\) (see Marshall and Olkin 2007, Proposition C.6), where \(G\) is the distribution function of \(\varPhi (X)\) and \(G^{-1}\) is the quantile function associated with \(G\), that is, \(G^{-1}(p)=\inf \{x:G(x)\ge p\}\). This skewing mechanism is then extended to the multivariate case by considering an appropriate transformation which maps an \(n\)-dimensional random vector with a symmetric multivariate distribution onto a skewed version. This particular transformation is a diffeomorphism, i.e., a one-to-one mapping \(H: S \subset \mathbb {R}^n \longmapsto T \subset \mathbb {R}^n\) such that both \(H\) and its inverse \(H^{-1}\) are continuously differentiable functions. It is worth mentioning that the authors restrict attention to the case when \(H\) has a lower triangular Jacobian matrix with strictly positive diagonal elements.

In both cases, the point of departure is a symmetric distribution, but Ferreira and Steel (2006) considered a transformation (a distortion) of the quantile space, while Ley and Paindaveine (2010) considered a transformation of the sample space.

Traditionally, skewness has been studied through different measures that intend to capture the amount and direction of the departure from symmetry. In the univariate case, however, when dealing with asymmetry or skewness, van Zwet (1964) introduced the concept of relative skewness. Let \(X\) be a random variable with interval support and distribution function \(F\), and let us consider another random variable \(Y\) with distribution function \(G\); van Zwet (1964) said that the distribution function \(G\) (or the random variable \(Y\)) is more right-skewed than the distribution \(F\) (or the random variable \(X\)) if \(G^{-1}\circ F\) is a convex function on the support of \(X\).

The approaches by van Zwet (1964) and Ley and Paindaveine (2010) have in common the transformation of the sample space, but van Zwet (1964) did not require symmetry of the random variable \(X\), nor did he consider general increasing transformations of \(X\). In fact, he considered increasing convex transformations of \(X\). That is the reason why the random variable \(Y\) is more right-skewed than the random variable \(X\).

This idea arises in a natural way when trying to formalize what it means for one distribution \(G\) to be more skewed to the right than a distribution \(F\); Marshall and Olkin (2007, p. 70) provide an explanation of this fact. It leads to a partial ordering on the set of distributions. In particular, a random variable \(X\) with distribution \(F\) is said to be less in the convex transform order than a random variable \(Y\) with distribution function \(G\), denoted by \(X\le _{c}Y\), if \(G^{-1}\circ F\) is a convex function (see Shaked and Shanthikumar 2007). Equivalently, \(X\le _{c}Y\) if, and only if, there exists an increasing and convex transformation \(\varPhi \) which maps \(X\) onto \(Y\), that is, \(Y=_{st}\varPhi (X)\). It is also clear that, in this case, there exists an increasing and concave transformation \(\Psi \) which maps \(Y\) onto \(X\), that is, \(X=_{st}\Psi (Y)\).

Next, we describe some situations where this comparison of skewness arises in a natural way.

The first example is the case of increasing convex transformations of some parametric models. For instance, if we consider a random variable \(X\) with normal distribution, with mean \(0\) and standard deviation \(1\), then the random variable \(Y=\exp (\sigma X + \mu )\), where \(\sigma \) is a positive real number and \(\mu \) is a real number, is an increasing convex transformation of \(X\) and, therefore, more skewed to the right than \(X\); following the previous notation, \(X\le _{c}Y\). In this case the random variable \(Y\) follows a lognormal distribution and, therefore, lognormal distributions are more skewed to the right than normal distributions.
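As a quick numerical illustration (our sketch, not part of the original paper; the parameter values are arbitrary), the following Python snippet checks that the map \(\varPhi =G^{-1}\circ F\) from the standard normal to a lognormal coincides with \(\exp (\sigma x+\mu )\) and is convex.

```python
# A minimal sketch (illustrative parameters) of X <=_c Y for X ~ N(0,1)
# and Y = exp(sigma*X + mu) lognormal.
import numpy as np
from scipy.stats import norm, lognorm

mu, sigma = 0.5, 1.2
x = np.linspace(-3, 3, 401)

# Phi = G^{-1} o F, with G the lognormal cdf: lognorm(s=sigma, scale=exp(mu))
phi = lognorm.ppf(norm.cdf(x), s=sigma, scale=np.exp(mu))

# Phi coincides with the explicit increasing convex map exp(sigma*x + mu)
assert np.allclose(phi, np.exp(sigma * x + mu))

# convexity check via second-order finite differences
print("convex:", np.all(np.diff(phi, 2) >= -1e-9))  # True, so X <=_c Y
```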

Another example arises when we consider an exponentially distributed random variable \(\exp (\lambda )\). In this case, increasing concave [convex] transformations of \(\exp (\lambda )\) lead to random variables with an increasing [decreasing] failure rate, denoted by IFR [DFR]. Given a random variable \(X\) with absolutely continuous distribution function \(F\) and density function \(f\), the hazard or failure rate is defined as \(r(x)=f(x)/(1-F(x))\) for all \(x\) such that \(F(x)<1\). This function is one of the basic functions in the context of reliability and survival analysis, where a random variable \(X\) represents the random lifetime of a unit or a mechanism. The hazard rate describes the process of aging and can be considered as the rate at which a unit fails to survive beyond a fixed time \(x\) (see Barlow and Proschan 1975; Lai and Xie 2006). More precisely, if we denote by \(E\) an exponential random variable with parameter \(1\), i.e., \(E\sim \exp (1)\), the IFR [DFR] aging class can be characterized via the univariate convex transform order, that is, given a random variable \(X\) (or its distribution), then

$$\begin{aligned} \text{ X } \text{ is } \text{ IFR } \text{[DFR] }\Leftrightarrow & {} X \le _{c} [\ge _{c}] E, \nonumber \\\Leftrightarrow & {} X \le _{c} [\ge _{c}] a E +b,\quad \forall \, a>0, b\in \mathbb {R}. \end{aligned}$$
(1)
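The characterization (1) can be checked numerically. A minimal sketch follows (ours, with the Weibull family as an illustrative model), using the fact that \(G_E^{-1}(p)=-\log (1-p)\), so that \(G_E^{-1}\circ F\) is the cumulative hazard of \(X\).

```python
# Sketch of (1): X is IFR iff Phi(x) = -log(1 - F(x)) (the cumulative
# hazard) is convex. For a Weibull with shape beta, Phi(x) = x**beta.
import numpy as np
from scipy.stats import weibull_min

x = np.linspace(0.01, 5, 500)
for beta in (0.5, 1.0, 2.0):  # DFR, exponential and IFR cases
    phi = -np.log1p(-weibull_min.cdf(x, c=beta))
    print(f"beta={beta}: convex (IFR)? {np.all(np.diff(phi, 2) >= -1e-8)}")
```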

As we have mentioned before, in the univariate case, when \(\varPhi \) is increasing, then \(\varPhi =G^{-1} \circ F\). In the multivariate case, it is possible to find such a function \(\mathbf \Phi \) which maps a random vector \(\mathbf {X}\) onto a random vector \(\mathbf \Phi (\mathbf {X})\) with the same distribution as \(\mathbf {Y}\). Next, we describe the construction of such a function. Throughout this paper, “increasing” means “nondecreasing” and “decreasing” means “nonincreasing”. We will denote by \(=_{st}\) the equality in law, and by \(\le _{a.s.}\) the almost sure inequality. For any random vector \(\mathbf {X}\), or random variable, we will denote by \(({\mathbf {X}}|A)\) a random vector, or random variable, whose distribution is the conditional distribution of \({\mathbf {X}}\) given \(A\).

Now, let us consider two \(n\)-dimensional random vectors \({\mathbf {X}}=(X_1,\ldots ,X_n)\) and \({\mathbf {Y}}=(Y_1,\ldots ,Y_n)\) with absolutely continuous distribution.

First, we consider the multivariate quantile transform introduced by Arjas and Lehtonen (1978), O’Brien (1975), Rosenblatt (1952) and Rüschendorf (1981). Essentially, this transformation is also discussed in Ley and Paindaveine (2010). Given the random vector \({\mathbf {Y}}\), the multivariate quantile transform, also called the standard construction, associated with \({\mathbf {Y}}\) is defined recursively as

$$\begin{aligned} \begin{array}{rcl} Q_{{\mathbf {Y}},1}(u_1)&{}=&{}G^{-1}_{Y_1}(u_1), \\ Q_{{\mathbf {Y}},2}(u_1,u_2)&{}=&{}G^{-1}_{(Y_2|Y_1=Q_{{\mathbf {Y}},1}(u_1))}(u_2), \\ &{}\vdots &{} \\ Q_{{\mathbf {Y}},n}(u_1,\ldots ,u_n) &{}=&{} G^{-1}_{\left( Y_n|\bigcap _{j=1}^{n-1}Y_j=Q_{\mathbf {Y},j}(u_1,\ldots ,u_j)\right) }(u_n), \end{array} \end{aligned}$$
(2)

for every \((u_1,u_2,\ldots ,u_n)\in (0,1)^n\), where \(G^{-1}_{Y_1}(\cdot )\) is the quantile function of \(Y_1\) and for \(i=2,\ldots ,n\), \(G^{-1}_{\left( Y_i|\bigcap _{j=1}^{i-1}Y_j=Q_{\mathbf {Y},j}(u_1,\ldots ,u_j)\right) }(\cdot )\) is the quantile function of the univariate conditional random variable given by

$$\begin{aligned} \left( Y_i\left| \bigcap _{j=1}^{i-1}\right. Y_j=Q_{\mathbf {Y},j}(u_1,\ldots ,u_j)\right) . \end{aligned}$$

This transform is widely used in simulation theory and plays the role of the quantile function in the multivariate case. It can be seen that, given independent random variables \(U_1,\ldots ,U_n\) uniformly distributed on the interval \((0,1)\) and denoting

$$\begin{aligned} \mathbf {Q_Y}(u_1,\ldots ,u_n)=(Q_{\mathbf {Y},1}(u_1),Q_{\mathbf {Y},2}(u_1,u_2),\ldots ,Q_{\mathbf {Y},n}(u_1,\ldots ,u_n)), \end{aligned}$$

we have that

$$\begin{aligned} (Y_1,\ldots ,Y_n)=_{st}\mathbf {Q_Y}(U_1,\ldots ,U_n). \end{aligned}$$
(3)

Next, we recall the multivariate distributional transform. Let us consider the random vector \(\mathbf {X}\); the multivariate distributional transform is defined recursively as

$$\begin{aligned} \begin{array}{rcl} D_{\mathbf {X}, 1}(x_1) &{} = &{} F_{X_1}(x_1), \\ D_{\mathbf {X}, 2}(x_1,x_2) &{} = &{} F_{(X_2|X_1=x_1)}(x_2), \\ &{} \vdots &{} \\ D_{\mathbf {X}, n}(x_1,\ldots ,x_n) &{} = &{} F_{\left( X_n\left| \right. \bigcap _{j=1}^{n-1}X_j=x_j\right) }(x_n), \end{array} \end{aligned}$$
(4)

for every \((x_1,\ldots ,x_n)\) in the support of \(\mathbf {X}\), where \(F_{X_1}(\cdot )\) is the distribution function of \(X_1\) and for \(i=2,\ldots ,n\), \(F_{\left( X_i \left| \right. \bigcap _{j=1}^{i-1}X_j=x_j\right) }(\cdot )\) is the distribution function of the conditional distribution \(\left( X_i\left| \right. \bigcap _{j=1}^{i-1}X_j=x_j\right) \).

Denoting

$$\begin{aligned} \mathbf {D_X}(x_1,\ldots ,x_n)=(D_{\mathbf {X},1}(x_1),D_{\mathbf {X},2}(x_1,x_2),\ldots , D_{\mathbf {X},n}(x_1,\ldots ,x_n)), \end{aligned}$$

it can be seen that

$$\begin{aligned} (U_1,\ldots ,U_n)=_{st}\mathbf {D_X}(X_1,\ldots ,X_n). \end{aligned}$$
(5)

Therefore, if we consider the transform

$$\begin{aligned} \mathbf \Phi \equiv \mathbf {Q_Y}\circ \mathbf {D_X}, \end{aligned}$$
(6)

defined for every \((x_1,\ldots ,x_n)\) in the support of \(\mathbf {X}\), we have, from (3) and (5) that

$$\begin{aligned} \mathbf {Y}=_{st} \mathbf {Q_Y}(\mathbf {D_X}(\mathbf {X})), \end{aligned}$$

and, hence, the function \(\mathbf \Phi \) maps the random vector \(\mathbf {X}\) onto \(\mathbf {Y}\).
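To fix ideas, the following sketch (ours; all parameter values are illustrative) implements \(\mathbf \Phi =\mathbf {Q_Y}\circ \mathbf {D_X}\) for two bivariate normal vectors, where the conditional distributions in (2) and (4) are available in closed form, and checks empirically that \(\mathbf \Phi (\mathbf {X})=_{st}\mathbf {Y}\).

```python
# A sketch of Phi = Q_Y o D_X for two bivariate normal vectors (illustrative
# parameters). The conditionals of a bivariate normal are again normal, so
# the transforms (2) and (4) have closed forms.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def D_X(x1, x2, rho):
    """Distributional transform (4) of a standard bivariate normal, corr rho."""
    u1 = norm.cdf(x1)
    u2 = norm.cdf((x2 - rho * x1) / np.sqrt(1 - rho**2))  # from X2 | X1 = x1
    return u1, u2

def Q_Y(u1, u2, mu, sig, rho):
    """Multivariate quantile transform (2) of a bivariate normal."""
    y1 = mu[0] + sig[0] * norm.ppf(u1)
    m2 = mu[1] + rho * sig[1] / sig[0] * (y1 - mu[0])      # from Y2 | Y1 = y1
    y2 = m2 + sig[1] * np.sqrt(1 - rho**2) * norm.ppf(u2)
    return y1, y2

rho_x, rho_y = 0.3, 0.7
mu, sig = (1.0, 2.0), (1.5, 0.5)

z1, z2 = rng.standard_normal((2, 100_000))
x1, x2 = z1, rho_x * z1 + np.sqrt(1 - rho_x**2) * z2  # X: standard, corr rho_x

y1, y2 = Q_Y(*D_X(x1, x2, rho_x), mu, sig, rho_y)     # Phi(X)
print(np.mean(y1), np.mean(y2))                       # approx 1.0 and 2.0
print(np.corrcoef(y1, y2)[0, 1])                      # approx 0.7
```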

Remark 1

From (2) and (4), the i-th component of \(\mathbf \Phi \) depends only on \((x_1,\ldots ,x_i)\) and is given by

$$\begin{aligned} {\mathbf \Phi }_1 (x_1)= & {} G^{-1}_{Y_{1}}(F_{X_1}(x_1)), \end{aligned}$$
(7)
$$\begin{aligned} {\mathbf \Phi }_i (x_1,\ldots ,x_i)= & {} G^{-1}_{ \left\{ Y_{i} \left| \bigcap \limits _{j=1}^{i-1}\right. Y_j = {\mathbf \Phi }_j( x_1,\ldots ,x_j ) \right\} }\left( F_{\left\{ X_{i} \left| \bigcap \limits _{j=1}^{i-1}\right. X_j = x_j\right\} }(x_i)\right) . \end{aligned}$$
(8)

From the increasingness of both the distribution function and its inverse, it is apparent that \({\mathbf \Phi }_i(x_1,\ldots ,x_i)\) is increasing in \(x_i\), for all \(i=1,\ldots ,n\). Hence, the Jacobian matrix of \(\mathbf \Phi \) is always a lower triangular matrix with strictly positive diagonal elements.

Remark 2

In addition, as a clear extension of the univariate case, Fernández-Ponce and Suárez-Llorens (2003) proved in their Theorem 3.1 that if we take a function \({\mathbf k}: \mathbb {R}^n\rightarrow \mathbb {R}^n\) such that \({\mathbf Y}=_{st} {\mathbf k} ({\mathbf X})\) and \({\mathbf k}\) has a lower triangular Jacobian matrix with strictly positive diagonal elements, then \(\mathbf k\) necessarily has the form of the function \(\mathbf \Phi \) given in (6).

The purpose of this paper is to provide a new concept of relative skewness for multivariate distributions, assuming some convexity properties for the function \(\mathbf {\Phi }\). The organization of the paper is the following. In Sect. 2, we define and study a criterion of relative skewness based on convexity properties of the function \({\mathbf \Phi }\), and we provide some properties and examples. In Sect. 3, we study the case of random vectors with the same copula and provide several examples for this case. Throughout the paper, we assume absolute continuity of the multivariate distributions and convex supports for random vectors and random variables.

2 Relative skewness of multivariate distributions

In this section, we consider a new multivariate convex order based on the generalization of the convexity of \(G^{-1}\circ F\) to the multivariate case through the function \(\mathbf \Phi = \mathbf {Q_Y}\circ \mathbf {D_X}\). This generalization is clearly inspired by the multivariate dispersive order proposed by Fernández-Ponce and Suárez-Llorens (2003) and by the skewing mechanism of Ley and Paindaveine (2010). Throughout this section, we will assume that the random variables or vectors upon which we consider convex transformations have a convex support and, as in the previous section, absolutely continuous distribution functions. Finally, we will restrict our study to the case when the function \({\mathbf \Phi }\), defined in (6), is differentiable.

We start by recalling the definition of a multivariate convex function (see Marshall et al. 2011, for more details).

Let \(\mathbf {f}:\mathbf {S}\rightarrow \mathbb {R}\) be a real-valued function defined on a convex set \(\mathbf {S} \subseteq \mathbb {R}^k\), \(k\ge 1\). Then, \(\mathbf {f}\) is convex on \(\mathbf {S}\) if for all \(\mathbf {x}_1, \mathbf {x}_2 \in \mathbf {S}\) and for all \(\lambda \in (0,1)\) we have

$$\begin{aligned} \mathbf {f}(\lambda \mathbf {x}_1+(1-\lambda )\mathbf {x}_2)\le \lambda \mathbf {f}(\mathbf {x}_1)+(1-\lambda )\mathbf {f}(\mathbf {x}_2). \end{aligned}$$

In the literature, there are many useful characterizations of convex functions. Next, we recall some of them; whenever we use them, we will assume the regularity conditions that make them applicable. From now on, given a vector (or a matrix) \(\mathbf {v}\), we denote by \(\mathbf {v}^t\) the transpose of \(\mathbf {v}\).

Characterization 1

If the function \(\mathbf {f}:\mathbf {S}\rightarrow \mathbb {R}\), \(\mathbf {S} \subseteq \mathbb {R}^k\), is differentiable in the interior of its support, convexity is equivalent to the condition

$$\begin{aligned} (\mathbf {f}(\mathbf {x}_2)-\mathbf {f}(\mathbf {x}_1))^t \ge \nabla \mathbf {f}(\mathbf {x}_1)(\mathbf {x}_2-\mathbf {x}_1)^t, \end{aligned}$$

for all \(\mathbf {x}_1\), \(\mathbf {x}_2\) in the support of \(\mathbf {f}\), where

$$\begin{aligned} \nabla \mathbf {f}(\mathbf {x}_1)(\mathbf {x}_2-\mathbf {x}_1)^t = \displaystyle \sum _{i=1}^k \frac{\partial \mathbf {f}}{\partial x_i}(\mathbf {x}_1)(x_{2i}-x_{1i}), \end{aligned}$$

represents the classical tangent hyperplane to the hypersurface given by \(\mathbf {f}\) at \(\mathbf {x}_1\).

Characterization 2

If the function \(\mathbf {f}:\mathbf {S}\rightarrow \mathbb {R}\), \(\mathbf {S} \subseteq \mathbb {R}^k\), is twice differentiable in the interior of its support, convexity is equivalent to the Hessian matrix, denoted by \(\nabla ^2 \mathbf {f}(\mathbf {x})\), being positive semidefinite for every \(\mathbf {x}\) in the support of \(\mathbf {f}\).

We recall that, by Young’s theorem, the Hessian of any function whose second partial derivatives are all continuous is symmetric for all values of the argument of the function. Finally, by Sylvester’s criterion, \(\nabla ^2\mathbf {f}(\mathbf {x})\) is positive semidefinite if, and only if, all its principal minors are non-negative.

Definition 1

Let \(\mathbf {X}\) and \(\mathbf {Y}\) be two \(n\)-dimensional random vectors. Let \(\mathbf {\Phi }\) be the function defined in (6) which maps \(\mathbf {X}\) onto \(\mathbf {Y}\). Then, \(\mathbf {X}\) is said to be smaller than \(\mathbf {Y}\) in the multivariate convex transform order, for short mct order and denoted by \(\mathbf {X} \le _{mct} \mathbf {Y}\), if, and only if, the i-th component of \(\mathbf {\Phi }\), \(\mathbf {\Phi }_i\), is convex in its support for all \(i=1,\ldots ,n\).

Roughly speaking, \(\mathbf {X} \le _{mct} \mathbf {Y}\) means that the transformation \({\mathbf \Phi }\) that maps \(\mathbf {X}\) onto \(\mathbf {Y}\) is a multivariate convex function. Hence, Definition 1 is a clear generalization of the concept of relative skewness proposed by van Zwet (1964). It is worth mentioning that the transformation \(\mathbf {\Phi }\) depends on the ordering of the marginal distributions. Note that we first obtain \(\mathbf {\Phi }_1\) from the marginal distributions of \(X_1\) and \(Y_1\), and, conditioning on every possible realization \(\mathbf {\Phi }_1(x_1)\), we next construct \(\mathbf {\Phi }_2(x_1,x_2)\). Continuing this procedure, we finally arrive at \(\mathbf {\Phi }\). Far from being a disadvantage, this dependency provides a formal way of saying that one random vector is more “directionally skewed” than another, where directional skewness is a clear extension of the univariate notion of right-skewness. We will see later that Proposition 1 reinforces this remark. For such a reason, given two random vectors ordered in the mct-order sense, we cannot expect the mct order to hold for any arbitrary permutation of the marginal distributions.

From (6) and Remark 1, we point out that the function \({\mathbf \Phi }\) is the kind of transformation that Ley and Paindaveine (2010) use to define a skewing mechanism. Namely, the authors require only that a transformation mapping a multivariate symmetric distribution onto a skewed version not be an odd function. This is due to the fact that if we map a distribution that is symmetric around the null vector through an odd mapping, we obtain another symmetric distribution. We recall that, except for affine transformations, convex functions cannot be odd functions. We will analyze in Proposition 2 the case when \({\mathbf \Phi }\) is an affine transformation.

From Remark 2, and although it is not our main purpose, we can consider any multivariate convex transformation having a lower triangular Jacobian matrix with strictly positive diagonal elements to provide a skewing mechanism. For example, the transformation \(\mathbf {\Phi }(x_1,\ldots ,x_n)=(e^{x_1},\ldots , e^{x_n})\) maps any symmetric multivariate distribution, \(\mathbf {X}\), onto a skewed version, \(\mathbf {Y} = \mathbf {\Phi }(\mathbf {X})\), in the mct-order sense. In the latter case, \(\mathbf {X}\) and \(\mathbf {Y}\) share a common dependence structure, the copula, as we will see in Sect. 3.
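A quick simulation (our sketch, with arbitrary parameters) illustrates this componentwise skewing mechanism: the exponential map skews each margin while Kendall’s tau, which depends only on the copula, is unchanged.

```python
# Sketch: exp-skewing a symmetric bivariate normal keeps the copula
# (Kendall's tau is invariant) while creating marginal skewness.
import numpy as np
from scipy.stats import kendalltau, skew

rng = np.random.default_rng(1)
x = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=20_000)
y = np.exp(x)                            # componentwise convex map

print(kendalltau(x[:, 0], x[:, 1])[0])   # approximately equal ...
print(kendalltau(y[:, 0], y[:, 1])[0])   # ... to this value
print(skew(x[:, 0]), skew(y[:, 0]))      # approx 0 vs clearly positive
```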

Next, we provide a particular interpretation. Denoting by \(J_{\mathbf \Phi }(\mathbf {x})\) the Jacobian matrix of \(\mathbf {\Phi }\) at \(\mathbf {x}\) and using Characterization 1 for each \(\mathbf {\Phi }_i\), \(i=1,\ldots , n\), it is apparent that Definition 1, under differentiability, is equivalent to the following inequality:

$$\begin{aligned} (\mathbf \Phi (\mathbf {x}_2)-\mathbf \Phi (\mathbf {x}_1))^t\ge J_{\mathbf \Phi }(\mathbf {x}_1)(\mathbf {x}_2-\mathbf {x}_1)^t, \end{aligned}$$
(9)

which contains all the information of the tangent hyperplanes given by \(\nabla {\mathbf \Phi }_i\), \(i=1,\ldots , n\). Note that from Remark 1, the Jacobian matrix of \(\mathbf {\Phi }\) is a lower triangular matrix with strictly positive diagonal elements having the following form:

$$\begin{aligned} J_{\mathbf \Phi }(\mathbf {x})=\left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \frac{\partial {\mathbf \Phi }_1}{\partial x_1}(x_1) &{} 0 &{} \ldots &{} 0 \\ \frac{\partial {\mathbf \Phi }_2}{\partial x_1}(x_1,x_2) &{}\frac{\partial {\mathbf \Phi }_2}{\partial x_2}(x_1,x_2) &{} \ldots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \frac{\partial {\mathbf \Phi }_n}{\partial x_1}(x_1,\ldots ,x_n) &{} \frac{\partial \mathbf \Phi _n}{\partial x_2}(x_1,\ldots ,x_n) &{} \ldots &{} \frac{\partial {\mathbf \Phi }_n}{\partial x_n} (x_1,\ldots ,x_n)\end{array} \right) . \end{aligned}$$

Due to the fact that \({\mathbf \Phi }\) maps the multivariate quantile transform of \({\mathbf {X}}\) to that of \({\mathbf {Y}}\), i.e., \( \mathbf {\Phi }(\mathbf {Q}_{\mathbf {X}}(\mathbf {u}))=\mathbf {Q}_{\mathbf {Y}}(\mathbf {u})\) for all \(\mathbf {u}=(u_1,\ldots , u_n)\), \(u_i \in (0,1)\), condition (9) can be interpreted as a particular comparison between multivariate quantiles:

$$\begin{aligned} (\mathbf {Q_Y} (\mathbf {v})-\mathbf {Q_Y}(\mathbf {u}))^t\ge J_{\mathbf \Phi }(\mathbf {Q_X}(\mathbf {u}))(\mathbf {Q_X}(\mathbf {v})-\mathbf {Q_X}(\mathbf {u}))^t, \end{aligned}$$

for all \(\mathbf {u}=(u_1,\ldots , u_n)\) and \(\mathbf {v}=(v_1,\ldots , v_n)\) in \((0,1)^n\).

Another interesting interpretation of the mct order is given by the following result.

Proposition 1

Let \(\mathbf {X}\) and \(\mathbf {Y}\) be two \(n\)-dimensional random vectors. If \(\mathbf {X} \le _{mct} \mathbf {Y}\), then

$$\begin{aligned}&X_1 \le _c Y_1, \end{aligned}$$
(10)
$$\begin{aligned}&\quad \left( X_i \left| \displaystyle \bigcap _{j=1}^{i-1}\right. \{X_j= Q_{\mathbf {X}, j}(u_1,\ldots ,u_j) \}\right) \le _{c} \left( Y_i \left| \displaystyle \bigcap _{j=1}^{i-1}\right. \{ Y_j= Q_{\mathbf {Y}, j}(u_1,\ldots ,u_j) \}\right) , \nonumber \\ \end{aligned}$$
(11)

for \(i=2,\ldots ,n\) and for all \(u_i,\) such that \(0<u_i<1\), \(i=1,\ldots ,n\).

Proof

Under the hypothesis, \({\mathbf \Phi }_i(x_1,\ldots , x_i)\) is convex for all \(i=1,\ldots , n\). Therefore, it is also convex in \(x_i\) when \(x_1,\ldots ,x_{i-1}\) remain fixed. Taking into account that \({\mathbf \Phi }\) maps the multivariate quantile transform of \({\mathbf {X}}\) to that of \({\mathbf {Y}}\), the proof follows directly by observing the expressions (7) and (8) and recalling the definition of the univariate convex transform order. \(\square \)

Therefore, the univariate conditional distributions are ordered in the sense of van Zwet (1964), i.e., the conditional distributions of \(\mathbf {Y}\) can be interpreted as more right-skewed than those of \(\mathbf {X}\).

Next, we consider an example to show that (10) and (11) in Proposition 1 are not sufficient conditions for the mct order.

Example 1

Let \(\mathbf {X}=(X_1, X_2)\) be a non-negative bivariate random vector and let \(m_1\ge 1\) and \(m_2\ge 1\) be two fixed constants. Let us consider the random vector \(\mathbf {Y}=(Y_1,Y_2)\) given by

$$\begin{aligned} (Y_1, Y_2)= & {} {\mathbf k} (X_1,X_2), \\= & {} (k_1(X_1), k_2(X_1,X_2)), \\= & {} (X_1^{m_1},X_1^{m_1}X_2^{m_2}). \end{aligned}$$

From Remark 2, it is apparent that the function \(\mathbf \Phi \), given in (6), which maps \(\mathbf {X}\) onto \(\mathbf {Y}\) has the form

$$\begin{aligned} {\mathbf \Phi }(x_1,x_2) = \mathbf {Q_Y}(\mathbf {D_X}(x_1,x_2))= (x_1^{m_1}, x_1^{m_1}x_2^{m_2}). \end{aligned}$$

If we compute the Hessian matrix of \({\mathbf \Phi }_2\), we obtain

$$\begin{aligned} \nabla ^2 {\mathbf \Phi }_2 (x_1,x_2) = \left( \begin{array}{c@{\quad }c} m_1(m_1-1)x_1^{m_1-2} x_2^{m_2} &{} m_1m_2x_1^{m_1-1} x_2^{m_2-1} \\ m_1m_2x_1^{m_1-1} x_2^{m_2-1} &{} m_2(m_2-1)x_2^{m_2-2} x_1^{m_1} \end{array} \right) , \end{aligned}$$

whose determinant satisfies

$$\begin{aligned} \det ( \nabla ^2 {\mathbf \Phi }_2 (x_1,x_2) ) = -(m_1+m_2-1)m_1m_2x_1^{2m_1-2}x_2^{2m_2-2} \le 0, \end{aligned}$$

with strict inequality on \((0,\infty )^2\). Therefore, \(\nabla ^2 {\mathbf \Phi }_2 (x_1,x_2)\) is not positive semidefinite and \(\mathbf {X}\nleq _{mct} \mathbf {Y}\). However, it is apparent that \({\mathbf \Phi }_1(x_1)\) is convex in \(x_1\), and \({\mathbf \Phi }_2(x_1,x_2)\) is convex in \(x_2\) when \(x_1\) remains fixed.
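The Hessian computation above is easy to reproduce symbolically; a small sketch with sympy follows (ours, not part of the paper).

```python
# Symbolic check of Example 1 with sympy.
import sympy as sp

x1, x2, m1, m2 = sp.symbols('x1 x2 m1 m2', positive=True)
H = sp.hessian(x1**m1 * x2**m2, (x1, x2))
print(sp.factor(H.det()))
# up to rearrangement: -m1*m2*(m1 + m2 - 1)*x1**(2*m1 - 2)*x2**(2*m2 - 2)
```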

Recalling the univariate convex transform order, given two univariate random variables \(X\) and \(Y\), \(X=_{c}Y\) if, and only if, \(Y=_{st}aX+b\) for some \(a>0\) and some real \(b\) (see Marshall and Olkin 2007, Proposition C.9). Therefore, it is natural to wonder whether a similar property holds for the mct order: the answer is yes, as we will see in Proposition 2. To prove it, we first recall the inverse function theorem (see, e.g., Burkill and Burkill 2002).

Lemma 1

Let \(A\subseteq \mathbb R^n\) be an open set and \(\mathbf \Phi :A\rightarrow \mathbb R^n\) a continuously differentiable function with continuously differentiable inverse \(\mathbf \Phi ^{-1}\). Then,

$$\begin{aligned} J_{\mathbf \Phi ^{-1}}(\mathbf {y})|_{\mathbf {y}=\mathbf \Phi (\mathbf {x})}=(J_{\mathbf \Phi }(\mathbf {x}))^{-1}, \end{aligned}$$

for all \(\mathbf {x}\in A\).

Proposition 2

Let \(\mathbf {X}=(X_1,\ldots ,X_n)\) and \(\mathbf {Y}=(Y_1,\ldots ,Y_n)\) be two random vectors. Then,

$$\begin{aligned} \mathbf {X}=_{mct}\mathbf {Y}, \end{aligned}$$

if, and only if,

$$\begin{aligned} \mathbf {Y}^t=_{st}\mathbf A\mathbf X^t+\mathbf b, \end{aligned}$$

for a lower triangular matrix \(\mathbf A=(a_{ij})\) with strictly positive diagonal elements, \(a_{ii} >0\), \(i=1,\ldots , n\), and a column matrix \(\mathbf b\).

Proof

First, we will prove the sufficient condition. If \(\mathbf {Y}^t =_{st} \mathbf A\mathbf {X}^t+\mathbf b\), as specified previously, using Remark 2 we obtain that \({\mathbf \Phi }(\mathbf {x}) = \mathbf {Q_Y}( \mathbf {D_X} (\mathbf {x})) = \mathbf A \mathbf {x}^t +\mathbf b\). Hence, since affine transformations are convex, \(\mathbf {X} \le _{mct}\mathbf {Y}\). Observing that \(\mathbf {X}^t =_{st} \mathbf A^{-1}(\mathbf {Y}^t-\mathbf b)\) and taking into account that \(\mathbf A^{-1}\) is also a lower triangular matrix with strictly positive diagonal elements, \(\mathbf {Y} \le _{mct}\mathbf {X}\) holds by the same argument, using Remark 2 again.

We will show now the necessary condition. Let us suppose that \(\mathbf {X}=_{mct} \mathbf {Y}\). Note that the function \({\mathbf \Phi }\), defined in (6), which maps \(\mathbf {X}\) onto \(\mathbf {Y}\) has a lower triangular Jacobian matrix with strictly positive diagonal elements. If we denote by \({\mathbf \Phi ^\star }\) the function that follows from (6) by exchanging the roles of \({\mathbf X}\) and \({\mathbf Y}\), then it is not difficult to see that \({\mathbf \Phi ^\star }= {\mathbf \Phi }^{-1}\). Hence, by hypothesis, the components of \(\mathbf \Phi \) and \(\mathbf \Phi ^{-1}\) are convex functions. Therefore,

$$\begin{aligned} J_{\mathbf \Phi }(\mathbf x_1)(\mathbf x_2 - \mathbf x_1)^t \le (\mathbf \Phi (\mathbf x_2) - \mathbf \Phi (\mathbf x_1))^t, \end{aligned}$$
(12)

and, from Lemma 1, we have

$$\begin{aligned} (J_{\mathbf \Phi }(\mathbf x_1))^{-1}(\mathbf \Phi (\mathbf x_2) - \mathbf \Phi (\mathbf x_1))^t\le (\mathbf x_2 - \mathbf x_1)^t, \end{aligned}$$
(13)

where \(\mathbf x_2\), \(\mathbf x_1\) are in the support of \(\mathbf {X}\).

Let us see that these inequalities imply that \(\mathbf \Phi (\mathbf {x})=\mathbf A\mathbf {x}^t+\mathbf b\). We proceed by induction on \(i=1,\ldots ,n\).

For the case \(i=1\), the result is trivial, because we have that \({\mathbf \Phi }_1(x_1)\) and \({\mathbf \Phi }_1^{-1}(x_1)\) are increasing and convex and, therefore, \({\mathbf \Phi }_1(x_1)=a_{11} x_1 + b_1\) where \(a_{11} >0\).

Let us assume that this is true for \(j=1,\ldots ,i-1\), that is,

$$\begin{aligned} {\mathbf \Phi }_j(x_1,\ldots , x_j)=a_{j1}x_1 + \cdots + a_{jj} x_j + b_j, \end{aligned}$$

and let us see that it is true for \(j=i\). Then, restricting \({\mathbf \Phi }\) to its first \(i\) components, we can write

$$\begin{aligned} J_{({\mathbf \Phi }_1,\ldots ,{\mathbf \Phi }_i)}(x_1,\ldots ,x_i)=\left( \begin{array}{c@{\quad }c} A_{i-1} &{} \mathbf {0} \\ \left( \frac{\partial {\mathbf \Phi }_i}{\partial x_1}(x_1,\ldots ,x_i),\ldots ,\frac{\partial {\mathbf \Phi }_i}{\partial x_{i-1}}(x_1,\ldots ,x_i)\right) &{} \frac{\partial {\mathbf \Phi }_i}{\partial x_i}(x_1,\ldots ,x_i) \end{array} \right) \end{aligned}$$

and

$$\begin{aligned} \left( J_{({\mathbf \Phi }_1,\ldots ,{\mathbf \Phi }_i)}(x_1,\ldots ,x_i)\right) ^{-1}=\left( \begin{array}{c@{\quad }c} A_{i-1}^{-1} &{} \mathbf {0} \\ B(x_1,\ldots ,x_i) &{} \dfrac{1}{\frac{\partial {\mathbf \Phi }_i}{\partial x_i}(x_1,\ldots ,x_i)} \end{array} \right) , \end{aligned}$$

where \(A_{i-1}\) is a lower triangular matrix with dimension \((i-1)\times (i-1)\) and

$$\begin{aligned}&B(x_1,\ldots , x_i)\nonumber \\&\quad =-\frac{1}{\frac{\partial {\mathbf \Phi }_i}{\partial x_i}(x_1,\ldots , x_i)}\left( \frac{\partial {\mathbf \Phi }_i}{\partial x_1}(x_1,\ldots , x_i), \ldots , \frac{\partial {\mathbf \Phi }_i}{\partial x_{i-1}} (x_1,\ldots , x_i)\right) A_{i-1}^{-1}. \end{aligned}$$

From (12), taking \(\mathbf x_2=(x_{21},\ldots ,x_{2n})\) and \(\mathbf x_1=(x_{11},\ldots ,x_{1n})\) in the support of \(\mathbf {X}\), we obtain that

$$\begin{aligned} \sum _{j=1}^i \frac{\partial {\mathbf \Phi }_i}{\partial x_j}(x_{11},\ldots ,x_{1i})(x_{2j}-x_{1j})\le {{\mathbf \Phi }_i} (x_{21},\ldots ,x_{2i}) - {\mathbf \Phi }_i (x_{11},\ldots ,x_{1i}). \end{aligned}$$
(14)

From (13), we also have that

$$\begin{aligned} x_{2i} - x_{1i}\ge & {} B(x_{11},\ldots ,x_{1i}) \left( \begin{array}{c} {\mathbf \Phi }_1 (x_{21})-{\mathbf \Phi }_1 (x_{11})\\ \vdots \\ {\mathbf \Phi }_{i-1} (x_{21},\ldots , x_{2(i-1)}) - {\mathbf \Phi }_{i-1} (x_{11},\ldots , x_{1(i-1)}) \end{array} \right) \\&+\, \frac{1}{\frac{\partial {\mathbf \Phi }_i}{\partial x_i}(x_{11},\ldots ,x_{1i})}({\mathbf \Phi }_{i} (x_{21},\ldots ,x_{2i}) - {\mathbf \Phi }_{i} (x_{11},\ldots ,x_{1i})) . \end{aligned}$$

Now, taking into account the expression of \(B(\cdot )\) and the induction hypothesis, the previous inequality is equivalent to

$$\begin{aligned} x_{2i} - x_{1i}\ge & {} -\frac{1}{\frac{\partial {\mathbf \Phi }_i}{\partial x_i}(x_{11},\ldots ,x_{1i})}\sum _{j=1}^{i-1} \frac{\partial {\mathbf \Phi }_i}{\partial x_j}(x_{11},\ldots ,x_{1i})(x_{2j}-x_{1j}) \\&+\, \frac{1}{\frac{\partial {\mathbf \Phi }_i}{\partial x_i}(x_{11},\ldots ,x_{1i})}({\mathbf \Phi }_{i} (x_{21},\ldots ,x_{2i}) - {\mathbf \Phi }_{i} (x_{11},\ldots ,x_{1i})). \end{aligned}$$

From this inequality, we get

$$\begin{aligned} \sum _{j=1}^{i} \frac{\partial {\mathbf \Phi }_i}{\partial x_j}(x_{11},\ldots ,x_{1i})(x_{2j}-x_{1j}) \ge ({\mathbf \Phi }_{i} (x_{21},\ldots ,x_{2i}) - {\mathbf \Phi }_{i} (x_{11},\ldots ,x_{1i})). \end{aligned}$$

Therefore, from the previous inequality and (14), we get

$$\begin{aligned} \sum _{j=1}^i \frac{\partial {\mathbf \Phi }_i}{\partial x_j}(x_{11},\ldots ,x_{1i})(x_{2j}-x_{1j})\le & {} {\mathbf \Phi }_i (x_{21},\ldots ,x_{2i})-{\mathbf \Phi }_i (x_{11},\ldots ,x_{1i}) \\\le & {} \sum _{j=1}^i \frac{\partial {\mathbf \Phi }_i}{\partial x_j}(x_{11},\ldots ,x_{1i})(x_{2j}-x_{1j}). \end{aligned}$$

Hence, equality holds throughout, that is, \({\mathbf \Phi }_i\) coincides with its tangent hyperplane at every point and is, therefore, an affine transformation. \(\square \)

Proposition 2 is consistent with the skewing mechanism proposed by Ley and Paindaveine (2010). This is due to the fact that an affine transformation with a column matrix \(\mathbf {b}=\mathbf {0}\) is an odd function. Therefore, an affine transformation cannot be considered as a skewing mechanism.

Now, we present an example where we show that all elliptically contoured distributions sharing a common generator are equal in the mct-order sense. First, we recall that \({\mathbf X}\) has an elliptically contoured distribution, denoted by \(E_n(\mu , \Sigma , g)\), if its density function can be expressed as

$$\begin{aligned} f(\mathbf{x})= k \cdot g((\mathbf{x}-\mu )^t\Sigma ^{-1}(\mathbf{x}-\mu )), \end{aligned}$$

where \(k\) is the scale factor, \(\mu \) is the median vector (which is also the mean vector if the latter exists), \(\Sigma \) is a symmetric positive definite matrix which is proportional to the covariance matrix if the latter exists, and \(g\) is a function mapping from the non-negative reals to the non-negative reals giving a finite area under the curve.

Example 2

Let \(\mathbf {X} \sim E_{n}(\mu _{1},\Sigma _{1}, g)\) and \(\mathbf {Y} \sim E_{n}(\mu _{2},\Sigma _{2},g)\) be two nondegenerate multivariate elliptically contoured distributions sharing a common generator, where \(\Sigma _{i}\), \(i=1,2\), are two non-singular symmetric positive definite matrices. According to Theorem 14.5.11 in Harville (1997), we can find two lower triangular matrices \(A\) and \(B\) such that \(AA^{t}=\Sigma _{2}\) and \(B^{t}B=\Sigma _{1}^{-1}\). Furthermore, we have that

$$\begin{aligned} A^t = {\mathbf D}_{A}^{1/2} {\mathbf U}\quad \text{ and } \quad B={\mathbf D}_{B}^{-1/2} ( {\mathbf V}^{-1})^{t}, \end{aligned}$$

with \({\mathbf U}\) and \({\mathbf V}\) being the unique unit upper triangular matrices and \({\mathbf D}_A = \{ d_{Ai}\} \) and \({\mathbf D}_B = \{ d_{Bi}\} \) being the unique diagonal matrices with positive elements such that

$$\begin{aligned} \Sigma _1 = {\mathbf V}^t {\mathbf D}_{B} {\mathbf V}\quad \text{ and } \quad \Sigma _2={\mathbf U}^{t} {\mathbf D}_{A} {\mathbf U,} \end{aligned}$$

where \({\mathbf D}_{A}^{1/2} = \{ \sqrt{d_{Ai}} \}\) and similarly for \({\mathbf D}_{B}^{1/2}\). The \({\mathbf U} \) and \({\mathbf V} \) matrices are computed using the Cholesky decomposition (Harville 1997).

According to Remark 2 and from the well-known fact that elliptically contoured distributions are preserved by affine transformations, the function \(\mathbf \Phi \), given in (6), which maps \(\mathbf {X}\) onto \(\mathbf {Y}\) has the form

$$\begin{aligned} {\mathbf \Phi }(\mathbf {x}) = \mathbf {Q_Y}(\mathbf {D_X}(\mathbf {x}))= AB(\mathbf x-\mu _1)+\mu _2. \end{aligned}$$

Consequently, the Jacobian matrix of \({\mathbf \Phi }\) satisfies \(J_{\mathbf \Phi }= AB\). Due to the fact that the Jacobian matrix is constant, it follows directly that the Hessian matrix of \({\mathbf \Phi }_i\) is the null matrix for all \(i=1,\ldots , n\), and analogously for the function that maps \(\mathbf {Y}\) to \(\mathbf {X}\). Hence, \( \mathbf {X} =_{mct} \mathbf {Y}\).
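The following numerical sketch (ours; the normal case, with arbitrary parameters) builds the matrices \(A\) and \(B\) via Cholesky factorizations and checks empirically that the affine map \({\mathbf \Phi }(\mathbf {x})=AB(\mathbf {x}-\mu _1)+\mu _2\) carries \(E_2(\mu _1,\Sigma _1,g)\) onto \(E_2(\mu _2,\Sigma _2,g)\).

```python
# Sketch of Example 2 in the normal case (arbitrary parameters):
# A A^t = Sigma2 and B^t B = Sigma1^{-1}, with A and B lower triangular.
import numpy as np

rng = np.random.default_rng(2)
mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, -1.0])
S1 = np.array([[2.0, 0.6], [0.6, 1.0]])
S2 = np.array([[1.0, -0.3], [-0.3, 0.5]])

A = np.linalg.cholesky(S2)                 # lower triangular, A A^t = S2
B = np.linalg.inv(np.linalg.cholesky(S1))  # lower triangular, B^t B = S1^{-1}

x = rng.multivariate_normal(mu1, S1, size=200_000)
y = (A @ B @ (x - mu1).T).T + mu2          # the affine map Phi

print(y.mean(axis=0))                      # approx mu2
print(np.cov(y.T))                         # approx S2
```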

We would like to emphasize that the elliptical family contains the multivariate normal family, which corresponds to the functional parameter \(g(t)=\exp (-t/2)\), a particular case of the power exponential family, \(g(z)=\exp (-z^{\beta }/2)\). It also contains a number of widely used subfamilies which can be useful for robustness purposes, such as the multivariate t-distribution family, the multivariate symmetric generalized hyperbolic family, etc.

It is easy to see that the mct order is closed under conjunctions and verifies a sort of marginalization closure property. We state these properties without proof.

Proposition 3

The following properties hold:

  1. (i)

    Let \(\mathbf {X}_1,\ldots ,\mathbf {X}_n\) be a set of independent random vectors where the dimension of \(\mathbf {X}_i\) is \(k_i\), \(i=1,\ldots ,n\). Let \(\mathbf {Y}_1,\ldots ,\mathbf {Y}_n\) be a set of independent random vectors where the dimension of \(\mathbf {Y}_i\) is \(k_i\), \(i=1,\ldots ,n\). If \(\mathbf {X}_i\le _{mct}\mathbf {Y}_i\) for \(i=1,\ldots ,n\), then

    $$\begin{aligned} (\mathbf {X}_1,\ldots ,\mathbf {X}_n)\le _{mct}(\mathbf {Y}_1,\ldots ,\mathbf {Y}_n). \end{aligned}$$
  2. (ii)

    Let \(\mathbf {X}=(X_1,\ldots ,X_n)\) and \(\mathbf {Y}=(Y_1,\ldots ,Y_n)\) be two \(n\)-dimensional random vectors. Let \(1\le i\le n,\) \(\mathbf {X}_I=(X_{1},\ldots ,X_{i})\) and \(\mathbf {Y}_I=(Y_{1},\ldots ,Y_{i})\). If \(\mathbf {X}\le _{mct} \mathbf {Y}\), then \(\mathbf {X}_I\le _{mct}\mathbf {Y}_I\).

Due to the fact that the composition of multivariate convex functions is not always convex, the multivariate convex transform order does not satisfy the transitive property, as the following example shows.

Example 3

Let \(\mathbf {X} = (X_1,X_2)\), \(\mathbf {Y} = (X_1^2, X_2^2)\) and \(\mathbf {Z} = (2X_1^2, -3X_1^2+X_2^2)\) be three bivariate random vectors on \((0,\infty )^2\). From Remark 2, a straightforward computation shows that

$$\begin{aligned} {\mathbf \Phi }^{(1)} (x_1, x_2)= & {} \mathbf {Q_Y}(\mathbf {D_X}(x_1, x_2)) = (x_1^2, x_2^2), \\ {\mathbf \Phi }^{(2)} (y_1, y_2)= & {} \mathbf {Q_Z}(\mathbf {D_Y}(y_1, y_2)) = (2y_1, -3y_1+y_2), \\ {\mathbf \Phi } (x_1, x_2)= & {} \mathbf {Q_Z}(\mathbf {D_X}(x_1, x_2)) = (2x_1^2, -3x_1^2 + x_2^2). \end{aligned}$$

Computing the Hessian matrices of all the component functions, it is easily obtained that \(\mathbf {X} \le _{mct} \mathbf {Y} =_{mct} \mathbf {Z},\) but \(\mathbf {X} \not \le _{mct} \mathbf {Z}\).
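The Hessian computations of Example 3 can again be checked symbolically; a brief sketch (ours) follows.

```python
# Sketch: the second component of Phi = Q_Z o D_X has an indefinite Hessian.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
print(sp.hessian(2 * x1**2, (x1, x2)))           # diag(4, 0): PSD
print(sp.hessian(-3 * x1**2 + x2**2, (x1, x2)))  # diag(-6, 2): not PSD
```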

However, the transitive property holds for some particular transformations, as we can see in the following proposition. The proof is a direct consequence of the composition of convex functions (see Marshall et al. 2011, Proposition B.7) and is omitted.

Proposition 4

Let \(\mathbf {X}\), \(\mathbf {Y}\) and \(\mathbf {Z}\) be three \(n\)-dimensional random vectors such that \(\mathbf {X} \le _{mct} \mathbf {Y} \le _{mct} \mathbf {Z},\) and let \({\mathbf \Phi }^{(1)} \equiv \mathbf {Q_Y} \circ \mathbf {D_X}\) and \({\mathbf \Phi }^{(2)} \equiv \mathbf {Q_Z} \circ \mathbf {D_Y}\) as described in (6). If \({\mathbf \Phi }^{(2)}(\mathbf {y})\) is increasing for all \(\mathbf {y} \in \mathbb {R}^n\), then \(\mathbf {X} \le _{mct} \mathbf {Z}\).

3 On relative skewness for random vectors with the same copula

In this section, we discuss the case of random vectors with the same copula. A copula \(C\) is a cumulative distribution function with uniform margins on \([0, 1]\). The notion of copula was introduced by Sklar (1959). The main purpose of copulas is to describe the interrelation of several random variables. Given a random vector \(\mathbf {X}\) with margins \(F_1,\ldots ,F_n\), there exists a copula \(C\) such that

$$\begin{aligned} F(x_1,\ldots ,x_n)=C(F_1(x_1),\ldots ,F_n(x_n)). \end{aligned}$$

Moreover, if \(F_1,\ldots ,F_n\) are continuous, then \(C\) is unique (see Nelsen 1999, for a complete review of copulas). Conversely, any copula evaluated at a set of margins, as in the expression above, leads to a multivariate distribution function. Next, we show that, for random vectors with the same copula, the mct order is equivalent to the comparison of the marginal distributions in the univariate convex transform order.

Theorem 1

Let \(\mathbf {X}=(X_1,\ldots ,X_n)\) and \(\mathbf {Y}=(Y_1,\ldots ,Y_n)\) be two random vectors sharing a common copula. Then, \(\mathbf {X} \le _{mct}\mathbf {Y}\) if, and only if, \(X_i\le _{c}Y_i\) for all \(i = 1,\ldots , n\).

Proof

Arias-Nicolás et al. (2005) showed that, for two random vectors with the same copula, the function \(\mathbf \Phi \), defined in (6), which maps \(\mathbf {X}\) onto \(\mathbf {Y}\) satisfies

$$\begin{aligned} \mathbf \Phi _i( x_1,\ldots , x_i)=G_{Y_i}^{-1}(F_{X_i}(x_i)), \end{aligned}$$
(15)

for all \(i=1,\ldots ,n\). The result follows directly from expression (15) and recalling the definition of the univariate convex order. \(\square \)

As an immediate consequence of Theorem 1, for two random vectors having independent components (i.e., the independence copula), the mct order reduces to the univariate convex transform order between the marginal distributions.

From Proposition 4 and Theorem 1, we obtain the following corollary.

Corollary 1

Let \(\mathbf {X}=(X_1,\ldots ,X_n)\) and \(\mathbf {Y}=(Y_1,\ldots ,Y_n)\) be two random vectors such that \(\mathbf {X} \le _{mct} \mathbf {Y}\). Then, \(\mathbf {X} \le _{mct} \mathbf {W}\) for any random vector \(\mathbf {W}=(W_1,\ldots , W_n)\) having the same copula as \(\mathbf {Y}\) and such that \(Y_i\le _{c}W_i\), for all \(i = 1,\ldots , n\).

Proof

Using Theorem 1, \(\mathbf {Y}\le _{mct} \mathbf {W}\) holds and the transformation \({\mathbf \Phi }^{(2)} \equiv \mathbf {Q_{W}} \circ \mathbf {D_{Y}}\), defined in (6), only depends on the marginal distributions, i.e., it can be expressed as

$$\begin{aligned} \mathbf \Phi ^{(2)}_i(x_1,\ldots ,x_n)=G^{-1}_{{W}_i}(F_{Y_i}(x_i)), \end{aligned}$$

where it is apparent that \({\mathbf \Phi }^{(2)}\) is increasing. The proof concludes by using Proposition 4. \(\square \)

Theorem 1 can be used to provide many examples of random vectors ordered in the mct order: fixing a copula, multivariate random vectors can be ordered in the mct order via the univariate comparison of their marginal distributions. We would like to mention that the use of copulas to provide a flexible skewing mechanism is already pointed out in Ley and Paindaveine (2010). Due to the well-known fact that the copula is preserved under strictly increasing transformations of the marginal distributions, given a random vector \({\mathbf X}=(X_1,\ldots ,X_n)\) with a symmetric distribution and copula \(C\), any multivariate increasing convex transformation of the type \( {\mathbf \Phi }(x_1,\ldots ,x_n)=(\varPhi _1 (x_1),\ldots , \varPhi _n (x_n) )\) will provide a skewed version of \({\mathbf X}\), say \({ \mathbf Y}= {\mathbf \Phi }({ \mathbf X})\), in the mct-order sense, having the same copula \(C\).

Next, we describe some situations where Theorem 1 can be applied.

3.1 Multivariate normal and lognormal distributions

Given a random vector \(\mathbf {X}=(X_1,\ldots ,X_n)\) with a multivariate normal distribution (see Example 2), the random vector \(\mathbf Y=(Y_1,\ldots ,Y_n)\), where \(Y_i=\exp X_i\) for \(i=1,\ldots ,n\), follows a multivariate lognormal distribution (see Aitchison and Brown 1957). Clearly, \(\mathbf X\) and \(\mathbf Y\) share the same copula and the function that maps \(X_i\) onto \(Y_i\) is convex, for all \(i=1,\ldots ,n\). Therefore, using Theorem 1, we obtain that \(\mathbf X\le _{mct} \mathbf Y\). Hence, the multivariate lognormal distribution is a right-skewed transformation of a multivariate normal distribution. We also observe that, given any other multivariate normal distribution \(\mathbf {X}^{\prime }\), from Example 2, \(\mathbf X^{\prime } =_{mct} \mathbf X\) holds. Then, using Proposition 4, \(\mathbf X^{\prime } \le _{mct} \mathbf Y\) also holds.

3.2 Multivariate distributions with IFR [DFR] margins

The next corollary provides a situation where we can apply previous ideas.

Corollary 2

Let \(\mathbf {X}=(X_1,\ldots ,X_n)\) be a random vector having a copula \(C\) such that all marginal distributions, \(X_i\), \(i=1,\ldots , n\), satisfy the IFR \([\)DFR\(]\) aging property. If we consider a random vector \(\mathbf {Y}=(Y_1,\ldots , Y_n)\) with the same copula \(C\) but having shifted exponential marginal distributions, that is, \(Y_i \sim a_iE+b_i\), where \(a_i>0\), \(b_i\in \mathbb {R}\) and \(E\sim \exp (1)\), then \(\mathbf {X} \le _{mct} [\ge _{mct}] \mathbf {Y}\).

Proof

The proof follows easily from Theorem 1 and the univariate characterization of the IFR [DFR] aging property given in (1).\(\square \)

Taking into account the previous results, we can construct a wide variety of examples of multivariate distributions ordered in the mct order. Let us see some of them.

Example 4

Let \(\mathbf {X}\) and \(\mathbf {Y}\) be two random vectors with the same Gumbel copula, given by

$$\begin{aligned} C(u_1,u_2)=\exp \left\{ -\left[ (-\log u_1)^{1/\theta }+(-\log u_2)^{1/ \theta }\right] ^\theta \right\} \quad \text{ for } \,u_1,u_2\in (0,1). \end{aligned}$$

Let \(\mathbf {X}\) be a random vector with Weibull distributed margins with scale parameter equal to one, i.e., with distribution functions given by

$$\begin{aligned} F_i (x)=1-\exp \{-x^{\beta _i}\}\quad \text{ for } \,i=1,2. \end{aligned}$$

This bivariate distribution \(\mathbf X\) can be found in Lu and Bhattacharyya (1990).

Let \(\mathbf {Y}\) be a random vector with exponentially distributed margins, i.e.,

$$\begin{aligned} G_i (x)=1-\exp \{-\alpha _i x\}\quad \text{ for } \,i=1,2. \end{aligned}$$

Then, from Corollary 2, we have that \(\mathbf {X}\le _{mct} [\ge _{mct}]\; \mathbf {Y}\) if \(\beta _i>1\) \([\beta _i<1]\) for \(i=1,2\), respectively.

In fact, in Fig. 1, we plot the joint density functions of the bivariate Weibull and Gumbel distributions for \(\theta =0.3\), \(\beta _1=\beta _2=3\) and \(\alpha _1=\alpha _2=1\). Clearly, the bivariate Gumbel distribution is more skewed than the bivariate Weibull distribution.

Fig. 1 Joint density functions of the a, c Weibull and b, d Gumbel bivariate distributions

Example 5

Let us consider a random vector \(\mathbf {X}\) with exponentially distributed margins and a Clayton copula, which is given by

$$\begin{aligned} C(u_1,u_2)=\left[ u_1^{-1/\theta }+u_2^{-1/\theta }-1\right] ^{-\theta }\quad \text{ for } \,u_1,u_2\in (0,1), \end{aligned}$$

where \(\theta >0\). Let \(\mathbf {Y}\) be a bivariate Pareto distribution as defined in Lindley and Singpurwalla (1986); \(\mathbf {Y}\) has a Clayton copula and Pareto distributed margins (see Balakrishnan and Lai 2009). It is known that the Pareto distribution is DFR. Hence, again from Corollary 2, we have \(\mathbf {X}\le _{mct}\mathbf {Y}\).
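A simulation sketch of Example 5 follows (ours; we sample the Clayton copula through its gamma-frailty representation with standard parameter \(\alpha =1/\theta \), and the Pareto shape is fixed at \(5\) only so that the skewness coefficient is finite).

```python
# Sketch of Example 5: Clayton copula (gamma-frailty construction) with
# exponential versus Pareto margins.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)
theta = 0.3
alpha = 1 / theta
n = 100_000

v = rng.gamma(shape=1 / alpha, scale=1.0, size=n)  # frailty ~ Gamma(1/alpha)
e = rng.exponential(size=(2, n))
u = (1 + e / v) ** (-1 / alpha)                    # (U1, U2) ~ Clayton copula

x = -np.log1p(-u)                  # exp(1) margins (the IFR boundary case)
y = (1 - u) ** (-1 / 5) - 1        # Pareto margins, shape 5 (DFR)

print(skew(x, axis=1))             # approx 2 for each margin
print(skew(y, axis=1))             # clearly larger: Y is more right-skewed
```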

In Fig. 2, we can see the plots of the joint density functions of a distribution with a Clayton copula and exponential margins and a Pareto bivariate distribution for \(\theta =0.3\) in both cases.

Fig. 2 Joint density functions of the a, c Clayton copula with exponential margins and b, d Pareto bivariate distributions

3.3 Relative skewness for ordered data

The model of a random vector with ordered components arises in a natural way when we arrange a set of observations from a random variable in increasing order. Another example is the case of epoch times of a counting process, such as a nonhomogeneous Poisson process. Epoch times of nonhomogeneous Poisson processes can be introduced as record values of a proper sequence of random variables, which is another typical example of ordered data. Finally, it is worth mentioning that order statistics are also an interesting way of skewing symmetric distributions (see, e.g., Jones 2004). Given the similarity of several results for order statistics and record values, Kamps (1995a) introduced the model of generalized order statistics (GOSs). This model provides a unified approach to the study of order statistics, record values and several other models of ordered data. First, we recall the definition of generalized order statistics following Kamps (1995a, b).

Definition 2

Let \(n\in \mathbb {N}\), \(k\ge 1\), \(m_{1},\ldots ,m_{n-1}\in \mathbb {R}\) and \(M_{r}=\sum _{j=r}^{n-1}m_{j}\), \(1\le r\le n-1\), be parameters such that \(\gamma _{r}=k+n-r+M_{r}\ge 1\) for all \(r\in \{1,\ldots ,n-1\}\), and let \(\tilde{m}=(m_{1},\ldots ,m_{n-1})\) if \(n\ge 2\) \((\tilde{m}\in \mathbb {R}\) arbitrary if \(n=1)\). We call uniform generalized order statistics the random vector \((U_{(1,n,\tilde{m},k)},\ldots ,U_{(n,n,\tilde{m},k)})\) with joint density function

$$\begin{aligned} h(u_{1},\ldots ,u_{n})=k\left( \prod _{j=1}^{n-1}\gamma _{j}\right) \left( \prod _{j=1}^{n-1}(1-u_{j})^{m_{j}}\right) (1-u_{n})^{k-1}, \end{aligned}$$

on the cone \(0\le u_{1}\le \cdots \le u_{n}\le 1\). Now, given a distribution function \(F\), we call generalized order statistics based on \(F\) the random vector

$$\begin{aligned} (X_{(1,n,\tilde{m},k)},\ldots ,X_{(n,n,\tilde{m},k)})\equiv \left( F^{-1}(U_{(1,n,\tilde{m},k)}),\ldots ,F^{-1}(U_{(n,n,\tilde{m},k)})\right) . \end{aligned}$$

If \(F\) is an absolutely continuous distribution with density \(f\), the joint density function of \((X_{(1,n,\tilde{m},k)},\ldots ,X_{(n,n,\tilde{m},k)})\) is given by

$$\begin{aligned} f(x_{1},\ldots ,x_{n})=k\left( \prod _{j=1}^{n-1}\gamma _{j}\right) \left( \prod _{j=1}^{n-1}\bar{F}(x_{j})^{m_{j}}f(x_{j})\right) \bar{F}(x_{n})^{k-1}f(x_{n}), \end{aligned}$$

on the cone \(F^{-1}(0)\le x_{1}\le \cdots \le x_{n}\le F^{-1}(1)\).

Let us now describe several other models that are included in this framework. As we have mentioned previously, order statistics and record values are particular cases of this model (see Belzunce 2013, for a detailed review).

Taking \(m_{i}=0\) for all \(i=1,\ldots ,n-1\) and \(k=1\), we get the random vector of order statistics \((X_{1:n},X_{2:n},\ldots ,X_{n:n})\) from a set of \(n\) independent and identically distributed (i.i.d.) observations \(X_1,X_2,\ldots ,X_n\) with common absolutely continuous distribution \(F\); in particular, we get that \(X_{i:n}=_{st}X_{(i,n,0,1)}\).

Taking \(m_{i}=-1\) for all \(i=1,\ldots ,n-1\) and \(k=1\), we get the random vector of the first \(n\) record values (see Chandler 1952).

Some additional particular cases of GOSs are the following. Taking \(m_{i}=-1\) for all \(i=1,\ldots ,n-1\) and \(k\in \mathbb N\), we get \(k\)-records. Taking \(n=m\), \(m_i=R_i\) and \(k=R_m+1\), we get order statistics from Type-II censored data. Another particular case is that of order statistics under multivariate imperfect repair (see Shaked and Shanthikumar 1986).

Next, we show a property dealing with the comparison of relative skewness for generalized order statistics. Note that the dependence of the mct order on the ordering of the marginal distributions appears in a natural way for random vectors with ordered components.

Theorem 2

Let \(\mathbf {X}\) and \(\mathbf {Y}\) be two random vectors of generalized order statistics based on distributions \(F\) and \(G\) from random variables \(X\) and \(Y\), respectively. Then, \(\mathbf {X}\le _{mct}\mathbf {Y}\) if, and only if, \(X\le _{c}Y\).

Proof

Since the two vectors of generalized order statistics have the same copula (see Belzunce et al. 2008), from Theorem 1 the result follows once we show that \(X_{(r,n,\tilde{m},k)}\le _{c}Y_{(r,n,\tilde{m},k)}\) for all \(r=1,\ldots ,n\) if, and only if, \(X\le _{c} Y\). This follows by observing that

$$\begin{aligned} G_{Y_{(r,n,\widetilde{m},k)}}^{-1}(F_{X_{(r,n,\widetilde{m} ,k)}}(x))=G^{-1}(F(x)), \end{aligned}$$

where \(G_{Y_{(r,n,\widetilde{m},k)}}\) and \(F_{X_{(r,n,\widetilde{m},k)}}\) denote the distribution functions of \(Y_{(r,n,\tilde{m},k)}\) and \(X_{(r,n,\tilde{m},k)}\), respectively (see also Cramer and Kamps 2003). \(\square \)
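For completeness, the displayed identity can be traced in one line (our sketch of the argument behind the cited references): since \(X_{(r,n,\tilde{m},k)}=_{st}F^{-1}(U_{(r,n,\tilde{m},k)})\) and \(Y_{(r,n,\tilde{m},k)}=_{st}G^{-1}(U_{(r,n,\tilde{m},k)})\), the continuity of \(F\) and \(G\) yields

$$\begin{aligned} F_{X_{(r,n,\widetilde{m},k)}}(x)=F_{U_{(r,n,\widetilde{m},k)}}(F(x)) \quad \text{ and }\quad G_{Y_{(r,n,\widetilde{m},k)}}^{-1}(p)=G^{-1}\left( F_{U_{(r,n,\widetilde{m},k)}}^{-1}(p)\right) , \end{aligned}$$

and composing the two expressions gives \(G^{-1}(F(x))\).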

In Fig. 3, we can see the plots of the joint density functions of three random vectors of order statistics, \(\mathbf X, \mathbf Y\) and \(\mathbf Z\), based on the Weibull distribution with parameters \(\lambda =1\) and \(\beta =2,1,0.5\), respectively. As previously mentioned, the aging of the Weibull distribution depends on the value of \(\beta \): if \(\beta >[<]\,1\), then the Weibull distribution is IFR [DFR] and it is smaller [bigger] in the convex transform order than the exponential distribution, which is obtained with \(\beta =1\). From the characterization (1) and Theorem 2, we have \(\mathbf X \le _{mct} \mathbf Y \le _{mct} \mathbf Z\).

Fig. 3 Joint density functions of bivariate random vectors of order statistics based on the Weibull distribution for \(\beta =2,1,0.5\), respectively

3.4 Relative skewness with real data

In environmental sciences, there are situations where we may be interested in the analysis of the skewness or concentration of our data. In this context, the study of the skewness or concentration of hydrological data, such as droughts or precipitations, has attracted attention for economic and welfare reasons. Droughts can be described through some fundamental characteristics, such as duration, denoted by \(D\), and severity, denoted by \(S\), which are usually fitted by exponential and gamma distributions (see Shiau 2006). Moreover, the Clayton copula, introduced in Example 5, is widely used to model the relationship between drought severity and drought duration. In Shiau and Modarres (2009), a copula-based drought severity-duration study is given; the authors analyzed rainfall data for the period 1954–2003 from two gauge stations in Iran, Abadan and Anazli. Let us consider two random vectors \(\mathbf {X}=(S_{1},D_{1})\) and \(\mathbf {Y}=(S_{2},D_{2})\) with the severity and duration of Abadan and Anazli, respectively. Shiau and Modarres (2009) showed that both bivariate distributions have a Clayton copula with estimated parameters \(\widehat{\theta }_{\mathbf {X}}=1.527\) and \(\widehat{\theta }_{\mathbf {Y}}=1.497\), respectively. We provide in Table 1 the marginal distributions.

Table 1 Parameters of the margins of \(\mathbf {X}\) and \(\mathbf {Y}\)

Taking into account the estimation error of the dependence parameter, \(\theta \), we can assume that \(\mathbf {X}\) and \(\mathbf {Y}\) share the same copula. On the other hand, van Zwet (1964) also studied the convex transform order between gamma distributions: if \(X\) and \(Y\) are univariate gamma distributions with shape parameters \(\alpha _1\) and \(\alpha _2\), respectively, such that \(\alpha _1\ge \alpha _2\), then \(X\le _{c} Y\). In our case, the estimated shape parameter of \(S_1\) (\(\widehat{\alpha }_1=0.737\)) is smaller than that of \(S_2\) (\(\widehat{\alpha }_2=0.917\)); therefore, there is reasonable empirical evidence to affirm that \(S_1\ge _c S_2\). Moreover, it is apparent that all exponential distributions are equal in the univariate convex transform order, so that \(D_1=_{c}D_2\). Hence, using jointly these results and Theorem 1, \(\mathbf {X}\ge _{mct}\mathbf {Y}\) holds and we expect \(\mathbf {X}\) to be more right-skewed than \(\mathbf {Y}\).
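The gamma comparison invoked above can be verified numerically; a minimal sketch (ours, using the estimated shape parameters from Table 1) follows.

```python
# Sketch: Phi = F_{S1}^{-1} o F_{S2} for gamma shapes 0.917 -> 0.737 is
# convex, supporting S2 <=_c S1 (i.e., S1 >=_c S2).
import numpy as np
from scipy.stats import gamma

a1, a2 = 0.737, 0.917                     # estimated shapes of S1 and S2
x = np.linspace(0.01, 10, 1000)
phi = gamma.ppf(gamma.cdf(x, a2), a1)     # maps S2 onto S1
print(np.all(np.diff(phi, 2) >= -1e-6))   # True: convex
```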

This type of comparison implies, from Proposition 1, that both the margins and the conditioned random variables are ordered in the convex transform order. It is well known that the convex transform order preserves the comparison of the main skewness coefficients, such as Fisher’s and Bowley’s coefficients (see MacGillivray 1986), and of concentration measures such as the coefficient of variation and the Gini index (see Shaked and Shanthikumar 2007). Therefore, the data from Abadan are relatively more dispersed than those from Anazli and, therefore, more unpredictable.

4 Conclusions

The study of skewness is of interest in many areas of research, such as risk theory, finance and hydrology. Skewness can be studied through single summary measures, but stochastic orders and, in particular, when we deal with random variables, the univariate convex transform order proposed by van Zwet (1964), provide a more complete comparison of the skewness of two random variables. Unfortunately, real phenomena usually involve more than one random variable, and their analysis is more complicated.

In this work, we provide a new multivariate stochastic order, the so-called mct order, as a tool based on the standard construction for comparing the skewness of two random vectors. This transformation was first proposed by Rosenblatt (1952), and rediscovered periodically, and captures the essence of how a random vector is mapped onto another. The convexity of the standard construction is the key to the mct order, which generalizes the univariate convex transform order given by van Zwet (1964).

On the other hand, when comparing two random vectors in the mct order, we also show that it is important to take into account their dependence structure: when they share the same copula, the comparison reduces to comparing the margins in the univariate convex transform order.

Finally, we illustrate the results with applications to some probabilistic models, such as normal and lognormal distributions and generalized order statistics, as well as with an example dealing with real data from environmental sciences.