1 Introduction

The multivariate extended skew-normal distribution (ESN, hereafter) was introduced by [12] and later independently rediscovered by [4, 7, 10]. Its pdf is

$$f(z;\xi ,\Omega ,\eta ,\tau ) = \frac{\phi_p(z;\xi ,\Omega )}{\Phi (\tau )} \cdot \Phi \left\{ \tau \sqrt{1 + \eta^T \Omega \eta} + \eta^T (z - \xi ) \right\},$$
(1)

where z, ξ, η ∈ \(\Re^d\), τ ∈ ℜ, Ω ∈ \(\Re^d \times \Re^d\), Ω > 0, Φ(·) is the cdf of a standard normal distribution and \(\phi_p(z; \xi, \Omega)\) is the pdf of a d-dimensional normal distribution with mean ξ and variance Ω. The distribution of z is said to be ESN with location parameter ξ, scale parameter Ω, nonnormality parameter η and truncation parameter τ, that is z ∼ ESN(ξ, Ω, η, τ). This parameterization, with minor modifications, was used in [13] for inferential purposes and this paper gives further motivations for its use. Other parameterizations appear in [1, 2, 7, 10, 19]. The multivariate normal and multivariate skew-normal [11] distributions are ESN with \(\eta = 0_d\) and τ = 0, respectively. [8, 9] review theoretical properties of the ESN.
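As an illustration of the density above, the following minimal Python sketch evaluates its univariate special case (d = 1, Ω = ω²) and checks numerically that it integrates to one. It is not taken from the paper: the function names, the trapezoidal integrator and the parameter values are illustrative assumptions, and the argument of Φ is taken to be τ√(1 + η²ω²) + η(z − ξ), which is the form that normalizes the density.

```python
import math


def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Standard normal cdf; erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def esn_pdf(z, xi=0.0, omega=1.0, eta=0.0, tau=0.0):
    """Univariate ESN density under the assumed parameterization."""
    u = (z - xi) / omega
    arg = tau * math.sqrt(1.0 + (eta * omega) ** 2) + eta * (z - xi)
    return std_normal_pdf(u) / omega * std_normal_cdf(arg) / std_normal_cdf(tau)


def integrate(f, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# Illustrative parameter values (hypothetical, chosen only for the check).
total_mass = integrate(lambda z: esn_pdf(z, 1.0, 2.0, 0.8, -0.5), -30.0, 30.0)
```

With η = 0 the function reduces to the normal density whatever the value of τ, which mirrors the remark that τ indexes the distribution only when it is not normal.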

The ESN appears in several areas of statistical theory: Bayesian statistics [27], regression analysis [14] and graphical models [13]. It also appears in several areas of applied statistics: environmetrics, medical statistics, econometrics and finance. Environmental applications of the ESN include modelling data from monitoring stations aimed at finding large values of pollutants [19] and uncertainty analysis related to the economics of climate change control [30]. In medical statistics, the ESN has been used as a predictive distribution for cardiopulmonary functionality [15] and for visual acuity [20]. Financial applications mainly deal with portfolio selection [2, 4] and the market model, which relates asset returns to the return on the market portfolio [1]. In econometrics, the ESN is known for its connection with bias modelling in Heckman's model [16] and with stochastic frontier analysis [22].

Unfortunately, the information matrix of the ESN is singular when \(\eta = 0_d\), i.e. when it is a normal distribution. This prevents straightforward application of standard likelihood-based methods to test the null hypothesis of normality. The problem is well known for the skew-normal case and has been successfully dealt with via the centred parametrization, which could be useful in the ESN case, too, but satisfactory theoretical results for this distribution appear to be difficult to obtain [6]. Problems with the information matrix of the ESN are made worse by the truncation parameter τ, which indexes the distribution only when it is not normal. As a direct consequence of the above arguments, the rank of the information matrix is at least two less than full, which prevents application of the results in [28].
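The rank deficiency can be made explicit in the univariate case. The following is a sketch rather than part of the original argument: it assumes the density \(f(z) = \phi(z - \xi)\,\Phi\{\tau\sqrt{1 + \eta^2} + \eta(z - \xi)\}/\Phi(\tau)\) with unit scale, writes \(m(\tau) = \phi(\tau)/\Phi(\tau)\), and evaluates the scores at ξ = 0 and η = 0:

$$\left.\frac{\partial \log f}{\partial \xi}\right|_{\eta = 0} = z,\qquad \left.\frac{\partial \log f}{\partial \eta}\right|_{\eta = 0} = m(\tau)\,z,\qquad \left.\frac{\partial \log f}{\partial \tau}\right|_{\eta = 0} = m(\tau) - m(\tau) = 0.$$

Under normality the score for η is proportional to the score for ξ and the score for τ vanishes identically, which gives two linear dependencies among the scores.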

An alternative approach could be based on the sample skewness, since it provides the locally most powerful test for normality among the location and scale invariant ones, when the underlying distribution is assumed to be univariate skew-normal [29]. [23] proposed a multivariate generalization based on data projections onto directions maximizing skewness. The test has been criticized for involving prohibitive computational work [5] and because calculation of the corresponding population values seemed impossible [25]. Both problems vanish in the skew-normal (SN) case [21]. In the first place, the direction maximizing skewness has a simple parametric interpretation, and hence can be estimated either by maximum likelihood or by the method of moments. In the second place, the population values of the skewness indices proposed by [23, 24] coincide and have a simple analytical form.

The paper generalizes the results in [29] and [21] to univariate and multivariate ESN distributions, respectively. It is structured as follows. Section 2 shows that sample skewness provides the locally most powerful test for normality among the location and scale invariant ones, under the assumption of extended skew-normality. Section 3 shows that the nonnormality parameter η identifies the linear function of an ESN random vector with maximal skewness, and discusses the related inferential implications. Sections 4 and 5 contain a numerical example and some concluding remarks, respectively.

2 The Univariate Case

Let \(X_1, \ldots, X_n\) be a random sample from a univariate extended skew-normal distribution with nonnegative skewness:

$$f(z;\xi ,\omega ,\eta ,\tau ) = \frac{1}{\omega }\phi \left( \frac{z - \xi }{\omega } \right) \cdot \frac{\Phi \left\{ \tau \sqrt{1 + \eta^2 \omega^2} + \eta (z - \xi ) \right\}}{\Phi (\tau )},$$
(2)

where z, ξ, τ ∈ ℜ, ω > 0, η ≥ 0 and ϕ denotes the pdf of a standard normal distribution. Moreover, let \(\bar X\), \(S^2\) and \(G_1\) be the sample mean, the sample variance and the sample skewness, respectively:

$$\bar X = \frac{1}{n}\sum_{i = 1}^n X_i,\quad S^2 = \frac{1}{n - 1}\sum_{i = 1}^n (X_i - \bar X)^2,\quad G_1 = \frac{1}{n}\sum_{i = 1}^n \left( \frac{X_i - \bar X}{S} \right)^3.$$
(3)

Interest lies in the most powerful test of given size for \(H_0: \eta = 0\) against \(H_1: 0 < \eta < \varepsilon\), where ε is a small enough positive constant, based on statistics which do not depend on location and scale changes, that is, functions of

$$\frac{X_1 - \bar X}{S},\ldots,\frac{X_n - \bar X}{S}.$$
(4)

The following theorem shows that such tests are characterized by rejection regions of the form \(R = \{(X_1, \ldots, X_n) : G_1 > c\}\), where c is a suitably chosen constant.
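A rejection region of this form can be sketched in a few lines of Python. The sample skewness follows the definitions of the sample mean, variance and skewness given above; the way the threshold c is obtained here, by Monte Carlo simulation under normality, is an assumption made for illustration and is not prescribed by the paper, as are the function names and default settings.

```python
import math
import random


def sample_skewness(xs):
    """G_1: mean of cubed standardized deviations, with S computed using n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return sum(((x - mean) / s) ** 3 for x in xs) / n


def critical_value(n, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo estimate of the (1 - alpha) quantile of G_1 under normality."""
    rng = random.Random(seed)
    sims = sorted(
        sample_skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    return sims[int(math.ceil((1.0 - alpha) * reps)) - 1]


def reject_normality(xs, alpha=0.05, reps=2000, seed=0):
    """One-sided test: reject when the sample skewness exceeds the threshold."""
    return sample_skewness(xs) > critical_value(len(xs), alpha, reps, seed)
```

Because the statistic is location and scale invariant, simulating from the standard normal is enough to approximate its null distribution for any normal population.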

Theorem 1 Let \(X_1, \ldots, X_n\) be a random sample from a univariate extended skew-normal distribution with nonnegative skewness. Then the locally most powerful location and scale invariant test for normality rejects the null hypothesis when the sample skewness exceeds a given threshold value.

Proof Without loss of generality it can be assumed that the location and scale parameters of the sampled distribution are zero and one, respectively. Let \(\xi_i(x)\) and m(x) denote the i-th derivative of log Φ(x) and the inverse Mills ratio, respectively:

$$\xi_i(x) = \frac{\partial^i \log \Phi (x)}{\partial x^i};\quad m(x) = \frac{\phi (x)}{\Phi (x)}.$$
(5)

Let U also denote a standard normal random variable truncated from below at −τ: U = Y | Y > −τ, where Y is standard normal. Straightforward calculus techniques lead to the following equations:

$$\mu = E(U) = \xi_1(\tau ),\quad E\left\{ (U - \mu )^2 \right\} = 1 + \xi_2(\tau ),\quad E\left\{ (U - \mu )^3 \right\} = \xi_3(\tau ).$$
(6)

We first prove that \(\xi_3(x)\) is a strictly positive function. The function \(\xi_1(x) = m(x)\) is strictly decreasing, since it is well known that its first derivative \(\xi_2(x) = -m(x)\{x + m(x)\}\) is strictly negative. Equivalently, −m(x) is strictly increasing. The function x + m(x) is strictly increasing, too, since its first derivative 1 − m(x){x + m(x)} is the variance of U when τ = x. The product of two increasing functions is also increasing, so that the function \(\xi_2(x) = -m(x)\{x + m(x)\}\) is strictly increasing. Equivalently, its first derivative \(\xi_3(x)\) is a strictly positive function, and this completes the first part of the proof.
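The positivity claim can be checked numerically. The sketch below is not part of the proof: it uses the closed form obtained by differentiating −m(x){x + m(x)} with m′(x) = −m(x){x + m(x)}, namely m(x)[{x + m(x)}{x + 2m(x)} − 1], and compares it with central moments of the truncated normal computed by a simple trapezoidal rule. Function names, the grid and the integration settings are illustrative assumptions.

```python
import math


def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf (erfc keeps precision in the lower tail)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def m(x):
    """Inverse Mills ratio phi(x) / Phi(x), equal to the first derivative of log Phi."""
    return phi(x) / Phi(x)


def xi2(x):
    """Second derivative of log Phi."""
    mx = m(x)
    return -mx * (x + mx)


def xi3(x):
    """Third derivative of log Phi, from differentiating xi2."""
    mx = m(x)
    return mx * ((x + mx) * (x + 2.0 * mx) - 1.0)


def truncated_central_moment(tau, k, steps=20000, upper=12.0):
    """k-th central moment of U = Y | Y > -tau via the trapezoidal rule."""
    lower = -tau
    h = (upper - lower) / steps
    mu = m(tau)
    total = 0.0
    for i in range(steps + 1):
        u = lower + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (u - mu) ** k * phi(u)
    return total * h / Phi(tau)


# Smallest value of the third derivative on a grid over [-5, 5].
min_xi3 = min(xi3(x / 10.0) for x in range(-50, 51))
```

On this grid the third derivative stays positive, and the numerical second and third central moments of U agree with 1 − m(τ){τ + m(τ)} and with the closed form for the third derivative, respectively.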

In order to complete the proof, recall the following representation theorem:

$$X = \frac{Z}{{\sqrt {1 + {\eta ^2}} }} + \frac{{\eta U}}{{\sqrt {1 + {\eta ^2}} }} \sim ES{N_1}(0,1,\eta ,\tau ),$$
(7)

where Z is standard normal and independent of U [17] or, equivalently [27],

$$X|U = u \sim N\left( {\frac{{\eta u}}{{\sqrt {1 + {\eta ^2}} }},\frac{1}{{1 + {\eta ^2}}}} \right).$$
(8)

Hence \(X \sim ESN_1(0, 1, \eta, \tau)\) can be represented as a location mixture of normal distributions, with a truncated normal as mixing distribution. The third cumulant of a standard normal distribution truncated from below at −τ is always positive, being equal to \(\xi_3(\tau)\). [31] show that these are sufficient conditions for the sample skewness to give the locally most powerful location and scale invariant test for normality against one-sided alternatives.
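The stochastic representation used in the proof also gives a direct way to simulate ESN variates. The following sketch draws X = (Z + ηU)/√(1 + η²) with U obtained by naive rejection sampling, and compares the sample mean with η m(τ)/√(1 + η²), where m(τ) = ϕ(τ)/Φ(τ) is the mean of U. The rejection step, the seed and the parameter values are illustrative assumptions; rejection becomes slow when τ is very negative.

```python
import math
import random


def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def draw_truncated_normal(tau, rng):
    """Draw U = Y | Y > -tau by simple rejection from a standard normal."""
    while True:
        y = rng.gauss(0.0, 1.0)
        if y > -tau:
            return y


def draw_esn(eta, tau, rng):
    """Draw X ~ ESN_1(0, 1, eta, tau) through the additive representation."""
    z = rng.gauss(0.0, 1.0)
    u = draw_truncated_normal(tau, rng)
    return (z + eta * u) / math.sqrt(1.0 + eta * eta)


rng = random.Random(12345)
eta, tau = 2.0, 0.5  # hypothetical values
sample = [draw_esn(eta, tau, rng) for _ in range(50000)]
sample_mean = sum(sample) / len(sample)
theoretical_mean = eta / math.sqrt(1.0 + eta * eta) * phi(tau) / Phi(tau)
```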

The above result generalizes the one in [29], who proved local optimality of the test only for τ = 0, i.e. for the skew-normal distribution. Surprisingly enough, the optimality property of the test statistic is unaffected by the parameter τ, even though its sampling distribution under the alternative hypothesis depends on it.

3 The Multivariate Case

We shall now consider the problem of testing multivariate normality when the sampled distribution is assumed to be ESN. One way to do so is to evaluate the skewness of all linear combinations of the variables and to reject the normality hypothesis if at least one of them is too large in absolute value. This argument, based on the union-intersection approach, inspired [23] to introduce the test statistic

$$\mathop {\max }\limits_{c \in \Re _0^d} \left\{ \frac{1}{n}\sum_{i = 1}^n \left( \frac{c^T x_i - c^T \bar x}{\sqrt{c^T S c}} \right)^3 \right\}^2$$
(9)

where \(\Re_0^d\) is the set of all real, nonnull, d-dimensional vectors, while x̄ and S denote the sample mean and the sample variance of the data matrix X, whose rows are the vectors \(x_1^T, \ldots, x_n^T\). Closure properties of the ESN distribution under linear transformations and optimality properties of sample skewness for the univariate ESN encourage the use of the above statistic for testing multivariate normality within the ESN class.
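For bivariate data the maximization can be approximated by brute force, which helps to see what the statistic measures. The sketch below is a crude grid search over directions c = (cos θ, sin θ) and is not the algorithm of [23]; the function names, the grid size and the toy data are illustrative assumptions. The toy data form a full factorial design, so the two coordinates are exactly uncorrelated, with only the first one skewed.

```python
import math


def projected_sq_skewness(data, c):
    """Squared sample skewness of the projection of bivariate data onto c."""
    proj = [c[0] * x + c[1] * y for x, y in data]
    n = len(proj)
    mean = sum(proj) / n
    sd = math.sqrt(sum((p - mean) ** 2 for p in proj) / (n - 1))
    g1 = sum(((p - mean) / sd) ** 3 for p in proj) / n
    return g1 * g1


def max_skewness_direction(data, n_angles=360):
    """Grid search over half a circle (c and -c give the same squared skewness)."""
    best_value, best_c = -1.0, None
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        c = (math.cos(theta), math.sin(theta))
        value = projected_sq_skewness(data, c)
        if value > best_value:
            best_value, best_c = value, c
    return best_value, best_c


# Toy data: skewed first coordinate, symmetric second coordinate.
skewed = [-math.log(1.0 - (i + 0.5) / 20.0) for i in range(20)]
symmetric = [(j - 9.5) / 5.0 for j in range(20)]
toy_data = [(x, y) for x in skewed for y in symmetric]
best_value, best_c = max_skewness_direction(toy_data)
```

In higher dimensions a grid search is impractical, which is the computational difficulty that the likelihood-based approach discussed below is meant to avoid.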

Its population analogue for a random vector x with expectation μ, nonsingular variance Σ, and finite third-order moments is:

$$\mathop {\max }\limits_{c \in \Re _0^d} \frac{E^2\left[ \left\{ c^T (x - \mu ) \right\}^3 \right]}{(c^T \Sigma c)^3}.$$
(10)

[25] argued that difficulties in evaluating the above measure for well-known parametric families of multivariate distributions posed a severe limitation to its use. To the best of the author's knowledge, the skew-normal distribution, i.e. the ESN with null truncation parameter, is the only known example of a statistical model for which Malkovich and Afifi's skewness has a straightforward parametric interpretation: the direction maximizing skewness is proportional to the nonnormality parameter [21]. Theorem 2 in this section generalizes the result to all multivariate ESN distributions, regardless of the truncation parameter's value.

Another criticism of Malkovich and Afifi's skewness came from [5], who pointed out the computational difficulties involved. Indeed, the method proposed by [23] for computing their statistic appears to be based more on heuristics than on a formal theory. Theorem 2 also suggests an approach based on standard maximum likelihood estimation rather than maximization of a d-variate cubic form subject to quadratic constraints. The maximum likelihood estimate for the shape parameter converges to the shape parameter itself, by well-known asymptotic arguments. Moreover, the direction maximizing sample skewness converges to the direction maximizing population skewness, when it is unique [21], as happens in the ESN case. Hence the direction maximizing sample skewness and the direction of the maximum likelihood estimate for the shape parameter will converge to each other. From the practical point of view, when the sample size is large enough, the former direction can be satisfactorily approximated by the latter one. Technical aspects of maximum likelihood estimation for the multivariate ESN are dealt with in [13].

Theorem 2 The vector c ∈ \(\Re^d\), d > 1, maximizing the skewness of \(c^T x\), where x ∼ ESN(ξ, Ω, η, τ) and \(\eta \ne 0_d\), is proportional to the nonnormality parameter η.

Proof First recall some results regarding cumulants of an extended skew-normal random vector x ∼ ESN(ξ, Ω, η, τ) [2]:

$$\mu = \xi + \delta \zeta_1(\tau ),\quad \Sigma = \Omega + \zeta_2(\tau )\delta \delta^T,\quad \kappa_3(x) = \zeta_3(\tau )\,\delta \otimes \delta \otimes \delta^T,$$
(11)

where \(\zeta_i(\tau) = \partial^i \log \Phi(\tau)/\partial \tau^i\), \(\delta = \Omega\eta/\sqrt{1 + \eta^T \Omega \eta}\) and μ, Σ, \(\kappa_3(x)\) denote the mean, the variance and the third cumulant of x, respectively. The extended skew-normal class is closed under affine transformations, so that the distribution of the standardized random vector \(z = \Sigma^{-1/2}(x - \mu)\) is extended skew-normal, too: \(z \sim ESN(\xi_z, \Omega_z, \eta_z, \tau)\) with \(\delta_z = \Omega_z \eta_z/\sqrt{1 + \eta_z^T \Omega_z \eta_z} = \Sigma^{-1/2}\delta\).

Let λ and λ z be unit length vectors maximizing the skewness of a linear combination of components of x and z, respectively:

$$\lambda = \arg \;{\max _{c \in C}}{\beta _1}({c^T}x);\quad {\lambda _z} = \arg \;{\max _{c \in C}}{\beta _1}({c^T}z),$$
(12)

where C is the set of d-dimensional real vectors of unit length and \(\beta_1(Z)\) is the squared skewness of the random variable Z. It follows that \(\lambda \propto \Sigma^{-1/2}\lambda_z\). Definitions of C and z imply that

$${\beta _1}({c^T}z) = {E^2}\left\{ {{{\left( {{c^T}z} \right)}^3}} \right\}\quad c \in C.$$
(13)

Now apply the linearity properties of cumulants [26, p. 32] to obtain

$${E^2}\left\{ {{{\left( {{c^T}z} \right)}^3}} \right\} = {\left\{ {{{\left( {c \otimes c} \right)}^T}{\kappa _3}(z)c} \right\}^2},$$
(14)

where \(\kappa_3(z) = \zeta_3(\tau)\,\delta_z \otimes \delta_z \otimes \delta_z^T\) is the third cumulant of z. Then

$$E^2\left\{ \left( c^T z \right)^3 \right\} = \zeta_3^2(\tau )\,(c^T \delta_z)^6$$
(15)

by ordinary properties of the Kronecker product. Hence \(\lambda_z\) is proportional to \(\delta_z\) or, equivalently, λ is proportional to \(\Sigma^{-1}\delta\). Basic formulae for matrix inversion lead to

$$\lambda \propto \Sigma^{-1}\delta = \left\{ \Omega^{-1} - \frac{\Omega^{-1}\delta \delta^T \Omega^{-1}}{\zeta_2^{-1}(\tau ) + \delta^T \Omega^{-1}\delta } \right\}\delta$$
(16)

and to

$$\Sigma^{-1}\delta = \Omega^{-1}\delta \left\{ 1 - \frac{\delta^T \Omega^{-1}\delta }{\zeta_2^{-1}(\tau ) + \delta^T \Omega^{-1}\delta } \right\}.$$
(17)

Since \(\eta = \Omega^{-1}\delta /\sqrt{1 - \delta^T \Omega^{-1}\delta}\), the vector maximizing the skewness of \(c^T x\) is proportional to the parameter η, and this completes the proof.
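The conclusion of the proof can be verified numerically in the bivariate case. The sketch below builds δ and Σ = Ω + ζ₂(τ)δδᵀ from hypothetical values of Ω, η and τ, using ζ₂(τ) = −m(τ){τ + m(τ)} with m(τ) = ϕ(τ)/Φ(τ), and checks that Σ⁻¹δ is parallel to η. The 2 × 2 helper functions and the parameter values are illustrative assumptions.

```python
import math


def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def zeta2(t):
    """Second derivative of log Phi at t."""
    mt = phi(t) / Phi(t)
    return -mt * (t + mt)


def mat_vec(a, v):
    """Product of a 2 x 2 matrix and a 2-vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


def inverse_2x2(a):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


# Hypothetical parameter values.
omega = [[2.0, 0.6], [0.6, 1.0]]
eta = [1.5, -0.7]
tau = 0.4

omega_eta = mat_vec(omega, eta)
quad = eta[0] * omega_eta[0] + eta[1] * omega_eta[1]
delta = [v / math.sqrt(1.0 + quad) for v in omega_eta]
z2 = zeta2(tau)
sigma = [[omega[i][j] + z2 * delta[i] * delta[j] for j in range(2)] for i in range(2)]
direction = mat_vec(inverse_2x2(sigma), delta)

# Two planar vectors are parallel when their cross product vanishes.
cross = direction[0] * eta[1] - direction[1] * eta[0]
dot = direction[0] * eta[0] + direction[1] * eta[1]
```

The cross product is zero up to rounding error and the inner product is positive, so the maximizing direction is a positive multiple of η for these values.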

4 A Numerical Example

In this section we shall use projections which maximize skewness to highlight interesting data features. The approach is exploratory in nature, and differs from the inferential approach of the previous sections. Skewness maximization provides a valid criterion for projection pursuit [18], which has never been applied to financial data, to the best of our knowledge.

Each observation is the closing price of a European financial market, as recorded by MSCI Inc., a leading provider of investment decision support tools. The included countries are Austria, Belgium, Denmark, Finland, France, Germany, Greece, England, Ireland, Italy, Norway, Holland, Portugal, Spain, Sweden, Switzerland. The first and last closing prices were recorded on June 24, 2004 and June 23, 2008, respectively. Data are arranged in a matrix where each row corresponds to a day and each column to a country. Hence the size of the data matrix is 1305×16, which is quite large.

Table 1 reports the skewnesses and the kurtoses for each country. All prices are mildly skewed: their third standardized moments are never greater than 0.688 and exceed 0.4 in Finland, Germany and Holland only. Also, all prices are platykurtic: their fourth standardized moments are never greater than 2.33 and exceed 2.00 in Finland and England only. Both features are illustrated in the box plots of Fig. 1: despite obvious differences in location and spread, all box plots suggest mild skewness and absence of outliers. Histograms of closing prices for different countries are multimodal and light tailed. We were unable to report all histograms due to space constraints. However, they are well exemplified by the histograms of Austria (Fig. 3) and Italy (Fig. 2). Austrian and Italian stock prices are linearly related, as shown in the scatter plot of Fig. 4. Again, the graph does not suggest the presence of outliers. Data exhibit a completely different structure when projected onto the direction which maximizes their skewness. The histogram of the projected data (Fig. 5) is definitely unimodal, markedly skewed and very heavy tailed. From the economic viewpoint, it is interesting to notice that the fifty greatest projected values correspond to the last fifty days of the time period under consideration, when Europe began to suffer from the financial crisis.

Table 1 Skewness, kurtosis and weight in the linear combination of each country

Fig. 1 Boxplot of stock prices in European countries

Fig. 2 Histogram of Italian prices

Fig. 3 Histogram of Austrian prices

Fig. 4 Scatterplot of Italian prices versus Austrian prices

Fig. 5 Histogram of transformed data

5 Concluding Remarks

We have proposed two approaches for testing normality when the sampled distribution is assumed to be ESN. The first approach, recommended in the univariate case, is motivated by local optimality. The second approach, recommended in the multivariate case, is motivated by union-intersection arguments. Both approaches use some measure of skewness to overcome problems posed by likelihood-based methods.

Alternatively, multivariate normality can be tested via Mardia's measure of skewness [24], which presents some advantages over Malkovich and Afifi's one in the general case [5, 25]. However, these advantages vanish in the ESN case, as shown in Sect. 3. Moreover, Malkovich and Afifi's index has a straightforward application in terms of projection pursuit, since univariate skewness is a well-known projection pursuit index [18].

A natural question to ask is whether the results in the paper hold for more general classes of distributions, such as those discussed in [3]. Simulation studies (not shown here) suggest that the answer is affirmative, but the problem deserves further investigation and constitutes a direction for future research.

Acknowledgements The author would like to thank Marc Hallin, Christophe Ley and Davy Pandevene for their insightful comments on a previous version of the paper.