1 Introduction

Various types of skew-symmetric distributions have been proposed in the literature. In general, there are four methods of constructing a skew-symmetric distribution from a symmetric density function [17]. One of these methods is perturbation of a symmetric density via a skewing function, i.e. a skew probability density function (pdf) is created by multiplying a symmetric pdf by a skewing function, which is a function with range [0, 1]. In fact, the starting point of all these studies was the skew-normal (SN) distribution introduced by [5],

$$\begin{aligned} f(x;\lambda ) = 2\phi (x) \Phi \left( {\lambda x} \right) ,\quad x \in \mathbb {R}, \end{aligned}$$
(1)

where \(\phi \) and \(\Phi \) are the pdf and cumulative distribution function (cdf) of the standard normal, respectively. A random variable X with the above density is denoted by \(X \sim SN (\lambda )\). [1] introduced a generalization of (1) with nice properties, called the skew generalized normal (SGN) distribution, with pdf of the form

$$\begin{aligned} f(x;\lambda _1 ,\lambda _2 ) = 2\phi (x)\Phi \left( {\frac{{\lambda _1 x}}{{\sqrt{1 + \lambda _2 {x} ^2 } }}} \right) ,\quad x \in \mathbb {R}. \end{aligned}$$
(2)

The skew-curved normal (SCN) distribution is the SGN distribution with parameter \(\lambda _2 = \lambda _1^2\). A number of researchers proposed extensions of this density, such as [10, 12, 15, 22]. Choudhury and Abdul Matin [10] added one parameter to the SGN family and called the result the extended skew generalized normal (ESGN) distribution, with the following density

$$\begin{aligned} f(x;\lambda _1 ,\lambda _2 ,\lambda _3 ) = 2\phi (x)\Phi \left( \frac{{\lambda _1 x}}{{\sqrt{\lambda _2 x^2 + \lambda _3 x^4 } }}\right) . \end{aligned}$$
(3)

They showed that the ESGN distribution is more flexible, since the range of Pearson's excess kurtosis coefficient of the ESGN distribution is wider than those of the SN and SGN distributions. Several studies have addressed flexibility (bimodality) in the skew-symmetric family of distributions [2, 3, 9, 11, 14, 16, 18, 19, 20].

In this paper, an extension of the SGN distribution is introduced by adding a shape parameter. This parameter allows the proposed distribution to be unimodal or bimodal and to have a wider range of Pearson's excess kurtosis coefficient. The multivariate version of our distribution, with multimodal shape, is also introduced.

The rest of the paper is organized as follows. In Sect. 2, we present the definition of our proposed distribution and various graphs of its pdf. We also derive some important results about this distribution and its relationship with other distributions. The main properties of our proposed model, such as moments, stochastic representations and characterizations, are also discussed in this section. Section 3 is devoted to maximum likelihood estimation. In Sect. 4, we introduce the multivariate case of our proposed distribution and study some of its properties. Finally, in Sect. 5, we use two real data sets to illustrate the usefulness of this family of distributions.

2 The shape-skew generalized normal distribution and its main properties

In this section, we introduce a flexible class of skew-normal distributions generalizing (2).

Definition 1

The random variable X has the shape skew generalized normal distribution if its density is given by

$$\begin{aligned} f(x;\lambda _1 ,\lambda _2 ,\alpha ) = 2\phi (x)\Phi \left( {\frac{{\lambda _1 x}}{{\sqrt{1 + \lambda _2 {\left| x \right| } ^{2\alpha } } }}} \right) \quad x \in \mathbb {R}, \end{aligned}$$
(4)

where \(\lambda _1 \in \mathbb {R}\) and \(\lambda _2 \in [0,\infty )\) are skewness parameters and \(\alpha \in \mathbb {R}-\left\{ 0 \right\} \) is a shape parameter, with the following conventions: if \(\lambda _1 = 0\), then \(\lambda _2 = 0\) and \(\alpha = 1\); if \(\lambda _2 = 0\), then \(\alpha =1\). We denote this by \(X \sim SSGN(\lambda _1 ,\lambda _2 ,\alpha )\). The resulting distribution for the special case \(\lambda _2 = \lambda _1^2 \) is called shape skew-curved normal (SSCN) and is denoted by \(SSCN(\lambda _1 ,\alpha )\).
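For concreteness, the density (4) is easy to evaluate numerically. The following Python sketch (the function names are ours, not from the paper) uses only the standard library; the guard at \(x = 0\) covers negative shape parameters, for which \(\left| x \right| ^{2\alpha }\) diverges as \(x \rightarrow 0\) while the skewing argument still tends to 0.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ssgn_pdf(x, lam1, lam2, alpha):
    """Density (4) of SSGN(lam1, lam2, alpha)."""
    if x == 0.0:
        # the skewing argument tends to 0 as x -> 0 for every admissible alpha
        return phi(0.0)
    arg = lam1 * x / math.sqrt(1.0 + lam2 * abs(x) ** (2 * alpha))
    return 2.0 * phi(x) * Phi(arg)
```

Setting \(\lambda _1 = \lambda _2 = 0\) and \(\alpha = 1\) recovers \(\phi (x)\), and the identity \(f(x)+f(-x)=2\phi (x)\) (part (6) of Proposition 1 below) holds for any admissible parameters.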

We point out that (4) is indeed a density, since its skewing function satisfies the conditions of [17] for a skewing function (see page 2). These conditions are as follows:

Fig. 1

Some possible shapes of the \(SSGN(\lambda _1 ,\lambda _2 ,\alpha )\) distribution for different parameter values

A skewing function is a mapping \( \Pi :\mathbb {R}^k \times \mathbb {R}^k \rightarrow [0,1] \) such that

$$\begin{aligned} \Pi ( - \mathbf{{z}},\delta ) + \Pi (\mathbf{{z}},\delta ) = 1 \quad \forall \mathbf{{z}},\delta \in {\mathbb {R}^k},\qquad \Pi (\mathbf{{z}},{\delta ^*}) = \frac{1}{2} \quad \forall \mathbf{{z}} \in {\mathbb {R}^k}, \end{aligned}$$
(5)

where \({\delta ^*}\) is a special case of \(\delta \). The parameter \(\mathbf {\delta }\) is a skewness/asymmetry parameter and the normalizing constant equals 2. If \(\alpha \) is zero, formula (4) is not a density.

Figure 1 illustrates various graphs of (4) under different choices of \(\lambda _1 ,\lambda _2 ,\alpha \), showing that the SSGN density can be unimodal or bimodal, can have a high or low Pearson's excess kurtosis coefficient, and can be heavy-tailed, depending on the parameters. To study the modality of the SSGN distribution, we used graphical methods and observed that the derivative of the density (4) changes sign at most once, from positive to negative, when \(\alpha \in \left\{ { - 1,1} \right\} \), and changes sign up to two more times when \(\alpha \notin \left\{ { - 1,1} \right\} \). Therefore, the distribution in question is either unimodal or bimodal. Figure 2 shows the effect of \(\alpha \) on \(SSCN(\lambda , \alpha )\), compared with the \(SCN(\lambda )\) distribution (\(\alpha =1\)) introduced by [1]. We see that for a positive shape parameter the density tends to be heavier-tailed and bimodal, while for a negative shape parameter the mode of the SCN density splits into two modes.
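The modality behavior described above can also be checked numerically. The sketch below (function names are ours) counts interior local maxima of (4) on a fine grid; parameter choices in the usage are illustrative, not from the paper.

```python
import math

def ssgn_pdf(x, lam1, lam2, alpha):
    """Density (4) of SSGN; the x == 0 guard covers negative alpha."""
    phi = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    if x == 0.0:
        return phi(0.0)
    return 2.0 * phi(x) * Phi(lam1 * x / math.sqrt(1.0 + lam2 * abs(x) ** (2 * alpha)))

def count_modes(lam1, lam2, alpha, lo=-6.0, hi=6.0, n=4001):
    """Count interior local maxima of the density on a fine grid."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [ssgn_pdf(x, lam1, lam2, alpha) for x in xs]
    return sum(1 for i in range(1, n - 1) if ys[i - 1] < ys[i] > ys[i + 1])
```

For example, with \(\alpha = 1\) the density is unimodal for any \(\lambda _1, \lambda _2\), while a negative shape parameter such as \(\alpha = -2\) with \(\lambda _1 = 4, \lambda _2 = 1\) produces two modes, consistent with Fig. 2.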

Fig. 2

Effect of a positive or negative shape parameter on the \(SSCN(\lambda _1 ,\alpha )\) distribution

Basic properties of an \(SSGN(\lambda _1 ,\lambda _2 ,\alpha )\) distribution:

Proposition 1

If \(X \sim SSGN(\lambda _1 ,\lambda _2 ,\alpha )\) then we have:

  1.

    \(SSGN(0 ,0 ,1 )=N(0,1)\)

  2.

    \(SSGN(\lambda _1 ,0 ,1 )=SN(\lambda _1)\), for all \(\lambda _1 \in \mathbb {R}\)

  3.

    \(SSGN(\lambda _1 ,\lambda _2 ,1 )=SGN(\lambda _1,\lambda _2)\), for all \(\lambda _1 \in \mathbb {R}\), \(\lambda _2 \ge 0\)

  4.

    \(SSGN(\lambda _1 ,\lambda _2 ,2 )=ESGN(\lambda _1,0,\lambda _2)\), for all \(\lambda _1 \in \mathbb {R}\), \(\lambda _2 \ge 0\).

  5.

    \(-X \sim SSGN(-\lambda _1 ,\lambda _2 ,\alpha )\)

  6.

    \(f(x,\lambda _1 ,\lambda _2 ,\alpha )+f(-x,\lambda _1 ,\lambda _2 ,\alpha )=2\phi (x)\), for all \(x \in \mathbb {R}\),\(\lambda _1 \in \mathbb {R}\), \(\lambda _2 \ge 0\), \(\alpha \in \mathrm{\mathbb {Z}} - \left\{ 0 \right\} \).

  7.

    \(\mathop {\lim }\limits _{\lambda _1 \rightarrow \infty } f(x,\lambda _1 ,\lambda _2 ,\alpha ) = 2\phi (x)I(x \ge 0) \) (half normal distribution), for all \(\lambda _2 \ge 0\), \(\alpha \in \mathrm{\mathbb {Z}} - \left\{ 0 \right\} \).

  8.

    \(\mathop {\lim }\limits _{\lambda _1 \rightarrow - \infty } f(x,\lambda _1 ,\lambda _2 ,\alpha ) = 2\phi (x)I(x \le 0) \) (half normal distribution), for all \(\lambda _2 \ge 0\), \(\alpha \in \mathrm{\mathbb {Z}} - \left\{ 0 \right\} \).

  9.

    If \(Z \sim N(0,1)\), then for every even function \( h( \cdot ) \), \((h(u)=h(-u))\), we have \( h(Z)\mathop = \limits ^d h(X) \) where \( \mathop = \limits ^d \) means the equality in distribution.

  10.

    If \( \mathrm{{Y}} \sim SSGN(\lambda _1^* ,\lambda _2^* ,\alpha ^ * ) \), then for every even function \( h( \cdot ) \), \((h(u)=h(-u))\), we have \( h(Y)\mathop = \limits ^d h(X)\).

Proof

The proof is straightforward. \(\square \)

Proposition 2

Let \(X \sim SSGN(\lambda _1 ,\lambda _2 ,\alpha )\) and \( F(x,\lambda _1 ,\lambda _2 ,\alpha )\) be the cdf of X, then we have:

$$\begin{aligned} F(x,\lambda _1 ,\lambda _2 ,\alpha ) = \Phi (x) - 2H(x,\lambda _1 ,\lambda _2 ,\alpha ), \end{aligned}$$
(6)

where

$$\begin{aligned} H(x,\lambda _1 ,\lambda _2 ,\alpha ) = \int _{ - x}^\infty {\int _0^{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}} {\phi (t)\phi (u)dtdu} }. \end{aligned}$$
(7)

Proof

We have

$$\begin{aligned} \Phi (x)= & {} \int _{ - \infty }^x {\phi (t)dt = \int _{ - \infty }^x {\int _{ - \infty }^\infty {\phi (t)\phi (u)dtdu} } }\nonumber \\= & {} \int _{ - \infty }^x {\left( {\int _{ - \infty }^{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}} {\phi (t)dt} + \int _{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}}^0 {\phi (t)dt} + \frac{1}{2}} \right) \phi (u)du}\nonumber \\= & {} \frac{1}{2}F(x,\lambda _1 ,\lambda _2 ,\alpha ) + \frac{1}{2}\Phi (x) + \int _{ - \infty }^x {\int _{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}}^0 {\phi (t)} \phi (u)dtdu}, \end{aligned}$$
(8)

from which the following equality is obtained

$$\begin{aligned} \int _{ - \infty }^x {\int _{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}}^0 {\phi (t)} \phi (u)dtdu} = \int _{ - x}^\infty {\int _0^{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}} {\phi (t)} \phi (u)dtdu} = H(x,\lambda _1 ,\lambda _2 ,\alpha ). \end{aligned}$$
(9)

\(\square \)
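Formula (6) can be verified numerically. In the sketch below (names are ours), the inner t-integral in (7) is evaluated in closed form as \(\Phi (\cdot ) - \frac{1}{2}\), and the remaining one-dimensional integrals are computed with a composite Simpson rule.

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)
phi = lambda t: math.exp(-0.5 * t * t) / SQRT2PI
Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def skew_arg(u, l1, l2, a):
    """lambda1*u / sqrt(1 + lambda2*|u|^(2a)); the limit at u = 0 is 0."""
    return 0.0 if u == 0.0 else l1 * u / math.sqrt(1.0 + l2 * abs(u) ** (2 * a))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3.0

def cdf_direct(x, l1, l2, a):
    """F(x) by integrating the density (4) from the far left tail."""
    return simpson(lambda u: 2.0 * phi(u) * Phi(skew_arg(u, l1, l2, a)), -8.0, x)

def H(x, l1, l2, a):
    """(7), with the inner t-integral done in closed form: Phi(arg) - 1/2."""
    return simpson(lambda u: (Phi(skew_arg(u, l1, l2, a)) - 0.5) * phi(u), -x, 8.0)

def cdf_via_H(x, l1, l2, a):
    """Formula (6): F(x) = Phi(x) - 2 H(x)."""
    return Phi(x) - 2.0 * H(x, l1, l2, a)
```

The two computations agree to quadrature accuracy for any admissible parameter choice we tried.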

Properties of H function:

Proposition 3

The H function in the cdf of the SSGN distribution has the following two properties:

  (1)

    \(H(x,\lambda _1 ,\lambda _2 ,\alpha ) = H( - x,\lambda _1 ,\lambda _2 ,\alpha )\), for all \(x \in \mathbb {R}\),\(\lambda _1 \in \mathbb {R}\), \(\lambda _2 \ge 0\), \(\alpha \in \mathrm{\mathbb {Z}} - \left\{ 0 \right\} \).

  (2)

    \(H(x,-\lambda _1 ,\lambda _2 ,\alpha ) =- H(x,\lambda _1 ,\lambda _2 ,\alpha )\), for all \(x \in \mathbb {R}\),\(\lambda _1 \in \mathbb {R}\), \(\lambda _2 \ge 0\), \(\alpha \in \mathrm{\mathbb {Z}} - \left\{ 0 \right\} \).

Proof

To prove (1), note that, by (9),

$$\begin{aligned} - H(x,\lambda _1 ,\lambda _2 ,\alpha )= & {} \int _{ - \infty }^x {\int _0^{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}} {\phi (t)} \phi (u)dtdu}\nonumber \\= & {} \int _{ - \infty }^\infty {\int _0^{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}} {\phi (t)} \phi (u)dtdu} - \int _x^\infty {\int _0^{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}} {\phi (t)} \phi (u)dtdu}\nonumber \\= & {} \int _{ - \infty }^\infty {\left( {\int _{ - \infty }^{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}} {\phi (t)dt} - \frac{1}{2}} \right) \phi (u)du} - \int _x^\infty {\int _0^{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}} {\phi (t)} \phi (u)dtdu}\nonumber \\= & {} 0 - \int _x^\infty {\int _0^{\frac{{\lambda _1 u}}{{\sqrt{1 + \lambda _2 u^{2\alpha } } }}} {\phi (t)} \phi (u)dtdu} = - H( - x,\lambda _1 ,\lambda _2 ,\alpha ). \end{aligned}$$
(10)

Multiplying both sides of the above equation by \(-1\) yields equality (1). The proof of part (2) is straightforward. \(\square \)

Now we obtain the moments of the SSGN distribution. Note that, in view of part (9) of Proposition 1, the even moments of the SSGN and standard normal distributions coincide, i.e.

$$\begin{aligned} E( {X^{2K} } ) = 1 \times 3 \times 5 \times \cdots \times (2K - 1),\quad K = 1,2,\ldots . \end{aligned}$$
(11)

The odd moments of SSGN can be obtained using the following proposition.

Proposition 4

Let \(X \sim SSGN(\lambda _1 ,\lambda _2 ,\alpha )\). Then for \(K=0,1,2,\ldots \) we have

$$\begin{aligned} E( {X^{2K + 1} } ) = 2\left( {b_K (\lambda _1 ,\lambda _2 ,\alpha ) - b_K (0,\lambda _2 ,\alpha )} \right) , \end{aligned}$$
(12)

where

$$\begin{aligned} b_K (\lambda _1 ,\lambda _2 ,\alpha ) = \int _0^\infty {\frac{{u^K }}{{\sqrt{2\pi } }}e^{ - u/2} \Phi \left( \frac{{\lambda _1 \sqrt{u} }}{{\sqrt{1 + \lambda _2 u^\alpha } }}\right) } du, \end{aligned}$$
(13)

and

$$\begin{aligned} b_K (0,\lambda _2 ,\alpha ) = \frac{{2^K \Gamma (K + 1)}}{{\sqrt{2\pi } }}. \end{aligned}$$
(14)

Proof

$$\begin{aligned} E( {X^{2K + 1} } )= & {} 4\int _0^\infty {x^{2K + 1} } \phi (x)\Phi \left( {\frac{{\lambda _1 x}}{{\sqrt{1 + \lambda _2 x^{2\alpha } } }}} \right) dx - 2\int _0^\infty {x^{2K + 1} \phi (x)dx}\nonumber \\= & {} 2b_K (\lambda _1 ,\lambda _2 ,\alpha ) - \frac{{2^{K + 1} \Gamma (K + 1)}}{{\sqrt{2\pi } }} \end{aligned}$$
(15)

\(\square \)
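Proposition 4 can be checked numerically, e.g. for \(K = 0, 1\). The sketch below (names are ours) evaluates \(b_K\) from (13) by quadrature and compares \(E(X^{2K+1})\) from (12) and (14) with direct integration of \(x^{2K+1}\) against the density (4).

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)
phi = lambda t: math.exp(-0.5 * t * t) / SQRT2PI
Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3.0

def b_K(K, l1, l2, a):
    """b_K of (13), truncating the u-integral at 80 (e^{-40} is negligible)."""
    f = lambda u: (u ** K / SQRT2PI) * math.exp(-0.5 * u) * \
        Phi(l1 * math.sqrt(u) / math.sqrt(1.0 + l2 * u ** a))
    return simpson(f, 0.0, 80.0)

def odd_moment(K, l1, l2, a):
    """E[X^(2K+1)] via (12), with b_K(0, l2, a) from (14)."""
    b0 = 2.0 ** K * math.gamma(K + 1) / SQRT2PI
    return 2.0 * (b_K(K, l1, l2, a) - b0)

def odd_moment_direct(K, l1, l2, a):
    """E[X^(2K+1)] by integrating x^(2K+1) times the density (4)."""
    g = lambda x: x ** (2 * K + 1) * 2.0 * phi(x) * \
        Phi(l1 * x / math.sqrt(1.0 + l2 * abs(x) ** (2 * a)))
    return simpson(g, -8.0, 8.0)
```

The two routes agree to roughly quadrature accuracy (the \(\sqrt{u}\) kink at the origin limits the u-integral to about four correct decimals here).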

Using the above formulas, we can obtain the skewness coefficient and Pearson's excess kurtosis coefficient of \(SSGN(\lambda _1 ,\lambda _2 ,\alpha )\) for selected values of \(\alpha \). These coefficients are given by

$$\begin{aligned} {\gamma _1} = \frac{{E({X^3}) - 3\mu {\sigma ^2} - {\mu ^3}}}{{{{({\sigma ^2})}^{\frac{3}{2}}}}}\quad \text {and}\quad {\gamma _2} = \frac{{E({X^4}) - 4E({X^3})\mu + 6E({X^2}){\mu ^2} - 3{\mu ^4}}}{{{{({\sigma ^2})}^2}}} - 3. \end{aligned}$$

Figure 3 shows the variability of these coefficients for various values of the parameters.

Fig. 3

Asymmetry and kurtosis range variations of SSGN for some parameter values

The moment generating function is given by

$$\begin{aligned} M_X (t) = 2e^{t^2 /2} E\left( \Phi \left( {\frac{{\lambda _1 (Z + t)}}{{\sqrt{1 + \lambda _2 (Z + t)^{2\alpha } } }}} \right) \right) , \end{aligned}$$
(16)

where Z has a standard normal distribution.
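As a numerical check of (16) (names in the sketch are ours), the expectation over Z can be computed by quadrature against \(\phi (z)\) and compared with \(E(e^{tX})\) obtained by integrating directly against the density (4); for integer \(\alpha \), \((Z+t)^{2\alpha } = |Z+t|^{2\alpha }\), which is what the code uses.

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)
phi = lambda t: math.exp(-0.5 * t * t) / SQRT2PI
Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def simpson(f, a, b, n=4000):
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3.0

def skew_arg(u, l1, l2, a):
    return 0.0 if u == 0.0 else l1 * u / math.sqrt(1.0 + l2 * abs(u) ** (2 * a))

def mgf_formula(t, l1, l2, a):
    """(16): 2 exp(t^2/2) E[Phi(arg(Z + t))] with Z ~ N(0,1), E[.] by quadrature."""
    e = simpson(lambda z: Phi(skew_arg(z + t, l1, l2, a)) * phi(z), -10.0, 10.0)
    return 2.0 * math.exp(0.5 * t * t) * e

def mgf_direct(t, l1, l2, a):
    """E[e^{tX}] by integrating e^{tx} against the density (4)."""
    f = lambda x: math.exp(t * x) * 2.0 * phi(x) * Phi(skew_arg(x, l1, l2, a))
    return simpson(f, -10.0, 10.0)
```

At \(t = 0\) the formula reduces to \(2 E(\Phi (\cdot )) = 1\), since the skewing argument is odd in \(Z\).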

Certain relations of the SSGN distribution with well-known distributions are given in the following proposition:

Proposition 5

  • If \( X \sim SSGN(\lambda _1 ,\lambda _2 ,\alpha ) \) and \(Y \sim \chi _{(k)}^2 \) are independent, then for \( G = \frac{X}{{\sqrt{\frac{Y}{k}} }} \), we have

    $$\begin{aligned}&f_G (g) \rightarrow 2f_{T_{(k)} }(g) I(g \ge 0)\quad as\quad \lambda _1 \rightarrow \infty \nonumber \\&f_G (g) \rightarrow 2f_{T_{(k)} }(g) I(g < 0)\quad as\quad \lambda _1 \rightarrow - \infty \end{aligned}$$
    (17)

    where \(f_{T_{(k)} }\) is the density of Student's t distribution with k degrees of freedom.

  • If \( X_1 ,X_2 \mathop \sim \limits ^{iid} SSGN(\lambda _1 ,\lambda _2 ,\alpha ) \) and \(D = \frac{{X_1 }}{{\left| {X_2 } \right| }}\), then

    $$\begin{aligned}&f_D (d) \rightarrow 2f_{U }(d) I(d \ge 0)\quad as\quad \lambda _1 \rightarrow \infty \nonumber \\&f_D (d) \rightarrow 2f_{U }(d) I(d < 0)\quad as\quad \lambda _1 \rightarrow - \infty \end{aligned}$$
    (18)

    where \(f_{U }\) is the density of the standard Cauchy distribution (C(0, 1)) and iid stands for independent and identically distributed.

  • If \( X_1 ,X_2,\ldots ,X_n \mathop \sim \limits ^{iid} SSGN(\lambda _1 ,\lambda _2 ,\alpha ) \) then \(D = \sum \limits _{i = 1}^n {X_i^2 } \sim \chi _n^2\).

  • If \( X\left| {Y = y} \right. \sim SSGN(\frac{{\lambda _1 }}{y},\frac{{\lambda _2 }}{{y^{2\alpha } }},\alpha ) \), \(Y \sim SN(\theta )\) and \( V = \frac{X}{Y} \), then

    $$\begin{aligned} f_V (v) = 2g(v)\Phi \left( {\frac{{\lambda _1 v}}{{\sqrt{1 + \lambda _2 v^{2\alpha } } }}} \right) \end{aligned}$$
    (19)

    where g(v) is the standard Cauchy density (C(0, 1)).

Note the connection between the normal and SSGN distributions: if \(X \sim N(0,1)\), the resulting random variable G has the \({T_{(k)}}\) distribution; if \({X_1},{X_2} \sim N(0,1)\), the resulting random variable D has the C(0, 1) distribution; and if \({X_1},{X_2},\ldots ,{X_n} \sim N(0,1)\), the resulting random variable D has the \(\chi _{n}^2\) distribution. So, we only prove (19).

Proof

Let \(f_V (v)\) denote the pdf of V. Then

$$\begin{aligned} f_V (v)= & {} \int _{ - \infty }^\infty {2\phi (vy)\Phi \left( \frac{{\lambda _1 v}}{{\sqrt{1 + \lambda _2 v^{2\alpha } } }}\right) 2\phi (y)\Phi (\theta y)} \left| y \right| dy \nonumber \\= & {} 2\Phi \left( \frac{{\lambda _1 v}}{{\sqrt{1 + \lambda _2 v^{2\alpha } } }}\right) \int _{ - \infty }^\infty {2\frac{{\left| y \right| }}{{\sqrt{2\pi } }}\phi (y\sqrt{1 + v^2 } )\Phi (\theta y)} dy \nonumber \\= & {} \frac{2}{{\sqrt{2\pi } \left( {1 + v^2 } \right) }}\Phi \left( \frac{{\lambda _1 v}}{{\sqrt{1 + \lambda _2 v^{2\alpha } } }}\right) \int _{ - \infty }^\infty {2\left| y \right| \phi (y)\Phi \left( \frac{{\theta y}}{{\sqrt{1 + v^2 } }}\right) } dy. \end{aligned}$$
(20)

Since \(\left| W \right| \mathop = \limits ^d \left| Z \right| \), where \(Z \sim N(0,1)\) and \(W \sim SN\left( \frac{\theta }{{\sqrt{1 + v^2 } }}\right) \), we have

$$\begin{aligned} f_V (v) = \frac{2}{{\sqrt{2\pi } ( {1 + v^2 })}}\Phi \left( \frac{{\lambda _1 v}}{{\sqrt{1 + \lambda _2 v^{2\alpha } } }}\right) \int _{ - \infty }^\infty {\left| y \right| \phi (y)} dy = 2g(v)\Phi \left( \frac{{\lambda _1 v}}{{\sqrt{1 + \lambda _2 v^{2\alpha } } }}\right) ,\nonumber \\ \end{aligned}$$
(21)

which completes the proof. \(\square \)

The family of skew-Cauchy distributions was introduced by considering the distribution of \(\frac{{X_\lambda }}{X}\), where \(X_\lambda \sim SN(\lambda )\) and \(X \sim N(0,1)\) are independent random variables [7]. Another family of two-parameter skew-Cauchy distributions, which includes the skew-Cauchy distribution as a special case, was proposed by [13]. Nekokhou et al. [19] introduced a three-parameter skew-Cauchy distribution based on relationships between the SN and flexible skew generalized normal (FSGN) distributions. The pdf (19) gives another three-parameter skew-Cauchy distribution, based on the SN and SSGN distributions, without assuming independence of the random variables.

Methods of generating data from the SSGN distribution are given by the following stochastic representations. The first stochastic representation of the SSGN distribution, introduced in Proposition 6, is constructed from random variables with standard normal distributions.

Proposition 6

Let Y and Z be iid random variables with N(0, 1) distribution, then

$$\begin{aligned} X = \left\{ \begin{array}{l} Y\quad \quad \quad if\quad Z \le \frac{{\lambda _1 Y}}{{\sqrt{1 + \lambda _2 Y^{2\alpha } } }} \\ - Y\quad \,\,if\quad Z > \frac{{\lambda _1 Y}}{{\sqrt{1 + \lambda _2 Y^{2\alpha } } }} \end{array} \right. \end{aligned}$$
(22)

has \(SSGN(\lambda _1 ,\lambda _2 ,\alpha )\) distribution.

Proof

Observe that

$$\begin{aligned} F_X (x)= & {} P\left( {X \le x,Z \le \frac{{\lambda _1 Y}}{{\sqrt{1 + \lambda _2 Y^{2\alpha } } }}} \right) + P\left( {X \le x,Z> \frac{{\lambda _1 Y}}{{\sqrt{1 + \lambda _2 Y^{2\alpha } } }}} \right) \nonumber \\= & {} P\left( {Y \le x,Z \le \frac{{\lambda _1 Y}}{{\sqrt{1 + \lambda _2 Y^{2\alpha } } }}} \right) + P\left( { - Y \le x,Z > \frac{{\lambda _1 Y}}{{\sqrt{1 + \lambda _2 Y^{2\alpha } } }}} \right) \nonumber \\= & {} \int _{ - \infty }^x {\int _{ - \infty }^{\frac{{\lambda _1 y}}{{\sqrt{1 + \lambda _2 y^{2\alpha } } }}} {\phi (z)\phi (y)dzdy + } } \int _{ - x}^\infty {\int _{\frac{{\lambda _1 y}}{{\sqrt{1 + \lambda _2 y^{2\alpha } } }}}^\infty {\phi (z)\phi (y)dzdy} }\nonumber \\= & {} \int _{ - \infty }^x {\phi (y)\Phi \left( {\frac{{\lambda _1 y}}{{\sqrt{1 + \lambda _2 y^{2\alpha } } }}} \right) dy + } \int _{ - \infty }^x {\phi (y)\Phi \left( {\frac{{\lambda _1 y}}{{\sqrt{1 + \lambda _2 y^{2\alpha } } }}} \right) dy} \nonumber \\= & {} 2\int _{ - \infty }^x {\phi (y)\Phi \left( {\frac{{\lambda _1 y}}{{\sqrt{1 + \lambda _2 y^{2\alpha } } }}} \right) dy}. \end{aligned}$$
(23)

Now, differentiating \(F_X(x)\) with respect to x, we have:

$$\begin{aligned} f_X (x) = 2\phi (x)\Phi \left( {\frac{{\lambda _1 x}}{{\sqrt{1 + \lambda _2 x^{2\alpha } } }}} \right) . \end{aligned}$$
(24)

\(\square \)
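The representation (22) translates directly into a sampler. A minimal Python sketch (names are ours):

```python
import math
import random

def ssgn_sample(n, l1, l2, a, seed=0):
    """Draw n variates via (22): X = Y if Z <= arg(Y), else X = -Y,
    with Y, Z iid N(0,1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        thr = 0.0 if y == 0.0 else l1 * y / math.sqrt(1.0 + l2 * abs(y) ** (2 * a))
        out.append(y if z <= thr else -y)
    return out
```

By part (9) of Proposition 1, the even sample moments should be close to those of N(0, 1) (e.g. the second sample moment close to 1), while for \(\lambda _1 > 0\) the sample mean is positive.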

The second approach for generating data from the SSGN distribution, via standard normal and uniform random variables, is presented in the next proposition.

Proposition 7

Let Y and U be independent random variables with the N(0, 1) distribution and the uniform distribution on the interval [0, 1] (U(0, 1)), respectively. Then the random variable

$$\begin{aligned} X = \left\{ \begin{array}{l} Y\quad if\quad U \le \Phi \left( {\frac{{\lambda _1 Y}}{{\sqrt{1 + \lambda _2 Y^{2\alpha } } }}} \right) \\ - Y\quad if\quad U > \Phi \left( {\frac{{\lambda _1 Y}}{{\sqrt{1 + \lambda _2 Y^{2\alpha } } }}} \right) \end{array} \right. \end{aligned}$$
(25)

has \(SSGN(\lambda _1 ,\lambda _2 ,\alpha )\) distribution.

Proof

The proof is similar to that of Proposition 6. \(\square \)

The next proposition establishes a stochastic representation of the SSGN distribution based on normal, uniform and Bernoulli random variables.

Proposition 8

Let Y, U and V be independent random variables with the N(0, 1), U(0, 1) and Bernoulli (B(1, p)) distributions, respectively. Define \( X_1 = Y\left| U \right. \le \Phi \left( {\frac{{\lambda _1 Y}}{{\sqrt{1 + \lambda _2 Y^{2\alpha } } }}} \right) \) and \( X_2 = - Y\left| U \right. > \Phi \left( {\frac{{\lambda _1 Y}}{{\sqrt{1 + \lambda _2 Y^{2\alpha } } }}} \right) \). Then

$$\begin{aligned} X_1 \mathop = \limits ^d X_2 \sim SSGN(\lambda _1 ,\lambda _2 ,\alpha ), \end{aligned}$$
(26)

and

$$\begin{aligned} H=VX_1 + (1 - V)X_2 \end{aligned}$$
(27)

has \(SSGN(\lambda _1 ,\lambda _2 ,\alpha )\) distribution.

Proof

Note that

$$\begin{aligned} P({X_1} \le y)= & {} P\left( Y \le y\left| U \right. \le \Phi \left( {\frac{{{\lambda _1}Y}}{{\sqrt{1 + {\lambda _2}{Y^{2\alpha }}} }}} \right) \right) = \frac{{P\left( Y \le y,\,U \le \Phi \left( {\frac{{{\lambda _1}Y}}{{\sqrt{1 + {\lambda _2}{Y^{2\alpha }}} }}} \right) \right) }}{{P\left( U \le \Phi \left( {\frac{{{\lambda _1}Y}}{{\sqrt{1 + {\lambda _2}{Y^{2\alpha }}} }}} \right) \right) }} \nonumber \\= & {} \frac{{\int _{ - \infty }^y {\phi (y)\Phi \left( {\frac{{{\lambda _1}y}}{{\sqrt{1 + {\lambda _2}{y^{2\alpha }}} }}} \right) dy} }}{{\int _{ - \infty }^\infty {\phi (y)\Phi \left( {\frac{{{\lambda _1}y}}{{\sqrt{1 + {\lambda _2}{y^{2\alpha }}} }}} \right) dy} }} = \int _{ - \infty }^y {2\phi (y)\Phi \left( {\frac{{{\lambda _1}y}}{{\sqrt{1 + {\lambda _2}{y^{2\alpha }}} }}} \right) dy}. \end{aligned}$$
(28)

Similarly \({X_2}\) has the same SSGN distribution. Now, we show that H has SSGN distribution:

$$\begin{aligned} {F_H}(h)= & {} P(H \le h) = P(H \le h\left| {V = 1} \right. )P(V = 1) + P(H \le h\left| {V = 0} \right. )P(V = 0) \nonumber \\= & {} P({X_1} \le h)p + P({X_2} \le h)(1 - p) = P({X_1} \le h). \end{aligned}$$
(29)

The last equality is based on the fact that \(X_1\), \(X_2\) and H are identically distributed. \(\square \)

3 Maximum likelihood estimation

We consider the location-scale version of the SSGN distribution for fitting the model to real data. For this purpose, we define \(Y=\mu +\sigma X\), where \(X \sim SSGN(\lambda _1 ,\lambda _2 ,\alpha )\), \(\mu \in \mathbb {R}\) and \(\sigma > 0\). Then

$$\begin{aligned} f_Y (y\left| \theta \right. ) = \frac{2}{\sigma }\phi \left( {\frac{{y - \mu }}{\sigma }} \right) \Phi \left( {\frac{{\lambda _1 \left( {\frac{{y - \mu }}{\sigma }} \right) }}{{\sqrt{1 + \lambda _2 \left( {\frac{{y - \mu }}{\sigma }} \right) ^{2\alpha } } }}} \right) \end{aligned}$$
(30)

where \(\theta = \left( {\mu ,\sigma ,\lambda _1 ,\lambda _2 ,\alpha } \right) \) i.e. \(Y \sim SSGN\left( {\mu ,\sigma ,\lambda _1 ,\lambda _2 ,\alpha } \right) \).

Let \(x_1, x_2, \ldots , x_n\) be a random sample of size n from a population with pdf (30). Then the log-likelihood function of the sample is

$$\begin{aligned} \ell (\mu ,\sigma ,{\lambda _1},{\lambda _2},\alpha ) = n\ln \left( {\frac{2}{{\sqrt{2\pi } \sigma }}} \right) - \frac{1}{2}\sum \limits _{i = 1}^n {z_i^2} + \sum \limits _{i = 1}^n {\log \left( \Phi \left( {\frac{{{\lambda _1}{z_i}}}{{\sqrt{1 + {\lambda _2}z_i^{2\alpha }} }}} \right) \right) }, \end{aligned}$$
(31)

where \({z_i} = \frac{{{x_i} - \mu }}{\sigma }\).

Since the parameter space of \(\alpha \) is discrete, ML estimation is performed by the following algorithm based on the profile likelihood. Suppose \(\alpha \in A = \left\{ { - N,\ldots , - 1,1, \ldots ,N} \right\} \) for some fixed \(N \in \mathbb {N}\).

For \(i=1,\ldots ,2N\)

  • Set \(\alpha =A(i)\).

  • By numerically solving the following score equations set equal to zero, find the MLEs of \(\mu , \sigma , \lambda _1 ,\lambda _2 \), denoted by \(({\hat{\mu }} ,{\hat{\sigma }} ,{\hat{\lambda }}_1 ,{\hat{\lambda }}_2)\):

    $$\begin{aligned} \frac{{\partial \ell }}{{\partial \mu }}= & {} \frac{1}{\sigma }\sum \limits _{i = 1}^n {z_i } - \frac{{\lambda _1 }}{\sigma }\sum \limits _{i = 1}^n {\frac{{1 + \lambda _2 (1 - \alpha )z_i^{2\alpha } }}{{( {1 + \lambda _2 z_i^{2\alpha }})^{\frac{3}{2}} }}} w_i^* \nonumber \\ \frac{{\partial \ell }}{{\partial \sigma }}= & {} - \frac{n}{\sigma } + \frac{1}{\sigma }\sum \limits _{i = 1}^n {z_i^2} - \frac{{\lambda _1 }}{\sigma }\sum \limits _{i = 1}^n {\frac{{z_i (1 + \lambda _2 (1 - \alpha )z_i^{2\alpha } )}}{{(1 + \lambda _2 z_i^{2\alpha } )^{\frac{3}{2}} }}w_i^* }\nonumber \\ \frac{{\partial \ell }}{{\partial \lambda _1 }}= & {} \sum \limits _{i = 1}^n {\frac{{z_i }}{{\sqrt{1 + \lambda _2 z_i^{2\alpha } } }}} w_i^* \nonumber \\ \frac{{\partial \ell }}{{\partial \lambda _2 }}= & {} - \frac{{\lambda _1 }}{2}\sum \limits _{i = 1}^n {\frac{{ z_i^{2\alpha + 1} }}{{(1 + \lambda _2 z_i^{2\alpha } )^{\frac{3}{2}} }}} w_i^* \end{aligned}$$
    (32)

    where \( w_i^* = \frac{{\phi \left( {\frac{{\lambda _1 z_i }}{{\sqrt{1 + \lambda _2 z_i^{2\alpha } } }}} \right) }}{{\Phi \left( {\frac{{\lambda _1 z_i }}{{\sqrt{1 + \lambda _2 z_i^{2\alpha } } }}} \right) }} \).

  • Now calculate \(l(i) = \ell ({\hat{\mu }} ,{\hat{\sigma }} ,{\hat{\lambda }}_1 ,{\hat{\lambda }}_2 ,\alpha )\).

Return to the first step with i replaced by \(i+1\). After the final step, \({{\hat{\alpha }}}\) is the value of \(\alpha \) that maximizes the profile log-likelihood l, together with the corresponding estimates of the other parameters of \(SSGN(\mu , \sigma ,\lambda _1 ,\lambda _2, \alpha )\).
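The profile-likelihood loop above can be sketched end to end. In the Python sketch below all names are ours, and a crude random search over \((\mu ,\sigma ,\lambda _1 ,\lambda _2 )\) stands in for a proper numerical solver of the score equations; it is a minimal illustration, not a production estimator, and it ignores the boundary conventions at \(\lambda _1 = 0\) or \(\lambda _2 = 0\), which is harmless numerically.

```python
import math
import random

SQRT2PI = math.sqrt(2.0 * math.pi)

def loglik(data, mu, sigma, l1, l2, alpha):
    """Log-likelihood (31) of the location-scale SSGN model."""
    if sigma <= 0.0 or l2 < 0.0:
        return -math.inf
    ll = len(data) * math.log(2.0 / (SQRT2PI * sigma))
    for x in data:
        z = (x - mu) / sigma
        ll -= 0.5 * z * z
        arg = 0.0 if z == 0.0 else l1 * z / math.sqrt(1.0 + l2 * abs(z) ** (2 * alpha))
        # clamp Phi away from 0 so the log stays finite in the far tail
        ll += math.log(max(0.5 * (1.0 + math.erf(arg / math.sqrt(2.0))), 1e-300))
    return ll

def fit_ssgn(data, alphas=(-2, -1, 1, 2), iters=2000, seed=1):
    """Profile likelihood over the discrete alpha grid; for each alpha,
    random search over (mu, sigma, l1, l2) keeps only improving moves."""
    rng = random.Random(seed)
    n = len(data)
    m = sum(data) / n
    s = math.sqrt(sum((x - m) ** 2 for x in data) / n)
    best = (-math.inf, None)
    for a in alphas:
        theta = (m, s, 0.0, 0.0)              # start from moment estimates
        cur = loglik(data, *theta, a)
        for _ in range(iters):
            cand = (theta[0] + 0.5 * rng.uniform(-1.0, 1.0),
                    max(theta[1] + 0.5 * rng.uniform(-1.0, 1.0), 1e-3),
                    theta[2] + 0.5 * rng.uniform(-1.0, 1.0),
                    max(theta[3] + 0.5 * rng.uniform(-1.0, 1.0), 0.0))
            c = loglik(data, *cand, a)
            if c > cur:
                theta, cur = cand, c
        if cur > best[0]:
            best = (cur, theta + (a,))
    return best  # (maximized log-likelihood, (mu, sigma, l1, l2, alpha))
```

Because each \(\alpha \) starts from the moment estimates, the returned log-likelihood is never worse than that of the naive symmetric fit.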

4 The multivariate shape skew generalized normal distribution

In this section, we present certain interesting results for the case where \(X = (X_1,\ldots , X_n)\) has a multivariate shape skew generalized normal density. This multivariate version can be used in graphical models, since we show that its conditional distributions belong to this family as well.

Definition 2

An n-dimensional random variable \(X = (X_1,\ldots , X_n)\) has the multivariate shape skew generalized normal distribution, \(MSSGN(\lambda _1 ,\lambda _2 ,\alpha )\), subject to the conditions that if \(\lambda _1 = 0\), then \(\lambda _2 = 0\) and \(\alpha = 1\), and if \(\lambda _2 = 0\), then \(\alpha =1\), if its density is given by

$$\begin{aligned} f(x_1,\ldots ,x_n;\lambda _1 ,\lambda _2 ,\alpha ) = c(\lambda _1 ,\lambda _2 ,\alpha ).\prod \limits _{i = 1}^n {\phi (x_i )} .\Phi \left( \frac{{\lambda _1 \prod \limits _{i = 1}^n {x_i } }}{{\sqrt{1 + \lambda _2 \left( {\prod \limits _{i = 1}^n {x_i } } \right) ^{2\alpha } } }}\right) , \end{aligned}$$
(33)

where \(\phi (x_1)\),...,\(\phi (x_n)\) are standard normal densities and

$$\begin{aligned} c({\lambda _1},{\lambda _2},\alpha )= & {} \frac{1}{{\int _{ - \infty }^\infty { \cdots \int _{ - \infty }^\infty {\prod \nolimits _{i = 1}^n {\phi ({x_i})} \Phi \left( \frac{{{\lambda _1}\prod \nolimits _{i = 1}^n {{x_i}} }}{{\sqrt{1 + {\lambda _2}{{\left( {\prod \nolimits _{i = 1}^n {{x_i}} } \right) }^{2\alpha }}} }}\right) d{x_1}\ldots \,d{x_n}} } }}\nonumber \\= & {} \frac{1}{{E\left( {\Phi }\left( \frac{{{\lambda _1}(\prod \nolimits _{i = 1}^n {{U_i}} )}}{{\sqrt{1 + {\lambda _2}{{(\prod \nolimits _{i = 1}^n {{U_i}} )}^{2\alpha }}} }}\right) \right) }}, \end{aligned}$$
(34)

where \(U_1 ,\ldots ,U_n \mathop \sim \limits ^{iid} N(0,1)\). The normalizing constant satisfies \(c( - \lambda _1 ,\lambda _2 ,\alpha ) = c(\lambda _1 ,\lambda _2 ,\alpha )\).
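The expectation in (34) is straightforward to approximate by Monte Carlo; the sketch below (names are ours) does so with only the standard library. Note that, since \(P = \prod _i U_i\) is symmetric about zero and the argument of \(\Phi \) is an odd function of P, one expects \(E(\Phi (\cdot )) = \frac{1}{2}\) and hence \(c = 2\) for all admissible parameters; the simulation is consistent with this.

```python
import math
import random

def Phi(t):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def mssgn_const(l1, l2, alpha, dim=2, n=200000, seed=0):
    """Monte Carlo estimate of c in (34): the reciprocal of
    E[Phi(l1*P / sqrt(1 + l2*|P|^(2*alpha)))], P a product of dim iid N(0,1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        p = 1.0
        for _ in range(dim):
            p *= rng.gauss(0.0, 1.0)
        arg = 0.0 if p == 0.0 else l1 * p / math.sqrt(1.0 + l2 * abs(p) ** (2 * alpha))
        total += Phi(arg)
    return n / total
```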

Proposition 10 presents the relation between the MSSGN and SSGN distributions, and Proposition 11 presents a stochastic representation of the MSSGN distribution.

Proposition 9

Let \(\left( {X_1 ,\ldots ,X_n } \right) \sim MSSGN(\lambda _1 ,\lambda _2 ,\alpha )\). The following properties hold:

  (1)

    The conditional distribution of each random variable given the other random variables is a shape skew generalized normal distribution, i.e.,

    $$\begin{aligned}&X_i \left| \left( {X_1 ,\ldots ,X_{i - 1} ,X_{i + 1} \ldots ,X_n } \right) = \left( {x_1 ,\ldots ,x_{i - 1} ,x_{i + 1} \ldots ,x_n } \right) \right. \nonumber \\&\quad \sim SSGN\left( \lambda _1 .\prod \limits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {x_j ,} \,\lambda _2 .\prod \limits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {x_j ^{2\alpha } } ,\alpha \right) , \end{aligned}$$
    (35)

    for \(i = 1,\ldots ,n\).

  (2)

    The conditional distribution of a subvector given the remaining random variables is a multivariate shape skew generalized normal distribution, i.e.,

    $$\begin{aligned}&\left( {X_1 ,\ldots ,X_r } \right) \left| {X_{r + 1} ,\ldots ,X_n = (x_{r + 1} ,\ldots ,x_n )} \right. \nonumber \\&\quad \sim MSSGN\left( \lambda _1.\prod \limits _{\scriptstyle j = r + 1}^n {x_j } ,\lambda _2. \prod \limits _{\scriptstyle j = r + 1}^n {x_j ^{2\alpha } } ,\alpha \right) \end{aligned}$$
    (36)

    for \(1<r < n\).

Proof of part (1) Let \(Y = X_i \left| {\left( {X_1 ,\ldots ,X_{i - 1} ,X_{i + 1} \ldots ,X_n } \right) = \left( {x_1 ,\ldots ,x_{i - 1} ,x_{i + 1} \ldots ,x_n } \right) } \right. \). From the definition of conditional pdf, we have

$$\begin{aligned} f_Y (y)= & {} \frac{{f_{X_1 , \cdots X_n } (x_1 , \ldots ,x_{i - 1} ,y,x_{i + 1} , \ldots ,x_n )}}{{f_{X_1 , \ldots ,X_{i - 1} ,X_{i + 1} , \ldots ,X_n } (x_1 , \ldots ,x_{i - 1} ,x_{i + 1} , \ldots ,x_n )}} \nonumber \\= & {} \frac{{c(\lambda _1 ,\lambda _2 ,\alpha ).\prod \nolimits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {\phi (x_j ).\phi (y)} .\Phi \left( \frac{{\left( {\lambda _1 \prod \nolimits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {x_j } } \right) y}}{{\sqrt{1 + \left( {\lambda _2 \prod \nolimits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {x_j ^{2\alpha } } } \right) y^{2\alpha } } }}\right) }}{{\int _{ - \infty }^\infty {c(\lambda _1 ,\lambda _2 ,\alpha ).\prod \nolimits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {\phi (x_j ).\phi (y)} .\Phi \left( \frac{{\left( {\lambda _1 \prod \nolimits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {x_j } } \right) y}}{{\sqrt{1 + \left( {\lambda _2 \prod \nolimits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {x_j ^{2\alpha } } } \right) y^{2\alpha } } }}\right) dy} }}\nonumber \\\propto & {} d\phi (y)\Phi \left( \frac{{\left( {\lambda _1 \prod \nolimits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {x_j } } \right) y}}{{\sqrt{1 + \left( {\lambda _2 \prod \nolimits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {x_j ^{2\alpha } } } \right) y^{2\alpha } } }}\right) \end{aligned}$$
(37)

where d is the normalizing constant. Therefore, Y has \(SSGN\left( \lambda _1 .\prod \nolimits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {x_j ,} \,\lambda _2 .\prod \nolimits _{\begin{array}{c} \scriptstyle j = 1 \\ \scriptstyle j \ne i \end{array}}^n {x_j ^{2\alpha } } ,\alpha \right) \) distribution.

Fig. 4

Some possible contours of the bivariate distribution of [6] for several values of \(\lambda _1\), \(\lambda _2 \) and \(\rho \)

Fig. 5

Some possible contours of the bivariate distribution of [21] for several values of \(\delta _1\) and \(\delta _2 \)

Fig. 6

Some possible contours of the bivariate distribution of [18] for several values of \(\alpha _1\), \(\alpha _2 \), \(\beta _1 \), \(\beta _2 \), \(\beta _3 \) and \(\beta _4\)

Fig. 7

Some possible contours of the bivariate \(MSSGN(\lambda _1 ,\lambda _2 ,\alpha )\) distribution for several values of \(\alpha ,\lambda _1\) and \(\lambda _2 \)

Proof of part (2) Let \(\mathbf{{Y}} = X_1 ,\ldots ,X_r \left| {X_{r + 1} ,\ldots ,X_n = (x_{r + 1} ,\ldots ,x_n )} \right. \). From the definition of conditional pdf, we have

$$\begin{aligned}&f_\mathbf{{Y}} (y_1 ,\ldots ,y_r ) = \frac{f_{X_1 ,\ldots ,X_n } (y_1 ,\ldots ,y_r ,x_{r + 1} ,\ldots ,x_n )}{f_{X_{r + 1} ,\ldots ,X_n } (x_{r + 1} ,\ldots ,x_n )} \nonumber \\&\quad = \frac{c(\lambda _1 ,\lambda _2 ,\alpha )\,\prod _{j = r + 1}^n \phi (x_j )\,\prod _{i = 1}^r \phi (y_i )\,\Phi \left( \frac{\left( \lambda _1 \prod _{j = r + 1}^n x_j \right) \prod _{i = 1}^r y_i }{\sqrt{1 + \left( \lambda _2 \prod _{j = r + 1}^n x_j^{2\alpha } \right) \prod _{i = 1}^r y_i^{2\alpha } }}\right) }{\int _{ - \infty }^{\infty } \cdots \int _{ - \infty }^{\infty } c(\lambda _1 ,\lambda _2 ,\alpha )\,\prod _{j = r + 1}^n \phi (x_j )\,\prod _{i = 1}^r \phi (y_i )\,\Phi \left( \frac{\left( \lambda _1 \prod _{j = r + 1}^n x_j \right) \prod _{i = 1}^r y_i }{\sqrt{1 + \left( \lambda _2 \prod _{j = r + 1}^n x_j^{2\alpha } \right) \prod _{i = 1}^r y_i^{2\alpha } }}\right) \,dy_1 \cdots dy_r } \nonumber \\&\quad = d\,\prod _{i = 1}^r \phi (y_i )\,\Phi \left( \frac{\left( \lambda _1 \prod _{j = r + 1}^n x_j \right) \prod _{i = 1}^r y_i }{\sqrt{1 + \left( \lambda _2 \prod _{j = r + 1}^n x_j^{2\alpha } \right) \prod _{i = 1}^r y_i^{2\alpha } }}\right) \end{aligned}$$
(38)

where \(d\) is the normalizing constant. Thus, \(\mathbf {Y}\) has the \( MSSGN\left( \lambda _1 \prod _{j = r + 1}^n x_j ,\, \lambda _2 \prod _{j = r + 1}^n x_j^{2\alpha } ,\, \alpha \right) \) distribution. \(\square \)

Proposition 10

Let \(X_1 ,\ldots ,X_n \) and Z be i.i.d. random variables with N(0, 1) distribution. Then,

$$\begin{aligned} \left( {X_1 ,\ldots ,X_n } \right) \left| {\left\{ {Z \le \frac{{\lambda _1 \prod \nolimits _{i = 1}^n {X_i } }}{{\sqrt{1 + \lambda _2 \left( {\prod \nolimits _{i = 1}^n {X_i ^{2\alpha } } } \right) } }}} \right\} } \right. \sim MSSGN(\lambda _1 ,\lambda _2 ,\alpha ). \end{aligned}$$
(39)

Proof

Let \( B = \left\{ {Z \le \frac{{\lambda _1 \prod \nolimits _{i = 1}^n {X_i } }}{{\sqrt{1 + \lambda _2 \left( {\prod \nolimits _{i = 1}^n {X_i } } \right) ^{2\alpha } } }}} \right\} \). Then, we have

$$\begin{aligned} f_{\left( {X_1 ,\ldots ,X_n } \right) \left| B \right. } (x_1 ,\ldots ,x_n \left| B \right. )&= \frac{p\left( B \left| {\left( {X_1 ,\ldots ,X_n } \right) = (x_1 ,\ldots ,x_n )} \right. \right) f_{\left( {X_1 ,\ldots ,X_n } \right) } (x_1 ,\ldots ,x_n )}{p\left( {Z \le \frac{\lambda _1 \prod _{i = 1}^n X_i }{\sqrt{1 + \lambda _2 \prod _{i = 1}^n X_i^{2\alpha } }}} \right) }\nonumber \\&= \frac{\Phi \left( \frac{\lambda _1 \prod _{i = 1}^n x_i }{\sqrt{1 + \lambda _2 \prod _{i = 1}^n x_i^{2\alpha } }}\right) \prod _{i = 1}^n \phi (x_i ) }{\int _{ - \infty }^{\infty } \cdots \int _{ - \infty }^{\infty } \prod _{i = 1}^n \phi (x_i )\, \Phi \left( \frac{\lambda _1 \prod _{i = 1}^n x_i }{\sqrt{1 + \lambda _2 \prod _{i = 1}^n x_i^{2\alpha } }}\right) dx_1 \cdots dx_n } \nonumber \\&= c(\lambda _1 ,\lambda _2 ,\alpha )\,\prod _{i = 1}^n \phi (x_i )\, \Phi \left( \frac{\lambda _1 \prod _{i = 1}^n x_i }{\sqrt{1 + \lambda _2 \prod _{i = 1}^n x_i^{2\alpha } }}\right) . \end{aligned}$$
(40)

Thus \( \left( {X_1 ,\ldots ,X_n } \right) \left| {\left\{ {Z \le \frac{{\lambda _1 \prod \nolimits _{i = 1}^n {X_i } }}{{\sqrt{1 + \lambda _2 \left( {\prod \nolimits _{i = 1}^n {X_i ^{2\alpha } } } \right) } }}} \right\} } \right. \) has \(MSSGN(\lambda _1 ,\lambda _2 ,\alpha )\) distribution. \(\square \)
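Proposition 10 also suggests a direct way to simulate from the MSSGN distribution: draw \(X_1,\ldots,X_n\) and \(Z\) independently from N(0, 1) and accept the vector whenever the conditioning event holds. A minimal sketch in Python (the paper's own computations use R; the function name `rmssgn` and the parameter values below are illustrative):

```python
import numpy as np

def rmssgn(size, n, lam1, lam2, alpha, rng=None):
    """Draw `size` samples from MSSGN(lam1, lam2, alpha) in dimension n via
    the acceptance representation of Proposition 10: keep a proposal
    (X_1,...,X_n) ~ N(0, I) whenever an independent Z ~ N(0, 1) falls below
    lam1 * prod(X_i) / sqrt(1 + lam2 * prod(X_i^(2*alpha)))."""
    rng = np.random.default_rng(rng)
    out = []
    while len(out) < size:
        x = rng.standard_normal(n)       # proposal from N(0, I)
        z = rng.standard_normal()        # independent latent variable
        thresh = lam1 * np.prod(x) / np.sqrt(1.0 + lam2 * np.prod(x ** (2 * alpha)))
        if z <= thresh:                  # conditioning event of Prop. 10
            out.append(x)
    return np.array(out)
```

By symmetry (flipping the sign of a single coordinate negates the product), the acceptance probability is exactly 1/2, so on average \(2\cdot\texttt{size}\) proposals are needed.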

The MSSGN distribution reduces to the multivariate normal distribution \(MN_n (\mathbf{{0}},\mathbf{{I}})\) when \(\lambda _1 = 0\). In Fig. 7, some possible contours of the bivariate \(MSSGN(\lambda _1 ,\lambda _2 ,\alpha )\) density are shown for several values of \(\alpha \), \(\lambda _1\) and \(\lambda _2 \). The figure shows that the proposed class is more flexible than classical multivariate skew-normal distributions such as those of [6] (Fig. 4) and [21] (Fig. 5), and that the MSSGN density takes shapes different from the pdf of the [18] distribution (Fig. 6).
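Contours such as those in Fig. 7 can be reproduced numerically from the density in (40); the normalizing constant \(c(\lambda _1 ,\lambda _2 ,\alpha )\) need not be known in closed form, since it can be recovered by integrating the unnormalized density on a grid. A sketch for the bivariate case (grid limits and parameter values are illustrative):

```python
import numpy as np
from scipy.stats import norm

def mssgn_unnorm(x1, x2, lam1, lam2, alpha):
    # Unnormalized bivariate MSSGN density from Eq. (40): the product of
    # standard normal pdfs times the skewing factor; the constant
    # c(lam1, lam2, alpha) is recovered numerically below.
    arg = lam1 * x1 * x2 / np.sqrt(1.0 + lam2 * (x1 * x2) ** (2 * alpha))
    return norm.pdf(x1) * norm.pdf(x2) * norm.cdf(arg)

# Evaluate on a grid and normalize by a Riemann sum.
g = np.linspace(-4.0, 4.0, 201)
dx = g[1] - g[0]
X1, X2 = np.meshgrid(g, g)
f = mssgn_unnorm(X1, X2, lam1=2.0, lam2=1.0, alpha=2)
c = 1.0 / (f.sum() * dx * dx)   # numerical estimate of c(lam1, lam2, alpha)
density = c * f                 # normalized density, ready for contour plots
```

A useful check: flipping the sign of one coordinate negates the argument of \(\Phi \), so the unnormalized density integrates to 1/2 and \(c\) should come out close to 2.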

Table 1 Descriptive statistics for the first and second examples
Table 2 Estimated parameters and log-likelihood for the models SGN, FSGN, FGSN, SSGN and Mixture Normal for the first example (the numbers in brackets are the standard errors of the estimates)

5 Data analysis

We consider the variable E-Shiny (first example), available in the cream cheese creaminess database at http://www.models.kvl.dk/Cream, and the Kevlar data (second example), which represent the failure times at the 70 percent stress level, presented by [4]. Table 1 shows the summary statistics (length, mean, standard deviation, skewness \({\gamma _1} = \frac{{{m_3}}}{{{s^3}}}\) and kurtosis \({\gamma _2} = \frac{{{m_4}}}{{{s^4}}}\), where \(m_r\) is the rth sample central moment about the mean) for these two examples. In Tables 2 and 5, two distributions are fitted to the data of the first and second examples, respectively: SGN [1] and FSGN [19]. We also compare our proposed distribution with the two-component normal mixture \((\mu _1 ,\sigma _1,\mu _2 ,\sigma _2, p)\) distribution. In all cases, the models are augmented by the inclusion of location \((\mu )\) and scale \((\sigma )\) parameters. In the second example, FSGN reduces to FGSN [18], since \({\hat{\lambda }}_2=0\).

Table 3 Formal goodness of fit statistics for first example
Table 4 Formal goodness of fit statistics for second example
Table 5 Estimated parameters and log-likelihood for the models SGN, FSGN, SSGN and mixture normal for the second example (the numbers in brackets are the standard errors of the estimates)

In all cases, the parameters are estimated by maximum likelihood using the optim function of the stats package in R. If the data set has a unimodal histogram, the parameter \(\alpha \) can take the values \(-1\) and 1; if it has a bimodal histogram, we must search for \(\alpha \) in \( \mathbb {Z} -\left\{ -1,0,1\right\} \). In the following examples, we are faced with two bimodal data sets. Thus, in view of Sect. 3, we must define a loop over the parameter \(\alpha \) in \( \mathbb {Z} -\left\{ -1,0,1\right\} \). For simplicity, however, we choose an \(N \in \mathbb {Z}\) and loop over \(\left\{ -N,\ldots ,N\right\} -\left\{ -1,0,1\right\} \). At each step of the loop, the MLEs and the corresponding log-likelihood value are obtained with the optim function; after completing all the steps, the overall MLEs are those maximizing the log-likelihood across the steps. The standard errors of all the parameters except \(\alpha \) are calculated from the observed Fisher information matrix based on the Hessian matrix, obtained by setting \(hessian = TRUE\) in the optim call. For the parameter \( \alpha \) of the SSGN distribution alone, the standard error of the MLE is calculated using a parametric bootstrap with the same sample size.
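The loop just described can be sketched as follows. This is not the authors' R code but a Python analogue using scipy, and it assumes the univariate SSGN density with location and scale, \(f(x)=\tfrac{2}{\sigma }\,\phi (z)\,\Phi \bigl (\lambda _1 z/\sqrt{1+\lambda _2 z^{2\alpha }}\bigr )\) with \(z=(x-\mu )/\sigma \), in line with the skewing function appearing in Eq. (37):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def ssgn_logpdf(x, mu, sigma, lam1, lam2, alpha):
    # Assumed SSGN(lam1, lam2, alpha) log-density with location mu, scale sigma.
    z = (x - mu) / sigma
    arg = lam1 * z / np.sqrt(1.0 + lam2 * z ** (2 * alpha))
    return np.log(2.0) - np.log(sigma) + norm.logpdf(z) + norm.logcdf(arg)

def fit_ssgn(data, N=5):
    """Profile the integer shape parameter alpha over {-N,...,N} \ {-1,0,1},
    maximizing the log-likelihood over (mu, sigma, lam1, lam2) at each step,
    and keep the step with the largest log-likelihood."""
    best = None
    for alpha in [a for a in range(-N, N + 1) if a not in (-1, 0, 1)]:
        nll = lambda p: -np.sum(ssgn_logpdf(data, p[0], abs(p[1]) + 1e-8,
                                            p[2], abs(p[3]) + 1e-8, alpha))
        res = minimize(nll, x0=[np.mean(data), np.std(data), 0.5, 0.5],
                       method="Nelder-Mead")
        if best is None or -res.fun > best["loglik"]:
            best = {"alpha": alpha, "loglik": -res.fun, "params": res.x}
    return best
```

In a full replication, the Nelder-Mead call would be replaced by a Hessian-producing optimizer (as optim with \(hessian = TRUE\) does in R) to obtain standard errors, and a parametric bootstrap would be run for \(\alpha \).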

The Akaike information criterion (AIC) and the corrected Akaike information criterion (CAIC) [8] are used as goodness-of-fit criteria for the distributions fitted to the two data sets. Lower values of these statistics indicate a better fit, accounting for the number of parameters in each model. Tables 3 and 4 present the values of these statistics for the two real data sets.
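For concreteness, the two criteria can be computed as below; we assume here that "CAIC" denotes the small-sample corrected AIC (often written AICc), which should be checked against the definition in [8]:

```python
def aic(loglik, k):
    # Akaike information criterion: 2k - 2 log L, where k is the number
    # of estimated parameters.
    return 2 * k - 2 * loglik

def caic(loglik, k, n):
    # Small-sample corrected AIC (AICc); assumed to be the "CAIC" of the
    # text -- verify against reference [8]. n is the sample size.
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)
```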

To compare the SSGN distribution with the SGN model on both data sets, consider testing the null hypothesis of an SGN distribution against an SSGN alternative using the likelihood ratio statistic based on \( \Lambda = L_{SGN} ({\hat{\mu }} ,{\hat{\sigma }} , {\hat{\lambda }}_1 ,{\hat{\lambda }}_2 )/L_{SSGN} ({\hat{\mu }} ,{\hat{\sigma }} , {\hat{\lambda }}_1 ,{\hat{\lambda }}_2 ,{\hat{\alpha }} )\). Substituting the estimated values, we obtain \(-2\log \Lambda \) for the first and second examples as 11.515 and 8.108, respectively. Compared with the 95 percent critical value \(\chi _{(1)}^2 = 3.841\), the null hypotheses are clearly rejected in both cases, a strong indication that the SSGN distribution provides a much better fit than the SGN distribution to the data sets under consideration.
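The test above is a standard likelihood ratio test with one degree of freedom (SSGN adds the single parameter \(\alpha \) to SGN). A small helper, given the maximized log-likelihoods of the two fits:

```python
from scipy.stats import chi2

def lr_test(loglik_null, loglik_alt, df=1):
    """Likelihood ratio test of the null model (here SGN) against the
    alternative (here SSGN): returns -2 log Lambda and its chi-square(df)
    p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2.sf(stat, df)
```

With the observed statistics 11.515 and 8.108 from the text, both exceed `chi2.ppf(0.95, 1)` (approximately 3.841), reproducing the rejections reported above.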

For both examples, the theoretical mean, standard deviation, and skewness and kurtosis coefficients (\(\gamma _1\) and \(\gamma _2\)) of the fitted distributions are presented in Tables 2 and 5. Considering the scale of the data sets, all theoretical and empirical statistics (Table 1) are approximately equal.

Fig. 8

Histogram for the E-Shiny variable. The curves represent densities fitted by maximum likelihood

Fig. 9

Histogram for the failure times variable. The curves represent densities fitted by maximum likelihood

6 Conclusion

This paper introduces a flexible generalization of the skew generalized normal distribution, obtained by adding a shape parameter, called the shape skew generalized normal (SSGN) distribution; it includes the Azzalini skew-normal, the skew generalized normal of Arellano-Valle et al. [1], and a special case of the extended skew generalized normal distribution [10]. Inferential properties and three generation procedures are given for the model. The model accommodates popular features such as uni/bimodality, skewness, heavy tails and a wider range for Pearson's excess kurtosis coefficient than the SN and SGN distributions, so the proposed distribution is appropriate for many aspects of statistical analysis. The bivariate version of the distribution can model data sets with at most four modes, and its multivariate version can be used in graphical models such as directed acyclic graphs (DAGs) (Figs. 8, 9).