Abstract
In this paper, we introduce an unrestricted skew-normal generalized hyperbolic (SUNGH) distribution for use in finite mixture modeling or clustering problems. The SUNGH is a broad class of flexible distributions that includes various other well-known asymmetric and symmetric families such as the scale mixtures of skew-normal, the skew-normal generalized hyperbolic and its corresponding symmetric versions. The class of distributions provides a much needed unified framework where the choice of the best fitting distribution can proceed quite naturally through either parameter estimation or by placing constraints on specific parameters and assessing through model choice criteria. The class has several desirable properties, including an analytically tractable density and ease of computation for simulation and estimation of parameters. We illustrate the flexibility of the proposed class of distributions in a mixture modeling context using a Bayesian framework and assess the performance using simulated and real data.
1 Introduction
Statistical models based on finite mixtures of distributions have been extensively used in a wide variety of applications. Applying finite mixture models to real datasets allows fitting different characteristics of the empirical distribution, such as multimodality, skewness, kurtosis and heterogeneity, across observations. For general reviews of mixture models and applications, see Hogan and Laird (1997), Böhning (2000), McLachlan and Peel (2000), Frühwirth-Schnatter (2006), Lin (2010) and Mengersen et al. (2011).
While the vast majority of work on mixture models has focused on Gaussian mixture models, in many applications the tails of Gaussian distributions are shorter than appropriate and the Gaussian shape is not suitable for highly asymmetric data. Recent research has thus focused on fitting finite mixture models with more flexible distributional forms. The Student-t and the contaminated Gaussian distributions are two symmetric members of the scale mixtures of normal (SMN) family of distributions due to Andrews and Mallows (1974), which provide attractive heavy-tailed alternatives to the Gaussian distribution. Building upon this work is the class of scale mixtures of skew-normal (SMSN) distributions proposed by Branco and Dey (2001). The class of SMSN distributions provides location-scale density functions which depend on additional shape and kurtosis parameters, and includes as special cases the normal and skew-normal (SN) densities, as well as the full SMN class of symmetric densities. Heavier-tailed symmetric and skew-symmetric members of the SMSN family include, e.g., the Student-t, Cauchy, skew-t (ST), skew-Cauchy (SC), skew-contaminated normal (SCN) and skew-slash (SSL) distributions. Comprehensive coverage of the fundamental theory and new developments for SN and related distributions is given by Azzalini and Capitanio (2014); see also Genton (2004), Arellano-Valle and Genton (2005, 2010) and Arellano-Valle and Azzalini (2006).
Many of the distributions within the SMSN class have been developed and their performance assessed in the context of mixture models. Lin et al. (2007), Lin et al. (2009) and Pyne et al. (2009) studied mixtures of skew-normal distributions. Frühwirth-Schnatter and Pyne (2010) considered Bayesian inference for finite mixtures of univariate and multivariate SN and ST distributions. Basso et al. (2010) considered robust mixture modeling based on the SMSN family. Wang et al. (2009), Lin (2010), Lee and McLachlan (2014), Vrbik and McNicholas (2012) and Forbes and Wraith (2014) considered mixtures of multivariate ST distributions. Maleki and Arellano-Valle (2017) proposed a time series model based on finite mixtures of SMSN distributions. For a review of mixtures of SN and ST distributions, see Lee and McLachlan (2013a, b).
Other distributional forms within the SMSN family of distributions have also been examined. Karlis and Santourian (2009) developed mixtures of multivariate normal inverse Gaussian distributions. Franczak et al. (2014) examined mixtures of shifted asymmetric Laplace (SAL) distributions. Morris et al. (2014) proposed mixtures of contaminated SAL distributions. Browne and McNicholas (2015) and Wraith and Forbes (2015) examined mixtures of generalized hyperbolic distributions.
In the SMSN class of distributions, although the mixing distribution typically controls the tail behavior, it can also affect the behavior of skewness (Branco and Dey 2001). Recent work on this theme providing flexibility in both skewness and heavy tails is that of Vilca et al. (2014), who introduced a class of distributions referred to as multivariate SN generalized hyperbolic (SNGH) distributions. In this setting, the mixing distribution follows a generalized inverse Gaussian (GIG) distribution, which has previously been demonstrated to provide considerable flexibility in modeling heavy-tailed data (Wraith and Forbes 2015).
In other recent works, a broad class of skewed distributions has been explored by Lee and McLachlan (2016) in a mixture model context, building on the unified ST (SUT) distribution (Arellano-Valle and Azzalini 2006) and the fundamental ST distribution (Arellano-Valle and Genton 2005), and including as a special case the location-scale variant of the canonical fundamental or unrestricted ST (CFUST) distribution. A particular feature and advantage of the SUT distribution is that it encompasses as special cases the canonical fundamental or unrestricted SN (CFUSN) distribution (Arellano-Valle and Genton 2005) and other SN or ST variants (e.g., Sahu et al. 2003; Arellano-Valle et al. 2007; Lachos et al. 2007, 2010), thus providing considerable flexibility for modeling, where the best fitting distribution can be chosen simply (automatically) through parameter estimation or use of model choice criteria.
In this paper, we propose a very general class of distributions which extends the previous work on the SUN and SUT distributions by considering a mixing distribution for this class of models which follows a generalized inverse Gaussian (GIG). We refer to this new family of distributions as the unrestricted skew-normal generalized hyperbolic (SUNGH) distribution. The new family provides a very general framework for a large class of distributions and has several desirable properties, including an analytically tractable density and ease of computation for simulation and estimation of parameters. The family also provides a high degree of flexibility for the modeling of complex multivariate data with different degrees of asymmetry, kurtosis and heavy tails. A particularly attractive feature of this family of distributions is that it encompasses as special cases all of the distributions previously considered in the SMSN family and extensions to the unrestricted classes (e.g., SUT and SUN). Thus, this class of distributions provides a much-needed unified framework where the choice of the best fitting distribution can proceed quite naturally through either parameter estimation or by placing constraints on specific parameters and assessing through model choice criteria. We illustrate the advantages of this new family in the finite mixture modeling context using a Bayesian framework.
There are some computational advantages to using a Bayesian framework in a mixture model setting. First, allowing for the influence or effect of missing data on parameter estimates is quite natural in a Bayesian setting, as various patterns of missing data (e.g., class-dependent missingness) can be imputed at each MCMC iteration from the posterior predictive distribution (e.g., using a mixture model defined in open-source software such as JAGS or NIMBLE). In contrast, quite separate and complex methods are often needed for maximum likelihood estimation in these settings (Lin et al. 2009; Wang et al. 2004). Further, for the complex distributions we consider in this paper, previous work using the EM algorithm has at times relied on approximations (Lee and McLachlan 2016) or on derivatives of complex functions (Browne and McNicholas 2015) for the estimation of parameters. This difficulty also extends to the estimation of standard errors for the parameters (if they are available), which relies either on asymptotic approximations to the observed information matrix, valid only for large sample sizes, or on a computationally demanding bootstrap method (Basso et al. 2010). At times, the standard errors for the parameters are also unavailable (particularly for the GH distribution) or not mentioned (e.g., Browne and McNicholas 2015). This is not to say that estimation in the Bayesian setting is devoid of potential computational issues; in particular, label switching is a more prominent issue than for methods using ML estimation (Mengersen et al. 2011).
The paper is organized as follows. In Sect. 2, we provide some background to the SUN and GIG distributions. Section 3 outlines the details and properties of the new SUNGH family. In Sect. 4, we present a Bayesian analysis of a finite mixture model following a SUNGH distribution. In Sect. 5, we illustrate the performance of the proposed approach on real and simulated data. Finally, in Sect. 6, we present our main conclusions and discuss some areas of further research.
2 SUN and GIG distributions
2.1 Preliminaries
Following Arellano-Valle and Genton (2005), Arellano-Valle and Azzalini (2006) and Arellano-Valle et al. (2007), we say that a \(p\times 1\) random vector \({\varvec{X}}\) follows an unrestricted skew-normal (SUN) with \(p\times 1\) location vector \({\varvec{\mu }} \), \(p\times p\) positive definite dispersion matrix \(\varvec{\Sigma }\) and \(p\times q\) skewness parameter matrix \(\varvec{\Lambda }\), denoted by \({\varvec{X}}\sim \mathrm{SUN}_{p,q} \left( {{\varvec{\mu }} ,\varvec{\Sigma },\varvec{\Lambda }} \right) \), if its probability density function (pdf) is
where \(\varvec{\uppsi }=\varvec{\Sigma }+{\varvec{\Lambda \Lambda }}^{\top }\), \(\varvec{\Upsilon }={\varvec{I}}_q -\varvec{\Lambda }^{\top }\varvec{\uppsi }^{-1}\varvec{\Lambda }=\left( {{\varvec{I}}_q +\varvec{\Lambda }^{\top }\varvec{\Sigma }^{-1}\varvec{\Lambda }} \right) ^{-1}\), and \(\phi _p \left( {\cdot \hbox {|}{\varvec{\mu }} ,\varvec{\uppsi }} \right) \) and \({\Phi }_q \left( {\cdot \hbox {|}\varvec{\Upsilon }} \right) \) are, respectively, the pdf and cumulative distribution function (cdf) of the multivariate normal distributions \(N_p \left( {{\varvec{\mu }} ,\varvec{\uppsi }} \right) \) and \(N_q \left( {{\varvec{0}},\varvec{\Upsilon }} \right) \). The SUN class of multivariate distributions defined by (1) contains various special cases. For instance, we recover the multivariate normal when \(\varvec{\Lambda }=\mathbf{0}\), the multivariate SN, which we call here the restricted SN (rMSN), when \(q=1\), and the multivariate SN of Sahu et al. (2003) when \(p=q\) and \(\varvec{\Lambda }\) is a diagonal matrix. In fact, the SUN distribution considered here is an important special case of the unified SN distribution studied by Arellano-Valle and Azzalini (2006).
The random vector \({\varvec{X}}\sim \mathrm{SUN}_{p,q} \left( {{\varvec{\mu }} ,\varvec{\Sigma },\varvec{\Lambda }} \right) \) can be stochastically represented in different ways. According to Arellano-Valle et al. (2006), the SUN random vector \({\varvec{X}}\) has the selection representation given by
where the condition \({\varvec{V}}_0 >0\) means that each element of \({\varvec{V}}_0 \) is positive, and
The representation in (2) becomes a selection representation of the rMSN distribution when \(q=1\), i.e., when the latent vector \({\varvec{V}}_0 \) is replaced by a one-dimensional normal random variable \({{V}}_0 \). Also, if we let \({\varvec{V}}_0 ={\varvec{W}}_0\) and \({\varvec{V}}_1 ={\varvec{W}}_1 +\varvec{\Lambda }{\varvec{W}}_0 \), where \({\varvec{W}}_0 \sim {\varvec{N}}_q \left( {\mathbf{0},{\varvec{I}}_{q} } \right) \) and \({\varvec{W}}_1 \sim {\varvec{N}}_p \left( {{\varvec{0}}, {\varvec{I}}_p } \right) \) are independent, it follows from (2) that the stochastic representation of \({\varvec{X}}\) is given by
where \(\left| {{\varvec{W}}_0} \right| \) is the vector formed by the absolute value of each component of \({\varvec{W}}_0\). For more details, see Arellano-Valle et al. (2006), Arellano-Valle and Azzalini (2006) and Arellano-Valle et al. (2007). In particular, the mean vector and covariance matrix of \({\varvec{X}}\) are given by \(E\left[ {\varvec{X}} \right] ={\varvec{\mu }} +\sqrt{2/\pi }\varvec{\Lambda }{} \mathbf{1}_q \) and \(\hbox {Cov}\left[ {\varvec{X}} \right] =\varvec{\uppsi }-\frac{2}{\pi }\varvec{\Lambda }{} \mathbf{1}_q \mathbf{1}_q^\top \varvec{\Lambda }^{\top }\), respectively, where \(\mathbf{1}_q \) denotes the vector of ones of length q.
In this work, we consider the extension of the scale mixtures of rMSN (SMRSN or SMSN) distributions to the scale mixtures of SUN (SMSUN) distributions. Specifically, we consider the family of random vectors defined by
where \({\varvec{X}}\sim \mathrm{SUN}_{p,q} \left( {\mathbf{0},\varvec{\Sigma },\varvec{\Lambda }} \right) \), \(\kappa \left( \cdot \right) \) is a positive scale function and U is a mixing random variable which is independent of \({\varvec{X}}\). For our proposed SUNGH distribution, we consider the SMSUN class of distributions defined by (4) when the mixing random variable U follows a GIG distribution.
2.2 The family of GIG distributions
The GIG class is a rich family of flexible distributions with positive support that has been studied by several authors. For instance, see Good (1953), Barndorff-Nielsen and Halgreen (1977), Jørgensen (1982), among others. Thus, the choice of a GIG distribution for the scale mixing variable U in (4) is a natural candidate and provides a highly flexible unified class of multivariate distributions for multivariate statistical analysis.
The GIG distribution has several equivalent representations in terms of its parameterization. In this paper, in order to simplify computations and obtain closed-form posterior distributions in the Bayesian framework adopted here, we consider (without loss of generality) the following two representations:
First representation \(\mathrm{GIG}^{*}\left( {\upsilon ,\gamma ,\rho } \right) \) : A random variable U has a GIG distribution, denoted by \(U\sim \mathrm{GIG}^{*}\left( {\upsilon ,\gamma ,\rho } \right) \), if its pdf is given by
where \(K_r \left( x \right) \) is the modified Bessel function of the third kind of order r evaluated at x, and the parameter spaces are given by \(\gamma >0\), \(\rho >0\) and \(-\infty<\upsilon <+\infty \).
Second representation \(\mathrm{GIG}_*\left( {\upsilon ,\psi ,\eta } \right) \) : A random variable U follows a GIG distribution denoted by \(U\sim \mathrm{GIG}_*\left( {\upsilon ,\psi ,\eta } \right) \), if its pdf is given by
where \(K_r \left( x \right) \) is defined previously and the parameter spaces are \(\psi >0\), \(\eta >0\) and \(-\infty<\upsilon <+\infty \). This representation will be used to simplify the posterior representation of the GIG parameters. In this case, the mth moment of the random variable \(U^{1/2}\) is given by
The equivalence between both representations of the GIG distribution considered in (5) and (6) is obtained by observing the one-to-one relationship between their parameters given by \(\psi =\rho \gamma \) and \(\eta =\rho /\gamma \). Particular members of the GIG class lead to a variety of skewed distributions belonging to the proposed family. The inverse Gaussian is one member of this class which has been extensively studied by Chhikara and Folks (1989), Seshadri (1993) and Johnson et al. (1994, chap. 15). Two additional members of the GIG class are the hyperbola and the positive hyperbolic distributions, both of which have been studied by Barndorff-Nielsen (1978) and Barndorff-Nielsen and Blaesild (1980). The exponential, gamma and inverse gamma distributions are also special members of the GIG family. For a recent study on these distributions, see Vilca et al. (2014) and references therein.
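Sampling from the GIG and evaluating its half-order moments is straightforward with standard software. The sketch below is illustrative only: since the explicit densities (5)–(6) are not reproduced here, both the mapping of \(\mathrm{GIG}_*(\upsilon,\psi,\eta)\) onto `scipy.stats.geninvgauss(p=v, b=psi, scale=eta)` and the moment expression \(E[U^{m/2}]=\eta^{m/2}K_{\upsilon+m/2}(\psi)/K_{\upsilon}(\psi)\) are assumptions made for this example.

```python
import numpy as np
from scipy.stats import geninvgauss
from scipy.special import kv  # modified Bessel function of the third kind

# Assumption: GIG_*(v, psi, eta) has density proportional to
# (u/eta)^(v-1) exp(-psi * (u/eta + eta/u) / 2), matching
# scipy.stats.geninvgauss(p=v, b=psi, scale=eta).
v, psi, eta = 0.7, 2.0, 1.5
U = geninvgauss.rvs(v, psi, scale=eta, size=500_000, random_state=1)

# Assumed moment formula: E[U^(m/2)] = eta^(m/2) K_{v+m/2}(psi) / K_v(psi),
# checked here for m = 1 against the Monte Carlo average of sqrt(U).
m = 1
moment = eta**(m / 2) * kv(v + m / 2, psi) / kv(v, psi)
print(np.mean(np.sqrt(U)), moment)   # the two values agree closely
```

Under this parameterization, the Monte Carlo average of \(\sqrt{U}\) matches the Bessel-function ratio to within sampling error.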
In this paper, we define the multivariate random vector \({\varvec{Y}}\) via (4), considering a multivariate SUN random variable \({\varvec{X}}\) as in (3) and a GIG scale random variable U distributed according to the second representation in (6). As mentioned previously, we refer to this proposed family as the SUNGH distributions.
3 The family of SUNGH distributions
An alternative way to define the SUNGH distribution follows by substituting Eq. (3) into Eq. (4). From this, we can say that a \(p\times 1\) random vector \({\varvec{Y}}\) follows a SUNGH distribution if
where \({\varvec{\mu }} \) is a \(p\times 1\) location vector, \(\varvec{\Sigma }\) is a \(p\times p\) scale matrix, \(\varvec{\Lambda }\) is a \(p\times q\) shape matrix, \({\varvec{W}}=\kappa ^{1/2}\left( U \right) \left| {{\varvec{W}}_0 } \right| \), \({\varvec{W}}_0 \sim N_q \left( {\mathbf{0},{\varvec{I}}_{q} } \right) \), \({\varvec{W}}_1 \sim N_p \left( {\mathbf{0},{\varvec{I}}_{p} } \right) \) and \(U\sim \mathrm{GIG}_*\left( {\upsilon ,\psi ,\eta } \right) \), with \({\varvec{W}}_0 \), \({\varvec{W}}_1 \) and U being independent random quantities. These assumptions imply that \({\varvec{W}}\) is also independent of \({\varvec{W}}_1 \). Note that if we set \({\varvec{W}}=U\), \(\kappa \left( u \right) =u\) and \(q=1\), we obtain the GH distribution proposed by McNeil et al. (2005) and considered in the mixture model context by Browne and McNicholas (2015). For this reason, the GH distribution is more restrictive (less flexible) than the SUNGH distribution. Since the conditional distribution of \({\varvec{Y}}\) given \(U=u\) is \(\left. {\varvec{Y}} \right| U=u\sim \mathrm{SUN}_{p,q} \left( {{\varvec{\mu }} ,\kappa \left( u \right) \varvec{\Sigma },\kappa \left( u \right) ^{1/2}\varvec{\Lambda }} \right) \), the marginal pdf of \({\varvec{Y}}\) becomes the infinite mixture of the SUN pdf in (1) given by
\({\varvec{y}}\in {\varvec{R}}^{p}\), where \({\varvec{\varpi }} =\left( {\upsilon ,\psi ,\eta } \right) ^{\top }\), and \(\varvec{\uppsi }\) and \({\varvec{\varUpsilon }} \) defined as in (1). In what follows, we will refer to the SUNGH random vector in (7) as \({\varvec{Y}}\sim \mathrm{SUNGH}_{p,q} \left( {{\varvec{\mu }} ,\varvec{\Sigma },\varvec{\Lambda },{\varvec{\varpi }} } \right) \).
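Simulation from the SUNGH is straightforward through the hierarchical construction in (7). Below is a minimal sketch with \(\kappa(u)=u\); the use of `scipy.stats.geninvgauss` as a stand-in for \(\mathrm{GIG}_*(\upsilon,\psi,\eta)\) is an illustrative parameterization choice, not taken from the paper.

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_sungh(mu, Sigma, Lambda, v, psi, eta, n, rng):
    """Draw from SUNGH_{p,q}(mu, Sigma, Lambda, (v, psi, eta)) with
    kappa(u) = u, via Y = mu + U^{1/2} (Lambda |W0| + Sigma^{1/2} W1)."""
    p, q = Lambda.shape
    U = geninvgauss.rvs(v, psi, scale=eta, size=n, random_state=rng)
    W0 = np.abs(rng.standard_normal((n, q)))    # half-normal factor |W0|
    W1 = rng.standard_normal((n, p))            # Gaussian error term
    L = np.linalg.cholesky(Sigma)
    return mu + np.sqrt(U)[:, None] * (W0 @ Lambda.T + W1 @ L.T)

rng = np.random.default_rng(2)
Y = sample_sungh(np.zeros(2), np.eye(2), np.array([[2.0], [1.0]]),
                 0.5, 1.0, 1.0, 100_000, rng)
print(Y.shape)          # (100000, 2)
print(Y.mean(axis=0))   # shifted away from 0 by the skewness term
```

Setting \(\varvec{\Lambda }=\mathbf{0}\) in this sampler recovers a symmetric scale mixture, illustrating how skewness enters solely through the half-normal factor.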
Note that there are some identifiability issues concerning the GIG parameters \(\varvec{\varpi } \) and the skewness matrix \(\varvec{\Lambda }\). Using (8), the density is not identifiable since, for any constant \(c>0\), the parameters \(\left( {\varvec{\mu } ,\varvec{\Sigma },\varvec{\Lambda },\upsilon ,\psi ,\eta } \right) \) and \(\left( {\varvec{\mu } ,c\varvec{\Sigma },c\varvec{\Lambda },\upsilon ,\psi /c,c\eta } \right) \) yield the same density. A simple fix which results in an identifiable density is to set \(\eta =1\), so that \(\varvec{\varpi }=\left( {\upsilon ,\psi } \right) ^{\top }\). An alternative parameterization which can provide greater flexibility is discussed in Wraith and Forbes (2015). Further, \({\varvec{W}}\) in the stochastic representation (7) can be multiplied by any permutation matrix without changing the distribution of \({\varvec{Y}}\), so sorting the columns of \(\varvec{\Lambda }\) by their norm, or some other ordering rule, is also needed to ensure identifiability of the proposed model.
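The column-permutation invariance can be seen directly: the components of \(|{\varvec{W}}_0|\) are i.i.d., so permuting the columns of \(\varvec{\Lambda }\) leaves the distribution of \(\varvec{\Lambda }|{\varvec{W}}_0|\), and hence of \({\varvec{Y}}\), unchanged. A small Monte Carlo check of the first two moments (with illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
n, q = 400_000, 3
Lam = np.array([[1.0, -0.4, 0.7],
                [0.3,  0.9, -0.2]])
P = np.eye(q)[:, [2, 0, 1]]                    # a q x q permutation matrix

W0a = np.abs(rng.standard_normal((n, q)))      # |W0| has i.i.d. half-normal
W0b = np.abs(rng.standard_normal((n, q)))      # components, so permuting the
Xa = W0a @ Lam.T                               # columns of Lambda does not
Xb = W0b @ (Lam @ P).T                         # change the law of Lambda|W0|

print(np.abs(Xa.mean(0) - Xb.mean(0)).max())   # ~0 up to Monte Carlo error
print(np.abs(np.cov(Xa.T) - np.cov(Xb.T)).max())
```

This is why an ordering constraint on the columns of \(\varvec{\Lambda }\) is required: without it, any column permutation produces an observationally equivalent model.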
Varying the scale mixing function \(\kappa \left( U \right) \) for a given distribution of U belonging to the \(\mathrm{GIG}_*\left( \varvec{\varpi } \right) \) class leads to a variety of members in the SUNGH family. Alternatively, we can fix the scale function and vary the distribution of U within the \(\mathrm{GIG}_*\left( \varvec{\varpi } \right) \) class. In the latter case, a convenient choice for the scale function is \(\kappa \left( u \right) =u\), for which the pdf in (8) becomes
where \({\varvec{{\nu }} }'= \left( {\upsilon ,\sqrt{\psi /\eta } ,\sqrt{\psi \eta } } \right) ^{ \top } \), \({\varvec{{\nu }}}''= \big ( \upsilon - p/2,\sqrt{\psi /\eta } ,q^{\prime }\big ( {\varvec{y}} \big ) \big )^{ \top } \), \({q}' \left( {\varvec{y}} \right) ^{2} = \left( {\varvec{y}} - {\varvec{\mu }} \right) ^{ \top } {\varvec{\uppsi }}^{{ - 1}} \left( {{\varvec{y}} - \varvec{\mu }} \right) + \psi \eta \), \(\varvec{\uppsi }=\varvec{\Sigma }+\varvec{\Lambda \Lambda }^{\top }\), \({\varvec{\varUpsilon }} ={\varvec{I}}_{q} -\varvec{\Lambda }^{\top }\varvec{\uppsi }^{-1}\varvec{\Lambda }\) and \({\varvec{B}}=\varvec{\Lambda }^{\top }\varvec{\uppsi }^{-1}\left( {{\varvec{y}}-{\varvec{\mu }} } \right) \); here \(\mathcal{G}\mathcal{H}_p \) and \(\mathcal{G}\mathcal{H}_q \) denote the p-variate pdf and q-variate cdf of the generalized hyperbolic distribution, respectively (Wraith and Forbes 2015).
The flexibility of the SUNGH family proposed in (8) can also be observed by varying the value of the dimension q. In fact, for \(q=1\) (the restricted case) we obtain as a special case of (8) the SN generalized hyperbolic (SNGH) distributions considered in Vilca et al. (2014), and thus some known SMSN (or SMRSN) distributions, as well as the corresponding symmetric variants for \(\varvec{\Lambda }=\mathbf{0}\).
A special case of the GIG distribution is the gamma distribution, so the proposed family of distributions covers the canonical fundamental unrestricted skew-normal (CFUSN) distribution of Arellano-Valle and Genton (2005) and the canonical fundamental unrestricted skew-t (CFUST) distribution of Lee and McLachlan (2016). Consequently, a mixture model approach covering these distributions contains finite mixtures of CFUSN and CFUST distributions. Considering (9) in the symmetric case, the SUNGH family is similar to the GH families studied by Wraith and Forbes (2015) and Browne and McNicholas (2015), but in the asymmetric case these families differ. In particular, a greater degree of flexibility is available for the SUNGH family by allowing the skewness parameter to be a \(p\times q\) matrix rather than a \(p\times 1\) vector. The SUNGH family also has several desirable properties, outlined in Propositions 2 to 6 below, which allow the family to be used in a variety of statistical models (e.g., mixed models and regression).
Known members of the SMSN family contained in the SNGH family are the SN, ST, SSL and skew-Laplace (SLP), and their respective symmetric versions. In the unrestricted case (\(q>1)\), the proposed family contains several subfamilies of distributions (symmetric and asymmetric) considered in the literature. For instance, if in (9) we let \(q=p\), \(\varvec{\Lambda }=\hbox {diag}\left( {{\lambda }_1 ,\ldots ,{\lambda }_{\mathrm{p}} } \right) \) and \(\kappa \left( u \right) =1\), then the multivariate skew-normal distribution of Sahu et al. (2003) is obtained. Finally, if \(\varvec{\Lambda }=\mathbf{0}\) (symmetric case) and \(\kappa \left( u \right) =u\), then (9) becomes the symmetric generalized hyperbolic (GH) distribution introduced by Barndorff-Nielsen and Halgreen (1977).
In the following propositions, we present some useful properties of the SUNGH family needed in the subsequent sections. The proofs of these results are presented in “Appendix.”
Proposition 1
Let \({\varvec{Y}}\sim \mathrm{SUNGH}_{p,q} \left( {\varvec{\mu } ,\varvec{\Sigma },\varvec{\Lambda },\varvec{\varpi } } \right) \). Then, the following results hold:
(a)
if \(k_1 =E\left[ {\kappa \left( U \right) ^{1/2}} \right] <\infty \), then \(E\left[ {\varvec{Y}} \right] ={\varvec{\mu }} +\sqrt{\frac{2}{\pi }}k_1 \varvec{\Lambda }{} \mathbf{1}_q \),
(b)
if \(k_2 =E\left[ {\kappa \left( U \right) } \right] <\infty \), then \(\hbox {Var}\left[ {\varvec{Y}} \right] =k_2 \varvec{\uppsi }-\frac{2}{\pi }\varvec{\Lambda }\left[ {k_2 {\varvec{I}}_{q} -\left( {k_2 -k_1^2 } \right) \mathbf{1}_q \mathbf{1}_q^\top } \right] \varvec{\Lambda }^{\top }\).
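Proposition 1 can be verified numerically. The sketch below estimates \(k_1\) and \(k_2\) directly from the same draws of U used to build Y (so only the sampler itself, here scipy's `geninvgauss` as an illustrative choice, is an assumption) and compares Monte Carlo moments of Y with the stated formulas for \(\kappa(u)=u\) and \(q=1\):

```python
import numpy as np
from scipy.stats import geninvgauss

rng = np.random.default_rng(4)
n = 400_000
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.2], [0.2, 0.8]])
Lam = np.array([[1.2], [0.6]])                 # q = 1 for simplicity

U = geninvgauss.rvs(1.0, 2.0, size=n, random_state=rng)  # illustrative mixing draws
k1, k2 = np.mean(np.sqrt(U)), np.mean(U)       # k1 ~ E[kappa(U)^{1/2}], k2 ~ E[kappa(U)]

W0 = np.abs(rng.standard_normal((n, 1)))
W1 = rng.standard_normal((n, 2))
Y = mu + np.sqrt(U)[:, None] * (W0 @ Lam.T + W1 @ np.linalg.cholesky(Sigma).T)

ones = np.ones((1, 1))                         # 1_q 1_q^T with q = 1
psi_mat = Sigma + Lam @ Lam.T
mean_theory = mu + np.sqrt(2 / np.pi) * k1 * (Lam @ ones).ravel()
var_theory = (k2 * psi_mat
              - (2 / np.pi) * Lam @ (k2 * np.eye(1) - (k2 - k1**2) * ones) @ Lam.T)
print(np.abs(Y.mean(0) - mean_theory).max())   # both discrepancies near zero
print(np.abs(np.cov(Y.T) - var_theory).max())
```

For \(q=1\) the variance formula reduces to \(k_2\varvec{\uppsi }-\frac{2}{\pi }k_1^2\varvec{\Lambda \Lambda }^{\top }\), which is what the check above exercises.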
Proposition 2
Let \({\varvec{Y}}\sim \mathrm{SUNGH}_{p,q} \left( {\varvec{\mu } ,\varvec{\Sigma },\varvec{\Lambda },\varvec{\varpi } } \right) \). Then, \({\varvec{Y}}\sim \mathrm{SUNGH}_{p,q+m} \left( {\varvec{\mu } ,\varvec{\Sigma },\varvec{\Lambda }^{*},\varvec{\varpi } } \right) \) for each \(m=1,2\),..., where \(\varvec{\Lambda }^{*}=\left( {{\begin{array}{cc} {\varvec{\Lambda }_{p\times q} }&{} {\mathbf{0}_{p\times m} } \\ \end{array} }} \right) \) or \(\varvec{\Lambda }^{*}=\left( {{\begin{array}{cc} {\mathbf{0}_{p\times m} }&{} {\varvec{\Lambda }_{p\times q} } \\ \end{array} }} \right) \).
Proposition 3
Let \({\varvec{Y}}\sim \mathrm{SUNGH}_{p,q} \left( {\varvec{\mu } ,\varvec{\Sigma },\varvec{\Lambda },\varvec{\varpi } } \right) \). Then, for each \({\varvec{b}}\in {\varvec{R}}^{n}\) and full row rank matrix \({\varvec{B}}\in {\varvec{R}}^{n\times p}\) we have
Proposition 4
Let \({\varvec{Y}}\sim \mathrm{SUNGH}_{p,q} \left( {\varvec{\mu } ,\varvec{\Sigma },\varvec{\Lambda },\varvec{\varpi } } \right) \). Partition \({\varvec{Y}}=\left( {{\varvec{Y}}_1^\top ,{\varvec{Y}}_2^\top } \right) ^{\top }\), where the first and second sub-vectors are of dimensions \(p_1 \times 1\) and \(p_2 \times 1\), respectively, with \(p_1 +p_2 =p\). The corresponding partition of the parameters \(\left( {\varvec{\mu } ,\varvec{\Sigma },\varvec{\Lambda }} \right) \) is
where \({\varvec{\mu }}_i \), \(\varvec{\Sigma }_{ii} \) and \(\varvec{\Lambda }_i \) have dimensions \(p_i \times 1\), \(p_i \times p_i \) and \(p_i \times q\), respectively, for \(i=1,2\). Then, the marginal distribution of \({\varvec{Y}}_i \) is \(\mathrm{SUNGH}_{p_i ,q} \left( {\varvec{\mu } _i ,\varvec{\Sigma }_{ii} ,\varvec{\Lambda }_i ,\varvec{\varpi } } \right) ,i=1,2\).
Proposition 5
If under the same conditions of Proposition 4, we have \(\varvec{\Sigma }_{12} =\varvec{\Sigma }_{21} =\mathbf{0}\) then a necessary and sufficient condition to have null correlation between \({\varvec{Y}}_1 \) and \({\varvec{Y}}_2 \) is that \(\varvec{\Lambda }_1 =\mathbf{0}\) or \(\varvec{\Lambda }_2 =\mathbf{0}\).
Proposition 6
Consider the same conditions of Proposition 4 with the partition of the shape matrix \(\varvec{\Lambda } =\left( {\varvec{\Lambda }_{ij} } \right) _{i,j=1,2} \), where \(\varvec{\Lambda }_{ij} \) has dimension \(p_i \times q_j\), with \(q_1 +q_2 =q\). If \(\varvec{\Sigma }_{12} =\varvec{\Sigma }_{21}^\top =\mathbf{0}\) and \(\varvec{\Lambda }_{12} =\mathbf{0}\) or \(\varvec{\Lambda }_{21} =\mathbf{0}\), then \({\varvec{Y}}_i \sim \mathrm{SUNGH}_{p_i ,q_i } \left( {\varvec{\mu } _i ,\varvec{\Sigma }_{ii} ,\varvec{\Lambda }_{ii} ,\varvec{\varpi } } \right) ,i=1,2\), and \(\hbox {Cov}\left( {{\varvec{Y}}_1 ,{\varvec{Y}}_2 } \right) =-\frac{2}{\pi }k_1^2 \varvec{\Lambda }_{12} \mathbf{1}_{q_1 } \mathbf{1}_{q_2 }^\top \varvec{\Lambda }_{21}^\top \).
4 Finite mixtures of SUNGH family
4.1 FM-SUNGH model
In this section, we consider finite mixtures of the proposed SUNGH family of distributions (hereafter FM-SUNGH). To establish notation, we consider the usual mixture model defined as
where \(\varvec{\Theta }=\big ( {\varvec{\Theta }_1 ,\ldots ,\varvec{\Theta }_K } \big )\), with \(\varvec{\Theta }_k =\big ( {\varvec{\mu }} _k ,\varvec{\Sigma }_k ,\varvec{\Lambda }_k ,{\varvec{\upsilon }}_k ,{\varvec{\psi }} _k ,{\varvec{\eta }} _k \big ), k=1,\ldots ,K\), \({\varvec{p}}=\left( {p_1 ,\ldots ,p_K } \right) ^{\top }\) (for which \(p_k >0\), \(k=1,\ldots ,K\), and \(\sum \nolimits _{k=1}^K p_k =1\)), \({\varvec{\upsilon }}_k =\left( {\upsilon _{k1} ,\ldots ,\upsilon _{kp} } \right) ^{\top }\), \({\varvec{\psi }} _k =\left( {\psi _{k1} ,\ldots ,\psi _{kp} } \right) ^{\top }\), \({\varvec{\eta }} _k = \big ( \eta _{k1} ,\ldots ,\eta _{kp} \big )^{\top }\) and \(f\left( {{\varvec{y}};\varvec{\Theta }_k } \right) \) given by (8). This model will hereafter be called FM-SUNGH. The identifiability of mixtures of distributions has been studied by Teicher (1963) and Holzmann et al. (2006); their results can be used to ensure that the FM-SUNGH model is identifiable.
The SUNGH family is a rich class of distributions, and various particular forms from this family have been considered over the last few years in the case of mixture models. In Table 1, we outline details of some of the distributions and the corresponding parameters within the SUNGH family.
Using the mixture model representation in (10), for an i.i.d. sample \({\varvec{Y}}_1 ,\ldots ,{\varvec{Y}}_n \), we can introduce latent indicator (allocation) variables \({\varvec{Z}}_1 ,\ldots ,{\varvec{Z}}_n \) to assign observations to the different components of the mixture \(\left( {k=1,\ldots ,K} \right) \). The standard assumption for the allocation random variables \(Z_1 ,\ldots ,Z_n \) is that they follow a multinomial distribution: \({\varvec{Z}}_i =\left( {Z_{i1} ,\ldots ,Z_{iK} } \right) \sim \mathrm{Multinomial}\left( {K,p_1 ,\ldots ,p_K } \right) \), so that \(P\left( {Z_i =k} \right) =p_k \); \(i=1,\ldots ,n\), \(k=1,\ldots ,K\). In terms of \(Z_i \), we can conclude that
Let \({\varvec{C}}=\left\{ {{\varvec{Y,U,W,Z}}} \right\} \) denote the complete data, where \({\varvec{Y}}=\left( {{\varvec{Y}}_1^\top ,\ldots ,{\varvec{Y}}_n^\top } \right) ^{\top }\) is the observed variable and \({\varvec{U}}=\left( {U_{11} ,\ldots ,U_{1K} ,\ldots ,U_{n1} ,\ldots ,U_{nK} } \right) ^{\top }\), \({\varvec{W}}=\big ( {\varvec{W}}_{11}^\top ,\ldots ,{\varvec{W}}_{1K}^\top ,\ldots ,{\varvec{W}}_{n1}^\top ,\ldots ,{\varvec{W}}_{nK}^\top \big )^{\top }\) and \({\varvec{Z}}=\left( {Z_1 ,\ldots ,Z_n } \right) ^{\top }\) are the latent or unobserved variables. If we consider the SUNGH stochastic representation (7) in terms of a finite mixture model for \(i=1,\ldots ,n\) and \(k=1,\ldots ,K\), a hierarchical representation is
where \(HN_q \) denotes the q-variate right half-normal distribution.
The model’s complete data likelihood function is then given by
where \(H\phi _q \left( {{\varvec{w}}|\mathbf{0},\cdot } \right) =\phi _q \left( {{\varvec{w}}|\mathbf{0},\cdot } \right) I({\varvec{w}}>\mathbf{0})\) is the q-variate right half-normal pdf.
4.2 Bayesian analysis
4.2.1 Priors
In this section, we choose priors for the parameters \(\varvec{\Theta }\) which will be used in the Applications section. Assuming independence between the different types of parameters in \(\varvec{\Theta }\), and that the skewness matrix of each mixture component is of the form \(\varvec{\Lambda }_k =\left( {\left. {\left. {{\varvec{\lambda }} _{k1} } \right| \ldots } \right| {\varvec{\lambda }} _{kq} } \right) \), prior distributions for some of the FM-SUNGH model parameters are given by
for \(k=1,\ldots ,K\), and where Dir and IW denote the Dirichlet and inverse Wishart distributions, respectively. An alternative representation of the skewness matrix priors and posteriors used in the Gibbs updates is provided in “Appendix.” Prior distributions of the scaled factor variables for \(k=1,\ldots ,K\) are:
4.2.2 Posteriors
By considering the likelihood function (15) and the priors specified previously, the joint posterior of \(\varvec{\Theta }\) is given by
The above joint posterior is intractable, but we can use MCMC methods such as Gibbs sampling and Metropolis–Hastings to draw samples from the conditional posterior distributions. To establish notation, let \(B_k =\left\{ {i: Z_i =k} \right\} \) be the set of observation indices for those \({\varvec{y}}_i \) classified into the kth cluster, and let \(n_k\) denote the number of observations allocated to the kth component (cluster). Apart from the parameters for the scaled factor variables, all conditional posterior distributions have closed form and are as follows. (Note that \(\varvec{\Theta }_{\left( {-{\varvec{\theta }} } \right) } \) denotes the set of parameters without its element \({\varvec{\theta }} \).)
\(\left. {\varvec{p}} \right| \varvec{\Theta }_{\left( {-p} \right) } ,{\varvec{y,u,w}},z_i =k\sim Dir\left( {\delta _{pos.1} ,\ldots ,\delta _{pos.K} } \right) \), where
\(\left. {{\varvec{\mu }} _k } \right| \varvec{\Theta }_{\left( {-\mu _k } \right) } ,{\varvec{y,u,w,}}z_i =k\sim N_p \left( {{\varvec{\mu }} ,\varvec{\Sigma }} \right) ,k=1,\ldots ,K\), where
\(\left. {\varvec{\Sigma }_k } \right| \varvec{\Theta }_{\left( {-\varvec{\Sigma }_k } \right) } ,{\varvec{y,u,w}},z_i =k\sim IW_{t_k +n} \left( {\varvec{T}} \right) ,k=1,\ldots ,K\), where
\(\left. {{\varvec{\lambda }} _{kt} } \right| \varvec{\Theta }_{\left( {-\lambda _{kt} } \right) } ,{\varvec{y,u,w,}}z_i =k\sim N_p \left( {{\varvec{\mu }} ,\varvec{\Sigma }} \right) ;k=1,\ldots ,K;t=1,\ldots ,p\), where
where \(\varvec{\Lambda }_{k\left( {-t} \right) } \) denotes the \(p\times \left( {q-1} \right) \) skewness matrix \(\varvec{\Lambda }_k \) with the tth column eliminated, \(\varvec{w}_{ik\left( {-t} \right) } \) denotes the \(\left( {q-1} \right) \times 1\) vector \(\varvec{w}_{ik} \) with the tth element eliminated, and \({w}_{ik\left( t \right) } \) denotes the tth element of the vector \(\varvec{w}_{ik} \).
The full conditional posterior distributions for the latent variables \(Z_i \), \(U_{ik} \) and \({\varvec{W}}_{ik} \), for \(i=1,\ldots ,n;k=1,\ldots ,K\), are given by:
\(\left. {Z_i } \right| \varvec{\Theta },{\varvec{y,u,w}}\sim \mathrm{Multinomial}\left( {K,p_{p.1} ,\ldots ,p_{p.K} } \right) \), where
\(\left. {U_{ik} } \right| \varvec{\Theta },{\varvec{y,w}},z_i =k\sim \mathrm{GIG}^{*}\left( {\hbox {a}_u ,\hbox {b}_u ,\hbox {c}_u } \right) \), where \(\kappa \left( u \right) =u\) and
\(\left. {{\varvec{W}}_{ik} } \right| \varvec{\Theta },{\varvec{y,u}},z_i =k\sim HN_q \left( {{\varvec{\mu }} ,\varvec{\Sigma }} \right) \), where
Finally, the full conditional posteriors for the scaled factor variables \(\upsilon _k ,\psi _k ,\eta _k ,k=1,\ldots ,K\), are as follows: \(\left. {\eta _k } \right| \varvec{\Theta }_{\left( {-\eta _k } \right) } ,{\varvec{u,w}},z_i =k\sim \mathrm{GIG}^{*}\left( {\hbox {a}_\eta ,\hbox {b}_\eta ,\hbox {c}_\eta } \right) \), where
The full conditional posterior density of \(\upsilon _k ,k=1,\ldots ,K\) is proportional to:
where \(\pi _1 \left( {\upsilon _k } \right) =\left( {K_{\upsilon _k } \left( {\psi _k } \right) } \right) ^{-n_k }\mathop \prod \limits _{B_k } \left( {u_{ik} /\eta _k } \right) ^{\upsilon _k }\).
The full conditional posterior density of \(\psi _k ,k=1,\ldots ,K\) is also proportional to:
where \(\pi _2 \left( {\psi _k } \right) =\left( {K_{\upsilon _k } \left( {\psi _k } \right) } \right) ^{-n_k }\) and \(E\left( \varphi \right) \) denotes the density of the exponential distribution with rate parameter \(\varphi \).
Note that (24) and (25) do not have closed forms, but a Metropolis–Hastings or rejection sampling step can be embedded in the MCMC scheme to obtain draws from them.
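A minimal sketch of such an embedded random-walk Metropolis–Hastings update is shown below. The target here is a generic unnormalized log-density standing in for the log of (24) or (25), which we do not reproduce; all function names are illustrative:

```python
import math
import random

def mh_step(current, log_target, rng, step=0.5):
    """One random-walk Metropolis-Hastings update for a scalar parameter.

    log_target evaluates the log of an unnormalized full conditional
    (e.g. the log of (24) or (25)) pointwise; nothing else is required.
    """
    proposal = current + rng.gauss(0.0, step)
    log_ratio = log_target(proposal) - log_target(current)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal          # accept the proposal
    return current               # reject and keep the current value

# Illustrative target: a standard normal log-density, up to a constant.
log_target = lambda v: -0.5 * v * v
rng = random.Random(42)
chain = [0.0]
for _ in range(4000):
    chain.append(mh_step(chain[-1], log_target, rng))
```

Within the Gibbs scheme, one such step per iteration for each of \(\upsilon _k \) and \(\psi _k \) suffices; the remaining parameters keep their closed-form updates.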
5 Applications
In this section, we present a simulation study and applications to two real datasets to evaluate the performance of the proposed SUNGH model for clustering problems. For illustrative purposes, we set K equal to two for all models presented.
5.1 Simulated data
To illustrate some of the differences between models within the SUNGH family, we consider the case of two clusters, each sampled from a four-dimensional SUNGH distribution with known parameters, which are slightly separated from each other. For the first and second clusters
respectively. Both clusters share the same parameters for the \(\mathrm{GIG}_*\left( {\upsilon ,\psi ,\eta } \right) \) distribution, with \(\upsilon =-0.5\), \(\psi =1\) and \(\eta =1\). The sample sizes for the two clusters are 300 and 450, respectively. A plot of the simulated data is shown in Fig. 1, with the observations belonging to each cluster labeled by different colors.
For estimation of the different models, largely non-informative prior distributions were used for each of the component parameters: \({\varvec{\mu }} =\left( {\mu _1 ,\ldots ,\mu _4 } \right) ^{\top }\sim N_4 \left( {\mathbf{0},\varvec{\Sigma }} \right) \), where \(\varvec{\Sigma }=10^{3}\varvec{I}_4 \); \(\varvec{\Sigma }\sim IW_\tau \left( {\varvec{T}} \right) \), where \(\tau =4\) and \({\varvec{T}}=\varvec{I}_4 \); a skewness matrix \(\varvec{\Lambda }_{4\times 2} \) whose columns have priors \(\varvec{\lambda }_t \sim N_4 \left( {{\varvec{\ell }} _t ,{\varvec{L}}_t } \right) \), where \({\varvec{\ell }} _t =\mathbf{0}\) and \({\varvec{L}}_t =10^{3}\varvec{I}_4 \) for \(t=1,2\) (equivalently, the matrix variate prior \(\varvec{\Lambda }_{4\times 2} \sim MN_{4,2} \left( {\mathbf{0},10^{3}\varvec{I}_4 ,10^{3}\varvec{I}_2 } \right) \) may be used); \(\upsilon \sim N\left( {{ 0},10^{3}} \right) \); \(\eta \sim \mathrm{GIG}^{*}\left( {0.001,2000,0} \right) \); \(\psi \sim \exp \left( {0.1} \right) \); and \({\varvec{p}}\sim Dir\left( {1,\ldots ,1} \right) \). Also, we chose \(\kappa \left( u \right) =u\) for the scale mixing function. All computations are implemented in the R software version 3.3.1 (R Core Team 2017) on an Intel Core i7 760 processor (2.8 GHz). The R and Nimble code for the models is available from the authors upon request. Gibbs sampling runs of 60,000 iterations with a burn-in of 30,000 were used, and convergence was assessed using the Gelman–Rubin statistic (Gelman and Rubin 1992) and by visual inspection. To address the issue of label switching over the MCMC iterations (Mengersen et al. 2011), we used the maximum a posteriori (MAP) estimate to select one of the \(K!\) modal regions and a distance-based measure on the space of parameters to relabel parameters in proximity to this region (Celeux et al. 2000).
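For reference, the Gelman–Rubin potential scale reduction factor used for convergence checking can be computed from parallel post burn-in chains as in the following sketch; the function name and interface are our own, not taken from the paper's code:

```python
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor for parallel chains of one scalar.

    chains: list of equal-length lists of post burn-in draws.
    Values near 1 indicate approximate convergence.
    """
    n = len(chains[0])                   # draws per chain
    means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean(statistics.variance(c) for c in chains)
    between = n * statistics.variance(means)
    var_hat = (n - 1) / n * within + between / n   # pooled variance estimate
    return (var_hat / within) ** 0.5

# Two short, well-overlapping chains give an R-hat just below 1.
rhat = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 1.0, 4.0]])
```

In this example both chains have the same mean, so the between-chain variance is zero and the statistic reduces to \(\sqrt{(n-1)/n}\).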
Model performance was assessed by comparing the classification accuracy and model selection criteria for different distributions within the family of SUNGH models (see Table 2). For classification accuracy, we report the adjusted Rand index (ARI) (Hubert and Arabie 1985), which ranges from 0 (no match) to 1 (perfect match). We also report the EAIC and EBIC, which are variations of the classical AIC and BIC criteria for use in a Bayesian setting (Carlin and Louis 2011); lower values indicate a better fit. In a mixture setting, it is also possible to compare DIC values using one of the measures suggested by Celeux et al. (2006).
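The ARI itself is simple to compute from the contingency table of two partitions; the sketch below (function name illustrative) implements the Hubert and Arabie (1985) formula:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same n items.

    Returns 1 for identical partitions (up to relabeling) and values
    near 0 for agreement at chance level.
    """
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))   # contingency table cells
    rows = Counter(labels_a)                   # row marginals
    cols = Counter(labels_b)                   # column marginals
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    return (sum_cells - expected) / (max_index - expected)

# The same partition under relabeled cluster ids still scores 1.
ari = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # -> 1.0
```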
As expected, Table 2 shows quite clearly that the classification performance of the true model (SUNGH \((q=2)\)) is very good, with an ARI of 0.87, and the model choice criteria all appear to favor this model. A higher log-likelihood was found for the SUNGH \((q=3)\) model, with an ARI score similar to that of the SUNGH \((q=2)\) model, but on the other criteria this model was not favored due to the extra parameters involved. In applied settings, where the true labels are unknown, a similar trade-off will be made between choosing more complex models with extra flexibility in the skewness matrix (higher q values) and the relative improvement in model choice or goodness-of-fit measures. The performance of the SN and SNGH models is also to be expected, given the relative lack of flexibility of the skewness parameter to accommodate the degree of skewness in all dimensions in this application.
5.2 Real applications
5.2.1 AIS example
In this example, we consider a dataset from the Australian Institute of Sport (AIS) containing measurements for 202 athletes (102 male and 100 female): sex, red cell count, white cell count, hematocrit, hemoglobin, plasma ferritin concentration, body mass index, sum of skin folds, body fat percentage, lean body mass, and the height and weight of the athletes (Cook and Weisberg 1994). The data are available in the R package “sn” (see Azzalini 2015).
To assess the performance of the proposed SUNGH model, we use BMI and body fat percentage (Bfat) to classify male and female athletes. Figure 2a shows the observations for male (in black) and female (in red) athletes according to these two measures, suggesting a reasonably skewed distribution for both males and females with a particularly strong skewed and heavy-tailed distribution for male athletes. Figure 2b–d also shows the fitted contours and assigned labels for each observation for several of the models examined (SNGH, SN and SUNGH).
Table 3 presents the model choice criteria for the different models examined. The results suggest that the SUNGH has the highest log-likelihood and the lowest EAIC, but the SN model has the lowest values for the EBIC and DIC\(_{2}\) measures. However, the ARI for the SN model (\(=0.52\)) is considerably lower than that for the SUNGH model (\(=0.79\)), suggesting more support for the SUNGH model in terms of classification accuracy. These results are also reflected visually in Fig. 2, with the SUNGH (Fig. 2d) able to represent the skewed nature of the distribution for the two groups, particularly for the male athletes. In contrast, the SN model (Fig. 2c) poorly represents the skewed distribution of the female athletes and the heavy-tailed nature of the distribution for the male athletes. As expected, the classification results for the SNGH model (Fig. 2b) are visually similar to those for the SUNGH; however, small differences (due to the reduced flexibility of the skewness parameter for the SNGH) can be observed, which greatly impact the classification accuracy (ARI = 0.64).
5.2.2 Lymphoma example
In another example, we examine a clustering problem for a lymphoma dataset analyzed by Lee and McLachlan (2013b). The data are a subset of those originally presented and collected by Maier et al. (2007), in which blood samples from 30 subjects were stained with four fluorophore-labeled antibodies against CD4, CD45RA, SLP 76 (pY 128) and ZAP 70 (pY 292) before and after an anti-CD3 stimulation. To illustrate the performance of different distributions within the SUNGH family, we will look at clustering a subset of the data containing the variables CD4 and ZAP70 (Fig. 3), which appear to be bimodal and display an asymmetric pattern. In particular, the largest mode appears to show both strong correlation between the two variables and substantial skewness in both dimensions.
From Fig. 3, we can see a clear difference between the SNGH and SUNGH models, with the latter providing a closer fit to the two groups visible in the data. This is further supported by the model choice criteria, with all three measures (EAIC, EBIC and DIC\(_{2}\)) favoring the SUNGH model over the SNGH (Table 4). The SUNGH model is also preferred over the SN model (Fig. 3c), with the latter not appearing to fit or represent the larger component in the data. Overall, the SUNGH model appears to fit the two groups in these data quite well, with the lowest values for two of the three model choice measures (EAIC and EBIC). Using the DIC\(_{2}\) criterion, the lowest value appears to be for the SUT model, so there is some support for this model in terms of model choice. However, the difference between the DIC\(_{2}\) values for this model and the SUNGH is not great (\(\mathrm{SUNGH}=7201.3; \mathrm{SUT}=7198.5\)), suggesting little difference in terms of this measure.
6 Conclusion
We have proposed a flexible family of unrestricted skew-normal generalized hyperbolic (SUNGH) distributions for application in clustering problems, capable of representing distributions of asymmetric and heavy-tailed forms. The family contains several other well-known symmetric and asymmetric families of distributions, such as the scale mixtures of skew-normal (SMSN) family, as special cases. Various properties of the SUNGH family are well defined, and estimation of the parameters is relatively straightforward in a Bayesian framework, with most of the Gibbs sampling updates available in closed form. Assessments of the performance of the proposed model on simulated and real data suggest that the family provides a considerable degree of freedom and flexibility in modeling data of varying tail behavior and directional shape. As this family of distributions and the parameterization we have adopted preserve several important properties (e.g., closure under linear combinations), the SUNGH family can be used in a variety of other statistical models (e.g., linear multilevel/mixed models and regression).
References
Andrews, D.F., Mallows, C.L.: Scale mixtures of normal distributions. J. Roy. Stat. Soc. B 36, 99–102 (1974)
Arellano-Valle, R.B., Azzalini, A.: On the unification of families of skew-normal distributions. Scand. J. Stat. 33, 561–574 (2006)
Arellano-Valle, R.B., Genton, M.G.: On fundamental skew distributions. J. Multivar. Anal. 96, 93–116 (2005)
Arellano-Valle, R.B., Genton, M.G.: Multivariate unified skew-elliptical distributions. Chil. J. Stat. 2, 17–34 (2010)
Arellano-Valle, R.B., Branco, M.D., Genton, M.G.: A unified view on skewed distributions arising from selections. Can. J. Stat. 34, 581–601 (2006)
Arellano-Valle, R.B., Bolfarine, H., Lachos, V.H.: Bayesian inference for skew-normal linear mixed models. J. Appl. Stat. 33, 561–574 (2007)
Azzalini, A.: Package ‘sn’. http://azzalini.stat.unipd.it/SN (2015). Accessed 13 May 2017
Azzalini, A., with the collaboration of Capitanio, A.: The Skew-Normal and Related Families. IMS Monographs Series. Cambridge University Press (2014)
Barndorff-Nielsen, O.: Hyperbolic distributions and distributions on hyperbolae. Scand. J. Stat. 5, 151–157 (1978)
Barndorff-Nielsen, O., Blaesild, P.: Hyperbolic distributions. In: Kotz, S., Johnson, N.L., Read, C. (eds.) Encyclopedia of Statistical Sciences, vol. 3. Wiley, New York (1980)
Barndorff-Nielsen, O., Halgreen, C.: Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 38, 309–311 (1977)
Basso, R.M., Lachos, V.H., Cabral, C.R.B., Ghosh, P.: Robust mixture modeling based on the scale mixtures of skew-normal distributions. Comput. Stat. Data Anal. 54, 2926–2941 (2010)
Böhning, D.: Computer-Assisted Analysis of Mixtures and Applications. Meta-Analysis, Disease Mapping and Others. Chapman & Hall, Boca Raton (2000)
Branco, M.D., Dey, D.K.: A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 79, 99–113 (2001)
Browne, R.P., McNicholas, P.D.: A mixture of generalized hyperbolic distributions. Can. J. Stat. 43(2), 176–198 (2015)
Carlin, B.P., Louis, T.A.: Bayesian Methods for Data Analysis. CRC Press, Boca Raton (2011)
Celeux, G., Hurn, M., Robert, C.P.: Computational and inferential difficulties with mixture posterior distributions. J. Am. Stat. Assoc. 95, 957–970 (2000)
Celeux, G., Forbes, F., Robert, C.P., Titterington, D.M.: Deviance information criteria for missing data models. Bayesian Anal. 1, 651–674 (2006)
Chhikara, R.S., Folks, J.L.: The Inverse Gaussian Distribution. Marcel Dekker, New York (1989)
Cook, R.D., Weisberg, S.: An Introduction to Regression Graphics. Wiley, New York (1994)
Forbes, F., Wraith, D.: A new family of multivariate heavy-tailed distributions with variable marginal amounts of tail weight: application to robust clustering. Stat. Comput. 24(6), 971–984 (2014)
Franczak, B.C., Browne, R.P., McNicholas, P.D.: Mixtures of shifted asymmetric laplace distributions. IEEE Trans. Pattern Anal. Mach. Intell. 36(6), 1149–1157 (2014)
Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer Series in Statistics. Springer, Berlin (2006)
Frühwirth-Schnatter, S., Pyne, S.: Bayesian inference for finite mixtures of skew-normal and skew-t distributions. Biostatistics 11(2), 317–336 (2010)
Gelman, A., Rubin, D.B.: Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–511 (1992)
Genton, M.G.: Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman & Hall, Boca Raton (2004)
Good, I.J.: The population frequencies of species and the estimation of population parameters. Biometrika 40, 237–260 (1953)
Hogan, J.W., Laird, N.M.: Mixture models for the joint distribution of repeated measures and event times. Stat. Med. 16, 239–258 (1997)
Holzmann, H., Munk, A., Gneiting, T.: Identifiability of finite mixtures of elliptical distributions. Scand. J. Stat. 33(4), 753–763 (2006)
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 193–218 (1985)
Johnson, N.L., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions, vol. 1. Wiley, New York (1994)
Jørgensen, B.: Statistical Properties of the Generalized Inverse Gaussian distribution. Springer, New York (1982)
Karlis, D., Santourian, A.: Model-based clustering with non-elliptically contoured distributions. Stat. Comput. 19(1), 73–83 (2009)
Lachos, V.H., Bolfarine, H., Arellano-Valle, R.B.: Likelihood-based inference for multivariate skew-normal regression models. Commun. Stat. Theory Methods 36(9), 1769–1786 (2007)
Lachos, V.H., Ghosh, P., Arellano-Valle, R.B.: Likelihood based inference for skew-normal independent linear mixed models. Stat. Sin. 20, 303–322 (2010)
Lee, S.X., McLachlan, G.J.: Model-based clustering and classification with non-normal mixture distributions. Stat. Methods Appl. 22(4), 427–454 (2013a)
Lee, S.X., McLachlan, G.J.: On mixtures of skew normal and skew t distributions. Adv. Data Anal. Classif. 7(3), 241–266 (2013b)
Lee, S.X., McLachlan, G.J.: Finite mixtures of multivariate skew t distributions: some recent and new results. Stat. Comput. 24, 181–202 (2014)
Lee, S.X., McLachlan, G.J.: Finite mixtures of canonical fundamental skew t-distributions: the unification of the restricted and unrestricted skew t-mixture models. Stat. Comput. 26, 573–589 (2016)
Lin, T.I.: Maximum likelihood estimation for multivariate skew normal mixture models. J. Multivar. Anal. 100(2), 257–265 (2009)
Lin, T.I.: Robust mixture modeling using multivariate skew t distributions. Stat. Comput. 20(3), 343–356 (2010)
Lin, T.I., Lee, J.C., Yen, S.Y.: Finite mixture modeling using the skew-normal distribution. Stat. Sin. 17(b), 909–927 (2007)
Lin, T.I., Ho, H.J., Chen, C.L.: Analysis of multivariate skew normal models with incomplete data. J. Multivar. Anal. 100(10), 2337–2351 (2009)
Maier, L.M., Anderson, D.E., De Jager, P.L., Wicker, L.S., Hafler, D.A.: Allelic variant in CTLA4 alters t cell phosphorylation patterns. Proc. Natl. Acad. Sci. USA 104, 18607–18612 (2007)
Maleki, M., Arellano-Valle, R.B.: Maximum a-posteriori estimation of autoregressive processes based on finite mixtures of scale-mixtures of skew-normal distributions. J. Stat. Comput. Simul. 87(6), 1061–1083 (2017)
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, Chichester (2000)
McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press, Princeton (2005)
Mengersen, K., Robert, C., Titterington, D.M.: Mixtures: Estimation and Applications. Wiley, Chichester (2011)
Morris, K., McNicholas, P.D., Punzo, A., Browne, R.P.: Robust Asymmetric Clustering. ArXiv e-print arxiv:1402.6744 (2014)
Pyne, S., Hu, X., Wang, K., Rossin, E., Lin, T.I., Maier, L.M., Baecher-Allan, C., McLachlan, G.J., Tamayo, P., Hafler, D.A., De Jager, P.L., Mesirov, J.P.: Automated high-dimensional flow cytometric data analysis. Proc. Natl. Acad. Sci. 106(21), 8519–8524 (2009)
R Core Team.: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2017). Accessed 20 June 2017
Sahu, S.K., Dey, D.K., Branco, M.D.: A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 31(2), 129–150 (2003)
Seshadri, V.: The Inverse Gaussian Distribution: A Case Study in Exponential Families. Oxford University Press, New York (1993)
Teicher, H.: Identifiability of finite mixtures. Ann. Math. Stat. 34(4), 1265–1269 (1963)
Vilca, F., Balakrishnan, N., Zeller, C.B.: Multivariate skew-normal generalized hyperbolic distribution and its properties. J. Multivar. Anal. 128, 73–85 (2014)
Vrbik, I., McNicholas, P.D.: Analytic calculations for the EM algorithm for multivariate skew-t mixture models. Stat. Probab. Lett. 82(6), 1169–1174 (2012)
Wang, H.X., Zhang, Q.B., Luo, B., Wei, S.: Robust mixture modelling using multivariate t-distribution with missing information. Pattern Recogn. Lett. 25(6), 701–710 (2004)
Wang, K., Ng, S.K., McLachlan, G.J.: Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. In: Digital Image Computing: Techniques and Applications, Los Alamitos, California, pp. 526–531. IEEE (2009)
Wraith, D., Forbes, F.: Location and scale mixtures of Gaussians with flexible tail behaviour: properties, inference and application to multivariate clustering. Comput. Stat. Data Anal. 90(Oct.), 61–73 (2015)
Acknowledgements
The authors would like to thank the coordinating editor and anonymous reviewers for their suggestions, corrections and encouragement, which helped us to improve earlier versions of the manuscript.
Appendix
1.1 A.1. Proof of Propositions 1 to 6
In this appendix, we prove Propositions 1 to 6.
Proof of Proposition 1
By considering (7),
- (a):
- (b):
\(\square \)
Proof of Proposition 2
By considering the stochastic representation (7) and the fact that the components of \(\varvec{W}_{0} \) (and hence of \({\varvec{W}}\)) are uncorrelated, the result follows. In the case of \(\varvec{\Lambda }^{*}=\left( {{\begin{array}{cc} {\varvec{\Lambda }_{p\times q} }&{} {\mathbf{0}_{p\times m} } \\ \end{array} }} \right) \), relation (7) for \({\varvec{Y}}\sim \mathrm{SUNGH}_{p,q+m} \left( {{\varvec{\mu }} ,\varvec{\Sigma },\varvec{\Lambda }^{*},\varpi } \right) \) is equivalent to \({\varvec{Y}}={\varvec{\mu }} +\varvec{\Lambda }^{*}{\varvec{W}}+\kappa \left( U \right) ^{1/2}\varvec{\Sigma }^{1/2}{\varvec{W}}_1 ={\varvec{\mu }} +\varvec{\Lambda }{\varvec{W}}^{\left( 1 \right) }+\kappa \left( U \right) ^{1/2}\varvec{\Sigma }^{1/2}{\varvec{W}}_1 \), where \({\varvec{W}}^{\left( 1 \right) }\) is the first q components of \({\varvec{W}}\), and in the case of \(\varvec{\Lambda }^{*}=\left( {{\begin{array}{cc} {\mathbf{0}_{p\times m} }&{} {\varvec{\Lambda }_{p\times q} } \\ \end{array} }} \right) \), relation (7) for \({\varvec{Y}}\sim \mathrm{SUNGH}_{p,q+m} \left( {{\varvec{\mu }} ,\varvec{\Sigma },\varvec{\Lambda }^{*},{\varvec{\varpi }} } \right) \) is equivalent to \({\varvec{Y}}={\varvec{\mu }} +\varvec{\Lambda }^{*}{\varvec{W}}+\kappa \left( U \right) ^{1/2}\varvec{\Sigma }^{1/2}{\varvec{W}}_1 ={\varvec{\mu }} +\varvec{\Lambda }{\varvec{W}}^{\left( 2 \right) }+\kappa \left( U \right) ^{1/2}\varvec{\Sigma }^{1/2}{\varvec{W}}_1 \), where \({\varvec{W}}^{\left( 2 \right) }\) is the last q components of \({\varvec{W}}\). \(\square \)
Proof of Proposition 3
By considering the stochastic representation (7), we have that \({\varvec{b}}+{\varvec{BY}}={\varvec{b}}+{\varvec{B}}{\varvec{\mu }} +{\varvec{B}}\varvec{\Lambda }{\varvec{W}}+\kappa \left( U \right) ^{1/2}\left( {{\varvec{B}}\varvec{\Sigma }{\varvec{B}}^{\top }} \right) ^{1/2}{\varvec{W}}_1 \)\(\square \)
Proof of Proposition 4
By considering Proposition 3, with \({\varvec{b}}=\mathbf{0}\) and the matrix \({\varvec{B}}\) in the form \(\left( {{\begin{array}{cc} {{\varvec{I}}_{p_1 } }&{} {{\varvec{0}}_{p_1 \times p_2 } } \\ \end{array} }} \right) \) or \(\left( {{\begin{array}{cc} {{\varvec{0}}_{p_2 \times p_1 } }&{} {{\varvec{I}}_{p_2 } } \\ \end{array} }} \right) \), respectively, the result follows. \(\square \)
Proof of Proposition 5
Since \({\varvec{Y}}=\left( {{\varvec{Y}}_1^\top ,{\varvec{Y}}_2^\top } \right) ^{\top }\), from part (b) of Proposition 1, we have \(\hbox {Var}\left[ {\varvec{Y}} \right] =\left( {\hbox {Cov}\left( {{\varvec{Y}}_i ,{\varvec{Y}}_j } \right) } \right) _{i,j=1,2} =\left( {\varvec{\Sigma }_{ij} +\varvec{\Lambda }_i \left[ {\left( {k_2 -k_1^2 } \right) \frac{2}{\pi }{} \mathbf{1}_q \mathbf{1}_q^\top -\frac{2}{\pi }k_2 I_q } \right] \varvec{\Lambda }_j^\top } \right) \). Thus, if \(\varvec{\Sigma }_{12} =\mathbf{0}\), then \(\hbox {Cov}\left( {{\varvec{Y}}_1 ,{\varvec{Y}}_2 } \right) =\varvec{\Lambda }_1 \big [ \left( {k_2 -k_1^2 } \right) \frac{2}{\pi }{} \mathbf{1}_q \mathbf{1}_q^\top -\frac{2}{\pi }k_2 I_q \big ]\varvec{\Lambda }_2^\top \), from which it follows that either of the conditions \(\varvec{\Lambda }_1 =\mathbf{0}\) or \(\varvec{\Lambda }_2 =\mathbf{0}\) leads to \(\hbox {Cov}\left( {{\varvec{Y}}_1 ,{\varvec{Y}}_2 } \right) =\mathbf{0}\). \(\square \)
Proof of Proposition 6
The first part follows by applying Proposition 2 within Proposition 4. For the proof of the second result, note from the proof of Proposition 5 that
Thus, using the partitions \({\varvec{I}}_q =\hbox {diag}\left( {{\varvec{I}}_{q_1 } ,{\varvec{I}}_{q_2 } } \right) \) and \(\mathbf{1}_q =\left( {\mathbf{1}_{q_1 }^\top ,\mathbf{1}_{q_2 }^\top } \right) ^{\top }\), the result follows. \(\square \)
1.2 A.2. Matrix variate priors for skewness matrix
Consider matrix variate priors of the form \(\varvec{\Lambda }_k \sim MN_{p,q} \left( {{\varvec{N}}_k ,{\varvec{S}}_k ,{\varvec{F}}_k } \right) ,k=1,\ldots ,K\), where MN denotes the matrix normal distribution. These lead to the following posteriors in place of (19):
\(\left. {\hbox {vec}(\varvec{\Lambda }_k )} \right| \varvec{\Theta }_{\left( {-\varvec{\Lambda }_k } \right) } ,{\varvec{y}},{\varvec{u}},{\varvec{w}},z_i =k\sim N_{pq} \left( {{\varvec{\mu }} ,\varvec{\Sigma }} \right) ;k=1,\ldots ,K\), where
where \({\varvec{L}}_{ik} ={\varvec{w}}_{ik} {\varvec{w}}_{ik}^\top \) and \({\varvec{M}}_{ik} =\left( {{\varvec{y}}_i -{\varvec{\mu }} _k } \right) {\varvec{w}}_{ik}^\top \), for which \(\otimes \) denotes the Kronecker product and \(\hbox {vec}\) denotes the vectorization of a matrix (a linear transformation which converts the matrix into a column vector).
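The identity underlying this representation, \(\hbox {vec}\left( {{\varvec{AXB}}} \right) =\left( {{\varvec{B}}^{\top }\otimes {\varvec{A}}} \right) \hbox {vec}\left( {\varvec{X}} \right) \) with column-major vec, can be checked numerically; the pure-Python sketch below (helper names are our own) verifies it for one small example:

```python
def matmul(A, B):
    """Matrix product of two lists-of-rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def kron(A, B):
    """Kronecker product of A and B."""
    return [[A[i][j] * B[p][q]
             for j in range(len(A[0])) for q in range(len(B[0]))]
            for i in range(len(A)) for p in range(len(B))]

def vec(A):
    """Column-major vectorization (stack the columns of A)."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

A = [[1.0, 2.0], [3.0, 4.0]]
X = [[5.0, 6.0], [7.0, 8.0]]
B = [[9.0, 1.0], [2.0, 3.0]]

lhs = vec(matmul(matmul(A, X), B))   # vec(A X B)
km = kron(transpose(B), A)           # B^T Kronecker A
vx = vec(X)
rhs = [sum(km[i][j] * vx[j] for j in range(len(vx))) for i in range(len(km))]
```

Both sides evaluate to the same length-4 vector, which is what makes the matrix normal prior for \(\varvec{\Lambda }_k \) interchangeable with a \(pq\)-variate normal prior on \(\hbox {vec}(\varvec{\Lambda }_k )\).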
Using these forms for the Gibbs updates may improve mixing and convergence to a stationary distribution. However, they involve matrix variate distributions with which users may not be familiar; hence, a simpler (computational) update is provided in the main text.
Cite this article
Maleki, M., Wraith, D. & Arellano-Valle, R.B. Robust finite mixture modeling of multivariate unrestricted skew-normal generalized hyperbolic distributions. Stat Comput 29, 415–428 (2019). https://doi.org/10.1007/s11222-018-9815-5