
1 Introduction

GMMs are semi-parametric density functions represented as a weighted sum of Gaussian densities called components. They can approximate practically any density shape using a finite number of components, and they are very useful for data clustering, in particular for image segmentation, where they have demonstrated good performance [4, 5]. Although using many components increases the approximation accuracy, it often results in complex models which overfit the data. To obtain less complex models, two approaches are used. The first approach (horizontal) generates multiple GMMs with different numbers of components and chooses the best one according to an information-theoretic criterion (e.g., AIC, BIC) [2, 11]. The second approach (vertical), which is the focus of our work, starts from a GMM with a high number of components and reduces it to a less complex GMM by iteratively fusing components into single Gaussians [8, 13].

A simple method to reduce a GMM is to eliminate the components that contribute little to the mixture. Alternatively, instead of deleting components or re-estimating the model, one can merge components while preserving the structure of the data [13]. The idea is to merge similar components that “belong together” in a clustering sense, which requires formalizing the notion of “component similarity”. Among the measures in use, the Kullback-Leibler divergence (KLD) is the most popular [17]. However, there is no closed-form expression for the KLD between two GMMs [16]; because of this limitation, several existing methods approximate the KLD using sampling [13, 14, 17], which can be computationally expensive, especially when the dimensionality of the data is very high. On the other hand, classical mixture reduction fuses two components of the GMM at a time into a single Gaussian, which in the long run can produce broad components with large variance. In other words, the result of merging two or more Gaussians is not necessarily Gaussian-distributed, as is the case, for example, in foreground image segmentation, where the foreground (resp. background) regions are not necessarily Gaussian [3, 5].

In this paper, we introduce a new technique that reduces a GMM into one or several small GMMs describing the different structures of the data. More specifically, an initial GMM is reduced iteratively by clustering its components into separate small GMMs in a hierarchical way. To facilitate component clustering, we use the Cauchy-Schwarz divergence (CSD), which has a closed-form expression and a computational complexity that varies linearly with the data dimension. We apply our reduction algorithm to foreground segmentation in still images, namely salient object and skin lesion segmentation. Comparisons with state-of-the-art methods demonstrate the performance of our method.

The rest of this paper is organized as follows: Sect. 2 briefly describes the theoretical background of our algorithm. Section 3 presents the details of our proposed method for the mixture reduction problem. Section 4 presents qualitative and quantitative experimental results. We end the paper with a conclusion and perspectives for future work.

2 Background

Given a random vector \(x\in R^{D}\), we consider two GMMs with K and M mixture components, respectively. Let f(x) and g(x) denote their respective probability density functions (PDFs):

$$\begin{aligned} f(x)=\sum \limits _{k=1}^{K}\pi _{k}\mathcal{N}(x\vert \mathbf {\mu }_{k},\mathbf {\Sigma }_{k}), \qquad g(x)=\sum \limits _{m=1}^{M}\omega _{m}\mathcal{N}(x\vert \mathbf {\nu }_{m}, \mathbf {\Lambda }_{m}) \end{aligned}$$
(1)

where \((\pi _{k}, \mathbf {\mu }_{k}, \mathbf {\Sigma }_{k})\), \(k=1,...,K\), are the weights, mean vectors and covariance matrices of the Gaussians composing mixture f, and \((\omega _{m}, \mathbf {\nu }_{m}, \mathbf {\Lambda }_{m})\), \(m=1,...,M\), are the corresponding parameters of mixture g. \(\mathcal{N}(x\vert \mathbf {\mu }_{k}, \mathbf {\Sigma }_{k})\) and \(\mathcal{N}(x\vert \mathbf {\nu }_{m}, \mathbf {\Lambda }_{m})\) are multivariate Gaussian densities.

The Cauchy-Schwarz divergence is based on the Cauchy-Schwarz inequality for inner products, and it is formulated between f and g as follows [16]:

$$\begin{aligned} CSD(f,g)= -\log \left( \frac{\int f(x)g(x)dx}{\sqrt{\int f(x)^2dx \int g(x)^2dx}}\right) \end{aligned}$$
(2)

The term inside the parentheses takes its values in the interval (0, 1], with equality to 1 if and only if \(f(x) = g(x)\), so that the CSD is always nonnegative and symmetric. A closed-form expression for the CSD of a pair of GMMs can be derived using the Gaussian product identity, which applies to each of the integrals forming (2). We obtain a closed-form expression that does not depend on x:

$$\begin{aligned} CSD(f,g)= & {} -\log \Bigg (\sum \limits _{k=1}^{K}\sum \limits _{m=1}^{M}\pi _{k}\omega _{m}\mathcal{N}(\mathbf {\mu }_{k}\vert \mathbf {\nu }_{m}, (\mathbf {\Sigma }_{k}+\mathbf {\Lambda }_{m}))\Bigg ) \nonumber \\+ & {} \frac{1}{2}\log \Bigg (\sum \limits _{k=1}^{K}\sum \limits _{k'=1}^{K}\pi _{k}\pi _{k'}\mathcal{N}(\mathbf {\mu }_{k}\vert \mathbf {\mu }_{k'}, (\mathbf {\Sigma }_{k}+\mathbf {\Sigma }_{k'}))\Bigg ) \nonumber \\+ & {} \frac{1}{2}\log \Bigg ( \sum \limits _{m=1}^{M}\sum \limits _{m'=1}^{M}\omega _{m}\omega _{m'}\mathcal{N}(\mathbf {\nu }_{m}\vert \mathbf {\nu }_{m'}, (\mathbf {\Lambda }_{m}+\mathbf {\Lambda }_{m'}))\Bigg ) \end{aligned}$$
(3)

This expression has a computational complexity of order \(O(K^{2})\), and no integral computation is required as in the case of the KLD. The CSD is therefore chosen in this work because its closed form results in a faster algorithm.
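To make the computation concrete, the following is a minimal sketch of (3) in Python. It assumes each GMM is given as a triple of arrays (weights, means, full covariance matrices); the helper name `_cross_term` is ours, introduced for this illustration.

```python
# A minimal sketch of the closed-form CSD of Eq. (3) between two GMMs,
# each given as (weights, means, covariances) with full covariances.
import numpy as np
from scipy.stats import multivariate_normal

def _cross_term(w1, mu1, cov1, w2, mu2, cov2):
    """Sum over k, m of w1_k * w2_m * N(mu1_k | mu2_m, cov1_k + cov2_m)."""
    total = 0.0
    for wk, mk, Sk in zip(w1, mu1, cov1):
        for wm, mm, Sm in zip(w2, mu2, cov2):
            total += wk * wm * multivariate_normal.pdf(mk, mean=mm, cov=Sk + Sm)
    return total

def csd(f, g):
    """Cauchy-Schwarz divergence between GMMs f and g, as in Eq. (3)."""
    (wf, muf, covf), (wg, mug, covg) = f, g
    return (-np.log(_cross_term(wf, muf, covf, wg, mug, covg))
            + 0.5 * np.log(_cross_term(wf, muf, covf, wf, muf, covf))
            + 0.5 * np.log(_cross_term(wg, mug, covg, wg, mug, covg)))
```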

3 Proposed Mixture Reduction Method

The proposed method, coined CSGMR (Cauchy-Schwarz divergence for Gaussian mixture reduction), is formulated as follows. Let us consider a GMM with K multivariate Gaussian densities of the form:

$$\begin{aligned} f\big (x|\mathbf {\Theta }\big ) = \sum _{k=1}^{K} \pi _k \mathcal{N}(x| \mathbf {\mu }_{k}, \mathbf {\Sigma }_{k}), \end{aligned}$$
(4)

where \(\mathbf {\mu }_{k}\) and \(\mathbf {\Sigma }_{k}\) are the mean vector and covariance matrix of the kth component, \(\mathbf {\Theta } = \{\pi _k,\mathbf {\mu }_{k}, \mathbf {\Sigma }_{k}\}_{k=1}^{K}\) is the set of all mixture parameters, and the \(\pi _k\) are the component weights, which satisfy the constraints \(0\le \pi _k \le 1\) and \(\sum _{k=1}^K {\pi _k}=1\). The GMM parameters are generally estimated by maximum likelihood (MLE) using the Expectation-Maximization (EM) algorithm [9].
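As an illustration (not part of the original formulation), the initial mixture in (4) can be fitted by EM with, e.g., scikit-learn; the data matrix `X` below is a placeholder:

```python
# Illustrative only: fit the initial GMM of Eq. (4) by EM with scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(1000, 3)            # placeholder (N, D) data matrix
K = 20                                 # deliberately large initial size
gmm = GaussianMixture(n_components=K, covariance_type='full').fit(X)
f = (gmm.weights_, gmm.means_, gmm.covariances_)   # (pi_k, mu_k, Sigma_k)
```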

Given the mixture f in (4), we wish to find a reduced mixture model g having M components (\(M\ll K\)) by collapsing the components of f into those of g. Note that a similar challenge has been addressed in [5] using finite mixtures of generalized Gaussians; however, that method is hard to generalize to multivariate data.

Contrary to traditional mixture reduction methods, which output a single mixture [8, 13, 14], our method outputs several mixture models describing different clusters in the data structure, for data of arbitrary dimension. This characteristic is important for clustering problems such as image segmentation, where the data structure comprises several non-Gaussian groups. In addition, the reduction is carried out without causing a strong deviation from the structure of the original GMM, as measured by the CSD. Suppose that the density of the reduced model is:

$$\begin{aligned} g(x|\mathbf {\Phi })=\sum _{m=1}^{M}\omega _{m} p_m(x|\mathbf {\Psi }_m), \end{aligned}$$
(5)

where \(p_m(x|\mathbf {\Psi }_m) =\sum _{j=1}^{K_{m}}\pi _{jm}\mathcal{N}(x\vert \mathbf {\mu }_{jm}, \mathbf {\Sigma }_{jm})\), with \(\mathbf {\Psi }_m = \{\pi _{jm},\mathbf {\mu }_{jm}, \mathbf {\Sigma }_{jm}\}_{j=1}^{K_m}\) and \(\mathbf {\Phi }=\{\omega _{m}, \mathbf {\Psi }_{m}\}_{m=1}^{M}\). The reduction is made in such a way that \(\sum \limits _{m=1}^{M} K_{m}\ll K\). Note that when \(K_m=1\), \(m=1,..., M\), we recover the classical mixture reduction scheme, where each reduced component in g is a single Gaussian, as proposed in [8, 13].

3.1 Proposed Algorithm

Our mixture reduction method is similar to classical agglomerative clustering: the initial groups are the Gaussian components, which are iteratively merged into clusters. The input of our algorithm is thus an initial mixture model \(f(x|\mathbf {\Theta })\) having a large number of components, and the output is a reduced (super) mixture model \(g(x|\mathbf {\Phi })\) composed of several small GMMs. For this purpose, we maintain a similarity matrix of CSD distances between the GMMs (clusters) formed at each merging iteration.

Let \(f_1, f_2,..., f_{N_c}\), with \(M\le N_c\le K\), be the intermediate models resulting from fusions of the initial GMM components. A similarity matrix S of size \(N_c \times N_c\) is computed, where \(S_{rs} = CSD(f_r, f_s)\), \(\forall r\ne s\), \(r,s = 1,...,N_c\). At each iteration of the reduction algorithm, we select the two most similar groups to fuse, thus causing the smallest overall change in the initial mixture structure. More specifically, let \(f_r\) and \(f_s\) be two candidate models to be merged, with \(K_r\) and \(K_s\) their respective numbers of components. To estimate the optimal number of components of their fusion, we use the Akaike information criterion (AIC) [4] and search for the minimum of the following function:

$$\begin{aligned} K_{rs} = \mathop {\text {argmin}}\limits _{l \in \{\max (K_r,K_s),\ldots , K_r + K_s\}} \Big \{ -2\log \mathcal {L}(\mathbf {\Psi }_l) + 2\lambda _l \Big \}, \end{aligned}$$
(6)

where \(\log \mathcal {L}(\mathbf {\Psi }_l)\) is the log-likelihood of a GMM with l components, \(\mathbf {\Psi }_l\) is its MLE estimate, and \(\lambda _l\) is the number of free parameters in that model.
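A possible implementation of the selection in (6) — a sketch under our own assumptions, not the authors' listing — uses scikit-learn's `aic()`, which computes exactly \(-2\log \mathcal {L} + 2\lambda \); `X_rs` is assumed to hold the data points assigned to the two candidate groups:

```python
# Sketch of the model selection in Eq. (6); GaussianMixture.aic()
# returns -2 log L + 2*lambda. X_rs is an assumption of this sketch:
# the pooled data points of the two candidate groups f_r and f_s.
import numpy as np
from sklearn.mixture import GaussianMixture

def optimal_fused_size(X_rs, K_r, K_s):
    candidates = list(range(max(K_r, K_s), K_r + K_s + 1))
    aics = [GaussianMixture(n_components=l, covariance_type='full')
            .fit(X_rs).aic(X_rs) for l in candidates]
    return candidates[int(np.argmin(aics))]
```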

Algorithm 1 (listing omitted) summarizes the complete reduction procedure.
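Since the original listing is not reproduced here, the following hedged sketch reconstructs the main loop from the description above: compute the pairwise CSD matrix, repeatedly merge the closest pair of groups, and stop when M groups remain. The helpers `csd` and `optimal_fused_size` are the sketches given earlier; `refit_group`, which would pool the two groups and refit a GMM with \(K_{rs}\) components, is a hypothetical name of ours.

```python
import numpy as np

def similarity_matrix(groups):
    """Pairwise CSD between the current groups f_1, ..., f_Nc."""
    n = len(groups)
    S = np.full((n, n), np.inf)              # diagonal stays infinite
    for r in range(n):
        for s in range(r + 1, n):
            S[r, s] = S[s, r] = csd(groups[r], groups[s])
    return S

def csgmr(groups, M):
    """Agglomerative CSD-based reduction down to M groups (sketch)."""
    while len(groups) > M:
        S = similarity_matrix(groups)
        r, s = np.unravel_index(np.argmin(S), S.shape)  # closest pair
        merged = refit_group(groups[r], groups[s])      # K_rs comps, Eq. (6)
        groups = [g for i, g in enumerate(groups) if i not in (r, s)]
        groups.append(merged)
    return groups
```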

3.2 Foreground Image Segmentation

We apply the proposed method to foreground segmentation in natural and medical images (see the illustration in Fig. 1). First, we generate an initial GMM by over-segmenting the image into superpixels using the SLIC method [1]. A superpixel alone does not provide a robust image descriptor for segmentation, since the consistency of its neighborhood is not considered [10]. To take advantage of the superpixel structure together with our method, let K be the number of produced superpixels. We model each superpixel as a multivariate Gaussian whose parameters are the mean vector and covariance matrix of the RGB colors of its constituent pixels. We then use our mixture reduction algorithm to form the foreground and background regions (\(M=2\)).
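A minimal sketch of this initialization, assuming an (H, W, 3) RGB image and scikit-image's SLIC implementation; taking each Gaussian's weight proportional to the superpixel's pixel count is an assumption on our part:

```python
# Sketch: build the initial GMM from SLIC superpixels, one Gaussian
# per superpixel over the RGB values of its pixels.
import numpy as np
from skimage.segmentation import slic

def initial_gmm(image, n_segments=200):
    labels = slic(image, n_segments=n_segments)
    weights, means, covs = [], [], []
    for lab in np.unique(labels):
        pixels = image[labels == lab].astype(float)   # (n_pix, 3)
        weights.append(pixels.shape[0])
        means.append(pixels.mean(axis=0))
        covs.append(np.cov(pixels, rowvar=False))
    weights = np.asarray(weights, dtype=float)
    return (weights / weights.sum(), np.asarray(means),
            np.asarray(covs)), labels
```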

Fig. 1. Example of foreground segmentation.

To obtain compact regions, we constrain the reduction algorithm by imposing that two segments \(S_i\) and \(S_j\) can be merged only if they are spatially adjacent. To this end, we construct and maintain a region adjacency graph G during the clustering process, where the adjacency set of \(S_i\) is denoted by \(G_i\). Since our mixture reduction algorithm can output any number of clusters, the segmentation algorithm can be tuned to produce any desired number of regions constituting the important structures of the image. For example, in binary segmentation, where the objective is to separate the foreground from the background, the output is \(M=2\).
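A simple way to build such adjacency sets from the SLIC label map — a sketch, assuming 4-connectivity — is to scan horizontally and vertically adjacent pixel pairs:

```python
# Sketch: spatial adjacency from the superpixel label map; two segments
# are adjacent if they share a horizontal or vertical pixel boundary.
import numpy as np

def adjacency_sets(labels):
    adj = {lab: set() for lab in np.unique(labels)}
    h_pairs = zip(labels[:, :-1].ravel(), labels[:, 1:].ravel())
    v_pairs = zip(labels[:-1, :].ravel(), labels[1:, :].ravel())
    for a, b in list(h_pairs) + list(v_pairs):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj
```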

4 Experiments and Results

We conducted several tests to assess the effectiveness and usefulness of the proposed method. First, we compare it against two of the best algorithms in the literature, namely COWA [6] and Runnalls' method [17]. We randomly generate a GMM with 10 components in 4 dimensions and search for an optimal reduced model comprising 5 components. We also conducted a binary classification test on a mixture of 20 Gaussians estimated from a sample of 3000 observations. As shown in Table 1, [17] has the lowest execution time, followed by our method; however, our approach surpasses [17] and [6] in terms of reduction quality.

Table 1. Mixture simplification results.
Fig. 2. Result of CSGMR on the ISIC and MSRA10K datasets: original images (first column), initial superpixels (second column), reduced mixtures (\(M=12\), top, and \(M=6\), bottom; third column), and \(M=2\) (last column).

Table 2. Quantitative segmentation performance: (a) for ISIC and (b) for MSRA10K.
Fig. 3. Visual comparative results: original images (first column), ground truth (second column), result of [12] (third column) and CSGMR result (last column).

To illustrate the performance of the proposed method for image segmentation, we computed GMMs of different sizes (\(K= 100, 125, 500\)) for each image and applied our mixture reduction algorithm. The experiments were performed on two public datasets: the ISIC dataset [15] and the MSRA10K dataset [7]. All experiments were performed without any correction of the images. The visual results are shown in Fig. 2: for the images in column two, with \(K=181\) and \(K=126\), the reduced mixtures have \(M=12\) and \(M=6\) components, respectively (column three). As shown in Fig. 2 (column three), the foreground and the background are each modeled by a mixture. To compare the results of the proposed method with [12], we focus the experiment on foreground/background detection, i.e., \(M = 2\) (see Fig. 3, column three). The performance evaluation between our approach and [12] is conducted on 500 images from each dataset. The metrics used to quantify the segmentation results are recall, precision and accuracy; higher values indicate better results. The segmented image is the result of the reduced mixture. We can see that the images are segmented more correctly by our proposed method. A quantitative comparison between the different methods is given in Table 2, which reports the average metrics over the 500 images selected from each dataset. According to these qualitative and quantitative results, the results obtained with CSGMR are very encouraging.

5 Conclusion

We have proposed an efficient GMM reduction algorithm based on hierarchical clustering with the CSD. Our method quickly computes a compact version of an initial GMM as another GMM constituted of non-Gaussian clusters. Experimental results on simulated data and on image segmentation showed the effectiveness of the proposed technique in comparison with other methods. Note that our model can easily be extended to other exponential family distributions.