
1 Introduction

Interactive image segmentation, which incorporates prior knowledge through user interaction, is preferred in many applications because of its superior performance [1]. In this approach, the user marks some regions as belonging to the foreground and the background, and the method is initialized with these marks. The most widely used interactive segmentation methods are GraphCut [2] and its variants, especially GrabCut.

The GraphCut method treats the image as a graph and utilizes the operations and algorithms developed in graph theory. It balances the boundary and region properties over all segments [2]. Many studies have built on GraphCut and new methods have been developed, for example by adding further information to the energy function [3, 4] or by using region and boundary information together [5]. Among these methods, GrabCut [1] has attracted the most interest.

The GrabCut method improves the optimization of GraphCut, converts it into an iterative procedure, and uses “border matting” to enhance segmentation along the boundary. It also asks the user only to select a rectangular region as the initial interaction.

In recent years, several studies have sought to improve the performance of the original GrabCut method. In [6], depth information is added to the energy function. Similarly, in [7], texture information is used in addition. Khattab et al. [8] extend the method so that it can find more than two objects with less user interaction. In [9], the authors propose using a saliency map.

GrabCut is a segmentation method based on iterative energy minimization that uses a probabilistic model of the pixel color distributions. Hence, unexpected results may occur when the boundaries between the foreground and the background have low contrast. To resolve this problem, the energy function of the method is reformulated in [10].

In this work we propose using a contrast enhancement method as a preprocessing step for GrabCut, with the aim of improving the segmentation performance of the original method. Among the contrast enhancement methods, Contrast Limited Adaptive Histogram Equalization (CLAHE) [11] was chosen for this study because it operates on local regions and is robust to noisy images. Two works in the literature use CLAHE and GrabCut together [12, 13]. However, these works are application-specific and place other steps between CLAHE and GrabCut: [12] proposes CLAHE + brightness preserving dynamic fuzzy histogram + CLAHE + morphological operations + Gabor wavelets + GrabCut, and [13] proposes preprocessing + CLAHE + thresholding and morphological operations + GrabCut. Furthermore, they do not apply CLAHE to all of the RGB channels (e.g. [13] applies it to the gray-level image only).

2 Methodology

2.1 GrabCut Method

GrabCut [1] is one of the best-known GraphCut-based methods in the literature. Rother et al. proposed modeling color information with Gaussian Mixture Models (GMMs), minimizing the energy function iteratively, and allowing incomplete trimaps. The basic steps of the method are:

  • Step 1. Initialize the trimap T, which consists of the known background T_B, the unknown region T_U and the foreground T_F, by drawing a rectangle on the image. The area outside the rectangle is assigned to T_B and its complement to T_U. The initial T_F is empty.

  • Step 2. Perform the initial segmentation α = (α_1, …, α_i, …, α_N):

    $$ \alpha_{i} = 0 \;\, \text{for} \;\, i \in T_{B} ; \quad \alpha_{i} = 1 \;\, \text{for} \;\, i \in T_{U} $$
    (1)
  • Step 3. Initialize two Gaussian Mixture Models (GMMs), one for the background and one for the foreground, from the segmentation α obtained above.

  • Step 4. For each pixel in the unknown region T_U, find the most likely Gaussian component in the background and foreground GMMs, separately.

$$ k_{i} = \arg\min_{k_{i}} D_{i}\left( \alpha_{i}, k_{i}, \theta, z_{i} \right) $$
(2)

where k_i ∈ {1, …, K} is an additional parameter assigned to each pixel that identifies its most likely GMM component, K is the number of GMM components, and D_i is the data term of the Gibbs energy function for pixel i. The data term is calculated as:

$$ \begin{aligned} D\left( \alpha_{i}, k_{i}, \theta, z_{i} \right) = & - \log \pi\left( \alpha_{i}, k_{i} \right) + 0.5 \log \det \boldsymbol{\Sigma}\left( \alpha_{i}, k_{i} \right) \\ & + 0.5 \left[ z_{i} - \boldsymbol{\mu}\left( \alpha_{i}, k_{i} \right) \right]^{T} \boldsymbol{\Sigma}\left( \alpha_{i}, k_{i} \right)^{-1} \left[ z_{i} - \boldsymbol{\mu}\left( \alpha_{i}, k_{i} \right) \right] \end{aligned} $$
(3)

where π(·) is the mixture weighting coefficient, μ is the mean vector, Σ is the covariance matrix and z_i is the intensity (color) value of pixel i.
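As an illustration, the data term of Eq. (3) and the component assignment of Eq. (2) can be sketched in Python with NumPy. This is a minimal sketch rather than the implementation used in this work: the function names `data_term` and `best_component` are hypothetical, and the quadratic term is assumed to use the inverse covariance matrix, as in the standard Gaussian log-likelihood.

```python
import numpy as np

def data_term(z, weight, mean, cov):
    """Eq. (3) for one pixel and one GMM component (illustrative helper).

    z      : color vector of the pixel, shape (3,)
    weight : mixture weight pi of the component
    mean   : mean vector mu, shape (3,)
    cov    : covariance matrix Sigma, shape (3, 3)
    """
    diff = np.asarray(z, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Quadratic form (z - mu)^T Sigma^{-1} (z - mu), computed via a linear
    # solve instead of an explicit matrix inverse (assumed inverse covariance).
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    return -np.log(weight) + 0.5 * logdet + 0.5 * mahalanobis

def best_component(z, weights, means, covs):
    """Eq. (2): index of the component with the smallest data term."""
    costs = [data_term(z, w, m, c) for w, m, c in zip(weights, means, covs)]
    return int(np.argmin(costs))
```

In the procedure, `best_component` would be evaluated with the background GMM for pixels with α_i = 0 and with the foreground GMM for pixels with α_i = 1.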

  • Step 5. Update the GMM parameters from the data previously clustered.

    $$ \theta = \arg\min_{\theta} \sum_{i} D\left( \alpha_{i}, k_{i}, \theta, z_{i} \right) $$
    (4)
  • Step 6. To find the new segmentation of the pixels, perform the min-cut algorithm as in [1].

$$ \min_{\{ \alpha_{i} \, : \, i \in T_{U} \}} \min_{k} E\left( \boldsymbol{\alpha}, k, \theta, z \right) $$
(5)

where the Gibbs energy function is given by:

$$ E\left( \boldsymbol{\alpha}, k, \theta, z \right) = \sum_{i} D\left( \alpha_{i}, k_{i}, \theta, z_{i} \right) + V\left( \boldsymbol{\alpha}, z \right) $$
(6)
$$ V\left( \boldsymbol{\alpha}, z \right) = \gamma \sum_{(i,j) \in C} \left[ \alpha_{i} \ne \alpha_{j} \right] \exp\left( - \beta \left\| z_{i} - z_{j} \right\|^{2} \right) $$
(7)

where V is the smoothness term of the energy function, C is the set of pairs of neighboring pixels, [·] equals 1 when the condition inside holds and 0 otherwise, γ is the smoothness coefficient and β is a constant.

  • Step 7. Repeat steps 4–6 until the energy function converges to a predefined value.
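Steps 1 and 2 of the procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the label values `T_B`, `T_U`, `T_F`, the function name `init_from_rectangle` and the `(x, y, w, h)` rectangle convention are hypothetical choices, not part of the original method description.

```python
import numpy as np

# Hypothetical label values for the three trimap regions.
T_B, T_U, T_F = 0, 1, 2

def init_from_rectangle(height, width, rect):
    """Build the initial trimap and segmentation alpha from rect = (x, y, w, h)."""
    x, y, w, h = rect
    # Step 1: everything outside the rectangle is known background.
    trimap = np.full((height, width), T_B, dtype=np.uint8)
    # Inside the rectangle the label is unknown; T_F stays empty.
    trimap[y:y + h, x:x + w] = T_U
    # Step 2, Eq. (1): alpha = 0 on T_B and alpha = 1 on T_U.
    alpha = (trimap == T_U).astype(np.uint8)
    return trimap, alpha
```

The returned `alpha` is the segmentation from which the two GMMs of Step 3 would be initialized.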

In this work, we use only the hard segmentation process described above. The number of Gaussian components, the constant β and the smoothness coefficient γ were determined empirically as 5, 5 and 50, respectively.
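To make the role of γ and β concrete, the smoothness term of Eq. (7) can be sketched with the parameter values reported above as defaults. This is a sketch under assumptions that the text does not fix: the neighborhood C is taken to be 4-connected with each pair counted once, the function name `smoothness_term` is hypothetical, and the scaling of the pixel values z is left to the caller.

```python
import numpy as np

def smoothness_term(alpha, z, gamma=50.0, beta=5.0):
    """Eq. (7) over a 4-connected neighborhood (assumed), each pair counted once.

    alpha : label image, shape (H, W), values 0 or 1
    z     : color image, shape (H, W, 3)
    """
    alpha = np.asarray(alpha)
    z = np.asarray(z, dtype=float)
    total = 0.0
    # Horizontal neighbor pairs first, then vertical neighbor pairs.
    for a_i, a_j, z_i, z_j in (
        (alpha[:, :-1], alpha[:, 1:], z[:, :-1], z[:, 1:]),
        (alpha[:-1, :], alpha[1:, :], z[:-1, :], z[1:, :]),
    ):
        differs = a_i != a_j                       # indicator [alpha_i != alpha_j]
        sq_dist = np.sum((z_i - z_j) ** 2, axis=-1)  # ||z_i - z_j||^2
        total += np.sum(differs * np.exp(-beta * sq_dist))
    return gamma * total
```

Only neighboring pixels with different labels contribute, and the penalty shrinks as their color difference grows, so label changes are encouraged to coincide with high-contrast edges.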

2.2 The Proposed Method

The original GrabCut method gives poor results for some images, especially those with low-contrast object boundaries. To overcome this weakness, Khattab et al. [10] proposed reformulating the energy function of the method. In contrast, we leave the original method unchanged and propose applying a contrast enhancement method as a preprocessing step before it.

In this work, the Contrast Limited Adaptive Histogram Equalization (CLAHE) method is preferred for contrast enhancement for two reasons: (1) it enhances the contrast of local regions, which provides more information; (2) it is not greatly affected by image noise because of the contrast limitation.

CLAHE Method

The CLAHE method was proposed by Pizer et al. [11]. Its main steps can be summarized as follows:

  • Step 1. Divide the image into non-overlapping local regions (grids); the minimum grid size should be 32 × 32. Calculate the histogram of each grid separately.

  • Step 2. Clip the histogram to limit the influence of noise. If the number of pixels for any intensity value exceeds a predetermined threshold, it is set to that threshold. To keep the total number of pixels unchanged, the clipped pixels are redistributed uniformly over the histogram.

  • Step 3. Perform histogram equalization on the histograms obtained in step 2. Combine the neighboring grids and use bilinear interpolation to eliminate boundary artifacts: the value of each pixel is interpolated from the mappings of its four neighboring grids.

Although the original method was developed for gray-level images, it has been applied to color images with different schemes. In this work we use the “CLAHE on RGB model” described in [14].
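The clipping of Step 2 and the equalization part of Step 3 can be sketched for the histogram of a single grid as follows. This is a simplified illustration, not the implementation used in this work: the function names are hypothetical, the bilinear interpolation between grids is omitted, and after the uniform redistribution some bins may again exceed the threshold, which practical implementations handle in different ways.

```python
import numpy as np

def clip_histogram(hist, clip_limit):
    """Clip bins at clip_limit and spread the excess uniformly over all bins."""
    hist = np.asarray(hist, dtype=float)
    excess = np.sum(np.maximum(hist - clip_limit, 0.0))
    clipped = np.minimum(hist, clip_limit)
    # Adding the same share to every bin keeps the total pixel count unchanged.
    return clipped + excess / hist.size

def equalization_map(hist, levels=256):
    """Histogram-equalization lookup table built from a (clipped) histogram."""
    cdf = np.cumsum(hist)
    return np.round((levels - 1) * cdf / cdf[-1]).astype(np.uint8)
```

For a color image, one possible reading of a per-channel scheme is to apply these two functions to the histogram of each grid in each of the R, G and B channels separately; the exact scheme of [14] is not reproduced here.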

3 Experiments and Results

3.1 Data Set

To evaluate the performance of the proposed approach, we first constructed a data set. The images were selected from the Caltech 256 image dataset [15] according to the following criteria: (1) exclude images that have uniform backgrounds; (2) select at most one image per object category; (3) select images that have cluttered backgrounds and complex-shaped objects. Although most studies perform their experiments on 25 images, 40 images were chosen for this work.

3.2 Experimental Results

The performance of the proposed method was evaluated visually and quantitatively using the segmentation results on the constructed data set. Six well-known performance metrics were used for quantitative evaluation: Accuracy, Dice Coefficient, Jaccard Index, Precision, Sensitivity, and Specificity. The region marked as known background is excluded from the evaluation because it cannot be segmented erroneously. Table 1 shows the average values of the quality metrics for the proposed method compared with the original method.
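The six metrics can be computed from the confusion counts of a predicted mask against a ground-truth mask. The sketch below uses the standard confusion-matrix definitions, which are assumed here because the text does not spell them out; the function name `segmentation_metrics` and the `evaluated` argument (a boolean mask that excludes the known background) are hypothetical. Degenerate cases with a zero denominator are not handled.

```python
import numpy as np

def segmentation_metrics(pred, truth, evaluated):
    """Metrics over the pixels where `evaluated` is True.

    pred, truth : binary masks (1 = foreground)
    evaluated   : boolean mask, False on the known background region
    """
    evaluated = np.asarray(evaluated, dtype=bool)
    pred = np.asarray(pred, dtype=bool)[evaluated]
    truth = np.asarray(truth, dtype=bool)[evaluated]
    tp = np.sum(pred & truth)      # foreground predicted as foreground
    tn = np.sum(~pred & ~truth)    # background predicted as background
    fp = np.sum(pred & ~truth)     # background predicted as foreground
    fn = np.sum(~pred & truth)     # foreground predicted as background
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Averaging each entry over the images of a data set gives per-metric averages of the kind reported in a results table.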

Table 1. Quantitative results of the proposed and original GrabCut methods on the image set.

Table 1 shows that the proposed method achieved better results than the original GrabCut method on all metrics, with an improvement of approximately 4%.

To interpret the results further, the segmentation results of both methods were analyzed visually for all images. Figure 1 shows some images from the data set and their segmentation results side by side.

Fig. 1. Sample images and segmentation results for various scenarios. (a) Original image and initial bounding box; (b), (c) segmentation results of GrabCut and the proposed method, respectively.

The first two rows show sample images that have low contrast. For these images, the proposed method gives noticeably better results. In the first image, GrabCut includes a part of the sky in the foreground, yet excludes the bottom casing of the airplane and the logo of the airline company. In the second image, the original method does not separate the snail from the background successfully. The third image contains a hummingbird that flaps its wings at high frequency, which makes the boundaries uncertain. The proposed method also overcomes this problem by enhancing the contrast locally. Another scenario is underexposed images taken under the sea, which have low contrast and high noise, as shown in the last row. Although the original GrabCut method performs weakly on them, the proposed method provides sufficiently good results.

For the other scenarios, which are not shown above, both methods give very similar segmentation results.

4 Discussion

In this paper, an improvement to the GrabCut method is proposed to deal with the segmentation difficulties that occur when images have low-contrast regions. The improvement is obtained by applying a contrast enhancement method (CLAHE on the RGB color bands) as a preprocessing step before GrabCut. The performance tests were carried out on a data set of 40 images. The results show that the proposed method outperforms the original GrabCut method, especially on low-contrast images, and that it has no drawbacks on the other images.

In future work, the effect of other contrast enhancement and image preprocessing techniques on the original GrabCut method might be studied.