1 Introduction

The progress of human civilization and the pursuit of a comfortable lifestyle are accompanied by pollution of the air, soil, and seas. Fortunately, remote-sensing satellites allow us to track pollution in a timely manner.

Image segmentation is the process of dividing an image into a number of homogeneous, non-overlapping regions. Interactive image segmentation has received considerable attention in the computer vision community over the last decades (Kolmogorov and Zabin 2004; Boykov et al. 2010; Boykov and Funka-Lea 2006). This paper focuses on how to detect polluted regions in remote-sensing (RS) images efficiently with user interaction. The aim is to achieve high performance with modest interactive effort from the user. In general, the degree of interactive effort ranges from editing individual pixels, at the labor-intensive extreme, to merely touching the foreground and/or background in a few locations.

2 A brief review of interactive image segmentation

In the following, we categorize methods of interactive image segmentation by their methodology and user interfaces into two main groups: segmentation in the discrete domain and segmentation in the continuous domain.

Discrete-domain segmentation methods typically assume known appearance models: the log-likelihoods of appearance are optimized in combination with some spatial regularization. This problem is relatively simple, and many methods guarantee globally optimal results. Continuous-domain segmentation methods instead estimate the appearance models jointly with the segmentation. The model parameters are treated as additional variables, which transforms simple segmentation energies into high-order NP-hard functionals. Such methods are known to indirectly minimize the appearance overlap between the segments.

2.1 Segmentation in discrete domain

Boykov and Jolly (2001) were the first to formulate a simple generative Markov random field (MRF) model in the discrete domain for binary image segmentation. This basic model can be used for interactive segmentation: given user constraints in the form of foreground and background brushes, i.e., regional constraints, the optimal solution is computed very efficiently with graph cut. The main benefits of this approach are global optimality, practical efficiency, numerical robustness, the ability to fuse a wide range of visual cues and constraints, unrestricted topological properties of segments, and applicability to n-dimensional problems. Following their work, many articles on interactive segmentation using graph cut and a brush interface were published (Boykov and Jolly 2001; Grady 2006). It also inspired the Grab-Cut system (Rother et al. 2004), which solves a more challenging problem, namely the joint optimization of the segmentation and the estimation of global properties of the segments. The benefit is a simpler user interface in the form of a bounding box. Note that such joint optimization had been used in other contexts before. An example is depth estimation in stereo images, where the optimal partitioning of the stereo images and the global properties (affine warping) of each segment are optimized jointly (Boykov and Jolly 1999). Slabaugh and Unal (2005) proposed an energy function incorporating an elliptical shape prior, which improved the accuracy of segmenting circular objects. Juan and Boykov (2006) pointed out that the key to improving the speed of interactive segmentation is to improve the efficiency of max-flow/min-cut algorithms, and designed the ActiveCuts (AC) algorithm accordingly. Another interesting class of discrete functionals is based on ratios, e.g., area over boundary length (Kolmogorov et al. 2007).
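To make the graph-cut formulation concrete, here is a minimal sketch, not Boykov and Jolly's implementation. It segments a tiny 1-D grayscale signal by building an s-t graph in which terminal edges carry data costs, neighbor edges carry a constant smoothness penalty `lam`, and brush seeds become hard constraints through very large capacities; a plain Edmonds–Karp max-flow then yields the minimum cut. The fixed class means, the weight, and all function names are hypothetical choices for illustration (Boykov and Jolly derive the data costs from seed intensity histograms and use contrast-sensitive boundary weights).

```python
from collections import deque

# Minimal sketch (hypothetical names): binary 1-D segmentation by s-t min-cut.
# Simplifications: fixed class means instead of seed histograms, constant
# Potts smoothness weight instead of contrast-sensitive boundary weights.

def min_cut_source_side(n, edges, s, t):
    """Edmonds-Karp max-flow; returns the nodes on the source side of a minimum cut."""
    cap = [[0.0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximal
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
    # Nodes reachable from s in the final residual graph form the source side.
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if v not in seen and cap[u][v] > 1e-12:
                seen.add(v)
                queue.append(v)
    return seen


def segment_1d(pixels, fg_seeds, bg_seeds, fg_mean=200.0, bg_mean=50.0, lam=2000.0):
    """Label each pixel 1 (foreground) or 0 (background) with a single min-cut."""
    n = len(pixels)
    s, t = n, n + 1  # source = foreground terminal, sink = background terminal
    hard = 1e9       # effectively infinite capacity enforces the brush seeds
    edges = []
    for i, z in enumerate(pixels):
        if i in fg_seeds:
            edges.append((s, i, hard))
        elif i in bg_seeds:
            edges.append((i, t, hard))
        else:
            # Cutting s->i (pixel ends on the sink side) pays the background cost;
            # cutting i->t (pixel ends on the source side) pays the foreground cost.
            edges.append((s, i, (z - bg_mean) ** 2))
            edges.append((i, t, (z - fg_mean) ** 2))
    for i in range(n - 1):
        edges.append((i, i + 1, lam))  # Potts penalty when neighbors disagree
        edges.append((i + 1, i, lam))
    source_side = min_cut_source_side(n + 2, edges, s, t)
    return [1 if i in source_side else 0 for i in range(n)]
```

For example, `segment_1d([40, 55, 130, 60, 190, 210, 205, 45, 50], fg_seeds={5}, bg_seeds={0})` keeps the isolated value 130 in the background even though it is closer to the foreground mean, because labeling it foreground would add two boundary penalties.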

2.2 Segmentation in continuous domain

There are very close connections between spatially discrete MRFs and variational formulations in the continuous domain. The first continuous formulations were expressed in terms of active contours without edges (Chan and Vese 2001), related to the well-known Mumford and Shah functional (1989). The goal is to find a segmentation that minimizes the length of the boundary (or the area of the surface) under some metric, typically an image-based Riemannian metric. Traditionally, techniques such as level sets were used, which, however, are only guaranteed to find a local optimum. Recently, many of these functionals have been reformulated using convex relaxation, i.e., the solution takes values in the interval [0, 1] rather than in {0, 1}, which makes it possible to achieve global optimality, or bounds on it, in some practical cases. An example of interactive segmentation with a brush interface is the work of Unger et al. (2008), where the optimal solution of a weighted total variation norm is computed efficiently. Instead of using convex relaxation techniques, the continuous problem can be approximated on a discrete grid and solved to global optimality by graph cut. This can be done for a large set of useful metrics (Kolmogorov and Boykov 2005). Theoretically, the discrete approach is inferior, since the connectivity of the graph has to be large to avoid metrication artifacts. In practice, however, such artifacts are rarely visible when using a geodesic distance.

2.3 Proposed method: improved Grab-Cut

First, the RS image is preprocessed with a principal component analysis (PCA) transform (Xu et al. 2014). This is followed by automatic segmentation, or segmentation with manual tagging, using the improved Grab-Cut mechanism. Finally, the detected edge can be saved and used to calculate the area and circumference of the detected region.
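The PCA preprocessing can be illustrated with a minimal sketch. It is our own assumption-based illustration, not the transform of Xu et al. (2014): each pixel is treated as a vector of its three band values, the 3 × 3 sample covariance matrix is formed, its leading eigenvector is found by power iteration, and every pixel is projected onto that eigenvector to give the first principal component, which concentrates most of the variance in a single band. The function name `first_principal_component` and the iteration count are hypothetical.

```python
import math

# Minimal sketch (hypothetical helper): first principal component of RGB pixels.

def first_principal_component(pixels, iters=200):
    """Return (leading eigenvector, per-pixel PC1 scores) for a list of 3-band pixels."""
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    centered = [[p[c] - mean[c] for c in range(3)] for p in pixels]
    # Sample covariance matrix (3 x 3) of the band values.
    cov = [[sum(q[i] * q[j] for q in centered) / (n - 1) for j in range(3)]
           for i in range(3)]
    # Power iteration; assumes the start vector is not orthogonal to the leading eigenvector.
    v = [1.0, 1.0, 1.0]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # constant image: no variance to capture
        v = [x / norm for x in w]
    # Project each centered pixel onto the leading eigenvector.
    scores = [sum(q[c] * v[c] for c in range(3)) for q in centered]
    return v, scores
```

The sign of the eigenvector is arbitrary; in practice a library eigendecomposition would typically replace the hand-written power iteration.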

The novelty of our method is threefold. First, we improve the Grab-Cut mechanism: we propose to extend the region of interest (ROI) and treat the extension as background, which considerably reduces the computation time for a given quality of result. Second, we adapt the Grab-Cut method to the characteristics of RS images, namely their large size and large amount of information (e.g., geographic information). Finally, the improved Grab-Cut method is applied to detecting marine sewage in RS images, where it is competitive with other state-of-the-art methods.

3 Improved Grab-Cut algorithm

3.1 Color data modeling

The RS image \(z=(z_1,\ldots ,z_n,\ldots ,z_N)\) consists of pixels \(z_n\) in RGB color space. We use two Gaussian mixture models (GMMs) with K components each to model the color data, one for the background and one for the foreground. To handle the GMMs conveniently, we introduce a vector \(k=(k_1, \ldots , k_n, \ldots , k_N)\) that assigns each pixel a unique GMM component, with \(k_n \in \{1,\ldots ,K\}\).

The Gibbs energy function is as follows:

$$E(\alpha ,k,\theta ,z)=U(\alpha ,k,\theta ,z)+V(\alpha ,z), \tag{1}$$

where \(\alpha =(\alpha _1,\ldots ,\alpha _n,\ldots ,\alpha _N)\) is the opacity (segmentation label) vector with \(\alpha _n\in \{0, 1\}\): 0 for background and 1 for foreground. The data term U is defined as

$$U(\alpha ,k,\theta ,z)=\sum_{n} D(\alpha _n,k_n,\theta ,z_n), \tag{2}$$
$$\begin{aligned} D(\alpha _n,k_n,\theta ,z_n)&=-\log p(z_n\mid \alpha _n,k_n,\theta )-\log \pi (\alpha _n,k_n)\\&=-\log \pi (\alpha _n,k_n)+\frac{1}{2}\log \det \varSigma (\alpha _n,k_n)\\&\quad +\frac{1}{2}\left[ z_n-\mu (\alpha _n,k_n)\right] ^{\top }\varSigma (\alpha _n,k_n)^{-1}\left[ z_n-\mu (\alpha _n,k_n)\right] , \end{aligned} \tag{3}$$

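where \(p(\cdot )\) is a Gaussian probability density, \(\pi (\cdot )\) are the mixture weighting coefficients, and \(\mu (\cdot )\) and \(\varSigma (\cdot )\) are the mean vectors and covariance matrices of the GMM components. The second equality in Eq. (3) holds up to the additive constant \(\tfrac{3}{2}\log 2\pi \) of the three-dimensional Gaussian density, which does not affect the minimization.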
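As a sketch of how Eq. (3) can be evaluated, the snippet below computes D for one RGB pixel and one full-covariance Gaussian component, and assigns the pixel the component that minimizes D, which is how \(k_n\) is chosen in Grab-Cut's iterative scheme. The 3 × 3 determinant and inverse are written out by hand so that only the standard library is needed; the function names and the component parameters in the usage example are hypothetical.

```python
import math

# Minimal sketch (hypothetical helpers): evaluate the data term D of Eq. (3).

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inv3(m):
    """Inverse of a 3 x 3 matrix via the adjugate (transposed cofactor matrix)."""
    d = det3(m)
    cof = [[m[(i + 1) % 3][(j + 1) % 3] * m[(i + 2) % 3][(j + 2) % 3]
            - m[(i + 1) % 3][(j + 2) % 3] * m[(i + 2) % 3][(j + 1) % 3]
            for j in range(3)] for i in range(3)]
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]

def data_term(z, weight, mean, cov):
    """D = -log pi + 0.5 log det Sigma + 0.5 (z - mu)^T Sigma^{-1} (z - mu)."""
    diff = [z[c] - mean[c] for c in range(3)]
    inv = inv3(cov)
    maha = sum(diff[i] * inv[i][j] * diff[j] for i in range(3) for j in range(3))
    return -math.log(weight) + 0.5 * math.log(det3(cov)) + 0.5 * maha

def assign_component(z, gmm):
    """Index (0-based) of the component minimizing D; gmm = [(weight, mean, cov), ...]."""
    costs = [data_term(z, w, mu, sigma) for (w, mu, sigma) in gmm]
    return min(range(len(costs)), key=costs.__getitem__)
```

Note that the code indexes components from 0, whereas the text uses \(k_n \in \{1,\ldots ,K\}\).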

The smoothness term can be written as

$$V(\alpha , z)=\gamma \sum_{(m,n)\in \mathbf {C}}[\alpha _n\ne \alpha _m]\exp \left( -\beta \Vert z_m-z_n\Vert ^2\right) , \tag{4}$$

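where the constant \(\gamma =50\) is set by optimizing performance against ground truth over a training set, \(\mathbf {C}\) is the set of pairs of neighboring pixels, and \(\beta =(2\langle \Vert z_m-z_n\Vert ^2\rangle )^{-1}\), with \(\langle \cdot \rangle \) denoting the expectation over the pairs of neighboring pixels in a remote-sensing image sample. With this choice of \(\beta \), the penalty for a label change is close to \(\gamma \) in homogeneous areas and decays across strong color edges, so segment boundaries are encouraged to follow edges.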
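The contrast parameter β and the smoothness term V of Eq. (4) can be computed as in the following minimal sketch. It assumes a 4-connected neighborhood for C (the text does not specify the neighborhood system) and an image stored as a list of rows of RGB tuples; the function names are hypothetical, and γ defaults to the value 50 given above.

```python
import math

# Minimal sketch (hypothetical helpers); C is assumed to be 4-connected.

def neighbor_pairs(height, width):
    """Yield each unordered 4-connected pair of pixel coordinates once."""
    for r in range(height):
        for c in range(width):
            if c + 1 < width:
                yield (r, c), (r, c + 1)
            if r + 1 < height:
                yield (r, c), (r + 1, c)

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def compute_beta(img):
    """beta = 1 / (2 * mean squared color difference over neighboring pairs)."""
    h, w = len(img), len(img[0])
    diffs = [sq_dist(img[r1][c1], img[r2][c2])
             for (r1, c1), (r2, c2) in neighbor_pairs(h, w)]
    if not diffs:
        return 0.0  # single-pixel image: no neighbor pairs
    mean = sum(diffs) / len(diffs)
    return 1.0 / (2.0 * mean) if mean > 0 else 0.0

def smoothness(alpha, img, gamma=50.0):
    """V(alpha, z) of Eq. (4): penalize label changes between similar neighbors."""
    h, w = len(img), len(img[0])
    beta = compute_beta(img)
    total = 0.0
    for (r1, c1), (r2, c2) in neighbor_pairs(h, w):
        if alpha[r1][c1] != alpha[r2][c2]:
            total += gamma * math.exp(-beta * sq_dist(img[r1][c1], img[r2][c2]))
    return total
```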

Therefore, the parameters of the model are given by

$$\theta =\{\pi (\alpha ,k),\mu (\alpha ,k),\varSigma (\alpha ,k),\ \alpha \in \{0,1\},\ k=1, \ldots ,K\}. \tag{5}$$
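Given the labels α and the component assignments k, the parameters θ of Eq. (5) can be re-estimated from the pixels in each (α, k) group: π is the fraction of that label's pixels assigned to component k, μ is the sample mean, and Σ the sample covariance. The sketch below does this in plain Python; the function name is hypothetical, and the small regularizer `eps` added to the diagonal is our assumption to keep Σ invertible for tiny or degenerate groups, not part of the paper.

```python
# Minimal sketch (hypothetical helper): re-estimate theta = {pi, mu, Sigma} per (alpha, k).

def estimate_theta(pixels, alpha, k, K, eps=1e-6):
    """Return {(a, j): (pi, mu, Sigma)} for a in {0, 1} and 0-based j in range(K)."""
    theta = {}
    for a in (0, 1):
        label_idx = [n for n in range(len(pixels)) if alpha[n] == a]
        for j in range(K):
            members = [pixels[n] for n in label_idx if k[n] == j]
            if not members:
                continue  # empty component: nothing to estimate
            m = len(members)
            mu = [sum(p[c] for p in members) / m for c in range(3)]
            # Maximum-likelihood covariance plus eps on the diagonal (assumption).
            sigma = [[sum((p[r] - mu[r]) * (p[c] - mu[c]) for p in members) / m
                      + (eps if r == c else 0.0) for c in range(3)] for r in range(3)]
            theta[(a, j)] = (m / len(label_idx), mu, sigma)
    return theta
```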

3.2 Iterative energy minimization

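Unlike Graph-Cut, which computes the segmentation with a single min-cut, Grab-Cut minimizes the energy function iteratively: the pixels newly labeled within the unknown region \(T_U\) of the initial trimap are used to refine the color GMM parameters \(\theta \), which are then used in the next segmentation. Each iteration (i) assigns each pixel the GMM component \(k_n\) that minimizes \(D\) for its current label, (ii) re-estimates \(\theta \) from the pixels assigned to each component, and (iii) updates \(\alpha \) by a min-cut. Because none of these steps can increase E, the energy decreases monotonically and the iteration converges, at least to a local minimum.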
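The alternation can be illustrated on a toy problem. The sketch below is our simplification, not the paper's implementation: it works on a 1-D grayscale signal, uses a single Gaussian (K = 1) per label instead of a GMM, and, because a 1-D chain is a tree, minimizes the labeling energy exactly by dynamic programming in place of the min-cut step. Each iteration re-estimates the parameters from the current labels and then re-segments, while pixels in \(T_\mathrm{B}\) stay fixed at background; the loop stops when the labels no longer change. All names and the default weights are hypothetical.

```python
import math

# Toy sketch of the Grab-Cut-style alternation (our simplification, hypothetical names):
# 1-D grayscale signal, one Gaussian per label (K = 1), exact chain minimization by
# dynamic programming standing in for the min-cut step.

def gauss_cost(z, mu, var):
    # Negative log Gaussian density without the constant 0.5*log(2*pi).
    return 0.5 * math.log(var) + 0.5 * (z - mu) ** 2 / var

def fit(values):
    mu = sum(values) / len(values)
    # Adding 1.0 to the variance is an assumption that keeps it positive.
    var = sum((v - mu) ** 2 for v in values) / len(values) + 1.0
    return mu, var

def segment_chain(z, params, fixed_bg, lam):
    """Exactly minimize sum_n D(alpha_n) + lam * [alpha_n != alpha_{n+1}] (Viterbi)."""
    inf = float("inf")

    def unary(n, a):
        if a == 1 and n in fixed_bg:
            return inf  # T_B pixels are fixed to background
        return gauss_cost(z[n], *params[a])

    cost = [unary(0, 0), unary(0, 1)]
    back = []
    for n in range(1, len(z)):
        new_cost, ptr = [], []
        for a in (0, 1):
            prev = min((0, 1), key=lambda b: cost[b] + (lam if b != a else 0.0))
            new_cost.append(cost[prev] + (lam if prev != a else 0.0) + unary(n, a))
            ptr.append(prev)
        cost = new_cost
        back.append(ptr)
    labels = [0 if cost[0] <= cost[1] else 1]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])  # best label of the previous pixel
    return labels[::-1]

def iterative_segment(z, fixed_bg, lam=2.0, max_iter=10):
    """Alternate parameter estimation and segmentation; fixed_bg must be non-empty."""
    alpha = [0 if n in fixed_bg else 1 for n in range(len(z))]  # T_U starts as foreground
    for _ in range(max_iter):
        if 1 not in alpha:
            break  # foreground vanished: nothing left to model
        params = {a: fit([z[n] for n in range(len(z)) if alpha[n] == a]) for a in (0, 1)}
        new_alpha = segment_chain(z, params, fixed_bg, lam)
        if new_alpha == alpha:
            break  # labels are stable: converged
        alpha = new_alpha
    return alpha
```

For `z = [10, 12, 11, 50, 52, 49, 51, 12, 10]` with `fixed_bg = {0, 8}`, the first pass starts with everything inside the "rectangle" labeled foreground; after one re-estimation the dark pixels next to the border switch to background and the second pass confirms the labels.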

3.3 User interaction and incomplete trimap

Replacing the complete trimap with an incomplete labeling gives more flexibility. The user only needs to specify the background \(T_\mathrm{B}\), leaving the foreground empty, \(T_\mathrm{F}=\emptyset \); no hard foreground labeling is needed at all. The initial \(T_\mathrm{B}\) is specified by the user with a rough rectangle: the pixels outside the rectangle form \(T_\mathrm{B}\), and those inside it form the unknown region \(T_U\). Because the energy is minimized iteratively, \(T_U\) can provisionally stand for the foreground \(T_\mathrm{F}\), and its labels may change from one iteration to the next, whereas the labels of the background region \(T_\mathrm{B}\) remain fixed. If the initial information is not sufficient for a satisfactory result, the user can interact further and provide more information.
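The rectangle-based initialization can be sketched as follows, assuming the rectangle is given as hypothetical pixel bounds `(top, left, bottom, right)` with exclusive bottom and right edges: pixels outside it form \(T_\mathrm{B}\), those inside form \(T_U\), \(T_\mathrm{F}\) is empty, and the initial labels are \(\alpha _n = 0\) on \(T_\mathrm{B}\) and \(\alpha _n = 1\) on \(T_U\), as in the original Grab-Cut. The trimap codes and the function name are hypothetical.

```python
# Hypothetical trimap codes (not from the paper). T_F is empty, so no foreground code is used.
BG, UNKNOWN = 0, 2

def init_trimap(height, width, rect):
    """rect = (top, left, bottom, right), bottom/right exclusive (assumed convention)."""
    top, left, bottom, right = rect
    trimap = [[UNKNOWN if top <= r < bottom and left <= c < right else BG
               for c in range(width)] for r in range(height)]
    # Initial labels: alpha_n = 0 on T_B and alpha_n = 1 on T_U.
    alpha = [[1 if label == UNKNOWN else 0 for label in row] for row in trimap]
    return trimap, alpha
```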

Fig. 1 Extension of ROI diagram

3.4 Extension of ROI

Given the characteristics of RS images, the ROI manually selected with the bounding box alone is insufficient, and processing a large-scale image as a whole is always less efficient. We therefore propose to extend the ROI. Depending on the characteristics of the target, the original ROI is enlarged to twice its size, while remaining smaller than the original image (e.g., the grid area in Fig. 1). Only the extended ROI, rather than the whole image, is then involved in the calculation of the model. Experimental results show that the extension greatly improves the efficiency of the method.
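A minimal sketch of the ROI extension, under our own reading of the text: "doubled" is interpreted here as doubling the rectangle's width and height about its center (a `scale` parameter keeps this adjustable), and the result is clipped to the image bounds so that it never exceeds the original image. Only pixels inside the extended ROI would then be passed to the model, with the band between the original and the extended rectangle serving as background. The function name and the rectangle convention `(top, left, bottom, right)` are hypothetical.

```python
import math

# Minimal sketch (hypothetical helper): enlarge the user's ROI about its center.

def extend_roi(rect, image_height, image_width, scale=2.0):
    """Scale rect = (top, left, bottom, right) about its center, clipped to the image."""
    top, left, bottom, right = rect
    cy, cx = (top + bottom) / 2.0, (left + right) / 2.0
    half_h, half_w = scale * (bottom - top) / 2.0, scale * (right - left) / 2.0
    # Round outward so the extended ROI always encloses the scaled rectangle.
    new_top = max(0, math.floor(cy - half_h))
    new_left = max(0, math.floor(cx - half_w))
    new_bottom = min(image_height, math.ceil(cy + half_h))
    new_right = min(image_width, math.ceil(cx + half_w))
    return new_top, new_left, new_bottom, new_right
```

For a 1000 × 1000 image with a 200 × 200 ROI at `(100, 200, 300, 400)`, the extended ROI is `(0, 100, 400, 500)`, i.e., 400 × 400 pixels or 16% of the image, which illustrates why restricting the model to the extended ROI saves time.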

Table 1 summarizes the improved Grab-Cut algorithm.

Table 1 Algorithm 1 Grab-Cut with the local extension of ROI

4 Experiments

4.1 RS data

The US Landsat-8 satellite, launched on 11 February 2013, carries a two-sensor payload: the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS). Together, OLI and TIRS acquire data in 11 spectral bands: coastal/aerosol (0.43–0.45 \(\upmu \)m), blue (0.45–0.51 \(\upmu \)m), green (0.53–0.59 \(\upmu \)m), red (0.64–0.67 \(\upmu \)m), NIR (0.85–0.88 \(\upmu \)m), SWIR (1.57–1.65 and 2.11–2.29 \(\upmu \)m), cirrus (1.36–1.38 \(\upmu \)m), thermal infrared (10.6–11.19 and 11.5–12.51 \(\upmu \)m), and panchromatic (0.5–0.68 \(\upmu \)m). In this study, the blue (band 2), green (band 3), and red (band 4) bands of a Landsat-8 image at 30 m spatial resolution with 6.56% cloud cover were selected to detect the marine sewage. The image was acquired over Bohai Bay near Tianjin, China, at 05:36:45 on 9 August 2013.

Fig. 2 Comparison of our method with various state-of-the-art methods. a Input RS image, b Boykov and Jolly’s method, c Rother, Blake, and Kolmogorov’s method, and d our method

4.2 Edge detection of the marine sewage

Figure 2 compares our method with state-of-the-art methods, namely Boykov and Jolly’s method and Rother, Blake, and Kolmogorov’s method. Boykov and Jolly’s method still produces some visible errors (in the upper right corner), and Rother, Blake, and Kolmogorov’s method cannot detect light-colored marine sewage. By comparison, our method is more accurate.

Figure 3 shows the experimental results on the PCA-transformed RS image, in which the marine sewage area is more distinct. It can be seen from Fig. 2b, c that Boykov and Jolly’s method and Rother, Blake, and Kolmogorov’s method cannot distinguish the coast from the marine sewage because of their similar colors. Our method performs better in this case, as shown in Fig. 2d.

As shown in Fig. 4, the area and circumference of the detected marine sewage change with the number of iterations: both tend to decrease with each iteration, so the detected shape gradually approaches the true extent of the marine sewage.
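The area and circumference of a segmented region can be estimated from its binary mask, for example as sketched below. This is an assumption on our part, since the paper does not state its exact formulas: the area is the pixel count times the ground area of one pixel (30 m × 30 m for the Landsat-8 bands used here), and the circumference is approximated by counting exposed pixel edges under 4-connectivity, each 30 m long. This edge count overestimates the length of diagonal boundaries. The function name is hypothetical.

```python
# Minimal sketch (hypothetical helper): area and edge-count perimeter of a binary mask.

def area_and_perimeter(mask, pixel_size=30.0):
    """Return (area in m^2, perimeter in m) of the foreground (value 1) in a binary mask."""
    h, w = len(mask), len(mask[0])
    count, edges = 0, 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] != 1:
                continue
            count += 1
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                # An edge is exposed if the neighbor lies outside the image or is background.
                if not (0 <= rr < h and 0 <= cc < w) or mask[rr][cc] != 1:
                    edges += 1
    return count * pixel_size ** 2, edges * pixel_size
```

A 2 × 2 block of foreground pixels, for instance, gives an area of 3600 m² and a perimeter of 240 m.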

Fig. 3 Comparison of our method with various state-of-the-art methods. a RS image of PCA transform, b Boykov and Jolly’s method, c Rother, Blake, and Kolmogorov’s method, and d our method

Fig. 4 Area and circumference of each iteration

4.3 The size of ROI extension

Figures 5 and 6 show that the method can become very inefficient on large-scale RS images, because the running time increases as the size of the ROI extension grows. The local adaptive extension of the ROI addresses this problem: we choose an appropriate extension size for our method (e.g., doubling the ROI size) to improve efficiency.

Fig. 5 Different sizes of ROI extension on a river of Hainan Province

Fig. 6 Running time for different sizes of ROI extension

4.4 Comparison with Grab-Cut method

Figure 7 compares our method with the Grab-Cut method on an RS image of a river in Hainan Province, China (Fig. 7a). In the first experiment, the running time of a single segmentation without manual tagging was evaluated (Fig. 7b). The result of the Grab-Cut method is not accurate enough: part of the vegetation has not been removed. In contrast, our method accomplishes this task, and the river can be clearly distinguished. The second experiment was conducted with manual tagging, which labels possible foreground and background. Even so, Grab-Cut still cannot remove the vegetation beside the river, as shown in Fig. 7c.

Fig. 7 Comparison of our method with Grab-Cut method on a river of Hainan Province in China. a Input RS image, b RS image with automatic segmentation and manual tagging, c Grab-Cut method, and d our method

In terms of running time over the two experiments, our method requires only 1–2% of the time of the Grab-Cut method, so the improved Grab-Cut method improves efficiency significantly. The research code is implemented in C++ and was tested under Windows on a machine with a 3.20 GHz CPU and 4.00 GB of RAM (Table 2).

5 Conclusion

In conclusion, a new and effective method for edge detection in RS images has been proposed, which can obtain good-quality foreground alpha mattes for large-scale images with a rather modest degree of user effort. Software based on the method has also been developed and can be applied to the detection of marine sewage.

Table 2 Comparison of our method with Grab-Cut method