1 Introduction

Image change detection, the process of detecting regions of change in multiple images of the same scene taken at different times [1], is of widespread interest due to a large number of applications in diverse disciplines, including remote sensing [2–10], medical diagnosis [11, 12] and video surveillance [13, 14]. With the development of remote sensing technology, change detection in synthetic aperture radar (SAR) imagery has attracted wide attention, especially when natural catastrophes strike and lives and properties are at stake. SAR images have become useful and indispensable sources of information for change detection because SAR sensors are independent of atmospheric and sunlight conditions. As mentioned in the literature [2], the procedure of unsupervised change detection in SAR images can be divided into three steps: (1) image preprocessing; (2) producing a difference image (DI) from the multi-temporal images; and (3) analysis of the difference image. In the second step, because of the multiplicative nature of speckle, the log-ratio operator is typically used to produce the difference image owing to its robustness and insensitivity to speckle noise [4–8].

The DI-analysis step can in fact be regarded as an image segmentation problem, and two conventional approaches, thresholding and clustering, have been widely used [15–17]. The clustering approach tends to be more convenient and feasible because no explicit model needs to be established, and many clustering-based methods have been proposed. The fuzzy c-means (FCM) algorithm is one of the most popular clustering methods for image segmentation [18]. However, the standard FCM algorithm is very sensitive to noise because it ignores spatial contextual information in the image. To compensate for this defect, many improved FCM algorithms have been proposed that incorporate local spatial and local gray-level information into the original FCM objective function [19–24]. Ahmed et al. [19] proposed FCM_S, which modifies the objective function of FCM by introducing a spatial neighborhood term. One disadvantage of FCM_S is that the neighborhood term must be computed in each iteration step, which is time-consuming. Chen and Zhang [20] proposed FCM_S1 and FCM_S2 to reduce the computational complexity of FCM_S; these two algorithms replace the neighborhood term of FCM_S with an extra mean- or median-filtered image. Since the mean-filtered and median-filtered images can be computed in advance, the execution times are considerably reduced. To accelerate the segmentation process further, Szilagyi et al. [21] proposed the enhanced FCM (EnFCM) and Cai et al. [22] proposed the fast generalized FCM (FGFCM) algorithm. However, all of these algorithms share a crucial parameter \(\alpha \) (or \(\lambda \)) in the second term that controls the effect of the penalty. Since the type of image noise is generally unknown a priori, selecting these parameters is not an easy task, and the algorithms cannot be applied directly to the original image. Krinidis and Chatzis [23] presented a robust fuzzy local information c-means clustering algorithm (FLICM), which is free of any parameter selection and improves segmentation performance, and Gong et al. [24] proposed a reformulated fuzzy local-information c-means clustering algorithm (RFLICM), which introduces the local coefficient of variation to replace the spatial distance as a local similarity measure. However, the local minimizers of FLICM do not converge to the correct local minima of the designed energy function [25].

The clustering task, which aims at reducing the effect of speckle noise while enhancing the changed information, can be considered as a multiobjective optimization problem. A multiobjective optimization problem (MOP) can be described as follows [26]:

$$\begin{aligned} \hbox {min F}\left( x \right) =\left( {f_1 \left( x \right) ,f_2 \left( x \right) ,\ldots ,f_m \left( x \right) } \right) ^{\hbox {T}}, \end{aligned}$$
(1)

subject to \(x=\left( {x_1 ,x_2 ,\ldots ,x_n } \right) \in \Omega \), where x is the decision vector and \(\Omega \) is the feasible region in the decision space. \(F:\Omega \rightarrow \hbox {R}^{\mathrm{m}}\) consists of m real-valued objective functions, and \(\hbox {R}^{\mathrm{m}}\) is called the objective space. The attainable objective set is defined as \(\left\{ {F\left( x \right) |x\in \Omega } \right\} \). Considering a minimization problem for each objective, a decision vector \(x_{\mathrm{A}} \in \Omega \) is said to dominate another vector \(x_{\mathrm{B}} \in \Omega \) (written as \(x_{\mathrm{A}} \succ x_{\mathrm{B}}\)) if and only if

$$\begin{aligned} \forall i= & {} 1,2,\ldots ,m\quad f_i \left( {x_A } \right) \le f_i \left( {x_B } \right) \nonumber \\&\wedge \; \exists j=1,2,\ldots ,m\quad f_j \left( {x_A } \right) <f_j \left( {x_B } \right) . \end{aligned}$$
(2)

If no solution dominates \(x_{\mathrm{A}}\), then \(x_{\mathrm{A}}\) is a Pareto-optimal (nondominated) solution. Since the objectives in (1) contradict each other, no point in \(\Omega \) minimizes all the objectives simultaneously, and they must be balanced against each other. The best tradeoffs among the objectives can be defined in terms of Pareto optimality.
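For illustration, the dominance test of Eq. (2) can be sketched as follows, assuming all objectives are to be minimized; the function name and the use of plain Python sequences are our own choices rather than anything prescribed by the paper.

```python
def dominates(fa, fb):
    """Return True if objective vector fa Pareto-dominates fb (minimization)."""
    no_worse = all(a <= b for a, b in zip(fa, fb))
    strictly_better = any(a < b for a, b in zip(fa, fb))
    return no_worse and strictly_better

# (1.0, 2.0) dominates (1.5, 2.0); neither of (1.0, 3.0) and (2.0, 1.0) dominates the other.
print(dominates([1.0, 2.0], [1.5, 2.0]))  # True
print(dominates([1.0, 3.0], [2.0, 1.0]))  # False
```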

In this paper, we propose a novel multiobjective clustering algorithm built on a multiobjective evolutionary framework to deal with the above-mentioned problems, taking the above algorithms as its starting point. Handl et al. [27] proposed the multiobjective clustering technique MOCK, which treats clustering as a multiobjective optimization problem [26] using two objective functions, cluster variance and connectivity; it solves data clustering problems unconventionally and shows good performance. Following MOCK, Gong et al. [28] used two complementary clustering objectives, compactness (rewarding compact clusters) and connectedness (rewarding local connectedness of clusters). Bandyopadhyay et al. [29] used the Xie-Beni (XB) index [30] and the fuzzy c-means measure (\(J_m\)) as the objective functions, and three validity indices, \(J_m\), XB and PBM, were optimized simultaneously in [31]. In this paper, we propose two complementary clustering objectives that aim to balance noise-immunity and the preservation of image detail. The proposed multiobjective clustering algorithm is based on NNIA (Nondominated Neighbor Immune Algorithm) [32]; it is free of any parameter selection and incorporates local spatial and local gray-level information, and therefore achieves improved and more robust performance compared with single-objective clustering algorithms. NNIA is a multiobjective optimization method that uses a novel nondominated neighbor-based selection strategy in an artificial immune system; it has low computational complexity and is effective. The output of an MOEA is a set of mutually non-dominated clustering solutions, which correspond to different trade-offs between the two objectives and also to different numbers of clusters. To choose the most interesting solutions from the Pareto front, MOCK applies Tibshirani et al.'s [33] Gap statistic, a statistical method for determining the number of clusters in a data set; however, no specific prior information about the image is assumed. For image data, a comparison can be provided in qualitative terms (i.e., visually) and in quantitative terms using the cluster goodness index PBM [34], which was proposed as a measure of the validity of a clustering solution (a larger PBM value implies a better solution); however, this is not good enough for image segmentation. In this paper, the multiobjective clustering algorithm generates many intermediate classification results, and a selective ensemble strategy is then introduced to integrate these intermediate results. Finally, the integrated output is adopted for image segmentation.

This paper is organized as follows. In the next section, our motivation and the main procedure of the proposed approach are described. Section 3 describes the proposed method in detail. Experimental results on real multi-temporal SAR images are presented in Sect. 4 to demonstrate the effectiveness of the proposed approach. Our conclusions are drawn in the last section.

2 Motivation

Let us consider two co-registered intensity SAR images of the same size \(m\times n\) acquired over the same geographical area at two different times. The change detection problem is to produce a difference image that represents the change information between the two acquisition times, and then to apply a binary classification that assigns each pixel to one of two classes: changed and unchanged. Since the two original images are corrupted by noise, efficient classification methods are required to find the changes between them. As shown in Fig. 1, the proposed unsupervised distribution-free change detection approach consists of three phases: (1) generate the difference image (DI); (2) analyze the difference image automatically using a multiobjective clustering algorithm; (3) introduce a selective ensemble strategy to integrate the intermediate segmentation results, whose integrated output forms the final segmentation. In this paper, we lay primary emphasis on the second and third steps. The log-ratio operator is used to produce the difference image.
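As a rough sketch of the first phase, the log-ratio difference image can be computed as below. The small constant added before the logarithm and the rescaling to [0, 255] are our own assumptions and are not specified by the paper.

```python
import numpy as np

def log_ratio_image(img1, img2, eps=1e-6):
    """Log-ratio difference image of two co-registered SAR intensity images."""
    img1 = img1.astype(np.float64)
    img2 = img2.astype(np.float64)
    di = np.abs(np.log((img2 + eps) / (img1 + eps)))   # log-ratio operator
    # rescale to [0, 255] for convenience; one possible normalization only
    return 255.0 * (di - di.min()) / (di.max() - di.min() + eps)
```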

Fig. 1 An illustration of the proposed change detection approach

2.1 Motivation of analyzing difference image using multiobjective clustering

As mentioned in Sect. 1, clustering methods have been widely used in the DI-analysis step, and many clustering-based methods have been proposed. FCM [18] is one of the most popular clustering methods for image segmentation; however, it ignores spatial contextual information in the image, which makes it sensitive to noise. Many improved FCM algorithms, such as FCM_S, FCM_S1 and FCM_S2 [19, 20], compensate for this defect by incorporating local spatial and local gray-level information into the original FCM objective function. These algorithms share a crucial parameter \(\alpha \) (or \(\lambda \)) that controls the trade-off between robustness to noise and the preservation of detail. Since the type of image noise is generally unknown a priori, this parameter has to be chosen by experience or by trial and error. When \(\alpha \) is set to zero, the algorithm is equivalent to the original FCM, while as \(\alpha \) approaches infinity it behaves like the original FCM applied to the filtered image [20], which may cause a loss of detail while denoising. FLICM [23] is a robust fuzzy local information c-means clustering algorithm that defines a novel fuzzy factor to replace the parameter used in the above algorithms and their variants. However, the local minimizers of FLICM do not in fact converge to the correct local minima of the designed energy function, not because the algorithm gets trapped in local minima, but because of the design of the energy function itself [25].

In essence, the second term added in these improved FCM algorithms formulates a spatial constraint over neighboring pixel values and aims to balance noise-immunity against the preservation of image detail. The clustering task, which must reduce the effect of speckle noise while keeping the details intact, can therefore be considered as a multiobjective optimization problem. A multiobjective clustering technique requires two or more complementary objective functions. As discussed above, existing multiobjective evolutionary clustering algorithms predefine the objective functions to be optimized before execution begins and use the same objective functions for all datasets, which makes them unsuitable for reducing the effect of speckle noise.

In this paper, a novel multiobjective clustering algorithm based on a multiobjective evolutionary algorithm is proposed to deal with the above-mentioned problems. Section 3.1 presents this algorithm in further detail.

2.2 Motivation of choosing the most interesting solutions from the Pareto front using selective ensemble strategy

The output of an MOEA is a set of mutually non-dominated clustering solutions, which correspond to different trade-offs between the two objectives. To choose the most interesting solutions from the Pareto front, MOCK applies Tibshirani et al.'s [33] Gap statistic, a statistical method for determining the number of clusters in a data set. The selection is based on the approximation of the shape of the Pareto front, and no specific prior information about the image is assumed. This selection strategy relies on domain-specific considerations, which limits its application. For image data, the cluster goodness index PBM [34] has been proposed as a measure of the goodness of a clustering solution, but since it is essentially equivalent to a single clustering objective it is not good enough for image segmentation. In this paper, an ensemble strategy is instead introduced to combine the intermediate image classification results.

Ensemble learning is a machine learning paradigm in which multiple learners are trained to solve the same problem. Typically, an ensemble is constructed in two steps. First, a number of base learners are produced; most ensemble methods use a single base learning algorithm to produce homogeneous base learners, but some methods use multiple learning algorithms to produce heterogeneous learners. Then, the base learners are combined for use [35]. In addition to classification and regression, ensemble methods have also been designed for clustering [36] and other kinds of machine learning tasks. The multiobjective clustering process produces many intermediate classification results, which can be regarded as the first step of an ensemble; these clustering learners are homogeneous. An ensemble often performs better than any single learner.

Among the many intermediate classification results produced by the multiobjective clustering process, some may not be suitable for the final classification and should not be selected for combination. Selective ensemble learning generates a set of base learners and then selects some of them, instead of using all of them, to compose the ensemble. An algorithm based on selective ensemble learning can be more effective than any single learner and better than combining all of the base classification results. The aim of selective ensemble learning [37] is to further improve the classification accuracy of an ensemble machine, to increase its classification speed and to decrease its storage requirements. A selective ensemble strategy is therefore introduced in this paper to integrate the intermediate image classification results, and the integrated output is adopted for the final segmentation. The detailed description of this method is presented in Sect. 3.2.

3 Methodology

Motivated by the above discussion, a multiobjective clustering technique requires two or more objective functions, a suitable multiobjective optimization algorithm and an effective technique for selecting solutions from the set of Pareto-optimal solutions. We address the above-mentioned problems by introducing two complementary clustering objectives. The details of the proposed multiobjective clustering algorithm and the selective ensemble strategy are described in this section.

3.1 Analyze difference image using multiobjective clustering

The multiobjective clustering algorithm based on NNIA [32] involves two main design issues: the antibody representation and the choice of the cluster validity measures to be optimized.

3.1.1 Initialization

There are two popular strategies for antibody representation: point-based and center-based. In point-based encoding [38], the length of an antibody equals the number of pixels, so point-based encoding suffers from large antibody lengths and hence slow convergence. Here, we adopt center-based encoding, in which the cluster centers are encoded into the antibody. Each antibody has length \(k\times \hbox {d}\), where \(\hbox {d}\) is the dimension of the image data and \(k=2\) (two classes: changed and unchanged); this usually yields a faster convergence rate than point-based encoding. We adopt a real-valued representation.
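A minimal sketch of this center-based, real-valued encoding for k = 2 clusters on a single-band difference image is given below; the uniform random initialization within the gray-level range is our own assumption.

```python
import numpy as np

def init_population(di, pop_size, k=2, seed=None):
    """Each antibody encodes k real-valued cluster centers (k*d genes, d = 1 here)."""
    rng = np.random.default_rng(seed)
    lo, hi = float(di.min()), float(di.max())
    # pop_size antibodies, each holding k centers drawn from the gray-level range of the DI
    return rng.uniform(lo, hi, size=(pop_size, k))
```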

3.1.2 Objective functions for clustering

For the clustering objective functions, we are interested in optimization criteria that reflect different aspects of a good clustering solution. To express cluster compactness, and thereby enhance clustering performance, we compute the overall deviation of a partitioning, i.e., the summed distances between all pixels and their corresponding cluster centers. This criterion is similar to the well-known intra-cluster variance. As an objective it should be minimized, and it is defined as [18]

$$\begin{aligned} J_m =\mathop \sum \limits _{i=1}^N \mathop \sum \limits _{k=1}^c u_{ki}^m \left\| {x_i -v_k } \right\| ^{2} \end{aligned}$$
(3)

where

$$\begin{aligned} u_{ki} =\left( \mathop \sum \nolimits _{j=1}^c \left( {\frac{\left\| {x_i -v_k } \right\| ^{2}}{\left\| {x_i -v_j } \right\| ^{2}}} \right) ^{1/\left( {m-1} \right) }\right) ^{-1} \end{aligned}$$
(4)

where N is the number of pixels (\(N=m\times n\), with \(m\times n\) the size of the difference image), \(u_{ki}\) is the degree of membership of the \(i\hbox {th}\) pixel in the \(k\hbox {th}\) cluster, m is the weighting exponent on each fuzzy membership, \(v_k\) is the prototype of the center of cluster k, and \(\left\| {x_i -v_k } \right\| ^{2}\) is the chosen distance measure.
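A sketch of Eq. (3)–(4) for a flattened difference image and an antibody holding the k cluster centers is shown below; the vectorized NumPy formulation is our own choice, and m = 2 is a common default rather than a value fixed by the paper.

```python
import numpy as np

def fcm_memberships(x, centers, m=2.0, eps=1e-10):
    """Fuzzy memberships u[k, i] of pixel i in cluster k (Eq. 4)."""
    d2 = (x[None, :] - centers[:, None]) ** 2 + eps            # (k, N) squared distances
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)                             # (k, N)

def jm_objective(x, centers, m=2.0):
    """Global fuzzy within-cluster variance J_m (Eq. 3); lower is better."""
    u = fcm_memberships(x, centers, m)
    d2 = (x[None, :] - centers[:, None]) ** 2
    return float(((u ** m) * d2).sum())
```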

For the second objective function, we propose a new measure that incorporates local spatial and local gray-level information in order to reduce the effect of speckle noise. It is computed as

$$\begin{aligned} G_{ki} =\sum _{i=1}^N {\sum _{k=1}^c {\sum _{\begin{array}{l} j\in N_i \\ i\ne j \\ \end{array}} {{\frac{1}{d_{ij} +1}\left\| {x_j -v_k } \right\| ^{2}}/{D_c }} } } \end{aligned}$$
(5)

where

$$\begin{aligned} D_c =\mathrm{max}_{i,j=1}^c \left\| {v_i -v_j } \right\| \end{aligned}$$
(6)

where the \(i\hbox {th}\) pixel is the center of the local window (for example, \(3\times 3\)), the \(k\hbox {th}\) cluster is the reference cluster, and the \(j\hbox {th}\) pixel belongs to the set \(N_i\) of neighbors falling into the window around the \(i\hbox {th}\) pixel. \(d_{ij}\) is the spatial Euclidean distance between pixels i and j, \(v_k\) is the prototype of the center of cluster k, and \(D_c\) measures the maximum separation between two cluster centers over all possible pairs of clusters. The factor \(d_{ij}\) weights the influence of the pixels within the local window so that more local spatial information can be exploited.
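The sketch below implements Eq. (5)–(6) literally as written, for a single-band difference image and scalar cluster centers; the 3×3 window is the example given in the text, and letting border pixels simply have fewer neighbors is our own assumption.

```python
import numpy as np

def gki_objective(di, centers, win=3, eps=1e-10):
    """Local spatial/gray-level objective G (Eq. 5-6); lower is better."""
    h, w = di.shape
    r = win // 2
    centers = np.asarray(centers, dtype=np.float64)
    dc = np.abs(centers[:, None] - centers[None, :]).max() + eps   # D_c, Eq. (6)
    total = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue                                           # j != i
            wgt = 1.0 / (np.hypot(dy, dx) + 1.0)                   # 1 / (d_ij + 1)
            # neighbor values x_j for every pixel i whose neighbor at this offset exists
            nbr = di[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            for vk in centers:
                total += wgt * ((nbr - vk) ** 2).sum() / dc
    return float(total)
```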

The performance of multiobjective clustering depends strongly on the choice of objectives, which should be as contradictory as possible. In this paper, we propose two complementary clustering objectives, \(J_m \) and \(G_{ki}\), that aim to balance noise-immunity and the preservation of image detail. \(J_m\) calculates the global cluster variance, i.e., the within-cluster variance summed over all clusters; a lower value of \(J_m\) implies a better clustering solution. On the other hand, \(G_{ki}\) (Eq. 5) incorporates the local spatial and local gray-level situation. The form of \(G_{ki}\) is similar to that of \(J_m \), so \(G_{ki}\) is also minimized. Since remote sensing data sets typically contain complex overlapping clusters, the two terms may not attain their best values for the same partitioning, and optimizing \(J_m \) and \(G_{ki}\) together provides a set of results. For the purpose of illustration, Fig. 2 shows the final Pareto-optimal front of the proposed algorithm for the Ottawa dataset (described in Sect. 4), demonstrating the contradictory nature of the two complementary clustering objectives [27].

Fig. 2 Non-dominated Pareto front for the Ottawa dataset

3.1.3 The main loop of NNIA

NNIA stores the nondominated individuals found so far in an external population called the dominant population. Only a subset of less-crowded nondominated individuals, called active antibodies, are selected to undergo proportional cloning, recombination, and static hypermutation. The population storing the clones is called the clone population. The dominant population, active population, and clone population at time t are represented by the time-dependent matrices \(\mathbf{D}_{\mathrm{t}}\), \(\mathbf{A}_{\mathrm{t}}\), and \(\mathbf{C}_{\mathrm{t}}\), respectively. The main loop of NNIA proceeds as follows [28].
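A simplified sketch of this main loop is given below; it is not the exact published algorithm of [32]: the uniform cloning and Gaussian perturbation stand in for NNIA's crowding-proportional cloning, recombination and static hypermutation, and the evaluate callback is assumed to return the objective vector (J_m, G_ki) of one antibody.

```python
import numpy as np

def nondominated(objs):
    """Indices of solutions not dominated by any other (minimization)."""
    return [i for i, fi in enumerate(objs)
            if not any(np.all(fj <= fi) and np.any(fj < fi)
                       for j, fj in enumerate(objs) if j != i)]

def crowding(objs):
    """Crowding distance of each solution (larger = less crowded)."""
    n, m = objs.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(objs[:, k])
        dist[order[0]] = dist[order[-1]] = np.inf
        span = (objs[order[-1], k] - objs[order[0], k]) or 1.0
        dist[order[1:-1]] += (objs[order[2:], k] - objs[order[:-2], k]) / span
    return dist

def nnia_loop(evaluate, pop, g_max=100, n_d=100, n_a=20, n_c=100, sigma=0.1, seed=None):
    """Simplified sketch of the NNIA main loop (stand-in variation operators)."""
    rng = np.random.default_rng(seed)
    for _ in range(g_max):
        objs = np.array([evaluate(a) for a in pop])
        idx = nondominated(objs)                            # dominant population D_t
        dom, dom_objs = pop[idx], objs[idx]
        cd = crowding(dom_objs)
        if len(dom) > n_d:                                  # truncate by crowding distance
            keep = np.argsort(-cd)[:n_d]
            dom, cd = dom[keep], cd[keep]
        act = dom[np.argsort(-cd)[:n_a]]                    # active population A_t
        clones = act[rng.integers(0, len(act), size=n_c)]   # clone population C_t
        clones = clones + rng.normal(0.0, sigma, size=clones.shape)
        pop = np.vstack([dom, clones])
    objs = np.array([evaluate(a) for a in pop])
    return pop[nondominated(objs)]                          # final nondominated set
```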

3.2 Choose the most interesting solutions from the Pareto front using selective ensemble strategy

The multiobjective clustering process produces many intermediate classification results, which can be regarded as the first step of an ensemble. The last step in any ensemble-based system is the mechanism used to combine the individual classifiers; among the most popular combination schemes are majority voting for classification and weighted averaging for regression [35].

The output of the multiobjective clustering algorithm is a set of mutually non-dominated clustering solutions, which correspond to different trade-offs between the two objectives. The intermediate image classification results thus provide differently biased class labels, which are then combined through a simple or weighted majority vote (or sum) to produce the final prediction.

Majority voting [39] has three flavors, depending on whether the ensemble decision is the class (1) on which all classifiers agree (unanimous voting); (2) predicted by at least one more than half of the classifiers (simple majority); or (3) that receives the highest number of votes, whether or not the sum of those votes exceeds 50 % (plurality voting). When not specified otherwise, majority voting usually refers to plurality voting, which can be defined mathematically as follows: choose class \(c^{*}\) if

$$\begin{aligned} \mathop {\sum } \limits _{\hbox {t}=1}^{\hbox {T}} \hbox {d}_{\hbox {t},\hbox {c}^{*}} =\max _{\hbox {c}} \mathop {\sum } \limits _{\hbox {t}=1}^{\hbox {T}} \hbox {d}_{\hbox {t},\hbox {c}} \end{aligned}$$
(7)

where T is the number of classifiers and \(\hbox {d}_{\hbox {t},\hbox {c}}\) equals 1 if the \(t\hbox {th}\) classifier chooses class c and 0 otherwise.


If the classifier outputs are independent, it can be shown that majority voting is the optimal combination rule. Here, we use a simple majority vote to produce a set of class labels for the difference image (DI), because the clustering solutions are non-dominated and independent. A selective ensemble strategy is then introduced to integrate part of the intermediate image classification results. It is well known that diversity among component classifiers is crucial for constructing a strong ensemble. After generating a set of base learners, selecting some of them instead of using all of them to compose the ensemble is the better choice [37]. The aim of selective ensemble learning is to further improve the accuracy of an ensemble machine, to increase its speed and to decrease its storage requirements.
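A sketch of the pixel-wise plurality vote of Eq. (7) over the T intermediate label maps is given below; it assumes each map is a 2-D array of integer class labels starting at 0, and the helper name is our own.

```python
import numpy as np

def majority_vote(label_maps):
    """Pixel-wise plurality vote over T label maps of identical shape (Eq. 7)."""
    stack = np.stack(label_maps)                  # (T, H, W)
    t, h, w = stack.shape
    flat = stack.reshape(t, -1)
    n_classes = int(flat.max()) + 1
    # votes[c, p] = number of intermediate results assigning class c to pixel p
    votes = np.stack([(flat == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0).reshape(h, w)
```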

Selective ensemble strategies can be roughly divided into the following categories. The studies in [40, 41] approach selective ensembles from the viewpoint of clustering, where clustering here means grouping similar base classifiers together; k-means and hierarchical clustering are typical choices. Within this family, the ensemble members can be selected in four ways: selecting the cluster centers as ensemble models, randomly selecting one model from every cluster, randomly selecting two or three models from every cluster, or selecting by measuring the diversity of the candidate models, where diversity can be measured by fail/no-fail statistics, double fault and the correlation coefficient. Sorting the base classifiers and then pruning the ensemble is a more intuitive selective ensemble method [42, 43]; the procedure has two steps: sort the base classifiers according to some criterion (e.g., accuracy) and apply an appropriate stopping criterion (such as a specified number of base classifiers) to select part of them. Selecting part of the base classifiers according to some criterion is the most intuitive selective ensemble method [44, 45]; broadly speaking, the sorting-based strategies also belong to this kind. In other work [37, 46], genetic-algorithm-based selective ensembles were proposed. Zhou et al. [37] proposed the "many could be better than all" theorem, which proves the validity of the selective ensemble strategy theoretically for the first time, and presented an algorithm named Genetic Algorithm based Selective Ensemble (GASEN) to build selective ensembles. GASEN assigns a random weight to each available component learner, employs a genetic algorithm to evolve those weights so that they characterize, to some extent, the fitness of each learner for joining the ensemble, and finally selects the learners whose weights exceed a preset threshold to constitute the ensemble [46].

The strategy used in this step depends on the type of classifiers. The multiobjective clustering process produces a set of intermediate classification results whose outputs are independent. The selective ensemble strategy introduced in this paper therefore selects a subset of the ordered intermediate image classification results for aggregation. First, all intermediate clustering results are combined by a simple majority vote to produce a set of class labels for the difference image; the individual intermediate results are then reordered using the class labels generated in the first step as the criterion, and the top 15–30 % are selected for combination [47, 48]. The process is shown in Fig. 3.

Fig. 3 Process of choosing the most interesting results from the Pareto front

The criterion is calculated as:

$$\begin{aligned} \hbox {ar}={\hbox {N}_{\hbox {k}}}/\hbox {N} \end{aligned}$$
(8)

where N is the number of pixels and \(\hbox {N}_{\hbox {k}}\) is the number of pixels whose class label in the intermediate classification result agrees with the majority-voting result produced in the first step. If an intermediate result and the class labels generated in the first step are in complete agreement, the \(\hbox {ar}\) value is 1; if there is no agreement at all, the \(\hbox {ar}\) value is 0. Finally, the top 15–30 % of the intermediate results are combined by majority voting to produce the class labels of the final change map.
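The whole selection step can be sketched as follows, reusing the majority_vote helper from the previous sketch; keeping 20 % is just one value from the reported 15–30 % range.

```python
import numpy as np

def selective_ensemble(label_maps, keep_fraction=0.20):
    """Rank intermediate results by ar = N_k / N (Eq. 8), keep the top fraction, vote again."""
    reference = majority_vote(label_maps)                      # step 1: vote over all results
    n = reference.size
    ar = [np.count_nonzero(m == reference) / n for m in label_maps]
    order = np.argsort(ar)[::-1]                               # highest agreement first
    n_keep = max(1, int(round(keep_fraction * len(label_maps))))
    selected = [label_maps[i] for i in order[:n_keep]]
    return majority_vote(selected)                             # final change map labels
```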

4 Experimental study

In this section, we carry out two sets of experiments to validate the effectiveness of the proposed SAR-image change detection method NNIA_C. The first analyzes the effectiveness of the two proposed clustering objectives, and the second verifies the performance of the complete algorithm.

4.1 Introduction to datasets

We will show the performance of the proposed methods by presenting numerical results on five data sets.

The first data set is a section (\(301 \times 301\) pixels) of two SAR images acquired by the European Remote Sensing 2 satellite SAR sensor over an area near the city of Bern, Switzerland, in April and May 1999, respectively. Between the two dates, the River Aare flooded parts of the cities of Thun and Bern and the airport of Bern. The Aare Valley between Bern and Thun was therefore selected as a test site for detecting flooded areas. The available ground truth (reference image), shown in Fig. 4c, was created by integrating prior information with photo interpretation based on the input images in Fig. 4a, b.

Fig. 4 Multi-temporal images relating to the city of Bern. a Image acquired in April 1999 before the flooding. b Image acquired in May 1999 after the flooding. c Ground truth

The second data set is a section (\(290 \times 350\) pixels) of two SAR images over the city of Ottawa acquired by the RADARSAT SAR sensor and provided by Defence Research and Development Canada (DRDC)-Ottawa. These images were registered by the automatic registration algorithm from A.U.G. Signals Ltd that is available through the distributed computing service at www.signalfusion.com. The available ground truth (reference image), shown in Fig. 5c, was created by integrating prior information with photo interpretation based on the input images in Fig. 5a, b.

Fig. 5 Multi-temporal images relating to Ottawa. a Image acquired in July 1997 during the summer flooding. b Image acquired in August 1997 after the summer flooding. c Ground truth

The Yellow River dataset consists of two SAR images acquired by Radarsat-2 over the region of the Yellow River Estuary in China in June 2008 and June 2009, respectively. We select three typical areas (inland water, coastline and farmland), shown in Figs. 6, 7 and 8, in which different kinds of changes occur, so that the methods can be compared under different conditions.

Fig. 6 Multi-temporal images relating to the Inland water area of the Yellow River Estuary. a Image acquired in June 2008. b Image acquired in June 2009. c Ground truth

Fig. 7 Multi-temporal images relating to the Coastline area of the Yellow River Estuary. a Image acquired in June 2008. b Image acquired in June 2009. c Ground truth

Fig. 8 Multi-temporal images relating to the Farmland C area of the Yellow River Estuary. a Image acquired in June 2008. b Image acquired in June 2009. c Ground truth

4.2 Evaluation criteria

The quantitative analysis of the change detection results uses three criteria from [49]. First, we calculate the false negatives (FN, changed pixels that are not detected). Second, we calculate the false positives (FP, unchanged pixels wrongly detected as changed). Third, we calculate the percentage correct classification (PCC), given by

$$\begin{aligned} \hbox {PCC}={\left( {\hbox {TP}+\hbox {TN}} \right) }/{\left( {\hbox {TP}+\hbox {FP}+\hbox {TN}+\hbox {FN}} \right) } \end{aligned}$$
(9)

where TP is short for true positives, which is the number of pixels that are detected as the changed area in both the reference image and the result. TN is short for true negatives, which is the number of pixels that are detected as the unchanged area in both the reference image and the result.

For accuracy assessment, we also compute the Kappa statistic, a measure of accuracy or agreement based on the difference between the error matrix and chance agreement [50].

Kappa is calculated as:

$$\begin{aligned} \hbox {Kappa}=\frac{\hbox {PCC}-\hbox {PRE}}{1-\hbox {PRE}} \end{aligned}$$
(10)

where

$$\begin{aligned} \hbox {PRE}=\frac{\left( {\hbox {TP}+\hbox {FP}} \right) \cdot \hbox {Mc}+\left( {\hbox {FN}+\hbox {TN}} \right) \cdot \hbox {Mu}}{\hbox {M}^{2}} \end{aligned}$$
(11)

where M is the number of pixels in the image, Mc is the number of changed pixels and Mu is the number of unchanged pixels.
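A minimal sketch of Eq. (9)–(11), computed from a predicted binary change map and the ground truth (1 = changed); the function and variable names are ours.

```python
import numpy as np

def change_detection_metrics(pred, truth):
    """FP, FN, PCC and Kappa for binary change maps (Eq. 9-11)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.count_nonzero(pred & truth)
    tn = np.count_nonzero(~pred & ~truth)
    fp = np.count_nonzero(pred & ~truth)
    fn = np.count_nonzero(~pred & truth)
    m = pred.size
    mc, mu = np.count_nonzero(truth), m - np.count_nonzero(truth)
    pcc = (tp + tn) / m
    pre = ((tp + fp) * mc + (fn + tn) * mu) / m ** 2
    kappa = (pcc - pre) / (1.0 - pre)
    return {"FP": fp, "FN": fn, "PCC": pcc, "Kappa": kappa}
```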

4.3 Input parameters

The different parameters are set as follows: maximum number of generations \(\hbox {Gmax} = 100\), maximum size of dominant population \(\hbox {nD}=100\), maximum size of active population \(\hbox {nA} =20\), size of clone population \(\hbox {nC} =100\), mutation probability \(\hbox {pm}=1/\hbox {k}\).

4.4 Effectiveness of the proposed two objectives

In the first set of experiments, we analyzed the effectiveness of the two proposed objectives for processing the difference image (DI), each used on its own. The experiments were carried out on the Ottawa dataset, and the results are shown in Fig. 9. The result obtained with the first objective keeps the details completely, as shown in Fig. 9a, while the result obtained with the second objective loses some details but contains no noise, as shown in Fig. 9b; Fig. 9c is the reference image. According to Fig. 9a, the change detection map achieved by the single objective \(J_m \) contains many spots and, as listed in Table 1, yields a very high FP. The single objective \(G_{ki}\) effectively reduces the noise (the FP is 0) but causes the loss of details, as seen in Fig. 9b. The multiobjective clustering algorithm NNIA_C, which uses both objectives, strikes a balance between denoising and preserving details, which is reflected in a balance between FP and FN. The results on the Ottawa dataset indicate that, rather than considering these objectives individually, better clustering performance is obtained when both are optimized simultaneously; the algorithm thus has a strong capacity for interpreting the images sufficiently. The output of the multiobjective clustering is a set of mutually non-dominated clustering solutions, as can be seen from the FP and FN values on the Ottawa dataset shown in Fig. 10: the number of FP decreases as the number of FN increases.

Table 1 Change detection results obtained by optimizing single objective and multiple objectives
Fig. 9 Change detection results of the Ottawa dataset achieved by a single objective \(J_m \), b single objective \(G_{ki} \), c ground truth

Fig. 10 Multiobjective clustering result of FP and FN on the Ottawa dataset

This is because the results correspond to different trade-offs between the two objectives. From solution 1 to solution 100 along the front, the weight of the first objective decreases while the weight of the second objective increases, so the amount of noise gradually decreases as the number of undetected changed pixels increases. Parts of the intermediate image segmentation results on the Ottawa dataset are shown in Fig. 11; with the reduction of noise, image details are progressively lost.

Fig. 11 Parts of the multiobjective clustering results on the Ottawa dataset

4.5 Performance of the proposed method

The second set of experiments verifies the performance of our multiobjective clustering algorithm with the selective ensemble strategy. To assess the suitability of the presented selective ensemble strategy, comparisons were carried out among traditional FCM, FCM_S, RFLICM, MRFFCM, MOCK (a multiobjective clustering algorithm) and our method NNIA_C on all five datasets. To select the effective result of MOCK, the cluster goodness index PBM was used.

4.5.1 Results on the Bern dataset

For the Bern dataset, the results are illustrated in Fig. 12 and listed in Table 2.

Fig. 12 Change detection results of the Bern dataset achieved by a FCM, b FCM_S, c RFLICM, d MRFFCM, e MOCK, f NNIA_C

Table 2 Change detection results of the Bern dataset obtained by FCM, FCM_S, RFLICM, MRFFCM, MOCK and NNIA_C

According to Fig. 12a, the change detection result of the Bern dataset achieved by traditional FCM contains many spots, which is explained by the fact that it fails to consider any information about the spatial context. RFLICM, which exploits local information, effectively reduces the noise but causes a loss of details, as seen in Fig. 12c. MRFFCM also uses local information to suppress noise, yet the final map it generates is still polluted by some spots. The FN of MOCK is as high as 736, which reflects the loss of details seen in Fig. 12e.

As reported in Table 2, FCM_S yields the highest PCC and Kappa. However, FCM_S has a crucial parameter \(\alpha \) (or \(\lambda \)) in its second term that controls the effect of the penalty; since the type of image noise is generally unknown a priori, selecting this parameter is not an easy task, and the algorithm cannot be applied directly to the original image. The visual and quantitative results on the Bern dataset also confirm the suitability of the proposed method NNIA_C.

4.5.2 Results on the Ottawa dataset

For the Ottawa dataset, the results are shown in Fig. 13 and listed in Table 3.

Fig. 13 Change detection results of the Ottawa dataset achieved by a FCM, b FCM_S, c RFLICM, d MRFFCM, e MOCK, f NNIA_C

Fig. 14 Change detection results of the Coastline dataset achieved by a FCM, b FCM_S, c RFLICM, d MRFFCM, e MOCK, f NNIA_C

According to Fig. 13, MOCK and FCM cause a loss of information in the changed area, as seen in Fig. 13a, e, and MOCK yields a very high FN even though the influence of noise on the Ottawa dataset is relatively small. RFLICM, which applies local information to suppress noise, also causes a loss of details, as seen in Fig. 13c. MRFFCM, with its modified MRF energy function, performs well on the Ottawa dataset, which is less affected by noise, but it yields the highest FP (errors). As reported in Table 3, FCM_S gives the highest PCC and Kappa; however, FCM_S has a crucial parameter \(\alpha \) (or \(\lambda \)) in its second term that controls the trade-off between robustness to noise and preservation of detail, and since the type of image noise is generally unknown a priori, this parameter has to be chosen by experience or by trial and error. The proposed method effectively reduces the errors (FP) in the change detection results, and the quantitative results on the Ottawa dataset confirm the suitability of NNIA_C.

Table 3 Change detection results of Ottawa dataset obtained by FCM, FCM_S, RFLICM, MRFFCM, MOCK and NNIA_C

4.5.3 Results on the Yellow River datasets

The results on the three typical areas of the Yellow River dataset are shown in Figs. 14, 15 and 16 and listed in Tables 4, 5 and 6. It is hard to detect the changes occurring in the Yellow River dataset because it is strongly affected by noise. The final maps obtained by FCM, which is sensitive to noise (Figs. 14, 15 and 16a), confirm the necessity of incorporating information about the spatial context. Although FCM_S gave the highest PCC and Kappa on the Bern and Ottawa datasets, the change detection maps it generates for the three typical areas of the Yellow River dataset are swamped with noise. MOCK keeps a good denoising performance, but it yields a very high FN, which reflects the serious loss of details seen in Figs. 14, 15 and 16e; the PCC obtained by MOCK on the Coastline dataset is the highest, but its Kappa is lower. Although the two clustering methods RFLICM and MRFFCM use local information to eliminate noise, the final change detection maps they generate are still polluted by some spots; only the result yielded by RFLICM on the Farmland C dataset is better. The proposed method NNIA_C performs better, as shown in Figs. 14, 15 and 16f and listed in Tables 4, 5 and 6. Whether the data come from water areas (Inland water and Coastline datasets) or land (Farmland C dataset), the proposed method makes a good tradeoff between image detail and noise.

Fig. 15 Change detection results of the Inland water dataset achieved by a FCM, b FCM_S, c RFLICM, d MRFFCM, e MOCK, f NNIA_C

Fig. 16 Change detection results of the Farmland C dataset achieved by a FCM, b FCM_S, c RFLICM, d MRFFCM, e MOCK, f NNIA_C

Table 4 Change detection results of coastline dataset obtained by FCM, FCM_S, RFLICM, MRFFCM, MOCK and NNIA_C
Table 5 Change detection results of inland water dataset obtained by FCM, FCM_S, RFLICM, MRFFCM, MOCK and NNIA_C

The proposed method NNIA_C finds a balance between denoising and preserving details. Whether the changed areas are small (Coastline and Bern datasets) or large (Ottawa dataset), regular (Inland water dataset) or irregular (Farmland C dataset), the proposed method is applicable.

Table 6 Change detection results of Farmland C dataset obtained by FCM, FCM_S, RFLICM, MRFFCM, MOCK and NNIA_C

5 Concluding remarks

In this paper, we have presented a novel change detection algorithm specifically designed for analyzing multi-temporal SAR images, based on a multiobjective clustering algorithm and a selective ensemble strategy. This approach is quite different from the existing clustering algorithms. First, after generating the DI through the log-ratio operator, we propose two complementary clustering objectives and analyze the difference image automatically using a multiobjective clustering algorithm. The proposed multiobjective clustering algorithm aims at reducing the effect of speckle noise and enhancing the changed information: one objective function calculates the global cluster variance, and the other incorporates the local spatial and local gray-level situation. The multiobjective clustering algorithm is built on the framework of evolutionary algorithms. Second, to choose the most interesting solutions from the Pareto front, we introduce a selective ensemble strategy to integrate the intermediate image segmentation results. In contrast to selection strategies based on the approximation of the shape of the Pareto front, our selection strategy does not rely on domain-specific considerations. As the experimental results show, the PBM index, proposed as a measure of the validity of a clustering solution, is not good enough for image segmentation.

The experiments on datasets with different characteristics indicate the validity of the proposed approach. The two complementary clustering objectives can reduce the noise in the DI without losing the SAR-image details, and choosing the most interesting solutions from the Pareto front with the selective ensemble strategy is convenient, integrates the intermediate results, and gains better performance. In general, the proposed approach is free of any parameter selection and makes a tradeoff between image detail and noise. In the future, we will research different selective ensemble strategies for dealing with images having different noise characteristics.