Oil pollution is one of the most common forms of marine pollution. An estimated 706 million gallons of oil are spilled into the ocean each year (Ivanovic 2012). Industrial discharges and urban runoff, oil production, and routine ship maintenance during operation account for a significant proportion. The remainder results from seepage, shipping accidents, or atmospheric circulation (Zhao et al. 2014). Marine oil spills have become one of the most serious ocean pollution problems because they degrade ocean ecosystems and affect both the environment and the economy (ESA 1998).

There are rich natural resources in the North Pacific Ocean, but the continental shelf has been affected by oil spill pollution (Burger and Fry 1993). When the oil drifts into coastal areas, it degrades ocean resources, including shellfish beds, saltwater marshes, coral reefs, and other habitats (Garcia-Pineda et al. 2013).

To deal with oil spill pollution and to prevent expensive environmental and economic costs, a rapid and accurate response is necessary. The magnitude, location, and drifting course of the oil spill need to be confirmed as soon as possible. The coverage, continuity of observations, and rich data provided by remote sensing make it an efficient way to detect and monitor oil spills over a broad area.

Synthetic Aperture Radar (SAR) is commonly used for oil spill detection and monitoring. SAR images can be divided into single- and multi-polarization images. The main method for detecting oil with single-polarization SAR is image processing, including image pre-processing, feature extraction, image segmentation, and classification. Machine learning and pattern recognition methods have been introduced into single-polarization SAR oil detection (Keramitsoglou et al. 2006; Derrode and Mercier 2007; Topouzelis et al. 2007). Recently, a growing number of researchers have studied the use of multi-polarized SAR, including four polarization modes (HH/VV/HV/VH) (Migliaccio et al. 2009, 2011). Multi-polarized SAR can take advantage of comprehensive electromagnetic characteristics to identify energy and phase differences (Nunziata et al. 2011; Zhang et al. 2011; Minchew et al. 2012). However, the coverage, temporal resolution, and high cost of SAR limit its practical application to oil spills. In addition, the speed and direction of the wind must be taken into account to detect oil spills (Espedal 1999; Solberg et al. 1999). The main difficulty of using SAR for oil spill detection is distinguishing mineral oil slicks from look-alikes (phenomena whose scattering characteristics are very similar to those of real oil spills), such as low-wind areas, biogenic films, rain cells, oceanic internal waves, and atmospheric gravity waves (Grimaldi et al. 2011).

Recently, robust satellite techniques to monitor oil spills, which have evolved from robust AVHRR techniques, have been proposed. The robust satellite technique is an automatic monitoring method that regards the satellite images as a space–time process. The robust satellite technique uses long-term multi-temporal satellite records to obtain the characteristics of signals and develop an index to identify signal anomalies that discriminate oil spills (Casciello et al. 2007; Grimaldi et al. 2009, 2010). However, the results need to be confirmed by further analyses for different events and extended to different satellite platforms.

Visible optical imagery has rich band information. Landsat 5 TM/Landsat 7 ETM+ captured images of oil spill pollution in Brazil (Bentz and De Miranda 2001), the Arabian Gulf (Essa et al. 2005), and the Gulf of Mexico (MacDonald et al. 1993). With the development of high-resolution imagery, visible remote sensing has been used in the field of automatic oil spill detection and monitoring. The image features are explored and a particle swarm optimization algorithm or other algorithm is used to complete an object-oriented approach (Fan et al. 2014). However, these methods can only locate the presence of a known oil spill after an alert and require the presence of prior knowledge (Grimaldi et al. 2011).

This paper proposes an automatic and efficient method for processing and analyzing high/moderate-resolution remote sensing images, aimed at locating and monitoring oil spills on a large scale. First, a combination of bottom-up and top-down saliency detection is proposed for the rapid location of oil spills. A modified graph-based visual saliency (GBVS) model and a spectral similarity match saliency model are used jointly to locate a marine oil spill anomaly using computer vision, while other interfering targets are ruled out by their spectra. Once the oil spill saliency has been calculated, the region of interest (ROI) can be rapidly located. The ROI is then segmented, and the segmentation results are compared and analyzed, to achieve the goal of monitoring marine oil spills. The flow chart of the proposed method is shown in Fig. 1.

Fig. 1 Flowchart of the proposed model

Materials and Methods

Dataset

In the present study, Landsat 8, GF-1, MAMS, and HJ-1 data were analyzed. The satellite product characteristics are shown in Tables 1, 2, 3 and 4. A 1566 × 1585 pixel Landsat 8 image was acquired on March 8, 2014, covering an area from 101.54°E to 103.63°E and 7.61°N to 9.7°N in the North Pacific Ocean. An 833 × 570 pixel GF-1 satellite image covering an oil spill in the Qingdao coastal area from 119.54°E to 121°E and 34°N to 37°N was acquired on November 26, 2013. Another experimental dataset covered Zhoushan airport (29.9°N, 122.3°E) and used airborne MAMS data (280 × 130 pixels) acquired on September 16, 2008. In addition, HJ-1 imagery was chosen as a further dataset for the oil detection experiments, covering the Penglai (hereinafter PL) oil spill of June 11, 2011, the Deepwater Horizon (hereinafter DH) oil spill of April 20, 2010, and the Dalian Xingang (hereinafter XG) oil spill accident of July 16, 2010. The PL oil spill image (1414 × 847 pixels), covering 119°3′4″E−120°1′28″E, 38°1′1″N−38°4′38″N, was acquired on June 13, 2010. The DH oil spill image (4338 × 7272 pixels), covering 87°7′28″W−88°6′19″W, 38°1′1″N−38°4′38″N, was acquired on May 12, 2010. The XG oil spill image (454 × 444 pixels), covering 121°4′26″E−122°49″E, 38°8′7″N−39°12″N, was acquired by HJ-1 on July 23, 2010. The remote sensing datasets are shown in Figs. 2, 3, 4, 5, 6 and 7.

Table 1 Landsat 8 (OLI)
Table 2 GF1 multispectral camera
Table 3 MAMS multispectral camera
Table 4 HJ-1 A/B CCD
Fig. 2 Landsat 8 data

Fig. 3 GF-1 data

Fig. 4 MAMS data

Fig. 5 PL data

Fig. 6 DH data

Fig. 7 XG data

ROI Detection

Optical satellite imagery has the advantages of lower cost, rich band information, and large spatial observation scales, which make it a good data source for monitoring oil spills. Although SAR data are becoming open access, the amount of data is still limited, and multi-polarized SAR images remain difficult to obtain. In large-scale optical images, the ROI of an oil spill can be too small to be detected, and most of the image is irrelevant to the target. This makes it time-consuming to match and recognize the target directly within the whole image. A method that locates the ROI quickly and accurately in a high-resolution remote sensing image will play an important role in improving the efficiency of oil spill monitoring.

Research in visual psychology has shown that the human visual system is capable of identifying ROIs before a detailed analysis of the visual scene. Computers can simulate this visual attention mechanism using a specific algorithm for ROIs. ROI detection based on the visual attention mechanism uses low-level visual features of images, such as color, orientation, and brightness, to simulate the visual attention model and generate a saliency map. The saliency value represents the importance of a position within an image: a salient region is one that is highly important to human visual attention.

In this paper, the saliency map is calculated for scenes containing features such as sea water, clouds, islands, ships, or oil. Most of these features (except for the sea) must be of local importance within the context of the ocean to attract visual attention, whereas shadows and waves do not form strong visual features. A GBVS model was modified to implement bottom-up saliency detection in this paper.

It is then essential to distinguish oil from other salient features (including clouds, islands, and ships). For this purpose, a second saliency map based on spectral similarity is proposed. A spectral similarity match model was used to measure the similarity between the sample oil spectra (extracted from experiments or from the images) and the pixel spectra in the image. The regions with spectra similar to oil are extracted and used to derive a simple frequency-based saliency map. Thus, a spectral similarity match saliency map is created. This step is called top-down saliency detection.

Bottom-Up Saliency Map Using a Simplified GBVS Model

There are a number of ways to generate saliency maps. Center-surround mechanisms are modeled on primary visual cortex cells (Itti et al. 1998; Walther and Koch 2006). Context-aware saliency detection is based on the detection of backgrounds indicating the object (Jiang et al. 2011; Goferman et al. 2012). Frequency-domain saliency detection includes the spectral residual approach and the phase spectrum approach (Hou and Zhang 2007; Guo et al. 2008). Information maximization suggests that the rarity of significant features can be used to measure visual saliency, as such information is distinctive enough to attract attention (Bruce and Tsotsos 2005; Luo et al. 2012). Another method is based on a graph structure, which can simulate the neuron connections in the cerebral cortex and use them to extract saliency. The GBVS model was proposed on this basis, with better outcomes. The equilibrium state is used as a measurement of saliency by considering the saliency difference of each pair of Markov nodes globally and dynamically. The model was better adapted to the background but was rather fast to run. A typical GBVS model is outlined below.

Step 1. For the input image I, the color, brightness, and orientation (Gabor filters) channels are constructed by several low-pass filters and 1/2 down-sampling to obtain a multi-Gaussian pyramid. Each level is decomposed into red (R), green (G), blue (B), yellow (Y), brightness intensity (I), and local orientation (O) channels. From these channels, a center-surround feature map is calculated from the across-scale differences (Sun et al. 2010).

$$F_{{{\text{I}},{\text{C}},{\text{S}}}} = N\left( {\left| {I\left( C \right){ \ominus }I\left( S \right)} \right|} \right),$$
(1)
$$F_{\theta } = N\left( {\left| {O_{\theta } \left( C \right){ \ominus }O_{\theta } \left( S \right)} \right|} \right),$$
(2)
$$F_{{{\text{RG}},{\text{C}},{\text{S}}}} = N\left( {\left| {R\left( C \right){ \ominus }G\left( C \right) - \left( {R\left( S \right){ \ominus }G\left( S \right)} \right)} \right|} \right),$$
(3)
$$F_{{{\text{BY}},{\text{C}},{\text{S}}}} = N\left( {\left| {B\left( C \right){ \ominus }Y\left( C \right) - \left( {B\left( S \right){ \ominus }Y\left( S \right)} \right)} \right|} \right),$$
(4)

where \({ \ominus }\) denotes the across-scale difference between the center (C) and surround (S) levels of the feature pyramid, and θ refers to the orientation. N is a map normalization operator that normalizes the values in the map to a fixed range, finds the location of the map's global maximum M, computes the average \(\bar{m}\) of all its other local maxima, and finally multiplies the map globally by \((M - \bar{m})^{2}\) (Itti et al. 1998).

Step 2. An activation map is generated from these feature maps. The dissimilarity of M (i, j) and M (p, q) in the feature map M is defined as:

$$d\left( {\left( {i,j} \right)||\left( {p,q} \right)} \right) \triangleq \,\left| {\log \frac{{M\left( {i,j} \right)}}{{M\left( {p,q} \right)}}} \right|$$
(5)

The fully connected directed graph \(G_A\) is obtained by connecting every node of the lattice M with all other n − 1 nodes. The weight of the directed edge from node (i, j) to node (p, q) is defined as follows (Harel et al. 2006):

$$w\left( {\left( {i,j} \right),\left( {p,q} \right)} \right) \triangleq d\left( {\left( {i,j} \right)||\left( {p,q} \right)} \right) \cdot F\left( {i - p,j - q} \right)$$
(6)

where

$$F\left( {a,b} \right) \triangleq \exp \left( { - \frac{{a^{2} + b^{2} }}{{2\sigma^{2} }}} \right)$$
(7)

Here, σ is a free parameter of the algorithm.

Step 3. A Markov chain is defined on \(G_A\) by normalizing the weights of the outbound edges of each node to 1. A state vector with one entry per node is initialized randomly (Dan et al. 2015). The equilibrium state vector gives the activation measure of each node.

Step 4. Finally, the GBVS map is generated by normalizing the activation map A. A graph \(G_A\) is constructed from A, and the weight of the edge from node (i, j) to node (p, q) is defined as:

$$W\left( {\left( {i,j} \right),\left( {p,q} \right)} \right) \triangleq A\left( {\left( {i,j} \right)||\left( {p,q} \right)} \right) \cdot F\left( {i - p,j - q} \right)$$
(8)

Again, the resulting graph is treated as a Markov chain. The final saliency map is generated by computing the equilibrium state vector over the nodes of \(G_A\).
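Steps 2 and 3 can be illustrated with a minimal Python sketch, assuming a tiny single-channel feature map stored as nested lists. The function name `activation_map`, the small constant `eps`, the uniform (rather than random) initial state, and the "lazy" update that averages the old and new state to avoid oscillation are illustrative choices, not part of the paper or of the original GBVS code.

```python
import math

def activation_map(feature, sigma=1.0, iters=200, eps=1e-12):
    """Equilibrium distribution of the Markov chain built from Eqs. (5)-(7)."""
    rows, cols = len(feature), len(feature[0])
    nodes = [(i, j) for i in range(rows) for j in range(cols)]
    n = len(nodes)
    transition = []
    for (i, j) in nodes:
        row = []
        for (p, q) in nodes:
            # Eq. (5): log-ratio dissimilarity (eps guards against zeros, an assumption).
            d = abs(math.log((feature[i][j] + eps) / (feature[p][q] + eps)))
            # Eq. (7): Gaussian falloff with lattice distance.
            f = math.exp(-((i - p) ** 2 + (j - q) ** 2) / (2.0 * sigma ** 2))
            row.append(d * f)  # Eq. (6)
        total = sum(row)
        # Normalise outgoing weights to 1; a node with no outgoing weight spreads uniformly.
        transition.append([w / total for w in row] if total > 0 else [1.0 / n] * n)
    state = [1.0 / n] * n  # deterministic start instead of a random one
    for _ in range(iters):
        stepped = [sum(state[a] * transition[a][b] for a in range(n)) for b in range(n)]
        # Lazy update: same equilibrium, but periodic chains no longer oscillate.
        state = [0.5 * state[b] + 0.5 * stepped[b] for b in range(n)]
    return [[state[i * cols + j] for j in range(cols)] for i in range(rows)]

# A flat map with one outlier: probability mass accumulates on the outlier.
feature = [[1, 1, 1], [1, 10, 1], [1, 1, 1]]
activation = activation_map(feature)
```

The normalization pass in Step 4 would reuse the same equilibrium computation with the edge weights of Eq. (8) in place of Eq. (6).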

The original GBVS uses an orientation channel to detect saliency. For the detection of oil spills, this channel was replaced by a texture-entropy channel, because oil and water are distinguished by entropy rather than orientation. Because an oil spill is fluid and has no fixed shape, the feature-map channels were also simplified to a single optimized scale, which improved the processing speed significantly.
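The paper does not state how the texture-entropy channel is computed. One plausible reading, sketched here under that assumption, is the Shannon entropy of quantized gray levels in a small sliding window. The function name `local_entropy`, the window radius, and the number of quantization levels are hypothetical.

```python
import math
from collections import Counter

def local_entropy(gray, radius=1, levels=8, max_value=255):
    """Shannon entropy (bits) of quantized gray levels in a (2*radius+1)^2 window."""
    rows, cols = len(gray), len(gray[0])
    # Quantize to a few levels so that small windows give stable histograms.
    quant = [[min(levels - 1, int(v * levels / (max_value + 1))) for v in row]
             for row in gray]
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [quant[p][q]
                      for p in range(max(0, i - radius), min(rows, i + radius + 1))
                      for q in range(max(0, j - radius), min(cols, j + radius + 1))]
            n = len(window)
            out[i][j] = -sum((c / n) * math.log2(c / n)
                             for c in Counter(window).values())
    return out

smooth = local_entropy([[100, 100, 100], [100, 100, 100], [100, 100, 100]])
rough = local_entropy([[0, 255], [255, 0]])
```

A homogeneous patch yields zero entropy, while a patch that mixes two gray levels equally yields one bit, so textured regions stand out in this channel.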

Top-Down Saliency Map Using a Spectral Similarity Match Model

The saliency region obtained through the model calculation above may not only include the possible oil spill but also ships, islands, or other salient interference features. This paper proposed a top-down saliency detection model that used the spectral similarity measure of the spectra between oil and the pixel in the image—the greater the similarity, the greater the possibility of oil. The region with a spectral feature similar to oil was extracted using this method, so a second saliency map of the spectral similarity match was generated.

There are two main categories of spectral similarity measures: deterministic measures and stochastic measures. The former include the geometric measures (spectral distance (Wei et al. 2000), spectral angle (Yuhas et al. 1992), and spectral polygon) and the encoding measures (binary encoding (Paola and Schowengerdt 1995) and quad-encoding), whereas the latter include spectral information divergence (Chang 2000) and the correlation coefficient measure. However, theoretical analysis and experimental results have shown that spectral similarity cannot be characterized by a single index, so a comprehensive spectral similarity measure is needed to identify differences between spectra.

This paper adopted the spectral pan-similarity measure (SPM), which integrates the magnitude of the spectral vector, the shape of the spectra, and the information of the spectra based on the spectral distance, the correlation coefficient, and the relative entropy (Shu and Gong 2011). The smaller the SPM value, the more similar the spectra. Suppose \(r_i = (r_{i1}, r_{i2}, r_{i3}, \ldots, r_{iN})^{T}\) is the oil spectrum taken from the image or from measured data, and \(r_j = (r_{j1}, r_{j2}, r_{j3}, \ldots, r_{jN})^{T}\) is a pixel spectrum extracted from the image, where N denotes the number of bands and \(r_{ik}\) denotes the value in band k.

(a) Spectral vector

This refers to the geometrical distance between two spectra. The Euclidean distance characterizes the difference between the spectra vectors (Granahan and Sweet 2001):

$${\text{SBD}}\left( {r_{i} ,r_{j} } \right) = \sqrt {\frac{1}{N}} {\text{ED}} = \sqrt {\frac{1}{N}\mathop \sum \limits_{k = 1}^{N} \left( {r_{ik} - r_{jk} } \right)^{2} } ,$$
(9)

where N denotes the spectral dimension. The smaller the SBD, the more similar the spectra; SBD ranges from 0 to 1.

(b) Spectral shape

The spectral shape (SSD) can be measured by the Pearson correlation coefficient (Sweet 2003):

$${\text{SSD}}\left( {r_{i} ,r_{j} } \right) = \left( {\frac{{1 - {\text{SCM}}\left( {r_{i} ,r_{j} } \right)}}{2}} \right)^{2} ,$$
(10)

where SCM is the Pearson correlation coefficient, ranging from −1 to 1, and SSD ranges from 0 to 1. The greater the SCM, the smaller the SSD and the more similar the spectral shapes. SCM is calculated as:

$${\text{SCM}}\left( {r_{i} ,r_{j} } \right) = \frac{{\mathop \sum \nolimits_{k = 1}^{N} \left( {r_{ik} - \bar{r}_{i} } \right)\left( {r_{jk} - \bar{r}_{j} } \right)}}{{\left( {\mathop \sum \nolimits_{k = 1}^{N} \left( {r_{ik} - \bar{r}_{i} } \right)^{2} } \right)^{1/2} \left( {\mathop \sum \nolimits_{k = 1}^{N} \left( {r_{jk} - \bar{r}_{j} } \right)^{2} } \right)^{1/2} }}$$
(11)

(c) Spectral information divergence (SID)

SID represents the relative entropy between the spectra using the KL distance calculation (Chang 2003). The smaller the SID, the more similar the spectral information:

$${\text{SID}}\left( {r_{i} ,r_{j} } \right) = D(r_{i} ||r_{j} ) + D(r_{j} ||r_{i} ),$$
(12)

where D denotes the relative entropy.

(d) Spectral pan-similarity measure (SPM)

SPM can be defined as shown below—the smaller the SPM, the more similar the spectra.

$${\text{SPM}}\left( {r_{i} ,r_{j} } \right) = {\text{SID}}\left( {r_{i} ,r_{j} } \right) \times \tan \left( {\sqrt {{\text{SBD}}\left( {r_{i} ,r_{j} } \right)^{2} + {\text{SSD}}\left( {r_{i} ,r_{j} } \right)^{2} } } \right)$$
(13)
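The relative entropy D in Eq. (12) is not written out in the text. A common formulation, following the usual definition of spectral information divergence (Chang 2000) and assumed here rather than taken from this paper, first normalizes each spectrum to a probability vector and then applies the Kullback–Leibler divergence:

$$p_{k} = \frac{{r_{ik} }}{{\mathop \sum \nolimits_{l = 1}^{N} r_{il} }},\quad q_{k} = \frac{{r_{jk} }}{{\mathop \sum \nolimits_{l = 1}^{N} r_{jl} }},\quad D(r_{i} ||r_{j} ) = \mathop \sum \limits_{k = 1}^{N} p_{k} \log \frac{{p_{k} }}{{q_{k} }}$$

Under this definition, SID is symmetric in its two arguments and equals zero only when the two normalized spectra coincide, in which case the SPM in Eq. (13) is also zero.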

The procedure of the spectral similarity match model is as follows.

  • Step 1. Get the average spectral vector of the oil extracted from the image as the reference spectrum \(r_i = (r_{i1}, r_{i2}, r_{i3}, \ldots, r_{iN})^{T}\).

  • Step 2. Calculate the SPM between the reference spectrum and the spectrum of each pixel, and denote the result as C.

  • Step 3. Define an appropriate threshold to extract the regions that have similar spectra to oil.

  • Step 4. Use the simple frequency-based saliency detection to produce the top-down saliency map.
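Steps 1 to 3 can be sketched in Python as follows, using Eqs. (9) to (13) directly. This is a minimal illustration under stated assumptions: spectra are plain lists of positive reflectances, the relative entropy in SID is the Kullback–Leibler divergence of the sum-normalized spectra (the paper does not spell this out), and the function names, the `eps` guard, and the threshold value are hypothetical. Step 4 is not shown because the paper does not detail the frequency-based detector.

```python
import math

def sbd(ri, rj):
    # Eq. (9): root-mean-square difference between the two spectra.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ri, rj)) / len(ri))

def scm(ri, rj):
    # Eq. (11): Pearson correlation; 0 is returned for a flat spectrum (assumption).
    mi, mj = sum(ri) / len(ri), sum(rj) / len(rj)
    num = sum((a - mi) * (b - mj) for a, b in zip(ri, rj))
    den = (math.sqrt(sum((a - mi) ** 2 for a in ri))
           * math.sqrt(sum((b - mj) ** 2 for b in rj)))
    return num / den if den > 0 else 0.0

def ssd(ri, rj):
    # Eq. (10)
    return ((1.0 - scm(ri, rj)) / 2.0) ** 2

def sid(ri, rj, eps=1e-12):
    # Eq. (12), with D taken as the KL divergence of sum-normalized spectra (assumption).
    si, sj = sum(ri) + eps * len(ri), sum(rj) + eps * len(rj)
    p = [(a + eps) / si for a in ri]
    q = [(b + eps) / sj for b in rj]
    return (sum(x * math.log(x / y) for x, y in zip(p, q))
            + sum(y * math.log(y / x) for x, y in zip(p, q)))

def spm(ri, rj):
    # Eq. (13)
    return sid(ri, rj) * math.tan(math.sqrt(sbd(ri, rj) ** 2 + ssd(ri, rj) ** 2))

def mean_spectrum(samples):
    # Step 1: average spectral vector of the oil samples.
    return [sum(s[k] for s in samples) / len(samples) for k in range(len(samples[0]))]

def oil_mask(image, reference, threshold):
    # Steps 2 and 3: per-pixel SPM, then a threshold (smaller SPM = more oil-like).
    return [[1 if spm(reference, px) <= threshold else 0 for px in row] for row in image]

reference = mean_spectrum([[0.10, 0.12, 0.15, 0.20], [0.10, 0.12, 0.15, 0.20]])
similar = [0.11, 0.12, 0.16, 0.21]     # made-up oil-like spectrum
different = [0.30, 0.25, 0.10, 0.05]   # made-up non-oil spectrum
mask = oil_mask([[reference, similar], [different, different]], reference, 0.01)
```

The spectra and the threshold of 0.01 are invented for illustration; in practice the threshold would have to be tuned per sensor and scene.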

Fusion of Saliency Maps

The gray values of the two saliency maps are multiplied point by point and normalized to form the final saliency map. The point with the maximum saliency value is selected as the seed for four-neighborhood region growing. Finally, the ROI containing a possible oil spill is extracted for further analysis.
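The fusion and region-growing step can be sketched as below. The paper does not give the growing criterion, so this sketch assumes a pixel joins the region when its fused saliency is at least a fixed fraction (`ratio`, hypothetical) of the seed value; the function names are also illustrative.

```python
from collections import deque

def fuse(map_a, map_b):
    """Point-by-point product of two saliency maps, rescaled so the peak is 1."""
    prod = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(map_a, map_b)]
    peak = max(max(row) for row in prod)
    return [[v / peak for v in row] for row in prod] if peak > 0 else prod

def grow_roi(saliency, ratio=0.5):
    """Four-neighborhood region growing from the most salient pixel."""
    rows, cols = len(saliency), len(saliency[0])
    seed = max(((i, j) for i in range(rows) for j in range(cols)),
               key=lambda ij: saliency[ij[0]][ij[1]])
    seed_value = saliency[seed[0]][seed[1]]
    if seed_value <= 0:
        return set()  # nothing salient in both maps: no ROI
    limit = ratio * seed_value
    region, queue = {seed}, deque([seed])
    while queue:
        i, j = queue.popleft()
        for p, q in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if (0 <= p < rows and 0 <= q < cols and (p, q) not in region
                    and saliency[p][q] >= limit):
                region.add((p, q))
                queue.append((p, q))
    return region

# Toy maps: pixel (2, 0) is visually salient (e.g. cloud) but spectrally unlike oil.
gbvs = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.8], [0.9, 0.1, 0.1]]
spm_map = [[0.1, 0.1, 0.1], [0.1, 1.0, 0.9], [0.0, 0.1, 0.1]]
roi = grow_roi(fuse(gbvs, spm_map))
no_oil = grow_roi(fuse(gbvs, [[0.0] * 3 for _ in range(3)]))
```

Because the maps are multiplied, a target must be salient in both the bottom-up and the top-down map to survive fusion, which is why the cloud-like pixel is excluded.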

Image Segmentation

After ROI detection, a neural network optimized by a genetic algorithm was used for more accurate oil extraction. In most cases, neural network classifiers have proved superior to traditional classifiers. Moreover, the global stochastic optimization ability of the genetic algorithm can optimize the neural network; in return, the neural network can promote the evolution of the genetic algorithm (Liu et al. 2004).

In this paper, the genetic algorithm was first used to optimize the initial weights of the neural network for the segmentation problem, so that a promising region of the solution space could be located; the back-propagation (BP) algorithm was then used to search for the optimal solution within this small region. In effect, training was divided into two parts: first, the genetic algorithm optimized the initial weights of the neural network, and then the BP algorithm completed the network training.

The flow of combining neural network and GA for classification is as follows.

  • Step 1. The initial neural network is established. The topological structure of the neural network is determined to define the important parameters of the network, such as network weights, bias, generations and initial population size, crossover probability, and mutation probability.

  • Step 2. The solution space is encoded so that each string represents one solution. The initial population is formed from randomly initialized genetic algorithm individuals.

  • Step 3. Each individual (genotype) is decoded and put into the neural network. The output is then converted to the corresponding adaptive value, and each set of the connection weights is evaluated by constructing the corresponding neural network and computing the total mean square error between the actual and target outputs.

  • Step 4. According to individual fitness, the individuals are used for selection, crossover, and mutation.

  • Step 5. A new generation of groups is generated, a new generation of individuals in the group is decoded, the new adaptive value is computed, and if the optimal solution has met the conditions, the algorithm is ended. Otherwise, step 4 is repeated.

  • Step 6. Repeat steps 3–5. One generation evolves into the next until the number of generations reaches the preset maximum.

  • Step 7. Solutions are chosen from the population of the final generation to complete the training.

  • Step 8. The evolutionary solutions serve as the initial solution, and the connection weights and bias are set to the corresponding gene segments in turn. The fitness function is computed and compared again to obtain a reliable neural network.
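The steps above can be sketched as a small genetic algorithm that searches for initial weights of a 4-5-2 sigmoid network, with the mean square error as the fitness (lower is better). This is a simplified illustration, not the authors' Matlab implementation: tournament selection, single-point crossover, Gaussian mutation, elitism, and all parameter values are assumptions, and the subsequent BP fine-tuning is omitted.

```python
import math
import random

def forward(chrom, x, shape):
    """Decode a flat chromosome into a one-hidden-layer sigmoid network and run it."""
    n_in, n_hid, n_out = shape
    idx, hidden, out = 0, [], []
    for _ in range(n_hid):
        s = chrom[idx + n_in] + sum(chrom[idx + k] * x[k] for k in range(n_in))
        hidden.append(1.0 / (1.0 + math.exp(-s)))
        idx += n_in + 1  # n_in weights followed by one bias
    for _ in range(n_out):
        s = chrom[idx + n_hid] + sum(chrom[idx + k] * hidden[k] for k in range(n_hid))
        out.append(1.0 / (1.0 + math.exp(-s)))
        idx += n_hid + 1
    return out

def mse(chrom, samples, targets, shape):
    err = 0.0
    for x, t in zip(samples, targets):
        err += sum((a - b) ** 2 for a, b in zip(forward(chrom, x, shape), t))
    return err / len(samples)

def evolve_initial_weights(samples, targets, shape=(4, 5, 2), pop_size=20,
                           generations=30, p_cross=0.8, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n_in, n_hid, n_out = shape
    length = n_hid * (n_in + 1) + n_out * (n_hid + 1)
    fitness = lambda c: mse(c, samples, targets, shape)
    # Random initial population of encoded weight vectors.
    pop = [[rng.uniform(-1, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        nxt = [pop[0][:]]  # elitism: the best individual always survives
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 3), key=fitness)  # tournament selection
            b = min(rng.sample(pop, 3), key=fitness)
            child = a[:]
            if rng.random() < p_cross:                # single-point crossover
                cut = rng.randrange(1, length)
                child = a[:cut] + b[cut:]
            child = [w + rng.gauss(0.0, 0.3) if rng.random() < p_mut else w
                     for w in child]                  # Gaussian mutation
            nxt.append(child)
        pop = nxt
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return best, history

# Made-up four-band samples: two oil-like, two water-like.
samples = [[0.10, 0.12, 0.15, 0.20], [0.11, 0.12, 0.16, 0.21],
           [0.30, 0.25, 0.10, 0.05], [0.28, 0.26, 0.11, 0.06]]
targets = [[1, 0], [1, 0], [0, 1], [0, 1]]
best, history = evolve_initial_weights(samples, targets)
```

The returned `best` vector would then seed the BP training; `history` records the best fitness per generation and never increases because of elitism.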

Results and Discussion

The proposed method was tested on the remote sensing image datasets. Two scenes from each data source were used to verify the effectiveness of the method for oil spill detection: one included an oil spill and the other did not. The oil scenes may contain ocean, haze, clouds, islands, ships, shadows, etc.

The tests were run on an Intel® Core™ i5-4590 CPU @ 3.30 GHz with 8 GB of RAM, using Matlab R2010b under Windows 7.

Bottom-up saliency map using a simplified GBVS model results

The saliency maps were calculated as follows. A higher saliency value indicates more information. Because of the haze in the Landsat 8 image, the GBVS oil saliency detection was not very accurate (Fig. 8): the oil spill area was detected only to a certain extent and with a low saliency value. Therefore, further saliency detection was necessary to exclude cloud and haze. The GF-1 GBVS oil saliency detection was accurate owing to the high quality of this image (Fig. 9). The oil spill beside the shore had a high saliency value, so coastal waters were not a challenge for this model. However, some ships and human-made objects also contributed to the saliency. The GBVS saliency map of the MAMS data showed a clear oil region, in both the low-reflectivity thin-film area and the high-reflectivity thick-film area (Fig. 10). The GBVS saliency map of the PL data clearly showed high values in the oil region, although the sun glitter in the left center also showed slightly elevated saliency (Fig. 11). The oil spill region in the DH image was large, so its saliency was detected very well; the cloud in the top left corner also had high saliency (Fig. 12). In the GBVS saliency map of XG, the shore and the clouds contributed high values, whereas the oil, being thin, contributed low saliency (Fig. 13).

Fig. 8 GBVS (Landsat 8)

Fig. 9 GBVS (GF-1)

Fig. 10 GBVS (MAMS)

Fig. 11 GBVS (PL)

Fig. 12 GBVS (DH)

Fig. 13 GBVS (XG)

Top-down saliency map using a spectral similarity match model results

We compared the proposed model with other top-down saliency models, including an SID model, a spectral angle model, and a Euclidean distance model, to assess detection performance. Each saliency map was converted to a binary map and compared with manual interpretation results to obtain the recognition rate and false-alarm rate. The performance of the models is shown in Table 5.
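The paper does not define the two rates. A minimal sketch, assuming the recognition rate is the fraction of reference oil pixels that were detected and the false-alarm rate is the fraction of reference non-oil pixels that were flagged, is shown below; the function name and both definitions are assumptions.

```python
def detection_rates(predicted, reference):
    """Compare a binary detection map with a binary reference map."""
    tp = fp = fn = tn = 0
    for prow, rrow in zip(predicted, reference):
        for p, r in zip(prow, rrow):
            if p and r:
                tp += 1   # oil detected as oil
            elif p:
                fp += 1   # non-oil flagged as oil
            elif r:
                fn += 1   # oil missed
            else:
                tn += 1   # non-oil correctly ignored
    recognition = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return recognition, false_alarm

predicted = [[1, 1, 0], [0, 0, 0]]
reference = [[1, 0, 0], [1, 0, 0]]
recognition, false_alarm = detection_rates(predicted, reference)
```

Other conventions exist (for example, dividing false positives by all detections), so reported rates are only comparable when the same definition is used.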

Table 5 Comparison of different models

Our proposed model has advantages over the alternative models. For example, the Euclidean distance model tended to mix water and oil. The SPM performed better in identifying oil spills on a large scale than the SID and spectral angle models. Moreover, the SPM model could be improved if the type of oil spill was known and a spectral sample of the oil spill was collected for comparison. Our model made the detection of oil spills rapid and precise.

The spectral similarity match computations demonstrated that the SPM model was robust. Only a small amount of light cloud was wrongly highlighted in the Landsat 8 image (Fig. 14). The weakness of the GF-1 SPM output was the misclassification of a wide area of dark water; the GBVS model could compensate for this shortcoming, because the light cloud, shadow, and water did not have high visual saliency (Fig. 15). The SPM model detected almost the whole outline of the oil spill in the MAMS data, although some sea water with a low GBVS value was also detected (Fig. 16). The SPM model detected most of the PL oil spill, but the sun glitter in the bottom left and bottom right corners also showed high saliency; the GBVS result could largely complement the SPM model here (Fig. 17). As shown in Fig. 18, most of the DH oil spill was detected because the oil was thick, but some thick cloud at the edge of the image also had high SPM saliency. Conversely, the cloud with high GBVS saliency had only a low SPM value. The XG oil spill detection was almost completely correct; because the oil was thin in this image, there was a large difference between the cloud and the oil (Fig. 19).

Fig. 14 SPM saliency map (Landsat 8)

Fig. 15 SPM saliency map (GF-1)

Fig. 16 SPM saliency map (MAMS)

Fig. 17 SPM saliency map (PL)

Fig. 18 SPM saliency map (DH)

Fig. 19 SPM saliency map (XG)

Datasets without oil spills may show high GBVS saliency, but their SPM saliency was almost zero, so the fused saliency map yielded no detection. Owing to space limitations, the results for scenes without oil spills are not shown.

ROI results

After the fusion of the saliency maps, ROI detection by four-neighborhood region growing was completed, as shown in Figs. 20, 21, 22, 23, 24 and 25. We also tested an image with no oil spill, and the detection result was correct. The total processing times are listed in Table 6. Despite the very large size of the HJ-1 image for the DH spill, the processing time was only about 20 min. The fastest processing time was 7.6 s, for the MAMS image.

Fig. 20 ROI (Landsat 8)

Fig. 21 ROI (GF-1)

Fig. 22 ROI (MAMS)

Fig. 23 ROI (PL)

Fig. 24 ROI (DH)

Fig. 25 ROI (XG)

Table 6 Processing time

Genetic neural network segmentation results

Finally, based on the ROI, a genetic neural network was used for image segmentation to complete the extraction of the oil spill. The genetic neural network method was compared with fuzzy clustering (Yao et al. 2013; Chuang et al. 2006) and a level set method based on active contours (Zhang et al. 2010). The Kappa index and overall accuracy were used as the quantitative evaluation criteria. Although the genetic neural network method was slower than fuzzy clustering, when accuracy and runtime were considered together it gave more precise segmentation and greater overall time savings (Table 7). We therefore chose the genetic neural network to complete the image segmentation.
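For reference, overall accuracy and the Kappa index can be computed from a confusion matrix as in the sketch below. The formulas are the standard ones; the function name and the example counts are made up and are not taken from Table 7.

```python
def overall_accuracy_and_kappa(confusion):
    """confusion[i][j] = number of pixels of reference class i assigned to class j."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: fraction of pixels on the diagonal.
    po = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    pe = sum(sum(confusion[i]) * sum(row[i] for row in confusion)
             for i in range(k)) / (n * n)
    kappa = (po - pe) / (1.0 - pe) if pe < 1 else 0.0
    return po, kappa

# Hypothetical oil / water confusion matrix.
accuracy, kappa = overall_accuracy_and_kappa([[40, 10], [5, 45]])
```

Kappa discounts the agreement expected by chance, which matters when water pixels greatly outnumber oil pixels.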

Table 7 Accuracy of classification

Sample data acquisition

Taking the GF-1 image as an example, 30 oil-film areas and 30 non-oil-film areas, each 15 × 15 pixels, were selected as training samples. The mean value of each of the four bands was calculated for every area, giving a 60 × 4 matrix as the input of the neural network, so that every training sample has four elements. Each class also included 20 test samples. If the network error on the training samples was very small but the error on the test samples was very large, the generalization ability of the network was poor and it needed to be retrained. If both errors were very small, the network training was successful and the network could be used for classification.
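The construction of the training matrix can be sketched as follows, assuming each sampled area is given as a flat list of four-band pixel vectors. The function names and the one-hot target encoding (oil = [1, 0], non-oil = [0, 1], matching two output neurons) are illustrative assumptions.

```python
def band_means(area):
    """Mean value of every band over the pixels of one sampled area."""
    n, bands = len(area), len(area[0])
    return [sum(px[b] for px in area) / n for b in range(bands)]

def build_training_set(oil_areas, water_areas):
    """One row of band means per area, with a one-hot target per row."""
    features = ([band_means(a) for a in oil_areas]
                + [band_means(a) for a in water_areas])
    targets = [[1, 0] for _ in oil_areas] + [[0, 1] for _ in water_areas]
    return features, targets

# Made-up digital numbers: 30 oil and 30 water areas of 15 x 15 = 225 pixels each.
oil_areas = [[[10, 20, 30, 40]] * 225 for _ in range(30)]
water_areas = [[[40, 30, 20, 10]] * 225 for _ in range(30)]
features, targets = build_training_set(oil_areas, water_areas)
```

With 30 areas per class this yields the 60 × 4 input matrix described above.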

Genetic algorithm evolution of network structure and weight

A three-layer BP neural network was used in this paper: the input layer had four neurons, the output layer had two neurons, and the hidden layer was determined to have five neurons through experimentation.

The number of hidden-layer nodes can be estimated by the empirical formula in Eq. (14).

$$m = \sqrt {n_{1} + n_{2} } + a$$
(14)

where m is the number of hidden-layer nodes, \(n_1\) is the number of input-layer nodes, \(n_2\) is the number of output-layer nodes, and a is an adjustable parameter with a value between 1 and 10.

Changing the adjustable parameter a changes the value of m. By training on the same samples, the optimal number of hidden-layer nodes can be found as the one that minimizes the neural network error.
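As a quick illustration of Eq. (14), the candidate hidden-layer sizes for the 4-input, 2-output network can be enumerated as below. Rounding to the nearest integer and restricting a to the integers 1 to 10 are assumptions; the paper does not say how non-integer values of m are handled.

```python
import math

def candidate_hidden_sizes(n_in, n_out, a_values=range(1, 11)):
    """Candidate hidden-layer sizes from m = sqrt(n_in + n_out) + a."""
    base = math.sqrt(n_in + n_out)
    return sorted({int(round(base + a)) for a in a_values})

candidates = candidate_hidden_sizes(4, 2)
```

For this network the candidates run from 3 to 12, and the value of five chosen in the paper falls within that range.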

In this study, the GA was used to find the number of hidden-layer nodes that minimized the neural network error. The network with five hidden-layer nodes is taken as an example to illustrate the specific process of GA evolution.

The fitness values for different numbers of hidden-layer neurons were calculated by the genetic algorithm (Table 8).

Table 8 Fitness values of different hidden layer neuron numbers

The smaller the fitness value, the better the individual. Table 8 shows that the fitness value was at its minimum when the number of hidden-layer neurons was 5, so the corresponding network weights were selected as the initial weight matrix of the network.

For the neural network algorithm, the learning rate was 0.01, the learning rate increment was set to 1.20, and the training performance goal was set to 0.02669. With five hidden-layer neurons, the values of the population number, fitness ratio, selection parameter, regeneration parameter, mutation parameter, and intersection parameter are shown in Table 9. The algorithm was run for 100 generations.

Table 9 Parameters of the genetic algorithm

Image segmentation results

The training sample pairs were input to the network, and the network weights obtained by GA evolution were set as the initial weights. The image was then read to obtain its pixel matrix, from which the input vectors were derived. The trained GA neural network was applied to the input vectors, and the final output vector was the image classification result.

The genetic neural network results are shown in Figs. 26, 27, 28, 29, 30 and 31. Overall, the genetic neural network classified the oil and water correctly and quickly. The oil extraction for MAMS had low accuracy because of the varying oil film thickness, and its patch number and fragmentation index were quite large. The fragmentation of the XG imagery was also somewhat high. Nevertheless, the overall accuracy of the network segmentation was acceptable.

Fig. 26 Oil extraction (Landsat 8)

Fig. 27 Oil extraction (GF-1)

Fig. 28 Oil extraction (MAMS)

Fig. 29 Oil extraction (PL)

Fig. 30 Oil extraction (DH)

Fig. 31 Oil extraction (XG)

Conclusions

Rapidly locating oil spills across wide-area remote sensing images after an accident is important for taking urgent remedial measures. In the present study, the ROI was detected rapidly based on remote sensing characteristics, using both visual spatial information and spectral information. After several comparison tasks, a bottom-up saliency map (simplified GBVS model) and a top-down saliency map (spectral similarity match model) were built to detect the ROI rapidly. This method should also be applicable to hyperspectral images. Finally, image segmentation was used to extract the exact extent of the oil spill. In this way, marine oil spill monitoring can be achieved successfully and efficiently.

The best overall accuracy reported for a similar oil spill detection approach was 87.41% (Fan et al. 2014). The proposed methodology presents even better classification results. The main disadvantage of the method developed is that significant computational time is required for ROI processing (Karathanassi et al. 2006). The whole processing time for a 4338 × 7272 pixel HJ-1 image was 1112.28 s.

Further research should validate the method on more images with various sea states and oil spill thicknesses. Moreover, other image information (shape, texture, etc.) should be used to improve the methodology.