1 Introduction

In recent years, the cloud services market has experienced extremely rapid growth which, as several market research reports note, may lead to severe scalability problems (Christian et al. 2013). In recent decades, digital images have also become increasingly important in the healthcare environment, since they are used in more and more medical and medical-related applications (Castiglione et al. 2015). With the development of remote-sensing imaging technology, the use of hyperspectral images is becoming ever more widespread, and ambient intelligence for remote sensing images has recently received growing attention (Benavente-Peces et al. 2014). Cerezo et al. (2012) dealt with facial emotion recognition and the multimodal fusion of affective information coming from different channels.

Thanks to the dense sampling of the spectral signatures of land covers, hyperspectral images discriminate better among similar ground-cover classes than traditional multispectral scanners. At the same time, these images are usually composed of tens or hundreds of spectral bands with high redundancy, which entails a great amount of computation in hyperspectral image classification. The most important and urgent issue is therefore how to reduce the number of bands substantially with little loss of information or classification accuracy (Wu et al. 2010). Band selection for hyperspectral images is the process of reducing the band set and identifying the most informative bands for further analysis of the hyperspectral image data. Like the feature selection problem, band selection is NP-hard, with only explicit enumeration approaches known to solve it exactly (Feng et al. 2014). Band selection techniques generally involve both a search algorithm and the definition of the band range. The search algorithm generates useful features by taking combinations of bands; these features lie in the high-dimensional feature set and reduce the dimensionality on a large scale. The band range is then defined according to the generated features (Liu et al. 2010). Optimal band selection is thus an important way to overcome the problem of massive hyperspectral data volumes and to improve processing speed.

In Sect. 3.4 of this paper, the enumeration method is adopted to obtain the best three-band selection for the sampled study area; for each evaluation method, the optimal three-band combination under that evaluation index is given, so that the hyperspectral data can be imaged in three bands, which is convenient for visual interpretation in practical applications (Huang and He 2005). However, visual interpretation suffers from serious drawbacks in practice because of the huge amount of hyperspectral data, and a three-band selection does not necessarily yield the most effective classification, so this article applies an intelligent algorithm to the band selection of hyperspectral data. The differential evolution (DE) algorithm is well suited to solving combinatorial optimization problems and can therefore be fully applied to the study of hyperspectral dimension reduction. This paper puts forward a hyperspectral dimension-reduction model based on an improved DE algorithm. First, the ENVI software is applied to preprocess the original hyperspectral data (including atmospheric correction, radiation correction, geometry correction, etc.) and to perform subspace partition. Second, the idea of Yin-Yang population initialization is introduced to improve the DE algorithm, and the band evaluation methods of Sect. 3.4 are used in turn as the fitness function of the DE algorithm on the processed data, yielding band combinations of different sizes. Finally, the quality of each band combination is measured through its classification accuracy.

In this paper, the main research content is a dimension-reduction method for hyperspectral remote sensing image processing. Dimension reduction can be viewed as a transformation from a high-order dimension to a low-order dimension (Kitti et al. 2012). Firstly, we apply the ENVI software to preprocess the original hyperspectral data (including atmospheric correction, radiation correction, geometry correction, etc.), so as to obtain the statistical data of each image and category in the different bands. Then the dimension-reduction model is applied to the data to obtain the optimal band combination, which can effectively distinguish the main features in the image and thereby realizes the dimension reduction of the hyperspectral data.

Our motivation in this paper is to use a simple but efficient method to solve a typical combinatorial optimization problem. DE is a very effective evolutionary algorithm for global search: its genetic operations not only keep the results from falling into local optima but also enhance the population's diversity. That is to say, we can search all the possible bands and, at the same time, obtain the best band combination.

2 The principle and improvement of DE algorithm

The differential evolution (DE) algorithm, proposed by Storn and Price (1997), is a simple yet powerful population-based stochastic search technique for solving global optimization problems over continuous search domains. DE generates offspring by perturbing the solutions with a scaled difference of selected population vectors, and employs a parent-offspring comparison that replaces a parent only if the offspring is better than or equivalent to it (Zhao et al. 2011). Recently, DE has become one of the most widely used methods for handling global optimization problems, and it has been successfully applied in many science and engineering fields, such as pattern recognition, signal processing, satellite communications, the vehicle routing problem, and so on (Peng et al. 2015). DE adopts real-valued encoding; its basic idea is to obtain an intermediate population by recombination, making use of the differences between individuals in the population, and to form the new generation through competition between parents and offspring. Since no special initial values are required, DE has the characteristics of fast convergence, simple structure and robustness (Zhang et al. 2010). Compared with the genetic algorithm, the most distinctive part of DE is its mutation. The general process of the DE algorithm is as follows. First an individual is selected, and its mutant is produced by adding the weighted difference of two other individuals. A candidate individual is then obtained by a crossover operation between the mutant and the parent individual. Finally, the fitness values of the candidate and the parent are compared, and the better one is selected into the next generation. In this way the algorithm uses differential mutation, crossover and selection to keep the population evolving until the termination condition is met. At the beginning of the iterations the differences among the population individuals are large, so the mutation gives the algorithm a strong global search ability; in the later iterations the differences are small, so the algorithm has a stronger local search ability. DE works well on function optimization problems that are high-dimensional, multi-peaked, nonlinear, etc. Its main characteristics can be summarized as follows: (1) simple structure; (2) high convergence rate; (3) low probability of getting stuck in a local optimum; (4) strong robustness.

DE was originally designed as an evolutionary algorithm for real-valued optimization problems. At present it is also used to solve discrete optimization problems and is widely applied in data mining, pattern recognition, artificial neural networks, digital filtering, and chemical and mechanical optimization design (Chen et al. 2009).

2.1 The basic operation of DE algorithm

The DE algorithm is an evolutionary algorithm based on real-number encoding whose structural framework is similar to that of other evolutionary algorithms: it comprises population initialization, individual fitness evaluation, and the passage from one generation to the next through the mutation, crossover and selection operations (Liu et al. 2007).

1. Determine the parameters

With suitable parameter settings the algorithm can obtain good solutions, while unsuitable settings yield only poor ones; DE, too, is sensitive to its parameter settings. Choosing suitable parameter values is frequently a problem-dependent task and requires previous experience of the user (Yang et al. 2010). The DE algorithm mainly involves the following four parameters: (1) the population size N; (2) the individual dimension D (namely, the length of the chromosome); (3) the scale factor F; (4) the crossover probability CR.

In general, these parameters affect the algorithm's search for the optimal solution and its convergence speed, so their setting is of great significance for the performance of the algorithm. The influence of each parameter can be briefly summarized as follows (Chen et al. 2010):

1. The influence of the population size N: a larger population is more diversified, which increases the likelihood of finding the optimal solution but at the same time lowers the convergence rate; conversely, with a smaller population the search space is smaller, which speeds up convergence but easily causes local convergence or stagnation of the algorithm.

2. The influence of the individual dimension D: when D is small, the calculation is simple but individual evolution is limited; when D is large, obtaining the optimal solution through individual evolution is easier, but the calculation becomes complicated.

3. The influence of the scale factor F: a small F speeds up convergence but easily makes the algorithm fall into local convergence, so very small values of F should be avoided; a larger F increases the possibility of the algorithm jumping out of a local optimum, but when F > 1 the convergence rate declines.

4. The influence of the crossover probability CR: CR is generally chosen between zero and one; the greater the CR, the faster the convergence, but the easier it is to fall into a local optimum.

Research has shown that the population size N is generally between 5D and 10D (the dimension D being determined by the practical problem), the scale factor F is generally between zero and two, with F = 0.5 a common choice, and the crossover probability CR is generally between zero and one, with CR = 0.3 a common choice. When solving practical problems, however, the appropriate parameters should be selected after a specific analysis.

2. Produce the initial population

Randomly generate N chromosomes that meet the constraint conditions in the D-dimensional space, as follows:

    $$\begin{aligned} x_{ij}=rand(0,1)\times(x_{j}^{U}-x_{j}^{L})+x_{j}^{L}(1 \, \le \, i \, \le \, N,\,1 \, \le \, j \, \le \, D) \end{aligned}$$
    (1)

where \(x_{j}^{U}\) and \(x_{j}^{L}\) are respectively the upper and lower bounds of the j-th variable, and rand(0, 1) returns a random number in [0, 1].

3. Mutation

Depending on how the base individual is selected, the mutation operation takes one of the following three forms (all variants are also illustrated in the code sketch that follows Fig. 1):

1. The rand type (a randomly selected individual is mutated): for an individual \(x_{r_{1}}\) \((1\le r_{1}\le N)\) in the group, the new individual \(x^{\prime}_{r_{1}}\) satisfies the formula below:

$$\begin{aligned} x^{\prime}_{r_{1}}=x_{r_{1}}+F \cdot (x_{r_{2}}-x_{r_{3}}) \end{aligned}$$
      (2)

where \(r_{2},r_{3}\in [1,\,N]\), \(r_{1}\ne r_{2}\ne r_{3}\), and \(F>0\) is the scale factor.

2. The best type (the optimal individual is mutated): from the best individual \(x_{best}\) in the group, the new individual \(x^{\prime}_{r_{1}}\) \((1\le r_{1}\le N)\) satisfies the formula below:

      $$\begin{aligned} x^{\prime}_{r_{1}}=x_{best}+F\cdot (x_{r_{2}}-x_{r_{3}}) \end{aligned}$$
      (3)

where \(r_{2},\,r_{3}\in [1,\,N]\), \(r_{1}\ne r_{2}\ne r_{3}\), and \(F>0\) is the scale factor; \(x_{best}\) is the best individual in the current generation.

3. The rand-to-best type (a compromise between the two): for an individual \(x_{r_{1}}\) \((1\le r_{1}\le N)\) in the group, the new individual \(x^{\prime}_{r_{1}}\) satisfies the formula below:

$$\begin{aligned} x^{\prime}_{r_{1}}=x_{r_{1}}+\lambda \cdot (x_{best}-x_{r_{1}})+F\cdot (x_{r_{2}}-x_{r_{3}}) \end{aligned}$$
      (4)

where \(r_{2},r_{3}\in [1,\,N]\), \(r_{1}\ne r_{2}\ne r_{3}\), and \(F>0\) is the scale factor; \(x_{best}\) is the best individual in the current generation, and \(\lambda\) is an additional control variable, generally set to the same value as F. According to the number of difference vectors used, mutation can also be subdivided into one-weighted-difference and two-weighted-difference forms. Formulas (2), (3) and (4) above are the one-weighted-difference case; below is the two-weighted-difference case:

      $$\begin{aligned} x^{\prime}_{r_{1}}=x_{r_{1}}+F\cdot (x_{r_{2}}-x_{r_{3}})+F\cdot (x_{r_{4}}-x_{r_{5}}) \end{aligned}$$
      (5)

where \(r_{2}, r_{3},r_{4},r_{5}\in [1,\,N]\), \(r_{1}\ne r_{2}\ne r_{3}\ne r_{4}\ne r_{5}\), and \(F>0\) is the scale factor.

4. The crossover operation

The crossover operation occurs between an individual \(x_{r_{1}}\) \((1\le r_{1}\le N)\) in the group and the new individual \(x^{\prime}_{r_{1}}\) generated by mutation: the candidate offspring \(\nu\) is produced by crossover between \(x_{r_{1}}\) and \(x^{\prime}_{r_{1}}\), and the selection operation then determines whether \(x_{r_{1}}\) or \(\nu\) is retained in the next generation (Alatas et al. 2008).

The crossover operation takes two main forms, the bin type and the exp type; Fig. 1 describes the two approaches and the differences between them. For a particular gene j, both can be expressed by the following formula:

$$\begin{aligned} \nu _{j}= \left\{ \begin{array}{ll} x^{\prime}_{r_{1},j},&\quad rand(0,1)\le CR\\ x_{r_{1},j},&\quad rand(0,1)>CR \end{array} \right. \end{aligned}$$
    (6)

where rand(0, 1) returns a random number in [0, 1] and CR is the crossover probability, \(0\le CR\le 1\).

The main difference between the bin type and the exp type is that in the bin type each gene is crossed independently: the genes of individual \(x_{r_{1}}\) and individual \(x^{\prime}_{r_{1}}\) are inherited by individual \(\nu\) each with a certain probability. The exp type instead randomly chooses a starting point and draws a random decimal for each following gene: as long as the values are less than or equal to CR, the corresponding genes of \(x^{\prime}_{r_{1}}\) are inherited by \(\nu\), continuing gene by gene until a random decimal greater than CR is drawn; the remaining genes are inherited from \(x_{r_{1}}\). Combining the different mutation and crossover operations yields the different DE variants, denoted DE/x/y/z.

Here x is the way the base individual for mutation is chosen (randomly, or the best of the current generation), y is the number of difference vectors (one or two weighted differences), and z is the crossover form.

x may take three values: rand, best and rand-to-best; y may take two values: 1 and 2; z may take two values: bin (binary) and exp (exponential).

A variety of mutation and crossover schemes can thus be constituted by combining x, y and z, for example:

$$\text{Scheme DE/rand/1/bin: }\, x^{\prime}_{r_{1}}=x_{r_{1}}+F\cdot (x_{r_{2}}-x_{r_{3}})$$
$$\text{Scheme DE/best/2/bin: }\, x^{\prime}_{r_{1}}=x_{best}+ F\cdot (x_{r_{2}}-x_{r_{3}})+F\cdot (x_{r_{4}}-x_{r_{5}})$$
$$\text{Scheme DE/rand-to-best/1/bin: }\, x^{\prime}_{r_{1}}=x_{r_{1}}+\lambda \cdot (x_{best}-x_{r_{1}})+F\cdot (x_{r_{2}}-x_{r_{3}}).$$
5. Selection operation

A new individual \(\nu\) is generated after the mutation and crossover operations. According to the objective function values, one of \(x_{r_{1}}\) and \(\nu\) is chosen to be inherited by the next generation, as follows (taking the minimization of a function as an example):

    $$x_{r_{1}}= \left\{ \begin{array}{ll} x_{r_{1}}, &\quad f(x_{r_{1}})\le f(\nu )\\\nu,&\quad else \end{array} \right.$$
    (7)
Fig. 1 The contrast of the two kinds of crossover operations. a The bin type; b the exp type. The three chromosomes shown in each panel, from left to right, are \(x_{r_{1}}\), \(\nu\) and \(x^{\prime}_{r_{1}}\)
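
To make the operators above concrete, the following is a minimal NumPy sketch of Eqs. (1)-(7). It is an illustration under our own naming (init_population, mutate, crossover_bin, crossover_exp and select are not from the paper), not the authors' implementation, and it assumes a population size N of at least 5.

```python
# Minimal sketch of the basic DE operators, Eqs. (1)-(7).
import numpy as np

rng = np.random.default_rng()

def init_population(N, D, lower, upper):
    """Eq. (1): x_ij = rand(0,1) * (x_j^U - x_j^L) + x_j^L."""
    return lower + rng.random((N, D)) * (upper - lower)

def mutate(pop, i, F, mode="rand/1", best=None, lam=None):
    """Eqs. (2)-(5): build the mutant x'_{r1} for individual i."""
    # four distinct indices, all different from i (needs N >= 5)
    r = rng.choice([k for k in range(len(pop)) if k != i], size=4, replace=False)
    if mode == "rand/1":                                   # Eq. (2)
        return pop[i] + F * (pop[r[0]] - pop[r[1]])
    if mode == "best/1":                                   # Eq. (3)
        return best + F * (pop[r[0]] - pop[r[1]])
    if mode == "rand-to-best/1":                           # Eq. (4)
        lam = F if lam is None else lam
        return pop[i] + lam * (best - pop[i]) + F * (pop[r[0]] - pop[r[1]])
    if mode == "rand/2":                                   # Eq. (5)
        return pop[i] + F * (pop[r[0]] - pop[r[1]]) + F * (pop[r[2]] - pop[r[3]])
    raise ValueError(mode)

def crossover_bin(parent, mutant, CR):
    """Eq. (6), bin type: every gene is drawn independently."""
    mask = rng.random(len(parent)) <= CR
    return np.where(mask, mutant, parent)

def crossover_exp(parent, mutant, CR):
    """Eq. (6), exp type: copy a contiguous run of mutant genes from a
    random start point, stopping once rand(0,1) > CR."""
    child, j = parent.copy(), rng.integers(len(parent))
    for _ in range(len(parent)):
        child[j] = mutant[j]
        j = (j + 1) % len(parent)
        if rng.random() > CR:
            break
    return child

def select(parent, child, f):
    """Eq. (7): keep the individual with the smaller objective value."""
    return parent if f(parent) <= f(child) else child
```

Note that classical DE usually also forces at least one mutant gene per child via a random index; Eq. (6) does not state this, so the sketch follows Eq. (6) literally.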

2.2 The basic steps of DE algorithm

The steps of the basic DE algorithm are as follows (a runnable sketch assembling them is given after the list):

  • Step1: Set the scale factor F, the crossover probability CR and the maximum number of iterations \(g_{max}\);

  • Step2: Initialize the population \(p(N \times D)\), set the number of iterations \(g=1\);

  • Step3: Terminate the loop when \(g=g_{max}\); otherwise, continue with the next step;

  • Step4: Perform the following operations for every individual \(x_{r_{1}}(1\le r_{1} \le N)\) in the current population;

    • Step4.1: Perform the mutation on individual \(x_{r_{1}}\) in one of the ways described above, producing the individual \(x^{\prime}_{r_{1}}\);

    • Step4.2: Carry out the crossover operation on the individuals \(x_{r_{1}}\) and \(x^{\prime}_{r_{1}}\), so that their genes are inherited by the new individual \(\nu\) each with a certain probability;

    • Step4.3: Perform the selection operation on \(x_{r_{1}}\) and \(\nu\); the individual with the better fitness value is inherited by the next generation;

  • Step5: Add 1 to the number of iterations g and return to Step3.
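
Assembling Steps 1-5 gives the basic loop below (scheme DE/rand/1/bin). It reuses the operator sketch that follows Fig. 1, and the parameter values mirror those used later in Sect. 3.2 (population size 30, F = 0.5, CR = 0.3, 200 iterations); the sphere function and the bounds are stand-ins for illustration only.

```python
# Basic DE loop, assembled from Steps 1-5 of Sect. 2.2.
import numpy as np

def de(f, D, N=30, F=0.5, CR=0.3, g_max=200, lower=-5.0, upper=5.0):
    pop = init_population(N, D, lower, upper)          # Step 2
    fit = np.array([f(x) for x in pop])
    for g in range(g_max):                             # Steps 3 and 5
        for i in range(N):                             # Step 4
            mutant = mutate(pop, i, F, mode="rand/1")  # Step 4.1
            child = crossover_bin(pop[i], mutant, CR)  # Step 4.2
            fc = f(child)
            if fc <= fit[i]:                           # Step 4.3, Eq. (7)
                pop[i], fit[i] = child, fc
    return pop[np.argmin(fit)], fit.min()

# Usage with the sphere function as a toy objective:
best_x, best_f = de(lambda x: float(np.sum(x ** 2)), D=10)
```

Caching the parent fitness values in `fit` avoids re-evaluating f for the parent at every comparison; this is a small practical shortcut over a literal reading of Eq. (7).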

2.3 The improvement of DE algorithm

The idea of Yin-Yang initialization is to create for each individual a mirror image, symmetric about the center of the decision space, and to choose the better of the individual and its mirror image, so that the chosen one lies closer to the global optimal solution. Filtering every individual in this way makes the initial population as a whole closer to the global optimal solution. In fact, the mirror image of each individual is better than the original individual with a probability of 50 %, so Yin-Yang initialization accelerates the convergence of the population (Das et al. 2008).

For any individual \(p=(x_{1},x_{2},\,\dots ,\,x_{n})\) in the population pop(t), assuming each component satisfies \(x_{i}\in [a_{i},b_{i}](i=1,2,\,\dots ,\,n)\), the individual p has exactly one mirror individual \(\tilde{p}=(\tilde{x}_{1},\tilde{x}_{2},\,\dots ,\,\tilde{x}_{n})\) in its decision space, where \(\tilde{x}_{i}=a_{i}+b_{i}-x_{i}(i=1,2,\,\dots ,\,n)\). For example, if \(x_{i}=3\) with \([a_{i},b_{i}]=[0,10]\), the mirror component is \(\tilde{x}_{i}=0+10-3=7\).

Assume that f(x) is the objective function to be minimized. The steps of Yin-Yang initialization are as follows:

  • Step1: Randomly generate a population \(pop(t)=\{x^{1},x^{2},\,\dots ,\,x^{N}\}\) of size N, where \(x^{i}=(x_{1}^{i},x_{2}^{i},\,\dots ,\,x_{n}^{i})(i=1,2,\,\dots ,\,N)\) and n is the dimension of the decision space;

  • Step2: Calculate the mirror \(\tilde{x}^{i}=(\tilde{x}_{1}^{i},\tilde{x}_{2}^{i},\,\dots ,\,\tilde{x}_{n}^{i})\) of each individual \(x^{i}=(x_{1}^{i},x_{2}^{i},\,\dots ,\,x_{n}^{i})\) in the population; if \(f(\tilde{x}^{i})\le f(x^{i})\), replace \(x^{i}\) with its mirror \(\tilde{x}^{i}\).

After the above steps, the Yin-Yang initial population for a single-objective optimization problem is obtained. The Yin-Yang operation can also be introduced during the iterations of the algorithm, with the following steps:

  • Step1: Randomly generate a population \(pop(t)=\{x^{1},x^{2},\,\dots ,\,x^{N}\}\) of size N, where \(x^{i}=(x_{1}^{i},x_{2}^{i},\,\dots ,\,x_{n}^{i})(i=1,2,\,\dots ,\,N)\) and n is the dimension of the decision space;

  • Step2: Calculate the mirror \(\tilde{x}^{i}=(\tilde{x}_{1}^{i},\tilde{x}_{2}^{i},\,\dots ,\,\tilde{x}_{n}^{i})\) of each individual \(x^{i}=(x_{1}^{i},x_{2}^{i},\,\dots ,\,x_{n}^{i})\) in the population, forming a mirror population \(\tilde{pop}(t)=\{\tilde{x}^{1},\tilde{x}^{2},\,\dots ,\,\tilde{x}^{N}\}\);

  • Step3: Merge the two populations, that is, \(pop(t)\cup \tilde{pop}(t)\), sort by fitness value, and pass the first N individuals on to the next generation.

In practical applications, a certain probability can be set so that the mirror population is generated only for selected generations, in order to improve the efficiency of the algorithm (a sketch of both Yin-Yang variants follows).
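
The sketch below, under the same NumPy conventions as the earlier operator sketch, illustrates both uses of the Yin-Yang operation: filtering the initial population, and merging a mirror population during iteration. Function names are our own choices, not the paper's.

```python
# Yin-Yang initialization: keep the better of each individual and its mirror.
import numpy as np

rng = np.random.default_rng()

def yin_yang_init(f, N, a, b):
    """Steps 1-2 of Yin-Yang initialization.
    a, b: per-dimension lower/upper bound vectors; f: objective to minimize."""
    pop = a + rng.random((N, len(a))) * (b - a)   # random initial population
    mirror = a + b - pop                          # mirror: x~_i = a_i + b_i - x_i
    better = np.array([f(m) <= f(x) for x, m in zip(pop, mirror)])
    pop[better] = mirror[better]                  # keep the better of each pair
    return pop

def yin_yang_generation(f, pop, a, b):
    """In-iteration variant: merge pop(t) with its mirror population and
    keep the N fittest individuals (Steps 1-3 of the second procedure)."""
    merged = np.vstack([pop, a + b - pop])
    order = np.argsort([f(x) for x in merged])
    return merged[order[:len(pop)]]
```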

3 Experimental methods and results analysis

In this paper, the improved DE algorithm is adopted to search for the optimal band combination, with ten different band-selection methods used as evaluation criteria, and we compare and analyze the effect of subspace division on the classification accuracy of the selected band combinations.

3.1 Encoding and decoding of the algorithm

The DE algorithm works with a fixed number of genes during a run, namely the number of bands in the band subset. However, the number of genes per individual can be modified before the run, so the size of the selected band combination can be changed flexibly.

The DE algorithm uses decimal real-number encoding. Each individual (that is, a band subset) has m genes (that is, bands), and each gene value lies between 0 and 1. Taking band 20 as an example and supposing that there are 166 bands in total after preprocessing, the gene value of band 20 is 20/166 = 0.120482. The optimal solution finally obtained by the DE algorithm is likewise expressed in real numbers, so the real-number encoding of the optimum solution must be decoded, the decoding process being the inverse of the encoding. For example, if the best individual has a gene value of 0.120482, the corresponding band number is 0.120482 × 166 = 20. For consistency with the original data (242 bands), each band number must also be converted into one of the 242 original bands; all band numbers used in this paper have been transformed in this way.
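
As a small illustration of this encoding, the pair of helpers below converts between band numbers and gene values. The mapping table from the 166 preprocessed bands back to the original 242 bands is not given in the paper, so it appears here only as a hypothetical lookup.

```python
# Real-number encoding/decoding of band indices (166 bands after preprocessing).
def encode(band, n_bands=166):
    """Band number -> gene value, e.g. 20 -> 20/166 = 0.120482."""
    return band / n_bands

def decode(gene, n_bands=166):
    """Gene value -> band number, e.g. round(0.120482 * 166) = 20."""
    return round(gene * n_bands)

# Hypothetical lookup from preprocessed band numbers to the 242 raw bands:
# to_original_band = {1: ..., 2: ..., ..., 166: ...}
```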

3.2 Fitness function of the algorithm

The DE algorithm adopts a fitness function to calculate the fitness of each individual, which determines the probability of inheritance to the next generation: the bigger the fitness, the greater the probability of being inherited. This paper adopts the eigenvalues of the covariance matrix, the optimal index, the discrete degree, the Bhattacharyya distance and the Mahalanobis distance to calculate and evaluate the fitness, and sets the parameters of the DE algorithm as follows: the number of iterations is 200, the population size is 30, the crossover probability is 0.3, the scaling factor is 0.5, and the DE/rand/1/bin mutation strategy is adopted.
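
As an example of one of these criteria, the sketch below computes the mean pairwise Bhattacharyya distance between class-conditional Gaussians over the selected bands and uses it as the DE fitness. The Gaussian closed form of the distance is standard; wiring it up as the fitness in exactly this way is our own illustrative reading of the setup, not a verbatim reproduction of the paper's code.

```python
# Illustrative DE fitness: mean pairwise Bhattacharyya distance between classes.
import numpy as np
from itertools import combinations

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussians (standard closed form)."""
    cov = (cov1 + cov2) / 2.0
    diff = mu2 - mu1
    term1 = diff @ np.linalg.solve(cov, diff) / 8.0
    term2 = 0.5 * np.log(np.linalg.det(cov)
                         / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def fitness(genes, samples, labels, n_bands=166):
    """genes: one real-coded individual; samples: (n_samples, n_bands) matrix.
    Assumes at least two distinct bands are selected per individual."""
    bands = sorted({max(1, round(g * n_bands)) - 1 for g in genes})  # decode
    X = samples[:, bands]
    stats = {c: (X[labels == c].mean(axis=0),
                 np.cov(X[labels == c], rowvar=False))
             for c in np.unique(labels)}
    return float(np.mean([bhattacharyya(*stats[a], *stats[b])
                          for a, b in combinations(stats, 2)]))
```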

3.3 The data source in research area

The experimental data in this paper are hyperspectral remote sensing image data acquired by a hyperspectral imager sensor. The image has 3408 columns, 256 rows and 242 bands, with a spectral resolution of nearly 10 nm and a spectral range from 360 to 2570 nm; the data format is HDF Scientific Data, and the pixel storage format is BIL.

Part of the band information is given in Table 1.

Table 1 Wavelength information corresponding to each band

According to prior knowledge, this paper selects from the study area the region with top-left corner (1, 3009) and bottom-right corner (256, 3408), of size 256 × 400. The main reason for selecting this area out of the full 256 × 3408 image is that the terrain types of this region are relatively abundant and easy to identify, as shown in Fig. 2. On this basis, the corresponding latitude and longitude were inspected in World Wind, developed by NASA, in order to observe and confirm the distribution of mountains, vegetation, water body and surface in the experimental area on a three-dimensional terrain model, which makes sampling convenient.

Fig. 2 Image of the hyperspectral data sample selection area

In this paper, 4200 samples were chosen from four feature classes, namely mountains, vegetation, water body and surface body; the band diagram for part of the samples is shown in Fig. 3.

Fig. 3 The band diagram for part of the samples

3.4 Result of the experiment and contrastive analysis

1. Results of optimal band selection with three bands

The optimal three-band combinations obtained by the improved DE algorithm under each evaluation criterion are shown in Table 2. Comparing them with the results of the enumeration method for the same criteria, we find that the algorithm obtains the same optimal band combinations as the enumeration method in a short time, which further shows that the improved DE algorithm is feasible and effective for band selection of hyperspectral data.

Table 2 Table of optimal three band combination for each evaluation criterion based on improved DE algorithm
2. Optimal multiband selection without subspace division

When enumeration is used to compute multiband combinations (such as 5 bands, 10 bands or more), the computation is too large and the running time too long, whereas DE converges quickly on combinatorial optimization problems. The following experiment selects multiband combinations with the improved DE algorithm under the different evaluation criteria (the eigenvalues of the covariance matrix, the best index, the discrete degree, the Bhattacharyya distance and the Mahalanobis distance), and then three classifiers (Naive Bayes, SMO and J48) are used to verify the optimal band combinations. Since different classifiers have different classification accuracies, the average of the three accuracies is treated as the final result (a sketch of this averaging step is given below).

Tables 3, 4, 5, 6 and 7 show the classification accuracy of different band combinations under the different evaluation criteria, computed without subspace division. From the data in the tables we can see that, for each evaluation criterion, the classification accuracy does not simply rise as the number of bands increases.
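
Of the classifiers named above, SMO is Weka's sequential-minimal-optimization SVM trainer and J48 is Weka's C4.5 decision tree. The sketch below reproduces the accuracy-averaging step with scikit-learn stand-ins (GaussianNB, SVC, DecisionTreeClassifier); the substitution and the 5-fold cross-validation are our assumptions, not the paper's actual toolchain.

```python
# Average the accuracies of three classifiers on the selected bands.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def average_accuracy(X_selected, y):
    """X_selected: samples restricted to the chosen bands; y: class labels."""
    models = [GaussianNB(), SVC(), DecisionTreeClassifier()]
    return float(np.mean([cross_val_score(m, X_selected, y, cv=5).mean()
                          for m in models]))
```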

Table 3 The classification accuracy for different numbers of bands with the evaluation criterion of covariance matrix eigenvalues
Table 4 The classification accuracy for different numbers of bands with the evaluation criterion of the best index
Table 5 The classification accuracy for different numbers of bands with the evaluation criterion of the discrete degree
Table 6 The classification accuracy for different numbers of bands with the evaluation criterion of the Bhattacharyya distance
Table 7 The classification accuracy for different numbers of bands with the evaluation criterion of the Mahalanobis distance
Fig. 4 The classification accuracy graph using the Naive Bayes classifier for each evaluation criterion

Fig. 5 The classification accuracy graph using the SMO classifier for each evaluation criterion

Fig. 6 The classification accuracy graph using the J48 classifier for each evaluation criterion

Fig. 7 The average classification accuracy graph for each evaluation criterion

Figure 5 shows the classification accuracy obtained with the SMO classifier for each evaluation criterion (the abscissa is the number of bands in the combination and the ordinate is the classification accuracy; the following figures are the same); it shows that the greater the number of bands, the higher the classification accuracy. Figures 4 and 6 respectively show the classification accuracy obtained with the Naive Bayes classifier and the J48 classifier for each evaluation criterion. According to these two figures, the classification accuracy does not keep increasing with the number of bands, but turns at an inflection point when the number of bands is ten. Figure 7 shows the average classification accuracy for each evaluation criterion.

Therefore, when selecting bands for hyperspectral data, the band combination with a larger number of bands should be chosen as the optimal combination when the SMO classifier is used, whereas a combination of 10 bands should be taken as the optimal one for the Naive Bayes and J48 classifiers.

3. Optimal multiband selection with subspace division

Tables 8, 9, 10, 11 and 12 show the classification accuracy of band combinations of different sizes computed under each evaluation criterion after subspace division. Compared with the experimental results without subspace division above, the classification accuracy of the band combinations is improved for almost every evaluation criterion. This group of experiments therefore shows that, in hyperspectral band selection, adding subspace division has a good effect on the final classification result. Figure 8 contrasts the curves of average classification accuracy for the five evaluation criteria before and after subspace division.

Table 8 The classification accuracy for different numbers of bands with the evaluation criterion of covariance matrix eigenvalues (with subspace division)
Table 9 The classification accuracy for different numbers of bands with the evaluation criterion of the best index (with subspace division)
Table 10 The classification accuracy for different numbers of bands with the evaluation criterion of the discrete degree (with subspace division)
Table 11 The classification accuracy for different numbers of bands with the evaluation criterion of the Bhattacharyya distance (with subspace division)
Table 12 The classification accuracy for different numbers of bands with the evaluation criterion of the Mahalanobis distance (with subspace division)
Fig. 8 The contrast curves of average classification accuracy for the five evaluation criteria before and after subspace division. a Covariance matrix eigenvalues; b Bhattacharyya distance; c discrete degree; d the best index; e Mahalanobis distance

4 Conclusion

In this paper, an improved DE algorithm is put forward for hyperspectral remote sensing data, which are characterized by a large number of bands, high spectral resolution, narrow bandwidth and a huge data volume, and a fast and efficient dimension-reduction model for such data is implemented. The contrast experiments prove that the algorithm can not only obtain the optimal 3-band combination in a short time, but also find the optimal combinations of 5 bands, 10 bands and even larger numbers, resulting in more precise classification.