1 Introduction

Hyperspectral remote sensing is an important breakthrough in remote sensing technology. It organically combines traditional two-dimensional remote sensing imaging with spectroscopy, imaging ground objects in tens to hundreds of narrow and continuous bands simultaneously. Each pixel thus has a corresponding continuous spectral curve [1, 2], which greatly improves the spectral resolution. However, the increasing number of bands in hyperspectral images brings a large data volume and redundant information, which directly affects the accuracy of hyperspectral classification [3]. Therefore, dimensionality reduction is necessary before hyperspectral image data are analyzed and processed [4]. Dimensionality reduction of hyperspectral images is usually achieved in two ways: feature extraction [5] and feature selection [6]. Feature extraction reduces the dimension through a transformation, based on certain rules, from a high-dimensional space to a low-dimensional space. However, this method is usually complicated and may damage the original physical properties of the bands, which is not conducive to image interpretation and inversion [7, 8]. In contrast, feature selection chooses from all bands a subset that can effectively be used for analysis, forming a simplified feature space. In this way the physical meaning of the original hyperspectral image is maintained while the amount of image data is effectively reduced, so feature selection has attracted widespread attention.

Band selection of hyperspectral images is a complex combinatorial optimization problem [9, 10]. In general, obtaining the optimal band subset requires an exhaustive search, which is very time-consuming. Intelligent algorithms, however, can obviously simplify the search for optimal bands and greatly improve the search efficiency. In recent years, intelligent algorithms have been widely used in band selection of hyperspectral images and have achieved good results, including the genetic algorithm [11,12,13,14], the ant colony algorithm [15] and the particle swarm optimization algorithm [16,17,18,19]. Among them, the genetic algorithm can effectively handle the huge number of band combinations that makes traversal difficult in band selection, but its convergence is slow. Owing to the initial lack of pheromone, the ant colony algorithm solves slowly and easily stagnates during the search. The particle swarm optimization algorithm is easy to implement and converges quickly, but it easily falls into local optima. Therefore, how to find the optimal band combination quickly and accurately remains a problem of hyperspectral band selection that needs to be solved.

In this research, we propose a band selection method based on a particle swarm optimization algorithm with dynamic sub-swarms. In this algorithm, the classification accuracy is treated as the fitness function, and the particles are dynamically divided into different subgroups according to their fitness values. Through the cooperation of subgroups and their parallel search for optimal bands, the risk that the particle swarm algorithm falls into a local optimum because it depends on only a single information exchange channel can be effectively reduced. By overcoming this disadvantage of particle swarm optimization, the quality of the selected bands can be ensured. Finally, the effectiveness of the proposed method was verified by experiments on AVIRIS hyperspectral images. The main contribution of this work lies in introducing the particle swarm optimization algorithm with dynamic sub-swarms into band selection for hyperspectral images.

2 Related Work

2.1 Band Selection of Hyperspectral Image Based on Particle Swarm Optimization

The particle swarm optimization (PSO) algorithm is a swarm intelligence computing method that simulates the group behavior of birds to solve combinatorial optimization problems [20]. It has the advantages of a simple principle, ease of implementation and fast convergence, and has therefore been widely used in many fields. Recently, some researchers have applied the PSO algorithm to band selection of hyperspectral images. Li and Ding proposed a method that searches for the optimal parameters of a support vector machine and obtains a subset of beneficial features simultaneously [17]. Su et al. proposed a PSO-based system that selects bands and determines the optimal number of bands to be selected simultaneously; in this system, an outer PSO estimates the optimal number of bands while an inner one performs the corresponding band selection, and criterion functions that gauge class separability are preferred so as to avoid employing an actual classifier within PSO and thus greatly reduce the computational cost [18]. Zhang et al. proposed an unsupervised band selection method in which fuzzy clustering is combined with PSO; moreover, an entropy-based strategy for selecting representative bands of clusters is designed to restrain noisy bands effectively [19]. In general, the above PSO band selection algorithms easily fall into local optima for hyperspectral images.

2.2 The Basic Idea of Particle Swarm Optimization Algorithm

In the PSO algorithm, the search space is set as d dimensions and the total number of particles is set as m. Particle i is any particle in the d-dimensional search space and has three properties: current position, velocity and fitness value. The particle position represents a potential solution of the optimization problem, expressed as \( {x}_i=\left({x}_{i1},{x}_{i2},\dots, {x}_{id}\right) \). The particle velocity represents the moving direction of the particle in the next iteration, expressed as \( {v}_i=\left({v}_{i1},{v}_{i2},\dots, {v}_{id}\right) \). The fitness value of a particle indicates its degree of excellence and is calculated according to the fitness function. Each particle searches for the optimal solution by moving constantly in the search space and remembers the best position it has found so far, namely the individual extremum. Meanwhile, the particle learns the best position of the entire swarm through the experience of its "peer" particles, namely the group extremum. By tracking these two extrema in iterative optimization, the particles ultimately find the global optimal solution.
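To make these notions concrete, the following minimal Python sketch shows one way to represent a particle's state and maintain its individual extremum (the class and function names are ours, not from the paper):

```python
import numpy as np

class Particle:
    """Minimal PSO particle: position, velocity, and personal best."""
    def __init__(self, dim, rng):
        self.x = rng.uniform(0.0, 1.0, dim)   # current position
        self.v = np.zeros(dim)                # current velocity
        self.pbest_x = self.x.copy()          # best position found so far
        self.pbest_f = -np.inf                # fitness of that position

def update_pbest(p, fitness):
    """Evaluate a particle and refresh its individual extremum
    (maximization); returns the current fitness value."""
    f = fitness(p.x)
    if f > p.pbest_f:
        p.pbest_f, p.pbest_x = f, p.x.copy()
    return f
```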

2.3 Division and Optimization of Dynamic Subgroup

The particle swarm optimization algorithm is a swarm intelligence algorithm with good convergence, but it easily falls into local optima when facing complex optimization problems such as band selection of hyperspectral images. The reason is that the diversity of the population gradually weakens or even disappears as the iterations grow, which makes the optimization result converge to a local optimum. Against this shortcoming, this paper uses the particle swarm optimization algorithm with dynamic sub-swarms to optimize the information exchange model. The fitness value is used as the criterion to divide the particles into subgroups [21], and each subgroup takes a different optimization method, which realizes clustered optimization for particles with different optimization capabilities. On the one hand, the collaboration and parallel optimization of multiple subgroups in the proposed method ensure both direction and diversity of the search. On the other hand, the information of the particle swarm is fully utilized, so that information flows quickly through the whole group. This both improves the efficiency of optimization and reduces the risk of local optima, improving the comprehensive optimization capability of PSO.

Division of Dynamic Subgroups

The criterion for the division of dynamic subgroups is not unique. The method in this paper divides subgroups based on the order of the particles sorted by fitness value. In turn, all particles are divided into a superior group, a common group and an inferior group. The concrete division is as follows:

After the population with m particles is initialized, the fitness value \( {f}_i \) of each particle is calculated and the results are sorted. The maximum and minimum values are defined as \( {f}_{\max } \) and \( {f}_{\min } \), and the arithmetic average of all particle fitness values, \( {f}_{avg} \), is taken as the cutoff point. \( {f}_{avg}^{\prime } \) is defined as the arithmetic average fitness value of all particles belonging to the interval \( \left({f}_{avg},{f}_{\max}\right] \), and \( {f}_{avg}^{{\prime\prime} } \) is defined as the arithmetic average fitness value of all particles belonging to the interval \( \left[{f}_{\min },{f}_{avg}\right] \).

The particles whose fitness values belong to the interval \( \left({f}_{avg}^{\prime },{f}_{\max}\right] \) are defined as the superior group, the particles whose fitness values belong to the interval \( \left({f}_{avg}^{{\prime\prime} },{f}_{avg}^{\prime}\right] \) are defined as the common group, and the particles whose fitness values belong to the interval \( \left[{f}_{\min },{f}_{avg}^{{\prime\prime}}\right] \) are defined as the inferior group. From the perspective of a single particle, its position changes after each iteration, so its fitness value also changes. Therefore, the membership of every subgroup keeps changing dynamically, which reflects the idea of dynamic sub-swarms.
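As an illustration, the division can be implemented as below, assuming the fitness values of all m particles are collected in a NumPy array; the function name and the guards for empty intervals are our additions:

```python
import numpy as np

def divide_subgroups(f):
    """Split particle indices into superior / common / inferior groups
    using the thresholds f_avg, f'_avg and f''_avg defined above."""
    f = np.asarray(f, dtype=float)
    f_avg = f.mean()
    upper = f[f > f_avg]                              # within (f_avg, f_max]
    lower = f[f <= f_avg]                             # within [f_min, f_avg]
    f_avg1 = upper.mean() if upper.size else f_avg    # f'_avg
    f_avg2 = lower.mean() if lower.size else f_avg    # f''_avg
    superior = np.where(f > f_avg1)[0]
    common = np.where((f > f_avg2) & (f <= f_avg1))[0]
    inferior = np.where(f <= f_avg2)[0]
    return superior, common, inferior, (f_avg1, f_avg2)
```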

Because the three subgroups differ in their degree of convergence, each subgroup needs a corresponding optimization mode; the global and local search abilities can be controlled by adjusting the inertia weight in PSO so that particles with different optimization capabilities fulfill their respective responsibilities [22], which ensures the optimality of the selected bands.

Superior Group and Its Optimizing Method

The superior group is made up of the particles with higher fitness values during the optimization process, each corresponding to a better position. Even though the number of particles in the superior group changes dynamically after each iteration, it is guaranteed that the superior group is the small portion of the population with the best current positions. These particles therefore have strong search capability, and the possibility of finding the optimal position within the superior group according to the global model is higher. This determines the way information is communicated in the superior group: each particle not only learns social information but also keeps its own ability to think, that is, each particle follows both the global extremum and its individual extremum. The velocity and position at the (k + 1)th iteration are given by Eqs. (1) and (2):

$$ {v}_{id}^{k+1}={wv}_{id}^k+{c}_1{r}_1\left({p}_{id}^k-{x}_{id}^k\right)+{c}_2{r}_2\left({Y}_{gd}^k-{x}_{id}^k\right) $$
(1)
$$ {x}_{id}^{k+1}={x}_{id}^k+{v}_{id}^{k+1} $$
(2)

where c1 and c2 are learning factors, r1 and r2 are random numbers in the interval [0,1], \( {p}_{id}^k \) is the individual extremum, \( {Y}_{gd}^k \) is the global extremum in the superior group, and w is the inertia weight.

The number of particles in the superior group is relatively small, and they are close to the global optimum, so they need to accelerate their approach to it. According to Eq. (1), the velocity of a superior particle depends on the first term, namely the inertial velocity, so a smaller inertia weight is assigned to the particles in the superior group to strengthen their local search capability. The concrete definition is as follows:

$$ w={w}^{\prime }-\left({w}^{\prime }-{w}_{\min}\right)\left|\frac{f_i-{f}_{avg}^{\prime }}{f_{\max }-{f}_{avg}^{\prime }}\right| $$
(3)

where \( {f}_{\max } \) and \( {f}_{avg}^{\prime } \) are the boundary values of the superior group, \( {w}^{\prime } \) is a fixed value, set to 0.6, and \( {w}_{\min } \) is the minimum value of w, set to 0.4.
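Putting Eqs. (1)-(3) together, a sketch of the update for one superior-group particle, reusing the Particle class above; the learning factor value 1.4962 is carried over from the experiment settings in Sect. 4.1, and the zero-division guard is our addition:

```python
import numpy as np

def update_superior(p, Y_g, f_i, f_max, f_avg1, rng,
                    c1=1.4962, c2=1.4962, w_fix=0.6, w_min=0.4):
    """Velocity and position update for a superior-group particle.
    Y_g is the global extremum (best position) of the superior group."""
    denom = (f_max - f_avg1) or 1e-12     # avoid division by zero
    w = w_fix - (w_fix - w_min) * abs((f_i - f_avg1) / denom)   # Eq. (3)
    r1, r2 = rng.random(p.x.size), rng.random(p.x.size)
    p.v = (w * p.v + c1 * r1 * (p.pbest_x - p.x)
           + c2 * r2 * (Y_g - p.x))       # Eq. (1)
    p.x = p.x + p.v                       # Eq. (2)
```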

Common Group and Its Optimizing Method

Compared with the superior group, the optimizing ability of particles in the common group is moderate overall, but the particles still differ among themselves. The particles whose fitness values belong to the interval \( \left[{f}_{avg},{f}_{avg}^{\prime}\right] \) have relatively good positions and may break away from the common group to join the superior group. Therefore, their optimizing method needs to draw close to the superior group, which means following the global extremum of the superior group. However, this speeds these particles up, so the risk of falling into a local optimum also increases significantly. Therefore, while pursuing the optimal particle, they also need to follow their individual extrema and the global extremum of the common group, which not only avoids falling into local optima but also increases the probability of finding the global optimum. It can be seen that the particles belonging to \( \left[{f}_{avg},{f}_{avg}^{\prime}\right] \) in the common group need to follow the global extremum of the superior group as well as the global extremum and individual extremum of the common group. The velocity of particle i at the (k + 1)th iteration is given by Eq. (4):

$$ {v}_{id}^{k+1}={wv}_{id}^k+{c}_2{r}_2\left({Y}_{gd}^k-{x}_{id}^k\right)+{c}_3{r}_3\left({P}_{gd}^k-{x}_{id}^k\right)+{c}_4{r}_4\left({p}_{id}^k-{x}_{id}^k\right) $$
(4)

where c2, c3 and c4 are learning factors, r2, r3 and r4 are random numbers in [0,1], \( {Y}_{gd}^k \) is the global extremum in the superior group, \( {P}_{gd}^k \) is the global extremum in the common group, \( {p}_{id}^k \) is the individual extremum of each particle, and w is the inertia weight.

The other particles of the common group, belonging to \( \left({f}_{avg}^{{\prime\prime} },{f}_{avg}\right) \), continue the iterative optimization according to the global model by following their personal bests and the global extremum of the common group; their velocity at the (k + 1)th iteration is defined as:

$$ {v}_{id}^{k+1}={wv}_{id}^k+{c}_3{r}_3\left({p}_{id}^k-{x}_{id}^k\right)+{c}_4{r}_4\left({P}_{gd}^k-{x}_{id}^k\right) $$
(5)

The positions of all particles in the common group are updated according to Eq. (2).

Since the common group contains both better and worse particles, it needs both global and local search capability. In order to ensure convergence precision, a nonlinearly descending inertia weight is used: in the early iterations the inertia weight is large and global search dominates, and as the inertia weight descends in the later iterations, local search is enhanced. The inertia weight at the kth iteration is:

$$ w={w}_{start}-\left({w}_{start}-{w}_{end}\right){\left(\frac{k}{k_{\mathrm{max}}}\right)}^2 $$
(6)

where \( {w}_{start} \) is the maximum of the inertia weight, \( {w}_{end} \) is the minimum of the inertia weight, \( {k}_{\max } \) is the maximum number of iterations, and the inertia weight ranges from 0.4 to 0.9.
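A sketch of the common-group update combining Eqs. (4)-(6); assuming, as in the experiments of Sect. 4.1, that all learning factors share the value 1.4962:

```python
import numpy as np

def common_inertia(k, k_max, w_start=0.9, w_end=0.4):
    """Eq. (6): nonlinearly descending inertia weight."""
    return w_start - (w_start - w_end) * (k / k_max) ** 2

def update_common(p, Y_g, P_g, f_i, f_avg, w, rng, c=1.4962):
    """Velocity update for a common-group particle, then Eq. (2).
    Y_g: superior-group extremum; P_g: common-group extremum."""
    d = p.x.size
    r2, r3, r4 = rng.random(d), rng.random(d), rng.random(d)
    if f_i >= f_avg:
        # Eq. (4): the upper part also tracks the superior-group best
        p.v = (w * p.v + c * r2 * (Y_g - p.x)
               + c * r3 * (P_g - p.x) + c * r4 * (p.pbest_x - p.x))
    else:
        # Eq. (5): the lower part follows the classic global model
        p.v = w * p.v + c * r3 * (p.pbest_x - p.x) + c * r4 * (P_g - p.x)
    p.x = p.x + p.v   # Eq. (2)
```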

Inferior Group and Its Optimizing Method

The inferior group includes the particles with the worst evaluations during the optimization process, i.e., the particles whose positions are the poorest in the whole population. This differs greatly from the other two subgroups, so the optimizing mode is also different: the particles in the inferior group are not updated by the iterative formulas of the other two subgroups. According to the division rule of the subgroups, the particles in this group have been eliminated by the superior and common groups, so they cannot follow the global extrema of the other two subgroups. Meanwhile, their own poor positions make the inferior particles lose their memory function, so they no longer follow their individual extrema. Given these features, the adopted strategy is to let the inferior particles search the whole search space at random. Because the inferior particles are few, random search has little effect on the global optimization. In addition, the inferior particles are not altogether useless: their random search helps keep the diversity of the population and avoids the particles falling into local optima. The velocity and position are updated as Eqs. (7) and (8):

$$ {v}_{id}^{k+1}={v}_{id}^k $$
(7)
$$ {x}_{id}^{k+1}={x}_{\mathrm{min}}+{r}_5\times \left({x}_{\mathrm{max}}-{x}_{\mathrm{min}}\right) $$
(8)

where \( {x}_{\max } \) is the maximum of the spatial position, \( {x}_{\min } \) is the minimum of the spatial position, and r5 is a random number within [0,1].
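The corresponding sketch for the inferior group simply re-draws the position at random over the search bounds (Eq. (8)) and leaves the velocity unchanged (Eq. (7)):

```python
import numpy as np

def update_inferior(p, x_min, x_max, rng):
    """Random restart of an inferior-group particle, Eqs. (7)-(8)."""
    r5 = rng.random(p.x.size)
    p.x = x_min + r5 * (x_max - x_min)   # velocity p.v is kept as-is
```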

3 Proposed Band Selection Method

In the band selection process based on the particle swarm optimization algorithm with dynamic sub-swarms, the particle information exchange model is improved. According to the differences in search capability among the subgroups, parameter selection and iterative optimization are completed adaptively, which prevents the particles from falling into local optima. The method consists of the following three parts:

(1) Data preprocessing

Due to the influence of atmospheric noise and the complicated changes of the airborne remote sensing platform's position, the original hyperspectral image always suffers serious distortion. Therefore, before the hyperspectral data are analyzed and processed, the hyperspectral remote sensing image is first preprocessed [23] to remove the bands that are seriously polluted by moisture and noise.

(2) Construction of the fitness function

The fitness function is the criterion used to evaluate bands. This paper uses the classification accuracy as the criterion for the selected band subset, although classification accuracy is always tied to a specific classifier. As a widely used classifier, the Support Vector Machine (SVM) is built on statistical learning theory. It maps the original data to a high-dimensional space through a kernel function [24], which transforms a linearly inseparable problem in the low-dimensional space into a separable one in the high-dimensional space, so that a linear discriminant function can be constructed in the high-dimensional space to find the optimal separating surface. Although the data dimension is increased, the computational complexity is not increased accordingly. The SVM effectively overcomes the difficulties that nonlinearity and small sample sizes bring to classification. Therefore, the SVM classification accuracy is selected as the band selection criterion, namely the fitness function.
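A possible form of this fitness function, sketched with scikit-learn's SVC: the paper evaluates accuracy on held-out test samples, whereas this sketch uses 3-fold cross-validation for brevity; the C and gamma values follow Sect. 4.1:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def band_fitness(band_mask, X, y, C=16.0, gamma=2.5874):
    """Fitness of a binary band mask = SVM classification accuracy
    computed on the selected bands only."""
    idx = np.flatnonzero(band_mask)
    if idx.size == 0:
        return 0.0                       # an empty subset scores worst
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    return cross_val_score(clf, X[:, idx], y, cv=3).mean()
```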

(3) Band selection

After the hyperspectral images are preprocessed, the bands are selected based on the PSO algorithm with dynamic sub-swarms. The concrete procedure is as follows:

(a) Particle swarm initialization. The initialization includes the velocity and position of each particle, which are set according to the computational complexity in order to ensure population diversity and search efficiency. It should be noted that band combinations are represented by binary vectors in this paper, so a sigmoid function is used to map the particle position from continuous variables to discrete variables. The mapping function is shown as Eq. (9):

$$ {x}_{id}^{k+1}=\left\{\begin{array}{ll}1, & sig\left({v}_{id}^{k+1}\right)\ge u\\ {}0, & \mathrm{else}\end{array}\right. $$
(9)

where sig represents the sigmoid function and u is a predetermined threshold that controls the probability of mapping to 1; the rest is the same as in the continuous particle swarm algorithm. Here u is set to 0.5.
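Note that sig(v) ≥ 0.5 exactly when v ≥ 0, so with u = 0.5 the mapping reduces to thresholding the velocity at zero; a minimal sketch:

```python
import numpy as np

def binarize_position(v, u=0.5):
    """Eq. (9): threshold the sigmoid of the velocity to obtain a
    binary band mask; u = 0.5 follows the text."""
    sig = 1.0 / (1.0 + np.exp(-v))
    return (sig >= u).astype(int)
```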

(b) Calculate the fitness values of the particles and sort them. According to this criterion, the whole population is first divided into three subgroups: the superior group, the common group and the inferior group; then the optimization proceeds in parallel. In the iteration process, the velocity, position and inertia weight update themselves. In this way, the best band combination is found.

(c) During the iterative update of particle velocity and position, it is necessary to judge whether they exceed the boundary values. If the velocity exceeds the set value, it is clamped to the boundary, which controls the maximum step length of each particle movement.

(d) The optimal band combination is obtained according to the bands output by each subgroup.

The pseudo code and flowchart of the proposed algorithm are shown in Algorithm 1 and Fig. 1, respectively.
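Since Algorithm 1 is reproduced only as an image in the original, the following compact sketch wires the helper functions above into one possible main loop. Keeping the positions continuous and thresholding them through Eq. (9) only when the fitness is evaluated is one reading of how Eqs. (2) and (9) interact, not necessarily the authors' exact implementation:

```python
import numpy as np

def dspso_band_selection(X, y, n_particles=50, k_max=50, seed=0):
    """One possible DSPSO main loop: evaluate, divide, update per group."""
    rng = np.random.default_rng(seed)
    swarm = [Particle(X.shape[1], rng) for _ in range(n_particles)]
    fitness = lambda x: band_fitness(binarize_position(x), X, y)
    best_mask, best_f = None, -np.inf
    for k in range(1, k_max + 1):
        f = np.array([update_pbest(p, fitness) for p in swarm])
        sup, com, inf_, (f1, f2) = divide_subgroups(f)
        Y_g = swarm[int(f.argmax())].x               # superior-group best
        P_g = swarm[com[f[com].argmax()]].x if com.size else Y_g
        w_c = common_inertia(k, k_max)               # Eq. (6)
        for i in sup:
            update_superior(swarm[i], Y_g, f[i], f.max(), f1, rng)
        for i in com:
            update_common(swarm[i], Y_g, P_g, f[i], f.mean(), w_c, rng)
        for i in inf_:
            update_inferior(swarm[i], 0.0, 1.0, rng)
        if f.max() > best_f:                         # track the best mask
            best_f = float(f.max())
            best_mask = binarize_position(swarm[int(f.argmax())].x)
    return best_mask, best_f
```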

Figure 1 The flowchart of the proposed algorithm.

4 Experiment Simulation

4.1 Description of Experimental Data

The experimental image data used in this paper are the AVIRIS hyperspectral remote sensing data offered for free by Purdue University of the United States (http://dynamo.ecn.purdue.edu/biehl/MultiSpec). The AVIRIS hyperspectral image was taken over the northwest of the remote sensing experimental area in Indiana. The spatial resolution of the image is 20 m, its size is 145 × 145 pixels, the number of bands is 224, and the wavelength ranges from 400 nm to 2500 nm. Before band selection and classification, it is necessary to remove the bands seriously polluted by moisture and noise (bands 1 ~ 4, 7 ~ 8, 80 ~ 86, 103 ~ 110, 149 ~ 165 and 217 ~ 224). As a result, 179 bands are reserved for selection in the experiment. Figure 2 is the false color image synthesized from bands 90, 5 and 120, and Fig. 3 shows the reference map of the original ground features.

Figure 2 AVIRIS false color composite image.

Figure 3 Reference map.

In this paper, band selection is performed with the particle swarm optimization algorithm with dynamic sub-swarms (DSPSO), while PSO and the Genetic Algorithm (GA) are selected as references to verify that the proposed method is effective. In the experiments, seven categories of ground features were selected for classification, and the training and test samples were selected at an equal ratio (shown in Table 1).

Table 1 The numbers of training and test samples.

The parameters of the proposed algorithm are set as follows: population size N = 50; the particle swarm learning factors c1, c2, c3 and c4 are all taken as 1.4962; the inertia weights change in the way described by the algorithm; and the total number of iterations is 50. The SVM classifier uses the RBF kernel function, whose penalty parameter c and kernel parameter γ are set as c = 16 and γ = 2.5874.

Algorithm 1 The pseudo code of the proposed band selection method.

4.2 Analysis of Experimental Results

The classification accuracy assessment of hyperspectral remote sensing images includes the producer's accuracy (PA), the user's accuracy (UA), the overall classification accuracy (OA) and the Kappa coefficient, the last of which is a comprehensive assessment of the overall, producer's and user's accuracies. The classification accuracy of each ground feature in the experiment is shown in Table 2.
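For reference, the overall accuracy and the Kappa coefficient follow the standard confusion-matrix formulas; a short sketch (not tied to the paper's own code):

```python
import numpy as np

def overall_accuracy_and_kappa(cm):
    """OA and Cohen's Kappa from a confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                 # observed agreement = OA
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    return po, (po - pe) / (1 - pe)
```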

Table 2 Product’s accuracy and user’s of classification experiment.

As the experimental data in Table 2 show, under the same sample selection conditions, PSO has better convergence accuracy than GA on such high-dimensional problems as band selection, and both the user's and producer's accuracies of DSPSO are better than those of GA and PSO. The classification is significantly improved: the classification accuracies of all ground features are improved to different degrees, which shows a certain advantage in the performance of the algorithm.

Table 3 shows the overall classification accuracy and the Kappa coefficient. It shows that the improved algorithm proposed in this paper achieves the highest classification accuracy (94.64%) and Kappa coefficient (0.9327) compared with the GA and PSO algorithms. The optimization of the particle information exchange model is beneficial to improving the search efficiency. Compared with the GA and PSO algorithms, the improved algorithm has better classification performance on the selected band combinations, which verifies the effectiveness of the proposed method.

Table 3 The overall classification accuracy and Kappa coefficient.

Figure 4 shows the fitness values of the three algorithms over the whole iterative process, namely the change of the classification accuracy. According to the fitness value trends of the three algorithms, the GA and PSO algorithms stagnate within a short time and fall into local optima, while the DSPSO algorithm improves this situation. It can be observed from Fig. 4 that the DSPSO algorithm converges after about 20 generations, while the GA and PSO algorithms converge after 35 generations. The DSPSO algorithm achieves the highest classification accuracy and escapes local optima quickly. The above analysis shows the effectiveness of the proposed method on the hyperspectral image.

Figure 4 Fitness value change curves of the GA-SVM, PSO-SVM and DSPSO-SVM algorithms.

Figure 5 shows the results of the classification experiment. As the images show intuitively, the improved algorithm proposed in this paper performs better than the GA and PSO algorithms in the classification results.

Figure 5 Classification results of the GA-SVM, PSO-SVM and DSPSO-SVM algorithms.

5 Conclusion

While the particle swarm algorithm shows good convergence in band selection, its shortcoming of easily falling into local optima adversely affects the subsequent classification. This paper proposed a band selection method based on the particle swarm optimization algorithm with dynamic sub-swarms, which divides all particles into dynamic subgroups to optimize the information exchange model. It effectively reduces the risk of falling into local optima during the iterative optimization process and thus ensures the optimality of the selected bands. Experimental results show that the improved algorithm proposed in this paper yields band combinations with higher classification accuracy than GA and PSO, which verifies the effectiveness of the algorithm.