Introduction

Target detection, an important research direction in the field of hyperspectral imaging, aims to detect small objects or anomalies in hyperspectral images (HSIs). It has broad application potential in military security, environmental pollution monitoring, geological exploration, and agriculture and forest monitoring (Matteoli et al. 2010; Li et al. 2017; Xie et al. 2019a, b). An HSI contains rich spectral and spatial information. Since spectral features differ among substances, objects in a scene can be effectively distinguished using an HSI.

Many target detection algorithms have been proposed and applied to HSIs. The Reed-Xiaoli (RX) detector (Reed and Yu 1990; Xie et al. 2019a, b) obtains the detection result by constructing a generalized likelihood ratio and estimating the background covariance matrix. However, the RX algorithm ignores the rich nonlinear information in an HSI, resulting in poor detection accuracy. The collaborative representation detector (CRD) (Li and Du 2015) is based directly on the concept that each pixel in the background can be approximately represented by its spatial neighborhood, while anomalies cannot. However, CRD considers only the spectral features of an HSI, ignoring the spatial features.
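For concreteness, the following is a minimal NumPy sketch of the global RX statistic, the squared Mahalanobis distance of each pixel from the background mean; reshaping the image cube into a pixels-by-bands matrix and using the pseudo-inverse for stability are illustrative assumptions made here, not details from Reed and Yu (1990).

```python
import numpy as np

def rx_detector(pixels):
    """Global RX statistic for each pixel.
    pixels: (M, N) array of M pixel spectra with N bands."""
    mu = pixels.mean(axis=0)
    centered = pixels - mu
    cov_inv = np.linalg.pinv(np.cov(centered, rowvar=False))  # (N, N)
    # D(y) = (y - mu)^T Sigma^{-1} (y - mu), evaluated for every pixel at once
    return np.einsum('ij,jk,ik->i', centered, cov_inv, centered)
```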

The combined sparse and collaborative representation detector (CSCR) (Li et al. 2015) assumes that the representation of known target signatures is sparse and can be solved by L1-norm minimization of the representation weight vector, whereas the representation over the background atoms is assumed to be collaborative and can be solved by L2-norm minimization. The decision is then made by computing the difference between the two representation residuals.

Support vector machines (SVMs) (Zhao et al. 2012) are a highly effective method for nonlinear signals: they map the signals into a new feature space in which the classes are easier to separate (Tan and Du 2008). Kernel methods have yielded satisfactory results in HSI processing (Li et al. 2010; Zhao et al. 2010). In addition, many algorithms use statistical hypothesis testing, such as the spectral matched filter (SMF) (Manolakis and Shaw 2002); all of these must assume a mathematical distribution for the pixel spectra of HSIs. The accuracy of the assumed distribution model has a substantial impact on the detection results.
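As a sketch of such statistics-based detectors, the following NumPy implementation computes an SMF-style score under one common normalization; the use of global sample statistics and the function signature are assumptions made here for illustration, not the exact formulation of Manolakis and Shaw (2002).

```python
import numpy as np

def smf_detector(pixels, target_sig):
    """Spectral matched filter score under a common normalization.
    pixels: (M, N) array of M pixel spectra; target_sig: (N,) target spectrum."""
    mu = pixels.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(pixels, rowvar=False))
    s = target_sig - mu                      # centered target signature
    w = cov_inv @ s / (s @ cov_inv @ s)      # matched-filter weights
    return (pixels - mu) @ w                 # one score per pixel
```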

In recent years, the sparse representation method has attracted increasing attention (Chen et al. 2011a, b). It assumes that target detection can be achieved by using target and background dictionaries to represent the test pixels. Under this formulation, the representation of the target and background signatures can be solved by L1-norm minimization of the weight coefficients, and the target detection problem is thus transformed into the optimization problem of solving for the dictionary atom coefficients (Chen et al. 2011a, b). This approach requires neither assumptions about the mathematical distributions of the background and target pixels nor independence among the atoms of the training sample dictionary. Because background and target pixel spectra belong to different subspaces, the atoms that constitute their spectral dictionaries differ. Whether a pixel belongs to the background or the target can therefore be determined from the positions of the nonzero coefficients.

To solve the dictionary atom optimization problem, convex optimization methods typically replace the L0-norm with the L1-norm so that the problem can be solved efficiently. However, the two yield equivalent solutions only under strict conditions, and the true signal often cannot be recovered. In addition, owing to the similarity of the endmembers in the spectral library and the insufficient constraints of the objective function, the estimated abundances can differ substantially from the true solutions. Moreover, because hyperspectral images contain a large amount of data, convex optimization algorithms are slow.

Another approach is sparse Bayesian learning (SBL) (Themelis et al. 2012), which models the unknown variables using Bayesian priors and obtains sparse solutions by Bayesian inference (Wipf 2006; Zhang and Rao 2011, 2012, 2013; Qiu and Dogandzic 2010; Kong et al. 2017). The core strategy of Bayesian theory is to obtain the posterior probability of an unknown parameter by combining prior information with the observed data. Wipf and Rao (2004) proved that the SBL algorithm can obtain the sparsest solution, and SBL remains reliable even when the endmembers in the spectral library are strongly correlated. However, SBL uses the expectation maximization (EM) algorithm to update the parameters, which entails a large amount of computation and fails to exploit the joint sparsity of the endmember combinations in adjacent pixels, resulting in low efficiency.

To make better use of the spatial correlation in HSIs, a regularized multiple sparse Bayesian learning (RMSBL) method for target detection is proposed, which is established by Bayesian inference using the conditional posterior distributions of the model parameters under a hierarchical Bayesian model. Based on the cost function of multiple sparse Bayesian learning (MSBL), the representation of the target and background signatures is obtained by an iterative L2,1-norm minimization method, and the detection result is given by the difference between the two representation residuals. In simulation experiments, the RMSBL algorithm achieves superior detection performance compared with other commonly used detection algorithms.


The remainder of this paper is organized as follows: the “Target Detection Based on Sparse Representation” section discusses target detection based on sparse representation. The “Regularized Multiple Sparse Bayesian Learning for Target Detection” section describes the derivation of RMSBL. The “Experimental Results and Analysis” section presents the experimental setup and evaluates the detection performance. Finally, the conclusions of this work are presented in the “Conclusions” section.

Target Detection Based on Sparse Representation

Let Y be a set of hyperspectral image data and y ∈ RN × 1 be an N-dimensional spectral vector in Y. Vector y can be represented as follows (Zhang et al. 2017):

$$ {\displaystyle \begin{array}{l}\mathbf{y}={\mathbf{a}}_1^b{x}_1^b+{\mathbf{a}}_2^b{x}_2^b+\cdots +{\mathbf{a}}_{N_b}^b{x}_{N_b}^b+{\mathbf{a}}_1^t{x}_1^t+{\mathbf{a}}_2^t{x}_2^t+\cdots +{\mathbf{a}}_{N_t}^t{x}_{N_t}^t+\mathbf{n}\\ {}\kern0.9000001em =\left[{\mathbf{a}}_1^b{\mathbf{a}}_2^b\cdots {\mathbf{a}}_{N_b}^b\right]\;{\left[{x}_1^b{x}_2^b\cdots {x}_{N_b}^b\right]}^T+\left[{\mathbf{a}}_1^t{\mathbf{a}}_2^t\cdots {\mathbf{a}}_{N_t}^t\right]\;{\left[{x}_1^t{x}_2^t\cdots {x}_{N_t}^t\right]}^T+\mathbf{n}\\ {}\kern0.9000001em ={\mathbf{A}}_b{\mathbf{x}}_b+{\mathbf{A}}_t{\mathbf{x}}_t+\mathbf{n}=\left[{\mathbf{A}}_b\;{\mathbf{A}}_t\right]\;\left[\begin{array}{c}{\mathbf{x}}_b\\ {}{\mathbf{x}}_t\end{array}\right]+\mathbf{n}=\mathbf{Ax}+\mathbf{n}\end{array}} $$
(1)

where Ab and At are the background dictionary and the target dictionary, respectively; x denotes the weight coefficients that correspond to the dictionary; x is a sparse vector, of which only a few coefficients are nonzero; and n represents the observation error.

In the sparse model, it is not necessary to assume distributions for the target and the background because the spectral characteristics of background and target pixels differ and lie in different subspaces. The sparse vector x is composed of the background weight coefficients xb and the target weight coefficients xt. If y is a target pixel, then xb is a zero vector and xt is a sparse vector; if y is a background pixel, then xb is a sparse vector and xt is a zero vector. Therefore, whether pixel y is a background or target pixel can be determined from the positions of the nonzero entries of its coefficient vector x.

To obtain the weight coefficient x of the pixel y, one must solve the optimization problem that is defined by the following formula:

$$ \mathbf{x}=\mathrm{argmin}{\left\Vert \mathbf{x}\right\Vert}_1\;\mathrm{subject}\ \mathrm{to}\;\mathbf{Ax}=\mathbf{y} $$
(2)

where argmin denotes the value of the variable at which the objective function attains its minimum.

Pixel y can be classified by comparing the values of the reconstructed residuals after x has been obtained. Therefore, the output of the detector is expressed as follows:

$$ R\left(\mathbf{y}\right)={\left\Vert \mathbf{y}-{\mathbf{A}}_b{\mathbf{x}}_b\right\Vert}_2-{\left\Vert \mathbf{y}-{\mathbf{A}}_t{\mathbf{x}}_t\right\Vert}_2 $$
(3)

where ‖y − Abxb‖2 and ‖y − Atxt‖2 are the background residual and the target residual, respectively. For a specified threshold δ, if R(y) > δ, then y is classified as a target; otherwise, y belongs to the background.
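A minimal sketch of this detector is given below; it replaces the equality-constrained problem (2) with the common Lasso relaxation, and the solver choice and the regularization weight `alpha` are illustrative assumptions rather than details from the cited works.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_detector(y, A_b, A_t, alpha=1e-3):
    """Residual-difference score R(y) of Eq. (3) for one pixel.
    A_b, A_t: background and target dictionaries (N x N_b, N x N_t)."""
    A = np.hstack([A_b, A_t])                # combined dictionary
    x = Lasso(alpha=alpha, fit_intercept=False,
              max_iter=10000).fit(A, y).coef_   # L1 relaxation of Eq. (2)
    x_b, x_t = x[:A_b.shape[1]], x[A_b.shape[1]:]
    r_b = np.linalg.norm(y - A_b @ x_b)      # background residual
    r_t = np.linalg.norm(y - A_t @ x_t)      # target residual
    return r_b - r_t                         # R(y) > delta => target
```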

The above is a target detection model based on sparse representation. Because adjacent pixels in hyperspectral data contain similar information, they can be represented by linear combinations of the same endmembers in the spectral library. Accordingly, the spectral vector is extended to a spectral matrix and the weight coefficient vector is expanded to a weight coefficient matrix. The mathematical model of the multiple sparse representation is then as follows:

$$ \mathbf{Y}=\mathbf{AX}+\mathbf{N}={\mathbf{A}}_t{\mathbf{X}}_t+{\mathbf{A}}_b{\mathbf{X}}_b+\mathbf{N}\kern0.2em $$
(4)

where Y ∈ RN × M denotes the observed values of M pixels in N bands; A ∈ RN × L represents the spectral library, which contains the reflected values of L objects in N bands; X ∈ RL × M is the weight coefficient matrix; and N is the observation noise.
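In practice, Y is built by flattening the image cube; a short helper (the rows × columns × bands cube layout assumed here is an illustrative choice) might look as follows.

```python
import numpy as np

def cube_to_matrix(cube):
    """Reshape an HSI cube of shape (rows, cols, bands) into the N x M
    matrix Y of Eq. (4), with one pixel spectrum per column."""
    rows, cols, bands = cube.shape
    return cube.reshape(rows * cols, bands).T   # (N, M), M = rows * cols
```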

The key step in target detection is to find a weight coefficient matrix X that minimizes \( {\left\Vert \mathbf{Y}-\mathbf{AX}\right\Vert}_F^2 \) while also keeping ‖X‖2,1 small. Therefore, the objective function is as follows:

$$ \mathbf{X}=\arg \underset{\mathbf{X}}{\min }{\left\Vert \mathbf{Y}-\mathbf{AX}\right\Vert}_F^2+{\left\Vert \mathbf{X}\right\Vert}_{2,1} $$
(5)

The L2,1-norm optimization problem is typically solved by a convex optimization algorithm; a representative one is the collaborative spectral unmixing by variable splitting and augmented Lagrangian algorithm (CLSUnSAL) (Bioucas-Dias and Figueiredo 2010).

The values of the target and the background under the sparse representation can be obtained separately after solving for the weight coefficients:

$$ {\mathbf{Y}}_t={\mathbf{A}}_t{\mathbf{X}}_t $$
(6)
$$ {\mathbf{Y}}_b=\mathbf{AX}-{\mathbf{Y}}_t $$
(7)

Then, the target residual and the background residual of each pixel are calculated as follows:

$$ {r}_t\left(\mathbf{y}\right)={\left\Vert \mathbf{y}-{\mathbf{y}}_t\right\Vert}_2^2 $$
(8)
$$ {r}_b\left(\mathbf{y}\right)={\left\Vert \mathbf{y}-{\mathbf{y}}_b\right\Vert}_2^2 $$
(9)

Pixel y is classified by comparing the values of the reconstructed residuals. Therefore, the output of the detector is expressed as follows:

$$ R\left(\mathbf{y}\right)={r}_b\left(\mathbf{y}\right)-{r}_t\left(\mathbf{y}\right) $$
(10)

For a specified threshold δ, if R(y) > δ, then y is classified as a target; otherwise, y belongs to the background.
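Putting Eqs. (6)-(10) together, the following sketch computes the residual-difference score for every pixel at once; the assumption that the target atoms occupy the last columns of A is made here purely for illustration.

```python
import numpy as np

def detection_map(Y, A, X, n_target_atoms):
    """Residual-difference scores for all pixels (Eqs. (6)-(10)).
    Assumes the last n_target_atoms columns of A form A_t in Eq. (4)."""
    A_t = A[:, -n_target_atoms:]
    X_t = X[-n_target_atoms:, :]
    Y_t = A_t @ X_t                          # target part, Eq. (6)
    Y_b = A @ X - Y_t                        # background part, Eq. (7)
    r_t = np.sum((Y - Y_t) ** 2, axis=0)     # Eq. (8), per pixel
    r_b = np.sum((Y - Y_b) ** 2, axis=0)     # Eq. (9), per pixel
    return r_b - r_t                         # Eq. (10)
```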

Regularized Multiple Sparse Bayesian Learning for Target Detection

Multiple Sparse Bayesian Learning Model

Multiple sparse Bayesian learning (MSBL) is an efficient method for solving the simultaneous sparse approximation problem. Based on the multiple measurement vector (MMV) model, a prior distribution over the sparse coefficient matrix with shared hyperparameters is established. Because a pixel in a hyperspectral image and its surrounding pixels contain similar information, the coefficient matrix should be row sparse; hence, the prior distribution of each row vector is governed by a hyperparameter so that the matrix satisfies the row sparsity property. From the cost function of MSBL, an iterative method is derived that effectively reduces the number of iterations.

Let Y.j and X.j represent the jth columns of Y and X, respectively. (Note that in this section the coefficient matrix X has M rows, one per dictionary atom, and L columns, one per pixel.) The likelihood function is obtained as (Kong et al. 2016):

$$ p\left({\mathbf{Y}}_{.j}|{\mathbf{X}}_{.j}\right)={\left(\pi {\sigma}^2\right)}^{-N}\exp \left(-\frac{1}{\sigma^2}{\left\Vert {\mathbf{Y}}_{.j}-\mathbf{A}{\mathbf{X}}_{.j}\right\Vert}_2^2\right) $$
(11)

The general approach is to impose sparsity on the abundance matrix directly via a Laplace prior; however, the Laplace prior is not conjugate to the Gaussian likelihood. Therefore, a hierarchical Bayesian model is used to design the prior distribution. Assume that the ith row Xi. of the coefficient matrix X obeys the Gaussian distribution p(Xi.; γi) = N(0, γiI) with parameter γi. Then, the prior distribution of the coefficient matrix is a high-dimensional Gaussian distribution:

$$ p\left(\mathbf{X};\gamma \right)=\prod \limits_{i=1}^Mp\left({\mathbf{X}}_{i.};{\gamma}_i\right) $$
(12)

where \( \gamma ={\left[{\gamma}_1,{\gamma}_2,\cdots, {\gamma}_M\right]}^T\in {R}_{+}^M \) and γi controls the sparsity of the ith row of the coefficient matrix. If γi = 0, then Xi. is an all-zero row; that is, the conditional probability p(Xi. = 0 | Y; γi = 0) = 1 is satisfied. The parameter γi obeys the Gamma distribution p(γi| λi) ~ Γ(γi| 1, λi/2).

According to Bayesian theory, a posterior distribution is obtained:

$$ p\left({\mathbf{X}}_{.j}|{\mathbf{Y}}_{.j};\gamma \right)=\frac{p\left({\mathbf{X}}_{.j},{\mathbf{Y}}_{.j};\gamma \right)}{\int p\left({\mathbf{X}}_{.j},{\mathbf{Y}}_{.j};\gamma \right)d{\mathbf{X}}_{.j}}=N\left({\mu}_{.j},\sum \right) $$
(13)

The mean and variance can be expressed as follows:

$$ \mathbf{M}=\left[{\mu}_{.1},{\mu}_{.2},\cdots, {\mu}_{.L}\right]=E\left[\mathbf{X}|\mathbf{Y};\gamma \right]=\varGamma {\mathbf{A}}^T{\sum}_{\mathbf{Y}}^{-1}\mathbf{Y} $$
(14)
$$ \sum = Cov\left[{\mathbf{X}}_{.j}|{\mathbf{Y}}_{.j};\gamma \right]=\varGamma -\varGamma {\mathbf{A}}^T{\sum}_{\mathbf{Y}}^{-1}\mathbf{A}\varGamma, \forall j $$
(15)

where Γ = diag(γ) and ∑Y = σ2I + AΓAT.
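In code, Eqs. (14)-(15) reduce to a few linear-algebra operations; the NumPy sketch below assumes γ and σ² are given and uses a direct inverse for clarity rather than efficiency.

```python
import numpy as np

def posterior_moments(Y, A, gamma, sigma2):
    """Posterior mean M (Eq. (14)) and covariance Sigma (Eq. (15))."""
    N = A.shape[0]
    Gamma = np.diag(gamma)
    Sigma_Y = sigma2 * np.eye(N) + A @ Gamma @ A.T
    G = Gamma @ A.T @ np.linalg.inv(Sigma_Y)  # Gamma A^T Sigma_Y^{-1}
    M = G @ Y                                 # Eq. (14)
    Sigma = Gamma - G @ A @ Gamma             # Eq. (15)
    return M, Sigma
```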

The logarithmic form of the cost function of MSBL is expressed as follows (Wang et al. 2013):

$$ {\displaystyle \begin{array}{l}L\left(\gamma, {\sigma}^2\right)=-2\log \int p\left(\mathbf{Y}|\mathbf{X}\right)p\left(\mathbf{X};\gamma, {\sigma}^2\right)d\mathbf{X}\\ {}\kern1.44em =L\log \mid {\sum}_{\mathbf{Y}}\mid +{\sum}_{j=1}^L{\mathbf{Y}}_{.j}^T{\sum}_{\mathbf{Y}}^{-1}{\mathbf{Y}}_{.j}\end{array}} $$
(16)

For parameter optimization, the EM method is commonly used in sparse Bayesian learning. It consists of two steps: in the E-step, the posterior mean is calculated via formula (14); in the M-step, the hyperparameters are updated by the following formula:

$$ {\gamma}_i={\mu}_i^2+{\sum}_{ii} $$
(17)

The MacKay method obtains the parameter iteration formula by computing the extremum directly. This method is equivalent to point estimation and iterates faster than formula (17):

$$ {\gamma}_i=\frac{\mu_i^2}{1-{\gamma}_i^{-1}{\sum}_{ii}} $$
(18)
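The two updates can be compared side by side; the sketch below states them for a single hyperparameter exactly as in Eqs. (17)-(18) (in the MMV case, replacing μi² by the row-wise average ‖Mi.‖²/L is a common choice, assumed here but not spelled out above).

```python
def gamma_updates(mu_i, sigma_ii, gamma_i):
    """EM update (Eq. (17)) and MacKay fixed-point update (Eq. (18)) for
    one hyperparameter, given the posterior mean mu_i, the posterior
    variance sigma_ii, and the current value gamma_i."""
    gamma_em = mu_i ** 2 + sigma_ii                        # Eq. (17)
    gamma_mackay = mu_i ** 2 / (1.0 - sigma_ii / gamma_i)  # Eq. (18)
    return gamma_em, gamma_mackay
```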

Optimal Solution

This paper proposes a new optimization method. In Eq. (16), the former term, namely, log ∣ ∑Y∣, is a smooth concave function of γ and can be transformed via the properties of conjugate functions, while the latter term, namely, \( {\mathbf{Y}}_{.j}^T{\sum}_{\mathbf{Y}}^{-1}{\mathbf{Y}}_{.j} \), is quadratic. After transforming the two terms, the following is obtained:

$$ {\displaystyle \begin{array}{l}{L}_z\left(\mathbf{X},\gamma \right)=\frac{1}{\sigma^2}\underset{\mathbf{X}}{\min }{\left\Vert \mathbf{Y}-\mathbf{AX}\right\Vert}_F^2+{z}^T\gamma +{\mathbf{X}}^T{\varGamma}^{-1}\mathbf{X}\\ {}\kern1.44em =\frac{1}{\sigma^2}\underset{\mathbf{X}}{\min }{\left\Vert \mathbf{Y}-\mathbf{AX}\right\Vert}_F^2+{z}^T\gamma +{\sum}_{i=1}^M{\gamma}_i^{-1}{\left\Vert {\mathbf{X}}_{i.}\right\Vert}_2^2\end{array}} $$
(19)

where minX denotes minimization of the objective function with respect to X.

The optimal iteration for γi is obtained by setting the derivative of Eq. (19) with respect to γi to zero:

$$ {\gamma}_i={z}_i^{-1/2}\sqrt{{\mathbf{X}}_{i.}{\mathbf{X}}_{i.}^T}={z}_i^{-1/2}{\left\Vert {\mathbf{X}}_{i.}\right\Vert}_2\left(\forall i\right) $$
(20)

Substituting formula (20) into formula (19) and normalizing the coefficients of the regularization term yields:

$$ \mathbf{X}=\underset{\mathbf{X}}{\mathrm{argmin}}\frac{1}{2}{\left\Vert \mathbf{Y}-\mathbf{AX}\right\Vert}_F^2+{\sum}_{i=1}^M{\sigma}^2{\gamma}_i^{-1}{\left\Vert {\mathbf{X}}_{i.}\right\Vert}_2^2 $$
(21)

Let \( {w}_i={\sigma}^2{z}_i^{1/2} \). The optimal expression for weight coefficient estimation is as follows:

$$ {\displaystyle \begin{array}{l}\mathbf{X}=\underset{\mathbf{X}}{\mathrm{argmin}}\frac{1}{2}{\left\Vert \mathbf{Y}-\mathbf{AX}\right\Vert}_F^2+{\sum}_{i=1}^M{\sigma}^2{z}_i^{1/2}{\left\Vert {\mathbf{X}}_{i.}\right\Vert}_2\\ {}\kern0.36em =\arg \underset{\mathbf{X}}{\min}\frac{1}{2}{\left\Vert \mathbf{Y}-\mathbf{AX}\right\Vert}_F^2+{\sum}_{i=1}^M{w}_i{\left\Vert {\mathbf{X}}_{i.}\right\Vert}_2\end{array}} $$
(22)

Define the matrix norm:

$$ {\left\Vert B\right\Vert}_{2,1}={\sum}_i\sqrt{\sum_j\left({B}_{ij}^2\right)}={\sum}_i\sqrt{B_{i.}{B}_{i.}^T}={\sum}_i{\left\Vert {B}_{i.}\right\Vert}_2 $$
(23)

Equation (22) can be rewritten as follows:

$$ \mathbf{X}=\arg \underset{\mathbf{X},\kern0.48em W}{\min}\frac{1}{2}{\left\Vert \mathbf{Y}-\mathbf{AX}\right\Vert}_F^2+{\left\Vert W\mathbf{X}\right\Vert}_{2,1} $$
(24)

where W = diag(wi) denotes a diagonal matrix with diagonal elements wi. Equation (24) is a weighted iterative L2,1-regularization problem. The results obtained by the alternating iteration method are globally convergent and correspond to the sparsest solution (Rakotomamonjy 2011).
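For reference, Eq. (23) and the weighted objective of Eq. (24) translate directly into NumPy; the helper names below are ours, not identifiers from the cited works.

```python
import numpy as np

def l21_norm(B):
    """L2,1 norm of Eq. (23): the sum of the L2 norms of the rows of B."""
    return np.sum(np.linalg.norm(B, axis=1))

def weighted_objective(Y, A, X, w):
    """Objective value of Eq. (24) with W = diag(w)."""
    return 0.5 * np.linalg.norm(Y - A @ X, 'fro') ** 2 + l21_norm(np.diag(w) @ X)
```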

In addition, the noise variance affects only the convergence speed, not the accuracy of the sparse solution. To set this parameter adaptively, we update the variance as follows (Wipf and Rao 2007):

$$ {\sigma}^2=\frac{{\left\Vert \mathbf{Y}-\mathbf{AX}\right\Vert}_F^2/L}{N-M+{\sum}_{i=1}^M{\sum}_{ii}/{\gamma}_i} $$
(25)

For problem (16), parameter learning can be performed by alternating iteration; the update expressions are listed in Table 1.

Table 1 Parameter updating in each iteration

The overall RMSBL algorithm is summarized as Algorithm 1.

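The following NumPy sketch mirrors the alternating updates of Table 1 under simplifying assumptions (zi absorbed into a constant, a fixed iteration count, and direct matrix inverses); it is a minimal illustration, not the reference implementation of Algorithm 1.

```python
import numpy as np

def rmsbl(Y, A, n_iter=50, eps=1e-10):
    """Alternating updates behind Algorithm 1 (sketch).
    Y: (N, L) pixel matrix; A: (N, M) spectral library."""
    N, L = Y.shape
    M = A.shape[1]
    gamma = np.ones(M)
    sigma2 = 1e-2                                   # initial noise variance (assumed)
    for _ in range(n_iter):
        Gamma = np.diag(gamma)
        Sigma_Y = sigma2 * np.eye(N) + A @ Gamma @ A.T
        G = Gamma @ A.T @ np.linalg.inv(Sigma_Y)    # Gamma A^T Sigma_Y^{-1}
        X = G @ Y                                   # posterior mean, Eq. (14)
        diag_ratio = 1.0 - np.einsum('ij,ji->i', G, A)  # Sigma_ii / gamma_i
        sigma2 = (np.linalg.norm(Y - A @ X, 'fro') ** 2 / L) / (
            N - M + np.sum(diag_ratio))             # Eq. (25)
        gamma = np.linalg.norm(X, axis=1) + eps     # Eq. (20) with z_i = 1 (assumed)
    return X
```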

Experimental Results and Analysis

In this section, experiments are conducted on four datasets, and we compare the proposed RMSBL algorithm with four widely used methods: CRD, CSCR, RX, and LRX. The parameters of each algorithm are optimized in the experiments, and the receiver operating characteristic (ROC) curve is employed to quantitatively evaluate the detection performance. We also compute the area under the ROC curve (AUC) to compare RMSBL with the other algorithms.
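For completeness, the ROC curve and AUC can be computed with scikit-learn (an implementation choice assumed here, not the tooling used in the original experiments):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def evaluate_detector(scores, labels):
    """ROC curve and AUC for per-pixel detector scores against a binary
    ground-truth map (1 = target, 0 = background)."""
    fpr, tpr, _ = roc_curve(np.ravel(labels), np.ravel(scores))
    return fpr, tpr, auc(fpr, tpr)
```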

Hyperspectral Data

The first experimental dataset, namely, airport2, uses a portion of the hyperspectral image of Los Angeles Airport that was collected by the airborne visible/infrared imaging spectrometer (AVIRIS) sensor. This scene consists of 100 × 100 pixels (as shown in Fig. 1a) and the spatial resolution is 7.1 m. After removing the water absorption and low-SNR bands, 205 bands remain, including 87 target pixels to be detected. Figure 1b shows the ground-truth image of the target.

Fig. 1

The first column shows the color composites of four datasets and the second column shows the ground-truth map of the target. a, b Airport2. c, d Cuprite. e, f San Diego. g, h The HYDICE Urban scene

The second dataset, namely, Cuprite, was captured by the AVIRIS sensor in 1997 over the Cuprite mine in Nevada. Only a small part of the data is used in this experiment. Figure 1c and d show the color composites of Cuprite and the ground-truth image of the target, respectively. This scene consists of 250 × 191 pixels. After removing the water absorption and low-SNR bands, 188 bands remain, including approximately 35 to 40 target pixels to be detected.

The third dataset, San Diego, uses a portion of the hyperspectral image of the San Diego Airport in the USA that was collected by the AVIRIS sensor. This scene consists of 200 × 200 pixels (as shown in Fig. 1e) and the spatial resolution is 3.5 m. After removing the water absorption and low-SNR bands, 189 bands remain, including approximately 132 target pixels to be detected. Figure 1f shows the ground-truth image of the target.

The fourth dataset, the HYDICE Urban scene, is a hyperspectral image of a suburban residential area in Texas, USA, that was captured by the Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensor. Figure 1g and h show the color composites of the HYDICE Urban scene and the ground-truth image of the target, respectively. This scene consists of 80 × 100 pixels, and the spatial resolution is approximately 1 m. After removing the water absorption and low-SNR bands, 162 bands remain, including approximately 21 target pixels to be detected.

Detection Performance

The RMSBL, CRD, CSCR, and LRX algorithms all use a background dictionary. In practical target detection, the background dictionary is typically obtained via local, adaptive methods. For the CRD, CSCR, and LRX algorithms, the sliding dual-window method is used to obtain the background dictionary; hence, the sizes of the inner and outer windows affect performance, and these parameters are tuned during the simulation experiments. The RMSBL algorithm obtains its dictionary via the vertex component analysis (VCA) method; thus, the RMSBL and RX algorithms are not affected by the dual-window scheme.

We conduct an experimental simulation on the four datasets that are described above and analyze the results of the RMSBL algorithm and the four comparison algorithms in terms of their ROC curves and AUC values.

To optimize the performance of each algorithm on the airport2 dataset, the parameters were determined after extensive experiments as follows: for the CRD algorithm, the outer window size is wout = 11, the inner window size is win = 5, and the regularization parameter is λ = 10−6; for the CSCR algorithm, the window sizes are (wout, win) = (11, 3) and the regularization parameters are λ1 = 10−2 and λ2 = 10−1; and for the LRX algorithm, the window sizes are (wout, win) = (15, 3). The detection outputs of the algorithms are shown in Fig. 2; the proposed RMSBL algorithm yields the best result. Figure 6a shows the ROC curves of the proposed algorithm and the comparison algorithms. The detection rate of the RMSBL algorithm is lower than those of the other algorithms when the false alarm rate is below 10−2; however, once the false alarm rate exceeds 10−2, it increases rapidly, becomes significantly higher than those of the other algorithms, and reaches 1 first. On this dataset, the CRD algorithm is inferior to RMSBL but outperforms the other algorithms.

Fig. 2

Detection outputs for dataset airport2. a RMSBL. b CRD. c CSCR. d LRX. e RX

For the Cuprite dataset, the parameters of each algorithm are as follows: for the CRD algorithm, the window sizes are (wout, win) = (11, 5) and the regularization parameter is λ = 10−6; for the CSCR algorithm, the window sizes are (wout, win) = (11, 5) and the regularization parameters are λ1 = 10−1 and λ2 = 10−2; and for the LRX algorithm, the window sizes are (wout, win) = (13, 9). Figure 3 shows the outputs of the proposed algorithm and the comparison algorithms, and the ROC curves are plotted in Fig. 6b. The experimental results demonstrate that the RMSBL algorithm far outperforms the other algorithms in terms of detection probability: its detection rate reaches 1 while the false alarm rate is still below 10−2, so the RMSBL algorithm performs well on this dataset. The RX algorithm is inferior to RMSBL but outperforms the other algorithms.

Fig. 3

Detection outputs for the Cuprite dataset. a RMSBL. b CRD. c CSCR. d LRX. e RX

For the San Diego dataset, the parameters of each algorithm are as follows: for the CRD algorithm, the window sizes are (wout, win) = (17, 9) and the regularization parameter is λ = 10−6; for the CSCR algorithm, the window sizes are (wout, win) = (7, 5) and the regularization parameters are λ1 = 10−2 and λ2 = 10−1; and for the LRX algorithm, the window sizes are (wout, win) = (13, 7). Figure 4 shows the outputs of the algorithms. Figure 6c shows the ROC curves, according to which the detection probability of the RMSBL algorithm exceeds those of the other algorithms; its detection rate reaches 1 when the false alarm rate approaches 10−1. On this dataset, the CRD algorithm is inferior to RMSBL but outperforms the other algorithms.

Fig. 4

Detection outputs for the San Diego dataset. a RMSBL. b CRD. c CSCR. d LRX. e RX

For the HYDICE dataset, the parameters of each algorithm are as follows: for the CRD algorithm, the window sizes are (wout, win) = (13, 7) and the regularization parameter is λ = 10−6; for the CSCR algorithm, the window sizes are (wout, win) = (9, 5) and the regularization parameters are λ1 = 10−2 and λ2 = 10−1; and for the LRX algorithm, the window sizes are (wout, win) = (13, 7). Figure 5 shows the outputs of the algorithms, and the ROC curves are shown in Fig. 6d. When the false alarm rate is below 10−3, the detection rate of the RMSBL algorithm is low; once the false alarm rate exceeds 10−3, the detection rate increases rapidly and reaches 1 at a false alarm rate of 10−2.

Fig. 5

Detection outputs for the HYDICE Urban scene dataset. a RMSBL. b CRD. c CSCR. d LRX. e RX

Fig. 6

ROC performance of the proposed method. a Airport2 dataset. b Cuprite dataset. c San Diego dataset. d HYDICE dataset

The AUC values of the algorithms are listed in Table 2, from which the performance of each algorithm can be judged more precisely. The AUC value of RMSBL is the largest on every dataset; that is, its performance is the best. On the Cuprite and HYDICE datasets, the AUC value of the RX algorithm is only slightly smaller than that of RMSBL; however, the RX algorithm does not perform well on the airport2 and San Diego datasets. The RMSBL algorithm performs well on all test datasets, especially Cuprite and HYDICE, on which its AUC value is close to 1.

Table 2 AUC (%) values for the proposed algorithm and the comparison algorithms

Finally, we report the computational cost of the compared detection methods with their optimal parameters. All experiments were conducted in MATLAB R2014a on a machine with an Intel Core i5-3470 CPU and 12 GB of RAM. The execution times (in seconds) on the experimental datasets are listed in Table 3. All algorithms except RX have higher computational costs than RMSBL.

Table 3 Execution times (in seconds) on all experimental datasets

Conclusions

This paper proposes a hyperspectral target detection algorithm based on RMSBL. The weight coefficients are calculated by L2,1-norm regularization, and the target and background residuals are obtained; target detection is then achieved by evaluating the difference between the two residuals. The proposed method is compared with the CRD, CSCR, LRX, and RX methods on four datasets, and the results demonstrate that it outperforms these state-of-the-art methods. In future research, we will explore deep learning methods for solving for the weight coefficients to achieve better results.