1 Introduction

Space-time adaptive processing (STAP) is a mature technique used in airborne radar to improve the signal-to-clutter-plus-noise ratio (SCNR) of slow-moving targets in severe clutter environments. It adaptively constructs a two-dimensional space-time optimal filter to suppress the spread ground clutter [1]. However, the optimal STAP is impractical in real applications due to its high computational load and the unknown clutter-and-noise covariance matrix (CNCM) [2, 3].

To overcome the aforementioned challenges of optimal STAP, the sample covariance matrix (SCM) is used in place of the true CNCM. Reed et al. proved that at least twice the system degrees of freedom (DoFs) of independent and identically distributed (i.i.d.) training samples are required to keep the SCNR loss within 3 dB [4]. Unfortunately, it is hard to obtain sufficient i.i.d. training samples during the short detection period, especially in a heterogeneous environment [5]. Many algorithms have been designed to relax this sample requirement, mainly falling into the reduced-dimension (RD) and reduced-rank (RR) categories. RD-STAP reduces the system DoFs with a fixed linear transformation, so that the number of required training samples decreases to twice the adaptive DoFs. Traditional RD-STAP algorithms include the extended factored approach (EFA) [6], joint domain localized (JDL) processing [7] and sum-and-difference beamforming (SDBF) [8]. Among them, JDL-STAP and SDBF-STAP are sensitive to array errors and need additional channels to improve robustness, which increases the training-sample requirement; EFA is more robust against spatial errors but suffers performance loss at low pulse repetition frequency (LPRF). RR-STAP projects the echo onto a lower-dimensional subspace by a data-dependent transformation, reducing the sample requirement to twice the clutter rank. Typical RR-STAP methods include the principal component (PC) method [9], the cross-spectral metric (CSM) [10] and the multistage Wiener filter (MSWF) [11]. Despite their success in reducing the sample requirement, the performance of these methods is still constrained by the secondary data available in practical applications [12].

STAP combined with sparse recovery (SR) techniques has attracted great interest in recent years. SR-STAP methods exploit the intrinsic sparsity of the clutter in the angle-Doppler domain to recover the clutter with advanced SR algorithms, and they can achieve an accurate estimation of the CNCM with insufficient training samples, even a single one. Although SR-STAP has great potential, it still carries an unacceptably heavy computational burden, which makes it unrealizable in applications. Generally speaking, SR algorithms approach the optimal solution by successive iterations, which incurs a large time cost in repetitive operations, particularly in the region near the ideal solution. Several studies have been devoted to reducing the computational complexity. Fast sparse Bayesian learning (SBL) turns the sparse problem into a maximum a posteriori (MAP) estimation and searches the sparse coefficients within the Bayesian framework [13], which removes the influence of the regularization parameter. The RD approach transforms the global optimization problem into a local one with RD techniques such as beam-space post-Doppler [14], which cuts down the dimension of the variables and yields significant computational savings. Spectrum information can also be used to design a more suitable dictionary [15, 16]; this approach is easy to implement and reduces the iteration time significantly, but a relatively small number of atoms may destroy the stability of the algorithms [17, 18].

Considering the hardware complexity limitations of practical applications, the spectrum-aided approach appears to be the most feasible of the above fast methods; however, its performance and stability cannot meet application requirements, because the sparsity of the residual grids after spectrum-aided screening is weaker than before. In this paper, noise grids are introduced as additional information to improve the sparsity of the spectrum-aided methods, and a novel RD dictionary is proposed; unlike the conventional overcomplete dictionary, only a few noise grids are used in the proposed method. It should be pointed out that our method focuses on improving the SR-STAP model rather than the solving algorithms. Based on this core idea, we design a two-stage grid screening structure to make the best use of the spectrum information and to discard useless grids as much as possible. In the proposed algorithm, the region segmentation criterion is redesigned so that the whole angle-Doppler plane is separated into three areas according to their contribution to the clutter, and three different selection strategies are then performed independently in their respective areas to select suitable space-time grids. Numerical experiments verify the performance of the proposed method.

The remainder of this paper is arranged as follows: Sect. 2 establishes the basic signal model, including the system geometry and the components of the radar echo, and introduces conventional STAP. Section 3 describes the SR-STAP method, and Sect. 4 analyzes the contributions of clutter grids and noise grids to performance. Section 5 gives a detailed description of the proposed RD dictionary construction algorithm. In Sect. 6, numerical experiments are carried out to verify the performance of the RD dictionary. Finally, Sect. 7 concludes the paper.

2 Basic signal model and conventional STAP principle

Consider a phased-array pulsed Doppler (PD) radar mounted on a moving platform that flies along the x-axis with a constant velocity v at height H, as shown in Fig. 1. The PD radar transmits M pulses with a fixed PRF \(f_{r}\) in each coherent processing interval (CPI), and the transmitted signal is a narrow-band modulated signal with carrier frequency \(f_{c}\). Assume that the antenna is a uniform linear array (ULA) of N ideal omnidirectional elements uniformly spaced at interval d (usually half a wavelength). Our analysis focuses on the side-looking mode, i.e., crab angle \(\theta _{p}=0^{\circ }\), and the array length \(L_{a}\) is far smaller than the slant range R between the platform and the scatterer, so all elements can be viewed as sharing the same incidence direction as the reference element. \(\varPsi\), \(\theta\) and \(\phi\) are the cone angle, azimuth angle and elevation angle of the array relative to the ground patch, respectively.

Fig. 1 The geometry of airborne radar illumination

Through the receiving process and matched filtering, the multi-channel echo signal can be organized as an \(N \times M \times L\) three-dimensional data cube, where N, M and L are the numbers of spatial channels, pulses (slow time) and range samples (snapshots), respectively. For a given range cell, the target detection problem can be formulated as the following binary hypothesis test

$$\begin{aligned} {\left\{ \begin{array}{ll} H_1:\varvec{x}=\varvec{x}_{\varvec{c}}+\varvec{x}_{\varvec{t}} +\varvec{n}\\ H_0:\varvec{x}=\varvec{x}_{\varvec{c}}+\varvec{n} \end{array}\right. } \end{aligned}$$
(1)

Hypothesis \(H_1\) denotes the case of target presence and \(H_0\) the case of target absence. \(\varvec{x}_{\varvec{c}} \in \mathbb {C}^{ NM \times 1 }\) is the clutter component of the echo signal, \(\varvec{x}_{\varvec{t}} \in \mathbb {C}^{ NM \times 1 }\) is the target component, and \(\varvec{n}\) is the \(NM \times 1\) white Gaussian noise vector. Note that interference is not considered in Eq. (1) because it is not a key point of this paper. In the Ward clutter model [19], the received clutter is expressed as the sum of the reflections of small, mutually independent clutter patches. Based on this assumption, if no ambiguity appears in the current range cell, \(\varvec{x}_{\varvec{c}}\) can be approximately written as

$$\begin{aligned} \varvec{x}_{\varvec{c}}= \sum _{i=1}^{N_c} \gamma _i \varvec{s}(f_i^s,f_i^d) \end{aligned}$$
(2)

where \(N_c\) is the total number of divided clutter patches, \(\gamma _i\) is the complex scattering coefficient of the ith clutter patch, and \(\varvec{s}(f_i^s,f_i^d)\) represents the \(NM \times 1\) complex space-time steering vector, where \(f_i^s\) and \(f_i^d\) are the spatial frequency and Doppler frequency, given by \(f_i^s= \frac{d}{\lambda }\cos \varphi \cos \theta _i\) and \(f_i^d=\frac{2v}{\lambda f_r} \cos \varphi \cos (\theta _i + \theta _p)\), respectively. \(\lambda\) denotes the radar wavelength. Furthermore, \(\varvec{s}(f_i^s,f_i^d)\) can be written as

$$\begin{aligned} \varvec{s}(f_i^s)= & {} [1,e^{j2 \pi f_i^s},\ldots ,e^{j2 \pi n f_i^s},\ldots ,e^{j2 \pi (N-1) f_i^s}]^T \nonumber \\ \varvec{s}(f_i^d)= & {} [1,e^{j2 \pi f_i^d},\ldots ,e^{j2 \pi m f_i^d},\ldots ,e^{j2 \pi (M-1) f_i^d}]^T \nonumber \\ \varvec{s}(f_i^s,f_i^d)= & {} \varvec{s}(f_i^s) \otimes \varvec{s}(f_i^d) \end{aligned}$$
(3)

where \([ \cdot ]^T\) denotes matrix transposition and \(\otimes\) denotes the Kronecker product. n and m are the indices of the antenna element and the pulse, respectively. The formation of \(\varvec{x}_{\varvec{t}}\) is similar to that of \(\varvec{x}_{\varvec{c}}\) and can be written as

$$\begin{aligned} \varvec{x}_{\varvec{t}}= \sum _{i=1}^{N_t} \xi _i \varvec{s}(f_i^s,f_i^d) \end{aligned}$$
(4)

where \(\xi _i\) is the complex amplitude of the ith moving target in the current range cell. The only difference between (4) and (2) is \(N_t\), the number of moving targets in the cell under test (CUT). Finally, the echo under \(H_1\) can be rewritten as

$$\begin{aligned} \varvec{x}= \sum _{i=1}^{N_c} \gamma _i \varvec{s}(f_i^s,f_i^d)+\sum _{i=1}^{N_t} \xi _i \varvec{s}(f_i^s,f_i^d)+\varvec{n} \end{aligned}$$
(5)
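
To make Eqs. (3)-(5) concrete, the following minimal NumPy sketch builds the space-time steering vectors and synthesizes a clutter-plus-target-plus-noise snapshot. The helper names, grid choices and parameter values are illustrative assumptions rather than part of the paper.

```python
import numpy as np

def steering_vector(fs, fd, N, M):
    """Space-time steering vector of Eq. (3): spatial steering (Kronecker) temporal steering."""
    s_spatial = np.exp(1j * 2 * np.pi * fs * np.arange(N))    # N-element spatial steering
    s_temporal = np.exp(1j * 2 * np.pi * fd * np.arange(M))   # M-pulse temporal steering
    return np.kron(s_spatial, s_temporal)                     # NM x 1 vector

def simulate_snapshot(N, M, clutter, targets, sigma2_n=1.0, rng=None):
    """Echo of Eq. (5): clutter patches + moving targets + white Gaussian noise.
    'clutter' and 'targets' are lists of (amplitude, fs, fd) tuples (illustrative)."""
    rng = rng or np.random.default_rng(0)
    x = np.zeros(N * M, dtype=complex)
    for amp, fs, fd in list(clutter) + list(targets):
        x += amp * steering_vector(fs, fd, N, M)
    noise = np.sqrt(sigma2_n / 2) * (rng.standard_normal(N * M)
                                     + 1j * rng.standard_normal(N * M))
    return x + noise

# Example: side-looking clutter ridge, fd = fs (assumes 2v/(fr*d) = 1 and d = lambda/2)
N, M = 8, 10
fs_ridge = 0.5 * np.cos(np.linspace(0.0, np.pi, 180))         # 180 clutter patches
clutter = [(1.0, fs, fs) for fs in fs_ridge]
x = simulate_snapshot(N, M, clutter, targets=[(0.1, 0.1, 0.25)])
```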

The \(NM \times NM\) CNCM \(\varvec{R}\) of the received signal in the current snapshot is then

$$\begin{aligned} \varvec{R}= \sum _{i=1}^{N_c} | \gamma _i |^2 \varvec{s}(f_i^s,f_i^d) \varvec{s}^{H}(f_i^s,f_i^d) + \sigma _n^2 \varvec{I} \end{aligned}$$
(6)

where \(| \cdot |\) denotes the absolute value and \(( \cdot )^H\) the conjugate transpose. \(\sigma _n^2\) represents the noise variance of the current test cell and \(\varvec{I}\) is the \(NM \times NM\) identity matrix. It should be noted that the targets are not included in the CNCM. For conventional STAP methods, the adaptive filter weight is designed according to

$$\begin{aligned} \varvec{w}= \mu \varvec{R}^{-1} \varvec{s}_{\varvec{t}} \end{aligned}$$
(7)

where \(\mu =\frac{1}{\varvec{s}_{\varvec{t}}^H \varvec{R}^{-1} \varvec{s}_{\varvec{t}} }\) is the normalization factor of the weight. \(\varvec{R}^{-1}\) denotes the inverse of the CNCM, which forms the notch at the clutter locations, and \(\varvec{s}_{\varvec{t}} \in \mathbb {C}^{NM \times 1}\) is the space-time steering vector of the expected target in the angle-Doppler domain, whose spatial angle is the azimuth of illumination and whose Doppler frequency is that of the tested Doppler channel. In practice, \(\varvec{R}\) is unknown and is usually replaced by its estimate \(\hat{\varvec{R}}\), calculated from the secondary samples near the CUT as

$$\begin{aligned} \hat{\varvec{R}}= E\{\varvec{X} \varvec{X}^H\} \end{aligned}$$
(8)

where \(E \{ \cdot \}\) denotes the expectation over range samples. Note that this estimation assumes the training samples satisfy the i.i.d. condition; in a non-ideal environment it suffers from performance loss. \(\varvec{X}=[\varvec{x}_{\varvec{1}},\ldots ,\varvec{x}_{\varvec{l}},\ldots ,\varvec{x}_{\varvec{L}}]\) is the sample data matrix, where \(\varvec{x}_{\varvec{l}}\) is the \(NM \times 1\) complex vector of the lth range cell.
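
As a minimal sketch of Eqs. (7) and (8), the SCM estimate and the adaptive weight can be written as follows, reusing the hypothetical steering_vector helper from the previous sketch; the optional diagonal loading is an implementation choice for numerical conditioning, not part of the paper.

```python
import numpy as np

def scm(X):
    """Sample covariance matrix of Eq. (8); X is the NM x L secondary data matrix."""
    return X @ X.conj().T / X.shape[1]

def stap_weight(R_hat, s_t, loading=0.0):
    """Adaptive weight of Eq. (7), normalized by mu = 1 / (s_t^H R^-1 s_t)."""
    NM = R_hat.shape[0]
    R_inv = np.linalg.inv(R_hat + loading * np.eye(NM))   # loading is optional
    w = R_inv @ s_t
    return w / (s_t.conj().T @ R_inv @ s_t)

# Usage (X_secondary: NM x L training data, x_cut: snapshot of the cell under test):
# R_hat = scm(X_secondary)
# w = stap_weight(R_hat, steering_vector(0.1, 0.25, N, M))
# y = w.conj().T @ x_cut
```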

3 SR-STAP method

Different from the traditional STAP methods, SR-STAP discretizes the continuous angle-Doppler plane into \(N_d N_s\) grid points (\(N_d\) points along the Doppler axis, \(N_s\) points along the angle axis), usually with \(N_s=\beta _s N\) and \(N_d=\beta _d M\), where \(\beta _s\) and \(\beta _d\) are the expansion coefficients in the angle and Doppler dimensions. The number of grids needs to satisfy \(N_d N_s\gg NM\), although this is not a strict condition. The space-time steering vectors of all grids are then used to construct the \(NM \times N_d N_s\) overcomplete dictionary \(\varvec{\varPhi }\) as

$$\begin{aligned} \varvec{\varPhi }= [\varvec{s}_{1,1},\ldots ,\varvec{s}_{n_d,n_s},\ldots ,\varvec{s}_{N_d,N_s}] \end{aligned}$$
(9)

where \(\varvec{s}_{n_d,n_s}\) is the space-time steering vector corresponding to the grid with the \(n_s\)th spatial angle and \(n_d\)th Doppler frequency. Then the clutter echo in Eq. (2) can be rewritten according to (9) as

$$\begin{aligned} \varvec{x}_{\varvec{c}}= \varvec{\varPhi } \varvec{\gamma } \end{aligned}$$
(10)

with

$$\begin{aligned} \varvec{\gamma }= [\gamma _1,\ldots ,\gamma _{N_d N_s}]^T \end{aligned}$$
(11)

where \(\varvec{\gamma }\) is the vector of complex amplitudes of the grids in the angle-Doppler domain. The SR-STAP problem can then be modeled as

$$\begin{aligned}&\min _{\varvec{\gamma }}||\varvec{\gamma }||_1 \nonumber \\&\quad \text{ s.t. } ||\varvec{\varPhi } \varvec{\gamma } - \varvec{x}||_2 \le \varepsilon \end{aligned}$$
(12)

where \(\min {(\cdot )}\) denotes minimizing the objective function. \(||\varvec{\gamma }||_1=\sum _{i=1}^{N_d N_s}|\gamma _i|\) is the \(l_1\)-norm, which describes the sparsity of the recovery, and \(||\cdot ||_2\) is the \(l_2\)-norm, which measures the similarity between the recovered solution and the original data. \(\varepsilon\) is the noise threshold, usually a small value in the optimization. It is worth pointing out that the original sparse recovery problem uses the \(l_0\)-norm; here we adopt the \(l_1\)-norm relaxation [20]. Finally, the estimate \(\hat{\varvec{R}}_{sr}\) of SR-STAP can be easily calculated by

$$\begin{aligned} \hat{\varvec{R}}_{sr}= \varvec{\varPhi } diag(|\varvec{\gamma }|^2) \varvec{\varPhi }^{H} \end{aligned}$$
(13)

where \(diag(\cdot )\) denotes the diagonalization operation. The remaining steps are the same as those of the conventional STAP filter design in Eq. (7).
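
As a concrete illustration of Eqs. (9)-(13), the sketch below builds a uniform angle-Doppler dictionary, solves the \(l_1\)-relaxed problem (12) with CVXPY (the paper uses CVX; CVXPY is assumed here as a Python counterpart), and reconstructs the CNCM via Eq. (13). The grid limits, the added noise floor and all names are assumptions.

```python
import numpy as np
import cvxpy as cp

def build_dictionary(N, M, beta_s=6, beta_d=6):
    """Overcomplete dictionary of Eq. (9): NM x (Nd*Ns), angle-major column order."""
    Ns, Nd = beta_s * N, beta_d * M
    fs_grid = np.linspace(-0.5, 0.5, Ns, endpoint=False)
    fd_grid = np.linspace(-0.5, 0.5, Nd, endpoint=False)
    atoms = [np.kron(np.exp(1j * 2 * np.pi * fs * np.arange(N)),
                     np.exp(1j * 2 * np.pi * fd * np.arange(M)))
             for fs in fs_grid for fd in fd_grid]
    return np.column_stack(atoms)

def sr_stap_cncm(Phi, x, eps=1e-6, sigma2_n=1.0):
    """l1 relaxation of Eq. (12) followed by the CNCM reconstruction of Eq. (13)."""
    gamma = cp.Variable(Phi.shape[1], complex=True)
    cp.Problem(cp.Minimize(cp.norm(gamma, 1)),
               [cp.norm(Phi @ gamma - x, 2) <= eps]).solve()
    g = np.abs(gamma.value) ** 2
    R_sr = (Phi * g) @ Phi.conj().T                   # Eq. (13) without forming diag(.)
    return R_sr + sigma2_n * np.eye(Phi.shape[0])     # noise floor added as an assumption
```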

4 The contribution of different type grids

The essence of (12) is to solve an underdetermined equation \((N_d N_s \gg NM)\), so numerous solutions exist for the SR-STAP problem. In practice, however, an implicit constraint from the real environment, namely that the clutter ridge occupies only a small area of the whole space-time plane [21], is embedded in the raw data, which means that \(\varvec{\gamma }\) is highly sparse and can be recovered by SR theory. The overcomplete dictionary \(\varvec{\varPhi }\) plays a vital role in the recovery; however, as Eq. (10) shows, the redundancy of \(\varvec{\varPhi }\) increases significantly with \(\beta _s\) and \(\beta _d\), which leads to a large computation. In fact, based on the prior knowledge of the clutter ridge, most atoms of \(\varvec{\varPhi }\) have little effect on performance, so the question is whether the number of atoms in \(\varvec{\varPhi }\) can be reduced while keeping the sparsity of the signal. Generally speaking, more grids should be added in the clutter area and fewer in the noise area, as in [16]. However, when the number of noise grids is reduced to zero [15], the recovery becomes poor because the sparsity decreases. This illustrates that the number of clutter atoms determines the accuracy of the recovery, while the noise atoms maintain the sparsity of the signal. Therefore, a more concrete selection principle can be given: to increase the recovery accuracy, more clutter grids should be selected, while enough noise grids must also be kept to maintain the sparsity of the signal. The following RD dictionary design is mainly based on this principle.

5 RD dictionary construction strategy

As shown in Fig. 2, the proposed RD dictionary construction consists of four parts: spectrum extraction, region segmentation, grid screening and RD dictionary construction. The prior knowledge hidden in the echo signal is extracted in step one and then used to exclude useless grids in step three. Finally, in step four, the RD transformation matrix is formed from the retained grids. A detailed description of each part follows.

Fig. 2 The processing structure of the screening strategy

5.1 Extracting spectrum information

Spectrum analysis offers a set of transformations that convert a time-domain signal into the frequency domain, such as the Fourier transform (FT), the Capon method, Multiple Signal Classification (MUSIC) and Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [22]. Among these, the FT is the most commonly used in practice due to its low complexity. In the angle-Doppler domain, analogously to the FT, the angle-Doppler transform can be implemented with the space-time steering vector basis as

$$\begin{aligned} P_k^j=|\varvec{s}_{k,j}^H \tilde{\varvec{R}} \varvec{s}_{k,j}|,\quad k=1, \ldots , N_s,\ j=1, \ldots , N_d \end{aligned}$$
(14)

where \(\tilde{\varvec{R}}\in \mathbb {C}^{NM \times NM}\) is the clutter covariance matrix estimate obtained from the current CUT, and \(\varvec{s}_{k,j} \in \mathbb {C}^{NM \times 1}\) denotes the space-time steering vector corresponding to the kth angle and jth Doppler grid. Different from the conventional FT, Eq. (14) can be evaluated at any grid in the angle-Doppler plane; however, its resolution is low. The Capon transform is more accurate than the FT and can be expressed as

$$\begin{aligned} P_k^j=\frac{1}{|\varvec{s}_{k,j}^H \tilde{\varvec{R}}^{-1} \varvec{s}_{k,j}|},k=1, \ldots , N_s, j=1,\ldots , N_d \end{aligned}$$
(15)

The disadvantage of the Capon method is the large computational cost of the matrix inversion, so it only suits situations with small input data or weak real-time requirements. Other high-accuracy algorithms are also unsuitable here due to their long running times.
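
A minimal sketch of the FT and Capon spectra of Eqs. (14) and (15) follows; the steering argument is assumed to be a callable returning the \(NM \times 1\) space-time steering vector (e.g., a wrapper around the steering_vector helper used earlier).

```python
import numpy as np

def angle_doppler_spectrum(R_tilde, steering, fs_grid, fd_grid, method="ft"):
    """Power spectrum on an arbitrary angle-Doppler grid:
    Eq. (14) for method='ft', Eq. (15) for method='capon'."""
    if method == "capon":
        R_inv = np.linalg.inv(R_tilde)                  # the costly step noted above
    P = np.zeros((len(fs_grid), len(fd_grid)))
    for k, fs in enumerate(fs_grid):
        for j, fd in enumerate(fd_grid):
            s = steering(fs, fd)
            if method == "ft":
                P[k, j] = np.abs(s.conj().T @ R_tilde @ s)      # Eq. (14)
            else:
                P[k, j] = 1.0 / np.abs(s.conj().T @ R_inv @ s)  # Eq. (15)
    return P

# Usage: P = angle_doppler_spectrum(R_tilde, lambda fs, fd: steering_vector(fs, fd, N, M),
#                                   fs_grid, fd_grid, method="capon")
```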

Another alternative is the least squares (LS) algorithm, which is based on optimization theory [23]; its spectrum estimation problem can be described as

$$\begin{aligned} \min {||\varvec{\varPhi } \hat{\varvec{P}}-\varvec{x}||}_2 \end{aligned}$$
(16)

where \(\hat{\varvec{P}}\) is the vectorization of the power matrix \(\varvec{P}\) and \(\varvec{x}\) represents the raw echo data. The problem can be solved directly in closed form without searching: setting the derivative of the objective function (16) to zero yields

$$\begin{aligned} \varvec{P}=mat[(\varvec{\varPhi }^T \varvec{\varPhi })^{-1} \varvec{\varPhi }^T \varvec{x}] \end{aligned}$$
(17)

where \(mat[ \cdot ]\) is the reshaping operation that converts the \(N_d N_s \times 1\) vector into an \(N_s \times N_d\) matrix. After that, threshold segmentation is used to extract the clutter distribution from the spectrum [24]. Since the aim of this step is only to acquire the rough clutter area rather than the exact distribution, the mean of the angle-Doppler spectrum is sufficient as the threshold, written as

$$\begin{aligned} Th=\frac{1}{N_s N_d} \sum _{i=1}^{N_s} \sum _{j=1}^{N_d} P_{i,j} \end{aligned}$$
(18)

Some practical suggestions should be mentioned. In a real application, large amounts of data usually need to be processed at the same time. To accelerate the transformation, (14) can be computed by the fast Fourier transform (FFT), at the price of being restricted to uniformly spaced grids. The most time-consuming step of (15) and (17) is the matrix inversion, which can be computed via LU factorization. On the other hand, a more accurate threshold method brings only a small improvement in the final result at a much higher computational cost, so a complex segmentation method is not recommended here.
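
A short sketch of the LS spectrum of Eq. (17) and the mean threshold of Eq. (18) is given below; the pseudo-inverse is used purely for numerical robustness, and the reshape assumes the angle-major column ordering of the earlier dictionary sketch (both are assumptions).

```python
import numpy as np

def ls_spectrum(Phi, x, Ns, Nd):
    """LS amplitude map in the spirit of Eqs. (16)-(17); mat[.] becomes a reshape."""
    p_hat = np.linalg.pinv(Phi) @ x          # pinv instead of the explicit normal equations
    return np.abs(p_hat).reshape(Ns, Nd)     # Ns x Nd spectrum map

def clutter_mask(P):
    """Mean threshold of Eq. (18): grids above the average form the rough clutter area."""
    return P > P.mean()                      # boolean Ns x Nd clutter map
```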

5.2 Region segmentation

In this section, a sliding-window processing structure is adopted to achieve the region segmentation. As shown in Fig. 3, the four vertices of the window are selected as test grids, which are used to determine the type of the area. The angle-Doppler plane can be separated into three types according to these test grids. The first is the main clutter area, in which all grids of the window belong to the clutter portion of the spectrum distribution; conversely, in the noise area all grids are noise components. The clutter edge area lies on the boundary between clutter and noise, so it contains both clutter and noise components. During processing, the test window slides successively over the whole plane and the type of the current area is checked from its test grids. This second-stage division aims to shrink the clutter and noise areas and to isolate weakly contributing grids from those regions.

Fig. 3 The classification diagram of different areas
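
A minimal sketch of the sliding-window segmentation described above, assuming the boolean clutter map from the threshold step and non-overlapping \(a \times a\) windows (the windowing policy is an assumption; the paper does not fix it here):

```python
import numpy as np

MAIN_CLUTTER, CLUTTER_EDGE, NOISE = 0, 1, 2

def segment_regions(mask, a):
    """Label each a x a window from its four corner test grids:
    all corners clutter -> main clutter, none -> noise, otherwise -> clutter edge."""
    Ns, Nd = mask.shape
    labels = np.empty((Ns // a, Nd // a), dtype=int)
    for wi in range(Ns // a):
        for wj in range(Nd // a):
            i, j = wi * a, wj * a
            corners = mask[[i, i, i + a - 1, i + a - 1],
                           [j, j + a - 1, j, j + a - 1]]    # the four test grids
            if corners.all():
                labels[wi, wj] = MAIN_CLUTTER
            elif not corners.any():
                labels[wi, wj] = NOISE
            else:
                labels[wi, wj] = CLUTTER_EDGE
    return labels
```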

5.3 Grids screening

The main clutter region contains the most effective clutter grids. In fact, the real ground clutter mainly locates on the clutter ridge (a straight line in side-looking mode, a curve in non-side-looking mode), but it spreads into the neighboring grids due to the grid mismatch problem. A slight change of these grids causes a severe influence on the recovery, so it is best to keep all the space-time grids in this region.

The clutter edge lies at the junction of the noise area and the main clutter area, and it occupies a relatively small part of the whole angle-Doppler space. Only marginal clutter grids are included in this area, i.e., their clutter power is lower than the main clutter power, so the clutter edge contributes little to the final recovery accuracy. This holds under the assumption that the number of edge grids is far smaller than the number of main clutter grids; when this assumption is not satisfied, the importance of the edge grids grows with their number and cannot be ignored. Fortunately, the number of discretized grids is usually large in most radar systems. Therefore, to reduce the computational complexity, the clutter edge grids are directly discarded in the proposed algorithm; although this causes a small performance loss, it further improves the efficiency of the proposed method.

The noise area is the most important region in the screening strategy because its grid number is far larger than that of the other regions, which means that its processing ultimately decides the size of the RD dictionary. Noise grids are employed as auxiliary grids to improve the sparsity of the main area, but they are essentially useless grids, so it is reasonable to keep only a few of them. Assume that the sliding window is small enough compared to the angle-Doppler domain and that neighboring grids change slowly; then the window can be approximately represented by a subset of its grids, and under this assumption the locations of the kept grids have no effect on the final result. Here, we define the deletion ratio of grids as

$$\begin{aligned} \eta =\frac{n_0}{a^2} \end{aligned}$$
(19)

where \(\eta\) is the deletion ratio and \(n_0\) is the number of deleted grids in the noise-area window. When the window length a is fixed, the upper boundary of \(\eta\) is \(\frac{a^2-1}{a^2}\), which means that only one grid in the window is kept. Combining this with the previous analysis, a random method is designed to delete grids: first, an index set of \(n_0\) grids to be deleted is generated with a random number generation (RNG) algorithm, and then the grids are excluded according to this index. In practice, a fixed index can replace the random index; the advantage is that no additional algorithmic complexity is needed, but more storage memory is required.
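
A sketch of the random screening inside one noise-area window following Eq. (19); representing the window by the flat dictionary indices of its grids is an illustrative assumption.

```python
import numpy as np

def screen_noise_window(window_indices, eta, rng=None):
    """Delete n0 = eta * a^2 randomly chosen grids of one noise window, keep the rest."""
    rng = rng or np.random.default_rng(0)
    a2 = len(window_indices)
    n0 = int(round(eta * a2))                         # number of deleted grids, Eq. (19)
    drop = rng.choice(a2, size=n0, replace=False)
    keep = np.setdiff1d(np.arange(a2), drop)
    return [window_indices[i] for i in keep]

# Example: a 2 x 2 noise window with eta = 3/4 keeps a single grid,
# matching the L_n = 2, eta = 3/4 setting used later in Sect. 6.
kept = screen_noise_window([100, 101, 148, 149], eta=3 / 4)
```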

5.4 RD transform

After the screening process, the remaining grids form the atoms of the RD dictionary, which can be constructed directly with a transformation matrix as

$$\begin{aligned} \tilde{\varvec{\varPhi }}=\varvec{\varPhi } \varvec{T} \end{aligned}$$
(20)

where \(\varvec{T}\) is the \(N_d N_s \times \hat{n}\) logical reduced-dimension transform matrix and \(\hat{n}\) denotes the number of atoms of the RD dictionary; in each column, a single 1 appears at the index of the corresponding retained grid and all other entries are 0. In practical applications, however, the transformation matrix may be very large due to the high dimension of the raw data. Here we offer a simple way to address this: first prepare an additional matrix \(\tilde{\varvec{\varPhi }}\) of size \(NM \times \hat{n}\), then copy the columns of the overcomplete dictionary into \(\tilde{\varvec{\varPhi }}\) one by one according to the indices of the remaining grids. The advantage of this method is that \(\varvec{T}\) does not need to be generated and the computation is less than that of Eq. (20).
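
Both constructions can be sketched in a few lines: the direct column copy corresponds to the memory-friendly alternative described above, while the explicit \(\varvec{T}\) form of Eq. (20) is shown only for comparison (all names are illustrative).

```python
import numpy as np

def rd_dictionary(Phi, kept_indices):
    """RD dictionary by direct column copy (the alternative described above)."""
    return Phi[:, kept_indices]                         # NM x n_hat

def rd_dictionary_via_T(Phi, kept_indices):
    """Equivalent construction with the logical transform matrix T of Eq. (20):
    one 1 per column, placed at the index of the retained grid."""
    T = np.zeros((Phi.shape[1], len(kept_indices)))
    T[kept_indices, np.arange(len(kept_indices))] = 1.0
    return Phi @ T
```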

6 Numerical experiments

In this section, numerical experiments are conducted to examine the performance of the proposed method. The simulation data are generated with the Ward clutter model [19], and the key radar system parameters are summarized in Table 1. In order to simulate a realistic illumination, we only consider azimuth angles from \(-90^{\circ }\) to \(90^{\circ }\) and set the number of clutter patches \(N_c=180\). In the SR-STAP methods, the expansion factor \(\beta\) is 6 and the noise level \(\varepsilon\) is \(10^{-6}\). The number of i.i.d. training samples is 2. For conciseness, the proposed methods are distinguished from the other SR-STAP methods by the prefix RD; the three variants are called RD-FT-SR-STAP, RD-LS-SR-STAP and RD-C-SR-STAP, respectively. Moreover, the deletion ratio \(\eta\) is fixed at \(\frac{3}{4}\) and the window length \(L_n\) is 2. CVX [25] is used as the solver for SR-STAP in the following experiments. To evaluate the performance of SR-STAP, the SCNR loss is used as the criterion, calculated by

$$\begin{aligned} \mathrm{SCNR\ Loss}=\frac{\sigma _n^2 |\varvec{w}^H \varvec{s}_{\varvec{t}}|^2}{MN(\varvec{w}^H \varvec{R} \varvec{w})} \end{aligned}$$
(21)

Ten random trials are run for each algorithm to guarantee the accuracy of the experiments.
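
For reference, a minimal implementation of the SCNR loss of Eq. (21) is given below; the conversion to dB is an assumption, made because the curves are reported on a dB scale.

```python
import numpy as np

def scnr_loss_db(w, s_t, R, sigma2_n):
    """SCNR loss of Eq. (21) for filter w, target steering s_t and the true CNCM R."""
    MN = len(s_t)
    num = sigma2_n * np.abs(w.conj().T @ s_t) ** 2
    den = MN * np.real(w.conj().T @ R @ w)
    return 10.0 * np.log10(num / den)
```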

Table 1 Radar system parameters

6.1 The comparison of SCNR loss

In the first experiment, we verify the SCNR loss of the proposed methods. As shown in Fig. 4, the notches of C-SR-STAP and FT-SR-STAP are far wider than those of the other methods. This is because the residual grids of these methods are insufficient to recover the CNCM. LS-SR-STAP performs better than C-SR-STAP and FT-SR-STAP because it includes more grids due to the low resolution of the LS method. However, compared with SR-STAP, these spectrum-aided SR-STAP methods suffer a severe performance degradation. The SCNR loss of the proposed methods is only slightly below that of SR-STAP and clearly higher than that of the spectrum-aided methods. This result indicates that pure clutter grids cannot support the recovery of the CNCM, while using a portion of the noise grids as supplementary information can effectively improve the performance of the spectrum-aided SR-STAP methods.

Fig. 4 The SCNR loss of different dictionaries

6.2 The effect of window size and deletion ratio

In this experiment, we analyze the influence of the window size and the deletion ratio on the proposed methods. Figure 5 shows the SCNR loss for different window sizes. For all three methods, the performance improves as the window size decreases, and the best result is obtained when \(L_n\) equals 2. However, when \(L_n\) increases to 5, they almost lose their effectiveness due to the insufficient number of noise-area grids. Among these methods, RD-LS-SR-STAP seems to perform better than the other two, but this is because the LS dictionary already includes more grids. As shown in Fig. 6, to evaluate the effect of the deletion ratio, we set \(L_n=3\) and take the deletion ratio \(\eta =[\frac{5}{9},\frac{6}{9},\frac{7}{9},\frac{8}{9}]\). There is almost no remarkable change for the three RD dictionaries at small window sizes, but the effect of \(\eta\) intensifies as \(L_n\) increases. Therefore, the window size is the main factor that decides the performance of the proposed methods. Based on the above analysis, we recommend the combination of \(L_n=2\) and \(\eta =\frac{3}{4}\), which removes the most noise grids in the application.

Fig. 5 The influence of different window sizes: a RD-FT-SR-STAP, b RD-LS-SR-STAP, c RD-C-SR-STAP

Fig. 6 The influence of different deletion ratios: a RD-FT-SR-STAP, b RD-LS-SR-STAP, c RD-C-SR-STAP

6.3 The effect of spatial error

In this section, we focus on the stability of the proposed methods in the presence of spatial errors. The Guerci error model [3] is used to generate the spatial error matrix and can be described as follows

$$\begin{aligned} f_u (\delta _{\epsilon _{a,n}})= & {} \frac{1}{\varDelta _{\epsilon _{a}}}, 0\le \delta _{\epsilon _{a,n}} \le \varDelta _{\epsilon _{a}}\nonumber \\ f_u (\delta _{\epsilon _{p,n}})= & {} \frac{1}{\varDelta _{\epsilon _{p}}}, -\frac{\varDelta _{\epsilon _{p}}}{2} \le \delta _{\epsilon _{p,n}} \le \frac{\varDelta _{\epsilon _{p}}}{2} \end{aligned}$$
(22)

where \(f_u (\delta _{\epsilon _{a,n}})\) and \(f_u (\delta _{\epsilon _{p,n}})\) are the uniform-distribution probability density functions (PDFs) of the amplitude error and the phase error, respectively. In practical applications, the gain error can easily be compensated by preprocessing, so this experiment only considers the phase error, and the final spatial error can be expressed as

$$\begin{aligned} \varGamma =e^{j\delta _{\epsilon _{p,n}}} \end{aligned}$$
(23)
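
A sketch of the error generation of Eqs. (22)-(23) for the phase-only case considered here; replicating the per-element error across the M pulses so that it multiplies the \(NM \times 1\) snapshot element-wise is an implementation assumption.

```python
import numpy as np

def spatial_phase_error(N, M, delta_p_deg=5.0, rng=None):
    """Per-element phase error drawn from the uniform PDF of Eq. (22),
    applied as Gamma = exp(j * delta) in Eq. (23)."""
    rng = rng or np.random.default_rng(0)
    delta_p = np.deg2rad(delta_p_deg)
    phase = rng.uniform(-delta_p / 2, delta_p / 2, size=N)   # one error per element
    gamma = np.exp(1j * phase)                               # Eq. (23)
    return np.kron(gamma, np.ones(M))                        # NM x 1, spatial-only error

# Usage: x_err = spatial_phase_error(8, 10) * x   # distort the simulated snapshot
```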

As shown in Fig. 7, the phase error fluctuation \(\varDelta _{\epsilon _{p}}\) is set to \(5^{\circ }\) here. It is not surprising that all the SR-STAP methods suffer a performance loss in the error case. Among them, FT-SR-STAP and C-SR-STAP show a rapid decrease in performance, hence the spectrum-aided methods are unstable under error conditions. In contrast, the proposed methods show only a slight decrease compared to the error-free case. This demonstrates that the proposed methods are more robust than the spectrum-aided methods; moreover, it also suggests that the noise grids play an important role in improving stability.

Fig. 7 The SCNR loss of different SR-STAP methods in the case of spatial phase error

6.4 Real measurement data experiment

In order to verify the effectiveness of the RD dictionaries in a real environment, we apply SR-STAP to the MCARM data [26]. In the 1990s, Rome Laboratory and Northrop Grumman developed the MCARM program: an L-band radar was installed on a BAC-111 and recorded echoes over different ground types, and these data reveal the properties of the real environment. The antenna array has 16 columns of 8 elements each, and every four elements are organized into one channel. The data file used is RL050575. The crab angle of the flight is \(\psi =7.28^{\circ }\), so the geometry can be regarded as side-looking. The radar wavelength is \(\lambda =0.2419\) m and the PRF is \(f_r=1984\) Hz. The element spacing is \(d=0.1092\) m and the platform speed is \(v \approx 100\) m/s. In the experiment, we select 8 channels and the number of pulses is \(M=12\). The range index runs from 300 to 350. Considering the running time, the expansion factor is \(\beta =6\) and the number of samples is \(L=6\). The error limit is \(\varepsilon =10^{-6}\). For comparison, one moving target is added in a strong clutter area by moving target simulation (MTS); its range cell is 313, its normalized Doppler frequency is 0.223, and its power and spatial angle are \(-7\) dB and \(0^{\circ }\), respectively. The range-Doppler spectrum of the raw data is shown in Fig. 8, where the red circles mark the location of the simulated moving target.

Fig. 8 The range-Doppler spectrum of the raw data

The clutter suppression of the different SR-STAP methods on the MCARM data is shown in Fig. 9. It should be pointed out that the spectrum-aided methods fail to recover the CNCM at \(\beta =2\). The range-Doppler spectrum of SR-STAP is shown in Fig. 9a and those of the three RD-SR-STAP methods in Fig. 9b–d, respectively. There is no visible difference between SR-STAP and the proposed methods, which illustrates that adding only a few noise grids to the spectrum-aided methods can reach the performance of the overcomplete dictionary. Furthermore, Fig. 10 displays the output power over the range bins at the normalized Doppler frequency 0.223. The moving target clearly emerges in range cell 313 and the surrounding clutter is eliminated after STAP. Among the three methods, RD-FT-SR-STAP and RD-LS-SR-STAP are better than RD-C-SR-STAP. This means that too few grids cannot effectively recover the CNCM even when combined with noise grids. Therefore, the Capon method is not suitable in application because of its computational load and poor suppression.

Fig. 9 The range-Doppler spectrum after SR-STAP: a SR-STAP, b RD-FT-SR-STAP, c RD-LS-SR-STAP, d RD-C-SR-STAP

Fig. 10 The output power of different range bins: a SR-STAP, b RD-FT-SR-STAP, c RD-LS-SR-STAP, d RD-C-SR-STAP

The computational complexity of the proposed methods is evaluated in the last experiment, where the expansion factor \(\beta\) is used as the variable that changes the size of the input data. As shown in Fig. 11, the red bar areas represent the additional grids of SR-STAP compared to the other methods, which shows that SR-STAP needs a large number of grids to recover the CNCM perfectly. For the proposed methods, the grid number increases only slightly compared to the spectrum-aided methods but remains far smaller than that of SR-STAP. The running time is shown in Fig. 12, where the symbol 'X' represents a failed recovery at the current \(\beta\). The spectrum-aided methods often lose efficacy at small input sizes; in particular, C-SR-STAP cannot work normally under such conditions. Therefore, pure spectrum-aided methods lack practical significance. The proposed methods maintain effective recovery for all input sizes, which means they are more robust than the spectrum-aided methods on real data. The only price is a slight increase in running time, so the proposed methods are more acceptable in practice than SR-STAP and the pure spectrum-aided methods.

Fig. 11 The residual grid number of the RD dictionary for different input sizes: a SR-STAP, b RD-FT-SR-STAP, c RD-LS-SR-STAP, d RD-C-SR-STAP

Fig. 12 The running time of the RD dictionary for different input sizes: a SR-STAP, b RD-FT-SR-STAP, c RD-LS-SR-STAP, d RD-C-SR-STAP

7 Conclusion

For conventional SR-STAP methods, the overcomplete dictionary leads to a huge computational load, which limits their application. The spectrum-aided methods greatly reduce the computation of SR-STAP, but they are sensitive to the input data size; moreover, they perform poorly because they discard many grids in the recovery. To solve these problems, the proposed methods first introduce noise grids as auxiliary information to enhance the sparsity of the recovery variable. Furthermore, a second-stage grid screening method is developed to reject useless grids as much as possible. Extensive experiments demonstrate that it is feasible to obtain a recovery similar to that of the overcomplete dictionary by adding only a few noise grids. Although we present three methods to extract the spectrum information, the experiments show that RD-FT-SR-STAP is more suitable than the other two because of its balance between performance and computational load. In a word, the proposed method achieves a significant improvement while its computation increases only slightly compared to the spectrum-aided methods, so the proposed methods are of greater practical significance. However, the proposed method only adopts simple random sampling to eliminate the useless noise grid points, so a more appropriate screening scheme for the noise grids is worth exploring. Furthermore, extending this method to other existing solving algorithms is also one of the main directions for future work.