1 Introduction

Space-time adaptive processing (STAP) techniques usually employ training data to design the filters, which requires the statistics of the training data to be the same as those of the test signal. Examples include reduced-rank and reduced-dimension methods (see [7, 11, 14, 26] and the references therein), the parametric adaptive matched filter (PAMF) approach [20] and sparse beamformers [35, 36]. However, in most real scenarios, the statistics of the training data differ from those of the test signal, resulting in significant performance degradation of conventional STAP algorithms. An alternative strategy, which uses only the received data in the cell under test (CUT) and requires no training data, was developed to avoid the estimation distortion caused by mismatched training-data statistics. Two types of approaches based on this idea have been proposed: the direct data domain least-squares (D3-LS) STAP approach [24] and the maximum likelihood estimation detector (MLED) approach [1, 4, 5], where the former is a deterministic approach and the latter is a statistical processor. Unfortunately, the benefits of these approaches come at the cost of reduced system degrees of freedom (DOFs), leading to decreased performance. Hybrid detection approaches, which combine STAP algorithms using the training data with those using solely the data in the CUT, were subsequently put forward with improved robustness to nonhomogeneous environments [2, 30]. Knowledge-aided (KA) STAP methods were developed to enhance the detection performance, especially in nonhomogeneous environments (see, e.g., [9, 27] and the references therein). Specifically, the authors of [17] introduced a KA parametric covariance estimation (KAPE) scheme that blends prior knowledge and data observations within a parameterized model to capture instantaneous characteristics of the CUT. However, the performance of the KAPE approach relies on the accuracy of the prior knowledge.

Recently, sparsity-based STAP techniques have been considered for applications in moving target indication (MTI) [15, 16, 18, 21–23, 25, 31–34]. The authors of [18, 23] supposed that the moving targets were sparse in the spatio-temporal plane and either excluded the clutter ridge with a mask or eliminated the clutter components with conventional STAP algorithms before applying a sparse regularization (SR) to the STAP filter design. However, the performance of these methods relies on accurate prior knowledge of the clutter ridge or on the degree to which the clutter components are eliminated. The approaches in [15, 16] assumed that the entire scene containing the target and clutter was sparse in the spatio-temporal plane and tried to reconstruct both the clutter and the target; direct target detection was then performed on the reconstructed clutter and target image. Following a similar idea, [22, 25, 32, 34] also first reconstructed the scene with the clutter and the target, or only the clutter, but then designed the STAP filter to suppress the clutter before target detection. Considering interference from both jammers and clutter, the approach in [21] sequentially estimated the jammer, clutter and target from measurements taken in passive radar operation, measurements adjacent to the CUT and measurements in the CUT, respectively. The work in [33] performed a theoretical analysis of clutter sparsity for a side-looking uniform linear array (ULA) with constant pulse repetition frequency (PRF), constant velocity and no crab. In [31], a performance analysis and a study of the parameter settings of sparsity-based STAP algorithms were conducted for five representative fast SR techniques. Compared with conventional STAP algorithms, the above sparsity-based STAP algorithms provide high resolution of the scene and exhibit significantly better performance with a very small number of snapshots [21, 25, 31, 32, 34] or even a single snapshot in the CUT [15, 16, 18, 22, 23, 33].

In this work, we develop a sparsity-based D3-STAP algorithm by formulating a more realistic sparse measurement model for airborne radar that considers the intrinsic clutter motion (ICM), whereas none of the above-mentioned sparsity-based STAP articles addresses the ICM problem. The proposed algorithm exploits the properties of the Hadamard product and the fact that the ICM can be modeled as a Hadamard product of a unit-modulus "tapering" term and the response of the clutter patches without the ICM [10]. It can be divided into four steps. First, it applies the focal underdetermined system solution (FOCUSS) method [12] as a sparse recovery technique to obtain the target plus clutter estimate. The primary clutter covariance matrix is subsequently calculated by excluding the target signal using a priori knowledge of the assumed target spatio-temporal frequencies in the spatio-temporal domain. Second, it combines the estimated clutter covariance matrix and the received data to adaptively estimate the "covariance matrix taper" (CMT). Third, it obtains the final clutter covariance matrix estimate as the Hadamard product of the CMT and the previously estimated clutter covariance matrix. Finally, it designs the STAP filter weights using the estimated clutter covariance matrix and suppresses the clutter, followed by target detection. We consider a clutter matrix model different from that of [7, 10, 17]. Unlike [7, 10, 17], which employ a conventional STAP with CMT, we employ the SR framework to model the intrinsic sparsity of the target plus clutter spectrum and to derive the proposed sparsity-based D3-STAP algorithm. The numerical results show that the proposed algorithm provides a considerable signal-to-interference-plus-noise ratio (SINR) improvement compared with the existing D3-STAP algorithms, namely the D3-LS STAP algorithm and the conventional sparsity-based D3-STAP algorithm.

The main contributions of our paper are listed as follows:

  • By introducing a sparse measurement model considering the ICM, a novel sparsity-based D3-STAP algorithm is proposed. Additionally, related parameter settings are analyzed.

  • In the proposed algorithm, a CMT framework is introduced to overcome the ICM problem. An efficient adaptive approach is developed to select the best CMT. Furthermore, fast implementations of the proposed algorithm are developed.

  • A study and comparative analysis of the proposed algorithm with other existing STAP algorithms for radar systems is carried out.

The remainder of the paper is organized as follows. In Sect. 2, we first develop a sparse measurement model considering ICM for airborne radar systems. Then, in Sect. 3, we derive the proposed sparsity-based D3-STAP algorithm and also discuss the performance metrics for the proposed algorithm. Simulated airborne radar data are used to evaluate the performance of the proposed algorithm in Sect. 4. Section 5 provides the summary and conclusions.

Notation: Scalar quantities are denoted with italic typeface. Lowercase boldface quantities denote vectors and uppercase boldface quantities denote matrices. The operations of transposition, complex conjugation, and conjugate transposition are denoted by the superscripts \(T\), \(*\) and \(H\), respectively. The symbols \(\otimes \), \(\odot \) and \(\mathrm{tr}\) represent the Kronecker product, Hadamard product and trace operation, respectively. The symbol \(E\left[ \cdot \right] \) denotes the expected value of a random quantity, \(\Vert x\Vert _{p}\) denotes the \(l_{p}\)-norm of x, and \(|\cdot |\) denotes the absolute value. \(\mathbf{1}_N\) and \(\mathbf{1}_{N \times N}\) represent the \(N \times 1\) vector and the \(N \times N\) matrix whose elements are all equal to one, respectively. \(\mathrm{Toeplitz}(\cdot )\) denotes the Toeplitz matrix.

2 Sparse Measurement Model

Consider a pulsed Doppler side-looking airborne radar moving with constant velocity \(v_p\). We assume that the radar antenna is a ULA consisting of M elements with inter-element spacing \(d_a\). The radar transmits a coherent burst of N pulses at a constant PRF \(f_r\), and the transmitter carrier frequency is \(f_c=c/\lambda _c\), where c is the propagation velocity and \(\lambda _c\) is the wavelength.

In airborne radar, if we discretize the whole spatio-temporal plane into \(N_sN_d\) (\(N_sN_d \gg NM\)) grid points, where \(N_s\) and \(N_d\) are the numbers of grid points along the spatial and temporal/Doppler frequency axes, respectively, a nonzero element at any such grid point suggests the presence of a scatterer at that particular pair of spatial and Doppler frequencies [21]. Thus, the target return can be represented by

$$\begin{aligned} \mathbf{x}_t = {\varvec{\Phi }}{\varvec{\gamma }}_t, \end{aligned}$$
(1)

where \({\varvec{\gamma }}_t = [\gamma _{t;1,1},\gamma _{t;1,2},\ldots , \gamma _{t;N_d,N_s}]^T\) denotes the \(N_dN_s \times 1\) target spatio-temporal profile, whose nonzero elements represent the target scatterers. The matrix \({\varvec{\Phi }}\) is the \(NM \times N_dN_s\) overcomplete space-time steering dictionary, given by

$$\begin{aligned} {\varvec{\Phi }}=[\mathbf{v}(f_{d,1}, f_{s,1}),\ldots , \mathbf{v}(f_{d,1}, f_{s,N_s}),\ldots , \mathbf{v}(f_{d,N_d}, f_{s,N_s})], \end{aligned}$$
(2)

where

$$\begin{aligned} \mathbf{v}(f_{d,i},f_{s,k})= & {} \mathbf{v}_t(f_{d,i}) \otimes \mathbf{v}_s(f_{s,k}), \nonumber \\&i=1,\ldots ,N_d, \ k=1,\ldots ,N_s \end{aligned}$$
(3)

is an \(NM \times 1\) ideal space-time steering vector with

$$\begin{aligned}&\mathbf{v}_t(f_{d,i})= [1,\ldots , \exp (j 2\pi (N-1)f_{d,i})]^T,\end{aligned}$$
(4)
$$\begin{aligned}&\mathbf{v}_s(f_{s,k})=[1, \ldots , \exp ( j 2\pi (M-1)f_{s,k})]^T, \end{aligned}$$
(5)

indicating the ideal temporal/Doppler and spatial steering vectors, respectively.
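The construction in (2)–(5) can be illustrated with a short NumPy sketch. It is only a minimal example under stated assumptions: the function name `steering_dictionary`, the toy sizes and the choice of a uniform grid of normalized frequencies on \([-0.5, 0.5)\) are ours and are not specified in the text.

```python
import numpy as np

def steering_dictionary(N, M, N_d, N_s):
    """Build the NM x (N_d*N_s) dictionary of Eq. (2) from Eqs. (3)-(5)."""
    # Assumed grid: normalized frequencies uniformly spaced on [-0.5, 0.5).
    f_d = -0.5 + np.arange(N_d) / N_d
    f_s = -0.5 + np.arange(N_s) / N_s
    # Columns of V_t / V_s are the temporal / spatial steering vectors.
    V_t = np.exp(2j * np.pi * np.outer(np.arange(N), f_d))  # N x N_d, Eq. (4)
    V_s = np.exp(2j * np.pi * np.outer(np.arange(M), f_s))  # M x N_s, Eq. (5)
    # Column i*N_s + k of kron(V_t, V_s) equals v_t(f_d[i]) kron v_s(f_s[k]),
    # i.e. Doppler index outermost, as in the column ordering of Eq. (2).
    return np.kron(V_t, V_s), f_d, f_s

# Toy sizes (hypothetical): 4 pulses, 3 elements, 4x oversampled grid.
Phi, f_d, f_s = steering_dictionary(N=4, M=3, N_d=16, N_s=12)
print(Phi.shape)  # (12, 192)
```

Using a single Kronecker product of the two small steering matrices avoids an explicit double loop over the grid and makes the column ordering explicit.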

As an approximation to a continuous field of clutter, the clutter return from the iso-range of interest can be modeled as the superposition of a large number \(N_c\) of independent clutter patches that are evenly distributed in angle of arrival (AOA) about the radar [26]. Then, ignoring the impact of range ambiguities, the clutter component \(\mathbf{x}_c\) can be represented as

$$\begin{aligned} \mathbf{x}_c = {\varvec{\Phi }}{\varvec{\gamma }}_c, \end{aligned}$$
(6)

where \({\varvec{\gamma }}_c\) denotes the \(N_dN_s \times 1\) clutter spatio-temporal profile with nonzero elements representing clutter patches.

In real applications, in the presence of the ICM, the clutter return can be manifested as a random modulation of the form [10]

$$\begin{aligned} \mathbf{x}_{ct} = \mathbf{x}_c \odot \mathbf{t} = ({\varvec{\Phi }}{\varvec{\gamma }}_c) \odot \mathbf{t}, \end{aligned}$$
(7)

where \(\mathbf{t}=\mathbf{t}_d \otimes \mathbf{1}_M\) and \(\mathbf{t}_d\), called the temporal decorrelation term, is an \(N \times 1\) vector random process which is uncorrelated with \(\mathbf{x}_c\). Then, the received signal, composed of the moving target, the clutter and the receiver thermal noise, can be represented by

$$\begin{aligned} \mathbf{x} = \mathbf{x}_{t} + \mathbf{x}_{ct} + \mathbf{n}, \end{aligned}$$
(8)

where the \(NM \times 1\) thermal noise vector \(\mathbf{n}\) is usually assumed to be both spatially and temporally uncorrelated. On the one hand, because the clutter responses occupy only a diagonal ridge on the spatio-temporal plane (for the side-looking ULA) or a set of concentric ellipses (for the non-side-looking case) [22], and the number of targets is limited, the number of nonzero elements in the spatio-temporal profile is much smaller than its dimension \(N_dN_s\). On the other hand, it has been proved that for a side-looking radar with a ULA, constant PRF, constant platform velocity and no crab angle, there is a group of space-time steering vectors (whose number equals the clutter rank) that can approximately represent the clutter subspace [33] (see Footnote 1). Therefore, there is a high degree of sparsity in the spatio-temporal profile, and we call the model in (8) the sparse measurement model.

We assume the responses of the clutter are zero-mean, complex Gaussian random numbers; then, the clutter covariance matrix is given by

$$\begin{aligned} \mathbf{R}_{ct}= & {} E[\mathbf{x}_{ct}{} \mathbf{x}^H_{ct}] = E[\big (({\varvec{\Phi }}{\varvec{\gamma }}_c) \odot \mathbf{t}\big )\big (({\varvec{\Phi }}{\varvec{\gamma }}_c) \odot \mathbf{t}\big )^H] \nonumber \\= & {} ({\varvec{\Phi }}{\varvec{\Gamma }}_c{\varvec{\Phi }}^H) \odot \mathbf{T}, \end{aligned}$$
(9)

where \({\varvec{\Gamma }}_c = E[{\varvec{\gamma }}_c{\varvec{\gamma }}^H_c]= \mathrm{diag}(\mathbf{P})\) is a diagonal matrix, \(\mathbf{P} = E[{\varvec{\gamma }}_c\odot {\varvec{\gamma }}^*_c]\) denotes the powers of \({\varvec{\gamma }}_c\) and

$$\begin{aligned} \mathbf{T} = E[\mathbf{t}{} \mathbf{t}^H] = E\left[ (\mathbf{t}_d \otimes \mathbf{1}_M)(\mathbf{t}_d \otimes \mathbf{1}_M)^H\right] \end{aligned}$$
(10)

is a CMT (see Footnote 2).
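The step from the first to the second line of (9) rests on the identity \((\mathbf{a} \odot \mathbf{t})(\mathbf{a} \odot \mathbf{t})^H = (\mathbf{a}\mathbf{a}^H) \odot (\mathbf{t}\mathbf{t}^H)\), combined with the assumption that \(\mathbf{t}\) is uncorrelated with \(\mathbf{x}_c\). The following sketch checks the identity numerically for one realization; the toy sizes and the random unit-modulus taper are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 3                        # pulses, array elements (toy sizes)

# Hypothetical ICM-free clutter snapshot and unit-modulus temporal taper.
x_c = rng.standard_normal(N * M) + 1j * rng.standard_normal(N * M)
t_d = np.exp(1j * rng.uniform(0, 2 * np.pi, N))
t = np.kron(t_d, np.ones(M))       # t = t_d kron 1_M, as in Eq. (7)

x_ct = x_c * t                     # Hadamard product of Eq. (7)

lhs = np.outer(x_ct, x_ct.conj())                          # x_ct x_ct^H
rhs = np.outer(x_c, x_c.conj()) * np.outer(t, t.conj())    # (x_c x_c^H) Hadamard (t t^H)
print(np.allclose(lhs, rhs))  # True

# t t^H = (t_d t_d^H) kron 1_{MxM}: the taper acts only across pulses.
T_one = np.outer(t, t.conj())
T_kron = np.kron(np.outer(t_d, t_d.conj()), np.ones((M, M)))
print(np.allclose(T_one, T_kron))  # True
```

Taking expectations of both sides, with \(\mathbf{t}\) uncorrelated with \(\mathbf{x}_c\), gives the Hadamard-product form of (9).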

3 Proposed Sparsity-Based D3-STAP Algorithm

In this section, we detail the principles of the proposed sparsity-based D3-STAP algorithm and the related parameter-setting issues. We then present an automatic approach to adjust the CMT of the proposed algorithm. Finally, the performance metrics for the proposed algorithm are introduced and discussed.

3.1 Spatio-Temporal Spectrum Estimation

As in the conventional sparsity-based D3-STAP algorithm, we first estimate the spatio-temporal profile \({\varvec{\gamma }}_c\). Because \(N_dN_s \gg NM\), the space-time steering dictionary \({\varvec{\Phi }}\) is ill-conditioned, and recovering the profile from \(\mathbf{x}\) is an underdetermined inverse problem. In this case, the LS solution usually has a large norm and poor performance. From the above section, we note that there is a high degree of sparsity in the target and clutter spatio-temporal profiles. Motivated by recently developed SR techniques, we can obtain estimates of the spatio-temporal profiles by minimizing the following objective function:

$$\begin{aligned} ({\varvec{\gamma }}_t, {\varvec{\gamma }}_c)= & {} \arg \min _{{\varvec{\gamma }}_t, {\varvec{\gamma }}_c} \mathcal {L}({\varvec{\gamma }}_t, {\varvec{\gamma }}_c) \nonumber \\= & {} \arg \min _{{\varvec{\gamma }}_t, {\varvec{\gamma }}_c} \left\| \mathbf{x} - \mathbf{x}_{t} - \mathbf{x}_{ct}\right\| ^2_2 +\kappa \left\| {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right\| _p, \end{aligned}$$
(11)

where \(0 \le p \le 1\), and \(\kappa \) is the regularization parameter that provides a trade-off between the approximation error and the sparsity.

Substituting (1) and (7) into (11), we have

$$\begin{aligned} {\mathcal {L}}({\varvec{\gamma }}_t, {\varvec{\gamma }}_c)= & {} \left\| \mathbf{x} - {\varvec{\Phi }} {\varvec{\gamma }}_t - ({\varvec{\Phi }}{\varvec{\gamma }}_c) \odot (\mathbf{t}_d \otimes \mathbf{1}_M)\right\| ^2_2 \nonumber \\&\qquad +\, \kappa \left\| {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right\| _p. \end{aligned}$$
(12)

To solve (12) more effectively, we assume the target is also affected by the temporal decorrelation \(\mathbf{t}_d\). Then, (12) becomes

$$\begin{aligned} \mathcal {L}'({\varvec{\gamma }}_t, {\varvec{\gamma }}_c)= & {} \left\| \mathbf{x} - {\varvec{\Phi }}\left( {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right) \odot (\mathbf{t}_d \otimes \mathbf{1}_M)\right\| ^2_2 \nonumber \\&\qquad +\,\kappa \left\| {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right\| _p. \end{aligned}$$
(13)

This approximation is justified as follows. (1) The target power is usually much lower than the clutter power, so the clutter is easier to recover than the target; a small target may not be recovered at all, in which case the effect of the approximation can be ignored. (2) If the target power is high or comparable to the clutter power, the approximation causes the target energy to spread into adjacent spatio-temporal grid points. This effect can be mitigated by the approach used to estimate the clutter covariance matrix, which is discussed below.

Conventionally, the amplitudes of the elements of the temporal decorrelation vector \(\mathbf{t}_d\) caused by the ICM are assumed to be unity (see Footnote 3), as in the commonly used Billingsley model [7] and the Gaussian model [26]. Define the matrix \(\mathbf{Q}\) as

$$\begin{aligned} \mathbf{Q} = \mathrm{diag}(\mathbf{t}_d \otimes \mathbf{1}_M). \end{aligned}$$
(14)

Then, \(\mathbf{Q}\) is an \(NM \times NM\) diagonal matrix and, because its diagonal entries have unit modulus, it is also a unitary matrix. Thus, (13) can be rewritten in the form of (15):

$$\begin{aligned} \mathcal {L}'({\varvec{\gamma }}_t, {\varvec{\gamma }}_c)= & {} \left\| \mathbf{Q}\left[ \mathbf{Q}^{-1}{} \mathbf{x} - \mathbf{Q}^{-1}\big ({\varvec{\Phi }}\left( {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right) \odot (\mathbf{t}_d \otimes \mathbf{1}_M)\big )\right] \right\| ^2_2 +\kappa \left\| {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right\| _p \nonumber \\= & {} \left\| \mathbf{Q}\left[ \mathbf{Q}^{-1}{} \mathbf{x} -{\varvec{\Phi }}\left( {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right) \odot \big (\mathbf{Q}^{-1}(\mathbf{t}_d \otimes \mathbf{1}_M)\big )\right] \right\| ^2_2 +\kappa \left\| {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right\| _p \nonumber \\= & {} \left\| \mathbf{Q}\left[ \mathbf{Q}^{-1}{} \mathbf{x} -{\varvec{\Phi }}\left( {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right) \right] \right\| ^2_2 +\kappa \left\| {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right\| _p. \end{aligned}$$
(15)

By exploiting the unitary invariance property of the \(l_2\)-norm (see Footnote 4), we can discard the multiplication by \(\mathbf{Q}\) in the \(l_2\)-norm term in (15) and obtain [6]

$$\begin{aligned} \min _{{\varvec{\gamma }}_t, {\varvec{\gamma }}_c} {\mathcal {L}}'({\varvec{\gamma }}_t, {\varvec{\gamma }}_c)= & {} \min _{{\varvec{\gamma }}_t, {\varvec{\gamma }}_c} \left\| \mathbf{Q}^{-1}{} \mathbf{x} - {\varvec{\Phi }}\left( {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right) \right\| ^2_2 +\kappa \left\| {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\right\| _p \nonumber \\= & {} \min _{\tilde{\varvec{\gamma }}} \left\| \tilde{\mathbf{x}} -{\varvec{\Phi }}\tilde{\varvec{\gamma }}\right\| ^2_2 +\kappa \left\| \tilde{\varvec{\gamma }} \right\| _p. \end{aligned}$$
(16)

Here, \(\tilde{\varvec{\gamma }} = {\varvec{\gamma }}_t + {\varvec{\gamma }}_c\) and \(\tilde{\mathbf{x}} = \mathbf{Q}^{-1}{} \mathbf{x}\). The minimization of the objective function \(\mathcal {L}'({\varvec{\gamma }}_t, {\varvec{\gamma }}_c)\) thus has a formulation similar to that presented in [15, 16, 18, 21–23, 25, 32–34] when there is no subspace leakage. However, since \(\tilde{\mathbf{x}}\) contains the unknown parameters \(\mathbf{t}_d\), the above problem cannot be solved directly by the SR techniques.
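The equality of the data-fit terms in (13) and (16) can be checked numerically. The sketch below uses a random placeholder dictionary and random unit-modulus taper entries; these, together with the toy sizes and the variable names, are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 4, 3, 40                 # K: hypothetical number of dictionary columns

Phi = rng.standard_normal((N * M, K)) + 1j * rng.standard_normal((N * M, K))
gamma = rng.standard_normal(K) + 1j * rng.standard_normal(K)
x = rng.standard_normal(N * M) + 1j * rng.standard_normal(N * M)

t_d = np.exp(1j * rng.uniform(0, 2 * np.pi, N))   # unit-modulus taper entries
t = np.kron(t_d, np.ones(M))
Q = np.diag(t)                                     # Eq. (14)

# Q is unitary because every diagonal entry has modulus one.
assert np.allclose(Q.conj().T @ Q, np.eye(N * M))

# Data-fit term of Eq. (13) versus data-fit term of Eq. (16).
cost_13 = np.linalg.norm(x - (Phi @ gamma) * t) ** 2
cost_16 = np.linalg.norm(np.linalg.solve(Q, x) - Phi @ gamma) ** 2
print(np.isclose(cost_13, cost_16))  # True
```

The check relies only on \(\mathbf{x} - ({\varvec{\Phi }}\tilde{\varvec{\gamma }}) \odot \mathbf{t} = \mathbf{Q}(\mathbf{Q}^{-1}\mathbf{x} - {\varvec{\Phi }}\tilde{\varvec{\gamma }})\) and on the unitarity of \(\mathbf{Q}\); it would fail if the taper amplitudes were not unity.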

Next, we will show how to use the FOCUSS algorithm to estimate \(\tilde{\varvec{\gamma }}\) with the unknown temporal decorrelation term \(\mathbf{t}_d\). The principle of the FOCUSS algorithm is to use the weighted \(l_2\)-norm minimization to operate recursive adjustments to the weighting matrix until most elements of the solution are close to zero [12]. The basic form of the FOCUSS algorithm for problem (16) is composed of the following two steps:

$$\begin{aligned}&\mathrm{Step 1:} \quad \mathbf{W}_q = \mathrm{diag}(\left| \tilde{\varvec{\gamma }}_{q-1}\right| ^{1-p/2}), \ 0\le p \le 1\end{aligned}$$
(17)
$$\begin{aligned}&\mathrm{Step 2:}\quad \tilde{\varvec{\gamma }}_{q} = \mathbf{W}_q({\varvec{\Phi }} \mathbf{W}_q)^\dag \tilde{\mathbf{x}}, \end{aligned}$$
(18)

where \((\mathbf{A})^\dag = \mathbf{A}^H\left( \mathbf{A}{} \mathbf{A}^H\right) ^{-1}\) denotes the pseudo-inverse of the matrix \(\mathbf{A}\). Since \(\tilde{\mathbf{x}}\) contains the unknown parameter \(\mathbf{Q}\), the above iterations cannot be used directly to estimate \(\tilde{\varvec{\gamma }}\). We rewrite \(|\tilde{\varvec{\gamma }}_{q}|\) as in (19):

$$\begin{aligned} \left| \tilde{\varvec{\gamma }}_{q}\right|= & {} \left| \mathbf{W}_q({\varvec{\Phi }}{} \mathbf{W}_q)^\dag \tilde{\mathbf{x}}\right| \nonumber \\= & {} \left\{ \left( \mathbf{W}_q({\varvec{\Phi }}{} \mathbf{W}_q)^\dag \tilde{\mathbf{x}}\right) \odot \left( \mathbf{W}_q({\varvec{\Phi }}{} \mathbf{W}_q)^\dag \tilde{\mathbf{x}}\right) ^*\right\} ^{1/2} \nonumber \\= & {} \left\{ \left[ \left( \mathbf{W}_q({\varvec{\Phi }}{} \mathbf{W}_q)^\dag \right) \odot \left( \mathbf{W}_q({\varvec{\Phi }}{} \mathbf{W}_q)^\dag \right) ^*\right] \left( \tilde{\mathbf{x}} \odot \tilde{\mathbf{x}}^*\right) \right\} ^{1/2} \nonumber \\= & {} \left\{ \left[ \left( \mathbf{W}_q({\varvec{\Phi }}{} \mathbf{W}_q)^\dag \right) \odot \left( \mathbf{W}_q({\varvec{\Phi }}{} \mathbf{W}_q)^\dag \right) ^*\right] \left( \mathbf{x} \odot \mathbf{x}^*\right) \right\} ^{1/2} \nonumber \\= & {} \left\{ \left( \mathbf{W}_q({\varvec{\Phi }}{} \mathbf{W}_q)^\dag \mathbf{x}\right) \odot \left( \mathbf{W}_q({\varvec{\Phi }}{} \mathbf{W}_q)^\dag \mathbf{x}\right) ^*\right\} ^{1/2} \nonumber \\= & {} \left| \mathbf{W}_q({\varvec{\Phi }}{} \mathbf{W}_q)^\dag \mathbf{x}\right| . \end{aligned}$$
(19)

From (19), we observe that \(\mathbf{W}_q\) is independent of the temporal decorrelation term \(\mathbf{t}_d\) and can be uniquely determined by the received snapshot \(\mathbf{x}\) and \({\varvec{\Phi }}\). In other words, the presented FOCUSS algorithm, termed the CMT-FOCUSS algorithm, can be operated iteratively with the following two main steps:

$$\begin{aligned}&\mathrm{Step 1:} \quad \mathbf{W}_q = \mathrm{diag}(\left| {\varvec{\gamma }}_{q-1}\right| ^{1-p/2}), \ 0\le p \le 1\end{aligned}$$
(20)
$$\begin{aligned}&\mathrm{Step 2:} \quad {\varvec{\gamma }}_{q} = \mathbf{W}_q({\varvec{\Phi }}{} \mathbf{W}_q)^\dag \mathbf{x}. \end{aligned}$$
(21)

The final solution \(\tilde{\varvec{\gamma }}_{q}\) is updated by

$$\begin{aligned} \tilde{\varvec{\gamma }}_{q}= & {} \mathbf{W}_q({\varvec{\Phi }} \mathbf{W}_q)^\dag \tilde{\mathbf{x}} \nonumber \\= & {} (\mathbf{W}_q({\varvec{\Phi }} \mathbf{W}_q)^\dag \mathbf{x}) \odot (\mathbf{t}_d \otimes \mathbf{1}_M) = {\varvec{\gamma }}_{q} \odot (\mathbf{t}_d \otimes \mathbf{1}_M). \end{aligned}$$
(22)

The iterative process terminates when certain criteria are satisfied, e.g., when the iteration number reaches a preset limit \(q_\mathrm{max}\), or when the relative change in \({\varvec{\gamma }}\) between consecutive iterations is sufficiently small: \(\left| ({\varvec{\gamma }}_q - {\varvec{\gamma }}_{q-1})/{\varvec{\gamma }}_q\right| \le \epsilon \) (where \(\epsilon \) is a small positive number). Here, we keep the representation of \(\tilde{\varvec{\gamma }}_{q}\) that includes the unknown term \(\mathbf{t}_d\). In the following, we show that the clutter covariance matrix can be estimated by estimating the CMT instead of \(\mathbf{t}_d\).
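The recursion (20)–(21) with the stopping rule above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `focuss`, the default parameter values, the minimum-norm initialization and the norm-based form of the relative-change test are all assumptions. `np.linalg.pinv` is used for the pseudo-inverse; it coincides with \(\mathbf{A}^H(\mathbf{A}\mathbf{A}^H)^{-1}\) when \(\mathbf{A}\) has full row rank.

```python
import numpy as np

def focuss(Phi, x, p=0.5, q_max=50, eps=1e-6):
    """Basic FOCUSS recursion of Eqs. (20)-(21) applied to the snapshot x."""
    # Assumed initialization: the minimum l2-norm solution Phi^dagger x.
    gamma = np.linalg.pinv(Phi) @ x
    for _ in range(q_max):
        w = np.abs(gamma) ** (1.0 - p / 2.0)       # diagonal of W_q, Eq. (20)
        # W_q (Phi W_q)^dagger x, Eq. (21); Phi * w scales the columns of Phi.
        gamma_new = w * (np.linalg.pinv(Phi * w) @ x)
        # Assumed norm-based variant of the relative-change stopping rule.
        change = np.linalg.norm(gamma_new - gamma) / np.linalg.norm(gamma_new)
        gamma = gamma_new
        if change <= eps:
            break
    return gamma

# Toy usage with a random placeholder dictionary and a 2-sparse profile.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((8, 32)) + 1j * rng.standard_normal((8, 32))
gamma_true = np.zeros(32, dtype=complex)
gamma_true[[3, 17]] = [2.0, -1.5j]
x = Phi @ gamma_true
gamma_hat = focuss(Phi, x)
residual = np.linalg.norm(Phi @ gamma_hat - x)     # data-fit residual
print(residual)
```

Each iterate satisfies \({\varvec{\Phi }}{\varvec{\gamma }}_q = \mathbf{x}\) whenever \({\varvec{\Phi }}\mathbf{W}_q\) has full row rank, so the printed residual should be close to zero in this noiseless toy case; the regularized variants cited in the text relax this exact fit in the presence of noise.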

Following the same derivation as for the CMT-FOCUSS algorithm above, other versions of the FOCUSS algorithm, such as regularized FOCUSS [3, 19] and adaptive regularized FOCUSS [29], can also be applied to solve problem (16). Owing to space limitations, we refer the reader to [3, 12, 19, 29] for further details on the parameter settings and implementation issues of the FOCUSS algorithm.

3.2 Target Detection

After obtaining the spatio-temporal spectrum in the CUT with high accuracy, it is possible to perform direct amplitude detection in the spatio-temporal domain. However, this relies on several conditions, such as a high target signal-to-noise ratio (SNR), very small off-grid errors and sufficient incoherence of the overcomplete dictionary. In fact, the columns of the overcomplete dictionary are highly correlated because \(N_sN_d \gg NM\), off-grid errors always exist for target and clutter estimation, and a high target SNR is hard to ensure. Therefore, we cannot guarantee a high-accuracy estimate of a target surrounded by strong clutter using SR. We can, however, expect to recover the significant clutter components, resulting in a good estimate of the clutter subspace. This is because the clutter-to-noise ratio is always high and we care much more about recovering the whole clutter subspace than about the exact positions of the clutter components. We can therefore extract only the clutter components to form the adaptive filter that suppresses the clutter and then detect the target. This extraction requires prior knowledge of the assumed target spatio-temporal frequencies, which is available and is also used in STAP algorithms employing the generalized sidelobe canceler (GSC-STAP) [11, 35], the D3-LS STAP [24] and the MLED [1, 4, 5]. These algorithms differ in how they use it: GSC-STAP uses this prior knowledge to form the signal blocking matrix; D3-LS STAP uses it to eliminate the influence of the target signal in the direct data domain; and MLED uses it to form the estimated covariance matrix. Unlike all the algorithms mentioned above, we use this prior knowledge to exclude the target signal in the recovered spatio-temporal domain, as is also done in [22].

We first determine the signal of interest area \(\varOmega _\mathrm{SOI}\) in the spatio-temporal plane using prior knowledge of the target signal as

$$\begin{aligned} \varOmega _\mathrm{SOI} = \left\{ (i_{n_1},k_{n_1}),(i_{n_2},k_{n_2}), \ldots ,(i_{n_{N_\mathrm{SOI}}},k_{n_{N_\mathrm{SOI}}})\right\} , \end{aligned}$$
(23)

where \((i_{n_m},k_{n_m}), 1 \le m \le N_\mathrm{SOI}\), denote the possible indexes of the target signal in the discretized spatio-temporal plane. The value of \(N_\mathrm{SOI}\) reflects the uncertainty or the energy spread along the spatial and temporal frequency axes. Then, the estimated clutter spatio-temporal profile \(\tilde{\varvec{\gamma }}_c\) is given by

$$\begin{aligned} \tilde{\gamma }_{c;i,k} = \left\{ \begin{array}{lll} \tilde{\gamma }_{i,k} &{}\hbox {for}&{} (i,k) \notin \varOmega _\mathrm{SOI}\\ 0 &{}\text{ for }&{} (i,k) \in \varOmega _\mathrm{SOI} \end{array}\right. , \end{aligned}$$
(24)

where \(\tilde{\gamma }_{c;i,k}\) and \(\tilde{\gamma }_{i,k}\) are the \((i,k)\)th elements of \(\tilde{\varvec{\gamma }}_c\) and \(\tilde{\varvec{\gamma }}\), respectively. Through this operation, the effects caused by the assumption on the target signal in (13) can be mitigated to some extent. Then, the parameter \({\varvec{\Gamma }}_c\) in the clutter covariance matrix \(\mathbf{R}_{ct}\) can be estimated by

$$\begin{aligned} \hat{{\varvec{\Gamma }}}_c = \mathrm{diag}(\hat{\mathbf{P}}) = \mathrm{diag}\left( \tilde{\varvec{\gamma }}_c \odot \tilde{\varvec{\gamma }}^*_c \right) = \mathrm{diag}\left( {\varvec{\gamma }}_c \odot {\varvec{\gamma }}^*_c \right) \end{aligned}$$
(25)

where we use the fact that the amplitudes of the elements of \(\mathbf{t}_d\) are assumed to be unity, and \({\varvec{\gamma }}_c\) represents the clutter spatio-temporal profile computed directly from \(\mathbf{x}\), given by

$$\begin{aligned} {\gamma }_{c;i,k} = \left\{ \begin{array}{lll}{\gamma }_{i,k} &{}\hbox {for}&{} (i,k) \notin \varOmega _\mathrm{SOI}\\ 0 &{}\hbox {for}&{} (i,k) \in \varOmega _\mathrm{SOI} \end{array}\right. . \end{aligned}$$
(26)

Thus, the clutter covariance matrix estimate can be calculated as

$$\begin{aligned} \hat{\mathbf{R}}_{ct} = ({\varvec{\Phi }} \hat{{\varvec{\Gamma }}}_c {\varvec{\Phi }}^H) \odot \hat{\mathbf{T}}, \end{aligned}$$
(27)

where \(\hat{\mathbf{T}}\) denotes the estimate of the CMT caused by the temporal decorrelation. From (27), we find that the CMT operates only on the clutter components and not on the target. That is to say, the approximation in (13) is reasonable, and the target energy spreading effects are mitigated by excluding the target signal in \(\varOmega _\mathrm{SOI}\).
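Equations (24)–(27) amount to zeroing the profile on \(\varOmega _\mathrm{SOI}\), forming the per-cell powers and tapering the resulting covariance matrix. A minimal sketch follows; the function name `clutter_covariance`, the use of flat column indices for \(\varOmega _\mathrm{SOI}\), the placeholder dictionary and the all-ones CMT are assumptions made for illustration.

```python
import numpy as np

def clutter_covariance(Phi, gamma, soi_idx, T_hat):
    """Estimate R_ct via Eqs. (24)-(27) from a recovered profile gamma."""
    gamma_c = gamma.copy()
    gamma_c[soi_idx] = 0.0                 # Eqs. (24)/(26): null the SOI cells
    P_hat = np.abs(gamma_c) ** 2           # Eq. (25): per-cell clutter power
    R0 = (Phi * P_hat) @ Phi.conj().T      # Phi diag(P_hat) Phi^H
    return R0 * T_hat                      # Eq. (27): Hadamard product with the CMT

# Toy usage (all values hypothetical).
rng = np.random.default_rng(0)
NM, K = 6, 24
Phi = np.exp(2j * np.pi * rng.uniform(size=(NM, K)))   # placeholder dictionary
gamma = rng.standard_normal(K) + 1j * rng.standard_normal(K)
soi_idx = np.array([5, 6])          # flat indices of the assumed target cells
T_hat = np.ones((NM, NM))           # all-ones CMT, i.e. no ICM
R_hat = clutter_covariance(Phi, gamma, soi_idx, T_hat)
print(R_hat.shape)  # (6, 6)
```

Scaling the columns of the dictionary by the powers avoids forming the large \(N_dN_s \times N_dN_s\) diagonal matrix explicitly.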

Note that \(\hat{\mathbf{R}}_{ct}\) requires an estimate \(\hat{\mathbf{T}}\) of the CMT, a problem that has been addressed in a variety of articles, such as [7, 8, 10, 17, 37]. For a land scenario, we can select the Billingsley model [7] to describe the temporal decorrelation. The only parameters required to specify the clutter Doppler power spectrum are essentially the operating wavelength and the wind speed. The operating wavelength is usually known, while the wind speed must be estimated. For a water scenario, we prefer the Gaussian model presented by J. Ward in [26] to estimate \(\hat{\mathbf{T}}\). The temporal autocorrelation of the fluctuations for this model is Gaussian in shape, with the form

$$\begin{aligned} \zeta (m)=\exp \left\{ - \frac{8\pi ^2\sigma ^2_vT^2_r}{\lambda ^2_c}m^2\right\} , \end{aligned}$$
(28)

where \(\sigma ^2_v\) is the variance of the clutter spectral spread in \(\mathrm{m}^2/\mathrm{s}^2\), \(T_r = 1/f_r\) is the pulse repetition interval and m is the lag in pulses. The CMT \(\mathbf{T}(\sigma _v)\) is then formed as

$$\begin{aligned} \mathbf{T}(\sigma _v) = \mathrm{Toeplitz}\left( \zeta (0), \zeta (1) \ldots , \zeta (N-1)\right) \otimes \mathbf{1}_{M \times M}. \end{aligned}$$
(29)
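A minimal sketch of (28)–(29) is given below. The function name `gaussian_cmt` and the numerical parameter values are hypothetical, and the Toeplitz matrix is assumed to be the symmetric one whose \((i,j)\)th entry is \(\zeta (|i-j|)\).

```python
import numpy as np

def gaussian_cmt(N, M, sigma_v, T_r, lambda_c):
    """CMT of Eq. (29) built from the Gaussian autocorrelation of Eq. (28)."""
    m = np.arange(N)
    zeta = np.exp(-8.0 * np.pi**2 * sigma_v**2 * T_r**2 / lambda_c**2 * m**2)
    # Symmetric Toeplitz matrix: entry (i, j) equals zeta(|i - j|).
    lags = np.abs(m[:, None] - m[None, :])
    T_d = zeta[lags]
    # Replicate each temporal entry over the M x M spatial block.
    return np.kron(T_d, np.ones((M, M)))

# Hypothetical parameters: 8 pulses, 4 elements, sigma_v = 0.5 m/s,
# PRF of 2 kHz and a wavelength of 0.3 m.
T = gaussian_cmt(N=8, M=4, sigma_v=0.5, T_r=1 / 2000.0, lambda_c=0.3)
print(T.shape)  # (32, 32)
```

With \(\sigma _v = 0\) the taper reduces to the all-ones matrix, so the Hadamard product in (27) leaves the ICM-free covariance estimate unchanged.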

In the following simulations, we consider the latter CMT model. It should be noted that the proposed algorithm first estimates the clutter covariance matrix in exactly the same manner as the existing sparsity-based D3-STAP without CMT, e.g., the method in [22], and then obtains the final clutter statistics as the Hadamard product of the CMT and the previously estimated clutter covariance matrix. Therefore, compared with the existing sparsity-based D3-STAP without CMT, the additional computational complexity of the proposed algorithm is due to the Hadamard product and the CMT estimation, i.e., on the order of \(O((NM)^2)\), which is a modest increase. However, the values of the estimated CMT affect the performance of the proposed algorithm. If there is no prior knowledge of the clutter, an automatic method must be developed, which is discussed in Sect. 3.4.

Finally, we design the STAP filter based on a linearly constrained minimum variance (LCMV) approach, which is to minimize the clutter plus noise output power while constraining the gain in the direction of the desired target signal. The optimal LCMV STAP filter weight vector is given by

$$\begin{aligned} \hat{\mathbf{w}} = \frac{\left( \hat{\mathbf{R}}_{ct} + \beta _L \mathbf{I}\right) ^{-1}{} \mathbf{s}}{\mathbf{s}^H\left( \hat{\mathbf{R}}_{ct} + \beta _L \mathbf{I}\right) ^{-1}{} \mathbf{s}}, \end{aligned}$$
(30)

where \(\mathbf{s}\) is the \(NM \times 1\) space-time steering vector in the target direction and \(\beta _L\) denotes the diagonal loading factor related to the noise level \(\sigma ^2_n\) (which can be estimated by the receiver when the radar transmitter operates in passive mode [14]). Thus, the STAP filter output is computed by

$$\begin{aligned} y = \hat{\mathbf{w}}^H\mathbf{x}. \end{aligned}$$
(31)
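The weight computation in (30) and the filter output in (31) can be sketched as follows. The function name `lcmv_weights`, the rank-3 toy covariance, the placeholder steering vector and the loading value are assumptions for illustration only.

```python
import numpy as np

def lcmv_weights(R_ct, s, beta_L):
    """Diagonally loaded LCMV weight vector of Eq. (30)."""
    R_loaded = R_ct + beta_L * np.eye(R_ct.shape[0])
    r_inv_s = np.linalg.solve(R_loaded, s)     # (R_ct + beta_L I)^{-1} s
    return r_inv_s / (s.conj() @ r_inv_s)      # normalize so that w^H s = 1

# Toy usage (all values hypothetical).
rng = np.random.default_rng(0)
NM = 12
A = rng.standard_normal((NM, 3)) + 1j * rng.standard_normal((NM, 3))
R_ct = A @ A.conj().T                          # rank-3 clutter covariance
s = np.exp(2j * np.pi * 0.1 * np.arange(NM))   # placeholder steering vector
x = A @ (rng.standard_normal(3) + 1j * rng.standard_normal(3))  # clutter-only snapshot

w = lcmv_weights(R_ct, s, beta_L=1e-2)
y = np.vdot(w, x)                              # Eq. (31): y = w^H x
print(np.isclose(np.vdot(w, s), 1.0))  # True
```

Solving the linear system instead of forming the explicit inverse is a common numerical choice; the diagonal loading keeps the system well conditioned even when the estimated clutter covariance matrix is rank deficient.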

3.3 Parameter Settings

There are four important parameters that must be estimated or set in the proposed algorithm, namely the size of the dictionary \(N_sN_d\), the signal of interest area \(\varOmega _\mathrm{SOI}\), the diagonal loading term \(\beta _L\) and the CMT. Regarding the CMT, we develop an adaptive approach in the next subsection to reduce the difficulty of estimating it. Regarding the diagonal loading term \(\beta _L\), simulation results will show that the proposed algorithm is robust to a range of values of \(\beta _L\). In the following, we focus on the settings of \(N_sN_d\) and \(\varOmega _\mathrm{SOI}\).

The choice of the size of the dictionary, also called the grid sampling issue, is critically important to the performance of sparsity-based STAP algorithms and must be handled carefully in all of them. Intuitively, the denser the grid sampling of the spatio-temporal plane, the better the described sparse model approximates the data, but the higher the computational complexity required by the sparse recovery. Choosing a suitable grid sampling is therefore a trade-off. We discuss it from two aspects.

(1) From the point of view of correlation:

For a point with temporal frequency \(f_d\) and spatial frequency \(f_s\), the space-time steering vector is

$$\begin{aligned} \mathbf{v}(f_d, f_s) = \mathbf{v}_t(f_{d}) \otimes \mathbf{v}_s(f_{s}). \end{aligned}$$
(32)

The absolute value of the correlation between \(\mathbf{v}(f_d, f_s)\) and \(\mathbf{v}(f_{d,i}, f_{s,k})\) in the space-time steering dictionary is

$$\begin{aligned} \rho= & {} \frac{\left| \mathbf{v}^H(f_{d,i}, f_{s,k}) \mathbf{v}(f_d, f_s)\right| }{NM} \nonumber \\= & {} \left| \frac{\sin \pi M (f_s-f_{s,k})}{M \sin \pi (f_s-f_{s,k}) }\cdot \frac{\sin \pi N (f_d - f_{d,i})}{N \sin \pi (f_d - f_{d,i})}\right| . \end{aligned}$$
(33)
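The closed form in (33) can be checked numerically. The sketch below assumes the usual uniformly sampled complex-exponential steering vectors, with a temporal vector of length N and a spatial vector of length M; this model and the helper names are assumptions, since the steering vectors are defined outside this section.

```python
import numpy as np

def steering(f, n):
    """Complex exponential of length n at normalized frequency f (assumed model)."""
    return np.exp(2j * np.pi * f * np.arange(n))

def st_steering(f_d, f_s, N, M):
    """Space-time steering vector v_t(f_d) kron v_s(f_s), as in (32)."""
    return np.kron(steering(f_d, N), steering(f_s, M))

def dirichlet(delta, n):
    """|sin(pi n delta) / (n sin(pi delta))|, with the limit 1 when the denominator vanishes."""
    den = n * np.sin(np.pi * delta)
    if np.isclose(den, 0.0):
        return 1.0
    return abs(np.sin(np.pi * n * delta) / den)

N, M = 12, 12
f_d, f_s = 0.0, 0.0          # true frequencies
f_di, f_sk = 0.03, -0.02     # a nearby dictionary point (hypothetical values)

v = st_steering(f_d, f_s, N, M)
v_grid = st_steering(f_di, f_sk, N, M)

rho_direct = abs(v_grid.conj() @ v) / (N * M)                      # first line of (33)
rho_closed = dirichlet(f_s - f_sk, M) * dirichlet(f_d - f_di, N)   # second line of (33)
```

Under this steering model the two values agree to machine precision, because the inner product of Kronecker products factorizes into a temporal and a spatial term.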

Without loss of generality, Fig. 1a plots the absolute correlation value \(\rho \) with \(f_d=0\) and \(f_s=0\). It shows that the absolute correlation value stays high within a diffraction-limited resolution cell and decays quickly elsewhere. In other words, for a component with frequencies \(f_d\) and \(f_s\), most of its power lies in the corresponding diffraction-limited resolution cell. The smaller the bias between the true spatio-temporal frequencies and the sampled spatio-temporal frequencies, the better the approximation obtained with the sampled frequencies. For further intuition, Fig. 1b plots the maximal correlation absolute bias value caused by different grid sampling rates; it shows that the correlation absolute bias value is very small when \(N_s \ge 4M\) and \(N_d \ge 4N\). From this point of view, we can expect to recover the clutter or target signal approximately when \(N_s\ge 4M\) and \(N_d\ge 4N\).
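One way to get a feel for the effect of the grid sampling rate is to evaluate (33) for a component that falls midway between two grid points in both dimensions, which is the worst case for a uniform grid. The sketch below is only one plausible reading of the quantity plotted in Fig. 1b; the oversampling factor `gamma` (so that \(N_s = \gamma M\) and \(N_d = \gamma N\)) and the function name are assumptions.

```python
import numpy as np

def worst_case_rho(n, gamma):
    """Per-dimension correlation with the nearest grid point for a frequency midway between two grid points."""
    delta = 1.0 / (2.0 * gamma * n)   # half of the grid spacing 1 / (gamma * n)
    return abs(np.sin(np.pi * n * delta) / (n * np.sin(np.pi * delta)))

N = M = 12
rho_by_gamma = {}
for gamma in (1, 2, 4, 5):
    # Product of the spatial and temporal factors of (33).
    rho_by_gamma[gamma] = worst_case_rho(M, gamma) * worst_case_rho(N, gamma)
    print(f"N_s = {gamma}M, N_d = {gamma}N: worst-case rho = {rho_by_gamma[gamma]:.3f}")
```

For \(N=M=12\) and fourfold oversampling, the worst-case correlation is about 0.95, so very little power is lost to the grid mismatch.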

(2) From the point of view of the proposed strategy:

For the clutter, we care much more about recovering the whole clutter subspace than about the exact positions of the clutter components. In other words, if one of the clutter components does not lie exactly on the grid, its main power will spread into the nearest grid points around its own spatio-temporal frequency. After the sparse recovery of the clutter, we do not know the exact clutter spatio-temporal frequencies but can still obtain a good estimate of the clutter subspace. From this point of view, the off-grid issue matters much less for the proposed algorithm than it does in the field of direction-of-arrival estimation. Regarding how to sample the grid points, Melvin and Showman pointed out that it is important to oversample relative to the diffraction-limited case [17]. Moreover, based on numerical simulations, they provided a good rule of thumb: set the spacing of the steering vectors to about twenty to thirty percent of the diffraction-limited resolution. That is to say, we should sample the grid with \(N_s\) about 4–5 times M and \(N_d\) about 4–5 times N. Additionally, the influence of the grid sampling on the performance of sparsity-based STAP algorithms has been studied in [34] and [31]. Those simulations also showed that the performance improvement is very small once \(N_s\ge 4M\) and \(N_d \ge 4N\). The reason is as follows: when \(N_sN_d\) is not large enough, there are serious mismatches between the sampled space-time steering vectors and those of the true clutter; when \(N_sN_d\) is sufficiently large, the accuracy of the sparse recovery solution can hardly be improved because of the noise and the high correlations among the sampled space-time steering vectors. Therefore, as an empirical choice, we set \(N_s\) to about 4–5 times M and \(N_d\) to about 4–5 times N.

For the target, if the target does not lie on the grid, its power spreads to the nearest spatio-temporal frequencies. The proposed algorithm accounts for this situation by estimating the clutter covariance matrix with the spatio-temporal frequencies in the region \(\varOmega _\mathrm{SOI}\) excluded. From the discussion of Fig. 1a, b, we can set the size of the region \(\varOmega _\mathrm{SOI}\) to about one diffraction-limited resolution cell. Furthermore, because the off-grid issue strongly affects the target recovery performance, especially in a strong clutter environment, we choose an alternative approach to detect the target. Recovering the target component is not an easy task, and the direct estimation of the target from the surrounding strong clutter may be unreliable [22]. Therefore, we only extract the significant clutter components and exclude the unstable target components, using the prior knowledge of the assumed target spatio-temporal frequencies, to form the adaptive filter. For these reasons, the proposed algorithm is relatively robust to the off-grid issue in the target recovery.

Fig. 1 Influence of the grid sampling. a \(\rho \) with \(f_d=0\) and \(f_s=0\) (\(N=12\), \(M=12\)). b The maximal correlation absolute bias value

3.4 CMT Adaptation

The performance of the proposed approach to estimate the clutter covariance matrix described in the previous subsections depends on the CMT \(\mathbf{T}\). For the Gaussian model, from (29), the CMT \(\mathbf{T}\) is related to the clutter spectral spread variance \(\sigma ^2_v\). Here, we present an adaptation method for automatically selecting the parameter \(\sigma _v\) of the proposed approach. Specifically, we first constrain the parameter \(\sigma _v\) to a range of appropriate values (the candidate space of the parameter \(\sigma _v\) is denoted by \(\varOmega _{\sigma _v} = \left\{ \sigma _{v;k} |k=1,2,\ldots ,K\right\} \)). In fact, for a typical environment, the typical range of the clutter spread deviation can be roughly estimated from knowledge of the general clutter type and wind state (for the Gaussian CMT model, the spectral standard deviation is \(2\sigma _v/\lambda _c\) Hz) [26]. This task is far easier than directly estimating the value of the clutter spread deviation. Furthermore, even without any knowledge about the environment, we can set a fairly wide range for the clutter spread deviation, although this may increase the number of candidates in that range.

In order to select \(\sigma _v\) in a more appropriate way, we operate this procedure over a localized region by L adjacent range snapshots in the CUT. Let \(\left\{ \mathbf{x}_l\right\} ^L_{l=1}\) denote the adjacent range snapshots. Then, the best selection of \(\sigma _v\) relies on minimizing the STAP filter output with the cost function \(C(\varOmega _{\sigma _v})\):

$$\begin{aligned} \sigma _{v; \mathrm{opt}}= & {} \arg \min _{\sigma _v} C(\varOmega _{\sigma _v}) \nonumber \\= & {} \arg \min _{\sigma _v} \frac{1}{L}\sum ^L_{l=1}\left| \hat{\mathbf{w}}^H(\sigma _{v;k})\mathbf{x}_l\right| ^2 \nonumber \\= & {} \arg \min _{\sigma _v} \frac{1}{L}\sum ^L_{l=1}\left| \frac{\mathbf{s}^H\left( \hat{\mathbf{R}}'_{ct} \odot \mathbf{T}(\sigma _{v;k}) + \hat{\sigma }^2_n\mathbf{I} \right) ^{-1}{} \mathbf{x}_l}{\mathbf{s}^H\left( \hat{\mathbf{R}}'_{ct} \odot \mathbf{T}(\sigma _{v;k}) + \hat{\sigma }^2_n\mathbf{I}\right) ^{-1}{} \mathbf{s}}\right| ^2, \end{aligned}$$
(34)

where

$$\begin{aligned} {\hat{\mathbf{R}}}^{\prime }_{ct} = {\varvec{\Phi }} {\hat{\varvec{\Gamma }}}_c {\varvec{\Phi }}^H. \end{aligned}$$
(35)

The above criterion is used because the STAP filters satisfy \(\hat{\mathbf{w}}^H(\sigma _{v;k})\mathbf{s} = 1\) (where \(1\le k \le K\)) and minimizing the STAP filter output \(|\hat{\mathbf{w}}^H(\sigma _{v;k})\mathbf{x}_l|^2\) is equivalent to minimizing the clutter components in the final output. Then the final clutter covariance matrix estimate can be calculated by

$$\begin{aligned} {\hat{\mathbf{R}}}_{ct} = {\hat{\mathbf{R}}}^{\prime }_{ct} \odot \mathbf{T}(\sigma _{v; \mathrm{opt}}) =({\varvec{\Phi }} {\hat{\varvec{\Gamma }}}_c {\varvec{\Phi }}^H) \odot \mathbf{T}(\sigma _{v; \mathrm{opt}}). \end{aligned}$$
(36)
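The selection rule (34) to (36) amounts to a grid search over the candidates. The following is a minimal sketch by direct matrix solves, without the fast computation developed later in this subsection. The Gaussian taper used here is an assumed stand-in for (29), which is defined outside this section: it applies a temporal correlation \(\exp (-8\pi ^2\sigma _v^2T_r^2(m-n)^2/\lambda _c^2)\) that is constant over the spatial channels. The steering model, the toy clutter ridge and the values of the wavelength and of the pulse repetition interval `T_r` are hypothetical.

```python
import numpy as np

def st_steering(f_d, f_s, N, M):
    """Assumed steering model: temporal (length N) kron spatial (length M) exponentials."""
    return np.kron(np.exp(2j * np.pi * f_d * np.arange(N)),
                   np.exp(2j * np.pi * f_s * np.arange(M)))

def gaussian_cmt(sigma_v, N, M, wavelength, T_r):
    """ASSUMED Gaussian taper (stand-in for (29)): temporal decorrelation only."""
    lags = np.subtract.outer(np.arange(N), np.arange(N))
    T_t = np.exp(-8.0 * (np.pi * sigma_v * T_r * lags / wavelength) ** 2)
    return np.kron(T_t, np.ones((M, M)))

def select_sigma_v(R_prime, s, X, candidates, noise_var, taper):
    """Grid search of (34): pick the candidate with the smallest mean output power."""
    NM = R_prime.shape[0]
    costs = []
    for sigma in candidates:
        R = R_prime * taper(sigma) + noise_var * np.eye(NM)   # Hadamard product plus noise
        Ri_s = np.linalg.solve(R, s)
        w = Ri_s / (s.conj() @ Ri_s)                          # satisfies w^H s = 1
        costs.append(np.mean(np.abs(w.conj() @ X) ** 2))      # columns of X are the x_l
    costs = np.array(costs)
    return candidates[int(np.argmin(costs))], costs

rng = np.random.default_rng(1)
N, M, L = 6, 4, 8
wavelength, T_r = 0.3, 1e-3                 # hypothetical radar parameters
taper = lambda sig: gaussian_cmt(sig, N, M, wavelength, T_r)

# Toy untapered clutter covariance on the ridge f_d = f_s (illustrative only).
ridge = np.linspace(-0.4, 0.4, 9)
Phi = np.stack([st_steering(f, f, N, M) for f in ridge], axis=1)
R_prime = 100.0 * Phi @ Phi.conj().T
s = st_steering(0.3, 0.0, N, M)

# Draw L snapshots from the tapered covariance with sigma_v = 1 plus unit noise.
R_true = R_prime * taper(1.0) + np.eye(N * M)
C = np.linalg.cholesky(R_true)
Z = (rng.standard_normal((N * M, L)) + 1j * rng.standard_normal((N * M, L))) / np.sqrt(2)
X = C @ Z

candidates = [0.1, 0.3, 0.9, 1.5, 2.5, 5, 6]
sigma_opt, costs = select_sigma_v(R_prime, s, X, candidates, 1.0, taper)
R_ct_hat = R_prime * taper(sigma_opt)       # final estimate as in (36)
```

Each candidate costs one linear solve of size \(NM\), which is the burden that the two-step computation below aims to reduce.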

Since the computational complexity of the proposed CMT adaptation approach depends on the number of candidates in \(\varOmega _{\sigma _v}\), the larger K is, the higher the computational complexity. Thus, we would like to use only a few candidates to achieve good performance. The simulations will demonstrate that the resulting SINR is relatively invariant to errors in the estimate of the CMT \(\mathbf{T}\) and that a few candidates are enough to obtain an acceptable SINR.

In addition, it should be noted that for each \(\sigma _{v;k} \in \varOmega _{\sigma _v}\), a matrix inversion operation is required in (34). Below, we develop a fast approach to compute (34), which is divided into two steps: the first computes the eigenvectors and eigenvalues of \(\hat{\mathbf{R}}'_{ct}\) and the second computes the inverse of \(\hat{\mathbf{R}}'_{ct} \odot \mathbf{T}(\sigma _{v;k}) + \hat{\sigma }^2_n\mathbf{I}\).

Step 1: Compute the eigenvectors and eigenvalues of \({\hat{\mathbf{R}}}^{\prime }_{ct}\).

We have

$$\begin{aligned} {\hat{\mathbf{R}}}^{\prime }_{ct}= & {} {\varvec{\Phi }} {\hat{\varvec{\Gamma }}}_c {\varvec{\Phi }}^H \nonumber \\= & {} \sum ^{N_d,N_s}_{(i,k)=(1,1)}\hat{P}_{i,k}\mathbf{v}(f_{d,i},f_{s,k})\mathbf{v}^H(f_{d,i},f_{s,k}). \end{aligned}$$
(37)

Since there is a high degree of sparsity in \(\hat{P}_{i,k}\) (in other words, there will be only a small number of significant elements in \(\hat{P}_{i,k}\); see Footnote 5), we can approximate \(\hat{\mathbf{R}}'_{ct}\) by

$$\begin{aligned} {\hat{\mathbf{R}}}^{\prime }_{ct} \approx {\hat{\mathbf{R}}}^{\prime \prime }_{ct} = \sum _{(i,k)\in \varOmega _P}\hat{P}_{i,k}\mathbf{v}(f_{d,i},f_{s,k})\mathbf{v}^H(f_{d,i},f_{s,k}), \end{aligned}$$
(38)

where \(\varOmega _P\) denotes the index set of significant elements in \({\hat{P}}_{i,k}\). From (38), we note that the eigenvector subspace (denoted as \(\mathbf{U}\)) of \({\hat{\mathbf{R}}}^{\prime \prime }_{ct}\) is determined by the vectors \(\mathbf{v}(f_{d,i},f_{s,k}), (i,k)\in \varOmega _P\). To further reduce the complexity, we can apply the Gram–Schmidt orthogonalization procedure to these vectors to compute the eigenvector subspace instead of using a singular value decomposition (SVD); the implementation steps of the Gram–Schmidt orthogonalization are omitted here, and interested readers are referred to [13] for further details. Then, the corresponding eigenvalues can be easily calculated by \(\lambda _q = \mathbf{u}^H_q{\hat{\mathbf{R}}}^{\prime \prime }_{ct}{} \mathbf{u}_q, 1\le q \le N_{\varOmega _P}\).
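A minimal sketch of Step 1 is given below. It uses a reduced QR factorization as a numerically stable substitute for the classical Gram–Schmidt procedure (both return an orthonormal basis of the span of the selected steering vectors) and then evaluates the quantities \(\lambda _q = \mathbf{u}^H_q{\hat{\mathbf{R}}}^{\prime \prime }_{ct}\mathbf{u}_q\). The steering model, the support set and the recovered powers are hypothetical values chosen for illustration.

```python
import numpy as np

def st_steering(f_d, f_s, N, M):
    """Assumed steering model: temporal (length N) kron spatial (length M) exponentials."""
    return np.kron(np.exp(2j * np.pi * f_d * np.arange(N)),
                   np.exp(2j * np.pi * f_s * np.arange(M)))

N, M = 6, 4
support = [(-0.2, -0.2), (0.0, 0.0), (0.25, 0.25)]   # hypothetical significant grid points
P_hat = np.array([50.0, 80.0, 30.0])                 # hypothetical recovered powers

V = np.stack([st_steering(fd, fs, N, M) for fd, fs in support], axis=1)
R_pp = (V * P_hat) @ V.conj().T      # sum of P v v^H over the support, as in (38)

# Reduced QR: the columns of U are orthonormal and span the same subspace as V.
U, _ = np.linalg.qr(V)

# lambda_q = u_q^H R'' u_q for each basis vector.
lam = np.real(np.einsum("iq,ij,jq->q", U.conj(), R_pp, U))

projector = U @ U.conj().T           # orthogonal projector onto the clutter subspace
```

The projector leaves \({\hat{\mathbf{R}}}^{\prime \prime }_{ct}\) unchanged, which confirms that the basis captures the whole recovered clutter subspace, and the \(\lambda _q\) sum to the trace of \({\hat{\mathbf{R}}}^{\prime \prime }_{ct}\).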

Step 2: Compute the matrix inversion of \(\hat{\mathbf{R}}'_{ct} \odot \mathbf{T}(\sigma _{v;k}) + \hat{\sigma }^2_n\mathbf{I}\).

For each \(\sigma _{v;k} \in \varOmega _{\sigma _v}\), we first compute and prestore the eigenvalues (denoted as \(\lambda _{\mathbf{T}(\sigma _{v;k});q}\)) and eigenvectors (denoted as \(\mathbf{u}_{\mathbf{T}(\sigma _{v;k});q}\)) of \(\mathbf{T}(\sigma _{v;k})\), where \(1\le q \le NM\). Note that \(\mathbf{T}(\sigma _{v;k})\) has a block Toeplitz structure, which can be exploited to reduce the complexity of the eigenvalue decomposition [13]. Substituting the computed eigenvalues and eigenvectors of \(\hat{\mathbf{R}}''_{ct}\) and \(\mathbf{T}(\sigma _{v;k})\), we obtain

$$\begin{aligned}&\hat{\mathbf{R}}^{\prime }_{ct} \odot \mathbf{T}(\sigma _{v;k}) \approx \hat{\mathbf{R}}''_{ct} \odot \mathbf{T}(\sigma _{v;k}) \nonumber \\&\quad = \left( \sum ^{N_{\varOmega _P}}_{i=1} \lambda _i \mathbf{u}_i\mathbf{u}^H_i \right) \odot \left( \sum ^{N_{NM}}_{j=1} \lambda _{\mathbf{T}(\sigma _{v;k});j} \mathbf{u}_{\mathbf{T}(\sigma _{v;k});j}{} \mathbf{u}^H_{\mathbf{T}(\sigma _{v;k});j}\right) \nonumber \\&\quad = \sum ^{N_{\varOmega _P}}_{i=1}\sum ^{N_{NM}}_{j=1} \lambda _i \lambda _{\mathbf{T}(\sigma _{v;k});j} \left( \mathbf{u}_i\mathbf{u}^H_i \right) \odot \left( \mathbf{u}_{\mathbf{T}(\sigma _{v;k});j}{} \mathbf{u}^H_{\mathbf{T}(\sigma _{v;k});j}\right) \nonumber \\&\quad = \sum ^{N_{\varOmega _P}}_{i=1}\sum ^{N_{NM}}_{j=1} \lambda _i \lambda _{\mathbf{T}(\sigma _{v;k});j} \left( \mathbf{u}_i \odot \mathbf{u}_{\mathbf{T}(\sigma _{v;k});j} \right) \left( \mathbf{u}_i \odot \mathbf{u}_{\mathbf{T}(\sigma _{v;k});j}\right) ^H. \end{aligned}$$
(39)
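The expansion in (39) rests on the identity \((\mathbf{a}\mathbf{a}^H) \odot (\mathbf{b}\mathbf{b}^H) = (\mathbf{a} \odot \mathbf{b})(\mathbf{a} \odot \mathbf{b})^H\). A small numerical check is sketched below; the random Hermitian matrix and the Gaussian-shaped matrix are generic stand-ins for \(\hat{\mathbf{R}}''_{ct}\) and \(\mathbf{T}(\sigma _{v;k})\), not the matrices of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

# Rank-2 Hermitian positive semidefinite stand-in for R''.
A = rng.standard_normal((n, 2)) + 1j * rng.standard_normal((n, 2))
R = A @ A.conj().T

# Real symmetric positive semidefinite stand-in for T.
lags = np.subtract.outer(np.arange(n), np.arange(n))
T = np.exp(-0.1 * lags ** 2)

lam_R, U_R = np.linalg.eigh(R)
lam_T, U_T = np.linalg.eigh(T)

# Double sum on the last line of (39).
expanded = np.zeros((n, n), dtype=complex)
for i in range(n):
    for j in range(n):
        u = U_R[:, i] * U_T[:, j]    # Hadamard product of two eigenvectors
        expanded += lam_R[i] * lam_T[j] * np.outer(u, u.conj())

hadamard = R * T                      # left-hand side of (39)
```

The two matrices agree up to rounding error, independently of any property of the vectors \(\mathbf{u}_i \odot \mathbf{u}_{\mathbf{T}(\sigma _{v;k});j}\).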

Let \(\tilde{u}_{i,j} = \mathbf{u}_i \odot \mathbf{u}_{\mathbf{T}(\sigma _{v;k});j}\), and we note that vectors \(\tilde{u}_{i,j}\) retain the orthonormality property (orthogonal and unit norm), proved as follows:

$$\begin{aligned} \tilde{u}^H_{i,j}\tilde{u}_{i^{\prime },j^{\prime }}= & {} \left( \mathbf{u}_i \odot \mathbf{u}_{\mathbf{T}(\sigma _{v;k});j} \right) ^H \left( \mathbf{u}_{i^{\prime }} \odot \mathbf{u}_{\mathbf{T}(\sigma _{v;k});j^{\prime }}\right) \nonumber \\= & {} \left( \mathbf{u}^H_i \mathbf{u}_{i^{\prime }} \right) \odot \left( \mathbf{u}^H_{\mathbf{T}(\sigma _{v;k});j}{} \mathbf{u}_{\mathbf{T}(\sigma _{v;k});j^{\prime }}\right) \nonumber \\= & {} \delta _{ii^{\prime }}\delta _{jj^{\prime }}, \end{aligned}$$
(40)

where \(\delta _{ii'}\) and \(\delta _{jj'}\) denote the Kronecker delta function. Therefore, the inverse of \(\hat{\mathbf{R}}'_{ct} \odot \mathbf{T}(\sigma _{v;k}) + \hat{\sigma }^2_n\mathbf{I}\) can be calculated as

$$\begin{aligned}&\left( {\hat{\mathbf{R}}}^{\prime }_{ct} \odot \mathbf{T}(\sigma _{v;k}) + {\hat{\sigma }}^2_n\mathbf{I} \right) ^{-1} \nonumber \\&\quad \approx \left( \hat{\mathbf{R}}^{\prime \prime }_{ct} \odot \mathbf{T}(\sigma _{v;k}) + {\hat{\sigma }}^2_n\mathbf{I} \right) ^{-1} \nonumber \\&\quad = \left( \sum ^{N_{\varOmega _P}}_{i=1}\sum ^{N_{NM}}_{j=1} \lambda _i \lambda _{\mathbf{T}(\sigma _{v;k});j} \tilde{u}_{i,j}\tilde{u}^H_{i,j} + {\hat{\sigma }}^2_n\mathbf{I}\right) ^{-1} \nonumber \\&\quad = \frac{1}{\hat{\sigma }^2_n} \left( \mathbf{I} - \sum ^{N_{\varOmega _P}}_{i=1}\sum ^{N_{NM}}_{j=1} \frac{\lambda _i \lambda _{\mathbf{T}(\sigma _{v;k});j}}{\lambda _i \lambda _{\mathbf{T}(\sigma _{v;k});j} + {\hat{\sigma }}^2_n} \tilde{u}_{i,j}\tilde{u}^H_{i,j}\right) . \end{aligned}$$
(41)
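The last step of (41) is the standard inverse of a scaled identity plus a weighted sum of rank-one terms built from orthonormal vectors. The sketch below checks that step with a generic orthonormal set \(\mathbf{q}_k\) and weights \(\mu _k\), which play the roles of \(\tilde{u}_{i,j}\) and \(\lambda _i \lambda _{\mathbf{T}(\sigma _{v;k});j}\). Orthonormality is imposed here by a QR factorization, which is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, noise_var = 8, 3, 0.5

# Orthonormal columns q_k (imposed by QR) and positive weights mu_k.
Q, _ = np.linalg.qr(rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r)))
mu = np.array([40.0, 10.0, 2.0])

low_rank = (Q * mu) @ Q.conj().T                    # sum of mu_k q_k q_k^H
direct = np.linalg.inv(low_rank + noise_var * np.eye(n))

# Closed form of (41): (1 / sigma^2) (I - sum mu_k / (mu_k + sigma^2) q_k q_k^H).
closed = (np.eye(n) - (Q * (mu / (mu + noise_var))) @ Q.conj().T) / noise_var
```

With orthonormal vectors the closed form matches the direct inverse, and only inner products with \(\mathbf{q}_k\) are needed to apply it to a vector.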

Substituting (41) into (34), we obtain

$$\begin{aligned}&C(\varOmega _{\sigma _v}) = \frac{1}{L}\sum ^L_{l=1}\nonumber \\&\quad \left| \frac{\mathbf{s}^H\mathbf{x}_l - \sum ^{N_{\varOmega _P}}_{i=1}\sum ^{N_{NM}}_{j=1} \frac{\lambda _i \lambda _{\mathbf{T}(\sigma _{v;k});j}}{\lambda _i \lambda _{\mathbf{T}(\sigma _{v;k});j} + \hat{\sigma }^2_n}\left( \tilde{u}^H_{i,j}\mathbf{s}\right) \left( \tilde{u}^H_{i,j}{} \mathbf{x}_l\right) }{1 - \sum ^{N_{\varOmega _P}}_{i=1}\sum ^{N_{NM}}_{j=1} \frac{\lambda _i \lambda _{\mathbf{T}(\sigma _{v;k});j}}{\lambda _i \lambda _{\mathbf{T}(\sigma _{v;k});j} + \hat{\sigma }^2_n} \left| \tilde{u}^H_{i,j}\mathbf{s}\right| ^2}\right| ^2. \end{aligned}$$
(42)

From the above equation, we observe that (34) requires a matrix inversion for each \(\sigma _{v;k} \in \varOmega _{\sigma _v}\), while (42) only needs simple multiplications using the eigenvalues and eigenvectors of the matrix \(\hat{\mathbf{R}}'_{ct}\) and the prestored eigenvalues and eigenvectors of \(\mathbf{T}(\sigma _{v;k})\). Finally, the STAP filter weight vector \(\hat{\mathbf{w}}\) in (30) can be rewritten as

$$\begin{aligned} {\hat{\mathbf{w}}} = \frac{\mathbf{s} - \sum ^{N_{\varOmega _P}}_{i=1}\sum ^{N_{NM}}_{j=1} \frac{\lambda _i \lambda _{\mathbf{T}(\sigma _{v;\mathrm{opt}});j}}{\lambda _i \lambda _{\mathbf{T}(\sigma _{v;\mathrm{opt}});j} + {\hat{\sigma }}^2_n}\left( \tilde{u}^H_{i,j}\mathbf{s}\right) \tilde{u}_{i,j}}{1 - \sum ^{N_{\varOmega _P}}_{i=1}\sum ^{N_{NM}}_{j=1} \frac{\lambda _i \lambda _{\mathbf{T}(\sigma _{v;\mathrm{opt}});j}}{\lambda _i \lambda _{\mathbf{T}(\sigma _{v;\mathrm{opt}});j} + {\hat{\sigma }}^2_n} \left| \tilde{u}^H_{i,j}{} \mathbf{s}\right| ^2}. \end{aligned}$$
(43)

3.5 Performance Metrics

Here, we detail two traditional performance metrics, i.e., SINR loss and SINR improvement factor, for analyzing the performance of the proposed sparsity-based STAP algorithm. The output SINR loss is defined as the output SINR performance relative to the matched filter \(\mathrm{SNR}_\mathrm{opt}\) in an interference-free environment (where the \(\mathrm{SNR}_\mathrm{opt}\) is equivalent to the number of system DOFs NM multiplying the target power \(\sigma ^2_t = E[|\alpha _t|^2]\), i.e., \(NM\sigma ^2_t\)) [26], given as

$$\begin{aligned} \mathrm{SINR}_\mathrm{loss} = \frac{\mathrm{SINR}}{\mathrm{SNR}_\mathrm{opt}} = \frac{\left| \hat{\mathbf{w}}^H\mathbf{s}\right| ^2}{NM\left| \hat{\mathbf{w}}^H\mathbf{R}\hat{\mathbf{w}}\right| }, \end{aligned}$$
(44)

where \(\mathbf{R}\) is the true clutter plus noise covariance matrix. As pointed out in [21], there is no closed-form statistical characterization of the estimated clutter covariance matrix when using the sparsity-based STAP algorithms. Therefore, we use a Monte Carlo technique to generate the SINR loss performance.

The SINR improvement factor, \(\mathrm{IF}\), is the gain in SINR relative to the input SINR \(\mathrm{SINR}_\mathrm{in}\) on a single channel and a single pulse, and is defined as

$$\begin{aligned} \mathrm{IF} = \frac{\mathrm{SINR}}{\mathrm{SINR}_\mathrm{in}} = \frac{\left( 1 + \mathrm{CNR}\right) \left| \hat{\mathbf{w}}^H\mathbf{s}\right| ^2}{\left| \hat{\mathbf{w}}^H\mathbf{R}\hat{\mathbf{w}}\right| }, \end{aligned}$$
(45)

where \(\mathrm{CNR}\) is defined by

$$\begin{aligned} \mathrm{CNR} = \frac{\mathrm{tr}\left( \mathbf{R}_{ct}\right) }{\sigma ^2_n NM}. \end{aligned}$$
(46)

Additionally, we define the target’s SNR as [26]

$$\begin{aligned} \mathrm{SNR} = \frac{E[\mathbf{x}^H_t\mathbf{x}_t]}{\sigma ^2_n NM}, \end{aligned}$$
(47)

which is the ratio of the input target power to the input noise. The SINR improvement factor provides not only the amount of clutter rejection but also the coherent gain on target due to spatial and temporal beamforming [26].
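The metrics (44) to (46) translate directly into code. The sketch below is a plain transcription with hypothetical function names; the sanity check uses a noise-only scenario with unit noise power and a unit-modulus steering vector, for which the matched filter should give an SINR loss of one and an improvement factor of \(NM\).

```python
import numpy as np

def sinr_loss(w, s, R):
    """SINR loss of (44): |w^H s|^2 / (NM |w^H R w|)."""
    NM = s.size
    return abs(w.conj() @ s) ** 2 / (NM * abs(w.conj() @ R @ w))

def improvement_factor(w, s, R, R_ct, noise_var):
    """Improvement factor of (45), with the CNR computed as in (46)."""
    NM = s.size
    cnr = np.real(np.trace(R_ct)) / (noise_var * NM)
    return (1.0 + cnr) * abs(w.conj() @ s) ** 2 / abs(w.conj() @ R @ w)

# Sanity check in a clutter-free scenario (illustrative values).
NM = 24
s = np.exp(2j * np.pi * 0.3 * np.arange(NM))   # unit-modulus steering vector
R = np.eye(NM, dtype=complex)                  # unit-power white noise only
R_ct = np.zeros((NM, NM))                      # no clutter
w = s / (s.conj() @ s)                         # matched filter with w^H s = 1

loss = sinr_loss(w, s, R)
gain = improvement_factor(w, s, R, R_ct, noise_var=1.0)
```

In a Monte Carlo study these functions would be evaluated with the estimated weight vector of each run and the true covariance matrix \(\mathbf{R}\), then averaged over the runs.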

In addition, to study the effect of various parameters on the SINR, we employ the average value of the SINR loss or SINR improvement factor over all Doppler frequencies as composite performance metrics given by

$$\begin{aligned} \overline{\mathrm{SINR}}_\mathrm{loss}= & {} \int ^{0.5}_{-0.5} \mathrm{SINR}_\mathrm{loss}(f) df, \end{aligned}$$
(48)

$$\begin{aligned} \overline{\mathrm{IF}}= & {} \int ^{0.5}_{-0.5} \mathrm{IF}(f) df. \end{aligned}$$
(49)

Similar to the SINR loss, we use a Monte Carlo technique to generate \(\mathrm{IF}, \overline{\mathrm{SINR}}_\mathrm{loss}\) and \(\overline{\mathrm{IF}}\).
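In practice the integrals (48) and (49) are evaluated on a finite set of Doppler bins. A minimal sketch using the trapezoidal rule is given below; the Doppler grid and the toy SINR-loss curve with a notch at zero Doppler are hypothetical and only serve to exercise the function.

```python
import numpy as np

def doppler_average(values, f):
    """Trapezoidal approximation of the integrals (48)-(49) over the sampled Doppler axis f."""
    values = np.asarray(values, dtype=float)
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(f)))

f = np.linspace(-0.5, 0.5, 101)                 # normalized Doppler frequencies
toy_loss = 1.0 - np.exp(-(f / 0.05) ** 2)       # hypothetical SINR-loss curve (linear scale)

avg_loss = doppler_average(toy_loss, f)
avg_flat = doppler_average(np.ones_like(f), f)  # a flat curve at 1 must average to 1
```

Because the integration interval has unit length, the integral coincides with the mean value of the metric over the normalized Doppler axis.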

4 Numerical Results

In this section, we assess the output SINR and \(P_d\) performance of the proposed algorithm (shortened as the CMT-FOCUSS-D3-STAP algorithm; see Footnote 6) using simulated data and compare it with the conventional sample matrix inversion (SMI) algorithm, the D3-LS STAP method in [24], the FOCUSS-D3-STAP algorithm in [22] and the sparsity-aware beamformer in [35]. Throughout the simulations, unless otherwise stated, the parameters of the simulated scenarios are given in Table 1. The thermal noise is modeled as Gaussian white noise with unit power. The clutter and target powers are referred to the thermal noise power. In the simulations, for the sparsity-based STAP algorithms, we set \(N_d=4N\) and \(N_s=4M\) for the discretized spatio-temporal plane and set the diagonal loading factor \(\beta _L\) to the noise level in the STAP filter design. We set the \(l_p\)-norm to \(p=1\) and the maximum number of iterations to 500, and we stop the iterations when the relative change of the solution between two adjacent iterations falls below the preset limit \(10^{-4}\). For the D3-LS STAP algorithm, the numbers of temporal and spatial DOFs used to design the filter weights are \(N'=7\) and \(M'=7\), respectively. For the SMI algorithm, the number of snapshots without a target is \(2NM=288\). For the sparsity-aware beamformer algorithm, the number of snapshots without a target is 100 and the regularization parameter value is set to 0.3. All presented results are averaged over 300 independent Monte Carlo runs.

Table 1 Radar system parameters

4.1 Effects of Parameter Settings on Performance

In this subsection, we focus on the effects of parameter settings on the performance of the proposed algorithm, such as the estimated CMT, the diagonal loading term \(\beta _L\) and the target region \(\varOmega _\mathrm{SOI}\).

Fig. 2 SINR loss against target Doppler frequency of the CMT-FOCUSS-D3-STAP algorithm with different estimated CMT

In the first example, we evaluate the effect of the estimated CMT on the SINR performance for the proposed algorithm. We consider the ICM with \(\sigma _v=1\) and assume the estimated CMT with six different values, i.e., \(\hat{\sigma }_v = 0, 0.25, 0.5, 1, 2.5, 5\), where \(\hat{\sigma }_v = 0\) corresponds to the FOCUSS-D3-STAP algorithm. Figure 2 depicts the SINR loss against normalized target Doppler frequency. The curves suggest that the proposed CMT-FOCUSS-D3-STAP algorithm outperforms the conventional FOCUSS-D3-STAP algorithm, and the resulting SINR is relatively invariant to errors in the estimate of the CMT, e.g., the proposed algorithms with \(\hat{\sigma }_v = 1, 2.5, 5\) provide close performance.

Fig. 3 SINR loss against target Doppler frequency of the CMT-FOCUSS-D3-STAP algorithm with different \(\varOmega _\mathrm{SOI}\)

In the second example, we assess the effect of the assumed target region \(\varOmega _\mathrm{SOI}\) on the performance of the proposed algorithm. We consider the ICM with \(\sigma _v=1\) and assume the estimated CMT of the proposed algorithm with \(\hat{\sigma }_v = 2\). As shown in Fig. 3, the proposed algorithm provides the best performance when the assumed target region \(\varOmega _\mathrm{SOI}\) is a diffraction-limited resolution cell, which validates the analysis in Sect. 3.3.

Fig. 4 SINR loss against target Doppler frequency of the CMT-FOCUSS-D3-STAP algorithm with different \(\beta _L\)

In the third example, we consider the sensitivity of the performance of the proposed algorithm to the diagonal loading term \(\beta _L\) in the filter (30). The assumed target region \(\varOmega _\mathrm{SOI}\) is set to be a diffraction-limited resolution cell, and other simulation parameters are the same as those in the second example, except for \(\beta _L\). The results are plotted in Fig. 4. The curves show that the proposed algorithm exhibits robust performance for a range of values of the diagonal loading term \(\beta _L\), i.e., \(\beta _L=-20,-15,-10,-5,0\) dB relative to the true noise level.

4.2 Comparisons with Existing Algorithms

In this subsection, we focus on the performance comparisons between the proposed algorithm and other existing algorithms.

Fig. 5 SINR loss against the number of snapshots for all algorithms

To compare the convergence performance with that of existing STAP algorithms using training data, we illustrate the SINR loss against the number of snapshots in Fig. 5. In this example, we consider the ICM with \(\sigma _v=1\) and assume the estimated CMT of the proposed algorithm with \(\hat{\sigma }_v = 2\). In addition, we keep the target normalized Doppler frequency at 0.3 for all algorithms. Because the D3-type STAP algorithms, i.e., the D3-LS STAP algorithm, the FOCUSS-D3-STAP algorithm and the proposed CMT-FOCUSS-D3-STAP algorithm, only use the snapshot in the CUT, their SINR performance remains invariant as the number of snapshots increases, whereas the SINR performance of the conventional SMI algorithm and the sparsity-aware beamformer algorithm improves as the number of snapshots increases. Thus, the D3-type STAP algorithms exhibit a significant performance improvement over the standard SMI algorithm and the sparsity-aware beamformer algorithm when the number of IID snapshots is very low, e.g., in seriously nonhomogeneous environments.

Fig. 6 Clutter eigenspectra for different levels of ICM

Fig. 7 Comparison of SINR loss against target Doppler frequency of different STAP algorithms with different levels of ICM. a \(\sigma _v=0.25\); b \(\sigma _v=0.5\); c \(\sigma _v=1\); and d \(\sigma _v=5\)

In the second example, we compare the performance of the proposed CMT-FOCUSS-D3-STAP algorithm using CMT adaptation with other existing STAP algorithms in four different ICM cases, namely \(\sigma _v=0.25, 0.5, 1, 5\). In the simulation, we set the candidate space of the parameter \(\sigma _v\) as \(\varOmega _{\sigma _v}=\{\sigma _{v;k}= 0.1, 0.3, 0.9, 1.5, 2.5, 5, 6\}\). The clutter eigenspectrum for different levels of ICM is illustrated in Fig. 6. As the velocity standard deviation \(\sigma _v\) increases, the clutter rank increases and the tails of the eigenspectrum become larger. Figure 7a–d plots the SINR loss against target Doppler frequency of different STAP algorithms with the above-mentioned levels of ICM. From the figures, we observe that: (i) the proposed CMT-FOCUSS-D3-STAP algorithm outperforms the FOCUSS-D3-STAP algorithm and the D3-LS STAP algorithm in almost all Doppler bins; (ii) as \(\sigma _v\) increases, the performance of the FOCUSS-D3-STAP algorithm and the D3-LS STAP algorithm degrades significantly, while the proposed CMT-FOCUSS-D3-STAP algorithm remains relatively robust to the ICM. This is because the proposed algorithm employs a CMT to overcome the effects of the ICM. Furthermore, comparing the results of the proposed algorithm in Figs. 2 and 7c suggests that the proposed CMT adaptation approach can efficiently select the best CMT to suppress the clutter. In addition, we also observe that the proposed algorithm performs worse than the SMI algorithm and the sparsity-aware beamformer algorithm in the normalized Doppler range of \(-0.2\) to 0.2. There are two reasons. On the one hand, some clutter components may also be excluded by using the target region \(\varOmega _\mathrm{SOI}\) when the target is close to the clutter ridge. On the other hand, the numbers of snapshots used in the SMI algorithm and the sparsity-aware beamformer algorithm are 288 and 100, respectively, while the proposed algorithm only uses the snapshot in the CUT. In seriously nonhomogeneous environments, the performance of the SMI algorithm and the sparsity-aware beamformer algorithm will be seriously affected because of the snapshot deficiency.

Fig. 8 \(P_d\) performance versus target’s SNR

Because the target component will be included in the recovered spatio-temporal profile, in this example, we evaluate the effect of SNR on the \(P_d\) performance of the proposed algorithm with the adaptive CMT approach, as shown in Fig. 8. In the simulations, we keep the ICM with \(\sigma _v=1\) and the CNR at 50 dB. The false alarm rate \((P_{fa})\) is set to \(10^{-3}\), and for simulation purposes the threshold and probability of detection estimates are based on 1000 samples. Furthermore, we consider one scenario with a target not close to the clutter ridge (normalized Doppler frequency 0.3) and another scenario with a target close to the clutter ridge (normalized Doppler frequency 0.1). The other parameters are the same as those in the previous examples. The figure shows that although the performance of the proposed algorithm degrades when the target is close to the clutter ridge, it always provides higher detection performance than the conventional FOCUSS-D3-STAP algorithm.
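A Monte Carlo estimate of the threshold and of \(P_d\) can be organized as sketched below. The test statistic used here, the squared magnitude of a complex Gaussian output with or without a fixed target amplitude, is a hypothetical stand-in for the STAP filter output power, and the empirical-quantile threshold is one common choice rather than necessarily the procedure used for Fig. 8.

```python
import numpy as np

def empirical_threshold(null_stats, p_fa):
    """Threshold as the (1 - p_fa) empirical quantile of target-free test statistics."""
    return np.quantile(null_stats, 1.0 - p_fa)

def detection_probability(target_stats, threshold):
    """Fraction of target-present test statistics that exceed the threshold."""
    return float(np.mean(target_stats > threshold))

rng = np.random.default_rng(4)
n_trials, p_fa = 1000, 1e-3

def complex_noise(size):
    """Unit-power circular complex Gaussian samples."""
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

# Hypothetical test statistics |y|^2 without and with a target of amplitude 4.
null_stats = np.abs(complex_noise(n_trials)) ** 2
target_stats = np.abs(4.0 + complex_noise(n_trials)) ** 2

threshold = empirical_threshold(null_stats, p_fa)
p_d = detection_probability(target_stats, threshold)
empirical_p_fa = float(np.mean(null_stats > threshold))
```

With 1000 trials and \(P_{fa}=10^{-3}\), the threshold is determined by the largest few target-free samples, so the estimate is coarse and more trials would be needed for a tighter threshold.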

Fig. 9 Average SINR loss performance versus different levels of SNR

For a more comprehensive analysis of the effect of SNR on the proposed algorithm, the average SINR loss performance versus different levels of SNR for different STAP algorithms is plotted in Fig. 9. Here, we consider five different cases of SNR, i.e., \(\mathrm{SNR} = -10, 0, 10, 20, 30\) dB. The results show that when the target power is slightly higher than the noise power but far lower than the clutter power, i.e., \(\mathrm{SNR}=10\) dB, the SINR improvements of the proposed algorithm and the FOCUSS-D3-STAP algorithm slightly degrade. This is because, in this situation, it is hard to accurately recover the target components, so some target energy spreads to other elements of the spatio-temporal profile. Interestingly, the smaller the target energy, the smaller the effect of the target signal on the estimated clutter covariance matrix. In other words, the proposed algorithm can obtain a larger SINR improvement in weak target detection.

Fig. 10 Clutter eigenspectra for different levels of CNR

Fig. 11 Comparison of average SINR loss and average IF against different levels of CNR. a Average SINR loss. b Average IF

In the final example, we assess the effect of CNR on the SINR performance of the proposed algorithm. In the simulations, we keep the ICM with \(\sigma _v=1\) and the SNR at 0 dB, but vary the CNR, i.e., \(\mathrm{CNR} = 30,40,50,60\) dB. The other parameters are the same as in the previous examples. Figure 10 depicts the clutter eigenspectrum for different levels of CNR. The figure shows that as the CNR increases, the clutter rank increases and the tails of the eigenspectrum become larger. Figure 11a, b plots the average SINR loss and average IF performance versus different levels of CNR for different STAP algorithms. The curves show that the proposed CMT-FOCUSS-D3-STAP algorithm is relatively more robust to the CNR than the FOCUSS-D3-STAP algorithm and the D3-LS STAP algorithm. It should also be noted that the SMI algorithm and the sparsity-aware beamformer algorithm are the most robust to different levels of CNR. This is because the SMI algorithm and the sparsity-aware beamformer algorithm use many snapshots to estimate the clutter covariance matrix, whereas the D3-type STAP algorithms only use the snapshot in the CUT, resulting in some deficiency of the estimated clutter power. Through the CMT, the proposed CMT-FOCUSS-D3-STAP algorithm can mitigate this deficiency to some extent.

5 Conclusions

In this paper, we have proposed a sparsity-based D3-STAP algorithm by formulating a more realistic sparse measurement model for airborne radar that considers the ICM. We first derived the principle of the sparsity-based D3-STAP algorithm based on the FOCUSS technique, which showed that the proposed algorithm estimates the clutter covariance matrix as a Hadamard product of the CMT and the clutter covariance matrix estimate obtained with the SR technique. Furthermore, a CMT adaptation approach for the proposed algorithm was developed to automatically select the best CMT. The results for different scenarios in an airborne radar system have shown that the proposed algorithm outperforms the D3-LS STAP algorithm and the conventional sparsity-based D3-STAP algorithm and is expected to obtain much better performance than the SMI algorithm and the sparsity-aware beamformer algorithm in heterogeneous environments. In the future, we will extend our model to incorporate more realistic physical effects, such as channel mismatch, antenna array misalignment and range walk.