1 Introduction

Target detection against a background of sea clutter is widely used in military and civilian settings and is a hotspot of radar research [1]. Sea clutter has non-Gaussian, nonlinear, and non-stationary characteristics [2,3,4], which cause mismatches in the statistical model and degrade the target detection performance of the constant false alarm rate (CFAR) method [5]. In contrast to CFAR detectors, convolutional neural networks (CNNs) [6,7,8,9] are data-driven and build models with deep perceptual networks, which overcomes the limitation of relying solely on statistical characteristics to model clutter distributions [10].

In Ref. [11], radar target detection is formulated as a binary classification problem between targets and clutter, realized using the Doppler-domain information of the echo signal. To classify maritime targets, clutter, and coastline, a CNN is applied to segmented maritime radar image samples in [12]. However, because targets exhibit a variety of motion characteristics, the Doppler velocity of clutter cells tends to be large in low sea states, and the Doppler spectra of target and clutter partially overlap, so detection based on a single feature is unreliable.

In this paper, the time–frequency and amplitude characteristics of sea clutter are extracted using a two-channel convolutional neural network. From the viewpoints of feature fusion and feature extraction, a feature-vector-layer fusion model and a decision-layer fusion model are then developed. In addition, an attention mechanism is incorporated into sea clutter target detection to improve the model's feature extraction ability. Simulation analysis is used to determine the most suitable model for the proposed strategy, taking into account detection performance, parameter count, computational complexity, and detection time.

2 Proposed Method Descriptions

In this section, a two-channel feature extraction network structure is developed, together with a convolutional block attention module. The prediction results of each branch are fused at the decision layer, and the target detection problem is transformed into a binary classification problem using a Softmax classifier. The decision threshold is adjusted to effectively control the system's false alarm rate.

2.1 Fusion Attention Mechanism for Feature Extraction Networks

The Convolutional Block Attention Module (CBAM) [13, 14] is integrated into the feature extraction network VGG16 and trained end-to-end together with VGG16 to form an improved feature extraction network. Figure 1 illustrates the network structure. CBAM consists of two sub-modules: a Channel Attention Module (CAM) and a Spatial Attention Module (SAM). The two modules infer attention weights in turn along the channel and spatial dimensions and use them to refine the features adaptively.

Fig. 1
Network of VGG16 feature extraction with CBAM

As shown in Fig. 2, an input feature map \(F\) is first compressed to reduce its spatial dimension: average pooling and maximum pooling aggregate the spatial information into two \(1 \times 1 \times C\) feature maps, \(F_{\max }^{c}\) and \(F_{avg}^{c}\). The two feature maps are then passed through a shared multilayer perceptron (MLP) with one hidden layer, and the output feature vectors are summed element by element. Finally, a Sigmoid activation produces the channel attention map \(A_{c}\).

Fig. 2
Process of CAM operation

Formulas (1)–(3) show the calculation process:

$$ A_{c} \left( F \right) = \sigma \left( {MLP\left( {AvgPool\left( F \right)} \right) + MLP\left( {MaxPool\left( F \right)} \right)} \right) $$
(1)
$$ A_{c} \left( F \right) = \sigma \left( {W_{1} \left( {W_{0} \left( {F_{\max }^{c} } \right)} \right) + W_{1} \left( {W_{0} \left( { F_{avg}^{c} } \right)} \right)} \right) $$
(2)
$$ F_{c} = A_{c} \left( F \right) \otimes F $$
(3)

where \(MaxPool\) and \(AvgPool\) represent maximum pooling and average pooling, respectively, \(W_{0}\) and \(W_{1}\) represent the two weight matrices of the MLP, \(\sigma\) represents the Sigmoid function, and \(\otimes\) represents element-wise multiplication.
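As a concrete illustration, the following is a minimal PyTorch sketch of the CAM branch in Eqs. (1)–(3); the hidden-layer reduction ratio is our assumption (a common choice is \(r = 16\)), and the shared MLP \(W_{1}(W_{0}(\cdot))\) is realized with \(1 \times 1\) convolutions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Minimal sketch of the CAM branch of CBAM (Eqs. (1)-(3))."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)   # F_avg^c: 1 x 1 x C
        self.max_pool = nn.AdaptiveMaxPool2d(1)   # F_max^c: 1 x 1 x C
        self.mlp = nn.Sequential(                 # shared MLP: W1(W0(.))
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # A_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))  (Eqs. (1)-(2))
        a_c = self.sigmoid(self.mlp(self.avg_pool(f)) + self.mlp(self.max_pool(f)))
        return a_c * f                            # F_c = A_c(F) (x) F  (Eq. (3))
```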

SAM focuses on where the informative parts of the feature map are located; Fig. 3 shows its operation process. First, average pooling and maximum pooling are performed along the channel dimension to obtain two feature maps of size \(H \times W \times 1\). Next, the two maps are concatenated into an \(H \times W \times 2\) feature map, and a \(7 \times 7\) convolution kernel reduces it back to an \(H \times W \times 1\) map. Finally, the spatial attention map \(A_{s}\) is generated via Sigmoid activation, and the final salient feature map \(F_{R}\) is obtained through element-wise multiplication.

Fig. 3
Process of SAM operation

Formulas (4)–(6) show the calculation process:

$$ A_{s} \left( {F_{c} } \right) = \sigma \left( {Conv_{7 \times 7} \left( {\left[ {MaxPool\left( {F_{c} } \right); AvgPool\left( {F_{c} } \right)} \right]} \right)} \right) $$
(4)
$$ A_{s} \left( {F_{c} } \right) = \sigma \left( {Conv_{7 \times 7} \left( {\left[ {F_{\max }^{s} ; F_{avg}^{s} } \right]} \right)} \right) $$
(5)
$$ F_{R} = A_{s} \left( {F_{c} } \right) \otimes F_{c} $$
(6)
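A minimal sketch of the SAM branch in Eqs. (4)–(6) follows, together with the sequential CAM-then-SAM composition shown in Fig. 1; `ChannelAttention` refers to the CAM sketch above, and the \(7 \times 7\) kernel follows Eq. (4).

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Minimal sketch of the SAM branch of CBAM (Eqs. (4)-(6))."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_c: torch.Tensor) -> torch.Tensor:
        f_max, _ = f_c.max(dim=1, keepdim=True)   # F_max^s: H x W x 1
        f_avg = f_c.mean(dim=1, keepdim=True)     # F_avg^s: H x W x 1
        # A_s(F_c) = sigmoid(Conv7x7([F_max^s; F_avg^s]))  (Eqs. (4)-(5))
        a_s = self.sigmoid(self.conv(torch.cat([f_max, f_avg], dim=1)))
        return a_s * f_c                          # F_R = A_s(F_c) (x) F_c  (Eq. (6))

class CBAM(nn.Module):
    """CAM followed by SAM, as in Fig. 1 (ChannelAttention per the sketch above)."""
    def __init__(self, channels: int):
        super().__init__()
        self.cam = ChannelAttention(channels)
        self.sam = SpatialAttention()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return self.sam(self.cam(f))
```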

2.2 Algorithm for Decision Fusion

In this framework, decision-level feature fusion is implemented through two modules: a classification module and a decision fusion module. LeNet-5 and VGG16 are used as the feature extraction channels in the classification module, and the mean fusion algorithm is used in the decision fusion module (Fig. 4).

Fig. 4
Model based on fusion of decision layers

The VGG16 channel contains two fully connected layers with an output vector size of \(4096 \times 1\), while the LeNet-5 channel has an output vector size of \(120 \times 1\). In the last fully connected layer of each network branch, the class probability vector is computed with the Softmax function. During feature extraction, the weights and parameters of each layer of the trained network are loaded using transfer learning.
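A hedged sketch of this transfer-learning step for the VGG16 branch is shown below (assuming a recent torchvision; replacing the final fully connected layer with a 2-class target/clutter head is our assumption, not the paper's stated procedure):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained VGG16 weights (transfer learning), then swap the
# last FC layer (4096 -> 1000) for a 2-class target/clutter head.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
vgg.classifier[6] = nn.Linear(4096, 2)

x = torch.randn(1, 3, 224, 224)        # dummy input sample
probs = torch.softmax(vgg(x), dim=1)   # branch class probability vector
```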

The prediction results of each branch of the classification module are fused at the decision level in the fusion module. Let \(P_{i} = \left( {p_{i,1} , p_{i,2} } \right)\) denote the prediction result of the \(i\)-th branch; since there are two branches, \(i \in \left\{ {1, 2} \right\}\). With different fusion rules, the final fused prediction \(P_{f} = \left( {p_{f, 1} , p_{f, 2} } \right)\) can be obtained. In this paper, the element-wise mean strategy is used for fusion. The fusion rule for the \(j\)-th element \(p_{f,j}\) under the element mean strategy is as follows:

$$ P_{f,j} = \frac{1}{2}\mathop \sum \limits_{i = 1}^{2} p_{i, j} $$
(7)
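Equation (7) simply averages the two branches' Softmax outputs element by element; a minimal sketch with illustrative probability values:

```python
import numpy as np

# Element-wise mean fusion of the two branches' predictions (Eq. (7)).
# p1, p2 hold each branch's (clutter, target) probabilities; names are illustrative.
def mean_fusion(p1: np.ndarray, p2: np.ndarray) -> np.ndarray:
    return 0.5 * (p1 + p2)          # P_f = (p_f,1, p_f,2)

p_vgg = np.array([0.10, 0.90])      # VGG16 branch prediction
p_lenet = np.array([0.30, 0.70])    # LeNet-5 branch prediction
print(mean_fusion(p_vgg, p_lenet))  # -> [0.2 0.8]
```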

3 Experiments and Analysis

The proposed method comprises three parts: data preprocessing, dataset construction, and model training. During forward propagation of the dataset through the network model, preliminary prediction results are obtained. In backpropagation, the model weights are adjusted by computing the error between the predicted and expected values. This yields the optimal network parameter model for binary classification of targets and clutter.
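A minimal sketch of one such training step follows; the tiny stand-in model, random data, and hyperparameters are placeholders (the paper's networks are VGG16/LeNet-5), and the \(229 \times 224\) sample shape anticipates the dataset layout described in Sect. 3.1.

```python
import torch
import torch.nn as nn

# Stand-in classifier over flattened 229 x 224 dual-feature samples.
model = nn.Sequential(nn.Flatten(), nn.Linear(229 * 224, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(8, 1, 229, 224)   # dummy dual-feature batch
labels = torch.randint(0, 2, (8,))     # 0 = clutter, 1 = target

logits = model(images)                 # forward pass -> preliminary predictions
loss = criterion(logits, labels)       # error between prediction and label
optimizer.zero_grad()
loss.backward()                        # backpropagation adjusts the weights
optimizer.step()
```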

3.1 Data Set Description and Settings

In this paper, a dual-feature dataset combining sea clutter amplitude and Doppler velocity is produced for the parallel dual-channel feature network structure described in Sect. 2. The IPIX sea clutter data are used for training and testing.

A sea clutter dual-feature image sample is built by splicing and packaging data obtained from the same signal sequence after different preprocessing methods, so that the time–frequency map and the amplitude map correspond to the same echo segment. Figure 5 displays a dual-feature image of sea clutter: rows 1–224 carry the time–frequency features, while rows 225–229 carry the compressed amplitude features.

Fig. 5
Sea clutter dual-feature dataset: (a) target data; (b) clutter data. The highlighted bottom rows show the amplitude feature versus time
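A sketch of how such a dual-feature sample could be assembled is given below, assuming a \(229 \times 224\) image layout; the STFT window, overlap, and amplitude compression are illustrative choices, not the paper's preprocessing settings.

```python
import numpy as np
from scipy import signal

# Stand-in complex echo sequence in place of a real IPIX segment.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096) + 1j * rng.standard_normal(4096)

# Rows 1-224: time-frequency feature via the short-time Fourier transform.
_, _, Z = signal.stft(x, nperseg=224, noverlap=207, return_onesided=False)
tf_map = np.abs(Z)[:224, :224]

# Rows 225-229: amplitude feature via the modulo method, crudely compressed.
amp = np.abs(x)
amp_ds = amp[:: len(amp) // 224][:224]   # downsample to 224 points
amp_rows = np.tile(amp_ds, (5, 1))       # replicate into 5 rows

sample = np.vstack([tf_map, amp_rows])   # 229 x 224 dual-feature image
```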

3.2 Evaluations of the Proposed Method

As shown in Table 1, among the single-channel Models 1–4, VGG16 achieves higher clutter classification accuracy, while LeNet shows higher target classification accuracy. Moreover, compared with these single-channel models, the dual-channel Models 5–7 significantly improve the classification accuracy of target samples.

Table 1 Extraction model performance

To verify the detection performance of Model 6, which is based on the decision fusion algorithm, we designed Model 5, which uses a different feature fusion strategy but the same feature extraction channels. The experimental results show that Model 6 achieves a target accuracy of 91.27% and a clutter accuracy of 98.33%, improving target classification performance by 4.45%. After the CBAM module was introduced into VGG16, the detection probability increased by a further 1.54% and 0.86%.

As shown in Fig. 6, Model 7 outperforms Models 5 and 6 when a variable-threshold Softmax classifier is used. For false alarm probabilities greater than \(10^{-2}\), Models 5 and 6 display similar detection performance, but both perform worse than Model 7. When the false alarm rate is less than \(10^{-3}\), Model 7 achieves a higher detection probability, and at a false alarm rate of \(10^{-4}\) it still achieves a detection accuracy of 84.16%.

Fig. 6
ROC curves for Models 5–7
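The variable-threshold Softmax detector behind such ROC curves can be sketched as follows; the synthetic scores stand in for real test-set outputs, and the thresholds are illustrative.

```python
import numpy as np

# Sweep a decision threshold over the fused target probability to trade
# false alarm probability (P_fa) against detection probability (P_d).
rng = np.random.default_rng(0)
p_target = np.concatenate([rng.beta(8, 2, 1000),    # scores on target samples
                           rng.beta(2, 8, 1000)])   # scores on clutter samples
is_target = np.concatenate([np.ones(1000, bool), np.zeros(1000, bool)])

for eta in (0.5, 0.9, 0.99):            # decision threshold
    decide = p_target > eta             # declare "target" above threshold
    p_d = decide[is_target].mean()      # detection probability
    p_fa = decide[~is_target].mean()    # false alarm probability
    print(f"eta={eta:.2f}  Pd={p_d:.3f}  Pfa={p_fa:.3f}")
```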

3.3 Analysis of Influencing Factors

In practical detection tasks, environmental factors such as wind speed, temperature, and weather make the radar operating environment complex and variable. Moreover, sea surface information must be acquired at high speed during detection, which requires controlling the radar dwell time: the shorter the dwell time, the shorter the observation time. This section therefore experimentally verifies the detection performance of the proposed detector under different observation durations and sea conditions.

Figure 7a compares the detector's performance for different observation times under HH polarization. The results indicate that increasing the observation time significantly improves the detection probability. Notably, at high false alarm rates, increasing the observation time from 2048 to 4096 ms improves the detection performance only slightly. Moreover, when the observation time is only 256 ms and the false alarm probability is \(10^{-3}\), the detection accuracy still exceeds 80%, which meets the practical requirements of short observation time, low false alarm probability, and high detection accuracy.

Fig. 7
Model detection performance under different observation durations and sea states: (a) ROC curves for observation times of 256–4096 ms; (b) detection probability versus sea state for HH, VV, HV, and VH polarizations

Figure 7b shows detection results for sea states 2, 3, and 4 with an observation time of 1024 ms and a false alarm probability of \(10^{-3}\). In sea state 3, the Doppler spectra of the sea clutter and target cells differ clearly, so the network can extract and learn image features from these differences. In sea state 2, however, the target Doppler spectrum overlaps with the clutter Doppler spectrum, which limits detection performance. In sea state 4, the backward electromagnetic scattering of sea clutter is strong, and sea spikes caused by waves and swells resemble the target echo and have high amplitude, making it easier for them to mask the target; the detection probability therefore decreases slightly.

4 Conclusion

In this paper, radar signal target detection is converted into a binary classification problem, and measured sea clutter and target radar signal data are used to test the performance of different feature extraction models as well as an attention-mechanism-based method.

Radar signals are processed by the short-time Fourier transform and the modulo method, VGG16 and LeNet are used for feature extraction, CBAM is integrated into the VGG16 network, and the branch outputs are fused at the decision layer. The detection probability is 87.88% at a false alarm rate of \(10^{-3}\) and 84.16% at a false alarm rate of \(10^{-4}\).