1 Introduction

Intra-pulse modulation recognition of radar signals plays an essential role in modern electronic warfare systems, such as electronic reconnaissance, electronic intelligence, and electronic attack [17]. Accurate recognition not only helps infer the modulation types of enemy radar signals and further judge their threat level, but also improves the accuracy of parameter estimation [30]. However, with the rapid development of radar technology in recent years, the complexity and diversity of radar signal types are increasing, and the signal-to-noise ratio (SNR) of the operating environment continues to decrease [1]. Therefore, an efficient approach is urgently needed to improve the recognition performance of radar signals at low SNR.

Many radar signal recognition methods have been proposed in the past few years [27], including traditional feature extraction methods [9] and artificial intelligence methods [18]. Traditional radar signal recognition methods mainly use feature extraction and classification techniques to extract features and classify the types of radar signals [3]. For feature extraction, researchers have proposed the short-time Ramanujan Fourier transform [11], the integrated quadratic phase function [4], and scale-invariant feature transform (SIFT) positions [15]. Over the years, various kinds of features, such as instantaneous features [7], high-order cumulants [6], and time-frequency features [29], have improved radar signal recognition performance compared with earlier methods.

To further improve recognition performance, deep learning approaches have also been proposed for radar signals [13]. In [21], a radar signal intra-pulse modulation recognition method based on a convolutional neural network and a deep Q-learning network (CNN-DQLN) was put forward; the average recognition accuracy for eight types of radar signals reached 94.43% at an SNR of −6 dB. In [12], a recognition system based on secondary time-frequency distribution, discriminative projection, and collaborative representation was investigated; its overall average recognition accuracy for eight types of radar signals reached 95.6% at −8 dB. In [24], a deep fusion method based on a convolutional neural network (CNN) was studied to recognize jamming signals acting on pulse compression radar, and the experiments showed that the approach provided competitive recognition accuracy. In [22], asymmetric convolution squeeze-and-excitation networks with autocorrelation features were proposed for pulse repetition interval modulation recognition; the accuracy for six types of radar signals exceeded 91% at −12 dB. In [25], an automatic modulation classification method based on dense convolutional neural networks was investigated; the overall average recognition accuracy for eight types of radar signals reached 93.4% at −8 dB. In [2], an adversarial transfer learning architecture that incorporated adversarial training and knowledge transfer in a unified way was investigated and demonstrated effective knowledge transfer across sampling rates.

As can be seen from the above overview, a common limitation of the aforementioned work is that recognition accuracy remains low, especially at low SNR. To remedy this flaw, an efficient deep convolutional neural network with features fusion (EDCNN-FF) for recognizing radar signals is proposed. The main contributions of this paper are as follows: 1) two constructed deep convolutional neural network (DCNN) models are utilized to extract more detailed features; 2) a multi-features fusion model is designed to make full use of the relationships among different features and yield further improvement; 3) the proposed method achieves superior recognition performance over existing methods.

This paper is organized as follows: the overall structure of the radar signal recognition framework is presented in Section 2. The fundamental theory of time-frequency transformation and the image pre-processing methods are introduced in Section 3. The EDCNN-FF model for radar signal recognition is designed in Section 4. The recognition performance of the proposed approach is investigated experimentally in Section 5. Conclusions are drawn in Section 6.

2 System framework

To improve radar signal recognition accuracy and enhance robustness, especially in low-SNR environments, a novel radar signal recognition approach, EDCNN-FF, is proposed in this paper. The recognition system mainly consists of data pre-processing, DCNN feature extraction, multi-features fusion, and radar signal classification. The system framework of the proposed approach is illustrated in Fig. 1.

Fig. 1: System framework of EDCNN-FF for radar signal recognition

As shown in Fig. 1, the Choi-Williams distribution (CWD) and smoothed pseudo Wigner-Ville distribution (SPWVD) transformations are first utilized to convert the input radar signals into 2-D time-frequency images (TFIs). The TFIs are then subjected to 2-D Wiener filtering, normalization, and resizing, and assembled into dataset-1 and dataset-2, respectively. In the second part, dataset-1 and dataset-2 are input into their respective DCNN models to extract more detailed radar signal features. The extracted features are then fed into a multi-features fusion model to integrate them. The proposed framework has competitive advantages in integrating different modality features and extracting detailed features, which improves recognition performance at low SNR. Finally, the radar signals are recognized and classified.

3 Data pre-processing

In this section, the radar signal model is introduced, and the TFIs are obtained using the CWD and SPWVD transformations. The TFI pre-processing consists of 2-D Wiener filtering, image amplitude normalization, and image size adjustment. The TFIs are normalized, resized to a suitable size, and then sent to the DCNN model for feature extraction.

3.1 Radar signal model

This paper mainly focuses on common radar signals, which consist of binary frequency shift keying (2FSK), quaternary frequency shift keying (4FSK), binary phase shift keying (BPSK), even quadratic frequency modulation (EQFM), FRANK, linear frequency modulation (LFM), normal signal (NS), and sinusoidal frequency modulation (SFM). These signals are similar to those in [25]. Gaussian white noise (GWN) is added to the radar signals. The SNR is intended to simulate a realistic operating environment and is formulated as \( SNR=10\log_{10}\left(\sigma_s^2/\sigma_\varepsilon^2\right) \), where \( \sigma_s^2 \) and \( \sigma_\varepsilon^2 \) denote the variances of the signal and the noise, respectively. The radar signal is expressed as [28]

$$ x(nT)=s(nT)+m(nT)=Ae^{j\phi(nT)}+m(nT) $$
(1)

where x(nT) represents the received radar signal; s(nT) is the n-th signal sample with respect to the period T; m(nT) refers to the n-th GWN sample with variance \( \sigma_\varepsilon^2 \); T and n are the sampling interval and sample index, respectively; A and ϕ represent the amplitude and instantaneous phase.
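As a concrete illustration, the following sketch generates a noisy sample according to Eq. (1) and the SNR definition above. The LFM parameters (sampling rate, start frequency, chirp rate) are hypothetical placeholders, not the values of Table 1.

```python
import numpy as np

def add_noise(s, snr_db, rng=None):
    """Add complex GWN to s so that SNR = 10*log10(var_s / var_eps) holds."""
    rng = rng or np.random.default_rng(0)
    var_s = np.mean(np.abs(s) ** 2)                  # signal power sigma_s^2
    var_e = var_s / 10.0 ** (snr_db / 10.0)          # noise variance sigma_eps^2
    m = np.sqrt(var_e / 2) * (rng.standard_normal(s.shape)
                              + 1j * rng.standard_normal(s.shape))
    return s + m

# Example: an LFM pulse x(nT) = A*exp(j*phi(nT)) + m(nT) at -10 dB
fs, N = 100e6, 1024                                  # hypothetical sampling rate/length
t = np.arange(N) / fs
f0, k = 10e6, 20e6 / (N / fs)                        # start frequency, chirp rate (Hz/s)
s = np.exp(1j * 2 * np.pi * (f0 * t + 0.5 * k * t ** 2))
x = add_noise(s, snr_db=-10)
```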

3.2 Time-frequency transformations

The CWD is one of the most widely used time-frequency transformations for radar signals [14] and is adopted in this paper. It not only represents the radar signal in detail by introducing a kernel function, but also effectively suppresses cross terms. The CWD is defined as

$$ C\left(t,\omega\right)=\iiint e^{j2\pi\xi\left(s-t\right)}f\left(\xi,\tau\right)\cdot x\left(s+\tau/2\right)\cdot x^{\ast}\left(s-\tau/2\right)\cdot e^{-j\omega\tau}\,d\xi\,ds\,d\tau $$
(2)

where C(t, ω) refers to the result of the CWD transformation; t and ω represent the time axis and angular frequency, respectively; f(ξ, τ) is the kernel (window) function, which is formulated as

$$ f\left(\xi,\tau\right)=e^{-\left(\pi\xi\tau\right)^2/2\sigma} $$
(3)

where σ is a controllable factor that determines the filter bandwidth. To balance cross-term suppression and resolution in the TFIs, σ is set to 1 in this paper [5].
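For readers who want to reproduce the TFIs, the sketch below is one possible discrete implementation of Eq. (2) via the ambiguity domain: the instantaneous autocorrelation is transformed to the (ξ, τ) plane, multiplied by the kernel of Eq. (3), and transformed back. The discretization details are our assumptions, not the authors' code.

```python
import numpy as np

def cwd(x, sigma=1.0):
    """Minimal discrete Choi-Williams distribution sketch (assumed form).

    x     : complex analytic signal of length N (N even)
    sigma : kernel bandwidth factor (sigma = 1 as in the paper)
    Returns |C[n, k]| of shape (N, N).
    """
    N = len(x)
    L = N // 2
    idx = np.arange(N)
    # Instantaneous autocorrelation K[n, m] = x[n+m] * conj(x[n-m])
    K = np.zeros((N, N), dtype=complex)
    for m in range(-L, L):                       # lag m stored at column m % N
        i1, i2 = idx + m, idx - m
        valid = (i1 >= 0) & (i1 < N) & (i2 >= 0) & (i2 < N)
        K[valid, m % N] = x[i1[valid]] * np.conj(x[i2[valid]])
    # Ambiguity function: DFT over time n for each lag m
    A = np.fft.fft(K, axis=0)
    # Choi-Williams kernel of Eq. (3) on the (xi, m) grid (FFT ordering)
    xi = np.fft.fftfreq(N)[:, None]              # Doppler axis (cycles/sample)
    m = np.fft.fftfreq(N)[None, :] * N           # integer lag axis
    Phi = np.exp(-(np.pi * xi * m) ** 2 / (2 * sigma))
    # Back to time, then DFT over lag gives the frequency axis
    C = np.fft.fft(np.fft.ifft(A * Phi, axis=0), axis=1)
    return np.abs(C)
```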

Showing the CWD transformations of all modulation types would take up considerable space. For brevity, the CWD results of four radar signal types at −10 dB are shown in Fig. 2.

Fig. 2: CWD transformation results of four types of radar signals

The SPWVD is another common time-frequency transformation that can significantly suppress cross terms, and is written as [16]

$$ S\left(t,f\right)=\iint x\left(t-\nu+\tau/2\right)\cdot x^{\ast}\left(t-\nu-\tau/2\right)\cdot h\left(\tau\right)g\left(\nu\right)e^{-j2\pi f\tau}\,d\nu\,d\tau $$
(4)

where S(t, f) denotes the result of the SPWVD transformation; t and f represent the time axis and frequency, respectively; ∗ denotes the complex conjugate; h(τ) and g(ν) are the lag- and time-smoothing windows, respectively; and x(t) is the analytic signal of r(t), which is defined as

$$ x(t)=r(t)+jH\left[r(t)\right] $$
(5)

where H[·] denotes the Hilbert transform.
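A corresponding sketch for Eqs. (4) and (5) is given below. It builds the analytic signal with scipy.signal.hilbert (which returns r + jH[r]) and evaluates the SPWVD with a direct triple loop; the Hamming windows and their lengths are assumptions chosen for illustration, and the loop form favors clarity over speed.

```python
import numpy as np
from scipy.signal import hilbert

def spwvd(r, g_len=33, h_len=65):
    """Minimal SPWVD sketch (assumed discretization of Eq. (4)).

    r     : real signal; x = hilbert(r) gives r + j*H[r] as in Eq. (5)
    g_len : odd length of the time-smoothing window g
    h_len : odd length of the lag window h
    Returns |S[n, k]| of shape (len(r), h_len).
    """
    x = hilbert(r)                               # analytic signal
    N = len(x)
    g = np.hamming(g_len); g /= g.sum()          # time-smoothing window
    h = np.hamming(h_len)                        # lag window
    Lg, Lh = g_len // 2, h_len // 2
    S = np.zeros((N, h_len), dtype=complex)
    for n in range(N):
        for m in range(-Lh, Lh + 1):             # lag index
            acc = 0.0j
            for l in range(-Lg, Lg + 1):         # time smoothing by g
                i1, i2 = n - l + m, n - l - m
                if 0 <= i1 < N and 0 <= i2 < N:
                    acc += g[l + Lg] * x[i1] * np.conj(x[i2])
            S[n, m + Lh] = h[m + Lh] * acc
    return np.abs(np.fft.fft(S, axis=1))         # DFT over lag -> frequency
```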

Similarly, only the SPWVD transformations of four radar signal modulation types are provided, as shown in Fig. 3.

Fig. 3: SPWVD transformation results of four types of radar signals

As can be seen from Figs. 2 and 3, the TFIs obtained by the CWD and SPWVD transformations differ significantly across signal types, including BPSK, NS, FRANK, and EQFM. It is worth pointing out that the TFIs produced by the CWD are generally clearer than those produced by the SPWVD at the same SNR. Clearer TFIs may allow more detailed features to be extracted for recognizing the radar signals.

3.3 Time-frequency image pre-processing

In this paper, the TFIs obtained from the CWD and SPWVD transformations are further processed by image processing methods [19]. First, 2-D Wiener filtering is adopted to suppress the noise in the TFIs; the 2-D Wiener filter is an adaptive filter that adjusts its effect according to the local variance of the image and therefore works well on GWN. Then, to reduce the computational complexity of feature extraction in the DCNN model, bilinear interpolation is used to resize the TFIs to 224 × 224 pixels, and the TFI amplitudes are normalized. Maximum normalization is adopted: the amplitude of each pixel is divided by the maximum amplitude of the TFI. Finally, the pre-processed TFIs form dataset-1 and dataset-2, respectively.
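The pre-processing chain can be sketched with standard SciPy routines as below; the 5 × 5 Wiener window is an assumed size, since the text does not state one.

```python
import numpy as np
from scipy.signal import wiener
from scipy.ndimage import zoom

def preprocess_tfi(tfi, size=224):
    """Pre-process one TFI as described above (a sketch).

    1. 2-D Wiener filtering (5x5 local window, an assumed size).
    2. Bilinear resize (order=1) to size x size pixels.
    3. Max normalization: divide each pixel by the maximum amplitude.
    """
    tfi = wiener(tfi.astype(float), mysize=(5, 5))
    h, w = tfi.shape
    tfi = zoom(tfi, (size / h, size / w), order=1)   # bilinear interpolation
    return tfi / (tfi.max() + 1e-12)                 # guard against divide-by-zero

# Usage: dataset_1 = np.stack([preprocess_tfi(img) for img in cwd_images])
```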

4 Network model

Inspired by Tan et al. [26], a DCNN model is constructed to extract radar signal features. To improve recognition performance at low SNR, a multi-features fusion model is built to integrate different modality features and extract more detailed features. To optimize learning efficiency, the Adam optimizer [10] is employed to train the network parameters. Dataset-1 and dataset-2 are first input into the two constructed DCNN models for feature extraction. The extracted features are then input into the multi-features fusion model for integration. Finally, the fused feature maps are input into the classification layer to recognize the radar signal types. In the following, the EDCNN-FF model and the multi-features fusion model are described in detail.

4.1 Efficient deep convolutional neural network with features fusion model

Figure 4 illustrates the proposed EDCNN-FF model for extracting more detailed features of radar signals and accurately classifying the signal types. The proposed model has competitive advantages in integrating different modality features and alleviating the vanishing-gradient problem, with the purpose of improving recognition performance, especially at low SNR.

Fig. 4: EDCNN-FF model framework

As shown in Fig. 4, the proposed EDCNN-FF includes two DCNN models, one multi-features fusion model, two fully connected layers, and one Softmax classifier. Each DCNN model consists of one convolutional layer with a kernel size of 3 × 3 and a stride of 2; one mobile inverted bottleneck convolution (MBConv) block with an expansion ratio of 1, a kernel size of 3 × 3, and a stride of 1; two MBConv blocks with an expansion ratio of 6, a kernel size of 3 × 3, and a stride of 2; two MBConv blocks with an expansion ratio of 6, a kernel size of 5 × 5, and a stride of 2; three MBConv blocks with an expansion ratio of 6, a kernel size of 3 × 3, and a stride of 2; three MBConv blocks with an expansion ratio of 6, a kernel size of 5 × 5, and a stride of 1; four MBConv blocks with an expansion ratio of 6, a kernel size of 5 × 5, and a stride of 2; and one MBConv block with an expansion ratio of 6, a kernel size of 3 × 3, and a stride of 1.
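This stage layout matches that of EfficientNet-B0. A minimal PyTorch sketch of the stack is given below; the per-stage channel widths are assumptions borrowed from EfficientNet-B0, since the text does not list them, and the MBConv block itself is sketched after Fig. 5.

```python
import torch.nn as nn

# Stage layout exactly as described in the text; channel widths are an
# assumption borrowed from EfficientNet-B0, which shares this layout.
# Each tuple: (expansion ratio, kernel size, stride, out_channels, repeats)
STAGES = [
    (1, 3, 1,  16, 1),
    (6, 3, 2,  24, 2),
    (6, 5, 2,  40, 2),
    (6, 3, 2,  80, 3),
    (6, 5, 1, 112, 3),
    (6, 5, 2, 192, 4),
    (6, 3, 1, 320, 1),
]

def build_dcnn(mbconv_block, in_ch=3, stem_ch=32):
    """Assemble one DCNN branch from a given MBConv block class."""
    layers = [nn.Conv2d(in_ch, stem_ch, 3, stride=2, padding=1, bias=False),
              nn.BatchNorm2d(stem_ch), nn.SiLU()]       # stem: 3x3 conv, stride 2
    c = stem_ch
    for t, k, s, c_out, n in STAGES:
        for i in range(n):
            # only the first block of a stage applies the stated stride
            layers.append(mbconv_block(c, c_out, expand=t, kernel=k,
                                       stride=s if i == 0 else 1))
            c = c_out
    return nn.Sequential(*layers)
```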

The structure of the MBConv block in the DCNN model is identical to the inverted residual block of MobileNetV2 [23]. The overall structure of the MBConv block is illustrated in detail in Fig. 5.

Fig. 5: Structure of MBConv block

As shown in Fig. 5, the MBConv block is composed of three parts. The first part includes a convolutional layer with a kernel size of 1 × 1, batch normalization, and the Swish activation function. The second part consists of a global pooling layer, a convolutional layer with a kernel size of 1 × 1, a Swish activation, another convolutional layer with a kernel size of 1 × 1, and a Sigmoid activation. The third part contains a convolutional layer with a kernel size of 1 × 1, batch normalization, and dropout. Here, the convolutional layers efficiently retain the background information of the TFIs and achieve nonlinear interaction across channels; the global pooling layer has a strong ability to extract TFI texture information; the Swish activation introduces nonlinearity into the network; and batch normalization and dropout alleviate the vanishing-gradient and overfitting problems, respectively. This structure strengthens the connections between network modules, reduces feature transmission loss, and hence improves feature reuse.
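A PyTorch sketch of this block is given below. Following the MobileNetV2 inverted residual that the text references, a depthwise convolution is assumed between the first and second parts (it carries the 3 × 3 or 5 × 5 kernel of each stage); the SE reduction ratio and dropout rate are likewise assumptions.

```python
import torch.nn as nn

class MBConv(nn.Module):
    """MBConv sketch following Fig. 5: expansion, squeeze-and-excitation,
    projection. The depthwise convolution of the MobileNetV2 inverted
    residual is assumed between parts one and two."""

    def __init__(self, c_in, c_out, expand=6, kernel=3, stride=1,
                 se_ratio=0.25, drop=0.2):
        super().__init__()
        c_mid = c_in * expand
        self.expand = nn.Sequential(              # part 1: 1x1 conv + BN + Swish
            nn.Conv2d(c_in, c_mid, 1, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU())
        self.dw = nn.Sequential(                  # depthwise conv (assumed)
            nn.Conv2d(c_mid, c_mid, kernel, stride, kernel // 2,
                      groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU())
        c_se = max(1, int(c_in * se_ratio))
        self.se = nn.Sequential(                  # part 2: squeeze-and-excitation
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_mid, c_se, 1), nn.SiLU(),
            nn.Conv2d(c_se, c_mid, 1), nn.Sigmoid())
        self.project = nn.Sequential(             # part 3: 1x1 conv + BN + dropout
            nn.Conv2d(c_mid, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out), nn.Dropout2d(drop))
        self.skip = stride == 1 and c_in == c_out

    def forward(self, x):
        h = self.dw(self.expand(x))
        h = h * self.se(h)                        # channel re-weighting
        h = self.project(h)
        return x + h if self.skip else h          # residual connection if shapes match
```

With this class, one branch of the network above can be assembled as `build_dcnn(MBConv)`.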

4.2 Multi-features fusion model

A single TFI may fail to provide sufficiently detailed features and thus degrade the recognition performance of the Softmax classifier. Since more effective features can be extracted from multiple TFIs, features fusion can improve the recognition performance of radar signals, especially at low SNR. Multi-features fusion also reduces the concatenation complexity and introduces penalty terms between the individual modalities and the concatenated features. Therefore, a multi-features fusion model is proposed to improve the recognition performance of radar signals in this paper, as illustrated in Fig. 6.

Fig. 6: Multi-features fusion model

As shown in Fig. 6, the multi-features fusion model is composed of a convolutional layer with a kernel size of 1 × 1, a convolutional layer with a kernel size of 3 × 3, and a global average pooling layer. In the recognition process, the multi-features fusion model integrates the features obtained from the CWD and SPWVD branches. It considers not only the features of each single modality, but also the relationships among the features of all modalities.
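A minimal PyTorch sketch of the fusion head is shown below; the input and intermediate channel widths are assumptions consistent with the DCNN sketch above (320 channels per branch, concatenated to 640).

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Multi-features fusion sketch following Fig. 6: concatenate the two
    DCNN feature maps, then 1x1 conv, 3x3 conv, and global average pooling.
    Channel widths are assumptions."""

    def __init__(self, c_in=640, c_mid=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1),             # 1x1 conv: cross-channel interaction
            nn.SiLU(),
            nn.Conv2d(c_mid, c_mid, 3, padding=1), # 3x3 conv: local spatial context
            nn.SiLU(),
            nn.AdaptiveAvgPool2d(1))               # global average pooling

    def forward(self, f_cwd, f_spwvd):
        x = torch.cat([f_cwd, f_spwvd], dim=1)     # concatenate modality features
        return self.fuse(x).flatten(1)             # (batch, c_mid) fused vector
```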

A penalty term between the two predicted label distributions is considered in the multi-features fusion model. The Jensen-Shannon (JS) divergence is used as this penalty; it compensates for the asymmetry of the Kullback-Leibler (KL) divergence and is written as

$$ JS\left(p\parallel q\right)=\frac{1}{2}KL\left(p\parallel\frac{p+q}{2}\right)+\frac{1}{2}KL\left(q\parallel\frac{p+q}{2}\right) $$
(6)

where p and q denote two probability distributions.
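Eq. (6) translates directly into a few lines of PyTorch; the clamping constant is only for numerical safety.

```python
import torch

def js_divergence(p, q, eps=1e-12):
    """JS(p || q) of Eq. (6); p and q are (batch, classes) probability rows."""
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.clamp_min(eps) / b.clamp_min(eps)).log()).sum(dim=1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```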

The loss function L(θ) of the model is defined as

$$ L\left(\theta\right)=\frac{1}{N}\sum_{i=1}^{N}JS\left(t_i\parallel p_{\theta^c}\left(x_i^c\right)\right)+\frac{\mu}{2}{\left\Vert\theta^c\right\Vert}^2+\frac{1}{N}\sum_{m=1}^{2}\sum_{i=1}^{N}JS\left(p_{\theta^m}\left(x_i^c\right)\parallel p_{\theta^m}\left(x_i^m\right)\right)+\frac{\mu}{2}{\left\Vert\theta^m\right\Vert}^2 $$
(7)

where \( x_i^m\ \left(m\in\{1,2\},\ i\in\{1,\cdots,N\}\right) \) represents the m-th TFI features of the i-th sample; \( x_i^c \) refers to the features concatenated from the \( x_i^m \); θ (θ ∈ {θc, θm}) denotes the trained parameters; N is the number of training samples; ti represents the true probability distribution; and μ is a hyper-parameter.
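A sketch of Eq. (7) using the js_divergence above is given below. For simplicity it compares the fused prediction with each modality prediction directly (the paper applies the same branch parameters θm to both the concatenated and the modality features), and the L2 penalties are written explicitly; the value of μ is assumed.

```python
def edcnn_ff_loss(t, p_c, p_mods, theta_c, theta_m, mu=1e-4):
    """Sketch of the loss L(theta) in Eq. (7).

    t       : true label distribution, shape (N, classes)
    p_c     : predicted distribution from the concatenated features
    p_mods  : [p_1, p_2], predictions of the two modality branches
    theta_c, theta_m : lists of parameter tensors for the L2 penalties
    """
    loss = js_divergence(t, p_c).mean()              # fused-branch term
    for p_m in p_mods:                               # modality penalty terms
        loss = loss + js_divergence(p_c, p_m).mean()
    l2 = lambda ps: sum((w ** 2).sum() for w in ps)
    return loss + 0.5 * mu * (l2(theta_c) + l2(theta_m))
```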

5 Experiment, results and discussions

This section assesses the recognition performance of the proposed approach for various radar signals. The fabricated datasets and training parameters are first described. Then, to show the recognition performance at each SNR in detail, five radar signal recognition methods are compared: deep convolutional neural network with Choi-Williams distribution (DCNN-CWD, using only the CWD), deep convolutional neural network with smoothed pseudo Wigner-Ville distribution (DCNN-SPWVD, using only the SPWVD), the proposed efficient deep convolutional neural network with features fusion (EDCNN-FF, using both CWD and SPWVD), convolutional neural network with deep Q-learning network (CNN-DQLN) [21], and convolutional denoising autoencoder with deep convolutional neural network (CDAE-DCNN) [20].

5.1 Experimental datasets and training parameters

The radar signals include 2FSK, 4FSK, BPSK, EQFM, FRANK, LFM, NS, and SFM. The parameters of radar signals utilized in this paper are listed in Table 1.

Table 1 Parameters of various radar signals

Dataset-1 is obtained by pre-processing the TFIs produced by the CWD transformation. For the training set, 1000 samples are randomly generated for each signal type at each SNR as the SNR increases from −12 dB to 12 dB in steps of 2 dB, so training dataset-1 contains 104,000 samples. Similarly, 400 samples are randomly generated for each signal type at each SNR, so validation dataset-1 includes 41,600 samples. Training and validation dataset-2 are produced in the same way by pre-processing the TFIs obtained from the SPWVD transformation. The testing dataset is obtained by pre-processing the TFIs from both the CWD and SPWVD transformations; 300 samples are randomly generated for each signal type at each SNR, so the testing dataset contains 62,400 samples. The detailed dataset statistics are listed in Table 2.

Table 2 Datasets statistics

The recognition performance is evaluated with the PyTorch deep learning framework. The hardware is a computer with a 20-core CPU and two RTX 3090 graphics cards. For training the network parameters, the mini-batch size is set to 64, the number of epochs to 40, and the epsilon and momentum to 0.001 and 0.01, respectively. The learning rate is set to 0.001. To prevent overfitting during training, weight decay is adopted and set to 1e-5.
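A sketch of this training configuration follows. Here build_edcnn_ff and loader are hypothetical names, the mapping of "epsilon = 0.001" and "momentum = 0.01" to the batch-normalization layers is our assumption, and the Adam weight_decay supplies the μ/2‖θ‖² penalties of Eq. (7).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = build_edcnn_ff(num_classes=8)            # hypothetical constructor
for mod in model.modules():
    if isinstance(mod, nn.BatchNorm2d):
        mod.momentum, mod.eps = 0.01, 1e-3       # momentum/epsilon from the text
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

for epoch in range(40):                          # 40 epochs
    for tfi_cwd, tfi_spwvd, y in loader:         # mini-batches of 64
        opt.zero_grad()
        p_c, p_1, p_2 = model(tfi_cwd, tfi_spwvd)
        t = F.one_hot(y, num_classes=8).float()  # true distribution t_i
        loss = js_divergence(t, p_c).mean() \
             + js_divergence(p_c, p_1).mean() + js_divergence(p_c, p_2).mean()
        loss.backward()
        opt.step()
```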

5.2 Results and discussions

To evaluate the recognition performance of the proposed EDCNN-FF more objectively, its average recognition accuracy is compared with those of DCNN-SPWVD, DCNN-CWD, CNN-DQLN [21], and CDAE-DCNN [20] for various radar signals. Figure 7 illustrates the variation of average recognition accuracy with SNR for the various methods.

Fig. 7: Variation of recognition accuracies with SNR for various methods

As can be seen from Fig. 7, the SNR plays an essential role in determining the average recognition accuracy. The recognition accuracies of EDCNN-FF, DCNN-SPWVD, DCNN-CWD, CDAE-DCNN, and CNN-DQLN all increase with SNR. The reason is that the interference at low SNR degrades the TFIs, which weakens the time-frequency analysis to some extent. Notably, the recognition accuracy of EDCNN-FF rises markedly faster than the others: it reaches 100% at −4 dB, whereas DCNN-SPWVD, DCNN-CWD, and CDAE-DCNN reach 100% only at −2 dB. Moreover, the five methods differ substantially at low SNR. At −12 dB, the recognition accuracies of EDCNN-FF, DCNN-SPWVD, DCNN-CWD, and CDAE-DCNN are 84.38%, 52.25%, 63%, and 61%, respectively. At −10 dB, the accuracies of EDCNN-FF, DCNN-SPWVD, DCNN-CWD, CDAE-DCNN, and CNN-DQLN are 94.31%, 74.88%, 83.94%, 90%, and 71.72%, respectively. Therefore, the proposed EDCNN-FF demonstrates superior recognition accuracy, especially at low SNR. This is because the multi-features fusion model of EDCNN-FF integrates the features extracted by the two DCNN models, which makes full use of the relationships among different features and further improves recognition performance.

To identify the crucial factors that limit recognition accuracy in low-SNR environments, the confusion matrices of EDCNN-FF, DCNN-SPWVD, and DCNN-CWD at −12 dB are investigated, as illustrated in Fig. 8.

Fig. 8: Confusion matrices for DCNN-SPWVD, DCNN-CWD, and EDCNN-FF at −12 dB

As illustrated in Fig. 8, the recognition accuracy of EDCNN-FF is clearly better than those of DCNN-SPWVD and DCNN-CWD for various radar signals at −12 dB. The probabilities of correctly recognizing SFM are only 23% and 29% for DCNN-SPWVD and DCNN-CWD, respectively, whereas EDCNN-FF reaches 89%, a significant improvement. For EDCNN-FF, LFM and NS achieve the best recognition accuracies among all signals, while the error rates of 4FSK, FRANK, and BPSK are higher than those of the other signals. The probability of correctly recognizing BPSK is only 72%, because BPSK is easily confused with NS. A further cause of error is that the sample length ranges from 512 to 1024 and therefore cannot contain all the frequency components.

To gain better insight into how the classification accuracy of each modulation type varies with SNR, Fig. 9 illustrates the classification accuracy of the proposed EDCNN-FF, DCNN-SPWVD, DCNN-CWD, and CNN-DQLN versus SNR for 2FSK, 4FSK, BPSK, EQFM, FRANK, LFM, NS, and SFM.

Fig. 9: Classification accuracy of each modulation type versus SNR for EDCNN-FF, DCNN-SPWVD, DCNN-CWD, and CNN-DQLN

As observed from Fig. 9, the recognition performance of all four methods improves significantly with increasing SNR, and their classification accuracies are similar at higher SNR, especially above 0 dB. At low SNR, however, EDCNN-FF clearly outperforms DCNN-SPWVD, DCNN-CWD, and CNN-DQLN. At −12 dB, the classification accuracies of the proposed EDCNN-FF exceed 70% for the majority of signals, and some, such as NS and LFM, even reach 90%, as validated by the confusion matrices. It is worth pointing out from Fig. 9a, b, and h that the recognition accuracies of 2FSK, 4FSK, and SFM are improved significantly by the proposed EDCNN-FF, owing to its powerful feature extraction and features fusion at low SNR. Therefore, the proposed approach demonstrates superior recognition performance, especially at low SNR. This work provides a sound experimental foundation for further improving radar signal recognition and for future real applications in modern electronic warfare systems.

To evaluate the computational complexity of the proposed EDCNN-FF, its floating point operations (FLOPs) are compared with those of DCNN-SPWVD, DCNN-CWD, and the residual neural network (ResNet) [8]. Table 3 lists the computational complexity of the four methods, where the time complexity is measured by the inference time under the hardware conditions described above.

Table 3 Complexity of four methods

As shown in Table 3, EDCNN-FF uses more parameters than the other methods because it contains two subnetworks. In terms of time complexity, the inference time of the proposed EDCNN-FF is about 6 ms, 9 ms, and 14 ms longer than those of DCNN-CWD, DCNN-SPWVD, and ResNet, respectively. This is because the proposed method spends additional time extracting the CWD and SPWVD time-frequency features in the two subnetworks and fusing them. Weighing the average recognition accuracy against the computational complexity, the proposed model demonstrates superior recognition performance and generalization ability.
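Measurements of this kind can be reproduced with a sketch like the following (single-input form shown; EDCNN-FF would pass both TFIs). The warm-up count and input shape are assumptions.

```python
import time
import torch

def profile(model, shape=(1, 3, 224, 224), n_runs=100, device="cuda"):
    """Count parameters and measure average inference time in milliseconds."""
    model = model.eval().to(device)
    x = torch.randn(*shape, device=device)
    n_params = sum(p.numel() for p in model.parameters())
    with torch.no_grad():
        for _ in range(10):                      # warm-up runs
            model(x)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        torch.cuda.synchronize()
    return n_params, (time.perf_counter() - t0) * 1000 / n_runs
```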

6 Conclusions

This paper proposed an efficient deep convolutional neural network with features fusion (EDCNN-FF) to enhance the recognition accuracy of radar signal types at low SNR. The radar signals were first transformed into TFIs using the CWD and SPWVD transformations and, after image pre-processing, input into two constructed DCNN models. A multi-features fusion model was proposed to integrate the different modality features. The experimental results showed that the proposed method outperformed the compared methods, especially at low SNR: the classification accuracy reached 84.38% at −12 dB and 94.31% at −10 dB. LFM and NS showed the best recognition performance, and the recognition accuracies of 2FSK, 4FSK, and SFM were largely improved by the proposed EDCNN-FF. This work provides useful guidance for improving radar signal recognition.

In future work, dual-component radar signals will be considered to further enhance the generalization and robustness of the proposed approach. In addition, future research will focus on compressing the network parameters and reducing the running time.