1 Introduction

Automatic radar waveform recognition identifies the low probability of intercept (LPI) radar waveform of a received signal and plays an essential role in electronic warfare systems such as electronic support, electronic intelligence, and electronic attack [1, 13]. With the rapid development of radar technology in recent decades, and with radars increasingly operating at low signal-to-noise ratio (SNR), the modulation types of LPI radar signals have become more complicated and diversified [24]. It is therefore crucial to find a more accurate approach for recognizing radar waveforms in low-SNR environments.

Several automatic LPI waveform recognition techniques have been proposed in recent years [2]. They use feature extraction to obtain features from the LPI radar signal and classification to determine the signal type. For feature extraction, time-frequency conversion transforms the signal waveform into time-frequency images (TFI), for example by the Wigner-Ville distribution (WVD) [11, 28] or the Choi-Williams distribution (CWD) [15, 35]. Combinations of two time-frequency conversions have also been studied, for example the smoothed pseudo Wigner-Ville distribution (SPWVD) with the Born-Jordan distribution (BJD) [38], and WVD with CWD [20, 36]. Moreover, with the significant progress of artificial intelligence, deep learning has attracted widespread attention and has been widely applied in various fields for its excellent feature extraction and classification capabilities [39]. VGG [25], Highway Networks [26], Residual Networks (ResNets) [7], and DenseNets [8] have been proposed. Dense connections as well as residual connections have been widely used in computer vision tasks such as object tracking [18] and video object segmentation [19, 29].

Advanced models, including recurrent neural networks (RNN) [16], deep belief networks (DBN) [4], support vector machines (SVM) [6], and convolutional neural networks (CNN) [3, 9, 22, 30, 41], have been proposed to improve radar waveform recognition. Zhang et al. [37] explored a blind modulation classification method based on the time-frequency distribution and a CNN. Kong et al. [14] proposed an LPI radar waveform recognition technique based on a CNN and designed its hyperparameters. Wang et al. [31] investigated an automatic modulation classification algorithm based on a joint feature map for discriminating radar emitter signals; the overall recognition rate for six LPI radar signals reached 97% at an SNR of 6 dB. Wan et al. [27] studied an automatic identification system for detecting, tracking, and locating low probability of intercept radar waveforms; its overall recognition rate reached 94.42% at an SNR of −4 dB. Ma et al. [21] proposed an autocorrelation feature image construction technique (ACFICT) combined with a CNN; in simulation, the overall recognition rate reached 88% at an SNR of −6 dB. Wang et al. [32] proposed a waveform recognition method based on adversarial unsupervised domain adaptation, which uses adversarial learning to improve cross-scenario recognition performance. Zhu et al. [40] proposed a deep multi-label multiuser automatic modulation classification framework (MLAMC) for compound signals and validated its effectiveness. Wei et al. [33] proposed a network combining a shallow CNN, a long short-term memory (LSTM) network, and a deep neural network (DNN) for recognizing six types of radar signals, with accuracies in the autocorrelation domain all above 90%.

A common limitation of the work above is that recognition accuracy remains low, especially in low-SNR environments, and that relatively few radar waveform types are recognized. To remedy these shortcomings, an accurate automatic modulation classification algorithm based on dense convolutional neural networks (AAMC-DCNN) is proposed. The algorithm strengthens feature reuse and extracts detailed features, which improves radar waveform recognition at low SNR. The proposed AAMC-DCNN consists mainly of data pre-processing, feature extraction, and classification. In the first part, eight signal types are converted into time-frequency images by the Choi-Williams distribution (CWD), and large training and testing datasets are constructed. In the second part, the dense convolutional neural network (dense CNN) is designed, and transfer learning and Adam optimization are introduced. Finally, experiments are carried out to evaluate the recognition performance. The classification accuracy reaches 93.4% at an SNR of −8 dB and 100% at 0 dB.

This paper is organized as follows. Section 2 presents the overall structure of the LPI radar waveform recognition algorithm. Section 3 introduces the signal model and the basic theory of the time-frequency distribution. Section 4 designs the dense CNN model for LPI radar waveform recognition and describes the dataset. Section 5 investigates the recognition performance of the proposed method experimentally. Section 6 concludes the paper.

2 Design of AAMC-DCNN

Figure 1 illustrates the proposed AAMC-DCNN. The algorithm aims to improve radar waveform recognition accuracy and robustness, especially in low-SNR environments, and consists mainly of data pre-processing, dense CNN feature extraction, and radar waveform classification.

Fig. 1 AAMC-DCNN framework

As shown in Fig. 1, in the first part the LPI radar signal is transformed into a 2-D TFI by the Choi-Williams distribution (CWD). The dataset is obtained by binarizing and resizing the TFI. About 70% of the pre-processed dataset is selected for training, and the remainder is used for testing. In the second part, the dense CNN performs feature extraction and classification. Owing to its dense connection mechanism, it strengthens feature reuse and extracts detailed features, which improves radar waveform recognition at low SNR. To speed up learning and avoid training the model from scratch, transfer learning is used to initialize the model with previously learned parameters, and Adam optimization is used for training. The dense CNN is first pre-trained on ImageNet, and the resulting parameters are saved. The constructed training dataset is then fed into the dense CNN, which starts from the pre-trained parameters to extract radar waveform features, while Adam optimizes the network parameters. Finally, the testing dataset is passed through the network to the SoftMax classifier, which outputs the classification result among the eight LPI radar waveform types.

3 Data pre-processing

CWD time-frequency analysis converts the LPI radar waveform into a 2-D TFI, which is binarized and then resized to suit the input of the dense CNN.

3.1 Signal model

This paper focuses on common LPI radar signals: binary frequency shift keying (2FSK), 4-frequency shift keying (4FSK), binary phase shift keying (BPSK), even quadratic frequency modulation (EQFM), Frank, linear frequency modulation (LFM), normal signal (NS), and sinusoidal frequency modulation (SFM), each corrupted by additive Gaussian white noise (GWN). The SNR is varied to control the recognition difficulty and is defined as \(SNR=10{\log}_{10}\left({\sigma}_s^2/{\sigma}_{\varepsilon}^2\right)\), where \({\sigma}_s^2\) and \({\sigma}_{\varepsilon}^2\) are the variances of the signal and the noise, respectively. The LPI radar signal is formulated as [34]

$$z(nT)=s(nT)+m(nT)$$
(1)

where z(nT) is the received LPI radar signal, s(nT) is the n-th sample of the noise-free signal taken with sampling period T, and m(nT) is the n-th GWN sample with power \({\sigma}_{\varepsilon}^2\).
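
As a minimal sketch of Eq. (1) and the SNR definition, the following Python fragment adds GWN to a clean waveform so that a target SNR is met. The real-valued chirp, its parameters, and the helper names (`lfm_chirp`, `add_gwn`, `measured_snr_db`) are illustrative assumptions and are not taken from the simulation setup of this paper.

```python
import math
import random


def variance(samples):
    """Population variance of a list of real samples."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / len(samples)


def lfm_chirp(num_samples, f_start, bandwidth):
    """Real-valued LFM chirp; frequencies are normalized (cycles per sample)."""
    chirp_rate = bandwidth / num_samples
    return [
        math.cos(2.0 * math.pi * (f_start * n + 0.5 * chirp_rate * n * n))
        for n in range(num_samples)
    ]


def add_gwn(signal, snr_db, rng):
    """Return z[n] = s[n] + m[n], with the noise variance chosen to hit snr_db."""
    # SNR = 10*log10(var_s / var_noise)  =>  var_noise = var_s / 10**(SNR/10)
    noise_var = variance(signal) / (10.0 ** (snr_db / 10.0))
    noise_std = math.sqrt(noise_var)
    return [s + rng.gauss(0.0, noise_std) for s in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB from the clean signal and its noisy version."""
    noise = [z - s for s, z in zip(clean, noisy)]
    return 10.0 * math.log10(variance(clean) / variance(noise))


rng = random.Random(0)           # fixed seed so the sketch is repeatable
s = lfm_chirp(1024, 0.05, 0.2)   # hypothetical chirp parameters
z = add_gwn(s, -8.0, rng)        # -8 dB is the lowest SNR considered here
print(round(measured_snr_db(s, z), 2))
```

The measured SNR only approximates the target because the noise variance is estimated from a finite number of samples.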

3.2 Time-frequency distribution

The CWD is a Cohen-class time-frequency distribution [17]. By introducing a kernel function, it represents the detected signal in detail while significantly suppressing cross terms. The CWD of a signal x, denoted C(t, ω), is given as

$$C\left(t,\omega \right)={\iiint}_{\infty }{e}^{j2\pi \xi \left(s-t\right)}f\left(\xi, \tau \right)\cdotp x\left(s+\tau /2\right)\cdotp {x}^{*}\left(s-\tau /2\right)\cdotp {e}^{- j\omega \tau} d\xi dsd\tau$$
(2)

where f(ξ, τ) is the kernel function, which acts as a 2-D low-pass filter and is defined as

$$f\left(\xi, \tau \right)={e}^{-{\left(\pi \xi \tau \right)}^2/2\sigma }$$
(3)

where σ is a controllable scaling factor that determines the bandwidth of the filter. Here σ is set to 1 to balance cross-term suppression against resolution in the TFI of the radar waveform. The CWD results for the eight types of LPI radar signal are shown in Fig. 2.
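
The role of σ can be illustrated with a small sketch of the kernel alone. It assumes the usual decaying (low-pass) form of the Choi-Williams kernel, with a negative exponent; the function name `cwd_kernel` is hypothetical, and the fragment does not compute the full distribution of Eq. (2).

```python
import math


def cwd_kernel(xi, tau, sigma=1.0):
    """Choi-Williams kernel exp(-(pi*xi*tau)^2 / (2*sigma)), assumed decaying."""
    return math.exp(-((math.pi * xi * tau) ** 2) / (2.0 * sigma))


# On the axes (xi = 0 or tau = 0) the kernel equals 1, so auto terms,
# which concentrate near the axes of the (xi, tau) plane, pass unchanged.
on_axis = cwd_kernel(0.0, 5.0)

# Away from the axes the kernel decays, which attenuates cross terms.
# A larger sigma widens the pass band: less suppression, better resolution.
narrow = cwd_kernel(1.0, 1.0, sigma=1.0)
wide = cwd_kernel(1.0, 1.0, sigma=10.0)
print(on_axis, narrow < wide < 1.0)
```

With σ = 1, as used in this paper, the kernel falls off quickly away from the axes, which favors cross-term suppression.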

Fig. 2 CWD transformation results of the eight types of LPI radar signal: (a) 2FSK; (b) 4FSK; (c) BPSK; (d) EQFM; (e) Frank; (f) LFM; (g) NS; (h) SFM

4 Feature extraction and classification

Inspired by Huang et al. [8], a dense CNN is proposed for feature extraction and classification of the LPI radar signals. To speed up learning and avoid training from scratch, transfer learning [23] is also introduced. The Adam optimization algorithm [12] is adopted to optimize the parameters of the proposed dense CNN. The specific steps are as follows.

4.1 Dense CNN

Figure 3 illustrates the proposed dense CNN model for extracting feature maps of the radar waveform and classifying the signal type. The model improves radar waveform recognition at low SNR by alleviating the vanishing-gradient problem, strengthening feature propagation, encouraging feature reuse, extracting detailed features, and substantially reducing the number of parameters.

Fig. 3 Modified dense CNN model framework

As shown in Fig. 3, the proposed dense CNN includes five dense blocks (dense-1 to dense-5) and four transition layers (transition-1 to transition-4). The training data are first fed into a convolutional layer with a 7 × 7 kernel for feature extraction. The extracted features pass through a max pooling layer with a 3 × 3 kernel, which reduces the dimension of the feature maps. They then enter dense-1, which contains 6 convolutional layers with kernel sizes of 1 × 1 and 3 × 3, for further feature extraction. Reducing the number of input feature maps improves computational efficiency and also integrates the features of each channel. The resulting feature maps are fed into transition-1, which consists of batch normalization, a convolutional layer with a 1 × 1 kernel, and an average pooling layer with a 2 × 2 kernel, to match the feature map sizes. It makes full use of the learned feature maps and reduces unnecessary external noise without zero padding. The remaining dense blocks and transition layers work in the same way as dense-1 and transition-1, respectively. The final feature maps from dense-5 are fed into the classification layer, which includes a global average pooling layer with a 7 × 7 filter, an 8-D fully connected layer, and a SoftMax classifier. The detailed parameters of the proposed dense CNN are listed in Table 1.

Table 1 Parameters of the dense CNN

Compared with DenseNet [8], the dense CNN has a more radical dense-connection mechanism, which interconnects all the layers. Figure 4 shows the connection mechanism of a dense block.

Fig. 4 Dense block

As shown in Fig. 4, each layer is connected to all preceding layers along the channel dimension, and its output serves as input to the subsequent layers. A dense block with l layers therefore has l(l + 1)/2 connections. Because each layer has direct access to the feature maps of different layers, features are reused and efficiency improves. The l-th layer receives the feature maps of all preceding layers, which can be written as

$${y}_l={H}_l\left(\left[{y}_0,{y}_1,\cdots, {y}_{l-1}\right]\right)$$
(4)

where \({H}_l\left(\cdot \right)\) is a non-linear transformation composed of batch normalization [10], rectified linear units (ReLU) [5], pooling, and convolution, and \(\left[{y}_0,{y}_1,\cdots, {y}_{l-1}\right]\) denotes the concatenation of the feature maps produced in layers 0, 1, …, l − 1.
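
The bookkeeping implied by Eq. (4) can be sketched in a few lines of Python. The growth rate (the number of feature maps each layer adds) and the input channel count below are hypothetical values chosen for illustration; the actual settings are those of Table 1.

```python
def dense_block_connections(num_layers):
    """Count connections when layer i (1-based) takes the block input
    plus the outputs of the i - 1 layers before it, i.e. i inputs."""
    return sum(range(1, num_layers + 1))


def dense_block_input_channels(in_channels, growth_rate, num_layers):
    """Channels seen by each layer after concatenating all earlier outputs."""
    return [in_channels + i * growth_rate for i in range(num_layers)]


num_layers = 6      # dense-1 has 6 layers in this paper
in_channels = 64    # assumption, not taken from Table 1
growth_rate = 32    # assumption, not taken from Table 1

connections = dense_block_connections(num_layers)
per_layer = dense_block_input_channels(in_channels, growth_rate, num_layers)
out_channels = in_channels + num_layers * growth_rate
print(connections, per_layer, out_channels)
```

The connection count matches l(l + 1)/2, and the channel count grows linearly with depth, which is why a compression step between blocks is useful.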

To keep feature map sizes consistent within the densely connected network, the model is organized into dense blocks and transition layers. A transition layer connects two adjacent dense blocks and reduces the feature map size through average pooling. If a dense block outputs m feature map channels, the following transition layer produces θ ∗ m feature maps, where θ is the compression rate. Here θ is set to 0.5 to reduce the number of network parameters, so the transition layer also acts as a compression module.
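
A short sketch of the shape change across a transition layer is given below. It assumes that the 2 × 2 average pooling uses a stride of 2, which the text does not state, and the input shape is hypothetical.

```python
import math


def transition_output_shape(channels, height, width, theta=0.5):
    """Shape after a transition layer: the 1x1 convolution keeps
    floor(theta * channels) maps; 2x2 average pooling (stride 2 assumed)
    halves each spatial dimension."""
    out_channels = int(math.floor(theta * channels))
    return out_channels, height // 2, width // 2


# Hypothetical dense-block output: 256 channels on a 56 x 56 grid.
shape = transition_output_shape(256, 56, 56)
print(shape)
```

With θ = 0.5, both the channel count and each spatial dimension are halved, so the number of activations passed to the next dense block drops by a factor of eight.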

4.2 Pre-training and optimization

Transfer learning is one of the most powerful ideas in deep learning: a neural network acquires knowledge from one task and applies it to another, similar task. Transfer learning reduces the training time of the network and, to some extent, removes the need to start from scratch on a new dataset. In this paper, the proposed dense CNN is pre-trained on the ImageNet dataset, and the pre-trained parameters are saved and later used to initialize training on the LPI radar signal dataset.

Adam is a first-order optimization algorithm that can replace classical stochastic gradient descent; it reduces computational cost and speeds up model convergence [12]. It updates the network weights iteratively based on the training data, and an important property of its update rule is its careful choice of step size. Assuming ε = 0, the effective step taken in parameter space at time step t is

$${\Delta }_t=\frac{\alpha \cdotp \hat{m_t}}{\sqrt{\hat{v_t}}}$$
(5)

where α is the step size; \({m}_t\) is the exponential moving average of the gradient; \({v}_t\) is the exponential moving average of the squared gradient; \(\hat{m_t}\) and \(\hat{v_t}\) are the bias-corrected estimates of \({m}_t\) and \({v}_t\), respectively.

The magnitude of the effective step in parameter space at each time step is approximately bounded in terms of the step size α:

$$\left|{\Delta }_t\right|\lesssim \alpha \cdotp \frac{1-{\beta}_1}{\sqrt{1-{\beta}_2}}$$
(6)

where \({\beta}_1\) and \({\beta}_2\) are hyper-parameters that control the exponential decay rates of the moving averages.
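
The update rule can be sketched for a single scalar parameter as follows. The quadratic objective, the step size of 0.1, and the function name `adam_minimize` are illustrative assumptions; the decay rates are the defaults suggested in [12], and the hyper-parameters used to train the dense CNN are not specified here.

```python
import math


def adam_minimize(grad, x0, alpha=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Scalar Adam: returns the final parameter and the step magnitudes."""
    x, m, v = x0, 0.0, 0.0
    step_sizes = []
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # moving average of gradient
        v = beta2 * v + (1.0 - beta2) * g * g    # moving average of squared gradient
        m_hat = m / (1.0 - beta1 ** t)           # bias-corrected first moment
        v_hat = v / (1.0 - beta2 ** t)           # bias-corrected second moment
        delta = alpha * m_hat / (math.sqrt(v_hat) + eps)
        x -= delta
        step_sizes.append(abs(delta))
    return x, step_sizes


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_min, step_sizes = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_min, 4), round(step_sizes[0], 4))
```

At the first step the bias-corrected ratio has unit magnitude, so the step taken is almost exactly α regardless of the gradient scale, which illustrates how α sets the scale of the steps.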

Adam applies a bias-correction term to compensate for the zero initialization of the moving averages; it is derived here for the second-moment estimate. Let f be the stochastic objective function. Its gradient is computed first, and the second raw moment is then estimated by an exponential moving average of the squared gradient with decay rate \({\beta}_2\). Let \({g}_1,\dots, {g}_T\) denote the gradients at successive time steps, each drawn from an underlying gradient distribution, \({g}_t\sim p\left({g}_t\right)\). The exponential moving average is initialized as the zero vector, \({v}_0=0\). Its update at time step t is

$${v}_t={\beta}_2\cdotp {v}_{t-1}+\left(1-{\beta}_2\right)\cdotp {g}_t^2$$
(7)

where \({g}_t^2\) denotes the Hadamard product \({g}_t\odot {g}_t\). Eliminating the recursion, \({v}_t\) can be written as a function of only the gradients at all previous time steps and the decay rate:

$${v}_t=\left(1-{\beta}_2\right)\cdotp \sum \limits_{i=1}^t{\beta}_2^{t-i}{g}_i^2$$
(8)

Taking the expectation of Eq. (8) gives

$$\mathrm{E}\left[{v}_t\right]=\mathrm{E}\left[\left(1-{\beta}_2\right)\cdotp \sum \limits_{i=1}^t{\beta}_2^{t-i}{g}_i^2\right]=\mathrm{E}\left[{g}_t^2\right]\cdotp \left(1-{\beta}_2\right)\cdotp \sum \limits_{i=1}^t{\beta}_2^{t-i}+\zeta =\mathrm{E}\left[{g}_t^2\right]\cdotp \left(1-{\beta}_2^t\right)+\zeta$$
(9)

If the true second moment \(\mathrm{E}\left[{g}_i^2\right]\) is stationary, then ζ = 0; otherwise ζ can be kept small, because the exponential decay rate β1 can be chosen so that the moving average assigns small weights to gradients far in the past. What remains is the factor \(\left(1-{\beta}_2^t\right)\), which is caused by initializing the moving average with the zero vector. Dividing \({v}_t\) by this factor gives the bias-corrected estimate \(\hat{v_t}\).
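
The effect of the factor \(\left(1-{\beta}_2^t\right)\) can be checked numerically in the stationary case. The sketch below uses a constant gradient, which is an illustrative assumption, so that the true second moment is known exactly; the function name `second_moment_trace` is hypothetical.

```python
def second_moment_trace(grads, beta2=0.999):
    """Raw and bias-corrected moving averages of the squared gradient."""
    v = 0.0                      # zero initialization, v_0 = 0
    raw, corrected = [], []
    for t, g in enumerate(grads, start=1):
        v = beta2 * v + (1.0 - beta2) * g * g
        raw.append(v)
        corrected.append(v / (1.0 - beta2 ** t))
    return raw, corrected


# Constant gradient g = 2, so the true second moment is g^2 = 4.
raw, corrected = second_moment_trace([2.0] * 10)
print(raw[0], corrected[0])
```

The raw average starts far below the true value because of the zero initialization, while the corrected estimate recovers it at every step.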

4.3 Dataset

The radar signal dataset used in this paper consists of 2FSK, 4FSK, BPSK, EQFM, Frank, LFM, NS, and SFM signals. Table 2 lists the signal parameters. The frequency parameters are normalized. To assess the generalization performance of the approach, the parameters of all simulated signals vary dynamically, and the signal length varies randomly from 512 to 1024.

Table 2 The parameters of LPI radar signal

For the training dataset, 1000 samples are randomly generated for each signal type at each SNR, with the SNR ranging from −8 to 14 dB in steps of 2 dB. The training dataset therefore contains 96,000 samples in total. For the testing dataset, 400 samples are randomly generated for each signal type at each SNR, giving 38,400 samples. The training dataset is used to train the network parameters, and the testing dataset is used to evaluate the recognition performance.
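
The dataset sizes follow directly from the SNR grid and the number of signal types, as the following sketch shows. The variable and function names are illustrative.

```python
snr_levels_db = list(range(-8, 15, 2))   # -8, -6, ..., 14 dB
signal_types = ["2FSK", "4FSK", "BPSK", "EQFM", "Frank", "LFM", "NS", "SFM"]


def dataset_size(samples_per_type_per_snr):
    """Total samples: one batch per (signal type, SNR level) pair."""
    return len(snr_levels_db) * len(signal_types) * samples_per_type_per_snr


train_size = dataset_size(1000)
test_size = dataset_size(400)
train_fraction = train_size / (train_size + test_size)
print(len(snr_levels_db), train_size, test_size, round(train_fraction, 3))
```

The grid has 12 SNR levels, and the resulting training share of about 71% is consistent with the roughly 70% split described in Section 2.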

5 Experimental results and analyses

Experiments are conducted to evaluate the performance of the proposed AAMC-DCNN model in recognizing radar waveforms. The LPI radar signal parameters and the datasets are designed first. Next, the average recognition accuracy over the eight radar signal types is verified experimentally and compared with the time-frequency feature map with CNN (TFFM-CNN) [28] and the joint feature map with CNN (JFM-CNN) [30]. Then, to show the recognition performance at low SNR in detail, confusion matrices are examined for SNRs from −8 to −2 dB. Finally, the variation of classification accuracy with SNR is obtained for specific modulation types, and AAMC-DCNN is compared with JFM-CNN for 2FSK, 4FSK, BPSK, and LFM.

To demonstrate the recognition performance of the proposed AAMC-DCNN relative to TFFM-CNN and JFM-CNN, Fig. 5 shows the average recognition accuracy of the three methods as a function of SNR.

Fig. 5 Variation of average classification accuracy with SNR for AAMC-DCNN, JFM-CNN and TFFM-CNN

For the comparison, note that the SNR ranges used for TFFM-CNN and JFM-CNN are −4 to 14 dB and 6 to 14 dB, respectively. As shown in Fig. 5, the classification accuracy of all three methods increases with SNR, and all reach 95% when the SNR exceeds 2 dB. At an SNR of 6 dB, the classification accuracies of AAMC-DCNN, JFM-CNN, and TFFM-CNN are 100%, 98%, and 98.5%, respectively. The proposed AAMC-DCNN achieves higher recognition accuracy than the other methods, especially at low SNR, and reaches 100% at 0 dB. The proposed AAMC-DCNN therefore shows outstanding average classification accuracy over the eight signal types, which is attributed to its denser connection mechanism and its ability to extract more detailed features.

Furthermore, to show the recognition performance of the proposed AAMC-DCNN at low SNR in more detail, confusion matrices are examined for SNRs from −8 to −2 dB. Figure 6 shows the confusion matrices of the proposed algorithm at these SNRs.

Fig. 6 Confusion matrices of the AAMC-DCNN for the SNR changing from −8 to −2 dB: (a) −8 dB; (b) −6 dB; (c) −4 dB; (d) −2 dB

As shown in Fig. 6, LFM and SFM are classified most accurately, while the errors for NS, Frank, and 4FSK are higher than for the other modulation types. The main discrepancy is due to the fact that Frank and BPSK are easily confused with each other, because their TFIs are similar, as shown in Fig. 2.

To show how classification performance varies with SNR for specific modulation types, Fig. 7 plots the classification accuracy of the proposed AAMC-DCNN and JFM-CNN versus SNR for 2FSK, 4FSK, BPSK, and LFM.

Fig. 7 The classification performance of four modulation types versus SNR in AAMC-DCNN and JFM-CNN

As shown in Fig. 7, the classification accuracy of the proposed AAMC-DCNN exceeds 96% for all of these signals, and some reach 100% when the SNR is larger than 0 dB. In addition, AAMC-DCNN clearly outperforms JFM-CNN at low SNR. The proposed algorithm therefore classifies the various signal types robustly, especially at low SNR. The strong feature extraction and classification capability of the proposed dense CNN model is an important reason for this performance.

6 Conclusions

This paper presented an accurate automatic modulation classification algorithm based on dense convolutional neural networks, which aims to improve recognition accuracy in low-SNR environments. Compared with DenseNet, the proposed framework has a more radical dense-connection mechanism and connects each layer to every other layer in a feed-forward pattern. The algorithm strengthens feature reuse and extracts detailed features, which improves radar waveform recognition at low SNR. The classification accuracy reaches 93.4% at an SNR of −8 dB and 100% at 0 dB, which is superior to the compared methods, especially at low SNR. The confusion matrices showed that LFM and SFM are classified more accurately than the other modulation types. This work provides an experimental foundation for research on radar signal waveform recognition.