1 Introduction

Research on the intra-pulse characteristic analysis of radar signals began in the 1980s. Radar waveform design has recently grown in complexity because of the wide use of advanced radar [1,2,3,4]. The five traditional characteristic parameters used for radar signal identification can no longer meet the requirements of modern radar countermeasure reconnaissance. Intra-pulse features, however, reduce the probability of overlap in the multi-parameter space, which provides a basis for classifying and identifying radar signals and a feasible way to enhance the recognition of current radar signals [5, 6]. Identifying intra-pulse modulation can improve the accuracy of signal sorting, guide radar jamming, and support analysis of a radar's tactical performance. Numerous methods for identifying radar intra-pulse modulation have been proposed in the literature, and techniques for radar signal recognition are constantly updated as radar technology develops. The low probability of intercept (LPI) radar signal is characterized by a large time–bandwidth product, strong anti-jamming ability, high resolution, and low interceptability, which impede detection by traditional non-cooperative intercept receivers [7, 8]. The method in [9] identified radar signals by measuring pulse parameters; [10] extracted singular-value features from time–frequency images; and [11] considered a method for diagnosing and classifying intercepted radar signals on the basis of pulse-compression waveforms. The problems addressed in the present study are as follows: (1) classifier robustness is poor at low SNR; (2) the effectiveness and universality of hand-crafted features need to be assessed; and (3) in several image-based recognition methods, the recognition result is sensitive to the image-processing steps involved.

Deep learning is highly effective in solving various problems, such as visual recognition, speech recognition, and natural language processing [12]. Convolutional neural networks (CNNs) are a common deep-learning architecture inspired by the cognitive mechanisms of biological vision. CNNs are powerful feature extractors that obtain effective characterizations of original images, which allows them to learn visual regularities directly from raw pixels with minimal preprocessing. However, training a network directly on a small-scale database may lead to model overfitting, because the number of model parameters would be considerably larger than the number needed to fit the small data distribution [13]. Transfer learning transfers knowledge from one machine-learning model to another and can be applied to small databases by fine-tuning a network model trained on large datasets. Therefore, the current study trains the network by transfer learning, thereby achieving recognition of eight types of modulation signals. Meanwhile, to prevent overfitting caused by highly abstract features, this study combines the deep features learned by the neural network with shallow features extracted manually from the images. The fused features are then fed into an SVM classifier for LPI radar signal recognition. The overall process flowchart is presented in Fig. 1.

Fig. 1 Overall working process of this study

The rest of this paper is organized as follows. Section 2 briefly introduces the CWD and presents the CWD images of the signals to be identified. Section 3 introduces network-based transfer learning, feature fusion, and the support vector machine (SVM), and gives the overall algorithm flow at the end of the section. Section 4 presents the experiments and results. Section 5 concludes this work.

2 Signal time–frequency analysis

Radar signals are non-stationary, so traditional time-domain and frequency-domain analyses recover only limited signal information. Time–frequency analysis, a powerful tool for non-stationary signal processing, greatly aids the analysis of radar signal characteristics. It maps a one-dimensional time signal onto a two-dimensional time–frequency plane, fully characterizing the joint time–frequency behavior of non-stationary signals [14, 15]. Time–frequency analysis not only reflects the distribution of signal energy over time and frequency; it also reveals how frequency evolves with time.

Two types of time–frequency analysis have been identified: linear and nonlinear representations. Typical linear time–frequency analyses include the short-time Fourier transform, the wavelet transform, and the S transform, among others. By contrast, typical nonlinear time–frequency representations include the Wigner–Ville distribution (WVD) and the Choi–Williams distribution (CWD).

CWD time–frequency analysis was proposed in 1989. Among the Cohen-class distributions, it exhibits minimal cross-term interference together with high resolution and recognition accuracy for signals at different times or frequencies. The CWD of a continuous signal x(t) is expressed as follows:

$$ C\left(t,\omega \right)={\iint}_{-\infty}^{\infty}\sqrt{\frac{\sigma }{4\pi {\tau}^2}}f\left(s,\tau \right)x\left(s+\tau /2\right){x}^{\ast}\left(s-\tau /2\right){e}^{-j\omega \tau}\, ds\, d\tau $$
(1)

where f(s, τ) is a kernel function. In accordance with Cohen's general theory of time–frequency distributions, different distributions can be obtained by choosing different kernel functions.

Interference caused by cross-terms can be effectively reduced using the following kernel function:

$$ f\left(s,\tau \right)=\exp \left[-\frac{\sigma {\left(s-t\right)}^2}{4{\tau}^2}\right] $$
(2)

This kernel function acts like a low-pass filter in two-dimensional space, filtering out the signal cross-terms. The parameter σ is a controllable factor that determines the bandwidth of the filter; cross-term interference can therefore be suppressed by controlling the value of σ.

To better reflect the CWD time–frequency distribution of different signals while keeping the cross-terms unobtrusive, this study uses σ = 1 to balance cross-term interference against signal resolution [16]. The discrete form of the Choi–Williams transform is as follows:

$$ C\left(l,\omega \right)=2\sum \limits_{\tau =-\infty}^{\infty }{e}^{-j2\omega \tau}\sum \limits_{s=-\infty}^{\infty}\frac{1}{\sqrt{4\pi {\tau}^2/\sigma }}{e}^{-\sigma {\left(s-l\right)}^2/\left(4{\tau}^2\right)}x\left(s+\tau \right){x}^{\ast}\left(s-\tau \right) $$
(3)

The smallest time scale in a discrete sequence is the sampling interval; the half-sample shifts of the continuous case cannot be realized, and each shift must be at least one sample [20]. For convenience of computation, the windowed Choi–Williams transform can be expressed as Eq. (4):

$$ C\left(l,\omega \right)=2\sum \limits_{\tau =-\infty}^{\infty }{W}_N\left(\tau \right){e}^{-j2\omega \tau}\sum \limits_{s=-\infty}^{\infty }{W}_M(s)\sqrt{\frac{\sigma }{4\pi {\tau}^2}}{e}^{-\sigma {s}^2/\left(4{\tau}^2\right)}x\left(l+s+\tau \right){x}^{\ast}\left(l+s-\tau \right) $$
(4)

where WN(τ) is a symmetric window function with range −N/2 ≤ τ ≤ N/2, and WM(s) is a rectangular window equal to 1 in the range −M/2 ≤ s ≤ M/2.
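
For concreteness, the following is a minimal Python sketch of the windowed discrete CWD of Eq. (4). The window lengths, the handling of the τ = 0 term, and the output normalization are implementation choices rather than values prescribed by this paper.

```python
import numpy as np

def cwd_discrete(x, sigma=1.0, n_lag=128, n_smooth=32):
    """Sketch of Eq. (4). x: complex (analytic) signal; sigma: kernel
    width (sigma = 1 in this paper); n_lag, n_smooth: lengths of the
    lag window W_N and time-smoothing window W_M (assumed values)."""
    L = len(x)
    half_lag = n_lag // 2
    half_s = n_smooth // 2
    R = np.zeros((L, n_lag), dtype=complex)   # smoothed local autocorrelation
    for l in range(L):
        R[l, 0] = abs(x[l]) ** 2              # tau = 0 term: no smoothing
        for tau in range(1, half_lag):
            acc = 0.0 + 0.0j
            for s in range(-half_s, half_s + 1):
                i1, i2 = l + s + tau, l + s - tau
                if 0 <= i1 < L and 0 <= i2 < L:
                    # Gaussian CWD kernel sqrt(sigma/(4*pi*tau^2)) * exp(...)
                    w = (np.sqrt(sigma / (4 * np.pi * tau ** 2))
                         * np.exp(-sigma * s ** 2 / (4.0 * tau ** 2)))
                    acc += w * x[i1] * np.conj(x[i2])
            R[l, tau] = acc
            R[l, -tau] = np.conj(acc)         # Hermitian symmetry in the lag
    # DFT over the lag axis yields the time-frequency image (the factor-of-2
    # frequency scaling of Eq. (4) is absorbed into the bin labeling)
    return np.fft.fftshift(np.fft.fft(R, axis=1), axes=1).real
```

Applied to an analytic LFM pulse, e.g., `cwd_discrete(np.exp(1j * np.pi * 2e-4 * np.arange(1024) ** 2))`, this produces a diagonal ridge of the kind shown in Fig. 2.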

Figure 2 presents CWD images at SNR = 8 dB. As shown in the figure, the CWD images of the 8 LPI signals differ significantly, which facilitates feature extraction by the neural network.

Fig. 2 Different classes of waveforms, including linear frequency modulation (LFM), Costas code, T1, T2, T3, T4, Frank code, and BPSK code

3 Radar signal recognition algorithm based on transfer learning and feature fusion

Deep neural networks have recently drawn considerable interest in machine learning, and image classification based on CNNs is widely used because of its high robustness and excellent performance. As highly powerful deep neural networks, deep CNNs have successfully been applied to image recognition and classification, NLP sentence classification, and more [17,18,19,20], and they have also been widely applied in industry. However, deep CNNs contain a large number of parameters and can easily overfit when trained on a small-scale database. This problem is addressed via transfer learning by reusing the structure and weights of a pre-trained model [21,22,23,24,25].

3.1 Network-based transfer learning

Network-based deep transfer learning refers to reusing part of a network pre-trained in the source domain, including its structure and connection parameters, as part of the deep neural network used in the target domain.

Figure 3 illustrates transfer learning with a CNN. Most of the network's parameters lie in the fully connected layers. During training, all convolutional layers and the first two fully connected layers are frozen, and only the parameters of the last fully connected layer (the fc8 layer in AlexNet and VGGNet) are fine-tuned; a softmax classifier then performs the classification. To transfer the pre-trained network model to the task of this study, the dimension of the last fully connected layer of the CNN model pre-trained on ImageNet is changed to suit our task, and the parameters of this layer are randomly initialized. The fc8 layer is then fine-tuned on the database established in this paper. The pooling layers in CNNs mainly scale down the convolution results and can be regarded as nonlinear down-sampling layers; the scale of the feature map decreases after each pooling operation.

Fig. 3 Transfer learning based on the deep convolutional neural network model
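
As an illustration, the following PyTorch sketch reproduces this fine-tuning scheme. The paper's own experiments use MATLAB, so the torchvision model, layer index, and optimizer settings here are assumptions; the name fc8 follows the original Caffe-style AlexNet naming.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load AlexNet pre-trained on ImageNet
model = models.alexnet(pretrained=True)

# Freeze all convolutional layers and the first two FC layers (fc6, fc7)
for p in model.parameters():
    p.requires_grad = False

# Replace the last FC layer (fc8) with a randomly initialized 8-way
# layer, one output per radar waveform class; only this layer trains
num_classes = 8
model.classifier[6] = nn.Linear(4096, num_classes)

optimizer = torch.optim.SGD(model.classifier[6].parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()   # softmax + cross-entropy loss
```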

Figure 4 presents feature maps from the third convolutional layer of the AlexNet, VGG16, and VGG19 networks, respectively. The maps show that the image features extracted by the different network models vary with the network structure and the size of the convolution kernels.

Fig. 4 Visualization of convolutional-layer features: (a) convolutional layer of AlexNet; (b) convolutional layer of VGG16; (c) convolutional layer of VGG19

3.2 Feature fusion algorithm

Deep CNNs can automatically extract deep image features. If the given dataset is relatively simple, however, the model may fit unnecessary correlations in the data, such as noise, which leads to overfitting. Therefore, to prevent overfitting when training the neural networks and to compensate for the weakness of deep networks in expressing shallow features, we use feature fusion for radar signal recognition. The texture features of an image describe repeated local patterns and their arrangement rules, and they are widely applied in image classification owing to their satisfactory noise robustness and rotation invariance [26]. To keep the final classifier simple, image texture analysis is conducted on the statistical properties of the gray-level histogram, yielding six characteristic values. The features selected for fusion are listed in Table 1.

Table 1 List of features selected for feature fusion

The nth-order moments about the mean are expressed as Eq. (5):

$$ {\mu}_n=\sum \limits_{i=0}^{L-1}{\left({z}_i-m\right)}^np\left({z}_i\right) $$
(5)

where zi is a random variable representing gray level, p(zi) is the gray-level histogram of a region, and L is the number of gray levels. All features in the table can be derived from Eq. (5); the expression for each is given in Table 1.
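
Since Table 1 is not reproduced here, the sketch below assumes the six classic histogram statistics (mean, standard deviation, smoothness, third moment, uniformity, and entropy) commonly derived from Eq. (5); the exact feature set of Table 1 may differ.

```python
import numpy as np

def histogram_texture_features(img, levels=256):
    """Six gray-histogram texture statistics based on Eq. (5)."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()                    # p(z_i), normalized histogram
    z = np.arange(levels, dtype=float)
    m = np.sum(z * p)                        # mean gray level
    mu2 = np.sum((z - m) ** 2 * p)           # variance (2nd moment of Eq. 5)
    std = np.sqrt(mu2)
    smoothness = 1 - 1 / (1 + mu2 / (levels - 1) ** 2)  # normalized R
    mu3 = np.sum((z - m) ** 3 * p)           # third moment (skewness)
    uniformity = np.sum(p ** 2)              # energy of the histogram
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([m, std, smoothness, mu3, uniformity, entropy])
```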

A support vector machine (SVM) is a supervised learning algorithm mainly used for data classification in pattern recognition. Its basic idea is to find the optimal separating hyperplane in feature space that maximizes the margin between positive and negative samples in a training set. By applying the Lagrange multiplier method and a kernel function, this idea can be transformed into the following constrained optimization problem [23]:

$$ {\displaystyle \begin{array}{l}\max L\left(\alpha \right)=\sum \limits_{j=1}^N{\alpha}_j-\frac{1}{2}\sum \limits_{j=1}^N\sum \limits_{k=1}^N{\alpha}_j{\alpha}_k{y}_j{y}_kK\left({x}_j,{x}_k\right)\\ {}s.t.\left\{\begin{array}{c}0\le {\alpha}_j\le C\\ {}\sum \limits_{j=1}^N{\alpha}_j{y}_j=0\end{array}\right.\end{array}} $$
(6)

where C is the penalty coefficient, which controls the degree to which the SVM penalizes misclassification, and K(xj, xk) is the kernel function. Four kernel functions are commonly used: the linear, polynomial, sigmoid, and Gaussian radial-basis kernels. In this study, we choose the Gaussian kernel, expressed as Eq. (7):

$$ K\left({x}_j,{x}_k\right)=\exp \left(-\frac{{\left\Vert {x}_j-{x}_k\right\Vert}^2}{2{\sigma}^2}\right) $$
(7)

Lastly, we concatenate the fc7-layer features of the deep CNN with the image texture features in series and use the fused features to train the SVM classifier that recognizes the radar signal. The overall flowchart of the algorithm is shown in Fig. 5.

Fig. 5 Overall flow of the feature fusion algorithm
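
A minimal sketch of this fusion-and-classification step is given below, continuing the PyTorch/scikit-learn assumptions of the earlier sketches; the fc7 slicing indices and SVM hyperparameters are illustrative, not values reported by the paper.

```python
import numpy as np
import torch
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def extract_fc7(model, images):
    """Run torchvision's AlexNet up to the fc7 activation (classifier
    indices 0-5: dropout, fc6, relu, dropout, fc7, relu)."""
    with torch.no_grad():
        feats = model.avgpool(model.features(images)).flatten(1)
        for layer in model.classifier[:6]:
            feats = layer(feats)
    return feats.numpy()                      # shape (n, 4096)

def train_fused_svm(fc7_feats, texture_feats, labels):
    # Series fusion: deep and shallow features are simply concatenated
    fused = np.concatenate([fc7_feats, texture_feats], axis=1)
    clf = make_pipeline(StandardScaler(),     # balance deep/shallow scales
                        SVC(kernel="rbf", C=1.0))  # Gaussian kernel, Eq. (7)
    clf.fit(fused, labels)
    return clf
```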

3.3 Dataset

Automatic waveform recognition systems are generally deployed in highly complex environments, so the length of the received signal is indefinite. To expedite the calculation, we set the number of samples to a random value between 512 and 1024 and discard data longer than 1024 samples. The signal is assumed to propagate over a Gaussian white-noise channel, with the signal-to-noise ratio defined as [27,28,29,30,31] \( SNR=10{\log}_{10}\left({\sigma}_s^2/{\sigma}_{\varepsilon}^2\right) \).

All LPI radar signal generation and experimental simulation in this section are based on MATLAB 2016a. Different LPI radar modulation types have different parameters; thus, this study draws them from uniform distributions U(⋅) defined relative to the sampling frequency fs. The radar signal parameters are listed in Table 2.

Table 2 List of simulation parameters

In this section, the SNR ranges from −8 to 8 dB at intervals of 2 dB. We transform the radar signals into time–frequency images by the CWD and establish a database containing 1800 CWD images covering the 8 types of radar signals. This database is split into a training set of 1440 images and a test set of 360 images. The images have a resolution of 236*236*3, with the third dimension corresponding to the red–green–blue color channels.
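
A sketch of this generation pipeline for one waveform class is shown below; the LFM parameter ranges are placeholders rather than the exact values of Table 2, and the SNR is drawn from the −8:2:8 dB grid used here.

```python
import numpy as np

rng = np.random.default_rng(0)

def awgn(x, snr_db):
    """Add complex white Gaussian noise at the given SNR, per the
    definition SNR = 10*log10(sigma_s^2 / sigma_eps^2)."""
    p_sig = np.mean(np.abs(x) ** 2)
    p_noise = p_sig / 10 ** (snr_db / 10)
    noise = np.sqrt(p_noise / 2) * (rng.standard_normal(len(x))
                                    + 1j * rng.standard_normal(len(x)))
    return x + noise

def random_lfm_sample(fs=100e6):
    """One LFM sample with randomized parameters (assumed ranges)."""
    n = int(rng.integers(512, 1025))       # random length in [512, 1024]
    f0 = rng.uniform(fs / 16, fs / 8)      # start frequency (assumed range)
    bw = rng.uniform(fs / 16, fs / 8)      # sweep bandwidth (assumed range)
    t = np.arange(n) / fs
    x = np.exp(2j * np.pi * (f0 * t + bw / (2 * t[-1]) * t ** 2))
    snr_db = int(rng.integers(-4, 5)) * 2  # SNR on the grid -8:2:8 dB
    return awgn(x, snr_db)
```

Each generated sample would then be passed through `cwd_discrete` (Section 2) and saved as an RGB time–frequency image.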

4 Simulation results

To verify the effectiveness of the transfer learning and feature fusion algorithm on our task, experiments were conducted with AlexNet, VGG-VD16, and VGG-VD19. All images were resized to the optimal resolution for each CNN model, that is, either 227*227*3 or 224*224*3. The dimension of the last layer of each model is modified to 1*1*4096*8, corresponding to the number of radar waveform classes. During training, the parameters of all earlier layers are frozen, and the parameters of the fc8 layer are randomly initialized. Compared with real-world objects, time–frequency images are relatively simple, which might prompt the model to fit noise and other unnecessary data associations. To reduce overfitting, several dropout layers are added right before the output layer; dropout weakens the reliance on any single feature and encourages more feature combinations. The number of epochs is set to 30, and the batch size is 64. Each deep CNN model is trained with the standard cross-entropy loss (cost) function. The experimental environment is listed in Table 3.

Table 3 Testing environment
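
For reference, the training configuration just described maps onto a loop like the following (a sketch continuing the earlier PyTorch assumptions; `train_set` is the CWD image dataset, and `model`, `optimizer`, and `criterion` come from the fine-tuning sketch in Section 3.1):

```python
from torch.utils.data import DataLoader

# 30 epochs, batch size 64, cross-entropy loss; dropout layers in
# model.classifier are active because the model is in training mode
loader = DataLoader(train_set, batch_size=64, shuffle=True)
model.train()
for epoch in range(30):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # cross-entropy cost
        loss.backward()              # only fc8 gradients are non-zero
        optimizer.step()
```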

Figure 6 presents the recognition rate of the three pre-trained models over different numbers of iterations. After time–frequency analysis of the radar signals, the differences between classes become more apparent, so the method achieves good performance with fewer iterations. Meanwhile, Fig. 7 tests the robustness of the three models: as shown in the figure, the recognition rate of all three models still exceeds 90% when the sample size is reduced.

Fig. 6 Recognition performance with different numbers of iterations

Fig. 7 Recognition performance under different training sample sizes

Figure 7 indicates that satisfactory recognition performance can still be obtained when the sample size is small. Therefore, the algorithm proposed in this study exhibits strong robustness.

Studies that introduce multi-time codes into classification experiments are rarely reported. Thus, the experiments in this section are compared with Lundén's [11] and Ming Zhang's [20] methods, whose classified signal waveforms are similar to those in this paper. The simulation results are shown in Fig. 8.

Fig. 8 Overall waveform classification performance

As shown in Fig. 8, because multi-time coding is introduced into the classification system here for the first time, Lundén's method performs poorly in recognizing the multi-time codes. Ming Zhang was the first to apply a deep-learning algorithm, and its performance is relatively good when the SNR is above 0 dB.

The classification method proposed in this study still achieves good recognition accuracy below 0 dB. Specifically, feature fusion with the VGG16 pre-trained network model yields satisfactory performance and stability.

Table 4 shows the improvement in recognition performance obtained with the feature fusion algorithm and SVM classifier. The experimental results indicate that, compared with using the fc7-layer features alone, feature fusion yields better recognition performance.

Table 4 Performances of feature fusion and non-feature fusion

Figure 9 presents the detailed recognition results for the eight waveforms at −4 dB. The correct recognition rates of all models exceed 96% after 30 epochs of training. However, VGG19 yields the poorest recognition, followed by AlexNet. VGG19 is the deepest of the three network structures, so the features it extracts are relatively abstract, which can lead to overfitting during training. VGG16 and AlexNet, with relatively shallower structures, achieve good recognition results. We evaluated the recognition performance of transfer learning and feature fusion at low SNR; the results show that the proposed algorithm can be effectively used for the classification of CWD images and hence for the recognition of different radar signals.

Fig. 9 Confusion matrices for the feature fusion algorithm: (a) feature fusion with AlexNet; (b) feature fusion with VGG-16; (c) feature fusion with VGG-19
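
Per-class results of the kind plotted in Fig. 9 can be tabulated from the fused-feature classifier as follows (a sketch; `clf`, `test_feats`, and `test_labels` continue the assumptions of the earlier sketches):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

pred = clf.predict(test_feats)                  # fused test features
cm = confusion_matrix(test_labels, pred)        # 8x8 waveform matrix
per_class_acc = cm.diagonal() / cm.sum(axis=1)  # recall per waveform
print(np.round(per_class_acc, 3))
```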

5 Conclusion

We proposed an automatic recognition system for LPI radar waveforms. To reduce the influence of cross-terms, the CWD is used for time–frequency analysis of the signals, and a small-scale CWD image database is created for training CNNs. This method transforms the abstract problem of radar signal intra-pulse recognition into one of image classification, leveraging the superior performance of deep neural networks in that field. Meanwhile, to address the tendency of deep CNNs to overfit on small-scale datasets, this study introduces transfer learning to fine-tune a pre-trained network for this task. The experimental results indicate that the trained CNNs automatically extract CWD image features, which avoids the difficulties of hand-crafted features. In addition, this study proposes a feature fusion algorithm that combines the features extracted by the CNN with shallow features, thereby compensating for the insufficiency of deep features in expressing the image. Lastly, the SVM classifier is used for LPI radar signal recognition. The recognition accuracies under different SNRs indicate that the method still achieves good recognition at low SNR, providing a feasible solution for LPI radar signal recognition under low-SNR conditions.