1 Introduction

Radar signal sorting separates the interleaved pulse stream received from different radar emitters. As an important part of electronic intelligence (ELINT) and electronic support measure (ESM) systems, signal sorting directly affects the performance of electronic reconnaissance equipment and is a key technology for campaign decision-making [22].

In general, the parameters used by sorting algorithms are extracted from the pulses. They consist mainly of direction of arrival (DOA), radio frequency (RF), pulse amplitude (PA), pulse width (PW) and time of arrival (TOA), which together form the pulse descriptor word (PDW) [20]. Radar signal sorting approaches fall into three main categories: single-parameter, parameter-by-parameter and multiple-parameter sorting.

  1. (1)

    Single-parameter sorting relies mainly on TOA, from which the pulse repetition interval (PRI) can be derived; various PRI-based algorithms can then be used to sort the signals [6, 28]. However, as waveform designs and electromagnetic environments become more complex, single-parameter sorting is increasingly limited and its sorting rate is low.

  2. (2)

    Parameter-by-parameter sorting is a serial rule-based detection process. Pulse parameters such as RF, PA, PW and DOA are compared with pulse groups confirmed in advance, checking whether each parameter falls within an existing cell with a certain tolerance range [25]. This method is not only slow but also ineffective when the parameters are incomplete or corrupted by jamming or noise.

  3. (3)

    Multiple-parameter sorting is usually performed by clustering on several PDW parameters [7, 8, 12]. This approach achieves better sorting results than the two methods above.

Given these considerations, radar signal sorting research should focus mainly on clustering-based methods. As a highly effective clustering method, the support vector clustering (SVC) algorithm [1] has been widely studied, both in theory and in practical applications, because of its outstanding features [9, 13, 15, 16, 18, 19, 21, 24]. Zhang et al. [27] applied SVC to radar emitter signal (RES) recognition, and in [7] it was used for radar signal sorting with good results. However, SVC is widely recognized to be time-consuming because of its optimization problem and its cluster labeling method [17], as our experimental results also show.

This paper presents a new sorting algorithm. First, the SVC algorithm maps the radar signal data points into a high-dimensional feature space and forms separate clusters of points. Second, the cone cluster labeling (CCL) method determines which points belong to which cluster; we call these two steps together cone mapping support vector clustering (CMSVC). Finally, the proposed similitude entropy (SE) validity index, which uses information entropy theory to assess the compactness and separation of clusters, efficiently identifies the best sorting result.

The paper is organized as follows. In Sect. 2 the signal sorting model based on SVC with the cluster labeling method of CCL is introduced. The proposed similitude entropy (SE) index for indicating the correct sorting clusters is derived, and then the radar signal sorting system based on CMSVC and SE index is presented in Sect. 3. Section 4 implements the sorting algorithm by adjusting the parameters of CMSVC and updating the dynamic library of radar signal clusters. In Sect. 5, several experimental results are presented and discussed. The paper is concluded in Sect. 6.

2 Cone Mapping Support Vector Clustering-Based Signal Sorting

In this section, we introduce the signal sorting model based on SVC with the cluster labeling method of CCL, namely based on the cone mapping support vector clustering (CMSVC) algorithm.

2.1 Sorting Model Based on Support Vector Clustering

In clustering-based sorting, the greatest obstacle is that the features used for clustering may not separate the clusters satisfactorily, which limits the practical effectiveness of such methods. Support vector clustering (SVC) has a clear advantage in resolving this issue.

SVC is an unsupervised, non-parametric clustering algorithm proposed by Ben-Hur et al. [1] on the basis of the support vector domain description (SVDD) algorithm. It uses the support vector machine (SVM) as a tool for clustering, and its basic idea is as follows: first, map the data points from data space to a high-dimensional feature space by means of a Gaussian kernel; then find the smallest hypersphere that encloses the images of the data in this space; finally, map this hypersphere back to data space, where it forms a set of contours enclosing the data points. These contours are interpreted as cluster boundaries, and points enclosed by the same contour are assigned to the same cluster.

SVC is an effective unsupervised method with several distinctive advantages: it generates cluster boundaries of arbitrary shape, handles noisy data points and can separate overlapping clusters, which many other clustering algorithms cannot do. Moreover, its nonlinear transformation increases the likelihood that the data become linearly separable, so that useful features are more likely to be recognized, extracted and relatively magnified [2, 26]. Finally, by employing slack variables, SVC can deal with outliers and reduce the influence of noise.

Let \(G \subseteq \Re^{3}\) be a subset of the PDW, here \(G = \{\mathrm{RF}, \mathrm{DOA}, \mathrm{PW}\}\), and let \(\{\boldsymbol{g}_i\} \subseteq G\), \(i = 1, 2, \ldots, N\), be the radar pulse parameter vectors, where N is the number of vectors. Mapping these data points to feature space, we seek the smallest enclosing sphere with center \(\boldsymbol{a}\) and radius R. The signal sorting problem is thereby converted into the following constrained optimization problem [1]:

$$ \min_{R,\,\boldsymbol{a},\,\xi} \; R^{2} + C\sum_{i} \xi_{i}\quad\text{s.t.}\quad \bigl\| \varPhi(\boldsymbol{g}_{i}) - \boldsymbol{a} \bigr\| ^{2} \le R^{2} + \xi_{i},\quad \xi_{i} \ge 0 $$
(1)

where C is the constant penalty factor, Φ is a nonlinear transformation, \(\varPhi(\boldsymbol{g})\) is the image of \(\boldsymbol{g}\), ∥·∥ is the Euclidean norm and \(\xi_i\) is the slack variable of the ith point.

To solve the problem, the Lagrangian is introduced as

$$ L = R^{2} - \sum_{i} \bigl(R^{2} + \xi_{i} - \bigl\| \varPhi( \boldsymbol{g}_{i}) - \boldsymbol{a} \bigr\| ^{2}\bigr) \beta_{i} - \sum_{i} \xi_{i} \mu_{i} + C\sum_{i} \xi_{i} $$
(2)

where \(\beta_i \ge 0\) and \(\mu_i \ge 0\) are Lagrange multipliers.

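Setting the derivatives of (2) with respect to R, \(\boldsymbol{a}\) and \(\xi_i\) to zero gives \(\sum_i \beta_i = 1\), \(\boldsymbol{a} = \sum_i \beta_i \varPhi(\boldsymbol{g}_i)\) and \(\beta_i = C - \mu_i\). Substituting these into (2) yields the Wolfe dual of problem (1):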

$$\begin{aligned} &\max_{\beta} \biggl( \sum_{i} \beta_{i} \varPhi^{T}(\boldsymbol{g}_{i})\varPhi( \boldsymbol{g}_{i}) - \sum_{i,j} \beta_{i}\beta_{j}\varPhi^{T}(\boldsymbol{g}_{i}) \varPhi(\boldsymbol{g}_{j}) \biggr) \\ &\text{s.t.}\quad 0 \le \beta_{i} \le C,\quad \sum_{i} \beta_{i} = 1 \end{aligned}$$
(3)

In this paper we use the Gaussian kernel \(K(\boldsymbol{g}_{i},\boldsymbol{g}_{j}) = \varPhi ^{T}(\boldsymbol{g}_{i})\varPhi (\boldsymbol{g}_{j}) = e^{ - q\| \boldsymbol{g}_{i} - \boldsymbol{g}_{j} \|^{2}}\) with width parameter q to represent the dot products as a Mercer kernel [1]. Using \(\boldsymbol{a} = \sum_j \beta_j \varPhi(\boldsymbol{g}_j)\) and \(K(\boldsymbol{g},\boldsymbol{g}) = 1\), the squared distance between the image of a point and the center of the sphere is

$$ R^{2}(\boldsymbol{g}) = \bigl\| \varPhi(\boldsymbol{g}) - \boldsymbol{a} \bigr\| ^{2} = 1 - 2\sum _{j} \beta_{j}K(\boldsymbol{g}_{j}, \boldsymbol{g}) + \sum_{i,j} \beta_{i} \beta_{j}K(\boldsymbol{g}_{i},\boldsymbol{g}_{j}) $$
(4)

Solving (3) gives the optimal \(\beta_i\), and evaluating (4) at any support vector (defined below) gives the radius R of the optimal hypersphere. In data space, the contours \(\{\boldsymbol{g} : R(\boldsymbol{g}) = R\}\) enclose the data points.

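From the Karush–Kuhn–Tucker (KKT) complementarity conditions [14], \(\xi_i \mu_i = 0\) and \(\bigl(R^2 + \xi_i - \|\varPhi(\boldsymbol{g}_i) - \boldsymbol{a}\|^2\bigr)\beta_i = 0\), we conclude: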

  1. (1)

    Only points \(\varPhi(\boldsymbol{g}_i)\) with \(\beta_i \ne 0\) contribute to the optimal center; among these:

    1. (1)

      A point \(\varPhi(\boldsymbol{g}_i)\) with \(0 < \beta_i < C\) lies on the surface of the feature-space sphere. Such points \(\boldsymbol{g}_i\) are called support vectors (SVs), denoted \(\boldsymbol{v}_i\); they satisfy \(R(\boldsymbol{v}_i) = R\) and lie on the cluster boundaries. SVs on the same boundary contour in data space delimit the cluster of one type of radar signal, and the boundary shape depends on the kernel function.

    2. (2)

      A point \(\varPhi(\boldsymbol{g}_i)\) with \(\beta_i = C\) lies outside the feature-space sphere. Such points \(\boldsymbol{g}_i\) are called bounded support vectors (BSVs); they satisfy \(R(\boldsymbol{g}_i) > R\) and lie outside the cluster boundaries in data space. These points represent outliers or noise.

  2. (2)

    A point \(\varPhi(\boldsymbol{g}_i)\) with \(\beta_i = 0\) lies inside (or on) the feature-space sphere, so \(\boldsymbol{g}_i\) lies inside the contours in data space. Points enclosed by the same contour are instances of the same type of radar signal.
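The optimization (1)–(4) and the KKT classification above can be illustrated with a short NumPy sketch. It is not the solver used in [1] or in this paper: it assumes a simple pairwise (SMO-style) update for the dual (3) and exploits \(K(\boldsymbol{g},\boldsymbol{g}) = 1\) for the Gaussian kernel. The function names `gaussian_kernel`, `solve_svc_dual` and `radius_sq`, the value q = 4 and the two synthetic 2-D point groups are hypothetical choices for illustration, not radar PDWs.

```python
import numpy as np


def gaussian_kernel(X, Y, q):
    """K(x, y) = exp(-q * ||x - y||^2) for every pair of rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-q * d2)


def solve_svc_dual(K, C, max_iter=100_000, tol=1e-10):
    """Solve dual (3) with pairwise (SMO-style) updates (illustrative solver).

    With a Gaussian kernel K_ii = 1, so under sum(beta) = 1 the first term of
    (3) is constant; maximizing (3) equals minimizing beta^T K beta subject to
    0 <= beta_i <= C and sum(beta) = 1.
    """
    n = K.shape[0]
    if C < 1.0 / n:
        raise ValueError("C must be at least 1/N for (3) to be feasible")
    beta = np.full(n, 1.0 / n)                    # feasible starting point
    for _ in range(max_iter):
        grad = 2.0 * K @ beta                     # gradient of beta^T K beta
        up = np.flatnonzero(beta < C - 1e-12)     # beta_i can still increase
        down = np.flatnonzero(beta > 1e-12)       # beta_j can still decrease
        if up.size == 0:
            break
        i = up[np.argmin(grad[up])]
        j = down[np.argmax(grad[down])]
        gap = grad[j] - grad[i]
        if gap < tol:                             # KKT conditions met within tol
            break
        curv = K[i, i] + K[j, j] - 2.0 * K[i, j]
        step = gap / (2.0 * curv) if curv > 1e-15 else np.inf
        step = min(step, C - beta[i], beta[j])    # stay inside 0 <= beta <= C
        beta[i] += step                           # move weight from j to i
        beta[j] -= step
    return beta


def radius_sq(K_cross, beta, K):
    """Eq. (4): R^2(g) for points whose kernel rows against the g_j are K_cross."""
    return 1.0 - 2.0 * K_cross @ beta + beta @ K @ beta


# Toy usage: two synthetic groups of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(5.0, 0.3, (10, 2))])
q, C = 4.0, 1.0                       # C = 1: no BSVs are possible
K = gaussian_kernel(X, X, q)
beta = solve_svc_dual(K, C)
eps = 1e-8
sv = (beta > eps) & (beta < C - eps)  # support vectors: 0 < beta_i < C
bsv = beta >= C - eps                 # bounded support vectors: beta_i = C
r2 = radius_sq(K, beta, K)            # R^2(g_i) for every training point
R2 = r2[sv].mean()                    # SVs share R^2 = R^2(v_i) at the optimum
```

Because the remaining optimality gap bounds the spread of \(R^2(\boldsymbol{v}_i)\) over the SVs, all SVs share the same radius up to the solver tolerance, and points with \(\beta_i = 0\) fall inside it, matching the KKT cases above.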

2.2 Support Vector Clustering with Cone Cluster Labeling Method

Because the contours alone do not tell which points belong to which cluster, the complete graph (CG) and support vector graph (SVG) cluster labeling methods were proposed in [1], and the improved proximity graph (PG) and gradient descent (GD) methods were proposed in [11, 23]. However, these methods still have high time complexity, because they sample points along the line segment between each pair of data points to decide whether the pair belongs to the same cluster. Hence they cannot meet the real-time and accuracy requirements of sorting radar signals in dense and complex electromagnetic environments.

To further reduce the time spent on cluster labeling, the cone cluster labeling (CCL) algorithm of [10] is applied. Because it does not compute the adjacency matrix by sampling the path between pairs of data points, it lowers the time complexity and also avoids the labeling inaccuracy that such sampling can cause.

The main idea of CCL is to find cones that contain the SV images \(\varPhi(\boldsymbol{v}_i)\) and together cover a key portion of the minimal hypersphere in feature space. In data space these cones correspond to hyperspheres: a data point \(\boldsymbol{g}_j\) whose image \(\varPhi(\boldsymbol{g}_j)\) lies inside a cone lies inside the corresponding hypersphere, i.e. it satisfies \(\|\boldsymbol{v}_i - \boldsymbol{g}_j\| \le \|\boldsymbol{v}_i - \varPhi^{-1}(\boldsymbol{a})\|\), and such points are strongly compact within their class. The union of these hyperspheres forms an approximate covering of the data-space contours. Using this covering, the SVs are clustered first, and the data points associated with SVs of the same class form a single cluster; the remaining data points can then be clustered easily.

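It is shown in [10] that \(\|\boldsymbol{v}_i - \varPhi^{-1}(\boldsymbol{a})\| = x^{1/2}\), where \(x = - \ln(\sqrt{1 - R^{2}} ) / q\); this value is the same for all SVs. Setting \(Z = \|\boldsymbol{v}_i - \varPhi^{-1}(\boldsymbol{a})\|\), the main steps of the CCL algorithm are as follows: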

  1. Step 1.

    Compute Z for q.

  2. Step 2.

    Compute the Euclidean distance between each pair of SVs. If the distance is less than 2Z, the two SVs lie in the same cluster.

  3. Step 3.

    Repeat Step 2 until all SVs have been clustered.

  4. Step 4.

    For each remaining data point \(\boldsymbol{g}_j\), compute its distance d to the SVs and assign \(\boldsymbol{g}_j\) to the class that contains its nearest SV.

  5. Step 5.

    Repeat Step 4 until all remaining data points are clustered.
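Steps 1–5 can be sketched in NumPy as follows, assuming the SVs and \(R^2\) have already been obtained from the SVC step; in the toy usage they are supplied by hand, with \(R^2 = 1 - e^{-2}\) chosen so that Z = 1 for q = 1. The function name `cone_cluster_labeling` is hypothetical, and Steps 2–3 are implemented as the connected components of the graph that links SVs closer than 2Z.

```python
import numpy as np


def cone_cluster_labeling(X, sv_idx, R2, q):
    """Label every row of X following CCL Steps 1-5 (illustrative sketch)."""
    # Step 1: Z = sqrt(-ln(sqrt(1 - R^2)) / q), identical for all SVs
    Z = np.sqrt(-np.log(np.sqrt(1.0 - R2)) / q)
    V = X[sv_idx]
    # Step 2: link every pair of SVs that are closer than 2Z
    d_sv = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)
    linked = d_sv < 2.0 * Z
    # Step 3: repeat the linking transitively (connected components via DFS)
    sv_label = np.full(len(sv_idx), -1)
    n_clusters = 0
    for seed in range(len(sv_idx)):
        if sv_label[seed] >= 0:
            continue
        sv_label[seed] = n_clusters
        stack = [seed]
        while stack:
            u = stack.pop()
            for w in np.flatnonzero(linked[u] & (sv_label < 0)):
                sv_label[w] = n_clusters
                stack.append(w)
        n_clusters += 1
    # Steps 4-5: every data point takes the class of its nearest SV
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=-1)
    labels = sv_label[np.argmin(d, axis=1)]
    return labels, Z


# Toy usage: two groups of four points; SV indices are assumed, not computed.
X = np.array([[0, 0], [0.5, 0], [0, 0.5], [0.4, 0.4],
              [5, 5], [5.5, 5], [5, 5.5], [5.4, 5.4]], dtype=float)
sv_idx = np.array([1, 2, 5, 6])
labels, Z = cone_cluster_labeling(X, sv_idx, R2=1.0 - np.exp(-2.0), q=1.0)
```

No line segment between data points is sampled: only SV-to-SV and point-to-SV distances are computed, which is where the time saving over CG, SVG, PG and GD comes from.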

3 Radar Signal Sorting System Based on CMSVC and SE Index

Traditional cluster validity indices cannot evaluate clustering results well enough to find the partition that best fits the radar data when the pulse stream is heavily interleaved, so a new validity index suited to the complicated signal environment is required. Periodically repeating pulses from one emitter strongly resemble one another, whereas pulses from different emitters resemble each other poorly. On this basis, the degree of resemblance between signal flows can be quantified by a resemblance coefficient, defined as

$$ S(\boldsymbol{g}_{1},\boldsymbol{g}_{2}) = \frac{\boldsymbol{g}_{1}^{T}\boldsymbol{g}_{2}}{\| \boldsymbol {g}_{1} \|\| \boldsymbol{g}_{2} \|} $$
(5)

where \(\boldsymbol{g}_1 = (g_1(u))\) and \(\boldsymbol{g}_2 = (g_2(u))\), u = 1, 2, 3, are the sample vectors of two emitter signals and \(\|\boldsymbol{g}\| = (\sum_u g^2(u))^{1/2}\); for vectors with positive components, S ∈ (0, 1]. S can therefore be viewed as the resemblance probability of two sample vectors. To obtain good clustering results, the within-cluster resemblance should be as strong as possible and the between-cluster resemblance as weak as possible. Given the physical meaning of information entropy and the characteristics of radar pulse signals, information entropy can be combined with the resemblance among signals: the poorer the resemblance of the sample vectors, the larger the entropy, and vice versa (strictly, the entropy term −S log S used below decreases with S only for S ≥ 1/e). We therefore introduce an information entropy index to represent the resemblance level of the clustered signals. The index increases with within-cluster resemblance and decreases with between-cluster resemblance. Within-cluster resemblance is measured by the within-cluster similitude entropy \(H_{\mathrm{comp}}(C)\), and between-cluster resemblance by the between-cluster similitude entropy \(H_{\mathrm{sep}}(C)\). The resulting index is called the similitude entropy (SE) and is defined as

$$ \mathrm{SE}=H_{\mathrm{sep}}(C)/H_{\mathrm{comp}}(C) $$
(6)

where the between-cluster similitude entropy \(H_{\mathrm{sep}}(C)\) and the within-cluster similitude entropy \(H_{\mathrm{comp}}(C)\) are given by

$$ H_{\mathrm{sep}}(C) = \sum_{k = 1}^{c} \min_{l = 1,\ldots, c,l \ne k} \{ H_{lk} \}, \qquad H_{\mathrm{comp}}(C) = \frac{1}{c}\sum_{k = 1}^{c} H_{kk} $$
(7)

where

$$\begin{aligned} &H_{lk} = - \frac{1}{n_{k}}\sum_{i = 1}^{n_{k}} S_{l,ki}\log S_{l,ki} \end{aligned}$$
(8)
$$\begin{aligned} &S_{l,ki} = S(\boldsymbol{m}_{l},\boldsymbol{g}_{ki}) = \frac{\boldsymbol{m}_{l}^{\boldsymbol{T}}\boldsymbol{g}_{ki}}{\| \boldsymbol{m}_{l} \|\| \boldsymbol{g}_{ki} \|} \end{aligned}$$
(9)

where c is the number of clusters, \(\boldsymbol{m}_l\) is the center of cluster \(C_l\), \(\boldsymbol{g}_{ki}\) is the ith of the \(n_k\) sample vectors in cluster \(C_k\), and \(S_{l,ki}\) is the resemblance coefficient between \(\boldsymbol{m}_l\) and \(\boldsymbol{g}_{ki}\). When l = k, \(H_{kk}\) is the within-cluster similitude entropy of \(C_k\); when l ≠ k, \(H_{lk}\) is the between-cluster similitude entropy between \(C_l\) and \(C_k\).

Hence, a larger SE indicates better compactness and separation of the clusters, so the correct clustering result can be determined by maximizing SE. We run the CMSVC algorithm with different values of q, obtaining different numbers of classes c, c ∈ [2, N − 1]. Following the above analysis, the selected partition is the one whose similitude entropy satisfies \(\mathrm{SE} = \max\{\mathrm{SE}_c,\ 2 \le c \le N-1\}\).
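Equations (5)–(9) translate directly into a few lines of NumPy. In this sketch the cluster center \(\boldsymbol{m}_l\) is assumed to be the arithmetic mean of the cluster (the text does not specify how the center is computed), S is clipped away from 0 so that the logarithm stays finite, and the function names are hypothetical. The toy vectors only illustrate that a correct partition scores a higher SE than a mixed one; selecting the partition then amounts to keeping the labeling with the largest `se_index` over the candidate values of c.

```python
import numpy as np


def resemblance(m, G):
    """Eqs. (5)/(9): cosine resemblance between vector m and each row of G."""
    S = (G @ m) / (np.linalg.norm(G, axis=1) * np.linalg.norm(m))
    return np.clip(S, 1e-12, 1.0)   # keeps log(S) finite; S is in (0, 1] for positive data


def entropy_matrix(X, labels):
    """H[l, k] of eq. (8): center of cluster l against the samples of cluster k."""
    ks = np.unique(labels)
    centers = [X[labels == k].mean(axis=0) for k in ks]   # assumption: center = mean
    H = np.empty((len(ks), len(ks)))
    for l, m_l in enumerate(centers):
        for k, k_val in enumerate(ks):
            S = resemblance(m_l, X[labels == k_val])
            H[l, k] = -np.mean(S * np.log(S))
    return H


def se_index(X, labels):
    """Eqs. (6)-(7): SE = H_sep / H_comp (requires at least two clusters)."""
    H = entropy_matrix(X, labels)
    c = H.shape[0]
    off_diag = H + np.diag(np.full(c, np.inf))   # exclude l == k from the minimum
    H_sep = off_diag.min(axis=0).sum()           # sum over k of min_{l != k} H[l, k]
    H_comp = np.trace(H) / c                     # mean of the H[k, k]
    return H_sep / H_comp


# Toy usage: two groups of positive 3-D vectors pointing in different directions.
X = np.array([[1, 10, 1], [1.2, 9.8, 1.1], [0.9, 10.2, 0.9],
              [10, 1, 1], [9.8, 1.2, 1.1], [10.2, 0.9, 0.9]])
se_good = se_index(X, np.array([0, 0, 0, 1, 1, 1]))   # correct partition
se_mixed = se_index(X, np.array([0, 1, 0, 1, 0, 1]))  # groups mixed together
```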

Because SE captures both within-cluster compactness and between-cluster separation when the PDW subset G is partitioned into c classes, it can serve as a cluster validity index. Combining it with the low time complexity of the CCL algorithm, we propose the radar signal sorting method shown in Fig. 1, which integrates the CMSVC algorithm with the SE index. Its main steps are as follows:

  1. Step 1.

    Extract the PDW subset \(G_i = \{\mathrm{RF}, \mathrm{DOA}, \mathrm{PW}\}_i\) segment by segment with a real-time processing method, where i denotes the ith extraction operation.

  2. Step 2.

    Pre-sort the sample vectors composed of RF, DOA and PW by running the CMSVC algorithm, and adjust the Gaussian kernel width q and the soft-margin constant C using the SE index.

  3. Step 3.

    Sort the sample vectors of G with the optimal parameters, process the final output once more according to the SE index, and update the dynamic library of radar signal clusters.

Fig. 1 Framework of radar signal sorting system based on the combination of CMSVC and SE index

4 Algorithm Implementation

In this section we present the parameter adjustment algorithm based on the proposed SE validity index and run the CMSVC algorithm with the resulting optimal parameters. Finally, the dynamic library of radar signal clusters is updated using the SE index.

4.1 Adjust Parameters Using SE Index

In the SVC sorting algorithm, as q increases the cluster boundaries become increasingly rough and the number of clusters grows; the same holds for CMSVC. The algorithm therefore starts from a small value of q, which is increased heuristically to find its optimal value. Initially the penalty factor is set to C = 1, so that no outliers are allowed and no BSVs need to be handled. If, as q increases, the number of SVs becomes excessive or many singleton clusters form, the data should be allowed to contain BSVs: C is decreased to smooth the cluster boundaries and prevent outliers from degrading the clustering accuracy [1]. In this way the correct radar signal sorting result is obtained. The parameter adjustment algorithm with the SE index is typically implemented by the following steps:

  1. Step 1.

    Initialize q as \(q = 1/\max_{i,j} \|\boldsymbol{g}_i - \boldsymbol{g}_j\|^2\).

  2. Step 2.

    Run the CMSVC algorithm with the current values of q and C to obtain a temporary clustering result.

  3. Step 3.

    Evaluate the following two conditions according to the clustering result in Step 2:

    1. (1)

      whether the number of SVs is excessive;

    2. (2)

      whether a number of singleton clusters form.

    If either condition (1) or (2) holds, heuristically decrease C and go to Step 2; otherwise go to Step 4.

  4. Step 4.

    Compute the SE validity measure according to the clustering result obtained in Step 2.

  5. Step 5.

    If the maximum SE has been obtained, go to Step 6. Otherwise heuristically increase q and go to Step 2.

  6. Step 6.

    Take the values of q and C that give the maximum SE, run the clustering algorithm again with these two parameters, then stop and output the best clustering result.
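The loop in Steps 1–6 can be sketched as below. CMSVC and the SE index are passed in as callables (`cluster` and `score`, hypothetical interfaces), and the stopping rule of Step 5 is approximated by scanning a fixed, geometrically growing grid of q values and keeping the largest score. The thresholds for "excessive" SVs and for singleton clusters, as well as the growth and shrink factors, are assumptions chosen for illustration, and labels are assumed to be integers 0, …, c − 1. The toy usage replaces CMSVC with a hypothetical stand-in and SE with a placeholder score.

```python
import numpy as np


def initial_q(X):
    """Step 1: q = 1 / max_{i,j} ||g_i - g_j||^2."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return 1.0 / d2.max()


def adjust_parameters(X, cluster, score, q_growth=1.5, c_shrink=0.5,
                      n_q=20, max_sv_frac=0.5, max_singletons=0):
    """Heuristic search for q and C guided by a validity score (sketch).

    cluster(X, q, C) -> (labels, n_sv) stands in for CMSVC and
    score(X, labels) -> float stands in for the SE index.
    """
    n = len(X)
    q, C = initial_q(X), 1.0                       # Step 1; C = 1 admits no BSVs
    best_se, best_q, best_C = -np.inf, None, None
    for _ in range(n_q):                           # assumed fixed grid of q values
        while True:                                # Steps 2-3
            labels, n_sv = cluster(X, q, C)
            too_many_sv = n_sv > max_sv_frac * n
            singletons = int((np.bincount(labels) == 1).sum()) > max_singletons
            if not (too_many_sv or singletons) or C <= 1.0 / n:
                break
            C = max(C * c_shrink, 1.0 / n)         # decrease C to admit BSVs
        c = len(np.unique(labels))
        if 2 <= c <= n - 1:                        # SE is compared for 2 <= c <= N-1
            se = score(X, labels)                  # Step 4
            if se > best_se:
                best_se, best_q, best_C = se, q, C
        q *= q_growth                              # Step 5: increase q
    if best_q is None:
        raise ValueError("no partition with 2 <= c <= N-1 was found")
    labels, _ = cluster(X, best_q, best_C)         # Step 6: final run
    return best_q, best_C, labels, best_se


def toy_cluster(X, q, C):
    """Hypothetical stand-in for CMSVC: more clusters as q grows (ignores C)."""
    if q < 0.05:
        return np.zeros(len(X), dtype=int), 2      # one cluster
    if q < 1.0:
        return np.array([0, 0, 1, 1]), 2           # two clusters
    return np.arange(len(X)), 2                    # all singletons


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
q_best, C_best, labels, se = adjust_parameters(
    X, toy_cluster, score=lambda X, lab: float(len(np.unique(lab))))
```

Keeping C fixed once it has been lowered mirrors Step 3, which only ever decreases C; a real implementation would plug in CMSVC and the SE index for the two callables.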

4.2 Update the Dynamic Library of Radar Signal Clusters

In real-time radar signal sorting, newly obtained clusters must be compared with the clusters already in the dynamic library to decide whether a new cluster and an existing cluster belong to the same class. We solve this problem as follows. Let the set of new clusters obtained as in Sect. 4.1 be \(U' = \{C_l, l = 1, 2, \ldots, c'\}\), and let the set of existing clusters in the dynamic library be \(U = \{C_k, k = 1, 2, \ldots, c\}\). The dynamic library of radar signal clusters is then updated as follows:

  1. Step 1.

    Initialize l=1.

  2. Step 2.

    Using (8), compute the similitude entropies between the new cluster \(C_l\) and each cluster in U, denoted \(\{H_{lk}, k = 1, 2, \ldots, c\}\).

  3. Step 3.

    Sort the \(H_{lk}\) in ascending order, \(H_{lt_{1}} < H_{lt_{2}} < \cdots < H_{lt_{p}} < \cdots < H_{lt_{c}}\), where \(\{t_1, t_2, \ldots, t_p, \ldots, t_c\}\) is the corresponding permutation of k.

  4. Step 4.

    Initialize p=1.

  5. Step 5.

    Compute \(\mathrm{SE}_{\mathrm{old}}\) of the cluster set U. Then merge the new cluster \(C_l\) into the existing cluster \(C_{t_p}\) to form \(C'_{t_p}\), and compute \(\mathrm{SE}_{\mathrm{new}}\) of U with \(C_{t_p}\) replaced by \(C'_{t_p}\). If \(\mathrm{SE}_{\mathrm{new}} \ge \mathrm{SE}_{\mathrm{old}}\), confirm the merge and go to Step 6; otherwise retract it. After a retraction, if p < c, set p = p + 1 and repeat Step 5; otherwise add \(C_l\) to U as a new cluster and set c = c + 1.

  6. Step 6.

    If all new clusters have been processed, stop; otherwise set l = l + 1 and go to Step 2.
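The update procedure can be sketched as follows. Clusters are represented as NumPy arrays of sample vectors, cluster centers are assumed to be means, and a library with fewer than two clusters simply receives the new cluster, because SE is undefined for c < 2 and the text does not cover this case. The helper names are hypothetical, and the data are toy vectors: `A_new` lies along the center direction of the existing cluster `A`, while `C_new` points in a direction no existing cluster covers.

```python
import numpy as np


def _h(m, G):
    """Eq. (8): similitude entropy of the samples G against the center m."""
    S = (G @ m) / (np.linalg.norm(G, axis=1) * np.linalg.norm(m))
    S = np.clip(S, 1e-12, 1.0)
    return float(-np.mean(S * np.log(S)))


def _se(clusters):
    """Eqs. (6)-(7) for a list of clusters (needs at least two)."""
    c = len(clusters)
    centers = [G.mean(axis=0) for G in clusters]   # assumption: center = mean
    H = np.array([[_h(centers[l], clusters[k]) for k in range(c)]
                  for l in range(c)])
    H_sep = sum(min(H[l, k] for l in range(c) if l != k) for k in range(c))
    H_comp = np.trace(H) / c
    return H_sep / H_comp


def update_library(library, new_clusters):
    """Steps 1-6 of Sect. 4.2: merge each new cluster or add it to U."""
    U = [np.asarray(G, dtype=float) for G in library]
    for C_l in new_clusters:                       # Steps 1 and 6: loop over l
        C_l = np.asarray(C_l, dtype=float)
        if len(U) < 2:                             # assumption: SE undefined, just add
            U.append(C_l)
            continue
        m_l = C_l.mean(axis=0)
        order = np.argsort([_h(m_l, G_k) for G_k in U])   # Steps 2-3
        se_old = _se(U)
        for t in order:                            # Steps 4-5: p = 1, ..., c
            trial = list(U)
            trial[t] = np.vstack([U[t], C_l])      # tentative merge into C_{t_p}
            if _se(trial) >= se_old:               # confirm the merge
                U = trial
                break
        else:                                      # every merge was retracted
            U.append(C_l)
    return U


# Toy usage with hypothetical positive 3-D sample vectors.
A = np.array([[1, 10, 1], [1.2, 9.8, 1.1], [0.9, 10.2, 0.9]])
B = A[:, [1, 0, 2]]
A_new = np.outer([0.5, 2.0, 3.0], A.mean(axis=0))
C_new = np.array([[1, 1, 10], [1.1, 0.9, 10.2], [0.9, 1.2, 9.8]])
U = update_library([A, B], [A_new, C_new])
```

Here `A_new` is merged into `A` because the merge lowers the within-cluster entropy without hurting separation, whereas merging `C_new` anywhere would sharply reduce SE, so it is added as a third cluster.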

5 Simulation Experiment Result

To verify the efficacy of the proposed SE index and the low time complexity of the CCL algorithm, we simulated a series of radar pulse signals. The experimental data comprise 10116 radar pulse signals; the simulated data after preprocessing are listed in Table 1. We compare the proposed SE validity measure with the DB [4], Dunn [5] and PS [3] indices. For this comparison we use three datasets, data1, data2 and data3, consisting of sample vectors 1–200, 201–400 and 401–600 of the simulated pulse data; the results are given in Table 2. Further experiments use the data in Table 3. Table 4 compares the accuracy and runtime of the CG, SVG, PG, GD and CCL cluster labeling methods under the guidance of the SE index. In Table 4 the accuracy is \((n - n_1 - n_2)/n\), where n is the total number of sample vectors, \(n_1\) is the number of vectors missed during clustering and \(n_2\) is the number of vectors clustered incorrectly. The worst-case asymptotic time complexity in Table 4 refers to sorting signals without BSVs.
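The accuracy measure used in Table 4 is simple arithmetic; a small helper (hypothetical name, with illustrative counts rather than values from the tables) makes the definition explicit:

```python
def sorting_accuracy(n, n_missing, n_wrong):
    """Accuracy (n - n1 - n2) / n: n1 missed vectors, n2 misclustered vectors."""
    if n <= 0 or n_missing < 0 or n_wrong < 0 or n_missing + n_wrong > n:
        raise ValueError("counts must satisfy 0 <= n1 + n2 <= n and n > 0")
    return (n - n_missing - n_wrong) / n


print(sorting_accuracy(200, 6, 4))  # 0.95
```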

Table 1 The radar pulse signal parameters information
Table 2 The optimal number of clusters and value of q comparison using several validity indices
Table 3 The radar pulse signal parameters information from Q. Guo
Table 4 Accuracy and runtime comparison using several cluster labeling methods under SE

It is shown in [4] and [3] that clustering schemes minimizing the DB and PS indices, respectively, yield crisp and compact clusters. For the Dunn index [5], the optimal clustering scheme is indicated by its maximum value.

However, the above comparison shows that none of these three indices reliably determines the optimal value of q or the correct number of clusters in data1, data2 and data3, which means that a dense radar pulse stream cannot be sorted efficiently with them. For instance, the Dunn index selects a three-cluster partition of data1 as the best, whereas data1 actually contains two clusters. For data2, which contains four clusters, the DB and Dunn indices select three and … clusters, respectively. For data3, the DB and PS indices select two and five clusters, respectively, whereas the correct number is four.

In contrast to the DB, Dunn and PS indices, SE finds the correct number of clusters for all three datasets, which indicates that the SE validity index can efficiently identify the best sorting result for radar signals in complicated electromagnetic environments.

To further verify that the strategy improves efficiency without sacrificing sorting accuracy, we also use Q. Guo’s pulse data, which comprise 5687 radar pulse signals and are listed in Table 3. These signals are sorted with the multiple-parameter radar signal sorting method, with the parameters adjusted using the SE index.

Hereafter, the first data group (10116 radar pulse signals) is called “G 1st” and the second group (5687 radar pulse signals) “G 2nd”. Statistics of the sorting results, including accuracy and runtime compared with the other cluster labeling methods, are given in Table 4.

Table 4 shows that sorting with the CCL cluster labeling method greatly improves computational efficiency compared with the CG, SVG, PG and GD methods. At the same time, CCL preserves the sorting accuracy of the CG cluster labeling method used in [7] while being much faster. Moreover, the runtime advantage of CCL over the other methods grows as the dataset becomes larger. In summary, the proposed method retains the advantages of SVC-based signal sorting and is better suited to radar signal processing systems.

6 Conclusions

To reduce the processing time of radar signal sorting, improve sorting accuracy and meet the real-time and accuracy requirements of electronic warfare, we introduce the CMSVC algorithm into radar signal sorting in complicated electromagnetic environments. We also use information entropy theory to assess clustering validity and to update the library of radar signal clusters dynamically. Performance simulations were conducted for typical methods and for the proposed method. The experimental results show that the proposed method reduces computational complexity, and hence sorting time, while maintaining sorting accuracy.