
1 Introduction and Background

What an irony of fate when Robert Watson-Watt was pulled over in a RADAR (RAdio Detection And Ranging) speed trap during a visit to Canada in the late 1950s. He joked that had he known radar would be used for speed traps, he would never have invented it. Nowadays, this is what most people associate radar with, but when Watson-Watt developed his primitive radar system in the mid 1930s, it was secretly developed for military purposes. Later, in 1940, it played a vital role in the Battle of Britain, providing early warning of incoming Luftwaffe bombers. During World War II, US scientists made Watson-Watt's radar much smaller, more efficient and more reliable. This made it possible for a compact radar unit to be used for warning fighter pilots of enemy aircraft approaching from behind. Four such units were also carried on each of the nuclear bombs dropped over Hiroshima and Nagasaki to monitor the bomb's distance to the ground, so that detonation could be triggered at a pre-set altitude for maximum destruction. Vigorous development of radar technology after the war led to a wide range of military applications for detecting, locating, tracking, and identifying objects, and for surveillance, navigation and weapon guidance in terrestrial, maritime, and airborne systems at small to medium and large distances (from ballistic missile defence systems to fist-sized tactical missile seekers) [1].

Later, civilian applications emerged and became widespread. This began with air traffic control systems, guiding commercial aircraft in the vicinity of airports and en route, and with marine navigation, where ships use radar in collision avoidance systems. Nowadays, radars are beginning to serve a similar role in the automobile and trucking industries, in self-braking, crash avoidance and parking assist systems [2, 3].

Police traffic radars are used for enforcing speed limits; airborne radars are used not only for weather forecasting, large-scale weather monitoring, prediction and atmospheric research, but also for environmental monitoring of forestry conditions and land usage, water and ice conditions, pollution control, etc.; space-borne radars (on both satellites and the space shuttle) serve for space surveillance and planetary observation; in sport, radars are used for measuring the speed of tennis serves and baseball pitches [1].

A basic block diagram of a radar system is shown in Fig. 1. Radars are considered "active" sensors, as they use their own source of illumination (a transmitter) for locating targets. They transmit energy towards a target and then capture the reflected signal to identify the target. The problem is that (especially for long-range radars) a powerful transmitter and a very sensitive receiver are needed, because the energy spreads out on its way to the target, scatters on reflection and spreads out further on its way back (in general, the received signal power decreases with the fourth power of the target distance). A radar's range, resolution and sensitivity are largely determined by its transmitter and waveform generator. Although typical radar systems operate in the microwave region of the electromagnetic spectrum, with frequencies ranging from about 200 MHz to about 95 GHz (corresponding wavelengths of about 1.5 m to 3.16 mm), there are also radars that function at frequencies as low as 2 MHz and as high as 300 GHz [4].

Fig. 1
figure 1

A block diagram of a basic radar system. Radars operate by transmitting electromagnetic energy toward targets and processing the observed echoes

The application of the Doppler effect revolutionized cosmology, enabling Doppler spectroscopy to become a powerful tool for finding extrasolar planets and for demonstrating the expansion of the universe (the light spectrum of stars or galaxies receding from us is redshifted, i.e. shifted to longer wavelengths and lower frequencies, and blueshifted, i.e. shifted to shorter wavelengths and higher frequencies, if they are moving towards us), but it also dramatically expanded the use of radiolocation radars. For Doppler radars, the electromagnetic wave reflected from an approaching target exhibits a higher frequency than the transmitted one and, vice versa, a receding target returns a wave of lower frequency. The difference between the transmitted and received frequencies can then be used to estimate the target speed. The problem is that this difference is very small: the two-way Doppler shift is about 2v/c, so an incoming target at 100 km/h changes the received frequency by a relative factor of less than 1e-6, which requires very precise circuits to measure.
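As a rough numeric illustration of how small this shift is (a minimal sketch; the 10 GHz carrier is an assumed value, not taken from this chapter):

```python
# Two-way Doppler shift seen by a monostatic radar (illustrative values only).
C = 3.0e8               # speed of light, m/s
f_tx = 10.0e9           # assumed carrier frequency: 10 GHz (X band)
v = 100 / 3.6           # radial speed of the target: 100 km/h in m/s

relative_shift = 2 * v / C            # about 1.9e-7, i.e. below 1e-6
doppler_hz = relative_shift * f_tx    # about 1.9 kHz at a 10 GHz carrier

print(f"relative shift: {relative_shift:.2e}")
print(f"Doppler shift:  {doppler_hz:.0f} Hz")
```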

A Doppler weather radar with a parabolic antenna housed within a large tiled dome is shown in Fig. 2 [5]. A system with such a radar can measure the distance and radial speed of falling rain drops, hail particles, or snowflakes, allowing forecasters to predict a storm's evolving location. Similar radar systems use the presence of debris in the air to detect tornadoes and determine their location, velocity and direction, allowing their movement to be projected in real time.

Fig. 2
figure 2

A Doppler weather radar (Photo Brownie Harris/Corbis)

Classical radar imaging uses the antenna to focus a radio-frequency beam on a target and captures its reflection to create the image. To work over long ranges it requires powerful transmitters and sensitive receivers, because of the way the transmitted energy spreads out on its way to the target and then scatters on reflection. Also, achieving higher image resolution requires narrower beams, which means that an airborne or space-borne platform would need a much larger antenna than it could carry. The synthetic aperture technique solves this problem by enabling the use of a smaller antenna: a virtual antenna is simulated, with an aperture defined by the distance travelled by the physical antenna.
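To make the resolution argument concrete, the sketch below contrasts the standard first-order approximations for real-aperture cross-range resolution (R·λ/D) and stripmap SAR azimuth resolution (about D/2); all numerical values are assumptions chosen for illustration, not figures from the chapter.

```python
# Cross-range (azimuth) resolution: real aperture vs. stripmap SAR (first-order formulas).
wavelength = 0.03    # assumed radar wavelength: 3 cm (X band), m
antenna_len = 2.0    # assumed physical antenna length D, m
slant_range = 100e3  # assumed slant range R: 100 km, m

real_aperture_res = slant_range * wavelength / antenna_len  # ~1500 m at 100 km
sar_res = antenna_len / 2                                   # ~1 m, independent of range

print(f"real-aperture cross-range resolution: {real_aperture_res:.0f} m")
print(f"stripmap SAR azimuth resolution:      {sar_res:.1f} m")
```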

The use of the Doppler effect further enhanced the angular resolution of synthetic-aperture radars (SAR) [6], enabling them to acquire surprisingly clear and crisp images [7]. SAR has long been used on planes and satellites (Fig. 3) for military reconnaissance, mapping ground terrain with intelligence imagery and revealing enemy facilities to enhance situational awareness, all in any type of weather, in total darkness and through cloud cover and foliage [8, 69]. It has also proved very useful in a diverse range of civil applications, e.g., in earthquake damage assessment [9], ice [10] and snow monitoring [11], oceanography, imagery of polar ice caps and coastal regions, oil pollution monitoring, solid earth science, hydrology, ecology and planetary science [12, 13].

Fig. 3
figure 3

JAXA’s ALOS-2 Earth-observation radar sat may help the Japanese navy keep track of ship movements in the region. Photo JAXA Concept

Another type of radar, developed especially to look underground and through walls, is ground-penetrating radar (GPR), also known as surface-penetrating radar (SPR) [14]. GPR has recently proved to be an efficient non-invasive technology with applications in archaeology [15, 16], in mining, both for identifying underground rock strata and for monitoring instabilities [14, 17], and in optimal irrigation and pollution monitoring [18, 19]. It has also been used to help police, emergency response teams and firefighters 'see' through building walls to locate hostages, or to find people trapped by fire or under the rubble of a collapsed building [20]. Its ability to detect metallic and non-metallic objects below the surface makes it a useful mapping tool for the detection and localisation of underground cables and pipes [21], and of buried objects of historical and archaeological importance [22].

The IEEE standard letter nomenclature for the common nominal radar bands is given in Table 1 [23]. The millimetre wave band is sometimes further decomposed into approximate sub-bands of 36–46 GHz (Q band), 46–56 GHz (V band), and 56–100 GHz (W band). The lower frequency bands are usually preferred for longer-range surveillance applications due to the low atmospheric attenuation and high available power; conversely, the higher frequencies tend to be used for shorter-range, higher-resolution applications, due to the smaller achievable antenna beamwidths for a given antenna size, higher attenuation, and lower available power [1]. Radars from the first category (considered a form of radiolocation) are capable of covering distances of up to hundreds of kilometres (using high-power transmitters concentrated in a relatively narrow radio bandwidth), while the second group covers radar systems that operate at low power levels over much smaller distances.

Table 1 Letter nomenclature for nominal radar frequency bands (IEEE, 2003)

Based on their characteristics, features and application areas, radars can be classified in terms of the following criteria [24]:

  • purpose and function: surveillance, tracking, guidance, reconnaissance, imaging, data link;

  • frequency band: radar systems have been operating at frequencies as low as 2 MHz and as high as 300 GHz (see Table 1). Criteria for frequency selection for surveillance radar can be found in [4, 25];

  • waveform: continuous wave, pulsed wave, digital synthesis;

  • beam scanning: fixed beam, mechanical scan (rotating, oscillating), mechanical scan in azimuth, electronic scan (phase control, frequency control and mixed in azimuth/elevation), mixed (electronic-mechanical) scan, multi-beam configuration;

  • location: terrestrial (stable, mobile), marine-borne, air-borne, space-borne;

  • spectrum of collected data: range (delay time of echo), azimuth (antenna beam pointing, amplitude of echoes), elevation (3D radar, multifunctional, tracking), height (derived from range and elevation), intensity (echo power), radar cross section (RCS) (derived from echo intensity and range), radial speed (measurement of the differential phase along the time on target due to the Doppler effect—it requires a coherent radar), polarimetry (phase and amplitude of the echo in the polarisation channels: horizontal transmit/horizontal receive—HH, horizontal transmit/vertical receive—HV, VH, VV), RCS profiles along range and azimuth (high resolution along range, imaging radar);

  • configuration: monostatic (same antenna with co-located transmitter and receiver), bi-static (two antennas), multistatic (one or more spatially dispersed transmitters and receivers). Further detail on variety of radar configurations can be found in [26];

  • signal processing: coherent (Moving Target Detector/Pulse-Doppler/Super-resolution Signal Processor/Synthetic Aperture Radars (SAR)), non-coherent (integration of envelope signals, moving window, adaptive threshold (Constant False Alarm Rate (CFAR)) and mixed [6];

  • transmitter and receiver technologies: antenna—reflector plus feed, array (planar, conformal), corporate feed; transmitter—magnetron, klystron, wideband amplifiers (high-power travelling wave tubes (TWT)), solid state; and receiver—analogue and digital technologies, base band, intermediate frequency sampling, low-power TWT;

  • area of application: large-scale weather forecast and monitoring, air traffic control and guidance (terminal area, en route, collision avoidance, airport apron); police traffic radar used for enforcing speed limits; air defence; anti-theatre ballistic missile defence; vessel traffic surveillance; remote sensing (application to crop evaluation, geodesy, astronomy, defence); environmental monitoring of forestry conditions and land usage; pollution control; geology and archaeology (ground penetrating radar); meteorology (hydrology, rain/hail measurement); study of atmosphere (detection of micro-burst and gust, wind profilers); space-borne altimetry for measurement of sea surface height; acquisition and tracking of satellites; monitoring of space debris; marine—navigation and ship collision avoidance; others [5, 12–15].

Radar detection, classification and tracking of targets against a background of clutter and interference is considered "the general radar problem". For military purposes, the general radar problem includes the search for, interception, localisation, analysis and identification of radiated electromagnetic energy, commonly known as radar Electronic Support Measures (ESM). ESM is considered a reliable source of valuable information for threat detection, threat avoidance and, in general, situational awareness and the timely deployment of counter-measures [27, 28]. A list of ESM abbreviations is given in Table 2.

Table 2 Commonly adopted ESM abbreviations

Real-time identification of the radar emitter associated with each intercepted pulse train is a very important function of radar ESM. Typical approaches involve sorting incoming radar pulses into individual pulse trains [29] and then comparing their characteristics with a library of parametric descriptions, in order to obtain a list of likely radar types. This can be a very difficult task, as there may be radar modes for which there is no record in the ESM library; overlaps between the parameters of different radar types; increases in environment density (e.g., Doppler spectrum radars, transmitting hundreds of thousands of pulses per second); agility of radar features such as radio frequency, scan and pulse repetition interval; multiplication and dispersion of the modes of military radars; and noise and propagation distortion that lead to incomplete or erroneous signals [30].

1.1 Neural Networks in Radar Recognition Systems

There is a wide variety of approaches and methods for radar emitter recognition and identification. For example, the authors of [31] investigate a specific emitter identification technique applied to ESM data, analysing the radar pulses to extract features that are unique for each radar and can later be used for identification. A wavelet transform is employed for the feature extraction phase of radar signal recognition in [32], as well as in [33], where it is used before probabilistic support vector machines (SVMs) are applied to the radar emitter recognition task. SVMs are also used in [8, 34] for solving a similar problem. In [35] the authors focus on the estimation of a common modulation from a group of intercepted radar pulses and use it as a basis for specific emitter identification. A variety of novel radar emitter recognition algorithms, incorporating clustering and competitive learning and investigating their advantages over traditional methods, are proposed in [32, 36–42, 70–73].

Among these approaches, a considerable part of the research in the area incorporates NNs, due to their parallel architecture, fault tolerance and ability to handle incomplete radar type descriptions and inconsistent and noisy data [43]. NN techniques have previously been applied to several aspects of radar ESM processing [28], including Pulse Descriptor Word (PDW) sorting [44, 45] and radar type recognition [46]. More recently, many new radar recognition systems include NNs as part of a clutter reduction system to improve the information managed by automatic identification systems, such as the detection, positioning and tracking of surrounding ships [47], or as a key classifier [48–52]. Examples of NN architectures and topologies used for radar identification, recognition and classification based on ESM data include the Multilayer Perceptron (MLP) [43], Radial Basis Function (RBF) neural networks as a signal detector [46, 53], a vector neural network [54], and a single parameter dynamic search neural network [50].

In many cases, the NNs are hybridised with other techniques, including fuzzy systems [55], clustering algorithms [29, 56], wavelet packets [32, 57], or Kalman filters [30]. When implementing their "What-and-Where fusion strategy", the authors of [30] use an initial clustering algorithm to separate pulses from different emitters according to position-specific parameters of the input pulse stream, and then apply fuzzy ARTMAP (based on the Adaptive Resonance Theory (ART) neural network) to classify streams of pulses according to radar type, using their functional parameters. They also run simulations with a data set that has missing input pattern components and missing training classes, and then incorporate a bank of Kalman filters to demonstrate the high-level performance of their system on incomplete, overlapping and complex radar data. In [48] higher order spectral analysis (HOSA) techniques are used to extract information from low probability of intercept (LPI) radar signals and produce 2D signatures, which are then fed to a NN classifier for detecting and identifying the LPI radar signal. The work presented in [49] investigates the potential of NNs (MLPs) used in Forward Scattering Radar (FSR) applications for target classification. The authors analyse collected radar signal data and extract features, which are then used to train a NN for target classification. They also apply a K-Nearest Neighbour classifier to compare the results of the two approaches and conclude that the NN solution is superior. In [58] an approach combining rough sets (for data reduction) with a NN classifier is proposed for the radar emitter recognition problem, while [59] combines wavelet packets and neural networks for target classification.

The common denominator of all the referenced approaches is that they predominantly use supervised NN learning. This means that there is an available (or online-collected) data set on which the NN can be trained and later used to determine the type of the radar emitters detected in the environment. During training, the NN is presented with labelled samples from the available dataset and the NN weights are adjusted in order to minimise the difference between the NN output and the available target (supervised learning). This difference is expressed by an error function that is minimised by adjusting the NN weights. One of the most popular training methods is backpropagation (BP), but, as it relies on deterministic gradient-based minimisation (including Newton and quasi-Newton variants), it can become trapped in a local minimum and thus converge to a suboptimal solution. Another drawback of the BP algorithm is that it can sometimes be slow and unstable. After training, the NN is tested for its ability to generalise, in other words, its ability to correctly classify samples that have not been shown during the learning process.

Among other considerations, the complexity of the training includes selecting the way the samples are shown to the network (i.e., how the training data set is organised and presented to the NN—'batch mode', 'on-line mode', etc.). Another important question is when to stop the training—achieving a zero error function does not always lead to optimal training. In practice, at some point of the learning process the NN starts to memorise rather than generalise—this is when the NN starts to overfit. In order to avoid overfitting, an additional data subset (called the validation subset) is used in parallel with the training set. Initially, the errors on both sets decrease, but at some point the validation error starts to rise while the training error continues to decrease. This point is an indication of overfitting and the training should be stopped, with the current weights assumed to be optimal. This training approach is known as split-sample training, where the available dataset is split into training, validation and testing subsets. There are also other training approaches, such as k-fold cross-validation or bootstrapping, each with its own specific advantages and drawbacks [43]. One advantage of k-fold cross-validation, for example, is that it can be applied when only a limited number of samples is available for training.
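As a minimal sketch of the k-fold idea mentioned above (using scikit-learn and synthetic stand-in data; the chapter itself uses MATLAB, so the library and parameter choices here are assumptions for illustration only):

```python
# k-fold cross-validation sketch: every sample serves for validation exactly once,
# which is helpful when only a limited number of samples is available.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))      # stand-in for 12 coded pulse-train features
y = rng.integers(0, 2, size=500)    # stand-in for a binary Civil/Military label

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_val_score(clf, X, y, cv=cv)
print("fold accuracies:", scores.round(3), "mean:", scores.mean().round(3))
```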

In addition, the available data set often needs to be pre-processed before training, e.g., the authors of [60] use feature vector fusion before feeding the NN classifier. Radar signal processing has specific features that differentiate it from most other signal processing fields. Many modern radars are coherent, meaning that the received signal, once demodulated to baseband, is complex-valued rather than real-valued, and, as can be seen from Table 2, much of the collected data is categorical. Another specific feature of radar data sets is that they usually contain many missing or incomplete values. Therefore, the representation and statistical pre-processing of the available dataset are very important steps that need to be considered before starting the actual training. This may also include transformation techniques, such as linear discriminant analysis and principal component analysis, in order to reduce the dimensionality of the problem and dispose of redundant information in the dataset.
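For the dimensionality-reduction step mentioned above, a minimal scikit-learn sketch could look as follows (the feature matrix and the retained-variance threshold are assumptions for illustration):

```python
# Standardise the features, then reduce their dimensionality with PCA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(1).normal(size=(1000, 22))   # stand-in for coded radar features

X_std = StandardScaler().fit_transform(X)   # PCA is scale-sensitive, so standardise first
pca = PCA(n_components=0.95)                # keep components explaining 95% of the variance
X_reduced = pca.fit_transform(X_std)

print("retained components:", pca.n_components_)
```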

1.2 Dealing with Missing Data

According to statistical analysis, the nature of missing data can be classified into three main groups [61–63]: missing completely at random (MCAR), where the probability that an observation is missing is unrelated to its own value or to the values of any other variables; missing at random (MAR), where the missingness does not depend on the value of the missing variable itself, but may depend on other observed variables included in the analysis (in other words, the cause of the missingness is accounted for); and missing not at random (MNAR), when the data are neither MCAR nor MAR (the missingness still depends on unobserved data). The problem associated with MNAR is that it yields biased parameter estimates, while analysis under MCAR and MAR yields unbiased ones (although the main consequence of analysing only complete cases under MCAR is a loss of statistical power) [63].

Dealing with missingness requires an analysis strategy that leads to the least biased estimates without losing statistical power. The problem is that these criteria are contradictory: using the information from partially observed samples (to preserve statistical power) requires substituting the missing values with estimates, which inevitably introduces bias.

The most popular approaches to dealing with missing data generally fall into three groups: deletion methods, single imputation methods, and model-based methods [62, 64, 65].

Deletion methods include pairwise and listwise deletion. Pairwise deletion (also called "unwise" deletion) keeps as many samples as possible for each analysis (and in this way uses all the information available for it), but results in analyses that are not directly comparable, as each is based on a different subset of the data, with different sample sizes and different standard errors. Listwise deletion (also known as complete case analysis) is a simple approach in which all cases with missing data are omitted. The advantages of this technique are that it keeps the analyses comparable and that it leads to unbiased parameter estimates (assuming the data is MCAR); its main disadvantage is a possibly substantial loss of statistical power (because not all of the information is used, especially if a large number of cases is excluded).

The single imputation methods include mean/mode substitution, the dummy variable method, and single regression. Mean/mode substitution is an old procedure, now largely rejected because of its intrinsic problems: it adds no new information (the overall mean stays the same), reduces the variability, and weakens the covariance and correlation estimates (it ignores the relationships between variables). The dummy variable technique uses all available information about the missing observations, but produces biased estimates. In the regression approach, linear regression is used to predict what the missing value should be (based on the other available variables), and the prediction is then used as if it were an observed value. The advantage of this technique is that it uses information from the observed data, but it overestimates the model fit and the correlation estimates, and underestimates the variance [62].
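The deletion and single-imputation strategies described above can be sketched in a few lines of pandas/scikit-learn (the column names and values below are invented purely for illustration):

```python
# Listwise deletion, mean substitution and regression-based single imputation.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

df = pd.DataFrame({"RF":  [9400.0, np.nan, 9410.0, 9395.0],
                   "PRI": [1000.0, 995.0, np.nan, 1002.0],
                   "PD":  [1.2, 1.1, 1.3, np.nan]})

complete_cases = df.dropna()                                        # listwise deletion
mean_imputed = SimpleImputer(strategy="mean").fit_transform(df)     # mean substitution
regr_imputed = IterativeImputer(random_state=0).fit_transform(df)   # regression on the other columns

print(complete_cases.shape, mean_imputed.shape, regr_imputed.shape)
```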

The most popular, "modern" model-based approaches fall into two categories: multiple imputation (MI) and maximum likelihood (ML) methods (the latter often referred to as full-information maximum likelihood) [63]. Their advantage is that they model the missingness and give confidence intervals for estimates, rather than relying on a single imputation. If the MAR assumption holds, both groups of methods result in unbiased estimates (i.e., they tend to "preserve" means, variances, covariances, correlations and linear regression coefficients) without loss of statistical power.

ML identifies the set of parameter values that produces the highest (log-)likelihood, i.e., the values most likely to have given rise to the observed data. It has the advantage that both complete and incomplete cases are used; in other words, it utilises all of the information and produces unbiased parameter estimates (with MCAR/MAR data). The MI approach involves three distinct steps: first, sets of plausible values for the missing observations are created and filled in separately to produce several 'completed' datasets; second, each of these datasets is analysed using standard procedures for complete data; and third, the results from the previous step are combined and pooled into one estimate for the inference. The aim of the MI process is not just to fill in the missing values with plausible estimates, but to do so multiple times while preserving important characteristics of the dataset as a whole. As with most multiple regression prediction models, the danger of overfitting the data is real and can lead to less generalisable results than would have been possible with the original data [66].
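The chapter performs MI in R; the sketch below only illustrates the three steps (impute several times, analyse each completed dataset, pool the results) using a scikit-learn imputer on synthetic data, with a trivial per-dataset analysis standing in for the real one:

```python
# Multiple imputation sketch: m completed datasets, m analyses, one pooled estimate.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.2] = np.nan        # knock out roughly 20% of the values

m = 5
estimates = []
for seed in range(m):
    # Step 1: draw one plausible completed dataset.
    completed = IterativeImputer(sample_posterior=True, random_state=seed).fit_transform(X)
    # Step 2: run the analysis of interest on the completed data (here: column means).
    estimates.append(completed.mean(axis=0))

# Step 3: pool the m per-dataset results into a single estimate.
pooled = np.mean(estimates, axis=0)
print("pooled column means:", pooled.round(3))
```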

The advantage of the MI technique is that it provides a more accurate estimate of variability by making multiple imputations for each missing value (it considers both variability due to sampling and variability due to imputation); its disadvantage is that it depends on a correctly specified model. It also traditionally required cumbersome coding, but this is no longer an issue thanks to easy-to-use off-the-shelf software packages. For the purpose of this investigation, the free, open-source R statistical software is used.

2 Data Analysis

For the purpose of this research, a data set composed of 29,094 intercepted generic data samples is used. Each of the captured signals is pre-classified by experts into one of 26 categories with regard to the platform that can carry the radar emitter (aircraft, ship, missile, etc.) and into one of 142 categories based on the functions it can perform (3D surveillance, weather tracking, air traffic control, etc.).

Each data entry represents a list of 12 recorded pulse train characteristics (signal frequencies, modulation type, pulse repetition intervals, etc. that will be considered as input parameters), a category label (specifying the radar function and being treated as system output) and a data entry identifier (for reference purposes only) (Table 3).

Table 3 Sample radar data subset

A more comprehensive summary of the data distribution is presented in Table 4, where an overview of the type, range and percentage of missing values for the recorded signal characteristics is given. The collected data consists of both numerical (integer and float) and categorical values, therefore coding of the categorical fields to numerical representations will be required during the data pre-processing stage. Also, due to the large number of missing values for some of the parameters, approaches for handling of missing data will be considered.

Table 4 Data description and percentage of missing values

3 Data Pre-processing

The pre-processing of the available data is of great importance for the subsequent machine learning stage and can significantly affect the overall success or failure of a given classification algorithm. In this context, the main objective of this stage is to analyse the available data for inconsistencies, outliers and irrelevant entries and to transform it into a form that facilitates the underlying mathematical apparatus of the machine learning algorithm and leads to an overall improvement of the classifier's performance.

3.1 Data Cleaning and Imputation

Data cleaning (also known as data cleansing or scrubbing) deals with detecting and removing errors and inconsistencies from data, in order to improve its quality [67]. The most important tasks carried out at this stage include identification of outliers (entries that are significantly different from the rest and could be the result of an error), resolution of data inconsistencies (values that do not conform to the specifications or contradict expert knowledge), handling of missing data (removing the affected samples, replacing missing values with the attribute mean, or using statistical algorithms to predict them) and removal of redundant data held in different representations.

At this stage of the pre-processing phase, two data sets are prepared. For the purposes of the first two case studies (presented later in this chapter), a data set containing only samples with complete values is extracted, i.e., data that could not be fully intercepted and recognised is removed by applying listwise deletion. The second data set (used for the final case study) is obtained by applying multiple imputation, performed as described below.

3.2 Dealing with Missing Data—Data Imputation

To estimate the values of the missing multivariate data, the sequential imputation algorithm presented in [68] is used. According to it, if the available data set is denoted by Y and its complete subset by Yc, the procedure starts from the complete subset and sequentially estimates the missing values of an incomplete observation x* by minimizing the covariance of the augmented data matrix Y* = [Yc, x*]. The completed sample x* is then added to the complete data subset and the algorithm continues with the next observation containing missing values.

Implementations in R of the original algorithm (available under the function name "impSeq") and two modifications of it (namely "impSeqRob" and "impNorm") are considered and tested. As the original algorithm uses the sample mean and covariance matrix, it is vulnerable to the presence of outliers; this can be alleviated by using robust estimators of location and scatter (as realised in the "impSeqRob" function). However, the outlyingness measure can only be computed for a complete dataset, so the sequential imputation of the missing data is performed first, after which the outlyingness measure is computed and used to decide whether an observation is an outlier. If the measure does not exceed a predefined threshold, the observation is included in the next stage of the algorithm. In our investigation, however, the modified "impSeqRob" and "impNorm" versions did not produce better results when tested on the complete dataset (which may simply be due to the lack of outliers), so the "impSeq" function was adopted.
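A much-simplified numpy sketch of the sequential idea is given below: each incomplete row is completed with the conditional (Gaussian) mean implied by the running mean and covariance of the already-complete rows, and is then appended to the complete set. This is an illustration under Gaussian assumptions, not the R "impSeq" implementation used in the chapter.

```python
# Simplified sequential imputation: complete one incomplete row at a time,
# then grow the complete subset with the newly completed sample.
import numpy as np

def sequential_impute(Y):
    Y = Y.copy()
    complete = ~np.isnan(Y).any(axis=1)
    Yc = Y[complete]                                   # initial complete subset
    for i in np.where(~complete)[0]:
        mu, cov = Yc.mean(axis=0), np.cov(Yc, rowvar=False)
        obs = ~np.isnan(Y[i])
        mis = ~obs
        # Conditional mean of the missing block given the observed block.
        Y[i, mis] = mu[mis] + cov[np.ix_(mis, obs)] @ np.linalg.solve(
            cov[np.ix_(obs, obs)], Y[i, obs] - mu[obs])
        Yc = np.vstack([Yc, Y[i]])                     # add the completed sample
    return Y

rng = np.random.default_rng(3)
data = rng.normal(size=(50, 4))
data[rng.random(data.shape) < 0.1] = np.nan            # sprinkle ~10% missing values
print("remaining NaNs:", np.isnan(sequential_impute(data)).sum())
```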

After employing MI on the data samples with missing continuous values, a second dataset of 15,656 observations is obtained, which is more than double the size of the first dataset. Table 5 shows the imputed values produced by the MI algorithm for the sample subset presented previously in Table 3.

Table 5 Sample radar data subset with imputed values for the missing data entries

3.3 Data Coding and Transformation

This stage of the pre-processing aims to transform the data into a form that is appropriate for feeding to the selected classifier and would facilitate faster and more accurate machine learning.

In particular, a transformation known as coding is applied to convert the categorical values presented in the data set into numerical ones. Three of the most broadly applied coding techniques are investigated and evaluated—continuous, binary and introduction of dummy variables.

For the first type of coding, each of the categorical values is substituted by a natural number, e.g., the 12 categories for the RFC input are encoded with 12 ordinal numbers, the 15 PRC categories—with 15 ordinal numbers, etc. A sample of data subset coded with continuous values is given in Table 6.

Table 6 Sample subset with imputed radar data and natural number coding for the ‘RFC’, ‘PRC’, ‘PDC’, and ‘ST’ signal characteristics

Binary coding, wherein each non-numerical value is substituted by ⌈log₂ N⌉ new binary variables (where N is the number of categories taken by that variable), each taking a value of either 0 or 1, is illustrated in Table 7 for 32 categories (i.e., 5 binary variables).

Table 7 Example of binary coding for 32-level categorical variable

Finally, the non-numerical attributes are coded using dummy variables. In particular, every N levels of a categorical variable are represented by introducing N dummy variables. An example of dummy coding for 32 categorical levels is shown in Table 8.

Table 8 Example of dummy coding for 32-level categorical variable

Taking into account the large number of categories presented for the categorical attributes in the input data set (Table 4), continuous and binary codings are considered for transforming the input variables. On the other hand, binary and dummy variable codings are chosen for representing the output parameters.
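The three codings can be contrasted in a short pandas sketch (the attribute name 'ST' and its four levels are invented for illustration; the real attributes have far more categories):

```python
# Continuous (ordinal), binary and dummy (one-hot) coding of a categorical attribute.
import numpy as np
import pandas as pd

levels = ["A", "B", "C", "D"]                      # hypothetical 4-level attribute
s = pd.Series(["B", "D", "A", "C"], name="ST")

ordinal = s.map({lvl: i for i, lvl in enumerate(levels)})         # continuous coding: 0..3
n_bits = int(np.ceil(np.log2(len(levels))))                       # binary coding: ceil(log2(N)) bits
binary = pd.DataFrame([[int(b) for b in format(v, f"0{n_bits}b")] for v in ordinal],
                      columns=[f"ST_b{i}" for i in range(n_bits)])
dummy = pd.get_dummies(s, prefix="ST").astype(int)                # dummy coding: N columns

print(ordinal.tolist())
print(binary)
print(dummy)
```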

Finally, in order to balance the impact of the different input parameters on the training algorithm, data scaling is used. Correspondingly, each of the conducted experiments in this chapter is evaluated using 3 forms of the input data set: the original data (with no scaling); normalised data (scaled attribute values within [0, 1] interval); and standardised data (i.e. scaling the attribute values to a zero mean and unit variance). A sample binary coded and standardised data subset is given in Table 9.

Table 9 Sample subset with imputed radar data and binary coding
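A minimal sketch of the two scalings (min-max normalisation to [0, 1] and z-score standardisation), using scikit-learn on synthetic stand-in data:

```python
# Min-max normalisation to [0, 1] and z-score standardisation of the input columns.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.random.default_rng(4).normal(loc=9400.0, scale=50.0, size=(100, 3))

X_norm = MinMaxScaler().fit_transform(X)    # each column rescaled into [0, 1]
X_std = StandardScaler().fit_transform(X)   # each column: zero mean, unit variance

print(X_norm.min().round(3), X_norm.max().round(3))
print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))
```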

3.4 System Training

The investigated neural network topologies include one hidden layer, with fully connected neurons in the adjacent layers and batch-mode training. For a given experiment with P learning samples, the error function is presented as:

$$E_{P} = \frac{1}{2}\sum_{p = 1}^{P} \sum_{i = 1}^{L} \left( x_{i}^{p} - t_{i}^{p} \right)^{2},$$
(1)

where for each sample p = 1, …, P and each neuron of the output layer i = 1, …, L, a pair $(x_{i}^{p}, t_{i}^{p})$ of NN output and target values, respectively, is defined.
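For reference, Eq. 1 translates directly into a couple of lines of numpy (the outputs and targets below are random placeholders):

```python
# Sum-of-squares error over P samples and L output neurons (Eq. 1).
import numpy as np

P, L = 8, 2
rng = np.random.default_rng(5)
x = rng.random((P, L))             # network outputs x_i^p
t = rng.integers(0, 2, (P, L))     # target values t_i^p

E_P = 0.5 * np.sum((x - t) ** 2)
print(E_P)
```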

4 Results and Discussion

A number of experiments are designed, implemented, executed and evaluated to test and validate the performance of the proposed intelligent system for identification and classification of radar signals. Two separate approaches are considered, and the related results are grouped and presented in the following two case studies. MATLAB® and its Statistics, Neural Networks and Global Optimisation toolboxes are used for coding and running all of the experiments.

4.1 Case Study 1—Listwise Deletion and Feedforward Neural Networks

For the purposes of the first case study, samples that contain incomplete data (i.e. data that was not fully intercepted or recorded) are removed from the considered data set, resulting in a subset of 7693 complete data samples of radar signal values.

Subsequently, depending on the experiment to be performed, the samples are sorted by experts into several groups of major interest according to their application: two classes ("Civil" and "Military") for the first two experiments, and 11 classes (4 from the "Civil" and 7 from the "Military" application areas) for the final one.

A randomly selected sample subset with no missing data (after listwise deletion) is presented in Table 10. Its first column (the ID attribute) is retained for referencing purposes only and is not used during the classifier's training.

Table 10 Sample radar data subset with no missing values, received after listwise deletion

Next, a coding transformation (as described in Sect. 3.3) is applied to convert the categorical values in the data set into numerical ones. Taking into account the large number of categories in the inputs (Table 4), continuous and binary codings are considered for transforming the input variables. On the other hand, binary and dummy variable representations are used for transforming the output parameters.

In order to balance the impact of the different input parameters on the training algorithm, data scaling is applied. Accordingly, each of the experiments conducted for the purposes of this case study is evaluated using three forms of the input data set: the data itself (with no scaling), after normalisation (i.e., scaling the attribute values to fall within a specific range, for example [0, 1]), and after standardisation (i.e., scaling the attribute values to zero mean and unit variance). A sample binary encoded and normalised data subset is given in Table 11.

Table 11 Sample radar data subset with no missing values (listwise deletion), after binary encoding and normalisation

The investigated NN topologies include one hidden layer, with fully connected neurons in the adjacent layers and batch-mode training. For a given experiment with P learning samples, the error function is given by Eq. 1. Supervised NN learning with the Levenberg-Marquardt algorithm and a tangent sigmoid transfer function is used. A split-sample technique is applied, with a randomly selected 70 % of the available data used for training, 15 % for validation and 15 % for testing, and the mean squared error (MSE) is adopted for evaluating the learning performance. Training is stopped after 500 epochs, when the gradient falls below 1.0e-06, or when 6 consecutive validation checks fail, whichever occurs first.
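A rough Python analogue of this setup is sketched below. The chapter uses MATLAB's Levenberg-Marquardt training, which scikit-learn does not provide, so the sketch substitutes the library's default solver and only mirrors the held-out test split, the tanh activation and the validation-based stopping; the data is synthetic and all parameter values are assumptions.

```python
# Approximate analogue of the split-sample training setup (illustrative, not the MATLAB original).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, mean_squared_error

rng = np.random.default_rng(6)
X = rng.normal(size=(2000, 22))        # stand-in for binary-coded inputs
y = rng.integers(0, 2, size=2000)      # stand-in for Civil/Military labels

# Hold out 15% for testing; early stopping carves a further validation set out of the
# remaining data, roughly mirroring the 70/15/15 split-sample scheme.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(22,), activation="tanh", max_iter=500,
                    early_stopping=True, validation_fraction=0.15 / 0.85,
                    n_iter_no_change=6, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("test accuracy:", accuracy_score(y_test, y_pred))
print("test MSE:", mean_squared_error(y_test, clf.predict_proba(X_test)[:, 1]))
```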

For the purposes of the first experiment, the categorical attributes of the input data are coded with consecutive integers. In this way a total of 12 input variables are obtained (Table 6). Two neural network topologies are examined—12-10-1 (12 neurons in the input, 10 neurons in the hidden and 1 neuron in the output layer) and 12-10-2, where the output parameter is coded as one binary neuron taking values 0 ("Civil") and 1 ("Military") for the first topology and as 2 binary neurons taking values 10 ("Civil") and 01 ("Military") for the second topology (Fig. 4). The performance of each of the topologies is investigated, evaluated and compared after training with the original, normalised and standardised data. The results are summarised in Table 12 and Fig. 5.

figure 4
Table 12 Classification performance (over the testing set) for continuous input coding and 12-10-N topologies with no data scaling, after normalisation and after standardisation

The second experiment investigates two additional NN topologies: 22-22-1 and 22-22-2, where the output parameter is again coded by one binary neuron (0 for “Civil” and 1 for “Military”) for the first topology and by two binary neurons for the second one (10 for “Civil” and 01 for “Military”). Again, the performance of each of the topologies is investigated, evaluated and compared using the original data, after normalisation and after standardisation. The results are summarised in Table 13.

Table 13 Classification performance (over the testing set) for binary input coding and 22-22-N topologies with no data scaling, after normalisation and after standardisation
Fig. 5
figure 5

Classification results for 12-10-2 NN classifier with normalised input data and a validation stop after 118 epochs. The values in green specify the correctly classified samples for each class (10—Civil, 01—Military)

Similarly to the first experiment, sample confusion matrices are presented in Fig. 6 for a 22-22-2 NN classifier trained with standardised input data. A very high accuracy of 84.3 % on the testing data set is achieved after 114 epochs, when the validation-check stopping criterion is activated (unsatisfactory performance on the validation data set in six successive iterations).

Fig. 6
figure 6

Classification results for 22-22-2 NN classifier with normalised input data and a validation stop after 114 epochs. The values in green specify the correctly classified samples for each class (10—“Civil”, 01—“Military”)

The final experiment in this case study investigates a broader output space of 11 classes (4 from the “Civil” and 7 from the “Military” domain) and evaluates a 22-22-11 NN classifier with unscaled, normalised and standardised training data using dummy variable coded outputs. Summary of the obtained results is presented in Table 14 and a sample confusion matrix for the investigated classifier with standardised input training data is given in Fig. 7, where a good recognition rate of 67.49 % can be observed.

Table 14 Classification performance (over the testing set) for binary input coding and 22-22-11 topology with no data scaling, after normalisation and after standardisation
Fig. 7
figure 7

Classification results for 22-22-11 NN classifier with standardised data on 7 military (M1 "Multi-function", M2 "Battlefield", M3 "Aircraft", M4 "Search", M5 "Air Defense", M6 "Weapon" and M7 "Information") and 4 civil classes (C1 "Maritime", C2 "Airborne Navigation", C3 "Meteorological" and C4 "Air Traffic Control")

Although a straightforward comparison with radar classification studies reported by other authors might be misleading, due to the different data sets, model parameters and training methods used, the achieved results appear to be strongly competitive with those reported in [30, 32, 48, 49, 60]. Furthermore, additional improvement can be expected if further statistical pre-processing techniques, missing data handling routines, NN topologies or training algorithm parameters are investigated (as shown in the next two case studies).

4.2 Case Study 2—Multiple Imputation and Feedforward Neural Networks

The second case study follows the same sequence of experiments and NN topologies as introduced in the first study; however, this time an extended dataset, obtained after multiple imputation of the missing data values (as described in Sect. 3), is used.

For the purposes of the first experiment in this study, the categorical attributes of the input data are coded with consecutive integers. Two NN topologies are examined—12-10-1 and 12-10-2, where the output parameter is coded as one binary neuron taking values 0 (“Civil”) and 1 (“Military”) for the first topology and 2 neurons, taking binary values 10 (“Civil”) and 01 (“Military”) for the second one.

The performance of each of the topologies is investigated, evaluated and compared after training with the original data (no pre-processing), and after normalisation and standardisation. The results are summarised in Table 15, showing up to 5 % accuracy improvement for the cases using imputation.

Table 15 Classification performance (over the testing set) for continuous input coding and 12-10-N topologies with no data scaling, after normalisation and after standardisation

Sample confusion matrices for a 12-10-2 NN classifier trained with normalised input data, with a validation stop activated after 106 epochs, are given in Fig. 8. They demonstrate improved accuracy rates (especially for the "Military" class) when compared to the case using listwise deletion to cope with the incomplete data samples (Fig. 5).

Fig. 8
figure 8

Classification results for imputed data case for 12-10-2 NN classifier with normalised input data and a validation stop after 106 epochs. The values in green specify the correctly classified samples for each class (10—“Civil”, 01—“Military”)

The second experiment in this study investigates two additional NN topologies—22-22-1 and 22-22-2, where the output is again coded by one binary neuron (0 for “Civil” and 1 for “Military”) for the first topology and by two binary neurons for the second one (10 for “Civil” and 01 for “Military”).

The NN performance for each of the topologies is investigated, evaluated and compared using the original, normalised and standardised data for both the cases—with and without imputed values. The performance results are summarised in Table 16, again showing improved NN performances for the cases with imputed data.

Table 16 Classification performance (over the testing set) for binary input coding and 22-22-N topologies with no data scaling, after normalisation and after standardisation

The final experiment investigates a broader output space of 11 classes (4 “Civil” and 7 “Military”) and evaluates 22-22-11 NN classifiers with the original, normalised and standardised training data, and with dummy variable coded outputs. Summary of the obtained results when training on data subsets with and without imputation is presented in Table 17.

Table 17 Classification performance (over the testing set) for binary input coding and 22-22-11 topology with no data scaling, after normalisation and after standardisation

Sample confusion matrices for the imputed 22-22-11 NN case, trained with standardised input data and a validation stop activated after 98 epochs are presented in Fig. 9. Although the results seem slightly inferior to the listwise deletion case (Fig. 7), they give higher statistical confidence because of the increased number of samples.

Fig. 9
figure 9

Classification results for imputed data and 22-22-11 NN classifier with standardised data on 7 military (M1 "Multi-function", M2 "Battlefield", M3 "Aircraft", M4 "Search", M5 "Air Defense", M6 "Weapon" and M7 "Information") and 4 civil classes (C1 "Maritime", C2 "Airborne Navigation", C3 "Meteorological" and C4 "Air Traffic Control")

It can also be seen from Fig. 9 that, although the accuracy of the NN classifier is roughly the same (compared to the NN trained after listwise deletion, Fig. 7), the number of hits is greatly increased and better distributed. This is especially evident for the 'M7' class, for which there were no hits in the case without imputation. The best accuracy is again achieved for the 'M4' and 'C1' classes, but the more important achievement resulting from the imputation is the more uniform distribution of correctly classified samples. As illustrated in Fig. 7, the class accuracy for the classification with no missing data varies widely, from 0 to 87.9 %, whereas in the case using imputed data (Fig. 9) it lies between 22.6 and 87.4 %. In other words, while the best accuracy remains almost the same, the minimum accuracy is improved by more than 22 %. This should be attributed to the greater number of available training and testing samples as a result of the imputation, which increases the statistical power of the dataset and subsequently improves the classification performance of the NN.

5 Conclusion

Reliable and real-time identification of radar signals is of crucial importance for timely threat detection, threat avoidance, general situation awareness and timely deployment of counter-measures. In this context, this chapter investigates the potential application of NN-based approaches for timely and trustworthy identification of radar types, associated with intercepted pulse trains.

A number of experiments are designed, implemented, executed and evaluated for testing and validating the performance of the proposed intelligent systems for solving the investigated classification tasks. The different experiments study a variety of NN topologies, data transformation techniques and missing data handling approaches.

The simulations are divided into two broad case studies, each of which comprises several sub-experiments. In the first one, all the signals are pre-classified by experts into between 2 and 11 classes, depending on the experiment, and listwise deletion is then used to clean the data of incomplete samples. As a result, very competitive classification accuracies of about 81, 84 and 67 % are achieved for the different recognition tasks.

In the second one, a multiple imputation model-based approach for dealing with the large number of missing values in the available radar signal data set is investigated. The experiments conducted for the purposes of the first case study are repeated, but this time using the imputed data set for training the classifiers. An improved accuracy of up to 87.3 % is achieved. The results are compared and critically analysed, showing overall improved accuracy when the NNs are trained on the larger subset with imputed values.

Although a straightforward comparison with radar classification studies reported by other authors might be misleading, due to the different data sets, model parameters, data transformations, training and optimisation methods used, the achieved results are strongly competitive with those reported in [30, 42, 48, 49, 52, 60].

Potential areas for further extension of this research include investigation of additional statistical transformation techniques, such as Principal Component Analysis (PCA), Non-Linear Principal Component Analysis (NLPCA), and Linear Discriminant Analysis, for decreasing the dimensionality of the problem and increasing the separability between the classes. In terms of classifiers, we presented supervised learning and classification, but unsupervised learning techniques (such as self-organising maps (SOM)) can also be considered, as well as varying other training parameters and exploring additional NN topologies. Finally, additional classes can be introduced, in order to achieve more specific classification of the intercepted radar data.