1 Introduction

The effectiveness of infrared and radar fusion detection has been demonstrated by numerous applications in early warning and guidance. The difficulty of fusion detection lies in the fact that the infrared detector and the radar are heterogeneous sensors, so joint feature vectors must be constructed through preprocessing steps. The infrared image provides image information over different spectral ranges (Wen et al. 2018a, b; Jia et al. 2017; Junwei et al. 2013; Wang et al. 2016), and various features of the target can be extracted from the radar echo (Eryildirim and Onaran 2011; Zhai and Jiang 2014). In air defense early warning systems, preprocessing problems such as data volume redundancy and feature redundancy in infrared and radar fusion target detection remain open issues.

Figure 1 is a flow chart of the infrared and radar fusion detection scheme, which treats target detection as a two-class problem of “target” versus “non-target”. Feature fusion comprises three steps: data set homogenization and normalization, feature reduction and data association, and feature matching (Luo et al. 2011). The first two steps constitute preprocessing; their purpose is to build and optimize the data set on which the classifier performs feature matching. Preprocessing is the bridge connecting classifiers to specific problems.

Fig. 1 Infrared and radar fusion detection

Farcomeni and Greco (2016) systematically discussed sample reduction. Zhang et al. (2016) proposed a sample reduction method similar to a filtering method; experimental results on face data sets show that sample reduction improves detection. Tang et al. (2017) presented a Laplacian-embedded sample reduction method to address the sample problem in cluster analysis. These studies primarily screen unsupervised data sets through statistical theory. To construct a feature data set from infrared images, the data set must contain interference, clutter, and background, and must abstract the target into a data vector; to the best of our knowledge, no existing research addresses this setting.

In terms of feature dimension reduction, Ghojogh et al. (2019) and Li et al. (2018) have given detailed summaries of the related research. However, infrared and millimeter-wave fusion data sets are highly interpretable compared with dictionary-like data sets, a property not considered in previous studies.

In infrared and radar fusion detection, data volume redundancy and feature redundancy are two problems encountered in practical applications, and both must be overcome in the preprocessing stage. To address the excessive data volume of the data set derived from infrared images, we propose a sample reduction method based on the Human Visual System (HVS). By eliminating background and clutter data that do not help classification, the data set is optimized; a nearest-neighbour data association is then performed to construct the joint feature vector. To address feature redundancy, Prior Weighted Mutual Information feature selection (PWMI) is performed using the physical meaning of each feature. Experiments show that the joint data set obtained after sample reduction and feature selection achieves a higher detection probability than the unfiltered data set. Across the varied application scenarios of infrared and radar fusion target detection, the average performance of PWMI is comparable to that of classical algorithms, and it is superior to other feature selection methods when the training set and detection set differ significantly. The proposed fusion detection scheme effectively solves these preprocessing problems and improves the robustness of fusion detection.

2 Sample reduction of infrared data sets based on HVS

A single infrared image contains approximately 480 × 640 pixels. If all pixels are converted into a data set \(\Omega\), the excessive amount of data consumes substantial computing resources, and the redundant information also degrades the detection result. To build a data set from many images, \(\Omega\) must be filtered to obtain a reduced subset. In infrared small target detection, the non-target part contains low-grey pixels, composed of clear sky and low-brightness clouds, and high-grey pixels, composed of highlighted clouds, buildings, trees, and fake targets. The target itself is a high-grey point, and false alarms derive mainly from background noise and high-grey non-target objects. Low-grey background noise has less influence on small target detection than the latter: it exacerbates the imbalance between target and background data but contributes little to classification. By excluding these low-grey noise points to construct a reduced data set \(\Omega^{R}\), the efficiency of the algorithm can be improved without reducing classification accuracy.

The human visual system (HVS) provides an efficient screening mechanism. The infrared image is transformed into the LoG (Laplacian of Gaussian) scale space, and the greyscale difference mechanism of the HVS is used to suppress the peripheral region of each pixel under inspection. At the same time, transformation at multiple scales exploits the multiscale mechanism of the HVS to extract the scale of the points of interest.

As Fig. 2 shows, the infrared image \(I\) is subjected to LoG scale-space transformation at scales \(\sigma_{1} , \ldots ,\sigma_{N}\), yielding the scale-space transformation maps \(I_{1} , \ldots ,I_{N}\), which are then normalized. For each point \(\left( {x,y} \right)\), the scale at which the normalized transform value \(I_{n} \left( {x,y} \right)\) attains its maximum \(I_{m} \left( {x,y} \right)\) is taken as the scale \(\sigma_{m} \left( {x,y} \right)\) of that data point. Taking the pixel-wise maximum over the scale-space images \(I_{1} , \ldots ,I_{N}\) yields the multiscale fused image \(I_{m}\):

Fig. 2 HVS-based infrared image sample reduction

$$I_{m} \left( {x,y} \right) = \mathop {\max }\limits_{n = 1, \ldots ,N} \left[ {I_{n} \left( {x,y} \right)} \right]$$
(1)
$$\sigma_{m} \left( {x,y} \right) = \sigma_{{n^{*} \left( {x,y} \right)}} ,\quad n^{*} \left( {x,y} \right) = \arg \mathop {\max }\limits_{n = 1, \ldots ,N} \left[ {I_{n} \left( {x,y} \right)} \right]$$
(2)
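A minimal Python sketch of Eqs. (1) and (2) may clarify the construction. It assumes SciPy's Laplacian-of-Gaussian filter; the particular scale values and the \(\sigma^{2}\)-normalization are illustrative choices not fixed by the paper:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_scale_space_fusion(image, sigmas=(1.0, 1.5, 2.0, 2.5, 3.0)):
    """Return the fused maximum map I_m and the per-pixel scale map sigma_m."""
    stack = []
    for sigma in sigmas:
        # Negative LoG so bright blobs give positive responses; sigma^2
        # normalization keeps responses comparable across scales (assumed).
        response = -(sigma ** 2) * gaussian_laplace(image.astype(np.float64), sigma)
        stack.append(response)
    stack = np.stack(stack, axis=0)             # shape (N, H, W)
    fused = stack.max(axis=0)                   # Eq. (1): I_m(x, y)
    scale_idx = stack.argmax(axis=0)            # Eq. (2): n*(x, y)
    scale_map = np.asarray(sigmas)[scale_idx]   # sigma_m(x, y)
    return fused, scale_map
```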

Fast non-maximum suppression is conducted through image meshing. \(I_{m}\) is divided by grids of side length \(a\) into image blocks \(G_{1,1} , G_{1,2} , \ldots , G_{{1,g_{n} }} , \ldots , G_{{g_{m} ,g_{n} }}\) of \(g_{m}\) rows and \(g_{n}\) columns. For each image block \(G_{i,j}\), its maximum value \(G\hbox{max} \left( {i,j} \right)\) and the corresponding coordinates \(\left( {x_{i,j} ,y_{i,j} } \right)\) are recorded; \(G\hbox{max}\) is thus a maximum map of size \(g_{m} \times g_{n}\). Every pixel of \(G\hbox{max}\) that is not larger than all values in its 8-neighbourhood is suppressed to 0. Traversing every grid \(G_{i,j}\) and combining the results yields the suppressed maximum map. As Fig. 2 shows, the suppressed \(G\hbox{max}\) has size \(g_{m} \times g_{n}\) and retains the local maxima of \(I_{m}\); its non-zero pixels constitute the reduced data.
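The grid-based suppression can be sketched as follows; the function and variable names are ours, and border pixels left over when the image size is not a multiple of \(a\) are simply truncated in this illustration:

```python
import numpy as np

def grid_nms(I_m, a=5):
    """Return the suppressed g_m x g_n maximum map Gmax and, for each grid,
    the coordinates (x_ij, y_ij) of its maximum in the original image."""
    h, w = I_m.shape
    gm, gn = h // a, w // a
    gmax = np.zeros((gm, gn))
    coords = np.zeros((gm, gn, 2), dtype=int)
    for i in range(gm):
        for j in range(gn):
            block = I_m[i * a:(i + 1) * a, j * a:(j + 1) * a]
            k = np.unravel_index(block.argmax(), block.shape)
            gmax[i, j] = block[k]
            coords[i, j] = (i * a + k[0], j * a + k[1])
    # Suppress any grid maximum that is not the largest value among its
    # 8 neighbours on the g_m x g_n maximum map.
    suppressed = gmax.copy()
    for i in range(gm):
        for j in range(gn):
            nb = gmax[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            if gmax[i, j] < nb.max():
                suppressed[i, j] = 0.0
    return suppressed, coords
```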

For each surviving pixel \(G\hbox{max} \left( {i,j} \right)\), the corresponding infrared data point has coordinates \(\left( {x_{i,j} ,y_{i,j} } \right)\), grey value \(f_{Grey} = I\left( {x_{i,j} ,y_{i,j} } \right)\), scale \(f_{Scale} = \sigma_{m} \left( {x_{i,j} ,y_{i,j} } \right)\), and gradient \(f_{Grad} = func_{g} \left( I \right)\left( {x_{i,j} ,y_{i,j} } \right)\), where \(func_{g} \left( \cdot \right)\) is the gradient function. The classical infrared feature vector is \(\left( {f_{Grey} ,f_{Scale} ,f_{Grad} } \right)\).
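A sketch of assembling \(\left( {f_{Grey} ,f_{Scale} ,f_{Grad} } \right)\) from the suppression output; the Sobel magnitude stands in for the unspecified gradient function \(func_{g}\), consistent with the Sobel gradient feature used in Sect. 4:

```python
import numpy as np
from scipy.ndimage import sobel

def infrared_features(I, scale_map, suppressed, coords):
    """Build (f_Grey, f_Scale, f_Grad) for every non-zero pixel of Gmax."""
    I = I.astype(np.float64)
    grad = np.hypot(sobel(I, axis=0), sobel(I, axis=1))  # Sobel magnitude
    features = []
    for i, j in zip(*np.nonzero(suppressed)):
        x, y = coords[i, j]
        features.append((I[x, y], scale_map[x, y], grad[x, y]))
    return np.array(features)
```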

It can be seen from the non-maximum suppression process that each surviving data point is the maximum among its 9 adjacent grids. Any such point, even one located at the edge of a grid, is the largest within at least a \(2a \times 2a\) neighbourhood. Figure 3 shows a point located at the edge of the grid in the second row, second column: even at the grid edge, the \(2a \times 2a\) area around the point still lies within the suppression range (the 9 adjacent grids).

Fig. 3 The suppression range of a maximum point at the grid edge

\(2a \times 2a\) is therefore the range that a single maximum point can represent. As defined by the Society of Photo-Optical Instrumentation Engineers, a small target has a total spatial extent of less than 80 pixels, occupying no more than 9 × 9 (Chen et al. 2013). To ensure that the suppression result for one target contains only one output, the \(2a \times 2a\) range should be no less than 9 × 9, i.e. \(a \ge 4.5\). Taking \(a = 5\), each target is extracted as a single point after suppression, while multiple targets are not merged into a single point.

Algorithm 1 shows the pseudocode for the sample reduction of an infrared image data set.

Algorithm 1 Sample reduction of an infrared image data set
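Chaining the sketches above gives a compact, hedged rendering of the whole pipeline (function names are ours, not the paper's):

```python
def sample_reduction(I, sigmas=(1.0, 1.5, 2.0, 2.5, 3.0), a=5):
    """HVS-based sample reduction: LoG scale-space fusion, grid NMS,
    then feature extraction; returns the reduced data set Omega^R."""
    fused, scale_map = log_scale_space_fusion(I, sigmas)
    suppressed, coords = grid_nms(fused, a)
    return infrared_features(I, scale_map, suppressed, coords)
```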

3 Prior weighted mutual information feature selection

Feature selection excludes features that do not contribute to classification and optimizes feature vectors that contain redundant or repeated information. After data association, the feature vectors consist of high-dimensional infrared features and radar features. Unlike dictionary-like features, these features have physical meanings and strong interpretability, and they differ in physical character according to their sources and extraction principles; moreover, there is strong independence between different features. Based on these characteristics, this manuscript proposes Prior Weighted Mutual Information feature selection, which exploits each feature's prior information.

First, a brief introduction to entropy and mutual information is given. Shannon's information theory defines the information entropy of a random variable \(Y\) as:

$$H(Y) = - \sum\limits_{y = 1}^{{N_{y} }} P(y)\log P(y)$$
(3)

Entropy measures the degree of uncertainty of a single variable \(Y\), and its value is determined by the distribution of \(Y\). To measure the correlation between two variables, the concept of mutual information is introduced. The mutual information of variables \(X\) and \(Y\) is

$$I(X;Y) = \sum\limits_{x,y} {P\left( {x,y} \right)\log \left( {\frac{{P\left( {x,y} \right)}}{{P\left( x \right)P\left( y \right)}}} \right)}$$
(4)

in which \(P\left( {x,y} \right)\) is the joint probability density function of variables \(X\) and \(Y\), and \(P\left( x \right)\) and \(P\left( y \right)\) are the marginal probability density functions. The mutual information is the relative entropy between the joint distribution \(P\left( {x,y} \right)\) and the product distribution \(P\left( x \right)P\left( y \right)\).
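For concreteness, Eqs. (3) and (4) can be estimated from data with simple histogram plug-in estimators; the bin count below is an arbitrary illustrative choice:

```python
import numpy as np

def entropy(y, bins=16):
    """Histogram estimate of Eq. (3)."""
    p, _ = np.histogram(y, bins=bins)
    p = p[p > 0] / p.sum()
    return -(p * np.log(p)).sum()

def mutual_information(x, y, bins=16):
    """Histogram estimate of Eq. (4) from the joint 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal P(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal P(y)
    nz = pxy > 0
    return (pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum()
```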

Feature selection based on mutual information is a filtering method. It simultaneously considers the extent to which a feature contributes to classification and the actual degree of correlation between features: the mutual information value characterizes the dependency between a single feature and the data class, as well as the dependency between two features. By combining and iterating over mutual information values, each feature is scored, which enables sorting and optimization of the features. Classical methods include mutual information maximization (MIM) (Lewis 1992), mutual information feature selection (MIFS), minimum redundancy maximum relevance (mRMR) (Peng et al. 2005), conditional infomax feature extraction (CIFE) (Lin and Tang 2006), and joint mutual information (JMI) (Meyer et al. 2008). Among them, mRMR is widely used and is the most representative classical mutual information feature selection method.

$$mRMR\left( {X_{k} } \right) = MI\left( {X_{k} ,Y} \right) - \frac{1}{\left| S \right|}\sum\limits_{{X_{j} \in S}} {\left[ {MI\left( {X_{k} ,X_{j} } \right)} \right]}$$
(5)

mRMR adopts a forward search: it gradually adds the highest-scoring feature to an initially empty set and continuously updates the scores of the remaining features. Equation (5) is the score of the kth feature, \(\left| S \right|\) is the number of features in the selected feature set \(S\), and \(\frac{1}{\left| S \right|}\) averages the redundancy terms over the selected features. Features whose scores meet a preset threshold are selected.
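A sketch of this forward search, reusing the `mutual_information` estimator above; `X` is an (n_samples, n_features) array, `y` the class labels, and all names are ours:

```python
import numpy as np

def mrmr(X, y, n_select):
    """Greedy forward search maximizing the Eq. (5) score at each step."""
    n_features = X.shape[1]
    selected, remaining = [], list(range(n_features))
    relevance = np.array([mutual_information(X[:, k], y)
                          for k in range(n_features)])
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for k in remaining:
            redundancy = np.mean([mutual_information(X[:, k], X[:, j])
                                  for j in selected]) if selected else 0.0
            score = relevance[k] - redundancy     # Eq. (5)
            if score > best_score:
                best, best_score = k, score
        selected.append(best)
        remaining.remove(best)
    return selected
```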

Mutual information captures, in a statistical sense, the contribution of features to classification and their degree of redundancy relative to other features. In contrast to the statistical emphasis of these methods, infrared and radar features have clear physical meaning and are highly interpretable, so their prior information can assist feature selection. Statistical knowledge acquired from the data set is limited by the data set itself and is therefore prone to overfitting. Radar features and infrared features are derived from heterogeneous sensors, so the information redundancy between them is minimal; the redundancy between infrared features based on different visual theories is higher, and the redundancy between features sharing the same principle is the highest.

Therefore, this paper proposes the Prior Weighted Mutual Information Feature Selection method:

$$PWMI\left( {X_{k} } \right) = MI\left( {X_{k} ,Y} \right) - \frac{1}{\left| S \right|}\sum\limits_{{X_{j} \in S}} {\beta \left( {k,j} \right)\left[ {MI\left( {X_{k} ,X_{j} } \right)} \right]}$$
(6)

where \(\beta \left( {k,j} \right)\) is the weight coefficient of the mutual information value of features \(X_{k}\) and \(X_{j}\).

To elaborate on the application of PWMI, Fig. 4 shows the weight distribution of a feature vector in feature selection. The features adopted in the experiment are divided into three categories: infrared classical features, infrared visual features, and radar features. There are 3 infrared classical features, greyscale, scale, and Sobel gradient, which form three groups according to their sources: scale, greyscale, and gradient. There are 9 visual features: LoG (Shao et al. 2012), ILOGb (Han et al. 2016), LoPSF (Moradi et al. 2018), EbLDM (Deng et al. 2017), LCM (Han et al. 2019), LGSM (Wei et al. 2016), MLHM (Nie et al. 2018), MPCM (Wei et al. 2016), and NWIE (Deng et al. 2016), which form 4 groups according to their sources: attention mechanism, contrast mechanism (difference), contrast mechanism (quotient), and edge mechanism. There are 4 radar features, acceleration, velocity, distance, and RCS, which form three groups: speed, distance, and scattering area. The weight coefficient \(\beta \left( {k,j} \right)\) is determined by the categories and groups of the two features.

Fig. 4 Prior weight distribution of pairwise feature mutual information

In this example, the weights of features based on the same principle are \(\beta_{inner}\). The weights of feature pairs belonging to the same category but different groups are \(\beta_{nor}\) (both infrared classical features), \(\beta_{vis}\) (both infrared visual features), and \(\beta_{mmw}\) (both radar features). The weights of pairs combining a radar feature with an infrared classical or infrared visual feature are \(\beta_{n\_m}\) and \(\beta_{v\_m}\), respectively, and the weight of pairs combining an infrared classical feature with an infrared visual feature is \(\beta_{n\_v}\). The distribution of \(\beta \left( {k,j} \right)\) is shown in Fig. 4.

In Eq. (6), \(MI\left( {X_{k} ,X_{j} } \right)\) is the mutual information value of features \(X_{k}\) and \(X_{j}\); a high value indicates a strong correlation in the statistical sense. Additionally, features of the same group characterize the same aspect of the HVS. For example, the infrared LoG feature and the LoPSF feature both reflect the same saliency mechanism of the HVS, so the information they extract has a certain overlap and the correlation between them is strong.

To minimize redundant information, such same-group pairs are given the highest weight coefficient, ranking them below features of other categories. Feature pairs belonging to the same category but different groups receive a smaller coefficient. Feature pairs from different categories, which are extracted from different information sources, have the least redundancy and are therefore given the smallest weighting factor.

The weight coefficient \(\beta \left( {k,j} \right)\) of PWMI and the ordering of its values are as follows:

$$\beta \left( {k,j} \right) = \left\{ {\begin{array}{*{20}l} {\beta_{n\_m} } \hfill & {X_{k} \;{\text{and}}\;X_{j} \;{\text{belong to infrared classical feature and radar feature}}} \hfill \\ {\beta_{v\_m} } \hfill & {X_{k} \;{\text{and}}\;X_{j} \;{\text{belong to infrared visual feature and radar feature}}} \hfill \\ {\beta_{n\_v} } \hfill & {X_{k} \;{\text{and}}\;X_{j} \;{\text{belong to infrared classical feature and infrared visual feature}}} \hfill \\ {\beta_{nor} } \hfill & {X_{k} \;{\text{and}}\;X_{j} \;{\text{are both infrared classical features but in different groups}}} \hfill \\ {\beta_{vis} } \hfill & {X_{k} \;{\text{and}}\;X_{j} \;{\text{are both infrared visual features but in different groups}}} \hfill \\ {\beta_{mmw} } \hfill & {X_{k} \;{\text{and}}\;X_{j} \;{\text{are both radar features but in different groups}}} \hfill \\ {\beta_{inner} } \hfill & {X_{k} \;{\text{and}}\;X_{j} \;{\text{belong to the same category and the same group}}} \hfill \\ \end{array} } \right.$$
(7)
$$1 \le \beta_{n\_m} = \beta_{v\_m} < \beta_{n\_v} < \beta_{nor} = \beta_{vis} = \beta_{mmw} < \beta_{inner}$$
(8)

The values of the weight coefficients \(\beta\) are determined experimentally as:

$$\begin{aligned} \beta_{n\_m} & = \beta_{v\_m} = 1 \\ \beta_{n\_v} & = 1.1 \\ \beta_{nor} & = \beta_{vis} = \beta_{mmw} = 1.3 \\ \beta_{inner} & = 1.5 \\ \end{aligned}$$
(9)
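PWMI then differs from the mRMR search above only in weighting each pairwise mutual-information term by \(\beta \left( {k,j} \right)\), applied inside the sum. The sketch below looks the weight up from illustrative category and group labels mirroring Fig. 4; the label encodings are ours:

```python
import numpy as np

BETA = {"cross_category": 1.0,   # beta_n_m = beta_v_m
        "classical_visual": 1.1, # beta_n_v
        "same_category": 1.3,    # beta_nor = beta_vis = beta_mmw
        "same_group": 1.5}       # beta_inner, per Eq. (9)

def beta(k, j, categories, groups):
    """Look up beta(k, j) from per-feature category/group labels."""
    if groups[k] == groups[j]:
        return BETA["same_group"]
    if categories[k] == categories[j]:
        return BETA["same_category"]
    if {categories[k], categories[j]} == {"ir_classical", "ir_visual"}:
        return BETA["classical_visual"]
    return BETA["cross_category"]           # infrared vs. radar

def pwmi(X, y, n_select, categories, groups):
    """Forward search maximizing the Eq. (6) score at each step."""
    n_features = X.shape[1]
    selected, remaining = [], list(range(n_features))
    relevance = np.array([mutual_information(X[:, k], y)
                          for k in range(n_features)])
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for k in remaining:
            redundancy = np.mean([beta(k, j, categories, groups) *
                                  mutual_information(X[:, k], X[:, j])
                                  for j in selected]) if selected else 0.0
            score = relevance[k] - redundancy   # Eq. (6)
            if score > best_score:
                best, best_score = k, score
        selected.append(best)
        remaining.remove(best)
    return selected
```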

4 Experiment and analysis

To verify the validity and feasibility of the proposed fusion detection method, training and test sets are established from measured data and simulated data collected at different times and for different scenes, and an RBF-SVM classifier is used for classification. Three experiments were conducted and analysed: fusion versus single-sensor detection, HVS sample reduction, and PWMI feature selection.

4.1 Data set and experiment settings

The data set used in the experiment is an infrared and radar feature fusion data set. The infrared features are extracted from 12 image sequences covering different scenes. The collection scene and image characteristics of each sequence are shown in Table 1:

Table 1 Collection scenes and image characteristics of the infrared sequences

The radar features in the data set are based on simulated data under different settings. The settings for the simulated data are shown in Table 2:

Table 2 Radar feature simulation settings

The data set required for the experiment is constructed from infrared images and simulated radar signals of different scenes. A total of 3 infrared classical features are included: greyscale, scale, and Sobel gradient. The 9 visual features are: LoG, IDoGb, LoPSF, EbLDM, LCM, LGSM, MLHM, MPCM, and NWIE, and the 4 radar features are: acceleration, speed, distance, and RCS.

To obtain more reliable classification performance, tenfold cross-validation is performed on the data set. Each run uses a different portion of the data as training and test sets at a 4:1 ratio, and each run yields a detection probability and a false alarm probability.
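Under this protocol, the evaluation loop might look as follows, assuming scikit-learn is available; `X` and `y` denote the joint feature matrix and target/non-target labels built in the preceding sections:

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

clf = SVC(kernel="rbf")                      # the RBF-SVM classifier of Sect. 4
scores = cross_val_score(clf, X, y, cv=10)   # tenfold cross-validation
print(f"mean accuracy across folds: {scores.mean():.4f}")
```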

4.2 Results and analysis of detection based on infrared features, radar features and all features

To verify the superiority of fusion detection, target detection is conducted based on infrared features, radar features and all features, respectively. The detection results are shown as follows:

Figure 5a shows the ROC curves after parameter tuning, where the circular marks are the thresholds taken by the SVM classification, and Fig. 5b is a partial enlargement of the low false alarm probability region.

Fig. 5 ROC curves of single-sensor detection and fused detection

We chose detection probability (Pd), false alarm probability (Pfa), and area under the curve (AUC) as indicators. Pd and Pfa are classic indicators of target detection. AUC represents the area under the ROC curve; its value is independent of the threshold, so it measures the performance of the classifier more comprehensively. The larger the AUC, the better the detection performance. The experimental results are shown in Table 3:

Table 3 Single sensor detection and fusion detection results

When only infrared features or only radar features are used, the ROC curves lie below the ROC curve of the fusion detection. The circular mark on each ROC curve is the classification threshold, and the SVM's classification results on the fusion data set are also superior to those on the single-sensor data sets. The AUC value of fusion detection is higher than that of single-sensor detection, which further confirms these results.

4.3 Results and analysis of fusion detection based on HVS infrared data set sample reduction

The infrared data set was constructed both without screening and with HVS sample reduction. In the control group, image pixels were randomly selected as the data set without screening; in the experimental group, the data set was built with HVS-based sample reduction. The parameters of the data sets and the corresponding detection results are shown in Table 4:

Table 4 Data set construction and detection results

The amount of data in the randomly sampled data set was adjusted so that the two data sets are roughly equal in size. The detection results show that the reduced data set yields a higher detection probability. The ROC curves for the two data sets are shown below:

Figure 6 shows the ROC plot of the results, where (b) is a partial enlargement of the low false alarm probability portion of (a). The ROC curve of the HVS-reduced data set lies above that of the randomly sampled data set and has a higher AUC value; its detection performance is therefore better.

Fig. 6 Detection results with and without sample reduction

4.4 Results and analysis of fusion detection based on PWMI feature selection

To verify the selection effect of the PWMI feature selection method, 6 feature selection methods [Fisher Score (Duda et al. 2012), MI (Lewis 1992), mRMR (Peng et al. 2005), CIFE (Lin and Tang 2006), JMI (Meyer et al. 2008), and ILFS (Roffo et al. 2017)] were used as the control group. The results of each classification and the averages over multiple trials were analysed to verify the effectiveness of the proposed method. 5 partitioning schemes were used to divide the data set into training and test sets. The detection probability and number of selected features for each partitioning scheme, together with the mean values, are shown in Table 5:

Table 5 Detection probability and number of selected features when applying different feature selection methods

As shown above, the feature selection algorithm with the highest detection probability is marked in bold, and “Ave” denotes the average over the 5 schemes. In the first, third, and fifth experiments, detection based on the feature-selected data set yields a higher detection probability than the complete data set; in the second and fourth experiments, the detection probability of the feature-selected data set is equivalent to that of the complete data set. Since some features are only weakly correlated with the target, filtering them out helps the classifier achieve a higher detection probability. In terms of average detection probability, the best feature selection method is mRMR, with a detection probability of 0.7574, which is generally better than the other feature selection algorithms. The average detection probability of PWMI is 0.7396, second only to mRMR.

However, the case of Experiment 1 is special. The training set consists of image sequences 1–9, and the test set consists of image sequences 10–12. As Table 4 shows, the infrared features of the training set and the test set were obtained from completely different scenes. When all features are used for classification, the detection probability is lower than that of most feature-selected subsets. mRMR is the best-performing feature selection method in most experiments, but in Experiment 1 its detection probability is lower than that of PWMI. When the difference between the training set and the test set is large, selecting features using only the statistical information of the training set cannot accurately identify the most suitable features; it is therefore necessary to judge selection priority by also considering the physical meaning of each feature. Figure 7 shows the variation of the detection probability in Experiment 1 with the number of selected features, and Table 6 shows the scores assigned by all feature selection algorithms.

Fig. 7 The influence of the number of selected features on the detection probability

Table 6 Scores of various features by the feature selection algorithms

It can be seen that PWMI and mRMR select the same first 6 features, and PWMI outperforms mRMR when the seventh feature is selected: as Table 6 shows, mRMR selects NWIE, while PWMI selects MLHM. The first seven choices of mRMR and PWMI are marked in bold, and their seventh choices are underlined. MLHM belongs to the contrast mechanism (quotient) group of infrared visual features, and no feature of a similar type appears among the first 6 selected features. NWIE and EbLDM both belong to the contrast mechanism (difference) group; since EbLDM is already selected, PWMI penalizes NWIE with a higher weight, so that MLHM, which is based on a different physical principle, is selected instead. A better detection probability is thus achieved when the training set and the test set differ greatly.

5 Conclusion

The performance of infrared and radar fusion target detection is better than that of a single sensor, and data preprocessing is the key to data fusion. Reducing the amount of data in the infrared data set improves the detection probability. The Prior Weighted Mutual Information feature selection method exploits the physical meaning of the features themselves: it achieves a higher detection probability when scenes differ greatly, while its average detection probability is comparable to that of classical algorithms. Together, these methods effectively improve the performance and robustness of infrared and radar fusion detection.