1 Introduction

Infrared search and track (IRST) systems have been developed to achieve autonomous searching, detection, acquisition, tracking, and designation of potential incoming targets [6, 11]. The most important threats dealt with in sea-based IRST systems are incoming small targets, such as anti-ship sea-skimming missiles (ASSM) or asymmetric ships. In this study, we concentrated on the detection of such targets encountered in sea-based IRST systems. The detection of long-range small targets is quite difficult due to the weak target signal and environmental clutter, such as sensor noise, sea gulls, clouds, and sun glints.

The related research can be grouped under two goals: increasing the target detection rate and reducing the false alarm rate. Increasing the target detection rate means that as many true targets as possible should be found in the images. This is easily achieved with a low threshold value, but a low threshold generates many false detections in cluttered areas, so the false alarm rate must be reduced simultaneously. Many studies have been conducted on how to increase the detection rate of small infrared targets, either by enhancing the target signal or by subtracting the background. The difference between the target intensity profile and the background intensity profile, which is well known in IRST, is used to enhance the signal-to-clutter ratio (SCR). The intensity difference can be taken between a center pixel and the surrounding pixels [28] or adapted to the background structure [61]. Several methods are used to accomplish this: matched filters (template matching) [8, 28, 35], multi-scale methods [17, 32, 52], and the radial symmetry-based method [16]. The intensity surface can also be modeled using the facet-based approach [53]. The background estimation-based detection scheme is a popular approach due to its simplicity and its good performance in small target detection. This method detects targets by subtracting the estimated background from the input image, so the detection performance depends on how accurately the true background is estimated. The background image can be estimated from an input image using spatial filters, such as the least mean square (LMS) [27, 41, 46], mean [55], median [42], and morphological (Top-Hat) filters [38, 54]. The LMS filter minimizes the difference between an input image and a background image estimated through the weighted average of the neighboring pixels. The mean filter estimates the background through a Gaussian mean or a simple moving average. The median filter is based on order statistics; the median value can effectively remove point-like targets. The morphological opening filter can remove specific shapes by erosion and dilation with a specific structuring element. The mean filter-based target detection method is computationally very simple, but it is sensitive to edge clutter. Target detection with non-linear filters, such as the median or morphological filters, has a low false alarm rate around edges but is computationally complex. Combination filters, such as Max-Mean or Max-Median, can preserve the edge information of clouds and background structures [13]. There is also a data fitting approach that models the background using multi-dimensional parameters [50]. The super-resolution method is also useful in background estimation and enhances small target detection [14].
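To make the background estimation-based scheme concrete, the following is a minimal Python sketch, assuming SciPy's uniform, median, and grey-opening filters as stand-ins for the mean, median, and morphological background estimators; the kernel size and the threshold factor k are illustrative choices, not values from the cited works.

```python
import numpy as np
from scipy import ndimage

def background_subtraction_detect(image, estimator="median", ksize=7, k=4.0):
    """Estimate the background with a spatial filter, subtract it, and
    threshold the residual.  A sketch of the generic scheme, not any
    specific published detector."""
    if estimator == "mean":            # moving-average background
        background = ndimage.uniform_filter(image.astype(np.float64), size=ksize)
    elif estimator == "median":        # order-statistics background
        background = ndimage.median_filter(image.astype(np.float64), size=ksize)
    elif estimator == "tophat":        # morphological opening background
        background = ndimage.grey_opening(image.astype(np.float64), size=(ksize, ksize))
    else:
        raise ValueError(estimator)

    residual = image.astype(np.float64) - background
    # Simple global threshold on the residual (k standard deviations above the mean).
    thr = residual.mean() + k * residual.std()
    return residual > thr              # boolean detection mask
```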

There are several works regarding the removal or reduction of false detections, and their false alarm reduction strategies depend strongly on the situation. If a sensor platform is static, target motion information can be exploited to remove several types of clutter, such as sun glint. A well-known approach is the Track-Before-Detect (TBD) method, whose concept is similar to that of the 3D matched filter. The TBD method can remove static clutter, such as ground clutter [37, 40], and detect targets in an environment suffering from sun glint [26]. Dynamic programming (DP), a fast version of the traditional TBD method, achieves a powerful performance in detecting dim targets [1, 7]. The temporal profiles, including the mean and variance at each pixel, are effective in the detection of moving targets in slowly moving clouds [5, 45, 47, 49]. Accumulating the detection results of each frame makes it possible to detect moving targets [39]. The wide-to-exact search method was developed to enhance the speed of 3D matched filters [61]. Recently, an improved power-law-detector-based moving target detection method has been presented; it is effective for image sequences with heavy clutter [56]. False alarms caused by sun glint can be reduced through three-plot correlation with temporal filtering [23].

It is also possible to reduce false detections through the use of decision methods, which must determine whether or not a probing region is a target. The hysteresis method has two thresholds: the first is a very low value used to find the candidate target regions, and the second is a relatively high value that depends on the operational requirements [12]. As size information becomes available, it is possible to remove large sun glints and other large objects. Similar results can be obtained by applying an iterative threshold [2]. Statistics-based adaptive threshold methods, such as the constant false alarm rate (CFAR) detector, are useful in a severely cluttered background [9, 33]. If we apply the CFAR detector after spatial filtering to an IRST image, we obtain the detection results shown in Fig. 1b, where many false detections caused by strong sun glints, cloud clutter, and ground clutter exist for the test image shown in Fig. 1a. If we apply an additional temporal filter, such as a three-plot correlation, we can remove the false detections in the sea surface region. However, false detections from cloud clutter and ground clutter still remain, as shown in Fig. 1c.

Fig. 1

False alarm reduction method limitations: a the original infrared image, b spatial filter + CFAR detection only, and c additional three plot correlation

The target detection performance can be upgraded by introducing multiple features and machine learning. Zhang et al. [62] proposed a spectral profile feature for sub-pixel target detection and a tensor feature combining spectral and spatial characteristics [63]. Yu et al. [57–60] proposed multi-attribute features, such as a color histogram, shape features (Hausdorff edge feature or shape context), and a skeleton feature, to represent and match cartoon characters. Their semi-supervised multi-view distance metric learning is effective in cartoon character classification and content-based image retrieval.

The focus of this study was on finding a novel approach to remove the remaining false detections. Motivated by the above works, we applied machine learning approaches to target attribute features. A classifier separates the correct targets from clutter points in the feature space. The simplest method is the nearest neighbor classifier (NNC) algorithm, which uses only feature similarity [21]. In addition to NNC, there are the model-based Bayesian classifier [19], the learning-based neural network, and the support vector machine (SVM) [44]. Recently, manifold regularized methods have been presented to handle semi-supervised learning [3, 29, 31]. Classification information can be useful to remove various clutter points. However, it is not easy to apply these classification methods, because the targets are very small, which leaves little information available.

Our contributions can be summarized as follows. The first contribution is the introduction of a false alarm rejection scheme based on machine learning methods. The second is the proposition of eight kinds of small infrared target attributes, in particular the ranked-fill-ratio and the rotational size variation. The third is a novel feature selection method based on sequential forward selection with an area under the ROC curve (AUC) metric. The fourth is a performance evaluation among state-of-the-art classifiers, namely Bayes, Adaboost, kernel SVM, and Laplacian SVM. The last contribution is the experimental validation on real infrared target sequences.

Section 2 introduces the basics of the IRST small target detection method. Section 3 presents the proposed target discrimination algorithm, including the feature analysis and feature selection method. The performance of our method is evaluated in Sect. 4, and we conclude in Sect. 5.

2 The basics of filter-based small target detection

As shown in Fig. 2, the overall flow for small infrared target detection consists of spatial filtering, background subtraction, target detection, and a discriminator. First, we will discuss the basics of filter-based small target detection and then explain the details of the target discrimination method.

Fig. 2

The overall flow of small target detection and discrimination

The state-of-the-art small infrared target detector is based on two spatial filters: a modified mean subtraction filter (M-MSF) and a directional background estimation (DBE) and removal filter [22]; the combined method is called the double layered filter (DLF). An input image I(x, y) is pre-filtered using the filter coefficients to enhance the signal-to-clutter ratio, yielding I_SCR(x, y). Simultaneously, the background image I_BG(x, y) is estimated by a 7 × 7 moving average kernel. The background image is subtracted from the pre-filtered image, which produces the M-MSF image I_FL1(x, y) by

$$ I_{{\rm FL}1}(x,y)=I_{\rm SCR}(x,y)-I_{\rm BG}(x,y) $$
(1)

As can be seen in Fig. 3b, the M-MSF can enhance the SCR, but there is a strong stripe visible on the horizon.

Fig. 3

The basic spatial filtering procedures: a a target in the heterogeneous background is b filtered by Filter Layer 1. Filter Layer 2 image (d) is then obtained from directional background estimation shown in c. Note the improvement of the SCR of the target

After applying Filter Layer 1, we obtain an image with improved SCR. The second spatial filter, Filter Layer 2, is applied directly to the result of Filter Layer 1. In a sea-based IRST, it is reasonable to estimate the background along the row direction for each row. The target pixel values are regarded as outliers, whereas the background pixel values are regarded as inliers. The proposed directional background estimator (DBE), I_DBE(x, y), is defined as

$$ \begin{aligned} I_{\rm DBE}(x,y)&=\hbox{median }\{I_{{\rm FL}1}(x-n,y),I_{{\rm FL}1}(x-n+1,y),\ldots,\\ &\quad I_{{\rm FL}1}(x,y),\ldots,I_{{\rm FL}1}(x+n-1,y),I_{{\rm FL}1}(x+n,y)\} \end{aligned} $$
(2)

where the tap size is 2n + 1.

We use a 1D local median filter to handle image tilt instead of using all the pixels in a row. Figure 3c demonstrates the directional background estimation of Filter Layer 2. The input of Filter Layer 2 is the output I_FL1(x, y) of Filter Layer 1, from which the directional background I_DBE(x, y) is estimated. The output I_FL2(x, y) of Filter Layer 2 is then calculated using Eq. (3)

$$ I_{{\rm FL}2}(x,y)=I_{{\rm FL}1}(x,y)-I_{\rm DBE}(x,y) $$
(3)

Since the horizontal background is estimated and removed by Filter Layer 2, the clutter is reduced, leading to an enhanced SCR, as seen in Fig. 3d.
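The two filter layers of Eqs. (1)–(3) can be sketched as follows, assuming the pre-filtered (SCR-enhanced) image is already available as input; the 7 × 7 moving average for the background and the 1D horizontal median for the DBE follow the description above, while the default tap length is only an illustrative value.

```python
import numpy as np
from scipy import ndimage

def double_layered_filter(i_scr, tap=11):
    """Sketch of the two filter layers.  `i_scr` is the pre-filtered
    (SCR-enhanced) image; `tap` = 2n + 1 is the DBE median length."""
    i_scr = i_scr.astype(np.float64)

    # Filter Layer 1 (M-MSF): subtract a 7x7 moving-average background, Eq. (1).
    i_bg = ndimage.uniform_filter(i_scr, size=7)
    i_fl1 = i_scr - i_bg

    # Filter Layer 2: 1D horizontal median as the directional background, Eq. (2),
    # then remove it, Eq. (3).
    i_dbe = ndimage.median_filter(i_fl1, size=(1, tap))
    i_fl2 = i_fl1 - i_dbe
    return i_fl2
```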

The last step in the small target detection process is deciding which pixels correspond to target pixels. In this study, we used an adaptive hysteresis thresholding method. The first threshold is selected to be as low as possible to find the candidate target regions. Then, the 8-nearest neighbor (8-NN) based clustering method is utilized to group the detected pixels. The detection criterion reduces to the SCR test defined by

$$ \begin{array}{c} \hbox {A probed region is a target if}\\ \hbox{SCR}(x,y)=\frac{T_{\rm max}-\mu_{\rm BG}}{\sigma_{\rm BG}}>k,\\ \end{array} $$
(4)

where \(\mu_{\rm BG}\) and \(\sigma_{\rm BG}\) represent the average and standard deviation of the background region, respectively. T_max denotes the maximal target signal in a target cell, and k is a user-defined parameter. As depicted in Fig. 4c, the probing region is divided into a target cell, guard cell, and background cell according to the results of global low thresholding and clustering. Figure 4 summarizes the overall adaptive hysteresis threshold procedure.
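A minimal sketch of the SCR test in Eq. (4) for one candidate cluster is given below, assuming the target cell and background cell are supplied as boolean masks produced by the pre-threshold and 8-NN clustering step; the threshold k and the construction of the masks themselves are not spelled out here.

```python
import numpy as np

def scr_test(i_fl2, target_mask, background_mask, k=5.0):
    """Decide whether a probed region is a target via Eq. (4)."""
    t_max = i_fl2[target_mask].max()           # maximal signal in the target cell
    mu_bg = i_fl2[background_mask].mean()      # background mean
    sigma_bg = i_fl2[background_mask].std()    # background standard deviation
    scr = (t_max - mu_bg) / max(sigma_bg, 1e-6)  # guard against a flat background
    return scr > k
```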

Fig. 4

The adaptive hysteresis threshold-based target detection flow: a filtered image, b pre-threshold and 8-NN based clustering, c the SCR estimation, and d final detection

3 The proposed target discrimination system

This section explains the details of the proposed small target discrimination system. As shown in Fig. 5, the target discrimination system consists of a learning phase and a discrimination phase. In the learning phase, a training database (DB) is automatically prepared using the target detection algorithm and ground truth information. Given a set of target attribute features, we conduct feature selection by Area Under the ROC Curve (AUC) metric-based forward selection. The classifiers are learned using the selected features. In the discrimination phase, the features are extracted from the probed target regions, and the final target discrimination is performed by the learned classifier.

Fig. 5

The overall flow of the target discrimination

3.1 Target feature extraction

Small infrared targets are usually small bright blobs of under 100 pixels, so it is quite difficult to extract informative features from such point-like target images. In this paper, we present two kinds of feature extraction approaches, intensity-based and region-based. In the intensity-based approach, we consider the standard deviation, ranked-fill-ratio, and 2nd-order moment. In the region-based approach, we consider the area, size ratio, rotational size variation, frequency energy, and average distance. Note that the features are inspected on a spatially filtered database.

Feature 1-standard deviation: The first feature is the simple standard deviation of the image intensity over the considered region, as defined by Eq. (5), where I(i) denotes the intensity at the ith pixel, N denotes the total number of pixels, and \(\mu\) is the average intensity.

$$ \sigma=\sqrt{\frac{\sum\nolimits_{i=1}^{N}(I(i)-\mu)^2}{N}} $$
(5)

Feature 2-ranked-fill-ratio: The second feature considers the ratio between the sum of the K brightest pixels and the total intensity, as defined in Eq. (6). Targets usually have higher values than clutter, since targets appear as a hot spot on a cold background. This attribute was originally used in the radar community to reject natural-clutter false alarms [34]; to our knowledge, this is the first time the ranked-fill-ratio has been adopted for the infrared target detection problem.

$$ \eta=\frac{\sum\nolimits_{j \in K}I(j)}{\sum\nolimits_{i}I(i)} $$
(6)

Feature 3-2nd order moment: The third feature considers the 2nd image moment as defined in Eq. (7).

$$ m_{22}=\frac{\sum\nolimits_{x}\sum\nolimits_{y}(x-\mu_x)^2(y-\mu_y)^2I(x,y)}{\sum\nolimits_{x}\sum\nolimits_{y} I(x,y)} $$
(7)
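As an illustration, the three intensity-based features above can be computed from a chip as follows; the choice of K and the use of the intensity-weighted centroid for (μ_x, μ_y) in Eq. (7) are assumptions made for this sketch.

```python
import numpy as np

def intensity_features(chip, k=5):
    """Features 1-3 for a (filtered) target/clutter chip.
    k, the number of brightest pixels in the ranked-fill-ratio, is an
    illustrative choice, not the value used in the paper."""
    I = chip.astype(np.float64).ravel()

    std = I.std()                                   # Feature 1, Eq. (5)

    top_k = np.sort(I)[-k:]                         # K brightest pixels
    ranked_fill_ratio = top_k.sum() / I.sum()       # Feature 2, Eq. (6)

    # Feature 3, Eq. (7): 2nd-order moment about the intensity-weighted
    # centroid (an assumption of this sketch).
    h, w = chip.shape
    ys, xs = np.mgrid[0:h, 0:w]
    W = chip.astype(np.float64)
    mu_x = (xs * W).sum() / W.sum()
    mu_y = (ys * W).sum() / W.sum()
    m22 = ((xs - mu_x) ** 2 * (ys - mu_y) ** 2 * W).sum() / W.sum()

    return std, ranked_fill_ratio, m22
```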

The following five features are extracted from the segmented target region:

Feature 4-area: For this feature, a binary target region is obtained by applying Otsu’s method, which chooses the threshold that minimizes the intraclass variance of the black and white pixels [15]. Given a gray image I(i), the segmented target region is denoted as R(i). The feature is calculated by

$$ a=\sum_i R(i) $$
(8)

Feature 5-size ratio: The fifth feature considers the target size ratio. If we denote the target width as l_W and the target height as l_H, then the ratio is defined as:

$$ S_{\rm ratio}=\frac{l_{\rm H}}{l_{\rm W}} $$
(9)

Feature 6-rotational size variation: The sixth feature is based on the rotational size profile (L(i)) shown in Fig. 6. We propose the concept of the rotational size profile and its variation in this paper. Given an intensity image (Fig. 6a), the target region is extracted by Otsu’s thresholding method (Fig. 6b). By rotating the region, a target size profile is generated (Fig. 6c); the rotational size profile therefore reflects the target shape. If a small target is a circular blob, the profile is uniform; if it has a rectangular shape, as shown in Fig. 6b, the profile resembles a cosine curve. We quantify the rotational size profile using the standard deviation of the curve, as defined in Eq. (10).

$$ \sigma_L=\sqrt{\frac{\sum\nolimits_{i=1}^{N}(L(i)-\mu_L)^2}{N}} $$
(10)

Feature 7-frequency energy: The seventh feature is the frequency energy, obtained by applying a fast Fourier transform (FFT) to the rotational size profile (L(i)):

$$ \begin{aligned} M(k)&=\hbox{FFT}(L(i)-\mu_L),\\ f_{\rm energy}&=\sum_{k=1}^{M}\frac{|M(k)|^2}{M} \end{aligned} $$
(11)

Feature 8-average distance: The last feature is the average distance. If a region consists of N pixels and the region center is \((\mu_x, \mu_y),\) we can calculate the average Euclidean distance using:

$$ d=\frac{\sum\nolimits_{i=1}^N \sqrt{(x(i)-\mu_x)^2+(y(i)-\mu_y)^2}}{N} $$
(12)
Fig. 6

The rotational size profile extraction procedure: a the input test image, b binarization using Otsu’s method, and c the rotational target size profile
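The five region-based features can be sketched as follows, using scikit-image's threshold_otsu as a stand-in for Otsu's method; the 10-degree rotation step and the use of the horizontal extent as the size measure in the rotational profile are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu

def region_features(chip, angles=range(0, 180, 10)):
    """Features 4-8 for a (filtered) chip."""
    region = chip > threshold_otsu(chip)            # Otsu segmentation (binary R)

    area = int(region.sum())                        # Feature 4, Eq. (8)

    ys, xs = np.nonzero(region)
    l_w = np.ptp(xs) + 1                            # target width
    l_h = np.ptp(ys) + 1                            # target height
    size_ratio = l_h / l_w                          # Feature 5, Eq. (9)

    # Rotational size profile L(i): horizontal extent at each rotation angle.
    profile = []
    for a in angles:
        r = ndimage.rotate(region.astype(float), a, order=0, reshape=True) > 0.5
        cols = np.nonzero(r.any(axis=0))[0]
        profile.append(np.ptp(cols) + 1 if cols.size else 0)
    profile = np.asarray(profile, dtype=np.float64)
    rot_size_variation = profile.std()              # Feature 6, Eq. (10)

    spectrum = np.fft.fft(profile - profile.mean())
    freq_energy = (np.abs(spectrum) ** 2).sum() / len(profile)      # Feature 7, Eq. (11)

    mu_x, mu_y = xs.mean(), ys.mean()
    avg_dist = np.sqrt((xs - mu_x) ** 2 + (ys - mu_y) ** 2).mean()  # Feature 8, Eq. (12)

    return area, size_ratio, rot_size_variation, freq_energy, avg_dist
```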

Up to now, we have presented eight kinds of target attributes. In summary, feature 2 (ranked-fill-ratio) is adopted from the radar field for the first time in the infrared target detection problem, and feature 6 (rotational size variation) is newly proposed in this paper.

3.2 Area under ROC curve-based feature selection

In the previous section, we introduced eight kinds of target attribute features. It is critical to select the most discriminative features and to combine them. Feature selection has been an active research area in the machine learning community. The feature selection problem can be interpreted as choosing a subset of features that achieves the lowest error according to a certain allowed loss, so we have to choose a class separation metric and a selection scheme. In this paper, we use the Area Under the ROC Curve (AUC) to measure the overall discrimination performance [43]. There are three types of feature selection schemes: filter, wrapper, and embedded. Filter-based methods use indirect measures of the quality of the selected features, such as feature correlation [4]. Wrapper methods use sequential forward selection or backward elimination [25]. Embedded approaches include random forests and simulated annealing [36]. Because we have only a small set of target attributes, we choose the wrapper method, specifically sequential forward selection. Figure 7 summarizes the proposed feature selection algorithm using the sequential forward scheme with the AUC metric.
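A compact sketch of the AUC metric-based sequential forward selection is given below, using scikit-learn's AdaBoostClassifier and roc_auc_score as stand-ins for the classifier and the AUC computation; the resubstitution AUC is used here only for brevity and is an assumption rather than the paper's exact protocol.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score

def auc_of_subset(X, y, subset):
    """AUC of an Adaboost classifier trained on the given feature subset."""
    clf = AdaBoostClassifier(n_estimators=50).fit(X[:, subset], y)
    scores = clf.decision_function(X[:, subset])
    return roc_auc_score(y, scores)

def sequential_forward_selection(X, y, base):
    """Start from the top-ranked base set and add a feature only if it
    increases the AUC (cf. Fig. 7)."""
    selected = list(base)
    best_auc = auc_of_subset(X, y, selected)
    for f in range(X.shape[1]):
        if f in selected:
            continue
        auc = auc_of_subset(X, y, selected + [f])
        if auc > best_auc:
            selected.append(f)
            best_auc = auc
    return selected, best_auc
```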

Fig. 7

The proposed sequential forward feature selection algorithm with AUC metric

3.3 Machine learning-based target discrimination

We have discussed the feature extraction methods and observed the feature distributions of the target and clutter samples. The remaining step is the selection of an optimal classifier. In this study, we considered five kinds of classifiers: a simple threshold, the Naïve Bayes classifier, a support vector machine (SVM), Adaboost, and the Laplacian SVM.

Simple threshold: The first classification method under consideration is the threshold-based approach: if a feature value is larger than a pre-defined threshold, it is declared a target. We use this method to rank the target discrimination capability of each individual feature, using the area under the receiver operating characteristic curve (AUC) and the equal error rate (EER) as performance measures.

Naïve Bayes classifier [10]: This classifier can be viewed as the maximum a posteriori probability classifier for a generative model. Targets and clutter can be modeled by their multi-modal distributions in the feature space. In practice, it is quite difficult to determine such distributions due to the high dimensionality and non-linearity. Since Naïve Bayes assumes independent feature measurements, it is relatively easy to determine the joint probability distribution through independent Gaussian distributions characterized by the means (\(\mu_i\)) and standard deviations (\(\sigma_i\)) of the feature vector (\(\mathbf{x}\)), as shown in Eqs. (13) and (14). \(P_{\rm T}\) denotes the target distribution and \(P_{\rm C}\) denotes the clutter distribution. \(\mathcal{N}_{\rm T}\) and \(\mathcal{N}_{\rm C}\) represent the Gaussian distributions of target and clutter, respectively. In Eq. (15), the likelihood ratio (\(l(\mathbf{x})\)) of a probing region is defined as the ratio between the target distribution and the clutter distribution. If the ratio is larger than 1, the region is declared a target. It is well known that the accuracy of Naïve Bayes classification is typically high [10].

$$ P_{\rm T}({{\mathbf{x}}})=\prod_{i=1}^{8}{\mathcal{N}}_{\rm T}({{\mathbf{x}}};\mu_i, \sigma_i) $$
(13)
$$ P_{\rm C}({{\mathbf{x}}})=\prod_{i=1}^{8}{\mathcal{N}}_{\rm C}({{\mathbf{x}}};\mu_i, \sigma_i) $$
(14)
$$ l({{\mathbf{x}}})=\frac{P_{\rm T}({{\mathbf{x}}})}{P_{\rm C}({{\mathbf{x}}})} $$
(15)
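A minimal sketch of the Naïve Bayes likelihood-ratio test of Eqs. (13)–(15) is shown below; the log-space computation is an implementation choice for numerical stability, not part of the original formulation.

```python
import numpy as np
from scipy.stats import norm

def fit_naive_bayes(X_target, X_clutter):
    """Per-feature Gaussian parameters for the target and clutter classes."""
    return ((X_target.mean(axis=0), X_target.std(axis=0)),
            (X_clutter.mean(axis=0), X_clutter.std(axis=0)))

def likelihood_ratio(x, params):
    """Eq. (15): ratio of the products of per-feature Gaussians, Eqs. (13)-(14)."""
    (mu_t, sig_t), (mu_c, sig_c) = params
    log_pt = norm.logpdf(x, mu_t, sig_t).sum()
    log_pc = norm.logpdf(x, mu_c, sig_c).sum()
    return np.exp(log_pt - log_pc)      # declare a target if the ratio > 1
```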

SVM: The SVM is a popular classifier, known for its ability to learn non-linear decision boundaries through the kernel trick [51]. We regard the eight feature types as vectors and train the SVM using the SVMlight software [20]. The intersection kernel, given in Eq. (16), is used as the distance metric. The intersection kernel achieves efficient classification, with a run time logarithmic in the number of support vectors [30].

$$ k({{\mathbf{x}}},{{\mathbf{z}}})=\sum_{i=1}^{n}\min(x(i),z(i)) $$
(16)
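For illustration, the intersection kernel of Eq. (16) can be plugged into a precomputed-kernel SVM as follows; the paper trains with SVMlight, so the use of scikit-learn's SVC here is only a stand-in.

```python
import numpy as np
from sklearn.svm import SVC

def intersection_kernel(X, Z):
    """Histogram-intersection kernel, Eq. (16): k(x, z) = sum_i min(x_i, z_i)."""
    return np.array([[np.minimum(x, z).sum() for z in Z] for x in X])

def train_intersection_svm(X_train, y_train):
    gram = intersection_kernel(X_train, X_train)
    return SVC(kernel="precomputed").fit(gram, y_train)

def predict_intersection_svm(clf, X_test, X_train):
    # The test kernel is evaluated against the training vectors.
    return clf.predict(intersection_kernel(X_test, X_train))
```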

Adaboost [48]: The SVM method considers multi-dimensional feature vectors and finds support vectors using a kernel. Adaboost, on the other hand, uses simple weak classifiers (\(h_i\)) and combines them through a weighted sum, which leads to a strong classifier, as in Eq. (17). In this study, the weak classifiers are simple threshold-based binary decisions in the individual feature spaces.

$$ H_{\rm strong}({{\mathbf{x}}})=\hbox{sign}\left(\sum\limits_{i=1}^{N}\alpha_i h_i({{\mathbf{x}}})\right) $$
(17)
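A brief sketch of the Adaboost discriminator over threshold-based weak classifiers is given below; scikit-learn's default depth-1 decision stumps play the role of the per-feature threshold decisions h_i, and the number of boosting rounds is an illustrative choice.

```python
from sklearn.ensemble import AdaBoostClassifier

def train_adaboost(X_train, y_train, n_rounds=100):
    """Adaboost over depth-1 decision stumps (the scikit-learn default weak
    learner), i.e. simple per-feature threshold classifiers h_i.  The
    weighted vote of Eq. (17) is computed internally by the library."""
    return AdaBoostClassifier(n_estimators=n_rounds).fit(X_train, y_train)

def classify(clf, X):
    # decision_function returns the signed weighted sum; predict returns its sign.
    return clf.predict(X), clf.decision_function(X)
```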

LapSVM [3, 31]: The Laplacian SVM (LapSVM) is a state-of-the-art classifier for the semi-supervised learning problem. It relies on the manifold assumption, which states that the marginal probability distribution underlying the data is supported on or near a low-dimensional manifold and that the target function should change smoothly along the tangent direction. The LapSVM provides a natural out-of-sample extension, so that it can classify data that become available after the training process, without having to retrain the classifier or resort to various heuristics [3]. Recently, the training time of the LapSVM has been reduced considerably [31]. In this paper, we use the open-source code available at http://www.dii.unisi.it/melacci/lapsvmp/ for a fair comparison.

4 The experiment results

4.1 The target and clutter database preparation

It is important to prepare a large enough data set to ensure successful learning. In this study, 136 real target images were collected using either a mid-wave infrared (MWIR) camera or a long-wave infrared (LWIR) camera. The target images were acquired from real airplanes, such as the KT-1, F-5, and F-16. The cloud clutter database was prepared using the detection algorithms introduced in the previous section. In Sect. 2, we summarized the small target detection flow. As shown in Fig. 4b, target sizes are determined by 8-NN-based clustering after low-level thresholding; the 8-NN clustering estimates the target width and height. The final region sizes are determined by considering both the target sizes and the guard cell sizes. Figure 8 shows examples of the target and clutter images. In addition, we also considered a filtered database, as shown in Fig. 9. We used these datasets in the following feature extraction subsection.

Fig. 8

Examples of the target and clutter database without spatial filtering: a sample images of target chips, b sample images of clutter chips. The chips are automatically generated by the small target detection algorithm with ground truth information

Fig. 9

Examples of the target and clutter database after spatial filtering: a sample images of filtered target chips, b sample images of filtered clutter chips. The chips are automatically generated by the small target detection algorithm with ground truth information

4.2 The target and clutter feature distribution observations

The intensity-based feature observations—standard deviation, ranked-fill-ratio, and 2nd-order moment: Figure 10 summarizes the intensity-based feature observations. Figure 10a shows the standard deviation feature for each target and clutter sample. The target feature has similar or slightly higher values than the clutter feature. In terms of the statistical analysis shown in Fig. 10b, the distributions of the target and clutter overlap strongly, from which we can predict that this feature is not useful for target discrimination. Figure 10c shows the ranked-fill-ratio feature for the targets and clutter. The target feature has relatively high values compared to the clutter feature. As shown in Fig. 10d, the target distribution is located at the upper values and the clutter distribution at the lower values. Figure 10e shows the 2nd-order moment feature for the targets and clutter. The clutter feature has similar or slightly higher values compared to the target feature. As shown in Fig. 10f, the target distribution is concentrated at the lower values but overlaps strongly with the clutter distribution.

Fig. 10

The intensity feature observations: a the standard deviation feature values, b the pdf of the standard deviation feature, c the ranked-fill-ratio feature values, d the pdf of the ranked-fill-ratio, e the 2nd-order moment feature values, and f the pdf of the 2nd-order moment

Region-based feature observations—area and size ratio: Figure 11 summarizes the area and size ratio feature observations. Figure 11a displays the area feature of the target and clutter images. Targets usually have a small area compared to clutter. Figure 11b shows the corresponding probability density functions (pdf). The target pdf is narrow, whereas the clutter pdf is dispersed. Figure 11c shows the size ratio observations. The targets have a size ratio around 1 and the clutter has low values, which means that the clutter usually has long striped patterns. Figure 11d presents the corresponding pdfs of the targets and clutter; the two distributions have different centers.

Fig. 11

The region feature-area and size ratio observations: a the area feature values for the target and clutter samples, b the probability density function of the area feature for the targets and clutter, c the size ratio feature values, and d the probability density function of the size ratio feature for the targets and clutter

Region-based feature observations—rotational size variance, frequency energy, and average distance: Figure 12 summarizes the observations regarding the rotational size variance, frequency energy, and average distance features. Figure 12a shows the observed values of the rotational size variance feature for the target and clutter samples. Since small targets have a circular symmetry, the variance of the profile is very low. Figure 12b presents the corresponding pdfs of the targets and clutter. Their distributions are quite different. Figure 12c shows the observed values of the frequency energy feature for the target and clutter samples. The frequency energy of the target profile is quite low compared to that of the clutter profile. Figure 12d shows the corresponding target and clutter pdfs. Figure 12e shows the observed values of the average distance feature for the target and clutter samples. Since small targets have small areas, the average pixel distance is low compared to that of the clutter. Figure 12f shows the corresponding target and clutter pdfs. The two pdfs have quite different distributions, which are useful in target discrimination.

Fig. 12

Region feature observations: a the rotational size variation feature values, b the pdfs of the rotational size variation feature, c the frequency energy feature values, d the pdfs of the frequency energy, e the average distance feature values, and f the pdfs of the average distance

According to the observations of the eight different features, we can deduce that Features 2, 4, 5, 6, 7, and 8 will be useful for target discrimination. However, these features were extracted from filtered images. What happens if we use the raw images directly? Figure 13 shows partial pdf results derived directly from the raw images: the target and clutter distributions overlap strongly. This problem originates from the unstable region extraction shown in Fig. 14. Targets usually appear in front of various backgrounds, so applying Otsu’s method extracts incorrect regions, which leads to unstable feature extraction. From these results, we conclude that filtered image-based feature extraction is better than raw image-based feature extraction.

Fig. 13

Target and clutter pdfs obtained directly from a raw image: a ranked-fill-ratio feature pdfs, b size ratio feature pdfs, c rotational size variance feature pdfs, and d average distance feature pdfs

Fig. 14

The analysis for the cause of unstable feature extraction from a raw image: a the raw target image, b the 3D view of the target image, and c the region extraction by Otsu’s method

4.3 Evaluations of feature selection

According to the observations on the target attributes, the individual features have different distributions. The optimal feature set should be selected using the proposed feature selection scheme shown in Fig. 7. We use the AUC metric in the sequential forward feature selection, with the ROC curves generated by the Adaboost classifier. Before the feature selection process, each scalar feature should be normalized to achieve stable and maximal classification performance [18]. Hsu et al. note that the same scaling must be applied to both the training and testing data. For example, suppose that we scaled the first attribute of the training data from [3, 11] to [0, 1]. If the first attribute of the testing data lies in the range [4, 12], we must scale the testing data to [1/8, 9/8] [18]. The [0, 1] normalization is computed as (x − min)/(max − min).
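A small sketch of this consistent [0, 1] scaling, with the min and max learned from the training data only, could look as follows (the function names are illustrative):

```python
import numpy as np

def fit_minmax(X_train):
    """Per-feature min and max learned from the training data only."""
    return X_train.min(axis=0), X_train.max(axis=0)

def apply_minmax(X, lo, hi):
    """Scale with the training statistics; test values may fall slightly
    outside [0, 1], e.g. a training range [3, 11] maps a test value of 12
    to 9/8, exactly as in the example above."""
    return (X - lo) / (hi - lo)
```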

In feature selection Step 1, the AUC value of each attribute is evaluated. Figure 15a presents the ROC curves of each feature, generated by changing the threshold in each feature space. We can rank the individual features by their AUCs, as shown in Fig. 15b. According to the graph, the descending order of the AUC rank is F8 > F2 > F4 > F6 > F5 > F7 > F1 > F3. The top four features are selected as a base feature vector, which we denote as FS1: F_base = F[8, 2, 4, 6]. The sequential forward algorithm adds a hypothesized next feature and compares the resulting AUC with the previous AUC. We use Adaboost to obtain the AUC values by changing the bias for a given hypothesized feature vector. If the current AUC is larger than the previous one, the hypothesized feature is added to the feature vector. Figure 16 shows the results of the feature selection process. According to the graph, we achieved improved AUC values by sequentially adding F5, F7, F1, and F3 to the base feature set FS1. We can therefore conclude that all eight target attributes are useful for target/clutter discrimination.

Fig. 15

Results of feature ranking process: a ROC performance of the individual features using the simple threshold method, b AUC comparison from the ROC curves

Fig. 16

Results of feature selection process: a ROC curves of feature selection types, b AUC comparison among selected feature set

4.4 The classifier evaluations

In this subsection, we compare four kinds of classifiers: the Naïve Bayes classifier, SVM, Adaboost, and LapSVM. We randomly selected training samples and used the remaining samples as the test set. We use two performance measures, the AUC and the detection rate (DR) at a fixed false alarm rate (FAR), to check the overall behavior and the real operating performance, respectively (Table 1). Figure 17 shows the ROC curves and AUC results. According to these results, Adaboost shows the best overall performance, followed by Naïve Bayes, SVM, and LapSVM. However, we get different results if we consider real operating conditions, i.e., the highest detection rate at a fixed false alarm rate. The highest DR is more important in target discrimination, since true targets need to be detected. Table 2 shows the detection rate at a fixed FAR point: the LapSVM shows the best DR, followed by Adaboost, SVM, and Naïve Bayes.

Fig. 17

Results of classifier evaluation: a ROC curves, b corresponding AUC values

Table 1 The performance of the: (a) Naïve Bayes, (b) SVM, and (c) Adaboost classifiers in terms of detection rate (DR) and false alarm rate (FAR) at the specific operating point
Table 2 The performance evaluation results according to target detector and classifier on cluttered test images

4.5 Evaluation of target detectors and classifiers

Until now, we have evaluated the discrimination performance according to the feature types and classifiers on cropped target and clutter chips. In this subsection, we apply the target detector and classifier to a real test set composed of cloud and ground clutter. The test set consists of 18 real images with real moving targets (F-16).

In the first evaluation, we compared spatial target detection filters, in particular the proposed double layered filter (DLF) with the Top-hat filter (baseline method) [54]. The structuring element is a ball type with a size of 3 × 3. Detection thresholds were controlled to produce the same detection rates. As shown in Table 2, the two spatial filters have the same detection rate and similar numbers of false alarms per image.

In the second evaluation, we compared the effects of detectors on the classifier. We applied the Adaboost classifier to the spatial filter-based detection results. We used eight kinds of target attributes as a feature vector. According to the results, the proposed DLF with Adaboost showed much better performance than the Top-hat with Adaboost (baseline method) as shown in Table 2.

In the last evaluation, we compared classifiers using the same target detector (DLF). We applied the LapSVM classifier to the DLF-based detection results, using classifiers learned from the training chips with no biased thresholds. The DLF with LapSVM showed a higher detection rate than the DLF with Adaboost, as shown in Table 2. However, the former generated more false alarms than the latter. Considering both the detection rate and the false alarm rate, we conclude that Adaboost and LapSVM have comparable performance (Fig. 18).

4.6 Target discrimination performance on a test sequence

According to the results from the feature analysis and classifier tests, we conclude that the Adaboost classifier using the eight feature types is suitable for target discrimination, since it achieves the highest detection rate with a moderate false alarm rate. Therefore, we applied this target discriminator to a test sequence consisting of 156 frames of 1,280 × 1,024 pixels. The number of synthetic targets was 1,478, generated using Kim et al.'s method [24]. Adaboost was retrained by adding new targets and clutter, taken from five randomly selected frames, to the previous database. Table 3 summarizes the overall evaluation results with and without the discriminator in terms of the detection rate and the number of false alarms per frame. The target discriminator reduced the number of false alarms by a factor of 5.7 with only a 0.6% degradation in the detection rate. Figure 19 shows examples of the target discrimination effects. Note that the false detections in the cloudy sky region are almost all removed by the proposed discriminator, while target detection is still maintained.

Table 3 The spatial detection performance with and without Adaboost-based target discrimination
Fig. 18

Examples of target detection and discrimination according to different spatial filters and classifiers. DLF denotes double layered filter

Fig. 19

The system performance a before applying target discrimination and b after applying target discrimination (Adaboost)

5 Conclusions

It is quite challenging to reduce the false detections caused by clutter in small infrared target detection due to the point-like nature of the targets. This paper presented intensity-based and region-based features and examined the target and clutter feature distributions. Except for the standard deviation and 2nd-order moment features, the features show distinctive distributions. We also considered a simple thresholding method and machine learning-based classifiers (Naïve Bayes, SVM, Adaboost, and LapSVM). Using the simple thresholding method, we evaluated each individual feature in terms of its ROC curve, where the average distance feature showed the best performance. The Naïve Bayes generative learning method had the best false alarm rate, while the Adaboost discriminative learning method had the best detection rate. The SVM showed a moderate detection rate with the worst false alarm rate. With the Adaboost-based target discrimination method applied to the test sequence, we achieved a false alarm reduction by a factor of 5.7 with only a 0.6% degradation of the detection rate. In the future, we will conduct further evaluations on various databases toward a practical infrared search and track system.