1 Introduction

In air defense early warning systems, infrared small target detection is an enduring research hot spot. Infrared small target detection consists of preprocessing and threshold segmentation.

Preprocessing algorithms have been studied extensively. Classic algorithms such as TOPHAT (Zhao et al. 2013; Bai and Liu 2017) and wavelet methods (Zhao et al. 2015) are continuously improved. Interdisciplinary theories such as machine vision (Yang et al. 2015; Zhao et al. 2015) and scale space theory (Dong et al. 2014) have also been applied to infrared small target detection. Kim et al. (2009) applied the Laplacian of Gaussian (LOG) scale space to the extraction of small targets, modeled the small targets as Gaussian points, and extracted the targets by means of maximum fusion. Dong et al. (2014) filtered the image with the difference of Gaussian (DOG) and simulated the visual attention mechanism through a Gaussian window to enhance the target. The LOG scale space theory can effectively enhance weak targets of different scales, but it also enhances false alarms that resemble small targets. Many small target detection methods cannot overcome this problem.

Using geometric features is an effective way to detect small targets in complex backgrounds. The local contrast measure (LCM) algorithm proposed by Chen et al. (2014) divides the rectangular region adjacent to the point of interest into nine squares and calculates the maximum of the contrast. It was an early use of geometric features in the neighborhood of the small target. Han et al. (2014) and Wei et al. (2016) extended the LCM algorithm with sub-blocks and multi-level models to achieve comprehensive improvement in speed, detection probability, and false alarm probability. However, LCM does not consider the case where the mean value of the neighborhood is 0, which greatly reduces its practicality. Chen et al. (2007) proposed the average absolute difference maximum (AADM) algorithm, which divides the rectangular region adjacent to the point of interest into four quadrants, calculates the weighted mean of each quadrant, and selects the quadrant with the minimum difference between the point and the quadrant mean. The algorithm performs well across different brightness levels. Deng et al. (2016b) proposed three windows with different side lengths centered on the point of interest, calculated the variance of the pixels between the windows, and enhanced the target through the difference of the variances. Deng et al. (2016a) and Nasiri and Chehresa (2017) combined the Average Gray Absolute Difference with local entropy to characterize the geometric features of the target. These methods make use of the difference in geometric features between the small target and the surrounding area, and mostly use the gray difference between the small target and the background clutter to measure the saliency of the image. However, they cannot distinguish points, lines and faces very well, so false alarms with a higher gray level and gradient than the target are hard to exclude. Qi et al. (2013) utilized the non-directionality of small targets and used four second-order directional derivative operators to calculate the saliency map of the image, which enhances targets well. The method utilizes geometric features to obtain better detection results. However, it uses empirical threshold segmentation, which limits its practicality on specific images.

Threshold segmentation has received less research attention. Single threshold segmentation and two-dimensional threshold segmentation are the most common segmentation methods. OTSU and maximum entropy segmentation are commonly used to calculate the threshold. However, in small target detection these methods leave too many pixels in the foreground, which results in a high false alarm probability. In most of the literature, the value of the single threshold is set empirically, which hinders the popularization of the detection algorithm.

The use of hyperspectral information for small target detection is also a promising research direction (Liu et al. 2018). Haskett et al. (1999) describes an automatic target detection algorithm based on the sequential multi-stage approach. DiPietro et al. (2010) evaluate the performance of detection algorithms for sub-pixel objects using a replacement signal model, where the spectral variability is modeled by multivariate normal distributions. Wang et al. (2016) analyzed the short-wave infrared characteristics of the aircraft. Hyperspectral images contain infrared and visible spectrum, and their fusion detection results are significantly improved over a single band.

This letter proposes a multiscale hysteresis threshold detection algorithm for infrared images with a complex background. First, the image is filtered by the LOG operator on multiple scales, and the coordinates and scales of the points of interest are extracted through nonmaximum suppression. Then the local gradient second-order origin moment (LGSM) is used to determine whether there is an edge near the point of interest, and both the filtered image and the LGSM feature are used to perform the hysteresis threshold detection. The double threshold required for small target detection is calculated based on the principle of maximizing the interclass variance. Experiments show that in a complex background, the hysteresis threshold detection effectively reduces the probability of false alarms while preserving the probability of detection. It also yields the coordinates and scales of the targets.

2 Extract the geometric features of points of interest

The infrared image of a tactical missile occupies a much smaller number of pixels than \(8\times 8\), which is consistent with SPIE’s definition of small targets (Chen et al. 2014). At the same time, small targets do not have obvious shape and texture features. Detecting small targets requires the extraction of the geometric features of the isolated points.

The LOG operator is not only a kernel function of scale space, but also a non-directional gradient operator. The LOG operator can be used to enhance the small target and blur small-scale clutter. However, this operator cannot distinguish small targets with geometric features of a point from clutters with geometric features of a line and a corner. In order to measure the geometric characteristics of the pixels to be inspected and determine whether the small targets are part of the clutter, this paper proposes the characteristics of local gradient second-order origin moments (LGSM). The geometrical characteristics of different targets were compared and analyzed.

2.1 Limitations of the LOG operator

The basic idea of scale space is to introduce a scale factor into the visual information (image information) processing model.

The Laplacian of Gaussian (LOG) kernel extracts the features of small targets in the image. The LOG scale space of the image can be represented as a convolution of the original image with the multiscale LOG kernel functions, as shown below.

$$\begin{aligned} LOG_{\sigma }= \frac{1}{\pi {\sigma ^4}}\left( 1-\frac{{{x}^{2}}+{{y}^{2}}}{2{{\sigma }^{2}}} \right) {{e}^{-{\left( {{x}^{2}}+{{y}^{2}} \right) }/{2{{\sigma }^{2}}}\;}} \end{aligned}$$
(1)
$$\begin{aligned} L_\sigma= LOG_\sigma *I \end{aligned}$$
(2)

The LOG operator is obtained by applying the Laplacian operator to the Gaussian operator. It blurs clutter whose scale is smaller than that of the operator and sharpens high gradient structures of an equivalent scale. It is assumed that the ideal small target model is a two-dimensional Gaussian function. The more similar the point is to the operator, the higher the response value is in the corresponding LOG scale space. Image detail whose scale is smaller than the scale of the operator will be suppressed.
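As an illustration of Eqs. (1) and (2), the following minimal sketch samples the LOG kernel on an integer grid and evaluates the filter response at the center of an ideal Gaussian point. The function names and the truncation width `half` are hypothetical choices made for this example, not part of the original method. In the continuous limit, a unit-amplitude Gaussian point with \(\sigma _t=\sigma\) gives a center response of \(1/(2\sigma ^2)\), that is, 0.125 for \(\sigma =2\), and the sampled sum should be very close to that value.

```python
import math


def log_kernel(sigma, half_width):
    """Sample the LOG kernel of Eq. (1) on a (2*half_width+1)^2 integer grid."""
    coeff = 1.0 / (math.pi * sigma ** 4)
    kernel = []
    for y in range(-half_width, half_width + 1):
        row = []
        for x in range(-half_width, half_width + 1):
            q = (x * x + y * y) / (2.0 * sigma ** 2)
            row.append(coeff * (1.0 - q) * math.exp(-q))
        kernel.append(row)
    return kernel


def gaussian_point(amplitude, sigma_t, half_width):
    """Ideal small target: a 2-D Gaussian centered in a square patch."""
    return [[amplitude * math.exp(-(x * x + y * y) / (2.0 * sigma_t ** 2))
             for x in range(-half_width, half_width + 1)]
            for y in range(-half_width, half_width + 1)]


def center_response(patch, kernel):
    """Value of L_sigma = LOG_sigma * I (Eq. 2) at the patch center.

    The kernel is symmetric, so convolution and correlation coincide.
    """
    return sum(p * k for prow, krow in zip(patch, kernel)
               for p, k in zip(prow, krow))


half = 16  # hypothetical truncation: 8 sigma for sigma = 2
resp = center_response(gaussian_point(1.0, 2.0, half), log_kernel(2.0, half))
print(f"center response for sigma = sigma_t = 2: {resp:.4f}")
```

With the sign convention of Eq. (1), a bright point produces a positive peak, and the kernel crosses zero on the circle \(x^2+y^2=2\sigma ^2\).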

Fig. 1

Original image, LOG scale space response and LGSM response of ideal points of interest. The charts of a blob, a line and a corner with the step edges are shown in the upper part. The maximum gray level A and the radius \(R_{t}\) of the points of interest are marked. The charts of a point, a line and a corner with the Gaussian edges are shown in the lower part. The maximum gray level A and the standard deviation \(\sigma _t\) of the points of interest are also marked. In the original image, the red box marks the center pixel column of the points of interest. The scale space response of the pixel column on scales \(\sigma\) and the LGSM response on inner and outer diameters r are shown below the original image

The models of the point, the line and the corner are shown in Fig. 1. The ideal models include a blob, a line and a corner with step edges, and a point, a line and a corner with Gaussian edges. We have adjusted the maximum gray level of each model so that its response on the scale \(\sigma =2\) is 0.1. The maximum gray levels of the blob, the step edge line and the step edge corner are 0.5, 1.35 and 1, and the maximum gray levels of the point, the Gaussian edge line and the Gaussian edge corner are 0.83, 1.11 and 2. Therefore, when clutter has a similar or higher gray level and a scale close to that of the small target, its scale space response will be similar to that of the small target. The geometric features of the points of interest should be extracted for more accurate detection.

2.2 Local gradient second-order origin moment

The key difference between a target and a false alarm point is the geometric feature of its adjacent area. The target point should have a higher gray level itself, and there should be no points with high gray levels or high gradients in the neighborhood. If a point is connected to the clutter, a difference will occur between the low gradient region and the high gradient region generated by the boundary of the clutter. Points that are not adjacent to the clutter will not have this gradient difference. Therefore, we propose the indicator local gradient second-order origin moment (LGSM), which measures the gradient distribution in the target neighborhood.

LGSM measures the degree of dispersion of local gradient values. It is used to determine whether the point of interest is connected to the clutter. The indicator obtains the non-directional gradient value of the image by the LOG operator, and calculates the second-order origin moment of the gradient within the coverage. The LOG operator sharpens the edges by forming a peak on the side with high gray level, a valley on the side with low gray level, and 0 at the center of the edge. The difference in gray levels on both sides of the edge will be magnified significantly. And the degree of divergence of the gradient map is calculated by the second-order origin moment.

Fig. 2

The calculation area for the LGSM. Data in the red annulus area are used to calculate the LGSM of the point of interest

As Fig. 2 shows, for a small target of scale \(\sigma _i\), the annulus area \(\varPsi \left( {{\sigma }_{i}},{{x}_{0}},{{y}_{0}} \right)\) takes the point under test \(\left( {{x}_{0}},{{y}_{0}} \right)\) as the center, \({{R}_{i}}=2{{\sigma }_{i}}\) as the inner radius, and \(4{{\sigma }_{i}}\) as the outer radius, so that its area is \(12\pi \sigma _i^2\). \(\varPsi \left( {{\sigma }_{i}},{{x}_{0}},{{y}_{0}} \right)\) covers the surroundings of the point while excluding the target itself.

$$\begin{aligned} \varPsi \left( {{\sigma }_{i}},{{x}_{0}},{{y}_{0}} \right) =\left\{ \left. \left( x,y \right) \right| 4\sigma _{i}^{2}<{{\left( x-{{x}_{0}} \right) }^{2}}+{{\left( y-{{y}_{0}} \right) }^{2}}\le 16\sigma _{i}^{2} \right\} \end{aligned}$$
(3)

The LGSM is based on gradients on all scales. We use the normalized gradient image \(I^g\) to calculate LGSM, which combines image gradients at different scales by taking the maximum value.

$$I^{g} = \mathop {\max }\limits_{i} \left[ {\frac{{I_{i} }}{{\mathop {\max }\limits_{{\left( {x,y} \right)}} {\mkern 1mu} \left( {I_{i} } \right) - \mathop {\min }\limits_{{\left( {x,y} \right)}} {\mkern 1mu} \left( {I_{i} } \right)}}} \right]$$
(4)

The second-order origin moment of all points on \(\varPsi \left( \sigma _i,x_0,y_0 \right)\) is the LGSM.

$$\begin{aligned} LGSM\left( {{x}_{0}},{{y}_{0}} \right) =\frac{ \sum \limits _{\left( x,y \right) \in \varPsi \left( {{\sigma }_{i}},{{x}_{0}},{{y}_{0}} \right) }{{{\left[ {{I}^{g}}\left( x,y \right) \right] }^{2}}}}{12\pi \sigma _{i}^{2}\left( {{x}_{0}},{{y}_{0}} \right) } \end{aligned}$$
(5)

Here, \(12\pi \sigma _i^2\left( x_0,y_0 \right)\) is the area of \(\varPsi\). In Sect. 2.1, we filtered the ideal points of interest with LOG operators. The LGSM values of the ideal points are also shown in Fig. 1. The maximum of the scale space response of the blob is at its center, and the corresponding LGSM is \(2.12\times 10^{-5}\). The LGSM of the point is \(6.13\times 10^{-6}\). In contrast, the LGSM values of the step edge line, the step edge corner, the Gaussian edge line and the Gaussian edge corner are 0.046, 0.016, 0.013 and 0.019. The LGSM therefore clearly separates point-like small targets from clutter. If there are edges or intense interference, the sharpening effect of the LOG operator will significantly amplify the LGSM.
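The computation of Eqs. (3) to (5) can be sketched as follows. This is a minimal illustration under stated assumptions: images are plain nested lists indexed as `img[y][x]`, the normalization in Eq. (4) is taken to divide each scale image by its range (maximum minus minimum), and pixels of the annulus that fall outside the image are simply skipped. The function names are hypothetical.

```python
import math


def normalized_gradient(scale_images):
    """Pixel-wise maximum over scales of I_i divided by its range (Eq. 4)."""
    rows, cols = len(scale_images[0]), len(scale_images[0][0])
    out = [[-math.inf] * cols for _ in range(rows)]
    for img in scale_images:
        flat = [v for row in img for v in row]
        span = max(flat) - min(flat)
        if span == 0:  # assumption: a constant image is left unscaled
            span = 1.0
        for y in range(rows):
            for x in range(cols):
                out[y][x] = max(out[y][x], img[y][x] / span)
    return out


def lgsm(grad, x0, y0, sigma):
    """Second-order origin moment of grad over the annulus of Eq. (3)."""
    r_in_sq, r_out_sq = 4.0 * sigma ** 2, 16.0 * sigma ** 2
    reach = int(math.floor(4.0 * sigma))
    total = 0.0
    for y in range(max(0, y0 - reach), min(len(grad), y0 + reach + 1)):
        for x in range(max(0, x0 - reach), min(len(grad[0]), x0 + reach + 1)):
            d_sq = (x - x0) ** 2 + (y - y0) ** 2
            if r_in_sq < d_sq <= r_out_sq:
                total += grad[y][x] ** 2
    return total / (12.0 * math.pi * sigma ** 2)  # annulus area, Eq. (5)


# Toy comparison: an empty neighborhood versus one crossed by a bright line.
size = 33
flat_bg = [[0.0] * size for _ in range(size)]
edge_bg = [[1.0 if x == 22 else 0.0 for x in range(size)] for _ in range(size)]
isolated = lgsm(flat_bg, 16, 16, 2.0)
near_edge = lgsm(edge_bg, 16, 16, 2.0)
print(isolated, near_edge)
```

In this toy case the isolated point has an LGSM of zero, while the point next to the line has a clearly positive LGSM, which is the behavior the indicator relies on.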

3 Multiscale hysteresis threshold segmentation

The multiscale hysteresis threshold segmentation uses both geometric features and gray level to segment the image, and uses nonmaximum suppression to extract small target coordinates, so that the segmentation result only contains small targets. The main steps include multiscale LOG filtering, nonmaximum suppression, and gradient map hysteresis threshold segmentation.

3.1 Nonmaximum suppression

Threshold segmentation divides the image into foreground and background. In small target detection, the foreground consists of points of interest. After the segmentation, it is still necessary to extract the target center from the pixels occupied by the points of interest. With nonmaximum suppression, the gradient maximum point of a small target of a certain scale is taken as an approximation of the center of the target. Because the pre-extraction of the target is completed directly before the segmentation, the number of foreground pixels obtained by segmentation is minimized. The resulting coordinates and scales serve the detection as well as the further tracking and identification of targets.

The steps are as follows:

Step 1 Single-scale suppression. Image \({{I}_{i}}\) is the LOG scale space output of the original image I on scale \(\sigma _i\). Traverse \({{I}_{i}}\) to determine if any point is a maximum point in its certain neighborhood. \(i=1,2,\ldots ,N\), where N is the number of scales.

For the jth nonzero point \(\left( {{x}_{j}},{{y}_{j}} \right)\) on image \({{I}_{i}}\) on scale \({{\sigma }_{i}}\), create \(mask\left( {{R}_{i}},{{x}_{j}},{{y}_{j}} \right)\):

$$\begin{aligned} mask\left( R_i,x_j,y_j\right) = \left\{ \begin{array}{ll} 1 &{} when \; \left( x-x_j \right) ^2+\left( y-y_j \right) ^2\le R_i^2 \\ 0 &{} when \; \left( x-x_j \right) ^2+\left( y-y_j \right) ^2>R_i^2 \\ \end{array} \right. \end{aligned}$$
(6)

where \(\left( {{x}_{j}},{{y}_{j}} \right)\) is the central coordinate of the mask, and \({{R}_{i}}\) is the maximum radius of the detectable point on scale \({{\sigma }_{i}}\).

The suppression region \({{\varOmega }_{j}}\) is delimited by \(mask\left( {{R}_{i}},{{x}_{j}},{{y}_{j}} \right)\). Suppress the nonmaximum points using formula (7)

$$\begin{aligned} I_{i}^{assigned}\left( {{x}_{j}},{{y}_{j}} \right) =\left\{ \begin{array}{ll} 0 &{} \exists k,I_{i}^{{}}\left( {{x}_{j}},{{y}_{j}} \right) <I_{i,j}^{masked}\left( {{x}_{k}},{{y}_{k}} \right) \\ I_{i}^{{}}\left( {{x}_{j}},{{y}_{j}} \right) &{} \forall k,I_{i}^{{}}\left( {{x}_{j}},{{y}_{j}} \right) \ge I_{i,j}^{masked}\left( {{x}_{k}},{{y}_{k}} \right) \\ \end{array} \right. \end{aligned}$$
(7)

where \(I_{i,j}^{masked}={{I}_{i}}\cdot mask\left( {{R}_{i}},{{x}_{j}},{{y}_{j}} \right)\).

Fig. 3

Flow chart of single-scale nonmaximum suppression

Figure 3 shows the flow of single-scale suppression. The suppressed image \(I_{i}^{assigned}\) consists of a sequence of discrete nonzero points. The distance between any two is greater than the radius \(R_i\), which avoids multiple responses on the same scale.

Step 2 To obtain the points of interest, segment \(I_{i}^{assigned}\) into \(I_i^{TH}\) with threshold TH.

$$\begin{aligned} I_{i}^{TH}=\left\{ \begin{array}{ll} 1 &{} when\ I_{i}^{assigned}\ge TH \\ 0 &{} when\ I_{i}^{assigned}<TH \\ \end{array} \right. \end{aligned}$$
(8)

Step 3 Multiscale suppression. There may still be multiple detection phenomena among points on different scales. The shape and gray distribution of the targets are different from those of the Gaussian points. Peaks may appear on small scales. Determine if the points of interest on the smaller scale are within the radius of a larger scale point. If this is the case, consider the point as the texture detail inside the target and assign it to 0. After the fusion detection of multiscale points of interest, multiple responses can be eliminated, and each point of interest is determined by the central coordinates and scale.

If \(i<N^{scale}\), obtain the periodically global suppressed image \(I_i^{suppressed}\):

$$\begin{aligned} I_{i}^{suppressed}=\left\{ \begin{array}{ll} I_{i}^{TH}\cdot {{I}_{i}}\cdot \prod \limits _{h=i+1}^{{{N}^{scale}}}{\left[ 1-I_{h}^{TH}*s\left( {{\sigma }_{h}} \right) \right] } &{} when\ i<{{N}^{scale}} \\ I_{i}^{TH}\cdot {{I}_{i}} &{} when\ i={{N}^{scale}} \\ \end{array} \right. \end{aligned}$$
(9)

In which, \(s\left( \sigma _{i}\right)\) is a binary operator to shade a circular region from detection, and \(\left( *\right)\) represents the convolution operation.

$$\begin{aligned} s\left( {{\sigma }_{i}} \right) =\left\{ \begin{array}{ll} 1 &{} when\ {{x}^{2}}+{{y}^{2}}\le 4\sigma _{i}^{2} \\ 0 &{} when\ {{x}^{2}}+{{y}^{2}}>4\sigma _{i}^{2} \\ \end{array} \right. \end{aligned}$$
(10)

Fig. 4

Flow chart of multiscale suppression

Figure 4 shows the flow of multiscale suppression. After the multiscale suppression, the output image sequence is \(I_i^{sup},i=1,2,\ldots ,N\), that is, the \(I_i^{suppressed}\) of formula (9). Each discrete nonzero point in an image corresponds to a point of interest on the relevant scale.
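The three steps above can be sketched as follows. This is a minimal illustration and not the authors' implementation: images are nested lists indexed as `img[y][x]`, the masks of formulas (6) and (10) are read as discs of radius \(R_i\) and \(2\sigma _i\) respectively, the threshold is assumed to be positive, and the product in formula (9) is treated as a boolean shading test. All function names are hypothetical.

```python
def nms_single_scale(img, radius):
    """Step 1: keep a nonzero pixel only if no pixel within radius is larger."""
    rows, cols = len(img), len(img[0])
    reach = int(radius)
    out = [[0.0] * cols for _ in range(rows)]
    for yj in range(rows):
        for xj in range(cols):
            value = img[yj][xj]
            if value == 0:
                continue
            is_max = True
            for yk in range(max(0, yj - reach), min(rows, yj + reach + 1)):
                for xk in range(max(0, xj - reach), min(cols, xj + reach + 1)):
                    inside = (xk - xj) ** 2 + (yk - yj) ** 2 <= radius ** 2
                    if inside and img[yk][xk] > value:
                        is_max = False
            if is_max:
                out[yj][xj] = value
    return out


def detect_points(assigned, th):
    """Step 2: (x, y) coordinates of suppressed pixels at or above th."""
    return [(x, y) for y, row in enumerate(assigned)
            for x, v in enumerate(row) if v >= th]


def multiscale_suppress(detections, sigmas):
    """Step 3: drop points inside the disc of a detection on a larger scale.

    detections[i] holds the points found on scale sigmas[i];
    sigmas is sorted in increasing order.
    """
    kept = []
    for i, points in enumerate(detections):
        survivors = []
        for (x, y) in points:
            shaded = any(
                (x - xh) ** 2 + (y - yh) ** 2 <= (2.0 * sigmas[h]) ** 2
                for h in range(i + 1, len(sigmas))
                for (xh, yh) in detections[h]
            )
            if not shaded:
                survivors.append((x, y))
        kept.append(survivors)
    return kept


# Toy example: two neighboring responses and one distant response on scale 1.
small = [[0.0] * 9 for _ in range(9)]
small[4][4], small[4][5], small[0][8] = 5.0, 3.0, 4.0
points_small = detect_points(nms_single_scale(small, 2.0), 1.0)
points_large = [(5, 4)]  # hypothetical detection on the larger scale
result = multiscale_suppress([points_small, points_large], [1.0, 2.0])
print(points_small, result)
```

In the toy example, the weaker of the two neighboring responses is removed in Step 1, and the stronger one is then removed in Step 3 because it lies inside the disc of the larger-scale detection.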

Although nonmaximum suppression can effectively solve the problem of multiple responses, the single-scale suppression contains 3 loops, which is expensive in terms of time and computation. Therefore, the grid nonmaximum suppression is proposed. The steps are as follows:

Step 1 As shown in Fig. 5, the image is meshed into square cells with a side length of \(l=2R_{max}/3\). The cells form \(M^{cell}\) rows and \(N^{cell}\) columns. Every 4 adjacent cells are combined into a window. The windows form \(\left( M^{cell}-1\right)\) rows and \(\left( N^{cell}-1\right)\) columns. Thus an interior cell is contained in 4 windows.

Fig. 5

Every 4 adjacent cells are combined to form a sliding window (blue shadow)

Step 2 Search for the maximum value in the window, and mark it on the image \(I_i^{mark}\).

Find the maximum point \(P_{m,n}\) in the window of the mth row and nth column, whose coordinates are \(\left( l\left( m-1 \right) +{x_{m,n}},l\left( n-1 \right) +{y_{m,n}} \right)\), where \(x_{m,n}\), \(y_{m,n}\) are the coordinates of \(P_{m,n}\) inside the window. Increase the value \(I_i^{mark}\left( l\left( m-1 \right) +{x_{m,n}},l\left( n-1 \right) +{y_{m,n}} \right)\) by 1. The initial value of \(I_i^{mark}\) is 0. Repeat the search in all windows.

Step 3 Points marked four times are used as the output of the nonmaximum suppression. The binarized image \(I_i^{mark-4}\) describes whether a point is the output of the suppression algorithm.

$$\begin{aligned} I_{i}^{mark-4}=\left\{ \begin{array}{ll} 1 &{} when\ I_{i}^{mark}=4 \\ 0 &{} others \\ \end{array} \right. \end{aligned}$$
(11)

The suppressed image \(I_i^{assigned}\) is given by:

$$\begin{aligned} I_{i}^{assigned}={{I}_{i}}\cdot I_{i}^{mark-4} \end{aligned}$$
(12)

The optimized algorithm replaces the circular mask with a rectangular mask, decreasing the amount of computation. Given the number of all points in the image as S, the original algorithm has 3 loops and requires approximately \(4\pi R_{max}^2S\) comparison operations. The optimized algorithm requires 4S comparison operations. Experiments on multiple images show that the average operation time is shortened from 2.866 s to 0.2700 s, and the suppression effect is almost the same as that of the original algorithm.
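The grid nonmaximum suppression can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a nested list indexed as `img[y][x]`, the cell side `cell` is passed in as an integer number of pixels, ties inside a window go to the first pixel in scan order, and rows or columns left over after the division into cells are ignored. The function name is hypothetical.

```python
def grid_nms(img, cell):
    """Grid nonmaximum suppression with square cells of side `cell` pixels."""
    rows, cols = len(img), len(img[0])
    n_cell_rows, n_cell_cols = rows // cell, cols // cell
    marks = [[0] * cols for _ in range(rows)]
    # Each window is a 2 x 2 block of cells; adjacent windows share cells.
    for m in range(n_cell_rows - 1):
        for n in range(n_cell_cols - 1):
            best_y, best_x = m * cell, n * cell
            for y in range(m * cell, (m + 2) * cell):
                for x in range(n * cell, (n + 2) * cell):
                    if img[y][x] > img[best_y][best_x]:
                        best_y, best_x = y, x
            marks[best_y][best_x] += 1
    # Eqs. (11)-(12): keep only pixels that won all four windows covering them.
    return [[img[y][x] if marks[y][x] == 4 else 0 for x in range(cols)]
            for y in range(rows)]


# Toy example: a strong peak and a weaker peak that share one window.
image = [[0] * 12 for _ in range(12)]
image[5][5], image[6][7] = 9, 5
kept = grid_nms(image, 3)
print([(x, y, v) for y, row in enumerate(kept) for x, v in enumerate(row) if v])
```

The weaker peak wins only three of its four windows and is therefore suppressed. In this sketch, pixels in the border cells are covered by fewer than four windows and can never be kept, which is a property of the sketch rather than a claim about the original implementation.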

3.2 Hysteresis threshold segmentation based on LGSM

Hysteresis threshold segmentation detects a target using two gray level thresholds and two LGSM thresholds. The segmentation automatically selects different thresholds for different points of interest. Blob-like points use the low (loose) gray threshold, while points that may be connected to clutter use the high (strict) gray threshold. Points with a gray level higher than the high threshold are detected as real targets. For points with a gray level between the two thresholds, the LGSM determines whether the requirement of a low gray level and low gradient in the neighborhood is satisfied. A point that satisfies the strict (low) LGSM threshold is also judged to be a real target.

Therefore, the segmentation is expressed as follows:

Step 1 The filtered and suppressed image \(I_i^{sup}\) of scale \(\sigma _{i}\) should be segmented using the thresholds \(Th_i^{low\_gray}\) and \(Th_i^{high\_gray}\) \(\left( Th_i^{low\_gray}<Th_i^{high\_gray}\right)\). In formula (8) the threshold \(TH=Th_i^{low\_gray}\), which is used to determine whether a point is a point of interest for multiscale suppression. The results are shown as sets of points \(D_i^{loose\_gray}\) and \(D_i^{strict\_gray}\).

$$\begin{aligned} D_{i}^{loose\_gray}= \left\{ \left( x,y \right) |Th_{i}^{high\_gray}\ge {{I}_{i}}\left( x,y \right) >Th_{i}^{low\_gray} \right\} \end{aligned}$$
(13)
$$\begin{aligned} D_{i}^{strict\_gray}&= \left\{ \left( x,y \right) |{{I}_{i}}\left( x,y \right) >Th_{i}^{high\_gray} \right\} \end{aligned}$$
(14)

Step 2 Use the thresholds \(Th_i^{high\_LGSM}\) and \(Th_i^{low\_LGSM}\) \(\left( Th_i^{low\_LGSM}<Th_i^{high\_LGSM}\right)\) to segment the LGSM of the targets, and obtain sets of points \(D^{loose\_LGSM}\) and \(D^{strict\_LGSM}\).

$$\begin{aligned} {{D}^{loose\_LGSM}}= & {} \left\{ \left( x,y \right) |T{{h}^{high\_LGSM}}\ge LGSM\left( x,y \right) >T{{h}^{low\_LGSM}} \right\} \end{aligned}$$
(15)
$$\begin{aligned} {{D}^{strict\_LGSM}}= & {} \left\{ \left( x,y \right) |LGSM\left( x,y \right) \le T{{h}^{low\_LGSM}} \right\} \end{aligned}$$
(16)

Step 3 The results of the test are:

$$\begin{aligned} \left( {{D}^{loose\_LGSM}}\bigcap D_{i}^{strict\_gray} \right) \bigcup \left( {{D}^{strict\_LGSM}}\bigcap D_{i}^{loose\_gray} \right) \end{aligned}$$
(17)
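The decision rule of formulas (13) to (17) for a single candidate point can be sketched as follows. The function name and the packing of the thresholds into `(low, high)` pairs are hypothetical. The sketch follows formula (17) as written. Read literally, that formula does not include a candidate that satisfies both strict conditions at once, and the code makes this case visible in a comment.

```python
def hysteresis_detect(gray, lgsm, th_gray, th_lgsm):
    """Decide whether one candidate is a target, following Eqs. (13)-(17).

    th_gray and th_lgsm are (low, high) pairs with low < high.
    """
    low_gray, high_gray = th_gray
    low_lgsm, high_lgsm = th_lgsm
    loose_gray = low_gray < gray <= high_gray      # Eq. (13)
    strict_gray = gray > high_gray                 # Eq. (14)
    loose_lgsm = low_lgsm < lgsm <= high_lgsm      # Eq. (15)
    strict_lgsm = lgsm <= low_lgsm                 # Eq. (16)
    # Eq. (17) as written; a candidate with strict_gray and strict_lgsm
    # both true falls outside this union.
    return (loose_lgsm and strict_gray) or (strict_lgsm and loose_gray)


th_gray = (0.2, 0.5)     # hypothetical gray thresholds
th_lgsm = (0.01, 0.05)   # hypothetical LGSM thresholds
print(hysteresis_detect(0.3, 0.005, th_gray, th_lgsm))  # weak, clean surroundings
print(hysteresis_detect(0.3, 0.03, th_gray, th_lgsm))   # weak, near clutter
print(hysteresis_detect(0.8, 0.03, th_gray, th_lgsm))   # bright, near clutter
```

A weak point is accepted only when its surroundings are clean, while a bright point is still accepted when its LGSM lies between the two LGSM thresholds.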

The thresholds should adapt to the image. Maximizing the interclass variance is the most common method for threshold calculation. However, it is meant for the binarization of images, and small target detection requires a higher threshold. We use the threshold that follows this principle as the low threshold and set the high threshold based on it:

$$\begin{aligned} Th_{i}^{low\_gray}&= \underset{T}{\mathop {\text {argmax}}}\,\left[ \frac{{{\omega }_{O}}\left( T \right) }{1-{{\omega }_{O}}\left( T \right) }\times {{\left( {{\mu }_{O}}\left( T \right) -\mu \left( T \right) \right) }^{2}} \right] \end{aligned}$$
(18)
$$\begin{aligned} Th_{i}^{high\_gray}&= \alpha \cdot \left[ \max \left( {{I}_{i}} \right) -Th_{i}^{low\_gray} \right] +Th_{i}^{low\_gray} \end{aligned}$$
(19)

Here, \(\omega _O\) is the probability of the foreground, \(\mu _O\) is the expected value of the foreground, \(\mu\) is the expected value of the whole image, and \(\alpha\) is an adjustment parameter determined by experiments, which is taken as 0.4.
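A minimal sketch of formulas (18) and (19) is given below. It assumes that the input is a flat list of pixel values, that the foreground is the set of values strictly above the candidate threshold, and that candidate thresholds are taken from the values themselves. The quadratic search is chosen for readability; a histogram-based search would be the usual choice for full images. The function names are hypothetical.

```python
def low_gray_threshold(values):
    """Threshold maximizing the interclass-variance score of Eq. (18)."""
    n = len(values)
    mean_all = sum(values) / n
    best_t, best_score = None, -1.0
    for t in sorted(set(values)):
        foreground = [v for v in values if v > t]
        if not foreground:  # no foreground left above the largest value
            continue
        omega = len(foreground) / n
        mean_fg = sum(foreground) / len(foreground)
        score = omega / (1.0 - omega) * (mean_fg - mean_all) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def high_gray_threshold(th_low, max_value, alpha=0.4):
    """Eq. (19): move the low threshold a fraction alpha toward the maximum."""
    return alpha * (max_value - th_low) + th_low


pixels = [0.10, 0.12, 0.15, 0.20, 0.90, 0.95]  # toy filtered responses
th_low = low_gray_threshold(pixels)
th_high = high_gray_threshold(th_low, max(pixels))
print(th_low, th_high)
```

For the toy values, the low threshold separates the four dim responses from the two bright ones, and the high threshold lies 40% of the way from the low threshold to the maximum response.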

The LGSM itself can be viewed as a simplified variance on \(\varPsi\). \(\varPsi\) covers the possible edge clutter regions and nonclutter regions around the point. The nonclutter region occupies most of the entire image, while the edge clutter region occupies a smaller portion of the entire image. Therefore, the second-order origin moment of the entire image can be used as a low threshold for target detection.

$$\begin{aligned} T{{h}^{low\_LGSM}}=\frac{\sum \limits _{\left( x,y \right) \in \mathcal {W}}{{{\left[ {{I}^{g}}\left( x,y \right) \right] }^{2}}}}{S} \end{aligned}$$
(20)

Here, \(\mathcal {W}\) is the set of all the points in the image \(I^g\), and S is the number of all points in the image.

The optimal threshold for the image \(I^g\) can be obtained by maximizing the interclass variance between the foreground and the background. It can be assumed that most of the foreground area is composed of edge clutter regions, and a small part is composed of nonclutter regions. Therefore, the intra-class second-order origin moment of the foreground can be used as the loose threshold of the target detection.

$$\begin{aligned} T{{h}^{high\_LGSM}}=\frac{\sum \limits _{\left( x,y \right) \in \mathcal {F}}{{{\left[ {{I}^{g}}\left( x,y \right) \right] }^{2}}}}{{{S}_{O}}} \end{aligned}$$
(21)

In which, \(\mathcal {F}\) is the set of points in the foreground, and \(S_O\) is the number of all points in the foreground.
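Formulas (20) and (21) can be sketched as follows. The foreground threshold is passed in as a parameter and is assumed to come from an interclass-variance search on \(I^g\). The fallback used when the foreground is empty is an assumption of this sketch and is not described in the text. The function name is hypothetical.

```python
def lgsm_thresholds(grad, foreground_threshold):
    """Low and high LGSM thresholds of Eqs. (20) and (21)."""
    flat = [v for row in grad for v in row]
    foreground = [v for v in flat if v > foreground_threshold]
    th_low = sum(v * v for v in flat) / len(flat)               # Eq. (20)
    if not foreground:
        # Assumption: without a foreground, both thresholds coincide.
        return th_low, th_low
    th_high = sum(v * v for v in foreground) / len(foreground)  # Eq. (21)
    return th_low, th_high


toy_grad = [[0.0, 0.1], [0.1, 0.9]]  # toy normalized gradient image
print(lgsm_thresholds(toy_grad, 0.5))
```

Because the foreground contains the strongest gradients, its second-order origin moment is at least as large as that of the whole image, so the high threshold is never below the low one.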

Fig. 6

Original Image, phased result (upper) and compact result (lower) of the proposed algorithm. The smaller box represents the detection result of the multiscale LOG operator, and the larger circle represents the test result of the LGSM. The red mark means that the strict condition is met, and the green mark means that only the loose condition is satisfied

To verify the performance of the algorithm, we performed a simulation on ideal targets and clutters. In the upper part of Fig. 6, the small target detection in the simple background is carried out. In the image, the left column shows the small targets with the step edges, the right column shows the small targets with Gaussian edges, all the targets are sorted by radius or standard deviation.

As Fig. 6 shows, targets larger than 8 × 8 have a higher LGSM and will be excluded. The points of interest obtained by nonmaximum suppression are also shown in the phased results, and each target corresponds to a unique detection point. The LOG operator has limited performance in enhancing the target. The detection probability of small targets under a low signal-to-noise ratio (SNR) can be improved by other preprocessing algorithms.

The lower part of Fig. 6 shows small target detection in complex backgrounds, where the targets are the same as in the upper part and the background contains lines and corners of different edges and scales as clutter. When the signal-to-clutter ratio (SCR) is high enough, the algorithm is still able to detect the small targets and eliminate the interference of clutter at the same time. Gray level and LGSM are used simultaneously as the basis for segmentation, which effectively improves the performance of small target detection under a complex background.

3.3 Experiment and analysis

In order to verify the research ideas and compare the image effects before and after restoration, the following experiments were designed and carried out. We use a long-wavelength infrared (LWIR) detector with a working band of 8–14 µm. The noise-equivalent temperature difference (NETD) of the detector is 110 mK, and the minimum resolvable temperature difference (MRTD) is 580 mK. Since the NETD is much smaller than the MRTD of the detector, the random noise of the system will not affect the image quality.

Fig. 7

Phased results and final results of the proposed algorithm. I, II, III, IV represent the sampling frames of the 4 respective sequences. Phased Results show the phased detection results using dual threshold segmentation on the gray level and the LGSM. Final Results show the results of the hysteresis threshold segmentation

During the experiment, the drone hovered in the air, and the infrared lens was mounted on the ground. The height of the pan-tilt is negligible relative to the height of the drone. The take-off position of the drone is 350 m to 500 m from the camera, and the hovering height after the vertical take-off is 100 m. Therefore, the line-of-sight distance between the drone and the infrared lens is about 364 to 510 m. According to the principle of camera imaging, the size of the drone on the photosensitive element is about 0.1187 mm, less than 8 pixels, which meets the definition of small targets.
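The line-of-sight distances follow from the Pythagorean theorem applied to the ground distance and the hovering height. The short check below uses only the figures stated above; the function name is hypothetical.

```python
import math


def slant_range(ground_distance, height):
    """Line-of-sight distance from the camera to the hovering drone."""
    return math.hypot(ground_distance, height)


for ground in (350.0, 500.0):  # take-off distance from the camera, in metres
    print(f"{ground:.0f} m on the ground -> {slant_range(ground, 100.0):.1f} m")
```

The image size on the sensor additionally depends on the focal length and the physical size of the drone, which are not stated here, so that step is not reproduced.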

Four infrared image sequences are selected according to the type of interference they contain, as well as the size and intensity of the small targets. The left column in Fig. 7 shows one frame of each sequence. Images I and II contain a background of buildings and trees. Image III contains a small amount of background of buildings and trees. Image IV contains a small amount of cloud background and two targets, a strong one and a weak one. The phased results and final results are shown in Fig. 7. TOPHAT background suppression (Qi et al. 2013), NWIE (Deng et al. 2016b), AADM (Chen et al. 2007) and DOG (Han et al. 2014) algorithms are selected as the control group. The comparison of the experimental results of the different algorithms is shown in Fig. 8.

In the phased results in Fig. 7, the smaller box represents the detection result of the multiscale LOG operator, and the larger circle represents the test result of the LGSM. The red mark means that the strict condition is met, and the green mark means that only the loose condition is satisfied. The Final Results column shows a compact representation of the results. The marked points satisfy the hysteresis threshold condition.

Images I, II and III contain the interference of strong edges from buildings and trees, and the proposed algorithm shows a strong ability to eliminate false alarms. False alarms within a strong edge cannot pass the LGSM threshold, whereas real targets have a lower LGSM. For targets of low gray level, the loose threshold guarantees that the targets are detected. Image IV contains interference with a high gray level but weaker edges than the former images, so the adaptive LGSM threshold may be too strict to satisfy. In this situation, points with a higher gray level are still detected.

Fig. 8

Small target detection results. I, II, III, IV represent the sampling frames of the 4 respective sequences

In Fig. 8, the original image and results obtained using TOPHAT background suppression, NWIE, AADM, DOG and the proposed algorithm are presented. Some algorithms do not include a threshold value method. Therefore, the paper uses the maximum entropy method for segmentation.

The statistics of the image detection results are shown in Table 1.

Table 1 Statistics of small target detection

In images I, II and III, the TOPHAT background suppression, AADM, and DOG algorithms fail to detect the target points, and some false alarm points are detected. The multiscale hysteresis threshold detection algorithm detects the target point without a false alarm in images I and II and detects the target point with a false alarm in image III. Image IV contains a highlighted cloud layer and two targets. Due to the existence of the strong target, TOPHAT and AADM fail to detect the weak target by maximum entropy threshold segmentation. The hysteresis threshold algorithm detects both targets with 4 false alarms. The NWIE algorithm works poorly on images with a complex background, detecting clusters of false alarms in images I, II and IV with no targets detected.

The proposed algorithm costs more time than the TOPHAT and DOG methods, but its running time is comparable to that of AADM, a state-of-the-art algorithm.

Fig. 9

ROC curves of the image sequence detection results. I, II, III, IV represent the sampling frames of the 4 sequences

In Fig. 9, we changed the threshold of each detection algorithm and plotted the ROC curve of each image sequence according to the detection probability and false alarm probability. Since the NWIE algorithm works poorly in complex infrared images, the ROC curve is not calculated.

In the case of single threshold segmentation, the univariate algorithm makes the ROC curve a monotonically increasing curve. Since the proposed algorithm uses dual thresholds on two variables, when different thresholds are traversed, abnormal situations may occur when there is an inappropriate match between the thresholds. The same false alarm probability corresponds to different detection probabilities, or the high false alarm probability corresponds to a low detection probability. Therefore, the ROC of the proposed algorithm is not a monotonically increasing curve.

In I, II, and III, the ROC curve of the proposed algorithm is much higher than those of the other algorithms. In II and III, as the threshold changes, the maximum detection probability of the proposed algorithm exceeds that of the other algorithms. This finding means that the proposed algorithm can detect small targets that other algorithms cannot detect, and that its detection ability for small targets is much higher than that of the others.

In IV, when the false alarm probability is lower than 0.15, the proposed algorithm has no obvious advantage. However, when the false alarm probability is high, the detection probability of the proposed algorithm is still much higher than that of the other algorithms. The gradient change in the highlighted cloud background is small, which lowers the LGSM adaptive threshold. The small target is then more likely to be judged as a high LGSM point and to be segmented using the high gray threshold, which results in a lower detection probability than the others. When the threshold values become loose, the target is more likely to be judged as a low LGSM point and to be segmented using the low gray threshold. This leads to a rapid rise in the probability of detection and explains why the maximum value of the detection probability is superior to that of the other algorithms. Since blob-like points use a low gray threshold for segmentation, the proposed algorithm can detect weaker targets that other algorithms cannot, and obtains a higher detection probability.

The multiscale hysteresis threshold detection algorithm has a significant suppression effect on complex backgrounds such as buildings, trees and highlighted clouds. Experiments show that the calculation time of the algorithm is comparable to that of state-of-the-art algorithms. At the same time, the hysteresis threshold detection algorithm extracts the coordinates and scales and eliminates multiple detections, which is beneficial for verification or tracking based on the small target detection results.

4 Conclusion

The LGSM indicator is proposed to measure the geometric features in the neighborhood of the point of interest. Combined with the scale space theory and the LGSM, a multiscale hysteresis threshold detection algorithm is proposed to eliminate stubborn false alarms. We measure the local neighborhood gradient of the image on multiple scales, set the hysteresis thresholds, eliminate the false alarms from buildings, trees and clouds, and achieve the detection of a low gray level target. Experiments show that the algorithm works satisfactorily on removing false alarms, and maintains a high detection probability and a low false alarm probability in a complex background.