
1 Introduction

Target recognition is one of the most important research topics in computer vision, image processing and machine learning. In recent years, with the rapid development of infrared imaging technology in the field of remote sensing, the detection of dim and small moving infrared targets has also become a research hotspot in image processing [1,2,3,4,5].

It is difficult to obtain the shape, texture and other features of dim and small targets. As a result, traditional extended-target tracking methods, such as moment invariants based on the geometric features of the target, template matching based on texture features, and mean shift based on the grayscale and probability density of the target, can hardly guarantee detection and tracking performance for weak and small targets [6]. Instead, only the dynamic characteristics of the target across the image sequence, i.e., the inter-frame correlation, can be exploited for detection and tracking. Particle filtering has developed into a critical research direction for small-target tracking because of its good performance in non-linear and non-Gaussian scenes [7, 8]. However, when tracking relies on a single target feature, the traditional particle filter is prone to losing the target under strong noise interference, so extracting multiple features of the target and integrating them into the particle filter helps to improve detection stability. To detect dim and small targets against the complex backgrounds of infrared image sequences, this paper first uses a multi-frame difference method to suppress the background, and then uses the grayscale and motion features of small targets to remove false points. In addition, to address the large computational cost of the particle filter and the sample-impoverishment problem of resampling, a krill herd optimization strategy is introduced to update the particle weights when tracking dim and small targets in infrared image sequences.

2 Algorithm Framework

In this paper, the algorithm is divided into four modules: preprocessing, gray feature extraction, motion feature extraction and improved particle filter algorithm optimized by krill herd. The framework of the entire algorithm is as follows (Fig. 1):

Fig. 1. Algorithm framework

The details of the proposed algorithm are given in Sects. 2.1–2.4 below.

2.1 Preprocessing

Background suppression is an essential preprocessing step for improving the detection accuracy of infrared dim targets and accurately separating background, noise and target points. Common backgrounds can be divided into stationary and non-stationary backgrounds. Pixels in a stationary background have strong spatial correlation; the false points mainly come from detector noise, which causes small changes in the background, and they can be suppressed by spatial filtering. Pixels in a non-stationary background have strong temporal correlation, so frame differencing can be applied before threshold segmentation to highlight the target and suppress the background.

Infrared images containing small targets consist of targets, noise and background. The specific model is as follows:

$$ \begin{array}{*{20}c} {f\left( {x,y} \right) = f_{T} \left( {x,y} \right) + f_{B} \left( {x,y} \right) + n\left( {x,y} \right)} \\ \end{array} $$
(1)

Where \( f\left( {x,y} \right) \) is the infrared image, and \( f_{T} \left( {x,y} \right) \), \( f_{B} \left( {x,y} \right) \) and \( n\left( {x,y} \right) \) represent the target, background and noise, respectively [9].

Because of the long distance, the target usually appears like a dot, and the background is a large area of gentle variation with strong pixel correlation. The infrared radiation intensity of the target is generally higher than that of the surrounding background. Therefore, small targets and noises appear as isolated bright spots in infrared images, and the gray value is much larger than the pixels in their neighborhood.

Considering the non-stationary background of common infrared remote sensing images and the long imaging distance, this article focuses on small targets moving at a constant speed under non-stationary backgrounds. This paper adopts the average of the first few frames as the predicted background, and then segments the images to obtain suspected targets.

Since the target occupies very few pixels in a single frame, the previous k frames can be averaged to serve as the predicted background. If the original image is \( f\left( {x,y,t_{n} } \right) \), then the predicted background image \( f^{\prime}\left( {x,y,t_{n} } \right) \) is:

$$ \begin{array}{*{20}c} {f^{\prime}\left( {x,y,t_{n} } \right) = \frac{1}{k}\mathop \sum \limits_{n = 1}^{k} f\left( {x,y,t_{n} } \right)} \\ \end{array} $$
(2)

Experiments show that k = 10 strikes a good balance between computational cost and robustness.

The difference processing is made between the original image \( f\left( {x,y,t_{n} } \right) \) and the predicted background image \( f^{\prime}\left( {x,y,t_{n} } \right) \) to obtain the sequence difference image \( g\left( {x,y,t_{n} } \right) \) after the background is removed:

$$ \begin{array}{*{20}c} {g\left( {x,y,t_{n} } \right) = \left| {f\left( {x,y,t_{n} } \right) - f^{\prime}\left( {x,y,t_{n} } \right)} \right|} \\ \end{array} $$
(3)
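The background prediction of Eq. (2) and the differencing of Eq. (3) can be sketched as follows. This is a minimal illustration in pure Python, representing grayscale images as nested lists; the function names are our own and not from the paper:

```python
def predict_background(frames):
    """Average the first k frames pixel-wise to estimate the background (Eq. 2)."""
    k = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[i][j] for f in frames) / k for j in range(cols)]
            for i in range(rows)]

def difference_image(frame, background):
    """Absolute difference between a frame and the predicted background (Eq. 3)."""
    rows, cols = len(frame), len(frame[0])
    return [[abs(frame[i][j] - background[i][j]) for j in range(cols)]
            for i in range(rows)]
```

In practice the paper uses k = 10 previous frames; a production implementation would use an array library rather than nested lists.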

The results of the preprocessing experiment are as follows:

Fig. 2. Preprocessing results.

Among them, (a), (b) and (c) in Fig. 2 are the three kinds of backgrounds in the video sequence, and (d), (e) and (f) are the background suppression results calculated by Eqs. (2) and (3). As can be seen from the figure, the algorithm can suppress the background well and highlight the target point using the gray and motion features of the targets.

2.2 Gray Feature Extraction

After the background image is predicted, the difference image can be segmented with a threshold T to obtain an image \( F\left( {x,y,t_{n} } \right) \) in which the pixels with non-zero gray values form the set of suspected target points:

$$ \begin{array}{*{20}c} {F\left( {x,y,t_{n} } \right) = \left\{ {\begin{array}{*{20}l} {0,} \hfill & {g\left( {x,y,t_{n} } \right) < T} \hfill \\ {g\left( {x,y,t_{n} } \right), } \hfill & {g\left( {x,y,t_{n} } \right) \ge T} \hfill \\ \end{array} } \right.} \\ \end{array} $$
(4)

Where \( F\left( {x,y,t_{n} } \right) \) is the segmented map of the suspected target point set, \( g\left( {x,y,t_{n} } \right) \) is the sequence difference image after background removal, and T is a segmentation threshold selected empirically.
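Eq. (4) amounts to zeroing out sub-threshold pixels while keeping the difference values of the rest. A minimal sketch (our own function name, nested-list images as before):

```python
def segment(diff_img, T):
    """Apply Eq. (4): pixels below threshold T are set to zero; pixels at or
    above T keep their difference value and form the suspected-target set."""
    return [[v if v >= T else 0 for v in row] for row in diff_img]
```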

Traversing the image \( F\left( {x,y,t_{n} } \right) \) with 4-neighborhood connectivity, we can find all 4-connected regions of suspected target points. Because the spots of infrared dim and small targets are usually slightly larger than noise points, we sort the connected regions by size and keep the k largest as candidate target points; the others are discarded as noise. At this point, we have obtained the grayscale features of the suspected target points together with their position information.
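The 4-connected region selection described above can be sketched with a breadth-first traversal. This is an illustrative pure-Python version (the function name is ours); a practical implementation would typically use a labeling routine such as `scipy.ndimage.label`:

```python
from collections import deque

def top_k_components(binary_img, k):
    """Label 4-connected regions of non-zero pixels and keep the k largest
    as candidate targets; smaller regions are treated as noise."""
    rows, cols = len(binary_img), len(binary_img[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for i in range(rows):
        for j in range(cols):
            if binary_img[i][j] and not seen[i][j]:
                comp, queue = [], deque([(i, j)])
                seen[i][j] = True
                while queue:                      # BFS over the 4-neighborhood
                    x, y = queue.popleft()
                    comp.append((x, y))
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < rows and 0 <= ny < cols
                                and binary_img[nx][ny] and not seen[nx][ny]):
                            seen[nx][ny] = True
                            queue.append((nx, ny))
                comps.append(comp)
    comps.sort(key=len, reverse=True)             # largest regions first
    return comps[:k]
```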

2.3 Motion Feature Extraction

According to the positional correlation of the movement of the small targets between frames, the distance of the candidate target points between frames can be used as the basis for discrimination. In this paper, the shortest Euclidean distance is used as the trajectory of the target points and the initial target position is determined from the candidate target points.

$$ \begin{array}{*{20}c} {D_{k}^{n} = \sqrt {\left| {c\left( {x_{k}^{n} } \right) - c\left( {x_{k}^{n - 1} } \right)} \right|^{2} + \left| {c\left( {y_{k}^{n} } \right) - c\left( {y_{k}^{n - 1} } \right)} \right|^{2} } } \\ \end{array} $$
(5)

Where \( D_{k}^{n} \) is the distance of the kth candidate target point between the nth and the n−1th frames, \( c\left( {x_{k}^{n} } \right) \) and \( c\left( {y_{k}^{n} } \right) \) are the centroid x and y coordinates of the kth candidate target point in the nth frame, and \( c\left( {x_{k}^{n - 1} } \right) \) and \( c\left( {y_{k}^{n - 1} } \right) \) are the corresponding centroid coordinates in the n−1th frame. According to the experimental results of this paper, k takes 4.

Infrared dim targets move slowly on the imaging plane, sometimes less than one pixel between frames. Because noise appears randomly and shows no inter-frame correlation, the motion distance of candidate target points between two adjacent frames serves as an effective criterion for eliminating noise points.
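The motion criterion above can be sketched as follows: compute the Eq. (5) distance from each current candidate to its nearest predecessor and reject candidates that moved too far. The function names and the `d_max` threshold are our own illustrative choices:

```python
import math

def frame_distance(c_now, c_prev):
    """Euclidean distance between a candidate's centroids in two frames (Eq. 5)."""
    return math.hypot(c_now[0] - c_prev[0], c_now[1] - c_prev[1])

def filter_by_motion(cands_now, cands_prev, d_max):
    """Keep candidates whose nearest previous-frame centroid lies within d_max
    pixels; noise points appear at random positions and fail this test."""
    return [c for c in cands_now
            if min(frame_distance(c, p) for p in cands_prev) <= d_max]
```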

2.4 Improved Particle Filter Algorithm Optimized by Krill Herd

Assuming that the target is moving at a constant speed, the state variable of the target is defined as \( S = \left\{ {x,y,v_{x} ,v_{y} } \right\} \), where \( \left( {x,y} \right) \) is the position of the center of the target, \( \left( {v_{x} ,v_{y} } \right) \) is the horizontal and vertical speed of the target. We define the state transition matrix \( F \) and random noise \( W_{k} \) as:

$$ \begin{array}{*{20}c} {F = \left[ {\begin{array}{*{20}c} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{array} } \right],\;W_{k} = \left[ {\begin{array}{*{20}c} {S_{p} *rand} \\ {S_{p} *rand} \\ {S_{v} *rand} \\ {S_{v} *rand} \\ \end{array} } \right]} \\ \end{array} $$
(6)

Where \( S_{p} \) is the noise amplitude constant for the position components and \( S_{v} \) is the noise amplitude constant for the velocity components. According to the experiments, \( S_{p} \) takes 25 and \( S_{v} \) takes 5.
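The constant-velocity transition of Eqs. (6)–(7) can be written out directly rather than as a matrix product. A minimal sketch, assuming the paper's unspecified "rand" is zero-mean uniform noise (the function name and the default scales are illustrative):

```python
import random

def propagate(state, s_p=25.0, s_v=5.0):
    """One step of X_k = F X_{k-1} + W_k (Eqs. 6-7) for S = [x, y, vx, vy].
    Noise is assumed zero-mean uniform, scaled by s_p for the position
    components and s_v for the velocity components."""
    x, y, vx, vy = state
    return [x + vx + s_p * random.uniform(-1, 1),
            y + vy + s_p * random.uniform(-1, 1),
            vx + s_v * random.uniform(-1, 1),
            vy + s_v * random.uniform(-1, 1)]
```

With the noise scales set to zero, the update reduces to the deterministic constant-velocity motion encoded by F.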

The steps of the improved particle filter algorithm optimized by krill herd are as follows (Fig. 3):

Fig. 3. The steps of the improved particle filter algorithm optimized by krill herd.

Step 1: Initialization

Establish the target motion model:

$$ \begin{array}{*{20}c} {X_{k} = FX_{k - 1} + W_{k} } \\ \end{array} $$
(7)

Where \( F \) is the state transition matrix, \( W_{k} \) is the random noise.

Sample independent and identically distributed particles \( \left\{ {x_{k}^{i} ,\frac{1}{N}} \right\}_{i = 1}^{N} \) according to

$$ \begin{array}{*{20}c} {x_{k}^{i} \sim q\left( {x_{k}^{i} |x_{k - 1}^{i} ,z_{k} } \right)} \\ \end{array} $$
(8)

Set iterations \( I \) to 1, set krill group size \( N \), maximum induction speed \( N^{max} \), maximum foraging speed \( V_{f} \), maximum random diffusion speed \( D^{max} \), time interval \( \Delta t \), maximum number of iterations \( I_{max} \), the inertia weight of induced motion \( w_{n} \) and inertial weight \( w_{f} \) of foraging behavior.

Step 2: Prediction

Calculate the weight of the new particle set \( \left\{ {x_{k}^{i} ,w_{k}^{i} } \right\}_{i = 1}^{N} \) according to the current observed value \( z_{k} \):

$$ \begin{array}{*{20}c} {w_{k}^{i} = w_{k - 1}^{i} \frac{{p\left( {z_{k} |x_{k}^{i} } \right)p\left( {x_{k}^{i} |x_{k - 1}^{i} } \right)}}{{p\left( {x_{k}^{i} |x_{k - 1}^{i} ,z_{k} } \right)}}} \\ \end{array} $$
(9)

Use the state function of Eq. (7) to calculate the state value \( x_{k}^{i} \) of the particle at the next moment, then use Eq. (9) to calculate the weight of each particle. The particle state value \( x_{k}^{i} \) is used as the position of krill \( X_{i} \).

Step 3: Update

$$ \begin{array}{*{20}c} {N_{i}^{new} = N^{max} \alpha_{i} + w_{n} N_{i}^{old} } \\ \end{array} $$
(10)
$$ \begin{array}{*{20}c} {\alpha_{i} = \alpha_{i}^{local} + \alpha_{i}^{target} } \\ \end{array} $$
(11)

Where \( N_{i}^{new} \) is the induced motion speed of the ith krill, \( N^{max} \) is the maximum induced speed, \( \alpha_{i} \) is the induced direction of the ith individual, \( \alpha_{i}^{local} \) is the direction induced by neighboring individuals, \( \alpha_{i}^{target} \) is the direction induced by the optimal individual, and \( w_{n} \in \left[ {0,1} \right] \) is the inertial weight of the previous induced motion.

$$ \begin{array}{*{20}c} {F_{i}^{new} = V_{f} \beta_{i} + w_{f} F_{i}^{old} } \\ \end{array} $$
(12)
$$ \begin{array}{*{20}c} {\beta_{i} = \beta_{i}^{food} + \beta_{i}^{best} } \\ \end{array} $$
(13)

Where \( F_{i}^{new} \) is the foraging speed vector of the ith krill, \( V_{f} \) is the maximum foraging speed, \( \beta_{i}^{food} \) is the attraction of food to the ith krill, \( \beta_{i}^{best} \) is the effect of the best fitness found so far by the ith krill, \( w_{f} \in \left[ {0,1} \right] \) is the inertia weight of foraging behavior, and \( F_{i}^{old} \) is the foraging speed vector of the previous iteration.

$$ \begin{array}{*{20}c} {D_{i}^{new} = D^{max} \left( {1 - \frac{I}{{I_{max} }}} \right)\delta } \\ \end{array} $$
(14)

Where \( D^{max} \) is the maximum speed of random diffusion, \( I_{max} \) is the maximum number of iterations, and \( \delta \) is a directional vector with each variable obeying [−1, 1] uniform distribution.

$$ \begin{array}{*{20}c} {X_{i}^{l} = X_{i}^{l - 1} + \Delta t\left( {N_{i}^{new} + F_{i}^{new} + D_{i}^{new} } \right)} \\ \end{array} $$
(15)

Where \( \Delta t \) is the time interval.

Use Eqs. (10), (12) and (14) to calculate the induced motion, foraging and random diffusion of each krill individual, and then calculate the position change of each krill in each iteration according to Eq. (15).
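The update step can be sketched for a single (scalar) krill coordinate. This illustrative function (our own naming) combines Eqs. (10), (12), (14) and (15); the direction terms `alpha`, `beta`, `delta` are assumed to be precomputed as in the equations above:

```python
def krill_step(pos, n_old, f_old, alpha, beta, delta, it, params):
    """One krill motion update: induced motion (Eq. 10), foraging (Eq. 12),
    random diffusion (Eq. 14), then the position change (Eq. 15)."""
    n_new = params['N_max'] * alpha + params['w_n'] * n_old   # Eq. (10)
    f_new = params['V_f'] * beta + params['w_f'] * f_old      # Eq. (12)
    d_new = params['D_max'] * (1 - it / params['I_max']) * delta  # Eq. (14)
    pos_new = pos + params['dt'] * (n_new + f_new + d_new)    # Eq. (15)
    return pos_new, n_new, f_new
```

In the full algorithm each particle's state value serves as a krill position and this step is applied per dimension on every iteration until the termination condition of Step 5 is met.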

Step 4: Cross operation

$$ \begin{array}{*{20}c} {C_{r} = 0.2\hat{K}_{i,best} } \\ \end{array} $$
(16)

Where \( C_{r} \) is the crossover probability [10,11,12], \( \hat{K}_{i,best} \) represents the best fitness of the ith krill.

Calculate the crossover probability according to Eq. (16) and perform crossover operation on individual krill groups.

Step 5: Determine the Termination Conditions

If the maximum number of iterations \( I_{max} \) has been reached, continue to Step 6; otherwise return to Step 3.

Step 6: Weight Normalization

$$ \begin{array}{*{20}c} {w_{k}^{i} = \frac{{w_{k}^{i} }}{{\mathop \sum \nolimits_{i = 1}^{N} w_{k}^{i} }}} \\ \end{array} $$
(17)
$$ \begin{array}{*{20}c} {\hat{x}_{k} = \mathop \sum \limits_{i = 1}^{N} w_{k}^{i} x_{k}^{i} } \\ \end{array} $$
(18)

Recalculate the weight of each particle according to Eq. (9), normalize the weights according to Eq. (17), and finally output the estimated state according to Eq. (18).
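The normalization and weighted-mean estimate of Eqs. (17)–(18) can be sketched as follows (illustrative function name, particles as plain lists of state values):

```python
def estimate_state(particles, weights):
    """Normalize the particle weights (Eq. 17) and return the weighted mean
    of the particle states as the output estimate (Eq. 18)."""
    total = sum(weights)
    norm = [w / total for w in weights]          # Eq. (17)
    dim = len(particles[0])
    return [sum(w * p[d] for w, p in zip(norm, particles))  # Eq. (18)
            for d in range(dim)]
```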

Step 7: Iteration

Repeat steps 2 to 6 until the prediction is completed.

3 Simulation Results and Analysis

Three sets of video sequences with different backgrounds were selected to evaluate the performance of the proposed method. The first test video came from the dataset of Hui Bingwei et al. [13] for detecting and tracking dim and small aircraft targets in infrared images under ground/air backgrounds. The simulation experiments were performed on a hardware platform with an AMD Ryzen 5 4600H CPU at 3.00 GHz and 16 GB of memory, and the development environment was Matlab R2017b (Table 1).

Table 1. Properties of experimental scenes.

The experiment compares the classical particle filtering algorithm with the improved method of this paper in terms of per-frame tracking time and tracking error under three common backgrounds. The tracking error is defined as the Euclidean distance between the target centroid given by the tracking algorithm and the calibrated target centroid:

$$ \begin{array}{*{20}c} {error = \sqrt {\left( {x - x^{*} } \right)^{2} + \left( {y - y^{*} } \right)^{2} } } \\ \end{array} $$
(19)

The number of particles in the algorithm is 300.
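The error metric of Eq. (19) is a straightforward Euclidean distance; a minimal sketch (our own function name):

```python
import math

def tracking_error(estimated, calibrated):
    """Euclidean distance between the tracked centroid (x, y) and the
    calibrated ground-truth centroid (x*, y*) per Eq. (19)."""
    return math.hypot(estimated[0] - calibrated[0],
                      estimated[1] - calibrated[1])
```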

Video sequence 1 is the background with ground. (a) and (b) in Fig. 4 are the tracking effects of the basic particle filtering algorithm and the improved particle filtering algorithm on the video, respectively, from the 30th, 120th, 180th, and 220th frames of the video. Figure 4(c) is the comparison of the single frame operation time of the two algorithms, and Fig. 4(d) is the comparison of the tracking errors of the two algorithms.

Fig. 4. Tracking results under background with ground in Experiment 1.

It can be seen from Fig. 4(a1) and (b1) that the basic particle filtering algorithm is disturbed and its centroid shifts during the first few frames of the video, while the particle filter optimized by the krill herd tracks the target more stably and shows better robustness. Fig. 4(c) shows that adding particle optimization does not greatly extend the computation time: the computation times of the two algorithms are similar, with the basic particle filter slightly faster. Fig. 4(d) shows that the error of the optimized particle filter is smaller than that of the basic particle filter in the middle of the video.

Video sequence 2 is the sea-sky background. (a) and (b) in Fig. 5 are the tracking results of the basic particle filtering algorithm and the improved particle filtering algorithm on the video, respectively from the 30th, 120th, 180th, and 220th frames of the video. Figure 5(c) is the comparison of the single frame operation time of the two algorithms, and Fig. 5(d) is the comparison of the tracking errors of the two algorithms.

Fig. 5. Tracking results under background with sea and sky in Experiment 2.

As can be seen from Fig. 5(a1) and (b1), the gray level of the background in this video is relatively uniform, so the target characteristics are more obvious and both algorithms perform well. Figure 5(c) shows that the computation times of the two algorithms are similar, with the basic particle filter slightly faster. Moreover, Fig. 5(d) shows that the errors of the two algorithms are about the same at the beginning and end of the video, while the error of the optimized particle filter is smaller than that of the basic particle filter in the middle of the video.

Video sequence 3 is a background with clouds. (a) and (b) in Fig. 6 are the tracking results of basic particle filtering algorithm and improved particle filtering algorithm on the video, respectively from the 40th, 80th, 160th, and 200th frames of the video. Figure 6(c) is the comparison of the single frame operation time of the two algorithms, and Fig. 6(d) is the comparison of the tracking errors of the two algorithms.

Fig. 6. Tracking results under background with clouds in Experiment 3.

It can be seen from Fig. 6(a1) and (b1) that against backgrounds whose grayscale is close to the target's, the basic particle filtering algorithm suffers interference and sometimes loses the target, while the particle filter optimized by the krill herd tracks the target more stably and is more robust. Figure 6(c) shows that the computation times of the two algorithms are similar, with the basic particle filter slightly faster. Furthermore, Fig. 6(d) shows that the krill-herd-optimized particle filter remains highly robust under this strongly interfering background.

Considering that particle filtering is a probabilistic algorithm and its results vary from run to run, each experiment was run twelve times and the results averaged. The measured data of the two algorithms under the three backgrounds are shown in Tables 2 and 3:

Table 2. Processing time of algorithm
Table 3. Rate of algorithm

4 Conclusions

The above experiments demonstrate that the proposed preprocessing algorithm, which exploits grayscale and motion features, can effectively suppress the background and highlight small dim targets in complex environments. The algorithm then tracks the target with a particle filter based on krill herd optimization. Although some computation speed is sacrificed, the robustness and accuracy of tracking are effectively improved, so infrared dim and small targets can be detected accurately.