1 Introduction

Moving targets detected and tracked by airborne radars such as synthetic aperture radar (SAR), and in particular the slow-moving targets discussed in much of the literature, can be defined from two aspects. First, in terms of speed, a target is called slow-moving relative to the rapid movement of the radar platform (100–300 m/s), so common wheeled and tracked ground vehicles all belong to the domain of slow-moving targets, whereas other targets such as low-flying cruise missiles do not. Second, in terms of frequency spectrum, a slow-moving target is defined relative to static targets. An ordinary airborne imaging radar system only considers the focusing of static ground targets. The echoes of slow-moving targets fall into the main lobe clutter, and because their motion parameters differ from those of static targets, slow-moving targets appear defocused and shifted in radar images.

Because the echoes from slow-moving targets are mixed with the main lobe clutter, these targets are difficult to detect and locate. Yet many of them are military targets, such as tanks and fighting vehicles, that deserve significant attention. In SAR-based moving target tracking, early research focused on tracking ice motion at sea. For example, Kwok [1] analyzed the motion of icebergs in the sea area of Alaska from SAR image series using a simple feature- and region-based tracking algorithm. Daida [2] presented an object-oriented feature tracking technique. Strozzi [3] proposed a density tracking technology for glaciers. Yang [4] studied the adaptive subspace filtering method and its application in moving target tracking. Kirubarajan [5] proposed an interacting multiple model (IMM) filtering based tracking method, which overcomes shortcomings of the traditional Kalman filter, for example its difficulty in tracking complex movements.

In the field of ground target tracking using SAR image series, Xia [6] proposed a moving ground target detection method based on SAR multi-look image sequence tracking, in which an improved dynamic programming approach with directional constraints is used to track the moving target. Henke [7] presented a novel method for moving-target tracking using single-channel SAR with a large antenna beamwidth; its main technologies include subaperture SAR processing, image statistics, and multitarget unscented Kalman filtering. Gao [8] studied a detection and tracking algorithm for moving targets in SAR images using a particle filter based on the track-before-detect idea. These methods can improve the detection performance for weak targets, but they ignore the inherent target information in high resolution SAR images and do not make full use of the size, structure, direction and deeper features of the targets.

With the gradual improvement of SAR imaging resolution and system integration, an important research direction is how to recognize and track multi-class slow-moving targets based on SAR imaging and high resolution image series, especially by using the feature information of the targets. Based on large-scale scene, high resolution airborne SAR image series, this paper realizes an efficient classification and tracking method for multi-class slow-moving ground targets. The method combines local multi-resolution analysis with a multiple kernel classifier for recognition, and uses the unscented Kalman filter (UKF) as the tracking filter.

The remainder of this paper is organized as follows. Sect. 2 presents a robust feature extraction and multiple target recognition method based on the visual attention mechanism. Sect. 3 introduces a combined recognition and tracking method based on the “what” and “where” pathways information processing mechanism. In Sect. 4, several simulation experiments are carried out to verify the effectiveness of the method. Finally, Sect. 5 concludes the paper.

2 Robust Feature Extraction and Multiple Target Recognition Based on Visual Attention Mechanism

SAR target recognition involves two concerns. First, a robust and low-dimensional feature extraction method suited to the image characteristics is needed, especially for SAR images with heavy speckle and complicated backgrounds. Second, a highly efficient classifier should be used to classify the targets with high precision.

2.1 Robust Feature Extraction Based on Visual Attention Mechanism

For the classification and tracking of slow-moving ground targets in high resolution SAR images, without loss of generality, the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset is taken as the object of investigation. The target chips in the MSTAR dataset have the following characteristics: all chips have the same image size; there is only one target in each chip; the target lies in the center of the chip; and all targets have the same resolution and are distributed around the chip centers at various angles. Image examples of the three classes of targets in the MSTAR dataset are shown in Fig. 1. In view of this, following the method in reference [9], an image-based multilevel difference of Gaussian (DOG) like scale space is constructed based on the receptive field model and the scale invariant feature transform (SIFT) method. Then, an 8-neighborhood orthogonal basis is designed, with which an image can be processed by a multi-level sampling filter. In this way, features in eight directions and one low frequency filtering feature of the image are obtained. The structure of the 8-neighborhood orthogonal basis is shown in Fig. 2.

Fig. 1. Image examples of the three classes of targets (BMP2, BTR70 and T72) in the MSTAR dataset

Fig. 2. 8-neighborhood orthogonal basis and its frequency spectrum

Using the sampling filter idea of the traditional wavelet, a local extension sampling method based on the visual attention mechanism is presented. It extends from a local point to its surroundings and guarantees that the generated basis is aligned with the local region. In this way, the same local characteristics have similar projection coefficients onto the basis, which benefits feature description and target recognition. To implement this method, an image is processed by multistage filtering that starts from a local point (for example, a point of interest) and uses a fast filter, which amounts to a multi-resolution decomposition of the image. As a result, the DOG like scale space of the original image is obtained. Then, by directly sampling the key pixels in the image at each stage, the local multi-resolution features of the original image can be acquired rapidly.

For the images in the MSTAR dataset, the targets are first detected using the constant false alarm rate (CFAR) method. Since the targets only occupy the central location of the chips, for convenience the central 81 × 81 area of each 128 × 128 chip is taken as the region under study. The resulting image is then processed by a 4-level local multi-resolution decomposition. For feature extraction, pixels are chosen directly from each level of the image as follows. In the highest level, the image size is 3 × 3, and all 9 pixels are chosen as a 9-dimensional feature. In the second level, the eight 3 × 3 image blocks corresponding to the 8 peripheral pixels of the highest level are chosen, so the feature dimension is 72. The third level is treated similarly: the eight 3 × 3 image blocks corresponding to the block centers in the second level are selected, so the feature dimension is also 72. In the fourth level, the 8 peripheral central pixels are selected directly, so the feature dimension is 8. For a given image, the total feature dimension is therefore 9 + 72 + 72 + 8 = 161.
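The sampling scheme above can be illustrated with a minimal sketch. It is not the filter of reference [9]: the 8-neighborhood basis is replaced here by a plain 3 × 3 block mean as a stand-in, and the mapping of "block centers" from one level to the next is one possible reading of the description. The function names `block_mean` and `local_multires_features` are hypothetical. The sketch only shows how an 81 × 81 chip yields the 3 × 3, 9 × 9, 27 × 27 and 81 × 81 levels and a 161-dimensional vector.

```python
import numpy as np

def block_mean(img, k=3):
    """Downsample by averaging non-overlapping k x k blocks (stand-in filter)."""
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def local_multires_features(chip):
    """Return a 161-dimensional feature vector from an 81 x 81 chip."""
    assert chip.shape == (81, 81)
    lvl4 = chip.astype(float)   # 81 x 81 (finest level)
    lvl3 = block_mean(lvl4)     # 27 x 27
    lvl2 = block_mean(lvl3)     # 9 x 9
    lvl1 = block_mean(lvl2)     # 3 x 3 (highest level)

    # The 8 peripheral positions of a 3 x 3 grid (center excluded).
    ring = [(r, c) for r in range(3) for c in range(3) if (r, c) != (1, 1)]

    # Level 1: all 9 pixels.
    f1 = lvl1.ravel()
    # Level 2: the eight 3 x 3 blocks under the 8 peripheral level-1 pixels.
    f2 = np.concatenate([lvl2[3*r:3*r+3, 3*c:3*c+3].ravel() for r, c in ring])
    # Level 3: the eight 3 x 3 blocks under the centers of the level-2 blocks.
    centers2 = [(3*r + 1, 3*c + 1) for r, c in ring]
    f3 = np.concatenate([lvl3[3*r:3*r+3, 3*c:3*c+3].ravel() for r, c in centers2])
    # Level 4: the single pixel under the center of each level-3 block.
    centers3 = [(3*r + 1, 3*c + 1) for r, c in centers2]
    f4 = np.array([lvl4[3*r + 1, 3*c + 1] for r, c in centers3])

    return np.concatenate([f1, f2, f3, f4])   # 9 + 72 + 72 + 8 values
```

Calling `local_multires_features` on any 81 × 81 array returns a vector of length 161, matching the dimension count in the text.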

2.2 Multiple Kernel Classifier Designing

The fusion of kernels with multiple scales is a special case of multiple kernel learning [10,11,12,13]. This kernel method is more flexible and offers a more complete choice of scales than other methods, such as the composite kernel method. In addition, as wavelet theory and multi-scale analysis theory continue to mature, the multiple kernel method gains a sound theoretical background by introducing the scale space.

The foundation of the multi-scale kernel method is finding a set of kernel functions with multi-scale representation capability. Among the widely used kernel functions, the Gaussian radial basis function (RBF) (1) is the most popular because of its universal approximation ability, and it is also a typical kernel that can be multi-scaled.

$$ k(\varvec{x},\varvec{z}) = \exp ( - \frac{{\left\| {\varvec{x} - \varvec{z}} \right\|^{2} }}{{2\sigma^{2} }}) $$
(1)

Taking the RBF kernel as an example, it can be multi-scaled as in (2), assuming the generated kernels are translation invariant.

$$ k(\frac{{\left\| {\varvec{x} - \varvec{z}} \right\|^{2} }}{{2\sigma_{1}^{2} }}), \ldots ,k(\frac{{\left\| {\varvec{x} - \varvec{z}} \right\|^{2} }}{{2\sigma_{m}^{2} }}) $$
(2)

where \( \sigma_{1} < \ldots < \sigma_{m} \). From (2), we can see that when \( \sigma \) is small, the support vector classifier (SVC) using the RBF kernel can fit samples that vary drastically, and when \( \sigma \) is large, the same classifier can classify samples that vary mildly. The multi-scale kernels can therefore achieve better generalization. In the implementation, the values of \( \sigma \) can be determined as in (3) by borrowing the scale-variation rule of the wavelet transform.

$$ \sigma_{i} = 2^{i} \sigma ,\quad i = 0,1,2, \ldots $$
(3)
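As a minimal sketch of (1) and (3), the following computes a family of RBF kernel matrices whose widths double from one scale to the next. The function names and the number of scales `m` are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """Gaussian RBF kernel matrix, Eq. (1): exp(-||x - z||^2 / (2 sigma^2))."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Z ** 2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    sq = np.maximum(sq, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq / (2.0 * sigma ** 2))

def multiscale_kernels(X, Z, sigma0, m):
    """Kernel matrices at widths sigma_i = 2^i * sigma0, i = 0..m-1, Eq. (3)."""
    return [rbf_kernel(X, Z, (2 ** i) * sigma0) for i in range(m)]
```

For two points at unit distance and `sigma0 = 1`, the off-diagonal kernel value rises from exp(-1/2) at the first scale to exp(-1/8) at the second, which shows how wider kernels treat the same pair as more similar.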

With a multi-scale kernel matrix fused from multiple scaled kernels, both the discrimination of the features and the classification accuracy can be improved compared with the single kernel matrix of the common support vector machine (SVM). For a 2-class classification problem, the decision function of the simple SVC is

$$ f(\varvec{x}) = \text{sgn} \left( {\sum\limits_{i = 1}^{n} {\alpha_{i} } y_{i} \left\langle {\phi (\varvec{x}),\phi (\varvec{x}_{i} )} \right\rangle + b} \right). $$
(4)

After substituting the kernel function, the decision function becomes

$$ f(\varvec{x}) = \text{sgn} \left( {\sum\limits_{i = 1}^{n} {\alpha_{i} } y_{i} K(\varvec{x},\varvec{x}_{i} ) + b} \right). $$
(5)

For a typical multiple kernel learning method with the convex combination of multiple kernels, the decision function of the SVC is

$$ f(\varvec{x}) = \text{sgn} \left( {\sum\limits_{j = 1}^{m} {\beta_{j} } \sum\limits_{i = 1}^{n} {\alpha_{i} } y_{i} \left\langle {\phi_{j} (\varvec{x}),\phi_{j} (\varvec{x}_{i} )} \right\rangle + b} \right), $$
(6)

namely,

$$ f(\varvec{x}) = \text{sgn} \left( {\sum\limits_{j = 1}^{m} {\beta_{j} } \sum\limits_{i = 1}^{n} {\alpha_{i} } y_{i} K_{j} (\varvec{x},\varvec{x}_{i} ) + b} \right). $$
(7)

Combining features with multi-resolution character and multi-scale kernel functions is another effective way to improve target recognition accuracy. In this paper, the 4-level local multi-resolution decomposition and 4-scale Gaussian kernels are combined, with the scale of each successive kernel function twice that of the previous one. The kernels are weighted equally, namely \( \beta_{1} = \beta_{2} = \beta_{3} = \beta_{4} = 1/4 \). The schematic diagram is shown in Fig. 3.

Fig. 3. Synthesis schematic diagram of 4-level local multi-resolution feature and 4-scale Gaussian kernel
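The decision function (7) with equally weighted kernels can be sketched as follows. This is only an illustration of the formula: the support vectors, multipliers \( \alpha_i \) and bias \( b \) are assumed to come from an already trained SVC, the name `mk_decision` is hypothetical, and sgn(0) is mapped to +1 by convention.

```python
import numpy as np

def rbf(x, z, sigma):
    """Gaussian RBF kernel between two vectors, Eq. (1)."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def mk_decision(x, support_vectors, alphas, labels, b, sigmas, betas=None):
    """Evaluate Eq. (7): sgn(sum_j beta_j sum_i alpha_i y_i K_j(x, x_i) + b)."""
    m = len(sigmas)
    # Equal weights beta_j = 1/m unless other convex weights are supplied.
    betas = np.full(m, 1.0 / m) if betas is None else np.asarray(betas, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    labels = np.asarray(labels, dtype=float)
    score = float(b)
    for beta, sigma in zip(betas, sigmas):
        k = np.array([rbf(x, sv, sigma) for sv in support_vectors])
        score += beta * np.sum(alphas * labels * k)
    return 1 if score >= 0 else -1
```

With four widths such as `sigmas = [s, 2*s, 4*s, 8*s]` and the default weights, this corresponds to the 4-scale, equal-coefficient setting described above.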

The target recognition procedure includes two stages: offline training and online recognition. The offline training is mainly based on the MSTAR dataset. First, CFAR detection is carried out on all the training target chips to obtain the target segmentation results. Then, starting from the centers of the chips (namely the centers of the targets), the 4-level local multi-resolution decomposition of each target is executed and the feature vector of every target is extracted. The multiple kernel classifier is trained with these multi-level feature vectors.

The online recognition is based on large-scale scene image series acquired in real time. First, CFAR detection is performed on every frame of the large-scale scene and the targets are segmented. On this basis, the centers of gravity of the targets are calculated. Starting from each center of gravity, the 4-level local multi-resolution decomposition is executed on the original image and the feature vector of the target is extracted. Finally, the feature vectors are fed into the multiple kernel classifier to obtain the recognition result.
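The detection and center-of-gravity steps can be sketched as below. The text does not specify the CFAR variant, so this sketch assumes a simple global CFAR for exponentially distributed clutter intensity, where the threshold T = -μ ln(P_fa) follows from P(x > T) = exp(-T/μ). The function names are hypothetical, and a practical detector would estimate μ from local background windows rather than from the whole frame.

```python
import numpy as np
from collections import deque

def global_cfar_mask(intensity, pfa=1e-3):
    """Binary detection mask under an assumed exponential clutter model."""
    # The mean is taken over the whole frame, so bright targets bias it
    # slightly upward; acceptable for a sketch.
    threshold = -np.mean(intensity) * np.log(pfa)
    return intensity > threshold

def centroids(mask):
    """Centers of gravity (row, col) of 8-connected regions, in scan order."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    result = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first flood fill of one connected region.
            queue, pixels = deque([(r0, c0)]), []
            seen[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < h and 0 <= cc < w
                                and mask[rr, cc] and not seen[rr, cc]):
                            seen[rr, cc] = True
                            queue.append((rr, cc))
            rows, cols = zip(*pixels)
            result.append((sum(rows) / len(rows), sum(cols) / len(cols)))
    return result
```

Each returned center can then serve as the starting point of the local multi-resolution decomposition.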

3 Combining Recognition and Tracking Method Based on “What” and “Where” Pathways Information Processing Mechanism

Recent studies on the visual processing mechanism of the human brain find that there are two main pathways in the vision system, the “what” pathway and the “where” pathway, which form the perception of the target and of its location in space respectively.

Based on these findings, to track multiple targets in SAR images, the multi-class targets should first be classified, and then the motion trend of each target can be predicted continuously. Following this idea, a combined recognition and tracking method is proposed using the “what” and “where” pathways information processing mechanism. Based on the recognition result and the correspondence of targets between adjacent frames, the motion parameters of the targets are estimated with the UKF. As a result, high performance tracking of multi-class slow-moving targets in complicated backgrounds is realized.

For target tracking based on SAR images, the linear and Gaussian hypotheses of the standard Kalman filter are difficult to satisfy. The unscented Kalman filtering (UKF) algorithms [14] are therefore introduced in this paper. These algorithms can effectively reduce the influence of nonlinear dynamics and non-Gaussian noise [15]. They also have lower computational complexity than other methods such as particle filtering, and their approximation error only appears in moments beyond the third order, which is acceptable in practical applications. In addition, for multi-target tracking and location, the UKF shows clear advantages over other tracking methods.
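A minimal sketch of one UKF predict and update cycle is given below to make the sigma-point idea concrete. The motion and measurement models are assumptions, since the text does not state them: a constant-velocity state [x, y, vx, vy] and a measurement equal to the target's center of gravity (x, y). The scaling parameter `kappa` and all function names are likewise illustrative.

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """2n+1 sigma points (as rows) and their weights."""
    n = mean.size
    root = np.linalg.cholesky((n + kappa) * cov)   # root @ root.T = (n+kappa) cov
    pts = np.vstack([mean, mean + root.T, mean - root.T])
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return pts, w

def unscented_transform(pts, w, noise_cov):
    """Weighted mean and covariance of transformed sigma points."""
    mean = w @ pts
    diff = pts - mean
    cov = (w[:, None] * diff).T @ diff + noise_cov
    return mean, cov, diff

def ukf_step(x, P, z, f, h, Q, R, kappa=1.0):
    """One predict/update cycle for state x, covariance P and measurement z."""
    # Predict: push sigma points through the motion model f.
    pts, w = sigma_points(x, P, kappa)
    x_pred, P_pred, _ = unscented_transform(np.array([f(p) for p in pts]), w, Q)
    # Update: redraw sigma points and push them through the measurement model h.
    pts, w = sigma_points(x_pred, P_pred, kappa)
    z_pred, S, dz = unscented_transform(np.array([h(p) for p in pts]), w, R)
    dx = pts - x_pred
    Pxz = (w[:, None] * dx).T @ dz                 # state-measurement covariance
    K = Pxz @ np.linalg.inv(S)                     # Kalman gain
    x_new = x_pred + K @ (z - z_pred)
    P_new = P_pred - K @ S @ K.T
    return x_new, 0.5 * (P_new + P_new.T)          # keep covariance symmetric

def cv_transition(s, dt=1.0):
    """Assumed constant-velocity motion model."""
    return np.array([s[0] + dt * s[2], s[1] + dt * s[3], s[2], s[3]])

def position_measurement(s):
    """Assumed measurement: the (x, y) center of gravity."""
    return s[:2]
```

Because these assumed models happen to be linear, the sketch behaves like an ordinary Kalman filter; the same `ukf_step` accepts nonlinear `f` and `h` without change, which is the property the UKF is chosen for.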

Using the feature extraction method, multi-scale kernel classifier and target recognition system described above, we further design a target tracking system for image series that builds on the detection and recognition results. The flow diagram is shown in Fig. 4. The main ideas are as follows. First, the image segmentation results, namely the regions of interest (ROI), of the large-scale scene image series are obtained in real time. Second, the center of gravity of each ROI is calculated after converting the image to binary form. On the one hand, the center of gravity is an important target parameter for KF and UKF tracking: passing it through the filter yields the predicted coordinates of the target, and after parameter correction it is used for the next prediction step. On the other hand, the center of gravity is also the reference point for the local multi-resolution decomposition of the image and for extracting the feature vector. Third, the feature vectors are sent to the multi-scale kernel classifier to obtain the classification result, which can also be used to reject false alarms. As a result, the target type and coordinates are obtained, which form the basis of target tracking.

Fig. 4. Flow diagram of target recognition and tracking

After target recognition and location in each frame, the motion parameters of each target are estimated by the tracking filter based on the correspondence of the targets between adjacent frames, and the parameters are continuously updated with the measured values. Ultimately, the type and coordinate information of the targets is obtained in real time, and the targets are indicated in the frames.
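The text does not state how targets are matched between adjacent frames. One simple possibility, shown here purely as an assumption, is greedy nearest-neighbor association restricted to detections with the same recognized class and within a distance gate. The function name `associate` and the gate value are hypothetical.

```python
import math

def associate(tracks, detections, gate=10.0):
    """Greedy nearest-neighbor matching with a class check.

    tracks:     dict mapping track id -> (class label, predicted (x, y))
    detections: list of (class label, measured (x, y))
    Returns a dict mapping track id -> index of the matched detection.
    """
    candidates = []
    for tid, (t_label, t_pos) in tracks.items():
        for di, (d_label, d_pos) in enumerate(detections):
            if t_label != d_label:
                continue                      # the class must agree
            dist = math.dist(t_pos, d_pos)
            if dist <= gate:                  # the position must be close
                candidates.append((dist, tid, di))
    candidates.sort()                         # closest pairs first
    matches, used = {}, set()
    for _, tid, di in candidates:
        if tid in matches or di in used:
            continue
        matches[tid] = di
        used.add(di)
    return matches
```

Unmatched detections could start new tracks and unmatched tracks could be coasted on their predictions, although neither policy is described in the text.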

4 Simulation Experiment and Result Analysis

4.1 Target Recognition Experiment

After the features of the testing set are extracted, the feature vectors are sent to the classifier and the recognition precision is output. To analyze robustness against noise, speckle with mean 0 and variance 0.04 is added to the MSTAR testing set. The final recognition results are shown in Table 1.

Table 1. Single target ATR result under different speckle adding degree
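For illustration, zero-mean speckle with variance 0.04 can be added with a multiplicative model J = I + n·I. The uniform distribution of n and the function name `add_speckle` are assumptions, and the text does not specify how the speckle adding degree (SAD) maps onto this noise, so the sketch applies a single pass only.

```python
import numpy as np

def add_speckle(image, variance=0.04, rng=None):
    """Multiplicative speckle J = I + n * I with zero-mean noise n."""
    rng = np.random.default_rng() if rng is None else rng
    # A uniform variable on [-a, a] has variance a^2 / 3.
    half_width = np.sqrt(3.0 * variance)
    noise = rng.uniform(-half_width, half_width, size=image.shape)
    return image + noise * image
```

Passing a seeded generator, for example `np.random.default_rng(0)`, makes a noisy test set reproducible.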

The experimental results show that the fusion of the multi-scale feature and the multi-scale kernel classifier gives a very high classification precision of 98.75% when no speckle is added. In addition, the algorithm reads and stores nearly 3000 SAR images in a short time, which indicates good real-time performance. Compared with traditional methods, the presented algorithm has clear advantages in fast target detection and in the dimension of the feature vectors.

When speckle is added to the testing samples, the recognition precision is 93.41% at a speckle adding degree (SAD) of 1. As the speckle increases, the recognition precision falls to 87.62% and 76.48% at SAD = 2 and SAD = 3 respectively, which are still acceptable recognition rates. This is because there is only one target in each sample and the target is known to lie in the center of the sample chip. Even if the target structure changes, the local multi-scale decomposition can still be performed from the center of the sample and the appropriate features can still be extracted.

Further tests with rotation and scale transformations of the targets are carried out on large-scene SAR images. In these tests, 3 degrees of speckle are added to Image I and Image II. Then, using the same target segmentation, mathematical morphological processing with adjusted parameters, and center of gravity calculation, the target detection and marking results are obtained. Figure 5 shows the target segmentation, detection and marking result at SAD = 1. Starting from each center of gravity, the multi-scale decomposition and feature extraction are executed to obtain the feature vectors. Finally, the feature vectors are sent to the multi-scale classifier, which outputs the recognition results.

Fig. 5. Multiple targets detection and marking result of the large-scene sample Image I with rotation and scale transformation

4.2 Target Tracking Experiment

The tracking experiments on image series are carried out in the following steps. First, the image frame is segmented to find the regions that might contain targets. Second, after target detection, local multi-resolution decomposition and recognition in real time, the false targets are eliminated and the real targets are retained. Third, the coordinates of the targets are fed into the tracking filter, which measures and estimates the motion of all targets. Fourth, the target positions in each frame are recorded continuously, and UKF tracking and location of the targets by their centers of gravity is ultimately achieved. Comparison experiments using the common Kalman filter and the UKF are carried out. By drawing the tracking curves of the targets, the tracking performance of the algorithm can be analyzed.

Figure 6 shows the UKF tracking results for multi-class, multiple vehicle targets based on image series. The recognition and tracking results of every frame show that almost all targets are correctly recognized except in the first several frames, and that no mislabeled or missing targets appear in the tracking process. At the same time, with the UKF the estimated target position converges rapidly to the measured value (namely the true value). Figure 7 shows the horizontal and vertical coordinate tracking curves of the KF and UKF for the target marked “2”. The curves show that the UKF converges to the actual target position more rapidly than the Kalman filter, which again confirms the fast convergence and good tracking performance of the UKF.

Fig. 6. Target UKF tracking result of image series II

Fig. 7. Horizontal and vertical coordinates tracking curves with KF and UKF for target marked “2”

4.3 Target Tracking Accuracy Analysis

Again using the UKF tracking method, the location data and errors in 4 random frames of the two image series are recorded in Table 2. Each image series contains 6 targets. From the estimated and true coordinates of the target center of gravity, the distance between the two coordinate points (namely the estimation error) is calculated. Since the image resolution is known, the absolute errors of target tracking and location can then be computed (a mark “-” in the tables means that the target has not appeared or has not been detected in that frame).

Table 2. Target tracking error of image series (Random 4 frames)

The data in the tables show that the method achieves high location accuracy in the tracking and location process. Assuming the SAR image has a resolution of 0.5 m, the maximum location error is 1.58 m and the minimum is 0.5 m in image series I; in image series II, the maximum is 2.55 m and the minimum is 0, which again demonstrates the accuracy of tracking and location.
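The conversion from a pixel offset to an absolute location error can be written as a short sketch. The function name is hypothetical and the pixel offsets in the usage note are illustrative, not values taken from Table 2.

```python
import math

def location_error_m(estimated_px, true_px, resolution_m=0.5):
    """Euclidean distance between two pixel coordinates, scaled to metres."""
    return math.dist(estimated_px, true_px) * resolution_m
```

For example, at 0.5 m resolution an offset of (3, 1) pixels gives about 1.58 m, an offset of (5, 1) pixels gives about 2.55 m, and a one-pixel offset gives 0.5 m.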

5 Conclusions

To meet the demands of large-scale multi-target recognition and tracking, a multiple target recognition and tracking method is systematically studied in this paper. First, a robust feature extraction method based on local multi-resolution decomposition is proposed. Then, with the introduction of the multiple kernel classifier, the multi-scale features and the multi-scale kernel method are combined. Moreover, according to the recognition result and the correspondence of targets between adjacent frames, the motion parameters of the targets are estimated with the unscented Kalman filter (UKF) based on the “what” and “where” pathways information processing mechanism. As a result, high performance tracking of multi-class slow-moving ground targets in complicated backgrounds is realized. Simulation results on large-scale scene SAR image series show the effectiveness of the proposed method. Besides vehicle targets in SAR image series, the presented feature extraction, pattern classification and tracking filtering methods can also be used for other moving targets with a fixed structure, which is of practical value in target detection, target positioning, real-time situation monitoring and damage effect evaluation.