Keywords

1 Introduction

Kernel learning estimation (KLE) method, which is built based on kernel function and statistical learning theory, is a newly developing branch of pattern recognition and applied to the techniques of support vector machines (SVMs) originally [1, 2]. The kernel-based method maps the data from original space into a high-dimensional feature space by using nonlinear mapping function, transforming nonlinear problem into linear analysis in sample space. It is an effective method to deal with nonlinear pattern recognition problems. Also, the kernel-based method can provide a way of efficient calculation. By using the kernel function, it can hide the nonlinear mapping in a linear learning machine for synchronous calculation and replace complex inner product operation in high-dimensional space that provides a new way to solve the problem of high-computational complexity in high-dimensional space, avoiding the Curse of Dimensionality.

Driven by emerging technologies, techniques of target tracking are developing rapidly, related to many disciplines such as pattern recognition, dynamic systems and artificial intelligence. It has already achieved significant applications in intelligent decision-making [3], unmanned driving [4,5,6], biomedicine [7], human–computer interaction, reconnaissance and other application areas, and still has great development prospects. The problem of moving target tracking essentially is a typical nonlinear pattern recognition problem. The tracking model is established through the prior knowledge of moving target, followed by predicting the motion state or motion pattern of the target at the current time or frame. However, quality of the tracking model directly impacts on the tracking performances. Deviation of the tracking model, such as mismatching of the target motion state or unreasonable setting of the searching area, may lead to remarkable tracking errors or target losses.

The moving target tracking algorithm based on kernel learning method, by introducing nonlinear mapping structure with a learning mechanism, is independent of the specific model and can guarantee tracking quality under the nonlinear change of the target motion state, giving the tracking algorithm better adaptability to the environment.

This paper presents a review on the moving target tracking algorithms based on kernel learning methods, including discussions on the target detection algorithms, generative and discriminant target tracking algorithms using kernel methods with their research progress. For design and optimization of the kernel functions, multi-kernel learning method with typical structures of multiple kernel functions is also discussed. In the end, a prospect on research and applications of the kernel learning target tracking method is provided, with development trends in target-scale change, kernel optimization, accuracy of feature extraction and other critical problems.

The rest of the paper is organized as follows. Section 2 presents the fundamental principle of nonlinear kernel mapping. Kernel learning estimation with application to moving target tracking is discussed in detail in Sect. 3, where target detection, generative and discriminant tracking algorithms are shown, respectively. In Sect. 4, multiple kernel learning estimation is presented. Section 5 gives a discussion and prospect to the future research on kernel learning method, and Sect. 6 concludes the paper.

2 Nonlinear Kernel Mapping

The key idea of the kernel-based method is to map the data from the original space to a reproducing kernel Hilbert space (RKHS) through a fixed nonlinear mapping operator, performing data processing in a high-dimensional feature space.

Define the nonlinear mapping operator as kernel function \(\kappa\). For all \(x\) and \(x^{\prime}\), the kernel function satisfies

$$ \kappa \left( {x,x^{\prime}} \right) = \varphi \left( x \right),\varphi \left( {x^{\prime}} \right) $$
(1)

where \(\varphi \left( \cdot \right)\) is a feature vector transformed from original space to feature space.

When the kernel function is continuous, positive definite and symmetric, it is called Mercer kernel [8]. If the kernel function meets the following two conditions:

  1. (1)

    For any vector of \(x \in X\), \(\kappa \left( {x,z} \right)\), its function belongs to the vector space \(F\);

  2. (2)

    \(\kappa \left( {x,z} \right)\) is reproducible.

It is called a reproducible kernel function. A certain reproducible kernel function defines a reproducible Hilbert space. For a reproducible kernel function, the kernel \(\kappa \left( {x, \cdot } \right)\) composes the function

$$ g\left( \cdot \right) = \mathop \sum \limits_{i = 1}^{l} a_{i} \kappa \left( {c_{i} , \cdot } \right) $$
(2)

and for all \(i\),\(c_{i} \in X\), there is

$$ g,\kappa \left( {x, \cdot } \right) = \mathop \sum \limits_{i}^{l} a_{i} \kappa \left( {c_{i} ,x} \right) = h\left( x \right) $$
(3)

Through definition of Mercer kernel, the reproducible kernel can be expressed as

$$ \kappa \left( {x,x^{\prime}} \right) = \mathop \sum \limits_{i = 1}^{\infty } \varsigma_{i} \varphi_{i} \left( x \right)\varphi_{i} \left( {x^{\prime}} \right) $$
(4)

where \(\varsigma_{i}\) and \(\varphi_{i}\) are nonnegative feature value and vector, respectively. The mapping \(\varphi\) can be expressed as

$$ \left\{ {\begin{array}{*{20}c} {\varphi :X \to F} \\ {\varphi \left( x \right) = \left[ {\sqrt {\varsigma_{1} } \varphi \left( {x_{1} } \right),\sqrt {\varsigma_{2} } \varphi \left( {x_{2} } \right), \ldots } \right]} \\ \end{array} } \right. $$
(5)

For data \(u\) in the original input space, \(\varphi \left( u \right)\) can be obtained by mapping \(\varphi \left( \cdot \right)\) into the feature space. Figure 1 shows the mapping relationship of the data from the original space to the Hilbert space.

Fig. 1
An illustration of non linear mapping depicts, structure of u dot in a graph with 2 axes, on the left, is mapped to a structure, phi of u, in a graph with 3 axes. Mapping is represented by a directed line, from left to right, with a label phi of dot.

Schematic diagram of the nonlinear mapping \(\varphi \left( \cdot \right)\)

In Hilbert space, \(\varphi \left( \cdot \right)\) is the mapping from the original space to the feature space, and \(\varphi \left( x \right)\) is the mapped feature vector. For \(x,x^{\prime} \in X\), the kernel function satisfies

$$ \varphi (x)^{T} \varphi \left( {x^{\prime}} \right) = \kappa \left( {x,x^{\prime}} \right) $$
(6)

That is, when the mapping function \(\varphi \left( \cdot \right)\) is difficult to express explicitly, the calculation of the kernel function is used to replace the complex calculation to the inner product of the nonlinear mapping in the RKHS, thereby simplifying the calculation process and improving the calculation efficiency.

3 Moving Target Tracking Using Kernel Learning Estimation

Target tracking is a challenging branch in the field of intelligent systems. It aims to realize the recognition and decision-making of target motion modes. It has broad research and application prospects and has been applied in intelligent monitoring, unmanned driving, biological medicine, human–computer interaction, military and other fields. Kernel method is deeply applied in the target tracking technique due to its linearization, data driven and other advantages, and has achieved fruitful results.

Target tracking includes three main parts: appearance model, motion model and search strategy, involving image processing, data processing and machine learning. Considering that target detection is a prerequisite of target tracking, we describe the process of moving target tracking as the following general steps: detecting directive information of target motion, predicting target movement trend, and performing higher-level behavior analysis and decision-making for the randomly moving target.

3.1 Target Detection

The quality of target detection determines the performance of target tracking. Presently, target detection algorithms based on kernel methods mainly include three types of algorithms: kernel principal component analysis (KPCA), kernel Fisher linear discriminant (KFLD) and kernel independent component analysis (KICA).

Inspired by the kernel mapping theory, Scholkopf combined it with the principal component analysis (PCA) and proposed the KPCA [9] algorithm in 1998, allowing the kernel method break away from support vector machine to cooperate with other algorithms for the first time. That is of great significance for the kernel method been successfully applied to the field of target feature extraction and target detection.

Fisher linear discriminant (FLD), also known as linear discriminant analysis (LDA), was proposed by Mika [10] in 1999. The principle of it is to find a most suitable projection axis, so that the projection distance between all kinds of samples on this axis is as far as possible, and the projections of samples in each category are as compact as possible, intending to achieve the best classification effect. Also, Mika extended the FLD with kernel method and proposed the KFLD algorithm to solve the nonlinear problems. A feature space is established to judge the linear Fisher criterion for KFLD, which is helpful to obtain more accurate result for target feature extraction under nonlinear conditions.

In 2002, Bach, based on independent component analysis (ICA), proposed the KICA [11] algorithm, which further enriched kernel detection methods. KICA deals with functions in the kernel Hilbert space based on the entire nonlinear function space and uses kernel mapping technique to search the space with high efficiency. The use of function space enables it to adapt to a variety of data samples, thus making the algorithm more robust to time-variant sample distribution.

3.2 Generative Tracking

Generative tracking is a type of tracking method in which online learning is used to model the target feature and through the feature model to search and match and find the target position. A representative generative tracking method is mean shift (MS) [12].

3.2.1 Representative Algorithm: Mean Shift

During the initial stage of tracking, the MS algorithm needs to determine the search window independently to select the target, therefore, it is a semi-automatic tracking method. The algorithm calculates the histogram distribution of the initial frame search window and calculates the histogram distribution of the \(Nth\) frame window in the same way, making the search window to move along the direction of the maximum histogram density, and finally gets the position of the target. The standard steps are as follows:

  1. (1)

    Target model of the initial frame

Divide the feature space into multiple eigenvalues according to pixel color values. At the initial frame, the probability of the \(u\)-th eigenvalue in the search window of the initial frame is

$$ \tilde{q}_{u} = C\mathop \sum \limits_{i = 1}^{n} k\left( {\frac{{x_{0} - x_{i} }}{h}^{2} } \right)\delta \left[ {b\left( {x_{i} } \right) - u} \right] $$
(7)

where \(x_{0}\) and \(x_{i}\) are the central pixel coordinates of the \(N\) th frame search window and the coordinates of the \(i\)-th pixel, respectively; \(k\left( \cdot \right)\) is the kernel function, and \(h\) is the bandwidth of the kernel function.

  1. (2)

    Target model of the \(N\)-th frame

Calculate the probability of the \(u\)-th eigenvalue in search window of the \(N\)-th frame with

$$ \tilde{p}_{u} \left( y \right) = C_{h} \mathop \sum \limits_{i = 1}^{{n_{k} }} k\left( {\frac{{y_{0} - x_{i} }}{h}^{2} } \right)\delta \left[ {b\left( {x_{i} } \right) - u} \right] $$
(8)

where \(y_{0}\) is the coordinate of the pixel in the center of the search window.

  1. (3)

    Similarity function

Similarity function represents degree of similarity between the initial frame model and the \(N\)-th frame model. Define the function as

$$ \tilde{\rho }\left( y \right) \equiv \rho \left( {\tilde{\rho }\left( y \right),\tilde{q}} \right) = \mathop \sum \limits_{u = 1}^{m} \sqrt {\tilde{\rho }_{u} \left( y \right)\tilde{q}_{u} } $$
(9)
  1. (4)

    Mean shift vector

By maximizing the similarity function, the mean shift vector can be obtained as

$$ m_{h,G} \left( y \right) = y_{1} - y_{0} \frac{{\mathop \sum \nolimits_{i = 1}^{{n_{k} }} x_{i} w_{i} g\left( {\frac{{\tilde{y}_{0} - x_{i} }}{h}^{2} } \right)}}{{\mathop \sum \nolimits_{i = 1}^{{n_{k} }} w_{i} g\left( {\frac{{\tilde{y}_{0} - x_{i} }}{h}^{2} } \right)}} - y_{0} $$
(10)

in which \(y_{0}\) is the central coordinate of the search window and \(y_{1}\) is that of the found new search window.

3.2.2 Related Studies

The mean shift algorithm was first proposed by Fukunaga [12] in 1975. It was not widely concerned until 1995 when Cheng introduced the kernel function and weight coefficient [13] to the original MS and extended the MS theory. Comaniciu applied the MS algorithm to video image processing and realized smoothing, segmentation and tracking of visual objects [14,15,16,17].

Usually, the bandwidth of the kernel function in MS tracking algorithm is fixed, leading to large tracking errors when the target size varies. For this reason, based on scale space theory [18], Coliins proposed a scale variable MS tracking algorithm [19]. Sometimes, the color feature map in the MS algorithm cannot accurately describe the distribution of the target color in the image, leading to vacancy of the feature space. To solve this problem, the spatial color histogram is introduced into the target tracking model [20], combining the target color with the image color in the current frame to improve the accuracy of target recognition. Also, to improve the anti-occlusion ability of MS algorithm, filtering algorithms are integrated into the MS [21]. To track multi-target objects and human joints, confidence fusion propagation algorithm is introduced into the MS algorithm [22] for realizing tracking of single target, multi-target and human joints.

With good performance on processing time and robustness, MS algorithm has achieved rich results in the field of target tracking for several decades. However, in the generative tracking methods, environmental information is usually ignored so that the entire tracking process is quite easy to be disturbed. It yields a key problem of how to extract the target from background effectively that motivates the studies of discriminant tracking.

3.3 Discriminant Tracking

Discriminant tracking uses the correction theory of machine learning to model the target feature and background feature, respectively, taking the peak in the confidence map as target position and training corresponding filter to distinguish the target from background. It has two representative classes of kernel-based algorithms: circulant structure of tracking-by-detection with kernels (CSK) and kernelized correlation filters (KCF).

3.3.1 Representative Algorithm I: CSK

Correlation filter has a fast operation rate in frequency domain, and the tracking speed can reach hundreds of frames per second. It is an effective way to realize real-time moving target tracking. In 2010, Bolme [23] introduced correlation filtering into minimum output sum of square error (MOSSE ) algorithm, expanding target tracking from time domain to frequency domain. Based on the MOSSE, Henriques [24] proposed the CSK algorithm. It uses correlation theory of cyclic matrix and shift the training samples and candidate samples cyclically to increase the number of two samples, meanwhile, use the training samples to train the classifier and then use the trained classifier to detect target in the candidate region which is constructed by the candidate samples, and finally, take the maximum response of the correlation calculation as the target location. Kernel mapping function is adopted for the CSK to fasten the calculation speed.

To find the response position, CSK algorithm needs to transform from frequency domain to time domain, which will lead to marginal effect similar with the leakage phenomenon of aperiodic signals in systems. Moreover, application range of CSK algorithm is narrow, and it is only suitable for gray scale feature space. It lacks the establishment of target appearance model and is easily disturbed by complex task background and environmental noises. In order to establish the target appearance model more accurately, Danelljan et al. use the color name (CN) [25] based on the CSK algorithm to enhance the description of the appearance model, mapping three-dimensional color space into a high-dimensional CN space. In the way, the algorithm can more effectively establish the target appearance model and has a better tracking effect on the problem of target partial occlusion, light change, complex background, etc. However, due to the complexity of computing in a high-dimensional space, the calculation rate of CSK decreases after introducing three-dimensional color space.

When the target scale is changed or occluded, the CSK algorithm will degenerate obviously. On the basis of the algorithm, other target tracking algorithms based on correlation filter and their improved algorithms are proposed, which have achieved better tracking effect and tracking robustness than CSK algorithm.

3.3.2 Representative Algorithm II: KCF

To further improve tracking accuracy and tracking precision, Henriques proposed the kernelized correlation filters [26] in 2014. The KCF algorithm mainly includes four typical steps that are ridge regression [27], diagonalization of cyclic matrix, kernel correlation filtering and fast target detection.

3.3.2.1 Ridge Regression

For linear regression model, the objective function is

$$ f\left( x \right) = W^{T} X $$
(11)

Find the weight \(W_{i}\) in sample \(X\) to minimize the prediction of target \(Y_{i}\), that is

$$ \mathop {\text{min}}\limits_{w} \mathop \sum \limits_{i} \left( {f\left( {x_{i} } \right) - y_{i} } \right)^{2} + \lambda w^{2} $$
(12)

where \(\lambda\) is the parameter that controls over-fitting. Take the derivative of (12) and set it to zero, and then, the optimal solution is obtained as

$$ w = \left( {X^{H} X + \lambda I} \right)^{ - 1} X^{H} y $$
(13)

where \(X\) is the sample matrix, and \(y\) is the label.

3.3.2.2 Cyclic Matrix Calculation

Note \(X_{n \times 1} = \left\{ {x_{1} ,x_{2} , \ldots x_{n} } \right\}^{T}\) as the positive sample and generate the training sample (negative sample) by cyclic shifting to the positive sample. Use the negative sample to train the classifier, and one-dimensional operator P in cyclic shifting is

$$ P = \left[ {\begin{array}{*{20}c} 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ \end{array} } \right] $$
(14)

By multiplying the vector \(x\) constantly with the operator P in cyclic shifting, for one-dimensional image, a cyclic matrix \(C\left( x \right)\) with combinations of \(n\) vectors is obtained. For two-dimensional image, training sample of the two-dimensional image is obtained by cyclic shifting in the region of interest, and finally, the cyclic matrix of the two-dimensional matrix is obtained.

3.3.2.3 Kernel Correlation Filtering

To improve the classifier performance, map the original spatial data into a high-dimensional space with weight \(w\) that is a linear combination of input samples as

$$ w = \mathop \sum \limits_{i} \alpha_{i} \varphi \left( {x_{i} } \right) $$
(15)

By introducing \(\varphi^{T} \left( x \right)\varphi \left( {x^{\prime}} \right) = k\left( {x,x^{\prime}} \right)\) for kernelization, the optimal solution of (15) can be transformed into

$$ \alpha = \left( {K + \lambda I} \right)^{ - 1} y $$
(16)

where \(K\) is also a cyclic matrix. By using the properties of the cyclic matrix, and (16) can be rewritten as

$$ \hat{\alpha } = \frac{{\hat{y}}}{{\hat{k}^{xx} + \lambda }} $$
(17)

where \(k^{xx}\) is the first row of \(K\) and \(\hat{y}\) is the discrete Fourier transform of \(y\). In general, the classifier can be solved by calculating \(\hat{\alpha }\).

3.3.2.4 Fast Target Detection

Suppose that the target state of the previous frame is \(x\), and the input sample of the current frame is \(z\), and the kernel matrix \(K\) is introduced to obtain the response of the test sample, that is

$$ \hat{f}\left( z \right) = \hat{k}^{xz} \odot \hat{\alpha } $$
(18)

Performing inverse Fourier transform to (18) obtains the sample response. Take the maximum response value as the target position, and update the classifier parameters and the target model by

$$ \left\{ {\begin{array}{*{20}c} {x^{i} = \left( {1 - \theta } \right)x^{i - 1} + \theta z^{i} } \\ {\alpha^{i} = \left( {1 - \theta } \right)\alpha^{i - 1} + \theta \sigma^{i} } \\ \end{array} } \right. $$
(19)

where \(x^{i}\) is the target position model and \(\alpha^{i}\) is the classifier parameters predicted for the \(i\) frame, respectively; \(z^{i}\) and \(\sigma^{i}\) are the detected target position model and classifier parameters, respectively; \(\theta\) represents the learning rate.

3.3.3 Related Studies

Traditional target tracking algorithms, such as mean shift algorithm, support vector machine [28], Kalman filtering [29], particle algorithm based on Bayesian filtering [30, 31], tracking learning detection (TLD) [32] based on target characteristics, and compressive tracking (CT) [33] algorithm based on compression perception, appear deficiencies in tracking performance and in computation efficiency. KCF algorithm, absorbing advantages of CSK, shows high tracking precision and calculation speed with improved classifier training and sample detection.

To solve the problems of scale change, target occlusion and target loss in the process of target tracking, a tracking method with multi-scale correlation filters is proposed in [34], where two ridge regression models having strong plasticity and strong stability are adopted. The model with good plasticity tracks the target position to construct image pyramid, putting the position as the center. The model with good stability predicts the target-scale change, realizing multi-scale detection and tracking. In [35], a depth scaling kernelized correlation filtering (DSKCF) algorithm is developed, which extends the RGB tracking algorithm in KCF to form an RGB-D tracking algorithm by integrating depth features. The algorithm gives an improved solution for the problems of target-scale change and target occlusion. To deal with the marginal effect caused by cyclic shift in KCF algorithm, Danelljan introduced the spatial setting factor to constrain the weight of filters, alleviating the marginal effect at the expense of calculation complexity of the algorithm [36].

For many complex scenes, it is usually difficult for a single kernel to satisfy the requirements of various tracking performances, and multi-kernel fusion for the KCF algorithm provides an extension scheme.

4 Multiple Kernel Learning

Kernel function design is an important part of the kernel-based target tracking algorithm, which determines the tracking accuracy. To improve the tracking adaptivity to complex environments, fusion of multiple kernel functions is studied recently. Many works [37,38,39] have shown that multi-kernel model has better flexibility and robustness than model with single kernel. Presently, multi-kernel learning methods can be categorized into three main types, which are synthetic kernel, multi-scale kernel and infinite kernel.

Synthetic kernel method is the fundamental method of multi-kernel learning. One can use many ways to construct the synthetic kernel, such as multi-kernel linear combinatorial synthesis method, multi-kernel extended synthesis method, non-stationary multi-kernel learning, local multi-kernel learning and non-sparse multi-kernel learning. Due to simplicity of the synthetic kernel method, it has been applied in many practical problems, such as target eigenvalue extraction and processing [40, 41], classification [42,43,44,45], image segmentation [46] and system identification [47].

Synthetic kernel method generally uses linear combination of kernel functions to generate an integrated kernel function, which is questionable for desired tracking accuracy when facing unbalanced distribution of samples. Multi-scale kernel method introduces the concept of scale space and merges multi-scale kernel functions into a more flexible new kernel function. The multi-scale kernel method needs to train bandwidth for each kernel function. The kernel function with smaller bandwidth is used to track samples with drastic changes, and the function with larger bandwidth is to track samples with gentle changes. Using SVM regression or RKHS to extend the multi-scale kernel method is beneficial to improve the tracking accuracy. The multi-scale kernels are classified in [48], where a typical multi-scale kernel method is provided.

The aforementioned two kernel methods use limited number of kernel functions for linear combination or fusion. However, for large-scale data processing, the multi-kernel processing methods with limited number of kernels become hard to achieve desired tracking performance with optimal decision. Therefore, a trend of expanding from finite to infinite kernel functions for multi-kernel learning emerges recently, but there are few reports about its application in the field of target tracking.

5 Discussion and Prospect

Presently, target tracking method based on kernel learning estimation is developing fast. Compared with other types of tracking methods, kernel-based target tracking is free of nonlinear error and promising to be model-free and data-driven, however for future use, it still needs further research to satisfy requirements under practical conditions. The potential work is mainly reflected in the following aspects.

  1. (1)

    Reduce the dependency to tracking model.

    Traditional Bayesian tracking algorithms strongly depend on a prior math model and model parameters. Introducing kernel method into classical tracking algorithms can expand the algorithm to nonlinear applications precisely. With kernel function replacing inner product operation in sample space and target dynamics modeled by trained measurements, high-dimensional computing complexity and dependency on tracking model can be reduced greatly.

  2. (2)

    Improve the accuracy of feature extraction.

    The quality of feature extraction is the key factor affecting the efficiency of visual target tracking. Convolution features have advantages over artificial features, but the design of training network still needs further research. Multi-feature fusion is helpful to describe the target much more precisely, but the increasing computation complexity needs to be controlled to balance tracking accuracy and response speed.

  3. (3)

    Improve the balance of base-kernel design.

    Single kernel function is difficult to obtain satisfied performances in complicated tracking tasks. Multi-kernel learning method can optimize kernel function through combination of multiple base kernels and adjust weight coefficients and parameters of the base kernels to deal with complex application scenarios. However, multi-kernel learning method increases computational complexity. Learning efficiency is also necessary to be considered in algorithm design, and the balance of multi-kernel learning framework also needs for further exploration.

  4. (4)

    Improve the adaptability to environmental dynamics.

    Changes and occlusion of the target will decrease the tracking accuracy or lead to target missing. It is significant for tracking algorithm to maintain high precision of tracking when target shape or scale is changing, flipping or occluded. Trajectory prediction and target blocking are effective to improve environmental adaptability.

  5. (5)

    Improve the persistence of stable tracking.

    For long-term tracking of random-moving target, uncertainties have much more chance to appear, leading to weak robustness and decreasing tracking accuracy. The algorithms for short-term tracking are generally lack of framework from long-term tasks to construct continuity of stable tracking, which is meaningful to be addressed in further research.

6 Conclusion

Kernel learning estimation method gives a feasible bridge from linear to nonlinear estimation problems and forms a unified framework of solutions in pattern analysis fields. By applying nonlinear kernel mapping to the tracking problem and replacing complicated inner product operation by designed kernel functions, kernel-based target tracking method can achieve good performance in capabilities of nonlinear processing, data-driven and generalization. The method has significant research prospects and potential applications in decision intelligence, pattern recognition, autonomous unmanned vehicles and other information processing systems.