1 Introduction

Visual multiple target tracking is to find the locations and sizes of multiple targets and maintaining the identities of targets in video sequences. Tracking of multiple targets is very important for many real applications and high-level analysis such as video surveillance, robotics, and activity analysis. Because of the significant improvements in object detection techniques in the recent years [1, 2], many researches in visual multiple target tracking have focused on the tracking-by-detection framework which transforms the multiple target tracking task into a data association problem. The targets are detected in individual frames and then the measurements (detection responses) are progressively associated into long trajectories.

According to the differences of the data association algorithms, multiple target tracking approaches can be categorized into two classes: offline multi-target tracking and online multi-target tracking. Offline multi-target tracking solves the data association problem over large temporal windows or even the whole video sequence by global optimization [38]. By exploiting information from future frames, these offline multiple target tracking algorithms can handle detection failures caused by occlusions, and show high accuracy and robustness. However, optimizing over large temporal windows leads to a significant temporal delay between the measurements and tracking results. Therefore, the offline multiple target tracking algorithms cannot be applied for real-time tracking.

Online multiple target tracking algorithms, which only consider information up to the current frame and locate the targets in the current frame, are more suitable for time critical online applications. Online multiple target tracking algorithms sequentially grow trajectories with online-provided measurements. The particle filter [9] and the joint probabilistic data association filter [10] are two classical approaches which are wildly used in online multiple target tracking systems. However, such methods often suffer from the exponentially increasing complexity in the number of tracked targets. Alternatively, a greedy scheme [11] or the Hungarian algorithm [12, 13] can be used to solve the data association problem more efficiently. Although these online multiple target tracking approaches show improved performance, they tend to drift and to produce identity swap errors in complex scenes.

The main challenge for online visual multiple target tracking in unconstrained environments is that the measurements are often noisy and uncertain, such as missing detections caused by partial occlusions and false positive detections caused by changing illumination conditions. Thus, the resulting data association problem between the measurements and the targets in the online multiple target tracking systems is very difficult. Recently, some publications point out that utilizing fuzzy mathematics to model the uncertain information in the tracking system is helpful to improve the tracking performance [1418],e.g., Fuzzy quadrature particle filter [14], Intuitionistic fuzzy joint probabilistic data association filter [15], fuzzy Kalman filter [16], and fuzzy coding histogram [18]. Inspired from [14], a novel online visual multi-target tracking method based on intuitionistic fuzzy data association is proposed in this paper. The main contributions of this paper are summarized as follows:

  1. 1)

    A novel online visual multi-target tracking method which integrates intuitionistic fuzzy data association with track management is proposed. Unlike [14], the proposed method does not use the algorithm structure of the joint probabilistic data association filter. In addition, automatically and correctly initialize or terminate a target track are enabled using track management.

  2. 2)

    The association costs between targets and measurements are replaced by the intuitionistic fuzzy membership degrees which can be obtained by a modified intuitionistic fuzzy clustering algorithm. The association costs in previous methods do not consider the uncertain information of the possible association pairs leading to the unsatisfying tracking results in complex scenes. However, by using the intuitionistic fuzzy membership degrees, the proposed method can deal with the uncertain and noisy measurements more effectively leading to the improvement in tracking accuracy. Furthermore, the modified intuitionistic fuzzy clustering algorithm is based on the multi-cue-based distance measure and a new intuitionistic index.

  3. 3)

    A new intuitionistic index is defined based on two different fuzzy c-means clustering. The defined intuitionistic index effectively models the uncertainty of the possible association pair. Combining with the intuitionistic fuzzy point operator which can extract useful information from the intuitionistic index, the uncertainty of the data association is effectively reduced leading to better tracking performance. In addition, the model of the intuitionistic index which is based on two different fuzzy c-means clustering has never been reported.

The rest of this paper is organized as follows. The proposed online visual multiple target tracking method based on intuitionistic fuzzy data association is presented in Sect. 2. Experiments are shown in Sect. 3. Conclusions are presented in Sect. 4.

2 Proposed Online Visual Multiple Target Tracking Method

In this section, a novel online visual multiple target tracking method based on intuitionistic fuzzy data association is proposed. The framework of the proposed tracking method is shown in Fig. 1.

Fig. 1
figure 1

The framework of the proposed online visual multi-target tracking method

In Fig. 1, an object detector based on aggregated channel features [2] is first applied to produce the set of measurements \( {\{ z_{i} \}_{i = 1, \ldots ,r}}. \) Each measurement z i is represented by a bounding box around the target candidate region. Then, a frame-by-frame data association algorithm based on intuitionistic fuzzy sets is used to deal with the online-provided measurements which are imprecise and uncertain. The detailed derivation of the intuitionistic fuzzy sets-based data association algorithm is presented in Sect. 2.1. Through the intuitionistic fuzzy data association step, the set of correct association pairs \( \{ (o_{n} ,z_{k} )\}_{{n \in \{ 1,\, \ldots ,l\};\,k \in \{ 1,\, \ldots ,r\} }} \) is obtained. Then, the state of each existing target is updated and predicted by the Kalman filter. In the same time, track management is performed to enable automatically and correctly initialize or terminate an object track. The details of the track management are presented in Sect. 2.2. The overall online visual multi-target tracking algorithm is presented in Sect. 2.3.

2.1 Intuitionistic Fuzzy Data Association

Most previous online visual multi-target tracking approaches use Hungarian algorithm to solve the data association problem. In the Hungarian algorithm, the optimal association pairs are determined by minimizing the total association cost. However, the association costs in previous methods do not consider the uncertain information of the possible association pairs. In complex scenes, the measurements are often inaccurate, especially when many targets are close to each other. In this case, the association cost of an incorrect association pair might be lower than the association cost of the correct association pair. Therefore, the resulting association pairs are actually not the best association pairs. In order to deal with the noisy and uncertain measurements, intuitionistic fuzzy sets are used to model the uncertainty of the possible association pairs, and the association costs are replaced by the intuitionistic fuzzy membership degrees.

2.1.1 Intuitionistic Fuzzy Point Operator

In order to represent the imprecise and imperfect information, Atanassov [19] extends fuzzy set to intuitionistic fuzzy set. An intuitionistic fuzzy set A in X is defined as follows:

$$ A = \{ < x,\mu_{A} (x),\nu_{A} (x) > |x \in X\} $$
(1)

with the condition

$$ 0 \le \mu_{A} (x) + \nu_{A} (x) \le 1\;\;\forall x \in X $$

where \( \mu_{A} :X \to \left[ {0,1} \right] \) and \( \nu_{A} :X \to \left[ {0,1} \right] \), \( \mu_{A} \) represents the degree of membership, v A represents the degree of non-membership of element \( x \in X \) to the set A.

If

$$ \pi_{A} (x) = 1 - \mu_{A} (x) - \nu_{A} (x) $$
(2)

then \( \pi_{A} (x) \) is the degree of uncertainty of the membership of element \( x \in X \) to the set A, which is called the intuitionistic index (degree of hesitancy). If \( \pi_{A} (x) = 0 \), then the intuitionistic fuzzy set is reduced to the ordinary fuzzy set.

In order to mine useful information from the intuitionistic index, the intuitionistic fuzzy point operator proposed in [20] is applied. It is defined as follows:

Definition 1

For each element \( x \in X \), take \( \alpha_{x} ,\beta_{x} \in \left[ {0,1} \right] \), and \( \alpha_{x} + \beta_{x} \le 1 \), the intuitionistic fuzzy point operator \( F_{{\alpha_{x} ,\beta_{x} }} :IFS(X) \to IFS(X) \) is defined as follows:

$$ F_{{\alpha_{x} ,\beta_{x} }} (A) = \left\{ { < x,\mu_{A} (x) + \alpha_{x} \pi_{A} (x),\nu_{A} (x) + \beta_{x} \pi_{A} (x) > |x \in X} \right\}. $$
(3)

In [20], the author proves that the intuitionistic fuzzy point operator \( F_{{\alpha_{x} ,\beta_{x} }} \) has the following properties:

Theorem 1

Let \( A,B \in IFS(X),x \in X,\alpha_{x} ,\beta_{x} \in [0,1], \) and \( \alpha_{x} + \beta_{x} \le 1, \)

  1. (1)

    if \( A \in FS(X), \)then \( F_{{\alpha_{x} ,\beta_{x} }} (A) = A, \)

  2. (2)

    \( (F_{{\alpha_{x} ,\beta_{x} }} (A^{c} ))^{c} = F_{{\alpha_{x} ,\beta_{x} }} (A), \)

  3. (3)

    if \( \mu_{A} (x) + \nu_{A} (x) \ne 0 \) and \( \alpha_{x} = \frac{{\mu_{A} (x)}}{{\mu_{A} (x) + \nu_{A} (x)}},\,\beta_{x} = \frac{{\nu_{A} (x)}}{{\mu_{A} (x) + \nu_{A} (x)}} \) for each element \( x \in X, \) then \( F_{{\alpha_{x} ,\beta_{x} }} (A) = \left\{ { < x,\alpha_{x} ,\beta_{x} > |x \in X} \right\} \) and \( F_{{\alpha_{x} ,\beta_{x} }} (A) \in FS(X). \)

Obviously, the intuitionistic fuzzy point operator \( F_{{\alpha_{x} ,\beta_{x} }} \) transforms the intuitionistic fuzzy set A to a new intuitionistic fuzzy set \( F_{{\alpha_{x} ,\beta_{x} }} (A) \) with intuitionistic index \( \pi_{{F_{{\alpha_{x} ,\beta_{x} }} (A)}} (x): \)

$$ \pi_{{F_{{\alpha_{x} ,\beta_{x} }} (A)}} (x) = 1 - (\mu_{A} (x) + \alpha_{x} \pi_{A} (x)) - (\nu_{A} (x) + \beta_{x} \pi_{A} (x)) = (1 - \alpha_{x} - \beta_{x} )\pi_{A} (x) $$
(4)

Therefore, for any \( x \in X, \) there is \( \pi_{{F_{{\alpha_{x} ,\beta_{x} }} (A)}} (x) \le \pi_{A} (x). \) If \( A \in IFS(X), \) denotes \( F_{{\alpha_{x} ,\beta_{x} }}^{2} (A) = F_{{\alpha_{x} ,\beta_{x} }} (F_{{\alpha_{x} ,\beta_{x} }} (A)), \) then

$$ \begin{aligned} F_{{\alpha_{x} ,\beta_{x} }}^{2} (A) = & \{ < x,\mu_{A} (x) + \alpha_{x} \pi_{A} (x) + \alpha_{x} (1 - \alpha_{x} - \beta_{x} )\pi_{A} (x), \\ &\nu_{A} (x) + \beta_{x} \pi_{A} (x) + \beta_{x} (1 - \alpha_{x} - \beta_{x} )\pi_{A} (x) > |x \in X\} \\ \end{aligned} $$
(5)

and

$$ \begin{aligned} \pi_{{F_{{\alpha_{x} ,\beta_{x} }}^{2} (A)}} (x) = & 1 - [\mu_{A} (x) + \alpha_{x} \pi_{A} (x) + \alpha_{x} (1 - \alpha_{x} - \beta_{x} )\pi_{A} (x)] \\ &- [\nu_{A} (x) + \beta_{x} \pi_{A} (x) + \beta_{x} (1 - \alpha_{x} - \beta_{x} )\pi_{A} (x)] \\ = & (1 - \alpha_{x} - \beta_{x} )^{2} \pi_{A} (x) \\ \end{aligned}. $$
(6)

Generally, for any positive integer n, there are

$$ \begin{aligned} F_{{\alpha_{x} ,\beta_{x} }}^{n} (A) & = F_{{\alpha_{x} ,\beta_{x} }} (F_{{\alpha_{x} ,\beta_{x} }}^{n - 1} (A)) \\ & = \{ < x,\mu_{A} (x) + \alpha_{x} \pi_{A} (x)\frac{{1 - (1 - \alpha_{x} - \beta_{x} )^{n} }}{{\alpha_{x} + \beta_{x} }}, \\ & = \nu_{A} (x) + \beta_{x} \pi_{A} (x)\frac{{1 - (1 - \alpha_{x} - \beta_{x} )^{n} }}{{\alpha_{x} + \beta_{x} }} > |x \in X\} \\ \end{aligned} $$
(7)

and

$$ \pi_{{F_{{\alpha_{x} ,\beta_{x} }}^{n} (A)}} (x) = (1 - \alpha_{x} - \beta_{x} )^{n} \pi_{A} (x) $$
(8)

where \( \alpha_{x} + \beta_{x} \ne 0,\forall x \in X. \)

If \( x \in X, \alpha_{x} + \beta_{x} = 0\), namely \( \alpha_{x} = 0 \) and \( \beta_{x} = 0, \) then

$$ \begin{aligned} \mu_{{F_{{\alpha_{x} ,\beta_{x} }}^{n} (A)}} (x) = \mu_{A} (x) \hfill \\ \nu_{{F_{{\alpha_{x} ,\beta_{x} }}^{n} (A)}} (x) = \nu_{A} (x) \hfill \\ \end{aligned} $$
(9)

Theorem 2

Let \( A \in IFS(X), \) n is a positive integer. For each \( x \in X, \) \( \alpha_{x} ,\beta_{x} \in [0,1], \) and \( \alpha_{x} + \beta_{x} \le 1, \) then, \( \pi_{{F_{{\alpha_{x} ,\beta_{x} }}^{n} (A)}} (x) \le \pi_{{F_{{\alpha_{x} ,\beta_{x} }}^{n - 1} (A)}} (x) \le \pi_{A} (x),\forall x \in X. \)

Theorem 2 shows that the intuitionistic fuzzy point operator can reduce the degree of uncertainty of the membership. It indicates that the intuitionistic fuzzy point operator can mine useful information from the unknown information of the element.

2.1.2 Modified Intuitionistic Fuzzy Clustering

The modified intuitionistic fuzzy clustering algorithm combines the intuitionistic fuzzy point operator with a new intuitionistic index which is based on two different kinds of membership degrees. To calculate the membership degrees, fuzzy c-means clustering algorithm and its variants [2123] can be used. Overview of the modified intuitionistic fuzzy clustering algorithm is shown in Fig. 2. In Fig. 2, the red boxes represent the targets denoted by \( \{ o_{1} ,o_{2} ,o_{3} \} \), and the yellow boxes represent the measurements denoted by \( \{ z_{1} ,z_{2} ,z_{3} ,z_{4} ,z_{5} \} \). By using the modified intuitionistic fuzzy clustering algorithm, the intuitionistic fuzzy membership degree matrix \( [\eta ]_{3 \times 5} \) can be obtained which reflects the uncertainty of all possible association pairs.

Fig. 2
figure 2

Overview of the modified intuitionistic fuzzy clustering algorithm

Suppose that there are the sets of states of the targets \( {{O}} = \{ o_{1} , \ldots ,o_{l} \} \) predicted by Kalman filter and measurements \( {{Z}} = \{ z_{1} , \ldots ,z_{r} \} \) produced by the object detector in the current frame. First, the following objective function is minimized,

$$ \begin{aligned} & \mathop { \hbox{min} }\limits_{{\mathbf{U}}} {\kern 1pt} \left\{ {{J}_{m} ({\mathbf{U}}) = \sum\limits_{i = 1}^{l} {\sum\limits_{k = 1}^{r} {u_{ik}^{m} {g} (o_{i} ,z_{k} )} } } \right\},\;\;o_{i} \in {{O}},z_{k} \in {{Z}} \hfill \\ & \sum\limits_{i = 1}^{l} {u_{ik} = 1,\forall k,\;\;m = 2} \hfill \\ \end{aligned}. $$
(10)

It leads to

$$ \begin{aligned} & u_{ik} = \left[ {\sum\limits_{j = 1}^{l} {\left( {\frac{{{g} (o_{i} ,z_{k} )}}{{{g} (o_{j} ,z_{k} )}}} \right)^{{\frac{2}{m - 1}}} } } \right]^{ - 1} \hfill \\ & \quad v_{i} = \frac{{\sum\limits_{k = 1}^{r} {(u_{ik} )^{m} z_{k} } }}{{\sum\limits_{k = 1}^{r} {(u_{ik} )^{m} } }},i = 1, \ldots ,l \hfill \\ \end{aligned} $$
(11)

where u ik represents membership degree of the predicted state of the target o i to the measurement z k , v i represents the cluster center, and \( g( \cdot , \cdot ) \) represents the multi-cue-based distance measure defined by Eq. (21).

Then, the other objective function defined as follows is minimized,

$$ \begin{aligned} & \mathop {\hbox{min} }\limits_{{{\mathbf{U^{\prime}}}}} {\kern 1pt} \left\{ {{J}_{m} ({\mathbf{U^{\prime}}}) = \sum\limits_{i = 1}^{l} {\sum\limits_{k = 1}^{r} {\left( {u^{\prime}_{ki} } \right)^{m} {g} (o_{i} ,z_{k} )} } } \right\},\;\;o_{i} \in {{O}},z_{k} \in {{Z}} \hfill \\ &\quad \sum\limits_{k = 1}^{r} {u^{\prime}_{ki} = 1,\forall i,\;\;m = 2} \end{aligned} $$
(12)

It leads to

$$ u^{\prime}_{ki} = \left[ {\sum\limits_{j = 1}^{r} {\left( {\frac{{{g} (o_{i} ,z_{k} )}}{{{g} (o_{i} ,z_{j} )}}} \right)^{{\frac{2}{m - 1}}} } } \right]^{ - 1} $$
(13)

where \( u^{\prime}_{ki} \) represents membership degree of the measurement z k to the predicted state of the target o i , and \( g( \cdot , \cdot ) \) represents the multi-cue-based distance measure defined by Eq. (21). Note that the constraints in Eq. (10) and Eq. (12) are different.

In order to incorporate intuitionistic fuzzy properties into the fuzzy membership degrees, the intuitionistic fuzzy point operator is applied. Given u ik and \( u^{\prime}_{ki} \) calculated with Eq. (11) and Eq. (13) respectively, a new intuitionistic fuzzy membership degree is defined as follows:

$$ \eta_{ik} = u_{ik} \times u^{\prime}_{ki} + \alpha_{{z_{k} }} \pi_{{o_{i} }} (z_{k} )\frac{{1 - (1 - \alpha_{{z_{k} }} - \beta_{{z_{k} }} )^{n} }}{{\alpha_{{z_{k} }} + \beta_{{z_{k} }} }} $$
(14)

where \( \eta_{ik} \) denotes the intuitionistic fuzzy membership degree of measurement z k belonging to the target i, \( \alpha_{{z_{k} }} \) and \( \beta_{{z_{k} }} \) denote the scale factor of the membership information and non-membership information in intuitionistic index \( \pi_{{o_{i} }} (z_{k} ) \), respectively.

The definition of the intuitionistic index \( \pi_{{o_{i} }} (z_{k} ) \) should reflect the uncertainty of the membership degree of the measurement z k to the target i. Therefore, in the modified intuitionistic fuzzy clustering algorithm, the intuitionistic index is defined as follows:

$$ \pi_{{o_{i} }} (z_{k} ) = \left| {u_{ik} - u^{\prime}_{ki} } \right| $$
(15)

The larger the difference between the membership degree of measurement z k to target i and the membership degree of target i to measurement z k , the more uncertain information the intuitionistic index contains. The choice of the parameters of \( \alpha_{{z_{k} }} \) and \( \beta_{{z_{k} }} \) depends on the problem. In this paper, they are set to \( \alpha_{{z_{k} }} = u_{ik} \times u^{\prime}_{ki} \) and \( \beta_{{z_{k} }} = 1 - \alpha_{{z_{k} }} - \pi_{{o_{i} }} (z_{k} ) \), respectively. Equation (14) indicates that the intuitionistic fuzzy membership degree \( \eta_{ik} \) includes additional useful information extracted from the intuitionistic index, which also means that the uncertainty for assigning measurement z k to target i is reduced.

2.1.3 Multi-cue-Based Distance Measure

In complex scenes, some targets are difficult to be discriminated by only relying on the appearance information because of their similar appearances. Therefore, in the modified intuitionistic fuzzy clustering algorithm, the information of multiple cues including appearance information, shape information, and motion information is exploited to construct the distance measure. First, the affinities between the states of the targets and the measurements based on five different cues which compensate each other to some extent are defined as follows.

  1. 1)

    The affinity between the predicted state of the target o and the measurement z in terms of spatial proximity is defined as follows:

    $$ f_{1} (o,z) = \exp \left( { - \frac{{\left\| {(x_{o} ,y_{o} ) - (x_{z} ,y_{z} )} \right\|_{2}^{2} }}{{2\sigma_{1}^{2} h_{o} }}} \right) $$
    (16)

    where x o represents the x-coordinate of the center of the predicted state of the target o and y o represents the y-coordinate of the center of the predicted state of the target o; x z represents the x-coordinate of the center of the measurement z and y z represents the y-coordinate of the center of the measurement z; h o represents the height of the target o; parameter \( \sigma_{1}^{2} \) represents the variance for the Euclidean distance between the two centers and is set to \( \sigma_{1}^{2} = 3. \)

  2. 2)

    The affinity between the predicted state of the target o and the measurement z in terms of shape is defined as follows:

    $$ f_{2} (o,z) = \exp \left( { - \frac{{(h_{o} - h_{z} )^{2} }}{{2\sigma_{2}^{2} \cdot (h_{o} )^{2} }}} \right) $$
    (17)

    where h o represents the height of the target o and h z represents the height of the measurement z, parameter \( \sigma_{2}^{2} \) represents the variance for the height and is set to \( \sigma_{2}^{2} = 3 \). Compared with the height information, the widths of the targets are less distinguishable. Therefore, the definition of affinity in terms of shape only relies on the height information.

  3. 3)

    It is assumed that the motion direction of the target does not change a lot between consecutive frames. The constant velocity motion model is used to model the motion of each target. Thus, the affinity between the state of the target o, which is predicted by the Kalman filter, and the measurement z in terms of velocity direction is defined as follows:

    $$ f_{3} (o,z) = \exp\left( {\frac{{|\arctan ((y_{z}^{t} - y_{o}^{t - 1} )/(x_{z}^{t} - x_{o}^{t - 1} )) - arctan(v_{{y_{o} }}^{t - 1} /v_{{x_{o} }}^{t - 1} )|}}{{2\sigma_{3}^{2} }}} \right) $$
    (18)

    where \( y_{z}^{t} \) represents the y-coordinate of the center of the measurement z, and \( x_{z}^{t} \) represents the x-coordinate of the center of the measurement z in the current frame; \( y_{o}^{t - 1} \) represents the y-coordinate of the center of the target o, and \( x_{o}^{t - 1} \) represents the x-coordinate of the center of the target o estimated by the Kalman filter in the previous frame; \( v_{{y_{o} }}^{t - 1} \) represents the velocity of the target o along with the y-axis, and \( v_{{x_{o} }}^{t - 1} \) represents the velocity of the target o along with the x-axis estimated by the Kalman filter in the previous frame, \( \arctan (v_{{y_{o} }}^{t - 1} /v_{{x_{o} }}^{t - 1} ) \) represents the motion direction of the target o in the image coordinate in the previous frame; parameter \( \sigma_{3}^{2} \) represents the variance for the velocity direction and is set to \( \sigma_{3}^{2} = 1200. \)

  4. 4)

    The affinity between the predicted state of the target o and the measurement z in terms of color information is defined as follows:

    $$ f_{4} (o,z) = \exp \left( { - \frac{{1 - \rho (H_{c} (o),H_{c} (z))}}{{2\sigma_{4}^{2} }}} \right) $$
    (19)

    where \( H_{c} ( \cdot ) \) represents the RGB color histogram, \( \rho \) represents the Bhattacharyya coefficient for the color histogram of the target o and the color histogram of the measurement z, parameter \( \sigma_{4}^{2} \) is set to \( \sigma_{4}^{2} = 3. \)

  5. 5)

    The affinity between the predicted state of the target o and the measurement z in terms of the histogram of oriented gradient is defined as follows:

    $$ f_{5} (o,z) = \exp \left( { - \frac{{1 - \rho (H_{g} (o),H_{g} (z))}}{{2\sigma_{5}^{2} }}} \right) $$
    (20)

    where \( H_{g} ( \cdot ) \) represents the histogram of oriented gradient,\( \rho \) represents the Bhattacharyya coefficient for the histogram of oriented gradient of the target o and the histogram of oriented gradient of the measurement z, parameter \( \sigma_{5}^{2} \) is set to \( \sigma_{5}^{2} = 3. \)

Then, the multi-cue-based distance measure in the modified intuitionistic fuzzy clustering algorithm is defined as follows:

$$ g(o,z) = 1 - f_{1} (o,z) \times f_{2} (o,z) \times f_{3} (o,z) \times f_{4} (o,z) \times f_{5} (o,z) $$
(21)

where \( g(o,z) \) denotes the multi-cue-based distance measure between the predicted state of the target o and the measurement z, f 1 is defined by Eq. (16), f 2 is defined by Eq. (17), f 3 is defined by Eq. (18), f 4 is defined by Eq. (19), and f 5 is defined by Eq. (20).

The fuzzy membership degrees are computed based on the multi-cue-based distance measure defined by Eq. (21), and the intuitionistic fuzzy membership degree matrix \( [\eta_{ik} ]_{l \times r} \) is constructed based on the obtained fuzzy membership degrees. Finally, the correct association pairs between the targets and the measurements are determined according to the principle of maximum intuitionistic fuzzy membership degree. The proposed data association algorithm based on intuitionistic fuzzy sets is summarized in Algorithm 1.

Algorithm 1 proposed data association algorithm based on intuitionistic fuzzy sets

Input: A set of predicted states of the targets \( {{O}} = \{ o_{1} , \ldots ,o_{l} \} \)and a measurement set \( {{Z}} = \{ z_{1} , \ldots ,z_{r} \} \) at frame t.

Output: The best association pairs \( \{ (o_{i} ,z_{k} )\} ,i \in \{ 1, \ldots ,l\} ;k \in \{ 1, \ldots ,r\} \)

1. Compute the multi-cue-based distance measure between predicted state of each target in O and each measurement in Z by Eq. (16)–Eq. (21).

2. Construct the intuitionistic fuzzy membership degree matrix \( [\eta_{ik} ]_{l \times r} \) by Eq. (10)–Eq. (15). Mark every row and column of the current intuitionistic fuzzy membership degree matrix.

3. Find the maximum within all the marked elements of the intuitionistic fuzzy membership degree matrix, \( \eta_{pq} = \hbox{max} ([\eta_{ik} ]_{l \times r} ),p \in \{ 1, \ldots ,l\} ,q \in \{ 1, \ldots ,r\} \), then unmark every element in the p–th row and every element in the q–th column of the intuitionistic fuzzy membership degree matrix. If \( \eta_{pq} \ge \{ \eta_{pj} \}_{j = 1,\, \ldots ,r} \), \( \eta_{pq} \ge \{ \eta_{iq} \}_{i = 1,\, \ldots ,l} \) and \( f_{1} \ge \theta, \) where f 1 is defined by Eq.(17) and the parameter θ is set to \( \theta = 0.65, \) then \( (o_{p} ,z_{q} ) \) is a correct association pair.

4. If there are both marked row and marked column in the intuitionistic fuzzy membership degree matrix, then return to step 3.

2.2 Track Management

When a new target enters the scene, the corresponding track needs to be initialized automatically. Also, when a tracked target leaves the scene, the corresponding track needs to be terminated automatically. Therefore, to enable automatically and correctly initialize or terminate a target track, several rules of track management are developed. These rules are described as follows:

  1. i)

    The measurement which is not associated with any existing targets is initialized as a new track, and it is marked as a temporal target. At the same time, a Kalman filter is initialized for it.

  2. ii)

    The temporal target which is consistently associated in λ 1 frames is marked as a real target, where the parameter λ 1 satisfies λ 1 > 1.

  3. iii)

    The states of the temporal targets and of the real targets are updated and predicted by the corresponding Kalman filters.

  4. iv)

    The tracks of the temporal targets which are not associated with any measurements in subsequent λ 2 frames are automatically terminated. The tracks of the real targets which are not associated with any measurements in subsequent λ 3 frames are automatically terminated. The parameters λ 2 and λ 3 satisfy \( \lambda_{3} \ge \lambda_{2} \ge 1. \)

2.3 Overall Online Visual Multiple Target Tracking Algorithm

The overall intuitionistic fuzzy data association-based online visual multiple target tracking method is described in Algorithm 2.

Algorithm 2 overall algorithm for the online visual multi-target tracking method

Input: t–th video frame f t

Output: Estimated states of the targets \( \{ \hat{o}_{k} \}_{k = 1,\, \ldots ,l} \) with identities

1. Use the object detector based on aggregated channel features [2] to generate the measurements \( \{ z_{i} \}_{i = 1,\, \ldots ,r} \) at the current frame.

2. Find the best association pairs \( \{ (o_{n} ,z_{k} )\}_{{n \in \{ 1, \ldots ,l\} ;k \in \{ 1,\, \ldots ,r\} }} \) between the measurements \( \{ z_{i} \}_{i = 1, \ldots ,r} \) and the predicted states of the targets \( \{ o_{j} \}_{j = 1, \ldots ,l} \) by using the intuitionistic fuzzy sets-based data association algorithm which is described in Algorithm 1.

3. Update the states of the targets with the associated measurements according to rule (iii) of track management.

4. Initialize temporal target tracks for new targets and produce real target tracks according to rule (i) and rule (ii) of track management, respectively.

5. Terminate target tracks for invalid targets according to rule (iv) of track management.

6. Predict the states of existing targets at the next frame according to rule (iii) of track management.

3 Experiments

The proposed method is evaluated on three publicly available challenging video sequences, including both low-density and high-density sequences. The comparison results between the proposed method and other related methods are also reported. The detailed information of the used datasets is shown in Table 1.

Table 1 The information of the used datasets

For the quantitative evaluation, the CLEAR MOT metrics [24] are used, including:

  • MOTP(↑): Multi-target tracking precision, average of the bounding box overlap over all tracked targets. It is defined as \( {\text{MOTP}} = {{\left( {\sum\limits_{t} {\sum\limits_{i} {\frac{{{\text{Area}}(h_{i}^{t} \cap o_{i}^{t} )}}{{{\text{Area}}(h_{i}^{t} \cup o_{i}^{t} )}}} } } \right)} \mathord{\left/ {\vphantom {{\left( {\sum\limits_{t} {\sum\limits_{i} {\frac{{{\text{Area}}(h_{i}^{t} \cap o_{i}^{t} )}}{{{\text{Area}}(h_{i}^{t} \cup o_{i}^{t} )}}} } } \right)} {bbbbbb}}} \right. \kern-\nulldelimiterspace} {\left( {\sum\limits_{t} {n_{t} } } \right)}} \), where \( {\text{Area}}(h_{i}^{t} \cap o_{i}^{t} ) \) denotes the intersection area between bounding boxes of the ith ground truth and the ith tracking result at the time t, \( {\text{Area}}(h_{i}^{t} \cup o_{i}^{t} ) \) denotes the union area between bounding boxes of the ith ground truth, and the ith tracking result at the time t, n t denotes the number of tracked targets at the time t.

  • MOTA(↑): Multi-target tracking accuracy, it is defined as \( {\text{MOTA}} = {{1 - \left( {\sum\limits_{t} {\left( {FP_{t} + FN_{t} + IDS_{t} } \right)} } \right)} \mathord{\left/ {\vphantom {{1 - \left( {\sum\limits_{t} {\left( {FP_{t} + FN_{t} + IDS_{t} } \right)} } \right)} {\left( {\sum\limits_{t} {m_{t} } } \right)}}} \right. \kern-\nulldelimiterspace} {\left( {\sum\limits_{t} {m_{t} } } \right)}} \), where FP t , FN t , IDS t , and m t are the number of false positives, number of missed targets, number of mismatches and number of total targets, respectively, at the time t.

  • IDS(↓): ID switch, the number of times that a tracked target changes its id.

  • MT(↑): The ratio of mostly tracked trajectories, which are tracked for at least 80 %.

  • ML(↓): The ratio of mostly lost trajectories, which are tracked for less than 20 %.

  • FG(↓): Fragments, the number of times that a ground truth trajectory is interrupted.

For items with ↑, higher scores indicate better results, for those with ↓, lower scores indicate better results.

3.1 PETS.S2L1 Dataset

Quantitative results on the PETS.S2L1 dataset are shown in Table 2, while some visual tracking results by the proposed method are shown in Fig. 3. As can be seen, the proposed method achieves competitive results to other multi-target tracking approaches. The proposed method achieves the best performance in terms of MOTA score. This means that the proposed method can handle the false positive detections effectively. At the same time, the MOTP score of the proposed method is a little lower, and also is the FG score. This means that the output target trajectories of the proposed method are not properly aligned with the ground truth, leading to more fragments. This is because that most targets in PETS.S2L1 do not satisfy the simple constant velocity assumption. Therefore, the precision of the target state estimated by the Kalman filter is low. In the PETS.S2L1 sequence, people walk close together causing frequent occlusion, and there also exist complete occlusions caused by the traffic sign. These challenges make data association more difficult. Figure 3 shows some tracking examples of the proposed method. As can be seen, the proposed method is able to track the targets correctly under severe occlusions to some extent, such as person 2 and person 3 at frame 25; person 6 and person 9 at frame 155. However, the identities of person 2 and person 3 are changed to 6 and 8, respectively, at frame 65, because both persons make a sharp direction change after long-term occlusions.

Table 2 Comparison of tracking results on PETS.S2L1
Fig. 3
figure 3

Tracking examples on PETS.S2L1

3.2 Town Centre Dataset

Quantitative results on the Town Centre dataset are shown in Table 3, while some visual tracking results by the proposed method are shown in Fig. 4. As can be seen, the proposed method achieves better performance on most scores. The proposed method achieves the best performance in terms of MOTA, MT, and ML scores. This means that the proposed method correctly tracks more persons, although the IDS score and the FG score of the proposed method are a slightly higher. In the Town Centre sequence, there are more targets than in the PETS.S2L1 sequence. On average, 16 people are visible at any time, resulting in frequent dynamic occlusions. Many people are not detected because of the complete or partial occlusions. Tracking is difficult in this semi-crowded environment, especially at the low frame rate. Figure 4 shows some tracking examples by the proposed method. As can be seen, the proposed method is able to track multiple targets robustly under frequent occlusions to some extent, such as person 4 at frame 19; person 17, person 21, person 22, person 23, person 24, and person 26 at frame 57 and frame 63; person 39, person 45, and person 46 at frame 134. Some persons are lost due to the missed detections, such as the persons pushing a bicycle at frame 19 and frame 134, respectively.

Table 3 Comparison of tracking results on Town Centre
Fig. 4
figure 4

Tracking examples on Town Centre

3.3 PETS.S2L2 Dataset

Quantitative results on the PETS.S2L2 dataset are shown in Table 4, while some visual tracking results by the proposed method are shown in Fig. 5. As can be seen, the proposed method achieves the best performance in terms of MOTA, IDS, MT, ML, and FG scores, while keeping the MOTP score as high as 67.3 %. The obvious performance improvement indicates the effectiveness of the proposed method. The PETS.S2L2 sequence is more difficult than PETS.S2L1 since the crowded density of PETS.S2L2 is much higher and missing detections often occur due to frequent occlusions. Moreover, target appearance changes heavily, caused by different illumination conditions in different image regions. However, by using data association algorithm based on intuitionistic fuzzy sets, the proposed method successfully tracks more targets. Figure 5 shows some tracking examples by the proposed method. As can be seen, the proposed method is able to track multiple targets robustly under severe and partial occlusions to some extent, such as person 20, person 27, and person 28 at frame 40 and frame 80. The proposed method also can deal with illumination and appearance variation to some extent, such as person 33 at frame 80 and frame 120; person 21 at frame 40, frame 80, frame 120, and frame 160.

Table 4 Comparison of tracking results on PETS.S2L2
Fig. 5
figure 5

Tracking examples on PETS.S2L2

3.4 Performance Evaluation of the Intuitionistic Fuzzy Data Association

To verify the effectiveness of the proposed intuitionistic fuzzy sets-based data association algorithm, the Hungarian algorithm is applied using Eq. (16) to compute the association cost instead of the intuitionistic fuzzy membership degrees, and its performance is compared with the proposed method. Quantitative results of the comparison on all datasets are shown in Table 5. As can be seen, although the performance differences in terms of MOTA score and MOTP score between the proposed method and the association cost-based method are not much, the proposed method achieves significant improvements in terms of IDS score and FG score. Compared with the association cost-based method, the proposed method reduces the ID switches and fragments by about 49 and 52 on average, respectively. The obvious improvement in IDS and FG scores indicates that the proposed method can better track targets in complex scenes due to the correct association. It verifies the effectiveness of the intuitionistic fuzzy data association.

Table 5 Performance comparison results

3.5 Speed of the Proposed Online Visual Multiple Target Tracking Algorithm

The proposed algorithm is implemented using the MATLAB on a PC with 3.6 GHz × 2cores CPU and 8 GB RAM. The complexity of the proposed algorithm depends on the number of detections and targets in a sequence. The average speeds of both the proposed algorithm and the compared methods are given in Table 6. Note that the reported numbers in Table 6 do not include the detection time. As presented in Table 6, although the proposed algorithm achieves significant improvement in terms of tracking accuracy, the speed of the proposed algorithm is the lowest. It means that the proposed algorithm sacrifices a small amount of computational time in exchange for better tracking accuracy. However, the differences of the speeds are not much. The most expensive processing in the proposed algorithm is spent in calculating the multi-cue-based distance measure. To improve the computational complexity of the proposed algorithm, the parallel programming or GPU-based processing can be used. The real-time implementation of the proposed algorithm is a topic for future research.

Table 6 Speed comparison results

3.6 Discussion

In the low-density sequence PETS.S2L1, the proposed method achieves competitive results to other multi-target tracking approaches. Compared with the offline multi-target tracking approach [4], the proposed method achieves better performance in terms of tracking accuracy mainly due to the utilization of the information of multiple cues including appearance, shape, and motion. By using the multi-cue-based distance measure, the proposed method performs well in discriminating different targets, especially when two targets walk closely. Compared with the online visual multi-target tracking approach [13], the proposed method also achieves better performance in terms of tracking accuracy. Because the rules of track management in the proposed method are able to correctly initialize most target tracks, thus most false positives are filtered out. However, the proposed method produces more fragments compared with the above two approaches. This is mainly because the Kalman filter cannot effectively handle the complex movements of the targets. A possible solution to this problem is to replace the current Kalman filter with non-linear filter which is able to model the motion of the target more accurately.

In the more crowded scene Town Centre and the high-density sequence PETS.S2L2, the appearances of the targets are quite similar and the occlusion happens frequently among the targets, which make data association more difficult. Compared with both offline multi-target tracking approach [8] and online visual multi-target tracking approach [13], the proposed method achieves significant improvement in most evaluation scores mainly due to the utilization of intuitionistic fuzzy data association. The intuitionistic index effectively models the uncertainty of the possible association pair, and the intuitionistic fuzzy point operator can extract useful information from the intuitionistic index. Therefore, compared with the other methods, the proposed intuitionistic fuzzy data association algorithm can better deal with the uncertain and noisy measurements, which is further confirmed by the experiment in Sect. 3.4. The drawback of the proposed method is that the higher accuracy is obtained by sacrificing the computational complexity. A possible solution to this problem is to use parallel programming or GPU-based processing. In addition, although the proposed method focuses on the visual multi-target tracking, the idea of intuitionistic fuzzy data association can also be used in other related fields [2531].

4 Conclusion

In this paper, a novel intuitionistic fuzzy data association-based online visual multi-target tracking method is proposed. Different from previous tracking methods, the proposed method does not use the algorithm structure of the joint probabilistic data association filter and instead replaces the association costs between the measurements and the targets with the intuitionistic fuzzy membership degrees which are obtained by the modified intuitionistic fuzzy clustering algorithm. Furthermore, a novel intuitionistic index is defined based on two different fuzzy c-means clustering in the proposed method. In addition, a new track management part is developed to enable automatically and correctly initialize or terminate the target track. Experimental results using challenging sequences show the improved performance of the proposed online visual multi-target tracking method compared with other approaches.