15.1 Introduction

In addition to adapting the system behavior to the user’s preferences and cognitive state, a Companion-System is expected to adapt to the current environment. An intuitive example is the adaptation of the system’s input and output modalities. For example, audio output should not be used for providing confidential information if other persons are in the proximity of the Companion-System. Further, the presence of other persons typically increases the uncertainty of speech and gesture input. To realize these capabilities, the Companion-System requires an exact model of its environment including all persons in its proximity.

The environment model is realized using a multi-object tracking system which jointly estimates the number of persons as well as the current state of the individual persons. The continuous tracking of the persons in the proximity of the Companion-System additionally facilitates the resumption of previously interrupted interactions. Standard multi-object tracking approaches like the Joint Probabilistic Data Association (JPDA) [1] filter, the Joint Integrated Probabilistic Data Association (JIPDA) filter [16], and Multiple Hypotheses Tracking (MHT) [18] are bottom-up approaches which extend the Kalman filter [7] to facilitate the tracking of multiple objects. During the last decade, approximations of the Multi-Object Bayes filter [11] became very popular in multi-object tracking applications. The representation of the multi-object state using random finite sets (RFSs) naturally represents the uncertainty in the number of objects as well as in their individual states. Hence, a realization of an RFS delivers an estimate for the number of persons in the proximity of the Companion-System. Since the number of objects in each realization is fixed, an RFS allows for the incorporation of dependencies between the objects, which is not possible in standard multi-object tracking approaches. Especially in crowded environments, the modeling of statistical dependencies between the objects is required since the presence of other objects physically restricts the possible movements of the considered object. A popular approach to modeling the interactions of persons is the Social Force Model [5], which is widely used for the simulation of evacuation scenarios.

This chapter is outlined as follows: First, the basics of random finite sets and multi-object tracking are introduced. In Sect. 15.3, a sequential Monte Carlo implementation of the multi-object Bayes filter as well as possibilities to integrate object interactions in the filtering algorithm are presented. Finally, an accurate and efficient approximation of the multi-object Bayes filter, the labeled multi-Bernoulli filter, is introduced and the differences with the sequential Monte Carlo implementation of the multi-object Bayes filter are illustrated.

15.2 Random Finite Sets and the Multi-object Bayes Filter

Random vectors are typically used to represent the state of an object in single-object tracking. A commonly used approach to applying random vectors for multi-object tracking is to stack the vectors of the individual objects. However, a drawback of this approach is the missing representation of the uncertainty about the number of objects and the ordering of the stacked vectors. In contrast, a random finite set (RFS)

$$\displaystyle{ \mathrm{X}_{}^{} =\{ x_{}^{(1)},\ldots,x_{}^{(n)}\} }$$
(15.1)

comprises a random number n ≥ 0 of unordered points whose states are represented by random vectors \(x^{(1)},\ldots,x^{(n)}\). Hence, an RFS \(\mathrm{X}\) implicitly captures the uncertainty in the number of objects of the multi-object state. Similarly to single-object tracking, the state of the individual objects is represented using random vectors. Due to the varying number of objects in the sensor’s field of view and the possibility of missed detections and false alarms, the measurement process typically returns a random number of measurements. Further, the values of the measurements are also random. Consequently, an RFS

$$\displaystyle{ \mathrm{Z}_{}^{} =\{ z_{}^{(1)},\ldots,z_{}^{(m)}\} }$$
(15.2)

is well suited to represent the uncertainty of the measurement process, where \(z^{(i)}\) denotes a single measurement. Finite set statistics (FISST) facilitates calculations with RFSs using the notion of integration and density in a way which is consistent with point process theory. Hence, FISST provides a mathematically well-founded way to extend the well-known single-object Bayes filter to multi-object tracking applications using RFSs, which yields the multi-object Bayes filter [11]. By filtering a finite set-valued random variable over time, the estimate obtained by the multi-object Bayes filter captures the uncertainty in the number of objects in addition to the state uncertainty of the individual objects. Similarly to the single-object Bayes filter, the multi-object Bayes filter comprises a prediction and an update step, which are outlined in the following. For additional details as well as the derivations, the reader is referred to [11].

In the prediction or time update step, the multi-object posterior density at time k is predicted to the time of the next measurement. In contrast to single-object tracking, where a prediction of the object’s state \(x\) to the time of the next measurement using a Markov density \(f_{+}(x_{+}\vert x)\) is sufficient, the motion models required for multi-object tracking are far more complex. In addition to the state transition of the individual objects, the multi-object motion model is required to handle object appearance and disappearance. In some applications, it may even be necessary to incorporate object spawning, i.e. that an already existing object gives rise to a new object. Since spawning is not relevant for the environment perception of a Companion-System, it is neglected in the following.

The standard multi-object motion model introduced by Mahler [11] comprises the following assumptions:

  • an object survives during the transition to the time of the next measurement with probability \(p_{S}(x)\),

  • each object is assumed to move independently of other objects in the scene based on a Markov transition density \(f_{+}(x_{+}\vert x)\),

  • new-born objects follow a Poisson-distributed birth density \(\pi_{B}(\mathrm{X})\) which is statistically independent of the persisting objects.

Based on these assumptions, the multi-object Markov density is given by

$$\displaystyle{ f_{+}^{}(\mathrm{X}_{+}^{}\vert \mathrm{X}_{}^{}) =\pi _{ B}^{}(\mathrm{X}_{+}^{})\pi _{+}^{}(\emptyset \vert \mathrm{X}_{}^{})\sum _{\theta }\prod _{i:\theta (i)>0} \frac{p_{S}^{}(x_{}^{(i)}) \cdot f_{+}^{}(x_{+}^{(\theta (i))}\vert x_{}^{(i)})} {(1 - p_{S}^{}(x_{}^{(i)})) \cdot \lambda _{B}p_{B}^{}(x_{+}^{(\theta (i))})}. }$$
(15.3)

Here, the state-dependent survival probability is denoted by \(p_{S}(\cdot)\), and the expected number of new-born objects \(\lambda_{B}\) as well as the probability density \(p_{B}(\cdot)\) are parameters of the birth model. The sum in (15.3) includes all possible associations \(\theta: \left \{1,\ldots,n'\right \} \rightarrow \left \{0,1,\ldots,n\right \}\); the association θ(i) = 0 represents the disappearance of object i and θ(i) > 0 implies the persistence of object i. The probabilities of all objects being new-born and of the disappearance of all objects are given by

$$\displaystyle{ \pi _{B}^{}(\mathrm{X}_{+}^{}) = e^{-\lambda _{B} }\prod _{i=1}^{n}\lambda _{ B}p_{B}^{}(x_{+}^{(i)}) }$$
(15.4)

and

$$\displaystyle{ \pi _{+}^{}(\emptyset \vert \mathrm{X}_{}^{}) =\prod _{ i=1}^{n'}\left (1 - p_{ S}^{}(x_{}^{(i)})\right ). }$$
(15.5)

Observe that the contribution of a state vector \(x^{(i)}\) to (15.4) and (15.5) is canceled out for all associations θ(i) > 0 by the denominator of the product in (15.3).

Using the Chapman-Kolmogorov equation and the multi-object Markov density (15.3), the prediction of the multi-object Bayes filter to the time of the next measurement is given by

$$\displaystyle{ \pi _{+}(\mathrm{X}_{+}) =\int f_{+}(\mathrm{X}_{+}\vert \mathrm{X})\,\pi (\mathrm{X})\,\delta \mathrm{X}, }$$
(15.6)

where π(X) is the prior multi-object density. Observe that the Markov assumption used in (15.6) implies that the multi-object posterior density π(X) captures the entire information about the multi-object state at a time k. The integral in (15.6) is a set integral which integrates over all possible cardinalities.
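For reference, the set integral of a function \(f\) of a finite set is commonly written in FISST as a sum of ordinary integrals over all cardinalities. The notation below follows the symbols used in this section; the factor 1∕n! accounts for the n! orderings of the same unordered set:

$$\displaystyle{ \int f(\mathrm{X})\,\delta \mathrm{X} = f(\emptyset ) +\sum _{n=1}^{\infty } \frac{1} {n!}\int f\big(\{x^{(1)},\ldots,x^{(n)}\}\big)\,\mathrm{d}x^{(1)}\cdots \mathrm{d}x^{(n)}. }$$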

The update step of the multi-object Bayes filter is based on a multi-object likelihood function \(g(\mathrm{Z}\vert \mathrm{X}_{+})\) incorporating the single-object likelihood function \(g(z\vert x_{+})\), the field of view (FOV) of the sensor, the detection probability, and the false alarm rate. Here, the single-object likelihood function \(g(z\vert x_{+})\) provides the likelihood that a measurement \(z\) has been generated by object \(x_{+}\) based on the spatial distance and the corresponding uncertainties. Further, the state-dependent detection probability incorporates the handling of the sensor’s FOV. The standard multi-object measurement model [11] is illustrated by Fig. 15.1 and uses the following assumptions:

  • A measurement is generated by at most one object and each object is observed by the sensor according to a single-object spatial likelihood function \(g(z\vert x_{+})\),

  • Each object gives rise to a measurement according to the state-dependent detection probability \(p_{D}(x_{+})\) and it is not detected with probability \(1 - p_{D}(x_{+})\),

  • The sensor delivers Poisson-distributed false alarms with a mean number of \(\lambda_{c}\) measurements. The false alarms follow the spatial distribution \(c(z)\) which is usually modeled by a uniform distribution over the sensor’s FOV. Further, the object detection process and the false alarm process are assumed to be statistically independent and the measurements have to be conditionally independent of the objects’ states.

Fig. 15.1 Illustration of the events represented by the multi-object likelihood function: object detections (red rectangles), missed detections (no measurement for partially occluded person on the right side), false alarms (red dashed rectangles)

In multi-object tracking, the track-to-measurement association, i.e. which measurement belongs to which target, is ambiguous in most scenarios due to the spatial uncertainty of the objects’ states and the measurements. Further, the possibility of missed detections and false alarms additionally increases the ambiguity. To handle these ambiguities, the multi-object likelihood function averages over all possible association hypotheses, which is the best one can do if no prior knowledge about the track-to-measurement association is available. Similarly to the multi-object Markov density, the association hypotheses for n objects and m measurements are represented by \(\theta: \left \{1,\ldots,n\right \} \rightarrow \left \{0,1,\ldots,m\right \}\), where the measurement ‘0’ covers possible missed detections of some of the objects. The assumption that a measurement belongs to at most one of the objects is ensured by the requirement that θ(i) = θ(j) > 0 implies i = j, so that a measurement \(z_{\theta (i)}\) is assigned to at most one object i. The missed detection of an object is represented by θ(i) = 0.

Using the associations θ, the multi-object likelihood function covering missed detections and false alarms is given by

$$\displaystyle{ g_{}^{}(\mathrm{Z}_{}^{}\vert \mathrm{X}_{+}^{}) =\pi _{ C}^{}(\mathrm{Z}_{}^{})\pi _{}^{}(\emptyset \vert \mathrm{X}_{+}^{})\sum _{\theta }\prod _{i:\theta (i)>0} \frac{p_{D}^{}\big(x_{+}^{(i)}\big) \cdot g_{}^{}\big(z_{\theta (i)}^{}\vert x_{+}^{(i)}\big)} {\big(1 - p_{D}^{}\big(x_{+}^{(i)}\big)\big) \cdot \lambda _{c}c_{}^{}\big(z_{\theta (i)}^{}\big)}, }$$
(15.7)

where \(p_{D}(\cdot)\) denotes the state-dependent detection probability and \(g(\cdot \vert \cdot)\) is the single-object likelihood function representing the likelihood of a measurement \(z_{\theta (i)}\) given an object with predicted state \(x_{+}^{(i)}\). The expected number of false alarms \(\lambda_{c}\) and the spatial distribution \(c(\cdot)\) are the parameters of the Poisson clutter process. The factor

$$\displaystyle{ \pi _{}^{}(\emptyset \vert \mathrm{X}_{+}^{}) = \prod _{i=1}^{n}\big(1 - p_{ D}^{}\big(x_{+}^{(i)}\big)\big) }$$
(15.8)

denotes the probability that none of the objects has been detected by the sensor at the current time step and

$$\displaystyle{ \pi _{C}^{}(\mathrm{Z}_{}^{}) = e^{-\lambda _{c} }\prod _{z_{}^{}\in \mathrm{Z}_{}^{}}\lambda _{c}c_{}^{}(z_{}^{}) }$$
(15.9)

is the probability that all measurements \(z \in \mathrm{Z}\) originate from the clutter process.
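As a quick check of (15.7), consider the simplest non-trivial case of a single object with predicted state \(x_{+}\) and a single measurement \(z\). Only two associations exist, θ(1) = 0 and θ(1) = 1, and inserting (15.8) and (15.9) gives

$$\displaystyle{ g(\{z\}\vert \{x_{+}\}) = e^{-\lambda _{c}}\Big[\big(1 - p_{D}(x_{+})\big)\,\lambda _{c}c(z) + p_{D}(x_{+})\,g(z\vert x_{+})\Big]. }$$

The first term covers a missed detection combined with \(z\) being a false alarm; the second term covers the case in which \(z\) was generated by the object and no false alarm occurred.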

Using (15.7), the multi-object posterior density after integrating the current set of measurements is calculated using the multi-object Bayes filter update,

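$$\displaystyle{ \pi (\mathrm{X}_{+}\vert \mathrm{Z}) = \frac{g(\mathrm{Z}\vert \mathrm{X}_{+})\,\pi _{+}(\mathrm{X}_{+})} {\int g(\mathrm{Z}\vert \mathrm{X}_{+})\,\pi _{+}(\mathrm{X}_{+})\,\delta \mathrm{X}_{+}}. }$$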
(15.10)

The recursive update of the multi-object posterior density is consequently realized by applying (15.6) and (15.10) each time a new measurement is obtained. Similarly to the single-object Bayes filter, an analytical implementation of the multi-object Bayes filter is not possible in general. However, the multi-object Bayes filter facilitates an approximation using sequential Monte-Carlo (SMC) methods as well as a closed-form implementation using δ-generalized labeled multi-Bernoulli (δ-GLMB) RFSs which are presented in detail in Sects. 15.3 and 15.4.

Further approximations of the multi-object Bayes filter, which will not be discussed in this chapter, are the probability hypothesis density (PHD) filter [9], the cardinalized probability hypothesis density (CPHD) filter [10], and the cardinality balanced multi-target multi-Bernoulli (CB-MeMBer) filter [32]. While the PHD and CPHD filters approximate the multi-object posterior by the first statistical moment (and the cardinality distribution in case of the CPHD filter), the CB-MeMBer filter approximates the posterior using a multi-Bernoulli distribution. Further details as well as implementations of these filters using Gaussian mixture (GM) and SMC methods are given in [10, 28, 30, 31, 32, 33].

15.3 SMC Implementation of the Multi-object Bayes Filter and Modeling of Object Interactions

In typical applications of Companion-Systems, a high number of humans in the proximity is expected. Obviously, the movement of the individual persons is restricted in these scenarios by the presence of other persons, leading to statistical dependencies in their movements. In the following, the sequential Monte-Carlo (SMC) implementation of the multi-object Bayes filter incorporating object interactions is presented. For further details, the reader is referred to [19, 20, 22] as well as [8, 11, 26, 30].

In the SMC implementation of the single-object Bayes filter, vector-valued particles \(x_{}^{(i)} \in \mathbb{R}_{}^{n}\) are typically used to approximate the spatial distribution p(x). For the sequential Monte-Carlo multi-object Bayes (SMC-MOB) filter, each multi-object particle has to be a sample of a random finite set and is consequently given by a set of state vectors

$$\displaystyle{ \mathrm{X}_{}^{(i)} \triangleq \left \{x_{}^{(1)},\ldots,x_{}^{(n)}\right \}, }$$
(15.11)

where the number of objects n as well as the state vectors \(x^{(j)}\) are random. In the following, the state vectors \(x^{(j)}\) of the multi-object particle \(\mathrm{X}^{(i)}\) are conveniently called “particles”. Using the ν multi-object particles, the multi-object probability density is approximated by

$$\displaystyle{ \pi _{}^{}(\mathrm{X}_{}^{})\cong \sum _{i=1}^{\nu }w_{}^{(i)} \cdot \delta _{\mathrm{ X}_{}^{(i)}}(\mathrm{X}_{}^{}). }$$
(15.12)
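The particle representation (15.12) can be pictured as a list of weighted sets. The following minimal Python sketch stores each multi-object particle as a list of state vectors and derives the cardinality distribution implied by the weights. The function name and the two-dimensional position states are assumptions made for this illustration only.

```python
from collections import defaultdict


def cardinality_distribution(particles, weights):
    """Return {n: probability} implied by weighted multi-object particles.

    Each particle is a list of state vectors; the probability of n objects
    is the summed (normalized) weight of all particles with n elements.
    """
    total = sum(weights)
    dist = defaultdict(float)
    for X, w in zip(particles, weights):
        dist[len(X)] += w / total
    return dict(dist)


# Three multi-object particles; each state vector is a hypothetical (x, y) position.
particles = [
    [(0.0, 0.0), (2.0, 1.0)],  # two persons
    [(0.1, 0.0)],              # one person
    [(0.0, 0.1), (2.1, 0.9)],  # two persons
]
weights = [0.5, 0.2, 0.3]
dist = cardinality_distribution(particles, weights)
print(dist)
```

In this toy example, the two particles containing two persons carry a combined weight of 0.8, so the sketch assigns that probability to the hypothesis of two persons being present.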

15.3.1 Prediction

The prediction step of the SMC-MOB filter has to predict each multi-object particle according to the multi-object Markov density (15.3), incorporating the motion of persisting objects as well as object appearance and disappearance. Consequently, the prediction of a multi-object particle is obtained by the union of the set of surviving particles \(\mathrm{X}_{+,S}^{(i)}\) and the set of new-born particles \(\mathrm{X}_{B}^{(i)}\):

$$\displaystyle{ \mathrm{X}_{+}^{(i)} =\mathrm{ X}_{ +,S}^{(i)} \cup \mathrm{ X}_{ B}^{(i)}. }$$
(15.13)

The set of persisting particles of a multi-object particle \(\mathrm{X}^{(i)} =\{ x^{(1)},\ldots,x^{(n)}\}\) is obtained by a multi-Bernoulli distribution using the persistence probability \(p_{S}(x^{(j)})\) as a parameter. Since a multi-Bernoulli RFS is the union of several independent Bernoulli RFSs, the persistence of each object is assumed to be statistically independent of other objects. Thus, the persistence of a subset \(\{x^{(1)},\ldots,x^{(n')}\}\) has a probability of

$$\displaystyle{ \pi _{}^{}\left (\big\{x_{}^{(1)},\ldots,x_{}^{(n^{{\prime}}) }\big\}\big\vert \mathrm{X}_{}^{(i)}\right ) =\prod _{ x_{}^{}\in \mathrm{X}_{}^{(i)}}\left (1 - p_{S}^{}(x_{}^{})\right ) \cdot \prod _{\tilde{x}\in \left \{x_{}^{(1)},\ldots,x_{}^{(n^{{\prime}})}\right \}} \frac{p_{S}^{}(\tilde{x})} {1 - p_{S}^{}(\tilde{x})}. }$$
(15.14)

Instead of drawing the persisting particles directly using (15.14), the independence of the individual Bernoulli distributions within the multi-Bernoulli distribution facilitates sampling the persistence of each particle independently. This can be realized by drawing a random number \(\zeta^{(j)}\), uniformly distributed on [0, 1], for each particle \(x^{(j)}\) in \(\mathrm{X}^{(i)}\). Consequently, the set of persisting particles follows

$$\displaystyle{ \mathrm{X}_{S}^{(i)} = \left \{x^{(j)}:\ \zeta ^{(j)} < p_{S}(x^{(j)}),\ j = 1,\ldots,\vert \mathrm{X}^{(i)}\vert \right \}. }$$
(15.15)

Hence, a particle only persists if its state-dependent survival probability is greater than the drawn random number. Finally, the persisting particles \(x^{(j)}\), \(j = 1,\ldots,n'\), have to be predicted to the time of the next measurement using a single-object Markov transition density,

$$\displaystyle{ x_{+}^{(\,j)} \sim f_{ +}^{}\left (\cdot \vert x_{}^{(\,j)}\right ), }$$
(15.16)

in order to obtain the predicted set of surviving particles:

$$\displaystyle{ \mathrm{X}_{+,S}^{(i)} = \left \{x_{ +}^{(1)},\ldots,x_{ +}^{(n^{{\prime}}) }\right \}. }$$
(15.17)

The birth process is utilized to obtain the set of new-born particles \(\mathrm{X}_{B}^{(i)}\). To this end, the number of appearing objects \(n_{B}\) is sampled from a Poisson cardinality distribution \(\rho_{B}(n)\) with an expected value of \(\lambda_{B}\). The state of the new-born particles is obtained by sampling from the spatial distribution \(p_{B}\) of new-born objects:

$$\displaystyle{ x_{+}^{(\,j)} \sim p_{ B}^{}(\cdot )\ \forall \ j = 1,\ldots,n_{B}. }$$
(15.18)
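The prediction of a single multi-object particle, i.e. (15.13) together with (15.15) to (15.18), may be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used by the authors: the constant-velocity motion model, the uniform birth area, the state layout (px, py, vx, vy) and all function names are hypothetical choices.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw a Poisson-distributed integer (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def predict_particle(X, p_survive, transition, lam_birth, sample_birth, rng):
    """Predict one multi-object particle X (a list of state vectors)."""
    # (15.15): a particle persists if a uniform draw is below its survival probability.
    survivors = [x for x in X if rng.random() < p_survive(x)]
    # (15.16)-(15.17): move each survivor with the single-object transition density.
    moved = [transition(x, rng) for x in survivors]
    # (15.18): Poisson-distributed number of births, states drawn from p_B.
    born = [sample_birth(rng) for _ in range(sample_poisson(lam_birth, rng))]
    # (15.13): union of surviving and new-born particles.
    return moved + born


# Hypothetical models, chosen only for illustration.
def constant_velocity(x, rng, dt=0.1, sigma=0.05):
    """Constant-velocity motion with Gaussian noise; x = (px, py, vx, vy)."""
    px, py, vx, vy = x
    return (px + dt * vx + rng.gauss(0.0, sigma),
            py + dt * vy + rng.gauss(0.0, sigma),
            vx + rng.gauss(0.0, sigma),
            vy + rng.gauss(0.0, sigma))


def uniform_birth(rng, size=10.0):
    """New-born object at a uniform position in a size x size area, at rest."""
    return (rng.uniform(0.0, size), rng.uniform(0.0, size), 0.0, 0.0)


rng = random.Random(0)
X = [(1.0, 1.0, 0.5, 0.0), (4.0, 2.0, 0.0, -0.5)]
X_pred = predict_particle(X, lambda x: 0.99, constant_velocity, 0.1, uniform_birth, rng)
print(len(X_pred), "objects after prediction")
```

Because every multi-object particle is predicted without reference to any other, the same function can be applied to all ν particles independently.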

15.3.2 Update

In the update step of the SMC-MOB filter, the weight of each multi-object particle has to be updated using the multi-object likelihood function (15.7). The usage of a hypotheses tree [12, 13, 14] facilitates an intuitive representation of all valid association hypotheses. An example of a hypotheses tree is illustrated by Fig. 15.2 for a scenario with two objects and two measurements. A complete association hypothesis for a multi-object particle corresponds to the path from the root of the tree to a leaf. Since different cardinalities are represented by additional hypotheses trees and the assignment of measurements to the clutter source is realized by the factor \(\pi_{C}(\mathrm{Z})\), the hypotheses tree for the SMC-MOB filter is less complex than the one for the joint integrated probabilistic data association (JIPDA) algorithm in [12, 14]. The likelihood of an association θ(i) corresponds to an edge of the hypotheses tree. The value of each summand of (15.7) is calculated by multiplying the edge likelihoods from the root of the tree to the corresponding leaf. The likelihood of the measurement set \(\mathrm{Z}\) for the multi-object particle \(\mathrm{X}_{+}^{(i)}\) is obtained by accumulating the likelihoods for all paths and a subsequent multiplication with the clutter factor \(\pi_{C}(\mathrm{Z})\) and the missed detection factor \(\pi(\emptyset \vert \mathrm{X}_{+}^{(i)})\). Similarly to [12, 14], the likelihoods for all track-to-measurement assignments are calculated a priori and stored in a lookup table. The implementation of the hypotheses tree using recursion is straightforward.

Fig. 15.2 Hypotheses tree for a scenario with two objects, \(t_{1}\) and \(t_{2}\), and two measurements, \(z_{1}\) and \(z_{2}\). Each node represents an association of the object \(t_{i}\) to measurement \(z_{j}\) (i.e., θ(i) = j) or the missed detection ∅ (i.e., θ(i) = 0)
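A recursive traversal of such a hypotheses tree can be sketched as follows. The code evaluates (15.7) for one multi-object particle by depth-first enumeration; it is a minimal illustration, not the authors' implementation. The argument layout (lists `p_d`, `g`, `c` and the scalar `lam_c`) is an assumption of this sketch, and detection probabilities are assumed to be strictly below one so that the ratios in (15.7) are defined.

```python
import math


def exact_likelihood(p_d, g, lam_c, c):
    """Evaluate the multi-object likelihood (15.7) for one multi-object particle.

    p_d[i]  : detection probability of object i (assumed < 1)
    g[i][j] : single-object likelihood g(z_j | x_i)
    lam_c   : expected number of false alarms
    c[j]    : clutter density c(z_j)
    """
    n, m = len(p_d), len(c)
    # Lookup table of edge likelihoods, computed once before the traversal.
    ratio = [[p_d[i] * g[i][j] / ((1.0 - p_d[i]) * lam_c * c[j])
              for j in range(m)] for i in range(n)]

    def subtree(i, used):
        """Sum over all association paths for objects i, ..., n-1."""
        if i == n:
            return 1.0  # leaf reached: the path product is complete
        total = subtree(i + 1, used)  # theta(i) = 0: missed detection
        for j in range(m):
            if j not in used:  # each measurement is assigned at most once
                total += ratio[i][j] * subtree(i + 1, used | {j})
        return total

    pi_clutter = math.exp(-lam_c) * math.prod(lam_c * cj for cj in c)  # (15.9)
    pi_missed = math.prod(1.0 - p for p in p_d)                        # (15.8)
    return pi_clutter * pi_missed * subtree(0, frozenset())


# Two objects, two measurements; the numbers are arbitrary illustration values.
value = exact_likelihood(p_d=[0.9, 0.8],
                         g=[[1.5, 0.01], [0.02, 1.2]],
                         lam_c=0.5,
                         c=[0.1, 0.1])
print(value)
```

For two objects and two measurements the recursion visits seven complete paths: one with both objects missed, four with exactly one detection, and two with both objects detected. The number of paths grows combinatorially with the number of objects and measurements.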

The update step of the SMC-MOB filter does not affect the state of the multi-object particles, i.e. the posterior multi-object particles are identical to the predicted ones:

$$\displaystyle{ \mathrm{X}_{}^{(i)} \triangleq \mathrm{ X}_{ +}^{(i)}. }$$
(15.19)

However, the weights of the multi-object particles are updated using the multi-object likelihood function (15.7), applied to the weight \(w_{+}^{(i)}\) that each multi-object particle carries after the prediction step:

$$\displaystyle{ w_{}^{(i)} \triangleq \frac{w_{+}^{(i)} \cdot g\left (\mathrm{Z}\vert \mathrm{X}_{+}^{(i)}\right )} {\sum _{e=1}^{\nu }w_{+}^{(e)} \cdot g\left (\mathrm{Z}\vert \mathrm{X}_{+}^{(e)}\right )}. }$$
(15.20)

Here, the denominator is a normalizing constant which ensures that the weights still sum up to 1 after the update.

After several measurement updates, the weights typically tend to concentrate on one or only a few multi-object particles since the prediction step increases the variance of the particles and the update does not decrease the variance. Hence, standard resampling approaches used for SMC implementations of the single-object Bayes filter have to be applied [25, 27].
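The weight update and a subsequent resampling step can be sketched as follows. The sketch weights each likelihood by the weight the multi-object particle carried before the update; for equal prior weights, e.g. directly after resampling, this prior factor cancels in the normalization. Systematic resampling and the effective sample size are common choices from the single-object particle filter literature and are used here as assumptions; the text does not state which resampling scheme is actually employed.

```python
import random


def update_weights(prior_weights, likelihoods):
    """Multiply each weight by its multi-object likelihood and renormalize."""
    unnormalized = [w * l for w, l in zip(prior_weights, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("all multi-object particles have zero likelihood")
    return [u / total for u in unnormalized]


def effective_sample_size(weights):
    """Rule-of-thumb measure of weight degeneracy (between 1 and len(weights))."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(particles, weights, rng):
    """Systematic resampling; returns equally weighted multi-object particles."""
    nu = len(particles)
    u0 = rng.random() / nu
    resampled, cumulative, i = [], weights[0], 0
    for k in range(nu):
        u = u0 + k / nu
        while u >= cumulative and i < nu - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(list(particles[i]))  # copy the set of state vectors
    return resampled, [1.0 / nu] * nu


rng = random.Random(0)
particles = [[(0.0, 0.0)], [(0.1, 0.0)], [(0.0, 0.1), (2.0, 1.0)], []]
weights = update_weights([0.25] * 4, [1.0, 1.0, 2.0, 0.0])
if effective_sample_size(weights) < 0.75 * len(particles):  # hypothetical threshold
    particles, weights = systematic_resample(particles, weights, rng)
print(weights)
```

Resampling only when the effective sample size drops below a threshold is one possible design choice; resampling after every update is equally compatible with the description above.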

15.3.3 Modeling Object Interactions

The standard multi-object motion model given by (15.3) assumes that the motion of each object depends only on its current state and the assumed motion model, i.e. the objects are considered to be statistically independent. Especially in scenarios with closely spaced objects, this assumption leads to physically impossible multi-object states after prediction. In the context of a Companion-System, typical examples of these impossible states are multi-object particles in which two persons occupy overlapping positions.

In order to avoid invalid multi-object states, an appropriate model for human motion is required. In [6], Henderson observed correlations between fluid dynamics and human motion. However, this approach facilitates only a macroscopic formulation which, e.g., delivers the mean velocity of a group of people. In contrast, the Social Force Model proposed by Helbing and Molnar [5] uses a microscopic model to represent human motion where changes of the individual behaviors due to the current environment are modeled by attractive and repellent force vectors. Repellent forces are typically used to avoid collisions with other persons as well as static obstacles. Attractive forces are used to model the destination of a person. Further, the model is based on the knowledge that each person tries to reach its destination on the shortest path while moving with its desired velocity.

In addition to scenarios with closely spaced objects, the incorporation of object interaction is also recommended for scenarios with occlusions or in the case of low measurement rates. In Sect. 15.3.3.1, an approach based on the incorporation of physical constraints is proposed which avoids collisions of the persons and may be realized without any additional information. Further, Sect. 15.3.3.2 outlines the possibilities to improve the tracking results by using the information available in the Companion-System’s knowledge base.

15.3.3.1 Set-Based Weight Adaption

In order to obtain only valid predicted multi-object states, an incorporation of object inter-dependencies in the transition densities is required. Since evaluating such transition densities is computationally demanding, the proposed method predicts all objects within a multi-object particle independently, and a subsequent weight adaption of the multi-object particles is applied to remove invalid ones.

The weight adaption is based on the repellent forces used in the social force model, which are modeled using exponential functions [17]. In the case of circular objects with radius r p , the likelihood that a multi-object particle comprises two objects s and t follows

$$\displaystyle{ \varLambda _{d}(x_{}^{(s)},x_{}^{(t)}) = \left \{\begin{array}{ll} 0 &\text{if }d(x_{}^{(s)},x_{}^{(t)}) < 2r_{ p} \\ 1 -\exp \left (-\frac{(d(x_{}^{(s)},x_{}^{(t)})-2r_{ p})^{2}} {2\sigma _{d}^{2}} \right )&\mathrm{otherwise}, \end{array} \right. }$$
(15.21)

where \(d(x^{(s)},x^{(t)})\) denotes their Euclidean distance. Obviously, a likelihood of 0 is assigned if two objects are overlapping and the exponential function facilitates a smooth transition of the likelihood function for all distances up to the preferred inter-object distance. Afterwards, the weight of the multi-object particle \(\mathrm{X}^{(i)}\) is adapted using the minimum likelihood of all possible pairings (s, t):

$$\displaystyle{ \widetilde{w}_{+}^{(i)} =\min _{s=1,\ldots,\vert \mathrm{X}^{(i)}\vert }\left (\min _{t=1,\ldots,\vert \mathrm{X}^{(i)}\vert,\,t\neq s}\left (\varLambda _{d}(x^{(s)},x^{(t)})\right )\right ) \cdot w_{+}^{(i)}. }$$
(15.22)

The weight of a multi-object particle is set to 0 if any two of its objects are colliding. In contrast, the weight of the multi-object particle is unchanged if the distances between all of its objects are higher than the preferred distance.
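The weight adaption of (15.21) and (15.22) translates almost directly into code. The Python sketch below assumes that the first two components of each state vector are the planar position; the function names and the numerical values of \(r_{p}\) and \(\sigma_{d}\) are hypothetical and serve only as an illustration.

```python
import math
from itertools import combinations


def pair_likelihood(xs, xt, r_p, sigma_d):
    """Lambda_d of (15.21) for two circular objects with radius r_p."""
    d = math.dist(xs[:2], xt[:2])  # Euclidean distance of the positions
    if d < 2.0 * r_p:
        return 0.0  # overlapping objects: physically impossible state
    return 1.0 - math.exp(-((d - 2.0 * r_p) ** 2) / (2.0 * sigma_d ** 2))


def adapt_weight(X, w, r_p, sigma_d):
    """Weight adaption (15.22): scale w by the smallest pairwise likelihood."""
    if len(X) < 2:
        return w  # no pair of objects, nothing to penalize
    # Lambda_d is symmetric, so unordered pairs cover all pairings (s, t).
    return w * min(pair_likelihood(a, b, r_p, sigma_d)
                   for a, b in combinations(X, 2))


r_p, sigma_d = 0.25, 0.3          # hypothetical radius and spread in meters
colliding = [(0.0, 0.0), (0.3, 0.0)]
separated = [(0.0, 0.0), (5.0, 0.0)]
print(adapt_weight(colliding, 0.4, r_p, sigma_d))  # weight removed
print(adapt_weight(separated, 0.4, r_p, sigma_d))  # weight practically unchanged
```

After the adaption, the weights of the multi-object particles generally no longer sum to one; in this sketch, the normalization is left to the subsequent update step.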

15.3.3.2 Integration of Destinations Using Knowledge Base

A Companion-System comprises a multitude of different components, each of which can potentially produce and/or consume information (see Fig. 15.3). As decision making and inference across different modules has to be kept consistent, a central probabilistic knowledge base (KB) is tasked with maintaining a global filtered probabilistic belief state \(X_{\mathrm{KB}}\). Consistency requires that the local belief state \(X_{C}\) of a component C correspond to the marginalization of \(X_{\mathrm{KB}}\) over the variables not included in C. To achieve this global synchronization, probabilistic state information may flow bidirectionally between the KB and the interfacing modules. By maintaining a globally consistent belief state, the knowledge base provides mutual abstraction between all interfacing components, such that every module has to deal only with its local view of the global state. Such joint treatment of belief across components fosters synergistic effects, which may also improve the state prediction of multi-object trackers. Information originating from high-level components can be used to improve track continuity, in particular in situations featuring occlusions or low measurement rates, where the associations of the tracks between the individual time steps are ambiguous.

Fig. 15.3 Architecture of a prototypical Companion-System. The central knowledge base maintains a filtered belief state. It mediates between lower-level sensor processing modules, like the multi-object tracker and further classifiers, and high-level functionality, including decision making/planning and routines of the user interface

Basically every correlation between the global belief state \(X_{\mathrm{KB}}\) and the true location of a certain object/user can be used to improve the association quality. We can identify several sources of potentially useful information, although a large part of the model is application-dependent. As human users are supposed to interact with the system, we can harvest this interaction to gain hints about their true location. A registered touch event at a stationary device gives a strong indication that a user instead of a non-user is standing in front of the device. Further, knowledge about screen content increases the chance that a user will be moving toward the screen to read the information, even if no touch contact takes place. Besides information originating from the dialog management, we can also use knowledge obtained from planning. The planning component maintains a future course of action for the user [2] to follow. If some actions are known to be connected to a certain location, we can exploit this knowledge to improve the tracking accuracy. An exemplary situation happens when the system issues a job to a printer, and the user is supposed to fetch the produced document. Then this knowledge, in combination with knowledge of the location of the printer, can be used to disambiguate which observed object corresponds to the current user.

To exploit such hints on the true user location, one can follow two approaches. In the first approach, the multi-object tracking algorithm maintains the track labels only in cases of high confidence, trying to avoid any wrong associations. In challenging situations, this results in many spawned tracks belonging to the same object. A track-to-person association can then be maintained by the knowledge base using available background information. The feasibility of this approach has been demonstrated successfully in an experimental setting [3, 4] using a probabilistic model formulated with Markov Logic [24]. One major disadvantage is that the user position has to be discretized because of the limitations of the modeling language. The second approach is to improve the performance of the multi-object tracking algorithm itself by integrating the hints about current and future positions of human users. The (typically imprecise) knowledge about future destinations of the user may be used within the Social Force Model to improve the predicted state of the users, which is expected to significantly improve the performance in case of long-term occlusions.

15.3.4 Real-Time Implementation

An implementation of the SMC-MOB filter requires a very large number of multi-object particles to obtain a sufficient approximation of the multi-object posterior. The reason for this is that the dimension of the space of the multi-object particles is given by the dimension of the state vector times the number of objects in the scene. The prediction and update steps of the SMC-MOB filter facilitate a massively parallel implementation since the calculations for each multi-object particle do not depend on any other multi-object particle. Consequently, graphics processing units (GPUs) are well suited for the implementation of the SMC-MOB filter.

Due to the combinatorial complexity and the restrictions concerning recursive functions on GPUs, an exact computation of the multi-object likelihood function is only feasible for a limited number of tracks and measurements. The reason for the complexity is the assumption that a measurement is created by at most one object. Neglecting this assumption, the multi-object likelihood function simplifies to [19]

$$\displaystyle{ \widetilde{g_{}^{}}(\mathrm{Z}_{}^{}\vert \mathrm{X}_{+}^{}) =\pi _{ C}^{}(\mathrm{Z}_{}^{})\pi _{}^{}(\emptyset \vert \mathrm{X}_{+}^{}) \cdot \prod _{i=1}^{n}\left (1 +\sum _{ j=1}^{m}\frac{p_{D}^{}\left (x_{+}^{(i)}\right ) \cdot g_{}^{}\left (z_{ j}^{}\vert x_{+}^{(i)}\right )} {\left (1 - p_{D}^{}(x_{+}^{(i)})\right )\lambda _{c}c_{}^{}(z_{j}^{})} \right ), }$$
(15.23)

i.e. the multi-object likelihood may be calculated using two for loops and the computational complexity reduces to \(\mathcal{O}(mn)\), where n is the number of objects and m is the number of measurements. The corresponding hypotheses tree for two measurements and two tracks is depicted by Fig. 15.4. Obviously, the approximation leads to two additional nodes in the tree (marked by dashed lines) and the approximation error is negligible if each measurement has a significant likelihood for at most one object, i.e. the association hypotheses due to the two additional nodes have an insignificant contribution to the multi-object likelihood function. When object interactions are modeled as presented in Sect. 15.3.3, the approximation errors are negligible if the extent of the objects is significantly larger than the standard deviation of the measurement noise.
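The two nested loops behind (15.23) can be sketched as follows. As before, this is a minimal Python illustration with an assumed argument layout (lists `p_d`, `g`, `c` and the scalar `lam_c`), and detection probabilities are assumed to be strictly below one.

```python
import math


def approx_likelihood(p_d, g, lam_c, c):
    """Approximate multi-object likelihood (15.23) with O(mn) operations.

    p_d[i]  : detection probability of object i (assumed < 1)
    g[i][j] : single-object likelihood g(z_j | x_i)
    lam_c   : expected number of false alarms
    c[j]    : clutter density c(z_j)
    """
    result = math.exp(-lam_c) * math.prod(lam_c * cj for cj in c)  # (15.9)
    for i in range(len(p_d)):              # loop over objects
        s = 1.0                            # missed detection of object i
        for j in range(len(c)):            # loop over measurements
            s += p_d[i] * g[i][j] / ((1.0 - p_d[i]) * lam_c * c[j])
        result *= (1.0 - p_d[i]) * s       # includes the factor of (15.8)
    return result


# Two well-separated objects: each measurement is likely for one object only.
value = approx_likelihood(p_d=[0.9, 0.8],
                          g=[[1.5, 1e-6], [1e-6, 1.2]],
                          lam_c=0.5,
                          c=[0.1, 0.1])
print(value)
```

For two objects and two measurements the product expands into nine hypotheses, i.e. the seven valid associations of the exact likelihood plus the two in which both objects share one measurement. In the well-separated example above, these two extra terms contain the product of a large and a very small single-object likelihood and therefore contribute little.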

Fig. 15.4 Approximate multi-object likelihood function: compared to Fig. 15.2, two additional nodes (marked by red dashed lines) have been added which facilitate the assignment of one measurement to multiple tracks

In [21, 22], it is shown that the proposed approximation of the multi-object likelihood function facilitates a real-time capable implementation of the SMC-MOB filter using a GPU. With a total number of 25,000 multi-object particles, an Nvidia Tesla C2075 GPU processes the prediction step, the update step and the track extraction in less than 40 ms for a scenario with up to seven objects.

15.4 Labeled Multi-Bernoulli Filter

The SMC-MOB filter introduced in the previous section requires a huge number of multi-object particles since the dimension of the sample space increases linearly in the number of objects. Since the required number of multi-object particles for a sufficient state representation grows exponentially with the state dimension, the maximum number of objects in the scene is limited due to the available computational resources. Hence, alternative approaches are required to handle large numbers of objects.

In [29], Vo and Vo showed that the class of δ-generalized labeled multi-Bernoulli (δ-GLMB) RFSs is closed under the prediction and update equations of the multi-object Bayes filter for the standard multi-object motion model as well as the standard multi-object likelihood. Hence, the δ-GLMB filter [29] facilitates an analytical implementation of the multi-object Bayes filter. As with the SMC-MOB filter, the number of components required within the δ-GLMB filter grows combinatorially. The labeled multi-Bernoulli (LMB) filter [19, 23] approximates the δ-GLMB distribution by an LMB distribution, which facilitates the tracking of a huge number of objects due to the application of principled approximations.

15.4.1 Labeled Random Finite Sets

The SMC-MOB filter as well as the PHD, CPHD, and CB-MeMBer filters require a (typically heuristic) post-processing to extract object tracks out of the estimated multi-object probability density. The underlying idea of the class of labeled RFSs is to augment the state by track labels. Thus, filtering a labeled RFS over time delivers a joint estimate of the number of tracks, their individual positions as well as their trajectories.

In a labeled RFS, each object’s state \(x \in \mathbb{X}\) is augmented by a label \(\ell \in \mathbb{L}\), where \(\mathbb{L}\) is a discrete label space (e.g., the set of positive integers \(\mathbb{N}\)). Consequently, a labeled multi-object state is represented by the set \(\mathbf{X} =\{ \mathbf{x}^{(1)},\ldots,\mathbf{x}^{(n)}\}\) on the space \(\mathbb{X} \times \mathbb{L}\), where the labeled state vectors are abbreviated using \(\mathbf{x} = (x,\ell )\). In multi-object tracking applications, it is required that the object labels are distinct, i.e. a label may be assigned to at most one object in each realization. In order to ensure distinct labels within each realization of a labeled RFS \(\mathbf{X}\), the distinct label indicator [29]

$$\displaystyle{ \varDelta (\mathbf{X}) =\delta _{\vert \mathbf{X}\vert }(\vert \mathcal{L}(\mathbf{X})\vert ) }$$
(15.24)

requires the cardinality of a realization \(\mathbf{X}\) to be equal to the number of distinct track labels \(\vert \mathcal{L}(\mathbf{X})\vert \), where the set of track labels is given by the projection

$$\displaystyle{ \mathcal{L}(\mathbf{X}) =\{ \mathcal{L}(\mathbf{x}): \mathbf{x} \in \mathbf{X}\} }$$
(15.25)

with \(\mathcal{L}(\mathbf{x}) =\mathcal{L}((x,\ell )) =\ell \).
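The distinct label indicator and the label projection can be illustrated by a short Python sketch. The representation of a labeled realization as a list of `(state, label)` tuples and the function names are assumptions made for this example only.

```python
def label_projection(labeled_set):
    """Set of track labels of a realization, cf. Eq. (15.25)."""
    return {label for (_state, label) in labeled_set}


def distinct_label_indicator(labeled_set):
    """Return 1 if all labels are distinct and 0 otherwise, cf. Eq. (15.24)."""
    return 1 if len(labeled_set) == len(label_projection(labeled_set)) else 0


# Two objects with different labels form a valid realization ...
valid = [(0.5, 1), (1.5, 2)]
# ... whereas two objects sharing label 1 do not.
invalid = [(0.5, 1), (1.5, 1)]
```

The indicator evaluates to 1 for `valid` and to 0 for `invalid`, so a density that contains this factor assigns zero mass to realizations with repeated labels.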

15.4.1.1 Labeled Multi-Bernoulli Random Finite Set

The uncertainty about object existence is intuitively represented using a Bernoulli RFS \(\mathrm{X}\). With existence probability r, the Bernoulli RFS is a singleton; with probability 1 − r, it is the empty set. The probability density of a Bernoulli RFS follows [11, pp. 368]

$$\displaystyle{ \pi (\mathrm{X}) = \left \{\begin{array}{@{}l@{\quad }l@{}} 1 - r, \quad &\text{if }\mathrm{X} =\emptyset,\\ r \cdot p(x),\quad &\text{if } \mathrm{X} =\{ x\}, \end{array} \right. }$$
(15.26)

where p(x) is the spatial distribution of the object on the space \(\mathbb{X}\). The cardinality distribution thus follows a Bernoulli distribution with parameter r. A multi-Bernoulli RFS \(\mathrm{X}\) [11] is the union of M independent Bernoulli RFSs \(\mathrm{X}^{(i)}\), i.e. \(\mathrm{X} = \cup _{i=1}^{M}\mathrm{X}^{(i)}\).
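Because the M Bernoulli components are independent, the cardinality distribution of their union is the convolution of the individual Bernoulli cardinality distributions, provided that two components almost never produce the same state (which holds for continuous spatial distributions). The following Python sketch computes this distribution; the function name is hypothetical.

```python
def multi_bernoulli_cardinality(existence_probs):
    """Cardinality distribution of a multi-Bernoulli RFS.

    existence_probs holds the existence probabilities r of the M components.
    The result is a list whose entry n is the probability of n objects.
    """
    dist = [1.0]                       # zero components: surely zero objects
    for r in existence_probs:
        new = [0.0] * (len(dist) + 1)
        for n, prob in enumerate(dist):
            new[n] += prob * (1.0 - r)   # component does not exist
            new[n + 1] += prob * r       # component exists
        dist = new
    return dist
```

For example, two components with r = 0.5 each yield the probabilities 0.25, 0.5, and 0.25 for zero, one, and two objects. The mean of the distribution equals the sum of the existence probabilities.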

By interpreting the component indices of the multi-Bernoulli distribution as track labels, the LMB RFS [29] is obtained which is completely defined by the parameter set

$$\displaystyle{ \boldsymbol{\pi }(\mathbf{X}) =\{ (r^{(\ell )},p^{(\ell )})\}_{\ell \in \mathbb{L}}. }$$
(15.27)

Using the multi-object exponential notation, an LMB RFS is expressed by

$$\displaystyle{ \boldsymbol{\pi }(\mathbf{X}) =\varDelta (\mathbf{X})\,w(\mathcal{L}(\mathbf{X}))\,p^{\mathbf{X}}, }$$
(15.28)

where \(h^{\mathrm{X}} =\prod _{x\in \mathrm{X}}h(x)\) and \(h^{\emptyset } = 1\). The weights of the realizations are given by the multi-Bernoulli distribution

$$\displaystyle{ w(L) =\prod \limits _{i\in \mathbb{L}}\left (1 - r^{(i)}\right )\prod \limits _{\ell \in L}\frac{1_{\mathbb{L}}(\ell )\,r^{(\ell )}} {1 - r^{(\ell )}}, }$$
(15.29)

and the spatial distributions are \(p(x,\ell ) = p^{(\ell )}(x)\). An example of an LMB RFS is illustrated in the upper part of Fig. 15.5.
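The weight of Eq. (15.29) can be evaluated directly from the existence probabilities. The following Python sketch is a minimal illustration; the dictionary representation of the parameter set and the function names are assumptions, and all existence probabilities are assumed to be strictly smaller than one so that the quotient is defined.

```python
import math
from itertools import combinations


def lmb_weight(label_set, existence):
    """Weight w(L) of Eq. (15.29).

    existence maps every label of the label space to its existence
    probability r (assumed to satisfy r < 1).
    """
    weight = math.prod(1.0 - r for r in existence.values())
    for label in label_set:
        if label not in existence:     # inclusion function 1_L(l) is zero
            return 0.0
        r = existence[label]
        weight *= r / (1.0 - r)
    return weight


def all_label_subsets(labels):
    """Enumerate all finite subsets of the label space."""
    ordered = sorted(labels)
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            yield frozenset(subset)


existence = {1: 0.9, 2: 0.5}
total = sum(lmb_weight(s, existence) for s in all_label_subsets(existence))
```

Here, `total` equals one up to rounding, which reflects that the weights form a probability distribution over the sets of track labels.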

Fig. 15.5

Representation of the multi-object state using LMB and δ-GLMB RFSs. An LMB RFS can be equivalently rewritten in δ-GLMB form. In contrast, a δ-GLMB RFS can only be approximated using an LMB RFS

15.4.1.2 δ-Generalized Labeled Multi-Bernoulli Random Finite Set

An LMB RFS facilitates exactly one realization for a given set of track labels due to the assumption of statistical independence of the tracks. In contrast, a δ-GLMB RFS provides the possibility of several realizations for each set I of track labels. The distribution of a δ-GLMB RFS is given by

$$\displaystyle{ \boldsymbol{\pi }(\mathbf{X}) =\varDelta (\mathbf{X})\sum _{(I,\xi )\in \mathcal{F}(\mathbb{L})\times \varXi }w^{(I,\xi )}\delta _{I}(\mathcal{L}(\mathbf{X}))\left [p^{(I,\xi )}\right ]^{\mathbf{X}}, }$$
(15.30)

where ξ denotes the history of track-to-measurement associations. Thus, the δ-GLMB RFS is able to represent the ambiguity in the track-to-measurement association during the filter update using several components or hypotheses for each set of track labels.

The difference between an LMB RFS and a δ-GLMB RFS is depicted by Fig. 15.5 (observe that only a subset of all hypotheses of the δ-GLMB RFS is shown). While the LMB RFS requires the tracks to be statistically independent, the δ-GLMB RFS facilitates the representation of statistical dependencies. Since an LMB RFS is a special case of a δ-GLMB RFS, it can be transformed into the corresponding δ-GLMB representation. In contrast, a δ-GLMB RFS can only be approximated by an LMB RFS.

15.4.2 Implementation of the Labeled Multi-Bernoulli Filter

The labeled multi-Bernoulli (LMB) filter is based on the representation of the multi-object state using an LMB RFS. A complete cycle of the LMB filter is conceptually illustrated by Fig. 15.6. In the following, the main ideas behind the LMB filter are outlined for an implementation using GMs. For additional details, the derivation of the filter, and SMC implementations, refer to [19, 23].

Fig. 15.6

LMB filter schematic

In the prediction step, the Bernoulli distribution of each track is predicted independently. First, the spatial distribution of the track is predicted using the well-known Kalman filter equations. In the case of slightly non-linear motion models, the corresponding equations of the extended Kalman filter (EKF) or unscented Kalman filter (UKF) have to be applied. The prediction of the track’s existence probability is realized by multiplying the posterior existence probability with the survival probability \(p_{S}\). Finally, the tracks of the birth distribution have to be appended to the predicted LMB RFS.
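The prediction step can be sketched in Python as follows. For brevity, the sketch assumes a scalar state, a linear motion model, and a single Gaussian per track instead of a full GM; the track representation as a tuple `(r, mean, var)` and all names are hypothetical.

```python
def predict_track(r, mean, var, p_survival, transition, process_var):
    """Predict one Bernoulli track with a scalar linear-Gaussian model."""
    r_pred = p_survival * r                          # existence probability
    mean_pred = transition * mean                    # Kalman mean prediction
    var_pred = transition * var * transition + process_var  # Kalman covariance
    return r_pred, mean_pred, var_pred


def predict_lmb(tracks, births, p_survival, transition, process_var):
    """Predict all tracks independently and append the birth tracks.

    tracks and births map labels to (r, mean, var); birth labels are
    assumed to be distinct from the labels of existing tracks.
    """
    predicted = {
        label: predict_track(r, mean, var, p_survival, transition, process_var)
        for label, (r, mean, var) in tracks.items()
    }
    predicted.update(births)
    return predicted
```

Since each track is handled separately, the cost of the prediction grows only linearly with the number of tracks.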

To reduce the computational complexity of the filter update, the predicted LMB density is partitioned using a grouping procedure. The grouping procedure returns groups of closely spaced objects and their associated measurements where the groups can be assumed to be statistically independent in case of sufficiently large gating values. Thus, the filter update can be applied to each group independently, which significantly reduces the computational load [23].
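One simple way to obtain such groups is to merge all tracks that share at least one gated measurement. The following Python sketch illustrates this idea under strong simplifications and is not the grouping procedure of [23]: it assumes scalar positions and a fixed distance gate, and all names are hypothetical.

```python
def group_tracks(track_positions, measurements, gate):
    """Partition tracks into groups that share no gated measurement.

    track_positions maps labels to scalar positions; a measurement is
    gated by a track if its distance to the track is at most gate.
    Returns a list of (labels, measurement indices) pairs.
    """
    gated = {
        label: {j for j, z in enumerate(measurements) if abs(z - pos) <= gate}
        for label, pos in track_positions.items()
    }
    groups = []
    for label, indices in gated.items():
        merged_labels, merged_indices = {label}, set(indices)
        remaining = []
        for group_labels, group_indices in groups:
            if group_indices & merged_indices:   # shared measurement: merge
                merged_labels |= group_labels
                merged_indices |= group_indices
            else:
                remaining.append((group_labels, group_indices))
        groups = remaining + [(merged_labels, merged_indices)]
    return groups


groups = group_tracks({1: 0.0, 2: 1.0, 3: 10.0}, [0.5, 10.2], gate=1.0)
```

In this example, tracks 1 and 2 both gate the first measurement and therefore end up in one group, while track 3 forms a group of its own with the second measurement.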

The grouping procedure enables parallel processing of the groups during the filter update. The update of each group is performed as follows: After transforming the LMB RFS of each group to δ-GLMB form, the full δ-GLMB update [29] is applied, which results in several hypotheses for each set of track labels due to the ambiguity of the track-to-measurement association. The hypotheses are again given by the tree in Fig. 15.2, where each path from the root to a leaf represents a single association hypothesis. In order to reduce the computational load, only the k best association hypotheses are evaluated for large groups using Murty’s algorithm [15]. After calculating the updated hypotheses, the posterior δ-GLMB density of each group is approximated by an LMB RFS. The approximation matches the first moment of the δ-GLMB density, i.e. it preserves the spatial distributions and the mean of the cardinality distribution, while the cardinality distribution itself generally differs. Finally, the LMB RFSs of the groups are merged, and the subsequent track management module extracts track estimates and prunes tracks with very small existence probabilities.
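The first-moment matching of the existence probabilities can be sketched in Python as follows. The sketch assumes that the existence probability of a track is the summed weight of all hypotheses whose label set contains the track; the representation of the hypotheses as `(weight, label set)` pairs and the function name are hypothetical, and the spatial distributions are omitted. For the complete approximation, refer to [19, 23].

```python
def glmb_to_lmb_existence(hypotheses):
    """Existence probabilities of the LMB approximation.

    hypotheses is a list of (weight, label_set) pairs of a delta-GLMB
    density; the weights are normalized before they are accumulated.
    """
    total = sum(weight for weight, _labels in hypotheses)
    existence = {}
    for weight, label_set in hypotheses:
        for label in label_set:
            existence[label] = existence.get(label, 0.0) + weight / total
    return existence


hypotheses = [
    (0.6, frozenset({1, 2})),   # both tracks exist
    (0.3, frozenset({1})),      # only track 1 exists
    (0.1, frozenset()),         # no track exists
]
existence = glmb_to_lmb_existence(hypotheses)
```

In this example, the mean cardinality of the hypotheses (0.6 · 2 + 0.3 · 1 = 1.5) equals the sum of the resulting existence probabilities, whereas the probability of an empty scene changes from 0.1 to the product of the two non-existence probabilities.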

15.5 Conclusion

This chapter presented two multi-object tracking algorithms based on random finite sets, which are suitable for tracking all humans in the proximity of a Companion-System. The SMC-MOB filter facilitates the integration of object interactions as well as the information of a knowledge base in the filtering algorithm. In contrast, the LMB filter requires significantly fewer computational resources and is capable of tracking even huge numbers of objects. The table in Fig. 15.7 summarizes the differences between the two filters and illustrates that the choice of the most suitable tracking algorithm strongly depends on the scenarios that should be handled.

Fig. 15.7

Comparison of LMB and SMC-MOB filters

The presented multi-object tracking algorithms facilitate an adaptation of the Companion-System’s behavior to the current environment. Examples of this adaptation are given in the context of demonstration scenario 3 (see Chap. 25), e.g., activation of the system or including the group size in the purchase process. Further, the continuous tracking of the user provides the possibility to resume previously interrupted interactions. Other possibilities include the adaptation of the input and output modalities to the current situation.