Keywords

Background

In this article, we represent linear time-invariant (LTI) systems by their associated transfer matrix G(z). The "size" of G(z), which plays a key role in assessing the effects of uncertainty, will be measured using the \(\mathcal{H}_{\infty }\) norm, defined as \(\|G\|_{\infty }\doteq\sup _{\omega }\overline{\sigma }\left (G(e^{j\omega })\right )\), where \(\overline{\sigma }(.)\) denotes the maximum singular value. For scalar systems, this reduces to the peak value of the frequency response (i.e., the maximum gain of the system). In the matrix case, this definition takes into account both the worst-case frequency and the worst-case spatial direction. Background material on the \(\mathcal{H}_{\infty }\) norm, its computation, and its significance in the context of robust control theory is given in Sánchez-Peña and Sznaier (1998). A general coverage of linear systems theory, including alternative representations of linear systems and their associated properties, can be found, for instance, in Rugh (1996).
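For a concrete feel, the \(\mathcal{H}_{\infty }\) norm of a stable scalar system can be approximated by gridding the unit circle and taking the peak frequency-response gain. The sketch below is a frequency-gridding approximation applied to an FIR truncation of the impulse response, not one of the exact computational algorithms covered in Sánchez-Peña and Sznaier (1998); the function name and grid size are illustrative choices:

```python
import numpy as np

def hinf_norm_fir(h, n_grid=2048):
    """Approximate the H-infinity norm of a stable scalar system from an
    FIR truncation h of its impulse response, by gridding frequencies in
    [0, pi] and taking the peak of |G(e^{jw})| (for scalar systems the
    maximum singular value is just the magnitude)."""
    h = np.asarray(h, dtype=float)
    w = np.linspace(0.0, np.pi, n_grid)          # frequency grid
    k = np.arange(len(h))
    # G(e^{jw}) = sum_k h_k e^{-jwk}, evaluated on the whole grid at once
    G = h @ np.exp(-1j * np.outer(k, w))
    return float(np.max(np.abs(G)))
```

For instance, for G(z) = 1 + z^{-1} the peak gain 2 is attained at ω = 0, and the grid recovers it.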

Multiframe Tracking

A requirement common to most dynamic vision applications is the ability to track objects across frames, in order to collect the data required by a subsequent activity analysis step. Current approaches integrate correspondences between individual frames over time, using a combination of some assumed simple target dynamics (e.g., constant velocity) and empirically learned noise distributions (Isard and Blake 1998; North et al. 2000). However, while successful in many scenarios, these approaches are vulnerable to model uncertainty, occlusion, and appearance changes, as illustrated in Fig. 1.

Uncertainty and Robustness in Dynamic Vision, Fig. 1

Unscented particle filter-based tracking in the presence of occlusion

As shown next, the fragility noted above can be avoided by modeling the motion of the target as the output of a dynamical system, to be identified directly from the available data, along with bounds on the identification error. In the sequel, we consider two different cases: (i) the motion of the target is known to belong to a relatively small set of a priori known motion modalities; and (ii) no prior knowledge is available.

The case of known motion models: Consider first the case where a set of models known to span all possible motions of the target is available a priori, as is often the case with human motion. In this case, the position \(y_{k}\) of a given target can be modeled as \(y(z) = \mathcal{F}(z)e(z) +\eta (z)\), where e(z) and η(z) denote a suitable input and measurement noise, respectively, and where \(\mathcal{F}\) admits an expansion of the form \(\mathcal{F} =\overbrace{\sum \limits _{ j=1}^{N_{p}}p_{j}\mathcal{F}^{j}}^{\mathcal{F}_{p}} + \mathcal{F}_{np}\). Here the \(\mathcal{F}^{j}\) represent the (known) motion modalities of the target and \(\|\mathcal{F}_{np}\|_{\infty }\leq K\), i.e., a bound on the maximum admissible approximation error of the expansion \(\mathcal{F}_{p}\) to \(\mathcal{F}\) is available. In the remainder of this article, we will further assume that a set membership description \(\eta _{k} \in \mathcal{N}\) is available and, without loss of generality, that e(z) = 1 (i.e., the motion of the target is modeled as the impulse response of the unknown operator \(\mathcal{F}\)).

In this context, the next location of the target feature \(y_{k}\) can be predicted by first identifying the relevant dynamics \(\mathcal{F}\) and then using it to propagate its past values. In turn, identifying the dynamics entails finding an operator \(\mathcal{F}(z) \in \mathcal{S}\doteq\left \{\mathcal{F}(z): \mathcal{F} = \mathcal{F}_{p} + \mathcal{F}_{np}\right \}\) such that \(y-\eta = \mathcal{F}\), precisely the class of interpolation problems addressed in Parrilo et al. (1999). As shown there, finding such an operator reduces to solving a linear matrix inequality (LMI) feasibility problem. Once this operator is found, it can be used in conjunction with a particle (or a Kalman) filter to predict the future location of the target. Figure 2 shows the tracking results obtained using this approach. Here, we used a combination of a priori information: (i) a 5 % noise level and (ii) \(\mathcal{F}_{p} \in \text{span}\left [\frac{1} {z-1},\ \frac{z} {z-a},\ \frac{z} {(z-1)^{2}},\ \frac{z^{2}} {(z-1)^{2}},\ \frac{z^{2}-\cos \omega \,z} {z^{2}-2\cos \omega \,z+1},\ \frac{\sin \omega \,z^{2}} {z^{2}-2\cos \omega \,z+1}\right ]\), where a ∈ { 0.9, 1, 1.2, 1.3, 2} and ω ∈ { 0.2, 0.45}. The experimental information consisted of the position of the target in N = 20 frames where it was not occluded. Note that, by exploiting the predictive power of the identified model, the Kalman filter is now able to track the target past the occlusion, eliminating the need for a (more computationally expensive) particle filter.
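A rough feel for this scheme can be had by replacing the LMI-based interpolation with ordinary least squares: fit the observed positions to the impulse responses of a few assumed motion modes, then extrapolate past the occlusion. The basis functions, parameter values, and function name below are illustrative assumptions, not the exact setup used for Fig. 2:

```python
import numpy as np

def fit_and_predict(y, n_pred, a=0.9, omega=0.45):
    """Fit observed positions y to impulse responses of assumed motion
    modes (constant position, constant velocity, geometric, sinusoidal)
    and extrapolate n_pred steps.  A least-squares sketch of the
    identification-based predictor; the article's method uses LMI-based
    worst-case interpolation instead."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    k = np.arange(N + n_pred)
    basis = np.stack([np.ones(N + n_pred),     # step: constant position
                      k.astype(float),         # ramp: constant velocity
                      a ** k,                  # geometric mode
                      np.cos(omega * k),       # oscillatory modes
                      np.sin(omega * k)], axis=1)
    # coefficients p_j fitted on the observed frames only
    p, *_ = np.linalg.lstsq(basis[:N], y, rcond=None)
    return basis[N:] @ p                       # predicted future positions
```

When the true trajectory lies in the span of the basis, the fit recovers it exactly and the extrapolation is exact as well.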

Uncertainty and Robustness in Dynamic Vision, Fig. 2

Using the identified model in combination with a Kalman filter allows for robust tracking in the presence of occlusion

Unknown motion models: This case could be addressed in principle by performing a purely nonparametric worst-case identification (Parrilo et al. 1999) and then proceeding as above. However, a potential difficulty here stems from the high order of the resulting model (recall that the order of the central interpolant is the number of experimental data points). If a bound n on the order of the underlying models is available, this difficulty can be avoided by recasting the prediction problem into a rank minimization form, which in turn can be relaxed to a semi-definite optimization problem. To this effect, recall (Ding et al. 2008) that, in the absence of noise, given the 2n values \(\{\mathbf{y}_{k}\}_{k=t-2n+1}^{t}\), the next value \(\mathbf{y}_{t+1}\) is the unique solution to the following rank minimization problem:

$$\displaystyle\begin{array}{rcl} \mathbf{y}_{t+1}& =& \mathop{\arg \min }_{\mathbf{y}}\left \{\mathrm{rank}\left [\mathbf{H}_{n+1}(\mathbf{y})\right ]\right \}\;\text{where}\;\mathbf{H}_{n+1}(\mathbf{y}) \\ & \doteq & \left [\begin{array}{@{}c@{\;\;\;}c@{\;\;\;}c@{\;\;\;}c@{}} \mathbf{y}_{t-2n+1}\;\;\;&\mathbf{y}_{t-2n+2}\;\;\;&\cdots \;\;\;& \mathbf{y}_{t-n} \\ \mathbf{y}_{t-2n+2}\;\;\;&\mathbf{y}_{t-2n+3}\;\;\;&\cdots \;\;\;&\mathbf{y}_{t-n+1}\\ \vdots \;\;\; & \vdots \;\;\; & \ddots \;\;\; & \vdots \\ \mathbf{y}_{t-n+1} \;\;\;& \mathbf{y}_{t-n+2} \;\;\;&\cdots \;\;\;& \mathbf{y} \end{array} \right ]{}\end{array}$$
(1)
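In the noise-free case, the rank-deficiency property behind (1) can be made concrete with a few lines of linear algebra: a trajectory generated by an order-n model obeys a linear recursion, which can be recovered from the Hankel rows and used to extrapolate. This is only a sketch of that property, not the rank minimization itself; names and the least-squares route are illustrative:

```python
import numpy as np

def predict_next(y, n):
    """Predict the next sample of a noise-free trajectory generated by an
    order-n LTI model.  Such a trajectory satisfies a linear recursion
    y_{k+n} = c_0 y_k + ... + c_{n-1} y_{k+n-1}, which is exactly what
    makes the (n+1)-row Hankel matrix in (1) rank deficient."""
    y = np.asarray(y, dtype=float)
    # stack Hankel rows: each row is a window of n consecutive samples
    A = np.array([y[k:k + n] for k in range(len(y) - n)])
    b = y[n:]                                   # the sample each window predicts
    c, *_ = np.linalg.lstsq(A, b, rcond=None)   # recursion coefficients
    return float(y[-n:] @ c)                    # extrapolate one step
```

For example, y_k = 0.9^k + 0.5^k satisfies the order-2 recursion y_{k+2} = 1.4 y_{k+1} − 0.45 y_k, and the function recovers its next sample exactly.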

Clearly, the same result holds if multiple elements of the sequence y are missing, at the price of considering longer sequences (the total number of data points should exceed 2n). This result allows for handling both noisy and missing data (due, for instance, to occlusion), by simply solving

$$\displaystyle{ \begin{array}{l} \min _{\boldsymbol{\zeta }}\left \{\mathrm{rank}\left [\mathbf{H}(\boldsymbol{\zeta })\right ]\right \}\;\mbox{ subject to $\mathbf{v} \in \mathcal{N}_{v}$ } \\ \mbox{ where $\zeta _{i} = \left \{\begin{array}{l} \mathbf{y}_{i} -\mathbf{v}_{i}\;\mathrm{if}\;i \in \mathcal{I}_{a} \\ \mathbf{x}_{i}\;\mathrm{if}\;i \in \mathcal{I}_{m} \end{array} \right .$}\\ \end{array} }$$

\(\mathcal{I}_{a}\) and \(\mathcal{I}_{m}\) denote the sets of available (but noisy) and missing measurements, respectively, and \(\mathcal{N}_{v}\) is a set membership description of the noise v. In the case where \(\mathcal{N}_{v}\) admits a convex description, using the nuclear norm as a surrogate for rank (Fazel et al. 2003) allows for reducing this problem to a convex semi-definite program. Examples of such descriptions are balls in \(\ell ^{\infty }\), e.g., \(\mathcal{N}\doteq\{v: \vert v_{k}\vert \leq \epsilon \}\), or constraints on the norm of \(\mathbf{H}_{v}\), the Hankel matrix of the noise sequence, which under mild ergodicity assumptions are equivalent to constraints on the magnitude of the noise covariance. Figure 3 illustrates the effectiveness of this approach. As shown there, the rank minimization-based filter successfully predicts the location of the target, while a Kalman filter-based tracker fails due to the substantial occlusion.
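The surrogate itself is elementary: the nuclear norm of a matrix is the sum of its singular values, and it is the tightest convex under-approximation of rank on the spectral-norm unit ball. A minimal sketch follows; the full relaxation additionally requires an SDP solver, which is not shown here:

```python
import numpy as np

def nuclear_norm(M):
    """Nuclear norm ||M||_* = sum of singular values of M, the convex
    surrogate for rank(M) used to relax the Hankel rank minimization
    to a semi-definite program (Fazel et al. 2003)."""
    return float(np.sum(np.linalg.svd(M, compute_uv=False)))
```

For a rank-1 matrix uv^T the nuclear norm equals ||u|| ||v||, so minimizing it over the feasible noise set favors low-rank (i.e., low-order) explanations of the data.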

Uncertainty and Robustness in Dynamic Vision, Fig. 3

Trajectory prediction. Rank minimization (1) versus Kalman filtering (2)

Event Detection and Activity Classification

Using the trajectories generated by the tracking step for activity recognition entails (i) segmenting the data into homogeneous segments, each corresponding to a single activity, and (ii) classifying these activities, typically based on exemplars from a database of known activities. As shown in the sequel, both steps can be efficiently accomplished by exploiting the properties of the underlying system. The starting point is to model these activities as the output of a switched piecewise linear system. In this context, under suitable dwell time constraints, each switch (indicating a change in the underlying activity) can be identified by simply searching for points associated with discontinuities in the rank of the associated Hankel matrix, as illustrated in Fig. 4. Further, in this framework, the problem of classifying each subactivity can be recast into the behavioral model (in)validation setup shown in Fig. 5. Here \(y_{i}(\cdot )\) represents the impulse response of the (unknown) LTI system G, affected by measurement noise \(\eta _{i} \in \mathcal{N}\) and uncertainty \(\Delta _{i} \in \mathcal{D}\) that accounts for the variability intrinsic to two different realizations of the same activity. Two different time series are considered to be realizations of the same activity if there exist at least one pair \((\eta _{1},\eta _{2}) \in \mathcal{N}^{2}\), one pair \((\Delta _{1},\Delta _{2}) \in \mathcal{D}^{2}\), an LTI system G with McMillan degree at most \(n_{G}\), and suitable initial conditions \(\mathbf{x}_{1}\), \(\mathbf{x}_{2}\) resulting in the observed data. Remarkably, this model (in)validation problem can be reduced to a rank minimization form. In the simpler case where \(\Delta _{i} = 0\), the problem can be solved using the following algorithm (Sznaier and Camps 2011):

Uncertainty and Robustness in Dynamic Vision, Fig. 4

The jump in the rank of the Hankel matrix corresponds to the time instant where the subjects meet and exchange a bag

Uncertainty and Robustness in Dynamic Vision, Fig. 5

Model (in)validation setup

Next, consider the more realistic case where the trajectories are also affected by bounded model uncertainty \(\Delta\), \(\|\Delta \|_{\infty }\leq \gamma\), where γ is given as part of the a priori information. In this scenario, the internal signal z is given by \(z(t) =\zeta (t) -\eta (t),\;\eta \in \mathcal{N}\), where the signal ζ satisfies

$$\displaystyle{ y = (1 + \Delta ){\ast}\zeta,\;\mathrm{for\;some\;}\Delta \in \mathcal{D} }$$
(2)

where ∗ denotes convolution. Exploiting Theorem 2.3.6 in Chen and Gu (2000) leads to an LMI condition in the variables z, η for feasibility of (2). Thus, the only modification to Algorithm 1 required to handle model uncertainty is to incorporate this additional (convex) constraint into the rank minimization problems. Table 1 shows the results of applying this approach to two video sequences, walking and running, from the KTH database (Laptev et al. 2008). Sample frames from these sequences are shown in Fig. 6. In order to reduce the dimensionality of the data, the frames were first projected into a three-dimensional space using principal component analysis (PCA), and the resulting time series were used as the input to Algorithm 1, assuming 10 % noise and 10 % model uncertainty. As shown in Table 1, the algorithm correctly identifies the subsequences (a)–(c) as being generated by the same underlying activity (walking).

Uncertainty and Robustness in Dynamic Vision, Fig. 6

Sample frames from KTH activity video database. (a) Walking. (b) Running

Algorithm 1 Behavioral model (in)validation

Data: Noisy measurements \(y_{1},y_{2}\).

A priori information: noise description \(\eta _{i} \in \mathcal{N}\)

  1.

    Solve the following rank–minimization problems:

    \(r_{1}^{min} =\min _{\boldsymbol{\eta }_{1}}\mathrm{rank}(\mathbf{H}_{y_{1}} -\mathbf{H}_{\eta _{1}})\)

    subject to: \(\eta _{1} \in \mathcal{N}\).

    \(r_{2}^{min} =\min _{\boldsymbol{\eta }_{2}}\mathrm{rank}(\mathbf{H}_{y_{2}} -\mathbf{H}_{\eta _{2}})\)

    subject to: \(\eta _{2} \in \mathcal{N}\).

    \(r_{12}^{min} =\min _{\boldsymbol{\eta }_{1},\boldsymbol{\eta }_{2}}\mathrm{rank}([\mathbf{H}_{y_{1n}}\quad \mathbf{H}_{y_{2n}}])\)

    subject to: \(\eta _{1},\eta _{2} \in \mathcal{N}\)

    where \(\mathbf{H}_{y_{1n}} = \mathbf{H}_{y_{1}} -\mathbf{H}_{\eta _{1}}\)

    \(\mathbf{H}_{y_{2n}} = \mathbf{H}_{y_{2}} -\mathbf{H}_{\eta _{2}}\)

  2.

    The given trajectories were generated by the same LTI system with McMillan degree \(\leq n_{G}\) if and only if:

    \(r_{1}^{min} = r_{2}^{min} = r_{12}^{min} \leq n_{G}\)
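In the noise-free special case (\(\eta _{i} = 0\)), the rank minimizations in Algorithm 1 collapse to ordinary numerical ranks, and the (in)validation test can be sketched in a few lines. The row count, tolerance, and function names below are illustrative choices, not part of the article's formulation:

```python
import numpy as np

def hankel(y, rows):
    """Hankel matrix with the given number of rows built from sequence y."""
    y = np.asarray(y, dtype=float)
    cols = len(y) - rows + 1
    return np.array([y[i:i + cols] for i in range(rows)])

def numrank(M, tol=1e-8):
    """Numerical rank: count singular values above a relative tolerance."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def same_system(y1, y2, n_G, rows=None):
    """Noise-free sketch of Algorithm 1: the trajectories come from a
    common LTI model of McMillan degree <= n_G iff the ranks of the
    individual and stacked Hankel matrices agree and do not exceed n_G."""
    rows = rows or (len(y1) // 2)
    H1, H2 = hankel(y1, rows), hankel(y2, rows)
    r1, r2 = numrank(H1), numrank(H2)
    r12 = numrank(np.hstack([H1, H2]))        # joint Hankel matrix
    return r1 == r2 and r2 == r12 and r12 <= n_G
```

For example, two scaled copies of the geometric response 0.9^k share a degree-1 model and pass the test, while 0.9^k and 0.5^k raise the joint rank to 2 and are correctly invalidated.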

Uncertainty and Robustness in Dynamic Vision, Table 1 Activity classification results. Sequences (a)–(c) correspond to walking and (d) to running

Summary and Future Directions

Vision-based systems are uniquely positioned to address the needs of a growing segment of the population. Aware sensors endowed with scene analysis capabilities can prevent crime, reduce time response to emergency scenes, and render viable the concept of ultra-sustainable buildings. Moreover, the investment required to accomplish these goals is relatively modest since a large number of cameras are already deployed and networked. Arguably, at this point, one of the critical factors limiting widespread use of these systems is their potential fragility when operating in unstructured scenarios. This article illustrates the key role that control theory can play in developing a comprehensive, provably robust dynamic vision framework. In turn, computer vision provides a rich environment both to draw inspiration from and to test new developments in systems theory.

Cross-References

Recommended Reading

Details on how to select good features to track can be found in Szeliski (2010). Using dynamics to recover 3D structure from 2D data is covered in Ayazoglu et al. (2010). Finally, further details on the connection between identification and the problem of extracting actionable information from large data streams can be found, for instance, in Sznaier (2012).