1 Introduction

The problem of studying dynamical systems using image sequences (such as videos) is both important and challenging. It has applications in many areas, including video surveillance, lip reading, pedestrian tracking, hand gesture recognition, human–machine interfaces, brain functional connectivity analysis and medical diagnosis. Since the size of video data is generally very large, analyses are often performed by extracting certain low-dimensional features of interest (geometric, motion, colorimetric, etc.) from each frame and then forming temporal sequences of these features for full videos. Consequently, the analysis of videos is replaced by the analysis of longitudinal observations in a certain feature space. (Some papers (e.g., [13, 40]) discard the temporal structure by pooling all the features together, but that may represent a severe loss of information.) Since many features are naturally constrained to lie on nonlinear manifolds, the corresponding representations form parameterized trajectories on these manifolds. Examples of these manifolds include unit spheres, Grassmann manifolds [16], Lie groups [7], and the space of probability distributions.

Fig. 1: Examples of video frames in visual speech recognition (first two rows) and hand gesture classification (last two rows)

One of the most common and effective features in image analysis is the covariance matrix, as shown by applications in medical imaging [3, 31] and computer vision [15, 17, 23, 24, 39, 40]. These matrices are naturally constrained to be symmetric positive-definite matrices (SPDMs) and have also played a prominent role as region descriptors in texture classification, object detection, object tracking, action recognition and face recognition. Tuzel et al. [40] introduced the concept of covariance tracking, where they extracted a covariance matrix for each video frame and studied the temporal evolution of this matrix in the context of pedestrian tracking in videos. Since the set of SPDMs is a well-known set, denoted by \({\tilde{\varvec{\mathcal {P}}}}\) (or \({\tilde{\varvec{\mathcal {P}}}}(n)\) when the dimension of the SPDM manifold is specified as n), a video segment can be represented as a (parameterized) trajectory in \({\tilde{\varvec{\mathcal {P}}}}\). In brain functional connectivity analysis, the instantaneous connectivity, extracted from functional magnetic resonance imaging (fMRI) data, is typically represented as an SPDM [8, 9]. Therefore, a dynamic evolution of connectivity can be naturally represented as a trajectory on the set of SPDMs. Figure 1 shows some examples of video frames for the two applications studied in this paper: visual speech recognition and hand gesture classification.

One challenge in characterizing activities as trajectories comes from the variability in execution rates. The execution rate of an activity dictates the parameterization of the corresponding trajectory. The execution rates for different observations are quite different, even if the activities belong to the same class. Different execution rates imply that the trajectories go through the same sequences of points in \(\tilde{\varvec{\mathcal {P}}}\) but at different times. Consequently, directly analyzing such trajectories without temporal alignment (e.g., computing differences, or point-wise means and covariances) can be erroneous.

To make these issues precise, we develop some notation first. Let \(\alpha : [0,1] \rightarrow \tilde{\varvec{\mathcal {P}}}\) be a trajectory and let \(\gamma :[0,1] \rightarrow [0,1]\) be a positive diffeomorphism such that \(\gamma (0) = 0\) and \(\gamma (1) = 1\). This \(\gamma \) plays the role of a time-warping function, or a re-parameterization function, so that the composition \(\alpha \circ \gamma \) is now a time-warped or re-parameterized version of \(\alpha \). In other words, the trajectory \(\alpha \circ \gamma \) goes through the same set of points as \(\alpha \) but at a different rate (speed). Some kind of temporal registration is necessary to deal with this so-called phase variability.
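As a concrete, minimal illustration of time warping (a NumPy sketch, not tied to any particular manifold: we use a sampled planar curve and linear interpolation, and the helper name `warp` is ours):

```python
import numpy as np

def warp(alpha, gamma_vals, t):
    """Re-parameterize a sampled trajectory: (alpha o gamma)(t) ~ alpha(gamma(t)).

    alpha      : (n, d) array, samples of the trajectory at times t
    gamma_vals : (n,) array, gamma evaluated at t (monotone, gamma[0]=0, gamma[-1]=1)
    t          : (n,) array of sample times in [0, 1]
    """
    # Linear interpolation of each coordinate of alpha at the warped times.
    return np.stack([np.interp(gamma_vals, t, alpha[:, j])
                     for j in range(alpha.shape[1])], axis=1)

t = np.linspace(0.0, 1.0, 101)
alpha = np.stack([np.cos(np.pi * t), np.sin(np.pi * t)], axis=1)  # a planar curve
gamma = t ** 2                                  # a valid warp: gamma(0)=0, gamma(1)=1
beta = warp(alpha, gamma, t)
```

The warped trajectory `beta` visits the same set of points as `alpha` (the endpoints agree exactly), but at a different rate, so interior samples differ.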

There are two types of registration problems for trajectories. The first type is pairwise registration: let \(\alpha _1, \alpha _2: [0,1] \rightarrow \tilde{\varvec{\mathcal {P}}}\) be two trajectories; the registration of \(\alpha _1\) and \(\alpha _2\) consists of finding a time-warping \(\gamma \) such that \(\alpha _1(t)\) is optimally registered to \(\alpha _2(\gamma (t))\) for all \(t \in [0,1]\), with optimality defined using an objective function. The second type is multiple registration: let \(\alpha _1, \alpha _2,\dots , \alpha _n\) be n trajectories on \(\tilde{\varvec{\mathcal {P}}}\); we want to find time-warping functions \(\gamma _1, \gamma _2, \dots , \gamma _n\) such that, for all t, the variables \(\{\alpha _i(\gamma _i(t))\}_{i=1}^n\) are optimally registered. A solution for pairwise registration can be used to solve the multiple registration problem iteratively: for the given trajectories, first define a template trajectory and then align each given trajectory to this template in a pairwise fashion. One way of defining this template is to use the mean of the given trajectories under an appropriately chosen metric.
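The template-based iterative scheme can be sketched numerically. The toy NumPy version below works on scalar functions and replaces proper elastic pairwise alignment with a grid search over a small family of power warps \(\gamma _a(t) = t^a\) (our simplification), but it retains the alternation between template estimation and pairwise alignment:

```python
import numpy as np

def warp_samples(f, a, t):
    """Evaluate f o gamma_a at times t, where gamma_a(t) = t**a (toy warp family)."""
    return np.interp(t ** a, t, f)

def align_multiple(funcs, t, exponents=np.linspace(0.5, 2.0, 31), iters=5):
    """Toy multiple registration: alternately (1) fix a template as the
    cross-sectional mean and (2) align every function to it over a small
    warp family (a stand-in for proper pairwise elastic alignment)."""
    aligned = [f.copy() for f in funcs]
    for _ in range(iters):
        template = np.mean(aligned, axis=0)
        aligned = []
        for f in funcs:
            costs = [np.sum((warp_samples(f, a, t) - template) ** 2)
                     for a in exponents]
            aligned.append(warp_samples(f, exponents[int(np.argmin(costs))], t))
    return np.array(aligned)

t = np.linspace(0, 1, 200)
bump = lambda c: np.exp(-100 * (t - c) ** 2)
# the same bump observed under mildly different execution rates
funcs = [np.interp(t ** 0.9, t, bump(0.5)),
         np.interp(t ** 1.1, t, bump(0.5)),
         bump(0.5)]
aligned = align_multiple(funcs, t)

before = np.mean(np.var(np.array(funcs), axis=0))
after = np.mean(np.var(aligned, axis=0))
```

After registration the cross-sectional spread of the functions shrinks, which is exactly the effect one wants from removing phase variability before averaging.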

Notice that the problem of comparisons of trajectories is different from the problem of curve fitting or trajectory estimation from noisy data. Many papers have studied spline-type solutions for fitting curves to discrete, noisy data points on manifolds [20, 25, 30, 33, 37], but in this paper, we assume that the trajectories are already available through some other means.

1.1 Past Work and Its Limitations

There are very few papers in the literature for analyzing—in the sense of comparing, averaging or clustering—trajectories on nonlinear manifolds. What one may consider a very natural approach actually has limitations when seeking parameterization invariance. Let \(d_{\tilde{\varvec{\mathcal {P}}}}\) denote the geodesic distance resulting from the chosen Riemannian metric on \(\tilde{\varvec{\mathcal {P}}}\). It can be shown that the quantity \(\int _0^1 d_{\tilde{\varvec{\mathcal {P}}}}(\alpha _1(t), \alpha _2(t)) \mathrm{d}t\) forms a proper distance for all trajectories on \(\tilde{\varvec{\mathcal {P}}}\). For example, [22] uses this metric, combined with the arc-length distance on \(\mathbb {S}^2\), to cluster hurricane tracks. However, this metric does not account for the different temporal evolutions of such tracks. Handling this variability requires performing some kind of temporal alignment. It may be tempting to use the following modification of this distance to align two trajectories:

$$\begin{aligned} \inf _{\gamma \in \varGamma } \left( \int _0^1 d_{\tilde{\varvec{\mathcal {P}}}}(\alpha _1(t), \alpha _2 (\gamma (t))) \mathrm{d}t \right) , \end{aligned}$$
(1)

but this can lead to degenerate solutions (also known as the pinching problem, described for real-valued functions in [32, 35]). Pinching implies that a severely distorted \(\gamma \) is used to eliminate (or minimize) those parts of \(\alpha _2\) that do not match with \(\alpha _1\), which can be done even when \(\alpha _2\) is significantly different from \(\alpha _1\). While this degeneracy can be avoided using a regularization penalty on \(\gamma \), some other problems remain, including the fact that the solution is not symmetric.
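A discrete analogue of the criterion in Eq. (1) can be computed by dynamic programming, as in dynamic time warping (a sketch: the Euclidean norm stands in for \(d_{\tilde{\varvec{\mathcal {P}}}}\), and the name `dtw_cost` is ours). As discussed above, this unregularized criterion is asymmetric and prone to pinching; the sketch only illustrates the optimization over warps:

```python
import numpy as np

def dtw_cost(a, b):
    """Discrete analogue of  inf_gamma  sum_t d(a(t), b(gamma(t)))  by
    dynamic programming over monotone warping paths.

    a, b : (n, d) and (m, d) arrays of trajectory samples; plain Euclidean
    distance stands in for the geodesic distance d_P of the text.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # monotone warping steps: match both, or advance one side
            D[i, j] = d + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

t = np.linspace(0, 1, 50)
a = np.stack([t, np.sin(2 * np.pi * t)], axis=1)
b_same = np.stack([t ** 2, np.sin(2 * np.pi * t ** 2)], axis=1)  # a warped, same image
b_diff = np.stack([t, np.cos(2 * np.pi * t)], axis=1)            # genuinely different
```

Alignment drives the cost of the warped copy of the same curve far below that of a genuinely different curve, which is the intended effect of optimizing over \(\gamma \) in (1).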

A recent solution, presented in [38, 39], develops the concept of elastic trajectories to deal with the parameterization variability. It represents each trajectory by its transported square-root vector field (TSRVF) defined as:

$$\begin{aligned} h_{\alpha }(t) = \left( { \dot{\alpha }(t) \over \sqrt{ | \dot{\alpha }(t) |} } \right) _{\alpha (t) \rightarrow c} \in \mathcal {T}_{c}({\tilde{\varvec{\mathcal {P}}}}), \end{aligned}$$

where c is a predetermined but arbitrary reference point on \(\tilde{\varvec{\mathcal {P}}}\) and \(\rightarrow \) denotes a parallel transport of the vector \(\dot{\alpha }(t)\) from the point \(\alpha (t)\) to c along a geodesic path. A trajectory is mapped into a curve in the tangent space \(\mathcal {T}_c(\tilde{\varvec{\mathcal {P}}})\) and one can compare/align these curves using the \(\mathbb {L}^2\) norm, denoted by \(\Vert \cdot \Vert \), on that vector space. More precisely, the quantity \(\inf _{\gamma } \Vert h_{\alpha _1} - h_{\alpha _2 \circ \gamma }\Vert \) provides not only a criterion for optimality of \(\gamma \) but also a proper metric for averaging and other statistical analyses. This TSRVF representation is an extension of the SRVF framework used for elastic shape analysis of curves in Euclidean spaces [36]. There are two main limitations of this mathematical representation. One is that the choice of reference point, c, is left arbitrary. The results can potentially change with c, which makes it difficult to interpret the results. A bigger issue is that the transport of tangent vectors \(\dot{\alpha }(t)\) to c, along geodesics, can introduce large distortion, especially when \(\alpha (t)\) is far from c on the manifold.

Since our original formulation [41], Brigant et al. [5, 6] have also used a similar Riemannian structure for comparing trajectories. However, their representations are based on a direct analysis of the vector fields \({ \dot{\alpha }(t) \over \sqrt{ | \dot{\alpha }(t) |} }\), i.e., without any parallel translation, and the space of such representations is the space of trajectories in the full tangent bundle of the manifold. As described next, the proposed representation in our paper is a curve in a tangent space and, thus, the space of representations is a vector bundle, a proper subset of the tangent bundle used in [5, 6]. Consequently, the resulting Riemannian metrics and geodesic paths are different in the two sets of work. A major limitation of [5, 6] is that, while they use parameterization-invariant metrics, they do not explicitly solve for the temporal registration across trajectories. This registration is, in fact, the main reason for choosing invariant metrics in the first place, and is a major contribution of the current paper.

1.2 Our Approach

We introduce a novel mathematical representation of trajectories that does not require a choice of c. In this representation, the trajectories are still represented by their transported vector fields but not at a global reference point. For each trajectory \(\alpha _i\), the reference point is chosen to be its starting point \(\alpha _i(0)\), and the transport is performed along the trajectory itself. In other words, for each t, the velocity vector \(\dot{\alpha }_i(t)\) is transported along \(\alpha _i\) to the tangent space of the starting point \(\alpha _i(0)\). As a consequence, the trajectory \(\alpha _i\) gets mapped into a curve in the tangent space \(\mathcal {T}_{\alpha _i(0)}(\tilde{\varvec{\mathcal {P}}})\). This idea has been used previously, e.g., in [25], for some shape manifolds, and results in a relatively stable curve with smaller distortions than the TSRVFs of [38]. However, these previous papers do not provide re-parameterization invariance in their analysis. In contrast, we develop a metric-based framework for comparing, averaging, and modeling such curves in a manner that is invariant to their re-parameterizations. Consequently, this framework provides a natural solution for removal of rate, or phase, variability from trajectory data.

The main contributions of this paper are:

  1. Provides a novel representation for parameterization-invariant analysis of trajectories on manifolds. It results in a significant improvement over [38, 39] in the sense that the new representation forms a vector bundle of the manifold, rather than a predetermined tangent space.

  2. Introduces a re-parameterization invariant metric on the vector bundle and uses that metric to generate temporal alignments and rate-invariant sample summaries of trajectories on manifolds.

  3. Provides efficient algorithms for computation of geodesic paths under the chosen metric.

  4. Demonstrates these ideas by analyzing covariance trajectories from video-based action recognition and dynamic brain functional connectivity analysis.

The rest of this paper is organized as follows. In Sect. 2, we introduce our framework for aligning, averaging and comparing trajectories on a general manifold M. Since we mainly focus on covariance trajectories as an application, in Sect. 4, we introduce a Riemannian structure on \(\tilde{\varvec{\mathcal {P}}}\). Sections 5–8 demonstrate the proposed ideas with simulated and real data, involving video-based action recognition and dynamic functional brain network analysis.

2 Analysis of Trajectories on Manifolds

In this section, we derive a framework for comparing trajectories on a general Riemannian manifold M.

2.1 Representation of Trajectories

Let \(\alpha \) denote a piecewise \(C^1\) trajectory on a Riemannian manifold M. That is, \(\alpha :[0,1]\rightarrow M\) is such that there exist finitely many points \(0=t_0<t_1<\cdots <t_n=1\) for which \(\alpha \) is a \(C^1\) curve on each \([t_{i-1},t_i]\) (with one-sided derivatives at the endpoints). Let \(\mathcal{F}_p\) be the set of all such piecewise \(C^1\) trajectories starting at p, and let \(\mathcal{F} = \coprod _{p \in M} \mathcal{F}_p\).

Define \(\varGamma \) to be the set of all orientation preserving diffeomorphisms of [0, 1]: \(\varGamma =\{\gamma :[0,1]\rightarrow [0,1]|\gamma (0)=0,\ \gamma (1)=1,\ \gamma \,\) is a diffeomorphism\(\}\). \(\varGamma \) forms a group under the composition operation. If \(\alpha \) is a trajectory on M, then \(\alpha \circ \gamma \) is a trajectory that follows the same sequence of points as \(\alpha \) but at the evolution rate governed by \(\gamma \). More technically, the group \(\varGamma \) acts on \(\mathcal{F}\), \(\mathcal{F} \times \varGamma \rightarrow \mathcal{F}\), according to \((\alpha *\gamma )=\alpha \circ \gamma \).

We introduce a new representation of trajectories that will be used to compare and register them. We assume that for any two points \(\alpha (\tau _1),\alpha (\tau _2) \in M, \tau _1 \ne \tau _2\), we have a mechanism for parallel transporting any vector \(v \in \mathcal {T}_{\alpha (\tau _1)}(M)\) along \(\alpha \) from \(\alpha (\tau _1)\) to \(\alpha (\tau _2)\), denoted by \((v)_{\alpha (\tau _1) \rightarrow \alpha (\tau _2)}\).

Definition 1

Let \(\alpha : [0, 1] \rightarrow M\) denote a piecewise \(C^1\) trajectory starting with \(p=\alpha (0)\). Given a trajectory \(\alpha \), and the velocity vector field \(\dot{\alpha }\), define its transported square-root vector field (TSRVF) to be a scaled parallel transport of the vector field along \(\alpha \) to the starting point p according to: for each \(\tau \in [0,1]\), \(q(\tau ) = \left( { \dot{\alpha }(\tau ) \over \sqrt{ | \dot{\alpha }(\tau ) |} }\right) _{\alpha (\tau ) \rightarrow p} \in \mathcal {T}_{p}(M) \ \), where \(| \cdot |\) denotes the norm that is defined through the Riemannian metric on M.

This representation is motivated by some similar but distinct ideas used in the past literature. Firstly, it relates to the notion of unrolling introduced by Jupp and Kent [20] for spherical manifolds. Starting with a piecewise \(C^1\) curve \(\alpha \) on a sphere, they constructed a curve in \(\mathbb {R}^2\), called the unrolling of \(\alpha \), as follows. They define the unrolled curve as the integral of the curve in \(\mathcal {T}_p(M)\) generated by the parallel translation of \(\dot{\alpha }(t)\) along \(\alpha \) to p. That is, \(\nu : [0,1] \rightarrow \mathcal {T}_p(M)\) is the unrolling of \(\alpha \), where \(\nu (t) = \int _0^t (\dot{\alpha }(s))_{\alpha (s) \rightarrow p}\, \mathrm{d}s\). The difference between unrolling and TSRVF is the use of \(\sqrt{|\dot{\alpha }(t)|}\) in the denominator of TSRVF and the extra integral present in unrolling. Secondly, it is similar to the TSRVF in [38], with the difference that in [38] the transport was along geodesics to a reference point c, whereas here the parallel transport is along \(\alpha \) (to the starting point p). This reduces distortion in the representation relative to the parallel transport of [38] to a faraway reference point.
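For intuition, the TSRVF of Definition 1 can be approximated numerically. The sketch below works on \(M = \mathbb {S}^2\), where parallel transport along the minimizing geodesic between two (non-antipodal) points has a closed form; the stepwise transport along the sampled curve and the helper names are ours:

```python
import numpy as np

def transport_sphere(v, p, q):
    """Parallel transport of tangent vector v from p to q along the minimizing
    geodesic on the unit sphere (closed form; assumes q != -p)."""
    c = np.dot(p, q)
    return v - (np.dot(q, v) / (1.0 + c)) * (p + q)

def tsrvf(alpha, dt):
    """Discrete TSRVF of a sampled spherical trajectory alpha (n, 3):
    velocities are scaled by 1/sqrt(speed) and transported back to the
    starting point alpha[0], step by step along the sampled curve."""
    n = len(alpha)
    q = np.zeros((n - 1, 3))
    for i in range(n - 1):
        v = (alpha[i + 1] - alpha[i]) / dt          # finite-difference velocity
        v -= np.dot(v, alpha[i]) * alpha[i]         # project onto T_{alpha[i]}(S^2)
        speed = np.linalg.norm(v)
        if speed > 0:
            v = v / np.sqrt(speed)
        # transport back to alpha[0] along the sampled curve, one step at a time
        for j in range(i, 0, -1):
            v = transport_sphere(v, alpha[j], alpha[j - 1])
        q[i] = v
    return q

t = np.linspace(0, np.pi / 2, 50)
alpha = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)  # great-circle arc
q = tsrvf(alpha, t[1] - t[0])
```

For this unit-speed arc, every \(q(\tau )\) lies (numerically) in the tangent space at the starting point and has norm \(\sqrt{|\dot{\alpha }|} \approx 1\), as the definition prescribes.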

This TSRVF representation maps a trajectory \(\alpha \) on M to a curve q in \(\mathcal {T}_p(M)\). For any point \(p \in M\), let \(\mathbb {B}_p\) be the set of functions on the tangent space \(\mathcal {T}_p(M)\) of the following type: \(v:[0,1]\rightarrow \mathcal {T}_p(M)\) is in \(\mathbb {B}_p\) if there are finitely many points \(0=t_0<t_1<\cdots <t_n=1\) such that, on each \([t_{i-1},t_i)\), v is continuous and the left limit \(\lim _{t\rightarrow t_i^-}v(t)\) exists. The space of interest, then, becomes an infinite-dimensional vector bundle \(\mathbb {B}= \coprod _{p \in M} \mathbb {B}_p\), the indexed union of \(\mathbb {B}_p\) over all \(p\in M\). We note in passing that \(\mathbb {B}_p\) is a subspace of the Hilbert space \(\mathbb {L}^2([0, 1], \mathcal {T}_p(M))\), the set of all square-integrable curves in \(\mathcal {T}_p(M)\).

There is a bijection between \(\mathcal{F}_p\) and \(\mathbb {B}_p\). This result is straightforward except for the following point. If \(\alpha \in \mathcal{F}_p\) has a bend at \(t_0\), then \(\dot{\alpha }(t_0)\) does not exist. To define the corresponding TSRVF q, we can take \(q(t_0)=\lim _{t\rightarrow t_0+}\Big (\dot{\alpha }(t)/\sqrt{|\dot{\alpha }(t)|}\Big )_{\alpha (t) \rightarrow p}\), and the resulting \(q\in \mathbb {B}_p\). As a corollary to this result, the TSRVF representation is bijective: any \(\alpha \in \mathcal{F}\) is uniquely represented by a pair \((p, q) \in \mathbb {B}\), where \(p \in M\) is the starting point and \(q \in \mathbb {B}_p\) is its TSRVF. We can reconstruct the trajectory from \((p, q)\) using the covariant integral (see Algorithm 3 for a numerical implementation).

2.2 Riemannian Structure on \(\mathbb {B}\)

In order to compare trajectories, we will compare their corresponding representations in \(\mathbb {B}\) and that requires a Riemannian structure on \(\mathbb {B}\). Let \(\alpha _1, \alpha _2\) be two trajectories on M, with starting points \(p_1\) and \(p_2\), respectively, and let the corresponding TSRVFs be \(q_1\) and \(q_2\). Now \(\alpha _1, \alpha _2\) are represented as two points in the vector bundle \((p_1,q_1),(p_2,q_2) \in \mathbb {B}\) over M. This representation space is an infinite-dimensional vector bundle, whose fiber over each point p in M is \(\mathbb {B}_p\).

We impose the following Riemannian structure on \(\mathbb {B}\). For an element \((x, v)\) in \(\mathbb {B}\), where \(x\in M\), \(v \in \mathbb {B}_x\), we naturally identify the tangent space at \((x, v)\) to be: \(\mathcal {T}_{(x,v)}{(\mathbb {B})} \cong \mathcal {T}_x(M)\oplus \mathbb {B}_x\). To see this, suppose we have a curve in \(\mathbb {B}\) given by \((x(s),v(s,\tau ))\), \(s, \tau \in [0,1]\). The velocity vector to this curve at \(s=0\) is given by \((x_s(0),\nabla _{x_s}v(0,\cdot ))\in \mathcal {T}_x(M)\oplus \mathbb {B}_x\), where \(x_s\) denotes \(\mathrm{d}x/\mathrm{d}s\), and \(\nabla _{x_s}\) denotes covariant differentiation of tangent vectors. The Riemannian inner product on \(\mathbb {B}\) is defined in the obvious way: If \((u_1,w_1(\cdot ))\) and \((u_2,w_2(\cdot ))\) are both elements of \(\mathcal {T}_{(x,v)}(\mathbb {B})\cong \mathcal {T}_x(M)\oplus \mathbb {B}_x\), define

$$\begin{aligned} \langle (u_1,w_1(\cdot )),(u_2,w_2(\cdot ))\rangle = (u_1\cdot u_2)\,+\,\int _0^1 (w_1(\tau )\cdot w_2(\tau ))\,\mathrm{d}\tau , \end{aligned}$$
(2)

where the inner products on the right denote the original Riemannian metric in \(\mathcal {T}_x(M)\).
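A discretized version of the inner product in Eq. (2) is straightforward (a sketch; the Euclidean dot product stands in for the Riemannian metric on \(\mathcal {T}_x(M)\), and the function name is ours):

```python
import numpy as np

def bundle_inner(u1, w1, u2, w2, dtau):
    """Discrete version of the metric in Eq. (2) on T_{(x,v)}(B):
    <(u1,w1),(u2,w2)> = u1.u2 + int_0^1 w1(tau).w2(tau) dtau.

    u1, u2 : (d,) tangent vectors at x in M
    w1, w2 : (n, d) sampled curves of tangent vectors (elements of B_x)
    """
    return float(np.dot(u1, u2) + np.sum(w1 * w2) * dtau)

n, d = 100, 3
dtau = 1.0 / n
u = np.array([1.0, 0.0, 0.0])
w = np.ones((n, d))

# squared norm of (u, w): |u|^2 + int |w|^2 dtau = 1 + 3 = 4
val = bundle_inner(u, w, u, w, dtau)
```

The two summands correspond exactly to the two components of the geodesic distance discussed later (a base-manifold term and an \(\mathbb {L}^2\) term on the fiber).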

Given two points \((p_1,q_1)\) and \((p_2,q_2)\) in \(\mathbb {B}\), we want to find the geodesic path connecting them. Let \((x(s), v(s,\cdot )),\ s\in [0,1]\), be a path with \(\left( x(0),v(0,\cdot ) \right) = (p_1, q_1)\) and \(\left( x(1),v(1,\cdot ) \right) = (p_2, q_2)\). We have the following characterization of geodesics on \(\mathbb {B}\).

Theorem 1

A parameterized path \([0,1]\rightarrow \mathbb {B}\) given by \(s \mapsto (x(s),v(s,\tau ))\) (where the variable \(\tau \) corresponds to the parameterization in \(\mathbb {B}_x\)), is a geodesic in \(\mathbb {B}\) if and only if:

$$\begin{aligned} \nabla _{x_s}x_s + \int _0^1 R(v,\nabla _{x_s} v)(x_s)\,\mathrm{d}\tau&= 0 \quad \text { for every } s,\\ \nabla _{x_s}(\nabla _{x_s}v)(s,\tau )&= 0 \quad \text { for every } s, \tau . \end{aligned}$$
(3)

Here \(R(\cdot ,\cdot )(\cdot )\) denotes the Riemannian curvature tensor, \(x_s\) denotes \(\mathrm{d}x/\mathrm{d}s\), and \(\nabla _{x_s}\) denotes the covariant differentiation of tangent vectors on tangent space \(\mathcal {T}_{x(s)}(M)\).

Proof

We will prove this theorem in two steps. (1) First, we consider a simpler case where the space of interest is the tangent bundle TM of the Riemannian manifold M. An element of TM is denoted by \((x, v)\), where \(x\in M\) and \(v \in \mathcal {T}_x(M)\). It is natural to identify \(\mathcal {T}_{(x,v)}(TM)\cong \mathcal {T}_x(M) \oplus \mathcal {T}_x(M)\). The Riemannian inner product on TM is defined in the obvious way: If \((u_1,w_1)\) and \((u_2,w_2)\) are both elements of \(\mathcal {T}_{(x,v)}(TM)\), define

$$\begin{aligned} \langle (u_1,w_1),(u_2,w_2)\rangle =u_1\cdot u_2+w_1\cdot w_2 \end{aligned}$$

and, again, the inner products on the right denote the original Riemannian metric on M. Suppose we have a path in \([0, 1] \rightarrow TM\) given by \(s\mapsto (x(s),v(s))\). We define the energy of this path by

$$\begin{aligned} E=\int _0^1 (x_s\cdot x_s + \nabla _{x_s}v\cdot \nabla _{x_s}v) \mathrm{d}s. \end{aligned}$$

The integrand is the inner product of the velocity vector of the path with itself. It is a standard result that a geodesic on TM can be characterized as a path that is a critical point of this energy function on the set of all paths between two fixed points in TM. To derive local equations for this geodesic, we now assume we have a parameterized family of paths denoted by \((x(s,t), v(s,t))\), where s is the parameter of each individual path in the family (as above) and the variable t tells us which path in the family we are in. Assume \(0\le s\le 1\) and t takes values in \((-\delta ,\delta )\) for some small \(\delta > 0\). We want all the paths in this family to start and end at the same points of TM, so assume that (x(0, t), v(0, t)) and (x(1, t), v(1, t)) are constant functions of t. The energy of the path with index t is given by:

$$\begin{aligned} E(t)=\int _0^1 (x_s\cdot x_s + \nabla _{x_s}v\cdot \nabla _{x_s}v) \mathrm{d}s\ . \end{aligned}$$

To simplify notation in what follows, we will write \(\nabla _s\) for \(\nabla _{x_s}\) and \(\nabla _t\) for \(\nabla _{x_t}\). To establish conditions for (xv) to be critical, we take the derivative of E(t) with respect to t at \(t=0\):

$$\begin{aligned} E'(0)=2\int _0^1 [(\nabla _tx_s\cdot x_s)+(\nabla _t(\nabla _s v)\cdot \nabla _s v)]\mathrm{d}s. \end{aligned}$$

We will use two elementary facts: (a) \(\nabla _t(x_s)=\nabla _s(x_t)\) and (b) \(R(x_t,x_s)(v)=\nabla _t(\nabla _s v)-\nabla _s(\nabla _t v)\), without presenting their proofs. Plugging these facts into the above calculation, we get \(E'(0)\) to be:

$$\begin{aligned}&2\int _0^1[\nabla _s x_t\cdot x_s+R(x_t,x_s)(v)\cdot \nabla _sv+\nabla _s(\nabla _tv)\cdot \nabla _sv]\mathrm{d}s\\&\quad =2\!\int _0^1 [(-\nabla _sx_s\cdot x_t)+R(x_t,x_s)(v)\cdot \nabla _sv+(-\nabla _s(\nabla _sv)\cdot \nabla _t v)]\mathrm{d}s. \end{aligned}$$

The second equality comes from using integration by parts on the first and third term, taking into account the fact that \(x_t\) and \(\nabla _tv\) vanish at \(s=0,1\), (since all the paths begin and end at the same point). Now, using the standard identities \(R(X,Y)(Z)\cdot W= R(Z,W)(X)\cdot Y\) and \(R(X,Y)(Z)\cdot W=-R(X,Y)(W)\cdot Z\), we obtain:

$$\begin{aligned} E'(0)= & {} 2\int _0^1 [(-\nabla _sx_s\cdot x_t)+(-R(v,\nabla _sv)(x_s)\cdot x_t) \\&+\,(-\,\nabla _s(\nabla _sv)\cdot \nabla _t v)]\mathrm{d}s \\= & {} -2\int _0^1[(\nabla _sx_s+R(v,\nabla _sv)(x_s))\cdot x_t\\&+\,(\nabla _s(\nabla _sv)\cdot \nabla _tv)]\mathrm{d}s \\= & {} -2\int _0^1 (\nabla _sx_s+R(v,\nabla _sv)(x_s))\cdot x_t \,\mathrm{d}s \\&-\,2\int _0^1 \nabla _s(\nabla _sv)\cdot \nabla _tv\, \mathrm{d}s. \end{aligned}$$

Now, (x(s), v(s)) is critical for E if and only if \(E'(0)=0\) for every possible variation \(x_t\) of x and \(\nabla _t(v)\) of v, which is clearly true if and only if

$$\begin{aligned} \nabla _sx_s+R(v,\nabla _sv)(x_s)=0\hbox { and }\nabla _s(\nabla _sv)=0. \end{aligned}$$

Thus we have derived the geodesic equations for TM.

(2) Now we consider the case of the infinite-dimensional vector bundle \(\mathbb {B}\rightarrow M\) whose fiber over \(x \in M\) is \(\mathbb {L}^2(I,\mathcal {T}_x(M))\), \(I = [0,1]\). A point in \(\mathbb {B}\) is denoted by \((x,v(\tau ))\), where the variable \(\tau \) corresponds to the I-parameter in \(\mathbb {L}^2(I,\mathcal {T}_x(M))\). The tangent space to \(\mathbb {B}\) at \((x,v(\tau ))\) is \(\mathcal {T}_x(M)\oplus \mathbb {L}^2(I,\mathcal {T}_x(M))\). Suppose \((u_1,w_1(\tau ))\) and \((u_2,w_2(\tau ))\) are elements of this tangent space and we use the Riemannian metric:

$$\begin{aligned} \langle (u_1,w_1(\tau )),(u_2,w_2(\tau ))\rangle =u_1\cdot u_2+\int _0^1 w_1(\tau )\cdot w_2(\tau )\,\mathrm{d}\tau . \end{aligned}$$

Now we want to work out the local equations for geodesics in \(\mathbb {B}\). A path in \(\mathbb {B}\) is denoted by \((x(s),v(s,\tau ))\). The energy calculation is basically the same as above, except that everything is surrounded by an integration with respect to \(\tau \). So, it starts out with

$$\begin{aligned} E= & {} \int _0^1 \left( x_s\cdot x_s + \int _0^1 \nabla _{s}v\cdot \nabla _{s}v \,\mathrm{d}\tau \right) \mathrm{d}s\\= & {} \int _0^1\int _0^1\left( x_s\cdot x_s+\nabla _{s}v\cdot \nabla _{s}v\right) \,\mathrm{d}s\mathrm{d}\tau . \end{aligned}$$

(Of course \(x_s\cdot x_s\) does not involve the parameter \(\tau \), but surrounding it with \(\int _0^1\dots \mathrm{d}\tau \) does not change its value!)

In order to perform variational calculus, we now consider a parameterized family of such paths, denoted by \((x(s,t),v(s,t,\tau ))\) where we assume that x(0, t) and x(1, t) are constant functions of t, and for each \(\tau \), \(v(0,t,\tau )\) and \(v(1,t,\tau )\) are constant functions of t, since we want every path in our family to start and end at the same points of \(\mathbb {B}\).

Then, following through the computation exactly as in the earlier case, we obtain

$$\begin{aligned} E'(0)= & {} -2\int _0^1 \left( \nabla _sx_s+\int _0^1R(v,\nabla _sv)(x_s)\,\mathrm{d}\tau \right) \cdot x_t \,\mathrm{d}s \\&-\,2\int _0^1\int _0^1 \nabla _s(\nabla _sv)\cdot \nabla _tv\, \mathrm{d}\tau \mathrm{d}s. \end{aligned}$$

In order for our path \((x(s),v(s,\tau ))\) to be critical for E, \(E'(0)\) must vanish for every variation \(x_t(s)\) of x(s) and \(\nabla _t(v(s,\tau ))\) of \(v(s,\tau )\), which is clearly true if and only if

$$\begin{aligned}&\nabla _sx_s+\int _0^1R(v,\nabla _sv)(x_s)\,\mathrm{d}\tau = 0,\ \hbox { for every }s \\&\nabla _s(\nabla _sv) = 0,\ \hbox { for every }s\hbox { and every }\tau . \end{aligned}$$

\(\square \)

The geodesic path \((x(s),v(s,\tau ))\) can be intuitively understood as follows: (1) x(s) is a baseline curve on M connecting \(p_1\) and \(p_2\), and the covariant derivative of \(x_s\) in the tangent space \(\mathcal {T}_{x(s)}(M)\) equals the negative integral of the Riemannian curvature tensor \( R(v(s,\tau ),\nabla _{x_s} v(s,\tau ))(x_s)\) with respect to \(\tau \). In other words, the values of v at each \(\tau \) contribute equally to the geodesic acceleration of x(s) in the first equation. (2) The second equation implies that v is covariantly linear, i.e., \(v(s,\tau ) = a(s,\tau )+sb(s,\tau )\) with \(\nabla _{x_s} a = \nabla _{x_s} b = 0\) for every s and \(\tau \). For a geodesic path connecting \((p_1,q_1)\) and \((p_2,q_2)\), it is natural to let \(a(s,\tau ) = q_1(\tau )_{x(0) \rightarrow x(s)}\) and \(b(s,\tau ) = w(\tau )_{x(0) \rightarrow x(s)}\), where \(q_1(\tau )_{x(0) \rightarrow x(s)}\) and \(w(\tau )_{x(0) \rightarrow x(s)}\) denote the parallel transports of \(q_1(\tau )\) and \(w(\tau )\) along x from x(0) to x(s), and w is the difference between the TSRVFs \(q_2\) and \(q_1\) in \(\mathcal {T}_{x(0)}(M)\), defined as \((q_2)_{x(1) \rightarrow x(0)} - q_1\). In Fig. 2, we illustrate geodesic paths between some arbitrary trajectories on \(M = \mathbb {S}^2\). In each case, the solid line denotes the baseline x(s) and the intermediate lines are the covariant integrals (in Algorithm 3) of \(v(s,\cdot )\) with starting point x(s). As a comparison, the dashed line shows the standard geodesic curve between the starting points \(p_1\) and \(p_2\) in \(\mathbb {S}^2\).

Fig. 2: Examples of geodesics between trajectories on \(\mathbb {S}^2\). The solid line denotes the baseline x(s), and the dashed line shows the geodesic on \(\mathbb {S}^2\) for comparison

Theorem 1 is only a characterization of geodesics and does not provide explicit expressions for computing them. In the following section, we develop a numerical solution for constructing geodesics in \(\mathbb {B}\).

2.3 Numerical Computations of Geodesics in \(\mathbb {B}\)

Here we develop a numerical approach for computing geodesic paths in the representation space. To simplify discussion, we will assume that the original trajectories on M are not only piecewise \(C^1\) but also piecewise geodesic. This implies that the corresponding TSRVFs are piecewise constant. (This restriction was also discussed for unrolling of spherical curves in [20].) Therefore, our focus in this section will be on piecewise constant TSRVFs.

There are two main approaches in numerical construction of geodesic paths on manifolds. The first approach, called path straightening, initializes the search with an arbitrary path, between the given two points on the manifold, and then iteratively “straightens” it until a geodesic is reached. The second approach, called the shooting method, tries to “shoot” a geodesic from the first point, iteratively adjusting the shooting direction, so that the resulting geodesic passes through the second point. In this paper, we use the shooting method to construct geodesic paths in \(\mathbb {B}\).

In order to implement the shooting method, we need the exponential map on \(\mathbb {B}\). Given a point \((p,q) \in \mathbb {B}\) and a tangent vector \((u,w) \in \mathcal {T}_{(p,q)}(\mathbb {B})\), the exponential map \(\exp _{(p,q)}\left( s(u,w) \right) \) for \(s \in [0,1]\) gives a geodesic path (x(s), v(s)) in \(\mathbb {B}\). Equation 3 helps us with this construction as follows. The two equations essentially provide expressions for second-order covariant derivatives of x and v components of the path. Therefore, using numerical techniques, we can perform covariant integration of these quantities to recover the path itself.

We assume that v and w are piecewise constant over the same partition of [0, 1]. Furthermore, using the re-parameterization group introduced later, we can also assume that this partition is a uniform partition of [0, 1]. Also, we point out that addition and subtraction of piecewise constant functions with identical partitions simplify to these operations restricted to only the midpoints of the intervals.

Algorithm 1 (pseudocode figure not reproduced)

In this setup, Algorithm 1 corresponds to Euler's method for the numerical integration of an ordinary differential equation and, thus, admits a standard convergence analysis.
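To make the construction concrete, here is a minimal Euler integration of the geodesic equations (3) for \(M = \mathbb {S}^2\), where the curvature tensor is \(R(X,Y)Z = \langle Y,Z\rangle X - \langle X,Z\rangle Y\) and both the exponential map and parallel transport have closed forms. This is our sketch of the idea behind Algorithm 1, not the paper's implementation; the test case uses a zero TSRVF, so the curvature term vanishes and the baseline reduces to an ordinary sphere geodesic:

```python
import numpy as np

def sphere_exp(x, u):
    """Exponential map on the unit sphere."""
    nu = np.linalg.norm(u)
    if nu < 1e-12:
        return x
    return np.cos(nu) * x + np.sin(nu) * u / nu

def transport(v, p, q):
    """Closed-form parallel transport on the sphere along the geodesic p -> q."""
    return v - (np.dot(q, v) / (1.0 + np.dot(p, q))) * (p + q)

def curvature(X, Y, Z):
    """Curvature tensor of the unit sphere: R(X,Y)Z = <Y,Z>X - <X,Z>Y."""
    return np.dot(Y, Z) * X - np.dot(X, Z) * Y

def exp_bundle(p, q, u, w, steps=100):
    """Euler integration of the geodesic equations (3) on B over M = S^2.

    p : (3,) starting point;  q : (n, 3) TSRVF samples in T_p(S^2)
    u : (3,) tangent at p;    w : (n, 3) tangent samples in B_p
    Returns the endpoint (x(1), v(1, .)) of s -> exp_{(p,q)}(s(u,w)).
    """
    ds = 1.0 / steps
    dtau = 1.0 / len(q)
    x, xs, v, vs = p.copy(), u.copy(), q.copy(), w.copy()
    for _ in range(steps):
        # first equation of (3): covariant acceleration of the baseline
        acc = -sum(curvature(v[k], vs[k], xs) for k in range(len(v))) * dtau
        x_new = sphere_exp(x, ds * xs)
        # transport all state to the new base point, then Euler-update
        xs = transport(xs + ds * acc, x, x_new)
        v = np.array([transport(v[k] + ds * vs[k], x, x_new) for k in range(len(v))])
        vs = np.array([transport(vs[k], x, x_new) for k in range(len(vs))])
        x = x_new
    return x, v

p = np.array([1.0, 0.0, 0.0])
q = np.zeros((10, 3))                      # zero TSRVF: degenerate test case
u = np.array([0.0, np.pi / 2, 0.0])        # shoot a quarter great circle
w = np.zeros((10, 3))
x1, v1 = exp_bundle(p, q, u, w)
```

With \(q = w = 0\) the endpoint x(1) should land (up to discretization error) a quarter turn away from p, at (0, 1, 0), which provides a simple sanity check of the integrator.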

Once we have a procedure for the exponential map, we can establish the shooting method for finding geodesics. Let \((p_1,q_1)\) be the starting point and \((p_2,q_2)\) be the target point. The shooting method iteratively updates the tangent, or shooting, vector \((u, w)\) on \(\mathcal {T}_{(p_1,q_1)}(\mathbb {B})\) such that \(\exp _{(p_1,q_1)}\left( (u,w)\right) = (p_2,q_2)\). Then, the geodesic between \((p_1,q_1)\) and \((p_2,q_2)\) is given by \((x(s),v(s))=\exp _{(p_1,q_1)}(s(u,w))\), \(s\in [0,1]\). The key step here is to use the current discrepancy between the point reached, \(\exp _{(p_1,q_1)}\left( (u,w)\right) \), and the target, \((p_2, q_2)\), to update the shooting vector \((u, w)\) at each iteration. There are several possibilities for performing the updates and we discuss one here. Since we have two components to update, u and w, we update them separately: (1) Fix w and update u. For the u component, the increment comes from parallel translation of the vector \(\exp _{\tilde{p}}^{-1}(p_2)\) (the difference between the reached point \(\tilde{p}\) and the target point \(p_2\)) from \(\tilde{p}\) to \(p_1\), where \(\tilde{p}\) is the first component of the reached point \(\exp _{(p_1,q_1)}((u,w))\). (2) Fix u and update w. For the w component, we take the difference between \(q_2\) and the second component of the reached point (denoted \(\tilde{q}\)) as the increment. This is done by parallel translating \(\tilde{q}\) to \(\mathcal {T}_{p_2}(M)\) (the same space as \(q_2\)), calculating the difference, and then parallel translating the difference to \(\mathcal {T}_{p_1}(M)\) to update w.

Once again we will assume that the TSRVFs \(q_1\) and \(q_2\) are piecewise constant curves on a uniform partition of the interval [0, 1]. Numerical accuracy of this shooting algorithm naturally depends on the numerical accuracy of Algorithm 1.

figure g
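The update logic of the shooting method can be illustrated on the unit sphere, where the exponential map, its inverse, and parallel transport are all available in closed form. The sketch below is a toy analogue of the shooting iteration (updating a single shooting vector by parallel transporting the discrepancy back to the starting point), not the authors' Algorithm 2 on \(\mathbb {B}\); all names are ours.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p, q):
    """Inverse exponential map on the unit sphere."""
    w = q - np.dot(p, q) * p                  # project q into T_p
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)) * w / nw

def transport(p, q, v):
    """Parallel transport of tangent v at p to q along the geodesic."""
    u = sphere_log(p, q)
    nu = np.linalg.norm(u)
    if nu < 1e-12:
        return v.copy()
    e = u / nu
    a = np.dot(v, e)
    return (v - a * e) + a * (np.cos(nu) * e - np.sin(nu) * p)

def shoot(p1, p2, n_iter=50, step=0.5):
    """Iteratively correct the shooting vector using the transported
    discrepancy between the reached point and the target."""
    v = 0.2 * sphere_log(p1, p2)              # deliberately poor initial guess
    for _ in range(n_iter):
        reached = sphere_exp(p1, v)
        incr = transport(reached, p1, sphere_log(reached, p2))
        v = v + step * incr
    return v

p1 = np.array([1.0, 0.0, 0.0])
p2 = np.array([0.0, 1.0, 0.0])
v = shoot(p1, p2)
```

The damping factor `step` plays the same stabilizing role as in any fixed-point correction scheme; with `step = 0.5` the discrepancy contracts at each iteration.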

Recall that the correspondence between trajectories on M and their representations in \(\mathbb {B}\) is bijective. For each pair \((p,q) \in \mathbb {B}\), one can reconstruct the corresponding trajectory \(\alpha \) using covariant integration. A numerical implementation of this procedure is summarized in Algorithm 3. Like Algorithm 1, Algorithm 3 is an Euler method for the numerical integration of an ordinary differential equation and, thus, admits a standard convergence analysis.

figure h

Algorithm 2 allows us to compute the geodesic between two points in \(\mathbb {B}\). Then, for each point (x(s), v(s)) along the geodesic in \(\mathbb {B}\), one can easily reconstruct the corresponding trajectory on M using Algorithm 3, setting x(s) as the starting point and v(s) as the TSRVF of the trajectory.
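In the Euclidean case, covariant integration reduces to ordinary integration, so the reconstruction step can be sketched with a plain Euler loop: given a starting point and an SRVF q, integrate \(\dot{\alpha }(t) = |q(t)|\,q(t)\). A minimal sketch, assuming curves sampled on a uniform grid of [0, 1] (function names are ours):

```python
import numpy as np

def srvf(alpha):
    """SRVF of a sampled Euclidean curve on a uniform grid of [0, 1]."""
    d = np.gradient(alpha, axis=0) * (len(alpha) - 1)   # d(alpha)/dt
    speed = np.linalg.norm(d, axis=1, keepdims=True)
    return d / np.sqrt(speed + 1e-12)

def reconstruct(p, q):
    """Euler integration of alpha_dot(t) = |q(t)| q(t), starting at p."""
    dt = 1.0 / (len(q) - 1)
    alpha = [np.asarray(p, dtype=float)]
    for i in range(len(q) - 1):
        alpha.append(alpha[-1] + dt * np.linalg.norm(q[i]) * q[i])
    return np.array(alpha)

# reconstruct alpha(t) = (t, t^2) from its SRVF and starting point
t = np.linspace(0.0, 1.0, 200)[:, None]
alpha = np.hstack([t, t ** 2])
recon = reconstruct(alpha[0], srvf(alpha))
```

Since \(|q|\,q = \dot{\alpha }\) by construction, the Euler loop recovers the sampled curve up to discretization error; on a manifold the same loop additionally parallel transports q(t) to the current point before each step.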

2.4 Geodesic Distance on \(\mathbb {B}\)

Using the chosen Riemannian metric on \(\mathbb {B}\) (defined in Eq. 2), the geodesic distance between any two points in \(\mathbb {B}\) is defined as follows.

Definition 2

Given two trajectories \(\alpha _1\), \(\alpha _2\) with representations \((p_1,q_1), (p_2,q_2) \in \mathbb {B}\), let \((x(s),v(s)) \in \mathbb {B}\), \(s \in [0,1]\), be the geodesic between \((p_1,q_1)\) and \((p_2,q_2)\) on \(\mathbb {B}\). The geodesic distance is given by:

$$\begin{aligned} d_c((p_1,q_1), (p_2,q_2))=\sqrt{ l_{x}^2 + \int _0^1 |{ q}^{\parallel }_1(\tau )- q_2(\tau )|^2\mathrm{d}\tau }. \end{aligned}$$
(4)

This distance has two components: (1) the length of the baseline curve x connecting the starting points on M, \(l_x = \int _0^1 |\dot{x}(s) |\mathrm{d}s\); and (2) the standard \(\mathbb {L}^2\) norm on \(\mathbb {B}_{p_2}\) between the TSRVFs of the two trajectories, where \({ q}^{\parallel }_1\) denotes the parallel transport of \(q_1\in \mathbb {B}_{p_1}\) along x to \(\mathbb {B}_{p_2}\). Since we have a numerical approach for approximating the geodesic, the same algorithm also provides an estimate of the geodesic distance.

3 Analysis of Trajectories Modulo Re-parameterization

The main motivation of using TSRVF representation for trajectories on M and constructing the distance \(d_c\) to compare two trajectories comes from the following. If a trajectory \(\alpha \) is warped by \(\gamma \), resulting in \(\alpha \circ \gamma \), the new TSRVF is given by:

$$\begin{aligned} q_{\alpha \circ \gamma }(t)= & {} \left( { (\dot{\alpha }(\gamma (t)) \dot{\gamma }(t)) \over \sqrt{ | \dot{\alpha }(\gamma (t)) \dot{\gamma }(t)|} } \right) _{\alpha (\gamma (t)) \rightarrow p} \\= & {} \left( { (\dot{\alpha }(\gamma (t)))\sqrt{\dot{\gamma }(t)} \over \sqrt{ | \dot{\alpha }(\gamma (t)) |} } \right) _{\alpha (\gamma (t)) \rightarrow p} \\= & {} q_\alpha (\gamma (t)) \sqrt{\dot{\gamma }(t)} \equiv (q_\alpha *\gamma )(t). \end{aligned}$$
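This identity can be verified numerically in the Euclidean case, where the parallel transport is trivial and the TSRVF reduces to the SRVF. A small sketch, with an illustrative curve and warping of our choosing:

```python
import numpy as np

# alpha(t) = (t, t^2): derivative known analytically
def alpha_dot(t):
    return np.array([1.0, 2.0 * t])

def srvf(adot):
    """SRVF of a Euclidean curve: adot / sqrt(|adot|)."""
    return adot / np.sqrt(np.linalg.norm(adot))

gamma = lambda t: t ** 2          # a warping with gamma(0)=0, gamma(1)=1
gamma_dot = lambda t: 2.0 * t

ts = [0.2, 0.5, 0.9]
# left-hand side: SRVF of the warped curve, via the chain rule
lhs = [srvf(alpha_dot(gamma(t)) * gamma_dot(t)) for t in ts]
# right-hand side: (q_alpha * gamma)(t) = q_alpha(gamma(t)) sqrt(gamma_dot(t))
rhs = [srvf(alpha_dot(gamma(t))) * np.sqrt(gamma_dot(t)) for t in ts]
```

The two sides agree exactly, since \(\dot{\gamma }\) factors out of the numerator linearly and out of the square-rooted denominator as \(\sqrt{\dot{\gamma }}\).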

Theorem 2

For any two trajectories \(\alpha _1, \alpha _2 \in \mathcal{F}\) and their representations \((p_1,q_{1}), (p_2,q_{2}) \in \mathbb {B}\), the metric \(d_c\) satisfies \(d_c((p_1,q_{\alpha _1\circ \gamma }), (p_2,q_{\alpha _2 \circ \gamma })) = d_c((p_1,q_{1}), (p_2,q_{2}))\), for any \(\gamma \in \varGamma \).

Proof

First, if a trajectory is warped by \(\gamma \in \varGamma \), the resulting trajectory is \(\alpha \circ \gamma \), i.e., \(\gamma \) acts on the space \(\mathbb {B}\) by \((p,q)*\gamma = (p,q*\gamma )\). The differential of this action is the map \(\mathcal {T}_{(p,q)}(\mathbb {B}) \rightarrow \mathcal {T}_{(p,q*\gamma )}(\mathbb {B})\) given by \((u,w)\mapsto (u,w*\gamma )\). We prove that this differential preserves our Riemannian inner product (Eq. 2) as follows: let \((u_1,w_1)\) and \((u_2,w_2)\) be two tangent vectors on \(\mathcal {T}_{(p,q)}(\mathbb {B})\); it follows that

$$\begin{aligned}&\left\langle (u_1,w_1*\gamma ),(u_2,w_2*\gamma ) \right\rangle \nonumber \\&\quad = (u_1 \cdot u_2)+ \int _0^1 w_1(\gamma (t))\sqrt{\dot{\gamma }}(t)w_2(\gamma (t))\sqrt{\dot{\gamma }}(t) \mathrm{d}t \nonumber \\&\quad = (u_1\cdot u_2)+ \int _0^1 w_1(\gamma ) w_2(\gamma ) \mathrm{d}\gamma \nonumber \\&\quad = (u_1\cdot u_2)+ \int _0^1 w_1(s) w_2(s) \mathrm{d}s \nonumber \\&\quad = \left\langle (u_1,w_1),(u_2,w_2)\right\rangle , \end{aligned}$$
(5)

Since \(\varGamma \) acts on \(\mathbb {B}\) by isometries, i.e., preserving the Riemannian inner product, it follows immediately that it takes geodesics to geodesics, and preserves geodesic distance. \(\square \)

Theorem 2 reveals the advantage of using the TSRVF representation: under the metric \(d_c\), the action of \(\varGamma \) on \(\mathbb {B}\) is by isometries. This isometry property allows us to compare trajectories in a manner that is invariant to time warping. This is achieved by defining a distance on the quotient space of \(\mathbb {B}\) modulo the re-parameterization group.

3.1 Theoretical Setup

To form the quotient space of \(\mathbb {B}\) modulo the re-parameterization group, we take the approach presented in several previous papers, including [27]. While [27] considers shapes of curves in Euclidean domains, these ideas extend naturally to nonlinear manifolds. The approach is to introduce the set \(\tilde{\varGamma }\) of all non-decreasing, absolutely continuous functions \(\gamma \) on [0, 1] such that \(\gamma (0) = 0\) and \(\gamma (1) = 1\). This set is a semigroup under composition (its elements do not have well-defined inverses). It can be shown that \(\varGamma \) is a dense subset of \(\tilde{\varGamma }\). For any \(q \in \mathbb {L}^2([0,1], \mathcal {T}_p(M))\), let \([q]_{\tilde{\varGamma }}\) denote the set \(\{ (q *\gamma ) | \gamma \in \tilde{\varGamma }\}\). This is a closed set [27, 38], while the orbit of q under \(\varGamma \) is not, and therefore we choose to work with the former, at least for the formal development. (In practice, however, we approximate solutions using elements of \(\varGamma \).) Note that the actions of \(\varGamma \) and \(\tilde{\varGamma }\) on q are exactly the same as if \(\alpha \) were a Euclidean curve, of the kind studied in [27]. Therefore, borrowing results from [27], the closure of the \(\varGamma \)-orbit of q is equal to the \(\tilde{\varGamma }\)-orbit of q. Consequently, we will call the set \([q]_{\tilde{\varGamma }}\) a closed-up orbit of q. We define the quotient space \(\mathbb {B}/\tilde{\varGamma }\) as the set of all closed-up orbits, with each orbit being:

$$\begin{aligned}{}[(p,q)] \equiv (p,[q]) = \{ (p,(q* \gamma ))| \gamma \in \tilde{\varGamma } \}. \end{aligned}$$

To understand the concept of a closed-up orbit, one can view it as an equivalence class under the following relation. For any two trajectories \(\alpha _1,\alpha _2\) with representations \((p_1,q_{1}),(p_2,q_{2}) \in \mathbb {B}\), we define them to be equivalent when: (1) \(p_1 = p_2\); and (2) there exists a sequence \(\gamma _i \in \tilde{\varGamma }\) such that \(q_{\alpha _2 \circ \gamma _i}\) converges to \(q_{1}\). In other words, if two trajectories have the same starting point, and the TSRVF of one can be time warped into the TSRVF of the other in the limit of a sequence of time warpings, then these two trajectories are deemed equivalent. Theorem 2 states that if two trajectories are warped by the same \(\gamma \) function, the distance \(d_c\) between them remains the same. In other words, the closed-up orbits in \(\mathbb {B}\) are “parallel” to each other.

The main reason for introducing the quotient space \(\mathbb {B}/\tilde{\varGamma }\) is to define a proper distance on it and to compute geodesic paths between its elements with respect to this distance for the purposes of statistical analysis. We define a distance on \(\mathbb {B}/\tilde{\varGamma }\) using the Riemannian metric inherited from \(\mathbb {B}\), as follows.

Definition 3

The geodesic distance \(d_{q}\) on \(\mathbb {B}/\tilde{\varGamma }\) is the shortest distance between two closed-up orbits in \(\mathbb {B}\), given as

$$\begin{aligned}&d_q((p_1,[q_1]), (p_2,[q_2])) \nonumber \\& = \inf _{\gamma _1, \gamma _2 \in \tilde{\varGamma }}d_c( (p_1,(q_1 * \gamma _1)), (p_2,(q_2 * \gamma _2)) ) \end{aligned}$$
(6)

For a similar representation, [38] established that the induced distance is a proper distance on the set of closed-up orbits; the same proof applies in the current context. The result is also similar to the theory described for Euclidean curves in [27].

In order to compute geodesic paths in \(\mathbb {B}/\tilde{\varGamma }\), one can solve the optimization problem stated in Eq. 6 and use the optimal points to form geodesics in the larger space \(\mathbb {B}\). That is, for any trajectories \(\alpha _1, \alpha _2\) with corresponding representations \((p_1, q_1), (p_2, q_2) \in \mathbb {B}\), we first solve for

$$\begin{aligned} (\hat{\gamma }_1, \hat{\gamma }_2) = \mathop {\mathrm{argmin}}\limits _{\gamma _1, \gamma _2 \in \tilde{\varGamma }} d_c( (p_1,(q_1 *\gamma _1)), (p_2,(q_2 * \gamma _2)) ). \end{aligned}$$
(7)

Then, we simply compute a geodesic path between \((p_1, q_1 * \hat{\gamma }_1)\), and \((p_2, q_2 * \hat{\gamma }_2)\) in \(\mathbb {B}\), as described in the previous section.

3.2 Numerical Approximations

Conceptually, the geodesic and the geodesic distance between closed-up orbits \((p_1,[q_1])\) and \((p_2,[q_2])\) are defined by optimizing over geodesics between all possible cross-pairs in sets \((p_1,[q_1])\) and \((p_2,[q_2])\). This, in turn, requires a double optimization on the set \(\tilde{\varGamma }\), as stated in Definition 3. We now look at the computational aspects of this definition and seek some faster approximations.

Firstly, since \(\varGamma \) is dense in \(\tilde{\varGamma }\), we can compute the geodesic distance \(d_q((p_1,[q_1]), (p_2,[q_2])) \) using only a single optimization on the group \(\varGamma \). This is because:

$$\begin{aligned}&\inf \limits _{\gamma _1, \gamma _2 \in \tilde{\varGamma }} d_c( (p_1,(q_1 *\gamma _1)), (p_2,(q_2 * \gamma _2)) ) \nonumber \\&\quad = \inf \limits _{\gamma _1, \gamma _2 \in {\varGamma }} d_c( (p_1,(q_1 *\gamma _1)), (p_2,(q_2 * \gamma _2)) ) \nonumber \\&\quad = \inf \limits _{\gamma \in \varGamma }d_c( (p_1,q_1), (p_2,(q_2 * \gamma )) ). \end{aligned}$$
(8)

There is no approximation here, and the infimum over a single \(\varGamma \) is much faster to compute than the double optimization over \(\tilde{\varGamma }\).

If we further assume that the trajectories \(\alpha _1, \alpha _2\) are piecewise geodesic and, thus, their TSRVFs are piecewise constant, then some additional results hold. The paper [27] provided an exact approach for the optimal alignment of SRVFs of piecewise linear curves in Euclidean spaces. Since TSRVFs are Euclidean curves, that approach can easily be adapted to solve for the optimal alignment in Eq. 7. This provides an optimal alignment and, consequently, a precise geodesic path between \((p_1,[q_1])\) and \((p_2,[q_2])\).

However, this approach can be slow in practice, and one can speed up the implementation using a single optimization according to Eq. 8. That is, we can solve for a single \(\hat{\gamma }\) and match the point \(\alpha _1(t)\) to the point \(\alpha _2(\hat{\gamma }(t))\). While this approach is much faster than the joint optimization, a drawback is that there is no guarantee of being close to an optimal matching. In practice, however, we have found that the matchings and geodesic paths obtained this way are quite similar to the optimal solutions for most real data. Solving Eq. 8 is equivalent to optimizing:

$$\begin{aligned} \min _{(x,v), \gamma }\left( {\ l^2_{x} + \int _0^1\Vert {q}^{\parallel }_{1,x}(t)- (q_2* \gamma )(t)\Vert ^2\mathrm{d}t\ } \right) \quad , \end{aligned}$$
(9)

where \((x,v)\) is the path between \((p_1,q_1)\) and \((p_2,q_{\alpha _2 \circ \gamma })\), and \({q}^{\parallel }_{1,x}\) denotes the parallel transport of \(q_1\) along x to \(\mathbb {B}_{p_2}\). Note that the time warping \(\gamma \) acting on \(\alpha _2\) changes the underlying geodesic \((x,v)\) between the two trajectories. Algorithm 4 describes a numerical solution for optimizing Eq. 9 on a general manifold M.

figure i

Note that in Algorithm 4, Step 2 corresponds to the first argument \((x,v)\) and Step 3 to the second argument \(\gamma \) in Eq. 9. The optimization over the warping function in Step 3 is achieved using the dynamic programming algorithm (pages 435–436 in [35]). Here one samples the interval [0, 1] using N discrete points and then restricts to only piecewise linear \(\gamma \)’s that pass through the resulting \(N \times N\) grid. In practice, Algorithm 4 typically takes a few iterations to converge.
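A simplified version of this dynamic program can be sketched as follows. The neighbor set and the endpoint-only local cost are our simplifications of the full algorithm in [35]; the structure (monotone piecewise-linear paths on an \(N \times N\) grid, with backtracking to recover \(\gamma \)) is the same.

```python
import numpy as np

def dp_warp(q1, q2):
    """Dynamic-programming alignment of two sampled curves in R^d.
    Searches monotone piecewise-linear warpings on the N x N grid using a
    small neighbor set; the local cost evaluates only segment endpoints."""
    N = len(q1)
    steps = [(1, 1), (1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2)]
    E = np.full((N, N), np.inf)       # accumulated cost table
    parent = {}
    E[0, 0] = 0.0
    for i in range(N):
        for j in range(N):
            for di, dj in steps:
                a, b = i - di, j - dj
                if a < 0 or b < 0:
                    continue
                slope = dj / di
                # |q1 - (q2 * gamma)|^2 on this segment, endpoint approximation
                cost = di * np.sum((q1[i] - q2[j] * np.sqrt(slope)) ** 2)
                if E[a, b] + cost < E[i, j]:
                    E[i, j] = E[a, b] + cost
                    parent[(i, j)] = (a, b)
    # backtrack the optimal path and interpolate it into gamma on [0, 1]
    path = [(N - 1, N - 1)]
    while path[-1] != (0, 0):
        path.append(parent[path[-1]])
    path.reverse()
    xs = [p[0] for p in path]
    ys = [p[1] for p in path]
    gamma = np.interp(np.arange(N), xs, ys) / (N - 1)
    return gamma, E[N - 1, N - 1]

rng = np.random.default_rng(0)
q = rng.standard_normal((30, 2))
gamma, cost = dp_warp(q, q)           # aligning a curve to itself
```

Aligning a curve to itself recovers the identity warping with zero cost, which is a useful sanity check for any implementation of Step 3.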

Since Algorithm 4 involves multiple evaluations of the exponential map and dynamic programming alignment, it is still not computationally very efficient. We speed up this computation further as follows: first find the baseline x(s) connecting the two trajectories (using the geodesic between \(\alpha _1(0)\) and \(\alpha _2(0)\) on M) and then align their TSRVFs accordingly. This substantially speeds up the solution, albeit at the cost of diverging from the optimal solution under the theoretical formulation. In the experimental results presented later, we use this method to speed up registration and comparison.

4 Riemannian Structure on \(\tilde{\varvec{\mathcal {P}}}\)

Next we discuss the geometry of \(\tilde{\varvec{\mathcal {P}}}\) and impose a Riemannian structure that facilitates our analysis of trajectories on \(\tilde{\varvec{\mathcal {P}}}\). Several past papers have studied the space of SPDMs as a nonlinear manifold and have imposed metric structures on it [3, 12, 19, 31, 34]. While they mostly seek to define distances on this set, a few of these distances originate from a Riemannian structure with expressions for geodesics and exponential maps. However, the most common Riemannian framework [31] does not provide expressions for all the tools needed in our context, especially parallel transport and the Riemannian curvature tensor. Therefore, we choose a more recent Riemannian structure that was introduced in [37] and subsequently used in [38]. We summarize the main results here and refer the reader to these papers and to two supplementary files for more details.

Let \(\tilde{\varvec{\mathcal {P}}}\) be the space of \(n \times n\) SPDMs, and let \(\varvec{\mathcal {P}}\) be its subset of matrices with determinant one. The idea is to first identify the space \({ \varvec{\mathcal {P}}}\) with the quotient space SL(n) / SO(n) and borrow the Riemannian structure from the latter directly. Then, one can straightforwardly extend the Riemannian structure on \(\varvec{\mathcal {P}}\) to \(\tilde{{\varvec{\mathcal {P}}}}\). The process starts by choosing a Riemannian metric on SL(n) as follows: for any point \(G \in SL(n)\) the metric is defined by pulling back the tangent vectors under \(G^{-1}\) to I, and then using the trace metric (see more details in Sect. 1 of Supplementary Material I). This definition leads to expressions for the exponential map, its inverse, parallel transport of tangent vectors, and the Riemannian curvature tensor on SL(n). It also induces a Riemannian structure on the quotient space SL(n) / SO(n) in a natural way because the chosen metric is invariant to the action of SO(n) on SL(n). Finally, these results are transferred to \(\varvec{\mathcal {P}}\) using the mapping

$$\begin{aligned} \pi : SL(n)/SO(n) \rightarrow \varvec{\mathcal {P}}, \quad \pi ([G]) = \sqrt{{\tilde{G}}{ \tilde{G}}^t}, \end{aligned}$$

for any \(\tilde{G} \in [G]\). One can check that this map is well defined and is a diffeomorphism, by letting \(\tilde{G} = PS\) (polar decomposition), and then \(\pi ([G]) = \sqrt{\tilde{G}\tilde{G}^t} = \sqrt{PSS^tP} = P\). This square root is the symmetric, positive-definite square root of a symmetric matrix. The inverse map of \(\pi \) is given by: \(\pi ^{-1}(P) = [P] \equiv \{PS|S \in SO(n) \} \in SL(n)/SO(n)\). This establishes a one-to-one correspondence between the quotient space SL(n) / SO(n) and \(\varvec{\mathcal {P}}\). In turn, this correspondence is used to derive required expressions for geodesics, exponential map, curvature tensor, etc., on \(\varvec{\mathcal {P}}\). We refer the reader to the Supplementary Material I for more details.
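The claim that \(\pi \) picks out the SPD factor of the polar decomposition, and is constant on each equivalence class, can be checked numerically. A sketch using scipy's polar decomposition (variable names are ours):

```python
import numpy as np
from scipy.linalg import polar, sqrtm

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 3))
if np.linalg.det(G) < 0:
    G[0] *= -1                              # ensure a positive determinant
G = G / np.linalg.det(G) ** (1.0 / 3.0)     # normalize into SL(3)

# left polar decomposition G = P S with P SPD and S in SO(3);
# scipy returns (u, p) with G = p @ u for side='left'
S, P = polar(G, side='left')

pi_G = np.real(sqrtm(G @ G.T))              # the map pi([G]) = sqrt(G G^t)

# pi is constant on the class [G] = {G R : R in SO(3)}
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                           # make Q a rotation
pi_GR = np.real(sqrtm((G @ Q) @ (G @ Q).T))
```

Since \((GR)(GR)^t = GG^t\) for any rotation R, the computed \(\pi \) value is identical for every representative of the class, matching the text.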

These results are then extended to the set \(\tilde{\varvec{\mathcal {P}}}\) using a product map. Since \(\det ({\tilde{P}})>0\) for any \(\tilde{P} \in \tilde{\varvec{\mathcal {P}}}\), we can express \(\tilde{P} = (P, {1 \over n} \log (\det (\tilde{P})))\) with \(P = {\tilde{P} \over \det (\tilde{P})^{1/n}} \in \varvec{\mathcal {P}}\). Thus, \(\tilde{\varvec{\mathcal {P}}}\) is identified with the product space \(\varvec{\mathcal {P}}\times \mathbb {R}\). To define a metric on this product space, we use the square root of the sum of squares of the individual metrics, with arbitrary weights allowed; here we use the weight 1/n for the determinant term. We summarize expressions for the required mathematical tools on \(\tilde{\varvec{\mathcal {P}}}\):

  1. Exponential map Given \(\tilde{P} \in \tilde{\varvec{\mathcal {P}}}\) and a tangent vector \(\tilde{V} \in \mathcal {T}_{\tilde{P}}(\tilde{\varvec{\mathcal {P}}})\), write \(\tilde{V} = (V,v)\), where \(V \in \mathcal {T}_P(\varvec{\mathcal {P}})\), with \(P = \tilde{P}/\det (\tilde{P})^{1/n}\), and \(v \in \mathbb {R}\) is the log-determinant component of the tangent vector. The exponential map is \(\exp _{\tilde{P}}(\tilde{V}) = \det (\tilde{P})^{1/n}\,e^v\exp _P(V)\), where \(\exp _P(V) = \sqrt{Pe^{2P^{-1}VP^{-1}}P}.\)

  2. Geodesic distance For any \(\tilde{P}_1, \tilde{P}_2 \in \tilde{\varvec{\mathcal {P}}}\), the squared geodesic distance between them is \(d_{\tilde{\varvec{\mathcal {P}}}}(\tilde{P}_1, \tilde{P}_2)^2= d_{\varvec{\mathcal {P}}}(I,P_{12})^2 + \frac{1}{n}(\log (\det (\tilde{P}_2))-\log (\det (\tilde{P}_1)))^2\), where \( P_{12} = \sqrt{P_1^{-1} P_2^2 P_1^{-1} } \) and \(d_{\varvec{\mathcal {P}}}(I,P_{12}) = \Vert A_{12}\Vert \) for \(e^{A_{12}} = P_{12} \in \varvec{\mathcal {P}}\).

  3. Inverse exponential map For any \(\tilde{P}_1,\tilde{P}_2 \in \tilde{\varvec{\mathcal {P}}}\), the inverse exponential map is \(\exp ^{-1}_{\tilde{P}_1}(\tilde{P}_2) = \tilde{V} \equiv (V, v)\), where \(V = P_1 \log (\sqrt{P_1^{-1} P_2^2 P_1^{-1}}) P_1\) and

    $$\begin{aligned} v = \frac{1}{n}\log (\det (\tilde{P}_2)) - \frac{1}{n}\log (\det (\tilde{P}_1)). \end{aligned}$$
  4. Parallel transport For any \(\tilde{P}_1,\tilde{P}_2 \in \tilde{\varvec{\mathcal {P}}}\) and a tangent vector \(\tilde{V} = (V,v) \in \mathcal {T}_{\tilde{P}_1}(\tilde{\varvec{\mathcal {P}}})\), the parallel transport of \(\tilde{V}\) along the geodesic from \(\tilde{P}_1\) to \(\tilde{P}_2\) is \( (P_2T_{12}^TBT_{12}P_2,\ v)\), where \(B = P_1^{-1}VP_1^{-1}\), \(T_{12}=P_{12}^{-1}P_1^{-1}P_2\), and \(P_{12} = \sqrt{P_1^{-1} P_2^2 P_1^{-1} }\).

  5. Riemannian curvature tensor For any \(\tilde{P} \in \tilde{\varvec{\mathcal {P}}}\) and tangent vectors \(\tilde{X}=(X,x)\), \(\tilde{Y} = (Y,y)\), and \(\tilde{Z} = (Z,z) \in \mathcal {T}_{\tilde{P}}(\tilde{\varvec{\mathcal {P}}})\), the Riemannian curvature tensor is given by \(R(\tilde{X},\tilde{Y})(\tilde{Z}) = -P[[A,B],C]P\), where \( A = P^{-1}XP^{-1}\), \(B = P^{-1}YP^{-1}\), \(C = P^{-1}ZP^{-1}\), and \([A,B] = AB - BA\).
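The expressions for the unit-determinant part \(\varvec{\mathcal {P}}\) translate directly into code. The sketch below implements the exponential map, its inverse, and the geodesic distance on \(\varvec{\mathcal {P}}\) and checks that the first two are mutually consistent; it is our illustration, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def sym(A):
    return 0.5 * (A + A.T)

def spd_sqrt(A):
    """Symmetric positive-definite square root of an SPD matrix."""
    return sym(np.real(sqrtm(A)))

def exp_map(P, V):
    """exp_P(V) = sqrt(P e^{2 P^{-1} V P^{-1}} P)."""
    Pinv = np.linalg.inv(P)
    return spd_sqrt(P @ expm(2 * sym(Pinv @ V @ Pinv)) @ P)

def inv_exp(P1, P2):
    """exp_{P1}^{-1}(P2) = P1 log(sqrt(P1^{-1} P2^2 P1^{-1})) P1."""
    P1inv = np.linalg.inv(P1)
    P12 = spd_sqrt(P1inv @ P2 @ P2 @ P1inv)
    return P1 @ sym(np.real(logm(P12))) @ P1

def geo_dist(P1, P2):
    """d(P1, P2) = ||A12||_F with e^{A12} = P12."""
    P1inv = np.linalg.inv(P1)
    P12 = spd_sqrt(P1inv @ P2 @ P2 @ P1inv)
    return np.linalg.norm(np.real(logm(P12)), 'fro')

def rand_unit_det_spd(rng, n=3):
    """Random SPD matrix normalized to determinant one."""
    A = rng.standard_normal((n, n))
    P = A @ A.T + n * np.eye(n)
    return P / np.linalg.det(P) ** (1.0 / n)

rng = np.random.default_rng(0)
P1, P2 = rand_unit_det_spd(rng), rand_unit_det_spd(rng)
V = inv_exp(P1, P2)
```

One can verify algebraically that \(\exp _{P_1}(\exp ^{-1}_{P_1}(P_2)) = \sqrt{P_1 P_{12}^2 P_1} = \sqrt{P_2^2} = P_2\), and the code reproduces this round trip to numerical precision.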

Remark 1

The Riemannian structure used here and the one used previously [31] are both derived from the same induced structure on the quotient space SL(n) / SO(n). The difference lies in the mapping used to transfer the metric from SL(n) / SO(n) to \({\varvec{\mathcal {P}}}\): in [31], the mapping from the quotient space to \({\varvec{\mathcal {P}}}\) is \([G] \mapsto GG^T\), while in our approach it is \([G] \mapsto \sqrt{GG^T}\).

The main motivation for the current choice of mapping, and the resulting Riemannian metric on \({\varvec{\mathcal {P}}}\), is as follows. Consider any \(G \in SL(n)\). An important fact is that \(\sqrt{GG^T}\) is the only element of [G] that also lies in \({\varvec{\mathcal {P}}}\). It is therefore natural to represent the equivalence class [G] by \(\sqrt{GG^T}\), giving a representation of SL(n) / SO(n) by \({\varvec{\mathcal {P}}}\). Note that \(GG^T\), used in [31], is generally not an element of [G]. In view of this simplification, i.e., the inclusion mapping from \({\varvec{\mathcal {P}}}\) into SL(n), \({\varvec{\mathcal {P}}}\) can be viewed as a subset of SL(n). Thus, all the relevant expressions can be derived under this identification rather than by treating \({\varvec{\mathcal {P}}}\) as a separate space. Specifically, we have readily available expressions for the geodesic, geodesic distance, exponential map and its inverse, parallel transport, and Riemannian curvature tensor on \({\varvec{\mathcal {P}}}\) viewed as a subset of SL(n).

5 Demonstration of Numerical Procedures

In this section, we demonstrate the numerical procedures of the proposed framework on simulated covariance trajectories. We used \(M = \varvec{\mathcal {P}}(3)\), the set of \(3 \times 3\) SPDMs with determinant one.

Geodesic computation As a first example, we compute the geodesic between two arbitrary trajectories using the numerical method in Algorithm 2. Figure 3 shows the result. In this plot, each matrix is visualized as an ellipsoid and a trajectory in \(\varvec{\mathcal {P}}(3)\) as a sequence of ellipsoids. The top row shows two original trajectories \(\alpha _1\) and \(\alpha _2\) with representations \((p_1,q_1)\) and \((p_2,q_2)\). The next row shows the baseline path x(s) associated with the geodesic between \(\alpha _1\) and \(\alpha _2\), and the end point of the geodesic, i.e., \(\exp _{(p_1,q_1)}(u,w)\). Here we select \((p_1,q_1)\) as the starting point and compute the shooting direction \((u,w)\) such that \(\exp _{(p_1,q_1)}(u,w) \approx (p_2,q_2)\). The bottom panel shows the evolution of the \(\mathbb {L}^2\) norm between the shot trajectory and the target \((p_2,q_2)\) during the shooting algorithm.

Fig. 3
figure 3

Example of calculating a geodesic using the shooting method for trajectories on \(\varvec{\mathcal {P}}\). The first row shows the original trajectory \((p_1,q_1)\) and the target trajectory \((p_2,q_2)\). The second and third rows show results obtained from Algorithm 2: the baseline curve x(s) connecting \(p_1\) and \(p_2\) on \(\varvec{\mathcal {P}}\), the final shot trajectory \(\exp _{(p_1,q_1)}(u,w)\), and the \(\mathbb {L}^2\) discrepancy between the shot trajectory and the target trajectory versus the number of iterations

Temporal alignment Next, we present an example of aligning two trajectories \(\alpha _1\) and \(\alpha _2\) in \(\varvec{\mathcal {P}}(3)\) in Fig. 4. This alignment is based on specializing Algorithm 4 to \(M = \varvec{\mathcal {P}}(3)\). As the figure shows, the two trajectories are very well aligned as a result.

Fig. 4
figure 4

Pairwise registration of two trajectories \(\alpha _1\) (first row) and \(\alpha _2\) (second row). The bottom panel shows the warping function \(\gamma \) to warp \(\alpha _2\) to \(\alpha _1\)

Computation of summary statistics Finally, we focus on the problem of generating statistical summaries of covariance trajectories. Since \(d_q\) defines a metric on the quotient space \(\mathbb {B}/\tilde{\varGamma }\), this framework allows us to perform statistical analysis of multiple trajectories in \(\mathbb {B}/\tilde{\varGamma }\). Given a set of trajectories \(\{\alpha _i, \ i= 1,\ldots ,n\}\), we are interested in computing the average of these trajectories and using it as a template for registering them. The sample mean can be approximated through:

$$\begin{aligned} (\mu _p,[\mu _q]) = \mathop {\mathrm{argmin}}\limits _{(p,[q]) \in \mathbb {B}/\tilde{\varGamma }} \sum _{i=1}^n d_q ( (p,[q]), (p_i, [q_{\alpha _i}]) )^2. \end{aligned}$$
(10)

Note that \((\mu _p, [\mu _q])\) is an orbit (equivalence class of trajectories) and one can select any element of this orbit as a template to align multiple trajectories.

figure j

For discussions on existence of this Riemannian sample mean and convergence of Algorithm 5 to a limit, we refer the reader to [1, 21].
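The tangent-space averaging update at the core of such a mean computation can be sketched on \(\varvec{\mathcal {P}}\) directly (ignoring the TSRVF component and alignment): shoot from the current mean to each data point, average the shooting vectors, and take a damped exponential step. This is the generic Karcher-mean recipe under our naming, not Algorithm 5 itself.

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def sym(A):
    return 0.5 * (A + A.T)

def spd_sqrt(A):
    return sym(np.real(sqrtm(A)))

def exp_map(P, V):
    Pinv = np.linalg.inv(P)
    return spd_sqrt(P @ expm(2 * sym(Pinv @ V @ Pinv)) @ P)

def inv_exp(P1, P2):
    P1inv = np.linalg.inv(P1)
    return P1 @ sym(np.real(logm(spd_sqrt(P1inv @ P2 @ P2 @ P1inv)))) @ P1

def karcher_mean(Ps, n_iter=50, step=0.5):
    """Tangent-space averaging: shoot from the current mean to every data
    point, average the shooting vectors, and step along the average."""
    mu = Ps[0].copy()
    for _ in range(n_iter):
        V = sum(inv_exp(mu, P) for P in Ps) / len(Ps)
        mu = exp_map(mu, step * V)
    return mu

def rand_unit_det_spd(rng, n=3):
    """Random unit-determinant SPD matrix near the identity."""
    P = expm(sym(0.3 * rng.standard_normal((n, n))))
    return P / np.linalg.det(P) ** (1.0 / n)

rng = np.random.default_rng(1)
Ps = [rand_unit_det_spd(rng) for _ in range(5)]
mu = karcher_mean(Ps)
```

At convergence, the average shooting vector at the mean vanishes, which is the first-order condition of the minimization in Eq. 10 restricted to this simplified setting.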

After Algorithm 5 converges, one can compute the covariant integral of \((\mu _p,\mu _q)\) using Algorithm 3; the result, denoted by \(\mu \), is the mean of \(\{\alpha _1,\alpha _2,\ldots ,\alpha _n\}\). Figure 5 shows an example of calculating the mean of given trajectories. The upper part of the figure shows four simulated trajectories. The bottom part shows the mean trajectory in two situations: in \(\mathbb {B}\) using \(d_c\) (without temporal alignment) and in \(\mathbb {B}/\tilde{\varGamma }\) under \(d_q\) (with alignment). One can see that under \(d_q\) the structures along the trajectories are better preserved.

Fig. 5
figure 5

Example of calculating the mean trajectory. The upper panel shows simulated trajectories, and the bottom panel shows means before and after alignment

6 Comparison with Previous Work

The proposed framework improves upon [38] in the following sense. It preserves the invariance properties achieved in [38] but does not require choosing a global reference point. This framework also naturally includes the difference between the starting points of two trajectories, which was ignored in [38]. Since the velocity vectors here are transported to the starting point of a trajectory, along that trajectory, as opposed to being transported to an arbitrary reference point as in [38], this representation is more stable.

To quantitatively compare with [38, 39], we performed the same visual speech recognition task on the same subset of the OuluVS dataset [39, 42]. Here, we briefly introduce the experimental setup; more details can be found in [39]. The OuluVS dataset includes 20 speakers, each uttering 10 everyday greetings five times: Hello, Excuse me, I am sorry, Thank you, Good bye, See you, Nice to meet you, You are welcome, How are you, Have a good time. Thus, the database has a total of 1000 videos. All the image sequences are segmented, with the mouth region determined by manually labeled eye positions in each frame [43]. Some examples of the segmented mouth images are shown in Fig. 6. We performed the experiment on a subset of the dataset containing 800 video sequences, obtained by removing some short videos [39]. The same covariance matrix features as in [39] were extracted to represent each video. The resulting trajectories in \(\tilde{\varvec{\mathcal {P}}}(7)\) are aligned using Algorithm 4 and compared using the distance \(d_q\) defined in Eq. 6.

In Fig. 7a, we show some optimal \(\gamma \)’s obtained by aligning one video of a phrase (“excuse me”) to other videos of the same phrase spoken by the same person. One can see that temporal differences exist in the original videos and that they need to be aligned before further analysis. In (b), we show the histogram of the values \((d_c-d_q)/\max (d_c,d_q)\) (the relative change in distance before and after alignment). In this case, each person has 50 videos, so we can calculate \((50\times 49)/2\) pairwise distances before and after alignment, and their differences. For all 20 persons in this dataset, we have \(20 \times (50\times 49)/2 = 24{,}500\) such differences. From the histogram of the relative changes, one can see that after alignment the distances (the \(d_q\)’s) are consistently smaller.

Fig. 6
figure 6

Examples of down sampled video sequences in OuluVS dataset. The first and second rows show one person’s two speech samples of the phrase “Nice to meet you”; the third and fourth rows show the phrases “How are you” and “Good bye” uttered by different persons

Fig. 7
figure 7

a shows the optimal \(\gamma \)’s obtained to align one video of phrase (“excuse me”) to the other four videos of the same phrase spoken by the same person. b shows the histogram of \((d_c-d_q)/\text {max}(d_c,d_q)\)’s (The relative distance changes before and after alignment)

Table 1 shows the average nearest neighbor (1NN) classification rates of our method and of [39]. Our method achieves a classification rate of \(78.6\%\), which is \(8.1\%\) better than that of [39]. These results indicate that the new trajectory representation and analysis framework have better discriminative power, even before alignment, compared with the reference-point-based method of [38, 39]. In addition, there is a \(37.6\%\) improvement due to alignment (registration), which demonstrates the importance of removing temporal differences when comparing dynamical systems in computer vision.

Table 1 Comparison of SDT performance on OuluVS

7 Application 1: Generative Modeling of Trajectories

Algorithm 5 results in several quantities of interest: (i) the mean trajectory \((\mu _p,[\mu _q])\); (ii) the aligned trajectories \((p_i,\tilde{q}_i)\); and (iii) the shooting vectors from the mean to the \((p_i,\tilde{q}_i)\), denoted \((u_i,w_i)\). Since our representation is invertible, one can develop generative models of given trajectories using these quantities. For example, we can use statistical methods to infer the distribution of a set of trajectories, draw random samples, and perform statistical inference. We illustrate these ideas using a simple example on \(\varvec{\mathcal {P}}(3)\), as it is easy to visualize.

As described in Supplementary Material I, a tangent element \(X \in \mathcal {T}_{\mu _p}(\varvec{\mathcal {P}})\) can be identified with \(\mu _p B \mu _p\), where \(B \in \mathcal {T}_I(\varvec{\mathcal {P}}) = \{A| A^t = A \text { and tr}(A) = 0\}\). For matrices of size three, elements of \(\mathcal {T}_I(\varvec{\mathcal {P}}(3))\) have only five degrees of freedom. Let \(\phi : \mathcal {T}_I(\varvec{\mathcal {P}}(3))\mapsto \mathbb {R}^5\) be the embedding given by \(\phi (A) = [a_{11},a_{12}, a_{13}, a_{22}, a_{23}]^T\). Given tangent vectors \((u_i,w_i)\), \(i=1,\ldots ,n\), in \(\mathcal {T}_{(\mu _p,\mu _q)}(\mathbb {B})\), we have \(u_i \in \mathcal {T}_{\mu _p}(\varvec{\mathcal {P}}(3))\) and \(w_i \in \mathbb {L}^2([0,1],\mathcal {T}_{\mu _p}(\varvec{\mathcal {P}}(3)))\). We transform \((u_i,w_i)\) into \((u^0_i,w^0_i)\), with \(u^0_i \in \mathcal {T}_{I}(\varvec{\mathcal {P}}(3))\) and \(w^0_i \in \mathbb {L}^2([0,1],\mathcal {T}_{I}(\varvec{\mathcal {P}}(3)))\), according to

$$\begin{aligned} u_i= \mu _p u^0_i \mu _p \text { and } w_i(s) = \mu _p w^0_i(s) \mu _p, \text { for } s \in [0,1], \end{aligned}$$

and perform the statistical modeling in \(\mathcal {T}_{I}(\varvec{\mathcal {P}}(3))\). These elements are then mapped into \(\mathbb {R}^5\), and into functions with values in \(\mathbb {R}^5\), respectively, using \(\phi \). Statistical modeling of trajectories in \(\varvec{\mathcal {P}}(3)\) thus reduces to modeling points in \(\mathbb {R}^5\) and \(\mathbb {L}^2\) functions with values in \(\mathbb {R}^5\).
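The embedding \(\phi \) and its inverse are straightforward to implement; the sketch below recovers \(a_{33}\) from the trace-zero constraint (function names are ours):

```python
import numpy as np

def phi(A):
    """Flatten a symmetric, trace-zero 3x3 matrix into its 5 free entries."""
    return np.array([A[0, 0], A[0, 1], A[0, 2], A[1, 1], A[1, 2]])

def phi_inv(a):
    """Rebuild the symmetric matrix; a33 follows from the trace-zero constraint."""
    a11, a12, a13, a22, a23 = a
    a33 = -(a11 + a22)
    return np.array([[a11, a12, a13],
                     [a12, a22, a23],
                     [a13, a23, a33]])
```

The pair is a bijection between \(\mathcal {T}_I(\varvec{\mathcal {P}}(3))\) and \(\mathbb {R}^5\), so no information is lost in the flattening.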

Next, we consider the problem of fitting a distribution to the given sample trajectories. For the first component, \(u_i\), we use a simple multivariate Gaussian distribution. For the second component, \(w_i\), we follow a procedure similar to [26] to define a Gaussian distribution in a principal subspace and then map it back to the trajectory space. Assuming each trajectory \(\alpha _i\) is sampled at a finite number of points, say m, we calculate the sample covariance matrix \(\mathbf{K} \in \mathbb {R}^{5m \times 5m}\) as in [26]. Let \(\mathbf{K} = \mathbf{U}\varvec{\Sigma }{} \mathbf{U}^T\) be the singular value decomposition of \(\mathbf{K}\), and let \(\mathbf{U}_r\), the first r columns of \(\mathbf{U}\), span the principal subspace of the observed data. The principal scores of each observation are calculated by projecting \(\phi (w^0_i)\) onto this principal subspace, and we fit a Gaussian distribution to these principal scores.
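The principal-subspace Gaussian model can be sketched as follows, with synthetic stand-ins for the flattened \(\phi (w^0_i)\) curves (all dimensions and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, d, r = 40, 20, 5, 3            # samples, time points, phi-dimension, PCs
W = rng.standard_normal((n, m * d))  # stand-in for the flattened phi(w_i^0)'s

Wbar = W.mean(axis=0)
Wc = W - Wbar
K = Wc.T @ Wc / n                    # sample covariance, (5m) x (5m)
U, Sig, _ = np.linalg.svd(K)
Ur = U[:, :r]                        # principal subspace of the observed data
scores = Wc @ Ur                     # principal scores of each sample

# fit an independent Gaussian to the scores and draw new curves
mu_s, sd_s = scores.mean(axis=0), scores.std(axis=0)
new_scores = mu_s + sd_s * rng.standard_normal((10, r))
samples = Wbar + new_scores @ Ur.T   # map sampled scores back to curve space
```

New samples live in the span of the retained principal directions, which is exactly the dimension reduction that makes the tangent-space model tractable.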

Fig. 8

Simulated trajectories in (a) (with FA value) and their FA values in (b)

In the simulation study, we generate 50 random trajectories in \(\varvec{\mathcal {P}}(3)\), some of which are shown in Fig. 8a. To give an idea of the variability along these trajectories, we compute the fractional anisotropy (FA) [4] value of each SPDM in a trajectory and visualize it as a scalar function of time. In Fig. 8b, we show the FA curves of the 50 simulated trajectories. Using Algorithm 5, we calculate the mean trajectory, the aligned trajectories \((p_i,\tilde{q}_i)\), and the shooting vectors \((u_i,w_i)\). Figure 9a shows the FA curves of the aligned trajectories, and (b) shows the mean trajectory and its FA curve. A PCA of the shooting vectors leads to dimension reduction in the data, which is necessary for efficient statistical modeling of trajectories. Figure 9c shows the variability of trajectories along the first PC of the given data. Figure 10 shows some random samples from the Gaussian models discussed above. In (a), we show the FA curves of 200 sampled trajectories, and in (b), we show 9 random trajectories. Note that the statistical model is in \(\mathbb {B}/\tilde{\varGamma }\), so we do not consider the variation of the warping functions \(\gamma \). To build a more complex model that also accounts for \(\gamma \), we refer the reader to Chapter 7 of [35].
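The FA curves above reduce each SPDM to a scalar via its eigenvalues. A short sketch (the helper names are ours) using the standard FA formula:

```python
import numpy as np

def fractional_anisotropy(P):
    """Fractional anisotropy of a 3x3 SPDM, computed from its
    eigenvalues: FA = sqrt(3/2) * ||lam - mean(lam)|| / ||lam||."""
    lam = np.linalg.eigvalsh(P)
    num = np.sqrt(((lam - lam.mean()) ** 2).sum())
    den = np.sqrt((lam ** 2).sum())
    return np.sqrt(1.5) * num / den

def fa_curve(traj):
    """FA of each SPDM along a sampled trajectory: a scalar curve of time."""
    return np.array([fractional_anisotropy(P) for P in traj])
```

An isotropic matrix (a multiple of the identity) has FA 0, while a nearly rank-one SPDM has FA close to 1.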

Fig. 9

Statistical modeling on the tangent space of the mean trajectory. a shows the FA curves of the aligned trajectories to the mean; b shows the mean trajectory and its FA curve; and c shows the first PC direction

Fig. 10

Randomly sampled trajectories from the fitted model. a shows the FA curves of the 200 simulated trajectories and b shows 9 example trajectories

8 Application 2: Discriminative Analysis of Trajectories

Now we turn to evaluation of the framework for discriminative pattern recognition. We consider two applications here: (1) hand gesture recognition and (2) dynamic functional brain connectivity analysis.

Fig. 11

a shows three examples of gestures in the Cambridge hand gesture database. b shows the five different illumination conditions in the database

Table 2 Recognition results on the Cambridge Hand Gesture dataset

8.1 Hand Gesture Recognition

Hand gesture recognition using videos is an important research area, since people use gestures for sign language, to convey messages in loud environments, and to interface with computers. In this section, we are interested in applying our framework to video-based hand gesture recognition. We use the Cambridge hand gesture dataset [24], which has 900 video sequences of nine different hand gestures (100 video sequences per gesture). The nine gestures result from three primitive hand shapes and three primitive motions and are collected under different illumination conditions. Some example gestures are shown in Fig. 11. The gestures are imaged under five different illuminations, labeled Set1, Set2, \(\dots \), Set5.

In addition to the illumination variability, the main challenge here comes from the fact that hands in this database are not well aligned; e.g., the proportion of the image occupied by the hand and the location of the hand differ across video sequences. To reduce these effects, we evenly split each image into four quadrants (upper-left, upper-right, bottom-left, bottom-right) with some overlap. Each of the four quadrants is represented by a sequence of covariance matrices in \(\tilde{\varvec{\mathcal {P}}}\). In this experiment, we use HOG features [10] to form one covariance matrix per image quadrant as follows. We use \(2\times 2\) blocks of \(8 \times 8\) pixel cells with 7 histogram channels to form the HOG features. These HOG features are then used to generate a \(7\times 7\) covariance matrix for each quadrant of each frame. Thus, our representation of a video is now given by \(t \mapsto \alpha (t) \in \tilde{\varvec{\mathcal {P}}}(7)^4\).
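The per-quadrant covariance descriptor can be sketched as below. This is an illustration only: the overlap fraction (60% of each dimension), the small regularizer, and the helper names are our assumptions, and the HOG extraction itself is assumed to be done upstream.

```python
import numpy as np

def covariance_descriptor(F, eps=1e-6):
    """Covariance descriptor of one quadrant.
    F: (N, 7) array of 7-channel HOG feature vectors collected over the
    quadrant. Returns a 7x7 SPDM (regularized to stay positive-definite)."""
    C = np.cov(F, rowvar=False)
    return C + eps * np.eye(C.shape[1])

def quadrant_descriptors(feature_map):
    """Hypothetical helper: split an (H, W, 7) per-pixel feature map into
    four overlapping quadrants and return one 7x7 descriptor per quadrant."""
    H, W, d = feature_map.shape
    h, w = int(0.6 * H), int(0.6 * W)   # 60% extent gives some overlap
    quads = [feature_map[:h, :w], feature_map[:h, -w:],
             feature_map[-h:, :w], feature_map[-h:, -w:]]
    return [covariance_descriptor(q.reshape(-1, d)) for q in quads]
```

Applying this to every frame of a video yields the four covariance trajectories used in \(\tilde{\varvec{\mathcal {P}}}(7)^4\).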

Since we have split each hand gesture into four dynamic parts, the total distance between any two hand gestures is a composite of four corresponding distances. For each corresponding dynamic quadrant, e.g., the upper-left part, we first align a pair of videos (using Algorithm 4) and then compare them using the metric \(d_q\), denoted by \(d^q_\mathrm{upl}\). The final distance is a weighted average of the four parts: \(d = \lambda _1 d^q_\mathrm{upl} + \lambda _2 d^q_\mathrm{upr} + \lambda _3 d^q_\mathrm{downl} + \lambda _4 d^q_\mathrm{downr}\), with \(\sum _{i=1}^4\lambda _i = 1\). In the unsupervised setting, we set \(\lambda _i = 1/4\) for \(i=1,\ldots ,4\). In the supervised setting, we learn a different weight for each illumination to make d more robust to the registration and illumination issues. In our experiment, we randomly selected half of the data (90 video sequences) as training data, and the other half were used for testing. Table 2 shows our results using the nearest-neighbor classifier on all five sets. One can see that the alignment significantly improves the classification rate for every illumination condition in both the supervised and unsupervised settings. We also report the state-of-the-art results on this database [16, 28, 29]. With only a \(7 \times 7\) covariance feature per quadrant per frame, we achieve classification results that are comparable to or better than the state of the art. The results may be improved further using more discriminative features or a better classifier.
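The weighted combination and the 1-NN decision rule can be sketched as follows (the function names are ours; the quadrant distances are assumed precomputed by the alignment step):

```python
import numpy as np

def combined_distance(d_quads, lam=None):
    """Combine the four quadrant distances (d^q_upl, d^q_upr, d^q_downl,
    d^q_downr) into one distance. lam: weights summing to 1; the
    unsupervised setting uses the uniform weights 1/4."""
    d_quads = np.asarray(d_quads, dtype=float)
    lam = np.full(4, 0.25) if lam is None else np.asarray(lam, dtype=float)
    assert np.isclose(lam.sum(), 1.0), "weights must sum to 1"
    return float(lam @ d_quads)

def nearest_neighbor_label(test_dists, train_labels):
    """1-NN classification from combined distances to all training videos."""
    return train_labels[int(np.argmin(test_dists))]
```

In the supervised setting, one weight vector per illumination set would be selected on the training half, e.g. by grid search over the simplex.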

8.2 Dynamic Functional Connectivity Study

Another interesting application of the proposed framework is in analyzing functional brain connectivity using fMRI data. Functional connectivity (FC) is defined as the statistical dependence among remote neurophysiological events. Short-term FC is often represented by a covariance or correlation matrix of fMRI data over a small time window, with the matrix size equal to the number of brain regions. In early studies, the FC associated with individual tasks or stimuli was treated as fixed over time. However, later studies [18] revealed that FC is a dynamic process that evolves over time. Therefore, it is natural to represent FC observed over a long interval as an indexed sequence of covariance matrices. Consequently, we can use the method developed in this paper to compare and analyze such FC.

We present some experimental results using data from the Human Connectome Project (HCP). For the first experiment, we select fMRI data for 20 subjects during the resting state and during performance of various tasks. The data are aligned and denoised using the HCP preprocessing pipeline [14], and the Destrieux atlas [11] is used to parcellate cortical regions into 74 nodes per hemisphere. We choose 10 regions, including the inferior frontal gyrus and sulcus and the transverse temporal sulcus, that are related to the go/no-go tasks [2]. The dynamic connectivity of these regions is represented as a trajectory of covariance matrices between the regions, calculated using a sliding window [18]. For comparison, we selected two tasks, resting-state fMRI and gambling-task fMRI, with 10 trajectories per task.
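The sliding-window construction of a covariance trajectory can be sketched as below; the window length, step, and small regularizer are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def sliding_window_covariances(X, win, step=1, eps=1e-6):
    """Turn a multivariate fMRI time series into a covariance trajectory.
    X: (T, n) array of T time points over n brain regions (n = 10 here).
    Returns a list of n x n SPDMs, one per window position."""
    T, n = X.shape
    trajectory = []
    for t0 in range(0, T - win + 1, step):
        C = np.cov(X[t0:t0 + win], rowvar=False)
        trajectory.append(C + eps * np.eye(n))  # keep the matrix in P(n)
    return trajectory
```

Each subject and task then contributes one such trajectory, and the framework's alignment and distance \(d_q\) apply directly.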

Fig. 12

a shows the determinant part (\(\frac{1}{n} \log (\det (\tilde{P}))\) ) of the 20 dynamic brain subnetworks. b shows the pairwise distances between the 20 subnetworks after alignment. c shows the histogram of \((d_c - d_q)/\text {max}(d_c,d_q)\). d shows the pairwise distances calculated based on log-E metric

The results are shown in Fig. 12. In panel (a), we show the determinant part (\(\frac{1}{n} \log (\det (\tilde{P}(t)))\)), as a function of t, of these 20 covariance trajectories. We can clearly see fluctuations in the dynamics of the determinant part over time. Since the gambling task is carefully designed to repeat the reward, neutral and loss blocks, we see similar periodic fluctuations across subjects (with some temporal misalignment due to inter-subject variability). This dynamic pattern appears different from that of the resting-state fMRI. Figure 12b shows the pairwise distances \(d_q\) between the 20 trajectories. The block pattern in this distance matrix is evidence that the resting state is very different from the gambling-task state; the dynamic FC trajectories are more homogeneous within the gambling task (after temporal alignment). The temporal alignment plays an important role in studying dynamic FC. Panel (c) shows the histograms of the percentage change in distance before and after alignment for the resting and gambling tasks. We see a significant reduction in distances in both cases, but the percentage reduction for the task trajectories is larger than that for the resting ones. This is because the gambling task is performed under a well-designed protocol: although there are some temporal misalignments, every subject exhibits similar dynamic FC among the selected ROIs. For comparison, we also compute pairwise trajectory distances under the log-Euclidean metric [44]. That is, for any two trajectories \(\alpha _1,\alpha _2\), the distance between them is calculated as \(d_{\log _\text {E}}(\alpha _1,\alpha _2) = \int _0^1 \Vert \log (\alpha _1(t)) - \log (\alpha _2(t))\Vert \, \mathrm{d}t \). Panel (d) shows the pairwise distance matrix calculated using \(d_{\log _\text {E}}\) for the same data. One can see no block pattern there, indicating the lack of discriminative power of this metric for dynamic FC studies.
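The baseline log-Euclidean trajectory distance above is straightforward to compute; a sketch, assuming both trajectories are sampled at the same m time points (the integral is approximated by the mean over samples, and the matrix norm is Frobenius):

```python
import numpy as np

def spd_log(P):
    """Matrix logarithm of an SPDM via eigendecomposition."""
    lam, V = np.linalg.eigh(P)
    return V @ np.diag(np.log(lam)) @ V.T

def d_log_euclidean(traj1, traj2):
    """Approximate d_logE(a1, a2) = int_0^1 ||log a1(t) - log a2(t)|| dt
    for two trajectories sampled at the same time points."""
    diffs = [np.linalg.norm(spd_log(P) - spd_log(Q), 'fro')
             for P, Q in zip(traj1, traj2)]
    return float(np.mean(diffs))
```

Unlike \(d_q\), this distance performs no temporal alignment, which is one reason it fails to separate the two task conditions in Fig. 12d.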

In the previous experiment, the 10 selected regions are labeled “Set1 ROIs”. We take another set of 10 regions (the postcentral and precentral gyri, and the central and precentral sulci, which are part of the motor cortex) to generate a second set of covariance trajectories, denoted “Set2 ROIs”. We compare the dynamic FC generated by these two sets of ROIs under different tasks; the results are shown in Fig. 13. They show that, while the dynamic FCs are very different for the two sets of ROIs in the motor task, the connectivities are not as well separated in the gambling task. These experiments demonstrate that the proposed framework can perform statistical analyses of dynamic FCs.

Fig. 13

The pairwise distances (based on \(d_q\)) between covariance trajectories generated from different sets of nodes. a shows the distance in the motor task for 20 trajectories after the alignment. b shows the distance in the gambling task

9 Conclusion

In summary, we have proposed a metric-based approach for simultaneous alignment and comparison of trajectories on \(\tilde{\varvec{\mathcal {P}}}\), the Riemannian manifold of covariance matrices (SPDMs). We impose a Riemannian structure on this manifold that facilitates explicit expressions for geometric quantities, such as parallel transport and the Riemannian curvature tensor. For analyzing covariance trajectories, the basic idea is to represent each trajectory by a starting point \(\tilde{P} \in \tilde{\varvec{\mathcal {P}}}\) and a TSRVF, which is a curve in the tangent space \(\mathcal {T}_{\tilde{P}}(\tilde{\varvec{\mathcal {P}}})\). The metric for comparing these elements is a composite of (a) the length of the path between the starting points and (b) the difference between the TSRVFs. The search for an optimal path, or geodesic, is based on a shooting method, which itself uses the geodesic equation to compute the exponential map. Using a numerical implementation of the exponential map, we derive numerical solutions for pairwise alignment of covariance trajectories and quantify their differences using a rate-invariant distance. We have applied this framework in two scenarios: (1) covariance tracking in video data, with an application to hand gesture recognition, and (2) dynamic functional connectivity analysis of fMRI data. The advantages and potential applications of the proposed framework have been demonstrated in these experiments.