1 Introduction

Regression analysis is a fundamental statistical tool for determining how a measured variable is related to one or more potential explanatory variables. The most widely used regression model is linear regression, due to its simplicity, ease of interpretation, and ability to model many phenomena. However, if the response variable takes values on a nonlinear manifold, a linear model is not applicable. Such manifold-valued measurements arise in many applications, including those involving directional data, transformations, tensors, and shape. For example, in biology and medicine it is often critical to understand processes that change the shape of anatomy. The difficulty is that shape variability is inherently high-dimensional and nonlinear. An effective approach to capturing this variability has been to parameterize shape as a manifold, or shape space.

Statistical analysis of manifold data has been developed by several authors. The seminal work of Fréchet [10] generalized the concept of expectation from linear spaces to general metric spaces. This opened up the possibility of computing a sample mean statistic from a set of data on a manifold using the geodesic distance as metric. The Fréchet mean of a set of points, \(y_{1},\ldots ,y_{N}\), in a Riemannian manifold M is given by

$$\mu =\arg \min\limits_{y\in M}\displaystyle\sum _{i=1}^{N}d{(y,y_{ i})}^{2},$$

where d is the geodesic distance between points on M. This equation generalizes the principle of least squares to the metric space setting. Karcher [12] provided conditions guaranteeing the existence and uniqueness of the Fréchet mean, which were later improved by Kendall [14]. Second-order statistics such as generalizations of principal components analysis [8] and Gaussian covariances [21] have also been developed and applied in the domain of image analysis. Related work includes statistical analysis of directional data (e.g., spheres) [16] and analysis on shape manifolds [5], where statistics are derived from probability distributions on specific manifolds (for example, the Fisher-von Mises distribution on spheres).

Several works have studied the regression problem on manifolds. Jupp and Kent [11] propose an unrolling method on shape spaces. Regression analysis on the group of diffeomorphisms has been proposed as growth models by Miller [18], nonparametric regression by Davis et al. [2], and second-order splines by Trouvé and Vialard [23]. Durrleman et al. [6] construct spatiotemporal image atlases from longitudinal data. Finally, Shi et al. [22] proposed a semiparametric model with multiple covariates for manifold response data. None of these methods provide a direct generalization of linear regression to manifolds. The purpose of this work is to develop such a generalization, called geodesic regression, which models the relationship between an independent scalar variable and a dependent manifold-valued random variable as a geodesic curve. Like linear regression, the advantages of this model are its simplicity and ease of interpretation. As will be shown, the geodesic regression model also leads to a straightforward generalization of the \(R^{2}\) statistic and a hypothesis test for significance of the estimated geodesic trend. This chapter is an expanded exposition of the geodesic regression method first introduced in [7]. Niethammer et al. [20] independently proposed geodesic regression for the case of diffeomorphic transformations of image time series.

2 Multiple Linear Regression

Before formulating geodesic regression on general manifolds, we begin by reviewing multiple linear regression in \({\mathbb{R}}^{n}\). Here we are interested in the relationship between a non-random independent variable \(X\,\in \,\mathbb{R}\) and a random dependent variable Y taking values in \({\mathbb{R}}^{n}\). A multiple linear model of this relationship is given by

$$Y =\alpha +X\beta +\epsilon ,$$
(2.1)

where \(\alpha \in {\mathbb{R}}^{n}\) is an unobservable intercept parameter, \(\beta \in {\mathbb{R}}^{n}\) is an unobservable slope parameter, and ε is an \({\mathbb{R}}^{n}\)-valued, unobservable random variable representing the error. Geometrically, this is the equation of a one-dimensional line through \({\mathbb{R}}^{n}\) (plus noise), parameterized by the scalar variable X. For the purposes of generalizing to the manifold case, it is useful to think of α as the starting point of the line and β as a velocity vector.

Given realizations of the above model, i.e., data \((x_{i},y_{i}) \in \mathbb{R} \times {\mathbb{R}}^{n}\), for \(i = 1,\ldots ,N\), the least squares estimates, \(\hat{\alpha },\hat{\beta },\) for the intercept and slope are computed by solving the minimization problem

$$(\hat{\alpha },\hat{\beta }) =\arg \min\limits_{(\alpha ,\beta )}\displaystyle\sum _{i=1}^{N}{\left \|y_{ i} -\alpha -x_{i}\beta \right \|}^{2}.$$
(2.2)

This equation can be solved analytically, yielding

$$\displaystyle\begin{array}{rcl} \hat{\beta }& =& \frac{ \frac{1} {N}\displaystyle\sum x_{i}\,y_{i} -\bar{ x}\,\bar{y}} { \frac{1} {N}\displaystyle\sum x_{i}^{2} -\bar{ {x}}^{2}} , \\ \hat{\alpha }& =& \bar{y} -\bar{ x}\,\hat{\beta }, \\ \end{array}$$

where \(\bar{x}\) and \(\bar{y}\) are the sample means of the x i and y i , respectively. If the errors in the model are drawn from distributions with zero mean and finite variance, then these estimators are unbiased and consistent. Furthermore, if the errors are homoscedastic (equal variance) and uncorrelated, then the Gauss-Markov theorem states that they will have minimal mean-squared error amongst all unbiased linear estimators.
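
As a concrete illustration, these closed-form estimates take only a few lines of code. The following sketch (Python/NumPy; the function and variable names are ours, not from the text) fits the model to an array of scalar covariates and an array of vector-valued responses.

```python
import numpy as np

def linear_regression_fit(x, y):
    """Closed-form least squares estimates for the model y = alpha + x * beta + eps.

    x : (N,) array of scalar covariates
    y : (N, n) array of vector-valued responses
    Returns (alpha_hat, beta_hat), each of shape (n,).
    """
    x_bar = x.mean()
    y_bar = y.mean(axis=0)
    # beta_hat = (1/N sum x_i y_i - xbar*ybar) / (1/N sum x_i^2 - xbar^2)
    beta_hat = ((x[:, None] * y).mean(axis=0) - x_bar * y_bar) \
               / ((x ** 2).mean() - x_bar ** 2)
    alpha_hat = y_bar - x_bar * beta_hat
    return alpha_hat, beta_hat

# Example on synthetic data: y_i = alpha + x_i * beta + noise in R^2
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
y = np.array([1.0, -2.0]) + np.outer(x, [3.0, 0.5]) + 0.1 * rng.standard_normal((50, 2))
alpha_hat, beta_hat = linear_regression_fit(x, y)
```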

3 Geodesic Regression

Let \(y_{1},\ldots ,y_{N}\) be points on a smooth Riemannian manifold M, with associated scalar values \(x_{1},\ldots ,x_{N} \in \mathbb{R}\). The goal of geodesic regression is to find a geodesic curve γ on M that best models the relationship between the x i and the y i . Just as in linear regression, the speed of the geodesic will be proportional to the independent parameter corresponding to the x i . Estimation will be set up as a least-squares problem, where we want to minimize the sum-of-squared Riemannian distances between the model and the data. A schematic of the geodesic regression model is shown in Fig. 2.1.

Fig. 2.1 Schematic of the geodesic regression model

Before formulating the model, we review a few basic concepts of Riemannian geometry. We will write an element of the tangent bundle as the pair (p, v) ∈ TM, where p is a point in M and v ∈ T p M is a tangent vector at p. Recall that for any (p, v) ∈ TM there is a unique geodesic curve γ, with initial conditions γ(0) = p and γ′(0) = v. This geodesic is only guaranteed to exist locally. When γ is defined over the interval [0, 1], the exponential map at p is defined as Exp p (v) = γ(1). In other words, the exponential map takes a position and velocity as input and returns the point at time 1 along the geodesic with these initial conditions. The exponential map is locally diffeomorphic onto a neighborhood of p. Let V (p) be the largest such neighborhood. Then within V (p) the exponential map has an inverse, the Riemannian log map, Log p : V (p) → T p M. For any point q ∈ V (p) the Riemannian distance function is given by \(d(p,q) =\|\mathrm{ Log}_{p}(q)\|\). It will be convenient to include the point p as a parameter in the exponential and log maps, i.e., define Exp(p, v) = Exp p (v) and Log(p, q) = Log p (q).
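
As a concrete instance of these maps, the unit sphere \(S^{n-1} \subset {\mathbb{R}}^{n}\) admits simple closed forms for Exp and Log. The following is a minimal sketch (the helper names and tolerances are our own choices, not from the text); together with the Jacobi fields discussed below, these are the only manifold-specific ingredients the regression algorithm needs.

```python
import numpy as np

def exp_map(p, v):
    """Exp_p(v) on the unit sphere: follow the geodesic from p with initial velocity v for unit time."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p.copy()
    return np.cos(theta) * p + np.sin(theta) * v / theta

def log_map(p, q):
    """Log_p(q) on the unit sphere: initial velocity of the geodesic from p to q (q not antipodal to p)."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    w = q - cos_theta * p                    # component of q orthogonal to p
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.zeros_like(p)
    return np.arccos(cos_theta) * w / norm_w

# d(p, q) = ||Log_p(q)|| is the arc length between p and q
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
assert np.isclose(np.linalg.norm(log_map(p, q)), np.pi / 2)
assert np.allclose(exp_map(p, log_map(p, q)), q)
```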

Notice that the tangent bundle TM serves as a convenient parameterization of the set of possible geodesics on M. An element (p, v) ∈ TM provides an intercept p and a slope v, analogous to the α and β parameters in the multiple linear regression model (2.1). In fact, β is a vector in the tangent space \(T_{\alpha }{\mathbb{R}}^{n}\cong{\mathbb{R}}^{n}\), and thus (α, β) is an element of the tangent bundle \(T{\mathbb{R}}^{n}\). Now consider an M-valued random variable Y and a non-random variable \(X\,\in \,\mathbb{R}\). The generalization of the multiple linear model to the manifold setting is the geodesic model,

$$Y =\mathrm{ Exp}(\mathrm{Exp}(p,Xv),\epsilon ),$$
(2.3)

where ε is a random variable taking values in the tangent space at Exp(p, Xv). Notice that for Euclidean space, the exponential map is simply addition, i.e., \(\mathrm{Exp}(p,v) = p + v\). Thus, the geodesic model coincides with (2.1) when \(M = {\mathbb{R}}^{n}\).

3.1 Least Squares Estimation

Consider a realization of the model (2.3): \((x_{i},y_{i}) \in \mathbb{R} \times M\), for \(i = 1,\ldots ,N\). Given this data, we wish to find estimates of the parameters (p, v) ∈ TM. First, define the sum-of-squared error of the data from the geodesic given by (p, v) as

$$E(p,v) = \frac{1} {2}\displaystyle\sum _{i=1}^{N}d{(\mathrm{Exp}(p,x_{ i}v),y_{i})}^{2}.$$
(2.4)

Following the ordinary least squares minimization problem given by (2.2), we formulate a least squares estimator of the geodesic model as a minimizer of the above sum-of-squares energy, i.e.,

$$(\hat{p},\hat{v}) =\arg \min _{(p,v)}E(p,v).$$
(2.5)

Again, notice that this problem coincides with the ordinary least squares problem when \(M = {\mathbb{R}}^{n}\).

Unlike the linear setting, the least squares problem in (2.5) for a general manifold M will typically not yield an analytic solution. Instead we derive a gradient descent algorithm. Computation of the gradient of (2.4) will require two parts: the derivative of the Riemannian distance function and the derivative of the exponential map. Fixing a point p ∈ M, the gradient of the squared distance function is \(\nabla _{x}d{(p,x)}^{2} = -2\mathrm{Log}_{x}(p)\) for x ∈ V (p).

The derivative of the exponential map Exp(p, v) can be separated into a derivative with respect to the initial point p and a derivative with respect to the initial velocity v. To do this, first consider a variation of geodesics given by c 1(s, t) = Exp(Exp(p, su 1), tv(s)), where u 1 ∈ T p M defines a variation of the initial point along the geodesic η(s) = Exp(p, su 1). Here we have also extended v ∈ T p M to a vector field v(s) along η via parallel translation. This variation is illustrated on the left side of Fig. 2.2. Next consider a variation of geodesics \(c_{2}(s,t)\,=\,\mathrm{Exp}(p,su_{2} + tv)\), where u 2 ∈ T p M. (Technically, u 2 is a tangent to the tangent space, i.e., an element of T v (T p M), but there is a natural isomorphism \(T_{v}(T_{p}M)\cong T_{p}M\).) The variation c 2 produces a “fan” of geodesics as seen on the right side of Fig. 2.2.

Fig. 2.2 Jacobi fields as derivatives of the exponential map

Now the derivatives of Exp(p, v) with respect to p and v are given by

$$\displaystyle\begin{array}{rcl} d_{p}\mathrm{Exp}(p,v) \cdot u_{1}& =& \frac{d} {ds}c_{1}(s,1)\Big{\vert }_{s=0} = J_{1}(1) \\ d_{v}\mathrm{Exp}(p,v) \cdot u_{2}& =& \frac{d} {ds}c_{2}(s,1)\Big{\vert }_{s=0} = J_{2}(1), \\ \end{array}$$

where J i (t) are Jacobi fields along the geodesic γ(t) = Exp(p, tv). Jacobi fields are solutions to the second order equation

$$\frac{{D}^{2}} {d{t}^{2}}J(t) + R(J(t),\,\gamma ^{\prime}(t))\,\,\gamma ^{\prime}(t) = 0,$$
(2.6)

where R is the Riemannian curvature tensor. For more details on the derivation of the Jacobi field equation and the curvature tensor, see for instance [3]. The initial conditions for the two Jacobi fields above are \(J_{1}(0) = u_{1},J_{1}^{\prime}(0) = 0\) and \(J_{2}(0) = 0,J_{2}^{\prime}(0) = u_{2}\), respectively. If we decompose the Jacobi field into a component tangential to γ and a component orthogonal, i.e., \(J = {J}^{\top } + {J}^{\perp }\), the tangential component is linear: \({J}^{\top }(t) = u_{1}^{\top } + tu_{2}^{\top }\). Therefore, the only challenge is to solve for the orthogonal component.

Finally, the gradient of the sum-of-squares energy in (2.4) is given by

$$\displaystyle\begin{array}{rcl} \nabla _{p}\,E(p,v)& =& -\displaystyle\sum _{i=1}^{N}d_{ p}\mathrm{Exp}{(p,x_{i}v)}^{\dag }\mathrm{Log}(\mathrm{Exp}(p,x_{ i}v),y_{i}), \\ \nabla _{v}\,E(p,v)& =& -\displaystyle\sum _{i=1}^{N}x_{ i}\,d_{v}\mathrm{Exp}{(p,x_{i}v)}^{\dag }\mathrm{Log}(\mathrm{Exp}(p,x_{ i}v),y_{i}), \\ \end{array}$$

where we have taken the adjoint of the exponential map derivative, e.g., defined by \(\langle d_{p}\mathrm{Exp}(p,v)u,w\rangle =\langle u,d_{p}\mathrm{Exp}{(p,v)}^{\dag }w\rangle\). As we will see in the next section, formulas for Jacobi fields and their respective adjoint operators can often be derived analytically for many useful manifolds.
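
To make the gradient descent concrete, the sketch below carries it out on the unit sphere, where the constant curvature K = 1 gives closed-form Jacobi fields (the same formula quoted in Sect. 5.1) and makes d p Exp and d v Exp self-adjoint. It is an illustrative implementation under those assumptions, with an ad hoc initialization and step size, not the authors' reference code.

```python
import numpy as np

def exp_map(p, v):
    t = np.linalg.norm(v)
    return p.copy() if t < 1e-12 else np.cos(t) * p + np.sin(t) * v / t

def log_map(p, q):
    c = np.clip(np.dot(p, q), -1.0, 1.0)
    w = q - c * p
    n = np.linalg.norm(w)
    return np.zeros_like(p) if n < 1e-12 else np.arccos(c) * w / n

def transport(p, v, w):
    """Parallel transport of w from p to Exp(p, v) along the geodesic."""
    L = np.linalg.norm(v)
    if L < 1e-12:
        return w
    u = v / L
    a = np.dot(w, u)
    u_end = np.cos(L) * u - np.sin(L) * p      # where the geodesic direction ends up
    return a * u_end + (w - a * u)             # the normal component is unchanged

def gradients(p, v, x, y):
    """Gradients of E(p, v) = 1/2 sum_i d(Exp(p, x_i v), y_i)^2 on the sphere."""
    grad_p, grad_v = np.zeros_like(p), np.zeros_like(v)
    nv = np.linalg.norm(v)
    u = v / nv if nv > 1e-12 else np.zeros_like(v)
    for xi, yi in zip(x, y):
        yhat = exp_map(p, xi * v)
        eps = log_map(yhat, yi)                        # residual at the predicted point
        e = transport(yhat, log_map(yhat, p), eps)     # bring it back to T_p M
        e_tan = np.dot(e, u) * u                       # component along the geodesic
        e_perp = e - e_tan
        L = xi * nv
        sinc = np.sin(L) / L if abs(L) > 1e-12 else 1.0
        # closed-form Jacobi factors for K = 1; the derivatives are self-adjoint
        grad_p -= e_tan + np.cos(L) * e_perp           # d_p Exp(p, x_i v)^† eps
        grad_v -= xi * (e_tan + sinc * e_perp)         # x_i d_v Exp(p, x_i v)^† eps
    return grad_p, grad_v

def fit_geodesic(x, y, steps=200, lr=0.05):
    p = y[0] / np.linalg.norm(y[0])                    # crude initialization
    v = log_map(p, y[-1])
    for _ in range(steps):
        gp, gv = gradients(p, v, x, y)
        p_new = exp_map(p, -lr * gp)
        v = transport(p, log_map(p, p_new), v - lr * gv)   # carry v to the new base point
        v -= np.dot(v, p_new) * p_new                  # re-project onto the tangent space
        p = p_new
    return p, v
```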

3.2 \(R^{2}\) Statistics and Hypothesis Testing

In regression analysis the most basic question one would like to answer is whether the relationship between the independent and dependent variables is significant. A common way to test this is to see if the amount of variance explained by the model is high. For geodesic regression we will measure the amount of explained variance using a generalization of the \(R^{2}\) statistic, or coefficient of determination, to the manifold setting. To do this, we first define predicted values of y i and the errors ε i as

$$\displaystyle\begin{array}{rcl} \hat{y}_{i}& =& \mathrm{Exp}(\hat{p},x_{i}\hat{v}), \\ \hat{\epsilon }_{i}& =& \mathrm{Log}(\hat{y}_{i},y_{i}), \\ \end{array}$$

where \((\hat{p},\hat{v})\) are the least squares estimates of the geodesic parameters defined above. Note that the \(\hat{y}_{i}\) are points along the estimated geodesic that are the best predictions of the y i given only the \(x_{i}\). The \(\hat{\epsilon }_{i}\) are the residuals from the model predictions to the true data.

Now to define the total variance of the data, \(y_{1},\ldots ,y_{N} \in M\), we use the Fréchet variance, intrinsically defined by

$$\mathrm{var}(y_{i}) =\min\limits_{y\in M} \frac{1} {N}\displaystyle\sum _{i=1}^{N}d{(y,y_{ i})}^{2}.$$

The unexplained variance is the variance of the residuals, \(\mathrm{var}(\hat{\epsilon }_{i}) = \frac{1} {N}\sum \|\hat{\epsilon }_{i}\|^{2}\). From the definition of the residuals, it can be seen that the unexplained variance is the mean squared distance of the data to the model, i.e., \(\mathrm{var}(\hat{\epsilon }_{i}) = \frac{1} {N}\sum d{(\hat{y}_{i},y_{i})}^{2}\). Using these two variance definitions, the generalization of the \(R^{2}\) statistic is then given by

$${R}^{2} = 1 -\frac{\text{unexplained variance}} {\text{total variance}} = 1 - \frac{\mathrm{var}(\hat{\epsilon }_{i})} {\mathrm{var}(y_{i})}.$$
(2.7)

Fréchet variance coincides with the standard definition of variance when \(M = {\mathbb{R}}^{n}\). Therefore, it follows that the definition of \(R^{2}\) in (2.7) coincides with the \(R^{2}\) for linear regression when \(M = {\mathbb{R}}^{n}\). Also, because Fréchet variance is always nonnegative, we see that \(R^{2} \leq 1\), and that \(R^{2} = 1\) if and only if the residuals to the model are exactly zero, i.e., the model perfectly fits the data. Finally, the residual variance is never larger than the total variance, i.e., \(\mathrm{var}(\hat{\epsilon }_{i}) \leq \mathrm{ var}(y_{i})\). This is because we could always choose \(\hat{p}\) to be the Fréchet mean and \(v = 0\) to achieve \(\mathrm{var}(\hat{\epsilon }_{i}) =\mathrm{ var}(y_{i})\); the least squares estimates can only do better. Therefore, \(R^{2} \geq 0\), and it must lie in the interval [0, 1], as is the case for linear models.

We now describe a permutation test for testing the significance of the estimated slope term, \(\hat{v}\). Notice that if we constrain v to be zero in (2.5), then the resulting least squares estimate of the intercept, \(\hat{p}\), will be the Fréchet mean of the y i . The desired hypothesis test is whether the fraction of unexplained variance is significantly decreased by also estimating v. The null hypothesis is \(H_{0} : R^{2} = 0\), which is the case if the unexplained variance in the geodesic model is equal to the total variance. Under the null hypothesis, there is no relationship between the X variable and the Y variable. Therefore, the x i are exchangeable under the null hypothesis, and a permutation test may randomly reorder the x i data, keeping the y i fixed. Estimating the geodesic regression parameters for each random permutation of the x i , we can calculate a sequence of \(R^{2}\) values, \(R_{1}^{2},\ldots ,R_{m}^{2}\), which approximate the sampling distribution of the \(R^{2}\) statistic under the null hypothesis. Computing the fraction of the \(R_{k}^{2}\) that are greater than the \(R^{2}\) estimated from the unpermuted data gives us a p-value.
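
These two steps translate into a short generic routine once an estimator, a prediction function, and a geodesic distance are available. The sketch below assumes hypothetical callables fit_geodesic(x, y), predict(p, v, x), and dist(a, b); it also approximates the Fréchet variance by minimizing over the data points themselves rather than over all of M.

```python
import numpy as np

def frechet_variance(y, dist):
    """Approximate Frechet variance, minimizing over the data points instead of all of M."""
    return min(np.mean([dist(c, yi) ** 2 for yi in y]) for c in y)

def r_squared(x, y, fit_geodesic, predict, dist):
    """Generalized R^2 = 1 - unexplained variance / total variance."""
    p_hat, v_hat = fit_geodesic(x, y)
    unexplained = np.mean([dist(predict(p_hat, v_hat, xi), yi) ** 2 for xi, yi in zip(x, y)])
    return 1.0 - unexplained / frechet_variance(y, dist)

def permutation_test(x, y, fit_geodesic, predict, dist, n_perm=1000, seed=0):
    """p-value for H0: R^2 = 0, permuting the x_i while keeping the y_i fixed."""
    rng = np.random.default_rng(seed)
    r2_obs = r_squared(x, y, fit_geodesic, predict, dist)
    r2_null = [r_squared(rng.permutation(x), y, fit_geodesic, predict, dist)
               for _ in range(n_perm)]
    p_value = np.mean(np.array(r2_null) > r2_obs)
    return r2_obs, p_value
```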

4 Testing the Geodesic Fit

In any type of regression analysis, a choice is made as to the type of model that is fit to the data, whether it be linear, polynomial, or perhaps nonparametric. An important step in the analysis is to verify that the selected model is in fact appropriate. In linear regression, for example, one would want to test several assumptions: (1) that the trend in the data is truly linear, (2) that the error is homoscedastic, (3) that the model fit is not led astray by outliers, (4) that the errors are Gaussian distributed, etc. Several graphical and quantitative heuristic tests have been developed to test these assumptions. For a detailed treatment of these methods, see [9].

In this section we develop a diagnostic test of the model assumptions for geodesic regression. We focus on the following question: is a geodesic curve an appropriate model for the relationship between the independent and dependent variables? A geodesic curve is, in some sense, the “straightest” path one can take on a manifold. This raises the question of whether a more flexible model would do a better job of fitting the data. This is analogous to the model selection problem for real-valued data when one is making the choice between a linear model and something more flexible, such as a higher-order polynomial model. Of course, if a model is made more flexible there is a danger that the data will be overfit. One way to test if a model is “flexible enough” is to plot the residuals of the model versus the independent variable. If the model has captured the relationship between the independent and dependent variables, then the residuals should show no obvious trend. If they do show a trend, then a more flexible model is needed to capture the relationship in the data. However, for regression on manifolds, this is a difficult test to apply because the residuals are high-dimensional tangent vectors and are thus difficult to plot versus the independent variable. One solution might be to plot the magnitude of the residuals instead, but this loses most of the information contained in the residual vectors.

Instead, we will use nonparametric regression as a comparison model to test if a geodesic is sufficient to capture the relationships in the data. Nonparametric regression models, such as the kernel regression method described below, are highly flexible. Because there is no parametric model assumed for the functional relationship between the independent and dependent variables, these models can adapt to highly complex functions given enough data. Given a method to visualize the results of a manifold-valued regression, the diagnostic test is as follows. First, compute both a geodesic regression and a nonparametric regression of the data. Second, visualize the results of both regression methods. If the nonparametric regression trend is similar to the estimated geodesic, then this provides strong evidence that the geodesic model is sufficient. If the nonparametric trend deviates significantly from the estimated geodesic, then this indicates that the geodesic model is too inflexible to capture the relationship between the two variables.

An example of this procedure is given for synthesized univariate data in Fig. 2.3. The left figure shows data generated from a noisy linear trend. In this case the linear model and the nonparametric model give similar answers. The right figure shows data generated from a noisy nonlinear (sinusoidal) trend. Here the nonparametric regression adapts to the nonlinearities in the data, and the inadequacy of the linear trend can be seen as a difference between the two regression models. Of course, in the univariate case we can easily see that a linear trend is inadequate just by plotting the data even without comparing it to a nonparametric regression. However, for high-dimensional manifolds this type of plot is not available. This is where a comparison to a nonparametric trend is highly useful. In the results below (Sect. 2.6.2) we give an example of how this comparison to nonparametric regression can be used as a diagnostic of model fit in shape analysis applications. The nonparametric regression method that we use for comparison is the one given by Davis et al. [2], which we review now.

Fig. 2.3 Comparison of linear (black) and nonparametric (red) regressions as a test of fit. When the data is generated from a linear model (left), the two regression methods produce similar results. When the data is generated from a nonlinear model (right), the difference in the two models helps detect that a linear model is insufficient

4.1 Review of Univariate Kernel Regression

Before reviewing the manifold version, we give a quick overview of univariate kernel regression as developed by Nadaraya [19] and Watson [25]. As in the linear regression setting, we are interested in finding a relationship between data \(x_{1},\ldots ,x_{N} \in \mathbb{R}\), coming from an independent variable X, and data \(y_{1},\ldots ,y_{N} \in \ \mathbb{R}\), representing a dependent variable Y. The model of their relationship is given by

$$Y = f(X)+\epsilon ,$$

where f is an arbitrary function, and ε is a random variable representing the error. In contrast to linear regression, the function f is not assumed to have any particular parametric form. Instead, it is estimated from the data by local weighted averaging:

$$\hat{f}_{h}(x) = \frac{\displaystyle\sum\nolimits_{i=1}^{N}K_{h}(x - x_{i})y_{i}} {\displaystyle\sum \nolimits_{i=1}^{N}K_{h}(x - x_{i})}.$$

In this equation, K is a function that satisfies ∫K(t) dt = 1 and \(K_{h}(t) = \frac{1} {h}K( \frac{t} {h})\), with bandwidth parameter h > 0. This is the estimation procedure shown in Fig. 2.3 (red curves).
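
For reference, a minimal NumPy version of this estimator with a Gaussian kernel (the kernel choice, bandwidth, and names are illustrative):

```python
import numpy as np

def gaussian_kernel(t):
    return np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x_query, x, y, h):
    """Nadaraya-Watson estimate of f(x_query) from data (x_i, y_i) with bandwidth h."""
    w = gaussian_kernel((x_query - x) / h) / h        # K_h(x_query - x_i)
    return np.sum(w * y) / np.sum(w)

# Example: recover a noisy sinusoidal trend on a grid of query points
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(100)
grid = np.linspace(0.0, 1.0, 200)
f_hat = np.array([nadaraya_watson(t, x, y, h=0.05) for t in grid])
```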

4.2 Nonparametric Kernel Regression on Manifolds

The regression method of Davis et al. [2] generalizes the Nadaraya-Watson kernel regression method to the case where the dependent variable lives on a Riemannian manifold, i.e., y i  ∈ M. Here the model is given by

$$Y =\mathrm{ Exp}(f(X),\epsilon ),$$

where \(f : \mathbb{R} \rightarrow M\) defines a curve on M, and ε ∈ T f(X) M is an error term. As in the univariate case, there are no assumptions on the parametric form of the curve f.

Motivated by the definition of the Nadaraya-Watson estimator as a weighted averaging, the manifold kernel regression estimator is defined using a weighted Fréchet sample mean as

$$\hat{f}_{h}(x) =\arg \min\limits_{y\in M}\frac{\displaystyle\sum\nolimits_{i=1}^{N}K_{h}(x - x_{i})d{(y,y_{i})}^{2}} {\displaystyle\sum\nolimits_{i=1}^{N}K_{h}(x - x_{i})}.$$

Notice that when the manifold under study is a Euclidean vector space, equipped with the standard Euclidean norm, the above minimization results in the Nadaraya-Watson estimator.
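
A sketch of this estimator: the kernel-weighted Fréchet mean is computed by repeatedly stepping along \(\sum w_{i}\mathrm{Log}_{y}(y_{i})/\sum w_{i}\), which is a gradient iteration for the weighted sum-of-squared distances. The exp_map/log_map callables (for instance, the sphere versions sketched earlier), the Gaussian kernel, and the iteration parameters are illustrative assumptions.

```python
import numpy as np

def weighted_frechet_mean(y_list, weights, exp_map, log_map, n_iter=50, tau=0.5):
    """Weighted Frechet mean by gradient iteration on sum_i w_i d(y, y_i)^2."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = y_list[int(np.argmax(w))]                 # start at the most heavily weighted point
    for _ in range(n_iter):
        delta = sum(wi * log_map(mu, yi) for wi, yi in zip(w, y_list))
        mu = exp_map(mu, tau * delta)
    return mu

def manifold_kernel_regression(x_query, x, y_list, h, exp_map, log_map):
    """Nadaraya-Watson-style estimator on a manifold: a kernel-weighted Frechet mean."""
    w = np.exp(-0.5 * ((x_query - np.asarray(x, dtype=float)) / h) ** 2)   # Gaussian kernel weights
    return weighted_frechet_mean(y_list, w, exp_map, log_map)
```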

4.3 Bandwidth Selection

It is well known within the kernel regression literature that kernel width plays a crucial role in determining regression results [24]. In particular, it is important to select a bandwidth that captures relevant population-wide changes without either oversmoothing and missing relevant changes or undersmoothing and biasing the results based on individual noisy data points. The ‘Goldilocks’ method of tuning the bandwidth until the results are most pleasing is a common subjective method for bandwidth selection. However, non-subjective methods may be required, for example, when kernel regression is part of a larger statistical study. A number of automatic kernel bandwidth selection techniques have been proposed for this purpose [24].

One classic method for automatic bandwidth selection is based on least squares cross-validation. This method is easily extended to the manifold regression setting in the following way. The least squares cross-validation estimate for the optimal bandwidth h is defined as

$$\hat{h}_{\text{LSCV}} =\arg \min\limits_{h\in {\mathbb{R}}^{+}} \frac{1} {N}\displaystyle\sum _{i=1}^{N}d{\left (\hat{f}_{ h}^{(i)}(x_{ i}),y_{i}\right )}^{2},$$

where

$$\hat{f}_{h}^{(i)}(t) =\arg \min\limits_{ y\in M}\left (\frac{\displaystyle\sum\nolimits_{j=1,j\neq i}^{N}K_{h}(t - x_{j})d{(y,y_{j})}^{2}} {\displaystyle\sum\nolimits_{j=1,j\neq i}^{N}K_{h}(t - x_{j})} \right )$$

is the manifold kernel regression estimator with the i-th observation left out. This cross-validation method was used to select the bandwidth for the kernel regression example in Fig. 2.3.
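
A direct transcription of this criterion, searching a user-supplied grid of candidate bandwidths. The regress and dist arguments (a kernel regression estimator and the geodesic distance) are assumed to be provided, e.g., by wrapping the manifold kernel regression sketch above.

```python
import numpy as np

def lscv_bandwidth(x, y_list, h_grid, regress, dist):
    """Leave-one-out least squares cross-validation for the kernel bandwidth h.

    regress(x_query, x_train, y_train, h) -> predicted manifold point
    dist(a, b)                            -> geodesic distance between two points
    """
    x = np.asarray(x, dtype=float)
    best_h, best_score = None, np.inf
    for h in h_grid:
        score = 0.0
        for i in range(len(x)):
            keep = [j for j in range(len(x)) if j != i]       # leave observation i out
            f_i = regress(x[i], x[keep], [y_list[j] for j in keep], h)
            score += dist(f_i, y_list[i]) ** 2
        score /= len(x)
        if score < best_score:
            best_h, best_score = h, score
    return best_h

# e.g., regress = lambda t, xs, ys, h: manifold_kernel_regression(t, xs, ys, h, exp_map, log_map)
```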

5 Results: Regression of 3D Rotations

5.1 Overview of Unit Quaternions

We represent 3D rotations as the unit quaternions, \(\mathbb{Q}_{1}\). A quaternion is denoted as q = (a, v), where a is the “real” component and \(v = bi + cj + dk\). Geodesics in the rotation group are given simply by constant speed rotations about a fixed axis. Let e = (1, 0) be the identity quaternion. The tangent space \(T_{e}\mathbb{Q}_{1}\) is the vector space of quaternions of the form (0, v). The tangent space at an arbitrary point \(q \in \mathbb{Q}_{1}\) is given by right multiplication of \(T_{e}\mathbb{Q}_{1}\) by q. The Riemannian exponential map is \(\mathrm{Exp}_{q}((0,v) \cdot q) = (\cos (\theta /2),2v \cdot \sin (\theta /2)/\theta ) \cdot q\), where \(\theta = 2\|v\|\). The log map is given by \(\mathrm{Log}_{q}((a,v) \cdot q) = (0,\theta v/\|v\|) \cdot q\), where θ = arccos(a).

Being a unit sphere, \(\mathbb{Q}_{1}\) has constant sectional curvature K = 1. In this case the orthogonal component of the Jacobi field equation (2.6) along a geodesic γ(t) has the analytic solution

$$J{(t)}^{\perp } = u_{ 1}(t)\cos \left (Lt\right ) + u_{2}(t)\frac{\sin \left (Lt\right )} {L} ,$$

where u 1, u 2 are parallel vector fields along γ, with initial conditions u 1(0) = J(0) ⊥  and u 2(0) = J′(0) ⊥ , and \(L =\|\gamma ^{\prime}\|\). While the Jacobi field equation gives us the differential of the exponential map, we really need the adjoint of this operator for geodesic regression. However, from the above equation it is clear that d p Exp and d v Exp are both self-adjoint operators. That is, the above Jacobi field equation provides us both the differential and its adjoint.
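
The exponential and log maps given above translate directly into code. The sketch below stores a quaternion as a 4-vector (a, b, c, d), implements the formulas exactly as written, and recovers v from a tangent vector of the form (0, v) · q by right-multiplying with the conjugate of q; the helper names are our own.

```python
import numpy as np

def qmul(q1, q2):
    """Quaternion product; quaternions stored as (a, b, c, d) = (real, i, j, k)."""
    a1, v1 = q1[0], q1[1:]
    a2, v2 = q2[0], q2[1:]
    return np.concatenate(([a1 * a2 - np.dot(v1, v2)],
                           a1 * v2 + a2 * v1 + np.cross(v1, v2)))

def qconj(q):
    """Conjugate, which is the inverse for unit quaternions."""
    return np.concatenate(([q[0]], -q[1:]))

def exp_q(q, w):
    """Exp_q applied to the tangent vector w = (0, v) . q."""
    v = qmul(w, qconj(q))[1:]                    # recover v
    theta = 2.0 * np.linalg.norm(v)
    if theta < 1e-12:
        return q.copy()
    u = np.concatenate(([np.cos(theta / 2.0)], 2.0 * np.sin(theta / 2.0) * v / theta))
    return qmul(u, q)

def log_q(q, r):
    """Log_q(r): the tangent vector at q pointing toward the unit quaternion r."""
    u = qmul(r, qconj(q))                        # write r = (a, v) . q
    a, v = np.clip(u[0], -1.0, 1.0), u[1:]
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return np.zeros(4)
    return qmul(np.concatenate(([0.0], np.arccos(a) * v / nv)), q)
```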

5.2 Geodesic Regression of Simulated Rotation Data

To test the geodesic regression least squares estimation on \(\mathbb{Q}_{1}\), synthetic rotation data was simulated according to the geodesic model (2.3). The intercept was the identity rotation: p = (1, 0, 0, 0), and the slope was a rotation about the z-axis: \(v = (0,0,0,\pi /4)\). The x i data were drawn from a uniform distribution on [0, 1]. The errors in the model were generated from an isotropic Gaussian distribution in the tangent space, with \(\sigma =\pi /8\). The resulting data (x i , y i ) were used to compute estimates of the parameters \((\hat{p},\hat{v})\). This experiment was repeated 1,000 times each for sample sizes \(N = {2}^{k}\), \(k = 2,3,\ldots ,8\). We would expect that as the sample size increases, the mean squared error (MSE) in the estimates \((\hat{p},\hat{v})\), relative to the true parameters, would approach zero. The MSE is defined as

$$\mathrm{MSE}(\hat{p}) = \frac{1} {M}\displaystyle\sum _{i=1}^{M}d{(\hat{p}_{ i},p)}^{2},\quad \quad \mathrm{MSE}(\hat{v}) = \frac{1} {M}\displaystyle\sum _{i=1}^{M}\|\hat{v}_{ i} \cdot (\hat{p}_{i}^{-1}p) - {v\|}^{2},$$

where M = 1,000 is the number of repeated trials, and \((\hat{p}_{i},\hat{v}_{i})\) is the estimate from the ith trial. Notice the multiplication by \((\hat{p}_{i}^{-1}p)\) in the second equation is a right-translation of \(\hat{v}_{i}\) to the tangent space of p. Figure 2.4 shows plots of the resulting MSE for the slope and intercept estimates. As expected, the MSE approaches zero as sample size increases, indicating at least empirically that the least squares estimates are consistent.

Fig. 2.4 Results for simulated rotation data: MSE of the geodesic regression estimates for the intercept (left) and slope (right) as a function of sample size

6 Results: Regression in Shape Spaces

One area of medical image analysis and computer vision that finds the most widespread use of Riemannian geometry is the analysis of shape. Dating back to the groundbreaking work of Kendall [13] and Bookstein [1], modern shape analysis is concerned with the geometry of objects that is invariant to rotation, translation, and scale. This typically results in representing an object’s shape as a point in a nonlinear Riemannian manifold, or shape space. Recently, there has been a great amount of interest in Riemannian shape analysis, and several shape spaces for 2D and 3D objects have been proposed [8, 15, 17, 26]. We choose here to use Kendall’s shape space, but geodesic regression is applicable to other shape spaces as well. It could also be applied to spaces of diffeomorphisms, using the Jacobi field calculations given by Younes [27]. In fact, Niethammer et al. [20] recently independently developed geodesic regression for diffeomorphic transformations of image time series. They solve the gradient descent problem with an elegant control theory approach, constraining the regression curve to be a geodesic using Lagrange multipliers. The resulting update to the geodesic’s initial conditions involves a numerical integration of an adjoint equation backwards along the geodesic with jump conditions at the data points.

6.1 Overview of Kendall’s Shape Space

We begin with derivations of the necessary computations for geodesic regression on Kendall’s shape space. A configuration of k points in the 2D plane is considered as a complex k-vector, \(z \in {\mathbb{C}}^{k}\). Removing translation, by requiring the centroid to be zero, projects this point to the linear complex subspace \(V =\{ z \in {\mathbb{C}}^{k} :\sum z_{i} = 0\}\), which is equivalent to the space \({\mathbb{C}}^{k-1}\). Next, points in this subspace are deemed equivalent if they are a rotation and scaling of each other, which can be represented as multiplication by a complex number, \(\rho {e}^{i\theta }\), where ρ is the scaling factor and θ is the rotation angle. The set of such equivalence classes forms the complex projective space, \(\mathbb{C}{P}^{k-2}\). As Kendall points out, there is no unique way to identify a shape with a specific point in complex projective space. However, if we consider that the geodesic regression problem only requires computation of exponential/log maps and Jacobi fields, we can formulate these computations without making an explicit identification of shapes with points in \(\mathbb{C}{P}^{k-2}\).

Thus, we think of a centered shape x ∈ V as representing the complex line \(L_{x} =\{ z \cdot x : z \in \mathbb{C}\setminus \{0\}\,\}\), i.e., L x consists of all point configurations with the same shape as x. A tangent vector at L x  ∈ V is a complex vector, v ∈ V , such that ⟨x, v⟩ = 0. The exponential map is given by rotating (within V ) the complex line L x by the initial velocity v, that is,

$$\mathrm{Exp}_{x}(v) =\cos \theta \cdot x + \frac{\|x\|\sin \theta } {\theta } \cdot v,\quad \theta =\| v\|.$$
(2.8)

Likewise, the log map between two shapes x, y ∈ V is given by finding the initial velocity of the rotation between the two complex lines L x and L y . Let \(\pi _{x}(y) = x \cdot \langle x,y\rangle /\|{x\|}^{2}\) denote the projection of the vector y onto x. Then the log map is given by

$$\mathrm{Log}_{x}(y) = \frac{\theta \cdot \left (y -\pi _{x}(y)\right )} {\|y -\pi _{x}(y)\|} ,\quad \theta =\arccos \frac{\vert \langle x,y\rangle \vert } {\|x\|\|y\|}.$$
(2.9)

Notice that we never explicitly project a shape onto \(\mathbb{C}{P}^{k-2}\). This has the effect that shapes computed via the exponential map (2.8) will have the same orientation and scale as the base point x. Also, tangent vectors computed via the log map (2.9) are valid only at the particular representation x (and not at a rotated or scaled version of x). This works nicely for our purposes and implies that shapes along the estimated geodesic will have the same orientation and scale as the intercept shape, \(\hat{p}\).
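
Equations (2.8) and (2.9) translate into a short sketch using complex NumPy arrays, with the Hermitian inner product and the centering step written out explicitly; the helper names are illustrative.

```python
import numpy as np

def center(z):
    """Remove translation from a configuration of k landmarks stored as a complex k-vector."""
    return z - z.mean()

def herm(x, y):
    """Hermitian inner product <x, y> on C^k."""
    return np.sum(np.conj(x) * y)

def shape_exp(x, v):
    """Exponential map (2.8) at the representative x, for a tangent vector v with <x, v> = 0."""
    theta = np.sqrt(herm(v, v).real)
    if theta < 1e-12:
        return x.copy()
    return np.cos(theta) * x + (np.linalg.norm(x) * np.sin(theta) / theta) * v

def shape_log(x, y):
    """Log map (2.9): initial velocity at x of the geodesic toward the shape of y."""
    proj = x * herm(x, y) / np.linalg.norm(x) ** 2           # projection of y onto x
    w = y - proj
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.zeros_like(x)
    c = np.abs(herm(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0)) * w / norm_w

# The geodesic distance between two centered configurations x, y is d(x, y) = ||shape_log(x, y)||
```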

The sectional curvature of \(\mathbb{C}{P}^{k-2}\) can be computed as follows. Let u, w be orthonormal vectors at a point \(p \in \mathbb{C}{P}^{k-2}\). These vectors may be thought of as vectors in \({\mathbb{C}}^{k-1}\cong{\mathbb{R}}^{2k-2}\). Writing the vector w as \(w = (w_{1},\ldots ,w_{2k-2})\), define the operator

$$j(w) = (-w_{k},\ldots ,-w_{2k-2},w_{1},\ldots ,w_{k-1}).$$

(This is just multiplication by \(i = \sqrt{-1}\) if we take w as a complex vector with the k − 1 real coordinates listed first.) Using this operator, the sectional curvature is given by

$$K(u,w) = 1 + 3\langle u,j{(w)\rangle }^{2}.$$

When k = 3, \(\mathbb{C}{P}^{1}\) is the space of triangle shapes and is isomorphic to the sphere, S 2, and thus has constant sectional curvature, K = 1. For k > 3, \(\mathbb{C}{P}^{k-2}\) has sectional curvature in the interval K ∈ [1, 4]. Furthermore, let \(u \in T_{p}\mathbb{C}{P}^{k-2}\) be any unit length vector. If we decompose the tangent space into an orthonormal basis \(e_{1},\ldots ,e_{2k-2}\), such that e 1 = j(u), then we have K(u, e 1) = 4 and K(u, e i ) = 1 for i > 1. This leads to the following procedure for computing the Jacobi field equation on \(\mathbb{C}{P}^{k-2}\) along a geodesic γ. Given initial conditions for J(0) ⊥  and J′(0) ⊥ , decompose \(J{(0)}^{\perp } = u_{1} + w_{1}\), so that u 1 is orthogonal to j(γ′) and w 1 is tangential to j(γ′). Do the same for \(J^{\prime}{(0)}^{\perp } = u_{2} + w_{2}\). As before, extend these vectors to parallel fields, u i (t), w i (t), along γ. Then the orthogonal component of the Jacobi field along γ is given by

$$\begin{array}{rlrlrl} J{(t)}^{\perp } = u_{ 1}(t)\cos \left (Lt\right ) + u_{2}(t)\frac{\sin \left (Lt\right )} {L} + w_{1}(t)\cos \left (2Lt\right ) + w_{2}(t)\frac{\sin \left (2Lt\right )} {2L}. & & \end{array}$$

As was the case for rotations, both d p Exp and d v Exp are self-adjoint operators.

6.2 Application to Corpus Callosum Aging

The corpus callosum is the major white matter bundle connecting the two hemispheres of the brain. A midsagittal slice from a magnetic resonance image (MRI) with segmented corpus callosum is shown in Fig. 2.5. Several studies have shown that the volume of the corpus callosum decreases with normal aging [4]. However, less is known about how the shape of the corpus callosum changes with age. Understanding shape changes may provide a deeper understanding of the anatomical and biological processes underlying aging. For example, does the corpus callosum shrink uniformly in size, or do certain regions deteriorate faster than others? This type of question can be answered by geodesic regression in shape spaces.

Fig. 2.5 Corpus callosum segmentation and boundary point model for one subject

To understand age-related changes in the shape of the corpus callosum, geodesic regression was applied to corpus callosum shape data derived from the OASIS brain database (www.oasis-brains.org). The data consisted of MRI from 32 subjects with ages ranging from 19 to 90 years old. The corpus callosum was segmented in a midsagittal slice using the ITK SNAP program (www.itksnap.org). The boundaries of these segmentations were sampled with 128 points using ShapeWorks (www.sci.utah.edu/software.html). This algorithm generates a sampling of a set of shape boundaries while enforcing correspondences between different point models within the population. An example of a segmented corpus callosum and the resulting boundary point model is shown in Fig. 2.5. The entire collection of input shapes and their ages is shown in Fig. 2.6 (boundary points have been connected into a boundary curve for visualization purposes). Each of these preprocessing steps was done without consideration of the subject age, to avoid any bias in the data generation.

Fig. 2.6 The input corpus callosum shape data and corresponding subject ages in years

Geodesic regression was applied to the data (x i , y i ), where x i was the ith subject’s age, and y i was the ith subject’s corpus callosum, generated as above and represented as a point in Kendall’s shape space. First, the average age of the group, \(\bar{x}\), was subtracted from each x i , which was done to make the intercept term correspond to the shape at the mean age, rather than the shape at age 0, which would be far outside the data range. Least squares estimates \((\hat{p},\hat{v})\) were generated according to (2.5), and using the above calculations for \(\mathbb{C}{P}^{k-2}\). The resulting estimated geodesic is shown in Fig. 2.7 as a sequence of shapes: \(\hat{\gamma }(t_{k})\,=\,\mathrm{Exp}(\hat{p},(t_{k} -\bar{ x})\hat{v})\), for t k  = 19, 36, 54, 72, 90. The shape trend shows a very clear thinning of the corpus callosum, with the largest effects in the posterior part of the body and in the genu (anterior end).

Fig. 2.7 Geodesic regression of the corpus callosum. The estimated geodesic is shown as a sequence of shapes from age 19 (blue) to age 90 (red)

The statistical significance of the estimated trend was tested using the permutation test described in Sect. 2.3.2, using 10,000 permutations. The p-value for the significance of the slope estimate, \(\hat{v}\), was p = 0.009. The coefficient of determination (for the unpermuted data) was \(R^{2} = 0.12\). The low \(R^{2}\) value must be interpreted carefully. It says that age only describes a small fraction of the shape variability in the corpus callosum. This is not surprising: we would expect the intersubject variability in corpus callosum shape to be difficult to fully describe with a single variable (age). However, this does not mean that the age effects are not important. In fact, the low p-value says that the estimated age changes are highly unlikely to have been found by random chance.

Finally, the appropriateness of the resulting geodesic model fit was tested using a comparison to nonparametric regression, as outlined in Sect. 2.4. First, a nonparametric kernel regression of the corpus callosum data versus age was computed using the method developed by Davis et al. [2] and reviewed in Sect. 2.4.2. The kernel regression was performed on the same Kendall shape space manifold and the bandwidth was chosen automatically using the cross-validation procedure described in Sect. 2.4.3. Next, the resulting corpus callosum shape trend generated by the kernel regression method was compared to the result of the geodesic regression. This was done by again generating shapes from the geodesic model \(\hat{\gamma }(t_{k})\) at a sequence of ages, t k , and overlaying the corresponding generated shapes from the kernel regression model at the same ages. The results are plotted for ages t k  = 20, 44, 66, and 90 (Fig. 2.8). Both regression methods give strikingly similar results. The two regression models at other values of ages, not shown, are also close to identical. This indicates that a geodesic curve does capture the relationship between age and corpus callosum shape, and that the additional flexibility offered by the nonparametric regression does not change the estimated trend. However, even though both methods provide a similar estimate of the trend, the geodesic regression has the advantage that it is simpler to compute and easier to interpret, from the standpoint of the R 2 statistic and hypothesis test demonstrated above.

Fig. 2.8 Comparison of geodesic regression (solid black) and nonparametric kernel regression (dashed red) of the corpus callosum shape versus age

7 Conclusion

We introduced a geodesic regression analysis method for Riemannian manifolds. The geodesic regression model is the natural generalization of linear regression and is parameterized by an intercept and slope term. We also developed a generalization of the R 2 statistic and a permutation test for the significance of the estimated geodesic trend. There are several avenues for future work. First, the hypothesis test presented here could be extended to test for group differences, for example, to test if age-related anatomical changes are different in a disease population compared to controls. Second, theoretical properties of geodesic regression, such as unbiasedness and consistency, would be of interest. Finally, regression diagnostics and model selection procedures need to be developed to assess the appropriateness of a geodesic model for a particular data set.