
1 Introduction

In biological tissues such as nerve fiber bundles and muscles, the spontaneous heat motion of water molecules is restricted by obstacles in the fibrous microstructure. Diffusion Imaging [70] uses the principles of Magnetic Resonance Imaging (MRI) to non-invasively measure properties of this motion, which is also known as self-diffusion. When applied to the human brain, this provides unique insights about brain connectivity, which makes diffusion MRI one of the key technologies in an ongoing large-scale scientific effort to map the human brain connectome [33]. Consequently, it is a timely and important topic of research to create mathematical models that infer biologically meaningful parameters from such data.

Higher-order tensors have been used in applications ranging from psychometrics [64] and chemometrics [103] to signal processing [102], computer vision [110], and neuroscience [85]. They also provide adequate models for a number of quantities that occur in the context of diffusion imaging. Many practitioners view higher-order tensors as a generalization of matrices to multi-way arrays. However, tensors can also be studied in an invariant, coordinate-free notation. Tensor decompositions are an active and challenging topic in applied mathematics, since fundamental concepts from linear algebra, such as the singular value decomposition, do not have a unique generalization to higher order, and most generalizations are hard to compute.

It is a goal of our survey to stimulate an active exchange between mathematicians, who are studying tensor decompositions and the geometry of tensors, and computer scientists and MR physicists, who are interested in using tensors as mathematical tools in the context of diffusion MRI. Therefore, unlike previous surveys [39, 90], Sect. 2 provides a broad overview of all physical quantities that have been modeled with higher-order tensors in the context of diffusion MRI. On the other hand, our introduction to the higher-order tensor formalism in Sect. 3 differs from existing discussions [66, 73] by focusing on aspects relevant to this specific application.

Relevant literature is spread over journals in applied mathematics, MR physics, neuroimaging, and computer science. Drawing on all these fields, Sect. 4 presents the current state of the art on fitting higher-order tensor models to the measured data, and Sect. 5 discusses operations performed on the tensors for further analysis. Among others, this includes computation of scalar invariants (Sect. 5.1), maximum detection (Sect. 5.3), and tensor decompositions (Sect. 5.4).

2 Overview of Higher-Order Tensor Models in dMRI

Different physical quantities that can be measured by or inferred from diffusion MRI have been modeled with higher-order tensors. The resulting tensors not only differ in their interpretation, but also in dimension, order, and symmetry.

Diffusion imaging inserts magnetic field gradients into the MR sequence which sensitize the measurement to molecular motion along the gradient direction [70]. Compared to an image without diffusion weighting, this leads to an attenuation of signal strength. The standard diffusion tensor model [18] assumes that the diffusion-weighted MR signal in direction u is given by a monoexponential attenuation of the unweighted signal \(S_{0}\), depending on the diffusion weighting b and a directionally dependent apparent diffusion coefficient, modeled by a diffusion tensor D:

$$\displaystyle{ S(\mathbf{u}) = S_{0}\mathrm{e}^{-b\mathbf{u}^{\mathrm{T}}\mathbf{D}\mathbf{u} } }$$
(1)

Estimating the six unique coefficients of D requires measurements in at least six different gradient directions. Typical parameter values are b ∈ [700, 1000] s/mm², and a spatial resolution of around 2 × 2 × 2 mm³. When studying the human brain, this corresponds to a subdivision into around 10⁵ volume elements (voxels); a separate diffusion tensor D is computed for each of them.
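
As a concrete illustration of this least-squares fit, the following sketch log-linearizes Eq. (1) and solves for the six unique coefficients of D with a pseudo-inverse. The function and variable names are our own, and a production implementation would typically add weighted least squares and data quality checks.

```python
# A minimal sketch of log-linear diffusion tensor estimation per Eq. (1);
# the inputs are assumed placeholders, not data from the survey.
import numpy as np

def fit_diffusion_tensor(signals, S0, bvals, dirs):
    """signals: (N,) values S(u_i); bvals: (N,) weightings b_i;
    dirs: (N, 3) unit gradient directions u_i."""
    dirs = np.asarray(dirs, dtype=float)
    y = -np.log(np.asarray(signals) / S0) / np.asarray(bvals)  # u^T D u per measurement
    ux, uy, uz = dirs.T
    # Monomials of the quadratic form; the factor 2 absorbs the off-diagonal symmetry.
    A = np.column_stack([ux**2, uy**2, uz**2, 2*ux*uy, 2*ux*uz, 2*uy*uz])
    x, *_ = np.linalg.lstsq(A, y, rcond=None)                  # pseudo-inverse solution
    Dxx, Dyy, Dzz, Dxy, Dxz, Dyz = x
    return np.array([[Dxx, Dxy, Dxz],
                     [Dxy, Dyy, Dyz],
                     [Dxz, Dyz, Dzz]])
```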

Since nerve fibers are on the micrometer scale and therefore far below image resolution, their complex organization often leads to apparent diffusivities D(u) that are poorly approximated by a quadratic function. For these cases, Eq. (1) has been generalized to use higher-order polynomials. As will be explained in Sect. 3.3, this corresponds to a higher-order diffusion tensor \(\mathcal{D}\) [86]:

$$\displaystyle{ S(\mathbf{u}) = S_{0}\mathrm{e}^{-b\,D(\mathbf{u})}\quad \text{with}\quad D(\mathbf{u}) = \mathcal{D}\cdot ^{k}\mathbf{u} }$$
(2)

Such High Angular Resolution Diffusion Imaging (HARDI) models require a larger number of 30–100 gradient directions, and larger b ∈ [1000, 3000] s/mm².

One goal in diffusion imaging is to estimate the dominant nerve fiber directions within each voxel. When there is only one such direction, the principal eigenvector of the diffusion tensor D is aligned with it. However, a mixture of multiple fiber directions is not easily resolved with the higher-order diffusion tensor \(\mathcal{D}\). For this purpose, it is easier to consider the diffusion propagator P(x), the probability density of a molecular displacement along vector x within the diffusion time. Under certain assumptions, P(x) can be computed from \(\mathcal{D}\); this will be the topic of Sect. 5.2.

Writing the diffusion propagator P(x) in spherical coordinates and integrating over the radius results in the diffusion orientation distribution function ψ(u), whose maxima approximate the main nerve fiber directions. The q-ball model has been introduced as an approximate way of computing ψ(u) [109]. Even though its exact interpretation has been disputed [17], q-ball maxima indicate approximate fiber directions, and q-balls are sometimes expressed in a tensor basis [7, 55], making it relevant to compute the maxima of homogeneous forms (cf. Sect. 5.3).

When measuring at different b values, it is common to observe that the true signal attenuation is not monoexponential, as assumed by Eqs. (1) and (2). This indicates that the diffusion propagator P(x) is non-Gaussian. Accounting for all higher-order moments of P leads to a different generalization of Eq. (1) [77, 78],

$$\displaystyle{ S(\mathcal{B}) = S_{0}\,\mathrm{e}^{\sum _{k=2}^{\infty }j^{k}\langle \mathcal{D}^{(k)},\mathcal{B}^{(k)}\rangle }, }$$
(3)

where j is the imaginary unit, \(\mathcal{D}^{(k)}\) is a series of diffusion tensors with increasing order k, and the diffusion-weighted signal \(S(\mathcal{B})\) is a function of a series of tensors \(\mathcal{B}^{(k)}\), which combine information about the direction and strength of the diffusion weighting. \(\langle \mathcal{D}^{(k)},\mathcal{B}^{(k)}\rangle\) denotes the scalar product of the two tensors.

In contrast to Eq. (2), which uses a single higher-order tensor \(\mathcal{D}\) that contains all the information that would be present in lower-order approximations, each element in the series of tensors \(\mathcal{D}^{(k)}\) in Eq. (3) contains non-redundant information that is independent from all other orders k. This additional information needs to be acquired by sampling multiple b values in several gradient directions [79].

The tensors in Eq. (3) are three-dimensional, and symmetric under all index permutations. The odd orders k in Eq. (3) carry information about asymmetries in the diffusion propagator, i.e., P(−x) ≠ P(x). However, that information resides in the phase of the complex-valued MR signal. At the current state of the art, the signal phase in diffusion MRI is so heavily corrupted by measurement noise and artifacts that it is not informative. Therefore, practical implementations of this generalization are limited to estimating even-order tensors from the signal magnitude [80].

Diffusional Kurtosis Imaging augments the second-order diffusion tensor D in Eq. (1) with a fourth-order kurtosis tensor \(\mathcal{W}\) [61],

$$\displaystyle{ S(\mathbf{u},b) = S_{0}\mathrm{e}^{-b\mathbf{u}^{\mathrm{T}}\mathbf{D}\mathbf{u}+\frac{1} {6} b^{2}\left (\frac{1} {3} \mathrm{tr}(\mathbf{D})\right )^{2}\mathcal{W}\cdot ^{4}\mathbf{u} }, }$$
(4)

where tr indicates matrix trace. Computing the parameters in Eq. (4) requires measurements at multiple b values, but no signal phase. They capture the same information present in the second and fourth moments of P(x), but allow for simpler computation of the apparent diffusional kurtosis \(K_{\mathrm{app}}\) in direction u:

$$\displaystyle{ K_{\mbox{ app}}(\mathbf{u}) = \frac{\left (\frac{1} {3}\mathrm{tr}(\mathbf{D})\right )^{2}} {\left (\mathbf{u}^{\mathrm{T}}\mathbf{D}\mathbf{u}\right )^{2}} \mathcal{W}\cdot ^{4}\mathbf{u} }$$
(5)

For Gaussian diffusion, \(K_{\mathrm{app}} = 0\). Negative kurtosis is expected from diffusion restricted by spherical pores, and positive kurtosis can indicate the presence of heterogeneous diffusion compartments [61].

Fourth-order covariance tensors Σ occur in statistical models of second-order diffusion tensors [19]. Even though they are three-dimensional in each mode, they only possess partial symmetries (\(\Sigma _{ijkl} =\Sigma _{klij}\); \(\Sigma _{ijkl} =\Sigma _{jikl}\); \(\Sigma _{ijkl} =\Sigma _{ijlk}\)) [20].

If we assume that all nerve fiber bundles within a voxel have approximately the same diffusion characteristics, the MR signal is given by the convolution of a fiber orientation density function (fODF) with a kernel describing the single fiber response [107]. Unlike the diffusion ODF, values of the fODF F(u) are interpreted as the fraction of fibers aligned with direction u. F(u) can be obtained by spherical deconvolution and a variant of that technique, which will be explained in Sect. 4.3, allows for further analysis of the fODF via tensor decomposition [100].

3 Mathematical Background

We include a basic introduction to tensors and tensor fields. In a nutshell, a tensor of order p or p-tensor is a multilinear functional on p vector spaces \(T: \mathbb{V} \times \mathbb{V} \times \ldots \times \mathbb{V} \rightarrow \mathbb{R}\), and can be represented in coordinates as a p-dimensional matrix \(A \in \mathbb{R}^{n\times n\times \ldots \times n}\), \(n =\dim (\mathbb{V})\), if one chooses a basis on \(\mathbb{V}\). A tensor field is a tensor-valued function on a manifold. We refer readers who are interested in further properties of tensors and hypermatrices to [73] for an elementary treatment. Mathematically sophisticated readers may consult [66] for a much more in-depth treatment.

3.1 Basic Definitions

Let us first define our basic mathematical objects: (i) tensors, and (ii) tensor fields. Let \(\mathbb{V}\) be a vector space over \(\mathbb{R}\). An order-p tensor is a multilinear functional

$$\displaystyle{f:\mathop{\underbrace{\mathop{ \mathbb{V} \times \mathbb{V} \times \ldots \times \mathbb{V}}}\limits }\limits_{ p\text{ times}} \rightarrow \mathbb{R}.}$$

Multilinear means that if all arguments are kept constant but one, then f is linear in that varying argument, i.e.,

$$\displaystyle{ f(\mathbf{u}_{1},\ldots,\alpha \mathbf{v}_{i} +\beta \mathbf{w}_{i},\ldots,\mathbf{u}_{p}) =\alpha f(\mathbf{u}_{1},\ldots,\mathbf{v}_{i},\ldots,\mathbf{u}_{p}) +\beta f(\mathbf{u}_{1},\ldots,\mathbf{w}_{i},\ldots,\mathbf{u}_{p}), }$$
(6)

for every \(i = 1,\ldots,p\), \(\alpha,\beta \in \mathbb{R}\) and \(\mathbf{u}_{i},\mathbf{v}_{i},\mathbf{w}_{i} \in \mathbb{V}\). The set of all p-tensors is called the p-fold tensor product of the vector space \(\mathbb{V}\) and denoted

$$\displaystyle{\mathbb{V}^{\otimes p} =\mathop{\underbrace{\mathop{ \mathbb{V} \otimes \mathbb{V} \otimes \ldots \otimes \mathbb{V}}}\limits }\limits_{ p\text{ times}}.}$$

We ignore the distinction between covariant, contravariant and mixed tensors, since it is less relevant when working with coordinate representations in an orthonormal basis, as will be the case in this survey. An abstract approach towards tensors is now standard in basic graduate courses in algebra [57, 68] and even in mathematical methods courses for physicists [38]. However, such courses focus almost exclusively on properties of an entire space of tensors [116] as opposed to properties of an individual tensor, i.e., a specific element from such a tensor space. Properties of an individual tensor, such as rank, norm, eigenvalues, and decompositions, are of great relevance to us and will be discussed after we introduce tensor fields.

We will be informal in our treatment of tensor fields to make it more easily accessible. Readers who wish to see a rigorous definition would have no shortage of standard Refs. [24, 67, 112] to consult. Let M be a topological manifold which we may later endow with additional structures (differential, Riemannian, Finsler, etc.). A tensor field is, roughly speaking, a tensor-valued function \(F: M \rightarrow \mathbb{V}^{\otimes p}\) or alternatively, a function of the form

$$\displaystyle{ F: M \times \mathop{\underbrace{\mathop{ \mathbb{V} \times \mathbb{V} \times \ldots \times \mathbb{V}}}\limits }\limits_{ p\text{ times}} \rightarrow \mathbb{R} }$$
(7)

with the property that for every point x ∈ M,

$$\displaystyle{F(\mathbf{x};\cdot,\cdot,\ldots,\cdot ): \mathbb{V} \times \mathbb{V} \times \ldots \times \mathbb{V} \rightarrow \mathbb{R}}$$

is a multilinear functional, i.e., \(F(\mathbf{x};\mathbf{u}_{1},\ldots,\mathbf{u}_{p})\) is multilinear in the last p arguments for every fixed x ∈ M. If we want F to have additional properties like continuity or differentiability, this definition is only good locally, i.e., every x 0 ∈ M has a neighborhood \(U_{\mathbf{x}_{0}} \subseteq M\) such that

$$\displaystyle{F: U_{\mathbf{x}_{0}} \times \mathbb{V} \times \mathbb{V} \times \ldots \times \mathbb{V} \rightarrow \mathbb{R}}$$

is multilinear for every \(\mathbf{x} \in U_{\mathbf{x}_{0}}\). By far the most common choice for \(\mathbb{V}\) is \(T_{\mathbf{x}}(M)\), the tangent space at x, i.e., the vector space \(\mathbb{V}\) changes with each x and we really have a multilinear function

$$\displaystyle{F(\mathbf{x};\cdot,\cdot,\ldots,\cdot ): T_{\mathbf{x}}(M) \times \ldots \times T_{\mathbf{x}}(M) \rightarrow \mathbb{R}}$$

at each \(\mathbf{x} \in U_{\mathbf{x}_{0}}\). So each \(F(\mathbf{x};\cdot,\cdot,\ldots,\cdot )\) has a different domain, and F is really a family of multilinear functionals parameterized by x ∈ M. The proper treatment is to define F as a section (of a tensor product of vector bundles) as opposed to a function (with values in a tensor product of vector spaces). In fact, tensor fields are more than pointwise multilinear functionals: they satisfy the multilinearity condition in Eq. (6) with coefficients α, β being real-valued functions on M (usually in \(C^{\infty }(M)\) if \(M\) is a smooth manifold) instead of merely being constants in \(\mathbb{R}\).

The above discussions use the coordinate-free language of modern treatments of tensors and tensor fields in mathematics. In applications such as those considered in this survey, computations require introducing coordinates by choosing a basis on \(\mathbb{V}\). If we pick a basis \(\mathbf{b}_{1},\ldots,\mathbf{b}_{n}\), where \(n =\dim (\mathbb{V})\), then a multilinear functional f may be represented as an \(n \times n \times \ldots \times n\) (p times) array of elements of \(\mathbb{R}\):

$$\displaystyle{ \mathcal{A} = (a_{i_{1}i_{2}\cdots i_{p}})_{i_{1},\ldots,i_{p}=1}^{n} \in \mathbb{R}^{n\times \ldots \times n}. }$$
(8)

We shall use the term hypermatrix of order p, or simply p-hypermatrix, when referring to a p-dimensional matrix of the form in Eq. (8). The origin of this terminology would appear to be [37]. These objects are natural multilinear generalizations of matrices in the following way. Since we have fixed a basis, every vector in \(\mathbb{V}\) has a coordinate representation and we may assume that \(\mathbb{V} = \mathbb{R}^{n}\). A bilinear functional \(f: \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}\) can be encoded by a matrix \(\mathbf{A} = [a_{\mathit{ij}}]_{i,j=1}^{n} \in \mathbb{R}^{n\times n}\), in which the entry \(a_{\mathit{ij}}\) records the value of \(f(\mathbf{e}_{i},\mathbf{e}_{j}) \in \mathbb{R}\), where \(\mathbf{e}_{i}\) denotes the ith standard basis vector in \(\mathbb{R}^{n}\). By linearity in each coordinate, specifying A determines the values of f on all of \(\mathbb{R}^{n} \times \mathbb{R}^{n}\); in fact, we have \(f(\mathbf{u},\mathbf{v}) = \mathbf{u}^{T}\mathbf{Av}\) for any (column) vectors \(\mathbf{u},\mathbf{v} \in \mathbb{R}^{n}\). Thus, matrices encode all bilinear functionals. If \(\mathbf{A} = \mathbf{A}^{T}\) is symmetric, the corresponding bilinear functional is invariant under exchanging its arguments:

$$\displaystyle{f(\mathbf{u},\mathbf{v}) = \mathbf{u}^{T}\mathbf{Av} = (\mathbf{u}^{T}\mathbf{Av})^{T} = \mathbf{v}^{T}\mathbf{A}^{T}\mathbf{u} = \mathbf{v}^{T}\mathbf{Au} = f(\mathbf{v},\mathbf{u}).}$$

To avoid sub-subscripts, we will restrict our discussion to 4-tensors. A 4-tensor is a quadrilinear functional \(f: \mathbb{R}^{n} \times \mathbb{R}^{n} \times \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}\) which has a coordinate representation given by a 4-hypermatrix \(\mathcal{A} = (a_{\mathit{ijkl}})_{i,j,k,l=1}^{n} \in \mathbb{R}^{n\times n\times n\times n}\) as in Eq. (8) with p = 4. The subscripts and superscripts in Eq. (8) will be dropped whenever the range of i, j, k, l is obvious or unimportant. A 4-hypermatrix is said to be symmetric if the value of a ijkl stays the same for all 24 permutations of the indices:

$$\displaystyle{a_{\mathit{ijkl}} = a_{\mathit{ijlk}} = a_{\mathit{jilk}} =\ldots = a_{\mathit{lkji}}.}$$

Symmetric 4-tensors correspond to coordinate representations of quadrilinear maps \(f: \mathbb{R}^{n} \times \mathbb{R}^{n} \times \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}\) with

$$\displaystyle{f(\mathbf{t},\mathbf{u},\mathbf{v},\mathbf{w}) = f(\mathbf{t},\mathbf{u},\mathbf{w},\mathbf{v}) = f(\mathbf{u},\mathbf{t},\mathbf{v},\mathbf{w}) =\ldots = f(\mathbf{w},\mathbf{v},\mathbf{u},\mathbf{t}).}$$

The set of symmetric 4-hypermatrices is often denoted \(\mathsf{S}^{4}(\mathbb{R}^{n})\) and it forms a linear subspace of the vector space \(\mathbb{R}^{n\times n\times n\times n}\). More generally \(\mathsf{S}^{p}(\mathbb{V})\), the set of symmetric p-tensors over an arbitrary vector space \(\mathbb{V}\), may be defined in a coordinate-free manner [26] and forms a subspace of \(\mathbb{V}^{\otimes p}\).

What about tensor fields? Since any manifold M may be given local coordinates, we may view tensor fields as hypermatrix-valued functions \(F: M \rightarrow \mathbb{R}^{n\times \ldots \times n}\), \(\mathbf{x}\mapsto \mathcal{A}_{\mathbf{x}} = (a_{i_{1}i_{2}\cdots i_{p}}(\mathbf{x}))_{i_{1},\ldots,i_{p}=1}^{n}\), that are locally defined (roughly speaking, they are defined for local coordinates chosen for each neighborhood \(U_{\mathbf{x}} \subseteq M\)). The coordinate-dependent view of tensor fields as (hyper)matrix-valued functions is the classical approach. The subject, studied in this light, is often called tensor calculus, tensor analysis, or Ricci calculus. Tullio Levi-Civita, Gregorio Ricci-Curbastro, and Jan Schouten are usually credited with its invention [104].

3.2 Tensor Algebra and Homogeneous Polynomials

As we saw in the last section, a 4-hypermatrix \(\mathcal{A}\in \mathbb{R}^{n\times n\times n\times n}\) is a coordinate representation of a 4-tensor, i.e., a quadrilinear functional \(f: \mathbb{R}^{n} \times \mathbb{R}^{n} \times \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}\). The set of 4-hypermatrices is naturally equipped with algebraic operations inherited from the algebraic structure of the tensor product space \(\mathbb{R}^{n} \otimes \mathbb{R}^{n} \otimes \mathbb{R}^{n} \otimes \mathbb{R}^{n}\):

  • Addition and Scalar Multiplication: for \((a_{\mathit{ijkl}}),(b_{\mathit{ijkl}}) \in \mathbb{R}^{n\times n\times n\times n}\) and \(\lambda,\mu \in \mathbb{R}\),

    $$\displaystyle{ \lambda (a_{\mathit{ijkl}}) +\mu (b_{\mathit{ijkl}}) = (\lambda a_{\mathit{ijkl}} +\mu b_{\mathit{ijkl}}) \in \mathbb{R}^{n\times n\times n\times n}, }$$
    (9)
  • Outer Product Decomposition: every \(\mathcal{A} = (a_{\mathit{ijkl}}) \in \mathbb{R}^{n\times n\times n\times n}\) may be decomposed as

    $$\displaystyle{ \mathcal{A} =\sum \nolimits _{ q=1}^{r}\lambda _{ q}\,\mathbf{w}_{q} \otimes \mathbf{x}_{q} \otimes \mathbf{y}_{q} \otimes \mathbf{z}_{q},\qquad a_{\mathit{ijkl}} =\sum \nolimits _{ q=1}^{r}\lambda _{ q}w_{iq}x_{jq}y_{kq}z_{lq}, }$$
    (10)

    with \(\lambda _{q} \in \mathbb{R}\), \(\mathbf{w}_{q},\mathbf{x}_{q},\mathbf{y}_{q},\mathbf{z}_{q} \in \mathbb{R}^{n}\) for \(q = 1,\ldots,r\). The symbol ⊗ here denotes the Segre outer product: for vectors \(\mathbf{w} = [w_{1},\ldots,w_{n}]^{T},\ldots,\mathbf{z} = [z_{1},\ldots,z_{n}]^{T}\),

    $$\displaystyle{\mathbf{w} \otimes \mathbf{x} \otimes \mathbf{y} \otimes \mathbf{z}:= (w_{i}x_{j}y_{k}z_{l})_{i,j,k,l=1}^{n} \in \mathbb{R}^{n\times n\times n\times n},}$$

    with obvious generalization to an arbitrary number of vectors. The \(\ell\)-fold outer product of \(\mathbf{x}\) with itself is written \(\mathbf{x}^{\otimes \ell}\).

  • Multilinear Matrix Multiplication: every \(\mathcal{A} = (a_{\mathit{ijkl}}) \in \mathbb{R}^{n\times n\times n\times n}\) may be multiplied on its ‘4 sides’ by matrices \(\mathbf{W} = [w_{i\alpha }]\), \(\mathbf{X} = [x_{j\beta }]\), \(\mathbf{Y} = [y_{k\gamma }]\), \(\mathbf{Z} = [z_{l\delta }] \in \mathbb{R}^{n\times r}\) as follows (a numerical sketch follows this list):

    $$\displaystyle\begin{array}{rcl} \mathcal{A}\cdot (\mathbf{W},\mathbf{X},\mathbf{Y},\mathbf{Z})& =& (c_{\alpha \beta \gamma \delta })_{\alpha,\beta,\gamma,\delta =1}^{r} \in \mathbb{R}^{r\times r\times r\times r}, \\ c_{\alpha \beta \gamma \delta }& =& \sum \nolimits _{i,j,k,l=1}^{n}a_{\mathit{ ijkl}}w_{i\alpha }x_{j\beta }y_{k\gamma }z_{l\delta }. {}\end{array}$$
    (11)
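
In numpy, all three operations are conveniently expressed with einsum. The following lines are a minimal sketch with random data of arbitrary size; they are our illustration, not code from the cited references.

```python
import numpy as np

n = 3
A = np.random.rand(n, n, n, n)
W, X, Y, Z = (np.random.rand(n, n) for _ in range(4))

# Multilinear matrix multiplication, Eq. (11): c_abcd = sum_ijkl a_ijkl w_ia x_jb y_kc z_ld
C = np.einsum('ijkl,ia,jb,kc,ld->abcd', A, W, X, Y, Z)

# Quadrilinear functional, Eq. (12): vectors in place of matrices give a scalar
w, x, y, z = (np.random.rand(n) for _ in range(4))
val = np.einsum('ijkl,i,j,k,l->', A, w, x, y, z)

# Segre outer product w (x) x (x) y (x) z, the rank-1 building block of Eq. (10)
rank1 = np.einsum('i,j,k,l->ijkl', w, x, y, z)
```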

A different choice of bases \(\mathbf{b}_{1}^{{\prime}},\ldots,\mathbf{b}_{n}^{{\prime}}\) on \(\mathbb{V}\) would lead to a different hypermatrix representation \(\mathcal{B}\in \mathbb{R}^{n\times n\times n\times n}\) of elements in \(\mathbb{V} \otimes \mathbb{V} \otimes \mathbb{V} \otimes \mathbb{V}\) – where the two hypermatrix representations \(\mathcal{A}\) and \(\mathcal{B}\) would be related precisely by a multilinear matrix multiplication of the form

$$\displaystyle{\mathcal{A}\cdot (\mathbf{X},\mathbf{X},\mathbf{X},\mathbf{X}) = \mathcal{B}}$$

where X is the change-of-basis matrix, i.e., an invertible matrix with \(\mathbf{Xb}_{q} = \mathbf{b}_{q}^{{\prime}}\) for \(q = 1,\ldots,n\). Therefore, a tensor and a hypermatrix are different in the same way a linear operator and a matrix are different. Note that in the context of matrices,

$$\displaystyle{\mathbf{x} \otimes \mathbf{y} = \mathbf{xy}^{T}\quad \text{and}\quad \mathbf{A} \cdot (\mathbf{X},\mathbf{Y}) = \mathbf{Y}^{T}\mathbf{AX}.}$$

When r = 1 in Eq. (11), i.e., the matrices W, X, Y, Z are vectors w, x, y, z, we omit the ⋅ and write

$$\displaystyle{ \mathcal{A}(\mathbf{w},\mathbf{x},\mathbf{y},\mathbf{z}) =\sum \nolimits _{ i,j,k,l=1}^{n}a_{\mathit{ ijkl}}w_{i}x_{j}y_{k}z_{l} }$$
(12)

for the associated quadrilinear functional. Another special case occurs when one or more of the matrices W, X, Y, Z in Eq. (11) is the identity \(\mathbf{I} = \mathbf{I}_{n\times n}\). For example,

$$\displaystyle{ \mathcal{A}(\mathbf{I},\mathbf{x},\mathbf{y},\mathbf{z}) =\sum \nolimits _{ j,k,l=1}^{n}a_{\mathit{ ijkl}}x_{j}y_{k}z_{l} \in \mathbb{R}^{n}. }$$
(13)

In particular, the (partial) gradient of the quadrilinear functional \(\mathcal{A}(\mathbf{w},\mathbf{x},\mathbf{y},\mathbf{z})\) may be expressed as

$$\displaystyle{\nabla _{\mathbf{w}}\mathcal{A}(\mathbf{w},\mathbf{x},\mathbf{y},\mathbf{z}) = \mathcal{A}(\mathbf{I},\mathbf{x},\mathbf{y},\mathbf{z}),\quad \nabla _{\mathbf{x}}\mathcal{A}(\mathbf{w},\mathbf{x},\mathbf{y},\mathbf{z}) = \mathcal{A}(\mathbf{w},\mathbf{I},\mathbf{y},\mathbf{z}),\quad \text{etc.}}$$

For a symmetric 4-tensor \(\mathcal{S}\), we write \(\mathcal{S}\cdot \mathbf{x}\) as a shorthand for \(\mathcal{S}(\mathbf{x},\mathbf{I},\mathbf{I},\mathbf{I})\); the result is a 3-tensor. Repeating this operation \(\ell\) times is written \(\mathcal{S}\cdot ^{\ell}\mathbf{x}\). With this notation, the homogeneous quartic polynomial \(\mathcal{S}(\mathbf{x})\) that is uniquely associated with \(\mathcal{S}\) can be written as

$$\displaystyle{ \mathcal{S}(\mathbf{x}):= \mathcal{S}(\mathbf{x},\mathbf{x},\mathbf{x},\mathbf{x}) = \mathcal{S}\cdot ^{4}\mathbf{x} =\sum \nolimits _{ d_{1}+\ldots +d_{n}=4}\mu _{d_{1}\cdots d_{n}}\sigma _{d_{1}\cdots d_{n}}x_{1}^{d_{1} }x_{2}^{d_{2} }\cdots x_{n}^{d_{n} }. }$$
(14)

Similarly, the gradient of \(\mathcal{S}(\mathbf{x})\) can be conveniently expressed as \(\nabla \mathcal{S}(\mathbf{x}) = 4\mathcal{S}\cdot ^{3}\mathbf{x}\). The right-hand side of Eq. (14) is the more typical way of writing a homogeneous polynomial in terms of monomials, unique coefficients \(\sigma _{d_{1}\cdots d_{n}}\), and multiplicities \(\mu _{d_{1}\cdots d_{n}}:= \binom{4}{d_{1},\ldots,d_{n}}\). This is the higher-order equivalent of writing, for \(\mathbf{A} = \left [\begin{matrix}\scriptstyle a&\scriptstyle b \\ \scriptstyle b&\scriptstyle c\end{matrix}\right ]\) and \(\mathbf{x} = \left [\begin{matrix}\scriptstyle x_{1} \\ \scriptstyle x_{2}\end{matrix}\right ]\),

$$\displaystyle{\mathbf{A}(\mathbf{x}) = \mathbf{x}^{T}\mathbf{A}\mathbf{x} = ax_{ 1}^{2} + bx_{ 1}x_{2} + bx_{2}x_{1} + cx_{2}^{2} = ax_{ 1}^{2} + 2bx_{ 1}x_{2} + cx_{2}^{2}.}$$

The Frobenius norm or Hilbert-Schmidt norm of a tensor \(\mathcal{A}\) is defined by

$$\displaystyle{ \Vert \mathcal{A}\Vert _{F}^{2} =\sum \nolimits _{ i,j,k,l=1}^{n}a_{\mathit{ ijkl}}^{2}. }$$
(15)

This is by far the most popular choice of norms used for a tensor since it is readily computable and also because it is induced by an inner product

$$\displaystyle{ \langle \mathcal{A},\mathcal{B}\rangle =\sum \nolimits _{ i,j,k,l=1}^{n}a_{\mathit{ ijkl}}b_{\mathit{ijkl}} }$$
(16)

that generalizes the trace inner product. For symmetric p-tensors \(\mathcal{S},\mathcal{T}\) expressed in monomial form as in Eq. (14), this inner product may be written in the form

$$\displaystyle{\langle \mathcal{S},\mathcal{T}\rangle:=\sum \nolimits _{d_{1}+\ldots +d_{n}=p}\mu _{d_{1}\cdots d_{n}}\sigma _{d_{1}\cdots d_{n}}\tau _{d_{1}\cdots d_{n}}}$$

and is often called the apolar inner product in invariant theory. For any \(\mathbf{v} \in \mathbb{R}^{n}\), the apolar inner product of a symmetric tensor \(\mathcal{S}\) and the rank-1 symmetric tensor \(\mathbf{v}^{\otimes p}:= \mathbf{v} \otimes \ldots \otimes \mathbf{v}\) (p times) reproduces the value of the associated homogeneous form,

$$\displaystyle{\langle \mathcal{S},\mathbf{v}^{\otimes p}\rangle = \mathcal{S}(\mathbf{v}),}$$

which makes the set of symmetric p-tensors into a reproducing kernel Hilbert space.
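
The reproducing property is easy to check numerically. The sketch below symmetrizes a random 4-hypermatrix over all 24 index permutations and verifies \(\langle \mathcal{S},\mathbf{v}^{\otimes 4}\rangle = \mathcal{S}(\mathbf{v})\) with the inner product of Eq. (16); it is an illustration under our own naming, not library code.

```python
import numpy as np
from itertools import permutations

n = 3
A = np.random.rand(n, n, n, n)
# Symmetrize over all 24 index permutations to obtain an element of S^4(R^n)
S = sum(np.transpose(A, p) for p in permutations(range(4))) / 24.0

v = np.random.rand(n)
v4 = np.einsum('i,j,k,l->ijkl', v, v, v, v)       # rank-1 symmetric tensor v^{(x)4}
lhs = np.einsum('ijkl,ijkl->', S, v4)             # apolar/Frobenius inner product, Eq. (16)
rhs = np.einsum('ijkl,i,j,k,l->', S, v, v, v, v)  # S(v) = S . ^4 v, Eq. (14)
assert np.isclose(lhs, rhs)
```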

3.3 Homogeneous Polynomials and Spherical Harmonics

By restricting Eq. (14) to the 3D unit sphere S², \(\mathbf{x} = [\sin \theta \cos \phi,\sin \theta \sin \phi,\cos \theta ]^{T}\), every symmetric tensor \(\mathcal{S}\) defines a real-valued homogeneous polynomial function on S². Spherical harmonics (SH) are an alternate basis for describing functions on the sphere. The SHs form a complete orthonormal basis of complex-valued square-integrable functions on the unit sphere. Spherical functions can therefore be naturally expanded in the infinite SH basis, or approximated to any accuracy by a truncated series. Since the diffusion signal is real and antipodally symmetric, a modified real and symmetric SH basis is chosen in dMRI. Therefore, S can be written as

$$\displaystyle{ S(\theta,\phi ) =\sum \nolimits _{ j=1}^{M^{{\prime}} }c_{j}Y _{j}(\theta,\phi ), }$$
(17)

where θ ∈ [0, π], ϕ ∈ [0, 2π) and c j are the coefficients describing S in the modified SH basis [29]

$$\displaystyle{ Y _{j}(\theta,\phi ) = \left \{\begin{array}{@{}l@{\quad }l@{}} \sqrt{2}\text{Re}(Y _{l}^{\vert m\vert }(\theta,\phi )) \quad &\text{if }m < 0, \\ Y _{l}^{m}(\theta,\phi ) \quad &\text{if }m = 0, \\ (-1)^{m+1}\sqrt{2}\text{Im}(Y _{l}^{m}(\theta,\phi ))\quad &\text{if }m > 0, \end{array} \right. }$$
(18)

with Y l m(θ, ϕ) the rank l and degree m regular complex spherical harmonic:

$$\displaystyle{ Y _{l}^{m}(\theta,\phi ) = \sqrt{\frac{(2l + 1)(l - m)!} {4\pi (l + m)!}} P_{l}^{m}(\cos \theta )e^{im\phi },\quad \vert m\vert \leq l. }$$
(19)

In [28, 86] it was shown that the tensor basis and the SH basis are bijective via a linear transformation when the rank l of the truncated SH basis equals the order k of the symmetric tensor. This can be understood from the spherical harmonic transform of the polynomial representation of S:

$$\displaystyle{ c_{j} =\sum \nolimits _{ i=1}^{M^{{\prime}}=M }\mu _{i}\sigma _{i}\int _{S^{2}}x_{1}^{\alpha _{i} }x_{2}^{\beta _{i} }x_{3}^{l-\alpha _{i}-\beta _{i} }Y _{j}(\theta,\phi )\,d\varOmega, }$$
(20)

where the new indexing of μ and σ assumes an arbitrary ordering of the \(\mu _{d_{1}\cdots d_{n}}\) and \(\sigma _{d_{1}\cdots d_{n}}\) from Eq. (14). Since the integral does not depend on the tensor coefficients \(\sigma _{i}\), Eq. (20) can be seen as a dot product between the vector of unique tensor coefficients and the vector of spherical harmonic transforms of the M monomials \(x_{1}^{\alpha _{i}}x_{2}^{\beta _{i}}x_{3}^{l-\alpha _{i}-\beta _{i}}\). In other words, computing the M SH coefficients can be written as a matrix-vector multiplication

$$\displaystyle{ \mathbf{c} = \mathbf{Ms}, }$$
(21)

where \(\mathbf{c} = [c_{1},c_{2},\ldots,c_{M}]^{T}\), \(\mathbf{s} = [\sigma _{1},\sigma _{2},\ldots,\sigma _{M}]^{T}\), and:

$$\displaystyle{ \mathbf{M} = \left [\begin{array}{*{10}c} \mu _{1}\int _{S_{2}}x_{1}^{\alpha _{1}}x_{2}^{\beta _{1}}x_{3}^{l-\alpha _{1}-\beta _{1}}Y _{1}d\varOmega &\ldots & \mu _{M}\int _{S_{ 2}}x_{1}^{\alpha _{M}}x_{ 2}^{\beta _{M}}x_{ 3}^{l-\alpha _{M}-\beta _{M}}Y _{ 1}d\varOmega \\ \vdots &\ddots & \vdots \\ \mu _{1}\int _{S_{2}}x_{1}^{\alpha _{1}}x_{2}^{\beta _{1}}x_{3}^{l-\alpha _{1}-\beta _{1}}Y _{M}d\varOmega &\ldots &\mu _{M}\int _{S_{ 2}}x_{1}^{\alpha _{M}}x_{ 2}^{\beta _{M}}x_{ 3}^{l-\alpha _{M}-\beta _{M}}Y _{ M}d\varOmega \end{array} \right ]. }$$
(22)
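
One possible numerical route to M is to approximate the integrals in Eq. (22) with a simple quadrature on a longitude/latitude grid, as sketched below. The helper names and grid resolutions are our choices; note also that scipy's sph_harm takes the azimuthal angle first and includes the Condon-Shortley phase, so its conventions must be reconciled with Eq. (19).

```python
import numpy as np
from math import factorial
from scipy.special import sph_harm   # complex Y_l^m; scipy argument order: (m, l, azimuth, polar)

def real_sh(l, m, theta, phi):
    """Modified real, symmetric SH basis of Eq. (18); theta polar, phi azimuthal."""
    Y = sph_harm(abs(m), l, phi, theta)
    if m < 0:
        return np.sqrt(2.0) * Y.real
    if m == 0:
        return Y.real
    return (-1.0)**(m + 1) * np.sqrt(2.0) * Y.imag

def build_M(order, n_theta=64, n_phi=128):
    theta, phi = np.meshgrid(
        (np.arange(n_theta) + 0.5) * np.pi / n_theta,
        (np.arange(n_phi) + 0.5) * 2.0 * np.pi / n_phi, indexing='ij')
    w = np.sin(theta) * (np.pi / n_theta) * (2.0 * np.pi / n_phi)   # surface element dOmega
    x1 = np.sin(theta) * np.cos(phi)
    x2 = np.sin(theta) * np.sin(phi)
    x3 = np.cos(theta)
    # Monomial exponents (alpha, beta, order-alpha-beta) and multiplicities mu, cf. Eq. (14)
    monos = [(a, b, order - a - b) for a in range(order + 1) for b in range(order + 1 - a)]
    mus = [factorial(order) // (factorial(a) * factorial(b) * factorial(c))
           for a, b, c in monos]
    lm = [(l, m) for l in range(0, order + 1, 2) for m in range(-l, l + 1)]
    M = np.empty((len(lm), len(monos)))   # square: (order+1)(order+2)/2 on both sides
    for j, (l, m) in enumerate(lm):
        Y = real_sh(l, m, theta, phi)
        for i, ((a, b, c), mu) in enumerate(zip(monos, mus)):
            M[j, i] = mu * np.sum(x1**a * x2**b * x3**c * Y * w)
    return M   # so that c = M s, Eq. (21)
```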

3.4 Tensor Decompositions and Approximations

A tensor that can be expressed as an outer product of vectors is called decomposable and rank-1 if it is also nonzero. More generally, the rank of a tensor \(\mathcal{A} = (a_{\mathit{ijkl}})_{i,j,k,l=1}^{n} \in \mathbb{R}^{n\times n\times n\times n}\), denoted \(\text{rank}(\mathcal{A})\), is defined as the minimum r for which \(\mathcal{A}\) may be expressed as a sum of r rank-1 tensors [52, 53],

$$\displaystyle{ \text{rank}(\mathcal{A}):=\min {\Bigl \{ r\Bigm |\mathcal{A} =\sum \nolimits _{ q=1}^{r}\lambda _{ q}\,\mathbf{w}_{q} \otimes \mathbf{x}_{q} \otimes \mathbf{y}_{q} \otimes \mathbf{z}_{q}\Bigr \}} }$$
(23)

where the minimum is taken over all decompositions with \(\lambda _{q} \in \mathbb{R}\), \(\mathbf{w}_{q},\mathbf{x}_{q},\mathbf{y}_{q},\mathbf{z}_{q} \in \mathbb{R}^{n}\), \(q = 1,\ldots,r\). If \(\mathcal{S}\) is a symmetric tensor, then its symmetric rank [26] is

$$\displaystyle{ \text{srank}(\mathcal{S}):=\min {\Bigl \{ r\Bigm |\mathcal{S} =\sum \nolimits _{ q=1}^{r}\lambda _{ q}\,\mathbf{x}_{q} \otimes \mathbf{x}_{q} \otimes \mathbf{x}_{q} \otimes \mathbf{x}_{q}\Bigr \}}. }$$
(24)

We remark that it is not known whether the rank of a symmetric tensor is equal to its symmetric rank. The definition of rank in Eq. (23) agrees with matrix rank when applied to an order-2 tensor. In certain other literature, for example [90], the term ‘rank’ is used synonymously with what we called ‘order’ in the first paragraph of this section. For tensors of order greater than 2, rank becomes a more intricate notion than matrix rank with properties that may seem surprising at first encounter. We refer the readers to [31] (rank) and [26] (symmetric rank) for further information.

Best rank-r approximations

$$\displaystyle{ \text{argmin}_{\boldsymbol{\lambda }\in \mathbb{R}^{r},\;\mathbf{W},\mathbf{X},\mathbf{Y},\mathbf{Z}\in \mathbb{R}^{n\times r}}\left \Vert \mathcal{A}-\sum \nolimits _{q=1}^{r}\lambda _{ q}\,\mathbf{w}_{q} \otimes \mathbf{x}_{q} \otimes \mathbf{y}_{q} \otimes \mathbf{z}_{q}\right \Vert }$$
(25)

and the corresponding best symmetric rank-r approximation problem (i.e., when W = X = Y = Z) are used in practice (Sect. 5.4), but have no solution in general when r > 1. The easiest way to explain this is that the infimum of the objective function, taken over all \(\boldsymbol{\lambda }= (\lambda _{1},\ldots,\lambda _{r}) \in \mathbb{R}^{r}\) and \(\mathbf{W} = [\mathbf{w}_{1},\ldots,\mathbf{w}_{r}]\), \(\mathbf{X} = [\mathbf{x}_{1},\ldots,\mathbf{x}_{r}]\), \(\mathbf{Y} = [\mathbf{y}_{1},\ldots,\mathbf{y}_{r}]\), \(\mathbf{Z} = [\mathbf{z}_{1},\ldots,\mathbf{z}_{r}] \in \mathbb{R}^{n\times r}\), need not be attained. This happens regardless of symmetry and of the choice of norm in Eq. (25), and for any order p ≥ 3. In the unsymmetric case, it is known that the set of tensors of rank s > r that do not have a best rank-r approximation can form a set of positive volume. A particularly egregious case is \(\mathbb{R}^{2\times 2\times 2}\), where no rank-3 tensor has a best rank-2 approximation. Fortunately, there are special cases where the problem can be alleviated, notably: (i) when all coordinates of \(\mathcal{A}\) are nonnegative and \(\boldsymbol{\lambda },\,\mathbf{W},\mathbf{X},\mathbf{Y},\mathbf{Z} \geq 0\) [74]; (ii) when W, X, Y, Z satisfy a ‘coherence’ condition [75]; (iii) when p is even and \(\boldsymbol{\lambda }\geq 0\) [76]. Unlike cases (i) and (ii), case (iii) only applies to symmetric approximations.

3.5 Eigenvectors and Eigenvalues

The basic notions for eigenvalues of tensors were introduced independently by Lim [72] and Qi [92]. The usual eigenvalues and eigenvectors of a matrix \(\mathbf{A} \in \mathbb{R}^{n\times n}\) are the stationary values and stationary points of its Rayleigh quotient, and this point of view generalizes naturally to tensors of higher order. This gives, for example, an eigenvector of a tensor \(\mathcal{A} = (a_{\mathit{ijkl}})_{i,j,k,l=1}^{n} \in \mathbb{R}^{n\times n\times n\times n}\) as a nonzero column vector \(\mathbf{x} = [x_{1},\ldots,x_{n}]^{T} \in \mathbb{R}^{n}\) satisfying

$$\displaystyle{ \sum \nolimits _{i,j,k=1}^{n}a_{\mathit{ ijkl}}x_{i}x_{j}x_{k} =\lambda x_{l},\quad l = 1,\ldots,n, }$$
(26)

for some \(\lambda \in \mathbb{R}\), which is called an eigenvalue of \(\mathcal{A}\). Notice that if (λ, x) is an eigenpair, then so is \((t^{2}\lambda,t\,\mathbf{x})\) for any t ≠ 0; thus, eigenpairs are more naturally defined projectively. As in the matrix case, generic tensors over \(\mathbb{R}\) or \(\mathbb{C}\) have a finite number of eigenvalues and eigenvectors (up to this scaling equivalence), although their count is exponential in n. Still, it is possible for a tensor to have an infinite number of eigenvalues, but in that case they comprise a cofinite set of complex numbers. For an even-ordered symmetric tensor \(\mathcal{S}\in \mathsf{S}^{2p}(\mathbb{R}^{n})\), one has that \(\mathcal{S}\) is nonnegative definite, i.e., \(\mathcal{S}(\mathbf{x}) \geq 0\) for all \(\mathbf{x} \in \mathbb{R}^{n}\), if and only if all the eigenvalues of \(\mathcal{S}\) are nonnegative [92] – a generalization of a well-known fact for symmetric matrices.
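
For symmetric tensors, the stationarity interpretation suggests a fixed-point iteration, the symmetric higher-order power method. The sketch below seeks one eigenpair of a symmetric 4-tensor under the normalization \(\Vert \mathbf{x}\Vert = 1\); it is a simple illustration, not a robust solver, and the plain iteration can fail to converge for some tensors (shifted variants are more reliable).

```python
import numpy as np

def symmetric_power_iteration(S, iters=200, tol=1e-10, seed=0):
    """Seek (lambda, x) with S.^3 x = lambda x and ||x|| = 1, cf. Eq. (26)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(S.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = np.einsum('ijkl,i,j,k->l', S, x, x, x)    # left-hand side of Eq. (26)
        x_new = y / np.linalg.norm(y)
        if np.linalg.norm(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    lam = np.einsum('ijkl,i,j,k,l->', S, x, x, x, x)  # generalized Rayleigh quotient S(x)
    return lam, x
```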

It is worth noting that unlike in the matrix case, most tensor problems are NP-hard. This includes determining rank, best rank-1 approximation, spectral norm, eigenvalues, and eigenvectors [51]. However, the notion of NP-hardness is an asymptotic one that applies as n → ∞. Therefore, these hardness results do not preclude the existence of efficient algorithms for a fixed n, especially for small values such as n = 3, the case of greatest interest in diffusion MRI.

4 Fitting Higher-Order Tensor Models

4.1 Fitting Models of Apparent Diffusivity

One of the earliest models that attempted to overcome the limitations of second-order diffusion tensors used HOTs to account for diffusion with generalized angular profiles while preserving its monoexponential radial behavior [86]. Even-order Cartesian tensors were used to model the apparent diffusion coefficients (ADC) in the generalized Stejskal-Tanner equation, as described in Eq. (2).

The simplest method [86] for estimating such tensors, \(\mathcal{D}\), from the diffusion signal is to linearize the Stejskal-Tanner equation by taking the logarithm of Eq. (2). This leads to a system of linear equations, Ax = y, where the rows of the design matrix A contain the monomials of the homogeneous form \(D(\mathbf{u}) = \mathcal{D}^{(k)} \cdot ^{k}\mathbf{u}\), the vector y contains the log-normalized diffusion signal scaled by the acquisition parameter b, and the vector x contains the unknown coefficients of the tensor \(\mathcal{D}\). This system is overdetermined when the number of data acquisitions is greater than the number of unknown tensor coefficients and can be solved uniquely in the least squares sense by taking the Moore-Penrose pseudo-inverse of A.
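
A minimal sketch of this linearized fit is given below; the helper name and the dictionary output are our choices. It builds the design matrix from the unique monomials of an order-k homogeneous form, with multinomial multiplicities as in Eq. (14).

```python
import numpy as np
from math import factorial

def fit_hot_adc(signals, S0, b, dirs, order=4):
    """Least-squares fit of the order-k ADC tensor of Eq. (2), via log-linearization."""
    exps = [(d1, d2, order - d1 - d2) for d1 in range(order + 1)
            for d2 in range(order + 1 - d1)]                 # monomial exponents
    mu = np.array([factorial(order) / (factorial(d1) * factorial(d2) * factorial(d3))
                   for d1, d2, d3 in exps])                  # multiplicities, cf. Eq. (14)
    g = np.asarray(dirs, dtype=float)
    A = mu * np.column_stack([g[:, 0]**d1 * g[:, 1]**d2 * g[:, 2]**d3
                              for d1, d2, d3 in exps])       # design matrix
    y = -np.log(np.asarray(signals) / S0) / b                # log-normalized signal
    x, *_ = np.linalg.lstsq(A, y, rcond=None)                # pseudo-inverse solution
    return dict(zip(exps, x))                                # unique coefficients sigma
```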

Since diffusivity is a non-negative physical quantity, the homogeneous form D(u) cannot be negative for any u ∈ S 2. This leads to a positivity constraint that needs to be respected while estimating \(\mathcal{D}\). The least squares approach often violates this constraint for \(\mathcal{D}\) with high orders and when the acquisitions are noisy.

Descoteaux et al. [28] proposed a linear approach with angular regularization to account for noisy acquisitions. Leveraging the bijection between HOTs and SHs, see Eq. (21), they estimated the coefficients of \(\mathcal{D}\) by first estimating the coefficients in an SH basis of rank equal to the order of the tensor while applying Laplace-Beltrami smoothing on the sphere and then converting back to the tensor basis. This again leads to a linear system that is overdetermined when the number of acquisitions is larger than the number of tensor coefficients,

$$\displaystyle{ \mathbf{x} = \mathbf{M}^{-1}(\mathbf{B}^{T}\mathbf{B} +\lambda \mathbf{L})^{-1}\mathbf{B}^{T}\mathbf{y}, }$$
(27)

where x contains the unique tensor coefficients, y contains the log-normalized signal, B is the design matrix in the SH basis, and M represents the linear transformation matrix between the HOT basis and the SH basis. The matrix L is a diagonal matrix with entries \(l_{ii} =\ell_{ i}^{2}(\ell_{i} + 1)^{2}\), which represents the Laplace-Beltrami regularization of the SH \(Y _{\ell}^{m}\), and λ is the regularization weight. This becomes the least squares solution when λ = 0; with nonzero λ, L smooths higher-order terms more strongly, thereby damping the effects of noise in the higher orders.

Florack et al. used the same Laplace-Beltrami regularization on the sphere, but for tensors instead of SHs [34]. This was based on an infinite inhomogeneous tensor basis representation, much like the SHs, with the diffusion function modified to \(\tilde{D}(\mathbf{u}) =\sum _{ k=0}^{\infty }\mathcal{D}^{(k)} \cdot ^{k}\mathbf{u}\). It was shown that on the sphere, this representation was redundant, and when truncated to a finite order, it represented the same diffusion function as in Eq. (2). The relation between the homogeneous and inhomogeneous tensor representation has been addressed rigorously in [8]. The estimation process was specifically crafted such that higher order tensors only captured the residual information not available in lower order tensors. This resulted in a “canonical” tensorial representation where the span of a tensor of fixed order k formed a degenerate eigenspace for the Laplace-Beltrami operator with eigenvalue − k(k + 1), exactly like the SHs.

The problem of estimating \(\mathcal{D}\) with the positivity constraint was solved for order-4 tensors in two different ways. The homogeneous forms of symmetric order-4 tensors of dimension 3 are known as ternary quartics. Barmpoutis et al. [9, 10] and Ghosh et al. [43] use Hilbert’s theorem on positive semi-definite (psd) ternary quartics:

Theorem 1.

If P(x,y,z) is homogeneous, of degree 4, with real coefficients and P(x,y,z) ≥ 0 at every \((x,y,z) \in \mathbb{R}^{3}\) , then there are quadratic homogeneous polynomials f,g,h with real coefficients, such that \(P = f^{2} + g^{2} + h^{2}\) .

Therefore, estimating P(x, y, z) (or \(\mathcal{D}^{(4)}\)) by estimating f, g, h ensures \(\mathcal{D}^{(4)}\) to be psd. However, these quadratic polynomials can only be uniquely determined up to a 3D rotation and up to a sign. In other words, if the 6 coefficients of f, g, h each are written as column vectors \(\mathbf{w}_{f},\mathbf{w}_{g},\mathbf{w}_{h}\), respectively, and a 6 × 3 matrix \(\mathbf{W} = [\mathbf{w}_{f},\mathbf{w}_{g},\mathbf{w}_{h}]\) is constructed, then \(P(x,y,z) = \mathbf{v}^{T}\mathbf{WW}^{T}\mathbf{v}\), where \(\mathbf{v}^{T} = [x^{2},y^{2},z^{2},xy,xz,yz]\). Thus W, −W and WR for any 3 × 3 orthogonal matrix R result in the same P.

Initially, Barmpoutis et al. fixed R by choosing the rotation that renders A – the top 3 × 3 block of W – to a lower triangular matrix [10]. This was achieved by considering the QR-decomposition of A, but in practice A was taken to be lower triangular. This resulted in a reduction of unknown coefficients from 18 = 3 × 6 to 15, which is exactly the number of unique coefficients of \(\mathcal{D}^{(4)}\). In a later work [9], an Iwasawa decomposition of WW T was taken, which implied the Cholesky decomposition of A. This again resulted in A being rendered lower triangular – defining uniqueness over 3D rotations and again reducing the number of unknowns to 15. Furthermore, the Cholesky decomposition constrained the diagonal entries of A to be positive – defining uniqueness over the sign.

Ghosh et al. [43] estimated all 18 unknowns of W and reconstructed the 15 coefficients of \(\mathcal{D}^{(4)}\) from the Gram matrix \(\mathbf{WW}^{T}\). Although W cannot be estimated uniquely, the Gram matrix representing the homogeneous form P is unique, and the mapping from the coefficients of the Gram matrix to the coefficients of \(\mathcal{D}^{(4)}\) is unique. Therefore, the estimation of the tensor coefficients is unambiguous. While Barmpoutis et al. [9, 10] used a Levenberg-Marquardt optimization scheme, Ghosh et al. [43] preferred the Broyden-Fletcher-Goldfarb-Shanno (BFGS) scheme.
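
The nonnegativity guarantee of this parametrization is immediate, since \(\mathbf{v}^{T}\mathbf{WW}^{T}\mathbf{v} = \Vert \mathbf{W}^{T}\mathbf{v}\Vert ^{2}\). The short sketch below illustrates it with a random W (an arbitrary example, not an estimated tensor):

```python
import numpy as np

W = np.random.randn(6, 3)          # coefficient vectors of f, g, h as columns
G = W @ W.T                        # Gram matrix; P is psd by construction

def P(x, y, z):
    v = np.array([x*x, y*y, z*z, x*y, x*z, y*z])
    return v @ G @ v               # P(x,y,z) = v^T W W^T v = ||W^T v||^2 >= 0

u = np.random.randn(2000, 3)
u /= np.linalg.norm(u, axis=1, keepdims=True)   # sample points on the sphere
assert all(P(*p) >= 0.0 for p in u)
```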

Barmpoutis et al. [9, 10] further introduced an L2 distance measure between the homogeneous forms corresponding to the tensors evaluated on the unit sphere

$$\displaystyle{ \text{dist}(\mathcal{D}_{1}^{(4)},\mathcal{D}_{ 2}^{(4)})^{2} = \frac{1} {4\pi }\int _{S^{2}}[D_{1}(\mathbf{u}) - D_{2}(\mathbf{u})]^{2}d\mathbf{u}, }$$
(28)

which was computed analytically in terms of the difference of the coefficients of \(\mathcal{D}_{1}^{(4)}\) and \(\mathcal{D}_{2}^{(4)}\), and which was used for spatial regularization of the tensor field to account for noise.

A second way of estimating \(\mathcal{D}^{(4)}\) with the positivity constraint was proposed by Ghosh et al. [44]. In this approach, the 6 × 6 isometrically equivalent matrix representation [20] D of \(\mathcal{D}^{(4)}\) was used. Since D is symmetric and its positive definiteness ensures \(\mathcal{D}^{(4)}\) to be positive, the affine invariant Riemannian metric for the space of symmetric positive definite matrices [71] was used to estimate D via a Riemannian gradient descent. However, the symmetry of the tensor \(\mathcal{D}^{(4)}\) cannot be entirely captured by D, which has 21 unique coefficients. Therefore, a final symmetrizing step was used to recover a positive and symmetric tensor \(\mathcal{D}^{(4)}\).

The problem of estimating an arbitrary even-order HOT, \(\mathcal{D}^{(2k)}\), with the positivity constraint was also solved in two different ways. Barmpoutis et al. [13] used a result that states that for any even degree 2k, a (homogeneous) polynomial positive on the unit sphere can be written as a sum of squares of polynomials p of degree k on the unit sphere, \(D(\mathbf{u}) = \mathcal{D}^{(2k)} \cdot ^{(2k)}\mathbf{u} =\sum _{ j=1}^{R}\lambda _{j}p^{(k)}(\mathbf{u},\mathbf{c}_{j})^{2}\), where the λ j are all positive and the c j are the coefficient vectors of the polynomials p j with \(\Vert \mathbf{c}_{j}\Vert = 1\). However, since R, the number of polynomials in the sum, cannot be determined, they reformulated the problem as a spherical convolution problem \(D(\mathbf{u}) =\int _{S^{\#\mathbf{c}-1}}\lambda (\mathbf{c})p^{(k)}(\mathbf{u};\mathbf{c})^{2}d\mathbf{c}\), where the unit sphere \(S^{\#\mathbf{c}-1}\) is embedded in \(\mathbb{R}^{\#\mathbf{c}}\), with # c being the number of elements in c. The convolution was solved numerically by discretizing \(S^{\#\mathbf{c}-1}\) finely, and \(\mathcal{D}^{(2k)}\) was estimated by solving the least squares problem for the unknowns λ j

$$\displaystyle{ E =\sum \nolimits _{ i=1}^{N}\left (S_{ i}/S_{0} -\mathrm{ e}^{-b\sum _{j=1}^{r}\lambda _{ j}p(\mathbf{g}_{i};\mathbf{c}_{j})^{2} }\right )^{2} }$$
(29)

using non-negative least squares (NNLS) to ensure that all \(\lambda _{j} \geq 0\). Discretizing the convolution essentially overestimates R by r, while the NNLS tends to compute a sparse solution, ensuring that Eq. (29) does not overfit the signal.

A second method for estimating even-order psd HOTs, based on convex optimization, was proposed by Qi et al. [95]. It was shown that the set of order-2k psd HOTs, \(\mathcal{D}\), forms a closed convex cone \(\mathcal{C}\) in \(\mathbb{R}^{n}\), where \(\mathcal{D}\) has n unique coefficients and can be represented by \(\mathbf{x} \in \mathbb{R}^{n}\). Furthermore, the psd-constrained least squares estimation was shown to be convex and quadratic, with a unique minimizer \(\mathbf{x}^{{\ast}} \in \mathcal{C}\) such that if the unconstrained solution \(\overline{\mathbf{x}} \in \mathbb{R}^{n}\setminus \mathcal{C}\), then \(\mathbf{x}^{{\ast}}\in \partial \mathcal{C}\), the boundary of \(\mathcal{C}\). The explicit psd constraint on \(\mathcal{D}\) was formulated as \(\lambda _{\min }(\mathcal{D}) \geq 0\), where \(\lambda _{\min }(\mathcal{D})\), the minimum Z-eigenvalue of \(\mathcal{D}\), was shown to be computationally tractable. The psd HOT \(\mathcal{D}_{\mathbf{x}^{{\ast}}}\) (corresponding to \(\mathbf{x}^{{\ast}}\)) was estimated by first checking the psd-ness of the unconstrained HOT \(\mathcal{D}_{\mathbf{\overline{x}}}\). If \(\lambda _{\min }(\mathcal{D}_{\mathbf{\overline{x}}}) \geq 0\), then by uniqueness \(\mathcal{D}_{\mathbf{x}^{{\ast}}} = \mathcal{D}_{\mathbf{\overline{x}}}\). However, if \(\mathbf{\overline{x}}\notin \mathcal{C}\), then \(\mathcal{D}_{\mathbf{x}^{{\ast}}}\) was estimated by solving the non-differentiable, non-convex optimization problem \(L(\mathbf{x}) =\min \{ \vert \mathbf{Ax} -\mathbf{y}\vert ^{2}:\lambda _{\min }(\mathcal{D}_{\mathbf{x}}) = 0\}\), with only an equality constraint, by a subgradient descent approach. In theory, \(\mathcal{D}_{\mathbf{x}^{{\ast}}}\) could also be estimated by solving the psd-constrained non-differentiable convex least squares problem.

Alternatively, Barmpoutis et al. [11] used even ordered HOTs to model the logarithm of the diffusivities. This preserved the monoexponential radial diffusion but considered the exponential of the tensor for the angular diffusion \(D(\mathbf{u}) =\exp (\mathcal{D}^{(k)} \cdot ^{k}\mathbf{u})\) (in Eq. (2)). This automatically ensured positive diffusion without having to impose any constraints. The approach was inspired by the Log-Euclidean metric for DTI [3].

4.2 Fitting Models of Apparent Diffusional Kurtosis

Fitting the coefficients of the diffusion tensor D and kurtosis tensor \(\mathcal{W}\) in Eq. (4) is simplified by initially considering each gradient direction separately, and finding the parameters of the corresponding one-dimensional diffusion process,

$$\displaystyle{ S(b) = S_{0}\mathrm{e}^{-bd+\frac{1} {6} (bd)^{2}K }, }$$
(30)

where d and K are apparent diffusion and kurtosis coefficients, respectively. Estimating these two variables requires measurements S(b) with at least two non-zero b-values, in addition to the baseline \(S_{0}\) measurement. After taking the logarithm on both sides of Eq. (30), this leads to a system of equations that is quadratic in d, and can thus no longer be solved with a linear least squares estimator. Instead, gradient-based iterative Levenberg-Marquardt optimization has been employed [61].
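
As an illustration, the per-direction fit of Eq. (30) can be written with scipy's Levenberg-Marquardt wrapper as below; the b-values and signals are synthetic placeholders (b given in ms/μm², i.e., units of 1000 s/mm²), and a real implementation would also handle the noise floor as in Eq. (31).

```python
import numpy as np
from scipy.optimize import curve_fit

def dki_signal(b, d, K, S0=1.0):
    """One-dimensional DKI signal model, Eq. (30)."""
    return S0 * np.exp(-b * d + (b * d)**2 * K / 6.0)

# Synthetic example: d = 1.0 um^2/ms, K = 0.8, small additive noise
b = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
rng = np.random.default_rng(0)
S = dki_signal(b, 1.0, 0.8) + 0.005 * rng.standard_normal(b.size)

# Levenberg-Marquardt fit of (d, K); S0 is kept fixed at its default here
(d_hat, K_hat), _ = curve_fit(dki_signal, b, S, p0=(0.7, 0.0), method='lm')
```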

Assuming a Gaussian noise model results in a positive bias in the estimated kurtosis values, which can be removed by finding the maximum likelihood fit under a Rician noise model [111] or, more easily, by accounting for the noise-induced bias in the measurements themselves [61, 82]. This is done by adding an estimate η of the background noise to the signal model in Eq. (30),

$$\displaystyle{ S(b) = \sqrt{\eta ^{2 } + \left (S_{0 } \mathrm{e}^{-bd+\frac{1} {6} (bd)^{2}K }\right )^{2}}. }$$
(31)

After finding parameters \(d_{i}\) and \(K_{i}\) for each individual gradient direction i, a second-order diffusion tensor D can be fit linearly to the \(d_{i}\). Given this estimate of D, the fourth-order kurtosis tensor \(\mathcal{W}\) can then be fit linearly using Eq. (4) [69, 82].

Kurtosis is a dimensionless quantity and can, in theory, take on any value K ≥ −2. However, the kurtosis of a system that contains noninteracting Gaussian compartments with different diffusivities is always non-negative, and empirical results suggest non-negative kurtosis in human brain tissue [61]. Similarly, an upper bound on kurtosis, \(K_{i} \leq 3/(b_{\text{max}}d_{i})\), where \(b_{\text{max}}\) is the largest b-value used in the measurements, is implied by the empirical observation that in practice, the signal S(b) is a monotonically decreasing function of b. These two constraints have been enforced as part of the fitting, using quadratic programming or heuristic thresholding [105]. Other authors have chosen to merely enforce the lower bound \(K \geq -3/7\), which corresponds to the kurtosis of water confined to equally-sized spherical pores, by a sum-of-squares parametrization of the homogeneous polynomial represented by \(\mathcal{W}\) [16]. Additional regularization has been employed to penalize extrema in the homogeneous form that fall outside the range of the measured kurtosis values [65].

4.3 Fitting Deconvolution-Based Models

Spherical deconvolution models the diffusion-weighted signal S(u) in different gradient directions u as the convolution of a fiber orientation density function (fODF) F with a response function R, which describes the signal attenuation caused by a single nerve fiber bundle and is assumed to be cylindrically symmetric:

$$\displaystyle{ S(\mathbf{u}) =\iint _{\|\mathbf{v}\|=1}F(\mathbf{v})\;R(\mathbf{v} \cdot \mathbf{u})\;d\mathbf{v} }$$
(32)

Based on Eq. (32), deconvolution can be used to estimate the fiber ODF F from the measurements S. Deconvolution is done most easily in the spherical harmonics basis, where it amounts to simple scalar division. However, constructing a spherical harmonics representation of the deconvolution kernel R requires two choices: besides estimating the response of a single fiber compartment from the data [107] or deriving it from an analytical fiber model [30, 101], one must decide how the single fiber compartment should be represented after deconvolution [106].

Even though the delta distribution may seem like an obvious choice, it requires an infinite number of coefficients in the spherical harmonics basis. Therefore, Tournier et al. [106] approximate the delta peak, resulting in non-trivial interactions between peaks of non-orthogonal fiber compartments and leading to systematic errors when taking ODF maxima as estimates of fiber directions, even when no measurement noise is present. Schultz and Seidel [100] have removed this problem by instead modeling single fiber peaks as rank-1 tensors, and performing a low-rank approximation of the resulting order-p fODF tensor \(\mathcal{F}\),

$$\displaystyle{ \mathop{\text{argmin}}\limits_{\lambda _{i},\mathbf{v}_{i}}\;\left \|\mathcal{F}-\sum \nolimits _{i=1}^{r}\lambda _{ i}\mathbf{v}_{i}^{\otimes p}\right \|_{ F}, }$$
(33)

where v i describe the per-compartment principal directions, and λ i are proportional to their volume fractions. The approximation rank r corresponds to the number of discrete fiber compartments; one way to estimate it is by learning from simulated training data via support vector regression [97].
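
As a rough stand-in for Eq. (33), the sketch below builds a low-rank symmetric approximation by greedy rank-1 deflation, reusing a power iteration per term. We stress that this is only a simplification for illustration: the method of [100] optimizes all terms jointly, and greedy deflation is not guaranteed to find the best rank-r approximation (which, per Sect. 3.4, may not even exist without positivity constraints).

```python
import numpy as np

def rank1_term(T, iters=100, seed=0):
    """One symmetric rank-1 term (lambda, v) of a symmetric 4-tensor via power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = np.einsum('ijkl,i,j,k->l', T, v, v, v)
        v = w / np.linalg.norm(w)
    lam = np.einsum('ijkl,i,j,k,l->', T, v, v, v, v)
    return lam, v

def greedy_low_rank(F, r):
    """Greedy deflation: subtract one rank-1 term at a time from the fODF tensor F."""
    terms, R = [], F.copy()
    for q in range(r):
        lam, v = rank1_term(R, seed=q)
        terms.append((lam, v))
        R = R - lam * np.einsum('i,j,k,l->ijkl', v, v, v, v)
    return terms   # [(lambda_i, v_i)], cf. Eq. (33)
```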

This tensor-based variant of spherical deconvolution uses the linear bijection between spherical harmonics and polynomial bases (cf. Sect. 3.3) twice: First, to map a rank-1 tensor of the same order as the desired fODF tensor \(\mathcal{F}\) to the spherical harmonics basis, which is required to find the correct kernel R for use with that tensor order. Second, to transform the deconvolution result F, obtained in the spherical harmonics basis, back into its tensor representation \(\mathcal{F}\).

Since compartments cannot have negative weights, valid fODF tensors should permit a positive decomposition into rank-1 terms. For tensor order k > 2, this is a stronger requirement than non-negativity of the homogeneous form, which is a more natural constraint for models of apparent diffusivity (Sect. 4.1). It can be enforced by computing an approximation with the generic number of rank-1 terms and non-negative weights [98].

Similar to a previous approach of Barmpoutis et al. [13], Weldeselassie et al. [113] enforce non-negativity of F by parametrizing the homogeneous form of \(\mathcal{F}\) (with even order k) as a sum of squares of polynomials of order k∕2. Rather than performing the deconvolution in spherical harmonics, they discretize the fODF, so that it can be found as the non-negative least squares solution of a linear system.

4.4 Fitting Other Types of Models

When fitting the higher-order diffusion model described by Eq. (3) [77, 78], we only consider tensors of even order, as was argued in Sect. 2. By taking the logarithm and truncating after order 2n, the equation can be rewritten in the form

$$\displaystyle{ \text{Re}[\log (S(\mathcal{B})/S_{0})] =\sum \nolimits _{ k=1}^{n}(-1)^{k}{\bigl \langle\mathcal{B}^{(2k)},\mathcal{D}^{(2k)}\bigr \rangle}, }$$

where Re denotes the real part of the logarithmic signal, and the inner product between the tensors \(\mathcal{B}^{(2k)}\) and \(\mathcal{D}^{(2k)}\) is defined in Eq. (16). The tensors \(\mathcal{D}^{(2k)}\) can be estimated by considering measurements with different gradient strengths and directions, which lead to different \(\mathcal{B}_{i}^{(2k)}\), and truncating the tensor series at the desired order. If we have m measurements, we obtain m equations of the above form, linear in the coefficients of \(\mathcal{D}^{(2k)}\). These can be combined in a matrix equation

$$\displaystyle{ \mathbf{y}(\log (\vert S_{i}(\mathcal{B}_{i})\vert /\vert S_{0}\vert )) = \mathbf{B}(\mathcal{B}_{i}^{(2k)})\,\mathbf{x}(\mathcal{D}^{(2k)}), }$$
(34)

where \(i = 1,\ldots,m\). In practice, the modulus | ⋅ | rather than the real part of the complex signal is used, since the phase is unreliable. The vector x, which contains the coefficients of \(\mathcal{D}^{(2k)}\), can be estimated by solving Eq. (34) in the least squares sense.

Higher-order tensors representing q-ball ODFs (see Sect. 2) can also be fitted to HARDI data. An analytical solution for the q-ball ODF is given by Anderson [2], Hess et al. [50], and Descoteaux et al. [29]

$$\displaystyle{ \psi _{\text{q-ball}}(\mathbf{u}) =\sum \nolimits _{ i=1}^{N}2\pi P_{ l_{i}}(0)\,c_{i}\,Y _{i}(\mathbf{u}) }$$
(35)

where u is a unit norm vector, \(P_{l_{i}}\) is the Legendre polynomial of degree \(l_{i}\), \(\{Y _{i}\}_{i=1}^{N}\) is a modified SH basis as in Eq. (18), and \(c_{i}\) are the harmonic coefficients of the MR signal. A tensor representation of \(\psi _{\text{q-ball}}\) can be obtained from the bijection between SHs and tensors. Alternatively, it can be reconstructed directly in a tensor basis [34]

$$\displaystyle{ \psi _{\text{q-ball}}(\mathbf{u}) =\sum \nolimits _{ k=0}^{n}2\pi P_{ l_{k}}(0)\mathcal{S}_{k} \cdot ^{k}\mathbf{u} }$$
(36)

where n is the maximum order of a series of tensors \(\mathcal{S}_{k}\) fitted to the diffusion signal such that higher orders only encode the fitting residuals from lower orders.
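
In the SH basis, the analytical reconstruction of Eq. (35) reduces to a per-coefficient scaling, as the short sketch below shows; the helper that lists the SH ranks \(l_{i}\) of a modified basis of even rank is our own convenience.

```python
import numpy as np
from scipy.special import eval_legendre

def sh_ranks(max_l):
    """Rank l_i of each basis function in a modified SH basis of even rank max_l."""
    return np.array([l for l in range(0, max_l + 1, 2) for _ in range(2 * l + 1)])

def qball_odf_coeffs(signal_sh, max_l):
    """Scale the signal SH coefficients c_i by 2*pi*P_{l_i}(0), per Eq. (35)."""
    l = sh_ranks(max_l)
    return 2.0 * np.pi * eval_legendre(l, 0.0) * np.asarray(signal_sh, dtype=float)
```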

5 Processing Higher-Order Tensors in Diffusion MRI

5.1 Computing Rotationally Invariant Scalar Measures

It is desirable to extract meaningful scalars from the estimated higher-order tensors. In particular, rotationally invariant quantities are preferable. These are independent of the coordinate system and thus intrinsic features of the tensor.

5.1.1 Higher-Order Diffusion Tensors

Rotationally invariant measures of diffusivity and anisotropy based on higher-order diffusion tensors have been proposed in [89]. The mean diffusivity is defined as:

$$\displaystyle{ \langle D\rangle = \frac{1} {4\pi }\iint _{S^{2}}D(\mathbf{u})\;d\mathbf{u} }$$
(37)

where u is a unit direction vector and D(u) are the diffusivities as in Eq. (2). The generalized anisotropy (GA) and scaled entropy (SE) are given by

$$\displaystyle{ \mathrm{GA} = 1 - \frac{1} {1 + (250V)^{\varepsilon (V)}}\quad \text{and}\quad \mathrm{SE} = 1 - \frac{1} {1 + (60(\ln 3-\eta ))^{\varepsilon (\ln 3-\eta )}}, }$$
(38)

where \(\varepsilon (\gamma ) = 1 + 1/(1 + 5000\gamma )\), and V and η are the variance and entropy of the normalized diffusivities \(D(\mathbf{u})/(3\langle D\rangle )\). The definition of these measures does not rely on any specific tensor order. In addition, GA and SE are scaled between 0 and 1. Note that these measures can also be calculated from other functions defined on the unit sphere, such as orientation distribution functions.
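A minimal numerical sketch of these definitions, assuming diffusivities D(u) sampled approximately uniformly on the sphere. The entropy convention used below (the one that yields η = ln 3 for isotropic diffusion, so that SE = 0) is an assumption of this sketch.

```python
import numpy as np

def ga_se(D_samples):
    """Generalized anisotropy (GA) and scaled entropy (SE), Eq. (38),
    from diffusivities D(u) sampled (roughly) uniformly on the sphere.
    Entropy convention: spherical mean of -3*Dn*ln(Dn), which equals
    ln(3) for isotropic diffusion (an assumption of this sketch)."""
    D = np.asarray(D_samples, dtype=float)
    md = D.mean()                      # mean diffusivity <D>, Eq. (37)
    Dn = D / (3.0 * md)                # normalized diffusivities
    V = Dn.var()                       # variance of normalized diffusivities
    eta = -3.0 * np.mean(Dn * np.log(Dn))
    eps = lambda g: 1.0 + 1.0 / (1.0 + 5000.0 * g)
    ga = 1.0 - 1.0 / (1.0 + (250.0 * V) ** eps(V))
    d = max(np.log(3.0) - eta, 0.0)    # guard against round-off below 0
    se = 1.0 - 1.0 / (1.0 + (60.0 * d) ** eps(d))
    return ga, se
```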

For simulated data modeling two and three fibers, GA and SE values differ clearly between second-order and higher-order (4, 6, and 8) tensors, with the values from higher-order tensors being significantly higher [27, 28, 84, 89]. GA and SE have also been reported to be slightly higher for sixth-order tensors than for fourth-order ones [27]. For data simulating a single fiber, on the other hand, GA and SE are independent of the tensor order. This is also the case for the mean diffusivity [89].

GA and SE for real HARDI data of healthy subjects have been studied in [27, 83, 84]. Fourth- and sixth-order tensors have been shown to yield increased values for both measures, especially for SE, compared to second-order tensors. This effect is observed in areas with intra-voxel orientational heterogeneity, but also in some regions with coherent axonal orientation. On the other hand, GA and SE become more sensitive to noise with increasing tensor order [27].

The variance of fourth-order covariance tensors has also been investigated for DTI data of glioblastoma patients [32]. The results indicate a better contrast between tumor subregions than FA provides.

5.1.2 Diffusional Kurtosis Tensors

A number of rotationally invariant scalar measures based on fourth-order kurtosis tensors have been proposed. Different definitions of mean kurtosis (also referred to as average AKC), kurtosis anisotropy, radial and axial kurtoses can be found in the literature. Some of them are related to certain eigenvalues of the kurtosis tensor, which we discuss later in this section. These measures are summarized in Tables 1 and 2. It is clear that they are rotationally invariant, since both the AKC and eigenvalues involved in their definition are rotationally invariant.

Table 1 Mean kurtosis and kurtosis anisotropy. β: D-eigenvalue of \(\mathcal{W}\); ν: number of D-eigenvalues; N: total number of diffusion measurement directions; \(K_{\text{app}}\): AKC in a particular direction as in Eq. (5); \(\mathbf{e}_{i}\) (i = 1, 2, 3): eigenvectors of the diffusion tensor \(\mathcal{D}\); \(K_{\text{app}}(\mathbf{e}_{i}) = (\text{MD}^{2}/\lambda _{i}^{2})\cdot \hat{\mathcal{W}}_{iiii}\); \(\hat{\mathcal{W}}\): kurtosis tensor in the basis \(\{\mathbf{e}_{i}\}\); \(\bar{K} = (1/3)(K_{\text{app}}(\mathbf{e}_{1}) + K_{\text{app}}(\mathbf{e}_{2}) + K_{\text{app}}(\mathbf{e}_{3}))\)

Note that the first two definitions of kurtosis anisotropy in Table 1 are completely analogous to the DTI case, but based on the kurtosis tensor D-eigenvalues and on AKC values along the diffusion tensor eigenvectors, respectively. Like FA, \(\text{FA}_{K}\) takes values \(0 \leq \text{FA}_{K} \leq 1\), except for the definition in [91].

Table 2 Axial and radial kurtoses. \(\mathbf{e}_{\phi } = (0,\cos \phi,\sin \phi )\) in the basis \(\{\mathbf{e}_{i}\}\)

Some of these measures have been evaluated for in vivo and ex vivo rat brain DKI, and compared to their DTI analogues [56]. Mean and radial kurtoses showed strong contrast between GM and WM both in vivo and ex vivo. In particular, radial kurtosis performed better than all other directional diffusivities and kurtoses. For axial kurtosis, a stronger contrast was observed under ex vivo conditions. Kurtosis anisotropy, on the other hand, behaved similarly to FA both in vivo and ex vivo.

Mean kurtosis and kurtosis anisotropy have also been computed by an adaptive spherical integral, and compared to those based on D-eigenvalues for real diffusion data of a healthy subject and a stroke patient [81]; the D-eigenvalue-based measures were found to be more sensitive to noise. Exact expressions for mean and radial kurtoses can be obtained [105]. These have been shown, together with axial kurtosis, on DKI scans of healthy subjects [60, 105]. The optimization of the diffusion gradient settings for estimating mean and radial kurtosis and kurtosis anisotropy has been studied as well, and has been shown to increase precision considerably [91].

D-eigenvalues of the fourth-order kurtosis tensor \(\mathcal{W}\) are defined by Qi et al. [94]

$$\displaystyle{ \mathcal{W}\cdot ^{3}\mathbf{x} =\beta \; \mathbf{D}\mathbf{x};\;\;\mathbf{x}^{\mathrm{T}}\mathbf{D}\mathbf{x} = 1, }$$
(39)

where x is the D-eigenvector associated with the D-eigenvalue β. D-eigenvalues have been shown to be rotationally invariant [94]. The largest and smallest D-eigenvalues can be used to compute the largest and smallest AKC values as \((\text{MD})^{2}\beta _{\text{max}}\) and \((\text{MD})^{2}\beta _{\text{min}}\). Another type of eigenvalue that has been studied in this context is the Kelvin eigenvalue of the kurtosis tensor, which is also rotationally invariant. A three-dimensional symmetric fourth-order tensor can be mapped to a six-dimensional second-order tensor. The eigenvalues \((\eta _{1},\ldots,\eta _{6})\) of its matrix representation, a symmetric 6 × 6 matrix, are the Kelvin eigenvalues of the considered fourth-order tensor. It has been shown that the largest and smallest Kelvin eigenvalues of (a scaled version of) the kurtosis tensor \(\hat{\mathcal{W}}\) are, respectively, an upper and a lower bound of the largest and smallest AKC values [93]. The interpretation of Kelvin eigenvalues in terms of AKC values is thus less clear than for D-eigenvalues.
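The mapping to the 6 × 6 matrix representation can be made concrete as follows; the \(\sqrt{2}\) weights on the off-diagonal index pairs are the standard Kelvin/Mandel convention, although other scalings appear in the literature.

```python
import numpy as np

# Index pairs and Mandel weights for mapping a totally symmetric 3D
# fourth-order tensor W[i,j,k,l] to a symmetric 6x6 matrix whose
# eigenvalues are the Kelvin eigenvalues.
PAIRS = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
W6 = [1.0, 1.0, 1.0, np.sqrt(2), np.sqrt(2), np.sqrt(2)]

def kelvin_eigenvalues(W):
    """Kelvin eigenvalues of a totally symmetric fourth-order tensor."""
    M = np.empty((6, 6))
    for a, (i, j) in enumerate(PAIRS):
        for b, (k, l) in enumerate(PAIRS):
            M[a, b] = W6[a] * W6[b] * W[i, j, k, l]
    return np.linalg.eigvalsh(M)  # eta_1 <= ... <= eta_6
```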

5.1.3 Orientation Distribution Functions

ODF maxima are characterized by their position and value (see Sect. 5.3), but also by their geometric shape. A peak sharpness measure

$$\displaystyle{ \mathrm{PS} = \frac{-\mu _{1}} {k\,F(\mathbf{u})} }$$
(40)

can be derived from the value F(u), the order k of \(\mathcal{F}\), and the larger Hessian eigenvalue \(\mu _{1}\) of F (at maxima, \(\mu _{2} \leq \mu _{1} \leq 0\)). The homogeneous form of a second-order tensor F has a single maximum, whose sharpness depends on the degree to which F has a linear shape, as measured by the widely used invariant \(c_{l} = (\lambda _{1} -\lambda _{2})/\lambda _{1}\) [114]. In fact, when applied to a second-order tensor, \(\mathrm{PS} = c_{l}\) [98].

Peak Fractional Anisotropy (PFA) is designed to coincide with traditional Fractional Anisotropy (FA) [21] when the diffusion process is well-described by a second-order diffusion tensor, but generalizes it to a per-peak measure in case of more than one ODF maximum [41]. It is defined by fitting a second-order tensor to each ODF peak and computing its FA. Based on the function value F and principal curvatures κ 1 > κ 2 at the maximum, the fitted tensor eigenvalues are given by:

$$\displaystyle\begin{array}{rcl} \text{ODF-T}:\lambda _{1} = F^{2},\;\lambda _{ 2} = \frac{F} {\kappa _{2}},\;\lambda _{3} = \frac{F} {\kappa _{1}} & &{}\end{array}$$
(41)
$$\displaystyle\begin{array}{rcl} \text{ODF-SA}:\lambda _{1} = 1,\;\lambda _{2} = \frac{3} {2 +\kappa _{2}F},\;\lambda _{3} = \frac{3} {2 +\kappa _{1}F}& &{}\end{array}$$
(42)

ODF-T refers to the q-ball defined by Tuch [109]; ODF-SA denotes a solid angle ODF [1, 108]. The total PFA is defined by considering a weighted sum of the PFA over all ODF maxima:

$$\displaystyle{ \mbox{ Total-PFA} =\sum \nolimits _{ i=1}^{\#\text{maxima}}F_{i} \cdot \mbox{ PFA}_{i} }$$
(43)

Unlike Fractional Anisotropy, Total-PFA is able to distinguish between near-isotropic regions with many weak ODF maxima and areas with complex fiber structure, which exhibit multiple high-anisotropy maxima.
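As a sketch, Total-PFA can be computed from a list of detected maxima as follows, here for the solid-angle variant of Eq. (42); the peak values and principal curvatures are assumed to come from a maximum-detection step as in Sect. 5.3.

```python
import numpy as np

def fa(lams):
    """Fractional anisotropy of eigenvalues (lam1, lam2, lam3)."""
    l = np.asarray(lams, dtype=float)
    return np.sqrt(1.5 * np.sum((l - l.mean())**2) / np.sum(l**2))

def total_pfa(peaks):
    """Total-PFA, Eq. (43), for the solid-angle ODF variant (Eq. (42)).
    `peaks` is a list of (F, kappa1, kappa2) per ODF maximum, with
    F the ODF value and kappa1 > kappa2 the principal curvatures."""
    total = 0.0
    for F, k1, k2 in peaks:
        lams = (1.0, 3.0 / (2.0 + k2 * F), 3.0 / (2.0 + k1 * F))
        total += F * fa(lams)
    return total
```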

Other geometrical scalars have also been considered. The Ricci scalar is a well-known invariant quantity in differential geometry representing intrinsic curvature, and constructed from the metric and metric-derived tensors. It has been proposed as a DTI scalar measure in the context of Riemannian geometry [35]. The Ricci scalar can also be calculated from a (strongly) convexified ODF by relating it to Finsler geometry (see Sect. 5.5 and chapter “Riemann-Finsler Geometry for Diffusion Weighted Magnetic Resonance Imaging”) [6]. However, experimental results on the latter have not yet been reported.

In addition, principal invariants of fully symmetric fourth-order tensors representing an ODF have been studied [36]. Invariants of fourth-order covariance tensors in DTI had been investigated previously [20]. More general invariants of fourth-order tensors have recently been presented [42]. Principal invariants can be computed from the Kelvin eigenvalues \((\eta _{1},\ldots,\eta _{6})\) of the tensor (see Sect. 5.1.2):

$$\displaystyle\begin{array}{rcl} I_{1}& =& \eta _{1} +\eta _{2} +\eta _{3} +\eta _{4} +\eta _{5} +\eta _{6} \\ I_{2}& =& \eta _{1}\eta _{2} +\eta _{1}\eta _{3} +\ldots +\eta _{5}\eta _{6} \\ I_{3}& =& \eta _{1}\eta _{2}\eta _{3} +\eta _{1}\eta _{2}\eta _{4} +\ldots +\eta _{4}\eta _{5}\eta _{6} \\ I_{4}& =& \eta _{1}\eta _{2}\eta _{3}\eta _{4} +\eta _{1}\eta _{2}\eta _{3}\eta _{5} +\ldots +\eta _{3}\eta _{4}\eta _{5}\eta _{6} \\ I_{5}& =& \eta _{1}\eta _{2}\eta _{3}\eta _{4}\eta _{5} +\ldots +\eta _{2}\eta _{3}\eta _{4}\eta _{5}\eta _{6} \\ I_{6}& =& \eta _{1}\eta _{2}\eta _{3}\eta _{4}\eta _{5}\eta _{6} {}\end{array}$$
(44)

These quantities are, by definition, rotationally invariant and can therefore be used as building blocks for invariant scalar HARDI measures. Experiments on HARDI phantom and brain data have been presented, but further work is required to assess the utility of principal invariants in this context. Finally, note that both the Ricci scalar and the principal invariants can also be calculated from higher-order diffusion tensors.
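Since the \(I_{k}\) in Eq. (44) are the elementary symmetric polynomials of the Kelvin eigenvalues, they can be read off from the coefficients of the characteristic polynomial, as the following sketch shows.

```python
import numpy as np

def principal_invariants(etas):
    """Principal invariants I_1, ..., I_6 of Eq. (44), read off as the
    elementary symmetric polynomials of the Kelvin eigenvalues via the
    characteristic polynomial prod(x - eta_i)."""
    coeffs = np.poly(etas)            # [1, -I1, I2, -I3, I4, -I5, I6]
    signs = np.array([-1, 1, -1, 1, -1, 1])
    return signs * coeffs[1:]
```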

5.2 Reconstructing the Diffusion Propagator

The diffusion process is characterized by a probability density function P(r, t) that specifies the probability of a spin displacement r within diffusion time t. P(r, t) is known as the diffusion propagator or Ensemble Average Propagator (EAP). It is related to the dMRI signal by a Fourier transform in the q-space formalism, \(S(\mathbf{q},t)/S_{0} =\int _{\mathbb{R}^{3}}P(\mathbf{r},t)\mathrm{e}^{2\pi i\mathbf{q}\cdot \mathbf{r}}d\mathbf{r}\) [25]. Even though higher-order tensor estimates of ADC and kurtosis can discern regions with multiple fiber directions, they cannot be used to resolve the directions themselves. To resolve fiber directions, the EAP or characteristics of it, such as the ODF, need to be computed.

In DTI, the diffusivities are modeled by a quadratic function given by the diffusion tensor, Eq. (1). The Fourier transform of the resulting signal yields the corresponding EAP, an oriented Gaussian distribution whose principal eigenvector indicates the single major fiber direction. However, when HOTs are used to model more complex ADC profiles, computing the EAP becomes a harder problem.

Unlike in DTI, the analytical Fourier transform of the tensor model in Eq. (2) is unknown. In [88], a fast Fourier transform was performed on interpolated (and extrapolated) q-space data on a Cartesian grid, generated from the tensor in Eq. (2), to numerically estimate the EAP. In [87], an analytical EAP on a single R 0-shell, i.e., \(P(R_{0} \frac{\mathbf{r}} {\vert \vert \mathbf{r}\vert \vert })\), was proposed for this model. However, this Diffusion Orientation Transform (DOT) uses the SH basis representation of the tensor, see Eq. (21).
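The numerical approach of [88] can be sketched as follows for a fourth-order tensor: the monoexponential model of Eq. (2) is extrapolated over a Cartesian q-space grid, and the EAP is obtained by a discrete Fourier transform. The narrow-pulse relation \(b = 4\pi ^{2}\tau \vert \mathbf{q}\vert ^{2}\), the grid size, and the parameter values are illustrative assumptions.

```python
import numpy as np

def eap_fft(D4, tau=0.02, q_max=80.0, n=33):
    """Numerical EAP by FFT of the monoexponential model of Eq. (2):
    the order-4 tensor D4 gives D(u), the signal is extrapolated over
    a Cartesian q-space grid as S(q)/S0 = exp(-4 pi^2 tau |q|^2 D(q/|q|)),
    and P(r) is obtained by FFT. Units and grid size are illustrative."""
    q = np.linspace(-q_max, q_max, n)
    qx, qy, qz = np.meshgrid(q, q, q, indexing="ij")
    Q = np.stack([qx, qy, qz], axis=-1)
    norm = np.linalg.norm(Q, axis=-1)
    U = Q / np.where(norm[..., None] == 0, 1, norm[..., None])
    # D(u) = D_{ijkl} u_i u_j u_k u_l, evaluated on the whole grid
    Du = np.einsum("ijkl,...i,...j,...k,...l->...", D4, U, U, U, U)
    S = np.exp(-4 * np.pi**2 * tau * norm**2 * Du)
    P = np.fft.fftshift(np.abs(np.fft.fftn(np.fft.ifftshift(S))))
    return P  # EAP samples on the reciprocal Cartesian grid
```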

In [40], the authors considered a modified non-monoexponential model inspired by Eq. (2), in which the HOT describes the signal in the entire q-space. The modified model leads to an analytical series expansion of the EAP in Hermite polynomials. In [15], the authors proposed to use tensors to describe a single R 0-shell of the EAP, \(P(R_{0} \frac{\mathbf{r}} {\vert \vert \mathbf{r}\vert \vert })\). They used Hermite polynomials to describe the dMRI signal, since under certain constraints the Fourier transforms of Hermite polynomials are homogeneous forms, i.e., tensors. Note that [40] and [15] used the same dual Fourier bases, but in opposite spaces, to analytically resolve the Fourier transform.

The first attempt to estimate the EAP analytically was based on the tensor model in Eq. (3), where the HOTs represent the cumulant tensors of the EAP, since the dMRI signal is also the characteristic function of the EAP. The authors in [77, 78] proposed to use the Gram-Charlier series to compute a series estimate of the EAP from the cumulant tensors up to fourth order, i.e., covariance (diffusion) and kurtosis. In theory, the Gram-Charlier series could be improved by the Edgeworth series [45].

In [69], the authors computed the ODF directly from the same cumulant tensors – diffusion and kurtosis. In contrast to [77, 78], they do not estimate the full EAP, but only its radial marginalization.

5.3 Finding Maxima of the Homogeneous Form

The maxima of many orientation distribution functions in dMRI, which can be represented in the HOT or SH bases, indicate underlying fiber directions. It is, therefore, crucial to compute these maxima with high precision.

The simplest approach is to discretely sample the homogeneous form on a spherical mesh and to compare its values at the vertices to approximately identify the maxima [54]. However, even a 16th-order tessellation of the icosahedron, with 1,281 vertices on the sphere, can lead to an error of ∼4°. Numerical optimization techniques such as the Newton-Raphson and Powell methods have been used in the SH basis [58, 107] to overcome this limitation. In [55], numerical optimization was combined with the Euler integration step of a tractography algorithm in the tensor basis to trace fibers efficiently.
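The discrete sampling strategy amounts to the following sketch, shown for a fourth-order tensor; the tessellation vertices and their adjacency are assumed to be given, and the angular error mentioned above is inherent to the mesh resolution.

```python
import numpy as np

def discrete_maxima(T, verts, neighbors):
    """Approximate maxima of the homogeneous form of a symmetric
    order-4 tensor T by exhaustive comparison on a spherical mesh:
    a vertex is kept if its value exceeds that of all mesh neighbors.
    `neighbors[i]` lists the vertex indices adjacent to vertex i."""
    vals = np.einsum("ijkl,ni,nj,nk,nl->n", T, verts, verts, verts, verts)
    maxima = [i for i in range(len(verts))
              if all(vals[i] > vals[j] for j in neighbors[i])]
    return verts[maxima], vals[maxima]
```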

However, such local optimization techniques are highly dependent on initialization. Two methods for computing all the stationary points of a homogeneous form were presented in [23] and [46]. In [23], the Z-eigenvalue/eigenvector formulation was used, and a system of two polynomials in two variables – the homogeneous form and the unit sphere constraint – was solved using resultants (detailed in [95]). The stationary points were then classified by their principal curvatures into maxima, minima, and saddle points. In [46], the gradient of the homogeneous form constrained to the unit sphere – a system of four polynomials – was equated to zero. The roots of the system were computed by the subdivision method, which analytically brackets all roots and thus misses none. The stationary points were then classified into maxima, minima, and saddle points using the bordered Hessian.

5.4 Applications of Tensor Decompositions and Approximations

There are four lines of work that have applied tensor decompositions in the context of diffusion MRI. The first results from considering normal distributions of second-order diffusion tensors, which involve a fourth-order covariance tensor Σ. When the diffusion tensor is written as a vector, Σ is naturally represented by a 6 × 6 symmetric positive definite matrix S, to which the spectral decomposition into eigenvalues and eigentensors can be applied, in order to facilitate visualization and quantitative analysis [20]. Alternatively, Σ can be expressed in a local coordinate frame that is derived from invariant gradients and rotation tangents [63]. The coordinates in this frame isolate physically and biologically meaningful components such as variability that can be attributed to changes in trace, anisotropy, or orientation.

Second, the distribution of fiber orientation estimates, either from the diffusion tensor or from HARDI, has been modeled by mapping the corresponding probability measure into a reproducing kernel Hilbert space. With a power-of-cosine kernel, this results in a higher-order tensor representation, which can be decomposed into a rank-1 approximation and a non-negative residual to visually and quantitatively investigate the uncertainty in fiber estimates from diffusion MRI [99].

Third, in the framework described in detail in Sect. 4.3, a low-rank approximation of fODF tensors provides a less biased estimate of principal directions than fODF maxima. It has been shown [101] that this model can be used to approximate and to more efficiently and robustly fit the ball-and-multi-stick model [22]. Subsequent work has imposed an additional non-negativity constraint during deconvolution, and proposed an alternative optimization algorithm [62]. Low-rank approximations were shown to produce useful estimates of crossing fibers even from a relatively small number of gradient directions [49].

Finally, another line of work has attempted to decompose higher-order diffusion tensors in order to obtain crossing fiber directions [59, 115]. However, these techniques are yet to be validated on synthetic data with varying crossing angles, and have not yet been shown to reconstruct known fiber crossings in real data.

5.5 Finslerian Tractography

DTI streamline tracking can be generalized to HARDI by means of Finsler geometry. A second-order Finsler metric tensor can be defined at each point q from an ODF in the following way [4, 5, 7, 34]

$$\displaystyle{ \hat{F}(\mathbf{q},\mathbf{x}) = \left (\sum \nolimits _{i_{1}\ldots i_{p}}\mathcal{F}_{i_{1}\ldots i_{p}}(\mathbf{q})\,x^{i_{1}}\cdots x^{i_{p}}\right )^{1/p},\;\;g_{ij}(\mathbf{q},\mathbf{x}) = \frac{1} {2} \frac{\partial ^{2}\hat{F}^{2}(\mathbf{q},\mathbf{x})} {\partial x^{i}\partial x^{j}} }$$
(45)

where \(\mathcal{F}\) is an ODF tensor of (even) order p, \(\hat{F}\) is the Finsler function, and \(g_{ij}\), with \(i,j = 1,2,3\), is the Finsler metric, which depends on both position and direction. Note that this definition of the Finsler function \(\hat{F}\) is by no means unique; in fact, it is still a subject of active research (see chapter “Riemann-Finsler Geometry for Diffusion Weighted Magnetic Resonance Imaging”). In this way, a local diffusion tensor is obtained per direction. Tracking can be performed by extracting the principal eigenvector of the diffusion tensor corresponding to the arrival direction. As long as this direction is sufficiently aligned with the eigenvector, and the FA of the diffusion tensor is above a certain threshold, tracking continues. Experiments on Finsler streamline tracking using fourth-order tensors have been presented on simulated fiber crossings and real HARDI data. They show that Finsler streamlines can, unlike DTI streamlines, correctly cope with nerve fiber bundle crossings.
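As a sketch, the Finsler metric of Eq. (45) can be evaluated numerically for a fourth-order ODF tensor by differentiating \(\hat{F}^{2}\); central finite differences are used here for simplicity (automatic differentiation would be more robust), and the form is assumed to be positive at x, e.g., after convexification.

```python
import numpy as np

def finsler_metric(F4, x, h=1e-4):
    """Finsler metric g_ij(x) of Eq. (45) for an order-4 ODF tensor F4,
    via central finite differences of Fhat^2. Requires the homogeneous
    form to be positive at x (e.g., a convexified ODF)."""
    def fhat2(y):
        val = np.einsum("ijkl,i,j,k,l->", F4, y, y, y, y)
        return val ** (2.0 / 4.0)      # Fhat^2 = F(y)^(2/p) with p = 4
    g = np.empty((3, 3))
    e = np.eye(3)
    for i in range(3):
        for j in range(3):
            g[i, j] = (fhat2(x + h*e[i] + h*e[j]) - fhat2(x + h*e[i] - h*e[j])
                       - fhat2(x - h*e[i] + h*e[j]) + fhat2(x - h*e[i] - h*e[j])
                       ) / (8.0 * h * h)   # includes the factor 1/2
    return g
```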

5.6 Registration and Atlas Construction

Registration transforms data sets from different times or subjects to a common coordinate system, so that anatomical structures align. Atlas construction is based on registering a large number of subjects, in order to obtain a description of average anatomy, and of the most common modes of variation. Modeling parameters of the diffusion process with higher-order tensors makes registration of tensor fields a relevant research problem. Registration requires selection of an appropriate metric to measure the dissimilarity between individual tensors; for this purpose, Barmpoutis et al. [12, 14] propose two alternative choices, which are both scale and rotation invariant. Integrating the local dissimilarity over the domain of the tensor field results in an overall measure of dissimilarity. Registration is achieved by finding the coordinate transformation that minimizes this measure.

It is important to also transform the individual tensors according to the coordinate transformation applied to the domain of the field. For example, when the domain of the tensor field is rotated, a corresponding rotation of the tensors themselves is required in order to preserve relevant structures, such as the trajectories of nerve fiber bundles. When the transformation is (locally) affine, it has been proposed to simply apply it to the tensors via Eq. (11) [14]. Alternative methods for transformation have been proposed based on the spectral decomposition [96] and different sum-of-squares parametrizations [9, 48, 96].
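For the rotational case, applying the transformation to a higher-order tensor amounts to contracting each index with the rotation matrix, as in the following sketch for a fourth-order tensor; this is the affine transformation of Eq. (11) restricted to rotations.

```python
import numpy as np

def rotate_order4(D4, R):
    """Reorient a fourth-order tensor under a rotation R:
    D'_{ijkl} = R_ia R_jb R_kc R_ld D_{abcd}."""
    return np.einsum("ia,jb,kc,ld,abcd->ijkl", R, R, R, R, D4)
```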

6 Conclusion

The wide range of models and computational methods surveyed in this chapter testifies to the power and flexibility that higher-order tensors provide for the analysis of data from diffusion MRI, and to the increasing momentum of the research associated with this topic. Generalized eigenvalues, scalar invariants, tensor decompositions, and low-rank approximations have all proven valuable in the context of this application.

Looking ahead, several theoretical problems remain to be solved. While many approaches have focused on the properties of individual tensors, less attention has been paid to the global nature of the tensor fields that arise in diffusion MRI. The recent use of Finsler geometry is a natural step in this direction.

Even though low-rank approximations have proven to work well in practice, the uniqueness of such approximations over the reals is largely an open question (for the complex case, see [66]). Moreover, we still lack algorithms with provable convergence properties, as well as formal results on the conditioning of such approximations.

Many approaches have been proposed to ensure non-negativity of higher-order tensors that model apparent diffusivities (cf. Sect. 4.1). Less attention has been paid to the fitting of deconvolution models, which are constrained to the convex cone of tensors that can be expressed as a positive sum of rank-1 tensors; in general, that is a stricter constraint than non-negativity.

While many neuroscientific studies that use diffusion imaging are now published each month, they still almost exclusively use either the second-order diffusion tensor [21] or the ball-and-stick model [22]. A challenge in the next few years will be to take approaches based on higher-order tensors into the application domain. This will require more work on several subproblems:

Statistical tests on scalar invariants such as Mean Diffusivity or Fractional Anisotropy are a mainstay of DTI-based studies. Even though a considerable number of invariants have now been derived from higher-order tensors (cf. Sect. 5.1), the practical utility of many of them is limited by their unclear biological or neuroanatomical interpretation.

Given an ever-increasing palette of models, it becomes a more urgent problem to select a suitable model for testing a given hypothesis, and to choose values for parameters such as tensor order, approximation rank, or regularization weights. This requires an improved understanding of the formal relationships between different models, as well as mathematical rules for model selection.

Spatial coherence and signal sparsity need to be exploited in order to reliably estimate the large number of parameters in advanced models, such as the ensemble average propagator, without requiring excessively time-consuming measurements.