1 Introduction

Differential equations on high-dimensional state spaces arise in many different fields and applications. Typical examples are the Black-Scholes equation for basket options in mathematical finance, the chemical master equation used in systems biology to model the stochastic dynamics of a gene regulatory network, or the Schrödinger equation which describes the dynamics of the particles within a molecule according to the laws of quantum mechanics. Solving such problems numerically is notoriously difficult due to the fact that for standard discretizations the number of unknowns grows exponentially with respect to the dimension (“curse of dimensionality”). Hence, in high-dimensional approximations the main challenge is to compress the problem in such a way that the reduced equation can be solved numerically and nevertheless provides an acceptable approximation to the solution of the full problem.

If a multivariate function on a hyper-rectangle is discretized by an equidistant grid, the function values at the grid points can be regarded as the entries of a high-dimensional tensor. Hence, the approximation of high-dimensional functions and tensors is closely related, and for \(N_{1}, \ldots, N_{d} \in\mathbb{N}\) we will consider tensors \(Y \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) as functions \(Y=Y(i_{1}, \ldots, i_{d})\) with \(i_{k} \in\{1, \ldots, N_{k}\}\). For a straightforward, entry-wise representation of Y, all \(\prod_{k=1}^{d}N_{k}\) values have to be stored. This number can be significantly reduced if Y can be approximated by a sum of products of univariate functions. Many such representations have been proposed, and an overview of the literature is given in [3, 9]. A particularly useful and popular representation is the orthogonal Tucker format

$$ Y(i_{1}, \ldots, i_{d}) \approx\widetilde{Y}(i_{1}, \ldots, i_{d}) = \sum _{j_{1}=1}^{r_{1}} \ldots\sum _{j_{d}=1}^{r_{d}} a(j_{1}, \ldots, j_{d}) \prod_{k = 1}^d U^k_{j_k}(i_{k}), $$
(1.1)

see, e.g., [9, Sect. 4]. Here, \(a\in\mathbb{R}^{r_{1} \times\cdots\times r_{d}}\) is the core tensor of coefficients, and for every direction k∈{1,…,d}, \(U^{k}_{1}, \ldots, U^{k}_{r_{k}}\) is a set of univariate basis functions which satisfy the orthonormality conditions

$$\begin{aligned} \sum_{i_k=1}^{N_k}U^k_{j}(i_{k})U^k_{m}(i_{k}) = \delta_{j,m} := \left\{ \begin{array}{l@{\quad}l} 1 & \mbox{if } j=m \\ 0 & \mbox{else} \end{array} \right. \end{aligned}$$

for all \(j,m\in\{1, \ldots, r_{k}\}\). From the perspective of linear algebra, every

$$ U^k_{j}:\{1, \ldots, N_k\} \rightarrow \mathbb{R} $$

can be interpreted as a vector. Hence, (1.1) is a high-dimensional generalization of the well-known singular value decomposition, and \(\widetilde{Y}\) is a low-rank approximation of \(Y(i_{1}, \ldots, i_{d})\), cf. [1]. For the representation (1.1) of \(\widetilde{Y}\) only

$$\begin{aligned} \prod_{k = 1}^{d}r_{k} + \sum _{k = 1}^{d}r_k N_k \end{aligned}$$

degrees of freedom have to be stored, which is a significant improvement if \(r_{k} \ll N_{k}\) for all k. Unfortunately, the fact that all possible combinations appear on the right-hand side of (1.1) again causes an exponential growth of the data (unless \(r_{k}\) decreases to 1 as \(k\to\infty\)). For example, if \(r_{1}=\cdots=r_{d}=r\), then the core tensor \(a(j_{1}, \ldots, j_{d})\) has \(r^{d}\) entries.
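To make (1.1) and the storage count above concrete, the following NumPy sketch (with hypothetical mode sizes and ranks and random data, purely for illustration) assembles a small tensor in the orthogonal Tucker format and compares the number of stored values with the entry-wise representation.

```python
import numpy as np

# Hypothetical sizes: d = 3 directions with mode sizes N_k and Tucker ranks r_k.
N = (20, 30, 40)
r = (4, 5, 6)

rng = np.random.default_rng(0)
core = rng.standard_normal(r)                                   # core tensor a(j_1, j_2, j_3)
# Orthonormal basis vectors U^k_1, ..., U^k_{r_k} as columns (QR yields orthonormal columns).
U = [np.linalg.qr(rng.standard_normal((N[k], r[k])))[0] for k in range(3)]

# (1.1): Y_tilde(i_1,i_2,i_3) = sum_{j_1,j_2,j_3} a(j_1,j_2,j_3) U^1_{j_1}(i_1) U^2_{j_2}(i_2) U^3_{j_3}(i_3)
Y_tilde = np.einsum('abc,ia,jb,kc->ijk', core, U[0], U[1], U[2])

full_storage = np.prod(N)                                       # entry-wise representation
tucker_storage = np.prod(r) + sum(r[k] * N[k] for k in range(3))
print(Y_tilde.shape, full_storage, tucker_storage)              # (20, 30, 40) 24000 590
```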

The hierarchical Tucker format avoids the disadvantageous growth of degrees of freedom by using the approach (1.1) in a recursive way. For example, with univariate functions \(U^{\{k\}}_{j_{k}}:\{1, \ldots, N_{k}\}\rightarrow\mathbb{R}\) and coefficients \(\mathcal{B}^{\{1,2\}}_{m,j_{1},j_{2}}, \mathcal{B}^{\{3,4\} }_{l,j_{3},j_{4}} \in\mathbb{R}\), one can define bivariate functions

$$\begin{aligned} U^{\{1,2\}}_{m}(i_{1},i_{2}) =& \sum _{j_{1}=1}^{r_{1}}\sum _{j_{2}=1}^{r_{2}} \mathcal{B}^{\{1,2\}}_{m,j_{1},j_{2}}U^{\{1\} }_{j_1}(i_{1})U^{\{2\}}_{j_2}(i_{2}) \\ U^{\{3,4\}}_{l}(i_{3},i_{4}) =& \sum _{j_{3}=1}^{r_{3}}\sum _{j_{4}=1}^{r_{4}} \mathcal{B}^{\{3,4\}}_{l,j_{3},j_{4}}U^{\{3\} }_{j_3}(i_{3})U^{\{4\}}_{j_4}(i_{4}), \end{aligned}$$

and then construct a four-dimensional tensor via

$$\begin{aligned} Y(i_{1},i_{2},i_{3},i_{4}) =& \sum_{m=1}^{r_{\{1,2\}}}\sum _{l=1}^{r_{\{3,4\}}} \mathcal{B}^{\{1,2,3,4\}}_{m,l}U^{\{1,2\} }_{m}(i_{1},i_{2})U^{\{3,4\}}_{l}(i_{3},i_{4}) \end{aligned}$$

for some \(\mathcal{B}^{\{1,2,3,4\}}_{m,l}\in\mathbb{R}\). This idea was first used in quantum chemistry to solve high-dimensional Schrödinger equations [13, 15, 16] and was later investigated from a mathematical point of view in [2, 4, 12, 14]; see also [10] for a useful Matlab toolbox. A formal definition of the hierarchical Tucker format will be given in Sect. 2. Due to the recursive definition, the hierarchical version is technically more involved than the standard Tucker format, but as we will discuss in Sect. 2, it avoids the exponential growth of the degrees of freedom.
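As an illustration of this recursive construction, the following NumPy sketch (with hypothetical ranks and random transfer tensors, not taken from the paper) builds the four-dimensional tensor above from the leaf bases and the transfer tensors \(\mathcal{B}^{\{1,2\}}\), \(\mathcal{B}^{\{3,4\}}\) and \(\mathcal{B}^{\{1,2,3,4\}}\).

```python
import numpy as np

# Hypothetical mode sizes and ranks (not from the paper).
N = (10, 11, 12, 13)                  # N_1, ..., N_4
r = (3, 4, 3, 4)                      # r_1, ..., r_4
r12, r34 = 5, 6                       # ranks of the nodes {1,2} and {3,4}

rng = np.random.default_rng(1)
U = [rng.standard_normal((N[k], r[k])) for k in range(4)]   # leaf bases U^{k}_{j_k}(i_k)
B12 = rng.standard_normal((r12, r[0], r[1]))                # B^{1,2}_{m, j_1, j_2}
B34 = rng.standard_normal((r34, r[2], r[3]))                # B^{3,4}_{l, j_3, j_4}
B1234 = rng.standard_normal((r12, r34))                     # B^{1,2,3,4}_{m, l}

# Bivariate frames U^{1,2}_m(i_1, i_2) and U^{3,4}_l(i_3, i_4).
U12 = np.einsum('mab,ia,jb->mij', B12, U[0], U[1])
U34 = np.einsum('ncd,kc,ld->nkl', B34, U[2], U[3])

# Y(i_1, i_2, i_3, i_4) = sum_{m,n} B^{1,2,3,4}_{m,n} U^{1,2}_m(i_1,i_2) U^{3,4}_n(i_3,i_4)
Y = np.einsum('mn,mij,nkl->ijkl', B1234, U12, U34)
print(Y.shape)                                              # (10, 11, 12, 13)
```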

The problem of how a given tensor can be approximated in the hierarchical Tucker format up to a prescribed error tolerance has been studied in [2]. In the context of high-dimensional differential equations, however, an approximation of the unknown solution is sought, and for time-dependent differential equations the question arises how an approximation in the hierarchical Tucker format can be propagated in time. For the non-hierarchical Tucker format, equations of motion for the core tensor and the basis functions have been derived in [8, 11] from the Dirac-Frenkel variational principle, and error bounds for the dynamical low-rank approximation in the non-hierarchical Tucker format have been proven in [8]. The main contribution of our article is to extend these results to the hierarchical Tucker format. We remark, however, that very similar results have been obtained independently and simultaneously in [12].

In Sect. 2 we introduce our notation, define the hierarchical Tucker format and discuss some of its properties. In Sect. 3, the hierarchical Tucker format is applied to approximate high-dimensional initial-value problems. In particular, differential equations for the parts of the representation are derived from the Dirac-Frenkel variational principle. The last three sections are devoted to the analysis of the accuracy of this approximation. The analysis is based on matricizations of tensors, which enable a compact representation of certain projections defined in Sect. 4. The second important ingredient of the error analysis is a set of curvature bounds, which are proven in Sect. 5. Finally, an error bound for the approximation of high-dimensional differential equations in the hierarchical Tucker format is proven in Sect. 6.

2 Hierarchical low-rank representation of tensors: the \(\mathcal{H}\)-Tucker format

A tensor \(T \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) is a d-dimensional array with \(\prod_{j = 1}^{d} N_{j}\) entries. Defining the index sets

$$ I_j = \{1 , \ldots, N_j \}, \quad j \in\{1, \ldots, d \} $$

leads to the natural interpretation of tensors as multivariate functions

$$ T : I_1 \times\cdots\times I_d \rightarrow\mathbb{R}, \quad T=T(i_{1}, \ldots, i_{d}). $$

This interpretation is advantageous as it simplifies the construction of high-dimensional tensors as products of low-dimensional ones, which are considered as functions that only depend on a subset \(\mu=\{\mu_{1}, \ldots, \mu_{m}\}\subseteq\{1, \ldots, d\}\) of directions, i.e.

$$\begin{aligned} T : I_{\mu} \rightarrow\mathbb{R} \quad \mbox{with } I_{\mu} = I_{\mu_{1}} \times\cdots\times I_{\mu_{m}} . \end{aligned}$$

In order to define the multiplication of two tensors \(U : I_{\mu} \rightarrow\mathbb{R}\) and \(V : I_{\nu} \rightarrow \mathbb{R} \) for \(\mu,\nu\subseteq\{1, \ldots, d\}\), both tensors are interpreted as functions \(\widetilde{U}\) and \(\widetilde{V}\) in all variables \(\kappa=\{\kappa_{1}, \ldots, \kappa_{k}\}=\mu\cup\nu\). For example, if \(U=U(i_{2},i_{4},i_{7})\) and \(V=V(i_{2},i_{5})\), then we let \(\widetilde{U}(i_{2},i_{4},i_{5},i_{7})=U(i_{2},i_{4},i_{7})\) for all \(i_{5}\in I_{5}\) and \(\widetilde{V}(i_{2},i_{4},i_{5},i_{7})=V(i_{2},i_{5})\) for all \(i_{4}\in I_{4}\) and \(i_{7}\in I_{7}\). Then, the product \(U \cdot V\) is to be understood as the pointwise product of \(\widetilde{U}\) and \(\widetilde{V}\), i.e.

$$\begin{aligned} (U \cdot V) : I_{\kappa} \rightarrow\mathbb{R}, \quad (U \cdot V) (i_{\kappa_1} , \ldots, i_{\kappa_k}) = \widetilde{U}(i_{\kappa_1} , \ldots, i_{\kappa_k})\widetilde{V}(i_{\kappa_1} , \ldots, i_{\kappa_k}). \end{aligned}$$

Besides multiplication of tensors, a second operation will be important in this article. Let \(\mu=\{\mu_{1}, \ldots, \mu_{m}\}\) and \(\nu=\{\nu_{1}, \ldots, \nu_{n}\}\) be subsets of {1,…,d}, let again \(\kappa=\{\kappa_{1}, \ldots, \kappa_{k}\}=\mu\cup\nu\), and let \(\lambda=\{\lambda_{1}, \ldots, \lambda_{l}\}=(\mu\cup\nu)\setminus(\mu\cap\nu)\) be the indices which occur in either μ or ν but not in both. Then, with the convention that \(\mathbb{R}^{\emptyset} = \mathbb{R}\), we define

$$ \begin{aligned} &\langle{\cdot} , {\cdot} \rangle : \mathbb{R}^{N_{\mu_1} \times \cdots\times N_{\mu_m}} \times \mathbb{R}^{N_{\nu_1} \times \cdots\times N_{\nu_n}} \rightarrow\mathbb{R}^{N_{\lambda_1} \times \cdots\times N_{\lambda_l}} \\ &\langle U, V \rangle(i_{\lambda_1} , \ldots, i_{\lambda_l}) = \sum_{i_{\kappa_1}\in I_{\kappa_1}}^{(\kappa_1\in\mu\cap\nu)} \ldots\sum _{i_{\kappa_k}\in I_{\kappa_k}}^{(\kappa_k\in\mu\cap\nu)} (U \cdot V) (i_{\kappa_1} , \ldots, i_{\kappa_k}) \end{aligned} $$
(2.1)

where the symbol \(\sum_{i_{\kappa_{j}}\in I_{\kappa_{j}}}^{(\kappa_{j}\in \mu\cap\nu)}\) means that the sum is taken only if \(\kappa_{j}\in\mu\cap\nu\). Hence, after multiplying the two tensors we take the sum over all common directions of the two tensors. For example, for the tensors \(U=U(i_{2},i_{3},i_{5},i_{9})\) and \(V=V(i_{2},i_{4},i_{5})\), (2.1) simply reduces to

$$\begin{aligned} \langle U, V \rangle(i_{3},i_{4},i_{9}) = & \sum_{i_{2}\in I_{2}}\sum_{i_{5}\in I_{5}}U(i_{2},i_{3},i_{5},i_{9})V(i_{2},i_{4},i_{5}) \\ = & \sum_{i_{2}=1}^{N_{2}}\sum _{i_{5}=1}^{N_{5}}U(i_{2},i_{3},i_{5},i_{9})V(i_{2},i_{4},i_{5}). \end{aligned}$$

The result is a tensor which depends on all directions which are not common directions of U and V. Hence, if U and V depend on exactly the same directions (i.e. \(\mu\cap\nu=\mu=\nu\)), then \(\langle U,V \rangle\in\mathbb{R}^{\emptyset} = \mathbb{R}\) is a real number, and it can be verified that 〈⋅,⋅〉 defines a scalar product in this case. If both \(U=U(i_{j})\) and \(V=V(i_{j})\) are one-dimensional tensors with respect to the same direction, then this scalar product coincides with the Euclidean scalar product of the two vectors \((U(1),\ldots,U(N_{j}))\) and \((V(1),\ldots,V(N_{j}))\). This is why we have chosen the symbol 〈⋅,⋅〉.

The numerical evaluation of 〈U,V〉 can be simplified if U and V are products of tensors. If, for example, \(U=U(i_{2},i_{5})=U^{\{2\}}(i_{2})\,U^{\{5\}}(i_{5})\) and \(V=V(i_{2},i_{5})=V^{\{2\}}(i_{2})\,V^{\{5\}}(i_{5})\), then

$$\begin{aligned} \langle U,V \rangle = & \sum_{i_{2}\in I_{2}}\sum _{i_{5}\in I_{5}}U^{\{ 2 \} }(i_2)V^{\{ 2 \} }(i_2) U^{\{ 5 \} }(i_5)V^{\{ 5 \} }(i_5) \\ = & \bigl\langle U^{\{ 2 \} },V^{\{ 2 \} } \bigr\rangle\cdot\bigl\langle U^{\{ 5 \} },V^{\{ 5 \} } \bigr\rangle. \end{aligned}$$
(2.2)

This will be frequently used in the proofs below.
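Both the contraction (2.1) and the factorization (2.2) are easy to reproduce numerically. The following NumPy sketch (hypothetical mode sizes, random data) checks (2.2) for the example above and also evaluates the earlier partial contraction of \(U(i_{2},i_{3},i_{5},i_{9})\) and \(V(i_{2},i_{4},i_{5})\) with einsum.

```python
import numpy as np

rng = np.random.default_rng(2)
N2, N3, N4, N5, N9 = 7, 4, 5, 9, 6           # hypothetical mode sizes

# U(i_2, i_5) = U^{2}(i_2) U^{5}(i_5) and V(i_2, i_5) = V^{2}(i_2) V^{5}(i_5)
U2, U5 = rng.standard_normal(N2), rng.standard_normal(N5)
V2, V5 = rng.standard_normal(N2), rng.standard_normal(N5)
U, V = np.outer(U2, U5), np.outer(V2, V5)

# <U, V>: pointwise product, summed over the common directions 2 and 5 -> a real number.
lhs = np.sum(U * V)
rhs = np.dot(U2, V2) * np.dot(U5, V5)        # factorized evaluation (2.2)
print(np.isclose(lhs, rhs))                  # True

# Partial contraction <U, V>(i_3, i_4, i_9) for U(i_2,i_3,i_5,i_9) and V(i_2,i_4,i_5).
Ubig = rng.standard_normal((N2, N3, N5, N9))
Vbig = rng.standard_normal((N2, N4, N5))
W = np.einsum('abcd,aec->bed', Ubig, Vbig)   # sums over i_2 and i_5, shape (N3, N4, N9)
print(W.shape)                               # (4, 5, 6)
```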

The following definitions provide the framework of a hierarchical tensor structure.

Definition 2.1

(dimension tree)

A dimension tree T is a binary tree which represents a recursive decomposition of the set {1,…,d}. Every node \(\mathcal{N}\in T\) is a selection of directions, \(\mathcal {N}\subseteq\{1, \ldots, d \}\), where \(\mathcal{N}_{0} = \mathrm{root} (T) = \{1, \ldots, d \}\) is called the root of the tree. Apart from the root, every node \(\mathcal{N}\in T\) is linked by an edge with exactly one father \(\widehat{\mathcal{N}} \supset\mathcal{N}\). Conversely, every node is either linked with two successors (child nodes) \((\mathcal{N}_{1}, \mathcal{N}_{2} ) = \mathit{succ} (\mathcal{N})\), or it does not have any successors at all. The successors of \(\mathcal{N}\) have the properties that

$$\begin{aligned} \mathcal{N}_1 \cup\mathcal{N}_2 = \mathcal{N} \quad\text{and} \quad\mathcal{N}_1 \cap\mathcal{N}_2 = \emptyset. \end{aligned}$$

\(\mathcal{N}\) is an interior node if

$$ \mathcal{N}\in\mathcal{I}(T) := \{ \widetilde{\mathcal{N}} \in T : \widetilde{\mathcal{N}} \text{ has successors} \}, $$

and \(\mathcal{N}\) is called a leaf of the dimension tree T if

$$ \mathcal{N}\in\mathcal{L} (T) := \{ \widetilde{\mathcal{N}} \in T : \widetilde{\mathcal{N}} \text{ has no successors} \} = \bigl\{ \{1\}, \ldots, \{d\} \bigr\} = T\setminus \mathcal{I}(T). $$

Figure 1 shows an example of a dimension tree with inner nodes \(\mathcal{N}_{0}\), \(\mathcal{N}_{1}\) and leaves \(\mathcal{N}_{2}\), \(\mathcal{N}_{11}\), \(\mathcal{N}_{12}\).

Fig. 1  Dimension tree T
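A dimension tree of this kind is easy to encode as a recursive data structure. The following Python sketch (an illustrative encoding, not part of the paper) builds a balanced binary tree over {1,…,d} and collects its leaves.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

@dataclass
class Node:
    dims: FrozenSet[int]                                  # the subset N of {1, ..., d}
    children: Optional[Tuple["Node", "Node"]] = None      # succ(N), or None for a leaf

def balanced_tree(dims):
    """Build a dimension tree over `dims` by recursive halving."""
    dims = sorted(dims)
    node = Node(frozenset(dims))
    if len(dims) > 1:
        mid = len(dims) // 2
        node.children = (balanced_tree(dims[:mid]), balanced_tree(dims[mid:]))
    return node

def leaves(node):
    if node.children is None:
        return [node]
    return leaves(node.children[0]) + leaves(node.children[1])

root = balanced_tree(range(1, 5))                         # root N_0 = {1, 2, 3, 4}
print([sorted(leaf.dims) for leaf in leaves(root)])       # [[1], [2], [3], [4]]
```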

Definition 2.2

(hierarchical Tucker format)

Let T be a dimension tree with root \(\mathcal{N}_{0}\) and let \(( r_{\mathcal{N}})_{\mathcal{N}\in T}\) be a family of positive integers with \(r_{\mathcal{N}_{0}} = 1\). Y is a tensor in the hierarchical Tucker format if and only if there exist transfer tensors

$$ \mathcal{B}^\mathcal{N}\in\mathbb{R}^{r_\mathcal{N}\times r_{\mathcal{N}_1} \times r_{\mathcal{N}_2}} \quad\text{for all } \mathcal{N}\in\mathcal{I} (T) \text{ with }(\mathcal{N}_1 , \mathcal{N}_2) = \mathit{succ}(\mathcal{N}) $$

and univariate functions

$$\begin{aligned} U^\ell_i: I_\omega\rightarrow\mathbb{R} \quad\mbox{for all } \ell=\{\omega\} \in\mathcal{L}(T), \ \omega \in\{1, \ldots, d\}, \ i=1, \ldots, r_{\ell} \end{aligned}$$

such that Y can be represented by

$$ Y = \sum_{j=1}^{r_{\mathcal{N}_{01}}} \sum _{l=1}^{r_{\mathcal{N}_{02}}} \mathcal{B}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l , \quad( \mathcal{N}_{01} , \mathcal{N}_{02}) = \mathit{succ}(\mathcal{N}_0) $$

with \(\boldsymbol{\mathcal{N}}\)-frames

$$ U^\mathcal{N}_i : I_\mathcal{N}\rightarrow\mathbb{R} \quad\text{for all } \mathcal{N}\in T\setminus\{\mathcal{N}_0\}, \ i = 1 , \ldots, r_\mathcal{N} $$

defined by the recursion

$$ U^\mathcal{N}_i = \sum_{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \quad \text{for all } \mathcal{N}\in\mathcal{I}(T) ,\ (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ}(\mathcal{N}). $$

The family \((r_{\mathcal{N}})_{\mathcal{N}\in T}\) is called the hierarchical representation rank of Y. The set of all tensors in the hierarchical Tucker format with hierarchical representation rank \((r_{\mathcal{N}})_{\mathcal{N}\in T}\) is denoted by \(\mathcal{H}\text {-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\).
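To make the recursion of Definition 2.2 concrete, the following NumPy sketch (reusing the hypothetical Node encoding from the sketch after Fig. 1; the transfer tensors are assumed to be stored in a dictionary keyed by the node's direction set) assembles the \(\mathcal{N}\)-frames bottom-up. At the root, where \(r_{\mathcal{N}_0}=1\), the single remaining column is a flattening of Y itself; the flattening order used here is one arbitrary but fixed choice.

```python
import numpy as np

def frame(node, B, Uleaf):
    """Return the matrix whose i-th column is a flattening of U^N_i.

    Assumptions of this sketch: `B[node.dims]` holds the transfer tensor of shape
    (r_N, r_N1, r_N2) for an interior node, and `Uleaf[k]` is the N_k x r_{k}
    leaf basis matrix for the leaf {k}.
    """
    if node.children is None:
        (k,) = node.dims
        return Uleaf[k]
    U1 = frame(node.children[0], B, Uleaf)      # shape (n_1, r_N1)
    U2 = frame(node.children[1], B, Uleaf)      # shape (n_2, r_N2)
    # Recursion of Definition 2.2: U^N_i = sum_{j,l} B^N_{i,j,l} U^{N1}_j . U^{N2}_l
    UN = np.einsum('ijl,aj,bl->abi', B[node.dims], U1, U2)
    return UN.reshape(U1.shape[0] * U2.shape[0], -1)

# Usage (with `root = balanced_tree(range(1, 5))` from the previous sketch):
#   Y_flat = frame(root, B, Uleaf)[:, 0]        # r_{N_0} = 1, so one column remains
```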

Remark

  1.

    Since the transfer tensors in Definition 2.2 are always three-dimensional objects, we do not interpret these particular tensors as multivariate functions.

  2.

    Every tensor \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) can be reconstructed from the transfer tensors \((\mathcal{B}^{\mathcal{N}})_{\mathcal{N}\in\mathcal{I}(T)}\) and the functions \((U^{\ell})_{\ell\in\mathcal{L}(T)} \) via the above recursion. The \(\mathcal{N}\)-frames at the interior nodes can be considered as auxiliary variables which are only used to reconstruct the full tensor Y if necessary.

  3.

    Definitions 2.1 and 2.2 have been adapted from Definitions 3.3 and 3.6 in [2]. At this point, it is not yet assumed that the \(\mathcal{N}\)-frames are linearly independent. For the approximation of time-dependent problems, however, we will later have to assume that the \(\mathcal{N}\)-frames at every node \(\mathcal{N}\in T\setminus\{ \mathcal{N}_{0}\}\) form an orthonormal basis, and that the above representation is not redundant, cf. Assumptions 1 and 2 below.

  4.

    A special case of the hierarchical Tucker format is the so-called tensor train format which has been investigated, e.g., in [5].

At this point, a comparison with the standard (non-hierarchical) Tucker format is helpful. For the hierarchical Tucker format, an equation similar to (1.1) can be obtained, but due to the recursive definition of the \(\mathcal {N}\)-frames, the corresponding formula is much more complicated. The crucial difference is the fact that the number of terms in the linear combination can be considerably lower. In fact, the total number of degrees of freedom in the hierarchical Tucker format is at most

$$ \sum_{{\scriptstyle \mathcal{N}\in\mathcal{I}(T)\atop\scriptstyle (\mathcal{N}_1 ,\mathcal{N}_2) = \mathit{succ}(\mathcal{N})}} r_\mathcal{N} r_{\mathcal{N}_1} r _{\mathcal{N}_2} + \sum_{k = 1}^{d} r_{\{k\} } N_k $$

because every \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) is defined by the \(r_{\ell}\) basis functions at the leaves \(\ell=\{k\}\) with k=1,…,d and by the entries of all transfer tensors, cf. Lemma 3.7 in [2].
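For a balanced dimension tree this count grows only linearly in d, in contrast to the \(r^{d}\) entries of the core tensor in (1.1). A small illustrative comparison with hypothetical values (d = 8, all ranks equal to r = 10, all mode sizes equal to N = 100; the first term below is an upper bound since \(r_{\mathcal{N}_0}=1\)):

```python
# Illustrative storage counts (hypothetical numbers, not from the paper).
d, r, N = 8, 10, 100
htucker = (d - 1) * r**3 + d * r * N   # a binary tree with d leaves has d - 1 interior nodes
tucker  = r**d + d * r * N             # core tensor plus basis functions, cf. (1.1)
full    = N**d                         # entry-wise storage
print(htucker, tucker, full)           # 15000 100008000 10000000000000000
```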

Lemma 1

(orthonormality of transfer tensors)

Let \(Y \in\mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T})\) be a tensor represented by \(((\mathcal{B}^{\mathcal{N}})_{\mathcal{N}\in\mathcal{I}(T)}, ( U^{\ell})_{\ell\in\mathcal{L}(T)})\). Then, the following statements are equivalent:

  (a)

    The \(\mathcal{N}\)-frames \(( U^{\mathcal{N}} )_{\mathcal{N}\in T \setminus\{ \mathcal{N}_{0} \} }\) are orthonormal with respect to mapping (2.1), i.e.

    $$ \bigl\langle U^\mathcal{N}_{i} , U^\mathcal{N}_{j} \bigr\rangle= \delta_{i,j} $$

    holds for all \(\mathcal{N}\in T \setminus\{ \mathcal{N}_{0} \} \) and \(i,j \in\{1, \ldots, r_{\mathcal{N}}\} \).

  (b)

    For all \(\mathcal{N}\in\mathcal {I}(T)\setminus\{ \mathcal{N}_{0}\}, (\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit{succ}(\mathcal{N})\) and \(i_{1},i_{2} \in\{1, \dots, r_{\mathcal{N}}\}\) the equation

    $$ \sum_{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i_1,j,l} \mathcal{B}^\mathcal{N}_{i_2,j,l} = \delta_{i_1,i_2} $$
    (2.3)

    holds and the \(\ell\)-frames \(( U^{\ell})_{\ell\in\mathcal{L}(T)}\) are orthonormal with respect to mapping (2.1), i.e.

    $$ \bigl\langle U^\ell_{i} , U^\ell_{j} \bigr\rangle= \delta_{i,j} $$

    holds for all \(\ell\in\mathcal{L}(T), i,j \in\{1, \ldots, r_{\ell}\}\).

Proof

Let \(\mathcal{N}\in\mathcal{I}(T)\) be a node in the interior of the dimension tree, \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit {succ}(\mathcal{N})\) and \(i_{1},i_{2} \in\{1, \ldots,r_{\mathcal{N}}\}\). If (a) holds, then it follows from Definition 2.2 that

$$\begin{aligned} \bigl\langle U^\mathcal{N}_{i_1} , U^\mathcal{N}_{i_2} \bigr\rangle = & \sum_{j_1 = 1}^{r_{\mathcal{N}_1}} \sum _{l_1 = 1}^{r_{\mathcal{N}_2}} \sum _{j_2 = 1}^{r_{\mathcal{N}_1}} \sum_{l_2 = 1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i_1,j_1,l_1} \mathcal{B}^\mathcal{N}_{i_2,j_2,l_2} \bigl\langle U^{\mathcal{N}_1}_{j_1} \cdot U^{\mathcal{N}_2}_{l_1} , U^{\mathcal{N}_1}_{j_2} \cdot U^{\mathcal{N}_2}_{l_2} \bigr \rangle \\ = & \sum_{j_1 = 1}^{r_{\mathcal{N}_1}} \sum _{l_1 = 1}^{r_{\mathcal{N}_2}} \sum _{j_2 = 1}^{r_{\mathcal{N}_1}} \sum_{l_2 = 1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i_1,j_1,l_1} \mathcal{B}^\mathcal{N}_{i_2,j_2,l_2} \bigl\langle U^{\mathcal{N}_1}_{j_1} , U^{\mathcal{N}_1}_{j_2} \bigr\rangle\bigl\langle U^{\mathcal{N}_2}_{l_1} , U^{\mathcal{N}_2}_{l_2} \bigr\rangle \\ = & \sum_{j = 1}^{r_{\mathcal{N}_1}} \sum_{l = 1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i_1,j,l} \mathcal{B}^\mathcal{N}_{i_2,j,l} \end{aligned}$$
(2.4)

which proves (a) ⇒ (b). Now assume that (b) holds, and that \(\mathcal{N}_{1}, \mathcal{N}_{2} \in\mathcal{L}(T)\). Then, Eq. (2.4) yields that

$$\begin{aligned} \bigl\langle U^\mathcal{N}_{i_1} , U^\mathcal{N}_{i_2} \bigr\rangle= \delta_{i_1,i_2} \end{aligned}$$

for the father \(\mathcal{N}\) of \(\mathcal{N}_{1}\) and \(\mathcal{N}_{2}\), and by induction over the levels of the tree we obtain (b) ⇒ (a). □

Assumption 1

For every tensor Y in the hierarchical Tucker format we assume the \(\mathcal{N}\)-frames \(( U^{\mathcal{N}} )_{\mathcal{N}\in T \setminus\{\mathcal{N}_{0}\}}\) to be orthonormal with respect to mapping (2.1), i.e.

$$ \bigl\langle U^\mathcal{N}_i , U^\mathcal{N}_j \bigr\rangle= \delta_{i,j} \quad\text{for all } \mathcal{N}\in T\setminus \{\mathcal{N}_0\} ,\ i,j \in\{1, \ldots, r_\mathcal{N}\}. $$

Since for any \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) we can find a representation with orthonormal \(\mathcal{N}\)-frames, Assumption 1 is not a restriction; see [2, Algorithm 3] for an orthonormalization algorithm. It will be shown in Lemma 4 that when the tensor Y approximates the time-dependent solution of a high-dimensional differential equation, the differential equations for the \(\mathcal{N}\)-frames preserve the orthonormality for all times.

An important consequence of Assumption 1 is that for \(\mathcal{N}\in\mathcal{I}(T)\) with \((\mathcal{N}_{1}, \mathcal{N}_{2}) = \mathit{succ}(\mathcal{N})\), we have

$$\begin{aligned} \bigl\langle U^{\mathcal{N}_1}_k , U^\mathcal{N}_i \bigr\rangle= \sum_{j=1}^{r_{\mathcal{N}_1}} \sum _{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i,j,l} \bigl\langle U^{\mathcal{N}_1}_k , U^{\mathcal{N}_1}_j \bigr\rangle U^{\mathcal{N}_2}_l = \sum_{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i,k,l} U^{\mathcal{N}_2}_l \end{aligned}$$

for any \(k\in\{1, \ldots, r_{\mathcal{N}_{1}}\}\) and hence

$$\begin{aligned} \sum_{k=1}^{r_{\mathcal{N}_1}} U^{\mathcal{N}_1}_k \bigl\langle U^{\mathcal{N}_1}_k , U^\mathcal{N}_i \bigr\rangle= \sum_{k=1}^{r_{\mathcal{N}_1}} \sum _{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i,k,l} U^{\mathcal{N}_1}_k \cdot U^{\mathcal{N}_2}_l = U^\mathcal{N}_i. \end{aligned}$$

Substituting this into the recursive representation of Y yields

$$\begin{aligned} \sum_{k=1}^{r_{\mathcal{N}_1}} U^{\mathcal{N}_1}_k \bigl\langle U^{\mathcal{N}_1}_k , Y \bigr\rangle= Y \end{aligned}$$
(2.5)

for all \(\mathcal{N}_{1} \in T \setminus\{ \mathcal{N}_{0} \}\). These and similar formulas will be frequently used henceforth.

Lemma 2

(minimal representation rank)

Let \(Y \in\mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T})\) be a tensor in the hierarchical Tucker format represented by \(( ( \mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)} , ( U^{\ell})_{\ell\in\mathcal{L}(T)} )\). Assume that the tensors

$$ \bigl\langle Y , U^\mathcal{N}_1 \bigr\rangle, \ldots, \bigl \langle Y , U^\mathcal{N}_{r_\mathcal{N}} \bigr\rangle $$

are linearly independent for all \(\mathcal{N}\in T\setminus\{ \mathcal{N}_{0} \}\). Then, Y has minimal representation rank, which means that there exists no other family of positive integers \((r_{\mathcal{N}}^{*} )_{\mathcal{N}\in T}\) and no node \(\mathcal{N}^{*} \in T\) with

$$\begin{aligned} r_\mathcal{N}^* \leq r_\mathcal{N} \quad \textit{for all } \mathcal{N}\in T \textit{ and } r_{\mathcal{N}^*}^* < r_{\mathcal{N}^*}, \end{aligned}$$
(2.6)

such that \(Y \in\mathcal{H}\textit{-Tucker} ( ( r_{\mathcal{N}}^{*} )_{\mathcal{N}\in T} )\).

Proof

Suppose that Y can also be represented by \(( ( \widetilde{\mathcal{B}}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)}, ( \widetilde{U}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{L}(T)} )\) and that the hierarchical representation rank \((r_{\mathcal{N}}^{*} )_{\mathcal{N}\in T}\) of this representation has the properties (2.6). According to (2.5) we have

$$ Y = \sum_{i^* = 1}^{r^*_{\mathcal{N}^*}} \widetilde{U}^{\mathcal{N}^*}_{i^*} \bigl\langle Y , \widetilde{U}^{\mathcal{N}^*}_{i^*} \bigr\rangle= \sum _{i = 1}^{r_{\mathcal{N}^*}} U^{\mathcal{N}^*}_{i} \bigl\langle Y , U^{\mathcal{N}^*}_{i} \bigr\rangle. $$

Substituting the first representation into the second yields

$$ Y = \sum_{i = 1}^{r_{\mathcal{N}^*}} U^{\mathcal{N}^*}_{i} \sum_{i^* = 1}^{r^*_{\mathcal{N}^*}} \bigl\langle U^{\mathcal{N}^*}_{i} , \widetilde{U}^{\mathcal{N}^*}_{i^*} \bigr\rangle\bigl\langle Y , \widetilde{U}^{\mathcal{N}^*}_{i^*} \bigr \rangle $$

and hence

$$ \bigl\langle Y , U^{\mathcal{N}^*}_{i} \bigr\rangle= \sum _{i^* = 1}^{r^*_{\mathcal{N}^*}} \bigl\langle U^{\mathcal{N}^*}_{i} , \widetilde{U}^{\mathcal{N}^*}_{i^*} \bigr\rangle\bigl\langle Y , \widetilde{U}^{\mathcal{N}^*}_{i^*} \bigr\rangle. $$

This is a contradiction to the assumption that \(\langle Y , U^{\mathcal{N}^{*}}_{1} \rangle, \ldots, \langle Y , U^{\mathcal{N}^{*}}_{r_{\mathcal{N}^{*}}} \rangle\) are linearly independent. □

Remark

It can be shown that the converse assertion is also true.

Assumption 2

For every tensor Y in the hierarchical Tucker format and for every \(\mathcal{N}\in T \setminus\{\mathcal{N}_{0}\}\) we assume

$$ \bigl\langle Y , U^\mathcal{N}_1 \bigr\rangle, \ldots, \bigl \langle Y , U^\mathcal{N}_{r_\mathcal{N}} \bigr\rangle $$

to be linearly independent.

As shown in Sect. 4, under Assumption 2 the set \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) is a smooth manifold, which is essential in our approach. Furthermore, we will use the term ‘hierarchical rank’ instead of ‘hierarchical representation rank’ and refer to the \(\mathcal{N}\)-frames as basis functions. Assumption 2 and Lemma 2 motivate the following definition.

Definition 2.3

(Single-hole operator)

Let \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T})\) be a tensor in the hierarchical Tucker format, which is represented by the tuple \(( ( \mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal {I}(T)}, ( U^{\ell})_{\ell\in\mathcal{L}(T)} )\), let \(\mathcal{N}\in T \setminus\{ \mathcal{N}_{0}\}\), and let \(i \in\{1,\ldots, r_{\mathcal{N}}\}\). The single-hole operator \(\mathcal{S}\) is defined as

$$ \mathcal{S} (Y, \mathcal{N}, i) := \bigl\langle Y , U^\mathcal{N}_i \bigr\rangle. $$

This is a generalization of the single-hole functions defined in Sect. II.3.3 in [11].

Remark

With the single-hole operator, (2.5) reads

$$\begin{aligned} Y = \sum_{i = 1}^{r_\mathcal{N}} \mathcal{S} (Y, \mathcal{N}, i) \cdot U^\mathcal{N}_i \end{aligned}$$
(2.7)

for all \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T}),\mathcal{N}\in T \setminus\{\mathcal{N}_{0} \}\).
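For illustration, the single-hole tensors and the reconstruction (2.7) can be checked numerically. The following NumPy sketch (hypothetical sizes; the leaf \(\mathcal{N}=\{1\}\) of a three-dimensional tensor with an orthonormal frame matrix) computes \(\mathcal{S}(Y,\mathcal{N},i)=\langle Y, U^{\mathcal{N}}_{i}\rangle\) and verifies (2.7).

```python
import numpy as np

rng = np.random.default_rng(3)
N1, N2, N3, rN = 6, 7, 8, 3                      # hypothetical sizes and rank

# Frame matrix of the node N = {1} with orthonormal columns U^N_1, ..., U^N_{r_N}.
UN = np.linalg.qr(rng.standard_normal((N1, rN)))[0]
# Construct Y so that its {1}-matricization has rank r_N (then (2.7) holds exactly).
S_true = rng.standard_normal((rN, N2, N3))
Y = np.einsum('ia,ajk->ijk', UN, S_true)

# Single-hole tensors S(Y, N, i) = <Y, U^N_i>: contract over the common direction 1.
S = np.einsum('ijk,ia->ajk', Y, UN)              # S[i - 1] = S(Y, N, i)

# (2.7): Y = sum_i S(Y, N, i) . U^N_i
Y_rebuilt = np.einsum('ajk,ia->ijk', S, UN)
print(np.allclose(Y, Y_rebuilt))                 # True
```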

Definition 2.4

For \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T}), \mathcal{N}\in T \setminus\{\mathcal{N}_{0} \} \) and \(X \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) the orthogonal projection of X onto the space spanned by \(U^{\mathcal{N}}_{1} , \ldots, U^{\mathcal{N}}_{r_{\mathcal{N}}}\) is denoted by

$$ \mathcal{P}_\mathcal{N}X = \sum_{i=1}^{r_\mathcal{N}} \bigl\langle X , U^\mathcal{N}_i \bigr\rangle U^\mathcal{N}_i. $$

The orthogonal projection onto the orthogonal complement is given by \(\mathcal{P}_{\mathcal{N}}^{\bot}X = X- \mathcal{P}_{\mathcal{N}}X \).

3 Solving high-dimensional initial value problems via hierarchical tensor approximation

In this section we show how the hierarchical Tucker format can be used to approximate the time-dependent solution \(Y_{ex}(t)\in\mathbb {R}^{N_{1} \times\cdots\times N_{d}}\) of a high-dimensional ordinary differential equation

$$ \dot{Y}_{ex}= F(Y_{ex}) , \quad t \geq0 , \quad F: \mathbb{R}^{N_1 \times\cdots\times N_d} \rightarrow\mathbb{R}^{N_1 \times\cdots\times N_d} $$
(3.1)

with a given initial value \(Y_{ex}(0) \in\mathbb{R}^{N_{1} \times \cdots\times N_{d}}\) and a function F mapping tensors to tensors. Such problems arise, e.g., when the method of lines is applied to a high-dimensional partial differential equation. If the partial differential equation is linear, then the function F is linear, too. The exact solution \(Y_{ex}(t)\) of (3.1) is a time-dependent, d-dimensional tensor with \(\prod_{\mu= 1}^{d} N_{\mu}\) entries. Hence, classical numerical schemes for ordinary differential equations cannot be applied if d≫3 due to the huge number of unknowns. Therefore, we are looking for an approximation \(Y \approx Y_{ex}\) which lies on the manifold

$$ \mathcal{M} := \mathcal{H}\text{-Tucker}\bigl((r_\mathcal {N})_{\mathcal{N}\in T}\bigr) $$

at any time. For this purpose, the variational principle of Dirac-Frenkel is applied (cf. [11]): Starting with an approximation of the initial value on the manifold, the right-hand side F(Y) is projected onto \(\mathcal {T}_{Y}\mathcal{M}\), the tangent space of \(\mathcal{M}\) at Y(t), for every t≥0; see Fig. 2 for an illustration.

Fig. 2  Orthogonal projection on the tangent space

With P(Y) denoting the orthogonal projector on the tangent space \(\mathcal{T}_{Y} \mathcal{M}\) the differential equation for \(Y(t)\in\mathcal{M}\) takes the form

$$ \dot{Y} = P(Y) F(Y) \in\mathcal{T}_Y \mathcal{M}, \quad t \geq0, \quad Y(0) \in\mathcal{M} $$
(3.2)

or equivalently

$$ \bigl\langle F (Y) - \dot{Y} , \delta Y \bigr \rangle= 0 \quad\forall\delta Y \in\mathcal{T}_Y \mathcal{M}. $$

Even in applications where the original problem (3.1) is linear, the new differential equation (3.2) is nonlinear. The great advantage of (3.2) is the fact that for the propagation of Y(t) only the transfer tensors \(\mathcal{B}^{\mathcal{N}}(t)\), \(\mathcal{N}\in\mathcal{I}(T)\) and the basis functions \(U^{\ell}_{i}(t)\), \(\ell\in\mathcal{L}(T)\) for the time-dependent representation of Y(t) have to be computed. In order to make use of this advantage, however, (3.2) has to be replaced by differential equations for \(\mathcal{B}^{\mathcal{N}}(t)\) and \(U^{\ell}_{i}(t)\) instead of Y(t). This is the goal of this section.

Since every point Y on the manifold is determined by

$$ \bigl( \bigl( \mathcal{B}^\mathcal{N}\bigr)_{\mathcal{N}\in \mathcal{I}(T)}, \bigl(U^\ell\bigr)_{\ell\in\mathcal{L}(T)} \bigr), $$

the transfer tensors and the basis functions on the leaves of the dimension tree can be considered as the parameters of the representation. Let Ψ be the mapping which maps the parameters to the corresponding tensor, i.e.

$$ \varPsi: \bigl( \bigl( \widetilde{\mathcal{B}}^\mathcal{N}\bigr)_{\mathcal{N}\in\mathcal{I}(T)}, \bigl(\widetilde{U}^\ell\bigr)_{\ell\in\mathcal{L}(T)} \bigr) \mapsto\widetilde{Y}, $$

where \(\widetilde{Y}\) is constructed according to Definition 2.2. Then, each element δY of the tangent space \(\mathcal{T}_{Y} \mathcal{M}\) can be written as

$$\begin{aligned} \delta Y = & \sum_{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} \frac{\partial\varPsi}{\partial \widetilde{\mathcal{B}}^{\mathcal{N}_0}_{1,j,l}} \Bigg\vert_{{\scriptstyle \widetilde{\mathcal{B}}^\mathcal{N}= \mathcal{B}^\mathcal{N}\forall\mathcal{N}\in\mathcal{I}(T)\atop\scriptstyle \widetilde{U}^\ell= U^\ell\forall\ell\in\mathcal{L}(T)}} \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} \\ &{} + \sum_{{\scriptstyle \mathcal{N}\in\mathcal{I}(T) \setminus\{ \mathcal{N}_0 \}\atop\scriptstyle (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ} (\mathcal{N})} } \sum _{i=1}^{r_{\mathcal{N}}} \sum_{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} \frac{\partial\varPsi}{\partial \widetilde{\mathcal{B}}^{\mathcal{N}}_{i,j,l}} \Bigg\vert_{{\scriptstyle \widetilde{\mathcal{B}}^\mathcal{N}= \mathcal{B}^\mathcal {N}\forall\mathcal{N}\in\mathcal{I}(T)\atop\scriptstyle \widetilde{U}^\ell= U^\ell\forall\ell\in\mathcal{L}(T)}} \delta \mathcal{B}^{\mathcal{N}}_{i,j,l} \\ & {}+ \sum_{\ell\in\mathcal{L}(T)} \sum _{i=1}^{r_{\ell}} \frac{\partial\varPsi}{\partial \widetilde{U}^{\ell}_{i}} \Bigg\vert _{{\scriptstyle \widetilde{\mathcal{B}}^\mathcal{N}= \mathcal{B}^\mathcal {N}\forall\mathcal{N}\in\mathcal{I}(T)\atop\scriptstyle \widetilde{U}^\ell= U^\ell\forall\ell\in\mathcal{L}(T)} } \delta U^{\ell}_i , \end{aligned}$$
(3.3)

where the tuple \(( ( \delta\mathcal{B}^{\mathcal{N}} )_{\mathcal {N}\in\mathcal{I}(T)}, ( \delta U^{\ell})_{\ell\in\mathcal{L}(T)} )\) represents all possible variations of the transfer tensors and the basis functions on the leaves of the dimension tree. Since Ψ is linear in each argument, we find with

$$\begin{aligned} \frac{\partial\varPsi}{\partial \widetilde{\mathcal{B}}^{\mathcal{N}_0}_{1,j,l}} \bigg\vert _{{\scriptstyle \widetilde{\mathcal{B}}^\mathcal{N}= \mathcal{B}^\mathcal {N}\forall\mathcal{N}\in\mathcal{I}(T)\atop\scriptstyle \widetilde{U}^\ell= U^\ell\forall\ell\in\mathcal{L}(T)} } & = U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l, \quad ( \mathcal{N}_1, \mathcal{N}_2) = \mathit{succ}(\mathcal{N}_0) , \end{aligned}$$
(3.4)
$$\begin{aligned} \frac{\partial\varPsi}{\partial \widetilde{\mathcal{B}}^{\mathcal{N}}_{i,j,l}} \bigg\vert _{{\scriptstyle \widetilde{\mathcal{B}}^\mathcal{N}= \mathcal{B}^\mathcal {N}\forall\mathcal{N}\in\mathcal{I}(T)\atop\scriptstyle \widetilde{U}^\ell= U^\ell\forall\ell\in\mathcal{L}(T)} } & = \mathcal{S} (Y, \mathcal{N}, i) \cdot U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l, \quad (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ}(\mathcal{N}) \end{aligned}$$
(3.5)

and

$$\begin{aligned} \frac{\partial\varPsi}{\partial \widetilde{U}^{\ell}_{i}} \bigg\vert _{{\scriptstyle \widetilde{\mathcal{B}}^\mathcal{N}= \mathcal{B}^\mathcal {N}\forall\mathcal{N}\in\mathcal{I}(T)\atop\scriptstyle \widetilde{U}^\ell= U^\ell\forall\ell\in\mathcal{L}(T)} } & = \mathcal{S} (Y, \ell, i) \end{aligned}$$
(3.6)

expressions for every partial derivative in (3.3). Therefore, the tangent space can be written as

$$\begin{aligned} \mathcal{T}_Y \mathcal{M}= \bigl\{ \delta Y \in\mathbb{R}^{N_1 \times\cdots\times N_d} \colon\ & \text{there exists a tuple } \bigl( \bigl( \delta\mathcal {B}^\mathcal{N}\bigr)_{\mathcal{N}\in\mathcal{I}(T)}, \bigl(\delta U^\ell\bigr)_{\ell\in\mathcal{L}(T)} \bigr) \\ & \text{such that (3.3), (3.4), (3.5) and (3.6) hold} \bigr\}. \end{aligned}$$

Remark

Since Assumption 1 means no restriction for the set \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\), the orthogonality conditions mean no restriction for the tangent space either. Even the restriction of the mapping Ψ to transfer tensors and basis functions which fulfill Assumption 2 does not influence the tangent space, since the restricted set is a dense and open subset of the domain, cf. Sect. 3.2 in [14].

We are now ready to formulate the differential equations for the transfer tensors and the basis functions at the leaves. In view of the recursive definition of \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\), it does not come as a surprise that these differential equations are defined recursively, too. The same result has also been obtained with other techniques in [12] and [14].

Theorem 3

(equations of motion)

Let \(Y \in\mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T})\) be a tensor in the hierarchical Tucker format, let \((\mathcal{B}^{\mathcal{N}} )_{\mathcal{N} \in\mathcal{I}(T)}\) be the corresponding transfer tensors, and let \(( U^{\ell})_{\ell\in\mathcal{L}(T)}\) be the corresponding basis functions on the leaves of the dimension tree. Furthermore, we define for every \(\mathcal{N}\in T \setminus\{ \mathcal{N}_{0} \}\) the symmetric stiffness matrix

$$ M^{\mathcal{N}}= \bigl(m_{ij}^{\mathcal{N}} \bigr)_{i,j=1}^{r_\mathcal{N}} \in\mathbb{R}^{r_\mathcal{N}\times r_\mathcal{N}} \quad \textit{with entries } m_{ij}^{\mathcal{N}} = \bigl\langle \mathcal{S} (Y, \mathcal{N}, i ) , \mathcal{S} (Y, \mathcal{N}, j ) \bigr\rangle $$
(3.7)

and its inverse

$$ W^{\mathcal{N}} = \bigl (w_{ij}^{\mathcal{N}} \bigr)_{i,j=1}^{r_\mathcal{N}} = \bigl( M^{\mathcal{N}} \bigr)^{-1}. $$
(3.8)

Then, the differential equation

$$\begin{aligned} \dot{Y} = P(Y) F(Y) , \quad t \geq0 \end{aligned}$$

is equivalent to

$$\begin{aligned} \dot{Y} = & \sum_{j=1}^{r_{\mathcal{N}_{01}}} \sum_{l=1}^{r_{\mathcal{N}_{02}}} \dot{\mathcal{B}}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l \\ & {}+ \sum_{j=1}^{r_{\mathcal{N}_{01}}} \sum _{l=1}^{r_{\mathcal{N}_{02}}} \mathcal{B}^{\mathcal{N}_0}_{1,j,l} \dot{U}^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l + \sum_{j=1}^{r_{\mathcal{N}_{01}}} \sum _{l=1}^{r_{\mathcal{N}_{02}}} \mathcal{B}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_{01}}_j \cdot\dot{U}^{\mathcal{N}_{02}}_l \end{aligned}$$
(3.9)

with \((\mathcal{N}_{01},\mathcal{N}_{02}) = \mathit{succ} (\mathcal {N}_{0})\). The time derivatives of the basis functions at an interior node of the dimension tree satisfy the recursion

$$\begin{aligned} \dot{U}^\mathcal{N}_i = & \sum _{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} \dot{\mathcal{B}}^\mathcal{N}_{i,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \\ & {}+ \sum_{j=1}^{r_{\mathcal{N}_1}} \sum _{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i,j,l} \dot{U}^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l + \sum_{j=1}^{r_{\mathcal{N}_1}} \sum _{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i,j,l} U^{\mathcal{N}_1}_j \cdot\dot{U}^{\mathcal{N}_2}_l \end{aligned}$$
(3.10)

for all \(\mathcal{N}\in\mathcal{I} (T) \setminus\{\mathcal{N}_{0} \} \) with \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit{succ} (\mathcal{N})\). On the leaves of the dimension tree we have

$$ \dot{U}^{\ell}_{i} = \sum _{i' = 1}^{r_{\ell}} w_{i', i}^{\ell} \mathcal{P}^\bot_{\ell} \bigl\langle F(Y) , \mathcal{S} \bigl(Y, \ell, i' \bigr) \bigr\rangle, \quad\ell\in \mathcal{L}(T),\ i=1,\dots, r_{\ell}. $$
(3.11)

The transfer tensors evolve according to the differential equations

$$ \dot{\mathcal{B}}^{\mathcal{N}_0}_{1,j,l} = \bigl\langle F(Y) , U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l \bigr \rangle $$

and

$$ \dot{\mathcal{B}}^{\mathcal{N}}_{i,j,l} = \sum _{i'=1}^{r_\mathcal{N}} w_{i,i'}^\mathcal{N}\bigl \langle F(Y) , \mathcal{S} \bigl(Y, \mathcal{N}, i' \bigr) \cdot \mathcal{P}^\bot_{\mathcal{N}} \bigl( U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr) \bigr\rangle. $$

We remark that the crucial terms are \(\dot{U}^{\ell}_{i}\) for \(\ell\in\mathcal{L}(T)\) and \(\dot{\mathcal{B}}^{\mathcal{N}}_{i,j,l}\) for \(\mathcal{N}\in \mathcal{I}(T)\). If these objects are known, all \(\dot{U}^{\mathcal{N}}_{i}\) with \(\mathcal{N}\in\mathcal{I} (T) \setminus\{\mathcal{N}_{0} \}\) can be computed recursively via (3.10). Hence, the time derivatives of the \(\mathcal{N}\)-frames at the interior nodes can be considered as auxiliary variables.
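For the leaf equations (3.11), the following NumPy sketch shows one possible way to assemble the right-hand side in matrix form (anticipating the matricizations of Sect. 4). It is a minimal illustration with random data; the matrices `Ul`, `S` and `FY` are stand-ins for the leaf basis, the vectorized single-hole tensors and the ℓ-matricization of F(Y), respectively, and are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
Nl, rl, ncols = 15, 4, 200     # hypothetical N_l, r_l and pi(l_C), the product of the other mode sizes

Ul = np.linalg.qr(rng.standard_normal((Nl, rl)))[0]   # leaf basis with orthonormal columns
S  = rng.standard_normal((ncols, rl))                 # column i: vectorization of S(Y, l, i)
FY = rng.standard_normal((Nl, ncols))                 # l-matricization of F(Y)

M = S.T @ S                                           # stiffness matrix (3.7)
W = np.linalg.inv(M)                                  # its inverse (3.8)
P_orth = np.eye(Nl) - Ul @ Ul.T                       # projector onto the complement of span(U^l)

G = FY @ S                                            # column i': the vector <F(Y), S(Y, l, i')>
Udot = P_orth @ G @ W                                 # column i: the right-hand side of (3.11)

# Gauge condition of Lemma 4: <Udot^l_i, U^l_j> = 0 for all i, j.
print(np.allclose(Ul.T @ Udot, 0.0))                  # True
```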

Before the proof of Theorem 3, we show that the solution of the equations of motion preserves the orthonormality of the basis functions.

Lemma 4

(Gauge-conditions)

Let \(( ( \mathcal{B}^{\mathcal{N}}(t) )_{\mathcal{N}\in\mathcal {I}(T)} , ( U^{\ell}(t) )_{\ell\in\mathcal{L}(T)} )\) be the solution of the equations of motion of Theorem 3, and let \(( (\dot{\mathcal{B}}^{\mathcal{N}}(t) )_{\mathcal{N}\in\mathcal{I}(T)} , ( \dot{U}^{\ell}(t) )_{\ell\in \mathcal{L}(T)} )\) be the corresponding time derivatives. Then the Gauge-condition

$$ \bigl\langle\dot{U}^\mathcal{N}_i (t) , U^\mathcal{N}_j (t) \bigr\rangle= 0 $$

is satisfied for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\), \(i,j \in\{1, \ldots,r_{\mathcal{N}}\}\) and t≥0. Furthermore, the orthonormality of the basis functions

$$ \bigl\langle U^\mathcal{N}_i (t) , U^\mathcal{N}_j (t) \bigr\rangle= \delta_{i,j} $$
(3.12)

is preserved for all t≥0 if (3.12) holds for t=0.

Proof of Lemma 4

For all \(\mathcal{N}\in\mathcal{I}(T)\setminus\{\mathcal{N}_{0}\}\), \((\mathcal{N}_{1},\mathcal{N}_{2} ) = \mathit{succ}(\mathcal{N})\) and \(i_{1},i_{2} \in\{1, \ldots, r_{\mathcal{N}}\} \) the equations

$$\begin{aligned} &\sum_{j=1}^{r_{\mathcal{N}_1}} \sum _{l=1}^{r_{\mathcal{N}_2}} \dot{\mathcal{B}}^\mathcal{N}_{i_1,j,l} \mathcal{B}^\mathcal{N}_{i_2,j,l} \\ &\quad= \sum_{j=1}^{r_{\mathcal{N}_1}} \sum _{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i_2,j,l} \sum_{i=1}^{r_\mathcal{N}} w_{i_1,i}^\mathcal{N} \bigl\langle F(Y) , \mathcal{S} (Y, \mathcal{N}, i ) \cdot\mathcal {P}^\bot_{\mathcal{N}} \bigl( U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr) \bigr\rangle \\ &\quad= \sum_{i =1}^{r_\mathcal{N}} w_{i_1,i}^\mathcal{N}\Biggl\langle F(Y) , \mathcal{S} (Y, \mathcal {N}, i ) \cdot\mathcal{P}^\bot_{\mathcal{N}} \Biggl( \sum _{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i_2,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \Biggr) \Biggr\rangle \\ &\quad= 0 \end{aligned}$$
(3.13)

hold. Furthermore, we have

$$\begin{aligned} \bigl\langle\dot{U}^\ell_{i_1} , U^\ell_{i_2} \bigr\rangle = & \Biggl\langle\sum_{i = 1}^{r_{\ell}} w_{i, i_1}^{\ell} \mathcal{P}^\bot_{\ell} \bigl \langle F(Y) , \mathcal{S} (Y, \ell, i ) \bigr\rangle, U^\ell_{i_2} \Biggr\rangle= 0 \end{aligned}$$

for all \(\ell\in\mathcal{L}(T), i_{1},i_{2} \in\{1, \ldots, r_{\ell}\} \). Now suppose that

$$ \bigl\langle\dot{U}^{\mathcal{N}_1}_{j_1} , U^{\mathcal{N}_1}_{j_2} \bigr\rangle= 0 \quad\text{and} \quad \bigl\langle\dot{U}^{\mathcal{N}_2}_{l_1} , U^{\mathcal{N}_2}_{l_2} \bigr\rangle = 0 $$

for an interior node \(\mathcal{N}\in\mathcal{I}(T)\setminus\{ \mathcal{N}_{0} \}\), \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit {succ}(\mathcal{N})\) for all \(j_{1},j_{2} \in\{1, \ldots, r_{\mathcal{N}_{1}} \}\) and \(l_{1},l_{2} \in\{1, \ldots, r_{\mathcal{N}_{2}} \}\). Then with (3.13) we deduce

$$\begin{aligned} &\bigl\langle\dot{U}^{\mathcal{N}}_{i_1} , U^{\mathcal{N}}_{i_2} \bigr\rangle \\ &\quad= \sum_{j_1=1}^{r_{\mathcal{N}_1}} \sum _{l_1=1}^{r_{\mathcal{N}_2}} \sum_{j_2=1}^{r_{\mathcal{N}_1}} \sum_{l_2=1}^{r_{\mathcal{N}_2}} \dot{\mathcal{B}}^\mathcal {N}_{i_1,j_1,l_1} \mathcal{B}^\mathcal{N}_{i_2,j_2,l_2} \bigl\langle U^{\mathcal{N}_1}_{j_1} \cdot U^{\mathcal{N}_2}_{l_1} , U^{\mathcal{N}_1}_{j_2} \cdot U^{\mathcal{N}_2}_{l_2} \bigr\rangle \\ &\qquad {}+ \sum_{j_1=1}^{r_{\mathcal{N}_1}} \sum _{l_1=1}^{r_{\mathcal{N}_2}} \sum_{j_2=1}^{r_{\mathcal{N}_1}} \sum_{l_2=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i_1,j_1,l_1} \mathcal{B}^\mathcal{N}_{i_2,j_2,l_2} \bigl\langle\dot {U}^{\mathcal{N}_1}_{j_1} \cdot U^{\mathcal{N}_2}_{l_1} , U^{\mathcal{N}_1}_{j_2} \cdot U^{\mathcal{N}_2}_{l_2} \bigr\rangle \\ & \qquad{}+ \sum_{j_1=1}^{r_{\mathcal{N}_1}} \sum _{l_1=1}^{r_{\mathcal{N}_2}} \sum_{j_2=1}^{r_{\mathcal{N}_1}} \sum_{l_2=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i_1,j_1,l_1} \mathcal{B}^\mathcal{N}_{i_2,j_2,l_2} \bigl\langle U^{\mathcal{N}_1}_{j_1} \cdot\dot{U}^{\mathcal{N}_2}_{l_1} , U^{\mathcal{N}_1}_{j_2} \cdot U^{\mathcal{N}_2}_{l_2} \bigr\rangle \\ &\quad= 0 \end{aligned}$$

for all \(i_{1},i_{2} \in\{1, \ldots, r_{\mathcal{N}}\}\). Hence, the Gauge-conditions are satisfied, and the equations

$$\begin{aligned} \bigl\langle U^{\mathcal{N}}_{i} (t) , U^{\mathcal{N}}_{j} (t) \bigr\rangle =& \bigl\langle U^{\mathcal{N}}_{i} (0) , U^{\mathcal{N}}_{j} (0) \bigr\rangle+ \int_0^t \bigl\langle\dot{U}^{\mathcal{N}}_{i} (s) , U^{\mathcal{N}}_{j} (s) \bigr\rangle+ \bigl\langle U^{\mathcal{N}}_{i} (s) , \dot{U}^{\mathcal{N}}_{j} (s) \bigr\rangle\,ds \\ = & \delta_{i,j} \end{aligned}$$

show the preservation of the orthonormality. □

Proof of Theorem 3

In order to prove the theorem it has to be shown that the variational equation

$$ \bigl\langle F(Y) - \dot{Y} , \delta Y \bigr\rangle= 0 $$

is satisfied for every variation \(\delta Y \in\mathcal{T}_{Y} \mathcal{M}\).

(i) We start with the variation

$$ \delta Y = \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l, \qquad (\mathcal{N}_1,\mathcal{N}_2 ) = \mathit{succ} (\mathcal{N}_0) $$
(3.14)

for an arbitrary \(\delta\mathcal{B}^{\mathcal{N}_{0}} \in\mathbb {R}^{r_{\mathcal{N}_{1}} \times r_{\mathcal{N}_{2}}}, j \in\{1, \ldots, r_{\mathcal{N}_{1}} \},l \in\{1,\ldots, r_{\mathcal{N}_{2}} \} \). Together with (3.9) this yields

$$\begin{aligned} \langle\dot{Y} , \delta Y \rangle = & \Biggl\langle\sum _{j'=1}^{r_{\mathcal{N}_1} } \sum_{l'=1}^{r_{\mathcal{N}_2} } \dot{\mathcal{B}}^{\mathcal{N}_0}_{1,j',l'} U^{\mathcal{N}_1}_{j'} \cdot U^{\mathcal{N}_2}_{l'} , \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \Biggr \rangle \\ & {}+ \Biggl\langle\sum_{j'=1}^{r_{\mathcal{N}_1} } \sum_{l'=1}^{r_{\mathcal{N}_2} } \mathcal{B}^{\mathcal{N}_0}_{1,j',l'} \dot{U}^{\mathcal{N}_1}_{j'} \cdot U^{\mathcal{N}_2}_{l'} , \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \Biggr\rangle \\ & {}+ \Biggl\langle\sum_{j'=1}^{r_{\mathcal{N}_1} } \sum_{l'=1}^{r_{\mathcal{N}_2} } \mathcal{B}^{\mathcal{N}_0}_{1,j',l'} U^{\mathcal{N}_1}_{j'} \cdot\dot{U}^{\mathcal{N}_2}_{l'} , \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \Biggr\rangle \\ = & \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} \dot{ \mathcal{B}}^{\mathcal{N}_0}_{1,j,l} \\ = & \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} \bigl\langle P(Y) F(Y) , U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr \rangle \\ = & \bigl\langle F(Y) , \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr \rangle \\ =& \bigl\langle F(Y) , \delta Y \bigr\rangle. \end{aligned}$$

This proves that

$$ \bigl\langle F(Y) - \dot{Y} , \delta Y \bigr\rangle= 0 \quad\text{for all } \delta Y = \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l. $$

(ii) Next, we assume \(\mathcal{N}\in\mathcal{I}(T)\setminus\{ \mathcal{N}_{0}\}\), \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit{succ}(\mathcal{N}) \) and choose the variation

$$ \delta Y = \delta\mathcal{B}^\mathcal{N}_{i,j,l} \mathcal{S}(Y, \mathcal{N}, i) \cdot U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l $$
(3.15)

for an arbitrary \(\delta\mathcal{B}^{\mathcal{N}} \in\mathbb {R}^{r_{\mathcal{N}}\times r_{\mathcal{N}_{1}} \times r_{\mathcal{N}_{2}}}\), \(i \in\{1,\ldots, r_{\mathcal{N}} \}\), \(j \in\{1,\ldots, r_{\mathcal{N}_{1}} \}\), \(l \in\{1,\ldots, r_{\mathcal{N}_{2}} \} \). Differentiating both sides of (2.7) with respect to t gives that

$$\begin{aligned} \langle\dot{Y} , \delta Y \rangle = & \Biggl\langle\sum _{i' = 1}^{r_\mathcal{N}} \dot{\mathcal{S}} \bigl(Y, \mathcal{N} ,i'\bigr) \cdot U^\mathcal{N}_{i'} , \delta \mathcal{B}^\mathcal{N}_{i,j,l} \mathcal{S}(Y, \mathcal{N}, i) \cdot U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \Biggr \rangle \\ & {}+ \Biggl\langle\sum_{i' = 1}^{r_\mathcal{N}} \mathcal{S} \bigl(Y, \mathcal{N},i'\bigr) \cdot\dot{U}^\mathcal{N}_{i'} , \delta\mathcal{B}^\mathcal{N}_{i,j,l} \mathcal{S}(Y, \mathcal {N}, i) \cdot U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \Biggr\rangle \\ = & \sum_{i' = 1}^{r_\mathcal{N}} \delta \mathcal{B}^\mathcal{N}_{i,j,l} \bigl\langle\dot{\mathcal{S}} \bigl(Y, \mathcal{N},i'\bigr) , \mathcal{S}(Y, \mathcal{N}, i) \bigr\rangle\bigl \langle U^\mathcal{N}_{i'} , U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr\rangle \\ & {}+ \sum_{i' = 1}^{r_\mathcal{N}} \delta \mathcal{B}^\mathcal{N}_{i,j,l} \bigl\langle\mathcal{S} \bigl(Y, \mathcal{N} ,i'\bigr) , \mathcal{S}(Y, \mathcal{N}, i) \bigr\rangle\bigl \langle \dot{U}^\mathcal{N}_{i'} , U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr\rangle \\ = & \sum_{i' = 1}^{r_\mathcal{N}} \delta \mathcal{B}^\mathcal{N}_{i,j,l} \bigl\langle P(Y) F(Y) , \mathcal{S}(Y, \mathcal{N}, i) \cdot U^\mathcal{N}_{i'} \bigr \rangle \mathcal{B}^\mathcal{N}_{i',j,l} \\ & {}+ \sum_{i' = 1}^{r_\mathcal{N}} \delta \mathcal{B}^\mathcal{N}_{i,j,l} m_{i',i}^\mathcal{N} \dot{\mathcal{B}}^\mathcal{N}_{i',j,l} \\ = & \sum_{i' = 1}^{r_\mathcal{N}} \delta \mathcal{B}^\mathcal{N}_{i,j,l} \bigl\langle P(Y) F(Y) , \mathcal{S}(Y, \mathcal{N}, i) \cdot U^\mathcal{N}_{i'} \bigr \rangle \mathcal{B}^\mathcal{N}_{i',j,l} \\ & {}+ \sum_{i'' = 1}^{r_\mathcal{N}} \delta \mathcal{B}^\mathcal{N}_{i,j,l} \bigl\langle F(Y) , \mathcal{S} \bigl( Y, \mathcal{N}, i''\bigr) \cdot\mathcal{P}_\mathcal {N}^\bot \bigl( U^{\mathcal{N}_1} _j \cdot U^{\mathcal{N}_2}_l \bigr) \bigr\rangle\Biggl( \sum_{i' = 1}^{r_\mathcal{N}} m_{i',i}^\mathcal{N}w_{i',i''}^\mathcal{N}\Biggr) \end{aligned}$$

and since \(\mathcal{S}(Y, \mathcal{N}, i) \cdot U^{\mathcal{N}}_{i'} \in\mathcal{T}_{Y} \mathcal{M}\) we obtain

$$\begin{aligned} \langle\dot{Y} , \delta Y \rangle = & \sum _{i' = 1}^{r_\mathcal{N}} \delta\mathcal{B}^\mathcal{N} _{i,j,l} \bigl\langle F(Y) , \mathcal{S}(Y, \mathcal{N}, i) \cdot U^\mathcal{N}_{i'} \bigr\rangle\mathcal{B}^\mathcal{N}_{i',j,l} \\ & {}+ \delta\mathcal{B}^\mathcal{N}_{i,j,l} \bigl \langle F(Y) , \mathcal{S} ( Y, \mathcal{N}, i) \cdot\mathcal {P}_\mathcal{N}^\bot \bigl( U^{\mathcal{N}_1} _j \cdot U^{\mathcal{N}_2}_l \bigr) \bigr\rangle. \end{aligned}$$
(3.16)

On the other hand

$$\begin{aligned} \bigl\langle F(Y) , \delta Y \bigr\rangle = & \bigl\langle F(Y) , \delta\mathcal{B}^\mathcal{N}_{i,j,l} \mathcal{S}(Y, \mathcal{N}, i) \cdot U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr \rangle \\ = & \delta\mathcal{B}^\mathcal{N}_{i,j,l} \bigl\langle F(Y) , \mathcal{S}(Y, \mathcal{N}, i) \cdot\mathcal{P}_\mathcal {N}\bigl( U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr) \bigr\rangle \\ & {}+ \delta\mathcal{B}^\mathcal{N}_{i,j,l} \bigl\langle F(Y) , \mathcal{S}(Y, \mathcal{N}, i) \cdot\mathcal{P}_\mathcal {N}^\bot \bigl( U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr) \bigr\rangle \\ = & \sum_{i' = 1}^{r_\mathcal{N}} \delta\mathcal{B}^\mathcal{N}_{i,j,l} \bigl\langle F(Y) , \mathcal{S}(Y, \mathcal{N}, i) \cdot U^\mathcal{N}_{i'} \bigr \rangle \mathcal{B}^\mathcal{N}_{i',j,l} \\ & {}+ \delta\mathcal{B}^\mathcal{N}_{i,j,l} \bigl\langle F(Y) , \mathcal{S}(Y, \mathcal{N}, i) \cdot\mathcal{P}_\mathcal {N}^\bot \bigl( U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr) \bigr\rangle \end{aligned}$$
(3.17)

holds. Since (3.16) is equal to (3.17), it follows that

$$ \bigl\langle F(Y) - \dot{Y} , \delta Y \bigr\rangle= 0 \quad\text{for all } \delta Y = \delta\mathcal{B}^\mathcal{N}_{i,j,l} \mathcal{S}(Y, \mathcal {N}, i) \cdot U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l. $$

(iii) Finally, let \(\ell\in\mathcal{L}(T)\) and consider the variation

$$ \delta Y = \mathcal{S} (Y, \ell, i) \cdot \delta U^\ell_i $$
(3.18)

for an arbitrary \(\delta U^{\ell}_{i} : I_{\ell}\rightarrow\mathbb{R}\), \(i\in\{1, \ldots,r_{\ell}\}\). As before, the time derivative of (2.7) yields

$$\begin{aligned} \langle\dot{Y} , \delta Y \rangle = & \Biggl\langle\sum _{i' = 1}^{r_\ell} \dot{\mathcal{S}} \bigl(Y, \ell ,i'\bigr) \cdot U^\ell_{i'} , \mathcal{S} (Y, \ell, i) \cdot\delta U^\ell_i \Biggr\rangle \\ & {}+ \Biggl\langle\sum_{i' = 1}^{r_\ell} \mathcal{S} \bigl(Y, \ell,i'\bigr) \cdot \dot{U}^\ell_{i'} , \mathcal{S} (Y, \ell, i) \cdot\delta U^\ell_i \Biggr\rangle \\ = & \sum_{i' = 1}^{r_\ell} \bigl \langle\dot{\mathcal{S}}\bigl( Y, \ell, i'\bigr) , \mathcal{S}( Y, \ell, i) \bigr\rangle\bigl\langle U^\ell_{i'} , \delta U^\ell_{i} \bigr\rangle \\ & {}+ \sum_{i' = 1}^{r_\ell}\bigl \langle\mathcal{S}\bigl( Y, \ell, i'\bigr) , \mathcal{S}( Y, \ell, i) \bigr\rangle\bigl\langle\dot{U}^\ell_{i'} , \delta U^\ell_{i} \bigr\rangle \\ = & \sum_{i' = 1}^{r_\ell} \bigl \langle P(Y) F(Y) , \mathcal{S}( Y, \ell, i) \cdot U^\ell_{i'} \bigr\rangle\bigl\langle U^\ell_{i'} , \delta U^\ell_{i} \bigr\rangle \\ & {}+ \sum_{i' = 1}^{r_\ell} m_{i',i}^\ell\Biggl\langle\sum _{i''=1}^{r_\ell} w_{i',i''}^\ell \mathcal{P}_\ell^\bot\bigl\langle F(Y) , \mathcal{S} \bigl( Y, \ell, i''\bigr) \bigr\rangle, \delta U^\ell_{i} \Biggr\rangle \\ = & \sum_{i' = 1}^{r_\ell} \bigl \langle P(Y) F(Y) , \mathcal{S}( Y, \ell, i) \cdot U^\ell_{i'} \bigr\rangle\bigl\langle U^\ell_{i'} , \delta U^\ell_{i} \bigr\rangle \\ & {}+ \sum_{i'' = 1}^{r_\ell} \bigl \langle\mathcal{P}_\ell^\bot F(Y) , \mathcal{S} \bigl( Y, \ell, i''\bigr) \cdot\delta U^\ell_{i} \bigr\rangle\Biggl( \sum_{i'=1}^{r_\ell} m_{i',i}^\ell w_{i',i''}^\ell\Biggr) \end{aligned}$$

and with \(\mathcal{S}( Y, \ell, i) \cdot U^{\ell}_{i'} \in\mathcal {T}_{Y} \mathcal{M}\) we get

$$\begin{aligned} \langle\dot{Y} , \delta Y \rangle = & \sum _{i' = 1}^{r_\ell} \bigl\langle F(Y) , \mathcal{S}( Y, \ell , i) \cdot U^\ell_{i'} \bigr\rangle\bigl\langle U^\ell_{i'} , \delta U^\ell_{i} \bigr\rangle \\ & {}+ \bigl\langle F(Y) , \mathcal{S} ( Y, \ell, i) \cdot \mathcal{P}_\ell^\bot\delta U^\ell_{i} \bigr\rangle. \end{aligned}$$
(3.19)

On the other hand

$$\begin{aligned} \bigl\langle F(Y) , \delta Y \bigr\rangle = & \bigl\langle F(Y) , \mathcal{S} (Y, \ell, i) \cdot\delta U^\ell_i \bigr \rangle \\ = & \bigl\langle F(Y) , \mathcal{S} (Y, \ell, i) \cdot \mathcal{P}_\ell\delta U^\ell_i \bigr\rangle \\ & {}+ \bigl\langle F(Y) , \mathcal{S} (Y, \ell, i) \cdot \mathcal{P}_\ell^\bot\delta U^\ell_i \bigr\rangle \\ = & \sum_{i'=1}^{r_\ell} \bigl\langle F(Y) , \mathcal{S} (Y, \ell, i) \cdot U^\ell_{i'} \bigr\rangle\bigl\langle U^\ell_{i'} , \delta U^\ell_i \bigr\rangle \\ & {}+ \bigl\langle F(Y) , \mathcal{S} (Y, \ell, i) \cdot \mathcal{P}_\ell^\bot\delta U^\ell_i \bigr\rangle \end{aligned}$$
(3.20)

holds. Since (3.19) is equal to (3.20), it follows that

$$ \bigl\langle F(Y) - \dot{Y} , \delta Y \bigr\rangle= 0 \quad\text{for all } \delta Y = \mathcal{S} (Y, \ell, i) \cdot\delta U^\ell_i. $$

Superpositions of the variations (3.14), (3.15) and (3.18) span the whole tangent space, and therefore

$$ \bigl\langle F(Y) - \dot{Y} , \delta Y \bigr\rangle= 0 \quad\text{for all variations } \delta Y \in\mathcal{T}_Y \mathcal{M}. $$

 □

Corollary 5

Let \(B \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) be an arbitrary tensor, let \(Y\in\mathcal{M}\) and let P(Y)B be the orthogonal projection of B onto the tangent space \(\mathcal{T}_{Y} \mathcal{M}\) defined by

$$ \bigl\langle P(Y) B - B , \delta Y \bigr\rangle= 0 \quad\textit{for all } \delta Y \in\mathcal{T}_Y \mathcal{M}. $$

Then, formally replacing \(\dot{Y}\) by P(Y)B and F(Y) by B in Theorem 3 yields a recursive representation of P(Y)B.

4 Matrix representation of tensors

Since the solution \(Y_{ex}\) of the original differential equation (3.1) does not, in general, evolve on the approximation manifold \(\mathcal{M} = \mathcal {H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\), the solution Y of the projected problem only yields an approximation \(Y \approx Y_{ex}\). For the error analysis of this approximation, the matricization of a tensor is a useful concept. Such a matricization turns a tensor into a matrix with the same entries, just as the command reshape(…) in Matlab. For any \(\mathcal{N}\in T\) the set \(\mathcal{N}_{C}:=\{1, \ldots,d \} \setminus\mathcal{N}\) is called the complement node of \(\mathcal{N}\).

$$ Y^{(\mathcal{N}, \mathcal{N}_C)} \in\mathbb{R}^{\pi(\mathcal{N}) \times\pi(\mathcal{N}_C)}, \quad\pi(\mathcal{N}) = \prod _{k \in\mathcal{N}} N_{k}, \ \pi(\mathcal{N}_C) = \prod_{k \in\mathcal{N}_C} N_{k} $$

of a tensor \(Y \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) rearranges the elements of the tensor in such a way that all the different directions in \(\mathcal{N}\) range over the rows whereas all the directions in \(\mathcal{N}_{C}\) range over the columns of the matricization. There are many different ways to do this, but it does not really matter which mapping is used. The only condition we impose is that \(Y^{(\mathcal{N}, \mathcal{N}_{C})} = ( Y^{(\mathcal{N}_{C}, \mathcal{N})} )^{T}\). For the inverse mapping the notation

$$ Y = \bigl( Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr)_{(\mathcal{N}, \mathcal{N}_C)} $$

will be used. A special case of a matricization is the vectorization

$$ Y^{(\mathcal{N}_0,\{ \})} \in\mathbb{R}^{N_1 \cdots N_d} $$

which reshapes a tensor to a vector. The corresponding inverse mapping is denoted by

$$ Y = \bigl(Y^{(\mathcal{N}_0,\{ \})} \bigr)_{(\mathcal{N}_0,\{ \})}. $$

For \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T})\) and \(\mathcal{N}\in\mathcal{I}(T)\backslash\{ \mathcal {N}_{0}\}\), let the single-hole matrix

$$ \mathbb{S}^\mathcal{N}:= \begin{pmatrix} ( \mathcal{S}(Y, \mathcal{N},1) )^{(\mathcal{N}_C,\{ \})} \big| \quad\ldots\quad \big| ( \mathcal{S}(Y, \mathcal{N},r_\mathcal{N}) )^{(\mathcal{N}_C,\{ \})} \end{pmatrix} \in\mathbb{R}^{\pi(\mathcal{N}_C) \times r_\mathcal{N}} $$

be the matrix whose j-th column is the vectorization of \(\mathcal {S}(Y,\mathcal{N},j)\). Similarly, the orthonormal basis matrix is defined by

$$ \mathbb{U}^\mathcal{N}:= \begin{pmatrix} ( U^\mathcal{N}_1 )^{(\mathcal{N},\{ \})} \big| \quad\ldots\quad \big| ( U^\mathcal{N}_{r_\mathcal{N}} )^{(\mathcal{N},\{ \})} \end{pmatrix} \in\mathbb{R}^{\pi(\mathcal{N}) \times r_\mathcal{N}}. $$

Assumption 1 implies that \((\mathbb {U}^{\mathcal{N}})^{T}\mathbb{U}^{\mathcal{N}}=I\), and it follows from (2.7) that

$$ Y^{(\mathcal{N},\mathcal{N}_C)} = \mathbb{U}^\mathcal{N}\bigl (\mathbb{S}^\mathcal{N} \bigr)^T, \quad \mathbb{S}^\mathcal{N}= \bigl( Y^{(\mathcal{N},\mathcal{N}_C)} \bigr)^T \mathbb{U}^\mathcal{N}. $$

Hence, the singular values of the matricization \(Y^{(\mathcal{N}, \mathcal{N}_{C})}\) coincide with the singular values of the single-hole matrix \(\mathbb{S}^{\mathcal{N}}\). The projections onto the space spanned by the columns of \(\mathbb {U}^{\mathcal{N}}\) and onto its orthogonal complement will be denoted by

$$ P^\mathcal{N}:= \mathbb{U}^\mathcal{N}\bigl( \mathbb{U}^\mathcal{N} \bigr)^T \quad\mbox{and} \quad\bigl(P^\mathcal{N} \bigr)^\bot:= I - \mathbb{U}^\mathcal{N}\bigl( \mathbb {U}^\mathcal{N} \bigr)^T, $$

respectively. In addition we define for \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit {succ}(\mathcal{N})\) the binary matrix operator

$$\begin{aligned} \mathbb{U}^{\mathcal{N}_1} \odot\mathbb{U}^{\mathcal{N}_2} \in& \mathbb{R}^{\pi(\mathcal{N}) \times(r_{\mathcal {N}_1} r_{\mathcal{N}_2})} \\ \mathbb{U}^{\mathcal{N}_1} \odot\mathbb{U}^{\mathcal{N}_2} = & ( v_{11} | v_{12} | \ldots| v_{1r_{\mathcal{N}_2}} | v_{21} | \ldots| v_{2r_{\mathcal{N}_2}} | \ldots| \ldots| v_{r_{\mathcal {N}_1}r_{\mathcal{N}_2}} ) \\ v_{jk} = & \bigl( U^{\mathcal{N}_1}_{j} \cdot U^{\mathcal{N}_2}_{k} \bigr)^{(\mathcal{N},\{ \})}. \end{aligned}$$
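For a flattening in which the directions of \(\mathcal{N}_2\) vary fastest, this operator coincides with the Kronecker product of the two frame matrices; the following NumPy sketch (hypothetical sizes, random data) checks this column by column.

```python
import numpy as np

rng = np.random.default_rng(6)
n1, r1, n2, r2 = 5, 2, 4, 3                  # hypothetical sizes pi(N_1), r_N1, pi(N_2), r_N2
U1 = rng.standard_normal((n1, r1))           # frame matrix of N_1
U2 = rng.standard_normal((n2, r2))           # frame matrix of N_2

# Column (j, k) of U1 (.) U2 is a flattening of U^{N1}_j . U^{N2}_k.
odot = np.empty((n1 * n2, r1 * r2))
for j in range(r1):
    for k in range(r2):
        odot[:, j * r2 + k] = np.kron(U1[:, j], U2[:, k])

print(np.allclose(odot, np.kron(U1, U2)))    # the whole matrix equals np.kron(U1, U2): True
```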

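The next sketch (random data, hypothetical sizes) illustrates these objects for a single node: an orthonormal basis matrix obtained from a thin SVD of a matricization, the single-hole matrix from (2.7), the projections \(P^{\mathcal{N}}\) and \((P^{\mathcal{N}})^{\bot}\), and the operator ⊙. The realization of ⊙ via Kronecker products of columns assumes the vectorization convention of the previous sketch, with the directions of \(\mathcal{N}_{1}\) listed before those of \(\mathcal{N}_{2}\).

```python
import numpy as np

def odot(U1, U2):
    """Column-wise operator U1 (.) U2: column (j, k) is the vectorization of
    U^{N1}_j * U^{N2}_k, realized here as a Kronecker product of the columns
    (valid for the matricization convention of the previous sketch)."""
    cols = [np.kron(U1[:, j], U2[:, k])
            for j in range(U1.shape[1]) for k in range(U2.shape[1])]
    return np.stack(cols, axis=1)          # shape (pi(N), r_{N1} * r_{N2})

rng = np.random.default_rng(1)

# orthonormal factors yield an orthonormal U^{N1} (.) U^{N2}
Q1 = np.linalg.qr(rng.standard_normal((4, 2)))[0]
Q2 = np.linalg.qr(rng.standard_normal((6, 3)))[0]
W = odot(Q1, Q2)
assert np.allclose(W.T @ W, np.eye(6))

# basis matrix, single-hole matrix and projections for a rank-3 matricization
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 30))   # A = Y^{(N, N_C)}
UN = np.linalg.svd(A, full_matrices=False)[0][:, :3]             # U^N (orthonormal columns)
SN = A.T @ UN                                                    # S^N = (Y^{(N, N_C)})^T U^N
PN, PN_perp = UN @ UN.T, np.eye(8) - UN @ UN.T                   # P^N and (P^N)^perp

assert np.allclose(A, UN @ SN.T)                                 # Y^{(N, N_C)} = U^N (S^N)^T
assert np.allclose(PN_perp @ A, np.zeros_like(A))                # columns of A lie in range(U^N)
assert np.allclose(np.linalg.svd(SN, compute_uv=False),
                   np.linalg.svd(A, compute_uv=False)[:3])       # same nonzero singular values
```
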
After these preparations, the results of [14] can now be adapted to show that \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T})\) is indeed a manifold.

Lemma 6

The set

$$ \mathcal{H}\textit{-Tucker}\bigl((r_\mathcal{N})_{\mathcal{N}\in T}\bigr) = \bigl\{ X \in\mathbb{R}^{N_1 \times\cdots\times N_d } : \mathrm {rank} \bigl( X^{(\mathcal{N}, \mathcal{N}_C)} \bigr) = r_\mathcal{N}\forall \mathcal{N}\in T\setminus\{ \mathcal{N}_0\} \bigr\} $$

is a smooth submanifold of \(\mathbb{R}^{N_{1} \times\cdots\times N_{d} }\).

Proof

In [14, Theorem 4.11] it has been shown that

$$\begin{aligned} \widehat{\mathcal{M}} := \bigl\{ X \in\mathbb{R}^{N_1 \times \cdots\times N_d } : \mathrm{rank} \bigl( X^{(\mathcal{N}, \mathcal{N}_C)} \bigr) = r_\mathcal{N} \text{ for all } \mathcal{N}\in T \setminus\{\mathcal{N}_0\} \bigr\} \end{aligned}$$

is a smooth submanifold of \(\mathbb{R}^{N_{1} \times\cdots\times N_{d} }\). Hence, it only has to be shown that \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})=\widehat{\mathcal{M}}\). Let \(Y \in\mathcal{H}\text {-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) be represented by \(( ( \mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)}, ( U^{\ell})_{\ell\in\mathcal{L}(T)} )\). By Assumption 2, the sequence \(\langle Y , U^{\mathcal{N}}_{1} \rangle, \ldots, \langle Y , U^{\mathcal{N}}_{r_{\mathcal{N}}} \rangle\) is linearly independent for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\), which means that

$$ \mathrm{rank} \bigl( \mathbb{S}^\mathcal{N}\bigr) = \mathrm{rank} \bigl( Y^{(\mathcal{N},\mathcal{N}_C)} \bigr) = r_\mathcal{N} \quad\text{for all } \mathcal{N}\in T\setminus\{ \mathcal{N}_0\} $$

and therefore \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T}) \subseteq\widehat{\mathcal{M}}\). Conversely, for every \(\widehat{Y}\in\widehat{\mathcal{M}}\), Proposition 3.6 in [14] and orthogonalization with Algorithm 3 in [2] ensure the existence of \(( ( \mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)}, ( U^{\ell})_{\ell\in\mathcal{L}(T)} )\) such that the equation

$$ \widehat{Y} = \sum_{j=1}^{r_{\mathcal{N}_1}} \sum _{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l , \quad( \mathcal{N}_1 , \mathcal{N}_2) = \mathit{succ}(\mathcal{N}_0) $$

and the recursion

$$ U^\mathcal{N}_i = \sum_{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l $$

hold for all \(\mathcal{N}\in\mathcal{I}(T) , i \in\{1, \ldots, r_{\mathcal{N}}\}\) and \((\mathcal{N}_{1}, \mathcal{N}_{2}) = \mathit{succ}(\mathcal{N})\). The rank of the single-hole matrix \(\mathbb{S}^{\mathcal{N}}= ( \widehat{Y}^{(\mathcal{N}, \mathcal{N}_{C})} )^{T} \mathbb{U}^{\mathcal{N}}\) equals \(r_{\mathcal{N}}\) for all \(\mathcal {N}\in T\setminus\{\mathcal{N}_{0}\}\), which means that the sequence \(\langle \widehat{Y} , U^{\mathcal{N}}_{1} \rangle, \ldots, \langle \widehat{Y} , U^{\mathcal{N}}_{r_{\mathcal{N}}} \rangle\) is linearly independent for all \(\mathcal{N}\in T \setminus\{ \mathcal{N}_{0} \}\) and therefore \(\widehat{\mathcal{M}} \subseteq\mathcal{H}\text {-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\). □

One of the main ingredients for the error analysis is a compact representation of the projection of a tensor on the tangent space of the manifold, i.e. an explicit representation which is not based on a recursion. Such a compact representation can be obtained in matrix-vector notation. As a preparatory step, we reformulate the results of Corollary 5 in matrix-vector notation.

Corollary 7

Let \(Y \in\mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T})\) be a tensor in the hierarchical Tucker format represented by the transfer tensors \((\mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)}\) and by the basis functions \(( U^{\ell})_{\ell\in\mathcal{L}(T)}\) on the leaves of the dimension tree. If \(B \in\mathbb{R}^{N_{1} \times \cdots\times N_{d}}\) is an arbitrary tensor, then under the conditions of Corollary 5 there exists a tuple \(( ( \delta\mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal {I}(T)} ,(\delta U^{\ell})_{\ell\in\mathcal{L}(T)} )\) such that for \((\mathcal {N}_{01},\mathcal{N}_{02}) = \mathit{succ} (\mathcal{N}_{0})\)

$$\begin{aligned} P(Y) B =& \sum_{j=1}^{r_{\mathcal{N}_{01}}} \sum _{l=1}^{r_{\mathcal{N}_{02}}} \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l \\ & {}+ \sum_{j=1}^{r_{\mathcal{N}_{01}}} \sum _{l=1}^{r_{\mathcal{N}_{02}}} \mathcal{B}^{\mathcal{N}_0}_{1,j,l} \delta U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l + \sum_{j=1}^{r_{\mathcal{N}_{01}}} \sum _{l=1}^{r_{\mathcal{N}_{02}}} \mathcal{B}^{\mathcal{N}_0}_{1,j,l} U^{\mathcal{N}_{01}}_j \cdot\delta U^{\mathcal{N}_{02}}_l \end{aligned}$$

with

$$\begin{aligned} \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} =& \bigl( \bigl( U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l \bigr)^{(\mathcal{N}_0, \{ \} )} \bigr)^T B^{(\mathcal{N}_0, \{ \} )}. \end{aligned}$$

For all \(\mathcal{N}\in\mathcal{I} (T) \backslash\{\mathcal{N}_{0} \} \) with \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit{succ} (\mathcal{N})\), \(\delta U^{\mathcal{N}}\) is defined by the recursion

$$\begin{aligned} \delta U^\mathcal{N}_i = & \sum _{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} \delta\mathcal{B}^\mathcal{N}_{i,j,l} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \\ & {}+ \sum_{j=1}^{r_{\mathcal{N}_1}} \sum _{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i,j,l} \bigl( \delta U^{\mathcal{N}_1} \bigr)_j \cdot U^{\mathcal{N}_2}_l + \sum_{j=1}^{r_{\mathcal{N}_1}} \sum _{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i,j,l} U^{\mathcal{N}_1}_j \cdot\delta U^{\mathcal{N}_2}_l \end{aligned}$$

with

$$\begin{aligned} \bigl( \delta\mathcal{B}^\mathcal{N}_{1,j,l} \cdots\delta \mathcal{B}^\mathcal{N}_{r_\mathcal{N},j,l} \bigr) =& \bigl( \bigl( U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr)^{(\mathcal{N}, \{ \} )} \bigr)^T \bigl(P^\mathcal{N} \bigr)^\bot B^{(\mathcal{N}, \mathcal{N}_C)} \bigl( \mathbb {S}^{\mathcal{N}^+} \bigr)^T \\ \delta\mathbb{U}^\ell =& \bigl(P^\ell\bigr)^\bot B^{(\ell,\ell_C)} \bigl(\mathbb{S}^{\ell^+} \bigr)^T. \end{aligned}$$

Here and below, \(M^{+}\) denotes the pseudoinverse of a matrix M, i.e. \(M^{+}=(M^{T} M)^{-1} M^{T}\) if M has full column rank.

Proof

The equation for \(\delta\mathcal{B}^{\mathcal{N}_{0}}\) can be deduced directly from Corollary 5. For \(\mathcal{N}\in\mathcal{I}(T)\backslash\{\mathcal{N}_{0} \} \) and \((\mathcal{N}_{1}, \mathcal{N}_{2}) = \mathit{succ}(\mathcal {N})\), Corollary 5 states that

$$\begin{aligned} \delta\mathcal{B}^{\mathcal{N}}_{i,j,l} = & \sum _{i'=1}^{r_\mathcal{N}} \bigl\langle B , \mathcal{S} \bigl(Y, \mathcal{N} , i' \bigr) \cdot\mathcal{P}^\bot_{\mathcal{N}} \bigl( U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr) \bigr\rangle w_{i,i'}^\mathcal{N} \\ = & \sum_{i'=1}^{r_\mathcal{N}} \bigl\langle \mathcal{P}^\bot_{\mathcal{N}} \bigl( U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr) , \bigl\langle B , \mathcal{S} \bigl(Y, \mathcal{N}, i' \bigr) \bigr\rangle\bigr\rangle w_{i,i'}^\mathcal{N} \\ = & \sum_{i'=1}^{r_\mathcal{N}} \bigl\langle U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l , \mathcal{P}^\bot_{\mathcal{N}} \bigl\langle B , \mathcal{S} \bigl(Y, \mathcal{N}, i' \bigr) \bigr\rangle\bigr\rangle w_{i,i'}^\mathcal{N} \end{aligned}$$

for all \(j \in\{1,\ldots,r_{\mathcal{N}_{1}} \} , l \in\{1,\ldots ,r_{\mathcal{N}_{2}} \} \). Since \(W^{\mathcal{N}}\) defined in (3.8) satisfies

$$\begin{aligned} \bigl( \mathbb{S}^\mathcal{N}\bigr) W^\mathcal {N}= \bigl( \mathbb{S}^\mathcal{N}\bigr) \bigl( \bigl( \mathbb{S}^\mathcal {N}\bigr)^T \mathbb{S}^\mathcal{N}\bigr)^{-1} = \bigl( \mathbb{S}^{\mathcal{N}^+} \bigr)^T \end{aligned}$$
(4.1)

we obtain

$$ \bigl( \delta\mathcal{B}^\mathcal{N}_{1,j,l} \cdots\delta \mathcal{B}^\mathcal{N}_{r_\mathcal{N},j,l} \bigr) = \bigl( \bigl( U^{\mathcal{N}_1} _j \cdot U^{\mathcal{N}_2} _l \bigr)^{(\mathcal{N}, \{ \} )} \bigr)^T \bigl(P^\mathcal{N} \bigr)^\bot B^{(\mathcal{N}, \mathcal{N}_C)} \bigl( \mathbb {S}^{\mathcal{N}^+} \bigr)^T. $$

Last, for \(\ell\in\mathcal{L}(T)\) and \(i=1,\ldots,r_{\ell}\) we recall from Corollary 5 that

$$ \delta U^{\ell}_{i} = \sum_{i' = 1}^{r_{\ell}} \mathcal{P}^\bot_{\ell} \bigl\langle B, \mathcal{S} \bigl(Y, \ell, i' \bigr) \bigr\rangle w_{i', i}^{\ell}, $$

and substituting (4.1) with \(\mathcal{N}=\ell\) yields

$$ \delta\mathbb{U}^\ell= \bigl(P^\ell\bigr)^\bot B^{(\ell,\ell_C)} \bigl(\mathbb{S}^{\ell^+} \bigr)^T. $$

 □

With Corollary 7 we are now able to derive a compact representation for the projection of a tensor on the tangent space of the manifold.

Corollary 8

Under the assumptions of Corollary 7, the projection of a tensor \(B \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) on the tangent space \(\mathcal{T}_{Y} \mathcal{M}\) can be written as

$$\begin{aligned} &P(Y) B \\ & \quad= \bigl( \bigl( \mathbb{U}^{\mathcal{N}_{01}} \odot\mathbb {U}^{\mathcal{N}_{02}} \bigr) \bigl( \mathbb{U}^{\mathcal{N}_{01}} \odot\mathbb{U}^{\mathcal{N}_{02}} \bigr)^T B^{(\mathcal{N}_0, \{ \} )} \bigr)_{(\mathcal{N}_0, \{ \} )} \\ & \qquad{}+ \sum_{{\scriptstyle \mathcal{N}\in\mathcal{I}(T) \backslash\{ \mathcal{N}_0 \}\atop\scriptstyle (\mathcal{N}_1,\mathcal{N}_2) = \mathit{succ} ( \mathcal{N})} } \bigl( \bigl( \mathbb{U}^{\mathcal{N}_1} \odot\mathbb {U}^{\mathcal{N}_2} \bigr) \bigl( \mathbb{U}^{\mathcal{N}_1} \odot\mathbb{U}^{\mathcal{N}_2} \bigr)^T \\ &\qquad{}\times\bigl(P^\mathcal{N}\bigr)^\bot B^{(\mathcal{N}, \mathcal{N}_C)} \bigl( \mathbb{S}^\mathcal{N}{\mathbb{S}^\mathcal{N}}^+ \bigr)^T \bigr)_{(\mathcal{N}, \mathcal{N}_C)} \\ & \qquad{}+ \sum_{\ell\in\mathcal{L}(T) } \bigl( \bigl(P^\ell\bigr)^\bot B^{(\ell, \ell_C)} \bigl( \mathbb{S}^\ell{\mathbb{S}^\ell}^+ \bigr)^T \bigr)_{(\ell, \ell_C)}, \end{aligned}$$

where \((\mathcal{N}_{01}, \mathcal{N}_{02}) = \mathit{succ}(\mathcal{N}_{0})\).

Proof

By resolving the recursions in Corollaries 5 and 7, respectively, we obtain the equation

$$\begin{aligned} P(Y) B & = \sum_{j=1}^{r_{\mathcal{N}_{01}}} \sum_{l=1}^{r_{\mathcal{N}_{02}}} U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} \end{aligned}$$
(4.2)
$$\begin{aligned} & \quad{} + \sum _{{\scriptstyle \mathcal{N}\in\mathcal{I}(T) \backslash\{\mathcal{N}_0 \}\atop\scriptstyle (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ} (\mathcal{N})} } \sum_{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \cdot\sum_{i=1}^{r_{\mathcal{N}}} \delta\mathcal{B}^{\mathcal{N}}_{i,j,l} \mathcal{S}( Y, \mathcal {N}, i) \end{aligned}$$
(4.3)
$$\begin{aligned} &\quad{} + \sum_{\ell\in\mathcal{L}(T)} \sum _{i=1}^{r_{\ell}} \delta U^{\ell}_i \cdot\mathcal{S} (Y, \ell,i), \end{aligned}$$
(4.4)

see also Eqs. (3.3), (3.4), (3.5) and (3.6) in Sect. 3. A closer inspection of the first term (4.2) shows that

$$\begin{aligned} &\sum_{j=1}^{r_{\mathcal{N}_{01}}} \sum _{l=1}^{r_{\mathcal{N}_{02}}} U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l \delta\mathcal{B}^{\mathcal{N}_0}_{1,j,l} \\ & \quad= \sum_{j=1}^{r_{\mathcal{N}_{01}}} \sum _{l=1}^{r_{\mathcal{N}_{02}}} U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l \bigl( \bigl( U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l \bigr)^{(\mathcal{N}_0, \{ \} )} \bigr)^T B^{(\mathcal{N}_0, \{ \} )} \\ & \quad= \Biggl( \sum_{j=1}^{r_{\mathcal{N}_{01}}} \sum _{l=1}^{r_{\mathcal{N}_{02}}} \bigl( U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l \bigr)^{(\mathcal{N}_0, \{ \} )} \bigl( \bigl( U^{\mathcal{N}_{01}}_j \cdot U^{\mathcal{N}_{02}}_l \bigr)^{(\mathcal{N}_0, \{ \} )} \bigr)^T B^{(\mathcal{N}_0, \{ \} )} \Biggr)_{(\mathcal{N}_0, \{ \} )} \\ & \quad= \bigl( \bigl( \mathbb{U}^{\mathcal{N}_{01}} \odot\mathbb{U}^{\mathcal{N}_{02}} \bigr) \bigl( \mathbb{U}^{\mathcal{N}_{01}} \odot\mathbb{U}^{\mathcal{N}_{02}} \bigr)^T B^{(\mathcal{N}_0, \{ \} )} \bigr)_{(\mathcal{N}_0, \{ \} )}. \end{aligned}$$

For the second term (4.3), we obtain

$$\begin{aligned} & \sum_{{\scriptstyle \mathcal{N}\in\mathcal{I}(T) \backslash\{ \mathcal{N}_0 \}\atop\scriptstyle (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ} (\mathcal{N})} } \sum _{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \cdot \sum_{i=1}^{r_{\mathcal{N}}} \delta\mathcal{B}^{\mathcal{N}} _{i,j,l} \mathcal{S}( Y, \mathcal{N}, i) \\ & \quad= \sum_{{\scriptstyle \mathcal{N}\in\mathcal{I}(T) \backslash \{ \mathcal{N}_0 \}\atop\scriptstyle (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ} (\mathcal{N})}} \Biggl( \sum_{j=1}^{r_{\mathcal{N}_1}} \sum _{l=1}^{r_{\mathcal{N}_2}} \bigl( U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr)^{(\mathcal{N}, \{ \} )} \\ &\qquad{}\times\sum _{i=1}^{r_{\mathcal{N}}} \delta\mathcal{B}^{\mathcal{N}}_{i,j,l} \bigl( \mathcal{S} (Y , \mathcal{N}, i )^{(\mathcal{N}_C, \{ \} )} \bigr)^T \Biggr)_{(\mathcal{N}, \mathcal{N}_C)} \\ & \quad= \sum_{{\scriptstyle \mathcal{N}\in\mathcal{I}(T) \backslash \{ \mathcal{N}_0 \}\atop\scriptstyle (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ} (\mathcal{N})} } \Biggl( \sum_{j=1}^{r_{\mathcal{N}_1}} \sum _{l=1}^{r_{\mathcal{N}_2}} \bigl( \bigl( U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr)^{(\mathcal{N}, \{ \} )} \bigr) \\ &\qquad{}\times\bigl( \delta \mathcal{B}^\mathcal{N}_{1,j,l} \dots\delta\mathcal{B}^\mathcal {N}_{r_\mathcal{N},j,l} \bigr) \bigl( \mathbb{S}^\mathcal{N}\bigr)^T \Biggr)_{(\mathcal {N}, \mathcal{N}_C)} \end{aligned}$$

and together with

$$ \bigl( \delta\mathcal{B}^\mathcal{N}_{1,j,l} \dots\delta \mathcal{B}^\mathcal{N}_{r_\mathcal{N},j,l} \bigr) = \bigl( \bigl( U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \bigr)^{(\mathcal{N}, \{ \} )} \bigr)^T \bigl(P^\mathcal{N} \bigr)^\bot B^{(\mathcal{N}, \mathcal{N}_C)} \bigl( {\mathbb {S}^\mathcal{N}}^+ \bigr)^T $$

we obtain

$$\begin{aligned} & \sum_{{\scriptstyle \mathcal{N}\in\mathcal{I}(T) \backslash\{ \mathcal{N}_0 \}\atop\scriptstyle (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ} (\mathcal{N})} } \sum _{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l \cdot \sum_{i=1}^{r_{\mathcal{N}}} \delta\mathcal{B}^{\mathcal{N}} _{i,j,l} \mathcal{S}( Y, \mathcal{N}, i) \\ & \quad= \sum_{{\scriptstyle \mathcal{N}\in\mathcal{I}(T) \backslash \{ \mathcal{N}_0 \}\atop\scriptstyle (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ} (\mathcal{N})} } \bigl( \bigl( \mathbb{U}^{\mathcal{N}_1} \odot\mathbb {U}^{\mathcal{N}_2} \bigr) \bigl( \mathbb{U}^{\mathcal{N}_1} \odot\mathbb{U}^{\mathcal{N}_2} \bigr)^T \bigl(P^\mathcal{N}\bigr)^\bot B^{(\mathcal{N}, \mathcal{N}_C)} \\ &\qquad{}\times\bigl( \mathbb{S}^\mathcal{N}{\mathbb{S}^\mathcal{N}}^+ \bigr)^T \bigr)_{(\mathcal{N}, \mathcal{N}_C)}. \end{aligned}$$

Finally, the last term (4.4) can be reformulated as

$$\begin{aligned} \sum_{\ell\in\mathcal{L}(T)} \sum_{i=1}^{r_{\ell}} \delta U^{\ell}_i \cdot\mathcal{S} (Y, \ell,i) = & \sum _{\ell\in\mathcal{L}(T) } \Biggl( \sum_{i=1}^{r_{\ell}} \bigl( \delta U^{\ell}_i \bigr)^{(\ell, \{ \} )} \bigl( \mathcal{S} (Y, \ell, i)^{(\ell_C, \{ \} )} \bigr)^T \Biggr )_{(\ell, \ell_C)} \\ = & \sum_{\ell\in\mathcal{L}(T) } \bigl( \delta\mathbb{U}^\ell \bigl( \mathbb{S}^\ell\bigr)^T \bigr)_{(\ell, \ell_C)} \\ = & \sum_{\ell\in\mathcal{L}(T) } \bigl( \bigl(P^\ell \bigr)^\bot B^{(\ell, \ell_C)} \bigl( {\mathbb{S}^\ell}^+ \bigr)^T \bigl( \mathbb{S}^\ell\bigr)^T \bigr)_{(\ell, \ell_C)} \\ = & \sum_{\ell\in\mathcal{L}(T) } \bigl( \bigl(P^\ell \bigr)^\bot B^{(\ell, \ell_C)} \bigl( \mathbb{S}^\ell{ \mathbb{S}^\ell}^+ \bigr)^T \bigr)_{(\ell, \ell_C)}. \end{aligned}$$

This proves the assertion. □
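
As a small consistency check (not part of the original argument), consider the case d = 2: the dimension tree then consists of the root and the two leaves {1} and {2}, the sum over interior non-root nodes is empty, and the formula of Corollary 8 collapses to the familiar tangent-space projection for rank-r matrices. The sketch below verifies this on random data (all sizes hypothetical); U and V play the roles of \(\mathbb{U}^{\{1\}}\) and \(\mathbb{U}^{\{2\}}\), and the factors \(\mathbb{S}^{\ell}\mathbb{S}^{\ell+}\) become the orthogonal projections onto the column spaces of V and U, respectively.

```python
import numpy as np

rng = np.random.default_rng(3)
r, m, n = 3, 7, 9
U = np.linalg.qr(rng.standard_normal((m, r)))[0]      # orthonormal basis of the column space
V = np.linalg.qr(rng.standard_normal((n, r)))[0]      # orthonormal basis of the row space
B = rng.standard_normal((m, n))                       # arbitrary tensor (here: matrix) to project

root = sum(np.outer(U[:, j], V[:, k]) * (U[:, j] @ B @ V[:, k])
           for j in range(r) for k in range(r))       # root term of Corollary 8
leaf1 = (np.eye(m) - U @ U.T) @ B @ (V @ V.T)         # leaf {1} term
leaf2 = (U @ U.T) @ B @ (np.eye(n) - V @ V.T)         # leaf {2} term

PB = root + leaf1 + leaf2
assert np.allclose(PB, U @ U.T @ B + B @ V @ V.T - U @ U.T @ B @ V @ V.T)
```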

5 Curvature bounds

In this section estimates for the curvature of the manifold are proven. These estimates, formulated in Lemma 10 below, will play a crucial role in the error analysis of the variational approximation in Sect. 6. Similar results have been obtained for the dynamical low-rank approximation of matrices or tensors, respectively, in [7, 8]. Our general strategy follows the ideas developed there, but extending the results of [7, 8] to the hierarchical Tucker format is a considerable challenge due to the recursive definition of the equations of motion.

In the following \(\sigma_{i}(M)\) denotes the i-th singular value (in descending order) of a matrix \(M \in\mathbb{R}^{n \times m}\) with \(n, m \geq i\). Moreover, we will use the Frobenius norm \(\| Y \|_{F} = \sqrt{\langle Y , Y \rangle}\) of tensors \(Y \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\). By construction the Frobenius norm of a tensor equals the matrix Frobenius norm of any of its matricizations, i.e. \(\| Y \|_{F} = \Vert Y^{(\mathcal{N}, \mathcal {N}_{C})} \Vert_{F} \) for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\).

Lemma 9

If \(Y, \widetilde{Y} \in\mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) are tensors in the hierarchical Tucker format fulfilling

$$ \sigma_{r_{\mathcal{N}}} \bigl( Y^{(\mathcal{N}, \mathcal{N}_C )} \bigr) \geq\rho> 0 \quad \textit{and} \quad\|Y - \widetilde{Y} \|_F = \delta $$

for some \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\), then

$$ \sigma_{r_{\mathcal{N}}} \bigl( \widetilde{Y}^{(\mathcal{N}, \mathcal{N}_C )} \bigr) \geq\rho- \delta. $$

If in addition ρ>δ, then the pseudoinverse \(\widetilde {\mathbb{S}}^{{\mathcal{N}}^{+}} \) of the single-hole-matrix \(\widetilde{\mathbb{S}}^{\mathcal{N}}\) of \(\widetilde{Y}\) is bounded by

$$ \bigl\| \widetilde{\mathbb{S}}^{{\mathcal{N}}^+} \bigr\|_2 \leq \frac{1}{\rho- \delta}. $$

Proof

The estimates

$$\begin{aligned} \delta= \| \widetilde{Y} - Y \|_F = & \bigl\| \widetilde{Y}^{(\mathcal{N}, \mathcal{N}_C)} - Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr\|_F \\ \geq& \bigl\| \widetilde{Y}^{(\mathcal{N}, \mathcal{N}_C)} - Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr\|_2 \\ \geq& \bigl\vert\sigma_{r_\mathcal{N}} \bigl( \widetilde{Y}^{(\mathcal{N}, \mathcal{N}_C)} \bigr) - \sigma _{r_\mathcal{N}} \bigl( Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr) \bigr\vert \end{aligned}$$
(5.1)

hold for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\). The inequality (5.1) follows from Theorem 7.4.51 in [6]. Then

$$\begin{aligned} \sigma_{r_\mathcal{N}} \bigl( \widetilde{Y}^{(\mathcal{N}, \mathcal{N}_C)} \bigr) \geq& \sigma_{r_\mathcal{N}} \bigl( Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr) - \bigl\vert \sigma_{r_\mathcal{N}} \bigl( \widetilde{Y}^{(\mathcal{N}, \mathcal{N}_C)} \bigr) - \sigma_{r_\mathcal{N}} \bigl( Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr) \bigr\vert\geq\rho- \delta. \end{aligned}$$

For ρ>δ the pseudoinverse can be bounded by

$$\begin{aligned} \bigl\Vert\widetilde{\mathbb{S}}^{\mathcal{N}^+} \bigr\Vert_2 = & \bigl( \sigma_{r_\mathcal{N}} \bigl( \widetilde{\mathbb {S}}^{\mathcal{N}} \bigr) \bigr)^{-1} = \bigl( \sigma_{r_\mathcal{N}} \bigl( \widetilde {Y}^{(\mathcal{N}, \mathcal{N}_C)} \bigr) \bigr)^{-1} \leq(\rho- \delta)^{-1}. \end{aligned}$$

 □
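
A quick numerical illustration (random matrices, hypothetical sizes) of the singular value perturbation inequality (5.1) that drives the proof:

```python
import numpy as np

# Check |sigma_i(A + E) - sigma_i(A)| <= ||E||_2 <= ||E||_F on random data.
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 15))
E = 1e-3 * rng.standard_normal((20, 15))
sv_A = np.linalg.svd(A, compute_uv=False)
sv_AE = np.linalg.svd(A + E, compute_uv=False)
gap = np.abs(sv_AE - sv_A).max()
assert gap <= np.linalg.norm(E, 2) + 1e-12
assert np.linalg.norm(E, 2) <= np.linalg.norm(E, "fro")
```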

Definition 5.1

An ordered tuple \(( Y , \widetilde{Y} ) \in ( \mathcal{H}\text {-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T}) )^{2}\) is called path-connected with respect to ODE (5.2) if there exists a solution of the boundary value problem

$$\begin{aligned} \begin{aligned} \dot{X} (\tau) & = \sum_{k=0}^\infty \bigl( P\bigl(X( \tau)\bigr) - P(Y) \bigr)^k P (Y) ( \widetilde {Y} - Y ) \\ X(0) & = Y , \qquad X(1) = \widetilde{Y} \end{aligned} \end{aligned}$$
(5.2)

on the interval [0,1].

Lemma 10

(curvature bounds)

Let \((Y, \widetilde{Y} ) \in ( \mathcal{H}\textit {-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T}) )^{2}\) be tensors in the hierarchical Tucker format which are path-connected with respect to ODE (5.2). Let \((\rho_{\mathcal{N}})_{\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\} }\) and \((c_{\mathcal{N}})_{\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\} }\) be families of positive real values such that

$$\begin{aligned} \sigma_{r_\mathcal{N}} \bigl (Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr) \geq\rho_\mathcal{N}> 0 \quad\textit{and} \quad\delta: = \| Y - \widetilde{Y} \|_F \leq c_\mathcal{N}\rho_\mathcal{N} \end{aligned}$$
(5.3)

for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\). Furthermore let \(\widetilde{c}> 0\) be a constant fulfilling the inequalities

$$ \sum_{ \mathcal{N}\in T\setminus\{\mathcal{N}_0\} } 8 c_\mathcal {N}\bigl(\widetilde{c} ( 1 - \widetilde{c} - c_\mathcal{N})\bigr)^{-1} \leq1 \quad \textit{and} \quad\widetilde{c} + c_\mathcal{N}< 1 $$

and define

$$ C_{Y, \widetilde{Y}} := \sum_{\mathcal{N}\in T \setminus\{ \mathcal {N}_0\}} 8 \bigl( \rho_\mathcal{N}( 1 - \widetilde{c} - c_\mathcal{N}) \bigr)^{-1} . $$

Then the bounds

$$\begin{aligned} \bigl\| \bigl(P(Y) - P(\widetilde{Y})\bigr) B \bigr\|_F \leq& C_{Y, \widetilde{Y}} \| Y - \widetilde{Y} \|_F \| B \|_F , \end{aligned}$$
(5.4)
$$\begin{aligned} \bigl\| P^\bot(Y) (\widetilde{Y}-Y) \bigr\|_F \leq& \frac{ C_{Y, \widetilde{Y}} }{1 - \widetilde{c}} \| Y - \widetilde{Y} \|_F^2 \end{aligned}$$
(5.5)

hold for every tensor \(B \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\).

A similar result has been proven independently in [12].

Proof

Since the proof is rather long and technical, it is divided into several parts.

Part 1: For the moment we assume that there are

$$\begin{aligned} X(\tau) \in\mathcal{H}\text{-Tucker}\bigl((r_\mathcal {N})_{\mathcal{N}\in T}\bigr) \quad\text{ and } \quad Z(\tau) \bot \mathcal{T}_Y \mathcal{M} \end{aligned}$$

such that

$$ Y + \tau(\widetilde{Y} -Y) = X(\tau) + Z (\tau) $$
(5.6)

for all 0≤τ≤1. Below we will prove that this assumption is indeed true. In the following Δ denotes the projection of the difference from Y to \(\widetilde{Y}\) on the tangent space \(\mathcal{T}_{Y} \mathcal{M}\):

$$ \varDelta:= P(Y) (\widetilde{Y} - Y) \in\mathcal{T}_Y \mathcal{M}. $$

Obviously \(\| \varDelta\|_{F} \leq \| \widetilde{Y} - Y \|_{F} = \delta \) holds. The projection of (5.6) on the tangent space leads to the equation

$$ P(Y) \bigl(X(\tau) -Y \bigr) = \tau\varDelta, $$
(5.7)

and differentiating with respect to τ gives

$$ P(Y) \bigl(\dot{X}(\tau)\bigr) = \varDelta. $$

The derivative \(\dot{X}(\tau)\) belongs to the tangent space of X(τ), therefore

$$ P\bigl(X(\tau)\bigr) \dot{X} (\tau) = \dot{X}(\tau) $$

is satisfied, which leads to

$$\begin{aligned} \dot{X} (\tau) = & P\bigl(X(\tau)\bigr) \dot{X}(\tau) + \underbrace{ \varDelta- P(Y) \dot{X}(\tau)}_{=0} \\ = & \varDelta+ \bigl(P\bigl(X(\tau)\bigr)- P(Y)\bigr) \dot{X} (\tau). \end{aligned}$$
(5.8)

In part 6 of the proof it will be shown that

$$ \bigl\| P\bigl(X(\tau)\bigr) - P(Y) \bigr\|_F \leq \widetilde{c} < 1 $$
(5.9)

for all 0≤τ≤1. Together with (5.8) this yields

$$\begin{aligned} \bigl\| \dot{X} ( \tau) \bigr\|_F \leq& \| \varDelta\|_F + \bigl\| P\bigl( X(\tau)\bigr) - P(Y) \bigr\|_F \bigl\| \dot{X} ( \tau) \bigr\|_F \leq\delta+ \widetilde{c} \bigl\| \dot{X} ( \tau) \bigr\|_F \end{aligned}$$

and hence

$$\begin{aligned} \bigl\| \dot{X} (\tau) \bigr\|_F \leq& \delta (1 - \widetilde{c} )^{-1}. \end{aligned}$$
(5.10)

Estimate (5.9) guarantees for all τ∈[0,1] the convergence of the Neumann series for P(X(τ))−P(Y). Hence, the operator I−[P(X(τ))−P(Y)] is invertible. Defining the function

$$ G \bigl(X( \tau) \bigr) := \sum_{k=0}^\infty \bigl( P\bigl(X( \tau)\bigr) - P(Y) \bigr)^k \varDelta= \bigl[ I - \bigl( P\bigl(X(\tau)\bigr) - P(Y)\bigr) \bigr]^{-1} \varDelta $$

leads to the equations

$$ \dot{X} (\tau) = \varDelta+ \bigl(P\bigl(X( \tau)\bigr)- P(Y)\bigr) \dot{X} (\tau) = G \bigl(X( \tau) \bigr). $$
(5.11)

Then, the existence of a solution X(τ) of (5.11), and hence of (5.8), with initial value X(0)=Y follows from the theory of ordinary differential equations and (5.10). Together with \(\widetilde{c} <1\) the estimates

$$\begin{aligned} & \bigl\| P^\bot\bigl( X (\tau)\bigr) \bigl( P (Y) \dot{X} (\tau) - P(Y) \varDelta\bigr) \bigr\|_F \\ & \quad= \bigl\| \bigl[I -P \bigl( X (\tau)\bigr) \bigr] P(Y) \bigl( \dot{X} (\tau) - \varDelta\bigr) \bigr\|_F \\ & \quad= \bigl\| \bigl[ P(Y) -P \bigl( X (\tau)\bigr) \bigr] \bigl( P (Y) \dot{X} (\tau) - \varDelta\bigr) \bigr\|_F \\ & \quad= \bigl\| \bigl[ P(Y) -P \bigl( X (\tau)\bigr) \bigr] \bigl[ P\bigl( X (\tau) \bigr) + P^\bot\bigl( X (\tau)\bigr) \bigr] \bigl( P (Y) \dot {X} ( \tau) - \varDelta\bigr) \bigr\|_F \\ & \quad= \bigl\| \bigl[ P(Y) -P \bigl( X (\tau)\bigr) \bigr] P^\bot\bigl( X (\tau)\bigr) \bigl( P (Y) \dot{X} (\tau) - \varDelta\bigr) \bigr \|_F \\ & \quad\leq \bigl\| P \bigl( X (\tau)\bigr) - P(Y) \bigr\|_F \bigl\| P^\bot\bigl( X (\tau)\bigr) \bigl( P (Y) \dot{X} (\tau) - \varDelta \bigr) \bigr\|_F \\ & \quad\leq \widetilde{c} \bigl\| P^\bot\bigl( X (\tau)\bigr) \bigl( P (Y) \dot{X} (\tau) - \varDelta\bigr) \bigr\|_F \end{aligned}$$

show

$$ P^\bot\bigl( X (\tau)\bigr) \bigl( P (Y) \dot {X} ( \tau) - \varDelta\bigr) = 0. $$

Moreover the projection of (5.8) on the tangent space of X(τ) leads to

$$ P \bigl( X (\tau)\bigr) \bigl( P (Y) \dot{X} (\tau) - \varDelta \bigr) = 0 $$

and hence

$$ P(Y) \dot{X} (\tau) = \varDelta. $$
(5.12)

Since integrating (5.12) leads to (5.7), setting

$$ Z(\tau) = \tau P^\bot(Y) ( \widetilde{Y} -Y) - P^\bot(Y) X(\tau) $$

shows the existence of the decomposition stated above.

Part 2: In this part of the proof, we derive a number of auxiliary inequalities for later use. Since X(0)=Y by construction, the fundamental theorem of calculus and (5.10) yield the estimate

$$ \bigl\| X (\tau) - Y \bigr\|_F \leq\tau\sup_{s \in[0, \tau] } \bigl\| \dot{X} (s) \bigr\|_F \leq\delta(1- \widetilde{c} )^{-1} $$

because τ≤1. Let \(\mathbb{U}^{\mathcal{N}}(\tau)\) and \(\mathbb{S}^{\mathcal{N}} (\tau)\) be the orthonormal basis matrix and the single-hole-matrix of X(τ), i.e. \((X(\tau))^{(\mathcal {N},\mathcal{N}_{C})}=\mathbb{U}^{\mathcal{N}}(\tau)\mathbb {S}^{\mathcal{N}}(\tau)^{T}\). Under Assumption (5.3), Lemma 9 can be applied and provides the bound

$$\begin{aligned} \bigl\| \mathbb{S}^\mathcal{N}(\tau)^+ \bigr\|_2 \leq& \frac{1}{\rho_\mathcal{N}-(1- \widetilde{c})^{-1} \delta} \leq \frac{1}{\rho_\mathcal{N}-(1- \widetilde{c})^{-1} c_\mathcal{N}\rho_\mathcal{N}} \\ = & \frac{1- \widetilde{c}}{\rho_\mathcal{N}( 1- \widetilde{c} - c_\mathcal{N})} \end{aligned}$$
(5.13)

for all 0≤τ≤1 for the pseudoinverse of \(\mathbb{S}^{\mathcal{N}}(\tau)\). Next, a bound for \(\| \dot{\mathbb{U}}^{\mathcal{N}}(\tau)\mathbb {S}^{\mathcal{N}}(\tau)^{T} \|_{F}\) is shown. Since \(P^{\mathcal{N}}(\tau)^{\bot}= I - \mathbb{U}^{\mathcal{N}}(\tau) \mathbb{U}^{\mathcal{N}}(\tau)^{T} \) by definition, it follows from

$$\begin{aligned} \mathbb{U}^\mathcal{N}(\tau)^T \mathbb{U}^\mathcal{N}(\tau) = I \quad\mbox{and} \quad\mathbb{U}^\mathcal{N}( \tau)^T\dot{\mathbb{U}}^\mathcal{N}(\tau) = 0 \end{aligned}$$
(5.14)

that

$$\begin{aligned} \dot{\mathbb{U}}^\mathcal{N}(\tau) \mathbb{S}^\mathcal{N}(\tau )^T = & P^\mathcal{N}(\tau)^\bot\bigl(\dot{\mathbb{U}}^\mathcal{N}(\tau) \mathbb{S}^\mathcal{N}(\tau)^T + \mathbb{U}^\mathcal{N}(\tau) \dot{ \mathbb{S}}^\mathcal{N}(\tau)^T \bigr) \\ = & P^\mathcal{N}(\tau)^\bot\dot{X}^{(\mathcal{N},\mathcal {N}_C)}(\tau). \end{aligned}$$

With \(\| P^{\mathcal{N}}(\tau)^{\bot}\|_{2}\leq1\) and (5.10), we thus obtain

$$ \bigl\| \dot{\mathbb{U}}^\mathcal{N}\mathbb{S}^\mathcal{N}(\tau)^T \bigr\|_F \leq\bigl\| P^\mathcal{N}(\tau)^\bot \bigr\|_{2} \cdot\bigl\| \dot{X}^{(\mathcal{N},\mathcal {N}_C)}(\tau) \bigr\|_F \leq \delta(1- \widetilde{c} )^{-1}. $$

Combining this with (5.13) yields

$$\begin{aligned} \bigl\| \dot{\mathbb{U}}^\mathcal{N}(\tau) \bigr\|_F \leq& \bigl\| \dot{ \mathbb{U}}^\mathcal{N}(\tau) \mathbb{S}^\mathcal{N}(\tau)^T \bigr\|_F \bigl\| \mathbb{S}^\mathcal{N}(\tau)^+ \bigr\|_2 \leq\delta(1- \widetilde{c} )^{-1} \frac{1- \widetilde{c}}{\rho_\mathcal{N}( 1 - \widetilde{c} - c_\mathcal{N})} \\ = & \frac{\delta}{\rho_\mathcal{N}(1- \widetilde{c} - c_\mathcal{N})}. \end{aligned}$$

With this inequality, the derivative of \(P^{\mathcal{N}}(\tau)^{\bot}= I - \mathbb{U}^{\mathcal{N}}(\tau) \mathbb{U}^{\mathcal{N}}(\tau)^{T} \) can be bounded by

$$ \bigl\Vert\dot{P}^\mathcal{N}(\tau)^\bot\bigr\Vert _F \leq\frac{2 \delta}{\rho_\mathcal{N}(1- \widetilde{c} - c_\mathcal{N})}. $$

A bound for the derivative of the single-hole-matrix \(\mathbb {S}^{\mathcal{N}}(\tau)\) of X(τ) can be derived from the orthonormality (5.14) and (5.10) via

$$\begin{aligned} \bigl\Vert\dot{\mathbb{S}}^\mathcal{N}(\tau)^T \bigr \Vert_F = & \bigl\Vert\mathbb{U}^\mathcal{N}( \tau)^T \dot{\mathbb{U}}^\mathcal{N}(\tau) \mathbb{S}^\mathcal{N}( \tau)^T + \mathbb{U}^\mathcal{N}(\tau)^T \mathbb{U}^\mathcal{N}( \tau) \dot{\mathbb{S}}^\mathcal{N}(\tau)^T \bigr\Vert_F \\ = & \bigl\Vert\mathbb{U}^\mathcal{N}(\tau)^T \dot{X}^{(\mathcal{N}, \mathcal{N}_C)} (\tau) \bigr\Vert_F \\ \leq& \delta(1 - \widetilde{c} )^{-1}. \end{aligned}$$
(5.15)

Next, we want to estimate the derivative of \(\mathbb{S}^{\mathcal{N}}(\tau) \mathbb{S}^{\mathcal{N}}(\tau)^{+} \), which can be written as

$$\begin{aligned} \frac{d}{d \tau} \bigl[\mathbb{S}^\mathcal{N}(\tau) \mathbb {S}^\mathcal{N}(\tau)^+ \bigr] =& \frac{d}{d \tau} \bigl[\mathbb{S}^\mathcal{N}(\tau) \bigl( \mathbb{S}^\mathcal{N}(\tau)^T \mathbb{S}^\mathcal{N}(\tau) \bigr)^{-1} \mathbb{S}^\mathcal{N}(\tau)^T \bigr] \\ = & T_1(\tau) + T_2(\tau) + T_1^T( \tau) \end{aligned}$$

with

$$\begin{aligned} T_1(\tau) = & \dot{\mathbb{S}}^\mathcal{N}(\tau) \mathbb{S}^\mathcal{N}(\tau)^+ \\ T_2(\tau) = & \mathbb{S}^\mathcal{N}(\tau) \frac{d}{d \tau} \bigl[ \bigl( \mathbb{S}^\mathcal{N}(\tau)^T \mathbb{S}^\mathcal{N}(\tau) \bigr)^{-1} \bigr] \mathbb{S}^\mathcal{N}(\tau)^T. \end{aligned}$$

It follows from (5.13) and (5.15) that

$$\begin{aligned} \| T_1 \|_F \leq& \bigl\| \mathbb{S}^\mathcal{N}(\tau)^+ \bigr\|_2 \cdot\bigl\| \dot{\mathbb{S}}^\mathcal{N}(\tau) \bigr \|_F \leq \frac{1 - \widetilde{c}}{\rho_\mathcal{N}(1 - \widetilde{c} - c_\mathcal{N})} \delta(1 - \widetilde{c} )^{-1} \\ = & \frac{\delta}{\rho_\mathcal{N}(1 - \widetilde{c} - c_\mathcal{N})}. \end{aligned}$$

In order to bound T 2 we substitute

$$\begin{aligned} & \frac{d}{d \tau} \bigl[ \bigl( \mathbb{S}^\mathcal{N}( \tau)^T \mathbb{S}^\mathcal{N}(\tau) \bigr)^{-1} \bigr] \\ & \quad= - \bigl(\mathbb{S}^\mathcal{N}(\tau)^T \mathbb {S}^\mathcal{N}(\tau) \bigr)^{-1} \frac{d}{d \tau} \bigl[ \mathbb{S}^\mathcal{N}( \tau)^T \mathbb{S}^\mathcal{N}(\tau) \bigr] \bigl(\mathbb {S}^\mathcal{N}( \tau)^T \mathbb{S}^\mathcal{N}(\tau) \bigr)^{-1} \end{aligned}$$

and obtain

$$\begin{aligned} T_2(\tau) = & - \bigl(\mathbb{S}^\mathcal{N}(\tau)^+ \bigr)^T \frac{d}{d \tau} \bigl[ \mathbb{S}^\mathcal{N}( \tau)^T \mathbb{S}^\mathcal{N}(\tau) \bigr] \mathbb{S}^\mathcal {N}(\tau)^+ \\ = & - T_1^T(\tau) \mathbb{S}^\mathcal{N}(\tau) \mathbb{S}^\mathcal{N}(\tau)^+ - \bigl(\mathbb{S}^\mathcal {N}(\tau)^+ \bigr)^T\mathbb{S}^\mathcal{N}(\tau)^T T_1(\tau). \end{aligned}$$

Since \(\|\mathbb{S}^{\mathcal{N}}(\tau)\mathbb{S}^{\mathcal{N}}(\tau )^{+}\|_{2} = 1\), this gives the bound \(\| T_{2}(\tau) \|_{F} \leq2 \| T_{1}(\tau) \|_{F}\) and in total

$$\begin{aligned} \biggl\Vert\frac{d}{d \tau} \bigl[ \bigl( \mathbb{S}^\mathcal {N}(\tau) \mathbb{S}^\mathcal{N}(\tau)^+\bigr)^T \bigr] \biggr\Vert _F \leq4 \bigl\| T_1(\tau) \bigr\|_F \leq \frac{4 \delta}{ \rho_\mathcal{N}(1- \widetilde{c} - c_\mathcal{N})}. \end{aligned}$$

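The substitution used for \(T_{2}\) is the standard identity for the derivative of a matrix inverse; a finite-difference check on random data (for illustration only) reads:

```python
import numpy as np

rng = np.random.default_rng(3)
S0, dS = rng.standard_normal((10, 3)), rng.standard_normal((10, 3))
S = lambda tau: S0 + tau * dS                        # a smooth path of full-rank matrices
G = lambda tau: np.linalg.inv(S(tau).T @ S(tau))
h = 1e-6
fd = (G(h) - G(-h)) / (2 * h)                        # central difference at tau = 0
dStS = dS.T @ S0 + S0.T @ dS                         # d/dtau [S^T S] at tau = 0
assert np.allclose(fd, -G(0.0) @ dStS @ G(0.0), atol=1e-6)
```
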
Part 3: Since \(( Y, \widetilde{Y} ) \in ( \mathcal{H}\text {-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T}) )^{2}\) is path-connected with respect to ODE (5.2), we have \(X(1) = \widetilde{Y}\). As a first step towards an error bound for \(\| ( P(\widetilde{Y}) - P(Y)) B \|_{F} \) we use

$$ \bigl[ P(\widetilde{Y}) - P(Y) \bigr] B = \bigl[ P \bigl(X(1)\bigr) - P\bigl(X(0)\bigr) \bigr] B = \int_0^1 \frac{d}{d \tau} \bigl[ P\bigl(X(\tau)\bigr) B \bigr] \,d \tau $$
(5.16)

and replace P(X(τ))B by the representation from Corollary 8. This yields

$$\begin{aligned} & \bigl\| \bigl( P(\widetilde{Y}) - P(Y)\bigr) B \bigr\|_F \\ & \quad\leq \max_{\tau\in[0,1]} \biggl\Vert \frac{d}{d \tau} \bigl[ \bigl( \mathbb{U}^{\mathcal{N}_{01}} (\tau) \odot\mathbb{U}^{\mathcal{N}_{02}} (\tau) \bigr) \bigl( \mathbb{U}^{\mathcal{N}_{01}} (\tau) \odot \mathbb{U}^{\mathcal{N}_{02}} (\tau) \bigr)^T B^{(\mathcal{N}_0, \{ \} )} \bigr] \biggr\Vert_F \end{aligned}$$
(5.17)
$$\begin{aligned} & \qquad {}+ \sum_{{\scriptstyle \mathcal{N}\in\mathcal{I}(T) \backslash\{ \mathcal{N}_0 \}\atop\scriptstyle (\mathcal{N}_1,\mathcal{N}_2) = \mathit{succ} ( \mathcal{N})} } \max_{\tau\in[0,1]} \biggl\Vert\frac{d}{d \tau} \bigl[ \mathbb{W}^{(\mathcal{N}_1,\mathcal{N}_2)}(\tau) P^\mathcal {N}(\tau)^\bot B^{(\mathcal{N}, \mathcal{N}_C)} \\ &\qquad{}\times\bigl( \mathbb{S}^\mathcal{N}(\tau) {\mathbb{S}^\mathcal{N}}^+ (\tau) \bigr)^T \bigr] \biggr\Vert_F \end{aligned}$$
(5.18)
$$\begin{aligned} & \qquad {}+ \sum_{\ell\in\mathcal{L}(T) } \max _{\tau\in[0,1]} \biggl\Vert\frac{d}{d \tau} \bigl[ P^\ell(\tau)^\bot B^{(\ell, \ell_C)} \bigl( \mathbb{S}^\ell(\tau) {\mathbb{S}^\ell}^+ (\tau) \bigr)^T \bigr] \biggr\Vert_F, \end{aligned}$$
(5.19)

where

$$ \mathbb{W}^{(\mathcal{N}_1,\mathcal{N}_2)}(\tau) := \bigl( \mathbb {U}^{\mathcal{N}_1} (\tau) \odot\mathbb{U}^{\mathcal{N}_2} (\tau) \bigr) \bigl( \mathbb{U}^{\mathcal{N}_1} (\tau) \odot\mathbb {U}^{\mathcal{N}_2} (\tau) \bigr)^T $$

and \((\mathcal{N}_{01},\mathcal{N}_{02}) = \mathit{succ} (\mathcal{N}_{0})\).

Part 4: Now for each of the three terms (5.17), (5.18), (5.19), a suitable bound has to be derived. Assuming \(\mathcal{N}\) to be an interior node with successors \((\mathcal{N}_{1}, \mathcal{N}_{2}) = \mathit{succ}(\mathcal{N})\) allows us to bound

$$\begin{aligned} \biggl\Vert\frac{d}{d \tau} \mathbb{W}^{(\mathcal{N}_1,\mathcal {N}_2)}(\tau) \biggr\Vert_2 = & \biggl \Vert\frac{d}{d \tau} \bigl[ \bigl( \mathbb{U}^{\mathcal{N}_1} (\tau) \odot \mathbb{U}^{\mathcal{N}_2} (\tau) \bigr) \bigl( \mathbb {U}^{\mathcal{N}_1} (\tau) \odot \mathbb{U}^{\mathcal{N}_2} (\tau) \bigr)^T \bigr] \biggr\Vert_2 \\ \leq& 2 \biggl\Vert\frac{d}{d \tau} \bigl[ \mathbb {U}^{\mathcal{N}_1} (\tau) \odot\mathbb{U}^{\mathcal{N}_2} (\tau) \bigr] \biggr\Vert_2 \\ \leq& 2 \bigl\Vert\dot{\mathbb{U}}^{\mathcal{N}_1} (\tau) \odot\mathbb{U}^{\mathcal{N}_2} ( \tau) \bigr\Vert_2 + 2 \bigl\Vert\mathbb{U}^{\mathcal{N}_1} (\tau) \odot\dot{ \mathbb{U}}^{\mathcal{N}_2} (\tau) \bigr\Vert_2 \\ \leq& 2 \bigl\Vert\dot{\mathbb{U}}^{\mathcal{N}_1} (\tau)\bigr \Vert_2 + 2 \bigl\Vert \dot{\mathbb{U}}^{\mathcal{N}_2} (\tau) \bigr\Vert_2 \\ \leq& \frac{2 \delta}{\rho_{\mathcal{N}_1} ( 1 - \widetilde{c} - c_{\mathcal{N}_1})} + \frac{2 \delta}{\rho_{\mathcal{N}_2} ( 1 - \widetilde{c} - c_{\mathcal{N}_2})}. \end{aligned}$$

Hence, the inequality

$$\begin{aligned} & \biggl\Vert\frac{d}{d \tau} \bigl[ \bigl( \mathbb{U}^{\mathcal{N}_{01}} (\tau) \odot\mathbb{U}^{\mathcal {N}_{02}} (\tau) \bigr) \bigl( \mathbb{U}^{\mathcal{N}_{01}} (\tau) \odot\mathbb{U}^{\mathcal {N}_{02}} (\tau) \bigr)^T B^{(\mathcal{N}_0, \{ \} )} \bigr] \biggr\Vert_F \\ & \quad\leq \frac{2 \delta}{\rho_{\mathcal{N}_{01}} ( 1 - \widetilde{c} - c_{\mathcal{N}_{01}})} \| B \|_F + \frac{2 \delta}{\rho_{\mathcal{N}_{02}} ( 1 - \widetilde{c} - c_{\mathcal{N}_{02}})} \| B \|_F \end{aligned}$$
(5.20)

holds for all tensors \(B \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\).

Next, we consider (5.18). Assuming \(\mathcal{N}\neq\mathcal{N}_{0}\) allows us to deduce the estimate

$$\begin{aligned} & \biggl\Vert\frac{d}{d \tau} \bigl[ \mathbb{W}^{(\mathcal {N}_1,\mathcal{N}_2)}( \tau) P^\mathcal{N}(\tau)^\bot B^{(\mathcal{N}, \mathcal{N}_C)} \bigl( \mathbb{S}^\mathcal{N}(\tau) \mathbb{S}^\mathcal{N}(\tau)^+ \bigr)^T \bigr] \biggr\Vert_F \\ & \quad\leq\biggl\Vert\frac{d}{d \tau} \bigl[ \mathbb{W}^{(\mathcal{N}_1,\mathcal{N}_2)}(\tau) \bigr] \biggr \Vert_2 \| B \|_F \bigl\Vert\bigl( \mathbb{S}^\mathcal{N}( \tau) \mathbb{S}^\mathcal{N}(\tau)^+ \bigr)^T \bigr\Vert_2 \\ & \qquad{}+ \bigl\Vert\mathbb{W}^{(\mathcal{N}_1,\mathcal {N}_2)}(\tau) \bigr\Vert _2 \biggl\Vert\frac{d}{d \tau} \bigl[ P^\mathcal{N}( \tau)^\bot\bigr] \biggr\Vert_F \| B \|_F \bigl\Vert\bigl( \mathbb{S}^\mathcal{N}(\tau) \mathbb {S}^\mathcal{N}( \tau)^+ \bigr)^T \bigr\Vert_2 \\ & \qquad{}+ \bigl\Vert\mathbb{W}^{(\mathcal{N}_1,\mathcal {N}_2)}(\tau) \bigr\Vert _2 \| B \|_F \biggl\Vert\frac{d}{d \tau} \bigl[ \bigl( \mathbb{S}^\mathcal{N}(\tau) \mathbb{S}^\mathcal{N}(\tau)^+ \bigr)^T \bigr] \biggr\Vert_2 \\ & \quad\leq\biggl(\frac{2 \delta}{\rho_{\mathcal{N}_1} ( 1 - \widetilde{c} - c_{\mathcal{N}_1})} + \frac{2 \delta}{\rho_{\mathcal{N}_2} ( 1 - \widetilde{c} - c_{\mathcal{N}_2})} + \frac{6 \delta}{\rho _\mathcal{N}( 1 - \widetilde{c} - c_\mathcal{N})} \biggr) \| B \|_F . \end{aligned}$$
(5.21)

Finally, let \(\ell\in\mathcal{L}(T)\) be a leaf of the dimension tree. Then, for the term in (5.19) we obtain the estimate

$$\begin{aligned} &\biggl\Vert\frac{d}{d \tau} \bigl[ P^\ell( \tau)^\bot B^{(\ell, \ell_C)} \bigl( \mathbb{S}^\ell(\tau) \mathbb{S}^\ell(\tau)^+ \bigr) \bigr] \biggr\Vert_F \\ & \quad\leq \biggl\Vert\frac{d}{d \tau} \bigl[ P^\ell( \tau)^\bot\bigr] \biggr\Vert_F \| B \|_F \bigl\Vert\bigl( \mathbb{S}^\ell(\tau) \mathbb{S}^\ell(\tau)^+ \bigr)^T \bigr\Vert_2 \\ & \qquad{}+ \| B \|_F \biggl\Vert\frac{d}{d \tau} \bigl[ \bigl( \mathbb{S}^\ell(\tau) \mathbb{S}^\ell(\tau)^+ \bigr)^T \bigr] \biggr\Vert_2 \\ & \quad\leq\frac{6 \delta}{\rho_\ell( 1 - \widetilde{c} - c_\ell )} \| B \|_F . \end{aligned}$$
(5.22)

Part 5: Substituting (5.20), (5.21), (5.22) into (5.17), (5.18), (5.19) yields the first assertion (5.4):

$$\begin{aligned} &\bigl\| \bigl( P(\widetilde{Y}) - P(Y)\bigr) B \bigr\|_F \\ & \quad\leq \frac{2 \delta}{\rho_{\mathcal{N}_{01}} ( 1 - \widetilde{c} - c_{\mathcal{N}_{01}})} \| B \|_F + \frac{2 \delta}{\rho_{\mathcal{N}_{02}} ( 1 - \widetilde{c} - c_{\mathcal{N}_{02}})} \| B \|_F \\ & \qquad {}+ \sum_{{\scriptstyle \mathcal{N}\in\mathcal{I}(T) \backslash\{ \mathcal{N}_0 \}\atop\scriptstyle (\mathcal{N}_1,\mathcal{N}_2) = \mathit{succ} ( \mathcal{N})} } \biggl( \frac{2 \delta}{\rho_{\mathcal{N}_1} ( 1 - \widetilde{c} - c_{\mathcal{N}_1})} + \frac{2 \delta}{\rho_{\mathcal{N}_2} ( 1 - \widetilde{c} - c_{\mathcal{N}_2})} \\ &\qquad{}+ \frac{6 \delta}{\rho_\mathcal{N}( 1 - \widetilde{c} - c_\mathcal{N})} \biggr)\| B \|_F \\ & \qquad {}+ \sum_{\ell\in\mathcal{L}(T) } \frac{6 \delta}{\rho _\ell( 1 - \widetilde{c} - c_\ell)} \| B \|_F \\ & \quad= \sum_{ \mathcal{N}\in T\setminus\{\mathcal{N}_0\} } \frac {8 \delta}{\rho_{\mathcal{N}} ( 1 - \widetilde{c} - c_{\mathcal{N}})} \| B \|_F , \end{aligned}$$

where \((\mathcal{N}_{01},\mathcal{N}_{02}) = \mathit{succ} (\mathcal{N}_{0})\). The proof of the second assertion (5.5) is much shorter. With \(P(X (\tau)) \dot{X} (\tau)=\dot{X}(\tau)\) we obtain

$$\begin{aligned} P^\bot(Y) (\widetilde{Y}-Y) = \int_0^1 \bigl(I - P(Y)\bigr) \dot{X} (\tau) \,d \tau= \int_0^1 \bigl( P\bigl(X(\tau)\bigr) - P(Y) \bigr) \dot{X}(\tau) \,d \tau, \end{aligned}$$

and with (5.4) and (5.10) the error bound

$$\begin{aligned} \bigl\| P^\bot(Y) (\widetilde{Y}-Y) \bigr\|_F \leq& \max _{\tau\in[0,1]} \bigl\| P\bigl(X(\tau)\bigr) - P(Y) \bigr\|_F \cdot \bigl\| \dot{X}(\tau) \bigr\|_F \\ \leq& \sum_{ \mathcal{N}\in T\setminus\{\mathcal{N}_0\} } \frac {8 \delta}{\rho_{\mathcal{N}} ( 1 - \widetilde{c} - c_{\mathcal{N}})} \delta(1 - \widetilde{c} )^{-1} \\ = & \sum_{ \mathcal{N}\in T\setminus\{\mathcal{N}_0\} } \frac{8 \delta^2}{\rho_{\mathcal{N}} ( 1 - \widetilde{c} - c_{\mathcal{N}})(1 - \widetilde{c} )} \end{aligned}$$

follows.

Part 6: Finally, we can prove inequality (5.9) which had been used in part 1 of the proof. Assume (5.9) does not hold. Since \(\mathcal{M}\) is a smooth manifold, there exist \(0<\tau^{*}<1\) and a tensor \(B^{*} \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) such that

$$\begin{aligned} \bigl\| P\bigl(X(s)\bigr) - P(Y) \bigr\|_F \leq& \widetilde{c} < 1 \quad\text{and} \\ \bigl\| \bigl( P\bigl(X\bigl(\tau^*\bigr)\bigr) -P(Y) \bigr) B^* \bigr\|_F = & \widetilde{c} \bigl\| B^* \bigr\|_F \end{aligned}$$
(5.23)

hold for all \(s\in[0,\tau^{*}]\). Under these conditions all the estimates of the proof remain valid for \(\tau\in[0,\tau^{*}]\). However, in (5.16) we take the integral over the interval \([0,\tau^{*}]\) instead of [0,1] and obtain with similar arguments the estimate

$$ \bigl\| \bigl(P\bigl(X\bigl(\tau^*\bigr)\bigr)- P(Y) \bigr) B \bigr\|_F \leq \tau^* \sum_{\mathcal{N}\in T\setminus\{\mathcal{N}_0\}} \frac{8 \delta}{\rho_\mathcal{N}( 1- \widetilde{c} - c_\mathcal{N})} \| B \|_F $$

for any tensor \(B \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\). This results in a contradiction to (5.23) since

$$\begin{aligned} & \bigl\| \bigl(P\bigl(X\bigl(\tau^*\bigr)\bigr)- P(Y) \bigr) B^* \bigr\|_F \\ & \quad\leq \tau^* \sum_{\mathcal{N}\in T\setminus\{\mathcal {N}_0\}} \frac{8 \delta}{\rho_\mathcal{N}( 1- \widetilde{c} - c_\mathcal{N})} \bigl \| B^* \bigr\|_F \leq\tau^* \widetilde{c} \bigl\| B^* \bigr\|_F < \widetilde{c} \bigl\| B^* \bigr\|_F . \end{aligned}$$

 □

In order to simplify the previous lemma, we assume \(c_{\mathcal{N}}= c\) for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\) and maximize this constant under the constraints of Lemma 10, i.e.

$$ \sum_{ \mathcal{N}\in T\setminus\{\mathcal{N}_0\} } 8 c_\mathcal {N}\bigl(\widetilde{c} ( 1 - \widetilde{c} - c_\mathcal{N})\bigr)^{-1} \leq1 \quad \text{and} \quad\widetilde{c} + c_\mathcal{N}< 1. $$

Since the number of nodes in the dimension tree is always 2d−1, the sum in the first constraint has 2d−2 terms, and the maximum is attained for

$$ \widetilde{c} = - 8 (2d -2) + \sqrt{8^2 (2d-2)^2 + 8 (2d-2)} \quad\text{and} \quad c=\frac{- \widetilde{c}^2 + \widetilde{c}}{8 (2d-2) + \widetilde{c}}. $$

As \(\widetilde{c} \rightarrow\frac{1}{2}\) for d→∞, we let

$$ \widetilde{c} = \frac{1}{2} \quad\text{and} \quad c = ( 64d - 62)^{-1} $$

and obtain the next corollary.
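
A quick check (illustrative only) that this choice is admissible for every d ≥ 2, with the first constraint of Lemma 10 attained with equality:

```python
# For c~ = 1/2 and c = 1/(64d - 62), the sum over the 2d - 2 nodes of
# T \ {N_0} of  8 c / (c~ (1 - c~ - c))  equals exactly 1, and c~ + c < 1.
for d in range(2, 50):
    c_tilde = 0.5
    c = 1.0 / (64 * d - 62)
    total = (2 * d - 2) * 8 * c / (c_tilde * (1 - c_tilde - c))
    assert abs(total - 1.0) < 1e-12 and c_tilde + c < 1
```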

Corollary 11

(curvature bounds)

Let \((Y,\widetilde{Y} ) \in ( \mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T}) )^{2}\) be tensors which are path-connected with respect to ODE (5.2). Let \(\rho>0\) be such that

$$ \sigma_{r_\mathcal{N}} \bigl(Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr) \geq\rho> 0 \quad\textit{and} \quad\| Y - \widetilde{Y} \|_F \leq(64 d-62)^{-1} \rho $$

is satisfied for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\), and let \(\beta:=(32d-31)\rho^{-1}\). Then

$$\begin{aligned} \bigl\| \bigl(P(Y) - P(\widetilde{Y})\bigr) B \bigr\|_F \leq& \beta\| Y - \widetilde{Y} \|_F \| B \|_F \end{aligned}$$
(5.24)
$$\begin{aligned} \bigl\Vert P^\bot(Y) ( \widetilde{Y}-Y) \bigr\Vert_F \leq& 2 \beta\| Y - \widetilde{Y} \|_F^2 \end{aligned}$$
(5.25)

hold for any tensor \(B \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\).

Proof

Inserting \(c = c_{\mathcal{N}}= ( 64d-62)^{-1}\) for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\) and \(\widetilde{c} = \frac{1}{2}\) in the bounds (5.4) and (5.5) of Lemma 10 proves the assertion. □

6 Error analysis

We are finally ready to analyze the accuracy of the variational approximation \(Y(t)\in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) to the solution of (3.1). The error bound presented in Theorem 12 is the main result of this article. Very similar results have independently and simultaneously been obtained in [12].

Let \(Y_{ex}(t)\in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) be the solution of the differential equation (3.1), i.e. \(\dot{Y}_{ex}= F(Y_{ex})\) for all \(t\in[0,t_{\mathrm{end}}]\) with initial value \(Y_{ex}(0) \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\). Let \(Y(t)\in\mathcal{M}= \mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) be the variational approximation defined by (3.2), i.e. \(\dot{Y} = P(Y) F(Y)\). For simplicity, it is assumed that \(Y_{ex}(0)\in\mathcal{M}\) and that \(Y(0)=Y_{ex}(0)\). (Otherwise, an additional error term for the initial error has to be included in Theorem 12.) Moreover, let \(X(t) \in\mathcal{M}\) be the best approximation of \(Y_{ex}(t)\) on \(\mathcal{M}\), i.e. for every \(t\in[0,t_{\mathrm{end}}]\) we have

$$ \bigl\| X(t) - Y_{ex}(t) \bigr\|_F \leq\bigl\| \widehat{X} - Y_{ex}(t) \bigr\|_F \quad\forall\widehat{X} \in\mathcal{M}. $$

Since by definition and by the triangle inequality we have

$$\begin{aligned} \bigl\| X(t) - Y_{ex}(t) \bigr\|_F \leq\bigl\| Y(t) - Y_{ex}(t) \bigr\|_F \leq\bigl\| Y(t) - X(t) \bigr\|_F + \bigl\| X(t) - Y_{ex}(t) \bigr \|_F, \end{aligned}$$

it makes sense to prove a result which bounds the error \(\| Y(t) - X(t) \|_{F}\) in terms of \(\| X(t) - Y_{ex}(t) \|_{F}\). The following assumptions are made for all \(t\in[0,t_{\mathrm{end}}]\).

  (A1)

    X(t) is continuously differentiable.

  (A2)

    There is a constant μ>0 such that

    $$ \bigl\| F\bigl(Y_{ex}(t)\bigr) \bigr\|_F \leq\mu, \qquad\bigl\| F\bigl(X(t)\bigr) \bigr\|_F \leq\mu\quad\text{and} \quad \bigl\| F\bigl(Y(t)\bigr) \bigr\|_F \leq\mu. $$
  (A3)

    There exist constants L>0 and \(\lambda\in \mathbb{R}\) such that the Lipschitz conditions

    $$\begin{aligned} \bigl\| F\bigl(Y_{ex}(t)\bigr) - F\bigl(X(t)\bigr) \bigr\|_F \leq& L \bigl\| Y_{ex}(t) -X (t) \bigr\|_F \quad\text{and} \\ \bigl\langle F\bigl(Y (t)\bigr) - F\bigl(X(t)\bigr) , Y (t) - X(t) \bigr\rangle \leq& \lambda\bigl\| Y(t) -X (t) \bigr\|^2_F \end{aligned}$$

    are fulfilled.

  (A4)

    The singular values of the matricizations of the best approximation are bounded from below by

    $$ \sigma_{r_\mathcal{N}} \bigl( X^{(\mathcal{N}, \mathcal{N}_C)} (t) \bigr) \geq\rho> 0 \quad\text{for all } \mathcal{N}\in T\setminus\{\mathcal{N}_0\}. $$
  (A5)

    The distance from the best approximation X(t) to the exact solution \(Y_{ex}(t)\) is bounded by

    $$ \bigl\| X(t) - Y_{ex}(t) \bigr\|_F \leq\frac{1}{2 \beta} , $$

    where \(\beta=(32d-31)\rho^{-1}\).

  (A6)

    The tuple (Y(t),X(t)) is path-connected with respect to ODE (5.2).

Similar assumptions have been made in [8] where the approximation error of the non-hierarchical Tucker format has been analyzed.

Theorem 12

Under the above assumptions the difference between the variational approximation and the best approximation is bounded by

$$ \bigl\| Y(t) - X(t) \bigr\|_F \leq(L + 2 \mu\beta) \int_0^t \bigl\Vert Y_{ex}(s) -X(s) \bigr \Vert_F e^{(5 \mu\beta+ \lambda) (t-s) } \,ds $$
(6.1)

as long as the right-hand side of (6.1) is bounded by \(\frac{1}{2 \beta}\).

Proof

Since X(t) is the best approximation, the deviation \(Y_{ex}(t)-X(t)\) is orthogonal to the tangent space \(\mathcal{T}_{X} \mathcal{M}\),

$$ P\bigl(X(t)\bigr) \bigl( Y_{ex}(t) - X(t) \bigr) = 0 $$

and differentiating yields

$$\begin{aligned} 0 = & \frac{d}{dt} \bigl[ P\bigl(X(t)\bigr) \bigl( Y_{ex}(t) - X(t) \bigr) \bigr] \\ = & \bigl[ P'\bigl(X(t)\bigr) \cdot\bigl( Y_{ex}(t) - X(t) \bigr) \bigr] \dot{X} (t) + P\bigl(X(t)\bigr) \bigl( \dot{Y}_{ex}(t) - \dot{X}(t) \bigr). \end{aligned}$$

With \(\dot{X} (t) = P(X(t)) \dot{X}(t)\) we obtain

$$\begin{aligned} &\dot{X}(t) = P\bigl(X(t)\bigr) \dot{Y}_{ex}(t) + D(t) \\ &\quad\text{with } D(t) = P'\bigl(X(t)\bigr) \cdot\bigl( Y_{ex}(t) - X(t) \bigr) \dot{X}(t). \end{aligned}$$
(6.2)

Next, we derive an estimate for D(t). For an arbitrary tensor \(B \in \mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) the bound (5.24) yields

$$\begin{aligned} \bigl\| \bigl(P'\bigl(X(t)\bigr) \cdot B \bigr) \dot{X}(t) \bigr\|_F = & \biggl\Vert\frac{d}{dt} \bigl[ P\bigl(X(t)\bigr) B \bigr] \biggr\Vert_F \\ = & \lim_{h \rightarrow0} \frac{1}{h} \bigl\Vert\bigl[ P \bigl(X(t+h)\bigr) - P\bigl(X(t)\bigr) \bigr] B \bigr\Vert_F \\ \leq& \lim_{h \rightarrow0} \frac{1}{h} \beta\bigl\Vert X(t+h) - X(t) \bigr\Vert_F \Vert B \Vert_F \\ = & \beta\bigl\Vert\dot{X}(t) \bigr\Vert_F \Vert B \Vert_F, \end{aligned}$$

and for \(B:=Y_{ex}(t)-X(t)\) this yields

$$\begin{aligned} \bigl\| D(t) \bigr\|_F \leq& \beta\bigl\Vert \dot{X}(t) \bigr\Vert_F \bigl\Vert Y_{ex}(t) - X(t) \bigr\Vert _F . \end{aligned}$$
(6.3)

In the following we let \(\delta(t):=\| Y_{ex}(t)-X(t) \|_{F}\) and \(\varepsilon(t):=\| Y(t)-X(t) \|_{F}\). Inserting (6.2) into (6.3) leads to

$$\begin{aligned} \bigl\| D(t) \bigr\|_F \leq& \beta\delta(t) \bigl( \bigl\Vert P \bigl(X(t)\bigr) \dot{Y}_{ex}(t) \bigr\Vert_F + \bigl\Vert D(t) \bigr \Vert_F \bigr). \end{aligned}$$

With Assumption (A5), i.e. \(\delta(t) \leq\frac{1}{2 \beta}\), we obtain the estimate

$$ \bigl\| D(t) \bigr\|_F \leq\beta\delta(t) \bigl\Vert P\bigl (X(t)\bigr) \dot{Y}_{ex}(t) \bigr\Vert_F + \frac{1}{2} \bigl\| D(t) \bigr\|_F , $$

and with Assumption (A2)

$$ \bigl\| D(t) \bigr\|_F \leq2\beta\delta(t) \bigl\Vert P\bigl (X(t)\bigr) \dot{Y}_{ex}(t) \bigr\Vert_F \leq2\beta\delta(t) \mu. $$

Subtracting (6.2) from \(\dot{Y} (t) = P (Y(t)) F(Y(t))\) gives

$$\begin{aligned} \dot{Y} (t) - \dot{X} (t) = & P\bigl(Y(t)\bigr) F\bigl( Y(t)\bigr) - P\bigl(X(t)\bigr) F\bigl( Y_{ex}(t)\bigr) - D(t) \\ = & \bigl[P\bigl(Y(t)\bigr) - P\bigl (X(t)\bigr) \bigr] F \bigl( X(t)\bigr) \\ & {}+ P\bigl(X(t)\bigr) \bigl[ F\bigl(X(t)\bigr) - F\bigl( Y_{ex}(t) \bigr) \bigr] + \bigl[ F\bigl(Y(t)\bigr) - F\bigl(X(t)\bigr) \bigr] \\ & {}- P^\bot\bigl(Y(t)\bigr) \bigl[ F\bigl( Y(t)\bigr) - F \bigl(X(t)\bigr) \bigr] - D(t). \end{aligned}$$
(6.4)

Next, we want to find an estimate for the inner product \(\langle\dot {Y} (t) - \dot{X} (t) , Y(t) - X(t) \rangle\). For this purpose we consider the individual terms on the right-hand side of (6.4) and, via Corollary 11, obtain the following estimates:

  1.
    $$\begin{aligned} &\bigl\langle\bigl[ P\bigl(Y(t)\bigr) -P\bigl(X(t)\bigr) \bigr] F\bigl(X(t) \bigr) , Y(t)-X(t) \bigr\rangle \\ &\quad\leq\bigl\| \bigl[ P\bigl(Y(t)\bigr) -P\bigl(X(t)\bigr) \bigr] F\bigl(X(t)\bigr) \bigr\|_F \bigl\| Y(t)-X(t) \bigr\|_F \\ &\quad\leq\beta\bigl\| Y(t)-X(t) \bigr\|^2_F \bigl\| F\bigl(X(t) \bigr) \bigr\|_F \\ &\quad\leq\mu\beta\varepsilon^2 (t) , \end{aligned}$$
  2.
    $$\begin{aligned} &\bigl\langle P\bigl(X(t)\bigr) \bigl[ F\bigl(X(t)\bigr)- F\bigl (Y_{ex}(t)\bigr) \bigr] , Y(t)-X(t) \bigr\rangle \\ &\quad\leq\bigl\| F\bigl(X(t)\bigr) - F\bigl(Y_{ex}(t)\bigr) \bigr\|_F \bigl\| Y(t)-X(t) \bigr\|_F \\ &\quad\leq L \bigl\| X(t) - Y_{ex}(t) \bigr\|_F \bigl\| Y(t)-X(t) \bigr\|_F \\ &\quad= L \delta(t) \varepsilon(t), \end{aligned}$$
  3.
    $$ \bigl\langle F\bigl(Y(t)\bigr) - F\bigl(X(t)\bigr) , Y(t)-X(t) \bigr\rangle\leq \lambda\bigl\| Y(t)-X(t) \bigr\|^2_F = \lambda \varepsilon^2 (t), $$
  4.
    $$\begin{aligned} &\bigl\langle P^\bot\bigl(Y(t)\bigr) \bigl[F\bigl(Y(t)\bigr) - F \bigl(X(t)\bigr) \bigr] , Y(t)-X(t) \bigr\rangle \\ &\quad= \bigl\langle F\bigl(Y(t)\bigr) - F\bigl(X(t)\bigr) , P^\bot \bigl(Y(t)\bigr) \bigl( Y(t)-X(t) \bigr) \bigr\rangle \\ &\quad\leq\bigl\| F\bigl(Y(t)\bigr) - F\bigl(X(t)\bigr) \bigr\|_F \bigl\Vert P^\bot\bigl(Y(t)\bigr) \bigl( Y(t)-X(t) \bigr) \bigr \Vert _F \\ &\quad\leq\bigl( \bigl\| F\bigl(Y(t)\bigr) \bigr\|_F + \bigl\| F\bigl(X(t)\bigr) \bigr\|_F \bigr) 2 \beta\bigl\| Y(t)-X(t) \bigr\|_F^2 \\ &\quad\leq4 \mu\beta\varepsilon^2(t), \end{aligned}$$
  5.
    $$ \bigl\langle D(t) , Y(t)-X(t) \bigr\rangle\leq\bigl\| D(t) \bigr\| _F \bigl\| Y(t)-X(t) \bigr\|_F \leq2\beta\mu\delta(t) \varepsilon(t) . $$

With these inequalities, we obtain

$$\begin{aligned} \varepsilon(t) \dot{\varepsilon}(t) = & \bigl\| Y(t) - X(t) \bigr \|_F \frac{d}{dt} \bigl\| Y(t) - X(t) \bigr\|_F \\ = & \frac{1}{2} \frac{d}{d t}\bigl\| Y(t) - X(t) \bigr\|^2_F \\ = & \bigl\langle\dot{Y}(t) - \dot{X}(t) , Y(t) - X(t) \bigr \rangle \\ \leq& \mu\beta\varepsilon^2 (t) + L \delta(t) \varepsilon(t) + \lambda\varepsilon^2 (t) + 4 \mu\beta\varepsilon^2(t) + 2\beta\delta(t) \mu\varepsilon(t) \end{aligned}$$

and hence

$$ \dot{\varepsilon}(t) \leq(5 \mu\beta+ \lambda) \varepsilon(t) + (L + 2 \mu \beta) \delta(t) . $$

Finally Gronwall’s inequality leads to the assertion

$$ \varepsilon(t) \leq(L +2 \mu\beta) \int_0^t \delta(s) e^{(5 \mu\beta+ \lambda) (t-s) } \,ds. $$

 □
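
The last step is the integral form of Gronwall's inequality: a function with \(\dot{\varepsilon} \leq a \varepsilon+ b \delta(t)\) and \(\varepsilon(0)=0\) is dominated by \(b \int_{0}^{t} \delta(s) e^{a(t-s)} \,ds\). The following sketch compares the two sides numerically for the differential equation with equality; the constants a and b are hypothetical stand-ins for \(5\mu\beta+\lambda\) and \(L+2\mu\beta\), and δ is an arbitrary test function.

```python
import numpy as np

a, b = 1.3, 0.7                              # stand-ins for 5*mu*beta + lambda and L + 2*mu*beta
delta = lambda t: 0.1 * (1.0 + np.sin(t))    # a hypothetical distance to the best approximation

# explicit Euler for eps' = a*eps + b*delta(t), eps(0) = 0, on [0, 1]
h, eps = 1e-4, 0.0
for k in range(10000):
    eps += h * (a * eps + b * delta(k * h))

# variation-of-constants / Gronwall bound  b * int_0^1 delta(s) exp(a*(1-s)) ds
s = np.linspace(0.0, 1.0, 100001)
vals = delta(s) * np.exp(a * (1.0 - s))
ds = s[1] - s[0]
bound = b * ds * (vals.sum() - 0.5 * (vals[0] + vals[-1]))   # trapezoidal rule

assert abs(eps - bound) < 1e-3               # the Euler solution matches the bound (equality case)
```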