Abstract
The hierarchical Tucker format is a way to decompose a high-dimensional tensor recursively into sums of products of lower-dimensional tensors. The number of degrees of freedom in such a representation is typically many orders of magnitude lower than the number of entries of the original tensor. This makes the hierarchical Tucker format a promising approach to solve ordinary differential equations for high-dimensional tensors. In order to propagate the approximation in time, differential equations for the parameters of the hierarchical Tucker format are derived from the Dirac-Frenkel variational principle. We prove an error bound for the dynamical approximation in the hierarchical Tucker format by extending previous results of Koch and Lubich for the non-hierarchical Tucker format.
1 Introduction
Differential equations on high-dimensional state spaces arise in many different fields and applications. Typical examples are the Black-Scholes equation for basket options in mathematical finance, the chemical master equation used in systems biology to model the stochastic dynamics of a gene regulatory network, or the Schrödinger equation which describes the dynamics of the particles within a molecule according to the laws of quantum mechanics. Solving such problems numerically is notoriously difficult due to the fact that for standard discretizations the number of unknowns grows exponentially with respect to the dimension (“curse of dimensionality”). Hence, in high-dimensional approximations the main challenge is to compress the problem in such a way that the reduced equation can be solved numerically and nevertheless provides an acceptable approximation to the solution of the full problem.
If a multivariate function on a hyper-rectangle is discretized by an equidistant grid, the function values at the grid points can be regarded as the entries of a high-dimensional tensor. Hence, the approximation of high-dimensional functions and tensors is closely related, and for \(N_1, \ldots, N_d \in\mathbb{N}\) we will consider tensors \(Y \in\mathbb{R}^{N_1 \times\cdots\times N_d}\) as functions \(Y=Y(i_1,\ldots,i_d)\) with \(i_k \in\{1,\ldots,N_k\}\). For a straightforward, entry-wise representation of Y, all \(\prod_{k=1}^{d}N_k\) values have to be stored. This number can be significantly reduced if Y can be approximated by a sum of products of univariate functions. Many such representations have been proposed, and an overview of the literature is given in [3, 9]. A particularly useful and popular representation is the orthogonal Tucker format

$$ \widetilde{Y}(i_1, \ldots, i_d) = \sum_{j_1=1}^{r_1} \cdots\sum_{j_d=1}^{r_d} a(j_1, \ldots, j_d)\, U^1_{j_1}(i_1) \cdots U^d_{j_d}(i_d), \qquad(1.1) $$

see, e.g., [9, Sect. 4]. Here, \(a\in\mathbb{R}^{r_1 \times\cdots\times r_d}\) is the core tensor of coefficients, and for every direction \(k\in\{1,\ldots,d\}\), \(U^k_1, \ldots, U^k_{r_k}\) is a set of univariate basis functions which satisfy the orthonormality conditions

$$ \sum_{i_k=1}^{N_k} U^k_j(i_k)\, U^k_m(i_k) = \delta_{j,m} $$
for all \(j,m\in\{1,\ldots,r_k\}\). From the perspective of linear algebra, every

$$ U^k_j = \bigl( U^k_j(1), \ldots, U^k_j(N_k) \bigr)^T $$

can be interpreted as a vector. Hence, (1.1) is a high-dimensional generalization of the well-known singular value decomposition, and \(\widetilde{Y}\) is a low-rank approximation of \(Y(i_1,\ldots,i_d)\), cf. [1]. For the representation (1.1) of \(\widetilde{Y}\) only

$$ \prod_{k=1}^{d} r_k + \sum_{k=1}^{d} r_k N_k $$
degrees of freedom have to be stored, which is a significant improvement if \(r_k \ll N_k\) for all k. Unfortunately, the fact that all possible combinations of basis functions appear on the right-hand side of (1.1) again causes an exponential growth of the data (unless \(r_k\) decreases to 1 as \(k\rightarrow\infty\)). For example, if \(r_1=\cdots=r_d=r\), then the core tensor \(a(j_1,\ldots,j_d)\) has \(r^d\) entries.
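To make these storage counts concrete, the following sketch evaluates (1.1) with numpy for d=3; the sizes \(N_k\), the ranks \(r_k\) and all entries are hypothetical and serve only as an illustration.

```python
import numpy as np

# A minimal sketch of the orthogonal Tucker format (1.1) for d = 3.
# The sizes N_k, ranks r_k and all entries are hypothetical.
N, r = (20, 30, 25), (4, 5, 3)
rng = np.random.default_rng(0)

# Core tensor a and orthonormal basis vectors U^k_1, ..., U^k_{r_k}
# (the columns of U[k]), obtained from QR factorizations.
a = rng.standard_normal(r)
U = [np.linalg.qr(rng.standard_normal((N[k], r[k])))[0] for k in range(3)]

# Evaluate (1.1): contract the core tensor with the basis in each direction.
Y = np.einsum('abc,ia,jb,kc->ijk', a, U[0], U[1], U[2])

# Storage comparison: full tensor vs. Tucker parameters.
full_dof = np.prod(N)
tucker_dof = np.prod(r) + sum(r[k] * N[k] for k in range(3))
print(full_dof, tucker_dof)   # 15000 vs. 365
```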
The hierarchical Tucker format avoids the disadvantageous growth of degrees of freedom by using the approach (1.1) in a recursive way. For example, with univariate functions \(U^{\{k\}}_{j_k}:\{1, \ldots, N_k\}\rightarrow\mathbb{R}\) and coefficients \(\mathcal{B}^{\{1,2\}}_{m,j_1,j_2}, \mathcal{B}^{\{3,4\}}_{l,j_3,j_4} \in\mathbb{R}\), one can define bivariate functions

$$ U^{\{1,2\}}_{m}(i_1,i_2) = \sum_{j_1=1}^{r_1} \sum_{j_2=1}^{r_2} \mathcal{B}^{\{1,2\}}_{m,j_1,j_2}\, U^{\{1\}}_{j_1}(i_1)\, U^{\{2\}}_{j_2}(i_2), \qquad U^{\{3,4\}}_{l}(i_3,i_4) = \sum_{j_3=1}^{r_3} \sum_{j_4=1}^{r_4} \mathcal{B}^{\{3,4\}}_{l,j_3,j_4}\, U^{\{3\}}_{j_3}(i_3)\, U^{\{4\}}_{j_4}(i_4), $$

and then construct a four-dimensional tensor via

$$ Y(i_1,i_2,i_3,i_4) = \sum_{m}\sum_{l} \mathcal{B}^{\{1,2,3,4\}}_{m,l}\, U^{\{1,2\}}_{m}(i_1,i_2)\, U^{\{3,4\}}_{l}(i_3,i_4) $$
for some \(\mathcal{B}^{\{1,2,3,4\}}_{m,l}\in\mathbb{R}\). This idea was first used in quantum chemistry to solve high-dimensional Schrödinger equations [13, 15, 16] and was later investigated from a mathematical point of view in [2, 4, 12, 14]; see also [10] for a useful Matlab toolbox. A formal definition of the hierarchical Tucker format will be given in Sect. 2. Due to the recursive definition, the hierarchical version is technically more involved than the standard Tucker format, but as we will discuss in Sect. 2, it avoids the exponential growth of the degrees of freedom.
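As a small numerical illustration, the following sketch builds the four-dimensional tensor of this two-level construction with numpy; all sizes, ranks and entries are hypothetical.

```python
import numpy as np

# A sketch of the two-level construction above (all sizes hypothetical).
rng = np.random.default_rng(1)
N = (10, 12, 11, 9)                       # grid sizes N_1, ..., N_4
r = {(1,): 3, (2,): 4, (3,): 3, (4,): 2,  # ranks at the leaves
     (1, 2): 5, (3, 4): 4}                # ranks at the interior nodes

# Univariate basis functions U^{k} and transfer coefficients B^{...}.
U1, U2, U3, U4 = (rng.standard_normal((N[k], r[(k + 1,)])) for k in range(4))
B12 = rng.standard_normal((r[(1, 2)], r[(1,)], r[(2,)]))
B34 = rng.standard_normal((r[(3, 4)], r[(3,)], r[(4,)]))
B1234 = rng.standard_normal((r[(1, 2)], r[(3, 4)]))

# Bivariate frames U^{1,2}_m(i1, i2) and U^{3,4}_l(i3, i4).
U12 = np.einsum('mab,ia,jb->mij', B12, U1, U2)
U34 = np.einsum('lcd,kc,nd->lkn', B34, U3, U4)

# Four-dimensional tensor Y(i1, i2, i3, i4).
Y = np.einsum('ml,mij,lkn->ijkn', B1234, U12, U34)
print(Y.shape)   # (10, 12, 11, 9)
```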
The problem how a given tensor can be approximated in the hierarchical Tucker format up to a prescribed error tolerance has been studied in [2]. In the context of high-dimensional differential equations, however, an approximation for the unknown solution is sought after, and for time-dependent differential equations the question arises how an approximation in the hierarchical Tucker format can be propagated in time. For the non-hierarchical Tucker format, equations of motion for the core tensor and the basis functions have been derived in [8, 11] from the Dirac-Frenkel variational principle, and error bounds for the dynamical low-rank approximation in the non-hierarchical Tucker format have been proven in [8]. The main contribution of our article is to extend these results to the hierarchical Tucker format. We remark, however, that very similar results have independently and simultaneously been obtained in [12].
In Sect. 2 we introduce our notation, define the hierarchical Tucker format and discuss some of its properties. In Sect. 3, the hierarchical Tucker format is applied to approximate high-dimensional initial-value problems. In particular, differential equations for the parts of the representation are derived from the Dirac-Frenkel variational principle. The last three sections are devoted to the analysis of the accuracy of this approximation. The analysis is based on matricizations of tensors which enable a compact representation of certain projections defined in Sect. 4. The second important ingredient for the error analysis are curvature bounds which are proven in Sect. 5. Finally, an error bound for the approximation of high-dimensional differential equations in the hierarchical Tucker format is proven in Sect. 6.
2 Hierarchical low-rank representation of tensors: the \(\mathcal{H}\)-Tucker format
A tensor \(T \in\mathbb{R}^{N_1 \times\cdots\times N_d}\) is a d-dimensional array with \(\prod_{j=1}^{d} N_j\) entries. Defining the index sets

$$ I_k = \{1, \ldots, N_k\}, \quad k = 1, \ldots, d, \quad\text{and}\quad I = I_1 \times\cdots\times I_d $$

leads to the natural interpretation of tensors as multivariate functions

$$ T : I \rightarrow\mathbb{R}, \qquad(i_1, \ldots, i_d) \mapsto T(i_1, \ldots, i_d). $$

This interpretation is advantageous as it simplifies the construction of high-dimensional tensors as products of low-dimensional ones, which are considered as functions that only depend on a subset \(\mu=\{\mu_1,\ldots,\mu_m\}\subseteq\{1,\ldots,d\}\) of directions, i.e.

$$ U : I_{\mu} \rightarrow\mathbb{R}, \qquad I_{\mu} = I_{\mu_1} \times\cdots\times I_{\mu_m}. $$
In order to define the multiplication of two tensors \(U : I_{\mu} \rightarrow\mathbb{R}\) and \(V : I_{\nu} \rightarrow\mathbb{R}\) for \(\mu,\nu\subseteq\{1,\ldots,d\}\), both tensors are interpreted as functions \(\widetilde{U}\) and \(\widetilde{V}\) in all variables \(\kappa=\{\kappa_1,\ldots,\kappa_k\}=\mu\cup\nu\). For example, if \(U=U(i_2,i_4,i_7)\) and \(V=V(i_2,i_5)\), then we let \(\widetilde{U}(i_2,i_4,i_5,i_7)=U(i_2,i_4,i_7)\) for all \(i_5\in I_5\) and \(\widetilde{V}(i_2,i_4,i_5,i_7)=V(i_2,i_5)\) for all \(i_4\in I_4\) and \(i_7\in I_7\). Then, the product \(U\cdot V\) is to be understood as the pointwise product of \(\widetilde{U}\) and \(\widetilde{V}\), i.e.

$$ (U \cdot V) (i_{\kappa_1}, \ldots, i_{\kappa_k}) = \widetilde{U}(i_{\kappa_1}, \ldots, i_{\kappa_k})\, \widetilde{V}(i_{\kappa_1}, \ldots, i_{\kappa_k}). $$

Besides multiplication of tensors, a second operation will be important in this article. Let \(\mu=\{\mu_1,\ldots,\mu_m\}\) and \(\nu=\{\nu_1,\ldots,\nu_n\}\) be subsets of \(\{1,\ldots,d\}\), let again \(\kappa=\{\kappa_1,\ldots,\kappa_k\}=\mu\cup\nu\), and let \(\lambda=\{\lambda_1,\ldots,\lambda_p\}=(\mu\cup\nu)\setminus(\mu\cap\nu)\) be the indices which occur either in μ or in ν, but not in both. Then, with the convention that \(\mathbb{R}^{\emptyset} = \mathbb{R}\), we define

$$ \langle U , V \rangle(i_{\lambda_1}, \ldots, i_{\lambda_p}) = \sum_{i_{\kappa_1}\in I_{\kappa_1}}^{(\kappa_1\in\mu\cap\nu)} \cdots\sum_{i_{\kappa_k}\in I_{\kappa_k}}^{(\kappa_k\in\mu\cap\nu)} (U \cdot V) (i_{\kappa_1}, \ldots, i_{\kappa_k}), \qquad(2.1) $$

where the symbol \(\sum_{i_{\kappa_j}\in I_{\kappa_j}}^{(\kappa_j\in\mu\cap\nu)}\) means that the sum is taken only if \(\kappa_j\in\mu\cap\nu\). Hence, after multiplying the two tensors we take the sum over all common directions of the two tensors. For example, in case of the tensors \(U=U(i_2,i_3,i_5,i_9)\) and \(V=V(i_2,i_4,i_5)\), (2.1) simply reduces to

$$ \langle U , V \rangle(i_3, i_4, i_9) = \sum_{i_2\in I_2} \sum_{i_5\in I_5} U(i_2,i_3,i_5,i_9)\, V(i_2,i_4,i_5). $$

The result is a tensor which depends on all directions which are not common directions of U and V. Hence, if U and V depend on exactly the same directions (i.e. \(\mu\cap\nu=\mu=\nu\)), then \(\langle U,V \rangle\in\mathbb{R}^{\emptyset} = \mathbb{R}\) is a real number, and it can be verified that \(\langle\cdot,\cdot\rangle\) defines a scalar product in this case. If both \(U=U(i_j)\) and \(V=V(i_j)\) are one-dimensional tensors with respect to the same direction, then this scalar product coincides with the Euclidean scalar product of the two vectors \((U(1),\ldots,U(N_j))\) and \((V(1),\ldots,V(N_j))\). This is why we have chosen the symbol \(\langle\cdot,\cdot\rangle\).

The numerical evaluation of \(\langle U,V\rangle\) can be simplified if U and V are products of tensors. If, for example, \(U=U(i_2,i_5)=U^{\{2\}}(i_2)\,U^{\{5\}}(i_5)\) and \(V=V(i_2,i_5)=V^{\{2\}}(i_2)\,V^{\{5\}}(i_5)\), then

$$ \langle U , V \rangle= \bigl\langle U^{\{2\}} , V^{\{2\}} \bigr\rangle \bigl\langle U^{\{5\}} , V^{\{5\}} \bigr\rangle. $$
This will be frequently used in the proofs below.
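The following sketch (with hypothetical sizes) evaluates the contraction (2.1) for this example with numpy and checks the factorization property just stated.

```python
import numpy as np

# A sketch of the contraction (2.1) for the example above: U = U2 ⊗ U5 and
# V = V2 ⊗ V5 depend on the directions {2, 5}; sizes are hypothetical.
rng = np.random.default_rng(2)
N2, N5 = 7, 6
U2, U5 = rng.standard_normal(N2), rng.standard_normal(N5)
V2, V5 = rng.standard_normal(N2), rng.standard_normal(N5)

U = np.einsum('i,j->ij', U2, U5)          # U(i2, i5) = U2(i2) U5(i5)
V = np.einsum('i,j->ij', V2, V5)

# <U, V>: sum over all common directions (here both i2 and i5).
lhs = np.einsum('ij,ij->', U, V)

# Factorized evaluation: <U2, V2> <U5, V5>, as used in the proofs below.
rhs = (U2 @ V2) * (U5 @ V5)
print(np.isclose(lhs, rhs))               # True
```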
The following definitions provide the framework of a hierarchical tensor structure.
Definition 2.1
(dimension tree)
A dimension tree T is a binary tree which represents a recursive decomposition of the set {1,…,d}. Every node \(\mathcal{N}\in T\) is a selection of directions, \(\mathcal{N}\subseteq\{1, \ldots, d \}\), where \(\mathcal{N}_{0} = \mathrm{root}(T) = \{1, \ldots, d \}\) is called the root of the tree. Apart from the root, every node \(\mathcal{N}\in T\) is linked by an edge with exactly one father \(\widehat{\mathcal{N}} \supset\mathcal{N}\). Conversely, every node is either linked with two successors (child nodes) \((\mathcal{N}_1, \mathcal{N}_2 ) = \mathit{succ} (\mathcal{N})\), or it does not have any successors at all. The successors of \(\mathcal{N}\) have the properties that

$$ \mathcal{N}_1 \cup\mathcal{N}_2 = \mathcal{N} \quad\text{and}\quad \mathcal{N}_1 \cap\mathcal{N}_2 = \emptyset. $$

\(\mathcal{N}\) is an interior node if

$$ \mathit{succ} (\mathcal{N}) \neq\emptyset, $$

and \(\mathcal{N}\) is called a leaf of the dimension tree T if

$$ \mathit{succ} (\mathcal{N}) = \emptyset. $$

The set of all interior nodes of T is denoted by \(\mathcal{I}(T)\), and the set of all leaves by \(\mathcal{L}(T)\); since a leaf cannot be subdivided any further, the leaves are exactly the singletons \(\{k\}\) with \(k\in\{1,\ldots,d\}\).
Figure 1 shows an example of a dimension tree with inner nodes \(\mathcal{N}_{0}\), \(\mathcal{N}_{1}\) and leaves \(\mathcal{N}_{2}\), \(\mathcal{N}_{11}\), \(\mathcal{N}_{12}\).
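Since Figure 1 itself is not reproduced here, the following sketch encodes a dimension tree of this shape as a small Python structure; the concrete assignment of directions to the nodes is a plausible assumption.

```python
# A minimal sketch of a dimension tree as a nested Python structure.
# Each node stores its set of directions and (for interior nodes) two children.
class Node:
    def __init__(self, dirs, children=None):
        self.dirs = frozenset(dirs)       # subset of {1, ..., d}
        self.children = children or []    # [] for leaves, [N1, N2] otherwise

    @property
    def is_leaf(self):
        return not self.children

# A tree shaped like Fig. 1 (d = 3): root N0 with children N1 and N2,
# and N1 subdivided into the leaves N11 and N12 (direction labels assumed).
N11, N12 = Node({1}), Node({2})
N1 = Node({1, 2}, [N11, N12])
N2 = Node({3})
root = Node({1, 2, 3}, [N1, N2])

# Check the defining property: each interior node is the disjoint union
# of its two successors.
def check(node):
    if node.is_leaf:
        return True
    c1, c2 = node.children
    ok = (c1.dirs | c2.dirs == node.dirs) and not (c1.dirs & c2.dirs)
    return ok and check(c1) and check(c2)

print(check(root))   # True
```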
Definition 2.2
(hierarchical Tucker format)
Let T be a dimension tree with root \(\mathcal{N}_{0}\) and let \(( r_{\mathcal{N}})_{\mathcal{N}\in T}\) be a family of positive integers with \(r_{\mathcal{N}_{0}} = 1\). Y is a tensor in the hierarchical Tucker format if and only if there exist transfer tensors

$$ \mathcal{B}^{\mathcal{N}} \in\mathbb{R}^{r_{\mathcal{N}} \times r_{\mathcal{N}_1} \times r_{\mathcal{N}_2}} \quad\text{for all } \mathcal{N}\in\mathcal{I}(T) \text{ with } (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ}(\mathcal{N}) $$

and univariate functions

$$ U^{\ell}_1, \ldots, U^{\ell}_{r_{\ell}} : I_{\ell}\rightarrow\mathbb{R} \quad\text{for all } \ell\in\mathcal{L}(T) $$

such that Y can be represented by

$$ Y = U^{\mathcal{N}_0}_1 $$

with \(\boldsymbol{\mathcal{N}}\)-frames \(U^{\mathcal{N}}_1, \ldots, U^{\mathcal{N}}_{r_{\mathcal{N}}} : I_{\mathcal{N}} \rightarrow\mathbb{R}\) defined by the recursion

$$ U^{\mathcal{N}}_i = \sum_{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^{\mathcal{N}}_{i,j,l}\, U^{\mathcal{N}_1}_j \cdot U^{\mathcal{N}_2}_l, \quad\mathcal{N}\in\mathcal{I}(T),\ (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ}(\mathcal{N}). $$
The family \((r_{\mathcal{N}})_{\mathcal{N}\in T}\) is called the hierarchical representation rank of Y. The set of all tensors in the hierarchical Tucker format with hierarchical representation rank \((r_{\mathcal{N}})_{\mathcal{N}\in T}\) is denoted by \(\mathcal{H}\text {-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\).
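A minimal sketch of the reconstruction recursion, reusing the Node class from the sketch above; the extra attributes node.U (a matrix of shape (N_k, r_leaf) whose columns are the leaf basis functions) and node.B (a transfer tensor of shape (r_N, r_N1, r_N2), with r_N0 = 1 at the root) are hypothetical names.

```python
import numpy as np

def frames(node):
    """Stack the N-frames into an array of shape (r_N, <sizes of N>).

    The axes of the result follow the left-to-right order of the leaves
    below the node."""
    if node.is_leaf:
        return node.U.T                                  # (r_leaf, N_k)
    U1, U2 = map(frames, node.children)
    tmp = np.tensordot(node.B, U1, axes=([1], [0]))      # contract index j
    return np.tensordot(tmp, U2, axes=([1], [0]))        # contract index l

def reconstruct(root):
    return frames(root)[0]                               # Y = U^{N_0}_1
```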
Remark
1.
Since the transfer tensors in Definition 2.2 are always three-dimensional objects, we do not interpret these particular tensors as multivariate functions.
2.
Every tensor \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) can be reconstructed from the transfer tensors \((\mathcal{B}^{\mathcal{N}})_{\mathcal{N}\in\mathcal{I}(T)}\) and the functions \((U^{\ell})_{\ell\in\mathcal{L}(T)} \) via the above recursion. The \(\mathcal{N}\)-frames at the interior nodes can be considered as auxiliary variables which are only used to reconstruct the full tensor Y if necessary.
3.
Definitions 2.1 and 2.2 have been adapted from Definitions 3.3 and 3.6 in [2]. At this point, it is not yet assumed that the \(\mathcal{N}\)-frames are linearly independent. For the approximation of time-dependent problems, however, we will later have to assume that the \(\mathcal{N}\)-frames at every node \(\mathcal{N}\in T\setminus\{ \mathcal{N}_{0}\}\) are an orthonormal basis, and that the above representation is not redundant, cf. Assumptions 1 and 2 below.
4.
A special case of the hierarchical Tucker format is the so-called tensor train format which has been investigated, e.g., in [5].
At this point, a comparison with the standard (non-hierarchical) Tucker format is helpful. For the hierarchical Tucker format, an equation similar to (1.1) can be obtained, but due to the recursive definition of the \(\mathcal{N}\)-frames, the corresponding formula is much more complicated. The crucial difference is the fact that the number of terms in the linear combination can be considerably lower. In fact, the total number of degrees of freedom in the hierarchical Tucker format is at most

$$ \sum_{k=1}^{d} r_{\{k\}} N_k + \sum_{\mathcal{N}\in\mathcal{I}(T)} r_{\mathcal{N}}\, r_{\mathcal{N}_1} r_{\mathcal{N}_2} \quad\text{with } (\mathcal{N}_1, \mathcal{N}_2) = \mathit{succ}(\mathcal{N}), $$

because every \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) is defined by the \(r_{\ell}\) basis functions at the leaves \(\ell=\{k\}\) with \(k=1,\ldots,d\) and by the entries of all transfer tensors, cf. Lemma 3.7 in [2]. In particular, if \(r_{\mathcal{N}} \leq r\) for all nodes and \(N_k \leq N\) for all directions, this number is bounded by \(dNr + (d-1)r^3\), which grows only linearly in d.
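The count can be computed directly on the tree structure; the following sketch reuses the Node class and the tree `root` from above, with hypothetical ranks and sizes.

```python
# Degrees of freedom in the hierarchical format: each leaf {k} contributes
# r_leaf * N_k, each interior node contributes r_N * r_N1 * r_N2.
rank = {frozenset(s): r for s, r in
        [({1}, 4), ({2}, 3), ({3}, 5), ({1, 2}, 6), ({1, 2, 3}, 1)]}
N = {1: 100, 2: 80, 3: 90}

def dof(node):
    if node.is_leaf:
        (k,) = node.dirs
        return rank[node.dirs] * N[k]
    c1, c2 = node.children
    return rank[node.dirs] * rank[c1.dirs] * rank[c2.dirs] + dof(c1) + dof(c2)

print(dof(root), 100 * 80 * 90)   # 1192 vs. 720000
```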
Lemma 1
(orthonormality of transfer tensors)
Let \(Y \in\mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T})\) be a tensor represented by \(((\mathcal{B}^{\mathcal{N}})_{\mathcal{N}\in\mathcal{I}(T)}, ( U^{\ell})_{\ell\in\mathcal{L}(T)})\). Then, the following statements are equivalent:
(a)
The \(\mathcal{N}\)-frames \(( U^{\mathcal{N}} )_{\mathcal{N}\in T \setminus\{ \mathcal{N}_{0} \} }\) are orthonormal with respect to mapping (2.1), i.e.
$$ \bigl\langle U^\mathcal{N}_{i} , U^\mathcal{N}_{j} \bigr\rangle= \delta_{i,j} $$

holds for all \(\mathcal{N}\in T \setminus\{ \mathcal{N}_{0} \} \) and \(i,j \in\{1, \ldots, r_{\mathcal{N}}\} \).
(b)
For all \(\mathcal{N}\in\mathcal {I}(T)\setminus\{ \mathcal{N}_{0}\}, (\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit{succ}(\mathcal{N})\) and \(i_{1},i_{2} \in\{1, \dots, r_{\mathcal{N}}\}\) the equation
$$ \sum_{j=1}^{r_{\mathcal{N}_1}} \sum_{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^\mathcal{N}_{i_1,j,l} \mathcal{B}^\mathcal{N}_{i_2,j,l} = \delta_{i_1,i_2} \qquad(2.3) $$

holds and the ℓ-frames \(( U^{\ell})_{\ell\in\mathcal{L}(T)}\) are orthonormal with respect to mapping (2.1), i.e.
$$ \bigl\langle U^\ell_{i} , U^\ell_{j} \bigr\rangle= \delta_{i,j} $$

holds for all \(\ell\in\mathcal{L}(T)\) and \(i,j \in\{1, \ldots, r_{\ell}\}\).
Proof
Let \(\mathcal{N}\in\mathcal{I}(T)\) be a node in the interior of the dimension tree, \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit {succ}(\mathcal{N})\) and \(i_{1},i_{2} \in\{1, \ldots,r_{\mathcal{N}}\}\). If (a) holds, then it follows from Definition 2.2 that
which proves (a) ⇒ (b). Now assume that (b) holds, and that \(\mathcal{N}_{1}, \mathcal{N}_{2} \in\mathcal{L}(T)\). Then, Eq. (2.4) yields that
for the father \(\mathcal{N}\) of \(\mathcal{N}_{1}\) and \(\mathcal{N}_{2}\), and by induction over the levels of the tree we obtain (b) ⇒ (a). □
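The implication (b) ⇒ (a) can be checked numerically for a single interior node; in the following sketch all sizes are hypothetical, and the transfer tensor is built to satisfy (2.3) by reshaping a matrix with orthonormal rows.

```python
import numpy as np

# Numerical check of Lemma 1 for one interior node with two leaf children.
rng = np.random.default_rng(3)
N1, N2, r1, r2, rN = 8, 9, 3, 4, 5        # requires rN <= r1 * r2

U1 = np.linalg.qr(rng.standard_normal((N1, r1)))[0]   # orthonormal columns
U2 = np.linalg.qr(rng.standard_normal((N2, r2)))[0]

# Build B satisfying (2.3): reshape an rN x (r1 r2) matrix with
# orthonormal rows into a transfer tensor of shape (rN, r1, r2).
B = np.linalg.qr(rng.standard_normal((r1 * r2, rN)))[0].T.reshape(rN, r1, r2)

# N-frames via the recursion of Definition 2.2.
UN = np.einsum('ijl,aj,bl->iab', B, U1, U2)           # shape (rN, N1, N2)

# The Gram matrix <U^N_i, U^N_j> should be the identity.
G = np.einsum('iab,jab->ij', UN, UN)
print(np.allclose(G, np.eye(rN)))                     # True
```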
Assumption 1
For every tensor Y in the hierarchical Tucker format we assume the \(\mathcal{N}\)-frames \(( U^{\mathcal{N}} )_{\mathcal{N}\in T \setminus\{\mathcal{N}_{0}\}}\) to be orthonormal with respect to mapping (2.1), i.e.

$$ \bigl\langle U^{\mathcal{N}}_i , U^{\mathcal{N}}_j \bigr\rangle= \delta_{i,j} \quad\text{for all } \mathcal{N}\in T \setminus\{\mathcal{N}_0\},\ i,j \in\{1, \ldots, r_{\mathcal{N}}\}. $$
Since for any \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) we can find a representation with orthonormal \(\mathcal{N}\)-frames, Assumption 1 is not a restriction; see [2, Algorithm 3] for an orthonormalization algorithm. It will be shown in Lemma 4 that when the tensor Y approximates the time-dependent solution of a high-dimensional differential equation, the differential equations for the \(\mathcal{N}\)-frames preserve the orthonormality for all times.
An important consequence of Assumption 1 is that for \(\mathcal{N}\in\mathcal{I}(T)\) with \((\mathcal{N}_{1}, \mathcal{N}_{2}) = \mathit{succ}(\mathcal{N})\), we have

$$ \bigl\langle U^{\mathcal{N}}_i , U^{\mathcal{N}_1}_k \bigr\rangle= \sum_{l=1}^{r_{\mathcal{N}_2}} \mathcal{B}^{\mathcal{N}}_{i,k,l}\, U^{\mathcal{N}_2}_l $$

for any \(k\in\{1, \ldots, r_{\mathcal{N}_{1}}\}\) and hence

$$ U^{\mathcal{N}}_i = \sum_{k=1}^{r_{\mathcal{N}_1}} \bigl\langle U^{\mathcal{N}}_i , U^{\mathcal{N}_1}_k \bigr\rangle\cdot U^{\mathcal{N}_1}_k. $$

Substituting this into the recursive representation of Y yields

$$ Y = \sum_{i=1}^{r_{\mathcal{N}_1}} \bigl\langle Y , U^{\mathcal{N}_1}_i \bigr\rangle\cdot U^{\mathcal{N}_1}_i \qquad(2.5) $$

for all \(\mathcal{N}_{1} \in T \setminus\{ \mathcal{N}_{0} \}\).
for all \(\mathcal{N}_{1} \in T \setminus\{ \mathcal{N}_{0} \}\). These and similar formulas will be frequently used henceforth.
Lemma 2
(minimal representation rank)
Let \(Y \in\mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) be a tensor in the hierarchical Tucker format represented by \(( ( \mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)} , ( U^{\ell})_{\ell\in\mathcal{L}(T)} )\). Assume that the tensors

$$ \bigl\langle Y , U^{\mathcal{N}}_1 \bigr\rangle, \ldots, \bigl\langle Y , U^{\mathcal{N}}_{r_{\mathcal{N}}} \bigr\rangle $$

are linearly independent for all \(\mathcal{N}\in T\setminus\{ \mathcal{N}_{0} \}\). Then, Y has minimal representation rank, which means that there exists no other family of positive integers \((r_{\mathcal{N}}^{*} )_{\mathcal{N}\in T}\) and no node \(\mathcal{N}^{*} \in T\) with

$$ r_{\mathcal{N}^{*}}^{*} < r_{\mathcal{N}^{*}} \qquad(2.6) $$

such that \(Y \in\mathcal{H}\textit{-Tucker} ( ( r_{\mathcal{N}}^{*} )_{\mathcal{N}\in T} )\).
Proof
Suppose that Y can also be represented by \(( ( \widetilde{\mathcal{B}}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)}, ( \widetilde{U}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{L}(T)} )\) and that the hierarchical representation rank \((r_{\mathcal{N}}^{*} )_{\mathcal{N}\in T}\) of this representation has the properties (2.6). According to (2.5) we have
Substituting the first representation into the second yields
and hence
This is a contradiction to the assumption that \(\langle Y , U^{\mathcal{N}^{*}}_{1} \rangle, \ldots, \langle Y , U^{\mathcal{N}^{*}}_{r_{\mathcal{N}^{*}}} \rangle\) are linearly independent. □
Remark
It can be shown that the converse assertion is also true.
Assumption 2
For every tensor Y in the hierarchical Tucker format and for every \(\mathcal{N}\in T \setminus\{\mathcal{N}_{0}\}\) we assume the tensors

$$ \bigl\langle Y , U^{\mathcal{N}}_1 \bigr\rangle, \ldots, \bigl\langle Y , U^{\mathcal{N}}_{r_{\mathcal{N}}} \bigr\rangle $$

to be linearly independent.
As shown in Sect. 4, under Assumption 2 the set \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) is a smooth manifold, which is essential for our approach. Furthermore, we will use the term ‘hierarchical rank’ instead of ‘hierarchical representation rank’ and refer to the \(\mathcal{N}\)-frames as basis functions. Assumption 2 and Lemma 2 motivate the following definition.
Definition 2.3
(Single-hole operator)
Let \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) be a tensor in the hierarchical Tucker format, which is represented by the tuple \(( ( \mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)}, ( U^{\ell})_{\ell\in\mathcal{L}(T)} )\), \(\mathcal{N}\in T \setminus\{ \mathcal{N}_{0}\}\) and \(i \in\{1,\ldots, r_{\mathcal{N}}\}\). The single-hole operator \(\mathcal{S}\) is defined as

$$ \mathcal{S}(Y, \mathcal{N}, i) := \bigl\langle Y , U^{\mathcal{N}}_i \bigr\rangle. $$
This is a generalization of the single-hole functions defined in Sect. II.3.3 in [11].
Remark
With the single-hole operator, (2.5) reads

$$ Y = \sum_{i=1}^{r_{\mathcal{N}}} \mathcal{S}(Y, \mathcal{N}, i) \cdot U^{\mathcal{N}}_i \qquad(2.7) $$
for all \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T}),\mathcal{N}\in T \setminus\{\mathcal{N}_{0} \}\).
Definition 2.4
For \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T}), \mathcal{N}\in T \setminus\{\mathcal{N}_{0} \} \) and \(X \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) the orthogonal projection of X onto the space spanned by \(U^{\mathcal{N}}_{1} , \ldots, U^{\mathcal{N}}_{r_{\mathcal{N}}}\) is denoted by

$$ \mathcal{P}_{\mathcal{N}} X = \sum_{i=1}^{r_{\mathcal{N}}} \bigl\langle X , U^{\mathcal{N}}_i \bigr\rangle\cdot U^{\mathcal{N}}_i. $$
The orthogonal projection onto the orthogonal complement is given by \(\mathcal{P}_{\mathcal{N}}^{\bot}X = X- \mathcal{P}_{\mathcal{N}}X \).
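The following sketch evaluates \(\mathcal{P}_{\mathcal{N}}\) with numpy for a node \(\mathcal{N}=\{1,2\}\) of a three-dimensional tensor; all sizes are hypothetical.

```python
import numpy as np

# Projection of Definition 2.4 for N = {1, 2} of a 3-D tensor X.
rng = np.random.default_rng(4)
N1, N2, N3, rN = 6, 5, 7, 4

# Orthonormal frames U^N_i(i1, i2), stored as an (rN, N1, N2) array.
UN = np.linalg.qr(rng.standard_normal((N1 * N2, rN)))[0].T.reshape(rN, N1, N2)
X = rng.standard_normal((N1, N2, N3))

# <X, U^N_i> sums over the directions in N and leaves a function of i3.
S = np.einsum('abc,iab->ic', X, UN)                   # shape (rN, N3)
PX = np.einsum('ic,iab->abc', S, UN)                  # P_N X

# P_N is idempotent, and P_N^⊥ X = X - P_N X is orthogonal to every U^N_i.
print(np.allclose(np.einsum('abc,iab->ic', PX, UN), S))      # True
print(np.allclose(np.einsum('abc,iab->ic', X - PX, UN), 0))  # True
```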
3 Solving high-dimensional initial value problems via hierarchical tensor approximation
In this section we show how the hierarchical Tucker format can be used to approximate the time-dependent solution \(Y_{ex}(t)\in\mathbb{R}^{N_1 \times\cdots\times N_d}\) of a high-dimensional ordinary differential equation

$$ \dot{Y}_{ex}(t) = F \bigl( Y_{ex}(t) \bigr) \qquad(3.1) $$

with a given initial value \(Y_{ex}(0) \in\mathbb{R}^{N_1 \times\cdots\times N_d}\) and a function F mapping tensors to tensors. Such problems arise, e.g., when the method of lines is applied to a high-dimensional partial differential equation. If the partial differential equation is linear, then the function F is linear, too. The exact solution \(Y_{ex}(t)\) of (3.1) is a time-dependent, d-dimensional tensor with \(\prod_{\mu= 1}^{d} N_{\mu}\) entries. Hence, classical numerical schemes for ordinary differential equations cannot be applied if \(d\gg3\) due to the huge number of unknowns. Therefore, we are looking for an approximation \(Y\approx Y_{ex}\) which lies on the manifold

$$ \mathcal{M}= \mathcal{H}\text{-Tucker}\bigl((r_{\mathcal{N}})_{\mathcal{N}\in T}\bigr) $$
at any time. For this purpose, the variational principle of Dirac-Frenkel is applied (cf. [11]): Starting with an approximation of the initial value on the manifold, the right-hand side F(Y) is projected onto \(\mathcal {T}_{Y}\mathcal{M}\), the tangent space of \(\mathcal{M}\) at Y(t), for every t≥0; see Fig. 2 for an illustration.
With P(Y) denoting the orthogonal projector onto the tangent space \(\mathcal{T}_{Y} \mathcal{M}\), the differential equation for \(Y(t)\in\mathcal{M}\) takes the form

$$ \bigl\langle\dot{Y} - F(Y) , \delta Y \bigr\rangle= 0 \quad\text{for all } \delta Y \in\mathcal{T}_Y \mathcal{M}, $$

or equivalently

$$ \dot{Y} = P(Y) F(Y). \qquad(3.2) $$
Even in applications where the original problem (3.1) is linear, the new differential equation (3.2) is nonlinear. The great advantage of (3.2) is the fact that for the propagation of Y(t) only the transfer tensors \(\mathcal{B}^{\mathcal{N}}(t)\), \(\mathcal{N}\in\mathcal{I}(T)\) and the basis functions \(U^{\ell}_{i}(t)\), \(\ell\in\mathcal{L}(T)\) for the time-dependent representation of Y(t) have to be computed. In order to make use of this advantage, however, (3.2) has to be replaced by differential equations for \(\mathcal{B}^{\mathcal{N}}(t)\) and \(U^{\ell}_{i}(t)\) instead of Y(t). This is the goal of this section.
Since every point Y on the manifold is determined by the tuple

$$ \bigl( \bigl( \mathcal{B}^{\mathcal{N}} \bigr)_{\mathcal{N}\in\mathcal{I}(T)} , \bigl( U^{\ell}\bigr)_{\ell\in\mathcal{L}(T)} \bigr), $$

the transfer tensors and the basis functions on the leaves of the dimension tree can be considered as the parameters of the representation. Let Ψ be the mapping which maps the parameters to the corresponding tensor, i.e.

$$ \varPsi\bigl( \bigl( \mathcal{B}^{\mathcal{N}} \bigr)_{\mathcal{N}\in\mathcal{I}(T)} , \bigl( U^{\ell}\bigr)_{\ell\in\mathcal{L}(T)} \bigr) = \widetilde{Y}, $$

where \(\widetilde{Y}\) is constructed according to Definition 2.2. Then, each element δY of the tangent space \(\mathcal{T}_{Y} \mathcal{M}\) can be written as

$$ \delta Y = \sum_{\mathcal{N}\in\mathcal{I}(T)} \frac{\partial\varPsi}{\partial\mathcal{B}^{\mathcal{N}}}\, \delta\mathcal{B}^{\mathcal{N}} + \sum_{\ell\in\mathcal{L}(T)} \frac{\partial\varPsi}{\partial U^{\ell}}\, \delta U^{\ell}, \qquad(3.3) $$
where the tuple \(( ( \delta\mathcal{B}^{\mathcal{N}} )_{\mathcal {N}\in\mathcal{I}(T)}, ( \delta U^{\ell})_{\ell\in\mathcal{L}(T)} )\) represents all possible variations of the transfer tensors and the basis functions on the leaves of the dimension tree. Since Ψ is linear in each argument, we find with
and
expressions for every partial derivative in (3.3). Therefore, the tangent space can be written as
Remark
Since Assumption 1 imposes no restriction on the set \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\), the orthonormality conditions impose no restriction on the tangent space either. Even the restriction of the mapping Ψ to transfer tensors and basis functions which fulfill Assumption 2 does not influence the tangent space, since the restricted set is a dense and open subset of the domain, cf. Sect. 3.2 in [14].
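Before turning to the recursive equations of motion, here is a generic numerical sketch of the Dirac-Frenkel projection (not the paper's scheme): for any differentiable parametrization psi mapping the stacked parameters theta to vec(Y), the tangent space is the range of the Jacobian J of psi, and the orthogonal projection of a vector b is J J⁺ b. The names psi and theta are hypothetical, and finite differences are used only for illustration.

```python
import numpy as np

def jacobian(psi, theta, h=1e-6):
    """Finite-difference Jacobian of psi at theta (illustration only)."""
    y0 = psi(theta)
    J = np.empty((y0.size, theta.size))
    for i in range(theta.size):
        t = theta.copy()
        t[i] += h
        J[:, i] = (psi(t) - y0) / h
    return J

def tangent_project(psi, theta, b):
    J = jacobian(psi, theta)
    # The minimum-norm least-squares solution gives J^+ b; then J (J^+ b)
    # is the orthogonal projection of b onto range(J) = T_Y M, even if J
    # is rank-deficient due to the redundancy of the parametrization.
    coeff, *_ = np.linalg.lstsq(J, b, rcond=None)
    return J @ coeff
```

Theorem 3 below replaces this generic least-squares projection by recursive differential equations that never form J explicitly.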
We are now ready to formulate the differential equations for the transfer tensors and the basis functions at the leaves. In view of the recursive definition of \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\), it does not come as a surprise that these differential equations are defined recursively, too. The same result has also been obtained with other techniques in [12] and [14].
Theorem 3
(equations of motion)
Let \(Y \in\mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) be a tensor in the hierarchical Tucker format, let \((\mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)}\) be the corresponding transfer tensors, and let \(( U^{\ell})_{\ell\in\mathcal{L}(T)}\) be the corresponding basis functions on the leaves of the dimension tree. Furthermore we define for every \(\mathcal{N}\in T \setminus\{ \mathcal{N}_{0} \}\) the symmetric stiffness matrix

$$ W^{\mathcal{N}} = \bigl( \bigl\langle\mathcal{S}(Y, \mathcal{N}, i) , \mathcal{S}(Y, \mathcal{N}, j) \bigr\rangle\bigr)_{i,j=1}^{r_{\mathcal{N}}} \in\mathbb{R}^{r_{\mathcal{N}} \times r_{\mathcal{N}}} \qquad(3.8) $$

and its inverse \((W^{\mathcal{N}})^{-1}\), which exists by Assumption 2.
Then, the differential equation
is equivalent to
with \((\mathcal{N}_{01},\mathcal{N}_{02}) = \mathit{succ} (\mathcal {N}_{0})\). The time derivatives of the basis functions on an interior node of the dimension tree satisfy the recursion
for all \(\mathcal{N}\in\mathcal{I} (T) \setminus\{\mathcal{N}_{0} \} \) with \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit{succ} (\mathcal{N})\). On the leaves of the dimension tree we have
The transfer tensors evolve according to the differential equations
and
We remark that the crucial terms are \(\dot{U}^{\ell}_{i}\) for \(\ell\in\mathcal{L}(T)\) and \(\dot{\mathcal{B}}^{\mathcal{N}}_{i,j,l}\) for \(\mathcal{N}\in \mathcal{I}(T)\). If these objects are known, all \(\dot{U}^{\mathcal{N}}_{i}\) with \(\mathcal{N}\in\mathcal{I} (T) \setminus\{\mathcal{N}_{0} \}\) can be computed recursively via (3.10). Hence, the time derivatives of the \(\mathcal{N}\)-frames at the interior nodes can be considered as auxiliary variables.
Before the proof of Theorem 3, we show that the solution of the equations of motion preserves the orthonormality of the basis functions.
Lemma 4
(Gauge-conditions)
Let \(( ( \mathcal{B}^{\mathcal{N}}(t) )_{\mathcal{N}\in\mathcal {I}(T)} , ( U^{\ell}(t) )_{\ell\in\mathcal{L}(T)} )\) be the solution of the equations of motion of Theorem 3, and let \(( (\dot{\mathcal{B}}^{\mathcal{N}}(t) )_{\mathcal{N}\in\mathcal{I}(T)} , ( \dot{U}^{\ell}(t) )_{\ell\in \mathcal{L}(T)} )\) be the corresponding time derivatives. Then the Gauge-condition
is satisfied for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\), \(i,j \in\{1, \ldots,r_{\mathcal{N}}\}\) and t≥0. Furthermore, the orthonormality of the basis functions
is preserved for all t≥0 if (3.12) holds for t=0.
Proof of Lemma 4
For all \(\mathcal{N}\in\mathcal{I}(T)\setminus\{\mathcal{N}_{0}\}\), \((\mathcal{N}_{1},\mathcal{N}_{2} ) = \mathit{succ}(\mathcal{N})\) and \(i_{1},i_{2} \in\{1, \ldots, r_{\mathcal{N}}\} \) the equations
hold. Furthermore, we have
for all \(\ell\in\mathcal{L}(T), i_{1},i_{2} \in\{1, \ldots, r_{\ell}\} \). Now suppose that
for an interior node \(\mathcal{N}\in\mathcal{I}(T)\setminus\{ \mathcal{N}_{0} \}\), \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit {succ}(\mathcal{N})\) for all \(j_{1},j_{2} \in\{1, \ldots, r_{\mathcal{N}_{1}} \}\) and \(l_{1},l_{2} \in\{1, \ldots, r_{\mathcal{N}_{2}} \}\). Then with (3.13) we deduce
for all \(i_{1},i_{2} \in\{1, \ldots, r_{\mathcal{N}}\}\). Hence, the Gauge-conditions are satisfied, and the equations
show the preservation of the orthonormality. □
Proof of Theorem 3
In order to prove the theorem it has to be shown that the variational equation
is satisfied for every variation \(\delta Y \in\mathcal{T}_{Y} \mathcal{M}\).
(i) We start with the variation
for an arbitrary \(\delta\mathcal{B}^{\mathcal{N}_{0}} \in\mathbb {R}^{r_{\mathcal{N}_{1}} \times r_{\mathcal{N}_{2}}}, j \in\{1, \ldots, r_{\mathcal{N}_{1}} \},l \in\{1,\ldots, r_{\mathcal{N}_{2}} \} \). Together with (3.9) this yields
This proves that
(ii) Next, we assume \(\mathcal{N}\in\mathcal{I}(T)\setminus\{ \mathcal{N}_{0}\}\), \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit{succ}(\mathcal{N}) \) and choose the variation
for an arbitrary \(\delta\mathcal{B}^{\mathcal{N}} \in\mathbb{R}^{r_{\mathcal{N}}\times r_{\mathcal{N}_{1}} \times r_{\mathcal{N}_{2}}}\), \(i \in\{1,\ldots, r_{\mathcal{N}} \}\), \(j \in\{1,\ldots, r_{\mathcal{N}_{1}} \}\), \(l \in\{1,\ldots, r_{\mathcal{N}_{2}} \} \). Differentiating both sides of (2.7) with respect to t gives
and since \(\mathcal{S}(Y, \mathcal{N}, i) \cdot U^{\mathcal{N}}_{i'} \in\mathcal{T}_{Y} \mathcal{M}\) we obtain
On the other hand
holds. Since (3.16) is equal to (3.17), it follows that
(iii) Finally, let \(\ell\in\mathcal{L}(T)\) and consider the variation
for an arbitrary \(\delta U^{\ell}_{i} : I_{\ell}\rightarrow\mathbb{R}, i\in\{1, \ldots,r_{\ell}\}\). As before, the time derivative of (2.7) yields
and with \(\mathcal{S}( Y, \ell, i) \cdot U^{\ell}_{i'} \in\mathcal {T}_{Y} \mathcal{M}\) we get
On the other hand
holds. Since (3.19) is equal to (3.20), it follows that
Superpositions of the variations (3.14), (3.15) and (3.18) span the whole tangent space, and therefore
□
Corollary 5
Let \(B \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) be an arbitrary tensor, let \(Y\in\mathcal{M}\) and let P(Y)B be the orthogonal projection of B onto the tangent space \(\mathcal{T}_{Y} \mathcal{M}\) defined by
Then, formally replacing \(\dot{Y}\) by P(Y)B and F(Y) by B in Theorem 3 yields a recursive representation of P(Y)B.
4 Matrix representation of tensors
Since the solution \(Y_{ex}\) of the original differential equation (3.1) does not, in general, evolve on the approximation manifold \(\mathcal{M} = \mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\), the solution Y of the projected problem only yields an approximation \(Y\approx Y_{ex}\). For the error analysis of this approximation, the matricization of a tensor is a useful concept. Such a matricization turns a tensor into a matrix with the same entries, just as the command reshape(…) in Matlab. For any \(\mathcal{N}\in T\) the set \(\mathcal{N}_{C}:=\{1, \ldots,d \} \setminus\mathcal{N}\) is called the complement node of \(\mathcal{N}\). Then, the matricization

$$ Y^{(\mathcal{N}, \mathcal{N}_C)} \in\mathbb{R}^{ (\prod_{k\in\mathcal{N}} N_k ) \times(\prod_{k\in\mathcal{N}_C} N_k ) } $$

of a tensor \(Y \in\mathbb{R}^{N_1 \times\cdots\times N_d}\) rearranges the elements of the tensor in such a way that all the directions in \(\mathcal{N}\) range over the rows whereas all the directions in \(\mathcal{N}_{C}\) range over the columns of the matricization. There are many different ways to do this, and it does not matter which particular mapping is used. The only condition we impose is that \(Y^{(\mathcal{N}, \mathcal{N}_{C})} = ( Y^{(\mathcal{N}_{C}, \mathcal{N})} )^{T}\). A corresponding notation will be used for the inverse mapping, which restores the tensor from its matricization. A special case of a matricization is the vectorization, which reshapes a tensor into a vector \(\operatorname{vec}(Y)\); its inverse is denoted accordingly.
For \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) and \(\mathcal{N}\in T\backslash\{\mathcal{N}_{0}\}\), let the single-hole matrix

$$ \mathbb{S}^{\mathcal{N}} = \bigl( \operatorname{vec} \bigl( \mathcal{S}(Y,\mathcal{N},1) \bigr), \ldots, \operatorname{vec} \bigl( \mathcal{S}(Y,\mathcal{N},r_{\mathcal{N}}) \bigr) \bigr) $$

be the matrix whose j-th column is the vectorization of \(\mathcal{S}(Y,\mathcal{N},j)\). Similarly, the orthonormal basis matrix is defined by

$$ \mathbb{U}^{\mathcal{N}} = \bigl( \operatorname{vec} \bigl( U^{\mathcal{N}}_1 \bigr), \ldots, \operatorname{vec} \bigl( U^{\mathcal{N}}_{r_{\mathcal{N}}} \bigr) \bigr). $$

Assumption 1 implies that \((\mathbb{U}^{\mathcal{N}})^{T}\mathbb{U}^{\mathcal{N}}=I\), and it follows from (2.7) that

$$ Y^{(\mathcal{N}, \mathcal{N}_C)} = \mathbb{U}^{\mathcal{N}} \bigl( \mathbb{S}^{\mathcal{N}} \bigr)^{T}. \qquad(4.1) $$

Hence, the singular values of the matricization \(Y^{(\mathcal{N}, \mathcal{N}_C)}\) coincide with the singular values of the single-hole matrix \(\mathbb{S}^{\mathcal{N}}\). The projection onto the space spanned by the columns of \(\mathbb{U}^{\mathcal{N}}\) and onto the orthogonal complement will be denoted by

$$ P^{\mathcal{N}} = \mathbb{U}^{\mathcal{N}} \bigl( \mathbb{U}^{\mathcal{N}} \bigr)^{T} \quad\text{and}\quad P^{\mathcal{N}\bot} = I - \mathbb{U}^{\mathcal{N}} \bigl( \mathbb{U}^{\mathcal{N}} \bigr)^{T}, $$

respectively.
respectively. In addition we define for \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit {succ}(\mathcal{N})\) the binary matrix operator
After these preparations, the results of [14] can now be adapted to show that \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T})\) is indeed a manifold.
Lemma 6
The set
is a smooth submanifold of \(\mathbb{R}^{N_{1} \times\cdots\times N_{d} }\).
Proof
In [14, Theorem 4.11] it has been shown that
is a smooth submanifold of \(\mathbb{R}^{N_{1} \times\cdots\times N_{d} }\). Hence, it only has to be shown that \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})=\widehat{\mathcal{M}}\). Let \(Y \in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) be represented by \(( ( \mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)}, ( U^{\ell})_{\ell\in\mathcal{L}(T)} )\). By Assumption 2, the tensors \(\langle Y , U^{\mathcal{N}}_{1} \rangle, \ldots, \langle Y , U^{\mathcal{N}}_{r_{\mathcal{N}}} \rangle\) are linearly independent for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\), which means that
and therefore \(\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T}) \subseteq\widehat{\mathcal{M}}\). Conversely, for every \(\widehat{Y}\in\widehat{\mathcal{M}}\) Proposition 3.6 in [14] and orthogonalization with Algorithm 3 in [2] ensures the existence of \(( ( \mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)}, ( U^{\ell})_{\ell\in\mathcal{L}(T)} )\) such that the equation
and the recursion
hold for all \(\mathcal{N}\in\mathcal{I}(T) , i \in\{1, \ldots, r_{\mathcal{N}}\}\) and \((\mathcal{N}_{1}, \mathcal{N}_{2}) = \mathit{succ}(\mathcal{N})\). The rank of the single-hole matrix \(\mathbb{S}^{\mathcal{N}}= ( \widehat{Y}^{(\mathcal{N}, \mathcal{N}_{C})} )^{T} \mathbb{U}^{\mathcal{N}}\) equals \(r_{\mathcal{N}}\) for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\), which means that the tensors \(\langle\widehat{Y} , U^{\mathcal{N}}_{1} \rangle, \ldots, \langle\widehat{Y} , U^{\mathcal{N}}_{r_{\mathcal{N}}} \rangle\) are linearly independent for all \(\mathcal{N}\in T \setminus\{ \mathcal{N}_{0} \}\) and therefore \(\widehat{\mathcal{M}} \subseteq\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\). □
One of the main ingredients for the error analysis is a compact representation of the projection of a tensor on the tangent space of the manifold, i.e. an explicit representation which is not based on a recursion. Such a compact representation can be obtained in matrix-vector notation. As a preparatory step, we reformulate the results of Corollary 5 in matrix-vector notation.
Corollary 7
Let \(Y \in\mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal {N}\in T})\) be a tensor in the hierarchical Tucker format represented by the transfer tensors \((\mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal{I}(T)}\) and by the basis functions \(( U^{\ell})_{\ell\in\mathcal{L}(T)}\) on the leaves of the dimension tree. If \(B \in\mathbb{R}^{N_{1} \times \cdots\times N_{d}}\) is an arbitrary tensor, then under the conditions of Corollary 5 there exists a tuple \(( ( \delta\mathcal{B}^{\mathcal{N}} )_{\mathcal{N}\in\mathcal {I}(T)} ,(\delta U^{\ell})_{\ell\in\mathcal{L}(T)} )\) such that for \((\mathcal {N}_{01},\mathcal{N}_{02}) = \mathit{succ} (\mathcal{N}_{0})\)
with
For all \(\mathcal{N}\in\mathcal{I} (T) \backslash\{\mathcal{N}_{0} \} \) with \((\mathcal{N}_{1},\mathcal{N}_{2}) = \mathit{succ} (\mathcal{N})\), \(\delta U^{\mathcal{N}}\) is defined by the recursion
with
Here and below, M + is the pseudoinverse of a matrix M, i.e. M +=(M T M)−1 M T if M has full column rank.
Proof
The equation for \(\delta\mathcal{B}^{\mathcal{N}_{0}}\) can be deduced directly from Corollary 5. For \(\mathcal{N}\in\mathcal{I}(T)\backslash\{\mathcal{N}_{0} \} \) and \((\mathcal{N}_{1}, \mathcal{N}_{2}) = \mathit{succ}(\mathcal {N})\), Corollary 5 states that
for all \(j \in\{1,\ldots,r_{\mathcal{N}_{1}} \} , l \in\{1,\ldots ,r_{\mathcal{N}_{2}} \} \). Since \(W^{\mathcal{N}}\) defined in (3.8) satisfies
we obtain
Last, for \(\ell\in\mathcal{L}(T)\) and i=1,…,r ℓ we recall from Corollary 5 that
and substituting (4.1) with \(\mathcal{N}=\ell\) yields
□
With Corollary 7 we are now able to derive a compact representation for the projection of a tensor on the tangent space of the manifold.
Corollary 8
Under the assumptions of Corollary 7, the projection of a tensor \(B \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) on the tangent space \(\mathcal{T}_{Y} \mathcal{M}\) can be written as
where \((\mathcal{N}_{01}, \mathcal{N}_{02}) = \mathit{succ}(\mathcal{N}_{0})\).
Proof
By resolving the recursions in Corollaries 5 and 7, respectively, we obtain the equation
see also Eqs. (3.3), (3.4), (3.5) and (3.6) in Sect. 3. A closer inspection of the first term (4.2) shows that
For the second term (4.3), we obtain
and together with
we obtain
Finally, the last term (4.4) can be reformulated as
This proves the assertion. □
5 Curvature bounds
In this section estimates for the curvature of the manifold are proven. These estimates, formulated in Lemma 10 below, will play a crucial role in the error analysis of the variational approximation in Sect. 6. Similar results have been obtained for the dynamical low-rank approximation of matrices or tensors, respectively, in [7, 8]. Our general strategy follows the ideas developed there, but extending the results of [7, 8] to the hierarchical Tucker format is a considerable challenge due to the recursive definition of the equations of motion.
In the following, \(\sigma_i(M)\) denotes the i-th singular value (in descending order) of a matrix \(M \in\mathbb{R}^{n \times m}\) with \(n,m\geq i\). Moreover, we will use the Frobenius norm \(\| Y \|_F = \sqrt{\langle Y , Y \rangle}\) of tensors \(Y \in\mathbb{R}^{N_1 \times\cdots\times N_d}\). By construction the Frobenius norm of a tensor equals the matrix Frobenius norm of a corresponding matricization, i.e. \(\| Y \|_F = \Vert Y^{(\mathcal{N}, \mathcal{N}_C)} \Vert_F \) for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_0\}\).
Lemma 9
If \(Y, \widetilde{Y} \in\mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) are tensors in the hierarchical Tucker format fulfilling

$$ \sigma_{r_\mathcal{N}} \bigl( Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr) \geq\rho\quad\text{and}\quad\| \widetilde{Y} - Y \|_F \leq\delta $$

for some \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\), then

$$ \sigma_{r_\mathcal{N}} \bigl( \widetilde{Y}^{(\mathcal{N}, \mathcal{N}_C)} \bigr) \geq\rho- \delta. $$

If in addition ρ>δ, then the pseudoinverse \(\widetilde{\mathbb{S}}^{{\mathcal{N}}^{+}} \) of the single-hole-matrix \(\widetilde{\mathbb{S}}^{\mathcal{N}}\) of \(\widetilde{Y}\) is bounded by

$$ \bigl\Vert\widetilde{\mathbb{S}}^{{\mathcal{N}}^{+}} \bigr\Vert_2 \leq\frac{1}{\rho- \delta}. $$
Proof
The estimates

$$ \bigl| \sigma_i \bigl( \widetilde{Y}^{(\mathcal{N}, \mathcal{N}_C)} \bigr) - \sigma_i \bigl( Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr) \bigr| \leq\bigl\Vert\widetilde{Y}^{(\mathcal{N}, \mathcal{N}_C)} - Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr\Vert_2 \leq\| \widetilde{Y} - Y \|_F \leq\delta \qquad(5.1) $$

hold for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\). The inequality (5.1) follows from Theorem 7.4.51 in [6]. Then

$$ \sigma_{r_\mathcal{N}} \bigl( \widetilde{Y}^{(\mathcal{N}, \mathcal{N}_C)} \bigr) \geq\sigma_{r_\mathcal{N}} \bigl( Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr) - \delta\geq\rho- \delta. $$

For ρ>δ the pseudoinverse can be bounded by

$$ \bigl\Vert\widetilde{\mathbb{S}}^{{\mathcal{N}}^{+}} \bigr\Vert_2 = \frac{1}{\sigma_{r_\mathcal{N}} ( \widetilde{\mathbb{S}}^{\mathcal{N}} )} \leq\frac{1}{\rho- \delta}. $$
□
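Both ingredients of this proof are easy to check numerically; in the following sketch the matrix sizes and the perturbation are hypothetical.

```python
import numpy as np

# Weyl-type perturbation of singular values, |σ_i(A+E) − σ_i(A)| ≤ ‖E‖_2,
# and ‖M^+‖_2 = 1 / σ_min(M) for a matrix M with full column rank.
rng = np.random.default_rng(6)
A = rng.standard_normal((40, 25))
E = 0.1 * rng.standard_normal((40, 25))

sA = np.linalg.svd(A, compute_uv=False)
sAE = np.linalg.svd(A + E, compute_uv=False)
print(np.max(np.abs(sAE - sA)) <= np.linalg.norm(E, 2))      # True

Mplus = np.linalg.pinv(A)
print(np.isclose(np.linalg.norm(Mplus, 2), 1.0 / sA.min()))  # True
```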
Definition 5.1
An ordered tuple \(( Y , \widetilde{Y} ) \in ( \mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T}) )^{2}\) is called path-connected with respect to the ODE (5.2) if there exists a solution of the boundary value problem
on the interval [0,1].
Lemma 10
(curvature bounds)
Let \((Y, \widetilde{Y} ) \in ( \mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T}) )^{2}\) be tensors in the hierarchical Tucker format which are path-connected with respect to the ODE (5.2). Let \((\rho_{\mathcal{N}})_{\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}}\) and \((c_{\mathcal{N}})_{\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}}\) be families of positive real values such that
for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\). Furthermore let \(\widetilde{c}> 0\) be a constant fulfilling the inequalities
and define
Then the bounds
hold for every tensor \(B \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\).
A similar result has been proven independently in [12].
Proof
Since the proof is rather long and technical, it is divided into several parts.
Part 1: For the moment we assume that there are
such that
for all 0≤τ≤1. Below we will prove that this assumption is indeed true. In the following Δ denotes the projection of the difference from Y to \(\widetilde{Y}\) on the tangent space \(\mathcal{T}_{Y} \mathcal{M}\):
Obviously \(\| \varDelta\|_{F} \leq \| \widetilde{Y} - Y \|_{F} = \delta \) holds. The projection of (5.6) on the tangent space leads to the equation
and differentiating with respect to τ gives
The derivative \(\dot{X}(\tau)\) belongs to the tangent space of X(τ), therefore
is satisfied, which leads to
In part 6 of the proof it will be shown that
for all 0≤τ≤1. Together with (5.8) this yields
and hence
Estimate (5.9) guarantees for all τ∈[0,1] the convergence of the Neumann series for P(X(τ))−P(Y). Hence, the operator I−[P(X(τ))−P(Y)] is invertible. Defining the function
leads to the equations
Then, the existence of a solution X(τ) of (5.11) and (5.8), respectively, with initial value X(0)=Y follows from the theory of ordinary differential equations and (5.10). Together with \(\widetilde{c} <1\) the estimates
show
Moreover the projection of (5.8) on the tangent space of X(τ) leads to
and hence
Since integrating (5.12) leads to (5.7), setting
shows the existence of the decomposition stated above.
Part 2: In this part of the proof, we prove a number of auxiliary inequalities for later use. Since X(0)=Y by construction, the fundamental theorem of calculus and (5.10) reveal the estimate
because τ≤1. Let \(\mathbb{U}^{\mathcal{N}}(\tau)\) and \(\mathbb{S}^{\mathcal{N}} (\tau)\) be the orthonormal basis matrix and the single-hole-matrix of X(τ), i.e. \((X(\tau))^{(\mathcal {N},\mathcal{N}_{C})}=\mathbb{U}^{\mathcal{N}}(\tau)\mathbb {S}^{\mathcal{N}}(\tau)^{T}\). Under Assumption (5.3), Lemma 9 can be applied and provides the bound
for all 0≤τ≤1 for the pseudoinverse \(\mathbb{S}^{\mathcal{N}}(\tau)\). Next, a bound for \(\| \dot{\mathbb{U}}^{\mathcal{N}}\mathbb {S}^{\mathcal{N}}(\tau)^{T} \|_{F}\) is shown. Since \(P^{\mathcal{N}}(\tau)^{\bot}= I - \mathbb{U}^{\mathcal{N}}(\tau) \mathbb{U}^{\mathcal{N}}(\tau)^{T} \) by definition, it follows from
that
With \(\| P^{\mathcal{N}}(\tau)^{\bot}\|_{2}\leq1\) and (5.10), we thus obtain
Combining this with (5.13) yields
With this inequality, the derivative of \(P^{\mathcal{N}}(\tau)^{\bot}= I - \mathbb{U}^{\mathcal{N}}(\tau) \mathbb{U}^{\mathcal{N}}(\tau)^{T} \) can be bounded by
A bound for the derivative of the single-hole-matrix \(\mathbb {S}^{\mathcal{N}}(\tau)\) of X(τ) can be derived from the orthonormality (5.14) and (5.10) via
Next, we want to estimate the derivative of \(\mathbb{S}^{\mathcal{N}}(\tau) \mathbb{S}^{\mathcal{N}}(\tau)^{+} \), which can be written as
with
It follows from (5.13) and (5.15) that
In order to bound \(T_2\) we substitute
and obtain
Since \(\|\mathbb{S}^{\mathcal{N}}(\tau)\mathbb{S}^{\mathcal{N}}(\tau)^{+}\|_{2} = 1\), this gives the bound \(\| T_2(\tau) \|_F \leq2 \| T_1(\tau) \|_F\) and in total
Part 3: Since \(( Y, \widetilde{Y} ) \in ( \mathcal{H}\text {-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T}) )^{2}\) is path-connected with respect to ODE (5.2), we have \(X(1) = \widetilde{Y}\). As a first step towards an error bound for \(\| ( P(\widetilde{Y}) - P(Y)) B \|_{F} \) we use
and replace P(X(τ))B by the representation from Corollary 8. This yields
where
and \((\mathcal{N}_{01},\mathcal{N}_{02}) = \mathit{succ} (\mathcal{N}_{0})\).
Part 4: Now for each of the three terms (5.17), (5.18), (5.19), a suitable bound has to be derived. Assuming \(\mathcal{N}\) to be an interior node with successors \((\mathcal{N}_{1}, \mathcal{N}_{2}) = \mathit{succ}(\mathcal{N})\) allows us to bound
Hence, the inequality
holds for all tensors \(B \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\).
Next, we consider (5.18). Assuming \(\mathcal{N}\neq\mathcal{N}_{0}\) allows us to deduce the estimate
Finally, let ℓ be a leaf of the dimension tree. Then, for the term in (5.19) we obtain the estimate
Part 5: Substituting (5.20), (5.21), (5.22) into (5.17), (5.18), (5.19) yields the first assertion (5.4):
where \((\mathcal{N}_{01},\mathcal{N}_{02}) = \mathit{succ} (\mathcal{N}_{0})\). The proof of the second assertion (5.5) is much shorter. With \(P(X (\tau)) \dot{X} (\tau)=\dot{X}(\tau)\) we obtain
and with (5.4) and (5.10) the error bound
follows.
Part 6: Finally, we can prove inequality (5.9) which had been used in part 1 of the proof. Assume that (5.9) does not hold. Since \(\mathcal{M}\) is a smooth manifold, there exist \(0<\tau^{*}<1\) and a tensor \(B^{*} \in\mathbb{R}^{N_1 \times\cdots\times N_d}\) such that
hold for all \(s\in[0,\tau^{*}]\). Under these conditions all the estimates of the proof remain valid for \(\tau\in[0,\tau^{*}]\). However, in (5.16) we take the integral over the interval \([0,\tau^{*}]\) instead of [0,1] and obtain with similar arguments the estimate
for any tensor \(B \in\mathbb{R}^{N_{1} \times\cdots\times N_{d}}\). This results in a contradiction to (5.23) since
□
In order to simplify the previous lemma, we assume \(c_{\mathcal{N}}= c\) for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\) and maximize this constant under the constraints of Lemma 10, i.e.
Since the number of nodes in the dimension tree is always 2d−1, the maximum is attained for

$$ c = \frac{1}{64d-62}. $$

As \(\widetilde{c} \rightarrow\frac{1}{2}\) for \(d\rightarrow\infty\), we let \(\widetilde{c} = \frac{1}{2}\) and obtain the next corollary.
Corollary 11
(curvature bounds)
Let \((Y,\widetilde{Y} ) \in ( \mathcal{H}\textit{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T}) )^{2}\) be tensors which are path-connected with respect to the ODE (5.2). Let ρ>0 such that

$$ \sigma_{r_\mathcal{N}} \bigl( Y^{(\mathcal{N}, \mathcal{N}_C)} \bigr) \geq\rho\quad\text{and}\quad\| \widetilde{Y} - Y \|_F \leq\frac{\rho}{64d-62} $$

is satisfied for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\) and \(\beta:=(32d-31)\rho^{-1}\). Then

$$ \bigl\| \bigl( P (\widetilde{Y} ) - P(Y) \bigr) B \bigr\|_F \leq\beta\, \| \widetilde{Y} - Y \|_F \, \| B \|_F \qquad(5.24) $$

and

$$ \bigl\| P^{\bot} (\widetilde{Y} ) (\widetilde{Y} - Y ) \bigr\|_F \leq2 \beta\, \| \widetilde{Y} - Y \|_F^2 $$

for any tensor \(B \in\mathbb{R}^{N_1 \times\cdots\times N_d}\).
Proof
Inserting \(c = c_{\mathcal{N}}= ( 64d-62)^{-1}\) for all \(\mathcal{N}\in T\setminus\{\mathcal{N}_{0}\}\) and \(\widetilde{c} = \frac{1}{2}\) in the bounds (5.4) and (5.5) of Lemma 10 proves the assertion. □
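For orientation, the constants of Corollary 11 are easy to evaluate; the values of d and ρ in the following sketch are hypothetical.

```python
# Constants of Corollary 11: c = 1/(64d - 62) and beta = (32d - 31)/rho.
rho = 1e-2
for d in (4, 10, 100):
    c, beta = 1.0 / (64 * d - 62), (32 * d - 31) / rho
    print(d, c, beta)
```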
6 Error analysis
We are finally ready to analyze the accuracy of the variational approximation \(Y(t)\in\mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) to the solution of (3.1). The error bound presented in Theorem 12 is the main result of this article. Very similar results have independently and simultaneously been obtained in [12].
Let \(Y_{ex}(t)\in\mathbb{R}^{N_1 \times\cdots\times N_d}\) be the solution of the differential equation (3.1), i.e. \(\dot{Y}_{ex}= F(Y_{ex})\) for all \(t\in[0,t_{\mathrm{end}}]\) with initial value \(Y_{ex}(0) \in\mathbb{R}^{N_1 \times\cdots\times N_d}\). Let \(Y(t)\in\mathcal{M}= \mathcal{H}\text{-Tucker}((r_{\mathcal{N}})_{\mathcal{N}\in T})\) be the variational approximation defined by (3.2), i.e. \(\dot{Y} = P(Y) F(Y)\). For simplicity, it is assumed that \(Y_{ex}(0)\in\mathcal{M}\) and that \(Y(0)=Y_{ex}(0)\). (Otherwise, an additional term for the initial error has to be included in Theorem 12.) Moreover, let \(X(t) \in\mathcal{M}\) be the best approximation of \(Y_{ex}(t)\) on \(\mathcal{M}\), i.e. for every \(t\in[0,t_{\mathrm{end}}]\) we have

$$ \bigl\| X(t) - Y_{ex}(t) \bigr\|_F = \min_{Z \in\mathcal{M}} \bigl\| Z - Y_{ex}(t) \bigr\|_F. $$

Since by definition and by the triangle inequality we have

$$ \bigl\| X(t) - Y_{ex}(t) \bigr\|_F \leq\bigl\| Y(t) - Y_{ex}(t) \bigr\|_F \leq\bigl\| Y(t) - X(t) \bigr\|_F + \bigl\| X(t) - Y_{ex}(t) \bigr\|_F, $$

it makes sense to prove a result which bounds the error \(\| Y(t) - X(t) \|_F\) in terms of \(\| X(t) - Y_{ex}(t) \|_F\). The following assumptions are made for all \(t\in[0,t_{\mathrm{end}}]\).
(A1)
X(t) is continuously differentiable.
(A2)
There is a constant μ>0 such that
$$ \bigl\| F\bigl(Y_{ex}(t)\bigr) \bigr\|_F \leq\mu, \qquad\bigl\| F\bigl(X(t)\bigr) \bigr\|_F \leq\mu\quad\text{and} \quad \bigl\| F\bigl(Y(t)\bigr) \bigr\|_F \leq\mu. $$
(A3)
There exist constants L>0 and \(\lambda\in \mathbb{R}\) such that the Lipschitz conditions
$$\begin{aligned} \bigl\| F\bigl(Y_{ex}(t)\bigr) - F\bigl(X(t)\bigr) \bigr\|_F \leq& L \bigl\| Y_{ex}(t) -X (t) \bigr\|_F \quad\text{and} \\ \bigl\langle F\bigl(Y (t)\bigr) - F\bigl(X(t)\bigr) , Y (t) - X(t) \bigr\rangle \leq& \lambda\bigl\| Y(t) -X (t) \bigr\|^2_F \end{aligned}$$

are fulfilled.
(A4)
The singular values of the matricizations of the best approximation are bounded from below by
$$ \sigma_{r_\mathcal{N}} \bigl( X^{(\mathcal{N}, \mathcal{N}_C)} (t) \bigr) \geq\rho> 0 \quad\text{for all } \mathcal{N}\in T\setminus\{\mathcal{N}_0\}. $$
(A5)
The distance from the best approximation X(t) to the exact solution Y ex (t) is bounded by
$$ \bigl\| X(t) - Y_{ex}(t) \bigr\|_F \leq\frac{1}{2 \beta}, $$

where \(\beta=(32d-31)\rho^{-1}\).
(A6)
The tuple (Y(t),X(t)) is path-connected with respect to ODE (5.2).
Similar assumptions have been made in [8] where the approximation error of the non-hierarchical Tucker format has been analyzed.
Theorem 12
Under the above assumptions the difference between the variational approximation and the best approximation is bounded by

$$ \bigl\| Y(t) - X(t) \bigr\|_F \leq( L + 2 \beta\mu) \int_0^t e^{(\lambda+ 5 \mu\beta) (t-s)} \bigl\| X(s) - Y_{ex}(s) \bigr\|_F \, ds \qquad(6.1) $$

as long as the right-hand side of (6.1) is bounded by \(\frac{1}{2 \beta}\).
Proof
Since X(t) is the best approximation, the deviation \(Y_{ex}(t)-X(t)\) is orthogonal to the tangent space \(\mathcal{T}_X \mathcal{M}\),
and differentiating yields
With \(\dot{X} (t) = P(X(t)) \dot{X}(t)\) we obtain
Next, we derive an estimate for D(t). For an arbitrary tensor \(B \in \mathbb{R}^{N_{1} \times\cdots\times N_{d}}\) the bound (5.24) yields
and for \(B:=Y_{ex}(t)-X(t)\) this yields
In the following we let \(\delta(t):=\| Y_{ex}(t)-X(t) \|_F\) and \(\varepsilon(t):=\| Y(t)-X(t) \|_F\). Inserting (6.2) into (6.3) leads to
With Assumption (A5), i.e. \(\delta(t) \leq\frac{1}{2 \beta}\), we obtain the estimate
and with Assumption (A2)
Subtracting (6.2) from \(\dot{Y} (t) = P (Y(t)) F(Y(t))\) gives
Next, we want to find an estimate for the inner product \(\langle\dot {Y} (t) - \dot{X} (t) , Y(t) - X(t) \rangle\). For this purpose we consider the single terms on the right hand side of (6.4) and, via Corollary 11, obtain the following estimates:
1.
$$\begin{aligned} &\bigl\langle\bigl[ P\bigl(Y(t)\bigr) -P\bigl(X(t)\bigr) \bigr] F\bigl(X(t) \bigr) , Y(t)-X(t) \bigr\rangle \\ &\quad\leq\bigl\| \bigl[ P\bigl(Y(t)\bigr) -P\bigl(X(t)\bigr) \bigr] F\bigl(X(t)\bigr) \bigr\|_F \bigl\| Y(t)-X(t) \bigr\|_F \\ &\quad\leq\beta\bigl\| Y(t)-X(t) \bigr\|^2_F \bigl\| F\bigl(X(t) \bigr) \bigr\|_F \\ &\quad\leq\mu\beta\varepsilon^2 (t) , \end{aligned}$$
2.
$$\begin{aligned} &\bigl\langle P\bigl(X(t)\bigr) \bigl[ F\bigl(X(t)\bigr)- F\bigl (Y_{ex}(t)\bigr) \bigr] , Y(t)-X(t) \bigr\rangle \\ &\quad\leq\bigl\| F\bigl(X(t)\bigr) - F\bigl(Y_{ex}(t)\bigr) \bigr\|_F \bigl\| Y(t)-X(t) \bigr\|_F \\ &\quad\leq L \bigl\| X(t) - Y_{ex}(t) \bigr\|_F \bigl\| Y(t)-X(t) \bigr\|_F \\ &\quad= L \delta(t) \varepsilon(t), \end{aligned}$$
3.
$$ \bigl\langle F\bigl(Y(t)\bigr) - F\bigl(X(t)\bigr) , Y(t)-X(t) \bigr\rangle\leq \lambda\bigl\| Y(t)-X(t) \bigr\|^2_F = \lambda \varepsilon^2 (t), $$
4.
$$\begin{aligned} &\bigl\langle P^\bot\bigl(Y(t)\bigr) \bigl[F\bigl(Y(t)\bigr) - F \bigl(X(t)\bigr) \bigr] , Y(t)-X(t) \bigr\rangle \\ &\quad= \bigl\langle F\bigl(Y(t)\bigr) - F\bigl(X(t)\bigr) , P^\bot \bigl(Y(t)\bigr) \bigl( Y(t)-X(t) \bigr) \bigr\rangle \\ &\quad\leq\bigl\| F\bigl(Y(t)\bigr) - F\bigl(X(t)\bigr) \bigr\|_F \bigl\Vert P^\bot\bigl(Y(t)\bigr) \bigl( Y(t)-X(t) \bigr) \bigr \Vert _F \\ &\quad\leq\bigl( \bigl\| F\bigl(Y(t)\bigr) \bigr\|_F + \bigl\| F\bigl(X(t)\bigr) \bigr\|_F \bigr) 2 \beta\bigl\| Y(t)-X(t) \bigr\|_F^2 \\ &\quad\leq4 \mu\beta\varepsilon^2(t), \end{aligned}$$
5.
$$ \bigl\langle D(t) , Y(t)-X(t) \bigr\rangle\leq\bigl\| D(t) \bigr\| _F \bigl\| Y(t)-X(t) \bigr\|_F \leq2\beta\mu\delta(t) \varepsilon(t) . $$
With these inequalities, we obtain

$$ \frac{1}{2} \frac{d}{dt} \varepsilon^2(t) = \bigl\langle\dot{Y}(t) - \dot{X}(t) , Y(t) - X(t) \bigr\rangle\leq(\lambda+ 5 \mu\beta) \varepsilon^2(t) + ( L + 2 \beta\mu) \delta(t)\, \varepsilon(t) $$

and hence

$$ \dot{\varepsilon}(t) \leq(\lambda+ 5 \mu\beta) \varepsilon(t) + ( L + 2 \beta\mu) \delta(t). $$

Finally, Gronwall's inequality leads to the assertion (6.1).
□
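For a rough feeling for the size of the bound, the right-hand side of (6.1) can be evaluated in closed form when the best-approximation error is constant, \(\delta(s)\equiv\delta_0\); in the following sketch all constant values are hypothetical placeholders for μ, L, λ, ρ of (A2)-(A5).

```python
import numpy as np

# Evaluate the bound (6.1) for delta(s) ≡ delta0:
# (L + 2*beta*mu) * delta0 * (exp(a t) - 1)/a with a = lambda + 5*mu*beta.
mu, L, lam, delta0 = 1.0, 2.0, 0.5, 1e-6
d, rho = 10, 1e-2
beta = (32 * d - 31) / rho

a = lam + 5 * mu * beta
bound = lambda t: (L + 2 * beta * mu) * delta0 * (np.exp(a * t) - 1.0) / a
print(bound(1e-4))
```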
References
De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21(4), 1253–1278 (2000)
Grasedyck, L.: Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl. 31, 2029–2054 (2010)
Hackbusch, W.: Tensor Spaces and Numerical Tensor Calculus. Springer Series in Computational Mathematics, vol. 42. Springer, Berlin (2012)
Hackbusch, W., Kühn, S.: A new scheme for the tensor representation. J. Fourier Anal. Appl. 15, 706–722 (2009)
Holtz, S., Rohwedder, T., Schneider, R.: On manifolds of tensors of fixed TT-rank. Numer. Math. 120(4), 701–731 (2012)
Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1985)
Koch, O., Lubich, C.: Dynamical low-rank approximation. SIAM J. Matrix Anal. Appl. 29, 434–454 (2007)
Koch, O., Lubich, C.: Dynamical tensor approximation. SIAM J. Matrix Anal. Appl. 31, 2360–2375 (2010)
Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51, 455–500 (2009)
Kressner, D., Tobler, C.: htucker—a Matlab toolbox for tensors in hierarchical Tucker format. Technical report, ETH, Zurich (2012)
Lubich, C.: From Quantum to Classical Molecular Dynamics: Reduced Models and Numerical Analysis. Zurich Lectures in Advanced Mathematics. European Mathematical Society, Zürich (2008)
Lubich, C., Rohwedder, T., Schneider, R., Vandereycken, B.: Dynamical approximation by hierarchical Tucker and tensor-train tensors. SIAM J. Matrix Anal. Appl. 34, 470–494 (2013)
Manthe, U.: A multilayer multiconfigurational time-dependent Hartree approach for quantum dynamics on general potential energy surfaces. J. Chem. Phys. 128, 164116 (2008)
Uschmajew, A., Vandereycken, B.: The geometry of algorithms using hierarchical tensors. Linear Algebra Appl. 439, 133–166 (2013)
Vendrell, O., Meyer, H.-D.: Multilayer multi-configuration time-dependent Hartree method: implementation and applications to a Hénon-Heiles Hamiltonian and to pyrazine. J. Chem. Phys. 134(4), 044135 (2011)
Wang, H., Thoss, M.: Multilayer formulation of the multiconfiguration time-dependent Hartree theory. J. Chem. Phys. 119, 1289–1299 (2003)
Acknowledgements
The authors thank the anonymous referees for their helpful remarks and suggestions.
Communicated by Christian Lubich.
Arnold, A., Jahnke, T. On the approximation of high-dimensional differential equations in the hierarchical Tucker format. Bit Numer Math 54, 305–341 (2014). https://doi.org/10.1007/s10543-013-0444-2
Keywords
- High-dimensional differential equations
- Tensor approximation
- Hierarchical Tucker format
- Dirac-Frenkel variational principle
- Error bounds