1 Introduction

The approximation of high-dimensional functions is one of the most challenging tasks in computational science. Such high-dimensional problems arise in many domains of physics, chemistry, biology or finance, where the functions are the solutions of high-dimensional partial differential equations (PDEs). Such problems also typically arise in statistics or machine learning, for the estimation of high-dimensional probability density functions, or the approximation of the relation between a certain random variable and some predictive variables, the typical task of supervised learning. The approximation of high-dimensional functions is also required in optimization or uncertainty quantification problems, where the functions represent the response of a system (or model) in terms of some parameters. These problems require many evaluations of the functions and are usually intractable when one evaluation requires a specific experimental set-up or one run of a complex numerical code.

The approximation of high-dimensional functions from a limited amount of information on the functions requires exploiting low-dimensional structures of functions. This usually calls for nonlinear approximation tools [10, 43]. A prominent approach consists of exploiting the sparsity of functions relative to a basis, a frame, or a more general dictionary of functions [4, 6, 44]. Another approach consists of exploiting low-rank structures of multivariate functions, interpreted as elements of tensor spaces, which is related to notions of sparsity in (uncountably infinite) dictionaries of separable functions. For a multivariate function \(v(x_1,\ldots ,x_d)\) defined on a product set \(\mathcal {X}_1\times \cdots \times \mathcal {X}_d\), which is here identified with a tensor of order d, a natural notion of rank is the canonical rank, which is the minimal integer r such that

$$\begin{aligned} v(x_1,\ldots ,x_d) = \sum _{k=1}^r v^1_k(x_1) \ldots v_k^d(x_d) \end{aligned}$$

for some univariate functions \(v^\nu _k\) defined on \(\mathcal {X}_\nu \). For \(d=2\), this corresponds to the unique notion of rank, which coincides with the matrix rank when the variables take values in finite index sets and v is identified with a matrix. A function with low canonical rank r has a number of parameters which scales only linearly with r and d. However, it turns out that this format has several drawbacks when \(d>2\) (see, e.g., [9, 22]), which makes it unsuitable for approximation. Then, other notions of rank have been introduced. For a subset of dimensions \(\alpha \) in \(\{1,\ldots ,d\}\), the \(\alpha \)-rank of a function v is the minimal integer \( {\mathrm {rank}}_\alpha (v)\) such that

$$\begin{aligned} v(x_1,\ldots ,x_d) = \sum _{k=1}^{{\mathrm {rank}}_\alpha (v)} v^\alpha _k(x_\alpha ) v_k^{\alpha ^c}(x_{\alpha ^c}) \end{aligned}$$

for some functions \(v^\alpha _k\) and \(v^{\alpha ^c}_k\) of complementary groups of variables \(x_\alpha = (x_\nu )_{\nu \in \alpha }\in \mathcal {X}_\alpha \) and \(x_{\alpha ^c} = (x_\nu )_{\nu \in \alpha ^c} \in \mathcal {X}_{\alpha ^c}\), with \(\alpha ^c\) the complementary subset of \(\alpha \) in \(\{1,\ldots ,d\}.\) Approximation formats can then be defined by imposing \(\alpha \)-ranks for a collection of subsets \(\alpha \). More precisely, if A is a collection of subsets in \(\{1,\ldots ,d\}\), we define an approximation format

$$\begin{aligned} \mathcal {T}_r^A = \{v : {\mathrm {rank}}_\alpha (v) \le r_\alpha , \alpha \in A\} = \bigcap _{\alpha \in A} \mathcal {T}_{r_\alpha }^{\{\alpha \}}, \end{aligned}$$

where \(r=(r_\alpha )_{\alpha \in A}\) is a tuple of integers. When A is a tree-structured collection of subsets (a subset of a dimension partition tree), \(\mathcal {T}_r^A \) is a tree-based tensor format whose elements admit a hierarchical and data-sparse representation. Tree-based tensor formats are tree tensor networks, i.e. tensor networks with tree-structured graphs [36]. They include the hierarchical Tucker (HT) format [21] and the tensor-train (TT) format [38]. Tree-based formats have many favorable properties that make them suitable for numerical use. As an intersection of subsets of tensors with bounded \(\alpha \)-rank, \(\alpha \in A\), these formats inherit most of the nice properties of the low-rank approximation format for order-two tensors. In particular, under suitable assumptions on tensor norms, best approximation problems in the set \(\mathcal {T}_r^A\) are well-posed [13, 16]. Also, the \(\alpha \)-rank of a tensor can be computed through singular value decomposition, and the notion of singular value decomposition can be extended (in different ways) to these formats [8, 17, 37]. Another interesting property, which is not exploited in the present paper, is the fact that the set \(\mathcal {T}_r^A\) is a differentiable manifold [14, 15, 23, 45], which has interesting consequences in optimization or model order reduction of dynamical systems in tensor spaces [30]. There are only a few results available on the approximation properties of tree-based formats [42]. However, it has been observed in many domains of applications that tree-based formats have a high approximation power (or expressive power). Hierarchical tensor formats have recently been identified with deep neural networks with a particular architecture [7].

The reader is referred to the monograph [20] and surveys [1, 18, 27, 28, 34, 35] for an introduction to tensor numerical methods and an overview of recent developments in the field.

This paper is concerned with the problem of computing an approximation of a function \(u(x_1,\ldots ,x_d)\) using point evaluations of this function, where evaluations can be selected adaptively. This includes problems where the function represents the output of a black-box numerical code, a system or a physical experiment for a given value of the input variables \((x_1,\ldots ,x_d)\). This also includes the solution of high-dimensional PDEs with a probabilistic interpretation, where Monte-Carlo methods can be used to obtain point evaluations of their solutions. This excludes problems where evaluations of the functions come as an unstructured data set. A multivariate function \(u(x_1,\ldots ,x_d) \) is here considered as an element of a Hilbert tensor space \(\mathcal {H}_1\otimes \cdots \otimes \mathcal {H}_d\) of real-valued functions defined on a product set \( \mathcal {X}_1\times \cdots \times \mathcal {X}_d\) equipped with a probability measure. This includes the case of multidimensional arrays when the variables \(x_\nu \) take values in finite sets \(\mathcal {X}_\nu \). In this case, a point evaluation corresponds to the evaluation of an entry of the tensor.

Several algorithms have been proposed for the construction of approximations in tree-based formats using point evaluations of functions or entries of tensors. Let us mention algorithms that use adaptive and structured evaluations of tensors [2, 39] and statistical learning approaches that use unstructured (random) evaluations of functions [5, 11, 12, 19]. Let us also mention the recent work [31] for the approximation in Tucker format, with an approach similar to the one proposed in the present paper.

In the present paper, we propose and analyse a new algorithm, based on a particular extension of the singular value decomposition for the tree-based format \(\mathcal {T}_r^A\), which allows us to construct an approximation using only evaluations of a function (or entries of a tensor). The proposed algorithm constructs a hierarchy of subspaces \(U_\alpha \) of functions of groups of variables \(x_\alpha \), for all \(\alpha \in A\), and associated interpolation operators \(I_{U_\alpha }\) which are oblique projections onto \(U_\alpha \). For the construction of \(U_\alpha \) for a particular node \(\alpha \in A\), we interpret the function u as a random variable \(u(\cdot ,x_{\alpha ^c})\) depending on a set of random variables \(x_{\alpha ^c}\) with values in the space of functions of the variables \(x_\alpha \). Then \(U_\alpha \) is obtained by estimating the principal components of this function-valued random variable using random samples \(u(\cdot ,x^k_{\alpha ^c})\). In practice, we estimate the principal components from interpolations \(I_{{V_\alpha }} u(\cdot ,x^k_{\alpha ^c})\) of these samples on a subspace \({V_\alpha }\) which is a certain approximation space when \(\alpha \) is a leaf of the tree, or the tensor product of subspaces \(\{U_\beta \}_{\beta \in S(\alpha )}\) associated with the sons \(S(\alpha )\) of the node \(\alpha \) when \(\alpha \) is not a leaf of the tree. This construction only requires evaluations of u on a product set of points which is the product of an interpolation grid in \(\mathcal {X}_{\alpha }\) (unisolvent for the space \({V_\alpha }\)), and a random set of points in \(\mathcal {X}_{\alpha ^c}\). It is a sequential construction going from the leaves to the root of the tree.

The proposed algorithm can be interpreted as an extension of principal component analysis for tree-based tensors which provides a statistical estimation of low-dimensional subspaces of functions of groups of variables for the representation of a multivariate function. It is able to provide an approximation \(u^\star \) in any tree-based format \(\mathcal {T}_r^A\) with either a prescribed rank r or a prescribed relative error (by adapting the rank r). For a given r, it has the remarkable property that it is able to provide an approximation in \(\mathcal {T}_r^A\) with a number of evaluations equal to the storage complexity of the resulting approximation. Under some assumptions on the estimation of principal components, we prove that the algorithm, up to some discretization error \(\rho \), provides with high probability a quasi-optimal approximation with a prescribed rank, i.e.

$$\begin{aligned} \Vert u - u^\star \Vert \le c \min _{v\in \mathcal {T}^A_r} \Vert u-v \Vert + \rho , \end{aligned}$$

where the constant c depends on the set A and the properties of orthogonal projections and interpolation operators associated with principal subspaces. Also, under some assumptions on the estimation of principal components and discretization error, we prove that the algorithm with prescribed tolerance \(\epsilon \) is able to provide an approximation \(u^\star \) such that

$$\begin{aligned} \Vert u - u^\star \Vert \le \tilde{c} \epsilon \Vert u \Vert \end{aligned}$$

holds with high probability, where the constant \(\tilde{c}\) depends on the set A and the properties of projections and interpolation operators. Sharp inequalities are obtained by considering the properties of projection and interpolation operators when restricted to minimal subspaces of tensors. The analysis takes into account the discretization errors for the approximation of infinite-dimensional tensors. For a tensor with finite and known rank in a tree-based format, and when there is no discretization error, the algorithm is able to recover the tensor in a stable way using a number of evaluations equal to the storage complexity of the representation of the tensor in this format. This algorithm may have important applications in the manipulation of big data, by providing a way to reconstruct a multidimensional array from a limited number of entries (tensor completion).

The outline of the paper is as follows. In Sect. 2, we introduce some definitions and properties of projections in Hilbert spaces, with particular attention to Hilbert spaces of functions and projections based on point evaluations. In Sect. 3, we recall basic definitions on tensors and Hilbert tensor spaces of functions defined on measured product sets. Then we introduce some definitions and properties of operators on tensor spaces, with partial point evaluation functionals as a particular case. Finally, we introduce definitions and properties of projections on tensor spaces, with particular attention to orthogonal projection and interpolation. In Sect. 4, we introduce tree-based low-rank formats in a general setting including classical HT and TT formats. In Sect. 5, we first introduce the notion of principal component analysis for multivariate functions and then propose an extension of principal component analysis to tree-based tensor formats. This is based on a new variant of higher-order singular value decomposition of tensors in tree-based format. In Sect. 6, we present and analyse a modified version of the algorithm presented in Sect. 5 which only requires point evaluations of functions, and which is based on empirical principal component analyses and interpolations. In Sect. 7, the behavior of the proposed algorithm is illustrated and analysed in several numerical experiments.

2 Projections

For two vector spaces V and W equipped with norms \(\Vert \cdot \Vert _V\) and \(\Vert \cdot \Vert _W\) respectively, we denote by \(L(V,W)\) the space of linear operators from V to W. We denote by \(\mathcal {L}(V,W)\) the space of linear and continuous operators from V to W, with bounded operator norm \(\Vert A \Vert _{V\rightarrow W} = \max _{\Vert v \Vert _V=1} \Vert A v \Vert _W\). We denote by \(V^*=L(V,\mathbb {R})\) the algebraic dual of V and by \(V'=\mathcal {L}(V,\mathbb {R})\) the topological dual of V, and we let \(\Vert \cdot \Vert _{V\rightarrow \mathbb {R}} = \Vert \cdot \Vert _{V'}\). We denote by \(\langle \cdot , \cdot \rangle \) the duality pairing between a space and its dual. We let \(L(V) := L(V,V)\) and \(\mathcal {L}(V) := \mathcal {L}(V,V)\), and we replace the notation \(\Vert \cdot \Vert _{V\rightarrow V}\) by \(\Vert \cdot \Vert _V\), where the latter notation also stands for the norm on V.

2.1 Projections

Let V be a Hilbert space and U be a finite-dimensional subspace of V. An operator P is a projection onto a subspace U if \({\mathrm {Im}}(P)=U\) and \(Pu=u\) for all \(u\in U\).

The orthogonal projection \(P_U\) onto U is a linear and continuous operator which associates to \(v \in V\) the unique solution \(P_U v \in U\) of

$$\begin{aligned} \Vert v - P_U v \Vert _V = \min _{u \in U} \Vert v-u \Vert _V, \end{aligned}$$

or equivalently \( (u,P_U v - v) = 0, \; \forall u\in U. \) The orthogonal projection \(P_U\) has operator norm \(\Vert P_U \Vert _V=1\).

Let W be a finite-dimensional subspace of \(V^*\) such that

$$\begin{aligned}&\dim (W) = \dim (U), \text { and} \end{aligned}$$
(1a)
$$\begin{aligned}&\{u \in U : \langle w , u\rangle = 0 \quad \text { for all } w \in W\} = \{0\}, \end{aligned}$$
(1b)

where the latter condition is equivalent to \(U\cap {}^\perp W = \{0\}\), with \({}^\perp W \) the annihilator of W in V (see [33, Definition 1.10.4]). Under the above assumptions, we have that for any \(v\in V\), there exists a unique \(u\in U\) such that \(\langle w,u-v\rangle = 0\) for all \(w\in W\).Footnote 1 This allows us to define the projection \(P_{U}^W\) onto U along W, which is the linear operator on V that associates to \(v \in V\) the unique solution \(P_{U}^W v \in U\) of

$$\begin{aligned} \langle w ,P_{U}^W v- v \rangle =0,\; \forall w \in W. \end{aligned}$$

For \(W = R_V U\), where \(R_V : V \rightarrow V'\) is the Riesz map, the projection \(P_{U}^W\) coincides with the orthogonal projection \(P_U\). A non-orthogonal projection is called an oblique projection. If \(W \subset V',\) then \(P_U^W\) is a projection from V onto U parallel to \(Ker(P_U^W)=Z^\perp \), where \(Z = R_V^{-1}W\). If \(W\subset \tilde{U}'\), with \(\tilde{U} \) a closed subspace of V, then \(P_U^W \vert _{\tilde{U}}\) is a projection from \(\tilde{U}\) onto U parallel to \(Ker(P_U^W) \cap \tilde{U} = Z^\perp \cap \tilde{U}\), where \(Z = R_{\tilde{U}}^{-1} W\), with \(R_{\tilde{U}} \) the Riesz map from \(\tilde{U}\) to \(\tilde{U}'\).
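In coordinates, computing \(P_U^W v\) amounts to solving a small square linear system. The following Python sketch (ours, not from the paper) illustrates this in \(\mathbb {R}^n\), with the columns of a matrix B spanning U and the functionals in W represented by the rows of a matrix acting through the Euclidean inner product; all names are illustrative.

```python
import numpy as np

def oblique_projection(B, W, v):
    """Oblique projection P_U^W v of v onto U = range(B) along the span of the rows of W.

    B : (n, r) matrix whose columns form a basis of U.
    W : (r, n) matrix whose rows represent the functionals w_i, acting as w_i(x) = W[i] @ x.
    The defining equations <w_i, B c - v> = 0 give the square system (W @ B) c = W @ v,
    which is invertible exactly when condition (1b) holds for U and W.
    """
    c = np.linalg.solve(W @ B, W @ v)
    return B @ c

# In R^3: project onto U = span{e1, e2} along two functionals; for W = B.T the
# construction reduces to the orthogonal projection P_U.
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
W = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 2.0]])
v = np.array([1.0, 2.0, 3.0])
u = oblique_projection(B, W, v)
assert np.allclose(W @ (u - v), 0.0)   # u - v is annihilated by all functionals in W
```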

Proposition 1

Let \(\tilde{U}\) be a closed subspace of V and assume that \(U\subset \tilde{U}\) and \(W\subset \tilde{U}'.\)Footnote 2 Then \(P_U^W\) is a continuous operator from \(\tilde{U}\) to V.

Proof

Let us equip W with the norm \(\Vert w \Vert _W = \Vert w \Vert _{\tilde{U}'} = \max _{v\in \tilde{U}} \langle w ,v \rangle /\Vert v \Vert _V,\) such that for all \(v\in \tilde{U}\), \(\langle w , v \rangle \le \Vert w \Vert _W \Vert v\Vert _V.\) Let

$$\begin{aligned} \alpha = \min _{0\ne u \in U} \max _{0\ne w \in W} \frac{\langle w,u\rangle }{\Vert u \Vert _V \Vert w \Vert _W}. \end{aligned}$$

Assumption (1b) implies that \(\alpha >0\). Then for all \(v\in \tilde{U}\), we have

$$\begin{aligned} \Vert P_U^W v \Vert _V \le \alpha ^{-1} \max _{0\ne w \in W} \frac{\langle w, P_U^W v\rangle }{ \Vert w \Vert _W} = \alpha ^{-1} \max _{0\ne w \in W} \frac{\langle w, v\rangle }{ \Vert w \Vert _W} \le \alpha ^{-1} \Vert v \Vert _V, \end{aligned}$$

which ends the proof. \(\square \)

Proposition 2

Let P and \(\tilde{P}\) be projections onto subspaces U and \(\tilde{U}\) respectively and assume \(U \subset \tilde{U}\). Then

$$\begin{aligned} \tilde{P} P= P. \end{aligned}$$

Moreover, if P and \(\tilde{P}\) are projections along W and \(\tilde{W}\) respectively, with \(W\subset \tilde{W}\), then

$$\begin{aligned} \tilde{P} P = P \tilde{P} = P. \end{aligned}$$

Proof

For all \(v\in V\), \(Pv \in U \subset \tilde{U}\), and therefore \(\tilde{P} P v=Pv\), which proves the first statement. For the second statement, by definition of the projection P, we have that \( \langle \phi ,P \tilde{P}v - \tilde{P}v \rangle = 0 \) for all \(\phi \in W\). Since \(W\subset \tilde{W}\) and by definition of \(\tilde{P}\), this implies that \( \langle \phi ,P \tilde{P}v - v \rangle = 0 \) for all \(\phi \in W\). By definition of Pv and since \(P \tilde{P}v \in U,\) this implies \(P\tilde{P} v = Pv = \tilde{P} Pv \). \(\square \)

Proposition 3

Let U and \(\tilde{U}\) be two closed subspaces of V, with U of finite dimension. Let \(P_U\) be the orthogonal projection onto U and let \(P_U^W\) be the projection onto U along \(W \subset \tilde{U}'\). For all \(v \in \tilde{U}\),

$$\begin{aligned} \Vert P_U^W v - P_U v \Vert _V \le \Vert P_U^W - P_U \Vert _{ \tilde{U} \rightarrow V} \Vert v - P_U v \Vert _V, \end{aligned}$$

with

$$\begin{aligned} \Vert P_U^W - P_U \Vert _{ \tilde{U} \rightarrow V} = \Vert P_U^W \Vert _{ (id-P_U)\tilde{U} \rightarrow V} \le \Vert P_U^W \Vert _{ \tilde{U} \rightarrow V} . \end{aligned}$$

Also, for all \(v \in \tilde{U}\),

$$\begin{aligned} \Vert v - P_U^W v \Vert ^2_V \le (1+\Vert P_U^W - P_U \Vert _{ \tilde{U} \rightarrow V}^2) \Vert v - P_U v \Vert _V^2 . \end{aligned}$$

Proof

For \(v\in \tilde{U}\), \(\Vert P_U^W v - P_U v \Vert _V = \Vert P_U^W(v -P_U v)\Vert _V =\Vert (P_U^W-P_U)( v - P_U v)\Vert _V \le \Vert P_U^W - P_U \Vert _{(id-P_U)\tilde{U} \rightarrow V} \Vert v - P_U v \Vert _V\), with \(\Vert P_U^W - P_U \Vert _{(id-P_U)\tilde{U} \rightarrow V} = \Vert P_U^W - P_U \Vert _{ \tilde{U} \rightarrow V}= \Vert P_U^W \Vert _{ (id-P_U)\tilde{U} \rightarrow V} \). This proves the first statement. The second statement directly follows from \(\Vert v - P_U^W v \Vert _V^2 = \Vert v - P_U v\Vert _V^2 + \Vert P_U v - P_U^W v \Vert _V^2\). \(\square \)

2.2 Projection of functions using point evaluations

Let V be a Hilbert space of functions defined on a set X. For \(x\in X\), the point evaluation functional \(\delta _{x} \in V^*\) is defined by \(\langle \delta _{x },v\rangle = v(x)\).

2.2.1 Interpolation

Let U be an n-dimensional subspace of V and let \(\varGamma = \{x^k\}_{k=1}^n\) be a set of n interpolation points in X. The set of interpolation points \(\varGamma \) is assumed to be unisolvent for U, i.e. for any \((a_k)_{k=1}^n \in \mathbb {R}^n\), there exists a unique \(u\in U\) such that \(u(x^k) = a_k\) for all \(1\le k\le n\). The interpolation operator \(I_U\) associated with \(\varGamma \) is the linear operator from V to U such that, for \(v\in V\), \(I_U v\) is the unique element of U satisfying

$$\begin{aligned} \langle \delta _{x},I_U v - v \rangle = I_Uv(x) - v(x) = 0 \quad \forall x \in \varGamma . \end{aligned}$$

The interpolation operator \(I_U\) is an oblique projection \(P_U^W\) onto U along \(W = {\mathrm {span}}\{\delta _{x} : x\in \varGamma \}\). Note that the condition that \(\varGamma \) is unisolvent for U is equivalent to the condition (1b) on U and W, which ensures that \(I_U\) is well defined. From Proposition 2, we deduce the following property.
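As a concrete illustration (a sketch of ours, not part of the paper), \(I_U v\) can be computed by expanding it on a basis of U and solving the square collocation system defined by the unisolvent grid \(\varGamma \); the function names below are illustrative.

```python
import numpy as np

def interpolate(basis, grid, v):
    """Interpolation I_U v onto U = span(basis) using a unisolvent grid.

    basis : list of n callables spanning U.
    grid  : array of n interpolation points.
    v     : callable to interpolate.
    Returns a callable representing I_U v.
    """
    # Collocation matrix M[k, i] = phi_i(x^k); unisolvence <=> M is invertible.
    M = np.array([[phi(x) for phi in basis] for x in grid])
    coeffs = np.linalg.solve(M, np.array([v(x) for x in grid]))
    return lambda x: sum(c * phi(x) for c, phi in zip(coeffs, basis))

# Example: interpolate cos on P_2 = span{1, x, x^2} at 3 points of [0, 1].
basis = [lambda x: 1.0, lambda x: x, lambda x: x**2]
grid = np.array([0.0, 0.5, 1.0])
Iv = interpolate(basis, grid, np.cos)
assert np.allclose([Iv(x) for x in grid], np.cos(grid))   # exact at the grid points
```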

Proposition 4

Let U and \(\tilde{U}\) be two subspaces associated with sets of interpolation points \(\varGamma \) and \(\tilde{\varGamma }\) respectively. If \(U \subset \tilde{U}\) and \(\varGamma \subset \tilde{\varGamma }\), then

$$\begin{aligned} I_{U}I_{\tilde{U}} = I_{\tilde{U}} I_U = I_U. \end{aligned}$$

Magic points. For a given basis \( \{\varphi _i\}_{i=1}^n\) of U, a set of interpolation points \(\varGamma = \{x^k\}_{k=1}^n\), called magic points, can be determined with a greedy algorithm proposed in [32, Remark 2]. The procedure for selecting the set \(\varGamma \) within a candidate set \(\varGamma _\star \subset X\) is as follows. We first determine a point \(x^1\in \varGamma _\star \) and an index \(i_1\) such that

$$\begin{aligned} \vert \varphi _{i_1}(x^1) \vert = \max _{x\in \varGamma _\star } \max _{1\le i \le n} \vert \varphi _i(x) \vert . \end{aligned}$$

Then for \(k\ge 1\), we define \(\psi _i^{(k)}(x) = \varphi _i(x) -\sum _{m=1}^{k} \sum _{p=1}^{k} \varphi _{i_m}(x) a^{(k)}_{m,p} \varphi _i(x^p)\), with the matrix \((a^{(k)}_{m,p})_{1\le m,p\le k}\) being the inverse of the matrix \((\varphi _{i_m}(x^p))_{1\le p \le k,1\le m \le k}\), such that \(\psi _{i_m}^{(k)}(x)=0\) for all \(1\le m \le k\) and \(x\in X\), and \(\psi _{i}^{(k)}(x^p)=0\) for all \(1\le p \le k\) and \(1\le i \le n\). Then, we determine the point \(x^{k+1} \in \varGamma _\star \) and an index \(i_{k+1}\) such that

$$\begin{aligned} \vert \psi ^{(k)}_{i_{k+1}}(x^{k+1}) \vert = \max _{x\in \varGamma _\star } \max _{1\le i \le n} \vert \psi _i^{(k)}(x) \vert . \end{aligned}$$
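The following Python sketch (ours) implements this greedy selection when the basis is evaluated on a finite candidate set \(\varGamma _\star \); the residual functions \(\psi _i^{(k)}\) are updated exactly as in the formula above, and all variable names are illustrative.

```python
import numpy as np

def magic_points(Phi, n_points):
    """Greedy selection of 'magic' interpolation points.

    Phi : (N, n) array, Phi[j, i] = value of basis function phi_i at candidate point j.
    Returns the indices of the selected candidate points and basis functions.
    """
    J, I = [], []   # selected point indices x^p and basis indices i_m
    for _ in range(n_points):
        if J:
            A = np.linalg.inv(Phi[np.ix_(J, I)])    # inverse of (phi_{i_m}(x^p))
            Psi = Phi - Phi[:, I] @ A @ Phi[J, :]   # residual functions psi_i^(k)
        else:
            Psi = Phi
        j, i = np.unravel_index(np.argmax(np.abs(Psi)), Psi.shape)
        J.append(j)
        I.append(i)
    return J, I

# Example: select 3 magic points for the basis {1, x, x^2} on a fine grid of [0, 1].
grid = np.linspace(0.0, 1.0, 101)
Phi = np.column_stack([grid**0, grid, grid**2])
J, I = magic_points(Phi, 3)
print(grid[J])   # selected interpolation points
```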

2.2.2 Discrete least-squares projection

Let U be an n-dimensional subspace of V and let \(\varGamma = \{x^k\}_{k=1}^m\) be a set of m points in X, \(m\ge n\), such that \(\Vert v \Vert _\varGamma = (\sum _{x\in \varGamma } v(x)^2)^{1/2}\) defines a norm on U. The discrete least-squares projection \(Q_U\) is the linear operator from V to U such that for \(v\in V\), \(Q_U v\) is the unique element in U which minimizes \(\Vert v - u \Vert _\varGamma ^2\) over all \(u\in U\), or equivalently

$$\begin{aligned} (u,v-Q_Uv )_\varGamma = \sum _{x\in \varGamma } u(x) \langle \delta _x , v-Q_U v \rangle = 0 \quad \forall u \in U, \end{aligned}$$

where \((\cdot ,\cdot )_\varGamma \) is the inner product associated with the norm \(\Vert \cdot \Vert _\varGamma \) on U. The discrete least-squares projection \(Q_U\) is an oblique projection onto U along \(W = \{\sum _{x\in \varGamma } u(x) \delta _x : u \in U\}\). If \(\#\varGamma = \dim (U)\) and \(\varGamma \) is unisolvent for U, then \(Q_U\) coincides with the interpolation operator \(I_U\).
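In coordinates, \(Q_U v\) is an ordinary least-squares fit of the point values of v on a basis of U; a minimal Python sketch (ours) follows, with illustrative names.

```python
import numpy as np

def discrete_least_squares(basis, points, v):
    """Discrete least-squares projection Q_U v of v onto U = span(basis),
    based on evaluations of v at the given points (len(points) >= len(basis))."""
    Phi = np.array([[phi(x) for phi in basis] for x in points])   # (m, n) design matrix
    b = np.array([v(x) for x in points])
    coeffs, *_ = np.linalg.lstsq(Phi, b, rcond=None)              # minimizes ||Phi c - b||_2
    return lambda x: sum(c * phi(x) for c, phi in zip(coeffs, basis))

# Example: project sin onto P_1 = span{1, x} from 20 points in [0, pi].
basis = [lambda x: 1.0, lambda x: x]
points = np.linspace(0.0, np.pi, 20)
Qv = discrete_least_squares(basis, points, np.sin)
```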

Proposition 5

Let U and \(\tilde{U}\) be two finite-dimensional subspaces such that \(U\subset \tilde{U}\). Let \(Q_U\) be the discrete least-squares projection onto U associated with a set of points \(\varGamma \) in X, and let \(Q_{\tilde{U}}\) be the discrete least-squares projection onto \(\tilde{U}\) associated with a set of points \(\tilde{\varGamma }\) in X. If either \(\varGamma = \tilde{\varGamma }\) or \(\varGamma \subset \tilde{\varGamma }\) and \(\tilde{\varGamma }\) is unisolvent for \(\tilde{U}\), then

$$\begin{aligned} Q_{U}Q_{\tilde{U}} = Q_{\tilde{U}} Q_U = Q_U. \end{aligned}$$

Proof

\(Q_U\) is the projection onto U along \( W= \{\sum _{x\in \varGamma } u(x) \delta _x : u \in U\}\), and \(Q_{\tilde{U}}\) is the projection onto \(\tilde{U}\) along \( \tilde{W}= \{\sum _{x\in \tilde{\varGamma }} \tilde{u}(x) \delta _x : \tilde{u} \in \tilde{U}\}\). If we prove that \(W\subset \tilde{W}\), then the result follows from Proposition 2. Let \(w = \sum _{x\in \varGamma } u(x) \delta _x \in W\), with \(u\in U\). If \(\varGamma = \tilde{\varGamma }\), then since \(u \in \tilde{U}\), we clearly have \(w\in \tilde{W}\). If \(\varGamma \subset \tilde{\varGamma }\) and \(\tilde{\varGamma }\) is unisolvent for \(\tilde{U}\), there exists a function \(\tilde{u} \in \tilde{U}\) such that \(\tilde{u}(x) = u(x)\) for all \(x\in \varGamma \) and \(\tilde{u}(x) = 0\) for all \(x\in \tilde{\varGamma }{{\setminus }} \varGamma \). Therefore, \(w = \sum _{x\in \tilde{\varGamma }} \tilde{u}(x) \delta _x\) is an element of \(\tilde{W}\), which ends the proof. \(\square \)

3 Tensors

Let \(\mathcal {H}_\nu \) be Hilbert spaces of real-valued functions defined on sets \(\mathcal {X}_\nu \) equipped with probability measures \(\mu _\nu \), \(1\le \nu \le d\). We denote by \(\Vert \cdot \Vert _{\mathcal {H}_\nu }\) the norm on \(\mathcal {H}_\nu \) and by \((\cdot ,\cdot )_{\mathcal {H}_\nu }\) the associated inner product. Let \(\mathcal {X}= \mathcal {X}_1\times \cdots \times \mathcal {X}_d\) and \(\mu = \mu _1\otimes \cdots \otimes \mu _d\). The tensor product of d functions \(v^\nu \in \mathcal {H}_\nu \), \(1\le \nu \le d\), denoted \(v^1\otimes \cdots \otimes v^d\), is a multivariate function defined on \(\mathcal {X}\) such that \((v^1\otimes \cdots \otimes v^d)(x) = v^1(x_1)\ldots v^d(x_d)\) for \(x=(x_1,\ldots ,x_d) \in \mathcal {X}\). Such a function is called an elementary tensor. The algebraic tensor space \(\mathcal {H}_{1}\otimes _a \ldots \otimes _a \mathcal {H}_d\) is defined as the linear span of all elementary tensors, which is a pre-Hilbert space when equipped with the canonical inner product \((\cdot ,\cdot )\) defined for elementary tensors by

$$\begin{aligned} ( v^1\otimes \cdots \otimes v^d,w^1\otimes \cdots \otimes w^d ) = ( v^1 , w^1)_{\mathcal {H}_1} \ldots ( v^d ,w^d)_{\mathcal {H}_d}, \end{aligned}$$

and then extended by linearity to the whole algebraic tensor space. We denote by \(\Vert \cdot \Vert \) the norm associated with inner product \((\cdot ,\cdot )\). A Hilbert tensor space \( \mathcal {H}= \overline{\mathcal {H}_{1}\otimes _a \ldots \otimes _a \mathcal {H}_d}^{\Vert \cdot \Vert }\) is then obtained by the completion of the algebraic tensor space, which we simply denote

$$\begin{aligned} \mathcal {H}= \mathcal {H}_1\otimes \cdots \otimes \mathcal {H}_d = \bigotimes _{\nu =1}^d \mathcal {H}_\nu . \end{aligned}$$

Example 1

Consider finite sets \(\mathcal {X}_\nu \) and \(\mathcal {H}_\nu = \mathbb {R}^{\mathcal {X}_\nu }\) equipped with the norm \(\Vert v \Vert _{\mathcal {H}_\nu }^2 = \sum _{x_\nu \in \mathcal {X}_\nu } \mu _\nu (\{x_\nu \}) \vert v(x_\nu )\vert ^2\). Then, \(\mathcal {H}\) is the space of multidimensional arrays \( \mathbb {R}^{\mathcal {X}_1}\otimes \cdots \otimes \mathbb {R}^{\mathcal {X}_d}\) and \(\Vert v \Vert ^2 = \sum _{x \in \mathcal {X}} \mu (\{x\}) \vert v(x) \vert ^2 \), where \(\mu (\{x_1,\ldots ,x_d\}) = \prod _{\nu =1}^d \mu _\nu (\{x_\nu \})\).

Example 2

Consider \(\mathcal {X}_\nu = \mathbb {R}\), \(\mu _\nu \) a finite measure on \(\mathbb {R}\), and \(\mathcal {H}_\nu = L^2_{\mu _\nu }(\mathcal {X}_\nu )\) equipped with the natural norm \(\Vert v \Vert _{\mathcal {H}_\nu }^2 = \int \vert v(x_\nu ) \vert ^2 \mu _\nu (dx_\nu )\). Then \(\mathcal {H}\) is identified with \(L^2_\mu (\mathcal {X})\), where \(\mu = \mu _1\otimes \cdots \otimes \mu _d\), and \(\Vert v \Vert ^2 = \int \vert v(x) \vert ^2 \mu (dx).\)

Example 3

Consider for \(\mathcal {H}_\nu \) a reproducing kernel Hilbert space (RKHS) with reproducing kernel \(k_\nu : \mathcal {X}_\nu \times \mathcal {X}_\nu \rightarrow \mathbb {R}\). Then \(\mathcal {H}\) is a RKHS with reproducing kernel \(k(x,x') = k_1(x_1,x_1')\ldots k_d(x_d,x_d')\).

For a non-empty subset \(\alpha \) in \(\{1,\ldots ,d\}:=D\), we let \(\mathcal {X}_\alpha = \times _{\nu \in \alpha } \mathcal {X}_\nu \) be the product set equipped with the product measure \(\mu _\alpha = \bigotimes _{\nu \in \alpha } \mu _\nu \). We denote by \(\mathcal {H}_\alpha = \bigotimes _{\nu \in \alpha } \mathcal {H}_\nu \) the Hilbert tensor space of functions defined on \(\mathcal {X}_\alpha \), equipped with the canonical norm \(\Vert \cdot \Vert _{\mathcal {H}_\alpha }\) such that

$$\begin{aligned} \Vert \bigotimes _{\nu \in \alpha } v^\nu \Vert _{\mathcal {H}_\alpha } = \prod _{\nu \in \alpha } \Vert v^\nu \Vert _{\mathcal {H}_\nu } \end{aligned}$$

for \(v^\nu \in \mathcal {H}_\nu \), \(\nu \in \alpha \). We have \(\mathcal {H}_D = \mathcal {H}\) and we use the convention \(\mathcal {H}_\emptyset = \mathbb {R}\).

Matricisations and \(\alpha \)-ranks. Let \(\alpha \subset D\), with \(\alpha \notin \{\emptyset ,D\}\), and let \(\alpha ^c = D{{\setminus }} \alpha \) be its complement in D. For \(x\in \mathcal {X}\), we denote by \(x_\alpha \) the subset of variables \( (x_\nu )_{\nu \in \alpha }\). A tensor \(v \in \mathcal {H}\) can be identified with an order-two tensor

$$\begin{aligned}\mathcal {M}_\alpha (v) \in \mathcal {H}_\alpha \otimes \mathcal {H}_{\alpha ^c},\end{aligned}$$

where \(\mathcal {M}_\alpha \) is the matricisation operator associated with \(\alpha \), which defines a linear isometry between \(\mathcal {H}\) and \(\mathcal {H}_\alpha \otimes \mathcal {H}_{\alpha ^c}\). We use the conventions \(\mathcal {M}_\emptyset (v) = \mathcal {M}_D (v) = v\) and \(\mathcal {H}_\emptyset \otimes \mathcal {H}_D = \mathcal {H}_D \otimes \mathcal {H}_\emptyset = \mathcal {H}\).

The \(\alpha \)-rank of a tensor \(v \in \mathcal {H}\), denoted \({\mathrm {rank}}_\alpha (v)\), is defined as the rank of the order-two tensor \(\mathcal {M}_{\alpha }(v)\), which is uniquely defined as the minimal integer such that

$$\begin{aligned} \mathcal {M}_{\alpha }(v) = \sum _{k=1}^{{\mathrm {rank}}_\alpha (v)} v^\alpha _k \otimes v^{\alpha ^c}_k,\quad \text {or equivalently} \quad v(x) = \sum _{k=1}^{{\mathrm {rank}}_\alpha (v)} v^\alpha _k(x_\alpha ) v^{\alpha ^c}_k(x_{\alpha ^c}), \end{aligned}$$
(2)

for some functions \(v^{\alpha }_k\in \mathcal {H}_\alpha \) and \(v^{\alpha ^c}_k \in \mathcal {H}_{\alpha ^c}\) of complementary subsets of variables \(x_\alpha \) and \(x_{\alpha ^c}\) respectively. By convention, we have \({\mathrm {rank}}_\emptyset (v) = {\mathrm {rank}}_D(v) = 1.\) From now on, when there is no ambiguity, \(\mathcal {M}_\alpha (v)\) and \(\mathcal {H}_\alpha \otimes \mathcal {H}_{\alpha ^c}\) will be identified with v and \(\mathcal {H}\) respectively.
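For multidimensional arrays (Example 1), the matricisation \(\mathcal {M}_\alpha \) is a permutation of the modes followed by a reshape, and \({\mathrm {rank}}_\alpha \) is the rank of the resulting matrix. A short NumPy sketch (ours, with modes numbered from 0 rather than 1) follows.

```python
import numpy as np

def alpha_matricisation(v, alpha):
    """Matricisation M_alpha(v) of a d-way array v: rows indexed by the modes in alpha,
    columns by the complementary modes."""
    d = v.ndim
    alpha = sorted(alpha)
    alpha_c = [nu for nu in range(d) if nu not in alpha]
    w = np.transpose(v, alpha + alpha_c)
    return w.reshape(int(np.prod([v.shape[nu] for nu in alpha])), -1)

def alpha_rank(v, alpha):
    return np.linalg.matrix_rank(alpha_matricisation(v, alpha))

# Example: an elementary (rank-one) tensor has alpha-rank 1 for every alpha.
a, b, c = np.random.rand(3), np.random.rand(4), np.random.rand(5)
v = np.einsum('i,j,k->ijk', a, b, c)
print(alpha_rank(v, [0]), alpha_rank(v, [0, 2]))   # 1 1
```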

Minimal subspaces. The minimal subspace \(U^{min}_\alpha (v)\) of v is defined as the smallest closed subspace in \( \mathcal {H}_\alpha \) such that

$$\begin{aligned} v \in U^{min}_\alpha (v) \otimes \mathcal {H}_{\alpha ^c}, \end{aligned}$$

and we have \({\mathrm {rank}}_\alpha (v) = \dim (U^{min}_\alpha (v))\) (see [13]). If v admits the representation (2), then \( U^{min}_\alpha (v)\) is the closure of \( {{\mathrm {span}}\{v^\alpha _k\}_{k=1}^{{\mathrm {rank}}_\alpha (v)}}. \) For any partition \(S(\alpha )\) of \(\alpha \), we have

$$\begin{aligned} U^{min}_{\alpha }(v) \subset \bigotimes _{\beta \in S(\alpha )} U^{min}_{\beta }(v). \end{aligned}$$

We have \(U^{min}_D(v) =\mathbb {R}v\) and for any partition S(D) of D,

$$\begin{aligned} v \in \bigotimes _{\beta \in S(D)} U^{min}_{\beta }(v) . \end{aligned}$$

3.1 Operators on tensor spaces

Let us consider the Hilbert tensor space \(\mathcal {H}= \bigotimes _{\nu =1}^d \mathcal {H}_\nu \) equipped with the canonical norm \(\Vert \cdot \Vert \). For linear operators from \(\mathcal {H}\) to \(\mathcal {H}\), we also denote by \( \Vert \cdot \Vert \) the operator norm \(\Vert \cdot \Vert _{\mathcal {H}\rightarrow \mathcal {H}} = \Vert \cdot \Vert _\mathcal {H}\).

We denote by \({id}\) the identity operator on \(\mathcal {H}\). For a non-empty subset \(\alpha \subset D\), we denote by \({id}_\alpha \) the identity operator on \(\mathcal {H}_\alpha \). For \(A_\alpha \) in \(L(\mathcal {H}_\alpha )\), we define the linear operator \( A_{\alpha } \otimes {id}_{\alpha ^c}\) such that for \(v^\alpha \in \mathcal {H}_\alpha \) and \( v^{\alpha ^c} \in \mathcal {H}_{\alpha ^c}\),

$$\begin{aligned} (A_{\alpha } \otimes {id}_{\alpha ^c})(v^\alpha \otimes v^{\alpha ^c} ) = (A_\alpha v^\alpha )\otimes v^{\alpha ^c}, \end{aligned}$$

and we extend this definition by linearity to the whole algebraic tensor space \(\mathcal {H}_{\alpha }\otimes _a \mathcal {H}_{\alpha ^c}\). For a finite-dimensional tensor space \(\mathcal {H}\), this completely characterizes a linear operator on \(\mathcal {H}\). For an infinite-dimensional tensor space \(\mathcal {H}\), if \(A_\alpha \in \mathcal {L}(U_\alpha ,\mathcal {H}_\alpha )\), with \(U_\alpha \subset \mathcal {H}_\alpha \), then \(A_{\alpha } \otimes {id}_{\alpha ^c}\) can be extended by continuity to \(U_\alpha \otimes \mathcal {H}_{\alpha ^c}\).

We denote by \(\mathcal {A}_\alpha \), using calligraphic font style, the linear operator in \(L(\mathcal {H})\) associated with an operator \(A_\alpha \) in \(L(\mathcal {H}_\alpha )\), defined by \( \mathcal {A}_\alpha = \mathcal {M}_\alpha ^{-1} (A_{\alpha } \otimes {id}_{\alpha ^c}) \mathcal {M}_\alpha , \) and simply denoted

$$\begin{aligned} \mathcal {A}_\alpha = A_{\alpha } \otimes {id}_{\alpha ^c} \end{aligned}$$

when there is no ambiguity. If \(A_\alpha \in \mathcal {L}(\mathcal {H}_\alpha )\), then \(\mathcal {A}_\alpha \in \mathcal {L}(\mathcal {H})\) and the two operators have the same operator norm \( \Vert \mathcal {A}_\alpha \Vert = \Vert A_\alpha \Vert _{\mathcal {H}_\alpha }.\) Also, we have the following more general result.

Proposition 6

If \(A_\alpha \in \mathcal {L}(U_\alpha ,\mathcal {H}_\alpha )\), with \(U_\alpha \subset \mathcal {H}_\alpha \), then \(\mathcal {A}_\alpha \in \mathcal {L}(U_\alpha \otimes \mathcal {H}_{\alpha ^c},\mathcal {H})\) and the two operators have the same operator norm

$$\begin{aligned} \Vert \mathcal {A}_\alpha \Vert _{U_\alpha \otimes \mathcal {H}_{\alpha ^c} \rightarrow \mathcal {H}} = \Vert A_\alpha \Vert _{U_\alpha \rightarrow \mathcal {H}_\alpha }.\end{aligned}$$

Corollary 1

For a tensor \(v\in \mathcal {H}\) and an operator \(A_\alpha \in \mathcal {L}(U^{min}_\alpha (v),\mathcal {H}_\alpha )\),

$$\begin{aligned} \Vert \mathcal {A}_\alpha v \Vert \le \Vert A_{\alpha } \Vert _{U^{min}_\alpha (v)\rightarrow \mathcal {H}_\alpha } \Vert v \Vert . \end{aligned}$$

Let \(S = \{\alpha _1,\ldots ,\alpha _K\}\) be a collection of disjoint subsets of D and let \(A_{\alpha } \in L(\mathcal {H}_\alpha )\) be linear operators, \(\alpha \in S\). Then we can define a linear operator \(A_{\alpha _1} \otimes \cdots \otimes A_{\alpha _K}:=\bigotimes _{\alpha \in S} A_\alpha \) on \(\mathcal {H}_{\alpha _1} \otimes _a \ldots \otimes _a \mathcal {H}_{\alpha _K}\) such that

$$\begin{aligned} \left( \bigotimes _{\alpha \in S} A_\alpha \right) \left( \bigotimes _{\alpha \in S} v^\alpha \right) = \bigotimes _{\alpha \in S} (A_\alpha v^\alpha ) \end{aligned}$$

for \(v^\alpha \in \mathcal {H}_\alpha \), \(\alpha \in S\). The operator \(\bigotimes _{\alpha \in S} A_\alpha \) can be identified with an operator

$$\begin{aligned} \mathcal {A}= \prod _{\alpha \in S} \mathcal {A}_\alpha , \end{aligned}$$

defined on the algebraic tensor space \(\mathcal {H}_1\otimes _a \ldots \otimes _a \mathcal {H}_d\). The definition of \(\mathcal {A}\) is independent of the ordering of the elements of S. If the operators \(A_\alpha \) are continuous, then \(\mathcal {A}\) defines a continuous operator from \(\mathcal {H}\) to \(\mathcal {H}\), and since \(\Vert \cdot \Vert \) is a uniform crossnorm (see [20, Proposition 4.127]), the operator \(\mathcal {A}\) has operator norm

$$\begin{aligned} \Vert \mathcal {A}\Vert = \prod _{\alpha \in S} \Vert \mathcal {A}_\alpha \Vert = \prod _{\alpha \in S} \Vert A_\alpha \Vert _{\mathcal {H}_\alpha }. \end{aligned}$$

Also, we have the following more general result.

Proposition 7

Let S be a collection of disjoint subsets of D and let \(\beta \subset D\) such that \(\beta \cup (\cup _{\alpha \in S} \alpha )=D\). Let \(U_\alpha \) be a subspace of \(\mathcal {H}_\alpha \) and \(A_{\alpha } \in \mathcal {L}(U_\alpha , \mathcal {H}_\alpha )\), for \(\alpha \in S\). Then \(\mathcal {A}= \prod _{\alpha \in S} \mathcal {A}_\alpha \) is a continuous operator from \(\mathcal {U}:= (\bigotimes _{\alpha \in S} U_\alpha )\otimes \mathcal {H}_{\beta }\) to \(\mathcal {H}\) such that

$$\begin{aligned} \Vert \mathcal {A}\Vert _{\mathcal {U}\rightarrow \mathcal {H}} = \prod _{\alpha \in S} \Vert \mathcal {A}_\alpha \Vert _{U_\alpha \otimes \mathcal {H}_{\alpha ^c} \rightarrow \mathcal {H}} =\prod _{\alpha \in S} \Vert A_\alpha \Vert _{U_\alpha \rightarrow \mathcal {H}_\alpha } . \end{aligned}$$

Corollary 2

Let S be a collection of disjoint subsets of D. For a tensor \(v\in \mathcal {H}\) and operators \(A_\alpha \), \(\alpha \in S\), such that \(A_\alpha \in \mathcal {L}(U^{min}_\alpha (v),\mathcal {H}_\alpha )\), the operator \(\mathcal {A}= \prod _{\alpha \in S} \mathcal {A}_\alpha \) is such that

$$\begin{aligned} \Vert \mathcal {A}v \Vert \le \Vert v \Vert \prod _{\alpha \in S} \Vert A_{\alpha } \Vert _{U^{min}_\alpha (v)\rightarrow \mathcal {H}_\alpha } . \end{aligned}$$

3.2 Partial evaluations of tensors

Let \(\alpha \) be a non-empty subset of D. For a linear form \(\psi _\alpha \in \mathcal {H}_\alpha ^*\), \(\psi _\alpha \otimes id_{\alpha ^c}\) is a linear operator from \(\mathcal {H}_{\alpha }\otimes _a \mathcal {H}_{\alpha ^c}\) to \(\mathcal {H}_{\alpha ^c}\) such that \( (\psi _\alpha \otimes id_{\alpha ^c} )(v^\alpha \otimes v^{\alpha ^c}) = \psi _\alpha (v^\alpha ) v^{\alpha ^c}.\) If \(\psi _\alpha \in \mathcal {H}_{\alpha }'\), the definition of \(\psi _\alpha \otimes id_{\alpha ^c}\) can be extended by continuity to \(\mathcal {H}\). Then \(\psi _\alpha \otimes id_{\alpha ^c}\) is a continuous operator from \(\mathcal {H}\) to \(\mathcal {H}_{\alpha ^c}\) with operator norm \(\Vert \psi _{\alpha } \otimes id_{\alpha ^c} \Vert _{\mathcal {H}\rightarrow \mathcal {H}_{\alpha ^c}}= \Vert \psi _\alpha \Vert _{\mathcal {H}_\alpha '}.\) Also, we have the following result.

Proposition 8

If \(\psi _\alpha \in U_{\alpha }'\), with \(U_\alpha \) a subspace of \( \mathcal {H}_\alpha \), then \(\psi _\alpha \otimes id_{\alpha ^c} \in \mathcal {L}(U_\alpha \otimes \mathcal {H}_{\alpha ^c},\mathcal {H}_{\alpha ^c})\) and

$$\begin{aligned} \Vert \psi _{\alpha } \otimes id_{\alpha ^c} \Vert _{U_\alpha \otimes \mathcal {H}_{\alpha ^c} \rightarrow \mathcal {H}_{\alpha ^c}}= \Vert \psi _\alpha \Vert _{U_\alpha '}. \end{aligned}$$

Corollary 3

For a tensor \(v \in \mathcal {H}\) and \(\psi _\alpha \in U^{min}_\alpha (v)'\), we have

$$\begin{aligned} \Vert (\psi _\alpha \otimes id_{\alpha ^c}) v \Vert \le \Vert \psi _{\alpha } \Vert _{(U^{min}_\alpha (v))'} \Vert v \Vert . \end{aligned}$$

For a point \(x_\alpha \in \mathcal {X}_\alpha \), we denote by \(\delta _{x_\alpha } \in \mathcal {H}_\alpha ^*\) the point evaluation functional at \(x_\alpha \), defined by \(\langle \delta _{x_\alpha } , v^\alpha \rangle = v^\alpha (x_\alpha )\) for \(v^\alpha \in \mathcal {H}_\alpha \). Then \(\delta _{x_\alpha } \otimes id_{\alpha ^c}\) defines a partial evaluation functional, which is a linear operator from \(\mathcal {H}\) to \(\mathcal {H}_{\alpha ^c}\) such that

$$\begin{aligned} (\delta _{x_\alpha }\otimes id_{\alpha ^c})(v^\alpha \otimes v^{\alpha ^c}) = v^\alpha (x_\alpha )v^{\alpha ^c}. \end{aligned}$$

From Corollary 3, we deduce that for a given tensor \(v \in \mathcal {H}\), if \(\delta _{x_{\alpha }} \in U^{min}_\alpha (v)'\), then the definition of \(\delta _{x_{\alpha }}\otimes id_{\alpha ^c}\) can be extended by continuity to \(U^{min}_\alpha (v) \otimes \mathcal {H}_{\alpha ^c}\) and the partial evaluation

$$\begin{aligned} v(x_\alpha ,\cdot ) = (\delta _{x_\alpha }\otimes id_{\alpha ^c})v \end{aligned}$$

is an element of \(\mathcal {H}_{\alpha ^c}\) such that

$$\begin{aligned} \Vert v(x_\alpha ,\cdot ) \Vert _{\mathcal {H}_{\alpha ^c}} = \Vert (\delta _{x_\alpha }\otimes id_{\alpha ^c})v \Vert \le \Vert \delta _{x_\alpha } \Vert _{ U^{min}_\alpha (v)'} \Vert v \Vert . \end{aligned}$$
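For a multidimensional array, the partial evaluation \((\delta _{x_\alpha }\otimes id_{\alpha ^c})v\) simply fixes the entries of the multi-index belonging to \(\alpha \); a small sketch (ours, with 0-based modes) follows.

```python
import numpy as np

def partial_evaluation(v, alpha, x_alpha):
    """Partial evaluation v(x_alpha, .) of a d-way array: fix the indices in alpha
    to the values x_alpha and return the remaining (d - #alpha)-way array."""
    index = [slice(None)] * v.ndim
    for nu, x in zip(alpha, x_alpha):
        index[nu] = x
    return v[tuple(index)]

# Example with d = 3 and alpha = {0, 2} (0-based): v(x_1, ., x_3) is a vector over mode 1.
v = np.random.rand(3, 4, 5)
w = partial_evaluation(v, [0, 2], [1, 3])
assert w.shape == (4,) and np.allclose(w, v[1, :, 3])
```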

3.3 Projection of tensors

Let \(\alpha \) be a non-empty and strict subset of D and let \(U_\alpha \) be a finite-dimensional subspace of \(\mathcal {H}_\alpha \). If \(P_{\alpha }\) is a projection from \(\mathcal {H}_\alpha \) onto \(U_\alpha \), then \(P_{\alpha }\otimes id_{\alpha ^c}\) is a projection from \(\mathcal {H}_\alpha \otimes \mathcal {H}_{\alpha ^c}\) onto \(U_\alpha \otimes \mathcal {H}_{\alpha ^c}\).

Proposition 9

Let \(v \in \mathcal {H}\) and \(\alpha ,\beta \subset D\). Let \(P_{\beta }\) be a projection from \(\mathcal {H}_\beta \) to a subspace \(U_\beta \) and let \(\mathcal {P}_{\beta }\) be the corresponding projection onto \(U_\beta \otimes \mathcal {H}_{\beta ^c}\). If \(\beta \subset \alpha \) or \(\beta \subset D{{\setminus }} \alpha \), we have

$$\begin{aligned} {\mathrm {rank}}_\alpha (\mathcal {P}_{\beta } v) \le {\mathrm {rank}}_\alpha ( v). \end{aligned}$$

Proof

A tensor v admits a representation \( v = \sum _{k=1}^{{\mathrm {rank}}_\alpha (v)} v^\alpha _k \otimes w^{\alpha ^c}_k. \) If \(\beta \subset \alpha \), then \(\mathcal {P}_{\beta } = (P_{\beta }\otimes {id}_{\alpha {{\setminus }} \beta } ) \otimes {id}_{D{{\setminus }} \alpha }\) and \( \mathcal {P}_{\beta } v = \sum _{k=1}^{{\mathrm {rank}}_\alpha (v)} ((P_{\beta }\otimes {id}_{\alpha {{\setminus }} \beta })v^\alpha _k) \otimes w^{\alpha ^c}_k. \) If \(\beta \subset D{{\setminus }} \alpha \), then \(\mathcal {P}_{\beta } = {id}_\alpha \otimes (P_{\beta } \otimes {id}_{D{{\setminus }} \{\alpha \cup \beta \}})\) and \( \mathcal {P}_{\beta } v = \sum _{k=1}^{{\mathrm {rank}}_\alpha (v)} v^\alpha _k \otimes ((P_{\beta }\otimes {id}_{D{{\setminus }} \{\alpha \cup \beta \}})w^{\alpha ^c}_k). \) The result follows from the definition of the \(\alpha \)-rank. \(\square \)

If \(P_{U_\alpha }\) is the orthogonal projection from \(\mathcal {H}_\alpha \) onto \(U_\alpha \), then \(P_{U_\alpha }\otimes id_{\alpha ^c}\) coincides with the orthogonal projection \(P_{U_\alpha \otimes \mathcal {H}_{\alpha ^c}}\) from \(\mathcal {H}_\alpha \otimes \mathcal {H}_{\alpha ^c}\) onto \(U_\alpha \otimes \mathcal {H}_{\alpha ^c}\), and is identified with the orthogonal projection \(\mathcal {P}_{U_\alpha } = P_{U_\alpha }\otimes id_{\alpha ^c}\) in \(\mathcal {L}(\mathcal {H})\). If \(P_{U_\alpha }^{W_\alpha }\) is the oblique projection onto \(U_\alpha \) along \(W_\alpha \subset \mathcal {H}_\alpha ^*\), then \(\mathcal {P}_{U_\alpha }^{W_\alpha }:=P_{U_\alpha }^{W_\alpha }\otimes id_{\alpha ^c}\) is the oblique projection from \(\mathcal {H}_\alpha \otimes \mathcal {H}_{\alpha ^c}\) onto \(U_\alpha \otimes \mathcal {H}_{\alpha ^c}\) along \(W_\alpha \otimes \mathcal {H}_{\alpha ^c}'\). If \(W_\alpha \subset \mathcal {H}_\alpha '\), then \(P_{U_\alpha }^{W_\alpha }\) and \(\mathcal {P}_{U_\alpha }^{W_\alpha }\) are continuous operators with equal norms \(\Vert \mathcal {P}_{U_\alpha }^{W_\alpha } \Vert = \Vert P_{U_\alpha }^{W_\alpha } \Vert _{\mathcal {H}_\alpha }.\)
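In the finite-dimensional case, applying \(\mathcal {P}_{U_\alpha } = P_{U_\alpha }\otimes id_{\alpha ^c}\) to an array amounts to multiplying its \(\alpha \)-matricisation on the left by the matrix of \(P_{U_\alpha }\). The sketch below (ours) assumes an orthonormal basis of \(U_\alpha \) stored as the columns of a matrix B, so that the projection matrix is \(BB^T\).

```python
import numpy as np

def project_modes(v, alpha, B):
    """Apply the projection P_{U_alpha} (x) id_{alpha^c} to a d-way array v, where
    U_alpha is spanned by the orthonormal columns of B.  In matricised form this is
    M_alpha(v) -> (B B^T) M_alpha(v)."""
    d = v.ndim
    alpha = sorted(alpha)
    alpha_c = [nu for nu in range(d) if nu not in alpha]
    perm = alpha + alpha_c
    permuted_shape = [v.shape[nu] for nu in perm]
    M = np.transpose(v, perm).reshape(int(np.prod(permuted_shape[:len(alpha)])), -1)
    M = B @ (B.T @ M)                                  # project each column onto range(B)
    return np.transpose(M.reshape(permuted_shape), np.argsort(perm))

# Example: orthogonal projection of mode 0 of a 3x4x5 array onto a 2-dimensional subspace.
v = np.random.rand(3, 4, 5)
B, _ = np.linalg.qr(np.random.rand(3, 2))              # orthonormal basis of U_alpha
Pv = project_modes(v, [0], B)
assert np.allclose(project_modes(Pv, [0], B), Pv)      # idempotent: P^2 = P
```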

Proposition 10

Let \(U_\alpha \) be a finite-dimensional subspace of \(\mathcal {H}_\alpha \) and let \(P_{U_\alpha }^{W_\alpha }\) be the projection onto \(U_\alpha \) along \(W_\alpha \). For a tensor \(v\in \mathcal {H}\) such that \(W_\alpha \subset U^{min}_\alpha (v)'\), \(\mathcal {P}^{W_\alpha }_{U_\alpha } v\) is an element of \(U_\alpha \otimes \mathcal {H}_{\alpha ^c}\) such that

$$\begin{aligned} \Vert \mathcal {P}^{W_\alpha }_{U_\alpha } v \Vert \le \Vert P^{W_\alpha }_{U_\alpha } \Vert _{U^{min}_\alpha (v) \rightarrow \mathcal {H}_\alpha } \Vert v \Vert , \end{aligned}$$

and

$$\begin{aligned} \Vert \mathcal {P}^{W_\alpha }_{U_\alpha } v - \mathcal {P}_{U_\alpha } v \Vert \le \Vert P^{W_\alpha }_{U_\alpha } - P_{U_\alpha } \Vert _{U^{min}_\alpha (v) \rightarrow \mathcal {H}_\alpha } \Vert v \Vert , \end{aligned}$$

with

$$\begin{aligned} \Vert P^{W_\alpha }_{U_\alpha } - P_{U_\alpha } \Vert _{U^{min}_\alpha (v) \rightarrow \mathcal {H}_\alpha } = \Vert P^{W_\alpha }_{U_\alpha } \Vert _{(id_\alpha - P_{U_\alpha }) U^{min}_\alpha (v) \rightarrow \mathcal {H}_\alpha } \le \Vert P^{W_\alpha }_{U_\alpha } \Vert _{U^{min}_\alpha (v) \rightarrow \mathcal {H}_\alpha }. \end{aligned}$$

Also,

$$\begin{aligned} \Vert v - \mathcal {P}^{W_\alpha }_{U_\alpha } v \Vert ^2 \le ({1+\Vert P^{W_\alpha }_{U_\alpha } - P_{U_\alpha } \Vert ^2_{U^{min}_\alpha (v)\rightarrow \mathcal {H}_\alpha }}) \Vert v - \mathcal {P}_{U_\alpha } v\Vert ^2. \end{aligned}$$

Proof

We have \(v \in U^{min}_\alpha (v)\otimes \mathcal {H}_{\alpha ^c}\). Noting that \(\Vert \mathcal {P}^{W_\alpha }_{U_\alpha } \Vert _{U^{min}_\alpha (v)\otimes \mathcal {H}_{\alpha ^c} \rightarrow \mathcal {H}} = \Vert P^{W_\alpha }_{U_\alpha } \Vert _{U^{min}_\alpha (v) \rightarrow \mathcal {H}_\alpha }\) and \(\Vert \mathcal {P}^{W_\alpha }_{U_\alpha } - \mathcal {P}_{U_\alpha } \Vert _{U^{min}_\alpha (v)\otimes \mathcal {H}_{\alpha ^c} \rightarrow \mathcal {H}} = \Vert P^{W_\alpha }_{U_\alpha } - P_{U_\alpha } \Vert _{U^{min}_\alpha (v) \rightarrow \mathcal {H}_\alpha }\), the results directly follow from Proposition 3. \(\square \)

Now, let \(\alpha \) be a non-empty subset of D and let \(S(\alpha )\) be a partition of \(\alpha \). Let \(P_{U_\beta }^{W_\beta }\) be oblique projections onto subspaces \(U_\beta \) of \(\mathcal {H}_\beta \) along \(W_\beta \subset \mathcal {H}_\beta ^*\), \(\beta \in S(\alpha )\). Then \( \bigotimes _{\beta \in S(\alpha )} P_{U_\beta }^{W_\beta } := P_{U_{S(\alpha )}}^{W_{S(\alpha )}}\) is the oblique projection from \(\mathcal {H}_{S(\alpha )} = \bigotimes _{\beta \in S(\alpha )} \mathcal {H}_\beta \) onto \( \bigotimes _{\beta \in S(\alpha )} U_\beta := U_{S(\alpha )}\) along \( \bigotimes _{\beta \in S(\alpha )} W_\beta := W_{S(\alpha )}\), and \(\mathcal {P}^{W_{S(\alpha )}}_{U_{S(\alpha )}} = P_{U_{S(\alpha )}}^{W_{S(\alpha )}} \otimes id_{ \alpha ^c}\) is the oblique projection from \(\mathcal {H}_{\alpha } \otimes \mathcal {H}_{\alpha ^c}\) to \(U_{S(\alpha )} \otimes \mathcal {H}_{\alpha ^c}\) along \(W_{S(\alpha )}\otimes \mathcal {H}_{\alpha ^c}'\). From Proposition 2, we directly obtain the following result.

Proposition 11

If \( U_\alpha \subset \bigotimes _{\beta \in S(\alpha )} U_\beta \) and \(W_\alpha \subset \bigotimes _{\beta \in S(\alpha )} W_\beta \), then

$$\begin{aligned} \mathcal {P}_{U_\alpha }^{W_\alpha } \left( \prod _{\beta \in S(\alpha )} \mathcal {P}_{U_\beta }^{W_\beta }\right) = \left( \prod _{\beta \in S(\alpha )} \mathcal {P}_{U_\beta }^{W_\beta }\right) \mathcal {P}_{U_\alpha }^{W_\alpha } = \mathcal {P}_{U_\alpha }^{W_\alpha } . \end{aligned}$$

4 Tree-based tensor formats

Let \(T \subset 2^D{{\setminus }} \emptyset \) be a dimension partition tree over D, with root D. The elements of T are called the nodes of the tree. Every node \(\alpha \in T\) with \(\#\alpha \ge 2\) has a set of sons \(S(\alpha )\) which form a partition of \(\alpha \), i.e. \(\bigcup _{\beta \in S(\alpha )} \beta = \alpha \). A node \(\alpha \in T\) with \(\#\alpha =1\) is such that \(S(\alpha )=\emptyset \) and is called a leaf of the tree. The set of leaves of T is denoted \(\mathcal {L}(T)\) (see an example on Fig. 1). For \(\alpha \in T\), we denote by \({\mathrm {level}}(\alpha )\) the level of \(\alpha \) in T, such that \({\mathrm {level}}(D)=0\) and \({\mathrm {level}}(\beta ) = {\mathrm {level}}(\alpha )+1\) if \(\beta \in S(\alpha )\). We let \(L = {\mathrm {depth}}(T)= \max _{\alpha \in T} {\mathrm {level}}(\alpha )\) be the depth of T, which is the maximum level of the nodes in T, and \(T_{\ell } = \{\alpha \in T : {\mathrm {level}}(\alpha )=\ell \}\) be the subset of nodes with level \(\ell \), \(0\le \ell \le L\). We let \( t_\ell = \bigcup _{\alpha \in T_{\ell }} \alpha . \) We have \(t_{\ell +1} \subset t_{\ell } \) and \( t_{\ell } {{\setminus }} t_{\ell +1} \subset \mathcal {L}(T) \) (see example on Fig. 2).

Fig. 1 A dimension partition tree T over \(D=\{1,2,3,4,5,6\}\) and its leaves (blue nodes)

Fig. 2 A dimension partition tree T over \(D=\{1,\ldots ,6\}\) with depth \(L=3\) and the corresponding subsets \(T_{\ell }\), \(0\le \ell \le L\). Here \(t_3 = \{2,3\}\) and \(t_2=t_1 =t_0= D\)

We introduce a subset of active nodes \(A \subset T {{\setminus }} \{D\}\) such that \(T {{\setminus }} A \subset \{D\}\cup \mathcal {L}(T) \), which means that the set of non active nodes in \(T {{\setminus }} \{D\}\) is a subset of the leaves (see Fig. 3). A set A is admissible if for any \(\alpha \in A\), the parent node of \(\alpha \) is in \(A\cup \{D\}\). We let \(\mathcal {L}(A) = A \cap \mathcal {L}(T)\), \(A_\ell = A \cap T_{\ell }\) for \(1\le \ell \le L\), and \(a_\ell = \cup _{\alpha \in A_\ell } \alpha \). We define the A-rank of a tensor \(v\in \mathcal {H}\) as the tuple \({\mathrm {rank}}_A(v) = \{{\mathrm {rank}}_\alpha (v)\}_{\alpha \in A}\).

Fig. 3 A dimension partition tree T over \(D=\{1,2,3,4,5,6\}\) and an admissible subset of active nodes A (red nodes)

Now we consider a tensor \(v\in \mathcal {H}\) with \({\mathrm {rank}}_A(v) =(r_\alpha )_{\alpha \in A}\). We let \(r_D = {\mathrm {rank}}_D(v) =1\). For all \(\alpha \in A \cup \{D\}\), we denote by \(\{v^\alpha _{k_\alpha }\}_{k_\alpha =1}^{r_\alpha }\) a basis of the minimal subspace \(U^{min}_\alpha (v)\subset \mathcal {H}_\alpha \), and we let \(v^D_1=v\). For \(\alpha \in A \cup \{D\}\) such that \(\emptyset \ne S(\alpha )\subset A\), since \(U^{min}_\alpha (v) \subset \bigotimes _{\beta \in S(\alpha )} U^{min}_\beta (v)\), the tensor \(v^{\alpha }_{k_\alpha }\) admits a representation

$$\begin{aligned} v^\alpha _{k_\alpha }(x_\alpha ) = \sum _{\begin{array}{c} 1\le k_\beta \le r_\beta \\ \beta \in S(\alpha ) \end{array}} C^\alpha _{k_\alpha ,(k_\beta )_{\beta \in S(\alpha )}} \prod _{\beta \in S(\alpha )} v^\beta _{k_\beta }(x_\beta ), \end{aligned}$$

with a tensor of coefficients \(C^\alpha \in \mathbb {R}^{r_\alpha } \otimes \left( \bigotimes _{\beta \in S(\alpha )} \mathbb {R}^{r_\beta }\right) \). For \(\alpha \in A \cup \{D\}\) such that \(\emptyset \ne S(\alpha )\not \subset A\), we have \(U^{min}_\alpha (v) \subset (\bigotimes _{\beta \in S(\alpha )\cap A} U^{min}_\beta (v))\otimes (\bigotimes _{\beta \in S(\alpha ){{\setminus }} A}\mathcal {H}_\beta )\), and therefore the tensor \(v^{\alpha }_{k_\alpha } \) admits a representation

$$\begin{aligned} v^\alpha _{k_\alpha }(x_\alpha ) = \sum _{\begin{array}{c} 1\le k_\beta \le r_\beta \\ \beta \in S(\alpha )\cap A \end{array}} C^\alpha _{k_\alpha ,(k_\beta )_{\beta \in S(\alpha )\cap A}}((x_{\beta })_{\beta \in S(\alpha ){{\setminus }} A}) \prod _{\beta \in S(\alpha ) \cap A} v^\beta _{k_\beta }(x_\beta ), \end{aligned}$$

with \(C^\alpha \in \mathbb {R}^{r_\alpha } \otimes \left( \bigotimes _{\beta \in S(\alpha )\cap A} \mathbb {R}^{r_\beta }\right) \otimes \left( \bigotimes _{\beta \in S(\alpha ){{\setminus }} A} \mathcal {H}_\beta \right) \). Finally, a tensor v such that \({\mathrm {rank}}_A(v) = (r_\alpha )_{\alpha \in A}\) admits a representation

$$\begin{aligned} v = \sum _{\begin{array}{c} 1\le k_\alpha \le r_\alpha \\ \alpha \in A \cup \{D\} \end{array}} \prod _{\alpha \in (A \cup \{D\} ){{\setminus }} \mathcal {L}(A)} C^\alpha _{k_\alpha ,(k_\beta )_{\beta \in S(\alpha )\cap A}}((x_{\beta })_{\beta \in S(\alpha ){{\setminus }} A}) \prod _{\alpha \in \mathcal {L}({A})} v^{\alpha }_{k_\alpha }(x_\alpha ) \end{aligned}$$
(3)

For a tuple \(r=(r_\alpha )_{\alpha \in A}\), we define the subset \(\mathcal {T}^A_r(\mathcal {H})\) of tensors in \(\mathcal {H}\) with A-rank bounded by r,

$$\begin{aligned} \mathcal {T}^A_r(\mathcal {H}) = \{v \in \mathcal {H}: {\mathrm {rank}}_\alpha (v) \le r_\alpha , \alpha \in A\} = \bigcap _{\alpha \in A} \mathcal {T}^{\{\alpha \}}_{r_\alpha }(\mathcal {H}). \end{aligned}$$

Remark 1

A tensor \(v \in \mathcal {T}^A_r(\mathcal {H})\) admits a representation as a composition of functions. For \(\alpha \in A\), let \(v^\alpha (x_\alpha ) = (v_{1}^\alpha (x_\alpha ),\ldots ,v_{r_\alpha }^\alpha (x_\alpha )) \in \mathbb {R}^{r_\alpha }\). If \(\emptyset \ne S(\alpha ) \subset A\), the tensor \(C^\alpha \) can be identified with a multilinear function \(f^\alpha : \times _{\beta \in S(\alpha )} \mathbb {R}^{r_\beta } \rightarrow \mathbb {R}^{r_\alpha }\), and \(v^\alpha (x_\alpha )\) admits the representation

$$\begin{aligned} v^\alpha (x_\alpha ) = f^\alpha ((v^{\beta }(x_\beta ))_{\beta \in S(\alpha )}). \end{aligned}$$

For \(\alpha \in A\cup \{D\}\) such that \(\emptyset \ne S(\alpha )\not \subset A\), the tensor \(C^\alpha ((x_{\beta })_{\beta \in S(\alpha ){{\setminus }} A})\) can be identified with a multilinear function \(f^\alpha : \left( \times _{\beta \in S(\alpha )\cap A} \mathbb {R}^{r_\beta }\right) \times \left( \times _{\beta \in S(\alpha ){{\setminus }} A} \mathcal {X}_\beta \right) \rightarrow \mathbb {R}^{r_\alpha }\), and \(v^\alpha (x_\alpha )\) admits the representation

$$\begin{aligned} v^\alpha (x_\alpha ) = f^\alpha ((v^{\beta }(x_\beta ))_{\beta \in S(\alpha )\cap A} , (x_{\beta })_{\beta \in S(\alpha ){{\setminus }} A}), \end{aligned}$$

where \(f^\alpha \) is linear in the arguments associated with the active nodes \(\beta \in S(\alpha )\cap A\). As an example, for the case of Fig. 3, the tensor v admits the representation

$$\begin{aligned} v(x)=f^{1,2,3,4,5,6}(f^{1,2,3}(x_1,f^{2,3}(x_2,v^{3}(x_3))), f^{4,5,6}(x_4,x_5,v^{6}(x_6))). \end{aligned}$$
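To make this composition concrete, here is a small NumPy sketch (ours) for the tree of Fig. 3 in the finite-dimensional setting of Example 1, with \(\mathcal {X}_\nu = \{0,\ldots ,n-1\}\) and all ranks equal to r; the coefficient tensors \(C^\alpha \) are random and the names are illustrative.

```python
import numpy as np

# A minimal sketch of the composition above: active nodes {3}, {6}, {2,3}, {1,2,3}, {4,5,6}.
n, r = 4, 2
rng = np.random.default_rng(0)

v3   = rng.standard_normal((n, r))          # active leaf {3}:   v^3(x3) in R^r
v6   = rng.standard_normal((n, r))          # active leaf {6}:   v^6(x6) in R^r
C23  = rng.standard_normal((r, n, r))       # node {2,3}:   C^{2,3}_{k,(k3)}(x2)
C123 = rng.standard_normal((r, n, r))       # node {1,2,3}: C^{1,2,3}_{k,(k23)}(x1)
C456 = rng.standard_normal((r, n, n, r))    # node {4,5,6}: C^{4,5,6}_{k,(k6)}(x4,x5)
CD   = rng.standard_normal((r, r))          # root {1,...,6}: C^D_{1,(k123,k456)}

def v(x):
    x1, x2, x3, x4, x5, x6 = x
    w23  = C23[:, x2, :]  @ v3[x3]           # f^{2,3}(x2, v^3(x3))
    w123 = C123[:, x1, :] @ w23              # f^{1,2,3}(x1, f^{2,3}(...))
    w456 = C456[:, x4, x5, :] @ v6[x6]       # f^{4,5,6}(x4, x5, v^6(x6))
    return w123 @ CD @ w456                  # f^{1,...,6}(w123, w456)

print(v((0, 1, 2, 3, 0, 1)))
```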

Proposition 12

Let \(V=V_1\otimes \cdots \otimes V_d \subset \mathcal {H}\), with \(V_\nu \) a subspace of \(\mathcal {H}_\nu \) with dimension \(\dim (V_\nu )=n_\nu \), \(1\le \nu \le d\). The storage complexity of a tensor in \(\mathcal {T}_{r}^A(\mathcal {H}) \cap V = \mathcal {T}_r^A(V)\) is

$$\begin{aligned} \mathrm {storage}(\mathcal {T}^A_r(V)) = \sum _{\alpha \in (A\cup \{D\}){{\setminus }} \mathcal {L}(A)} r_\alpha \prod _{\beta \in S(\alpha )\cap A} r_\beta \prod _{\beta \in S(\alpha ){{\setminus }} A} n_\beta + \sum _{\alpha \in \mathcal {L}(A)} r_\alpha n_\alpha . \end{aligned}$$
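The storage formula can be transcribed directly into code; the sketch below (ours) takes the tree, the ranks and the dimensions as dictionaries, with illustrative names, and reproduces the two sums of Proposition 12.

```python
def storage(A_with_root, sons, rank, dim, leaves_A):
    """Storage complexity of T_r^A(V), following Proposition 12.

    A_with_root : active nodes plus the root D (iterable of frozensets)
    sons        : dict mapping a non-leaf node to its set of sons in T
    rank        : dict alpha -> r_alpha (with rank[D] = 1)
    dim         : dict leaf node -> n_nu = dim(V_nu)
    leaves_A    : set of active leaves L(A)
    """
    total = 0
    for alpha in A_with_root:
        if alpha in leaves_A:
            total += rank[alpha] * dim[alpha]
        else:
            size = rank[alpha]
            for beta in sons[alpha]:
                size *= rank[beta] if beta in A_with_root else dim[beta]
            total += size
    return total

# Example: TT format with d = 3 (see Example 6), ranks r_1 = r_2 = 2 and n_nu = 10.
l1, l2, l3 = frozenset({1}), frozenset({2}), frozenset({3})
n12, D = frozenset({1, 2}), frozenset({1, 2, 3})
sons = {D: [n12, l3], n12: [l1, l2]}
rank = {D: 1, n12: 2, l1: 2}
dim = {l1: 10, l2: 10, l3: 10}
print(storage([l1, n12, D], sons, rank, dim, leaves_A={l1}))   # 2*10 + 2*2*10 + 1*2*10 = 80
```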

Example 4

(Tucker format) The Tucker format corresponds to a trivial tree \(T = \{\{1,\ldots ,d\} ,\{1\},\ldots ,\{d\}\}\) with depth \(L=1\), and \(A = T{{\setminus }} \{D\}\) (see Fig. 4). A tensor v with A-rank bounded by \((r_1,\ldots ,r_d)\) admits a representation of the form

$$\begin{aligned} v(x) = \sum _{k_1=1}^{r_1}\ldots \sum _{k_d=1}^{r_d} C_{k_1,\ldots ,k_d} v^{1}_{k_1}(x_1) \ldots v^d_{k_d}(x_d), \end{aligned}$$
(4)

where \(C \in \mathbb {R}^{r_1\times \cdots \times r_d}\), and \(v^\nu _{k_\nu } \in \mathcal {H}_\nu \), \(1\le \nu \le d\), or equivalently

$$\begin{aligned} v(x) = f^{1,\ldots ,d}(v^1(x_1),\ldots ,v^d(x_d)). \end{aligned}$$
Fig. 4 Tucker format. Dimension partition tree T over \(D=\{1,\ldots ,5\}\) and subset of active nodes A (red nodes)
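As an illustration (ours) of representation (4) in the finite-dimensional setting of Example 1, the following sketch evaluates a Tucker tensor at a point by contracting the core C with one row of each factor.

```python
import numpy as np

# Tucker format (4) for d = 3 with finite sets X_nu = {0, ..., n_nu - 1}:
# v(x) = sum_{k1,k2,k3} C[k1,k2,k3] v1[x1,k1] v2[x2,k2] v3[x3,k3].
n, r = (6, 7, 8), (2, 3, 4)
rng = np.random.default_rng(1)
C = rng.standard_normal(r)
V = [rng.standard_normal((n[nu], r[nu])) for nu in range(3)]

def v(x1, x2, x3):
    return np.einsum('abc,a,b,c->', C, V[0][x1], V[1][x2], V[2][x3])

full = np.einsum('abc,ia,jb,kc->ijk', C, *V)   # the full array, for checking
assert np.isclose(v(0, 1, 2), full[0, 1, 2])
```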

Example 5

(Degenerate Tucker format) A degenerate Tucker format corresponds to a trivial tree \(T = \{\{1,\ldots ,d\} ,\{1\},\ldots ,\{d\}\}\) with depth \(L=1\), and an active set of nodes A strictly included in \( T{{\setminus }} \{D\}\). Up to a permutation of dimensions, this corresponds to \(A = \{\{1\},\ldots ,\{p\}\}\), with \(p<d\). A tensor v with A-rank bounded by \((r_1,\ldots ,r_p)\) admits a representation of the form

$$\begin{aligned} v(x) = \sum _{k_1=1}^{r_1}\ldots \sum _{k_p=1}^{r_p} C_{k_1,\ldots ,k_p}(x_{p+1},\ldots ,x_d) v^{1}_{k_1}(x_1) \ldots v^{{p}}_{k_p}(x_p), \end{aligned}$$
(5)

where \(C \in \mathbb {R}^{r_1\times \cdots \times r_p} \otimes \mathcal {H}_{\{p+1,\ldots ,d\}}\), and \(v^\nu _{k_\nu } \in \mathcal {H}_\nu \), \(1\le \nu \le p\), or equivalently

$$\begin{aligned} v(x) = f^{1,\ldots ,d}(v^1(x_1),\ldots ,v^{p}(x_p),x_{p+1},\ldots ,x_d). \end{aligned}$$

Example 6

(Tensor train format) The tensor train (TT) format corresponds to a linear tree \(T=\{\{1\},\{2\},\ldots ,\{d\},\{1,2\},\ldots ,\{1,\ldots ,d\}\}\) and \(A = \{\{1\},\{1,2\},\ldots ,\{1,\ldots ,d-1\}\}\) (see Fig. 5). Here, A is a strict subset of \(T{{\setminus }} \{D\}\). The nodes \(\{2\},\ldots ,\{d\}\) in T are not active.Footnote 3 A tensor v with A-rank bounded by \((r_1,\ldots ,r_{d-1})\) admits a representation of the form

$$\begin{aligned} v(x) = \sum _{k_1=1}^{r_1}\ldots \sum _{k_{d-1}=1}^{r_{d-1}} v^{1}_{k_1}(x_1) C^{2}_{k_1,k_2}(x_2) \ldots C^{d-1}_{k_{d-2},k_{d-1}}(x_{d-1}) C^d_{k_{d-1},1}(x_d), \end{aligned}$$

where \(v^1\in \mathbb {R}^{r_1}\otimes \mathcal {H}_1\), \(C^{\nu } \in \mathbb {R}^{r_{\nu -1} \times r_\nu } \otimes \mathcal {H}_\nu \) for \(2\le \nu \le d\), with the convention \(r_d=1\). Here \(L = d-1\), and for \(1\le \ell \le L\), \(T_{\ell } = \{\{1,\ldots ,d-\ell \},\{d-\ell +1\}\}\), \(t_\ell = \{1,\ldots ,d-\ell +1\}\), \(A_\ell = \{\{1,\ldots ,d-\ell \}\}\) and \(a_\ell = \{1,\ldots ,d-\ell \}\). The tensor v admits the equivalent representation

$$\begin{aligned} v(x) = f^{1,\ldots ,d}\left( f^{1,\ldots ,d-1}\left( \ldots f^{1,2}(v^1(x_1),x_2)\ldots ,x_{d-1}\right) ,x_d\right) . \end{aligned}$$
Fig. 5 Tensor train format. Dimension partition tree T over \(D=\{1,\ldots ,5\}\) and active nodes A (red nodes)
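As an illustration (ours) of the TT representation above in the finite-dimensional setting of Example 1, v(x) can be evaluated by a sequence of vector-matrix products with the cores \(v^1\) and \(C^\nu \).

```python
import numpy as np

# Tensor train format with d = 4, finite sets X_nu = {0, ..., n-1} and TT ranks r:
# v(x) = v1(x1)^T C2(x2) C3(x3) C4(x4), with C4(x4) a column vector (r_4 = 1).
d, n = 4, 5
r = (3, 3, 3, 1)                     # (r_1, r_2, r_3, r_4), with the convention r_d = 1
rng = np.random.default_rng(2)
v1 = rng.standard_normal((n, r[0]))                                      # v^1
C = [rng.standard_normal((n, r[nu - 1], r[nu])) for nu in range(1, d)]   # C^2, ..., C^d

def v(x):
    w = v1[x[0]]                                  # row vector of size r_1
    for nu in range(1, d):
        w = w @ C[nu - 1][x[nu]]                  # contract with C^{nu+1}(x_{nu+1})
    return w.item()                               # r_d = 1, so w is a scalar

print(v((0, 1, 2, 3)))
```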

Example 7

(Tensor train Tucker format) The tensor train Tucker (TTT) format corresponds to a linear tree \(T= \{\{1\},\ldots ,\{d\},\{1,2\},\ldots ,\{1,\ldots ,d\}\}\) and \(A = T{{\setminus }} \{D\}\) (see Fig. 6). A tensor v with A-rank bounded by \((r_1,\ldots ,r_{d},s_{2},\ldots ,s_{d-1})\) admits a representation of the form (4) with a tensor \(C \in \mathbb {R}^{r_1\times \cdots \times r_d}\) such that

$$\begin{aligned} C_{k_1,\ldots ,k_d} = \sum _{i_2=1}^{s_{2}}\ldots \sum _{i_{d-1}=1}^{s_{d-1}} C^{2}_{k_1,k_2,i_2} C^{3}_{i_2,k_3,i_3} \ldots C^{d-1}_{i_{d-2},k_{d-1},i_{d-1}} C^d_{i_{d-1},k_{d},1}, \end{aligned}$$

where \(C^2 \in \mathbb {R}^{r_1\times r_2 \times s_2}\) and \(C^{k} \in \mathbb {R}^{s_{k-1}\times r_k \times s_{k}}\) for \(3\le k \le d\), with the convention \(s_d=1\). Here \(L = d-1\), \(T_{\ell } = A_\ell = \{\{1,\ldots ,d-\ell \},\{d-\ell +1\}\}\) and \(t_\ell =a_\ell = \{1,\ldots ,d-\ell +1\}\) for \(1\le \ell \le L\). The tensor v admits the equivalent representation

$$\begin{aligned} v(x) = f^{1,\ldots ,d}\left( f^{1,\ldots ,d-1}\left( \ldots f^{1,2}(v^1(x_1),v^2(x_2))\ldots ,v^{d-1}(x_{d-1})\right) ,v^d(x_d)\right) . \end{aligned}$$
Fig. 6 Tensor train Tucker format. Dimension partition tree T over \(D=\{1,\ldots ,5\}\) and active nodes A (red nodes) (color figure online)
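To illustrate how the Tucker core C of Example 7 is assembled from the chain of three-way tensors (an illustrative sketch with small arbitrary ranks and d = 4, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(6)
r = (2, 3, 2, 2)                    # r_1, ..., r_4
s = (2, 3, 1)                       # s_2, s_3 and s_4 = 1 (convention)

C2 = rng.standard_normal((r[0], r[1], s[0]))    # C^2 in R^{r_1 x r_2 x s_2}
C3 = rng.standard_normal((s[0], r[2], s[1]))    # C^3 in R^{s_2 x r_3 x s_3}
C4 = rng.standard_normal((s[1], r[3], s[2]))    # C^4 in R^{s_3 x r_4 x 1}

# C_{k1,k2,k3,k4} = sum_{i2,i3} C2[k1,k2,i2] C3[i2,k3,i3] C4[i3,k4,0]
C = np.einsum('abi,icj,jd->abcd', C2, C3, C4[:, :, 0])
print(C.shape)                      # (2, 3, 2, 2) = r_1 x r_2 x r_3 x r_4
# plugging this C into the representation (4), together with leaf functions
# v^nu_{k_nu}, gives the TTT representation of v
```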

5 Principal component analysis for tree-based tensor format

5.1 Principal component analysis of multivariate functions

Here we introduce the notion of principal component analysis for multivariate functions. We consider a given non-empty and strict subset \(\alpha \) of D. Any tensor in \( \mathcal {H}\) is identified (through the linear isometry \(\mathcal {M}_\alpha \)) with its \(\alpha \)-matricisation in \( \mathcal {H}_\alpha \otimes \mathcal {H}_{\alpha ^c}\). A tensor u with \(\alpha \)-rank \({\mathrm {rank}}_{\alpha }(u) \in \mathbb {N}\cup \{+\infty \}\) admits a singular value decomposition (see [20, Section 4.4.3])

$$\begin{aligned} u = \sum _{k=1}^{{\mathrm {rank}}_{\alpha }(u)} \sigma _k^\alpha u^\alpha _k \otimes u^{\alpha ^c}_k, \end{aligned}$$
(6)

where \(\{u^\alpha _k\}_{k=1}^{{\mathrm {rank}}_\alpha (u)}\) and \(\{u^{\alpha ^c}_k\}_{k=1}^{{\mathrm {rank}}_\alpha (u)}\) are orthonormal vectors in \(\mathcal {H}_\alpha \) and \(\mathcal {H}_{\alpha ^c}\) respectively, and where the \(\sigma _k^\alpha \) are the \(\alpha \)-singular values of u, which are assumed to be arranged in decreasing order. The minimal subspace \(U^{min}_\alpha (u)\) of u is given by

$$\begin{aligned} U^{min}_\alpha (u) = \overline{{\mathrm {span}}\{u^\alpha _k\}_{k=1}^{{\mathrm {rank}}_\alpha (u)}}^{\Vert \cdot \Vert _{\mathcal {H}_\alpha }}. \end{aligned}$$

For \(r_\alpha < {\mathrm {rank}}_{\alpha }(u)\), the truncated singular value decomposition

$$\begin{aligned} u_{r_\alpha } = \sum _{k=1}^{r_\alpha } \sigma ^\alpha _k u^\alpha _k \otimes u^{\alpha ^c}_k, \end{aligned}$$

is such that

$$\begin{aligned} \Vert u - u_{r_\alpha } \Vert ^2 = \min _{{\mathrm {rank}}_{\alpha }(v) \le r_\alpha } \Vert u- v \Vert ^2 = \sum _{k=r_\alpha +1}^{{\mathrm {rank}}_\alpha ({u})} (\sigma _k^\alpha )^2. \end{aligned}$$

The functions \(\{u^\alpha _k\}_{k=1}^{r_\alpha }\) are the \(r_\alpha \) principal components of u associated with dimensions \(\alpha \), hereafter called \(\alpha \)-principal components. The corresponding subspace \( U_\alpha ^\star = {\mathrm {span}}\{u^\alpha _k\}_{k=1}^{r_\alpha }\), which is a subspace of \(U^{min}_{\alpha }(u)\), is hereafter called an \(\alpha \)-principal subspace of dimension \(r_\alpha \). Denoting by \(\mathcal {P}_{U_\alpha ^\star } = P_{U_\alpha ^\star }\otimes id_{\alpha ^c}\) the orthogonal projection from \(\mathcal {H}\) to \(U_\alpha ^\star \otimes \mathcal {H}_{\alpha ^c}\), we have \( u_{r_\alpha } = \mathcal {P}_{U_\alpha ^\star } u, \) and

$$\begin{aligned} \Vert u - \mathcal {P}_{U_\alpha ^\star } u \Vert = \min _{{\mathrm {rank}}_{\alpha }(v) \le r_\alpha } \Vert u- v \Vert = \min _{\dim (U_\alpha )=r_\alpha } \Vert u - \mathcal {P}_{U_\alpha } u \Vert . \end{aligned}$$
(7)

Remark 2

The optimization problem (7) over subspaces of dimension \(r_\alpha \) in \(\mathcal {H}_\alpha \) admits a unique solution \(U_\alpha ^\star \) if and only if \(\sigma _{r_\alpha }^{\alpha }>\sigma _{r_\alpha +1}^\alpha \).
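When the \(\mathcal {H}_\nu \) are finite dimensional (tensors given on product grids), the \(\alpha \)-principal components and the best approximation in (7) are obtained from an ordinary SVD of the \(\alpha \)-matricisation. A minimal numpy sketch (our own illustration, with a random tensor and \(\alpha =\{1,2\}\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n = (6, 5, 4, 3)                                    # grid sizes, H = R^6 x R^5 x R^4 x R^3
u = rng.standard_normal(n)

M = u.reshape(n[0] * n[1], n[2] * n[3])             # alpha-matricisation for alpha = {1,2}

U, s, Vt = np.linalg.svd(M, full_matrices=False)    # singular value decomposition (6)
r = 3                                               # target alpha-rank r_alpha
u_r = (U[:, :r] * s[:r]) @ Vt[:r]                   # truncated SVD, i.e. P_{U_alpha^*} u

# alpha-principal components = first r left singular vectors (columns of U);
# the squared error equals the tail sum of squared singular values, as in (7)
err2 = np.linalg.norm(M - u_r) ** 2
print(np.allclose(err2, np.sum(s[r:] ** 2)))        # True
```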

5.2 Principal component analysis for tree-based tensor format

Here, we propose and analyse an algorithm for the construction of an approximation \(u^\star \) of a function u in tree-based format \(\mathcal {T}^A_r(\mathcal {H})\). It is based on the construction of a hierarchy of subspaces \(U_\alpha \), \(\alpha \in A\), from principal component analyses of approximations of u in low-dimensional spaces in \( \mathcal {H}_\alpha \). This is a variant of the leaves-to-root higher-order singular value decomposition method proposed in [17] (see also [20, Section 11.4.2.3]).

For each leaf node \(\alpha \in \mathcal {L}(T)\), we introduce a finite-dimensional approximation space \(V_\alpha \subset \mathcal {H}_\alpha \) with dimension \(\dim (V_\alpha )=n_\alpha \), and we let \(V = \bigotimes _{\alpha \in \mathcal {L}(T)} V_\alpha \subset \mathcal {H}\). For each non active node \(\alpha \in \mathcal {L}(T){{\setminus }} A\), we let \(U_\alpha = V_\alpha \). The algorithm then goes through all active nodes of the tree, going from the leaves to the root. For each \(\alpha \in A\), we let

$$\begin{aligned} u_\alpha = \mathcal {P}_{V_\alpha } u, \end{aligned}$$

where for \(\alpha \notin \mathcal {L}(T)\), \(V_\alpha \) is defined by

$$\begin{aligned} V_\alpha = \bigotimes _{\beta \in S(\alpha )} U_\beta , \end{aligned}$$

where the \(U_\beta \), \(\beta \in S(\alpha )\), have been determined at a previous step. Then we determine the \(r_\alpha \)-dimensional \(\alpha \)-principal subspace \(U_\alpha \) of \(u_\alpha \), which is solution of

$$\begin{aligned} \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert = \min _{{\mathrm {rank}}_{\alpha }(v) \le r_\alpha } \Vert u_\alpha - v \Vert . \end{aligned}$$
(8)

Finally, we define

$$\begin{aligned} u^\star = \mathcal {P}_{V_D} u, \end{aligned}$$
(9)

where \(\mathcal {P}_{V_D}\) is the orthogonal projection from \(\mathcal {H}\) onto \(V_D = \bigotimes _{\beta \in S(D)} U_\beta .\)
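To illustrate the structure of this construction in the simplest setting, the sketch below specializes it to the trivial tree of Example 4 with \(V_\nu =\mathbb {R}^{n_\nu }\), so that \(u_\nu = u\) and the final step (9) is a multilinear projection. It is only a hedged illustration of the leaves-to-root procedure, not the paper's code; the general tree case would process the active nodes level by level in the same way.

```python
import numpy as np

def tucker_pca(u, ranks):
    """Leaves-to-root construction for the trivial tree of Example 4 (illustrative).

    For each leaf nu, U_nu is spanned by the r_nu leading left singular vectors of
    the nu-matricisation of u (here u_nu = u since V_nu = R^{n_nu}); the final
    approximation is u* = P_{V_D} u with V_D = U_1 x ... x U_d, as in (9)."""
    d = u.ndim
    bases = []
    for nu in range(d):
        M = np.moveaxis(u, nu, 0).reshape(u.shape[nu], -1)   # nu-matricisation
        Q, _, _ = np.linalg.svd(M, full_matrices=False)
        bases.append(Q[:, :ranks[nu]])                       # basis of U_nu
    u_star = u
    for nu, Q in enumerate(bases):                           # multilinear projection
        u_star = np.moveaxis(np.tensordot(Q @ Q.T, u_star, axes=([1], [nu])), 0, nu)
    return u_star, bases

u = np.random.default_rng(3).standard_normal((8, 7, 6, 5))
u_star, _ = tucker_pca(u, ranks=(3, 3, 3, 3))
print(np.linalg.norm(u - u_star) / np.linalg.norm(u))        # relative projection error
```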

5.3 Analysis of the algorithm

Lemma 1

For \(\alpha \in \mathcal {L}(A)\), \(U_\alpha \subset V_\alpha \). For \(\alpha \in A {{\setminus }} \mathcal {L}(A)\),

$$\begin{aligned} U_\alpha \subset \bigotimes _{\beta \in S(\alpha )} U_\beta . \end{aligned}$$

Proof

For \(\alpha \in A\), we have \(U_\alpha \subset U^{min}_\alpha (u_\alpha )\). If \(\alpha \in \mathcal {L}(A)\), we have \(U^{min}_\alpha (u_\alpha ) \subset V_\alpha \) since \(u_\alpha = \mathcal {P}_{V_\alpha } u\). If \(\alpha \in A {{\setminus }} \mathcal {L}(A)\), we have \(U^{min}_\alpha (u_\alpha ) \subset \bigotimes _{\beta \in S(\alpha )} U^{min}_\beta (u_\alpha )\), and \(U^{min}_\beta (u_\alpha ) \subset U_\beta \) since \(u_\alpha = \prod _{\beta \in S(\alpha )} \mathcal {P}_{U_\beta } u.\)\(\square \)

Proposition 13

The approximation \(u^\star \) is an element of \(\mathcal {T}_r^A(\mathcal {H}) \cap V = \mathcal {T}^A_r(V)\).

Proof

Since \(u^\star = \mathcal {P}_{V_D} u\), we have \(u^\star \in \bigotimes _{\alpha \in S(D)} U_\alpha \). Then using Lemma 1, we prove by induction that \(u^\star \in \bigotimes _{\alpha \in \mathcal {L}(T)} V_\alpha = V\). Also, for any \(\beta \in A\), Lemma 1 implies that \(u^\star \in U_{\beta }\otimes \mathcal {H}_{\beta ^c}\). Therefore, \(U^{min}_\beta (u^\star ) \subset U_\beta \), and \({\mathrm {rank}}_{\beta }(u^\star ) \le \dim (U_\beta )=r_\beta \). This proves that \(u^\star \in \mathcal {T}_r^A(\mathcal {H})\). \(\square \)

For any level \(\ell \), \(1 \le \ell \le L\), let \(\mathcal {P}_{T_\ell } = \prod _{\alpha \in T_\ell } \mathcal {P}_{U_\alpha }\) be the orthogonal projection from \(\mathcal {H}\) onto \(U_{T_\ell } \otimes \mathcal {H}_{t_\ell ^c}\), with \(U_{T_\ell } = \bigotimes _{\alpha \in T_\ell } U_\alpha ,\) and let

$$\begin{aligned} u^\ell = \mathcal {P}_{T_\ell } u^{\ell +1}, \end{aligned}$$

with the convention \(u^{L+1}=u.\)

Lemma 2

For all \(1\le \ell <\ell ' \le L\), we have

$$\begin{aligned} \mathcal {P}_{T_{\ell '}}\mathcal {P}_{T_\ell } = \mathcal {P}_{T_\ell } =\mathcal {P}_{T_\ell }\mathcal {P}_{T_{\ell '}} . \end{aligned}$$

Proof

For \(1\le \ell <L\), we deduce from Lemma 1 that

$$\begin{aligned} U_{T_\ell } = \bigotimes _{\alpha \in T_\ell } U_\alpha \subset \left( \bigotimes _{\beta \in T_{\ell +1}} U_\beta \right) \otimes \left( \bigotimes _{\begin{array}{c} \alpha \in T_\ell \\ S(\alpha ) = \emptyset \end{array}} U_\alpha \right) \subset U_{T_{\ell +1}} \otimes \mathcal {H}_{t_{\ell }{{\setminus }} t_{\ell +1}}, \end{aligned}$$

and then \(U_{T_\ell } \otimes \mathcal {H}_{t_\ell ^c} \subset U_{T_{\ell +1}} \otimes \mathcal {H}_{t_{\ell +1}^c}.\) Therefore, for \(1\le \ell < \ell ' \le L\), we have \(U_{T_\ell } \otimes \mathcal {H}_{t_\ell ^c} \subset U_{T_{\ell '}} \otimes \mathcal {H}_{t_{\ell '}^c},\) and the result follows from Proposition 2. \(\square \)

From Lemma 2, we have that

$$\begin{aligned} u^\ell = \mathcal {P}_{T_\ell } u^{\ell +1} = \mathcal {P}_{T_\ell } \ldots \mathcal {P}_{T_L} u = \mathcal {P}_{T_\ell } u, \end{aligned}$$

for \(1\le \ell \le L\), and

$$\begin{aligned} u^\star = \mathcal {P}_{T_1} u = u^1. \end{aligned}$$

We now state the two main results about the proposed algorithm.

Theorem 1

For a given r, the approximation \(u^\star \in \mathcal {T}_r^A(\mathcal {H})\cap V\) satisfies

$$\begin{aligned} \Vert u - u^\star \Vert ^2 \le {\#A} \min _{v\in \mathcal {T}^A_r(\mathcal {H})} \Vert u-v \Vert ^2 + \sum _{\alpha \in \mathcal {L}(T)} \Vert u - \mathcal {P}_{V_\alpha } u \Vert ^2. \end{aligned}$$

Proof

We first note that for all \(1\le \ell < \ell ' \le L\), \(u^{\ell }-u^{\ell +1}\) is orthogonal to \(u^{\ell '}-u^{\ell '+1}\). Indeed, using Lemma 2, we obtain that

$$\begin{aligned} (u^{\ell }-u^{\ell +1} , u^{\ell '} - u^{\ell '+1})&= (u^{\ell }-u^{\ell +1}, \mathcal {P}_{T_{\ell '}}u^{\ell '+1} - \mathcal {P}_{T_{\ell '+1}} u^{\ell '+1}) \\&= (\mathcal {P}_{T_{\ell '}}(u^{\ell }-u^{\ell +1}) , \mathcal {P}_{T_{\ell '+1}} (\mathcal {P}_{T_{\ell '}}u^{\ell '+1} - u^{\ell '+1}))\\&= (\mathcal {P}_{T_{\ell '+1}} \mathcal {P}_{T_{\ell '}}(u^{\ell }-u^{\ell +1}) , \mathcal {P}_{T_{\ell '}}u^{\ell '+1} - u^{\ell '+1})\\&=(\mathcal {P}_{T_{\ell '}}(u^{\ell }-u^{\ell +1}) , ( \mathcal {P}_{T_{\ell '}}-id)u^{\ell '+1}) = 0. \end{aligned}$$

Then, we have

$$\begin{aligned} \Vert u - u^\star \Vert ^2&= \sum _{\ell =1}^L \Vert u^{\ell +1} - u^{\ell } \Vert ^2 = \sum _{\ell =1}^L \Vert u^{\ell +1} - \mathcal {P}_{T_\ell } u^{\ell +1} \Vert ^2\\&\le \sum _{\ell =1}^L \sum _{\alpha \in T_\ell } \Vert u^{\ell +1} - \mathcal {P}_{U_\alpha } u^{\ell +1} \Vert ^2. \end{aligned}$$

From Lemma 2, we know that \(u^{\ell +1} = \mathcal {P}_{T_{\ell +1}} u \), where we use the convention \(\mathcal {P}_{T_{L+1}} = id\). For \(\alpha \in \mathcal {L}(T_\ell )\), since \(\mathcal {P}_{U_\alpha }\) and \(\mathcal {P}_{T_{\ell +1}}\) commute, \(\Vert u^{\ell +1} - \mathcal {P}_{U_\alpha } u^{\ell +1} \Vert = \Vert \mathcal {P}_{T_{\ell +1}} u - \mathcal {P}_{U_\alpha } \mathcal {P}_{T_{\ell +1}} u \Vert = \Vert \mathcal {P}_{T_{\ell +1}} (u- \mathcal {P}_{U_\alpha } u) \Vert \le \Vert u- \mathcal {P}_{U_\alpha } u \Vert .\) Therefore, for \(\alpha \in \mathcal {L}(T_\ell ){{\setminus }} \mathcal {L}(A)\), we have \(\Vert u^{\ell +1} - \mathcal {P}_{U_\alpha } u^{\ell +1} \Vert \le \Vert u- \mathcal {P}_{V_\alpha } u \Vert \), and for \(\alpha \in \mathcal {L}(A_\ell )\), we have \(\Vert u^{\ell +1} - \mathcal {P}_{U_\alpha } u^{\ell +1} \Vert ^2 \le \Vert u- \mathcal {P}_{V_\alpha } u \Vert ^2 + \Vert \mathcal {P}_{V_\alpha } u- \mathcal {P}_{U_\alpha } u \Vert ^2 = \Vert u- \mathcal {P}_{V_\alpha } u \Vert ^2 + \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert ^2 \).

For \(\alpha \in A_\ell {{\setminus }} \mathcal {L}(A)\), we have

$$\begin{aligned} u^{\ell +1} = \mathcal {P}_{T_{\ell +1}} u= \prod _{\delta \in T_{\ell +1} {{\setminus }} S(\alpha )} \mathcal {P}_{U_\delta } \prod _{\beta \in S(\alpha )} \mathcal {P}_{U_\beta }u = \prod _{\delta \in T_{\ell +1} {{\setminus }} S(\alpha )}\mathcal {P}_{U_\delta } u_\alpha , \end{aligned}$$

so that

$$\begin{aligned} \Vert u^{\ell +1} - \mathcal {P}_{U_\alpha } u^{\ell +1} \Vert = \Vert \prod _{\delta \in T_{\ell +1} {{\setminus }} S(\alpha )} \mathcal {P}_{U_\delta } (u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha ) \Vert \le \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert . \end{aligned}$$

Gathering the above results, we obtain

$$\begin{aligned} \Vert u - u^\star \Vert ^2 \le \sum _{\alpha \in A } \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert ^2+ \sum _{\alpha \in \mathcal {L}(T)} \Vert u - \mathcal {P}_{V_\alpha } u\Vert ^2 . \end{aligned}$$
(10)

For \(\alpha \in A\), we let \( U_\alpha ^\star \) be the subspace in \(\mathcal {H}_{\alpha } \) such that

$$\begin{aligned} \Vert u - \mathcal {P}_{U_\alpha ^\star } u \Vert = \min _{{\mathrm {rank}}_\alpha (v)\le r_\alpha } \Vert u - v \Vert \le \min _{{\mathrm {rank}}_A(v)\le r} \Vert u - v \Vert . \end{aligned}$$

For \(\alpha \in \mathcal {L}(A) \), we have \(u_\alpha = \mathcal {P}_{V_\alpha } u\). From Proposition 9, we know that \({\mathrm {rank}}_\alpha (\mathcal {P}_{V_\alpha }\mathcal {P}_{U_\alpha ^\star } u ) \le {\mathrm {rank}}_\alpha (\mathcal {P}_{U_\alpha ^\star } u)\le r_\alpha \). The optimality of \(U_\alpha \) then implies that

$$\begin{aligned} \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert \le \Vert \mathcal {P}_{V_\alpha }u - \mathcal {P}_{V_\alpha }\mathcal {P}_{U_\alpha ^\star } u \Vert \le \Vert u - \mathcal {P}_{U_\alpha ^\star } u \Vert . \end{aligned}$$

Now consider \(\alpha \in A{{\setminus }} \mathcal {L}(A)\). We know that \({\mathrm {rank}}_\alpha (\prod _{\beta \in S(\alpha )} \mathcal {P}_{U_\beta } \mathcal {P}_{U_\alpha ^\star }u) \le {\mathrm {rank}}_\alpha ( \mathcal {P}_{U_\alpha ^\star }u) \le r_\alpha \) from Proposition 9. The optimality of \(U_\alpha \) then implies that

$$\begin{aligned} \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert&\le \Vert u_\alpha - \prod _{\beta \in S(\alpha )} \mathcal {P}_{U_\beta } \mathcal {P}_{U_\alpha ^\star }u \Vert = \Vert \prod _{\beta \in S(\alpha )} \mathcal {P}_{U_\beta } (u - \mathcal {P}_{U_\alpha ^\star }u )\Vert \\&\le \Vert u - \mathcal {P}_{U_\alpha ^\star }u \Vert . \end{aligned}$$

Finally, we obtain

$$\begin{aligned} \sum _{\alpha \in A } \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert ^2 \le \sum _{\alpha \in A} \min _{{\mathrm {rank}}_\alpha (v)\le r_\alpha } \Vert u - v \Vert ^2 \le \#A \min _{{\mathrm {rank}}_A(v)\le r} \Vert u - v \Vert ^2, \end{aligned}$$

which ends the proof. \(\square \)

Theorem 2

For any \(\epsilon \ge 0\), if for all \(\alpha \in A\), the rank \(r_\alpha \) is chosen such that

$$\begin{aligned} \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert \le \frac{\epsilon }{\sqrt{\#A}} \Vert u_\alpha \Vert , \end{aligned}$$

the approximation \(u^\star \) satisfies

$$\begin{aligned} \Vert u - u^\star \Vert ^2 \le \sum _{\alpha \in \mathcal {L}(T)} \Vert u - \mathcal {P}_{V_\alpha } u \Vert ^2 + \epsilon ^2 \Vert u \Vert ^2. \end{aligned}$$

Proof

Starting from (10), we obtain

$$\begin{aligned} \Vert u - u^\star \Vert ^2&\le \sum _{\alpha \in \mathcal {L}(T)} \Vert u - \mathcal {P}_{V_\alpha } u \Vert ^2 + \sum _{\alpha \in A} \frac{\epsilon ^2}{{\#A}} \Vert u_\alpha \Vert ^2 , \end{aligned}$$

and the result follows from \(\Vert u_\alpha \Vert = \Vert \prod _{\beta \in S(\alpha )} \mathcal {P}_{U_\beta }u\Vert \le \Vert u \Vert \) if \(\alpha \in A{{\setminus }} \mathcal {L}(A)\), and \(\Vert u_\alpha \Vert = \Vert \mathcal {P}_{V_\alpha } u\Vert \le \Vert u \Vert \) if \(\alpha \in \mathcal {L}(A)\). \(\square \)

6 Empirical principal component analysis for tree-based tensor format

6.1 Empirical principal component analysis of multivariate functions

Here we present the empirical principal component analysis for the statistical estimation of \(\alpha \)-principal subspaces of a multivariate function (see Sect. 5.1). We consider that \(\mathcal {H}= L^2_{\mu }(\mathcal {X})\) or that \(\mathcal {H}\) is a separable reproducing kernel Hilbert space compactly embedded in \(L^2_{\mu }(\mathcal {X})\), equipped with the natural norm in \(L^2_{\mu }(\mathcal {X})\). Let \( (X_\alpha ,X_{\alpha ^c})\) be the random vector with values in \(\mathcal {X}_\alpha \times \mathcal {X}_{\alpha ^c}\) with probability law \( \mu _\alpha \otimes \mu _{\alpha ^c}\). The tensor u can be identified with a random variable defined on \(\mathcal {X}_{\alpha ^c}\) with values in \(\mathcal {H}_{\alpha }\) which associates to \(x_{\alpha ^c} \in \mathcal {X}_{\alpha ^c}\) the function \(u(\cdot ,x_{\alpha ^c}) = (id_\alpha \otimes \delta _{x_{\alpha ^c}}) u,\) this random variable being an element of the Bochner space \(L^2_{\mu _{\alpha ^c}}(\mathcal {X}_{\alpha ^c};\mathcal {H}_\alpha )\). Then problem (7) is equivalent to finding a \(r_\alpha \)-dimensional subspace in \(\mathcal {H}_\alpha \) solution of

$$\begin{aligned} \min _{\dim (U_\alpha ) = r_\alpha } \mathbb {E}\left( \Vert u(\cdot , X_{\alpha ^c}) - P_{U_\alpha } u(\cdot , X_{\alpha ^c}) \Vert _{\mathcal {H}_{\alpha }}^2 \right) . \end{aligned}$$
(11)

Given a set \(\{x_{\alpha ^c}^k\}_{k=1}^{m_\alpha }\) of \(m_\alpha \) samples of \(X_{\alpha ^c}\), the \(\alpha \)-principal subspace can be estimated by an empirical \(\alpha \)-principal subspace \(\widehat{U}_\alpha \), solution of

$$\begin{aligned} \Vert u - \mathcal {P}_{\widehat{U}_\alpha } u \Vert _{\alpha ,m_\alpha } = \min _{ \dim (U_\alpha ) = r_\alpha } \Vert u - \mathcal {P}_{ U_\alpha } u\Vert _{\alpha ,m_\alpha }, \end{aligned}$$
(12)

where

$$\begin{aligned} \Vert u - \mathcal {P}_{ U_\alpha } u \Vert _{\alpha ,m_\alpha }^2 = \frac{1}{m_\alpha } \sum _{k=1}^{m_\alpha } \Vert u(\cdot ,x_{\alpha ^c}^k) - P_{U_\alpha } u (\cdot ,x_{\alpha ^c}^k) \Vert _{\mathcal {H}_\alpha }^2. \end{aligned}$$

The problem is equivalent to finding the \(r_\alpha \) left principal components of \(\{u(\cdot ,x_{\alpha ^c}^k)\}_{k=1}^{m_\alpha } \), which is identified with an order-two tensor in \(\mathcal {H}_\alpha \otimes \mathbb {R}^{m_\alpha }\). We note that the number of samples \(m_\alpha \) must be such that \(m_\alpha \ge r_\alpha \) in order to estimate \(r_\alpha \) principal components. In the case of i.i.d. samples, the semi-norm \(\Vert \cdot \Vert _{\alpha ,m_\alpha }\) on \(\mathcal {H}\) is the natural statistical estimation of the Bochner norm \(\Vert \cdot \Vert _\alpha \) in \(L^2_{\mu _{\alpha ^c}}(\mathcal {X}_{\alpha ^c};\mathcal {H}_\alpha )\), defined by \(\Vert v \Vert _\alpha ^2 = \mathbb {E}(\Vert v(X_{\alpha ^c}) \Vert _{\mathcal {H}_\alpha }^2)\). This norm \(\Vert \cdot \Vert _\alpha \) coincides with the norm \(\Vert \cdot \Vert \) on \(\mathcal {H}\) when \(\mathcal {H}\) is equipped with the \(L^2_{\mu }(\mathcal {X})\)-norm.
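Assuming for illustration that \(\mathcal {H}_\alpha = \mathbb {R}^{n_\alpha }\) with the Euclidean norm, the empirical \(\alpha \)-principal subspace in (12) is spanned by the leading left singular vectors of the matrix whose columns are the sampled functions \(u(\cdot ,x_{\alpha ^c}^k)\). A short sketch (all dimensions and data are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
n_alpha, m_alpha, r_alpha = 20, 8, 3

# columns = samples u(., x_{alpha^c}^k); here they lie near a hidden 3-dimensional subspace
B = rng.standard_normal((n_alpha, r_alpha))
S = B @ rng.standard_normal((r_alpha, m_alpha)) + 1e-6 * rng.standard_normal((n_alpha, m_alpha))

U, s, _ = np.linalg.svd(S, full_matrices=False)
U_hat = U[:, :r_alpha]                      # basis of the empirical alpha-principal subspace

# empirical risk (12): mean squared residual of the samples after projection on U_hat
res = S - U_hat @ (U_hat.T @ S)
print(np.mean(np.sum(res**2, axis=0)))      # approximately 0, the samples being near rank 3
```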

For some results on the comparison between \(\Vert u - \mathcal {P}_{\widehat{U}_\alpha } u \Vert \) and the best approximation error \(\Vert u - \mathcal {P}_{ U_\alpha ^\star } u \Vert ,\) see [3, 24, 25, 41]. Under suitable assumptions on u (e.g., u uniformly bounded), for any \(\eta >0\) and \(\epsilon >0\), there exists an \(m_\alpha \) sufficiently large (depending on \(\eta ,\epsilon ,r_\alpha \) and u) such that

$$\begin{aligned} \Vert u - \mathcal {P}_{\widehat{U}_\alpha } u \Vert ^2 \le \Vert u - \mathcal {P}_{ U_\alpha ^\star } u \Vert ^2 + \epsilon ^2 \end{aligned}$$

holds with probability higher than \(1-\eta \). Then, for any \(\tau > 0\), there exists an \(m_\alpha \) sufficiently large (depending on \(\eta ,\tau ,r_\alpha \) and u) such that

$$\begin{aligned} \Vert u - \mathcal {P}_{\widehat{U}_\alpha } u \Vert ^2 \le (1+\tau ^2) \Vert u - \mathcal {P}_{ U_\alpha ^\star } u \Vert ^2 \end{aligned}$$

holds with probability higher than \(1-\eta \).

6.2 Empirical principal component analysis for tree-based format

Now we propose a modification of the algorithm proposed in Sect. 5.2 using only evaluations of the function u at some selected points in \(\mathcal {X}\). It is based on the construction of a hierarchy of subspaces \(\{U_\alpha \}_{\alpha \in A}\), from empirical principal component analysis, and a corresponding hierarchy of commuting interpolation operators associated with nested sets of points.

For each leaf node \(\alpha \in \mathcal {L}(T)\), we introduce a finite-dimensional approximation space \(V_\alpha \subset \mathcal {H}_\alpha \) with dimension \(\dim (V_\alpha )=n_\alpha \), we introduce a set \(\varGamma _{V_\alpha }\) of points in \(\mathcal {X}_\alpha \) which is unisolvent for \(V_\alpha \), we denote by \(I_{V_\alpha }\) the corresponding interpolation operator from \(\mathcal {H}_\alpha \) to \(V_\alpha \), and we let \(\mathcal {I}_{V_\alpha } = I_{V_\alpha } \otimes id_{\alpha ^c}\) be the corresponding oblique projection from \(\mathcal {H}\) to \(V_\alpha \otimes \mathcal {H}_{\alpha ^c}\). We let \(V = \bigotimes _{\alpha \in \mathcal {L}(T)} V_\alpha \subset \mathcal {H}\). For each non active \(\alpha \in \mathcal {L}(T){{\setminus }} A\), we let \(U_\alpha = V_\alpha \) and \(\varGamma _{U_\alpha } = \varGamma _{V_\alpha }\).

The algorithm then goes through all active nodes of the tree, going from the leaves to the root.

For each active node \(\alpha \in A\), we let

$$\begin{aligned} u_\alpha = \mathcal {I}_{V_\alpha } u \end{aligned}$$

where for \(\alpha \notin \mathcal {L}(A)\), the space \(V_\alpha \) is defined by

$$\begin{aligned} V_\alpha = \bigotimes _{\beta \in S(\alpha )} U_\beta , \end{aligned}$$

where the \(U_\beta \), \(\beta \in S(\alpha )\), have been determined at a previous step. For \(\alpha \notin \mathcal {L}(A)\), \(\mathcal {I}_{V_\alpha } = I_{V_\alpha } \otimes id_{\alpha ^c}\), where \(I_{V_\alpha }\) is the interpolation operator onto \(V_\alpha = \bigotimes _{\beta \in S(\alpha )} U_\beta \) associated with the product grid \(\varGamma _{V_\alpha } = \times _{\beta \in S(\alpha )} \varGamma _{U_\beta }\), where each \(\varGamma _{U_\beta }\) has been determined at a previous step. Then we determine a \(r_\alpha \)-dimensional empirical \(\alpha \)-principal subspace \(U_\alpha \) of \(u_\alpha \), which is solution of

$$\begin{aligned} \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert _{\alpha ,m_\alpha } = \min _{{\mathrm {rank}}_{\alpha }(v) \le r_\alpha } \Vert u_\alpha - v \Vert _{\alpha ,m_\alpha }, \end{aligned}$$
(13)

where

$$\begin{aligned} \Vert u_\alpha - v \Vert _{\alpha ,m_\alpha }^2 = \frac{1}{m_\alpha } \sum _{k=1}^{m_\alpha } \Vert u_{\alpha }(\cdot ,x_{\alpha ^c}^k) - v(\cdot ,x_{\alpha ^c}^k) \Vert _{\mathcal {H}_\alpha }^2, \end{aligned}$$

and where \(\{x_{\alpha ^c}^k\}_{k=1}^{m_\alpha }\) are \(m_\alpha \) random samples of \(X_{\alpha ^c}\), with \(m_\alpha \ge r_\alpha \). The problem is equivalent to finding the \(r_\alpha \) left principal components of \(\{u_\alpha (\cdot ,x_{\alpha ^c}^k)\}_{k=1}^{m_\alpha } \), which is identified with an order two tensor in \(V_\alpha \otimes \mathbb {R}^{m_\alpha }\). The number of evaluations of the function u for computing \(U_\alpha \) is \(m_\alpha \times \dim (V_\alpha )\). We let \(\{\varphi ^\alpha _k\}_{k=1}^{r_\alpha }\) be the set of principal components, such that \(U_\alpha = {\mathrm {span}}\{\varphi ^\alpha _k\}_{k=1}^{r_\alpha }\). We then construct a set of points \(\varGamma _{U_\alpha }\) which is unisolvent for \(U_\alpha \), and such that

$$\begin{aligned} \varGamma _{U_\alpha } \subset \varGamma _{V_\alpha }. \end{aligned}$$
(14)

For the practical construction of the set \(\varGamma _{U_\alpha }\), we use the procedure described in Sect. 2.2.1. We denote by \(I_{U_\alpha }\) the interpolation operator from \(\mathcal {H}_\alpha \) onto \(U_\alpha \) associated with the grid \(\varGamma _{U_\alpha }\), and we let \(\mathcal {I}_{U_\alpha } = I_{U_\alpha } \otimes id_{\alpha ^c}\) be the corresponding projection from \(\mathcal {H}\) onto \(U_\alpha \otimes \mathcal {H}_{\alpha ^c}\).
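Since Sect. 2.2.1 is not reproduced here, we only sketch a standard greedy (LU-type, "magic point") selection that produces a unisolvent subset of a candidate grid; it is an illustrative stand-in for the actual procedure, and selecting the points among \(\varGamma _{V_\alpha }\) automatically enforces the nestedness (14).

```python
import numpy as np

def select_points(Phi):
    """Greedily select r = Phi.shape[1] rows of the N x r matrix Phi = (phi_j(x_i)),
    x_i in a candidate grid, so that the selected square submatrix is invertible,
    i.e. the selected points are unisolvent for span{phi_1, ..., phi_r}.
    Illustrative stand-in for the greedy procedure of Sect. 2.2.1."""
    R = Phi.astype(float)
    selected = []
    for j in range(R.shape[1]):
        i = int(np.argmax(np.abs(R[:, j])))          # row with largest pivot in column j
        selected.append(i)
        R -= np.outer(R[:, j], R[i, :]) / R[i, j]    # eliminate column j and row i
    return selected

# toy usage: select 4 interpolation points for the polynomials 1, x, x^2, x^3
# from 50 random candidates in [-1, 1]
x = np.random.default_rng(5).uniform(-1, 1, 50)
Phi = np.vander(x, 4, increasing=True)
print(sorted(x[select_points(Phi)]))
```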

Finally, we compute

$$\begin{aligned} u^\star = \mathcal {I}_{V_D} u, \end{aligned}$$
(15)

where \(\mathcal {I}_{V_D} = \bigotimes _{\beta \in S(D)} I_{U_\beta }\) is the interpolation operator from \(\mathcal {H}\) onto \(V_D = \bigotimes _{\beta \in S(D)} U_\beta ,\) associated with the product grid \(\times _{\beta \in S(D)} \varGamma _{U_\beta }\).

6.3 Analysis of the algorithm

Let us first prove that the algorithm produces an approximation \(u^\star \) in the desired tensor format.

Lemma 3

For \(\alpha \in \mathcal {L}(T){{\setminus }} A\), \( U_\alpha = V_\alpha \). For \(\alpha \in \mathcal {L}(A)\), \(U_\alpha \subset V_\alpha \). For \(\alpha \in A {{\setminus }} \mathcal {L}(A)\),

$$\begin{aligned} U_\alpha \subset \bigotimes _{\beta \in S(\alpha )} U_\beta . \end{aligned}$$

Proof

For \(\alpha \in A\), we have \(U_\alpha \subset U^{min}_\alpha (u_\alpha )\). If \(\alpha \in \mathcal {L}(A)\), we have \(U^{min}_\alpha (u_\alpha ) \subset V_\alpha \) since \(u_\alpha = \mathcal {I}_{V_\alpha } u\). If \(\alpha \in A {{\setminus }} \mathcal {L}(A)\), we have \(U^{min}_\alpha (u_\alpha ) \subset \bigotimes _{\beta \in S(\alpha )} U^{min}_\beta (u_\alpha )\), and \(U^{min}_\beta (u_\alpha ) \subset U_\beta \) since \(u_\alpha = \prod _{\beta \in S(\alpha )} \mathcal {I}_{U_\beta } u.\)\(\square \)

Proposition 14

The algorithm produces an approximation

$$\begin{aligned} u^\star \in \mathcal {T}_r^A(\mathcal {H}) \cap V = \mathcal {T}^A_r(V). \end{aligned}$$

Proof

Since \(u^\star = \mathcal {I}_{V_D} u\), we have \(u^\star \in V_D = \bigotimes _{\alpha \in S(D)} U_\alpha \). Then using Lemma 3, we prove by induction that \(u^\star \in \bigotimes _{\alpha \in \mathcal {L}(T)} V_\alpha = V\). Also, for any \(\alpha \in A\), Lemma 3 implies that \(u^\star \in U_{\alpha } \otimes \mathcal {H}_{\alpha ^c}\). Therefore, \(U^{min}_\alpha (u^\star ) \subset U_\alpha \), and \({\mathrm {rank}}_{\alpha }(u^\star ) \le \dim (U_\alpha )=r_\alpha \). This proves that \(u^\star \in \mathcal {T}_r^A(\mathcal {H})\). \(\square \)

For all \(\alpha \in T\), the operator \(\mathcal {I}_{V_\alpha }= I_{V_\alpha } \otimes id_{\alpha ^c}\) is a projection from \(\mathcal {H}\) onto \(V_\alpha \otimes \mathcal {H}_{\alpha ^c}\) along \(W_{\alpha }^\star \otimes \mathcal {H}^*_{\alpha ^c},\) with \(W_\alpha ^\star = {\mathrm {span}}\{\delta _x : x \in \varGamma _{V_\alpha }\}\). For all \(\alpha \in T{{\setminus }} \{D\}\), the operator \(\mathcal {I}_{U_\alpha }= I_{U_\alpha } \otimes id_{\alpha ^c}\) is an oblique projection from \(\mathcal {H}\) onto \(U_\alpha \otimes \mathcal {H}_{\alpha ^c}\) along \(W_\alpha \otimes \mathcal {H}^*_{\alpha ^c},\) with \(W_\alpha = {\mathrm {span}}\{\delta _x : x \in \varGamma _{U_\alpha }\}\). From the property (14) of the grids, we deduce the following result.

Lemma 4

For \(\alpha \in \mathcal {L}(T){{\setminus }} A\), \( W_\alpha = W_{\alpha }^\star \). For \(\alpha \in \mathcal {L}(A)\), \(W_\alpha \subset W_\alpha ^\star \). For \(\alpha \in A {{\setminus }} \mathcal {L}(A)\),

$$\begin{aligned} W_\alpha \subset W_{S(\alpha )} = \bigotimes _{\beta \in S(\alpha )} W_\beta . \end{aligned}$$

Remark 3

Note that interpolation operators \(I_{U_\alpha }\), \(\alpha \in A\), could be replaced by oblique projections \(P^{W_\alpha }_{U_\alpha }\) onto \(U_\alpha \) along subspaces \(W_\alpha \) in \(\mathcal {H}_\alpha ^*\), with subspaces \(W_\alpha \) satisfying for \(\alpha \notin \mathcal {L}(T)\), \(W_\alpha \subset \bigotimes _{\beta \in S(\alpha )} W_\beta \). Under this condition, all results of this section remain valid.

For any level \(\ell \), \(1\le \ell \le L\), let

$$\begin{aligned} \mathcal {I}_{T_\ell } = \prod _{\alpha \in T_\ell } \mathcal {I}_{U_\alpha } = I_{U_{T_\ell }} \otimes id_{t_\ell ^c}, \end{aligned}$$

where \(I_{U_{T_\ell }} = \bigotimes _{\alpha \in T_\ell } I_{U_\alpha }\) is the interpolation operator from \(\mathcal {H}_{t_\ell } \) to \(U_{T_\ell } = \bigotimes _{\alpha \in T_\ell } U_\alpha \) associated with the tensor product grid \(\times _{\alpha \in T_\ell } \varGamma _{U_\alpha }\), and let

$$\begin{aligned} u^\ell = \mathcal {I}_{T_\ell }u^{\ell +1}, \end{aligned}$$

with the convention \(u^{L+1} = u.\) We then prove that operators \(\mathcal {I}_{T_\ell }\), \(1\le \ell \le L\), are commuting oblique projections.

Lemma 5

For all \(1\le \ell \le L\), the operator \(\mathcal {I}_{T_\ell }\) is an oblique projection from \(\mathcal {H}\) to \(\mathcal {U}_{\ell } := U_{T_\ell } \otimes \mathcal {H}_{t_\ell ^c}\) along \(\mathcal {W}_{\ell } := W_{T_\ell } \otimes \mathcal {H}_{t_\ell ^c}^*\). For all \(1\le \ell < \ell ' \le L\), we have \(\mathcal {U}_{\ell } \subset \mathcal {U}_{\ell '}\) and \(\mathcal {W}_{\ell }\subset \mathcal {W}_{\ell '}\), and therefore

$$\begin{aligned} \mathcal {I}_{T_\ell }\mathcal {I}_{T_{\ell '}} = \mathcal {I}_{T_\ell } = \mathcal {I}_{T_{\ell '}} \mathcal {I}_{T_\ell }. \end{aligned}$$

Proof

For \(1\le \ell < L\), we have

$$\begin{aligned} \mathcal {U}_{\ell } = \Big (\bigotimes _{\alpha \in T_\ell {{\setminus }} \mathcal {L}(T)} U_\alpha \Big ) \otimes \Big ( \bigotimes _{\alpha \in T_\ell \cap \mathcal {L}(T)} U_\alpha \Big ) \otimes \mathcal {H}_{t_\ell ^c}\; \text {and} \; \; \mathcal {U}_{{\ell +1}} = \Big (\bigotimes _{\beta \in T_{\ell +1}} U_\beta \Big ) \otimes \mathcal {H}_{t_{\ell +1}^c}. \end{aligned}$$

From Lemma 3, we know that \(\bigotimes _{\alpha \in T_\ell {{\setminus }} \mathcal {L}(T)} U_\alpha \) is a subspace of \(\bigotimes _{\beta \in T_{\ell +1}} U_\beta \subset \mathcal {H}_{t_{\ell +1}}.\) Therefore, we obtain \(\mathcal {U}_\ell \subset \mathcal {U}_{\ell +1}\). In the same way, using Lemma 4, we obtain that \(\mathcal {W}_\ell \subset \mathcal {W}_{\ell +1}\). We then deduce \(\mathcal {I}_{T_\ell } \mathcal {I}_{T_{\ell +1}} =\mathcal {I}_{T_{\ell +1}} \mathcal {I}_{T_\ell } = \mathcal {I}_{T_\ell } \) from Proposition 2, which ends the proof. \(\square \)

Lemma 6

The approximation \(u^\star \) satisfies

$$\begin{aligned} \Vert u - u^\star \Vert ^2 \le (1+\delta (L-1)) \sum _{\ell =1}^{L} \Vert u^{\ell +1} - u^\ell \Vert ^2 , \end{aligned}$$

where \(\delta = \max _{\ell } \delta _{T_\ell }\) and

$$\begin{aligned} \delta _{T_\ell } = \Vert I_{U_{T_\ell }}-P_{U_{T_\ell }} \Vert _{U^{min}_{T_\ell }(u^{\ell +1}) \rightarrow \mathcal {H}_{t_\ell }}, \end{aligned}$$

with \(t_\ell = \cup _{\alpha \in T_\ell } \alpha \). If \(u\in V\), then

$$\begin{aligned} \delta _{T_\ell }\le \delta _{A_\ell } := \Vert I_{U_{A_\ell }} - P_{U_{A_\ell }} \Vert _{U^{min}_{A_\ell }(u^{\ell +1})\rightarrow \mathcal {H}_{a_\ell }}, \end{aligned}$$

with \(a_\ell = \cup _{\alpha \in A_\ell } \alpha .\)

Proof

Since \(u - u^\star = \sum _{\ell =1}^L (u^{\ell +1} - u^{\ell })\), we have

$$\begin{aligned} \Vert u - u^\star \Vert ^2&= \sum _{\ell =1}^L \Vert u^{\ell +1} - u^{\ell } \Vert ^2 + 2\sum _{\ell ' < \ell }(u^{\ell +1} - u^{\ell },u^{\ell '+1} - u^{\ell '}). \end{aligned}$$

For \(\ell '<\ell \), since \(\mathcal {P}_{T_{\ell }}(u^{\ell '+1} - u^{\ell '}) = u^{\ell '+1} - u^{\ell '}\), we have

$$\begin{aligned} (u^{\ell +1} -u^{\ell },u^{\ell '+1} - u^{\ell '})&=(u^{\ell +1} -u^{\ell },\mathcal {P}_{T_{\ell }}(u^{\ell '+1} - u^{\ell '}))\\&= (\mathcal {P}_{T_\ell } (u^{\ell +1} - u^{\ell }) , u^{\ell '+1} - u^{\ell '})\\&= (\mathcal {P}_{T_\ell } u^{\ell +1} - \mathcal {I}_{T_\ell } u^{\ell +1} , u^{\ell '+1} - u^{\ell '})\\&= ( (\mathcal {P}_{T_{\ell }} - \mathcal {I}_{T_{\ell }})(u^{\ell +1} - u^{\ell }), u^{\ell '+1} - u^{\ell '})\\&\le \Vert (\mathcal {P}_{T_{\ell }} - \mathcal {I}_{T_{\ell }})(u^{\ell +1} - u^\ell )\Vert \Vert u^{\ell '+1} - u^{\ell '} \Vert , \end{aligned}$$

where we have used the fact that \(\mathcal {P}_{T_\ell }\mathcal {I}_{T_\ell } = \mathcal {I}_{T_\ell }\) and \((\mathcal {P}_{T_{\ell }} - \mathcal {I}_{T_{\ell }})u^{\ell }=0.\) Since \(\mathcal {P}_{T_\ell }- \mathcal {I}_{T_\ell }= (P_{U_{T_\ell }} - I_{U_{T_\ell }}) \otimes id_{t_{\ell }^c}\) and \(u^{\ell +1}-u^\ell = u^{\ell +1}-\mathcal {I}_{T_\ell } u^{\ell +1} \in U^{min}_{T_\ell }(u^{\ell +1})\otimes \mathcal {H}_{ t_\ell ^c}\), we obtain from Proposition 10 that

$$\begin{aligned} \vert (u^{\ell +1} - u^{\ell },u^{\ell '+1} - u^{\ell '}) \vert&\le \delta _{T_\ell } \Vert u^{\ell +1} - u^\ell \Vert \Vert u^{\ell '+1} - u^{\ell '} \Vert , \end{aligned}$$

for \(\ell '<\ell \). We deduce that

$$\begin{aligned} \Vert u- u^\star \Vert ^2 \le \sum _{\ell ,\ell '=1}^L B_{\ell ,\ell '} \Vert u^{\ell +1} - u^{\ell } \Vert \Vert u^{\ell '+1} - u^{\ell '} \Vert \le \rho (B) \sum _{\ell =1}^L \Vert u^{\ell +1} - u^{\ell } \Vert ^2, \end{aligned}$$

where the matrix \(B\in \mathbb {R}^{L\times L}\) is such that \(B_{\ell ,\ell }=1\) and \(B_{\ell ,\ell '} = \delta _{T_{\max \{\ell ,\ell '\}}}\) if \(\ell \ne \ell '\). Using Gerschgorin's theorem, we have that

$$\begin{aligned} \rho (B)\le 1 + \max _\ell \sum _{\ell '\ne \ell } B_{\ell ,\ell '} = 1 + \max _\ell \left( (\ell -1)\delta _{T_\ell } + \sum _{\ell '>\ell } \delta _{T_{\ell '}}\right) \le 1+ \delta (L-1), \end{aligned}$$

with \(\delta = \max _{\ell } \delta _{T_\ell }\).

Finally, when \(u\in V\), we have \(U^{min}_\alpha (u^{\ell +1}) \subset U^{min}_\alpha (u)\subset V_\alpha \) for all \(\alpha \in \mathcal {L}(T).\) Therefore, \(I_{V_\alpha }v = P_{V_\alpha }v\) for all \(v \in U^{min}_\alpha (u^{\ell +1}) \) and \(\alpha \in \mathcal {L}(T)\), and

$$\begin{aligned} \delta _{T_\ell }&= \Vert I_{U_{A_\ell }} \otimes I_{V_{T_\ell {{\setminus }} A_\ell }} - P_{U_{A_\ell }} \otimes P_{V_{T_\ell {{\setminus }} A_\ell }} \Vert _{U^{min}_{T_\ell }(u^{\ell +1})\rightarrow \mathcal {H}_{t_\ell }}\\&=\Vert (I_{U_{A_\ell }} - P_{U_{A_\ell }} ) \otimes P_{V_{T_\ell {{\setminus }} A_\ell }} \Vert _{U^{min}_{T_\ell }(u^{\ell +1})\rightarrow \mathcal {H}_{t_\ell }}\\&= \Vert I_{U_{A_\ell }} - P_{U_{A_\ell }} \Vert _{U^{min}_{A_\ell }(u^{\ell +1}) \rightarrow \mathcal {H}_{a_\ell }} \Vert P_{V_{T_\ell {{\setminus }} A_\ell }} \Vert _{U^{min}_{T_\ell {{\setminus }} A_\ell }(u^{\ell +1}) \rightarrow \mathcal {H}_{t_\ell {{\setminus }} a_\ell }}\\&\le \Vert I_{U_{A_\ell }} - P_{U_{A_\ell }} \Vert _{U^{min}_{A_\ell }(u^{\ell +1}) \rightarrow \mathcal {H}_{a_\ell }} = \delta _{A_\ell }. \end{aligned}$$

\(\square \)

Lemma 7

For \(1\le \ell \le L\),

$$\begin{aligned} \Vert u^{\ell +1} - u^\ell \Vert ^2 \le&(1+\delta _{T_\ell }^2) \Big ( \sum _{\alpha \in A_\ell } \varLambda _{T_{\ell +1}{{\setminus }} S(\alpha )}^2 (1+a_\alpha ) \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert ^2\\&+\, \sum _{\alpha \in T_\ell \cap \mathcal {L}(T)} \varLambda _{T_{\ell +1}}^2 (1+ 2 a_\alpha \delta _\alpha ^2) \Vert u - \mathcal {P}_{V_\alpha } u \Vert ^2 \Big ), \end{aligned}$$

where for \(S\subset T\),

$$\begin{aligned} \varLambda _{S}&= \prod _{\alpha \in S} \varLambda _\alpha (U_\alpha ), \quad \varLambda _\alpha (U_\alpha ) = \Vert I_{U_\alpha } \Vert _{U^{min}_\alpha (u)\rightarrow \mathcal {H}_\alpha },\nonumber \\ a_\alpha&= \mathbf {1}_{\alpha \in \mathcal {L}(A)}\mathbf {1}_{\delta _\alpha \ne 0}, \end{aligned}$$
(16)

and

$$\begin{aligned} \delta _\alpha = \Vert I_{V_\alpha } - P_{V_\alpha } \Vert _{U^{min}_\alpha (u)\rightarrow \mathcal {H}_\alpha } \end{aligned}$$
(17)

for \(\alpha \in \mathcal {L}(T)\). Moreover, if \(u\in V\), then \(\delta _\alpha = 0\) for all \(\alpha \in \mathcal {L}(T)\), and \(a_\alpha = 0\) for all \(\alpha \in T\).

Proof

For all \(1 \le \ell \le L\), we have

$$\begin{aligned} \Vert u^{\ell +1} - u^{\ell } \Vert ^2&=\Vert u^{\ell +1} - \mathcal {I}_{T_\ell } u^{\ell +1} \Vert ^2 = \Vert u^{\ell +1} - \mathcal {P}_{T_\ell } u^{\ell +1} \Vert ^2 + \Vert \mathcal {I}_{T_\ell } u^{\ell +1} - \mathcal {P}_{T_\ell } u^{\ell +1} \Vert ^2\\&=\Vert u^{\ell +1} - \mathcal {P}_{T_\ell } u^{\ell +1} \Vert ^2 + \Vert (\mathcal {I}_{T_\ell } - \mathcal {P}_{T_\ell }) (u^{\ell +1} - \mathcal {P}_{T_\ell } u^{\ell +1}) \Vert ^2\\&\le (1+\delta _{T_\ell }^2) \Vert u^{\ell +1} - \mathcal {P}_{T_\ell } u^{\ell +1} \Vert ^2 \le (1+\delta _{T_\ell }^2) \sum _{\alpha \in T_\ell } \Vert u^{\ell +1} - \mathcal {P}_{U_\alpha } u^{\ell +1} \Vert ^2. \end{aligned}$$

For \(\alpha \in T_\ell {{\setminus }} \mathcal {L}(T) = A_\ell {{\setminus }} \mathcal {L}(T)\),

$$\begin{aligned} u^{\ell +1} = \mathcal {I}_{T_{\ell +1}} u= \prod _{\delta \in T_{\ell +1} {{\setminus }} S(\alpha )} \mathcal {I}_{U_\delta } \prod _{\beta \in S(\alpha )} \mathcal {I}_{U_\beta } u = \prod _{\delta \in T_{\ell +1} {{\setminus }} S(\alpha )} \mathcal {I}_{U_\delta } u_\alpha , \end{aligned}$$

and since \(\mathcal {P}_{U_\alpha }\) and \(\prod _{\delta \in T_{\ell +1}{{\setminus }} S(\alpha )} \mathcal {I}_{U_\delta }\) commute, we have

$$\begin{aligned} \Vert u^{\ell +1} - \mathcal {P}_{U_\alpha } u^{\ell +1} \Vert&=\Vert \prod _{\delta \in T_{\ell +1} {{\setminus }} S(\alpha )} \mathcal {I}_{U_\delta } (u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha ) \Vert \le \varLambda _{T_{\ell +1} {{\setminus }} S(\alpha )} \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert . \end{aligned}$$

Now for \(\alpha \in T_\ell \cap \mathcal {L}(T)\), we have that \(\mathcal {P}_{U_\alpha }\) and \(\mathcal {I}_{T_{\ell +1}}\) commute, and therefore

$$\begin{aligned} \Vert u^{\ell +1} - \mathcal {P}_{U_\alpha } u^{\ell +1} \Vert&= \Vert \mathcal {I}_{T_{\ell +1}} (u- \mathcal {P}_{U_\alpha } u) \Vert \le \varLambda _{T_{\ell +1}} \Vert u- \mathcal {P}_{U_\alpha } u \Vert . \end{aligned}$$

If \(\alpha \in T_\ell {{\setminus }} A_\ell \), we have \(U_\alpha =V_\alpha \). If \(\alpha \in A_\ell \cap \mathcal {L}(T)\), we have

$$\begin{aligned} \Vert u- \mathcal {P}_{U_\alpha } u \Vert ^2 = \Vert u- \mathcal {P}_{U_\alpha }\mathcal {P}_{V_\alpha } u \Vert ^2&= \Vert u- \mathcal {P}_{V_\alpha } u \Vert ^2 + \Vert (id - \mathcal {P}_{U_\alpha }) \mathcal {P}_{V_\alpha } u\Vert ^2, \end{aligned}$$

so that if \(\delta _\alpha = \Vert I_{V_\alpha } - P_{V_\alpha } \Vert _{U^{min}_\alpha (u)\rightarrow \mathcal {H}_\alpha } =0\), we have \(\mathcal {P}_{V_\alpha } u = \mathcal {I}_{V_\alpha } u = u_\alpha \) and

$$\begin{aligned} \Vert u- \mathcal {P}_{U_\alpha } u \Vert ^2 \le \Vert u- \mathcal {P}_{V_\alpha } u \Vert ^2 + \Vert (id - \mathcal {P}_{U_\alpha }) u_\alpha \Vert ^2, \end{aligned}$$
(18)

and if \(\delta _\alpha \ne 0\), we have

$$\begin{aligned} \Vert u- \mathcal {P}_{U_\alpha } u \Vert ^2&\le \Vert u- \mathcal {P}_{V_\alpha } u \Vert ^2 + 2 \Vert (id - \mathcal {P}_{U_\alpha }) (\mathcal {P}_{V_\alpha } - \mathcal {I}_{V_\alpha }) u\Vert ^2 + 2 \Vert (id - \mathcal {P}_{U_\alpha }) \mathcal {I}_{V_\alpha } u\Vert ^2 \nonumber \\&= \Vert u- \mathcal {P}_{V_\alpha } u \Vert ^2 + 2 \Vert (id - \mathcal {P}_{U_\alpha }) (\mathcal {P}_{V_\alpha } - \mathcal {I}_{V_\alpha }) (u-\mathcal {P}_{V_\alpha }u)\Vert ^2 \nonumber \\&\quad +\, 2\Vert (id - \mathcal {P}_{U_\alpha }) u_\alpha \Vert ^2\nonumber \\&\le (1 + 2 \delta _\alpha ^2 )\Vert u -\mathcal {P}_{V_\alpha }u \Vert ^2 + 2\Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert ^2, \end{aligned}$$
(19)

where we have used Proposition 10. We conclude from (18) and (19) that if \(\alpha \in A_\ell \cap \mathcal {L}(T)\),

$$\begin{aligned} \Vert u- \mathcal {P}_{U_\alpha } u \Vert ^2&\le (1 + 2 a_\alpha \delta _\alpha ^2 )\Vert u -\mathcal {P}_{V_\alpha }u \Vert ^2 +(1+a_\alpha )\Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert ^2. \end{aligned}$$

Gathering the above results, we obtain

$$\begin{aligned}&\Vert u^{\ell +1} - u^\ell \Vert ^2 \le \left( 1+\delta _{T_\ell }^2\right) \Big (\sum _{\alpha \in A_\ell {{\setminus }} \mathcal {L}(T)} \varLambda _{T_{\ell +1}{{\setminus }} S(\alpha )}^2 \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert ^2 \\&\quad +\, \sum _{\alpha \in A_\ell \cap \mathcal {L}(T)} (1+a_\alpha ) \varLambda _{T_{\ell +1}}^2 \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert ^2\\&\quad +\, \sum _{\alpha \in A_\ell \cap \mathcal {L}(T)} (1 + 2 a_\alpha \delta _\alpha ^2 ) \varLambda _{T_{\ell +1}}^2 \Vert u - \mathcal {P}_{V_\alpha } u \Vert ^2 + \sum _{\alpha \in T_\ell {{\setminus }} A_\ell } \varLambda _{T_{\ell +1}}^2 \Vert u - \mathcal {P}_{V_\alpha } u \Vert ^2 \Big ) , \end{aligned}$$

which ends the proof. \(\square \)

We now state the two main results about the proposed algorithm.

Theorem 3

Assume that for all \(\alpha \in A\), the subspace \(U_\alpha \) is such that

$$\begin{aligned} \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert ^2 \le (1+ \tau ^2) \min _{{\mathrm {rank}}_{\alpha }(v) \le r_\alpha } \Vert u_\alpha - v \Vert ^2 \end{aligned}$$
(20)

holds with probability higher than \(1-\eta \), for some \(\tau \ge 1\). Then the approximation \(u^\star \in \mathcal {T}_r^A(\mathcal {H}) \cap V\) is such that

$$\begin{aligned} \Vert u - u^\star \Vert ^2 \le (1+\tau ^2) C^2 \min _{v\in \mathcal {T}_r^A(\mathcal {H})} \Vert u-v \Vert ^2 + \sum _{\alpha \in \mathcal {L}(T)} D_\alpha ^2 \Vert u - \mathcal {P}_{V_\alpha } u \Vert ^2 \end{aligned}$$
(21)

holds with probability higher than \(1-\#A \eta \), where C is defined by

$$\begin{aligned}&C^2 = (1+\delta (L-1)) \sum _{\ell =1}^{L} (1+\delta _{T_\ell }^2) \varLambda _{T_{\ell +1}}^2 \sum _{\alpha \in A_\ell } (1+a_\alpha ) \lambda _\alpha ^2, \end{aligned}$$
(22)

with

$$\begin{aligned} \lambda _\alpha = \mathbf {1}_{\alpha \notin \mathcal {L}(A)}+ \mathbf {1}_{\alpha \in \mathcal {L}(A)} \Vert I_{V_\alpha } \Vert _{U^{min}_\alpha (u) \rightarrow \mathcal {H}_\alpha } \end{aligned}$$
(23)

and \(a_\alpha \) and \(\delta _\alpha \) defined by (16) and (17) respectively, and where \(D_\alpha \) is defined by

$$\begin{aligned}&D_\alpha ^2 = (1+\delta (L-1)) (1+\delta _{T_\ell }^2) \varLambda _{T_{\ell +1}}^2 (1+2a_\alpha \delta _\alpha ^2) \end{aligned}$$
(24)

for \(\alpha \in \mathcal {L}(T) \cap T_\ell \).

Proof

For \(\alpha \in A\), let \(\widehat{U}_\alpha \) be a subspace such that

$$\begin{aligned} \Vert u_\alpha - \mathcal {P}_{\widehat{U}_\alpha } u_\alpha \Vert =\min _{{\mathrm {rank}}_{\alpha }(v) \le r_\alpha } \Vert u_\alpha - v \Vert , \end{aligned}$$

and let \( U_\alpha ^\star \subset U^{min}_\alpha (u) \) be a subspace such that

$$\begin{aligned} \Vert u - \mathcal {P}_{ U_\alpha ^\star } u \Vert = \min _{{\mathrm {rank}}_\alpha (v)\le r_\alpha } \Vert u - v \Vert \le \min _{v\in \mathcal {T}_r^A(\mathcal {H})} \Vert u - v \Vert . \end{aligned}$$

For \(\alpha \in \mathcal {L}(A)\), we have \(u_\alpha = \mathcal {I}_{V_\alpha } u\). We know that \({\mathrm {rank}}_\alpha (\mathcal {I}_{V_\alpha } \mathcal {P}_{ U_\alpha ^\star } u)\le r_\alpha \) from Proposition 9. By the optimality of \(\widehat{U}_\alpha \), we obtain

$$\begin{aligned} \Vert u_\alpha - \mathcal {P}_{\widehat{U}_\alpha } u_\alpha \Vert \le \Vert u_\alpha - \mathcal {I}_{V_\alpha } \mathcal {P}_{ U_\alpha ^\star } u \Vert \le \Vert I_{V_\alpha } \Vert _{U^{min}_\alpha (u)\rightarrow \mathcal {H}_\alpha } \Vert u - \mathcal {P}_{ U_\alpha ^\star } u \Vert . \end{aligned}$$

Now consider \(\alpha \in A {{\setminus }} \mathcal {L}(A)\). We know that \({\mathrm {rank}}_\alpha (\prod _{\beta \in S(\alpha )} \mathcal {I}_{U_\beta } \mathcal {P}_{ U_\alpha ^\star }u) \le {\mathrm {rank}}_\alpha ( \mathcal {P}_{ U_\alpha ^\star }u) \le r_\alpha \) from Proposition 9. By the optimality of \(\widehat{U}_\alpha \), we obtain

$$\begin{aligned} \Vert u_\alpha - \mathcal {P}_{\widehat{U}_\alpha } u_\alpha \Vert&\le \Vert u_\alpha - \prod _{\beta \in S(\alpha )} \mathcal {I}_{U_\beta } \mathcal {P}_{ U_\alpha ^\star }u \Vert = \Vert \prod _{\beta \in S(\alpha )} \mathcal {I}_{U_\beta } (u - \mathcal {P}_{ U_\alpha ^\star }u )\Vert \\&\le \varLambda _{S(\alpha )}\Vert u - \mathcal {P}_{ U_\alpha ^\star }u \Vert . \end{aligned}$$

Then, using Lemma 7 and assumption (20), we obtain

$$\begin{aligned} \Vert u^{\ell +1} - u^\ell \Vert ^2 \le&(1+\delta _{T_\ell }^2) \varLambda _{T_{\ell +1}}^2 \Big ( \sum _{\alpha \in A_\ell } (1+a_\alpha ) \lambda _\alpha ^2 (1+ \tau ^2) \min _{{\mathrm {rank}}_\alpha (v)\le r_\alpha } \Vert u - v \Vert ^2 \\&+\, \sum _{\alpha \in T_\ell \cap \mathcal {L}(T)} (1+ 2a_\alpha \delta _\alpha ^2) \Vert u - \mathcal {P}_{V_\alpha } u \Vert ^2 \Big ). \end{aligned}$$

Then, using Lemma 6, we obtain (21). \(\square \)

Remark 4

Assume \(u\in V\) (no discretization). Then \(\delta _\alpha = 0\) and \(\Vert I_{V_\alpha } \Vert _{U^{min}_\alpha (u)\rightarrow \mathcal {H}_\alpha }=1\) for all \(\alpha \in \mathcal {L}(T)\), \(a_\alpha = 0\) and \( \lambda _\alpha = 1\) for all \(\alpha \in T\), \(\varLambda _{T_{\ell }} = \varLambda _{A_\ell }\) and \(\delta _{T_\ell } = \delta _{A_\ell }\) for all \(\ell \). Also, the constant C defined by (22) is such that

$$\begin{aligned}&C^2 = (1+\delta (L-1)) \sum _{\ell =1}^{L} (1+\delta _{A_\ell }^2) \varLambda _{A_{\ell +1}}^2 \#A_\ell . \end{aligned}$$
(25)

Moreover, if \(U_\alpha = U^{min}_\alpha (u)\) for all \(\alpha \), then \(\varLambda _{T_\ell } = \varLambda _{A_\ell }=1\) and \(\delta _{T_\ell }=\delta _{A_\ell }=0\) for all \(\ell \), which implies

$$\begin{aligned}&C^2 = \#A . \end{aligned}$$
(26)

Theorem 4

Let \(\epsilon ,\tilde{\epsilon } \ge 0\). Assume that for all \(\alpha \in A\), the subspace \(U_\alpha \) is such that

$$\begin{aligned} \Vert u_\alpha - \mathcal {P}_{U_\alpha } u_\alpha \Vert \le \epsilon \Vert u_\alpha \Vert \end{aligned}$$
(27)

holds with probability higher than \(1- \eta \), and further assume that the subspaces \(V_\alpha \), \(\alpha \in \mathcal {L}(T)\), are such that

$$\begin{aligned} \Vert u - \mathcal {P}_{V_\alpha } u \Vert \le \tilde{\epsilon } \Vert u \Vert . \end{aligned}$$
(28)

Then the approximation \(u^\star \) is such that

$$\begin{aligned} \Vert u - u^\star \Vert ^2 \le (C^2 \epsilon ^2 + D^2 \tilde{\epsilon }^2) \Vert u \Vert ^2 \end{aligned}$$

holds with probability higher than \(1-\#A \eta \), where C is defined by (22) and where \(D^2 = \sum _{\alpha \in \mathcal {L}(T)} D_\alpha ^2\), with \(D_\alpha \) defined by (24), is such that

$$\begin{aligned} D^2 = (1+\delta (L-1)) \sum _{\ell =1}^{L} (1+\delta _{T_\ell }^2) \varLambda _{T_{\ell +1}}^2 \sum _{\alpha \in T_\ell \cap \mathcal {L}(T)} (1+2a_\alpha \delta _\alpha ^2) . \end{aligned}$$
(29)

Proof

We first note that for \(\alpha \in A {{\setminus }} \mathcal {L}(A),\) we have \( \Vert u_\alpha \Vert \le \varLambda _{S(\alpha )} \Vert u \Vert . \) Also, for \(\alpha \in \mathcal {L}(T)\), we have \( \Vert u_\alpha \Vert \le \lambda _\alpha \Vert u \Vert , \) with \(\lambda _\alpha \) defined in (23). Using Lemma 7 and assumptions (27) and (28), we then obtain

$$\begin{aligned} \Vert u^{\ell +1} - u^\ell \Vert ^2&\le (1+\delta _{T_\ell }^2) \varLambda _{T_{\ell +1}}^2 \left( \sum _{\alpha \in A_\ell } (1+a_\alpha ) \lambda _\alpha ^2 \epsilon ^2 \Vert u \Vert ^2 \right. \\&\quad \left. +\, \sum _{\alpha \in T_\ell \cap \mathcal {L}(T)} (1+2a_\alpha \delta _\alpha ^2) \tilde{\epsilon }^2 \Vert u \Vert ^2 \right) . \end{aligned}$$

Finally, we obtain the desired result by using Lemma 6. \(\square \)

Example 8

For the Tucker format described in Example 4, the constants C and D are given by

$$\begin{aligned} C^2&= (1+\delta _{T_1}^2) \sum _{\alpha \in \mathcal {L}(T)} (1+\mathbf {1}_{\delta _\alpha \ne 0}) \Vert I_{V_\alpha } \Vert _{U^{min}_\alpha (u)\rightarrow \mathcal {H}_\alpha }^2,\\ D^2&= (1+\delta _{T_1}^2)\sum _{\alpha \in \mathcal {L}(T)} (1+ 2\delta _\alpha ^2), \end{aligned}$$

with

$$\begin{aligned} \delta _{T_1}&= { \left\| \bigotimes _{\alpha \in \mathcal {L}(T)} I_{U_\alpha } - \bigotimes _{\alpha \in \mathcal {L}(T)} P_{U_\alpha } \right\| _{U^{min}_D(u)\rightarrow \mathcal {H}}} = { \left\| \bigotimes _{\alpha \in \mathcal {L}(T)} I_{U_\alpha } u- \bigotimes _{\alpha \in \mathcal {L}(T)} P_{U_\alpha } u \right\| }/{\Vert u \Vert } . \end{aligned}$$

If \(u\in V\), then

$$\begin{aligned} C = (1+ \delta _{T_1}^2)^{1/2} \sqrt{d}. \end{aligned}$$

Example 9

For the tensor train format described in Example 6, the constants C and D are given by

$$\begin{aligned} C^2&= (1+\delta (d-2)) \left( \sum _{\ell =1}^{d-2} \left( 1+\delta _{T_\ell }^2\right) \varLambda _{\{1,\ldots ,d-\ell -1\}}^2 \Vert I_{V_{d-\ell }} \Vert ^2_{U^{min}_{d-\ell }(u)\rightarrow \mathcal {H}_{d-\ell }} \right. \\&\quad \left. +\, \left( 1+\delta _{T_{d-1}}^2\right) \left( 1+\mathbf {1}_{\delta _1\ne 0}\right) \Vert I_{V_1} \Vert ^2_{U^{min}_1(u)\rightarrow \mathcal {H}_1} \right) ,\\ D^2&= (1+\delta (d-2)) \left( \sum _{\ell =1}^{d-2} (1+\delta _{T_\ell }^2) \varLambda _{\{1,\ldots ,d-\ell -1\}}^2\Vert I_{V_{d-\ell }} \Vert _{U^{min}_{d-\ell }(u)\rightarrow \mathcal {H}_{d-\ell }}^2 \right. \\&\quad \left. +\, \left( 1+\delta _{T_{d-1}}^2\right) (2+2\delta _1^2) \right) , \end{aligned}$$

with

$$\begin{aligned} \delta _{T_\ell } = \Vert I_{U_{\{1,\ldots ,d-\ell \}}}\otimes I_{V_{\{d-\ell +1\}}} -P_{U_{\{1,\ldots ,d-\ell \}}}\otimes P_{V_{\{d-\ell +1\}}} \Vert _{U^{min}_{\{1,\ldots ,d-\ell +1\}}(u^{\ell +1}) \rightarrow \mathcal {H}_{\{1,\ldots ,d-\ell +1\}}}. \end{aligned}$$

If \(u \in V\), then

$$\begin{aligned} C^2 = (1+\delta (d-2)) \left( \sum _{\ell =1}^{d-2} (1+\delta _{T_\ell }^2) \varLambda _{\{1,\ldots ,d-\ell -1\}}^2 + (1+\delta _{T_{d-1}}^2) \right) . \end{aligned}$$

Example 10

For the tensor train Tucker format described in Example 7, the constants C and D are given by

$$\begin{aligned} C^2 =\,\,&(1+\delta (d-2)) \\&\times \Big (\sum _{\ell =1}^{d-2} \left( 1+\delta _{T_\ell }^2\right) \varLambda _{\{1,\ldots ,d-\ell -1\}}^2 \varLambda _{\{d-\ell \}}^2\big (1+(1+\mathbf {1}_{\delta _{d-\ell +1}\ne 0})\Vert I_{V_{\{d-\ell +1\}}} \Vert _{U^{min}_{\{d-\ell +1\}} \rightarrow \mathcal {H}_{\{d-\ell +1\}}} \big ) \\&+\, \left( 1+\delta _{T_{d-1}}^2\right) \big (\left( 1+\mathbf {1}_{\delta _1\ne 0}\right) \Vert I_{V_1} \Vert ^2_{U^{min}_1(u)\rightarrow \mathcal {H}_1}+(1+\mathbf {1}_{\delta _2\ne 0})\Vert I_{V_2} \Vert ^2_{U^{min}_2(u)\rightarrow \mathcal {H}_2}\big ) \Big ) ,\\ D^2 =\,\,&(1+\delta (d-2)) \Big ( \sum _{\ell =1}^{d-2} \left( 1+\delta _{T_\ell }^2\right) \varLambda _{\{1,\ldots ,d-\ell -1\}}^2 \varLambda _{\{d-\ell \}}^2 (2+2\delta _{d-\ell +1}^2) \\&+\, \left( 1+\delta _{T_{d-1}}^2\right) (1+2\delta _1^2)(1+2\delta _2^2) \Big ). \end{aligned}$$

If \(u \in V\), then

$$\begin{aligned} C^2 = (1+\delta (d-2)) \left( \sum _{\ell =1}^{d-2} 2(1+\delta _{T_\ell }^2) \varLambda _{\{1,\ldots ,d-\ell -1\}}^2 \varLambda _{\{d-\ell \}}^2 + (1+\delta _{T_{d-1}}^2) \right) . \end{aligned}$$

6.4 Complexity

Here we analyse the complexity of the algorithm in terms of the number of evaluations of the function. Evaluations of the function u are required (i) for the computation of the subspaces \(\{U_\alpha \}_{\alpha \in A}\) through empirical principal component analysis of the \(V_\alpha \)-valued functions \(u_\alpha (\cdot ,X_{\alpha ^c})\), with \(V_\alpha \) a given approximation space if \(\alpha \in \mathcal {L}(A)\) or \(V_\alpha = \bigotimes _{\beta \in S(\alpha )} U_\beta \) if \(\alpha \in A{{\setminus }} \mathcal {L}(A)\), and (ii) for the computation of the final interpolation \(\mathcal {I}_{V_D} u\).

We then obtain the following result about the number of evaluations of the function required by the algorithm.

Proposition 15

The total number of evaluations of u required by the algorithm for computing an approximation \(u^\star \) in the tensor format \(\mathcal {T}_{r}^A(V)\) is

$$\begin{aligned} M(A,r,{m},n) =&\sum _{\alpha \in \mathcal {L}(A)} m_\alpha n_\alpha + \sum _{\alpha \in A {{\setminus }} \mathcal {L}(A)} m_\alpha \prod _{\beta \in S(\alpha )\cap A} r_\beta \prod _{\beta \in S(\alpha ){{\setminus }} A} n_\beta \\&+\, \prod _{\beta \in S(D)\cap A} r_\beta \prod _{\beta \in S(D){{\setminus }} A} n_\beta . \end{aligned}$$

where \(n=(n_\alpha )_{\alpha \in \mathcal {L}(T)}\), with \(n_\alpha =\dim (V_\alpha ),\) and \({m}=(m_\alpha )_{\alpha \in A}\), with \(m_\alpha \) the number of samples of the \(Z_{\alpha }\)-valued random variable \(u_\alpha (\cdot ,X_{\alpha ^c})\) used for computing \(U_\alpha \).

Proof

For \(\alpha \in A\), the function \(u_\alpha \) is an interpolation of u in \( Z_\alpha = V_\alpha \) if \(\alpha \in \mathcal {L}(A)\), or in \( Z_\alpha = \bigotimes _{\beta \in S(\alpha )} U_\beta = \left( \bigotimes _{\beta \in S(\alpha ) \cap A} U_\beta \right) \otimes \left( \bigotimes _{\beta \in S(\alpha ) {{\setminus }} A} V_\beta \right) \) if \(\alpha \notin \mathcal {L}(A).\) Therefore, computing \(u_\alpha (\cdot ,x_{\alpha ^c}^k)\) for one realization \(x^k_{\alpha ^c}\) of \(X_{\alpha ^c}\) requires \(\dim (V_\alpha ) = n_\alpha \) evaluations of u if \(\alpha \in \mathcal {L}(A)\) or \(\dim (\bigotimes _{\beta \in S(\alpha )} U_\beta ) = \prod _{\beta \in S(\alpha )\cap A} r_\beta \prod _{\beta \in S(\alpha ){{\setminus }} A} n_\beta \) if \(\alpha \notin \mathcal {L}(A)\). Finally, the computation of the interpolation \(\mathcal {I}_{T_1} u = \mathcal {I}_{S(D)} u\) requires \(\dim (\bigotimes _{\alpha \in S(D)} U_\alpha ) = \prod _{\beta \in S(D)\cap A} r_\beta \prod _{\beta \in S(D){{\setminus }} A} n_\beta \) evaluations of u. \(\square \)
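The count of Proposition 15 can be evaluated with the same kind of tree bookkeeping as the storage complexity; the sketch below uses our own illustrative encoding (a small TT-like tree over three variables), and taking \(m_\alpha = r_\alpha \) in it reproduces the storage complexity.

```python
def num_evaluations(children, A, rank, n, m, root):
    """M(A, r, m, n) from Proposition 15 (illustrative bookkeeping; `children`,
    `rank`, `n`, `m` are our own names for S(.), r, dim(V_.) and the sample sizes)."""
    def dim_Z(alpha):                                   # dim(Z_alpha)
        if not children[alpha]:                         # leaf of the tree: Z_alpha = V_alpha
            return n[alpha]
        p = 1                                           # otherwise Z_alpha = product over
        for b in children[alpha]:                       # children of U_beta (rank) or V_beta (dim)
            p *= rank[b] if b in A else n[b]
        return p
    return sum(m[a] * dim_Z(a) for a in A) + dim_Z(root)   # + final interpolation on V_D

# TT-like tree over D = {1,2,3}: A = {{1},{1,2}}, nodes {2} and {3} inactive
leaf1, leaf2, leaf3 = frozenset({1}), frozenset({2}), frozenset({3})
node12, D = frozenset({1, 2}), frozenset({1, 2, 3})
children = {D: {node12, leaf3}, node12: {leaf1, leaf2},
            leaf1: set(), leaf2: set(), leaf3: set()}
A = {leaf1, node12}
rank = {leaf1: 2, node12: 3}
n = {leaf1: 10, leaf2: 10, leaf3: 10}
print(num_evaluations(children, A, rank, n, m={a: rank[a] for a in A}, root=D))  # -> 110
```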

For computing a \(r_\alpha \)-dimensional subspace \(U_\alpha \), the number of samples \(m_\alpha \) of \(u_\alpha (\cdot ,X_{\alpha ^c})\) has to be at least \(r_\alpha \).

Corollary 4

If the number of samples \(m_\alpha = r_\alpha \) for all \(\alpha \in A\), then the number of evaluations of the function required by the algorithm is

$$\begin{aligned} M(A,r, r,n) = \mathrm {storage}(\mathcal {T}_{r}^A(V)). \end{aligned}$$

The above result states that for a prescribed rank \(r = (r_\alpha )_{\alpha \in A}\), the algorithm is able to construct an approximation of u using a number of samples equal to the storage complexity of the tensor format \(\mathcal {T}_r^A(V)\).

When using the algorithm with a prescribed tolerance \(\epsilon \), the rank \(r_\alpha \) is not fixed a priori but defined as the minimal integer such that the condition (27) is satisfied. Since the samples of \(u_\alpha (\cdot ,X_{\alpha ^c})\) belong to the subspace \(U^{min}_\alpha (u_\alpha ) \subset Z_\alpha \) with dimension \({\mathrm {rank}}_\alpha (u_\alpha ) \le \dim (Z_\alpha )\), the selected rank \(r_\alpha \) is at most \( \dim (Z_\alpha )\). Therefore, by taking \(m_\alpha = \dim (Z_\alpha )\) for all \(\alpha \in A\), if we assume that the set of \(m_\alpha \) samples of \(u_\alpha (\cdot ,X_{\alpha ^c})\) contains \({\mathrm {rank}}_{\alpha }(u_\alpha )\) linearly independent functions in \(Z_\alpha \), then the algorithm is able to produce an approximation with arbitrarily small tolerance \(\epsilon .\)

Corollary 5

If the number of samples \(m_\alpha = \dim (Z_\alpha )\) for all \(\alpha \in A\), then

$$\begin{aligned} M(A,r,{m},n) =&\sum _{\alpha \in \mathcal {L}(A)} n_\alpha ^2 + \sum _{\alpha \in A {{\setminus }} \mathcal {L}(A)} \prod _{\beta \in S(\alpha )\cap A} r_\beta ^2 \prod _{\beta \in S(\alpha ){{\setminus }} A} n_\beta ^2 \\&+\, \prod _{\beta \in S(D)\cap A} r_\beta \prod _{\beta \in S(D){{\setminus }} A} n_\beta . \end{aligned}$$

Remark 5

For numerical experiments, when working with a prescribed tolerance, we will use \(m_\alpha = \dim (Z_\alpha )\) for all \(\alpha \in A\).

7 Numerical examples

In all examples, we consider functions u in the tensor space \(L^2_\mu (\mathcal {X})\), with \(\mathcal {X}\subset \mathbb {R}^d\), equipped with the natural norm \(\Vert \cdot \Vert \) (see Example 2). For an approximation \(u^\star \) provided by the algorithm, we estimate the relative error \(\varepsilon (u^\star ) = \Vert u - u^\star \Vert /\Vert u\Vert \) using Monte-Carlo integration. We denote by M the total number of evaluations of the function u required by the algorithm to provide an approximation \(u^\star \), and by S the storage complexity of the approximation \(u^\star \). Since the algorithm uses random evaluations of the function u (for the estimation of principal components), we run the algorithm several times and indicate confidence intervals of level \(90\%\) for \(\varepsilon (u^\star )\), and also for M, S and approximation ranks when these quantities are random.

For the approximation with a prescribed A-rank, we use \(m_\alpha = \gamma r_\alpha \) samples for the estimation of principal subspaces \(U_\alpha \), \(\alpha \in A\). If \(\gamma =1\), then \(M = S\) (see Corollary 4).

For the approximation with a prescribed tolerance \(\epsilon \), we use \(m_\alpha = \dim (Z_\alpha ) \) for all \(\alpha \in A\) (see Corollary 5 for the estimation of M).

In all examples except the last one, we use polynomial approximation spaces \(V_\nu = \mathbb {P}_p(\mathcal {X}_\nu )\) over \(\mathcal {X}_\nu \subset \mathbb {R}\), \(\nu \in D\), with the same polynomial degree p in all dimensions. For each \(\nu \in D\), we use an orthonormal polynomial basis of \(V_\nu = \mathbb {P}_p(\mathcal {X}_\nu )\) (Hermite polynomials for a Gaussian measure, Legendre polynomials for a uniform measure,...), and associated interpolation grids \(\varGamma ^\star _\nu \) selected in a set of 1000 random points (drawn from the measure \(\mu _\nu \)) by using the greedy algorithm described in Sect. 2.2.1.

7.1 Henon–Heiles potential

We consider \(\mathcal {X}= \mathbb {R}^d \) equipped with the standard Gaussian measure \(\mu \) and the modified Henon–Heiles potential [29]

$$\begin{aligned} u(x_1,\ldots ,x_d) = \frac{1}{2} \sum _{i=1}^d x_i^2 + \sigma _* \sum _{i=1}^{d-1} \left( x_ix_{i+1}^2-x_i^3\right) + \frac{\sigma _*^2}{16} \sum _{i=1}^{d-1} \left( x_i^2 + x_{i+1}^2\right) ^2, \end{aligned}$$

with \(\sigma _* =0.2\). We consider approximation in the tensor train format \(\mathcal {T}_r^A(V)\) described in Example 6. The function is such that \({\mathrm {rank}}_{\alpha }(u) = 3\) for all \(\alpha \in A\). We use a polynomial degree \(p=4\), so that there is no discretization error, i.e. \(u\in V\).
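
For reference, the potential can be evaluated directly as follows (a plain NumPy transcription of the formula above; the function name is ours):

```python
import numpy as np

def henon_heiles(x, sigma=0.2):
    """Modified Henon-Heiles potential; x has shape (..., d) with d >= 2."""
    x = np.asarray(x, dtype=float)
    quadratic = 0.5 * np.sum(x ** 2, axis=-1)
    cubic = sigma * np.sum(x[..., :-1] * x[..., 1:] ** 2 - x[..., :-1] ** 3, axis=-1)
    quartic = (sigma ** 2 / 16.0) * np.sum((x[..., :-1] ** 2 + x[..., 1:] ** 2) ** 2, axis=-1)
    return quadratic + cubic + quartic

# evaluation at 10 standard Gaussian samples in dimension d = 5
samples = np.random.default_rng(0).standard_normal((10, 5))
values = henon_heiles(samples)
```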

In Table 1, we observe that the algorithm with a prescribed rank \(r=(3,\ldots ,3)\) is able to recover the function at very high precision with high probability with a number of samples equal to the storage complexity of the approximation (when \(\gamma =1\)), with no deterioration when the dimension d increases from 5 to 100. The accuracy is slightly improved when \(\gamma =100\) but with a much higher number of evaluations of the function.

Table 1 Henon–Heiles potential

7.2 Sine of a sum

We consider \(\mathcal {X}= [-1,1]^d \) equipped with the uniform measure and the function

$$\begin{aligned} u(x_1,\ldots ,x_d) = \sin (x_1 + \cdots + x_d). \end{aligned}$$

We consider approximation in the tensor train Tucker format \(\mathcal {T}_r^A(V)\) described in Example 7. The function is such that \({\mathrm {rank}}_\alpha (u) = 2\) for all \(\alpha \in A\). In Table 2, we observe the behavior of the algorithm with a prescribed rank \(r=(2,\ldots ,2)\) for different polynomial degrees p and different values of d. We observe a linear dependence of the complexity with respect to d.
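
This rank property follows from the addition formula: for any \(\alpha \subset D\),

$$\begin{aligned} \sin (x_1+\cdots +x_d) = \sin \Big (\sum _{\nu \in \alpha } x_\nu \Big ) \cos \Big (\sum _{\nu \in \alpha ^c} x_\nu \Big ) + \cos \Big (\sum _{\nu \in \alpha } x_\nu \Big ) \sin \Big (\sum _{\nu \in \alpha ^c} x_\nu \Big ), \end{aligned}$$

which is a representation with two terms of separated variables, so that \({\mathrm {rank}}_\alpha (u) \le 2\).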

Table 2 Sine of a sum

In Table 3, we observe the behavior of the algorithm with prescribed tolerance \(\epsilon = 10^{-12}\) and fixed polynomial degree \(p=17\), for different values of d. For this value of \(\epsilon \), the algorithm always provides an approximation with rank \((2,\ldots ,2)\) with a fixed number of evaluations which is about ten times the storage complexity.

Table 3 Sine of a sum

7.3 Sum of bivariate functions

We consider \(\mathcal {X}= [-1,1]^d \) equipped with the uniform measure and the function

$$\begin{aligned} u(x_1,\ldots ,x_d) = g(x_1,x_2) + g(x_3,x_4) + \cdots + g(x_{d-1},x_d) \end{aligned}$$
(30)

where g is a bivariate function, and \(d=10\). We consider approximation in the tensor train Tucker format \(\mathcal {T}_r^A(V)\) described in Example 7. The function is such that \({\mathrm {rank}}_{\{\nu \}}(u) = {\mathrm {rank}}(g)+1\) for all \(\nu \in D \), and \({\mathrm {rank}}_{\{1,\ldots ,\nu \}}(u)=2\) if \(\nu \) is even, or \({\mathrm {rank}}_{\{1,\ldots ,\nu \}}(u)={\mathrm {rank}}(g)+1\) if \(\nu \) is odd. Here, we use the algorithm with a prescribed tolerance \(\epsilon \).
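
To see where the ranks for even \(\nu \) come from, note that for \(\alpha =\{1,\ldots ,\nu \}\) with \(\nu \) even, the terms of the sum split between the two groups of variables,

$$\begin{aligned} u(x_1,\ldots ,x_d) = \big (g(x_1,x_2)+\cdots +g(x_{\nu -1},x_\nu )\big )\cdot 1 + 1\cdot \big (g(x_{\nu +1},x_{\nu +2})+\cdots +g(x_{d-1},x_d)\big ), \end{aligned}$$

which is a representation with two terms, so that \({\mathrm {rank}}_{\{1,\ldots ,\nu \}}(u)\le 2\).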

We first consider the function \(g(y,z) = \sum _{j=0}^3 y^j z^j\), whose rank is 4, and we use polynomial spaces of degree \(p=5\), so that there is no discretization error. We observe in Table 4 the behavior of the algorithm for decreasing values of \(\epsilon \). For \(\epsilon = 10^{-4}\), the algorithm always provides the solution at almost machine precision, with an exact recovery of the rank of the function u. We observe that increasing \(\gamma \) (i.e. the number of evaluations for the estimation of principal components) allows us to obtain a more accurate approximation for a given prescribed tolerance, but with a significant increase in the number of evaluations.

Table 4 Sum of bivariate functions (30) with \(g(y,z) = \sum _{j=0}^3 y^j z^j\)

We now consider the function \(g(y,z) = \exp (-\frac{1}{8}(y-z)^2)\), which has infinite rank. We observe in Tables 5 and 6 the behavior of the algorithm for decreasing values of \(\epsilon \), with a fixed polynomial degree \(p=10\) in Table 5 and an adaptive polynomial degree \(p(\epsilon ) = \log _{10}(\epsilon ^{-1})\) in Table 6. We observe that the relative error of the obtained approximation is below the prescribed tolerance with high probability. Also, we clearly see the interest of adapting the discretization to the desired precision, which yields a lower complexity for small or moderate \(\epsilon \).

Table 5 Sum of bivariate functions (30) with \(g(y,z) = \exp (-\frac{1}{8}(y-z)^2)\)
Table 6 Sum of bivariate functions (30) with \(g(y,z) = \exp (-\frac{1}{8}(y-z)^2)\)

7.4 Borehole function

We here consider the function

$$\begin{aligned} f(Y_1,\ldots ,Y_8)= \frac{2\pi Y_3(Y_4-Y_6)}{(Y_2-\log (Y_1)) \left( 1+\frac{2 Y_7 Y_3}{(Y_2-\log (Y_1))Y_1^2 Y_8}+ \frac{Y_3}{Y_5}\right) } \end{aligned}$$

which models the water flow through a borehole as a function of 8 independent random variables \(Y_1 \sim \mathcal {N}(0.1,0.0161812) \), \(Y_2 \sim \mathcal {N}(7.71, 1.0056)\), \(Y_3 \sim \mathcal {U}(63070,115600)\), \(Y_4\sim \mathcal {U}(990,1110)\), \(Y_5\sim \mathcal {U}(63.1,116)\), \(Y_6\sim \mathcal {U}(700,820)\), \(Y_7\sim \mathcal {U}(1120,1680)\), \(Y_8\sim \mathcal {U}(9855,12045)\). We then consider the function

$$\begin{aligned} u(x_1,\ldots ,x_d) = f(g_1(x_1),\ldots ,g_8(x_8)), \end{aligned}$$

where the \(g_\nu \) are functions such that \(Y_\nu = g_\nu (X_\nu )\), with \(X_\nu \sim \mathcal {N}(0,1)\) for \(\nu \in \{1,2\}\), and \(X_\nu \sim \mathcal {U}(-1,1)\) for \(\nu \in \{3,\ldots ,8\}.\) The function u is then defined on \(\mathcal {X}= \mathbb {R}^2 \times [-1,1]^{6}.\) We use polynomial approximation spaces \(V_\nu = \mathbb {P}_p(\mathcal {X}_\nu )\), \(\nu \in D\). We consider approximation in the tensor train Tucker format \(\mathcal {T}_r^A(V)\) described in Example 7.
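
The change of variables can be sketched as follows, assuming the Gaussian parameters above denote means and standard deviations and taking affine maps \(g_\nu \) (names and implementation are ours, given only as an illustration):

```python
import numpy as np

def borehole_f(Y):
    """Borehole response f(Y_1, ..., Y_8); Y has shape (..., 8)."""
    Y1, Y2, Y3, Y4, Y5, Y6, Y7, Y8 = np.moveaxis(np.asarray(Y, dtype=float), -1, 0)
    L = Y2 - np.log(Y1)
    return 2.0 * np.pi * Y3 * (Y4 - Y6) / (L * (1.0 + 2.0 * Y7 * Y3 / (L * Y1 ** 2 * Y8) + Y3 / Y5))

def g_map(X):
    """Map X in R^2 x [-1, 1]^6 (standard Gaussian / uniform) to Y = (g_1(X_1), ..., g_8(X_8)).
    Affine maps; Gaussian parameters are taken as (mean, standard deviation)."""
    X = np.asarray(X, dtype=float)
    Y = np.empty_like(X)
    Y[..., 0] = 0.10 + 0.0161812 * X[..., 0]
    Y[..., 1] = 7.71 + 1.0056 * X[..., 1]
    bounds = [(63070, 115600), (990, 1110), (63.1, 116),
              (700, 820), (1120, 1680), (9855, 12045)]
    for k, (a, b) in enumerate(bounds):
        Y[..., k + 2] = a + 0.5 * (b - a) * (X[..., k + 2] + 1.0)
    return Y

def u(X):
    """u(x_1, ..., x_8) = f(g_1(x_1), ..., g_8(x_8))."""
    return borehole_f(g_map(X))
```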

In Table 7, we observe the behavior of the algorithm with prescribed ranks \((r,\ldots ,r)\) and fixed degree \(p=10\). We observe a very fast convergence of the approximation with the rank. Increasing \(\gamma \) (i.e. the number of evaluations for the estimation of principal components) allows us to improve the accuracy for a given rank, but if we look at the error as a function of the complexity M, \(\gamma =1\) is much better than \(\gamma =100\).

Table 7 Borehole function

In Table 8, we observe the behavior of the algorithm for decreasing values of \(\epsilon \), and for an adaptive polynomial degree \(p(\epsilon ) = \log _{10}(\epsilon ^{-1})\). We observe that for all \(\epsilon ,\) the relative error of the obtained approximation is below \(\epsilon \) with high probability. We note that the required number of evaluations M is about 2 to 4 times the storage complexity.

Table 8 Borehole function
Table 9 Tensorization of \(f(t) = t^2\), \(d=40\)
Table 10 Tensorization of \(f(t) = t^{1/2}\), \(d=40\)

7.5 Tensorization of a univariate function

We consider the approximation of the univariate function \(f : [0,1] \rightarrow \mathbb {R}\) using tensorization of functions [26, 40]. We denote by \( f_N\) the piecewise constant approximation of f on a uniform partition \(0 = t_0 \le t_1 \le \cdots \le t_{N} = 1\) with \(N=2^d\) elements, such that \(f_N(i h) = f(ih)\) for \(0\le i \le N\) and \(h=N^{-1}=2^{-d}.\) We denote by \(v\in \mathbb {R}^N\) the vector with components \(v(i) = f(ih)\), \(0\le i \le N-1\). The vector \(v \in \mathbb {R}^{2^d}\) can be identified with an order-d tensor \(u \in \mathcal {H} = \mathbb {R}^2\otimes \cdots \otimes \mathbb {R}^2\) such that

$$\begin{aligned} u(i_1,\ldots ,i_d) = v(i), \quad i=\sum _{k=1}^{d} i_{k} 2^{d-k}, \end{aligned}$$

where \((i_{1},\ldots ,i_{d})\in \{0,1\}^d = \mathcal {X}\) is the binary representation of the integer \(i\in \{0,\ldots ,2^d-1\}\). The set \(\mathcal {X}\) is equipped with the uniform measure \(\mu \). Then we consider approximation of the tensor u in tensor train format. The algorithm evaluates the tensor u at some selected entries \((i_1,\ldots ,i_d)\), which corresponds to evaluating the function f at some particular points \(t_i\).
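As an illustration, evaluating a single entry of u amounts to reassembling the integer i from its binary digits and evaluating f at \(t = ih\) (a small sketch, with names of our own choosing):

```python
def tensor_entry(f, indices):
    """Entry u(i_1, ..., i_d) of the tensorization of f : [0, 1] -> R,
    where indices = (i_1, ..., i_d) in {0, 1}^d and i = sum_k i_k 2^(d-k)."""
    d = len(indices)
    i = 0
    for bit in indices:          # Horner evaluation of the binary representation
        i = 2 * i + bit
    return f(i * 2.0 ** (-d))    # f(i h) with h = 2^(-d)

# example with d = 40: the multi-index (1, 0, ..., 0) corresponds to t = 1/2
print(tensor_entry(lambda t: t ** 2, [1] + [0] * 39))   # 0.25
```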

In this finite-dimensional setting, we consider \(V = \mathcal {H}.\) In all examples, we consider \(d=40,\) and \(N=2^d \approx 10^{12}\). This corresponds to a storage complexity of about \(10^{12}\) entries (several terabytes in double precision) for the standard representation of \(f_N\) as a vector v of size N.

We observe in Tables 9 and 10 the behavior of the algorithm with prescribed tolerance \(\epsilon \) applied to the functions \(f(t)=t^2\) and \(f(t)=t^{1/2}\) respectively. We indicate relative errors in \(\ell ^2\) and \(\ell ^\infty \) norms between the tensor u and the approximation \(u^\star \). Let us recall that for \(f(t)=t^\alpha \), the approximation error satisfies \(\Vert f - f_N \Vert _{L^\infty } = O(N^{-\beta }) = O(2^{-d\beta })\) with \(\beta = \min \{1,\alpha \}\), which is an exponential convergence with respect to d. For the function \(f(t) = t^2\), we observe that the relative error in \(\ell ^2\) norm is below the prescribed tolerance with high probability. For the function \(f(t) = t^{1/2}\), the probability of obtaining a relative error in \(\ell ^2\) norm below the prescribed tolerance decreases as \(\epsilon \) decreases, but the ratio between the true relative error and the prescribed tolerance remains relatively small (below 100). We note that for \(f(t)=t^2\), the approximation ranks are bounded by 3, which is the effective rank of \(f_N\). For \(f(t)=t^{1/2}\), the approximation ranks slowly increase with \(\epsilon ^{-1}\).

In both cases, we observe a very good behavior of the algorithm, which requires a number of evaluations that scales as \(\log (\epsilon ^{-1})\).