1 Introduction and Brief Summary of Main Results

Recursive Analysis provides a rigorous algorithmic foundation for Numerics, that is, for operations on continuous data by means of approximations [18]. It has a long history of thorough investigation of ordinary (ODEs) and partial differential equations (PDEs) with respect to computability and real complexity. For example, it has been established that (a) ODEs with a polynomial/analytic right-hand side can be solved in (appropriately parameterized) sequential polynomial time [3]; that (b) general non-linear ODEs are optimally solved by Euler’s Method using an amount of memory polynomial in the output precision parameter n [9], equivalently: in polynomial parallel time [2]; and that (c) solving Poisson’s linear PDE corresponds to the complexity class \({\#\mathsf P} \) [11].

The main definitions and notions of real complexity theory (the framework in which we work) are summarized in Sect. 2. The discrete complexity hierarchy [17, Corollary 2.34] translates to the real setting (see Definition 4) as

$$\begin{aligned} {\mathbb {R}\mathsf {NC}} ^1 \;\subseteq \; {\mathbb {R}\mathsf {SPACE}} (\log n) \;\subseteq \; {\mathbb {R}\mathsf {NC}} ^2 \;\subseteq \; {\mathbb {R}\mathsf {PTIMESPACE}} (\log ^2 n) \;\subseteq \; {\mathbb {R}\mathsf {NC}} ^4, \\ \text {and more generally } {\mathbb {R}\mathsf {NC}} ^i \;\subseteq \; {\mathbb {R}\mathsf {PTIMESPACE}} (\log ^i n) \;\subseteq \; {\mathbb {R}\mathsf {NC}} ^{2i} \;\subseteq \; \cdots \;\subseteq \\ {\mathbb {R}\mathsf {PTIME}} \;\subseteq \; {\mathbb {R}{\#\mathsf P}} \;\subseteq \; {\mathbb {R}{\#\mathsf P}} ^{\#\mathsf P} \;\subseteq \; {\mathbb {R}\mathsf {PSPACE}} \;=\; {\mathbb {R}\mathsf {PAR}} \;\subseteq \; {\mathbb {R}\mathsf {EXP}} . \end{aligned}$$
(1)

The present work continues the above complexity investigations for linear ODEs and PDEs with the following contributions (see rigorous formulations later in this section and ideas of proofs in Sect. 4):

(a) Theorem 1 establishes that linear systems of ODEs can be solved in polylogarithmic parallel time (= depth), specifically in \({\mathbb {R}\mathsf {NC}} ^2\), the real counterpart of \(\mathsf{NC} ^2\).

(b) Linear analytic PDEs can also be solved in polylogarithmic parallel time according to Theorem 3, generalizing the case of analytic ODEs.

(c) By Theorem 2, a large class of continuously differentiable linear PDEs with boundary conditions, classically treated with numerical difference schemes, can be solved with computational complexity \({\mathbb {R}\mathsf {PSPACE}} \);

(d) For many cases with periodic boundary conditions this can be improved to \({\mathbb {R}{\#\mathsf P}} ^{{\#\mathsf P}}\) (probably even \({\mathbb {R}{\#\mathsf P}} \), which will be a subject of future investigation).

In all these cases, the output consists of (approximations up to error \(2^{-n}\) to) the real value u(t) or u(t, x) of the solution at given time t and space point x. Our results are obtained by applying matrix/operator exponentials, known but rarely used in classical Numerics. These in turn rely on efficient recursive algorithms for powering polynomials, matrices and operators, developed in this paper (see Sect. 3), which may be of independent interest:

(e) Given \(A\in [-1;1]^{{\text {poly}}(n)\times {\text {poly}}(n)}\) with uniformly bounded powers \(\Vert A^k\Vert \le 1\), its power \(A^{{\text {poly}}(n)}\) can be computed in polylogarithmic depth, i.e. in \({\mathbb {R}\mathsf {NC}} \); see Proposition 6(c). This result is used to prove Theorem 1.

(f) If the entries of \(A\in [-1;1]^{2^n\times 2^n}\) are computable in polynomial time, then the entries of \(A^{2^n}\) are computable in \({\mathbb {R}\mathsf {PSPACE}} \), and this is optimal in general; see Proposition 6(g). This result is used to prove Theorem 2 in the general case of a difference scheme approximating the considered system of PDEs.

(g) For circulant matrices A of constant bandwidth with polynomial-time computable entries and uniformly bounded powers this can be improved: the entries of the matrix power \(A^{2^n}\) are computable in \({\mathbb {R}{\#\mathsf P}} \); see Theorem 7, where powering of polynomials is also treated as an important auxiliary tool. These results are used to prove Theorem 2 in the particular case of a circulant difference scheme.

(h) Theorem 8 and Example 9 establish the complexity of powering and exponentiating (e.g. differential) operators in Banach spaces. These results are used to prove Theorem 3.

Note that the hypothesis on uniformly bounded powers of the matrices in (e) and (g) makes the problems trivial over integers, yet interesting and new in the real case under consideration. The matrices corresponding to convergent difference schemes for PDEs meet this hypothesis because of the stability property required for a difference scheme to be convergent. Our work can be regarded as instances and confirmation of [1].

When translating classical discrete parallel/depth complexity to the real setting, we restrict to \(2^\ell \)-Lipschitz functions f. A family \(C_n\) of Boolean circuits is required to approximate \(y=f(x)\) up to error \(2^{-n}\) for every \(x\in {\text {dom}}(f)\), the latter given by approximations up to error \(2^{-\ell -n}\). Space complexity of f is formalized as the number of working tape cells used by a Turing machine, counting neither input nor output tape, to similarly approximate \(y=f(x)\) up to error \(2^{-n}\); see Sect. 2 for details. We thus follow the general paradigm of measuring computational cost (depth = parallel time, sequential time, memory) over the reals in dependence on n, which, in contrast to the discrete setting with its finite inputs, roughly reflects the number of correct bits of the output approximation attained in the worst case over all (uncountably many) continuous arguments. ‘Binary’ error bounds \(2^{-n}\), rather than ‘unary’ 1/n, capture roughly n correct binary digits and yield stronger statements.

We consider Cauchy (i.e. initial-value) problems for autonomous linear evolutionary differential equations in a general form

$$\begin{aligned} \tfrac{\partial }{\partial t} \vec u(t) \;=\; A\vec u(t), \quad t\in [0;1], \quad \vec u(0)=\vec \varphi . \end{aligned}$$
(2)

Here A is a matrix in the case of ODEs and a more general operator (involving partial derivatives in the case of PDEs) otherwise. Similarly, \(\vec {\varphi }\) is a vector of initial real values for ODEs and an initial function in the more general setting. Let us now formally state our main contributions regarding differential equations:

Theorem 1

Given \(A\in [-1;1]^{d\times d}\) and \(\vec v\in [-1;1]^d\) and \(t\in [0;1]\), the solution

$$\begin{aligned} \vec u(t)=\exp (tA)\vec v:=\sum _k \frac{t^k}{k!} A^k\vec v\end{aligned}$$
(3)

to the system of linear ordinary differential equations (2) is computable by Boolean circuits of depth \(\mathcal {O}\big ((\log d+\log n)^2\big )\) and size polynomial in \(d+n\).

Recall that circuit depth is synonymous with parallel time. Theorem 1 thus formally captures the intuition that solving ODEs in the linear case is easier than in the analytic case [3], and much easier than in the general \(C^1\) smooth case, proven “\(\mathsf{PSPACE} \)-complete” [9]. It is no loss of generality to impose unit bounds on \(A,\vec v,t\): the general case is covered by rescaling.

Our next result concerns finitely-often differentiable solutions to Eq. (2) with \(A=\sum \limits _{|\alpha |\le k}b_{\alpha }(x)D^{\alpha }_x\) and \(\vec {\varphi }=\vec {\varphi }(x)\). Such equations can be reduced [4], by adding extra variables, to first-order linear systems of PDEs with

$$\begin{aligned} A=\sum \limits _{j=1}^mB_j(x)\frac{\partial }{\partial x_j},\end{aligned}$$
(4)

where \(B_j(x)\) are matrices of a suitable dimension.

Theorem 2

For \(m\in \mathbb {N}\) and a convex open bounded \(\varOmega \subseteq \mathbb {R}^m\) consider the initial-value problem (IVP) (2) with the operator (4), and the boundary-value problem (BVP) with additionally given linear boundary conditions \({\mathcal L}{u}(t,x)|_{\partial \varOmega }=0\), \((t,x)\in [0,1]\times \partial {\varOmega }\). Suppose the given IVP and BVP are well posed in that the classical solution \(\vec u:[0;1]\times \overline{\varOmega }\rightarrow \mathbb {R}\) (i) exists, (ii) is unique, and (iii) depends continuously on the data \(\varphi (x)\) and \(B_i(x)\) (for the BVP, also on the coefficients of \({\mathcal L}\)). More precisely, we assume that \(u(t,x)\in C^2\) and that its \(C^2\)-norm is bounded linearly by the \(C^2\)-norms of the data (in function spaces guaranteeing all the required properties). Moreover, suppose that the given IVP and BVP admit a difference scheme \((A_n)\) that is (iv) stable and (v) approximating with at least first order of accuracy, in the sense of Definition 10.

(a) If the difference scheme (meaning its matrix) \(A_n\) and the initial condition \(\varphi \) are (vi) computable in depth \(s(n)\ge \log (n)\) in the sense of Definition 5, then evaluating the solution \(u:[0;1]\times \overline{\varOmega }\ni (t,x)\mapsto u(t,x)\) of either IVP or BVP is feasible in depth \(\mathcal {O}\big (s(2^n)+n\cdot \log n\big )\).

(b) If \(A_n\) and \(\varphi \) are (vi’) computable in polynomial sequential time and \(A_n\) is additionally circulant of constant bandwidth, then the solution function u belongs to the real complexity class \({\mathbb {R}{\#\mathsf P}} ^{\#\mathsf P} \).

This second result establishes, for the considered PDEs, polynomial parallel time (equivalently: polynomial space or depth) complexity in the binary output precision parameter n, and brings it further down to the second level of the Counting Hierarchy. As a main tool we modify the classical difference scheme approach: standard step-by-step iteration would yield only exponential sequential time; we replace it with efficient matrix powering according to Proposition 6 and Theorem 7 below. This complements work like [14], which measures bit-cost in dependence on \(N=2^{\mathcal {O}(n)}\), the size of the grid under consideration, and implicitly supposes the difference scheme and initial data computable in logarithmic depth \(s(n)=\mathcal {O}(\log n)\); we instead consider the output precision parameter n and allow for more involved difference schemes with, say, \(s(n)=\mathcal {O}(\log ^2 n)\). Theorem 2 also complements rigorous cost analyses considering approximations up to output error 1/n, or up to fixed error and in dependence on the length of the (algebraic) input [15, 16]; and it vastly generalizes previous work on the computational complexity of Poisson’s PDE [11].

Between linear ODEs and finitely-often continuously differentiable PDEs lie analytic PDEs, captured by the Kovalevskaya Theorem; their computational complexity also turns out to lie between that of the aforementioned two:

Theorem 3 (Polynomial-Time/Polylogarithmic-Space Kovalevskaya)

Let \(f_1,\ldots ,f_e:[-1;1]^{d}\rightarrow \mathbb {R}^{e\times e}\) and \(v:[-1;1]^{d}\rightarrow \mathbb {R}^{e}\) denote real functions analytic on some open neighborhood of \([-1;1]^{d}\) and consider the system of linear partial differential equations

$$\begin{aligned} \partial _t \vec u(\vec x,t) \;=\; f_1(\vec x)\,\partial _1\vec u\;+\cdots +\;f_e(\vec x)\,\partial _e\vec u, \qquad \vec u(\vec x,0)\equiv v . \end{aligned}$$
(5)

If \(f_1,\ldots ,f_e\) are computable in sequential polynomial time, then the unique real analytic local solution \(\vec u:[-\varepsilon ;+\varepsilon ]^{d+1}\ni (\vec x,t)\mapsto \vec u(\vec x,t)\in \mathbb {R}^e\) to Eq. (5) is again computable in sequential polynomial time.

If \(f_1,\ldots ,f_e\) are computable in polylogarithmic depth, then so is \(\vec u\).

We emphasize that the constructive proof of Kovalevskaya’s Theorem [4, §4.6.3] expresses the solution’s j-th coefficient as a multivariate integer polynomial \(p_j\) in the coefficients of the initial condition and of the right-hand side; however, as \(p_j\)’s total degree and number of variables grow with j, its number of terms explodes exponentially. Symbolic-numerical approaches employ Janet, Riquier or Gröbner bases, whose worst-case complexity however is also exponential [13].

2 Real Complexity Theory

We consider the computational worst-case cost of computing continuous real functions on a compact domain, formalized as follows:

Definition 4

Equip \(\mathbb {R}^d\) with the maximum norm and fix a \(2^\ell \)-Lipschitz function \(f:\varOmega \subseteq [-2^k;2^k]^d\rightarrow [-2^m;2^m]\), \(k,\ell ,m\in \mathbb {N}\).

(a) Consider a Turing machine \(\mathcal {M}\) with read-only input tape, one-way output tape, and working tape(s). \(\mathcal {M}\) is said to compute f if, given \(\texttt {0} ^{n}\texttt {1} {\text {bin}}(\vec a)\in \{\texttt {0} ,\texttt {1} \}^{n+1+d\cdot \mathcal {O}(k+\ell +n)}\) for \(\vec a\in \{-2^{k+\ell +n},\ldots ,0,\ldots +2^{k+\ell +n}\}^d\) with \(|\vec x-\vec a/2^{\ell +n}|\le 2^{-\ell -n}\) for some \(\vec x\in \varOmega \), \(\mathcal {M}\) outputs \({\text {bin}}(b)\) for some \(b\in \{-2^{m+n},\ldots ,0,\ldots ,2^{m+n}\}\) with \(|f(\vec x)-b/2^n|\le 2^{-n}\) and stops. Here \({\text {bin}}(\vec a)\in \{\texttt {0} ,\texttt {1} \}^*\) denotes some binary encoding of integer vectors.

(b) Fix \(s,t:\mathbb {N}\rightarrow \mathbb {N}\) with \(t(n)\ge n\) and \(s(n)\ge \log _2(n)\). The computation from (a) runs in time t and space s if \(\mathcal {M}\) stops after at most \(t\big (n+m+d\cdot \mathcal {O}(n+k+\ell )\big )\) steps and uses at most \(s\big (n+m+d\cdot \mathcal {O}(n+k+\ell )\big )\) cells on its work tape, counting neither input nor output tape usage, regardless of \(\vec a\) as above. In this case write \(f\in {\mathbb {R}\mathsf TIME} (t)\cap {\mathbb {R}\mathsf {SPACE}} (s)\). Polynomial time is abbreviated \({\mathbb {R}\mathsf {PTIME}} =\bigcup _{i} {\mathbb {R}\mathsf TIME} \big (\mathcal {O}(n^i)\big )\), polynomial space means \({\mathbb {R}\mathsf {PSPACE}} =\bigcup _{i} {\mathbb {R}\mathsf {SPACE}} \big (\mathcal {O}(n^i)\big )\), and \({\mathbb {R}\mathsf {PTIMESPACE}} (\log ^i n):={\mathbb {R}\mathsf {PTIME}} \cap {\mathbb {R}\mathsf {SPACE}} (\log ^i n)\).

(c) Consider Boolean circuits \(C_n\) having \(\mathcal {O}(n+m)\) binary outputs and \(d\cdot \mathcal {O}(k+\ell +n)\) binary inputs. Such a sequence \((C_n)\) computes f if \(C_n\), on every (possibly padded) input \({\text {bin}}(\vec a)\) with \(|\vec x-\vec a/2^{n+\ell }|\le 2^{-n-\ell }\) for some \(\vec x\in \varOmega \), outputs some \({\text {bin}}(b)\) such that \(|f(\vec x)-b/2^n|\le 2^{-n}\).

(d) We say that f is computable in depth t, written \(f\in {\mathbb {R}\mathsf {DEPTH}} (t)\), if there exists a sequence \((C_n)\) of Boolean circuits over the basis of binary NAND, say, of depth at most \(t\big (n+m+d\cdot \mathcal {O}(n+k+\ell )\big )\) computing f. \({\mathbb {R}\mathsf {NC}} ^i\) abbreviates \({\mathbb {R}\mathsf {DEPTH}} \big (\mathcal {O}(\log ^i n)\big )\) with the additional requirements that (i) the circuits be logspace uniform and (ii) their size (number of gates) grow at most polynomially in \(n+m+d\cdot \mathcal {O}(n+k+\ell )\). Similarly, \({\mathbb {R}\mathsf {PAR}} \) abbreviates \(\bigcup _{i} {\mathbb {R}\mathsf {DEPTH}} \big (\mathcal {O}(n^i)\big )\) with the additional requirement that the (possibly exponentially large) circuits \(C_n\) be polynomial-time uniform, in the sense that a polynomial-time Turing machine can, given \(n\in \mathbb {N}\) in unary and \(I<J\in \mathbb {N}\) in binary with respect to some fixed topological order, report whether in \(C_n\) the output of gate #I is connected to gate #J.

(e) We say f belongs to \({\mathbb {R}{\#\mathsf P}} \) if it can be computed by a Turing machine \(\mathcal {M}^\varphi \) in polynomial time as in (b), given oracle access to some counting problem \(\varphi \) in the discrete complexity class \(\#\mathsf P\). Similarly for \({\mathbb {R}{\#\mathsf P}} ^{\#\mathsf P} \).

We follow the classical conception of real numbers as ‘streams’ of approximations, both for input and output [6]: the alternative approach based on oracles [8, 12] involves a stack of query tapes to ensure closure under composition.
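To make Definition 4(a) concrete, here is a minimal sketch (Python standing in for a Turing machine; the function name and the two guard bits of extra input precision are our illustrative assumptions, not part of the definition verbatim) computing the 1-Lipschitz function \(f(x)=x^2/2\) on \([-1;1]\) to guaranteed error \(2^{-n}\), using exact integer arithmetic only:

```python
def approx_half_square(a: int, n: int) -> int:
    """Sketch of a Definition 4(a)-style computation for f(x) = x^2/2
    on [-1;1] (a 1-Lipschitz function).

    Input : integer a with |x - a/2^(n+2)| <= 2^-(n+2)  (two guard bits).
    Output: integer b with |f(x) - b/2^n|  <= 2^-n.
    Only exact integer arithmetic is used, as on a Turing machine tape.
    """
    # f(a/2^(n+2)) = a^2 / 2^(2n+5); round it to the output grid 2^-n.
    # Propagated input error <= 2^-(n+2), rounding error <= 2^-(n+1),
    # so the total error stays below 2^-n.
    return (a * a + (1 << (n + 4))) >> (n + 5)

# x = 0.75, n = 10: take a = round(0.75 * 2**12) = 3072.
print(approx_half_square(3072, 10) / 2**10, 0.75**2 / 2)   # 0.28125 0.28125
```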

Common numerical difference schemes are matrices \(A_m\) whose dimension \(D_m\) grows exponentially with the precision parameter m and which therefore cannot be output entirely within polynomial time; instead, Definition 5 requires any desired entry \((A_m)_{I,J}\) to admit efficient approximations, where the indices I, J are given in binary so that their length remains polynomial in m:

Definition 5

(a) Computing a vector \(\vec v\in \mathbb {R}^D\) means to output, given \(J\in \{0,\ldots ,D-1\}\) in binary and \(n\in \mathbb {N}\) in unary, some \(a\in \mathbb {Z}\) in binary such that \(|v_J-a/2^n|\le 2^{-n}\); similarly for matrices \(A\in \mathbb {R}^{D\times D}\), considered as vectors in \(\mathbb {R}^{\mathcal {O}(D\cdot D)}\) via the pairing function \((I,J)\mapsto (I+J)\cdot (I+J+1)/2+J\).

(b) For t(n, m) monotonically non-decreasing in both arguments and \(s(n,m)\le t(n,m)\), a sequence \(\vec v_m\in \mathbb {R}^{D_m}\) of vectors is computed in sequential time t and space s if its entries have binary length at most polynomial in its dimension, \(\sup _J|v_{m,J}|\le 2^{{\text {poly}}(D_m)}\), and it takes a Turing machine at most t(n, m) steps and s(n, m) tape cells to output, given \(J\in \{0,\ldots ,D_m-1\}\) in binary and \(n,m\in \mathbb {N}\) in unary, some \(a\in \mathbb {Z}\) in binary with \(|v_{m,J}-a/2^n|\le 2^{-n}\); similarly for sequences of matrices \(A_m\in \mathbb {R}^{D_m\times D_m}\).

(c) Polynomial sequential time/space and poly-/logarithmic space mean polynomial and poly-/logarithmic in \(n+m\), respectively. \({\mathbb {R}{\#\mathsf P}} \) for (sequences of) vectors consists of those \(\vec v_m\in \mathbb {R}^{D_m}\) computable in polynomial time by a Turing machine with a \({\#\mathsf P} \) oracle; similarly for (sequences of) matrices.

(d) A sequence \(\vec v_m\in \mathbb {R}^{D_m}\) of vectors is computed in depth s(n, m) if its entries have binary length at most polynomial in \(D_m\), \(\sup _J|v_{m,J}|\le 2^{{\text {poly}}(D_m)}\), and a family of Boolean circuits \(C_{n,m}\) of depth s(n, m) can output, given \(J\in \{0,\ldots ,D_m-1\}\) in binary, some \(a\in \mathbb {Z}\) in binary with \(|v_{m,J}-a/2^n|\le 2^{-n}\).

(e) Polylogarithmic depth means Boolean circuits \(C_{n,m}\) in (d) of depth polylogarithmic in \(n+m\); \({\mathbb {R}\mathsf {NC}} ^i\) abbreviates \({\mathbb {R}\mathsf {DEPTH}} \big (\mathcal {O}(\log ^i n)\big )\) with the additional requirements (i) and (ii) as in Definition 4(d); similarly for \({\mathbb {R}\mathsf {PAR}} \).

Note that already reading J in (b) takes time of order \(\log D_m\le t(n,m)\) and space of order \({\text {loglog}}D_m\le s(n,m)\). Similarly, a circuit of depth s(n, m) can access at most \(2^{s(n,m)}\) input gates, so reading all of J requires \(2^{s(n,m)}\ge \log D_m\).
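As an illustration of Definition 5(b), here is a sketch of an entry oracle for a hypothetical circulant difference-scheme matrix (the heat-equation stencil and the mesh ratio \(r=1/4\) are assumed toy values): the \(2^m\times 2^m\) matrix is never materialized, only queried entrywise.

```python
from fractions import Fraction

def heat_scheme_entry(I: int, J: int, m: int, n: int) -> int:
    """Entry oracle in the spirit of Definition 5(b) for a hypothetical
    explicit heat-equation scheme on a periodic grid of size D_m = 2^m:
    a circulant matrix with stencil (r, 1-2r, r), mesh ratio r = 1/4.

    Returns an integer a with |(A_m)_{I,J} - a/2^n| <= 2^-n; the
    2^m-by-2^m matrix itself is never written down.
    """
    D, r = 1 << m, Fraction(1, 4)
    diff = (J - I) % D                         # circulant: depends on J-I mod D
    entry = {0: 1 - 2 * r, 1: r, D - 1: r}.get(diff, Fraction(0))
    return round(entry * 2**n)                 # exact rational -> dyadic rounding

print(heat_scheme_entry(5, 6, m=10, n=8) / 2**8)   # 0.25, an off-diagonal entry
```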

3 Efficient Real Polynomial/Matrix/Operator Powering

A major ingredient in our contributions are new efficient algorithms for real polynomial/matrix/operator powering, analyzed in dependence on various parameters that do not occur (or make no sense) in the classical discrete setting:

Proposition 6

(a) For \(D,K\in \mathbb {N}\) and \(A,A',B,B'\in [-2^L;2^L]^{D\times D}\), it holds that \(|A^K|_\infty \le 2^{KL}\cdot D^{K-1}\) and \(|A\cdot B-A'\cdot B'|_\infty \le D\cdot 2^{L+1}\cdot \max \big \{|A-A'|_\infty , |B-B'|_\infty \big \}\), where \(|B|_\infty :=\max _{I,J} |B_{I,J}|\).

(b) For \(D,K\in \mathbb {N}\), matrix powering \([-2^L;2^L]^{D\times D}\ni A\mapsto A^K\) is computable by circuits \(C_n\) of depth \(\mathcal {O}\big (\log (K)\cdot (\log n+\log D+\log K+\log L)\big )\) and size polynomial in \(n+D+K+L\).

(c) Refining (b): if \(|A^k|_\infty \le 2^L\) holds for all \(k\le K\), the depth of circuits computing \(\mathbb {R}^{D\times D}\ni A\mapsto A^K\) can be reduced to \(\mathcal {O}\big ((\log n+\log D+\pmb {{\text {loglog}}} K+\log L)\cdot \log K\big )\).

(d) Suppose sequences \(\vec u_m,\vec v_m\in \big [-2^{L_m};+2^{L_m}\big ]^{D_m}\) are computable in depth \(s(n,m)\ge \log (n)+\log (m)\) and have inner product \(\vec u_m^{\bot }\cdot \vec v_m\in \big [-2^{L_m};+2^{L_m}\big ]\). Then said inner product is computable in depth \(\mathcal {O}\big (\log (D_m)+s(n+L_m+\log D_m,\,m)\big )\).

(e) Suppose sequences \(\vec u_m,\vec v_m\) are computable in polynomial sequential time. Then their inner product \(\vec u_m^{\bot }\cdot \vec v_m\in \mathbb {R}\) is in \({\mathbb {R}{\#\mathsf P}} \); and this is optimal in general.

(f) Suppose \(A_m\in \mathbb {R}^{D_m\times D_m}\) are computable in depth \(s(n,m)\ge \log (n)+\log (m)\) and satisfy \(A_m^{k}\in \big [-2^{L_m};2^{L_m}\big ]^{D_m\times D_m}\) for all \(k\le K_m\). Then the powers \(A_m^{K_m}\in \mathbb {R}^{D_m\times D_m}\) are computable in depth

$$\begin{aligned} \mathcal {O}\Big (\log (K_m)\cdot \big (\log n +\log L_m+\log D_m+{\text {loglog}}K_m\big ) \;+\; s\big (n+(L_m+\log D_m)\cdot \log K_m,\,m\big )\Big ). \end{aligned}$$
(g) If \(A_m\) is computable in polynomial time, and the same holds for \(\mathbb {N}\ni K_m<2^{{\text {poly}}(m)}\), then the powers \(A_m^{K_m}\) are computable in \({\mathbb {R}\mathsf {PSPACE}} \); and this is optimal in general.

Proof

(Sketch). Claims (b), (c) and (f) are based on repeated squaring, each matrix multiplication consisting of \(D^2\) inner products, realized as prefix sums (carry look-ahead) together with the known logarithmic-depth integer multiplication; similarly for (d). Regarding (g), encode \(\mathsf{PSPACE}\)-complete reachability as matrix powering.    \(\square \)
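For concreteness, here is a minimal sequential sketch of the repeated-squaring scheme underlying (b) and (c); floats stand in for exact dyadic arithmetic, and the guard-bit budget is an illustrative assumption justified by the error amplification bound of item (a).

```python
import numpy as np

def power_by_squaring(A: np.ndarray, K: int, guard_bits: int) -> np.ndarray:
    """Sketch of the repeated-squaring scheme behind Proposition 6(b):
    O(log K) matrix multiplications, truncating every intermediate entry
    to `guard_bits` fractional bits.  By Proposition 6(a) each product
    amplifies entry errors by a factor <= D * 2^(L+1), so guard_bits of
    order n + log(K)*(L + log D) keeps the final error below 2^-n.
    (Floats stand in for exact dyadic arithmetic in this sketch.)
    """
    def trunc(M: np.ndarray) -> np.ndarray:
        return np.round(M * 2.0**guard_bits) / 2.0**guard_bits

    result, base = np.eye(A.shape[0]), trunc(A)
    while K:                            # binary expansion of the exponent
        if K & 1:
            result = trunc(result @ base)
        base, K = trunc(base @ base), K >> 1
    return result

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # A^4 = I, so all powers stay bounded
print(power_by_squaring(A, 1001, 40))     # equals A itself (1001 = 4*250 + 1)
```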

Theorem 7

For a univariate polynomial \(p=\sum _{j=0}^d p_jX^j\in \mathbb {R}[X]\), let \(|p|:=\sum _j |p_j|\) denote its norm. Fix \(d\in \mathbb {N}\).

(a) Given \(a,b\in \mathbb {R}\) with \(|a|+|b|\le 1\) and \(J\le K\in \mathbb {N}\), \(\left( {\begin{array}{c}K\\ J\end{array}}\right) \cdot a^J\cdot b^{K-J}\) can be approximated up to error \(2^{-n}\) in time polynomial in n and the binary length of K.

(b) Given \(a_1,\ldots ,a_d\in \mathbb {R}\) with \(\sum _j|a_j|\le 1\) and given \(J_1,\ldots ,J_d,K\in \mathbb {N}\) with \(\sum _j J_j=K\), \(\left( {\begin{array}{c}K\\ J_1,\ldots ,J_d\end{array}}\right) \cdot a_1^{J_1}\cdots a_d^{J_d}\) can be approximated up to error \(2^{-n}\) in time polynomial in n and the binary length of K. Here \(\left( {\begin{array}{c}K\\ J_1,\ldots ,J_d\end{array}}\right) =\frac{K!}{J_1!\cdots J_d!}\) denotes the multinomial coefficient.

(c) For \(p=\sum _{j=0}^d p_jX^j\in \mathbb {R}[X]\) with \(|p|\le 1\) and polynomial-time computable coefficients, the coefficient vector

$$\begin{aligned} \sum \limits _{\begin{array}{c} J_0+J_1+\cdots +J_d=2^n \\ J_1+2J_2+\cdots +dJ_d=J \end{array}} p_0^{J_0}\cdots p_d^{J_d}\cdot \left( {\begin{array}{c}2^n\\ J_0 ,\ldots ,J_d\end{array}}\right) , \qquad J\le d\cdot 2^n \end{aligned}$$
(6)

of \(p^{2^n}\) belongs to \({\mathbb {R}{\#\mathsf P}} \).

(d) Let \(C_m\in \mathbb {R}^{2^m\times 2^m}\) denote a circulant matrix of bandwidth d with polynomial-time computable entries \(c_{-d},\ldots ,c_{+d}\in \mathbb {R}\) such that \(|c_{-d}|+\cdots +|c_{+d}|\le 1\). Then the matrix power \(C_m^{2^m}\) belongs to \({\mathbb {R}{\#\mathsf P}} \).

We omit the proof of Theorem 7 due to space constraints. Item (d) is proved by means of Items (a)–(c); Items (a) and (c) use the Gaussian distribution as approximation. Note that, again, Items (c) and (d) are trivial over integers; considering the real setting makes them meaningful and crucial for the cost analysis of difference schemes.
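The reduction behind Item (d) rests on the classical identification of circulant matrices with polynomials modulo \(X^D-1\): powering \(C_m\) amounts to powering its symbol \(p(X)=\sum _j c_jX^j\) modulo \(X^{2^m}-1\). The following numpy sketch (with assumed toy sizes and stencil) verifies this identification numerically; the actual \({\mathbb {R}{\#\mathsf P}} \) algorithm instead approximates the multinomial sums (6) directly.

```python
import numpy as np

D, K = 16, 32                            # toy sizes (in the theorem: D = K = 2^m)
c = {-1: 0.25, 0: 0.5, 1: 0.25}          # stencil; |c_-1| + |c_0| + |c_1| = 1

C = np.zeros((D, D))                     # circulant matrix C[i,j] = c[(j-i) mod D]
for off, v in c.items():
    for i in range(D):
        C[i, (i + off) % D] = v

p = np.zeros(D)                          # symbol p(X) = sum_j c_j X^j mod X^D - 1
for off, v in c.items():
    p[off % D] += v

def mulmod(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Multiply polynomials of degree < D modulo X^D - 1 (cyclic convolution)."""
    full = np.convolve(a, b)
    res = full[:D].copy()
    res[:len(full) - D] += full[D:]      # fold the overflow back in, X^D == 1
    return res

q = np.zeros(D); q[0] = 1.0              # q = 1
k = K
while k:                                 # repeated squaring of the symbol
    if k & 1:
        q = mulmod(q, p)
    p, k = mulmod(p, p), k >> 1

# The first row of C^K is exactly the coefficient vector of p^K mod X^D - 1:
print(np.allclose(np.linalg.matrix_power(C, K)[0], q))   # True
```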

Our next tool concerns efficient operator powering. It generalizes Proposition 6(b) to compact subsets of an infinite-dimensional vector space.

Theorem 8

Fix a Banach space \(\mathcal {B}\) with norm \(\Vert \,\cdot \,\Vert \) and a linear map \(A:\mathcal {B}\rightarrow \mathcal {B}\), and fix an increasing sequence \(\mathcal {V}_{d}\subseteq \mathcal {V}_{d+1}\subseteq \mathcal {B}\) of non-empty compact convex symmetric subsets such that

(i) \(A^K:\mathcal {V}_{d}\rightarrow \mathcal {V}_{{\text {poly}}(d+K)}\) is well-defined for all \(d,K\in \mathbb {N}\) and

(ii) satisfies \(\Vert A^Kv\Vert \le \mathcal {O}(1)^d\cdot d^K\cdot K!\) for all \(v\in \mathcal {V}_d\),

(iii) and is computable in sequential time polynomial in \(n+d+K\)

(iii’) or is computable in polylogarithmic depth \({\text {poly}}(\log n+\log d+\log K)\).

Then

$$\begin{aligned} u(t) \;:=\; \exp (tA)v \;=\; \sum \nolimits _K t^K\cdot A^K v/K! \;\in \;\mathcal {B}\end{aligned}$$

is well-defined for all \(\vec v\in \mathcal {V}_d\) and all \(|t|<1/d\), and satisfies \(u_t = A u\). Moreover \(\mathcal {V}_d\times [0;1/2d]\ni (v,t)\mapsto u(t)\) is computable in sequential time polynomial in \(n+d\) under hypothesis (iii), respectively in depth \({\text {poly}}(\log n+\log d)\) under (iii’).

Proof

(Sketch). Under hypothesis (ii), the series \(u(t)=\sum \nolimits _K t^K\cdot A^K v/K!\) permits differentiation under the sum for \(|t|\le 1/2d\) and hence solves \(u_t = A u\). Moreover the first n terms of the series approximate u up to error \(2^{-n}\).    \(\square \)

Notice that the naïve hypothesis (i’) \(A:\mathcal {V}_d\rightarrow \mathcal {V}_{2d}\) would only imply \(A^K:\mathcal {V}_{d}\rightarrow \mathcal {V}_{2^Kd}\), blowing up exponentially in K. The stronger hypothesis (i), as well as (ii) to (iii’), is for instance satisfied for \(\mathcal {V}_d:=[-2^d;2^d]^D\) by every \(A\in [-2^d;2^d]^{D\times D}\); recall Proposition 6(a). A more involved case revolves around univariate analytic functions.

Example 9

Consider the space \(\mathcal {B}\) of complex-valued functions \(v:[-1;1]\rightarrow \mathbb {C}\) infinitely often continuously differentiable on the real interval. Write \(v^{(j)}\) for its j-th iterated derivative, and abbreviate \(|v|_\infty :=\sup _x |v(x)|\). For each \(d\in \mathbb {N}\) let

$$ \mathcal {V}_{d} \;=\; \big \{ v:[-1;1]\rightarrow \mathbb {C}, \;\; \forall j\in \mathbb {N}\; |v^{(j)}|_\infty \le 2^d\cdot j!\cdot d^j \big \}, \quad \Vert v\Vert := \sum \nolimits _j |v^{(j)}|_\infty /(j!)^2 $$

Represent each \(v\in \mathcal {V}_d\) by the \((2d+1)\)-tuple of its germs around the positions 1, \((d-1)/d, \ldots ,(-d+1)/d, -1\in [-1;1]\); and represent each germ by its power series coefficient sequence.

(i) \(\partial ^K:\mathcal {V}_d\ni v\mapsto v^{(K)}\in \mathcal {V}_{3d+\lceil K\log d\rceil +\lceil K\log K\rceil }\) is well-defined,

(ii) satisfies \(\Vert \partial ^Kv\Vert \le (2e^2)^d\cdot d^K\cdot K!\) for all \(v\in \mathcal {V}_d\),

(iii) \(\partial ^K\big |_{\mathcal {V}_d}\) is computable in sequential time polynomial in \(n+d+K\)

(iii’) and in polylogarithmic depth \({\text {poly}}(\log n+\log d+\log K)\).

(iv) \(\mathcal {V}_d\) is compact.

Proof

(Sketch). For (i) and (iii) see the proof of Theorem 16(d) in [7, §3.2]. The underlying algorithm basically shifts and scales the coefficient sequences to symbolically take the derivative of the power series, which is easy to parallelize [19]. It also needs to add new germs/points of expansion in \([-1;1]\), which again can be performed in parallel, thus establishing (iii’).    \(\square \)
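A minimal sketch of the shift-and-scale step on a single germ (the finite truncation and the function name are our simplifying assumptions; the actual algorithm works with precision-controlled infinite coefficient sequences and also re-expands around new points):

```python
from math import factorial, prod

def derivative_germ(coeffs: list, K: int = 1) -> list:
    """Sketch of the shift-and-scale step from Example 9: if a germ is
    v = sum_j a_j (x-x0)^j, then v^(K) = sum_j a_{j+K} (j+K)!/j! (x-x0)^j.
    Each output coefficient depends on a single input coefficient, which
    is what makes the step easy to parallelize, cf. (iii')."""
    return [prod(range(j + 1, j + K + 1)) * coeffs[j + K]
            for j in range(len(coeffs) - K)]

germ = [1 / factorial(j) for j in range(10)]     # exp(x) around x0 = 0
print(derivative_germ(germ, K=2)[:4])            # exp again: [1.0, 1.0, 0.5, 0.166...]
```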

4 Complexity of Differential Equations

In this section we apply the results on matrix/operator powering from the previous section to prove Theorems 1–3.

Proof

(Theorem 1, sketch). Recall from Proposition 6(a) that \(A^{k}\) has entries bounded by \(d^{k-1}\). Hence the tail \(\sum _{k>K} \frac{t^k}{k!} A^k\vec v\) is bounded, according to Stirling’s formula, by

$$ \sum \nolimits _{k>K} 1/{k!} \cdot 2^{k\cdot \log d} \;\le \; \sum \nolimits _{k>K} \mathcal {O}\big (2^{-k\cdot (\log k-\log d)}\big ) \;\le \; \sum \nolimits _{k>K} 2^{-k} \;=\; 2^{-K} $$

for \(K\ge 2d\). Thus we can calculate the first \(K:=\max \{2d,n\}\) terms of the power series in Eq. (3) simultaneously within depth \(\mathcal {O}\big (\log (\max \{n,2d\})\cdot (\log n+\log d+\log \log \max \{n,2d\})\big )=\mathcal {O}\big ((\log n+\log d)^2\big )\) by Proposition 6(c), and then add them, incurring additional depth of the same magnitude; this completes the proof of the theorem.    \(\square \)
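The truncation argument translates into the following sequential Python sketch (the parallel version of Theorem 1 evaluates the K terms via Proposition 6(c) and a balanced addition tree instead of this loop):

```python
import numpy as np

def exp_tA_v(A: np.ndarray, v: np.ndarray, t: float, n: int) -> np.ndarray:
    """Sketch of the algorithm behind Theorem 1: truncate the series (3)
    after K = max(2d, n) terms; by the tail estimate above the remainder
    is at most 2^-K <= 2^-n (for |t| <= 1 and entries of A in [-1;1])."""
    K = max(2 * A.shape[0], n)
    term = v.astype(float)
    total = term.copy()
    for k in range(1, K + 1):
        term = (t / k) * (A @ term)      # t^k/k! A^k v from the previous term
        total = total + term
    return total

A = np.array([[0.0, 1.0], [-1.0, 0.0]])  # exp(tA) is a rotation matrix
v = np.array([1.0, 0.0])
print(exp_tA_v(A, v, 1.0, 30), (np.cos(1.0), -np.sin(1.0)))   # these agree
```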

Proof

(Theorem 3, Sketch). Example 9 generalizes to functions of several variables [10]. The statements of the theorem thus follow from Theorem 8 with \(A:=f_1\partial _1+\cdots +f_e\partial _e\).   \(\square \)

Before proceeding to the proof of Theorem 2, let us briefly recall basic definitions and facts about difference schemes. Theorem 2 implicitly assumes the domain \(\varOmega \) to be “good enough”; w.l.o.g. we consider uniform grids \(G_h\) on \(\bar{\varOmega }\) and \(G_h^{\tau }\) on \([0,1]\times \bar{\varOmega }\) for BVPs (resp. on the intersection of the domain of existence and uniqueness with \([0,1]\times \bar{\varOmega }\) for IVPs). Here h and \(\tau \) are the space and time steps, respectively. Unlike in Numerics, where they are usually chosen heuristically, we compute them from the output precision as \(h=C_h/2^n\), \(\tau =C_{\tau }/2^n\), giving precise estimates on \(C_h\), \(C_{\tau }\). That is why, in Definition 10 below, we denote the matrix of the difference scheme by \(A_n\) (depending on n).

Definition 10

(a) For the IVP or BVP for a system of PDEs considered in Theorem 2, an explicit difference scheme is a system of algebraic equations

$$\begin{aligned} u^{(h,(l+1)\tau )}=A_nu^{(h,l\tau )},\quad u^{(h,0)}=\varphi ^{(h)}.\end{aligned}$$
(7)

Here \(A_n\) is a difference operator, in our case linear, i.e. a matrix of dimension \(\mathcal {O}(2^n)\) (for BVPs it also incorporates the boundary conditions); \(u^{(h, l\tau )}\) and \(\varphi ^{(h)}\) are grid functions, i.e. vectors of dimension \(\mathcal {O}(2^n)\), approximating the corresponding continuously differentiable functions.

(b) The scheme (7) is said to approximate the given differential problem with order of accuracy p (a positive integer) on a solution u(t, x) of the considered problem if \(\Bigl |(u_t-A{ u})|_{G_h^{\tau }}-L_h {u}^{(h)}\Bigr |\le C_1h^p\) and \(\Bigl |\varphi |_{G_h}-\varphi ^{(h)}\Bigr | \le C_1h^p\) for some constant \(C_1\) not depending on h and \(\tau \). Here \(u^{(h)}=\{u^{(h,l\tau )}\}_{l=1}^M\), \(L_h\) is the linear difference operator corresponding to (7) rewritten in the form \(L_hu^{(h)}=0\), and M is the number of time steps.

(c) The difference scheme (7) is called stable if its solution \(u^{(h)}\) satisfies \(|u^{(h)}|\le c_2|\varphi ^{(h)}|\) for a constant \(c_2\) independent of h, \(\tau \) and \(\varphi ^{(h)}\). Here \(|\cdot |\) is the \(\sup \)-norm, i.e. the maximal absolute value over all grid cells.

(d) We call the complexity of the difference scheme (7) the complexity of the corresponding matrix \(A_n\) in the sense of Definition 5(b).

Fact 11

(Well-known facts; see e.g. [5]): (a) Let the difference scheme be stable and approximate the differential problem (2) on the solution u with order p. Then the solution \(u^{(h)}\) of the recursively defined linear algebraic systems (7) converges uniformly to the solution u in the sense that

$$\begin{aligned} \bigl |u|_{G^\tau _h}- u^{(h)}\bigr |\le Ch^p\end{aligned}$$
(8)

for C not depending on h and \(\tau \) (but possibly depending on the inputs; this dependence will occur later in the proofs).

(b) The difference scheme (7) is stable iff there is a constant \(C_2\) uniformly bounding all powers of \(A_n\):

$$\begin{aligned} |A_n^q|\le C_2,\ q=1,2,\ldots ,M.\end{aligned}$$
(9)

The stability property implies \(\tau \le \nu h\), where \(\nu \) is called the Courant number.

(c) The convergence constant in (8) is \(C=C_1\cdot C_2\), where \(C_2\) is from (9) and \(C_1\) is from the approximation property.
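Fact 11(b) can be observed numerically. The following sketch (an assumed toy instance: the explicit heat-equation stencil \((r,1-2r,r)\) on a periodic grid, with mesh ratio \(r=\tau /h^2\)) shows the powers of the scheme matrix staying uniformly bounded for \(r\le 1/2\) but blowing up for \(r>1/2\):

```python
import numpy as np

def scheme_matrix(D: int, r: float) -> np.ndarray:
    """Circulant matrix of the explicit heat scheme u^(l+1) = A u^(l)
    with stencil (r, 1-2r, r) on a periodic grid of size D."""
    A = np.zeros((D, D))
    for i in range(D):
        A[i, i] = 1 - 2 * r
        A[i, (i - 1) % D] = A[i, (i + 1) % D] = r
    return A

for r in (0.4, 0.6):                     # stable vs. unstable mesh ratio
    A, P = scheme_matrix(64, r), np.eye(64)
    norms = []
    for _ in range(200):
        P = P @ A                        # powers A^1, ..., A^200
        norms.append(np.abs(P).max())
    print(r, max(norms))                 # r = 0.4: stays <= 1;  r = 0.6: blows up
```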

Proof (Theorem 2, sketch). To evaluate the solution u at a given point (t, x) with the prescribed precision \(2^{-n}\) and to estimate the bit-cost of the computation, consider the following computation steps.

1. Choose binary-rational grid steps \(h=2^{-N}\) (where \(N=\mathcal {O}(n)\)) and \(\tau \le \nu h\) (where \(\nu \) is the Courant number, existing due to the stability property): \(\tau \) is any binary rational meeting this inequality, while h is determined by inequality (12) below.

2. For a grid point (t, x) put \(l=\frac{t}{\tau }\) (note that \(l\le M=\left[ \frac{1}{\tau }\right] =\mathcal {O}(2^n)\)) and calculate the matrix power and matrix-vector product

    $$\begin{aligned} u^{(h,l\tau )}=A_n^l\varphi ^{(h)}.\end{aligned}$$
    (10)

Note that (10) uses matrix powering instead of the step-by-step iteration initially suggested by the difference scheme (7).

3. For non-grid points take (e.g.) a multilinear interpolation \(\widetilde{u^{(h)}}\) of \(u^{(h)}\), computed from the (constant number of) “neighbor” grid points.

Due to well-known properties of multilinear interpolation,

$$\begin{aligned} \sup _{t,x}|\widetilde{u^{(h)}}(t,x)|\le \tilde{C}\sup _{G_h^{\tau }}|u^{(h)}|;\quad \sup _{t,x}\bigl |u(t,x)-\widetilde{u\mid _{G_h^{\tau }}}(t,x)\bigr |\le \bar{C}\sup _{t,x}|D^2u(t,x)|\cdot h^2,\end{aligned}$$
(11)

where \(\tilde{C}\) and \(\bar{C}\) are absolute constants. Based on (11), on the continuous dependence property, and on the linearity of the interpolation operator, we infer

$$\begin{aligned} \sup _{t,x}|u(t,x)-\widetilde{u^{(h)}}(t,x)| \;\le \;&\sup _{t,x}\Bigl (|u(t,x)-\widetilde{u\mid _{G_h^{\tau }}}(t,x)|+|\widetilde{u\mid _{G_h^{\tau }}}(t,x)-\widetilde{u^{(h)}}(t,x)|\Bigr ) \\ \;\le \;&\tilde{C}C_0\sup \nolimits _{x}|D^2\varphi (x)|\cdot h^2+\bar{C}C_1C_2\cdot h\;\le \; 2^{-n}. \end{aligned}$$

Thus choosing a grid step \(h=2^{-N}\) such that

$$\begin{aligned} h\le 2^{-n}/C_h, \quad C_h=\tilde{C}C_0\sup \nolimits _{x}|D^2\varphi (x)|+\bar{C}C_1C_2, \end{aligned}$$
(12)

will guarantee (using \(h^2\le h\)) that the computed function \(\widetilde{u^{(h)}}\) approximates the solution u with the prescribed precision \(2^{-n}\) (here \(C_h\) depends only on the fixed functions \(\varphi \), \(B_i\), computable in space s(n), and is therefore a fixed constant).

According to (10), item (a) of Theorem 2 follows from items (f) and (d) of Proposition 6; item (b) of Theorem 2 follows from item (d) of Theorem 7 combined with item (d) of Proposition 6.    \(\square \)
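For illustration, here is a toy end-to-end instance of Steps 1–3 (an assumed example, not the general algorithm of the proof): the transport equation \(u_t=-u_x\) with periodic boundary, \(\varphi (x)=\sin 2\pi x\), and the upwind scheme, whose matrix is circulant of bandwidth 1; the evaluation point x is deliberately not a grid point.

```python
import numpy as np

N = 9                                    # Step 1: grid exponent, h = 2^-N
D, h = 2**N, 2.0**-N
nu = 1.0                                 # Courant number; tau = nu * h is stable
tau = nu * h

A = np.zeros((D, D))                     # circulant upwind-scheme matrix A_n
for i in range(D):
    A[i, i] = 1 - nu
    A[i, (i - 1) % D] = nu

phi = np.sin(2 * np.pi * h * np.arange(D))   # initial condition on the grid

t, x = 0.25, 1 / 3                       # evaluation point (x not on the grid)
l = round(t / tau)                       # number of time steps to reach t

# Step 2: matrix powering (matrix_power squares repeatedly internally)
# instead of l sequential applications of the scheme.
u_grid = np.linalg.matrix_power(A, l) @ phi

j = int(x / h)                           # Step 3: linear interpolation in x
lam = x / h - j
u_tx = (1 - lam) * u_grid[j] + lam * u_grid[(j + 1) % D]
print(u_tx, np.sin(2 * np.pi * (x - t)))     # ~0.5 vs. the exact value 0.5
```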

The conditions of Theorem 2 hold for large classes of IVPs and BVPs for linear PDEs, e.g. for certain BVPs for symmetric hyperbolic systems \(u_t+\sum \limits _{i=1}^mB_i u_{x_i}=0\) with constant matrices \(B_i=B_i^*\) (to which also the wave equation \(p_{tt}-a^2\sum \limits _{i=1}^mp_{x_ix_i}=0\) can be reduced), as well as for the heat equation \(p_{t}-a^2\sum \limits _{i=1}^mp_{x_ix_i}=0\): for these, difference schemes with circulant constant-bandwidth matrices can be constructed. For equations with non-constant coefficients the matrices are more complicated, so the corresponding problems might have higher complexity. Deriving optimal complexity bounds for the considered (and possibly broader) classes of equations is one direction of future work.