1 Introduction

Polynomial processes in finite dimensions, introduced in Cuchiero et al. [25] (see also Filipović and Larsson [36]), constitute a class of time-homogeneous Markovian Itô semimartingales which are inherently tractable: conditional moments can be expressed through a deterministic dual process which is the solution of a linear ODE. This is the so-called moment formula. They form a rich class that includes Wright–Fisher diffusions (Kimura [49]) from population genetics, Wishart correlation matrices (Ahdida and Alfonsi [6]) and affine processes (Duffie et al. [29]), to name just a few. The computational advantages due to the moment formula are obvious and have led to a wide range of applications, in particular in mathematical finance and population genetics. In mathematical finance, this concerns especially interest rate theory, stochastic volatility models, life insurance liability modelling, variance swaps and stochastic portfolio theory (see e.g. Ackerer et al. [5], Biagini and Zhang [16], Filipović et al. [35] or Cuchiero [23]). In population genetics, dual processes associated to moments and simple procedures to compute them play an equally important role; the Wright–Fisher diffusion with seed-bank component (see e.g. Blath et al. [17] and the references therein) is for instance an important example of a recently investigated two-dimensional polynomial process in this field.

The main goal of this paper is to introduce the concept of polynomial processes taking values in a general Banach space \(B\) and to provide a corresponding moment formula. The resulting comprehensive theory covers practically all finite- and infinite-dimensional state spaces, going far beyond the specific cases considered until now. The second main goal is to illustrate the applicability of the moment formula and the powerful and easy-to-use results which arise therefrom. The potential of this formula can be appreciated by means of several practically relevant examples.

Let us here first focus on examples from mathematical finance. The most recent appearance of infinite-dimensional polynomial processes is certainly in the field of rough volatility (see e.g. Alòs et al. [7], Gatheral et al. [42] or Bennedsen et al. [10]). This roughness can be seen as a generic non-Markovianity of the corresponding volatility processes. By lifting these processes to infinite dimensions, it is, however, possible to recover the Markov property. Intriguingly, such infinite-dimensional models often stem from the class of polynomial processes: Hawkes processes, rough Heston (El Euch and Rosenbaum [30]) and rough Wishart processes can be viewed as infinite-dimensional affine and thus polynomial processes, as shown in Abi Jaber and El Euch [2], Cuchiero and Teichmann [27,28]. In the current article, we show that the rough Bergomi model (Bayer et al. [8]) also belongs to the class of infinite-dimensional polynomial processes. In other areas, such as stochastic portfolio theory, the most flexible and tractable models appear again to be (measure-valued) polynomial (Cuchiero et al. [26]), and also the Zakai equation from filtering theory belongs to this class. Let us also mention that the (sub)class of affine processes taking values in Hilbert spaces has recently been studied in Schmidt et al. [59], in particular from an existence and pathwise uniqueness point of view.

In population genetics, infinite-dimensional models appear in the form of the well-known measure-valued diffusions such as the Fleming–Viot process, the super-Brownian motion and the Dawson–Watanabe superprocess (see e.g. Etheridge [31, in particular Chaps. 1 and 2] and the references therein). All these examples are polynomial processes.

The deeper reason behind this predominance can be explained by a universal approximation property of polynomial dynamics in the space of all stochastic dynamics driven by, say, Brownian motion (or many other continuous processes). This is based on the properties of the signature process, which plays a prominent role in rough path theory introduced by Lyons [54] and which serves as a regression basis for solutions of general stochastic differential equations (see e.g. Levin et al. [53]). As the signature of many processes, in particular of \(d\)-dimensional Brownian motion, also turns out to be an infinite-dimensional polynomial process, this suggests an inherent universality of the polynomial class (see Sect. 4.5). We also refer to Cuchiero et al. [24] where a randomised polynomial signature process is used as regression basis.

Passing to infinite-dimensional polynomial models is also supported from a purely computational point of view. Indeed, the synergy of increasing computer power with machine learning techniques has made it possible to treat high-dimensional linear PDEs (or infinite-dimensional linear ODEs), exactly the kind of equation that arises in our context, very efficiently, for instance via neural network approaches (see e.g. Beck et al. [9]). Besides these techniques, the nice symmetries that typically characterise polynomial processes allow employing highly efficient algorithms, such as those proposed by Heitzinger et al. [43].

Let us now outline our approach: we fix a state space \({\mathcal{S}}\subseteq B\) and define polynomial processes as \(\mathcal{S}\)-valued solutions of martingale problems for certain linear operators \(L\), which we likewise call polynomial. For this class of processes, we derive representations of the moments in terms of two moment formulas.

More precisely, we introduce polynomials on \(B\) following the definition of polynomials in finite dimensions. In this spirit, we first define homogeneous polynomials of degree \(k\) to be bounded linear functionals evaluated at \({y}^{\otimes k}\), where \({y}^{\otimes k}\) denotes the \(k\)-fold tensor product of \({y}\in {B}\) with itself. Then we define polynomials on \(B\) of degree \(k\) as linear combinations of homogeneous polynomials on \(B\) of degree less than or equal to \(k\). That is, a polynomial on \({B}\) with coefficients \({a}_{0}\in ({B}^{\otimes 0})^{*},\ldots , {a}_{k}\in ({B}^{\otimes k})^{*}\) is defined as

$$ p({y})={a}_{0}+\langle {a}_{1},{y}\rangle +\cdots +\langle {a}_{k},{y}^{ \otimes k}\rangle , $$

where \({B}^{\otimes j}\) denotes the \(j\)-fold symmetric algebraic tensor product of \({B}\) endowed with some crossnorm, \((B^{\otimes j})^{*}\) the dual space of \(B^{\otimes j}\) and \(\langle {\,\cdot \,}, {\,\cdot \,}\rangle \) the pairing between \(B^{\otimes j}\) and \((B^{\otimes j})^{*}\).

We then define polynomial operators as linear operators acting on classes of cylindrical polynomials, i.e., functions \(p\) of the form

$$ p(y)=\phi (\langle a_{1}, {y}\rangle , \ldots , \langle a_{d}, {y} \rangle ), $$

where \(\phi \) is a polynomial on \(\mathbb{R}^{d}\) and \(a_{1},\ldots ,a_{d} \in D \subseteq B^{*}\). Their defining property consists in mapping each polynomial \(p\) to a (not necessarily cylindrical) polynomial \(Lp\) without increasing the degree. Polynomial processes taking values in \(\mathcal{S} \subseteq B\) are then defined to be \({\mathcal{S}}\)-valued solutions of martingale problems for polynomial operators.

This allows associating to \(L\) two families \((L_{k})_{k \in \mathbb{N}}\) and \((M_{k})_{k \in \mathbb{N}}\) of linear operators that we call dual and bidual, respectively. These names are mnemonic for the spaces on which these operators act. Indeed, \(L_{k}\) maps the coefficient vector of a cylindrical polynomial \(p\) to the coefficient vector of the polynomial \(Lp\) and is thus defined on the truncated graded algebra \(\bigoplus _{j=0}^{k} ({B}^{\otimes j})^{*}\). The bidual operator \(M_{k}\) is the adjoint operator of \(L_{k}\) when pairing \(\bigoplus _{j=0}^{k} ({B}^{\otimes j})^{*}\) with \(\bigoplus _{j=0}^{k} ({B}^{\otimes j})^{**}\) and is thus defined on the truncated graded algebra \(\bigoplus _{j=0}^{k} ({B}^{\otimes j})^{**}\). To both operators, we can associate a \((k+1)\)-component system of linear ODEs on the respective truncated algebras. The corresponding solution then describes the evolution of the polynomial process’ moments (see Theorems 3.4 and 3.8).

Note that our approach differs from the inspiring paper by Benth et al. [11] on the related notion of multilinear processes in Banach spaces in the following sense. In Benth et al. [11], the moment formula itself is the defining property, and examples satisfying it, e.g. independent increment processes, are studied. We tackle the problem from a different angle. Indeed, we show methodologically that for Banach-space-valued processes defined via polynomial operators, the moment formula holds true. This allows replacing ad hoc methods by a clear systematic approach and revealing all those models for which this approach is applicable, e.g. all SDE models with polynomial vector fields (see Sect. 4.5).

We then apply this abstract theory to specific Banach spaces. For finite dimensions, we show that the operator \(L_{k}\) corresponds to the matrix representation (with respect to a certain basis) of the infinitesimal generator \(L\) restricted to the set of all polynomials up to degree \(k\). The operator \(M_{k}\) is then simply the transpose of this matrix, and the moment formulas boil down to the well-known matrix exponential representation of the conditional moments of finite-dimensional polynomial processes (see Cuchiero et al. [25] and Filipović and Larsson [37] for more details). Turning then to an infinite-dimensional setting, we show that for the state space of probability measures on ℝ, the linear ODEs corresponding to \(L_{k}\) and \(M_{k}\) can be identified with the Kolmogorov backward and forward equations, respectively, of an associated \(\mathbb{R}^{k}\)-valued Markov process. As our main application, we introduce a class of polynomial forward variance curve models which take values in a Hilbert space as in Filipović [34]. These are SPDEs with polynomial characteristics describing the evolution of the forward variance curve, that is, \(x \mapsto \mathbb{E}[V_{t+x}|\mathcal{F}_{t}]\), where \(V\) denotes the spot variance of some asset. This setup includes for instance the (rough) Bergomi model as considered in Bergomi [14], Bayer et al. [8] or Jacquier et al. [46]. We show in particular how to exploit the moment formula to price options on the VIX in such models, by either approximating the payoff via polynomials or using the moments as control variates in Monte Carlo pricing. As a last example, we illustrate that the signature process of a \(d\)-dimensional Brownian motion is polynomial and derive its expected value (see e.g. Friz and Hairer [41, Theorem 3.9]) via the polynomial approach.

The remainder of the article is organised as follows. In Sect. 1.1, we introduce some basic notation and definitions. In Sect. 2, we define polynomials on \(B\) and polynomial operators. Sect. 3 is devoted to the introduction of polynomial processes as well as the formulation and proofs of the moment formulas. We then conclude with Sect. 4 which covers several examples and applications, including generic polynomial operators of Lévy type, probability-measure-valued polynomial processes, forward variance models and the signature process of a \(d\)-dimensional Brownian motion.

1.1 Notation and basic definitions

Throughout this paper, let \({B}\) be a Banach space. We denote by \({B}^{*}\) its dual space, i.e., the space of continuous linear functionals, endowed with the strong dual norm

$$ {\|{a}\|}_{*} := \sup _{\|{y}\| \leq 1} |{a}({y})|. $$

We introduce now some basic notions in the context of tensor products; for more details, we refer to Ryan [58]. The algebraic symmetric tensor product of \({B}\) with itself is a vector space \(B \otimes B\) together with a symmetric bilinear map \(\otimes : B \times B \to B \otimes B\) satisfying the following two properties (see also Ryan [58, Theorem 1.5]):

– \({B}\otimes {B}\) admits the representation

$$ B\otimes B=\bigg\{ \sum _{i=1}^{d}\alpha _{i} ({y}_{i}\otimes {y}_{i}) : \alpha _{i}\in {\mathbb{R}}, {y}_{i}\in {B}, d\in {\mathbb{N}}\bigg\} ; $$

– given any symmetric bilinear map \(\ell : B \times B \to \mathbb{R}\), there exists a unique linear map \(\widetilde{\ell } : B \otimes B \to \mathbb{R}\) such that \(\ell = \widetilde{\ell } \circ \otimes \), i.e., \(\ell (y_{1}, y_{2}) = \widetilde{\ell }(y_{1} \otimes y_{2})\) for all \(y_{1}, y_{2} \in B\).

The term algebraic refers to the fact that the definition of \(B\otimes B\) does not involve a closure with respect to a given norm.

For an element \({a}\in {B}^{*}\), the linear map \({a}\otimes {a}:{B}\otimes {B}\to {\mathbb{R}}\) is defined by

$$ ({a}\otimes {a})({y}_{1}\otimes {y}_{2}):={a}({y}_{1}){a}({y}_{2}). $$

For \({a}_{1}, {a}_{2}\in {B}^{*}\), we then get \({a}_{1}\otimes {a}_{2}\) via polarisation. With \(\|{\,\cdot \,}\|_{\times }\), we denote a crossnorm on \({B}\otimes {B}\), i.e., a norm that satisfies

(i) \(\|{y}_{1}\otimes {y}_{2}\|_{\times }=\|{y}_{1}\|\|{y}_{2}\|\) for all \({y}_{1},{y}_{2}\in {B}\), and

(ii) \(\sup _{{y}\in {B}\otimes {B}, \|{y}\|_{\times }\leq 1}|({a}_{1} \otimes {a}_{2})({y})|= \|{a}_{1}\|_{*}\|{a}_{2}\|_{*}\) for all \({a}_{1},{a}_{2}\in {B}^{*}\).

Throughout, we work with a fixed but arbitrary crossnorm, with respect to which we define the corresponding dual spaces; since our results hold for every crossnorm, we do not single out a particular one. We then denote by \(({B}\otimes {B})^{*}=:(B^{\otimes 2})^{*}\) the dual space of \(({B}\otimes {B},\|{\,\cdot \,}\|_{\times })\). Observe that \({a}_{1}\otimes {a}_{2}\in ({B}\otimes {B})^{*}\) for all \({a}_{1},{a}_{2}\in {B}^{*}\). Furthermore, we denote by \(\|{\,\cdot \,}\|_{*2}\) the strong dual norm on \(({B}\otimes {B})^{*}\), i.e.,

$$\begin{aligned} \|{a}\|_{*2}:=\sup \{|{a}({y}) |:\ {y}\in {B}\otimes {B}, \|{y}\|_{\times }\leq 1\}. \end{aligned}$$
(1.1)

Observe that the larger the crossnorm, the larger the corresponding dual space. Since a larger dual space does not necessarily lead to stronger results, we do not fix a particular one here. The choice of one crossnorm over another depends on the specific stochastic processes involved and thus on the application.

Finally, we set \(({B}\otimes {B})^{**}:=(({B}\otimes {B})^{*})^{*}\). Recall that each element \({y}\in {B}\otimes {B}\) can be seen as an element of \(({B}\otimes {B})^{**}\) by setting \({y}({a}):={a}({y})\) for each \({a}\in ({B}\otimes {B})^{*}\). The \(k\)-fold tensor products \({y}^{\otimes k}, {B}^{\otimes k}, {a}^{\otimes k}, ({B}^{\otimes k})^{*}\) and the norm \(\|{\,\cdot \,}\|_{*k}\) for \(k\in {\mathbb{N}}\) are defined analogously. For \(k=0\), we identify \(({B}^{\otimes 0})^{*}\) with ℝ, so that \({a}({y}^{\otimes 0}):={a}\in {\mathbb{R}}\) for all \({y}\in {B}\). We also set \(y^{\otimes 0}:=1\). Generally, for subsets \(D_{j} \subseteq ({B}^{\otimes j})^{*}\) or \({\mathcal{S}}_{j} \subseteq {B}^{\otimes j}\), we write \(\vec{a}\in \bigoplus _{j=0}^{k} D_{j}\) for \(\vec{a}=({a}_{0},\ldots ,{a}_{k})\) with \({a}_{j}\in D_{j}\) and \(\vec{y}\in \bigoplus _{j=0}^{k} {\mathcal{S}}_{j}\) for \(\vec{y}=({y}_{0},\ldots ,{y}_{k})\) with \({y}_{j}\in {\mathcal{S}}_{j}\). Moreover, for \({y}\in {B}\), we write \(\overline{{y}}\) for the vector \((1,{y},\ldots ,{y}^{\otimes k})\).

2 Polynomials on \({B}\) and polynomial operators

The goal of this section is to introduce polynomials on \(B\) and to define all sorts of operators that we need when dealing with polynomial processes.

For \({a}\in ({B}^{\otimes k})^{*}\), \({y}\in {B}\) and \(k\in {\mathbb{N}}_{0}\), we use the notation \(\langle {a},{y}^{\otimes k}\rangle :={a}({y}^{\otimes k})\). A polynomial on \({B}\) with coefficients \({a}_{0}\in ({B}^{\otimes 0})^{*},\ldots , {a}_{k}\in ({B}^{\otimes k})^{*}\) is then defined as

$$ p({y})={a}_{0}+\langle {a}_{1},{y}\rangle +\cdots +\langle {a}_{k},{y}^{ \otimes k}\rangle . $$
(2.1)

Setting \(\vec{a}:=({a}_{0},\ldots ,{a}_{k})\in \bigoplus _{j=0}^{k} ({B}^{ \otimes j})^{*}\), we say that \(\vec{a}\) is the coefficient vector corresponding to \(p\). The degree of a polynomial \(p({y})\), denoted by \(\deg (p)\), is the largest \(j\) such that \({a}_{j}\) is not the zero function, and \(-\infty \) if \(p\) is the zero polynomial. As shorthand notation, we often denote \(p({y})\) via

$$ p({y})= (\vec{a} \cdot \overline{{y}}). $$
(2.2)

Note that the dependence on the degree \(k\) is hidden in the notation. More generally, we write

$$\begin{aligned} (\vec{a}\cdot \vec{{y}}):={a}_{0}+\langle {a}_{1},{y}_{1}\rangle + \cdots +\langle {a}_{k},{y}_{k}\rangle \end{aligned}$$
(2.3)

for \(\vec{a}\in \bigoplus _{j=0}^{k} (B^{\otimes j})^{*} \) and \(\vec{y}\in \bigoplus _{j=0}^{k} (B^{\otimes j})^{**}\), where \(\langle {a}_{j},{y}_{j}\rangle :={y}_{j}({a}_{j})\) for each \(j\). Next, we denote by

$$ P:=\{{y}\mapsto p({y}) : p\text{ is a polynomial on }{B}\} $$

the algebra of all polynomials on \({B}\) regarded as real-valued maps, equipped with pointwise addition and multiplication. In practice, it is often convenient to consider a subspace of polynomials with more regular coefficients. Fix \(D\subseteq {B}^{*}\). In our context, a cylindrical polynomial with coefficients in \(D\) is a function \(p: {B}\to \mathbb{R}\) of the form \(p({y}):= \phi (\langle {a}_{1}, {y}\rangle ,\ldots ,\langle {a}_{d}, {y}\rangle ) \), where \(d\in {\mathbb{N}}\), \(\phi :{\mathbb{R}}^{d}\to {\mathbb{R}}\) is a polynomial and \({a}_{i}\in D\) for each \(i\in \{1,\ldots ,d\}\). The space of cylindrical polynomials with coefficients in \(D\) is defined by

$$ P^{D}=\mathrm{span}\{{y}\mapsto \langle {a}, {y}\rangle ^{k} : k \in \mathbb{N}_{0}, {a}\in D\}. $$

Since for \({a}\in {B}^{*}\) and \({y}\in {B}\), it holds that \(\langle {a},{y}\rangle ^{k}=\langle {a}^{\otimes k},{y}^{\otimes k} \rangle \), we can equivalently write \(P^{D}=\{{y}\mapsto \sum _{j=0}^{k} \langle a_{j}, y^{\otimes j} \rangle : k \in \mathbb{N}_{0}, {a}_{j} \in D^{\otimes j}\}\).
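In finite dimensions, the identity \(\langle {a},{y}\rangle ^{k}=\langle {a}^{\otimes k},{y}^{\otimes k}\rangle \) can be checked by a direct tensor contraction. The following numerical sketch (a hypothetical illustration with \(B=\mathbb{R}^{3}\), identifying the pairing with the full Euclidean contraction of \(k\)-fold tensor powers; numpy is assumed to be available) verifies it for \(k=3\):

```python
import numpy as np

# Finite-dimensional illustration of <a, y>^k = <a^{(x)k}, y^{(x)k}>:
# here B = R^3, a in B* is identified with a vector, and the pairing
# is the full contraction of the k-fold tensor (outer) powers.
a = np.array([1.0, -2.0, 0.5])   # a in B* (illustrative values)
y = np.array([0.3, 1.2, -0.7])   # y in B
k = 3

a_k, y_k = a, y
for _ in range(k - 1):
    a_k = np.multiply.outer(a_k, a)  # builds a^{(x)k} as a k-fold outer product
    y_k = np.multiply.outer(y_k, y)  # builds y^{(x)k}

lhs = np.dot(a, y) ** k        # <a, y>^k
rhs = np.sum(a_k * y_k)        # <a^{(x)k}, y^{(x)k}> as full contraction
assert abs(lhs - rhs) < 1e-10
```

The same contraction viewpoint underlies the identification of \(P^{D}\) with sums of homogeneous terms \(\langle a_{j}, y^{\otimes j}\rangle \) above.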

We now define polynomial operators, which constitute a class of possibly unbounded linear operators acting on polynomials. They are not defined on all of \(P\) in general, but only on the subspace \(P^{D}\) for some subspace \(D \subseteq {B}^{*}\). An analogue of this notion has appeared previously in connection with finite-dimensional and measure-valued polynomial processes; see e.g. Cuchiero et al. [25], Filipović and Larsson [36] and Cuchiero et al. [26].

Definition 2.1

Fix \({\mathcal{S}}\subseteq {B}\). A linear operator \(L\colon P^{D}\to P\) is called \({\mathcal{S}}\)-polynomial if for every \(p\in P^{D}\), there is some \(q\in P\) such that \(q|_{{\mathcal{S}}}=(Lp)|_{{\mathcal{S}}}\) and

$$ \deg (q) \le \deg (p). $$

For \({B}={\mathbb{R}}\), it is well known (see for instance Cuchiero et al. [25]) that the second-order differential operator \(L\colon P^{D}\to C({\mathbb{R}})\) (corresponding to the extended generator of a diffusion with values in ℝ) given by

$$ Lp({y}):=b({y}) p'({y})+\frac{1}{2} a({y})p''({y}), \qquad a,b\in C({ \mathbb{R}}), $$

is polynomial if and only if \(b\) is an affine function and \(a\) is a polynomial of degree at most two.
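This degree condition can be checked symbolically. The following sketch (an illustration assuming sympy is available; the symbol names are ours) verifies on monomials that an affine \(b\) and an at most quadratic \(a\) do not raise the degree, while a cubic diffusion coefficient does:

```python
import sympy as sp

# Symbolic check that Lp(y) = b(y) p'(y) + (1/2) a(y) p''(y) does not
# raise the degree when b is affine and a is at most quadratic.
y, b0, b1, a0, a1, a2 = sp.symbols('y b0 b1 a0 a1 a2')
b = b0 + b1 * y                  # affine drift coefficient
a = a0 + a1 * y + a2 * y**2      # at most quadratic diffusion coefficient

for k in range(1, 6):
    p = y**k
    Lp = sp.expand(b * sp.diff(p, y) + sp.Rational(1, 2) * a * sp.diff(p, y, 2))
    assert sp.degree(Lp, gen=y) <= k   # degree preserved or lowered

# A cubic a(y) = y**3 already breaks the property for p(y) = y**2:
Lp_bad = sp.expand(sp.Rational(1, 2) * y**3 * sp.diff(y**2, y, 2))
assert sp.degree(Lp_bad, gen=y) == 3   # degree 3 > deg(p) = 2
```

Checking monomials suffices here since \(L\) is linear and the monomials span \(P^{D}\).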

Remark 2.2

Recall from Sect. 1.1 that we work with the symmetric tensor product. A non-symmetrised tensor product would indeed lead to an unnecessary redundancy; since \(\langle {a}\otimes b,{y}^{\otimes 2}\rangle =\langle {a},{y}\rangle \langle b,{y}\rangle =\langle b\otimes {a},{y}^{\otimes 2}\rangle \), the polynomial with coefficient \({a}\otimes b\) coincides with that having coefficient \(b\otimes {a}\).

2.1 Dual operators

We should now like to associate to an \({\mathcal{S}}\)-polynomial operator \(L\) a family of so-called dual operators \((L_{k})_{k \in \mathbb{N}}\) which are linear operators mapping the coefficient vector of \(p\) to the coefficient vector of \(Lp\).

If \({\mathcal{S}}={B}\), the definition of those operators is straightforward. Indeed, in this case, the representation provided in (2.1) is unique (see Lemma A.1 for more details) and we can identify each polynomial with its coefficient vector. For a \({B}\)-polynomial operator \(L\), this means that \(L\) can be uniquely identified with a family of operators \((L_{k})_{k\in {\mathbb{N}}}\), where \(L_{k}: \bigoplus _{j=0}^{k} D^{\otimes j}\to \bigoplus _{j=0}^{k} ({B}^{\otimes j})^{*}\) maps the coefficient vector of \(p\) to the coefficient vector of \(Lp\), for each \(p\in P^{D}\) with \(\deg (p)\leq k\). By letting \(L_{k}^{j}: \bigoplus _{i=0}^{k} D^{\otimes i}\to ({B}^{\otimes j})^{*}\) be the operator mapping the coefficient vector of \(p\) to the \(j\)th coefficient of \(Lp\), we can write \(L_{k}\vec{a}=(L_{k}^{0}\vec{a},\ldots ,L_{k}^{k}\vec{a})\) for all \(\vec{a}\in \bigoplus _{j=0}^{k} D^{\otimes j}\). It is important to note that the operators \(L_{k}\) and \(L_{k}^{j}\) inherit linearity from \(L\).

If \({\mathcal{S}}\subsetneq {B}\), two different polynomials can coincide on \({\mathcal{S}}\). This is for instance the case if \({\mathcal{S}}=\{{y}\in {B}:\langle {a}_{1},{y}\rangle =\langle {a}_{2},{y} \rangle \}\) for some \({a}_{1}, {a}_{2} \in B^{*}\) with \({a}_{1}\neq {a}_{2}\). In such a situation, there is no one-to-one correspondence between the restriction of a polynomial to \({\mathcal{S}}\) and its coefficient vector. However, it is still possible to define a dual operator that maps the coefficient vector of \(p\) to the coefficient vector of \(q\), where \(q\) is some polynomial such that \((Lp)|_{\mathcal{S}}=q|_{\mathcal{S}}\) and \(\deg (q)\leq \deg (p)\). Since the choice of such a \(q\) need not be unique, the linearity of such an operator is a priori not clear. We show in Lemma A.2 that one can always choose \(q\) in such a way that linearity of the dual operator is satisfied. We can thus conclude that to each \({\mathcal{S}}\)-polynomial operator \(L\), we can associate a family of dual operators \((L_{k})_{k \in \mathbb{N}}\), rigorously introduced in the following definition. Recall that the pairing \(( {\,\cdot \,})\) has been introduced in (2.2).

Definition 2.3

Let \(L\colon P^{D}\to P\) be an \({\mathcal{S}}\)-polynomial operator. For \(k\in {\mathbb{N}}_{0}\), a \(k\)th dual operator corresponding to \(L\) is a linear operator \(L_{k}: \bigoplus _{j=0}^{k} D^{ \otimes j}\to \bigoplus _{j=0}^{k} ({B}^{\otimes j})^{*}\) such that \(L_{k}\vec{a}=:(L_{k}^{0}\vec{a},\ldots , L_{k}^{k}\vec{a})\) satisfies

$$ Lp({y})=(L_{k} \vec{a}\cdot \overline{{y}})\qquad \text{for all }{y} \in {\mathcal{S}}, $$

where \(p({y}):=(\vec{a}\cdot \overline{{y}})\). Whenever \(L_{k}\) is a closable operator, we still denote its closure by \(L_{k}: {\mathcal{D}}(L_{k})\to \bigoplus _{j=0}^{k} ({B}^{\otimes j})^{*}\) and its domain by \({\mathcal{D}}(L_{k})\subseteq \bigoplus _{j=0}^{k}({B}^{\otimes j})^{*}\). We refer for instance to Ethier and Kurtz [32, Chap. 1] for the precise definition of closure and the related concepts.

We illustrate this notion by means of the well-studied one-dimensional Jacobi diffusion.

Example 2.4

Let \({B}={B}^{*}=D={\mathbb{R}}\), \({\mathcal{S}}:=[0,1]\) and recall that \({\mathbb{R}}\otimes {\mathbb{R}}={\mathbb{R}}\). Let \(P\) denote the space of all polynomials on ℝ and let \(L:P\to P\) be the \({\mathcal{S}}\)-polynomial operator given by

$$ Lp({y})={y}(1-{y})p''({y}). $$

For each \(\vec{a}:=({a}_{0},\ldots ,{a}_{k})\in {\mathbb{R}}^{k+1}\), define \(p_{\vec{a}}({y}):={a}_{0}+{a}_{1}{y}+\cdots +{a}_{k}{y}^{k}\) and compute

$$\begin{aligned} Lp_{\vec{a}}({y})&=2{a}_{2}{y}+\cdots +\big(-j(j-1){a}_{j}+(j+1)j{a}_{j+1} \big){y}^{j} \\ &\phantom{=:}+\cdots +\big(-k(k-1){a}_{k}\big){y}^{k}. \end{aligned}$$
(2.4)

Observe that \(Lp_{\vec{a}}\) is again a polynomial of degree at most \(k\), showing that \(L\) is indeed ℝ-polynomial and thus \([0,1]\)-polynomial. Moreover, one can see from (2.4) that the \(k\)th dual operator \(L_{k}:{\mathbb{R}}^{k+1}\to {\mathbb{R}}^{k+1}\) corresponding to \(L\) is given by \(L_{k}^{j}\vec{a}=-j(j-1){a}_{j}+(j+1)j{a}_{j+1}\) for \(j\in \{0,\ldots ,k-1\}\) and \(L_{k}^{k}\vec{a}=-k(k-1){a}_{k}\). Thus \(L_{k}\) can be identified with the unique matrix \(G_{k}\) such that \(L_{k}\vec{a}=G_{k}\vec{a}\).
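Anticipating the moment formulas of Sect. 3, which in finite dimensions reduce to a matrix exponential representation of conditional moments (as recalled in the introduction), moments of this Jacobi-type diffusion can be computed numerically from \(G_{k}\). The following sketch (assuming numpy and scipy are available; the closed-form benchmark \(m_{2}(t)=y_{0}+(y_{0}^{2}-y_{0})e^{-2t}\) is obtained by solving the moment ODE by hand) illustrates this for \(p({y})={y}^{2}\):

```python
import numpy as np
from scipy.linalg import expm

# Matrix G_k of the dual operator L_k for Lp(y) = y(1-y) p''(y):
#   (L_k a)_j = -j(j-1) a_j + (j+1)j a_{j+1}.
def G(k):
    Gk = np.zeros((k + 1, k + 1))
    for j in range(k + 1):
        Gk[j, j] = -j * (j - 1)
        if j < k:
            Gk[j, j + 1] = (j + 1) * j
    return Gk

# Moment formula sketch: E[p(lambda_t)] = (exp(t G_k) a . ybar_0),
# checked for p(y) = y^2 against the hand-computed second moment.
k, t, y0 = 2, 1.0, 0.3
a = np.array([0.0, 0.0, 1.0])          # coefficient vector of p(y) = y^2
ybar0 = np.array([1.0, y0, y0**2])     # (1, y0, y0^2)
moment = expm(t * G(k)) @ a @ ybar0    # (exp(t G_k) a . ybar_0)

closed_form = y0 + (y0**2 - y0) * np.exp(-2 * t)
assert abs(moment - closed_form) < 1e-10
```

Note that \(G_{k}\) annihilates the coefficient vector of \(p({y})={y}\), reflecting the absence of a drift term: the first moment stays at \(y_{0}\).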

Remark 2.5

Observe that whenever a \(k\)th dual operator \(L_{k}\) satisfies

$$ L_{k}(0,\ldots ,D^{\otimes j},\ldots ,0)\subseteq \big(0,\ldots ,({B}^{ \otimes j})^{*},\ldots ,0\big) $$

for \(j\in \{0, \ldots , k\}\), one can define auxiliary operators \({\mathcal{L}}_{j}:D^{\otimes j}\to ({B}^{\otimes j})^{*}\) such that

$$ L_{k}\vec{a}=({\mathcal{L}}_{0}{a}_{0},\ldots , {\mathcal{L}}_{k}{a}_{k}) $$

for each \(\vec{a}\in \bigoplus _{j=0}^{k} D^{\otimes j}\) and in turn also for each \(\vec{a}\in \bigoplus _{j=0}^{k}{\mathcal{D}}({\mathcal{L}}_{j})=:{\mathcal{D}}(L_{k})\).

Observe that whenever Remark 2.5 applies, the notation becomes considerably simpler and the computations are easier to handle. This is for instance the case when working with probability-measure-valued processes (see Sect. 4.3).

2.2 Bidual operators

Let \(L\colon P^{D}\to P\) be an \({\mathcal{S}}\)-polynomial operator and fix \(k\in {\mathbb{N}}_{0}\). We introduce now the notion of a bidual operator which is slightly more delicate. Its domain of definition is

$$\begin{aligned} {\mathcal{D}}_{k}:=\mathrm{span}\bigg\{ \overline{{y}}\in \bigoplus _{j=0}^{k}{ \mathcal{S}}^{\otimes j} &\colon |Lp_{{a}_{i}}({y})|\leq C_{y}\|{a}_{i} \|_{*i} \\ &\phantom{=}\text{for all ${a}_{i}\in D^{\otimes i}$ and $i\in \{0,\ldots ,k\}$} \bigg\} , \end{aligned}$$
(2.5)

where \(p_{{a}_{i}}({y}):=\langle {a}_{i},{y}^{\otimes i}\rangle \), \(C_{y}\in {\mathbb{R}}\) is a constant depending on \({y}\), and \(\|{\,\cdot \,}\|_{*i}\) is given by (1.1) (extended to higher order). In words, \({\mathcal{D}}_{k}\) consists of linear combinations of \(\overline{{y}}=(1,{y},\ldots ,{y}^{\otimes k})\) such that the linear operator from \((D^{\otimes i},\|{\,\cdot \,}\|_{*i})\) to ℝ given by

$$ {a}_{i}\mapsto Lp_{{a}_{i}}({y}) $$

is bounded (and thus continuous) for each \(i\in \{0,\ldots ,k\}\).

Definition 2.6

Let \({\mathcal{D}}_{k}\) be the set defined in (2.5). A \(k\)th bidual operator corresponding to \(L\), denoted by \({M}_{k}:{\mathcal{D}}_{k} \to \bigoplus _{j=0}^{k} ({B}^{\otimes j})^{**}\), is a linear operator satisfying

$$ Lp( {y})= (\vec{a}\cdot M_{k} \overline{y})\qquad \text{for all } \vec{a}\in \bigoplus _{j=0}^{k} D^{\otimes j}, $$

where \(p({y}):=(\vec{a}\cdot \overline{y})\) as in (2.2).

The motivation to work with the potentially large bidual spaces is given in Remark 3.11 below. As one would expect, dual and bidual operators are strongly connected. We make this relation precise in the following lemma.

Lemma 2.7

Fix a \(k\)th dual operator \(L_{k}\) and a \(k\)th bidual operator \({M}_{k}\). Then \(L_{k}\) and \({M}_{k}\) are adjoint with respect to the relation \(( {\,\cdot \,})\) defined in (2.3), meaning that

$$\begin{aligned} (L_{k} \vec{a}\cdot \vec{y})=(\vec{a}\cdot M_{k} \vec{y}) \end{aligned}$$
(2.6)

for all \(\vec{a}\in \bigoplus _{j=0}^{k} D^{\otimes j}\) and \(\vec{y}\in {\mathcal{D}}_{k}\). Whenever \(L_{k}\) is a closable operator, (2.6) holds also for all \(\vec{a}\in {\mathcal{D}}(L_{k})\).

Proof

The result follows by noting that \((L_{k} \vec{a}\cdot \overline{{y}})=Lp_{\vec{a}}( {y})=(\vec{a} \cdot M_{k} \overline{{y}})\) for all \(\vec{a}\in \bigoplus _{j=0}^{k} D^{\otimes j}\) and \({y}\in {\mathcal{S}}\) such that \(\overline{{y}}\in {\mathcal{D}}_{k}\), where \(p_{\vec{a}}({y}):=(\vec{a}\cdot \overline{y})\). □

To illustrate the notion of the bidual operator, we consider again the one-dimensional Jacobi diffusion.

Example 2.8

Consider again the setting of Example 2.4. An inspection of (2.4) shows that the \(k\)th bidual operator \({M}_{k}:{\mathbb{R}}^{k+1}\to {\mathbb{R}}^{k+1}\) corresponding to \(L\) satisfies \({M}_{k}^{i}\overline{{y}}=i(i-1)({y}^{i-1}-{y}^{i})\) for \({y}\in [0,1]\), yielding \({M}_{k}^{i}\vec{y}=i(i-1)({y}_{i-1}-{y}_{i})\) for \(i\in \{0,\ldots ,k\}\) and \(\vec{y}\in {\mathbb{R}}^{k+1}\). As in the case of dual operators, the \(k\)th bidual operator can be identified with the unique matrix \(\widetilde{G}_{k}\) such that \({M}_{k}\vec{y}=\widetilde{G}_{k}\vec{y}\) for all \(\vec{y}\in {\mathbb{R}}^{k+1}\). A direct computation shows then that \(\widetilde{G}_{k}=G_{k}^{\top }\). This relation is nothing else than (2.6), which in this setting reads \(\vec{a}^{\top }\widetilde{G}_{k}\vec{y}= (\vec{a}\cdot M_{k} \vec{y})= (L_{k} \vec{a}\cdot \vec{y}) =\vec{a}^{\top }G_{k}^{\top }\vec{y}\) for all \(\vec{a},\vec{y}\in {\mathbb{R}}^{k+1}\).
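The adjoint relation \(\widetilde{G}_{k}=G_{k}^{\top }\) can also be confirmed numerically. The following sketch (assuming numpy is available) builds both matrices for this example from their componentwise formulas and checks (2.6):

```python
import numpy as np

# Dual operator matrix: (L_k a)_j = -j(j-1) a_j + (j+1)j a_{j+1}.
# Bidual operator matrix: (M_k y)_i = i(i-1) (y_{i-1} - y_i).
k = 5
Gk = np.zeros((k + 1, k + 1))
Gtilde = np.zeros((k + 1, k + 1))
for j in range(k + 1):
    Gk[j, j] = -j * (j - 1)
    if j < k:
        Gk[j, j + 1] = (j + 1) * j
for i in range(k + 1):
    Gtilde[i, i] = -i * (i - 1)
    if i >= 1:
        Gtilde[i, i - 1] = i * (i - 1)

# Adjointness (2.6): Gtilde_k coincides with the transpose of G_k, ...
assert np.array_equal(Gtilde, Gk.T)

# ... equivalently (a . M_k y) = (L_k a . y) for all a, y in R^{k+1}.
rng = np.random.default_rng(0)
a_vec, y_vec = rng.normal(size=k + 1), rng.normal(size=k + 1)
assert abs(a_vec @ (Gtilde @ y_vec) - (Gk @ a_vec) @ y_vec) < 1e-9
```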

Remark 2.9

Suppose that the conditions of Remark 2.5 hold for some \(k\)th dual operator \(L_{k}\), set \({\mathcal{D}}_{k,j}:=\{{y}_{j}\in {\mathcal{S}}^{\otimes j}\colon \vec{y}\in {\mathcal{D}}_{k}\}\) and consider the linear operator \({\mathcal{M}}_{j}:{\mathcal{D}}_{k,j}\to ({B}^{\otimes j})^{**}\) uniquely defined by

$$ \langle {a}_{j},{\mathcal{M}}_{j}{y}_{j}\rangle :=\langle {\mathcal{L}}_{j}{a}_{j},{y}_{j} \rangle ,\qquad {a}_{j}\in {\mathcal{D}}({\mathcal{L}}_{j}), {y}_{j} \in {\mathcal{D}}_{k,j}. $$

By Lemma 2.7, we can then show that \({M}_{k}\vec{y}=({\mathcal{M}}_{0}{y}_{0},\ldots , {\mathcal{M}}_{k}{y}_{k})\) for each \(\vec{y}\in {\mathcal{D}}_{k}\). Indeed, under the given conditions, we have

$$ \langle {\mathcal{L}}_{j} {a}_{j},{y}^{\otimes j}\rangle =\big(L_{j}({a}_{j} \vec{e}_{j})\cdot \overline{y}\big)=({a}_{j} \vec{e}_{j} \cdot M_{j} \overline{y}) $$

for each \({y}\in {B}\). By linearity, we can conclude that the \(j\)th component of \(M_{j}\) depends on \(\overline{y}\) only through \({y}^{\otimes j}\), whence the above equality.

Similarly as in Remark 2.5, the simplification of notation proposed in Remark 2.9 is substantial and particularly useful when working with probability-measure-valued processes.

3 Polynomial processes with values in \({B}\)

In this section, we define a \({B}\)-valued polynomial process and derive two moment formulas. We start by introducing a concept of measurability to which we implicitly always refer when speaking of \({B}\)-valued random variables and processes.

Definition 3.1

Fix a filtered probability space \((\Omega , {\mathcal{F}},({\mathcal{F}}_{t})_{t\geq 0}, {\mathbb{P}})\).

(i) Fix \({\mathcal{G}}\subseteq {\mathcal{F}}\) and \(k\in {\mathbb{N}}\). A map \(\lambda :\Omega \to {B}^{\otimes k}\) is \({\mathcal{G}}\)-weakly measurable if \(\langle {a},\lambda \rangle \) is \({\mathcal{G}}\)-measurable for all \({a}\in ({B}^{\otimes k})^{*}\). For \({\mathcal{G}}={\mathcal{F}}\), we say that \(\lambda \) is weakly measurable.

(ii) A \({B}\)-valued adapted process \((\lambda _{t})_{t\geq 0}\) (or simply a \({B}\)-valued process) is a map defined on \({\mathbb{R}}_{+}\times \Omega \) with values in \({B}\) such that \(\lambda _{t}^{\otimes k}:\Omega \to {B}^{\otimes k}\) is \({\mathcal{F}}_{t}\)-weakly measurable for each \(k\in {\mathbb{N}}\) and each \(t\geq 0\).

In particular, for a \({B}\)-valued process \((\lambda _{t})_{t\geq 0}\), one has that \((p(\lambda _{t}))_{t\geq 0}\) is a real-valued adapted process for all \(p\in P\).

Let \({\mathcal{S}}\subseteq {B}\), fix a linear subspace \(D\subseteq {B}^{*}\) and let \(L: P^{D} \to P\) be a linear operator. An \({\mathcal{S}}\)-valued process \((\lambda _{t})_{t\geq 0}\) defined on some filtered probability space \((\Omega , {\mathcal{F}},({\mathcal{F}}_{t})_{t\geq 0}, {\mathbb{P}})\) is called a solution to the martingale problem for \(L\) if

(i) \(\lambda _{0}= {y}_{0}\) ℙ-a.s. for some initial value \({y}_{0}\in {\mathcal{S}}\);

(ii) for every \(p\in P^{D}\), there exist càdlàg versions of \((p(\lambda _{t}))_{t \geq 0}\) and \((Lp(\lambda _{t}))_{t \geq 0}\);

(iii) the process

$$ N^{p}_{t} := p(\lambda _{t}) - p(\lambda _{0}) - \int _{0}^{t} Lp( \lambda _{s}) ds, \qquad t \geq 0, $$
(3.1)

defines a local martingale for every \(p\in P^{D}\).

Uniqueness of solutions to the martingale problem is always understood as uniqueness in law. The martingale problem for \(L\) is well posed if for every \({y}_{0}\in {\mathcal{S}}\), there exists a unique \({\mathcal{S}}\)-valued solution to the martingale problem for \(L\) with initial value \({y}_{0} \in {\mathcal{S}}\).

Definition 3.2

Let \(L\) be \({\mathcal{S}}\)-polynomial. A solution to the martingale problem for \(L\) is called an \({\mathcal{S}}\)-valued polynomial process.

3.1 Dual moment formula

Our goal here is to derive an analogue of the moment formula in this general infinite-dimensional setting. To do this, it is crucial that the local martingales defined in (3.1) are in fact true martingales. In the finite-dimensional case (see Cuchiero et al. [25]), but also in the infinite-dimensional case when dealing with compact state spaces (see the probability-measure case in Cuchiero et al. [26]), this is always true; see Remark 3.20 below.

Since in our setting this does not necessarily hold true, we need to include the true martingale property as an additional assumption (see the condition in Theorem 3.4 (ii) below). In Sect. 3.3, we illustrate some conditions under which this assumption is satisfied. This is in particular the case if \(D=B^{*}\).

Let \(L\colon P^{D}\to P\) be an \({\mathcal{S}}\)-polynomial operator, fix \(k\in {\mathbb{N}}_{0}\) and let \(L_{k}\) be a closable \(k\)th dual operator corresponding to \(L\) with domain \({\mathcal{D}}(L_{k})\). Before stating the theorem, we extend the domain of \(L\) to polynomials with coefficients in \({\mathcal{D}}(L_{k})\) by setting

$$ p_{\vec{a}}(y):=(\vec{a}\cdot \overline{y})\qquad \text{and}\qquad Lp_{ \vec{a}}(y):=(L_{k} \vec{a}\cdot \overline{y}) $$
(3.2)

for all \(\vec{a}\in {\mathcal{D}}(L_{k})\). As in finite dimensions, the moment formula corresponds to a solution of a system of linear ODEs. In the current infinite-dimensional setting, we need to make the solution concept precise.

Definition 3.3

Let ℬ be a subset of \(\bigoplus _{j=0}^{k} ({B}^{\otimes j})^{**}\). We call a function \(t \mapsto \vec{a}_{t}\) with values in \(\mathcal{D}(L_{k})\) a ℬ-solution of the \((k+1)\)-dimensional system of ODEs

$$ \partial _{t} \vec{a}_{t} = L_{k} \vec{a}_{t}, \qquad \vec{a}_{0}= \vec{a}, $$

if for every \(t >0\), it holds that \((\vec{a}_{t} \cdot \vec{y}) = (\vec{a}\cdot \vec{y})+\int _{0}^{t} (L_{k} \vec{a}_{s} \cdot \vec{ y})ds \) for all \(\vec{y}\in \mathcal{B}\).

At first sight, this solution concept resembles a weak one due to the pairing with \(\vec{y}\in \mathcal{B}\). However, we require here that \(\vec{a}_{t} \in \mathcal{D}(L_{k})\), which corresponds rather to a strong solution. The pairing with \(\vec{y}\) in particular allows us to avoid checking whether the Bochner integral is well defined.

We are now ready to state the dual moment formula as the main result of this section. To simplify the notation, set \(\overline{{\mathcal{S}}}_{k}:=\{\overline{{y}}=(1,{y},\ldots ,{y}^{ \otimes k})\colon {y}\in {\mathcal{S}}\}\).

Theorem 3.4

Let \(L\colon P^{D}\to P\) be an \({\mathcal{S}}\)-polynomial operator, fix \(k\in {\mathbb{N}}_{0}\) and \(T>0\), let \(L_{k}\) be a \(k\)th dual operator corresponding to \(L\) and assume that \(L_{k}\) is closable with domain \({\mathcal{D}}(L_{k})\). Let \((\lambda _{t})_{t \geq 0} \) be an \({\mathcal{S}}\)-valued polynomial process corresponding to \(L\) and fix \(\vec{a}=(a_{0}, \ldots , a_{k})\in {\mathcal{D}}(L_{k})\). Suppose that the following conditions hold true:

(i) There is an \(\overline{{\mathcal{S}}}_{k}\)-solution in the sense of Definition 3.3 to the \((k+1)\)-dimensional system of linear ODEs on \([0,T]\) given by

$$ \partial _{t} \vec{a}_{t} = L_{k} \vec{a}_{t}, \qquad \vec{a}_{0}= \vec{a}. $$
(3.3)
(ii) The process \((N_{t}^{p_{\vec{a}_{s}}})_{t\in [0,T]}\) given by (3.1), with \(p_{\vec{a}_{s}}\) and \(Lp_{\vec{a}_{s}}\) as in (3.2), defines a true martingale on \([0,T]\) for each \(s\in [0,T]\).

(iii) \(\int _{0}^{T}\int _{0}^{T} |{\mathbb{E}}[ Lp_{\vec{a}_{s}} (\lambda _{u})] |dsdu<\infty \).

Then for all \(0\leq t\leq T\), the representation

$$ \mathbb{E}[ {a}_{0}+ \langle {a}_{1}, \lambda _{T} \rangle +\cdots + \langle {a}_{k}, \lambda _{T}^{\otimes k} \rangle \,|\, \mathcal{F}_{t} ]={a}_{T-t,0}+\langle {a}_{T-t,1}, \lambda _{t}\rangle +\cdots + \langle {a}_{T-t,k}, \lambda ^{\otimes k}_{t}\rangle $$

holds almost surely. In shorthand notation, \(\mathbb{E}[ (\vec{a}_{0} \cdot \overline{\lambda }_{T}) \,|\, \mathcal{F}_{t} ]=( \vec{a}_{T-t } \cdot \overline{\lambda _{t}}) \).

Proof

We follow the proof of Ethier and Kurtz [32, Theorem 4.4.11] in order to obtain a slightly more general result (compare also with Cuchiero et al. [26]).

Fix \(T\in {\mathbb{R}}_{+}\), \(t\in [0,T]\) and \(A\in {\mathcal{F}}_{t}\). For all \((s,u)\in [0,T-t]\times [0,T-t]\), define

$$ f(s, u) :={\mathbb{E}}[(\vec{a}_{s}{\,\cdot \,}\overline{\lambda }_{t+u})1_{A}]. $$

Fix \(u\in [0,T-t]\) and note that (3.3) yields

$$\begin{aligned} f(\overline{s}, u) -f(\underline{s}, u) ={\mathbb{E}}\big[\big( ( \vec{a}_{\overline{s}}\cdot \overline{\lambda }_{t+u} )-( \vec{a}_{ \underline{s}}\cdot {\overline{\lambda }_{t+u}} ) \big) 1_{A}\big] =\! \int _{\underline{s}}^{\overline{s}}\!{\mathbb{E}}[ ( L_{k}\vec{a}_{s} \cdot {\overline{\lambda }_{t+u}} ) 1_{A}]ds \end{aligned}$$

for all \(\overline{s}, \underline{s} \in [0,T-t]\). Fix then \(s\in [0,T-t]\) and note that condition (ii) yields

$$ f(s, \overline{u}) -f(s, \underline{u}) ={\mathbb{E}}\big[{\mathbb{E}}[ ( \vec{a}_{s}\cdot {\overline{\lambda }_{t+\overline{u}}}) - (\vec{a}_{s} \cdot {\overline{\lambda }_{t+\underline{u}}}) |{\mathcal{F}}_{t}] 1_{A} \big]= \int _{\underline{u}}^{\overline{u}}{\mathbb{E}}[ (L_{k} \vec{a}_{s}\cdot \overline{\lambda }_{t+u}) 1_{A}]du $$

for all \(\overline{u}, \underline{u} \in [0,T-t]\).

Since \(\int _{0}^{T-t}\int _{0}^{T-t} |{\mathbb{E}}[ ( L_{k}\vec{a}_{s}{\, \cdot \,}{ \overline{\lambda }_{t+u}})] |dsdu<\infty \) by condition (iii), Ethier and Kurtz [32, Lemma 4.4.10] then yields

$$\begin{aligned} &{\mathbb{E}}[(\vec{a}_{T-t}{\,\cdot \,}\overline{\lambda }_{t})1_{A}]-{ \mathbb{E}}[(\vec{a}_{0}{\,\cdot \,}\overline{\lambda }_{T})1_{A}] \\ &=f(T-t,0)-f(0,T-t) \\ &=\int _{0}^{T-t} {\mathbb{E}}[( L_{k}\vec{a}_{s}\cdot { \overline{\lambda }_{T-t -s}} ) 1_{A}] -{\mathbb{E}}[ ( L_{k}\vec{a}_{s} \cdot \overline{\lambda }_{T-t-s})1_{A}]ds =0, \end{aligned}$$

and the result follows. □

Example 3.5

Consider again the setting of Example 2.8 and let \((\lambda _{t})_{t \geq 0} \) be a Jacobi diffusion with vanishing drift, i.e., a continuous ℝ-valued polynomial process corresponding to \(L\), and fix \(\vec{a}\in {\mathbb{R}}^{k+1}\). We now illustrate how Theorem 3.4 can be applied in this setting. Note that \(L_{k}\) is well defined on \({\mathcal{D}}(L_{k})={\mathbb{R}}^{k+1}\) by definition, and that the system of linear ODEs in condition (i) of Theorem 3.4 is given by

$$ \partial _{t} \vec{a}_{t} = G_{k}\vec{a}_{t}, \qquad \vec{a}_{0}= \vec{a}, $$

and it is solved by \(\vec{a}_{t}=e^{tG_{k}}\vec{a}\) which lies in \({\mathbb{R}}^{k+1}\). As we shall see in Remark 3.20 and Example 3.21, condition (ii) of Theorem 3.4 is always satisfied in the finite-dimensional case. Since \(L_{k}\) is a bounded operator, continuity of \((\vec{a}_{t})_{t\geq 0}\) is enough to guarantee condition (iii) of Theorem 3.4. We can thus conclude that

$$\begin{aligned} \mathbb{E}[{a}_{0}+ {a}_{1}\lambda _{T} +\cdots + {a}_{k} \lambda _{T}^{k} \,|\, \mathcal{F}_{t} ]&={a}_{T-t,0}+\ {a}_{T-t,1}\lambda _{t}+ \cdots + {a}_{T-t,k} \lambda ^{k}_{t} \\ &=(1,\lambda _{t},\ldots ,\lambda _{t}^{k})^{\top }e^{(T-t)G_{k}} \vec{a}. \end{aligned}$$
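To make this computation concrete, the following sketch implements the dual moment formula numerically for \(k=2\). It assumes the driftless Jacobi (Wright–Fisher) normalisation \(Lp(y)=\frac{1}{2}y(1-y)p''(y)\) (the exact normalisation of Example 2.8 is not reproduced here), and the helper `expm` is a minimal stand-in for a library matrix exponential.

```python
import numpy as np

def expm(A):
    # minimal matrix exponential (scaling and squaring with a Taylor series);
    # adequate for the small, well-behaved matrices used here
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, 1), 1.0)))) + 1)
    X, E, term = A / 2**s, np.eye(len(A)), np.eye(len(A))
    for n in range(1, 25):
        term = term @ X / n
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def G(k):
    # k-th dual operator of L p(y) = (1/2) y (1-y) p''(y) on coefficient
    # vectors a = (a_0, ..., a_k): since L y^j = (j(j-1)/2)(y^{j-1} - y^j),
    # (G_k a)_m = ((m+1)m/2) a_{m+1} - (m(m-1)/2) a_m
    M = np.zeros((k + 1, k + 1))
    for m in range(k + 1):
        M[m, m] = -0.5 * m * (m - 1)
        if m < k:
            M[m, m + 1] = 0.5 * (m + 1) * m
    return M

k, T, lam0 = 2, 1.3, 0.4
a = np.array([0.0, 0.0, 1.0])                # coefficients of p(y) = y^2
a_T = expm(T * G(k)) @ a                     # dual ODE solution a_T = e^{T G_k} a
cond_moment = np.array([1.0, lam0, lam0**2]) @ a_T   # E[lambda_T^2 | lambda_0 = lam0]
```

For \(p(y)=y^{2}\), the ODE can also be solved by hand under this normalisation, giving \(\vec{a}_{T}=(0,1-e^{-T},e^{-T})\) and hence \({\mathbb{E}}[\lambda _{T}^{2}\,|\,\lambda _{0}]=\lambda _{0}+(\lambda _{0}^{2}-\lambda _{0})e^{-T}\), against which the numerical output can be compared.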

3.2 Bidual moment formula

Let us now pass to the bidual moment formula which involves the bidual operator. As before, let \(L\colon P^{D}\to P\) be an \({\mathcal{S}}\)-polynomial operator and fix \(k\in {\mathbb{N}}_{0}\). Before stating the result, we need to introduce a notion of integration for \({B}^{\otimes k}\)-valued maps. One possibility is to use the notion of the Dunford integral (see e.g. Ryan [58, Sect. 3.3]).

Definition 3.6

Let \(\lambda :\Omega \to {B}^{\otimes k}\) be weakly measurable in the sense of Definition 3.1. We say that \(\lambda \) is Dunford-integrable if

$$ {\mathbb{E}}[|\langle {a},\lambda \rangle |]< \infty \qquad \text{and} \qquad {\mathbb{E}}[\langle {a},\lambda \rangle ]=\langle {a},m \rangle $$

for some \(m\in ({B}^{\otimes k})^{**}\) and all \({a}\in ({B}^{\otimes k})^{*}\). In this case, we write \({\mathbb{E}}[\lambda ]=m\).

For the bidual moment formula, we need the following weak solution concept.

Definition 3.7

Fix a \(k\)th dual operator \(L_{k}\) and a \(k\)th bidual operator \({M}_{k}\) corresponding to \(L\). Let \(\mathcal{H} \subseteq \mathcal{D}(L_{k})\). We call a function \(t \mapsto \vec{m}_{t}\) with values in \(\bigoplus _{j=0}^{k} ({B}^{\otimes j})^{**}\) an ℋ-weak solution of the \((k+1)\)-dimensional system of ODEs

$$ \partial _{t} \vec{m}_{t} = M_{k} \vec{m}_{t}, \qquad \vec{m}_{0}=\vec{m}, $$

if for every \(t >0\) and \(\vec{a} \in \mathcal{H}\), it holds that \((\vec{a} \cdot \vec{m}_{t}) = (\vec{a} \cdot \vec{m})+\int _{0}^{t} (L_{k} \vec{a} \cdot \vec{m}_{s})ds \).

Note that in contrast to Definition 3.3, we deal here with a truly weak solution concept since the adjoint operator \(L_{k}\) is involved. The next result is then the bidual moment formula.

Theorem 3.8

Let \(L\colon P^{D}\to P\) be an \({\mathcal{S}}\)-polynomial operator, fix \(k\in {\mathbb{N}}_{0}\), fix a \(k\)th dual operator \(L_{k}\) and a \(k\)th bidual operator \({M}_{k}\) corresponding to \(L\) and assume that \(L_{k}\) is closable with domain \(\mathcal{D}(L_{k})\). Let \((\lambda _{t})_{t \geq 0} \) be an \({\mathcal{S}}\)-valued polynomial process corresponding to \(L\). Suppose that

$$ \lambda _{t}^{\otimes j}\ \textit{is Dunford-integrable for all}\ j\in \{0,\ldots , k\}\ \textit{and}\ t >0 $$
(3.4)

and set \(\vec{m}_{t}:=(1,{\mathbb{E}}[\lambda _{t}],\ldots ,{\mathbb{E}}[ \lambda _{t}^{\otimes k}])\). Using the notation of (3.1) and (3.2), set

$$ \mathcal{E}:=\{\vec{a}\in {\mathcal{D}}(L_{k})\colon N^{p_{\vec{a}}} \textit{ is a true martingale on}\ [0, \infty )\}. $$

Then \((\vec{m}_{t})_{t\geq 0}\) is an ℰ-weak solution of the \((k+1)\)-dimensional system of linear ODEs given by

$$\begin{aligned} \partial _{t} \vec{m}_{t} = {M}_{k}\vec{m}_{t}, \qquad \vec{m}_{0}=(1, \lambda _{0},\ldots ,\lambda _{0}^{\otimes k}). \end{aligned}$$
(3.5)

Remark 3.9

Condition (3.4) explicitly reads as

$$ \sup \{{\mathbb{E}}[\langle {a}_{j},\lambda _{t}^{\otimes j}\rangle ] : {a}_{j}\in ({B}^{\otimes j})^{*}, \|{a}_{j}\|_{*j}\leq 1\}< \infty , \qquad j\in \{0,\ldots , k\}. $$

Since for \(k\) even, the map \({y}^{\otimes k}\mapsto \|{y}\|^{k}_{\times }\) can be extended to an element of \(({B}^{\otimes k})^{*}\), this condition is equivalent to \({\mathbb{E}}[\|\lambda _{t}\|^{k}]<\infty \). See also Remark 3.20.

Proof of Theorem 3.8

Fix \(\vec{a}\in \mathcal{E}\) and define \(p_{\vec{a}}\) and \(Lp_{\vec{a}}\) as in (3.2). Recall that by the definition of ℰ, we have that \(N^{p_{\vec{a}}}\) is a true martingale and thus

$$ {\mathbb{E}}[p_{\vec{a}}(\lambda _{t})] - p_{\vec{a}}(\lambda _{0}) - \int _{0}^{t} {\mathbb{E}}[Lp_{\vec{a}}(\lambda _{s})] ds =0. $$
(3.6)

Recall that \({\mathbb{E}}[(\vec{a}\cdot \overline{\lambda }_{t})]=(\vec{a}\cdot \vec{m}_{t})\) and \((\vec{a}\cdot \overline{\lambda }_{0})=(\vec{a}\cdot \vec{m}_{0})\). Moreover, by the definition of \(L_{k}\), we have \({\mathbb{E}}[Lp_{\vec{a}}(\lambda _{s})] ={\mathbb{E}}[(L_{k} \vec{a}\cdot \overline{\lambda }_{s})]=(L_{k}\vec{a}\cdot \vec{m}_{s}) \). Plugging those terms into (3.6) yields \((\vec{a} \cdot \vec{m}_{t}) = (\vec{a} \cdot \vec{m}_{0})+\int _{0}^{t} (L_{k} \vec{a} \cdot \vec{m}_{s})ds \) and thus the assertion. □

Remark 3.10

An inspection of the moment formulas gives the impression that the dual moment formula is more suitable for computing \({\mathbb{E}}[p(\lambda _{t})]\) for some fixed polynomial \(p \in P^{D}\). In practice, it can, however, happen that the system of linear ODEs given by (3.3) is harder to solve than its adjoint given by (3.5). The application presented in Sect. 4.4 is a clear instance of this situation. If this is the case, the bidual moment formula can be used to provide a heuristic ansatz for a solution \((\vec{a}_{t})_{t\geq 0}\) of the dual system of ODEs. Indeed, let \((\vec{m}_{t})_{t\geq 0}\) be a solution of the bidual ODE system and assume that both moment formulas hold. Then the relation \((\vec{a}_{t} \cdot \overline{\lambda }_{0})={\mathbb{E}}[(\vec{a}\cdot \overline{\lambda }_{t})|\lambda _{0}]=(\vec{a}\cdot \vec{m}_{t})\) holds and can be used as defining property for \((\vec{a}_{t})_{t\geq 0}\).

Remark 3.11

We are now in the position to explain why we decided to work with the potentially very large bidual space \((B^{\otimes k})^{**}\) instead of the space \(B^{\otimes k}\) itself (recall that \(B^{\otimes k}\) stands for the algebraic tensor product). As we shall see in the applications (see for instance Sect. 4.4), the solution \((\vec{m}_{t})_{t\geq 0}\) of (3.5) typically does not belong to \(B^{\otimes k}\). A possible alternative choice would be to work with the closure \(\overline{B^{\otimes k}}\) of \(B^{\otimes k}\) with respect to some crossnorm. This would, however, have two main disadvantages: first, elements of \(\overline{B^{\otimes k}}\setminus B^{\otimes k}\) are just abstractly defined, and checking if some \({y}_{k}\) is in \(\overline{B^{\otimes k}}\) is typically quite involved. Second, every element of \(\overline{B^{\otimes k}}\) corresponds to an element of \(({B^{\otimes k}})^{**}\), which implies that requiring that \({y}_{k}\in ({B^{\otimes k}})^{**}\) is less restrictive than requiring that \({y}_{k}\in \overline{B^{\otimes k}}\).

Observe that Theorem 3.8 does not guarantee that a solution of the given system of ODEs coincides with the deterministic process \((1,{\mathbb{E}}[\lambda _{t}],\ldots ,{\mathbb{E}}[\lambda _{t}^{ \otimes k}])\). Indeed, this result can fail if the solution of (3.5) is not unique. Let us here state some precise conditions under which this correspondence holds true. Since this result is a direct consequence of Theorem 3.8, we omit the proof.

Corollary 3.12

Consider two ℰ-weak solutions \((\vec{m}_{t}^{1})_{t \geq 0}\), \((\vec{m}_{t}^{2})_{t \geq 0}\) of (3.5).

(i) Suppose that strong uniqueness holds, i.e., \(\vec{m}_{t}^{1}= \vec{m}_{t}^{2}\) for all \(t \geq 0\). Then \(\vec{m}_{t}^{1}=\vec{m}_{t}^{2}= (1,{\mathbb{E}}[\lambda _{t}],\ldots ,{ \mathbb{E}}[\lambda _{t}^{\otimes k}]) \).

(ii) Fix \({\mathcal{H}}\subseteq {\mathcal{E}}\) and suppose that ℋ-weak uniqueness holds, i.e., \((\vec{a} \cdot \vec{m}_{t}^{1})= (\vec{a} \cdot \vec{m}_{t}^{2})\) for all \(\vec{a} \in {\mathcal{H}}\) and \(t \geq 0\). Then \((\vec{a} \cdot \vec{m}^{1}_{t})=(\vec{a} \cdot \vec{m}^{2}_{t})={ \mathbb{E}}[(\vec{a} \cdot \overline{\lambda }_{t})] \) for each \(\vec{a} \in {\mathcal{H}}\).

Remark 3.13

Suppose that the martingale problem for \(L\) is well posed, implying that the corresponding semigroup \((P_{t})_{t \geq 0}\) is well defined on \(P^{D}\). Assume that it can be uniquely extended to \(P\). Then \(P_{t}p_{\vec{a}}(\lambda )=\mathbb{E}[p_{\vec{a}}(\lambda _{t})]\) uniquely solves the abstract Cauchy problem given by

$$\begin{aligned} \partial _{t} u(t,\lambda )&= \overline{L} u(t, \lambda ),\qquad t \geq 0, \\ u(0,\lambda )&=p_{\vec{a}}(\lambda )= (\vec{a} \cdot \overline{\lambda }), \end{aligned}$$
(3.7)

where \(\overline{L}\) denotes the extension of \(L\) as generator of \((P_{t})_{t \geq 0}\). Let now \(\vec{m}_{t}\) be an ℰ-weak solution of (3.5) with \(\vec{m}_{0}=\overline{\lambda }\). Then ℰ-weak uniqueness holds, as \((\vec{a} \cdot \vec{m}_{t})\) solves (3.7) and by uniqueness of the solution to the Cauchy problem, this must be equal to \(P_{t}p_{\vec{a}}(\lambda )\).

The next corollary provides other sufficient conditions under which any solution \((\vec{m}_{t})_{t \geq 0}\) of (3.5) satisfies \(\mathbb{E}[ (\vec{a}\cdot \overline{\lambda }_{T})]= (\vec{a} \cdot \vec{m}_{T})\) for \(\vec{a}\in \mathcal{D}(L_{k})\). Recall the notation \(\overline{{\mathcal{S}}}_{k}:=\{\overline{{y}}=(1,{y},\ldots ,{y}^{ \otimes k})\colon {y}\in {\mathcal{S}}\}\).

Corollary 3.14

Let \(L\colon P^{D}\to P\) be an \({\mathcal{S}}\)-polynomial operator, fix \(k\in {\mathbb{N}}_{0}\) and \(T>0\) and fix a closable \(k\)th dual operator \(L_{k}\) with domain \({\mathcal{D}}(L_{k})\) and a \(k\)th bidual operator \({M}_{k}\) corresponding to \(L\). Let \((\lambda _{t})_{t \geq 0} \) be an \({\mathcal{S}}\)-valued polynomial process corresponding to \(L\) and fix \(\vec{a}\in {\mathcal{D}}(L_{k})\). Suppose that the following conditions hold:

(i) Let \((\vec{a}_{t})_{t\in [0,T]}\) be an \(\overline{{\mathcal{S}}}_{k}\)-solution of (3.3) with \(\vec{a}_{0}=\vec{a}\) such that the conditions of Theorem 3.4 are satisfied, and set \({\mathcal{R}}:=\{\vec{a}_{t}\colon t\in [0,T]\}\).

(ii) There is an ℛ-weak solution \((\vec{m}_{s})_{s \in [0,T]}\) in the sense of Definition 3.7 to the \((k+1)\)-dimensional system of linear ODEs given by (3.5).

(iii) \((\vec{a}_{t})_{t\in [0,T]}\) can be paired with \((\vec{m}_{s})_{s \in [0,T]}\), that is, \((\vec{a}_{t})_{t\in [0,T]}\) additionally satisfies, for all \(t,s \in [0,T]\),

$$ (\vec{a}_{t} \cdot \vec{m}_{s}) = (\vec{a}\cdot \vec{m}_{s})+\int _{0}^{t} (L_{k} \vec{a}_{u} \cdot \vec{m}_{s})du. $$
(iv) \(\int _{0}^{T}\int _{0}^{T} |(L_{k} \vec{a}_{s} \cdot \vec{m}_{t})| ds dt< \infty \).

Then \(\mathbb{E}[ (\vec{a}\cdot \overline{\lambda }_{T})]= (\vec{a}\cdot \vec{m}_{T})\).

Proof

We first show that \((\vec{a}_{T} \cdot \vec{m}_{0})=(\vec{a}\cdot \vec{m}_{T})\) and then apply the dual moment formula. Set \(f(s,t):=(\vec{a}_{s}\cdot \vec{m}_{t})\) and \(F(s,t):=( L_{k} \vec{a}_{s} \cdot \vec{m}_{t})\). Due to conditions (i), (ii) and (iii), we have \(f(\overline{s}, t) -f(\underline{s}, t)=\int _{\underline{s}}^{ \overline{s}}F(s,t) ds \) and \(f( s, \overline{t}) -f( s, \underline{t})=\int _{\underline{t}}^{ \overline{t}}F(s,t) dt \). This together with condition (iv) and Ethier and Kurtz [32, Lemma 4.4.10] then yields

$$ f(T,0)= (\vec{a}_{T} \cdot \vec{m}_{0})=(\vec{a}\cdot \vec{m}_{T})=f(0,T). $$

Since \(\vec{m}_{0}=(1,\lambda _{0},\ldots ,\lambda _{0}^{\otimes k})\) and \(\mathbb{E}[ (\vec{a}\cdot \overline{\lambda }_{T})]= (\vec{a}_{T} \cdot \overline{\lambda }_{0})\) by Theorem 3.4, the claim follows. □
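The key step \(f(T,0)=f(0,T)\) becomes transparent in the finite-dimensional setting of Examples 3.5 and 3.16, where \(\vec{a}_{s}=e^{sG_{k}}\vec{a}\) and \(\vec{m}_{t}=e^{tG_{k}^{\top }}\vec{m}_{0}\), so that \(f(s,t)=\vec{a}^{\top }e^{(s+t)G_{k}^{\top }}\vec{m}_{0}\) depends on \(s+t\) only. The sketch below checks this numerically for the matrix \(G_{2}\) of a driftless Jacobi diffusion; the normalisation \(Lp(y)=\frac{1}{2}y(1-y)p''(y)\) is an assumption here, and `expm` is a minimal stand-in for a library routine.

```python
import numpy as np

def expm(A):
    # minimal matrix exponential via scaling and squaring with a Taylor series
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, 1), 1.0)))) + 1)
    X, E, term = A / 2**s, np.eye(len(A)), np.eye(len(A))
    for n in range(1, 25):
        term = term @ X / n
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

# G_2 for L p(y) = (1/2) y (1-y) p''(y) acting on coefficient vectors (a_0, a_1, a_2)
G = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, -1.0]])
a = np.array([0.2, -1.0, 3.0])     # an arbitrary coefficient vector
m0 = np.array([1.0, 0.4, 0.16])    # (1, lambda_0, lambda_0^2) for lambda_0 = 0.4
T = 0.9

def f(s, t):
    # f(s,t) = (a_s . m_t) with a_s = e^{s G} a and m_t = e^{t G^T} m0
    return (expm(s * G) @ a) @ (expm(t * G.T) @ m0)
```

Since the two exponentials commute, \(f\) is constant along antidiagonals \(s+t=\mathrm{const}\), which is exactly the content of Ethier and Kurtz [32, Lemma 4.4.10] in this example.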

Remark 3.15

Consider the setting of Corollary 3.14 and let \((\vec{a}_{t})_{t\in [0,T]}\) be given by (i). A deterministic process \((\vec{m}_{t})_{t\in [0,T]} \) then satisfies conditions (ii) and (iii) if and only if \((\vec{a}_{s} \cdot \vec{m}_{t} )\) is absolutely continuous in \(s\) and \(t\) and satisfies

$$\begin{aligned} \partial _{t} (\vec{a}_{s} \cdot \vec{m}_{t}) &= ( L_{k} \vec{a}_{s} \cdot \vec{m}_{t}), \qquad \vec{m}_{0}=(1,\lambda _{0},\ldots ,\lambda _{0}^{\otimes k}), \\ \partial _{s} (\vec{a}_{s} \cdot \vec{m}_{t}) &= ( L_{k} \vec{a}_{s} \cdot \vec{m}_{t}), \qquad \vec{a}_{0}=\vec{a}. \end{aligned}$$

Example 3.16

Consider again the setting of Example 3.5. We now illustrate how Theorem 3.8 and Corollary 3.14 can be applied in this setting. Since \({\mathbb{E}}[\lambda _{t}^{j}] < \infty \) for each \(j\), we can set \(\vec{m}_{t}:=(1,{\mathbb{E}}[\lambda _{t}],\ldots ,{\mathbb{E}}[ \lambda _{t}^{k}])\) and the conditions of Theorem 3.8 are satisfied. We thus get that \(\vec{m}_{t}\) is a solution to the system of linear ODEs given by \(\partial _{t} \vec{m}_{t} = {M}_{k}\vec{m}_{t} \), for \(\vec{m}_{0}=(1,\lambda _{0},\ldots ,\lambda _{0}^{k}) \). Since also the conditions of Corollary 3.14 are satisfied (or more directly, since this system has a unique solution), we conclude that

$$ (1,{\mathbb{E}}[\lambda _{t}],\ldots ,{\mathbb{E}}[\lambda _{t}^{k}])^{\top }=e^{tG_{k}^{\top }}(1,\lambda _{0},\ldots ,\lambda _{0}^{k})^{\top }. $$
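As a numerical sanity check of the bidual formula, one can compare the solution of the forward moment ODE with Monte Carlo estimates from simulated paths. The sketch below assumes driftless Jacobi (Wright–Fisher) dynamics \(d\lambda _{t}=\sqrt{\lambda _{t}(1-\lambda _{t})}\,dW_{t}\); the Euler scheme with clipping to \([0,1]\) is a crude but sufficient approximation for this illustration.

```python
import numpy as np

# forward (bidual) moments for k = 2: m1 is conserved and m2' = m1 - m2
# (since L y^2 = y - y^2), so m_t = (1, lam0, lam0 + (lam0^2 - lam0) e^{-t})
# solves d/dt m_t = G_2^T m_t
lam0, t = 0.4, 0.5
m_t = np.array([1.0, lam0, lam0 + (lam0**2 - lam0) * np.exp(-t)])

# Monte Carlo check via an Euler scheme for d lambda = sqrt(lambda(1-lambda)) dW
rng = np.random.default_rng(0)
n_paths, n_steps = 100_000, 1_000
dt = t / n_steps
lam = np.full(n_paths, lam0)
for _ in range(n_steps):
    lam = lam + np.sqrt(np.clip(lam * (1 - lam), 0.0, None)) \
        * rng.normal(0.0, np.sqrt(dt), n_paths)
    lam = np.clip(lam, 0.0, 1.0)   # keep paths in [0,1]; the boundaries absorb
mc = np.array([1.0, lam.mean(), (lam ** 2).mean()])
```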

3.3 Some practical conditions for applying Theorem 3.4

We provide here some sufficient conditions which imply (ii) and (iii) of Theorem 3.4. Throughout the section, we let \(L\colon P^{D}\to P\) be an \({\mathcal{S}}\)-polynomial operator, fix \(k\in {\mathbb{N}}_{0}\) and let \(L_{k}\) be a closable \(k\)th dual operator with domain \({\mathcal{D}}(L_{k})\) corresponding to \(L\). We also let \((\lambda _{t})_{t \geq 0} \) be a polynomial process corresponding to \(L\) and assume that \((\vec{a}_{t})_{t\geq 0}\) is an \(\overline{{\mathcal{S}}}_{k}\)-solution to (3.3) in the sense of Definition 3.3. A first intuitive sufficient condition that implies conditions (ii) and (iii) of Theorem 3.4 is provided in the following lemma. This result is particularly convenient when the set \({\mathcal{S}}\) (and thus the polynomial process \((\lambda _{t})_{t \geq 0} \)) is bounded, as in the probability-measure-valued setting (see Sect. 4.3).

Lemma 3.17

If \({\mathbb{E}}[\sup _{t\leq T}\|\lambda _{t}\|^{k}]<\infty \), then condition (ii) of Theorem 3.4 is satisfied. In this case, condition (iii) of the same theorem is implied by

$$ \int _{0}^{T}\| {L_{k}^{j}}\vec{a}_{s}\|_{*j}ds< \infty \qquad \textit{for all } j\in \{0,\ldots ,k\}. $$
(3.8)

Proof

Set \(\Lambda _{k}:=(1+\sup _{t\in [0,T]}\|\lambda _{t}\|^{k})\) and define \(p_{\vec{a}}\) and \(Lp_{\vec{a}}\) as in (3.2). Since for each \(\vec{a}\in {\mathcal{D}}(L_{k})\) it holds that

$$ \sup _{t\in [0,T]}|p_{\vec{a}}(\lambda _{t})|\leq \Lambda _{k}\sum _{j=0}^{k} \| {a}_{j}\|_{*j}, \qquad \sup _{t\in [0,T]}| Lp_{\vec{a}}(\lambda _{t})| \leq \Lambda _{k}\sum _{j=0}^{k}\| {L_{k}^{j}}\vec{a}\|_{*j}, $$
(3.9)

we see that \({\mathbb{E}}[\sup _{t\leq T}\|\lambda _{t}\|^{j}]< \infty \) for all \(j\in \{0,\ldots ,k\}\) guarantees that the process \(N^{p}\) given by (3.1) is a true martingale for each \(p\in P^{D}\). The same bound together with the dominated convergence theorem can be used to prove that \(N^{p_{\vec{a}}}\) is a true martingale for each \(\vec{a}\in {\mathcal{D}}(L_{k})\). The second part of the statement follows from (3.9). □

We now move to a different condition which in particular guarantees that condition (ii) is always satisfied in the classical cases; see Example 3.21 below.

Definition 3.18

(i) We say that \(p\in P^{D}\) is \((C,q)\)-bounded on \({\mathcal{S}}\) if there exist a constant \(C >0\) and a polynomial \(q\in P^{D}\) such that on \({\mathcal{S}}\), we have

$$ p^{2}\leq C q,\qquad (Lp)^{2}\leq C q,\qquad |L q|\leq C q. $$

(ii) A general \(p \in P\) is also called \((C,q)\)-bounded on \({\mathcal{S}}\) if \((p,Lp)\) can be approximated by a sequence \(((p_{n},Lp_{n}))_{n\in {\mathbb{N}}}\) with \(p_{n}\in P^{D}\) being \((C,q)\)-bounded in the sense of (i).

The reason why the property in Definition 3.18 is so important can be seen from the next lemma.

Lemma 3.19

Fix \(\vec{a}\in {\mathcal{D}}(L_{k})\), define \(p_{\vec{a}}\) and \(Lp_{\vec{a}}\) as in (3.2) and suppose that \(p_{\vec{a}}\) is \((C,q)\)-bounded for some \(C >0\) and \(q\in P^{D}\) with \(q(\lambda _{0})=1\). Then for all \(t\geq 0\),

$$ {\mathbb{E}}[q(\lambda _{t})]\leq e^{ C t},\qquad {\mathbb{E}}[p_{ \vec{a}}(\lambda _{t})^{2}], {\mathbb{E}}\big[\big(Lp_{\vec{a}}( \lambda _{t})\big)^{2}\big]\leq Ce^{ C t}, \qquad {\mathbb{E}}\Big[ \sup _{s\leq t}(N_{s}^{p_{\vec{a}}})^{2}\Big]< \infty . $$

In this case, the process \(N^{p_{\vec{a}}}\) is a square-integrable martingale.

Proof

For \(p_{\vec{a}}\in P^{D}\), the proof follows that of Cuchiero et al. [25, Theorem 2.10], where the stopping times are chosen to be localising sequences for \(N^{q}\) and \(N^{p_{\vec{a}}}\) and the role of \(F\) there is now taken by \(q\). For \(p_{\vec{a}}\notin P^{D}\), the claim follows by the dominated convergence theorem. Indeed, by (3.1), \((C,q)\)-boundedness yields

$$ {\mathbb{E}}[(N^{p}_{t})^{2}]={\mathbb{E}}\Big[\lim _{n\to \infty }(N^{p_{n}}_{t})^{2} \Big] \leq 3C{\mathbb{E}}\bigg[q(\lambda _{t})+q(\lambda _{0})+t \int _{0}^{t} q(\lambda _{u})du\bigg]< \infty , $$

where the last inequality uses that \(q(\lambda _{t})\) is integrable due to the first part of the proof. Analogously, observe that \(|N^{p_{n}}_{t}-N^{p_{n}}_{s}|\) is dominated by the integrable quantity \(2C(q(\lambda _{t})+q(\lambda _{s})+(t-s) \int _{0}^{t} q(\lambda _{u})du)\). The dominated convergence theorem hence yields

$$ {\mathbb{E}}[(N^{p}_{t}-N^{p}_{s})1_{A}] =\lim _{n\to \infty }{ \mathbb{E}}[(N^{p_{n}}_{t}-N^{p_{n}}_{s})1_{A}]=0 $$

for all \(A\in {\mathcal{F}}_{s}\), proving the martingale property. □

Remark 3.20

Note that in the above definition, the pair \((C,q)\) can always depend on the polynomial \(p\). In classical cases (see Example 3.21 below), we can, however, choose \(q\) uniformly for all polynomials up to degree \(m\), say. Indeed, there typically exists a polynomial \(q_{m}\in P^{D}\) such that each \(p\in P^{D}\) with \(\deg (p)\leq m\) is \((C_{p},q_{m})\)-bounded for some constant \(C_{p}\) that can depend on the polynomial \(p\) (and we make this dependence now explicit). More precisely, in such cases, there is a \(q_{m}\in P^{D}\) satisfying

$$ 1+\|{y}\|^{2m}\leq K q_{m}({y})\qquad \text{for all } {y}\in { \mathcal{S}}, $$
(3.10)

for some constant \(K\). Since each polynomial \(p\in P\) with \(\deg (p)\leq m\) satisfies \(p({y})^{2}\leq K_{p}(1+ \|{y}\|^{2m})\) for some constant \(K_{p}\), condition (3.10) implies that \(p_{\vec{a}}\) is \((C_{p},q_{m})\)-bounded for each \(\vec{a}\in {\mathcal{D}}(L_{k})\), where \(C_{p}=(K_{p}+K_{Lp}+K_{q_{m}})K\).

Observe that condition (3.10) has two important consequences. First, it guarantees that \(N^{p_{\vec{a}}}\) is a square-integrable martingale for each \(\vec{a}\in {\mathcal{D}}(L_{k})\), and thus condition (ii) is satisfied. Second, since

$$ Lp_{\vec{a}_{s}}({y})=\sum _{j=0}^{k}\langle {L_{k}^{j}}\vec{a}_{s},{y}^{ \otimes j}\rangle \leq (1+\|{y}\|^{2k})\sum _{j=0}^{k}\| {L_{k}^{j}} \vec{a}_{s}\|_{*j} \leq Kq_{k}({y})\sum _{j=0}^{k}\| {L_{k}^{j}} \vec{a}_{s}\|_{*j}, $$

one can see that condition (iii) in Theorem 3.4 is implied by (3.8) also under (3.10).

Example 3.21

As mentioned before, condition (3.10) is always satisfied in the classical cases. Examples include the case where \({B}\) is finite-dimensional (where we can take \(q_{m}({y}):=1+\sum _{i} {y}_{i}^{2m}\)), the case where \({\mathcal{S}}\) is the set of finite positive measures on some underlying space \(E\) (where we can take \(q_{m}({y}):=1+{y}(E)^{2m}\), where \(y(E)\) denotes the total mass of \({y}\)), and the case where \({\mathcal{S}}\) is bounded (where we can take \(q_{m}(y):=1\); see for instance Sect. 4.3).

In the general case, however, \((C,q)\)-boundedness on \({\mathcal{S}}\) need not hold for each \(p\in P^{D}\).

Lemma 3.22

Suppose that \((\vec{a}_{t})_{t\geq 0}\) satisfies condition (i) of Theorem 3.4. If \(p_{\vec{a}_{s}}\) is \((C,q_{s})\)-bounded for each \(s\in [0,T]\) and some family of polynomials \((q_{s})_{s \in [0,T]}\), then conditions (ii) and (iii) of Theorem 3.4 are satisfied.

Proof

Condition (ii) follows from Lemma 3.19. By the same result and the Cauchy–Schwarz inequality, we can estimate

$$ \sup _{s,u\in [0,T]} |{\mathbb{E}}[ Lp_{\vec{a}_{s}}(\lambda _{u})] | \leq \sup _{u\in [0,T]} (1+Ce^{ Cu}) < \infty , $$

proving that condition (iii) is satisfied as well. □

4 Examples and applications

This section is devoted to showing the connection of our general setup with some examples from the literature as well as the wide applicability of the previously derived moment formulas. The generality under which our results are stated allows us to cover all kinds of state spaces in which properties such as the UMD property (needed for stochastic integration), reflexivity or separability fail to hold. Important examples of Banach spaces, e.g. \(\ell ^{\infty }\) or \(L^{\infty }\), satisfy none of these properties. For the space of continuous functions (on compacts) and its dual given by the finite signed measures, the first two properties fail to hold. Measure-valued processes play an important role as illustrated in Sect. 4.3. When passing to densities, we encounter \(L^{1}\), another example in which reflexivity and the UMD property fail.

Our goal here is in particular to consider applications in finance. We therefore devote a large part to forward variance modelling and to the computation of the expected signature, which already plays an important role in finance and machine learning (see e.g. Levin et al. [53]).

4.1 Generic polynomial operators

We start by introducing generic polynomial operators of Lévy type (see also Larsson and Svaluto-Ferro [52, Sect. 4]). To do so, we briefly recall the notion of the Fréchet derivative.

Definition 4.1

Let \(({B},\|{\,\cdot \,}\|)\) be a Banach space. A map \(f:{B}\to {\mathbb{R}}\) is said to be Fréchet-differentiable at \({y}\in {B}\) if

$$ \lim _{\|\widetilde{{y}}\|\to 0} \frac{|f({y}+\widetilde{{y}})-f({y})-\langle \partial f({y}),\widetilde{{y}}\rangle |}{\|\widetilde{{y}}\|}=0 $$

for some \(\partial f({y})\in {B}^{*}\). Analogously, whenever it exists, we denote by \(\partial ^{k} f({y})\) the element of \(({B}^{\otimes k})^{*}\) corresponding to the \(k\)th iterated Fréchet derivative of \(f\) at \({y}\).

Observe in particular that for every sufficiently regular function \(\phi :{\mathbb{R}}^{d}\to {\mathbb{R}}\) and any \({a}_{1},\ldots ,{a}_{d} \in {B}^{*}\), setting \(p({y}):=\phi (\langle {a}_{1},{y}\rangle ,\ldots ,\langle {a}_{d},{y} \rangle )\) yields that \(p\) is Fréchet-differentiable at each \({y}\) in \({B}\) and

$$\partial ^{n} p({y})=\sum _{i_{1},\ldots , i_{n}=1}^{d} \phi _{i_{1}, \dots , i_{n}} (\langle {a}_{1},{y}\rangle ,\ldots ,\langle {a}_{d},{y} \rangle ) {a}_{i_{1}}\otimes \cdots \otimes {a}_{i_{n}}, $$

where \(\phi _{i_{1},\dots , i_{n}}(x):= \frac{\partial ^{n}\phi }{\partial x_{i_{1}}\cdots \partial x_{i_{n}}} (x)\). This implies in particular that for each linear subspace \(D\subseteq B^{*}\) and each \(p\in P^{D}\), we have \(\partial ^{n} p({y})\in D^{\otimes n}\) for all \({y}\in {B}\). With the notion of the Fréchet derivative at hand, we can now describe the generic form of polynomial operators \(L:P^{D}\to P\).
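The chain rule above is easy to test numerically in a finite-dimensional space. The following sketch compares the stated formula for \(\partial p\) with central finite differences for the (arbitrarily chosen) cylindrical polynomial \(p(y)=\langle a_{1},y\rangle ^{2}\langle a_{2},y\rangle \) on \({\mathbb{R}}^{5}\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
a1, a2, y = rng.normal(size=(3, n))   # two functionals and a base point in R^5

def p(v):
    # cylindrical polynomial p(v) = phi(<a1,v>, <a2,v>) with phi(x1,x2) = x1^2 x2
    x1, x2 = a1 @ v, a2 @ v
    return x1**2 * x2

# the formula: grad p(y) = phi_1(...) a1 + phi_2(...) a2 = 2 x1 x2 a1 + x1^2 a2
x1, x2 = a1 @ y, a2 @ y
grad_formula = 2 * x1 * x2 * a1 + x1**2 * a2

# central finite differences approximating the Frechet derivative at y
h = 1e-5
grad_fd = np.array([(p(y + h * e) - p(y - h * e)) / (2 * h) for e in np.eye(n)])
```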

Lemma 4.2

Let \({\mathcal{S}}\subseteq {B}\), fix a linear subspace \(D\subseteq {B}^{*}\) and let \(L: P^{D} \to P\) be a linear operator. Suppose that \(L\) acts on polynomials \(p\in P^{D}\) by

$$ Lp({y}) = B\big(\partial p({y}), {y}\big) + \frac{1}{2} Q\big(\partial ^{2}p({y}), {y}\big) +\! \int _{{\mathcal{S}}}\big(p({z})-p({y})-\langle \partial p({y}),{z}-{y} \rangle \big) N({y},d{z}) $$

for each \({y}\in {\mathcal{S}}\), where the involved parameters are as follows:

\(B({\,\cdot \,},{y})\) is a linear operator from \(D\) to ℝ, and \(B({a},\cdot )\) is a polynomial of degree at most 1 on \({B}\) for each \({a}\in D\).

\(N({y},d{z})\) is a measure on \({\mathcal{S}}\) such that \(\int _{{\mathcal{S}}}\langle {a},{y}-{z}\rangle ^{k} N({y},d{z})\) is a polynomial of degree at most \(k\) on \(B\) for each \({a}\in D\) and \(k\in \{3,4,\ldots \}\).

\(Q({\,\cdot \,},{y})\) is a linear operator from \(D\otimes D\) to ℝ, \(Q({a}\otimes {a},{y})\geq 0\) and

$$ y \mapsto Q({a}\otimes {a},{y})+\int _{{\mathcal{S}}}\langle {a},{y}-{z} \rangle ^{2} N({y},d{z}) $$

is a polynomial of degree at most 2 on \({B}\) for each \({a}\in D\).

Then \(L\) is \({\mathcal{S}}\)-polynomial. Moreover, the drift, diffusion and jump behaviour of the corresponding \({\mathcal{S}}\)-valued polynomial process \((\lambda _{t})_{t\geq 0}\) is governed by these objects. This means that for each \({a}_{1},\ldots ,{a}_{d}\in D\), the \({\mathbb{R}}^{d}\)-valued process

$$ \big((\langle {a}_{1},\lambda _{t}\rangle ,\ldots ,\langle {a}_{d}, \lambda _{t}\rangle )\big)_{t\geq 0} $$

is a semimartingale whose characteristics \((B^{\vec{a}},\widetilde{C}^{\vec{a}},\nu ^{\vec{a}})\) (with \(\widetilde{C}^{\vec{a}}\) denoting the modified second characteristic in the sense of Jacod and Shiryaev [45, Sect. II.2.]) satisfy

$$\begin{aligned} B^{\vec{a}}_{t,i}&=\int _{0}^{t} B({a}_{i}, \lambda _{s})ds, \\ \widetilde{C}^{\vec{a}}_{t,ij}&=\int _{0}^{t} \bigg(Q({a}_{i}\otimes {a}_{j}, \lambda _{s}) \\ &\phantom{=:}\qquad\, +\int _{{\mathcal{S}}}\langle {a}_{i},\lambda _{s}-{z} \rangle \langle {a}_{j},\lambda _{s}-{z}\rangle N(\lambda _{s},d{z}) \bigg)ds, \\ \int \xi _{1}^{k_{1}}\cdots \xi _{d}^{k_{d}}\ \nu ^{\vec{a}}(dt, d \xi )&=dt\int _{\mathcal{S}}\langle {a}_{1},{z}-\lambda _{t}\rangle ^{k_{1}} \cdots \langle {a}_{d},{z}-\lambda _{t}\rangle ^{k_{d}}\ N(\lambda _{t},d{z}) \end{aligned}$$

for all \(k_{1},\ldots ,k_{d}\in {\mathbb{N}}_{0}\) such that \(\sum _{j=1}^{d} k_{j}\geq 3\).

Proof

Setting \(p({y}):=\langle {a},{y}\rangle ^{k}\) and noting that

$$\begin{aligned} Lp({y}) &= k\langle {a},{y}\rangle ^{k-1}B({a}, {y}) \\ & \phantom{=:}+ \frac{k(k-1)}{2} \bigg(Q({a}\otimes {a}, {y})+ \int _{{\mathcal{S}}} \langle {a},{z}-{y}\rangle ^{2} N({y},d{z})\bigg)\langle {a},{y} \rangle ^{k-2} \\ &\phantom{=:}+\sum _{\ell =3}^{k}\binom{k}{\ell }\langle {a},{y}\rangle ^{k-\ell } \int _{{\mathcal{S}}}\langle {a},{z}-{y}\rangle ^{\ell }N({y},d{z}), \end{aligned}$$

the first part of the claim follows. For the second part, we apply Itô’s formula to the process \((p(\langle {a}_{1},\lambda _{t}\rangle ,\ldots ,\langle {a}_{d}, \lambda _{t}\rangle ))_{t\geq 0}\) for all polynomials \(p\). Note that this is justified since \(((\langle {a}_{1},\lambda _{t}\rangle ,\ldots ,\langle {a}_{d}, \lambda _{t}\rangle ))_{t \geq 0}\) is a semimartingale due to (3.1). Comparing the obtained representation with (3.1) inductively over the degree of \(p\) yields the result. □
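The expansion of \(Lp\) above rests on the binomial identity \(v^{k}-u^{k}-ku^{k-1}(v-u)=\sum _{\ell =2}^{k}\binom{k}{\ell }u^{k-\ell }(v-u)^{\ell }\) applied to the scalars \(u=\langle {a},{y}\rangle \) and \(v=\langle {a},{z}\rangle \). A quick symbolic check of this identity (our own sketch, not part of the proof):

```python
import sympy as sp

u, v = sp.symbols('u v')  # u = <a,y>, v = <a,z>
for k in range(2, 9):
    # jump integrand for p(y) = <a,y>^k: p(z) - p(y) - <dp(y), z - y>
    lhs = v**k - u**k - k*u**(k - 1)*(v - u)
    # telescoped form used in the proof, summed over l = 2,...,k
    rhs = sum(sp.binomial(k, l)*u**(k - l)*(v - u)**l for l in range(2, k + 1))
    assert sp.expand(lhs - rhs) == 0
```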

4.2 Finite-dimensional setting

We show now how our results look in the finite-dimensional setting. Polynomial processes on a finite-dimensional space have been characterised in Cuchiero et al. [25]; see also Filipović and Larsson [36]. The particular example of the Jacobi diffusion with vanishing drift has already been presented in Examples 2.4, 2.8, 3.5 and 3.16.

Let \({B}={\mathbb{R}}^{d}\), \({\mathcal{S}}\subseteq {\mathbb{R}}^{d}\), \(D={B}^{*}={\mathbb{R}}^{d}\) and observe that \(\bigoplus _{j=0}^{k}({ \mathbb{R}}^{d})^{\otimes j}={\mathbb{R}}^{N_{k}}\), where \(N_{k}\) denotes the dimension of the space \(P_{k}\) of polynomials on \({\mathbb{R}}^{d}\) up to degree \(k\). For simplicity, assume that \({\mathcal{S}}\) contains an open set so that there is a one-to-one correspondence between \(P_{k}\) and the space of polynomials on \({\mathcal{S}}\). Fix then an \({\mathcal{S}}\)-polynomial operator \(L:P\to P\), let \(H:=(h_{1},\ldots ,h_{N_{k}})^{\top }\) be a basis of \(P_{k}\), \(H(y)\) the vector of basis elements evaluated at \(y\), and let \(G_{k}\in {\mathbb{R}}^{N_{k}\times N_{k}}\) be the unique matrix such that

$$ L p_{\vec{a}}(y)= H(y)^{\top }G_{k} \vec{a}\qquad \text{for all } \vec{a}\in {\mathbb{R}}^{N_{k}}, $$

where \(p_{\vec{a}}(y)= H(y)^{\top }\vec{a}\). The \(k\)th dual operator \(L_{k}:{\mathbb{R}}^{N_{k}}\to {\mathbb{R}}^{N_{k}}\) corresponding to \(L\) is given by \(L_{k} \vec{a}:=G_{k} \vec{a}\), and the map \((\vec{a}_{t})_{t\geq 0}\) with \(\vec{a}_{t}=e^{tG_{k}}\vec{a}\) solves the system of linear ODEs given by (3.3) for \(\vec{a}_{0}=\vec{a}\). The dual moment formula (Theorem 3.4) then leads to the classical moment formula for finite-dimensional polynomial processes, namely

$$ {\mathbb{E}}[p_{\vec{a}}(\lambda _{T})|{\mathcal{F}}_{t}]=H( \lambda _{t})^{\top }e^{(T-t)G_{k}} \vec{a}. $$

On the other hand, we know by (2.6) that \(M_{k}\vec{y}=G_{k}^{\top }\vec{y}\) for all \(\vec{y}\in {\mathbb{R}}^{N_{k}}\). Since the map \((\vec{m}_{t})_{t\geq 0}\) with \(\vec{m}_{t}=e^{tG_{k}^{\top }}H(\lambda _{0})\) is in fact the unique solution (and ℰ-weak solution) of (3.5) for \(\vec{m}_{0}= H(\lambda _{0})\), the bidual moment formula (Theorem 3.8) yields

$$ {\mathbb{E}}[H(\lambda _{T})|\lambda _{0}]= e^{TG_{k}^{\top }}H(\lambda _{0}). $$

This result generalises to \({\mathbb{E}}[H(\lambda _{T})|{\mathcal{F}}_{t}]= e^{(T-t)G_{k}^{\top }}H( \lambda _{t})\), as expected.
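As a toy illustration (our own numerical sketch, not an example from the paper): the driftless geometric Brownian motion \(dX_{t}=\sigma X_{t}dW_{t}\) is a polynomial diffusion on \({\mathbb{R}}\) with generator \(Lp(x)=\frac{1}{2}\sigma ^{2}x^{2}p''(x)\). For the monomial basis \(H=(1,x,x^{2})^{\top }\), the matrix \(G_{k}\) can be written down by hand, and \(e^{tG_{k}^{\top }}H(x_{0})\) reproduces the known moments \({\mathbb{E}}[X_{t}]=x_{0}\) and \({\mathbb{E}}[X_{t}^{2}]=x_{0}^{2}e^{\sigma ^{2}t}\).

```python
import numpy as np
from scipy.linalg import expm

sigma, x0, T = 0.3, 1.5, 2.0

# Basis H(x) = (1, x, x^2) and generator L p(x) = 0.5*sigma^2*x^2*p''(x):
# L1 = 0, Lx = 0, Lx^2 = sigma^2 * x^2, so Lp_a(y) = H(y)^T G a with
G = np.diag([0.0, 0.0, sigma**2])

# bidual moment formula: E[H(X_T) | X_0 = x0] = exp(T*G^T) H(x0)
moments = expm(T * G.T) @ np.array([1.0, x0, x0**2])

assert np.isclose(moments[1], x0)                            # martingale
assert np.isclose(moments[2], x0**2 * np.exp(sigma**2 * T))  # E[X_T^2]
```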

4.3 Probability-measure-valued setting

Probability-measure-valued polynomial diffusions have been studied extensively in Cuchiero et al. [26]. In that paper, the authors also develop conditions under which existence of solutions to the martingale problem is guaranteed.

Let \(({B},\|{\,\cdot \,}\|)\) be the space of finite signed measures on a Polish space \(E\) and let \(\|{\,\cdot \,}\|\) denote the total variation norm. Let \({\mathcal{S}}\) be the space of probability measures on \(E\) and \(D\) a dense subset of the space \(C_{b}(E)\) of continuous bounded functions on \(E\). Note that in this setting,

$$ \langle {a},{y}\rangle =\int {a}(x){y}(dx) \qquad \text{for all }{a} \in D,{y}\in {\mathcal{S}}. $$

Let then \(L:P^{D}\to P\) be an \({\mathcal{S}}\)-polynomial operator and observe that for each \(k\in {\mathbb{N}}_{0}\), there is a \(k\)th bidual operator \(L_{k}\) admitting the representation given in Remark 2.5 for some auxiliary operators \({\mathcal{L}}_{j}:D({\mathcal{L}}_{j})\to ({B}^{\otimes j})^{*}\). As in Cuchiero et al. [26], we additionally assume that \({\mathcal{L}}_{j} {a}\in C_{b}(E^{j})\) for each \({a}\in D({\mathcal{L}}_{j})\), where \(E^{j}\) is the \(j\)-fold product of \(E\).

Observe that since \({\mathcal{S}}\) is bounded, Remark 3.20 and Example 3.21 yield that condition (ii) of Theorem 3.4 holds true and condition (iii) of the same theorem is implied by (3.8). Assume now that there exists a \(C_{b}(E^{k})\)-valued (classical) solution \(({a}_{t})_{t\geq 0}\) of the \(k\)-dimensional PDE on \([0,T]\) given by

$$ \partial _{t} {a}_{t}( x) = {\mathcal{L}}_{k} \big({a}_{t}({\,\cdot \,}) \big)( x), \qquad {a}_{0}( x)= {a}( x), $$
(4.1)

satisfying (3.8). By Theorem 3.4, we can then conclude that

$$\begin{aligned} &\mathbb{E}\bigg[ \int {a}(x_{1},\ldots ,x_{k}) \lambda _{T}(dx_{1}) \cdots \lambda _{T}(dx_{k}) \,\bigg|\, \mathcal{F}_{t} \bigg] \\ &=\mathbb{E}[ \langle {a}, \lambda _{T}^{\otimes k} \rangle \,|\, \mathcal{F}_{t} ] \\ &= \langle {a}_{T-t}, \lambda ^{\otimes k}_{t}\rangle \\ &=\int {a}_{T-t}(x_{1},\ldots ,x_{k}) \lambda _{t}(dx_{1})\cdots \lambda _{t}(dx_{k}) \end{aligned}$$

for each polynomial process \((\lambda _{t})_{t\geq 0}\) corresponding to \(L\). This result coincides with the conclusion of Theorem 5.3 in Cuchiero et al. [26].

As explained in Cuchiero et al. [26, Remark 5.4], (4.1) can often be seen as the Kolmogorov backward equation corresponding to an \(E^{k}\)-valued process \(Z^{(k)}\) with generator \({\mathcal{L}}_{k}\). If this is the case, the process \((m_{t,k})_{t\geq 0}\) given by \(m_{t,k}:={\mathbb{E}}[\lambda _{t}^{\otimes k}]\) coincides with the law of \(Z_{t}^{(k)}\), and the equation given in (3.5) (formulated with \(\mathcal{M}_{k}\) as specified in Remark 2.9) is given by the Kolmogorov forward equation corresponding to \(Z^{(k)}\). We propose now a concrete example (see Cuchiero et al. [26, Example 4.4] for more details).

Example 4.3

Let \(D=C_{0}^{2}({\mathbb{R}})\) be the space of twice continuously differentiable functions on ℝ vanishing at infinity. The Fleming–Viot diffusion \((\lambda _{t})_{t\geq 0}\) was introduced by Fleming and Viot [39] and subsequently studied by several other authors (see e.g. Ethier and Kurtz [32, Chap. 10.4]). This process takes values in the space of probability measures on ℝ, again denoted by \({\mathcal{S}}\).

Recall that \(\partial p({y})\in D\) for all \(p\in P^{D}\) and \({y}\in {\mathcal{S}}\), which in particular means that \(\partial p({y})\) is a continuous bounded map on ℝ. Similarly, \(\partial ^{2} p({y})\in D^{\otimes 2}\) is a \(C_{0}\)-map on \({\mathbb{R}}^{2}\). The generator \(L\) of a Fleming–Viot diffusion acts on polynomials \(p\in P^{D}\) by

$$ Lp({y}) = \big\langle {\mathcal{G}}\big(\partial p({y}) \big), {y} \big\rangle + \frac{1}{2} \big\langle \Psi \big(\partial ^{2} p({y}) \big),{y}^{\otimes 2}\big\rangle , \qquad {y}\in {\mathcal{S}}, $$

where \({\mathcal{G}}:D\to C_{0}({\mathbb{R}})\) is given by \({\mathcal{G}}g:=\frac{1}{2} \sigma ^{2}g''\) for some \(\sigma \in {\mathbb{R}}\) and the map \(\Psi :D\otimes D\to C_{0}({\mathbb{R}}^{2})\) by \(\Psi {a}(x_{1},x_{2})=\frac{1}{2} ({a}(x_{1},x_{1})+{a}(x_{2},x_{2})-2{a}(x_{1},x_{2}))\). Observe that \(L\) is \({\mathcal{S}}\)-polynomial.

Now, using the representation introduced in Remark 2.5, we can see that we have \({\mathcal{L}}_{1}{a}(x) =\frac{1}{2} \sigma ^{2}{a}''(x)={ \mathcal{G}}{a}(x)\) and

$$ {\mathcal{L}}_{2}{a}( x) =\frac{1}{2} \sigma ^{2}\bigg( \frac{\partial ^{2}}{\partial x_{1}^{2}}{a}( x)+ \frac{\partial ^{2}}{\partial x_{2}^{2}}{a}( x)\bigg) + \int \big({a}( x+ \xi )-{a}( x)\big){N}( x,d \xi ), $$

where \({N}( x,d \xi )=\frac{1}{2} (\delta _{(0,x_{1}-x_{2})}(d \xi )+\delta _{(x_{2}-x_{1},0)}(d \xi ))\). One can see that \({\mathcal{L}}_{1}\) coincides with the generator of the real-valued diffusion \(Z^{(1)}\) given by \(Z^{(1)}=\sigma W\), where \((W_{t})_{t\geq 0}\) denotes a Brownian motion. Moreover, (4.1), which reads

$$ \partial _{t}{a}_{t}(x)=\frac{1}{2} \sigma ^{2}{a}_{t}''(x),\qquad {a}_{0}(x)= {a}(x), $$

coincides with the corresponding Kolmogorov backward equation and thus with the heat equation. Since it is solved by \({a}_{t}(x):={\mathbb{E}}[ {a}(Z_{t}^{(1)})|Z_{0}^{(1)}=x]\), Theorem 3.4 yields

$$ {\mathbb{E}}\bigg[\int {a}(x)\lambda _{T}(dx)\bigg|{\mathcal{F}}_{t} \bigg]={\mathbb{E}}[ {a}(Z_{T-t}^{(1)})|Z_{0}^{(1)}\sim \lambda _{t}]. $$

On the other hand, one can see that under the ansatz \(m_{t,1}(dx)=f_{t}(x)dx\), we have

$$ \int {\mathcal{L}}_{1} {a}(x)f_{t}(x)dx=\int \frac{1}{2} \sigma ^{2} {a}''(x)f_{t}(x)dx= \int {a}(x)\frac{1}{2} \sigma ^{2} f''_{t}(x)dx $$

for all \({a}\in C^{2}_{0}({\mathbb{R}})\), showing that the first bidual operator is given by

$$ {\mathcal{M}}_{1}\big(f_{t}(x)dx\big)=\frac{1}{2} \sigma ^{2} \big(f_{t}''(x)dx \big). $$

By Theorem 3.8, we thus conclude that the family of measures \({\mathbb{E}}[\lambda _{t}](dx):=f_{t}(x)dx\) indexed by \(t\) satisfies

$$ \partial _{t}f_{t}(x)=\frac{1}{2} \sigma ^{2}f_{t}''(x),\qquad f_{0}(x)dx= \lambda _{0}(dx), $$

which as expected coincides with the Kolmogorov forward equation for \(Z^{(1)}\). Since the conditions of Corollary 3.14 are satisfied and \({\mathbb{P}}[Z_{t}^{(1)}\in {\,\cdot \,}|Z_{0}^{(1)}\sim \lambda _{0}]\) solves the given PDE, we can conclude that \({\mathbb{E}}[\lambda _{t}]={\mathbb{P}}[Z_{t}^{(1)}\in {\,\cdot \,}|Z_{0}^{(1)}\sim \lambda _{0}]\).
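For a concrete test function, the backward (heat) equation can be solved in closed form: with \(a(x)=\cos (x)\), the representation \(a_{t}(x)={\mathbb{E}}[a(Z_{t}^{(1)})|Z_{0}^{(1)}=x]\) gives \(a_{t}(x)=\cos (x)e^{-\sigma ^{2}t/2}\). The sketch below (our own numerical check, with illustrative parameter values) verifies this Gaussian-smoothing representation by Gauss–Hermite quadrature:

```python
import numpy as np

sigma, t, x = 0.7, 1.3, 0.4

# a_t(x) = E[cos(x + sigma*W_t)] with W_t ~ N(0, t), computed by
# Gauss-HermiteE quadrature (weight exp(-u^2/2) on the real line)
nodes, weights = np.polynomial.hermite_e.hermegauss(60)
a_t = weights @ np.cos(x + sigma * np.sqrt(t) * nodes) / np.sqrt(2 * np.pi)

# closed-form solution of the heat equation with a_0 = cos
assert abs(a_t - np.cos(x) * np.exp(-sigma**2 * t / 2)) < 1e-8
```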

Let us now focus on \({\mathcal{L}}_{2}\). This operator coincides with the generator of a jump-diffusion \((Z_{t}^{(2)})_{t\geq 0}\) taking values in \(\mathbb{R}^{2}\). Between two jumps, this process moves like \((\sigma W_{t})_{t\geq 0}\), where \((W_{t})_{t\geq 0}\) now denotes a 2-dimensional Brownian motion. When a jump occurs after an exponential time with intensity 1, the process jumps either vertically or horizontally to the diagonal. Again, (4.1) coincides with the Kolmogorov backward equation corresponding to \(Z^{(2)}\) and the corresponding solution is given by \((t,x)\mapsto {a}_{t}(x):={\mathbb{E}}[ {a}(Z_{t}^{(2)})|Z_{0}^{(2)}=x]\). Theorem 3.4 then yields

$$ {\mathbb{E}}\bigg[\int {a}(x_{1},x_{2})\lambda _{T}(dx_{1})\lambda _{T}(dx_{2}) \bigg|{\mathcal{F}}_{t}\bigg]={\mathbb{E}}[ {a}(Z_{T-t}^{(2)})|Z_{0}^{(2)} \sim \lambda _{t}\otimes \lambda _{t}]. $$

Proceeding as before, we can use Corollary 3.14 to conclude that the unique probability measure \(m_{t,2}(dx):=f_{t}(x)dx\) supported on \(\mathbb{R}^{2}\) satisfying the Kolmogorov forward equation for \(Z^{(2)}\),

$$ \partial _{t}f_{t}(x)=\frac{1}{2} \sigma ^{2} \Delta f_{t}(x)+ \frac{1}{2}\int f_{t}(x)\big(\delta _{x_{2}}(dx_{1})+\delta _{x_{1}}(dx_{2}) \big)-f_{t}(x), $$

is given by \(m_{t,2}={\mathbb{E}}[\lambda _{t}\otimes \lambda _{t}]={\mathbb{P}}[Z_{t}^{(2)} \in {\,\cdot \,}|Z_{0}^{(2)}\sim \lambda _{0}\otimes \lambda _{0}]\).
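The dual process \(Z^{(2)}\) is straightforward to simulate directly. The sketch below (an Euler scheme of our own; the function name and parameter values are illustrative, not from the paper) draws rate-1 exponential jump clocks and, at each jump, moves one coordinate onto the diagonal with equal probability, matching the jump measure \(N(x,d\xi )\) of Example 4.3:

```python
import numpy as np

def simulate_Z2(z0, sigma, T, dt=1e-3, rng=None):
    """Euler scheme for the R^2-valued dual jump-diffusion Z^(2):
    2-d Brownian motion scaled by sigma between jumps; at rate-1 jump
    times, one coordinate (chosen with prob. 1/2 each) is replaced by
    the other, i.e., the process jumps onto the diagonal {x1 = x2}."""
    rng = np.random.default_rng(rng)
    z, t = np.array(z0, dtype=float), 0.0
    next_jump = rng.exponential(1.0)
    while t < T:
        z += sigma * np.sqrt(dt) * rng.standard_normal(2)
        t += dt
        while t >= next_jump:
            i = rng.integers(2)      # vertical or horizontal jump
            z[i] = z[1 - i]          # land on the diagonal
            next_jump += rng.exponential(1.0)
    return z

# with sigma = 0 the path only jumps, so it must end on the diagonal
z = simulate_Z2([1.0, -2.0], sigma=0.0, T=50.0, dt=0.01, rng=0)
assert z[0] == z[1]
```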

As a final remark, observe that in the case of Example 4.3, the bidual system of ODEs admits a fairly strong formulation. It is, however, well known (see for instance Figalli [33]) that forward Kolmogorov equations can be treated using a weak formulation, consistent with the notion of solution used in Theorem 3.8.

4.4 Polynomial forward variance curve models

This section is dedicated to introducing polynomial forward variance curve models. The main motivation for this class of models stems from the rather new paradigm of rough volatility (see e.g. Alòs et al. [7], Gatheral et al. [42] or Bayer et al. [8]). Rough volatility or rough variance is usually introduced via stochastic Volterra processes with singular kernels (e.g. Abi Jaber et al. [3,1]). These processes are non-Markovian, but a Markovian structure can be established by lifting them to infinite dimensions (see Cuchiero and Teichmann [28,27]). One such lift is the forward variance curve, i.e., one considers the curve \(x \mapsto \lambda _{t}(x):=\mathbb{E}[V_{t+x}\, |\, \mathcal{F}_{t}]\) with \((V_{t})_{t \geq 0}\) being the (rough) spot variance and \(x\) the time to maturity, which corresponds to the so-called Musiela parametrisation.

Forward variance curve models of course have a longer history and do not just date back to the introduction of rough volatility. Indeed, Bergomi [13, 14, 15] proposed them to achieve a market-consistent forward skew which cannot be reproduced by traditional stochastic volatility models even with jumps. We refer also to related work by Buehler [19]. Instead of modelling the spot volatility or variance, the idea is to specify the dynamics of the forward variance curve, similarly to the Heath–Jarrow–Morton framework in interest rate theory. Due to the martingale property of \((\mathbb{E}[V_{T}|\mathcal{F}_{t}])_{t \leq T}\), the dynamics of the forward curve process \((\lambda _{t})_{t \geq 0}\) are necessarily of the form

$$\begin{aligned} d\lambda _{t}(x)={\mathcal{A}}\lambda _{t}(x)dt + dM_{t} \end{aligned}$$
(4.2)

for some general function-space-valued martingale \((M_{t})_{t\geq 0}\) that we here specify as a polynomial process. The operator \({\mathcal{A}}\) is necessarily the first (space) derivative and thus corresponds to the generator of the shift semigroup. To make the shift semigroup strongly continuous so that we can treat the SPDE (4.2) by standard theory, we work with the following Hilbert space of forward curves introduced by Filipović [34, Sect. 5].

Let \(\alpha :{\mathbb{R}}_{+}\to [1,\infty )\) be a nondecreasing \(C^{1}\)-function with \(\alpha ^{-1}\in L^{1}({\mathbb{R}}_{+})\). Set then

$$ {B}=\{{y}\in AC({\mathbb{R}}_{+};{\mathbb{R}}) : \|y\|_{\alpha }< \infty \}, $$

where \(AC({\mathbb{R}}_{+};{\mathbb{R}})\) denotes the space of absolutely continuous functions from \({\mathbb{R}}_{+}\) to ℝ and \(\|{y}\|_{\alpha }^{2}:=|{y}(0)|^{2}+\int _{0}^{\infty }|y'(x)|^{2} \alpha (x)dx\). By Filipović [34, Theorem 5.1.1], we know that \({B}\) is a Hilbert space with respect to the scalar product

$$ \langle {a},{y}\rangle _{\alpha }:={a}(0){y}(0)+\int _{0}^{\infty }{a}'(x){y}'(x) \alpha (x)dx, $$

and we thus identify its dual \(B^{*}\) with \(B\). Moreover, by Benth and Krühner [12, Lemma 3.2], we also know that \({B}\subseteq {\mathbb{R}}+C_{0}({\mathbb{R}}_{+})\), the space of continuous functions admitting a finite limit at infinity.

In order to simplify the computations, we equip \(B^{\otimes k}\) not with the natural extension of the norm \(\|\cdot\|_{\alpha}\), but with the symmetric projective norm

$$ \|{y}\|_{\times }:=\inf \bigg\{ \sum _{i=1}^{n}|\alpha _{i}|\|{y}_{i}\|_{\alpha }^{k}\colon {y}=\sum _{i=1}^{n}\alpha _{i} {y}_{i}^{\otimes k} \bigg\} ,\qquad {y}\in {B}^{\otimes k}. $$

Note that since \({B}\) is a Hilbert space, by Floret [40] (or also Janson [47]), this norm coincides with the projective tensor norm in the sense of Ryan [58, Sect. 2] and is thus a crossnorm. This choice is particularly convenient since in order to check that \(\|{a}\|_{*k} \leq C\) for some \({a}\in ({B}^{\otimes k})^{*}\), it is enough to verify that

$$ |{a}({y}^{\otimes k})|\leq C\|{y}\|_{\alpha }^{k}\qquad \text{ for each }{y}\in {B}. $$
(4.3)

Similarly, condition (4.3) is enough for checking that a linear map \({a}\) belongs to \(({B}^{\otimes k})^{*}\). Finally, recall that the projective norm is the largest crossnorm (see Ryan [58, Proposition 6.1]). This in particular implies that the space of coefficients obtained by considering the projective norm is larger than the one obtained by considering any other crossnorm.

As forward variance process, the solution of (4.2) should take values in the cone of nonnegative functions. Conditions under which diffusion processes taking values in Hilbert spaces of functions stay nonnegative have been studied for instance in Kotelenez [50] or Milian [55]. For the jump-diffusion case, we refer to Filipović et al. [38]. Generally, it suffices to ensure that the one-dimensional processes \((\lambda _{t}(x))_{t\geq 0}\) remain nonnegative. A necessary condition is that the respective volatility vanishes when \((\lambda _{t}(x))_{t\geq 0}\) reaches 0.

In this paper, we do not address the questions of existence, uniqueness and positivity. The goal is to cast (4.2) in the polynomial framework. The conditions of Kotelenez [50] or Milian [55] can then be applied for some specific model choices.

Consider the operator

$$ \mathcal{A}: \operatorname{dom}( \mathcal{A})\to {B},\qquad { \mathcal{A}}{y}:={y}', $$
(4.4)

where \(\operatorname{dom}( \mathcal{A}):=\{{y}\in {B}\colon {y}'\in {B}\}\). In order to define an appropriate set \(D\) of coefficients, we have to make sure that the adjoint \(\mathcal{A}^{*}\) of \({\mathcal{A}}\) is well defined on \(D\). Let therefore

$$ \operatorname{dom}(\mathcal{A}^{*}):= \{ {a}\in {B}\colon \exists C \geq 0 \text{ such that } | \langle {a}, \mathcal{A} {y} \rangle _{\alpha }| \leq C \|{y}\|_{\alpha }\text{ for all } {y}\in \operatorname{dom}(\mathcal{A}) \}, $$

and define the adjoint \({\mathcal{A}}^{*}:\operatorname{dom}( \mathcal{A}^{*})\to {B}\) as usual, i.e., as the linear operator uniquely determined by

$$ \langle {\mathcal{A}}^{*}{a},{y}\rangle _{\alpha }=\langle {a},{ \mathcal{A}}y\rangle _{\alpha },\qquad {a}\in \operatorname{dom}( \mathcal{A}^{*}), {y}\in \operatorname{dom}( \mathcal{A}) . $$

Fix then \(D\subseteq \operatorname{dom}( \mathcal{A}^{*})\) and let \((\lambda _{t})_{t\geq 0}\) be a polynomial process corresponding to the linear operator \(L:P^{D}\to P\) given by

$$\begin{aligned} Lp({y}):=\big\langle {\mathcal{A}}^{*}\big(\partial p({y})\big),{y} \big\rangle _{\alpha }+ \frac{1}{2} \sum _{i=0}^{2}\big\langle Q^{i}\big( \partial ^{2}p({y})\big),{y}^{\otimes i}\big\rangle _{\alpha } \end{aligned}$$
(4.5)

for some linear operators \(Q^{i}:D\otimes D\to ({B}^{\otimes i})^{*}\).

In the next lemma, we establish the connection between forward variance curve models as given in (4.2) and such polynomial diffusions.

Lemma 4.4

Let \((M_{t})_{t\geq 0}\) be a \({B}\)-valued square-integrable continuous martingale. Let \((\lambda _{t})_{t\geq 0}\) be an analytically (and also probabilistically) weak solution of the SPDE

$$\begin{aligned} d\lambda _{t}={\mathcal{A}}\lambda _{t}dt+dM_{t}, \end{aligned}$$
(4.6)

i.e., \(\langle {a}, \lambda _{t} \rangle _{\alpha }=\langle {a},\lambda _{0} \rangle _{\alpha } + \int _{0}^{t} \langle \mathcal{A}^{*} {a}, \lambda _{s} \rangle _{\alpha } ds+\langle {a}, M_{t} \rangle _{\alpha } \) for each \(a\in D \subseteq \operatorname{dom}(\mathcal{A}^{*})\). Suppose that the dynamics of the quadratic variation process are given by

$$ d[\langle {a},\lambda _{\cdot }\rangle _{\alpha },\langle {a},\lambda _{\cdot }\rangle _{\alpha }]_{t}=[\langle {a},M_{\cdot }\rangle _{\alpha }, \langle {a},M_{\cdot }\rangle _{\alpha }]_{t} =\sum _{i=0}^{2}\langle Q^{i}({a} \otimes {a}),\lambda _{t}^{\otimes i}\rangle _{\alpha }dt. $$

Then \((\lambda _{t})_{t\geq 0}\) is a polynomial process corresponding to \(L\) given in (4.5).

Proof

We have to prove that \((\lambda _{t})_{t \geq 0}\) is a solution to the martingale problem for the polynomial operator \(L\). The existence of càdlàg versions of \(t \mapsto p(\lambda _{t})\) and \(t \mapsto Lp( \lambda _{t})\) for \(p\in P^{D}\) is clear since \(M\) is continuous. Moreover, by Itô’s formula, \(N^{p}\) as in (3.1) is a martingale for every \(p \in P^{D}\), which proves the assertion. □

Remark 4.5

  1. (i)

    By using the concept of a mild solution of (4.6), we can weaken the assumptions on \(M\) to allow for instance

    $$\begin{aligned} M_{t}(x)=\int _{0}^{t} K(x)\lambda _{s}(x) dW_{s}, \end{aligned}$$
    (4.7)

    where \(K \in L^{2}_{\text{loc}}({\mathbb{R}}_{+})\) is a fractional kernel with \(K(t)\approx t^{\beta }\) for some \(\beta \in (-\frac{1}{2}, 0)\) having a singularity at 0 so that \(M_{t}\) is not an element of \(B\). This form of \(M\) is an important example for rough volatility modelling as we shall see in Sects. 4.4.2 and 4.4.3 below. Denoting the shift semigroup by \((S_{t})_{t \geq 0}\), a weakly mild solution of (4.6) is given by

    $$ \langle {a}, \lambda _{t} \rangle _{\alpha }=\langle {a},S_{t}\lambda _{0} \rangle _{\alpha } +\int _{0}^{t} \langle {a}, S_{t-s}dM_{s} \rangle _{ \alpha }, \qquad {a}\in B, $$

    where it is just required that \(\int _{0}^{t} \langle {a}, S_{t-s}dM_{s} \rangle _{\alpha }\) is well defined. This is for instance the case if we consider the example given in (4.7), where we obtain

    $$ \int _{0}^{t} \langle {a}, S_{t-s}dM_{s} \rangle _{\alpha }=\int _{0}^{t} \langle {a}, K(t-s+{\,\cdot \,})\lambda _{s}(t-s+{\,\cdot \,}) \rangle _{\alpha }dW_{s}. $$
  2. (ii)

    One can still consider a weak solution concept when one restricts the set \(D \subseteq \operatorname{dom}(\mathcal{A}^{*})\) to elements \({a}\in D\) for which we can make sense out of \(\langle {a}, M_{t}\rangle _{\alpha }\), even if \(M_{t}\) is not in \(B\). To deal with kernels \(K\) with a singularity at 0 as in (4.7), we introduce some new notation. We let \(B_{z}:=\{{y}:S_{z}{y}\in {B}\}\) and define \(\widetilde{{B}}:=\bigcap _{z>0}{B}_{z}\). For \({a}\in {B}\) and \(K\in \widetilde{{B}}\), set \(\langle \langle {a},K\rangle \rangle _{\alpha }:=\lim _{z\to 0 } \langle {a},K({\,\cdot \,}+z)\rangle _{\alpha }\) whenever the limit exists, and \(\langle \langle {a},K\rangle \rangle _{\alpha }:=\infty \) otherwise. Observe in particular that \(\langle \langle {a},{y}\rangle \rangle _{\alpha }=\langle {a},{y} \rangle _{\alpha }\) for each \({y}\in {B}\).

  3. (iii)

    Weakly mild solutions are then actually weak solutions when restricting the set \(D \subseteq \operatorname{dom}(\mathcal{A}^{*})\) to elements \({a}\in D\) for which \(\langle \langle {a}, M_{t} \rangle \rangle _{\alpha }\) is well defined (see for instance Example 4.4.2 below). More precisely, a weakly mild solution \((\lambda _{t})_{t\geq 0}\) of (4.6) satisfies

    $$ \langle {a}, \lambda _{t} \rangle _{\alpha }=\langle {a},\lambda _{0} \rangle _{\alpha } + \int _{0}^{t} \langle \mathcal{A}^{*} {a}, \lambda _{s} \rangle _{\alpha } ds+\langle \langle {a}, M_{t} \rangle \rangle _{\alpha }. $$

    This follows e.g. from the results in Kunze [51, Proposition 6.3]. Hence a weakly mild solution solves the martingale problem for such a restricted set \(D\).

4.4.1 Moments of the VIX index

As a concrete application of the moment formula in the case of polynomial forward variance curve models, we have pricing of VIX options in mind. If the payoff is a polynomial in \(\mathrm{VIX}_{T}^{2}\) (defined below) for some maturity \(T >0\), we obtain analytical expressions. Otherwise, e.g. in the case of a call option on \(\mathrm{VIX}\), we can either approximate the payoff via polynomials (as e.g. in Benth et al. [11, Sect. 4.1]) or use the moments as control variates for variance reduction in Monte Carlo pricing as outlined in Cuchiero et al. [25]. If polynomial approximation does not work well (e.g. when the support of \(\mathrm{VIX}_{T}^{2}\) cannot be easily restricted to a compact set), variance reduction in Monte Carlo pricing is crucial since simulation of these infinite-dimensional forward variance processes is time-consuming.
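The control-variate idea can be sketched generically (a Python illustration; the lognormal stand-in for the law of \(\mathrm{VIX}_{T}^{2}\) and all parameter values are our own choices, not a model from this paper): given Monte Carlo samples together with the exact first and second moments, one regresses the payoff on the centred monomials, so that only the residual retains Monte Carlo noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in samples of VIX_T^2 (lognormal), with exactly known moments
mu, s, n = -3.0, 0.5, 100_000
vix2 = rng.lognormal(mu, s, size=n)
m1, m2 = np.exp(mu + s**2 / 2), np.exp(2*mu + 2*s**2)  # E[X], E[X^2]

payoff = np.maximum(np.sqrt(vix2) - 0.2, 0.0)          # call on the VIX

# control variates: centred first and second moments (mean zero exactly)
C = np.column_stack([vix2 - m1, vix2**2 - m2])
beta, *_ = np.linalg.lstsq(C - C.mean(0), payoff - payoff.mean(), rcond=None)
controlled = payoff - C @ beta                         # same mean, less noise

assert controlled.var() < payoff.var()                 # variance reduced
```

In the polynomial framework, the exact moments \(m_{1},m_{2}\) would come from the moment formula, i.e., from solving the associated linear ODE systems, rather than from a closed-form distribution.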

Define the VIX at time \(t\) via the continuous-time monitoring formula

$$ {\mathrm{VIX}}_{t}^{2}:= \frac{1}{\Delta }\int _{0}^{\Delta } \lambda _{t}(x) dx, $$

where \(\Delta \) is typically 30 days (see e.g. Horvath et al. [44]). This is clearly a linear functional of \(\lambda \), explicitly

$$\begin{aligned} {\mathrm{VIX}}_{t}^{2}=\langle \widehat{{a}},\lambda _{t}\rangle _{\alpha } \end{aligned}$$

for

$$ \widehat{a}(x):=1+\frac{1}{\Delta }\int _{0}^{x}(\Delta -\Delta \land z) \alpha (z)^{-1}dz, $$

which lies in \(\operatorname{dom}(\mathcal{A}^{*})\) and also in the subsets \(D\subseteq \operatorname{dom}(\mathcal{A}^{*})\) considered below. The risk-neutral valuation formula for an option on \(\mathrm{VIX}^{2}\) with payoff \(\phi \) (for the VIX future, \(\phi (x) =\sqrt{x}\), and for the call on VIX, \(\phi (x) =(\sqrt{x}-K)^{+}\)) is then

$$ \mathbb{E} [ \phi (\mathrm{VIX}^{2}_{t}) ]= \mathbb{E}\bigg[ \phi \bigg( \frac{1}{\Delta }\int _{0}^{\Delta } \lambda _{t}(x) dx\bigg) \bigg]= { \mathbb{E}}[\phi (\langle \widehat{{a}},\lambda _{t}\rangle _{\alpha })]. $$
(4.8)

Modulo technicalities related to the true martingale property of (3.1), the polynomial property of \((\lambda _{t})_{t\geq 0}\) implies that for \(\phi (x)=x^{k}\), the expression in (4.8) can be computed by solving a system of infinite-dimensional linear ODEs. For all other payoffs, in particular futures and calls, we can then resort to polynomial approximation or Monte Carlo pricing with considerable variance reduction due to the moment control variates.
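The representation \(\mathrm{VIX}_{t}^{2}=\langle \widehat{{a}},\lambda _{t}\rangle _{\alpha }\) can also be checked numerically for a concrete weight function. In the sketch below (our own choices: \(\alpha (z)=e^{z}\) and test curve \(y(x)=e^{-x}\)), the pairing computed from the definition of \(\langle {\,\cdot \,},{\,\cdot \,}\rangle _{\alpha }\) reproduces \(\frac{1}{\Delta }\int _{0}^{\Delta }y(x)dx\):

```python
import numpy as np

Delta = 30 / 365
alpha = np.exp                       # weight alpha(z) = e^z (our choice)
y = lambda x: np.exp(-x)             # test curve in B
dy = lambda x: -np.exp(-x)

# a_hat'(x) = (Delta - min(Delta, x)) / (Delta * alpha(x)), zero for x >= Delta
x = np.linspace(0.0, Delta, 20_001)
a_hat_prime = (Delta - np.minimum(Delta, x)) / (Delta * alpha(x))

# <a_hat, y>_alpha = a_hat(0) y(0) + int_0^infty a_hat'(x) y'(x) alpha(x) dx
integrand = a_hat_prime * dy(x) * alpha(x)
pairing = 1.0 * y(0.0) + np.sum((integrand[:-1] + integrand[1:]) / 2) * (x[1] - x[0])

vix2 = (1.0 - np.exp(-Delta)) / Delta    # (1/Delta) int_0^Delta e^{-x} dx
assert np.isclose(pairing, vix2)
```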

Below, we analyse two concrete specifications of polynomial forward variance models, namely the (rough) Bergomi model (Sect. 4.4.2) and a polynomial Volterra model (Sect. 4.4.3). In both cases, we give an explicit formula (up to a Lebesgue integration on \(\mathbb{R}^{k}\)) for the \((2k)\)th moment of the VIX. The following gives a road map for our proof strategy, following the idea explained in Remark 3.10. For \(\widehat{{a}}\), we write

$$ \langle \widehat{{a}}^{\otimes k},f\rangle _{\alpha }:= \frac{1}{\Delta ^{k}}\int _{[0,\Delta ]^{k}}f(x)dx $$
(4.9)

for each \(f:{\mathbb{R}}_{+}^{k}\to {\mathbb{R}}\) such that (4.9) is well defined. Note that this defines a pairing on the tensor products only for the specific element \(\widehat{{a}}^{\otimes k}\) which then coincides with \(\langle \langle \widehat{{a}},{y}\rangle \rangle _{\alpha }^{k}\) for \(f:={y}^{\otimes k}\) and \({y}\in \widetilde{{B}}\), and also with \(\langle \widehat{{a}},{y}\rangle _{\alpha }^{k}\) for \(f:={y}^{\otimes k}\) and \({y}\in {B}\).

  1. (i)

    In both examples, the \({B}\)-polynomial operators are homogeneous, i.e., they map homogeneous polynomials of degree \(k\) to homogeneous polynomials of degree \(k\). Therefore we can express the \(n\)th bidual operator \(M_{n}\) as \(M_{n}\vec{y}=({\mathcal{M}}_{0}{y}_{0},\ldots ,{\mathcal{M}}_{n}{y}_{n})\) (see Remark 2.9 for the ℳ-notation).

  2. (ii)

    We observe that for each \({y}\in (\operatorname{dom}(\mathcal{A}))^{\otimes k}\), the corresponding ODE

    $$ \partial _{t}m_{t}={\mathcal{M}}_{k} m_{t},\qquad m_{0}={y}, $$
    (4.10)

    can be seen as a PDE that has a classical strong solution (see Remark 4.6 below) on \(E^{k}\times [0,T]\) for some \(E\subseteq {\mathbb{R}}_{+}\) such that \(E\cap [0,\Delta +T]\) has full Lebesgue measure. We denote it by \((Z_{t}{y})_{t\geq 0}\) for some \(Z_{t}{y}:E^{k}\to {\mathbb{R}}\) such that (4.9) is well defined for \(f=Z_{t}y\) and

    $$\begin{aligned} \langle \widehat{{a}}^{\otimes k},Z_{t}{y}\rangle _{\alpha }\leq c_{t} \|y\|_{\times } \end{aligned}$$
    (4.11)

    for some \(t\)-dependent constant \(c_{t}\) not depending on \({y}\). It will turn out that (4.10) corresponds to a Cauchy problem associated to a \(k\)-dimensional Markov process and thus that \(Z_{t}y\) is the unique classical solution of (4.10). It is, however, not clear at this point that we have weak solutions in the sense of Definition 3.7 and that the conditions of Corollary 3.12 are satisfied for \({\mathcal{H}}= \{\widehat{a}^{\otimes k}\}\). Therefore we cannot directly conclude that \(\langle \widehat{a}^{\otimes k} , Z_{t}y \rangle _{\alpha }= \mathbb{E}[{\mathrm{VIX}}_{t}^{2k}]\).

  3. (iii)

    The next step consists of using \((Z_{t}y)_{t\ge 0}\) to construct an ansatz for the solution \((a_{t})_{t \geq 0}\) of the dual system of ODEs. Due to (4.11), we can define \({a}_{t}\in ({B}^{\otimes k})^{*}\) as \(\langle {a}_{t},{y}\rangle _{\alpha }:=\langle \widehat{{a}}^{ \otimes k},Z_{t}{y}\rangle _{\alpha }\) for each \({y}\in {B}^{\otimes k}\). Here, \({a}_{t}\) denotes a candidate solution (in the sense of Definition 3.3) of \(\partial _{t}{a}_{t}={\mathcal{L}}_{k}{a}_{t}\) for \({a}_{0}=\widehat{{a}}^{\otimes k}\).

  4. (iv)

    Finally, we verify that \(({a}_{t})_{t\geq 0}\) satisfies the conditions of the dual moment formula in Theorem 3.4 and we can thus conclude that

    $$ {\mathbb{E}}[ (\mathrm{VIX}_{t}^{2})^{k}|\lambda _{0}={y}]={\mathbb{E}}[ \langle \widehat{a}^{\otimes k},\lambda _{t}^{\otimes k}\rangle _{\alpha }|\lambda _{0}={y}]=\langle {a}_{t},{y}^{\otimes k}\rangle _{\alpha }=\langle \widehat{{a}}^{\otimes k},Z_{t}{y}^{\otimes k} \rangle _{\alpha }. $$

Remark 4.6

A classical strong solution of (4.10) seen as a PDE requires that we have \(\partial _{t}m_{t}(x)={\mathcal{M}}_{k} m_{t}(x)\) and \(m_{0}(x)={y}(x)\) for each \(x\) in \({\mathbb{R}}_{+}^{k}\). It may thus fail to be an ℋ-weak solution for an arbitrarily chosen set ℋ.

We now specify the two concrete examples and employ the above program for the computation of the VIX moments.

4.4.2 The (rough) Bergomi model and its VIX moments

The first example corresponds to the Bergomi model, either in its rough form (e.g. Bayer et al. [8] or Horvath et al. [44]) or in the original form, depending on the choice of the kernel.

Model specification

Recall the notation of Remark 4.5 (ii). For \(\ell =1, \ldots , m\), let \(K_{\ell } \in L^{2}_{\text{loc}}({\mathbb{R}}_{+})\) denote some (potentially fractional) kernels such that \(K_{\ell }\in \widetilde{{B}}\) and

$$ \sum _{\ell =1}^{m}\int _{0}^{T}\sup _{t\in [0,T]} K_{\ell }(x+t)^{2}dx< \infty $$
(4.12)

for each \(T>0\). Define

$$ D:= \bigg\{ {a}\in \operatorname{dom}(\mathcal{A}^{*})\colon \exists C \geq 0 \text{ with } \sum _{\ell =1}^{m}|\langle \langle {a},K_{\ell }{y} \rangle \rangle _{\alpha }|\leq C\|{y}\|_{\alpha }\text{ for all } y\in B \bigg\} . $$

Consider (4.5) with \(Q^{0}=0\), \(Q^{1}=0\) and \(Q^{2}:D\otimes D\to ({B}\otimes {B})^{*}\) being uniquely determined by \(\langle Q^{2} ({a}\otimes {a}),{y}\otimes {y}\rangle _{\alpha }=\sum _{ \ell =1}^{m}\langle \langle {a},K_{\ell }{y}\rangle \rangle _{\alpha }^{2}\) for \({a}\in D\) and \({y}\in B\). The corresponding SPDE (4.6) can then be realised as

$$\begin{aligned} d\lambda _{t}(x) = {\mathcal{A}}\lambda _{t}(x)dt+\sum _{\ell =1}^{m} K_{\ell }(x) \lambda _{t} (x) dB_{t}^{\ell }, \end{aligned}$$
(4.13)

where \(B^{1},\ldots ,B^{m}\) are \(m\) independent Brownian motions. Although we do not discuss existence of solutions to (4.13) here, the solution concept we have in mind is the mild one outlined in Remark 4.5 (i). Note in particular that for \(a \in B\),

$$\begin{aligned} \int _{0}^{t} \langle a , S_{t-s} dM_{s}\rangle _{\alpha } &= \sum _{ \ell =1}^{m} \int _{0}^{t}\langle a , S_{t-s} K_{\ell }\lambda _{s} dB_{s}^{\ell }\rangle _{\alpha } \\ &= \sum _{\ell =1}^{m} \int _{0}^{t}\langle a , K_{\ell }(t-s+\cdot ) \lambda _{s}(t-s+\cdot ) \rangle _{\alpha }dB_{s}^{\ell } \end{aligned}$$

is well defined since \(K_{\ell }(t-s+\cdot ) \lambda _{s}(t-s+\cdot )\) takes values in \(B\) for all \(0 \leq s < t\).

To see that this setting includes the (rough) Bergomi model, compare for instance with Horvath et al. [44]. The corresponding \(k\)th dual operator \(L_{k}\) satisfies the conditions of Remark 2.5 and can thus be written as \(L_{k}\vec{a}=({\mathcal{L}}_{0}{a}_{0},\ldots , {\mathcal{L}}_{k}{a}_{k})\), where

$$ \langle {\mathcal{L}}_{j}{a}^{\otimes j},{y}^{\otimes j}\rangle _{\alpha }=j\langle {\mathcal{A}}^{*}{a},{y}\rangle _{\alpha }\langle {a},y \rangle _{\alpha }^{j-1} +\frac{j(j-1)}{2} \sum _{\ell =1}^{m}\langle \langle {a},K_{\ell }{y}\rangle \rangle _{\alpha }^{2}\langle {a},{y} \rangle _{\alpha }^{j-2}. $$

VIX moments

We now consider the computation of the moments of the VIX in this model. An explicit formula is given in the following result whose proof is postponed to Sect. B. Recall the notion of a weakly mild solution from Remark 4.5 (i).

Proposition 4.7

Fix \(k \in \mathbb{N}\) and let \((\lambda _{t})_{t \geq 0}\) be a weakly mild solution of (4.13). Assume that we have \({\mathbb{E}}[\sup _{t\leq T}\|\lambda _{t}\|_{\alpha }^{2k}]<\infty \) and for each \(x=(x_{1}, \ldots , x_{k})\), define \(V_{k}(x):=\sum _{ \ell =1}^{m} \sum _{i< j}K_{\ell }(x_{i})K_{\ell }(x_{j}) \). Then for each \(\lambda _{0}\in {B}\), we have

$$ \mathbb{E} [ (\mathrm{VIX}^{2}_{t})^{k}|\lambda _{0} ] = \frac{1}{\Delta ^{k}}\int _{[0,\Delta ]^{k}}\prod _{i=1}^{k}\lambda _{0}(x_{i}+t)e^{ \int _{0}^{t}V_{k}( x+\tau 1)d\tau }d x, $$

where \(1 \in {\mathbb{R}}^{k}\) denotes the vector consisting of ones.

In the following two examples, we give concrete specifications of the kernels.

Example 4.8

Applying Proposition 4.7 to the classical Bergomi model (see Bergomi [13,14]), where \(K_{\ell }(x)=\omega _{\ell }e^{-\gamma _{\ell }x}\) for some constants \(\omega _{\ell }\) and \(\gamma _{\ell }>0\), yields

$$ \mathbb{E} [ (\mathrm{VIX}^{2}_{t})^{k}|\lambda _{0} ]= \frac{1}{\Delta ^{k}}\int _{[0,\Delta ]^{k}}\prod _{i=1}^{k}\lambda _{0}(x_{i}+t) \prod _{i< j}e^{\sum _{\ell =1}^{m} \frac{\omega _{\ell }^{2}}{2\gamma _{\ell }}(1-e^{-2\gamma _{\ell }t})e^{- \gamma _{\ell }(x_{i}+x_{j})}}dx. $$
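The agreement between the time integral appearing in Proposition 4.7 and the closed-form exponent above can be checked numerically. The following sketch (with illustrative values for \(\omega \), \(\gamma \) and \(t\) that are assumptions, not calibrated parameters) compares a trapezoidal approximation of \(\int _{0}^{t}K(x_{i}+\tau )K(x_{j}+\tau )d\tau \) for the exponential kernel with the closed form \(\frac{\omega ^{2}}{2\gamma }(1-e^{-2\gamma t})e^{-\gamma (x_{i}+x_{j})}\).

```python
import math

# Illustrative parameters (assumptions, not from the text)
omega, gamma, t = 0.3, 1.2, 0.5

def K(x):
    # exponential Bergomi kernel
    return omega * math.exp(-gamma * x)

def exponent_numeric(xi, xj, n=20000):
    # int_0^t K(xi+tau) K(xj+tau) dtau via the trapezoidal rule
    h = t / n
    vals = [K(xi + m * h) * K(xj + m * h) for m in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def exponent_closed(xi, xj):
    # closed-form exponent from Example 4.8 (one kernel, m = 1)
    return omega**2 / (2 * gamma) * (1 - math.exp(-2 * gamma * t)) \
        * math.exp(-gamma * (xi + xj))

for (xi, xj) in [(0.1, 0.4), (0.0, 1.0), (0.7, 0.7)]:
    assert abs(exponent_numeric(xi, xj) - exponent_closed(xi, xj)) < 1e-8
```

The same comparison works for any sum of exponential kernels by adding the closed-form contributions per kernel.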

Example 4.9

In the case of the rough Bergomi model (see Bayer et al. [8]), we have \(m=1\) and \(K_{1}(x)=x^{H-1/2}\) for \(H\in (0,1/2)\) (modulo a multiplicative constant). In this case, we get

$$ \mathbb{E} [ (\mathrm{VIX}^{2}_{t})^{k}|\lambda _{0} ]= \frac{1}{\Delta ^{k}}\int _{[0,\Delta ]^{k}}\prod _{i=1}^{k}\lambda _{0}(x_{i}+t) \prod _{i< j}e^{\int _{0}^{t} ((x_{i}+\tau )(x_{j}+\tau ) )^{H-1/2}d \tau }d x. $$

Setting \(\mathrm{VIX}^{2}_{0,t}:=\frac{1}{\Delta }\int _{0}^{\Delta }\lambda _{0}(x+t)dx\), an inspection of this expression yields the estimate \(\mathbb{E} [ (\mathrm{VIX}^{2}_{t})^{k}|\lambda _{0} ] \leq (\mathrm{VIX}^{2}_{0,t})^{k} e^{\frac{k(k-1)}{2}\int _{0}^{t}\tau ^{2(H-1/2)}d\tau } \), proving that the moments of \(\mathrm{VIX}^{2}_{t}\) are bounded from above by the moments of a lognormal random variable \(\overline{X}^{2}\) with parameters \(\mu =\ln {\mathrm{VIX}}^{2}_{0,t} -t^{2H}/4H\) and \(\sigma ^{2}=t^{2H}/2H\). Similarly, since we have \(\mathbb{E} [ (\mathrm{VIX}^{2}_{t})^{k}|\lambda _{0} ] \geq (\mathrm{VIX}^{2}_{0,t})^{k} e^{\frac{k(k-1)}{2}\int _{0}^{t}(\Delta +\tau )^{2(H-1/2)}d\tau } \), we also obtain that the moments of \(\mathrm{VIX}^{2}_{t}\) are bounded from below by the moments of a lognormal random variable \(\underline{X}^{2}\) with parameters

$$ \mu =\ln {\mathrm{VIX}}^{2}_{0,t} -((t+\Delta )^{2H}-\Delta ^{2H})/4H \ \text{and} \ \sigma ^{2}=((t+\Delta )^{2H}-\Delta ^{2H})/2H. $$

This type of relation to lognormal random variables has been used in Horvath et al. [44], where lognormal control variates are employed for variance reduction in Monte Carlo simulations.
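The lognormal bounds rest on the pointwise squeeze \((\Delta +\tau )^{2(H-1/2)}\leq ((x_{i}+\tau )(x_{j}+\tau ))^{H-1/2}\leq \tau ^{2(H-1/2)}\) for \(x_{i},x_{j}\in [0,\Delta ]\), valid because the exponent \(H-1/2\) is negative. A minimal numerical check of this squeeze, with illustrative values for \(H\) and \(\Delta \) (assumptions, not from the text):

```python
# Pointwise squeeze behind the lognormal moment bounds in Example 4.9.
H, Delta = 0.1, 30 / 365  # illustrative Hurst index and VIX window

def integrand(xi, xj, tau):
    # exponent integrand ((xi+tau)(xj+tau))^{H-1/2} from Example 4.9
    return ((xi + tau) * (xj + tau)) ** (H - 0.5)

grid = [0.0, Delta / 3, Delta]   # sample points for x_i, x_j in [0, Delta]
taus = [1e-4, 0.01, 0.1]
for xi in grid:
    for xj in grid:
        for tau in taus:
            v = integrand(xi, xj, tau)
            # upper bound: x_i + tau >= tau and the exponent is negative
            assert v <= tau ** (2 * (H - 0.5)) + 1e-12
            # lower bound: x_i + tau <= Delta + tau
            assert v >= (Delta + tau) ** (2 * (H - 0.5)) - 1e-12
```

Integrating the two bounding functions over \(\tau \in [0,t]\) yields exactly the \(\sigma ^{2}\)-parameters of \(\overline{X}^{2}\) and \(\underline{X}^{2}\) above.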

Finally, observe that from the proof of Proposition 4.7, we know that the function \(m_{t}(x):=\lambda _{0}(x+t1)e^{\int _{0}^{t}V_{k}(x+ \tau 1)d\tau }\) is a candidate solution for the bidual system of ODEs corresponding to \({\mathcal{M}}_{k}\). This suggests that \(\langle {a}^{\otimes k},m_{t}\rangle _{\alpha }\), whenever it is well defined, should coincide with \({\mathbb{E}}[\langle {a},\lambda _{t}\rangle _{\alpha }^{k}|\lambda _{0}]\). For the moments of the spot volatility, these heuristics therefore suggest that \({\mathbb{E}}[\lambda _{t}(0)^{k}|\lambda _{0}]=\lambda _{0}(t)^{k}e^{ \frac{k(k-1)}{4H}t^{2H}}\).

4.4.3 A polynomial Volterra model and its VIX moments

The following example corresponds to a polynomial Volterra process for the spot variance and therefore extends (rough) affine models and the Jacobi stochastic volatility model of Ackerer and Filipović [4]. As explained below, we understand stochastic Volterra processes here in the sense of Abi Jaber et al. [1].

Model specification

Let \(K \in L^{2}_{\text{loc}}\) again denote some (potentially fractional) kernel such that \(K \in \widetilde{{B}}\) in the sense of Remark 4.5 (ii). Define \(D\) as

$$ D:= \{{a}\in \operatorname{dom}(\mathcal{A}^{*}): | \langle \langle {a},K \rangle \rangle _{\alpha }|< \infty \} $$

and consider (4.5) with \(Q^{i}({a}\otimes {a}):=c_{i} \langle \langle {a}, K \rangle \rangle _{ \alpha }^{2}1^{\otimes i}\) for \(c_{0}=0\) and some constants \(c_{1}\), \(c_{2}\). Since \(\langle 1, y \rangle _{\alpha }=y(0)\), the corresponding SPDE (4.6) can then be realised by

$$\begin{aligned} d\lambda _{t}(x) = {\mathcal{A}}\lambda _{t}(x)dt+ K(x) \sqrt{C\big( \lambda _{t}(0)\big)} dB_{t}, \end{aligned}$$
(4.14)

where \(B\) is a Brownian motion and \(C(v)= c_{2} v^{2}+ c_{1} v \). Again, although we do not treat existence of solutions, the solution concept we have in mind is a mild one as outlined in Remark 4.5 (i). Note in particular that for \(a \in B\),

$$\begin{aligned} \langle a, \lambda _{t} \rangle _{\alpha }&= \langle a, S_{t}\lambda _{0} \rangle _{\alpha }+ \int _{0}^{t} \langle a, S_{t-s}K \rangle _{\alpha } \sqrt{C\big(\lambda _{s}(0)\big)} dB_{s} \\ &= \langle a, \lambda _{0}(t+\cdot ) \rangle _{\alpha }+ \int _{0}^{t} \langle a, K(t-s +\cdot ) \rangle _{\alpha } \sqrt{C\big(\lambda _{s}(0) \big)} dB_{s}. \end{aligned}$$

Similarly as in Cuchiero and Teichmann [28], setting \(V_{t}=\langle 1, \lambda _{t} \rangle _{\alpha }=\lambda _{t}(0)\) and choosing \(a=1\) then yields the Volterra equation

$$ V_{t}=\lambda _{0}(t)+ \int _{0}^{t} K(t-s) \sqrt{C(V_{s})} dB_{s}. $$

For \(c_{0}=c_{1}=c_{2}-1=0\), this corresponds to a Volterra geometric Brownian motion, i.e., \(V_{t}=\lambda _{0}(t)+ \int _{0}^{t} K(t-s) V_{s} dB_{s} \). In this particular parameter case, the corresponding \(k\)th dual operator satisfies the conditions of Remark 2.5 and can thus be written as \(L_{k}\vec{a}=({\mathcal{L}}_{0}{a}_{0},\ldots , {\mathcal{L}}_{k}{a}_{k})\), where

$$ {\mathcal{L}}_{j}{a}^{\otimes j}=j({\mathcal{A}}^{*}{a})\otimes {a}^{ \otimes (j-1)} +\frac{j(j-1)}{2} \langle \langle {a}, K \rangle \rangle _{\alpha }^{2} 1\otimes 1\otimes {a}^{\otimes (j-2)}. $$
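For illustration, the Volterra geometric Brownian motion \(V_{t}=\lambda _{0}(t)+\int _{0}^{t}K(t-s)V_{s}dB_{s}\) can be discretised with a left-point Euler scheme; the kernel, the flat initial curve and all numerical parameters below are illustrative assumptions. Since the scheme evaluates the integrand at the left endpoint, the discrete stochastic integral has mean zero and the identity \({\mathbb{E}}[V_{t}]=\lambda _{0}(t)\) is preserved exactly.

```python
import math
import random

# Illustrative inputs (assumptions): bounded kernel and flat initial curve
def K(x):
    return 0.4 * math.exp(-x)

def lam0(t):
    return 0.04  # flat initial forward-variance curve

def simulate_V(T=1.0, n=200, rng=None):
    """Left-point Euler scheme for the Volterra geometric Brownian motion
    V_t = lam0(t) + int_0^t K(t-s) V_s dB_s   (c0 = c1 = 0, c2 = 1)."""
    rng = rng or random.Random(0)
    h = T / n
    t_grid = [k * h for k in range(n + 1)]
    dB = [rng.gauss(0.0, math.sqrt(h)) for _ in range(n)]
    V = [lam0(0.0)]
    for k in range(1, n + 1):
        tk = t_grid[k]
        # discrete convolution against past values of V (left-point rule)
        stoch = sum(K(tk - t_grid[m]) * V[m] * dB[m] for m in range(k))
        V.append(lam0(tk) + stoch)
    return V
```

Averaging the terminal value over independent runs recovers \(\lambda _{0}(T)\) up to Monte Carlo error, in line with the martingale property of the stochastic integral.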

VIX moments

We now compute the VIX moments in the case of the Volterra geometric Brownian motion, i.e., \(c_{0}=c_{1}=c_{2}-1=0\), and we additionally suppose that \(K\in {B}\). Observe that since \(B\subseteq {\mathbb{R}}+C_{0}({\mathbb{R}}_{+})\), this assumption automatically implies that \(K\) is a bounded function. The proof of the following proposition can be found in Sect. B. Recall the notion of a weak solution from Lemma 4.4, which we can apply here since we suppose that \(K \in B\).

Proposition 4.10

Let \((\lambda _{t})_{t \geq 0}\) be a weak solution of (4.14) with \(c_{0}=c_{1}=0\), \(c_{2}=1\) and \(K \in B\). Let \(k \in \mathbb{N}\) and assume that \({\mathbb{E}}[\sup _{t\leq T}\|\lambda _{t}\|_{\alpha }^{2k}]<\infty \). Then

$$ \mathbb{E} [ (\mathrm{VIX}^{2}_{t})^{k}|\lambda _{0} ] ={\mathbb{E}}\big[e^{ \int _{0}^{t}V_{k}(X^{(k)}_{\tau })d\tau }\lambda _{0}^{\otimes k}(X^{(k)}_{t}) \big|X^{(k)}_{0}\sim {\mathcal{U}}([0,\Delta ]^{k})\big], $$

where \(V_{k}(x)=\sum _{i< j}K(x_{i})K(x_{j})\) and \((X^{(k)}_{t})_{t\geq 0}\) is the \({\mathbb{R}}^{k}\)-valued process generated by

$$ {\mathcal{G}}^{(k)} f(x)=1^{\top }\nabla f( x) +\int \big(f( x+\xi )-f( x) \big)\nu (x,d\xi ), \qquad x\in {\mathbb{R}}^{k}, $$

for \(\nu ( x ,{\,\cdot \,})=\sum _{i< j}K(x_{i})K(x_{j})\delta _{(\dots ,0,-x_{i},0, \dots , 0,-x_{j},0, \dots )}\).

In the following, we analyse a specific parametrisation and specific moments.

Example 4.11

Let \(K(x)=\omega e^{-\gamma x}\) and \(\lambda _{0}(x)=c(1-e^{-\gamma x})+e^{-\gamma x}V_{0}\). Then the assumption of Proposition 4.10 is automatically satisfied. Indeed, in this case,

$$ \lambda _{t}(x)={\mathbb{E}}[V_{t+x}|{\mathcal{F}}_{t}]=c(1-e^{-\gamma x})+e^{- \gamma x}V_{t}, $$

where \(V\) solves \(dV_{t}=\gamma (c- V_{t})dt+\omega V_{t}dW_{t}\). The equation for \(V\) then becomes

$$ dV_{t}=\gamma \bigg(\lambda _{0}(t)+\frac{1}{\gamma }\lambda _{0}'(t)- V_{t} \bigg)dt+\omega V_{t}dW_{t}. $$

Finally, observe that assuming \(\lambda _{0}=be^{-\gamma x}\) for some constant \(b\in {\mathbb{R}}\) yields

$$ \mathbb{E} [ {\mathrm{VIX}}^{2k}_{t} |\lambda _{0}=be^{-\gamma x} ]=\big(b(1-e^{- \gamma \Delta })\big)^{k}(\gamma \Delta )^{-k}e^{-(k\gamma - \frac{k(k-1)}{2}\omega ^{2})t}. $$

Remark 4.12

Proposition 4.10 only treats the case when \(K \in B\). We expect, however, that the result still holds true if \(K\in \widetilde{{B}}\setminus {B}\) and \(K \in L^{2}_{\text{loc}}\). Analysing the generator \({\mathcal{G}}^{(k)}\), we can deduce that the process \((X^{(k)}_{t})_{t\geq 0}\) has constant drift 1 in every component until the first jump, which occurs after an exponentially distributed waiting time with state-dependent intensity \(V_{k}(X^{(k)}_{t})\). When a jump occurs, two of the components of the process jump to 0.
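The dynamics just described can be sketched by a thinning (acceptance-rejection) simulation, assuming as in Proposition 4.10 a bounded nonnegative kernel; the dominating bound `K_sup` and all parameters below are illustrative assumptions.

```python
import random

def simulate_X(x0, K, K_sup, T, rng):
    """Thinning simulation of the process generated by G^(k): unit drift in
    every component; at state-dependent rate sum_{i<j} K(x_i)K(x_j) a pair
    (i, j) is selected and both components jump to 0.  Assumes 0 <= K <= K_sup."""
    k = len(x0)
    x = list(x0)
    lam_bar = K_sup ** 2 * k * (k - 1) / 2  # dominating constant jump rate
    t = 0.0
    while lam_bar > 0:
        dt = rng.expovariate(lam_bar)  # next proposed jump time
        if t + dt >= T:
            break
        t += dt
        x = [xi + dt for xi in x]  # unit drift up to the proposal
        pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
        rates = [K(x[i]) * K(x[j]) for (i, j) in pairs]
        total = sum(rates)
        if rng.random() * lam_bar < total:  # accept with prob total/lam_bar
            u, acc = rng.random() * total, 0.0
            for (i, j), r in zip(pairs, rates):
                acc += r
                if u <= acc:
                    x[i] = x[j] = 0.0  # the selected pair is reset to 0
                    break
    return [xi + (T - t) for xi in x]  # remaining drift until T
```

For the zero kernel the process is pure drift, so \(X^{(k)}_{T}=X^{(k)}_{0}+T1\); for any nonnegative kernel every component stays between 0 and its initial value plus \(T\).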

4.5 Signature of Brownian motion

In this section, we consider the signature process of a \(d\)-dimensional Brownian motion (see (4.16) below for the precise definition). We show that it is a polynomial process. We then exploit the polynomial machinery to compute its expectation, a well-known formula which can for instance be found in Friz and Hairer [41, Theorem 3.9]. Before introducing the mathematical framework, let us here briefly outline the relevance of the signature of general stochastic processes. The signature of a path, first studied by Chen [20,21], is a highly important object in rough path theory (Lyons [54]). This is explained by the following three facts:

– The signature of a path (of bounded \(p\)-variation) uniquely determines the path up to the so-called tree-like equivalences (which include e.g. reparametrisations; see Boedihardjo et al. [18]).

– Under certain regularity conditions, the expected signature of a stochastic process determines the law of the signature (see Chevyrev and Lyons [22, Proposition 6.1 and Theorem 6.3]).

– Every continuous functional (with respect to a certain \(p\)-variation norm) on continuous paths can be approximated by a linear function of the signature arbitrarily well (on compacts of non-tree-like paths).

These three properties have led Levin et al. [53] to introduce the so-called expected signature model, which is nothing other than a linear regression model for the signature of a stochastic process \(Y\) on the signature of a stochastic process \(X\).

In practice, the stochastic process \(Y\) is often the solution to a stochastic differential equation driven by \(X\). The (generically non-linear and non-Lipschitz) functional relationship \(X \mapsto Y\) can then be expressed by a linear map of the signature of \(X\). For example, let \(X\) be a \(d\)-dimensional Brownian motion \(B\) and let \(Y\) solve an \(n\)-dimensional SDE of the form

$$\begin{aligned} dY_{t}= \sum _{i=1}^{d} V_{i}(Y_{t}) \circ dB_{t}^{i}, \qquad Y_{0}=y, \end{aligned}$$
(4.15)

where \(V_{i}: \mathbb{R}^{n} \to \mathbb{R}^{n}\) are polynomial or possibly analytic vector fields and ∘ denotes the Stratonovich integral. Then \(Y\) can be represented by a linear map of the signature of \(B\). Since the signature process of \(B\) is a polynomial process as outlined below, this then also translates to the signature process of \(Y\). In this sense, we encounter a surprising universality of the polynomial class. In a subsequent paper, we derive the polynomial property of the signature of processes of the form (4.15) directly and provide a procedure for computing the expected signature. This has applications, for instance, to the generalised method of moments on the process level as considered in Papavasiliou and Ladroue [56].

In mathematical finance, signature methods have recently gained considerable importance in view of machine learning applied to financial time series and option price data. For recent advances in this direction, see Cuchiero et al. [24], Levin et al. [53], Kidger et al. [48] and Perez Arribas et al. [57]. The universality of the polynomial class also plays a crucial role in this context, which we illustrate in a subsequent paper as well. Here, we consider the important case of Brownian motion.

The algebraic \(n\)-fold nonsymmetric tensor product of \({\mathbb{R}}^{d}\) with itself is a vector space \(({\mathbb{R}}^{d})^{\underline{\otimes }n}\) together with a multilinear map \(\underline{\otimes }: ({\mathbb{R}}^{d})^{n}\to ({\mathbb{R}}^{d})^{ \underline{\otimes }n}\) satisfying the following two properties:

– \(({\mathbb{R}}^{d})^{\underline{\otimes }n}\) admits the representation

$$ ({\mathbb{R}}^{d})^{\underline{\otimes }n}=\bigg\{ \sum _{i_{1}=1}^{d} \cdots \sum _{i_{n}=1}^{d}\alpha _{i_{1}\cdots i_{n}} (e_{i_{1}} \underline{\otimes }\cdots \underline{\otimes }e_{i_{n}})\ : \ \alpha _{i_{1} \cdots i_{n}}\in {\mathbb{R}}\bigg\} ; $$

– given any multilinear map \(\ell :({\mathbb{R}}^{d})^{n}\to {\mathbb{R}}\), there is a unique linear map \(\widetilde{\ell }:({\mathbb{R}}^{d})^{\underline{\otimes }n}\to { \mathbb{R}}\) such that \(\ell = \widetilde{\ell } \circ \underline{\otimes }\).

We also set \(e_{i}^{\underline{\otimes }0}:=1\) for each \(i\in \{1,\ldots , d\}\). Observe that we use \(\underline{\otimes }\) instead of ⊗ to stress that it denotes a nonsymmetric tensor product. Set then

$$\begin{aligned} T \big((\mathbb{R}^{d})\big) &:= \{(y_{0}, y_{1} , \ldots , y_{n} , \ldots ): y_{n} \in (\mathbb{R}^{d})^{\underline{\otimes }n} \text{ for all } n \geq 0 \}, \\ T^{N}({\mathbb{R}}^{d}) &:= \{(y_{0}, y_{1} , \ldots , y_{N}): y_{n} \in (\mathbb{R}^{d})^{\underline{\otimes }n} \}. \end{aligned}$$

For each \({y}\in T^{N}({\mathbb{R}}^{d})\), let \({y}_{n,\mathbf{i}}\) denote the components of \({y}=({y}_{0},\ldots ,{y}_{N})\), i.e.,

$$ {y}={y}_{0}+\sum _{n=1}^{N}\sum _{\mathbf{i}\in {\mathcal{I}}_{n}} {y}_{n,{ \mathbf{i}}} e_{i_{1}}\underline{\otimes }\cdots \underline{\otimes }e_{i_{n}}, \qquad {\mathcal{I}}_{n}:=\{1,\ldots ,d\}^{n}. $$

Fix \(N\in {\mathbb{N}}\) and consider the Banach space \(B:=T^{N}({\mathbb{R}}^{d})\). The corresponding dual is then \(B^{*}=T^{N}({\mathbb{R}}^{d})\) with pairing \(\langle a, y \rangle = a_{0}y_{0}+\sum _{n=1}^{N}\sum _{\mathbf{i}\in {\mathcal{I}}_{n}} y_{n,\mathbf{i}} a_{n,\mathbf{i}}\), for \({\mathcal{I}}_{n}=\{1,\ldots ,d \}^{n}\).

Let now \((B_{t})_{t\in [0,T]}\) be a \(d\)-dimensional Brownian motion. Denoting by ∘ the Stratonovich integral, set

$$ \int H_{s}\underline{\otimes }\circ dB_{s}=\sum _{i=1}^{\ell }\sum _{j=1}^{d} \bigg(\int H_{s}^{i} \circ dB_{s}^{j}\bigg) e_{i}\underline{\otimes }e_{j} $$

for each \({\mathbb{R}}^{\ell }\)-valued process \(H\) such that the right-hand side is well defined. Define then

$$ S_{T}^{(0)}=1,\qquad S_{T}^{(n)}:=\int _{0< t_{1}< \cdots < t_{n}< T} \circ dB_{t_{1}}\underline{\otimes }\cdots \underline{\otimes }\circ dB_{t_{n}}, \qquad n\geq 1. $$
(4.16)

The signature \(S(B)_{0,T}\in T(({\mathbb{R}}^{d}))\) and the truncated signature \(S(B)^{N}_{0,T}\in T^{N}({\mathbb{R}}^{d})\) of \(B\) are then given by

$$ S(B)_{0,T}:=(S_{T}^{(0)},S_{T}^{(1)},S_{T}^{(2)},\ldots )\qquad \text{and}\qquad S(B)^{N}_{0,T}:=(S_{T}^{(0)},S_{T}^{(1)},\ldots , S_{T}^{(N)}), $$

respectively. Observe that by the definition of the Stratonovich integral, we have

$$ dS_{t}^{(n)}=\frac{1}{2} \sum _{i=1}^{d} (S_{t}^{{(n-2)}} \underline{\otimes }e_{i}\underline{\otimes }e_{i} )dt+S_{t}^{{(n-1)}} \underline{\otimes }dB_{t}. $$

An application of the Itô formula yields that the generator \(L:P\to P\) of \((S(B)^{N}_{0,t})_{t\geq 0}\) is given by

$$\begin{aligned} Lp( {y})&=\frac{1}{2} \sum _{n=2}^{N}\sum _{\mathbf{i}\in {\mathcal{I}}_{n}} {y}_{n-2,i_{1}\cdots i_{n-2}}1_{\{i_{n-1}=i_{n}\}} \frac{\partial }{\partial {y}_{n,\mathbf{i}}}p( {y}) \\ &\phantom{=:}+\frac{1}{2} \sum _{n,m=1}^{N}\sum _{\mathbf{i} \in {\mathcal{I}}_{n}}\sum _{{ \mathbf{j}} \in {\mathcal{I}}_{m}} {y}_{n-1,i_{1}\cdots i_{n-1}} {y}_{m-1,j_{1} \cdots j_{m-1}}1_{\{i_{n}=j_{m}\}} \frac{\partial ^{2}}{\partial {y}_{n,\mathbf{i}}\partial {y}_{m,\mathbf{j}}}p( {y}) \end{aligned}$$

for each \(p\in P\) and \(y\in T^{N}({\mathbb{R}}^{d})\), showing that \((S(B)^{N}_{0,t})_{t\geq 0}\) is a polynomial diffusion. Letting \(L_{1}\) be the first dual operator corresponding to \(L\), compute then

$$ L_{1}(e_{i_{1}}\underline{\otimes }\cdots \underline{\otimes }e_{i_{n}})= \textstyle\begin{cases} \frac{1}{2} 1_{\{i_{n-1}=i_{n}\}}e_{i_{1}}\underline{\otimes }\cdots \underline{\otimes }e_{i_{n-2}}&\quad \text{for $n\geq 2$}, \\ 0&\quad \text{else}, \end{cases} $$

and observe that for \(n\) even, we have

$$\begin{aligned} \exp (tL_{1})(e_{i_{1}}\underline{\otimes }\cdots \underline{\otimes }e_{i_{n}}) &=\sum _{\ell =0}^{\infty }\frac{t^{\ell }}{\ell !}L_{1}^{\ell }(e_{i_{1}} \underline{\otimes }\cdots \underline{\otimes }e_{i_{n}}) \\ &=\sum _{\ell =0}^{n/2} \frac{(t/2)^{\ell }}{\ell !}\prod _{k=0}^{\ell -1}1_{ \{i_{n-2k}=i_{n-2k-1}\}}e_{i_{1}}\underline{\otimes }\cdots \underline{\otimes }e_{i_{n-2\ell }}. \end{aligned}$$
(4.17)

Note here that \(L^{\ell }_{1}\) means an \(\ell \)-fold application of \(L_{1}\) and the empty product is equal to 1. Let us now compute the expectation of the \((i_{1}\cdots i_{n})\)-component of \(S_{t}^{(n)}\), i.e., \(\mathbb{E}[ \langle e_{i_{1}} \underline{\otimes }\cdots \underline{\otimes }e_{i_{n}}, S(B)^{N}_{0,t} \rangle ]\). By the dual moment formula, this equals \(\langle a_{t}, S(B)^{N}_{0,0} \rangle \), where \(a_{t}\) is the solution of \(\partial _{t} a_{t} =L_{1} a_{t}\) for \(a_{0} =e_{i_{1}} \underline{\otimes }\cdots \underline{\otimes }e_{i_{n}}\). Since \(a_{t}\) is given by the exponential in (4.17) and \(S(B)^{N}_{0,0}= {e^{\underline{\otimes }0}} +\sum _{n=1}^{N} \sum _{\mathbf{i}\in {\mathcal{I}}_{n}} 0 e_{i_{1}}\underline{\otimes }\cdots \underline{\otimes }e_{i_{n}}\), we conclude that

$$\begin{aligned} \mathbb{E}[ \langle e_{i_{1}} \underline{\otimes }\cdots \underline{\otimes }e_{i_{n}}, S(B)^{N}_{0,t} \rangle ]&= \langle \exp (tL_{1})(e_{i_{1}}\underline{\otimes }\cdots \underline{\otimes }e_{i_{n}}), S(B)^{N}_{0,0} \rangle \\ &= \frac{(t/2)^{(n/2)}}{(n/2)!}\prod _{k=0}^{n/2-1}1_{\{i_{n-2k}=i_{n-2k-1} \}}. \end{aligned}$$
(4.18)
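The action of \(L_{1}\) on basis words and the resulting formula (4.18) are easy to reproduce with a small word-combinatorics sketch (an illustration, not part of the proof): a word loses its last two letters, picking up a factor \(1/2\), precisely when these two letters coincide, and the expectation of a component is the coefficient of the empty word in \(\exp (tL_{1})\).

```python
import math
from collections import defaultdict

def L1(word_coeffs):
    """One application of the dual operator L1 on a linear combination of
    words (tuples of letters): strip the last two letters and multiply by
    1/2 when they are equal; otherwise the word is mapped to 0."""
    out = defaultdict(float)
    for word, c in word_coeffs.items():
        if len(word) >= 2 and word[-1] == word[-2]:
            out[word[:-2]] += 0.5 * c
    return dict(out)

def expected_component(word, t):
    """Coefficient of the empty word in exp(t L1) applied to `word`, i.e.,
    E[<e_{i1} x ... x e_{in}, S(B)_{0,t}>] by the dual moment formula."""
    coeffs = {tuple(word): 1.0}
    total = coeffs.get((), 0.0)  # l = 0 term of the exponential series
    ell = 0
    while coeffs:  # words shrink by two letters per step, so this terminates
        ell += 1
        coeffs = L1(coeffs)
        total += t**ell / math.factorial(ell) * coeffs.get((), 0.0)
    return total
```

For instance, the word \((1,1,2,2)\) survives two applications of \(L_{1}\) and yields \((t/2)^{2}/2!\), while any word with an odd letter pairing, such as \((1,2)\), has expectation 0, in agreement with (4.18).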

In words, the expectation of the \((i_{1}\cdots i_{n})\)-component of \(S_{t}^{(n)}\) is the coefficient of the basis element \({e^{\underline{\otimes }0}}\) in (4.17). All these arguments are for \(n\) even. A similar reasoning shows that the expectation of the \((i_{1}\cdots i_{n})\)-component of \(S_{t}^{(n)}\) is 0 for each \(n\) odd. Let now \(n=2k\). Then the only indices \((i_{1}\cdots i_{2k})\) which lead to nonzero terms in (4.18) are of the form \((j_{1}, j_{1}, j_{2}, j_{2}, \dots ,j_{k}, j_{k})\). Hence the only basis elements of \({\mathbb{E}}[S_{t}^{(2k)}]\) whose coefficient is nonzero are of the form \(e_{j_{1}} \underline{\otimes }e_{j_{1}} \underline{\otimes }e_{j_{2}} \underline{\otimes }e_{j_{2}}\underline{\otimes }\cdots \underline{\otimes }e_{j_{k}}\underline{\otimes }e_{j_{k}}\). Summing over all those yields

$$ \sum _{j_{1}=1}^{d}\sum _{j_{2}=1}^{d} \cdots \sum _{j_{k}=1}^{d} e_{j_{1}} \underline{\otimes }e_{j_{1}} \underline{\otimes }e_{j_{2}} \underline{\otimes }e_{j_{2}} \underline{\otimes }\cdots \underline{\otimes }e_{j_{k}}\underline{\otimes }e_{j_{k}}= \bigg(\sum _{i=1}^{d} e_{i}\underline{\otimes }e_{i}\bigg)^{ \underline{\otimes }k}, $$

and we can thus conclude that

$$ {\mathbb{E}}[S_{t}^{(n)}]= \textstyle\begin{cases} \frac{(t/2)^{k}}{k!}(\sum _{i=1}^{d} e_{i}\underline{\otimes }e_{i})^{ \underline{\otimes }k} &\quad \text{if $n=2k$ for some $k\in {\mathbb{N}}\cup {\{0\}}$,} \\ 0&\quad \text{otherwise.} \end{cases} $$

Summing over \(n\) yields \({\mathbb{E}}[S(B)^{N}_{0,t}]=\sum _{k=0}^{\lfloor N/2\rfloor } \frac{(t/2)^{k}}{k!}(\sum _{i=1}^{d} e_{i}\underline{\otimes }e_{i})^{ \underline{\otimes }k}\). Since no basis element appears in more than one term of the sum, we can replace \(N\) with \(\infty \) to get

$$ {\mathbb{E}}[S(B)_{0,t}]=\sum _{k=0}^{\infty }\frac{(t/2)^{k}}{k!} \bigg(\sum _{i=1}^{d} e_{i}\underline{\otimes }e_{i}\bigg)^{ \underline{\otimes }k}=\exp \bigg(\frac{t}{2}\sum _{i=1}^{d} e_{i} \underline{\otimes }e_{i}\bigg). $$

This result coincides with that of Friz and Hairer [41, Theorem 3.9].
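As a closing illustration (not part of the original argument), the formula \({\mathbb{E}}[S_{t}^{(2)}]=\frac{t}{2}\sum _{i=1}^{d}e_{i}\underline{\otimes }e_{i}\) can be checked by Monte Carlo, approximating the Stratonovich iterated integrals by the signature of piecewise-linear interpolations of Brownian paths; all discretisation parameters below are illustrative.

```python
import math
import random

def sig_levels_1_2(increments, d):
    """Levels 1 and 2 of the signature of the piecewise-linear path with the
    given list of R^d increments, accumulated via Chen's relation."""
    S1 = [0.0] * d
    S2 = [[0.0] * d for _ in range(d)]
    for dx in increments:
        for i in range(d):
            for j in range(d):
                # contribution of one linear piece to the second level
                S2[i][j] += S1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        for i in range(d):
            S1[i] += dx[i]
    return S1, S2

def mc_expected_level2(d=2, t=1.0, n_steps=50, n_paths=4000, seed=0):
    """Monte Carlo estimate of E[S_t^(2)] for d-dimensional Brownian motion;
    the target value is (t/2) times the identity matrix."""
    rng = random.Random(seed)
    h = math.sqrt(t / n_steps)
    acc = [[0.0] * d for _ in range(d)]
    for _ in range(n_paths):
        incs = [[rng.gauss(0.0, h) for _ in range(d)] for _ in range(n_steps)]
        _, S2 = sig_levels_1_2(incs, d)
        for i in range(d):
            for j in range(d):
                acc[i][j] += S2[i][j] / n_paths
    return acc
```

For piecewise-linear paths the symmetric part of the second level satisfies \(S_{2}+S_{2}^{\top }=S_{1}\otimes S_{1}\) exactly (the shuffle identity), which gives a deterministic consistency check; the Monte Carlo average should be close to \((t/2)\) on the diagonal and to 0 off the diagonal.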