1 Introduction

We formulate a search-trajectory optimization problem with multiple searchers looking for multiple targets. The searchers could be manned or unmanned aircraft and surface vessels with sensors designed to detect targets, such as pirates or terrorists, that pose a threat to commercial shipping and capital naval vessels. The searchers may be coast guard cutters and aircraft seeking survivors after a shipwreck. They may also be maritime patrol aircraft looking for drug smugglers in go-fast boats and self-propelled semi-submersible vessels. Regardless of the nature of the objects sought, we designate them “targets” for consistency. There is typically significant uncertainty about the characteristics of the targets, such as their location, heading, intent, and possibly even their existence. In this environment of uncertainty, we aim to optimize the trajectories of a given set of searchers such that the probability of detecting the targets, as defined precisely below, is maximized and the dynamical constraints on the searchers are obeyed.

The problem is a parameter-distributed optimal control model, where the searcher dynamics are given by ordinary differential equations (ODEs) and the targets follow conditionally deterministic trajectories. This means the target trajectories depend on unknown parameters that we treat as random variables, but given a realization of these parameters, the target trajectories are known. We focus on parameters with continuous probability distributions, but our framework easily extends to include discrete ones as well. Conditionally deterministic target trajectories result in an objective function with integration across the parameters as well as time. The parameter-distributed optimal control model is similar to those in [1–4], but in contrast to these studies, we consider multiple targets. We refer to [3, 5] for target models based on diffusion. We find models that relax the requirement of continuous searcher trajectories in [6–8]. These papers optimize a time-varying allocation of search effort across an area of interest without concern for the difficulty a real-world searcher may face in implementing this optimized plan. For a discussion of search in discrete time and space, we refer to [9, 10] and references therein.

In reality, targets may be intelligent and attempt to avoid or help the searchers, which, if fully accounted for, may lead to a differential game [11]. However, the presence of uncertainty about the target’s characteristics, location, situational awareness, and degree of rationality complicates the formulation of such games, and we opt, as in [1–4], for a probabilistic formulation where targets follow conditionally deterministic trajectories. Nevertheless, this formulation still allows for the modeling of some elements of “intelligent” target behavior.

Studies of search-trajectory optimization problems and related problems focus on the development of necessary optimality conditions (see, e.g., [2, 12–15]) in the tradition of Pontryagin and sufficient conditions for optimality (see, e.g., [5, 16]) in the tradition of Hamilton, Jacobi, and Bellman. It is generally challenging to determine search trajectories by solving these conditions directly. However, numerical approaches are proposed in [3, 5, 17–20]. Other studies on trajectory optimization problems have used direct methods, including direct collocation and direct shooting methods. For a survey of these methods, we refer to [21], while [22] considers direct collocation and [23] describes direct shooting.

We adopt a direct method for the solution of the parameter-distributed optimal control model and discretize time as well as parameter space to obtain a nonlinear program. With the exception of the recent efforts [1, 15, 24], we find no other attempt at solving search-trajectory optimization problems in this manner. We go beyond [1], which focuses on modeling a special target operating in a channel, and consider multiple targets following more general conditionally deterministic trajectories. In contrast to the exclusive focus on parameter-space discretization in [15, 24], we consider discretization of both time and parameter space.

We adopt an Euler discretization scheme for the solution of the ODEs governing the searchers’ movement and for the time integral in the objective function. Higher-order methods such as Runge–Kutta [25, 26] and pseudospectral methods [27, 28] could also be considered, but the inclusion of both time and parameter discretization and the need to consider stationary solutions add complications more easily overcome with an Euler scheme. The objective function includes an integral over the parameter space, which we handle using Simpson’s rule. We specifically deal with a two-dimensional parameter space, which is of particular interest in our naval application: The starting location of the group of targets is an unknown \((x, y)\)-coordinate, or the starting time and location along a coastline are unknown. In principle, an analysis of higher-dimensional parameter spaces follows arguments similar to those below. However, high-dimensional integration adds to the computational burden and requires sparse-grid and Monte Carlo methods beyond the scope of this paper; see, for example, [29, 30] for convergence of Monte Carlo approximations. We show that the discretization schemes result in approximate nonlinear programs consistent in the sense of Polak [31, Section 3.3]; see also “Appendix.” Consequently, optimal as well as stationary points of the nonlinear programs converge to corresponding points of the parameter-distributed optimal control model as the level of discretization tends to infinity. Convergence of stationary solutions is particularly important for this nonconvex model, and in fact, we derive a convenient optimality condition for the parameter-distributed optimal control model, which is closely related to the first-order optimality conditions of the nonlinear programs. In our companion paper [32], we use these results to develop an implementable, adaptive precision adjustment algorithm, which ensures convergence to stationary points of the parameter-distributed optimal control model.

The remainder of the paper is organized as follows. Section 2 defines the parameter-distributed optimal control model. Section 3 gives our assumptions and optimality conditions. Section 4 describes discretization schemes and shows consistency in the sense of Polak. Finally, Sect. 5 provides our conclusions.

2 Model Formulation and Notation

We consider K searchers and L targets during a normalized planning horizon [0, 1]. The lth target’s state trajectory \(\{y^l(t;\alpha )\in {{\mathbb {R}}}^{n_y}{:}\,0 \le t \le 1\}\) is conditional on a random vector \(\alpha \) that takes values in a compact set \(A \subset {\mathbb {R}}^2\) and has probability density \(\phi \). Consequently, given a realization \(\alpha \in A\), \(y^l(\cdot ; \alpha )\) is known. The target trajectories may be the output from complicated simulations accounting for weather, intent, and other factors. We could also augment \(\alpha \) to include any number of discrete random variables with only trivial changes in the development, but omit this extension here for notational simplicity. The random vector \(\alpha \) may represent uncertainty about the targets’ initial locations, the time of their attack, their velocity, radar cross section, and determination, as well as other factors. The distribution of \(\alpha \) is established based on intelligence and area analysis and may be rather “diffuse” in the absence of concrete leads about the targets.

We seek to optimize the state trajectories \(\{x^k(t)\in {\mathbb {R}}^{n_k}{:}\,0 \le t \le 1\}\), \(k=1, 2, \ldots , K\), of the searchers with respect to an objective function described next. We refer to \(x^k(t)\) as the “physical state” of the kth searcher at time t. Since we consider multiple searchers and targets, “probability of detection” becomes ambiguous. In this paper, we adopt an objective function that gives the probability that all of the searchers fail to detect any of the targets during the planning horizon. Alternative formulations are also possible; see [33] for details.

The probability of detection of target l by searcher k depends on the corresponding detection rate, which is denoted by \(r^{k,l}{:}\,{{\mathbb {R}}}^{n_k} \times {{\mathbb {R}}}^{n_y} \rightarrow [0,\infty [\). Informally stated, the detection rate is defined such that \(r^{k,l}(x^k,y^l)\varDelta t\) gives the probability that the kth searcher in state \(x^k \in {{\mathbb {R}}}^{n_k}\) detects the lth target in state \(y^l \in {{\mathbb {R}}}^{n_y}\) during a small time interval \([t, t +\varDelta t[\). The states for the searchers and targets typically involve their locations, headings, and velocities, but could also include other quantities that influence the detection such as time of day. The detection rate reflects a searcher’s effectiveness and often \(r^{k,l}(x^k,y^l)\) decreases as the “distance” between \(x^k\) and \(y^l\) increases. Given \(\alpha \in A\), \(\{x^k(t){:}\,0\le t\le 1\}\), \(\{y^l(t;\alpha ): 0\le t\le 1\}\), and \(t \in [0,1]\), we denote by \(q^{k,l}(t;\alpha )\) the probability that the kth searcher fails to detect the lth target during [0, t]. Then, under the assumption that detection events in nonoverlapping time intervals are independent, we find that

$$\begin{aligned} q^{k,l}(t+\varDelta t;\alpha ) = q^{k,l}(t;\alpha )\left( 1-\left( r^{k,l}(x^k(t), y^l(t;\alpha ))\varDelta t+o(\varDelta t)\right) \right) , \end{aligned}$$

with \(q^{k,l}(0;\alpha ) = 1\), which becomes the parameterized differential equation

$$\begin{aligned} \dot{q}^{k,l}(t;\alpha ) := \frac{d}{dt}q^{k,l}(t;\alpha ) = -q^{k,l}(t;\alpha )r^{k,l}(x^k(t), y^l(t;\alpha )), \ q^{k,l}(0;\alpha )=1, \end{aligned}$$

as \(\varDelta t \rightarrow 0\), with solution

$$\begin{aligned} q^{k,l}(t;\alpha ) = \exp \left( -\int _0^t r^{k,l}(x^k(s), y^l(s;\alpha )) \hbox {d}s\right) . \end{aligned}$$
(1)
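
To make (1) concrete, the following minimal sketch (our illustration, not part of the model) propagates the nondetection probability for one searcher-target pair along assumed trajectories and compares the Euler-type recursion above with the closed form (1); the detection-rate model is a hypothetical stand-in chosen for simplicity.

```python
import numpy as np

# Hypothetical detection rate (an assumption for illustration): decays
# with the squared searcher-target distance.
def rate(x, y, r0=5.0):
    return r0 / (1.0 + np.sum((x - y) ** 2))

N = 1000                                  # time steps on [0, 1]
t = np.linspace(0.0, 1.0, N + 1)

# Assumed trajectories: the searcher sweeps along the x-axis while the
# target, for one fixed realization alpha, moves north.
x_traj = np.stack([t, np.zeros_like(t)], axis=1)
y_traj = np.stack([np.ones_like(t), t - 0.5], axis=1)

# Recursion for q(t) = P(no detection during [0, t]).
q = 1.0
for j in range(N):
    q *= 1.0 - rate(x_traj[j], y_traj[j]) / N

# Closed form (1): q(1) = exp(-integral of the rate), with the integral
# approximated by a left Riemann sum on the same grid.
z = sum(rate(x_traj[j], y_traj[j]) for j in range(N)) / N
print(q, np.exp(-z))                      # the two values agree as N grows
```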

We assume that the searchers make independent detection attempts and can simultaneously detect multiple targets. Then, it follows from (1) that the conditional probability that no searcher detects any target during [0, 1], given \(\alpha \in A\), \(\{x^k(t){:}\,0\le t\le 1\}\), \(k = 1, 2,\ldots , K\), and \(\{y^l(t;\alpha ){:}\,0\le t\le 1\}\), \(l=1, 2, \ldots , L\), is simply

$$\begin{aligned} \prod _{k=1}^K \prod _{l=1}^L q^{k,l}(1;\alpha ) = \exp \left( - \int _0^1 \sum _{k=1}^K \sum _{l=1}^L r^{k,l}(x^k(t), y^l(t;\alpha ))\hbox {d}t\right) . \end{aligned}$$

Hence, the probability that all searchers fail to detect any target during [0, 1] takes the form

$$\begin{aligned} \int _{A} \exp \left( - \int _0^1 \sum _{k=1}^K \sum _{l=1}^L r^{k,l}(x^k(t), y^l(t;\alpha ))\hbox {d}t\right) \phi (\alpha ) \hbox {d}\alpha . \end{aligned}$$
(2)

Although not discussed further here, (2) is easily modified to account for the possibility that a target exists only with probability less than one.
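
To illustrate how (2) can be evaluated numerically, the sketch below (our illustration, with hypothetical trajectories and rate model) estimates the nondetection probability for one searcher and one target whose starting location \(\alpha \) is uniformly distributed on the unit square; the outer integral is approximated here by a simple midpoint rule, with the Simpson scheme of Sect. 4 as the alternative analyzed later.

```python
import numpy as np

def rate(x, y, r0=5.0):                   # illustrative detection rate
    return r0 / (1.0 + np.sum((x - y) ** 2))

N = 200                                   # time grid on [0, 1]
t = np.linspace(0.0, 1.0, N + 1)
x_traj = np.stack([t, 0.5 * np.ones_like(t)], axis=1)   # fixed searcher path

def nondetect_given_alpha(alpha):
    # Target starts at alpha and drifts east (conditionally deterministic).
    y_traj = np.stack([alpha[0] + 0.3 * t,
                       alpha[1] * np.ones_like(t)], axis=1)
    z = sum(rate(x_traj[j], y_traj[j]) for j in range(N)) / N
    return np.exp(-z)                     # inner integrand of (2)

# Outer integral over A = [0, 1]^2 with uniform density phi = 1,
# approximated by a tensor-product midpoint rule.
M = 21
nodes = (np.arange(M) + 0.5) / M
vals = [nondetect_given_alpha(np.array([a1, a2]))
        for a1 in nodes for a2 in nodes]
print(np.mean(vals))                      # estimate of (2)
```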

With the objective function (2) in place, model formulation follows in a straightforward manner with the specification of searcher dynamics and control spaces. We assume that the motion of the kth searcher is governed by the differential equation

$$\begin{aligned} \dot{x}^k(t) = h^k(x^k(t),u^k(t)), \ t \in [0,1], \ x^k(0) = \xi ^k, \end{aligned}$$
(3)

where \(h^k{:}\,{{\mathbb {R}}}^{n_k} \times {{\mathbb {R}}}^{m_k} \rightarrow {{\mathbb {R}}}^{n_k}\) describes the dynamics, \(\xi ^k\in {{\mathbb {R}}}^{n_k}\) is the initial condition, and \(u^k(t) \in {{\mathbb {R}}}^{m_k}\) is the control input for the kth searcher at \(t\in [0,1]\). Let \(n:=\sum _{k=1}^K n_k\), \(m := \sum _{k=1}^K m_k\), \(\xi := ((\xi ^1)^\top ,\ldots ,(\xi ^K)^\top )^{\top }\), \(x(t) := (x^1(t)^{\top },\ldots ,x^K(t)^{\top })^{\top }\), \(u(t) := (u^1(t)^{\top },\ldots ,u^K(t)^{\top })^{\top }\), and \(h(x(t),u(t)) := (h^1(x^1(t),u^1(t))^\top , \ldots , h^K(x^K(t),u^K(t))^\top )^\top \). We handle time-varying systems by using standard transcriptions of (3); see, for example, p. 493 in [31]. We carry out the analysis under the assumption (formalized in Sect. 3) that (3) has a unique solution for all \(k=1, 2, \ldots , K\). Given \(\eta := (\xi ,u)\), with \(\xi \in {{\mathbb {R}}}^{n}\) being an initial condition and \(u:[0,1]\rightarrow {\mathbb {R}}^m\) being the control, we denote by \(x^{\eta ,k}\) that solution and let \(x^{\eta } := ((x^{\eta ,1})^{\top }, \ldots , (x^{\eta ,K})^{\top })^{\top }\).
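
As one concrete instance of (3) (an assumption for illustration; the model does not prescribe particular dynamics), consider a planar constant-speed searcher whose single control is its heading rate. The sketch below integrates this model with the forward Euler scheme adopted in Sect. 4.

```python
import numpy as np

def h(x, u, speed=1.0):
    """Illustrative searcher dynamics: state x = (east, north, heading),
    scalar control u = heading rate, constant speed."""
    return np.array([speed * np.cos(x[2]), speed * np.sin(x[2]), u])

N = 100
xi = np.array([0.0, 0.0, 0.0])            # initial condition
u = 0.5 * np.ones(N)                      # piecewise-constant control
x = np.zeros((N + 1, 3))
x[0] = xi
for j in range(N):                        # forward Euler with step 1/N
    x[j + 1] = x[j] + h(x[j], u[j]) / N
print(x[-1])                              # searcher state at t = 1
```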

We then arrive at the parameter-distributed optimal control model

$$\begin{aligned} P{:}\,\min _{\eta \in \mathbf{H}_c}~~f(\eta ), \end{aligned}$$

where \(f{:}\,\mathbf{H}\rightarrow {\mathbb {R}}\), for any \(\eta \in \mathbf{H}\), is given by (2) with \(x^{k}(t)\) replaced by \(x^{\eta ,k}(t)\). Detailed definitions of the control spaces \(\mathbf{H}\) and \(\mathbf{H}_c\) are provided below. While not dealt with in P, fixing the initial conditions or constraining them to a convex set causes no complications, and the following analysis holds with minor changes. We find it analytically convenient to work with P, which optimizes over the initial condition and control only, instead of an essentially equivalent model that optimizes over initial condition, control, and state subject to the ODE constraints (3). Still, computational solutions can be obtained from both a discretization of P and an augmented problem with state variables and equality constraints.

As in [31, Chapts. 4 and 5], in order to establish continuity and differentiability of solutions of (3) with respect to the control u, it is easiest to assume that the controls are in \(L_{\infty }^m[0,1]\), the space of essentially bounded, measurable functions from [0, 1] into \({{\mathbb {R}}}^m\). However, when we extend optimality functions from problems defined on \({{\mathbb {R}}}^n\), which is a Hilbert space, it is more natural to assume that the controls are in \(L_2^m[0,1]\), the space of Lebesgue square-integrable functions from [0, 1] into \({{\mathbb {R}}}^m\). To resolve these competing requirements, we conduct our analysis in a subspace of the Hilbert space \({H_2 := {{\mathbb {R}}}^n \times L_2^m[0,1]}\), whose elements are initial condition and control pairs. For any \(\eta =(\xi ,u)\in H_2\), with \(\xi \in {{\mathbb {R}}}^n\) and \(u\in L_2^m[0,1]\), and any \(\eta '=(\xi ',u')\in H_2\), with \(\xi '\in {{\mathbb {R}}}^n\) and \(u'\in L_2^m[0,1]\), we define the inner product and norm on \(H_2\), respectively, by \(\langle \eta ,\eta '\rangle _{H_2} := \langle \xi ,\xi '\rangle + \langle u, u' \rangle _2\) and \(\Vert \eta \Vert _{H_2} := (\Vert \xi \Vert ^2 + \Vert u\Vert _2^2)^{1/2}\), where \(\langle \cdot , \cdot \rangle _2\) and \(\Vert \cdot \Vert _2\) are the \(L_2^m[0,1]\) inner product and norm, respectively. The only type of pointwise pure control constraint that we examine is of the form \(u(t)\in U_c\) for all \(t\in [0,1]\), where \(U_c \subset {{\mathbb {R}}}^m\) is convex and compact. This limits the control to a subset of \(L_{\infty }^m[0,1]\). We then define the subspace of \(H_2\) in which we carry out our analysis to be the pre-Hilbert space \(H_{\infty ,2} := {{\mathbb {R}}}^n \times L_{\infty ,2}^m[0,1]\), where \(L_{\infty ,2}^m[0,1]\) denotes the pre-Hilbert space with the same elements as \( L_\infty ^m[0,1]\), but equipped with \(\langle \cdot ,\cdot \rangle _2\) and \(\Vert \cdot \Vert _2\), the inner product and norm of \( L_2^m[0,1]\). Consequently, we optimize the initial condition and control pair \(\eta = (\xi ,u)\) over the feasible region

$$\begin{aligned} \mathbf{H}_c := {{\mathbb {R}}}^n \times \mathbf{U}_c \subset H_{\infty ,2}, \text{ with } \mathbf{U}_c := \left\{ u \in L_{\infty ,2}^m[0,1] {:}\,u(t) \in U_c, t \in [0,1] \right\} . \end{aligned}$$

We avoid difficulties related to differentiability by ensuring that functions are well defined on the slightly larger set

$$\begin{aligned} \mathbf{H} := {{\mathbb {R}}}^n \times \mathbf{U}, \text{ with } \mathbf{U} := \left\{ u \in L_{\infty ,2}^m[0,1] {:}\,u(t)\in U, t\in [0,1] \right\} , \end{aligned}$$

with \(U\subset {\mathbb {R}}^m\) strictly containing \(U_c\).

We close this section with additional notation. Given \(t \in [0,1]\), \(\alpha \in A\), and \(\{x(s){:}\,0\le s\le t\}\), let \(z(t;\alpha ) \in {{\mathbb {R}}}\) be a parametric “information state” that represents the cumulative detection rate up to time t; i.e., \(z(\cdot ;\alpha )\) is the solution of

$$\begin{aligned} \dot{z}(s;\alpha )=\sum _{k=1}^K \sum _{l=1}^L r^{k,l}\left( x^{k}(s),y^l(s;\alpha )\right) , \ \ s \in [0,t], ~z(0;\alpha ) = 0. \end{aligned}$$
(4)

Using a tilde to denote a combination of the physical and information states, we obtain for each \(\alpha \in A\) the augmented system

$$\begin{aligned} \dot{\tilde{z}}(t;\alpha ) = {\tilde{h}}(x(t),u(t);\alpha ), \ t \in [0,1], \end{aligned}$$
(5)

in the augmented state

$$\begin{aligned} {\tilde{z}}(t;\alpha ):= \left( \begin{array}{c} x(t) \\ z(t;\alpha ) \end{array}\right) \in {{\mathbb {R}}}^{n+1} \end{aligned}$$

with initial condition \({\tilde{z}}(0;\alpha ) = {\tilde{\xi }} := (\xi ^\top ,0)^{\top } \in {{\mathbb {R}}}^{n+1}\) and

$$\begin{aligned} {\tilde{h}}(x(t),u(t);\alpha ):= \left( \begin{array}{c} h(x(t),u(t)) \\ \sum \nolimits _{k=1}^K \sum \nolimits _{l=1}^L r^{k,l}(x^k(t),y^l(t;\alpha )) \end{array} \right) . \end{aligned}$$

We denote by \(\tilde{z}^{\eta }(\cdot ;\alpha ) := ((x^\eta )^\top ,z^\eta (\cdot ;\alpha ))^\top \) the solution of (5) for input \(\eta = (\xi ,u)\in \mathbf{H}\) (whose existence and uniqueness we ensure below). In this notation, the inner integrand in (2) simplifies to \(\exp (-z(1;\alpha ))\). Hence, with \({\tilde{f}}(\cdot ;\alpha ){:}\,\mathbf H\rightarrow {\mathbb {R}}\) defined by \({\tilde{f}}(\eta ;\alpha ) := \exp (-z^\eta (1;\alpha ))\), we obtain the objective function

$$\begin{aligned} f(\eta ) = \int _{A} {\tilde{f}}(\eta ;\alpha ) \phi (\alpha )\mathrm{d}\alpha . \end{aligned}$$
(6)

3 Assumptions and Optimality Conditions

We aim to construct a discretized approximation in both time and parameter space with the property that, as the discretization is refined, stationary points of the approximating problems converge to a stationary point of the original problem P. One approach could have been to develop a Pontryagin-type condition for the problem and then examine the relationship between the KKT conditions of the discretized problem, as the discretization is refined, and the Pontryagin condition on the original problem. This appears to be a difficult approach, particularly in developing consistency results; for steps in this direction, we refer to [15, 24]. Instead, we develop a new optimality condition that is easier to relate to standard nonlinear programming optimality conditions. Previous optimality conditions for search-trajectory optimization problems in the literature (see, e.g., [2, 5]) appear incompatible with first-order optimality conditions for nonlinear programs obtained by discretization of such problems. Hence, we are unable to develop consistent approximations in the sense of Polak (see “Appendix”) using conditions from the literature. In the spirit of Polak [31, Chapts. 4 and 5], we develop the new optimality condition for P as the limiting case of the discretized problems, basing it on the Gâteaux differential of f; as we show in the subsequent section, this condition relates directly to a necessary optimality condition for a discretized version of P.

We first impose assumptions that ensure existence, uniqueness, and differentiability of the solution of (5), and that f is well defined and differentiable. Here, we let \(h_x^k\) denote the \(n_k \times n_k\) matrix function of partial derivatives with element (ij) given by \(\partial h^k_i/\partial x_j^k\), and let \(h_u^k\) denote the \(n_k \times m_k\) matrix function of partial derivatives with element (ij) given by \(\partial h^k_i/\partial u_j^k\). Moreover, let \({\tilde{h}}_x(x(t),u(t);\alpha )\) be the \((n+1) \times n\) block-diagonal matrix with \(h_x^1(x^1(t),u^1(t))^{\top }\), ..., \(h_x^K(x^K(t),u^K(t))^{\top }\) along its main diagonal and \(\sum _{k=1}^K \sum _{l=1}^L \nabla _x r^{k,l}(x^k(t),y^l(t;\alpha ))^{\top }\) as its final row, and let \({\tilde{h}}_u(x(t),u(t);\alpha )\) be the \((n+1) \times m\) block-diagonal matrix with \(h_u^1(x^1(t),u^1(t))^{\top }\), ..., \(h_u^K(x^K(t),u^K(t))^{\top }\) along its main diagonal and 0 as its final row. For any matrix \({\mathcal {M}}\), we let \(\Vert {\mathcal {M}}\Vert := \max _{\Vert v\Vert =1} \Vert {\mathcal {M}}v \Vert \).

Assumption 3.1

For all \(k=1,2,\ldots ,K\) and \(l=1,2,\ldots ,L\),

  1. (i)

    \(h^k\), \(r^{k,l}\), \(\phi \), and \(y^l\) are continuously differentiable and

  2. (ii)

    there exists \(C \in [1,\infty [\) such that, for all \(x^\prime \), \(x^{\prime \prime } \in {{\mathbb {R}}}^n\), \(v^\prime \), \(v^{\prime \prime } \in U\), and \(\alpha \in A\),

    $$\begin{aligned} \Vert {\tilde{h}}(x^\prime ,v^\prime ;\alpha )-{\tilde{h}}(x^{\prime \prime },v^{\prime \prime };\alpha )\Vert\le & {} C \left[ \Vert x^\prime -x^{\prime \prime }\Vert +\Vert v^\prime -v^{\prime \prime }\Vert \right] , \nonumber \\ \Vert {\tilde{h}}_x(x^\prime ,v^\prime ;\alpha )-{\tilde{h}}_x(x^{\prime \prime } ,v^{\prime \prime };\alpha )\Vert\le & {} C \left[ \Vert x^\prime -x^{\prime \prime }\Vert +\Vert v^\prime -v^{\prime \prime }\Vert \right] , \nonumber \\ \Vert {\tilde{h}}_u(x^\prime ,v^\prime ;\alpha )-{\tilde{h}}_u(x^{\prime \prime } ,v^{\prime \prime };\alpha )\Vert\le & {} C \left[ \Vert x^\prime -x^{\prime \prime }\Vert +\Vert v^\prime -v^{\prime \prime }\Vert \right] . \end{aligned}$$
    (7)

We note that (7) implies that there exists a \(C^{\prime } < \infty \) such that, for all \(x^\prime \in {{\mathbb {R}}}^n\), \(v \in U\), and \(\alpha \in A\),

$$\begin{aligned} \Vert {\tilde{h}}(x^\prime ,v;\alpha )\Vert \le C^{\prime }\left[ \Vert x^\prime \Vert +1\right] . \end{aligned}$$
(8)

The assumptions about the detection rate function, \(r^{k,l}(\cdot ,\cdot )\), in Assumption 3.1 are not overly restrictive as they allow for the use of many types of sensor models and are similar to those used in other studies (see, e.g., [1]). Assumption 3.1 includes standard assumptions similar to those adopted in Assumption 5.6.2 in [31]. Assumption 3.1(ii) guarantees a unique solution to the differential equations governing the searcher dynamics given by (5). It suffices for Assumption 3.1 to hold on compact sets if the state remains in a compact set. In Sect. 4, we impose further assumptions on the detection rate function and target trajectories.

We next show that f is Gâteaux differentiable on \(\mathbf{H}_c\), and start with the following intermediate result, which is deduced from results in [31, Chap. 5].

Lemma 3.1

Suppose that Assumption 3.1 holds. For any \(\alpha \in A\),

  1. (i)

    \({\tilde{f}}(\cdot ;\alpha )\) has a Gâteaux differential \(D {\tilde{f}}(\eta ; \alpha ;\delta \eta )\) at \(\eta \in \mathbf{H}_c\) for all \(\delta \eta \in H_{\infty ,2}\), which is given by

    $$\begin{aligned} D{\tilde{f}}(\eta ; \alpha ;\delta \eta ) = \left\langle \nabla _{\eta } {\tilde{f}}(\eta ; \alpha ), \delta \eta \right\rangle _{H_2}, \end{aligned}$$

    where the gradient \(\nabla _{\eta } {\tilde{f}}(\eta ; \alpha ) := (\nabla _\xi {\tilde{f}}(\eta ; \alpha ), \nabla _u {\tilde{f}}(\eta ;\alpha ))^{\top } \in H_{\infty ,2}\) and

    $$\begin{aligned} \nabla _\xi {\tilde{f}}(\eta ; \alpha ):= & {} p^{\eta }(0;\alpha ), \text{ and } \nabla _u {\tilde{f}}(\eta ;\alpha )(t) := {\tilde{h}}_u(x^\eta (t),u(t);\alpha )^{\top } p^{\eta }(t;\alpha ), \end{aligned}$$

    \(t \in [0,1]\), and \(p^{\eta }(t;\alpha )\) is the solution of the adjoint equation

    $$\begin{aligned} \dot{p}^{\eta }(t;\alpha )= & {} - {\tilde{h}}_{x}(x^\eta (t),u(t);\alpha )^{\top } p^{\eta }(t;\alpha ), \quad t \in [0,1],\\ p^{\eta }(1;\alpha )= & {} \left( 0,\ldots ,0,-\exp \left( - z^{\eta }(1;\alpha )\right) \right) ^{\top } \in {{\mathbb {R}}}^{n+1}, \end{aligned}$$
  2. (ii)

    \({\tilde{f}}(\cdot ;\alpha )\) and \(\nabla _{\eta } {\tilde{f}}(\cdot ;\alpha )\) are Lipschitz \(\mathbf H\)-continuous on bounded subsets of \(\mathbf{H}_c\) with Lipschitz constants \(\sqrt{2} C e^C\) and \(e^{C^{\prime }} (\exp (\sqrt{2} C e^C ) + C (\sqrt{2} C e^C + 1 ) )\), respectively, where C is as in Assumption 3.1 and \(C^{\prime } < \infty \) is as in (8).

Proof

Part (i) results from an application of Corollary 5.6.9 in [31]. For part (ii), we deduce from Lemma 5.6.7 in [31] that \(z^{\eta }(1;\alpha )\) is Lipschitz \(\mathbf H\)-continuous as a function of \(\eta \) with Lipschitz constant \(\sqrt{2} C e^{C}\). Since \(z^{\eta }(1;\alpha ) \ge 0\) for all \(\eta \in \mathbf{H}_c\) and \(\alpha \in A\), it follows that \({\tilde{f}}(\cdot ;\alpha )\) is Lipschitz \(\mathbf H\)-continuous with Lipschitz constant \(\sqrt{2} C e^{C}\) because the magnitude of the slope of the exponential function with an argument in the domain \(]-\infty , 0]\) is bounded by one. The statement about the gradient follows from an application of Corollary 5.6.9 and the proof of Lemma 5.6.7 in [31]. \(\square \)

Proposition 3.1

Suppose that Assumption 3.1 holds. Then, for any \(\eta \in \mathbf{H}_c\) and \({\delta \eta \in H_{\infty ,2}}\), f has a Gâteaux differential \(Df(\eta ;\delta \eta ) = \left\langle \nabla f(\eta ), \delta \eta \right\rangle _{H_2}\) at \(\eta \), where the gradient \(\nabla f(\eta ) = (\nabla _\xi f(\eta ), \nabla _u f(\eta ))^\top \in H_{\infty ,2}\) is given by

$$\begin{aligned} \nabla _\xi f(\eta )= & {} \int _{A} \nabla _{\xi } {\tilde{f}}(\eta ;\alpha )\phi (\alpha )\mathrm{d}\alpha , \text{ and } \nabla _u f(\eta )(t) = \int _{A} \nabla _{u} {\tilde{f}}(\eta ;\alpha )(t)\phi (\alpha )\mathrm{d}\alpha , \end{aligned}$$

\(t\in [0,1]\), and \(\nabla f{:}\,\mathbf{H}_c\rightarrow H_{\infty ,2}\) is Lipschitz \(\mathbf H\)-continuous on bounded subsets of  \(\mathbf{H}_c\).

Proof

The expression for the gradient \(\nabla f(\eta )\) follows from a standard application of the dominated convergence theorem; see the proof of Proposition III.5 in [33]. We consider the Lipschitz result in more detail. Let \(\eta , \eta '\in \mathbf{H}_c\) be arbitrary. For any positive integer M, \(w_i^M\in ]0,\infty [\), and \(\alpha ^i\in A\), \(i = 1, 2, \ldots , M\), positive homogeneity, the triangle inequality, and Lemma 3.1(ii) yield

$$\begin{aligned}&\left\| \sum _{i=1}^M w_i^M \nabla _\eta {\tilde{f}}(\eta ; \alpha ^i) - \sum _{i=1}^M w_i^M \nabla _\eta {\tilde{f}}(\eta '; \alpha ^i) \right\| _{H_2}\nonumber \\&\quad \le \sum _{i=1}^M w_i^M \left\| \nabla _\eta {\tilde{f}}(\eta ; \alpha ^i) - \nabla _\eta {\tilde{f}}(\eta '; \alpha ^i) \right\| _{H_2}\nonumber \\&\quad \le \sum _{i=1}^M w_i^M e^{C^{\prime }} \left( \exp \left( \sqrt{2} C e^{C}\right) + C \left( \sqrt{2} C e^{C} + 1 \right) \right) \left\| \eta - \eta ' \right\| _{H_2}. \end{aligned}$$
(9)

Since each component of \(\nabla _\eta {\tilde{f}}(\eta ; \alpha )\) is continuously differentiable as a function of \(\alpha \) and those partial derivatives are bounded with a constant that holds for all \(\alpha \in A\) and all \(\eta \) in a bounded subset of \(\mathbf{H}_c\), numerical integration theory implies that for appropriate choices of \(\alpha ^i\) and \(w_i^M\), \(i = 1, \ldots , M\), \(\sum _{i=1}^M w_i^M\) is bounded and (9) tends to \(\Vert \nabla f(\eta ) - \nabla f(\eta ')\Vert _{H_2}\), as \(M\rightarrow \infty \). The conclusion then follows. \(\square \)

We state optimality conditions in terms of zeros of optimality functions (see Definition 1 of “Appendix” and [31, Section 4.2]) and define the nonpositive optimality function \(\theta {:}\,\mathbf{H}_c \rightarrow {{\mathbb {R}}}\) as

$$\begin{aligned} \theta (\eta ) := \min _{\eta ^{\prime } \in \mathbf{H}_c} \left\langle \nabla f(\eta ), \eta ^{\prime }-\eta \right\rangle _{H_2} + \frac{1}{2} \left\| \eta ^{\prime } - \eta \right\| ^2_{H_2}. \end{aligned}$$

This approach leads to the following optimality condition for P:

Proposition 3.2

Suppose that Assumption 3.1 is satisfied.

  1. (i)

    \(\theta \) is \(\mathbf{H}_c\)-continuous.

  2. (ii)

    If \(\hat{\eta } \in \mathbf{H}_c\) is a local minimizer of P, then \(\theta (\hat{\eta }) = 0\).

Proof

The results follow from the same arguments as in the proof of Theorem 4.2.3 in [31]. \(\square \)

We note that \(\theta \) is an extension, with a quadratic regularization term, of the classical optimality condition in finite dimensions, which states that if \(x^*\) is a local minimizer of a function f over a nonempty, convex set \(X \subset {{\mathbb {R}}^n}\), then \(\min _{x\in X} \langle \nabla f (x^*), x-x^*\rangle = 0\).
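
Since the minimization defining \(\theta \) is a strongly convex quadratic in \(\eta ^{\prime }\), completing the square shows that the minimizer is the projection of \(\eta - \nabla f(\eta )\) onto the feasible region; for box constraints, this projection is a componentwise clipping. The finite-dimensional sketch below (our illustration of the Euclidean analog) evaluates \(\theta \) in this way.

```python
import numpy as np

def theta(eta, grad, lo, hi):
    """Euclidean analog of the optimality function above: the minimizer of
    <grad, d> + 0.5*||d||^2 over the box [lo, hi] shifted by eta is the
    projection of eta - grad onto the box, i.e., componentwise clipping."""
    eta_star = np.clip(eta - grad, lo, hi)
    d = eta_star - eta
    return float(grad @ d + 0.5 * d @ d)

# Toy check with f(eta) = 0.5*||eta||^2 on the box [-1, 1]^2: theta vanishes
# at the feasible minimizer 0 and is negative at nonstationary points.
grad_f = lambda eta: eta
print(theta(np.zeros(2), grad_f(np.zeros(2)), -1.0, 1.0))   # 0.0
print(theta(np.ones(2), grad_f(np.ones(2)), -1.0, 1.0))     # -1.0
```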

4 Discretization and Consistency

In this section, we introduce discretization schemes that lead to a family of approximating problems. We show that these problems are consistent in the sense of Polak, which we show in our companion paper [32] leads to an implementable algorithm for P. We first deal with time discretization and follow closely the development in Sections 4.3 and 5.6 of [31]. The development in this subsection leads to standard results, but it is included here as a foundation for what follows. The second subsection handles discretization of the parameter space.

4.1 Time Discretization

We let \({\mathbb {N}}\) denote the positive integers and \({{\mathcal {N}}} := \{ 2^j \}_{j=1}^{\infty }\). To begin our development, we define an infinite set of finite-dimensional subspaces \(H_N \subset H_{\infty ,2}\), whose union is dense in \(H_{\infty ,2}\). For \(N \in {\mathcal {N}}\) and \(j = 0,1, \ldots ,N-1\), let the functions \(\pi _{N,j}{:}\,[0,1] \rightarrow {{\mathbb {R}}}\) be defined by

$$\begin{aligned} \pi _{N,j} (t) := \left\{ \begin{array}{ll} \sqrt{N}, &{} \text{ if } t \in [j/N, (j+1)/N[ \text{ and } j \le N-2, \\ \sqrt{N}, &{} \text{ if } t \in [j/N, (j+1)/N] \text{ and } j = N-1, \\ 0, &{} \text{ otherwise. } \end{array} \right. \end{aligned}$$
(10)

We also define the subspace \(L_N \subset L_{\infty ,2}^m[0,1]\), by

$$\begin{aligned} L_N := \left\{ u \in L^m_{\infty ,2}[0,1]{:}\,u(\cdot ) = \sum _{j=0}^{N-1} {\bar{u}}_j \pi _{N,j}(\cdot ), \ {\bar{u}}_j \in {{\mathbb {R}}}^m , \ j=0,1, \ldots ,N-1 \right\} , \end{aligned}$$
(11)

and let

$$\begin{aligned} H_N := {{\mathbb {R}}}^n \times L_N \subset H_{\infty ,2}. \end{aligned}$$

The functions \(\pi _{N,j} (\cdot )\) form a basis for \(L_N\) and are defined such that the relation between \(H_N\) and the Euclidean space of coefficients used for numerical computation is isometric. We use the function space \(H_N\) for proofs of consistency of approximation, and the Euclidean space of coefficients to develop an implementable algorithm. The definition of the Euclidean space of coefficients, \({\bar{H}}_N\), and presentation of the implementable algorithm are provided in our companion paper [32]. We also define

$$\begin{aligned} \mathbf{H}_N := \mathbf{H} \cap H_N \quad \text{ and } \quad \mathbf{H}_{c,N} := \mathbf{H}_c \cap H_N. \end{aligned}$$

From the definition of \({\mathcal {N}}\), we see that the subspaces \(\{H_N\}_{N\in {{\mathcal {N}}}}\) have a desirable nested structure: for any \(N, N^{\prime } \in {\mathcal {N}}\) with \(N^{\prime } > N\), we have \(H_N \subset H_{N^{\prime }}\).
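
The \(\sqrt{N}\) scaling in (10) gives each basis function unit \(L_2\) norm with disjoint supports, so \(\Vert u\Vert _2\) equals the Euclidean norm of the coefficient vector in (11); this is the isometry referred to above. A quick numerical check (our illustration, with scalar controls):

```python
import numpy as np

N = 8
u_bar = np.random.default_rng(0).normal(size=N)    # coefficients in (11)

# u(t) = sum_j u_bar[j] * pi_{N,j}(t), with pi_{N,j} = sqrt(N) on its
# interval; evaluate u on a fine grid and compare norms.
t = np.linspace(0.0, 1.0, 100 * N, endpoint=False)
j = np.minimum((N * t).astype(int), N - 1)         # active basis index
u = np.sqrt(N) * u_bar[j]

l2_norm = np.sqrt(np.mean(u ** 2))                 # approximates ||u||_2
print(l2_norm, np.linalg.norm(u_bar))              # equal: the map is isometric
```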

We make use of Proposition 4.3.1 from [31], included here for completeness, and use the notation \(\rightarrow ^{{\mathcal {N}}}\) to indicate convergence of the subsequence indexed by \({\mathcal {N}}\).

Proposition 4.1

[31] \(\mathbf{H}_{c,N} \rightarrow ^{{\mathcal {N}}} \mathbf{H}_c\), as \(N \rightarrow \infty \), where set convergence is in the sense of Painlevé–Kuratowski.

We now consider the approximate solution of (3) by means of the forward Euler method. For any \(\eta =(\xi ,u)\in H_N\) and \(N\in {\mathcal {N}}\), we set \(x_N^{\eta ,k}(0)=\xi ^k\) and

$$\begin{aligned} x_N^{\eta ,k}\left( (j+1)/N\right) - x_N^{\eta ,k}\left( j/N\right) = \frac{1}{N}h^k \left( x_N^{\eta ,k}(j/N),u^k(j/N)\right) \end{aligned}$$
(12)

for \(j\!=\!0,1,\ldots ,N-1\), \(k\!=\!1,2,\ldots ,K\). Furthermore, \(x_N^{\eta }\!:=\! ( (x_N^{\eta ,1})^{\top },\ldots ,(x_N^{\eta ,K})^{\top })^{\top }\). Simultaneously, we approximately solve (4), also by the forward Euler method. For any \(\eta =(\xi ,u)\in H_N\), \(\alpha \in A\), and \(N\in {\mathcal {N}}\), we set \(z_N^{\eta }(0;\alpha )=0\) and

$$\begin{aligned} z_N^{\eta }\left( (j+1)/N;\alpha \right) - z_N^{\eta }\left( j/N;\alpha \right) = \frac{1}{N}\sum _{k=1}^K \sum _{l=1}^L r^{k,l}\left( x_N^{\eta ,k}(j/N),y^l(j/N;\alpha )\right) \end{aligned}$$
(13)

for \(j=0,1,\ldots ,N-1\).
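
The recursions (12) and (13) advance together in a single loop, as in the following sketch (our illustration, reusing the hypothetical dynamics and detection-rate models from the earlier snippets); it returns \(\exp (-z_N^{\eta }(1;\alpha ))\), the integrand of the approximate objective defined next.

```python
import numpy as np

def h(x, u, speed=1.0):                   # illustrative searcher dynamics
    return np.array([speed * np.cos(x[2]), speed * np.sin(x[2]), u])

def rate(x, y, r0=5.0):                   # illustrative detection rate
    return r0 / (1.0 + np.sum((x[:2] - y) ** 2))

def y_tgt(t, alpha):                      # conditionally deterministic target
    return np.array([alpha[0] + 0.3 * t, alpha[1]])

def z_terminal(xi, u, alpha, N):
    """Forward Euler recursions (12) and (13); returns z_N(1; alpha)."""
    x, z = xi.copy(), 0.0
    for j in range(N):
        z += rate(x, y_tgt(j / N, alpha)) / N      # recursion (13)
        x = x + h(x, u[j]) / N                     # recursion (12)
    return z

N = 200
xi, u = np.array([0.0, 0.0, 0.0]), np.zeros(N)
alpha = np.array([0.5, 0.2])
print(np.exp(-z_terminal(xi, u, alpha, N)))   # integrand of f_N defined next
```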

Using the discretized “information state” given by the recursion (13), we define the approximate objective functions \(f_{N}{:}\,\mathbf{H}_N\rightarrow {{\mathbb {R}}}\) for any \(\eta \in \mathbf{H}_N\) and \(N \in {\mathcal {N}}\) by

$$\begin{aligned} f_{N}(\eta ) := \int _{\alpha \in A} {\tilde{f}}_{N}(\eta ;\alpha ) \phi (\alpha )\mathrm{d}\alpha , \end{aligned}$$
(14)

where, for any \(\alpha \in A\), \({\tilde{f}}_{N}(\cdot ;\alpha ){:}\,\mathbf{H}_{c,N} \rightarrow {{\mathbb {R}}}\) is defined by \({\tilde{f}}_{N}(\eta ;\alpha ) := \exp (-{z}_N^{\eta }(1;\alpha ))\).

For any \(N\in {\mathcal {N}}\), the approximating problem then takes the form

$$\begin{aligned} P_{N}{:}\,\min _{\eta \in \mathbf{H}_{c,N}}~~f_{N}(\eta ). \end{aligned}$$

We find that \(f_{N}(\cdot )\) and \({\tilde{f}}_N(\cdot ;\alpha )\), \(N \in {{\mathcal {N}}}\), are Gâteaux differentiable and that their gradients are Lipschitz continuous; these facts follow in a manner similar to Lemma 3.1 and Proposition 3.1:

Lemma 4.1

Suppose that Assumption 3.1 is satisfied and that \(N \in {\mathcal {N}}\). For any \(\alpha \in A\),

  1. (i)

    \({\tilde{f}}_{N}(\cdot ;\alpha )\) has a Gâteaux differential \(D{\tilde{f}}_N(\eta ; \alpha ; \delta \eta )\) at \(\eta \in \mathbf{H}_{c,N}\) for all \(\delta \eta \in H_N\), which is given by

    $$\begin{aligned} D{\tilde{f}}_N(\eta ; \alpha ;\delta \eta ) = \left\langle \nabla _{\eta } {\tilde{f}}_N(\eta ; \alpha ), \delta \eta \right\rangle _{H_2}, \end{aligned}$$

    where the gradient \(\nabla {\tilde{f}}_{N}(\eta ;\alpha ) := \left( \nabla _\xi {\tilde{f}}_{N}(\eta ;\alpha ), \ \nabla _{u} {\tilde{f}}_{N}(\eta ;\alpha )\right) \in H_N\) is

    $$\begin{aligned} \nabla _\xi {\tilde{f}}_{N}(\eta ;\alpha ) := p^{\eta }_{N}(0;\alpha ), \text{ and } \nabla _u {\tilde{f}}_{N}(\eta ;\alpha )(t) := \sum _{j=0}^{N-1} \gamma _{N}^{\eta } (j/N;\alpha ) \pi _{N,j}(t), \end{aligned}$$

    \(t\in [0,1]\), and

    $$\begin{aligned} \gamma _{N}^{\eta } (j/N;\alpha ) = \frac{1}{\sqrt{N}} \tilde{h}_u(x_N^\eta (j/N),u(j/N);\alpha )^{\top } p^{\eta }_{N}((j+1)/N;\alpha ), \end{aligned}$$

    \(j = 0,1,\ldots ,N-1\), with \(p_{N}^{\eta }(\cdot ;\alpha )\) determined by the adjoint equation

    $$\begin{aligned}&p_{N}^{\eta }(j/N;\alpha )-p_{N}^{\eta }((j+1)/N;\alpha ) \\&\quad =\frac{1}{N} \tilde{h}_{x}(x_N^\eta (j/N),u(j/N);\alpha )^{\top } p_{N}^{\eta }((j+1)/N;\alpha ),\\&\quad j = 0,1, \ldots ,N-1,\\&\quad p_{N}^{\eta }(1;\alpha )= \left( 0,\ldots ,0,-\exp \left( - z_N^{\eta }(1;\alpha )\right) \right) ^{\top } \in {{\mathbb {R}}}^{n+1}. \end{aligned}$$
  2. (ii)

    \({\tilde{f}}_N(\cdot ;\alpha )\) and \(\nabla {\tilde{f}}_N(\cdot ;\alpha )\) are Lipschitz \(\mathbf{H}\)-continuous on bounded subsets of  \(\mathbf{H}_{c,N}\).
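
Computationally, Lemma 4.1 amounts to a forward sweep of (12)-(13) followed by a backward adjoint sweep. The sketch below (our illustration, with the hypothetical single-searcher models used earlier) returns the coefficients \(\gamma _N^{\eta }(j/N;\alpha )\) together with the physical components of \(p_N^{\eta }(0;\alpha )\); since the initial information state is fixed at zero, only the first n components carry a free initial condition.

```python
import numpy as np

# Discrete adjoint of Lemma 4.1 for the illustrative single-searcher,
# single-target models used in the earlier sketches (all hypothetical).
speed, r0 = 1.0, 5.0
h = lambda x, u: np.array([speed * np.cos(x[2]), speed * np.sin(x[2]), u])
rate = lambda x, y: r0 / (1.0 + np.sum((x[:2] - y) ** 2))
y_tgt = lambda t, a: np.array([a[0] + 0.3 * t, a[1]])   # target trajectory

def h_tilde_x(x, t, a):
    """(n+1) x n Jacobian of the augmented right-hand side in (5)."""
    J = np.zeros((4, 3))
    J[0, 2], J[1, 2] = -speed * np.sin(x[2]), speed * np.cos(x[2])
    d = x[:2] - y_tgt(t, a)
    J[3, :2] = -2.0 * r0 * d / (1.0 + d @ d) ** 2       # nabla_x of the rate
    return J

def h_tilde_u(x):
    J = np.zeros((4, 1))
    J[2, 0] = 1.0                                       # heading-rate control
    return J

def gradient(xi, u, a, N):
    # Forward sweep: recursions (12)-(13), storing the state history.
    xs, x, z = [xi.copy()], xi.copy(), 0.0
    for j in range(N):
        z += rate(x, y_tgt(j / N, a)) / N
        x = x + h(x, u[j]) / N
        xs.append(x.copy())
    # Backward sweep: adjoint recursion with the terminal condition of
    # Lemma 4.1; only the physical components of p change over time.
    p = np.zeros(4)
    p[3] = -np.exp(-z)
    gamma = np.zeros(N)
    for j in reversed(range(N)):
        gamma[j] = (h_tilde_u(xs[j]).T @ p)[0] / np.sqrt(N)
        p[:3] += h_tilde_x(xs[j], j / N, a).T @ p / N
    return p[:3], gamma

g_xi, g_u = gradient(np.array([0.0, 0.0, 0.0]), np.zeros(200),
                     np.array([0.5, 0.2]), 200)
print(g_xi, g_u[:3])
```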

Proposition 4.2

Suppose that Assumption 3.1 is satisfied and \(N \in {\mathcal {N}}\). For any \({\eta \in \mathbf{H}_{c,N}}\) and \(\delta \eta \in H_N\), \(f_{N}\) has a Gâteaux differential \(Df_{N}(\eta ; \delta \eta ) = \left\langle \nabla f_{N}(\eta ), \delta \eta \right\rangle _{H_2}\), where the gradient \(\nabla f_N(\eta ) := (\nabla _\xi f_N(\eta ), \nabla _u f_N(\eta ))^\top \in H_N\) is given by

$$\begin{aligned} \nabla _\xi f_N(\eta ):= & {} \int _{A} \nabla _{\xi } {\tilde{f}}_N(\eta ;\alpha )\phi (\alpha )\mathrm{d}\alpha ,\\ \nabla _u f_N(\eta )(t):= & {} \int _{A} \nabla _{u} {\tilde{f}}_N(\eta ;\alpha )(t)\phi (\alpha )\mathrm{d}\alpha , ~t\in [0,1] \end{aligned}$$

and \(\nabla f_N{:}\,\mathbf{H}_{c,N}\rightarrow H_N\) is Lipschitz \(\mathbf H\)-continuous on bounded subsets of  \(\mathbf{H}_{c,N}\).

As in Sect. 3, we state an optimality condition for \(P_N\) in terms of zeros of an optimality function. For any \(N \in {\mathcal {N}}\), we define the nonpositive optimality function \(\theta _{N}{:}\,\mathbf{H}_{c,N} \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} \theta _{N} (\eta ) := \min _{\eta ^{\prime } \in \mathbf{H}_{c,N}} \left\langle \nabla f_{N}(\eta ), \eta ^{\prime }-\eta \right\rangle _{H_2} + \frac{1}{2} \left\| \eta ^{\prime } - \eta \right\| ^2_{H_2}, \end{aligned}$$

which characterizes stationary points of \(P_{N}\) as follows, where the proof is omitted due to its similarity to that of Proposition 3.2.

Proposition 4.3

Suppose that Assumption 3.1 is satisfied.

  1. (i)

    \(\theta _{N}\) is \(\mathbf{H}_{c,N}\)-continuous.

  2. (ii)

    If \(\hat{\eta } \in \mathbf{H}_{c,N}\) is a local minimizer of \(P_{N}\), then \(\theta _{N}(\hat{\eta }) = 0\).

Using an approach similar to that found in Section 3.3 of [31], we next show that the pairs \((P_{N},\theta _{N})\) in the sequence \(\{(P_{N},\theta _{N})\}_{N \in {\mathcal {N}}}\) are consistent approximations in the sense of Polak; see Definition 2 of “Appendix.” We begin with the following intermediate result, whose proof can be found in [33].

Proposition 4.4

Suppose that Assumption 3.1 is satisfied. For every bounded subset \(S \subset \mathbf{H}_c\), there exist constants \(C_0^S, K_0^S < \infty \) such that, for every \(\eta \in S \cap H_N\) and \(N \in {{\mathcal {N}}}\),

$$\begin{aligned} \left| f_N(\eta ) - f(\eta ) \right| \le \frac{C_0^S}{N}, \text{ and } \left\| \nabla f_N(\eta ) - \nabla f(\eta ) \right\| _{H_2} \le \frac{K_0^S}{N}. \end{aligned}$$
(15)
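
The O(1/N) rate in (15) is easy to observe numerically. A minimal sketch (our illustration, on a deliberately simple scalar instance with a fixed parameter value) compares the Euler approximation against a fine-grid reference; the error roughly halves each time N doubles.

```python
import numpy as np

# Toy scalar instance (all assumed): dynamics x' = 1 with fixed control,
# detection rate r(x, y) = 1/(1 + (x - y)^2), one fixed parameter y.
def f_tilde_N(N, y=0.5):
    x, z = 0.0, 0.0
    for _ in range(N):
        z += 1.0 / (1.0 + (x - y) ** 2) / N   # recursion (13)
        x += 1.0 / N                          # recursion (12)
    return np.exp(-z)

f_ref = f_tilde_N(2 ** 16)                    # fine-grid reference value
for N in [2 ** j for j in range(4, 10)]:
    print(N, abs(f_tilde_N(N) - f_ref))       # errors decay like 1/N
```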

We are now ready to state the consistency of the time discretization scheme.

Theorem 4.1

Suppose that Assumption 3.1 is satisfied. Then, the pairs \((P_{N},\theta _{N})\) in the sequence \(\{(P_{N},\theta _{N})\}_{N \in {\mathcal {N}}}\) are consistent approximations for the pair \((P, \theta )\).

Proof

Epiconvergence of \(P_N\) to P, as \(N\rightarrow \infty \), follows straightforwardly from the uniform error bound (15). Next, suppose that \(\left\{ \eta _N\right\} _{N \in {\mathcal {N}}}\) is such that \(\eta _N \in \mathbf{H}_{c,N}\), for all \(N \in {\mathcal {N}}\), and \(\eta _N \rightarrow ^{{\mathcal {N}}} \eta \in \mathbf{H}_c\), as \(N \rightarrow \infty \). By the uniform error bound (15), \(\nabla f_{N}(\eta _N) \rightarrow ^{{\mathcal {N}}} \nabla f(\eta )\), as \(N \rightarrow \infty \), and consequently, \(\theta _{N}(\eta _N) \rightarrow ^{{\mathcal {N}}} \theta (\eta )\), as \(N \rightarrow \infty \). These two facts satisfy the requirements of Definition 2 (see “Appendix”) for consistency of approximation. \(\square \)

4.2 Time and Parameter Discretization

We next consider both time and parameter discretization. Since \(A \subset {{\mathbb {R}}}^2\), we adopt the two-dimensional discretization parameter \(M = \left( M_1, M_2\right) ^{\top } \in {{\mathbb {N}}} \times {{\mathbb {N}}}\). We also define a numerical integration scheme, \(I_M^k\), for a function \(\varPsi :A \rightarrow {{\mathbb {R}}}^k\), with \({\varPsi \in {\mathcal {C}}^p(A,{{\mathbb {R}}}^k)}\), where \({\mathcal {C}}^p(A,{{\mathbb {R}}}^k)\) is the collection of p-times continuously differentiable functions from A to \({{\mathbb {R}}}^k\). We view \(I_M^k\) as a mapping from \({\mathcal {C}}^p(A,{{\mathbb {R}}}^k)\) to \({{\mathbb {R}}}^k\) defined by

$$\begin{aligned} I_{M}^k(\varPsi ) := \sum \limits _{i = 1}^{M_1} {\sum \limits _{j = 1}^{M_2} {W_{ij} \varPsi (\alpha _{ij} )} }, \end{aligned}$$

for any \(\varPsi \in {{\mathcal {C}}}^p(A,{{\mathbb {R}}}^k)\), where \(W_{ij} \in {{\mathbb {R}}},\ i=1,2,\ldots ,M_1,\ j=1,2,\ldots ,M_2\), are weights and \(\alpha _{ij} \in A\) are discretization points.
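
Under Assumption 4.1 below, \(I_M^k\) is the two-dimensional composite Simpson rule, whose weights \(W_{ij}\) are tensor products of the one-dimensional pattern \(1, 4, 2, \ldots , 2, 4, 1\) scaled by a third of the mesh size. A sketch of this scheme (our illustration; it requires odd node counts \(M_i \ge 3\)):

```python
import numpy as np

def simpson_weights(M, a, b):
    """1-D composite Simpson weights for M nodes (M odd, M >= 3)."""
    assert M >= 3 and M % 2 == 1
    w = np.ones(M)
    w[1:-1:2], w[2:-1:2] = 4.0, 2.0
    return w * (b - a) / (3.0 * (M - 1))

def I_M(psi, M1, M2, box):
    """Two-dimensional composite Simpson approximation of the integral of
    psi over the rectangle box = (a1l, a1u, a2l, a2u), cf. Assumption 4.1."""
    a1l, a1u, a2l, a2u = box
    n1, n2 = np.linspace(a1l, a1u, M1), np.linspace(a2l, a2u, M2)
    w1, w2 = simpson_weights(M1, a1l, a1u), simpson_weights(M2, a2l, a2u)
    # W_ij = w1[i] * w2[j]; sum over all discretization points alpha_ij.
    return sum(w1[i] * w2[j] * psi(np.array([n1[i], n2[j]]))
               for i in range(M1) for j in range(M2))

# Check on a smooth integrand over the unit square (exact value (e-1)^2).
print(I_M(lambda a: np.exp(a[0] + a[1]), 9, 9, (0.0, 1.0, 0.0, 1.0)))
print((np.e - 1.0) ** 2)
```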

We define the approximate problem

$$\begin{aligned} P_{NM}{:}\,\min _{\eta \in \mathbf{H}_{c,N}}~~f_{NM}(\eta ), \text{ where } f_{NM}(\eta ) := I_{M}^1 \left( \exp \left[ -z_N^{\eta }(1;\cdot )\right] \phi (\cdot )\right) . \end{aligned}$$

We proceed as in Sect. 4.1 by stating properties of \(f_{NM}\). The first result is analogous to Proposition 3.1, and the proof is omitted.

Proposition 4.5

Suppose that Assumption 3.1 is satisfied, \(N \in {\mathcal {N}}\), and \(M \in {{\mathbb {N}}} \times {{\mathbb {N}}}\). For any \(\eta \in \mathbf{H}_{c,N}\) and \(\delta \eta \in H_{\infty ,2}\), \(f_{NM}\) has a Gâteaux differential

$$\begin{aligned} Df_{NM}(\eta ; \delta \eta ) = \left\langle \nabla f_{NM}(\eta ), \delta \eta \right\rangle _{H_2}, \end{aligned}$$

where the gradient

$$\begin{aligned} \nabla f_{NM}(\eta ) := (\nabla _\xi f_{NM}(\eta ), \nabla _u f_{NM}(\eta ))^\top \in H_{\infty ,2} \end{aligned}$$

is given by

$$\begin{aligned} \nabla _\xi f_{NM}(\eta ):= & {} I_M^{n+1}( \nabla _{\xi } {\tilde{f}}_{N}(\eta ;\cdot )\phi (\cdot )),\\ \nabla _u f_{NM}(\eta )(t):= & {} I_M^m( \nabla _{u} {\tilde{f}}_{N}(\eta ;\cdot )(t)\phi (\cdot )), \quad t\in [0,1] \end{aligned}$$

and \(\nabla f_{NM}{:}\,\mathbf{H}_{c,N}\rightarrow H_N\) is Lipschitz \(\mathbf H\)-continuous on bounded subsets of \(\mathbf{H}_{c,N}\).

Numerical integration errors are method-dependent, and we assume that A is rectangular and that \(I_M^k\) is given by the two-dimensional composite Simpson’s rule. We strengthen the assumption about smoothness with respect to \(\alpha \) to allow for quantification of the integration error. An analogous development can be carried out in other cases.

Assumption 4.1

We assume that

  1. (i)

    \(A = \left\{ (\alpha _1,\alpha _2) \in {{\mathbb {R}}}^2 ~:~ \alpha _1^l \le \alpha _1 \le \alpha _1^u, \alpha _2^l \le \alpha _2 \le \alpha _2^u \right\} \) for scalars \(\alpha _1^l<\alpha _1^u\) and \(\alpha _2^l<\alpha _2^u\),

  2. (ii)

    composite Simpson’s rule is used for numerical integration across A, and

  3. (iii)

    the functions \(\phi \), \(y^l(t;\cdot )\), \(r^{k,l}(x^k,\cdot )\), and \(\nabla _x r^{k,l}(x^k,\cdot )\) are four times continuously differentiable on A for all \(t\in [0,1]\), \(x^k\in {\mathbb {R}}^{n_k}\), \(k = 1, 2, \ldots , K\), and \(l = 1, 2, \ldots , L\).

To quantify the discretization error, we start by showing that \({\tilde{f}}_N(\eta ;\cdot ) \phi (\cdot )\) and \(\nabla _\eta {\tilde{f}}_N(\eta ;\cdot )\phi (\cdot )\) are four times continuously differentiable, with partial derivatives uniformly bounded over \(\alpha \in A\), \(N \in {\mathcal {N}}\), and \(\eta \) in a bounded subset of \(\mathbf{H}_{c,N}\). We need the following notation. For any \(\eta \in \mathbf{H}_{c,N}\), \(\alpha \in A\), \(N \in {\mathcal {N}}\), and \(j=0,1,\ldots ,N-1\), we define

$$\begin{aligned} \zeta _1(\alpha ):= & {} \exp \left[ - \sum \limits _{j=0}^{N-1} \frac{1}{N} \sum _{k=1}^K \sum _{l=1}^L r^{k,l}\left( x_N^{\eta ,k}\left( \frac{j}{N}\right) ,y^l\left( \frac{j}{N};\alpha \right) \right) \right] ,\nonumber \\ \zeta _2(\alpha ):= & {} \phi (\alpha ), ~ \zeta _3(\alpha ) := p_N^{\eta }(0;\alpha ), \text{ and } \zeta _4(\alpha ) := p_N^{\eta }\left( \frac{j+1}{N};\alpha \right) . \end{aligned}$$
(16)

We note that \(\zeta _1\) and \(\zeta _3\) depend on \(\eta \) and N, and \(\zeta _4\) also on j, even though this dependence is not explicit in the notation. We omit the proof of the next result, as it is found in Lemma III.25 of [33].

Lemma 4.2

Suppose that Assumptions 3.1 and 4.1 are satisfied and S is a bounded subset of  \(\mathbf{H}_{c,N}\). Then,

  1. (i)

    \(\zeta _i(\cdot )\), \(i = 1, 2, 3, 4\), are four times continuously differentiable and

  2. (ii)

    there exists \(C_0 < \infty \), such that, for all \(\eta \in S\), \(j=0,1,\ldots ,N-1\), \(\alpha \in A\), and \(N \in {\mathcal {N}}\),

    $$\begin{aligned} \max \left\{ \left| \frac{\partial ^{4} \zeta _{i}(\alpha )}{\partial \alpha ^{4}_1} \right| , \left| \frac{\partial ^{4} \zeta _{i}(\alpha )}{\partial \alpha ^{4}_2} \right| \right\} \le C_0, \quad i =1,2, 3, 4. \end{aligned}$$

We are now in a position to state approximation error bounds after both time and space discretization, where \({{\mathbb {N}}}_3\) denotes the odd integers no smaller than 3.

Proposition 4.6

Suppose that Assumptions 3.1 and 4.1 are satisfied. Then, for every bounded subset \(S \subset \mathbf{H}_c\), there exist constants \(C_{1}^S, C_{2}^S, K_{1}^S, K_{2}^S < \infty \) such that, for any \(N \in {\mathcal {N}}\), \(M \in {{\mathbb {N}}}_3 \times {{\mathbb {N}}}_3\) and \(\eta \in S \cap H_N\),

  1. (i)
    $$\begin{aligned} \left| f(\eta )-f_{NM}(\eta )\right| \le \frac{C_{0}^S}{N} + \frac{C_{1}^S}{\left( M_1-1\right) ^4} + \frac{C_{2}^S}{\left( M_2-1\right) ^4} \end{aligned}$$
    (17)

    and

  2. (ii)
    $$\begin{aligned} \Vert \nabla f(\eta ) - \nabla f_{NM}(\eta ) \Vert _{H_2} \le \frac{K_{0}^S}{N} + \frac{K_{1}^S}{\left( M_1-1\right) ^4} + \frac{K_{2}^S}{\left( M_2-1\right) ^4}, \end{aligned}$$
    (18)

where \(C_0^S\) and \(K_{0}^S\) are as in Proposition 4.4.

Proof

We first consider (17). In view of Proposition 4.4 and the fact that

$$\begin{aligned} \left| f(\eta )-f_{NM}(\eta )\right| \le \left| f(\eta )-f_{N}(\eta )\right| + \left| f_{N}(\eta )-f_{NM}(\eta )\right| , \end{aligned}$$
(19)

it suffices to consider the last term on the right-hand side in (19). Let

$$\begin{aligned} g(\alpha ) := \zeta _1(\alpha ) \zeta _2(\alpha ) = \exp (-z^{\eta }_N(1;\alpha ))\phi (\alpha ) \end{aligned}$$

be shorthand for the integrand in (14). By Lemma 4.2, \(g(\cdot )\) is four times continuously differentiable and there exists a constant \(c<\infty \) such that \(\left| {\partial ^4 g(\alpha )}/{\partial \alpha ^{4}_{i}}\right| \le c\) for \(i = 1, 2\), \(\eta \in S\), \(\alpha \in A\), and \(N\in {\mathcal {N}}\), where we recall that \(g(\alpha )\) depends on N and \(\eta \) through the definition of \(\zeta _1(\alpha )\). Then, under Assumption 4.1, standard integration error analysis gives that

$$\begin{aligned}&\left| I_{M}^1 \left( g(\cdot ) \right) - \int _{\alpha \in A} g(\alpha )\mathrm{d}\alpha \right| \nonumber \\&\quad \le \frac{(\alpha _2^u-\alpha _2^l)(\alpha _1^u\!-\!\alpha _1^l)}{180} \left[ \frac{(\alpha _1^u-\alpha _1^l)^4}{\left( M_1\!-\!1\right) ^4} \max _{\alpha \in A} \left| \frac{\partial ^4 g(\alpha )}{\partial \alpha ^{4}_1}\right| \!+\! \frac{(\alpha _2^u-\alpha _2^l)^4}{\left( M_2-1\right) ^4} \max _{\alpha \in A} \left| \frac{\partial ^4 g(\alpha )}{\partial \alpha ^{4}_2} \right| \right] \nonumber \\&\quad \le \frac{(\alpha _2^u-\alpha _2^l)(\alpha _1^u-\alpha _1^l)}{180} \left[ \frac{(\alpha _1^u-\alpha _1^l)^4}{\left( M_1-1\right) ^4} c + \frac{(\alpha _2^u-\alpha _2^l)^4}{\left( M_2-1\right) ^4} c \right] . \end{aligned}$$
(20)

Consequently, (17) holds with \(C_{1}^S = c (\alpha _2^u-\alpha _2^l)(\alpha _1^u-\alpha _1^l)^5/180\) and \(C_{2}^S = c (\alpha _1^u-\alpha _1^l)(\alpha _2^u-\alpha _2^l)^5/180\).

Second, we consider (18) in a similar manner. Since Proposition 4.4 holds and

$$\begin{aligned} \Vert \nabla f(\eta ) - \nabla f_{NM}(\eta ) \Vert _{H_2} \le \Vert \nabla f(\eta ) - \nabla f_{N}(\eta ) \Vert _{H_2} + \Vert \nabla f_{N}(\eta ) - \nabla f_{NM}(\eta ) \Vert _{H_2}, \end{aligned}$$
(21)

we only need to consider the last term on the right-hand side of this expression. For the sake of notational simplicity, let \(G(\alpha ) := (G_\xi (\alpha ), G_u(\alpha ))\), where

$$\begin{aligned} G_{\xi }(\alpha ) := \nabla _{\xi } {\tilde{f}}_N(\eta ;\alpha )\phi (\alpha ) = p_N^{\eta }(0;\alpha )\phi (\alpha ) = \zeta _3(\alpha )\zeta _2(\alpha ), \end{aligned}$$

and for \(t \in [0,1]\),

$$\begin{aligned} G_{u}(\alpha )(t):= & {} \nabla _u {\tilde{f}}_N(\eta ;\alpha )(t)\phi (\alpha ) = \left[ \sum \limits _{j=0}^{N-1} \gamma _N^{\eta }\left( \frac{j}{N};\alpha \right) \pi _{N,j}(t) \right] \phi (\alpha ) \\= & {} \sum \limits _{j=0}^{N-1} \left( \begin{array}{c} h_u^1\left( x_N^{\eta }\left( \frac{j}{N}\right) ,u\left( \frac{j}{N}\right) \right) \\ \vdots \\ h_u^K\left( x_N^{\eta }\left( \frac{j}{N}\right) ,u\left( \frac{j}{N}\right) \right) \\ 0 \end{array} \right) ^{\top } \pi _{N,j}(t) \zeta _4(\alpha )\zeta _2(\alpha ). \end{aligned}$$

Clearly, \(G(\alpha )\) equals the integrand in Proposition 4.2; see Lemma 4.1 and the definitions given in (16). The vector \(G_\xi (\alpha )\) has \(n+1\) components, which we denote by \(G_{\xi j}(\alpha )\), \(j = 1, 2, \ldots , n+1\). In view of Lemma 4.2, we find that \(G_\xi (\cdot )\) and \(G_u(\cdot )(t)\) are four times continuously differentiable and there exists a constant \(c'<\infty \) such that

$$\begin{aligned} \max \left\{ \sup _{t\in [0,1]} \left| \frac{\partial ^4 G_{u}(\alpha )(t)}{\partial \alpha ^{4}_{i}}\right| , \max _{j=1, 2, \ldots , n+1}\left| \frac{\partial ^4 G_{\xi j}(\alpha )}{\partial \alpha ^{4}_{i}}\right| \right\} \le c' \end{aligned}$$
(22)

for \(i = 1, 2\), \(\eta \in S\), \(\alpha \in A\), and \(N\in {\mathcal {N}}\). In view of this fact, standard integration error analysis gives that

$$\begin{aligned}&\left\| I_{M}^{n+1} (G_{\xi }(\cdot ) ) - \int _{\alpha \in A} G_{\xi }(\alpha ) \mathrm{d}\alpha \right\| _\infty \nonumber \\&\quad \le \frac{(\alpha _2^u-\alpha _2^l)(\alpha _1^u-\alpha _1^l)}{180} \left[ \frac{(\alpha _1^u-\alpha _1^l)^4}{\left( M_1-1\right) ^4} c' + \frac{(\alpha _2^u-\alpha _2^l)^4}{\left( M_2-1\right) ^4} c' \right] \end{aligned}$$
(23)

and, similarly, for all \(t \in [0,1]\),

$$\begin{aligned} \left\| I_{M}^m (G_{u}(\cdot )(t) ) - \int _{\alpha \in A} G_{u}(\alpha )(t) d\alpha \right\| _\infty \end{aligned}$$
(24)

is bounded by the same expression. Using the properties of the \(H_2\)-norm, we find that (18) holds with \(K_{1}^S = \sqrt{n+1+m}c'(\alpha _2^u-\alpha _2^l)(\alpha _1^u-\alpha _1^l)^5/180\) and \(K_{2}^S = \sqrt{n+1+m}c' (\alpha _1^u-\alpha _1^l)(\alpha _2^u-\alpha _2^l)^5/180\). \(\square \)

As above, we define a nonpositive optimality function \(\theta _{NM}{:}\,\mathbf{H}_{c,N} \rightarrow {{\mathbb {R}}}\) by

$$\begin{aligned} \theta _{NM} (\eta ) := \min _{\eta ^{\prime } \in \mathbf{H}_{c,N}} \left\langle \nabla f_{NM}(\eta ), \eta ^{\prime }-\eta \right\rangle _{H_2} + \frac{1}{2} \left\| \eta ^{\prime } - \eta \right\| ^2_{H_2}, \end{aligned}$$
(25)

which characterizes stationary points of \(P_{NM}\), as stated next, where we again omit the proof.

Proposition 4.7

Suppose that Assumptions 3.1 and 4.1 are satisfied.

  1. (i)

    \(\theta _{NM}(\cdot )\) is an \(\mathbf{H}_{c,N}\)-continuous function.

  2. (ii)

    If \(\hat{\eta } \in \mathbf{H}_{c,N}\) is a local minimizer of \(P_{NM}\), then \(\theta _{NM}(\hat{\eta }) = 0\).

Since consistency of approximation is defined only for a single-parameter family of approximating problems (see Definition 2 in “Appendix”), we link the space discretization parameter M to the time discretization parameter N. For \(i = 1,2\), let \(M_i{:}\,{\mathcal {N}} \rightarrow {\mathbb {N}}\) be such that \(M_i(N)\rightarrow \infty \), as \(N\rightarrow \infty \), and \(M(N) := (M_1(N), M_2(N))^{\top } \in {{\mathbb {N}}}_3 \times {{\mathbb {N}}}_3\). We conclude this section by stating the main consistency result, but omit the proof as, in view of Proposition 4.6, it follows easily as in Theorem 4.1.
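
One simple choice (our illustration, not mandated by the analysis) balances the error terms in Proposition 4.6 by growing \(M_i(N)\) like \(N^{1/4}\), rounded to an odd integer of at least 3, so that the parameter-space error decays at the same O(1/N) rate as the time-discretization error:

```python
import math

def M_of_N(N):
    """Choose M_i(N) ~ N**0.25 (odd, >= 3) so that the 1/(M_i(N) - 1)**4
    terms in Proposition 4.6 decay like the 1/N term."""
    m = max(3, math.ceil(N ** 0.25) + 1)
    return m if m % 2 == 1 else m + 1     # composite Simpson needs odd M_i

for N in [2 ** j for j in range(1, 11)]:
    print(N, M_of_N(N))                   # e.g., N = 1024 gives M_i = 7
```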

Theorem 4.2

Suppose that Assumptions 3.1 and 4.1 are satisfied. Then, the pairs \((P_{NM(N)}, \ \theta _{NM(N)})\), in the sequence \(\{(P_{NM(N)},\theta _{NM(N)})\}_{N \in {\mathcal {N}}}\), are consistent approximations for the pair \((P, \theta )\).

The consistency results of Theorem 4.2 ensure that global minimizers, local minimizers, and stationary points of the discretized problems \(P_{NM(N)}\) converge to global minimizers, local minimizers, and stationary points, respectively, of the original parameter-distributed optimal control model P as the level of discretization tends to infinity. Because the original model is nonconvex, it is especially important to have convergence of stationary solutions.

5 Conclusions

We formulate a search-trajectory optimization problem, with multiple searchers looking for multiple targets in continuous time and space, as a parameter-distributed optimal control model. We construct discretization schemes to solve these continuous time-and-space problems, and prove that they are consistent approximations in the sense of E. Polak. Our theoretical development and proofs of consistency are useful, if not essential, for algorithm development. In particular, Theorem 4.2 can be used to develop an implementable algorithm that automatically adjusts the time and space discretization parameters, as described in our companion paper [32]. In addition, Proposition 4.6 provides error bounds that can be used in subsequent algorithms to provide guidance for stopping criteria.