14.1 Introduction

Consider an operator \(A: Y\to V\), where \(Y\) and \(V\) are Banach spaces. Suppose that it is continuously differentiable in a neighborhood of a point \(y_0\in Y\). Denote by \(A'(y_0)\) the derivative of the operator \(A\) at the point \(y_0\). It is well known that the following result holds (see, for example, [1]).

The Inverse Function Theorem

Assume that there exists the continuous inverse operator \(A'(y_0)^{-1}\). Then there exists an open neighborhood \(O\) of the point \(y_0\) such that the set \(O'=A(O)\) is an open neighborhood of the point \(v_0=Ay_0\); moreover, there exists the continuously differentiable inverse map \(A^{-1}: O'\to O\), and its derivative is given by the formula

$$\bigl(A^{-1} \bigr)'(v)= \bigl\{A' \bigl[A^{-1}(v) \bigr] \bigr\}^{-1} \quad \forall v\in O'. $$
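A simple scalar illustration (ours, for orientation only): take \(A(y)=y+y^{3}\) on the real line. Then \(A'(y)=1+3y^{2}\) never vanishes, so \(A'\) is invertible at every point, and the formula gives

$$\bigl(A^{-1} \bigr)'(v)=\frac{1}{1+3 \bigl[A^{-1}(v) \bigr]^{2}}. $$

For instance, \(A(1)=2\) and \(A'(1)=4\), so \((A^{-1})'(2)=1/4\).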

This result has very important applications. It is related to the Implicit Function Theorem [2], the Newton–Kantorovich method [1, 2], the Lusternik smooth manifold approximation theorem [3, 4], the Brouwer fixed point theorem [1, 5], the Morse lemma on singularities of smooth functions [1], the Graves covering theorem [4], etc. Extensions of the Inverse Function Theorem to higher-order differentiability [6], nonsmooth operators [7–9], multivalued maps [1, 7, 10], etc., are also known.

In reality the Inverse Function Theorem contains two different results: the invertibility of the given operator and the differentiability of the corresponding inverse operator. Sometimes only the second property matters. This is the case, for example, in extremum theory and in the theory of inverse problems. In particular, consider the system described by the equation

$$ Ay = v. $$
(1)

The term \(v\) can be interpreted here as a control or an identifiable parameter, and \(y\) is a state function. Suppose that (1) has a unique solution \(y=y(v)\) in the space \(Y\) for every \(v\in V\). Then the operator \(A\) is invertible. This result can be proved by tools adapted to the given equation; therefore, it is not necessary to use the Inverse Function Theorem here.

Let \(U\) be a convex closed subset of the space \(V\). The cost functional is defined by the formula

$$I(v) = J(v) + K \bigl[y(v) \bigr], $$

where J is a functional on the set V, and K is a functional on the set Y. We have the following optimization control problem.

Problem 1

Minimize the functional I on the set U.

A necessary condition for the minimum of a smooth functional \(F\) on a convex set \(W\) at a point \(v_0\) is the variational inequality (see [11])

$$ \bigl\langle F'(v_{0}),v-v_{0} \bigr\rangle \geq 0 \quad \forall v \in W, $$
(2)

where 〈λ,φ〉 is the value of the linear continuous functional λ at the point φ.

The functional \(I\) is the sum of \(J\) and the map \(v\mapsto K[y(v)]\). The latter is the superposition of the functional \(K\) and the map \(v\mapsto y(v)\), which is, in fact, the inverse operator \(A^{-1}\). Hence the proof of the differentiability of the given functional requires differentiating the inverse operator. This can be done using the Inverse Function Theorem.

Lemma 1

Suppose that the operator \(A\) has a continuous inverse and is continuously differentiable in an open neighborhood of the point \(y_0=y(v_0)\), and that there exists the continuous inverse operator \(A'(y_0)^{-1}\). Then the map \(y(\cdot): V\to Y\) is Gateaux differentiable at the point \(v_0\), and its derivative satisfies the formula

$$ \bigl\langle \mu,y'(v_{0})h \bigr\rangle = \bigl\langle p_{\mu}(v_{0}),h \bigr\rangle \quad \forall \mu \in Y^{*}, h\in V, $$
(3)

where \(Y^*\) is the adjoint space of \(Y\), and \(p_\mu(v_0)\) is the solution of the equation

$$ \bigl[A'(y_{0})\bigr]^{*}p_{\mu}(v_{0})= \mu. $$
(4)

Proof

By the Inverse Function Theorem the map \(y(\cdot): V\to Y\) is differentiable at the point \(v_0\); moreover,

$$y'(v_{0})=\bigl[A'(y_{0}) \bigr]^{-1}. $$

Then we get

$$\bigl\langle \mu,y'(v_{0})h \bigr\rangle = \bigl\langle \mu, \bigl[A'(y_{0}) \bigr]^{-1}h \bigr\rangle = \bigl\langle \bigl\{ \bigl[A'(y_{0}) \bigr]^{-1} \bigr\}^{*}\mu,h \bigr\rangle \quad \forall \mu \in Y^{*},h\in V. $$

It is known that a linear continuous operator is invertible if and only if its adjoint is invertible (see p. 460 in [2]). Therefore (4) has a unique solution

$$p_{\mu}(v_{0})= \bigl\{ \bigl[A'(y_{0}) \bigr]^{-1} \bigr\}^{*}\mu $$

in the space \(V^*\). Substituting this solution into the previous formula, we obtain the equality (3). □
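The identity (3)–(4) is easy to check numerically in finite dimensions, where \(A'(y_0)\) is a Jacobian matrix and the adjoint is a transpose. The following sketch is our illustration only: the operator \(A\), the matrix \(B\), and all data are arbitrary assumptions chosen for the demonstration.

```python
import numpy as np

# Illustrative sketch of Lemma 1 in R^n: A(y) = By + y^3 (componentwise),
# so A'(y) = B + diag(3y^2). B is chosen close to the identity so that
# A'(y) stays invertible; all data here are assumptions for the demo.
n = 4
rng = np.random.default_rng(0)
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))

A = lambda y: B @ y + y**3
A_prime = lambda y: B + np.diag(3 * y**2)

def solve(v, y):
    # Newton's method for A(y) = v, i.e. the inverse operator y(v)
    for _ in range(50):
        y = y - np.linalg.solve(A_prime(y), A(y) - v)
    return y

y0 = rng.standard_normal(n)
v0 = A(y0)                                   # so y(v0) = y0 exactly
h, mu = rng.standard_normal(n), rng.standard_normal(n)

p_mu = np.linalg.solve(A_prime(y0).T, mu)    # adjoint equation (4)
rhs = p_mu @ h                               # <p_mu(v0), h>
lhs = mu @ np.linalg.solve(A_prime(y0), h)   # <mu, y'(v0) h>
dq = (solve(v0 + 1e-6 * h, y0) - y0) / 1e-6  # difference quotient of y(v)
print(abs(lhs - rhs), abs(mu @ dq - rhs))    # ~1e-15 and ~1e-6 (FD error)
```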

Now we can prove the differentiability of the functional \(I\) and obtain necessary optimality conditions. Let \(v_0\) be the solution of the minimization problem for the functional \(I\) on the set \(U\). Define \(y_0=y(v_0)\).

Lemma 2

Under the conditions of Lemma 1 suppose that the functional \(J\) is Gateaux differentiable at the point \(v_0\), and the functional \(K\) is Frechet differentiable at the point \(y_0\). Then the control \(v_0\) satisfies the variational inequality

$$ \bigl\langle J'(v_{0})-p_{0},v-v_{0} \bigr\rangle \geq 0 \quad \forall v\in U, $$
(5)

where \(p_0\) is a solution of the adjoint equation

$$ \bigl[A'(y_{0})\bigr]^{*}p_{0}=-K'(y_{0}). $$
(6)

Proof

Using the Composite Function Theorem (see p. 637 in [2]), we obtain that the map \(v\mapsto K[y(v)]\) has a Gateaux derivative satisfying

$$(Ky)'(v_{0})=K'(y_{0})y'(v_{0}). $$

By equality (3) we get

$$\bigl\langle(Ky)'(v_{0}),h \bigr\rangle = \bigl\langle K'(y_{0}),y'(v_{0})h \bigr \rangle= -\langle p_{0},h\rangle \quad \forall h\in V, $$

where \(p_0\) is the solution of (4) for \(\mu=-K'(y_0)\). Thus we obtain the adjoint equation (6), and the derivative of the map \(v\mapsto K[y(v)]\) at the point \(v_0\) equals \(-p_0\). Then the functional \(I\) has the derivative

$$I'(v_{0})=J'(v_{0})-p_{0} $$

at this point. Using (2), we obtain the variational inequality (5). □
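In computational practice Lemma 2 is the adjoint-state method: one state solve plus one adjoint solve yield the whole gradient \(J'(v_0)-p_0\). A self-contained finite-dimensional sketch (with the illustrative choices \(J(v)=\frac{1}{2}\|v\|^2\), \(K(y)=\frac{1}{2}\|y-y_d\|^2\) and the same model operator as above, all of which are our assumptions) checks (5)–(6) against a finite difference:

```python
import numpy as np

# Adjoint-state computation of I'(v0) = J'(v0) - p0 (Lemma 2); the
# operator A and the functionals are illustrative assumptions.
n = 4
rng = np.random.default_rng(1)
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = lambda y: B @ y + y**3
A_prime = lambda y: B + np.diag(3 * y**2)

def solve(v, y):
    for _ in range(50):                      # Newton's method for A(y) = v
        y = y - np.linalg.solve(A_prime(y), A(y) - v)
    return y

yd = rng.standard_normal(n)

def I(v):
    y = solve(v, np.zeros(n))
    return 0.5 * v @ v + 0.5 * np.sum((y - yd) ** 2)

v0 = rng.standard_normal(n)
y0 = solve(v0, np.zeros(n))
p0 = np.linalg.solve(A_prime(y0).T, -(y0 - yd))  # adjoint equation (6)
grad = v0 - p0                                   # J'(v0) - p0, by (5)

h = rng.standard_normal(n)
fd = (I(v0 + 1e-6 * h) - I(v0)) / 1e-6           # finite-difference check
print(abs(fd - grad @ h))                        # small, up to FD error
```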

Thus the Inverse Function Theorem is a good tool for proving the differentiability of the control-state mapping. This result is the basis for deriving necessary optimality conditions. Note, however, that we have used the strong assumption of the invertibility of the operator's derivative. It is equivalent to the existence of a unique solution \(y\in Y\) of the linearized equation

$$ A'(y_{0})y=v $$
(7)

for every \(v\in V\).

Now we have the following questions:

  • How large is the class of operators that satisfy the mentioned assumption?

  • What is a criterion for the differentiability of the inverse operator at a given point?

  • Could we prove the differentiability of the inverse operator without using the Inverse Function Theorem?

  • Could we prove a weaker form of the differentiability of the inverse operator for obtaining optimality conditions in the case of non-invertibility of the operator’s derivative?

We will try to answer these questions.

14.2 Criterion for the Differentiability of the Inverse Operator

Consider an operator \(A: Y\to V\). Let it be continuous and differentiable in a neighborhood of a point \(y_0\in Y\).

Theorem 1

Suppose there exists an open neighborhood \(O\) of the point \(y_0\) such that the set \(O'=A(O)\) is an open neighborhood of the point \(v_0=Ay_0\). Suppose that there exists the inverse operator \(A^{-1}: O'\to O\), and that (7) has at most one solution. Then this inverse operator is Gateaux differentiable at \(v_0\) if and only if the derivative \(A'(y_0)\) is a surjection.

Proof

Let the derivative \(A'(y_0)\) be a surjection. Then it is invertible by the assumptions of the theorem. By the Banach Inverse Operator Theorem there exists the continuous inverse operator \(A'(y_0)^{-1}\). Therefore, the differentiability of the operator \(A^{-1}\) at the point \(v_0\) follows directly from the Inverse Function Theorem.

Suppose now that the operator \(A^{-1}\) has the Gateaux derivative \(D\) at the point \(v_0\), while the derivative \(A'(y_0)\) is not a surjection. We have the equality

$$Ay(v_{0}+\sigma v)-Ay(v_{0})=\sigma v $$

for every \(v\in V\) and every small enough number σ. Dividing by σ and passing to the limit as σ→0, using the Composite Function Theorem and the differentiability of \(A^{-1}\), we get

$$A'(y_{0})Dv=v. $$

Hence, for every \(v\in V\) there exists a point \(y=Dv\) in \(Y\) such that \(A'(y_0)y=v\), that is, the derivative \(A'(y_0)\) is a surjection. This contradicts our assumption. Therefore the operator \(A'(y_0)\) is a surjection whenever the inverse operator is differentiable. □

Thus Gateaux differentiability of the inverse operator is equivalent to the following property: the operator \(A'(y_0)\) is a surjection. This is called the Lusternik condition [4].

Consider as an example the homogeneous Dirichlet problem for the equation

$$ -\varDelta y+|y|^{\rho}y =v $$
(8)

in a bounded \(n\)-dimensional domain Ω, where ρ>0. Define the space

$$Y=H^{1}_{0}(\varOmega )\cap L_{q}(\varOmega ), $$

where \(q=\rho+2\). Using the theory of monotone operators [12], we find that this boundary value problem has a unique solution \(y\in Y\) for every \(v\) from the set \(V\), which is the adjoint space

$$Y^{*}=H^{-1}(\varOmega )+L_{q'}(\varOmega ), $$

where \(1/q+1/q'=1\). Define the operator \(A: Y\to V\) so that \(Ay\) equals the left-hand side of (8). The existence of the operator \(A^{-1}\) follows from the unique solvability of the boundary value problem. Its differentiability can be studied via the properties of the linearized equation, which is the homogeneous Dirichlet problem for the equation

$$ -\varDelta y+(\rho+1)|y_{0}|^{\rho}y =v. $$
(9)
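The coefficient in (9) comes from the pointwise derivative of the nonlinearity, which can be checked directly (a short computation of ours):

$$\frac{d}{ds} \bigl(|s|^{\rho}s \bigr)=\rho|s|^{\rho-1}\operatorname{sgn}(s)\,s+|s|^{\rho}=(\rho+1)|s|^{\rho}. $$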

Corollary 1

The solution of the Dirichlet problem for (8) is Gateaux differentiable with respect to the absolute term at the point \(v=v_0\) if and only if (9) has a solution \(y\in Y\) for every \(v\in V\).

Indeed, the continuous differentiability of the given operator \(A\) is obvious. The existence of the inverse operator follows from the unique solvability of the given boundary value problem. It is clear that the Dirichlet problem for the linear equation (9) cannot have two solutions. Then, by Theorem 1, the criterion for the differentiability of the inverse operator is the Lusternik condition.

Now we obtain a criterion for the differentiability of the solution of (8) with respect to the absolute term on the whole space \(V\).

Corollary 2

The solution of the Dirichlet problem for (8) is Gateaux differentiable with respect to the absolute term at an arbitrary point if and only if the embedding \(H^{1}_{0}(\varOmega )\subset L_{q}(\varOmega )\) holds.

Proof

Multiply equality (9) by the function \(y\) and integrate over Ω, using the Green formula and the boundary condition. We get

$$\int_\varOmega {|\nabla y|^{2}dx} + (\rho +1) \int _\varOmega {|y_{0}|^{\rho}y^{2}dx} = \int_\varOmega {vy dx}. $$

We have \(Y=H^{1}_{0}(\varOmega )\) under the given assumption, hence \(V=H^{-1}(\varOmega )\). So the a priori estimate of the solution of (9) in the norm of \(Y\) for every \(v\in V\) follows from the obtained equality. Now we get the unique solvability of the linearized equation by means of the standard theory of elliptic equations (see, for example, [11]). Thus the differentiability of the solution of (8) with respect to the absolute term at an arbitrary point follows from Corollary 1.

We now prove that the solution of (8) is not differentiable with respect to the absolute term if the mentioned embedding does not hold. Let \(y_0\) be a continuous function from the space \(Y\). Then the left-hand side of (9) belongs to \(H^{-1}(\varOmega )\) for every \(y\in Y\). Therefore, if the mentioned embedding does not hold, the image of the derivative \(A'(y_0)\) is a proper subset of \(V\), and (9) has no solution in the space \(Y\) for any function \(v\) from the difference \(V\setminus H^{-1}(\varOmega )\). Therefore, the solution of the homogeneous Dirichlet problem for (8) is not Gateaux differentiable at the point

$$v_{0}=-\varDelta y_{0}+|y_{0}|^{\rho}y_{0} $$

by Corollary 1. This completes the proof of Corollary 2. □

By the Sobolev embedding theorem the embedding \(H^{1}_{0}(\varOmega )\subset L_{q}(\varOmega )\) holds if \(n=2\), or if \(\rho\leq 4/(n-2)\) for \(n>2\). Then the solution of (8) is differentiable with respect to the absolute term for small enough values of the dimension \(n\) and of the nonlinearity parameter ρ. These parameters measure the degree of difficulty of the given equation. Note that the Inverse Function Theorem yields the differentiability of the inverse operator, but not the absence of this property. We will show below that there is another technique for proving differentiability; it is applicable even when the inverse operator is not Gateaux differentiable, in which case the operator still satisfies a property that can be interpreted as a weak form of differentiability.
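For completeness, the threshold comes from the critical Sobolev exponent \(2^{*}=2n/(n-2)\) for \(n>2\) on a bounded domain:

$$H^{1}_{0}(\varOmega )\subset L_{q}(\varOmega ) \iff q=\rho+2\leq \frac{2n}{n-2} \iff \rho\leq\frac{4}{n-2}; $$

for example, for \(n=3\) the critical value is ρ=4.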

The obtained result can be used for the analysis of optimization control problems for the system described by (8). Consider as an example the functional

$$I(v)=\frac{\alpha}{2}\|v\|^{2}_{*}+\frac{1}{2} \bigl\|y(v)-y_{d} \bigr\|^{2}, $$

where α>0, \(y_d\in H^{1}_{0}(\varOmega )\), and \(y(v)\) is the solution of the Dirichlet problem (8) for the control \(v\); here \(\|\cdot\|\) and \(\|\cdot\|_*\) are the norms of the spaces \(H^{1}_{0}(\varOmega )\) and \(H^{-1}(\varOmega )\), respectively. Consider the following optimization problem.

Problem 2

Minimize the functional I on the convex closed subset U of the space V.

The solvability of this problem can be proved by a standard method (see, for example, Chap. 1, Theorem 1.1 in [11]) using the weak continuity of the state function with respect to the absolute term. Note that the indeterminacy of the functional \(I\) on the whole set \(U\) is not an obstacle for the analysis of the optimization problem [13].

Corollary 3

If \(H^{1}_{0}(\varOmega )\subset L_{q}(\varOmega )\), then the solution \(v_0\) of Problem 2 satisfies the inequality

$$ \int_\varOmega { (\alpha \varLambda v_{0}-p_{0} ) (v-v_{0})dx} \geq 0 \quad \forall v\in U, $$
(10)

where Λ is the canonical isomorphism between the spaces \(H^{-1}(\varOmega )\) and \(H^{1}_{0}(\varOmega )\), and \(p_0\) is the solution of the homogeneous Dirichlet problem for the equation

$$ -\varDelta p_{0}+(\rho +1)|y_{0}|^{\rho} p_{0}=\varDelta y_{0}-\varDelta y_{d}. $$
(11)

Proof

The derivative of the functional \(J\) (the first term of the minimized functional) is defined by the equality

$$\bigl\langle J'(v_{0}),h\bigr\rangle= \alpha(v_{0},h)_{*} \quad \forall h\in H^{-1}(\varOmega ), $$

where \((\cdot,\cdot)_*\) is the scalar product of the space \(H^{-1}(\varOmega )\). By the Riesz theorem there exists the canonical isomorphism \(\varLambda :H^{-1}(\varOmega )\rightarrow H^{1}_{0}(\varOmega )\). Then we get

$$J'(v_{0})=\alpha \varLambda v_{0}. $$

The derivative of the functional \(K\) (the second term of the minimized functional) is defined by the equality

$$\bigl\langle K'(y_{0}),h\bigr\rangle=(y_{0}-y_{d},h) \quad \forall h\in H^{1}_{0}(\varOmega ), $$

where \((\cdot,\cdot)\) is the scalar product of the space \(H^{1}_{0}(\varOmega )\). Using the Green formula, we obtain

$$K'(y_{0})=\varDelta y_{d}-\varDelta y_{0}. $$

The operator \(A'(y_0)\) is self-adjoint. Then the adjoint equation (6) transforms into (11), and the variational inequality (5) transforms into (10). This completes the proof of the corollary. □
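The optimality system of Corollary 3 is straightforward to realize numerically. The sketch below is our illustration on a one-dimensional finite-difference discretization of Ω=(0,1); the mesh, ρ, α, \(y_d\), and \(v_0\) are illustrative assumptions. It solves the state equation (8) by Newton's method, the adjoint equation (11) by a linear solve, and assembles the quantity \(\alpha\varLambda v_0-p_0\) from (10), using that Λ may be realized as \((-\varDelta )^{-1}\) when \(H^{1}_{0}(\varOmega )\) carries the gradient inner product.

```python
import numpy as np

# 1-D finite-difference sketch of the optimality system of Corollary 3
# on Omega = (0, 1) with homogeneous Dirichlet conditions. The mesh,
# rho, alpha, y_d and v_0 are illustrative assumptions.
m, rho, alpha = 99, 2.0, 1e-2
h = 1.0 / (m + 1)
x = np.linspace(h, 1.0 - h, m)
# matrix of -Delta (second difference quotient)
Lap = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2

def solve_state(v):
    # Newton's method for -y'' + |y|^rho * y = v, i.e. equation (8)
    y = np.zeros(m)
    for _ in range(50):
        F = Lap @ y + np.abs(y)**rho * y - v
        J = Lap + np.diag((rho + 1) * np.abs(y)**rho)  # A'(y), cf. (9)
        y = y - np.linalg.solve(J, F)
    return y

yd = np.sin(np.pi * x)   # target state
v0 = np.ones(m)          # a trial control
y0 = solve_state(v0)

# adjoint equation (11): -p'' + (rho+1)|y0|^rho * p = Delta(y0 - yd)
p0 = np.linalg.solve(Lap + np.diag((rho + 1) * np.abs(y0)**rho),
                     -Lap @ (y0 - yd))

# Lambda = (-Delta)^{-1} is the canonical isomorphism H^{-1} -> H^1_0,
# so alpha*Lambda(v0) - p0 is the quantity entering inequality (10)
grad = alpha * np.linalg.solve(Lap, v0) - p0
print(np.sqrt(h) * np.linalg.norm(grad))  # discrete residual of optimality
```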

14.3 Differentiation of the Inverse Operator

We now try to prove the differentiability of the inverse operator directly, without using the Inverse Function Theorem. Consider again an operator \(A: Y\to V\) and a point \(v_0\in V\). We suppose the following assumption.

Property 1

The operator \(A\) is invertible in a neighborhood \(O\) of the point \(v_0\).

Fix \(h\in V\) and choose a small enough positive number σ such that the point \(v_\sigma=v_0+\sigma h\) lies in \(O\). Denote by \(y(v)\) the value \(A^{-1}v\). Using the equalities \(Ay(v_\sigma)=v_\sigma\), \(Ay(v_0)=v_0\), we get

$$Ay(v_{\sigma})-Ay(v_{0})=\sigma h. $$

Assume the following property.

Property 2

The operator A is Gateaux differentiable.

By the Mean Value Theorem we obtain

$$Ay-Ay_{0}= \biggl\{\int_0^1{A' \bigl[y_{0}+\theta(y-y_{0}) \bigr]d\theta} \biggr \}(y-y_{0}), $$

where \(y_0=y(v_0)\). Then we have

$$G(v_{\sigma}) \bigl[y(v_{\sigma})-y(v_{0}) \bigr]=\sigma h, $$

where the linear continuous operator \(G(v): Y\to V\) is defined by the formula

$$G(v)=\int_0^1{A' \bigl \{y_{0}+\theta\bigl[y(v)-y_{0}\bigr] \bigr\}d\theta} $$

for all \(v\in O\). We get

$$ \bigl\langle G(v_{\sigma})^{*} \lambda, \bigl[y(v_{\sigma})-y(v_{0}) \bigr]/\sigma \bigr\rangle = \langle \lambda, h \rangle \quad \forall \lambda \in V^{*}. $$
(12)
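In the scalar case the operator \(G\) is just the averaged slope; the following check is our illustration. For \(A(y)=y^{3}\), writing \(a=y_0\) and \(b=y(v_\sigma)\),

$$\int_0^1 3\bigl[a+\theta(b-a)\bigr]^{2}\,d\theta=a^{2}+ab+b^{2}, \qquad \bigl(a^{2}+ab+b^{2}\bigr) (b-a)=b^{3}-a^{3}, $$

so \(G(v_\sigma)[y(v_\sigma)-y(v_0)]\) indeed reproduces \(Ay(v_\sigma)-Ay(v_0)\).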

Consider the linear operator equation

$$ G(v)^{*}p_{\mu}(v) = \mu. $$
(13)

It transforms to

$$ A'(y_{0})^{*}p_{\mu}(v_{0}) = \mu $$
(14)

for \(v=v_0\). We will use the following assumption.

Property 3

Equation (13) has a unique solution \(p_\mu(v)\in V^*\) for all \(\mu\in Y^*\), \(v\in O\).

Setting \(\lambda=p_\mu(v_\sigma)\) in (12) for small enough σ and using (13), we get

$$ \bigl\langle \mu, \bigl[y(v_{0}+\sigma h)-y(v_{0}) \bigr]/ \sigma \bigr\rangle = \bigl\langle p_{\mu}(v_{\sigma}),h \bigr \rangle \quad \forall \mu \in Y^{*},h\in V. $$
(15)

Define

$$M= \bigl\{\mu\in Y^{*} | \|\mu\|=1 \bigr\}. $$

Property 4

The convergence \(p_\mu(v_\sigma)\to p_\mu(v_0)\) *-weakly in \(V^*\), uniformly with respect to \(\mu\in M\) as σ→0, holds for every \(h\in V\).

Theorem 2

Suppose Properties 1–4 hold. Then the operator \(A^{-1}\) has the Gateaux derivative \(D\) at the point \(v_0\) such that

$$ \langle \mu,Dh\rangle = \bigl\langle p_{\mu}(v_{0}),h \bigr \rangle \quad \forall \mu\in Y^{*},h\in V. $$
(16)

Proof

Let the operator \(D\) be defined by (16). It is a linear continuous map from \(V\) to \(Y\). Using (15) and (16) we get

$$\bigl\| \bigl[y(v_{0}+\sigma h)-y(v_{0}) \bigr]/\sigma-Dh \bigr\|= \sup_{\mu \in M} \bigl| \bigl\langle p_{\mu}(v_{\sigma})-p_{\mu}(v_{0}),h \bigr\rangle \bigr| $$

by the definition of the norm. By Property 4 we have \(p_\mu(v_\sigma)\to p_\mu(v_0)\) *-weakly in \(V^*\) uniformly with respect to \(\mu\in M\). Passing to the limit in the last equality as σ→0, we get the convergence

$$\bigl[y(v_{0}+\sigma h)-y(v_{0}) \bigr]/\sigma\rightarrow Dh \quad \mbox{in } Y \mbox{ for all } h\in V. $$

So the operator \(D\) is the Gateaux derivative of the operator \(A^{-1}\) at the point \(v_0\). □

Let us explain applications of this result.

Lemma 3

The operator \(A\) for (8) satisfies Properties 1–4 if \(H^{1}_{0}(\varOmega )\subset L_{q}(\varOmega )\).

Proof

Property 1 is the unique solvability of (8). The differentiability of the operator \(A\) (Property 2) is obvious; moreover, its derivative is defined by the equality

$$A'(y)h=-\varDelta h+(\rho+1)|y|^{\rho}h \quad \forall h\in Y. $$

It remains to verify Properties 3 and 4 using the properties of the adjoint equation (13).

By the integral mean value theorem we have

$$G(v)y=-\varDelta y+(\rho+1) \bigl|y_{0}+\varepsilon\bigl[y(v)-y_{0}\bigr] \bigr|^{\rho}y, $$

where ε∈[0,1]. Define

$$g(v)= \sqrt{\rho+1}\, \bigl|y_{0}+\varepsilon\bigl[y(v)-y_{0}\bigr] \bigr|^{\rho/2}, $$

so that we get

$$G(v)y=-\varDelta y+g(v)^{2}y. $$

Then we obtain the equality

$$\bigl\langle G(v)^{*}p,y \bigr\rangle = \bigl\langle p,G(v)y \bigr \rangle =\int_\varOmega { \bigl[-\varDelta y+g(v)^{2}y \bigr]p dx}=\int_\varOmega { \bigl[-\varDelta p+g(v)^{2}p \bigr]y dx} $$

for all \(y\in Y\), \(p\in V^*\), \(v\in V\). So we get

$$G(v)^{*}p=-\varDelta p+g(v)^{2}p, $$

and (13) is transformed to

$$ -\varDelta p_{\mu}(v_{\sigma})+g(v_{\sigma})^{2}p_{\mu}(v_{\sigma})= \mu. $$
(17)

Multiplying (17) by \(p_\mu(v_\sigma)\) and integrating over Ω, we have

$$\int_\varOmega { \bigl|\nabla p_{\mu}(v_{\sigma}) \bigr|^{2}}dx+\int_\varOmega { \bigl|g(v_{\sigma})p_{\mu}(v_{\sigma}) \bigr|^{2}}dx=\int_\varOmega {\mu p_{\mu}(v_{\sigma})}dx. $$

Then we obtain the inequality

$$\bigl\|p_{\mu}(v_{\sigma}) \bigr\|^{2}+ \bigl\|g(v_{\sigma})p_{\mu}(v_{\sigma}) \bigr\|^{2}_{2}\leq \|\mu\|_{*} \bigl\|p_{\mu}(v_{\sigma}) \bigr\|, $$

where \(\|\cdot\|_p\) is the norm in \(L_{p}(\varOmega )\). So we get

$$ \bigl\|p_{\mu}(v_{\sigma}) \bigr\| \leq \|\mu\|_{*}, \qquad \bigl\|g(v_{\sigma})p_{\mu}(v_{\sigma}) \bigr\|_{2}\leq \|\mu \|_{*}. $$
(18)

Then (17) has a unique solution \(p_\mu(v_\sigma)\in V^*\) for all \(\mu\in Y^*\), \(h\in V\), and σ; hence Property 3 holds.

The space \(V\) is reflexive, so it is sufficient to prove that \(p_\mu(v_\sigma)\to p_\mu(v_0)\) weakly in \(V^*\) uniformly with respect to μ as σ→0 for every \(h\in V\). The set \(\{p_\mu(v_\sigma)\}\) is bounded in the space \(H^{1}_{0}(\varOmega )\), and the set \(\{g(v_\sigma)p_\mu(v_\sigma)\}\) is bounded in the space \(L_{2}(\varOmega )\), uniformly with respect to \(\mu\in M\) for every \(h\in V\), because of the inequalities (18). Using the Banach–Alaoglu theorem we get \(p_\mu(v_\sigma)\to p\) weakly in \(H^{1}_{0}(\varOmega )\) uniformly with respect to \(\mu\in M\) for every \(h\in V\). Applying the Rellich–Kondrashov theorem we get \(p_\mu(v_\sigma)\to p\) strongly in \(L_{2}(\varOmega )\) and a.e. on Ω. Using the continuity of the solution of (8) with respect to the absolute term, we obtain \(y(v_\sigma)\to y(v_0)\) in \(H^{1}_{0}(\varOmega )\) and a.e. on Ω. Then

$$\bigl|g(v_{\sigma}) \bigr|^{2}p_{\mu}(v_{\sigma}) \rightarrow (\rho+1)|y_{0}|^{\rho}p \quad \mbox{a.e. on } \varOmega . $$

The sets \(\{p_\mu(v_\sigma)\}\), \(\{y(v_\sigma)\}\), and \(\{g(v_\sigma)^{2/\rho}\}\) are uniformly bounded in \(L_{q}(\varOmega )\). By the Hölder inequality with exponents \(q/\rho\) and \(q\) (note that \(\rho/q+1/q=1/q'\)) we have

$$\bigl\|g(v_{\sigma})^{2}p_{\mu}(v_{\sigma}) \bigr\|_{q'}\leq \bigl\|g(v_{\sigma})^{2} \bigr\|_{q/\rho} \bigl\|p_{\mu}(v_{\sigma}) \bigr\|_{q}. $$

So the set \(\{g(v_\sigma)^{2}p_\mu(v_\sigma)\}\) is uniformly bounded in \(L_{q'}(\varOmega )\). Using Lemma 1.3 (see Chap. 1 in [12]), we get

$$g(v_{\sigma})^{2}p_{\mu}(v_{\sigma})\rightarrow ( \rho+1)|y_{0}|^{\rho}p \quad \mbox{weakly in } L_{q'}( \varOmega ) $$

uniformly with respect to \(\mu\in M\) for every \(h\in V\).

Let us multiply (17) by a function \(\lambda \in H^{1}_{0}(\varOmega )\). After integration we get

$$\int_\varOmega { \bigl[-\varDelta p_{\mu}(v_{\sigma})+g(v_{\sigma})^{2}p_{\mu}(v_{\sigma}) \bigr]\lambda}dx=\int_\varOmega {\lambda\mu}dx. $$

Passing to the limit as σ→0, we obtain that the function \(p=p_\mu(v_0)\) satisfies the equation

$$ -\varDelta p_{\mu}(v_{0})+(\rho+1)|y_{0}|^{\rho}p_{\mu}(v_{0})= \mu. $$
(19)

Thus \(p_\mu(v_\sigma)\to p_\mu(v_0)\) weakly in \(H^{1}_{0}(\varOmega )\) uniformly with respect to \(\mu\in M\) for every \(h\in V\); that is, Property 4 holds. □

By Lemma 3, the differentiability of the solution of (8) with respect to the absolute term follows from Theorem 2 whenever the embedding \(H^{1}_{0}(\varOmega )\subset L_{q}(\varOmega )\) holds.

Lemma 4

Properties 1–4 follow from the assumptions of the Inverse Function Theorem.

Proof

The existence of the inverse operator is a corollary of the Inverse Function Theorem. The differentiability of the operator \(A\) is an assumption of this theorem. So the main difficulty is the analysis of (13), namely the justification of Properties 3 and 4. Equation (13) can be transformed to

$$G(v_{0})^{*}p_{\mu}(v_{\sigma}) = A'(y_{0})^{*}p_{\mu}(v_{\sigma}) = \bigl[G(v_{0})^{*}-G(v_{\sigma})^{*} \bigr]p_{\mu}(v_{\sigma})+\mu. $$

The derivative \(A'(y_0)\) is invertible by the assumptions of the Inverse Function Theorem, so its adjoint is invertible too. Then (13) can be transformed to the equality

$$ p_{\mu}(v_{\sigma})=L_{\mu}(\sigma h)p_{\mu}(v_{\sigma}), $$
(20)

where the map \(L_\mu(\sigma h): V^*\to V^*\) is defined by the formula

$$L_{\mu}(\sigma h)p= \bigl[A'(y_{0})^{*} \bigr]^{-1} \bigl\{ \bigl[G(v_{0})^{*}-G(v_{\sigma})^{*} \bigr]p+\mu \bigr\}. $$

Using properties of the operator norm we get the inequality

$$\begin{aligned} \bigl\|L_{\mu}(\sigma h)p_{1}-L_{\mu}(\sigma h)p_{2} \bigr\|_{V^{*}}&= \bigl\| \bigl[A'(y_{0})^{*} \bigr]^{-1} \bigl[G(v_{0})^{*}-G(v_{\sigma})^{*} \bigr](p_{1}-p_{2}) \bigr\|_{V^{*}} \\ &\leq \bigl\| \bigl[A'(y_{0})^{*} \bigr]^{-1} \bigr\| \bigl\|G(v_{0})^{*}-G(v_{\sigma})^{*} \bigr\|\|p_{1}-p_{2}\|_{V^{*}} \end{aligned} $$

for all \(p_1,p_2\in V^*\). Then we obtain

$$\begin{aligned} &\bigl\|L_{\mu}(\sigma h)p_{1}-L_{\mu}(\sigma h)p_{2} \bigr\|_{V^{*}}\\ & \quad \leq \bigl\|A'(y_{0})^{-1} \bigr\| \bigl\|G(v_{0})-G(v_{\sigma}) \bigr\|\|p_{1}-p_{2} \|_{V^{*}} \quad \forall p_{1},p_{2}\in V^{*} \end{aligned} $$

because the norms of an operator and of its adjoint coincide. The operator \(A^{-1}\) is continuous at the point \(v_0\) by the Inverse Function Theorem. Therefore we get the convergence \(y(v_0+\sigma h)\to y_0\) in \(Y\) as σ→0 for every \(h\in V\). Using the continuous differentiability of the operator \(A\) at the point \(y_0\), we get \(G(v_\sigma)\to G(v_0)\) in the operator norm. The value σ can be chosen small enough that

$$\bigl\|G(v_{\sigma})-G(v_{0}) \bigr\| \leq \chi \bigl\|A'(y_{0})^{-1} \bigr\|^{-1}, $$

where 0<χ<1. So we obtain the estimate

$$\bigl\|L_{\mu}(\sigma h)p_{1}-L_{\mu}(\sigma h)p_{2} \bigr\|_{V^{*}}\leq \chi \|p_{1}-p_{2} \|_{V^{*}} \quad \forall p_{1},p_{2}\in V^{*}. $$

Thus the operator \(L_\mu(\sigma h)\) is a contraction. Then (20) has a unique solution \(p_\mu(v_\sigma)\in V^*\) by the Contraction Mapping Theorem.

We have \(G(v_\sigma)\to G(v_0)\) as σ→0, so \(G(v_\sigma)\lambda\to G(v_0)\lambda\) in \(V\) for every \(\lambda\in Y\). Using the obtained inequalities, we get

$$\begin{aligned} \bigl\|p_{\mu}(v_{\sigma}) \bigr\|_{V^{*}} &= \bigl\|L_{\mu}( \sigma h)p_{\mu}(v_{\sigma}) \bigr\|_{V^{*}} \\ &\leq \bigl\|\bigl[A'(y_{0})^{*}\bigr]^{-1} \bigr\| \bigl\| \bigl[G(v_{0})^{*}-G(v_{\sigma})^{*} \bigr]p_{\mu}(v_{\sigma})+\mu \bigr\|_{Y^{*}} \\ &\leq \bigl\|A'(y_{0})^{-1} \bigr\| \bigl[ \bigl\|G(v_{0})-G(v_{\sigma}) \bigr\| \bigl\|p_{\mu}(v_{\sigma}) \bigr\|_{V^{*}}+ \|\mu\|_{Y^{*}} \bigr] \\ &\leq\chi \bigl\|p_{\mu}(v_{\sigma}) \bigr\|_{V^{*}}+ \bigl\|A'(y_{0})^{-1} \bigr\|\|\mu\|_{Y^{*}}. \end{aligned} $$

So we have

$$(1-\chi) \bigl\|p_{\mu}(v_{\sigma}) \bigr\|_{V^{*}}\leq \bigl\|A'(y_{0})^{-1} \bigr\|\|\mu\|_{Y^{*}}. $$

Hence the set \(\{p_\mu(v_\sigma)\}\) is bounded, and \(p_\mu(v_\sigma)\to p\) *-weakly in \(V^*\) for every \(h\in V\) as σ→0.

Using (13) we get

$$\bigl\langle p_{\mu}(v_{\sigma}),G(v_{\sigma})\lambda \bigr \rangle =\langle\mu,\lambda\rangle \quad \forall \lambda \in Y. $$

Here \(\{p_\mu(v_\sigma)\}\) converges *-weakly and \(\{G(v_\sigma)\}\) converges strongly. Passing to the limit, we obtain \(A'(y_0)^{*}p=\mu\), hence \(p=p_\mu(v_0)\). □

Thus the assumptions of Theorem 2 follow from those of the Inverse Function Theorem. However, the assertion of Theorem 2 may remain true when the assumptions of the Inverse Function Theorem are not satisfied.

14.4 Extended Differentiation of the Inverse Operator

The solution of (8) is differentiable with respect to the absolute term for small enough values of the dimension \(n\) and of the nonlinearity parameter ρ, but not for large values of these parameters. Suppose \(n\geq 3\). By the Sobolev embedding theorem the embedding \(H^{1}_{0}(\varOmega )\subset L_{q}(\varOmega )\) holds if \(\rho\leq 4/(n-2)\); it guarantees the differentiability of the considered inverse operator. However, this embedding fails as the parameter ρ increases, and the solution of the equation then ceases to be differentiable with respect to the absolute term. This seems a strange situation: the properties of the inverse operator would change with a jump in a neighborhood of some value of ρ, the differentiability disappearing once this value is passed. Such a jump seems implausible. We may instead suppose the existence of a form of operator differentiability weaker than the Gateaux derivative. We would also like to determine such an extension of the operator derivative, because the solvability of our optimization problem was proved for all values of the dimension and of the nonlinearity parameter.

There exist extensions of the classical operator differentiation, for example, the subdifferential calculus [14], Clarke derivatives [15], and the quasidifferential calculus [7]. They are also used for solving nonsmooth optimization problems. These tools are effective for the analysis of operators with nonsmooth terms, for example, the absolute value or the maximum of functions. However, no such terms are present in our case. So we will define another extension of the operator derivative.

It is known that “the general idea of the differential calculus is a local approximation of a function by a linear function” (see p. 170 in [16]). Differentiation is a tool for the local approximation of the analyzed object. The desired form of an operator derivative can be obtained by weakening the topological approximation properties of the differentiation. Then we get the extended operator derivative (see [17–19]).

Definition

An operator \(L: V\to Y\) is called (\(V_0,Y_0;V_1,Y_1\))-extended differentiable in the sense of Gateaux at the point \(v_0\in V\) if there exist linear topological spaces \(V_0\), \(Y_0\), \(V_1\), \(Y_1\) with continuous embeddings

$$V_{1}\subset V_{0} \subset V, \qquad Y \subset Y_{0} \subset Y_{1}, $$

and a linear continuous operator \(D: V_0\to Y_0\) such that

$$\bigl[L(v_{0}+\sigma h)-L(v_{0}) \bigr]/\sigma \rightarrow Dh \quad \mbox{in }Y_{1}\mbox{ for all }h\in V_{1} $$

as σ→0.

It is obvious that the (V,Y;V,Y)-derivative is the standard Gateaux derivative. The following result is known (see Theorem 4 in [18]; Theorem 5.4 in [19]).

Lemma 5

The operator \(A^{-1}\) for (8) is (\(V_0,Y_0;V_1,Y_1\))-extended differentiable in the sense of Gateaux at an arbitrary point \(v_0\in V\), where

$$Y_{1}=H^{1}_{0}(\varOmega ), \qquad Y_{0}=Y_{1} \cap \bigl\{y | |y_{0}|^{\rho/2}y\in L_{2}(\varOmega ) \bigr \}, $$
$$V_{1}=H^{-1}(\varOmega ), \qquad V_{0}=V_{1}+\bigl \{v | v=|y_{0}|^{\rho/2}\varphi, \varphi\in L_{2}(\varOmega ) \bigr\}, \quad y_{0}=y(v_{0}) ,$$

moreover, its derivative D satisfies the equality

$$ \int_\varOmega {\mu Dh}dx = \int_\varOmega {p_{\mu}(v_{0})h}dx \quad \forall \mu \in Y_{0}^{*},h\in V_{0}, $$
(21)

and \(p_\mu(v_0)\) is the solution of the homogeneous Dirichlet problem for (19).

Thus the inverse operator for the given example is extended differentiable for all values of the dimension and of the nonlinearity parameter. Its extended derivative reduces to the Gateaux derivative for small enough values of these characteristics. However, the Gateaux derivative does not exist for large values, that is, when the problem is difficult enough; the difference between the standard derivative and the extended one is determined by this degree of difficulty. Thus the inverse operator is extended differentiable without any constraints, while the extended derivative departs from the classical one as the parameters that determine the difficulty of the problem grow. We therefore obtain a gradual change of the properties of the inverse operator under a gradual change of its parameters, whereas the standard theory of derivatives would permit a change with a jump.

We now show that this result suffices for the analysis of the given optimization problem without any additional assumptions.

Corollary 4

The solution of the minimization problem of the functional I on the set U for (8) satisfies the variational inequality

$$ \int_\varOmega { (\alpha \varLambda v_{0}-p_{0} ) (v-v_{0})}dx \geq 0 \quad \forall v\in U_{1}, $$
(22)

where \(U_1=U\cap(v_0+V_1)\), and \(p_0\) is a solution of (11).

Indeed, if \(v_0\) is a solution of the optimization problem, then

$$I\bigl[v_{0}+\sigma (v-v_{0})\bigr]-I(v_{0}) \geq 0 \quad \forall v\in U. $$

Choose \(v\in v_0+V_1\). Dividing by σ, passing to the limit, and using Lemma 5, we get

$$\int_\varOmega {\alpha \varLambda v_{0}(v-v_{0})}dx+ \int_\varOmega {\nabla(y_{0}-y_{d})\nabla D(v-v_{0})}dx \geq 0 \quad \forall v\in U_{1}. $$

Then the inequality (22) is true.

If \(H^{1}_{0}(\varOmega )\subset L_{q}(\varOmega )\), then \(U_1=U\), and the variational inequalities (10) and (22) coincide. Thus necessary optimality conditions can be obtained without any additional assumptions by means of the theory of extended derivatives. Optimization problems for elliptic equations with power nonlinearities without Gateaux differentiability of the control-state mapping were considered in [18, 19], but the control space there was narrower and the state functional more regular. This technique was applied to optimization problems for other equations in [20].

Note that Lemma 5 relies on the technique of the proof of Theorem 2. We may expect that the extended differentiability of the inverse operator can be obtained in the general case. Consider Banach spaces \(Y\), \(V\), a map \(A: Y\to V\), and points \(y_0\in Y\), \(v_0=Ay_0\). Let \(V_1\) be a Banach subspace of \(V\) with a neighborhood \(O_1\) of zero. Then \(O=v_0+O_1\) is a neighborhood of \(v_0\). We suppose the following assertion.

Property 5

The operator A is invertible on the set O.

Define \(y(v)=A^{-1}v\). We get the equality

$$Ay(v_{\sigma})-Ay(v_{0})=\sigma h $$

for every \(h\in V_1\) and small enough σ, where \(v_\sigma=v_0+\sigma h\). Let \(G(v)\) be the operator from the proof of Theorem 2. We have

$$G(v_{\sigma}) \bigl[y(v_{\sigma})-y(v_{0}) \bigr]=\sigma h, $$

so

$$\bigl\langle\lambda,{G}(v_{\sigma}) \bigl[y(v_{\sigma})-y(v_{0}) \bigr] \bigr\rangle = \sigma \langle\lambda,h\rangle \quad \forall \lambda\in V^{*}. $$

Consider Banach spaces \(V(v)\) and \(Y(v)\) such that the embeddings \(Y\subset Y(v)\), \(V_1\subset V(v)\), and \(V(v)\subset V\) are continuous for all \(v\in O\). Let the following assumption be true.

Property 6

The operator \(A\) is Gateaux differentiable; moreover, there exists a continuous extension \(\overline{G}(v)\) of the operator \(G(v)\) to \(Y(v)\) whose image is a subset of \(V(v)\) for all \(v\in O\).

Using the properties \(y(v)\in y_0+Y(v)\) and \(V(v)\subset V\), we get

$$ \bigl\langle\overline{G}(v_{\sigma})^{*}\lambda, \bigl[y(v_{\sigma})-y(v_{0}) \bigr] \bigr\rangle = \sigma \langle \lambda,h\rangle \quad \forall \lambda\in V^{*}. $$
(23)

It is an analogue of (12). Consider the linear operator equation

$$ \overline{G}(v_{\sigma})^{*}p_{\mu}(v_{\sigma})= \mu, $$
(24)

which is an analogue of (13). It can be transformed to

$$\overline{A}'(y_{0})^{*}p_{\mu}(v_{0})= \mu $$

for \(v=v_0\), where \(\overline{A}'(y_0)=\overline{G}(v_0)\) is the extension of the operator \(A'(y_0)=G(v_0)\) to the set \(Y(v_0)\).

Consider a Banach space \(Y_1\) such that the embedding \(Y(v)\subset Y_1\) is continuous and dense for all \(v\in O\). We suppose the following condition.

Property 7

Equation (24) has a unique solution \(p_\mu(v)\in V(v)^*\) for all \(v\in O\), \(\mu\in Y(v)^*\).

Setting \(\lambda=p_\mu(v_\sigma)\) in (23) for a small enough σ, using (24), and dividing by σ, we get

$$ \bigl\langle \mu, \bigl[y(v_{0}+\sigma h)-y(v_{0}) \bigr]/ \sigma \bigr\rangle = \bigl\langle p_{\mu}(v_{\sigma}),h\bigr \rangle \quad \forall \mu \in Y(v_{\sigma})^{*},h\in V_{1}. $$
(25)

We will use the additional assumption.

Property 8

The convergence \(p_\mu(v_\sigma)\to p_\mu(v_0)\) *-weakly in \(V_1^*\), uniformly with respect to \(\mu\in M\) as σ→0, holds for every \(h\in V_1\), where now \(M\) is the unit sphere of \(Y_1^*\).

The extended differentiability of the inverse operator is guaranteed by the following result.

Theorem 3

Suppose Properties 5–8 hold. Then the operator \(A^{-1}\) has the (\(V(v_0),Y(v_0);V_1,Y_1\))-extended Gateaux derivative \(D\) at the point \(v_0\) such that

$$ \langle \mu,Dh\rangle = \bigl\langle p_{\mu}(v_{0}),h\bigr \rangle \quad \forall \mu \in Y(v_{0})^{*}, h\in V(v_{0}). $$
(26)

Proof

By (25), (26) we get

$$ \begin{aligned}[b] &\bigl\langle \mu, \bigl[y(v_{0}+\sigma h)-y(v_{0}) \bigr]/ \sigma-Dh \bigr\rangle\\ & \quad = \bigl\langle p_{\mu}(v_{\sigma})- p_{\mu}(v_{0}),h \bigr\rangle \quad \forall \mu \in M,h\in V_{1}. \end{aligned} $$
(27)

Then

$$\bigl\| \bigl[y(v_{0}+\sigma h)-y(v_{0}) \bigr]/\sigma-Dh \bigr\|_{Y_{1}}= \sup_{\mu \in M} \bigl| \bigl\langle p_{\mu}(v_{\sigma})-p_{\mu}(v_{0}),h \bigr\rangle \bigr|. $$

We have \(p_\mu(v_\sigma)\to p_\mu(v_0)\) *-weakly in \(V_1^*\) uniformly with respect to \(\mu\in M\) for every \(h\in V_1\) as σ→0 by Property 8. Passing to the limit in the last equality, we obtain

$$\bigl[y(v_{0}+\sigma h)-y(v_{0})\bigr]/\sigma\rightarrow Dh \quad \mbox{in } Y_{1} $$

for every \(h\in V_1\). Thus \(D\) is an extended derivative of the inverse operator. □

A result on the extended differentiability of the inverse operator for non-normed spaces was obtained in [18].

Let us prove that the assumptions of Theorem 3 hold for the considered example.

Lemma 6

The operator \(A\), which is defined by (8), satisfies Properties 5–8.

Proof

Property 5 is the solvability of (8) in a neighborhood of the given point; this assumption obviously holds. The differentiability of the operator \(A\) is clear. The operator \(G(v)\) in our case is determined by the equality

$$G(v)y=-\varDelta y +g(v)^{2}y \quad \forall y\in Y, $$

where

$$g(v)^{2}=(\rho+1) \bigl|y_{0}+\varepsilon \bigl[y(v)-y(v_{0}) \bigr] \bigr|^{\rho}, \quad \varepsilon \in [0,1]. $$

Let the spaces \(Y_1\), \(V_1\), \(Y(v)\), \(V(v)\) be those defined in Lemma 5. Define the map \(\overline{G}(v)\) by the equality

$$\overline{G}(v)y=-\varDelta y+g(v)^{2}y \quad \forall y\in Y(v). $$

Then Property 6 is true. The validity of Properties 7 and 8 was established in the proof of Lemma 5. □

Thus extended differentiability of the inverse operator for (8) follows from Theorem 3.

Lemma 7

Properties 5–8 follow from the assumptions of the Inverse Function Theorem.

Indeed, Property 5 is a direct corollary of this theorem. Define the spaces \(V_1=V\), \(Y_1=Y\). Then we get \(Y(v)=Y\), \(V(v)=V\). So the operator \(\overline{G}(v)\) is equal to \(G(v)\), and Property 6 is trivial. Properties 7 and 8 then reduce to Properties 3 and 4, whose validity was proved above.

Thus Theorem 3 is a generalization of Theorem 2. The obtained results can be used in other applications where it is necessary to differentiate an inverse operator. For example, extended differentiable submanifolds of Banach spaces are defined in [21, 22], and optimization control problems on differentiable submanifolds are considered there. Analogous results could be obtained for the implicit operator, including the case of non-normed spaces (see [23]). Banach spaces with extended differentiable operators form a category, and necessary optimality conditions have a categorical interpretation (see [24]).