1 Introduction

We start with the singular perturbation problem

$$\begin{aligned} y^{'}(t)= & {} f(y(t),z(t)), \\ \epsilon z^{'}(t)= & {} g(y(t),z(t)), \quad 0< \epsilon<<1,\\ y(0)= & {} y_0, z(0)=z_0. \end{aligned}$$

The structure of the solutions of these problems is well understood (Hairer and Wanner 1991): they are a superposition of a smooth solution admitting an \(\epsilon \)-expansion plus rapidly decaying transient functions. Of course, we must assume some hypotheses on the function g: \(g_z(y,z)\) is invertible and its logarithmic norm is negative in an \(\epsilon \)-independent neighborhood of the solution.

To approximate these equations, we should use implicit methods. For explicit methods, the discretization parameters depend on the perturbation parameter, so they have no practical interest. The error analysis and the existence and uniqueness of solutions of the implicit Runge–Kutta equations are also analyzed in Hairer and Wanner (1991). The results rest on the Newton–Kantorovich theory. In particular, we face the difficult task of finding good initial guesses when approximating the nonlinear Runge–Kutta equations by Newton-type methods. In practice, this can be a real problem, as we point out in the following example:

1.1 Linear vs nonlinear singular perturbation problems

Problems with the nonlinear equations can appear even for very robust methods using variable step-size implementations. In this section, we analyze the numerical behavior, for linear and nonlinear equations, of the stiff solver ode23s from the MATLAB ODE suite (The MathWorks 2017).

We consider the linear problem (Stoer and Bulirsch 1993)

$$\begin{aligned} y_{1}^{\prime }(t)= & {} \dfrac{\lambda _{1}+\lambda _{2}}{2}y_{1}(t)+\dfrac{\lambda _{1}-\lambda _{2}}{2} y_{2}(t),\\ \epsilon y_{2}^{\prime }(t)= & {} \dfrac{\lambda _{1}-\lambda _{2}}{2}y_{1}(t)+\dfrac{\lambda _{1}+\lambda _{2}}{2} y_{2}(t), \end{aligned}$$

with \(\lambda _{i}<0,\) and two (associated) nonlinear problems:

$$\begin{aligned} y_{1}^{\prime }(t)= & {} \dfrac{\lambda _{1}+\lambda _{2}}{2}y_{1}(t)+\dfrac{\lambda _{1}-\lambda _{2}}{2} y_{2}(t) + y_1(t) y_2(t), \\ \epsilon y_{2}^{\prime }(t)= & {} \dfrac{\lambda _{1}-\lambda _{2}}{2}y_{1}(t)+\dfrac{\lambda _{1}+\lambda _{2}}{2} y_{2}(t), \end{aligned}$$

and

$$\begin{aligned} y_{1}^{\prime }(t)= & {} \dfrac{\lambda _{1}+\lambda _{2}}{2}y_{1}(t)+\dfrac{\lambda _{1}-\lambda _{2}}{2} y_{2}(t) + y_1(t) y_2(t), \\ \epsilon y_{2}^{\prime }(t)= & {} \dfrac{\lambda _{1}-\lambda _{2}}{2}y_{1}(t)+\dfrac{\lambda _{1}+\lambda _{2}}{2} y_{2}(t)+2 y_1(t) y_2(t). \end{aligned}$$

We consider in all the cases \(\lambda _1=-10^{15}\), \(\lambda _2=-10^{-15}\), \(\epsilon =10 ^{-15}\), and (1, 1) as the initial condition.

As we can see in Fig. 1, for the linear case and for the first nonlinear perturbation, the method gives good results even for these extreme parameters. In this case, the nonlinear part appears only in the first equation (the one without \(\epsilon \)).

However, some oscillations occur in the second nonlinear problem (Fig. 2). The stiffness caused by the linear part at the beginning of the interval is integrated well. The oscillations appear at the end of the interval, where the nonlinear part dominates. This indicates that the initial guess considered lies outside the basin of attraction of the Newton-type method used (Hairer and Wanner 1991).

Fig. 1
figure 1

Left second component of the linear problem, right second component of the first nonlinear problem

Fig. 2
figure 2

Left second component of the second nonlinear problem, right a zoom of the oscillation near the end of the interval
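To make the per-step nonlinear solves of the classical approach concrete, here is a minimal sketch (not ode23s itself) that applies the implicit Euler method with a Newton iteration to the second nonlinear problem. The parameters \(\lambda _1=-10^{2}\), \(\lambda _2=-1\), \(\epsilon =10^{-2}\) are hypothetical and deliberately much milder than those above, so that Newton started from the previous step still converges:

```python
import numpy as np

# Second nonlinear problem; lam1, lam2, eps are deliberately mild
# (hypothetical values, much less extreme than in the text)
lam1, lam2, eps = -1.0e2, -1.0, 1.0e-2
a, b = (lam1 + lam2) / 2.0, (lam1 - lam2) / 2.0

def rhs(u):
    y1, y2 = u
    return np.array([a * y1 + b * y2 + y1 * y2,
                     (b * y1 + a * y2 + 2.0 * y1 * y2) / eps])

def jac(u):
    y1, y2 = u
    return np.array([[a + y2, b + y1],
                     [(b + 2.0 * y2) / eps, (a + 2.0 * y1) / eps]])

def implicit_euler(u0, h, n_steps):
    u = np.array(u0, dtype=float)
    for _ in range(n_steps):
        v = u.copy()                       # Newton initial guess: previous step
        for _ in range(50):
            F = v - u - h * rhs(v)         # nonlinear equation of one step
            if np.linalg.norm(F) < 1e-12:
                break
            v -= np.linalg.solve(np.eye(2) - h * jac(v), F)
        u = v
    return u

u_end = implicit_euler([1.0, 1.0], h=1e-2, n_steps=100)   # integrate on [0, 1]
```

For the extreme parameters of the figures, the initial guess taken from the previous step may leave the basin of attraction of this inner Newton iteration, which is precisely the failure mode discussed above.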

1.2 Our variational approach

The ideas which we would like to introduce for the treatment of singular perturbation problems are based on the analysis of a certain error functional of the form:

$$\begin{aligned} E(y,z)=\frac{1}{2} \int _0^T|y^{'}(t)-f(y(t),z(t))|^2 + |\epsilon z^{'}(t)-g(y(t),z(t))|^2\,\mathrm{d}t, \end{aligned}$$

to be minimized among the absolutely continuous paths \(y,z:(0, T)\rightarrow \mathbf{R}^N\) with \((y(0),z(0))=(y_0,z_0)\). Note that if E(y, z) is finite for one such path (y, z), then automatically, \(y'\) and \(z'\) are square integrable.

This error functional is associated in a natural way with the original singular perturbation problem.
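A discrete version of this functional (forward differences for the derivatives and a left Riemann sum for the integral; the grid and the uncoupled sample problem \(f(y,z)=-y\), \(g(y,z)=-z\) are illustrative assumptions) can be sketched as:

```python
import numpy as np

def discrete_error(t, y, z, f, g, eps):
    """Riemann-sum approximation of the error functional E(y, z),
    using forward differences on the grid t."""
    h = np.diff(t)
    ry = np.diff(y) / h - f(y[:-1], z[:-1])        # y' - f(y, z)
    rz = eps * np.diff(z) / h - g(y[:-1], z[:-1])  # eps*z' - g(y, z)
    return 0.5 * np.sum((ry**2 + rz**2) * h)

# Sample uncoupled problem: exact solution y = e^{-t}, z = e^{-t/eps}
eps = 0.05
t = np.linspace(0.0, 1.0, 2001)
y_exact, z_exact = np.exp(-t), np.exp(-t / eps)
E_exact = discrete_error(t, y_exact, z_exact,
                         lambda y, z: -y, lambda y, z: -z, eps)
E_crude = discrete_error(t, np.ones_like(t), np.ones_like(t),
                         lambda y, z: -y, lambda y, z: -z, eps)
```

On the exact solution the discrete error is essentially the discretization error of the grid, while a crude constant path produces a value of order one.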

One main property of this functional is the absence of local minima different from the solution of the problem. Thus, typical minimization schemes like (steepest) descent methods work fine, as they can never get stuck in local minima and converge steadily to the solution of the problem, no matter what the initialization is. In our approach, based on optimality conditions, we only need to approximate linear singular perturbation problems. In particular, the problem of finding good initial guesses for Newton-type methods, to approximate the associated nonlinear equations, is avoided. In fact, we will be able to ensure global convergence independent of \(\epsilon \).

We should mention that we have already explored this point of view in some previous papers. Since the initial contributions (Amat and Pedregal 2009, 2013), we have also treated the reverse mechanism of discretizing first and then applying optimality (Amat et al. 2014). We have also addressed, from this viewpoint, other problems like DAEs (Amat et al. 2013) and problems with retarded arguments (Amat et al. 2012).

The paper is organized as follows: in Sect. 2, we introduce our algorithm and prove its global convergence for all \(\epsilon >0\); in Sect. 3, we present some numerical experiments pointing out some advantages of our variational approach in comparison with the classical implicit approach.

2 Some analysis of the variational approach

The analytical part of this contribution requires a special property of the maps f and g: for every \(C>0\) and small \(\eta >0\), there are constants \(D^f_{C, \eta }>0\) and \(D^g_{C, \eta }>0\), such that

$$\begin{aligned} |f(y+r,z+s)-f(y,z)-\nabla f(y,z)(r,s)|\le D_{C, \eta }^f|(r,s)|^2,\quad |(y,z)|\le C,|(r,s)|\le \eta , \end{aligned}$$

and

$$\begin{aligned} |g(y+r,z+s)-g(y,z)-\nabla g(y,z)(r,s)|\le D_{C, \eta }^g|(r,s)|^2,\quad |(y,z)|\le C,|(r,s)|\le \eta . \end{aligned}$$

This regularity is somewhat unsurprising, as our approach here is based on regularity and optimality. On the other hand, this regularity holds for most of the important problems in applications; it certainly does in all numerical tests performed in this work. See, however, Amat and Pedregal (2009) for a similar analysis under more general assumptions. Our emphasis here is placed on the fact that this optimization strategy may be used to set up approximation schemes based on the minimization of the error functional. Indeed, the analytical part is oriented towards providing a solid basis for this approximation procedure.

Proposition 1

Let \((\overline{y}, \overline{z})\) be a critical point for the error E. Then, \((\overline{y}, \overline{z})\) is the solution of the singular perturbation problem.

Proof

The proof is elementary. Based on the smoothness and bounds assumed on the mappings f and g, we conclude that if \((y,z)\equiv (\overline{y},\overline{z})\) is a critical point for the error E, then (y, z) must be a solution of the problem:

$$\begin{aligned}&-\frac{\mathrm{d}}{\mathrm{d}t}\left( y^{\prime }(t)-f(y(t),z(t))\right) -\left( y^{\prime }(t)-f(y(t),z(t))\right) \nabla f(y(t),z(t))=0\hbox { in }(0, T), \\&-\frac{\mathrm{d}}{\mathrm{d}t}\left( \epsilon z^{\prime }(t)-g(y(t),z(t))\right) -\left( \epsilon z^{\prime }(t)-g(y(t),z(t))\right) \nabla g(y(t),z(t))=0\hbox { in }(0, T), \\&y(0)=y_0, y^{\prime }(T)-f(y(T),z(T))=0, z(0)=z_0, \epsilon z^{'}(T)-g(y(T),z(T))=0. \end{aligned}$$

The vector-valued maps \(r(t)=y^{\prime }(t)-f(y(t),z(t))\) and \(s(t)=\epsilon z^{\prime }(t)-g(y(t),z(t))\) are then solutions of the linear, nondegenerate problems:

$$\begin{aligned} r^{\prime }(t)+r(t)\nabla f(y(t),z(t))= & {} 0\hbox { in }(0, T),\quad r(T)=0, \\ s^{\prime }(t)+s(t)\nabla g(y(t),z(t))= & {} 0\hbox { in }(0, T),\quad s(T)=0. \end{aligned}$$

The only solutions of these problems are \(r\equiv 0\) and \(s\equiv 0\), and so (y, z) is the solution of the singularly perturbed problem. \(\square \)

It therefore looks like a promising strategy to search for approximations of the solution (y, z) by minimizing the error E.

On the other hand, suppose that we start with a crude initial approximation \((y_{(0)},z_{(0)})\equiv (y, z)\) to the solution of our original singular perturbation problem; we could take \((y_{(0)},z_{(0)})=(y_0,z_0)\) for all t. We would like to improve this approximation in such a way that the error is significantly decreased.

It is straightforward to find the Gâteaux derivative of E at a given feasible (y, z) in the direction (r, s) with \((r(0),s(0))=(0,0)\). Namely,

$$\begin{aligned} E'(y,z)(r,s)= & {} \int _0^T (y^{\prime }(t)-f(y(t),z(t)),\epsilon z^{\prime }(t)-g(y(t),z(t)))^T\\&\cdot \,(r^{\prime }(t)-\nabla f(y(t),z(t))(r(t),s(t)),\epsilon s^{\prime }(t)-\nabla g(y(t),z(t))(r(t),s(t)))\,\mathrm{d}t. \end{aligned}$$

This expression suggests a natural way to select (r, s):

Choose (r, s), such that

$$\begin{aligned} r^{\prime }(t)-\nabla f(y(t),z(t)) (r(t),s(t))=f(y(t),z(t))-y'(t)\hbox { in }(0, T),\quad r(0)=0, \\ \epsilon s^{\prime }(t)-\nabla g(y(t),z(t)) (r(t),s(t))=g(y(t),z(t))-\epsilon z'(t)\hbox { in }(0, T),\quad s(0)=0. \end{aligned}$$

In this way, it is clear that \(E'(y,z)(r,s)=-2E(y,z)\), and so the (local) decrease of the error is of the size of E(y, z). Finding (r, s) requires only solving the above linear problem.
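A minimal sketch of one such descent step, for a problem with a fast variable only (no y-equation) and a mildly stiff \(\epsilon \), so that a simple forward scheme for the linear update is stable; the nonlinearity \(g(z)=-z+10^{-2}z^2\) is borrowed from the test problem of Sect. 3.2, and the grid is an illustrative assumption:

```python
import numpy as np

eps = 0.5                        # mildly stiff, so the forward scheme is stable
g  = lambda z: -z + 0.01 * z**2
gz = lambda z: -1.0 + 0.02 * z   # derivative of g

h, T = 1e-3, 1.0
t = np.arange(0.0, T + h / 2, h)
n = len(t)

def E(z):
    """Discrete error functional: forward differences + Riemann sum."""
    rz = eps * np.diff(z) / h - g(z[:-1])
    return 0.5 * np.sum(rz**2) * h

z = np.ones(n)                   # crude initial guess: the initial condition
s = np.zeros(n)                  # forward scheme for eps*s' - gz(z)*s = g(z) - eps*z'
for k in range(n - 1):
    dz = (z[k + 1] - z[k]) / h
    s[k + 1] = s[k] + (h / eps) * (g(z[k]) - eps * dz + gz(z[k]) * s[k])

E0, E1 = E(z), E(z + s)          # error before and after the update
```

After the update, the residual consists only of the second-order Taylor remainder of g, so the error drops by several orders of magnitude in a single step.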

2.1 Strong global convergence

Suppose (y, z) is a pair of feasible paths in the interval (0, T), so that \(y(0)=y_0\), \(z(0)=z_0\), \(y'\) and \(z'\) are square integrable, and the quantity

$$\begin{aligned} E(y,z)=\frac{1}{2} \int _0^T|y^{\prime }(t)-f(y(t),z(t))|^2 + |\epsilon z^{\prime }(t)-g(y(t),z(t))|^2\,\mathrm{d}t, \end{aligned}$$

measures how far such a pair (y, z) is from being a solution of our problem. We also assume that \(|(y(t),z(t))| \le C\) for a fixed constant C and all \(t\in (0, T)\).

Choose \(\eta >0\) and \(0<\alpha <1\), so that \(\frac{\eta }{1-\sqrt{\alpha }}\le C\).

We then solve for (rs) the linear problem

$$\begin{aligned} r^{\prime }(t)-\nabla f(y(t),z(t)) (r(t),s(t))= & {} f(y(t),z(t))-y'(t)\hbox { in }(0, T),\quad r(0)=0, \\ \epsilon s^{\prime }(t)-\nabla g(y(t),z(t)) (r(t),s(t))= & {} g(y(t),z(t))-\epsilon z'(t)\hbox { in }(0, T),\quad s(0)=0, \end{aligned}$$

and intend to update (y, z) to \((y+r,z+s)\) in such a way that the error for \((y+r,z+s)\) is less than the error for the current iterate (y, z).

Note that

$$\begin{aligned} E(y+r,z+s)= & {} \frac{1}{2} \int _0^T|y^{'}(t)+r^{'}(t)-f(y(t)+r(t),z(t)+s(t))|^2 \\&+\, |\epsilon (z^{'}(t)+s^{'}(t))-g(y(t)+r(t),z(t)+s(t))|^2\,\mathrm{d}t \\= & {} \frac{1}{2} \int _0^T |f(y(t)+r(t),z(t)+s(t))-f(y(t),z(t))-\nabla f(y(t),z(t)) (r(t),s(t))|^2 \\&+\, |g(y(t)+r(t),z(t)+s(t))-g(y(t),z(t))-\nabla g(y(t),z(t)) (r(t),s(t))|^2\,\mathrm{d}t \end{aligned}$$

where we have used the differential equations satisfied by r and s.

By our assumption on f and g above,

$$\begin{aligned} |f(y+r,z+s)-f(y,z)-\nabla f(y,z)(r,s)|\le D_{C, \eta }^f|(r,s)|^2, \end{aligned}$$

and

$$\begin{aligned} |g(y+r,z+s)-g(y,z)-\nabla g(y,z)(r,s)|\le D_{C, \eta }^g|(r,s)|^2, \end{aligned}$$

provided that \(|(y,z)|\le C,|(r,s)|\le \eta \). Let \(D:= \max \{ D_{C, \eta }^f , D_{C, \eta }^g \}\).

Since we know that (r, s) is the solution of the above linear problem, and recalling our main assumption that \(g_z(y,z)\) is invertible and its logarithmic norm is negative in an \(\epsilon \)-independent neighborhood of the solution, we have the upper bounds (see Theorem 3.2 in Hairer and Wanner 1991):

$$\begin{aligned} \max \{|r(t)|^2,|s(t)|^2 \} \le T K E(y,z) \quad \hbox { for all }t\in [0, T], K>0. \end{aligned}$$
(1)

Notice that (r(t), s(t)) is the solution of a linear perturbation problem whose Jacobian with respect to the second component equals \(g_z(y,z)\) (the same as in the original problem). Thus, under the above hypothesis on g, we can ensure the above bounds for all the associated linear problems. In particular, all these constants are independent of \(\epsilon \).

Assume that we select \(T>0\) so small that

$$\begin{aligned} E(y,z)\le \frac{\eta ^2}{K T}, \end{aligned}$$
(2)

and then \(\max \{|r(t)|,|s(t)| \}\le \eta \) for all \(t\in [0, T]\).

By the previous bounds, we can write

$$\begin{aligned} E(y+r,z+s)\le \frac{D^2}{2}\int _0^T |r(t)|^4 + |s(t)|^4\,\mathrm{d}t\le \frac{D^2}{2} K^2 T^3E(y,z)^2. \end{aligned}$$
(3)

If, in addition, we demand, by making T smaller if necessary,

$$\begin{aligned} E(y,z)\le \frac{2\alpha }{D^2 K^2 T^3}, \end{aligned}$$
(4)

then \(E(y+r,z+s)\le \alpha E(y,z)\).

All these calculations form the basis of a typical induction argument (see Amat and Pedregal 2013 for more details), giving us the following convergence result:

Theorem 1

With the previous notation, the sequence

$$\begin{aligned} (y^{(j)},z^{(j)})=(y^{(j-1)}+r^{(j)}, z^{(j-1)}+s^{(j)}), \end{aligned}$$

where

$$\begin{aligned} (r^{(j)})'(t)-\nabla f(y^{(j-1)}(t),z^{(j-1)}(t)) (r^{(j)}(t),s^{(j)}(t))=f(y^{(j-1)}(t),z^{(j-1)}(t))-(y^{(j-1)})'(t), \\ \epsilon (s^{(j)})'(t)-\nabla g(y^{(j-1)}(t),z^{(j-1)}(t)) (r^{(j)}(t),s^{(j)}(t))=g(y^{(j-1)}(t),z^{(j-1)}(t))-\epsilon (z^{(j-1)})'(t), \end{aligned}$$

in (0, T), with \( r^{(j)}(0)=0, s^{(j)}(0)=0\), converges strongly in \(L^\infty (0, T)\) to the unique solution of the singularly perturbed problem in a small interval (0, T).

Since the various ingredients of the problem do not depend on T, we can obtain a global solution on a large interval by successively performing this analysis on intervals of appropriately small size. For instance, we can always divide a global interval (0, T) into a certain number n of subintervals of small length \(h\) (\(T = nh\)) with

$$\begin{aligned} \frac{E(y,z) D^2 K^2}{2 \alpha }\le \frac{1}{h^3}. \end{aligned}$$
(5)

3 Numerical analysis

We aim to provide some experimental evidence that our iterative schemes are competitive with the classical approach, without attempting at this stage to provide any further numerical analysis of our approach. We consider some singular perturbation problems well known in the literature.

In practice, we use as stopping criterion that the norms of \(r^{(j)}\) and \(s^{(j)}\) be smaller than a prefixed tolerance Tol.

We will observe the good convergence of our approach for different sizes of \(\epsilon \) and moderate sizes of the discretization parameter. As the initial guess of our algorithm, we simply take the initial condition of the problem.

Throughout the paper, for simplicity of notation, we have only considered autonomous problems. However, our approach works similarly for nonautonomous singularly perturbed problems.

3.1 A nonautonomous singularly perturbed problem

We consider the problem (Hairer and Wanner 1991)

$$\begin{aligned} \epsilon z'(t)= & {} - z(t) + \cos (t),\\ z(0)= & {} 1, \end{aligned}$$

whose explicit solution is

$$\begin{aligned} z(t)=(1+\epsilon ^2)^{-1} (\cos (t)+\epsilon \sin (t))+ C \exp (-t/\epsilon ), \end{aligned}$$

where \(C=1-(1+\epsilon ^2)^{-1}.\)

We consider the trapezoid method with \(N=100\) and \(T=10\), different tolerances for a given \(\epsilon \), and different \(\epsilon \) for two particular tolerances. In Tables 1, 2, and 3, we observe a very good numerical behavior of our approach, in accordance with our global convergence result.
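As a sanity check of this experiment, the following sketch applies the trapezoid method to the linear equation for the representative value \(\epsilon =10^{-2}\) (an illustrative choice; the tables use other values as well). Since the equation is linear, each step can be solved in closed form, with no Newton iteration involved:

```python
import numpy as np

eps, T, N = 1e-2, 10.0, 100
h = T / N
t = np.linspace(0.0, T, N + 1)

# Exact solution from the text
C = 1.0 - 1.0 / (1.0 + eps**2)
z_exact = (np.cos(t) + eps * np.sin(t)) / (1.0 + eps**2) + C * np.exp(-t / eps)

# Trapezoid rule for eps*z' = -z + cos(t); each step solved in closed form
z = np.empty(N + 1)
z[0] = 1.0
c = h / (2.0 * eps)
for n in range(N):
    z[n + 1] = ((1.0 - c) * z[n]
                + c * (np.cos(t[n]) + np.cos(t[n + 1]))) / (1.0 + c)

err = np.max(np.abs(z - z_exact))
```

The initial layer has tiny amplitude here (of order \(\epsilon ^2\)), so the moderate grid already delivers the second-order accuracy of the trapezoid rule on the smooth part.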

Table 1 Nonautonomous singularly perturbed problem, \(\epsilon =10^{-5}\)
Table 2 Nonautonomous singularly perturbed problem, \(Tol=10^{-3}\)
Table 3 Nonautonomous singularly perturbed problem, \(Tol=10^{-5}\)
Fig. 3
figure 3

Classical implementation of the trapezoid method, nonlinear test problem, ‘circle’-original, ‘asterisk’-approximation, left \(\epsilon =10^{-2}\), and right \(\epsilon =10^{-3}\)

3.2 A nonlinear test problem

Finally, we consider a small nonlinear perturbation of the scalar linear test problem. Indeed, we are interested in approximating the value z(1) of the solution z(t) of the problem:

$$\begin{aligned} \epsilon z'(t)= & {} - z(t) + 10^{-2} \cdot z^2(t),\\ z(0)= & {} 1, \end{aligned}$$

whose explicit solution is

$$\begin{aligned} z(t)=\frac{100}{1+99 e^{\frac{1}{\epsilon } t}}. \end{aligned}$$
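A quick numerical verification of this closed-form solution (an illustrative check, not part of the method): we evaluate its analytic derivative and confirm that the residual of the differential equation vanishes to rounding error.

```python
import numpy as np

eps = 1e-2
t = np.linspace(0.0, 1.0, 101)

E_t = np.exp(t / eps)
z  = 100.0 / (1.0 + 99.0 * E_t)                            # closed-form solution
zp = -100.0 * (99.0 / eps) * E_t / (1.0 + 99.0 * E_t)**2   # its derivative dz/dt

# Residual of eps*z' = -z + 10^{-2} z^2, and the initial condition z(0) = 1
residual = eps * zp - (-z + 1e-2 * z**2)
```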

We consider the trapezoid method

$$\begin{aligned} z_{n+1}=z_n+\frac{h}{2} (f(z_{n})+f(z_{n+1})). \end{aligned}$$
(6)

This method averages the forward and backward Euler methods, advancing the approximate solution at each step along a line whose slope is the arithmetic mean of the derivatives at the step's endpoints.
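The classical implementation can be sketched as follows, with the caveat that we take the mild value \(\epsilon =10^{-1}\), for which Newton started from the previous node converges; the small values of \(\epsilon \) in Fig. 3 are precisely where this strategy degrades:

```python
import numpy as np

eps, N = 1e-1, 100
h = 1.0 / N
t = np.linspace(0.0, 1.0, N + 1)
F  = lambda z: (-z + 1e-2 * z**2) / eps      # right-hand side: z' = F(z)
Fz = lambda z: (-1.0 + 2e-2 * z) / eps       # its derivative

z = np.empty(N + 1)
z[0] = 1.0
for n in range(N):
    w = z[n]                                  # Newton initial guess: previous node
    for _ in range(50):                       # solve the trapezoid equation (6)
        phi = w - z[n] - 0.5 * h * (F(z[n]) + F(w))
        if abs(phi) < 1e-13:
            break
        w -= phi / (1.0 - 0.5 * h * Fz(w))
    z[n + 1] = w

z_exact = 100.0 / (1.0 + 99.0 * np.exp(t / eps))
err = np.max(np.abs(z - z_exact))
```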

We consider a discretization of \(N=100\) points in [0, 1]. In Fig. 3, we plot the approximation using a classical implementation of the (implicit) trapezoid method, that is, solving the associated nonlinear equation at each step by Newton's method. The method gives a bad approximation when the stiffness of the problem increases: we observe some oscillations near \(t=0\), where the algorithm is not able to find a good initial guess for Newton's method. However, looking at Figs. 4 and 5, we see that our method, starting with

$$\begin{aligned} z^0(t)=z(0)=1 \end{aligned}$$

and taking \(Tol=10^{-4}\) in the stopping criterion of our iterative approach, converges in all the cases.
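Our iterative approach for this problem can be sketched as follows. For brevity, the inner linear solves use backward Euler instead of the trapezoid rule, and the grid and tolerance are illustrative; each outer iteration only solves a linear recursion, so no Newton initial guess is ever needed:

```python
import numpy as np

eps, T, N = 1e-3, 1.0, 100
h = T / N
g  = lambda z: -z + 1e-2 * z**2
gz = lambda z: -1.0 + 2e-2 * z

Tol = 1e-8
z = np.ones(N + 1)                      # initial guess: the initial condition
for j in range(50):
    s = np.zeros(N + 1)                 # backward Euler for the linear update:
    for k in range(N):                  # eps*s' - gz(z)*s = g(z) - eps*z', s(0)=0
        dz = (z[k + 1] - z[k]) / h
        s[k + 1] = (eps * s[k] / h + g(z[k + 1]) - eps * dz) \
                   / (eps / h - gz(z[k + 1]))
    z = z + s
    if np.max(np.abs(s)) < Tol:
        break

# After convergence, the path satisfies the (nonlinear) backward Euler equations
residual = np.max(np.abs(eps * np.diff(z) / h - g(z[1:])))
```

The fixed point of this iteration is exactly the discrete solution of the nonlinear scheme, yet every pass through the loop is a linear solve, mirroring the global convergence result of Sect. 2.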

Fig. 4
figure 4

New implementation of the trapezoid method, nonlinear test problem, ‘circle’-original, ‘asterisk’-approximation, left \(\epsilon =10^{-2}\), and right \(\epsilon =10^{-3}\)

Fig. 5
figure 5

New implementation of the trapezoid method, nonlinear test problem, ‘circle’-original, ‘asterisk’-approximation, and \(\epsilon =10^{-5}\)