1 Computer-assisted proofs and interval arithmetics

In the last 50 years, computing power has grown enormously. According to Moore’s Law [109], the number of transistors has doubled every two years since the 1970s. This growth has led to a wealth of new techniques on the border between pure and computational mathematics. However, even today, when we can perform computations at speeds of the order of petaflops (a quadrillion floating point operations per second), we cannot avoid the following questions, which remain fundamental in the rigorous analysis of the output of a computer program:

  1. Q1:

    Is a computer result influenced by the way the individual operations are done?

  2. Q2:

    Does the environment (operating system, computer architecture, compiler, rounding modes, \(\ldots \)) have any impact on the result?

Sadly, the answer to both questions is yes, as the following C++ programs illustrate (see Listings 1 and 3). The first one computes the partial sum of the harmonic series up to a given N in two ways: first adding the terms from largest to smallest, and then in the opposite order. The results for \(N = 10^{6}\) can be seen in Listing 2. They do not agree and, curiously, neither of them is the true result. The second program uses the MPFR library [69] to add two numbers given by the user in two different ways: rounding the result down and rounding it up. The output is printed in binary. We can see that the results differ (Listing 4).

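[Listing 1: C++ program summing the harmonic series in both orders (code not reproduced)]
[Listing 2: output of Listing 1 for \(N = 10^{6}\) (not reproduced)]
[Listing 3: C++ program using MPFR to add two numbers, rounding down and rounding up (code not reproduced)]
[Listing 4: output of Listing 3, in binary (not reproduced)]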

This shows that even the simplest algorithms need a careful analysis: only two operations suffice to give different results if executed in different order or with different rounding methods.
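Both effects can be reproduced with a short script. The following is only a sketch in Python, not the C++ programs referred to in the text: the helper names `to_f32`, `harmonic_f32` and `add_rounded` are ours, single precision is emulated by rounding every intermediate result to binary32, and the directed rounding uses the standard `decimal` module (decimal digits) as a stand-in for MPFR's binary rounding modes.

```python
import struct
from decimal import Context, Decimal, ROUND_CEILING, ROUND_FLOOR

_F32 = struct.Struct("f")


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return _F32.unpack(_F32.pack(x))[0]


def harmonic_f32(n, largest_first=True):
    """Sum 1/1 + ... + 1/n, rounding every intermediate result to binary32."""
    indices = range(1, n + 1) if largest_first else range(n, 0, -1)
    total = 0.0
    for i in indices:
        total = to_f32(total + to_f32(1.0 / i))
    return total


def add_rounded(a, b, digits=8):
    """Add two decimal strings, rounding the exact sum down and up
    to `digits` significant decimal digits."""
    x, y = Decimal(a), Decimal(b)
    down = Context(prec=digits, rounding=ROUND_FLOOR).add(x, y)
    up = Context(prec=digits, rounding=ROUND_CEILING).add(x, y)
    return down, up
```

Calling `harmonic_f32(10**6, True)` and `harmonic_f32(10**6, False)` (a few seconds in pure Python) gives two different values, and `add_rounded("1", "1e-20")` returns a pair that brackets the exact sum from below and above.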

1.1 What is a computer-assisted proof?

A computer-assisted proof may mean different things depending on the field. For simplicity, we will focus on the context of PDEs and ODEs. The starting point is typically an object or, in a broader sense, a behaviour. Most computer-assisted proofs are devoted to exhibiting an instance of these objects or behaviours. This is usually done in two steps:

  1. Step 1:

    Perform an analytical reduction of the problem to a (possibly large) set of (typically open) conditions such that, if those conditions are met, the theorem is proved. Sometimes the number of conditions is too large, or the conditions are too complicated, to be checked by hand, even though a human could do it without computer assistance given enough time and space. Examples of tour de force human-checked conditions can be found in [47, 48].

  2. Step 2:

    Use the computer to rigorously validate the set of conditions from Step 1.

We now present more details of the technique in 3 flavours, outlining some of its possible applications:

Example 1.1

Compute explicit, tight bounds of hard (singular) integrals and use them to track short time behaviour of solutions of a PDE (see [24, 42, 79]).

Example 1.2

Bound the norm of a given operator, then use a fixed point theorem to show (by contractivity) existence of solutions, even for ill-posed or singular problems (see [13, 54, 59]).

Example 1.3

Track the spectrum of a given operator, and use this information to say something about the stability/instability, solve an eigenvalue problem, or quantify spectral gaps (see [22]).

In this paper we will give instances of Examples 1.1 and 1.3, in Sect. 3 and 4 respectively.

A natural concern is whether the intermediate numerical calculations are error-prone or depend on the implementation. To circumvent this problem, we give up on calculating the exact answer (since we are interested in checking inequalities, as opposed to equalities) and instead factor the errors into the computation, so that every computed quantity comes with a rigorous bound.

The theory of interval analysis developed by Moore [110] is an example of a tool which, although impractical at the time of its conception due to insufficient computing resources, is now widely used. It belongs to the paradigm known as rigorous computing (in some contexts also called validated computing), in which numerical computations are used to provide rigorous mathematical statements about a result. The philosophy behind interval analysis consists in working with and producing objects which are not numbers, but intervals in which we are sure that the true result lies. Nevertheless, we need to be precise enough: even with plenty of resources, overestimation might produce intervals that are too wide to guarantee the desired result.

In this setting, the main building blocks are intervals containing the answer and intermediate results. In Sect. 2 we explain how to develop an arithmetic for these objects and how to work with it.

Nowadays, there are a few free libraries that implement interval arithmetics and its applications, such as CAPD (Computer Assisted Proofs in Dynamics, a package for rigorous numerics), C-XSC [95], MPFR/MPFI [69, 121] or Arb [97].

1.2 History

Lately, interval methods have become quite popular among mathematicians. Several highly non-trivial results have been established by the use of interval arithmetics, see for example [71, 82, 87, 102, 125] as a small sample.

In analysis, the most celebrated result is the proof of the dynamics of the Lorenz attractor (Smale’s 14th Problem) [131]. However, the study of the dynamics of a system has been restricted until very recently almost always to (typically low-dimensional) ODEs. Some examples involving ODEs but an infinite dimensional system are the computation of the ground state energy of atoms or the relativistic stability of matter (see [61, 64, 128]).

Regarding PDE, most of the work has been carried out for dissipative systems (i.e. systems in which the \(L^{2}\)-norm of the function decreases with time). The most popular ones are the Kuramoto–Sivashinsky equations or Navier–Stokes in low dimensions. The main feature of these models is that one can study the first N modes of the Fourier expansion of the function and see the rest as an “error”. Since the system is dissipative, if N is large enough, one can get a control on the error throughout time.

Other techniques reduce the problem to compute the norm of an operator (as in Example 1.2) and apply a fixed point theorem, or use topological methods (Conley index). They have been successfully applied for instance in computing the following: Conley index for Kuramoto–Sivashinsky [141], bifurcation diagram for stationary solutions of Kuramoto–Sivashinsky [2], stationary solutions of viscous 1D Burgers with boundary conditions [68], traveling wave solutions for 1D Burgers equation [67], periodic orbits of Kuramoto–Sivashinsky [3, 65, 66, 72, 140], Conley index for the Swift–Hohenberg equation [50], bifurcation diagram of the Ohta-Kawasaki equation [135], stability of periodic viscous roll waves of the KdV–KS equation [5], existence of hexagons and rolls for a pattern formation model [133], self-similar solutions of a 1D model of 3D axisymmetric Euler [96], and many others.

In the elliptic setting, similar techniques have been developed via finite elements [112, 139]. Very recently, papers dealing with the hyperbolic PDE case have also appeared [4].

We also point the reader to the expository article [134], to the excellent monographs [110, 123, 132] and the survey [114].

2 Interval arithmetics

2.1 Basic arithmetic

Representing an abstract concept such as a real number by a finite number of zeros and ones has the advantage that the calculations are finite and the framework is practical. The drawback is naturally that only finitely many numbers can be written in this way (although their count is of the same order of magnitude as the age of the universe in seconds) and inaccuracies might arise while performing mathematical operations. We will now discuss the basics of interval arithmetics.

Let \(\mathbb {F}\) be the set of numbers representable by a computer. We will work with the set of representable closed intervals \(\mathbb {IR} = \{[{\underline{a}},{\overline{a}}] | \quad {\underline{a}} \le {\overline{a}}, \quad {\underline{a}},{\overline{a}} \in \mathbb {F}\}\). We will refer to an element \([a] \in \mathbb {IR}\) by either [a] or \([{\underline{a}},{\overline{a}}]\), the latter whenever we want to stress the importance of the endpoints of the interval. We can now define an arithmetic by the set-theoretic definition

$$\begin{aligned}{}[x] \star [y] = \{x \star y | \quad x \in [x], y \in [y]\}, \end{aligned}$$
(2.1)

for any operation \(\star \in \{+,-,\times ,\div \}\). We can easily define them by the following equations:

$$\begin{aligned}{}[x]+[y]&= [{\underline{x}}+{\underline{y}},{\overline{x}}+{\overline{y}}] \\ [x]-[y]&= [{\underline{x}}-{\overline{y}},{\overline{x}}-{\underline{y}}] \\ [x]\times [y]&= [\min \{{\underline{x}}{\underline{y}},{\underline{x}}{\overline{y}},{\overline{x}}{\underline{y}},{\overline{x}}{\overline{y}}\}, \max \{{\underline{x}}{\underline{y}},{\underline{x}}{\overline{y}},{\overline{x}}{\underline{y}},{\overline{x}}{\overline{y}}\}] \\ [x] \div [y]&= [x] \times \left[ \frac{1}{{\overline{y}}},\frac{1}{{\underline{y}}}\right] , \text { whenever } 0 \not \in [y]. \end{aligned}$$

Note that these interval-valued operators can be extended to other expressions involving exponential, trigonometric, inverse trigonometric functions, etc. This derivation is purely theoretical, and we should keep in mind that, if carried out on a computer, the result of an operation has to be rounded down or up according to whether we are calculating the left or the right endpoint, so that the true result is enclosed in the produced interval. The main feature of the arithmetic is that if \(x \in [x], y \in [y]\), then necessarily \(x \star y \in [x] \star [y]\) for any operator \(\star \). This property is fundamental in order to ensure that the true result is always contained in the interval we get from the computer.
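The endpoint formulas and the outward rounding can be sketched as follows. This is a minimal illustration and not the implementation of any of the libraries cited in the text: the class name `Interval` and its methods are ours, and since plain Python cannot switch the hardware rounding mode, we assume IEEE round-to-nearest and widen each computed endpoint by one floating point number with `math.nextafter` (Python 3.9 or later). That is slightly wider than true directed rounding but still encloses the exact result, barring overflow.

```python
import math
from fractions import Fraction


class Interval:
    """Closed interval [lo, hi] with floating point endpoints."""

    def __init__(self, lo, hi=None):
        self.lo = float(lo)
        self.hi = float(lo if hi is None else hi)
        if self.lo > self.hi:
            raise ValueError("lower endpoint exceeds upper endpoint")

    @staticmethod
    def _outward(lo, hi):
        # Step each endpoint to the adjacent float, away from the interior.
        return Interval(math.nextafter(lo, -math.inf),
                        math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval._outward(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval._outward(min(p), max(p))

    def __truediv__(self, other):
        if other.lo <= 0.0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        q = [self.lo / other.lo, self.lo / other.hi,
             self.hi / other.lo, self.hi / other.hi]
        return Interval._outward(min(q), max(q))

    def contains(self, x):
        """Exact membership test, done in rational arithmetic."""
        return Fraction(self.lo) <= Fraction(x) <= Fraction(self.hi)

    def __repr__(self):
        return f"[{self.lo!r}, {self.hi!r}]"
```

For instance, `Interval(0.1) + Interval(0.2)` is a thin but non-degenerate interval that contains the exact sum of the two binary numbers stored for 0.1 and 0.2.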

We remark that this arithmetic is not distributive, but subdistributive, i.e:

$$\begin{aligned}{}[a]\times ([b]+[c])&\ne [a]\times [b] + [a]\times [c] \\ [a]\times ([b]+[c])&\subset [a]\times [b] + [a]\times [c] \end{aligned}$$

Example 2.1

If we set \([a] = [3,4]\), \([b] = [1,2]\), \([c] = [-1,1]\), then:

$$\begin{aligned}{}[a]\times ([b]+[c])&= [3,4] \times [0,3] = [0,12]\\ [a]\times [b] + [a]\times [c]&= [3,4] \times [1,2] + [3,4] \times [-1,1] = [3,8] + [-4,4] = [-1,12] \end{aligned}$$

This illustrates that the way in which operations are executed in the interval-based arithmetic matters much more than in the real-based. As an example, consider the function \(f(x) = 1-x^2\) and a domain \(D = [-1,1]\). Over the reals, we can write f as any of the following functions:

$$\begin{aligned} f_1(x)&= 1 - x^2 \\ f_2(x)&= 1 - x \cdot x \\ f_3(x)&= (1+x)\cdot (1-x) \end{aligned}$$

However, evaluating \(f_i\) over D we get the enclosures:

$$\begin{aligned} f_1(D)&= [0,1] \\ f_2(D)&= [0,2] \\ f_3(D)&= [0,4] \end{aligned}$$

We observe that although \(f_3\) is completely factored, if we expand it we get an expression of the form \(x - x\), which in interval arithmetic evaluates to an interval whose width is twice the width of the domain on which we are evaluating the expression. This is a high price to pay compared with the degenerate interval [0, 0], which is what the same expression equals over the reals.
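The enclosures in Example 2.1 and for \(f_1, f_2, f_3\) can be checked mechanically. The sketch below uses exact rational endpoints, so no rounding is involved; the class `QInterval`, the helper `_iv` and the method `sqr` (the range of \(x^2\), which treats both factors as the same variable) are our own illustrative names.

```python
from fractions import Fraction


def _iv(v):
    """Promote a plain number to a degenerate interval."""
    return v if isinstance(v, QInterval) else QInterval(v, v)


class QInterval:
    """Closed interval with exact rational endpoints."""

    def __init__(self, lo, hi):
        self.lo, self.hi = Fraction(lo), Fraction(hi)

    def __eq__(self, other):
        other = _iv(other)
        return (self.lo, self.hi) == (other.lo, other.hi)

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

    def __add__(self, other):
        other = _iv(other)
        return QInterval(self.lo + other.lo, self.hi + other.hi)

    __radd__ = __add__

    def __sub__(self, other):
        other = _iv(other)
        return QInterval(self.lo - other.hi, self.hi - other.lo)

    def __rsub__(self, other):
        return _iv(other) - self

    def __mul__(self, other):
        other = _iv(other)
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return QInterval(min(p), max(p))

    __rmul__ = __mul__

    def sqr(self):
        """Range of x**2 for x in the interval (one variable, not two)."""
        lo, hi = sorted((self.lo ** 2, self.hi ** 2))
        if self.lo <= 0 <= self.hi:
            lo = Fraction(0)
        return QInterval(lo, hi)


if __name__ == "__main__":
    a, b, c = QInterval(3, 4), QInterval(1, 2), QInterval(-1, 1)
    print(a * (b + c), a * b + a * c)      # subdistributivity
    D = QInterval(-1, 1)
    print(1 - D.sqr(), 1 - D * D, (1 + D) * (1 - D))
```

The three expressions for f differ only in how often the variable appears, and each extra occurrence is treated as an independent copy of D.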

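For readability purposes, instead of writing the intervals as, for instance, [123456, 123789], we will sometimes instead refer to them as \(123^{789}_{456}\), with the trailing digits of the upper endpoint as superscript and those of the lower endpoint as subscript.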

2.2 Automatic differentiation

One of the main tasks in which we will need the help of a computer is the evaluation of a massive number of functions and their derivatives, up to a given order, at several points and intervals. One could first think of differentiating the expressions symbolically. However, we do not need the expression of the derivative, just its value at given points. This, together with the fact that the number of terms of the derivative might grow exponentially with the number of derivatives taken, makes the use of symbolic calculus impractical. Instead of calculating the expression of every derivative, we will use the so-called automatic differentiation methods. Suppose f(x) is a sufficiently regular function and let \(x_0\) be the point (or interval) at which we want to evaluate f and its derivatives. We define

$$\begin{aligned} (f)_{0}&= f(x_{0}) \\ (f)_{k}&= \frac{1}{k!} \frac{d^{k}}{dx^{k}}f(x_0), \quad k = 1,2,\ldots ,N, \end{aligned}$$

where N stands for the maximum order of the derivatives we want to evaluate. We can think of \((f)_{k}\) as the coefficients of the Taylor series of f around \(x_0\) up to order N. We now show how to compute these coefficients for some of the functions that will appear in our programs. The generalization to the missing functions is immediate. Moreover, it is possible to derive similar formulas for any solution of a differential equation (e.g. Bessel functions).

$$\begin{aligned} (u\pm v)_{k}&= (u)_{k} \pm (v)_{k} \\ (u \cdot v)_{k}&= \sum _{j=0}^{k}(u)_{j}(v)_{k-j} \\ (u \div v)_{k}&= \frac{1}{(v)_{0}} \left( (u)_{k} - \sum _{j=1}^{k}(v)_{j}(u \div v)_{k-j}\right) \\ (\sin (u))_{k}&= \frac{1}{k}\sum _{j=0}^{k-1}(j+1)(\cos (u))_{k-1-j}(u)_{j+1}, \quad k \ge 1 \\ (\cos (u))_{k}&= -\frac{1}{k}\sum _{j=0}^{k-1}(j+1)(\sin (u))_{k-1-j}(u)_{j+1}, \quad k \ge 1 \end{aligned}$$
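These recurrences translate almost literally into code. The sketch below works with plain floating point coefficients, so it is not rigorous; replacing the floats by intervals would give enclosures instead. The function names are ours, and a function is represented by the list of its Taylor coefficients \((u)_0, \ldots , (u)_N\) at \(x_0\).

```python
import math


def taylor_mul(u, v):
    """Coefficients of u*v: the Cauchy product of the two lists."""
    return [sum(u[j] * v[k - j] for j in range(k + 1)) for k in range(len(u))]


def taylor_div(u, v):
    """Coefficients of u/v, assuming v[0] != 0."""
    q = []
    for k in range(len(u)):
        s = sum(v[j] * q[k - j] for j in range(1, k + 1))
        q.append((u[k] - s) / v[0])
    return q


def taylor_sin_cos(u):
    """Coefficients of sin(u) and cos(u), computed together."""
    s = [math.sin(u[0])]
    c = [math.cos(u[0])]
    for k in range(1, len(u)):
        s.append(sum((j + 1) * c[k - 1 - j] * u[j + 1] for j in range(k)) / k)
        c.append(-sum((j + 1) * s[k - 1 - j] * u[j + 1] for j in range(k)) / k)
    return s, c


if __name__ == "__main__":
    x0, N = 0.3, 5
    x = [x0, 1.0] + [0.0] * (N - 1)    # the identity function at x0
    s, c = taylor_sin_cos(x)
    print(taylor_div(s, c))            # Taylor coefficients of tan at x0
```

Each coefficient of order k costs O(k) operations, which is where the quadratic total cost comes from.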

Automatic differentiation has become a natural technique in the field of Dynamical Systems, since the cost of evaluating an expression up to order k is \(O(k^2)\), making it a fast and powerful tool to approximate trajectories accurately [130]. It has also been used for the computation of invariant tori and their associated invariant manifolds [85, 86] or the computation of normal forms of KAM tori [83]. For more applications in Dynamical Systems we refer the reader to the book [84]. Automatic differentiation is also an important element in the so-called Taylor models [98, 106, 115], in which functions are represented by pairs \((P,\Delta )\), where P is a polynomial and \(\Delta \) is an interval bound on the absolute value of the difference between the function and P. Nowadays, there are several packages that implement it, for example [6, 99].

2.3 Integration

In this section we will discuss the basics of rigorous integration. A few examples where rigorous integration has been used or developed are [9, 101, 103]. A more detailed version concerning singular integrals can be found in the next subsection. We will only give the details of the one-dimensional case, omitting the multidimensional one, which can be done extending the methods in a natural way.

The main problem we address here is to calculate bounds for a given integral

$$\begin{aligned} I = \int _{a}^{b} f(x)dx, \quad -\infty< a< b < \infty . \end{aligned}$$

Different strategies can be used for this purpose. For instance, we can extend the classical integration schemes:

$$\begin{aligned} I = \sum _{i=1}^{N} \int _{x_{i-1}}^{x_i}f(x)dx, \quad a = x_0< x_1< \cdots < x_{N} = b. \end{aligned}$$

In every interval, we approximate f(x) by a polynomial p(x) and an error term. We detail some typical examples in Table 1.

Table 1 Different schemes for the rigorous integration

It is now clear where the interval arithmetic takes place. In order to enclose the value of the integral, we need to compute rigorous bounds for some derivative of the function at the integration region.

Another approach consists of taking the Taylor series of the integrand up to order n as the polynomial \(p_i(x)\). Centering the Taylor series at the midpoint of the interval means that only roughly half of the terms need to be integrated (since the other half integrate to zero). Expanding around the left endpoint for simplicity, we can see that

$$\begin{aligned} \int _{a}^{b}f(x)dx&= \int _{a}^{b}\left( f(a) + (x-a)f'(a)+ \cdots + \frac{(x-a)^{n}}{n!}f^{n}(a)+\frac{(x-a)^{n+1}}{(n+1)!}f^{n+1}(\xi (x))\right) dx \\&\in \int _{a}^{b}\left( f(a) + (x-a)f'(a)+ \cdots + \frac{(x-a)^{n}}{n!}f^{n}(a)+\frac{(x-a)^{n+1}}{(n+1)!}f^{n+1}([a,b])\right) dx \\&= \underbrace{(b-a)f(a) + \frac{1}{2}(b-a)^2f'(a) + \cdots +\frac{(b-a)^{n+1}}{(n+1)!}f^{n}(a)}_{\text {Real number (thin interval)}}+\underbrace{\frac{(b-a)^{n+2}}{(n+2)!}f^{n+1}([a,b])}_{\text {Error (thick interval)}}. \end{aligned}$$

We now compare the two methods in the following examples, in which we integrate \(\int _{0}^{1}e^{x}dx\).

Example 2.2

If we take \(N = 4\) and use a trapezoidal rule, we enclose the integral in

$$\begin{aligned} \int _{0}^{1}e^{x}dx&= \frac{1}{2}\left( e^{0} + 2e^{1/4} + 2e^{1/2} + 2e^{3/4} + e^{1}\right) \frac{1}{4} - \frac{1}{12}\frac{1}{16}e^{[0,1]} \\&\qquad \in [1.72722,1.72723] - [0.0052083, 0.014158] = [1.713062,1.7220217] \end{aligned}$$

Example 2.3

$$\begin{aligned} \int _{0}^{1} e^{x} dx&\in \int _{0}^{1} \left( 1 + x + \frac{x^{2}}{2} + \frac{x^{3}}{6}e^{[0,1]}\right) dx \\&= \left. \left( x + \frac{x^{2}}{2} + \frac{x^{3}}{6}\right) \right| ^{x=1}_{x=0} + \left. [1,e]\frac{x^{4}}{24}\right| ^{x=1}_{x=0} \\&= \frac{10}{6} + \frac{1}{24}[1,e] \\&= [1.70833,1.77994] \end{aligned}$$

The exact result is \(e - 1 \approx 1.71828182846\). We can see that there is a tradeoff between function evaluations (efficiency of the scheme) and quality (precision) of the results: the first method is more accurate but requires more evaluations of the integrand, while for the second it is enough to compute the Taylor series of the integrand at a single point.
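Both enclosures of \(\int _{0}^{1}e^{x}dx\) can be reproduced with exact rational arithmetic. This is a sketch under our own choices: the function names are illustrative, and the exponential is enclosed through its Taylor polynomial plus a Lagrange remainder, using only the elementary bound \(e < 3\), so that no floating point rounding enters.

```python
from fractions import Fraction
from math import factorial


def exp_bounds(x, terms=25):
    """Rational lower and upper bounds for e**x, for rational 0 <= x <= 1."""
    x = Fraction(x)
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    partial = sum(x ** k / factorial(k) for k in range(terms + 1))
    # Lagrange remainder: 0 <= e**x - partial <= e * x**(terms+1)/(terms+1)!
    remainder = 3 * x ** (terms + 1) / factorial(terms + 1)
    return partial, partial + remainder


def trapezoid_enclosure(n=4):
    """Enclosure of the integral of e**x over [0, 1] by the trapezoidal rule."""
    h = Fraction(1, n)
    lo = hi = Fraction(0)
    for i in range(n + 1):
        weight = h / 2 if i in (0, n) else h
        f_lo, f_hi = exp_bounds(i * h)
        lo += weight * f_lo
        hi += weight * f_hi
    # Error term -(b-a) h^2/12 f''(xi), with f'' = exp in [1, e] on [0, 1].
    e_hi = exp_bounds(1)[1]
    return lo - h ** 2 / 12 * e_hi, hi - h ** 2 / 12


def taylor_enclosure(order=2):
    """Enclosure obtained by integrating the Taylor expansion at 0."""
    poly = sum(Fraction(1, factorial(k + 1)) for k in range(order + 1))
    coeff = Fraction(1, factorial(order + 2))
    e_hi = exp_bounds(1)[1]
    return poly + coeff, poly + coeff * e_hi


if __name__ == "__main__":
    print([float(v) for v in trapezoid_enclosure(4)])
    print([float(v) for v in taylor_enclosure(2)])
```

The conversion to `float` in the last two lines is only for display; the enclosures themselves are exact rational numbers.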

2.4 Singular integrals and integrals over unbounded domains

In this subsection we will discuss the computational details of the rigorous calculation of some singular integrals. In particular we will focus on the Hilbert transform, but the methods apply to any singular integral. Parts of the computation (the N and F parts) are slightly related to the Taylor models with relative remainder presented in [98]. See also the paper [17] regarding the rigorous inversion of operators involving singular integrals.

Let us suppose that we have a \(2\pi \)-periodic function f, which for simplicity we will assume to be \(C^k\) (this requirement can be relaxed). We want to calculate rigorously the Hilbert transform of f, that is

$$\begin{aligned} Hf(x) = \frac{PV}{\pi } \int _{\mathbb {T}} \frac{f(x)-f(x-y)}{2\tan \left( \frac{y}{2}\right) }dy. \end{aligned}$$

We can split the integral into

$$\begin{aligned} Hf(x)&= \frac{PV}{\pi } \int _{\mathbb {T}} \frac{f(x)-f(x-y)}{2\tan \left( \frac{y}{2}\right) }dy \\&= \frac{PV}{\pi } \int _{|y|< \varepsilon _1} \frac{f(x)-f(x-y)}{2\tan \left( \frac{y}{2}\right) }dy + \frac{PV}{\pi } \int _{\varepsilon _1 \le |y| < \pi - \varepsilon _2} \frac{f(x)-f(x-y)}{2\tan \left( \frac{y}{2}\right) }dy\\&\quad + \frac{PV}{\pi } \int _{|y| \ge \pi - \varepsilon _2} \frac{f(x)-f(x-y)}{2\tan \left( \frac{y}{2}\right) }dy \\&\equiv H^{N} f(x) + H^{C}f(x) + H^{F}f(x). \end{aligned}$$

The integration of \(H^{C}f(x)\) is easy since the integrand is smooth, and the denominator is bounded away from zero.

We now move on to the term \(H^{N}f(x)\). In this case, we perform a Taylor expansion around \(y = 0\) of both the denominator

$$\begin{aligned} 2\tan \left( \frac{y}{2}\right) = y+c(\varepsilon _1)y^{3}, \quad c(\varepsilon _1) = \text {(interval) constant} \end{aligned}$$

and the numerator

$$\begin{aligned} f(x) - f(x-y) = yf'(x) - \frac{1}{2}y^2f''(x) + \cdots + \frac{(-1)^{k+1}}{k!}y^{k}f^{k}(\eta ). \end{aligned}$$

Here \(\eta \) is an intermediate point between x and \(x-y\), and we can enclose \(f^{k}(\eta )\) in the whole interval \(f^{k}([x-\varepsilon _1,x+\varepsilon _1]) \subset f^{k}([-\pi ,\pi ])\). Finally, we can factor out y from both the numerator and the denominator and cancel it, getting

$$\begin{aligned} \frac{1}{\pi }\int _{|y| < \varepsilon _1} \frac{f'(x) - \frac{1}{2}yf''(x) + \cdots + \frac{(-1)^{k+1}}{k!}y^{k-1}f^{k}(\eta )}{1+c(\varepsilon _1)y^2}dy, \end{aligned}$$

which we could either bound or integrate explicitly since it is a smooth (interval) function and f(x) is explicit.

For \(H^{F}f(x)\) we will do the same, expanding the cotangent function to avoid division by \(\infty \):

$$\begin{aligned} \frac{1}{2}\cot \left( \frac{y}{2}\right) = -\frac{1}{4}(y-\pi ) + c(\varepsilon _2)(y-\pi )^3, \quad c(\varepsilon _2) = \text {(interval) constant} \end{aligned}$$

we obtain

$$\begin{aligned} \frac{1}{\pi }\int _{|y-\pi | < \varepsilon _2} (f(x)-f(x-y))\left( -\frac{1}{4}(y-\pi ) + c(\varepsilon _2)(y-\pi )^3\right) dy, \end{aligned}$$

which is smooth and therefore we can also integrate it as in the previous subsection.

The choice of \(\varepsilon _i\) is determined by the balance between accuracy and computation time. Most of the time, the \(\varepsilon _i\) will be taken very small and \(H^{N}f(x)\) and \(H^{F}f(x)\) will be regarded as error terms.
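The splitting into near, central and far pieces can be explored numerically before any rigorous bound is attempted. The following sketch is not rigorous: it uses a plain floating point midpoint rule, the function names and the default values of `eps1`, `eps2` and `n` are ours, and the principal value is realised by pairing the nodes \(y\) and \(-y\). With the definition of H used above, a direct computation gives \(H\sin = \cos \), which serves as a sanity check.

```python
import math


def hilbert_piece(f, x, lo, hi, n=2000):
    """Midpoint-rule approximation of
    (1/pi) * integral over lo <= |y| < hi of (f(x)-f(x-y)) / (2 tan(y/2)) dy."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        for s in (y, -y):              # symmetric nodes: principal value
            total += (f(x) - f(x - s)) / (2.0 * math.tan(s / 2.0))
    return total * h / math.pi


def hilbert_split(f, x, eps1=1e-2, eps2=1e-2):
    """Approximate values of the near, central and far pieces of Hf(x)."""
    near = hilbert_piece(f, x, 0.0, eps1)
    centre = hilbert_piece(f, x, eps1, math.pi - eps2)
    far = hilbert_piece(f, x, math.pi - eps2, math.pi)
    return near, centre, far


if __name__ == "__main__":
    x = 0.7
    near, centre, far = hilbert_split(math.sin, x)
    print(near, centre, far, near + centre + far, math.cos(x))
```

For small \(\varepsilon _i\) the near and far pieces are small compared with the central one, which is why they can be treated as error terms.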

In the case where the integration domain is unbounded (for simplicity we may assume it is \(\mathbb {R}\) and the integrand decays fast enough), there are two workarounds:

  • Perform a change of variables that maps \(\mathbb {R}\) onto a bounded domain, such as \(x = 2\tan \left( \frac{y}{2}\right) \). This change of variables is useful because the problem is mapped onto \([-\pi ,\pi ]\) and one can work with Fourier series there. The integral in the new coordinates becomes

    $$\begin{aligned} \int _{-\infty }^{\infty } f(x)dx = \int _{-\pi }^{\pi }f\left( 2\tan \left( \frac{y}{2}\right) \right) \sec \left( \frac{y}{2}\right) ^{2}dy, \end{aligned}$$

    which depending on f may potentially be singular, in which case we would apply the techniques outlined in the beginning of this subsection.

  • Choose a large enough number M and treat the contribution to the integral from \(|x| > M\) as an error. Thus:

    $$\begin{aligned} I = \int _{-\infty }^{\infty } f(x)dx = \int _{|x| \le M} f(x) dx + \int _{|x|>M} f(x) dx = I_1 + I_2. \end{aligned}$$

    The term \(I_1\) will be integrated normally. For the term \(I_2\), assuming \(|f(x)| \le \frac{C}{|x|^{k}}\) for \(|x| > M\) and some \(k > 1\), we easily obtain \(|I_2| \le \frac{2C}{k-1} \frac{1}{M^{k-1}}\). As M grows, this term goes to zero.
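The tail estimate in the second workaround is easy to check on an example. In the sketch below, the function name `tail_bound` and the test integrand \(f(x) = 1/(1+x^2)\) (for which \(C = 1\), \(k = 2\) and the exact tail is \(2\arctan (1/M)\)) are our own choices, and the arithmetic is plain floating point rather than interval arithmetic.

```python
import math


def tail_bound(C, k, M):
    """Bound 2C / ((k-1) M**(k-1)) for the integral of f over |x| > M,
    valid when |f(x)| <= C / |x|**k there and k > 1."""
    if k <= 1 or M <= 0:
        raise ValueError("need k > 1 and M > 0")
    return 2.0 * C / ((k - 1) * M ** (k - 1))


if __name__ == "__main__":
    for M in (10.0, 100.0, 1000.0):
        exact_tail = 2.0 * math.atan(1.0 / M)
        print(M, exact_tail, tail_bound(1.0, 2, M))
```

In a rigorous computation the bound would be added as an interval \([-|I_2|, |I_2|]\) to the enclosure of \(I_1\).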

3 The Muskat problem

The first problem that we will present is the so-called Muskat problem. This problem models the evolution of the interface between two different incompressible fluids with the same viscosity in a two-dimensional porous medium, and is used in the context of oil wells [111].

The setup consists of two incompressible fluids with different densities, \(\rho ^{1}\) and \(\rho ^{2}\), and the same viscosity, evolving in a porous medium with permeability \(\kappa (x)\). The velocity is given by Darcy’s law:

$$\begin{aligned} \mu \frac{v}{\kappa }=-\nabla p-g\left( \begin{array}{cc}0 \\ \rho \end{array}\right) , \end{aligned}$$
(3.1)

where \(\mu \) is the viscosity, p is the pressure and g is the acceleration due to gravity, and the incompressibility condition

$$\begin{aligned} \nabla \cdot v=0. \end{aligned}$$
(3.2)

We take \(\mu =g=1\). The fluids also satisfy the conservation of mass equation

$$\begin{aligned} \partial _t\rho +v\cdot \nabla \rho =0. \end{aligned}$$
(3.3)

We will work in different settings: the flat at infinity and horizontally periodic cases (\(\Omega = \mathbb {R}^{2}\) and \(\Omega = \mathbb {T}\times \mathbb {R}\) respectively) or the confined case (\(\Omega = \mathbb {R}\times \left( -\frac{\pi }{2},\frac{\pi }{2}\right) \)). We denote by \(\Omega ^{1}\) the region occupied by the fluid with density \(\rho ^{1}\) (the “top” fluid) and by \(\Omega ^{2}\) the region occupied by the fluid with density \(\rho ^{2} \ne \rho ^{1}\) (the “bottom” fluid). All quantities with superindex 1 (resp. 2) will refer to \(\Omega ^{1}\) (resp. \(\Omega ^{2}\)). The interface between the two fluids at any time t is a planar curve denoted by \(z(\cdot ,t)\).

In the case \(\Omega = \mathbb {R}^{2}\), one can rewrite the system (3.1)–(3.3) in terms of the curve \(z=(z^1,z^2)\), obtaining

$$\begin{aligned} \partial _{t} z(\alpha ,t) = \frac{\rho ^{2} - \rho ^{1}}{2\pi } P.V. \int _\mathbb {R} \frac{z^1(\alpha ,t) - z^1(\beta ,t)}{|z(\alpha ,t) - z(\beta ,t)|^{2}}(\partial _{\alpha }z(\alpha ,t) - \partial _{\beta }z(\beta ,t)) d\beta . \end{aligned}$$
(3.4)

In the horizontally periodic case (\(\Omega = \mathbb {T}\times \mathbb {R}\)) with \(z(x+2\pi ,t)=z(x,t)+(2\pi ,0)\), the evolution of the curve is given by the formula

$$\begin{aligned} \partial _{t} z(\alpha ,t) = \frac{\rho ^{2} - \rho ^{1}}{4\pi } \int _\mathbb {T} \frac{\sin (z^1(\alpha ,t) - z^1(\beta ,t))(\partial _{\alpha }z(\alpha ,t) - \partial _{\beta }z(\beta ,t))}{\cosh (z^{2}(\alpha ,t) - z^{2}(\beta ,t)) - \cos (z^{1}(\alpha ,t) - z^{1}(\beta ,t))} d\beta . \end{aligned}$$
(3.5)

Finally, in the confined case:

$$\begin{aligned}&\partial _tz(\alpha ,t)=\frac{\rho ^{2}-\rho ^{1}}{4\pi }\int _\mathbb {R}\frac{(\partial _{\alpha }z(\alpha ,t)-\partial _{\alpha }z(\alpha -\beta ,t))\sinh (z_1(\alpha ,t)-z_1(\alpha -\beta ,t))}{\cosh (z_1(\alpha ,t)-z_1(\alpha -\beta ,t))-\cos (z_2(\alpha ,t)-z_2(\alpha -\beta ,t))}d\beta \\&+\frac{\rho ^{2}-\rho ^{1}}{4\pi }\int _\mathbb {R}\frac{(\partial _{\alpha }z_1(\alpha ,t)-\partial _{\alpha }z_1(\alpha -\beta ,t),\partial _{\alpha }z_2(\alpha ,t)+\partial _{\alpha }z_2(\alpha -\beta ,t))\sinh (z_1(\alpha ,t)-z_1(\alpha -\beta ,t))}{\cosh (z_1(\alpha ,t)-z_1(\alpha -\beta ,t))+\cos (z_2(\alpha ,t)+z_2(\alpha -\beta ,t))}d\beta . \end{aligned}$$

We define the Rayleigh–Taylor condition

$$\begin{aligned} RT(\alpha ,t)=-(\nabla p^2(z(\alpha ,t))-\nabla p^1(z(\alpha ,t)))\cdot \partial _\alpha ^\bot z(\alpha ,t), \end{aligned}$$

which can be written in terms of the interface as

$$\begin{aligned} RT(\alpha ,t) = (\rho ^{2} - \rho ^{1})\partial _{\alpha } z^{1}(\alpha ,t). \end{aligned}$$

Linearizing around the steady state \((\alpha ,0)\), the evolution equation of a small perturbation \((0,\varepsilon f_{L}(\alpha ,t))\) satisfies at the linear level:

$$\begin{aligned} \partial _t f_{L}(\alpha ,t) = -RT^{L}(\alpha ,t) \Lambda f_{L}(\alpha ,t), \end{aligned}$$
(3.6)

where \(RT^{L}(\alpha ,t)\) is the linearized version of the Rayleigh–Taylor condition

$$\begin{aligned} RT^{L}(\alpha ,t)=g(\rho ^2-\rho ^1) \end{aligned}$$

and \(\Lambda = (-\Delta )^{\frac{1}{2}}\). Thus, Eq. (3.6) is parabolic if \(RT^{L}(\alpha ,t) > 0\). At the nonlinear level, similar estimates of the form

$$\begin{aligned} \partial _{t} \partial _{\alpha }^{k}z(\alpha ,t) = -RT(\alpha ,t) \Lambda \partial _{\alpha }^{k}z(\alpha ,t) + \text { lower order terms } \end{aligned}$$

can be derived for large enough k. It is therefore easy to see that the sign of \(RT(\alpha ,t)\) is crucial, since it governs the stability of the equation (stable for positive RT, unstable otherwise). For a fixed t, if \(RT(\alpha ,t)>0 , \;\forall \alpha \in \mathbb {R}\) we will say that the curve is in the Rayleigh–Taylor stable regime and if \(RT(\alpha ,t)<0\) for some \(\alpha \), we will say that the curve is in the Rayleigh–Taylor unstable regime. In other words, we have the correspondence:

$$\begin{aligned} z(\alpha ,t) \text { can be parametrized as a graph} \Leftrightarrow z(\alpha ,t) \text { is in the R-T stable regime} \end{aligned}$$

We will also say that if the curve \(z(\alpha ,t)\) changes from graph to non-graph or vice versa, it undergoes a stability shift.
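To see why the sign of the Rayleigh–Taylor condition matters already at the linear level, one can solve (3.6) explicitly. The following short derivation assumes that \(RT^{L}\) is constant, as in the formula above, and writes \(\widehat{f_{L}}(\xi ,t)\) for the Fourier transform of \(f_{L}\) in \(\alpha \) (the frequency variable \(\xi \) is our notation). Since \(\Lambda \) acts as multiplication by \(|\xi |\) in Fourier space,

$$\begin{aligned} \partial _t \widehat{f_{L}}(\xi ,t)&= -RT^{L}\,|\xi |\,\widehat{f_{L}}(\xi ,t), \\ \widehat{f_{L}}(\xi ,t)&= e^{-RT^{L}|\xi |t}\,\widehat{f_{L}}(\xi ,0). \end{aligned}$$

For \(RT^{L} > 0\) every mode is damped at a rate proportional to its frequency, which is the parabolic behaviour. For \(RT^{L} < 0\) the mode with frequency \(\xi \) grows like \(e^{|RT^{L}||\xi |t}\), so high frequencies are amplified arbitrarily fast.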

3.1 Brief history of the problem

The Muskat problem has been studied in many works. A proof of local existence of classical solutions in the Rayleigh–Taylor stable regime and a maximum principle for \(\Vert \partial _xf(\cdot ,t)\Vert _{L^\infty }\) can be found in [39]. See also [1]. Ill-posedness in the unstable regime appears in [38].

Moreover, the authors in [38] showed that if \(\Vert \partial _{x} f_0\Vert _{L^\infty }\,{<}\,1\), then \(\Vert \partial _{x} f(\cdot ,t)\Vert _{L^\infty } \le \Vert \partial _{x} f_0\Vert _{L^\infty }\) for all \(t>0\). Further work has shown instant analyticity and existence of finite time turning [18]: in other words, the curve ceases to be a graph in finite time and the Rayleigh–Taylor condition changes sign to negative somewhere along the curve. The gap between these two results (i.e., the question whether the constant 1 is sharp or not for guaranteeing global existence) is still an open question, and there is numerical evidence of data with \(\Vert \partial _x f_{0}\Vert _{L^{\infty }} = 50\) which turns over [41].

Given the parabolic character of the equation, it is natural to expect global existence, at least for small initial data. The first proof for small initial data was carried out in [129] in the case where the fluids have different viscosities and the same densities (see [38] for the setting of the present paper, with different densities and the same viscosities, and also [27] for the general case). The regularity requirement on the initial data has subsequently been lowered: see [7, 12, 28, 29, 30, 44, 56, 108] for recent developments on global existence in different spaces. A blow-up criterion was found in [30]. For large time estimates see [75, 116]. In the case where surface tension is taken into account, global existence was shown in [60, 70].

Contrary to the previous intuition, there is also formation of singularities: there are initial data which start smooth, but once the Rayleigh–Taylor condition is not satisfied (i.e. they have ceased to be a graph), the smoothness of the curve may break down in finite time [15]. Another possibility is the appearance of a finite time self-intersection of the free boundary, either at a point (“splash” singularity) or along an arc (“splat” singularity). In the stable density jump case these cannot happen [40, 76]. However, in the one phase case there are finite time splash singularities [16] but no splat singularities are possible [45].

More general models, which take into account finite depth or non-constant permeability, and which also exhibit (single) stability shift were studied in [8, 43, 46, 79, 80, 118].

For a bigger and more comprehensive list of references we refer the reader to the two surveys [19] and [74].

3.2 Theorems

We now present a few theorems which can be proved by means of computer-assisted estimates, following the abstract idea from Example 1.1. The intuition behind these results is driven by numerical simulations.

The first result compares the confined and flat at infinity regimes [79]:

Theorem 3.1

There exists a family of analytic curves \(z^{0}(\alpha ) = (z_1^{0}(\alpha ),z_2^{0}(\alpha ))\), flat at infinity, for which there exists a finite time T such that the solution to the confined Muskat problem develops a stability shift before \(t=T\) and the solution to the non-confined one does not.

Specifically, we prove that the solution with initial data \(z^{0}(\alpha )\) can be parametrized as a graph for all \(t < T\) in the flat at infinity setting and cannot be in the confined one.

We now outline the reduction to a finite set of conditions.

Reduction of Theorem 3.1

We want to show that \(RT(0,t) \sim At + O(t^2)\) for small, positive t, where \(A < 0\) in the confined case and \(A > 0\) in the flat at infinity case, which implies the theorem. After some calculations one obtains:

$$\begin{aligned} A_{confined}= & {} 2 \partial _{\alpha } z_2(0)\int _0^\infty \partial _\alpha z_1(\eta )\sinh (z_1(\eta ))\sin (z_2(\eta ))\\&\times \bigg (\frac{1}{(\cosh (z_1(\eta ))-\cos (z_2(\eta )))^2} +\frac{1}{(\cosh (z_1(\eta ))+\cos (z_2(\eta )))^2}\bigg )d\eta . \end{aligned}$$

With the same approach, for the unconfined case the expression is

$$\begin{aligned} A_{flat}=8 \partial _{\alpha } z_2(0)\int _0^\infty \frac{\partial _\alpha z_1(\eta )z_1(\eta )z_2(\eta )}{((z_1(\eta ))^2+(z_2(\eta ))^2)^2}d\eta . \end{aligned}$$

Thus, the theorem will be proved if we manage to validate the following open conditions:

$$\begin{aligned} A_{confined}<0,\quad A_{flat}>0. \end{aligned}$$
(3.7)

\(\square \)

We can rigorously validate them for the following data (see Fig.  1):

$$\begin{aligned} z_1(\alpha )&= \alpha - \sin (\alpha )e^{-B\alpha ^{2}}, \quad B = 10^{-4} \\ z_2(\alpha )&= \left\{ \begin{array}{ll} \displaystyle \frac{\sin (3\alpha )}{3} &{}\quad \displaystyle \text { if } 0 \le \alpha \le \frac{\pi }{3} \\ \displaystyle - \alpha + \frac{\pi }{3} &{}\quad \displaystyle \text { if } \frac{\pi }{3} \le \alpha \le \frac{\pi }{2} \\ \displaystyle \alpha - \frac{2\pi }{3} &{}\quad \displaystyle \text { if } \frac{\pi }{2} \le \alpha \le \frac{2\pi }{3} \\ \displaystyle 0 &{}\quad \displaystyle \text { if } \frac{2\pi }{3} \le \alpha , \\ \end{array} \right. \end{aligned}$$

where \(z_2\) is extended such that it is an odd function. In Fig. 1 (inset), we plot the normal velocity around the vertical tangent for the two scenarios (confined and flat at infinity), both scaled by a factor 1/100. We can observe that the velocity denoted by squares, which corresponds to the confined case, will make the curve develop a turning singularity, whereas the dotted one (non-confined case) will force the curve to stay in the stable regime.

Fig. 1
figure 1

The curve in Theorem 3.1. Inset: close-up around zero, solid: initial condition, dotted: normal component of the velocity for the flat at infinity case, squares: normal component of the velocity for the confined case
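Validating an open condition such as (3.7) amounts to enclosing an integral in an interval that lies entirely on one side of zero. The following is a minimal sketch of that idea, not the code used in [79]: the `Interval` type, the toy integrand `f` and the helper names are all hypothetical, and plain `double` arithmetic is used without directed rounding, so the enclosure is illustrative rather than rigorous.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Closed interval [lo, hi]. ASSUMPTION: plain double arithmetic without
// directed rounding, so this only illustrates the idea of an enclosure.
struct Interval {
    double lo, hi;
    Interval(double l, double h) : lo(l), hi(h) { assert(l <= h); }
};

Interval operator+(Interval a, Interval b) { return {a.lo + b.lo, a.hi + b.hi}; }

Interval operator*(Interval a, Interval b) {
    double p[] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
    return {*std::min_element(p, p + 4), *std::max_element(p, p + 4)};
}

// Hypothetical toy integrand f(x) = exp(-x) - 1/2. Since exp(-x) is
// decreasing, its range over [x.lo, x.hi] is [exp(-x.hi), exp(-x.lo)].
Interval f(Interval x) {
    return Interval(std::exp(-x.hi), std::exp(-x.lo)) + Interval(-0.5, -0.5);
}

// Enclose the integral of f over [a, b]: on each of the n cells the
// integral lies in (cell width) * (range of f over the cell).
Interval integrate(double a, double b, int n) {
    Interval sum(0.0, 0.0);
    double h = (b - a) / n;
    for (int i = 0; i < n; ++i) {
        Interval cell(a + i * h, a + (i + 1) * h);
        sum = sum + Interval(h, h) * f(cell);
    }
    return sum;
}

// An open condition is validated when the whole enclosure has one sign.
bool is_positive(Interval x) { return x.lo > 0.0; }
bool is_negative(Interval x) { return x.hi < 0.0; }
```

With a single cell, `integrate(0.0, 1.0, 1)` straddles zero and is inconclusive, while `integrate(0.0, 1.0, 100)` is strictly positive. A rigorous version would replace `double` by an interval type with outward rounding, and would treat the infinite integration domain and the singular integrands separately.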

In fact, one can find more striking behaviours if the expansion of \(RT(\alpha ,t)\) is done at higher order (at the price of higher complexity of the calculations). The following Theorem was proved in [42]:

Theorem 3.2

There exist \(T>\gamma >0\) and a spatially analytic solution z to (3.5) on the time interval \([-T,T]\) such that \(z(\cdot ,t)\) is a graph of a smooth function of \(\alpha \) when \(|t|\in [T-\gamma ,T]\) (i.e., z is in the stable regime near \(t=\pm T\)) but \(z(\cdot ,t)\) is not a graph of a function of x when \(|t|\le \gamma \) (i.e., z is in the unstable regime near \(t=0\)).

In other words, there exist solutions of (3.5) that make the transition stable \(\rightarrow \) unstable \(\rightarrow \) stable.

The intuition behind this result comes from the numerical experiments started in [41], where it was proved that there exist solutions exhibiting the unstable \(\rightarrow \) stable \(\rightarrow \) unstable transition. These suggested the existence of curves which are (barely) in the unstable regime, and such that the evolution both forward and backward in time transports them into the stable regime. (We note that neither the velocity nor any other quantity was observed to become degenerate in these experiments.) We remark that this behaviour is purely nonlinear, so nonlinear effects may dominate the linear ones under certain conditions.

Reduction of Theorem 3.2

Let \(\varepsilon \ge 0\) and consider the initial family of curves \(z_{\varepsilon }(\alpha ,0)=(z^{1}_{\varepsilon }(\alpha ,0),z^{2}_{\varepsilon }(\alpha ,0))\), with

$$\begin{aligned} z^{1}_{\varepsilon }(\alpha ,0)&= \alpha - \sin (\alpha ) - \varepsilon \sin (\alpha ), \\ z^{2}_{\varepsilon }(\alpha ,0)&= A(\varepsilon ) \sin (2\alpha ). \end{aligned}$$

The goal is to show that this family of solutions satisfies \(RT(0,t) \sim -\varepsilon + Ct^2 + O(t^3)\). See Fig. 2.

Fig. 2
figure 2

\(z_{\varepsilon }(\alpha ,0)\) from Theorem 3.2 with \(A(\varepsilon ) = 1.08050\). Inset: closeup around \(x = 0\). Thick curve: \(\varepsilon = 10^{-6}\), thin curve: \(\varepsilon = 0\). We remark that both curves are indistinguishable at the larger scale

This is done in two steps. The first one is to choose \(A(\varepsilon )\) accordingly: one can prove that for any \(\varepsilon \in [0,10^{-6}]\), there exists \(A(\varepsilon )\in (1.08050, 1.08055)\) such that if \(z_\varepsilon \) solves (3.5) with initial data \(z_{\varepsilon }(\alpha ,0)\), then

$$\begin{aligned} \partial _{t} RT(0,0) = 0. \end{aligned}$$

The second one is to show that there exist \(T > 0,C \ge 1\), independent of \(\varepsilon \), such that for any \(\varepsilon \in [0,10^{-6}]\) and \(A(\varepsilon )\) chosen before, there is a unique analytic solution \(z_{\varepsilon }\) of (3.5) on the time interval \((-T,T)\) with initial data \(z_{\varepsilon }(\alpha ,0)\), and it satisfies

$$\begin{aligned} \partial _{tt} RT(0,0)\ge 30. \end{aligned}$$
(3.8)

The first step is accomplished by calculating \(\partial _{t} RT(0,0)\) for \(A(\varepsilon ) = 1.08050\) and for \(A(\varepsilon ) = 1.08055\) and checking that one is negative and the other is positive, for each \(\varepsilon \in [0,10^{-6}]\). The second step follows by taking \(A(\varepsilon ) = [1.08050,1.08055]\) (the full interval, since we do not know \(A(\varepsilon )\) explicitly) and propagating this interval in the relevant computations. The drawback of this method is that \(\partial _{tt} RT(0,0)\) consists of tens of terms of the type

$$\begin{aligned} B_{11}(\alpha )= & {} -\int _{\mathbb {T}} \int _{\mathbb {T}} \frac{\sin (z^1(\alpha ) - z^1(\alpha -y))(z^{1}_{\alpha }(\alpha ) - z^{1}_{\alpha }(\alpha -y))^{2}}{\cosh (z^2(\alpha ) - z^2(\alpha -y)) - \cos (z^1(\alpha ) - z^1(\alpha -y))} \\&\times \left( \frac{\sin (z^1(\alpha ) - z^1(\alpha -z))(z^{1}_{\alpha }(\alpha ) - z^{1}_{\alpha }(\alpha -z))}{\cosh (z^2(\alpha ) - z^2(\alpha -z)) - \cos (z^1(\alpha ) - z^1(\alpha -z))}\right. \\&\left. - \frac{\sin (z^1(\alpha -y) - z^1(\alpha -y-z))(z^{1}_{\alpha }(\alpha -y) - z^{1}_{\alpha }(\alpha -y-z))}{\cosh (z^2(\alpha -y) - z^2(\alpha -y-z)) - \cos (z^1(\alpha -y) - z^1(\alpha -y-z))} \right) dy dz \end{aligned}$$

which are 2-dimensional integrals containing a singularity. To overcome this issue, the schemes outlined in Sect. 2.4 for the 1D case need to be extended to 2D. \(\square \)
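To see why the two steps suffice, here is a sketch of the underlying Taylor argument. It assumes the normalization \(RT(0,0) = -\varepsilon \) suggested by the expansion above, and it introduces a hypothetical constant M (not taken from [42]) that bounds \(|\partial _{ttt} RT(0,t)|\) on \((-T,T)\) uniformly in \(\varepsilon \). For some s between 0 and t,

$$\begin{aligned} RT(0,t)&= RT(0,0) + t\,\partial _{t} RT(0,0) + \frac{t^{2}}{2}\partial _{tt} RT(0,0) + \frac{t^{3}}{6}\partial _{ttt} RT(0,s) \\&\ge -\varepsilon + 15 t^{2} - \frac{M}{6}|t|^{3} \ge -\varepsilon + \frac{15}{2}t^{2} \quad \text {for } |t| \le \frac{45}{M},\ |t| < T. \end{aligned}$$

Under these assumptions, \(RT(0,0) = -\varepsilon < 0\) for \(\varepsilon > 0\) (unstable sign at \(t = 0\)), while \(RT(0,t) > 0\) as soon as \(|t| > \sqrt{2\varepsilon /15}\) within the range above, which is non-empty once \(\varepsilon \) is small enough.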

Corollary 3.3

Using the same techniques as in [41], we can construct solutions that change stability 4 times according to the transition unstable \(\rightarrow \) stable \(\rightarrow \) unstable \(\rightarrow \) stable \(\rightarrow \) unstable.

The last theorem we present applies to a model where we also take into account a permeability jump. This problem is important in the context of geothermal reservoirs [25], where it could represent different types of rock layers (impermeable, permeable), below which a heat source (magma) is located. See Fig. 3 for a depiction of the setting.

Fig. 3
figure 3

Situation of the different fluids and permeabilities. We have 3 regions with parameters \((\rho _1, \kappa _1), (\rho _2,\kappa _1),(\rho _2,\kappa _2)\) separated by two boundaries \(z(\alpha ,t)\) and \((\alpha ,-h_2)\) in a confined medium. The first one is a free boundary and the second is fixed

In that case, the evolution equation for the interface \(z(\alpha ,t)\) is more complicated and given by:

$$\begin{aligned} \partial _tz(\alpha )=&\,\bar{\rho }\text {P.V.}\int _\mathbb {R}\frac{(\partial _{\alpha }z(\alpha )-\partial _{\alpha }z(\beta ))\sinh (z_1(\alpha )-z_1(\beta ))}{\cosh (z_1(\alpha )-z_1(\beta ))-\cos (z_2(\alpha )-z_2(\beta ))}d\beta \nonumber \\&+{\bar{\rho }}\text {P.V.}\int _\mathbb {R}\frac{(\partial _{\alpha }z_1(\alpha )-\partial _{\alpha }z_1(\beta ),\partial _{\alpha }z_2(\alpha )+\partial _{\alpha }z_2(\beta ))\sinh (z_1(\alpha )-z_1(\beta ))}{\cosh (z_1(\alpha )-z_1(\beta ))+\cos (z_2(\alpha )+z_2(\beta ))}d\beta \nonumber \\&+\frac{1}{4\pi }\text {P.V.}\int _\mathbb {R}\varpi _2(\beta )BS(z_1(\alpha ),z_2(\alpha ),\beta ,-h_2)d\beta \nonumber \\&+\frac{\partial _{\alpha }z(\alpha )}{4\pi }\text {P.V.}\int _\mathbb {R}\varpi _2(\beta )\frac{\sin (z_2(\alpha )+h_2)}{\cosh (z_1(\alpha )-\beta )-\cos (z_2(\alpha )+h_2)}d\beta \nonumber \\&+\frac{\partial _{\alpha }z(\alpha )}{4\pi }\text {P.V.}\int _\mathbb {R}\varpi _2(\beta )\frac{\sin (z_2(\alpha )-h_2)}{\cosh (z_1(\alpha )-\beta )+\cos (z_2(\alpha )-h_2)}d\beta . \end{aligned}$$
(3.9)

with

$$\begin{aligned} \varpi _2(\alpha )=&-2\mathcal {K}BR(\varpi _1,z)h(\alpha )\cdot (1,0)+\frac{2\mathcal {K}^2}{2\pi }BR(\varpi _1,z)h(\alpha )\cdot (1,0)*G_{h_2,\mathcal {K}}\nonumber \\ =&\, 2\mathcal {K}{\bar{\rho }}\left[ \text {P.V.}\int _\mathbb {R}\partial _{\alpha }z_2(\beta )\frac{\sin (h_2+z_2(\beta ))}{\cosh (\alpha -z_1(\beta ))-\cos (h_2+z_2(\beta ))}d\beta \nonumber \right. \\&-\text {P.V.}\int _\mathbb {R}\partial _{\alpha }z_2(\beta )\frac{\sin (-h_2+z_2(\beta ))}{\cosh (\alpha -z_1(\beta ))+\cos (-h_2+z_2(\beta ))}d\beta \nonumber \\&-\frac{\mathcal {K}}{2\pi }G_{h_2,\mathcal {K}}*\text {P.V.}\int _\mathbb {R}\frac{\partial _{\alpha }z_2(\beta )\sin (h_2+z_2(\beta ))}{\cosh (\alpha -z_1(\beta ))-\cos (h_2+z_2(\beta ))}d\beta \nonumber \\&+\left. \frac{\mathcal {K}}{2\pi }G_{h_2,\mathcal {K}}*\text {P.V.}\int _\mathbb {R}\frac{\partial _{\alpha }z_2(\beta )\sin (-h_2+z_2(\beta ))}{\cosh (\alpha -z_1(\beta ))+\cos (-h_2+z_2(\beta ))}d\beta \right] , \end{aligned}$$
(3.10)

and

$$\begin{aligned} G_{h_2,\mathcal {K}}(\xi )=\int _\mathbb {R}\frac{\cos (y\xi )\sinh (2h_2 y)}{\sinh (\pi y)+\mathcal {K}\sinh (2h_2 y)}dy, \quad \mathcal {K}=\frac{\kappa ^1-\kappa ^2}{\kappa ^1+\kappa ^2}, \quad {\bar{\rho }}=\frac{\kappa ^1(\rho ^2-\rho ^1)}{4\pi }. \end{aligned}$$

The goal is to illustrate the different behaviours that may arise by looking at the short-time evolution of a family of initial data, depending on the height of the permeability jump and the magnitude of the permeabilities. This is shown in the bifurcation diagram in Fig. 4, where we plot whether for short time, the curve will shift stability or not.

Theorem 3.4

There exists a family of analytic initial data \(z(\alpha ,h_2) = (z_1(\alpha ,h_2),z_2(\alpha ,h_2))\), depending on the height at which the permeability jump is located, such that the corresponding solution to the confined, inhomogeneous Muskat problem (3.9) and (3.10) satisfies:

  (a)

    1. For all \(0.25< h_2 < h_2^{ntu} = 0.648\), the curve will not shift independently of \(\mathcal {K}\).

    2. For all \( 0.676< h_2 < 0.686\), the permeabilities help the shift.

    3. For all \( 0.715< h_2 < 0.738\), the permeabilities prevent the shift.

    4. For all \( 0.77 = h_2^{tu}< h_2 < 1.25\), the curve will shift independently of \(\mathcal {K}\).

  (b) There exists a \(C^1\) curve \((h_2,\mathcal {K}(h_2))\), located in \([0.648,0.77] \times (-1,1)\), such that for every \(h_2\) for which the curve is defined, for every \(\mathcal {K}<\mathcal {K}(h_2)\) the curve does not turn and for every \(\mathcal {K}>\mathcal {K}(h_2)\) the curve turns.

Reduction of Theorem 3.4(a)

We now calculate \(\partial _{t} RT(0,0)\). The resulting expression is

$$\begin{aligned} \partial _{t} RT(0,0)=C(I_1+I_2), \end{aligned}$$

for some \(C > 0\), where

$$\begin{aligned} I_1=2\partial _{\alpha }z_2(0)\int _0^\infty \frac{\partial _{\alpha }z_1(\beta )\sinh (z_1(\beta ))\sin (z_2(\beta ))}{\left( \cosh (z_1(\beta ))-\cos (z_2(\beta ))\right) ^2}+\frac{\partial _{\alpha }z_1(\beta )\sinh (z_1(\beta ))\sin (z_2(\beta ))}{\left( \cosh (z_1(\beta ))+\cos (z_2(\beta ))\right) ^2}d\beta , \end{aligned}$$

and

$$\begin{aligned} I_2= & {} 4\partial _{\alpha }z_2(0)\mathcal {K}\int _0^\infty \int _0^\infty \frac{\partial _{\alpha }z_2(\gamma )\cos (z_1(\gamma )y)}{(\sinh (\pi y)+\mathcal {K}\sinh (2h_2 y))\cosh \left( y\frac{\pi }{2}\right) }\\&\times \left( 2y\cosh \left( \frac{y\pi }{2}- y h_2\right) \cosh \left( \frac{y \pi }{2}\right) -\frac{2\sinh \left( y h_2\right) }{\tan (h_2)}\right) \\&\times \cosh \left( y z_2(\gamma )\right) \cosh \left( y\left( \frac{\pi }{2}-h_2\right) \right) d\gamma dy. \end{aligned}$$

\(\square \)

The initial condition family we used for the bifurcation diagram was

$$\begin{aligned} z_1(\alpha )&= \alpha - \sin (\alpha )e^{-B \alpha ^{2}}, \quad B = 10^{-4} \nonumber \\ z_2(\alpha )&= h_2\frac{3}{\pi }\left( \frac{\sin (3\alpha )}{3}- \frac{\sin (\alpha )}{2.5}\left( e^{-(\alpha +2)^2}+e^{-(\alpha -2)^2}\right) \right) 1_{\{|\alpha | \le \pi \}}. \end{aligned}$$
(3.11)

We computed the bifurcation diagram depicted in Fig. 4. We could decide the question of short-time stability shifting for \(97.14\%\) of the parameter space: \(53.23\%\) of the space turned (red) and \(43.91\%\) did not turn (yellow). The remaining \(2.86\%\) is painted in white.

We proceeded as follows: for each region in parameter space, we computed an enclosure of \(\partial _t RT(0,0)\) for all values in that region. If we could establish a sign, we painted the region in the corresponding color. If not, we subdivided the region and recomputed, up to a certain maximum number of subdivisions. We remark that because the actual value is zero or very close to zero in some regions, the enclosures were not conclusive there, and more precision is required for those.
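The subdivision procedure can be sketched as follows. This is an illustration under stated assumptions and not the code behind Fig. 4: `enclose_sign` is a hypothetical stand-in that uses the toy function \(g(h,k) = h + k - 1\) (whose range over a box is known exactly) in place of an interval enclosure of \(\partial _t RT(0,0)\), and all type and function names are invented for this example.

```cpp
#include <cassert>

// A box [h_lo, h_hi] x [k_lo, k_hi] in a two-parameter space.
struct Box { double h_lo, h_hi, k_lo, k_hi; };

enum class Sign { Negative, Positive, Unknown };

// Areas of the regions classified so far.
struct Tally { double negative = 0.0, positive = 0.0, unknown = 0.0; };

// ASSUMPTION: toy replacement for a rigorous enclosure. For
// g(h, k) = h + k - 1 the range over a box is [lo, hi] below.
// Plain doubles, no directed rounding: illustration only.
Sign enclose_sign(const Box& b) {
    double lo = b.h_lo + b.k_lo - 1.0;
    double hi = b.h_hi + b.k_hi - 1.0;
    if (hi < 0.0) return Sign::Negative;
    if (lo > 0.0) return Sign::Positive;
    return Sign::Unknown;  // enclosure contains zero: inconclusive
}

// Classify a box; if the sign is inconclusive, split the box into four
// and recurse until max_depth subdivisions have been used.
void classify(const Box& b, int depth, int max_depth, Tally& t) {
    assert(depth <= max_depth);
    double area = (b.h_hi - b.h_lo) * (b.k_hi - b.k_lo);
    Sign s = enclose_sign(b);
    if (s == Sign::Negative) { t.negative += area; return; }
    if (s == Sign::Positive) { t.positive += area; return; }
    if (depth == max_depth) { t.unknown += area; return; }
    double hm = 0.5 * (b.h_lo + b.h_hi);
    double km = 0.5 * (b.k_lo + b.k_hi);
    classify({b.h_lo, hm, b.k_lo, km}, depth + 1, max_depth, t);
    classify({hm, b.h_hi, b.k_lo, km}, depth + 1, max_depth, t);
    classify({b.h_lo, hm, km, b.k_hi}, depth + 1, max_depth, t);
    classify({hm, b.h_hi, km, b.k_hi}, depth + 1, max_depth, t);
}
```

Calling `classify({0.0, 1.0, 0.0, 1.0}, 0, 6, tally)` on the unit square leaves undetermined only the small boxes that touch the zero set \(h + k = 1\). This mirrors the white region in the diagram: the undetermined area shrinks as the maximum depth grows, but never disappears around the curve where the quantity vanishes.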

Fig. 4
figure 4

Bifurcation diagram corresponding to the phenomenon of stability shift for the initial condition given by the family of curves (3.11). Yellow (lighter color): no shift, red (darker color): shift. The boundary separating the two colors is smooth and can be parametrized as \((h_2,\mathcal {K}(h_2))\)

Reduction of Theorem 3.4(b)

We want to invoke the Implicit Function Theorem. Thus, we have to check that

$$\begin{aligned} \frac{d}{d\mathcal {K}}\partial _tRT(0,0)\ne 0\text { for points }(h_2,\mathcal {K}) \text { such that }\partial _tRT(0,0)=0. \end{aligned}$$

In particular, we have to check the previous condition in an open set containing the white region in Fig. 4. We compute

$$\begin{aligned} DI_2\equiv & {} \frac{d}{d\mathcal {K}}\partial _tRT(0,0)=4\partial _{\alpha }z_2(0)\int _0^\infty \int _0^\infty \frac{\sinh (\pi y)\partial _{\alpha }z_2(\gamma )\cos (z_1(\gamma )y)}{(\sinh (\pi y)+\mathcal {K}\sinh (2h_2 y))^2\cosh \left( y\frac{\pi }{2}\right) }\\&\times \left( 2y\cosh \left( \frac{y\pi }{2}- y h_2\right) \cosh \left( \frac{y \pi }{2}\right) -\frac{2\sinh \left( y h_2\right) }{\tan (h_2)}\right) \\&\times \cosh \left( y z_2(\gamma )\right) \cosh \left( y\left( \frac{\pi }{2}-h_2\right) \right) d\gamma dy. \end{aligned}$$

We then show that \(DI_2\) is non-zero throughout that set. \(\square \)
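The integrand of \(DI_2\) can be checked by hand: \(I_1\) does not depend on \(\mathcal {K}\), and \(\mathcal {K}\) enters \(I_2\) only through the factor \(\mathcal {K}/(\sinh (\pi y)+\mathcal {K}\sinh (2h_2 y))\), so the quotient rule gives

$$\begin{aligned} \frac{d}{d\mathcal {K}}\left( \frac{\mathcal {K}}{\sinh (\pi y)+\mathcal {K}\sinh (2h_2 y)}\right) = \frac{\sinh (\pi y)+\mathcal {K}\sinh (2h_2 y)-\mathcal {K}\sinh (2h_2 y)}{(\sinh (\pi y)+\mathcal {K}\sinh (2h_2 y))^{2}} = \frac{\sinh (\pi y)}{(\sinh (\pi y)+\mathcal {K}\sinh (2h_2 y))^{2}}. \end{aligned}$$

The positive constant C is dropped in \(DI_2\). Assuming C is differentiable in \(\mathcal {K}\), this does not affect the non-vanishing condition: at points where \(I_1+I_2=0\), the derivative of \(C(I_1+I_2)\) with respect to \(\mathcal {K}\) reduces to \(C\, DI_2\).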

4 The surface quasi-geostrophic equation

The Surface Quasi-Geostrophic equation (SQG) is the following active scalar equation

$$\begin{aligned} \left( \partial _t + u\cdot \nabla \right) \theta = 0 \end{aligned}$$
(4.1)

where the relation between the incompressible velocity u and \(\theta \) is given by

$$\begin{aligned} u = \nabla ^{\perp }\psi , \quad \theta =-(-\Delta )^{\frac{1}{2}}\psi . \end{aligned}$$

The scalar \(\theta (x,t)\) represents the temperature and \(\psi (x,t)\) is the stream function. The non-local operator \(\Lambda ^{\gamma } = (-\Delta )^{\frac{\gamma }{2}}\) is defined through the Fourier transform by \(\widehat{\Lambda ^{\gamma }f}(\xi )=|\xi |^{\gamma }{\hat{f}}(\xi )\).
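As a small illustration of the Fourier-multiplier definition, the sketch below applies \(\Lambda ^{\gamma }\) to a trigonometric polynomial. It makes simplifying assumptions that are not in the text: a one-dimensional periodic setting instead of \(\mathbb {R}^2\), \(\gamma > 0\), and a hypothetical helper `apply_lambda` acting on coefficients indexed by \(k = -N,\ldots ,N\).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Apply Lambda^gamma to a trigonometric polynomial sum_k c_k e^{ikx},
// stored as c[k + N] for k = -N..N: each coefficient is multiplied by
// the symbol |k|^gamma. ASSUMPTIONS: 1D periodic setting, gamma > 0
// (so the k = 0 mode is sent to zero).
std::vector<std::complex<double>> apply_lambda(
        const std::vector<std::complex<double>>& c, double gamma) {
    assert(c.size() % 2 == 1);  // need modes -N..N
    int N = static_cast<int>(c.size() / 2);
    std::vector<std::complex<double>> out(c.size());
    for (int k = -N; k <= N; ++k) {
        double symbol =
            (k == 0) ? 0.0 : std::pow(std::abs(static_cast<double>(k)), gamma);
        out[k + N] = symbol * c[k + N];
    }
    return out;
}
```

For \(\cos (3x)\), whose only non-zero coefficients are \(c_{\pm 3} = 1/2\), the case \(\gamma = 2\) multiplies both by 9, in agreement with \(-\partial _x^2 \cos (3x) = 9\cos (3x)\); the case \(\gamma = 1\) multiplies them by 3.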

This equation has applications to meteorology and oceanography, since it comes from models of atmospheric and ocean fluids [117] and is a special case of the more general 3D quasi-geostrophic equation. There has been considerable scientific interest in understanding the behavior of the SQG equation, initially because it is a plausible model to explain the formation of fronts of hot and cold air [117]. More recently, this system was proposed in [32] as a 2D model for the 3D vorticity intensification, and a geometric and analytic analogy with the 3D incompressible Euler equations was shown.

One can also see a strong analogy with the 2D Euler equation in vorticity form:

$$\begin{aligned}&\left( \partial _t + u\cdot \nabla \right) \omega = 0 \nonumber \\&u = \nabla ^{\perp }\psi , \quad \omega =-(-\Delta )^{1}\psi , \end{aligned}$$
(4.2)

the only difference being that the velocity is more singular in the SQG case.

The problem of whether the SQG system presents finite time singularities or there is global existence is open for the smooth case.

4.1 Brief history of the problem

Local existence has been proved in various functional settings [26, 32, 104, 137, 138]. Starting from initial data with infinite energy, a gradient blowup may occur [14]. For finite energy initial data, solutions may start arbitrarily small and grow arbitrarily big in finite time [100].

The numerical simulations in [32] proposed a blowup scenario in the form of a closing hyperbolic saddle. This was ruled out in [34, 35]. More modern numerical simulations were able to resolve past the initially predicted singular time and found no singularities [31]. A new scenario was proposed in [126], starting from elliptical configurations, that develops filamentation and, after a few cascades, blowup of \(\nabla \theta \).

Global existence of weak solutions in \(L^{2}\) was shown in [120], and extended to the class of initial data belonging to \(L^{p}\) with \(p > 4/3\) in [107]. Non-uniqueness of weak solutions has been proved in [10]. See also [113].

With a different motivation, the existence of a special type of solutions known as “almost sharp fronts” was studied in [36, 62, 63]. These solutions can be thought of as a regularization of a front, with a small strip around the front in which the solution changes (reasonably) from one value of the front to the other. See [81] for a construction of traveling waves.

4.2 Main Theorem

The main Theorem is the following [22]:

Theorem 4.1

There is a nontrivial global smooth solution for the SQG equations that has finite energy, is compactly supported and is 3-fold.

It is well known that radial functions are stationary solutions of (4.1) due to the structure of the nonlinear term. The solutions that will be constructed are a smooth perturbation in a suitable direction of a specific radial function. The smooth profile we will perturb satisfies (in polar coordinates)

$$\begin{aligned} \theta (r)\equiv \left\{ \begin{array}{lll} 1 &{}\quad \text {for } 0\le r \le 1-a\\ \text {smooth and decreasing} &{} \quad \text {for }1-a< r< 1 \\ 0 &{} \quad \text {for }1\le r <\infty \end{array}\right. , \end{aligned}$$

where a is a small number. In addition the dynamics of these solutions consist of global rotating level sets with constant angular velocity. These level sets are a perturbation of the circle. The motivation comes from the so-called “patch” problem: namely when \(\theta \) is a step function (see Fig. 5). In this setting, the uniformly rotating solutions are known as V-states.

Fig. 5
figure 5

The patch setting: the scalar \(\theta \) is an indicator function of a domain \(\Omega \). This setting is preserved in time: \(\theta \) will be an indicator function of a moving domain \(\Omega (t)\) for all t

In this setting, local existence of patch solutions has been obtained in [73, 122] and uniqueness in [33]. There are two scenarios suggesting finite time singularities: the first one [37], starting from two patches, suggests an asymptotically self-similar collapse between the two patches, and at the same time a blowup of the curvature at the touching point; the second one [127] evolves a thin elliptical patch and indicates a self-similar filamentation cascade ending at a singularity with a blowup of the curvature.

The first computations of V-states for the 2D Euler equation are numerical [55] and since then there have been many works in different settings [58, 105, 124, 136]. The first proof of their existence dates to [11], and their regularity was proved later in [92]. See also [52, 53, 89, 90, 93, 94] for other studies in different directions (regularity of the boundary, different topologies, etc.).

For the SQG equation, existence and analyticity of the boundary were shown in [20, 21]. In [78] nontrivial stationary solutions are constructed. The existence of doubly-connected V-states with analytic boundary was shown in [119]. For more results concerning V-states of other active scalar equations see [51, 57, 88, 91].

We have also managed to prove a result similar to Theorem 4.1 for other active scalar equations such as the 2D Euler equations [23] (see also [77]).

Reduction of Theorem 4.1

We start by writing the scalar \(\theta \) in terms of its level sets, \(\theta (z(\alpha ,\rho ,t),t) = f(\rho )\), and the level sets in polar coordinates as \(z(\alpha ,\rho ,t) = R(\lambda t)r(\alpha ,\rho )(\cos (\alpha ),\sin (\alpha ))\), with R being a rotation matrix. After a few algebraic manipulations one can show that a solution of the SQG equation that rotates with angular velocity \(\lambda \) has to satisfy \(F(r(\alpha ,\rho ),\lambda ) = 0\), where F is an integrodifferential operator and \(r(\alpha ,\rho )\) is the radial component of the solution. In this formulation, the coordinate \(\rho \) is related to the level of the level sets, and \(\alpha \) is an angular coordinate. The strategy is to apply an abstract Crandall–Rabinowitz theorem [49], bifurcating from \(r = \rho \), which corresponds to a radial function (and therefore a solution for every \(\lambda \)). We bifurcate from a smooth “annular” profile which is 1 inside the disk of radius 0.95, 0 outside the disk of radius 1 and \(C^4\) in between. The main steps to be shown are the following:

  1. F is well defined and is \(C^{1}\).

  2. Ker(\(\mathcal {F}\)) is one-dimensional, where \(\mathcal {F}\) is the linearized operator around \(r = \rho \) at \(\lambda = \lambda _{3}\), where \(\lambda _{3}\) has to be determined.

  3. Y/Range(\(\mathcal {F}\)) is one-dimensional and \(F_{r \lambda }(r,\lambda _{3})(r_3) \not \in \) Range(\(\mathcal {F}\)), where Ker\((\mathcal {F}) = \langle r_3 \rangle \).

The hardest step is step 2. We look for solutions with 3-fold symmetry. To do so, we decompose the space and project onto the 3-rd Fourier mode in \(\alpha \). We try to find a function \(B_{3}(\rho )\) in such a way that the kernel of \(\mathcal {F}\) is generated by \(\rho B_{3}(\rho ) \cos (3\alpha )\). After rewriting the equations we end up having to solve an equation of the type

$$\begin{aligned} \lambda _{3} B_{3}(\rho ) = I(\rho )B_{3}(\rho ) + \int T^{3}(\rho ,\rho ')B_{3}(\rho ')d \rho ', \end{aligned}$$

where I and \(T^{3}\) are given by

$$\begin{aligned} I(\rho )&= -\frac{1}{2\pi } \int _{0}^{1} f_\rho (\rho ')\left( \int _{-\pi }^\pi \frac{\cos (x)}{\sqrt{1+\left( \frac{\rho }{\rho '}\right) ^2-2\left( \frac{\rho }{\rho '}\right) \cos (x)}} dx\right) d\rho ' \\ T^3(\rho ,\rho ')&= \frac{1}{2\pi } f_\rho (\rho ') \frac{\rho '}{\rho }\int _{-\pi }^{\pi } \frac{\cos (3x)}{\sqrt{1+\left( \frac{\rho }{\rho '}\right) ^2-2\left( \frac{\rho }{\rho '}\right) \cos (x)}}dx. \end{aligned}$$

Note that these functions can also be written in terms of elliptic integrals. We regard the equation as an eigenvalue problem, having to find an eigenpair (\(\lambda _{3}, B_{3}\)). In fact, we will look for the smallest eigenvalue \(\lambda _{3}\). The first drawback is that the integral operator is not symmetric, only close to symmetric, so a priori it is not clear whether there is a (real) solution or not. Moreover, the presence of the multiplication operator I means that the RHS is not compact, which makes this step more challenging. Nonetheless, we can prove the existence of \(\lambda _{3}\) and \(B_{3}\).

The strategy is to consider this problem as a perturbation of a symmetric one. The hope is that if the antisymmetric part is small enough (compared to the gap between the eigenvalues), then there will be a real eigenpair. See Fig. 6 for a sketch of the situation: since there is only one eigenvalue inside the grey ball (which is sufficiently small), it has to be real (otherwise there would be two, the eigenvalue and its complex conjugate). We can recast this into explicit, quantitative conditions involving the gap between the first eigenvalues and the norm of the antisymmetric part.

Fig. 6
figure 6

Sketch of the spectrum of the symmetric part of the RHS and the RHS
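A two-dimensional toy case (a hypothetical illustration, not the operator from [22]) shows the kind of quantitative condition involved. Split a real \(2\times 2\) matrix into its symmetric part S and antisymmetric part N:

$$\begin{aligned} M = \begin{pmatrix} a &{} b+c \\ b-c &{} d \end{pmatrix} = \underbrace{\begin{pmatrix} a &{} b \\ b &{} d \end{pmatrix}}_{S} + \underbrace{\begin{pmatrix} 0 &{} c \\ -c &{} 0 \end{pmatrix}}_{N}, \qquad \lambda _{\pm }(M) = \frac{a+d}{2} \pm \frac{1}{2}\sqrt{(a-d)^{2}+4b^{2}-4c^{2}}. \end{aligned}$$

The gap between the two eigenvalues of S is \(g = \sqrt{(a-d)^{2}+4b^{2}}\) and the operator norm of N is \(|c|\), so in this toy case the eigenvalues of M are real exactly when \(|c| \le g/2\), that is, when the antisymmetric part is small compared to the spectral gap of the symmetric part.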

We now focus on the symmetric part of the RHS. Getting a lower bound of the first eigenvalue is easy via Rayleigh–Ritz bounds. The hard part is to get a lower bound of the second eigenvalue of a symmetric operator.

We can take advantage of the compact part. If it were finite dimensional, then the problem would reduce to finding an eigenvalue of a matrix. The crucial observation is that since the operator is compact, it will be well approximated by a finite rank operator modulo a small error, which can be made arbitrarily small by increasing the dimension of the finite rank operator. The error can be written as an explicit (singular) integral which can be bounded like the ones in Sect. 2.4. To deal with the singularity, we need to perform Taylor approximations of the elliptic functions with explicit error estimates. The philosophy can be summarized by the following: if we have an approximate guess of the eigenpair, then there is a true eigenpair nearby. This way we can get tight, explicit bounds of the spectrum which lead to the proof of Step 2.
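In the finite-dimensional symmetric case, the principle that an approximate eigenpair guarantees a true eigenvalue nearby has a classical quantitative form: if A is symmetric, v is non-zero and \(\mu \) is any real number, then some eigenvalue of A lies within \(\Vert Av-\mu v\Vert /\Vert v\Vert \) of \(\mu \). The sketch below illustrates this bound only; it is not the argument used in [22]. The names `enclose_eigenvalue` and `EigenEnclosure` are hypothetical, and plain `double` arithmetic is used, so rounding errors are not controlled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Interval [center - radius, center + radius] containing an eigenvalue.
struct EigenEnclosure { double center, radius; };

// For a SYMMETRIC matrix A and a nonzero guess v, take mu to be the
// Rayleigh quotient of v. Then some eigenvalue of A lies within
// r = |A v - mu v| / |v| of mu.
// ASSUMPTION: plain doubles, so this is not a rigorous enclosure.
EigenEnclosure enclose_eigenvalue(const Mat& A, const Vec& v) {
    std::size_t n = v.size();
    assert(n > 0 && A.size() == n);
    Vec Av(n, 0.0);
    double vv = 0.0, vAv = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        assert(A[i].size() == n);
        for (std::size_t j = 0; j < n; ++j) Av[i] += A[i][j] * v[j];
        vv += v[i] * v[i];
        vAv += v[i] * Av[i];
    }
    assert(vv > 0.0);
    double mu = vAv / vv;  // Rayleigh quotient
    double res2 = 0.0;     // squared norm of the residual A v - mu v
    for (std::size_t i = 0; i < n; ++i) {
        double d = Av[i] - mu * v[i];
        res2 += d * d;
    }
    return {mu, std::sqrt(res2 / vv)};
}
```

For the matrix with rows (2, 1) and (1, 2), whose eigenvalues are 1 and 3, the guess v = (1, 0.9) yields an interval around the Rayleigh quotient that contains the eigenvalue 3. In a computer-assisted proof the same computation would be carried out in interval arithmetic, with the finite rank truncation error added to the radius.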

Step 1 is standard and technical, and step 3 follows from a similar analysis of the adjoint problem. \(\square \)