1 Introduction

We consider the nonlinear two-parameter eigenvalue problem (N2EP)

$$\begin{aligned} \begin{aligned} T_1(\lambda ,\mu ) x_1&= 0,\\ T_2(\lambda ,\mu ) x_2&= 0, \end{aligned} \end{aligned}$$
(1.1)

where \(T_i(\lambda ,\mu )\) is an \(n_i\times n_i\) complex matrix whose elements are analytic functions of \(\lambda ,\mu \in {\mathbb C}\), and \(x_i\in {\mathbb C}^{n_i}\) for \(i=1,2\). The matrices may be of different sizes, but to keep things simple we assume from now on that \(n_1=n_2=n\).

We are searching for nonzero vectors \(x_1, x_2\) and values \(\lambda ,\mu \) such that (1.1) is satisfied. In that case we say that the pair \((\lambda ,\mu )\) is an eigenvalue and the tensor product \(x_1\otimes x_2\) is the corresponding (right) eigenvector. Similarly, if \(y_1\) and \(y_2\) are nonzero vectors such that \(y_1^HT_1(\lambda ,\mu )=0\) and \(y_2^HT_2(\lambda ,\mu )=0\), then \(y_1\otimes y_2\) is the corresponding left eigenvector.

The N2EP can be seen as a generalization of both the nonlinear eigenvalue problem (NEP) and the two-parameter eigenvalue problem (2EP). The eigenvalues of (1.1) are the solutions of the following system of bivariate characteristic functions

$$\begin{aligned} \begin{aligned} f_1(\lambda ,\mu )&:= \det (T_1(\lambda ,\mu ))=0,\\ f_2(\lambda ,\mu )&:= \det (T_2(\lambda ,\mu ))=0. \end{aligned} \end{aligned}$$
(1.2)

As with the NEP, problem (1.1) can have no eigenvalues, finitely many eigenvalues, or infinitely many eigenvalues. We assume that the set of eigenvalues is zero-dimensional, i.e., each eigenvalue \((\lambda ,\mu )\) is an isolated point, so that the problem of numerically computing some or all of the eigenvalues is well defined.

The paper is organized as follows. In Sect. 2, some motivating problems are presented. In Sect. 3, some basic facts about the related linear two-parameter eigenvalue problem are given. In Sect. 4 we give some theoretical results on N2EPs with simple eigenvalues. The main part of the paper consists of the numerical methods for the N2EP in Sect. 5, which are followed by numerical examples in Sect. 6.

2 Motivating problems

An example of a N2EP is the quadratic two-parameter eigenvalue problem (Q2EP) [15, 22] of the form

$$\begin{aligned} \begin{aligned} Q_1(\lambda , \mu ) \, x_1&:= (A_{00} + \lambda A_{10} + \mu A_{01} + \lambda ^2 A_{20} + \lambda \mu A_{11} + \mu ^2 A_{02} ) \, x_1 = 0,\\ Q_2(\lambda , \mu ) \, x_2&:= (B_{00} + \lambda B_{10} + \mu B_{01} + \lambda ^2 B_{20} + \lambda \mu B_{11} + \mu ^2 B_{02} ) \, x_2 = 0,\qquad \quad \end{aligned} \end{aligned}$$
(2.1)

where \(A_{ij},B_{ij}\) are given \(n \times n\) complex matrices. A Q2EP of a simpler form, with some of the quadratic terms \(\lambda ^2\), \(\lambda \mu \), and \(\mu ^2\) missing, appears in delay-differential equations (DDEs) with a single delay [15].

The eigenvalues of the Q2EP (2.1) are the roots of the system of bivariate characteristic polynomials \(\det (Q_i(\lambda ,\mu ))=0\) for \(i=1,2\). It follows from Bézout’s theorem (see, e.g., [7]) that in the generic case the Q2EP (2.1) has \(4n^2\) eigenvalues.

Since a Q2EP can be linearized as a singular linear two-parameter eigenvalue problem [10, 22], we cannot consider it an entirely genuine nonlinear example. This is not the case for the following N2EP

$$\begin{aligned} \begin{aligned} (A_0-\lambda I +\mu A_1 +\mu ^\alpha A_2)x_1&=0,\\ (A_1+\lambda \mu I + \mu A_0 + \mu ^{1-\alpha } A_2)x_2&=0, \end{aligned} \end{aligned}$$
(2.2)

where \(\alpha >0\) is not an integer. Problem (2.2) occurs in the study of critical delays of DDEs with two or more non-commensurate delays. For more details about the problem and its numerical solution, see Example 6.2.

3 Linear problem

A (linear) two-parameter eigenvalue problem (2EP) has the form

$$\begin{aligned} \begin{aligned} A_1x_1&=\lambda B_1x_1+\mu C_1x_1, \\ A_2x_2&=\lambda B_2x_2+\mu C_2x_2, \end{aligned} \end{aligned}$$
(3.1)

where \(A_i,B_i\), and \(C_i\) are \(n\times n\) complex matrices. We can study (3.1) in the tensor product space \({{\mathbb C}}^{n}\otimes {{\mathbb C}}^{n}\) by defining the so-called operator determinants

$$\begin{aligned} \varDelta _0= & {} B_1\otimes C_2-C_1\otimes B_2, \\ \varDelta _1= & {} A_1\otimes C_2-C_1\otimes A_2, \\ \varDelta _2= & {} B_1\otimes A_2-A_1\otimes B_2 \end{aligned}$$

(for details see, e.g., [3]). We say that the 2EP (3.1) is nonsingular when \(\varDelta _0\) is nonsingular. In this case the matrices \(\varDelta _0^{-1}\varDelta _1\) and \(\varDelta _0^{-1}\varDelta _2\) commute and (3.1) is equivalent to a coupled pair of generalized eigenvalue problems

$$\begin{aligned} \begin{aligned} \varDelta _1 z&= \lambda \varDelta _0 z, \\ \varDelta _2 z&= \mu \varDelta _0 z \end{aligned} \end{aligned}$$
(3.2)

for a decomposable tensor \(z=x_1\otimes x_2\).
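
For small n this reduction can be carried out directly. The following minimal Matlab sketch (the variable names are assumptions) solves a nonsingular 2EP via (3.2), assuming algebraically simple eigenvalues so that the eigenvectors of \(\varDelta _0^{-1}\varDelta _1\) also diagonalize \(\varDelta _0^{-1}\varDelta _2\).

```matlab
% Sketch: solve a nonsingular 2EP (3.1) through the coupled problems (3.2).
Delta0 = kron(B1, C2) - kron(C1, B2);
Delta1 = kron(A1, C2) - kron(C1, A2);
Delta2 = kron(B1, A2) - kron(A1, B2);
[Z, D] = eig(Delta1, Delta0);                % Delta1*Z = Delta0*Z*D
lambda = diag(D);
mu     = diag(Z \ (Delta0 \ (Delta2 * Z)));  % shared eigenvectors give mu
```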

For an overview of numerical methods for 2EPs, see, e.g., [9]. If n is small, we can solve the coupled pair (3.2). An algorithm of this kind, based on the QZ algorithm, is presented in [9]. Its complexity is \({\mathscr {O}}(n^6)\), because the \(\varDelta \)-matrices are of size \(n^2\times n^2\). Therefore, when n is large it is not feasible to compute all eigenpairs. Instead, we can compute a few eigenpairs with iterative methods. The Jacobi–Davidson type method [9, 13] with harmonic Ritz values can compute eigenvalues \((\lambda ,\mu )\) that are close to a given target \((\lambda _T,\mu _T)\), while the Sylvester–Arnoldi type method [19] gives very good results in applications from mathematical physics where we need the eigenvalues with the smallest \(|\mu |\).

There are some iterative methods that can be applied to compute a solution close to a good initial approximation. One such method is the tensor Rayleigh quotient iteration (TRQI) from [24], which is a generalization of the standard Rayleigh quotient iteration (see, e.g., [8]) and computes one eigenpair at a time.

4 Simple eigenvalues

In this section we give some theoretical results on simple eigenvalues of a N2EP and discuss similarities and differences with the NEP.

For a start we recall the situation in the one-parameter case. Let \(\lambda _*\) be an eigenvalue of a NEP \(A(\lambda )x=0\), where the elements of matrix A are analytic functions of \(\lambda \). The geometric multiplicity \(m_g(\lambda _*)\) of \(\lambda _*\) is equal to the nullity of \(A(\lambda _*)\), and the algebraic multiplicity \(m_a(\lambda _*)\) is equal to the multiplicity of \(\lambda _*\) as a root of \(\det (A(\lambda ))=0\). We know that \(m_g(\lambda _*)\le m_a(\lambda _*)\) holds for each eigenvalue \(\lambda _*\).

If \((\lambda _*,\mu _*)\) is an eigenvalue of the N2EP (1.1), then its geometric multiplicity is

$$\begin{aligned} m_g(\lambda _*,\mu _*)=\mathrm{nullity}(T_1(\lambda _*,\mu _*))\cdot \mathrm{nullity}(T_2(\lambda _*,\mu _*)). \end{aligned}$$

The algebraic multiplicity \(m_a(\lambda _*,\mu _*)\) is the multiplicity of \((\lambda _*,\mu _*)\) as a root of the system (1.2). The Jacobian of (1.2) is nonsingular at an algebraically simple eigenvalue. Following the proof for the NEP in [27, Proposition 2.1], we can show that, for the N2EP as well, algebraic simplicity of an eigenvalue implies geometric simplicity.

Proposition 4.1

Each algebraically simple eigenvalue of the N2EP (1.1) is geometrically simple.

Proof

Let \((\lambda _*,\mu _*)\) be an algebraically simple eigenvalue of (1.1) and suppose that its geometric multiplicity is \(m_g\ge 2\). Without loss of generality we can assume that in this case \(k:=\mathrm{nullity}(T_1(\lambda _*,\mu _*))\ge 2\). Then there exist permutation matrices \(P_\ell \) and \(P_r\) such that the \((n-k)\times (n-k)\) block \(A_{11}(\lambda _*,\mu _*)\) is nonsingular, where

$$\begin{aligned} P_\ell T_1(\lambda ,\mu )P_r^T = \left[ \begin{matrix}A_{11}(\lambda ,\mu ) &{}\quad A_{12}(\lambda ,\mu )\\ A_{21}(\lambda ,\mu ) &{}\quad A_{22}(\lambda ,\mu )\end{matrix}\right] =:A(\lambda ,\mu ). \end{aligned}$$

For \((\lambda ,\mu )\) close to \((\lambda _*,\mu _*)\), block \(A_{11}(\lambda ,\mu )\) is nonsingular and we can write

$$\begin{aligned} L(\lambda ,\mu )A(\lambda ,\mu )R(\lambda ,\mu )= \left[ \begin{matrix}A_{11}(\lambda ,\mu ) &{}\quad 0 \\ 0 &{}\quad S(\lambda ,\mu )\end{matrix}\right] =:D(\lambda ,\mu ), \end{aligned}$$

where

$$\begin{aligned} L(\lambda ,\mu )=\left[ \begin{matrix}I_{n-k} &{}\quad 0\\ -A_{21}(\lambda ,\mu )A_{11}^{-1}(\lambda ,\mu ) &{}\quad I_k\end{matrix}\right] ,\quad R(\lambda ,\mu )=\left[ \begin{matrix}I_{n-k} &{}\quad -A_{11}^{-1}(\lambda ,\mu )A_{12}(\lambda ,\mu ) \\ 0 &{}\quad I_k\end{matrix}\right] , \end{aligned}$$

and

$$\begin{aligned} S(\lambda ,\mu )=A_{22}(\lambda ,\mu )-A_{21}(\lambda ,\mu )A_{11}^{-1}(\lambda ,\mu )A_{12}(\lambda ,\mu ) \end{aligned}$$

is the Schur complement of \(A_{11}(\lambda ,\mu )\).

Since \(\mathrm{rank}(D(\lambda _*,\mu _*))=n-k\) and \(A_{11}(\lambda _*,\mu _*)\) is nonsingular, \(S(\lambda _*,\mu _*)=0\). The determinant \(f_1(\lambda ,\mu ):=\det (T_1(\lambda ,\mu ))\) agrees up to sign with \(\det (D(\lambda ,\mu ))=\det (A_{11}(\lambda ,\mu ))\det (S(\lambda ,\mu ))\). Since \(S(\lambda _*,\mu _*)=0\) and the size of the block \(S(\lambda ,\mu )\) is at least \(2\times 2\), it follows that \({\partial \over \partial \lambda }\det (S(\lambda _*,\mu _*))={\partial \over \partial \mu }\det (S(\lambda _*,\mu _*))=0\): each partial derivative of \(\det (S)\) is a sum of determinants of matrices obtained by differentiating one column of S, and at \((\lambda _*,\mu _*)\) each such matrix still contains at least one zero column because \(k\ge 2\). Thus \({\partial f_1\over \partial \lambda }(\lambda _*,\mu _*)={\partial f_1\over \partial \mu }(\lambda _*,\mu _*)=0\). The Jacobian of \(f_1(\lambda ,\mu )\) and \(f_2(\lambda ,\mu )\) at \((\lambda _*,\mu _*)\) is then singular, so \((\lambda _*,\mu _*)\) is a multiple eigenvalue, which is a contradiction.\(\square \)

If \(\lambda _*\) is an algebraically simple eigenvalue of the NEP \(A(\lambda )x=0\), then \(A(\sigma )\) is nonsingular for \(\sigma \ne \lambda _*\) sufficiently close to \(\lambda _*\). In contrast, if \((\lambda _*,\mu _*)\) is an algebraically simple eigenvalue of the N2EP (1.1), then there always exist points \((\sigma ,\tau )\ne (\lambda _*,\mu _*)\) arbitrarily close to \((\lambda _*,\mu _*)\) such that one of the matrices \(T_1(\sigma ,\tau )\) or \(T_2(\sigma ,\tau )\) is singular. In fact, it follows from the nonsingularity of the Jacobian of (1.2) at \((\lambda _*,\mu _*)\) and the implicit function theorem that in a small neighbourhood of \((\lambda _*,\mu _*)\) there exists, for \(i=1,2\), a parametric curve \((\lambda _i(t),\mu _i(t))\), \(t\in (-\varepsilon ,\varepsilon )\), such that \(\lambda _i(0)=\lambda _*\), \(\mu _i(0)=\mu _*\), and \(\det (T_i(\lambda _i(t),\mu _i(t)))=0\) for all \(t\in (-\varepsilon ,\varepsilon )\). The eigenvalue \((\lambda _*,\mu _*)\) is the only intersection of the curves \((\lambda _1(t),\mu _1(t))\) and \((\lambda _2(t),\mu _2(t))\).

If y and x are the left and right eigenvectors of an algebraically simple eigenvalue \(\lambda _*\) of a NEP \(A(\lambda )x=0\), then it is well known that \(y^HA'(\lambda _*)x\ne 0\); see, e.g., [2, 23, 27]. In [21] this relation is generalized to the following proposition for the N2EP.

Proposition 4.2

([21, Proposition 3.2]) Let \((\lambda _*, \mu _*)\) be an algebraically simple eigenvalue of the N2EP (1.1), and let \( x_1\otimes x_2\) and \( y_1\otimes y_2\) be the corresponding right and left eigenvector. Then the matrix

$$\begin{aligned} \left[ \begin{matrix} y_1^H{\partial T_1\over \partial \lambda }( \lambda _*, \mu _*) x_1&{}&{}\quad y_1^H{\partial T_1\over \partial \mu }( \lambda _*, \mu _*) x_1\\ y_2^H{\partial T_2\over \partial \lambda }( \lambda _*, \mu _*) x_2&{}&{}\quad y_2^H{\partial T_2\over \partial \mu }( \lambda _*, \mu _*) x_2 \end{matrix}\right] \end{aligned}$$
(4.1)

is nonsingular.

The above result is used in the next section to show the quadratic convergence of a generalization of the inverse iteration for the N2EP close to an algebraically simple eigenvalue. It is also part of the selection criterion that enables us to compute more than one eigenvalue with the Jacobi–Davidson method [11].

5 Numerical methods

In this section we generalize some numerical methods for NEPs to N2EPs. For an overview of numerical methods for the NEP, see, e.g., [20, 26]. All methods in this section can be applied to a truly nonlinear N2EP, i.e., one that cannot be transformed into a polynomial problem. The methods can of course be applied to the polynomial two-parameter eigenvalue problem (P2EP) as well, but let us remark that there are other methods specific to P2EPs, such as the linearization to a singular 2EP [10, 22] and the Jacobi–Davidson method [11].

5.1 Inverse iteration

The first method that can be generalized to the N2EP is the inverse iteration, introduced by Ruhe [26]. In this method we choose nonzero vectors \(v_1,v_2\in {\mathbb C}^n\) and apply Newton's method to \(F(x_1,x_2,\lambda ,\mu )=0\), where

$$\begin{aligned} F(x_1,x_2,\lambda ,\mu ):=\left[ \begin{array}{l}T_1(\lambda ,\mu )x_1 \\ T_2(\lambda ,\mu )x_2 \\ v_1^Hx_1-1 \\ v_2^Hx_2-1\end{array}\right] . \end{aligned}$$

The vector \(v_i\) is used to normalize the eigenvector \(x_i\), so it should not be orthogonal to \(x_i\) for \(i=1,2\). If \((x_1^{(k)}\!\!,\,x_2^{(k)}\!\!,\,\lambda _{k},\mu _{k})\) is the current approximation for the eigenpair, then we get the update from the linear system

$$\begin{aligned} JF(x_1^{(k)}\!\!,\,x_2^{(k)}\!\!,\,\lambda _{k},\mu _{k}) \left[ \begin{matrix}\varDelta x_1^{(k)}\\ \varDelta x_2^{(k)}\\ \varDelta \lambda _k\\ \varDelta \mu _k\end{matrix}\right] = -\left[ \begin{matrix}T_1(\lambda _k,\mu _k)x_1^{(k)} \\ T_2(\lambda _k,\mu _k)x_2^{(k)} \\ v_1^Hx_1^{(k)}-1 \\ v_2^Hx_2^{(k)}-1\end{matrix}\right] \end{aligned}$$
(5.1)

with the Jacobian

$$\begin{aligned} JF(x_1^{(k)}\!\!,\,x_2^{(k)}\!\!,\,\lambda _{k},\mu _{k})= \left[ \begin{matrix}T_1({\scriptstyle \lambda _k,\mu _k}) &{}\quad 0 &{}\quad {\partial T_1\over \partial \lambda }({\scriptstyle \lambda _k,\mu _k})x_1^{(k)} &{}\quad {\partial T_1\over \partial \mu }({\scriptstyle \lambda _k,\mu _k})x_1^{(k)}\\ 0 &{}\quad T_2({\scriptstyle \lambda _k,\mu _k}) &{}\quad {\partial T_2\over \partial \lambda }({\scriptstyle \lambda _k,\mu _k})x_2^{(k)} &{}\quad {\partial T_2\over \partial \mu }({\scriptstyle \lambda _k,\mu _k})x_2^{(k)}\\ v_1^H &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad v_2^H &{}\quad 0 &{}\quad 0\end{matrix}\right] . \end{aligned}$$
(5.2)

Let us assume that the matrices \(T_1({ \lambda _k,\mu _k})\) and \(T_2({ \lambda _k,\mu _k})\) are nonsingular. We can further assume that \(x_1^{(k)}\) and \(x_2^{(k)}\) are normalized, i.e., \(v_1^Hx_1^{(k)}=v_2^Hx_2^{(k)}=1\). It now follows from the bottom two rows of (5.1) that \(v_1^H\varDelta x_1^{(k)}=v_2^H\varDelta x_2^{(k)}=0\). By multiplying the top two rows of (5.1) by \(v_1^H\) and \(v_2^H\), respectively, we obtain a \(2\times 2\) linear system

$$\begin{aligned} \left[ \begin{matrix}v_1^HT_1({ \lambda _k,\mu _k})^{-1}{\partial T_1\over \partial \lambda }({ \lambda _k,\mu _k})x_1^{(k)} &{} &{} v_1^HT_1({ \lambda _k,\mu _k})^{-1}{\partial T_1\over \partial \mu }({ \lambda _k,\mu _k})x_1^{(k)}\\ v_2^HT_2({ \lambda _k,\mu _k})^{-1}{\partial T_2\over \partial \lambda }({ \lambda _k,\mu _k})x_2^{(k)} &{} &{} v_2^HT_2({ \lambda _k,\mu _k})^{-1}{\partial T_2\over \partial \mu }({ \lambda _k,\mu _k})x_2^{(k)}\end{matrix}\right] \left[ \begin{matrix}\varDelta \lambda _k\\ \varDelta \mu _k\end{matrix}\right] = \left[ \begin{matrix}-1 \\ -1\end{matrix}\right] \end{aligned}$$

for \(\varDelta \lambda _k\) and \(\varDelta \mu _k\). Once \(\varDelta \lambda _k\) and \(\varDelta \mu _k\) are known, we get the corrections

$$\begin{aligned} \varDelta x_1^{(k)}= & {} - T_1({ \lambda _k,\mu _k})^{-1}\left( T_1({ \lambda _k,\mu _k}) +\varDelta \lambda _k {\partial T_1\over \partial \lambda }({ \lambda _k,\mu _k}) + \varDelta \mu _k {\partial T_1\over \partial \mu }({ \lambda _k,\mu _k})\right) x_1^{(k)}, \nonumber \\ \varDelta x_2^{(k)}= & {} - T_2({ \lambda _k,\mu _k})^{-1}\left( T_2({ \lambda _k,\mu _k}) +\varDelta \lambda _k {\partial T_2\over \partial \lambda }({ \lambda _k,\mu _k}) + \varDelta \mu _k {\partial T_2\over \partial \mu }({ \lambda _k,\mu _k})\right) x_2^{(k)}\nonumber \\ \end{aligned}$$
(5.3)

from the top two rows of (5.1). Since \(x_i^{(k+1)}=x_i^{(k)}+\varDelta x_i^{(k)}\), this gives the new eigenvector approximations

$$\begin{aligned} x_1^{(k+1)}= & {} -\varDelta \lambda _k T_1({\lambda _k,\mu _k})^{-1}{\partial T_1\over \partial \lambda }({ \lambda _k,\mu _k})x_1^{(k)} - \varDelta \mu _k T_1({ \lambda _k,\mu _k})^{-1}{\partial T_1\over \partial \mu }({ \lambda _k,\mu _k})x_1^{(k)},\nonumber \\ x_2^{(k+1)}= & {} -\varDelta \lambda _k T_2({ \lambda _k,\mu _k})^{-1}{\partial T_2\over \partial \lambda }({ \lambda _k,\mu _k})x_2^{(k)} - \varDelta \mu _k T_2({ \lambda _k,\mu _k})^{-1}{\partial T_2\over \partial \mu }({ \lambda _k,\mu _k})x_2^{(k)}. \end{aligned}$$

The overall algorithm is presented in Algorithm 1. The complexity of one step is \({\mathscr {O}}(n^3)\). Note that Step 8 is just a precaution in numerical computation, as in theory the vectors \(x_1^{(k+1)}\) and \(x_2^{(k+1)}\) should already be normalized.

Algorithm 1 (inverse iteration for the N2EP)
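
As the pseudocode figure is not reproduced here, the following Matlab sketch shows one possible implementation of the iteration; T1, T2, dT1l, dT1m, dT2l, and dT2m are assumed function handles for \(T_i\) and its partial derivatives, and lam, mu, x1, x2 hold the current approximations. The step numbering of Algorithm 1 is not claimed.

```matlab
% Sketch of the inverse iteration; v1 and v2 are the fixed
% normalization vectors.
for k = 1:maxit
    a1 = T1(lam,mu) \ (dT1l(lam,mu)*x1);  b1 = T1(lam,mu) \ (dT1m(lam,mu)*x1);
    a2 = T2(lam,mu) \ (dT2l(lam,mu)*x2);  b2 = T2(lam,mu) \ (dT2m(lam,mu)*x2);
    d  = [v1'*a1, v1'*b1; v2'*a2, v2'*b2] \ [-1; -1];  % [dlam; dmu]
    x1 = -d(1)*a1 - d(2)*b1;  x1 = x1/(v1'*x1);        % new eigenvectors
    x2 = -d(1)*a2 - d(2)*b2;  x2 = x2/(v2'*x2);
    lam = lam + d(1);  mu = mu + d(2);
    if norm(T1(lam,mu)*x1)^2 + norm(T2(lam,mu)*x2)^2 < tol^2, break, end
end
```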

Steps 3 and 4 rely on the nonsingularity of \(T_1(\lambda _k,\mu _k)\) and \(T_2(\lambda _k,\mu _k)\). Let us show that in practice this does not present a difficulty. When we are close to the solution, \(T_1(\lambda _k,\mu _k)\) and \(T_2(\lambda _k,\mu _k)\) are nearly singular and we can expect the vectors \(a_1,a_2,b_1\), and \(b_2\) to have huge norms, which then also applies to the matrix in Step 5. However, the condition number of this matrix stays bounded, since in the vicinity of the eigenpair it is close to a diagonally scaled version of the matrix (4.1), where the scaling factors are expected to be of similar magnitude. A rough justification is as follows.

Let \((x_1^{(k)}\!\!,\,x_2^{(k)}\!\!,\,\lambda _{k},\mu _{k})\) be close to \((x_1,x_2,\lambda _*,\mu _*)\), where \((\lambda _*,\mu _*)\) is an algebraically simple eigenvalue with the right and left eigenvectors \(x_1\otimes x_2\) and \(y_1\otimes y_2\). Then the smallest singular value \(\sigma _{in}\) of \(T_i(\lambda _k,\mu _k)\) is close to zero and the corresponding right and left singular vectors are close to \(x_i\) and \(y_i\), respectively. It follows that

$$\begin{aligned} a_i\approx {y_i^H {\partial T_i\over \partial \lambda }(\lambda _*,\mu _*)x_i\over \sigma _{in}}x_i, \quad b_i\approx {y_i^H {\partial T_i\over \partial \mu }(\lambda _*,\mu _*)x_i\over \sigma _{in}}x_i \end{aligned}$$

for \(i=1,2\) and

$$\begin{aligned} \left[ \begin{matrix}v_1^Ha_1 &{}\quad v_1^Hb_1 \\ v_2^Ha_2 &{}\quad v_2^Hb_2\end{matrix}\right] \approx \left[ \begin{matrix} \sigma _{1n}^{-1} &{}\quad \\ &{}\quad \sigma _{2n}^{-1}\end{matrix}\right] \left[ \begin{matrix} y_1^H{\partial T_1\over \partial \lambda }( \lambda _*, \mu _*) x_1&{}&{}\quad y_1^H{\partial T_1\over \partial \mu }( \lambda _*, \mu _*) x_1\\ y_2^H{\partial T_2\over \partial \lambda }( \lambda _*, \mu _*) x_2&{}&{}\quad y_2^H{\partial T_2\over \partial \mu }( \lambda _*, \mu _*) x_2 \end{matrix}\right] . \end{aligned}$$

In practice we do not have to worry about \(T_i(\lambda _k,\mu _k)\) being singular because, as in the inverse power method, rounding errors prevent the method from getting into trouble. A safe alternative is to solve (5.1) as a \((2n+2)\times (2n+2)\) linear system. However, this is approximately four times more expensive, since we do not exploit the block structure of the Jacobian matrix.

Theorem 5.1

Let \((\lambda _*,\mu _*)\) be an algebraically simple eigenvalue of the N2EP (1.1) and let \(x_1\otimes x_2\) be the corresponding right eigenvector such that \(v_1^Hx_1=v_2^Hx_2=1\). Then the inverse iteration has quadratic convergence close to the solution.

Proof

Since this is Newton’s method, it is enough to show that the Jacobian (5.2) is nonsingular at \((x_1,x_2,\lambda _*,\mu _*)\). So, suppose that the Jacobian is singular. Then there exist vectors \(z_1,z_2\) and scalars \(\alpha _1,\alpha _2\), not all of them being zero, such that

$$\begin{aligned} JF(x_1,x_2,\lambda _*,\mu _*) \left[ \begin{matrix}z_1 \\ z_2 \\ \alpha _1 \\ \alpha _2\end{matrix}\right] =0. \end{aligned}$$
(5.4)

If we multiply the first and the second equation of (5.4) by \(y_1^H\) and \(y_2^H\), respectively, where \(y_1\otimes y_2\) is the left eigenvector of \((\lambda _*,\mu _*)\), we obtain

$$\begin{aligned} \left[ \begin{matrix} y_1^H{\partial T_1\over \partial \lambda }( \lambda _*, \mu _*) x_1&{} &{} y_1^H{\partial T_1\over \partial \mu }( \lambda _*, \mu _*) x_1\\ y_2^H{\partial T_2\over \partial \lambda }( \lambda _*, \mu _*) x_2&{} &{} y_2^H{\partial T_2\over \partial \mu }( \lambda _*, \mu _*) x_2 \end{matrix}\right] \left[ \begin{matrix}\alpha _1 \\ \alpha _2\end{matrix}\right] =0. \end{aligned}$$

As we know from Proposition 4.2 that the above system is nonsingular for an algebraically simple eigenvalue, it follows that \(\alpha _1=\alpha _2=0\).

From the first and the second equation of (5.4) we now see that there should exist scalars \(\beta _1\) and \(\beta _2\) such that \(z_1=\beta _1x_1\) and \(z_2=\beta _2x_2\), as the null spaces of \(T_1(\lambda _*,\mu _*)\) and \(T_2(\lambda _*,\mu _*)\) have dimension 1. But then the last two equations of (5.4) read as \(\beta _1 v_1^H x_1=\beta _2 v_2^H x_2=0\), which yields \(\beta _1=\beta _2=0\) and we have a contradiction.\(\square \)

The inverse iteration was applied to 2EPs in [16] and as a part of the continuation method in [24]. In both cases the method was enhanced by a generalization of the Rayleigh quotient to the 2EP. In the next subsection we introduce several generalizations of the Rayleigh quotient to N2EPs that could be used for this purpose.

5.2 Residual inverse iteration

In each step of the inverse iteration we spend many operations solving the linear systems with the matrices \(T_1(\lambda _k,\mu _k)\) and \(T_2(\lambda _k,\mu _k)\). The same difficulty appears in the inverse iteration for the NEP. Neumaier showed in [23] that for the NEP we can use a fixed matrix instead, if we are prepared to trade quadratic convergence for linear convergence.

This approach can be generalized to the N2EP as follows. Starting from (5.3) and following arguments from [23] we write

$$\begin{aligned} x_i^{(k)}-x_i^{(k+1)}= & {} \textstyle T_i(\lambda _k,\mu _k)^{-1}\big (T_i(\lambda _k,\mu _k)+\varDelta \lambda _k {\partial T_i\over \partial \lambda }(\lambda _k,\mu _k) +\varDelta \mu _k {\partial T_i\over \partial \mu }(\lambda _k,\mu _k)\big )x_i^{(k)}\\= & {} \textstyle T_i(\lambda _k,\mu _k)^{-1}\big (T_i(\lambda _{k+1},\mu _{k+1})+{\mathscr {O}}\left( (|\varDelta \lambda _k|+|\varDelta \mu _k|)^2\right) \big )x_i^{(k)}\\= & {} \textstyle T_i(\lambda _k,\mu _k)^{-1}T_i(\lambda _{k+1},\mu _{k+1})x_i^{(k)}+{\mathscr {O}}\left( (|\varDelta \lambda _k|+|\varDelta \mu _k|)^2\right) \end{aligned}$$

for \(i=1,2\). We can approximate \(T_i(\lambda _k,\mu _k)^{-1}\) by \(T_i(\sigma ,\tau )^{-1}\), where \((\sigma ,\tau )\) is a fixed shift close to the eigenvalue, provided we have an approximation for \((\lambda _{k+1},\mu _{k+1})\). To obtain such an approximation, we generalize the Rayleigh functional to the N2EP.

If \(x_1^{(k)}\otimes x_2^{(k)}\) is an approximation for the eigenvector, then the first generalization of the Rayleigh functional is given by the solutions of the following, in general nonlinear, bivariate system

$$\begin{aligned} \begin{aligned} {x_1^{(k)}}^H T_1(\lambda _{k+1},\mu _{k+1}) x_1^{(k)}&= 0, \\ {x_2^{(k)}}^H T_2(\lambda _{k+1},\mu _{k+1}) x_2^{(k)}&= 0. \end{aligned} \end{aligned}$$
(5.5)

For a new approximation \((\lambda _{k+1},\mu _{k+1})\) we take the solution of (5.5) that is closest to \((\lambda _k,\mu _k)\). For the 2EP the system (5.5) has exactly one solution. It is called the tensor Rayleigh quotient and is equal to the pair \((\rho _1,\rho _2)\), where

$$\begin{aligned} \rho _1(x_1^{(k)}\!\!,\,x_2^{(k)}):= & {} { ({x_1^{(k)}}^H\! A_1 x_1^{(k)})({x_2^{(k)}}^H\! C_2 x_2^{(k)}) - ({x_1^{(k)}}^H\! C_1 x_1^{(k)})({x_2^{(k)}}^H\! A_2 x_2^{(k)}) \over ({x_1^{(k)}}^H\! B_1 x_1^{(k)})({x_2^{(k)}}^H\! C_2 x_2^{(k)}) - ({x_1^{(k)}}^H\! C_1 x_1^{(k)})({x_2^{(k)}}^H\! B_2 x_2^{(k)}) },\\ \rho _2(x_1^{(k)}\!\!,\,x_2^{(k)}):= & {} { ({x_1^{(k)}}^H\! B_1 x_1^{(k)})({x_2^{(k)}}^H\! A_2 x_2^{(k)}) - ({x_1^{(k)}}^H\! A_1 x_1^{(k)})({x_2^{(k)}}^H\! B_2 x_2^{(k)}) \over ({x_1^{(k)}}^H\! B_1 x_1^{(k)})({x_2^{(k)}}^H\! C_2 x_2^{(k)}) - ({x_1^{(k)}}^H\! C_1 x_1^{(k)})({x_2^{(k)}}^H\! B_2 x_2^{(k)}) }. \end{aligned}$$
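
In Matlab the tensor Rayleigh quotient can be evaluated directly from these formulae. The following is a minimal sketch for the linear 2EP (3.1); the helper p and the variable names are assumptions.

```matlab
% Sketch: tensor Rayleigh quotient (rho1, rho2) of the 2EP (3.1) at the
% current eigenvector approximations x1 and x2.
p    = @(M, x) x' * M * x;                       % scalar x^H M x
den  = p(B1,x1)*p(C2,x2) - p(C1,x1)*p(B2,x2);
rho1 = (p(A1,x1)*p(C2,x2) - p(C1,x1)*p(A2,x2)) / den;
rho2 = (p(B1,x1)*p(A2,x2) - p(A1,x1)*p(B2,x2)) / den;
```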

For example, for the Q2EP the system (5.5) has four solutions in the generic case. At least one of these four solutions is an eigenvalue if \(x_1^{(k)}\otimes x_2^{(k)}\) is an exact eigenvector.

We can think of (5.5) as a one-sided Rayleigh functional, because we use the same vectors on the left and the right side. We could use a two-sided version instead, where an approximation for the left eigenvector is used on the left side. If, similarly to the NEP, we take \(T_1(\sigma ,\tau )^{-H}v_1\otimes T_2(\sigma ,\tau )^{-H}v_2\) as an approximation to the left eigenvector, we obtain the following system for the two-sided Rayleigh functional:

$$\begin{aligned} \begin{aligned} v_1^HT_1(\sigma ,\tau )^{-1}T_1(\lambda _{k+1},\mu _{k+1})x_1^{(k)}&=0,\\ v_2^HT_2(\sigma ,\tau )^{-1}T_2(\lambda _{k+1},\mu _{k+1})x_2^{(k)}&=0. \end{aligned} \end{aligned}$$
(5.6)

Instead of solving (5.6) we perform one step of Newton’s method using \((\lambda _k,\mu _k)\) as an initial approximation. As we show in Theorem 5.2, this is enough for convergence. This yields

$$\begin{aligned} \left[ \begin{matrix} v_1^HT_1({\scriptstyle \sigma ,\tau })^{-1}{\partial T_1\over \partial \lambda }({\scriptstyle \lambda _k,\mu _k})x_1^{(k)} &{}\quad v_1^HT_1({\scriptstyle \sigma ,\tau })^{-1}{\partial T_1\over \partial \mu }({\scriptstyle \lambda _k,\mu _k})x_1^{(k)}\\ v_2^HT_2({\scriptstyle \sigma ,\tau })^{-1}{\partial T_2\over \partial \lambda }({\scriptstyle \lambda _k,\mu _k})x_2^{(k)} &{}\quad v_2^HT_2({\scriptstyle \sigma ,\tau })^{-1}{\partial T_2\over \partial \mu }({\scriptstyle \lambda _k,\mu _k})x_2^{(k)}\end{matrix}\right] \left[ \begin{matrix}\varDelta \lambda _k\\ \varDelta \mu _k\end{matrix}\right] = -\left[ \begin{matrix}\gamma _1\\ \gamma _2\end{matrix}\right] , \end{aligned}$$
(5.7)

where \(\gamma _i=v_i^H T_i(\sigma ,\tau )^{-1} T_i(\lambda _k,\mu _k)x_i^{(k)}\) for \(i=1,2\). The values \(\lambda _{k+1}=\lambda _k+\varDelta \lambda _k\) and \(\mu _{k+1}=\mu _k+\varDelta \mu _k\) are a generalization of Lancaster's one-parameter Rayleigh functional [18].

Hence, when we use the residual iteration, we first compute the new eigenvalue approximation \((\lambda _{k+1},\mu _{k+1})\) from (5.7). The new eigenvector approximations before normalization are then

$$\begin{aligned} z_1= & {} x_1^{(k)}-T_1(\sigma ,\tau )^{-1} T_1(\lambda _{k+1},\mu _{k+1})x_1^{(k)},\nonumber \\ z_2= & {} x_2^{(k)}-T_2(\sigma ,\tau )^{-1}T_2(\lambda _{k+1},\mu _{k+1})x_2^{(k)}. \end{aligned}$$

Algorithm 2 (residual inverse iteration for the N2EP)

The above procedure is presented in Algorithm 2, where we use the initial eigenvalue approximation \((\lambda _0,\mu _0)\) as the fixed shift \((\sigma ,\tau )\). The residual iteration has linear convergence if the initial approximation is close enough to the eigenpair.
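
A minimal Matlab sketch of the residual iteration follows, under the same assumptions on the function handles as in the sketch of the inverse iteration; the factorizations of \(T_i(\sigma ,\tau )\) are computed only once.

```matlab
% Sketch of the residual inverse iteration with fixed shift (sig, tau).
[L1f,U1f,P1] = lu(T1(sig,tau));  [L2f,U2f,P2] = lu(T2(sig,tau));
w1 = T1(sig,tau)' \ v1;  w2 = T2(sig,tau)' \ v2;  % T_i(sig,tau)^{-H} v_i
for k = 1:maxit
    J = [w1'*(dT1l(lam,mu)*x1), w1'*(dT1m(lam,mu)*x1);
         w2'*(dT2l(lam,mu)*x2), w2'*(dT2m(lam,mu)*x2)];
    g = [w1'*(T1(lam,mu)*x1); w2'*(T2(lam,mu)*x2)];
    d = -J \ g;  lam = lam + d(1);  mu = mu + d(2);   % Newton step (5.7)
    z1 = x1 - U1f\(L1f\(P1*(T1(lam,mu)*x1)));         % residual updates
    z2 = x2 - U2f\(L2f\(P2*(T2(lam,mu)*x2)));
    x1 = z1/(v1'*z1);  x2 = z2/(v2'*z2);              % normalization
    if norm(T1(lam,mu)*x1)^2 + norm(T2(lam,mu)*x2)^2 < tol^2, break, end
end
```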

Theorem 5.2

Let \((\lambda _*,\mu _*)\) be an algebraically simple eigenvalue of the N2EP (1.1) and let \(x_1\otimes x_2\) be the corresponding right eigenvector such that \(v_1^Hx_1=v_2^Hx_2=1\). If \((\lambda _0,\mu _0)\) is sufficiently close to the eigenvalue, then the residual iteration has linear convergence close to the solution.

Proof

We know that the inverse iteration, being Newton's method, has quadratic convergence close to the solution. If we replace the Jacobian (5.2) in the inverse iteration with the matrix

$$\begin{aligned} B(x_1^{(k)}\!\!,\,x_2^{(k)}\!\!,\,\lambda _{k},\mu _{k}):= \left[ \begin{matrix}T_1({\scriptstyle \lambda _0,\mu _0}) &{}\quad 0 &{}\quad {\partial T_1\over \partial \lambda }({\scriptstyle \lambda _k,\mu _k})x_1^{(k)} &{}\quad {\partial T_1\over \partial \mu }({\scriptstyle \lambda _k,\mu _k})x_1^{(k)}\\ 0 &{}\quad T_2({\scriptstyle \lambda _0,\mu _0}) &{}\quad {\partial T_2\over \partial \lambda }({\scriptstyle \lambda _k,\mu _k})x_2^{(k)} &{}\quad {\partial T_2\over \partial \mu }({\scriptstyle \lambda _k,\mu _k})x_2^{(k)}\\ v_1^H &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad v_2^H &{}\quad 0 &{}\quad 0\end{matrix}\right] , \end{aligned}$$

where the top left diagonal blocks of the Jacobian are fixed at \((\lambda _0,\mu _0)\), then the new method has linear convergence when \((\lambda _0,\mu _0)\) is sufficiently close to \((\lambda _*,\mu _*)\) (see, e.g., [17, Theorem 5.4.1]). If we assume that \(v_1^Hx_1^{(k)}=v_2^Hx_2^{(k)}=1\), then the solution of the system

$$\begin{aligned} B(x_1^{(k)}\!\!,\,x_2^{(k)}\!\!,\,\lambda _{k},\mu _{k}) \left[ \begin{matrix}\varDelta x_1^{(k)}\\ \varDelta x_2^{(k)}\\ \varDelta \lambda _k\\ \varDelta \mu _k\end{matrix}\right] = -\left[ \begin{matrix}T_1(\lambda _k,\mu _k)x_1^{(k)} \\ T_2(\lambda _k,\mu _k)x_2^{(k)} \\ v_1^Hx_1^{(k)}-1 \\ v_2^Hx_2^{(k)}-1\end{matrix}\right] \end{aligned}$$

gives (5.7) for the corrections \(\varDelta \lambda _k\) and \(\varDelta \mu _k\), and we get

$$\begin{aligned} x_i^{(k+1)}= & {} \textstyle x_i^{(k)}-T_i(\lambda _0,\mu _0)^{-1}\big (T_i(\lambda _k,\mu _k) + \varDelta \lambda _k {\partial T_i\over \partial \lambda }(\lambda _k,\mu _k) + \varDelta \mu _k {\partial T_i\over \partial \mu }(\lambda _k,\mu _k)\big )x_i^{(k)}\\= & {} x_i^{(k)}-T_i(\lambda _0,\mu _0)^{-1}\left( T_i(\lambda _{k+1},\mu _{k+1})+ {\mathscr {O}}\left( (|\varDelta \lambda _k|+|\varDelta \mu _k|)^2\right) \right) x_i^{(k)} \end{aligned}$$

for \(i=1,2\). This differs only by \({\mathscr {O}}\left( (|\varDelta \lambda _k|+|\varDelta \mu _k|)^2\right) \) from the vectors computed in Steps 8 and 9 of Algorithm 2. As this difference is too small to destroy the convergence, the residual iteration has linear convergence as well.\(\square \)

5.3 Successive linear problems

This method was also introduced by Ruhe [26]. Its generalization to the N2EP is as follows. We expand

$$\begin{aligned} T_1(\lambda _k-\varDelta \lambda _k,\mu _k-\varDelta \mu _k)x_1= & {} 0,\\ T_2(\lambda _k-\varDelta \lambda _k,\mu _k-\varDelta \mu _k)x_2= & {} 0 \end{aligned}$$

in a Taylor series as

$$\begin{aligned} \textstyle \big (T_1(\lambda _k,\mu _k)-\varDelta \lambda _k{\partial T_1\over \partial \lambda }(\lambda _k,\mu _k) -\varDelta \mu _k{\partial T_1\over \partial \mu }(\lambda _k,\mu _k) +{\mathscr {O}}\left( (|\varDelta \lambda _k|+|\varDelta \mu _k|)^2\right) \big )x_1= & {} 0,\\ \textstyle \big (T_2(\lambda _k,\mu _k)-\varDelta \lambda _k{\partial T_2\over \partial \lambda }(\lambda _k,\mu _k) -\varDelta \mu _k{\partial T_2\over \partial \mu }(\lambda _k,\mu _k) +{\mathscr {O}}\left( (|\varDelta \lambda _k|+|\varDelta \mu _k|)^2\right) \big )x_2= & {} 0. \end{aligned}$$

We discard the higher-order terms and take for \((\varDelta \lambda _k,\varDelta \mu _k)\) the eigenvalue closest to (0, 0) of the 2EP

$$\begin{aligned} \begin{aligned} \textstyle T_1(\lambda _k,\mu _k)x_1&= \textstyle \varDelta \lambda _k{\partial T_1\over \partial \lambda }(\lambda _k,\mu _k)x_1 +\varDelta \mu _k{\partial T_1\over \partial \mu }(\lambda _k,\mu _k)x_1, \\ \textstyle T_2(\lambda _k,\mu _k)x_2&= \textstyle \varDelta \lambda _k{\partial T_2\over \partial \lambda }(\lambda _k,\mu _k)x_2 +\varDelta \mu _k{\partial T_2\over \partial \mu }(\lambda _k,\mu _k)x_2. \end{aligned} \end{aligned}$$
(5.8)

The procedure is presented in Algorithm 3.

Algorithm 3 (successive linear problems for the N2EP)

Numerical results show that the convergence is quadratic. One step has complexity \({\mathscr {O}}(n^6)\) when the algorithm from [9] is used to solve the 2EP in Step 3. However, since we only need one eigenvalue of (5.8), we can merge Steps 3 and 4 and apply an iterative method that can efficiently compute the closest eigenvalue; here, the Jacobi–Davidson method from [13] or the subspace methods from [19] could be applied.
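
For small n, one step of Algorithm 3 can be carried out with the \(\varDelta \)-matrix approach of Sect. 3. A minimal sketch, again with the assumed function handles:

```matlab
% Sketch of one step of successive linear problems: solve the 2EP (5.8)
% and take the eigenvalue (dl(j), dm(j)) closest to (0, 0).
A1 = T1(lam,mu);  B1 = dT1l(lam,mu);  C1 = dT1m(lam,mu);
A2 = T2(lam,mu);  B2 = dT2l(lam,mu);  C2 = dT2m(lam,mu);
D0 = kron(B1,C2) - kron(C1,B2);
D1 = kron(A1,C2) - kron(C1,A2);
D2 = kron(B1,A2) - kron(A1,B2);
[Z,E] = eig(D1,D0);  dl = diag(E);  dm = diag(Z\(D0\(D2*Z)));
[~,j] = min(abs(dl).^2 + abs(dm).^2);
lam = lam + dl(j);  mu = mu + dm(j);
```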

5.4 Newton’s method on determinants

Instead of (1.1) we can consider the system of determinants (1.2). In order to apply Newton’s method we need an efficient numerical method for the partial derivatives \({\partial f_i\over \partial \lambda }(\lambda ,\mu )\) and \({\partial f_i\over \partial \mu }(\lambda ,\mu )\), where \(f_i(\lambda ,\mu ):=\det (T_i(\lambda ,\mu ))\) for \(i=1,2\). If \(f_i(\lambda ,\mu )\ne 0\) then Jacobi’s formula for the derivative of the determinant yields

$$\begin{aligned} \begin{aligned} {1\over f_i(\lambda ,\mu )}\cdot {\partial f_i\over \partial \lambda }(\lambda ,\mu )&= \mathrm{tr}\left( T_i(\lambda ,\mu )^{-1}{\partial T_i\over \partial \lambda }(\lambda ,\mu )\right) ,\\ {1\over f_i(\lambda ,\mu )}\cdot {\partial f_i\over \partial \mu }(\lambda ,\mu )&= \mathrm{tr}\left( T_i(\lambda ,\mu )^{-1}{\partial T_i\over \partial \mu }(\lambda ,\mu )\right) \end{aligned} \end{aligned}$$
(5.9)

for \(i=1,2\). Using the above formulae we can compute the derivatives in \({\mathscr {O}}(n^3)\) operations.
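
If we divide the i-th row of the Newton system for (1.2) by \(f_i(\lambda _k,\mu _k)\), the right-hand side becomes \(-[1,\,1]^T\) and the Jacobian entries become exactly the traces in (5.9), so the determinants themselves never have to be formed. A minimal sketch of one Newton step under this scaling:

```matlab
% Sketch of one Newton step on (1.2) using the trace formulae (5.9);
% each row of the Newton system is divided by f_i, so no determinant
% (which could easily overflow) is ever evaluated.
J = [trace(T1(lam,mu)\dT1l(lam,mu)), trace(T1(lam,mu)\dT1m(lam,mu));
     trace(T2(lam,mu)\dT2l(lam,mu)), trace(T2(lam,mu)\dT2m(lam,mu))];
d = -J \ [1; 1];
lam = lam + d(1);  mu = mu + d(2);
```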

An algorithm based on the LU factorization can be found in [5]. To simplify the presentation, we describe the algorithm for the one-parameter case only. Suppose that \(\det (A(\lambda ))\ne 0\) and that \(PA(\lambda )=LU\) is the result of Gaussian elimination with partial pivoting, where P is a permutation matrix, L is a lower triangular matrix with ones on the diagonal, and U is an upper triangular matrix. Then

$$\begin{aligned}f(\lambda )=\det (A(\lambda ))=\det (P)\prod _{j=1}^n u_{jj}.\end{aligned}$$

If we fix the permutation matrix P, then for each \(\nu \) in a small neighbourhood of \(\lambda \) there exist matrices \(L(\nu )\) and \(U(\nu )\) such that

$$\begin{aligned} L(\nu )U(\nu )=PA(\nu ) \end{aligned}$$
(5.10)

is the LU factorization of \(PA(\nu )\). By differentiating (5.10) at \(\nu =\lambda \) we get

$$\begin{aligned} PA'(\lambda )=L'(\lambda )U(\lambda )+L(\lambda )U'(\lambda )=MU+LV, \end{aligned}$$
(5.11)

where \(M:=L'(\lambda )\) is a lower triangular matrix with zeros on the diagonal and \(V:=U'(\lambda )\) is an upper triangular matrix. Matrices M and V of the proper form that satisfy (5.11) can be computed in \({\mathscr {O}}(n^3)\) flops from \(A'(\lambda )\), P, L, and U; for details, see [5] or Algorithm 6.1 in [25]. It follows that

$$\begin{aligned} {f'(\lambda )\over f(\lambda )}=\sum \limits _{j=1}^n{v_{jj}\over u_{jj}}. \end{aligned}$$
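
A compact way to evaluate this ratio follows directly from (5.11): with \(F:=L^{-1}PA'(\lambda )U^{-1}=L^{-1}M+VU^{-1}\), the first term is strictly lower triangular and the second is upper triangular with diagonal entries \(v_{jj}/u_{jj}\), so \(f'(\lambda )/f(\lambda )=\mathrm{tr}(F)\). A minimal Matlab sketch of this variant (it is not the algorithm from [5]):

```matlab
% Sketch: f'(lambda)/f(lambda) from the LU factorization P*A = L*U;
% A and dA hold the values of A(lambda) and A'(lambda).
[Lf,Uf,P] = lu(A);
F = Lf \ (P*dA) / Uf;   % F = inv(L)*P*dA*inv(U)
ratio = trace(F);       % = sum_j v_jj/u_jj = f'(lambda)/f(lambda)
```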

The Newton’s method combined with the above approach to compute the derivatives is presented in Algorithm 4. We assume that \(T_1(\lambda _k,\mu _k)\) and \(T_2(\lambda _k,\mu _k)\) are nonsingular. If not, we can use a slightly modified algorithm from [5] that is able to compute \(f'(\lambda )\) by the LU factorization even when \(f(\lambda )=0\).

Algorithm 4 (Newton's method on the determinants (1.2))

As this is Newton’s method, the obtained method has quadratic convergence close to an algebraically simple eigenvalue. A variant of Algorithm 4 was already applied to the 2EP, see, e.g. [6].

5.5 Implicit determinant algorithm

A bottleneck of Algorithm 4 is the computation of the partial derivatives of the determinants. An alternative is based on the implicit determinant algorithm, proposed in [29] (see also [1]) for the one-parameter nonlinear eigenvalue problem.

Lemma 5.3

Let \((\lambda _*,\mu _*)\) be an algebraically simple eigenvalue of the N2EP (1.1) and let \( x_1\otimes x_2\) and \( y_1\otimes y_2\) be the corresponding right and left eigenvector. If vectors \(u_i\) and \(v_i\) are such that \(u_i^Hy_i\ne 0\) and \(v_i^Hx_i\ne 0\), then the bordered matrix

$$\begin{aligned} M_i(\lambda ,\mu ):=\left[ \begin{matrix}T_i(\lambda ,\mu ) &{}\quad u_i\\ v_i^H &{}\quad 0\end{matrix}\right] \end{aligned}$$
(5.12)

is nonsingular at \((\lambda _*,\mu _*)\) for \(i=1,2\).

Proof

Suppose that

$$\begin{aligned} M_i(\lambda _*,\mu _*)\left[ \begin{matrix}z \\ \alpha \end{matrix}\right] = \left[ \begin{matrix}T_i(\lambda _*,\mu _*) &{}\quad u_i\\ v_i^H &{}\quad 0\end{matrix}\right] \left[ \begin{matrix}z \\ \alpha \end{matrix}\right] =0. \end{aligned}$$

Multiplying the first equation by \(y_i^H\) gives \(\alpha y_i^Hu_i=0\), and \(y_i^Hu_i\ne 0\) yields \(\alpha =0\). It follows that \(z=\beta x_i\) for some scalar \(\beta \), as the null space of \(T_i(\lambda _*,\mu _*)\) is one-dimensional. But then \(\beta v_i^Hx_i=0\) and \(v_i^Hx_i\ne 0\), therefore \(\beta =0\). \(\square \)

Close to \((\lambda _*,\mu _*)\) we define the vector \(x_i(\lambda ,\mu )\) and the scalar \(g_i(\lambda ,\mu )\) as the solution of

$$\begin{aligned} M_i(\lambda ,\mu )\left[ \begin{matrix}x_i(\lambda ,\mu )\\ g_i(\lambda ,\mu )\end{matrix}\right] =\left[ \begin{matrix}0\\ 1\end{matrix}\right] \end{aligned}$$

for \(i=1,2\). Then, by Cramer’s rule,

$$\begin{aligned} g_i(\lambda ,\mu )={\det (T_i(\lambda ,\mu ))\over \det (M_i(\lambda ,\mu ))},\quad i=1,2,\end{aligned}$$
(5.13)

and \(f_1(\lambda ,\mu )=f_2(\lambda ,\mu )=0\) has the same root \((\lambda _*,\mu _*)\) as \(g_1(\lambda ,\mu )=g_2(\lambda ,\mu )=0\). By differentiating the bordered system above we get the linear system

$$\begin{aligned} M_i(\lambda ,\mu )\left[ \begin{matrix}{\partial x_i\over \partial \lambda }(\lambda ,\mu ) &{}\quad {\partial x_i\over \partial \mu }(\lambda ,\mu )\\ {\partial g_i\over \partial \lambda }(\lambda ,\mu )&{} {\partial g_i\over \partial \mu }(\lambda ,\mu )\end{matrix}\right] = -\left[ \begin{matrix}{\partial T_i\over \partial \lambda }(\lambda ,\mu ) x_i &{}\quad {\partial T_i\over \partial \mu }(\lambda ,\mu ) x_i \\ 0 &{} 0\end{matrix}\right] \end{aligned}$$

with the same matrix \(M_i(\lambda ,\mu )\) as before, which we solve to get the partial derivatives \({\partial g_i\over \partial \lambda }\) and \({\partial g_i\over \partial \mu }\) for the Newton update. The overall method is presented in Algorithm 5.

In Algorithm 5 the matrix in Steps 3 and 4 is the same, so we have to compute its factorization only once. One step is thus faster than applying (5.9) or Algorithm 4. The algorithm depends on the vectors \(u_i\) and \(v_i\); the optimal choice is the left and right eigenvector components \(u_i=y_i\) and \(v_i=x_i\) for \(i=1,2\) (see, e.g., Subsection 4.5 in [1]). If \(u_i\) and \(v_i\) differ too much from \(y_i\) and \(x_i\), then Algorithm 5 can converge to a root different from \((\lambda _*,\mu _*)\).

Algorithm 5 (implicit determinant algorithm for the N2EP)
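
A minimal Matlab sketch of one step of the method, with the assumed function handles and fixed bordering vectors u1, u2, v1, v2; the step numbering of Algorithm 5 is not claimed, and in practice one LU factorization of each \(M_i\) serves both solves.

```matlab
% Sketch of one step of the implicit determinant algorithm.
M1 = [T1(lam,mu), u1; v1', 0];  M2 = [T2(lam,mu), u2; v2', 0];
s1 = M1 \ [zeros(n,1); 1];      s2 = M2 \ [zeros(n,1); 1];
x1 = s1(1:n);  g1 = s1(n+1);    x2 = s2(1:n);  g2 = s2(n+1);
D1 = M1 \ [-dT1l(lam,mu)*x1, -dT1m(lam,mu)*x1; 0, 0];  % derivatives
D2 = M2 \ [-dT2l(lam,mu)*x2, -dT2m(lam,mu)*x2; 0, 0];
J  = [D1(n+1,:); D2(n+1,:)];    % rows [dg_i/dlam, dg_i/dmu]
d  = -J \ [g1; g2];             % Newton step on g_1 = g_2 = 0
lam = lam + d(1);  mu = mu + d(2);
```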

5.6 Jacobi–Davidson method

All the above-mentioned methods require a good initial approximation to converge to a wanted eigenvalue. In addition, they compute one eigenpair only. When we lack such an approximation we can try a Jacobi–Davidson type method, which is also the method of choice when the matrices are large and sparse. The first Jacobi–Davidson method for a 2EP was a one-sided version for a right-definite 2EP [12]. It was followed by a two-sided version for a general 2EP [9], while the latest version [13] uses harmonic Ritz values and works well for interior eigenvalues. For the NEP, the Jacobi–Davidson method was applied to a polynomial eigenvalue problem in [28], while a version for a general NEP is presented in [4].

In Algorithm 6 we give a Jacobi–Davidson type method for a N2EP. Most of the details are omitted as they can be found in, e.g., [11] and references therein.

Algorithm 6 (Jacobi–Davidson type method for the N2EP)

Let us give some comments on the algorithm. In Steps 3 and 4 we use repeated Gram–Schmidt orthogonalization, so that the columns of \(U_{ik}\) and \(V_{ik}\) are orthonormal for \(i=1,2\). The choice of an appropriate test space depends on whether we use the one-sided variant, the two-sided variant, or harmonic Ritz values. In the basic one-sided variant we take \(V_{ik}=U_{ik}\), and then Step 4 is redundant.
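
A minimal sketch of the repeated Gram–Schmidt step used to expand a search space, assuming U has orthonormal columns and s is the new direction:

```matlab
% Sketch: expand an orthonormal basis U by a new direction s, applying
% classical Gram-Schmidt twice for numerical stability.
for pass = 1:2
    s = s - U*(U'*s);
end
if norm(s) > 1e-12, U = [U, s/norm(s)]; end   % skip s if it is negligible
```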

In Step 5 we need a numerical method that can be applied to a projected N2EP with small matrices to compute the Ritz pairs of interest. In the case of a P2EP we can apply the linearization, compute all Ritz pairs, and then choose the most appropriate one. For problems that are truly nonlinear, for example (2.2), we can apply the methods from the previous subsections to obtain a Ritz pair. In this case it is useful to have a good approximation for the eigenvalue, as otherwise the methods might not converge. In a typical scenario we are looking for an eigenvalue close to a given target and the matrices are so large and sparse that applying the inverse iteration directly is too expensive. Instead, we apply the Jacobi–Davidson method and use the inverse iteration to solve the small projected problems.

In Step 8 we solve the correction equation only approximately by applying a few steps of the GMRES method or another appropriate subspace method. For better convergence we should use preconditioning, and we restart the method when the subspaces become too large.

The Jacobi–Davidson method was applied to the P2EP in [11]. Using a selection criterion that prevents the algorithm from selecting already converged Ritz values, it is possible to compute several eigenvalues.

6 Numerical examples

Two numerical examples are given. They were obtained with the 64-bit Windows version of Matlab R2012b running on an Intel 8700 processor with 8 GB of RAM. In the first example we compare the convergence of the methods on a Q2EP. In the second example we apply the inverse iteration to a N2EP related to the determination of the critical curve of a DDE.

Example 6.1

We take the Q2EP of the form (2.1), where \( A_{ij}\) and \( B_{ij}\) are random \(n\times n\) matrices generated in Matlab as

(Matlab listing not reproduced)
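
A plausible minimal reconstruction of the listing (the use of rand and the variable names are assumptions) is:

```matlab
% Hypothetical reconstruction: random coefficient matrices for the
% Q2EP (2.1).
n = 250;
A00 = rand(n); A10 = rand(n); A01 = rand(n);
A20 = rand(n); A11 = rand(n); A02 = rand(n);
B00 = rand(n); B10 = rand(n); B01 = rand(n);
B20 = rand(n); B11 = rand(n); B02 = rand(n);
```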

For \(n=250\), 500, and 1000 we first compute one eigenpair of the Q2EP by the Jacobi–Davidson method from [11], which is basically Algorithm 6 adjusted to the Q2EP, and then use a perturbed solution as the initial approximation. We tested Algorithms 1, 2, 3, 4, and 5, to which we refer from now on by the more descriptive names InvIter, ResIter, SuccLP, NewtCF, and ImpDet, respectively. For numerical experiments with the Jacobi–Davidson method (Algorithm 6) for the Q2EP, see [11].

For \(n=250\) we give in Table 1, for all algorithms, the norms of the eigenvalue updates \((|\varDelta \lambda _k|^2+|\varDelta \mu _k|^2)^{1/2}\) in the individual iterations. In addition, for InvIter and ResIter, where the eigenvectors are computed as well, we give the norms of the residuals \((\Vert T_1(\lambda _k,\mu _k)x_1^{(k)}\Vert ^2+\Vert T_2(\lambda _k,\mu _k)x_2^{(k)}\Vert ^2)^{1/2}\). It is clearly seen that ResIter has linear convergence, while the other four algorithms have quadratic convergence.

Table 1 Comparison of convergence of Algorithms 1 (InvIter), 2 (ResIter), 3 (SuccLP), 4 (NewtCF), and 5 (ImpDet) on a random Q2EP with matrices of size \(n=250\)

In Table 2 we give the number of iterations and computational times. We iterate InvIter and ResIter until the norm of the residual drops below \(10^{-10}\), while in SuccLP, NewtCF, and ImpDet the same bound is used for the norm of the eigenvalue update.

Table 2 Number of iterations and running times for Algorithms 1 (InvIter), 2 (ResIter), 3 (SuccLP), 4 (NewtCF), and 5 (ImpDet) applied to a random Q2EP with matrices of size \(n=250\), 500, and 1000

NewtCF is slower than InvIter and ResIter. We must remark that in the numerical experiments we replaced Steps 4 and 5 by (5.9). In theory, Steps 4 and 5 should be faster, as they require 25 % fewer flops than (5.9). In practice it is difficult to implement this part efficiently, as all computations are done at most in Level 1 BLAS. For instance, Algorithm 4 with Steps 4 and 5 implemented in C via MEX (a pure Matlab implementation is much slower still) requires 168 s for \(n=1000\).

In SuccLP we use the subspace iteration with Ritz projections from [19] to compute the eigenvalue closest to zero, which we believe is the fastest available option at the moment. Based on the properties of the algorithm from [19] it is clear that one step of SuccLP is always more expensive than one step of InvIter. As both have quadratic convergence, the cheaper inverse iteration is preferred. ResIter is also very competitive and has the advantage that, by using the fixed matrix \(T_i(\lambda _0,\mu _0)\), we avoid the potential difficulty of \(T_i(\lambda _k,\mu _k)\) being singular when \((\lambda _k,\mu _k)\) is close to an eigenvalue.

In ImpDet we choose the vectors \(u_i\) and \(v_i\) as the result of one step of the standard inverse iteration on \(T_i(\lambda _0,\mu _0)\) and \(T_i(\lambda _0,\mu _0)^H\), respectively, for \(i=1,2\). The method is competitive with InvIter and ResIter. Its advantage is that the bordered matrix (5.12) is nonsingular at the exact eigenvalue \((\lambda _*,\mu _*)\).

Example 6.2

We consider a DDE with two delays of the form

$$\begin{aligned} \dot{x}(t) = A_0 x(t) + A_1 x(t-h_1) + A_2 x(t-h_2). \end{aligned}$$

The corresponding characteristic equation is

$$\begin{aligned} M(\lambda )z := (-\lambda I + A_0 + A_1 e^{-h_1 \lambda } + A_2 e^{-h_2 \lambda })z=0, \end{aligned}$$
(6.1)

where a nonzero z is an eigenvector and \(\lambda \) is the corresponding eigenvalue. At a critical delay, where the stability of the DDE changes, \(\lambda \) is purely imaginary. We would like to find the critical curve, i.e., the curve \((h_1,h_2)\), \(h_1,h_2\ge 0\), along which the stability changes.

A usual approach is to assume that \(h_2=\alpha h_1\) for different values of \(\alpha \). Then, as \(\alpha \) goes from 0 to \(\infty \), we compute the points on the critical curve. When we introduce \(\mu =e^{-h_1\lambda }\) in (6.1) and write down the conjugate equation, in which, since \(\lambda \) is purely imaginary, \(\overline{\lambda }=-\lambda \) and \(\overline{\mu }=\mu ^{-1}\), we obtain (2.2). This problem can be solved for some integer values of \(\alpha \), for instance \(\alpha =0,1,2\), if we transform it into a polynomial eigenvalue problem or into a P2EP; see, e.g., [15]. We could also take for \(\alpha \) a rational number with small numerator and denominator and then solve the problem as a P2EP.

In this way we obtain some points on the critical curve that we can use as initial approximations for other values of \(\alpha \) and thus follow the critical curve. For the missing values of \(\alpha \) we can solve (2.2) using any of the proposed local methods; a sketch of the required setup is given below.
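
A minimal Matlab sketch that sets up (2.2) and its partial derivatives as function handles, as required by the local methods of Sect. 5; the handle names match the earlier sketches and are assumptions.

```matlab
% Sketch: function handles for the N2EP (2.2) and its partial derivatives.
T1   = @(l,m) A0 - l*eye(n) + m*A1 + m^alpha*A2;
T2   = @(l,m) A1 + l*m*eye(n) + m*A0 + m^(1-alpha)*A2;
dT1l = @(l,m) -eye(n);
dT1m = @(l,m) A1 + alpha*m^(alpha-1)*A2;
dT2l = @(l,m) m*eye(n);
dT2m = @(l,m) l*eye(n) + A0 + (1-alpha)*m^(-alpha)*A2;
```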

For the numerical example we consider the DDE with two delays from [14]:

$$\begin{aligned} u_t\!=\!u_{xx}+a_0(x)u+a_1(x)u(x,t-h_1)+a_2(x)u(x,t-h_2),\ u(0,t)\!=\!u(\pi ,t)\!=\!0,\nonumber \\ \end{aligned}$$
(6.2)

where the coefficients are \(a_0(x)=2+0.3\sin (x)\), \(a_1(x)=-2+0.2x(1-e^{x-\pi })\), and \(a_2(x)=-2-0.3x(\pi -x)\). We discretize (6.2) by finite differences with matrices of size \(100\times 100\) and thus obtain \(A_0\), \(A_1\), and \(A_2\) in (2.2). For the initial point we assume \(h_1=h_2\) and compute the critical delay by the Q2EP formulation and the Jacobi–Davidson method, as explained in [11].

It is important that (2.2) can be formulated as a Q2EP for \(h_1=h_2\). This enables us to apply the linearization in Step 5 of Algorithm 6 and compute all Ritz values of the projected problem. Only one eigenvalue of the related Q2EP corresponds to the critical delay, and a special selection criterion (for details, see [11]) guides the Jacobi–Davidson method to this particular eigenvalue. This gives \(\lambda = 4.2399286i\) and \(h_1=h_2=0.30266688\) for the initial value \(\alpha _0=1\).

Let us remark that for \(\alpha \ne 1\) such that (2.2) cannot be formulated as a P2EP, it is difficult to compute the critical delay without a good approximation. Namely, if we apply the Jacobi–Davidson method, then in Step 5 of Algorithm 6 we get only one Ritz value, and there is no guarantee that the computed eigenvalue corresponds to the critical delay.

In the second phase we set

$$\begin{aligned} \alpha _k=\tan \left( {(m-k) \pi \over 4m}\right) \end{aligned}$$

for \(k=1,\ldots ,m\). For each \(\alpha _k\) we take the solution from step \(k-1\) as the initial value for InvIter. In this way we get the points where \(h_2<h_1\). For the other part of the critical curve we exchange the roles of \(h_1\) and \(h_2\) and repeat the procedure, starting again from \(\alpha _0=1\). The results for \(m=20\) are presented in Fig. 1, where the dot marks the initial delay at \(h_1=h_2\). The computation of all 41 points, including the initial one, takes just 1.5 s, which is much faster than reported in [14]. In addition, the results are more accurate, as we use a much finer mesh.

Fig. 1 The critical curve \((h_1,h_2)\) for the DDE (6.2). We start at \(h_1=h_2=0.30266688\) and follow the curve with the inverse iteration method

For large and sparse matrices \(A_0\), \(A_1\), and \(A_2\) it would be more efficient to apply the Jacobi–Davidson method in the second phase instead of InvIter. But in our particular example, where the matrices \(A_0\), \(A_1\), and \(A_2\) are tridiagonal, InvIter is very efficient and there is no need for the Jacobi–Davidson method.

7 Conclusions

We presented several numerical methods for nonlinear two-parameter eigenvalue problems. The most competitive local methods are the inverse iteration, which has quadratic convergence, and the less expensive residual iteration, which has linear convergence. If we know how to solve the smaller projected problems, then we can use the Jacobi–Davidson method as a global method. As a practical application we presented the computation of the critical curves of delay-differential equations with multiple delays.