
1 Introduction

Subresultants are one of the most fundamental tools in computer algebra. They are at the core of numerous algorithms including, but not limited to, polynomial GCD computations, polynomial system solving, and symbolic integration. When the subresultant chain of two polynomials is required in a procedure, not all polynomials of the chain, or not all coefficients of a given subresultant, may be needed. Based on that observation, the authors of [5] studied different practical schemes, and their implementation, for efficiently computing subresultants.

The main objective of [5] is, given two univariate polynomials \(a, b \in {\mathcal {A}}[y]\) over some commutative ring \(\mathcal {A}\), to compute the subresultant chain of \(a, b \in {\mathcal {A}}[y]\) speculatively. To be precise, the objective is to compute the subresultants of index 0 and 1, delaying the computation of subresultants of higher index until it is proven necessary. The practical importance of this objective, as well as related works, are discussed extensively in [5].

Taking advantage of the Half-GCD algorithm and evaluation-interpolation methods, the authors of [5] consider the cases in which the coefficient ring \(\mathcal {A}\) is a polynomial ring with one or two variables, and with coefficients in a field, \(\mathbb {Q}\) or \({\mathbb {Z}}/p{\mathbb {Z}}\), for a prime number p. The reported experimentation demonstrates the benefits of computing subresultant chains speculatively in the context of polynomial system solving.

That strategy, however, based on the Half-GCD algorithm, cannot scale to situations in which the coefficient ring \(\mathcal {A}\) is a polynomial ring in many variables, say 5 or more. The reason is that, for the Half-GCD algorithm to bring benefits, the degree in y of the polynomials \(a, b\) must be in the 100's, implying that the resultant of \(a, b\) is likely to have very large degrees in the variables of \(\mathcal {A}\), thus making computations not feasible in practice when \(\mathcal {A}\) has many variables.

Therefore, for this latter situation, one should consider an alternative approach in order to compute subresultant chains speculatively, which is the objective of the present paper. To this end, we consider subresultant chain computations using Bézout matrices. Most notably, [1] introduced an algorithm to compute the nominal coefficients of subresultants by calculating the determinants of sub-matrices of a modified version of the Bézout matrix. Later, [15] generalized this approach to compute all subresultants instead of only the nominal coefficients. Although the approach is theoretically slower than Ducos’ subresultant chain algorithm [10], early experimental results in Maple, collected during the development of the SubresultantChain method in the RegularChains library [16], indicate that approaches based on the Bézout matrix are particularly well-suited for sparse polynomials with many variables.

In this paper, we report on further work following this approach. In Sect. 2, we discuss how to compute the necessary determinants of the sub-matrices of the Bézout matrix. We modify and optimize the fraction-free LU decomposition (FFLU) of a matrix over a polynomial ring presented in [14]. We demonstrate the efficacy of the proposed methods using implementations in Maple and the Basic Polynomial Algebra Subprograms (BPAS) library [3]. Our optimization techniques include smart-pivoting and using the BPAS multithreaded interface to parallelize the row elimination step. All of our code is open source and part of the BPAS library, available at www.bpaslib.org.

In Sect. 3, we focus on the computation of subresultants using the Bézout matrix. In Sect. 3.1, we review the definitions of the Bézout matrix and a modified version of it, known as the Hybrid Bézout matrix. Then, we introduce a speculative approach for computing subresultants by modifying the fraction-free LU factorization and utilizing the Hybrid Bézout matrices in Sect. 3.2. We have implemented these computational schemes for subresultant chains and our experimental results, presented in Sect. 3.3, illustrate the benefits of the proposed methods.

2 Fraction-Free LU Decomposition

A standard way to compute the determinant of a matrix A is to reduce it to a triangular form and then take the product of the resulting diagonal elements [18]. One such triangular form is given by an LU matrix decomposition. When the input matrix A has elements in a polynomial ring, standard LU decomposition algorithms lead to matrices with rational functions as elements. In order to keep the elements in the ring of polynomials, while controlling expression swell, one can use a fraction-free LU decomposition (FFLU), taking advantage of Bareiss’ algorithm [6], which was originally developed for integer matrices. Although in an FFLU decomposition the matrices contain only elements from the ring of polynomials, the intermediate computations do require exact divisions. Reducing the cost of these divisions is a practical challenge, one which we discuss in this section. The main algorithm on which we rely has been described in [12, Ch. 9] and [14]. The main theorem is the following.

Theorem 1

A rectangular matrix A with elements from an integral domain \(\mathbb {B}\), having dimensions \(m \times n\) and rank r, may be factored into matrices containing only elements from \(\mathbb {B}\) in the form,

$$ A = P_r L D^{-1} U P_c = P_r \begin{pmatrix} \mathcal {L} \\ \mathcal {M} \end{pmatrix} D^{-1} \begin{pmatrix} \mathcal {U}&\mathcal {V} \end{pmatrix} P_c, $$

where the permutation matrix \(P_r\) is \(m \times m\); the permutation matrix \(P_c\) is \(n \times n\); \(\mathcal {L}\) is \(r \times r\), lower triangular and has full rank:

$$ \mathcal {L} = \begin{pmatrix} p_1 & 0 & \ldots & 0 \\ l_{21} & p_2 & \ddots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ l_{r1} & l_{r2} & \ldots & p_r \end{pmatrix}, $$

where the \(p_i \ne 0\) are the pivots in a Gaussian elimination; \(\mathcal {M}\) is \((m-r) \times r\) and is null when \(r=m\) holds; D is \(r \times r\) and diagonal:

$$ D = \text {diag}(p_1, p_1 p_2, p_2 p_3, \cdots , p_{r-2} p_{r-1}, p_{r-1} p_{r}), $$

\(\mathcal {U}\) is \(r \times r\) and upper triangular, while \(\mathcal {V}\) is \(r \times (n-r)\) and is null when \(r=n\) holds:

$$ \mathcal {U} = \begin{pmatrix} p_1 & u_{12} & \ldots & u_{1r} \\ 0 & p_2 & \ldots & u_{2r} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \ldots & 0 & p_{r} \end{pmatrix} . $$

Proof [14, Theorem 2]. Note that the elements of the matrix D belong to \(\mathbb {B}\), but the matrix \(D^{-1}\), if explicitly calculated, lies in the quotient field.
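To make Theorem 1 concrete, the square, full-rank case (where no pivoting is needed) can be sketched in a few lines of Python; `fflu` is a hypothetical helper, with machine integers standing in for the domain \(\mathbb {B}\):

```python
def fflu(A):
    """Fraction-free LU sketch of Theorem 1 for a square matrix whose leading
    pivots are all non-zero (so no pivoting is needed): returns matrices
    L, D, U with entries in the same ring such that A = L * D^{-1} * U."""
    M = [row[:] for row in A]
    n = len(M)
    L = [[0] * n for _ in range(n)]
    piv, denom = [], 1
    for k in range(n):
        piv.append(M[k][k])              # pivot p_{k+1}, assumed non-zero
        for i in range(k, n):
            L[i][k] = M[i][k]            # column k of L, pivots on the diagonal
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # Bareiss step: the division by the previous pivot is exact
                M[i][j] = (M[i][j] * M[k][k] - M[i][k] * M[k][j]) // denom
            M[i][k] = 0
        denom = M[k][k]
    # D = diag(p_1, p_1 p_2, p_2 p_3, ..., p_{n-1} p_n), as in Theorem 1
    D = [[0] * n for _ in range(n)]
    D[0][0] = piv[0]
    for i in range(1, n):
        D[i][i] = piv[i - 1] * piv[i]
    return L, D, M                       # M now holds the upper triangular U
```

With exact rational arithmetic, \(L D^{-1} U\) reconstructs A, and the last pivot is \(\det (A)\); every intermediate division in the elimination is exact, which is precisely the point of the fraction-free scheme.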

Algorithm 3 implements Theorem 1, while Algorithm 2 utilizes Theorem 1 for computing the determinant of A, when A is square. Both Algorithm 3 and Algorithm 2 rely on Algorithm 1, which is a helper function. This latter algorithm updates the input matrix A in-place, to record the upper triangular matrix U; it also computes the “denominator” d, the rank r of the matrix A and the row permutation of the input matrix. This is sufficient information to calculate the determinant of a square matrix.

In Algorithm 2, the routine check-parity calculates the parity of the given permutation modulo 2. Note that in both Algorithms 1 and 3, we only consider row-operations to find the pivot and store the row permutation patterns in the list \(P_r\) of size m. Column-permutations, and the corresponding list \(P_c\), are used in Sect. 2.1.

To optimize the FFLU algorithm, we use a smart-pivoting strategy, discussed in Sect. 2.1. The idea is to find a “best” pivot by searching through the matrix to pick a non-zero coefficient (actually a polynomial) with the minimum number of terms in each iteration. The goal of this technique is to reduce the cost of the exact divisions in Bareiss’ algorithm; see Sect. 2.1 for the details.

In addition, we discuss the parallel opportunities of this algorithm in Sect. 2.2, taking advantage of the BPAS multithreaded interface. Finally, Sect. 2.3 highlights the performance of these algorithms in the BPAS library, utilizing sparse multivariate polynomial arithmetic.

[Algorithms 1–3 are given as figures in the original: Algorithm 1, the in-place elimination helper; Algorithm 2, the determinant computation; Algorithm 3, the full fraction-free LU decomposition.]
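In the spirit of Algorithms 1 and 2, a minimal one-step Bareiss elimination returning the determinant can be sketched as follows; this is an illustrative helper, not the BPAS implementation, and Python integers stand in for polynomial entries:

```python
def bareiss_det(A):
    """One-step Bareiss elimination in the spirit of Algorithms 1 and 2:
    returns det(A) for a square matrix over an integral domain without
    leaving the domain. Row swaps are tracked through the sign, much as
    check-parity does in the text."""
    M = [row[:] for row in A]
    n = len(M)
    sign, denom = 1, 1                   # denom is the previous pivot
    for k in range(n - 1):
        if M[k][k] == 0:                 # find a non-zero pivot below
            for i in range(k + 1, n):
                if M[i][k] != 0:
                    M[k], M[i] = M[i], M[k]
                    sign = -sign
                    break
            else:
                return 0                 # rank deficiency: determinant is 0
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # the two-term update divides exactly by the previous pivot
                M[i][j] = (M[i][j] * M[k][k] - M[i][k] * M[k][j]) // denom
            M[i][k] = 0
        denom = M[k][k]
    return sign * M[n - 1][n - 1]
```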

Example 1

Consider matrix \(A \in \mathbb {B}^{4 \times 4}\) where \(\mathbb {B}= \mathbb {Z}[x]\). \(A = \)

$$ \begin{pmatrix} 11 x^2 - 11 x + 3 & -3 (x - 1) (2 x - 3) & 0 & 0 \\ 0 & 11 x^2 - 11 x + 3 & -3 (x - 1) (2 x - 3) & 0 \\ 0 & 0 & 11 x^2 - 11 x + 3 & -3 (x - 1) (2 x - 3) \\ -2 x + 3 & 0 & 0 & -x \end{pmatrix}. $$

To compute the determinant of this matrix, Algorithm 1 starts with \(d = 1\), \(k = 0\), \(c = 0\), \(P_r = [ 0, 1, 2, 3 ]\), \(A_{0,0} = 11 x^2 - 11 x + 3 \ne 0\), and \(r = 1\). After the first iteration, the nested for-loops update the (bottom-right) sub-matrix from the second row and column; we have \(A^{(1)} = \)

$$\begin{aligned} \left( \begin{array}{c|ccc} A_{0,0} & -3 (x - 1) (2 x - 3) & 0 & 0 \\ \hline 0 & (A_{0,0})^2 & -3 (x - 1) (2 x - 3) A_{0,0} & 0 \\ 0 & 0 & (A_{0,0})^2 & -3 (x - 1) (2 x - 3) A_{0,0} \\ -2 x + 3 & -3 (x - 1) (2 x - 3)^2 & 0 & -x A_{0,0} \end{array} \right) , \end{aligned}$$

where \(A_{1,2}^{(1)} = A_{2,3}^{(1)} = -3 (x - 1) (2 x - 3) (11 x^2 - 11 x + 3)\). In the second iteration of the while-loop, we have \(d = - 11 x^2 + 11 x - 3\), \(k = 1\), \(c = 1\), \(A_{1, 1}^{(1)} = (11 x^2 - 11 x + 3)^2 \ne 0\), and \(r = 2\). Then, \(A^{(2)} =\)

$$ \left( \begin{array}{cc|cc} A_{0,0} & A_{0,1} & 0 & 0 \\ 0 & (A_{0,0})^2 & -3 (x - 1) (2 x - 3) A_{0,0} & 0 \\ \hline 0 & 0 & (A_{0,0})^3 & -3 (x - 1) (2 x - 3) (A_{0,0})^2 \\ -2 x + 3 & (2 x - 3) A_{0,1} & -9 (x-1)^2 (2x-3)^3 & -x (A_{0,0})^2 \\ \end{array}\right) . $$

In the third iteration of the while-loop, we have \(d = - (11 x^2 - 11 x + 3)^2 \), \( k = 2\), \(c = 2\), \(A_{2, 2}^{(2)} = - (11 x^2 - 11 x + 3)^3 \ne 0\), and \(r = 3\). And so, \(A^{(3)} =\)

$$ \left( \begin{array}{ccc|c} A_{0,0} & A_{0,1} & 0 & 0 \\ 0 & (A_{0,0})^2 & -3 (x - 1) (2 x - 3) A_{0,0} & 0 \\ 0 & 0 & (A_{0,0})^3 & -3 (x - 1) (2 x - 3) (A_{0,0})^2 \\ \hline -2 x + 3 & (2 x - 3) A_{0,1} & -9 (x-1)^2 (2x-3)^3 & A_{3,3}^{(3)} \\ \end{array}\right) , $$

where \(A_{3,3}^{(3)} = -1763 x^7 + 7881 x^6 - 19986 x^5 + 35045 x^4 - 41157 x^3 + 30186 x^2 - 12420 x + 2187\). In fact, one can check that \(A_{3,3}^{(3)}\) is the determinant of the full-rank (\(r = 4\)) matrix \(A \in \mathbb {Z}[x]^{4 \times 4}\).

In [6], Bareiss introduced an alternative version of this algorithm, known as the multi-step Bareiss algorithm, to compute the fraction-free LU decomposition. This method reduces the cost of the row eliminations: it adds three cheaper divisions to compute each row in the while-loop while removing one multiplication from each iteration of the nested for-loops; see the results in Table 1 and [12, Chapter 9] for more details.

In the next sections, we investigate optimizations of Algorithm 1 to compute the determinant of matrices over multivariate polynomials. These optimizations are achieved by reducing the cost of exact divisions by finding better pivots and utilizing the BPAS multithreaded interface to parallelize this algorithm.

2.1 Smart-Pivoting in FFLU Algorithm

Returning to Example 1, we performed exact divisions for the following divisors in the second and third iterations,

$$\begin{aligned} d^{(1)}&= - 11 x^2 + 11 x - 3, \\ d^{(2)}&= -121 x^4 + 242 x^3 - 187 x^2 + 66 x - 9. \end{aligned}$$

However, we could pick a polynomial with fewer terms as our pivot in every iteration to reduce the cost of these exact divisions. Such a method, which finds a polynomial with the minimum number of terms in each column as the pivot of each iteration, is referred to as column-wise smart-pivoting. For matrix A of Example 1, one can pick \(A_{3,0} = -2 x + 3\) as the first pivot. Applying this method yields, after the first iteration, \(A^{(1)}\) =

$$ \left( \begin{array}{c|ccc} -2 x + 3 & 0 & 0 & -x \\ \hline 0 & -(2x-3)A_{0,0} & 3(x-1)(2x-3)^2 & 0 \\ 0 & 0 & -(2x-3)A_{0,0} & 3(x-1)(2x-3)^2 \\ A_{0,0} & 3(x-1)(2x-3)^2 & 0 & x A_{0,0} \end{array}\right) , $$

where \(d = 2x - 3\). Continuing this method from Algorithm 1, we get the following matrix for \(r = 4\), \(A^{(4)} = \)

$$ \left( \begin{array}{ccc|c} -2 x + 3 & 0 & 0 & -x \\ 0 & -(2x-3)A_{0,0} & 3(x-1)(2x-3)^2 & 0 \\ 0 & 0 & -(2x-3)A_{0,0}^2 & 3(x-1)(2x-3)^2 A_{0,0} \\ \hline A_{0,0} & 3(x-1)(2x-3)^2 & 9(x-1)^2(2x-3)^3 & A^{(4)}_{3,3} \end{array}\right) , $$

where \(A_{3,3}^{(4)} = 1763 x^7 - 7881 x^6 + 19986 x^5 - 35045 x^4 + 41157x^3 - 30186 x^2 + 12420 x - 2187\), \(P_r = [3, 1, 2, 0]\), and we have \(\textsc {det}(A) = -A_{3,3}^{(4)}\) from Algorithm 2.

In column-wise smart-pivoting, we limited our search for the best pivot to the corresponding column of the current row. To extend this method, one can search for the best pivot in the whole sub-matrix starting from the next current row and column. To perform this method, referred to as (fully) smart pivoting, we need to use column-operations and a column-wise permutation matrix \(P_c\). Performing column operations in addition to row operations is not cache-friendly; this is certainly an issue for matrices with (large) multivariate polynomial entries, while it may not be for (relatively small) matrices with numerical entries. Therefore, we avoid column swapping within the decomposition, and instead keep track of column permutations in the list of column-wise permutation patterns \(P_c\), to calculate the parity check later in Algorithm 2.

Algorithm 4 presents the pseudo-code of the smart pivoting fraction-free LU decomposition utilizing both row-wise and column-wise permutation patterns \(P_r, P_c\). This algorithm updates A in-place, to become the upper triangular matrix U, and returns the rank and denominator of the given matrix \(A \in \mathbb {B}^{m \times n}\).
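The pivot search itself is straightforward; a sketch of the column-wise variant follows. `column_smart_pivot` is a hypothetical helper: `nterms` abstracts the term count of a polynomial entry, and the swap is recorded in the row-permutation list rather than by moving rows, as in the text:

```python
def column_smart_pivot(A, k, row_perm, nterms):
    """Column-wise smart pivoting: among rows i >= k (in permuted order),
    pick the non-zero entry of column k with the fewest terms, so that the
    subsequent exact divisions of Bareiss' algorithm divide by the smallest
    polynomial available."""
    best, best_i = None, -1
    for i in range(k, len(A)):
        e = A[row_perm[i]][k]
        if e and (best is None or nterms(e) < best):
            best, best_i = nterms(e), i
    if best_i >= 0:
        # record the swap in the permutation list instead of moving rows
        row_perm[k], row_perm[best_i] = row_perm[best_i], row_perm[k]
        return True
    return False          # column is zero below row k: the rank drops
```

On the first column of the matrix of Example 1 (entries written, say, as exponent-to-coefficient dictionaries), the search selects the two-term entry \(-2x+3\) in the last row, matching the permutation \(P_r = [3, 1, 2, 0]\) reported above.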

2.2 Parallel FFLU Algorithm

For further practical performance, we now investigate opportunities for parallelism alongside our schemes for cache-efficiency. In particular, notice that during the row reduction step (the for loops on lines 24–28 of Algorithm 4) the update of each element is independent. Implementing this step as a parallel_for loop is easily achieved with the multithreading support provided in the BPAS library; see further details in [4].

[Algorithms 4 and 5 are given as figures in the original: Algorithm 4, the smart-pivoting fraction-free LU decomposition; Algorithm 5, its parallel row-elimination step.]

Algorithm 5 shows a naïve implementation of this parallel algorithm. Note that in a parallel_for loop, each iteration is (potentially) executed in parallel. In Algorithm 5, this means lines 3–5 are executed independently and in parallel for each possible value of (i, j). If the number of such pairs exceeds a pre-determined limit (e.g. the number of hardware threads supported), then the iterations are divided as evenly as possible among the available threads.

A difficulty with this parallelization scheme is that the size of the sub-matrices decreases with each iteration. Therefore, the amount of work executed by each thread also decreases. In practice, to address this load-balancing issue and to maximize parallelism, we only parallelize the outer loop (line 1 of Algorithm 5).
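The structure of this step can be sketched with a thread pool; this is an illustrative sketch only, not the BPAS multithreaded interface, and with CPython's global interpreter lock the integer workload below will not actually scale:

```python
from concurrent.futures import ThreadPoolExecutor

def eliminate_below_parallel(M, k, denom, pool):
    """Sketch of the parallel row-elimination step: every row below the
    pivot row k is updated independently, so only the outer (row) loop is
    handed to the pool, mirroring the load-balancing choice in the text.
    Integer entries stand in for polynomials."""
    piv, n = M[k][k], len(M[0])

    def update(i):
        for j in range(k + 1, n):
            # one-step Bareiss update; division by the previous pivot is exact
            M[i][j] = (M[i][j] * piv - M[i][k] * M[k][j]) // denom
        M[i][k] = 0

    list(pool.map(update, range(k + 1, len(M))))
```

Since distinct rows touch disjoint memory, the updates race-free compose into the full elimination when called once per pivot.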

2.3 Experimentation

In this section, we compare the fraction-free LU decomposition algorithms for the Bézout matrix (Definition 2) of randomly generated, non-zero, sparse polynomials in \(\mathbb {Z}[x_1, \ldots , x_v]\) for \(v \ge 5\) in the BPAS library. We recall that methods based on the Bézout matrix have been observed (during development of the RegularChains library [16], and later in Sect. 3.3) to be well-suited for sparse polynomials with many variables. Throughout this paper, our benchmarks were collected on a machine running Ubuntu 18.04.4 and GMP 6.1.2, with an Intel Xeon X5650 processor running at 2.67 GHz and 12 \(\times \) 4 GB of DDR3 memory at 1.33 GHz.

Table 1 shows the comparison between the standard implementation of the fraction-free LU decomposition (Algorithm 1; denoted plain), the column-wise smart pivoting (denoted col-wise SP), the fully smart-pivoting method (Algorithm 4; denoted fully SP), and Bareiss’ multi-step technique added to Algorithm 4 (denoted multi-step). Here, \(v = 5\) and the generated polynomials have a sparsity ratio (the fraction of zero terms to the total possible number of terms in a fully dense polynomial of the same partial degrees) of 0.98.

This table indicates that using smart-pivoting yields up to a factor of 3 speed-up. Comparing col-wise SP and fully SP shows that calculating \(P_c\) (column-wise permutation patterns) along with \(P_r\) (row-wise permutation patterns) does not cause any slow-down in the calculation of d.

Moreover, combining the multi-step technique with smart-pivoting does not bring any additional speed-up: the smart-pivoting technique already minimizes the cost of the exact divisions in each iteration. Table 2 shows the \({\text {plain}/\text {fully SP}}\), \({\text {plain}/\text {multi-step}}\), and \({\text {fully SP}/\text {multi-step}}\) ratios from Table 1.

To analyze the performance of the parallel FFLU algorithm, we compare Algorithm 5 and Algorithm 4 for \( n \times n\) matrices of randomly generated, non-zero, degree-1 univariate polynomials with integer coefficients. Table 3 summarizes these results. For \(n=75\), a \(2.14{\times }\) parallel speed-up is achieved, and the speed-up continues to increase with increasing n.

Table 1. Comparing the execution time (in seconds) of fraction-free LU decomposition algorithms for the Bézout matrix of randomly generated, non-zero, sparse polynomials \(a, b \in \mathbb {Z}[x_1, x_2, \ldots , x_5]\) with \(x_5< \cdots< x_2 < x_1\), \(\deg (a, x_1) = \deg (b, x_1) + 1 = d\), \(\deg (a, x_2) = \deg (b, x_2) + 1 = 5\), and \(\deg (a, x_i) = \deg (b, x_i) = 1\) for \(3 \le i \le 5\)
Table 2. Ratios of FFLU algorithms for polynomials in Table 1
Table 3. Comparing the execution time (in seconds) of Algorithm 4 and Algorithm 5 for \(n \times n\) matrices of random non-zero degree 1 univariate integer polynomials

3 Bézout Subresultant Algorithms

In this section, we continue exploring the subresultant algorithms for multivariate polynomials based on calculating the determinant of (Hybrid) Bézout matrices.

3.1 Bézout Matrix and Subresultants

A traditional way to define subresultants is via determinants of submatrices of the Sylvester matrix (see, e.g., [5] or [11, Ch. 6]). Li [17] presented an elegant way to calculate subresultants directly from the following matrices; this method follows the same idea as the Sylvester-matrix-based definition of subresultants.

Theorem 1

The k-th subresultant \(S_{k} (a, b)\) of \(a = \sum _{i=0}^{m}a_i y^i , b = \sum _{i=0}^{n} b_i y^i \in {\mathbb {B}}[y]\) is calculated by the determinant of the following \((m+n-k) \times (m+n-k)\) matrix:

[The matrix \(E_{k}\) is given as a figure in the original.]
$$\begin{aligned} S_{k} (a, b) = (-1)^{k (m-k+1)} \text {det} \left( E_{k} \right) . \end{aligned}$$

Proof. [17, Section 2]

Another practical division-free approach utilizes the Bézout matrix to compute the subresultant chain of multivariate polynomials, by calculating determinants derived from the Bézout matrix of the input polynomials [13]. Following [7], we define the symmetric Bézout matrix as follows.

Definition 1

The Bézout matrix associated with \(a, b \in \mathbb {B}[y]\), where \(m := \deg (a) \ge n := \deg (b)\), is the symmetric matrix:

$$\begin{aligned} \text {Bez}(a, b) := \left( \begin{array}{ccc} c_{0,0} & \cdots & c_{0, m-1} \\ \vdots & \ddots & \vdots \\ c_{m-1, 0} & \cdots & c_{m-1,m-1} \end{array} \right) , \end{aligned}$$

where the coefficients \(c_{i,j}\), for \(0 \le i, j < m\), are defined by the so-called Cayley expression as follows,

$$\begin{aligned} \frac{a(x) b(y) - a(y) b(x)}{x-y} = \sum _{i,j = 0}^{m-1} c_{i, j} y^ix^j. \end{aligned}$$
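The Cayley expression gives a direct way to build this matrix: form the numerator \(a(x) b(y) - a(y) b(x)\) and divide it exactly by \(x - y\) via synthetic division. A Python sketch follows; `bezout_cayley` is a hypothetical helper over \(\mathbb {Z}[y]\) (coefficient lists, lowest degree first), and note that sign and row/column orientation conventions for the Bézout matrix vary in the literature:

```python
def bezout_cayley(a, b):
    """Bezout matrix via the Cayley expression: the m x m coefficient matrix
    of (a(x)b(y) - a(y)b(x)) / (x - y). a[i] is the coefficient of y^i,
    with deg(a) >= deg(b). Returns a symmetric matrix over the same ring."""
    m = len(a) - 1
    b = b + [0] * (m + 1 - len(b))          # pad b up to degree m
    # N[i][j] = coeff of x^i y^j in the numerator a(x)b(y) - a(y)b(x)
    N = [[a[i] * b[j] - a[j] * b[i] for j in range(m + 1)] for i in range(m + 1)]
    # exact synthetic division by (x - y): viewing the numerator as a
    # polynomial in x with coefficients in Z[y], q_{i-1}(y) = N_i(y) + y*q_i(y)
    Q, carry = [], [0] * (m + 1)
    for i in range(m, 0, -1):
        q = [N[i][j] + carry[j] for j in range(m + 1)]
        assert q[m] == 0                    # the division is exact
        Q.append(q[:m])
        carry = [0] + q[:m]                 # multiply q_{i-1} by y
    Q.reverse()                             # Q[j] = coeff list (in y) of x^j
    return [[Q[j][i] for j in range(m)] for i in range(m)]
```

The output is symmetric, as Definition 1 requires, and the assertion inside the loop checks that the division by \(x - y\) leaves no remainder.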

The relations between the Sylvester and Bézout matrices have been studied for decades, yielding an efficient algorithm to construct the Bézout matrix [2] using a so-called Hybrid Bézout matrix.

Definition 2

The Hybrid Bézout matrix of \(a = \sum _{i=0}^{m} a_i y^i\) and \(b = \sum _{i=0}^{n} b_i y^i \) is defined as the \(m \times m\) matrix

$$ \text {HBez}(a, b) := \left( \begin{array}{ccc} h_{0,0} & \cdots & h_{0, m-1} \\ \vdots & \ddots & \vdots \\ h_{m-1, 0} & \cdots & h_{m-1,m-1} \end{array} \right) , $$

where the coefficients \(h_{i, j}\), for \(0 \le i, j < m\), are defined as follows, with \(\text{coeff}(p, d)\) denoting the coefficient of \(y^{d}\) in p:

$$\begin{aligned} h_{i, j}&= \text{coeff}(H_{n-i}, m-1-j) \text { for } 0 \le i < n, \\ h_{i, j}&= \text{coeff}(y^{m-1-i}\, b, m-1-j) \text { for } n \le i < m, \end{aligned}$$

with,

$$\begin{aligned} H_{i}&= (a_m y^{i-1} + \cdots + a_{m-i+1} ) ( b_{n-i} y^{m-i} + \cdots + b_{0} y^{m-n}) \\&\ \ \ - ( a_{m-i} y^{m-i} + \cdots + a_{0} ) ( b_{n} y^{i-1} + \cdots + b_{n-i+1}). \end{aligned}$$

Example 2

Consider the polynomials \(a = 5 y^5 + y^3 + 2 y + 1\) and \(b = 3 y^3 + y + 3\) in \(\mathbb {Z}[y]\). The Sylvester matrix of \(a, b\) is:

$$\begin{aligned} \text {Sylv}(a, b) = \begin{pmatrix} 5 & 0 & 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 5 & 0 & 1 & 0 & 2 & 1 & 0 \\ 0 & 0 & 5 & 0 & 1 & 0 & 2 & 1 \\ 3 & 0 & 1 & 3 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 1 & 3 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 3 & 0 & 1 & 3 \end{pmatrix}, \end{aligned}$$

and the Bézout matrix of \(a, b\) is:

$$\begin{aligned} \text {Bez}(a, b) = \left( \begin{array}{rrrrr} 0 & -15 & 0 & -5 & -15 \\ -15 & 0 & -5 & -15 & 0 \\ 0 & -5 & -15 & 5 & 0 \\ -5 & -15 & 5 & 0 & 0 \\ -15 & 0 & 0 & 0 & -5 \end{array} \right) , \end{aligned}$$

while the Hybrid Bézout matrix of \(a, b\) is:

$$\begin{aligned} \text {HBez}(a, b) = \left( \begin{array}{rrrrr} 15 & -6 & 0 & -2 & -1 \\ 2 & 15 & -6 & -3 & 0 \\ 0 & 2 & 15 & -6 & -3 \\ 3 & 0 & 1 & 3 & 0 \\ 0 & 3 & 0 & 1 & 3 \end{array} \right) . \end{aligned}$$
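Definition 2 can likewise be turned into a short construction. The sketch below is a hypothetical helper over \(\mathbb {Z}[y]\) (coefficient lists, lowest degree first); it assumes the row indexing \(H_n, H_{n-1}, \ldots , H_1\) for the first n rows and \(y^{m-i} b\) for the remaining ones, which is the convention that reproduces Example 2:

```python
def hybrid_bezout(a, b):
    """Hybrid Bezout matrix of a, b in Z[y], a[i] being the coefficient of
    y^i and deg(a) >= deg(b). Rows 1..n are built from H_n down to H_1,
    rows n+1..m from y^{m-i} * b (1-based row index i)."""
    m, n = len(a) - 1, len(b) - 1

    def coeff(p, d):                      # coefficient of y^d in the list p
        return p[d] if 0 <= d < len(p) else 0

    def mul(u, v):                        # product of two coefficient lists
        w = [0] * (len(u) + len(v) - 1)
        for s, us in enumerate(u):
            for t, vt in enumerate(v):
                w[s + t] += us * vt
        return w

    def H(i):                             # the polynomial H_i of Definition 2
        f1 = [coeff(a, m - i + 1 + t) for t in range(i)]          # a_{m-i+1}..a_m
        f2 = [0] * (m - n) + [coeff(b, t) for t in range(n - i + 1)]
        f3 = [coeff(a, t) for t in range(m - i + 1)]              # a_0..a_{m-i}
        f4 = [coeff(b, n - i + 1 + t) for t in range(i)]          # b_{n-i+1}..b_n
        p, q = mul(f1, f2), mul(f3, f4)
        return [coeff(p, d) - coeff(q, d) for d in range(max(len(p), len(q)))]

    rows = []
    for i in range(1, m + 1):
        poly = H(n - i + 1) if i <= n else [0] * (m - i) + list(b)
        rows.append([coeff(poly, m - j) for j in range(1, m + 1)])
    return rows
```

Running it on \(a = 5 y^5 + y^3 + 2 y + 1\) and \(b = 3 y^3 + y + 3\) returns exactly the matrix \(\text {HBez}(a, b)\) displayed above.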

Diaz-Toca and Gonzalez-Vega examined the relations between Bézout matrices and subresultants in [9]. Hou and Wang studied the application of the Hybrid Bézout matrix to the calculation of subresultants in [13].

Notation 1

Let \(J_{m}\) denote the backward identity matrix of order m and let B and H be defined as follows:

$$\begin{aligned} B := J_{m} \, \text {Bez}(a, b) \, J_{m} = \left( \begin{array}{ccc} c_{m-1,m-1} & \cdots & c_{m-1, 0} \\ \vdots & \ddots & \vdots \\ c_{0, m-1} & \cdots & c_{0, 0} \end{array} \right) , \end{aligned}$$
$$\begin{aligned} H := J_{m} \, \text {HBez}(a, b) = \left( \begin{array}{ccc} h_{m-1,0} & \cdots & h_{m-1, m-1} \\ \vdots & \ddots & \vdots \\ h_{0, 0} & \cdots & h_{0,m-1} \end{array} \right) . \end{aligned}$$

Now, we can state how to compute the subresultants from Bézout matrices as follows.

Theorem 2

For polynomials \(a = \sum _{i=0}^{m} a_i y^i\) and \(b = \sum _{i=0}^{n} b_i y^i \) in \(\mathbb {B}[y]\), the k-th subresultant of \(a, b\), i.e., \(S_{k}(a, b)\), can be obtained from:

$$\begin{aligned} (-1)^{(m-1)(m-k-1)/2} a_{m}^{m-n} S_{k}(a, b) = \sum _{i=0}^{k} B_{m-k, k-i} \ y^{i}, \end{aligned}$$

where \(B_{m-k, i}\) for \(0 \le i \le k\) denotes the \((m-k) \times (m-k)\) minor extracted from the first \(m-k\) rows, the first \(m-k-1\) columns and the (\(m-k+i\))-th column of B.

Proof. [2, Theorem 2.3]

Theorem 3

For the same polynomials \(a, b \in \mathbb {B}[y]\), the k-th subresultant of \(a, b\), i.e., \(S_{k}(a, b)\), can be obtained from:

$$\begin{aligned} (-1)^{(m-1)(m-k-1)/2} S_{k}(a, b) = \sum _{i=0}^{k} H_{m-k, k-i} \ y^{i}, \end{aligned}$$

where \(H_{m-k, i}\) for \(0 \le i \le k\) denotes the \((m-k) \times (m-k)\) minor extracted from the first \(m-k\) rows, the first \(m-k-1\) columns and the (\(m-k+i\))-th column of H.

Proof. [2, Theorem 2.3]

Abdeljaoued et al. [2] study this relation between subresultants and Bézout matrices further; Theorem 4 states the main result of [2] that we rely on.

Theorem 4

For the same polynomials \(a, b \in \mathbb {B}[y]\), the k-th subresultant of \(a, b\) can be obtained from the following \(m \times m\) matrices, where \(\tau = (m-1)(m-k-1)/2\):

$$\begin{aligned} (-1)^{\tau } a_{m}^{m-n} S_{k}(a, b) = (-1)^{k} \left| \begin{matrix} c_{m-1, m-1} & c_{m-1, m-2} & \cdots & \cdots & \cdots & c_{m-1, 0} \\ \vdots & \vdots & \cdots & \cdots & \cdots & \vdots \\ c_{k, m-1} & c_{k, m-2} & \cdots & \cdots & \cdots & c_{k, 0} \\ & & 1 & -y & & \\ & & & \ddots & \ddots & \\ & & & & 1 & - y \end{matrix}\right| , \end{aligned}$$
$$\begin{aligned} (-1)^{\tau } S_{k}(a, b) = (-1)^{k}\left| \begin{matrix} h_{m-1, 0} & h_{m-1, 1} & \cdots & \cdots & \cdots & h_{m-1, m-1} \\ \vdots & \vdots & \cdots & \cdots & \cdots & \vdots \\ h_{k, 0} & h_{k, 1} & \cdots & \cdots & \cdots & h_{k, m-1} \\ & & 1 & -y & & \\ & & & \ddots & \ddots & \\ & & & & 1 & - y \end{matrix}\right| . \end{aligned}$$

Proof. [2, Theorem 2.4]

The advantage of the aforementioned method is that one can compute the entire subresultant chain in a bottom-up fashion. This process starts by computing the determinant of the matrix H (or B) of Notation 1, to calculate \(S_{0} (a, b)\), the resultant of \(a, b\), and then updates the last k rows of H (or B) to calculate \(S_{k}(a, b)\) for \(1 \le k \le n\).

Example 3

Consider the polynomials \(a = -5 y^4 x + 3 y x - y - 3 x + 3\) and \(b = -2 y^3 x + 3 y^3 - x\) in \(\mathbb {Z}[x, y]\) where \(x < y\). From Definition 2, the Hybrid Bézout matrix of \(a, b\) is the matrix A from Example 1. Recall from Example 1 that the determinant of this matrix can be calculated using the fraction-free LU decomposition schemes. Theorem 4, for \(k = 0\), yields that,

$$\begin{aligned} S_0(a, b)&= -1763 x^7 + 7881 x^6 - 19986 x^5 + 35045 x^4 - 41157 x^3 \\&\ \ \ + 30186 x^2 - 12420 x + 2187. \end{aligned}$$

For \(k = 1\), one can calculate \(S_1(a, b)\) from the determinant of:

$$\begin{aligned} H^{(1)} = \begin{pmatrix} -2 x + 3 & 0 & 0 & -x \\ 0 & 0 & 11 x^2 - 11 x + 3 & -3 (x - 1) (2 x - 3) \\ 0 & 11 x^2 - 11 x + 3 & -3 (x - 1) (2 x - 3) & 0 \\ 0 & 0 & 1 & -y \\ \end{pmatrix}, \end{aligned}$$

that is,

$$\begin{aligned} S_{1}(a, b)&= -242 x^5 y + 132 x^5 + 847 x^4 y - 660 x^4 - 1100 x^3 y + 1257 x^3 \\&\ \ \ + 693 x^2 y - 1134 x^2 - 216 x y + 486 x + 27 y - 81. \end{aligned}$$

We can continue calculating subresultants of higher indices by updating the matrix \(H^{(1)}\). For instance, the 2nd and 3rd subresultants are obtained, respectively, from the determinants of:

$$\begin{aligned} H^{(2)} = \begin{pmatrix} -2 x + 3 & 0 & 0 & -x \\ 0 & 0 & 11 x^2 - 11 x + 3 & -3 (x - 1) (2 x - 3) \\ 0 & 1 & -y & 0 \\ 0 & 0 & 1 & -y \\ \end{pmatrix}, \end{aligned}$$

and,

$$\begin{aligned} H^{(3)} = \begin{pmatrix} -2 x + 3 & 0 & 0 & -x \\ 1 & -y & 0 & 0 \\ 0 & 1 & -y & 0 \\ 0 & 0 & 1 & -y \\ \end{pmatrix}, \end{aligned}$$

which are,

$$\begin{aligned} S_2(a, b)&= 22 y x^3 - 12 x^3 - 55 y x^2 + 48 x^2 + 39 y x - 63 x - 9 y + 27, \\ S_3(a, b)&= -2 y^3 x + 3 y^3 - x. \end{aligned}$$

We further studied the performance of computing subresultants from Theorem 4 in comparison to the Hybrid Bézout matrix in Definition 2 for multivariate polynomials with integer coefficients. In our implementation, we took advantage of the FFLU schemes reviewed in Sect. 2 to compute the determinants of these matrices, using the smart-pivoting technique in parallel; see Sect. 3.3 for implementation details and results.

3.2 Speculative Bézout Subresultant Algorithms

In Example 3, the Hybrid Bézout matrix was used to compute subresultants of two polynomials in \(\mathbb {Z}[x, y]\). We constructed the square matrix H from Notation 1 and updated the last \(k \ge 0\) rows following Theorem 4. Thus, the k-th subresultant could be directly computed from the determinant of this matrix.

Consider solving systems of polynomial equations by triangular decomposition, and particularly, by regular chains. This method uses a Regular GCD subroutine (see [8]) which requires the computation of subresultants in a bottom-up fashion: for multivariate polynomials \(a, b\) (viewed as univariate in their main variable) compute \(S_0(a, b)\), then possibly \(S_1(a, b)\), then possibly \(S_2(a, b)\), etc., until a regular GCD is found. This bottom-up approach for computing subresultant chains is discussed in [5].

In the approach explained in the previous section, we would call the determinant algorithm twice for \(H^{(0)} :=H\) and \(H^{(1)}\) to compute \(S_0, S_{1}\) respectively. Here, we study a speculative approach to compute both \(S_0\) and \(S_1\) at the cost of computing only one of them. This approach can also be extended to compute any two successive subresultants \(S_k, S_{k+1}\) for \(2 \le k < \deg (b, x_n)\).

To compute \(S_0, S_1\) of polynomials \(a = -5 y^4 x + 3 y x - y - 3 x + 3\) and \(b = -2 y^3 x + 3 y^3 - x\) in \(\mathbb {Z}[x, y]\) from Example 3, consider the \((m+1) \times m\) matrix, with \(m=4\), derived from the Hybrid Bézout matrix of ab, \(H^{(0, 1)} = \)

$$ \left( \begin{array}{cccc} -2 x + 3 & 0 & 0 & -x \\ 0 & 0 & 11 x^2 - 11 x + 3 & -3 (x - 1) (2 x - 3) \\ 0 & 11 x^2 - 11 x + 3 & -3 (x - 1) (2 x - 3) & 0 \\ \mathbf {11 x^2 - 11 x + 3} & \mathbf {-3 (x - 1) (2 x - 3)} & \mathbf {0} & \mathbf {0} \\ 0 & 0 & 1 & -y \\ \end{array}\right) \!. $$

In this matrix, the first three rows are identical to the first three rows of \(H^{(0)}\) and \(H^{(1)}\), while the 4th (bold) row is the 4th row of \(H^{(0)}\) and the 5th (italicized) row is the 4th row of \(H^{(1)}\). A deeper look into the determinant algorithm reveals that the Gaussian (row) elimination applied to the first three rows in each iteration of the fraction-free LU decomposition is the same for both \(H^{(0)}\) and \(H^{(1)}\); the only difference lies in the 4th row.

Hence, by managing these row eliminations in the fraction-free LU decomposition, we can compute the determinants of both \(H^{(0)}\) and \(H^{(1)}\) from \(H^{(0,1)}\), calling the FFLU algorithm only once. Indeed, when this algorithm comes to eliminate the last rows of \(H^{(0)}\) and \(H^{(1)}\), it processes the last two rows of \(H^{(0, 1)}\) separately and returns two denominators, corresponding to \(S_0\) and \(S_1\).
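The effect of sharing eliminations can be illustrated over the integers: stack the common rows once, append every candidate last row, and never pick a pivot from a candidate row. `speculative_bareiss` below is a hypothetical sketch (no pivoting, for brevity):

```python
def speculative_bareiss(shared, cand_rows):
    """Speculative determinant sketch: `shared` holds the n-1 rows that the
    square matrices have in common; each entry of `cand_rows` is one possible
    last row. A single Bareiss elimination of the shared block reduces every
    candidate row, so all determinants cost one elimination.
    Assumes the shared pivots are non-zero."""
    n = len(shared) + 1                       # the square matrices are n x n
    M = [r[:] for r in shared] + [r[:] for r in cand_rows]
    denom = 1
    for k in range(n - 1):                    # pivot rows are shared rows only
        piv = M[k][k]
        for i in range(k + 1, len(M)):
            for j in range(k + 1, n):
                M[i][j] = (M[i][j] * piv - M[i][k] * M[k][j]) // denom
            M[i][k] = 0
        denom = piv
    # the last entry of each reduced candidate row is det([shared; candidate])
    return [M[len(shared) + t][n - 1] for t in range(len(cand_rows))]
```

Each entry of the result is the determinant of the matrix formed by the shared rows plus one candidate row; both determinants come at the cost of a single elimination of the shared block, which is the point of the speculative scheme.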

We can further extend this speculative approach to compute \(S_2\) and \(S_3\) by updating the matrix \(H^{(0, 1)}\) to get the \((m+3) \times m\) matrix \(H^{(2, 3)}\) =

[The \((m+3) \times m\) matrix \(H^{(2, 3)}\) is given as a figure in the original.]

Therefore, to calculate subresultants of index 2 and 3, we should respectively consider the 2nd (bold) and 5th (italicized) rows of \(H^{(2, 3)}\) in the fraction-free LU decomposition while ignoring the 3rd and 4th (strikethrough) rows. An adaptation of the FFLU algorithm can then modify \(H^{(2, 3)}\) as follows to return \(d_{(2)}\), ignoring the 5th and strikethrough rows.

[The corresponding reduced matrix is given as a figure in the original.]

where \(d_{(2)} = -22 x^3 y + 12 x^3 + 55 x^2 y - 48 x^2 - 39 x y + 63 x + 9 y - 27\) and \(S_2 = - d_{(2)}\). Note that the 2nd and 6th rows are swapped to find a proper pivot.

The adapted FFLU algorithm can also modify \(H^{(2, 3)}\) to rather return \(d_{(3)}\), ignoring the 2nd (bold) and strikethrough rows,

[The corresponding reduced matrix is given as a figure in the original.]

where \(d_{(3)} = - 2 x y^3 + 3 y^3 - x\) and \(S_3 = d_{(3)}\).

Generally, to compute subresultants of index k and \(k+1\), one can construct the matrix \(H^{(k, k+1)}\) from the previously constructed \(H^{(k-2, k-1)}\) for \(k > 1\). This recycling of previous information makes computing the next subresultants of index k and \(k+1\) much more efficient, as discussed below. We proceed with an adapted FFLU algorithm over:

  • the first \(m-k-1\) rows,

  • the bold row for computing \(S_k\), or the italicized row for computing \(S_{k+1}\), and

  • the last k rows

of matrix \(H^{(k, k+1)} \in \mathbb {B}^{(m+k+1) \times m}\) with \(\mathbb {B}= \mathbb {Z}{[x_1, \ldots , x_v]}\),

figure j
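This row selection can be made concrete with a small sketch. It assumes the (0-based) layout suggested by the \(H^{(2,3)}\) example above: \(m-k-1\) shared rows first, then the bold row, then the k strikethrough rows, then the italicized row, then the last k rows, for \(m+k+1\) rows in total; the function name is hypothetical.

```python
def rows_for_subresultant(m, k, index):
    """0-based row indices of H^(k,k+1) fed to the adapted FFLU algorithm
    to compute S_index, for index in {k, k+1}.  Assumes the row layout of
    the H^(2,3) example: m-k-1 shared rows, the bold row, k strikethrough
    rows, the italicized row, then the last k rows."""
    shared = list(range(m - k - 1))       # first m-k-1 rows
    bold = m - k - 1                      # row used for S_k
    italic = bold + 1 + k                 # skip the k strikethrough rows
    last = list(range(m + 1, m + k + 1))  # final k rows
    chosen = bold if index == k else italic
    return shared + [chosen] + last       # m rows of an m-column matrix
```

For \(m = 4\) and \(k = 2\) this selects rows \(\{1, 2, 6, 7\}\) (1-based) for \(S_2\) and rows \(\{1, 5, 6, 7\}\) for \(S_3\), matching the description of \(H^{(2,3)}\) above.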

As seen in the last example, the FFLU algorithm may, depending on the input polynomials, produce two completely different submatrices when computing \(d_{(2)}\) and \(d_{(3)}\). Thus, for some \(k > 1\), the cost of computing \(S_{k}, S_{k+1}\) speculatively from \(H^{(k, k+1)}\) is not necessarily lower than that of computing them successively from \(H^{(k)}\) and \(H^{(k+1)}\).

We improve the performance of computing \(S_{k}, S_{k+1}\) speculatively by caching, and then reusing, intermediate data calculated while computing \(S_{k-2}, S_{k-1}\) from \(H^{(k-2, k-1)}\). In this approach, the adapted FFLU algorithm returns \(d_{(k-2)}\) and \(d_{(k-1)}\) together with the reduced form of \(H^{(k-2, k-1)}\) used to compute \(d_{(k-1)}\), as well as the list of permutation patterns and pivots.

Therefore, we can utilize \(H^{(k-2, k-1)}\) to construct \(H^{(k, k+1)}\). In addition, if the first \(\delta :=m-k-1\) pivots are picked from the first \(\delta \) rows of \(H^{(k-2, k-1)}\), then the first \(\delta \) row eliminations of \(H^{(k, k+1)}\) can be skipped entirely, by recycling the first \(\delta \) rows of the cached reduced matrix together with the stored permutation patterns and pivots.
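A minimal sketch of this caching scheme, again over \(\mathbb{Z}\) with Bareiss-style fraction-free elimination and assuming nonzero pivots (so the cached pivots really do come from the shared prefix and no permutations are needed): the prefix of shared rows is reduced once, and each later matrix that starts with the same rows only pays for its own tail. Both function names are hypothetical.

```python
def prefix_eliminate(prefix, ncols):
    """Fraction-free (Bareiss) elimination among the shared prefix rows.
    Returns the reduced prefix and its pivots; this is the data cached
    and reused for every later matrix starting with the same rows."""
    P = [r[:] for r in prefix]
    prev, pivots = 1, []
    for k in range(len(P)):
        for i in range(k + 1, len(P)):
            for j in range(k + 1, ncols):
                P[i][j] = (P[i][j] * P[k][k] - P[i][k] * P[k][j]) // prev
            P[i][k] = 0
        pivots.append(P[k][k])
        prev = P[k][k]
    return P, pivots

def det_with_cache(P, pivots, tail):
    """Determinant of (prefix + tail), reusing the cached reduction."""
    d, n = len(P), len(P) + len(tail)
    T, prev = [r[:] for r in tail], 1
    for k in range(d):              # replay the cached steps on the tail rows
        for t in T:
            for j in range(k + 1, n):
                t[j] = (t[j] * pivots[k] - t[k] * P[k][j]) // prev
            t[k] = 0
        prev = pivots[k]
    for k in range(len(T)):         # then eliminate among the tail rows
        c = d + k
        for i in range(k + 1, len(T)):
            for j in range(c + 1, n):
                T[i][j] = (T[i][j] * T[k][c] - T[i][c] * T[k][j]) // prev
            T[i][c] = 0
        prev = T[k][c]
    return T[-1][-1]
```

Calling `prefix_eliminate` once and `det_with_cache` with the tails of successive matrices mirrors the reuse of the first \(\delta\) reduced rows of \(H^{(k-2,k-1)}\) when processing \(H^{(k,k+1)}\); the real algorithm must additionally track permutation patterns, since polynomial pivots can vanish.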

3.3 Experimentation

In this section, we compare the subresultant algorithms based on the (Hybrid) Bézout matrix against Ducos' subresultant chain algorithm in BPAS and in Maple 2020. In BPAS, our optimized Ducos algorithm (denoted OptDucos) is detailed in [5].

Table 4 and Table 5 show the running time of plain and speculative algorithms for randomly generated, non-zero, sparse polynomials \(a, b \in \mathbb {Z}[x_1, x_2, \ldots , x_6]\) with \(x_6< \cdots< x_2 < x_1\), \(\deg (a, x_1) = \deg (b, x_1) + 1 = d\), and \(\deg (a, x_i) = \deg (b, x_i) = 1\) for \(2 \le i \le 6\). Table 6 and Table 7 show the running time of plain, speculative, and caching subresultant schemes for randomly generated, non-zero, sparse polynomials \(a, b \in \mathbb {Z}[x_1, x_2, \ldots , x_7]\) with \(x_7< \cdots< x_2 < x_1\), \(\deg (a, x_1) = \deg (b, x_1) + 1 = d\), and \(\deg (a, x_i) = \deg (b, x_i) = 1\) for \(2 \le i \le 7\).

Note that the Bézout algorithm in Maple computes only the resultant of a, b (\(S_0(a, b)\)), whereas the Ducos algorithms in both Maple and BPAS compute the entire subresultant chain. In BPAS, we have the following:

  1. Bézout \({ (\rho = 0)}\) calculates the resultant (\(S_0(a, b)\)) via the determinant of the Hybrid Bézout matrix of a, b;

  2. Bézout \({ (\rho = 1)}\) calculates \(S_1(a, b)\) following Theorem 4 from the Hybrid Bézout matrix of a, b;

  3. SpecBézout \({ (\rho = 0)}\) calculates \(S_0(a, b), S_1(a, b)\) speculatively using \(H^{(0, 1)}\);

  4. SpecBézout \({ (\rho = 2)}\) calculates \(S_2(a, b), S_3(a, b)\) speculatively using \(H^{(2, 3)}\);

  5. SpecBézout\(_{\texttt {cached}}\) \({ (\rho = 2)}\) calculates \(S_2(a, b), S_3(a, b)\) speculatively via \(H^{(2, 3)}\) and the cached information calculated in SpecBézout \({ (\rho = 0)}\);

  6. SpecBézout\(_{\texttt {cached}}\) \({ (\rho = \texttt {\small all})}\) calculates the entire subresultant chain using the speculative algorithm and caching.

To compute subresultants from Bézout matrices in Maple, we use the command SubresultantChain( ... , 'representation'='BezoutMatrix') from the RegularChains library. Our Bézout algorithm is up to \(3{\times }\) faster than the Maple implementation when calculating only \(S_0\). Moreover, our results show that the Bézout algorithms outperform Ducos' algorithm in both BPAS and Maple for sparse polynomials with many variables.

Tables 4 and 6 show that the cost of computing the subresultants \(S_0, S_1\) speculatively is comparable to the running time of computing only one of them. Tables 5 and 7 indicate the importance of recycling cached data when computing higher subresultants speculatively. Our Bézout algorithms can calculate all subresultants speculatively in a running time comparable to that of Ducos' algorithm.

Table 4. Comparing the execution time (in seconds) of subresultant algorithms based on Bézout matrix for randomly generated, non-zero, sparse polynomials \(a, b \in \mathbb {Z}[x_6< \ldots < x_1]\), \(\deg (a, x_1) = \deg (b, x_1) + 1 = d\), and \(\deg (a, x_i) = \deg (b, x_i) = 1\) for \(2 \le i \le 6\)
Table 5. Comparing the execution time (in seconds) of speculative subresultant algorithms for polynomials in Table 4
Table 6. Comparing the execution time (in seconds) of subresultant algorithms based on Bézout matrix for randomly generated, non-zero, sparse polynomials \(a, b \in \mathbb {Z}[x_7< \ldots < x_1]\), \(\deg (a, x_1) = \deg (b, x_1) + 1 = d\), and \(\deg (a, x_i) = \deg (b, x_i) = 1\) for \(2 \le i \le 7\)
Table 7. Comparing the execution time (in seconds) of speculative and caching subresultant algorithms for polynomials in Table 6
Table 8. Comparing time (in seconds) to solve polynomial systems with \(\texttt {nvar} \ge 5\); system names come from a test suite detailed in [4]

As described in Sect. 3.1, polynomial system solving benefits from computing regular GCDs in a bottom-up approach. Using a test suite of over 3000 polynomial systems, coming from the literature and collected from Maple user-data (see [4, Section 6]), we compare the benefits of (speculative) Bézout methods for computing subresultants versus BPAS's optimized Ducos algorithm. Table 8 shows this data for some systems of the test suite with at least 5 variables. Table 9 shows systems which are very challenging to solve, requiring at least 50 s. For these hard systems, speculative methods achieved a speed-up of up to \(1.6{\times }\) compared to Ducos' method. Note that, in some cases, the regular GCD has high degree and is thus equal to a subresultant of high index. In such cases, Ducos' method, which computes the entire subresultant chain, may be more efficient than repeated calls to the speculative method.

Table 9. Comparing time (in seconds) to solve "hard" polynomial systems with \(\texttt {nvar} \ge 5\); system names come from a test suite detailed in [4]