1 Introduction

QR factorization is widely used throughout mathematics, engineering, and computer science. Specifically, QR is regularly used to solve linear least squares problems [1, 2], reveal the rank of matrices [3, 4, 5], solve highly ill-conditioned linear systems [6, 7], compute eigenvalues [8, 9, 10], and as a subroutine in quadratic programming [11, 12, 13].

In the context of quadratic programming, QR factorization was shown to be an efficient tool for solving quadratic programs with high (floating-point) accuracy [12]. Likewise, in a parallel effort, Gärtner and Schönherr [14] developed and implemented an exact quadratic programming algorithm for computational geometry problems; notably, this exact QP solver was based on an extension of an exact linear programming solver, not on QR factorization. This paper bridges the gap between these two disjoint approaches by presenting an exact QR factorization, which may subsequently be used to develop an exact QP algorithm. Specifically, we present a general-purpose roundoff-error-free (REF) QR factorization framework that can be used out-of-the-box to solve numerically challenging applications exactly and reliably.

That said, the benefits of REF QR extend well beyond quadratic programming because general-purpose (floating-point) QR factorizations have been shown to suffer from numerical issues in some applications. For example, if A is very ill-conditioned, the Q matrix, when computed with traditional floating-point approaches, tends to lose orthogonality and thus may lead to incorrect solutions [6, 15]. Indeed, in practice, a previous implementation of (floating-point) QR factorization in LAPACK was shown to numerically fail in rank-revealing computations [16]. Though this shortcoming was successfully addressed therein, another far more recent example, in the context of deep neural networks, illustrates that state-of-the-art, out-of-the-box floating-point QR factorization lacks accuracy when solving some least-squares problems [17]. Again, though the issue was addressed in that context (via QR code specialized for their application), the authors state, “Our experience seems to suggest that presently with neural engines, matrix factorizations (QR, LU, Cholesky) are best to be co-designed with their applications (linear solver, least square, orthogonalization, SVD, etc.) to achieve high performance and adequate accuracy and reliability.” While this suggestion is worthwhile, our REF QR factorization may be directly used to benchmark specialized floating-point matrix factorizations after they are fine-tuned for specific applications.

In pursuit of this, we introduce two new QR factorizations of the form \(A = QDR\), where the columns of Q are pairwise orthogonal, D is diagonal, and R is upper trapezoidal. Notably, both Q and R are comprised exclusively of integer entries with polynomially-bounded bit-lengths, while D (whose entries are reciprocals of integers, also with polynomially-bounded bit-lengths) never needs to be explicitly computed or applied when using the REF QR factorization. Given a matrix \(A\in {\mathbb {Z}}^{m\times n}\), with \(m\ge n\), we present two variants of the REF QR factorization: (1) the “thin REF QR factorization”, in which \(Q\in {\mathbb {Z}}^{m \times n}\) and \(R\in {\mathbb {Z}}^{n \times n}\) is the REF Cholesky [18, 19] factor of \(A^T A\), and (2) the “standard REF QR factorization”, in which \(Q\in {\mathbb {Z}}^{m \times m}\) and \(R\in {\mathbb {Z}}^{m \times n}\) is the REF Cholesky factor of \(A^T A\) appended with \(m-n\) rows of zeros. In addition to proving the existence of the REF QR factorizations, we derive several of their properties, analogous to those of the traditional QR factorizations, and derive the algorithms to compute them. Moreover, when A has full column rank, we show that solving the linear system \(A{\textbf{x}}= {\textbf{b}}\) using the REF QR factorization provides the exact least squares solution (i.e., the exact minimizer of the two-norm of \(A{\textbf{x}}- {\textbf{b}}\)). Conversely, when A is rank deficient, we show that REF QR can be used to compute an exact basic solution to the system \(A{\textbf{x}}= {\textbf{b}}\).

We note that Zhou and Jeffrey [20] were the first to present an exact thin QR factorization called fraction-free QR. The contributions highlighted above significantly expand on their work: we develop both thin and standard QR factorizations, prove that they have properties analogous to traditional QR factorizations, and show how to use them to solve both full-rank and rank-deficient linear systems. Thus, another contribution of this work is to generalize the ideas of [20] and relate them to the broader linear algebra body of knowledge.

2 Background

This section briefly reviews the theoretical background needed to derive REF QR. Hereafter, it is assumed that \(A\in {\mathbb {Z}}^{m \times n}\) with \(m \ge n\). Note that if \(n>m\), the derived properties apply to \(A^T\); if A is decimal or rational, it can be made integral by multiplying by the appropriate power of 10 or by the least common multiple of the denominators, respectively. Lastly, as in several authoritative linear algebra and numerical analysis textbooks (e.g., [6, 15, 21]), we utilize the MATLAB notation, where the (i, j) element of A, the ith row of A, and the ith column of A are denoted as A(i, j), A(i, :), and A(:, i), respectively, while the submatrix at the intersection of (consecutive) rows \(i_1\) to \(i_2\) and (consecutive) columns \(j_1\) to \(j_2\) is denoted as \(A(i_1:i_2,j_1:j_2)\). Sections 2.1, 2.2, 2.3, and 2.4 review the traditional QR factorizations, the Pursell and Trimble algorithm to compute a QR factorization via Gaussian elimination, integer-preserving Gaussian elimination, and REF Cholesky, respectively.

2.1 Traditional QR factorization

Given a matrix \(A\in {\mathbb {R}}^{m \times n}\), a QR factorization factors A into the product \(A = QR\), where Q is orthonormal and R is upper trapezoidal. There are two classes of QR factorization: thin QR and standard QR. Thin QR factorization is typically computed via Gram-Schmidt orthogonalization [15, 22] and results in a rectangular \(Q\in {\mathbb {R}}^{m \times n}\) and a square \(R\in {\mathbb {R}}^{n \times n}\). On the other hand, the standard QR factorization, which is typically referred to just as QR factorization, is computed via either Givens rotations [23] or Householder reflections [24] and results in a square \(Q\in {\mathbb {R}}^{m \times m}\) and a rectangular \(R\in {\mathbb {R}}^{m \times n}\). These traditional QR factorizations have several properties of interest to this paper:

  1.

    Q is an orthonormal matrix. In thin QR, Q is left orthonormal, i.e., \(Q^TQ = I\); in standard QR, Q is fully orthonormal, i.e., \(Q Q^T = Q^T Q = I\) and thus \(Q^{-1} = Q^T\).

  2.

    R contains the Cholesky factor of \(A^T A\). Specifically, in thin QR, R is square and is the Cholesky factor of \(A^T A\), i.e., \(R^T R = A^T A\); in standard QR, R is rectangular and its first n rows are the Cholesky factor of \(A^T A\) (its last \(m-n\) rows are zeros).

For an in-depth look at QR factorization, we refer the reader to [15].
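For concreteness, the following snippet (a hypothetical illustration using numpy, not part of [15]) exhibits the two classes and the two properties above:

    import numpy as np

    A = np.random.rand(5, 3)
    Qt, Rt = np.linalg.qr(A, mode='reduced')    # thin QR:     Q is 5x3, R is 3x3
    Qs, Rs = np.linalg.qr(A, mode='complete')   # standard QR: Q is 5x5, R is 5x3
    assert np.allclose(Qt.T @ Qt, np.eye(3))    # Property 1: left orthonormal
    assert np.allclose(Qs @ Qs.T, np.eye(5))    # Property 1: fully orthonormal
    assert np.allclose(Rt.T @ Rt, A.T @ A)      # Property 2: R is the Cholesky factor of A^T A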

2.2 QR factorization via Gaussian elimination

The three standard approaches to computing a QR factorization (Gram-Schmidt, Givens, and Householder) are orthogonalization algorithms that construct the orthogonal matrix \({\hat{Q}}\). Pursell and Trimble present a totally different approach to computing a QR factorization [25], which is based on Gaussian elimination. Specifically, they show that performing Gaussian elimination on the matrix \([A^T A \mid A^T]\) to obtain its row echelon form yields \([{\hat{R}}\mid {\hat{Q}}^T]\), where \(A={\hat{Q}}{\hat{R}}\) is a scaled version of the thin (floating-point) QR factorization of A (scaled because the columns of \({\hat{Q}}\) are not normalized). While the computational complexity of this approach is not competitive with that of the other three approaches, Pursell and Trimble’s approach is quite useful in proving several of the theorems in this paper.
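To make the construction concrete, the following minimal sketch (our own illustration, not code from [25]) performs exact rational Gaussian elimination on \([A^T A \mid A^T]\) for a small example and checks that the columns of \({\hat{Q}}\) come out pairwise orthogonal; normalizing those columns (and scaling the rows of \({\hat{R}}\) accordingly) recovers the usual thin QR factors:

    from fractions import Fraction

    A = [[1, 1], [1, 0], [0, 1]]                 # small 3x2 example
    m, n = 3, 2
    # build [A^T A | A^T] over exact rationals
    M = [[Fraction(sum(A[k][i] * A[k][j] for k in range(m))) for j in range(n)]
         + [Fraction(A[k][i]) for k in range(m)] for i in range(n)]
    for k in range(n - 1):                       # plain Gaussian elimination
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            M[i] = [M[i][j] - f * M[k][j] for j in range(n + m)]
    # the left n columns now hold Rhat (upper triangular); the right m columns hold Qhat^T
    for i in range(n):
        for j in range(i + 1, n):                # columns of Qhat are pairwise orthogonal
            assert sum(M[i][n + k] * M[j][n + k] for k in range(m)) == 0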

2.3 IPGE

Integer-preserving Gaussian elimination (IPGE) is an exact variant of Gaussian elimination used for solving a system of linear equations \(A{\textbf{x}}= {\textbf{b}}\). Given a full rank matrix \(A\in {\mathbb {Z}}^{n \times n}\) and right-hand side vector \({\textbf{b}}\), denote the kth iteration IPGE matrix as \(A^{(k)}\) for \(k=0,\ldots ,n\) (where \(A^{(0)} \triangleq A\)). Assuming there are no row permutations, let \(a_{i,j}^{(k)}\) and \(\rho ^k \triangleq a_{k,k}^{(k-1)}\) denote the individual entries of \(A^{(k)}\) and the kth pivot element, respectively, for \(1 \le i \le n\), \(1 \le j \le n\), and \(0 \le k \le n\) (with \(\rho ^0 \triangleq 1\)). At iteration k, the IPGE algorithm computes the entries \(a_{i,j}^{(k)}\) as follows:

$$\begin{aligned} a_{i,j}^{(k)} = {\left\{ \begin{array}{ll} a_{i,j}^{(k-1)} &{} \quad \text {if }i = k,\\ \frac{ \rho ^k a_{i,j}^{(k-1)} - a_{k,j}^{(k-1)} a_{i,k}^{(k-1)}}{ \rho ^{k-1}} &{}\quad \text {otherwise} \end{array}\right. } \end{aligned}$$
(1)

Note that Equation (1) differs from traditional Gaussian elimination only in the denominator (in traditional Gaussian elimination, the division is by the current pivot, \(\rho ^{k}\), instead of the previous pivot). This seemingly minor modification leads to the following two key properties:

Lemma 2.1

Throughout all of the IPGE algorithm, the divisions in Equation (1) are guaranteed to be integral (i.e., to have a zero remainder) [26, 27, 28].

Lemma 2.2

Given a matrix A, with \(\sigma \triangleq \max \limits _{i,j} \mid a_{i,j}^{(0)} \mid \), the maximum bit length to store any IPGE entry, \(\beta _{max}\), is upper-bounded polynomially as follows [18, 29]:

$$\begin{aligned} {\left\{ \begin{array}{ll} \beta _{max} \le \lceil n\log (\sigma ) \rceil &{}\quad \text {if }A\text { is SPD,} \\ \beta _{max} \le \lceil n \log (\sqrt{n}\sigma ) \rceil &{}\quad \text {if }A\text { is not SPD} \end{array}\right. } \end{aligned}$$

This polynomial bound on IPGE entries is generally pessimistic and is tight if and only if A is diagonal (when A is SPD) or A has orthogonal columns (when A is not SPD) [18, 29, 30, 31].
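The following minimal sketch (in Python, whose integers have arbitrary precision) implements the update (1) with the elimination restricted to rows below the pivot, the variant used by the factorizations in this paper, and checks Lemma 2.1’s integrality claim at every division:

    def ipge_echelon(M):
        """Reduce an integer matrix to row echelon form via IPGE (Bareiss)."""
        M = [row[:] for row in M]
        prev = 1                                       # rho^0 = 1
        for k in range(len(M) - 1):
            piv = M[k][k]                              # pivot rho^{k+1}, assumed nonzero
            for i in range(k + 1, len(M)):
                aik = M[i][k]
                for j in range(len(M[i])):
                    q, rem = divmod(piv * M[i][j] - M[k][j] * aik, prev)
                    assert rem == 0                    # Lemma 2.1: division is exact
                    M[i][j] = q
            prev = piv
        return M

    print(ipge_echelon([[2, 1, 1], [1, 2, 1], [1, 1, 2]]))  # [[2,1,1],[0,3,1],[0,0,4]]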

2.4 REF Cholesky

The REF Cholesky factorization [18], based on IPGE, factors an SPD matrix A into the product \(A = L D L^T\), where L is a lower triangular matrix comprised of integer entries and \(D = diag(\rho ^0 \rho ^1, \rho ^1 \rho ^2, \ldots , \rho ^{n-1} \rho ^n)^{-1}\). Notably, D is never needed to compute the factorization or when using the factorization to solve an SPD linear system. Along with the associated REF forward and backward substitution algorithms, REF Cholesky can exactly solve the SPD linear system \(A{\textbf{x}}= {\textbf{b}}\) exclusively in integer arithmetic. Additionally, left-looking and up-looking sparse REF Cholesky factorization algorithms were derived in [19]. Notably, these algorithms solve a sparse SPD linear system \(A{\textbf{x}}= {\textbf{b}}\) in asymptotically efficient time, meaning that the dominant cost is that of the arithmetic operations themselves, with no ancillary operations such as the greatest common divisor computations required by rational-arithmetic algorithms. Accordingly, these exact factorizations outperform competing rational-arithmetic LDL and unsymmetric exact factorizations [19].
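As a small illustration, the following sketch (reusing the ipge_echelon helper from the sketch in Sect. 2.3) computes the REF Cholesky factor \(R = L^T\) of an SPD integer matrix and verifies \(A = R^T D R\) exactly:

    from fractions import Fraction

    A = [[4, 2, 2], [2, 5, 3], [2, 3, 6]]       # small SPD integer example
    n = len(A)
    R = ipge_echelon(A)                         # integer upper triangular factor; L = R^T
    rho = [1] + [R[k][k] for k in range(n)]     # pivots rho^0, ..., rho^n
    # verify A == R^T D R with D = diag(1/(rho^{k-1} rho^k)) using exact rationals
    for i in range(n):
        for j in range(n):
            assert A[i][j] == sum(Fraction(R[k][i] * R[k][j], rho[k] * rho[k + 1])
                                  for k in range(n))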

3 Roundoff-error-free QR factorizations

This section derives two REF QR factorizations, with which a given matrix A is factored as \(A=QDR\), and proves their key properties. Specifically, Sects. 3.1 and 3.2 formally present the thin REF QR factorization (i.e., where \(Q\in {\mathbb {Z}}^{m \times n}\) and \(R\in {\mathbb {Z}}^{n \times n}\)) and the standard REF QR factorization (i.e., where \(Q\in {\mathbb {Z}}^{m \times m}\) and \(R\in {\mathbb {Z}}^{m \times n}\)), respectively; Sect. 3.3 presents algorithms to compute both factorizations along with bounds on their entries and complexity. Lastly, Sect. 3.4 establishes the relationship between the REF QR factorizations and the traditional (floating-point) QR factorizations.

3.1 Thin REF QR factorization

Theorem 3.1 presents the thin REF QR factorization and its key properties.

Theorem 3.1

Every full column rank integral matrix \(A\in {\mathbb {Z}}^{m \times n}\) has a unique integral factorization \(A = QDR\) with the following properties:

  (a)

    Q and R are both integral matrices; specifically, \(Q\in {\mathbb {Z}}^{m \times n}\) and \(R\in {\mathbb {Z}}^{n \times n}\).

  (b)

    DR is the REF Cholesky factorization of \(A^T A\); that is \(R^T D R = A^T A\), where R is an upper triangular integral matrix, and D is a diagonal matrix. Moreover, \(D=diag(\rho ^0 \rho ^1, \rho ^1 \rho ^2, \ldots , \rho ^{n-1} \rho ^n)^{-1}\), where \(\rho ^i\) is the ith pivot element of the REF Cholesky factorization of \(A^T A\), and thus \(D^{-1}\in {\mathbb {Z}}^{n \times n}\).

  (c)

    R can be computed as \(R = Q^T A\) (just like in the traditional QR factorization \(A = QR\)).

  (d)

    The columns of Q are pairwise orthogonal. Namely, \(Q^T Q = D^{-1}\).

Proof

We prove this by constructing the matrices Q and R and showing that they satisfy properties (a)-(d) and that \(A=QDR\). First, we obtain R by performing the REF Cholesky factorization on the SPD matrix \(A^T A\). Specifically, we factorize \(A^T A = R^T D R\), where R is the upper triangular integral REF Cholesky factor of \(A^T A\) and \(D = diag(\rho ^0 \rho ^1, \rho ^1 \rho ^2, \ldots , \rho ^{n-1} \rho ^n)^{-1}\). This factorization can be computed by performing IPGE operations on \(A^TA\) to reduce it to the upper triangular matrix R; these IPGE operations are equivalent to left-multiplying \(A^T A\) by \((R^TD)^{-1}\). Next, by left multiplying \([A^T A \mid A^T]\) by \((R^TD)^{-1}\), we obtain the matrix \([R \mid (R^T D)^{-1} A^T]\). Note that this matrix is integral because multiplying by \((R^TD)^{-1}\) is equivalent to performing the associated IPGE operations on the full matrix \([A^T A \mid A^T]\), which are guaranteed to be integral [26, 27]. Now letting \(Q^T = (R^T D)^{-1} A^T\) (i.e., \(Q = A (DR)^{-1}\)), it follows that \(QDR = A (DR)^{-1}(DR) = A\). We next show that the integral matrices R and Q, which satisfy properties (a) and (b), also satisfy properties (c) and (d).

Property (c) is proved by expanding the product \(Q^T A\) as \((R^T D)^{-1} A^T A\), then since \(A^T A = R^T D R\), we obtain:

$$\begin{aligned} Q^T A = (R^T D)^{-1} R^T D R = R. \end{aligned}$$

In a similar fashion, property (d) is proved by expanding the product \(Q^T Q\) as \((R^T D)^{-1} A^T A (DR)^{-1}\), then since \(A^T A = R^T D R\), we obtain:

$$\begin{aligned} Q^T Q = (R^T D)^{-1} R^T D R (DR)^{-1} = D^{-1}. \end{aligned}$$

Finally, the uniqueness of the thin REF QR follows from the following observations: (1) DR is unique (because it is the REF Cholesky factorization of \(A^T A\)) and (2) \(Q = A R^{-1} D^{-1}\) is a unique product (because A has full column rank). \(\square \)

3.2 Standard REF QR factorization

Similar to the thin REF QR factorization, the standard REF QR constructs the factorization \(A = Q D R\) where Q and R are comprised of exclusively integer entries. In contrast to the thin REF QR, the standard REF QR results in a square \(Q\in {\mathbb {Z}}^{m \times m}\) and an upper trapezoidal \(R \in {\mathbb {Z}}^{m \times n}\). Theorem 3.2 formally introduces this version of REF QR and proves some of its properties.

Theorem 3.2

Given a full column rank integral matrix \(A\in {\mathbb {Z}}^{m \times n}\), there exists an integral factorization \(A = Q D R\) with the following properties:

  (a)

    Q and R are both integral matrices. Specifically \(Q\in {\mathbb {Z}}^{m \times m}\) and \(R\in {\mathbb {Z}}^{m \times n}\).

  (b)

    The REF Cholesky factorization of \(A^T A\), \(A^T A = {\tilde{R}}^T {\tilde{D}}{\tilde{R}}\), is embedded in DR. Specifically, the first n rows of R and D(1 : n, 1 : n) are the REF Cholesky factors \({\tilde{R}}\) and \({\tilde{D}}\).

  (c)

    R can be computed as \(R = Q^T A\) (just like in the traditional QR factorization \(A = QR\)).

  (d)

    The columns of Q are pairwise orthogonal. Namely, \(Q^T Q = D^{-1}\) (note that \(D^{-1}\) is a diagonal integral matrix).

  (e)

    Q resembles an orthogonal matrix in that \(Q^{-1} = D Q^T\) (whereas, in a truly orthogonal matrix, \(Q^{-1} = Q^T\)).

Proof

To prove this theorem, we define several useful matrices. Without loss of generality, since A is full column rank, assume the first n rows of A are linearly independent. Construct the matrix \({\bar{A}}\triangleq \begin{bmatrix} A & \begin{matrix} \varvec{0} \\ I_{m-n} \end{matrix} \end{bmatrix} \triangleq [A \mid {\bar{I}}]\), where \(\varvec{0}\) is the zero matrix of the appropriate size. Let the thin REF QR decompositions of A and \({\bar{A}}\) be given as \(A={\tilde{Q}}{\tilde{D}}{\tilde{R}}\) and \({\bar{A}}={\bar{Q}}{\bar{D}}{\bar{R}}\), respectively.

From the sequence of equations

$$\begin{aligned} \begin{bmatrix} A&{\bar{I}}\end{bmatrix} = {\bar{A}}= {\bar{Q}}{\bar{D}}{\bar{R}}&= \begin{bmatrix} {\bar{Q}}_1&{\bar{Q}}_2 \end{bmatrix} \begin{bmatrix} {\bar{D}}_{11} &{} \varvec{0} \\ \varvec{0} &{} {\bar{D}}_{22} \end{bmatrix} \begin{bmatrix} {\bar{R}}_{11} &{} {\bar{R}}_{12} \\ \varvec{0} &{} {\bar{R}}_{22} \end{bmatrix} \\&= \begin{bmatrix} {\bar{Q}}_1{\bar{D}}_{11}{\bar{R}}_{11}&{\bar{Q}}_1{\bar{D}}_{11}{\bar{R}}_{12} + {\bar{Q}}_2{\bar{D}}_{22}{\bar{R}}_{22} \end{bmatrix} \end{aligned}$$

we make the following observations:

  1.

    Comparing the first and last terms, we obtain \(A={\bar{Q}}_1{\bar{D}}_{11}{\bar{R}}_{11}\). Furthermore, \({\bar{Q}}_1={\tilde{Q}}\), \({\bar{D}}_{11}={\tilde{D}}\), and \({\bar{R}}_{11}={\tilde{R}}\) because the thin REF QR factorization of \(A={\tilde{Q}}{\tilde{D}}{\tilde{R}}\) is unique.

  2.

    Let \(R = \begin{bmatrix} {\bar{R}}_{11} \\ \varvec{0} \end{bmatrix}\). Then, \(A={\bar{Q}}{\bar{D}}R\) (this can be observed by ignoring the second block column in the decomposition of \({\bar{R}}\) and performing the matrix multiplication, which yields \({\bar{Q}}_1{\bar{D}}_{11}{\bar{R}}_{11}\), which in turn equals A).

The second observation establishes that A can always be factorized as \(A={\bar{Q}}{\bar{D}}R\). With this factorization in hand, we now proceed to prove properties 3.2(a) to 3.2(e).

Observing how the matrices \({\bar{Q}}\) and R were obtained (via thin QR factorizations of A and \({\bar{A}}\)), we conclude that the matrices \({\bar{Q}}\) and R are integral matrices of the sizes specified in Theorem 3.2; i.e., \({\bar{Q}}\in {\mathbb {Z}}^{m\times m}\) and \(R\in {\mathbb {Z}}^{m\times n}\). This proves Property 3.2(a).

From \(A={\tilde{Q}}{\tilde{D}}{\tilde{R}}\) and Property 3.1(b), we have that \({\tilde{R}}^T{\tilde{D}}{\tilde{R}}\) is the REF Cholesky factorization of \(A^TA\). Thus, Property 3.2(b) follows from Observations 1 and 2 above.

Since \({\bar{Q}}^T {\bar{A}}= {\bar{R}}\), the following sequence of equations shows how Property 3.2(c) follows from Property 3.1(c).

$$\begin{aligned} {\bar{Q}}^T {\bar{A}}= \begin{bmatrix} {\bar{Q}}_{1}^T \\ {\bar{Q}}_{2}^T \end{bmatrix} \begin{bmatrix} A&{\bar{I}}\end{bmatrix} = \begin{bmatrix} {\bar{R}}_{11} &{} {\bar{R}}_{12} \\ \varvec{0} &{} {\bar{R}}_{22} \end{bmatrix} \Rightarrow {\bar{Q}}^T A = \begin{bmatrix} {\bar{R}}_{11} \\ \varvec{0} \end{bmatrix} = R \end{aligned}$$

This shows that \({\bar{Q}}^T A = R\).

Next, Property 3.2(d), \({\bar{Q}}^T {\bar{Q}}= {\bar{D}}^{-1}\), follows directly from Property 3.1(d) applied to \({\bar{A}}\).

Finally, to show Property 3.2(e), we utilize Property 3.2(d). Specifically, since \({\bar{Q}}^T {\bar{Q}}= {\bar{D}}^{-1}\), we can left multiply by \({\bar{D}}\) to obtain \({\bar{D}}{\bar{Q}}^T {\bar{Q}}= I\). Since \({\bar{Q}}\) is a full rank \(m \times m\) matrix (which follows from \({\bar{Q}}^T {\bar{Q}}\) being nonsingular), we can right multiply by \({\bar{Q}}^{-1}\) to obtain \({\bar{Q}}^{-1} = {\bar{D}}{\bar{Q}}^T\), thereby completing the proof. \(\square \)

3.3 Computing REF QR and theoretical bounds

The proofs above, as well as Pursell & Trimble [25] and Zhou & Jeffrey [20], give a method to compute thin REF QR based on performing IPGE on \([A^T A \mid A^T]\). By exploiting the properties of our REF QR factorizations, this section presents (1) a slightly modified yet expedited version of the algorithm for thin REF QR, (2) a new algorithm for standard REF QR, (3) proof that the bit-length of each entry is bounded polynomially, and (4) the computational complexity of both algorithms.

3.3.1 Thin REF QR algorithm

Perform the REF Cholesky factorization of \(A^T A\), carrying the row operations through \(A^T\). By using REF Cholesky, this approach efficiently exploits the symmetry of \(A^TA\) and thus requires about \(n^3/2\) fewer operations than Zhou and Jeffrey's approach, which computes a full LU factorization of \([A^T A \mid A^T]\). The total number of operations required to compute thin REF QR with our algorithm is \({\mathcal {O}}(n^2 m + n^3)\), as one must compute \(A^T A\) (\({\mathcal {O}}(n^2 m)\)), perform REF Cholesky on \(A^T A\) (\({\mathcal {O}}(n^3)\)), and, concurrently, carry out the row operations on \(A^T\) (\({\mathcal {O}}(n^2 m)\)). This algorithm returns the unique thin REF QR factorization \(A = Q D R\). Note that D does not need to be explicitly calculated, as its entries are known directly from the diagonal of R, and, furthermore, as the ensuing sections describe, D is not needed to use the QR factorization.
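The following minimal sketch implements this algorithm (reusing the ipge_echelon helper from the sketch in Sect. 2.3) and verifies the properties of Theorem 3.1 exactly on a small example:

    from fractions import Fraction

    def thin_ref_qr(A):
        """Thin REF QR of an integer m x n matrix A with full column rank.
        Returns integer Q (m x n) and R (n x n); D is implicit in R's diagonal."""
        m, n = len(A), len(A[0])
        ATA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
               for i in range(n)]
        M = ipge_echelon([ATA[i] + [A[k][i] for k in range(m)] for i in range(n)])
        R = [row[:n] for row in M]                   # REF Cholesky factor of A^T A
        Q = [[M[j][n + i] for j in range(n)]         # Q^T sits in the last m columns
             for i in range(m)]
        return Q, R

    A = [[1, 1], [1, 0], [0, 1]]
    Q, R = thin_ref_qr(A)
    m, n = len(A), len(R)
    rho = [1] + [R[k][k] for k in range(n)]
    for i in range(n):                               # Property (d): Q^T Q == D^{-1}
        for j in range(n):
            dot = sum(Q[k][i] * Q[k][j] for k in range(m))
            assert dot == (rho[i] * rho[i + 1] if i == j else 0)
    for i in range(m):                               # A == Q D R, exactly
        for j in range(n):
            assert A[i][j] == sum(Fraction(Q[i][k] * R[k][j], rho[k] * rho[k + 1])
                                  for k in range(n))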

3.3.2 Standard REF QR algorithm

Convert A to an appropriate full-rank square matrix \({\bar{A}}= [A \mid {\bar{I}}]\), where \({\bar{I}}\) comprises the last \(m-n\) columns of the \(m \times m\) identity matrix (note that the augmenting matrix need only be one that guarantees \({\bar{A}}\) has full rank). Then, perform thin REF QR on \({\bar{A}}\) to obtain \({\bar{A}}= {\bar{Q}}{\bar{D}}{\bar{R}}\). The standard REF QR is then \(A = {\bar{Q}}{\bar{D}}R\), where R comprises the first n columns of \({\bar{R}}\). This approach requires \({\mathcal {O}}(m^3)\) operations (it simply performs the thin QR algorithm discussed above with \(n = m\)). While standard REF QR is not necessary for the applications discussed in the next section, this approach gives a method to obtain it if desired.
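A minimal sketch, assuming the thin_ref_qr function from the previous sketch and that the first n rows of A are linearly independent (so that \({\bar{A}}\) has full rank):

    def standard_ref_qr(A):
        """Standard REF QR via thin REF QR of Abar = [A | Ibar]."""
        m, n = len(A), len(A[0])
        # append the last m-n columns of the m x m identity matrix
        Abar = [A[i] + [1 if i == n + j else 0 for j in range(m - n)]
                for i in range(m)]
        Qbar, Rbar = thin_ref_qr(Abar)           # Abar = Qbar * Dbar * Rbar
        R = [row[:n] for row in Rbar]            # first n columns; A = Qbar * Dbar * R
        return Qbar, R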

Theorem 3.3 gives the maximum bit-length required to store (and compute) any entry in REF QR.

Theorem 3.3

Given a matrix \(A\in {\mathbb {Z}}^{m\times n}\), let \(\sigma \) denote the absolute value of the largest entry in A. The maximum bit-length of every entry in the REF QR factorizations of A, denoted \(\beta _{max}\), is bounded above by:

$$\begin{aligned} {\left\{ \begin{array}{ll} \beta _{max} \le \lceil 2 n \log (m \sigma ) \rceil &{} \quad \text {for thin REF QR}, \\ \beta _{max} \le \lceil 2 m \log (m \sigma ) \rceil &{} \quad \text {for standard REF QR}. \end{array}\right. } \end{aligned}$$
(2)

Proof

For thin QR, R and Q can be obtained by applying IPGE on \([A^T A \mid A^T]\). Every entry in \([A^T A \mid A^T]\) has magnitude at most \(m\sigma ^2\), which is attained only if A has fully dense columns of entries of magnitude \(\sigma \). Though this matrix is of size \(n\times (n+m)\), Edmonds [26] showed that every entry throughout IPGE is the determinant of a square submatrix (in this case, at most an \(n\times n\) matrix); thus, using Lemma 2.2, we obtain:

$$\begin{aligned} \beta _{max} \le n \log (\sqrt{n} m \sigma ^2) \le n \log (m^2 \sigma ^2) = 2 n \log (m \sigma ). \end{aligned}$$

The proof is complete by noting that the same bound applies to standard REF QR after changing the dimension of the matrix to \(m \times m\) instead of \(m \times n\). \(\square \)

Theorem 3.3 gives a pessimistic bound for two reasons: (1) Hadamard’s bound itself is pessimistic, and (2) it assumes the entries of \(A^TA\) attain their worst-case magnitude, which requires fully dense columns comprised of the largest entries in A. Despite these drawbacks, Theorem 3.3 is valuable as it shows that every entry in the REF QR factorizations, and within their computation, is polynomially bounded.
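To give a feel for these magnitudes, a quick back-of-the-envelope evaluation of the bound (2), with hypothetical dimensions and log taken as base 2 so the bound reads in bits:

    import math

    m, n, sigma = 1000, 500, 10**6                        # hypothetical dimensions and entry size
    beta_thin = math.ceil(2 * n * math.log2(m * sigma))   # 29898 bits (~3.7 kB per entry)
    beta_std = math.ceil(2 * m * math.log2(m * sigma))    # 59795 bits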

Finally, we present the complexity of computing each factorization.

Corollary 3.1

Let \(A\in {\mathbb {Z}}^{m\times n}\) and let \(\beta _{max}\) be the maximum bit-length of any entry in the REF QR factorization of A. Then, the worst-case complexity of computing REF QR is:

$$\begin{aligned} {\left\{ \begin{array}{ll} {\mathcal {O}}( n^2 m (\beta _{max} \log \beta _{max} \log \log \beta _{max}) ) &{}\quad \text {for thin REF QR}, \\ {\mathcal {O}}( m^3 (\beta _{max} \log \beta _{max} \log \log \beta _{max}) ) &{}\quad \text {for standard REF QR}. \end{array}\right. } \end{aligned}$$
(3)

The above result follows directly from combining Theorem 3.3 with the number of operations within each algorithm discussed above, since each arithmetic operation on integers of bit-length \(\beta _{max}\) requires \({\mathcal {O}}(\beta _{max} \log \beta _{max} \log \log \beta _{max})\) time.

3.4 Relationship of REF QR and traditional QR

This section gives the mathematical relation between the thin REF QR factorization and the traditional thin QR factorization. Through this relation, one can obtain, from the REF QR, the traditional QR factorization in floating-point to any desired level of precision. Note that this section focuses solely on the thin QR factorization since only the thin factorization is unique.

Theorem 3.4

Let \(A\in {\mathbb {Z}}^{m \times n}\) be factored via the thin REF QR as \(A = Q DR\) and via the traditional thin QR factorization as \(A = {\hat{Q}}{\hat{R}}\). Then, \({\hat{Q}}= Q \sqrt{D}\) and \({\hat{R}}= \sqrt{D} R\).

Proof

Recall that, given a matrix \(A \in {\mathbb {Z}}^{m \times n}\) and its traditional QR factorization (\(A = {\hat{Q}}{\hat{R}}\)), \({\hat{R}}\) is the (unique) Cholesky factor of \(A^T A\); namely, \({\hat{R}}^T {\hat{R}}= A^T A\). Likewise, given the REF QR factorization of A, \(A = QDR\), we have \(R^T D R = A^T A\). Thus, \(\sqrt{D} R = {\hat{R}}\). Therefore, \(A = Q D R = Q \sqrt{D}{\hat{R}}\). Furthermore, since \({\hat{Q}}\) and Q are full rank and unique, it must be the case that \({\hat{Q}}= Q \sqrt{D}\). \(\square \)

Interestingly, due to the square root operations, Theorem 3.4 implies that a matrix generally cannot have a rational QR factorization (at least without some diagonal matrix akin to D to “hide” the square roots).
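For instance, the conversion can be sketched as follows (assuming the matrices Q and R and the pivot list rho produced by the thin_ref_qr sketch of Sect. 3.3.1):

    from math import sqrt

    n, m = len(R), len(Q)
    d = [1.0 / sqrt(rho[k] * rho[k + 1]) for k in range(n)]   # diagonal of sqrt(D)
    Qhat = [[Q[i][k] * d[k] for k in range(n)] for i in range(m)]
    Rhat = [[d[i] * R[i][j] for j in range(n)] for i in range(n)]
    # Qhat now has orthonormal columns and Qhat * Rhat equals A up to roundoff;
    # computing d with a big-float library instead yields any desired precision.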

4 Solving linear systems with REF QR

The linear system \(A{\textbf{x}}={\textbf{b}}\), where \(A\in {\mathbb {Z}}^{m \times n}\) and \({\textbf{b}}\in {\mathbb {Z}}^m\), is not guaranteed to have an exact solution because the product \(A{\textbf{x}}\) lies in the span of the columns of A, which is a proper subspace of \({\mathbb {Q}}^m\) (since \(n<m\)); hence, \({\textbf{b}}\) need not lie in this span. In spite of this, several techniques exist to find an acceptable solution to such systems. Specifically, if A has full column rank, the typical strategy is to find the least squares solution to the system \(A{\textbf{x}}={\textbf{b}}\); i.e., to find the vector \({\textbf{x}}\) that uniquely minimizes the two-norm of \(A{\textbf{x}}-{\textbf{b}}\). Alternatively, if A does not have full column rank, the least squares problem has an infinite number of solutions; thus, a common QR factorization-based approach is to find a basic solution. This section describes how to use REF QR to obtain either an exact solution to the least squares problem (full rank A) or a basic solution (rank deficient A).

4.1 Full-column-rank linear systems

Given a linear system \(A{\textbf{x}}={\textbf{b}}\) where \(A\in {\mathbb {Z}}^{m \times n}\), \(n<m\), \({\textbf{b}}\in {\mathbb {Z}}^m\), and A has full column rank, the typical strategy is to find the least squares solution of the given system, i.e., the unique vector \({\textbf{x}}\) minimizing the two-norm of \(A{\textbf{x}}- {\textbf{b}}\). Theorem 4.1 and Corollary 4.1 show how to use the thin and standard REF QR factorizations, respectively, to obtain the unique least squares solution to \(A{\textbf{x}}= {\textbf{b}}\).

Theorem 4.1

Given a full column rank matrix, \(A\in {\mathbb {Z}}^{m \times n}\), and a vector, \({\textbf{b}}\in {\mathbb {Z}}^m\), the unique least squares solution to \(A{\textbf{x}}= {\textbf{b}}\) can be obtained by solving \(R {\textbf{x}}= Q^T {\textbf{b}}\), where R and Q are the thin REF QR factors of A.

Proof

First, note that the system \(R{\textbf{x}}= Q^T{\textbf{b}}\) has a unique solution because R has full rank (as it is the REF Cholesky factor of \(A^T A\)).

Second, we show that the solution to \(R{\textbf{x}}= Q^T{\textbf{b}}\) is the exact least squares solution to \(A{\textbf{x}}= {\textbf{b}}\). In particular, we give below a series of equivalent systems that show that \(R{\textbf{x}}= Q^T {\textbf{b}}\) is equivalent to the “normal equations” \(A^T A {\textbf{x}}= A^T {\textbf{b}}\), whose solution is well-known to be the unique least squares solution to \(A{\textbf{x}}= {\textbf{b}}\) [15].

Per Theorem 3.1, using the thin REF QR factorization on A yields \(A=Q D R\) and \(A^T A = R^T D R\); moreover, since D is diagonal, \(A^T = R^T D^T Q^T = R^T D Q^T\). Substituting these expressions for \(A^T A\) and \(A^T\) into the normal equations \(A^T A {\textbf{x}}= A^T {\textbf{b}}\), we obtain the equivalent system:

$$\begin{aligned} R^T D R {\textbf{x}}= R^T D Q^T {\textbf{b}}\end{aligned}$$

The proof is completed by left multiplying both sides by \((R^T D)^{-1}\) (note that both R and D have full rank and thus are invertible), thereby yielding:

$$\begin{aligned} R {\textbf{x}}= Q^T {\textbf{b}}\end{aligned}$$

\(\square \)

Corollary 4.1

Given a full column rank matrix \(A\in {\mathbb {Z}}^{m \times n}\) and vector \({\textbf{b}}\in {\mathbb {Z}}^m\), the exact least squares solution to \(A{\textbf{x}}= {\textbf{b}}\) can be obtained by solving \(R(1:n,1:n){\textbf{x}}= [Q^T {\textbf{b}}](1:n)\), where R and Q are the standard REF QR factors of A.

Proof

The proof follows directly from Theorems 3.2 and 4.1. Specifically, Theorem 3.2 states that the standard REF QR factors contain, as submatrices, the R and Q from the thin REF QR factorization; the result then follows from Theorem 4.1. \(\square \)

Given the relationship between the QR matrices of the thin and standard QR factorizations used in the above proof and in Theorem 3.2, for simplicity, the remainder of this section uses only the matrices from thin QR factorization.

Theorem 4.1 and Corollary 4.1 establish that the solution to \(R{\textbf{x}}= Q^T {\textbf{b}}\) is the least squares solution to \(A{\textbf{x}}= {\textbf{b}}\). Next, Theorem 4.2 shows how to solve the system \(R{\textbf{x}}= Q^T {\textbf{b}}\) exactly, and thus how to obtain the unique least squares solution to \(A{\textbf{x}}= {\textbf{b}}\) exactly (free of roundoff errors). Specifically, by scaling the right-hand side vector, one can obtain the exact rational solution to \(R{\textbf{x}}= Q^T {\textbf{b}}\) almost entirely in integer arithmetic, using rational numbers only in a final division.

Theorem 4.2

The exact solution to the linear system \(R{\textbf{x}}= Q^T {\textbf{b}}\) can be obtained entirely in integer arithmetic.

Proof

The proof follows from Theorem 4.1, Cramer’s rule, and REF backward substitution [18]. From Theorem 4.1, we know that the systems \(R{\textbf{x}}= Q^T{\textbf{b}}\) and \(A^T A{\textbf{x}}= A^T{\textbf{b}}\) are equivalent. By Cramer’s rule, and since the determinant of \(A^T A\) is R(n, n) [18], the scaled system \(R{\textbf{x}}= R(n,n)Q^T{\textbf{b}}\) has an integral solution vector. Therefore, applying REF backward substitution [18] to this scaled system, one can obtain its solution entirely in integer arithmetic. Then, the exact solution of the original system is given as an integral numerator vector \({\textbf{x}}\) and the integral denominator R(n, n). \(\square \)
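A minimal sketch of this procedure, assuming Q and R from the thin_ref_qr sketch of Sect. 3.3.1; here the divisions are performed directly during back substitution and are exact by Cramer's rule (the REF backward substitution of [18] organizes the same computation so that intermediate bit-lengths also remain bounded):

    def ref_solve(Q, R, b):
        """Solve R x = det * (Q^T b) entirely in integer arithmetic;
        the exact least squares solution is then x / det."""
        m, n = len(Q), len(R)
        det = R[n - 1][n - 1]                    # det(A^T A) = R(n, n)
        y = [det * sum(Q[k][i] * b[k] for k in range(m)) for i in range(n)]
        x = [0] * n
        for i in range(n - 1, -1, -1):
            num = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
            assert num % R[i][i] == 0            # integral by Cramer's rule
            x[i] = num // R[i][i]
        return x, det

    # e.g., with Q, R from the thin_ref_qr sketch of A = [[1,1],[1,0],[0,1]]:
    # ref_solve(Q, R, [1, 2, 3]) returns ([2, 5], 3), i.e., x = (2/3, 5/3)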

Note that, as a consequence of Theorem 4.2, one can obtain the exact least squares solution either as the above rational vector or rounded to any desired level of precision. To conclude this subsection, we note that solving the normal equations directly with REF Cholesky would have a worst-case computational complexity equivalent to that of REF QR. Yet, depending on the application and the structure of the matrix, one method can be preferred over the other, and it is not always trivial to determine in advance which method would be better (see, for example, the detailed discussion in [15, 32] comparing both approaches in the floating-point case). Examining the same nuances computationally in the exact case would be an interesting problem, but it is beyond the scope of this paper.

4.2 Rank-deficient linear systems

Given a linear system \(A{\textbf{x}}={\textbf{b}}\) where \(A\in {\mathbb {Z}}^{m \times n}\), \(n<m\), \({\textbf{b}}\in {\mathbb {Z}}^m\), and A is rank-deficient, there are an infinite number of solutions to the linear least squares problem \(A {\textbf{x}}= {\textbf{b}}\). Several approaches exist, including finding the minimum norm solution, the truncated SVD solution, or a basic solution [15]. The first approach relies on performing either a complete orthogonal decomposition or finding a pseudoinverse (see [33, 34]), the second approach is SVD based, and the third approach uses QR factorization. This subsection shows how to use the third approach and REF QR to find an exact basic solution of \(A{\textbf{x}}= {\textbf{b}}\).

The QR-based approach is to perform QR with column pivoting; that is, one computes the factorization \(QR = AP\), where P is a permutation matrix chosen during the factorization: if a rank-deficient column is encountered (i.e., a column in which no eligible nonzero pivot element can be found), that column is swapped with a later column, and this process is repeated until no more linearly independent columns exist. Given a matrix A with rank r such that \(r<n\), without loss of generality, assume that the first r columns of A are linearly independent. Then, the following lemma shows the structure of the thin REF QR factorization of A.

Lemma 4.1

Given a rank-deficient matrix \(A\in {\mathbb {Z}}^{m \times n}\) with rank \(r<n\), the thin REF QR decomposition of A has the following structure:

$$\begin{aligned} A = Q D R = \begin{bmatrix} Q_{11} &{} 0 \\ Q_{21} &{} 0 \end{bmatrix} \begin{bmatrix} D_{11} &{} 0 \\ 0 &{} 0 \end{bmatrix} \begin{bmatrix} R_{11} &{} R_{12} \\ 0 &{} 0 \end{bmatrix} \end{aligned}$$

where \(Q_{11}, D_{11}, R_{11}\) are all of dimension \(r \times r\).

Proof

Consider performing IPGE on the matrix \([A^T A \mid A^T]\). Since A has column rank r, \(A^T A \in {\mathbb {Z}}^{n \times n}\) has rank r and \(A^T\) has row rank r. The result follows from the fact that IPGE on this matrix will lead to \(n-r\) rows of zeros in both the \(A^T A\) and \(A^T\) blocks, giving the factorization shown above. \(\square \)

Thus, a basic solution is found by solving the system \(R {\textbf{x}}= Q^T {\textbf{b}}\), which expands to:

$$\begin{aligned} \begin{bmatrix} R_{11} &{} R_{12} \\ 0 &{} 0 \end{bmatrix} \begin{bmatrix} {\textbf{x}}_1 \\ {\textbf{x}}_2 \end{bmatrix} = \begin{bmatrix} Q_{11}^T &{} Q_{21}^T \\ 0 &{} 0 \end{bmatrix} \begin{bmatrix} {\textbf{b}}_1 \\ {\textbf{b}}_2 \end{bmatrix} \end{aligned}$$

Thus, a unique exact basic solution is found by setting \({\textbf{x}}_2 = \varvec{0}\) and solving the square linear system \(R_{11} {\textbf{x}}_1 = Q_{11}^T {\textbf{b}}_1 + Q_{21}^T {\textbf{b}}_2\) (the first r entries of \(Q^T {\textbf{b}}\)). Note that Golub & Pereyra [35] show that such a solution minimizes the two-norm if and only if \(R_{12} = \varvec{0}\). Finding exact two-norm minimizers of rank-deficient systems that have \(R_{12} \ne \varvec{0}\) is an important topic but requires methods outside the scope of this paper.
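Assuming the rank r is known and the pivoting has already placed r linearly independent columns first, a basic solution can be sketched by reusing the ref_solve function from the sketch in Sect. 4.1 on the leading blocks of Lemma 4.1:

    def basic_solution(Q, R, b, r):
        """Exact basic solution for a rank-deficient system (Lemma 4.1):
        fix x2 = 0 and solve the leading r x r system in integer arithmetic."""
        Qr = [row[:r] for row in Q]          # the r nonzero columns of Q
        R11 = [R[i][:r] for i in range(r)]   # leading r x r block of R
        x1, det = ref_solve(Qr, R11, b)      # here det = R_11(r, r)
        return x1, det                       # basic solution: x = [x1 / det; 0]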

5 Conclusion

This paper presents and thoroughly analyzes a comprehensive REF QR factorization framework to obtain exact factorizations of any matrix. Specifically, this paper presents the thin and standard REF QR factorizations, which exactly factorize the matrix A into the product \(A = Q D R\), where Q has pairwise orthogonal columns, D is diagonal, and R is upper trapezoidal. The matrices Q and R comprise exclusively integer entries, while the diagonal matrix D, whose entries are reciprocals of integers, is never explicitly needed (nor computed by the algorithms). Importantly, we derive properties of our REF QR factorizations that are analogous to those of the floating-point QR factorizations and illustrate how our exact factorizations are related to the traditional (inexact) floating-point QR factorizations; specifically, we show how to obtain the inexact factorizations from our REF QR factorizations. Furthermore, we present algorithms to compute each factorization and prove that the size of each entry in REF QR (and throughout their computation) is bounded polynomially. Finally, we discuss solving the system \(A {\textbf{x}}= {\textbf{b}}\) for both full-column-rank and rank-deficient systems. Specifically, if A has full column rank, REF QR finds the exact least squares solution to \(A {\textbf{x}}= {\textbf{b}}\). Conversely, if A is rank deficient, utilizing column pivoting, REF QR can find an exact basic solution of \(A {\textbf{x}}= {\textbf{b}}\).