Abstract
Many scientific computational problems in various areas of application involve vectors and matrices. Programming languages such as C provide the capabilities for working with the individual elements but not directly with the arrays. Modern Fortran and higher-level languages such as Octave or Matlab and R allow direct manipulation of objects that represent vectors and matrices. The vectors and matrices are arrays of floating-point numbers.
The distinction between the set of real numbers, \(\mathrm{I\!R}\), and the set of floating-point numbers, \(\mathrm{I\!F}\), that we use in the computer has important implications for numerical computations. As we discussed in Sect. 10.2, beginning on page 483, an element x of a vector or matrix is approximated by a computer number \([x]_{\mathrm{c}}\), and a mathematical operation ∘ is simulated by a computer operation \([\circ]_{\mathrm{c}}\). The familiar laws of algebra for the field of the reals do not hold in \(\mathrm{I\!F}\), especially if uncontrolled parallel operations are allowed. These distinctions, of course, carry over to arrays of floating-point numbers that represent real numbers, and the properties of vectors and matrices that we discussed in earlier chapters may not hold for their computer counterparts. For example, the dot product of a nonzero vector with itself is positive (see page 24), but \(\langle x_{\mathrm{c}},x_{\mathrm{c}}\rangle_{\mathrm{c}} = 0\) does not imply \(x_{\mathrm{c}} = 0\).
A good general reference on the topic of numerical linear algebra is Čížková and Čížek (2012).
1 Computer Storage of Vectors and Matrices
The elements of vectors and matrices are represented as ordinary numeric data, as we described in Sect. 10.1, in either fixed-point or floating-point representation.
1.1 Storage Modes
The elements of vectors and matrices are generally stored in a logically contiguous area of the computer’s memory. What is logically contiguous may not be physically contiguous, however.
Accessing data from memory in a single pipeline may take more computer time than the computations themselves. For this reason, computer memory may be organized into separate modules, or banks, with separate paths to the central processing unit. Logical memory is interleaved through the banks; that is, two consecutive logical memory locations are in separate banks. In order to take maximum advantage of the computing power, it may be necessary to be aware of how many interleaved banks the computer system has.
There are no convenient mappings of computer memory that would allow matrices to be stored in a logical rectangular grid, so matrices are usually stored either as columns strung end-to-end (a “column-major” storage) or as rows strung end-to-end (a “row-major” storage). In using a computer language or a software package, sometimes it is necessary to know which way the matrix is stored. The type of matrix computation to be performed may determine whether a vectorized processor should operate on rows or on columns.
For some software to deal with matrices of varying sizes, the user must specify the length of one dimension of the array containing the matrix. (In general, the user must specify the lengths of all dimensions of the array except one.) In Fortran subroutines, it is common to have an argument specifying the leading dimension (number of rows), and in C functions it is common to have an argument specifying the column dimension. (See the examples in Fig. 12.2 on page 563 and Fig. 12.3 on page 564 for illustrations of the leading dimension argument.)
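To make the leading-dimension convention concrete, here is a small sketch in Python (the accessor `elem` is my own illustration, not a library routine) of how an element of a column-major matrix is located inside a buffer whose leading dimension may exceed the number of rows:

```python
def elem(buf, lda, i, j):
    """Element (i, j) (zero-based) of a column-major matrix stored in a
    flat buffer whose leading dimension (allocated rows) is lda."""
    return buf[j * lda + i]

# The 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored column-major with
# lda = 4: each column occupies 4 slots, the last two unused (0).
buf = [1, 4, 0, 0,   # column 0
       2, 5, 0, 0,   # column 1
       3, 6, 0, 0]   # column 2
assert elem(buf, 4, 0, 0) == 1
assert elem(buf, 4, 1, 2) == 6
```

This is why a Fortran subroutine can operate on a submatrix in place: only the leading dimension of the containing array, not the row count of the submatrix, determines where each column starts.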
1.2 Strides
Sometimes in accessing a partition of a given matrix, the elements occur at fixed distances from each other. If the storage is row-major for an n × m matrix, for example, the elements of a given column occur at a fixed distance of m from each other. This distance is called the “stride”, and it is often more efficient to access elements that occur with a fixed stride than it is to access elements randomly scattered.
Just accessing data from the computer’s memory contributes significantly to the time it takes to perform computations. A stride that is not a multiple of the number of banks in an interleaved bank memory organization can measurably increase the computational time in high-performance computing.
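As a small illustration (Python, with an assumed flat row-major layout), the elements of one column of an n × m row-major matrix lie a fixed stride of m apart:

```python
n, m = 3, 4
# row-major storage of the 3x4 matrix whose (i, j) element is i*m + j
flat = [i * m + j for i in range(n) for j in range(m)]
# column 2 is reached by starting at offset 2 and stepping with stride m
col2 = flat[2::m]
assert col2 == [2, 6, 10]
```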
1.3 Sparsity
If a matrix has many elements that are zeros, and if the positions of those zeros are easily identified, many operations on the matrix can be speeded up. Matrices with many zero elements are called sparse matrices. They occur often in certain types of problems; for example in the solution of differential equations and in statistical designs of experiments. The first consideration is how to represent the matrix and to store the matrix and the location information. Different software systems may use different schemes to store sparse matrices. The method used in the IMSL Libraries, for example, is described on page 550. An important consideration is how to preserve the sparsity during intermediate computations.
2 General Computational Considerations for Vectors and Matrices
All of the computational methods discussed in Chap. 10 apply to vectors and matrices, but there are some additional general considerations for vectors and matrices.
2.1 Relative Magnitudes of Operands
One common situation that gives rise to numerical errors in computer operations is when a quantity x is transformed to t(x) but the value computed is unchanged:
$$[t(x)]_{\mathrm{c}} = [x]_{\mathrm{c}};\qquad(11.1)$$
that is, the operation actually accomplishes nothing. A type of transformation that has this problem is
$$t(x) = x +\epsilon,\qquad(11.2)$$
where | ε | is much smaller than | x |. If all we wish to compute is x + ε, the fact that \([x+\epsilon]_{\mathrm{c}} = [x]_{\mathrm{c}}\) is probably not important. Usually, of course, this simple computation is part of some larger set of computations in which ε was computed. This, therefore, is the situation we want to anticipate and avoid.
Another type of problem is the addition to x of a computed quantity y that overwhelms x in magnitude. In this case, we may have
$$[x + y]_{\mathrm{c}} = [y]_{\mathrm{c}}.\qquad(11.3)$$
Again, this is a situation we want to anticipate and avoid.
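Both situations are easy to reproduce in IEEE double precision; a minimal Python illustration:

```python
x = 1.0
eps = 1.0e-20
# eps is far below the precision of x, so the sum rounds back to x:
assert (x + eps) == x          # [x + eps]_c = [x]_c
y = 1.0e20
# y overwhelms x in magnitude, so x is lost entirely in the sum:
assert (x + y) == y            # [x + y]_c = [y]_c
```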
2.1.1 Condition
A measure of the worst-case numerical error in numerical computation involving a given mathematical entity is the “condition” of that entity for the particular computations. The condition number of a matrix is the most generally useful such measure. For the matrix A, we denote the condition number as κ(A). We discussed the condition number in Sect. 6.1 and illustrated it in the toy example of equation (6.1). The condition number provides a bound on the relative norms of a “correct” solution to a linear system and a solution to a nearby problem. A specific condition number therefore depends on the norm, and we defined \(\kappa_{1}\), \(\kappa_{2}\), and \(\kappa_{\infty}\) condition numbers (and saw that they are generally roughly of the same magnitude). We saw in equation (6.10) that the L2 condition number, \(\kappa_{2}(A)\), is the ratio of the magnitudes of the two extreme singular values of A.
The condition of data depends on the particular computations to be performed. The relative magnitudes of other eigenvalues (or singular values) may be more relevant for some types of computations. Also, we saw in Sect. 10.3.2 that the “stiffness” measure in equation (10.3.2.7) is a more appropriate measure of the extent of the numerical error to be expected in computing variances.
2.1.2 Pivoting
Pivoting, discussed on page 277, is a method for avoiding a situation like that in equation (11.3). In Gaussian elimination, for example, we do an addition, x + y, where the y is the result of having divided some element of the matrix by some other element and x is some other element in the matrix. If the divisor is very small in magnitude, y is large and may overwhelm x as in equation (11.3).
2.1.3 “Modified” and “Classical” Gram-Schmidt Transformations
Another example of how to avoid a situation similar to that in equation (11.1) is the use of the correct form of the Gram-Schmidt transformations.
The orthogonalizing transformations shown in equations (2.56) on page 38 are the basis for Gram-Schmidt transformations of matrices. These transformations in turn are the basis for other computations, such as the QR factorization. (Exercise 5.10 required you to apply Gram-Schmidt transformations to develop a QR factorization.)
As mentioned on page 38, there are two ways we can extend equations (2.56) to more than two vectors, and the method given in Algorithm 2.1 is the correct way to do it. At the kth stage of the Gram-Schmidt method, the vector \(x_{k}^{(k)}\) is taken as \(x_{k}^{(k-1)}\), and the vectors \(x_{k+1}^{(k)}, x_{k+2}^{(k)},\ldots,x_{m}^{(k)}\) are all made orthogonal to \(x_{k}^{(k)}\). After the first stage, all vectors have been transformed. This method is sometimes called “modified Gram-Schmidt” because some people have performed the basic transformations in a different way, so that at the kth iteration, starting at k = 2, the first k − 1 vectors are unchanged (that is, \(x_{i}^{(k)} = x_{i}^{(k-1)}\) for i = 1, 2, …, k − 1), and \(x_{k}^{(k)}\) is made orthogonal to the k − 1 previously orthogonalized vectors \(x_{1}^{(k)}, x_{2}^{(k)},\ldots,x_{k-1}^{(k)}\). This method is called “classical Gram-Schmidt” for no particular reason. The “classical” method is not as stable, and it should not be used; see Rice (1966) and Björck (1967) for discussions. In this book, “Gram-Schmidt” is the same as what is sometimes called “modified Gram-Schmidt”. In Exercise 11.1, you are asked to experiment with the relative numerical accuracy of the “classical Gram-Schmidt” and the correct Gram-Schmidt. The problems with the former method show up with the simple set of vectors \(x_{1} = (1,\epsilon,\epsilon)\), \(x_{2} = (1,\epsilon,0)\), and \(x_{3} = (1,0,\epsilon)\), with ε small enough that \([1+\epsilon^{2}]_{\mathrm{c}} = 1\).
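The difference is easy to demonstrate. Below is a sketch (plain Python; the helper functions are my own, not from a library) of both orderings of the transformations, applied to three vectors of the kind just described, with ε = 10⁻⁸, small enough that 1 + ε² rounds to 1 in double precision:

```python
import math

def dot(u, v): return sum(x * y for x, y in zip(u, v))
def axpy(a, x, y): return [a * xi + yi for xi, yi in zip(x, y)]
def normalize(v):
    nrm = math.sqrt(dot(v, v))
    return [x / nrm for x in v]

def gram_schmidt(X, modified=True):
    Q = [list(v) for v in X]
    n = len(Q)
    if modified:
        # at stage k, orthogonalize all LATER vectors against q_k
        for k in range(n):
            Q[k] = normalize(Q[k])
            for i in range(k + 1, n):
                Q[i] = axpy(-dot(Q[k], Q[i]), Q[k], Q[i])
    else:
        # "classical": orthogonalize x_k against all EARLIER q_i at once,
        # with every projection coefficient computed from the original x_k
        for k in range(n):
            v = Q[k]
            for i in range(k):
                v = axpy(-dot(Q[i], Q[k]), Q[i], v)
            Q[k] = normalize(v)
    return Q

e = 1.0e-8                      # 1 + e*e rounds to 1 in double precision
X = [[1.0, e, e], [1.0, e, 0.0], [1.0, 0.0, e]]
Qm = gram_schmidt(X, modified=True)
Qc = gram_schmidt(X, modified=False)
# modified keeps q2 and q3 orthogonal; classical loses orthogonality badly
assert abs(dot(Qm[1], Qm[2])) < 1e-8
assert abs(dot(Qc[1], Qc[2])) > 0.1     # in fact about 0.7
```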
2.2 Iterative Methods
As we saw in Chap. 6, we often have a choice between direct methods (that is, methods that compute a closed-form solution) and iterative methods. Iterative methods are usually to be favored for large, sparse systems.
Iterative methods are based on a sequence of approximations that (it is hoped) converge to the correct solution. The fundamental trade-off in iterative methods is between the amount of work expended in getting a good approximation at each step and the number of steps required for convergence.
2.2.1 Preconditioning
In order to achieve acceptable rates of convergence for iterative algorithms, it is often necessary to precondition the system; that is, to replace the system Ax = b by the system
$$M^{-1}Ax = M^{-1}b$$
for some suitable matrix M. As we indicated in Chaps. 6 and 7, the choice of M involves some art, and we will not consider any of the results here. Benzi (2002) provides a useful survey of the general problem and work up to that time, but this is an area of active research.
2.2.2 Restarting and Rescaling
In many iterative methods, not all components of the computations are updated in each iteration. An approximation to a given matrix or vector may be adequate during some sequence of computations without change, but then at some point the approximation is no longer close enough, and a new approximation must be computed. An example of this is in the use of quasi-Newton methods in optimization in which an approximate Hessian is updated, as indicated in equation (4.28) on page 202. We may, for example, just compute an approximation to the Hessian every few iterations, perhaps using second differences, and then use that approximate matrix for a few subsequent iterations.
Another example of the need to restart or to rescale is in the use of fast Givens rotations. As we mentioned on page 241 when we described the fast Givens rotations, the diagonal elements in the accumulated C matrices in the fast Givens rotations can become widely different in absolute values, so to avoid excessive loss of accuracy, it is usually necessary to rescale the elements periodically. Anda and Park (1994, 1996) describe methods of doing the rescaling dynamically. Their methods involve adjusting the first diagonal element by multiplication by the square of the cosine and adjusting the second diagonal element by division by the square of the cosine. Bindel et al. (2002) discuss in detail techniques for performing Givens rotations efficiently while still maintaining accuracy. (The BLAS routines (see Sect. 12.2.1) rotmg and rotm, respectively, set up and apply fast Givens rotations.)
2.2.3 Preservation of Sparsity
In computations involving large sparse systems, we may want to preserve the sparsity, even if that requires using approximations, as discussed in Sect. 5.10.2. Fill-in (when a zero position in a sparse matrix becomes nonzero) would cause loss of the computational and storage efficiencies of software for sparse matrices.
In forming a preconditioner for a sparse matrix A, for example, we may choose a matrix \(M =\widetilde{ L}\widetilde{U}\), where \(\widetilde{L}\) and \(\widetilde{U}\) are approximations to the matrices in an LU decomposition of A, as in equation (5.51). These matrices are constructed as indicated in equation (5.52) so as to have zeros everywhere A has, and \(A \approx \widetilde{ L}\widetilde{U}\). This is called incomplete factorization, and often, instead of an exact factorization, an approximate factorization may be more useful because of computational efficiency.
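As a sketch of the idea (Python; `ilu0` is my own minimal ILU(0): eliminate as usual but update only positions where A is nonzero, so no fill-in ever occurs), note that for a matrix whose exact LU factorization happens to produce no fill-in, such as a tridiagonal matrix, the incomplete and exact factors coincide:

```python
def ilu0(A):
    """ILU(0): Gaussian elimination restricted to the nonzero pattern of A.
    Returns the unit-lower L and U packed into one matrix."""
    n = len(A)
    LU = [row[:] for row in A]
    nz = {(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0}
    for k in range(n):
        for i in range(k + 1, n):
            if (i, k) in nz:
                LU[i][k] /= LU[k][k]          # multiplier l_ik
                for j in range(k + 1, n):
                    if (i, j) in nz:          # skip fill-in positions
                        LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

# tridiagonal matrix: exact LU has no fill-in, so L~ U~ reproduces A
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
LU = ilu0(A)
n = len(A)
L = [[LU[i][j] if j < i else (1.0 if i == j else 0.0)
      for j in range(n)] for i in range(n)]
U = [[LU[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
prod = [[sum(L[i][k] * U[k][j] for k in range(n))
         for j in range(n)] for i in range(n)]
assert all(abs(prod[i][j] - A[i][j]) < 1e-12
           for i in range(n) for j in range(n))
```

For a general sparse matrix the skipped updates make \(\widetilde{L}\widetilde{U}\) only approximate A, which is exactly the trade-off described above.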
2.2.4 Iterative Refinement
Even if we are using a direct method, it may be useful to refine the solution by one step computed in extended precision. A method for iterative refinement of a solution of a linear system is given in Algorithm 6.3.
2.3 Assessing Computational Errors
As we discuss in Sect. 10.2.2 on page 485, we measure error by a scalar quantity, either as absolute error, \(\vert \tilde{r} - r\vert\), where r is the true value and \(\tilde{r}\) is the computed or rounded value, or as relative error, \(\vert \tilde{r} - r\vert /\vert r\vert\) (as long as r ≠ 0). We discuss general ways of reducing these errors in Sect. 10.3.2.
2.3.1 Errors in Vectors and Matrices
The errors in vectors or matrices are generally expressed in terms of norms; for example, the relative error in the representation of the vector v, or as a result of computing v, may be expressed as \(\|\tilde{v} - v\|/\|v\|\) (as long as ∥v∥ ≠ 0), where \(\tilde{v}\) is the computed vector. We often use the notation \(\tilde{v} = v +\delta v\), and so ∥δv∥∕∥v∥ is the relative error. The choice of which vector norm to use may depend on practical considerations about the errors in the individual elements. The L ∞ norm, for example, gives weight only to the element with the largest single error, while the L 1 norm gives weights to all magnitudes equally.
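A small Python illustration of how the choice of norm weights the elementwise errors differently (the vectors here are arbitrary examples of my own):

```python
v = [100.0, 1.0]
vtilde = [100.1, 1.1]       # computed version: each element off by about 0.1
dv = [a - b for a, b in zip(vtilde, v)]
norm1 = lambda x: sum(abs(t) for t in x)
norminf = lambda x: max(abs(t) for t in x)
# the L1 relative error counts both element errors ...
rel1 = norm1(dv) / norm1(v)
# ... while the L-infinity relative error sees only the largest one
relinf = norminf(dv) / norminf(v)
assert rel1 > relinf
```

Here the small element carries a large relative error of its own, which the \(L_\infty\) measure hides entirely.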
2.3.2 Assessing Errors in Given Computations
In real-life applications, the correct solution is not known, but we would still like to have some way of assessing the accuracy using the data themselves. Sometimes a convenient way to do this in a given problem is to perform internal consistency tests. An internal consistency test may be an assessment of the agreement of various parts of the output. Relationships among the output are exploited to ensure that the individually computed quantities satisfy these relationships. Other internal consistency tests may be performed by comparing the results of the solutions of two problems with a known relationship.
The solution to the linear system Ax = b has a simple relationship to the solution of the linear system Ax = b + ca j , where a j is the j th column of A and c is a constant: the two solutions differ only in the j th element, by exactly c. A useful check on the accuracy of a computed solution to Ax = b is therefore to compare it with a computed solution to the modified system. Of course, if the expected relationship does not hold, we do not know which solution is incorrect, but it is probably not a good idea to trust either. To test the accuracy of the computed regression coefficients for regressing y on x 1, …, x m , for example, one can compare them to the computed regression coefficients for regressing y + dx j on x 1, …, x m . If the expected relationships do not obtain, the analyst has strong reason to doubt the accuracy of the computations.
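A toy sketch in Python of this consistency check (a 2 × 2 system with Cramer's rule standing in for the solver, for illustration only): perturbing b by c times column j of A must shift the solution by exactly c in element j.

```python
def solve2(A, b):
    """Cramer's rule for a 2x2 system (illustration only)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

A = [[3.0, 1.0], [1.0, 2.0]]
b = [5.0, 5.0]
c, j = 2.0, 1                       # perturb b by c times column j of A
bmod = [bi + c * A[i][j] for i, bi in enumerate(b)]
x = solve2(A, b)
xmod = solve2(A, bmod)
# consistency: xmod - x should equal c in element j and 0 elsewhere
assert abs((xmod[0] - x[0]) - 0.0) < 1e-12
assert abs((xmod[1] - x[1]) - c) < 1e-12
```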
Another simple modification of the problem of solving a linear system with a known exact effect is the permutation of the rows or columns. Although this perturbation of the problem does not change the solution, it does sometimes result in a change in the computations, and hence it may result in a different computed solution. This obviously would alert the user to problems in the computations.
A simple internal consistency test that is applicable to many problems is to use two levels of precision in some of the computations. In using this test, one must be careful to make sure that the input data are the same. Rounding of the input data may cause incorrect output to result, but that is not the fault of the computational algorithm.
Internal consistency tests cannot confirm that the results are correct; they can only give an indication that the results are incorrect.
3 Multiplication of Vectors and Matrices
Arithmetic on vectors and matrices involves arithmetic on the individual elements. The arithmetic on the individual elements is performed as we have discussed in Sect. 10.2.
The way the storage of the individual elements is organized is very important for the efficiency of computations. Also, the way the computer memory is organized and the nature of the numerical processors affect the efficiency and may be an important consideration in the design of algorithms for working with vectors and matrices.
The best methods for performing operations on vectors and matrices in the computer may not be the methods that are suggested by the definitions of the operations.
In most numerical computations with vectors and matrices, there is more than one way of performing the operations on the scalar elements. Consider the problem of evaluating the matrix times vector product, c = Ab, where A is n × m. There are two obvious ways of doing this:
-
compute each of the n elements of c, one at a time, as an inner product of m-vectors, \(c_{i} = a_{i}^{\mathrm{T}}b =\sum _{j}a_{ij}b_{j}\), or
-
update the computation of all of the elements of c simultaneously as
1.
For i = 1, …, n, let \(c_{i}^{(0)} = 0\).
2.
For j = 1, …, m,
{
for i = 1, …, n,
{
let \(c_{i}^{(j)} = c_{i}^{(j-1)} + a_{ij}b_{j}\).
}
}
If there are p processors available for parallel processing, we could use a fan-in algorithm (see page 487) to evaluate Ab as a set of inner products. The order of the computations is nm (or \(n^{2}\) if m ≈ n).
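As a concrete sketch (Python with nested lists; the data are arbitrary), the two formulations produce the same product; the difference is only the order in which the scalar operations are done:

```python
n, m = 3, 2
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 1.0]

# method 1: n inner products of m-vectors
c1 = [sum(A[i][j] * b[j] for j in range(m)) for i in range(n)]

# method 2: accumulate all of c at once, one column of A at a time (axpy)
c2 = [0.0] * n
for j in range(m):
    for i in range(n):
        c2[i] += A[i][j] * b[j]

assert c1 == c2 == [3.0, 7.0, 11.0]
```

Method 1 traverses A by rows and method 2 by columns, which is why the storage order (row-major or column-major) can make one of them markedly faster than the other.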
Multiplying two matrices A and B can be considered as a problem of multiplying several vectors \(b_{i}\) by a matrix A, as described above. In the following we will assume A is n × m and B is m × p, and we will use the notation \(a_{i}\) to represent the i th column of A, \(a_{i}^{\mathrm{T}}\) to represent the i th row of A, \(b_{i}\) to represent the i th column of B, \(c_{i}\) to represent the i th column of C = AB, and so on. (This notation is somewhat confusing because here we are not using \(a_{i}^{\mathrm{T}}\) to represent the transpose of \(a_{i}\) as we normally do. The notation should be clear in the context of the diagrams below, however.) Using the inner product method above results in the first step of the matrix multiplication forming
Using the second method above, in which the elements of the product vector are updated all at once, results in the first step of the matrix multiplication forming
The next and each successive step in this method are axpy operations:
for k going to m − 1.
Another method for matrix multiplication is to perform axpy operations using all of the elements of b 1 T before completing the computations for any of the columns of C. In this method, the elements of the product are built as the sum of the outer products a i b i T. In the notation used above for the other methods, we have
and the update is
The order of computations for any of these methods is O(nmp), or just O(n 3), if the dimensions are all approximately the same. Strassen’s method, discussed next, reduces the order of the computations.
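A sketch of the outer-product formulation in Python: C is accumulated as the sum of the m rank-one terms, the outer product of column k of A with row k of B (the small matrices here are arbitrary examples):

```python
n, m, p = 2, 3, 2
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = [[0] * p for _ in range(n)]
for k in range(m):
    # add the outer product of column k of A and row k of B to C
    for i in range(n):
        for j in range(p):
            C[i][j] += A[i][k] * B[k][j]
assert C == [[58, 64], [139, 154]]
```

All three orderings of the triple loop compute the same nmp products; they differ in how they sweep through the stored elements of A, B, and C.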
3.1 Strassen’s Algorithm
Another method for multiplying matrices that can be faster for large matrices is the so-called Strassen algorithm (from Strassen 1969). Suppose A and B are square matrices with equal and even dimensions. Partition them into submatrices of equal size, and consider the block representation of the product,
$$\begin{bmatrix} C_{11} & C_{12}\\ C_{21} & C_{22}\end{bmatrix} = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}\begin{bmatrix} B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix},$$
where all blocks are of equal size. Form
$$\begin{array}{l} P_{1} = (A_{11} + A_{22})(B_{11} + B_{22}),\\ P_{2} = (A_{21} + A_{22})B_{11},\\ P_{3} = A_{11}(B_{12} - B_{22}),\\ P_{4} = A_{22}(B_{21} - B_{11}),\\ P_{5} = (A_{11} + A_{12})B_{22},\\ P_{6} = (A_{21} - A_{11})(B_{11} + B_{12}),\\ P_{7} = (A_{12} - A_{22})(B_{21} + B_{22}). \end{array}$$
Then we have (see the discussion on partitioned matrices in Sect. 3.1)
$$\begin{array}{l} C_{11} = P_{1} + P_{4} - P_{5} + P_{7},\\ C_{12} = P_{3} + P_{5},\\ C_{21} = P_{2} + P_{4},\\ C_{22} = P_{1} - P_{2} + P_{3} + P_{6}. \end{array}$$
Notice that the total number of multiplications is 7 instead of the 8 it would be in forming
$$\begin{array}{l} C_{11} = A_{11}B_{11} + A_{12}B_{21},\\ C_{12} = A_{11}B_{12} + A_{12}B_{22},\\ C_{21} = A_{21}B_{11} + A_{22}B_{21},\\ C_{22} = A_{21}B_{12} + A_{22}B_{22} \end{array}$$
directly. Whether the blocks are matrices or scalars, the same analysis holds. Of course, in either case there are more additions. The addition of two k × k matrices is O(k 2), so for a large enough value of n the total number of operations using the Strassen algorithm is less than the number required for performing the multiplication in the usual way.
The partitioning of the matrix factors can also be used recursively; that is, in the formation of the P matrices. If the dimension, n, contains a factor \(2^{e}\), the algorithm can be used directly e times, and then conventional matrix multiplication can be used on any submatrix of dimension \(n/2^{e}\). If the dimension of the matrices is not even, or if the matrices are not square, it may be worthwhile to pad the matrices with zeros, and then use the Strassen algorithm recursively.
The order of computations of the Strassen algorithm is \(\mathrm{O}(n^{\log _{2}7})\), instead of \(\mathrm{O}(n^{3})\) as in the ordinary method (\(\log_{2}7 \approx 2.81\)). The algorithm can be implemented in parallel (see Bailey et al. 1990), and this algorithm is actually used in some software systems.
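A recursive sketch of the algorithm in Python (plain nested lists, dimension a power of 2; the helper names are my own). It uses the standard Strassen combinations; padding and the usual switch to conventional multiplication below a cutoff are omitted:

```python
def madd(X, Y): return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
def msub(X, Y): return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def strassen(A, B):
    """Strassen multiplication for square matrices of power-of-2 dimension."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    def blocks(M):      # split M into its four h x h blocks
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])
    A11, A12, A21, A22 = blocks(A)
    B11, B12, B21, B22 = blocks(B)
    # seven recursive block products instead of eight
    P1 = strassen(madd(A11, A22), madd(B11, B22))
    P2 = strassen(madd(A21, A22), B11)
    P3 = strassen(A11, msub(B12, B22))
    P4 = strassen(A22, msub(B21, B11))
    P5 = strassen(madd(A11, A12), B22)
    P6 = strassen(msub(A21, A11), madd(B11, B12))
    P7 = strassen(msub(A12, A22), madd(B21, B22))
    C11 = madd(msub(madd(P1, P4), P5), P7)
    C12 = madd(P3, P5)
    C21 = madd(P2, P4)
    C22 = madd(msub(madd(P1, P3), P2), P6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot

# check against the conventional triple-loop product on a 4x4 example
A = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]
B = [[float((i + j) % 3) for j in range(4)] for i in range(4)]
naive = [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
         for i in range(4)]
assert strassen(A, B) == naive
```

The extra block additions are \(\mathrm{O}(n^{2})\) per level, which is why the saving of one multiplication dominates only for sufficiently large n.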
Several algorithms have been developed that use ideas similar to Strassen’s and are asymptotically faster; that is, with order of computations \(\mathrm{O}(n^{k})\), where \(k <\log _{2}7\). (Notice that k must be at least 2 because there are \(n^{2}\) elements.) None of the algorithms that are asymptotically faster than Strassen’s is competitive in practice, however, because they all have much larger start-up costs.
3.2 Matrix Multiplication Using MapReduce
While methods such as Strassen’s algorithm achieve speedup by decreasing the total number of computations, other methods increase the overall speed by performing computations in parallel. Although not all computations can be performed in parallel and there is some overhead in additional computations for setting up the job, when multiple processors are available, the total number of computations may not be very important. One of the major tasks in parallel processing is just keeping track of the individual computations. MapReduce (see page 515) can sometimes be used in coordinating these operations.
For the matrix multiplication AB, in the view that the multiplication is a set of inner products, for i running over the indexes of the rows of A and j running over the indexes of the columns of B, we merely access the i th row of A, \(a_{i{\ast}}\), and the j th column of B, \(b_{{\ast}j}\), and form the inner product \(a_{i{\ast}}^{\mathrm{T}}b_{{\ast}j}\) as the (i, j)th element of the product AB. In the language of relational databases, in which the two matrices are sets of data with row and column identifiers, this amounts to accessing the rows of A and the columns of B one by one, matching the elements of a row and a column so that the column designator of the row element matches the row designator of the column element, multiplying the matched A row elements and B column elements and summing the products, and then grouping the sums of the products (that is, the inner products) by the A row designators and the B column designators. In SQL, it is
SELECT A.row, B.col, SUM(A.value*B.value) FROM A, B WHERE A.col=B.row GROUP BY A.row, B.col;
In a distributed computing environment, MapReduce could be used to perform these operations. However the matrices are stored, possibly each over multiple environments, MapReduce would first map the matrix elements using their respective row and column indices as keys. It would then make the appropriate associations of row element from A with the column elements from B and perform the multiplications and the sum. Finally, the sums of the multiplications (that is, the inner products) would be associated with the appropriate keys for the output. This process is described in many elementary descriptions of Hadoop, such as in Leskovec, Rajaraman, and Ullman (2014) (Chapter 2).
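The map, shuffle, and reduce steps can be imitated in a few lines of Python, with dictionaries standing in for the shuffle phase (a toy sketch of the data flow, not Hadoop code; the sparse triple representation is an assumption of the sketch):

```python
from collections import defaultdict

# Sparse representation: each matrix is a list of (row, col, value) records.
A = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0)]          # 2x2 with A[1][1] = 0
B = [(0, 0, 4.0), (0, 1, 5.0), (1, 1, 6.0)]          # 2x2 with B[1][0] = 0

# "map": key each A record by its column and each B record by its row
# (the join key A.col = B.row of the SQL query above)
joined = defaultdict(lambda: ([], []))
for i, k, v in A:
    joined[k][0].append((i, v))
for k, j, v in B:
    joined[k][1].append((j, v))

# "reduce": for each join key, emit partial products keyed by (i, j) and sum
C = defaultdict(float)
for a_recs, b_recs in joined.values():
    for i, av in a_recs:
        for j, bv in b_recs:
            C[(i, j)] += av * bv

# A = [[1,2],[3,0]], B = [[4,5],[0,6]]  =>  AB = [[4,17],[12,15]]
assert C[(0, 0)] == 4.0 and C[(0, 1)] == 17.0
assert C[(1, 0)] == 12.0 and C[(1, 1)] == 15.0
```

In a real MapReduce job the per-key groups would be distributed across machines; here a single dictionary plays that role.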
4 Other Matrix Computations
Many other matrix computations depend on a matrix factorization. The most useful factorization is the QR factorization. It can be computed stably using Householder reflections, Givens rotations, or the Gram-Schmidt procedure, as described in Sects. 5.8.8, 5.8.9, and 5.8.10, respectively (beginning on page 252). This is one time when the computational methods can follow the mathematical descriptions rather closely. Iterations using the QR factorization are used in a variety of matrix computations; for example, they are used in the most common method for evaluating eigenvalues, as described in Sect. 7.4, beginning on page 318.
Another very useful factorization is the singular value decomposition (SVD). The computations for the SVD, described in Sect. 7.7 beginning on page 322, are efficient and preserve numerical accuracy. A major difference between the QR factorization and the SVD is that the computations for the SVD are necessarily iterative (recall the remarks at the beginning of Chap. 7).
4.1 Rank Determination
It is often easy to determine that a matrix is of full rank. If the matrix is not of full rank, however, or if it is very ill-conditioned, it is often difficult to determine its rank. This is because the computations to determine the rank eventually approximate 0. It is difficult to approximate 0; the relative error (if defined) would be either 0 or infinite. The rank-revealing QR factorization (equation (5.43), page 251) is the preferred method for estimating the rank. (Although I refer to this as “estimation”, it more properly should be called “approximation”. “Estimation” and the related term “testing”, as used in statistical applications, apply to an unknown object, as in estimating or testing the rank of a model matrix as discussed in Sect. 9.5.5, beginning on page 433.) When this decomposition is used to estimate the rank, it is recommended that complete pivoting be used in computing the decomposition. The LDU decomposition, described on page 242, can be modified the same way we used the modified QR to estimate the rank of a matrix. Again, it is recommended that complete pivoting be used in computing the decomposition.
The singular value decomposition (SVD) shown in equation (3.276) on page 161 also provides an indication of the rank of the matrix. For the n × m matrix A, the SVD is
$$A = UDV^{\mathrm{T}},$$
where U is an n × n orthogonal matrix, V is an m × m orthogonal matrix, and D is a diagonal matrix of the singular values. The number of nonzero singular values is the rank of the matrix. Of course, again, the question is whether or not the singular values are zero. It is unlikely that the values computed are exactly zero.
A problem related to rank determination is to approximate the matrix A with a matrix \(A_{r}\) of rank r ≤ rank(A). The singular value decomposition provides an easy way to do this,
$$A_{r} = UD_{r}V^{\mathrm{T}},$$
where \(D_{r}\) is the same as D, except with zeros replacing all but the r largest singular values. A result of Eckart and Young (1936) guarantees that \(A_{r}\) is the rank-r matrix closest to A as measured by the Frobenius norm,
$$\|A - A_{r}\|_{\mathrm{F}}$$
(see Sect. 3.10). This kind of matrix approximation is the basis for dimension reduction by principal components.
4.2 Computing the Determinant
The determinant of a square matrix can be obtained easily as the product of the diagonal elements of the triangular matrix in any factorization that yields an orthogonal matrix times a triangular matrix. As we have stated before, however, it is not often that the determinant need be computed.
One application in statistics is in optimal experimental designs. The D-optimal criterion, for example, chooses the design matrix, X, such that | X T X | is maximized (see Sect. 9.3.2).
4.3 Computing the Condition Number
The computation of a condition number of a matrix can be quite involved. Clearly, we would not want to use the definition \(\kappa (A) =\| A\|\,\|A^{-1}\|\) directly. Although the choice of the norm affects the condition number, recalling the discussion in Sect. 6.1, we choose whichever condition number is easiest to compute or estimate.
Various methods have been proposed to estimate the condition number using relatively simple computations. Cline et al. (1979) suggest a method that is easy to perform and is widely used. For a given matrix A and some vector v, solve
$$A^{\mathrm{T}}x = v$$
and then
$$Ay = x.$$
By tracking the computations in the solution of these systems, Cline et al. conclude that \(\|y\|/\|x\|\) is approximately equal to, but less than, \(\|A^{-1}\|\). This estimate is used with respect to the L1 norm in the LINPACK software library (see page 558 and Dongarra et al. 1979), but the approximation is valid for any norm. Solving the two systems above probably does not require much additional work because the original problem was likely to solve Ax = b, and solving a system with multiple right-hand sides can be done efficiently using the solution to one of the right-hand sides. The approximation is better if v is chosen so that ∥x∥ is as large as possible relative to ∥v∥.
Stewart (1980) and Cline and Rew (1983) investigated the validity of the approximation. The LINPACK estimator can underestimate the true condition number considerably, although generally not by an order of magnitude. Cline et al. (1982) give a method of estimating the L2 condition number of a matrix that is a modification of the L1 condition number used in LINPACK. This estimate generally performs better than the L1 estimate, but the Cline/Conn/Van Loan estimator still can have problems (see Bischof 1990).
Hager (1984) gives another method for an L1 condition number. Higham (1988) provides an improvement of Hager’s method, given as Algorithm 11.1 below, which is used in the LAPACK software library (Anderson et al. 2000).
Algorithm 11.1
The Hager/Higham LAPACK condition number estimator γ of the n × n matrix A
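A rough Python sketch of the 1-norm estimation idea behind this algorithm (not the LAPACK code; the Gaussian-elimination solver and the fixed iteration cap are my own simplifications). Hager's method estimates \(\|A^{-1}\|_{1}\) using only solves with A and \(A^{\mathrm{T}}\), so the factorization already computed for Ax = b can be reused:

```python
def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def hager_inv_norm1(A):
    """Estimate ||A^{-1}||_1 by Hager's method (simplified sketch)."""
    n = len(A)
    At = [list(col) for col in zip(*A)]
    x = [1.0 / n] * n
    for _ in range(5):                       # a few iterations suffice
        y = solve(A, x)                      # y = A^{-1} x
        xi = [1.0 if yi >= 0.0 else -1.0 for yi in y]
        z = solve(At, xi)                    # z = A^{-T} sign(y)
        j = max(range(n), key=lambda i: abs(z[i]))
        if abs(z[j]) <= sum(zi * xk for zi, xk in zip(z, x)):
            break                            # gradient test: no improvement
        x = [0.0] * n                        # move to the best unit vector
        x[j] = 1.0
    return sum(abs(yi) for yi in y)

A = [[4.0, 1.0], [1.0, 3.0]]
norm1_A = max(sum(abs(A[i][j]) for i in range(2)) for j in range(2))   # 5.0
gamma = hager_inv_norm1(A)
kappa1 = norm1_A * gamma
# true kappa_1 = ||A||_1 * ||A^{-1}||_1 = 5 * (5/11) = 25/11
assert abs(kappa1 - 25.0 / 11.0) < 1e-8
```

On this small example the estimate is exact; in general the method gives a lower bound that is usually very close to \(\kappa_{1}(A)\).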
Higham (1987) compares Hager’s condition number estimator with that of Cline et al. (1979) and finds that the Hager LAPACK estimator is generally more useful. Higham (1990) gives a survey and comparison of the various ways of estimating and computing condition numbers. You are asked to study the performance of the LAPACK estimate using Monte Carlo methods in Exercise 11.5 on page 538.
Exercises
11.1.
Gram-Schmidt orthonormalization.
a) Write a program module (in Fortran, C, R, Octave or Matlab, or whatever language you choose) to implement Gram-Schmidt orthonormalization using Algorithm 2.1. Your program should be for an arbitrary order and for an arbitrary set of linearly independent vectors.
b) Write a program module to implement Gram-Schmidt orthonormalization using equations (2.56) and (2.57).
c) Experiment with your programs. Do they usually give the same results? Try them on a linearly independent set of vectors all of which point “almost” in the same direction. Do you see any difference in the accuracy? Think of some systematic way of forming a set of vectors that point in almost the same direction. One way of doing this would be, for a given x, to form \(x +\epsilon e_{i}\) for i = 1, …, n − 1, where \(e_{i}\) is the i th unit vector and ε is a small positive number. The difference can even be seen in hand computations for n = 3. Take \(x_{1} = (1,10^{-6},10^{-6})\), \(x_{2} = (1,10^{-6},0)\), and \(x_{3} = (1,0,10^{-6})\).
-
11.2.
Given the n × k matrix A and the k-vector b (where n and k are large), consider the problem of evaluating c = Ab. As we have mentioned, there are two obvious ways of doing this: (1) compute each element of c, one at a time, as an inner product c i = a i T b = ∑ j a ij b j , or (2) update the computation of all of the elements of c in the inner loop.
a) What is the order of computation of the two algorithms?
b) Why would the relative efficiencies of these two algorithms be different for different programming languages, such as Fortran and C?
c) Suppose there are p processors available and the fan-in algorithm on page 530 is used to evaluate Ax as a set of inner products. What is the order of time of the algorithm?
d) Give a heuristic explanation of why the computation of the inner products by a fan-in algorithm is likely to have less roundoff error than computing the inner products by a standard serial algorithm. (This does not have anything to do with the parallelism.)
e) Describe how the following approach could be parallelized. (This is the second general algorithm mentioned above.)
$$\displaystyle{\begin{array}{l} \mathrm{for}\;i = 1,\ldots,n\\ \{ \\ \ \ c_{i} = 0 \\ \ \ \mathrm{for}\;j = 1,\ldots,k\\ \ \ \{ \\ \ \ c_{i} = c_{i} + a_{ij}b_{j}\\ \ \ \}\\ \}\\ \end{array} }$$

f) What is the order of time of the algorithms you described?
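The two orderings of the loops can be sketched explicitly. This Python sketch is purely illustrative (the exercise concerns Fortran and C, where the loop order interacts with column-major versus row-major storage):

```python
import numpy as np

def matvec_dot(A, b):
    # (1) inner-product form: c_i = a_i^T b, one element of c at a time;
    # the inner loop walks across row i of A
    n, k = A.shape
    c = np.zeros(n)
    for i in range(n):
        for j in range(k):
            c[i] += A[i, j] * b[j]
    return c

def matvec_axpy(A, b):
    # (2) column (axpy) form: the inner loop updates all elements of c,
    # adding b_j times column j of A; this walks down the columns of A,
    # which matches Fortran's column-major storage (C stores by rows)
    n, k = A.shape
    c = np.zeros(n)
    for j in range(k):
        for i in range(n):
            c[i] += A[i, j] * b[j]
    return c
```

Both perform the same nk multiplications and additions; they differ only in the order in which the elements of A are touched, which is what drives the language-dependent efficiency asked about in part (b).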
11.3. Consider the problem of evaluating C = AB, where A is n × m and B is m × q. Notice that this multiplication can be viewed as a set of matrix/vector multiplications, so either of the algorithms in Exercise 11.2 above would be applicable. There is, however, another way of performing this multiplication, in which all of the elements of C could be evaluated simultaneously.
a) Write pseudocode for an algorithm in which the nq elements of C could be evaluated simultaneously. Do not be concerned with the parallelization in this part of the question.
b) Now suppose there are nmq processors available. Describe how the matrix multiplication could be accomplished in O(m) steps (where a step may be a multiplication and an addition).
Hint: Use a fan-in algorithm.
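A fan-in reduction of the kind the hint refers to can be sketched recursively: operands are combined pairwise in ⌈log₂ n⌉ levels. The Python sketch below is illustrative; the float32 arithmetic is an assumption chosen only to make visible the roundoff difference asked about in Exercise 11.2d.

```python
import numpy as np

def fan_in_sum(x):
    # fan-in (pairwise) reduction: each addend passes through only
    # about log2(n) roundings instead of up to n-1 in a serial sum
    n = len(x)
    if n == 1:
        return x[0]
    m = n // 2
    return fan_in_sum(x[:m]) + fan_in_sum(x[m:])

# sum 100000 copies of 0.1 in float32; the exact target is 10000
x = np.full(100000, 0.1, dtype=np.float32)

serial = np.float32(0.0)
for v in x:               # standard one-at-a-time accumulation
    serial = serial + v

pairwise = fan_in_sum(x)
```

In the serial sum, once the partial sum is large each small addend is rounded against a large value, and the errors accumulate; in the fan-in sum the operands being combined at each level are of comparable magnitude, so the result typically stays much closer to 10000.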
11.4.
Write a Fortran or C program to compute an estimate of the L1 LAPACK condition number γ using Algorithm 11.1 on page 536.
11.5. Design and conduct a Monte Carlo study to assess the performance of the LAPACK estimator of the L1 condition number using your program from Exercise 11.4. Consider a few different sizes of matrices, say 5 × 5, 10 × 10, and 20 × 20, and consider a range of condition numbers, say 10, 10⁴, and 10⁸. In order to assess the accuracy of the condition number estimator, the random matrices in your study must have known condition numbers. It is easy to construct a diagonal matrix with a given condition number: the condition number of the diagonal matrix D with nonzero elements d₁, …, dₙ is max|dᵢ|∕min|dᵢ|. It is not so clear how to construct a general (square) matrix with a given condition number. The L2 condition number of the matrix UDV, where U and V are orthogonal matrices, is the same as the L2 condition number of D. We can therefore construct a wide range of matrices with given L2 condition numbers. In your Monte Carlo study, use matrices with known L2 condition numbers. The next question is what kind of random matrices to generate; again, make a choice of convenience. Generate random diagonal matrices D, subject to fixed κ(D) = max|dᵢ|∕min|dᵢ|. Then generate random orthogonal matrices as described in Exercise 4.10 on page 223. Any conclusions made on the basis of a Monte Carlo study, of course, must be restricted to the domain of the sampling of the study. (See Stewart, 1980, for a Monte Carlo study of the performance of the LINPACK condition number estimator.)
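The construction described above can be sketched as follows. This Python sketch is illustrative; it assumes the QR-based generator of random orthogonal matrices from Exercise 4.10 (QR of a matrix of i.i.d. standard normals, with a sign fix on the diagonal of R).

```python
import numpy as np

rng = np.random.default_rng(42)

def random_orthogonal(n):
    # random orthogonal matrix: QR of a Gaussian matrix; fixing the
    # signs of R's diagonal gives the uniform (Haar) distribution,
    # as in Stewart (1980)
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))

def matrix_with_cond(n, kappa):
    # A = U D V with the diagonal of D log-spaced from 1 down to
    # 1/kappa; orthogonal factors leave the singular values of D
    # unchanged, so kappa_2(A) = kappa up to roundoff
    d = np.logspace(0.0, -np.log10(kappa), n)
    return random_orthogonal(n) @ np.diag(d) @ random_orthogonal(n)

A = matrix_with_cond(10, 1.0e4)
```

Each generated A is a full dense matrix whose L2 condition number is known exactly, which is what the Monte Carlo study needs as ground truth.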
References
Abramowitz, Milton, and Irene A. Stegun, eds. 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Washington: National Bureau of Standards (NIST). (Reprinted in 1965 by Dover Publications, Inc., New York.)
Alefeld, Götz, and Jürgen Herzberger. 1983. Introduction to Interval Computations. New York: Academic Press.
Ammann, Larry, and John Van Ness. 1988. A routine for converting regression algorithms into corresponding orthogonal regression algorithms. ACM Transactions on Mathematical Software 14:76–87.
Anda, Andrew A., and Haesun Park. 1994. Fast plane rotations with dynamic scaling. SIAM Journal of Matrix Analysis and Applications 15:162–174.
Anda, Andrew A., and Haesun Park. 1996. Self-scaling fast rotations for stiff least squares problems. Linear Algebra and Its Applications 234:137–162.
Anderson, E., Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. 2000. LAPACK Users’ Guide, 3rd ed. Philadelphia: Society for Industrial and Applied Mathematics.
Anderson, T. W. 1951. Estimating linear restrictions on regression coefficients for multivariate normal distributions. Annals of Mathematical Statistics 22:327–351.
Anderson, T. W. 2003. An Introduction to Multivariate Statistical Analysis, 3rd ed. New York: John Wiley and Sons.
ANSI. 1978. American National Standard for Information Systems — Programming Language FORTRAN, Document X3.9-1978. New York: American National Standards Institute.
ANSI. 1989. American National Standard for Information Systems — Programming Language C, Document X3.159-1989. New York: American National Standards Institute.
ANSI. 1992. American National Standard for Information Systems — Programming Language Fortran-90, Document X3.9-1992. New York: American National Standards Institute.
ANSI. 1998. American National Standard for Information Systems — Programming Language C++, Document ISO/IEC 14882-1998. New York: American National Standards Institute.
Atkinson, A. C., and A. N. Donev. 1992. Optimum Experimental Designs. Oxford, United Kingdom: Oxford University Press.
Attaway, Stormy. 2016. Matlab: A Practical Introduction to Programming and Problem Solving, 4th ed. Oxford, United Kingdom: Butterworth-Heinemann.
Bailey, David H. 1993. Algorithm 719: Multiprecision translation and execution of FORTRAN programs. ACM Transactions on Mathematical Software 19:288–319.
Bailey, David H. 1995. A Fortran 90-based multiprecision system. ACM Transactions on Mathematical Software 21:379–387.
Bailey, David H., King Lee, and Horst D. Simon. 1990. Using Strassen’s algorithm to accelerate the solution of linear systems. Journal of Supercomputing 4:358–371.
Bapat, R. B., and T. E. S. Raghavan. 1997. Nonnegative Matrices and Applications. Cambridge, United Kingdom: Cambridge University Press.
Barker, V. A., L. S. Blackford, J. Dongarra, J. Du Croz, S. Hammarling, M. Marinova, J. Wasniewski, and P. Yalamov. 2001. LAPACK95 Users’ Guide. Philadelphia: Society for Industrial and Applied Mathematics.
Barrett, R., M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd ed. Philadelphia: Society for Industrial and Applied Mathematics.
Basilevsky, A. 1983. Applied Matrix Algebra in the Statistical Sciences. New York: North Holland.
Beaton, Albert E., Donald B. Rubin, and John L. Barone. 1976. The acceptability of regression solutions: Another look at computational accuracy. Journal of the American Statistical Association 71:158–168.
Benzi, Michele. 2002. Preconditioning techniques for large linear systems: A survey. Journal of Computational Physics 182:418–477.
Bickel, Peter J., and Joseph A. Yahav. 1988. Richardson extrapolation and the bootstrap. Journal of the American Statistical Association 83:387–393.
Bindel, David, James Demmel, William Kahan, and Osni Marques. 2002. On computing Givens rotations reliably and efficiently. ACM Transactions on Mathematical Software 28:206–238.
Birkhoff, Garrett, and Surender Gulati. 1979. Isotropic distributions of test matrices. Journal of Applied Mathematics and Physics (ZAMP) 30:148–158.
Bischof, Christian H. 1990. Incremental condition estimation. SIAM Journal of Matrix Analysis and Applications 11:312–322.
Bischof, Christian H., and Gregorio Quintana-Ortí. 1998a. Computing rank-revealing QR factorizations. ACM Transactions on Mathematical Software 24:226–253.
Bischof, Christian H., and Gregorio Quintana-Ortí. 1998b. Algorithm 782: Codes for rank-revealing QR factorizations of dense matrices. ACM Transactions on Mathematical Software 24:254–257.
Björck, Åke. 1967. Solving least squares problems by Gram-Schmidt orthogonalization. BIT 7:1–21.
Björck, Åke. 1996. Numerical Methods for Least Squares Problems. Philadelphia: Society for Industrial and Applied Mathematics.
Blackford, L. S., J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. 1997a. ScaLAPACK Users’ Guide. Philadelphia: Society for Industrial and Applied Mathematics.
Blackford, L. S., A. Cleary, A. Petitet, R. C. Whaley, J. Demmel, I. Dhillon, H. Ren, K. Stanley, J. Dongarra, and S. Hammarling. 1997b. Practical experience in the numerical dangers of heterogeneous computing. ACM Transactions on Mathematical Software 23:133–147.
Blackford, L. Susan, Antoine Petitet, Roldan Pozo, Karin Remington, R. Clint Whaley, James Demmel, Jack Dongarra, Iain Duff, Sven Hammarling, Greg Henry, Michael Heroux, Linda Kaufman, and Andrew Lumsdaine. 2002. An updated set of basic linear algebra subprograms (BLAS). ACM Transactions on Mathematical Software 28:135–151.
Bollobás, Béla. 2013. Modern Graph Theory. New York: Springer-Verlag.
Brown, Peter N., and Homer F. Walker. 1997. GMRES on (nearly) singular systems. SIAM Journal of Matrix Analysis and Applications 18: 37–51.
Bunch, James R., and Linda Kaufman. 1977. Some stable methods for calculating inertia and solving symmetric linear systems. Mathematics of Computation 31:163–179.
Buttari, Alfredo, Julien Langou, Jakub Kurzak, and Jack Dongarra. 2009. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Computing 35:38–53.
Calvetti, Daniela. 1991. Roundoff error for floating point representation of real data. Communications in Statistics 20:2687–2695.
Campbell, S. L., and C. D. Meyer, Jr. 1991. Generalized Inverses of Linear Transformations. New York: Dover Publications, Inc.
Carmeli, Moshe. 1983. Statistical Theory and Random Matrices. New York: Marcel Dekker, Inc.
Chaitin-Chatelin, Françoise, and Valérie Frayssé. 1996. Lectures on Finite Precision Computations. Philadelphia: Society for Industrial and Applied Mathematics.
Chambers, John M. 2016. Extending R. Boca Raton: Chapman and Hall/CRC Press.
Chan, T. F. 1982a. An improved algorithm for computing the singular value decomposition. ACM Transactions on Mathematical Software 8:72–83.
Chan, T. F. 1982b. Algorithm 581: An improved algorithm for computing the singular value decomposition. ACM Transactions on Mathematical Software 8:84–88.
Chan, T. F., G. H. Golub, and R. J. LeVeque. 1982. Updating formulae and a pairwise algorithm for computing sample variances. In Compstat 1982: Proceedings in Computational Statistics, ed. H. Caussinus, P. Ettinger, and R. Tomassone, 30–41. Vienna: Physica-Verlag.
Chan, Tony F., Gene H. Golub, and Randall J. LeVeque. 1983. Algorithms for computing the sample variance: Analysis and recommendations. The American Statistician 37:242–247.
Chapman, Barbara, Gabriele Jost, and Ruud van der Pas. 2007. Using OpenMP: Portable Shared Memory Parallel Programming. Cambridge, Massachusetts: The MIT Press.
Cheng, John, Max Grossman, and Ty McKercher. 2014. Professional CUDA C Programming. New York: Wrox Press, an imprint of John Wiley and Sons.
Chu, Moody T. 1991. Least squares approximation by real normal matrices with specified spectrum. SIAM Journal on Matrix Analysis and Applications 12:115–127.
Čížková, Lenka, and Pavel Čížek. 2012. Numerical linear algebra. In Handbook of Computational Statistics: Concepts and Methods, 2nd revised and updated ed., ed. James E. Gentle, Wolfgang Härdle, and Yuichi Mori, 105–137. Berlin: Springer.
Clerman, Norman, and Walter Spector. 2012. Modern Fortran. Cambridge, United Kingdom: Cambridge University Press.
Cline, Alan K., Andrew R. Conn, and Charles F. Van Loan. 1982. Generalizing the LINPACK condition estimator. In Numerical Analysis, Mexico, 1981, ed. J. P. Hennart, 73–83. Berlin: Springer-Verlag.
Cline, A. K., C. B. Moler, G. W. Stewart, and J. H. Wilkinson. 1979. An estimate for the condition number of a matrix. SIAM Journal of Numerical Analysis 16:368–375.
Cline, A. K., and R. K. Rew. 1983. A set of counter-examples to three condition number estimators. SIAM Journal on Scientific and Statistical Computing 4:602–611.
Cody, W. J. 1988. Algorithm 665: MACHAR: A subroutine to dynamically determine machine parameters. ACM Transactions on Mathematical Software 14:303–329.
Cody, W. J., and Jerome T. Coonen. 1993. Algorithm 722: Functions to support the IEEE standard for binary floating-point arithmetic. ACM Transactions on Mathematical Software 19:443–451.
Coleman, Thomas F., and Charles Van Loan. 1988. Handbook for Matrix Computations. Philadelphia: Society for Industrial and Applied Mathematics.
Cragg, John G., and Stephen G. Donald. 1996. On the asymptotic properties of LDU-based tests of the rank of a matrix. Journal of the American Statistical Association 91:1301–1309.
Cullen, M. R. 1985. Linear Models in Biology. New York: Halsted Press.
Dauger, Dean E., and Viktor K. Decyk. 2005. Plug-and-play cluster computing: High-performance computing for the mainstream. Computing in Science and Engineering 07(2):27–33.
Davies, Philip I., and Nicholas J. Higham. 2000. Numerically stable generation of correlation matrices and their factors. BIT 40:640–651.
Dempster, Arthur P., and Donald B. Rubin. 1983. Rounding error in regression: The appropriateness of Sheppard’s corrections. Journal of the Royal Statistical Society, Series B 39:1–38.
Devlin, Susan J., R. Gnanadesikan, and J. R. Kettenring. 1975. Robust estimation and outlier detection with correlation coefficients. Biometrika 62:531–546.
Dey, Aloke, and Rahul Mukerjee. 1999. Fractional Factorial Plans. New York: John Wiley and Sons.
Dongarra, J. J., J. R. Bunch, C. B. Moler, and G. W. Stewart. 1979. LINPACK Users’ Guide. Philadelphia: Society for Industrial and Applied Mathematics.
Dongarra, J. J., J. DuCroz, S. Hammarling, and I. Duff. 1990. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software 16:1–17.
Dongarra, J. J., J. DuCroz, S. Hammarling, and R. J. Hanson. 1988. An extended set of Fortran basic linear algebra subprograms. ACM Transactions on Mathematical Software 14:1–17.
Dongarra, Jack J., and Victor Eijkhout. 2000. Numerical linear algebra algorithms and software. Journal of Computational and Applied Mathematics 123:489–514.
Draper, Norman R., and Harry Smith. 1998. Applied Regression Analysis, 3rd ed. New York: John Wiley and Sons.
Duff, Iain S., Michael A. Heroux, and Roldan Pozo. 2002. An overview of the sparse basic linear algebra subprograms: the new standard from the BLAS technical forum. ACM Transactions on Mathematical Software 28:239–267.
Duff, Iain S., Michele Marrone, Giuseppe Radicati, and Carlo Vittoli. 1997. Level 3 basic linear algebra subprograms for sparse matrices: A user-level interface. ACM Transactions on Mathematical Software 23:379–401.
Duff, Iain S., and Christof Vömel. 2002. Algorithm 818: A reference model implementation of the sparse BLAS in Fortran 95. ACM Transactions on Mathematical Software 28:268–283.
Eckart, Carl, and Gale Young. 1936. The approximation of one matrix by another of lower rank. Psychometrika 1:211–218.
Eddelbuettel, Dirk. 2013. Seamless R and C++ Integration with Rcpp. New York: Springer-Verlag.
Ericksen, Wilhelm S. 1985. Inverse pairs of test matrices. ACM Transactions on Mathematical Software 11:302–304.
Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. 2004. Least angle regression. The Annals of Statistics 32:407–499.
Escobar, Luis A., and E. Barry Moser. 1993. A note on the updating of regression estimates. The American Statistician 47:192–194.
Eskow, Elizabeth, and Robert B. Schnabel. 1991. Algorithm 695: Software for a new modified Cholesky factorization. ACM Transactions on Mathematical Software 17:306–312.
Eubank, Randall L., and Ana Kupresanin. 2012. Statistical Computing in C++ and R. Boca Raton: Chapman and Hall/CRC Press.
Fasino, Dario, and Luca Gemignani. 2003. A Lanczos-type algorithm for the QR factorization of Cauchy-like matrices. In Fast Algorithms for Structured Matrices: Theory and Applications, ed. Vadim Olshevsky, 91–104. Providence, Rhode Island: American Mathematical Society.
Filippone, Salvatore, and Michele Colajanni. 2000. PSBLAS: A library for parallel linear algebra computation on sparse matrices. ACM Transactions on Mathematical Software 26:527–550.
Fuller, Wayne A. 1995. Introduction to Statistical Time Series, 2nd ed. New York: John Wiley and Sons.
Galassi, Mark, Jim Davies, James Theiler, Brian Gough, Gerard Jungman, Michael Booth, and Fabrice Rossi. 2002. GNU Scientific Library Reference Manual, 2nd ed. Bristol, United Kingdom: Network Theory Limited.
Gandrud, Christopher. 2015. Reproducible Research with R and R Studio, 2nd ed. Boca Raton: Chapman and Hall/CRC Press.
Gantmacher, F. R. 1959. The Theory of Matrices, Volumes I and II, translated by K. A. Hirsch. New York: Chelsea.
Geist, Al, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Robert Manchek, and Vaidy Sunderam. 1994. PVM. Parallel Virtual Machine. A Users’ Guide and Tutorial for Networked Parallel Computing. Cambridge, Massachusetts: The MIT Press.
Gentle, James E. 2003. Random Number Generation and Monte Carlo Methods, 2nd ed. New York: Springer-Verlag.
Gentle, James E. 2009. Computational Statistics. New York: Springer-Verlag.
Gentleman, W. M. 1974. Algorithm AS 75: Basic procedures for large, sparse or weighted linear least squares problems. Applied Statistics 23:448–454.
Gill, Len, and Arthur Lewbel. 1992. Testing the rank and definiteness of estimated matrices with applications to factor, state-space and ARMA models. Journal of the American Statistical Association 87:766–776.
Golub, G., and W. Kahan. 1965. Calculating the singular values and pseudo-inverse of a matrix. SIAM Journal of Numerical Analysis, Series B 2:205–224.
Golub, G. H., and C. Reinsch. 1970. Singular value decomposition and least squares solutions. Numerische Mathematik 14:403–420.
Golub, G. H., and C. F. Van Loan. 1980. An analysis of the total least squares problem. SIAM Journal of Numerical Analysis 17:883–893.
Golub, Gene H., and Charles F. Van Loan. 1996. Matrix Computations, 3rd ed. Baltimore: The Johns Hopkins Press.
Graybill, Franklin A. 1983. Introduction to Matrices with Applications in Statistics, 2nd ed. Belmont, California: Wadsworth Publishing Company.
Greenbaum, Anne, and Zdeněk Strakoš. 1992. Predicting the behavior of finite precision Lanczos and conjugate gradient computations. SIAM Journal for Matrix Analysis and Applications 13:121–137.
Gregory, Robert T., and David L. Karney. 1969. A Collection of Matrices for Testing Computational Algorithms. New York: John Wiley and Sons.
Gregory, R. T., and E. V. Krishnamurthy. 1984. Methods and Applications of Error-Free Computation. New York: Springer-Verlag.
Grewal, Mohinder S., and Angus P. Andrews. 1993. Kalman Filtering Theory and Practice. Englewood Cliffs, New Jersey: Prentice-Hall.
Griva, Igor, Stephen G. Nash, and Ariela Sofer. 2009. Linear and Nonlinear Optimization, 2nd ed. Philadelphia: Society for Industrial and Applied Mathematics.
Gropp, William D. 2005. Issues in accurate and reliable use of parallel computing in numerical programs. In Accuracy and Reliability in Scientific Computing, ed. Bo Einarsson, 253–263. Philadelphia: Society for Industrial and Applied Mathematics.
Gropp, William, Ewing Lusk, and Anthony Skjellum. 2014. Using MPI: Portable Parallel Programming with the Message-Passing Interface, 3rd ed. Cambridge, Massachusetts: The MIT Press.
Gropp, William, Ewing Lusk, and Thomas Sterling (Editors). 2003. Beowulf Cluster Computing with Linux, 2nd ed. Cambridge, Massachusetts: The MIT Press.
Haag, J. B., and D. S. Watkins. 1993. QR-like algorithms for the nonsymmetric eigenvalue problem. ACM Transactions on Mathematical Software 19:407–418.
Hager, W. W. 1984. Condition estimates. SIAM Journal on Scientific and Statistical Computing 5:311–316.
Hanson, Richard J., and Tim Hopkins. 2013. Numerical Computing with Modern Fortran. Philadelphia: Society for Industrial and Applied Mathematics.
Harville, David A. 1997. Matrix Algebra from a Statistician’s Point of View. New York: Springer-Verlag.
Heath, M. T., E. Ng, and B. W. Peyton. 1991. Parallel algorithms for sparse linear systems. SIAM Review 33:420–460.
Hedayat, A. S., N. J. A. Sloane, and John Stufken. 1999. Orthogonal Arrays: Theory and Applications. New York: Springer-Verlag.
Heiberger, Richard M. 1978. Algorithm AS127: Generation of random orthogonal matrices. Applied Statistics 27:199–205.
Heroux, Michael A. 2015. Editorial: ACM TOMS replicated computational results initiative. ACM Transactions on Mathematical Software 41:Article No. 13.
Higham, Nicholas J. 1987. A survey of condition number estimation for triangular matrices. SIAM Review 29:575–596.
Higham, Nicholas J. 1988. FORTRAN codes for estimating the one-norm of a real or complex matrix, with applications to condition estimation. ACM Transactions on Mathematical Software 14:381–386.
Higham, Nicholas J. 1990. Experience with a matrix norm estimator. SIAM Journal on Scientific and Statistical Computing 11:804–809.
Higham, Nicholas J. 1991. Algorithm 694: A collection of test matrices in Matlab. ACM Transactions on Mathematical Software 17:289–305.
Higham, Nicholas J. 1997. Stability of the diagonal pivoting method with partial pivoting. SIAM Journal of Matrix Analysis and Applications 18:52–65.
Higham, Nicholas J. 2002. Accuracy and Stability of Numerical Algorithms, 2nd ed. Philadelphia: Society for Industrial and Applied Mathematics.
Higham, Nicholas J. 2008. Functions of Matrices. Theory and Computation. Philadelphia: Society for Industrial and Applied Mathematics.
Hill, Francis S., Jr., and Stephen M. Kelley. 2006. Computer Graphics Using OpenGL, 3rd ed. New York: Pearson Education.
Hoffman, A. J., and H. W. Wielandt. 1953. The variation of the spectrum of a normal matrix. Duke Mathematical Journal 20:37–39.
Hong, H. P., and C. T. Pan. 1992. Rank-revealing QR factorization and SVD. Mathematics of Computation 58:213–232.
Horn, Roger A., and Charles R. Johnson. 1991. Topics in Matrix Analysis. Cambridge, United Kingdom: Cambridge University Press.
IEEE. 2008. IEEE Standard for Floating-Point Arithmetic, Std 754-2008. New York: IEEE, Inc.
Jansen, Paul, and Peter Weidner. 1986. High-accuracy arithmetic software — some tests of the ACRITH problem-solving routines. ACM Transactions on Mathematical Software 12:62–70.
Jaulin, Luc, Michel Kieffer, Olivier Didrit, and Eric Walter. 2001. Applied Interval Analysis. New York: Springer.
Jolliffe, I. T. 2002. Principal Component Analysis, 2nd ed. New York: Springer-Verlag.
Karau, Holden, Andy Konwinski, Patrick Wendell, and Matei Zaharia. 2015. Learning Spark. Sebastopol, California: O’Reilly Media, Inc.
Kearfott, R. Baker. 1996. Interval_arithmetic: A Fortran 90 module for an interval data type. ACM Transactions on Mathematical Software 22:385–392.
Kearfott, R. Baker, and Vladik Kreinovich (Editors). 1996. Applications of Interval Computations. Dordrecht, The Netherlands: Kluwer.
Kearfott, R. B., M. Dawande, K. Du, and C. Hu. 1994. Algorithm 737: INTLIB: A portable Fortran 77 interval standard-function library. ACM Transactions on Mathematical Software 20:447–459.
Keller-McNulty, Sallie, and W. J. Kennedy. 1986. An error-free generalized matrix inversion and linear least squares method based on bordering. Communications in Statistics — Simulation and Computation 15:769–785.
Kennedy, William J., and James E. Gentle. 1980. Statistical Computing. New York: Marcel Dekker, Inc.
Kenney, C. S., and A. J. Laub. 1994. Small-sample statistical condition estimates for general matrix functions. SIAM Journal on Scientific Computing 15:191–209.
Kenney, C. S., A. J. Laub, and M. S. Reese. 1998. Statistical condition estimation for linear systems. SIAM Journal on Scientific Computing 19:566–583.
Kim, Hyunsoo, and Haesun Park. 2008. Nonnegative matrix factorization based on alternating non-negativity-constrained least squares and the active set method. SIAM Journal on Matrix Analysis and Applications 30:713–730.
Kleibergen, Frank, and Richard Paap. 2006. Generalized reduced rank tests using the singular value decomposition. Journal of Econometrics 133:97–126.
Kollo, Tõnu, and Dietrich von Rosen. 2005. Advanced Multivariate Statistics with Matrices. Amsterdam: Springer.
Kshemkalyani, Ajay D., and Mukesh Singhal. 2011. Distributed Computing: Principles, Algorithms, and Systems. Cambridge, United Kingdom: Cambridge University Press.
Kulisch, Ulrich. 2011. Very fast and exact accumulation of products. Computing 91:397–405.
Lawson, C. L., R. J. Hanson, D. R. Kincaid, and F. T. Krogh. 1979. Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software 5:308–323.
Lee, Daniel D., and H. Sebastian Seung. 2001. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 556–562. Cambridge, Massachusetts: The MIT Press.
Lemmon, David R., and Joseph L. Schafer. 2005. Developing Statistical Software in Fortran 95. New York: Springer-Verlag.
Leskovec, Jure, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets, 2nd ed. Cambridge, United Kingdom: Cambridge University Press.
Levesque, John, and Gene Wagenbreth. 2010. High Performance Computing: Programming and Applications. Boca Raton: Chapman and Hall/CRC Press.
Liem, C. B., T. Lü, and T. M. Shih. 1995. The Splitting Extrapolation Method. Singapore: World Scientific.
Linnainmaa, Seppo. 1975. Towards accurate statistical estimation of rounding errors in floating-point computations. BIT 15:165–173.
Liu, Shuangzhe, and Heinz Neudecker. 1996. Several matrix Kantorovich-type inequalities. Journal of Mathematical Analysis and Applications 197:23–26.
Loader, Catherine. 2012. Smoothing: Local regression techniques. In Handbook of Computational Statistics: Concepts and Methods, 2nd revised and updated ed., ed. James E. Gentle, Wolfgang Härdle, and Yuichi Mori, 571–596. Berlin: Springer.
Longley, James W. 1967. An appraisal of least squares problems for the electronic computer from the point of view of the user. Journal of the American Statistical Association 62:819–841.
Luk, F. T., and H. Park. 1989. On parallel Jacobi orderings. SIAM Journal on Scientific and Statistical Computing 10:18–26.
Magnus, Jan R., and Heinz Neudecker. 1999. Matrix Differential Calculus with Applications in Statistics and Econometrics, revised ed. New York: John Wiley and Sons.
Markus, Arjen. 2012. Modern Fortran in Practice. Cambridge, United Kingdom: Cambridge University Press.
Marshall, A. W., and I. Olkin. 1990. Matrix versions of the Cauchy and Kantorovich inequalities. Aequationes Mathematicae 40:89–93.
Metcalf, Michael, John Reid, and Malcolm Cohen. 2011. Modern Fortran Explained. Oxford, United Kingdom: Oxford University Press.
Meyn, Sean, and Richard L. Tweedie. 2009. Markov Chains and Stochastic Stability, 2nd ed. Cambridge, United Kingdom: Cambridge University Press.
Miller, Alan J. 1992. Algorithm AS 274: Least squares routines to supplement those of Gentleman. Applied Statistics 41:458–478 (Corrections, 1994, ibid. 43:678).
Miller, Alan. 2002. Subset Selection in Regression, 2nd ed. Boca Raton: Chapman and Hall/CRC Press.
Miller, Alan J., and Nam-Ky Nguyen. 1994. A Fedorov exchange algorithm for D-optimal design. Applied Statistics 43:669–678.
Mizuta, Masahiro. 2012. Dimension reduction methods. In Handbook of Computational Statistics: Concepts and Methods, 2nd revised and updated ed., ed. James E. Gentle, Wolfgang Härdle, and Yuichi Mori, 619–644. Berlin: Springer.
Moore, E. H. 1920. On the reciprocal of the general algebraic matrix. Bulletin of the American Mathematical Society 26:394–395.
Moore, Ramon E. 1979. Methods and Applications of Interval Analysis. Philadelphia: Society for Industrial and Applied Mathematics.
Mosteller, Frederick, and David L. Wallace. 1963. Inference in an authorship problem. Journal of the American Statistical Association 58:275–309.
Muirhead, Robb J. 1982. Aspects of Multivariate Statistical Theory. New York: John Wiley and Sons.
Mullet, Gary M., and Tracy W. Murray. 1971. A new method for examining rounding error in least-squares regression computer programs. Journal of the American Statistical Association 66:496–498.
Nachbin, Leopoldo. 1965. The Haar Integral, translated by Lulu Bechtolsheim. Princeton, New Jersey: D. Van Nostrand Co Inc.
Nakano, Junji. 2012. Parallel computing techniques. In Handbook of Computational Statistics: Concepts and Methods, 2nd revised and updated ed., ed. James E. Gentle, Wolfgang Härdle, and Yuichi Mori, 243–272. Berlin: Springer.
Nguyen, Nam-Ky, and Alan J. Miller. 1992. A review of some exchange algorithms for constructing D-optimal designs. Computational Statistics and Data Analysis 14:489–498.
Olshevsky, Vadim (Editor). 2003. Fast Algorithms for Structured Matrices: Theory and Applications. Providence, Rhode Island: American Mathematical Society.
Olver, Frank W. J., Daniel W. Lozier, Ronald F. Boisvert, and Charles W. Clark. 2010. NIST Handbook of Mathematical Functions. Cambridge, United Kingdom: Cambridge University Press.
Overton, Michael L. 2001. Numerical Computing with IEEE Floating Point Arithmetic. Philadelphia: Society for Industrial and Applied Mathematics.
Parsian, Mahmoud. 2015. Data Algorithms. Sebastopol, California: O’Reilly Media, Inc.
Penrose, R. 1955. A generalized inverse for matrices. Proceedings of the Cambridge Philosophical Society 51:406–413.
Quinn, Michael J. 2003. Parallel Programming in C with MPI and OpenMP. New York: McGraw-Hill.
Rice, John R. 1966. Experiments on Gram-Schmidt orthogonalization. Mathematics of Computation 20:325–328.
Rice, John R. 1993. Numerical Methods, Software, and Analysis, 2nd ed. New York: McGraw-Hill Book Company.
Robin, J. M., and R. J. Smith. 2000. Tests of rank. Econometric Theory 16:151–175.
Roosta, Seyed H. 2000. Parallel Processing and Parallel Algorithms: Theory and Computation. New York: Springer-Verlag.
Rousseeuw, Peter J., and Geert Molenberghs. 1993. Transformation of nonpositive semidefinite correlation matrices. Communications in Statistics — Theory and Methods 22:965–984.
Rust, Bert W. 1994. Perturbation bounds for linear regression problems. Computing Science and Statistics 26:528–532.
Saad, Y., and M. H. Schultz. 1986. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing 7:856–869.
Schott, James R. 2004. Matrix Analysis for Statistics, 2nd ed. New York: John Wiley and Sons.
Searle, S. R. 1971. Linear Models. New York: John Wiley and Sons.
Searle, Shayle R. 1982. Matrix Algebra Useful for Statistics. New York: John Wiley and Sons.
Shao, Jun. 2003. Mathematical Statistics, 2nd ed. New York: Springer-Verlag.
Sherman, J., and W. J. Morrison. 1950. Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. Annals of Mathematical Statistics 21:124–127.
Siek, Jeremy, and Andrew Lumsdaine. 2000. A modern framework for portable high-performance numerical linear algebra. In Advances in Software Tools for Scientific Computing, ed. Are Bruaset, H. Langtangen, and E. Quak, 1–56. New York: Springer-Verlag.
Skeel, R. D. 1980. Iterative refinement implies numerical stability for Gaussian elimination. Mathematics of Computation 35:817–832.
Smith, B. T., J. M. Boyle, J. J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler. 1976. Matrix Eigensystem Routines — EISPACK Guide. Berlin: Springer-Verlag.
Stallings, W. T., and T. L. Boullion. 1972. Computation of pseudo-inverse using residue arithmetic. SIAM Review 14:152–163.
Stewart, G. W. 1980. The efficient generation of random orthogonal matrices with an application to condition estimators. SIAM Journal on Numerical Analysis 17:403–409.
Stewart, G. W. 1990. Stochastic perturbation theory. SIAM Review 32:579–610.
Stodden, Victoria, Friedrich Leisch, and Roger D. Peng. 2014. Implementing Reproducible Research. Boca Raton: Chapman and Hall/CRC Press.
Strang, Gilbert, and Tri Nguyen. 2004. The interplay of ranks of submatrices. SIAM Review 46:637–646.
Strassen, V. 1969. Gaussian elimination is not optimal. Numerische Mathematik 13:354–356.
Szabó, S., and R. Tanaka. 1967. Residue Arithmetic and Its Application to Computer Technology. New York: McGraw-Hill.
Tanner, M. A., and R. A. Thisted. 1982. A remark on AS127: Generation of random orthogonal matrices. Applied Statistics 31:190–192.
Titterington, D. M. 1975. Optimal design: Some geometrical aspects of D-optimality. Biometrika 62:313–320.
Trefethen, Lloyd N., and David Bau III. 1997. Numerical Linear Algebra. Philadelphia: Society for Industrial and Applied Mathematics.
Trefethen, Lloyd N., and Mark Embree. 2005. Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators. Princeton: Princeton University Press.
Trosset, Michael W. 2002. Extensions of classical multidimensional scaling via variable reduction. Computational Statistics 17:147–163.
Unicode Consortium. 1990. The Unicode Standard, Worldwide Character Encoding, Version 1.0, Volume 1. Reading, Massachusetts: Addison-Wesley Publishing Company.
Unicode Consortium. 1992. The Unicode Standard, Worldwide Character Encoding, Version 1.0, Volume 2. Reading, Massachusetts: Addison-Wesley Publishing Company.
Vandenberghe, Lieven, and Stephen Boyd. 1996. Semidefinite programming. SIAM Review 38:49–95.
Venables, W. N., and B. D. Ripley. 2003. Modern Applied Statistics with S, 4th ed. New York: Springer-Verlag.
Walker, Homer F. 1988. Implementation of the GMRES method using Householder transformations. SIAM Journal on Scientific and Statistical Computing 9:152–163.
Walker, Homer F., and Lu Zhou. 1994. A simpler GMRES. Numerical Linear Algebra with Applications 1:571–581.
Walster, G. William. 1996. Stimulating hardware and software support for interval arithmetic. In Applications of Interval Computations, ed. R. Baker Kearfott and Vladik Kreinovich, 405–416. Dordrecht, Netherlands: Kluwer.
Walster, G. William. 2005. The use and implementation of interval data types. In Accuracy and Reliability in Scientific Computing, ed. Bo Einarsson, 173–194. Philadelphia: Society for Industrial and Applied Mathematics.
Watkins, David S. 2002. Fundamentals of Matrix Computations, 2nd ed. New York: John Wiley and Sons.
White, Tom. 2015. Hadoop: The Definitive Guide, 4th ed. Sebastopol, California: O’Reilly Media, Inc.
Wickham, Hadley. 2015. Advanced R. Boca Raton: Chapman and Hall/CRC Press.
Wilkinson, J. H. 1959. The evaluation of the zeros of ill-conditioned polynomials. Numerische Mathematik 1:150–180.
Wilkinson, J. H. 1963. Rounding Errors in Algebraic Processes. Englewood Cliffs, New Jersey: Prentice-Hall. (Reprinted by Dover Publications, Inc., New York, 1994).
Wilkinson, J. H. 1965. The Algebraic Eigenvalue Problem. New York: Oxford University Press.
Woodbury, M. A. 1950. Inverting modified matrices. Memorandum Report 42, Statistical Research Group, Princeton University.
Wynn, P. 1962. Acceleration techniques for iterated vector and matrix problems. Mathematics of Computation 16:301–322.
Xie, Yihui. 2015. Dynamic Documents with R and knitr, 2nd ed. Boca Raton: Chapman and Hall/CRC Press.
Zhou, Bing Bing, and Richard P. Brent. 2003. An efficient method for computing eigenvalues of a real normal matrix. Journal of Parallel and Distributed Computing 63:638–648.
© 2017 Springer International Publishing AG
Gentle, J.E. (2017). Numerical Linear Algebra. In: Matrix Algebra. Springer Texts in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-64867-5_11
Print ISBN: 978-3-319-64866-8. Online ISBN: 978-3-319-64867-5.