1 Introduction

Background. The congested clique model is a model in distributed computing that has recently received increasing attention [2, 6–11, 17–19, 22, 23]. In this model n nodes communicate with each other over a fully-connected network (i.e., a clique) by exchanging messages of size \(O(\log n)\) in synchronous rounds. Compared with the more traditional CONGEST model [24], the congested clique model removes the effect of distances in the computation and thus focuses solely on understanding the role of congestion in distributed computing.

Typical computational tasks studied in the congested clique model are graph-theoretic problems [2, 6–8, 11, 22], where a graph G on n vertices is initially distributed among the n nodes of the network (the \(\ell \)-th node of the network knows the set of vertices adjacent to the \(\ell \)-th vertex of the graph, and the weights of the corresponding edges if the graph is weighted) and the nodes want to compute properties of G. Besides their theoretical interest and potential applications, such problems have the following natural interpretation in the congested clique model: the graph G represents the actual topology of the network, each node knows only its neighbors but can communicate to all the nodes of the network, and the nodes want to learn information about the topology of the network.

Censor-Hillel et al. [2] recently developed algorithms for several graph-theoretic problems in the congested clique model by showing how to implement centralized algebraic algorithms for matrix multiplication in this model. More precisely, they constructed an \(O(n^{1-2/\omega })\)-round algorithm for matrix multiplication, where \(\omega \) denotes the exponent of matrix multiplication (the best known upper bound on \(\omega \) is \(\omega <2.3729\), obtained in [16, 29], which gives exponent \(1-2/\omega <0.1572\) in the congested clique model), improving over the \(O(n^{\omega -2})\)-round algorithm mentioned in [7], in the following setting: given two \(n\times n\) matrices A and B over a field, the \(\ell \)-th node of the network initially owns the \(\ell \)-th row of A and the \(\ell \)-th column of B, and needs to output the \(\ell \)-th row and the \(\ell \)-th column of the product AB. Censor-Hillel et al. consequently obtained \(O(n^{1-2/\omega })\)-round algorithms for several graph-theoretic tasks that reduce to computing powers of (some variant of) the adjacency matrix of the graph, such as counting the number of triangles in a graph (which led to an improvement over the prior best algorithms for this task [6, 7]), detecting the existence of a constant-length cycle, and approximating all-pairs shortest paths in the input graph (improving the round complexity obtained in [22]). One of the main advantages of such an algebraic approach in the congested clique model is its versatility: it makes it possible to construct fast algorithms for graph-theoretic problems simply by exhibiting a reduction to matrix multiplication (and showing that this reduction can be implemented efficiently in the congested clique model), especially for problems for which the best non-algebraic centralized algorithm is highly sequential and does not seem to be implementable efficiently in the congested clique model.

Our results. In this paper we develop additional algebraic tools for the congested clique model.

We first consider the task of computing in the congested clique model not only one matrix product, but multiple independent matrix products. More precisely, given k matrices \(A_1,\ldots ,A_k\), each of size \(n\times m\), and k matrices \(B_1,\ldots ,B_k\), each of size \(m\times n\), initially evenly distributed among the n nodes of the network, the nodes want to compute the k matrix products \(A_1B_1,\ldots ,A_kB_k\). Prior works [2, 7] considered only the case \(k=1\) and \(m=n\), i.e., one product of two square matrices. Our contribution is thus twofold: we consider the rectangular case, and the case of several matrix products as well. Let us first discuss our results for square matrices (\(m=n\)). By running the matrix multiplication algorithm from [2] sequentially k times, k matrix products can naturally be computed in \(O(k n^{1-2/\omega })\) rounds. In this work we show that we can actually do better.

Theorem 1

(Simplified version). In the congested clique model k independent products of pairs of \(n\times n\) matrices can be computed with round complexity

$$ \left\{ \begin{array}{ll} O(k^{2/\omega }n^{1-2/\omega })&{} { if }\,\, 1\le k< n,\\ O(k) &{} { if }\,\,k\ge n. \end{array} \right. $$

This generalization of the results from [2] follows from a simple strategy: divide the n nodes of the network into k blocks (when \(k\le n\)), each containing roughly n / k nodes, compute one of the k matrix products per block by using an approach similar to [2] (i.e., a distributed version of the best centralized algorithm computing one instance of square matrix multiplication), and finally distribute the relevant parts of the k output matrices to all the nodes of the network. Analyzing the resulting protocol shows that the dependence on k in the overall round complexity is reduced to \(k^{2/\omega }\). This sublinear dependence on k has a significant number of implications (see below).
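As a back-of-the-envelope illustration (ours, not part of the paper's formal development), the following Python snippet compares the naive sequential bound \(O(kn^{1-2/\omega })\) with the bound of Theorem 1 and checks that the gain is exactly a factor \(k^{1-2/\omega }\):

```python
import math

omega = 2.3729  # best known bound on the matrix multiplication exponent

def rounds_sequential(k, n):
    # k sequential runs of the single-product algorithm of [2]
    return k * n ** (1 - 2 / omega)

def rounds_blocked(k, n):
    # the block strategy of Theorem 1 (valid for k <= n)
    return k ** (2 / omega) * n ** (1 - 2 / omega)

n, k = 10**6, 10**3
speedup = rounds_sequential(k, n) / rounds_blocked(k, n)
assert math.isclose(speedup, k ** (1 - 2 / omega))  # gain factor k^(1-2/omega)
assert speedup > 1
```

For \(k=n\) the blocked bound becomes \(n^{2/\omega }\cdot n^{1-2/\omega }=n\), matching the \(O(k)\) case of the theorem.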

The complete version of Theorem 1, given in Sect. 3, also considers the general case where the matrices may not be square (i.e., the case \(m\ne n\)), which will be crucial for some of our applications to the All-Pairs Shortest Path problem. The proof becomes more technical than for the square case, but is conceptually very similar: the main modification is simply to now implement a distributed version of the best centralized algorithm for rectangular matrix multiplication. The upper bounds obtained on the round complexity depend on the complexity of the best centralized algorithms for rectangular matrix multiplication (in particular the upper bounds given in [15]). While the major open problem is still whether the product of two square matrices can be computed in a constant (or nearly constant) number of rounds, our results show that for \(m=O(n^{0.651\ldots })\), the product of an \(n\times m\) matrix by an \(m\times n\) matrix can indeed be computed in \(O(n^\epsilon )\) rounds for any \(\epsilon >0\). We also show lower bounds on the round complexity of the general case (Proposition 1 in Sect. 3), which are tight for most values of k and m, based on simple arguments from communication complexity.

We then study the following basic problems in linear algebra: computing the determinant, the rank or the inverse of an \(n\times n\) matrix over a finite field \(\mathbb {F}\) of order upper bounded by a polynomial in n, and solving a system of n linear equations in n variables. We call these problems \(\mathsf {DET}(n,\mathbb {F})\), \(\mathsf {Rank}(n,\mathbb {F})\), \(\mathsf {INV}(n,\mathbb {F})\) and \(\mathsf {SYS}(n,\mathbb {F})\), respectively (the formal definitions are given in Sect. 2). While it is known that in the centralized setting these problems can be solved with essentially the same time complexity as matrix multiplication [1], these reductions are typically sequential and do not work in a parallel setting. In this paper we design fast deterministic and randomized algorithms for these four basic tasks, and obtain the following results.

Theorem 2

Assume that \(\mathbb {F}\) has characteristic greater than n. In the congested clique model, the deterministic round complexity of \(\mathsf {DET}(n,\mathbb {F})\) and \(\mathsf {INV}(n,\mathbb {F})\) is \(O(n^{1-1/\omega })\).

Theorem 3

Assume that \(\mathbb {F}\) has order \(|\mathbb {F}|=\varOmega (n^2\log n)\). In the congested clique model, the randomized round complexity of \(\mathsf {DET}(n,\mathbb {F})\), \(\mathsf {SYS}(n,\mathbb {F})\) and \(\mathsf {Rank}(n,\mathbb {F})\) is \(O(n^{1-2/\omega }\log n)\).

The upper bounds of Theorems 2 and 3 are \(O(n^{0.5786})\) and \(O(n^{0.1572})\), respectively, when basing our implementation on the asymptotically fastest (but impractical) centralized algorithm for matrix multiplication, corresponding to the upper bound \(\omega <2.3729\). These bounds become \(O(n^{2/3})\) and \( O(n^{1/3}\log n)\), respectively, when basing our implementation on the trivial (but practical) centralized algorithm for matrix multiplication (corresponding to the bound \(\omega \le 3\)). These algorithms are obtained by carefully adapting to the congested clique model the relevant known parallel algorithms [5, 12–14, 25] for linear algebra, and using our efficient algorithm for computing multiple matrix products (Theorem 1) as a subroutine. An interesting open question is whether \(\mathsf {INV}(n,\mathbb {F})\) can be solved with the same (randomized) round complexity as the other tasks. This problem may very well be more difficult; in the parallel setting in particular, to the best of our knowledge, whether matrix inversion can be done with the same complexity as these other tasks is also an open problem.
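The concrete exponents quoted above can be checked mechanically; the following short computation (ours, for illustration) verifies them for both choices of \(\omega \):

```python
import math

omega_best, omega_trivial = 2.3729, 3.0

# Theorem 2 bound: O(n^(1 - 1/omega)); Theorem 3 bound: O(n^(1 - 2/omega) log n)
exp_det = 1 - 1 / omega_best
exp_rank = 1 - 2 / omega_best

assert exp_det < 0.5786 and exp_rank < 0.1572      # exponents quoted in the text
assert math.isclose(1 - 1 / omega_trivial, 2 / 3)  # trivial algorithm, Theorem 2
assert math.isclose(1 - 2 / omega_trivial, 1 / 3)  # trivial algorithm, Theorem 3
```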

Applications of our results. The above results give new algorithms for many graph-theoretic problems in the congested clique model, as described below and summarized in Table 1.

Table 1. Summary of the applications of our algebraic techniques to graph-theoretic problems in the congested clique model. Here n represents both the number of vertices in the input graph and the number of nodes in the network.

Our key tool to derive these applications is Theorem 7 in Sect. 3, which gives an efficient algorithm for computing the distance product (defined in Sect. 2) of two matrices with small integer entries, based on our algorithm for multiple matrix products (Theorem 1). Computing the distance product is a fundamental graph-theoretic task closely related to the All-Pairs Shortest Paths (APSP) problem [27, 28, 30]. Combining this result with techniques from [28], and observing that these techniques can be implemented efficiently in the congested clique model, we then almost immediately obtain the following result.

Theorem 4

In the congested clique model, the deterministic round complexity of the all-pairs shortest paths problem in an undirected graph of n vertices with integer weights in \(\{0,\ldots ,M\}\), where M is an integer such that \(M\le n\), is \(\tilde{O}(M^{2/\omega }n^{1-2/\omega })\).

Since computing the diameter of a graph reduces to solving the all-pairs shortest paths problem, we obtain the same round complexity for diameter computation in the same class of graphs. This improves over the \(\tilde{O}(Mn^{1-2/\omega })\)-round algorithm for these tasks (implicitly) given in [2]. The main application of our results nevertheless concerns the all-pairs shortest paths problem over directed graphs (for which the approach based on [28] does not work) with constant weights. We obtain the following result by combining our algorithm for distance product computation with Zwick’s approach [30].

Theorem 5

In the congested clique model, the randomized round complexity of the all-pairs shortest paths problem in a directed graph of n vertices with integer weights in \(\{-M,\ldots ,0,\ldots ,M\}\), where \(M=O(1)\), is \(O(n^{0.2096})\).

Prior to this work, the upper bound for the round complexity of this problem was \(\tilde{O}(n^{1/3})\), obtained by directly computing the distance product (as done in [2]) in the congested clique model. Again, Theorem 5 follows easily from Theorem 7 and the observation that the reduction to distance product computation given in [30] can be implemented efficiently in the congested clique model. The exponent 0.2096 in the statement of Theorem 5 is derived from the current best upper bounds on the complexity of rectangular matrix multiplication in the centralized setting [15].

Theorems 2 and 3 also enable us to solve a multitude of graph-theoretic problems in the congested clique model with a sublinear number of rounds. Examples described in this paper are computing the number of edges in a maximum matching of a simple graph in \(O(n^{1-2/\omega }\log n)\) rounds, and computing the set of allowed edges in a perfect matching, the Gallai-Edmonds decomposition of a simple graph, and a minimum vertex cover in a bipartite graph in \(O(n^{1-1/\omega })\) rounds. These results follow almost immediately from the appropriate reductions to matrix inversion and similar problems known in the centralized setting [3, 20, 26]; indeed, it is not hard to adapt all these reductions so that they can be implemented efficiently in the congested clique model. Note that while non-algebraic centralized algorithms solving these problems also exist (see, e.g., [21]), they are typically sequential and do not appear to be efficiently implementable in the congested clique model. The algebraic approach developed in this paper, made possible by our algorithms for computing the determinant, the rank and the inverse of a matrix, appears to be currently the only way of obtaining fast algorithms for these problems in the congested clique model.

Remarks on the organization of the paper. Due to space constraints, most of the technical proofs are not included, but can be found in the full version of the present paper. The discussion of randomized algorithms for the determinant (Theorem 3) is also omitted from this version. The whole discussion detailing the applications of our algebraic methods is omitted as well, with the exception of the statement of Theorem 7 given in Sect. 3.

2 Preliminaries

Notations. Throughout this paper we will use n to denote the number of nodes in the network. The n nodes will be denoted \(1,2,\dots ,n\). The symbol \(\mathbb {F}\) will always denote a finite field of order upper bounded by a polynomial in n (which means that each field element can be encoded with \(O(\log n)\) bits and thus sent using one message in the congested clique model). Given any positive integer p, we use the notation [p] to represent the set \(\{1,2,\ldots ,p\}\). Given any \(p\times p'\) matrix A, we will write its entries as A[i, j] for \((i,j)\in [p]\times [p']\), and use the notation \(A[i,*]\) to represent its i-th row and \(A[*,j]\) to represent its j-th column.

Graph-theoretic problems in the congested clique model. As mentioned in the introduction, the main tasks that we want to solve in the congested clique model are typically graph-theoretic problems. In all the applications given in this paper the number of vertices of the graph will be n, the same as the number of nodes of the network. The input will be given as follows: initially each node \(\ell \in [n]\) has the \(\ell \)-th row and the \(\ell \)-th column of the adjacency matrix of the graph. Note that this distribution of the input, while the most natural, is not essential; the only important assumption is that the entries are evenly distributed among the n nodes, since they can then be redistributed in a constant number of rounds using the following lemma by Dolev et al. [6], which we will use many times in this paper.

Lemma 1

[6] In the congested clique model a set of messages in which no node is the source of more than n messages and no node is the destination of more than n messages can be delivered within two rounds if the source and destination of each message is known in advance to all nodes.
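To illustrate how Lemma 1 is typically invoked, the following Python simulation (a toy sketch of ours, with no actual networking) redistributes a matrix from column ownership to row ownership and checks that the message pattern satisfies the hypothesis of the lemma, every node being the source and the destination of exactly n messages:

```python
# Node j initially holds column j of A and sends entry A[i][j] to node i.
# Every node sends exactly n messages and receives exactly n messages, so by
# Lemma 1 the redistribution takes two rounds in the congested clique model.
n = 8
A = [[10 * i + j for j in range(n)] for i in range(n)]

columns = {j: [A[i][j] for i in range(n)] for j in range(n)}  # initial state

messages = []  # (source, destination, payload)
for j in range(n):
    for i in range(n):
        messages.append((j, i, columns[j][i]))

sent, received = [0] * n, [0] * n
rows = {i: [None] * n for i in range(n)}
for src, dst, payload in messages:
    sent[src] += 1
    received[dst] += 1
    rows[dst][src] = payload

assert all(s == n for s in sent) and all(r == n for r in received)
assert all(rows[i] == A[i] for i in range(n))  # node i now holds row i of A
```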

Algebraic problems in the congested clique model. The five main algebraic problems that we consider in this paper are defined as follows.

(Definition box omitted from this extraction: formal statements of \(\mathsf {DET}(n,\mathbb {F})\), \(\mathsf {Rank}(n,\mathbb {F})\), \(\mathsf {INV}(n,\mathbb {F})\), \(\mathsf {SYS}(n,\mathbb {F})\) and \(\mathsf {MM}(n,m,k,\mathbb {F})\).)

Note that the distribution of the inputs and the outputs assumed in the above five problems is mostly chosen for convenience. For instance, if needed the whole vector x in the output of \(\mathsf {SYS}(n,\mathbb {F})\) can be sent to all the nodes of the network in two rounds using Lemma 1. The only important assumption is that when dealing with matrices, the entries of the matrices must be evenly distributed among the n nodes.

In this paper we will also consider the distance product of two matrices, defined as follows.

Definition 1

Let m and n be two positive integers. Let A be an \(n\times m\) matrix and B be an \(m\times n\) matrix, both with entries in \(\mathbb R\cup \{\infty \}\). The distance product of A and B, denoted \(A*B\), is the \(n\times n\) matrix C such that \(C[i,j]=\min _{s\in [m]}\{A[i,s]+B[s,j]\}\) for all \((i,j)\in [n]\times [n]\).
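A direct (centralized) implementation of Definition 1 is straightforward; the following Python function (ours, for reference in later examples) computes the distance product in time \(O(n^2m)\):

```python
INF = float("inf")

def distance_product(A, B):
    """C[i][j] = min over s of A[i][s] + B[s][j] (min-plus product)."""
    n, m = len(A), len(A[0])
    assert len(B) == m
    return [[min(A[i][s] + B[s][j] for s in range(m))
             for j in range(len(B[0]))] for i in range(n)]

A = [[0, 3], [INF, 1]]
B = [[2, INF], [0, 5]]
C = distance_product(A, B)
assert C == [[2, 8], [1, 6]]
```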

We will be mainly interested in the case when the matrices have integer entries. More precisely, we will consider the following problem.

(Definition box omitted from this extraction: formal statement of \(\mathsf {DIST}(n,m,M)\).)

Centralized algebraic algorithms for matrix multiplication. We now briefly describe algebraic algorithms for matrix multiplication and known results about the complexity of rectangular matrix multiplication. We refer to [1] for a detailed exposition of these concepts.

Let \(\mathbb {F}\) be a field and let m and n be two positive integers. Consider the problem of computing the product of an \(n\times m\) matrix by an \(m\times n\) matrix over \(\mathbb {F}\). An algebraic algorithm for this problem is described by three sets \(\{\alpha _{ij\mu }\}\), \(\{\beta _{ij\mu }\}\) and \(\{\lambda _{ij\mu }\}\) of coefficients from \(\mathbb {F}\) such that, for any \(n\times m\) matrix A and any \(m\times n\) matrix B, the equality

$$\begin{aligned} C[i,j]=\sum _{\mu =1}^t\lambda _{ij\mu }S^{(\mu )}T^{(\mu )} \end{aligned}$$

holds for all \((i,j)\in [n]\times [n]\), where \(C=AB\) and

$$ S^{(\mu )}=\sum _{i=1}^n\sum _{j=1}^m\alpha _{ij\mu } A[i,j], \quad T^{(\mu )}=\sum _{i=1}^n\sum _{j=1}^m\beta _{ij\mu } B[j,i], $$

for each \(\mu \in [t]\). Note that each \(S^{(\mu )}\) and each \(T^{(\mu )}\) is an element of \(\mathbb {F}\). The integer t is called the rank of the algorithm, and corresponds to the complexity of the algorithm.

For instance, consider the trivial algorithm computing this matrix product using the formula

$$\begin{aligned} C[i,j]=\sum _{s=1}^m A[i,s]B[s,j]. \end{aligned}$$

This algorithm can be described in the above formalism by taking \(t=n^2m\), writing each \(\mu \in [n^2m]\) as a triple \(\mu =(i',j',s')\in [n]\times [n]\times [m]\), and choosing

$$\begin{aligned} \lambda _{ij(i',j',s')}&=\left\{ \begin{array}{cl} 1&{} \text { if}\ i=i' \text {and}\ j=j', \\ 0&{} \text { otherwise}, \end{array} \right. \\ \alpha _{ij(i',j',s')}&=\left\{ \begin{array}{cl} 1&{}\text { if}\ i=i' \text {and}\ j=s', \\ 0&{} \text { otherwise}, \end{array} \right. \beta _{ij(i',j',s')}=\left\{ \begin{array}{cl} 1&{}\text { if}\ i=j' \text {and}\ j=s', \\ 0&{} \text { otherwise}. \end{array} \right. \end{aligned}$$

Note that this trivial algorithm, and the description we just gave, also works over any semiring.
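The trivial algorithm can be checked directly against this formalism; in the following Python sketch (ours), the triple \((i',j',s')\) plays the role of \(\mu \), and the sums defining \(S^{(\mu )}\) and \(T^{(\mu )}\) collapse to the single terms \(A[i',s']\) and \(B[s',j']\):

```python
# Direct check of the bilinear formalism for the trivial algorithm of rank
# t = n^2 * m: every mu = (i2, j2, s2) contributes S = A[i2][s2] and
# T = B[s2][j2], and lambda routes the product S * T to the entry C[i2][j2].
n, m = 3, 2
A = [[1, 2], [3, 4], [5, 6]]          # n x m
B = [[1, 0, 2], [0, 1, 1]]            # m x n

C = [[0] * n for _ in range(n)]
for i2 in range(n):                   # (i2, j2, s2) enumerates mu in [t]
    for j2 in range(n):
        for s2 in range(m):
            S = A[i2][s2]             # sum of alpha_{ij mu} A[i][j], one term
            T = B[s2][j2]             # sum of beta_{ij mu} B[j][i], one term
            C[i2][j2] += S * T        # lambda = 1 only for (i, j) = (i2, j2)

expected = [[sum(A[i][s] * B[s][j] for s in range(m)) for j in range(n)]
            for i in range(n)]
assert C == expected
```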

The exponent of matrix multiplication. For any non-negative real number \(\gamma \), let \(\omega (\gamma )\) denote the minimal value \(\tau \) such that the product of an \(n\times \left\lceil n^\gamma \right\rceil \) matrix over \(\mathbb {F}\) by an \(\left\lceil n^\gamma \right\rceil \times n\) matrix over \(\mathbb {F}\) can be computed by an algebraic algorithm of rank \(n^{\tau +o(1)}\) (i.e., can be computed with complexity \(O(n^{\tau +\epsilon })\) for any \(\epsilon >0\)). As usual in the literature, we typically abuse notation and simply write that such a product can be done with complexity \(O(n^{\omega (\gamma )})\), i.e., ignoring the o(1) in the exponent. The value \(\omega (1)\) is denoted by \(\omega \), and often called the exponent of square matrix multiplication. Another important quantity is the value \(\alpha =\sup \{\gamma \,|\,\omega (\gamma )=2\}\).

The trivial algorithm for matrix multiplication gives the upper bound \(\omega (\gamma )\le 2+\gamma \), and thus \(\omega \le 3\) and \(\alpha \ge 0\). The current best upper bound on \(\omega \) is \(\omega <2.3729\), see [16, 29]. The current best bound on \(\alpha \) is \(\alpha >0.3029\), see [15]. The best bounds on \(\omega (\gamma )\) for \(\gamma >\alpha \) can also be found in [15].

3 Matrix Multiplication in the Congested Clique Model

In this section we discuss the round complexity of Problems \(\mathsf {MM}(n,m,k,\mathbb {F})\) and \(\mathsf {DIST}(n,m,M)\).

Our first result is the following theorem.

Theorem 6

(Complete version). For any positive integer \(k\le n\), the deterministic round complexity of \(\mathsf {MM}(n,m,k,\mathbb {F})\) is

$$ \left\{ \begin{array}{ll} O(k)&{} { if }\,\, 0\le m\le \sqrt{kn},\\ O(k^{2/\omega (\gamma )}n^{1-2/\omega (\gamma )})&{} { if }\,\,\sqrt{kn}\le m< n^2/k,\\ O(km/n) &{} { if }\,\,m\ge n^2/k, \end{array} \right. $$

where \(\gamma \) is the solution of the equation

$$\begin{aligned} \left( 1-\frac{\log k}{\log n}\right) \gamma =1-\frac{\log k}{\log n}+\left( \frac{\log m}{\log n}-1\right) \omega (\gamma ). \end{aligned}$$
(1)

For any \(k\ge n\), the deterministic round complexity of \(\mathsf {MM}(n,m,k,\mathbb {F})\) is

$$ \left\{ \begin{array}{ll} O(k)&{}{ if }\,\, 1\le m\le n,\\ O(km/n) &{} { if }\,\,m\ge n. \end{array} \right. $$

The proof of Theorem 6, which will also show that Eq. (1) always has a solution when \(k\le n\) and \(\sqrt{kn}\le m< n^2/k\), can be found in the full version of the paper (a short discussion of the proof ideas was presented in the introduction). As briefly mentioned in the introduction, for a single product (\(k=1\)) the round complexity is constant whenever \(m\le \sqrt{n}\), and remains \(O(n^{\epsilon })\), for any \(\epsilon >0\), for all values \(m\le n^{(1+\alpha )/2}\) (the bound \(\alpha >0.3029\) implies \((1+\alpha )/2> 0.6514\)). For the case \(m=n\) the solution of Eq. (1) is \(\gamma =1\), which gives the bounds of the simplified version of Theorem 1 presented in the introduction.
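For intuition about Eq. (1), one can solve it in closed form under the trivial bound \(\omega (\gamma )=2+\gamma \) (an assumption made only for this illustration; the theorem uses the best known bounds on \(\omega (\gamma )\) from [15]). The equation then becomes linear in \(\gamma \), and for \(m=n\) the solution is \(\gamma =1\) as stated above:

```python
import math

# Solving Eq. (1) for gamma under the assumption omega(gamma) = 2 + gamma.
# Writing K = log k / log n and L = log m / log n, Eq. (1) reads
# (1 - K) gamma = 1 - K + (L - 1)(2 + gamma), which is linear in gamma.
def gamma_trivial(n, k, m):
    K, L = math.log(k) / math.log(n), math.log(m) / math.log(n)
    return (2 * L - K - 1) / (2 - K - L)

# For m = n the solution is gamma = 1, independently of k.
assert math.isclose(gamma_trivial(n=10**6, k=100, m=10**6), 1.0)
```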

We now give lower bounds on the round complexity of \(\mathsf {MM}(n,m,k,\mathbb {F})\) that show that the upper bounds of Theorem 1 are tight, except possibly in the case \(\sqrt{kn}\le m < n^2/k\) when \(k\le n\).

Proposition 1

The randomized round complexity of \(\mathsf {MM}(n,m,k,\mathbb {F})\) is

$$ \left\{ \begin{array}{ll} \varOmega (k)&{} { if }\,\, 1\le m\le n,\\ \varOmega (km/n) &{} { if }\,\,m\ge n. \end{array} \right. $$

Proof

We first prove the lower bound \(\varOmega (km/n)\) for any \(m\ge n\). Let us consider instances of \(\mathsf {MM}(n,m,k,\mathbb {F})\) of the following form: for each \(s\in [k]\) all the rows of \(A_{s}\) are zero except the first row; for each \(s\in [k]\) all the columns of \(B_{s}\) are zero except the second column. Let us write \(C_s=A_sB_s\) for each \(s\in [k]\). We prove the lower bound by partitioning the n nodes of the network into the two sets \(\{1\}\) and \(\{2,\ldots ,n\}\), and considering the following two-party communication problem. Alice (corresponding to the set \(\{1\}\)) has for input \(A_{s}[1,j]\) for all \(j\in [m]\) and all \(s\in [k]\). Bob (corresponding to the set \(\{2,\ldots ,n\}\)) has for input \(B_{s}[i,2]\) for all \(i\in [m]\) and all \(s\in [k]\). The goal is for Alice to output \(C_{s}[1,2]\) for all \(s\in [k]\). Note that \(C_{s}[1,2]\) is the inner product (over \(\mathbb {F}\)) of the first row of \(A_{s}\) and the second column of \(B_s\). Thus \(\sum _{s=1}^k C_{s}[1,2]\) is the inner product of two vectors of size km. Alice and Bob must exchange \(\varOmega (km\log |\mathbb {F}|)\) bits to compute this value [4], which requires \(\varOmega (km/n)\) rounds in the original congested clique model.

We now prove the lower bound \(\varOmega (k)\) for any \(m\ge 1\). Let us consider instances of \(\mathsf {MM}(n,m,k,\mathbb {F})\) of the following form: for each \(s\in [k]\), all entries of \(A_{s}\) are zero except the entry \(A_{s}[1,1]\) which is one; for each \(s\in [k]\), \(B_{s}[i,j]=0\) for all \((i,j)\notin \{(1,j)\,|\, j\in \{2,\ldots ,n\}\}\) (the other \(n-1\) entries are arbitrary). Again, let us write \(C_s=A_sB_s\) for each \(s\in [k]\). We prove the lower bound by again partitioning the n nodes of the network into the two sets \(\{1\}\) and \(\{2,\ldots ,n\}\), and considering the following two-party communication problem. Alice has no input. Bob has for input \(B_{s}[1,j]\) for all \(j\in \{2,\ldots ,n\}\) and all \(s\in [k]\). The goal is for Alice to output \(C_{s}[1,j]\) for all \(j\in \{2,\ldots ,n\}\) and all \(s\in [k]\). Since the output reveals Bob’s whole input to Alice, Alice must receive \(\varOmega (k(n-1)\log |\mathbb {F}|)\) bits, which gives round complexity \(\varOmega (k)\) in the original congested clique model.    \(\square \)

One of the main applications of Theorem 1 is the following result, which will imply all our results on the all-pairs shortest paths and diameter computation, including Theorems 4 and 5.

Theorem 7

For any \(M\le n\) and \(m\le n\), the deterministic round complexity of \(\mathsf {DIST}(n,m,M)\) is

$$ \left\{ \begin{array}{ll} O(M\log m)&{} { if }\,\, 0\le m\le \sqrt{Mn\log m},\\ O\left( (M\log m)^{2/\omega (\gamma )}n^{1-2/\omega (\gamma )}\right) &{} { if }\,\,\sqrt{Mn \log m}\le m\le n^2/(M\log m),\\ O\left( mM\log m/n\right) &{} { if }\,\,n^2/(M\log m)\le m\le n, \end{array} \right. $$

where \(\gamma \) is the solution of the equation \( \left( 1\!-\!\frac{\log M}{\log n}\right) \gamma =1-\frac{\log M}{\log n}+\left( \frac{\log m}{\log n}\!-\!1\right) \omega (\gamma ). \)

The proof of Theorem 7 is omitted, but can be found in the full version of the paper. The idea is to show that \(\mathsf {DIST}(n,m,M)\) reduces to \(\mathsf {MM}(n,m,k,\mathbb {F})\) for \(k\approx M\log m\) and a well-chosen finite field \(\mathbb {F}\), and then use Theorem 1 to get a factor \((M\log m)^{2/\omega (\gamma )}\), instead of the factor M obtained in a straightforward implementation of the distance product, in the complexity. This reduction is done by first applying a standard encoding of the distance product into a usual matrix product of matrices with integer entries of absolute value \(\exp (M)\), and then using Fourier transforms to split this latter matrix product into roughly \(M\log m\) independent matrix products over a small field.
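The first step of this reduction, the standard encoding of a distance product into an ordinary matrix product with entries exponential in M, can be illustrated as follows (a Python sketch of ours; the Fourier-splitting step is not shown):

```python
# Entries in {0, ..., M} are encoded as powers of (m + 1); in the integer
# product the largest term (m + 1)^(2M - C[i][j]) dominates, so the min-plus
# value is recovered as 2M minus the integer logarithm of the entry.
M = 4
A = [[0, 3], [2, 1]]          # n x m, entries in {0, ..., M}
B = [[2, 4], [0, 1]]          # m x n
n, m = len(A), len(A[0])
base = m + 1

Ae = [[base ** (M - a) for a in row] for row in A]
Be = [[base ** (M - b) for b in row] for row in B]
Ce = [[sum(Ae[i][s] * Be[s][j] for s in range(m)) for j in range(n)]
      for i in range(n)]

def ilog(x):                  # floor(log_base(x)) for a positive integer x
    e = 0
    while x >= base:
        x //= base
        e += 1
    return e

C = [[2 * M - ilog(Ce[i][j]) for j in range(n)] for i in range(n)]
expected = [[min(A[i][s] + B[s][j] for s in range(m)) for j in range(n)]
            for i in range(n)]
assert C == expected          # the distance product is recovered exactly
```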

4 Deterministic Computation of Determinant and Inverse Matrix

In this section we present deterministic algorithms for computing the determinant of a matrix and the inverse of a matrix in the congested clique model, and prove Theorem 2. Our algorithms can be seen as efficient implementations of the parallel algorithm by Preparata and Sarwate [25] based on the Faddeev-Leverrier method.

Let A be an \(n\times n\) matrix over a field \(\mathbb {F}\). Let \( \det (\lambda I-A)=\lambda ^n+c_1\lambda ^{n-1}+\cdots +c_{n-1}\lambda +c_n \) be its characteristic polynomial. The determinant of A is \((-1)^nc_n\) and, if \(c_n\ne 0\), its inverse is

$$\begin{aligned} A^{-1}=-\frac{A^{n-1}+c_1A^{n-2}+\cdots +c_{n-2}A+c_{n-1}I}{c_n}. \end{aligned}$$

Define the vector \(\mathbf {c}=(c_1,\ldots ,c_n)^T\in \mathbb {F}^{n\times 1}\). For any \(k\in [n]\) let \(s_k\) denote the trace of the matrix \(A^k\), and define the vector \(\mathbf {s}=(s_1,\ldots ,s_n)^T\in \mathbb {F}^{n\times 1}\). Define the \(n\times n\) matrix

$$ S=\left( \begin{array}{ccccc} 1&{}&{}&{}&{}\\ s_1&{}2&{}&{}&{}\\ s_2&{}s_1&{}3&{}&{}\\ \vdots &{}\vdots &{}\vdots &{}\ddots &{}\\ s_{n-1}&{}s_{n-2}&{}s_{n-3}&{}...\,s_1&{}n \end{array} \right) . $$

It can be easily shown (see, e.g., [5, 25]) that \(S\mathbf {c}=-\mathbf {s}\), which enables us to recover \(\mathbf {c}\) from \(\mathbf {s}\) if S is invertible. The matrix S is invertible whenever \(n!\ne 0\), which is true in any field of characteristic zero or in any finite field of characteristic strictly larger than n. The following proposition shows that the inverse of an invertible triangular matrix can be computed efficiently in the congested clique model.
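The identities above can be verified exactly on a small example. The following Python sketch (ours, using exact rationals; it mirrors the Faddeev-Leverrier relations, not the distributed implementation) recovers the characteristic polynomial coefficients by forward substitution in \(S\mathbf {c}=-\mathbf {s}\) and checks both the determinant and the inverse formula:

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

A = [[Fraction(x) for x in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 2]]]
n = len(A)

powers = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]
for _ in range(n):
    powers.append(matmul(powers[-1], A))       # powers[k] = A^k

s = [sum(powers[k][i][i] for i in range(n)) for k in range(1, n + 1)]

# forward substitution in S c = -s (S is lower triangular, diagonal 1, ..., n)
c = []
for k in range(n):
    c.append((-s[k] - sum(s[k - 1 - t] * c[t] for t in range(k))) / (k + 1))

assert (-1) ** n * c[-1] == Fraction(8)        # det(A) = 8 for this A

# A^{-1} = -(A^{n-1} + c_1 A^{n-2} + ... + c_{n-1} I) / c_n
inv = [[-sum(([Fraction(1)] + c)[t] * powers[n - 1 - t][i][j]
             for t in range(n)) / c[-1] for j in range(n)] for i in range(n)]
assert matmul(A, inv) == powers[0]             # A * A^{-1} = I
```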

Proposition 2

Let \(\mathbb {F}\) be any field. There is a deterministic algorithm with round complexity \(O(n^{1-2/\omega })\) that solves \(\mathsf {INV}(n,\mathbb {F})\) when the input A is an invertible lower triangular matrix.

We are now ready to give the proof of Theorem 2.

Proof

(of Theorem 2 ). For convenience we assume that n is a square, and write \(p=\sqrt{n}\). If n is not a square we can easily adapt the proof by taking \(p=\left\lceil \sqrt{n} \right\rceil \). Observe that any integer \(a\in \{0,1,\ldots ,n-1\}\) can be written in a unique way as \(a=(a_1-1)p+(a_2-1)\) with \(a_1,a_2\in [p]\). Below when we write \(a=(a_1,a_2)\in [n]\), we mean that \(a_1\) and \(a_2\) are the two elements in [p] such that \(a=(a_1-1)p+(a_2-1)\).

For any \(\ell \in [n]\), let \(R_\ell \) be the \(p\times n\) matrix such that the i-th row of \(R_\ell \) is the \(\ell \)-th row of \(A^{(i-1)p}\), for each \(i\in [p]\). Similarly, for any \(\ell \in [n]\), let \(C_\ell \) be the \(n\times p\) matrix such that the j-th column of \(C_\ell \) is the \(\ell \)-th column of \(A^{j}\), for each \(j\in [p]\). For each \(\ell \in [n]\) define \(U_\ell =R_\ell C_\ell \), which is a \(p\times p\) matrix, and satisfies \(U_\ell [k_1,k_2]=A^{(k_1-1)p+k_2}[\ell ,\ell ]\) for all \(k_1,k_2\in [p]\). Observe that, writing each \(k\in [n]\) as \(k=(k_1-1)p+k_2\) with \(k_1,k_2\in [p]\), the identity

$$\begin{aligned} s_k=\sum _{\ell =1}^n U_\ell [k_1,k_2] \end{aligned}$$
(2)

holds. We will use this expression, together with the equation \(\mathbf {c}=-S^{-1}\mathbf {s}\) to compute the determinant in the congested clique model.
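Identity (2) can be checked on a toy example. In the following Python sketch (ours), the j-th column of \(C_\ell \) is taken from \(A^{j}\), so that the exponents \((k_1-1)p+k_2\) range exactly over [n] as k does:

```python
import random

# Toy check of identity (2) for n = 4, p = sqrt(n) = 2: summing the entry
# U_l[k1, k2] over all nodes l recovers the trace s_k of A^k.
random.seed(1)
n, p = 4, 2
A = [[random.randint(0, 5) for _ in range(n)] for _ in range(n)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

powers = [[[int(i == j) for j in range(n)] for i in range(n)]]
for _ in range(n):
    powers.append(matmul(powers[-1], A))       # powers[k] = A^k

for k in range(1, n + 1):
    k1, k2 = (k - 1) // p + 1, (k - 1) % p + 1  # k = (k1 - 1) p + k2
    total = 0
    for l in range(n):
        R = [powers[(i - 1) * p][l] for i in range(1, p + 1)]               # p x n
        C = [[powers[j][i][l] for j in range(1, p + 1)] for i in range(n)]  # n x p
        U = [[sum(R[a][t] * C[t][b] for t in range(n)) for b in range(p)]
             for a in range(p)]                 # U_l = R_l C_l
        total += U[k1 - 1][k2 - 1]
    assert total == sum(powers[k][i][i] for i in range(n))  # s_k = tr(A^k)
```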

In order to compute the inverse of A we then use the following approach. For any \((a_1,a_2)\in [p]\times [p]\), define the coefficient \(c_{a_1,a_2}\in \mathbb {F}\) as follows:

$$ c_{a_1,a_2}=\left\{ \begin{array}{ll} c_{n-1-(a_1-1)p-(a_2-1)}&{}\text { if }(a_1,a_2)\ne (p,p),\\ 1&{}\text { if }(a_1,a_2)=(p,p). \end{array} \right. $$

For any \(a_2\in [p]\), define the \(n\times n\) matrix \(E_{a_2}\) as follows:

$$\begin{aligned} E_{a_2}=\sum _{a_1=1}^p c_{a_1,a_2}A^{(a_1-1)p}. \end{aligned}$$

Note that the following holds whenever \(c_n\ne 0\):

$$ A^{-1}=-\frac{\sum _{a=0}^{n-1}c_{n-1-a}A^a}{c_n} =-\frac{\sum _{a_1=1}^{p}\sum _{a_2=1}^p c_{a_1,a_2}A^{(a_1-1)p+(a_2-1)}}{c_n}, $$

which gives

$$\begin{aligned} A^{-1} =-\frac{\sum _{a_2=1}^{p} E_{a_2}A^{a_2-1}}{c_n}. \end{aligned}$$
(3)
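Equation (3) can likewise be verified exactly on a small instance. The following Python sketch (ours, with \(n=4\) and \(p=2\), using exact rationals and the convention \(c_{a_1,a_2}=c_{n-1-(a_1-1)p-(a_2-1)}\) with \(c_0=1\)) reconstructs \(A^{-1}\) from the matrices \(E_{a_2}\):

```python
from fractions import Fraction

n, p = 4, 2
A = [[Fraction(x) for x in row]
     for row in [[2, 1, 0, 0], [1, 3, 1, 0], [0, 1, 2, 1], [0, 0, 1, 3]]]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

powers = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]
for _ in range(n):
    powers.append(matmul(powers[-1], A))          # powers[k] = A^k

# characteristic polynomial coefficients via Newton's identities S c = -s
s = [sum(powers[k][i][i] for i in range(n)) for k in range(1, n + 1)]
c = []
for k in range(n):
    c.append((-s[k] - sum(s[k - 1 - t] * c[t] for t in range(k))) / (k + 1))

coef = {0: Fraction(1)}                           # c_0 = 1 (leading coefficient)
for k in range(1, n):
    coef[k] = c[k - 1]

inv = [[Fraction(0)] * n for _ in range(n)]
for a2 in range(1, p + 1):
    E = [[sum(coef[n - 1 - (a1 - 1) * p - (a2 - 1)] * powers[(a1 - 1) * p][i][j]
              for a1 in range(1, p + 1)) for j in range(n)] for i in range(n)]
    T = matmul(E, powers[a2 - 1])                 # E_{a2} A^{a2 - 1}
    inv = [[inv[i][j] + T[i][j] for j in range(n)] for i in range(n)]
inv = [[-x / c[-1] for x in row] for row in inv]  # Eq. (3)

assert matmul(A, inv) == powers[0]                # A * A^{-1} = I
```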
Fig. 1. Distributed algorithm for computing the determinant of an \(n\times n\) matrix A and computing \(A^{-1}\) if \(\det (A)\ne 0\). Initially each node \(\ell \in [n]\) has as input \(A[\ell ,*]\) and \(A[*,\ell ]\). (Figure omitted from this extraction.)

The algorithm for \(\mathsf {DET}(n,\mathbb {F})\) and \(\mathsf {INV}(n,\mathbb {F})\) is described in Fig. 1. Steps 1 and 7.2 can be implemented in \(O(p^{2/\omega }n^{1-2/\omega })\) rounds from Theorem 1 (or its simplified version in the introduction). Step 5 can be implemented in \(O(n^{1-2/\omega })\) rounds, again from Theorem 1. At Steps 2, 3 and 6 each node receives n elements from the field \(\mathbb {F}\), so each of these three steps can be implemented in two rounds from Lemma 1. The other steps (Steps 4, 7.1 and 7.3) do not require any communication. The total round complexity of the algorithm is thus \( O\left( p^{2/\omega }n^{1-2/\omega }\right) =O\left( n^{1-1/\omega }\right) , \) as claimed.     \(\square \)