1 Introduction

Over the last two decades, a significant line of work in theoretical algorithms has made progress in solving linear systems of the form \({{\textbf {L}}}{{\textbf {x}}} = {{\textbf {b}}}\), where \({{\textbf {L}}}\) is the Laplacian matrix of a weighted graph with weights \(w(i,j)>0\) on the edges. Starting with the work of Spielman and Teng [1], researchers have devised a number of algorithms that run in near-linear time in the number of edges of the graph (corresponding to the number of non-zeros in the matrix \({{\textbf {L}}}\)). The solution \({{\textbf {x}}}\) of the linear system can be interpreted as the potentials of an electrical flow in which the resistance of each edge is \(r(i,j)=1/w(i,j)\), and the current supplied to each node i is b(i). There have been many nice algorithmic ideas introduced using this interpretation of the linear system, as well as many applications of the fast algorithms for solving this system to other flow problems, such as the maximum flow problem [2,3,4,5,6].

Since the initial work of Spielman and Teng, a number of different near-linear time algorithms have been proposed. In this paper, we focus on a particularly simple, combinatorial algorithm by Kelner et al. [7], hereafter referred to as the KOSZ algorithm; this algorithm has been the subject of several implementation studies [8,9,10,11]. The KOSZ algorithm uses the idea of an electrical flow in solving the linear system \({{\textbf {L}}}{{\textbf {x}}} = {{\textbf {b}}}\). An electrical flow \({{\textbf {f}}}\) is one that obeys the flow conservation constraints at each node i, saying that the net flow out of i is b(i) (sometimes known in this context as the Kirchhoff Current Law, or KCL), and Ohm’s Law, which says that there exists a vector \({{\textbf {x}}}\) such that the flow from i to j, f(i,j), equals \((x(i) - x(j))/r(i,j)\). There exists a flow \({{\textbf {f}}}\) that satisfies these two properties, and the corresponding potential vector \({{\textbf {x}}}\) solves \({{\textbf {L}}} {{\textbf {x}}} = {{\textbf {b}}}\). Ohm’s Law is known to be equivalent to the Kirchhoff Potential Law (KPL), which says that the flow \({{\textbf {f}}}\) satisfies the property that around any directed cycle C, \(\sum _{(i,j) \in C} r(i,j) f(i,j) = 0\). Given KPL, potentials satisfying Ohm’s Law can be constructed by picking any spanning tree T rooted at a vertex r, setting x(r) to 0, and x(k) to the sum of f(i,j)r(i,j) on the path in T from k to r; these potentials are known as tree-induced potentials.
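
As a concrete illustration of tree-induced potentials, here is a minimal Python sketch (our own, purely for illustration; the graph representation and function names are not from [7]) that computes them by a single traversal of T:

```python
def tree_induced_potentials(n, root, tree_adj, f, r):
    """Tree-induced potentials for a flow f on a spanning tree T.

    tree_adj[u] lists the tree neighbors of u; f[(i, j)] = -f[(j, i)] is the
    flow from i to j, and r[(i, j)] = r[(j, i)] is the resistance of edge (i, j).
    We set x(root) = 0 and, walking from a vertex u to a child k, add the
    voltage drop f(k, u) * r(k, u), so that x(k) is the sum of f(i, j) r(i, j)
    along the tree path from k to the root.
    """
    x = [None] * n
    x[root] = 0.0
    stack = [root]
    while stack:
        u = stack.pop()
        for k in tree_adj[u]:
            if x[k] is None:
                x[k] = x[u] + f[(k, u)] * r[(k, u)]
                stack.append(k)
    return x
```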

The KOSZ algorithm starts by picking a spanning tree T; for the running time of the algorithm, it is important that the tree has low stretch; the definition of stretch is otherwise not crucial to the description of the algorithm. It then constructs a flow \({{\textbf {f}}}\) that satisfies flow conservation using only the edges in T. For a near-linear number of iterations, the algorithm picks at random a non-tree edge (i,j) and considers the fundamental cycle C closed by adding (i,j) to T; it then alters the flow \({{\textbf {f}}}\) along C to satisfy KPL (and such that KCL continues to be satisfied). By picking (i,j) with the appropriate probability, Kelner et al. show that the energy of the resulting flow decreases by a factor \(1 - \frac{1}{\tau }\) in expectation, where \(\tau \) is a parameter related to the stretch. The algorithm then returns the tree-induced potentials \({{\textbf {x}}}\) associated with T and the flow \({{\textbf {f}}}\). Kelner et al. [7] show that the resulting potentials are close to the potentials of the associated electrical flow for the graph. The KOSZ algorithm has the pleasing properties that it is easy to understand both the algorithm and the analysis, and it is also close in spirit to known network flow algorithms; in particular, it resembles the primal network simplex algorithm for the minimum-cost flow problem.

The primal network simplex algorithm for the minimum-cost flow problem has a natural dual algorithm in the dual network simplex algorithm, in which node potentials are altered on one side of a fundamental cut of a tree (the cut induced by removing an edge in the tree). Similarly, polynomial-time cycle-canceling algorithms for minimum-cost flow (e.g. Goldberg and Tarjan [12]) have natural dual analogs of polynomial-time cut-cancelling algorithms (e.g. Ervolina and McCormick [13]). We refer to the first type of algorithm as a cycle-toggling algorithm, and the dual analog as a cut-toggling algorithm. Thus, the KOSZ algorithm is a cycle-toggling algorithm for solving Laplacian linear systems. However, no corresponding cut-toggling algorithm exists in the literature, leading immediately to the following question:

Does there exist a cut-toggling algorithm for solving Laplacian linear systems/computing near-minimum energy flows, and how efficiently can it be implemented?

1.1 Our Contributions

We propose a dual analog of the KOSZ algorithm which performs cut-toggling rather than cycle-toggling. We refer to this algorithm as Dual KOSZ, and we show it converges in a nearly-linear number of cut-toggling steps. Thus, Dual KOSZ would be a nearly-linear time algorithm if each cut-toggling operation could be implemented to run in polylogarithmic time.

Our next contribution is to show that the natural data structure abstraction of this cut-toggling process can be reduced to the online matrix–vector (OMv) problem, which has been conjectured to be hard [14]. This implies that it is unlikely for there to be a black-box data structure that implements a single cut-toggling operation in sublinear time, unlike cycle toggling.

This result initially seems to present an insurmountable difficulty to obtaining a nearly-linear time algorithm. However, we show that we can exploit the offline nature of the cut toggles, obtaining an exact data structure for computing a sequence of K cut-toggles in \(O(K\sqrt{m})\) time total, which yields an algorithm that runs in \({\tilde{O}}(m^{1.5})\) time overall. Interestingly, Boman, Deweese, and Gilbert [9] explored an implementation of KOSZ that also batched its cycle-toggling updates by looking for collections of edge-disjoint cycles.

We further show that by incorporating sparsification and its associated approximations, we can reduce the running time to almost-linear: despite the per-operation lower bound, the cut-toggling algorithm can still be implemented in almost-linear time.

Our result demonstrates that graph optimization algorithms and dynamic graph data structures can—and sometimes need to—interact in more intricate fashion than optimization in the outer loop and black-box data structure in the inner loop.

The remainder of the paper is structured as follows. In Sect. 1.2, we give a high-level overview of Dual KOSZ and our implementation ideas for obtaining an almost-linear running time. Section 2 describes some notation and concepts we will use. Section 3 presents Dual KOSZ in detail and shows that it converges in a near-linear number of cut-toggling iterations, with each iteration running in linear time. In Sect. 4, we abstract the problem of toggling a cut to a data structure problem, and show that given the OMv conjecture of [14], we cannot implement the operations needed in sublinear time. In Sect. 5, we show how to overcome this difficulty by batching the cut-toggle operations, and further speed up the algorithm through sparsification and recursion.

1.2 Technical Overview

As the KOSZ algorithm maintains a flow \({{\textbf {f}}}\) and, implicitly, tree-induced potentials \({{\textbf {x}}}\), its natural dual is to maintain potentials \({{\textbf {x}}}\), which implicitly define a flow \({{\textbf {f}}}\). The Dual KOSZ algorithm starts by choosing a low-stretch spanning tree T. It maintains a set of potentials \({{\textbf {x}}}\) (initially zero), and the corresponding (infeasible) flow \({{\textbf {f}}}\) implied by Ohm’s Law. In each iteration, we sample a fundamental cut S of the tree T and perform a cut-toggling update so that the net flow leaving S is \(\sum _{i \in S} b(i)\), as required in every feasible flow. Following arguments dual to those made in Kelner et al., we show that this algorithm also needs only a near-linear number of iterations to find a near-optimal set of potentials \({{\textbf {x}}}\) and flow \({{\textbf {f}}}\).

Theorem 1

Let \(\tau \) be the total stretch of T. After \(K = \tau \ln (\frac{\tau }{\epsilon })\) iterations, Dual KOSZ returns \({{\textbf {x}}}^K \in {\mathbb {R}}^V\) and \({{\textbf {f}}}^K \in {\mathbb {R}}^{\vec {E}}\) such that \({\mathbb {E}}\left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}^K}\right\Vert _{{\textbf {L}}}^2 \le \frac{\epsilon }{\tau }\left\Vert {{{\textbf {x}}}^*}\right\Vert _{{\textbf {L}}}^2\) and \({\mathbb {E}}[{\mathcal {E}}({{\textbf {f}}}^K)] \le (1+\epsilon ) {\mathcal {E}}({{\textbf {f}}}^*)\), for \({{\textbf {f}}}^*\) and \({{\textbf {x}}}^*\) optimal primal and dual solutions respectively.

Here \(\Vert {{\textbf {y}}}\Vert _{{{\textbf {L}}}} = \sqrt{{{\textbf {y}}}^{^\top }{{\textbf {L}}}{{\textbf {y}}}}\), and \({\mathcal {E}}({{\textbf {f}}})\) is the energy of flow \({{\textbf {f}}}\).

However, unlike Kelner et al., we cannot show that each individual cut-toggling update can be made to run in polylogarithmic time. If we abstract the desired cut-toggling update step as a natural data structure problem, we show that such a data structure cannot be implemented in \(O(n^{1-\epsilon })\) time for any \(\epsilon > 0\) given a conjecture about the online matrix–vector multiplication problem (OMv) made by Henzinger, Krinninger, Nanongkai and Saranurak [14]. They have conjectured that this problem does not have any algorithm that can carry out an online sequence of n Boolean matrix–vector multiplications in time \(O(n^{3-\epsilon })\), and show that if the conjecture is false, then various long-standing dynamic graph problems will have faster algorithms. We show that a single Boolean matrix–vector multiply can be carried out as a sequence of O(n) operations of our desired data structure. Given the conjecture, then, we cannot implement the data structure operations in \(O(n^{1-\epsilon })\) time. Thus there is not a straightforward near-linear time version of the Dual KOSZ algorithm.

In a bit more detail, the data structure we define is given as input an undirected graph with a spanning tree T, edge resistances, a supply b(v), and a potential \(\textsf{value}(v)\) at every vertex v. The data structure supports two operations: (1) additively update the value of all the vertices in a given subtree of T, and (2) query the value of the flow induced by the potential values across any given fundamental cut of T. We show that implementing these operations in sublinear time would refute the OMv conjecture, even if the data structure only needs to decide whether the value of the flow is non-zero. Therefore, we obtain a lower bound for this data structure conditional on the OMv conjecture [14]. This holds even if all edge resistances are 1 and all b(v) are 0.

Nevertheless, we get around this data structural lower bound by exploiting the fact that the sequence of cuts to be updated can be sampled in advance and, thus, the updates can be batched, circumventing the “online” (or “sequential”) requirement in OMv. This is possible because both the spanning tree T and the probability distribution over cuts of T are fixed at the beginning of the algorithm. More precisely, denote the number of iterations of Dual KOSZ by K (which is \({\widetilde{O}}(m)\)). Instead of sampling the fundamental cuts one at a time, consider sampling the next l cuts that need to be updated for some \(l \ll K\). In each “block” of size \(l \ll K\), we contract all the edges of T that do not correspond to one of the l fundamental cuts to be updated. In this way, we work with a contracted tree of size O(l) in each block (instead of the full tree, which has size O(n)). This makes the updates faster. However, the price we pay is that at the end of each block, we need to propagate the updates we made (which were on the contracted tree) back to the entire tree. Overall, we show that each block takes \(O(l^2 + m)\) time. Since there are \({\widetilde{O}}(\frac{m}{l})\) blocks, the total runtime is \(\widetilde{O}(ml + \frac{m^2}{l})\). Choosing \(l=\sqrt{m}\) thus gives an \({\widetilde{O}}(m^{1.5})\)-time algorithm.

By augmenting the batching idea with sparsification and recursion, one can further improve the running time of Dual KOSZ to \({\widetilde{O}}(m^{1+\delta })\) for any \(\delta > 0\). To do this, observe that l cut-toggling updates effectively break the spanning tree into \(l+1\) components. After contracting the components to get a graph H with \(l+1\) vertices, we can show that solving an appropriate Laplacian system on H gives a single update step that makes at least as much progress as the sequence of l updates performed by the straightforward unbatched algorithm. A natural approach is to solve this Laplacian system by recursively calling the algorithm. However, this by itself does not give an improved running time. Instead, we first spectrally sparsify H and then call the algorithm recursively to solve the sparsified Laplacian system. Here we use a combinatorial spectral sparsifier [16] because it does not require calling Laplacian solvers as a subroutine (e.g. [17]). By carefully analyzing the error incurred by sparsification, we are able to show that the update step using sparsification makes about as much progress as the update step without sparsification. The total running time of the recursive algorithm is then obtained by bounding the time taken at each layer of the recursion tree.

Theorem 2

For any \(\delta \in (0,1)\) and \(\epsilon > 0\), Dual KOSZ with batching, sparsification, and recursion finds \({{\textbf {x}}}\) with \({\mathbb {E}}\left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}}\right\Vert _{{\textbf {L}}}^2 \le \epsilon \left\Vert {{{\textbf {x}}}^*}\right\Vert _{{\textbf {L}}}^2\) in \(O(m^{1+\delta }(\log n)^{\frac{8}{\delta } }(\log \frac{1}{\epsilon })^{\frac{1}{\delta }})\) time.

2 Notation and Problem Statement

In this section, we give some notation and define concepts we will use in our algorithm. We are given as input an undirected graph \(G = (V, E)\), with positive resistances \({{\textbf {r}}} \in {\mathbb {R}}_+^E\). Although our graph is undirected, it will be helpful for us notationally to direct the edges. To that end, fix an arbitrary orientation of the edges, and denote the set of directed edges by \(\vec {E}\).

In addition to G and the resistances \({{\textbf {r}}}\), we are given a supply vector \({{\textbf {b}}} \in {\mathbb {R}}^V\).

The Laplacian matrix of G with respect to the resistances \({{\textbf {r}}}\) is the matrix \({{\textbf {L}}} \in {\mathbb {R}}^{V \times V}\) defined by

$$\begin{aligned} {{\textbf {L}}} = \sum _{ij \in E} \frac{1}{r(i,j)}({{\textbf {e}}_{\textbf {i}}} - {{\textbf {e}}_{\textbf {j}}})({{\textbf {e}}_{\textbf {i}}} - {{\textbf {e}}_{\textbf {j}}}){^\top }, \end{aligned}$$

where \({\textbf {e}}_{\textbf {i}}\) is the ith unit basis vector. We note then that \({{\textbf {x}}}{^\top }{{\textbf {L}}} {{\textbf {x}}} = \sum _{(i,j) \in E} \frac{1}{r(i,j)}(x(i)-x(j))^2\) for any vector \({{\textbf {x}}}\).
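
As a quick concreteness check of this definition, here is a small NumPy sketch (our own illustration with made-up data, not code from the paper) that builds \({{\textbf {L}}}\) from an edge list with resistances and verifies the quadratic-form identity:

```python
import numpy as np

def laplacian(n, edges):
    """Weighted Laplacian for edges given as (i, j, r_ij)."""
    L = np.zeros((n, n))
    for i, j, r in edges:
        e = np.zeros(n)
        e[i], e[j] = 1.0, -1.0
        L += np.outer(e, e) / r          # (1/r)(e_i - e_j)(e_i - e_j)^T
    return L

edges = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 4.0)]
L = laplacian(3, edges)
x = np.array([1.0, 0.5, -2.0])
# x^T L x equals the weighted sum of squared potential differences.
assert abs(x @ L @ x - sum((x[i] - x[j]) ** 2 / r for i, j, r in edges)) < 1e-9
```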

Our goal is to solve the system of linear equations \({{\textbf {L}}}{{\textbf {x}}} = {{\textbf {b}}}\) for \({{\textbf {x}}}\). However, we will not be able to solve \({{\textbf {L}}}{{\textbf {x}}} = {{\textbf {b}}}\) exactly, so we will solve it approximately instead. It is usual to measure the quality of a solution \({{\textbf {x}}}\) in terms of the matrix norm induced by \({{\textbf {L}}}\). In other words, if \({{\textbf {x}}}\) is the vector of potentials returned by our algorithm and \({{\textbf {x}}}^*\) is an actual solution to \({{\textbf {L}}}{{\textbf {x}}}^* = {{\textbf {b}}}\), then the error of our solution \({{\textbf {x}}}\) is

$$\begin{aligned} \left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}}\right\Vert _{{{\textbf {L}}}}^2:= \left( {{\textbf {x}}}^* - {{\textbf {x}}}\right) {^\top }{{\textbf {L}}} \left( {{\textbf {x}}}^* - {{\textbf {x}}}\right) . \end{aligned}$$

Hence, our objective is to find \({{\textbf {x}}} \in {\mathbb {R}}^V\) that minimizes \(\left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}}\right\Vert _{{{\textbf {L}}}}^2\). A precise statement of this is given below.

[Problem statement box omitted: find \({{\textbf {x}}} \in {\mathbb {R}}^V\) approximately minimizing \(\left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}}\right\Vert _{{{\textbf {L}}}}^2\), as described above.]

Of course, the algorithm does not know the actual solution \({{\textbf {x}}}^*\); it appears only in the analysis.

Equations of the form \({{\textbf {L}}}{{\textbf {x}}} = {{\textbf {b}}}\), where \({{\textbf {L}}}\) is the Laplacian matrix of a graph, are called Laplacian systems, and are found in a wide variety of applications in computer science and other fields. When interpreted as an optimization problem, solving a Laplacian linear system has the following nice interpretation: it is the dual of the problem of finding an electrical flow. The primal problem below is that of finding an electrical flow; here \({{\textbf {A}}}\) is the vertex-arc incidence matrix of \((V, \vec {E})\). The optimal solution to the primal is called the electrical flow in G defined by the resistances \({{\textbf {r}}}\). The dual problem is equivalent to solving \({{\textbf {L}}}{{\textbf {x}}}= {{\textbf {b}}}\); this fact can be seen by setting the gradient of the dual objective to equal 0 since the dual is concave.

$$\begin{aligned} (P) \quad&\min \quad \frac{1}{2} \sum _{(i,j)\in \vec {E}} r(i,j){f(i,j)}^2 \qquad \qquad (D)&\max \quad {{\textbf {b}}}{^\top }{{\textbf {x}}}- \frac{1}{2} {{\textbf {x}}}{^\top }{{\textbf {L}}}{{\textbf {x}}}\\&\text {s.t.} \quad {{\textbf {A}}}{{\textbf {f}}}= {{\textbf {b}}}&\text {s.t.} \quad {{\textbf {x}}}\in {\mathbb {R}}^{V}~\text {unconstrained} \end{aligned}$$
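
To make the equivalence explicit: since the dual objective is concave, its maximizers are exactly its stationary points, and setting its gradient to zero gives

$$\begin{aligned} \nabla _{{{\textbf {x}}}}\left( {{\textbf {b}}}{^\top }{{\textbf {x}}}- \frac{1}{2} {{\textbf {x}}}{^\top }{{\textbf {L}}}{{\textbf {x}}}\right) = {{\textbf {b}}}- {{\textbf {L}}}{{\textbf {x}}}= {{\textbf {0}}}, \end{aligned}$$

i.e., maximizing (D) is the same as solving \({{\textbf {L}}}{{\textbf {x}}}= {{\textbf {b}}}\).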

Let \({\mathcal {E}}({{\textbf {f}}})\) denote the primal objective and \({\mathcal {B}}({{\textbf {x}}})\) denote the dual objective. The primal objective is sometimes referred to as the energy of the electrical flow \({{\textbf {f}}}\). Also, let \({{\textbf {f}}}^*\) denote the optimal primal solution and let \({{\textbf {x}}}^*\) denote an optimal dual solution. Note that there are infinitely many dual solutions, because the dual objective is invariant under adding a constant to every component of \({{\textbf {x}}}\). By strong duality (note that Slater’s condition holds), \({\mathcal {E}}({{\textbf {f}}}^*) = {\mathcal {B}}({{\textbf {x}}}^*)\). Moreover, the KKT conditions give a primal-dual characterization of optimality.

Fact 1

(KKT Conditions for Electrical Flow) Consider \({{\textbf {f}}}\in {\mathbb {R}}^{\vec {E}}\) and \({{\textbf {x}}}\in {\mathbb {R}}^V\). Then \({{\textbf {f}}}\) is optimal for the primal and \({{\textbf {x}}}\) is optimal for the dual if and only if the following conditions hold:

  1. \({{\textbf {f}}}\) is a feasible \({{\textbf {b}}}\)-flow;

  2. (Ohm’s Law) \(f(i, j) = \frac{x(i) - x(j)}{r(i,j)}\) for all \((i, j) \in \vec {E}\).

Thus solving \({{\textbf {L}}}{{\textbf {x}}} = {{\textbf {b}}}\) and finding the electrical \({{\textbf {b}}}\)-flow in G are equivalent: Given a solution \({{\textbf {x}}}\) to \({{\textbf {L}}}{{\textbf {x}}} = {{\textbf {b}}}\), we can calculate the electrical flow \({{\textbf {f}}}\) using \(f(i,j) = \frac{x(i) - x(j)}{r(i, j)}\). On the other hand, given the electrical flow \({{\textbf {f}}}\), we can recover corresponding potentials \({{\textbf {x}}}\) by setting \(x(v) = 0\) for some arbitrary vertex v, and using the equation \(f(i,j) = \frac{x(i) - x(j)}{r(i,j)}\) to solve for the potentials on every other vertex.

A \({{\textbf {b}}}\)-flow satisfies Ohm’s Law if and only if it satisfies the Kirchhoff Potential Law (KPL): KPL states that for every directed cycle C, \(\sum _{(i,j) \in C} f(i,j)r(i,j) = 0\).

Both the Kelner et al. algorithm and our algorithm use a low-stretch spanning tree T. Given resistances \({{\textbf {r}}}\), the stretch of a tree is defined as

$$\begin{aligned} \text {st}_T(G) = \sum _{(i, j) \in \vec {E}} \text {st}_T(i, j) = \sum _{(i, j) \in \vec {E}} \frac{1}{r(i, j)}\sum _{(k, l) \in P(i,j)} r(k, l), \end{aligned}$$

where P(i,j) is the unique path from i to j in T. We can find a spanning tree T with stretch \(\text {st}_T(G)=O(m\log n \log \log n)\) in \(O(m \log n \log \log n)\) time [18].
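
To make the definition concrete, here is a brute-force Python sketch (our own, purely illustrative; it runs in O(mn) time rather than the near-linear time of [18]) that computes \(\text {st}_T(G)\) when T is given by a parent array:

```python
def total_stretch(n, edges, parent, r_tree):
    """Total stretch st_T(G) of a spanning tree T.

    edges: list of (i, j, r_ij) for all graph edges; parent[v] is v's parent
    in T (the root is its own parent); r_tree[v] is the resistance of the tree
    edge (v, parent[v]).  The stretch of edge (i, j) is the total resistance
    of the tree path between i and j, divided by r(i, j).
    """
    def depth(v):
        d = 0
        while parent[v] != v:
            d += 1
            v = parent[v]
        return d

    total = 0.0
    for i, j, r_ij in edges:
        path_resistance = 0.0
        u, w = i, j
        du, dw = depth(u), depth(w)
        while du > dw:                       # lift the deeper endpoint
            path_resistance += r_tree[u]
            u, du = parent[u], du - 1
        while dw > du:
            path_resistance += r_tree[w]
            w, dw = parent[w], dw - 1
        while u != w:                        # climb both sides to the LCA
            path_resistance += r_tree[u] + r_tree[w]
            u, w = parent[u], parent[w]
        total += path_resistance / r_ij
    return total
```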

We use the notation \(\mathbb {1}\) to stand for the vector of all 1s, and \(\mathbb {1}_X\) to be the characteristic vector of a set X that has 1s in the entries corresponding to the elements of X and 0s elsewhere.

3 A Cut-Toggling Algorithm for Solving Laplacian Linear Systems

We present a cut-toggling algorithm for computing an approximate solution to \({{\textbf {L}}}{{\textbf {x}}}= {{\textbf {b}}}\), and also an approximate minimum-energy \({{\textbf {b}}}\)-flow. The goal of this section is to show that the cut-toggling algorithm converges in a near-linear number of iterations, and that each iteration runs in linear time. Later in Sect. 5, we will show how to speed up the algorithm to an almost-linear total running time. Since our algorithm is dual to the cycle-toggling algorithm of Kelner et al. [7] (which we call KOSZ in this paper), we will begin by describing the KOSZ algorithm.

The KOSZ algorithm works by maintaining a feasible \({{\textbf {b}}}\)-flow \({{\textbf {f}}}\), and iteratively updates \({{\textbf {f}}}\) along cycles to satisfy Kirchhoff’s Potential Law on the cycle. It starts by choosing a spanning tree T that has low stretch, and computes a \({{\textbf {b}}}\)-flow \({{\textbf {f}}}^0\) that uses only edges in the tree T. Then for a number of iterations K that depends on the stretch of the tree, it chooses a non-tree edge \((i,j) \in E-T\) according to a probability distribution, and for the fundamental cycle closed by adding edge (i,j) to T, it modifies the flow \({{\textbf {f}}}\) so that Kirchhoff’s Potential Law is satisfied on the cycle. The probability \(P_{ij}\) that edge (i,j) gets chosen is proportional to the total resistance around the cycle closed by (i,j) divided by r(i,j). Given the tree T with root r and the current flow \({{\textbf {f}}}^t\) in iteration t, there is a standard way to define a set of potentials \({{\textbf {x}}}^t\) (called the tree-induced or tree-defined potentials): set x(r) to 0, and x(k) to the sum of f(i,j)r(i,j) on the path in T from k to r. We summarize KOSZ in Algorithm 1.

Algorithm 1 (pseudocode figure omitted): The KOSZ algorithm for solving \({{\textbf {L}}}{{\textbf {x}}} = {{\textbf {b}}}\).

Our algorithm, which we will call Dual KOSZ, works by maintaining a set of potentials \({{\textbf {x}}}\). It iteratively samples cuts in the graph, updating potentials on one side of the cut to satisfy flow conservation across that cut. Following KOSZ, we choose a spanning tree T of low stretch. Then for a number of iterations K that depends on the stretch of tree T, we repeatedly sample a fundamental cut from the spanning tree (i.e. a cut induced by removing one of the tree edges). We update all of the potentials on one side of the cut by an amount \(\Delta \) so that the amount of flow crossing the cut via Ohm’s Law is what is required by the supply vector. We summarize Dual KOSZ in Algorithm 2. The main result of this section is a bound on the iteration complexity of Dual KOSZ.

Theorem 1

Let \(\tau \) be the total stretch of T. After \(K = \tau \ln (\frac{\tau }{\epsilon })\) iterations, Dual KOSZ returns \({{\textbf {x}}}^K \in {\mathbb {R}}^V\) and \({{\textbf {f}}}^K \in {\mathbb {R}}^{\vec {E}}\) such that \({\mathbb {E}}\left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}^K}\right\Vert _{{\textbf {L}}}^2 \le \frac{\epsilon }{\tau }\left\Vert {{{\textbf {x}}}^*}\right\Vert _{{\textbf {L}}}^2\) and \({\mathbb {E}}[{\mathcal {E}}({{\textbf {f}}}^K)] \le (1+\epsilon ) {\mathcal {E}}({{\textbf {f}}}^*)\), for \({{\textbf {f}}}^*\) and \({{\textbf {x}}}^*\) optimal primal and dual solutions respectively.

Next we give the algorithm in somewhat more detail. Let \(R(C) = (\sum _{(k,l) \in \delta (C)} \frac{1}{r(k,l)})^{-1}\) for \(C \subset V\), where \(\delta (C)\) is the set of edges with exactly one endpoint in C. Note that R(C) has units of resistance. For every tree edge (i,j), let C(i,j) be the set of vertices on one side of the fundamental cut defined by (i,j), such that \(i \in C(i, j)\) and \(j \not \in C(i, j)\). We set up a probability distribution \(P_{ij}\) on edges (i,j) in the spanning tree T, where \(P_{ij} \propto \frac{r(i,j)}{R(C(i,j))}\). We initialize potentials \(x^0(i)\) to 0 for all nodes \(i \in V\). In each iteration, we sample edge \((i,j) \in T\) according to the probabilities \(P_{ij}\). Let \(b(C) = {{\textbf {b}}}{^\top }\mathbb {1}_C\) be the total supply of the nodes in C. Note that b(C) is also the amount of flow that should be flowing out of C in any feasible \({{\textbf {b}}}\)-flow.

Let \(f^{t}(C)\) be the total amount of flow going out of C in the flow induced by \({{\textbf {x}}}^{t}\). That is,

$$\begin{aligned} f^{t}(C) = \sum _{\begin{array}{c} ij \in E \\ i \in C, \, j \not \in C \end{array}} \frac{{x}^t(i) - {x}^t(j)}{r(i,j)}. \end{aligned}$$

Note that \(f^t(C)\) can be positive or negative. In any feasible \({{\textbf {b}}}\)-flow, the amount of flow leaving C should be equal to \({{\textbf {b}}}{^\top }\mathbb {1}_C = b(C)\). Hence, we define \(\Delta ^t = (b(C) - f^{t}(C))\cdot R(C).\) Observe that \(\Delta ^t\) is precisely the quantity by which we need to increase the potentials of every node in C so that flow conservation is satisfied on \(\delta (C)\). We then update the potentials, so that

$$\begin{aligned} x^{t+1}(v) = {\left\{ \begin{array}{ll} x^{t}(v) +\Delta ^t, &{}\text {if } v \in C, \\ x^{t}(v), &{}\text {if } v \not \in C. \end{array}\right. } \end{aligned}$$

After K iterations, we return the final potentials \({{\textbf {x}}}^{K}\). The last step is to convert \({{\textbf {x}}}^K\) to a feasible flow by taking a tree-defined flow with respect to T: \(f^K(i,j) = \frac{x^K(i)-x^K(j)}{r(i,j)}\) on all non-tree edges, and \({{\textbf {f}}}^K\) routes the unique flow on T to make \({{\textbf {f}}}^K\) a feasible \({{\textbf {b}}}\)-flow.
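
The following minimal Python sketch (our own illustration; the data layout is hypothetical) carries out a single cut-toggling iteration naively, recomputing f(C) by scanning \(\delta (C)\):

```python
def cut_toggle_iteration(C, edges, x, b_C, R_C):
    """One Dual KOSZ update for the fundamental cut C (a set of vertices).

    edges: list of (i, j, r_ij); x: mutable list of potentials; b_C = b(C);
    R_C = R(C) = (sum over delta(C) of 1/r)^(-1).  Shifts every potential in
    C by Delta = (b(C) - f(C)) * R(C), after which the flow induced by x
    satisfies conservation across delta(C).
    """
    f_C = 0.0                                  # potential-defined flow out of C
    for i, j, r_ij in edges:
        if i in C and j not in C:
            f_C += (x[i] - x[j]) / r_ij
        elif j in C and i not in C:
            f_C += (x[j] - x[i]) / r_ij
    delta = (b_C - f_C) * R_C
    for v in C:
        x[v] += delta
    return delta
```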

Algorithm 2 (pseudocode figure omitted): Algorithm Dual KOSZ for solving \({{\textbf {L}}}{{\textbf {x}}} = {{\textbf {b}}}\).

3.1 Analysis of Dual KOSZ

Recall that \({\mathcal {E}}({{\textbf {f}}}) = \frac{1}{2}\sum _{e \in E} r(e){f(e)}^2\) and \({\mathcal {B}}({{\textbf {x}}}) = {{\textbf {b}}}{^\top }{{\textbf {x}}}- \frac{1}{2}{{\textbf {x}}}{^\top }{{\textbf {L}}}{{\textbf {x}}}\). By convex duality, we have \({\mathcal {B}}({{\textbf {x}}}) \le {\mathcal {E}}({{\textbf {f}}})\) for any \({{\textbf {x}}}\in {\mathbb {R}}^V\) and \({{\textbf {b}}}\)-flow \({{\textbf {f}}}\). Moreover, \({{\textbf {x}}}\) maximizes \({\mathcal {B}}({{\textbf {x}}})\) if and only if \({{\textbf {L}}}{{\textbf {x}}}= {{\textbf {b}}}\) (see e.g. [19, Lemma 8.9]). Thus solving the Laplacian system \({{\textbf {L}}}{{\textbf {x}}} = {{\textbf {b}}}\) is equivalent to finding a vector of potentials that maximizes the dual objective. In what follows, we present the lemmas that form the bulk of the analysis. Their proofs are deferred to the Appendix. These lemmas (and their proofs) are similar to their counterparts in Kelner et al. [7], because everything that appears here is dual to what appears there.

First, we show that each iteration of the algorithm increases \({\mathcal {B}}({{\textbf {x}}})\).

Lemma 1

Let \({{\textbf {x}}} \in {\mathbb {R}}^V\) be a vector of potentials and let \(C \subset V\). Let \({{\textbf {x}}}'\) be the potentials obtained from \({{\textbf {x}}}\) as in the algorithm (that is, by adding \(\Delta \) to the potential of every vertex in C so that flow conservation is satisfied across \(\delta (C)\)). Then

$$\begin{aligned} {\mathcal {B}}({{\textbf {x}}}') - {\mathcal {B}}({{\textbf {x}}}) = \frac{\Delta ^2}{2R(C)}. \end{aligned}$$

The second ingredient in the analysis is to introduce an upper bound on how large the potential bound \({\mathcal {B}}({{\textbf {x}}})\) can become. This will allow us to bound the number of iterations the algorithm takes.

Definition 1

(Gap) Let \({{\textbf {f}}}\) be a feasible \({{\textbf {b}}}\)-flow and let \({{\textbf {x}}}\) be any vertex potentials. Define

$$\begin{aligned} {{\,\textrm{gap}\,}}({{\textbf {f}}}, {{\textbf {x}}}):= {\mathcal {E}}({{\textbf {f}}}) - {\mathcal {B}}({{\textbf {x}}}) = \frac{1}{2}\sum _{e \in E} r(e)f(e)^2 - \left( {{\textbf {b}}}{^\top }{{\textbf {x}}} -\frac{1}{2} {{\textbf {x}}}{^\top }{{\textbf {L}}}{{\textbf {x}}}\right) . \end{aligned}$$

This same notion of a gap was introduced in the analysis of the Kelner et al. algorithm, and was also used to bound the number of iterations of the algorithm.

The electrical flow \({{\textbf {f}}}^*\) minimizes \({\mathcal {E}}({{\textbf {f}}})\) over all \({{\textbf {b}}}\)-flows \({{\textbf {f}}}\), and the corresponding vertex potentials \({{\textbf {x}}}^*\) maximize \({\mathcal {B}}({{\textbf {x}}})\) over all vertex potentials \({{\textbf {x}}}\). Moreover, \({\mathcal {E}}({{\textbf {f}}}^*) = {\mathcal {B}}({{\textbf {x}}}^*)\). Therefore, for any feasible flow \({{\textbf {f}}}\), \({{\,\textrm{gap}\,}}({{\textbf {f}}}, {{\textbf {x}}})\) is an upper bound on the suboptimality of \({{\textbf {x}}}\):

$$\begin{aligned} {{\,\textrm{gap}\,}}({{\textbf {f}}}, {{\textbf {x}}}) \ge {\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}). \end{aligned}$$

The lemma below gives us another way to write \({{\,\textrm{gap}\,}}({{\textbf {f}}}, {{\textbf {x}}})\), and will be useful to us later. This relation is shown in Kelner et al. [7, Lemma 4.4].

Lemma 2

We have \({{\,\textrm{gap}\,}}({{\textbf {f}}}, {{\textbf {x}}}) = \frac{1}{2}\sum _{(i, j) \in \vec {E}} r(i, j) \left( f(i, j) - \frac{x(i)-x(j)}{r(i, j)}\right) ^2.\)

The analysis of Kelner et al. [7] relies on measuring progress in terms of the above-defined duality gap between the primal flow energy and the dual potential bound. The high-level idea is that the duality gap decreases in expectation by a factor of \(1 - \frac{1}{\tau }\) in each iteration, which implies a linear convergence rate. In the analysis of their algorithm, they maintain a feasible \({{\textbf {b}}}\)-flow \({{\textbf {f}}}\) at each iteration, and measure \({{\,\textrm{gap}\,}}({{\textbf {f}}}, {{\textbf {x}}})\) against the corresponding tree-defined potentials \({{\textbf {x}}}\).

One difference between their algorithm and ours is that we do not maintain a feasible \({{\textbf {b}}}\)-flow at each iteration. However, for \({{\,\textrm{gap}\,}}({{\textbf {f}}}, {{\textbf {x}}})\) to be a valid bound on distance to optimality, we need \({{\textbf {f}}}\) to be a feasible \({{\textbf {b}}}\)-flow. To this end, we introduce the definition of “tree-defined flow” below.

Definition 2

(Tree-defined flow) Let T be a spanning tree, \({{\textbf {x}}} \in {\mathbb {R}}^V\) vertex potentials, and \({{\textbf {b}}} \in {\mathbb {R}}^V\) satisfying \(\mathbb {1}{^\top }{{\textbf {b}}} = 0\) be a supply vector. The tree-defined flow with respect to T, \({{\textbf {x}}}\) and \({{\textbf {b}}}\) is the flow \({{\textbf {f}}}_{T, {{\textbf {x}}}}\) defined by

$$\begin{aligned} {{\textbf {f}}}_{T, {{\textbf {x}}}}(i, j) = \frac{x(i) - x(j)}{r(i, j)} \quad \text {if } (i, j) \not \in T, \end{aligned}$$

and for \((i, j) \in T\), \(f_{T,{{\textbf {x}}}}(i, j)\) is the unique value such that the resulting \({{\textbf {f}}}_{T, {{\textbf {x}}}}\) is a feasible \({{\textbf {b}}}\)-flow. That is, for \((i, j) \in T\), if \(C = C(i, j)\) is the fundamental cut defined by (ij) and \(b(C) = {{\textbf {b}}}{^\top }\mathbb {1}_C\) is the amount of flow that should be flowing out of C in a feasible \({{\textbf {b}}}\)-flow, then

$$\begin{aligned} f_{T,{{\textbf {x}}}}(i, j) = b(C) - \sum _{\begin{array}{c} k \in C,\, l \not \in C \\ kl \in E - ij \end{array}} f_{T,{{\textbf {x}}}}(k, l) = b(C) - \sum _{\begin{array}{c} k \in C,\, l \not \in C \\ kl \in E - ij \end{array}} \frac{x(k) - x(l)}{r(k, l)}. \end{aligned}$$

In other words, \({{\textbf {f}}}_{T,{{\textbf {x}}}}\) is a potential-defined flow outside of the tree T, and routes the unique flow on T to make it a feasible \({{\textbf {b}}}\)-flow.

The lemma below expresses \({{\,\textrm{gap}\,}}({{\textbf {f}}}_{T, {{\textbf {x}}}}, {{\textbf {x}}})\) in a convenient form. Here, for a tree edge (i, j), \(\Delta (C(i, j)) = (b(C(i, j)) - f(C(i, j)))\cdot R(C(i, j))\) denotes the update the algorithm would make to the potentials of the fundamental cut C(i, j) given the current potentials \({{\textbf {x}}}\) (as in the definition of \(\Delta ^t\) above).

Lemma 3

Let T be a spanning tree, \({{\textbf {x}}}\) vertex potentials, and \({{\textbf {b}}}\) a supply vector. Let \({{\textbf {f}}}_{T, {{\textbf {x}}}}\) be the associated tree-defined flow. Then

$$\begin{aligned} {{\,\textrm{gap}\,}}({{\textbf {f}}}_{T, {{\textbf {x}}}}, {{\textbf {x}}}) = \frac{1}{2} \sum _{(i, j) \in T} r(i, j) \cdot \frac{\Delta (C(i, j))^2}{R(C(i, j))^2}. \end{aligned}$$

Suppose we have a probability distribution \((P_{ij}: (i, j) \in T)\) on the edges in T. If the algorithm samples an edge \((i, j) \in T\) from this distribution, then by Lemma 1 the expected increase in the dual objective is

$$\begin{aligned} {\mathbb {E}}[{\mathcal {B}}({{\textbf {x}}}')] - {\mathcal {B}}({{\textbf {x}}}) =\frac{1}{2} \sum _{(i,j) \in T} P_{ij}\cdot \Delta (C(i, j))^2/R(C(i, j)). \end{aligned}$$

We want to set the \(P_{ij}\) to cancel terms appropriately so that the right-hand side is a multiple of the gap. Looking at Lemma 3, we see that an appropriate choice is to set

$$\begin{aligned} P_{ij}:= \frac{1}{\tau } \cdot \frac{r(i, j)}{R(C(i, j))}, \end{aligned}$$

where \(\tau := \sum _{(i, j) \in T} \frac{r(i, j)}{R(C(i, j))}\) is the normalizing constant. For this choice of probabilities,

$$\begin{aligned} {\mathbb {E}}[{\mathcal {B}}({{\textbf {x}}}')] - {\mathcal {B}}({{\textbf {x}}}) = \frac{1}{2\tau } \sum _{(i, j) \in T} r(i,j)\cdot \frac{\Delta (C(i, j))^2}{R(C(i, j))^2} = \frac{1}{\tau }{{\,\textrm{gap}\,}}({{\textbf {f}}}_{T, {{\textbf {x}}}}, {{\textbf {x}}}), \end{aligned}$$

where \({{\textbf {f}}}_{T, {{\textbf {x}}}}\) is the tree-defined flow associated with potentials \({{\textbf {x}}}\). As a consequence, we have:

Lemma 4

If each iteration of the algorithm samples an edge \((i, j) \in T\) according to the probabilities \(P_{ij} = \frac{1}{\tau } \cdot \frac{r(i, j)}{R(C(i, j))}\), then we have

$$\begin{aligned} {\mathcal {B}}({{\textbf {x}}}^*) - {\mathbb {E}}[{\mathcal {B}}({{\textbf {x}}}^{t+1})] \le \left( 1 - \frac{1}{\tau }\right) \left( {\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^t)\right) . \end{aligned}$$

Corollary 1

After \(K = \tau \ln (\frac{1}{\epsilon })\) iterations, \({\mathcal {B}}({{\textbf {x}}}^*) - {\mathbb {E}}[{\mathcal {B}}({{\textbf {x}}}^K)] \le \epsilon \cdot {\mathcal {B}}({{\textbf {x}}}^*)\).
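
To spell out the calculation behind Corollary 1: iterating Lemma 4 K times and using \({\mathcal {B}}({{\textbf {x}}}^0) = {\mathcal {B}}({{\textbf {0}}}) = 0\) gives

$$\begin{aligned} {\mathcal {B}}({{\textbf {x}}}^*) - {\mathbb {E}}[{\mathcal {B}}({{\textbf {x}}}^K)] \le \left( 1 - \frac{1}{\tau }\right) ^K \left( {\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^0)\right) \le e^{-K/\tau } \cdot {\mathcal {B}}({{\textbf {x}}}^*), \end{aligned}$$

which equals \(\epsilon \cdot {\mathcal {B}}({{\textbf {x}}}^*)\) when \(K = \tau \ln (\frac{1}{\epsilon })\).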

We now use the previous lemmas to bound the number of iterations Dual KOSZ takes. Lemma 4 shows that the quantity \({\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^t)\) decreases multiplicatively by \((1 - \frac{1}{\tau })\) each iteration. Thus, a smaller value of \(\tau \) gives faster progress. Moreover, it is not difficult to show that \(\tau = \text {st}_T(G, {{\textbf {r}}})\) (see Lemma 12 in the “Appendix”), which is why T is chosen to be a low-stretch spanning tree.

We also need to argue that rounding \({{\textbf {x}}}^K\) to \({{\textbf {f}}}^K\) via a tree-defined flow preserves approximate optimality. One can show that for any distribution over \({{\textbf {x}}}\) such that \({\mathbb {E}}_{{{\textbf {x}}}}[{\mathcal {B}}({{\textbf {x}}})] \ge (1-\frac{\epsilon }{\tau }){\mathcal {B}}({{\textbf {x}}}^*)\), we have \({\mathbb {E}}_{{{\textbf {x}}}}[{\mathcal {E}}({{\textbf {f}}}_{T, {{\textbf {x}}}})] \le (1+\epsilon ){\mathcal {E}}({{\textbf {f}}}^*)\). Combining everything together, we conclude:

Theorem 1

Let \(\tau \) be the total stretch of T. After \(K = \tau \ln (\frac{\tau }{\epsilon })\) iterations, Dual KOSZ returns \({{\textbf {x}}}^K \in {\mathbb {R}}^V\) and \({{\textbf {f}}}^K \in {\mathbb {R}}^{\vec {E}}\) such that \({\mathbb {E}}\left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}^K}\right\Vert _{{\textbf {L}}}^2 \le \frac{\epsilon }{\tau }\left\Vert {{{\textbf {x}}}^*}\right\Vert _{{\textbf {L}}}^2\) and \({\mathbb {E}}[{\mathcal {E}}({{\textbf {f}}}^K)] \le (1+\epsilon ) {\mathcal {E}}({{\textbf {f}}}^*)\), for \({{\textbf {f}}}^*\) and \({{\textbf {x}}}^*\) optimal primal and dual solutions respectively.

We end this section with a naïve bound on the running time of Dual KOSZ.

Lemma 5

Dual KOSZ can be implemented to run in \(\widetilde{O}(mn\log \frac{1}{\epsilon })\) time.

Proof

We can find a spanning tree T with total stretch \(\tau =O(m\log n \log \log n)\) in \(O(m \log n \log \log n)\) time [18].

For concreteness, fix an arbitrary vertex to be the root of T, and direct all edges in T towards the root. The set of fundamental cuts we consider will be the vertex sets of subtrees of T.

To compute b(C) for these \(n-1\) fundamental cuts C, we can work our way from the leaves up to the root. If \(C = \{v\}\) is a leaf of T, then \(b(C) = b(v)\). Otherwise, C is a subtree rooted at v, and \(b(C) = b(v) + \sum _{C'} b(C')\), where the sum is over the subtrees of C. Hence we can compute b(C) for all fundamental cuts C in O(n) time.

To compute R(C) for the fundamental cuts C, we can maintain \(n-1\) variables, one for each fundamental cut. The variable corresponding to cut C will represent \(R(C)^{-1} = \sum _{e \in \delta (C)} \frac{1}{r(e)}\), and the variables are initialized to 0. We then iterate through all the edges in the graph, and for each such edge e, add \(\frac{1}{r(e)}\) to the value of each variable that represents a cut C such that \(e \in \delta (C)\). Although this naive implementation takes O(mn) time in the worst-case, it is possible to improve this running time to \(O(m\log n)\) using link-cut trees [20]. One can also achieve this running time using the same data structure as the one used in [7].

The last part of the running time is the time it takes to run a single iteration of the algorithm. In each iteration of the algorithm, we need to compute \(\Delta = (b(C) - f(C))\cdot R(C)\), where C is the fundamental cut selected at that iteration. In the above two paragraphs, we described how to precompute the values of b(C) and R(C) for every fundamental cut C; note that these values are fixed at the beginning and do not change during the course of the algorithm. Hence, it remains to compute f(C). One way to compute f(C) is to simply iterate over all the edges in \(\delta (C)\), summing each edge’s contribution to f(C). This takes time proportional to \(\left|{\delta (C)}\right|\), which could be O(m) in the worst case. We can get this down to O(n) per iteration by maintaining the values of f(C) for every fundamental cut C, and updating these values each time the algorithm updates potentials. Since there are \(n-1\) cuts, to do this in O(n) time requires us to be able to update f(C) for a single cut in O(1) time. To do this, we can precompute an \((n-1) \times (n-1)\) table with a row/column for each fundamental cut, where the \((C_1, C_2)\) entry is the amount by which the flow out of \(C_2\) increases if we add 1 to the potential of every node in \(C_1\). Let \(H(C_1, C_2)\) denote this value. With this table, updating the value of f(C) after a potential update step essentially reduces to a single table lookup, which takes O(1) time.
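
Here is a short Python sketch of one iteration with this bookkeeping, assuming the values b(C), R(C), and the table H have already been precomputed (the container names are ours; cuts are indexed by, say, the root vertex of the corresponding subtree):

```python
def toggle_with_table(c, cuts, x, f, b, R, H):
    """One cut-toggle on fundamental cut c, maintaining every f(C) value.

    cuts[c] is the vertex set of the subtree defining cut c; x holds the node
    potentials; f[c2], b[c2], R[c2] store f(C_2), b(C_2), R(C_2); H[c][c2] is
    the increase in the flow out of C_2 per unit added to the potentials of
    C_1 = cuts[c].  b, R, and H are all precomputed before the iterations.
    """
    delta = (b[c] - f[c]) * R[c]
    for v in cuts[c]:            # shift one side of the cut; O(n) worst case
        x[v] += delta
    for c2 in f:                 # maintain f for every cut: one lookup each
        f[c2] += delta * H[c][c2]
    return delta
```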

Finally, note that one can construct the \(H(C_1, C_2)\) table in \(O(n^2)\) time using results from [21]. In the language of Definitions 5.3 and 5.5 in that paper, we are trying to compute \(C(v^\downarrow , w^\downarrow )\) for all vertices v, w in the tree, where the edge weights are the reciprocals of the resistances. At the bottom of page 11, it states that the \(n^2\) values \(C(v, w^\downarrow )\) can be computed in \(O(n^2)\) time. At the top of page 12, it then says that we get the values of \(C(v^\downarrow , w^\downarrow )\) using n treefix sums. (Each treefix sum is the procedure described in Lemma 5.8, and takes O(n) time.)

To summarize, we can run each iteration of the algorithm in O(m) time, which can be reduced to O(n) time if we precompute the \(H(C_1, C_2)\) table, which incurs an overhead of \(O(n^2)\) storage and \(O(n^2)\) preprocessing time.

Suppose each iteration of the algorithm takes O(I) time, and the algorithm uses O(L) preprocessing time (not including the time needed to compute the low-stretch spanning tree). Then the total running time of the algorithm is \(O(L + I \tau \ln (\frac{\tau }{\epsilon }) + m \log n \log \log n) = O(L + mI\log n \log \log n\log \frac{\tau }{\epsilon })\).

If we use the version which uses \(O(n^2)\) preprocessing time and O(n) time per iteration, then \(L = O(n^2)\) and \(I = O(n)\). This gives the running time of Dual KOSZ to be \(O(mn\log n \log \log n\log \frac{\tau }{\epsilon })\). \(\square \)

In Sect. 4, we argue that given a natural abstraction of the data structure problem we use in computing f(C) and updating potentials, it appears unlikely that we can implement each iteration in \(O(n^{1-\epsilon })\) time for any \(\epsilon > 0\), if each iteration is to be processed one-by-one in an online fashion. In Sect. 5, we show how to overcome this data structure lower bound by taking advantage of the fact that the sequence of updates that we perform can be generated in advance.

4 Lower Bound on the Per-Iteration Complexity of the Algorithm

Recall that each iteration of KOSZ can be implemented in logarithmic time. In this section we show that, assuming the OMv conjecture (see below), a single iteration of Dual KOSZ cannot be implemented in sublinear time. This implies that in order to speed up our algorithm we need to “batch up” iterations, which is the approach we use in the next section. We first present a natural data structure, called the TreeFlow data structure, such that each iteration of the algorithm requires only two operations of the TreeFlow data structure, and then prove that, assuming the OMv conjecture [14], it is impossible to implement the TreeFlow data structure such that each operation of the data structure takes \(O(n^{1-\epsilon })\) time. To simplify the reduction we reduce from a closely related problem called the Online Vector–Matrix-Vector Multiplication Problem (OuMv).

Definition 3

(Online Vector–Matrix-Vector Multiplication Problem) We are given a positive integer n and a Boolean \(n\times n\) matrix \({{\textbf {M}}}\). At each time step \(t =1, \ldots , n\), we are shown a pair of Boolean vectors \(({\textbf {u}}_{\textbf {t}}, {\textbf {v}}_{\textbf {t}})\), each of length n. Our task is to output \({\textbf {u}}_{\textbf {t}}^\top {{\textbf {M}}} {\textbf {v}}_{\textbf {t}}\) using Boolean matrix–vector operations. Specifically, “addition” is replaced by the OR operation, so that \(0+0 = 0\), and \(0+1 = 1+0 = 1+1 = 1\). Hence, \({\textbf {u}}_{\textbf {t}}^\top {{\textbf {M}}} {\textbf {v}}_{\textbf {t}}\) is always either 0 or 1.

The OMv conjecture implies that no algorithm for the OuMv problem can do substantially better than naively multiplying \({\textbf {u}}_{\textbf {t}}^\top {{\textbf {M}}} {\textbf {v}}_{\textbf {t}}\) at time step t. Specifically, it says the following:

Lemma 6

( [14]) Let \(\epsilon > 0\) be any constant. Assuming the OMv conjecture, there is no algorithm for the online vector–matrix-vector multiplication problem that uses preprocessing time \(O(n^{3-\epsilon })\) and takes total time \(O(n^{3-\epsilon })\) with error probability at most 1/3 in the word-RAM model with \(O(\log n)\) bit words.

Thus we will reduce the OuMv problem to the TreeFlow data structure such that computing \({\textbf {u}}_{\textbf {t}}^\top {{\textbf {M}}} {\textbf {v}}_{\textbf {t}}\) requires two operations in the TreeFlow data structure. The lower bound then follows from Lemma 6.

4.1 The TreeFlow Data Structure

The TreeFlow data structure is given as input (1) an undirected graph \(G = (V, E)\) with \(n = \left|{V}\right|\), (2) a spanning tree T of G that is rooted at a fixed vertex x, (3) a value r(uv) for each edge \((u,v) \in E\) (representing the resistance of (uv)), and (4) a value b(v) for each vertex \(v \in V\) (representing the supply at v). The quantities r(uv) and b(v) are given at the beginning and will remain unchanged throughout the operations. For any set \(C \subset V\), let \(b(C):= \sum _{v \in C} b(v)\).

Furthermore, each vertex v has a non-negative value, denoted \(\textsf{value}(v)\), which can be seen as the “potential” of v. It is initially 0 and can be modified. For any set \(C \subset V\) we define the flow out of C to be the quantity

$$\begin{aligned} f(C):= \sum _{(u,v) \in E, u \in C, v \not \in C} \left( \textsf{value}(u) - \textsf{value}(v)\right) /r(u,v). \end{aligned}$$

The TreeFlow data structure supports the following operations.

  • \(\textsf {addvalue}({{{\textbf {{\textsf {vertex}}}}}}~v, {{{\textbf {{\textsf {real}}}}}}~x)\): Add x to the value of every vertex in the subtree of T rooted at v.

  • \(\textsf {findflow}({{{\textbf {{\textsf {vertex}}}}}}~v)\): Return \(b(C) - f(C)\), where C is the set of vertices in the subtree of T rooted at v.

The TreeFlow data structure implements exactly the operations we require for each iteration of Dual KOSZ: The addvalue operation allows us to update the potentials on a fundamental cut, and findflow computes \(b(C) - f(C)\), thereby allowing us to compute \(\Delta \) at each iteration. Note that if all b(v)-values are zero, the TreeFlow data structure simply returns \(-f(C)\), which gives it its name.
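
For concreteness, here is a deliberately naive Python implementation of this interface (our own sketch; each operation simply walks the tree or the edge list, so it does not, of course, come close to the lower bound below):

```python
class NaiveTreeFlow:
    """Unoptimized TreeFlow data structure: every operation scans T or E."""

    def __init__(self, n, edges, parent, b):
        # edges: list of (u, w, r_uw); parent[v] = parent of v in T (the root
        # is its own parent); b[v] = supply at v.  value(v) starts at 0.
        self.n, self.edges, self.parent, self.b = n, edges, parent, b
        self.value = [0.0] * n

    def _in_subtree(self, w, v):
        """Is w in the subtree of T rooted at v?"""
        while True:
            if w == v:
                return True
            if self.parent[w] == w:       # reached the root without meeting v
                return False
            w = self.parent[w]

    def addvalue(self, v, x):
        for w in range(self.n):
            if self._in_subtree(w, v):
                self.value[w] += x

    def findflow(self, v):
        in_C = {w for w in range(self.n) if self._in_subtree(w, v)}
        b_C = sum(self.b[w] for w in in_C)
        f_C = 0.0
        for u, w, r in self.edges:
            if (u in in_C) != (w in in_C):
                sign = 1.0 if u in in_C else -1.0
                f_C += sign * (self.value[u] - self.value[w]) / r
        return b_C - f_C
```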

We even show the lower bound for a “relaxed” version defined as follows: In an \(\alpha \)-approximate TreeFlow data structure the operation addvalue remains as above and the operation findflow(v) returns a value that is within a multiplicative factor \(\alpha \ge 1\) (that can be a function of n) of the correct answer, i.e., a value between \((b(C) - f(C))/\alpha \) and \((b(C) - f(C))\cdot \alpha \). The hardness of approximation is interesting, because it turns out that even an approximation of this quantity is sufficient to obtain an algorithm for \({{\textbf {L}}}{{\textbf {x}}} = {{\textbf {b}}}\), albeit with a convergence rate that deteriorates with the approximation factor.

Lemma 7

Let \(\epsilon > 0\) be any constant and let \(\alpha \ge 1\) be any value. Assuming the OMv conjecture, no implementation of the \(\alpha \)-approximate TreeFlow data structure exists that uses preprocessing time \(O(n^{3-\epsilon })\) and where the two operations addvalue and findflow both take \(O(n^{1-\epsilon })\) time, such that over a polynomial number of operations the error probability is at most 1/3 in the word-RAM model with \(O(\log n)\) bit words. This even holds if all r(uv) values are 1 and if all b(v) are 0.

Proof

Given an \(n \times n\) Boolean matrix \({{\textbf {M}}}\), we create the following TreeFlow data structure. The graph contains \(2n+1\) nodes, namely a special node x, one node \(c_j\) for each column j with \(1 \le j \le n\) and one node \(d_i\) for each row i with \(1 \le i \le n\). There is an edge \((d_i,c_j)\) if entry \(M_{ij}\) = 1. Additionally, every node \(c_j\) and every node \(d_i\) has an edge to x. These edges are added to guarantee that the graph is connected. We set \(r(c,d) = 1\) for every edge (cd) and denote this graph by G. Let T be the spanning tree of G that is rooted at x and consists of all the edges incident to x. Note that the subtree of T rooted at any node \(y \ne x\) consists of a single node y.

Now consider the sequence of n vector pairs \(({{\textbf {u}}}_t, {{\textbf {v}}}_t)\) of the OuMv problem. Let \(({{\textbf {u}}},{{\textbf {v}}})\) be any such pair. We show below how to compute \({\textbf {u}}^\top {{\textbf {M}}} {{\textbf {v}}}\) with O(n) operations in the TreeFlow data structure. Thus the sequence of n vector pairs leads to \(O(n^2)\) operations. It then follows from the OMv conjecture and Lemma 6 that this sequence of \(O(n^2)\) operations in the TreeFlow data structure cannot take time \(O(n^{3-\epsilon })\), i.e., that it is not possible for both the \(\textsf {addvalue}\) operation and the \(\textsf {findflow}\) operation to take \(O(n^{1-\epsilon })\) time.

It remains to show how to compute \({\textbf {u}}^\top {{\textbf {M}}} {{\textbf {v}}}\) with O(n) operations in the TreeFlow data structure. Initially the value \(\textsf{value}(v)\) of all nodes v is 0. Let Z be a large enough constant that we will specify later. First, increase the value of all nodes to Z by calling \(\textsf{addvalue}(x, Z)\).

When given \(({{\textbf {u}}},{{\textbf {v}}})\) we decrease the value of each row node \(d_i\) with \(u_i = 1\) to 0 by calling \(\textsf {addvalue}(d_i, -Z).\) Then, we perform a \(\textsf {findflow}(c_j)\) operation for each column node \(c_j\) with \(v_j = 1\). Afterwards we undo these operations, so that every node has value Z again. (Alternatively, we could increase the value of every node \(c_j\) with \(v_j=1\) and every node \(d_i\) with \(u_i = 0\) to 2Z, and, instead of undoing the operations, increase the value of every node to 2Z after the query. This way we never execute an \(\textsf {addvalue}\) operation with a negative second parameter.)

Note that \({\textbf {u}}^\top {{\textbf {M}}} {{\textbf {v}}} = 1\) iff there exists an edge between a column node \(c_j\) with \(v_j = 1\) (i.e. \(\textsf{value}(c_j)=Z\)) and a row node \(d_i\) with \(u_i = 1\) (i.e. \(\textsf{value}(d_i)=0\)).

We now show that \({\textbf {u}}^\top {{\textbf {M}}} {{\textbf {v}}} = 1\) iff \(f(c_j)>0\) for some column node \(c_j\) with \(v_j = 1\). (a) Assume first that \({\textbf {u}}^\top {{\textbf {M}}} {{\textbf {v}}} = 1\) and let \(c^*\) denote a node \(c_j\) and \(d^*\) denote a node \(d_i\) such that \(v_j = 1\), \(u_i = 1\) and \(M_{ij} = 1.\) We will show that \(f(c^*) > 0\). Recall that the subtree of \(c^*\) consists only of \(c^*\). The edge \((c^*, d^*)\) leaves the subtree of \(c^*\), contributing a positive amount to \(f(c^*)\) because \(\textsf{value}(c^*)=Z\) and \(\textsf{value}(d^*)=0\). All other edges leaving the subtree of \(c^*\) contribute a non-negative amount to \(f(c^*)\), since \(\textsf{value}(c^*)=Z\) and the values of the other endpoints are either Z or 0. Thus \(f(c^*) > 0\). (b) Assume next that \({\textbf {u}}^\top {{\textbf {M}}} {{\textbf {v}}} = 0\). In this case every node \(c_j\) with \(v_j = 1\) (and value Z) only has edges to x and to nodes \(d_i\) with \(u_i = 0\), all of which have value Z. As before the subtree of every node \(c_j\) only consists of \(c_j\) and, thus, all edges leaving the subtree of \(c_j\) contribute 0 to the flow out of the subtree. Thus, for every node \(c_j\) with \(v_j = 1\) we have \(f(c_j) = 0\).
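
The following self-contained Python sketch simulates the reduction just described (our own illustration; for clarity it evaluates \(f(c_j)\) by brute force on the constructed graph rather than through a TreeFlow implementation):

```python
def oumv_via_cut_flows(M, u, v, Z=1.0):
    """Evaluate the Boolean product u^T M v via the reduction above.

    M is an n x n 0/1 matrix, u and v are 0/1 vectors.  Nodes: x = 0,
    column node c_j = 1 + j, row node d_i = 1 + n + i; every resistance is 1
    and every supply is 0.  Since f(c_j) is computed exactly here, any Z > 0
    works; the proof needs Z large only to tolerate approximate answers.
    """
    n = len(M)
    value = {0: Z}                               # addvalue(x, Z): all values start at Z
    for j in range(n):
        value[1 + j] = Z
    for i in range(n):
        value[1 + n + i] = 0.0 if u[i] else Z    # addvalue(d_i, -Z) when u_i = 1
    for j in range(n):
        if not v[j]:
            continue
        c_j = 1 + j
        # Edges leaving the singleton subtree {c_j}: (c_j, x) and (c_j, d_i) with M_ij = 1.
        f_cj = value[c_j] - value[0]
        f_cj += sum(value[c_j] - value[1 + n + i] for i in range(n) if M[i][j])
        if f_cj > 0:                             # positive flow out of {c_j}  =>  answer is 1
            return 1
    return 0
```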

To summarize, we have shown above that \({{\textbf {u}}}{^\top }{{\textbf {M}}} {{\textbf {v}}} = 1\) iff \(f(c_j) > 0\) for some column node \(c_j\) with \(v_j = 1\). We will now show how to use the results of the findflow queries returned by an \(\alpha \)-approximate TreeFlow data structure to determine if \(f(c_j)\) is positive or zero.

Here is where we will choose the value of Z. The idea is to make Z large enough so that whenever \(f(c_j) > 0\), it is in fact very large; this will allow us to distinguish between \(f(c_j) = 0\) and \(f(c_j) > 0\), even if we only have access to an \(\alpha \)-approximation of \(b(c_j) - f(c_j)\).

It will suffice to choose Z large enough so that if \(f(c_j) > 0\), then \(f(c_j) > \max \{b(c_j), b(c_j)(1 - \alpha ^2)\}\). (As \(1-\alpha ^2 \le 0\), the second term is the relevant one when \(b(c_j) < 0\).) The value of Z depends on \(\alpha \), the supplies \({{\textbf {b}}}\), and the resistances \({{\textbf {r}}}\). For instance, it suffices to choose \(Z > \left\Vert {r}\right\Vert _\infty \left\Vert {b}\right\Vert _\infty \alpha ^2\). For this choice of Z, we have that if \(f(c_j) > 0\) then (since \(c_j\) must have an edge to some \(d_i\) with \(\textsf{value}(d_i) = 0\)),

$$\begin{aligned} f(c_j) \ge \frac{\textsf{value}(c_j) - \textsf{value}(d_i)}{r(c_j, d_i)} = \frac{Z-0}{r(c_j, d_i)}> \left|{b(c_j)}\right|\alpha ^2 > \max \{b(c_j),\, b(c_j)(1 - \alpha ^2)\}. \end{aligned}$$

Having chosen Z this way, we have the following:

  • If \(b(c_j) \ge 0\), then \(b(c_j) - f(c_j)\) is non-negative if \(f(c_j) = 0\), and negative otherwise (because \(f(c_j) > b(c_j)\) when \(f(c_j) > 0\).) Any \(\alpha \)-approximation of \(b(c_j) - f(c_j)\) allows us to correctly deduce the sign of \(b(c_j) - f(c_j)\), hence also whether \(b(c_j) -f(c_j) \ge 0\) or whether \(b(c_j) - f(c_j) < 0\). From this we can deduce whether \(f(c_j) = 0\) or \(f(c_j) > 0\).

  • Suppose \(b(c_j) < 0\). If \(f(c_j) = 0\), the approximate data structure returns an answer in the interval \([b(c_j)\cdot \alpha ,\,\frac{ b(c_j)}{\alpha }]\). If \(f(c_j) > 0\), it returns an answer in the interval \([(b(c_j) - f(c_j))\cdot \alpha ,\, \frac{b(c_j) - f(c_j)}{\alpha }]\). Note that the left endpoint of the first interval is to the right of the right endpoint of the second interval as \(f(c_j) > b(c_j)\left( 1 - \alpha ^2\right) \) implies that

    $$\begin{aligned} b(c_j)\cdot \alpha > \frac{b(c_j) - f(c_j)}{\alpha }. \end{aligned}$$

    Since the two intervals for \(f(c_j) = 0\) and \(f(c_j) > 0\) do not overlap, we can correctly distinguish the two cases using the approximate data structure.

To summarize, each findflow query on \(c_j\) allows us to determine if \(f(c_j) > 0\) or \(f(c_j) = 0\). If the flow is positive for some \(c_j\), then the answer is \({\textbf {u}}^\top {{\textbf {M}}} {{\textbf {v}}} = 1\), otherwise it is 0. Note that it requires O(n) operations in the TreeFlow data structure to determine one \({\textbf {u}}^\top {{\textbf {M}}} {{\textbf {v}}}\) value, which completes the proof. \(\square \)

Remark 1

Note that the proof can be modified to produce an update sequence more similar to the one generated by Dual KOSZ, which alternates between addvalue and findflow operations, by inserting a findflow operation after each addvalue operation (whose answer may be ignored for the logic of the proof).

As mentioned, the proof can also be adapted so that the values stored at the nodes are only increased, but this is not necessary for our application.

5 Speeding Up Dual KOSZ

We now show how to surmount the OMv lower bound by taking advantage of the fact that the sequence of updates that Dual KOSZ performs can be generated in advance. In Sect. 5.1, we show that batching the updates yields a modification of the algorithm that runs in \(\widetilde{O}(m^{1.5})\) time. Then in Sect. 5.2, we use sparsification and recursion to further improve the runtime to \(\widetilde{O}(m^{1+\delta })\) for any \(\delta > 0\).

5.1 A Faster Algorithm Using Batching

First, we show that it is possible to speed up the running time to \(\widetilde{O}(m^{1.5})\) time by batching the updates performed by Dual KOSZ. In Lemma 5, we showed that the algorithm can be implemented to run in time \({\widetilde{O}}(mn)\). (Here the tilde hides a factor of \(\log n \log \log n \log \frac{1}{\epsilon }\).) This running time essentially comes from \(\widetilde{O}(m)\) iterations, and O(n) time per iteration. Recall that each iteration of Dual KOSZ involves sampling a fundamental cut C of the low-stretch spanning tree T from a fixed probability distribution P, and then adding a constant to the potential of every vertex in C so that the resulting potential-defined flow satisfies flow conservation across C.

The main idea of batching is as follows. Denote the number of iterations by K (which is \({\widetilde{O}}(m)\)). Instead of sampling the fundamental cuts one at a time, consider sampling the next l cuts that need to be updated for some \(l \ll K\). We can perform this sampling in advance because both the tree T and the probability distribution over cuts of T are fixed over the entire course of the algorithm. In each “block" of size \(l \ll K\), we contract all the edges of T that do not correspond to one of the l fundamental cuts to be updated. In this way, we work with a contracted tree of size O(l) in each block (instead of the full tree, which has size O(n)). This makes the updates faster. However, the price we pay is that at the end of each block, we need to propagate the updates we made (which were on the contracted tree), back to the entire tree. We will show that by choosing \(l = \sqrt{m}\), we can balance this tradeoff and get an improved running time of \(\widetilde{O}(m^{1.5})\). Pseudocode for Dual KOSZ with batching is given in Algorithm 3. Note that the correctness of this algorithm follows directly from the correctness of Dual KOSZ: Algorithm 3 samples cuts from exactly the same distribution as Dual KOSZ, and if we fix the same sequence of cuts to be used by both algorithms, then the output of the two algorithms is identical.

Algorithm 3

Dual KOSZ with batching

Theorem 3

The running time of Dual KOSZ with batching is \(O(m^{1.5}\log n \log \log n \log \frac{1}{\epsilon })\). This is achieved by choosing \(l = \sqrt{m}\).

Proof

At the beginning of the algorithm, we compute the values of b(C) (O(n) time via a dynamic program) and R(C) (\(O(m \log n)\) time via link-cut trees), just as in the proof of Lemma 5.

Consider a batch of l updates. Note that the contracted tree \(\widetilde{T}\) has at most \(l+1\) vertices. After contracting, we need to perform l updates. This involves, for each \(k \in \{1,2,\ldots , l\}\):

  • Computing \(\Delta _k:= (b(C_k) - f(C_k))\cdot R(C_k)\),

    • This takes O(1) time assuming \(f(C_k)\) has already been computed. (Recall that the values \(b(C_k)\) and \(R(C_k)\) are computed at the very beginning of the algorithm.)

  • Adding \(\Delta _k\) to \({{\textbf {y}}}({\tilde{v}})\) for every \({\tilde{v}} \in \widetilde{C}_k\).

    • This takes O(l) time, because the contracted tree has size O(l).

  • Updating the values \(f(C_{k+1}), f(C_{k+2}), \ldots , f(C_{l})\) so they can be used in the later iterations of the inner loop.

    • If each \(f(C_j)\) can be updated in O(1) time, this takes O(l) time.

    • To update each \(f(C_j)\) in O(1) time, we can precompute at the beginning of the block the \(H(C_i, C_j)\) table for \(i,j \in \{1,2,\ldots , l\}\), just as in the proof of Lemma 5. The difference now is that we only need to compute the table for the cuts that will be updated in the block. There are l such cuts, so the total time to compute the table is \(O(l^2)\), again using Karger’s method.

At the end of each block, we propagate the updates we made on the contracted graph back to the original graph. This involves:

  • Determining the new potential of each node in G.

    • This takes O(n) time by iterating over the nodes of G.

  • Determining the value of f(C) for each fundamental cut determined by T. (By convention, assume the edges of T are directed toward the root, and that the fundamental cuts we consider are the vertex sets of the subtrees of T.)

    • This can be done in O(m) time using a dynamic program that works from the leaves to the root; a short implementation sketch is given after this list. First, we compute f(C) at each leaf of the tree. Next, suppose we are at a non-leaf node v, and let C be the set of vertices in the subtree rooted at v. Suppose we have already computed \(f(D_1), f(D_2), \ldots , f(D_k)\), where \(D_1, \ldots , D_k\) are the vertex sets of the subtrees rooted at the children of v. Then we can compute f(C) as follows:

      $$\begin{aligned} f(C) = \sum _{i=1}^k f(D_i) + \sum _{w: vw \in E} \frac{p(v) - p(w)}{r(v, w)}. \end{aligned}$$

      This sum correctly counts the flow leaving C. This is because any edge leaving C is counted once. On the other hand, if an edge is between \(D_i\) and \(D_j\), then it is counted once in the \(f(D_i)\) term, and once with the opposite sign in the \(f(D_j)\) term, so it zeros out. Similarly, if an edge is between \(D_i\) and v, it also zeros out. The running time of this dynamic program is O(m), because the time taken at each node is proportional to its degree, and the sum of all the node degrees is equal to 2m.
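The following Python sketch (with our own, hypothetical data structures: a children map for the tree, an adjacency list with resistances, and potentials p) is a direct transcription of this recurrence; it is not the paper's implementation.

    def subtree_flows(children, post_order, adj, p):
        """Compute f(C_v) for every vertex v, where C_v is the vertex set of the
        subtree of the spanning tree rooted at v.

        children[v] : list of tree children of v
        post_order  : vertices ordered so that every child precedes its parent
        adj[v]      : list of (w, r) pairs, one for every graph edge incident to v
        p[v]        : current potential of v
        """
        f = {}
        for v in post_order:
            # flow on all edges incident to v: f(v, w) = (p(v) - p(w)) / r(v, w)
            f[v] = sum((p[v] - p[w]) / r for (w, r) in adj[v])
            # add the flows leaving the children's subtrees; edges inside C_v cancel in pairs
            f[v] += sum(f[c] for c in children.get(v, []))
        return f

The total work is proportional to the sum of the vertex degrees, i.e. O(m). As a sanity check, the value computed at the root is the net flow leaving the entire vertex set, which is 0 (up to floating-point error) for any potentials.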

To summarize, there are K iterations, divided into blocks of size l. In each block, we pay the following.

  • Start of block: O(m) time to contract the tree, and \(O(l^2)\) time to compute the \(H(C, C')\) table for the cuts that will be updated in the block.

  • During the block: O(l) time per iteration. Since each block consists of l iterations, this is \(O(l^2)\) in total.

  • End of block: O(m) time to propagate the changes from the contracted tree to the original tree.

Hence, each block takes \(O(m + l^2)\) time. Multiplying by the number of blocks, which is K/l, this gives a running time of \(O(K(\frac{m}{l} + l))\). Choosing \(l = \sqrt{m}\) to minimize this quantity (by the AM–GM inequality, \(\frac{m}{l} + l \ge 2\sqrt{m}\), with equality at \(l = \sqrt{m}\)), we get \(O(K\sqrt{m})\).

The final running time is therefore \(O(K\sqrt{m})\) plus the preprocessing time. Preprocessing consists of finding a low-stretch spanning tree (\(O(m\log n \log \log n)\) time), plus computing the values of R(C) (\(O(m\log n)\)) and f(C) (O(n)).

Thus, the preprocessing time is dominated by the time it takes to run the iterations. So, the total running time is now: \(O(K\sqrt{m}) = O(m^{1.5}\log n \log \log n \log \frac{1}{\epsilon })\). \(\square \)

5.2 A Still Faster Algorithm via Batching, Sparsification, and Recursion

We now show that we can further speed up the algorithm using sparsification and recursion. The goal is to show that we can obtain a running time of the form \(O(A^{\frac{1}{\delta }}m^{1+\delta }(\log n)^{\frac{B}{\delta }}(\log \frac{1}{\epsilon })^{\frac{1}{\delta }})\) for any \(\delta > 0\), where A and B are numerical constants.

Consider batching the iterations of the algorithm as follows. Pick a positive integer d, and repeat K times:

  • Sample the next d updates to be performed by the algorithm. These correspond to d edges of the spanning tree T.

  • Let \(V_0, V_1, \ldots , V_d\) be the vertex sets that T is partitioned into by the d tree edges.

  • Add \(\Delta (i)\) to the potential of every vertex in \(V_i\). We will choose the values \(\Delta (0), \Delta (1), \ldots , \Delta (d)\) to greedily maximize the increase in the dual bound.

Note that our original algorithm corresponds to the case when \(d = 1\). The lemma below quantifies the increase of the dual objective after one step of the above update.

Lemma 8

Let \((V_0, \ldots , V_d)\) be a partition of V. Let \({{\textbf {x}}}\in {\mathbb {R}}^V\) be a vector of potentials, and let \(\Delta = (\Delta (0), \ldots , \Delta (d))\) be any vector in \({\mathbb {R}}^{d+1}\). Let \({\tilde{{{\textbf {x}}}}}\) be obtained from \({{\textbf {x}}}\) by adding \(\Delta (i)\) to the potential of every node in \(V_i\). Then, the increase in the dual bound is given by the formula

$$\begin{aligned} {\mathcal {B}}(\widetilde{{{\textbf {x}}}}) - {\mathcal {B}}({{\textbf {x}}}) = {{{\textbf {b}}}}_H^T\Delta - \frac{1}{2}\Delta ^T{{{\textbf {L}}}}_H\Delta , \end{aligned}$$

where

  • H is the contracted graph with vertices \(V_0, V_1, \ldots , V_d\) and resistances \(r(V_k, V_l) = \left( \sum _{ij \in \delta (V_k, V_l)} \frac{1}{r(i,j)}\right) ^{-1}\),

  • \({{{\textbf {L}}}}_H\) is the Laplacian matrix of H, and

  • \({b}_H(k) = b(V_k) - f(V_k)\) for \(k = 0,1,\ldots , d\).

In particular, the choice of \(\Delta \) that maximizes \({\mathcal {B}}(\widetilde{{{\textbf {x}}}}) - {\mathcal {B}}({{\textbf {x}}})\) is given by the solution to \({{{\textbf {L}}}}_H\Delta = {{\textbf {b}}}_H\).

Proof

We compute the increase in the dual bound. Recall that \(f(V_k)\) is the amount of flow leaving \(V_k\) in the flow \(f(i,j) = \frac{x(i)-x(j)}{r(i,j)}\). We let \(f(V_k, V_l)\) be the amount of flow going from \(V_k\) to \(V_l\).

$$\begin{aligned}&2\left( {\mathcal {B}}(\widetilde{{{\textbf {x}}}}) - {\mathcal {B}}({{\textbf {x}}})\right) \\&\quad = (2{{\textbf {b}}}^T\widetilde{{{\textbf {x}}}} - \widetilde{{{\textbf {x}}}}^T{{\textbf {L}}}\widetilde{{{\textbf {x}}}}) - (2{{\textbf {b}}}^T{{\textbf {x}}}- {{\textbf {x}}}^T{{\textbf {L}}}{{\textbf {x}}}) \\&\quad = 2\sum _k b(V_k)\Delta (k) + \sum _{(i,j) \in \vec {E}} \frac{1}{r(i,j)}\left[ (x(i)-x(j))^2 - ({\tilde{x}}(i) - {\tilde{x}}(j))^2\right] \\&\quad = 2\sum _k b(V_k)\Delta (k) + \sum _{(i,j) \in \vec {E}} \frac{1}{r(i,j)}\left[ (x(i)-x(j) + {\tilde{x}}(i) - {\tilde{x}}(j))(x(i)-x(j) - {\tilde{x}}(i) + {\tilde{x}}(j))\right] \\&\quad = 2\sum _k b(V_k)\Delta (k) + \sum _{k<l}\sum _{(i,j)\in \delta (V_k,V_l)} \frac{1}{r(i,j)}\left[ (2x(i)-2x(j) +\Delta (k) -\Delta (l))(\Delta (l) - \Delta (k))\right] \\&\quad = 2\sum _k b(V_k)\Delta (k) + 2\sum _{k<l}(\Delta (l) - \Delta (k))\sum _{(i,j)\in \delta (V_k,V_l)} \frac{1}{r(i,j)}(x(i)-x(j)) \\&\qquad -\sum _{k<l}(\Delta (k) - \Delta (l))^2\sum _{(i,j) \in \delta (V_k, V_l)} \frac{1}{r(i,j)} \\&\quad = 2\sum _k b(V_k)\Delta (k) + 2\sum _{k<l}(\Delta (l) - \Delta (k))f(V_k, V_l) -\Delta ^T{{\textbf {L}}}_{H}\Delta \\&\quad = 2\sum _k b(V_k)\Delta (k) - 2\sum _{k}\Delta (k) f(V_k) -\Delta ^T{{\textbf {L}}}_{H}\Delta \\&\quad = 2\sum _k (b(V_k)-f(V_k))\Delta (k) -\Delta ^T{{\textbf {L}}}_{H}\Delta \\&\quad = 2{{{\textbf {b}}}}_H^T\Delta -\Delta ^T{{\textbf {L}}}_{H}\Delta \end{aligned}$$

Note that this is a concave function of \(\Delta \), because \({{\textbf {L}}}_{H}\) is positive semidefinite. Therefore, maximizing this expression is equivalent to setting its gradient to 0. Taking its gradient and setting to 0 yields \({{\textbf {L}}}_{H}\Delta = {{\textbf {b}}}_H\), as claimed. \(\square \)

Remark 2

Another interpretation of the \(\Delta \) that maximizes \({\mathcal {B}}(\widetilde{{{\textbf {x}}}}) - {\mathcal {B}}({{\textbf {x}}})\) in the Lemma above is as follows: \((\Delta (0), \ldots , \Delta (d))\) are the values such that if one adds \(\Delta (i)\) to the potential of every vertex in \(V_i\), the resulting potential-induced flow satisfies the flow constraints \(f(V_k) = b(V_k)\) for all \(k = 0, \ldots , d\).
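To make the batched update of Lemma 8 concrete, the following sketch (using numpy; the function and variable names are ours, not the paper's) forms \({{\textbf {L}}}_H\) and \({{\textbf {b}}}_H\) from a partition of V and solves \({{\textbf {L}}}_H\Delta = {{\textbf {b}}}_H\) densely via the pseudoinverse; Algorithm 4 below will instead solve a sparsified version of this system recursively.

    import numpy as np

    def optimal_batch_update(edges, part, x, b, d):
        """Optimal Delta for one batched update (Lemma 8), solved densely.

        edges : list of (i, j, r) triples, one per graph edge, with resistance r
        part  : part[v] in {0, ..., d} is the index of the component V_k containing v
        x     : current potentials, indexed by vertex
        b     : supplies b(v), indexed by vertex
        d     : number of tree edges in the batch, so there are d + 1 components
        """
        L_H = np.zeros((d + 1, d + 1))
        b_H = np.zeros(d + 1)
        for v, bv in b.items():
            b_H[part[v]] += bv                   # b(V_k)
        for i, j, r in edges:
            k, l = part[i], part[j]
            flow = (x[i] - x[j]) / r             # f(i, j) induced by x
            b_H[k] -= flow                       # the edge adds +flow to f(V_k) ...
            b_H[l] += flow                       # ... and -flow to f(V_l), so b_H(k) = b(V_k) - f(V_k)
            if k != l:                           # only edges crossing the partition enter L_H
                L_H[k, k] += 1.0 / r
                L_H[l, l] += 1.0 / r
                L_H[k, l] -= 1.0 / r
                L_H[l, k] -= 1.0 / r
        # L_H is singular (constant vectors lie in its kernel), so use the pseudoinverse
        return np.linalg.pinv(L_H) @ b_H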

5.3 The Sparsify and Recurse Algorithm

Next we give the algorithm with sparsification and recursion in more detail. Observe that d cut-toggling updates effectively break the spanning tree into \(d+1\) components. After contracting the components to get a graph H with \(d+1\) vertices, Lemma 8 shows that solving the Laplacian system \({{\textbf {L}}}_H \Delta = {{\textbf {b}}}_H\) gives the update that maximizes the increase \({\mathcal {B}}(\widetilde{{{\textbf {x}}}}) - {\mathcal {B}}({{\textbf {x}}})\) among all updates that increment the potential of all vertices in \(V_i\) by the same amount. In particular, the progress made by this update step is at least as large as the progress made by the sequence of d updates performed by the straightforward unbatched algorithm.

A natural approach is to solve \({{\textbf {L}}}_H \Delta = {{\textbf {b}}}_H\) recursively. However, this by itself does not give an improved running time. Instead, we will first spectrally sparsify H to get a sparsified approximation \(\widetilde{{{\textbf {L}}}}_H\) of \({{\textbf {L}}}_H\), satisfying \((1-\gamma ){{\textbf {L}}}_H \preceq \widetilde{{{\textbf {L}}}}_H \preceq (1+\gamma ) {{\textbf {L}}}_H\) for an appropriate constant \(\gamma \in (0,1)\). Such a matrix \(\widetilde{{{\textbf {L}}}}_H\) is known as a \(\gamma \)-spectral sparsifier of \({{\textbf {L}}}_H\). We then call the algorithm recursively on H to solve \(\widetilde{{{\textbf {L}}}}_H \widetilde{\Delta } = {{\textbf {b}}}_H\). This approach is akin to recursive sparsification-based Laplacian solvers [22,23,24], except that here the progress is bounded via the effect of single cut toggles. A main task of the analysis is to bound the error incurred by solving the sparsified system instead of the exact one.

For the spectral sparsification, we need to use an algorithm that does not require calling Laplacian solvers as a subroutine. For examples of such algorithms, see [16, 25,26,27,28]. We will use the sparsifier given in Theorem 1.2 of [16], which returns a \(\gamma \)-spectral sparsifier with \(O(n\log ^2 n \log \log n / \gamma ^2)\) edges in \(O(m \log ^2 n \log \log n / \gamma ^2)\) time, with probability at least \(1 - \frac{1}{\text {poly}(n)}\). Here, m and n are the numbers of edges and vertices, respectively, in the graph before sparsifying, and the \(1- \frac{1}{\text {poly}(n)}\) probability means that one can make the probability \(1 - \frac{1}{n^k}\) for any constant k, with the k appearing as a multiplicative factor in the big-O expressions. We will take \(k=2\) when calling the sparsifier in our algorithm, so it succeeds with probability at least \(1 -\frac{1}{n^2}\).

Algorithm 4

Dual KOSZ with batching, sparsification, and recursion

Pseudocode for Dual KOSZ with batching, sparsification, and recursion is given in Algorithm 4. The base case of the recursive algorithm is when \(\left|{V}\right| \le n_0\), where \(n_0\) is a constant that the algorithm can choose. For the base case, we simply use Gaussian elimination to solve \({{\textbf {L}}}_G{{\textbf {x}}}={{\textbf {b}}}\), which takes O(1) time since \(n_0\) is a constant. For every t, contracting G down to \(H^t\) and computing the new resistances takes O(m) time; this is because each edge in G contributes to the resistance between exactly one pair of nodes in \(H^t\).
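The following schematic sketch (our own naming; the level-dependent choices of K and d, the re-derivation of the sampling distribution on the contracted graph, and the accuracy parameter \(\epsilon '\) are all suppressed, and the callables are hypothetical placeholders for the corresponding routines) illustrates the control flow: batch, contract, sparsify, recurse, and accept an update only if the dual bound does not decrease.

    import numpy as np

    def dual_kosz_sparse_recurse(L, b, K, batch_size, n0,
                                 sample_batch, contract, sparsify, dual_bound):
        """Schematic sketch of batching + sparsification + recursion (cf. Algorithm 4)."""
        n = L.shape[0]
        if n <= n0:                               # base case: solve exactly
            return np.linalg.pinv(L) @ b
        d = batch_size(n)                         # in the paper, d shrinks with the level
        x = np.zeros(n)
        for _ in range(K):
            tree_edges = sample_batch(d)          # the next d fundamental cuts, sampled in advance
            L_H, b_H, lift = contract(L, b, x, tree_edges)   # (d+1)-vertex contracted system
            L_H_tilde = sparsify(L_H)             # gamma-spectral sparsifier of L_H
            delta = dual_kosz_sparse_recurse(L_H_tilde, b_H, K, batch_size, n0,
                                             sample_batch, contract, sparsify, dual_bound)
            x_new = x + lift(delta)               # add delta(k) to every vertex of V_k
            if dual_bound(x_new) >= dual_bound(x):    # guard against a failed sparsification
                x = x_new
        return x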

5.4 Analysis of the Sparsify and Recurse Algorithm

We now analyze Algorithm 4. We first bound the convergence rate, then analyze the running time.

5.4.1 Error Analysis

The lemma below bounds the expected rate of convergence of \({{\textbf {x}}}^t\) to \({{\textbf {x}}}^*\).

Lemma 9

For all \(t \ge 0\), we have \({\mathbb {E}}\left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}^t}\right\Vert _{{{\textbf {L}}}_G}^2 \le \left( 1 - \beta + \beta e^{-\frac{d}{\tau }}\right) ^t \left\Vert {{{\textbf {x}}}^*}\right\Vert _{{{\textbf {L}}}_G}^2\). Here,

  • \(\tau = O(m\log n \log \log n)\) is the stretch of the spanning tree,

  • d is the number of updates in each batch,

  • \(\beta = \left( 1-\frac{1}{n_0^2}\right) \left( 1-\left( 4\epsilon '\cdot \frac{1+\gamma }{1-\gamma }\cdot \left( 1+\frac{\gamma ^2}{(1-\gamma )^2}\right) + \frac{2\gamma ^2}{(1-\gamma )^2}\right) \right) .\)

In particular, if we choose \(n_0 = 10\), \(\gamma = \frac{1}{100}\), and \(\epsilon ' = \frac{1}{100}\), then \(\beta \ge \frac{4}{5}\), so that

$$\begin{aligned} {\mathbb {E}}\left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}^t}\right\Vert _{{{\textbf {L}}}_G}^2 \le \left( \frac{1}{5} + \frac{4}{5} e^{-\frac{d}{\tau }}\right) ^t \left\Vert {{{\textbf {x}}}^*}\right\Vert _{{{\textbf {L}}}_G}^2. \end{aligned}$$

Proof

Define the random variable \(D^t:= {\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^t)\). We will show in Lemma 11 that for every possible realization \({{\textbf {x}}}^t\), we have \({\mathbb {E}}\left[ D^{t+1} \mid {{\textbf {x}}}^t\right] \le \left( 1 -\beta +\beta e^{-d/\tau }\right) {\mathbb {E}}\left[ D^t \mid {{\textbf {x}}}^t\right] .\) This implies that \({\mathbb {E}}[D^{t+1}] \le \left( 1 -\beta +\beta e^{-d/\tau }\right) {\mathbb {E}}\left[ D^t\right] \) unconditionally.

It then follows that

$$\begin{aligned} {\mathbb {E}}\left[ D^t\right] \le \left( 1 -\beta +\beta e^{-d/\tau }\right) ^t{\mathbb {E}}\left[ D^0\right] = \left( 1 -\beta +\beta e^{-d/\tau }\right) ^t{\mathcal {B}}({{\textbf {x}}}^*). \end{aligned}$$

Thus, \({\mathcal {B}}({{\textbf {x}}}^*) - {\mathbb {E}}[{\mathcal {B}}({{\textbf {x}}}^t)] \le \left( 1 -\beta +\beta e^{-d/\tau }\right) ^t{\mathcal {B}}({{\textbf {x}}}^*).\) Since \({{\textbf {L}}}_G{{\textbf {x}}}^* = {{\textbf {b}}}\), we have \({\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^t) = \frac{1}{2}\left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}^t}\right\Vert _{{{\textbf {L}}}_G}^2\) and \({\mathcal {B}}({{\textbf {x}}}^*) = \frac{1}{2}\left\Vert {{{\textbf {x}}}^*}\right\Vert _{{{\textbf {L}}}_G}^2\), so the claimed bound on \({\mathbb {E}}\left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}^t}\right\Vert _{{{\textbf {L}}}_G}^2\) follows. \(\square \)

As in the original analysis of Dual KOSZ, we will study the duality gap and analyze its decrease at each step of the algorithm. Consider some iteration t of the algorithm. Recall that \({{\textbf {x}}}^t\) is the iterate at the start of iteration t. For every possible sequence of \(e_1^t, \ldots , e_d^t\) (the tree edges chosen in iteration t), define the following:

  • Let \({\hat{{{\textbf {x}}}}}^{t+1}\) be the vector obtained from \({{\textbf {x}}}^t\) by adding \({\Delta }^t(V^t_i)\) to every vertex in \(V^t_i\), where \({\Delta }^t:= {{\textbf {L}}}_{H^t}^\dag {{\textbf {b}}}^t_{H^t}\).

  • Let \({\bar{{{\textbf {x}}}}}^{t+1}\) be obtained from \({{\textbf {x}}}^t\) by applying the updates for the sequence of tree edges \(e_1^t, \ldots , e_d^t\), one by one (i.e., exactly as in the original, unbatched version of Dual KOSZ described in Sect. 3).

Lemma 10

Fix any choice of \(e^t_1, \ldots , e^t_d\), and assume that \(\widetilde{{{\textbf {L}}}}_{H^t}\) is a \(\gamma \)-approximate sparsifier of \({{\textbf {L}}}_{H^t}\). Then

$$\begin{aligned} {\mathcal {B}}({{\textbf {x}}}^{t+1}) - {\mathcal {B}}({{\textbf {x}}}^t) \ge (1-\alpha )\left( {\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1}) - {\mathcal {B}}({{\textbf {x}}}^t)\right) \end{aligned}$$

where \(\alpha = 4\epsilon '\cdot \frac{1+\gamma }{1-\gamma }\cdot \left( 1+\frac{\gamma ^2}{(1-\gamma )^2}\right) + \frac{2\gamma ^2}{(1-\gamma )^2}\).

If we further assume that \(\gamma \in (0,\frac{1}{2})\), we can simplify to get

$$\begin{aligned} {\mathcal {B}}({{\textbf {x}}}^{t+1}) - {\mathcal {B}}({{\textbf {x}}}^t) \ge (1-12\epsilon '-8\gamma ^2-48\epsilon '\gamma ^2)\left( {\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1}) - {\mathcal {B}}({{\textbf {x}}}^t)\right) . \end{aligned}$$

Proof

To simplify notation, in this proof we will use

  • \({{\textbf {b}}}_H\) to denote \({{{\textbf {b}}}^t_{H^t}}\),

  • \({\tilde{\Delta }}_{\epsilon '}\) to denote \({\tilde{\Delta }}_{\epsilon '}^t\),

  • \({\tilde{\Delta }}\) to denote \({\tilde{\Delta }}^t\),

  • \(\Delta \) to denote \(\Delta ^t\),

  • H to denote \(H^t\).

Later in this proof, we will show that

$$\begin{aligned} \left\Vert {{\tilde{\Delta }}_{\epsilon '} - \Delta }\right\Vert _{{{\textbf {L}}}_H}^2 \le \alpha \left\Vert {\Delta }\right\Vert _{{{\textbf {L}}}_H}^2, \end{aligned}$$
(1)

for a constant \(\alpha \) that depends on \(\epsilon '\) and \(\gamma \). Assuming (1) holds, by the definition of the matrix norm it follows that

$$\begin{aligned} \left( {\tilde{\Delta }}_{\epsilon '} - \Delta \right) ^T{{\textbf {L}}}_H\left( {\tilde{\Delta }}_{\epsilon '} - \Delta \right) \le \alpha \Delta ^T{{\textbf {L}}}_H\Delta . \end{aligned}$$

Expanding the left-hand side and rearranging, we get

$$\begin{aligned} 2{\tilde{\Delta }}_{\epsilon '}^T{{\textbf {L}}}_H \Delta - {\tilde{\Delta }}_{\epsilon '}^T{{\textbf {L}}}_H {\tilde{\Delta }}_{\epsilon '} \ge (1 - \alpha ) \Delta ^T{{\textbf {L}}}_H\Delta . \end{aligned}$$

Using \({{\textbf {L}}}_H \Delta = {{\textbf {b}}}_H\), this becomes

$$\begin{aligned} 2{\tilde{\Delta }}_{\epsilon '}^T{{\textbf {b}}}_H - {\tilde{\Delta }}_{\epsilon '}^T {{\textbf {L}}}_H{\tilde{\Delta }}_{\epsilon '} \ge (1 - \alpha ) \Delta ^T{{\textbf {L}}}_H\Delta . \end{aligned}$$

Recall that \({{\textbf {x}}}^{t+1}\) is obtained from \({{\textbf {x}}}^t\) by adding \({\tilde{\Delta }}_{\epsilon '}(V_i^t)\) to every vertex in \(V_i^t\). Applying Lemma 8 to the partition \(V_0^t, \ldots , V_d^t\) (so that the contracted Laplacian and right-hand side are \({{\textbf {L}}}_H\) and \({{\textbf {b}}}_H\)) with \(\Delta (i) = \widetilde{\Delta }_{\epsilon '}(V_i^t)\), \({{\textbf {x}}}= {{\textbf {x}}}^t\), and \(\widetilde{{{\textbf {x}}}} = {{\textbf {x}}}^{t+1}\), the left-hand side is equal to \(2\left( {\mathcal {B}}({{\textbf {x}}}^{t+1}) - {\mathcal {B}}({{\textbf {x}}}^t)\right) \). On the other hand, \(2\left( {\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1}) - {\mathcal {B}}({{\textbf {x}}}^t)\right) = 2{{\textbf {b}}}_H^T\Delta - \Delta ^T{{\textbf {L}}}_H\Delta = \Delta ^T{{\textbf {L}}}_H\Delta \) (since \({{\textbf {L}}}_H \Delta = {{\textbf {b}}}_H\)), so the right-hand side is equal to \(2(1-\alpha )\left( {\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1}) - {\mathcal {B}}({{\textbf {x}}}^{t})\right) \). Dividing both sides by 2, we have

$$\begin{aligned} {\mathcal {B}}({{\textbf {x}}}^{t+1}) - {\mathcal {B}}({{\textbf {x}}}^t) \ge (1 - \alpha )\left( {\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1}) - {\mathcal {B}}({{\textbf {x}}}^t)\right) , \end{aligned}$$

as claimed.

It remains to prove (1). To this end, note that we have

  1. \(\left\Vert {{\tilde{\Delta }}_{\epsilon '} - {\tilde{\Delta }}}\right\Vert _{\widetilde{{{\textbf {L}}}}_H}^2 \le \epsilon ' \left\Vert {{\tilde{\Delta }}}\right\Vert _{\widetilde{{{\textbf {L}}}}_H}^2\) (this is the error from the recursive solve).

  2. \(\left\Vert {{\tilde{\Delta }} - \Delta }\right\Vert ^2_{{{\textbf {L}}}_H} \le h(\gamma ) \left\Vert {\Delta }\right\Vert _{{{\textbf {L}}}_H}^2\) (this follows from part 2 of Proposition 1 in the “Appendix”; it is the error from sparsification).

The first inequality, together with \((1-\gamma ){{\textbf {L}}}_H \preceq \widetilde{{{\textbf {L}}}}_H \preceq (1+\gamma ){{\textbf {L}}}_H\) and part 1 of Proposition 1 in the “Appendix”, implies that

$$\begin{aligned} \left\Vert {{\tilde{\Delta }}_{\epsilon '} - {\tilde{\Delta }}}\right\Vert _{{{\textbf {L}}}_H}^2 \le \frac{1}{1-\gamma } \left\Vert {{\tilde{\Delta }}_{\epsilon '} - {\tilde{\Delta }}}\right\Vert _{\widetilde{{{\textbf {L}}}}_H}^2 \le \frac{\epsilon '}{1-\gamma } \left\Vert {{\tilde{\Delta }}}\right\Vert _{\widetilde{{{\textbf {L}}}}_H}^2\le \epsilon ' \cdot \frac{1+\gamma }{1-\gamma } \cdot \left\Vert {{\tilde{\Delta }}}\right\Vert ^2_{{{\textbf {L}}}_H}. \end{aligned}$$

Now, using the inequality \(\left\Vert {a+b}\right\Vert ^2 \le 2\left\Vert {a}\right\Vert ^2 + 2\left\Vert {b}\right\Vert ^2\) (which holds for any norm), we note that

$$\begin{aligned} \left\Vert {{\tilde{\Delta }}}\right\Vert ^2_{{{\textbf {L}}}_H} \le 2\left\Vert {\Delta }\right\Vert _{{{\textbf {L}}}_H}^2 + 2\left\Vert {{\tilde{\Delta }} -\Delta }\right\Vert _{{{\textbf {L}}}_H}^2 \le 2\left\Vert {\Delta }\right\Vert _{{{\textbf {L}}}_H}^2 +2h(\gamma )\left\Vert {\Delta }\right\Vert _{{{\textbf {L}}}_H}^2. \end{aligned}$$

Hence,

$$\begin{aligned} \left\Vert {{\tilde{\Delta }}_{\epsilon '} - {\tilde{\Delta }}}\right\Vert _{{{\textbf {L}}}_H}^2 \le 2\epsilon ' \cdot \frac{1+\gamma }{1-\gamma } \cdot (1+h(\gamma )) \left\Vert {\Delta }\right\Vert _{{{\textbf {L}}}_H}^2. \end{aligned}$$

Again using \(\left\Vert {a+b}\right\Vert ^2 \le 2\left\Vert {a}\right\Vert ^2 + 2\left\Vert {b}\right\Vert ^2\), we have

$$\begin{aligned} \left\Vert {{\tilde{\Delta }}_{\epsilon '} - {\Delta }}\right\Vert _{{{\textbf {L}}}_H}^2&\le 2\left( \left\Vert {{\tilde{\Delta }}_{\epsilon '} - {\tilde{\Delta }}}\right\Vert _{{{\textbf {L}}}_H}^2 + \left\Vert {{\tilde{\Delta }} - {\Delta }}\right\Vert _{{{\textbf {L}}}_H}^2\right) \\&\le 2\left( 2\epsilon ' \cdot \frac{1+\gamma }{1-\gamma } \cdot (1+h(\gamma )) \left\Vert {\Delta }\right\Vert _{{{\textbf {L}}}_H}^2 + h(\gamma )\left\Vert {\Delta }\right\Vert _{{{\textbf {L}}}_H}^2\right) \\&= \left( 4\epsilon ' \cdot \frac{1+\gamma }{1-\gamma } \cdot (1+h(\gamma )) + 2h(\gamma ) \right) \left\Vert {\Delta }\right\Vert _{{{\textbf {L}}}_H}^2. \end{aligned}$$

Therefore, (1) holds with \(\alpha =4\epsilon ' \cdot \frac{1+\gamma }{1-\gamma } \cdot (1+h(\gamma )) + 2\,h(\gamma )\). \(\square \)

Lemma 11

For any vector \({{\textbf {x}}}^t\), we have

$$\begin{aligned} {\mathcal {B}}({{\textbf {x}}}^*) - {\mathbb {E}}[{\mathcal {B}}({{{\textbf {x}}}}^{t+1})] \le \left( 1 -\beta +\beta e^{-d/\tau }\right) \left( {\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^t)\right) , \end{aligned}$$

where \(\beta = (1-\frac{1}{n_0^2})(1-\alpha )\).

Here, the expectation is taken over the random choices of \(e_1^t, \ldots , e^t_d\), and also over the randomness of the sparsification step. (Recall that we use the sparsifier given in Theorem 1.2 of [16], which successfully returns a \(\gamma \)-approximate sparsifier with probability at least \(1 - \frac{1}{\left|{V(H^t)}\right|^2}\).)

Proof

By Lemma 4, we know that

$$\begin{aligned} {\mathcal {B}}({{\textbf {x}}}^*) - {\mathbb {E}}[{\mathcal {B}}({\bar{{{\textbf {x}}}}}^{t+1})] \le \left( 1 - \frac{1}{\tau }\right) ^d\left( {\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^t)\right) \le e^{-\frac{d}{\tau }}\left( {\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^t)\right) . \end{aligned}$$

Rearranging, this is equivalent to

$$\begin{aligned} {\mathbb {E}}[{\mathcal {B}}({\bar{{{\textbf {x}}}}}^{t+1})] - {\mathcal {B}}({{\textbf {x}}}^t) \ge \left( 1 - e^{-\frac{d}{\tau }} \right) \left( {\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^t)\right) . \end{aligned}$$

Observe that for every realization of \(e^t_1, \ldots , e^t_d\), we have \({\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1}) \ge {\mathcal {B}}({\bar{{{\textbf {x}}}}}^{t+1})\). This is because \({\hat{{{\textbf {x}}}}}^{t+1} - {{\textbf {x}}}^t = \Delta ^t\), where \(\Delta ^t\) by definition maximizes the increase \({\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1}) - {\mathcal {B}}({{\textbf {x}}}^t)\) over all update vectors that are constant on each of the components \(V^t_0, \ldots , V^t_d\). On the other hand, the vector \({\bar{{{\textbf {x}}}}}^{t+1} - {{\textbf {x}}}^t\) is also constant on each of the components \(V^t_0, \ldots , V^t_d\), by the way our original algorithm works.

Since \({\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1}) \ge {\mathcal {B}}({\bar{{{\textbf {x}}}}}^{t+1})\) holds for every realization of \(e^t_1, \ldots , e^t_d\), it follows that \({\mathbb {E}}[{\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1})] \ge {\mathbb {E}}[{\mathcal {B}}({\bar{{{\textbf {x}}}}}^{t+1})]\), where the expectation is taken over the random choices of \(e^t_1, \ldots , e^t_d\) made by the algorithm. Hence,

$$\begin{aligned} {\mathbb {E}}[{\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1})] - {\mathcal {B}}({{\textbf {x}}}^t) \ge \left( 1 - e^{-\frac{d}{\tau }}\right) \left( {\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^t)\right) . \end{aligned}$$
(2)

To conclude, we will use Lemma 10 to translate the above inequality (which is in terms of \({\hat{{{\textbf {x}}}}}^{t+1}\)), to an inequality in terms of \({{\textbf {x}}}^{t+1}\). We have

  • With probability \(\ge 1 - \frac{1}{n_0^2}\), the sparsifier is successful and by Lemma 10,

    $$\begin{aligned} {\mathbb {E}}[{\mathcal {B}}({{\textbf {x}}}^{t+1})] - {\mathcal {B}}({{\textbf {x}}}^t) \ge (1-\alpha )\left( {\mathbb {E}}[{\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1})] - {\mathcal {B}}({{\textbf {x}}}^t) \right) . \end{aligned}$$
  • With probability \(\le \frac{1}{n_0^2}\), the sparsifier is unsuccessful and \({\mathbb {E}}[{\mathcal {B}}({{\textbf {x}}}^{t+1})] - {\mathcal {B}}({{\textbf {x}}}^t) \ge 0\). This is because in the algorithm, we evaluate \({\mathcal {B}}({{\textbf {x}}}^{t+1})\) and only update \({{\textbf {x}}}^t\) to \({{\textbf {x}}}^{t+1}\) if \({\mathcal {B}}({{\textbf {x}}}^{t+1}) \ge {\mathcal {B}}({{\textbf {x}}}^t)\). Otherwise, we set \({{\textbf {x}}}^{t+1} = {{\textbf {x}}}^t\).

Note that the above expectations are with respect to the random choices of \(e^t_1, \ldots , e^t_d\), conditioned on the sparsifier being successful/unsuccessful. Now, taking another expectation with respect to the randomness of the sparsifier, we get

$$\begin{aligned} {\mathbb {E}}[{\mathcal {B}}({{\textbf {x}}}^{t+1})] - {\mathcal {B}}({{\textbf {x}}}^t)&\ge \left( 1-\frac{1}{n_0^2}\right) (1-\alpha )\left( {\mathbb {E}}[{\mathcal {B}}({\hat{{{\textbf {x}}}}}^{t+1})] - {\mathcal {B}}({{\textbf {x}}}^t) \right) \\&\ge \left( 1-\frac{1}{n_0^2}\right) (1-\alpha )\left( 1 - e^{-\frac{d}{\tau }}\right) \left( {\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^t)\right)&(\text {by} (2)) \end{aligned}$$

Rearranging the above inequality gives

$$\begin{aligned} {\mathcal {B}}({{\textbf {x}}}^*) - {\mathbb {E}}[{\mathcal {B}}({{{\textbf {x}}}}^{t+1})] \le \left( 1 - \left( 1-\frac{1}{n_0^2}\right) (1-\alpha )\left( 1 - e^{-\frac{d}{\tau }}\right) \right) \left( {\mathcal {B}}({{\textbf {x}}}^*) - {\mathcal {B}}({{\textbf {x}}}^t)\right) , \end{aligned}$$

as claimed. \(\square \)

5.4.2 Running Time Analysis

The following theorem bounds the running time of the algorithm.

Theorem 2

For any \(\delta \in (0,1)\) and \(\epsilon > 0\), Dual KOSZ with batching, sparsification, and recursion finds \({{\textbf {x}}}\) with \({\mathbb {E}}\left\Vert {{{\textbf {x}}}^* - {{\textbf {x}}}}\right\Vert _{{\textbf {L}}}^2 \le \epsilon \left\Vert {{{\textbf {x}}}^*}\right\Vert _{{\textbf {L}}}^2\) in \(O(m^{1+\delta }(\log n)^{\frac{8}{\delta } }(\log \frac{1}{\epsilon })^{\frac{1}{\delta }})\) time.

Proof

By Lemma 9, it suffices to run the algorithm for K iterations, for any \(K \ge \frac{\ln \epsilon }{\ln \left( \frac{1}{5} + \frac{4}{5}e^{-d/\tau }\right) }\).

Using the inequalities \(e^{-x} \le 1 - \frac{x}{2}\) and \(\ln (1-x) \le -\frac{x}{2}\) which hold for \(x\in (0,1)\), we see that it suffices to choose \(K = \frac{5\tau }{d} \ln (1/\epsilon )\).
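In more detail, the first inequality gives \(\frac{1}{5} + \frac{4}{5}e^{-\frac{d}{\tau }} \le 1 - \frac{2d}{5\tau }\), and then the second gives

$$\begin{aligned} \ln \left( \frac{1}{5} + \frac{4}{5}e^{-\frac{d}{\tau }}\right) \le \ln \left( 1 - \frac{2d}{5\tau }\right) \le -\frac{d}{5\tau }, \end{aligned}$$

so that \(\frac{\ln \epsilon }{\ln \left( \frac{1}{5} + \frac{4}{5}e^{-d/\tau }\right) } = \frac{\ln \frac{1}{\epsilon }}{-\ln \left( \frac{1}{5} + \frac{4}{5}e^{-d/\tau }\right) } \le \frac{5\tau }{d}\ln \frac{1}{\epsilon }\).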

Recall \(\tau \le c_3m\log ^2n\), for some constant \(c_3\). Thus, we choose \(d = c_3{\bar{m}}(\log ^2n)\cdot {m}^{-\delta }.\) Here, \({\bar{m}}\) is the number of edges in the current iteration, while m is the number of edges in the topmost iteration (i.e. in the original graph G). With this choice of d, we have \(K = 5c_3m^{\delta }\ln (1/\epsilon ).\) (Note that K is the same at every level of the recursion tree.)

The work involved in each call to the algorithm consists of

  • Computing T,

  • Doing K times

    • Contracting and sparsifying to a graph with \(d+1\) vertices and \(a_1 d \log ^2n \log \log n / \gamma ^2\) edges, for some constant \(a_1\).

    • Doing a recursive call.

If m is the number of edges in the graph at one level of the recursion tree, then at the next level, the number of edges is

$$\begin{aligned} a_1 d \log ^2n\log \log n / \gamma ^2 \le a_1c_3{\bar{m}}(\log ^2n)\cdot {m}^{-\delta }\log ^3n / \gamma ^2 = a_1c_3{\bar{m}}\cdot {m}^{-\delta }(\log n)^{5}/\gamma ^2. \end{aligned}$$

Therefore, the number of edges in a graph at level l of the recursion tree is

$$\begin{aligned} \text {edges}_l \le {m}^{1-l\delta }(a_1c_3)^l(\log n)^{5l}/\gamma ^{2l}. \end{aligned}$$

The total work required by a node at level l of the recursion tree is dominated by the sparsifier, which takes time

$$\begin{aligned} \text {work}_l = K\cdot a_2\cdot \text {edges}_l \cdot \log ^{2}n\log \log n/ \gamma ^2 \le K\cdot a_2\cdot \text {edges}_l \cdot \log ^{3}n/ \gamma ^2 \end{aligned}$$

for some constant \(a_2\).

Finally, the total number of recursion tree nodes at level l is equal to \(K^l\). This implies that the total work required by all the nodes at level l of the recursion tree is equal to

$$\begin{aligned} \text {total work}_l&= \text {work}_l \cdot K^l \\&\le K\cdot a_2\cdot \text {edges}_l \cdot \log ^{3}n\cdot \frac{1}{\gamma ^2} \cdot K^l \\&\le K\cdot a_2\cdot \left( {m}^{1-l\delta }(a_1c_3)^l(\log n)^{5l}/\gamma ^{2l}\right) \cdot \log ^{3}n\cdot \frac{1}{\gamma ^2} \cdot K^l\\&= 5^{l+1}c_3^{l+1}{m}^{1+\delta }(\ln 1/\epsilon )^{l+1}a_2(a_1c_3)^l(\log n)^{(5l+3)} \cdot \frac{1}{\gamma ^{2l+2}} \\&\le A^l {m}^{1+\delta }(\log n)^{8l}(\ln 1/\epsilon )^{l+1}, \end{aligned}$$

for some numerical constant A.

To conclude, we note that the total work summed across all the levels is at most a constant factor times the total work at the maximum level, which is \(l = \frac{1}{\delta }\). \(\square \)

Remark 3

By setting \(\delta = \sqrt{\frac{8\log \log n + \log \log \frac{1}{\epsilon }}{\log m}}\) (which is the choice of \(\delta \) that minimizes \(m^{1+\delta }(\log n)^{\frac{8}{\delta }}(\log \frac{1}{\epsilon })^{\frac{1}{\delta }}\)), the running time in Theorem 2 becomes \(O\left( m \exp \left( 2\sqrt{\log m(8\log \log n + \log \log \frac{1}{\epsilon }) }\right) \right) \), which is \(O(m^{1+o(1)})\) if \(\epsilon \) is a constant.
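To see where this choice of \(\delta \) comes from, note that

$$\begin{aligned} m^{1+\delta }(\log n)^{\frac{8}{\delta }}\left( \log \frac{1}{\epsilon }\right) ^{\frac{1}{\delta }} = m\cdot \exp \left( \delta \ln m + \frac{8\ln \log n + \ln \log \frac{1}{\epsilon }}{\delta }\right) , \end{aligned}$$

and by the AM–GM inequality the exponent is at least \(2\sqrt{\ln m\left( 8\ln \log n + \ln \log \frac{1}{\epsilon }\right) }\), with equality when \(\delta ^2\ln m = 8\ln \log n + \ln \log \frac{1}{\epsilon }\); up to the base of the logarithms, this is the stated choice of \(\delta \) and yields the stated bound.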

6 Conclusion

We propose a cut-based combinatorial algorithm to solve Laplacian systems approximately. This algorithm is dual to the cycle-based algorithm by Kelner et al. [7]. We show that our algorithm converges in a near-linear number of iterations.

To achieve a near-linear running time, we would further need each iteration to run in polylogarithmic time. We give evidence against this by presenting a reduction from the OMv conjecture. This is in contrast to the algorithm in [7], which uses a data structure such that each iteration of the algorithm runs in \(O(\log n)\) time. In order to obtain a better running time, one would need to show that it is possible to take advantage of the particular structure of the updates in the algorithm when implementing the data structure. Note that our reduction crucially needs a very specific spanning tree (albeit one with very small stretch) to be chosen. Is it possible for the algorithm to choose a different small-stretch spanning tree that is amenable to a polylogarithmic-time implementation?
